
MEASURE THEORY

Volume 2

D.H.Fremlin
By the same author:
Topological Riesz Spaces and Measure Theory, Cambridge University Press, 1974.
Consequences of Martin’s Axiom, Cambridge University Press, 1982.

Companions to the present volume:


Measure Theory, vol. 1, Torres Fremlin, 2000;
Measure Theory, vol. 3, Torres Fremlin, 2002;
Measure Theory, vol. 4, Torres Fremlin, 2003;
Measure Theory, vol. 5, Torres Fremlin, 2008.

First edition May 2001

Second edition January 2010


MEASURE THEORY
Volume 2
Broad Foundations

D.H.Fremlin
Research Professor in Mathematics, University of Essex
Dedicated by the Author
to the Publisher

This book may be ordered from the printers, http://www.lulu.com/buy.

First published in 2001


by Torres Fremlin, 25 Ireton Road, Colchester CO3 3AT, England
© D.H.Fremlin 2001
The right of D.H.Fremlin to be identified as author of this work has been asserted in accordance with the Copyright,
Designs and Patents Act 1988. This work is issued under the terms of the Design Science License as published in
http://www.gnu.org/licenses/dsl.html. For the source files see
http://www.essex.ac.uk/maths/staff/fremlin/mt2.2010/index.htm.
Library of Congress classification QA312.F72
AMS 2010 classification 28-01
ISBN 978-0-9538129-7-4
Typeset by AMS-TEX
Printed by Lulu.com

Contents

General Introduction 9

Introduction to Volume 2 10

*Chapter 21: Taxonomy of measure spaces


Introduction 12
211 Definitions 12
Complete, totally finite, σ-finite, strictly localizable, semi-finite, localizable, locally determined measure spaces; atoms;
elementary relationships; countable-cocountable measures.
212 Complete spaces 17
Measurable and integrable functions on complete spaces; completion of a measure.
213 Semi-finite, locally determined and localizable spaces 23
Integration on semi-finite spaces; c.l.d. versions; measurable envelopes; characterizing localizability, strict localizability,
σ-finiteness.
214 Subspaces 33
Subspace measures on arbitrary subsets; integration; direct sums of measure spaces; *extending measures to well-
ordered families of sets.
215 σ-finite spaces and the principle of exhaustion 43
The principle of exhaustion; characterizations of σ-finiteness; the intermediate value theorem for atomless measures.
*216 Examples 46
A complete localizable non-locally-determined space; a complete locally determined non-localizable space; a complete
locally determined localizable space which is not strictly localizable.

Chapter 22: The Fundamental Theorem of Calculus


Introduction 52
221 Vitali’s theorem in R 52
Vitali’s theorem for intervals in R.
222 Differentiating an indefinite integral 55
Monotonic functions are differentiable a.e., and their derivatives are integrable; $\frac{d}{dx}\int_a^x f = f$ a.e.; *the Denjoy-Young-Saks
theorem.
223 Lebesgue’s density theorems 63
$f(x) = \lim_{h\downarrow 0}\frac{1}{2h}\int_{x-h}^{x+h} f$ a.e. (x); density points; $\lim_{h\downarrow 0}\frac{1}{2h}\int_{x-h}^{x+h}|f - f(x)| = 0$ a.e. (x); the Lebesgue set of a
function.
224 Functions of bounded variation 67
Variation of a function; differences of monotonic functions; sums and products, limits, continuity and differentiability
for b.v. functions; an inequality for $\int f\times g$.
225 Absolutely continuous functions 75
Absolute continuity of indefinite integrals; absolutely continuous functions on R; integration by parts; lower semi-
continuous functions; *direct images of negligible sets; the Cantor function.
*226 The Lebesgue decomposition of a function of bounded variation 84
Sums over arbitrary index sets; saltus functions; the Lebesgue decomposition.

Chapter 23: The Radon-Nikodým theorem


Introduction 92
231 Countably additive functionals 92
Additive and countably additive functionals; Jordan and Hahn decompositions.
232 The Radon-Nikodým theorem 96
Absolutely and truly continuous additive functionals; truly continuous functionals are indefinite integrals; *the
Lebesgue decomposition of a countably additive functional.
233 Conditional expectations 105
σ-subalgebras; conditional expectations of integrable functions; convex functions; Jensen’s inequality.
234 Operations on measures 112
Inverse-measure-preserving functions; image measures; sums of measures; indefinite-integral measures; ordering of
measures.
235 Measurable transformations 122
The formula $\int g(y)\,\nu(dy) = \int J(x)g(\phi(x))\,\mu(dx)$; detailed conditions of applicability; inverse-measure-preserving
functions; the image measure catastrophe; using the Radon-Nikodým theorem.

Chapter 24: Function spaces


Introduction 133
241 $\mathcal{L}^0$ and $L^0$ 133
The linear, order and multiplicative structure of $L^0$; Dedekind completeness and localizability; action of Borel
functions.
242 $L^1$ 141
The normed lattice $L^1$; integration as a linear functional; completeness and Dedekind completeness; the
Radon-Nikodým theorem and conditional expectations; convex functions; dense subspaces.
243 $L^\infty$ 151
The normed lattice $L^\infty$; norm-completeness; the duality between $L^1$ and $L^\infty$; localizability, Dedekind completeness
and the identification $L^\infty \cong (L^1)^*$.
244 $L^p$ 158
The normed lattices $L^p$, for $1 < p < \infty$; Hölder’s inequality; completeness and Dedekind completeness; $(L^p)^* \cong L^q$;
conditional expectations; *uniform convexity.
245 Convergence in measure 172
The topology of (local) convergence in measure on $L^0$; pointwise convergence; localizability and Dedekind
completeness; embedding $L^p$ in $L^0$; $\|\ \|_1$-convergence and convergence in measure; σ-finite spaces, metrizability and sequential
convergence.
246 Uniform integrability 183
Uniformly integrable sets in $\mathcal{L}^1$ and $L^1$; elementary properties; disjoint-sequence characterizations; $\|\ \|_1$ and
convergence in measure on uniformly integrable sets.
247 Weak compactness in $L^1$ 191
A subset of $L^1$ is uniformly integrable iff it is relatively weakly compact.

Chapter 25: Product measures


Introduction 196
251 Finite products 196
Primitive and c.l.d. products; basic properties; Lebesgue measure on $\mathbb{R}^{r+s}$ as a product measure; products of direct
sums and subspaces; c.l.d. versions.
252 Fubini’s theorem 212
When $\iint f(x,y)\,dx\,dy$ and $\int f(x,y)\,d(x,y)$ are equal; measures of ordinate sets; *the volume of a ball in $\mathbb{R}^r$.
253 Tensor products 228
Bilinear operators; bilinear operators $L^1(\mu)\times L^1(\nu)\to W$ and linear operators $L^1(\mu\times\nu)\to W$; positive bilinear operators
and the ordering of $L^1(\mu\times\nu)$; conditional expectations; upper integrals.
254 Infinite products 238
Products of arbitrary families of probability spaces; basic properties; inverse-measure-preserving functions; usual
measure on $\{0,1\}^I$; $\{0,1\}^{\mathbb{N}}$ isomorphic, as measure space, to [0, 1]; subspaces of full outer measure; sets determined
by coordinates in a subset of the index set; generalized associative law for products of measures; subproducts as
image measures; factoring functions through subproducts; conditional expectations on subalgebras corresponding to
subproducts.
255 Convolutions of functions 255
Shifts in $\mathbb{R}^2$ as measure space automorphisms; convolutions of functions on R; $\int h\times(f*g) = \int h(x+y)f(x)g(y)\,d(x,y)$;
$f*(g*h) = (f*g)*h$; $\|f*g\|_1 \le \|f\|_1\|g\|_1$; the groups $\mathbb{R}^r$ and $]-\pi,\pi]$.
256 Radon measures on $\mathbb{R}^r$ 267
Definition of Radon measures on $\mathbb{R}^r$; completions of Borel measures; Lusin measurability; image measures; products
of two Radon measures; semi-continuous functions.
257 Convolutions of measures 274
Convolution of totally finite Radon measures on $\mathbb{R}^r$; $\int h\,d(\nu_1*\nu_2) = \iint h(x+y)\,\nu_1(dx)\,\nu_2(dy)$; $\nu_1*(\nu_2*\nu_3) = (\nu_1*\nu_2)*\nu_3$.

Chapter 26: Change of variable in the integral


Introduction 277
261 Vitali’s theorem in $\mathbb{R}^r$ 277
Vitali’s theorem for balls in $\mathbb{R}^r$; Lebesgue’s Density Theorem.
262 Lipschitz and differentiable functions 285
Lipschitz functions; elementary properties; differentiable functions from $\mathbb{R}^r$ to $\mathbb{R}^s$; differentiability and partial
derivatives; approximating a differentiable function by piecewise affine functions; *Rademacher’s theorem.
263 Differentiable transformations in $\mathbb{R}^r$ 296
In the formula $\int g(y)\,dy = \int J(x)g(\phi(x))\,dx$, find J when φ is (i) linear (ii) differentiable; detailed conditions of
applicability; polar coordinates; the one-dimensional case.
264 Hausdorff measures 307
r-dimensional Hausdorff measure on $\mathbb{R}^s$; Borel sets are measurable; Lipschitz functions; if s = r, we have a multiple
of Lebesgue measure; *Cantor measure as a Hausdorff measure.
265 Surface measures 316


Normalized Hausdorff measure; action of linear operators and differentiable functions; surface measure on a sphere.
*266 The Brunn-Minkowski inequality 324
Arithmetic and geometric means; essential closures; the Brunn-Minkowski inequality.

Chapter 27: Probability theory


Introduction 328
271 Distributions 329
Terminology; distributions as Radon measures; distribution functions; densities; transformations of random variables;
*distribution functions and convergence in measure.
272 Independence 335
Independent families of random variables; characterizations of independence; joint distributions of (finite) independent
families, and product measures; the zero-one law; E(X×Y ), Var(X + Y ); distribution of a sum as convolution of
distributions; Etemadi’s inequality; *Hoeffding’s inequality.
273 The strong law of large numbers 348
$\frac{1}{n+1}\sum_{i=0}^{n} X_i \to 0$ a.e. if the $X_n$ are independent with zero expectation and either (i) $\sum_{n=0}^{\infty}\frac{1}{(n+1)^2}\mathrm{Var}(X_n) < \infty$ or
(ii) $\sum_{n=0}^{\infty} E(|X_n|^{1+\delta}) < \infty$ for some δ > 0 or (iii) the $X_n$ are identically distributed.
274 The Central Limit Theorem 360
Normally distributed r.v.s; Lindeberg’s conditions for the Central Limit Theorem; corollaries; estimating $\int_{\alpha}^{\infty} e^{-x^2/2}\,dx$.
275 Martingales 371
Sequences of σ-algebras, and martingales adapted to them; up-crossings; Doob’s Martingale Convergence Theorem;
uniform integrability, $\|\ \|_1$-convergence and martingales as sequences of conditional expectations; reverse martingales;
stopping times.
276 Martingale difference sequences 382
Martingale difference sequences; strong law of large numbers for m.d.ss.; Komlós’ theorem.

Chapter 28: Fourier analysis


Introduction 390
281 The Stone-Weierstrass theorem 390
Approximating a function on a compact set by members of a given lattice or algebra of functions; real and complex
cases; approximation by polynomials and trigonometric functions; Weyl’s Equidistribution Theorem in $[0,1]^r$.
282 Fourier series 401
Fourier and Fejér sums; Dirichlet and Fejér kernels; Riemann-Lebesgue lemma; uniform convergence of Fejér sums of
a continuous function; a.e. and $\|\ \|_1$-convergence of Fejér sums of an integrable function; $\|\ \|_2$-convergence of Fourier
sums of a square-integrable function; convergence of Fourier sums of a differentiable or b.v. function; convolutions
and Fourier coefficients.
283 Fourier transforms I 419
Fourier and inverse Fourier transforms; elementary properties; $\int_0^{\infty}\frac{1}{x}\sin x\,dx = \frac{1}{2}\pi$; the formula $f^{\wedge\vee} = f$ for differentiable
and b.v. f; convolutions; $e^{-x^2/2}$; $\int f\times g^{\wedge} = \int f^{\wedge}\times g$.
284 Fourier transforms II 434
Test functions; $h^{\wedge\vee} = h$; tempered functions; tempered functions which represent each other’s transforms; convolutions;
square-integrable functions; Dirac’s delta function.
285 Characteristic functions 450
The characteristic function of a distribution; independent r.v.s; the normal distribution; the vague topology on the
space of distributions, and sequential convergence of characteristic functions; Poisson’s theorem.
*286 Carleson’s theorem 464
The Hardy-Littlewood Maximal Theorem; the Lacey-Thiele proof of Carleson’s theorem.

Appendix to Volume 2
Introduction 494
2A1 Set theory 494
Ordered sets; transfinite recursion; ordinals; initial ordinals; Schröder-Bernstein theorem; filters; Axiom of Choice;
Zermelo’s Well-Ordering Theorem; Zorn’s Lemma; ultrafilters.
2A2 The topology of Euclidean space 500
Closures; continuous functions; compact sets; open sets in R.
2A3 General topology 503
Topologies; continuous functions; subspace topologies; closures and interiors; Hausdorff topologies; pseudometrics;
convergence of sequences; compact spaces; cluster points of sequences; convergence of filters; lim sup and lim inf;
product topologies; dense subsets.
2A4 Normed spaces 511


Normed spaces; linear subspaces; Banach spaces; bounded linear operators; dual spaces; extending a linear operator
from a dense subspace; normed algebras.
2A5 Linear topological spaces 514
Linear topological spaces; topologies defined by functionals; convex sets; completeness; weak topologies.
2A6 Factorization of matrices 517
Determinants; orthonormal families; T = PDQ where D is diagonal and P, Q are orthogonal.

Concordance 519
References for Volume 2 521
Index to Volumes 1 and 2
Principal topics and results 524
General index 528

General introduction In this treatise I aim to give a comprehensive description of modern abstract measure theory,
with some indication of its principal applications. The first two volumes are set at an introductory level; they are
intended for students with a solid grounding in the concepts of real analysis, but possibly with rather limited detailed
knowledge. As the book proceeds, the level of sophistication and expertise demanded will increase; thus for the volume
on topological measure spaces, familiarity with general topology will be assumed. The emphasis throughout is on the
mathematical ideas involved, which in this subject are mostly to be found in the details of the proofs.
My intention is that the book should be usable both as a first introduction to the subject and as a reference work.
For the sake of the first aim, I try to limit the ideas of the early volumes to those which are really essential to the
development of the basic theorems. For the sake of the second aim, I try to express these ideas in their full natural
generality, and in particular I take care to avoid suggesting any unnecessary restrictions in their applicability. Of course
these principles are to some extent contradictory. Nevertheless, I find that most of the time they are very nearly
reconcilable, provided that I indulge in a certain degree of repetition. For instance, right at the beginning, the puzzle
arises: should one develop Lebesgue measure first on the real line, and then in spaces of higher dimension, or should
one go straight to the multidimensional case? I believe that there is no single correct answer to this question. Most
students will find the one-dimensional case easier, and it therefore seems more appropriate for a first introduction, since
even in that case the technical problems can be daunting. But certainly every student of measure theory must at a
fairly early stage come to terms with Lebesgue area and volume as well as length; and with the correct formulations,
the multidimensional case differs from the one-dimensional case only in a definition and a (substantial) lemma. So
what I have done is to write them both out (§§114-115). In the same spirit, I have been uninhibited, when setting
out exercises, by the fact that many of the results I invite students to look for will appear in later chapters; I believe
that throughout mathematics one has a better chance of understanding a theorem if one has previously attempted
something similar alone.
The plan of the work is as follows:
Volume 1: The Irreducible Minimum
Volume 2: Broad Foundations
Volume 3: Measure Algebras
Volume 4: Topological Measure Spaces
Volume 5: Set-theoretic Measure Theory.
Volume 1 is intended for those with no prior knowledge of measure theory, but competent in the elementary techniques
of real analysis. I hope that it will be found useful by undergraduates meeting Lebesgue measure for the first time.
Volume 2 aims to lay out some of the fundamental results of pure measure theory (the Radon-Nikodým theorem,
Fubini’s theorem), but also gives short introductions to some of the most important applications of measure theory
(probability theory, Fourier analysis). While I should like to believe that most of it is written at a level accessible
to anyone who has mastered the contents of Volume 1, I should not myself have the courage to try to cover it in an
undergraduate course, though I would certainly attempt to include some parts of it. Volumes 3 and 4 are set at a
rather higher level, suitable to postgraduate courses; while Volume 5 assumes a wide-ranging competence over large
parts of analysis and set theory.
There is a disclaimer which I ought to make in a place where you might see it in time to avoid paying for this book.
I make no attempt to describe the history of the subject. This is not because I think the history uninteresting or
unimportant; rather, it is because I have no confidence of saying anything which would not be seriously misleading.
Indeed I have very little confidence in anything I have ever read concerning the history of ideas. So while I am happy to
honour the names of Lebesgue and Kolmogorov and Maharam in more or less appropriate places, and I try to include
in the bibliographies the works which I have myself consulted, I leave any consideration of the details to those bolder
and better qualified than myself.
For the time being, at least, printing will be in short runs. I hope that readers will be energetic in commenting on
errors and omissions, since it should be possible to correct these relatively promptly. An inevitable consequence of this
is that paragraph references may go out of date rather quickly. I shall be most flattered if anyone chooses to rely on
this book as a source for basic material; and I am willing to attempt to maintain a concordance to such references,
indicating where migratory results have come to rest for the moment, if authors will supply me with copies of papers
which use them. In the concordance to the present volume you will find notes on the items which have been referred
to in other published volumes of this work.
I mention some minor points concerning the layout of the material. Most sections conclude with lists of ‘basic
exercises’ and ‘further exercises’, which I hope will be generally instructive and occasionally entertaining. How many
of these you should attempt must be for you and your teacher, if any, to decide, as no two students will have quite
the same needs. I mark with a > those which seem to me to be particularly important. But while you may not need
to write out solutions to all the ‘basic exercises’, if you are in any doubt as to your capacity to do so you should take
this as a warning to slow down a bit. The ‘further exercises’ are unbounded in difficulty, and are unified only by a
presumption that each has at least one solution based on ideas already introduced. Occasionally I add a final ‘problem’,
a question to which I do not know the answer and which seems to arise naturally in the course of the work.
The impulse to write this book is in large part a desire to present a unified account of the subject. Cross-references
are correspondingly abundant and wide-ranging. In order to be able to refer freely across the whole text, I have chosen
a reference system which gives the same code name to a paragraph wherever it is being called from. Thus 132E is the
fifth paragraph in the second section of the third chapter of Volume 1, and is referred to by that name throughout. Let
me emphasize that cross-references are supposed to help the reader, not distract her. Do not take the interpolation
‘(121A)’ as an instruction, or even a recommendation, to lift Volume 1 off the shelf and hunt for §121. If you are happy
with an argument as it stands, independently of the reference, then carry on. If, however, I seem to have made rather
a large jump, or the notation has suddenly become opaque, local cross-references may help you to fill in the gaps.
Each volume will have an appendix of ‘useful facts’, in which I set out material which is called on somewhere in that
volume, and which I do not feel I can take for granted. Typically the arrangement of material in these appendices is
directed very narrowly at the particular applications I have in mind, and is unlikely to be a satisfactory substitute for
conventional treatments of the topics touched on. Moreover, the ideas may well be needed only on rare and isolated
occasions. So as a rule I recommend you to ignore the appendices until you have some direct reason to suppose that a
fragment may be useful to you.
During the extended gestation of this project I have been helped by many people, and I hope that my friends and
colleagues will be pleased when they recognise their ideas scattered through the pages below. But I am especially
grateful to those who have taken the trouble to read through earlier drafts and comment on obscurities and errors.

Introduction to Volume 2
For this second volume I have chosen seven topics through which to explore the insights and challenges offered by
measure theory. Some, like the Radon-Nikodým theorem (Chapter 23) are necessary for any understanding of the
structure of the subject; others, like Fourier analysis (Chapter 28) and the discussion of function spaces (Chapter 24)
demonstrate the power of measure theory to attack problems in general real and functional analysis. But all have
applications outside measure theory, and all have influenced its development. These are the parts of measure theory
which any analyst may find himself using.
Every topic is one which ideally one would wish undergraduates to have seen, but the length of this volume makes
it plain that no ordinary undergraduate course could include very much of it. It is directed rather at graduate level,
where I hope it will be found adequate to support all but the most ambitious courses in measure theory, though it is
perhaps a bit too solid to be suitable for direct use as a course text, except with careful selection of the parts to be
covered. If you are using it to teach yourself measure theory, I strongly recommend an eclectic approach, looking for
particular subjects and theorems that seem startling or useful, and working backwards from them. My other objective,
of course, is to provide an account of the central ideas at this level in measure theory, rather fuller than can easily be
found in one volume elsewhere. I cannot claim that it is ‘definitive’, but I do think I cover a good deal of ground in ways
that provide a firm foundation for further study. As in Volume 1, I usually do not shrink from giving ‘best’ results, like
Lindeberg’s conditions for the Central Limit Theorem (§274), or the theory of products of arbitrary measure spaces
(§251). If I were teaching this material to students in a PhD programme I would rather accept a limitation in the
breadth of the course than leave them unaware of what could be done in the areas discussed.
The topics interact in complex ways – one of the purposes of this book is to exhibit their relationships. There is no
canonical linear ordering in which they should be taken. Nor do I think organization charts are very helpful, not least
because it may be only two or three paragraphs in a section which are needed for a given chapter later on. I do at least
try to lay the material of each section out in an order which makes initial segments useful by themselves. But the order
in which to take the chapters is to a considerable extent for you to choose, perhaps after a glance at their individual
introductions. I have done my best to pitch the exposition at much the same level throughout the volume, sometimes
allowing gradients to steepen in the course of a chapter or a section, but always trying to return to something which
anyone who has mastered Volume 1 ought to be able to cope with. (Though perhaps the main theorems of Chapter 26
are harder work than the principal results elsewhere, and §286 is only for the most determined.)
I said there were seven topics, and you will see eight chapters ahead of you. This is because Chapter 21 is rather
different from the rest. It is the purest of pure measure theory, and is here only because there are places later in
the volume where (in my view) the theorems make sense only in the light of some abstract concepts which are not
particularly difficult, but are also not obvious. However it is fair to say that the most important ideas of this volume
do not really depend on the work of Chapter 21.
As always, it is a puzzle to know how much prior knowledge to assume in this volume. I do of course call on the
results of Volume 1 of this treatise whenever they seem to be relevant. I do not doubt, however, that there will be
readers who have learnt the elementary theory from other sources. Provided you can, from first principles, construct
Lebesgue measure and prove the basic convergence theorems for integrals on arbitrary measure spaces, you ought to be
able to embark on the present volume. Perhaps it would be helpful to have in hand the results-only version of Volume
1, since that includes the most important definitions as well as statements of the theorems.
There is also the question of how much material from outside measure theory is needed. Chapter 21 calls for some
non-trivial set theory (given in §2A1), but the more advanced ideas are needed only for the counter-examples in §216,
and can be passed over to begin with. The problems become acute in Chapter 24. Here we need a variety of results
from functional analysis, some of them depending on non-trivial ideas in general topology. For a full understanding of
this material there is no substitute for a course in normed spaces up to and including a study of weak compactness.
But I do not like to insist on such a preparation, because it is likely to be simultaneously too much and too little.
Too much, because I hardly mention linear operators at this stage; too little, because I do ask for some of the theory
of non-locally-convex spaces, which is often omitted in first courses on functional analysis. At the risk, therefore, of
wasting paper, I have written out condensed accounts of the essential facts (§§2A3-2A5).

Note on second printing, April 2003


For the second printing of this volume, I have made two substantial corrections to inadequate proofs and a large
number of minor amendments; I am most grateful to T.D.Austin for his careful reading of the first printing. In addition,
I have added a dozen exercises and a handful of straightforward results which turn out to be relevant to the work of
later volumes and fit naturally here.
The regular process of revision of this work has led me to make a couple of notational innovations not described
explicitly in the early printings of Volume 1. I trust that most readers will find these immediately comprehensible.
If, however, you find that there is a puzzling cross-reference which you are unable to match with
anything in the version of Volume 1 which you are using, it may be worth while checking the errata pages in
http://www.essex.ac.uk/maths/staff/fremlin/mterr.htm.

Note on second edition, January 2010


For the new (‘Lulu’) edition of this volume, I have eliminated a number of further errors; no doubt many remain.
There are many new exercises, several new theorems and some corresponding rearrangements of material. The new
results are mostly additions with little effect on the structure of the work, but there is a short new section (§266) on
the Brunn-Minkowski inequality.

*Chapter 21
Taxonomy of measure spaces
I begin this volume with a ‘starred chapter’. The point is that I do not really recommend this chapter for beginners.
It deals with a variety of technical questions which are of great importance for the later development of the subject,
but are likely to be both abstract and obscure for anyone who has not encountered the problems these techniques are
designed to solve. On the other hand, if (as is customary) this work is omitted, and the ideas are introduced only when
urgently needed, the student is likely to finish with very vague ideas on which theorems can be expected to apply to
which types of measure space, and with no vocabulary in which to express those ideas. I therefore take a few pages to
introduce the terminology and concepts which can be used to distinguish ‘good’ measure spaces from others, with a
few of the basic relationships. The only paragraphs which are immediately relevant to the theory set out in Volume 1
are those on ‘complete’, ‘σ-finite’ and ‘semi-finite’ measure spaces (211A, 211D, 211F, 211Lc, §212, 213A-213B, 215B),
and on Lebesgue measure (211M). For the rest, I think that a newcomer to the subject can very well pass over this
chapter for the time being, and return to it for particular items when the text of later chapters refers to it. On the
other hand, it can also be used as an introduction to the flavour of the ‘purest’ kind of measure theory, the study of
measure spaces for their own sake, with a systematic discussion of a few of the elementary constructions.

211 Definitions
I start with a list of definitions, corresponding to the concepts which I have found to be of value in distinguishing
different types of measure space. Necessarily, the significance of many of these ideas is likely to be obscure until you
have encountered some of the obstacles which arise later on. Nevertheless, you will I hope be able to deal with these
definitions on a formal, abstract basis, and to follow the elementary arguments involved in establishing the relationships
between them (211L).
In 216C-216E below I will give three substantial examples to demonstrate the rich variety of objects which the
definition of ‘measure space’ encompasses. In the present section, therefore, I content myself with very brief descriptions
of sufficient cases to show at least that each of the definitions here discriminates between different spaces (211M-211R).

211A Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is (Carathéodory) complete if whenever
A ⊆ E ∈ Σ and µE = 0 then A ∈ Σ; that is, if every negligible subset of X is measurable.

211B Definition Let (X, Σ, µ) be a measure space. Then (X, Σ, µ) is a probability space if µX = 1. In this case
µ is called a probability or probability measure.

211C Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is totally finite if µX < ∞.

211D Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is σ-finite if there is a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$
of measurable sets of finite measure such that $X = \bigcup_{n\in\mathbb{N}} E_n$.
Remark Note that in this case we can set
$F_n = E_n \setminus \bigcup_{i<n} E_i$, $G_n = \bigcup_{i\le n} E_i$
for each n, to obtain a partition $\langle F_n\rangle_{n\in\mathbb{N}}$ of X (that is, a disjoint cover of X) into measurable sets of finite measure,
and a non-decreasing sequence $\langle G_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure covering X.
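
As an illustration of this remark, here is a small worked instance. It is an example supplied for orientation rather than one taken from the text, and it assumes Lebesgue measure µ on R, the convention that N starts at 0, and the choice E_n = [−n, n]:

```latex
% Illustrative example (an assumption, not from the text):
% Lebesgue measure \mu on \mathbb{R}, with E_n = [-n, n] for n \in \mathbb{N} = \{0, 1, 2, \dots\}.
\begin{gather*}
E_n = [-n, n], \qquad \mu E_n = 2n < \infty, \qquad \bigcup_{n\in\mathbb{N}} E_n = \mathbb{R}, \\
F_0 = \{0\}, \qquad
F_n = E_n \setminus \bigcup_{i<n} E_i = E_n \setminus E_{n-1}
    = [-n, -(n-1)[ \,\cup\, ]n-1, n] \quad (n \ge 1), \\
G_n = \bigcup_{i \le n} E_i = [-n, n].
\end{gather*}
% The F_n are disjoint, cover \mathbb{R}, and have \mu F_0 = 0, \mu F_n = 2 for n \ge 1;
% the G_n increase to \mathbb{R} and have \mu G_n = 2n.
```

Since the E_n here are already increasing, G_n coincides with E_n; for a sequence that is not increasing the two would differ.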

211E Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is strictly localizable or decomposable
if there is a partition $\langle X_i\rangle_{i\in I}$ of X into measurable sets of finite measure such that
$\Sigma = \{E : E \subseteq X,\ E \cap X_i \in \Sigma\ \forall\, i \in I\}$,
$\mu E = \sum_{i\in I} \mu(E \cap X_i)$ for every E ∈ Σ.
I will call such a family $\langle X_i\rangle_{i\in I}$ a decomposition of X.
Remark In this context, we can interpret the sum $\sum_{i\in I} \mu(E \cap X_i)$ simply as
$\sup\{\sum_{i\in J} \mu(E \cap X_i) : J \text{ is a finite subset of } I\}$,
taking $\sum_{i\in\emptyset} \mu(E \cap X_i) = 0$, because we are concerned only with sums of non-negative terms (cf. 112Bd).
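
To see the definition and the supremum interpretation at work, one can take counting measure. This is an assumed example, not drawn from the text: X is an arbitrary set, Σ is the family of all subsets of X, and µE is the number of points of E if E is finite and ∞ otherwise. A sketch of the check:

```latex
% Assumed example: counting measure \mu on an arbitrary set X, \Sigma = \mathcal{P}X,
% \mu E = \#(E) if E is finite, \infty otherwise. Candidate decomposition: the singletons.
\begin{gather*}
X_x = \{x\} \quad (x \in X), \qquad \mu X_x = 1 < \infty, \\
\{E : E \subseteq X,\ E \cap X_x \in \Sigma \ \forall\, x \in X\} = \mathcal{P}X = \Sigma, \\
\sum_{x\in X} \mu(E \cap X_x)
  = \sup\Bigl\{\sum_{x\in J} \mu(E \cap \{x\}) : J \subseteq X \text{ finite}\Bigr\}
  = \sup\{\#(E \cap J) : J \subseteq X \text{ finite}\}
  = \mu E .
\end{gather*}
```

So the singletons form a decomposition in the sense of 211E, and this works even when X is uncountable, where no sequence of sets of finite measure can cover X.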

211F Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is semi-finite if whenever E ∈ Σ and
µE = ∞ there is an F ⊆ E such that F ∈ Σ and 0 < µF < ∞.
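
A minimal example of a measure that fails this condition may help fix the idea. The one-point space below is a hypothetical illustration, not one of the examples of the text:

```latex
% Hypothetical one-point example (not from the text).
\[
X = \{a\}, \qquad \Sigma = \{\emptyset, X\}, \qquad \mu\emptyset = 0, \qquad \mu X = \infty .
\]
% Here E = X has \mu E = \infty, and the only measurable subsets of E are \emptyset and X,
% of measures 0 and \infty. No F \in \Sigma with F \subseteq E has 0 < \mu F < \infty,
% so \mu is not semi-finite.
```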

211G Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is localizable if it is semi-finite and, for
every $\mathcal{E} \subseteq \Sigma$, there is an H ∈ Σ such that (i) E \ H is negligible for every $E \in \mathcal{E}$ (ii) if G ∈ Σ and E \ G is negligible for
every $E \in \mathcal{E}$, then H \ G is negligible. It will be convenient to call such a set H an essential supremum of $\mathcal{E}$ in Σ.
Remark The definition here is clumsy, because really the concept applies to measure algebras rather than to measure
spaces (see 211Ya-211Yb). However, the present definition can be made to work (see 213N, 241G, 243G below) and
enables us to proceed without a formal introduction to the concept of ‘measure algebra’ before the time comes to do
the job properly in Volume 3.

211H Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is locally determined if it is semi-finite
and
Σ = {E : E ⊆ X, E ∩ F ∈ Σ whenever F ∈ Σ and µF < ∞};
that is to say, for any E ∈ 𝒫X \ Σ there is an F ∈ Σ such that µF < ∞ and E ∩ F ∉ Σ.

211I Definition Let (X, Σ, µ) be a measure space. A set E ∈ Σ is an atom for µ if µE > 0 and whenever F ∈ Σ,
F ⊆ E one of F , E \ F is negligible.

211J Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is atomless or diffused if there are no
atoms for µ. (Note that this is not the same thing as saying that all finite sets are negligible; see 211R below. Some
authors use the word continuous in this context.)

211K Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is purely atomic if whenever E ∈ Σ
and E is not negligible there is an atom for µ included in E.
Remark Recall that a measure µ on a set X is point-supported if µ measures every subset of X and µE = ∑_{x∈E} µ{x}
for every E ⊆ X (112Bd). Every point-supported measure is purely atomic, because {x} must be an atom whenever
µ{x} > 0, but not every purely atomic measure is point-supported (211R).

211L The relationships between the concepts above are in a sense very straightforward; all the direct implications
in which one property implies another are given in the next theorem.
Theorem (a) A probability space is totally finite.
(b) A totally finite measure space is σ-finite.
(c) A σ-finite measure space is strictly localizable.
(d) A strictly localizable measure space is localizable and locally determined.
(e) A localizable measure space is semi-finite.
(f) A locally determined measure space is semi-finite.
proof (a), (b), (e) and (f) are trivial.
(c) Let (X, Σ, µ) be a σ-finite measure space; let ⟨F_n⟩_{n∈N} be a disjoint sequence of measurable sets of finite measure
covering X (see the remark in 211D). If E ∈ Σ, then of course E ∩ F_n ∈ Σ for every n ∈ N, and
µE = ∑_{n=0}^∞ µ(E ∩ F_n) = ∑_{n∈N} µ(E ∩ F_n).
If E ⊆ X and E ∩ F_n ∈ Σ for every n ∈ N, then
E = ⋃_{n∈N} (E ∩ F_n) ∈ Σ.
So ⟨F_n⟩_{n∈N} is a decomposition of X and (X, Σ, µ) is strictly localizable.
(d) Let (X, Σ, µ) be a strictly localizable measure space; let ⟨X_i⟩_{i∈I} be a decomposition of X.
(i) Let ℰ be a family of measurable subsets of X. Let ℱ be the family of measurable sets F ⊆ X such that
µ(F ∩ E) = 0 for every E ∈ ℰ. Note that ∅ ∈ ℱ and, if ⟨F_n⟩_{n∈N} is any sequence in ℱ, then ⋃_{n∈N} F_n ∈ ℱ. For each
i ∈ I, set γ_i = sup{µ(F ∩ X_i) : F ∈ ℱ} and choose a sequence ⟨F_{in}⟩_{n∈N} in ℱ such that lim_{n→∞} µ(F_{in} ∩ X_i) = γ_i; set
F_i = ⋃_{n∈N} F_{in} ∈ ℱ.

Set
F = ⋃_{i∈I} (F_i ∩ X_i) ⊆ X
and H = X \ F.
We see that F ∩ X_i = F_i ∩ X_i for each i ∈ I (because ⟨X_i⟩_{i∈I} is disjoint), so F ∈ Σ and H ∈ Σ. For any E ∈ ℰ,
µ(E \ H) = µ(E ∩ F) = ∑_{i∈I} µ(E ∩ F ∩ X_i) = ∑_{i∈I} µ(E ∩ F_i ∩ X_i) = 0
because every F_i belongs to ℱ. Thus F ∈ ℱ. If G ∈ Σ and µ(E \ G) = 0 for every E ∈ ℰ, then X \ G and
F′ = F ∪ (X \ G) belong to ℱ. So µ(F′ ∩ X_i) ≤ γ_i for each i ∈ I. But also µ(F ∩ X_i) ≥ sup_{n∈N} µ(F_{in} ∩ X_i) = γ_i, so
µ(F ∩ X_i) = µ(F′ ∩ X_i) for each i. Because µX_i is finite, it follows that µ((F′ \ F) ∩ X_i) = 0, for each i. Summing
over i, µ(F′ \ F) = 0, that is, µ(H \ G) = 0.
Thus H is an essential supremum for ℰ in Σ. As ℰ is arbitrary, (X, Σ, µ) is localizable.
(ii) If E ∈ Σ and µE = ∞, then there is some i ∈ I such that
0 < µ(E ∩ Xi ) ≤ µXi < ∞;
so (X, Σ, µ) is semi-finite. If E ⊆ X and E ∩ F ∈ Σ whenever µF < ∞, then, in particular, E ∩ Xi ∈ Σ for every i ∈ I,
so E ∈ Σ; thus (X, Σ, µ) is locally determined.

211M Example: Lebesgue measure Let us consider Lebesgue measure in the light of the concepts above. Write
µ for Lebesgue measure on R^r and Σ for its domain.

(a) µ is complete, because it is constructed by Carathéodory’s method; if A ⊆ E and µE = 0, then µ∗A = µ∗E = 0
(writing µ∗ for Lebesgue outer measure), so, for any B ⊆ R^r,
µ∗(B ∩ A) + µ∗(B \ A) ≤ 0 + µ∗B = µ∗B,
and A must be measurable.
(b) µ is σ-finite, because R^r = ⋃_{n∈N} [−𝐧, 𝐧], writing 𝐧 for the vector (n, . . . , n), and µ[−𝐧, 𝐧] = (2n)^r < ∞ for
every n. Of course µ is neither totally finite nor a probability measure.

(c) Because µ is σ-finite, it is strictly localizable (211Lc), localizable (211Ld), locally determined (211Ld) and
semi-finite (211Le or 211Lf).

(d) µ is atomless. PP Suppose that E ∈ Σ. Consider the function
a ↦ f(a) = µ(E ∩ [−𝐚, 𝐚]) : [0, ∞[ → R
We have
f(a) ≤ f(b) ≤ f(a) + µ[−𝐛, 𝐛] − µ[−𝐚, 𝐚] = f(a) + (2b)^r − (2a)^r
whenever a ≤ b in [0, ∞[, so f is continuous. Now f(0) = 0 and lim_{n→∞} f(n) = µE > 0. By the Intermediate Value
Theorem there is an a ∈ [0, ∞[ such that 0 < f(a) < µE. So we have
0 < µ(E ∩ [−𝐚, 𝐚]) < µE.
As E is arbitrary, µ is atomless. QQ
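
The Intermediate Value Theorem step can be imitated numerically. The sketch below is restricted to r = 1 and to a set E that is a finite disjoint union of intervals, an assumption made purely for illustration; the names f and split_point are hypothetical, and bisection is used in place of the abstract existence argument.

```python
def f(a, intervals):
    """Lebesgue measure of E ∩ [-a, a], where E is a finite disjoint union
    of intervals given as (lo, hi) pairs."""
    return sum(max(0.0, min(hi, a) - max(lo, -a)) for lo, hi in intervals)


def split_point(intervals, tol=1e-12):
    """Bisect on a to get f(a) close to half of µE, so that 0 < f(a) < µE."""
    total = sum(hi - lo for lo, hi in intervals)
    lo_a, hi_a = 0.0, max(abs(x) for iv in intervals for x in iv)
    target = total / 2
    while hi_a - lo_a > tol:
        mid = (lo_a + hi_a) / 2
        if f(mid, intervals) < target:   # f is continuous and non-decreasing
            lo_a = mid
        else:
            hi_a = mid
    return hi_a


# Hypothetical example: E = [-3, -1] ∪ [0.5, 2], so µE = 3.5.
E = [(-3.0, -1.0), (0.5, 2.0)]
a = split_point(E)
```

With this choice of a, the set E ∩ [−a, a] is a measurable subset of E whose measure lies strictly between 0 and µE, which is exactly what prevents E from being an atom.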

(e) It is now a trivial observation that µ cannot be purely atomic, because R^r itself is a set of positive measure not
including any atom.

211N Counting measure Take X to be any uncountable set (e.g., X = R), and µ to be counting measure on X
(112Bd).

(a) µ is complete, because if A ⊆ E and µE = 0 then
A = E = ∅ ∈ Σ.

(b) µ is not σ-finite, because if ⟨E_n⟩_{n∈N} is any sequence of sets of finite measure then every E_n is finite, therefore
countable, and ⋃_{n∈N} E_n is countable (1A1F), so cannot be X. A fortiori, µ is not a probability measure nor totally
finite.

(c) µ is strictly localizable. PP Set X_x = {x} for every x ∈ X. Then ⟨X_x⟩_{x∈X} is a partition of X, and for any E ⊆ X
µ(E ∩ X_x) = 1 if x ∈ E, 0 otherwise.
By the definition of µ,
µE = ∑_{x∈X} µ(E ∩ X_x)
for every E ⊆ X, and ⟨X_x⟩_{x∈X} is a decomposition of X. QQ
Consequently µ is localizable, locally determined and semi-finite.

(d) µ is purely atomic. PP {x} is an atom for every x ∈ X, and if µE > 0 then surely E includes {x} for some x.
QQ Obviously, µ is not atomless.

211O A non-semi-finite space Set X = {0}, Σ = {∅, X}, µ∅ = 0 and µX = ∞. Then µ is not semi-finite, as
µX = ∞ but X has no subset of non-zero finite measure. It follows that µ cannot be localizable, locally determined,
σ-finite, totally finite nor a probability measure. Because Σ = PX, µ is complete. X is an atom for µ, so µ is purely
atomic (indeed, it is point-supported).

211P A non-complete space Write B for the σ-algebra of Borel subsets of R (111G), and µ for the restriction
of Lebesgue measure to B (recall that by 114G every Borel subset of R is Lebesgue measurable). Then (R, B, µ) is
atomless, σ-finite and not complete.
proof (a) To see that µ is not complete, recall that there is a continuous, strictly increasing bijection g : [0, 1] → [0, 1]
such that µg[C] > 0, where C is the Cantor set, so that there is a set A ⊆ g[C] which is not Lebesgue measurable
(134Ib). Now g^{−1}[A] ⊆ C cannot be a Borel set, since χA = χ(g^{−1}[A]) ∘ g^{−1} is not Lebesgue measurable, therefore not
Borel measurable, and the composition of two Borel measurable functions is Borel measurable (121Eg); so g^{−1}[A] is a
non-measurable subset of the negligible set C.
(b) The rest of the arguments of 211M apply to µ just as well as to true Lebesgue measure, so µ is σ-finite and
atomless.
*Remark The argument offered in (a) could give rise to a seriously false impression. The set A referred to there can
be constructed only with the use of a strong form of the axiom of choice. No such device is necessary for the result
here. There are many methods of constructing non-Borel subsets of the Cantor set, all illuminating in different ways.
In the absence of any form of the axiom of choice, there are difficulties with the concept of ‘Borel set’, and others with
the concept of ‘Lebesgue measure’, which I will come to in Chapter 56; but countable choice is quite sufficient for the
existence of a non-Borel subset of R. For details of a possible approach see 423L in Volume 4.

211Q Some probability spaces Two obvious constructions of probability spaces, restricting myself to the methods
described in Volume 1, are
(a) the subspace measure induced by Lebesgue measure on [0, 1] (131B);
(b) the point-supported measure induced on a set X by a function h : X → [0, 1] such that ∑_{x∈X} h(x) = 1, writing
µE = ∑_{x∈E} h(x) for every E ⊆ X; for instance, if X is a singleton {x} and h(x) = 1, or if X = N and h(n) = 2^{−n−1}.
Of these two, (a) gives an atomless probability measure and (b) gives a purely atomic probability measure.
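
The second example in (b) is easy to experiment with. The following sketch, which assumes X = N and h(n) = 2^{−n−1} and evaluates µ only on finite sets, uses exact rational arithmetic; the function names h and mu are illustrative choices.

```python
from fractions import Fraction


def h(n):
    """Weight of the point n in N: h(n) = 2^(-n-1)."""
    return Fraction(1, 2 ** (n + 1))


def mu(E):
    """µE = sum of h(n) over n in E, for a finite set E of natural numbers."""
    return sum((h(n) for n in E), Fraction(0))


# The measure of the initial segment {0, ..., 9} is 1 - 2^(-10).
partial = mu(range(10))
```

The measures of the initial segments {0, . . . , m − 1} are 1 − 2^{−m}, which increase to µN = 1, so µ is indeed a probability measure.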

211R Countable-cocountable measure The following is one of the basic constructions to keep in mind when
considering abstract measure spaces.
(a) Let X be any set. Let Σ be the family of those sets E ⊆ X such that either E or X \ E is countable. Then Σ is
a σ-algebra of subsets of X. PP (i) ∅ is countable, so belongs to Σ. (ii) The condition for E to belong to Σ is symmetric
between E and X \ E, so X \ E ∈ Σ for every E ∈ Σ. (iii) Let ⟨E_n⟩_{n∈N} be any sequence in Σ, and set E = ⋃_{n∈N} E_n. If
every E_n is countable, then E is countable, so belongs to Σ. Otherwise, there is some n such that X \ E_n is countable,
in which case X \ E ⊆ X \ E_n is countable, so again E ∈ Σ. QQ Σ is called the countable-cocountable σ-algebra
of X.

(b) Now consider the function µ : Σ → {0, 1} defined by writing µE = 0 if E is countable, µE = 1 if E ∈ Σ and E
is not countable. Then µ is a measure. PP (i) ∅ is countable so µ∅ = 0. (ii) Let ⟨E_n⟩_{n∈N} be a disjoint sequence in Σ,
and E its union. (α) If every E_m is countable, then so is E, so
µE = 0 = ∑_{n=0}^∞ µE_n.

(β) If some E_m is uncountable, then E ⊇ E_m also is uncountable, and µE = µE_m = 1. But in this case, because
E_m ∈ Σ, X \ E_m is countable, so E_n, being a subset of X \ E_m, is countable for every n ≠ m; thus µE_n = 0 for every
n ≠ m, and
µE = 1 = ∑_{n=0}^∞ µE_n.
As ⟨E_n⟩_{n∈N} is arbitrary, µ is a measure. QQ This is the countable-cocountable measure on X.

(c) If X is any uncountable set and µ is the countable-cocountable measure on X, then µ is a complete, purely
atomic probability measure, but is not point-supported. PP (i) If A ⊆ E and µE = 0, then E is countable, so A also
is countable and belongs to Σ. Thus µ is complete. (ii) Because X is uncountable, µX = 1 and µ is a probability
measure. (iii) If µE > 0, then µF = µE = 1 whenever F is a non-negligible measurable subset of E, so E is itself an
atom; thus µ is purely atomic. (iv) µX = 1 > 0 = ∑_{x∈X} µ{x}, so µ is not point-supported. QQ
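
The countable-cocountable σ-algebra on an uncountable set X can be modelled symbolically, since each of its members is determined by a countable set together with a flag saying whether the set itself or its complement is meant. The sketch below rests on two loudly stated assumptions: X is uncountable and is never enumerated, and finite Python sets stand in for countable ones. The class name CCSet and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCSet:
    """A member of Σ: the countable set `pts` itself, or (if `co` is True)
    its complement in the uncountable set X."""
    pts: frozenset
    co: bool = False

    def complement(self):
        return CCSet(self.pts, not self.co)

    def union(self, other):
        if not self.co and not other.co:      # countable ∪ countable
            return CCSet(self.pts | other.pts)
        if self.co and other.co:              # (X \ S) ∪ (X \ T) = X \ (S ∩ T)
            return CCSet(self.pts & other.pts, True)
        small, big = (self, other) if other.co else (other, self)
        return CCSet(big.pts - small.pts, True)   # S ∪ (X \ T) = X \ (T \ S)

    def measure(self):
        """µE = 0 if E is countable, 1 if E is cocountable."""
        return 1 if self.co else 0


A = CCSet(frozenset({1, 2}))             # the countable set {1, 2}
B = CCSet(frozenset({1, 2, 3}), True)    # X \ {1, 2, 3}, disjoint from A
C = A.union(B)                           # X \ {3}
```

In this model two cocountable sets always meet, so a disjoint family contains at most one set of measure 1, in line with case (β) of the proof in (b).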

211X Basic exercises > (a) Let (X, Σ, µ) be a measure space. Show that µ is σ-finite iff there is a totally finite
measure ν on X with the same measurable sets and the same negligible sets as µ.

> (b) Let g : R → R be a non-decreasing function and µg the associated Lebesgue-Stieltjes measure (114Xa). Show
that µg is complete and σ-finite. Show that
(i) µg is totally finite iff g is bounded;
(ii) µg is a probability measure iff lim_{x→∞} g(x) − lim_{x→−∞} g(x) = 1;
(iii) µg is atomless iff g is continuous;
(iv) if E is any atom for µg , there is a point x ∈ E such that µg E = µg {x};
(v) µg is purely atomic iff it is point-supported.

> (c) Let µ be counting measure on a set X. Show that µ is always complete, strictly localizable and purely atomic,
and that it is σ-finite iff X is countable, totally finite iff X is finite, a probability measure iff X is a singleton, and
atomless iff X is empty.

(d) Show that a point-supported measure is always complete, and is strictly localizable iff it is semi-finite.

(e) Let X be a set. Show that for any σ-ideal ℐ of subsets of X (definition: 112Db), the set
Σ = ℐ ∪ {X \ A : A ∈ ℐ}
is a σ-algebra of subsets of X, and that there is a measure µ : Σ → {0, 1} given by setting
µE = 0 if E ∈ ℐ, µE = 1 if E ∈ Σ \ ℐ.
Show that ℐ is precisely the null ideal of µ, that µ is complete, totally finite and purely atomic, and is a probability
measure iff X ∉ ℐ.

(f ) Show that a point-supported measure is strictly localizable iff it is semi-finite.

(g) Let (X, Σ, µ) be a measure space such that µX > 0. Show that the set of conegligible subsets of X is a filter on
X.

211Y Further exercises (a) Let (X, Σ, µ) be a measure space, and for E, F ∈ Σ write E ∼ F if µ(E△F) = 0.
Show that ∼ is an equivalence relation on Σ. Let 𝔄 be the set of equivalence classes in Σ for ∼; for E ∈ Σ, write
E• ∈ 𝔄 for its equivalence class. Show that there is a partial ordering ⊆ on 𝔄 defined by saying that, for E, F ∈ Σ,
E• ⊆ F• ⇐⇒ µ(E \ F) = 0.
Show that µ is localizable iff for every A ⊆ 𝔄 there is an h ∈ 𝔄 such that (i) a ⊆ h for every a ∈ A (ii) whenever g ∈ 𝔄
is such that a ⊆ g for every a ∈ A, then h ⊆ g.

(b) Let (X, Σ, µ) be a measure space, and construct 𝔄 as in 211Ya. Show that there are operations ∪, ∩, \ on 𝔄
defined by saying that
E• ∩ F• = (E ∩ F)•, E• ∪ F• = (E ∪ F)•, E• \ F• = (E \ F)•
for all E, F ∈ Σ. Show that if A ⊆ 𝔄 is any countable set, then there is an h ∈ 𝔄 such that (i) a ⊆ h for every a ∈ A
(ii) whenever g ∈ 𝔄 is such that a ⊆ g for every a ∈ A, then h ⊆ g. Show that there is a functional µ̄ : 𝔄 → [0, ∞]
defined by saying that µ̄(E•) = µE for every E ∈ Σ. ((𝔄, µ̄) is called the measure algebra of (X, Σ, µ).)

(c) Let (X, Σ, µ) be a semi-finite measure space. Show that it is atomless iff whenever ε > 0, E ∈ Σ and µE < ∞,
then there is a finite partition of E into measurable sets of measure at most ε.

(d) Let (X, Σ, µ) be a strictly localizable measure space. Show that it is atomless iff for every ε > 0 there is a
decomposition of X consisting of sets of measure at most ε.

(e) Let Σ be the countable-cocountable σ-algebra of R. Show that [0, ∞[ ∉ Σ. Let µ be the restriction of counting
measure to Σ. Show that (R, Σ, µ) is complete, semi-finite and purely atomic, but not localizable nor locally determined.

211 Notes and comments The list of definitions in 211A-211K probably strikes you as quite long enough, even though
I have omitted many occasionally useful ideas. The concepts here vary widely in importance, and the importance of
each varies widely with context. My own view is that it is absolutely necessary, when studying any measure space, to
know its classification under the eleven discriminating features listed here, and to be able to describe any atoms which
are present. Fortunately, for most ‘ordinary’ measure spaces, the classification is fairly quick, because if (for instance)
the space is σ-finite, and you know the measure of the whole space, the only remaining questions concern completeness
and atoms. The distinctions between spaces which are, or are not, strictly localizable, semi-finite, localizable and
locally determined are relevant only for spaces which are not σ-finite, and do not arise in elementary applications.
I think it is also fair to say that the notions of ‘complete’ and ‘locally determined’ measure space are technical; I mean,
that they do not correspond to significant features of the essential structure of a space, though there are some interesting
problems concerning incomplete measures. One manifestation of this is the existence of canonical constructions for
rendering spaces complete or complete and locally determined (212C, 213D-213E). In addition, measure spaces which
are not semi-finite do not really belong to measure theory, but rather to the more general study of σ-algebras and
σ-ideals. The most important classifications, in terms of the behaviour of a measure space, seem to me to be ‘σ-finite’,
‘localizable’ and ‘strictly localizable’; these are the critical features which cannot be forced by elementary constructions.
If you know anything about Borel subsets of the real line, the argument of part (a) of the proof of 211P must look
very clumsy. But ‘better’ proofs rely on ideas which we shall not need until Volume 4, and the proof here is based on
a construction which we have to understand for other reasons.

212 Complete spaces


In the next two sections of this chapter I give brief accounts of the theory of measure spaces possessing certain of
the properties described in §211. I begin with ‘completeness’. I give the elementary facts about complete measure
spaces in 212A-212B; then I turn to the notion of ‘completion’ of a measure (212C) and its relationships with the other
concepts of measure theory introduced so far (212D-212G).

212A Proposition Any measure space constructed by Carathéodory’s method is complete.


proof Recall that ‘Carathéodory’s method’ starts from an arbitrary outer measure θ : PX → [0, ∞] and sets
Σ = {E : E ⊆ X, θA = θ(A ∩ E) + θ(A \ E) for every A ⊆ X}, µ = θ↾Σ
(113C). In this case, if B ⊆ E ∈ Σ and µE = 0, then θB = θE = 0 (113A(ii)), so for any A ⊆ X we have
θ(A ∩ B) + θ(A \ B) = θ(A \ B) ≤ θA ≤ θ(A ∩ B) + θ(A \ B),
and B ∈ Σ.
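
On a finite set, Carathéodory's construction can be carried out by brute force, which makes the completeness argument concrete. The sketch below is an illustration only: the set X = {0, 1, 2} and the outer measure theta (equal to 1 on sets meeting {0, 1} and 0 otherwise) are hypothetical choices, as are all the function names.

```python
from itertools import combinations


def powerset(X):
    xs = list(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def caratheodory(X, theta):
    """All E ⊆ X with theta(A) = theta(A ∩ E) + theta(A - E) for every A ⊆ X."""
    subsets = powerset(X)
    return [E for E in subsets
            if all(theta(A) == theta(A & E) + theta(A - E) for A in subsets)]


def is_complete(sigma, theta):
    """Check that every subset of a measurable set of measure 0 is measurable."""
    null_sets = [E for E in sigma if theta(E) == 0]
    return all(B in sigma for E in null_sets for B in powerset(E))


X = frozenset({0, 1, 2})


def theta(A):
    # Hypothetical outer measure: 1 if A meets {0, 1}, else 0.
    return 1 if A & {0, 1} else 0


sigma = caratheodory(X, theta)
```

For this theta the measurable sets are ∅, {2}, {0, 1} and X; the only negligible sets are ∅ and {2}, and all their subsets are measurable, as the proposition predicts.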

212B Proposition (a) If (X, Σ, µ) is a complete measure space, then any conegligible subset of X is measurable.
(b) Let (X, Σ, µ) be a complete measure space, and f a [−∞, ∞]-valued function defined on a subset of X. If f is
virtually measurable (that is, there is a conegligible set E ⊆ X such that f ↾E is measurable), then f is measurable.
(c) Let (X, Σ, µ) be a complete measure space, and f a real-valued function defined on a conegligible subset of X.
Then the following are equiveridical, that is, if one is true so are the others:
(i) f is integrable;
(ii) f is measurable and |f | is integrable;
(iii) f is measurable and there is an integrable function g such that |f | ≤a.e. g.
proof (a) If E is conegligible, then X \ E is negligible, therefore measurable, and E is measurable.
(b) Let a ∈ R. Then there is an H ∈ Σ such that
{x : (f ↾E)(x) ≤ a} = H ∩ dom(f ↾E) = H ∩ E ∩ dom f .

Now F = {x : x ∈ dom f \ E, f(x) ≤ a} is a subset of the negligible set X \ E, so is measurable, and
{x : f(x) ≤ a} = (F ∪ H) ∩ dom f ∈ Σ_{dom f},
writing Σ_D = {D ∩ E : E ∈ Σ}, as in 121A. As a is arbitrary, f is measurable (135E).
(c)(i)⇒(ii) If f is integrable, then by 122P f is virtually measurable and by 122Re |f | is integrable. By (b) here,
f is measurable, so (ii) is true.
(ii)⇒(iii) is trivial.
(iii)⇒(i) If f is measurable and g is integrable and |f | ≤a.e. g, then f is virtually measurable, |g| is integrable
and |f | ≤a.e. |g|, so 122P tells us that f is integrable.

212C The completion of a measure Let (X, Σ, µ) be any measure space.

(a) Let Σ̂ be the family of those sets E ⊆ X such that there are E′, E′′ ∈ Σ with E′ ⊆ E ⊆ E′′ and µ(E′′ \ E′) = 0.
Then Σ̂ is a σ-algebra of subsets of X. PP (i) Of course ∅ belongs to Σ̂, because we can take E′ = E′′ = ∅. (ii) If E ∈ Σ̂,
take E′, E′′ ∈ Σ such that E′ ⊆ E ⊆ E′′ and µ(E′′ \ E′) = 0. Then
X \ E′′ ⊆ X \ E ⊆ X \ E′, µ((X \ E′) \ (X \ E′′)) = µ(E′′ \ E′) = 0,
so X \ E ∈ Σ̂. (iii) If ⟨E_n⟩_{n∈N} is a sequence in Σ̂, then for each n choose E′_n, E′′_n ∈ Σ such that E′_n ⊆ E_n ⊆ E′′_n and
µ(E′′_n \ E′_n) = 0. Set
E = ⋃_{n∈N} E_n, E′ = ⋃_{n∈N} E′_n, E′′ = ⋃_{n∈N} E′′_n;
then E′ ⊆ E ⊆ E′′ and E′′ \ E′ ⊆ ⋃_{n∈N} (E′′_n \ E′_n) is negligible, so E ∈ Σ̂. QQ

(b) For E ∈ Σ̂, set
µ̂E = µ∗E = min{µF : E ⊆ F ∈ Σ}
(132A). It is worth remarking at once that if E ∈ Σ̂, E′, E′′ ∈ Σ, E′ ⊆ E ⊆ E′′ and µ(E′′ \ E′) = 0, then
µE′ = µ̂E = µE′′; this is because
µE′ = µ∗E′ ≤ µ∗E ≤ µ∗E′′ = µE′′ = µE′ + µ(E′′ \ E′) = µE′
(recalling from 132A, or noting now, that µ∗A ≤ µ∗B whenever A ⊆ B ⊆ X, and that µ∗ agrees with µ on Σ).

(c) We now find that (X, Σ̂, µ̂) is a measure space. PP (i) Of course µ̂, like µ, takes values in [0, ∞]. (ii) µ̂∅ = µ∅ = 0.
(iii) Let ⟨E_n⟩_{n∈N} be a disjoint sequence in Σ̂, with union E. For each n ∈ N choose E′_n, E′′_n ∈ Σ such that E′_n ⊆ E_n ⊆ E′′_n
and µ(E′′_n \ E′_n) = 0. Set E′ = ⋃_{n∈N} E′_n, E′′ = ⋃_{n∈N} E′′_n. Then (as in (a-iii) above) E′ ⊆ E ⊆ E′′ and µ(E′′ \ E′) = 0,
so
µ̂E = µE′ = ∑_{n=0}^∞ µE′_n = ∑_{n=0}^∞ µ̂E_n
because ⟨E′_n⟩_{n∈N}, like ⟨E_n⟩_{n∈N}, is disjoint. QQ

(d) The measure space (X, Σ̂, µ̂) is called the completion of the measure space (X, Σ, µ); equally, I will call µ̂ the
completion of µ, and occasionally (if it seems plain which null ideal is under consideration) I will call Σ̂ the completion
of Σ. Members of Σ̂ are sometimes called µ-measurable.
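
For a finite measure space the completion can be computed directly from the definitions in (a) and (b). The sketch below is a hedged illustration: the five-point space, its three atoms and their weights are hypothetical, and the function names are not taken from the text.

```python
from itertools import combinations


def powerset(X):
    xs = list(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def completion(X, sigma, mu):
    """Return a dict E -> µ̂E over the completed σ-algebra: E qualifies if
    there are A ⊆ E ⊆ B in sigma with mu[B - A] == 0, and then
    µ̂E = min{mu[B] : E ⊆ B ∈ sigma}."""
    hat = {}
    for E in powerset(X):
        inner = [A for A in sigma if A <= E]
        outer = [B for B in sigma if E <= B]
        if any(mu[B - A] == 0 for A in inner for B in outer):
            hat[E] = min(mu[B] for B in outer)
    return hat


# Hypothetical space: Σ is generated by the partition {0}, {1, 2}, {3, 4}
# of X, with the block {1, 2} negligible.
X = frozenset(range(5))
atoms = {frozenset({0}): 1, frozenset({1, 2}): 0, frozenset({3, 4}): 2}
sigma, mu = [], {}
for r in range(len(atoms) + 1):
    for chosen in combinations(atoms, r):
        E = frozenset().union(*chosen)
        sigma.append(E)
        mu[E] = sum(atoms[a] for a in chosen)

hat = completion(X, sigma, mu)
```

In this example the completion gains the sets that split the negligible block {1, 2}, such as {0, 2}, but not sets such as {3} that split a block of positive measure.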

212D There is something I had better check at once.


Proposition Let (X, Σ, µ) be any measure space. Then (X, Σ̂, µ̂), as defined in 212C, is a complete measure space
and µ̂ is an extension of µ; and (X, Σ̂, µ̂) = (X, Σ, µ) iff (X, Σ, µ) is complete.
proof (a) Suppose that A ⊆ E ∈ Σ̂ and µ̂E = 0. Then (by 212Cb) there is an E ′′ ∈ Σ such that E ⊆ E ′′ and µE ′′ = 0.
Accordingly we have
∅ ⊆ A ⊆ E ′′ , µ(E ′′ \ ∅) = 0,
so A ∈ Σ̂. As A is arbitrary, µ̂ is complete.
(b) If E ∈ Σ, then of course E ∈ Σ̂, because E ⊆ E ⊆ E and µ(E \ E) = 0; and µ̂E = µ∗ E = µE. Thus Σ ⊆ Σ̂ and
µ̂ extends µ.

(c) If µ = µ̂ then of course µ must be complete. If µ is complete, and E ∈ Σ̂, then there are E′, E′′ ∈ Σ such that
E′ ⊆ E ⊆ E′′ and µ(E′′ \ E′) = 0. But now E \ E′ ⊆ E′′ \ E′, so (because (X, Σ, µ) is complete) E \ E′ ∈ Σ and
E = E′ ∪ (E \ E′) ∈ Σ. As E is arbitrary, Σ̂ ⊆ Σ and Σ̂ = Σ and µ = µ̂.

212E The importance of this construction is such that it is worth spelling out some further elementary properties.
Proposition Let (X, Σ, µ) be a measure space, and (X, Σ̂, µ̂) its completion.
(a) The outer measures µ̂∗ , µ∗ defined from µ̂ and µ coincide.
(b) µ, µ̂ give rise to the same negligible and conegligible sets and the same sets of full outer measure.
(c) µ̂ is the only measure with domain Σ̂ which agrees with µ on Σ.
(d) A subset of X belongs to Σ̂ iff it is expressible as F △A where F ∈ Σ and A is µ-negligible.
proof (a) Take any A ⊆ X. (i) If A ⊆ F ∈ Σ, then F ∈ Σ̂ and µF = µ̂F , so
µ̂∗ A ≤ µ̂F = µF ;
as F is arbitrary, µ̂∗ A ≤ µ∗ A. (ii) If A ⊆ E ∈ Σ̂, there is an E ′′ ∈ Σ such that E ⊆ E ′′ and µE ′′ = µ̂E, so
µ∗ A ≤ µE ′′ = µ̂E;
as E is arbitrary, µ∗ A ≤ µ̂∗ A.
(b) Now, for A ⊆ X,
A is µ-negligible ⇐⇒ µ∗ A = 0 ⇐⇒ µ̂∗ A = 0 ⇐⇒ A is µ̂-negligible,

A is µ-conegligible ⇐⇒ µ∗ (X \ A) = 0
⇐⇒ µ̂∗ (X \ A) = 0 ⇐⇒ A is µ̂-conegligible.

If A has full outer measure for µ, F ∈ Σ̂ and F ∩ A = ∅, then there is an F ′ ∈ Σ such that F ′ ⊆ F and µF ′ = µ̂F ;
as F ′ ∩ A = ∅, F ′ is µ-negligible and F is µ̂-negligible; as F is arbitrary, A has full outer measure for µ̂. In the other
direction, of course, if A has full outer measure for µ̂ then
µ∗ (F ∩ A) = µ̂∗ (F ∩ A) = µ̂F = µF
for every F ∈ Σ, so A has full outer measure for µ.
(c) If µ̃ is any measure with domain Σ̂ extending µ, we must have
µE ′ ≤ µ̃E ≤ µE ′′ , µE ′ = µ̂E = µE ′′ ,
so µ̃E = µ̂E, whenever E ′ , E ′′ ∈ Σ, E ′ ⊆ E ⊆ E ′′ and µ(E ′′ \ E ′ ) = 0.
(d)(i) If E ∈ Σ̂, take E ′ , E ′′ ∈ Σ such that E ′ ⊆ E ⊆ E ′′ and µ(E ′′ \ E ′ ) = 0. Then E \ E ′ ⊆ E ′′ \ E ′ , so E \ E ′ is
µ-negligible, and E = E ′ △(E \ E ′ ) is the symmetric difference of a member of Σ and a negligible set.
(ii) If E = F △A, where F ∈ Σ and A is µ-negligible, take G ∈ Σ such that µG = 0 and A ⊆ G; then
F \ G ⊆ E ⊆ F ∪ G and µ((F ∪ G) \ (F \ G)) = µG = 0, so E ∈ Σ̂.

212F Now let us consider integration with respect to the completion of a measure.
Proposition Let (X, Σ, µ) be a measure space and (X, Σ̂, µ̂) its completion.
(a) A [−∞, ∞]-valued function f defined on a subset of X is Σ̂-measurable iff it is µ-virtually measurable.
(b) Let f be a [−∞, ∞]-valued function defined on a subset of X. Then ∫ f dµ = ∫ f dµ̂ if either is defined in
[−∞, ∞]; in particular, f is µ-integrable iff it is µ̂-integrable.
proof (a)(i) Suppose that f is a [−∞, ∞]-valued Σ̂-measurable function. For q ∈ Q let E_q ∈ Σ̂ be such that
{x : f(x) ≤ q} = dom f ∩ E_q, and choose E′_q, E′′_q ∈ Σ such that E′_q ⊆ E_q ⊆ E′′_q and µ(E′′_q \ E′_q) = 0. Set
H = X \ ⋃_{q∈Q} (E′′_q \ E′_q); then H is µ-conegligible. For a ∈ R set
G_a = ⋃_{q∈Q, q<a} E′_q ∈ Σ;
then
{x : x ∈ dom(f↾H), (f↾H)(x) < a} = G_a ∩ dom(f↾H).
This shows that f↾H is Σ-measurable, so that f is µ-virtually measurable.

(ii) If f is µ-virtually measurable, then there is a µ-conegligible set H ⊆ X such that f ↾H is Σ-measurable. Since
Σ ⊆ Σ̂, f ↾H is also Σ̂-measurable. And H is µ̂-conegligible, by 212Eb. But this means that f is µ̂-virtually measurable,
therefore Σ̂-measurable, by 212Bb.
(b)(i) Let f : D → [−∞, ∞] be a function, where D ⊆ X. If either of ∫ f dµ, ∫ f dµ̂ is defined in [−∞, ∞], then
f is virtually measurable, and defined almost everywhere, for one of the appropriate measures, and therefore for both
(putting (a) above together with 212Bb).
(ii) Now suppose that f is non-negative and integrable either with respect to µ or with respect to µ̂. Let E ∈ Σ
be a conegligible set included in dom f such that f↾E is Σ-measurable. For n ∈ N, k ≥ 1 set
E_{nk} = {x : x ∈ E, f(x) ≥ 2^{−n}k};
then each E_{nk} belongs to Σ and is of finite measure for both µ and µ̂. (If f is µ-integrable,
µ̂E_{nk} = µE_{nk} ≤ 2^n ∫ f dµ;
if f is µ̂-integrable,
µE_{nk} = µ̂E_{nk} ≤ 2^n ∫ f dµ̂.)
So
f_n = ∑_{k=1}^{4^n} 2^{−n} χE_{nk}
is both µ-simple and µ̂-simple, and ∫ f_n dµ = ∫ f_n dµ̂. Observe that, for x ∈ E,
f_n(x) = 2^{−n}k if k < 4^n and 2^{−n}k ≤ f(x) < 2^{−n}(k + 1),
= 2^n if f(x) ≥ 2^n.
Thus ⟨f_n⟩_{n∈N} is a non-decreasing sequence of functions converging to f at every point of E, that is, both µ-almost
everywhere and µ̂-almost everywhere. So we have, for any c ∈ R,
∫ f dµ = c ⇐⇒ lim_{n→∞} ∫ f_n dµ = c
⇐⇒ lim_{n→∞} ∫ f_n dµ̂ = c ⇐⇒ ∫ f dµ̂ = c.
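
The dyadic approximants used in this step have a simple pointwise formula: at a point where f takes the value v ≥ 0, the number of indices k between 1 and 4^n with 2^{−n}k ≤ v is min(⌊2^n v⌋, 4^n). The sketch below evaluates the approximant from that formula using exact rationals; the function name f_n and the sample value are hypothetical.

```python
import math
from fractions import Fraction


def f_n(value, n):
    """Value of the n-th approximant (a sum of 4^n terms 2^(-n) χE_nk) at a
    point where f takes the non-negative value `value`."""
    count = min(math.floor(value * 2 ** n), 4 ** n)  # how many k have 2^(-n) k <= value
    return Fraction(count, 2 ** n)


# Hypothetical sample point where f = 7/10.
v = Fraction(7, 10)
approximants = [f_n(v, n) for n in range(12)]
```

The resulting sequence is non-decreasing and approaches v from below, with error less than 2^{−n} once 2^n exceeds v, which is the pointwise convergence used in the proof.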

(iii) As for infinite integrals, recall that for a non-negative function I write ‘∫ f = ∞’ just when f is defined
almost everywhere, is virtually measurable, and is not integrable. So (i) and (ii) together show that ∫ f dµ = ∫ f dµ̂
whenever f is non-negative and either integral is defined in [0, ∞].
(iv) Since both µ, µ̂ agree that ∫ f is to be interpreted as ∫ f⁺ − ∫ f⁻ just when this can be defined in [−∞, ∞],
writing f⁺(x) = max(f(x), 0), f⁻(x) = max(−f(x), 0) for x ∈ dom f, the result for general real-valued f follows at
once.

212G I turn now to the question of the effect of the construction on the properties listed in 211B-211K.
Proposition Let (X, Σ, µ) be a measure space, and (X, Σ̂, µ̂) its completion.
(a) (X, Σ̂, µ̂) is a probability space, or totally finite, or σ-finite, or semi-finite, or localizable, iff (X, Σ, µ) is.
(b) (X, Σ̂, µ̂) is strictly localizable if (X, Σ, µ) is, and any decomposition of X for µ is a decomposition for µ̂.
(c) A set H ∈ Σ̂ is an atom for µ̂ iff there is an E ∈ Σ such that E is an atom for µ and µ̂(H△E) = 0.
(d) (X, Σ̂, µ̂) is atomless or purely atomic iff (X, Σ, µ) is.
proof (a)(i) Because µ̂X = µX, (X, Σ̂, µ̂) is a probability space, or totally finite, iff (X, Σ, µ) is.
(ii)(α) If (X, Σ, µ) is σ-finite, there is a sequence ⟨E_n⟩_{n∈N}, covering X, with µE_n < ∞ for each n. Now µ̂E_n < ∞
for each n, so (X, Σ̂, µ̂) is σ-finite.
(β) If (X, Σ̂, µ̂) is σ-finite, there is a sequence ⟨E_n⟩_{n∈N}, covering X, with µ̂E_n < ∞ for each n. Now we can
find, for each n, an E′′_n ∈ Σ such that µE′′_n < ∞ and E_n ⊆ E′′_n; so that ⟨E′′_n⟩_{n∈N} witnesses that (X, Σ, µ) is σ-finite.
(iii)(α) If (X, Σ, µ) is semi-finite and µ̂E = ∞, then there is an E′ ∈ Σ such that E′ ⊆ E and µE′ = ∞. Next,
there is an F ∈ Σ such that F ⊆ E′ and 0 < µF < ∞. Of course we now have F ∈ Σ̂, F ⊆ E and 0 < µ̂F < ∞. As E
is arbitrary, (X, Σ̂, µ̂) is semi-finite.

(β) If (X, Σ̂, µ̂) is semi-finite and µE = ∞, then µ̂E = ∞, so there is an F ⊆ E such that 0 < µ̂F < ∞. Next,
there is an F′ ∈ Σ such that F′ ⊆ F and µF′ = µ̂F. Of course we now have F′ ⊆ E and 0 < µF′ < ∞. As E is
arbitrary, (X, Σ, µ) is semi-finite.
(iv)(α) If (X, Σ, µ) is localizable and ℰ ⊆ Σ̂, then set
ℱ = {F : F ∈ Σ, ∃ E ∈ ℰ, F ⊆ E}.
Let H be an essential supremum of ℱ in Σ, as in 211G.
If E ∈ ℰ, there is an E′ ∈ Σ such that E′ ⊆ E and E \ E′ is negligible; now E′ ∈ ℱ, so
µ̂(E \ H) ≤ µ̂(E \ E′) + µ(E′ \ H) = 0.
If G ∈ Σ̂ and µ̂(E \ G) = 0 for every E ∈ ℰ, let G′′ ∈ Σ be such that G ⊆ G′′ and µ̂(G′′ \ G) = 0; then, for any F ∈ ℱ,
there is an E ∈ ℰ including F, so that
µ(F \ G′′) ≤ µ̂(E \ G) = 0.
As F is arbitrary, µ(H \ G′′) = 0 and µ̂(H \ G) = 0. This shows that H is an essential supremum of ℰ in Σ̂. As ℰ is
arbitrary, (X, Σ̂, µ̂) is localizable.
(β) Suppose that (X, Σ̂, µ̂) is localizable and that ℰ ⊆ Σ. Working in (X, Σ̂, µ̂), let H be an essential supremum
for ℰ in Σ̂. Let H′ ∈ Σ be such that H′ ⊆ H and µ̂(H \ H′) = 0. Then
µ(E \ H′) ≤ µ̂(E \ H) + µ̂(H \ H′) = 0
for every E ∈ ℰ; while if G ∈ Σ and µ(E \ G) = 0 for every E ∈ ℰ, we must have
µ(H′ \ G) ≤ µ̂(H \ G) = 0.
Thus H′ is an essential supremum of ℰ in Σ. As ℰ is arbitrary, (X, Σ, µ) is localizable.
(b) Let ⟨X_i⟩_{i∈I} be a decomposition of X for µ, as in 211E. Of course it is a partition of X into sets of finite
µ̂-measure. If H ⊆ X and H ∩ X_i ∈ Σ̂ for every i, choose for each i ∈ I sets E′_i, E′′_i ∈ Σ such that
E′_i ⊆ H ∩ X_i ⊆ E′′_i, µ(E′′_i \ E′_i) = 0.
Set E′ = ⋃_{i∈I} E′_i, E′′ = ⋃_{i∈I} (E′′_i ∩ X_i). Then E′ ∩ X_i = E′_i, E′′ ∩ X_i = E′′_i ∩ X_i for each i, so E′ and E′′ belong to Σ.
Also
µ(E′′ \ E′) = ∑_{i∈I} µ(E′′_i ∩ X_i \ E′_i) = 0.
As E′ ⊆ H ⊆ E′′, H ∈ Σ̂ and
µ̂H = µE′ = ∑_{i∈I} µE′_i = ∑_{i∈I} µ̂(H ∩ X_i).
As H is arbitrary, ⟨X_i⟩_{i∈I} is a decomposition of X for µ̂.
Accordingly, (X, Σ̂, µ̂) is strictly localizable if such a decomposition exists, which is so if (X, Σ, µ) is strictly localiz-
able.
(c)-(d)(i) Suppose that E ∈ Σ̂ is an atom for µ̂. Let E ′ ∈ Σ be such that E ′ ⊆ E and µ̂(E \ E ′ ) = 0. Then
µE ′ = µ̂E > 0. If F ∈ Σ and F ⊆ E ′ , then F ⊆ E, so either µF = µ̂F = 0 or µ(E ′ \ F ) = µ̂(E \ F ) = 0. As F is
arbitrary, E ′ is an atom for µ, and µ̂(E△E ′ ) = µ̂(E \ E ′ ) = 0.
(ii) Suppose that E ∈ Σ is an atom for µ, and that H ∈ Σ̂ is such that µ̂(H△E) = 0. Then µ̂H = µE > 0. If
F ∈ Σ̂ and F ⊆ H, let F ′ ⊆ F be such that F ′ ∈ Σ and µ̂(F \ F ′ ) = 0. Then E ∩ F ′ ⊆ E and µ̂(F △(E ∩ F ′ )) = 0, so
either µ̂F = µ(E ∩ F ′ ) = 0 or µ̂(H \ F ) = µ(E \ F ′ ) = 0. As F is arbitrary, H is an atom for µ̂.
(iii) It follows at once that (X, Σ̂, µ̂) is atomless iff (X, Σ, µ) is.
(iv)(α) On the other hand, if (X, Σ, µ) is purely atomic and µ̂H > 0, there is an E ∈ Σ such that E ⊆ H and
µE > 0, and an atom F for µ such that F ⊆ E; but F is also an atom for µ̂. As H is arbitrary, (X, Σ̂, µ̂) is purely
atomic.
(β) And if (X, Σ̂, µ̂) is purely atomic and µE > 0, then there is an H ⊆ E which is an atom for µ̂; now let
F ∈ Σ be such that F ⊆ H and µ̂(H \ F) = 0, so that F is an atom for µ and F ⊆ E. As E is arbitrary, (X, Σ, µ) is
purely atomic.

212X Basic exercises > (a) Let (X, Σ, µ) be a complete measure space. Suppose that A ⊆ E ∈ Σ and that
µ∗ A + µ∗ (E \ A) = µE < ∞. Show that A ∈ Σ.

> (b) Let µ and ν be two measures on a set X, with completions µ̂ and ν̂. Show that the following are equiveridical:
(i) the outer measures µ∗, ν∗ defined from µ and ν coincide; (ii) µ̂E = ν̂E whenever either is defined and finite; (iii)
∫ f dµ = ∫ f dν whenever f is a real-valued function such that either integral is defined and finite. (Hint: for (i)⇒(ii),
if µ̂E < ∞, take a measurable envelope F of E for ν and calculate ν∗E + ν∗(F \ E).)

(c) Let µ be the restriction of Lebesgue measure to the Borel σ-algebra of R, as in 211P. Show that its completion
is Lebesgue measure itself. (Hint: 134F.)

(d) Repeat 212Xc for (i) Lebesgue measure on R^r (ii) Lebesgue-Stieltjes measures on R (114Xa).

(e) Let X be a set and Σ a σ-algebra of subsets of X. Let I be a σ-ideal of subsets of X (112Db). (i) Show that
Σ1 = {E△A : E ∈ Σ, A ∈ I} is a σ-algebra of subsets of X. (ii) Let Σ2 be the family of sets E ⊆ X such that there
are E ′ , E ′′ ∈ Σ with E ′ ⊆ E ⊆ E ′′ and E ′′ \ E ′ ∈ I. Show that Σ2 is a σ-algebra of subsets of X and that Σ2 ⊆ Σ1 .
(iii) Show that Σ2 = Σ1 iff every member of I is included in a member of Σ ∩ I.

(f ) Let (X, Σ, µ) be a measure space, Y any set and φ : X → Y a function. Set θB = µ∗φ^{−1}[B] for every B ⊆ Y.
(i) Show that θ is an outer measure on Y. (ii) Let ν be the measure defined from θ by Carathéodory’s method, and
T its domain. Show that if C ⊆ Y and φ^{−1}[C] ∈ Σ then C ∈ T. (iii) Suppose that (X, Σ, µ) is complete and totally
finite. Show that ν is the image measure µφ^{−1}.

(g) Let g, h be two non-decreasing functions from R to itself, and µ_g, µ_h the associated Lebesgue-Stieltjes measures.
Show that a real-valued function f defined on a subset of R is µ_{g+h}-integrable iff it is both µ_g-integrable and
µ_h-integrable, and that then ∫ f dµ_{g+h} = ∫ f dµ_g + ∫ f dµ_h. (Hint: 114Yb.)

(h) Let (X, Σ, µ) be a measure space, and I a σ-ideal of subsets of X; set Σ1 = {E△A : E ∈ Σ, A ∈ I}, as in
212Xe. Show that if every member of Σ ∩ I is µ-negligible, then there is a unique extension of µ to a measure µ1 with
domain Σ1 such that µ1 A = 0 for every A ∈ I.

(i) Let (X, Σ, µ) be a complete measure space such that µX > 0, Y a set, f : X → Y a function and µf⁻¹ the
image measure on Y . Show that if F is the filter of µ-conegligible subsets of X, then the image filter f [[F]] (2A1Ib) is
the filter of µf⁻¹-conegligible subsets of Y .

212Y Further exercises (a) Let X be a set and φ an inner measure on X, that is, a functional from PX to [0, ∞]
such that
φ∅ = 0,
φ(A ∪ B) ≥ φA + φB if A ∩ B = ∅,
φ(⋂_{n∈N} An) = lim_{n→∞} φAn whenever ⟨An⟩_{n∈N} is a non-increasing sequence of subsets of X and φA0 < ∞,
if φA = ∞, a ∈ R there is a B ⊆ A such that a ≤ φB < ∞.
Let µ be the measure defined from φ, that is, µ = φ↾Σ, where
Σ = {E : φ(A) = φ(A ∩ E) + φ(A \ E) ∀ A ⊆ X}
(113Yg). Show that µ must be complete.

212 Notes and comments The process of completion is so natural, and so universally applicable, and so convenient,
that over large parts of measure theory it is reasonable to use only complete measure spaces. Indeed many authors
so phrase their definitions that, explicitly or implicitly, only complete measure spaces are considered. In this treatise
I avoid taking quite such a large step, even though it would simplify the statements of many of the theorems in this
volume (for instance). I did take the trouble, in Volume 1, to give a definition of ‘integrable function’ which, in effect,
looks at integrability with respect to the completion of a measure (212Fb). There are non-complete measure spaces
which are worthy of study (for example, the restriction of Lebesgue measure to the Borel σ-algebra of R – see 211P),
and some interesting questions to be dealt with in Volumes 3 and 5 apply to them. At the cost of rather a lot of verbal
manoeuvres, therefore, I prefer to write theorems out in a form in which they can be applied to arbitrary measure
spaces, without assuming completeness. But it would be reasonable, and indeed would sharpen your technique, if you
regularly sought the alternative formulations which become natural if you are interested only in complete spaces.

213 Semi-finite, locally determined and localizable spaces


In this section I collect a variety of useful facts concerning these types of measure space. I start with the characteristic
properties of semi-finite spaces (213A-213B), and continue with the complete locally determined spaces (213C) and the
concept of ‘c.l.d. version’ (213D-213H), the most powerful of the universally available methods of modifying a measure
space into a better-behaved one. I briefly discuss ‘locally determined negligible sets’ (213I-213L), and measurable
envelopes (213L-213M), and end with results on localizable spaces (213N) and strictly localizable spaces (213O).

213A Lemma Let (X, Σ, µ) be a semi-finite measure space. Then


µE = sup{µF : F ∈ Σ, F ⊆ E, µF < ∞}
for every E ∈ Σ.
proof Set c = sup{µF : F ∈ Σ, F ⊆ E, µF < ∞}. Then surely c ≤ µE, so if c = ∞ we can stop. If c < ∞, let
⟨Fn⟩_{n∈N} be a sequence of measurable subsets of E, of finite measure, such that lim_{n→∞} µFn = c; set F = ⋃_{n∈N} Fn.
For each n ∈ N, ⋃_{k≤n} Fk is a measurable set of finite measure included in E, so µ(⋃_{k≤n} Fk) ≤ c, and
µF = lim_{n→∞} µ(⋃_{k≤n} Fk) ≤ c.
Also
µF ≥ sup_{n∈N} µFn ≥ c,
so µF = c.
If F ′ is a measurable subset of E \ F and µF ′ < ∞, then F ∪ F ′ has finite measure and is included in E, so has
measure at most c = µF ; it follows that µF ′ = 0. But this means that µ(E \ F ) cannot be infinite, since then, because
(X, Σ, µ) is semi-finite, it would have to include a measurable set of non-zero finite measure. So E \ F has finite
measure, and is therefore in fact negligible; and µE = c, as claimed.
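
The role of semi-finiteness in 213A can be seen by brute force on a toy space. The following is only a sketch under assumptions of my own: the space is finite, Σ is generated by a partition given as a dictionary `blocks` from cells to weights in [0, ∞], and the helper names are hypothetical. On such a space semi-finiteness forces every cell to have finite weight, so the example can only exhibit the failure of the formula when an atom of infinite measure is present.

```python
from itertools import combinations
from math import inf

def measurable_sets(blocks):
    """All unions of cells of the partition: the sigma-algebra it generates."""
    cells = list(blocks)
    return [frozenset().union(*combo)
            for r in range(len(cells) + 1)
            for combo in combinations(cells, r)]

def mu(blocks, E):
    """Measure of a union of cells: the sum of the weights of the cells inside E."""
    return sum(w for cell, w in blocks.items() if cell <= E)

def sup_finite_parts(blocks, E):
    """sup{mu F : F measurable, F included in E, mu F finite}, by brute force."""
    return max(mu(blocks, F) for F in measurable_sets(blocks)
               if F <= E and mu(blocks, F) < inf)

def is_semi_finite(blocks):
    """Every set of infinite measure includes a set of finite non-zero measure."""
    return all(sup_finite_parts(blocks, E) > 0
               for E in measurable_sets(blocks) if mu(blocks, E) == inf)

# Assumed toy data: two cells of finite weight, then the same space with an infinite atom.
good = {frozenset({"a"}): 1.0, frozenset({"b", "c"}): 2.0}
bad = {**good, frozenset({"d"}): inf}
whole = frozenset({"a", "b", "c", "d"})
print(is_semi_finite(good), is_semi_finite(bad))
print(sup_finite_parts(bad, whole), mu(bad, whole))
```

In the second space the supremum over subsets of finite measure stays at 3 while the measure of the whole space is infinite, which is exactly the situation the hypothesis of the lemma excludes.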

213B Proposition Let (X, Σ, µ) be a semi-finite measure space. Let f be a µ-virtually measurable [0, ∞]-valued
function defined almost everywhere in X. Then
∫ f = sup{∫ g : g is a simple function, g ≤a.e. f }
= sup_{F∈Σ, µF<∞} ∫_F f

in [0, ∞].
proof (a) For any measure space (X, Σ, µ), a [0, ∞]-valued function defined on a subset of X is integrable iff there is
a conegligible set E such that
(α) E ⊆ dom f and f ↾E is measurable,
(β) sup{∫ g : g is a simple function, g ≤a.e. f } is finite,
(γ) for every ε > 0, {x : x ∈ E, f (x) ≥ ε} has finite measure,
(δ) f is finite almost everywhere
(see 122Ja, 133B). But if µ is semi-finite, (γ) and (δ) are consequences of the rest. P P Let ε > 0. Set
Eε = {x : x ∈ E, f (x) ≥ ε},
c = sup{∫ g : g is a simple function, g ≤a.e. f };
we are supposing that c is finite. If F ⊆ Eε is measurable and µF < ∞, then εχF is a simple function and εχF ≤a.e. f ,
so
εµF = ∫ εχF ≤ c, µF ≤ c/ε.
As F is arbitrary, 213A tells us that µEε ≤ c/ε is finite. As ε is arbitrary, (γ) is satisfied.
As for (δ), if F = {x : x ∈ E, f (x) = ∞} then µF is finite (by (γ)) and nχF ≤a.e. f , so nµF ≤ c, for every n ∈ N,
so µF = 0. Q Q
(b) Now suppose that f : D → [0, ∞] is a µ-virtually measurable function, where D ⊆ X is conegligible, so that ∫ f
is defined in [0, ∞] (135F). Then (a) tells us that

∫ f = sup_{g is simple, g ≤ f a.e.} ∫ g
(if either is finite, and therefore also if either is infinite)
= sup_{g is simple, g ≤ f a.e., µF<∞} ∫_F g ≤ sup_{µF<∞} ∫_F f ≤ ∫ f,

so we have the equalities we seek.

*213C Proposition Let (X, Σ, µ) be a complete locally determined measure space, and µ∗ the outer measure
derived from µ (132A-132B). Then the measure defined from µ∗ by Carathéodory’s method is µ itself.
proof Write µ̌ for the measure defined by Carathéodory’s method from µ∗ , and Σ̌ for its domain.
(a) If E ∈ Σ and A ⊆ X then µ∗ (A ∩ E) + µ∗ (A \ E) = µ∗ A (132Af), so E ∈ Σ̌. Now µ̌E = µ∗ E = µE (132Ac).
Thus Σ ⊆ Σ̌ and µ = µ̌↾Σ.
(b) Now suppose that H ∈ Σ̌. Let E ∈ Σ be such that µE < ∞. Then H ∩ E ∈ Σ. P P Let E1 , E2 ∈ Σ be measurable
envelopes of E ∩ H, E \ H respectively, both included in E (132Ee). Because H ∈ Σ̌,
µE1 + µE2 = µ∗ (E ∩ H) + µ∗ (E \ H) = µ∗ E = µE.
As E1 ∪ E2 = E,
µ(E1 ∩ E2 ) = µE1 + µE2 − µE = 0.
Now E1 \ (E ∩ H) ⊆ E1 ∩ E2 ; because µ is complete, E1 \ (E ∩ H) and E ∩ H belong to Σ. Q Q
As E is arbitrary, and µ is locally determined, H ∈ Σ. As H is arbitrary, Σ̌ = Σ and µ̌ = µ.

213D C.l.d. versions: Proposition Let (X, Σ, µ) be a measure space. Write (X, Σ̂, µ̂) for its completion (212C)
and Σf for {E : E ∈ Σ, µE < ∞}. Set
Σ̃ = {H : H ⊆ X, H ∩ E ∈ Σ̂ for every E ∈ Σf },
and for H ∈ Σ̃ set
µ̃H = sup{µ̂(H ∩ E) : E ∈ Σf }.
Then (X, Σ̃, µ̃) is a complete locally determined measure space.
proof (a) I check first that Σ̃ is a σ-algebra. P P (i) ∅ ∩ E = ∅ ∈ Σ̂ for every E ∈ Σf , so ∅ ∈ Σ̃. (ii) If H ∈ Σ̃ then
(X \ H) ∩ E = E \ (E ∩ H) ∈ Σ̂
for every E ∈ Σf , so X \ H ∈ Σ̃. (iii) If ⟨Hn⟩_{n∈N} is a sequence in Σ̃ with union H, then
H ∩ E = ⋃_{n∈N} Hn ∩ E ∈ Σ̂
for every E ∈ Σf , so H ∈ Σ̃. Q Q
(b) It is obvious that µ̃∅ = 0. If ⟨Hn⟩_{n∈N} is a disjoint sequence in Σ̃ with union H, then

µ̃H = sup{µ̂(H ∩ E) : E ∈ Σf }
= sup{∑_{n=0}^∞ µ̂(Hn ∩ E) : E ∈ Σf } ≤ ∑_{n=0}^∞ µ̃Hn .

On the other hand, given a < ∑_{n=0}^∞ µ̃Hn , there is an m ∈ N such that a < ∑_{n=0}^m µ̃Hn ; now we can find
E0 , . . . , Em ∈ Σf such that a ≤ ∑_{n=0}^m µ̂(Hn ∩ En ). Set E = ⋃_{n≤m} En ∈ Σf ; then
µ̃H ≥ µ̂(H ∩ E) = ∑_{n=0}^∞ µ̂(Hn ∩ E) ≥ ∑_{n=0}^m µ̂(Hn ∩ En ) ≥ a.
As a is arbitrary, µ̃H ≥ ∑_{n=0}^∞ µ̃Hn and µ̃H = ∑_{n=0}^∞ µ̃Hn .
(c) Thus (X, Σ̃, µ̃) is a measure space. To see that it is semi-finite, note first that Σ̂ ⊆ Σ̃ (because if H ∈ Σ̂ then
surely H ∩ E ∈ Σ̂ for every E ∈ Σf ), and that µ̃H = µ̂H whenever µ̂H < ∞ (because then, by the definition in 212Ca,
there is an E ∈ Σf such that H ⊆ E, so that µ̃H = µ̂(H ∩ E) = µ̂H). Now suppose that H ∈ Σ̃ and that µ̃H = ∞.
There is surely an E ∈ Σf such that µ̂(H ∩ E) > 0; but now 0 < µ̂(H ∩ E) < ∞, so 0 < µ̃(H ∩ E) < ∞.
(d) Thus (X, Σ̃, µ̃) is a semi-finite measure space. To see that it is locally determined, let H ⊆ X be such that
H ∩ G ∈ Σ̃ whenever G ∈ Σ̃ and µ̃G < ∞. Then, in particular, we must have H ∩ E ∈ Σ̃ for every E ∈ Σf . But this
means in fact that H ∩ E ∈ Σ̂ for every E ∈ Σf , so that H ∈ Σ̃. As H is arbitrary, (X, Σ̃, µ̃) is locally determined.
(e) To see that (X, Σ̃, µ̃) is complete, suppose that A ⊆ H ∈ Σ̃ and that µ̃H = 0. Then for every E ∈ Σf we must
have µ̂(H ∩ E) = 0. Because (X, Σ̂, µ̂) is complete, and A ∩ E ⊆ H ∩ E, A ∩ E ∈ Σ̂. As E is arbitrary, A ∈ Σ̃.

213E Definition For any measure space (X, Σ, µ), I will call (X, Σ̃, µ̃), as constructed in 213D, the c.l.d. version
(‘complete locally determined version’) of (X, Σ, µ); and µ̃ will be the c.l.d. version of µ.
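
The construction of 213D can be followed step by step on a small example. This is a sketch under assumptions of my own, not part of the text: X is finite, Σ is generated by a partition given as a dictionary `blocks` from cells to weights in [0, ∞], and the function names are hypothetical. In that setting a set H belongs to Σ̂ iff it differs from the largest union of cells inside it by a subset of the cells of weight zero, and Σf consists of the unions of cells of finite weight.

```python
from itertools import combinations
from math import inf

def unions(cells):
    """All unions of sub-collections of the given cells."""
    cells = list(cells)
    return [frozenset().union(*combo)
            for r in range(len(cells) + 1)
            for combo in combinations(cells, r)]

def mu_hat(blocks, H):
    """Completion of mu at H, or None if H is not in the completed sigma-algebra."""
    inside = [cell for cell in blocks if cell <= H]
    core = frozenset().union(*inside)            # largest measurable subset of H
    null = frozenset().union(*[c for c, w in blocks.items() if w == 0])
    if not (H - core) <= null:                   # the remainder must be negligible
        return None
    return sum(blocks[cell] for cell in inside)

def mu_tilde(blocks, H):
    """c.l.d. version at H, following 213D; None if H is not in its domain."""
    sigma_f = unions(c for c, w in blocks.items() if w < inf)
    values = [mu_hat(blocks, H & E) for E in sigma_f]
    if any(v is None for v in values):           # some H ∩ E fails to be in Sigma-hat
        return None
    return max(values)                           # sup over E of finite measure

# Assumed toy space: a cell of weight 1, an atom of infinite weight, a null cell.
blocks = {frozenset({0}): 1, frozenset({1, 2}): inf, frozenset({3, 4}): 0}
print(mu_hat(blocks, frozenset({1})), mu_tilde(blocks, frozenset({1})))
print(mu_tilde(blocks, frozenset({1, 2})), mu_tilde(blocks, frozenset({0, 1, 3})))
```

In this example {1} is not in Σ̂ but is in the domain of the c.l.d. version, with measure 0, and the atom {1, 2} of infinite measure is given measure 0 by the c.l.d. version, because no set of finite measure meets it in a non-negligible set.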

213F Following the same pattern as in 212E-212G, I start with some elementary remarks to facilitate manipulation
of this construction.
Proposition Let (X, Σ, µ) be any measure space and (X, Σ̃, µ̃) its c.l.d. version.
(a) Σ ⊆ Σ̃ and µ̃E = µE whenever E ∈ Σ and µE < ∞ – in fact, if (X, Σ̂, µ̂) is the completion of (X, Σ, µ), Σ̂ ⊆ Σ̃
and µ̃E = µ̂E whenever µ̂E < ∞.
(b) Writing µ̃∗ and µ∗ for the outer measures defined from µ̃ and µ respectively, µ̃∗ A ≤ µ∗ A for every A ⊆ X,
with equality if µ∗ A is finite. In particular, µ-negligible sets are µ̃-negligible; consequently, µ-conegligible sets are
µ̃-conegligible.
(c) For every H ∈ Σ̃ there is an E ∈ Σ such that E ⊆ H and µE = µ̃H; if µ̃H < ∞ then µ̃(H \ E) = 0.
proof (a) This is already covered by remarks in the proof of 213D.
(b) If µ∗ A = ∞ then surely µ̃∗ A ≤ µ∗ A. If µ∗ A < ∞, take E ∈ Σ such that A ⊆ E and µE = µ∗ A (132Aa). Then
µ̃∗ A ≤ µ̃E = µE = µ∗ A.
On the other hand, if A ⊆ H ∈ Σ̃, then
µ̃H ≥ µ̂(H ∩ E) ≥ µ̂∗ A = µ∗ A,
using 212Ea. So µ∗ A ≤ µ̃∗ A and µ∗ A = µ̃∗ A.
(c) Write Σf for {E : E ∈ Σ, µE < ∞}; then, by the definition in 213D, µ̃H = sup{µ̂(H ∩ E) : E ∈ Σf }. Let
⟨En⟩_{n∈N} be a sequence in Σf such that µ̃H = sup_{n∈N} µ̂(H ∩ En ). For each n ∈ N there is an Fn ∈ Σ such that
Fn ⊆ H ∩ En and µFn = µ̂(H ∩ En ) (212C). Set E = ⋃_{n∈N} Fn . Then E ∈ Σ, E ⊆ H and
µ̃H = sup_{n∈N} µFn ≤ lim_{n→∞} µ(⋃_{i≤n} Fi ) = µE
= lim_{n→∞} µ̃(⋃_{i≤n} Fi ) ≤ µ̃H,

so µE = µ̃H, and if µ̃H < ∞ then µ̃(H \ E) = 0.

213G The next step is to look at functions which are measurable or integrable with respect to µ̃.
Proposition Let (X, Σ, µ) be a measure space, and (X, Σ̃, µ̃) its c.l.d. version.
(a) If a real-valued function f defined on a subset of X is µ-virtually measurable, it is Σ̃-measurable.
(b) If a real-valued function is µ-integrable, it is µ̃-integrable with the same integral.
(c) If f is a µ̃-integrable real-valued function, there is a µ-integrable real-valued function which is equal to f µ̃-almost
everywhere.
proof Write Σf for {E : E ∈ Σ, µE < ∞}. By 213Fa, µ̃ and µ agree on Σf .
(a) By 212Fa, f is Σ̂-measurable, where Σ̂ is the domain of the completion of µ; but since Σ̂ ⊆ Σ̃, f is Σ̃-measurable.
(b)(i) If f is a µ-simple function it is µ̃-simple, and ∫ f dµ = ∫ f dµ̃, because µ̃E = µE for every E ∈ Σf .
(ii) If f is a non-negative µ-integrable function, there is a non-decreasing sequence ⟨fn⟩_{n∈N} of µ-simple functions
converging to f µ-almost everywhere; now (by 213Fb) µ-negligible sets are µ̃-negligible, so ⟨fn⟩_{n∈N} converges to f µ̃-a.e.
and (by B.Levi’s theorem, 123A) f is µ̃-integrable, with
∫ f dµ̃ = lim_{n→∞} ∫ fn dµ̃ = lim_{n→∞} ∫ fn dµ = ∫ f dµ.
(iii) In general, if ∫ f dµ is defined in R, we have
∫ f dµ̃ = ∫ f⁺ dµ̃ − ∫ f⁻ dµ̃ = ∫ f⁺ dµ − ∫ f⁻ dµ = ∫ f dµ,
writing f⁺ = f ∨ 0, f⁻ = (−f ) ∨ 0.
(c)(i) Let f be a µ̃-simple function. Express it as ∑_{i=0}^n ai χHi where µ̃Hi < ∞ for each i. Choose E0 , . . . , En ∈ Σ
such that Ei ⊆ Hi and µ̃(Hi \ Ei ) = 0 for each i (using 213Fc above). Then g = ∑_{i=0}^n ai χEi is µ-simple, g = f µ̃-a.e.,
and ∫ g dµ = ∫ f dµ̃.
(ii) Let f be a non-negative µ̃-integrable function. Let ⟨fn⟩_{n∈N} be a non-decreasing sequence of µ̃-simple functions
converging µ̃-almost everywhere to f . For each n, choose a µ-simple function gn equal µ̃-almost everywhere to fn .
Then {x : g_{n+1}(x) < gn (x)} belongs to Σf and is µ̃-negligible, therefore µ-negligible. So ⟨gn⟩_{n∈N} is non-decreasing
µ-almost everywhere. Because
lim_{n→∞} ∫ gn dµ = lim_{n→∞} ∫ fn dµ̃ = ∫ f dµ̃,
B.Levi’s theorem tells us that ⟨gn⟩_{n∈N} converges µ-almost everywhere to a µ-integrable function g; because µ-negligible
sets are µ̃-negligible,

(X \ dom f ) ∪ (X \ dom g) ∪ ⋃_{n∈N} {x : fn (x) ≠ gn (x)}
∪ {x : x ∈ dom f, f (x) ≠ sup_{n∈N} fn (x)}
∪ {x : x ∈ dom g, g(x) ≠ sup_{n∈N} gn (x)}

is µ̃-negligible, and f = g µ̃-a.e.


(iii) If f is µ̃-integrable, express it as f1 − f2 where f1 and f2 are µ̃-integrable and non-negative; then there are
µ-integrable functions g1 , g2 such that f1 = g1 , f2 = g2 µ̃-a.e., so that g = g1 − g2 is µ-integrable and equal to f µ̃-a.e.

213H Thirdly, I turn to the effect of the construction here on the other properties being considered in this chapter.
Proposition Let (X, Σ, µ) be a measure space and (X, Σ̂, µ̂) its completion, (X, Σ̃, µ̃) its c.l.d. version.
(a) If (X, Σ, µ) is a probability space, or totally finite, or σ-finite, or strictly localizable, so is (X, Σ̃, µ̃), and in all
these cases µ̃ = µ̂;
(b) if (X, Σ, µ) is localizable, so is (X, Σ̃, µ̃), and for every H ∈ Σ̃ there is an E ∈ Σ such that µ̃(E△H) = 0;
(c) (X, Σ, µ) is semi-finite iff µ̃F = µF for every F ∈ Σ, and in this case ∫ f dµ̃ = ∫ f dµ whenever the latter is
defined in [−∞, ∞];
(d) a set H ∈ Σ̃ is an atom for µ̃ iff there is an atom E for µ such that µE < ∞ and µ̃(H△E) = 0;
(e) if (X, Σ, µ) is atomless or purely atomic, so is (X, Σ̃, µ̃);
(f) (X, Σ, µ) is complete and locally determined iff µ̃ = µ.
proof (a)(i) I start by showing that if (X, Σ, µ) is strictly localizable, then µ̃ = µ̂. P P Let ⟨Xi⟩_{i∈I} be a decomposition
of X for µ; then it is also a decomposition for µ̂ (212Gb). If H ∈ Σ̃, we shall have H ∩ Xi ∈ Σ̂ for every i, and therefore
H ∈ Σ̂; moreover,
µ̂H = ∑_{i∈I} µ̂(H ∩ Xi ) = sup{∑_{i∈J} µ̂(H ∩ Xi ) : J ⊆ I is finite}
≤ sup{µ̂(H ∩ E) : E ∈ Σ, µE < ∞} = µ̃H ≤ µ̂H.

So µ̂H = µ̃H for every H ∈ Σ̃ and µ̂ = µ̃. Q Q
(ii) Consequently, if (X, Σ, µ) is a probability space, or totally finite, or σ-finite, or strictly localizable, so is
(X, Σ̃, µ̃), using 212Ga-212Gb to see that (X, Σ̂, µ̂) has the property involved.
(b) If (X, Σ, µ) is localizable, let H be any subset of Σ̃. Set
E = {E : E ∈ Σ, ∃ H ∈ H, E ⊆ H}.
Working in (X, Σ, µ), let F ∈ Σ be an essential supremum for E.


(i) ?? Suppose, if possible, that there is an H ∈ H such that µ̃(H \ F ) > 0. Then there is an E ∈ Σ such that
E ⊆ H \ F and µE = µ̃(H \ F ) > 0 (213Fc). This E belongs to E and µ(E \ F ) = µE > 0; which is impossible if F is
an essential supremum of E. X X
(ii) Thus µ̃(H \ F ) = 0 for every H ∈ H. Now take any G ∈ Σ̃ such that µ̃(H \ G) = 0 for every H ∈ H. Let
E0 ∈ Σ be such that E0 ⊆ F \ G and µE0 = µ̃(F \ G); note that F \ E0 ⊇ F ∩ G. If E ∈ E, there is an H ∈ H such
that E ⊆ H, so that
µ(E \ (F \ E0 )) ≤ µ̃(H \ (F ∩ G)) ≤ µ̃(H \ F ) + µ̃(H \ G) = 0.
Because F is an essential supremum for E in Σ,
0 = µ(F \ (F \ E0 )) = µE0 = µ̃(F \ G).
This shows that F is an essential supremum for H in Σ̃. As H is arbitrary, (X, Σ̃, µ̃) is localizable.
(iii) The argument of (i)-(ii) shows in fact that if H ⊆ Σ̃ then H has an essential supremum F in Σ̃ such that F
actually belongs to Σ. Taking H = {H}, we see that if H ∈ Σ̃ there is an F ∈ Σ such that µ̃(H△F ) = 0.
(c) We already know that µ̃E ≤ µE for every E ∈ Σ, with equality if µE < ∞, by 213Fa.
(i) If (X, Σ, µ) is semi-finite, then for any F ∈ Σ we have

µF = sup{µE : E ∈ Σ, E ⊆ F, µE < ∞}
= sup{µ̃E : E ∈ Σ, E ⊆ F, µE < ∞} ≤ µ̃F ≤ µF,
so that µ̃F = µF .
(ii) Suppose that µ̃F = µF for every F ∈ Σ. If µF = ∞, then there must be an E ∈ Σ such that µE < ∞,
µ̂(F ∩ E) > 0; in which case F ∩ E ∈ Σ and 0 < µ(F ∩ E) < ∞. As F is arbitrary, (X, Σ, µ) is semi-finite.
(iii) If f is non-negative and ∫ f dµ = ∞, then f is µ-virtually measurable, therefore Σ̃-measurable (213Ga), and
defined µ-almost everywhere, therefore µ̃-almost everywhere. Now
∫ f dµ̃ = sup{∫ g dµ̃ : g is µ̃-simple, 0 ≤ g ≤ f µ̃-a.e.}
≥ sup{∫ g dµ : g is µ-simple, 0 ≤ g ≤ f µ-a.e.} = ∞
by 213B. With 213Gb, this shows that ∫ f dµ̃ = ∫ f dµ whenever f is non-negative and ∫ f dµ is defined in [0, ∞].
Applying this to the positive and negative parts of f , we see that ∫ f dµ̃ = ∫ f dµ whenever the latter is defined in
[−∞, ∞].
(d)(i) If H ∈ Σ̃ is an atom for µ̃, then (because µ̃ is semi-finite) there is surely an H ′ ∈ Σ̃ such that H ′ ⊆ H and
0 < µ̃H ′ < ∞, and we must have µ̃(H \ H ′ ) = 0, so that µ̃H < ∞. Accordingly there is an E ∈ Σ such that E ⊆ H
and µ̃(H \ E) = 0 (213Fc above). We have µE = µ̃H > 0. If F ∈ Σ and F ⊆ E, then either µF = µ̃F = 0 or
µ(E \ F ) = µ̃(H \ F ) = 0. Thus E ∈ Σ is an atom for µ with µ̃(H△E) = 0 and µE = µ̃H < ∞.
(ii) If H ∈ Σ̃ and there is an atom E for µ such that µE < ∞ and µ̃(H△E) = 0, let G ∈ Σ̃ be a subset of H. We
have
µ̃G ≤ µ̃H = µE < ∞,
so there is an F ∈ Σ such that F ⊆ G and µ̃(G \ F ) = 0. Now either µ̃G = µ(E ∩ F ) = 0 or µ̃(H \ G) = µ(E \ F ) = 0.
This is true whenever G ∈ Σ̃ and G ⊆ H; also µ̃H = µE > 0. So H is an atom for µ̃.
(e) If (X, Σ, µ) is atomless, then (X, Σ̃, µ̃) must be atomless, by (d).
If (X, Σ, µ) is purely atomic and H ∈ Σ̃, µ̃H > 0, then there is an E ∈ Σ such that 0 < µ̂(H ∩ E) < ∞. Let E1 ∈ Σ
be such that E1 ⊆ H ∩ E and µE1 > 0. There is an atom F for µ such that F ⊆ E1 ; now µF < ∞ so F is an atom
for µ̃, by (d). Also F ⊆ H. As H is arbitrary, (X, Σ̃, µ̃) is purely atomic.
(f ) If µ = µ̃, then of course (X, Σ, µ) must be complete and locally determined, because (X, Σ̃, µ̃) is. If (X, Σ, µ) is
complete and locally determined, then µ̂ = µ so (using the definition in 213D) Σ̃ ⊆ Σ and µ̃ = µ, by (c) above.

213I Locally determined negligible sets The following simple idea is occasionally useful.
Definition A measure space (X, Σ, µ) has locally determined negligible sets if for every non-negligible A ⊆ X
there is an E ∈ Σ such that µE < ∞ and A ∩ E is not negligible.

213J Proposition If a measure space (X, Σ, µ) is either strictly localizable or complete and locally determined, it
has locally determined negligible sets.
proof Let A ⊆ X be a set such that A ∩ E is negligible whenever µE < ∞; I need to show that A is negligible.
(i) If µ is strictly localizable, let ⟨Xi⟩_{i∈I} be a decomposition of X. For each i ∈ I, A ∩ Xi is negligible, so we
can choose a negligible Ei ∈ Σ such that A ∩ Xi ⊆ Ei . Set E = ⋃_{i∈I} Ei ∩ Xi . Then µE = ∑_{i∈I} µ(Ei ∩ Xi ) = 0 and
A ⊆ E, so A is negligible.
(ii) If µ is complete and locally determined, take any measurable set E of finite measure. Then A ∩ E is negligible,
therefore measurable; as E is arbitrary, A is measurable; as µ is semi-finite, A is negligible.

213K Lemma If a measure space (X, Σ, µ) has locally determined negligible sets, and E ⊆ Σ has an essential
supremum H ∈ Σ in the sense of 211G, then H \ ⋃E is negligible.
proof Set A = H \ ⋃E. Take any F ∈ Σ such that µF < ∞. Then F ∩ A has a measurable envelope V say (132Ee
again). If E ∈ E, then
µ(E \ (X \ V )) = µ(E ∩ V ) = µ∗ (E ∩ F ∩ A) = 0,
so H ∩ V = H \ (X \ V ) is negligible and F ∩ A is negligible. As F is arbitrary and µ has locally determined negligible
sets, A is negligible, as claimed.

213L Proposition Let (X, Σ, µ) be a localizable measure space with locally determined negligible sets. Then every
subset A of X has a measurable envelope.
proof Set
E = {E : E ∈ Σ, µ∗ (A ∩ E) = µE < ∞}.
Let G be an essential supremum for E in Σ.
(i) A \ G is negligible. P P Let F be any set of finite measure for µ. Let E be a measurable envelope of A ∩ F .
Then E ∈ E so E \ G is negligible. But F ∩ A \ G ⊆ E \ G, so F ∩ A \ G is negligible. Because µ has locally determined
negligible sets, this is enough to show that A \ G is negligible. Q Q
(ii) Let E0 be a negligible measurable set including A \ G, and set G̃ = E0 ∪ G, so that G̃ ∈ Σ, A ⊆ G̃ and
µ(G̃ \ G) = 0. ?? Suppose, if possible, that there is an F ∈ Σ such that µ∗ (A ∩ F ) < µ(G̃ ∩ F ). Let F1 ⊆ F be a
measurable envelope of A ∩ F . Set H = X \ (F \ F1 ); then A ⊆ H. If E ∈ E then
µE = µ∗ (A ∩ E) ≤ µ(H ∩ E),
so E \ H is negligible; as E is arbitrary, G \ H is negligible and G̃ \ H is negligible. But G̃ ∩ F \ F1 ⊆ G̃ \ H and
µ(G̃ ∩ F \ F1 ) = µ(G̃ ∩ F ) − µ∗ (A ∩ F ) > 0. X X
This shows that G̃ is a measurable envelope of A, as required.

213M Corollary (a) If (X, Σ, µ) is σ-finite, then every subset of X has a measurable envelope for µ.
(b) If (X, Σ, µ) is localizable, then every subset of X has a measurable envelope for the c.l.d. version of µ.
proof (a) Use 132Ee, or 213J, 211Lc and 213L.
(b) Use 213L and the fact that the c.l.d. version of µ is localizable as well as being complete and locally determined
(213Hb).

213N When we come to use the concept of ‘localizability’, it will frequently be through the following characteriza-
tion.
Theorem Let (X, Σ, µ) be a localizable measure space. Suppose that Φ is a family of measurable real-valued functions,
all defined on measurable subsets of X, such that whenever f , g ∈ Φ then f = g almost everywhere in dom f ∩ dom g.
Then there is a measurable function h : X → R such that every f ∈ Φ agrees with h almost everywhere in dom f .
proof For q ∈ Q, f ∈ Φ set


Ef q = {x : x ∈ dom f, f (x) ≥ q} ∈ Σ.
For each q ∈ Q, let Eq be an essential supremum of {Ef q : f ∈ Φ} in Σ. Set
h∗ (x) = sup{q : q ∈ Q, x ∈ Eq } ∈ [−∞, ∞]
for x ∈ X, taking sup ∅ = −∞ if necessary.
If f , g ∈ Φ and q ∈ Q, then

Ef q \ (X \ (dom g \ Egq )) = Ef q ∩ dom g \ Egq
⊆ {x : x ∈ dom f ∩ dom g, f (x) ≠ g(x)}
is negligible; as f is arbitrary,
Eq ∩ dom g \ Egq = Eq \ (X \ (dom g \ Egq ))
is negligible. Also Egq \ Eq is negligible, so Egq △(Eq ∩ dom g) is negligible. Set Hg = ⋃_{q∈Q} Egq △(Eq ∩ dom g); then
Hg is negligible. But if x ∈ dom g \ Hg , then, for every q ∈ Q, x ∈ Eq ⇐⇒ x ∈ Egq ; it follows that for such x,
h∗ (x) = g(x). Thus h∗ = g almost everywhere in dom g; and this is true for every g ∈ Φ.
The function h∗ is not necessarily real-valued. But it is measurable, because
{x : h∗ (x) > a} = ⋃{Eq : q ∈ Q, q > a} ∈ Σ
for every real a. So if we modify it by setting
h(x) = h∗ (x) if h∗ (x) ∈ R,
= 0 if h∗ (x) ∈ {−∞, ∞},
we shall get a measurable real-valued function h : X → R; and for any g ∈ Φ, h(x) will be equal to g(x) at least
whenever h∗ (x) = g(x), which is true for almost every x ∈ dom g. Thus h is a suitable function.

213O There is an interesting and useful criterion for a space to be strictly localizable which I introduce at this
point, though it will be used rarely in this volume.
Proposition Let (X, Σ, µ) be a complete locally determined space.
(a) Suppose that there is a disjoint family ℰ ⊆ Σ such that (α) µE < ∞ for every E ∈ ℰ (β) whenever F ∈ Σ and
µF > 0 then there is an E ∈ ℰ such that µ(E ∩ F ) > 0. Then (X, Σ, µ) is strictly localizable, ⋃ℰ is conegligible, and
ℰ ∪ {X \ ⋃ℰ} is a decomposition of X.
(b) Suppose that ⟨Xi⟩_{i∈I} is a partition of X into measurable sets of finite measure such that whenever E ∈ Σ and
µE > 0 there is an i ∈ I such that µ(E ∩ Xi ) > 0. Then (X, Σ, µ) is strictly localizable, and ⟨Xi⟩_{i∈I} is a decomposition
of X.
proof (a)(i) The first thing to note is that if F ∈ Σ and µF < ∞, there is a countable ℰ′ ⊆ ℰ such that µ(F \ ⋃ℰ′) = 0.
P P Set
ℰ′n = {E : E ∈ ℰ, µ(F ∩ E) ≥ 2^{−n}} for each n ∈ N,
ℰ′ = ⋃_{n∈N} ℰ′n = {E : E ∈ ℰ, µ(F ∩ E) > 0}.
Because ℰ is disjoint, we must have
#(ℰ′n) ≤ 2^n µF
for every n ∈ N, so that every ℰ′n is finite and ℰ′, being the union of a sequence of countable sets, is countable. Set
E′ = ⋃ℰ′ and F′ = F \ E′, so that both E′ and F′ belong to Σ. If E ∈ ℰ′, then E ⊆ E′ so µ(E ∩ F′) = µ∅ = 0; if
E ∈ ℰ \ ℰ′, then µ(E ∩ F′) = µ(E ∩ F ) = 0. Thus µ(E ∩ F′) = 0 for every E ∈ ℰ. By the hypothesis (β) on ℰ, µF′ = 0,
so µ(F \ ⋃ℰ′) = 0, as required. Q Q
(ii) Now suppose that H ⊆ X is such that H ∩ E ∈ Σ for every E ∈ ℰ. In this case H ∈ Σ. P P Let F ∈ Σ be
such that µF < ∞. Let ℰ′ ⊆ ℰ be a countable set such that µ(F \ E′) = 0, where E′ = ⋃ℰ′. Then H ∩ (F \ E′) ∈ Σ
because (X, Σ, µ) is complete. But also H ∩ E′ = ⋃_{E∈ℰ′} H ∩ E ∈ Σ. So
H ∩ F = (H ∩ (F \ E′)) ∪ (F ∩ (H ∩ E′)) ∈ Σ.
As F is arbitrary and (X, Σ, µ) is locally determined, H ∈ Σ. Q Q
(iii) We find also that µH = ∑_{E∈ℰ} µ(H ∩ E) for every H ∈ Σ. P P (α) Because ℰ is disjoint, we must have
∑_{E∈ℰ′} µ(H ∩ E) ≤ µH for every finite ℰ′ ⊆ ℰ, so
∑_{E∈ℰ} µ(H ∩ E) = sup{∑_{E∈ℰ′} µ(H ∩ E) : ℰ′ ⊆ ℰ is finite} ≤ µH.
(β) For the reverse inequality, consider first the case µH < ∞. By (i), there is a countable ℰ′ ⊆ ℰ such that
µ(H \ ⋃ℰ′) = 0, so that
µH = µ(H ∩ ⋃ℰ′) = ∑_{E∈ℰ′} µ(H ∩ E) ≤ ∑_{E∈ℰ} µ(H ∩ E).
(γ) In general, because (X, Σ, µ) is semi-finite,
µH = sup{µF : F ⊆ H, µF < ∞}
≤ sup{∑_{E∈ℰ} µ(F ∩ E) : F ⊆ H, µF < ∞} ≤ ∑_{E∈ℰ} µ(H ∩ E).
So in all cases we have µH ≤ ∑_{E∈ℰ} µ(H ∩ E), and the two are equal. Q Q
(iv) In particular, setting E0 = X \ ⋃ℰ, E0 ∈ Σ and µE0 = 0; that is, ⋃ℰ is conegligible. Consider ℰ∗ = ℰ ∪ {E0 }.
This is a partition of X into sets of finite measure (now using the hypothesis (α) on ℰ). If H ⊆ X is such that H ∩ E ∈ Σ
for every E ∈ ℰ∗, then H ∈ Σ and
µH = ∑_{E∈ℰ} µ(H ∩ E) = ∑_{E∈ℰ∗} µ(H ∩ E).
Thus ℰ∗ (or, if you prefer, the indexed family ⟨E⟩_{E∈ℰ∗}) is a decomposition witnessing that (X, Σ, µ) is strictly
localizable.
(b) Apply (a) with ℰ = {Xi : i ∈ I}, noting that E0 in (iv) is empty, so can be dropped.

213X Basic exercises (a) Let (X, Σ, µ) be any measure space, µ∗ the outer measure defined from µ, and µ̌
the measure defined by Carathéodory’s method from µ∗ ; write Σ̌ for the domain of µ̌. Show that (i) µ̌ extends the
completion µ̂ of µ; (ii) if H ⊆ X is such that H ∩ F ∈ Σ̌ whenever F ∈ Σ and µF < ∞, then H ∈ Σ̌; (iii) (µ̌)∗ = µ∗ ,
so that the integrable functions for µ̌ and µ are the same (212Xb); (iv) if µ is strictly localizable then µ̌ = µ̂; (v) if µ
is defined by Carathéodory’s method from another outer measure, then µ = µ̌.

> (b) Let µ be counting measure restricted to the countable-cocountable σ-algebra of a set X (211R, 211Ye). (i)
Show that the c.l.d. version µ̃ of µ is just counting measure on X. (ii) Show that µ̌, as defined in 213Xa, is equal to µ̃,
and in particular strictly extends the completion of µ.

(c) Let (X, Σ, µ) be any measure space. For E ∈ Σ set


µsf E = sup{µ(E ∩ F ) : F ∈ Σ, µF < ∞}.
(i) Show that (X, Σ, µsf ) is a semi-finite measure space, and is equal to (X, Σ, µ) iff (X, Σ, µ) is semi-finite.
(ii) Show that a µ-integrable real-valued function f is µsf -integrable, with the same integral.
(iii) Show that if E ∈ Σ and µsf E < ∞, then E can be expressed as E1 ∪ E2 where E1 , E2 ∈ Σ, µE1 = µsf E1
and µsf E2 = 0.
(iv) Show that if f is a µsf -integrable real-valued function on X, it is equal µsf -almost everywhere to a µ-integrable
function.
(v) Show that if (X, Σ, µsf ) is complete, so is (X, Σ, µ).
(vi) Show that µ and µsf have identical c.l.d. versions.

(d) Let (X, Σ, µ) be any measure space. Define µ̌ as in 213Xa. Show that (µ̌)sf , as constructed in 213Xc, is precisely
the c.l.d. version µ̃ of µ, so that µ̌ = µ̃ iff µ̌ is semi-finite.

(e) Let (X, Σ, µ) be a measure space. For A ⊆ X set µ_∗A = sup{µE : E ∈ Σ, µE < ∞, E ⊆ A}, as in 113Yh. (i)
Show that the measure constructed from µ_∗ by the method of 113Yg is just the c.l.d. version µ̃ of µ. (ii) Show that
µ̃_∗ = µ_∗. (iii) Show that if ν is another measure on X, with domain T, then µ̃ = ν̃ iff µ_∗ = ν_∗.

(f ) Let X be a set and θ an outer measure on X. Show that θsf , defined by writing
θsf A = sup{θB : B ⊆ A, θB < ∞}
is also an outer measure on X. Show that the measures defined by Carathéodory’s method from θ, θsf have the same
domains.

(g) Let (X, Σ, µ) be any measure space. Set


µ∗sf A = sup{µ∗ (A ∩ E) : E ∈ Σ, µE < ∞}
for every A ⊆ X.
(i) Show that
µ∗sf A = sup{µ∗ B : B ⊆ A, µ∗ B < ∞}
for every A.
(ii) Show that µ∗sf is an outer measure.
(iii) Show that if A ⊆ X and µ∗sf A < ∞, there is an E ∈ Σ such that µ∗sf A = µ∗ (A ∩ E) = µE, µ∗sf (A \ E) = 0.
(Hint: take a non-decreasing sequence ⟨En⟩_{n∈N} of measurable sets of finite measure such that
µ∗sf A = lim_{n→∞} µ∗ (A ∩ En ), and let E ⊆ ⋃_{n∈N} En be a measurable envelope of A ∩ ⋃_{n∈N} En .)
(iv) Show that the measure defined from µ∗sf by Carathéodory’s method is precisely the c.l.d. version µ̃ of µ.
(v) Show that µ∗sf = µ̃∗ , so that if µ is complete and locally determined then µ∗sf = µ∗ .


> (h) Let (X, Σ, µ) be a strictly localizable measure space with a decomposition ⟨Xi⟩_{i∈I} . Show that
µ∗ A = ∑_{i∈I} µ∗ (A ∩ Xi ) for every A ⊆ X.

> (i) Let (X, Σ, µ) be a complete locally determined measure space, and let A ⊆ X be such that max(µ∗ (E ∩
A), µ∗ (E \ A)) < µE whenever E ∈ Σ and 0 < µE < ∞. Show that A ∈ Σ. (Hint: given µF < ∞, consider the
intersection E of measurable envelopes of F ∩ A, F \ A to see that µ∗ (F ∩ A) + µ∗ (F \ A) = µF .)

> (j) Let (X, Σ, µ) be a measure space, µ̃ its c.l.d. version, and µ̌ the measure defined by Carathéodory’s method
from µ∗ . (i) Show that the following are equiveridical: (α) µ has locally determined negligible sets; (β) µ and µ̃ have
the same negligible sets; (γ) µ̌ = µ̃. (ii) Show that in this case µ is semi-finite.

(k) Let (X, Σ, µ) be a measure space. Show that the following are equiveridical: (i) (X, Σ, µ) has locally determined
negligible sets; (ii) the completion µ̂ and c.l.d. version µ̃ of µ have the same sets of finite measure; (iii) µ and µ̃ have
the same integrable functions; (iv) µ̃∗ = µ∗ ; (v) the outer measure µ∗sf of 213Xg is equal to µ∗ .

(l) Let us say that a measure space (X, Σ, µ) has the measurable envelope property if every subset of X has a
measurable envelope. (i) Show that a semi-finite space with the measurable envelope property has locally determined
negligible sets. (ii) Show that a complete semi-finite space with the measurable envelope property is locally determined.

(m) Let (X, Σ, µ) be a semi-finite measure space, and suppose that it satisfies the conclusion of Theorem 213N.
Show that it is localizable. (Hint: given E ⊆ Σ, set F = {F : F ∈ Σ, E ∩ F is negligible for every E ∈ E}. Let Φ be
the set of functions f from subsets of X to {0, 1} such that f −1 [{1}] ∈ E and f −1 [{0}] ∈ F.)

(n) Let (X, Σ, µ) be a measure space. Show that its c.l.d. version is strictly localizable iff there is a disjoint family
E ⊆ Σ such that µE < ∞ for every E ∈ E and whenever F ∈ Σ, 0 < µF < ∞ there is an E ∈ E such that µ(E ∩ F ) > 0.

(o) Show that the c.l.d. version of any point-supported measure is point-supported.

213Y Further exercises (a) Set X = N, and for A ⊆ X set


θA = √#(A) if A is finite, ∞ if A is infinite.
Show that θ is an outer measure on X, that θA = sup{θB : B ⊆ A, θB < ∞} for every A ⊆ X, but that the measure µ
defined from θ by Carathéodory’s method is not semi-finite. Show that if µ̌ is the measure defined by Carathéodory’s
method from µ∗ (213Xa), then µ̌ ≠ µ.

(b) Set X = [0, 1] × {0, 1}, and let Σ be the family of those subsets E of X such that
{x : x ∈ [0, 1], E[{x}] ≠ ∅, E[{x}] ≠ {0, 1}}
is countable, writing E[{x}] = {y : (x, y) ∈ E} for each x ∈ [0, 1]. Show that Σ is a σ-algebra of subsets of X. For
E ∈ Σ, set µE = #({x : (x, 1) ∈ E}) if this is finite, ∞ otherwise. Show that µ is a complete semi-finite measure.
Show that the measure µ̌ defined from µ∗ by Carathéodory’s method (213Xa) is not semi-finite. Show that the domain
of the c.l.d. version of µ is the whole of PX.

(c) Set X = N, and for A ⊆ X set
φA = #(A)² if A is finite, ∞ if A is infinite.
Show that φ satisfies the conditions of 113Yg/212Ya, but that the measure defined from φ by the method of 113Yg is
not semi-finite.

(d) Let (X, Σ, µ) be a complete locally determined measure space. Suppose that D ⊆ X and that f : D → R is a
function. Show that the following are equiveridical: (i) f is measurable; (ii)
µ∗ {x : x ∈ D ∩ E, f (x) ≤ a} + µ∗ {x : x ∈ D ∩ E, f (x) ≥ b} ≤ µE
whenever a < b in R, E ∈ Σ and µE < ∞; (iii)
max(µ∗ {x : x ∈ D ∩ E, f (x) ≤ a}, µ∗ {x : x ∈ D ∩ E, f (x) ≥ b}) < µE
whenever a < b in R and 0 < µE < ∞. (Hint: for (iii)⇒(i), show that if E ⊆ X then
µ∗ {x : x ∈ D ∩ E, f (x) > a} = supb>a µ∗ {x : x ∈ D ∩ E, f (x) ≥ b},
and use 213Xi above.)

(e) Let (X, Σ, µ) be a complete locally determined measure space and suppose that E ⊆ Σ is such that µE < ∞
for every E ∈ E and whenever F ∈ Σ and µF < ∞ there is a countable E0 ⊆ E such that F \ ⋃E0, F ∩ ⋃(E \ E0) are
negligible. Show that (X, Σ, µ) is strictly localizable.

(f) Let (X, Σ, µ) be a measure space. Show that µ is semi-finite iff there is a family E ⊆ Σ such that µE < ∞ for
every E ∈ E and µF = ∑_{E∈E} µ(F ∩ E) for every F ∈ Σ. (Hint: take E maximal subject to the intersection of any two
elements being negligible.)

213 Notes and comments I think it is fair to say that if the definition of ‘measure space’ were re-written to exclude
all spaces which are not semi-finite, nothing significant would be lost from the theory. There are solid reasons for not
taking such a drastic step, starting with the fact that it would confuse everyone (if you say to an unprepared audience
‘let (X, Σ, µ) be a measure space’, there is a danger that some will imagine that you mean ‘σ-finite measure space’,
but very few will suppose that you mean ‘semi-finite measure space’). But the whole point of measure theory is that
we distinguish between sets by their measures, and if every subset of E is either non-measurable, or negligible, or of
infinite measure, the classification is too crude to support most of the usual ideas, starting, of course, with ordinary
integration.
Let us say that a measurable set E is purely infinite if E itself and all its non-negligible measurable subsets have
infinite measure. On the definition of the integral which I chose in Volume 1, every simple function, and therefore every
integrable function, must be zero almost everywhere in E. This means that the whole theory of integration will ignore
E entirely. Looking at the definition of ‘c.l.d. version’ (213D-213E), you will see that the c.l.d. version of the measure
will render E negligible, as does the ‘semi-finite version’ described in 213Xc. These amendments do not, however, affect
sets of finite measure, and consequently leave integrable functions integrable, with the same integrals.
The strongest reason we have yet seen for admitting non-semi-finite spaces into consideration is that Carathéodory’s
method does not always produce semi-finite spaces. (I give examples in 213Ya-213Yb; more important ones are the
Hausdorff measures of §§264-265 below.) In practice the right thing to do is often to take the c.l.d. version of the
measure produced by Carathéodory’s construction.
It is a reasonable general philosophy, in measure theory, to say that we wish to measure as many sets, and integrate
as many functions, as we can manage in a canonical way – I mean, without making blatantly arbitrary choices about
the values we assign to our measure or integral. The revision of a measure µ to its c.l.d. version µ̃ is about as far as
we can go with an arbitrary measure space in which we have no other structure to guide our choices.
You will observe that µ̃ is not as close to µ as the completion µ̂ of µ is; naturally so, because if E ∈ Σ is purely
infinite for µ then we have to choose between setting µ̃E = 0 ≠ µE and finding some way of fitting many sets of
finite measure into E; which if E is a singleton will be actually impossible, and in any case would be an arbitrary
process. However the integrable functions for µ̃, while not always the same as those for µ (since µ̃ turns purely infinite
sets into negligible ones, so that their characteristic functions become integrable), are ‘nearly’ the same, in the sense
that any µ̃-integrable function can be changed into a µ-integrable function by adjusting it on a µ̃-negligible set. This
corresponds, of course, to the fact that any set of finite measure for µ̃ is the symmetric difference of a set of finite
measure for µ and a µ̃-negligible set. For sets of infinite measure this can fail, unless µ is localizable (213Hb, 213Xb).
If (X, Σ, µ) is semi-finite, or localizable, or strictly localizable, then of course it is correspondingly closer to (X, Σ̃, µ̃),
as detailed in 213Ha-c.
It is worth noting that while the measure µ̌ obtained by Carathéodory’s method directly from the outer measure µ∗
defined from µ may fail to be semi-finite, even when µ is (213Yb), a simple modification of µ∗ (213Xg) yields the c.l.d.
version µ̃ of µ, which can also be obtained from an appropriate inner measure (213Xe). The measure µ̌ is of course
related in other ways to µ̃; see 213Xd.

214 Subspaces
In §131 I described a construction for subspace measures on measurable subsets. It is now time to give the gener-
alization to subspace measures on arbitrary subsets of a measure space. The relationship between this construction
and the properties listed in §211 is not quite as straightforward as one might imagine, and in this section I try to give
a full account of what can be expected of subspaces in general. I think that for the present volume only (i) general
subspaces of σ-finite spaces and (ii) measurable subspaces of general measure spaces will be needed in any essential
way, and these do not give any difficulty; but in later volumes we shall need the full theory.
I begin with a general construction for ‘subspace measures’ (214A-214C), with an account of integration with respect
to a subspace measure (214E-214G); these (with 131E-131H) give a solid foundation for the concept of ‘integration
over a subset’ (214D). I present this work in its full natural generality, which will eventually be essential, but even for
Lebesgue measure alone it is important to be aware of the ideas here. I continue with answers to some obvious questions
concerning subspace measures and the properties of measure spaces so far considered, both for general subspaces (214I)
and for measurable subspaces (214K), and I mention a basic construction for assembling measure spaces side-by-side,
the ‘direct sums’ of 214L-214M. At the end of the section I discuss a measure extension problem (214O-214P).

214A Proposition Let (X, Σ, µ) be a measure space, and Y any subset of X. Let µ∗ be the outer measure defined
from µ (132A-132B), and set ΣY = {E ∩ Y : E ∈ Σ}; let µY be the restriction of µ∗ to ΣY . Then (Y, ΣY , µY ) is a
measure space.
proof (a) I have noted in 121A that ΣY is a σ-algebra of subsets of Y .
(b) Of course µY F ∈ [0, ∞] for every F ∈ ΣY .
(c) µY ∅ = µ∗ ∅ = 0.
(d) If ⟨Fn⟩n∈N is a disjoint sequence in ΣY with union F, then choose En, E′n, E ∈ Σ such that Fn = Y ∩ En, Fn ⊆ E′n
and µY Fn = µE′n for each n, F ⊆ E and µY F = µE (using 132Aa repeatedly). Set Gn = En ∩ E′n ∩ E \ ⋃_{m<n} Em for
each n ∈ N; then ⟨Gn⟩n∈N is disjoint and Fn ⊆ Gn ⊆ E′n for each n, so µY Fn = µGn. Also F ⊆ ⋃_{n∈N} Gn ⊆ E, so
µY F = µ(⋃_{n∈N} Gn) = ∑_{n=0}^∞ µGn = ∑_{n=0}^∞ µY Fn.
As ⟨Fn⟩n∈N is arbitrary, µY is a measure.
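The construction of 214A can be tried out on a finite space. The following is only a sketch under stated assumptions: the space X = {0, . . . , 5}, the partition into atoms and their masses are hypothetical, and Σ is taken to be the algebra generated by the atoms, so that µ∗A is the total mass of the atoms meeting A. The names ATOMS, sigma_algebra, outer and subspace are introduced here for illustration only.

```python
from itertools import combinations

# Hypothetical finite measure space: X = {0,...,5}; Σ is generated by the
# atoms below, each carrying the stated mass.
ATOMS = {frozenset({0, 1}): 1.0, frozenset({2, 3}): 2.0, frozenset({4, 5}): 0.5}

def sigma_algebra(atoms=ATOMS):
    """Every member of Σ, that is, every union of atoms."""
    atom_list = list(atoms)
    return [frozenset().union(*chosen)
            for r in range(len(atom_list) + 1)
            for chosen in combinations(atom_list, r)]

def outer(A, atoms=ATOMS):
    """µ*A = min{µE : E ∈ Σ, A ⊆ E}: the total mass of the atoms meeting A."""
    return sum(m for a, m in atoms.items() if a & A)

def subspace(Y, atoms=ATOMS):
    """Return {F: µ_Y F} for F in Σ_Y = {E ∩ Y : E ∈ Σ}, where µ_Y = µ*↾Σ_Y."""
    return {E & Y: outer(E & Y, atoms) for E in sigma_algebra(atoms)}

Y = frozenset({0, 2, 3})   # Y is not a member of Σ
mu_Y = subspace(Y)
for F, value in sorted(mu_Y.items(), key=lambda item: sorted(item[0])):
    print(sorted(F), value)
```

In this example ΣY has four members, and µY is additive on the disjoint pair {0}, {2, 3}, even though {0} itself does not belong to Σ.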

214B Definition If (X, Σ, µ) is any measure space and Y is any subset of X, then µY , defined as in 214A, is the
subspace measure on Y .
It is worth noting the following.

214C Lemma Let (X, Σ, µ) be a measure space, Y a subset of X, µY the subspace measure on Y and ΣY its
domain. Then
(a) for any F ∈ ΣY , there is an E ∈ Σ such that F = E ∩ Y and µE = µY F ;
(b) for any A ⊆ Y , A is µY -negligible iff it is µ-negligible;
(c)(i) if A ⊆ X is µ-conegligible, then A ∩ Y is µY -conegligible;
(ii) if A ⊆ Y is µY -conegligible, then A ∪ (X \ Y ) is µ-conegligible;
(d) (µY )∗ , the outer measure on Y defined from µY , agrees with µ∗ on PY ;
(e) if Z ⊆ Y ⊆ X, then ΣZ = (ΣY )Z , the subspace σ-algebra of subsets of Z regarded as a subspace of (Y, ΣY ), and
µZ = (µY )Z is the subspace measure on Z regarded as a subspace of (Y, µY );
(f) if Y ∈ Σ, then µY , as defined here, is exactly the subspace measure on Y defined in 131A-131B; that is,
ΣY = Σ ∩ PY and µY = µ↾ΣY .
proof (a) By the definition of ΣY , there is an E0 ∈ Σ such that F = E0 ∩ Y . By 132Aa, there is an E1 ∈ Σ such that
F ⊆ E1 and µ∗ F = µE1 . Set E = E0 ∩ E1 ; this serves.
(b)(i) If A is µY-negligible, there is a set F ∈ ΣY such that A ⊆ F and µY F = 0; now µ∗A ≤ µ∗F = 0 so A is
µ-negligible, by 132Ad. (ii) If A is µ-negligible, there is an E ∈ Σ such that A ⊆ E and µE = 0; now A ⊆ E ∩ Y ∈ ΣY
and µY(E ∩ Y) = 0, so A is µY-negligible.
(c) If A ⊆ X is µ-conegligible, then A ∩ Y is µY -conegligible, because Y \ A = Y ∩ (X \ A) is µ-negligible, therefore
µY -negligible. If A ⊆ Y is µY -conegligible, then A ∪ (X \ Y ) is µ-conegligible because X \ (A ∪ (X \ Y )) = Y \ A is
µY -negligible, therefore µ-negligible.
(d) Let A ⊆ Y . (i) If A ⊆ E ∈ Σ, then A ⊆ E ∩ Y ∈ ΣY , so µ∗Y A ≤ µY (E ∩ Y ) ≤ µE; as E is arbitrary, µ∗Y A ≤ µ∗ A.
(ii) If A ⊆ F ∈ ΣY , there is an E ∈ Σ such that F ⊆ E and µY F = µ∗ F = µE; now A ⊆ E so µ∗ A ≤ µE = µY F . As
F is arbitrary, µ∗ A ≤ µ∗Y A.
(e) That ΣZ = (ΣY )Z is immediate from the definition of ΣY , etc.; now
(µY )Z = µ∗Y ↾ΣZ = µ∗ ↾ΣZ = µZ
by (d).
(f ) This is elementary, because E ∩ Y ∈ Σ and µ∗ (E ∩ Y ) = µ(E ∩ Y ) for every E ∈ Σ.

214D Integration over subsets: Definition Let (X, Σ, µ) be a measure space, Y a subset of X and f a
[−∞, ∞]-valued function defined on a subset of X. By ∫_Y f (or ∫_Y f(x)µ(dx), etc.) I mean ∫(f↾Y)dµY, if this exists in
[−∞, ∞], following the definitions of 214A-214B, 133A and 135F, and taking dom(f↾Y) = Y ∩ dom f, (f↾Y)(x) = f(x)
for x ∈ Y ∩ dom f. (Compare 131D.)

214E Proposition Let (X, Σ, µ) be a measure space, Y ⊆ X, and f a [−∞, ∞]-valued function defined on a subset
dom f of X.
(a) If f is µ-integrable then f↾Y is µY-integrable, and ∫_Y f ≤ ∫ f if f is non-negative.
(b) If dom f ⊆ Y and f is µY-integrable, then there is a µ-integrable function f̃ on X, extending f, such that
∫_F f̃ = ∫_{F∩Y} f for every F ∈ Σ.
proof (a)(i) If f is µ-simple, it is expressible as ∑_{i=0}^n ai χEi, where E0, . . . , En ∈ Σ, a0, . . . , an ∈ R and µEi < ∞ for
each i. Now f↾Y = ∑_{i=0}^n ai χY(Ei ∩ Y), where χY(Ei ∩ Y) = (χEi)↾Y is the characteristic function of Ei ∩ Y regarded
as a subset of Y; and each Ei ∩ Y belongs to ΣY, with µY(Ei ∩ Y) ≤ µEi < ∞, so f↾Y : Y → R is µY-simple.
If f : X → R is a non-negative simple function, it is expressible as ∑_{i=0}^n ai χEi where E0, . . . , En are disjoint sets of
finite measure (122Cb). Now f↾Y = ∑_{i=0}^n ai χY(Ei ∩ Y) and
∫(f↾Y)dµY = ∑_{i=0}^n ai µY(Ei ∩ Y) ≤ ∑_{i=0}^n ai µEi = ∫ f dµ
because ai ≥ 0 whenever Ei ≠ ∅, so that ai µY(Ei ∩ Y) ≤ ai µEi for every i.
(ii) If f is a non-negative µ-integrable function, there is a non-decreasing sequence ⟨fn⟩n∈N of non-negative
µ-simple functions converging to f µ-almost everywhere; now ⟨fn↾Y⟩n∈N is a non-decreasing sequence of µY-simple
functions increasing to f↾Y µY-a.e. (by 214Cb), and
sup_{n∈N} ∫(fn↾Y)dµY ≤ sup_{n∈N} ∫ fn dµ = ∫ f dµ < ∞,
so ∫(f↾Y)dµY exists and is at most ∫ f dµ.
(iii) Finally, if f is any µ-integrable real-valued function, it is expressible as f1 − f2 where f1 and f2 are non-
negative µ-integrable functions, so that f ↾ Y = (f1 ↾ Y ) − (f2 ↾ Y ) is µY -integrable.
(b) Let us say that if f is a µY-integrable function, then an ‘enveloping extension’ of f is a µ-integrable function f̃,
extending f, real-valued on X \ Y, such that ∫_F f̃ = ∫_{F∩Y} f for every F ∈ Σ.
(i) If f is of the form χH, where H ∈ ΣY and µY H < ∞, let E0 ∈ Σ be such that H = Y ∩ E0 and E1 ∈ Σ a
measurable envelope for H (132Ee); then E = E0 ∩ E1 is a measurable envelope for H and H = E ∩ Y. Set f̃ = χE,
regarded as a function from X to {0, 1}. Then f̃↾Y = f, and for any F ∈ Σ we have
∫_F f̃ = µ_F(E ∩ F) = µ(E ∩ F) = µ∗(H ∩ F) = µ_{Y∩F}(H ∩ F) = ∫_{Y∩F} f.

So f˜ is an enveloping extension of f .
(ii) If f, g are µY-integrable functions with enveloping extensions f̃, g̃, and a, b ∈ R, then af̃ + bg̃ extends af + bg
and
∫_F (af̃ + bg̃) = a∫_F f̃ + b∫_F g̃ = a∫_{F∩Y} f + b∫_{F∩Y} g = ∫_{F∩Y} (af + bg)
for every F ∈ Σ, so af̃ + bg̃ is an enveloping extension of af + bg.
(iii) Putting (i) and (ii) together, we see that every µY -simple function f has an enveloping extension.
(iv) Now suppose that ⟨fn⟩n∈N is a non-decreasing sequence of non-negative µY-simple functions converging µY-almost
everywhere to a µY-integrable function f. For each n ∈ N let f̃n be an enveloping extension of fn. Then
f̃n ≤a.e. f̃n+1. P P If F ∈ Σ then
∫_F f̃n = ∫_{F∩Y} fn ≤ ∫_{F∩Y} fn+1 = ∫_F f̃n+1.
So f̃n ≤a.e. f̃n+1, by 131Ha. Q Q Also
lim_{n→∞} ∫_F f̃n = lim_{n→∞} ∫_{F∩Y} fn = ∫_{F∩Y} f
for every F ∈ Σ. Taking F = X to begin with, B.Levi’s theorem tells us that h = lim_{n→∞} f̃n is defined (as a
real-valued function) µ-almost everywhere; now letting F vary, we have ∫_F h = ∫_{F∩Y} f for every F ∈ Σ, because
h↾F = lim_{n→∞} f̃n↾F µ_F-a.e. (I seem to be using 214Cb here.) Now h↾Y = f µY-a.e., by 214Cb again. If we define f̃
by setting
f̃(x) = f(x) for x ∈ dom f, h(x) for x ∈ dom h \ dom f, 0 for other x ∈ X,
then f̃ is defined everywhere in X and is equal to h µ-almost everywhere; so that if F ∈ Σ, f̃↾F will be equal to
h↾F µ_F-almost everywhere, and
∫_F f̃ = ∫_F h = ∫_{F∩Y} f.

As F is arbitrary, f˜ is an enveloping extension of f .


(v) Thus every non-negative µY -integrable function has an enveloping extension. Using (ii) again, every µY -
integrable function has an enveloping extension, as claimed.

214F Proposition Let (X, Σ, µ) be a measure space, Y a subset of X, and f a [−∞, ∞]-valued function such that
∫_X f is defined in [−∞, ∞]. If either Y is of full outer measure in X or f is zero almost everywhere in X \ Y, then
∫_Y f is defined and equal to ∫_X f.
proof (a) Suppose first that f is non-negative, Σ-measurable and defined everywhere in X. In this case f↾Y is
ΣY-measurable. Set Fnk = {x : x ∈ X, f(x) ≥ 2^{−n}k} for k, n ∈ N, fn = ∑_{k=1}^{4^n} 2^{−n} χFnk for n ∈ N, so that ⟨fn⟩n∈N is a
non-decreasing sequence of real-valued measurable functions converging everywhere to f, and ∫_X f = lim_{n→∞} ∫_X fn.
For each n ∈ N and k ≥ 1,
µY(Fnk ∩ Y) = µ∗(Fnk ∩ Y) = µFnk
either because Fnk \ Y is negligible or because X is a measurable envelope of Y. So
∫_Y f = lim_{n→∞} ∫_Y fn = lim_{n→∞} ∑_{k=1}^{4^n} 2^{−n} µY(Fnk ∩ Y)
= lim_{n→∞} ∑_{k=1}^{4^n} 2^{−n} µFnk = lim_{n→∞} ∫_X fn = ∫_X f.

(b) Now suppose that f is non-negative, defined almost everywhere in X and µ-virtually measurable. In this case
there is a conegligible measurable set E ⊆ dom f such that f ↾E is measurable. Set f˜(x) = f (x) for x ∈ E, 0 for
x ∈ X \ E; then f˜ satisfies the conditions of (a) and f = f˜ µ-a.e. Accordingly f ↾ Y = f˜↾ Y µY -a.e. (214Cc), and
∫_Y f = ∫_Y f̃ = ∫_X f̃ = ∫_X f.

(c) Finally, for the general case, we can apply (b) to the positive and negative parts f⁺, f⁻ of f to get
∫_Y f = ∫_Y f⁺ − ∫_Y f⁻ = ∫_X f⁺ − ∫_X f⁻ = ∫_X f.

214G Corollary Let (X, Σ, µ) be a measure space, Y a subset of X, and E ∈ Σ a measurable envelope of Y. If f
is a [−∞, ∞]-valued function such that ∫_E f is defined in [−∞, ∞], then ∫_Y f is defined and equal to ∫_E f.
proof By 214Ce, we can identify the subspace measure µY with the subspace measure (µE )Y induced by the subspace
measure on E. Now, regarded as a subspace of E, Y is of full outer measure, so 214F gives the result.
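A finite illustration of 214F and 214G may help. This is a sketch only, under hypothetical data: X = {0, . . . , 5}, Σ is generated by three atoms with assumed masses, and f is given by a table of values constant on each atom (so that it is Σ-measurable). The helper names outer, integral_over and envelope are mine, not part of the text.

```python
# Hypothetical finite space: Σ is generated by these atoms, with assumed masses.
ATOMS = {frozenset({0, 1}): 1.0, frozenset({2, 3}): 2.0, frozenset({4, 5}): 0.5}
X = frozenset(range(6))

def outer(A):
    """µ*A: total mass of the atoms of Σ meeting A."""
    return sum(m for a, m in ATOMS.items() if a & A)

def integral_over(Y, f):
    """∫_Y f = ∫ (f↾Y) dµ_Y for f constant on each atom of Σ.

    The atoms of Σ_Y are the non-empty traces a ∩ Y, with µ_Y(a ∩ Y) = µ*(a ∩ Y).
    """
    total = 0.0
    for a in ATOMS:
        trace = a & Y
        if trace:
            total += f[min(trace)] * outer(trace)   # f is constant on the trace
    return total

def envelope(Y):
    """Union of the atoms meeting Y; in this finite setting a measurable envelope of Y."""
    return frozenset().union(*(a for a in ATOMS if a & Y))

f = {0: 3.0, 1: 3.0, 2: -1.0, 3: -1.0, 4: 7.0, 5: 7.0}
Y = frozenset({0, 2})            # not of full outer measure in X
E = envelope(Y)
Z = frozenset({1, 3, 5})         # meets every atom, so of full outer measure

print(integral_over(Y, f), integral_over(E, f))   # these agree, as in 214G
print(integral_over(Z, f), integral_over(X, f))   # these agree, as in 214F
```

Here the integral over Y differs from the integral over X, because Y misses the atom {4, 5} on which f is not zero; neither hypothesis of 214F is satisfied for Y.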

214H Subspaces and Carathéodory’s method The following easy technical results will occasionally be useful.
Lemma Let X be a set, Y ⊆ X a subset, and θ an outer measure on X.
(a) θY = θ↾ PY is an outer measure on Y .
(b) Let µ, ν be the measures on X, Y defined by Carathéodory’s method from the outer measures θ, θY , and Σ, T
their domains; let µY be the subspace measure on Y induced by µ, and ΣY its domain. Then
(i) ΣY ⊆ T and νF ≤ µY F for every F ∈ ΣY ;
(ii) if Y ∈ Σ then ν = µY ;
(iii) if θ = µ∗ (that is, θ is ‘regular’) then ν extends µY ;
(iv) if θ = µ∗ and θY < ∞ then ν = µY .
proof (a) You have only to read the definition of ‘outer measure’ (113A).
(b)(i) Suppose that F ∈ ΣY . Then it is of the form E ∩ Y where E ∈ Σ. If A ⊆ Y , then
θY (A ∩ F ) + θY (A \ F ) = θ(A ∩ F ) + θ(A \ F ) = θ(A ∩ E) + θ(A \ E) = θA = θY A,
so F ∈ T. Now
νF = θY F = θF ≤ µ∗ F = µY F .

(ii) Suppose that F ∈ T. If A ⊆ X, then

θA = θ(A ∩ Y ) + θ(A \ Y ) = θY (A ∩ Y ) + θ(A \ Y )


= θY (A ∩ Y ∩ F ) + θY (A ∩ Y \ F ) + θ(A \ Y )
= θ(A ∩ F ) + θ(A ∩ Y \ F ) + θ(A \ Y )
= θ(A ∩ F ) + θ((A \ F ) ∩ Y ) + θ((A \ F ) \ Y ) = θ(A ∩ F ) + θ(A \ F );
as A is arbitrary, F ∈ Σ and therefore F ∈ ΣY . Also
µY F = µF = θF = θY F = νF .
Putting this together with (i), we see that µY and ν are identical.
(iii) Let F ∈ ΣY . Then F ∈ T, by (i). Now νF = θF = µ∗ F = µY F . As F is arbitrary, ν extends µY .
(iv) Now suppose that F ∈ T. Because µ∗ Y = θY < ∞, we have measurable envelopes E1 , E2 of F and Y \ F
for µ (132Ee). Then

θY = θY Y = θY F + θY (Y \ F ) = θF + θ(Y \ F )
= µ∗ F + µ∗ (Y \ F ) = µE1 + µE2 ≥ µ(E1 ∪ E2 ) = θ(E1 ∪ E2 ) ≥ θY,
so µE1 + µE2 = µ(E1 ∪ E2 ) and
µ(E1 ∩ E2 ) = µE1 + µE2 − µ(E1 ∪ E2 ) = 0.
As µ is complete (212A) and E1 ∩ Y \ F ⊆ E1 ∩ E2 is µ-negligible, therefore belongs to Σ, F = Y ∩ (E1 \ (E1 ∩ Y \ F ))
belongs to ΣY . Thus T ⊆ ΣY ; putting this together with (iii), we see that ν = µY .
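The regularity hypothesis in (b-iii) cannot simply be dropped. The following brute-force sketch, on a hypothetical three-point set, uses the outer measure θA = √#(A) (a finite analogue of 213Ya, chosen by me for illustration); the function names subsets, caratheodory and mu_star are assumptions of the sketch. It computes Σ, T and ΣY directly from the definitions and compares νY with µY Y.

```python
from itertools import combinations
from math import isclose, sqrt

def subsets(S):
    """All subsets of the finite set S."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def caratheodory(theta, S):
    """Sets E ⊆ S such that θA = θ(A ∩ E) + θ(A minus E) for every A ⊆ S."""
    return [E for E in subsets(S)
            if all(isclose(theta(A), theta(A & E) + theta(A - E))
                   for A in subsets(S))]

def theta(A):
    """A subadditive, monotonic set function with θ∅ = 0; it is not regular."""
    return sqrt(len(A))

X = frozenset({0, 1, 2})
Y = frozenset({0, 1})

Sigma = caratheodory(theta, X)        # domain of µ
T = caratheodory(theta, Y)            # domain of ν, defined from θ_Y = θ↾PY
Sigma_Y = {E & Y for E in Sigma}      # domain of the subspace measure µ_Y

def mu_star(A):
    """Outer measure defined from µ = θ↾Σ."""
    return min(theta(E) for E in Sigma if A <= E)

nu_Y = theta(Y)           # νY = θ_Y(Y)
mu_Y_Y = mu_star(Y)       # µ_Y(Y) = µ*(Y)
print(sorted(map(sorted, Sigma)), nu_Y, mu_Y_Y)
```

In this example Σ = {∅, X} and T = ΣY = {∅, Y}, in accordance with (b-i), but νY = √2 is strictly less than µY Y = √3, so ν does not extend µY.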

214I I now turn to the relationships between subspace measures and the classification of measure spaces developed
in this chapter.
Theorem Let (X, Σ, µ) be a measure space and Y a subset of X. Let µY be the subspace measure on Y and ΣY its
domain.
(a) If (X, Σ, µ) is complete, or totally finite, or σ-finite, or strictly localizable, so is (Y, ΣY, µY). If ⟨Xi⟩i∈I is a
decomposition of X for µ, then ⟨Xi ∩ Y⟩i∈I is a decomposition of Y for µY.
(b) Writing µ̂ for the completion of µ, the subspace measure µ̂Y = (µ̂)Y is the completion of µY .
(c) If (X, Σ, µ) has locally determined negligible sets, then µY is semi-finite.
(d) If (X, Σ, µ) is complete and locally determined, then (Y, ΣY , µY ) is complete and semi-finite.
(e) If (X, Σ, µ) is complete, locally determined and localizable then so is (Y, ΣY , µY ).
proof (a)(i) Suppose that (X, Σ, µ) is complete. If A ⊆ U ∈ ΣY and µY U = 0, there is an E ∈ Σ such that U = E ∩ Y
and µE = µY U = 0; now A ⊆ E so A ∈ Σ and A = A ∩ Y ∈ ΣY .
(ii) µY Y = µ∗ Y ≤ µX, so µY is totally finite if µ is.
(iii) If ⟨Xn⟩n∈N is a sequence of sets of finite measure for µ which covers X, then ⟨Xn ∩ Y⟩n∈N is a sequence of
sets of finite measure for µY which covers Y. So (Y, ΣY, µY) is σ-finite if (X, Σ, µ) is.
(iv) Suppose that ⟨Xi⟩i∈I is a decomposition of X for µ. Then ⟨Xi ∩ Y⟩i∈I is a decomposition of Y for µY. P P
Because µY(Xi ∩ Y) ≤ µXi < ∞ for each i, ⟨Xi ∩ Y⟩i∈I is a partition of Y into sets of finite measure. Suppose that
U ⊆ Y is such that Ui = U ∩ Xi ∩ Y ∈ ΣY for every i. For each i ∈ I, choose Ei ∈ Σ such that Ui = Ei ∩ Y and
µEi = µY Ui; we may of course suppose that Ei ⊆ Xi. Set E = ⋃_{i∈I} Ei. Then E ∩ Xi = Ei ∈ Σ for every i, so E ∈ Σ
and µE = ∑_{i∈I} µEi. Now U = E ∩ Y so U ∈ ΣY and
µY U ≤ µE = ∑_{i∈I} µEi = ∑_{i∈I} µY Ui.
On the other hand, µY U is surely greater than or equal to ∑_{i∈I} µY Ui = sup_{J⊆I is finite} ∑_{i∈J} µY Ui, so they are equal.
As U is arbitrary, ⟨Xi ∩ Y⟩i∈I is a decomposition of Y for µY. Q Q
Consequently (Y, ΣY , µY ) is strictly localizable if (X, Σ, µ) is.
(b) The domain of the completion (µY)ˆ is

Σ̂Y = {F△A : F ∈ ΣY, A ⊆ Y is µY-negligible}
= {(E ∩ Y)△(A ∩ Y) : E ∈ Σ, A ⊆ X is µ-negligible}
(214Cb)
= {(E△A) ∩ Y : E ∈ Σ, A is µ-negligible} = dom µ̂Y.

If H ∈ Σ̂Y then
(µY)ˆ(H) = µ∗Y H = µ∗H = (µ̂)∗H = µ̂Y H,
using 214Cd for the second step, and 212Ea for the third.
(c) Take U ∈ ΣY such that µY U > 0. Then there is an E ∈ Σ such that µE < ∞ and µ∗(E ∩ U) > 0. P P ??
Otherwise, E ∩ U is µ-negligible whenever µE < ∞; because µ has locally determined negligible sets, U is µ-negligible
and µY U = µ∗U = 0. X X Q Q Now E ∩ U ∈ ΣY and
0 < µ∗ (E ∩ U ) = µY (E ∩ U ) ≤ µE < ∞.

(d) By (a), µY is complete; by (c) and 213J, it is semi-finite.


(e) By (d), µY is complete and semi-finite. To see that it is locally determined, take any U ⊆ Y such that U ∩V ∈ ΣY
whenever V ∈ ΣY and µY V < ∞. By 213L and 213J, there is a measurable envelope E of U for µ; of course E ∩Y ∈ ΣY .
I claim that µ(E ∩ Y \ U) = 0. P P Take any F ∈ Σ with µF < ∞. Then F ∩ U ∈ ΣY, so
µY(F ∩ E ∩ Y) ≤ µ(F ∩ E) = µ∗(F ∩ U) = µY(F ∩ U) ≤ µY(F ∩ E ∩ Y);
thus µY(F ∩ E ∩ Y) = µY(F ∩ U) and
µ∗(F ∩ E ∩ Y \ U) = µY(F ∩ E ∩ Y \ U) = 0.
Because µ is complete, µ(F ∩ E ∩ Y \ U) = 0; because µ is locally determined and F is arbitrary, µ(E ∩ Y \ U) = 0.
Q Q But this means that E ∩ Y \ U ∈ ΣY and U ∈ ΣY. As U is arbitrary, µY is locally determined.
To see that µY is localizable, let U be any family in ΣY . Set
E = {E : E ∈ Σ, µE < ∞, µE = µ∗ (E ∩ U ) for some U ∈ U },
and let G ∈ Σ be an essential supremum for E in Σ. I claim that G ∩ Y is an essential supremum for U in ΣY . P P (i)
?? If U ∈ U and U \ (G ∩ Y ) is not negligible, then (because µY is semi-finite) there is a V ∈ ΣY such that V ⊆ U \ G
and 0 < µY V < ∞. Now there is an E ∈ Σ such that V ⊆ E and µE = µ∗ V . We have µ∗ (E ∩ U ) ≥ µ∗ V = µE,
so E ∈ E and E \ G must be negligible; but V ⊆ E \ G is not negligible. X X Thus U \ (G ∩ Y ) is negligible for every
U ∈ U. (ii) If W ∈ ΣY is such that U \ W is negligible for every U ∈ U, express W as H ∩ Y where H ∈ Σ. If E ∈ E,
there is a U ∈ U such that µE = µ∗(E ∩ U); now µ∗(E ∩ U \ W) = 0, so µE = µ∗(E ∩ U ∩ W) ≤ µ(E ∩ H) and
E \ H is negligible. As E is arbitrary, H is an essential upper bound for E and G \ H is negligible; but this means that
G ∩ Y \ W is negligible. As W is arbitrary, G ∩ Y is an essential supremum for U. Q Q
As U is arbitrary, µY is localizable.

214J Upper and lower integrals The following elementary facts are sometimes useful.
Proposition Let (X, Σ, µ) be a measure space, A a subset of X and f a real-valued function defined almost everywhere
in X. Then
(a) if either f is non-negative or A has full outer measure in X, ∫̅(f↾A)dµA ≤ ∫̅ f dµ;
(b) if A has full outer measure in X, ∫̲ f dµ ≤ ∫̲(f↾A)dµA.
proof (a)(i) Suppose that f is non-negative. If ∫̅ f dµ = ∞, the result is trivial. Otherwise, there is a µ-integrable
function g such that f ≤ g µ-a.e. and ∫̅ f dµ = ∫ g dµ, by 133Ja. Now f↾A ≤ g↾A µA-a.e., by 214Cb, and ∫(g↾A)dµA
is defined and less than or equal to ∫ g dµ, by 214Ea; so
∫̅(f↾A)dµA ≤ ∫(g↾A)dµA ≤ ∫ g dµ = ∫̅ f dµ.
(ii) Now suppose that A has full outer measure in X. If g is such that f ≤ g µ-a.e. and ∫ g dµ is defined in
[−∞, ∞], then f↾A ≤ g↾A µA-a.e. and ∫(g↾A)dµA = ∫ g dµ, by 214F. So ∫̅(f↾A)dµA ≤ ∫ g dµ. As g is arbitrary,
∫̅(f↾A)dµA ≤ ∫̅ f dµ.
(b) Apply (a) to −f, and use 133J(b-iv).

214K Measurable subspaces: Proposition Let (X, Σ, µ) be a measure space.


(a) Let E ∈ Σ and let µE be the subspace measure, with ΣE its domain. If (X, Σ, µ) is complete, or totally finite,
or σ-finite, or strictly localizable, or semi-finite, or localizable, or locally determined, or atomless, or purely atomic, so
is (E, ΣE , µE ).
(b) Suppose that ⟨Xi⟩i∈I is a partition of X into measurable sets (not necessarily of finite measure) such that
Σ = {E : E ⊆ X, E ∩ Xi ∈ Σ for every i ∈ I},
µE = ∑_{i∈I} µ(E ∩ Xi) for every E ∈ Σ.
Then (X, Σ, µ) is complete, or strictly localizable, or semi-finite, or localizable, or locally determined, or atomless, or
purely atomic, iff (Xi , ΣXi , µXi ) has that property for every i ∈ I.
proof I really think that if you have read attentively up to this point, you ought to find this easy. If you are in any
doubt, this makes a very suitable set of sixteen exercises to do.
214L Direct sums Let ⟨(Xi, Σi, µi)⟩i∈I be any indexed family of measure spaces. Set X = ⋃_{i∈I} (Xi × {i}); for
E ⊆ X, i ∈ I set Ei = {x : (x, i) ∈ E}. Write
Σ = {E : E ⊆ X, Ei ∈ Σi for every i ∈ I},
µE = ∑_{i∈I} µi Ei for every E ∈ Σ.
Then it is easy to check that (X, Σ, µ) is a measure space; I will call it the direct sum of the family ⟨(Xi, Σi, µi)⟩i∈I.
Note that if (X, Σ, µ) is any strictly localizable measure space, with decomposition ⟨Xi⟩i∈I, then we have a natural
isomorphism between (X, Σ, µ) and the direct sum (X′, Σ′, µ′) = ⊕_{i∈I} (Xi, ΣXi, µXi) of the subspace measures, if we
match (x, i) ∈ X′ with x ∈ X for every i ∈ I and x ∈ Xi.
For some of the elementary properties (to put it plainly, I know of no properties which are not elementary) of direct
sums, see 214M and 214Xh-214Xk.

214M Proposition Let ⟨(Xi, Σi, µi)⟩i∈I be a family of measure spaces, with direct sum (X, Σ, µ). Let f be a
real-valued function defined on a subset of X. For each i ∈ I, set fi(x) = f(x, i) whenever (x, i) ∈ dom f.
(a) f is measurable iff fi is measurable for every i ∈ I.
(b) If f is non-negative, then ∫ f dµ = ∑_{i∈I} ∫ fi dµi if either is defined in [0, ∞].
proof (a) For a ∈ R, set Fa = {(x, i) : (x, i) ∈ dom f, f(x, i) ≥ a}. (i) If f is measurable, i ∈ I and a ∈ R, then there
is an E ∈ Σ such that Fa = E ∩ dom f; now
{x : fi(x) ≥ a} = dom fi ∩ {x : (x, i) ∈ E}
belongs to the subspace σ-algebra on dom fi induced by Σi. As a is arbitrary, fi is measurable. (ii) If every fi is
measurable and a ∈ R, then for each i ∈ I there is an Ei ∈ Σi such that {x : (x, i) ∈ Fa} = Ei ∩ dom fi; setting
E = {(x, i) : i ∈ I, x ∈ Ei}, Fa = dom f ∩ E belongs to the subspace σ-algebra on dom f. As a is arbitrary, f is
measurable.
(b)(i) Suppose first that f is measurable and defined everywhere. Set Fnk = {(x, i) : (x, i) ∈ X, f(x, i) ≥ 2^{−n}k} for
k, n ∈ N, gn = ∑_{k=1}^{4^n} 2^{−n} χFnk for n ∈ N, Fnki = {x : (x, i) ∈ Fnk} for k, n ∈ N and i ∈ I, gni(x) = gn(x, i) for i ∈ I,
x ∈ Xi. Then
∫ f dµ = lim_{n→∞} ∫ gn dµ = sup_{n∈N} ∑_{k=1}^{4^n} 2^{−n} µFnk
= sup_{n∈N} ∑_{k=1}^{4^n} 2^{−n} ∑_{i∈I} µi Fnki = sup_{n∈N} ∑_{i∈I} ∑_{k=1}^{4^n} 2^{−n} µi Fnki
= sup_{n∈N} ∑_{i∈I} ∫ gni dµi = ∑_{i∈I} ∫ fi dµi.

(ii) Generally, if ∫ f dµ is defined, there are a measurable g : X → [0, ∞[ and a conegligible measurable set E ⊆
dom f such that g = f on E. Now Ei = {x : (x, i) ∈ E} belongs to Σi for each i, and ∑_{i∈I} µi(Xi \ Ei) = µ(X \ E) = 0,
so Ei is µi-conegligible for every i. Setting gi(x) = g(x, i) for x ∈ Xi, (i) tells us that
∑_{i∈I} ∫ fi dµi = ∑_{i∈I} ∫ gi dµi = ∫ g dµ = ∫ f dµ.
(iii) On the other hand, if ∫ fi dµi is defined for every i ∈ I, then for each i ∈ I we can find a measurable function
gi : Xi → [0, ∞[ and a µi-conegligible measurable set Ei ⊆ dom fi such that gi = fi on Ei. Setting g(x, i) = gi(x) for
i ∈ I, x ∈ Xi, (a) tells us that g is measurable, while g = f on {(x, i) : i ∈ I, x ∈ Ei}, which is conegligible (by the
calculation in (ii) just above); so
∫ f dµ = ∫ g dµ = ∑_{i∈I} ∫ gi dµi = ∑_{i∈I} ∫ fi dµi,
again using (i) for the middle step.
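The formula of 214M(b) can be checked by hand on a small example. The sketch below assumes finite spaces in which every subset is measurable, so that each µi is determined by its point masses; the index set {"a", "b"}, the masses and the function f are all hypothetical, and direct_sum and integral are names introduced only for this illustration.

```python
def direct_sum(spaces):
    """Direct sum of finite spaces given as {i: {x: mass of {x}}}.

    Each Σ_i is assumed to be the whole power set of X_i, so Σ is the power
    set of X = ⋃ (X_i × {i}) and µ is determined by the point masses.
    """
    return {(x, i): m for i, masses in spaces.items() for x, m in masses.items()}

def integral(f, masses):
    """∫ f dµ for a non-negative f on a finite point-supported space."""
    return sum(f(p) * m for p, m in masses.items())

spaces = {"a": {0: 0.5, 1: 1.5}, "b": {0: 2.0}}
mu = direct_sum(spaces)

def f(point):
    """A hypothetical non-negative function on the direct sum."""
    x, i = point
    return x + (1 if i == "a" else 3)

whole = integral(f, mu)
# f_i(x) = f(x, i); the default argument i=i fixes the index for each summand.
parts = sum(integral(lambda x, i=i: f((x, i)), masses)
            for i, masses in spaces.items())
print(whole, parts)
```

Both quantities are computed from the same point masses, once over X and once component by component, which is all that the proposition asserts in this trivial case.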

214N Corollary Let (X, Σ, µ) be a measure space with a decomposition ⟨Xi⟩i∈I. If f is a real-valued function
defined on a subset of X, then
(a) f is measurable iff f↾Xi is measurable for every i ∈ I,
(b) if f ≥ 0, then ∫ f = ∑_{i∈I} ∫_{Xi} f if either is defined in [0, ∞].
proof Apply 214M to the direct sum of ⟨(Xi, ΣXi, µXi)⟩i∈I, identified with (X, Σ, µ) as in 214L.

*214O I make space here for a general theorem which puts rather heavy demands on the reader. So I ought to say
that I advise skipping it on first reading. It will not be quoted in this volume; in the full form here I do not expect
to use it anywhere in this treatise; only the special case of 214Xm is at all often applied; and the proof depends on a
concept (‘ideal of sets’) and a technique (‘transfinite induction’, part (d) of the proof of 214P) which are used nowhere
else in this volume. However, ‘extension of measures’ is one of the central themes of Volume 4, and this result may
help to make sense of some of the patterns which will appear there.
Lemma Let (X, Σ, µ) be a measure space, and I an ideal of subsets of X, that is, a family of subsets of X such that
∅ ∈ I, I ∪ J ∈ I for all I, J ∈ I, and I ∈ I whenever I ⊆ J ∈ I. Then there is a measure λ on X such that
Σ ∪ I ⊆ dom λ, µE = λE + supI∈I µ∗ (E ∩ I) for every E ∈ Σ, and λI = 0 for every I ∈ I.
proof (a) Let Λ be the set of those F ⊆ X such that there are E ∈ Σ and a countable J ⊆ I such that E△F ⊆ ⋃J.
Then Λ is a σ-algebra of subsets of X including Σ ∪ I. P P Σ ⊆ Λ because E△E ⊆ ⋃∅ for every E ∈ Σ. I ⊆ Λ because
∅△I ⊆ ⋃{I} for every I ∈ I. In particular, ∅ ∈ Λ. If F ∈ Λ, let E ∈ Σ and J ⊆ I be such that J is countable and
F△E ⊆ ⋃J; then (X \ F)△(X \ E) ⊆ ⋃J so X \ F ∈ Λ. If ⟨Fn⟩n∈N is a sequence in Λ with union F, then for
each n ∈ N choose En ∈ Σ, Jn ⊆ I such that Jn is countable and En△Fn ⊆ ⋃Jn; then E = ⋃_{n∈N} En belongs to Σ,
J = ⋃_{n∈N} Jn is a countable subset of I and E△F ⊆ ⋃J, so F ∈ Λ. Thus Λ is a σ-algebra. Q Q
(b) For F ∈ Λ set
λF = sup{µE : E ∈ Σ, E ⊆ F, µ∗(E ∩ I) = 0 for every I ∈ I}.
Then λ is a measure. P P The only subset of ∅ is ∅, so λ∅ = 0. Let ⟨Fn⟩n∈N be a disjoint sequence in Λ with union F;
set u = ∑_{n=0}^∞ λFn. (i) If E ∈ Σ, E ⊆ F and µ∗(E ∩ I) = 0 for every I ∈ I, then for each n set En = E ∩ Fn. As
µ∗(En ∩ I) = 0 for every I ∈ I, µEn ≤ λFn for each n. Now ⟨En⟩n∈N is disjoint and has union E, so
µE = ∑_{n=0}^∞ µEn ≤ ∑_{n=0}^∞ λFn = u.
As E is arbitrary, λF ≤ u. (ii) Take any γ < u. For n ∈ N, set γn = λFn − 2^{−n−1} min(1, u − γ) if λFn is finite, γ
otherwise. For each n, we can find an En ∈ Σ such that En ⊆ Fn, µ∗(En ∩ I) = 0 for every I ∈ I, and µEn ≥ γn. Set
E = ⋃_{n∈N} En; then E ⊆ F and E ∩ I = ⋃_{n∈N} En ∩ I is µ-negligible for every I ∈ I, so λF ≥ µE = ∑_{n=0}^∞ µEn ≥ γ.
As γ is arbitrary, λF ≥ u. (iii) As ⟨Fn⟩n∈N is arbitrary, λ is a measure. Q Q
(c) Now take any E ∈ Σ and set u = sup_{I∈I} µ∗(E ∩ I). If u = ∞ then we certainly have µE = ∞ = λE + u.
Otherwise, let ⟨In⟩n∈N be a sequence in I such that lim_{n→∞} µ∗(E ∩ In) = u; replacing In by ⋃_{m≤n} Im for each n
if necessary, we may suppose that ⟨In⟩n∈N is non-decreasing. Set A = E ∩ ⋃_{n∈N} In; because E ∩ In has finite outer
measure for each n, A can be covered by a sequence of sets of finite measure, and has a measurable envelope H for µ
included in E (132Ee). Observe that
µH = µ∗ A = supn∈N µ∗ (E ∩ In ) = u
by 132Ae.
Set G = E \ H. Then µ∗(G ∩ I) = 0 for every I ∈ I. P P For any n ∈ N there is an F ∈ Σ such that F ⊇ E ∩ (In ∪ I)
and µF ≤ u; in which case
µ∗(G ∩ I) + µ∗(E ∩ In) ≤ µ(F \ H) + µ(F ∩ H) ≤ u.
As n is arbitrary, µ∗(G ∩ I) = 0. Q Q Accordingly
u + λE ≥ µH + µG = µE.
On the other hand, if F ∈ Σ is such that F ⊆ E and µ∗ (F ∩ I) = 0 for every I ∈ I, then
µ∗ (E ∩ In ) ≤ µ(E \ F ) + µ∗ (F ∩ In ) = µ(E \ F )
for every n, so
u + µF ≤ µ(E \ F ) + µF = µE;
as F is arbitrary, u + λE ≤ µE.
(d) If J ∈ I, F ∈ Σ, F ⊆ J and µ∗ (F ∩ I) = 0 for every I ∈ I, then F ∩ J = F is µ-negligible; as F is arbitrary,
λJ = 0. Thus λ has all the required properties.
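For a finite space the decomposition µE = λE + sup_{I∈I} µ∗(E ∩ I) of the lemma can be verified by enumeration. The sketch below rests on assumptions of my own: X = {0, . . . , 5}, Σ generated by three atoms with hypothetical masses, and the ideal I consisting of all subsets of a fixed set J0, so that the supremum is attained at I = J0. The names J0, lam and ideal_part are introduced for the illustration.

```python
from itertools import combinations

# Hypothetical finite space: Σ is generated by these atoms, with assumed masses.
ATOMS = {frozenset({0, 1}): 1.0, frozenset({2, 3}): 2.0, frozenset({4, 5}): 0.5}
J0 = frozenset({1, 4})   # the ideal I is taken to be the set of all subsets of J0

def sigma_algebra():
    """Every member of Σ, that is, every union of atoms."""
    atoms = list(ATOMS)
    return [frozenset().union(*c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

def mu(E):
    """µE for E ∈ Σ: total mass of the atoms included in E."""
    return sum(m for a, m in ATOMS.items() if a <= E)

def outer(A):
    """µ*A: total mass of the atoms meeting A."""
    return sum(m for a, m in ATOMS.items() if a & A)

def lam(F):
    """λF = sup{µE : E ∈ Σ, E ⊆ F, µ*(E ∩ I) = 0 for every I ∈ I}.

    Every I ∈ I is a subset of J0 and µ* is monotonic, so it is enough to
    test the single condition µ*(E ∩ J0) = 0.
    """
    return max(mu(E) for E in sigma_algebra()
               if E <= F and outer(E & J0) == 0)

def ideal_part(E):
    """sup over I ∈ I of µ*(E ∩ I), attained at I = J0."""
    return outer(E & J0)

for E in sigma_algebra():
    print(sorted(E), mu(E), lam(E), ideal_part(E))
```

In this example λ keeps only the mass of the atom {2, 3}, which does not meet J0, and λJ0 = 0 because the only member of Σ included in J0 is empty.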

*214P Theorem Let (X, Σ, µ) be a measure space, and A a family of subsets of X which is well-ordered by the
relation ⊆. Then there is an extension of µ to a measure λ on X such that λ(E ∩ A) is defined and equal to µ∗ (E ∩ A)
whenever E ∈ Σ and A ∈ A.
proof (a) Adding ∅ and X to A if necessary, we may suppose that A has ∅ as its least member and X as its greatest
member. By 2A1Dg, A is isomorphic, as ordered set, to some ordinal; since A has a greatest member, this ordinal is a
successor, expressible as ζ + 1; let ξ ↦ Aξ : ζ + 1 → A be the order-isomorphism, so that ⟨Aξ⟩_{ξ≤ζ} is a non-decreasing
family of subsets of X, A0 = ∅ and Aζ = X.
(b) For each ordinal ξ ≤ ζ, write µξ for the subspace measure on Aξ, Σξ for its domain and Iξ for ⋃_{η<ξ} PAη.
Because Aη ∪ Aη′ = A_{max(η,η′)} for η, η′ < ξ, Iξ is an ideal of subsets of Aξ. By 214O, we have a measure λξ on Aξ,
with domain Λξ ⊇ Σξ ∪ Iξ , such that µξ E = λξ E + supI∈Iξ µ∗ξ (E ∩ I) for every E ∈ Σξ and λξ I = 0 for every I ∈ Iξ .
Because every member of Iξ is included in Aη for some η < ξ, we have
µ∗ (E ∩ Aξ ) = λξ (E ∩ Aξ ) + supη<ξ µ∗ξ (E ∩ Aη ) = λξ (E ∩ Aξ ) + supη<ξ µ∗ (E ∩ Aη )
(214Cd) for every E ∈ Σ. Also, of course, λξ Aη = 0 for every η < ξ.
(c) Now set
Λ = {F : F ⊆ X, F ∩ Aξ ∈ Λξ for every ξ ≤ ζ},
λF = ∑_{ξ≤ζ} λξ(F ∩ Aξ)
for every F ∈ Λ. Because Λξ is a σ-algebra of subsets of Aξ for each ξ, Λ is a σ-algebra of subsets of X; because every
λξ is a measure, so is λ. If E ∈ Σ, then
E ∩ Aξ ∈ Σξ ⊆ Λξ
for each ξ, so E ∈ Λ. If η ≤ ζ, then for each ξ ≤ ζ either η < ξ and
Aη ∩ Aξ = Aη ∈ Iξ ⊆ Λξ
or η ≥ ξ and Aη ∩ Aξ = Aξ belongs to Λξ . So Aη ∈ Λ for every η ≤ ζ.
(d) Finally, λ(E ∩ Aξ) = µ∗(E ∩ Aξ) whenever E ∈ Σ and ξ ≤ ζ. P P ?? Otherwise, because the ordinal ζ + 1 is
well-ordered, there is a least ξ such that λ(E ∩ Aξ) ≠ µ∗(E ∩ Aξ). As A0 = ∅ we surely have λ(E ∩ A0) = µ∗(E ∩ A0)
and ξ > 0. Note that if η > ξ, then λη(E ∩ Aξ) = 0; so
λ(E ∩ Aξ) = ∑_{η≤ξ} λη(E ∩ Aξ ∩ Aη) = ∑_{η≤ξ} λη(E ∩ Aη).
Now
µ∗(E ∩ Aξ) = λξ(E ∩ Aξ) + sup_{ξ′<ξ} µ∗(E ∩ Aξ′)
((b) above)
= λξ(E ∩ Aξ) + sup_{ξ′<ξ} ∑_{η≤ξ′} λη(E ∩ Aη)
(because ξ was the first problematic ordinal)
= λξ(E ∩ Aξ) + sup_{ξ′<ξ} sup_{K⊆ξ′+1 is finite} ∑_{η∈K} λη(E ∩ Aη)
(see the definition of ‘sum’ in 112Bd, or 226A below)
= λξ(E ∩ Aξ) + sup_{K⊆ξ is finite} ∑_{η∈K} λη(E ∩ Aη)
= sup_{K⊆ξ+1 is finite} ∑_{η∈K} λη(E ∩ Aη) = ∑_{η≤ξ} λη(E ∩ Aη) ≠ µ∗(E ∩ Aξ)
by the choice of ξ; but this is absurd. X X Q Q
In particular,
λE = λ(E ∩ Aζ ) = µ∗ (E ∩ Aζ ) = µE
for every E ∈ Σ. This completes the proof of the theorem.
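
As a sanity check on the simplest case A = {A}, the following Python sketch works on a small finite space. Everything in it is an illustrative assumption rather than part of the text: the six-point set, the three weighted blocks generating Σ, the set A, and the names BLOCKS, mu_outer, mu_inner and lam. The formula used for λ is the one suggested in the hint to 214Xm, with the supremum there computed as an inner measure.

```python
from itertools import combinations

# Hypothetical finite example: Σ consists of all unions of these three blocks,
# and µ gives each block the weight shown.
BLOCKS = {frozenset({0, 1}): 1, frozenset({2, 3}): 2, frozenset({4, 5}): 3}
A = frozenset({0, 2, 3})  # not a member of Σ, because it splits the block {0, 1}


def sigma():
    """All members of Σ, that is, all unions of blocks."""
    blocks = list(BLOCKS)
    return [frozenset().union(*c)
            for r in range(len(blocks) + 1)
            for c in combinations(blocks, r)]


def mu_outer(s):
    """Outer measure: total weight of the blocks meeting s."""
    return sum(w for b, w in BLOCKS.items() if b & s)


def mu_inner(s):
    """Inner measure: total weight of the blocks included in s."""
    return sum(w for b, w in BLOCKS.items() if b <= s)


def lam(s):
    """Candidate extension: outer measure on the part inside A, inner measure outside."""
    return mu_outer(s & A) + mu_inner(s - A)


def generated_algebra():
    """All sets (E ∩ A) ∪ (F \\ A) with E, F in Σ."""
    return {(e & A) | (f - A) for e in sigma() for f in sigma()}
```

On this example lam agrees with µ on every member of Σ (where µ coincides with mu_inner), assigns A the value µ∗A = 3, and is additive on disjoint members of the generated algebra; these are finite checks on one example, not a proof.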

214X Basic exercises (a) Let (X, Σ, µ) be a localizable measure space. Show that there is an E ∈ Σ such that
the subspace measure µE is purely atomic and µX\E is atomless.

(b) Let X be a set, θ a regular outer measure on X, and Y a subset of X. Let µ be the measure on X defined by
Carathéodory’s method from θ, µY the subspace measure on Y , and ν the measure on Y defined by Carathéodory’s
method from θ↾ PY . Show that if µY is locally determined (in particular, if µ is locally determined and localizable)
then ν = µY .

(c) Let (X, Σ, µ) be a localizable measure space, and Y a subset of X such that the subspace measure µY is
semi-finite. Show that µY is localizable.

> (d) Let (X, Σ, µ) be a measure space, and Y a subset of X such that the subspace measure µY is semi-finite. (i)
Show that a set F ⊆ Y is an atom for µY iff it is of the form E ∩ Y where E is an atom for µ. (ii) Show that if µ is
atomless or purely atomic, so is µY .

(e) Let (X, Σ, µ) be a localizable measure space, and Y any subset of X. Show that the c.l.d. version of the subspace
measure on Y is localizable.

(f ) Let (X, Σ, µ) be a measure space with locally determined negligible sets, and Y a subset of X, with its subspace
measure µY . Show that µY has locally determined negligible sets.

> (g) Let (X, Σ, µ) be a measure space. Show that (X, Σ, µ) has locally determined negligible sets iff the subspace
measure µY is semi-finite for every Y ⊆ X.

> (h) Let ⟨(Xi, Σi, µi)⟩_{i∈I} be a family of measure spaces, with direct sum (X, Σ, µ) (214L). Set Xi′ = Xi × {i} ⊆ X
for each i ∈ I. Show that Xi′, with the subspace measure, is isomorphic to (Xi, Σi, µi). Under what circumstances is
⟨Xi′⟩_{i∈I} a decomposition of X? Show that µ is complete, or strictly localizable, or localizable, or locally determined,
or semi-finite, or atomless, or purely atomic iff every µi is. Show that a measure space is strictly localizable iff it is
isomorphic to a direct sum of totally finite spaces.

> (i) Let ⟨(Xi, Σi, µi)⟩_{i∈I} be a family of measure spaces, and (X, Σ, µ) their direct sum. Show that the completion
of (X, Σ, µ) can be identified with the direct sum of the completions of the (Xi , Σi , µi ), and that the c.l.d. version of
(X, Σ, µ) can be identified with the direct sum of the c.l.d. versions of the (Xi , Σi , µi ).

(j) Let ⟨(Xi, Σi, µi)⟩_{i∈I} be a family of measure spaces. Show that their direct sum has locally determined negligible
sets iff every µi has.

(k) Let ⟨(Xi, Σi, µi)⟩_{i∈I} be a family of measure spaces, and (X, Σ, µ) their direct sum. Show that (X, Σ, µ) has the
measurable envelope property (213Xl) iff every (Xi , Σi , µi ) has.
(l) Let (X, Σ, µ) be a measure space, Y a subset of X, and f : X → [0, ∞] a function such that ∫_Y f is defined in
[0, ∞]. Show that ∫_Y f = ∫ f × χY dµ.

> (m) Write out a direct proof of 214P in the special case in which A = {A}. (Hint: for E, F ∈ Σ,
λ((E ∩ A) ∪ (F \ A)) = µ∗ (E ∩ A) + sup{µG : G ∈ Σ, G ⊆ F \ A}.)

> (n) Let (X, Σ, µ) be a measure space and A a finite family of subsets of X. Show that there is a measure on X,
extending µ, which measures every member of A.

214Y Further exercises (a) Let (X, Σ, µ) be a measure space and A a subset of X such that the subspace measure
on A is semi-finite. Set α = sup{µE : E ∈ Σ, E ⊆ A}. Show that if α ≤ γ ≤ µ∗ A then there is a measure λ on X,
extending µ, such that λA = γ.

(b) Let (X, Σ, µ) be a measure space and ⟨An⟩_{n∈Z} a double-ended sequence of subsets of X such that Am ⊆ An
whenever m ≤ n in Z. Show that there is a measure on X, extending µ, which measures every An . (Hint: use 214P
twice.)

(c) Let X be a set and A a family of subsets of X. Show that the following are equiveridical: (i) for every measure
µ on X there is a measure on X extending µ and measuring every member of A; (ii) for every totally finite measure µ
on X there is a measure on X extending µ and measuring every member of A. (Hint: 213Xa.)

214 Notes and comments I take the first part of the section, down to 214H, slowly and carefully, because while none
of the arguments are deep (214Eb is the longest) the patterns formed by the results are not always easy to predict.
There is a counter-example to a tempting extension of 214H/214Xb in 216Xb.
The message of the second part of the section (214I-214L) is that subspaces inherit many, but not all, of the properties
of a measure space; and in particular there is a difficulty with semi-finiteness, unless we have locally determined
negligible sets (214Xg). (I give an example in 216Xa.) Of course 213Hb shows that if we start with a localizable space,
we can convert it into a complete locally determined localizable space without doing great violence to the structure of
the space, so the difficulty is ordinarily superable.
By far the most important case of 214P is when A = {A} is a singleton, so that the argument simplifies dramatically
(214Xm). In §439 of Volume 4 I will return to the problem of extending a measure to a given larger σ-algebra in the
absence of any helpful auxiliary structure. That section will mostly offer counter-examples, in particular showing that
there is no general theorem extending 214Xn from finite families to countable families, and that the special conditions
in 214P and 214Yb are there for good reasons. But in §552 of Volume 5 I will present some positive results dependent
on special axioms beyond those of ZFC.
215 σ-finite spaces and the principle of exhaustion


I interpolate a short section to deal with some useful facts which might get lost if buried in one of the longer sections
of this chapter. The great majority of the applications of measure theory involve σ-finite spaces, to the point that
many authors skim over any others. I myself prefer to signal the importance of such concepts by explicitly stating just
which theorems apply only to the restricted class of spaces. But undoubtedly some facts about σ-finite spaces need to
be grasped early on. In 215B I give a list of properties characterizing σ-finite spaces. Some of these make better sense
in the light of the principle of exhaustion (215A). I take the opportunity to include a fundamental fact about atomless
measure spaces (215D).

215A The principle of exhaustion The following is an example of the use of one of the most important methods
in measure theory.
Lemma Let (X, Σ, µ) be any measure space and E ⊆ Σ a non-empty set such that sup_{n∈N} µFn is finite for every
non-decreasing sequence ⟨Fn⟩_{n∈N} in E.
(a) There is a non-decreasing sequence ⟨Fn⟩_{n∈N} in E such that, for every E ∈ Σ, either there is an n ∈ N such that
E ∪ Fn is not included in any member of E or, setting F = ⋃_{n∈N} Fn,
lim_{n→∞} µ(E \ Fn) = µ(E \ F) = 0.
In particular, if E ∈ E and E ⊇ F, then E \ F is negligible.
(b) If E is upwards-directed, then there is a non-decreasing sequence ⟨Fn⟩_{n∈N} in E such that, setting F = ⋃_{n∈N} Fn,
µF = supE∈E µE and E \ F is negligible for every E ∈ E, so that F is an essential supremum of E in Σ in the sense of
211G.
(c) If the union of any non-decreasing sequence in E belongs to E, then there is an F ∈ E such that E \ F is negligible
whenever E ∈ E and F ⊆ E.
proof (a) Choose ⟨Fn⟩_{n∈N}, ⟨En⟩_{n∈N} and ⟨un⟩_{n∈N} inductively, as follows. Take F0 to be any member of E. Given
Fn ∈ E, set En = {E : Fn ⊆ E ∈ E} and un = sup{µE : E ∈ En} in [0, ∞], and choose F_{n+1} ∈ En such that
µF_{n+1} ≥ min(n, un − 2^{−n}); continue.
Observe that this construction yields a non-decreasing sequence ⟨Fn⟩_{n∈N} in E. Since E_{n+1} ⊆ En for every n, ⟨un⟩_{n∈N}
is non-increasing, and has a limit u in [0, ∞]. Since min(n, u − 2^{−n}) ≤ µF_{n+1} ≤ un for every n, lim_{n→∞} µFn = u. Our
hypothesis on E now tells us that u is finite.
If E ∈ Σ is such that for every n ∈ N there is an En ∈ E such that E ∪ Fn ⊆ En , then En ∈ En , so
µFn ≤ µ(E ∪ Fn ) ≤ µEn ≤ un
for every n, and limn→∞ µ(E ∪ Fn ) = u. But this means that
µ(E \ F ) ≤ limn→∞ µ(E \ Fn ) = limn→∞ µ(E ∪ Fn ) − µFn = 0,
as stated. In particular, this is so if E ∈ E and E ⊇ F .
(b) Take ⟨Fn⟩_{n∈N} from (a). If E ∈ E, then (because E is upwards-directed) E ∪ Fn is included in some member of
E for every n ∈ N; so we must have the second alternative of (a), and E \ F is negligible. It follows that
supE∈E µE ≤ µF = limn→∞ µFn ≤ supE∈E µE,
so µF = supE∈E µE.
If G is any measurable set such that E \ G is negligible for every E ∈ E, then Fn \ G is negligible for every n, so
that F \ G is negligible; thus F is an essential supremum for E.
(c) Again take ⟨Fn⟩_{n∈N} from (a), and set F = ⋃_{n∈N} Fn. Our hypothesis now is that F ∈ E, so has both the
properties declared.
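
The inductive choice in part (a) can be tried out on a finite family, where each supremum un is attained and the chain stops after finitely many steps. The sketch below is only an illustration under those simplifying assumptions; the point masses, the family and the names exhaust and dichotomy_holds are all hypothetical, and the real argument has to settle for sets within 2^{−n} of the supremum.

```python
def exhaust(family, mu):
    """Build a chain F0 ⊆ F1 ⊆ ... in `family`, stopping when no superset has larger measure."""
    current = family[0]
    chain = [current]
    while True:
        # Members of the family including the current set (never empty: `current` is one).
        candidates = [e for e in family if current <= e]
        best = max(candidates, key=mu)
        if mu(best) == mu(current):
            return chain
        current = best
        chain.append(current)


# Hypothetical data: point masses on {0, 1, 2, 3}; the point 2 is negligible.
WEIGHT = {0: 1, 1: 1, 2: 0, 3: 5}


def mu(s):
    return sum(WEIGHT[x] for x in s)


FAMILY = [frozenset(s) for s in ({0}, {0, 1}, {0, 1, 2}, {0, 3}, {0, 2, 3}, {3})]

chain = exhaust(FAMILY, mu)
F = chain[-1]  # plays the role of the union of the Fn


def dichotomy_holds(e):
    """Either e ∪ F lies in no member of the family, or e minus F is negligible."""
    covered = any((e | F) <= g for g in FAMILY)
    return (not covered) or mu(e - F) == 0
```

Here the chain is {0} ⊆ {0, 3}. The member {0, 2, 3} includes F and differs from it only by the negligible point 2, while {0, 1} falls under the first alternative of (a), since {0, 1, 3} is included in no member of the family.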

215B σ-finite spaces are so important that I think it is worth spelling out the following facts.
Proposition Let (X, Σ, µ) be a semi-finite measure space. Write N for the family of µ-negligible sets and Σf for the
family of measurable sets of finite measure. Then the following are equiveridical:
(i) (X, Σ, µ) is σ-finite;
(ii) every disjoint family in Σf \ N is countable;
(iii) every disjoint family in Σ \ N is countable;
(iv) for every E ⊆ Σ there is a countable set E0 ⊆ E such that E \ ⋃E0 is negligible for every E ∈ E;
(v) for every non-empty upwards-directed E ⊆ Σ there is a non-decreasing sequence ⟨Fn⟩_{n∈N} in E such that E \
⋃_{n∈N} Fn is negligible for every E ∈ E;
(vi) for every non-empty E ⊆ Σ, there is a non-decreasing sequence ⟨Fn⟩_{n∈N} in E such that E \ ⋃_{n∈N} Fn is negligible
whenever E ∈ E and E ⊇ Fn for every n ∈ N;
(vii) either µX = 0 or there is a probability measure ν on X with the same domain and the same negligible sets as
µ;
(viii) there is a measurable integrable function f : X → ]0, 1];
(ix) either µX = 0 or there is a measurable function f : X → ]0, ∞[ such that ∫ f dµ = 1.
proof (i)⇒(vii) and (viii) If µX = 0, (vii) is trivial and we can take f = χX in (viii). Otherwise, let ⟨En⟩_{n∈N} be
a disjoint sequence in Σf covering X. Then it is easy to see that there is a sequence ⟨αn⟩_{n∈N} of strictly positive real
numbers such that ∑_{n=0}^∞ αn µEn = 1. Set νE = ∑_{n=0}^∞ αn µ(E ∩ En) for E ∈ Σ; then ν is a probability measure with
domain Σ and the same negligible sets as µ. Also f = ∑_{n=0}^∞ min(1, αn)χEn is a strictly positive measurable integrable
function.
(vii)⇒(vi) and (v) Assume (vii), and let E be a non-empty family of measurable sets. If µX = 0 then (vi) and (v)
are certainly true. Otherwise, let ν be a probability measure with domain Σ and the same negligible sets as µ. Since
sup_{E∈E} νE ≤ 1 is finite, we can apply 215Aa to find a non-decreasing sequence ⟨Fn⟩_{n∈N} in E such that E \ ⋃_{n∈N} Fn
is negligible whenever E ∈ E includes ⋃_{n∈N} Fn; and if E is upwards-directed, E \ ⋃_{n∈N} Fn will be negligible for every
E ∈ E, as in 215Ab.
(vi)⇒(iv) Assume (vi), and let E be any subset of Σ. Set
H = {⋃E0 : E0 ⊆ E is countable}.
By (vi), there is a sequence ⟨Hn⟩_{n∈N} in H such that H \ ⋃_{n∈N} Hn is negligible whenever H ∈ H and H ⊇ Hn for every
n ∈ N. Now we can express each Hn as ⋃En′, where En′ ⊆ E is countable; setting E0 = ⋃_{n∈N} En′, E0 is countable. If
E ∈ E, then E ∪ ⋃_{n∈N} Hn = ⋃({E} ∪ E0) belongs to H and includes every Hn, so that E \ ⋃E0 = E \ ⋃_{n∈N} Hn is
negligible. So E0 has the property we need, and (iv) is true.
(iv)⇒(iii) Assume (iv). If E is a disjoint family in Σ \ N, take a countable E0 ⊆ E such that E \ ⋃E0 is negligible
for every E ∈ E. Then E = E \ ⋃E0 is negligible for every E ∈ E \ E0; but this just means that E \ E0 is empty, so that
E = E0 is countable.
(iii)⇒(ii) is trivial.
(ii)⇒(i) Assume (ii). Let P be the set of all disjoint subsets of Σf \ N, ordered by ⊆. Then P is a partially ordered
set, not empty (as ∅ ∈ P), and if Q ⊆ P is non-empty and totally ordered then it has an upper bound in P. P P Set
E = ⋃Q, the union of all the disjoint families belonging to Q. If E ∈ E then E ∈ C for some C ∈ Q, so E ∈ Σf \ N.
If E, F ∈ E and E ≠ F, then there are C, D ∈ Q such that E ∈ C, F ∈ D; now Q is totally ordered, so one of C, D is
larger than the other, and in either case C ∪ D is a member of Q containing both E and F. But since any member of
Q is a disjoint collection of sets, E ∩ F = ∅. As E and F are arbitrary, E is a disjoint family of sets and belongs to P.
And of course C ⊆ E for every C ∈ Q, so E is an upper bound for Q in P. Q Q
By Zorn’s Lemma (2A1M), P has a maximal element E say. By (ii), E must be countable, so ⋃E ∈ Σ. Now
H = X \ ⋃E is negligible. P P ?? Suppose, if possible, otherwise. Because (X, Σ, µ) is semi-finite, there is a set G of
finite measure such that G ⊆ H and µG > 0, that is, G ∈ Σf \ N and G ∩ E = ∅ for every E ∈ E. But this means that
{G} ∪ E is a member of P strictly larger than E, which is supposed to be impossible. X X Q Q
Let ⟨Xn⟩_{n∈N} be a sequence running over E ∪ {H}. Then ⟨Xn⟩_{n∈N} is a cover of X by a sequence of measurable sets
of finite measure, so (X, Σ, µ) is σ-finite.
(v)⇒(i) If (v) is true, then we have a sequence ⟨En⟩_{n∈N} in Σf such that E \ ⋃_{n∈N} En is negligible for every E ∈ Σf.
Because µ is semi-finite, X \ ⋃_{n∈N} En must be negligible, so X is covered by a countable family of sets of finite measure
and µ is σ-finite.
(viii)⇒(ix) If µX = 0 this is trivial. Otherwise, if f is a strictly positive measurable integrable function, then
c = ∫ f > 0 (122Rc), so (1/c)f is a strictly positive measurable function with integral 1.
(ix)⇒(i) If f : X → ]0, ∞[ is measurable and integrable, ⟨{x : f(x) ≥ 2^{−n}}⟩_{n∈N} is a sequence of sets of finite
measure covering X.
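
The step (i)⇒(vii) and (viii) of this proof leaves the choice of the αn to the reader. The following Python sketch shows one possible choice for a finite disjoint cover, using exact rational arithmetic. The point masses, the three pieces and the names probability_weights, nu and f are illustrative assumptions; an infinite cover would need the geometric shares without the final rescaling trick used here.

```python
from fractions import Fraction


def probability_weights(piece_measures):
    """Return strictly positive weights a_n with sum of a_n * (measure of E_n) equal to 1."""
    positive = [n for n, m in enumerate(piece_measures) if m > 0]
    if not positive:
        raise ValueError("every piece is negligible, so the whole space has measure 0")
    # Geometric shares 2^-(k+1) for the non-negligible pieces, rescaled to add up to 1.
    shares = {n: Fraction(1, 2 ** (k + 1)) for k, n in enumerate(positive)}
    total = sum(shares.values())
    # Negligible pieces contribute nothing, so any positive weight will do for them.
    return [shares[n] / (total * m) if m > 0 else Fraction(1)
            for n, m in enumerate(piece_measures)]


# Hypothetical data: point masses on {0, 1, 2, 3} and a disjoint cover by three pieces.
MASS = {0: 3, 1: 0, 2: 5, 3: 2}
PIECES = [frozenset({0, 1}), frozenset({2}), frozenset({3})]


def mu(s):
    return sum(MASS[x] for x in s)


ALPHA = probability_weights([mu(p) for p in PIECES])


def nu(s):
    """The probability measure with the same negligible sets as mu."""
    return sum(a * mu(s & p) for a, p in zip(ALPHA, PIECES))


def f(x):
    """The function taking the value min(1, a_n) on the n-th piece."""
    return next(min(Fraction(1), a) for a, p in zip(ALPHA, PIECES) if x in p)
```

With these data the weights are 4/21, 2/35 and 1/14, so that nu of the whole space is 4/7 + 2/7 + 1/7 = 1, and a singleton is nu-negligible exactly when it is mu-negligible.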

215C Corollary Let (X, Σ, µ) be a σ-finite measure space, and suppose that E ⊆ Σ is any non-empty set.
(a) There is a non-decreasing sequence ⟨Fn⟩_{n∈N} in E such that, for every E ∈ Σ, either there is an n ∈ N such that
E ∪ Fn is not included in any member of E or E \ ⋃_{n∈N} Fn is negligible.
(b) If E is upwards-directed, then there is a non-decreasing sequence ⟨Fn⟩_{n∈N} in E such that ⋃_{n∈N} Fn is an essential
supremum of E in Σ.
(c) If the union of any non-decreasing sequence in E belongs to E, then there is an F ∈ E such that E \ F is negligible
whenever E ∈ E and F ⊆ E.
proof By 215B, there is a totally finite measure ν on X with the same measurable sets and the same negligible sets
as µ. Since supE∈E νE is finite, we can apply 215A to ν to obtain the results.

215D As a further example of the use of the principle of exhaustion, I give a fundamental fact about atomless
measure spaces.
Proposition Let (X, Σ, µ) be an atomless measure space. If E ∈ Σ and 0 ≤ α ≤ µE < ∞, there is an F ∈ Σ such
that F ⊆ E and µF = α.
proof (a) We need to know that if G ∈ Σ is non-negligible and n ∈ N, then there is an H ⊆ G such that 0 < µH ≤
2^{−n}µG. P P Induce on n. For n = 0 this is trivial. For the inductive step to n + 1, use the inductive hypothesis to find
H ⊆ G such that 0 < µH ≤ 2^{−n}µG. Because µ is atomless, there is an H′ ⊆ H such that µH′, µ(H \ H′) are both
defined and non-zero. Now at least one of them has measure less than or equal to (1/2)µH, so gives us a subset of G of
non-zero measure less than or equal to 2^{−n−1}µG. Q Q
It follows that if G ∈ Σ has non-zero finite measure and ε > 0, there is a measurable set H ⊆ G such that 0 < µH ≤ ε.
(b) Let H be the family of all those H ∈ Σ such that H ⊆ E and µH ≤ α. If ⟨Hn⟩_{n∈N} is any non-decreasing
sequence in H, then µ(⋃_{n∈N} Hn) = lim_{n→∞} µHn ≤ α, so ⋃_{n∈N} Hn ∈ H. So 215Ac tells us that there is an F ∈ H
such that H \ F is negligible whenever H ∈ H and F ⊆ H. ?? Suppose, if possible, that µF < α. By (a), there is an
H ⊆ E \ F such that 0 < µH ≤ α − µF. But in this case H ∪ F ∈ H and µ((H ∪ F) \ F) > 0, which is impossible. X X
So we have found an appropriate set F.
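
For Lebesgue measure on a finite union of intervals the conclusion of 215D can be seen directly, without the exhaustion argument: sweep through E from the left and stop when the accumulated length reaches α. The sketch below does exactly that and nothing more; it relies on the order structure of R, which is not available in a general atomless space, and the representation of E as a list of half-open intervals and the name subset_of_measure are assumptions made for the illustration.

```python
from fractions import Fraction


def subset_of_measure(intervals, alpha):
    """Cut a disjoint union of intervals [a, b) down to total length alpha, sweeping from the left."""
    total = sum(b - a for a, b in intervals)
    if not 0 <= alpha <= total:
        raise ValueError("need 0 <= alpha <= total length")
    pieces, remaining = [], alpha
    for a, b in sorted(intervals):
        if remaining == 0:
            break
        # Take the whole interval if it fits, otherwise only its initial segment.
        take = min(b - a, remaining)
        pieces.append((a, a + take))
        remaining -= take
    return pieces


# Hypothetical example: E = [0, 1) ∪ [2, 5) has length 4; ask for a subset of length 5/2.
E = [(Fraction(0), Fraction(1)), (Fraction(2), Fraction(5))]
F = subset_of_measure(E, Fraction(5, 2))
```

In this example F consists of [0, 1) together with [2, 7/2), of total length 5/2.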

215X Basic exercises (a) Let (X, Σ, µ) be a measure space and Φ a non-empty set of µ-integrable real-valued
functions from X to R. Suppose that sup_{n∈N} ∫ fn is finite for every sequence ⟨fn⟩_{n∈N} in Φ such that fn ≤a.e. f_{n+1}
for every n. Show that there is a sequence ⟨fn⟩_{n∈N} in Φ such that fn ≤a.e. f_{n+1} for every n and, for every integrable
real-valued function f on X, either f ≤a.e. sup_{n∈N} fn or there is an n ∈ N such that no member of Φ is greater than
or equal to max(f, fn) almost everywhere.

> (b) Let (X, Σ, µ) be a measure space. (i) Suppose that E is a non-empty upwards-directed subset of Σ such that
c = sup_{E∈E} µE is finite. Show that E \ ⋃_{n∈N} Fn is negligible whenever E ∈ E and ⟨Fn⟩_{n∈N} is a sequence in E such that
lim_{n→∞} µFn = c. (ii) Let Φ be a non-empty set of integrable functions on X which is upwards-directed in the sense
that for all f, g ∈ Φ there is an h ∈ Φ such that max(f, g) ≤a.e. h, and suppose that c = sup_{f∈Φ} ∫ f is finite. Show
that f ≤a.e. sup_{n∈N} fn whenever f ∈ Φ and ⟨fn⟩_{n∈N} is a sequence in Φ such that lim_{n→∞} ∫ fn = c.

(c) Use 215A to shorten the proof of 211Ld.

(d) Give an example of a (non-semi-finite) measure space (X, Σ, µ) satisfying conditions (ii)-(iv) of 215B, but not
(i).

> (e) Let (X, Σ, µ) be an atomless σ-finite measure space. Show


that for any ε > 0 there is a disjoint sequence
⟨En⟩_{n∈N} of measurable sets with measure at most ε such that X = ⋃_{n∈N} En.

(f) Let (X, Σ, µ) be an atomless strictly localizable measure space. Show that for any ε > 0 there is a decomposition
⟨Xi⟩_{i∈I} of X such that µXi ≤ ε for every i ∈ I.

215Y Further exercises (a) Let (X, Σ, µ) be a σ-finite measure space and ⟨fmn⟩_{m,n∈N}, ⟨fm⟩_{m∈N}, f measurable
real-valued functions defined almost everywhere in X and such that ⟨fmn⟩_{n∈N} → fm a.e. for each m and ⟨fm⟩_{m∈N} → f
a.e. Show that there is a strictly increasing sequence ⟨n_m⟩_{m∈N} in N such that ⟨f_{m,n_m}⟩_{m∈N} → f a.e. (Compare 134Yb.)

(b) Let (X, Σ, µ) be a σ-finite measure space. Let ⟨fn⟩_{n∈N} be a sequence of measurable real-valued functions such
that f = lim_{n→∞} fn is defined almost everywhere in X. Show that there is a non-decreasing sequence ⟨Xk⟩_{k∈N} of
measurable subsets of X such that ⋃_{k∈N} Xk is conegligible in X and ⟨fn⟩_{n∈N} → f uniformly on every Xk, in the sense
that for any ε > 0 there is an m ∈ N such that |fj(x) − f(x)| is defined and less than or equal to ε whenever j ≥ m,
x ∈ Xk.
(This is a version of Egorov’s theorem.)
(c) Let (X, Σ, µ) be a totally finite measure space and ⟨fn⟩_{n∈N}, f measurable real-valued functions defined almost
everywhere in X. Show that ⟨fn⟩_{n∈N} → f a.e. iff there is a sequence ⟨εn⟩_{n∈N} of strictly positive real numbers, converging
to 0, such that
lim_{n→∞} µ∗(⋃_{k≥n} {x : x ∈ dom fk ∩ dom f, |fk(x) − f(x)| ≥ εn}) = 0.

(d) Find a direct proof of (v)⇒(vi) in 215B. (Hint: given E ⊆ Σ, use Zorn’s Lemma to find a maximal totally
ordered E′ ⊆ E such that E△F ∉ N for any distinct E, F ∈ E′, and apply (v) to E′.)

215 Notes and comments The common ground of 215A, 215B(vi), 215C and 215Xa is actually one of the most
fundamental ideas in measure theory. It appears in such various forms that it is often easier to prove an application
from first principles than to explain how it can be reduced to the versions here. But I will try henceforth to signal such
applications as they arise, calling the method (the proof of 215Aa or 215Xa) the ‘principle of exhaustion’. One point
which is perhaps worth noting here is the inductive construction of the sequence ⟨Fn⟩_{n∈N} in the proof of 215Aa. Each
F_{n+1} is chosen after the preceding one. It is this which makes it possible, in the proof of 215B(vii)⇒(vi), to extract
a suitable sequence ⟨Fn⟩_{n∈N} directly. In many applications (starting with what is surely the most important one in
the elementary theory, the Radon-Nikodým theorem of §232, or with part (i) of the proof of 211Ld), this refinement is
not needed; we are dealing with an upwards-directed set, as in 215B(v), and can choose the whole sequence ⟨Fn⟩_{n∈N}
at once, no term interacting with any other, as in 215Xb. The axiom of ‘dependent choice’, which asserts that we can
construct sequences term-by-term, is known to be stronger than the axiom of ‘countable choice’, which asserts only
that we can choose countably many objects simultaneously.
In 215B I try to indicate the most characteristic properties of σ-finiteness; in particular, the properties which
distinguish σ-finite measures from other strictly localizable measures. This result is in a way more abstract than the
manipulations in the rest of the section. Note that it makes an essential use of the axiom of choice in the form of Zorn’s
Lemma. I spent a paragraph in 134C commenting on the distinction between ‘countable choice’, which is needed for
anything which looks like the standard theory of Lebesgue measure, and the full axiom of choice, which is relatively
little used in the elementary theory. The implication (ii)⇒(i) of 215B is one of the points where we do need something
beyond countable choice. (I should perhaps remark that the whole theory of non-σ-finite measure spaces looks very odd
without the general axiom of choice.) Note also that in 215B the proofs of (i)⇒(vii) and (vii)⇒(vi) are the only points
where anything so vulgar as a number appears. The conditions (iii), (iv), (v) and (vi) are linked in ways that have
nothing to do with measure theory, and involve only the structure (X, Σ, N). (See 215Yd here, and 316D-316E
in Volume 3.) There are similar conditions relating to measurable functions rather than measurable sets; for a fairly
abstract example, see 241Ye.
In 215Ya-215Yc are three more standard theorems on almost-everywhere-convergent sequences which depend on σ-
or total finiteness.

216 Examples
It is common practice – and, in my view, good practice – in books on pure mathematics, to provide discriminating
examples; I mean that whenever we are given a list of new concepts, we expect to be provided with examples to show
that we have a fair picture of the relationships between them, and in particular that we are not being kept ignorant of
some startling implication. Concerning the concepts listed in 211A-211K, we have ten different properties which some,
but not all, measure spaces possess, giving a conceivable total of 2^10 different types of measure space, classified according
to which of these ten properties they have. The list of basic relationships in 211L reduces these 1024 possibilities to 72.
Observing that a space can be simultaneously atomless and purely atomic only when the measure of the whole space
is 0, we find ourselves with 56 possibilities, being two trivial cases with µX = 0 (because such a measure may or may
not be complete) together with 9 × 2 × 3 cases, corresponding to the nine classes
probability spaces,
spaces which are totally finite, but not probability spaces,
spaces which are σ-finite, but not totally finite,
spaces which are strictly localizable, but not σ-finite,
spaces which are localizable and locally determined, but not strictly localizable,
spaces which are localizable, but not locally determined,
spaces which are locally determined, but not localizable,
spaces which are semi-finite, but neither locally determined nor localizable,
spaces which are not semi-finite;
the two classes


spaces which are complete,
spaces which are not complete;
and the three classes
spaces which are atomless, not of measure 0,
spaces which are purely atomic, not of measure 0,
spaces which are neither atomless nor purely atomic.
I do not propose to give a complete set of fifty-six examples, particularly as rather fewer than fifty-six different ideas
are required. However, I do think that for a proper understanding of abstract measure spaces it is necessary to have
seen realizations of some of the critical combinations of properties. I therefore take a few paragraphs to describe three
special examples to add to those of 211M-211R.
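
The arithmetic behind the figures 1024 and 56 can be checked by enumerating the classes listed above. In the Python sketch below the class labels are abbreviations of that list and the variable names are arbitrary. The factorization 72 = 9 × 2 × 4, with four atomless/purely-atomic combinations before the impossible ones are discarded, is an inference from the figures and is not stated in the text.

```python
from itertools import product

FINITENESS = [
    "probability",
    "totally finite, not probability",
    "sigma-finite, not totally finite",
    "strictly localizable, not sigma-finite",
    "localizable and locally determined, not strictly localizable",
    "localizable, not locally determined",
    "locally determined, not localizable",
    "semi-finite, neither locally determined nor localizable",
    "not semi-finite",
]
COMPLETENESS = ["complete", "not complete"]
ATOMICITY = ["atomless", "purely atomic", "neither"]

# Spaces of non-zero measure: one class from each of the three lists.
nontrivial = list(product(FINITENESS, COMPLETENESS, ATOMICITY))
# Spaces with measure 0 are both atomless and purely atomic; they may or may not be complete.
trivial = [("measure 0", c) for c in COMPLETENESS]

print(2 ** 10, len(nontrivial), len(nontrivial) + len(trivial))  # prints: 1024 54 56
```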

216A Lebesgue measure Before turning to the new ideas, let me mention Lebesgue measure again. As remarked
in 211M, 211P and 211Qa,
(a) Lebesgue measure µ on R is complete, atomless and σ-finite, therefore strictly localizable, localizable and locally
determined.
(b) The subspace measure µ[0,1] on [0, 1] is a complete, atomless probability measure.
(c) The restriction µ↾B of µ to the Borel σ-algebra B of R is atomless, σ-finite and not complete.

216B I now embark on the description of three ‘counter-examples’; meaning spaces built specifically for the purpose
of showing that there are no unexpected implications among the ten properties under consideration here. Even by
the standards of this chapter these must be regarded as dispensable by the student who wants to get on with the real
business of understanding the big theorems of the subject. Neither the existence of these examples, nor the techniques
needed in constructing them, are vital for anything else we shall look at before Volume 5. But if you are going to take
abstract measure theory seriously at all, sooner or later you will need to form some kind of mental picture of the nature
of the spaces possessing the different properties here, and a minimal requirement of such a picture is that it should
include the discriminations witnessed by these examples.

*216C A complete, localizable, non-locally-determined space The first example hardly needs an idea beyond
what we already have, but it does call for more manipulations than it seems fair to set as an exercise, and may therefore
be useful as a demonstration of technique.
(a) Let I be any uncountable set, and set X = {0, 1} × I. For E ⊆ X, y ∈ {0, 1} set E[{y}] = {i : (y, i) ∈ E} ⊆ I.
Set
Σ = {E : E ⊆ X, E[{0}]△E[{1}] is countable}.
Then Σ is a σ-algebra of subsets of X. P P (i) ∅[{0}]△∅[{1}] = ∅ is countable, so ∅ ∈ Σ. (ii) If E ∈ Σ then
(X \ E)[{0}]△(X \ E)[{1}] = E[{0}]△E[{1}]
is countable. (iii) If ⟨En⟩_{n∈N} is a sequence in Σ and E = ⋃_{n∈N} En, then
E[{0}]△E[{1}] ⊆ ⋃_{n∈N} (En[{0}]△En[{1}])
is countable. Q Q
For E ∈ Σ, set µE = #(E[{0}]) if this is finite, ∞ otherwise; then (X, Σ, µ) is a measure space.
(b) (X, Σ, µ) is complete. P P If A ⊆ E ∈ Σ and µE = 0, then (0, i) ∉ E for every i. So
A[{0}]△A[{1}] = A[{1}] ⊆ E[{1}] = E[{1}]△E[{0}]
must be countable, and A ∈ Σ. Q Q
(c) (X, Σ, µ) is semi-finite. P P If E ∈ Σ and µE > 0, there is an i ∈ I such that (0, i) ∈ E; now F = {(0, i)} ⊆ E
and µF = 1. Q Q
(d) (X, Σ, µ) is localizable. P P Let E be any subset of Σ. Set
J = ⋃_{E∈E} E[{0}], G = {0, 1} × J.
Then G ∈ Σ. If H ∈ Σ, then
µ(E \ H) = 0 for every E ∈ E
⇐⇒ E[{0}] ⊆ H[{0}] for every E ∈ E
⇐⇒ (0, i) ∈ H for every i ∈ J
⇐⇒ µ(G \ H) = 0.
Thus G is an essential supremum for E in Σ; as E is arbitrary, µ is localizable. Q Q
(e) (X, Σ, µ) is not locally determined. P P Consider H = {0} × I. Then H ∉ Σ because H[{0}]△H[{1}] = I is
uncountable. But let E ∈ Σ be any set such that µE < ∞. Then
(E ∩ H)[{0}]△(E ∩ H)[{1}] = (E ∩ H)[{0}] ⊆ E[{0}]
is finite, so E ∩ H ∈ Σ. As E is arbitrary, H witnesses that µ is not locally determined. Q Q
(f) (X, Σ, µ) is purely atomic. P P Let E ∈ Σ be any set of non-zero measure. Let i ∈ I be such that (0, i) ∈ E.
Then F = {(0, i)} is a set of measure 1, included in E; because F is a singleton set, it must be an atom
for µ; as E is arbitrary, µ is purely atomic. Q Q
(g) Thus the construction here yields a complete, localizable, purely atomic, non-locally-determined space.

*216D A complete, locally determined space which is not localizable The next construction requires a
little set theory. We need two sets I, J such that I is uncountable (more strictly, I cannot be expressed as the union of
countably many countable sets), I ⊆ J and J cannot be expressed as ⋃_{i∈I} Ki where every Ki is countable. The most
natural way of doing this, subject to the axiom of choice, is to take I = ω1 , the first uncountable ordinal, and J to
be ω2 , the first ordinal from which there is no injection into ω1 (see 2A1Fc); but in case you prefer other formulations
(e.g., I = {{x} : x ∈ R} and J = PR), I will write the following argument in terms of I and J, and you can pick your
own pair.
(a) Let T be the countable-cocountable σ-algebra of J and ν the countable-cocountable measure on J (211R). Set
X = J × J and for E ⊆ X set
E[{ξ}] = {η : (ξ, η) ∈ E}, E^{−1}[{ξ}] = {η : (η, ξ) ∈ E}
for every ξ ∈ J. Set
Σ = {E : E[{ξ}] and E^{−1}[{ξ}] belong to T for every ξ ∈ J},
µE = ∑_{ξ∈J} νE[{ξ}] + ∑_{ξ∈J} νE^{−1}[{ξ}]
for every E ∈ Σ. It is easy to check that Σ is a σ-algebra and that µ is a measure.
(b) (X, Σ, µ) is complete. P P If A ⊆ E ∈ Σ and µE = 0, then all the sets E[{ξ}] and E^{−1}[{ξ}] are countable, so the
same is true of all the sets A[{ξ}] and A^{−1}[{ξ}], and A ∈ Σ. Q Q
(d) (X, Σ, µ) is semi-finite. P P For each ζ ∈ J, set
Gζ = {ζ} × J, G̃ζ = J × {ζ}.
Then all the sections Gζ[{ξ}], Gζ^{−1}[{ξ}], G̃ζ[{ξ}] and G̃ζ^{−1}[{ξ}] are either J or ∅ or {ζ}, so belong to T, and all the Gζ,
G̃ζ belong to Σ, with µ-measure 1.
Suppose that E ∈ Σ is a set of strictly positive measure. Then there must be some ξ ∈ J such that
0 < νE[{ξ}] + νE^{−1}[{ξ}] = µ(E ∩ Gξ) + µ(E ∩ G̃ξ) < ∞,
and one of the sets E ∩ Gξ, E ∩ G̃ξ is a set of non-zero finite measure included in E. Q Q
(e) (X, Σ, µ) is locally determined. P P Suppose that H ⊆ X is such that H ∩ E ∈ Σ whenever E ∈ Σ and µE < ∞.
Then, in particular, H ∩ Gζ and H ∩ G̃ζ belong to Σ, so
H[{ζ}] = (H ∩ Gζ)[{ζ}] ∈ T,
H^{−1}[{ζ}] = (H ∩ G̃ζ)^{−1}[{ζ}] ∈ T,
for every ζ ∈ J. This shows that H ∈ Σ. As H is arbitrary, µ is locally determined. Q Q
(f) (X, Σ, µ) is not localizable. P P Set E = {Gζ : ζ ∈ J}. ?? Suppose, if possible, that G ∈ Σ is an essential
supremum for E. Then
ν(J \ G[{ξ}]) = µ(Gξ \ G) = 0
and J \ G[{ξ}] is countable, for every ξ ∈ J. Consequently J ≠ ⋃_{ξ∈I} (J \ G[{ξ}]), and there is an η belonging to
J \ ⋃_{ξ∈I} (J \ G[{ξ}]) = ⋂_{ξ∈I} G[{ξ}]. This means just that (ξ, η) ∈ G for every ξ ∈ I, that is, that I ⊆ G^{−1}[{η}].
Accordingly G^{−1}[{η}] is uncountable, so that νG^{−1}[{η}] = µ(G ∩ G̃η) = 1. But observe that µ(Gξ ∩ G̃η) = µ{(ξ, η)} = 0
for every ξ ∈ J. This means that, setting H = X \ G̃η, E \ H is negligible, for every E ∈ E; so that we must have
0 = µ(G \ H) = µ(G ∩ G̃η) = 1, which is absurd. X X
Thus E has no essential supremum in Σ, and µ cannot be localizable. Q Q
(g) (X, Σ, µ) is purely atomic. P P If E ∈ Σ has non-zero measure, there must be some ξ ∈ J such that one of E[{ξ}],
E^{-1}[{ξ}] is not countable; that is, such that one of E ∩ Gξ, E ∩ G̃ξ is not negligible. But if now H ∈ Σ and H ⊆ E ∩ Gξ,
either H[{ξ}] is countable, and µH = 0, or J \ H[{ξ}] is countable, and µ(Gξ \ H) = 0; similarly, if H ⊆ E ∩ G̃ξ, one
of µH, µ(G̃ξ \ H) must be 0, according to whether H^{-1}[{ξ}] is countable or not. Thus E ∩ Gξ and E ∩ G̃ξ, if not
negligible, must be atoms, and E must include an atom. As E is arbitrary, µ is purely atomic. Q Q
(h) Thus (X, Σ, µ) is complete, locally determined and purely atomic, but is not localizable.

*216E A complete, locally determined, localizable space which is not strictly localizable For the last,
and most interesting, construction, we need a non-trivial result in infinitary combinatorics, which I have written out
in 2A1P: if I is any set, and ⟨fα⟩α∈A is a family in {0, 1}^I, the set of functions from I to {0, 1}, with #(A) strictly
greater than c, the cardinal of the continuum, and if ⟨Kα⟩α∈A is any family of countable subsets of I, then there must
be distinct α, β ∈ A such that fα and fβ agree on Kα ∩ Kβ .
Armed with this fact, I proceed as follows.
(a) Let C be any set of cardinal greater than c. Set I = PC and X = {0, 1}^I. For γ ∈ C, define xγ ∈ X by saying
that xγ(Γ) = 1 if γ ∈ Γ ⊆ C and xγ(Γ) = 0 if γ ∉ Γ ⊆ C. Let K be the family of countable subsets of I, and for
K ∈ K, γ ∈ C set
FγK = {x : x ∈ X, x↾K = xγ↾K} ⊆ X.
Let
Σγ = {E : E ⊆ X, either there is a K ∈ K such that FγK ⊆ E
or there is a K ∈ K such that FγK ⊆ X \ E}.
Then Σγ is a σ-algebra of subsets of X. P P (i) Fγ∅ ⊆ X \ ∅ so ∅ ∈ Σγ. (ii) The definition of Σγ is symmetric between
E and X \ E, so X \ E ∈ Σγ whenever E ∈ Σγ. (iii) Let ⟨En⟩n∈N be a sequence in Σγ, with union E. (α) If there are
n ∈ N, K ∈ K such that FγK ⊆ En, then FγK ⊆ E, so E ∈ Σγ. (β) Otherwise, there is for each n ∈ N a Kn ∈ K such
that Fγ,Kn ⊆ X \ En. Set K = ⋃_{n∈N} Kn ∈ K. Then
FγK = {x : x↾K = xγ↾K} = {x : x↾Kn = xγ↾Kn for every n ∈ N}
= ⋂_{n∈N} Fγ,Kn ⊆ ⋂_{n∈N} (X \ En) = X \ E,
so again E ∈ Σγ. As ⟨En⟩n∈N is arbitrary, Σγ is a σ-algebra. Q Q
(b) Set
Σ = ⋂_{γ∈C} Σγ;
then Σ, being an intersection of σ-algebras, is a σ-algebra of subsets of X (see 111Ga). Define µ : Σ → [0, ∞] by setting
µE = #({γ : xγ ∈ E}) if this is finite,
= ∞ otherwise;
then µ is a measure.
(c) It will be convenient later to know something about the sets
GD = {x : x ∈ X, x(D) = 1}
for D ⊆ C. In particular, every GD belongs to Σ. P P If γ ∈ D, then xγ(D) = 1 so GD = Fγ,{D} ∈ Σγ. If γ ∈ C \ D,
then xγ(D) = 0 so GD = X \ Fγ,{D} ∈ Σγ. Q Q Also, of course, {γ : xγ ∈ GD} = D.
(d) (X, Σ, µ) is complete. P P Suppose that A ⊆ E ∈ Σ and that µE = 0. For every γ ∈ C, E ∈ Σγ and xγ ∉ E, so
FγK ⊈ E for any K ∈ K and there is a K ∈ K such that
FγK ⊆ X \ E ⊆ X \ A.
Thus A ∈ Σγ; as γ is arbitrary, A ∈ Σ. As A is arbitrary, µ is complete. Q Q
(e) (X, Σ, µ) is semi-finite. P P Let E ∈ Σ be a set of positive measure. Then there must be some γ ∈ C such that
xγ ∈ E. Consider E′ = E ∩ G{γ}. As xγ ∈ E′, µE′ ≥ 1 > 0. On the other hand, µG{γ} = #({δ : δ ∈ {γ}}) = 1, so
µE′ = 1. As E is arbitrary, µ is semi-finite. Q Q
(f) (X, Σ, µ) is localizable. P P Let ℰ be any subset of Σ. Set D = {δ : δ ∈ C, xδ ∈ ⋃ℰ}. Consider GD. For H ∈ Σ,
µ(E \ H) = 0 for every E ∈ ℰ
⇐⇒ xγ ∉ E \ H for every E ∈ ℰ, γ ∈ C
⇐⇒ xγ ∈ H for every γ ∈ D
⇐⇒ xγ ∉ GD \ H for every γ ∈ C
⇐⇒ µ(GD \ H) = 0.
Thus GD is an essential supremum for ℰ in Σ. As ℰ is arbitrary, µ is localizable. Q Q
(g) (X, Σ, µ) is not strictly localizable. P P ?? Suppose, if possible, that ⟨Xj⟩j∈J is a decomposition of (X, Σ, µ). Set
J′ = {j : j ∈ J, µXj > 0}. For each j ∈ J′, the set Cj = {γ : xγ ∈ Xj} must be finite and non-empty. Moreover,
for each γ ∈ C, there must be some j ∈ J such that µ(G{γ} ∩ Xj) > 0, and in this case j ∈ J′ and γ ∈ Cj. Thus
C = ⋃_{j∈J′} Cj. Because #(C) > c, #(J′) > c (2A1Ld).
For each j ∈ J ′ , choose γj ∈ Cj . Then
xγj ∈ Xj ∈ Σ ⊆ Σγj ,
so there must be a Kj ∈ K such that Fγj ,Kj ⊆ Xj .
At this point I finally turn to the result cited at the start of this example. Because #(J ′ ) > c, there must be distinct
j, k ∈ J′ such that xγj and xγk agree on Kj ∩ Kk. We may therefore define x ∈ X by saying that
x(δ) = xγj(δ) if δ ∈ Kj,
= xγk(δ) if δ ∈ Kk,
= 0 if δ ∈ I \ (Kj ∪ Kk).
Now
x ∈ Fγj ,Kj ∩ Fγk ,Kk ⊆ Xj ∩ Xk ,
and Xj ∩ Xk ≠ ∅; contradicting the assumption that the Xj formed a decomposition of X. XX Q Q
(h) (X, Σ, µ) is purely atomic. P P If E ∈ Σ and µE > 0, then (as remarked in (e) above) there is a γ ∈ C such that
µ(E ∩ G{γ}) = 1; now E ∩ G{γ} must be an atom. Q Q
(i) Accordingly (X, Σ, µ) is a complete, locally determined, localizable, purely atomic measure space which is not
strictly localizable.

216X Basic exercises (a) In the construction of 216C, show that the subspace measure on {1}×I is not semi-finite.

(b) Suppose, in 216D, that I = ω1 . (i) Show that the set {(ξ, η) : ξ ≤ η < ω1 } is measured by the measure
constructed by Carathéodory’s method from µ∗ ↾ P(I × I), but not by the subspace measure on I × I. (ii) Hence, or
otherwise, show that the subspace measure on I × I is not locally determined.

(c) In 216Ya, 252Yq and 252Ys below, I indicate how to construct atomless versions of 216C, 216D and 216E, that
is, atomless complete measure spaces of which the first is localizable but not locally determined, the second is locally
determined but not localizable, and the third is locally determined and localizable but not strictly localizable.
Show how direct sums of these, together with counting measure and the examples described in this chapter, can be
assembled to provide all 56 examples called for by the discussion in the introduction to this section.

216Y Further exercises (a) Let λ be Lebesgue measure on [0, 1], and Λ its domain. Set Y = [0, 1] × {0, 1} and
write
T = {F : F ⊆ Y, F^{-1}[{0}] ∈ Λ},
νF = λF^{-1}[{0}] for every F ∈ T.
Set
T0 = {F : F ∈ T, F^{-1}[{0}] △ F^{-1}[{1}] is λ-negligible}.
Let I be an uncountable set. Set X = Y × I,
Σ = {E : E ⊆ X, E^{-1}[{i}] ∈ T for every i ∈ I, {i : E^{-1}[{i}] ∉ T0} is countable},
µE = ∑_{i∈I} νE^{-1}[{i}] for E ∈ Σ.
(i) Show that (Y, T, ν) and (Y, T0 , ν↾ T0 ) are complete probability spaces, and that for every F ∈ T there is an F ′ ∈ T0
such that ν(F △F ′ ) = 0. (ii) Show that (X, Σ, µ) is an atomless complete localizable measure space which is not locally
determined.
(b) Define a measure µ on X = ω2 × ω2 as follows. Take Σ to be the σ-algebra of subsets of X generated by
{A × ω2 : A ⊆ ω2 } ∪ {ω2 × α : α < ω2 }.
For E ∈ Σ set
W (E) = {ξ : ξ < ω2 , sup E[{ξ}] = ω2 },
and set µE = #(W (E)) if this is finite, ∞ otherwise. Show that µ is a measure on X, is localizable and locally
determined, but does not have locally determined negligible sets. Find a subspace Y of X such that the subspace
measure on Y is not semi-finite.
(c) Show that in the space described in 216E every set has a measurable envelope, but that this is not true in the
spaces of 216C and 216D.
(d) Set X = ω1 × ω2 . For E ⊆ X set
A(E) = {ζ : for some ξ, just one of (ξ, ζ), (ξ, ζ + 1) belongs to E},

B(E) = {ζ : there are ξ, ζ ′ such that ζ < ζ ′ < ω2 and just one of (ξ, ζ), (ξ, ζ ′ ) belongs to E},

W (E) = {ξ : #(E[{ξ}]) = ω2 }.
Let Σ be the set of subsets E of X such that A(E) is countable and #(B(E)) ≤ ω1 . For E ∈ Σ, set µE = #(W (E)) if
this is finite, ∞ otherwise. (i) Show that (X, Σ, µ) is a measure space. (ii) Show that if µ̂ is the completion of µ, then
its domain is the set of subsets E of X such that A(E) is countable, and µ̂ is strictly localizable. (iii) Show that µ is
not strictly localizable.

216 Notes and comments The examples 216C-216E are designed to form, with Lebesgue measure, a basis for
constructing a complete set of examples for the concepts listed in 211A-211K. One does not really expect to encounter
these phenomena in applications, but a clear understanding of the possibilities demonstrated by these examples is part
of a proper appreciation of their rarity. Of course, if we add further properties to our list – for instance, the property of
having locally determined negligible sets (213I), or the property that every subset should have a measurable envelope
(213Xl) – then there are further positive results to complement 211L, and more examples to hunt for, like 216Yb. But
it is time, perhaps past time, that we returned to the classical theorems which apply to the measure spaces at the
centre of the subject.

Chapter 22
The Fundamental Theorem of Calculus
In this chapter I address one of the most important properties of the Lebesgue integral. Given an integrable function
f : [a, b] → R, we can form its indefinite integral F(x) = ∫_a^x f(t)dt for x ∈ [a, b]. Two questions immediately present
themselves. (i) Can we expect to have the derivative F ′ of F equal to f ? (ii) Can we identify which functions F
will appear as indefinite integrals? Reasonably satisfactory answers may be found for both of these questions: F ′ = f
almost everywhere (222E) and indefinite integrals are the absolutely continuous functions (225E). In the course of
dealing with them, we need to develop a variety of techniques which lead to many striking results both in the theory
of Lebesgue measure and in other, apparently unrelated, topics in real analysis.
The first step is ‘Vitali’s theorem’ (§221), a remarkable argument – it is more a method than a theorem – which uses
the geometric nature of the real line to extract disjoint subfamilies from collections of intervals. It is the foundation
stone not only of the results in §222 but of all geometric measure theory, that is, measure theory on spaces with
a geometric structure. I use it here to show that monotonic functions are differentiable almost everywhere (222A).
Following this, Fatou’s Lemma and Lebesgue’s Dominated Convergence Theorem are enough to show that the derivative
of an indefinite integral is almost everywhere equal to the integrand. We find that some innocent-looking manipulations
of this fact take us surprisingly far; I present these in §223.
I begin the second half of the chapter with a discussion of functions ‘of bounded variation’, that is, expressible as the
difference of bounded monotonic functions (§224). This is one of the least measure-theoretic sections in the volume;
only in 224I and 224J are measure and integration even mentioned. But this material is needed for Chapter 28 as
well as for the next section, and is also one of the basic topics of twentieth-century real analysis. §225 deals with
the characterization of indefinite integrals as the ‘absolutely continuous’ functions. In fact this is now quite easy; it
helps to call on Vitali’s theorem again, but everything else is a straightforward application of methods previously used.
The second half of the section introduces some new ideas in an attempt to give a deeper intuition into the essential
nature of absolutely continuous functions. §226 returns to functions of bounded variation and their decomposition into
‘saltus’ and ‘absolutely continuous’ and ‘singular’ parts, the first two being relatively manageable and the last looking
something like the Cantor function.

221 Vitali’s theorem in R


I give the first theorem of this chapter a section to itself. It occupies a position between measure theory and geometry
(it is, indeed, one of the fundamental results of ‘geometric measure theory’), and its proof involves both the measure
and the geometry of the real line.

221A Vitali’s theorem Let A be a bounded subset of R and I a family of non-singleton closed intervals in R such
that every point of A belongs to arbitrarily short members of I. Then there is a countable set I0 ⊆ I such that (i) I0
is disjoint, that is, I ∩ I′ = ∅ for all distinct I, I′ ∈ I0 (ii) µ(A \ ⋃I0) = 0, where µ is Lebesgue measure on R.

proof (a) If there is a finite disjoint set I0 ⊆ I such that A ⊆ ⋃I0 (including the possibility that A = I0 = ∅), we
can stop. So let us suppose henceforth that there is no such I0.
Let µ∗ be Lebesgue outer measure on R. Suppose that |x| < M for every x ∈ A, and set
I′ = {I : I ∈ I, I ⊆ [−M, M]}.
(b) In this case, if I0 is any finite disjoint subset of I′, there is a J ∈ I′ which is disjoint from any member of I0.
P P Take x ∈ A \ ⋃I0. Now there is a δ > 0 such that [x − δ, x + δ] does not meet any member of I0, and as |x| < M
we can suppose that [x − δ, x + δ] ⊆ [−M, M]. Let J be a member of I, containing x, and of length at most δ; then
J ∈ I′ and J ∩ ⋃I0 = ∅. Q Q
(c) We can now choose a sequence ⟨γn⟩n∈N of real numbers and a disjoint sequence ⟨In⟩n∈N in I′ inductively, as
follows. Given ⟨Ij⟩j<n (if n = 0, this is the empty sequence, with no members), with Ij ∈ I′ for each j < n, and
Ij ∩ Ik = ∅ for j < k < n, set
Jn = {I : I ∈ I′, I ∩ Ij = ∅ for every j < n}.
We know from (b) that Jn ≠ ∅. Set
γn = sup{µI : I ∈ Jn};
then 0 < γn ≤ 2M. We may therefore choose a set In ∈ Jn such that µIn ≥ ½γn, and this continues the induction.

(e) Because the In are disjoint Lebesgue measurable subsets of [−M, M], we have
∑_{n=0}^∞ γn ≤ 2∑_{n=0}^∞ µIn ≤ 4M < ∞,
and lim_{n→∞} γn = 0. Now define I′n to be the closed interval with the same midpoint as In but five times the length,
so that it projects past each end of In by at least γn. I claim that, for any n,
A ⊆ ⋃_{j<n} Ij ∪ ⋃_{j≥n} I′j.
P P ?? Suppose, if possible, otherwise. Take any x belonging to A \ (⋃_{j<n} Ij ∪ ⋃_{j≥n} I′j). Let δ > 0 be such that
[x − δ, x + δ] ⊆ [−M, M] \ ⋃_{j<n} Ij,
and let J ∈ I be such that
x ∈ J ⊆ [x − δ, x + δ].
Then
µJ > 0 = lim_{m→∞} γm;
let m be the least integer greater than or equal to n such that γm < µJ. In this case J cannot belong to Jm, so there
must be some k < m such that J ∩ Ik ≠ ∅, because certainly J ∈ I′. By the choice of δ, k cannot be less than n, so
n ≤ k < m, and γk ≥ µJ. In this case, the distance from x to the nearest endpoint of Ik is at most µJ ≤ γk. But the
ends of I′k project beyond the ends of Ik by at least γk, so x ∈ I′k; which contradicts the choice of x. XX Q Q

(f) It follows that
µ∗(A \ ⋃_{j<n} Ij) ≤ µ(⋃_{j≥n} I′j) ≤ ∑_{j=n}^∞ µI′j ≤ 5∑_{j=n}^∞ µIj.
As
∑_{j=0}^∞ µIj ≤ 2M < ∞,
we must have
lim_{n→∞} µ∗(A \ ⋃_{j<n} Ij) = 0,
and
µ(A \ ⋃_{j∈N} Ij) = µ∗(A \ ⋃_{j∈N} Ij) ≤ inf_{n∈N} µ∗(A \ ⋃_{j<n} Ij) = 0.
Thus in this case we may set I0 = {In : n ∈ N} to obtain a countable disjoint family in I with µ(A \ ⋃I0) = 0.

221B Remarks (a) I have expressed this theorem in the form ‘there is a countable set I0 ⊆ I such that . . . ’ in an
attempt to find a concise way of expressing the three possibilities
(i) A = I = ∅, so that we must take I0 = ∅;
(ii) there are disjoint I0 , . . . , In ∈ I such that A ⊆ I0 ∪ . . . ∪ In , so that we can take I0 = {I0 , . . . , In };
(iii) there is a disjoint sequence ⟨In⟩n∈N in I such that µ(A \ ⋃_{n∈N} In) = 0, so that we can take I0 =
{In : n ∈ N}.
Of course many applications, like the proof of 221A itself, will use forms of these three alternatives.

(b) The actual theorem here, as stated, will be used in the next section. But quite as important as the statement
of the theorem is the principle of its proof. The In are chosen ‘greedily’, that is, when we come to choose In we look
at the family Jn of possible intervals, given the choices I0 , . . . , In−1 already made, and choose an In ∈ Jn which is
‘about’ as big as it could be. The supremum of the possibilities for µIn is γn ; but since we do not know that there
is any I ∈ Jn such that µI = γn, we must settle for a little less. I follow the standard formula in taking µIn ≥ ½γn,
but of course I could have taken µIn ≥ (99/100)γn, or µIn ≥ (1 − 2^{-n})γn, if that had helped later on. The remarkable
thing is that this works; we can choose the In without foresight and without considering their interrelationships (for
that matter, without examining the set A) beyond the minimal requirement that In ∩ Ij = ∅ for j < n, and even this
arbitrary and casual procedure yields a suitable sequence.
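
As an informal illustration of the greedy principle described in (b), and not as part of the proof, the selection can be tried out on a finite family of closed intervals. In the Python sketch below the names `greedy_disjoint` and `enlarge`, and the representation of a closed interval as a pair `(lo, hi)`, are assumptions made for the example. With a finite family the supremum γn is attained, so the factor ½ is kept only to mirror the text.

```python
from fractions import Fraction

def greedy_disjoint(intervals, ratio=Fraction(1, 2)):
    """Greedy selection in the style of 221A(c), for a finite family.

    `intervals` is a list of closed intervals (lo, hi) with lo < hi.
    At each step, among the intervals disjoint from all earlier choices,
    pick one whose length is at least `ratio` times the largest available
    length gamma_n. Returns the chosen intervals in order of choice.
    """
    chosen = []
    remaining = list(intervals)
    while True:
        # J_n: candidates disjoint from every interval chosen so far
        candidates = [
            (lo, hi) for (lo, hi) in remaining
            if all(hi < c_lo or c_hi < lo for (c_lo, c_hi) in chosen)
        ]
        if not candidates:
            return chosen
        gamma = max(hi - lo for (lo, hi) in candidates)
        # any candidate of length >= ratio * gamma will do; take the first
        pick = next(iv for iv in candidates if iv[1] - iv[0] >= ratio * gamma)
        chosen.append(pick)
        remaining = candidates

def enlarge(interval, factor=5):
    """Closed interval with the same midpoint and `factor` times the length."""
    lo, hi = interval
    mid, half = (lo + hi) / 2, (hi - lo) / 2
    return (mid - factor * half, mid + factor * half)
```

For a finite family the argument of part (e) shows that every interval of the family, chosen or not, lies inside the five-fold enlargement of some chosen interval; the sketch can be used to check this on small examples with exact rational endpoints.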

(c) I have stated the theorem in terms of bounded sets A and closed intervals, which is adequate for our needs, but
very small changes in the proof suffice to deal with arbitrary (non-singleton) intervals, and another refinement handles
unbounded sets A. (See 221Ya.)

221X Basic exercises (a) Let α ∈ ]0, 1[. Suppose, in part (c) of the proof of 221A, we take µIn ≥ αγn for each
n ∈ N, rather than µIn ≥ ½γn. What will be the appropriate constant to take in place of 5 in defining the sets I′j of
part (e)?

221Y Further exercises (a) Let A be a subset of R and I a family of non-singleton intervals in R such that every
point of A belongs to arbitrarily short members of I. Show that there is a countable disjoint set I0 ⊆ I such that
A \ ⋃I0 is Lebesgue negligible. (Hint: apply 221A to the sets A ∩ ]n, n + 1[, {Ī : I ∈ I, I ⊆ ]n, n + 1[}, writing Ī for
the closed interval with the same endpoints as I.)

(b) Let J be any family of non-singleton intervals in R. Show that ⋃J is Lebesgue measurable. (Hint: apply (a)
to A = ⋃J and the family I of non-singleton subintervals of members of J.)

(c) Let (X, ρ) be a metric space, A a subset of X, and I a family of closed balls of non-zero radius in X such that
every point of A belongs to arbitrarily small members of I. (I say here that a set is a ‘closed ball of non-zero radius’
if it is expressible in the form B(x, δ) = {y : ρ(y, x) ≤ δ} where x ∈ X and δ > 0. Of course it is possible for such
a ball to be a singleton {x}.) Show that either A can be covered by a finite disjoint family in I or there is a disjoint
sequence ⟨B(xn, δn)⟩n∈N in I such that
A ⊆ ⋃_{m≤n} B(xm, δm) ∪ ⋃_{m>n} B(xm, 5δm) for every n ∈ N
or there is a disjoint sequence ⟨B(xn, δn)⟩n∈N in I such that inf_{n∈N} δn > 0.

(d) Give an example of a family I of open intervals such that every point of R belongs to arbitrarily small members
of I, but if ⟨In⟩n∈N is any disjoint sequence in I, and for each n ∈ N we write I′n for the closed interval with the same
centre as In and ten times the length, then there is an n such that ]0, 1[ ⊈ ⋃_{m<n} Im ∪ ⋃_{m≥n} I′m.

(e)(i) Show that if I is a finite family of intervals in R there are I0, I1 ⊆ I such that ⋃(I0 ∪ I1) = ⋃I and both
I0 and I1 are disjoint families. (Hint: induce on #(I).) (ii) Suppose that I is a family of non-singleton intervals, of
length at most 1, covering a bounded set A ⊆ R, and that ε > 0. Show that there is a disjoint subfamily I0 of I such
that µ∗(A \ ⋃I0) ≤ ½µ∗A + ε. (Hint: replacing each member of I by a slightly longer one with rational endpoints,
reduce to the case in which I is countable and thence to the case in which I is finite; now use (i).) (iii) Use (ii) to
prove Vitali’s theorem. (I learnt this argument from J.Aldaz.)

221 Notes and comments I have headed this section ‘Vitali’s theorem in R’ because there is an r-dimensional version,
which will appear in Chapter 26 below. There is an anomaly in the position of this theorem. It is an indispensable
element of the proofs of some of the most important theorems in measure theory; on the other hand, the ideas involved
in its own proof are not used elsewhere in the elementary theory. I have therefore myself sometimes omitted the proof
when teaching this material, and would not reproach any student who left it to one side for the moment. At some
stage, of course, any measure theorist must master the method, not just for the sake of completeness, but in order to
gain an intuition for possible variations. I must emphasize that it is the principle of the proof, rather than its details,
which is important, because there are innumerable forms of ‘Vitali’s theorem’. (I offer some variations in the exercises
above and in §261 below, and there are many others which are important in more advanced work; one will appear in
§472 in Volume 4.) This principle is, I suppose, that
(i) we choose the In greedily, according to some more or less natural criterion applicable to each In as
we come to choose it, without attempting to look ahead;
(ii) we prove that their sizes tend to zero, even though we seemed to do nothing to ensure that they
would (but note the shift from I to I ′ in part (a) of the proof of 221A, which is exactly what is needed to
make this step work);
(iii) we check that for a suitable definition of I′n, enlarging In, we shall have A ⊆ ⋃_{m<n} Im ∪ ⋃_{m≥n} I′m
for every n, while ∑_{n=0}^∞ µI′n < ∞.
In a way, we have to count ourselves lucky every time this works. The reason for studying as many variations as
possible of a technique of this kind is to learn to guess when we might be lucky.

222 Differentiating an indefinite integral


I come now to the first of the two questions mentioned in the introduction to this chapter: if f is an integrable
function on [a, b], what is (d/dx)∫_a^x f? It turns out that this derivative exists and is equal to f almost everywhere (222E).
The argument is based on a striking property of monotonic functions: they are differentiable almost everywhere (222A),
and we can bound the integrals of their derivatives (222C).

222A Theorem Let I ⊆ R be an interval and f : I → R a monotonic function. Then f is differentiable almost
everywhere in I.

Remark If I seem to be speaking of a measure on R without naming it, as here, I mean Lebesgue measure.

proof As usual, write µ∗ for Lebesgue outer measure on R, µ for Lebesgue measure.

(a) To begin with (down to the end of (c) below), let us suppose that f is non-decreasing and I is a bounded open
interval on which f is bounded; say |f(x)| ≤ M for x ∈ I. For any closed subinterval J = [a, b] of I, write f∗(J) for
the open interval ]f(a), f(b)[. For x ∈ I, write
D^∗f(x) = lim sup_{h→0} (1/h)(f(x + h) − f(x)), D_∗f(x) = lim inf_{h→0} (1/h)(f(x + h) − f(x)),
allowing the value ∞ in both cases. Then f is differentiable at x iff D^∗f(x) = D_∗f(x) ∈ R. Because surely D^∗f(x) ≥
D_∗f(x) ≥ 0, f will be differentiable at x iff D^∗f(x) is finite and D^∗f(x) ≤ D_∗f(x).
I therefore have to show that the sets
{x : x ∈ I, D^∗f(x) = ∞}, {x : x ∈ I, D^∗f(x) > D_∗f(x)}
are negligible.

(b) Let us take A = {x : x ∈ I, D^∗f(x) = ∞} first. Fix an integer m ≥ 1 for the moment, and set
Am = {x : x ∈ I, D^∗f(x) > m} ⊇ A.
Let ℐ be the family of non-trivial closed intervals [a, b] ⊆ I such that f(b) − f(a) ≥ m(b − a); then µf∗(J) ≥ mµJ for
every J ∈ ℐ. If x ∈ Am, then for any δ > 0 we have an h with 0 < |h| ≤ δ and (1/h)(f(x + h) − f(x)) > m, so that
[x, x + h] ∈ ℐ if h > 0, [x + h, x] ∈ ℐ if h < 0;
thus every member of Am belongs to arbitrarily small intervals in ℐ. By Vitali’s theorem (221A), there is a countable
disjoint set ℐ0 ⊆ ℐ such that µ(Am \ ⋃ℐ0) = 0. Now, because f is non-decreasing, ⟨f∗(J)⟩J∈ℐ0 is disjoint, and all the
f∗(J) are included in [−M, M], so ∑_{J∈ℐ0} µf∗(J) ≤ 2M and ∑_{J∈ℐ0} µJ ≤ 2M/m. Because Am \ ⋃ℐ0 is negligible,
µ∗A ≤ µ∗Am ≤ 2M/m.

As m is arbitrary, µ∗A = 0 and A is negligible.

(c) Now consider B = {x : x ∈ I, D^∗f(x) > D_∗f(x)}. For q, q′ ∈ Q with 0 ≤ q < q′, set
B_{qq′} = {x : x ∈ I, D_∗f(x) < q, D^∗f(x) > q′}.
Fix such q, q′ for the moment, and write γ = µ∗B_{qq′}. Take any ε > 0, and let G be an open set including B_{qq′} such
that µG ≤ γ + ε (134Fa). Let 𝒥 be the set of non-trivial closed intervals [a, b] ⊆ I ∩ G such that f(b) − f(a) ≤ q(b − a);
this time µf∗(J) ≤ qµJ for J ∈ 𝒥. Then every member of B_{qq′} is included in arbitrarily small members of 𝒥, so there
is a countable disjoint 𝒥0 ⊆ 𝒥 such that B_{qq′} \ ⋃𝒥0 is negligible. Let L be the set of endpoints of members of 𝒥0;
then L is a countable union of doubleton sets, so is countable, therefore negligible. Set
C = B_{qq′} ∩ ⋃𝒥0 \ L;
then µ∗C = γ. Let ℐ be the set of non-trivial closed intervals J = [a, b] such that (i) J is included in one of the
members of 𝒥0 (ii) f(b) − f(a) ≥ q′(b − a); now µf∗(J) ≥ q′µJ for every J ∈ ℐ. Once again, because every member of
C is an interior point of some member of 𝒥0, every point of C belongs to arbitrarily small members of ℐ; so there is a
countable disjoint ℐ0 ⊆ ℐ such that µ(C \ ⋃ℐ0) = 0.
As in (b) above,
γq′ ≤ q′µ(⋃ℐ0) = ∑_{I∈ℐ0} q′µI ≤ ∑_{I∈ℐ0} µf∗(I) = µ(⋃_{I∈ℐ0} f∗(I)).
On the other hand,
µ(⋃_{J∈𝒥0} f∗(J)) = ∑_{J∈𝒥0} µf∗(J) ≤ q∑_{J∈𝒥0} µJ = qµ(⋃𝒥0)
≤ qµ(⋃𝒥) ≤ qµG ≤ q(γ + ε).
But ⋃_{I∈ℐ0} f∗(I) ⊆ ⋃_{J∈𝒥0} f∗(J), because every member of ℐ0 is included in a member of 𝒥0, so γq′ ≤ q(γ + ε) and
γ ≤ εq/(q′ − q). As ε is arbitrary, γ = 0.
Thus every B_{qq′} is negligible. Consequently B = ⋃_{q,q′∈Q, 0≤q<q′} B_{qq′} is negligible.
(d) This deals with the case of a bounded open interval on which f is bounded and non-decreasing. Still for
non-decreasing f, but for an arbitrary interval I, observe that K = {(q, q′) : q, q′ ∈ I ∩ Q, q < q′} is countable and that
I \ ⋃_{(q,q′)∈K} ]q, q′[ has at most two points (the endpoints of I, if any), so is negligible. If we write S for the set of points
of I at which f is not differentiable, then from (a)-(c) we see that S ∩ ]q, q′[ is negligible for every (q, q′) ∈ K, so that
S ∩ ⋃_{(q,q′)∈K} ]q, q′[ is negligible and S is negligible.
(e) Thus we are done if f is non-decreasing. For non-increasing f, apply the above to −f, which is differentiable at
exactly the same points as f.
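
As an informal aid to the definitions in part (a) of the proof, the upper and lower derivates can be explored numerically. The Python sketch below is an illustration only: the name `quotient_range` and the choice of a finite sample of increments h are assumptions made for the example, and a finite sample can only suggest the values of the lim sup and lim inf, not establish them.

```python
from fractions import Fraction

def quotient_range(f, x, delta, samples=50):
    """Min and max of (f(x+h) - f(x))/h over a finite sample of h with 0 < |h| <= delta."""
    hs = [delta * Fraction(i, samples) for i in range(1, samples + 1)]
    hs += [-h for h in hs]  # include negative increments as well
    quotients = [(f(x + h) - f(x)) / h for h in hs]
    return min(quotients), max(quotients)
```

For the non-decreasing function given by f(t) = 2t for t ≥ 0 and f(t) = t for t < 0, every difference quotient at 0 is either 1 (for h < 0) or 2 (for h > 0), so the lower and upper derivates at 0 differ and f is not differentiable there; at every other point they agree.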

222B Remarks (a) I note that in the above argument I am using such formulae as ∑_{J∈ℐ0} µf∗(J). This is because
Vitali’s theorem leaves it open whether the families ℐ0 will be finite or infinite. The sum must be interpreted along
the lines laid down in 112Bd in Volume 1; generally, ∑_{k∈K} ak, where K is an arbitrary set and every ak ≥ 0, is to be
sup_{L⊆K is finite} ∑_{k∈L} ak, with the convention that ∑_{k∈∅} ak = 0. Now, in this context, if (X, Σ, µ) is a measure space,
K is a countable set, and ⟨Ek⟩k∈K is a family in Σ,
µ(⋃_{k∈K} Ek) ≤ ∑_{k∈K} µEk,
with equality if ⟨Ek⟩k∈K is disjoint. P P If K = ∅, this is trivial. Otherwise, let n ↦ kn : N → K be a surjection, and
set
Kn = {ki : i ≤ n}, Gn = ⋃_{i≤n} E_{ki} = ⋃_{k∈Kn} Ek
for each n ∈ N. Then ⟨Gn⟩n∈N is a non-decreasing sequence with union E = ⋃_{k∈K} Ek, so
µE = lim_{n→∞} µGn = sup_{n∈N} µGn;
and
µGn ≤ ∑_{k∈Kn} µEk ≤ ∑_{k∈K} µEk
for every n, so µE ≤ ∑_{k∈K} µEk. If the Ek are disjoint, then µGn is precisely ∑_{k∈Kn} µEk for each n; but as ⟨Kn⟩n∈N
is a non-decreasing sequence of sets with union K, every finite subset of K is included in some Kn, and
∑_{k∈K} µEk = sup_{n∈N} ∑_{k∈Kn} µEk = sup_{n∈N} µGn = µE,
as required. Q Q
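
The definition of ∑_{k∈K} ak as a supremum over finite subsets can be checked by brute force when K itself is finite. The following Python sketch is an illustration under that assumption only; the name `unordered_sum` and the use of a dictionary to hold the family ⟨ak⟩k∈K are choices made for the example.

```python
from fractions import Fraction
from itertools import combinations

def unordered_sum(a):
    """sup over finite L ⊆ K of sum_{k in L} a[k], for a finite dict a with values >= 0.

    The empty subset contributes 0, matching the convention for K = ∅.
    """
    keys = list(a)
    best = Fraction(0)
    for r in range(len(keys) + 1):
        for L in combinations(keys, r):
            best = max(best, sum((a[k] for k in L), Fraction(0)))
    return best
```

Because the indices are never enumerated in any preferred order, the value cannot depend on an enumeration; for finite K with non-negative terms it agrees with the ordinary sum.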

(b) Some readers will prefer to re-index sets regularly, so that all the sums they need to look at will be of the form
∑_{i=0}^n or ∑_{i=0}^∞. In effect, that is what I did in Volume 1, in the proof of 114Da/115Da, when showing that Lebesgue
outer measure is indeed an outer measure. The disadvantage of this procedure in the context of 222A is that we must
continually check that it doesn’t matter whether we have a finite or infinite sum at any particular moment. I believe
that it is worth taking the trouble to learn the technique sketched here, because it very frequently happens that we
wish to consider unions of sets indexed by sets other than N and {0, . . . , n}.

(c) Of course the argument above can be shortened if you know a tiny bit more about countable sets than I have
explicitly stated so far. But note that the value assigned to ∑_{k∈K} ak must not depend on which enumeration ⟨kn⟩n∈N
we pick on.

222C Lemma Suppose that a ≤ b in R, and that F : [a, b] → R is a non-decreasing function. Then ∫_a^b F′ exists
and is at most F(b) − F(a).
Remark I discussed integration over subsets at length in §131 and §214. For measurable subsets, which are sufficient
for our needs in this chapter, we have a simple description: if (X, Σ, µ) is a measure space, E ∈ Σ and f is a real-valued
function, then ∫_E f = ∫ f̃ if the latter integral exists, where dom f̃ = (E ∩ dom f) ∪ (X \ E) and f̃(x) = f(x) if
x ∈ E ∩ dom f, 0 if x ∈ X \ E (apply 131Fa to f̃). It follows at once that if now F ∈ Σ and F ⊆ E, ∫_F f = ∫_E f × χF.
I write ∫_a^x f to mean ∫_{[a,x[} f, which (because [a, x[ is measurable) can be dealt with as described above. Note that, as
long as we are dealing with Lebesgue measure, so that [a, x] \ ]a, x[ = {a, x} is negligible, there is no need to distinguish
between ∫_{[a,x]}, ∫_{]a,x[}, ∫_{[a,x[}, ∫_{]a,x]}; for other measures on R we may need to take more care. I use half-open intervals to
make it obvious that ∫_a^x f + ∫_x^y f = ∫_a^y f if a ≤ x ≤ y, because
f × χ[a, y[ = f × χ[a, x[ + f × χ[x, y[.

proof (a) The result is trivial if a = b; let us suppose that a < b. By 222A, F ′ is defined almost everywhere in [a, b].
(b) For each n ∈ N, define a simple function gn : [a, b[ → R as follows. For 0 ≤ k < 2^n, set ank = a + 2^{-n}k(b − a),
bnk = a + 2^{-n}(k + 1)(b − a), Ink = [ank, bnk[. For each x ∈ [a, b[, take that k < 2^n such that x ∈ Ink, and set
gn(x) = (2^n/(b − a))(F(bnk) − F(ank))
for x ∈ Ink, so that gn gives the slope of the chord of the graph of F defined by the endpoints of Ink. Then
∫_a^b gn = ∑_{k=0}^{2^n−1} (F(bnk) − F(ank)) = F(b) − F(a).

(c) On the other hand, if we set
C = {x : x ∈ ]a, b[, F′(x) exists},
then [a, b] \ C is negligible, by 222A, and F′(x) = lim_{n→∞} gn(x) for every x ∈ C. P P Let ε > 0. Then there is a δ > 0
such that x + h ∈ [a, b] and |F(x + h) − F(x) − hF′(x)| ≤ ε|h| whenever |h| ≤ δ. Let n ∈ N be such that 2^{-n}(b − a) ≤ δ.
Let k < 2^n be such that x ∈ Ink. Then
x − δ ≤ ank ≤ x < bnk ≤ x + δ, gn(x) = (2^n/(b − a))(F(bnk) − F(ank)).
Now we have
|gn(x) − F′(x)| = |(2^n/(b − a))(F(bnk) − F(ank)) − F′(x)|
= (2^n/(b − a)) |F(bnk) − F(ank) − (bnk − ank)F′(x)|
≤ (2^n/(b − a)) (|F(bnk) − F(x) − (bnk − x)F′(x)| + |F(x) − F(ank) − (x − ank)F′(x)|)
≤ (2^n/(b − a)) (ε|bnk − x| + ε|x − ank|) = ε.
And this is true whenever 2^{-n}(b − a) ≤ δ, that is, for all n large enough. As ε is arbitrary, F′(x) = lim_{n→∞} gn(x). Q Q
(d) Thus gn → F′ almost everywhere in [a, b]. By Fatou’s Lemma,
∫_a^b F′ = ∫_a^b lim inf_{n→∞} gn ≤ lim inf_{n→∞} ∫_a^b gn = lim_{n→∞} ∫_a^b gn = F(b) − F(a),
as required.
Remark There is a generalization of this result in 224I.
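
The simple functions gn used in the proof of 222C are easy to compute, and the telescoping identity for their integrals can be confirmed exactly with rational arithmetic. The Python sketch below is an illustration only; the names `chord_slopes` and `integral_of_g` are assumptions made for the example, and F is taken to be any non-decreasing function that can be evaluated at rational points.

```python
from fractions import Fraction

def chord_slopes(F, a, b, n):
    """Values of g_n on the 2**n intervals I_nk = [a_nk, b_nk[ of 222C(b).

    Returns the list of chord slopes together with the common width of the I_nk.
    """
    width = (b - a) / 2**n
    slopes = []
    for k in range(2**n):
        a_nk = a + k * width
        b_nk = a + (k + 1) * width
        slopes.append((F(b_nk) - F(a_nk)) / width)
    return slopes, width

def integral_of_g(F, a, b, n):
    """Integral of the simple function g_n over [a, b[; the sum telescopes."""
    slopes, width = chord_slopes(F, a, b, n)
    return sum(s * width for s in slopes)
```

For F(x) = x³ on [0, 2] the integral of every gn is exactly F(2) − F(0) = 8, while the slope of the chord over the dyadic interval containing 1 approaches F′(1) = 3 as n grows, in line with part (c) of the proof.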

222D Lemma Suppose that a < b in R, and that f, g are real-valued functions, both integrable over [a, b], such
that ∫_a^x f = ∫_a^x g for every x ∈ [a, b]. Then f = g almost everywhere in [a, b].
proof The point is that
∫_E f = ∫_a^b f × χE = ∫_a^b g × χE = ∫_E g
for any measurable set E ⊆ [a, b[.
P P (i) If E = [c, d[ where a ≤ c ≤ d ≤ b, then
∫_E f = ∫_a^d f − ∫_a^c f = ∫_a^d g − ∫_a^c g = ∫_E g.

(ii) If E = [a, b[ ∩ G for some open set G ⊆ R, then for each n ∈ N set
Kn = {k : k ∈ Z, |k| ≤ 4^n, [2^{-n}k, 2^{-n}(k + 1)[ ⊆ G},
Hn = ⋃_{k∈Kn} [2^{-n}k, 2^{-n}(k + 1)[ ∩ [a, b[;
then ⟨Hn⟩n∈N is a non-decreasing sequence of measurable sets with union E, so f × χE = lim_{n→∞} f × χHn, and
(by Lebesgue’s Dominated Convergence Theorem, because |f × χHn| ≤ |f| almost everywhere for every n, and |f| is
integrable)
∫_E f = lim_{n→∞} ∫_{Hn} f.
At the same time, each Hn is a finite disjoint union of half-open intervals in [a, b[, so
∫_{Hn} f = ∑_{k∈Kn} ∫_{[2^{-n}k, 2^{-n}(k+1)[ ∩ [a,b[} f = ∑_{k∈Kn} ∫_{[2^{-n}k, 2^{-n}(k+1)[ ∩ [a,b[} g = ∫_{Hn} g,
and
∫_E g = lim_{n→∞} ∫_{Hn} g = lim_{n→∞} ∫_{Hn} f = ∫_E f.

(iii) For general measurable $E\subseteq[a,b[$, we can choose for each $n\in\mathbb N$ an open set $G_n\supseteq E$ such that $\mu G_n\le\mu E+2^{-n}$ (134Fa). Set $G'_n=\bigcap_{m\le n}G_m$, $E_n=[a,b[\,\cap G'_n$ for each $n$,
\[ F=[a,b[\,\cap\bigcap_{n\in\mathbb N}G_n=\bigcap_{n\in\mathbb N}[a,b[\,\cap G'_n=\bigcap_{n\in\mathbb N}E_n. \]
Then $E\subseteq F$ and
\[ \mu F\le\inf_{n\in\mathbb N}\mu G_n=\mu E, \]
so $F\setminus E$ is negligible and $f\times\chi(F\setminus E)$ is zero almost everywhere; consequently $\int_{F\setminus E}f=0$ and $\int_F f=\int_E f$. On the other hand,
\[ f\times\chi F=\lim_{n\to\infty}f\times\chi E_n, \]
so by Lebesgue’s Dominated Convergence Theorem again
\[ \int_E f=\int_F f=\lim_{n\to\infty}\int_{E_n}f. \]
Similarly
\[ \int_E g=\lim_{n\to\infty}\int_{E_n}g. \]
But by part (ii) we have $\int_{E_n}g=\int_{E_n}f$ for every $n$, so $\int_E g=\int_E f$, as required. Q Q
By 131Hb, $f=g$ almost everywhere in $[a,b[$, and therefore almost everywhere in $[a,b]$.

222E Theorem Suppose that $a\le b$ in $\mathbb R$ and that $f$ is a real-valued function which is integrable over $[a,b]$. Then $F(x)=\int_a^x f$ exists in $\mathbb R$ for every $x\in[a,b]$, and the derivative $F'(x)$ exists and is equal to $f(x)$ for almost every $x\in[a,b]$.
proof (a) For most of this proof (down to the end of (c) below) I suppose that $f$ is non-negative. In this case,
\[ F(y)=F(x)+\int_x^y f\ge F(x) \]
whenever $a\le x\le y\le b$; thus $F$ is non-decreasing and therefore differentiable almost everywhere in $[a,b]$, by 222A. By 222C we know also that $\int_a^x F'$ exists and is less than or equal to $F(x)-F(a)=F(x)$ for every $x\in[a,b]$.
(b) Now suppose, in addition, that $f$ is bounded; say $0\le f(t)\le M$ for every $t\in\operatorname{dom}f$. Then $M-f$ is integrable over $[a,b]$; let $G$ be its indefinite integral, so that $G(x)=M(x-a)-F(x)$ for every $x\in[a,b]$. Applying (a) to $M-f$ and $G$, we have $\int_a^x G'\le G(x)$ for every $x\in[a,b]$; but of course $G'=M-F'$, so $M(x-a)-\int_a^x F'\le M(x-a)-F(x)$, that is, $\int_a^x F'\ge F(x)$ for every $x\in[a,b]$. Thus $\int_a^x F'=\int_a^x f$ for every $x\in[a,b]$. Now 222D tells us that $F'=f$ almost everywhere in $[a,b]$.
(c) Thus for bounded, non-negative $f$ we are done. For unbounded $f$, let $\langle f_n\rangle_{n\in\mathbb N}$ be a non-decreasing sequence of non-negative simple functions converging to $f$ almost everywhere in $[a,b]$, and let $\langle F_n\rangle_{n\in\mathbb N}$ be the corresponding indefinite integrals. Then for any $n$ and any $x$, $y$ with $a\le x\le y\le b$, we have
\[ F(y)-F(x)=\int_x^y f\ge\int_x^y f_n=F_n(y)-F_n(x), \]
so that $F'(x)\ge F_n'(x)$ for any $x\in\,]a,b[$ where both are defined, and $F'(x)\ge f_n(x)$ for almost every $x\in[a,b]$. This is true for every $n$, so $F'\ge f$ almost everywhere, and $F'-f\ge0$ almost everywhere. On the other hand, as noted in (a),
\[ \int_a^b F'\le F(b)-F(a)=\int_a^b f, \]
so $\int_a^b F'-f\le0$. It follows that $F'=_{\text{a.e.}}f$ (that is, that $F'=f$ almost everywhere in $[a,b]$) (122Rd).
(d) This completes the proof for non-negative f . For general f , we can express f as f1 − f2 where f1 , f2 are
non-negative integrable functions; now F = F1 − F2 where F1 , F2 are the corresponding indefinite integrals, so
F ′ =a.e. F1′ − F2′ =a.e. f1 − f2 , and F ′ =a.e. f .
222F Corollary Suppose that $f$ is any real-valued function which is integrable over $\mathbb R$, and set $F(x)=\int_{-\infty}^x f$ for every $x\in\mathbb R$. Then $F'(x)$ exists and is equal to $f(x)$ for almost every $x\in\mathbb R$.
proof For each $n\in\mathbb N$, set
\[ F_n(x)=\int_{-n}^x f \]
for $x\in[-n,n]$. Then $F_n'(x)=f(x)$ for almost every $x\in[-n,n]$. But $F(x)=F(-n)+F_n(x)$ for every $x\in[-n,n]$, so $F'(x)$ exists and is equal to $F_n'(x)$ for every $x\in\,]-n,n[$ for which $F_n'(x)$ is defined; and $F'(x)=f(x)$ for almost every $x\in[-n,n]$. As $n$ is arbitrary, $F'=_{\text{a.e.}}f$.

222G Corollary Suppose that $E\subseteq\mathbb R$ is a measurable set and that $f$ is a real-valued function which is integrable over $E$. Set $F(x)=\int_{E\cap]-\infty,x[}f$ for $x\in\mathbb R$. Then $F'(x)=f(x)$ for almost every $x\in E$, and $F'(x)=0$ for almost every $x\in\mathbb R\setminus E$.
proof Apply 222F to $\tilde f$, where $\tilde f(x)=f(x)$ for $x\in E\cap\operatorname{dom}f$ and $\tilde f(x)=0$ for $x\in\mathbb R\setminus E$, so that $F(x)=\int_{-\infty}^x\tilde f$ for every $x\in\mathbb R$.

222H The result that $\frac{d}{dx}\int_a^x f=f(x)$ for almost every $x$ is satisfying, but is no substitute for the more elementary result that this equality is valid at any point at which $f$ is continuous.
Proposition Suppose that $a\le b$ in $\mathbb R$ and that $f$ is a real-valued function which is integrable over $[a,b]$. Set $F(x)=\int_a^x f$ for $x\in[a,b]$. Then $F'(x)$ exists and is equal to $f(x)$ at any point $x\in\operatorname{dom}(f)\cap\,]a,b[$ at which $f$ is continuous.
proof Set $c=f(x)$. Let $\epsilon>0$. Let $\delta>0$ be such that $\delta\le\min(b-x,x-a)$ and $|f(t)-c|\le\epsilon$ whenever $t\in\operatorname{dom}f$ and $|t-x|\le\delta$. If $x<y\le x+\delta$, then
\[ \Bigl|\frac{F(y)-F(x)}{y-x}-c\Bigr|=\frac1{y-x}\Bigl|\int_x^y f-c\Bigr|\le\frac1{y-x}\int_x^y|f-c|\le\epsilon. \]
Similarly, if $x-\delta\le y<x$,
\[ \Bigl|\frac{F(y)-F(x)}{y-x}-f(x)\Bigr|=\frac1{x-y}\Bigl|\int_y^x f-c\Bigr|\le\frac1{x-y}\int_y^x|f-c|\le\epsilon. \]
As $\epsilon$ is arbitrary,
\[ F'(x)=\lim_{y\to x}\frac{F(y)-F(x)}{y-x}=c, \]
as required.
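
A numerical illustration of 222H may help; it is only a sketch, and the integrand (cos), the midpoint rule and the step sizes are choices made for this example rather than anything taken from the text. It compares difference quotients of an indefinite integral with the value of the integrand at a point of continuity.

```python
import math

def integral(f, lo, hi, steps=2000):
    """Midpoint-rule approximation to the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

def difference_quotient(f, x, h):
    """(F(x + h) - F(x)) / h for an indefinite integral F of f.

    F(x + h) - F(x) is the integral of f over [x, x + h], so the lower
    limit a of the indefinite integral plays no part in the quotient.
    """
    return integral(f, x, x + h) / h

f = math.cos   # continuous everywhere, so 222H applies at every point
x = 1.0
quotients = [difference_quotient(f, x, h) for h in (1e-1, 1e-2, 1e-3)]
errors = [abs(q - f(x)) for q in quotients]
print(errors)  # the errors shrink as h decreases
```

For a discontinuous integrand the same quotients need not converge at the points of discontinuity, which is where 222E (an almost-everywhere statement) takes over.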

222I Complex-valued functions So far in this section, I have taken every f to be real-valued. The extension to
complex-valued f is just a matter of applying the above results to the real and imaginary parts of f . Specifically, we
have the following.
(a) If $a\le b$ in $\mathbb R$ and $f$ is a complex-valued function which is integrable over $[a,b]$, then $F(x)=\int_a^x f$ is defined in $\mathbb C$ for every $x\in[a,b]$, and its derivative $F'(x)$ exists and is equal to $f(x)$ for almost every $x\in[a,b]$; moreover, $F'(x)=f(x)$ whenever $x\in\operatorname{dom}(f)\cap\,]a,b[$ and $f$ is continuous at $x$.

(b) If $f$ is a complex-valued function which is integrable over $\mathbb R$, and $F(x)=\int_{-\infty}^x f$ for each $x\in\mathbb R$, then $F'$ exists and is equal to $f$ almost everywhere in $\mathbb R$.

(c) If $E\subseteq\mathbb R$ is a measurable set and $f$ is a complex-valued function which is integrable over $E$, and $F(x)=\int_{E\cap]-\infty,x[}f$ for each $x\in\mathbb R$, then $F'(x)=f(x)$ for almost every $x\in E$ and $F'(x)=0$ for almost every $x\in\mathbb R\setminus E$.

*222J The Denjoy-Young-Saks theorem The next result will not be used, on present plans, anywhere in this
treatise. It is however central to parts of real analysis for which this volume is supposed to be a foundation, and while
the argument requires a certain sophistication it is not really a large step from Lebesgue’s theorem 222A. I must begin
with some notation.
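Definition Let $f$ be any real function, and $A\subseteq\mathbb R$ its domain. Write
\[ \tilde A^+=\{x:x\in A,\ ]x,x+\delta]\cap A\ne\emptyset\text{ for every }\delta>0\}, \]
\[ \tilde A^-=\{x:x\in A,\ [x-\delta,x[\,\cap A\ne\emptyset\text{ for every }\delta>0\}. \]
Set
\[ (\overline D^+f)(x)=\limsup_{y\in A,\,y\downarrow x}\frac{f(y)-f(x)}{y-x}=\inf_{\delta>0}\sup_{y\in A,\,x<y\le x+\delta}\frac{f(y)-f(x)}{y-x}, \]
\[ (\underline D^+f)(x)=\liminf_{y\in A,\,y\downarrow x}\frac{f(y)-f(x)}{y-x}=\sup_{\delta>0}\inf_{y\in A,\,x<y\le x+\delta}\frac{f(y)-f(x)}{y-x} \]
for $x\in\tilde A^+$, and
\[ (\overline D^-f)(x)=\limsup_{y\in A,\,y\uparrow x}\frac{f(y)-f(x)}{y-x}=\inf_{\delta>0}\sup_{y\in A,\,x-\delta\le y<x}\frac{f(y)-f(x)}{y-x}, \]
\[ (\underline D^-f)(x)=\liminf_{y\in A,\,y\uparrow x}\frac{f(y)-f(x)}{y-x}=\sup_{\delta>0}\inf_{y\in A,\,x-\delta\le y<x}\frac{f(y)-f(x)}{y-x} \]
for $x\in\tilde A^-$, all defined in $[-\infty,\infty]$. (These are the four Dini derivates of $f$. You will also see $D^+$, $d^+$, $D^-$, $d^-$ used in place of my $\overline D^+$, $\underline D^+$, $\overline D^-$ and $\underline D^-$.)
Note that we surely have $(\underline D^+f)(x)\le(\overline D^+f)(x)$ for every $x\in\tilde A^+$, while $(\underline D^-f)(x)\le(\overline D^-f)(x)$ for every $x\in\tilde A^-$. The ordinary derivative $f'(x)$ is defined and equal to $c\in\mathbb R$ iff ($\alpha$) $x$ belongs to some open interval included in $A$ ($\beta$) $(\overline D^+f)(x)=(\underline D^+f)(x)=(\overline D^-f)(x)=(\underline D^-f)(x)=c$.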
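
To make the definition concrete, here is a rough numerical sketch (an illustration added under stated assumptions, not part of the argument). It samples right-hand difference quotients of the function $x\mapsto x\sin(1/x)$ (with value 0 at 0) near the point 0; the sampling scheme, the single fixed value of δ and the names `right_quotients`, `upper_right_estimate`, `lower_right_estimate` are all hypothetical choices. For this function the upper and lower right-hand Dini derivates at 0 are 1 and −1, since the quotient at $y$ is just $\sin(1/y)$.

```python
import math

def right_quotients(f, x, delta, samples=100000):
    """Difference quotients (f(y) - f(x)) / (y - x) at sampled y in ]x, x + delta]."""
    out = []
    for k in range(samples):
        t = 1.0 / delta + 0.01 * k   # y = x + 1/t runs through ]x, x + delta]
        y = x + 1.0 / t
        out.append((f(y) - f(x)) / (y - x))
    return out

def f(x):
    return x * math.sin(1.0 / x) if x != 0 else 0.0

q = right_quotients(f, 0.0, delta=0.01)
# With one fixed delta these are only crude stand-ins for the
# inf-over-delta of the sup and the sup-over-delta of the inf.
upper_right_estimate = max(q)
lower_right_estimate = min(q)
print(upper_right_estimate, lower_right_estimate)
```

A finite sample can only bound the supremum from below and the infimum from above, so the printed numbers are estimates of the derivates, not certified values.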

*222K Lemma Let $A$ be any subset of $\mathbb R$, and define $\tilde A^+$ and $\tilde A^-$ as in 222J. Then $A\setminus\tilde A^+$ and $A\setminus\tilde A^-$ are countable, therefore negligible.
proof We have
\[ A\setminus\tilde A^+=\bigcup_{q\in\mathbb Q}\{x:x\in A,\ x<q,\ A\cap\,]x,q]=\emptyset\}. \]
But for any $q\in\mathbb Q$ there can be at most one $x\in A$ such that $x<q$ and $]x,q]$ does not meet $A$, so $A\setminus\tilde A^+$ is a countable union of finite sets and is countable. Similarly,
\[ A\setminus\tilde A^-=\bigcup_{q\in\mathbb Q}\{x:x\in A,\ q<x,\ A\cap[q,x[\,=\emptyset\} \]
is countable.

*222L Theorem Let $f$ be any real function, and $A$ its domain. Then for almost every $x\in A$
either all four Dini derivates of $f$ at $x$ are defined, finite and equal
or $(\overline D^+f)(x)=(\underline D^-f)(x)$ is finite, $(\underline D^+f)(x)=-\infty$ and $(\overline D^-f)(x)=\infty$
or $(\underline D^+f)(x)=(\overline D^-f)(x)$ is finite, $(\overline D^+f)(x)=\infty$ and $(\underline D^-f)(x)=-\infty$
or $(\overline D^+f)(x)=(\overline D^-f)(x)=\infty$ and $(\underline D^+f)(x)=(\underline D^-f)(x)=-\infty$.
proof (a) Set $\tilde A=\tilde A^+\cap\tilde A^-$, defining $\tilde A^+$ and $\tilde A^-$ as in 222J, so that $\tilde A$ is a cocountable subset of $A$ and all four Dini derivates are defined on $\tilde A$. For $n\in\mathbb N$, $q\in\mathbb Q$ set
\[ E_{qn}=\{x:x\in\tilde A,\ x<q,\ f(y)\ge f(x)-n(y-x)\text{ for every }y\in A\cap[x,q]\}. \]
Observe that
\[ \bigcup_{n\in\mathbb N,\,q\in\mathbb Q}E_{qn}=\{x:x\in\tilde A,\ (\underline D^+f)(x)>-\infty\}. \]
For those $q\in\mathbb Q$, $n\in\mathbb N$ such that $E_{qn}$ is not empty, set $\beta_{qn}=\sup E_{qn}\in\,]-\infty,q]$, $\alpha_{qn}=\inf E_{qn}\in[-\infty,\beta_{qn}]$, and for $x\in\,]\alpha_{qn},\beta_{qn}[$ set $g_{qn}(x)=\inf\{f(y)+ny:y\in A\cap[x,q]\}$. Note that if $x\in E_{qn}\setminus\{\alpha_{qn},\beta_{qn}\}$ then $g_{qn}(x)=f(x)+nx$ is finite; also $g_{qn}$ is monotonic, therefore finite everywhere in $]\alpha_{qn},\beta_{qn}[$, and of course $g_{qn}(x)\le f(x)+nx$ for every $x\in A\cap\,]\alpha_{qn},\beta_{qn}[$.

By 222A, almost every point of $]\alpha_{qn},\beta_{qn}[$ belongs to $F_{qn}=\operatorname{dom}g_{qn}'$; in particular, $E_{qn}\setminus F_{qn}$ is negligible. Set $h_{qn}(x)=g_{qn}(x)-nx$ for $x\in\,]\alpha_{qn},\beta_{qn}[$; then $h_{qn}$ is differentiable at every point of $F_{qn}$. Now if $x\in E_{qn}\cap F_{qn}$, we have $h_{qn}(x)=f(x)$, while $h_{qn}(x)\le f(x)$ for $x\in A\cap\,]\alpha_{qn},\beta_{qn}[$; it follows that
\begin{align*}
(\underline D^+f)(x) &= \sup_{\delta>0}\inf_{y\in A\cap]x,x+\delta]}\frac{f(y)-f(x)}{y-x} \\
&\ge \sup_{\delta>0}\inf_{y\in A\cap]x,x+\delta]}\frac{h_{qn}(y)-h_{qn}(x)}{y-x} \\
&\ge \sup_{0<\delta<\beta_{qn}-x}\inf_{y\in]x,x+\delta]}\frac{h_{qn}(y)-h_{qn}(x)}{y-x}=h_{qn}'(x),
\end{align*}
and similarly $(\overline D^-f)(x)\le h_{qn}'(x)$. On the other hand, if $x\in\tilde E_{qn}^+$, then

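\begin{align*}
(\underline D^+f)(x) &= \sup_{\delta>0}\inf_{y\in A\cap]x,x+\delta]}\frac{f(y)-f(x)}{y-x} \\
&\le \sup_{\delta>0}\inf_{y\in E_{qn}\cap]x,x+\delta]}\frac{f(y)-f(x)}{y-x} \\
&= \sup_{\delta>0}\inf_{y\in E_{qn}\cap]x,x+\delta]}\frac{h_{qn}(y)-h_{qn}(x)}{y-x} \\
&\le \inf_{\delta>0}\sup_{y\in E_{qn}\cap]x,x+\delta]}\frac{h_{qn}(y)-h_{qn}(x)}{y-x} \\
&\le \inf_{0<\delta<\beta_{qn}-x}\sup_{y\in]x,x+\delta]}\frac{h_{qn}(y)-h_{qn}(x)}{y-x}=h_{qn}'(x).
\end{align*}
Putting these together, we see that if $x\in F_{qn}\cap\tilde E_{qn}^+$ then $(\underline D^+f)(x)=h_{qn}'(x)\ge(\overline D^-f)(x)$.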
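Conventionally setting $F_{qn}=\emptyset$ if $E_{qn}$ is empty, the last sentence is true for all $q\in\mathbb Q$ and $n\in\mathbb N$, while $A\setminus\tilde A$ and $\bigcup_{q\in\mathbb Q,n\in\mathbb N}E_{qn}\setminus(F_{qn}\cap\tilde E_{qn}^+)$ are negligible, and $(\underline D^+f)(x)=-\infty$ for every $x\in\tilde A\setminus\bigcup_{q\in\mathbb Q,n\in\mathbb N}E_{qn}$. So we see that, for almost every $x\in A$, either $(\underline D^+f)(x)=-\infty$ or $\infty>(\underline D^+f)(x)\ge(\overline D^-f)(x)$.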
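(b) Reflecting the above argument left-to-right or up-to-down, we see that, for almost every $x\in A$,
\[ \text{either }(\underline D^-f)(x)=-\infty\text{ or }\infty>(\underline D^-f)(x)\ge(\overline D^+f)(x), \]
\[ \text{either }(\overline D^+f)(x)=\infty\text{ or }-\infty<(\overline D^+f)(x)\le(\underline D^-f)(x), \]
\[ \text{either }(\overline D^-f)(x)=\infty\text{ or }-\infty<(\overline D^-f)(x)\le(\underline D^+f)(x), \]
and also
\[ (\underline D^+f)(x)\le(\overline D^+f)(x),\quad(\underline D^-f)(x)\le(\overline D^-f)(x). \]
For such $x$, therefore,
\[ (\underline D^+f)(x)>-\infty\implies(\overline D^-f)(x)\le(\underline D^+f)(x)<\infty\implies(\underline D^+f)(x)=(\overline D^-f)(x)\in\mathbb R, \]
and similarly
\[ (\underline D^-f)(x)>-\infty\implies(\underline D^-f)(x)=(\overline D^+f)(x)\in\mathbb R, \]
\[ (\overline D^+f)(x)<\infty\implies(\overline D^+f)(x)=(\underline D^-f)(x)\in\mathbb R, \]
\[ (\overline D^-f)(x)<\infty\implies(\overline D^-f)(x)=(\underline D^+f)(x)\in\mathbb R. \]
So we have
\[ \text{either }(\underline D^+f)(x)=(\overline D^-f)(x)\text{ is finite or }(\underline D^+f)(x)=-\infty\text{ and }(\overline D^-f)(x)=\infty, \]
\[ \text{either }(\underline D^-f)(x)=(\overline D^+f)(x)\text{ is finite or }(\underline D^-f)(x)=-\infty\text{ and }(\overline D^+f)(x)=\infty. \]
These two dichotomies lead to four possibilities; and since
\[ (\underline D^+f)(x)=(\overline D^-f)(x)\text{ is finite},\quad(\underline D^-f)(x)=(\overline D^+f)(x)\text{ is finite} \]
can be true together only when all four derivates are equal and finite, we have the four cases listed in the statement of the theorem.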
222X Basic exercises > (a) Let $F:[0,1]\to[0,1]$ be the Cantor function (134H). Show that $\int_0^1 F'=0<F(1)-F(0)$.

> (b) Suppose that $a<b$ in $\mathbb R$ and that $h$ is a real-valued function such that $\int_x^y h$ exists and is non-negative whenever $a\le x\le y\le b$. Show that $h\ge0$ almost everywhere in $[a,b]$.

> (c) Suppose that $a<b$ in $\mathbb R$ and that $f$, $g$ are integrable complex-valued functions on $[a,b]$ such that $\int_a^x f=\int_a^x g$ for every $x\in[a,b]$. Show that $f=g$ almost everywhere in $[a,b]$.

> (d) Suppose that $a<b$ in $\mathbb R$ and that $f$ is a real-valued function which is integrable over $[a,b]$. Show that the indefinite integral $x\mapsto\int_a^x f$ is continuous.

222Y Further exercises (a) Let $\langle F_n\rangle_{n\in\mathbb N}$ be a sequence of non-negative, non-decreasing functions on $[0,1]$ such that $F(x)=\sum_{n=0}^\infty F_n(x)$ is finite for every $x\in[0,1]$. Show that $\sum_{n=0}^\infty F_n'(x)=F'(x)$ for almost every $x\in[0,1]$. (Hint: take $\langle n_k\rangle_{k\in\mathbb N}$ such that $\sum_{k=0}^\infty F(1)-G_k(1)<\infty$, where $G_k=\sum_{j=0}^{n_k}F_j$, and set $H(x)=\sum_{k=0}^\infty F(x)-G_k(x)$. Observe that $\sum_{k=0}^\infty F'(x)-G_k'(x)\le H'(x)$ whenever all the derivatives are defined, so that $F'=\lim_{k\to\infty}G_k'$ almost everywhere.)

(b) Let $F:[0,1]\to\mathbb R$ be a continuous non-decreasing function. (i) Show that if $c\in\mathbb R$ then $C=\{(x,y):x,y\in[0,1],\ F(y)-F(x)=c\}$ is connected. (Hint: A set $A\subseteq\mathbb R^r$ is connected if there is no continuous surjection $h:A\to\{0,1\}$. Show that if $h:C\to\{0,1\}$ is continuous then it is of the form $(x,y)\mapsto h_1(x)$ for some continuous function $h_1$.) (ii) Now suppose that $F(0)=0$, $F(1)=1$ and that $G:[0,1]\to[0,1]$ is a second continuous non-decreasing function with $G(0)=0$, $G(1)=1$. Show that for any $n\ge1$ there are $x,y\in[0,1]$ such that $F(y)-F(x)=G(y)-G(x)=\frac1n$.

(c) Let $f$, $g$ be non-negative integrable functions on $\mathbb R$, and $n\ge1$. Show that there are $u<v$ in $[-\infty,\infty]$ such that $\int_u^v f=\frac1n\int f$ and $\int_u^v g=\frac1n\int g$.

(d) Let f : R → R be measurable. Show that H = dom f ′ is a measurable set and that f ′ is a measurable function.

(e) Construct a Borel measurable function $f:[0,1]\to\{-1,0,1\}$ such that each of the four possibilities described in Theorem 222L occurs on a set of measure $\frac14$.

222 Notes and comments I have relegated to an exercise (222Xd) the fundamental fact that an indefinite integral $x\mapsto\int_a^x f$ is always continuous; this is not strictly speaking needed in this section, and a much stronger result is given in 225E. There is also much more to be said about monotonic functions, to which I will return in §224. What we need here is the fact that they are differentiable almost everywhere (222A), which I prove by applying Vitali’s theorem three times, once in part (b) of the proof and twice in part (c). Following this, the arguments of 222C-222E form a fine series of exercises in the central ideas of Volume 1, using the concept of integration over a (measurable) subset, Fatou’s Lemma (part (d) of the proof of 222C), Lebesgue’s Dominated Convergence Theorem (parts (ii) and (iii) of the proof of 222D) and the approximation of Lebesgue measurable sets by open sets (part (iii) of the proof of 222D). Of course knowing that $\frac{d}{dx}\int_a^x f=f(x)$ almost everywhere is not at all the same thing as knowing that this holds for any particular $x$, and when we come to differentiate any particular indefinite integral we generally turn to 222H first; the point of 222E is that it applies to wildly discontinuous functions, for which more primitive methods give no information at all.
The Denjoy-Young-Saks theorem (222L) is one of the starting points of a flourishing theory of ‘typical’ phenomena in real analysis. It is easy to build a function $f$ with any prescribed set of values for $(\overline D^+f)(0)$, $(\underline D^+f)(0)$, $(\overline D^-f)(0)$ and $(\underline D^-f)(0)$ (subject, of course, to the requirements $(\underline D^+f)(0)\le(\overline D^+f)(0)$ and $(\underline D^-f)(0)\le(\overline D^-f)(0)$). But 222L tells us that such combinations as $(\underline D^-f)(x)=(\underline D^+f)(x)=\infty$ (what we might call ‘$f'(x)=\infty$’) can occur only on negligible sets. The four easily realized possibilities in 222L (see 222Ye) are the only ones which can appear at points which are ‘typical’ for the given function, from the point of view of Lebesgue measure. For a monotonic function, 222A tells us more: at ‘typical’ points for a monotonic function, the function is actually differentiable. In the next section we shall see some more ways of generating negligible and conegligible sets from a given set or function, leading to further refinements of the idea.
223 Lebesgue’s density theorems


I now turn to a group of results which may be thought of as corollaries of Theorem 222E, but which also have a
vigorous life of their own, including the possibility of significant generalizations which will be treated in Chapter 26.
The idea is that any measurable function f on R is almost everywhere ‘continuous’ in a variety of very weak senses;
for almost every x, the value f (x) is determined by the behaviour of f near x, in the sense that f (y) ≏ f (x) for ‘most’
y near x. I should perhaps say that while I recommend this work as a preparation for Chapter 26, and I also rely on it
in Chapter 28, I shall not refer to it again in the present chapter, so that readers in a hurry to characterize indefinite
integrals may proceed directly to §224.

223A Lebesgue’s Density Theorem: integral form Let $I$ be an interval in $\mathbb R$, and let $f$ be a real-valued function which is integrable over $I$. Then
\[ f(x)=\lim_{h\downarrow0}\frac1h\int_x^{x+h}f=\lim_{h\downarrow0}\frac1h\int_{x-h}^x f=\lim_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}f \]
for almost every $x\in I$.

proof Setting $F(x)=\int_{I\cap]-\infty,x[}f$, we know from 222G that
\begin{align*}
f(x)=F'(x) &= \lim_{h\downarrow0}\frac1h\bigl(F(x+h)-F(x)\bigr)=\lim_{h\downarrow0}\frac1h\int_x^{x+h}f \\
&= \lim_{h\downarrow0}\frac1h\bigl(F(x)-F(x-h)\bigr)=\lim_{h\downarrow0}\frac1h\int_{x-h}^x f \\
&= \lim_{h\downarrow0}\frac1{2h}\bigl(F(x+h)-F(x-h)\bigr)=\lim_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}f
\end{align*}
for almost every $x\in I$.

223B Corollary Let $E\subseteq\mathbb R$ be a measurable set. Then
\[ \lim_{h\downarrow0}\frac1{2h}\mu(E\cap[x-h,x+h])=1\text{ for almost every }x\in E, \]
\[ \lim_{h\downarrow0}\frac1{2h}\mu(E\cap[x-h,x+h])=0\text{ for almost every }x\in\mathbb R\setminus E. \]
proof Take $n\in\mathbb N$. Applying 223A to $f=\chi(E\cap[-n,n])$, we see that
\[ \lim_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}f=\lim_{h\downarrow0}\frac1{2h}\mu(E\cap[x-h,x+h]) \]
whenever $x\in\,]-n,n[$ and either limit exists, so that
\[ \lim_{h\downarrow0}\frac1{2h}\mu(E\cap[x-h,x+h])=1\text{ for almost every }x\in E\cap[-n,n], \]
\[ \lim_{h\downarrow0}\frac1{2h}\mu(E\cap[x-h,x+h])=0\text{ for almost every }x\in[-n,n]\setminus E. \]
As $n$ is arbitrary, we have the result.
Remark For a measurable set $E\subseteq\mathbb R$, a point $x$ such that $\lim_{h\downarrow0}\frac1{2h}\mu(E\cap[x-h,x+h])=1$ is sometimes called a density point of $E$.
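
As an illustration of the ratio in 223B (a sketch only: the set E, the sample points and the helper names `measure_of_intersection` and `density_ratio` are assumptions made for this example), the quantity µ(E ∩ [x − h, x + h])/2h can be computed exactly when E is a finite disjoint union of intervals.

```python
def measure_of_intersection(intervals, lo, hi):
    """Lebesgue measure of (a finite disjoint union of intervals) ∩ [lo, hi]."""
    total = 0.0
    for left, right in intervals:
        overlap = min(right, hi) - max(left, lo)
        if overlap > 0:
            total += overlap
    return total

def density_ratio(intervals, x, h):
    """mu(E ∩ [x - h, x + h]) / 2h, with E given as a list of (left, right) pairs."""
    return measure_of_intersection(intervals, x - h, x + h) / (2 * h)

E = [(0.0, 1.0), (2.0, 3.0)]
for x in (0.5, 0.0, 1.5):
    print(x, [density_ratio(E, x, h) for h in (0.1, 0.01, 0.001)])
```

The interior point 0.5 gives ratios of 1 and the exterior point 1.5 gives 0, while the endpoint 0 gives 1/2 for every small h; endpoints like this form the negligible exceptional set that the corollary allows.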

223C Corollary Let $f$ be a measurable real-valued function defined almost everywhere in $\mathbb R$. Then for almost every $x\in\mathbb R$,
\[ \lim_{h\downarrow0}\frac1{2h}\mu\{y:y\in\operatorname{dom}f,\ |y-x|\le h,\ |f(y)-f(x)|\le\epsilon\}=1, \]
\[ \lim_{h\downarrow0}\frac1{2h}\mu\{y:y\in\operatorname{dom}f,\ |y-x|\le h,\ |f(y)-f(x)|\ge\epsilon\}=0 \]
for every $\epsilon>0$.


proof For $q,q'\in\mathbb Q$, set
\[ D_{qq'}=\{x:x\in\operatorname{dom}f,\ q\le f(x)<q'\}, \]
so that $D_{qq'}$ is measurable,
\[ C_{qq'}=\{x:x\in D_{qq'},\ \lim_{h\downarrow0}\frac1{2h}\mu(D_{qq'}\cap[x-h,x+h])=1\}, \]
so that $D_{qq'}\setminus C_{qq'}$ is negligible, by 223B; now set
\[ C=\operatorname{dom}f\setminus\bigcup_{q,q'\in\mathbb Q}(D_{qq'}\setminus C_{qq'}), \]
so that $C$ is conegligible. If $x\in C$ and $\epsilon>0$, then there are $q,q'\in\mathbb Q$ such that $f(x)-\epsilon\le q\le f(x)<q'\le f(x)+\epsilon$, so that $x$ belongs to $D_{qq'}$ and therefore to $C_{qq'}$, and now
\[ \liminf_{h\downarrow0}\frac1{2h}\mu\{y:y\in\operatorname{dom}f\cap[x-h,x+h],\ |f(y)-f(x)|\le\epsilon\}\ge\liminf_{h\downarrow0}\frac1{2h}\mu(D_{qq'}\cap[x-h,x+h])=1, \]
so
\[ \lim_{h\downarrow0}\frac1{2h}\mu\{y:y\in\operatorname{dom}f\cap[x-h,x+h],\ |f(y)-f(x)|\le\epsilon\}=1. \]
It follows at once that
\[ \lim_{h\downarrow0}\frac1{2h}\mu\{y:y\in\operatorname{dom}f\cap[x-h,x+h],\ |f(y)-f(x)|>\epsilon\}=0 \]
for almost every $x$; but since $\epsilon$ is arbitrary, this is also true of $\frac12\epsilon$, so in fact
\[ \lim_{h\downarrow0}\frac1{2h}\mu\{y:y\in\operatorname{dom}f\cap[x-h,x+h],\ |f(y)-f(x)|\ge\epsilon\}=0 \]
for almost every $x$.

223D Theorem Let $I$ be an interval in $\mathbb R$, and let $f$ be a real-valued function which is integrable over $I$. Then
\[ \lim_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}|f(y)-f(x)|\,dy=0 \]
for almost every $x\in I$.
proof (a) Suppose first that $I$ is a bounded open interval $]a,b[$. For each $q\in\mathbb Q$, set $g_q(x)=|f(x)-q|$ for $x\in I\cap\operatorname{dom}f$; then $g_q$ is integrable over $I$, and
\[ \lim_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}g_q=g_q(x) \]
for almost every $x\in I$, by 223A. Setting
\[ E_q=\{x:x\in I\cap\operatorname{dom}f,\ \lim_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}g_q=g_q(x)\}, \]
we have $I\setminus E_q$ negligible for every $q$, so $I\setminus E$ is negligible, where $E=\bigcap_{q\in\mathbb Q}E_q$. Now
\[ \lim_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}|f(y)-f(x)|\,dy=0 \]
for every $x\in E$. P P Take $x\in E$ and $\epsilon>0$. Then there is a $q\in\mathbb Q$ such that $|f(x)-q|\le\epsilon$, so that
\[ |f(y)-f(x)|\le|f(y)-q|+\epsilon=g_q(y)+\epsilon \]
for every $y\in I\cap\operatorname{dom}f$, and
\begin{align*}
\limsup_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}|f(y)-f(x)|\,dy &\le \limsup_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}\bigl(g_q(y)+\epsilon\bigr)\,dy \\
&= \epsilon+g_q(x)\le2\epsilon.
\end{align*}
As $\epsilon$ is arbitrary,
\[ \lim_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}|f(y)-f(x)|\,dy=0, \]
as required. Q Q
(b) If $I$ is an unbounded open interval, apply (a) to the intervals $I_n=I\cap\,]-n,n[$ to see that the limit is zero almost everywhere in every $I_n$, and therefore on $I$. If $I$ is an arbitrary interval, note that it differs by at most two points from an open interval, and that since we are looking only for something to happen almost everywhere we can ignore these points.
Remark The set
\[ \{x:x\in\operatorname{dom}f,\ \lim_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}|f(y)-f(x)|\,dy=0\} \]
is sometimes called the Lebesgue set of $f$.
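
A small numerical sketch may make the Lebesgue set more tangible. Everything in it is an assumption chosen for illustration: the function is the step function equal to 1 on $[0,\infty[$ and 0 elsewhere, the integral is approximated by a midpoint rule, and `mean_deviation` is a hypothetical helper name.

```python
def mean_deviation(f, x, h, steps=20000):
    """Midpoint-rule value of (1/2h) * integral over [x-h, x+h] of |f(y) - f(x)| dy."""
    width = 2 * h / steps
    fx = f(x)
    total = sum(abs(f(x - h + (i + 0.5) * width) - fx) for i in range(steps))
    return total * width / (2 * h)

def step(y):
    return 1.0 if y >= 0 else 0.0

# At the jump the mean deviation stays at 1/2 however small h is,
# so 0 is not in the Lebesgue set of this function.
at_jump = [mean_deviation(step, 0.0, h) for h in (1.0, 0.1, 0.01)]
# Away from the jump the function is locally constant and the mean is 0.
away = [mean_deviation(step, 1.0, h) for h in (0.5, 0.1, 0.01)]
print(at_jump, away)
```

This is consistent with 223D: the single point excluded here is negligible, and every other point belongs to the Lebesgue set of the step function.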

223E Complex-valued functions I have expressed the results above in terms of real-valued functions, this being
the most natural vehicle for the ideas. However there are applications of great importance in which the functions
involved are complex-valued, so I spell out the relevant statements here. In all cases the proof is elementary, being
nothing more than applying the corresponding result (223A, 223C or 223D) to the real and imaginary parts of the
function f .

(a) Let $I$ be an interval in $\mathbb R$, and let $f$ be a complex-valued function which is integrable over $I$. Then
\[ f(x)=\lim_{h\downarrow0}\frac1h\int_x^{x+h}f=\lim_{h\downarrow0}\frac1h\int_{x-h}^x f=\lim_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}f \]
for almost every $x\in I$.

(b) Let $f$ be a measurable complex-valued function defined almost everywhere in $\mathbb R$. Then for almost every $x\in\mathbb R$,
\[ \lim_{h\downarrow0}\frac1{2h}\mu\{y:y\in\operatorname{dom}f,\ |y-x|\le h,\ |f(y)-f(x)|\ge\epsilon\}=0 \]
for every $\epsilon>0$.

(c) Let $I$ be an interval in $\mathbb R$, and let $f$ be a complex-valued function which is integrable over $I$. Then
\[ \lim_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}|f(y)-f(x)|\,dy=0 \]
for almost every $x\in I$.

223X Basic exercises >(a) Let E ⊆ [0, 1] be a measurable set for which there is an α > 0 such that µ(E ∩ [a, b]) ≥
α(b − a) whenever 0 ≤ a ≤ b ≤ 1. Show that µE = 1.

(b) Let $A\subseteq\mathbb R$ be any set. Show that $\lim_{h\downarrow0}\frac1{2h}\mu^*(A\cap[x-h,x+h])=1$ for almost every $x\in A$. (Hint: apply 223B to a measurable envelope $E$ of $A$.)

(c) Let $E,F\subseteq\mathbb R$ be measurable sets, and $x\in\mathbb R$ a point which is a density point of both. Show that $x$ is a density point of $E\cap F$.

(d) Let $E\subseteq\mathbb R$ be a non-negligible measurable set. Show that for any $n\in\mathbb N$ there is a $\delta>0$ such that $\bigcap_{i\le n}E+x_i$ is non-empty whenever $x_0,\ldots,x_n\in\mathbb R$ are such that $|x_i-x_j|\le\delta$ for all $i,j\le n$. (Hint: find a non-trivial interval $I$ such that $\mu(E\cap I)>\frac n{n+1}\mu I$.)

(e) Let $f$ be any real-valued function defined almost everywhere in $\mathbb R$. Show that $\lim_{h\downarrow0}\frac1{2h}\mu^*\{y:y\in\operatorname{dom}f,\ |y-x|\le h,\ |f(y)-f(x)|\le\epsilon\}=1$ for almost every $x\in\mathbb R$. (Hint: use the argument of 223C, but with 223Xb in place of 223B.)

> (f) Let $I$ be an interval in $\mathbb R$, and let $f$ be a real-valued function which is integrable over $I$. Show that $\lim_{h\downarrow0}\frac1h\int_x^{x+h}|f(y)-f(x)|\,dy=0$ for almost every $x\in I$.

(g) Let $E,F\subseteq\mathbb R$ be measurable sets, and suppose that $F$ is bounded and of non-zero measure. Let $x\in\mathbb R$ be such that $\lim_{h\downarrow0}\frac1{2h}\mu(E\cap[x-h,x+h])=1$. Show that $\lim_{h\downarrow0}\frac{\mu(E\cap(x+hF))}{h\mu F}=1$. (Hint: it helps to know that $\mu(hF)=h\mu F$ (134Ya, 263A). Show that if $F\subseteq[-M,M]$, then
\[ \frac1{2hM}\mu(E\cap[x-hM,x+hM])\le1-\frac{\mu F}{2M}\Bigl(1-\frac{\mu(E\cap(x+hF))}{h\mu F}\Bigr).) \]
(Compare 223Ya.)

(h) Let $f$ be a real-valued function which is integrable over $\mathbb R$, and $E$ be the Lebesgue set of $f$. Show that $\lim_{h\downarrow0}\frac1{2h}\int_{x-h}^{x+h}|f(t)-c|\,dt=|f(x)-c|$ for every $x\in E$ and $c\in\mathbb R$.

(i) Let $f$ be an integrable real-valued function defined almost everywhere in $\mathbb R$. Let $x\in\operatorname{dom}f$ be such that $\lim_{n\to\infty}\frac n2\int_{x-1/n}^{x+1/n}|f(y)-f(x)|\,dy=0$. Show that $x$ belongs to the Lebesgue set of $f$.

(j) Let $f$ be an integrable real-valued function defined almost everywhere in $\mathbb R$, and $x$ any point of the Lebesgue set of $f$. Show that for every $\epsilon>0$ there is a $\delta>0$ such that whenever $I$ is a non-trivial interval and $x\in I\subseteq[x-\delta,x+\delta]$, then $|f(x)-\frac1{\mu I}\int_I f|\le\epsilon$.

223Y Further exercises (a) Let $E,F\subseteq\mathbb R$ be measurable sets, and suppose that $0<\mu F<\infty$. Let $x\in\mathbb R$ be such that $\lim_{h\downarrow0}\frac1{2h}\mu(E\cap[x-h,x+h])=1$. Show that
\[ \lim_{h\downarrow0}\frac{\mu(E\cap(x+hF))}{h\mu F}=1. \]
(Hint: apply 223Xg to sets of the form $F\cap[-M,M]$.)

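(b) Let $\mathfrak T$ be the family of measurable sets $G\subseteq\mathbb R$ such that every point of $G$ is a density point of $G$. (i) Show that $\mathfrak T$ is a topology on $\mathbb R$. (Hint: take $\mathcal G\subseteq\mathfrak T$. By 215B(iv) there is a countable $\mathcal G_0\subseteq\mathcal G$ such that $\mu(G\setminus\bigcup\mathcal G_0)=0$ for every $G\in\mathcal G$. Show that
\[ \bigcup\mathcal G\subseteq\{x:\limsup_{h\downarrow0}\frac1{2h}\mu(\bigcup\mathcal G_0\cap[x-h,x+h])>0\}, \]
so that $\mu(\bigcup\mathcal G\setminus\bigcup\mathcal G_0)=0$.) (ii) Show that a function $f:\mathbb R\to\mathbb R$ is measurable iff it is $\mathfrak T$-continuous at almost every $x\in\mathbb R$. ($\mathfrak T$ is the density topology on $\mathbb R$. See 414P in Volume 4.)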
(c) Show that if $f:[a,b]\to\mathbb R$ is bounded and continuous for the density topology on $\mathbb R$, then $f(x)=\frac d{dx}\int_a^x f$ for every $x\in\,]a,b[$.

(d) Show that a function $f:\mathbb R\to\mathbb R$ is continuous for the density topology at $x\in\mathbb R$ iff $\lim_{h\downarrow0}\frac1{2h}\mu^*\{y:y\in[x-h,x+h],\ |f(y)-f(x)|\ge\epsilon\}=0$ for every $\epsilon>0$.

(e) A set $A\subseteq\mathbb R$ is porous at a point $x\in\mathbb R$ if $\limsup_{y\to x}\frac{\rho(y,A)}{|y-x|}>0$, where $\rho(y,A)=\inf_{a\in A}|y-a|$. (Take $\rho(y,\emptyset)=\infty$.) Show that if $A$ is porous at every $x\in A$ then $A$ is negligible.

(f ) For a measurable set E ⊆ R write int*E for the set of its density points. Show that if E, F ⊆ R are measurable
then (i) int*(E ∩ F ) = int*E ∩ int*F (ii) int*E ⊆ int*F iff µ(E \ F ) = 0 (iii) µ(E△int*E) = 0 (iv) int*(int*E) = int*E
(v) for every compact set K ⊆ int*E there is a compact set L ⊆ K ∪ E such that K ⊆ int*L.

(g) Let $f$ be an integrable real-valued function defined almost everywhere in $\mathbb R$, and $x$ any point of the Lebesgue set of $f$. Show that for every $\epsilon>0$ there is a $\delta>0$ such that $|f(x)\int g-\int f\times g|\le\epsilon\int g$ whenever $g:\mathbb R\to[0,\infty[$ is such that $g$ is non-decreasing on $]-\infty,x]$, non-increasing on $[x,\infty[$ and zero outside $[x-\delta,x+\delta]$. (Hint: express $g$ as a limit almost everywhere of functions of the form $\frac{g(x)}{n+1}\sum_{i=0}^n\chi\,]a_i,b_i[$, where $x-\delta\le a_0\le\ldots\le a_n\le x\le b_n\le\ldots\le b_0\le x+\delta$.)

(h) For each integrable real-valued function $f$ defined almost everywhere in $\mathbb R$, let $E_f$ be the Lebesgue set of $f$. Show that $E_f\cap E_g\subseteq E_{f+g}$, $E_f\subseteq E_{|f|}$ for all integrable $f$, $g$.
223 Notes and comments The results of this section can be thought of as saying that a measurable function is in
some sense ‘almost continuous’; indeed, 223Yb is an attempt to make this notion precise. For an integrable function
we have stronger results, of which the furthest-reaching seems to be 223D/223Ec.
There are r-dimensional versions of all these theorems, using balls centered on x in place of intervals [x − h, x + h]; I
give these in 261C-261E. A new idea is needed for the r-dimensional version of Lebesgue’s density theorem (261C), but
the rest of the generalization is straightforward. A less natural, and less important, extension, also in §261, involves
functions defined on non-measurable sets (compare 223Xb-223Xe).

224 Functions of bounded variation


I turn now to the second of the two problems to which this chapter is devoted: the identification of those real
functions which are indefinite integrals. I take the opportunity to offer a brief introduction to the theory of functions
of bounded variation, which are interesting in themselves and will be important in Chapter 28. I give the basic
characterization of these functions as differences of monotonic functions (224D), with a representative sample of their
elementary properties.

224A Definition Let $f$ be a real-valued function and $D$ a subset of $\mathbb R$. I define $\operatorname{Var}_D(f)$, the (total) variation of $f$ on $D$, as follows. If $D\cap\operatorname{dom}f=\emptyset$, $\operatorname{Var}_D(f)=0$. Otherwise, $\operatorname{Var}_D(f)$ is
\[ \sup\Bigl\{\sum_{i=1}^n|f(a_i)-f(a_{i-1})|:a_0,a_1,\ldots,a_n\in D\cap\operatorname{dom}f,\ a_0\le a_1\le\ldots\le a_n\Bigr\}, \]
allowing $\operatorname{Var}_D(f)=\infty$. If $\operatorname{Var}_D(f)$ is finite, we say that $f$ is of bounded variation on $D$. If the context seems clear, I may write $\operatorname{Var}f$ for $\operatorname{Var}_{\operatorname{dom}f}(f)$, and say that $f$ is simply ‘of bounded variation’ if this is finite.
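
The sums appearing in this definition are easy to evaluate for a finite list of points, and each such sum is a lower bound for the variation. The following sketch is illustrative only; the helper name `variation_over_points`, the grid of 1001 points and the two test functions are assumptions made for the example.

```python
import math

def variation_over_points(f, points):
    """Sum of |f(a_i) - f(a_{i-1})| over the sorted points.

    Each such sum is one member of the set whose supremum defines the
    variation, so it never exceeds the variation of f on any set
    containing the points.
    """
    pts = sorted(points)
    return sum(abs(f(b) - f(a)) for a, b in zip(pts, pts[1:]))

grid = [2 * math.pi * k / 1000 for k in range(1001)]
var_sin = variation_over_points(math.sin, grid)  # sin rises 1, falls 2, rises 1 on [0, 2*pi]
var_exp = variation_over_points(math.exp, grid)  # monotonic, so the sum telescopes
print(var_sin, var_exp)
```

For the monotonic function the sum collapses to the difference of the end values, which is the finite-sample counterpart of the formula for monotonic functions in 224Cf.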

224B Remarks (a) In the present chapter, we shall virtually exclusively be concerned with the case in which D
is a bounded closed interval included in dom f . The general formulation will be useful for some technical questions
arising in Chapter 28; but if it makes you more comfortable, you will lose nothing by supposing for the moment that
D is an interval.

(b) Clearly
\[ \operatorname{Var}_D(f)=\operatorname{Var}_{D\cap\operatorname{dom}f}(f)=\operatorname{Var}(f{\restriction}D) \]
for all $D$, $f$.

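224C Proposition (a) If $f$, $g$ are two real-valued functions and $D\subseteq\mathbb R$, then
\[ \operatorname{Var}_D(f+g)\le\operatorname{Var}_D(f)+\operatorname{Var}_D(g). \]
(b) If $f$ is a real-valued function, $D\subseteq\mathbb R$ and $c\in\mathbb R$ then $\operatorname{Var}_D(cf)=|c|\operatorname{Var}_D(f)$.
(c) If $f$ is a real-valued function, $D\subseteq\mathbb R$ and $x\in\mathbb R$ then
\[ \operatorname{Var}_D(f)\ge\operatorname{Var}_{D\cap]-\infty,x]}(f)+\operatorname{Var}_{D\cap[x,\infty[}(f), \]
with equality if $x\in D\cap\operatorname{dom}f$.
(d) If $f$ is a real-valued function and $D\subseteq D'\subseteq\mathbb R$ then $\operatorname{Var}_D(f)\le\operatorname{Var}_{D'}(f)$.
(e) If $f$ is a real-valued function and $D\subseteq\mathbb R$, then $|f(x)-f(y)|\le\operatorname{Var}_D(f)$ for all $x,y\in D\cap\operatorname{dom}f$; so if $f$ is of bounded variation on $D$ then $f$ is bounded on $D\cap\operatorname{dom}f$ and (if $D\cap\operatorname{dom}f\ne\emptyset$)
\[ \sup_{y\in D\cap\operatorname{dom}f}|f(y)|\le|f(x)|+\operatorname{Var}_D(f) \]
for every $x\in D\cap\operatorname{dom}f$.
(f) If $f$ is a monotonic real-valued function and $D\subseteq\mathbb R$ meets $\operatorname{dom}f$, then $\operatorname{Var}_D(f)=\sup_{x\in D\cap\operatorname{dom}f}f(x)-\inf_{x\in D\cap\operatorname{dom}f}f(x)$.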
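proof (a) If $D\cap\operatorname{dom}(f+g)=\emptyset$ this is trivial, because $\operatorname{Var}_D(f)$ and $\operatorname{Var}_D(g)$ are surely non-negative. Otherwise, if $a_0\le\ldots\le a_n$ in $D\cap\operatorname{dom}(f+g)$, then
\begin{align*}
\sum_{i=1}^n|(f+g)(a_i)-(f+g)(a_{i-1})| &\le \sum_{i=1}^n|f(a_i)-f(a_{i-1})|+\sum_{i=1}^n|g(a_i)-g(a_{i-1})| \\
&\le \operatorname{Var}_D(f)+\operatorname{Var}_D(g);
\end{align*}
as $a_0,\ldots,a_n$ are arbitrary, $\operatorname{Var}_D(f+g)\le\operatorname{Var}_D(f)+\operatorname{Var}_D(g)$.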
(b)
\[ \sum_{i=1}^n|(cf)(a_i)-(cf)(a_{i-1})|=|c|\sum_{i=1}^n|f(a_i)-f(a_{i-1})| \]
whenever $a_0\le\ldots\le a_n$ in $D\cap\operatorname{dom}f$.

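(c)(i) If either $D\cap\,]-\infty,x]\cap\operatorname{dom}f$ or $D\cap[x,\infty[\,\cap\operatorname{dom}f$ is empty, this is trivial. If $a_0\le\ldots\le a_m$ in $D\cap\,]-\infty,x]\cap\operatorname{dom}f$, $b_0\le\ldots\le b_n$ in $D\cap[x,\infty[\,\cap\operatorname{dom}f$, then
\begin{align*}
\sum_{i=1}^m|f(a_i)-f(a_{i-1})|+\sum_{j=1}^n|f(b_j)-f(b_{j-1})| &\le \sum_{i=1}^{m+n+1}|f(a_i)-f(a_{i-1})| \\
&\le \operatorname{Var}_D(f),
\end{align*}
if we write $a_i=b_{i-m-1}$ for $m+1\le i\le m+n+1$. So
\[ \operatorname{Var}_{D\cap]-\infty,x]}(f)+\operatorname{Var}_{D\cap[x,\infty[}(f)\le\operatorname{Var}_D(f). \]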

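(ii) Now suppose that $x\in D\cap\operatorname{dom}f$. If $a_0\le\ldots\le a_n$ in $D\cap\operatorname{dom}f$, and $a_0\le x\le a_n$, let $k$ be such that $x\in[a_{k-1},a_k]$; then
\begin{align*}
\sum_{i=1}^n|f(a_i)-f(a_{i-1})| &\le \sum_{i=1}^{k-1}|f(a_i)-f(a_{i-1})|+|f(x)-f(a_{k-1})| \\
&\qquad+|f(a_k)-f(x)|+\sum_{i=k+1}^n|f(a_i)-f(a_{i-1})| \\
&\le \operatorname{Var}_{D\cap]-\infty,x]}(f)+\operatorname{Var}_{D\cap[x,\infty[}(f)
\end{align*}
(counting empty sums $\sum_{i=1}^0$, $\sum_{i=n+1}^n$ as 0). If $x\le a_0$ then $\sum_{i=1}^n|f(a_i)-f(a_{i-1})|\le\operatorname{Var}_{D\cap[x,\infty[}(f)$; if $x\ge a_n$ then $\sum_{i=1}^n|f(a_i)-f(a_{i-1})|\le\operatorname{Var}_{D\cap]-\infty,x]}(f)$. Thus
\[ \sum_{i=1}^n|f(a_i)-f(a_{i-1})|\le\operatorname{Var}_{D\cap]-\infty,x]}(f)+\operatorname{Var}_{D\cap[x,\infty[}(f) \]
in all cases; as $a_0,\ldots,a_n$ are arbitrary,
\[ \operatorname{Var}_D(f)\le\operatorname{Var}_{D\cap]-\infty,x]}(f)+\operatorname{Var}_{D\cap[x,\infty[}(f). \]
So the two sides are equal.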

(d) is trivial.

(e) If x, y ∈ D ∩ dom f and x ≤ y then


|f (x) − f (y)| = |f (y) − f (x)| ≤ VarD (f )
by the definition of VarD ; and the same is true if y ≤ x. So of course |f (y)| ≤ |f (x)| + VarD (f ).

(f) If $f$ is non-decreasing, then
\begin{align*}
\operatorname{Var}_D(f) &= \sup\Bigl\{\sum_{i=1}^n|f(a_i)-f(a_{i-1})|:a_0,a_1,\ldots,a_n\in D\cap\operatorname{dom}f,\ a_0\le a_1\le\ldots\le a_n\Bigr\} \\
&= \sup\Bigl\{\sum_{i=1}^n f(a_i)-f(a_{i-1}):a_0,a_1,\ldots,a_n\in D\cap\operatorname{dom}f,\ a_0\le a_1\le\ldots\le a_n\Bigr\} \\
&= \sup\{f(b)-f(a):a,b\in D\cap\operatorname{dom}f,\ a\le b\} \\
&= \sup_{b\in D\cap\operatorname{dom}f}f(b)-\inf_{a\in D\cap\operatorname{dom}f}f(a).
\end{align*}
If $f$ is non-increasing then
\begin{align*}
\operatorname{Var}_D(f) &= \sup\Bigl\{\sum_{i=1}^n|f(a_i)-f(a_{i-1})|:a_0,a_1,\ldots,a_n\in D\cap\operatorname{dom}f,\ a_0\le a_1\le\ldots\le a_n\Bigr\} \\
&= \sup\Bigl\{\sum_{i=1}^n f(a_{i-1})-f(a_i):a_0,a_1,\ldots,a_n\in D\cap\operatorname{dom}f,\ a_0\le a_1\le\ldots\le a_n\Bigr\} \\
&= \sup\{f(a)-f(b):a,b\in D\cap\operatorname{dom}f,\ a\le b\} \\
&= \sup_{a\in D\cap\operatorname{dom}f}f(a)-\inf_{b\in D\cap\operatorname{dom}f}f(b).
\end{align*}

224D Theorem For any real-valued function f and any set D ⊆ R, the following are equiveridical:
(i) there are two bounded non-decreasing functions f1 , f2 : R → R such that f = f1 − f2 on D ∩ dom f ;
(ii) f is of bounded variation on D;
(iii) there are bounded non-decreasing functions f1 , f2 : R → R such that f = f1 − f2 on D ∩ dom f and
VarD (f ) = Var f1 + Var f2 .
proof (i)⇒(ii) If f : R → R is bounded and non-decreasing, then Var f = supx∈R f (x) − inf x∈R f (x) is finite. So if f
agrees on D ∩ dom f with f1 − f2 where f1 and f2 are bounded and non-decreasing, then

VarD (f ) = VarD∩dom f (f ) ≤ VarD∩dom f (f1 ) + VarD∩dom f (f2 )


≤ Var f1 + Var f2 < ∞,
using (a), (b) and (d) of 224C.
(ii)⇒(iii) Suppose that f is of bounded variation on D. Set D′ = D ∩ dom f . If D′ = ∅ we can take both fj to be
the zero function, so henceforth suppose that D′ ≠ ∅. Write
g(x) = VarD∩]−∞,x] (f )
for x ∈ D′ . Then g1 = g + f and g2 = g − f are both non-decreasing. P P If a, b ∈ D′ and a ≤ b, then
g(b) = g(a) + VarD∩[a,b] (f ) ≥ g(a) + |f (b) − f (a)|.
So
g1 (b) − g1 (a) = g(b) − g(a) + f (b) − f (a), g2 (b) − g2 (a) = g(b) − g(a) − f (b) + f (a)
are both non-negative. Q Q
Now there are non-decreasing functions h1 , h2 : R → R, extending g1 , g2 respectively, such that Var hj = Var gj for
both j. P P f is bounded on D, by 224Ce, and g is bounded just because VarD (f ) < ∞, so that gj is bounded. Set
cj = inf x∈D′ gj (x) and
hj (x) = sup({cj } ∪ {gj (y) : y ∈ D′ , y ≤ x})
for every x ∈ R; this works. Q Q Observe that for x ∈ D′ ,
h1 (x) + h2 (x) = g1 (x) + g2 (x) = g(x) + f (x) + g(x) − f (x) = 2g(x),

h1 (x) − h2 (x) = 2f (x).


Now, because g1 and g2 are non-decreasing,
supx∈D′ g1 (x) + supx∈D′ g2 (x) = supx∈D′ g1 (x) + g2 (x) = 2 supx∈D′ g(x),

inf x∈D′ g1 (x) + inf x∈D′ g2 (x) = inf x∈D′ g1 (x) + g2 (x) = 2 inf x∈D′ g(x) ≥ 0.
But this means that
Var h1 + Var h2 = Var g1 + Var g2 = 2 Var g ≤ 2 VarD (f ),
using 224Cf three times. So if we set $f_j(x) = \frac12 h_j(x)$ for j ∈ {1, 2} and x ∈ R, we shall have non-decreasing functions
such that
\[ f_1(x)-f_2(x) = f(x) \text{ for } x\in D', \qquad \operatorname{Var} f_1+\operatorname{Var} f_2 = \tfrac12\operatorname{Var} h_1+\tfrac12\operatorname{Var} h_2 \le \operatorname{Var}_D(f). \]
Since we surely also have

VarD (f ) ≤ VarD (f1 ) + VarD (f2 ) ≤ Var f1 + Var f2 ,


we see that VarD (f ) = Var f1 + Var f2 , and (iii) is true.
(iii)⇒(i) is trivial.
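
The construction in (ii)⇒(iii) can be tried out on a finite set D, where the variation is just the sum of the absolute differences between consecutive values. The following is a minimal sketch under that assumption; the names `variation` and `jordan_decomposition` and the sample values are illustrative and not part of the text. It builds g as the running variation, sets f1 = (g + f)/2 and f2 = (g − f)/2 on D, and does not attempt the extension to the whole of R.

```python
def variation(values):
    """Variation of f over a finite set D, given the values of f
    listed in increasing order of the points of D."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


def jordan_decomposition(values):
    """Return (f1, f2) on D with f = f1 - f2, using
    g(x) = Var_{D ∩ ]-∞,x]}(f), f1 = (g + f)/2, f2 = (g - f)/2."""
    g = [0.0]
    for a, b in zip(values, values[1:]):
        g.append(g[-1] + abs(b - a))  # running variation up to each point
    f1 = [(gi + fi) / 2 for gi, fi in zip(g, values)]
    f2 = [(gi - fi) / 2 for gi, fi in zip(g, values)]
    return f1, f2


# Hypothetical sample: values of f at six points of D, in increasing order.
f_values = [0, 3, 1, 4, 4, 2]
f1, f2 = jordan_decomposition(f_values)
```

For this sample both f1 and f2 come out non-decreasing, and their variations add up to the variation of f, as (iii) requires.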

224E Corollary Let f be a real-valued function and D any subset of R. If f is of bounded variation on D, then
limx↓a VarD∩]a,x] (f ) = limx↑a VarD∩[x,a[ (f ) = 0
for every a ∈ R, and
lima→−∞ VarD∩]−∞,a] (f ) = lima→∞ VarD∩[a,∞[ (f ) = 0.

proof (a) Consider first the case in which D = dom f = R and f is a bounded non-decreasing function. Then
VarD∩]a,x] (f ) = supy∈]a,x] f (x) − f (y) = f (x) − inf y>a f (y) = f (x) − limy↓a f (y),
so of course
limx↓a VarD∩]a,x] (f ) = limx↓a f (x) − limy↓a f (y) = 0.
In the same way
limx↑a VarD∩[x,a[ (f ) = limy↑a f (y) − limx↑a f (x) = 0,

lima→−∞ VarD∩]−∞,a] (f ) = lima→−∞ f (a) − limy→−∞ f (y) = 0,

lima→∞ VarD∩[a,∞[ (f ) = limy→∞ f (y) − lima→∞ f (a) = 0.

(b) For the general case, define f1 , f2 from f and D as in 224D. Then for every interval I we have
VarD∩I (f ) ≤ VarI (f1 ) + VarI (f2 ),
so the results for f follow from those for f1 and f2 as established in part (a) of the proof.

224F Corollary Let f be a real-valued function of bounded variation on [a, b], where a < b. If dom f meets every
interval ]a, a + δ] with δ > 0, then
limt∈dom f,t↓a f (t)
is defined in R. If dom f meets [b − δ, b[ for every δ > 0, then
limt∈dom f,t↑b f (t)
is defined in R.
proof Let f1 , f2 : R → R be non-decreasing functions such that f = f1 − f2 on [a, b] ∩ dom f . Then
limt∈dom f,t↓a f (t) = limt↓a f1 (t) − limt↓a f2 (t) = inf t>a f1 (t) − inf t>a f2 (t),

limt∈dom f,t↑b f (t) = limt↑b f1 (t) − limt↑b f2 (t) = supt<b f1 (t) − supt<b f2 (t).

224G Corollary Let f , g be real functions and D a subset of R. If f and g are of bounded variation on D, so is
f × g.
proof (a) The point is that there are non-negative bounded non-decreasing functions f1 , f2 : R → R such that
f = f1 − f2 on D ∩ dom f . P P We know that there are bounded non-decreasing h1 , h2 such that f = h1 − h2 on
D ∩ dom f . Set γi = inf x∈R hi (x) for i = 1, 2,
β1 = max(γ1 − γ2 , 0), β2 = max(γ2 − γ1 , 0),

f1 = h1 − γ1 + β1 , f2 = h2 − γ2 + β2 ;
this works. Q Q
(b) Now taking similar functions g1 , g2 such that g = g1 − g2 on D ∩ dom g, we have
f × g = f1 × g1 − f2 × g1 − f1 × g2 + f2 × g2
everywhere in D ∩ dom(f × g) = D ∩ dom f ∩ dom g; but all the fi × gj are bounded non-decreasing functions, so of
bounded variation, and f × g must be of bounded variation on D.

224H Proposition Let f : D → R be a function of bounded variation, where D ⊆ R. Then f is continuous at all
except countably many points of D.
proof For n ≥ 1 set
\[ A_n = \{x : x\in D, \text{ for every } \delta>0 \text{ there is a } y\in D\cap[x-\delta,x+\delta] \text{ such that } |f(y)-f(x)|\ge\tfrac1n\}. \]

Then $\#(A_n)\le n\operatorname{Var} f$. P P ?? Otherwise, we can find distinct $x_0,\ldots,x_k\in A_n$ with $k+1 > n\operatorname{Var} f$. Order these so
that $x_0 < x_1 < \ldots < x_k$. Set $\delta = \frac12\min_{1\le i\le k}(x_i-x_{i-1}) > 0$. For each i, there is a $y_i\in D\cap[x_i-\delta,x_i+\delta]$ such that
$|f(y_i)-f(x_i)|\ge\frac1n$. Take $x'_i$, $y'_i$ to be $x_i$, $y_i$ in order, so that $x'_i < y'_i$. Now
\[ x'_0\le y'_0\le x'_1\le y'_1\le\ldots\le x'_k\le y'_k, \]
and
\[ \operatorname{Var} f \ge \sum_{i=0}^k |f(y'_i)-f(x'_i)| = \sum_{i=0}^k |f(y_i)-f(x_i)| \ge \frac1n(k+1) > \operatorname{Var} f, \]
which is impossible. X X Q Q
It follows that $A = \bigcup_{n\in\mathbb N} A_n$ is countable, being a countable union of finite sets. But A is exactly the set of points
of D at which f is not continuous.

224I Theorem Let I ⊆ R be an interval, and f : I → R a function of bounded variation. Then f is differentiable
almost everywhere in I, and f ′ is integrable over I, with
\[ \int_I |f'| \le \operatorname{Var}_I(f). \]

proof (a) Let f1 and f2 be non-decreasing functions such that f = f1 − f2 everywhere in I (224D). Then f1 and f2 are
differentiable almost everywhere (222A). At any point of I except possibly its endpoints, if any, f will be differentiable
if f1 and f2 are, so f ′ (x) is defined for almost every x ∈ I.
(b) Set F (x) = VarI∩]−∞,x] f for x ∈ R. If x, y ∈ I and x ≤ y, then
F (y) − F (x) = Var[x,y] f ≥ |f (y) − f (x)|,
by 224Cc; so $F'(x)\ge|f'(x)|$ whenever x is an interior point of I and both derivatives exist, which is almost everywhere.
So $\int_I |f'| \le \int_I F'$. But if a, b ∈ I and a ≤ b,
\[ \int_a^b F' \le F(b)-F(a) \le F(b) \le \operatorname{Var} f. \]
Now I is expressible as $\bigcup_{n\in\mathbb N}[a_n,b_n]$ where $a_{n+1}\le a_n\le b_n\le b_{n+1}$ for every n. So
\begin{align*}
\int_I |f'| &\le \int_I F' = \int F'\times\chi I \\
&= \int \sup_{n\in\mathbb N} F'\times\chi[a_n,b_n] = \sup_{n\in\mathbb N}\int F'\times\chi[a_n,b_n] \\
&\qquad\text{(by B.Levi's theorem)} \\
&= \sup_{n\in\mathbb N}\int_{a_n}^{b_n} F' \le \operatorname{Var}_I(f).
\end{align*}

224J The next result is not needed in this chapter, but is one of the most useful properties of functions of bounded
variation, and will be used repeatedly in Chapter 28.
Proposition Let f , g be real-valued functions defined on subsets of R, and suppose that g is integrable over an
interval [a, b], where a < b, and f is of bounded variation on ]a, b[ and defined almost everywhere in ]a, b[. Then f × g
is integrable over [a, b], and
\[ \Big|\int_a^b f\times g\Big| \le \Big(\lim_{x\in\operatorname{dom} f,\,x\uparrow b}|f(x)| + \operatorname{Var}_{]a,b[}(f)\Big)\sup_{c\in[a,b]}\Big|\int_a^c g\Big|. \]

proof (a) By 224F, l = limx∈dom f,x↑b f (x) is defined. Write M = |l| + Var]a,b[ (f ). Note that if y is any point of
dom f ∩ ]a, b[,
|f (y)| ≤ |f (x)| + |f (x) − f (y)| ≤ |f (x)| + Var]a,b[ (f ) → M
as x ↑ b in dom f , so |f (y)| ≤ M . Moreover, f is measurable on ]a, b[, because there are bounded monotonic functions
f1 , f2 : R → R such that f = f1 − f2 everywhere in ]a, b[ ∩ dom f . So f × g is measurable and dominated by M |g|, and
is integrable over ]a, b[ or [a, b].
(b) For $n\in\mathbb N$, $k\le 2^n$ set $a_{nk} = a+2^{-n}k(b-a)$, and for $1\le k\le 2^n$ choose $x_{nk}\in\operatorname{dom} f\cap\,]a_{n,k-1},a_{nk}]$. Define
$f_n : \,]a,b]\to\mathbb R$ by setting $f_n(x) = f(x_{nk})$ if $1\le k\le 2^n$ and $x\in\,]a_{n,k-1},a_{nk}]$. Then $f(x) = \lim_{n\to\infty} f_n(x)$ whenever
$x\in\,]a,b[\,\cap\operatorname{dom} f$ and f is continuous at x, which must be almost everywhere (224H). Note next that all the $f_n$ are
measurable, and that they are uniformly bounded, in modulus, by M. So $\{f_n\times g : n\in\mathbb N\}$ is dominated by the
integrable function M|g|, and Lebesgue's Dominated Convergence Theorem tells us that
\[ \int_a^b f\times g = \lim_{n\to\infty}\int_a^b f_n\times g. \]
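(c) Fix $n\in\mathbb N$ for the moment. Set $K = \sup_{c\in[a,b]}|\int_a^c g|$. (Note that K is finite because $c\mapsto\int_a^c g$ is continuous.)
Then
\begin{align*}
\Big|\int_a^b f_n\times g\Big| &= \Big|\sum_{k=1}^{2^n}\int_{a_{n,k-1}}^{a_{nk}} f_n\times g\Big| \\
&= \Big|\sum_{k=1}^{2^n} f(x_{nk})\Big(\int_a^{a_{nk}} g - \int_a^{a_{n,k-1}} g\Big)\Big| \\
&= \Big|\sum_{k=1}^{2^n-1}\big(f(x_{nk})-f(x_{n,k+1})\big)\int_a^{a_{nk}} g + f(x_{n,2^n})\int_a^b g\Big| \\
&\le |f(x_{n,2^n})|\,\Big|\int_a^b g\Big| + \sum_{k=1}^{2^n-1}|f(x_{n,k+1})-f(x_{nk})|\,\Big|\int_a^{a_{nk}} g\Big| \\
&\le \big(|f(x_{n,2^n})| + \operatorname{Var}_{]a,b[}(f)\big)K \to MK
\end{align*}
as n → ∞.
(d) Now
\[ \Big|\int_a^b f\times g\Big| = \lim_{n\to\infty}\Big|\int_a^b f_n\times g\Big| \le MK, \]
as required.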
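
The algebraic heart of part (c) is a summation by parts: with G_0 = 0, the sum of f_k (G_k − G_{k−1}) over 1 ≤ k ≤ N equals the sum of (f_k − f_{k+1}) G_k over 1 ≤ k ≤ N − 1, plus f_N G_N. The sketch below checks this identity, and the resulting bound, on made-up integer data; here `f` stands in for the values f(x_nk) and `G` for the partial integrals of g from a to a_nk, and the function names are illustrative only.

```python
def sum_of_increments(f, G):
    """sum_{k=1}^{N} f_k * (G_k - G_{k-1}), taking G_0 = 0."""
    total = 0
    previous = 0
    for fk, Gk in zip(f, G):
        total += fk * (Gk - previous)
        previous = Gk
    return total


def summed_by_parts(f, G):
    """sum_{k=1}^{N-1} (f_k - f_{k+1}) * G_k + f_N * G_N."""
    n = len(f)
    head = sum((f[k] - f[k + 1]) * G[k] for k in range(n - 1))
    return head + f[n - 1] * G[n - 1]


# Hypothetical data: f_k plays the role of f(x_nk), G_k of the integral of g up to a_nk.
f = [2, -1, 3, 0]
G = [1, 4, 2, 5]

left = sum_of_increments(f, G)
right = summed_by_parts(f, G)

# Bound used in the proof: (|f_N| + variation of f) * sup |G_k|.
bound = (abs(f[-1]) + sum(abs(b - a) for a, b in zip(f, f[1:]))) * max(abs(x) for x in G)
```

The two sums agree, and the absolute value of either is at most `bound`, which mirrors the last two lines of the displayed chain.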

224K Complex-valued functions So far I have taken all functions to be real-valued. This is adequate for the
needs of the present chapter, but in Chapter 28 we shall need to look at complex-valued functions of bounded variation,
and I should perhaps spell out the (elementary) adaptations involved in the extension to the complex case.
(a) Let D be a subset of R and f a complex-valued function. The variation of f on D, VarD (f ), is zero if
D ∩ dom f = ∅, and otherwise is
\[ \sup\Big\{\sum_{j=1}^n |f(a_j)-f(a_{j-1})| : a_0\le a_1\le\ldots\le a_n \text{ in } D\cap\operatorname{dom} f\Big\}, \]
allowing ∞. If VarD (f ) is finite, we say that f is of bounded variation on D.
(b) Just as in the real case, a complex-valued function of bounded variation must be bounded, and
VarD (f + g) ≤ VarD (f ) + VarD (g),

VarD (cf ) = |c| VarD (f ),

VarD (f ) ≥ VarD∩]−∞,x] (f ) + VarD∩[x,∞[ (f )


for every x ∈ R, with equality if x ∈ D ∩ dom f ,
VarD (f ) ≤ VarD′ (f ) whenever D ⊆ D′ ;
the arguments of 224C go through unchanged.

(c) A complex-valued function is of bounded variation iff its real and imaginary parts are both of bounded variation
(because
max(VarD (Re f ), VarD (Im f )) ≤ VarD (f ) ≤ VarD (Re f ) + VarD (Im f ).)
So a complex-valued function f is of bounded variation on D iff there are bounded non-decreasing functions f1 , . . . , f4 :
R → R such that f = f1 − f2 + if3 − if4 on D (224D).
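
The two-sided estimate in (c) is easy to check on a finite set D, where the variation is the sum of the moduli of consecutive differences. The following sketch assumes exactly that finite setting; the sample values are made up for illustration.

```python
def variation(values):
    """Variation over a finite set D of a (real or complex) function,
    given its values in increasing order of the points of D."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical complex-valued sample on four points of D.
f_values = [0j, 3 + 4j, 0j, 1j]

var_f = variation(f_values)
var_re = variation([z.real for z in f_values])
var_im = variation([z.imag for z in f_values])
```

Here the variation of f lies between the larger of the variations of Re f and Im f and their sum, as the displayed inequality asserts.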

(d) Let f be a complex-valued function and D any subset of R. If f is of bounded variation on D, then
limx↓a VarD∩]a,x] (f ) = limx↑a VarD∩[x,a[ (f ) = 0
for every a ∈ R, and
lima→−∞ VarD∩]−∞,a] (f ) = lima→∞ VarD∩[a,∞[ (f ) = 0.
(Apply 224E to the real and imaginary parts of f .)

(e) Let f be a complex-valued function of bounded variation on [a, b], where a < b. If dom f meets every
interval ]a, a + δ] with δ > 0, then limt∈dom f,t↓a f (t) is defined in C. If dom f meets [b − δ, b[ for every δ > 0, then
limt∈dom f,t↑b f (t) is defined in C. (Apply 224F to the real and imaginary parts of f .)

(f ) Let f , g be complex functions and D a subset of R. If f and g are of bounded variation on D, so is f × g. (For
f × g is expressible as a linear combination of the four products Re f × Re g, . . . , Im f × Im g, to each of which we
can apply 224G.)

(g) Let I ⊆ R be an interval, and f : I → C a function of bounded variation. Then f is differentiable almost
everywhere in I, and $\int_I |f'| \le \operatorname{Var}_I(f)$. (As 224I.)

(h) Let f and g be complex-valued functions defined on subsets of R, and suppose that g is integrable over an
interval [a, b], where a < b, and f is of bounded variation on ]a, b[ and defined almost everywhere in ]a, b[. Then f × g
is integrable over [a, b], and
\[ \Big|\int_a^b f\times g\Big| \le \Big(\lim_{x\in\operatorname{dom} f,\,x\uparrow b}|f(x)| + \operatorname{Var}_{]a,b[}(f)\Big)\sup_{c\in[a,b]}\Big|\int_a^c g\Big|. \]

(The argument of 224J applies virtually unchanged.)

224X Basic exercises > (a) Set $f(x) = x^2\sin\frac{1}{x^2}$ for x ≠ 0, f (0) = 0. Show that f : R → R is differentiable
everywhere and uniformly continuous, but is not of bounded variation on any non-trivial interval containing 0.

(b) Give an example of a non-negative function g : [0, 1] → [0, 1], of bounded variation, such that $\sqrt g$ is not of
bounded variation.

(c) Show that if f is any real-valued function defined on a subset of R, there is a function f˜ : R → R, extending f ,
such that Var f˜ = Var f . Under what circumstances is f˜ unique?

(d) Let f : D → R be a function of bounded variation, where D ⊆ R is a non-empty set. Show that if inf x∈D |f (x)| >
0 then 1/f is of bounded variation.

(e) Let f : [a, b] → R be a continuous function, where a ≤ b in R. Show that if c < Var f then there is a δ > 0 such
that $\sum_{i=1}^n |f(a_i)-f(a_{i-1})| \ge c$ whenever $a = a_0\le a_1\le\ldots\le a_n = b$ and $\max_{1\le i\le n}(a_i-a_{i-1})\le\delta$.

(f ) Let ⟨fn⟩n∈N be a sequence of real functions, and set f (x) = limn→∞ fn (x) whenever the limit is defined. Show
that Var f ≤ lim inf n→∞ Var fn .
(g) Let f be a real-valued function which is integrable over an interval [a, b] ⊆ R. Set $F(x) = \int_a^x f$ for x ∈ [a, b].
Show that $\operatorname{Var} F = \int_a^b |f|$. (Hint: start by checking that $\operatorname{Var} F \le \int |f|$; for the reverse inequality, consider the case
f ≥ 0 first.)

(h) Show that if f is a real-valued function defined on a non-empty set D ⊆ R, then


\[ \operatorname{Var} f = \sup\Big\{\Big|\sum_{i=1}^n (-1)^i\big(f(a_i)-f(a_{i-1})\big)\Big| : a_0\le a_1\le\ldots\le a_n \text{ in } D\Big\}. \]

(i) Let f be a real-valued function which is integrable over a bounded interval [a, b] ⊆ R. Show that
\[ \int_a^b |f| = \sup\Big\{\Big|\sum_{i=1}^n (-1)^i\int_{a_{i-1}}^{a_i} f\Big| : a = a_0\le a_1\le a_2\le\ldots\le a_n = b\Big\}. \]

(Hint: put 224Xg and 224Xh together.)

(j) Let f and g be real-valued functions defined on subsets of R, and suppose that g is integrable over an interval
[a, b], where a < b, and f is of bounded variation on ]a, b[ and defined almost everywhere in ]a, b[. Show that
\[ \Big|\int_a^b f\times g\Big| \le \Big(\lim_{x\in\operatorname{dom} f,\,x\downarrow a}|f(x)| + \operatorname{Var}_{]a,b[}(f)\Big)\sup_{c\in[a,b]}\Big|\int_c^b g\Big|. \]

224Y Further exercises (a) Show that if f is any complex-valued function defined on a subset of R, there is a
function f˜ : R → C, extending f , such that Var f˜ = Var f . Under what circumstances is f˜ unique?

(b) Let D be any non-empty subset of R, and let V be the space of functions f : D → R of bounded variation. For
f ∈ V set
\[ \|f\| = \sup\Big\{|f(t_0)| + \sum_{i=1}^n |f(t_i)-f(t_{i-1})| : t_0\le\ldots\le t_n\in D\Big\}. \]
Show that (i) ‖ ‖ is a norm on V (ii) V is complete under ‖ ‖ (iii) ‖f × g‖ ≤ ‖f‖‖g‖ for all f , g ∈ V, so that V is a
Banach algebra.

(c) Let f : R → R be a function of bounded variation. Show that there is a sequence ⟨fn⟩n∈N of differentiable
functions such that $\lim_{n\to\infty} f_n(x) = f(x)$ for every x ∈ R, $\lim_{n\to\infty}\int |f_n-f| = 0$, and Var(fn ) ≤ Var(f ) for every
n ∈ N. (Hint: start with non-decreasing f .)

(d) For any partially ordered set X and any function f : X → R, say that VarX (f ) = 0 if X = ∅ and otherwise
\[ \operatorname{Var}_X(f) = \sup\Big\{\sum_{i=1}^n |f(a_i)-f(a_{i-1})| : a_0,a_1,\ldots,a_n\in X,\ a_0\le a_1\le\ldots\le a_n\Big\}. \]
State and prove results in this framework generalizing 224D and 224Yb. (Hints: f will be ‘non-decreasing’ if f (x) ≤ f (y)
whenever x ≤ y; interpret ]−∞, x] as {y : y ≤ x}.)
(e) Let (X, ρ) be a metric space and f : [a, b] → X a function, where a ≤ b in R. Set
$\operatorname{Var}_{[a,b]}(f) = \sup\{\sum_{i=1}^n \rho(f(a_i),f(a_{i-1})) : a\le a_0\le\ldots\le a_n\le b\}$. (i) Show that Var[a,b] (f ) = Var[a,c] (f ) + Var[c,b] (f ) for every c ∈ [a, b]. (ii) Show that if
Var[a,b] (f ) is finite then f is continuous at all but countably many points of [a, b]. (iii) Show that if X is complete and
Var[a,b] (f ) < ∞ then limt↑x f (t) is defined for every x ∈ ]a, b]. (iv) Show that if X is complete then Var[a,b] (f ) is finite
iff f is expressible as a composition gh, where h : [a, b] → R is non-decreasing and g : R → X is 1-Lipschitz, that is,
ρ(g(c), g(d)) ≤ |c − d| for all c, d ∈ R.

(f ) Let U be a normed space and a ≤ b in R. For functions f : [a, b] → U define Var[a,b] (f ) as in 224Ye,
using the standard metric ρ(x, y) = ‖x − y‖ for x, y ∈ U . (i) Show that Var[a,b] (f + g) ≤ Var[a,b] (f ) + Var[a,b] (g),
Var[a,b] (cf ) = |c| Var[a,b] (f ) for all f , g : [a, b] → U and all c ∈ R. (ii) Show that if V is another normed space and
T : U → V is a bounded linear operator then Var[a,b] (T f ) ≤ ‖T‖ Var[a,b] (f ) for every f : [a, b] → U .

(g) Let f : [0, 1] → R be a continuous function. For y ∈ R set $h(y) = \#(f^{-1}[\{y\}])$ if this is finite, ∞ otherwise.
Show that (if we allow ∞ as a value of the integral) $\operatorname{Var}_{[0,1]}(f) = \int h$. (Hint: for $n\in\mathbb N$, $i < 2^n$ set
$c_{ni} = \sup\{f(x)-f(y) : x,y\in[2^{-n}i,2^{-n}(i+1)]\}$, $h_{ni}(y) = 1$ if $y\in f[\,[2^{-n}i,2^{-n}(i+1)[\,]$, 0 otherwise. Show that $c_{ni} = \int h_{ni}$,
$\lim_{n\to\infty}\sum_{i=0}^{2^n-1} c_{ni} = \operatorname{Var} f$, $\lim_{n\to\infty}\sum_{i=0}^{2^n-1} h_{ni} = h$.) (See also 226Yb.)

(h) Let ν be any Lebesgue-Stieltjes measure on R, I ⊆ R an interval (which may be either open or closed, bounded
or unbounded), and D ⊆ I a non-empty set. Let V be the space of functions of bounded variation from D to R, and
‖ ‖ the norm of 224Yb on V. Let g : D → R be a function such that $\int_{[a,b]\cap D} g\,d\nu$ exists whenever a ≤ b in I, and
$K = \sup_{a,b\in I,\,a\le b}|\int_{[a,b]\cap D} g\,d\nu|$. Show that $|\int_D f\times g\,d\nu| \le K\|f\|$ for every f ∈ V.

(i) Explain how to apply 224Yh with D = N to obtain Abel’s theorem that the product of a monotonic sequence
converging to 0 with a series which has bounded partial sums is summable.

(j) Suppose that I ⊆ R is an interval, and that ⟨An⟩n∈N is a sequence of sets covering I. Let f : I → R be continuous.
Show that $\operatorname{Var} f \le \sum_{n=0}^\infty \operatorname{Var}_{A_n} f$. (Hint: reduce to the case of closed sets An ; use Baire's theorem (4A2Ma).)

224 Notes and comments I have taken the ideas above rather farther than we need immediately; for the present
chapter, it is enough to consider the case in which D = dom f = [a, b] for some interval [a, b] ⊆ R. The extension to
functions with irregular domains will be useful in Chapter 28, and the extension to irregular sets D, while not important
to us here, is of some interest – for instance, taking D = N, we obtain the notion of ‘sequence of bounded variation’,
which is surely relevant to problems of convergence and summability.
The central result of the section is of course the fact that a function of bounded variation can be expressed as the
difference of monotonic functions (224D); indeed, one of the objects of the concept is to characterize the linear span of
the monotonic functions. Nearly everything else here can be derived as easy consequences of this, as in 224E-224G. In
224I and 224Xg we go a little deeper, and indeed some measure theory appears; this is where the ideas here begin to
connect with the real business of this chapter, to be continued in the next section. Another result which is easy enough
in itself, but contains the germs of important ideas, is 224Yg.
In 224Yb I mention a natural development in functional analysis, and in 224Yd-224Yf I suggest further wide-ranging
generalizations.

225 Absolutely continuous functions


We are now ready for a full characterization of the functions that can appear as indefinite integrals (225E, 225Xh).
The essential idea is that of ‘absolute continuity’ (225B). In the second half of the section (225G-225N) I describe some
of the relationships between this concept and those we have already seen.

225A Absolute continuity of the indefinite integral I begin with an easy fundamental result from general
measure theory.
Theorem Let (X, Σ, µ) be any measure space and f an integrable real-valued function defined on a conegligible subset
of X. Then for any ε > 0 there are a measurable set E of finite measure and a real number δ > 0 such that $\int_F |f| \le \epsilon$
whenever F ∈ Σ and µ(F ∩ E) ≤ δ.
proof There is a non-decreasing sequence ⟨gn⟩n∈N of non-negative simple functions such that $|f| =_{\text{a.e.}} \lim_{n\to\infty} g_n$
and $\int |f| = \lim_{n\to\infty}\int g_n$. Take $n\in\mathbb N$ such that $\int g_n \ge \int |f| - \frac12\epsilon$. Let M > 0, E ∈ Σ be such that µE < ∞ and
$g_n\le M\chi E$; set $\delta = \epsilon/2M$. If F ∈ Σ and µ(F ∩ E) ≤ δ, then
\[ \int_F g_n = \int g_n\times\chi F \le M\mu(F\cap E) \le \tfrac12\epsilon; \]
consequently
\[ \int_F |f| = \int_F g_n + \int_F (|f|-g_n) \le \tfrac12\epsilon + \int (|f|-g_n) \le \epsilon. \]

225B Absolutely continuous functions on R: Definition If [a, b] is a non-empty closed interval in R and
f : [a, b] → R is a function, we say that f is absolutely continuous if for every ε > 0 there is a δ > 0 such that
$\sum_{i=1}^n |f(b_i)-f(a_i)| \le \epsilon$ whenever $a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=1}^n (b_i-a_i)\le\delta$.

Remark The phrase ‘absolutely continuous’ is used in various senses in measure theory, closely related (if you look at
them in the right way) but not identical; you will need to keep the context of each definition in clear focus.
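
To see the definition in action, take the illustrative case f(x) = x² on [0, 1] (my choice, not the text's). Since |b² − a²| = (a + b)(b − a) ≤ 2(b − a) for 0 ≤ a ≤ b ≤ 1, any family of non-overlapping intervals of total length at most δ = ε/2 has increment sum at most ε. The sketch below evaluates both sums for one hypothetical family of intervals; the helper names are assumptions of the example.

```python
def increment_sum(f, intervals):
    """sum_i |f(b_i) - f(a_i)| over a list of intervals (a_i, b_i)."""
    return sum(abs(f(b) - f(a)) for a, b in intervals)


def total_length(intervals):
    """sum_i (b_i - a_i)."""
    return sum(b - a for a, b in intervals)


def square(x):
    return x * x


# Hypothetical non-overlapping intervals a_1 <= b_1 <= a_2 <= ... <= b_n in [0, 1].
intervals = [(0.0, 0.01), (0.25, 0.27), (0.5, 0.505), (0.9, 0.93)]

length = total_length(intervals)
increments = increment_sum(square, intervals)
```

For this family the increment sum is bounded by twice the total length, which is the estimate that makes δ = ε/2 work for this particular f.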

225C Proposition Let [a, b] be a non-empty closed interval in R.


(a) If f : [a, b] → R is absolutely continuous, it is uniformly continuous.
(b) If f : [a, b] → R is absolutely continuous it is of bounded variation on [a, b], so is differentiable almost everywhere
in [a, b], and its derivative is integrable over [a, b].
(c) If f , g : [a, b] → R are absolutely continuous, so are f + g and cf , for every c ∈ R.
(d) If f , g : [a, b] → R are absolutely continuous so is f × g.
(e) If g : [a, b] → [c, d] and f : [c, d] → R are absolutely continuous, and g is non-decreasing, then the composition
f g : [a, b] → R is absolutely continuous.
proof (a) Let ε > 0. Then there is a δ > 0 such that $\sum_{i=1}^n |f(b_i)-f(a_i)|\le\epsilon$ whenever $a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$
and $\sum_{i=1}^n (b_i-a_i)\le\delta$; but of course now |f (y) − f (x)| ≤ ε whenever x, y ∈ [a, b] and |x − y| ≤ δ. As ε is
arbitrary, f is uniformly continuous.

(b) Let δ > 0 be such that $\sum_{i=1}^n |f(b_i)-f(a_i)|\le 1$ whenever $a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$ and
$\sum_{i=1}^n (b_i-a_i)\le\delta$. If $a\le c = c_0\le c_1\le\ldots\le c_n\le d\le\min(b,c+\delta)$, then $\sum_{i=1}^n |f(c_i)-f(c_{i-1})|\le 1$, so $\operatorname{Var}_{[c,d]}(f)\le 1$;
accordingly (inducing on k, using 224Cc for the inductive step) $\operatorname{Var}_{[a,\min(a+k\delta,b)]}(f)\le k$ for every k, and
\[ \operatorname{Var}_{[a,b]}(f) \le \lceil (b-a)/\delta\rceil < \infty. \]
It follows that f ′ is integrable, by 224I.
(c)(i) Let ε > 0. Then there are δ1 , δ2 > 0 such that
\[ \sum_{i=1}^n |f(b_i)-f(a_i)| \le \tfrac12\epsilon \]
whenever $a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=1}^n (b_i-a_i)\le\delta_1$,
\[ \sum_{i=1}^n |g(b_i)-g(a_i)| \le \tfrac12\epsilon \]
whenever $a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=1}^n (b_i-a_i)\le\delta_2$. Set δ = min(δ1 , δ2 ) > 0, and suppose that
$a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=1}^n (b_i-a_i)\le\delta$. Then
\[ \sum_{i=1}^n |(f+g)(b_i)-(f+g)(a_i)| \le \sum_{i=1}^n |f(b_i)-f(a_i)| + \sum_{i=1}^n |g(b_i)-g(a_i)| \le \epsilon. \]
As ε is arbitrary, f + g is absolutely continuous.

(ii) Let ε > 0. Then there is a δ > 0 such that
\[ \sum_{i=1}^n |f(b_i)-f(a_i)| \le \frac{\epsilon}{1+|c|} \]
whenever $a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=1}^n (b_i-a_i)\le\delta$. Now
\[ \sum_{i=1}^n |(cf)(b_i)-(cf)(a_i)| \le \epsilon \]
whenever $a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=1}^n (b_i-a_i)\le\delta$. As ε is arbitrary, cf is absolutely
continuous.
(d) By either (a) or (b), f and g are bounded; set $M = \sup_{x\in[a,b]}|f(x)|$, $M' = \sup_{x\in[a,b]}|g(x)|$. Let ε > 0. Then
there are δ1 , δ2 > 0 such that
$\sum_{i=1}^n |f(b_i)-f(a_i)|\le\epsilon$ whenever $a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=1}^n (b_i-a_i)\le\delta_1$,
$\sum_{i=1}^n |g(b_i)-g(a_i)|\le\epsilon$ whenever $a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=1}^n (b_i-a_i)\le\delta_2$.
Set δ = min(δ1 , δ2 ) > 0 and suppose that $a\le a_1\le b_1\le\ldots\le b_n\le b$ and $\sum_{i=1}^n (b_i-a_i)\le\delta$. Then
\begin{align*}
\sum_{i=1}^n |f(b_i)g(b_i)-f(a_i)g(a_i)| &= \sum_{i=1}^n |(f(b_i)-f(a_i))g(b_i) + f(a_i)(g(b_i)-g(a_i))| \\
&\le \sum_{i=1}^n |f(b_i)-f(a_i)||g(b_i)| + |f(a_i)||g(b_i)-g(a_i)| \\
&\le \sum_{i=1}^n |f(b_i)-f(a_i)|M' + M|g(b_i)-g(a_i)| \\
&\le \epsilon M' + M\epsilon = \epsilon(M+M').
\end{align*}
As ε is arbitrary, f × g is absolutely continuous.
(e) Let ε > 0. Then there is a δ > 0 such that $\sum_{i=1}^n |f(d_i)-f(c_i)|\le\epsilon$ whenever $c\le c_1\le d_1\le\ldots\le c_n\le d_n\le d$ and
$\sum_{i=1}^n (d_i-c_i)\le\delta$; and there is an η > 0 such that $\sum_{i=1}^n |g(b_i)-g(a_i)|\le\delta$ whenever $a\le a_1\le b_1\le\ldots\le a_n\le b_n\le b$
and $\sum_{i=1}^n (b_i-a_i)\le\eta$. Now suppose that $a\le a_1\le b_1\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=1}^n (b_i-a_i)\le\eta$. Because g is
non-decreasing, we have $c\le g(a_1)\le\ldots\le g(b_n)\le d$ and $\sum_{i=1}^n (g(b_i)-g(a_i))\le\delta$, so $\sum_{i=1}^n |f(g(b_i))-f(g(a_i))|\le\epsilon$; as
ε is arbitrary, f g is absolutely continuous.

225D Lemma Let [a, b] be a non-empty closed interval in R and f : [a, b] → R an absolutely continuous function
which has zero derivative almost everywhere in [a, b]. Then f is constant on [a, b].
proof Let x ∈ [a, b], ε > 0. Let δ > 0 be such that $\sum_{i=1}^n |f(b_i)-f(a_i)|\le\epsilon$ whenever $a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$
and $\sum_{i=1}^n (b_i-a_i)\le\delta$. Set $A = \{t : a<t<x,\ f'(t) \text{ exists} = 0\}$; then µA = x − a, writing µ
for Lebesgue measure as usual. Let $\mathcal I$ be the set of non-empty non-singleton closed intervals [c, d] ⊆ [a, x] such that
|f (d) − f (c)| ≤ ε(d − c); then every member of A belongs to arbitrarily short members of $\mathcal I$. By Vitali's theorem (221A),
there is a countable disjoint family $\mathcal I_0\subseteq\mathcal I$ such that $\mu(A\setminus\bigcup\mathcal I_0) = 0$, that is,
\[ x-a = \mu\big(\bigcup\mathcal I_0\big) = \sum_{I\in\mathcal I_0}\mu I. \]
Now there is a finite $\mathcal I_1\subseteq\mathcal I_0$ such that
\[ \mu\big(\bigcup\mathcal I_1\big) = \sum_{I\in\mathcal I_1}\mu I \ge x-a-\delta. \]
If $\mathcal I_1 = \emptyset$, then x ≤ a + δ and |f (x) − f (a)| ≤ ε. Otherwise, express $\mathcal I_1$ as $\{[c_0,d_0],\ldots,[c_n,d_n]\}$, where $a\le c_0<d_0<c_1<d_1<\ldots<c_n<d_n\le x$. Then
\[ (c_0-a) + \sum_{i=1}^n (c_i-d_{i-1}) + (x-d_n) = \mu\big([a,x]\setminus\bigcup\mathcal I_1\big) \le \delta, \]
so
\[ |f(c_0)-f(a)| + \sum_{i=1}^n |f(c_i)-f(d_{i-1})| + |f(x)-f(d_n)| \le \epsilon. \]
On the other hand, $|f(d_i)-f(c_i)|\le\epsilon(d_i-c_i)$ for each i, so
\[ \sum_{i=0}^n |f(d_i)-f(c_i)| \le \epsilon\sum_{i=0}^n (d_i-c_i) \le \epsilon(x-a). \]
Putting these together,
\begin{align*}
|f(x)-f(a)| &\le |f(c_0)-f(a)| + |f(d_0)-f(c_0)| + |f(c_1)-f(d_0)| + \ldots + |f(d_n)-f(c_n)| + |f(x)-f(d_n)| \\
&= |f(c_0)-f(a)| + \sum_{i=1}^n |f(c_i)-f(d_{i-1})| + |f(x)-f(d_n)| + \sum_{i=0}^n |f(d_i)-f(c_i)| \\
&\le \epsilon + \epsilon(x-a) = \epsilon(1+x-a).
\end{align*}
As ε is arbitrary, f (x) = f (a). As x is arbitrary, f is constant.

225E Theorem Let [a, b] be a non-empty closed interval in R and F : [a, b] → R a function. Then the following are
equiveridical:
(i) there is an integrable real-valued function f such that $F(x) = F(a)+\int_a^x f$ for every x ∈ [a, b];
(ii) $\int_a^x F'$ exists and is equal to F (x) − F (a) for every x ∈ [a, b];
(iii) F is absolutely continuous.
proof (i)⇒(iii) Assume (i). Let ε > 0. By 225A, there is a δ > 0 such that $\int_H |f|\le\epsilon$ whenever H ⊆ [a, b] and
µH ≤ δ, writing µ for Lebesgue measure as usual. Now suppose that $a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$ and
$\sum_{i=1}^n (b_i-a_i)\le\delta$. Consider $H = \bigcup_{1\le i\le n}[a_i,b_i[$. Then µH ≤ δ and
\[ \sum_{i=1}^n |F(b_i)-F(a_i)| = \sum_{i=1}^n \Big|\int_{[a_i,b_i[} f\Big| \le \sum_{i=1}^n \int_{[a_i,b_i[} |f| = \int_H |f| \le \epsilon. \]
As ε is arbitrary, F is absolutely continuous.


(iii)⇒(ii) If F is absolutely continuous, then it is of bounded variation (by 225Cb), so $\int_a^b F'$ exists (224I). Set
$G(x) = \int_a^x F'$ for x ∈ [a, b]; then G′ =a.e. F ′ (222E) and G is absolutely continuous (by (i)⇒(iii) just proved).
Accordingly G − F is absolutely continuous (225Cc) and is differentiable, with zero derivative, almost everywhere. It
follows that G − F must be constant (225D). But as G(a) = 0, G = F − F (a); just as required by (ii).
(ii)⇒(i) is trivial.

225F Integration by parts As an application of this result, I give a justification of a familiar formula.
Theorem Let f be a real-valued function which is integrable over an interval [a, b] ⊆ R, and g : [a, b] → R an absolutely
continuous function. Suppose that F is an indefinite integral of f , so that $F(x)-F(a) = \int_a^x f$ for x ∈ [a, b]. Then
\[ \int_a^b f\times g = F(b)g(b) - F(a)g(a) - \int_a^b F\times g'. \]
proof Set h = F × g. Because F is absolutely continuous (225E), so is h (225Cd). Consequently $h(b)-h(a) = \int_a^b h'$,
by (iii)⇒(ii) of 225E. But h′ = F ′ × g + F × g ′ wherever F ′ and g ′ are defined, which is almost everywhere, and
F ′ =a.e. f , by 222E; so h′ =a.e. f × g + F × g ′ . Finally, g and F are continuous, therefore measurable, and bounded,
while f and g ′ are integrable (using 225E yet again), so f × g and F × g ′ are integrable, and
\[ F(b)g(b) - F(a)g(a) = h(b)-h(a) = \int_a^b h' = \int_a^b f\times g + \int_a^b F\times g', \]
as required.
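
A numerical sanity check of the formula, under assumptions of my own choosing: f = cos with indefinite integral F = sin on [0, 1], and g(x) = |x − 1/2|, which is absolutely continuous but not differentiable at 1/2. The integrals are approximated by a plain midpoint rule (the helper `midpoint_integral` is part of this sketch, not a library routine), so agreement is only up to discretization error.

```python
import math


def midpoint_integral(func, a, b, n=20000):
    """Midpoint-rule approximation to the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


a, b = 0.0, 1.0
f = math.cos          # integrand
F = math.sin          # an indefinite integral of f


def g(x):
    return abs(x - 0.5)          # absolutely continuous, kink at 1/2


def g_prime(x):
    return 1.0 if x > 0.5 else -1.0   # derivative of g away from 1/2


lhs = midpoint_integral(lambda x: f(x) * g(x), a, b)
rhs = F(b) * g(b) - F(a) * g(a) - midpoint_integral(lambda x: F(x) * g_prime(x), a, b)

# Closed form of both sides for this particular choice of f and g.
exact = 0.5 * math.sin(1.0) + math.cos(1.0) - 2.0 * math.cos(0.5) + 1.0
```

The point where g′ is undefined has measure zero and is never hit by a midpoint when n is even, so it does not affect either approximation.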

225G I come now to a group of results at a rather deeper level than most of the work of this chapter, being closer
to the ideas of Chapter 26.
Proposition Let [a, b] be a non-empty closed interval in R and f : [a, b] → R an absolutely continuous function. Then
f [A] is negligible for every negligible set A ⊆ R.
proof Let ε > 0. Then there is a δ > 0 such that $\sum_{i=1}^n |f(b_i)-f(a_i)|\le\epsilon$ whenever $a\le a_1\le b_1\le\ldots\le a_n\le b_n\le b$
and $\sum_{i=1}^n (b_i-a_i)\le\delta$. Now there is a sequence ⟨Ik⟩k∈N of closed intervals, covering A, with $\sum_{k=0}^\infty \mu I_k\le\delta$. For each
$m\in\mathbb N$, let $F_m$ be $[a,b]\cap\bigcup_{k\le m} I_k$. Then $\mu f[F_m]\le\epsilon$. P P $F_m$ must be expressible as $\bigcup_{i\le n}[c_i,d_i]$ where n ≤ m and
$a\le c_0\le d_0\le\ldots\le c_n\le d_n\le b$. For each i ≤ n choose $x_i$, $y_i$ such that $c_i\le x_i, y_i\le d_i$ and
\[ f(x_i) = \min_{x\in[c_i,d_i]} f(x), \qquad f(y_i) = \max_{x\in[c_i,d_i]} f(x); \]
such exist because f is continuous, so is bounded and attains its bounds on $[c_i,d_i]$. Set $a_i = \min(x_i,y_i)$, $b_i = \max(x_i,y_i)$,
so that $c_i\le a_i\le b_i\le d_i$. Then
\[ \sum_{i=0}^n (b_i-a_i) \le \sum_{i=0}^n (d_i-c_i) = \mu F_m \le \mu\big(\bigcup_{k\in\mathbb N} I_k\big) \le \delta, \]
so
\begin{align*}
\mu f[F_m] &= \mu\big(\bigcup_{i\le n} f[\,[c_i,d_i]\,]\big) \le \sum_{i=0}^n \mu(f[\,[c_i,d_i]\,]) \\
&= \sum_{i=0}^n \mu[f(x_i),f(y_i)] = \sum_{i=0}^n |f(b_i)-f(a_i)| \le \epsilon. \text{ Q Q}
\end{align*}
But ⟨f [Fm ]⟩m∈N is a non-decreasing sequence covering f [A], so
\[ \mu^* f[A] \le \mu\big(\bigcup_{m\in\mathbb N} f[F_m]\big) = \sup_{m\in\mathbb N}\mu f[F_m] \le \epsilon. \]
As ε is arbitrary, f [A] is negligible, as claimed.

225H Semi-continuous functions In preparation for the last main result of this section, I give a general result
concerning measurable real-valued functions on subsets of R. It will be convenient here, for once, to consider functions
taking values in [−∞, ∞]. If $D\subseteq\mathbb R^r$, a function g : D → [−∞, ∞] is lower semi-continuous if {x : g(x) > u} is an
open subset of D (for the subspace topology, see 2A3C) for every u ∈ [−∞, ∞]. Any lower semi-continuous function is
Borel measurable, therefore Lebesgue measurable (121B-121D). Now we have the following result.

225I Proposition Suppose that r ≥ 1 and that f is a real-valued function, defined on a subset D of $\mathbb R^r$, which
is integrable over D. Then for any ε > 0 there is a lower semi-continuous function $g : \mathbb R^r\to[-\infty,\infty]$ such that
g(x) ≥ f (x) for every x ∈ D and $\int_D g$ is defined and not greater than $\epsilon+\int_D f$.
Remarks This is a result of great general importance, so I give it in a fairly general form; but for the present chapter
all we need is the case r = 1, D = [a, b] where a ≤ b.
proof (a) We can enumerate Q as ⟨qn⟩n∈N . By 225A, there is a δ > 0 such that $\int_F |f|\le\frac12\epsilon$ whenever $\mu_D F\le\delta$,
where $\mu_D$ is the subspace measure on D, so that $\mu_D F = \mu^* F$, the outer Lebesgue measure of F , for every $F\in\Sigma_D$,
the domain of $\mu_D$ (214A-214B). For each $n\in\mathbb N$, set
\[ \delta_n = 2^{-n-1}\min\Big(\frac{\epsilon}{1+2|q_n|},\,\delta\Big), \]
so that $\sum_{n=0}^\infty \delta_n|q_n|\le\frac12\epsilon$ and $\sum_{n=0}^\infty \delta_n\le\delta$. For each $n\in\mathbb N$, let $E_n\subseteq\mathbb R^r$ be a Lebesgue measurable set such that
$\{x : f(x)\ge q_n\} = D\cap E_n$, and choose an open set $G_n\supseteq E_n\cap B(0,n)$ such that $\mu G_n\le\mu(E_n\cap B(0,n))+\delta_n$ (134Fa),
writing B(0, n) for the ball $\{x : \|x\|\le n\}$. For $x\in\mathbb R^r$, set
\[ g(x) = \sup\{q_n : x\in G_n\}, \]
allowing −∞ as sup ∅ and ∞ as the supremum of a set with no upper bound in R.
(b) Now check the properties of g.

(i) g is lower semi-continuous. P P If u ∈ [−∞, ∞], then
\[ \{x : g(x)>u\} = \bigcup\{G_n : q_n>u\} \]
is a union of open sets, therefore open. Q Q
(ii) g(x) ≥ f (x) for every x ∈ D. P P If x ∈ D and η > 0, there is an $n\in\mathbb N$ such that $\|x\|\le n$ and $f(x)-\eta\le q_n\le f(x)$;
now $x\in E_n\cap B(0,n)\subseteq G_n$ so $g(x)\ge q_n\ge f(x)-\eta$. As η is arbitrary, g(x) ≥ f (x). Q Q
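(iii) Consider the functions h1 , h2 : D → ]−∞, ∞] defined by setting
\begin{align*}
h_1(x) &= |f(x)| \text{ if } x\in D\cap\bigcup_{n\in\mathbb N}(G_n\setminus E_n), \\
&= 0 \text{ for other } x\in D, \\
h_2(x) &= \sum_{n=0}^\infty |q_n|\chi(G_n\setminus E_n)(x) \text{ for every } x\in D.
\end{align*}
Setting $F = \bigcup_{n\in\mathbb N} G_n\setminus E_n$,
\[ \mu F \le \sum_{n=0}^\infty \mu(G_n\setminus E_n) \le \delta, \]
so
\[ \int_D h_1 \le \int_{D\cap F} |f| \le \tfrac12\epsilon \]
by the choice of δ. As for h2 , we have (by B.Levi's theorem)
\[ \int_D h_2 = \sum_{n=0}^\infty |q_n|\mu_D(D\cap G_n\setminus E_n) \le \sum_{n=0}^\infty |q_n|\mu(G_n\setminus E_n) \le \tfrac12\epsilon \]
(because this is finite, h2 (x) < ∞ for almost every x ∈ D). Thus $\int_D h_1+h_2\le\epsilon$.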
(iv) The point is that g ≤ f + h1 + h2 everywhere in D. P P Take any x ∈ D. If $n\in\mathbb N$ and $x\in G_n$, then either
$x\in E_n$, in which case
\[ f(x)+h_1(x)+h_2(x) \ge f(x) \ge q_n, \]
or $x\in G_n\setminus E_n$, in which case
\[ f(x)+h_1(x)+h_2(x) \ge f(x)+|f(x)|+|q_n| \ge q_n. \]
Thus
\[ f(x)+h_1(x)+h_2(x) \ge \sup\{q_n : x\in G_n\} \ge g(x). \text{ Q Q} \]
So g ≤ f + h1 + h2 everywhere in D.
(v) Putting (iii) and (iv) together,
\[ \int_D g \le \int_D f+h_1+h_2 \le \epsilon+\int_D f, \]
as required.

225J We need some results on Borel measurable sets and functions which are of independent interest.
Theorem Let D be a subset of R and f : D → R any function. Then
E = {x : x ∈ D, f is continuous at x}
is relatively Borel measurable in D, and
F = {x : x ∈ D, f is differentiable at x}
is actually Borel measurable; moreover, f ′ : F → R is Borel measurable.
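proof (a) For $k \in \mathbb{N}$ set
$$\mathcal{G}_k = \{\,]a, b[\; : a, b \in \mathbb{R},\ |f(x) - f(y)| \le 2^{-k} \text{ for all } x, y \in D \,\cap\, ]a, b[\,\}.$$
Then $G_k = \bigcup \mathcal{G}_k$ is an open set, so $E_0 = \bigcap_{k\in\mathbb{N}} G_k$ is a Borel set. But $E = D \cap E_0$, so $E$ is a relatively Borel subset of $D$.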

(b)(i) I should perhaps say at once that when interpreting the formula $f'(x) = \lim_{h\to 0} (f(x+h) - f(x))/h$, I insist on the restrictive definition
$$a = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}$$
if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\frac{f(x+h)-f(x)}{h}$ is defined and
$$\left| \frac{f(x+h) - f(x)}{h} - a \right| \le \epsilon \text{ whenever } 0 < |h| \le \delta.$$

So $f'(x)$ can be defined only if there is some $\delta > 0$ such that the whole interval $[x - \delta, x + \delta]$ lies within the domain $D$ of $f$.
(ii) For $p, q, q' \in \mathbb{Q}$ and $k \in \mathbb{N}$ set
$$H(k, p, q, q') = \emptyset \text{ if } ]q, q'[\; \not\subseteq D,$$
$$H(k, p, q, q') = \{x : x \in E \,\cap\, ]q, q'[\,,\ |f(y) - f(x) - p(y - x)| \le 2^{-k} \text{ for every } y \in\, ]q, q'[\,\} \text{ if } ]q, q'[\; \subseteq D.$$
Then $H(k, p, q, q') = E \,\cap\, ]q, q'[\; \cap\, \overline{H(k, p, q, q')}$. P If $x \in E \,\cap\, ]q, q'[\; \cap\, \overline{H(k, p, q, q')}$ there is a sequence $\langle x_n \rangle_{n\in\mathbb{N}}$ in $H(k, p, q, q')$ converging to $x$. Because $f$ is continuous at $x$,
$$|f(y) - f(x) - p(y - x)| = \lim_{n\to\infty} |f(y) - f(x_n) - p(y - x_n)| \le 2^{-k}$$
for every $y \in\, ]q, q'[$, so that $x \in H(k, p, q, q')$. Q Since $E$ is a Borel set, by (a), so is $H(k, p, q, q')$.
(iii) Now
$$F = \bigcap_{k\in\mathbb{N}} \bigcup_{p,q,q'\in\mathbb{Q}} H(k, p, q, q').$$
P (α) Suppose $x \in F$, that is, $f'(x)$ is defined; say $f'(x) = a$. Take any $k \in \mathbb{N}$. Then there are $p \in \mathbb{Q}$, $\delta \in\, ]0, 1]$ such that $|p - a| \le 2^{-k-1}$ and $[x - \delta, x + \delta] \subseteq D$ and $\left|\frac{f(x+h)-f(x)}{h} - a\right| \le 2^{-k-1}$ whenever $0 < |h| \le \delta$; now take $q \in \mathbb{Q} \cap [x - \delta, x[$, $q' \in \mathbb{Q} \,\cap\, ]x, x + \delta]$ and see that $x \in H(k, p, q, q')$. As $x$ is arbitrary, $F \subseteq \bigcap_{k\in\mathbb{N}} \bigcup_{p,q,q'\in\mathbb{Q}} H(k, p, q, q')$.
(β) If $x \in \bigcap_{k\in\mathbb{N}} \bigcup_{p,q,q'\in\mathbb{Q}} H(k, p, q, q')$, then for each $k \in \mathbb{N}$ choose $p_k, q_k, q_k' \in \mathbb{Q}$ such that $x \in H(k, p_k, q_k, q_k')$. If $h \ne 0$, $x + h \in\, ]q_k, q_k'[$ then $\left|\frac{f(x+h)-f(x)}{h} - p_k\right| \le 2^{-k}$. But this means, first, that $|p_k - p_l| \le 2^{-k} + 2^{-l}$ for every $k$, $l$ (since surely there is some $h \ne 0$ such that $x + h \in\, ]q_k, q_k'[\; \cap\, ]q_l, q_l'[$), so that $\langle p_k \rangle_{k\in\mathbb{N}}$ is a Cauchy sequence, with limit $a$ say; and, second, that $\left|\frac{f(x+h)-f(x)}{h} - a\right| \le 2^{-k} + |a - p_k|$ whenever $h \ne 0$ and $x + h \in\, ]q_k, q_k'[$, so that $f'(x)$ is defined and equal to $a$. Q
(iv) Because $\mathbb{Q}$ is countable, all the unions $\bigcup_{p,q,q'\in\mathbb{Q}} H(k, p, q, q')$ are Borel sets, so $F$ also is.
(v) Now enumerate $\mathbb{Q}^3$ as $\langle (p_i, q_i, q_i') \rangle_{i\in\mathbb{N}}$, and set $H'_{ki} = H(k, p_i, q_i, q_i') \setminus \bigcup_{j<i} H(k, p_j, q_j, q_j')$ for each $k, i \in \mathbb{N}$. Every $H'_{ki}$ is Borel measurable, $\langle H'_{ki} \rangle_{i\in\mathbb{N}}$ is disjoint, and
$$\bigcup_{i\in\mathbb{N}} H'_{ki} = \bigcup_{i\in\mathbb{N}} H(k, p_i, q_i, q_i') \supseteq F$$
for each $k$. Note that $|f'(x) - p| \le 2^{-k}$ whenever $x \in F \cap H(k, p, q, q')$, so if we set $f_k(x) = p_i$ for every $x \in H'_{ki}$ we shall have a Borel function $f_k$ such that $|f'(x) - f_k(x)| \le 2^{-k}$ for every $x \in F$. Accordingly $f' = \lim_{k\to\infty} f_k{\restriction}F$ is Borel measurable.

225K Proposition Let [a, b] be a non-empty closed interval in R, and f : [a, b] → R a function. Set F = {x :
x ∈ ]a, b[ , f ′ (x) is defined}. Then f is absolutely continuous iff (i) f is continuous (ii) f ′ is integrable over F (iii)
f [ [a, b] \ F ] is negligible.
proof (a) Suppose first that f is absolutely continuous. Then f is surely continuous (225Ca) and f ′ is integrable over
[a, b], therefore over F (225E); also [a, b] \ F is negligible, so f [ [a, b] \ F ] is negligible, by 225G.
(b) So now suppose that $f$ satisfies the conditions. Set $f^*(x) = |f'(x)|$ for $x \in F$, $0$ for $x \in [a, b] \setminus F$. Then
$$f(b) \le f(a) + \int_a^b f^*.$$
P (i) Because $F$ is a Borel set and $f'$ is a Borel measurable function (225J), $f^*$ is measurable. Let $\epsilon > 0$. Let $G$ be an open subset of $\mathbb{R}$ such that $f[\,[a, b] \setminus F\,] \subseteq G$ and $\mu G \le \epsilon$ (134Fa). Let $g : \mathbb{R} \to [0, \infty]$ be a lower semi-continuous function such that $f^*(x) \le g(x)$ for every $x \in [a, b]$ and $\int_a^b g \le \int_a^b f^* + \epsilon$ (225I). Consider
$$A = \{x : a \le x \le b,\ \mu([f(a), f(x)] \setminus G) \le 2\epsilon(x - a) + \int_a^x g\},$$
interpreting $[f(a), f(x)]$ as $\emptyset$ if $f(x) < f(a)$. Then $a \in A \subseteq [a, b]$, so $c = \sup A$ is defined and belongs to $[a, b]$. Because $f$ is continuous, the function $x \mapsto \mu([f(a), f(x)] \setminus G)$ is continuous; also $x \mapsto 2\epsilon(x - a) + \int_a^x g$ is certainly continuous, so $c \in A$.
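(ii) ?? If $c \in F$, so that $f^*(c) = |f'(c)|$, then there is a $\delta > 0$ such that
$$a \le c - \delta \le c + \delta \le b,$$
$$g(x) \ge g(c) - \epsilon \ge |f'(c)| - \epsilon \text{ whenever } |x - c| \le \delta,$$
$$\left| \frac{f(x) - f(c)}{x - c} - f'(c) \right| \le \epsilon \text{ whenever } |x - c| \le \delta.$$
Consider $x = c + \delta$. Then $c < x \le b$ and
$$\begin{aligned}
\mu([f(a), f(x)] \setminus G) &\le \mu([f(a), f(c)] \setminus G) + |f(x) - f(c)| \\
&\le 2\epsilon(c - a) + \int_a^c g + \epsilon(x - c) + |f'(c)|(x - c) \\
&\le 2\epsilon(c - a) + \int_a^c g + \epsilon(x - c) + \int_c^x (g + \epsilon) \\
&= 2\epsilon(x - a) + \int_a^x g
\end{aligned}$$
(the third line because $g(t) \ge |f'(c)| - \epsilon$ whenever $c \le t \le x$), so $x \in A$; but $c$ is supposed to be an upper bound of $A$. X
Thus $c \in [a, b] \setminus F$.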
(iii) ?? Now suppose, if possible, that $c < b$. We know that $f(c) \in G$, so there is an $\eta > 0$ such that $[f(c) - \eta, f(c) + \eta] \subseteq G$; now there is a $\delta > 0$ such that $|f(x) - f(c)| \le \eta$ whenever $x \in [a, b]$ and $|x - c| \le \delta$. Set $x = \min(c + \delta, b)$; then $c < x \le b$ and $[f(c), f(x)] \subseteq G$, so
$$\mu([f(a), f(x)] \setminus G) = \mu([f(a), f(c)] \setminus G) \le 2\epsilon(c - a) + \int_a^c g \le 2\epsilon(x - a) + \int_a^x g$$
and once again $x \in A$, even though $x > \sup A$. X
(iv) We conclude that $c = b$, so that $b \in A$. But this means that
$$\begin{aligned}
f(b) - f(a) &\le \mu([f(a), f(b)]) \le \mu([f(a), f(b)] \setminus G) + \mu G \\
&\le 2\epsilon(b - a) + \int_a^b g + \epsilon \le 2\epsilon(b - a) + \int_a^b f^* + \epsilon + \epsilon \\
&= 2\epsilon(1 + b - a) + \int_a^b f^*.
\end{aligned}$$
As $\epsilon$ is arbitrary, $f(b) - f(a) \le \int_a^b f^*$, as claimed. Q
(c) Similarly, or applying (b) to $-f$, $f(a) - f(b) \le \int_a^b f^*$, so that $|f(b) - f(a)| \le \int_a^b f^*$.

Of course the argument applies equally to any subinterval of $[a, b]$, so $|f(d) - f(c)| \le \int_c^d f^*$ whenever $a \le c \le d \le b$. Now let $\epsilon > 0$. By 225A once more, there is a $\delta > 0$ such that $\int_E f^* \le \epsilon$ whenever $E \subseteq [a, b]$ and $\mu E \le \delta$. Suppose that $a \le a_1 \le b_1 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n b_i - a_i \le \delta$. Then
$$\sum_{i=1}^n |f(b_i) - f(a_i)| \le \sum_{i=1}^n \int_{a_i}^{b_i} f^* = \int_{\bigcup_{i\le n} [a_i, b_i]} f^* \le \epsilon.$$

So $f$ is absolutely continuous, as claimed.

225L Corollary Let $[a, b]$ be a non-empty closed interval in $\mathbb{R}$. Let $f : [a, b] \to \mathbb{R}$ be a continuous function which is differentiable on the open interval $]a, b[$. If its derivative $f'$ is integrable over $[a, b]$, then $f$ is absolutely continuous, and $f(b) - f(a) = \int_a^b f'$.
proof $f[\,[a, b] \setminus F\,] = \{f(a), f(b)\}$ is surely negligible, so $f$ is absolutely continuous, by 225K; consequently $f(b) - f(a) = \int_a^b f'$, by 225E.

225M Corollary Let [a, b] be a non-empty closed interval in R, and f : [a, b] → R a continuous function. Then f is
absolutely continuous iff it is continuous and of bounded variation and f [A] is negligible for every negligible A ⊆ [a, b].
proof (a) Suppose that f is absolutely continuous. By 225C(a-b) it is continuous and of bounded variation, and by
225G we have f [A] negligible for every negligible A ⊆ [a, b].
(b) So now suppose that f satisfies the conditions. Set F = {x : x ∈ ]a, b[ , f ′ (x) is defined}. By 224I, [a, b] \ F is
negligible, so f [ [a, b] \ F ] is negligible. Moreover, also by 224I, f ′ is integrable over [a, b] or F . So the conditions of
225K are satisfied and f is absolutely continuous.

225N The Cantor function I should mention the standard example of a continuous function of bounded variation
which is not absolutely continuous. Let C ⊆ [0, 1] be the Cantor set (134G). Recall that the ‘Cantor function’ is a
non-decreasing continuous function f : [0, 1] → [0, 1] such that f ′ (x) is defined and equal to zero for every x ∈ [0, 1] \ C,
but f (0) = 0 < 1 = f (1) (134H). Of course f is of bounded variation and not absolutely continuous. C is negligible
and $f[C] = [0, 1]$ is not. If $x \in C$, then for every $n \in \mathbb{N}$ there is an interval of length $3^{-n}$, containing $x$, on which $f$ increases by $2^{-n}$; so $f$ cannot be differentiable at $x$, and the set $F = \operatorname{dom} f'$ of 225K is precisely $[0, 1] \setminus C$, so that $f[\,[0, 1] \setminus F\,] = [0, 1]$.
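
The failure of absolute continuity can be seen numerically. The following is a minimal sketch, not part of the original text: the helper names `cantor` and `stage_intervals` are hypothetical, and the Cantor function is evaluated through ternary digits, which is one standard description of it. At stage n of the construction the 2^n remaining intervals have total length (2/3)^n, yet the function still rises by a total of 1 across them.

```python
from fractions import Fraction

def cantor(x, depth=40):
    """Cantor function at x in [0, 1], read off from the ternary digits of x."""
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:          # x is in a removed middle third, where f is constant
            return value + scale
        value += scale * (digit // 2)
        scale /= 2
    return value

def stage_intervals(n):
    """The 2**n closed intervals of length 3**-n left at stage n."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = [piece
                     for (a, b) in intervals
                     for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return intervals

n = 6
pieces = stage_intervals(n)
total_length = sum(b - a for a, b in pieces)                  # equals (2/3)**n
total_rise = sum(cantor(b) - cantor(a) for a, b in pieces)    # stays equal to 1
```

Since `total_length` can be made as small as desired by increasing `n` while `total_rise` does not shrink, no δ can serve for ε < 1 in the definition of absolute continuity.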

225O Complex-valued functions As usual, I spell out the results above in the forms applicable to complex-valued
functions.
(a) Let $(X, \Sigma, \mu)$ be any measure space and $f$ an integrable complex-valued function defined on a conegligible subset of $X$. Then for any $\epsilon > 0$ there are a measurable set $E$ of finite measure and a real number $\delta > 0$ such that $\int_F |f| \le \epsilon$ whenever $F \in \Sigma$ and $\mu(F \cap E) \le \delta$. (Apply 225A to $|f|$.)

(b) If $[a, b]$ is a non-empty closed interval in $\mathbb{R}$ and $f : [a, b] \to \mathbb{C}$ is a function, we say that $f$ is absolutely continuous if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n b_i - a_i \le \delta$. Observe that $f$ is absolutely continuous iff its real and imaginary parts are both absolutely continuous.

(c) Let [a, b] be a non-empty closed interval in R.


(i) If f : [a, b] → C is absolutely continuous it is of bounded variation on [a, b], so is differentiable almost everywhere
in [a, b], and its derivative is integrable over [a, b].
(ii) If f , g : [a, b] → C are absolutely continuous, so are f + g and ζf , for any ζ ∈ C, and f × g.
(iii) If g : [a, b] → [c, d] is monotonic and absolutely continuous, and f : [c, d] → C is absolutely continuous, then
f g : [a, b] → C is absolutely continuous.

(d) Let $[a, b]$ be a non-empty closed interval in $\mathbb{R}$ and $F : [a, b] \to \mathbb{C}$ a function. Then the following are equiveridical:
(i) there is an integrable complex-valued function $f$ such that $F(x) = F(a) + \int_a^x f$ for every $x \in [a, b]$;
(ii) $\int_a^x F'$ exists and is equal to $F(x) - F(a)$ for every $x \in [a, b]$;
(iii) $F$ is absolutely continuous.
(Apply 225E to the real and imaginary parts of $F$.)

(e) Let $f$ be an integrable complex-valued function on an interval $[a, b] \subseteq \mathbb{R}$, and $g : [a, b] \to \mathbb{C}$ an absolutely continuous function. Set $F(x) = \int_a^x f$ for $x \in [a, b]$. Then
$$\int_a^b f \times g = F(b)g(b) - F(a)g(a) - \int_a^b F \times g'.$$
(Apply 225F to the real and imaginary parts of $f$ and $g$.)

(f ) Let f be a continuous complex-valued function on a closed interval [a, b] ⊆ R, and suppose that f is differentiable
at every point of the open interval ]a, b[, with f ′ integrable over [a, b]. Then f is absolutely continuous. (Apply 225L
to the real and imaginary parts of f .)

(g) For a result corresponding to 225M, see 264Yp.

225X Basic exercises (a) Show directly from the definition in 225B (without appealing to 225E) that any ab-
solutely continuous real-valued function on a closed interval [a, b] is expressible as the difference of non-decreasing
absolutely continuous functions.

(b) Let f : [a, b] → R be an absolutely continuous function, where a ≤ b. (i) Show that |f | : [a, b] → R is absolutely
continuous. (ii) Show that gf is absolutely continuous whenever g : R → R is a differentiable function with bounded
derivative.

(c) Show directly from the definition in 225B and the Mean Value Theorem (without appealing to 225K) that if a function $f$ is continuous on a closed interval $[a, b]$, differentiable on the open interval $]a, b[$, and has bounded derivative in $]a, b[$, then $f$ is absolutely continuous, so that $f(x) = f(a) + \int_a^x f'$ for every $x \in [a, b]$.

(d) Show that if $f : [a, b] \to \mathbb{R}$ is absolutely continuous, then $\operatorname{Var} f = \int_a^b |f'|$. (Hint: put 224I and 225E together.)

(e) Let $f : [0, \infty[\; \to \mathbb{C}$ be a function which is absolutely continuous on $[0, a]$ for every $a \in [0, \infty[$ and has Laplace transform $F(s) = \int_0^\infty e^{-sx} f(x)\,dx$ defined on $\{s : \operatorname{Re} s > S\}$. Suppose also that $\lim_{x\to\infty} e^{-Sx} f(x) = 0$. Show that $f'$ has Laplace transform $sF(s) - f(0)$ defined whenever $\operatorname{Re} s > S$. (Hint: show that
$$f(x)e^{-sx} - f(0) = \int_0^x \frac{d}{dt}\bigl(f(t)e^{-st}\bigr)\,dt$$
for every $x \ge 0$.)

(f) Let $g : \mathbb{R} \to \mathbb{R}$ be a non-decreasing function which is absolutely continuous on every bounded interval; let $\mu_g$ be the associated Lebesgue-Stieltjes measure (114Xa), and $\Sigma_g$ its domain. Show that $\int_E g' = \mu_g E$ for any $E \in \Sigma_g$, if we allow $\infty$ as a value of the integral. (Hint: start with intervals $E$.)

(g) Let $g : [a, b] \to \mathbb{R}$ be a non-decreasing absolutely continuous function, and $f : [g(a), g(b)] \to \mathbb{R}$ a continuous function. Show that $\int_{g(a)}^{g(b)} f(t)\,dt = \int_a^b f(g(t))g'(t)\,dt$. (Hint: set $F(x) = \int_{g(a)}^x f$, $G = Fg$ and consider $\int_a^b G'(t)\,dt$. See also 263I.)

(h) Suppose that $I \subseteq \mathbb{R}$ is any non-trivial interval (bounded or unbounded, open, closed or half-open, but not empty or a singleton), and $f : I \to \mathbb{R}$ a function. Show that $f$ is absolutely continuous on every closed bounded subinterval of $I$ iff there is a function $g$ such that $\int_a^b g = f(b) - f(a)$ whenever $a \le b$ in $I$, and in this case $g$ is integrable iff $f$ is of bounded variation on $I$.
(i) Show that $\int_0^1 \frac{\ln x}{x-1}\,dx = \sum_{n=1}^\infty \frac{1}{n^2}$. (Hint: use 225F to find $\int_0^1 x^n \ln x\,dx$, and recall that $\frac{1}{1-x} = \sum_{n=0}^\infty x^n$ for $0 \le x < 1$.)
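
Before attempting the proof, the identity in (i) can be checked numerically. This is only a sanity check under assumptions of my own choosing (a midpoint rule with an arbitrary number of steps, and a truncated series); the helper name `midpoint_integral` is hypothetical and nothing here replaces the argument asked for.

```python
import math

def midpoint_integral(f, a, b, steps):
    """Midpoint rule; it never evaluates f at the endpoints a and b."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def integrand(x):
    # ln(x)/(x - 1): logarithmic singularity at 0, removable singularity at 1
    return math.log(x) / (x - 1.0)

approx_integral = midpoint_integral(integrand, 0.0, 1.0, 200_000)
partial_sum = sum(1.0 / n ** 2 for n in range(1, 100_001))
gap = abs(approx_integral - partial_sum)   # expected to be small
```

The midpoint rule is chosen because the integrand is undefined at both endpoints; the two approximations should agree to several decimal places.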
(j)(i) Show that $\int_0^1 t^a\,dt$ is finite for every $a > -1$. (ii) Show that $\int_1^\infty t^a e^{-t}\,dt$ is finite for every $a \in \mathbb{R}$. (Hint: show that there is an $M$ such that $t^a \le M e^{t/2}$ for $t \ge M$.) (iii) Show that $\Gamma(a) = \int_0^\infty t^{a-1} e^{-t}\,dt$ is defined for every $a > 0$. (iv) Show that $\Gamma(a + 1) = a\Gamma(a)$ for every $a > 0$. (v) Show that $\Gamma(n + 1) = n!$ for every $n \in \mathbb{N}$.
($\Gamma$ is of course the gamma function.)

(k) Show that if $b > 0$ then $\int_0^\infty u^{b-1} e^{-u^2/2}\,du = 2^{(b-2)/2}\,\Gamma(\tfrac{b}{2})$. (Hint: consider $f(t) = t^{(b-2)/2} e^{-t}$, $g(u) = u^2/2$ in 225Xg.)

(l) Suppose that $f$, $g$ are lower semi-continuous functions, defined on subsets of $\mathbb{R}^r$, and taking values in $]-\infty, \infty]$. (i) Show that $f + g$, $f \wedge g$ and $f \vee g$ are lower semi-continuous, and that $\alpha f$ is lower semi-continuous for every $\alpha \ge 0$. (ii) Show that if $f$ and $g$ are non-negative, then $f \times g$ is lower semi-continuous. (iii) Show that if $f$ is non-negative and $g$ is continuous, then $f \times g$ is lower semi-continuous. (iv) Show that if $f$ is non-decreasing then the composition $fg$ is lower semi-continuous.

(m) Let $A$ be a non-empty family of lower semi-continuous functions defined on subsets of $\mathbb{R}^r$ and taking values in $[-\infty, \infty]$. Set $g(x) = \sup\{f(x) : f \in A,\ x \in \operatorname{dom} f\}$ for $x \in D = \bigcup_{f\in A} \operatorname{dom} f$. Show that $g$ is lower semi-continuous.

(n) Suppose that f : [a, b] → R is continuous, and differentiable at all but countably many points of [a, b]. Show
that f is absolutely continuous iff it is of bounded variation.

(o) Show that if f : [a, b] → R is absolutely continuous, then f [E] is Lebesgue measurable for every Lebesgue
measurable set E ⊆ [a, b].

225Y Further exercises (a) Show that the composition of two absolutely continuous functions need not be
absolutely continuous. (Hint: 224Xb.)

(b) Let f : [a, b] → R be a continuous function, where a < b. Set G = {x : x ∈ ]a, b[ , ∃ y ∈ ]x, b] such that
f (x) < f (y)}. Show that G is open and is expressible as a disjoint union of intervals ]c, d[ where f (c) ≤ f (d). Use this
to prove 225D without calling on Vitali’s theorem.

(c) Let $f : [a, b] \to \mathbb{R}$ be a function of bounded variation and $\gamma > 0$. Show that there is an absolutely continuous function $g : [a, b] \to \mathbb{R}$ such that $|g'(x)| \le \gamma$ wherever the derivative is defined and $\{x : x \in [a, b],\ f(x) \ne g(x)\}$ has measure at most $\gamma^{-1} \operatorname{Var} f$. (Hint: reduce to the case of non-decreasing $f$. Apply 225Yb to the function $x \mapsto f(x) - \gamma x$ and show that $\gamma \mu G \le \operatorname{Var}_{[a,b]}(f)$. Set $g(x) = f(x)$ for $x \in\, ]a, b[\; \setminus G$.)

(d) Let $f$ be a non-negative measurable real-valued function defined on a subset $D$ of $\mathbb{R}^r$, where $r \ge 1$. Show that for any $\epsilon > 0$ there is a lower semi-continuous function $g : \mathbb{R}^r \to [-\infty, \infty]$ such that $g(x) \ge f(x)$ for every $x \in D$ and $\int_D g - f \le \epsilon$.

(e) Let $f$ be a measurable real-valued function defined on a subset $D$ of $\mathbb{R}^r$, where $r \ge 1$. Show that for any $\epsilon > 0$ there is a lower semi-continuous function $g : \mathbb{R}^r \to [-\infty, \infty]$ such that $g(x) \ge f(x)$ for every $x \in D$ and $\mu^*\{x : x \in D,\ g(x) > f(x)\} \le \epsilon$. (Hint: 134Yd, 134Fb.)

(f )(i) Show that if f is a Lebesgue measurable real function then all its Dini derivates are Lebesgue measurable. (ii)
Show that if f is a Borel measurable real function then all its Dini derivates are Borel measurable.

225 Notes and comments There is a good deal more to say about absolutely continuous functions; I will return to
the topic in the next section and in Chapter 26. I shall not make direct use of any of the results from 225H on, but it
seems to me that this kind of investigation is necessary for any clear picture of the relationships between such concepts
as absolute continuity and bounded variation. Of course, in order to apply these results, we do need a store of simple
kinds of absolutely continuous function, differentiable functions with bounded derivative forming the most important
class (225Xc). A larger family of the same kind is the class of ‘Lipschitz’ functions (262Bc).
The definition of ‘absolutely continuous function’ is ordinarily set out for closed bounded intervals, as in 225B. The
point is that for other intervals the simplest generalizations of this formulation do not seem quite appropriate. In
225Xh I try to suggest the kind of demands one might make on functions defined on other types of interval.
I should remark that the real prize is still not quite within our grasp. I have been able to give a reasonably
satisfactory formulation of simple integration by parts (225F), at least for bounded intervals – a further limiting
process is necessary to deal with unbounded intervals. But a companion method from advanced calculus, integration
by substitution, remains elusive. The best I think we can do at this point is 225Xg, which insists on a continuous
integrand f . It is the case that the result is valid for general integrable f , but there are some further subtleties to
be mastered on the way; the necessary ideas are given in the much more general results 235A and 263D below, and
applied to the one-dimensional case in 263I.
On the way to the characterization of absolutely continuous functions in 225K, I find myself calling on one of
the fundamental relationships between Lebesgue measure and the topology of $\mathbb{R}^r$ (225I). The technique here can be
adapted to give many variations of the result; see 225Yd-225Ye. If you have not seen semi-continuous functions before,
225Xl-225Xm give a partial idea of their properties. In 225J I give a fragment of ‘descriptive set theory’, the study
of the kinds of set which can arise from the formulae of analysis. These ideas too will re-surface elsewhere; compare
225Yf and also the proof of 262M below.

226 The Lebesgue decomposition of a function of bounded variation


I end this chapter with some notes on a method of analysing a general function of bounded variation which may help
to give a picture of what such functions can be, though it is not directly necessary for anything of great importance
dealt with in this volume.

226A Sums over arbitrary index sets To get a full picture of this fragment of real analysis, a bit of preparation
will be helpful. This concerns the notion of a sum over an arbitrary index set, which I have rather been skirting around
so far.

(a) If $I$ is any set and $\langle a_i \rangle_{i\in I}$ any family in $[0, \infty]$, we set
$$\sum_{i\in I} a_i = \sup\Bigl\{\sum_{i\in K} a_i : K \text{ is a finite subset of } I\Bigr\},$$
with the convention that $\sum_{i\in\emptyset} a_i = 0$. (See 112Bd, 222Ba.) For general $a_i \in [-\infty, \infty]$, we can set
$$\sum_{i\in I} a_i = \sum_{i\in I} a_i^+ - \sum_{i\in I} a_i^-$$
if this is defined in $[-\infty, \infty]$, that is, at least one of $\sum_{i\in I} a_i^+$, $\sum_{i\in I} a_i^-$ is finite, where $a^+ = \max(a, 0)$ and $a^- = \max(-a, 0)$ for each $a$. If $\sum_{i\in I} a_i$ is defined and finite, we say that $\langle a_i \rangle_{i\in I}$ is summable.
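
For a finite index set the definitions in (a) can be transcribed almost literally. The sketch below is illustrative only: the function names are hypothetical, families are represented as Python dictionaries, and `None` is my own convention for "undefined". The first function reads the supremum over finite subsets directly; the second splits a general family into positive and negative parts.

```python
import math
from itertools import combinations

def sup_over_finite_subsets(family):
    """Literal reading of the definition, for a small non-negative family {index: value}."""
    keys = list(family)
    best = 0.0                                   # the empty subset has sum 0
    for size in range(1, len(keys) + 1):
        for subset in combinations(keys, size):
            best = max(best, sum(family[i] for i in subset))
    return best

def unordered_sum(family):
    """Sum of a finite family with values in [-inf, inf], via positive and negative parts.

    Returns None when both parts are infinite, the case in which the
    difference of the two parts is not defined.
    """
    positive = sum(max(a, 0.0) for a in family.values())
    negative = sum(max(-a, 0.0) for a in family.values())
    if math.isinf(positive) and math.isinf(negative):
        return None
    return positive - negative
```

For example, `unordered_sum({"x": math.inf, "y": -5.0})` is `inf`, while `unordered_sum({"x": math.inf, "y": -math.inf})` is `None`. For an infinite index set no such direct computation is available, and the supremum has to be approached through larger and larger finite subsets.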

(b) Since this is a book on measure theory, I will immediately describe the relationship between this kind of summability and an appropriate notion of integration. For any set $I$, we have the corresponding ‘counting measure’ $\mu$ on $I$ (112Bd). Every subset of $I$ is measurable, so every family $\langle a_i \rangle_{i\in I}$ of real numbers is a measurable real-valued function on $I$. A subset of $I$ has finite measure iff it is finite; so a real-valued function $f$ on $I$ is ‘simple’ if $K = \{i : f(i) \ne 0\}$ is finite. In this case,
$$\int f\,d\mu = \sum_{i\in K} f(i) = \sum_{i\in I} f(i)$$
as defined in part (a). The measure $\mu$ is semi-finite (211Nc) so a non-negative function $f$ is integrable iff $\int f = \sup_{\mu K<\infty} \int_K f$ is finite (213B); but of course this supremum is precisely
$$\sup\Bigl\{\sum_{i\in K} f(i) : K \subseteq I \text{ is finite}\Bigr\} = \sum_{i\in I} f(i).$$
Now a general function $f : I \to \mathbb{R}$ is integrable iff it is measurable and $\int |f|\,d\mu < \infty$, that is, iff $\sum_{i\in I} |f(i)| < \infty$, and in this case
$$\int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu = \sum_{i\in I} f(i)^+ - \sum_{i\in I} f(i)^- = \sum_{i\in I} f(i),$$
writing $f^{\pm}(i) = f(i)^{\pm}$ for each $i$. Thus we have
$$\sum_{i\in I} a_i = \int_I a_i\,\mu(di),$$
and the standard rules under which we allow $\infty$ as the value of an integral (133A, 135F) match well with the interpretations in (a) above.

(c) Accordingly, and unsurprisingly, the operation of summation is a linear operation on the linear space of summable families of real numbers.
I observe here that this notion of summability is ‘absolute’; a family $\langle a_i \rangle_{i\in I}$ is summable iff it is absolutely summable. This is necessary because it must also be ‘unconditional’; we have no structure on an arbitrary set $I$ to guide us to take the sum in any particular order. See 226Xf. In particular, I distinguish between ‘$\sum_{n\in\mathbb{N}} a_n$’, which in this book will always be interpreted as in 226A above, and ‘$\sum_{n=0}^\infty a_n$’, which (if it makes a difference) should be interpreted as $\lim_{m\to\infty} \sum_{n=0}^m a_n$. So, for instance, $\sum_{n=0}^\infty \frac{(-1)^n}{n+1} = \ln 2$, while $\sum_{n\in\mathbb{N}} \frac{(-1)^n}{n+1}$ is undefined. Of course $\sum_{n=0}^\infty a_n = \sum_{n\in\mathbb{N}} a_n$ whenever the latter is defined in $[-\infty, \infty]$.
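
The distinction between the ordered series and the unordered sum can be illustrated numerically for the alternating harmonic series. This is a rough sketch with hypothetical function names and an arbitrary cut-off: the ordered partial sums settle near ln 2, while the sums of the positive parts and of the negative parts, taken separately, both keep growing, which is why the unordered sum is undefined.

```python
import math

def ordered_partial_sum(m):
    """Sum of (-1)**n / (n + 1) for n = 0, ..., m, taken in the natural order."""
    return sum((-1) ** n / (n + 1) for n in range(m + 1))

def positive_part_sum(m):
    """Sum of the positive parts of the same terms (the even-indexed terms)."""
    return sum(1.0 / (n + 1) for n in range(0, m + 1, 2))

def negative_part_sum(m):
    """Sum of the negative parts of the same terms (the odd-indexed terms)."""
    return sum(1.0 / (n + 1) for n in range(1, m + 1, 2))

m = 100_000
ordered = ordered_partial_sum(m)        # close to ln 2
positive = positive_part_sum(m)         # grows like (1/2) ln m, without bound
negative = negative_part_sum(m)         # likewise unbounded
```

Raising `m` moves `ordered` closer to `math.log(2)` while `positive` and `negative` both continue to increase.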

(d) There is another, and very important, approach to the sum described here. If $\langle a_i \rangle_{i\in I}$ is an (absolutely) summable family of real numbers, then for every $\epsilon > 0$ there is a finite $K \subseteq I$ such that $\sum_{i\in I\setminus K} |a_i| \le \epsilon$. P This is nothing but a special case of 225A; there is a set $K$ with $\mu K < \infty$ such that $\int_{I\setminus K} |a_i|\,\mu(di) \le \epsilon$, but
$$\int_{I\setminus K} |a_i|\,\mu(di) = \sum_{i\in I\setminus K} |a_i|. \text{ Q}$$

(Of course there are ‘direct’ proofs of this result from the definition in (a), not mentioning measures or integrals. But I think you will see that they rely on the same idea as that in the proof of 225A.) Consequently, for any family $\langle a_i \rangle_{i\in I}$ of real numbers and any $s \in \mathbb{R}$, the following are equiveridical:
(i) $\sum_{i\in I} a_i = s$;
(ii) for every $\epsilon > 0$ there is a finite $K \subseteq I$ such that $|s - \sum_{i\in J} a_i| \le \epsilon$ whenever $J$ is finite and $K \subseteq J \subseteq I$.
P (i)⇒(ii) Take $K$ such that $\sum_{i\in I\setminus K} |a_i| \le \epsilon$. If $K \subseteq J \subseteq I$, then
$$\Bigl|s - \sum_{i\in J} a_i\Bigr| = \Bigl|\sum_{i\in I\setminus J} a_i\Bigr| \le \sum_{i\in I\setminus K} |a_i| \le \epsilon.$$
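(ii)⇒(i) Let $\epsilon > 0$, and let $K \subseteq I$ be as described in (ii). If $J \subseteq I \setminus K$ is any finite set, then set $J_1 = \{i : i \in J,\ a_i \ge 0\}$, $J_2 = J \setminus J_1$. We have
$$\sum_{i\in J} |a_i| = \Bigl|\sum_{i\in J_1\cup K} a_i - \sum_{i\in J_2\cup K} a_i\Bigr| \le \Bigl|s - \sum_{i\in J_1\cup K} a_i\Bigr| + \Bigl|s - \sum_{i\in J_2\cup K} a_i\Bigr| \le 2\epsilon.$$
As $J$ is arbitrary, $\sum_{i\in I\setminus K} |a_i| \le 2\epsilon$ and
$$\sum_{i\in I} |a_i| \le \sum_{i\in K} |a_i| + 2\epsilon < \infty.$$
Accordingly $\sum_{i\in I} a_i$ is well-defined in $\mathbb{R}$. Also
$$\Bigl|s - \sum_{i\in I} a_i\Bigr| \le \Bigl|s - \sum_{i\in K} a_i\Bigr| + \Bigl|\sum_{i\in I\setminus K} a_i\Bigr| \le \epsilon + \sum_{i\in I\setminus K} |a_i| \le 3\epsilon.$$
As $\epsilon$ is arbitrary, $\sum_{i\in I} a_i = s$, as required. Q
In this way, we express $\sum_{i\in I} a_i$ directly as a limit; we could write it as
$$\sum_{i\in I} a_i = \lim_{K\uparrow I} \sum_{i\in K} a_i,$$
on the understanding that we look at finite sets $K$ in the right-hand formula.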


(e) Yet another approach is through the following fact. If $\sum_{i\in I} |a_i| < \infty$, then for any $\delta > 0$ the set $\{i : |a_i| \ge \delta\}$ is finite, indeed can have at most $\frac{1}{\delta} \sum_{i\in I} |a_i|$ members; consequently
$$J = \{i : a_i \ne 0\} = \bigcup_{n\in\mathbb{N}} \{i : |a_i| \ge 2^{-n}\}$$
is countable (1A1F). If $J$ is finite, then of course $\sum_{i\in I} a_i = \sum_{i\in J} a_i$ reduces to a finite sum. Otherwise, we can enumerate $J$ as $\langle j_n \rangle_{n\in\mathbb{N}}$, and we shall have
$$\sum_{i\in I} a_i = \sum_{i\in J} a_i = \lim_{n\to\infty} \sum_{k=0}^n a_{j_k} = \sum_{n=0}^\infty a_{j_n}$$
(using (d) to reduce the sum $\sum_{i\in J} a_i$ to a limit of finite sums). Conversely, if $\langle a_i \rangle_{i\in I}$ is such that there is a countably infinite $J \supseteq \{i : a_i \ne 0\}$ enumerated as $\langle j_n \rangle_{n\in\mathbb{N}}$, and if $\sum_{n=0}^\infty |a_{j_n}| < \infty$, then $\sum_{i\in I} a_i$ will be $\sum_{n=0}^\infty a_{j_n}$.
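
The counting bound at the start of (e) is easy to check on an example. The sketch below assumes a particular family of my own choosing, a_n = (−1)^n/(n+1)^2 truncated to finitely many indices, and a hypothetical helper `large_terms`; it compares the number of indices with |a_i| ≥ δ against the bound (1/δ)Σ|a_i|.

```python
def large_terms(family, delta):
    """Indices i with |a_i| >= delta, for a family given as {index: value}."""
    return [i for i, a in family.items() if abs(a) >= delta]

# Illustrative family: a_n = (-1)**n / (n + 1)**2 for n < 10000.
family = {n: (-1) ** n / (n + 1) ** 2 for n in range(10_000)}
total_abs = sum(abs(a) for a in family.values())

# For each delta, pair the actual count with the bound total_abs / delta.
counts_and_bounds = {delta: (len(large_terms(family, delta)), total_abs / delta)
                     for delta in (0.5, 0.1, 0.01)}
```

In each pair the count on the left should not exceed the bound on the right, since every index counted contributes at least δ to the total.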

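(f) It will be useful later to have a fragment of general theory. Let $I$ and $J$ be sets and $\langle a_{ij} \rangle_{i\in I, j\in J}$ a family in $[0, \infty]$. Then
$$\sum_{(i,j)\in I\times J} a_{ij} = \sum_{i\in I} \Bigl(\sum_{j\in J} a_{ij}\Bigr) = \sum_{j\in J} \Bigl(\sum_{i\in I} a_{ij}\Bigr).$$
P (i) If $\sum_{(i,j)\in I\times J} a_{ij} > u$, then there is a finite set $M \subseteq I \times J$ such that $\sum_{(i,j)\in M} a_{ij} > u$. Now $K = \{i : (i, j) \in M\}$ and $L = \{j : (i, j) \in M\}$ are finite, so
$$\sum_{i\in I} \sum_{j\in J} a_{ij} \ge \sum_{i\in K} \sum_{j\in J} a_{ij} \ge \sum_{i\in K} \sum_{j\in L} a_{ij}$$
(because $\sum_{j\in J} a_{ij} \ge \sum_{j\in L} a_{ij}$ for every $i$)
$$= \sum_{(i,j)\in K\times L} a_{ij} \ge \sum_{(i,j)\in M} a_{ij} > u.$$
As $u$ is arbitrary, $\sum_{i\in I} \sum_{j\in J} a_{ij} \ge \sum_{(i,j)\in I\times J} a_{ij}$. (ii) If $\sum_{i\in I} \sum_{j\in J} a_{ij} > u$, there is a finite set $K \subseteq I$ such that $\sum_{i\in K} \sum_{j\in J} a_{ij} > u$. Let $\epsilon \in\, ]0, 1[$ be such that $\sum_{i\in K} \sum_{j\in J} a_{ij} > u + \epsilon$, and set $\delta = \frac{\epsilon}{\#(K)}$. For each $i \in K$ set $\gamma_i = \min(u + 1, \sum_{j\in J} a_{ij}) - \delta$; then
$$\epsilon + \sum_{i\in K} \gamma_i = \sum_{i\in K} \min\Bigl(u + 1, \sum_{j\in J} a_{ij}\Bigr) \ge \min\Bigl(u + 1, \sum_{i\in K} \sum_{j\in J} a_{ij}\Bigr) > u + \epsilon,$$
so $\sum_{i\in K} \gamma_i > u$. For each $i \in K$, $\gamma_i < \sum_{j\in J} a_{ij}$, so there is a finite $L_i \subseteq J$ such that $\sum_{j\in L_i} a_{ij} \ge \gamma_i$. Set $M = \{(i, j) : i \in K,\ j \in L_i\}$, so that $M$ is a finite subset of $I \times J$; then
$$\sum_{(i,j)\in I\times J} a_{ij} \ge \sum_{(i,j)\in M} a_{ij} = \sum_{i\in K} \sum_{j\in L_i} a_{ij} \ge \sum_{i\in K} \gamma_i > u.$$
As $u$ is arbitrary, $\sum_{(i,j)\in I\times J} a_{ij} \ge \sum_{i\in I} \sum_{j\in J} a_{ij}$ and these two sums are equal. (iii) Similarly, $\sum_{(i,j)\in I\times J} a_{ij} = \sum_{j\in J} \sum_{i\in I} a_{ij}$. Q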
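
For a finite non-negative family the three sums in (f) can be computed exactly and compared. This is a minimal sketch with hypothetical names, using exact rational arithmetic so that the equalities are literal rather than approximate; it illustrates the statement on one example and is not a substitute for the proof, which has to handle infinite index sets.

```python
from fractions import Fraction

def sum_over_pairs(a):
    """Sum over the product index set I x J, for a = {(i, j): value}."""
    return sum(a.values(), Fraction(0))

def iterated_sum(a, outer):
    """Iterated sum: outer=0 sums over j for each fixed i first, outer=1 the other way round."""
    inner_totals = {}
    for key, value in a.items():
        inner_totals[key[outer]] = inner_totals.get(key[outer], Fraction(0)) + value
    return sum(inner_totals.values(), Fraction(0))

# Illustrative family a_ij = 2**-(i + j) on a 6 x 8 grid of indices.
a = {(i, j): Fraction(1, 2 ** (i + j)) for i in range(6) for j in range(8)}
joint = sum_over_pairs(a)
rows_first = iterated_sum(a, 0)
columns_first = iterated_sum(a, 1)
```

All three values coincide here, as the result predicts.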

226B Saltus functions Now we are ready for a special type of function of bounded variation on $\mathbb{R}$. Suppose that $a < b$ in $\mathbb{R}$.

(a) A (real) saltus function on $[a, b]$ is a function $F : [a, b] \to \mathbb{R}$ expressible in the form
$$F(x) = \sum_{t\in[a,x[} u_t + \sum_{t\in[a,x]} v_t$$
for $x \in [a, b]$, where $\langle u_t \rangle_{t\in[a,b[}$, $\langle v_t \rangle_{t\in[a,b]}$ are real-valued families such that $\sum_{t\in[a,b[} |u_t|$ and $\sum_{t\in[a,b]} |v_t|$ are finite.

(b) For any function $F : [a, b] \to \mathbb{R}$ we can write
$$F(x^+) = \lim_{y\downarrow x} F(y) \text{ if } x \in [a, b[ \text{ and the limit exists},$$
$$F(x^-) = \lim_{y\uparrow x} F(y) \text{ if } x \in\, ]a, b] \text{ and the limit exists}.$$
(I hope that this will not lead to confusion with the alternative interpretation of $x^+$ as $\max(x, 0)$.) Observe that if $F$ is a saltus function, as defined in (a), with associated families $\langle u_t \rangle_{t\in[a,b[}$ and $\langle v_t \rangle_{t\in[a,b]}$, then $v_a = F(a)$, $v_x = F(x) - F(x^-)$ for $x \in\, ]a, b]$, $u_x = F(x^+) - F(x)$ for $x \in [a, b[$. P Let $\epsilon > 0$. As remarked in 226Ad, there is a finite $K \subseteq [a, b]$ such that
$$\sum_{t\in[a,b[\setminus K} |u_t| + \sum_{t\in[a,b]\setminus K} |v_t| \le \epsilon.$$
Given $x \in [a, b]$, let $\delta > 0$ be such that $[x - \delta, x + \delta]$ contains no point of $K$ except perhaps $x$. In this case, if $\max(a, x - \delta) \le y < x$, we must have
$$|F(y) - (F(x) - v_x)| = \Bigl|\sum_{t\in[y,x[} u_t + \sum_{t\in]y,x[} v_t\Bigr| \le \sum_{t\in[a,b[\setminus K} |u_t| + \sum_{t\in[a,b]\setminus K} |v_t| \le \epsilon,$$
while if $x < y \le \min(b, x + \delta)$ we shall have
$$|F(y) - (F(x) + u_x)| = \Bigl|\sum_{t\in]x,y[} u_t + \sum_{t\in]x,y]} v_t\Bigr| \le \sum_{t\in[a,b[\setminus K} |u_t| + \sum_{t\in[a,b]\setminus K} |v_t| \le \epsilon.$$
As $\epsilon$ is arbitrary, we get $F(x^-) = F(x) - v_x$ (if $x > a$) and $F(x^+) = F(x) + u_x$ (if $x < b$). Q
It follows that $F$ is continuous at $x \in\, ]a, b[$ iff $u_x = v_x = 0$, while $F$ is continuous at $a$ iff $u_a = 0$ and $F$ is continuous at $b$ iff $v_b = 0$. In particular, $\{x : x \in [a, b],\ F \text{ is not continuous at } x\}$ is countable (see 226Ae).
(c) If $F$ is a saltus function defined on $[a, b]$, with associated families $\langle u_t \rangle_{t\in[a,b[}$, $\langle v_t \rangle_{t\in[a,b]}$, then $F$ is of bounded variation on $[a, b]$, and
$$\operatorname{Var}_{[a,b]}(F) \le \sum_{t\in[a,b[} |u_t| + \sum_{t\in]a,b]} |v_t|.$$
P If $a \le x < y \le b$, then
$$F(y) - F(x) = u_x + \sum_{t\in]x,y[} (u_t + v_t) + v_y,$$
so
$$|F(y) - F(x)| \le \sum_{t\in[x,y[} |u_t| + \sum_{t\in]x,y]} |v_t|.$$
If $a \le a_0 \le a_1 \le \ldots \le a_n \le b$, then
$$\sum_{i=1}^n |F(a_i) - F(a_{i-1})| \le \sum_{i=1}^n \Bigl(\sum_{t\in[a_{i-1},a_i[} |u_t| + \sum_{t\in]a_{i-1},a_i]} |v_t|\Bigr) \le \sum_{t\in[a,b[} |u_t| + \sum_{t\in]a,b]} |v_t|.$$
Consequently
$$\operatorname{Var}_{[a,b]}(F) \le \sum_{t\in[a,b[} |u_t| + \sum_{t\in]a,b]} |v_t| < \infty. \text{ Q}$$

(d) The inequality in (c) is actually an equality. To see this, note first that if $a \le x < y \le b$, then $\operatorname{Var}_{[x,y]}(F) \ge |u_x| + |v_y|$. P I noted in (b) that $u_x = \lim_{t\downarrow x} F(t) - F(x)$ and $v_y = F(y) - \lim_{t\uparrow y} F(t)$. So, given $\epsilon > 0$, we can find $t_1$, $t_2$ such that $x < t_1 \le t_2 \le y$ and
$$|F(t_1) - F(x)| \ge |u_x| - \epsilon, \quad |F(y) - F(t_2)| \ge |v_y| - \epsilon.$$
Now
$$\operatorname{Var}_{[x,y]}(F) \ge |F(t_1) - F(x)| + |F(t_2) - F(t_1)| + |F(y) - F(t_2)| \ge |u_x| + |v_y| - 2\epsilon.$$
As $\epsilon$ is arbitrary, we have the result. Q
Now, given $a \le t_0 < t_1 < \ldots < t_n \le b$, we must have
$$\operatorname{Var}_{[a,b]}(F) \ge \sum_{i=1}^n \operatorname{Var}_{[t_{i-1},t_i]}(F)$$
(using 224Cc)
$$\ge \sum_{i=1}^n |u_{t_{i-1}}| + |v_{t_i}|.$$
As $t_0, \ldots, t_n$ are arbitrary,
$$\operatorname{Var}_{[a,b]}(F) \ge \sum_{t\in[a,b[} |u_t| + \sum_{t\in]a,b]} |v_t|,$$
as required.
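
A saltus function with only finitely many non-zero jumps is easy to model, and the variation formula of (c) and (d) can then be checked against a partition that isolates each jump. The class below is a hypothetical sketch, not anything from the text: `u` and `v` are dictionaries holding the right-hand and left-hand jumps, and the example values are chosen to be exactly representable so that the comparison is exact.

```python
class SaltusFunction:
    """F(x) = sum of u[t] over t in [a, x[  plus  sum of v[t] over t in [a, x]."""

    def __init__(self, a, b, u, v):
        self.a, self.b = a, b
        self.u = dict(u)   # right-hand jumps u_t, indexed by points t of [a, b[
        self.v = dict(v)   # left-hand jumps v_t, indexed by points t of [a, b]

    def __call__(self, x):
        return (sum(ut for t, ut in self.u.items() if self.a <= t < x)
                + sum(vt for t, vt in self.v.items() if self.a <= t <= x))

    def variation(self):
        """Sum of |u_t| over [a, b[ plus sum of |v_t| over ]a, b]."""
        return (sum(abs(ut) for t, ut in self.u.items() if self.a <= t < self.b)
                + sum(abs(vt) for t, vt in self.v.items() if self.a < t <= self.b))

    def partition_sum(self, points):
        """Sum of |F(t_i) - F(t_{i-1})| along an increasing list of points."""
        return sum(abs(self(y) - self(x)) for x, y in zip(points, points[1:]))

# Illustrative jumps: u at 0.25 and 0.5, v at 0.5 and 1.0.
F = SaltusFunction(0.0, 1.0, u={0.25: 1.0, 0.5: -2.0}, v={0.5: 0.5, 1.0: 3.0})
fine_partition = [0.0, 0.25, 0.26, 0.49, 0.5, 0.51, 0.99, 1.0]
attained = F.partition_sum(fine_partition)
predicted = F.variation()
```

With a partition that separates each jump point from its neighbours, the partition sum already attains the predicted variation; coarser partitions can only give smaller values.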

(e) Because a saltus function is of bounded variation ((c) above), it is differentiable almost everywhere (224I). In fact its derivative is zero almost everywhere. P Let $F : [a, b] \to \mathbb{R}$ be a saltus function, with associated families $\langle u_t \rangle_{t\in[a,b[}$, $\langle v_t \rangle_{t\in[a,b]}$. Let $\epsilon > 0$. Let $K \subseteq [a, b]$ be a finite set such that
$$\sum_{t\in[a,b[\setminus K} |u_t| + \sum_{t\in[a,b]\setminus K} |v_t| \le \epsilon.$$
Set
$$u_t' = u_t \text{ if } t \in [a, b[\; \cap K, \qquad u_t' = 0 \text{ if } t \in [a, b[\; \setminus K,$$
$$v_t' = v_t \text{ if } t \in K, \qquad v_t' = 0 \text{ if } t \in [a, b] \setminus K,$$
$$u_t'' = u_t - u_t' \text{ for } t \in [a, b[, \qquad v_t'' = v_t - v_t' \text{ for } t \in [a, b].$$
Let $G$, $H$ be the saltus functions corresponding to $\langle u_t' \rangle_{t\in[a,b[}$, $\langle v_t' \rangle_{t\in[a,b]}$ and $\langle u_t'' \rangle_{t\in[a,b[}$, $\langle v_t'' \rangle_{t\in[a,b]}$, so that $F = G + H$. Then $G'(t) = 0$ for every $t \in\, ]a, b[\; \setminus K$, since $]a, b[\; \setminus K$ comprises a finite number of open intervals on each of which $G$ is constant. So $G' = 0$ a.e. and $F' =_{\text{a.e.}} H'$. On the other hand,
$$\int_a^b |H'| \le \operatorname{Var}_{[a,b]}(H) = \sum_{t\in[a,b[\setminus K} |u_t| + \sum_{t\in]a,b]\setminus K} |v_t| \le \epsilon,$$
using 224I and (d) above. So
$$\int_a^b |F'| = \int_a^b |H'| \le \epsilon.$$
As $\epsilon$ is arbitrary, $\int_a^b |F'| = 0$ and $F' = 0$ a.e., as claimed. Q

226C The Lebesgue decomposition of a function of bounded variation Take a, b ∈ R with a < b.

(a) If $F : [a, b] \to \mathbb{R}$ is non-decreasing, set $v_a = 0$, $v_t = F(t) - F(t^-)$ for $t \in\, ]a, b]$, $u_t = F(t^+) - F(t)$ for $t \in [a, b[$, defining $F(t^+)$, $F(t^-)$ as in 226Bb. Then all the $v_t$, $u_t$ are non-negative, and if $a < t_0 < t_1 < \ldots < t_n < b$, then
$$\sum_{i=0}^n (u_{t_i} + v_{t_i}) = \sum_{i=0}^n (F(t_i^+) - F(t_i^-)) \le F(b) - F(a).$$
Accordingly $\sum_{t\in[a,b[} u_t$ and $\sum_{t\in[a,b]} v_t$ are both finite. Let $F_p$ be the corresponding saltus function, as defined in 226Ba, so that
$$F_p(x) = F(a^+) - F(a) + \sum_{t\in]a,x[} (F(t^+) - F(t^-)) + F(x) - F(x^-)$$
if $a < x \le b$. If $a \le x < y \le b$ then
$$F_p(y) - F_p(x) = F(x^+) - F(x) + \sum_{t\in]x,y[} (F(t^+) - F(t^-)) + F(y) - F(y^-) \le F(y) - F(x)$$
because if $x = t_0 < t_1 < \ldots < t_n < t_{n+1} = y$ then
$$F(x^+) - F(x) + \sum_{i=1}^n (F(t_i^+) - F(t_i^-)) + F(y) - F(y^-) = F(y) - F(x) - \sum_{i=1}^{n+1} (F(t_i^-) - F(t_{i-1}^+)) \le F(y) - F(x).$$
Accordingly both $F_p$ and $F_c = F - F_p$ are non-decreasing. Also, because
$$F_p(a) = 0 = v_a,$$
$$F_p(t) - F_p(t^-) = v_t = F(t) - F(t^-) \text{ for } t \in\, ]a, b],$$
$$F_p(t^+) - F_p(t) = u_t = F(t^+) - F(t) \text{ for } t \in [a, b[,$$
we shall have
$$F_c(a) = F(a),$$
$$F_c(t) = F_c(t^-) \text{ for } t \in\, ]a, b],$$
$$F_c(t) = F_c(t^+) \text{ for } t \in [a, b[,$$
and $F_c$ is continuous.
Clearly this expression of F = Fp + Fc as the sum of a saltus function and a continuous function is unique, except
that we can freely add a constant to one if we subtract it from the other.
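
When the jump points of a non-decreasing function are known in advance, the construction of F_p and F_c in (a) can be imitated numerically. The following sketch rests on several assumptions that are not in the text: the jump points are supplied by hand and lie strictly inside the interval, one-sided limits are approximated by evaluating at a small offset `h`, and the example function `F` is invented for illustration.

```python
def one_sided_limits(F, t, h=1e-9):
    """Crude numerical stand-ins for F(t-) and F(t+); h is an arbitrary small step."""
    return F(t - h), F(t + h)

def saltus_part(F, jump_points):
    """Build F_p from the jumps of F at the given interior points."""
    jumps = {}
    for t in jump_points:
        left, right = one_sided_limits(F, t)
        jumps[t] = (F(t) - left, right - F(t))    # (v_t, u_t)

    def F_p(x):
        total = 0.0
        for t, (v_t, u_t) in jumps.items():
            if t <= x:
                total += v_t      # v_t is counted for t in [a, x]
            if t < x:
                total += u_t      # u_t is counted for t in [a, x[
        return total
    return F_p

def F(x):
    # hypothetical example: a ramp with a right-hand jump at 0.3 and a left-hand jump at 0.7
    return x + (1.0 if x > 0.3 else 0.0) + (2.0 if x >= 0.7 else 0.0)

F_p = saltus_part(F, [0.3, 0.7])

def F_c(x):
    """Continuous part F - F_p; for this example it should be close to x."""
    return F(x) - F_p(x)
```

For this example F_c recovers the ramp x up to an error of the order of `h`, including at the two jump points themselves.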

(b) Still taking $F : [a, b] \to \mathbb{R}$ to be non-decreasing, we know that $F'$ is integrable (222C); moreover, $F' =_{\text{a.e.}} F_c'$, by 226Be. Set $F_{ac}(x) = F(a) + \int_a^x F'$ for each $x \in [a, b]$. We have
$$F_{ac}(y) - F_{ac}(x) = \int_x^y F_c' \le F_c(y) - F_c(x)$$
for $a \le x \le y \le b$ (222C again), so $F_{cs} = F_c - F_{ac}$ is still non-decreasing; $F_{ac}$ is continuous (225A), so $F_{cs}$ is continuous; $F_{ac}' =_{\text{a.e.}} F' =_{\text{a.e.}} F_c'$ (222E), so $F_{cs}' = 0$ a.e.
Again, the expression of $F_c = F_{ac} + F_{cs}$ as the sum of an absolutely continuous function and a function with zero derivative almost everywhere is unique, except for the possibility of moving a constant from one to the other, because two absolutely continuous functions whose derivatives are equal almost everywhere must differ by a constant (225D).

(c) Putting all these together: if F : [a, b] → R is any non-decreasing function, it is expressible as Fp + Fac + Fcs ,
where Fp is a saltus function, Fac is absolutely continuous, and Fcs is continuous and differentiable, with zero derivative,
almost everywhere; all three components are non-decreasing; and the expression is unique if we say that Fac (a) = F (a),
Fp (a) = Fcs (a) = 0.
The Cantor function $f : [0, 1] \to [0, 1]$ (134H) is continuous and $f' = 0$ a.e. (134Hb), so $f_p = f_{ac} = 0$ and $f = f_{cs}$. Setting $g(x) = \tfrac12 (x + f(x))$ for $x \in [0, 1]$, as in 134I, we get $g_p(x) = 0$, $g_{ac}(x) = \tfrac{x}{2}$ and $g_{cs}(x) = \tfrac12 f(x)$.

(d) Now suppose that $F : [a, b] \to \mathbb{R}$ is of bounded variation. Then it is expressible as a difference $G - H$ of non-decreasing functions (224D). So writing $F_p = G_p - H_p$, etc., we can express $F$ as a sum $F_p + F_{cs} + F_{ac}$, where $F_p$ is a saltus function, $F_{ac}$ is absolutely continuous, $F_{cs}$ is continuous, $F_{cs}' = 0$ a.e., $F_{ac}(a) = F(a)$, $F_{cs}(a) = F_p(a) = 0$. Under these conditions the expression is unique, because (for instance) $F_p(t^+) - F_p(t) = F(t^+) - F(t)$ for $t \in [a, b[$, while $F_{ac}' =_{\text{a.e.}} (F - F_p)' =_{\text{a.e.}} F'$.
This is a Lebesgue decomposition of the function F . (I have to say ‘a’ Lebesgue decomposition because of course
the assignments Fac (a) = F (a), Fp (a) = Fcs (a) = 0 are arbitrary.) I will call Fp the saltus part of F .
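
As an illustration of 226Cc-d, here is a worked decomposition for one specific function. The function $F$ below is chosen for this example only and is not taken from the text; $f$ is the Cantor function of 134H, and $\chi_{[1/2,1]}$ is the indicator function of $[\frac12,1]$.

```latex
% Illustrative example (an assumption of this sketch, not from the text):
% F on [0,1], with f the Cantor function.
\[
  F(x) = x^2 + \chi_{[1/2,1]}(x) + f(x), \qquad x \in [0,1].
\]
% The only jump is at t = 1/2, and it is a jump from the left:
%   v_{1/2} = F(1/2) - F(1/2^-) = 1, \quad u_t = 0 for every t.
\[
  F_p(x) = \chi_{[1/2,1]}(x), \qquad F_c(x) = F(x) - F_p(x) = x^2 + f(x).
\]
% F'(x) = 2x almost everywhere, because f' = 0 a.e. and F_p' = 0 a.e., so
\[
  F_{ac}(x) = F(0) + \int_0^x 2t\,dt = x^2, \qquad
  F_{cs}(x) = F_c(x) - F_{ac}(x) = f(x).
\]
% Normalization of 226Cc: F_{ac}(0) = F(0) = 0 and F_p(0) = F_{cs}(0) = 0.
```

All three components are non-decreasing, as 226Cc requires for a non-decreasing $F$.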

226D Complex functions The modifications needed to deal with complex functions are elementary.
(a) If $I$ is any set and $\langle a_j\rangle_{j\in I}$ is a family of complex numbers, then the following are equiveridical:
(i) $\sum_{j\in I}|a_j| < \infty$;
(ii) there is an $s\in\mathbb{C}$ such that for every $\epsilon>0$ there is a finite $K\subseteq I$ such that $|s - \sum_{j\in J}a_j| \le \epsilon$ whenever $J$ is finite and $K\subseteq J\subseteq I$.
In this case
$$s = \sum_{j\in I}\operatorname{Re}(a_j) + i\sum_{j\in I}\operatorname{Im}(a_j) = \int_I a_j\,\mu(dj),$$
where $\mu$ is counting measure on $I$, and we write $s = \sum_{j\in I}a_j$.

(b) If $a<b$ in $\mathbb{R}$, a complex saltus function on $[a,b]$ is a function $F:[a,b]\to\mathbb{C}$ expressible in the form
$$F(x) = \sum_{t\in[a,x[}u_t + \sum_{t\in[a,x]}v_t$$
for $x\in[a,b]$, where $\langle u_t\rangle_{t\in[a,b[}$, $\langle v_t\rangle_{t\in[a,b]}$ are complex-valued families such that $\sum_{t\in[a,b[}|u_t|$ and $\sum_{t\in[a,b]}|v_t|$ are finite; that is, if the real and imaginary parts of $F$ are saltus functions. In this case $F$ is continuous except at countably many points and differentiable, with zero derivative, almost everywhere in $[a,b]$, and
$$u_x = \lim_{t\downarrow x}\bigl(F(t) - F(x)\bigr) \text{ for every } x\in[a,b[,$$
$$v_x = \lim_{t\uparrow x}\bigl(F(x) - F(t)\bigr) \text{ for every } x\in\,]a,b]$$
(apply the results of 226B to the real and imaginary parts of $F$). $F$ is of bounded variation, and its variation is
$$\operatorname{Var}_{[a,b]}(F) = \sum_{t\in[a,b[}|u_t| + \sum_{t\in]a,b]}|v_t|$$
(repeat the arguments of 226Bc-d).

(c) If $F:[a,b]\to\mathbb{C}$ is a function of bounded variation, where $a<b$ in $\mathbb{R}$, it is uniquely expressible as $F = F_p + F_{cs} + F_{ac}$, where $F_p$ is a saltus function, $F_{ac}$ is absolutely continuous, $F_{cs}$ is continuous and has zero derivative almost everywhere, and $F_{ac}(a) = F(a)$, $F_p(a) = F_{cs}(a) = 0$. (Apply 226C to the real and imaginary parts of $F$.)
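
To make the formula for $s$ in 226Da concrete, the following sketch works through one summable complex family. The choice $I=\mathbb{N}$, $a_j=(i/2)^j$ is made for illustration only and does not come from the text.

```latex
% Illustrative example (assumed for this sketch): I = \mathbb{N}, a_j = (i/2)^j.
\[
  \sum_{j\in\mathbb{N}} |a_j| = \sum_{j=0}^{\infty} 2^{-j} = 2 < \infty ,
\]
% so the family is summable. Only even powers contribute to the real part
% and only odd powers to the imaginary part:
\[
  \sum_{j\in\mathbb{N}} \operatorname{Re}(a_j)
    = 1 - \tfrac14 + \tfrac1{16} - \dots = \tfrac45 ,
  \qquad
  \sum_{j\in\mathbb{N}} \operatorname{Im}(a_j)
    = \tfrac12 - \tfrac18 + \tfrac1{32} - \dots = \tfrac25 ,
\]
\[
  s = \tfrac45 + \tfrac25\, i = \frac{1}{1 - i/2} .
\]
```

The last equality is the usual geometric-series value, which agrees with the sum assembled from real and imaginary parts.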

226E As an elementary exercise in the language of 226A, I interpolate a version of a theorem of B.Levi which is
sometimes useful.
Proposition Let $(X,\Sigma,\mu)$ be a measure space, $I$ a countable set, and $\langle f_i\rangle_{i\in I}$ a family of $\mu$-integrable real- or complex-valued functions such that $\sum_{i\in I}\int|f_i|\,d\mu$ is finite. Then $f(x) = \sum_{i\in I}f_i(x)$ is defined almost everywhere and $\int f\,d\mu = \sum_{i\in I}\int f_i\,d\mu$.
proof If $I$ is finite this is elementary. Otherwise, since there must be a bijection between $I$ and $\mathbb{N}$, we may take it that $I=\mathbb{N}$. Setting $g_n = \sum_{i=0}^n|f_i|$ for each $n$, we have a non-decreasing sequence $\langle g_n\rangle_{n\in\mathbb{N}}$ of integrable functions such that $\int g_n \le \sum_{i\in\mathbb{N}}\int|f_i|$ for every $n$, so that $g = \sup_{n\in\mathbb{N}}g_n$ is integrable, by B.Levi's theorem as stated in 123A. In particular, $g$ is finite almost everywhere. Now if $x\in X$ is such that $g(x)$ is defined and finite, $\sum_{i\in J}|f_i(x)| \le g(x)$ for every finite $J\subseteq\mathbb{N}$, so $\sum_{i\in\mathbb{N}}|f_i(x)|$ and $\sum_{i\in\mathbb{N}}f_i(x)$ are defined. In this case, of course, $\sum_{i\in\mathbb{N}}f_i(x) = \lim_{n\to\infty}\sum_{i=0}^n f_i(x)$. But $|\sum_{i=0}^n f_i| \le_{\text{a.e.}} g$ for each $n$, so Lebesgue's Dominated Convergence Theorem tells us that
$$\int\sum_{i\in\mathbb{N}}f_i = \lim_{n\to\infty}\int\sum_{i=0}^n f_i = \lim_{n\to\infty}\sum_{i=0}^n\int f_i = \sum_{i\in\mathbb{N}}\int f_i.$$
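
A short worked instance of 226E may help fix the hypotheses. The setting below ($X=[0,1]$ with Lebesgue measure, $I=\mathbb{N}$, $f_i(x)=x^i/i!$) is an assumption made for this illustration and is not part of the text.

```latex
% Illustrative example (assumed): X = [0,1], Lebesgue measure, f_i(x) = x^i / i!.
% Hypothesis of 226E: the integrals of |f_i| are summable.
\[
  \sum_{i\in\mathbb{N}} \int_0^1 |f_i|
    = \sum_{i=0}^{\infty} \frac{1}{(i+1)!} = e - 1 < \infty .
\]
% Conclusion of 226E: the sum may be integrated term by term.
\[
  \int_0^1 e^x\,dx
    = \int_0^1 \sum_{i\in\mathbb{N}} \frac{x^i}{i!}\,dx
    = \sum_{i\in\mathbb{N}} \int_0^1 \frac{x^i}{i!}\,dx
    = \sum_{i=0}^{\infty} \frac{1}{(i+1)!}
    = e - 1 .
\]
```

Both sides can be computed independently here, so the example serves as a check on the statement rather than as a use of it.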

226X Basic exercises > (a) A step-function on an interval $[a,b]$ is a function $F$ such that, for suitable $t_0,\ldots,t_n$ with $a = t_0 \le \ldots \le t_n = b$, $F$ is constant on each interval $]t_{i-1},t_i[$. Show that $F:[a,b]\to\mathbb{R}$ is a saltus function iff for every $\epsilon>0$ there is a step-function $G:[a,b]\to\mathbb{R}$ such that $\operatorname{Var}_{[a,b]}(F-G)\le\epsilon$.

(b) Let F , G be real-valued functions of bounded variation defined on an interval [a, b] ⊆ R. Show that, in the
language of 226C,
(F + G)p = Fp + Gp , (F + G)c = Fc + Gc ,

(F + G)cs = Fcs + Gcs , (F + G)ac = Fac + Gac .

> (c) Let F be a real-valued function of bounded variation on an interval [a, b] ⊆ R. Show that, in the language of
226C,
Var[a,b] (F ) = Var[a,b] (Fp ) + Var[a,b] (Fc ) = Var[a,b] (Fp ) + Var[a,b] (Fcs ) + Var[a,b] (Fac ).

(d) Let $F$ be a real-valued function of bounded variation on an interval $[a,b]\subseteq\mathbb{R}$. Show that $F$ is absolutely continuous iff $\operatorname{Var}_{[a,b]}(F) = \int_a^b|F'|$.

(e) Consider the function $g$ of 134I/226Cc. Show that $g^{-1}:[0,1]\to[0,1]$ is differentiable almost everywhere in $[0,1]$, and find $\mu\{x : (g^{-1})'(x)\le a\}$ for each $a\in\mathbb{R}$.

> (f) Suppose that $I$ and $J$ are sets and that $\langle a_i\rangle_{i\in I}$ is a summable family of real numbers. (i) Show that if $f:J\to I$ is injective then $\langle a_{f(j)}\rangle_{j\in J}$ is summable. (ii) Show that if $g:I\to J$ is any function, then $\sum_{j\in J}\sum_{i\in g^{-1}[\{j\}]}a_i$ is defined and equal to $\sum_{i\in I}a_i$.

226Y Further exercises (a) Explain what modifications are appropriate in the description of the Lebesgue
decomposition of a function of bounded variation if we wish to consider functions on open or half-open intervals,
including unbounded intervals.
(b) Suppose that $F:[a,b]\to\mathbb{R}$ is a function of bounded variation, and set $h(y) = \#(F^{-1}[\{y\}])$ for $y\in\mathbb{R}$. Show that $\int h = \operatorname{Var}_{[a,b]}(F_c)$, where $F_c$ is the 'continuous part' of $F$ as defined in 226Ca/226Cd.

(c) Show that a set I is countable iff there is a summable family hai ii∈I of non-zero real numbers.

(d) Suppose that a < b in R, and that F : [a, b] → R is a function of bounded variation; let Fp be its saltus part.
Show that |F (b) − F (a)| ≤ µF [ [a, b] ] + Var[a,b] Fp , where µ is Lebesgue measure on R.

226 Notes and comments In 232I and 232Yb below I will revisit these ideas, linking them to a decomposition of
the Lebesgue-Stieltjes measure corresponding to a non-decreasing real function, and thence to more general measures.
All this work is peripheral to the main concerns of this volume, but I think it is illuminating, and certainly it is part
of the basic knowledge assumed of anyone working in real analysis.

Chapter 23
The Radon-Nikodým Theorem
In Chapter 22, I discussed the indefinite integrals of integrable functions on R, and gave what I hope you feel are
satisfying descriptions both of the functions which are indefinite integrals (the absolutely continuous functions) and of
how to find which functions they are indefinite integrals of (you differentiate them). For general measure spaces, we
have no structure present which can give such simple formulations; but nevertheless the same questions can be asked
and, up to a point, answered.
The first section of this chapter introduces the basic machinery needed, the concept of ‘countably additive’ functional
and its decomposition into positive and negative parts. The main theorem takes up the second section: indefinite inte-
grals are the ‘truly continuous’ additive functionals; on σ-finite spaces, these are the ‘absolutely continuous’ countably
additive functionals. In §233 I discuss the most important single application of the theorem, its use in providing a
concept of ‘conditional expectation’. This is one of the central concepts of probability theory – as you very likely know;
but the form here is a dramatic generalization of the elementary concept of the conditional probability of one event
given another, and needs the whole strength of the general theory of measure and integration as developed in Volume
1 and this chapter. I include some notes on convex functions, up to and including versions of Jensen’s inequality
(233I-233J).
While we are in the area of ‘pure’ measure theory, I take the opportunity to discuss some further topics. I begin
with some essentially elementary constructions, image measures, sums of measures and indefinite-integral measures; I
think the details need a little attention, and I work through them in §234. Rather deeper ideas are needed to deal
with ‘measurable transformations’. In §235 I set out the techniques necessary to provide an abstract basis for a general
method of integration-by-substitution, with a detailed account of sufficient conditions for a formula of the type
$$\int g(y)\,dy = \int g(\phi(x))J(x)\,dx$$
to be valid.

231 Countably additive functionals


I begin with an abstract description of the objects which will, in appropriate circumstances, correspond to the
indefinite integrals of general integrable functions. In this section I give those parts of the theory which do not involve
a measure, but only a set with a distinguished σ-algebra of subsets. The basic concepts are those of ‘finitely additive’
and ‘countably additive’ functional, and there is one substantial theorem, the ‘Hahn decomposition’ (231E).

231A Definition Let X be a set and Σ an algebra of subsets of X (136E). A functional ν : Σ → R is finitely
additive, or just additive, if ν(E ∪ F ) = νE + νF whenever E, F ∈ Σ and E ∩ F = ∅.

231B Elementary facts Let X be a set, Σ an algebra of subsets of X, and ν : Σ → R a finitely additive functional.

(a) ν∅ = 0. (For ν∅ = ν(∅ ∪ ∅) = ν∅ + ν∅.)


(b) If $E_0,\ldots,E_n$ are disjoint members of $\Sigma$ then $\nu(\bigcup_{i\le n}E_i) = \sum_{i=0}^n\nu E_i$.

(c) If E, F ∈ Σ and E ⊆ F then νF = νE + ν(F \ E). More generally, for any E, F ∈ Σ,


νF = ν(F ∩ E) + ν(F \ E).

(d) If E, F ∈ Σ then
νE − νF = ν(E \ F ) + ν(E ∩ F ) − ν(E ∩ F ) − ν(F \ E) = ν(E \ F ) − ν(F \ E).

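231C Definition Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. A function $\nu:\Sigma\to\mathbb{R}$ is countably additive or σ-additive if $\sum_{n=0}^\infty\nu E_n$ exists in $\mathbb{R}$ and is equal to $\nu(\bigcup_{n\in\mathbb{N}}E_n)$ for every disjoint sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ such that $\bigcup_{n\in\mathbb{N}}E_n\in\Sigma$.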
Remark Note that when I use the phrase ‘countably additive functional’ I mean to exclude the possibility of ∞ as a
value of the functional. Thus a measure is a countably additive functional iff it is totally finite (211C).
You will sometimes see the phrase ‘signed measure’ used to mean what I call a countably additive functional.

231D Elementary facts Let X be a set, Σ a σ-algebra of subsets of X and ν : Σ → R a countably additive
functional.
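(a) $\nu$ is finitely additive. P P (i) Setting $E_n=\emptyset$ for every $n\in\mathbb{N}$, $\sum_{n=0}^\infty\nu\emptyset$ must be defined in $\mathbb{R}$ so $\nu\emptyset$ must be $0$. (ii) Now if $E,F\in\Sigma$ and $E\cap F=\emptyset$ we can set $E_0=E$, $E_1=F$, $E_n=\emptyset$ for $n\ge2$ and get
$$\nu(E\cup F) = \nu\bigl(\bigcup_{n\in\mathbb{N}}E_n\bigr) = \sum_{n=0}^\infty\nu E_n = \nu E+\nu F.$$
Q Q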

(b) If hEn in∈N is a non-decreasing sequence in Σ, with union E ∈ Σ, then


$$\nu E = \nu E_0 + \sum_{n=0}^\infty\nu(E_{n+1}\setminus E_n) = \lim_{n\to\infty}\nu E_n.$$

(c) If hEn in∈N is a non-increasing sequence in Σ with intersection E ∈ Σ, then


νE = νE0 − limn→∞ ν(E0 \ En ) = limn→∞ νEn .

(d) If ν′ : Σ → R is another countably additive functional, and c ∈ R, then ν + ν′ : Σ → R and cν : Σ → R are countably additive.

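(e) If $H\in\Sigma$, then $\nu_H:\Sigma\to\mathbb{R}$ is countably additive, where $\nu_HE = \nu(E\cap H)$ for every $E\in\Sigma$. P P If $\langle E_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$ with union $E\in\Sigma$ then $\langle E_n\cap H\rangle_{n\in\mathbb{N}}$ is disjoint, with union $E\cap H$, so
$$\nu_HE = \nu(H\cap E) = \nu\bigl(\bigcup_{n\in\mathbb{N}}(H\cap E_n)\bigr) = \sum_{n=0}^\infty\nu(H\cap E_n) = \sum_{n=0}^\infty\nu_HE_n.$$
Q Q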

Remark For the time being, we shall be using the notion of ‘countably additive functional’ only on σ-algebras Σ, in
which case we can take it for granted that the unions and intersections above belong to Σ.

231E All the ideas above amount to minor modifications of ideas already needed at the very beginning of the
theory of measure spaces. We come now to something more substantial.
Theorem Let X be a set, Σ a σ-algebra of subsets of X, and ν : Σ → R a countably additive functional. Then
(a) ν is bounded;
(b) there is a set H ∈ Σ such that
νF ≥ 0 whenever F ∈ Σ and F ⊆ H,

νF ≤ 0 whenever F ∈ Σ and F ∩ H = ∅.

proof (a) ?? Suppose, if possible, otherwise. For E ∈ Σ, set M (E) = sup{|νF | : F ∈ Σ, F ⊆ E}; then M (X) = ∞.
Moreover, whenever E1 , E2 , F ∈ Σ and F ⊆ E1 ∪ E2 , then
|νF | = |ν(F ∩ E1 ) + ν(F \ E1 )| ≤ |ν(F ∩ E1 )| + |ν(F \ E1 )| ≤ M (E1 ) + M (E2 ),
so M (E1 ∪ E2 ) ≤ M (E1 ) + M (E2 ). Choose a sequence hEn in∈N in Σ as follows. E0 = X. Given that M (En ) = ∞,
where n ∈ N, then surely there is an Fn ⊆ En such that |νFn | ≥ 1 + |νEn |, in which case |ν(En \ Fn )| ≥ 1. Now at
least one of M (Fn ), M (En \ Fn ) is infinite; if M (Fn ) = ∞, set En+1 = Fn ; otherwise, set En+1 = En \ Fn ; in either
case, note that $|\nu(E_n\setminus E_{n+1})|\ge1$ and $M(E_{n+1})=\infty$, so that the induction will continue.
On completing this induction, set $G_n = E_n\setminus E_{n+1}$ for $n\in\mathbb{N}$. Then $\langle G_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$, so $\sum_{n=0}^\infty\nu G_n$ is defined in $\mathbb{R}$ and $\lim_{n\to\infty}\nu G_n = 0$; but $|\nu G_n|\ge1$ for every $n$. X X
(b)(i) By (a), $\gamma = \sup\{\nu E : E\in\Sigma\} < \infty$. Choose a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ such that $\nu E_n\ge\gamma-2^{-n}$ for every $n\in\mathbb{N}$. For $m\le n\in\mathbb{N}$, set $F_{mn} = \bigcap_{m\le i\le n}E_i$. Then $\nu F_{mn}\ge\gamma-2\cdot2^{-m}+2^{-n}$ for every $n\ge m$. P P Induce on $n$. For $n=m$, this is due to the choice of $E_m = F_{mm}$. For the inductive step, we have $F_{m,n+1} = F_{mn}\cap E_{n+1}$, while surely $\gamma\ge\nu(E_{n+1}\cup F_{mn})$, so
$$\begin{aligned}
\gamma+\nu F_{m,n+1} &\ge \nu(E_{n+1}\cup F_{mn})+\nu F_{m,n+1}\\
&= \nu E_{n+1}+\nu(F_{mn}\setminus E_{n+1})+\nu F_{m,n+1}\\
&= \nu E_{n+1}+\nu F_{mn}\\
&\ge \gamma-2^{-n-1}+\gamma-2\cdot2^{-m}+2^{-n}\\
&= 2\gamma-2\cdot2^{-m}+2^{-n-1}
\end{aligned}$$
(the last inequality by the choice of $E_{n+1}$ and the inductive hypothesis). Subtracting $\gamma$ from both sides, $\nu F_{m,n+1}\ge\gamma-2\cdot2^{-m}+2^{-n-1}$ and the induction proceeds. Q Q
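(ii) For $m\in\mathbb{N}$, set
$$F_m = \bigcap_{n\ge m}F_{mn} = \bigcap_{n\ge m}E_n.$$
Then
$$\nu F_m = \lim_{n\to\infty}\nu F_{mn} \ge \gamma-2\cdot2^{-m},$$
by 231Dc. Next, $\langle F_m\rangle_{m\in\mathbb{N}}$ is non-decreasing, so setting $H = \bigcup_{m\in\mathbb{N}}F_m$ we have
$$\nu H = \lim_{m\to\infty}\nu F_m \ge \gamma;$$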
since νH is surely less than or equal to γ, νH = γ.
If F ∈ Σ and F ⊆ H, then
νH − νF = ν(H \ F ) ≤ γ = νH,
so νF ≥ 0. If F ∈ Σ and F ∩ H = ∅ then
νH + νF = ν(H ∪ F ) ≤ γ = νH
so νF ≤ 0. This completes the proof.

231F Corollary Let X be a set, Σ a σ-algebra of subsets of X, and ν : Σ → R a countably additive functional.
Then ν can be expressed as the difference of two totally finite measures with domain Σ.
proof Take H ∈ Σ as described in 231Eb. Set ν1 E = ν(E ∩ H), ν2 E = −ν(E \ H) for E ∈ Σ. Then, as in 231Dd-e,
both ν1 and ν2 are countably additive functionals on Σ, and of course ν = ν1 − ν2 . But also, by the choice of H, both
ν1 and ν2 are non-negative, so are totally finite measures.
Remark This is called the ‘Jordan decomposition’ of ν. The expression of 231Eb is a ‘Hahn decomposition’.
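
The smallest non-trivial case already shows how 231E and 231F fit together, and that the set $H$ need not be unique. The three-point space and the values of $\nu$ below are assumptions chosen for this sketch, not an example from the text.

```latex
% Illustrative example (assumed): X = \{1,2,3\}, \Sigma = \mathcal{P}X, and
%   \nu E = \sum_{x\in E} \nu\{x\} with \nu\{1\} = 2, \nu\{2\} = -3, \nu\{3\} = 0.
\[
  -3 \le \nu E \le 2 \ \text{ for every } E \subseteq X ,
  \qquad \gamma = \sup_{E\in\Sigma} \nu E = 2 .
\]
% Hahn decomposition (231Eb): both choices below work, because \nu\{3\} = 0.
\[
  H = \{1\} \quad\text{or}\quad H = \{1,3\}, \qquad \nu H = \gamma = 2 .
\]
% Jordan decomposition (231F), the same for either choice of H:
\[
  \nu_1 E = \nu(E \cap H) = 2\,\#(E \cap \{1\}), \qquad
  \nu_2 E = -\nu(E \setminus H) = 3\,\#(E \cap \{2\}), \qquad
  \nu = \nu_1 - \nu_2 .
\]
```

Here $\nu_1$ and $\nu_2$ are totally finite measures on $\mathcal{P}X$, as 231F asserts.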

231X Basic exercises (a) Let Σ be the family of subsets A of N such that one of A, N \ A is finite. Show that Σ
is an algebra of subsets of N. (This is the finite-cofinite algebra of subsets of N; compare 211Ra.)

(b) Let X be a set, Σ an algebra of subsets of X and ν : Σ → R a finitely additive functional. Show that
ν(E ∪ F ∪ G) + ν(E ∩ F ) + ν(E ∩ G) + ν(F ∩ G) = νE + νF + νG + ν(E ∩ F ∩ G) for all E, F , G ∈ Σ. Generalize this
result to longer sequences of sets.

> (c) Let Σ be the finite-cofinite algebra of subsets of N, as in 231Xa. Define ν : Σ → Z by setting

νE = limn→∞ #({i : i ≤ n, 2i ∈ E}) − #({i : i ≤ n, 2i + 1 ∈ E})
for every E ∈ Σ. Show that ν is well-defined and finitely additive and unbounded.

(d) Let X be a set and Σ an algebra of subsets of X. (i) Show that if ν : Σ → R and ν ′ : Σ → R are finitely
additive, so are ν + ν ′ and cν for any c ∈ R. (ii) Show that if ν : Σ → R is finitely additive and H ∈ Σ, then νH is
finitely additive, where νH (E) = ν(H ∩ E) for every E ∈ Σ.

(e) Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$ and $\nu:\Sigma\to\mathbb{R}$ a finitely additive functional. Let $S$ be the linear space of those real-valued functions on $X$ expressible in the form $\sum_{i=0}^na_i\chi E_i$ where $E_i\in\Sigma$ for each $i$. (i) Show that we have a linear functional $\int:S\to\mathbb{R}$ given by writing
$$\int\sum_{i=0}^na_i\chi E_i = \sum_{i=0}^na_i\nu E_i$$
whenever $a_0,\ldots,a_n\in\mathbb{R}$ and $E_0,\ldots,E_n\in\Sigma$. (ii) Show that if $\nu E\ge0$ for every $E\in\Sigma$ then $\int f\ge0$ whenever $f\in S$ and $f(x)\ge0$ for every $x\in X$. (iii) Show that if $\nu$ is bounded and $X\ne\emptyset$ then
$$\sup\{|\textstyle\int f| : f\in S,\ \|f\|_\infty\le1\} = \sup_{E,F\in\Sigma}|\nu E-\nu F|,$$
writing $\|f\|_\infty = \sup_{x\in X}|f(x)|$.

> (f ) Let X be a set, Σ a σ-algebra of subsets of X and ν : Σ → R a finitely additive functional. Show that the
following are equiveridical:
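(i) $\nu$ is countably additive;
(ii) $\lim_{n\to\infty}\nu E_n = 0$ whenever $\langle E_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ and $\bigcap_{n\in\mathbb{N}}E_n = \emptyset$;
(iii) $\lim_{n\to\infty}\nu E_n = 0$ whenever $\langle E_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\Sigma$ and $\bigcap_{n\in\mathbb{N}}\bigcup_{m\ge n}E_m = \emptyset$;
(iv) $\lim_{n\to\infty}\nu E_n = \nu E$ whenever $\langle E_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\Sigma$ and
$$E = \bigcap_{n\in\mathbb{N}}\bigcup_{m\ge n}E_m = \bigcup_{n\in\mathbb{N}}\bigcap_{m\ge n}E_m.$$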
(Hint: for (i)⇒(iv), consider non-negative ν first.)

(g) Let $X$ be a set and $\Sigma$ a σ-algebra of subsets of $X$, and let $\nu:\Sigma\to[-\infty,\infty[$ be a function which is countably additive in the sense that $\nu\emptyset=0$ and whenever $\langle E_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$, $\sum_{n=0}^\infty\nu E_n = \lim_{n\to\infty}\sum_{i=0}^n\nu E_i$ is defined in $[-\infty,\infty[$ and is equal to $\nu(\bigcup_{n\in\mathbb{N}}E_n)$. Show that $\nu$ is bounded above and attains its upper bound (that is,
there is an H ∈ Σ such that νH = supF ∈Σ νF ). Hence, or otherwise, show that ν is expressible as the difference of a
totally finite measure and a measure, both with domain Σ.

231Y Further exercises (a) Let X be a set, Σ an algebra of subsets of X, and ν : Σ → R a bounded finitely
additive functional. Set
$$\nu^+E = \sup\{\nu F : F\in\Sigma,\ F\subseteq E\},$$
$$\nu^-E = -\inf\{\nu F : F\in\Sigma,\ F\subseteq E\},$$
$$|\nu|E = \sup\{\nu F_1-\nu F_2 : F_1,F_2\in\Sigma,\ F_1,F_2\subseteq E\}.$$
Show that $\nu^+$, $\nu^-$ and $|\nu|$ are all bounded finitely additive functionals on $\Sigma$ and that $\nu = \nu^+-\nu^-$, $|\nu| = \nu^++\nu^-$. Show that if $\nu$ is countably additive so are $\nu^+$, $\nu^-$ and $|\nu|$. ($|\nu|$ is sometimes called the variation of $\nu$.)

(b) Let X be a set and Σ an algebra of subsets of X. Let ν1 , ν2 be two bounded finitely additive functionals defined
on Σ. Set
(ν1 ∨ ν2 )(E) = sup{ν1 F + ν2 (E \ F ) : F ∈ Σ, F ⊆ E},

(ν1 ∧ ν2 )(E) = inf{ν1 F + ν2 (E \ F ) : F ∈ Σ, F ⊆ E}.


Show that ν1 ∨ ν2 and ν1 ∧ ν2 are finitely additive functionals, and that ν1 + ν2 = ν1 ∨ ν2 + ν1 ∧ ν2 . Show that, in the
language of 231Ya,
ν + = ν ∨ 0, ν − = (−ν) ∨ 0 = −(ν ∧ 0), |ν| = ν ∨ (−ν) = ν + ∨ ν − = ν + + ν − ,

ν1 ∨ ν2 = ν1 + (ν2 − ν1 )+ , ν1 ∧ ν2 = ν1 − (ν1 − ν2 )+ ,
so that ν1 ∨ ν2 and ν1 ∧ ν2 are countably additive if ν1 and ν2 are.

(c) Let X be a set and Σ an algebra of subsets of X. Let M be the set of all bounded finitely additive functionals
from Σ to R. Show that M is a linear space under the natural definitions of addition and scalar multiplication. Show
that M has a partial order ≤ defined by saying that
ν ≤ ν ′ iff νE ≤ ν ′ E for every E ∈ Σ,
and that for this partial order ν1 ∨ ν2 , ν1 ∧ ν2 , as defined in 231Yb, are sup{ν1 , ν2 }, inf{ν1 , ν2 }.

(d) Let X be a set and Σ an algebra of subsets of X. Let ν0 , . . . , νn be bounded finitely additive functionals on Σ
and set
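$$\check\nu E = \sup\Bigl\{\sum_{i=0}^n\nu_iF_i : F_0,\ldots,F_n\in\Sigma,\ \bigcup_{i\le n}F_i = E,\ F_i\cap F_j=\emptyset\text{ for }i\ne j\Bigr\},$$
$$\hat\nu E = \inf\Bigl\{\sum_{i=0}^n\nu_iF_i : F_0,\ldots,F_n\in\Sigma,\ \bigcup_{i\le n}F_i = E,\ F_i\cap F_j=\emptyset\text{ for }i\ne j\Bigr\}$$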
for E ∈ Σ. Show that ν̌ and ν̂ are finitely additive and are, respectively, sup{ν0 , . . . , νn } and inf{ν0 , . . . , νn } in the
partially ordered set of finitely additive functionals on Σ.

(e) Let X be a set and Σ a σ-algebra of subsets of X; let M be the partially ordered set of all bounded finitely
additive functionals from Σ to R. (i) Show that if A ⊆ M is non-empty and bounded above in M , then A has a
supremum ν̌ in M , given by the formula
$$\check\nu E = \sup\Bigl\{\sum_{i=0}^n\nu_iF_i : \nu_0,\ldots,\nu_n\in A,\ F_0,\ldots,F_n\in\Sigma,\ \bigcup_{i\le n}F_i = E,\ F_i\cap F_j=\emptyset\text{ for }i\ne j\Bigr\}.$$
(ii) Show that if $A\subseteq M$ is non-empty and bounded below in $M$ then it has an infimum $\hat\nu\in M$, given by the formula
$$\hat\nu E = \inf\Bigl\{\sum_{i=0}^n\nu_iF_i : \nu_0,\ldots,\nu_n\in A,\ F_0,\ldots,F_n\in\Sigma,\ \bigcup_{i\le n}F_i = E,\ F_i\cap F_j=\emptyset\text{ for }i\ne j\Bigr\}.$$

(f ) Let X be a set, Σ an algebra of subsets of X, and ν : Σ → R a non-negative finitely additive functional. For
E ∈ Σ set
νca (E) = inf{supn∈N νFn : hFn in∈N is a non-decreasing sequence in Σ with union E}.
Show that νca is a countably additive functional on Σ and that if ν ′ is any countably additive functional with ν ′ ≤ ν
then ν ′ ≤ νca . Show that νca ∧ (ν − νca ) = 0.

(g) Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$, and $\nu:\Sigma\to\mathbb{R}$ a bounded finitely additive functional. Show that $\nu$ is uniquely expressible as $\nu_{ca}+\nu_{pfa}$, where $\nu_{ca}$ is countably additive, $\nu_{pfa}$ is finitely additive and if $0\le\nu'\le|\nu_{pfa}|$ and $\nu'$ is countably additive then $\nu'=0$.

(h) Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. Let $M$ be the linear space of bounded finitely additive functionals on $\Sigma$, and for $\nu\in M$ set $\|\nu\| = |\nu|(X)$, defining $|\nu|$ as in 231Ya. ($\|\nu\|$ is the total variation of $\nu$.) Show that $\|\ \|$ is a norm on $M$ under which $M$ is a Banach space. Show that the space of bounded countably additive functionals on $\Sigma$ is a closed linear subspace of $M$.

(i) Repeat as many as possible of the results of this section for complex-valued functionals.

231 Notes and comments The real purpose of this section has been to describe the Hahn decomposition of a
countably additive functional (231E). The very leisurely exposition in 231A-231D is intended as a review of the most
elementary properties of measures, in the slightly more general context of ‘signed measures’, with those properties
corresponding to ‘additivity’ alone separated from those which depend on ‘countable additivity’. In 231Xf I set out
necessary and sufficient conditions for a finitely additive functional on a σ-algebra to be countably additive, designed
to suggest that a finitely additive functional is countably additive iff it is ‘sequentially order-continuous’ in some sense.
The fact that a countably additive functional can be expressed as the difference of non-negative countably additive
functionals (231F) has an important counterpart in the theory of finitely additive functionals: a finitely additive
functional can be expressed as the difference of non-negative finitely additive functionals if (and only if) it is bounded
(231Ya). But I do not think that this, or the further properties of bounded finitely additive functionals described in
231Xe and 231Y, will be important to us before Volume 3.

232 The Radon-Nikodým theorem


I come now to the chief theorem of this chapter, one of the central results of measure theory, relating countably
additive functionals to indefinite integrals. The objective is to give a complete description of the functionals which
can arise as indefinite integrals of integrable functions (232E). These can be characterized as the ‘truly continuous’
additive functionals (232Ab). A more commonly used concept, and one adequate in many cases, is that of ‘absolutely
continuous’ additive functional (232Aa); I spend the first few paragraphs (232B-232D) on elementary facts about truly
continuous and absolutely continuous functionals. I end the section with a discussion of the decomposition of general
countably additive functionals (232I).

232A Absolutely continuous functionals Let (X, Σ, µ) be a measure space and ν : Σ → R a finitely additive
functional.

(a) ν is absolutely continuous with respect to µ (sometimes written ‘ν ≪ µ’) if for every ε > 0 there is a δ > 0
such that |νE| ≤ ε whenever E ∈ Σ and µE ≤ δ.

(b) ν is truly continuous with respect to µ if for every ε > 0 there are E ∈ Σ, δ > 0 such that µE is finite and
|νF | ≤ ε whenever F ∈ Σ and µ(E ∩ F ) ≤ δ.

(c) For reference, I add another definition here. If ν is countably additive, it is singular with respect to µ if there
is a set F ∈ Σ such that µF = 0 and νE = 0 whenever E ∈ Σ and E ⊆ X \ F .
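
To see how the quantifiers in 232Ab are meant to be used, the following sketch checks the definition directly for one functional. The choice of Lebesgue measure $\mu$ on $\mathbb{R}$ and of $\nu F=\int_F e^{-|x|}\,dx$ is an assumption made for illustration; it is not an example from the text.

```latex
% Illustrative example (assumed): \mu Lebesgue measure on \mathbb{R},
%   \nu F = \int_F e^{-|x|}\,dx  for measurable F.
% Given \epsilon > 0, choose n \in \mathbb{N} with 2e^{-n} \le \epsilon/2 and set
\[
  E = [-n, n], \qquad \delta = \tfrac12\epsilon, \qquad \mu E = 2n < \infty .
\]
% If \mu(F \cap E) \le \delta then, using e^{-|x|} \le 1 on E,
\[
  |\nu F| = \nu(F \cap E) + \nu(F \setminus E)
          \le \mu(F \cap E) + \int_{|x| > n} e^{-|x|}\,dx
          \le \delta + 2e^{-n}
          \le \epsilon .
\]
```

The set $E$ of finite measure absorbs the part of $X$ where $\nu$ carries almost all its mass, and $\delta$ then controls $\nu$ inside $E$; both are needed because $\mu\mathbb{R}=\infty$.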

232B Proposition Let (X, Σ, µ) be a measure space and ν : Σ → R a finitely additive functional.
(a) If ν is countably additive, it is absolutely continuous with respect to µ iff νE = 0 whenever µE = 0.
(b) ν is truly continuous with respect to µ iff (α) it is countably additive (β) it is absolutely continuous with respect
to µ (γ) whenever E ∈ Σ and νE ≠ 0 there is an F ∈ Σ such that µF < ∞ and ν(E ∩ F ) ≠ 0.
(c) If (X, Σ, µ) is σ-finite, then ν is truly continuous with respect to µ iff it is countably additive and absolutely
continuous with respect to µ.
(d) If (X, Σ, µ) is totally finite, then ν is truly continuous with respect to µ iff it is absolutely continuous with respect
to µ.
proof (a)(i) If $\nu$ is absolutely continuous with respect to $\mu$ and $\mu E=0$, then $\mu E\le\delta$ for every $\delta>0$, so $|\nu E|\le\epsilon$ for every $\epsilon>0$ and $\nu E=0$.
(ii) ?? Suppose, if possible, that $\nu E=0$ whenever $\mu E=0$, but $\nu$ is not absolutely continuous. Then there is an $\epsilon>0$ such that for every $\delta>0$ there is an $E\in\Sigma$ such that $\mu E\le\delta$ but $|\nu E|\ge\epsilon$. For each $n\in\mathbb{N}$ we may choose an $F_n\in\Sigma$ such that $\mu F_n\le2^{-n}$ and $|\nu F_n|\ge\epsilon$. Consider $F = \bigcap_{n\in\mathbb{N}}\bigcup_{k\ge n}F_k$. Then we have
$$\mu F \le \inf_{n\in\mathbb{N}}\mu\bigl(\bigcup_{k\ge n}F_k\bigr) \le \inf_{n\in\mathbb{N}}\sum_{k=n}^\infty2^{-k} = 0,$$
so $\mu F=0$.
Now recall that by 231Eb there is an $H\in\Sigma$ such that $\nu G\ge0$ when $G\in\Sigma$ and $G\subseteq H$, and $\nu G\le0$ when $G\in\Sigma$ and $G\cap H=\emptyset$. As in 231F, set $\nu_1G = \nu(G\cap H)$, $\nu_2G = -\nu(G\setminus H)$ for $G\in\Sigma$, so that $\nu_1$ and $\nu_2$ are totally finite measures, and $\nu_1F = \nu_2F = 0$ because $\mu(F\cap H) = \mu(F\setminus H) = 0$. Consequently
$$0 = \nu_iF = \lim_{n\to\infty}\nu_i\bigl(\bigcup_{m\ge n}F_m\bigr) \ge \limsup_{n\to\infty}\nu_iF_n$$
for both $i$, and
$$0 = \lim_{n\to\infty}(\nu_1F_n+\nu_2F_n) \ge \liminf_{n\to\infty}|\nu F_n| \ge \epsilon > 0,$$
which is absurd. X X
(b)(i) Suppose that $\nu$ is truly continuous with respect to $\mu$. It is obvious from the definitions that $\nu$ is absolutely continuous with respect to $\mu$. If $\nu E\ne0$, there must be an $F$ of finite measure such that $|\nu G|<|\nu E|$ whenever $G\cap F=\emptyset$, so that $|\nu(E\setminus F)|<|\nu E|$ and $\nu(E\cap F)\ne0$. This deals with the conditions (β) and (γ).
To check that $\nu$ is countably additive, let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a disjoint sequence in $\Sigma$, with union $E$, and $\epsilon>0$. Let $\delta>0$, $F\in\Sigma$ be such that $\mu F<\infty$ and $|\nu G|\le\epsilon$ whenever $G\in\Sigma$ and $\mu(F\cap G)\le\delta$. Then
$$\sum_{n=0}^\infty\mu(E_n\cap F) \le \mu F < \infty,$$
so there is an $n\in\mathbb{N}$ such that $\sum_{i=n}^\infty\mu(E_i\cap F)\le\delta$. Take any $m\ge n$ and consider $E_m^* = \bigcup_{i\le m}E_i$. We have
$$\Bigl|\nu E-\sum_{i=0}^m\nu E_i\Bigr| = |\nu E-\nu E_m^*| = |\nu(E\setminus E_m^*)| \le \epsilon,$$
because
$$\mu(F\cap E\setminus E_m^*) = \sum_{i=m+1}^\infty\mu(F\cap E_i) \le \delta.$$
As $\epsilon$ is arbitrary,
$$\nu E = \sum_{i=0}^\infty\nu E_i;$$
as $\langle E_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\nu$ is countably additive.
(ii) Now suppose that $\nu$ satisfies the three conditions. By 231F, $\nu$ can be expressed as the difference of two non-negative countably additive functionals $\nu_1$, $\nu_2$; set $\nu' = \nu_1+\nu_2$, so that $\nu'$ is a non-negative countably additive functional and $|\nu F|\le\nu'F$ for every $F\in\Sigma$. Set
$$\gamma = \sup\{\nu'F : F\in\Sigma,\ \mu F<\infty\} \le \nu'X < \infty,$$
and choose a sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure such that $\lim_{n\to\infty}\nu'F_n = \gamma$; set $F^* = \bigcup_{n\in\mathbb{N}}F_n$. If $G\in\Sigma$ and $G\cap F^*=\emptyset$ then $\nu G=0$. P P ?? Otherwise, by condition (γ), there is an $F\in\Sigma$ such that $\mu F<\infty$ and $\nu(G\cap F)\ne0$. It follows that
$$\nu'(F\setminus F^*) \ge \nu'(F\cap G) \ge |\nu(F\cap G)| > 0,$$
and there must be an $n\in\mathbb{N}$ such that
$$\gamma < \nu'F_n+\nu'(F\setminus F^*) = \nu'(F_n\cup(F\setminus F^*)) \le \nu'(F\cup F_n) \le \gamma$$
because $\mu(F\cup F_n)<\infty$; but this is impossible. X X Q Q
Setting $F_n^* = \bigcup_{k\le n}F_k$ for each $n$, we have $\lim_{n\to\infty}\nu'(F^*\setminus F_n^*) = 0$. Take any $\epsilon>0$, and (using condition (β)) let $\delta>0$ be such that $|\nu E|\le\frac12\epsilon$ whenever $\mu E\le\delta$. Let $n$ be such that $\nu'(F^*\setminus F_n^*)\le\frac12\epsilon$. Now if $F\in\Sigma$ and $\mu(F\cap F_n^*)\le\delta$ then
$$\begin{aligned}
|\nu F| &\le |\nu(F\cap F_n^*)|+|\nu(F\cap F^*\setminus F_n^*)|+|\nu(F\setminus F^*)|\\
&\le \tfrac12\epsilon+\nu'(F\cap F^*\setminus F_n^*)+0\\
&\le \tfrac12\epsilon+\nu'(F^*\setminus F_n^*) \le \tfrac12\epsilon+\tfrac12\epsilon = \epsilon.
\end{aligned}$$
And $\mu F_n^*<\infty$. As $\epsilon$ is arbitrary, $\nu$ is truly continuous.


(c) Now suppose that (X, Σ, µ) is σ-finite and that ν is countably additive and absolutely continuous with respect
to µ. Let ⟨Xn⟩n∈N be a non-decreasing sequence of sets of finite measure covering X (211D). If νE ≠ 0, then
limn→∞ ν(E ∩ Xn ) ≠ 0, so ν(E ∩ Xn ) ≠ 0 for some n. This shows that ν satisfies condition (γ) of (b), so is truly
continuous.
Of course the converse of this fact is already covered by (b).
(d) Finally, suppose that µX < ∞ and that ν is absolutely continuous with respect to µ. Then it must be truly
continuous, because we can take F = X in the definition 232Ab.

232C Lemma Let (X, Σ, µ) be a measure space and ν, ν ′ two countably additive functionals on Σ which are truly
continuous with respect to µ. Take c ∈ R and H ∈ Σ, and set νH E = ν(E ∩ H), as in 231De. Then ν + ν ′ , cν and
νH are all truly continuous with respect to µ, and ν is expressible as the difference of non-negative countably additive
functionals which are truly continuous with respect to µ.
proof Let $\epsilon>0$. Set $\eta = \epsilon/(2+|c|) > 0$. Then there are $\delta,\delta'>0$ and $E,E'\in\Sigma$ such that $\mu E<\infty$, $\mu E'<\infty$ and $|\nu F|\le\eta$ whenever $\mu(F\cap E)\le\delta$, $|\nu'F|\le\eta$ whenever $\mu(F\cap E')\le\delta'$. Set $\delta^* = \min(\delta,\delta')>0$, $E^* = E\cup E'\in\Sigma$; then $\mu E^*\le\mu E+\mu E'<\infty$.
Suppose that F ∈ Σ and µ(F ∩ E ∗ ) ≤ δ ∗ ; then
µ(F ∩ H ∩ E) ≤ µ(F ∩ E) ≤ δ ∗ ≤ δ, µ(F ∩ E ′ ) ≤ δ ∗
so
|(ν + ν ′ )F | ≤ |νF | + |ν ′ F | ≤ η + η ≤ ǫ,

|(cν)F | = |c||νF | ≤ |c|η ≤ ǫ,

|νH F | = |ν(F ∩ H)| ≤ η ≤ ǫ.


As ǫ is arbitrary, ν + ν ′ , cν and νH are all truly continuous.
Now, taking H from 231Eb, we see that ν1 = νH and ν2 = −νX\H are truly continuous and non-negative, and
ν = ν1 − ν2 is the difference of truly continuous measures.

232D Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $f$ a $\mu$-integrable real-valued function. For $E\in\Sigma$ set $\nu E = \int_Ef$. Then $\nu:\Sigma\to\mathbb{R}$ is a countably additive functional and is truly continuous with respect to $\mu$, therefore absolutely continuous with respect to $\mu$.
proof Recall that $\int_Ef = \int f\times\chi E$ is defined for every $E\in\Sigma$ (131Fa). So $\nu:\Sigma\to\mathbb{R}$ is well-defined. If $E,F\in\Sigma$ are disjoint then
$$\begin{aligned}
\nu(E\cup F) &= \int f\times\chi(E\cup F) = \int(f\times\chi E)+(f\times\chi F)\\
&= \int f\times\chi E+\int f\times\chi F = \nu E+\nu F,
\end{aligned}$$
so $\nu$ is finitely additive.
Now 225A, without using the phrase ‘truly continuous’, proved exactly that ν is truly continuous with respect to µ.
It follows from 232Bb that ν is countably additive and absolutely continuous.
Remark The functional $E\mapsto\int_Ef$ is called the indefinite integral of $f$.

232E We are now at last ready for the theorem.


The Radon-Nikodým theorem Let $(X,\Sigma,\mu)$ be a measure space and $\nu:\Sigma\to\mathbb{R}$ a function. Then the following are equiveridical:
(i) there is a $\mu$-integrable function $f$ such that $\nu E = \int_Ef$ for every $E\in\Sigma$;
(ii) $\nu$ is finitely additive and truly continuous with respect to $\mu$.
proof (a) If $f$ is a $\mu$-integrable real-valued function and $\nu E = \int_Ef$ for every $E\in\Sigma$, then 232D tells us that $\nu$ is finitely additive and truly continuous.
(b) In the other direction, suppose that $\nu$ is finitely additive and truly continuous; note that (by 232B(a-b)) $\nu E=0$ whenever $\mu E=0$. To begin with, suppose that $\nu$ is non-negative and not zero.
In this case, there is a non-negative simple function $f$ such that $\int f>0$ and $\int_Ef\le\nu E$ for every $E\in\Sigma$. P P Let $H\in\Sigma$ be such that $\nu H>0$; set $\epsilon = \frac13\nu H>0$. Let $E\in\Sigma$, $\delta>0$ be such that $\mu E<\infty$ and $\nu F\le\epsilon$ whenever $F\in\Sigma$ and $\mu(F\cap E)\le\delta$; then $\nu(H\setminus E)\le\epsilon$ so $\nu E\ge\nu(H\cap E)\ge2\epsilon$ and $\mu E\ge\mu(H\cap E)>0$. Set $\mu_EF = \mu(F\cap E)$ for every $F\in\Sigma$; then $\mu_E$ is a countably additive functional on $\Sigma$. Set $\nu' = \nu-\alpha\mu_E$, where $\alpha = \epsilon/\mu E$; then $\nu'$ is a countably additive functional and $\nu'E>0$. By 231Eb, as usual, there is a set $G\in\Sigma$ such that $\nu'F\ge0$ if $F\in\Sigma$, $F\subseteq G$, but $\nu'F\le0$ if $F\in\Sigma$ and $F\cap G=\emptyset$. As $\nu'(E\setminus G)\le0$,
$$0 < \nu'E \le \nu'(E\cap G) \le \nu(E\cap G)$$
and $\mu(E\cap G)>0$. Set $f = \alpha\chi(E\cap G)$; then $f$ is a non-negative simple function and $\int f = \alpha\mu(E\cap G)>0$. If $F\in\Sigma$ then $\nu'(F\cap G)\ge0$, that is,
$$\nu(F\cap G) \ge \alpha\mu_E(F\cap G) = \alpha\mu(F\cap E\cap G) = \int_Ff.$$
So
$$\nu F \ge \nu(F\cap G) \ge \int_Ff,$$
as required. Q Q
(c) Still supposing that $\nu$ is a non-negative, truly continuous additive functional, let $\Phi$ be the set of non-negative simple functions $f:X\to\mathbb{R}$ such that $\int_Ef\le\nu E$ for every $E\in\Sigma$; then the constant function $0$ belongs to $\Phi$, so $\Phi$ is not empty.
If $f,g\in\Phi$ then $f\vee g\in\Phi$, where $(f\vee g)(x) = \max(f(x),g(x))$ for $x\in X$. P P Set $H = \{x : (f-g)(x)\ge0\}\in\Sigma$; then $f\vee g = (f\times\chi H)+(g\times\chi(X\setminus H))$ is a non-negative simple function, and for any $E\in\Sigma$,
$$\int_Ef\vee g = \int_{E\cap H}f+\int_{E\setminus H}g \le \nu(E\cap H)+\nu(E\setminus H) = \nu E.$$
Q Q
Set
$$\gamma = \sup\{\textstyle\int f : f\in\Phi\} \le \nu X < \infty.$$
Choose a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $\Phi$ such that $\lim_{n\to\infty}\int f_n = \gamma$. For each $n$, set $g_n = f_0\vee f_1\vee\ldots\vee f_n$; then $g_n\in\Phi$ and $\int f_n\le\int g_n\le\gamma$ for each $n$, so $\lim_{n\to\infty}\int g_n = \gamma$. By B.Levi's theorem, $f = \lim_{n\to\infty}g_n$ is integrable and $\int f = \gamma$. Note that if $E\in\Sigma$ then
$$\int_Ef = \lim_{n\to\infty}\int_Ef_n \le \nu E.$$
?? Suppose, if possible, that there is an $H\in\Sigma$ such that $\int_Hf\ne\nu H$. Set
$$\nu_1F = \nu F-\int_Ff \ge 0$$
for every $F\in\Sigma$; then by (a) of this proof and 232C, $\nu_1$ is a truly continuous finitely additive functional, and we are supposing that $\nu_1\ne0$. By (b) of this proof, there is a non-negative simple function $g$ such that $\int_Fg\le\nu_1F$ for every $F\in\Sigma$ and $\int g>0$. Take $n\in\mathbb{N}$ such that $\int f_n+\int g>\gamma$. Then $f_n+g$ is a non-negative simple function and
$$\int_F(f_n+g) = \int_Ff_n+\int_Fg \le \int_Ff+\int_Fg = \nu F-\nu_1F+\int_Fg \le \nu F$$
for any $F\in\Sigma$, so $f_n+g\in\Phi$, and
$$\gamma < \int f_n+\int g = \int(f_n+g) \le \gamma,$$
which is absurd. X X Thus we have $\int_Hf = \nu H$ for every $H\in\Sigma$.
(d) This proves the theorem for non-negative ν. For general ν, we need only observe that ν is expressible as ν1 − ν2,
where ν1 and ν2 are non-negative truly continuous countably additive functionals, by 232C; so that there are integrable
functions f1, f2 such that νiF = ∫_F fi for both i and every F ∈ Σ. Of course f = f1 − f2 is integrable and νF = ∫_F f
for every F ∈ Σ. This completes the proof.

232F Corollary Let (X, Σ, µ) be a σ-finite measure space and ν : Σ → R a function. Then there is a µ-integrable
function f such that νE = ∫_E f for every E ∈ Σ iff ν is countably additive and absolutely continuous with respect to
µ.
proof Put 232Bc and 232E together.
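
As a minimal sketch of what 232F says in the simplest possible case, take a finite set X with Σ = PX, so that every functional is determined by its values on singletons. The particular values of `mu` and `nu` below, and the helper names `all_subsets`, `value`, `density` and `integral`, are illustrative assumptions and not notation from the text. Under these assumptions a Radon-Nikodým derivative can be taken to be f(x) = ν{x}/µ{x} at points of non-zero measure, with an arbitrary value (here 0) on the µ-negligible points.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite measure space: X = {0, 1, 2, 3}, Sigma = all subsets of X.
# mu is a measure and nu a signed functional, both given by their values on singletons.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(0), 3: Fraction(1, 4)}
nu = {0: Fraction(3, 2), 1: Fraction(-1, 2), 2: Fraction(0), 3: Fraction(1)}


def all_subsets(points):
    """Every member of Sigma = PX, as a tuple of points."""
    pts = list(points)
    return [E for r in range(len(pts) + 1) for E in combinations(pts, r)]


def value(functional, E):
    """functional(E), computed by additivity over the points of E."""
    return sum((functional[x] for x in E), Fraction(0))


def is_absolutely_continuous(nu, mu):
    """On a finite space this reduces to: nu{x} = 0 whenever mu{x} = 0."""
    return all(nu[x] == 0 for x in mu if mu[x] == 0)


def density(nu, mu):
    """f(x) = nu{x} / mu{x} where mu{x} > 0; the value on mu-null points is irrelevant."""
    return {x: (nu[x] / mu[x] if mu[x] != 0 else Fraction(0)) for x in mu}


def integral(f, mu, E):
    """Integral of f over E with respect to mu."""
    return sum((f[x] * mu[x] for x in E), Fraction(0))


f = density(nu, mu)
```

Changing the value of `f` at the point 2, where µ{2} = 0, gives another function with the same indefinite integral, which is the non-uniqueness discussed in 232Hd.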

232G Corollary Let (X, Σ, µ) be a totally finite measure space and ν : Σ → R a function. Then there is a µ-integrable
function f on X such that νE = ∫_E f for every E ∈ Σ iff ν is finitely additive and absolutely continuous
with respect to µ.
proof Put 232Bd and 232E together.

232H Remarks (a) Most authors are satisfied with 232F as the ‘Radon-Nikodým theorem’. In my view the problem
of identifying indefinite integrals is of sufficient importance to justify an analysis which applies to all measure spaces,
even if it requires a new concept (the notion of ‘truly continuous’ functional).

(b) I ought to offer an example of an absolutely continuous functional which is not truly continuous. A simple one
is the following. Let X be any uncountable set. Let Σ be the countable-cocountable σ-algebra of subsets of X and ν
the countable-cocountable measure on X (211R). Let µ be the restriction to Σ of counting measure on X. If µE = 0
then E = ∅ and νE = 0, so ν is absolutely continuous. But for any E of finite measure we have ν(X \ E) = 1, so ν is
not truly continuous. See also 232Xf(i).

*(c) The space (X, Σ, µ) of this example is, in terms of the classification developed in Chapter 21, somewhat
irregular; for instance, it is neither locally determined nor localizable, and therefore not strictly localizable, though it
is complete and semi-finite. Can this phenomenon occur in a strictly localizable measure space? We are led here into
a fascinating question. Suppose, in (b), I used the same idea, but with Σ = PX. No difficulty arises in constructing µ;
but can there now be a ν with the required properties, that is, a non-zero countably additive functional from PX to
R which is zero on all finite sets? This is the ‘Banach-Ulam problem’, on which I have written extensively elsewhere
(Fremlin 93), and to which I will return in Volume 5. The present question is touched on again in 363S in Volume 3.

(d) Following the Radon-Nikodým theorem, the question immediately arises: for a given ν, how much possible
variation is there in the corresponding f ? The answer is straightforward enough: two integrable functions f and g give
rise to the same indefinite integral iff they are equal almost everywhere (131Hb).

(e) I have stated the Radon-Nikodým theorem in terms of arbitrary integrable functions, meaning to interpret
‘integrability’ in a wide sense, according to the conventions of Volume 1. However, given a truly continuous countably
additive functional ν, we can ask whether there is in any sense a canonical integrable function representing it. The
answer is no. But we certainly do not need to take arbitrary integrable functions of the type considered in Chapter
12. If f is any integrable function, there is a conegligible set E such that f ↾E is measurable, and now we can find a
conegligible measurable set G ⊆ E ∩ dom f ; if we set g(x) = f (x) for x ∈ G, 0 for x ∈ X \ G, then f =a.e. g, so g has
the same indefinite integral as f (as noted in (d) just above), while g is measurable and has domain X. Thus we can
make a trivial, but sometimes convenient, refinement to the theorem: if (X, Σ, µ) is a measure space, and ν : Σ → R is
finitely additive and truly continuous with respect to µ, then there is a Σ-measurable µ-integrable function g : X → R
such that ∫_E g = νE for every E ∈ Σ.

(f ) It is convenient to introduce now a general definition. If (X, Σ, µ) is a measure space and ν is a [−∞, ∞]-valued
functional defined on a family of subsets of X, I will say that a [−∞, ∞]-valued function f defined on a subset of X is
a Radon-Nikodým derivative of ν with respect to µ if ∫_E f dµ is defined (in the sense of 214D) and equal to νE for
every E ∈ dom ν. Thus the integrable functions called f in 232E-232G are all ‘Radon-Nikodým derivatives’; later on
we shall have less well-regulated examples.
When ν is a measure and f is non-negative, f may be called a density function.

(g) Throughout the work above I have taken it that ν is defined on the whole domain Σ of µ. In some of the most
important applications, however, ν is defined only on some smaller σ-algebra T. In this case we commonly seek to
apply the same results with µ↾ T in place of µ.

232I The Lebesgue decomposition of a countably additive functional: Proposition (a) Let (X, Σ, µ) be
a measure space and ν : Σ → R a countably additive functional. Then ν has unique expressions as
ν = νs + νac = νs + νtc + νe,
where νtc is truly continuous with respect to µ, νs is singular with respect to µ, and νe is absolutely continuous with
respect to µ and zero on every set of finite measure.
(b) If X = R^r, Σ is the algebra of Borel sets in R^r and µ is the restriction of Lebesgue measure to Σ, then ν is
uniquely expressible as νp + νcs + νac where νac is absolutely continuous with respect to µ, νcs is singular with respect
to µ and zero on singletons, and νpE = ∑_{x∈E} νp{x} for every E ∈ Σ.
proof (a)(i) Suppose first that ν is non-negative. In this case, set
νs E = sup{ν(E ∩ F ) : F ∈ Σ, µF = 0},

νt E = sup{ν(E ∩ F ) : F ∈ Σ, µF < ∞}.


Then both νs and νt are countably additive. P Surely νs∅ = νt∅ = 0. Let ⟨En⟩_{n∈N} be a disjoint sequence in Σ with
union E. (α) If F ∈ Σ and µF = 0, then
ν(E ∩ F) = ∑_{n=0}^∞ ν(En ∩ F) ≤ ∑_{n=0}^∞ νs(En);
as F is arbitrary,
νsE ≤ ∑_{n=0}^∞ νsEn.
(β) If F ∈ Σ and µF < ∞, then
ν(E ∩ F) = ∑_{n=0}^∞ ν(En ∩ F) ≤ ∑_{n=0}^∞ νt(En);
as F is arbitrary,
νtE ≤ ∑_{n=0}^∞ νtEn.
(γ) If ǫ > 0, then (because ∑_{n=0}^∞ νEn = νE < ∞) there is an n ∈ N such that ∑_{k=n+1}^∞ νEk ≤ ǫ. Now, for each k ≤ n,
there is an Fk ∈ Σ such that µFk = 0 and ν(Ek ∩ Fk) ≥ νsEk − ǫ/(n+1). In this case, F = ⋃_{k≤n} Fk ∈ Σ, µF = 0 and
νsE ≥ ν(E ∩ F) ≥ ∑_{k=0}^n ν(Ek ∩ Fk) ≥ ∑_{k=0}^n νsEk − ǫ ≥ ∑_{k=0}^∞ νsEk − 2ǫ,
because
∑_{k=n+1}^∞ νsEk ≤ ∑_{k=n+1}^∞ νEk ≤ ǫ.
As ǫ is arbitrary,
νsE ≥ ∑_{k=0}^∞ νsEk.
(δ) Similarly, for each k ≤ n, there is an Fk′ ∈ Σ such that µFk′ < ∞ and ν(Ek ∩ Fk′) ≥ νtEk − ǫ/(n+1). In this case,
F′ = ⋃_{k≤n} Fk′ ∈ Σ, µF′ < ∞ and
νtE ≥ ν(E ∩ F′) ≥ ∑_{k=0}^n ν(Ek ∩ Fk′) ≥ ∑_{k=0}^n νtEk − ǫ ≥ ∑_{k=0}^∞ νtEk − 2ǫ,
because
∑_{k=n+1}^∞ νtEk ≤ ∑_{k=n+1}^∞ νEk ≤ ǫ.
As ǫ is arbitrary,
νtE ≥ ∑_{k=0}^∞ νtEk.
(ǫ) Putting these together, νsE = ∑_{n=0}^∞ νsEn and νtE = ∑_{n=0}^∞ νtEn. As ⟨En⟩_{n∈N} is arbitrary, νs and νt are countably
additive. Q
(ii) Still supposing that ν is non-negative, if we choose a sequence ⟨Fn⟩_{n∈N} in Σ such that µFn = 0 for each n
and lim_{n→∞} νFn = νsX, then F∗ = ⋃_{n∈N} Fn has µF∗ = 0, νF∗ = νsX; so that νs(X \ F∗) = 0, and νs is singular
with respect to µ in the sense of 232Ac.
Note that νs F = νF whenever µF = 0. So if we write νac = ν − νs , then νac is a countably additive functional and
νac F = 0 whenever µF = 0; that is, νac is absolutely continuous with respect to µ.

If we write νtc = νt − νs , then νtc is a non-negative countably additive functional; νtc F = 0 whenever µF = 0, and
if νtc E > 0 there is a set F with µF < ∞ and νtc (E ∩ F ) > 0. So νtc is truly continuous with respect to µ, by 232Bb.
Set νe = ν − νt = νac − νtc .
Thus for any non-negative countably additive functional ν, we have expressions
ν = νs + νac , νac = νtc + νe
where νs , νac , νtc and νe are all non-negative countably additive functionals, νs is singular with respect to µ, νac and νe
are absolutely continuous with respect to µ, νtc is truly continuous with respect to µ, and νe F = 0 whenever µF < ∞.
(iii) For general countably additive functionals ν : Σ → R, we can express ν as ν′ − ν′′, where ν′ and ν′′
are non-negative countably additive functionals. If we define νs′, νs′′, . . . , νe′′ as in (i)-(ii), we get countably additive
functionals
νs = νs′ − νs′′, νac = νac′ − νac′′, νtc = νtc′ − νtc′′, νe = νe′ − νe′′
such that νs is singular with respect to µ (if F′, F′′ are such that
µF′ = µF′′ = νs′(X \ F′) = νs′′(X \ F′′) = 0,
then µ(F′ ∪ F′′) = 0 and νsE = 0 whenever E ⊆ X \ (F′ ∪ F′′)), νac is absolutely continuous with respect to µ, νtc is
truly continuous with respect to µ, and νeF = 0 whenever µF < ∞, while
ν = νs + νac = νs + νtc + νe.

(iv) Moreover, these decompositions are unique. P (α) If, for instance, ν = ν̃s + ν̃ac, where ν̃s is singular and ν̃ac
is absolutely continuous with respect to µ, let F, F̃ be such that µF = µF̃ = 0 and ν̃sE = 0 whenever E ∩ F̃ = ∅,
νsE = 0 whenever E ∩ F = ∅; then we must have
νac(E ∩ (F ∪ F̃)) = ν̃ac(E ∩ (F ∪ F̃)) = 0
for every E ∈ Σ, so
νsE = ν(E ∩ (F ∪ F̃)) = ν̃sE
for every E ∈ Σ. Thus ν̃s = νs and ν̃ac = νac.
(β) Similarly, if νac = ν̃tc + ν̃e where ν̃tc is truly continuous with respect to µ and ν̃eF = 0 whenever µF < ∞,
then there are sequences ⟨Fn⟩_{n∈N}, ⟨F̃n⟩_{n∈N} of sets of finite measure such that νtcF = 0 whenever F ∩ ⋃_{n∈N} Fn = ∅
and ν̃tcF = 0 whenever F ∩ ⋃_{n∈N} F̃n = ∅. Write F∗ = ⋃_{n∈N} (Fn ∪ F̃n); then ν̃eE = νeE = 0 whenever E ⊆ F∗ and
ν̃tcE = νtcE = 0 whenever E ∩ F∗ = ∅, so νeE = νac(E \ F∗) = ν̃eE for every E ∈ Σ, and νe = ν̃e, νtc = ν̃tc. Q
(b) In this case, µ is σ-finite (cf. 211P), so every absolutely continuous countably additive functional is truly
continuous (232Bc), and we shall always have νe = 0, νac = νtc. But in the other direction we know that singleton
sets, and therefore countable sets, are all measurable. We therefore have a further decomposition νs = νp + νcs, where
there is a countable set K ⊆ R^r with νpE = 0 whenever E ∈ Σ, E ∩ K = ∅, and νcs is singular with respect to µ and
zero on countable sets. P (i) If ν ≥ 0, set
νpE = sup{ν(E ∩ K) : K ⊆ R^r is countable};
just as with νs, dealt with in (a) above, νp is countably additive and there is a countable K ⊆ R^r such that νpE =
ν(E ∩ K) for every E ∈ Σ. (ii) For general ν, we can express ν as ν′ − ν′′ where ν′ and ν′′ are non-negative, and write
νp = νp′ − νp′′. (iii) νp is characterized by saying that there is a countable set K such that νpE = ν(E ∩ K) for every
E ∈ Σ and ν{x} = 0 for every x ∈ R^r \ K. (iv) So if we set νcs = νs − νp, νcs will be singular with respect to µ and
zero on countable sets. Q
Now, for any E ∈ Σ,
νpE = ν(E ∩ K) = ∑_{x∈K∩E} ν{x} = ∑_{x∈E} ν{x}.

Remark The expression ν = νp + νcs + νac of (b) is the Lebesgue decomposition of ν.
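
The three-part decomposition of 232I(a) can be sketched on a finite set X with Σ = PX, where every functional is determined by its values on singletons. The space below, the values of `mu` and `nu`, and the helper `restrict` are hypothetical choices made only for illustration. In this setting, as an assumption specific to the finite case, νs is the part of ν carried by the points of measure zero, νtc the part carried by points of finite non-zero measure, and νe the part carried by points of infinite measure.

```python
from fractions import Fraction
import math

INF = math.inf

# Hypothetical example: four points; mu{x} may be 0, finite and positive, or infinite.
mu = {"a": Fraction(0), "b": Fraction(1, 2), "c": INF, "d": Fraction(2)}
nu = {"a": Fraction(1), "b": Fraction(3), "c": Fraction(5), "d": Fraction(-1)}


def restrict(nu, keep):
    """The functional E -> nu(E ∩ {x : keep(x)}), given by its values on singletons."""
    return {x: (nu[x] if keep(x) else Fraction(0)) for x in nu}


# nu_s is carried by a mu-negligible set, so it is singular with respect to mu.
nu_s = restrict(nu, lambda x: mu[x] == 0)
# nu_tc is carried by a set of finite measure and vanishes on negligible sets.
nu_tc = restrict(nu, lambda x: 0 < mu[x] < INF)
# nu_e vanishes on every set of finite measure.
nu_e = restrict(nu, lambda x: mu[x] == INF)
# The absolutely continuous part is the sum of the last two.
nu_ac = {x: nu_tc[x] + nu_e[x] for x in nu}
```

Here the point "c" carries all of νe: it is absolutely continuous with respect to µ but not truly continuous, in the same spirit as the example of 232Hb.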

232X Basic exercises (a) Let (X, Σ, µ) be a measure space and ν : Σ → R a countably additive functional which
is absolutely continuous with respect to µ. Show that the following are equiveridical: (i) ν is truly continuous with
respect to µ; (ii) there is a sequence ⟨En⟩_{n∈N} in Σ such that µEn < ∞ for every n ∈ N and νF = 0 whenever F ∈ Σ
and F ∩ ⋃_{n∈N} En = ∅.

> (b) Let g : R → R be a bounded non-decreasing function and µg the associated Lebesgue-Stieltjes measure
(114Xa). Show that µg is absolutely continuous (equivalently, truly continuous) with respect to Lebesgue measure iff
the restriction of g to any closed bounded interval is absolutely continuous in the sense of 225B.

(c) Let X be a set and Σ a σ-algebra of subsets of X; let ν : Σ → R be a countably additive functional. Let I be
an ideal of Σ, that is, a subset of Σ such that (α) ∅ ∈ I (β) E ∪ F ∈ I for all E, F ∈ I (γ) if E ∈ Σ, F ∈ I and
E ⊆ F then E ∈ I. Show that ν has a unique decomposition as ν = νI + νI′ , where νI and νI′ are countably additive
functionals, νI′E = 0 for every E ∈ I, and whenever E ∈ Σ, νIE ≠ 0 there is an F ∈ I such that νI(E ∩ F) ≠ 0.

> (d) Let X be a non-empty set and Σ a σ-algebra of subsets of X. Show that for any sequence hνn in∈N of
countably additive functionals on Σ there is a probability measure µ on X, with domain Σ, such that every νn is
absolutely continuous with respect to µ. (Hint: start with the case νn ≥ 0.)

(e) Let (X, Σ, µ) be a measure space and (X, Σ̂, µ̂) its completion (212C). Let ν : Σ → R be an additive functional
such that νE = 0 whenever µE = 0. Show that ν has a unique extension to an additive functional ν̂ : Σ̂ → R such
that ν̂E = 0 whenever µ̂E = 0.

(f ) Let F be an ultrafilter on N including the filter {N\I : I ⊆ N is finite} (2A1O). Define ν : PN → {0, 1} by setting
νE = 1 if E ∈ F, 0 for E ∈ PN \ F. (i) Let µ1 be counting measure on PN. Show that ν is additive and absolutely
continuous with respect to µ1, but is not truly continuous. (ii) Define µ2 : PN → [0, 1] by setting µ2E = ∑_{n∈E} 2^{−n−1}.
Show that ν is zero on µ2-negligible sets, but is not absolutely continuous with respect to µ2.

(g) Rewrite this section in terms of complex-valued additive functionals.

(h) Let (X, Σ, µ) be a measure space, and ν and λ additive functionals on Σ of which ν is positive and countably
additive, so that (X, Σ, ν) also is a measure space. (i) Show that if ν is absolutely continuous with respect to µ and
λ is absolutely continuous with respect to ν, then λ is absolutely continuous with respect to µ. (ii) Show that if ν is
truly continuous with respect to µ and λ is absolutely continuous with respect to ν then λ is truly continuous with
respect to µ.

232Y Further exercises (a) Let (X, Σ, µ) be a measure space and ν : Σ → R a finitely additive functional. If E,
F , H ∈ Σ and µH < ∞ set ρH (E, F ) = µ(H ∩ (E△F )). (i) Show that ρH is a pseudometric on Σ (2A3Fa). (ii) Let
T be the topology on Σ generated by {ρH : H ∈ Σ, µH < ∞} (2A3Fc). Show that ν is continuous for T iff it is truly
continuous in the sense of 232Ab. (T is the topology of convergence in measure on Σ.)

(b) For a non-decreasing function F : [a, b] → R, where a < b, let νF be the corresponding Lebesgue-Stieltjes
measure. Show that if we define (νF )ac , etc., with regard to Lebesgue measure on [a, b], as in 232I, then
(νF )p = νFp , (νF )ac = νFac , (νF )cs = νFcs ,
where Fp , Fcs and Fac are defined as in 226C.

(c) Extend the idea of (b) to general functions F of bounded variation.

(d) Extend the ideas of (b) and (c) to open, half-open and unbounded intervals (cf. 226Ya).

(e) Let (X, Σ, µ) be a measure space and (X, Σ̃, µ̃) its c.l.d. version (213E). Let ν : Σ → R be an additive functional
which is truly continuous with respect to µ. Show that ν has a unique extension to a functional ν̃ : Σ̃ → R which is
truly continuous with respect to µ̃.

(f ) Let (X, Σ, µ) be a measure space and f a µ-integrable real-valued function. Show that the indefinite integral of
f is the unique countably additive functional ν : Σ → R such that whenever E ∈ Σ and f (x) ∈ [a, b] for almost every
x ∈ E, then aµE ≤ νE ≤ bµE.

(g) Say that two bounded additive functionals ν1 , ν2 on an algebra Σ of sets are mutually singular if for any ǫ > 0
there is an H ∈ Σ such that
sup{|ν1 F | : F ∈ Σ, F ⊆ H} ≤ ǫ,

sup{|ν2 F | : F ∈ Σ, F ∩ H = ∅} ≤ ǫ.

(i) Show that ν1 and ν2 are mutually singular iff, in the language of 231Ya-231Yb, |ν1 | ∧ |ν2 | = 0.
(ii) Show that if Σ is a σ-algebra and ν1 and ν2 are countably additive, then they are mutually singular iff there
is an H ∈ Σ such that ν1 F = 0 whenever F ∈ Σ and F ⊆ H, while ν2 F = 0 whenever F ∈ Σ and F ∩ H = ∅.
(iii) Show that if νs , νtc and νe are defined from ν and µ as in 232I, then each pair of the three are mutually
singular.

(h) Let (X, Σ, µ) be a measure space and f a non-negative real-valued function which is integrable over X; let ν be
its indefinite integral. Show that for any function g : X → R, ∫ g dν = ∫ f × g dµ in the sense that if one of these is
defined in [−∞, ∞] so is the other, and they are then equal. (Hint: start with simple functions g.)

(i) Let (X, Σ, µ) be a measure space, f an integrable function, and ν : Σ → R the indefinite integral of f . Show
that |ν|, as defined in 231Ya, is the indefinite integral of |f |.

(j) Let X be a set, Σ a σ-algebra of subsets of X, and ν : Σ → R a countably additive functional. Show that ν has
a Radon-Nikodým derivative with respect to |ν| as defined in 231Ya, and that any such derivative has modulus equal
to 1 |ν|-a.e.

(k) (H.König) Let X be a set and µ, ν two measures on X with the same domain Σ. For α ≥ 0, E ∈ Σ set
(αµ ∧ ν)(E) = inf{αµ(E ∩ F ) + ν(E \ F ) : F ∈ Σ} (cf. 112Ya1 ). Show that the following are equiveridical: (i) νE = 0
whenever µE = 0; (ii) supα≥0 (αµ ∧ ν)(E) = νE for every E ∈ Σ.

232 Notes and comments The Radon-Nikodým theorem must be on any list of the half-dozen most important
theorems of measure theory, and not only the theorem itself, but the techniques necessary to prove it, are at the heart
of the subject. In my book Fremlin 74 I discussed a variety of more or less abstract versions of the theorem and of
the method, to some of which I will return in §§327 and 365 of the next volume.
As I have presented it here, the essence of the proof is split between 231E and 232E. I think we can distinguish the
following elements. Let ν be a countably additive functional.
(i) ν is bounded (231Ea).
(ii) ν is expressible as the difference of non-negative functionals (231F).
(I gave this as a corollary of 231Eb, but it can also be proved by simpler methods, as in 231Ya.)
(iii) If ν > 0, there is an integrable f such that 0 < νf ≤ ν,
writing νf for the indefinite integral of f . (This is the point at which we really do need the Hahn decomposition 231Eb.)
(iv) The set Ψ = {f : νf ≤ ν} is closed under countable suprema, so there is an f ∈ Ψ maximising ∫ f.
(In part (b) of the proof of 232E, I spoke of simple functions; but this was solely to simplify the technical details, and
the same argument works if we apply it to Ψ instead of Φ. Note the use here of B.Levi’s theorem.)
(v) Take f from (iv) and use (iii) to show that ν − νf = 0.
Each of the steps (i)-(iv) requires a non-trivial idea, and the importance of the theorem lies not only in its remarkable
direct consequences in the rest of this chapter and elsewhere, but in the versatility and power of these ideas.
I introduce the idea of ‘truly continuous’ functional in order to give a reasonably straightforward account of the status
of the Radon-Nikodým theorem in non-σ-finite measure spaces. Of course the whole point is that a truly continuous
functional, like an indefinite integral, must be concentrated on a σ-finite part of the space (232Xa), so that 232E, as
stated, can be deduced easily from the standard form 232F. I dare to use the word ‘truly’ in this context because this
kind of continuity does indeed correspond to a topological notion (232Ya).
There is a possible trap in the definition I give of ‘absolutely continuous’ functional. Many authors use the condition
of 232Ba as a definition, saying that ν is absolutely continuous with respect to µ if νE = 0 whenever µE = 0. For
countably additive functionals this coincides with the ǫ-δ formulation in 232Aa; but for other additive functionals this
need not be so (232Xf(ii)). Mostly the distinction is insignificant, but I note that in 232Bd it is critical, since ν there
is not assumed to be countably additive.
In 232I I describe one of the many ways of decomposing a countably additive functional into mutually singular parts
with special properties. In 231Yf-231Yg I have already suggested a method of decomposing an additive functional into
the sum of a countably additive part and a ‘purely finitely additive’ part. All these results have natural expressions in
terms of the ordered linear space of bounded additive functionals on an algebra (231Yc).

1 Formerly 112Yb.

233 Conditional expectations


I devote a section to a first look at one of the principal applications of the Radon-Nikodým theorem. It is one of
the most vital ideas of measure theory, and will appear repeatedly in one form or another. Here I give the definition
and most basic properties of conditional expectations as they arise in abstract probability theory, with notes on convex
functions and a version of Jensen’s inequality (233I-233J).

233A σ-subalgebras Let X be a set and Σ a σ-algebra of subsets of X. A σ-subalgebra of Σ is a σ-algebra T of
subsets of X such that T ⊆ Σ. If (X, Σ, µ) is a measure space and T is a σ-subalgebra of Σ, then (X, T, µ↾ T) is again
a measure space; this is immediate from the definition (112A). Now we have the following straightforward lemma. It
is a special case of 235G below, but I give a separate proof in case you do not wish as yet to embark on the general
investigation pursued in §235.

233B Lemma Let (X, Σ, µ) be a measure space and T a σ-subalgebra of Σ. A real-valued function f defined on a
subset of X is µ↾ T-integrable iff (i) it is µ-integrable (ii) dom f is µ↾ T-conegligible (iii) f is µ↾ T-virtually measurable;
and in this case ∫ f d(µ↾ T) = ∫ f dµ.
proof (a) Note first that if f is a µ↾ T-simple function, that is, is expressible as ∑_{i=0}^n ai χEi where ai ∈ R, Ei ∈ T and
(µ↾ T)Ei < ∞ for each i, then f is µ-simple and
∫ f dµ = ∑_{i=0}^n ai µEi = ∫ f d(µ↾ T).

(b) Let U_µ be the set of non-negative µ-integrable functions and U_{µ↾ T} the set of non-negative µ↾ T-integrable
functions.
Suppose f ∈ U_{µ↾ T}. Then there is a non-decreasing sequence ⟨fn⟩_{n∈N} of µ↾ T-simple functions such that f =
lim_{n→∞} fn µ↾ T-a.e. and
∫ f d(µ↾ T) = lim_{n→∞} ∫ fn d(µ↾ T).
But now every fn is also µ-simple, and ∫ fn dµ = ∫ fn d(µ↾ T) for every n, and f = lim_{n→∞} fn µ-a.e. So f ∈ U_µ and
∫ f dµ = ∫ f d(µ↾ T).

(c) Now suppose that f is µ↾ T-integrable. Then it is the difference of two members of U_{µ↾ T}, so is µ-integrable, and
∫ f dµ = ∫ f d(µ↾ T). Also conditions (ii) and (iii) are satisfied, according to the conventions established in Volume 1
(122Nc, 122P-122Q).
(d) Suppose that f satisfies conditions (i)-(iii). Then |f| ∈ U_µ, and there is a conegligible set E ⊆ dom f such that
E ∈ T and f↾E is T-measurable. Accordingly |f|↾E is T-measurable. Now, if ǫ > 0, then
(µ↾ T){x : x ∈ E, |f|(x) ≥ ǫ} = µ{x : x ∈ E, |f|(x) ≥ ǫ} ≤ (1/ǫ) ∫ |f| dµ < ∞;
moreover,
sup{∫ g d(µ↾ T) : g is a µ↾ T-simple function, g ≤ |f| µ↾ T-a.e.}
  = sup{∫ g dµ : g is a µ↾ T-simple function, g ≤ |f| µ↾ T-a.e.}
  ≤ sup{∫ g dµ : g is a µ-simple function, g ≤ |f| µ-a.e.}
  ≤ ∫ |f| dµ < ∞.

By the criterion of 122Ja, |f| ∈ U_{µ↾ T}. Consequently f, being µ↾ T-virtually T-measurable, is µ↾ T-integrable, by 122P.
This completes the proof.

233C Remarks (a) My argument just above is detailed to the point of pedantry. I think, however, that while I
can be accused of wasting paper by writing everything down, every element of the argument is necessary to the result.
To be sure, some of the details are needed only because I use such a wide notion of ‘integrable function’; if you restrict
the notion of ‘integrability’ to measurable functions defined on the whole measure space, there are simplifications at
this stage, to be paid for later when you discover that many of the principal applications are to functions defined by
formulae which do not apply on the whole underlying space.
The essential point which does have to be grasped is that while a µ↾ T-negligible set is always µ-negligible, a
µ-negligible set need not be µ↾ T-negligible.

(b) As the simplest possible example of the problems which can arise, I offer the following. Let (X, Σ, µ) be [0, 1]^2
with Lebesgue measure. Let T be the set of those members of Σ expressible as F × [0, 1] for some F ⊆ [0, 1]; it is easy
to see that T is a σ-subalgebra of Σ. Consider f , g : X → [0, 1] defined by saying that
f (t, u) = 1 if u > 0, 0 otherwise,

g(t, u) = 1 if t > 0, 0 otherwise.


Then both f and g are µ-integrable, being constant µ-a.e. But only g is µ↾ T-integrable, because any non-negligible
E ∈ T includes a complete vertical section {t} × [0, 1], so that f takes both values 0 and 1 on E. If we set
h(t, u) = 1 if u > 0, undefined otherwise,
then again (on the conventions I use) h is µ-integrable but not µ↾ T-integrable, as there is no conegligible member of
T included in the domain of h.

(c) If f is defined everywhere in X, and µ↾ T is complete, then of course f is µ↾ T-integrable iff it is µ-integrable
and T-measurable. But note that in the example just above, which is one of the archetypes for this topic, µ↾ T is not
complete, as singleton sets are negligible but not measurable.

233D Conditional expectations Let (X, Σ, µ) be a probability space, that is, a measure space with µX = 1.
(Nearly all the ideas here work perfectly well for any totally finite measure space, but there seems nothing to be gained
from the extension, and the traditional phrase ‘conditional expectation’ demands a probability space.) Let T ⊆ Σ be
a σ-subalgebra.

(a) For any µ-integrable real-valued function f defined on a conegligible subset of X, we have a corresponding
indefinite integral νf : Σ → R given by the formula νf E = ∫_E f for every E ∈ Σ. We know that νf is countably
additive and truly continuous with respect to µ, which in the present context is the same as saying that it is absolutely
continuous (232Bc-232Bd). Now consider the restrictions µ↾ T, νf↾ T of µ and νf to the σ-algebra T. It follows directly
from the definitions of ‘countably additive’ and ‘absolutely continuous’ that νf↾ T is countably additive and absolutely
continuous with respect to µ↾ T, therefore truly continuous with respect to µ↾ T. Consequently, the Radon-Nikodým
theorem (232E) tells us that there is a µ↾ T-integrable function g such that (νf↾ T)F = ∫_F g d(µ↾ T) for every F ∈ T.

(b) Let us define a conditional expectation of f on T to be such a function; that is, a µ↾ T-integrable function
g such that ∫_F g d(µ↾ T) = ∫_F f dµ for every F ∈ T. Looking back at 233B, we see that for such a g we have
∫_F g d(µ↾ T) = ∫ g × χF d(µ↾ T) = ∫ g × χF dµ = ∫_F g dµ
for every F ∈ T; also, that g is almost everywhere equal to a T-measurable function defined everywhere in X which is
also a conditional expectation of f on T (232He).

(c) I set the word ‘a’ of the phrase ‘a conditional expectation’ in bold type to emphasize that there is nothing
unique about the function g. In 242J I will return to this point, and describe an object which could properly be called
‘the’ conditional expectation of f on T. g is ‘essentially unique’ only in the sense that if g1 , g2 are both conditional
expectations of f on T then g1 = g2 µ↾ T-a.e. (131Hb). This does of course mean that a very large number of its
properties – for instance, the distribution function G(a) = µ̂{x : g(x) ≤ a}, where µ̂ is the completion of µ (212C) –
are independent of which g we take.

(d) A word of explanation of the phrase ‘conditional expectation’ is in order. This derives from the standard
identification of probability with measure, due to Kolmogorov, which I will discuss more fully in Chapter 27. A
real-valued random variable may be regarded as a measurable, or virtually measurable, function f on a probability space
(X, Σ, µ); its ‘expectation’ becomes identified with ∫ f dµ, supposing that this exists. If F ∈ Σ and µF > 0 then the
‘conditional expectation of f given F’ is (1/µF) ∫_F f. If F0, . . . , Fn is a partition of X into measurable sets of non-zero
measure, then the function g given by
g(x) = (1/µFi) ∫_{Fi} f if x ∈ Fi
is a kind of anticipated conditional expectation; if we are one day told that x ∈ Fi , then g(x) will be our subsequent
estimate of the expectation of f . In the terms of the definition above, g is a conditional expectation of f on the finite
algebra T generated by {F0 , . . . , Fn }. An appropriate intuition for general σ-algebras T is that they consist of the
events which we shall be able to observe at some stated future time t0 , while the whole algebra Σ consists of all events,
including those not observable until times later than t0 , if ever.
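
The partition formula of 233D(d) can be sketched directly on a finite probability space. The six-point space, the random variable f(x) = x^2, the partition and the function name `conditional_expectation` are all hypothetical choices for illustration; the only thing taken from the text is the formula g(x) = (1/µFi) ∫_{Fi} f for x ∈ Fi.

```python
from fractions import Fraction

# Hypothetical probability space: six equally likely points.
X = range(6)
mu = {x: Fraction(1, 6) for x in X}
f = {x: Fraction(x * x) for x in X}      # the random variable f(x) = x^2
partition = [{0, 1}, {2, 3, 4}, {5}]     # F_0, F_1, F_2, each of non-zero measure


def conditional_expectation(f, mu, partition):
    """g(x) = (1 / mu F_i) * (integral of f over F_i) for x in F_i."""
    g = {}
    for F in partition:
        mass = sum(mu[x] for x in F)
        average = sum(f[x] * mu[x] for x in F) / mass
        for x in F:
            g[x] = average
    return g


g = conditional_expectation(f, mu, partition)
```

By construction g is constant on each Fi, so it is measurable with respect to the finite algebra T generated by the partition, and ∫_F g = ∫_F f for every F ∈ T because each such F is a union of cells.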

233E I list some of the elementary facts concerning conditional expectations.


Proposition Let (X, Σ, µ) be a probability space and T a σ-subalgebra of Σ. Let ⟨fn⟩_{n∈N} be a sequence of µ-integrable
real-valued functions, and for each n let gn be a conditional expectation of fn on T. Then
(a) g1 + g2 is a conditional expectation of f1 + f2 on T;
(b) for any c ∈ R, cg0 is a conditional expectation of cf0 on T;
(c) if f1 ≤a.e. f2 then g1 ≤a.e. g2;
(d) if ⟨fn⟩_{n∈N} is non-decreasing a.e. and f = lim_{n→∞} fn is µ-integrable, then lim_{n→∞} gn is a conditional expectation
of f on T;
(e) if f = limn→∞ fn is defined a.e. and there is a µ-integrable function h such that |fn | ≤a.e. h for every n, then
limn→∞ gn is a conditional expectation of f on T;
(f) if F ∈ T then g0 × χF is a conditional expectation of f0 × χF on T;
(g) if h is a bounded, µ↾ T-virtually measurable real-valued function defined µ↾ T-almost everywhere in X, then
g0 × h is a conditional expectation of f0 × h on T;
(h) if Υ is a σ-subalgebra of T, then a function h0 is a conditional expectation of f0 on Υ iff it is a conditional
expectation of g0 on Υ.
proof (a)-(b) We have only to observe that
∫_F g1 + g2 d(µ↾ T) = ∫_F g1 d(µ↾ T) + ∫_F g2 d(µ↾ T) = ∫_F f1 dµ + ∫_F f2 dµ = ∫_F f1 + f2 dµ,
∫_F cg0 d(µ↾ T) = c ∫_F g0 d(µ↾ T) = c ∫_F f0 dµ = ∫_F cf0 dµ
for every F ∈ T.
(c) We have
∫_F g1 d(µ↾ T) = ∫_F f1 dµ ≤ ∫_F f2 dµ = ∫_F g2 d(µ↾ T)
for every F ∈ T; consequently g1 ≤ g2 µ↾ T-a.e. (131Ha).
(d) By (c), ⟨gn⟩_{n∈N} is non-decreasing µ↾ T-a.e.; moreover,
sup_{n∈N} ∫ gn d(µ↾ T) = sup_{n∈N} ∫ fn dµ = ∫ f dµ < ∞.
By B.Levi’s theorem, g = lim_{n→∞} gn is defined µ↾ T-almost everywhere, and
∫_F g d(µ↾ T) = lim_{n→∞} ∫_F gn d(µ↾ T) = lim_{n→∞} ∫_F fn dµ = ∫_F f dµ
for every F ∈ T, so g is a conditional expectation of f on T.
(e) Set fn′ = inf_{m≥n} fm, fn′′ = sup_{m≥n} fm for each n ∈ N. Then we have
−h ≤a.e. fn′ ≤ fn ≤ fn′′ ≤a.e. h,
and ⟨fn′⟩_{n∈N}, ⟨fn′′⟩_{n∈N} are almost-everywhere-monotonic sequences of functions both converging almost everywhere to
f. For each n, let gn′, gn′′ be conditional expectations of fn′, fn′′ on T. By (c) and (d) (applied, for the non-increasing
sequence, to ⟨−fn′′⟩_{n∈N}, using (b)), ⟨gn′⟩_{n∈N} and ⟨gn′′⟩_{n∈N} are
almost-everywhere-monotonic sequences converging almost everywhere to conditional expectations g′, g′′ of f. Of
course g′ = g′′ µ↾ T-a.e. (233Dc). Also, for each n, gn′ ≤a.e. gn ≤a.e. gn′′, so ⟨gn⟩_{n∈N} converges to g′ µ↾ T-a.e., and
g = lim_{n→∞} gn is defined almost everywhere and is a conditional expectation of f on T.
(f ) For any H ∈ T,
∫_H g0 × χF d(µ↾ T) = ∫_{H∩F} g0 d(µ↾ T) = ∫_{H∩F} f0 dµ = ∫_H f0 × χF dµ.
(g)(i) If h is actually (µ↾ T)-simple, say h = ∑_{i=0}^n ai χFi where Fi ∈ T for each i, then, by (f ),
∫_F g0 × h d(µ↾ T) = ∑_{i=0}^n ai ∫_F g0 × χFi d(µ↾ T) = ∑_{i=0}^n ai ∫_F f0 × χFi dµ = ∫_F f0 × h dµ
for every F ∈ T. (ii) For the general case, if h is µ↾ T-virtually measurable and |h(x)| ≤ M µ↾ T-almost everywhere,
then there is a sequence ⟨hn⟩_{n∈N} of µ↾ T-simple functions converging to h almost everywhere, and with |hn(x)| ≤ M for
every x, n. Now f0 × hn → f0 × h a.e. and |f0 × hn| ≤a.e. M|f0| for each n, while g0 × hn is a conditional expectation
of f0 × hn for every n, so by (e) we see that lim_{n→∞} g0 × hn will be a conditional expectation of f0 × h; but this is
equal almost everywhere to g0 × h.
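(h) We need note only that $\int_H g_0 \, d(\mu{\restriction}\mathrm{T}) = \int_H f_0 \, d\mu$ for every H ∈ Υ, so
$$\int_H h_0 \, d(\mu{\restriction}\Upsilon) = \int_H g_0 \, d(\mu{\restriction}\mathrm{T}) \text{ for every } H \in \Upsilon$$
$$\iff \int_H h_0 \, d(\mu{\restriction}\Upsilon) = \int_H f_0 \, d\mu \text{ for every } H \in \Upsilon.$$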

233F Remarks Of course the results above are individually nearly trivial (though I think (e) and (g) might give
you pause for thought if they were offered without previous preparation of the ground). Cumulatively they amount to
some quite strong properties. In §242 I will restate them in language which is syntactically more direct, but relies on
a deeper level of abstraction.
As an illustration of the power of conditional expectations to surprise us, I offer the next proposition, which depends
on the concept of ‘convex’ function.

233G Convex functions Recall that a real-valued function φ defined on an interval I ⊆ R is convex if
φ(tb + (1 − t)c) ≤ tφ(b) + (1 − t)φ(c)
whenever b, c ∈ I and t ∈ [0, 1].

Examples The formulae $|x|$, $x^2$, $e^{\pm x} \pm x$ define convex functions on R; on ]−1, 1[ we have $1/(1 - x^2)$; on ]0, ∞[ we have 1/x and x ln x; on [0, 1] we have the function which is zero on ]0, 1[ and 1 on {0, 1}.
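
The defining inequality can be probed numerically. The following Python sketch samples random b, c and t and tests φ(tb + (1 − t)c) ≤ tφ(b) + (1 − t)φ(c) for some of the examples just listed. The helper name `is_convex_on_samples`, the sample intervals and the tolerance are assumptions made for illustration; a passing check is numerical evidence only, not a proof of convexity.

```python
import math
import random

def is_convex_on_samples(phi, lo, hi, trials=2000, seed=0, tol=1e-9):
    """Test phi(t*b + (1-t)*c) <= t*phi(b) + (1-t)*phi(c) on random samples.

    Returns False as soon as one sample violates the inequality
    (beyond a small floating-point tolerance), True otherwise.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        b = rng.uniform(lo, hi)
        c = rng.uniform(lo, hi)
        t = rng.random()
        lhs = phi(t * b + (1 - t) * c)
        rhs = t * phi(b) + (1 - t) * phi(c)
        if lhs > rhs + tol * (1 + abs(rhs)):
            return False
    return True

# Some of the examples above, each restricted to a bounded sample interval.
examples = {
    "|x| on [-5, 5]": (abs, -5.0, 5.0),
    "x^2 on [-5, 5]": (lambda x: x * x, -5.0, 5.0),
    "e^x - x on [-5, 5]": (lambda x: math.exp(x) - x, -5.0, 5.0),
    "1/(1 - x^2) on [-0.99, 0.99]": (lambda x: 1 / (1 - x * x), -0.99, 0.99),
    "x ln x on [0.01, 10]": (lambda x: x * math.log(x), 0.01, 10.0),
}
results = {name: is_convex_on_samples(phi, lo, hi)
           for name, (phi, lo, hi) in examples.items()}

# A non-example for contrast: sin is strictly concave on [0, pi].
sine_result = is_convex_on_samples(math.sin, 0.0, math.pi)
```

The sine function is included only as a contrast case: on [0, π] it is strictly concave, so the sampler finds a violating triple.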

233H The general theory of convex functions is both extensive and important; I list a few of their more salient
properties in 233Xe. For the moment the following lemma covers what we need.

Lemma Let I ⊆ R be a non-empty open interval (bounded or unbounded) and φ : I → R a convex function.
(a) For every a ∈ I there is a b ∈ R such that φ(x) ≥ φ(a) + b(x − a) for every x ∈ I.
(b) If we take, for each q ∈ I ∩ Q, a bq ∈ R such that φ(x) ≥ φ(q) + bq (x − q) for every x ∈ I, then
φ(x) = supq∈I∩Q φ(q) + bq (x − q)
for every x ∈ I.
(c) φ is Borel measurable.

proof (a) If $c, c' \in I$ and $c < a < c'$, then a is expressible as $dc + (1-d)c'$ for some d ∈ ]0, 1[, so that $\phi(a) \le d\phi(c) + (1-d)\phi(c')$ and
$$\begin{aligned}
\frac{\phi(a)-\phi(c)}{a-c} &\le \frac{d\phi(c)+(1-d)\phi(c')-\phi(c)}{dc+(1-d)c'-c} = \frac{(1-d)(\phi(c')-\phi(c))}{(1-d)(c'-c)} \\
&= \frac{d(\phi(c')-\phi(c))}{d(c'-c)} = \frac{\phi(c')-d\phi(c)-(1-d)\phi(c')}{c'-dc-(1-d)c'} \le \frac{\phi(c')-\phi(a)}{c'-a}.
\end{aligned}$$
This means that
$$b = \sup_{c < a,\, c\in I} \frac{\phi(a)-\phi(c)}{a-c}$$
is finite, and $b \le \frac{\phi(c')-\phi(a)}{c'-a}$ whenever $a < c' \in I$; accordingly $\phi(x) \ge \phi(a) + b(x-a)$ for every x ∈ I.

(b) By the choice of the $b_q$, $\phi(x) \ge \sup_{q\in I\cap\mathbb{Q}} \phi(q) + b_q(x-q)$. On the other hand, given x ∈ I, fix y ∈ I such that x < y and let b ∈ R be such that $\phi(z) \ge \phi(x) + b(z-x)$ for every z ∈ I. If q ∈ Q and x < q < y, we have $\phi(y) \ge \phi(q) + b_q(y-q)$, so that $b_q \le \frac{\phi(y)-\phi(q)}{y-q}$ and
$$\begin{aligned}
\phi(q) + b_q(x-q) = \phi(q) - b_q(q-x) &\ge \phi(q) - \frac{\phi(y)-\phi(q)}{y-q}(q-x) \\
&= \frac{y-x}{y-q}\phi(q) - \frac{q-x}{y-q}\phi(y) \ge \frac{y-x}{y-q}(\phi(x) + b(q-x)) - \frac{q-x}{y-q}\phi(y).
\end{aligned}$$
Now
$$\begin{aligned}
\phi(x) &= \lim_{q\downarrow x} \frac{y-x}{y-q}(\phi(x) + b(q-x)) - \frac{q-x}{y-q}\phi(y) \\
&\le \sup_{q\in\mathbb{Q}\cap\,]x,y[} \frac{y-x}{y-q}(\phi(x) + b(q-x)) - \frac{q-x}{y-q}\phi(y) \\
&\le \sup_{q\in\mathbb{Q}\cap\,]x,y[} \phi(q) + b_q(x-q) \le \sup_{q\in\mathbb{Q}\cap I} \phi(q) + b_q(x-q).
\end{aligned}$$

(c) Writing φq (x) = φ(q)+bq (x−q) for every q ∈ Q∩I, every φq is a Borel measurable function, and φ = supq∈I∩Q φq
is the supremum of a countable family of Borel measurable functions, so is Borel measurable.
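
Part (b) of the lemma can be watched at work on a concrete case. The sketch below takes φ(x) = x², for which the derivative 2q is one valid choice of $b_q$ (an assumption specific to this example), and evaluates the supremum of the supporting lines φ(q) + b_q(x − q) over dyadic rationals q of bounded denominator. The function names, the test point x = 1/3 and the interval [−2, 2] are illustrative choices, not part of the text.

```python
from fractions import Fraction

def phi(x):
    return x * x

def slope(q):
    # For phi(x) = x^2 the derivative 2q satisfies phi(x) >= phi(q) + 2q(x - q),
    # so it is an admissible b_q in the sense of 233Ha.
    return 2 * q

def sup_of_supporting_lines(x, denominator, lo=-2, hi=2):
    """Maximum of phi(q) + b_q (x - q) over q = k/denominator in [lo, hi]."""
    qs = [Fraction(k, denominator)
          for k in range(lo * denominator, hi * denominator + 1)]
    return max(phi(q) + slope(q) * (x - q) for q in qs)

x = Fraction(1, 3)  # not a dyadic rational, so no grid point equals x
approximations = {n: sup_of_supporting_lines(x, 2 ** n) for n in range(1, 8)}
gaps = {n: phi(x) - value for n, value in approximations.items()}
```

Here φ(q) + 2q(x − q) = x² − (x − q)², so each gap is the squared distance from x to the nearest grid point; the gaps are non-negative and shrink to 0 as the grid is refined, which is the content of (b) for this φ.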

233I Jensen’s inequality Let (X, Σ, µ) be a measure space and φ : R → R a convex function.
(a) Suppose that f and g are real-valued µ-virtually measurable functions defined almost everywhere in X and that g ≥ 0 almost everywhere, $\int g = 1$ and g × f is integrable. Then $\phi(\int g \times f) \le \int g \times \phi f$, where we may need to interpret the right-hand integral as ∞.
(b) In particular, if µX = 1 and f is a real-valued function which is integrable over X, then $\phi(\int f) \le \int \phi f$.
proof (a) For each q ∈ Q take $b_q$ such that $\phi(t) \ge \phi_q(t) = \phi(q) + b_q(t-q)$ for every t ∈ R (233Ha). Because φ is Borel measurable (233Hc), φf is µ-virtually measurable (121H), so g × φf also is; since g × φf is defined almost everywhere and almost everywhere greater than or equal to the integrable function $g \times \phi_0 f$, $\int g \times \phi f$ is defined in ]−∞, ∞]. Now
$$\begin{aligned}
\phi_q\Big(\int g \times f\Big) &= \phi(q) + b_q \int g \times f - b_q q \\
&= \int g \times (b_q f + (\phi(q) - b_q q)\chi X) = \int g \times \phi_q f \le \int g \times \phi f,
\end{aligned}$$
because $\int g = 1$ and g ≥ 0 a.e. By 233Hb,
$$\phi\Big(\int g \times f\Big) = \sup_{q\in\mathbb{Q}} \phi_q\Big(\int g \times f\Big) \le \int g \times \phi f.$$

(b) Take g to be the constant function with value 1.
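
Both forms of the inequality can be checked by hand on a finite probability space, where every integral is a finite weighted sum. The following sketch uses exact rational arithmetic; the four-point space, the weights, the functions f and g and the choice φ(t) = t² are all hypothetical data chosen for illustration.

```python
from fractions import Fraction

# Hypothetical finite probability space: points 0..3 with these weights.
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
f = [Fraction(-3), Fraction(0), Fraction(2), Fraction(5)]

def integral(values, mu):
    """Integral of a function on a finite space: a weighted sum."""
    return sum(v * m for v, m in zip(values, mu))

def phi(t):
    # A convex function on R.
    return t * t

# 233Ib: phi(integral of f) <= integral of phi(f) when mu X = 1.
lhs = phi(integral(f, weights))
rhs = integral([phi(v) for v in f], weights)

# 233Ia: a weight function g >= 0 with integral 1.
g = [Fraction(0), Fraction(2), Fraction(4), Fraction(0)]
total_g = integral(g, weights)
lhs_weighted = phi(integral([gv * fv for gv, fv in zip(g, f)], weights))
rhs_weighted = integral([gv * phi(fv) for gv, fv in zip(g, f)], weights)
```

With these data, lhs = 25/64 and rhs = 65/8, while the weighted form gives 1 ≤ 2.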

233J Even the special case 233Ib of Jensen’s inequality is already very useful. It can be extended as follows.
Theorem Let (X, Σ, µ) be a probability space and T a σ-subalgebra of Σ. Let φ : R → R be a convex function and f a µ-integrable real-valued function defined almost everywhere in X such that the composition φf is also integrable. If g and h are conditional expectations on T of f, φf respectively, then $\phi g \le_{\text{a.e.}} h$. Consequently $\int \phi g \le \int \phi f$.
proof We use the same ideas as in 233I. For each q ∈ Q take a bq ∈ R such that φ(t) ≥ φq (t) = φ(q) + bq (t − q) for
every t ∈ R, so that φ(t) = supq∈Q φq (t) for every t ∈ R. Now setting
ψq (x) = φ(q) + bq (g(x) − q)
for x ∈ dom g, we see that ψq = φq g is a conditional expectation of φq f , and as φq f ≤a.e. φf we must have ψq ≤a.e. h.
But also φg = supq∈Q ψq wherever g is defined, so φg ≤a.e. h, as claimed.
It follows at once that $\int \phi g \le \int h = \int \phi f$.
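
On a finite probability space with T generated by a partition, a conditional expectation is just the block-wise weighted average, and the theorem can be verified directly. The sketch below does this in exact arithmetic; the weights, the partition, the function f and the choice φ(t) = t² are hypothetical illustration data, and `conditional_expectation` is a helper name introduced here.

```python
from fractions import Fraction

weights = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 4), Fraction(1, 4)]
blocks = [[0, 1], [2, 3]]   # partition generating the sub-sigma-algebra T
f = [Fraction(-2), Fraction(1), Fraction(0), Fraction(4)]

def conditional_expectation(values, mu, partition):
    """Block-wise weighted average: constant on each block of the partition."""
    out = [None] * len(values)
    for block in partition:
        mass = sum(mu[i] for i in block)
        average = sum(values[i] * mu[i] for i in block) / mass
        for i in block:
            out[i] = average
    return out

def phi(t):
    return t * t

g = conditional_expectation(f, weights, blocks)                     # of f
h = conditional_expectation([phi(v) for v in f], weights, blocks)   # of phi(f)
phi_g = [phi(v) for v in g]

integral_phi_g = sum(v * m for v, m in zip(phi_g, weights))
integral_phi_f = sum(phi(v) * m for v, m in zip(f, weights))
```

Here g = (0, 0, 2, 2) and h = (2, 2, 8, 8), so φg ≤ h pointwise, and the integrals are 2 and 5 respectively.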

233K I give the following proposition, an elaboration of 233Eg, in a very general form, as its applications can turn
up anywhere.
Proposition Let (X, Σ, µ) be a probability space, and T a σ-subalgebra of Σ. Suppose that f is a µ-integrable
function and h is a (µ↾ T)-virtually measurable real-valued function defined (µ↾ T)-almost everywhere in X. Let g, g0
be conditional expectations of f and |f | on T. Then f × h is integrable iff g0 × h is integrable, and in this case g × h
is a conditional expectation of f × h on T.
proof (a) Suppose that h is a µ↾ T-simple function. Then surely f × h and g0 × h are integrable, and g × h is a
conditional expectation of f × h as in 233Eg.
(b) Now suppose that f , h ≥ 0. Then g = g0 ≥ 0 a.e. (233Ec). Let h̃ be a non-negative T-measurable function
defined everywhere in X such that h =a.e. h̃. For each n ∈ N set
$$h_n(x) = \begin{cases} 2^{-n}k & \text{if } 0 \le k < 4^n \text{ and } 2^{-n}k \le \tilde h(x) < 2^{-n}(k+1), \\ 2^n & \text{if } \tilde h(x) \ge 2^n. \end{cases}$$
Then $h_n$ is a $(\mu{\restriction}\mathrm{T})$-simple function, so $g \times h_n$ is a conditional expectation of $f \times h_n$. Both $\langle f \times h_n \rangle_{n\in\mathbb{N}}$ and $\langle g \times h_n \rangle_{n\in\mathbb{N}}$ are almost everywhere non-decreasing sequences of integrable functions, with limits f × h and g × h respectively. By B.Levi’s theorem,

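$$\begin{aligned}
f \times h \text{ is integrable} &\iff f \times \tilde h \text{ is integrable} \\
&\iff \sup_{n\in\mathbb{N}} \int f \times h_n < \infty \iff \sup_{n\in\mathbb{N}} \int g \times h_n < \infty
\end{aligned}$$
(because $\int g \times h_n = \int f \times h_n$ for each n)
$$\iff g \times h \text{ is integrable} \iff g_0 \times h \text{ is integrable}.$$

Moreover, in this case
$$\begin{aligned}
\int_E f \times h = \int_E f \times \tilde h &= \lim_{n\to\infty} \int_E f \times h_n \\
&= \lim_{n\to\infty} \int_E g \times h_n = \int_E g \times \tilde h = \int_E g \times h
\end{aligned}$$
for every E ∈ T, while g × h is $(\mu{\restriction}\mathrm{T})$-virtually measurable, so g × h is a conditional expectation of f × h.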


(c) Finally, consider the general case of integrable f and virtually measurable h. Set $f^+ = f \vee 0$, $f^- = (-f) \vee 0$, so that $f = f^+ - f^-$ and $0 \le f^+, f^- \le |f|$; similarly, set $h^+ = h \vee 0$, $h^- = (-h) \vee 0$. Let $g_1$, $g_2$ be conditional expectations of $f^+$, $f^-$ on T. Because $0 \le f^+, f^- \le |f|$, $0 \le g_1, g_2 \le_{\text{a.e.}} g_0$, while $g =_{\text{a.e.}} g_1 - g_2$.
We see that
$$\begin{aligned}
f \times h \text{ is integrable} &\iff |f| \times |h| = |f \times h| \text{ is integrable} \\
&\iff g_0 \times |h| \text{ is integrable} \\
&\iff g_0 \times h \text{ is integrable}.
\end{aligned}$$
And in this case all four of $f^+ \times h^+, \ldots, f^- \times h^-$ are integrable, so
$$(g_1 - g_2) \times h = g_1 \times h^+ - g_2 \times h^+ - g_1 \times h^- + g_2 \times h^-$$
is a conditional expectation of
$$f^+ \times h^+ - f^- \times h^+ - f^+ \times h^- + f^- \times h^- = f \times h.$$
Since $g \times h =_{\text{a.e.}} (g_1 - g_2) \times h$, this also is a conditional expectation of f × h, and we’re done.
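
In the finite setting, the conclusion of the proposition reduces to the familiar rule that a T-measurable factor can be taken outside a conditional expectation. The sketch below checks this in exact arithmetic on a four-point space; the weights, the partition and the functions f and h (with h constant on each block, hence T-measurable) are hypothetical data, and `conditional_expectation` is a helper introduced for illustration.

```python
from fractions import Fraction

weights = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 4), Fraction(1, 4)]
blocks = [[0, 1], [2, 3]]   # partition generating T
f = [Fraction(-2), Fraction(1), Fraction(0), Fraction(4)]
h = [Fraction(3), Fraction(3), Fraction(-1), Fraction(-1)]  # constant on blocks

def conditional_expectation(values, mu, partition):
    """Block-wise weighted average: constant on each block of the partition."""
    out = [None] * len(values)
    for block in partition:
        mass = sum(mu[i] for i in block)
        average = sum(values[i] * mu[i] for i in block) / mass
        for i in block:
            out[i] = average
    return out

g = conditional_expectation(f, weights, blocks)
g_times_h = [a * b for a, b in zip(g, h)]
ce_of_f_times_h = conditional_expectation(
    [a * b for a, b in zip(f, h)], weights, blocks)
```

Both `g_times_h` and `ce_of_f_times_h` come out as (0, 0, −2, −2). Integrability is automatic on a finite space, so the sketch illustrates only the identification of the conditional expectation, not the integrability equivalence.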

233X Basic exercises (a) Let (X, Σ, µ) be a probability space and T a σ-subalgebra of Σ. Let $\langle f_n \rangle_{n\in\mathbb{N}}$ be a sequence of non-negative µ-integrable functions and suppose that $g_n$ is a conditional expectation of $f_n$ on T for each n. Suppose that $f = \liminf_{n\to\infty} f_n$ is integrable and has a conditional expectation g. Show that $g \le_{\text{a.e.}} \liminf_{n\to\infty} g_n$.

(b) Let I ⊆ R be an interval, and φ : I → R a function. Show that φ is convex iff {x : x ∈ I, φ(x) + bx ≤ c} is an
interval for every b, c ∈ R.

> (c) Let I ⊆ R be an open interval and φ : I → R a function. (i) Show that if φ is differentiable then it is convex
iff φ′ is non-decreasing. (ii) Show that if φ is absolutely continuous on every bounded closed subinterval of I then φ is
convex iff φ′ is non-decreasing on its domain.

(d) For any r ≥ 1, a subset C of $\mathbb{R}^r$ is convex if tx + (1 − t)y ∈ C for all x, y ∈ C and t ∈ [0, 1]. If $C \subseteq \mathbb{R}^r$ is convex, then a function φ : C → R is convex if φ(tx + (1 − t)y) ≤ tφ(x) + (1 − t)φ(y) for all x, y ∈ C and t ∈ [0, 1].
Let $C \subseteq \mathbb{R}^r$ be a convex set and φ : C → R a function. Show that the following are equiveridical: (i) the function φ is convex; (ii) the set {(x, t) : x ∈ C, t ∈ R, t ≥ φ(x)} is convex in $\mathbb{R}^{r+1}$; (iii) the set {x : x ∈ C, φ(x) + b . x ≤ c} is convex in $\mathbb{R}^r$ for every $b \in \mathbb{R}^r$ and c ∈ R, writing $b \,.\, x = \sum_{i=1}^r \beta_i \xi_i$ if $b = (\beta_1, \ldots, \beta_r)$ and $x = (\xi_1, \ldots, \xi_r)$.

(e) Let I ⊆ R be an interval and φ : I → R a convex function.


(i) Show that if a, d ∈ I and a < b ≤ c < d then
$$\frac{\phi(b)-\phi(a)}{b-a} \le \frac{\phi(d)-\phi(c)}{d-c}.$$

(ii) Show that φ is continuous at every interior point of I.


(iii) Show that either φ is monotonic on I or there is a c ∈ I such that φ(c) = minx∈I φ(x) and φ is non-increasing
on I ∩ ]−∞, c], monotonic non-decreasing on I ∩ [c, ∞[.
(iv) Show that φ is differentiable at all but countably many points of I, and that its derivative is non-decreasing
in the sense that φ′ (x) ≤ φ′ (y) whenever x, y ∈ dom φ′ and x ≤ y.
(v) Show that if I is closed and bounded and φ is continuous then φ is absolutely continuous.
(vi) Show that if I is closed and bounded and ψ : I → R is absolutely continuous with a non-decreasing derivative
then ψ is convex.

(f ) Show that if I ⊆ R is an interval and φ, ψ : I → R are convex functions so is aφ + bψ for any a, b ≥ 0.

(g) In the context of 233K, give an example in which g × h is integrable but f × h is not. (Hint: take X, µ, T as
in 233Cb, and arrange for g to be 0.)

(h) Let I ⊆ R be an interval and Φ a non-empty family of convex real-valued functions on I such that ψ(x) =
supφ∈Φ φ(x) is finite for every x ∈ I. Show that ψ is convex.

233Y Further exercises (a) If I ⊆ R is an interval, a function φ : I → R is mid-convex if $\phi(\frac{x+y}{2}) \le \frac12(\phi(x)+\phi(y))$ for all x, y ∈ I. Show that a mid-convex function which is bounded on any non-trivial subinterval of I is convex.

(b) Generalize 233Xd to arbitrary normed spaces in place of R r .

(c) Let (X, Σ, µ) be a probability space and T a σ-subalgebra of Σ. Let φ be a convex real-valued function with
domain an interval I ⊆ R, and f an integrable real-valued function on X such that f (x) ∈ I for almost every x ∈ X
and φf is integrable. Let g, h be conditional expectations on T of f , φf respectively. Show that g(x) ∈ I for almost
every x and that φg ≤a.e. h.

(d)(i) Show that if I ⊆ R is a bounded interval, E ⊆ I is Lebesgue measurable, and $\mu E > \frac23 \mu I$ where µ is Lebesgue measure, then for every x ∈ I there are y, z ∈ E such that $z = \frac{x+y}{2}$. (Hint: by 134Ya/263A, µ(x + E) + µ(2E) > µ(2I).)
(ii) Show that if f : [0, 1] → R is a mid-convex Lebesgue measurable function (definition: 233Ya), a > 0, and
E = {x : x ∈ [0, 1], a ≤ f (x) < 2a} is not negligible, then there is a non-trivial interval I ⊆ [0, 1] such that f (x) > 0
for every x ∈ I. (Hint: 223B.) (iii) Suppose that f : [0, 1] → R is a mid-convex function such that f ≤ 0 almost
everywhere in [0, 1]. Show that f ≤ 0 everywhere in ]0, 1[. (Hint: for every x ∈ ]0, 1[, max(f (x − t), f (x + t)) ≤ 0 for
almost every t ∈ [0, min(x, 1 − x)].) (iv) Suppose that f : [0, 1] → R is a mid-convex Lebesgue measurable function
such that f (0) = f (1) = 0. Show that f (x) ≤ 0 for every x ∈ [0, 1]. (Hint: show that {x : f (x) ≤ 0} is dense in [0, 1],
use (ii) to show that it is conegligible in [0, 1] and apply (iii).) (v) Show that if I ⊆ R is an interval and f : I → R is a
mid-convex Lebesgue measurable function then it is convex.

(e) Let (X, Σ, µ) be a probability space, T a σ-subalgebra of subsets of X, and f : X → [0, ∞] a Σ-measurable function. Show that (i) there is a T-measurable g : X → [0, ∞] such that $\int_F g = \int_F f$ for every F ∈ T (ii) any two such functions are equal a.e.

(f) Suppose that r ≥ 1 and $C \subseteq \mathbb{R}^r \setminus \{0\}$ is a convex set. Show that there is a non-zero $b \in \mathbb{R}^r$ such that b . z ≥ 0 for every z ∈ C. (Hint: if r = 2, identify $\mathbb{R}^2$ with $\mathbb{C}$; reduce to the case in which C contains no points which are real and negative; set θ = sup{arg z : z ∈ C} and $b = -ie^{i\theta}$. Now induce on r.)

(g) Suppose that r ≥ 1, $C \subseteq \mathbb{R}^r$ is a convex set and φ : C → R is a convex function. Show that there is a function $h : \mathbb{R}^r \to [-\infty, \infty[$ such that $\phi(z) = \sup\{h(y) + z \,.\, y : y \in \mathbb{R}^r\}$ for every z ∈ C. (Hint: try h(y) = inf{φ(z) − z . y : z ∈ C}, and apply 233Yf to a translate of {(z, t) : φ(z) ≤ t}.)

(h) Let (X, Σ, µ) be a probability space, r ≥ 1 an integer and $C \subseteq \mathbb{R}^r$ a convex set. Let $f_1, \ldots, f_r$ be µ-integrable real-valued functions and suppose that $\{x : x \in \bigcap_{j\le r} \operatorname{dom} f_j,\ (f_1(x), \ldots, f_r(x)) \in C\}$ is a conegligible subset of X. Show that $(\int f_1, \ldots, \int f_r) \in C$. (Hint: induce on r.)

(i) Let (X, Σ, µ) be a probability space, r ≥ 1 an integer, $C \subseteq \mathbb{R}^r$ a convex set and φ : C → R a convex function. Let $f_1, \ldots, f_r$ be µ-integrable real-valued functions and suppose that $\{x : x \in \bigcap_{j\le r} \operatorname{dom} f_j,\ (f_1(x), \ldots, f_r(x)) \in C\}$ is a conegligible subset of X. Show that $\phi(\int f_1, \ldots, \int f_r) \le \int \phi(f_1, \ldots, f_r)$.

(j) Let (X, Σ, µ) be a measure space with µX > 0, r ≥ 1 an integer, $C \subseteq \mathbb{R}^r$ a convex set such that tz ∈ C whenever z ∈ C and t > 0, and φ : C → R a convex function. Let $f_1, \ldots, f_r$ be µ-integrable real-valued functions and suppose that $\{x : x \in \bigcap_{j\le r} \operatorname{dom} f_j,\ (f_1(x), \ldots, f_r(x)) \in C\}$ is a conegligible subset of X. Show that $(\int f_1, \ldots, \int f_r) \in C$ and that $\phi(\int f_1, \ldots, \int f_r) \le \int \phi(f_1, \ldots, f_r)$. (Hint: putting 215B(viii) and 235K below together, show that there are a probability measure ν on X and a function h : X → [0, ∞[ such that $\int f_j \, d\mu = \int f_j \times h \, d\nu$ for every j.)

233 Notes and comments The concept of ‘conditional expectation’ is fundamental in probability theory, and will
reappear in Chapter 27 in its natural context. I hope that even as an exercise in technique, however, it will strike you
as worth taking note of.
I introduced 233E as a ‘list of elementary facts’, and they are indeed straightforward. But below the surface there
are some remarkable ideas waiting for expression. If you take T to be the trivial algebra {∅, X}, so that the (unique) conditional expectation of an integrable function f is the constant function $(\int f)\chi X$, then 233Ed and 233Ee become versions of B.Levi’s theorem and Lebesgue’s Dominated Convergence Theorem. (Fatou’s Lemma is in 233Xa.) Even 233Eg can be thought of as a generalization of the result that $\int cf = c\int f$, where the constant c has been replaced by a
bounded T-measurable function. A recurrent theme in the later parts of this treatise will be the search for ‘conditioned’
versions of theorems. The proof of 233Ee is a typical example of an argument which has been translated from a proof
of the original ‘unconditioned’ result.
I suggested that 233I-233J are surprising, and I think that most of us find them so, even applied to the list of convex
functions given in 233G. But I should remark that in a way 233J has very little to do with conditional expectations.
The only properties of conditional expectations used in the proof are (i) that if g is a conditional expectation of f , then
aχX + bg is a conditional expectation of aχX + bf for all real a, b (ii) if g1 , g2 are conditional expectations of f1 , f2 and
f1 ≤a.e. f2 , then g1 ≤a.e. g2 . See 244Xm below. Jensen’s inequality has an interesting extension to the multidimensional
case, explored in 233Yf-233Yj. If you have encountered ‘geometric’ forms of the Hahn-Banach theorem (see 3A5C in
Volume 3) you will find 233Yf and 233Yg very natural, and you may notice that the finite-dimensional case is slightly
different from the infinite-dimensional case you have probably been taught. I think that in fact the most delicate step
is in 233Yh.
Note that 233Ib can be regarded as the special case of 233J in which T = {∅, X}. In fact 233Ia can be derived from 233Ib applied to the measure ν where $\nu E = \int_E g$ for every E ∈ Σ.
Like 233B, 233K seems to have rather a lot of technical detail in the argument. The point of this result is that we
can deduce the integrability of f × h from that of g0 × h (but not from the integrability of g × h; see 233Xg). Otherwise
it should be routine.

234 Operations on measures


I take a few pages to describe some standard constructions. The ideas are straightforward, but a number of details
need to be worked out if they are to be securely integrated into the general framework I employ. The first step is
to formally introduce inverse-measure-preserving functions (234A-234B), the most important class of transformations
between measure spaces. For construction of new measures, we have the notions of image measure (234C-234E), sum of
measures (234G-234H) and indefinite-integral measure (234I-234O). Finally I mention a way of ordering the measures
on a given set (234P-234Q).

234A Inverse-measure-preserving functions It is high time that I introduced the nearest thing in measure
theory to a ‘morphism’. If (X, Σ, µ) and (Y, T, ν) are measure spaces, a function φ : X → Y is inverse-measure-preserving if $\phi^{-1}[F] \in \Sigma$ and $\mu(\phi^{-1}[F]) = \nu F$ for every F ∈ T.

234B Proposition Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and φ : X → Y an inverse-measure-preserving
function.
(a) If µ̂, ν̂ are the completions of µ, ν respectively, φ is also inverse-measure-preserving for µ̂ and ν̂.
(b) µ is a probability measure iff ν is a probability measure.
(c) µ is totally finite iff ν is totally finite.
(d)(i) If ν is σ-finite, then µ is σ-finite.
(ii) If ν is semi-finite and µ is σ-finite, then ν is σ-finite.
(e)(i) If ν is σ-finite and atomless, then µ is atomless.


(ii) If ν is semi-finite and µ is purely atomic, then ν is purely atomic.
(f)(i) $\mu^* \phi^{-1}[B] \le \nu^* B$ for every B ⊆ Y.
(ii) $\mu^* A \le \nu^* \phi[A]$ for every A ⊆ X.
(g) If (Z, Λ, λ) is another measure space, and ψ : Y → Z is inverse-measure-preserving, then ψφ : X → Z is inverse-
measure-preserving.
proof (a) If $\hat\nu$ measures F, there are $F', F'' \in \mathrm{T}$ such that $F' \subseteq F \subseteq F''$ and $\nu(F'' \setminus F') = 0$. Now
$$\phi^{-1}[F'] \subseteq \phi^{-1}[F] \subseteq \phi^{-1}[F''], \quad \mu(\phi^{-1}[F''] \setminus \phi^{-1}[F']) = \nu(F'' \setminus F') = 0,$$
so $\hat\mu$ measures $\phi^{-1}[F]$ and
$$\hat\mu(\phi^{-1}[F]) = \mu\phi^{-1}[F'] = \nu F' = \hat\nu F.$$
As F is arbitrary, φ is inverse-measure-preserving for $\hat\mu$ and $\hat\nu$.
(b)-(c) are surely obvious.
(d)(i) If $\langle F_n \rangle_{n\in\mathbb{N}}$ is a cover of Y by sets of finite measure for ν, then $\langle \phi^{-1}[F_n] \rangle_{n\in\mathbb{N}}$ is a cover of X by sets of finite measure for µ.
(ii) Let $\mathcal{F} \subseteq \mathrm{T}$ be a disjoint family of non-ν-negligible sets. Then $\langle \phi^{-1}[F] \rangle_{F\in\mathcal{F}}$ is a disjoint family of non-µ-negligible sets. By 215B(iii), $\mathcal{F}$ is countable. By 215B(iii) in the opposite direction, ν is σ-finite.
(e)(i) Suppose that E ∈ Σ and µE > 0. Let $\langle F_n \rangle_{n\in\mathbb{N}}$ be a cover of Y by sets of finite measure for ν. Because ν is atomless, we can find, for each n, a finite partition $\langle F_{ni} \rangle_{i\in I_n}$ of $F_n$ such that $\nu F_{ni} < \mu E$ for every $i \in I_n$ (use 215D repeatedly). Now $X = \bigcup_{n\in\mathbb{N},\, i\in I_n} \phi^{-1}[F_{ni}]$, so there are n ∈ N and $i \in I_n$ with
$$0 < \mu(E \cap \phi^{-1}[F_{ni}]) \le \mu\phi^{-1}[F_{ni}] = \nu F_{ni} < \mu E,$$
and E is not a µ-atom. As E is arbitrary, µ is atomless.
(ii) Suppose that F ∈ T and νF > 0. Because ν is semi-finite, there is an $F_1 \subseteq F$ such that $0 < \nu F_1 < \infty$. Now $\mu\phi^{-1}[F_1] > 0$; because µ is purely atomic, there is a µ-atom $E \subseteq \phi^{-1}[F_1]$.
Let $\mathcal{G}$ be the set of those G ∈ T such that $G \subseteq F_1$ and $\mu(E \cap \phi^{-1}[G]) = 0$. Then the union of any sequence in $\mathcal{G}$ belongs to $\mathcal{G}$, so by 215Ac there is an $H \in \mathcal{G}$ such that $\nu(G \setminus H) = 0$ whenever $G \in \mathcal{G}$. Consider $F_1 \setminus H$. We have
$$\nu(F_1 \setminus H) = \mu(\phi^{-1}[F_1] \setminus \phi^{-1}[H]) \ge \mu(E \setminus \phi^{-1}[H]) = \mu E > 0.$$
If G ∈ T and $G \subseteq F_1 \setminus H$, then one of $E \cap \phi^{-1}[G]$, $E \setminus \phi^{-1}[G]$ is µ-negligible. In the former case, $G \in \mathcal{G}$ and $G = G \setminus H$ is ν-negligible. In the latter case, $F_1 \setminus G \in \mathcal{G}$ and $(F_1 \setminus H) \setminus G$ is ν-negligible. As G is arbitrary, $F_1 \setminus H$ is a ν-atom included in F; as F is arbitrary, ν is purely atomic.
(f)(i) Let F ∈ T be such that B ⊆ F and $\nu^* B = \nu F$ (132Aa); then $\phi^{-1}[B] \subseteq \phi^{-1}[F]$ so
$$\mu^* \phi^{-1}[B] \le \mu\phi^{-1}[F] = \nu F = \nu^* B.$$

(ii) $\mu^* A \le \mu^*(\phi^{-1}[\phi[A]]) \le \nu^* \phi[A]$ by (i).

(g) For any W ∈ Λ,
$$\mu(\psi\phi)^{-1}[W] = \mu\phi^{-1}[\psi^{-1}[W]] = \nu\psi^{-1}[W] = \lambda W.$$

234C Image measures The following construction is one of the commonest ways in which new measure spaces
appear.
Proposition Let (X, Σ, µ) be a measure space, Y any set, and φ : X → Y a function. Set
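$$\mathrm{T} = \{F : F \subseteq Y,\ \phi^{-1}[F] \in \Sigma\}, \quad \nu F = \mu(\phi^{-1}[F]) \text{ for every } F \in \mathrm{T}.$$
Then (Y, T, ν) is a measure space.
proof (a) $\emptyset = \phi^{-1}[\emptyset] \in \Sigma$ so ∅ ∈ T.
(b) If F ∈ T, then $\phi^{-1}[F] \in \Sigma$, so $X \setminus \phi^{-1}[F] \in \Sigma$; but $X \setminus \phi^{-1}[F] = \phi^{-1}[Y \setminus F]$, so Y \ F ∈ T.
(c) If $\langle F_n \rangle_{n\in\mathbb{N}}$ is a sequence in T, then $\phi^{-1}[F_n] \in \Sigma$ for every n, so $\bigcup_{n\in\mathbb{N}} \phi^{-1}[F_n] \in \Sigma$; but $\phi^{-1}[\bigcup_{n\in\mathbb{N}} F_n] = \bigcup_{n\in\mathbb{N}} \phi^{-1}[F_n]$, so $\bigcup_{n\in\mathbb{N}} F_n \in \mathrm{T}$.
Thus T is a σ-algebra.
(d) $\nu\emptyset = \mu\phi^{-1}[\emptyset] = \mu\emptyset = 0$.
(e) If $\langle F_n \rangle_{n\in\mathbb{N}}$ is a disjoint sequence in T, then $\langle \phi^{-1}[F_n] \rangle_{n\in\mathbb{N}}$ is a disjoint sequence in Σ, so
$$\nu\Big(\bigcup_{n\in\mathbb{N}} F_n\Big) = \mu\phi^{-1}\Big[\bigcup_{n\in\mathbb{N}} F_n\Big] = \mu\Big(\bigcup_{n\in\mathbb{N}} \phi^{-1}[F_n]\Big) = \sum_{n=0}^\infty \mu\phi^{-1}[F_n] = \sum_{n=0}^\infty \nu F_n.$$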
So ν is a measure.
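
For a finite X carrying point masses, the construction can be carried out mechanically. The following sketch computes νF = µ(φ⁻¹[F]) for every F ⊆ Y and stores the result. The sets X = {a, b, c} and Y = {0, 1, 2}, the map φ and the masses are hypothetical, and because µ here measures every subset of X, the σ-algebra T is all of PY (in general T may be smaller).

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical point masses on X = {a, b, c}; every subset of X is measurable.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
phi = {"a": 0, "b": 1, "c": 0}   # a function from X to Y
Y = {0, 1, 2}                    # note that 2 is not in the range of phi

def measure(point_masses, subset):
    """Measure of a subset: the sum of the point masses it contains."""
    return sum((point_masses[x] for x in subset), Fraction(0))

def preimage(function, subset):
    """The inverse image of `subset` under `function`."""
    return {x for x, y in function.items() if y in subset}

def image_measure(point_masses, function, subset):
    """nu F = mu(phi^{-1}[F])."""
    return measure(point_masses, preimage(function, subset))

def subsets(points):
    pts = sorted(points)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(pts, k) for k in range(len(pts) + 1))]

nu = {F: image_measure(mu, phi, F) for F in subsets(Y)}
```

In this example ν{0} = 2/3, ν{1} = 1/3 and ν{2} = 0: a point outside φ[X] is ν-negligible, and φ is inverse-measure-preserving for µ and ν by construction.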

234D Definition In the context of 234C, ν is called the image measure or push-forward measure; I will denote it $\mu\phi^{-1}$.
Remark I ought perhaps to say that this construction does not always produce exactly the ‘right’ measure on Y; there are circumstances in which some modification of the measure $\mu\phi^{-1}$ described here is more useful. But I will note these explicitly when they occur; when I use the unadorned phrase ‘image measure’ I shall mean the measure constructed above.

234E Proposition Let (X, Σ, µ) be a measure space, Y a set and φ : X → Y a function; let $\mu\phi^{-1}$ be the image measure on Y.
(a) φ is inverse-measure-preserving for µ and $\mu\phi^{-1}$.
(b) If µ is complete, so is $\mu\phi^{-1}$.
(c) If Z is another set, and ψ : Y → Z a function, then the image measures $\mu(\psi\phi)^{-1}$ and $(\mu\phi^{-1})\psi^{-1}$ on Z are the same.
proof (a) Immediate from the definitions.
(b) Write ν for $\mu\phi^{-1}$ and T for its domain. If $\nu^* B = 0$, then $\mu^* \phi^{-1}[B] = 0$, by 234B(f-i); as µ is complete, $\phi^{-1}[B] \in \Sigma$, so B ∈ T. As B is arbitrary, ν is complete.
(c) For G ⊆ Z and u ∈ [0, ∞],
$$\begin{aligned}
(\mu(\psi\phi)^{-1})(G) \text{ is defined and equal to } u &\iff \mu((\psi\phi)^{-1}[G]) \text{ is defined and equal to } u \\
&\iff \mu(\phi^{-1}[\psi^{-1}[G]]) \text{ is defined and equal to } u \\
&\iff (\mu\phi^{-1})(\psi^{-1}[G]) \text{ is defined and equal to } u \\
&\iff ((\mu\phi^{-1})\psi^{-1})(G) \text{ is defined and equal to } u.
\end{aligned}$$

*234F In the opposite direction, the following construction of a pull-back measure is sometimes useful.
Proposition Let X be a set, (Y, T, ν) a measure space, and φ : X → Y a function such that φ[X] has full outer measure in Y. Then there is a measure µ on X, with domain $\Sigma = \{\phi^{-1}[F] : F \in \mathrm{T}\}$, such that φ is inverse-measure-preserving for µ and ν.
proof The check that Σ is a σ-algebra of subsets of X is straightforward; all we need to know is that $\phi^{-1}[\emptyset] = \emptyset$, $X \setminus \phi^{-1}[F] = \phi^{-1}[Y \setminus F]$ for every F ⊆ Y, and that $\phi^{-1}[\bigcup_{n\in\mathbb{N}} F_n] = \bigcup_{n\in\mathbb{N}} \phi^{-1}[F_n]$ for every sequence $\langle F_n \rangle_{n\in\mathbb{N}}$ of subsets of Y. The key fact is that if $F_1, F_2 \in \mathrm{T}$ and $\phi^{-1}[F_1] = \phi^{-1}[F_2]$, then φ[X] does not meet $F_1 \triangle F_2$; because φ[X] has full outer measure, $F_1 \triangle F_2$ is ν-negligible and $\nu F_1 = \nu F_2$. Accordingly the formula $\mu\phi^{-1}[F] = \nu F$ does define a function µ : Σ → [0, ∞]. Now
$$\mu\emptyset = \mu\phi^{-1}[\emptyset] = \nu\emptyset = 0.$$
Next, if $\langle E_n \rangle_{n\in\mathbb{N}}$ is a disjoint sequence in Σ, choose $F_n \in \mathrm{T}$ such that $E_n = \phi^{-1}[F_n]$ for each n ∈ N. The sequence $\langle F_n \rangle_{n\in\mathbb{N}}$ need not be disjoint, but if we set $F'_n = F_n \setminus \bigcup_{i < n} F_i$ for each n ∈ N, then $\langle F'_n \rangle_{n\in\mathbb{N}}$ is disjoint and
$$E_n = E_n \setminus \bigcup_{i < n} E_i = \phi^{-1}[F'_n]$$
for each n; so
$$\mu\Big(\bigcup_{n\in\mathbb{N}} E_n\Big) = \nu\Big(\bigcup_{n\in\mathbb{N}} F'_n\Big) = \sum_{n=0}^\infty \nu F'_n = \sum_{n=0}^\infty \mu E_n.$$
As $\langle E_n \rangle_{n\in\mathbb{N}}$ is arbitrary, µ is a measure on X, as required.

234G Sums of measures I come now to a quite different way of building measures. The idea is an obvious one,
but the technical details, in the general case I wish to examine, need watching.

Proposition Let X be a set, and $\langle \mu_i \rangle_{i\in I}$ a family of measures on X. For each i ∈ I, let $\Sigma_i$ be the domain of $\mu_i$. Set $\Sigma = \mathcal{P}X \cap \bigcap_{i\in I} \Sigma_i$ and define µ : Σ → [0, ∞] by setting $\mu E = \sum_{i\in I} \mu_i E$ for every E ∈ Σ. Then µ is a measure on X.
proof Σ is a σ-algebra of subsets of X because every $\Sigma_i$ is. (Apply 111Ga with $\mathcal{S} = \{\Sigma_i : i \in I\} \cup \{\mathcal{P}X\}$.) Of course µ takes values in [0, ∞] (226A). µ∅ = 0 because $\mu_i \emptyset = 0$ for every i. If $\langle E_n \rangle_{n\in\mathbb{N}}$ is a disjoint sequence in Σ with union E, then
$$\mu E = \sum_{i\in I} \mu_i E = \sum_{i\in I} \sum_{n=0}^\infty \mu_i E_n = \sum_{n=0}^\infty \sum_{i\in I} \mu_i E_n$$
(226Af)
$$= \sum_{n=0}^\infty \mu E_n.$$

So µ is a measure.
Remark In this context, I will call µ the sum of the family $\langle \mu_i \rangle_{i\in I}$.
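
The point of intersecting the domains is easiest to see when the $\mu_i$ have different σ-algebras. The sketch below represents each measure on a four-point set as a dictionary from measurable sets to values, and forms the sum on the common domain. The two measures and the name `sum_of_measures` are hypothetical illustration data: `mu_1` is counting measure on all subsets, while `mu_2` is defined only on the algebra generated by {0, 1} and {2, 3}.

```python
from fractions import Fraction
from itertools import chain, combinations

X = frozenset(range(4))

def powerset(points):
    pts = sorted(points)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(pts, k) for k in range(len(pts) + 1))]

# mu_1: counting measure, defined on every subset of X.
mu_1 = {E: Fraction(len(E)) for E in powerset(X)}

# mu_2: defined only on the sigma-algebra generated by A and B.
A, B = frozenset({0, 1}), frozenset({2, 3})
mu_2 = {frozenset(): Fraction(0), A: Fraction(1, 2),
        B: Fraction(3), X: Fraction(7, 2)}

def sum_of_measures(family):
    """Sum of a finite family of measures, defined on the common domain."""
    domain = set(family[0])
    for m in family[1:]:
        domain &= set(m)
    return {E: sum((m[E] for m in family), Fraction(0)) for E in domain}

mu = sum_of_measures([mu_1, mu_2])
```

The sum is defined exactly on {∅, A, B, X}: a set such as {0} is measured by `mu_1` but not by `mu_2`, so it falls outside the domain of the sum.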

234H Proposition Let X be a set and $\langle \mu_i \rangle_{i\in I}$ a family of complete measures on X with sum µ.
(a) µ is complete.
(b)(i) A subset of X is µ-negligible iff it is $\mu_i$-negligible for every i ∈ I.
(ii) A subset of X is µ-conegligible iff it is $\mu_i$-conegligible for every i ∈ I.
(c) Let f be a function from a subset of X to [−∞, ∞]. Then $\int f \, d\mu$ is defined in [−∞, ∞] iff $\int f \, d\mu_i$ is defined in [−∞, ∞] for every i and one of $\sum_{i\in I} \int f^+ d\mu_i$, $\sum_{i\in I} \int f^- d\mu_i$ is finite, and in this case $\int f \, d\mu = \sum_{i\in I} \int f \, d\mu_i$.
proof Write $\Sigma_i = \operatorname{dom} \mu_i$ for i ∈ I, $\Sigma = \mathcal{P}X \cap \bigcap_{i\in I} \Sigma_i = \operatorname{dom} \mu$.
(a) If E ⊆ F ∈ Σ and µF = 0, then $\mu_i F = 0$ for every i ∈ I; because $\mu_i$ is complete, $E \in \Sigma_i$ for every i ∈ I, and E ∈ Σ.
(b) This now follows at once, since a set A ⊆ X is µ-negligible iff µA = 0.
(c)(i) Note first that (b-ii) tells us that, under either hypothesis, dom f is conegligible both for µ and for every $\mu_i$, so that if we extend f to X by giving it the value 0 on X \ dom f then neither $\int f \, d\mu$ nor $\sum_{i\in I} \int f \, d\mu_i$ is affected. So let us assume from now on that f is defined everywhere on X. Now it is plain that either hypothesis ensures that f is Σ-measurable, that is, is $\Sigma_i$-measurable for every i ∈ I.
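(ii) Suppose that f is non-negative. For n ∈ N set $f_n(x) = \sum_{k=1}^{4^n} 2^{-n} \chi\{x : f(x) \ge 2^{-n}k\}$, so that $\langle f_n \rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence with supremum f. We have
$$\begin{aligned}
\int f_n \, d\mu &= \sum_{k=1}^{4^n} 2^{-n} \mu\{x : f(x) \ge 2^{-n}k\} = \sum_{k=1}^{4^n} \sum_{i\in I} 2^{-n} \mu_i\{x : f(x) \ge 2^{-n}k\} \\
&= \sum_{i\in I} \sum_{k=1}^{4^n} 2^{-n} \mu_i\{x : f(x) \ge 2^{-n}k\} = \sum_{i\in I} \int f_n \, d\mu_i
\end{aligned}$$
for every n, so
$$\begin{aligned}
\int f \, d\mu &= \sup_{n\in\mathbb{N}} \int f_n \, d\mu = \sup_{n\in\mathbb{N}} \sup_{J\subseteq I \text{ is finite}} \sum_{i\in J} \int f_n \, d\mu_i = \sup_{J\subseteq I \text{ is finite}} \sup_{n\in\mathbb{N}} \sum_{i\in J} \int f_n \, d\mu_i \\
&= \sup_{J\subseteq I \text{ is finite}} \lim_{n\to\infty} \sum_{i\in J} \int f_n \, d\mu_i = \sup_{J\subseteq I \text{ is finite}} \sum_{i\in J} \lim_{n\to\infty} \int f_n \, d\mu_i = \sum_{i\in I} \int f \, d\mu_i.
\end{aligned}$$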

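(iii) Generally,
$$\begin{aligned}
&\int f \, d\mu \text{ is defined in } [-\infty, \infty] \\
&\iff \int f^+ d\mu \text{ and } \int f^- d\mu \text{ are defined and at most one is infinite} \\
&\iff \sum_{i\in I} \int f^+ d\mu_i \text{ and } \sum_{i\in I} \int f^- d\mu_i \text{ are defined and at most one is infinite} \\
&\iff \int f \, d\mu_i \text{ is defined for every } i \text{ and at most one of } \sum_{i\in I} \int f^+ d\mu_i,\ \sum_{i\in I} \int f^- d\mu_i \text{ is infinite,}
\end{aligned}$$
and in this case
$$\int f \, d\mu = \int f^+ d\mu - \int f^- d\mu = \sum_{i\in I} \int f^+ d\mu_i - \sum_{i\in I} \int f^- d\mu_i = \sum_{i\in I} \int f \, d\mu_i.$$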

234I Indefinite-integral measures Extending an idea already used in 232D, we are led to the following construc-
tion; once again, we need to take care over the formal details if we want to get full value from it.
Theorem Let (X, Σ, µ) be a measure space, and f a non-negative µ-virtually measurable real-valued function defined on a conegligible subset of X. Write $\nu F = \int f \times \chi F \, d\mu$ whenever F ⊆ X is such that the integral is defined in [0, ∞] according to the conventions of 133A. Then ν is a complete measure on X, and its domain includes Σ.
proof (a) Write T for the domain of ν, that is, the family of sets F ⊆ X such that $\int f \times \chi F \, d\mu$ is defined in [0, ∞], that is, f × χF is µ-virtually measurable (133A). Then T is a σ-algebra of subsets of X. P P For each F ∈ T let $H_F \subseteq X$ be a µ-conegligible set such that $f \times \chi F {\restriction} H_F$ is Σ-measurable. Because f itself is µ-virtually measurable, X ∈ T. If F ∈ T, then
$$f \times \chi(X \setminus F){\restriction}(H_X \cap H_F) = f{\restriction}(H_X \cap H_F) - (f \times \chi F){\restriction}(H_X \cap H_F)$$
is Σ-measurable, while $H_X \cap H_F$ is µ-conegligible, so X \ F ∈ T. If $\langle F_n \rangle_{n\in\mathbb{N}}$ is a sequence in T with union F, set $H = \bigcap_{n\in\mathbb{N}} H_{F_n}$; then H is conegligible, $f \times \chi F_n {\restriction} H$ is Σ-measurable for every n ∈ N, and $f \times \chi F = \sup_{n\in\mathbb{N}} f \times \chi F_n$, so $f \times \chi F {\restriction} H$ is Σ-measurable, and F ∈ T. Thus T is a σ-algebra. If F ∈ Σ, then $f \times \chi F {\restriction} H_X$ is Σ-measurable, so F ∈ T. Q Q
(b) Next, ν is a measure. P P Of course νF ∈ [0, ∞] for every F ∈ T. f × χ∅ = 0 wherever it is defined, so ν∅ = 0. If $\langle F_n \rangle_{n\in\mathbb{N}}$ is a disjoint sequence in T with union F, then $f \times \chi F = \sum_{n=0}^\infty f \times \chi F_n$. If $\nu F_m = \infty$ for some m, then we surely have $\nu F = \infty = \sum_{n=0}^\infty \nu F_n$. If $\nu F_m < \infty$ for each m but $\sum_{n=0}^\infty \nu F_n = \infty$, then
$$\int f \times \chi\Big(\bigcup_{n\le m} F_n\Big) = \sum_{n=0}^m \int f \times \chi F_n \to \infty$$
as m → ∞, so again $\nu F = \infty = \sum_{n=0}^\infty \nu F_n$. If $\sum_{n=0}^\infty \nu F_n < \infty$ then by B.Levi’s theorem
$$\nu F = \int \sum_{n=0}^\infty f \times \chi F_n = \sum_{n=0}^\infty \int f \times \chi F_n = \sum_{n=0}^\infty \nu F_n. \text{ Q Q}$$

(c) Finally, ν is complete. P P If A ⊆ F ∈ T and νF = 0, then f × χF = 0 a.e., so f × χA = 0 a.e. and νA is defined and equal to zero. Q Q

234J Definition Let (X, Σ, µ) be a measure space, and ν another measure on X with domain T. I will call ν an
indefinite-integral measure over µ, or sometimes a completed indefinite-integral measure, if it can be obtained
by the method of 234I from some non-negative virtually measurable function f defined almost everywhere on X. In
this case, f is a Radon-Nikodým derivative of ν with respect to µ in the sense of 232Hf. As in 232Hf, the phrase
density function is also used in this context.

234K Remarks Let (X, Σ, µ) be a measure space, and f a µ-virtually measurable non-negative real-valued function
defined almost everywhere on X; let ν be the associated indefinite-integral measure.

(a) There is a Σ-measurable function g : X → [0, ∞[ such that f = g µ-a.e. P P Let H ⊆ dom f be a measurable conegligible set such that f↾H is measurable, and set g(x) = f(x) for x ∈ H, g(x) = 0 for x ∈ X \ H. Q Q In this case, $\int f \times \chi E \, d\mu = \int g \times \chi E \, d\mu$ if either is defined. So g is a Radon-Nikodým derivative of ν, and ν has a Radon-Nikodým derivative which is Σ-measurable and defined everywhere.

(b) If E is µ-negligible, then f × χE = 0 µ-a.e., so νE = 0. Many authors are prepared to say ‘ν is absolutely
continuous with respect to µ’ in this context. But if ν is not totally finite, it need not be absolutely continuous in the
ε-δ sense of 232Aa (234Xh), and further difficulties can arise if µ or ν is not σ-finite (see 234Yk, 234Ym).

(c) I have defined ‘indefinite-integral measure’ in such a way as to produce a complete measure. In my view this is what makes best sense in most applications. There are occasions on which it seems more appropriate to use the measure $\nu_0 : \Sigma \to [0, \infty]$ defined by setting $\nu_0 E = \int_E f \, d\mu = \int f \times \chi E \, d\mu$ for E ∈ Σ. I suppose I would call this the uncompleted indefinite-integral measure over µ defined by f. (ν is always the completion of $\nu_0$; see 234Lb.)

(d) Note the way in which I formulated the definition of ν: ‘$\nu E = \int f \times \chi E \, d\mu$ if the integral is defined’, rather than ‘$\nu E = \int_E f \, d\mu$’. The point is that the longer formula gives a rule for deciding what the domain of ν must be. Of course it is the case that $\nu E = \int_E f \, d\mu$ for every E ∈ dom ν (apply 214F to f × χE).

(e) Because µ and its completion define the same virtually measurable functions, the same null ideals and the same
integrals (212Eb, 212F), they define the same indefinite-integral measures.

234L The domain of an indefinite-integral measure It is sometimes useful to have an explicit description of
the domain of a measure constructed in this way.
Proposition Let (X, Σ, µ) be a measure space, f a non-negative µ-virtually measurable function defined almost
everywhere in X, and ν the associated indefinite-integral measure. Set G = {x : x ∈ dom f , f (x) > 0}, and let Σ̂ be
the domain of the completion µ̂ of µ.
(a) The domain T of ν is {E : E ⊆ X, E ∩ G ∈ Σ̂}; in particular, T ⊇ Σ̂ ⊇ Σ.
(b) ν is the completion of its restriction to Σ.
(c) A set A ⊆ X is ν-negligible iff A ∩ G is µ-negligible.
(d) In particular, if µ itself is complete, then T = {E : E ⊆ X, E ∩ G ∈ Σ} and νA = 0 iff µ(A ∩ G) = 0.
proof (a)(i) If E ∈ T, then f × χE is virtually measurable, so there is a conegligible measurable set H ⊆ dom f such
that f × χE↾H is measurable. Now E ∩ G ∩ H = {x : x ∈ H, (f × χE)(x) > 0} must belong to Σ, while E ∩ G \ H is
negligible, so belongs to Σ̂, and E ∩ G ∈ Σ̂.
(ii) If E ∩ G ∈ Σ̂, let F1 , F2 ∈ Σ be such that F1 ⊆ E ∩ G ⊆ F2 and F2 \ F1 is negligible. Let H ⊆ dom f be a
conegligible set such that f ↾H is measurable. Then H ′ = H \ (F2 \ F1 ) is conegligible and f × χE↾H ′ = f × χF1 ↾H ′
is measurable, so f × χE is virtually measurable and E ∈ T.
(b) Thus the given formula does indeed describe T. If E ∈ T, let F1 , F2 ∈ Σ be such that F1 ⊆ E ∩ G ⊆ F2 and
µ(F2 \ F1 ) = 0. Because G itself also belongs to Σ̂, there are G1 , G2 ∈ Σ such that G1 ⊆ G ⊆ G2 and µ(G2 \ G1 ) = 0.
Set F2′ = F2 ∪ (X \ G1 ). Then F2′ ∈ Σ and F1 ⊆ E ⊆ F2′ . But (F2′ \ F1 ) ∩ G ⊆ (G2 \ G1 ) ∪ (F2 \ F1 ) is µ-negligible, so
ν(F2′ \ F1 ) = 0.
This shows that if ν ′ is the completion of ν↾Σ and T′ is its domain, then T ⊆ T′ . But as ν is complete, it surely
extends ν ′ , so ν = ν ′ , as claimed.
(c) Now take any A ⊆ X. Because ν is complete,

A is ν-negligible ⇐⇒ νA = 0
                  ⇐⇒ ∫ f × χA dµ = 0
                  ⇐⇒ f × χA = 0 µ-a.e.
                  ⇐⇒ A ∩ G is µ-negligible.

(d) This is just a restatement of (a) and (c) when µ = µ̂.

234M Corollary If (X, Σ, µ) is a complete measure space and G ∈ Σ, then the indefinite-integral measure over µ
defined by χG is just the measure µ⌞G defined by setting
(µ⌞G)(F ) = µ(F ∩ G) whenever F ⊆ X and F ∩ G ∈ Σ.

proof 234Ld.
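
As an illustration of 234L and 234M, consider the following instance. The data are assumed for the example and are not taken from the text above: µ is Lebesgue measure on [0, 1], which is complete, and G = [0, 1/2].

```latex
% Assumed data: \mu = Lebesgue measure on X = [0,1] with domain \Sigma,
% G = [0,\tfrac12], f = \chi G, \nu the indefinite-integral measure defined by f.
\[
  \mathrm{T} = \{E : E \subseteq [0,1],\ E \cap [0,\tfrac12] \in \Sigma\}, \qquad
  \nu E = \mu\bigl(E \cap [0,\tfrac12]\bigr) \ \text{ for } E \in \mathrm{T}.
\]
% Every A \subseteq \left]\tfrac12,1\right] has A \cap G = \emptyset, so A \in \mathrm{T}
% and \nu A = 0, whether or not A is Lebesgue measurable.  Since non-measurable
% subsets of \left]\tfrac12,1\right] exist (134D), \mathrm{T} is strictly larger than \Sigma.
```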

*234N The next two results will not be relied on in this volume, but I include them for future reference, and to
give an idea of the scope of these ideas.
Proposition Let (X, Σ, µ) be a measure space, and ν an indefinite-integral measure over µ.
(a) If µ is semi-finite, so is ν.
(b) If µ is complete and locally determined, so is ν.
(c) If µ is localizable, so is ν.
(d) If µ is strictly localizable, so is ν.
(e) If µ is σ-finite, so is ν.
(f) If µ is atomless, so is ν.
proof By 234Ka, we may express ν as the indefinite integral of a Σ-measurable function f : X → [0, ∞[. Let T be
the domain of ν, and Σ̂ the domain of the completion µ̂ of µ; set G = {x : x ∈ X, f (x) > 0} ∈ Σ.
(a) Suppose that E ∈ T and that νE = ∞. Then E ∩ G cannot be µ-negligible. Because µ is semi-finite, there is
a non-negligible F ∈ Σ such that F ⊆ E ∩ G and µF < ∞. Now F = ⋃_{n∈N} {x : x ∈ F , 2^{−n} ≤ f (x) ≤ n}, so there is
an n ∈ N such that F′ = {x : x ∈ F , 2^{−n} ≤ f (x) ≤ n} is non-negligible. Because f is measurable, F′ ∈ Σ ⊆ T and
2^{−n} µF′ ≤ νF′ ≤ nµF′. Thus we have found an F′ ⊆ E such that 0 < νF′ < ∞. As E is arbitrary, ν is semi-finite.
(b) We already know that ν is complete (234Lb) and semi-finite. Now suppose that E ⊆ X is such that E ∩ F ∈ T,
that is, E ∩ F ∩ G ∈ Σ (234Ld), whenever F ∈ T and νF < ∞. Then E ∩ G ∩ F ∈ Σ whenever F ∈ Σ and µF < ∞.
P Set F_n = {x : x ∈ F ∩ G, f (x) ≤ n}. Then νF_n ≤ nµF < ∞, so E ∩ G ∩ F_n ∈ Σ for every n. But this means that
E ∩ G ∩ F = ⋃_{n∈N} E ∩ G ∩ F_n ∈ Σ. Q Because µ is locally determined, E ∩ G ∈ Σ and E ∈ T. As E is arbitrary, ν is
locally determined.

(c) Let ℱ ⊆ T be any set. Set ℰ = {F ∩ G : F ∈ ℱ}, so that ℰ ⊆ Σ̂. By 212Ga, µ̂ is localizable, so ℰ has an essential
supremum H ∈ Σ̂. But now, for any H′ ∈ T, H′ ∪ (X \ G) = (H′ ∩ G) ∪ (X \ G) belongs to Σ̂, so

ν(F \ H′) = 0 for every F ∈ ℱ
    ⇐⇒ µ̂(F ∩ G \ H′) = 0 for every F ∈ ℱ
    ⇐⇒ µ̂(E \ H′) = 0 for every E ∈ ℰ
    ⇐⇒ µ̂(E \ (H′ ∪ (X \ G))) = 0 for every E ∈ ℰ
    ⇐⇒ µ̂(H \ (H′ ∪ (X \ G))) = 0
    ⇐⇒ µ̂(H ∩ G \ H′) = 0
    ⇐⇒ ν(H \ H′) = 0.

Thus H is also an essential supremum of ℱ in T. As ℱ is arbitrary, ν is localizable.


(d) Let ⟨X_i⟩_{i∈I} be a decomposition of X for µ; then it is also a decomposition for µ̂ (212Gb). Set F_0 = X \ G, F_n =
{x : x ∈ G, n − 1 < f (x) ≤ n} for n ≥ 1. Then ⟨X_i ∩ F_n⟩_{i∈I,n∈N} is a decomposition for ν. P (i) ⟨X_i⟩_{i∈I} and ⟨F_n⟩_{n∈N}
are partitions of X into members of Σ ⊆ T, so ⟨X_i ∩ F_n⟩_{i∈I,n∈N} also is. (ii) ν(X_i ∩ F_0) = 0, ν(X_i ∩ F_n) ≤ nµX_i < ∞
for i ∈ I, n ≥ 1. (iii) If E ⊆ X and E ∩ X_i ∩ F_n ∈ T for every i ∈ I and n ∈ N then E ∩ X_i ∩ G = ⋃_{n∈N} E ∩ X_i ∩ F_n ∩ G
belongs to Σ̂ for every i, so E ∩ G ∈ Σ̂ and E ∈ T. (iv) If E ∈ T, then of course
∑_{i∈I,n∈N} ν(E ∩ X_i ∩ F_n) = sup_{J⊆I×N is finite} ∑_{(i,n)∈J} ν(E ∩ X_i ∩ F_n) ≤ νE.
So if ∑_{i∈I,n∈N} ν(E ∩ X_i ∩ F_n) = ∞ it is surely equal to νE. If the sum is finite, then K = {i : i ∈ I, ν(E ∩ X_i) > 0}
must be countable. But for i ∈ I \ K, ∫_{E∩X_i} f dµ = 0, so f = 0 µ-a.e. on E ∩ X_i, that is, µ̂(E ∩ G ∩ X_i) = 0. Because
⟨X_i⟩_{i∈I} is a decomposition for µ̂, µ̂(E ∩ G ∩ ⋃_{i∈I\K} X_i) = 0 and ν(E ∩ ⋃_{i∈I\K} X_i) = 0. But this means that
νE = ∑_{i∈K} ν(E ∩ X_i) = ∑_{i∈K,n∈N} ν(E ∩ X_i ∩ F_n) = ∑_{i∈I,n∈N} ν(E ∩ X_i ∩ F_n).
As E is arbitrary, ⟨X_i ∩ F_n⟩_{i∈I,n∈N} is a decomposition for ν. Q So ν is strictly localizable.
(e) If µ is σ-finite, then in (d) we can take I to be countable, so that I × N also is countable, and ν will be σ-finite.

(f ) If µ is atomless, so is µ̂ (212Gd). If E ∈ T and νE > 0, then µ̂(E ∩ G) > 0, so there is an F ∈ Σ̂ such that
F ⊆ E ∩ G and neither F nor E ∩ G \ F is µ̂-negligible. But in this case both νF = ∫_F f dµ and ν(E \ F ) = ∫_{E\F} f dµ
must be greater than 0 (122Rc). As E is arbitrary, ν is atomless.

*234O For localizable measures, there is a straightforward description of the associated indefinite-integral measures.
Theorem Let (X, Σ, µ) be a localizable measure space. Then a measure ν, with domain T ⊇ Σ, is an indefinite-integral
measure over µ iff (α) ν is semi-finite and zero on µ-negligible sets (β) ν is the completion of its restriction to Σ (γ)
whenever νE > 0 there is an F ⊆ E such that F ∈ Σ, µF < ∞ and νF > 0.
proof (a) If ν is an indefinite-integral measure over µ, then by 234Na, 234Kb and 234Lb it is semi-finite, zero on
µ-negligible sets and the completion of its restriction to Σ. Now suppose that E ∈ T and νE > 0. Then there is an
E0 ∈ Σ such that E0 ⊆ E and νE0 = νE, by 234Lb. If f : X → R is a Σ-measurable Radon-Nikodým derivative of ν
(234Ka), and G = {x : f (x) > 0}, then µ(G ∩ E0 ) > 0; because µ is semi-finite, there is an F ∈ Σ such that F ⊆ G ∩ E0
and 0 < µF < ∞, in which case νF > 0.
(b) So now suppose that ν satisfies the conditions.
(i) Set ℰ = {E : E ∈ Σ, νE < ∞}. For each E ∈ ℰ, consider ν_E : Σ → R, setting ν_E G = ν(G ∩ E) for every
G ∈ Σ. Then ν_E is countably additive and truly continuous with respect to µ. P ν_E is countably additive, just as in
231De. Because ν is zero on µ-negligible sets, ν_E must be absolutely continuous with respect to µ, by 232Ba. Since
ν_E clearly satisfies condition (γ) of 232Bb, it must be truly continuous. Q
By 232E, there is a µ-integrable function f_E such that ν_E G = ∫_G f_E dµ for every G ∈ Σ, and we may suppose that
f_E is Σ-measurable (232He). Because ν_E is non-negative, f_E ≥ 0 µ-almost everywhere.
(ii) If E, F ∈ ℰ then f_E = f_F µ-a.e. on E ∩ F , because
∫_G f_E dµ = νG = ∫_G f_F dµ
whenever G ∈ Σ and G ⊆ E ∩ F . Because (X, Σ, µ) is localizable, there is a measurable f : X → R such that
f_E = f µ-a.e. on E for every E ∈ ℰ (213N). Because every f_E is non-negative almost everywhere, we may suppose
that f is non-negative, since surely f_E = f ∨ 0 µ-a.e. on E for every E ∈ ℰ.
(iii) Let ν′ be the indefinite-integral measure defined by f . If E ∈ ℰ then
νE = ∫_E f_E dµ = ∫_E f dµ = ν′E.
For E ∈ Σ \ ℰ, we have
ν′E ≥ sup{ν′F : F ∈ ℰ, F ⊆ E} = sup{νF : F ∈ ℰ, F ⊆ E} = νE = ∞
because ν is semi-finite. Thus ν′ and ν agree on Σ. But since each is the completion of its restriction to Σ, they must
be equal.

234P Ordering measures There are many ways in which one measure can dominate another. Here I will describe
one of the simplest.
Definition Let µ, ν be two measures on a set X. I will say that µ ≤ ν if µE is defined, and µE ≤ νE, whenever ν
measures E.
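
A minimal illustration of this definition, using assumed measures on a two-point set (the set X and both measures are hypothetical, chosen only to show how the domains enter):

```latex
% Hypothetical example: X = \{a,b\}.
%   \nu has domain \{\emptyset, X\}, with \nu X = 2;
%   \mu has domain \mathcal{P}X, with \mu\{a\} = \mu\{b\} = \tfrac12.
\[
  \operatorname{dom}\nu = \{\emptyset, X\} \subseteq \mathcal{P}X = \operatorname{dom}\mu, \qquad
  \mu\emptyset = 0 = \nu\emptyset, \qquad \mu X = 1 \le 2 = \nu X ,
\]
% so \mu \le \nu.  On the other hand \nu \not\le \mu, because \mu measures \{a\}
% but \nu does not: the smaller measure must have the larger domain.
```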

234Q Proposition Let X be a set, and write M for the set of all measures on X.
(a) Defining ≤ as in 234P, (M, ≤) is a partially ordered set.
(b) If µ, ν ∈ M, then µ ≤ ν iff there is a λ ∈ M such that µ + λ = ν.
(c) If µ ≤ ν in M and f is a [−∞, ∞]-valued function, defined on a subset of X, such that ∫ f dν is defined in
[−∞, ∞], then ∫ f dµ is defined; if f is non-negative, ∫ f dµ ≤ ∫ f dν.
proof (a) Of course µ ≤ µ for every µ ∈ M. If µ ≤ ν and ν ≤ λ in M, then dom λ ⊆ dom ν ⊆ dom µ, and
µE ≤ νE ≤ λE whenever λ measures E. If µ ≤ ν and ν ≤ µ then dom µ ⊆ dom ν ⊆ dom µ and µE ≤ νE ≤ µE for
every E in their common domain, so µ = ν.
(b)(i) If µ + λ = ν, then the definitions in 234G and 234P make it plain that µ ≤ ν.
(ii)(α) In the reverse direction, if µ ≤ ν, write T for the domain of ν. Define λ : T → [0, ∞] by setting
λG = sup{νF − µF : F ∈ T, F ⊆ G, µF < ∞}
for G ∈ T. Then λ ∈ M. P Of course dom λ = T is a σ-algebra, and λ∅ = 0. Suppose that ⟨G_n⟩_{n∈N} is a disjoint
sequence in T with union G. If F ∈ T, F ⊆ G and µF < ∞, then

νF − µF = ∑_{n=0}^∞ ν(F ∩ G_n) − ∑_{n=0}^∞ µ(F ∩ G_n)
        = ∑_{n=0}^∞ (ν(F ∩ G_n) − µ(F ∩ G_n)) ≤ ∑_{n=0}^∞ λG_n;

as F is arbitrary, λG ≤ ∑_{n=0}^∞ λG_n. If γ < ∑_{n=0}^∞ λG_n, there are an m ∈ N such that γ < ∑_{n=0}^m λG_n, and F_0, . . . , F_m
such that F_n ∈ T, F_n ⊆ G_n and µF_n < ∞ for every n ≤ m, while ∑_{n=0}^m (νF_n − µF_n) ≥ γ. Set F = ⋃_{n≤m} F_n; then
F ∈ T, F ⊆ G and µF < ∞, so
λG ≥ νF − µF = ∑_{n=0}^m (νF_n − µF_n) ≥ γ.
As γ is arbitrary, λG ≥ ∑_{n=0}^∞ λG_n; as ⟨G_n⟩_{n∈N} is arbitrary, λ is countably additive. Q
(β) Now µ + λ = ν. P The domain of µ + λ is dom µ ∩ dom λ = T = dom ν. Take G ∈ T. If µG = ∞, then
νG = ∞ = (µ + λ)G. Otherwise,
(µ + λ)G ≥ µG + νG − µG = νG.
So if νG = ∞ we shall certainly have νG = (µ + λ)G. Finally, if νG < ∞ then

(µ + λ)G = µG + sup{νF − µF : F ∈ T, F ⊆ G}
         = sup{νF + µ(G \ F ) : F ∈ T, F ⊆ G}
         ≤ sup{νF + ν(G \ F ) : F ∈ T, F ⊆ G} = νG,
so again we have equality. Q
Thus we have an appropriate expression of ν as a sum of measures.
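
The construction of λ in (b-ii) can be followed through on a simple case. The sketch below uses an assumed example (counting measure on N and its double); it is meant only to show why the supremum is restricted to sets with µF < ∞.

```latex
% Assumed example: X = \mathbb{N}, \mu = counting measure, \nu = 2\mu,
% both with domain \mathcal{P}\mathbb{N}.  For G \subseteq \mathbb{N}, the sets
% F \subseteq G with \mu F < \infty are exactly the finite subsets of G, so
\[
  \lambda G = \sup\{\nu F - \mu F : F \subseteq G \text{ finite}\}
            = \sup\{\#(F) : F \subseteq G \text{ finite}\} = \#(G) \in [0,\infty].
\]
% Hence \lambda = \mu and \mu + \lambda = 2\mu = \nu.  For infinite G the naive
% difference \nu G - \mu G would be \infty - \infty; restricting to \mu F < \infty avoids this.
```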
(c)(i) If f is non-negative, put (b) and 234Hc together.
(ii) In general, if ∫ f dν is defined, so are both ∫ f⁺ dν and ∫ f⁻ dν, and at most one is infinite; so ∫ f⁺ dµ and
∫ f⁻ dµ are defined and at most one is infinite.

234X Basic exercises (a) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and φ : X → Y an inverse-measure-
preserving function. Let A ⊆ X be a set of full outer measure in X. Show that φ[A] has full outer measure in Y , and
that φ↾A is inverse-measure-preserving for the subspace measures on A and φ[A].

(b) Let (X, Σ, µ) be a measure space, Y a set and φ : X → Y a function. Show that if µ is point-supported, so is
the image measure µφ−1 .

(c) Give an example of a probability space (X, Σ, µ), a set Y , and a function φ : X → Y such that the completion
of the image measure µφ−1 is not the image of the completion of µ. (Hint: #(X) = 3.)

(d) Let X, Y be sets, φ : X → Y a function and ⟨µ_i⟩_{i∈I} a family of measures on X with sum µ. Writing µ_iφ⁻¹,
µφ⁻¹ for the image measures on Y , show that µφ⁻¹ = ∑_{i∈I} µ_iφ⁻¹.

(e) Let X be a set. (i) Show that if ⟨µ_i⟩_{i∈I} is a countable family of σ-finite measures on X, and µ = ∑_{i∈I} µ_i is
semi-finite, then µ is σ-finite. (ii) Show that if ⟨µ_i⟩_{i∈I} is a family of purely atomic measures on X, and µ = ∑_{i∈I} µ_i
is semi-finite, then µ is purely atomic. (iii) Show that if ⟨µ_i⟩_{i∈I} is any family of point-supported measures on X, then
∑_{i∈I} µ_i is point-supported.

> (f ) Let X be a set, and write M for the set of all measures on X. For µ ∈ M and α ∈ [0, ∞[, define αµ by saying
that if α > 0 then (αµ)(E) = αµE for E ∈ dom µ, while if α = 0 then (αµ)(E) = 0 for every E ⊆ X. (i) Show that
αµ ∈ M for all α ∈ [0, ∞[ and µ ∈ M. (ii) Show that (α + β)µ = αµ + βµ, α(βµ) = (αβ)µ, α(µ + ν) = αµ + αν for all
α, β ∈ [0, ∞[ and µ, ν ∈ M.

(g) Let X be a set, and ⟨µ_i⟩_{i∈I} a family of complete measures on X with sum µ. Show that a [−∞, ∞]-valued
function f defined on a subset of X is µ-integrable iff it is µ_i-integrable for every i ∈ I and ∑_{i∈I} ∫ |f |dµ_i is finite.

(h) Let µ be Lebesgue measure on [0, 1], and set f (x) = 1/x for x > 0. Let ν be the associated indefinite-integral
measure. Show that the domain of ν is equal to the domain of µ. Show that for every δ ∈ ]0, 1/2[ there is a measurable
set E such that µE = δ but νE = 1/δ.

(i) Let (X, Σ, µ) be a measure space. (i) Show that if ν1 and ν2 are indefinite-integral measures over µ, so is ν1 + ν2 .
(ii) Show that if ⟨ν_i⟩_{i∈I} is a countable family of indefinite-integral measures over µ, and ν = ∑_{i∈I} ν_i is semi-finite,
then ν is an indefinite-integral measure over µ.

(j) Let (X, Σ, µ) be a measure space, and ν an indefinite-integral measure over µ. Show that if µ is purely atomic,
so is ν.

(k) Let µ be a point-supported measure. Show that any indefinite-integral measure over µ is point-supported.

(l) Let X be a set, and M the set of measures on X, with the partial ordering defined in 234P. Show that (i) M
has greatest and least members (to be described); (ii) if ⟨µ_i⟩_{i∈I} and ⟨ν_i⟩_{i∈I} are families in M such that µ_i ≤ ν_i for
every i, then ∑_{i∈I} µ_i ≤ ∑_{i∈I} ν_i; (iii) if we define scalar multiplication as in 234Xf, then αµ ≤ µ whenever µ ∈ M and
α ∈ [0, 1]; (iv) writing µ̂ for the completion of µ, µ̂ ≤ µ and µ̂ ≤ ν̂ whenever µ, ν ∈ M and µ ≤ ν; (v) writing µ̃ for the
c.l.d. version of µ, µ̃ ≤ µ for every µ ∈ M; (vi) whenever A ⊆ M is upwards-directed, it has a least upper bound in M.

(m) Write out an elementary direct proof of 234Qc not depending on 234Qb.

234Y Further exercises (a) Write ν for Lebesgue measure on Y = [0, 1], and T for its domain. Let A ⊆ [0, 1]
be a set such that ν*A = ν*([0, 1] \ A) = 1, and set X = [0, 1] ∪ {x + 1 : x ∈ A} ∪ {x + 2 : x ∈ [0, 1] \ A}. Let µ_{LX}
be the subspace measure induced on X by Lebesgue measure, and set µE = (1/3)µ_{LX}E for E ∈ Σ = dom µ_{LX}. Define
φ : X → Y by writing φ(x) = x if x ∈ [0, 1], φ(x) = x − 1 if x ∈ X ∩ ]1, 2] and φ(x) = x − 2 if x ∈ X ∩ ]2, 3]. Show that
ν is the image measure µφ⁻¹, but that ν*A > µ*φ⁻¹[A].

(b) Look for interesting examples of probability spaces (X, Σ, µ) and (Y, T, ν) for which there are functions φ : X → Y
such that φ[E] ∈ T and νφ[E] = µE for every E ∈ Σ. (Hint: 254K, 343J.)

(c) Let µ be two-dimensional Lebesgue measure on the unit square [0, 1]², and let φ : [0, 1]² → [0, 1] be the projection
onto the first coordinate, so that φ(ξ1 , ξ2 ) = ξ1 for ξ1 , ξ2 ∈ [0, 1]. Show that the image measure µφ⁻¹ is Lebesgue
one-dimensional measure on [0, 1].

(d) In 234F, show that the image measure µφ−1 extends ν, and is equal to ν if and only if F ∈ T for every
F ⊆ Y \ φ[X].

(e) Let (Y, T, ν) be a complete measure space, X a set and φ : X → Y a surjection. Set
Σ = {E : E ⊆ X, φ[E] ∈ T, ν(φ[E] ∩ φ[X \ E]) = 0}, µE = νφ[E] for E ∈ Σ.
Show that µ is the completion of the measure constructed by the process of 234F.

(f ) Let X be a set, and M the set of measures on X. Show that M, with addition as defined for two measures by
the formulae of 234G, is a commutative semigroup with identity; describe the identity.

(g) Give an example of a set X, probability measures µ1 , µ2 on X and a set A ⊆ X such that A is both µ1 -negligible
and µ2 -negligible, but is not µ-negligible, where µ = µ1 + µ2 .

(h) In 214O, show that if we set νE = sup_{I∈ℐ} µ∗(E ∩ I) for every E ∈ Σ, then ν is a measure, while µ = ν + λ.

(i) Let (X, Σ, µ) be an atomless semi-finite measure space and ν an indefinite-integral measure over µ. Show that
the following are equiveridical: (i) for every ε > 0 there is a δ > 0 such that νE ≤ ε whenever µE ≤ δ (ii) ν has a
Radon-Nikodým derivative expressible as the sum of a bounded function and an integrable function.

(j) Let (X, Σ, µ) be a measure space and ν an indefinite-integral measure over µ, with Radon-Nikodým derivative
f . Show that the c.l.d. version of ν is the indefinite-integral measure defined by f over the c.l.d. version of µ.

(k) Let (X, Σ, µ) be a semi-finite measure space which is not localizable. Show that there is a measure ν : Σ → [0, ∞]
such that νE ≤ µE for every E ∈ Σ but there is no measurable function f such that νE = ∫_E f dµ for every E ∈ Σ.

(l) Let (X, Σ, µ) be a localizable measure space with locally determined negligible sets. Show that a measure ν, with
domain T ⊇ Σ, is an indefinite-integral measure over µ iff (α) ν is complete and semi-finite and zero on µ-negligible
sets (β) whenever νE > 0 there is an F ⊆ E such that F ∈ Σ and µF < ∞ and νF > 0.

(m) Give an example of a localizable measure space (X, Σ, µ) and a complete semi-finite measure ν on X, defined
on a σ-algebra T ⊇ Σ, zero on µ-negligible sets, and such that whenever νE > 0 there is an F ⊆ E such that F ∈ Σ
and µF < ∞ and νF > 0, but ν is not an indefinite-integral measure over µ. (Hint: 216Yb.)

(n) Let (X, Σ, µ) be a localizable measure space, and ν a complete localizable measure on X, with domain T ⊇ Σ,
which is the completion of its restriction to Σ. Show that if we set ν1 F = sup{ν(F ∩ E) : E ∈ Σ, µE < ∞} for every
F ∈ T, then ν1 is an indefinite-integral measure over µ, and there is an H ∈ Σ such that ν1 F = ν(F ∩ H) for every
F ∈ T.

(o) Let X be a set, and M_sf the set of semi-finite measures on X. For µ, ν ∈ M_sf say that µ ≼ ν if dom ν ⊆ dom µ,
µF ≤ νF for every F ∈ dom ν, and whenever E ∈ dom µ and µE > 0 there is an F ∈ dom ν such that F ⊆ E and
0 < µF < ∞. (i) Show that (M_sf , ≼) is a partially ordered set. (ii) Show that if A ⊆ M_sf is a non-empty set with an
upper bound in M_sf , then it has a least upper bound λ defined by saying that dom λ = ⋂_{µ∈A} dom µ and, for E ∈ dom λ,

λE = sup{∑_{i=0}^n µ_iF_i : µ_0, . . . , µ_n ∈ A, ⟨F_i⟩_{i≤n} is a partition of E,
            F_i ∈ dom λ for every i ≤ n}
   = sup{∑_{i=0}^n µ_iF_i : µ_0, . . . , µ_n ∈ A, F_0, . . . , F_n are disjoint,
            F_i ∈ dom µ_i and F_i ⊆ E for every i ≤ n}.
(iii) Suppose that µ, ν ∈ M_sf have completions µ̂, ν̂ and c.l.d. versions µ̃, ν̃. Show that µ̃ ≼ µ̂ ≼ µ. Show that if µ ≼ ν
then µ̂ ≼ ν̂ and µ̃ ≼ ν̃.

234 Notes and comments One of the striking features of measure theory, compared with other comparably abstract
branches of pure mathematics, is the relative unimportance of any notion of ‘morphism’. The theory of groups, for
instance, is dominated by the concept of ‘homomorphism’, and general topology gives a similar place to ‘continuous
function’. In my view, the nearest equivalent in measure theory is the idea of ‘inverse-measure-preserving function’
(234A). I mean in Volumes 3 and 4 to explore this concept more thoroughly. In this volume I will content myself with
signalling such functions when they arise, and with the basic facts listed in 234B.
Naturally linked with the idea of inverse-measure-preserving function is the construction of ‘image measures’ (234C).
These appear everywhere in the subject, starting with the not-quite-elementary 234Yc. They are of such importance
that it is natural to explore variations, as in 234F and 234Yb, but in my view none are of comparable significance.
Nearly half the section is taken up with ‘indefinite-integral measures’. I have taken this part very carefully because
the ideas I wish to express here, in so far as they extend the work of §232, rely critically on the details of the formulation
in 234I, and it is easy to make a false step once we have left the relatively sheltered context of complete σ-finite measures.
I believe that if we take a little trouble at this point we can develop a theory (234K-234N) which will offer a smooth
path to later applications; to see what I have in mind, you can refer to the entries under ‘indefinite-integral measure’
in the index. For the moment I mention only a kind of Radon-Nikodým theorem for localizable measures (234O).
The partial ordering described in 234P-234Q is only one of many which can be considered, and for some purposes
it seems unsatisfactory. The most important examples will appear in Chapter 41 of Volume 4, and have a variety of
special features for which it might be worth setting out further abstractions. However the version here has the merit
of simplicity and supports at least some of the relevant ideas (234Xl). For an alternative notion, see 234Yo.

235 Measurable transformations


I turn now to a topic which is separate from the Radon-Nikodým theorem, but which seems to fit better here than
in either of the next two chapters. I seek to give results which will generalize the basic formula of calculus
∫ g(y)dy = ∫ g(φ(x))φ′(x)dx
in the context of a general transformation φ between measure spaces. The principal results are I suppose 235A/235E,
which are very similar expressions of the basic idea, and 235J, which gives a general criterion for a stronger result. A
formulation from a different direction is in 235R.

235A I start with the basic result, which is already sufficient for a large proportion of the applications I have in
mind.
Theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and φ : Dφ → Y , J : DJ → [0, ∞[ functions defined on
conegligible subsets Dφ , DJ of X such that
∫ J × χ(φ⁻¹[F ]) dµ exists = νF
whenever F ∈ T and νF < ∞. Then
∫_{φ⁻¹[H]} J × gφ dµ exists = ∫_H g dν
for every ν-integrable function g taking values in [−∞, ∞] and every H ∈ T, provided that we interpret (J × gφ)(x) as
0 when J(x) = 0 and g(φ(x)) is undefined. Consequently, interpreting J × f φ in the same way,
∫̲ f dν ≤ ∫̲ J × f φ dµ ≤ ∫̅ J × f φ dµ ≤ ∫̅ f dν
(lower integrals on the left, upper integrals on the right) for every [−∞, ∞]-valued function f defined almost everywhere in Y .
proof (a) If g is a simple function, say g = ∑_{i=0}^n a_iχF_i where νF_i < ∞ for each i, then
∫ J × gφ dµ = ∑_{i=0}^n a_i ∫ J × χ(φ⁻¹[F_i]) dµ = ∑_{i=0}^n a_iνF_i = ∫ g dν.

(b) If νF = 0 then ∫ J × χ(φ⁻¹[F ]) = 0 so J = 0 a.e. on φ⁻¹[F ]. So if g is defined ν-a.e., J = 0 µ-a.e. on
X \ dom(gφ) = (X \ Dφ ) ∪ φ⁻¹[Y \ dom g], and, on the convention proposed, J × gφ is defined µ-a.e. Moreover, if
lim_{n→∞} g_n = g ν-a.e., then lim_{n→∞} J × g_nφ = J × gφ µ-a.e. So if ⟨g_n⟩_{n∈N} is a non-decreasing sequence of simple
functions converging almost everywhere to g, ⟨J × g_nφ⟩_{n∈N} will be a non-decreasing sequence of integrable functions
converging almost everywhere to J × gφ; by B.Levi’s theorem,
∫ J × gφ dµ exists = lim_{n→∞} ∫ J × g_nφ dµ = lim_{n→∞} ∫ g_n dν = ∫ g dν.

(c) If g = g⁺ − g⁻, where g⁺ and g⁻ are ν-integrable functions, then
∫ J × gφ dµ = ∫ J × g⁺φ dµ − ∫ J × g⁻φ dµ = ∫ g⁺ dν − ∫ g⁻ dν = ∫ g dν.

(d) This deals with the case H = Y . For the general case, we have

∫_H g dν = ∫ (g × χH) dν
(131Fa)
         = ∫ J × (g × χH)φ dµ = ∫ J × gφ × χ(φ⁻¹[H]) dµ = ∫_{φ⁻¹[H]} J × gφ dµ

by 214F.

(e) For the upper and lower integrals, I note first that if F is ν-negligible then ∫ J × χ(φ⁻¹[F ]) dµ = 0, so that
J = 0 µ-a.e. on φ⁻¹[F ]. It follows that if f and g are [−∞, ∞]-valued functions on subsets of Y and f ≤_{a.e.} g, then
J × f φ ≤_{a.e.} J × gφ. Now if ∫̅ f dν = ∞, we surely have ∫̅ J × f φ dµ ≤ ∫̅ f dν. Otherwise,

∫̅ f dν = inf{∫ g dν : g is ν-integrable and f ≤_{a.e.} g}
        = inf{∫ J × gφ dµ : g is ν-integrable and f ≤_{a.e.} g}
        ≥ inf{∫ h dµ : h is µ-integrable and J × f φ ≤_{a.e.} h} = ∫̅ J × f φ dµ,

because every function J × gφ appearing in the second infimum is one of the functions h allowed in the third.
Similarly, or applying this argument to −f , we have ∫̲ f dν ≤ ∫̲ J × f φ dµ.
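
As a sanity check on the statement of 235A, here is a worked instance of the elementary substitution formula. The choices φ(x) = x², J(x) = 2x and g(y) = y are assumptions made for this example; the hypothesis is verified only on intervals, and its extension to all measurable sets of finite measure is taken for granted here.

```latex
% Assumed example: X = Y = [0,1], \mu = \nu = Lebesgue measure,
% \phi(x) = x^2, J(x) = 2x, g(y) = y.
% Hypothesis, checked on an interval F = [a,b] \subseteq [0,1]:
\[
  \int J \times \chi(\phi^{-1}[F])\,d\mu = \int_{\sqrt a}^{\sqrt b} 2x\,dx = b - a = \nu F .
\]
% Conclusion, for H = Y:
\[
  \int J \times g\phi\,d\mu = \int_0^1 2x \cdot x^2\,dx = \tfrac12
  = \int_0^1 y\,dy = \int g\,d\nu .
\]
```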

235B Remarks (a) Note the particular convention


0 × undefined = 0

which I am applying to the interpretation of J × gφ. This is the first of a number of technical points which will concern
us in this section. The point is that if g is defined ν-almost everywhere, then for any extension of g to a function
g1 : Y → R we shall, on this convention, have J × gφ = J × g1 φ except on {x : J(x) > 0, φ(x) ∈ Y \ dom g}, which is
negligible; so that
∫ J × gφ dµ = ∫ J × g1 φ dµ = ∫ g1 dν = ∫ g dν
if g and g1 are integrable. Thus the convention is appropriate here, and while it adds a phrase to the statements of
many of the results of this section, it makes their application smoother. (But I ought to insist that I am using this as
a local convention only, and the ordinary rule 0 × undefined = undefined will stand elsewhere in this treatise unless
explicitly overruled.)

(b) I have had to take care in the formulation of this theorem to distinguish between the hypothesis
∫ J(x)χ(φ⁻¹[F ])(x) µ(dx) exists = νF whenever νF < ∞
and the perhaps more elegant alternative
∫_{φ⁻¹[F ]} J(x) µ(dx) exists = νF whenever νF < ∞,
which is not quite adequate for the theorem. (See 235Q below.) Recall that by ∫_A f I mean ∫ (f ↾A) dµ_A , where µ_A
is the subspace measure on A (214D). It is possible for ∫_A (f ↾A) dµ_A to be defined even when ∫ f × χA dµ is not; for
instance, take µ to be Lebesgue measure on [0, 1], A any non-measurable subset of [0, 1], and f the constant function
with value 1; then ∫_A f = µ*A, but f × χA = χA is not µ-integrable. It is however the case that if ∫ f × χA dµ is
defined, then so is ∫_A f , and the two are equal; this is a consequence of 214F. While 235P shows that in most of the
cases relevant to the present volume the distinction can be passed over, it is important to avoid assuming that φ⁻¹[F ]
is measurable for every F ∈ T. A simple example is the following. Set X = Y = [0, 1]. Let µ be Lebesgue measure on
[0, 1], and define ν by setting
T = {F : F ⊆ [0, 1], F ∩ [0, 1/2] is Lebesgue measurable},

νF = 2µ(F ∩ [0, 1/2]) for every F ∈ T.

Set φ(x) = x for every x ∈ [0, 1]. Then we have
νF = ∫_F J dµ = ∫ J × χ(φ⁻¹[F ]) dµ
for every F ∈ T, where J(x) = 2 for x ∈ [0, 1/2] and J(x) = 0 for x ∈ ]1/2, 1]. But of course there are subsets F of [1/2, 1]
which are not Lebesgue measurable (see 134D), and such an F necessarily belongs to T, even though φ⁻¹[F ] does not
belong to the domain Σ of µ.
The point here is that if νF0 = 0 then we expect to have J = 0 on φ⁻¹[F0 ], and it is of no importance whether
φ⁻¹[F ] is measurable for F ⊆ F0 .
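
The convention 0 × undefined = 0 of (a) can be seen at work in the example just given. In the sketch below the function g is an assumption introduced for illustration: it is defined only on [0, 1/2], which is ν-conegligible.

```latex
% Setting of the example in (b): X = Y = [0,1], \mu Lebesgue measure, \phi(x) = x,
% J = 2 on [0,\tfrac12] and J = 0 on \left]\tfrac12,1\right], \nu F = 2\mu(F \cap [0,\tfrac12]).
% Assumed function: g(y) = y for y \in [0,\tfrac12], g undefined elsewhere.
% On \left]\tfrac12,1\right] we have J(x) = 0 and g(\phi(x)) undefined, so (J \times g\phi)(x) = 0
% by the convention, and J \times g\phi is defined everywhere on X.  Then
\[
  \int J \times g\phi\,d\mu = \int_0^{1/2} 2x\,dx = \tfrac14 , \qquad
  \int g\,d\nu = \int_0^{1/2} y \cdot 2\,dy = \tfrac14 ,
\]
% in agreement with 235A.
```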

235C Theorem 235A is concerned with integration, and accordingly the hypothesis ∫ J × χ(φ⁻¹[F ]) dµ = νF looks
only at sets F of finite measure. If we wish to consider measurability of non-integrable functions, we need a slightly
stronger hypothesis. I approach this version more gently, with a couple of lemmas.
Lemma Let Σ, T be σ-algebras of subsets of X and Y respectively. Suppose that D ⊆ X and that φ : D → Y
is a function such that φ−1 [F ] ∈ ΣD , the subspace σ-algebra, for every F ∈ T. Then gφ is Σ-measurable for every
[−∞, ∞]-valued T-measurable function g defined on a subset of Y .
proof Set C = dom g and B = dom gφ = φ−1 [C]. If a ∈ R, then there is an F ∈ T such that {y : g(y) ≤ a} = F ∩ C.
Now there is an E ∈ Σ such that φ−1 [F ] = E ∩ D. So
{x : gφ(x) ≤ a} = B ∩ E ∈ ΣB .
As a is arbitrary, gφ is Σ-measurable.

235D Some of the results below are easier when we can move freely between measure spaces and their completions
(212C). The next lemma is what we need.
Lemma Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with completions (X, Σ̂, µ̂) and (Y, T̂, ν̂). Let φ : Dφ → Y ,
J : DJ → [0, ∞[ be functions defined on conegligible subsets of X.
(a) If ∫ J × χ(φ⁻¹[F ]) dµ = νF whenever F ∈ T and νF < ∞, then ∫ J × χ(φ⁻¹[F ]) dµ̂ = ν̂F whenever F ∈ T̂ and
ν̂F < ∞.
(b) If ∫ J × χ(φ⁻¹[F ]) dµ = νF whenever F ∈ T, then ∫ J × χ(φ⁻¹[F ]) dµ̂ = ν̂F whenever F ∈ T̂.
proof Both rely on the fact that either hypothesis is enough to ensure that ∫ J × χ(φ⁻¹[F ]) dµ = 0 whenever νF = 0.
Accordingly, if F is ν-negligible, so that there is an F′ ∈ T such that F ⊆ F′ and νF′ = 0, we shall have
∫ J × χ(φ⁻¹[F ]) dµ = ∫ J × χ(φ⁻¹[F′]) dµ = 0.
But now, given any F ∈ T̂, there is an F0 ∈ T such that F0 ⊆ F and ν̂(F \ F0 ) = 0, so that

∫ J × χ(φ⁻¹[F ]) dµ̂ = ∫ J × χ(φ⁻¹[F ]) dµ
                     = ∫ J × χ(φ⁻¹[F0 ]) dµ + ∫ J × χ(φ⁻¹[F \ F0 ]) dµ
                     = νF0 = ν̂F,
provided (for part (a)) that ν̂F < ∞.
Remark Thus if we have the hypotheses of any of the principal results of this section valid for a pair of non-complete
measure spaces, we can expect to be able to draw some conclusion by applying the result to the completions of the
given spaces.

235E Now I come to the alternative version of 235A.


Theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and φ : Dφ → Y , J : DJ → [0, ∞[ two functions defined on
conegligible subsets of X such that
∫ J × χ(φ⁻¹[F ]) dµ = νF
for every F ∈ T, allowing ∞ as a value of the integral.
(a) J × gφ is µ-virtually measurable for every ν-virtually measurable function g defined on a subset of Y .
(b) Let g be a ν-virtually measurable [−∞, ∞]-valued function defined on a conegligible subset of Y . Then
∫ J × gφ dµ = ∫ g dν whenever either integral is defined in [−∞, ∞], if we interpret (J × gφ)(x) as 0 when J(x) = 0 and
g(φ(x)) is undefined.
proof Let (X, Σ̂, µ̂) and (Y, T̂, ν̂) be the completions of (X, Σ, µ) and (Y, T, ν). By 235D,
∫ J × χ(φ⁻¹[F ]) dµ̂ = ν̂F
for every F ∈ T̂. Recalling that a real-valued function is µ-virtually measurable iff it is Σ̂-measurable (212Fa), and
that ∫ f dµ = ∫ f dµ̂ if either is defined in [−∞, ∞] (212Fb), the conclusions we are seeking are
(a)′ J × gφ is Σ̂-measurable for every T̂-measurable function g defined on a subset of Y ;
(b)′ ∫ J × gφ dµ̂ = ∫ g dν̂ whenever g is a T̂-measurable function defined almost everywhere in Y and
either integral is defined in [−∞, ∞].
(a) When I write
∫ J × χDφ dµ = ∫ J × χ(φ⁻¹[Y ]) dµ = νY ,
which is part of the hypothesis of this theorem, I mean to imply that J × χDφ is µ-virtually measurable, that is, is
Σ̂-measurable. Because Dφ is conegligible, it follows that J is Σ̂-measurable, and its domain DJ , being conegligible,
also belongs to Σ̂. Set G = {x : x ∈ DJ , J(x) > 0} ∈ Σ̂. Then for any set A ⊆ X, J ×χA is Σ̂-measurable iff A∩G ∈ Σ̂.
So the hypothesis is just that G ∩ φ−1 [F ] ∈ Σ̂ for every F ∈ T̂.
Now let g be a [−∞, ∞]-valued function, defined on a subset C of Y , which is T̂-measurable. Applying 235C to
φ↾G, we see that gφ↾G is Σ̂-measurable, so (J × gφ)↾G is Σ̂-measurable. On the other hand, J × gφ is zero almost
everywhere in X \ G, so (because G ∈ Σ̂) J × gφ is Σ̂-measurable, as required.

(b)(i) Suppose first that g ≥ 0. Then J × gφ ≥ 0, so (a) tells us that ∫ J × gφ is defined in [0, ∞].

(α) If ∫ g dν̂ < ∞ then ∫ J × gφ dµ̂ = ∫ g dν̂ by 235A.

(β) If there is some ε > 0 such that ν̂H = ∞, where H = {y : g(y) ≥ ε}, then
∫ J × gφ dµ̂ ≥ ε ∫ J × χ(φ⁻¹[H]) dµ̂ = εν̂H = ∞,
so
∫ J × gφ dµ̂ = ∞ = ∫ g dν̂.

(γ) Otherwise,

∫ J × gφ dµ̂ ≥ sup{∫ J × hφ dµ̂ : h is ν̂-integrable, 0 ≤ h ≤ g}
             = sup{∫ h dν̂ : h is ν̂-integrable, 0 ≤ h ≤ g} = ∫ g dν̂ = ∞,
so once again ∫ J × gφ dµ̂ = ∫ g dν̂.
(ii) For general real-valued g, apply (i) to g⁺ and g⁻ where g⁺ = (1/2)(|g| + g), g⁻ = (1/2)(|g| − g); the point is that
(J × gφ)⁺ = J × g⁺φ and (J × gφ)⁻ = J × g⁻φ, so that
∫ J × gφ = ∫ J × g⁺φ − ∫ J × g⁻φ = ∫ g⁺ − ∫ g⁻ = ∫ g
if either side is defined in [−∞, ∞].

235F Remarks (a) Of course there are two special cases of this theorem which between them carry all its content:
the case J = 1 a.e. and the case in which X = Y and φ is the identity function. If J = χX we are very close to 235G
below, and if φ is the identity function we are close to the indefinite-integral measures of §234.

(b) As in 235A, we can strengthen the conclusion of (b) in 235E to
∫_{φ⁻¹[F ]} J × gφ dµ = ∫_F g dν
whenever F ∈ T and ∫_F g dν is defined in [−∞, ∞].

235G Theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces and φ : X → Y an inverse-measure-preserving
function. Then
(a) if g is a ν-virtually measurable [−∞, ∞]-valued function defined on a subset of Y , gφ is µ-virtually measurable;
(b) if g is a ν-virtually measurable [−∞, ∞]-valued function defined on a conegligible subset of Y , ∫ gφ dµ = ∫ g dν
if either integral is defined in [−∞, ∞];
(c) if g is a ν-virtually measurable [−∞, ∞]-valued function defined on a conegligible subset of Y , and F ∈ T, then
∫_{φ⁻¹[F ]} gφ dµ = ∫_F g dν if either integral is defined in [−∞, ∞].

proof (a) This follows immediately from 234Ba and 235C; taking Σ̂, T̂ to be the domains of the completions of µ, ν
respectively, φ−1 [F ] ∈ Σ̂ for every F ∈ T̂, so if g is T̂-measurable then gφ will be Σ̂-measurable.
(b) Apply 235E with J = χX; we have
∫ J × χ(φ⁻¹[F ]) dµ = µφ⁻¹[F ] = νF
for every F ∈ T, so
∫ gφ = ∫ J × gφ = ∫ g
if either integral is defined in [−∞, ∞].
(c) Apply (b) to g × χF .
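
As a sanity check on 235Gb-c, here is a minimal sketch on a finite space, where every subset is measurable and integrals are finite sums. The sets X = {0,...,5}, Y = {0,1,2}, the map `phi` and the function `g` are toy choices made for illustration only; ν is taken to be the image measure, so that φ is inverse-measure-preserving by construction.

```python
from fractions import Fraction

# Toy data (assumed for illustration): X = {0,...,5}, Y = {0,1,2}.
mu = {x: Fraction(1, 6) for x in range(6)}   # uniform probability on X

def phi(x):
    return x % 3                             # phi : X -> Y

# Take nu to be the image measure, so phi is inverse-measure-preserving.
nu = {y: sum(w for x, w in mu.items() if phi(x) == y) for y in range(3)}

def g(y):
    return Fraction(y * y - 1)               # an arbitrary real-valued g on Y

def integral(f, weights, over=None):
    """Sum f(p) * weight(p) over the points p (optionally only those in `over`)."""
    return sum(f(p) * w for p, w in weights.items() if over is None or p in over)

# 235Gb: int g(phi) d mu = int g d nu
lhs = integral(lambda x: g(phi(x)), mu)
rhs = integral(g, nu)

# 235Gc with F = {1, 2}: integrate over phi^{-1}[F] on the left, over F on the right
F = {1, 2}
preimage_F = {x for x in mu if phi(x) in F}
lhs_F = integral(lambda x: g(phi(x)), mu, over=preimage_F)
rhs_F = integral(g, nu, over=F)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to rounding.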

235H The image measure catastrophe Applications of 235A would run much more smoothly if we could say
‘$\int g \, d\nu$ exists and is equal to $\int J \times g\phi \, d\mu$ for every g : Y → R such that J × gφ is µ-integrable’.
Unhappily there is no hope of a universally applicable result in this direction. Suppose, for instance, that ν is Lebesgue
measure on Y = [0, 1], that X ⊆ [0, 1] is a non-Lebesgue-measurable set of outer measure 1 (134D), that µ is the
subspace measure νX on X, and that φ(x) = x for x ∈ X. Then
$$\mu\phi^{-1}[F] = \nu^*(X \cap F) = \nu F$$
for every Lebesgue measurable set F ⊆ Y, so we can take J = χX and the hypotheses of 235A and 235E will be
satisfied. But if we write g = χX : [0, 1] → {0, 1}, then $\int g\phi \, d\mu$ is defined even though $\int g \, d\nu$ is not.
The point here is that there is a set A ⊆ Y such that (in the language of 235A/235E) φ−1[A] ∈ Σ but A ∉ T̂. This
is the image measure catastrophe. The search for contexts in which we can be sure that it does not occur will be
one of the motive themes of Volume 4. For the moment, I will offer some general remarks (235I-235J), and describe
one of the important cases in which the problem does not arise (235K).
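
The Lebesgue example above cannot be computed with, but the underlying mechanism (a set A with φ−1[A] ∈ Σ and A ∉ T̂) already shows up in a two-point toy model. The following sketch is only a finite analogue under assumed data: X = {0} with Σ = PX, Y = {0, 1} with the trivial σ-algebra T = {∅, Y}, and φ(0) = 0. Since the only ν-negligible set is ∅, here T̂ = T.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))}

X, Y = {0}, {0, 1}
Sigma = powerset(X)                       # every subset of X is measurable
T = {frozenset(), frozenset(Y)}           # trivial sigma-algebra on Y
mu = {E: len(E) for E in Sigma}           # mu{0} = 1
nu = {frozenset(): 0, frozenset(Y): 1}    # nu Y = 1
phi = {0: 0}                              # phi : X -> Y

def preimage(F):
    return frozenset(x for x in X if phi[x] in F)

# phi is inverse-measure-preserving for mu and nu
inverse_measure_preserving = all(
    preimage(F) in Sigma and mu[preimage(F)] == nu[F] for F in T)

# A = {0} has a measurable preimage but is not itself in T
A = frozenset({0})
catastrophe = preimage(A) in Sigma and A not in T

# g = chi A is not T-measurable, yet g(phi) is the constant 1 on X
g_phi = {x: 1 if phi[x] in A else 0 for x in X}
```

So gφ is µ-integrable with integral 1, while g = χA is not measurable for ν and its integral is not defined.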

235I Lemma Let Σ, T be σ-algebras of subsets of X, Y respectively, and φ a function from a subset D of X to Y .
Suppose that G ⊆ X and that
T = {F : F ⊆ Y, G ∩ φ−1 [F ] ∈ Σ}.
Then a real-valued function g, defined on a member of T, is T-measurable iff χG × gφ is Σ-measurable.
proof Because surely Y ∈ T, the hypothesis implies that G ∩ D = G ∩ φ−1 [Y ] belongs to Σ.
Let g : C → R be a function, where C ∈ T. Set B = dom(gφ) = φ−1 [C], and for a ∈ R set Fa = {y : g(y) ≥ a},
Ea = G ∩ φ−1 [Fa ] = {x : x ∈ G ∩ B, gφ(x) ≥ a}.
Note that G ∩ B ∈ Σ because C ∈ T.
(i) If g is T-measurable, then Fa ∈ T and Ea ∈ Σ for every a. Now
G ∩ {x : x ∈ B, gφ(x) ≥ a} = G ∩ φ−1 [Fa ] = Ea ,
so {x : x ∈ B, (χG × gφ)(x) ≥ a} is either Ea or Ea ∪ (B \ G), and in either case is relatively Σ-measurable in B. As
a is arbitrary, χG × gφ is Σ-measurable.
(ii) If χG × gφ is Σ-measurable, then, for any a ∈ R,
Ea = {x : x ∈ G ∩ B, (χG × gφ)(x) ≥ a} ∈ Σ
because G ∩ B ∈ Σ and χG × gφ is Σ-measurable. So Fa ∈ T. As a is arbitrary, g is T-measurable.

235J Theorem Let (X, Σ, µ) and (Y, T, ν) be complete measure spaces. Let φ : Dφ → Y , J : DJ → [0, ∞[ be
functions defined on conegligible subsets of X, and set G = {x : x ∈ DJ , J(x) > 0}. Suppose that
$$T = \{F : F \subseteq Y, \; G \cap \phi^{-1}[F] \in \Sigma\},$$
$$\nu F = \int J \times \chi(\phi^{-1}[F]) \, d\mu \text{ for every } F \in T.$$
Then, for any real-valued function g defined on a subset of Y, $\int J \times g\phi \, d\mu = \int g \, d\nu$ whenever either integral is defined
in [−∞, ∞], provided that we interpret (J × gφ)(x) as 0 when J(x) = 0 and g(φ(x)) is undefined.
proof If g is T-measurable and defined almost everywhere, this is a consequence of 235E. So I have to show that
if J × gφ is measurable and defined almost everywhere, so is g. Set W = Y \ dom g. Then J × gφ is undefined on
G ∩ φ−1 [W ], because gφ is undefined there and we cannot take advantage of the escape clause available when J = 0;
so G ∩ φ−1 [W ] must be negligible, therefore measurable, and W ∈ T. Next,
$$\nu W = \int J \times \chi(\phi^{-1}[W]) = 0$$
because J × χ(φ−1 [W ]) can be non-zero only on the negligible set G ∩ φ−1 [W ]. So g is defined almost everywhere.
Note that the hypothesis surely implies that J × χDφ = J × χ(φ−1 [Y ]) is measurable, so that J is measurable
(because Dφ is conegligible) and G ∈ Σ. Writing K(x) = 1/J(x) for x ∈ G, 0 for x ∈ X \ G, the function K : X → R
is measurable, and
χG × gφ = K × J × gφ
is measurable. So 235I tells us that g must be measurable, and we’re done.
Remark When J = χX, the hypothesis of this theorem becomes
T = {F : F ⊆ Y, φ−1 [F ] ∈ Σ}, νF = µφ−1 [F ] for every F ∈ T;
that is, ν is the image measure µφ−1 as defined in 234D.

235K Corollary Let (X, Σ, µ) be a complete measure space, and J a non-negative measurable function defined
on a conegligible subset of X. Let ν be the associated indefinite-integral measure, and T its domain. Then for any
real-valued function g defined on a subset of X, g is T-measurable iff J × g is Σ-measurable, and $\int g \, d\nu = \int J \times g \, d\mu$ if
either integral is defined in [−∞, ∞], provided that we interpret (J × g)(x) as 0 when J(x) = 0 and g(x) is undefined.
proof Put 235J, taking Y = X and φ the identity function, together with 234Ld.
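
To see the convention about undefined values at work, here is a small sketch of 235K on a three-point space. The weights, the density `J` and the partial function `g` are assumptions chosen for illustration; g is deliberately left undefined at a point where J vanishes, which is a ν-negligible point.

```python
from fractions import Fraction

# Toy measure mu on X = {0, 1, 2} and a density J that vanishes at 1.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
J = {0: Fraction(2), 1: Fraction(0), 2: Fraction(4)}

# Indefinite-integral measure: nu{x} = J(x) * mu{x}
nu = {x: J[x] * mu[x] for x in mu}

# g is undefined at 1; its domain {0, 2} is nu-conegligible because nu{1} = 0
g = {0: Fraction(3), 2: Fraction(-1)}

def J_times_g(x):
    """(J x g)(x), read as 0 when J(x) = 0 and g(x) is undefined."""
    if J[x] == 0 and x not in g:
        return Fraction(0)
    return J[x] * g[x]

int_g_dnu = sum(g[x] * nu[x] for x in g)
int_Jg_dmu = sum(J_times_g(x) * mu[x] for x in mu)
```

Without the convention, J × g would be undefined on a set of µ-measure 1/4 and so would not be defined µ-almost everywhere.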

235L Applying the Radon-Nikodým theorem In order to use 235A-235J effectively, we need to be able to find
suitable functions J. This can be difficult – some very special examples will take up most of Chapter 26 below. But
there are many circumstances in which we can be sure that such J exist, even if we do not know what they are. A
minimal requirement is that if νF < ∞ and µ∗φ−1[F] = 0 then νF = 0, because $\int J \times \chi(\phi^{-1}[F]) \, d\mu$ will be zero for
any J. A sufficient condition, in the special case of indefinite-integral measures, is in 234O. Another is the following.

235M Theorem Let (X, Σ, µ) be a σ-finite measure space, (Y, T, ν) a semi-finite measure space, and φ : D → Y a
function such that
(i) D is a conegligible subset of X,
(ii) φ−1 [F ] ∈ Σ for every F ∈ T;
(iii) µφ−1[F] > 0 whenever F ∈ T and νF > 0.
Then there is a Σ-measurable function J : X → [0, ∞[ such that $\int J \times \chi\phi^{-1}[F] \, d\mu = \nu F$ for every F ∈ T.
proof (a) To begin with (down to the end of (c) below) let us suppose that D = X and that ν is totally finite.
Set T̃ = {φ−1[F] : F ∈ T} ⊆ Σ. Then T̃ is a σ-algebra of subsets of X. P P (i)
$$\emptyset = \phi^{-1}[\emptyset] \in \tilde T.$$
(ii) If E ∈ T̃, take F ∈ T such that E = φ−1[F], so that
$$X \setminus E = \phi^{-1}[Y \setminus F] \in \tilde T.$$
(iii) If $\langle E_n \rangle_{n\in\mathbb{N}}$ is any sequence in T̃, then for each n ∈ N choose $F_n \in T$ such that $E_n = \phi^{-1}[F_n]$; then
$$\bigcup_{n\in\mathbb{N}} E_n = \phi^{-1}[\bigcup_{n\in\mathbb{N}} F_n] \in \tilde T.$$ Q Q
Next, we have a totally finite measure ν̃ : T̃ → [0, νY ] given by setting
ν̃(φ−1 [F ]) = νF for every F ∈ T.
P P (i) If F, F′ ∈ T and φ−1[F] = φ−1[F′], then φ−1[F△F′] = ∅, so µ(φ−1[F△F′]) = 0 and ν(F△F′) = 0; consequently
νF = νF′. This shows that ν̃ is well-defined. (ii) Now
$$\tilde\nu\emptyset = \tilde\nu(\phi^{-1}[\emptyset]) = \nu\emptyset = 0.$$
(iii) If $\langle E_n \rangle_{n\in\mathbb{N}}$ is a disjoint sequence in T̃, let $\langle F_n \rangle_{n\in\mathbb{N}}$ be a sequence in T such that $E_n = \phi^{-1}[F_n]$ for each n; set
$F'_n = F_n \setminus \bigcup_{m<n} F_m$ for each n; then $E_n = \phi^{-1}[F'_n]$ for each n, so
$$\tilde\nu(\bigcup_{n\in\mathbb{N}} E_n) = \tilde\nu(\phi^{-1}[\bigcup_{n\in\mathbb{N}} F'_n]) = \nu(\bigcup_{n\in\mathbb{N}} F'_n) = \sum_{n=0}^\infty \nu F'_n = \sum_{n=0}^\infty \tilde\nu E_n.$$ Q Q
Finally, observe that if ν̃E > 0 then µE > 0, because E = φ−1 [F ] where νF > 0.
(b) By 215B(ix) there is a Σ-measurable function h : X → ]0, ∞[ such that $\int h \, d\mu$ is finite. Define µ̃ : T̃ → [0, ∞[
by setting $\tilde\mu E = \int_E h \, d\mu$ for every E ∈ T̃; then µ̃ is a totally finite measure. If E ∈ T̃ and µ̃E = 0, then (because h is
strictly positive) µE = 0 and ν̃E = 0. Accordingly we may apply the Radon-Nikodým theorem to µ̃ and ν̃ to see that
there is a T̃-measurable function g : X → R such that $\int_E g \, d\tilde\mu = \tilde\nu E$ for every E ∈ T̃. Because ν̃ is non-negative, we
may suppose that g ≥ 0.
(c) Applying 235A to µ, µ̃, h and the identity function from X to itself, we see that
$$\int_E g \times h \, d\mu = \int_E g \, d\tilde\mu = \tilde\nu E$$
for every E ∈ T̃, that is, that
$$\int J \times \chi(\phi^{-1}[F]) \, d\mu = \nu F$$
for every F ∈ T, writing J = g × h.
(d) This completes the proof when ν is totally finite and D = X. For the general case, if Y = ∅ then µX = 0 and
the result is trivial. Otherwise, let φ̂ be any extension of φ to a function from X to Y which is constant on X \ D;
then φ̂−1 [F ] ∈ Σ for every F ∈ T, because D = φ−1 [Y ] ∈ Σ and φ̂−1 [F ] is always either φ−1 [F ] or (X \ D) ∪ φ−1 [F ].
Now ν must be σ-finite. P P Use the criterion of 215B(ii). If $\mathcal{F}$ is a disjoint family in {F : F ∈ T, 0 < νF < ∞}, then
$\mathcal{E} = \{\hat\phi^{-1}[F] : F \in \mathcal{F}\}$ is a disjoint family in {E : µE > 0}, so $\mathcal{E}$ and $\mathcal{F}$ are countable. Q Q
Let $\langle Y_n \rangle_{n\in\mathbb{N}}$ be a partition of Y into sets of finite ν-measure, and for each n ∈ N set $\nu_n F = \nu(F \cap Y_n)$ for every
F ∈ T. Then νn is a totally finite measure on Y , and if νn F > 0 then νF > 0 so
µφ̂−1 [F ] = µφ−1 [F ] > 0.
Accordingly µ, φ̂ and νn satisfy the assumptions of the theorem together with those of (a) above, and there is a
Σ-measurable function Jn : X → [0, ∞[ such that
$$\nu_n F = \int J_n \times \chi(\phi^{-1}[F]) \, d\mu$$
for every F ∈ T. Now set $J = \sum_{n=0}^\infty J_n \times \chi(\phi^{-1}[Y_n])$, so that J : X → [0, ∞[ is Σ-measurable. If F ∈ T, then
$$\int J \times \chi(\phi^{-1}[F]) \, d\mu = \sum_{n=0}^\infty \int J_n \times \chi(\phi^{-1}[Y_n]) \times \chi(\phi^{-1}[F]) \, d\mu$$
$$= \sum_{n=0}^\infty \int J_n \times \chi(\phi^{-1}[F \cap Y_n]) \, d\mu = \sum_{n=0}^\infty \nu(F \cap Y_n) = \nu F,$$

as required.
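
On a finite space the function J of 235M can be written down explicitly, which may help in reading the proof. The sketch below is a toy analogue under assumed data, not the construction of the proof itself: it takes J constant on each fibre φ−1[{y}] (so J is T̃-measurable in the notation of part (a)) with value ν{y} divided by the µ-mass of the fibre. Hypothesis (iii) is exactly what makes this division legitimate.

```python
from fractions import Fraction
from itertools import chain, combinations

# Toy data: mu on X = {0,1,2,3}, phi : X -> Y = {'a','b'}, nu on Y.
mu = {0: Fraction(1), 1: Fraction(2), 2: Fraction(3), 3: Fraction(0)}
phi = {0: 'a', 1: 'a', 2: 'b', 3: 'b'}
nu = {'a': Fraction(6), 'b': Fraction(1)}

def fibre_mass(y):
    """mu-measure of the fibre phi^{-1}[{y}]."""
    return sum(mu[x] for x in mu if phi[x] == y)

# Hypothesis (iii) of 235M: nu F > 0 implies mu(phi^{-1}[F]) > 0
hypothesis_iii = all(fibre_mass(y) > 0 for y in nu if nu[y] > 0)

# J is constant on fibres; set it to 0 on fibres of mass 0 (none occur here)
J = {x: nu[phi[x]] / fibre_mass(phi[x]) if fibre_mass(phi[x]) > 0 else Fraction(0)
     for x in mu}

def int_J_over_preimage(F):
    """int J x chi(phi^{-1}[F]) d mu as a finite sum."""
    return sum(J[x] * mu[x] for x in mu if phi[x] in F)

subsets_of_Y = [set(c) for c in chain.from_iterable(
    combinations(sorted(nu), r) for r in range(len(nu) + 1))]
conclusion = all(int_J_over_preimage(F) == sum(nu[y] for y in F)
                 for F in subsets_of_Y)
```

Note that J takes the value 1/3 at the µ-negligible point 3 as well; the value there does not affect any integral.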

235N Remark Theorem 235M can fail if µ is only strictly localizable rather than σ-finite. P P Let X = Y be an
uncountable set, Σ = PX, µ counting measure on X (112Bd), T the countable-cocountable σ-algebra of Y , ν the
countable-cocountable measure on Y (211R), φ : X → Y the identity map. Then φ−1 [F ] ∈ Σ and µφ−1 [F ] > 0
whenever νF > 0. But if J is any µ-integrable function on X, then F = {x : J(x) ≠ 0} is countable and
$$\nu(Y \setminus F) = 1 \ne 0 = \int_{\phi^{-1}[Y \setminus F]} J \, d\mu.$$ Q Q

*235O There are some simplifications in the case of σ-finite spaces; in particular, 235A and 235E become conflated.
I will give an adaptation of the hypotheses of 235A which may be used in the σ-finite case. First a lemma.

Lemma Let (X, Σ, µ) be a measure space and f a non-negative integrable function on X. If A ⊆ X is such that
$\int_A f + \int_{X \setminus A} f = \int f$, then f × χA is integrable.
proof By 214Eb, there are µ-integrable functions f1 , f2 such that f1 extends f ↾A, f2 extends f ↾X \ A, and
$$\int_E f_1 = \int_{E \cap A} f, \quad \int_E f_2 = \int_{E \setminus A} f$$
for every E ∈ Σ. Because f is non-negative, $\int_E f_1$ and $\int_E f_2$ are non-negative for every E ∈ Σ, and f1, f2 are
non-negative a.e. Accordingly we have f × χA ≤a.e. f1 and f × χ(X \ A) ≤a.e. f2, so that f ≤a.e. f1 + f2. But also
$$\int f_1 + f_2 = \int_X f_1 + \int_X f_2 = \int_A f + \int_{X \setminus A} f = \int f,$$
so f =a.e. f1 + f2 . Accordingly
f1 =a.e. f − f2 ≤a.e. f − f × χ(X \ A) = f × χA ≤a.e. f1
and f × χA =a.e. f1 is integrable.

*235P Proposition Let (X, Σ, µ) be a complete measure space and (Y, T, ν) a complete σ-finite measure space.
Suppose that φ : Dφ → Y, J : DJ → [0, ∞[ are functions defined on conegligible subsets Dφ, DJ of X such that
$\int_{\phi^{-1}[F]} J \, d\mu$ exists and is equal to νF whenever F ∈ T and νF < ∞.
(a) J × gφ is Σ-measurable for every T-measurable real-valued function g defined on a subset of Y.
(b) If g is a T-measurable real-valued function defined almost everywhere in Y, then $\int J \times g\phi \, d\mu = \int g \, d\nu$ whenever
either integral is defined in [−∞, ∞], interpreting (J × gφ)(x) as 0 when J(x) = 0 and g(φ(x)) is undefined.
proof The point is that the hypotheses of 235E are satisfied. To see this, let us write ΣC = {E ∩ C : E ∈ Σ} and
µC = µ∗↾ΣC for the subspace measure on C, for each C ⊆ X. Let $\langle Y_n \rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of sets with
union Y and with νYn < ∞ for every n ∈ N, starting from Y0 = ∅.
(i) Take any F ∈ T with νF < ∞, and set Fn = F ∪ Yn for each n ∈ N; write Cn = φ−1 [Fn ].
Fix n for the moment. Then our hypothesis implies that
$$\int_{C_0} J \, d\mu + \int_{C_n \setminus C_0} J \, d\mu = \nu F + \nu(F_n \setminus F) = \nu F_n = \int_{C_n} J \, d\mu.$$
If we regard the subspace measures on C0 and Cn \ C0 as derived from the measure µCn of Cn (214Ce), then 235O
tells us that J × χC0 is µCn -integrable, and there is a µ-integrable function hn such that hn extends (J × χC0 )↾Cn .
Let E be a µ-conegligible set, included in the domain Dφ of φ, such that hn↾E is Σ-measurable for every n. Because
$\langle C_n \rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence with union $\phi^{-1}[\bigcup_{n\in\mathbb{N}} F_n] = D_\phi$,
(J × χC0 )(x) = limn→∞ hn (x)
for every x ∈ E, and (J × χC0 )↾E is measurable. At the same time, we know that there is a µ-integrable h extending
J↾C0 , and 0 ≤a.e. J × χC0 ≤a.e. |h|. Accordingly J × χC0 is integrable, and (using 214F)
$$\int J \times \chi\phi^{-1}[F] \, d\mu = \int J \times \chi C_0 \, d\mu = \int_{C_0} J{\restriction}C_0 \, d\mu_{C_0} = \nu F.$$

(ii) This deals with F of finite measure. For general F ∈ T,
$$\int J \times \chi(\phi^{-1}[F]) \, d\mu = \lim_{n\to\infty} \int J \times \chi(\phi^{-1}[F \cap Y_n]) \, d\mu = \lim_{n\to\infty} \nu(F \cap Y_n) = \nu F.$$
So the hypotheses of 235E are satisfied, and the result follows at once.
*235Q I remarked in 235Bb that a difficulty can arise in 235A, for general measure spaces, if we speak of $\int_{\phi^{-1}[F]} J \, d\mu$
in the hypothesis, in place of $\int J \times \chi(\phi^{-1}[F]) \, d\mu$. Here is an example.
Example Set X = Y = [0, 2]. Write ΣL for the algebra of Lebesgue measurable subsets of R, and µL for Lebesgue
measure; write µc for counting measure on R. Set
Σ = T = {E : E ⊆ [0, 2], E ∩ [0, 1[ ∈ ΣL };
of course this is a σ-algebra of subsets of [0, 2]. For E ∈ Σ = T, set
µE = νE = µL (E ∩ [0, 1[) + µc (E ∩ [1, 2]);
then µ is a complete measure – in effect, it is the direct sum of Lebesgue measure on [0, 1[ and counting measure on
[1, 2] (see 214L). It is easy to see that
µ∗ B = µ∗L (B ∩ [0, 1[) + µc (B ∩ [1, 2])
for every B ⊆ [0, 2]. Let A ⊆ [0, 1[ be a non-Lebesgue-measurable set such that µ∗L (E \ A) = µL E for every Lebesgue
measurable E ⊆ [0, 1[ (see 134D). Define φ : [0, 2] → [0, 2] by setting φ(x) = x + 1 if x ∈ A, φ(x) = x if x ∈ [0, 2] \ A.
If F ∈ Σ, then µ∗(φ−1[F]) = µF. P P (i) If F ∩ [1, 2] is finite, then µF = µL(F ∩ [0, 1[) + #(F ∩ [1, 2]). Now
φ−1[F] = (F ∩ [0, 1[ \ A) ∪ (F ∩ [1, 2]) ∪ {x : x ∈ A, x + 1 ∈ F};
as the last set is finite, therefore µ-negligible,
µ∗(φ−1[F]) = µ∗L(F ∩ [0, 1[ \ A) + #(F ∩ [1, 2]) = µL(F ∩ [0, 1[) + #(F ∩ [1, 2]) = µF.
(ii) If F ∩ [1, 2] is infinite, so is φ−1[F] ∩ [1, 2], so
µ∗(φ−1[F]) = ∞ = µF. Q Q
This means that if we set J(x) = 1 for every x ∈ [0, 2],
$$\int_{\phi^{-1}[F]} J \, d\mu = \mu_{\phi^{-1}[F]}(\phi^{-1}[F]) = \mu^*(\phi^{-1}[F]) = \mu F$$
for every F ∈ Σ, and φ, J satisfy the amended hypotheses for 235A. But if we set g = χ[0, 1[, then g is µ-integrable,
with $\int g \, d\mu = 1$, while
J(x)g(φ(x)) = 1 if x ∈ [0, 1[ \ A, 0 otherwise,
so, because A ∉ Σ, J × gφ is not measurable, and therefore (since µ is complete) not µ-integrable.

235R Reversing the burden Throughout the work above, I have been using the formula
$$\int J \times g\phi = \int g,$$
as being the natural extension of the formula
$$\int g = \int g\phi \times \phi'$$
of ordinary advanced calculus. But we can also move the ‘derivative’ J to the other side of the equation, as follows.
Theorem Let (X, Σ, µ), (Y, T, ν) be measure spaces and φ : X → Y, J : Y → [0, ∞[ functions such that $\int_F J \, d\nu$
and µφ−1[F] are defined in [0, ∞] and equal for every F ∈ T. Then $\int g\phi \, d\mu = \int J \times g \, d\nu$ whenever g is ν-virtually
measurable and defined ν-almost everywhere and either integral is defined in [−∞, ∞].
proof Let ν1 be the indefinite-integral measure over ν defined by J, and µ̂ the completion of µ. Then φ is
inverse-measure-preserving for µ̂ and ν1. P P If F ∈ T, then $\nu_1 F = \int_F J \, d\nu = \mu\phi^{-1}[F]$; that is, φ is inverse-measure-preserving
for µ and ν1↾T. Since ν1 is the completion of ν1↾T (234Lb), φ is inverse-measure-preserving for µ̂ and ν1 (234Ba). Q Q
Of course we can also regard ν1 as being an indefinite-integral measure over the completion ν̂ of ν (212Fb). So if g
is ν-virtually measurable and defined ν-almost everywhere,
$$\int J \times g \, d\nu = \int J \times g \, d\hat\nu = \int g \, d\nu_1 = \int g\phi \, d\hat\mu = \int g\phi \, d\mu$$
if any of the five integrals is defined in [−∞, ∞], by 235K, 235Gb and 212Fb again.
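
A finite illustration of the reversed formula, under assumed toy data: here the 'derivative' J lives on Y and is chosen so that $\int_F J \, d\nu = \mu\phi^{-1}[F]$ for every F ⊆ Y, which on a finite space with ν{y} > 0 forces J(y) to be the µ-mass of the fibre over y divided by ν{y}.

```python
from fractions import Fraction

# Toy data: mu on X = {0,1,2,3}, phi : X -> Y = {'a','b'}, nu on Y.
mu = {0: Fraction(1), 1: Fraction(2), 2: Fraction(3), 3: Fraction(4)}
phi = {0: 'a', 1: 'a', 2: 'b', 3: 'b'}
nu = {'a': Fraction(1, 2), 'b': Fraction(7)}   # every point of Y has positive mass

# J(y) = mu(phi^{-1}[{y}]) / nu{y}, so that int_F J d nu = mu(phi^{-1}[F])
J = {y: sum(mu[x] for x in mu if phi[x] == y) / nu[y] for y in nu}

g = {'a': Fraction(5), 'b': Fraction(-2)}      # an arbitrary real-valued g on Y

int_gphi_dmu = sum(g[phi[x]] * mu[x] for x in mu)    # int g(phi) d mu
int_Jg_dnu = sum(J[y] * g[y] * nu[y] for y in nu)    # int J x g d nu
```

Compare this with the sketch of 235M, where J lived on X and multiplied gφ instead of g.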

235X Basic exercises (a) Explain what 235A tells us when X = Y , T = Σ, φ is the identity function and
νE = αµE for every E ∈ Σ.

(b) Let (X, Σ, µ) be a measure space, J an integrable non-negative real-valued function on X, and φ : Dφ → R a
measurable function, where Dφ is a conegligible subset of X. Set
$$g(a) = \int_{\{x : \phi(x) \le a\}} J$$
for a ∈ R, and let µg be the Lebesgue-Stieltjes measure associated with g. Show that $\int J \times f\phi \, d\mu = \int f \, d\mu_g$ for every
µg-integrable real function f.

(c) Let Σ, T and Λ be σ-algebras of subsets of X, Y and Z respectively. Let us say that a function φ : A → Y , where
A ⊆ X, is (Σ, T)-measurable if φ−1 [F ] ∈ ΣA , the subspace σ-algebra of A, for every F ∈ T. Suppose that A ⊆ X,
B ⊆ Y , φ : A → Y is (Σ, T)-measurable and ψ : B → Z is (T, Λ)-measurable. Show that ψφ is (Σ, Λ)-measurable.
Deduce 235C.

(d) Let (X, Σ, µ) be a measure space and (Y, T, ν) a semi-finite measure space. Let φ : Dφ → Y and J : DJ → [0, ∞[
be functions defined on conegligible subsets Dφ, DJ of X such that $\int J \times \chi(\phi^{-1}[F]) \, d\mu$ exists and is equal to νF whenever F ∈ T
and νF < ∞. Let g be a T-measurable real-valued function, defined on a conegligible subset of Y. Show that J × gφ is
µ-integrable iff g is ν-integrable, and the integrals are then equal, provided we interpret (J × gφ)(x) as 0 when J(x) = 0
and g(φ(x)) is undefined.

(e) Let (X, Σ, µ) be a measure space and E ∈ Σ. Define a measure $\mu \llcorner E$ on X by setting $(\mu \llcorner E)(F) = \mu(E \cap F)$
whenever F ⊆ X is such that F ∩ E ∈ Σ. Show that, for any function f from a subset of X to [−∞, ∞], $\int f \, d(\mu \llcorner E) = \int_E f \, d\mu$
if either is defined in [−∞, ∞].

> (f ) Let g : R → R be a non-decreasing function which is absolutely continuous on every closed bounded interval, and
µg the associated Lebesgue-Stieltjes measure (114Xa, 225Xf). Write µ for Lebesgue measure on R, and let f : R → R
be a function. Show that $\int f \times g' \, d\mu = \int f \, d\mu_g$ in the sense that if one of the integrals exists, finite or infinite, so does
the other, and they are then equal.

(g) Let g : R → R be a non-decreasing function and J a non-negative real-valued µg-integrable function, where
µg is the Lebesgue-Stieltjes measure defined from g. Set $h(x) = \int_{]-\infty,x]} J \, d\mu_g$ for each x ∈ R, and let µh be the
Lebesgue-Stieltjes measure associated with h. Show that, for any f : R → R, $\int f \times J \, d\mu_g = \int f \, d\mu_h$, in the sense that
if one of the integrals is defined in [−∞, ∞] so is the other, and they are then equal.

> (h) Let X be a set and λ, µ, ν three measures on X such that µ is an indefinite-integral measure over λ, with
Radon-Nikodým derivative f , and ν is an indefinite-integral measure over µ, with Radon-Nikodým derivative g. Show
that ν is an indefinite-integral measure over λ, and that f × g is a Radon-Nikodým derivative of ν with respect to λ,
provided we interpret (f × g)(x) as 0 when f (x) = 0 and g(x) is undefined.
(i) In 235M, if ν is not semi-finite, show that we can still find a J such that $\int_{\phi^{-1}[F]} J \, d\mu = \nu F$ for every set F of
finite measure. (Hint: use the ‘semi-finite version’ of ν, as described in 213Xc.)

(j) Let (X, Σ, µ) be a σ-finite measure space, and T a σ-subalgebra of Σ. Let ν : T → R be a countably additive
functional such that νF = 0 whenever F ∈ T and µF = 0. Show that there is a µ-integrable function f such that
$\int_F f \, d\mu = \nu F$ for every F ∈ T. (Hint: use the method of 235M, applied to the positive and negative parts of ν.)

(k) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with completions (X, Σ̂, µ̂) and (Y, T̂, ν̂). Let φ : Dφ → Y,
J : DJ → [0, ∞[ be functions defined on conegligible subsets of X. Show that if $\int_{\phi^{-1}[F]} J \, d\mu = \nu F$ whenever F ∈ T
and νF < ∞, then $\int_{\phi^{-1}[F]} J \, d\mu = \nu F$ whenever F ∈ T̂ and ν̂F < ∞. Hence, or otherwise, show that 235Pb is valid for
non-complete spaces (X, Σ, µ) and (Y, T, ν).

(l) Let (X, Σ, µ) be a complete measure space, Y a set, φ : X → Y a function and ν = µφ−1 the corresponding
image measure on Y . Let ν1 be an indefinite-integral measure over ν. Show that there is an indefinite-integral measure
µ1 over µ such that ν1 is the image measure µ1 φ−1 .

(m) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and φ : X → Y an inverse-measure-preserving function. Let ν1
be an indefinite-integral measure over ν. Show that there is an indefinite-integral measure µ1 over µ such that φ is
inverse-measure-preserving for µ1 and ν1 .

235Y Further exercises (a) Write T for the algebra of Borel subsets of Y = [0, 1], and ν for the restriction of
Lebesgue measure to T. Let A ⊆ [0, 1] be a set such that both A and [0, 1] \ A have Lebesgue outer measure 1, and
set X = A ∪ [1, 2]. Let Σ be the algebra of relatively Borel subsets of X, and set µE = µA (A ∩ E) for E ∈ Σ, where
µA is the subspace measure induced on A by Lebesgue measure. Define φ : X → Y by setting φ(x) = x if x ∈ A, x − 1
if x ∈ X \ A. Show that ν is the image measure µφ−1 , but that, setting g = χ([0, 1] \ A), gφ is µ-integrable while g is
not ν-integrable.

(b) Let (X, Σ, µ) be a probability space and T a σ-subalgebra of Σ. Let f be a non-negative µ-integrable function
with $\int f \, d\mu = 1$, so that its indefinite-integral measure ν is a probability measure. Let g be a ν-integrable real-valued
function and set h = f × g, interpreting h(x) as 0 if f(x) = 0 and g(x) is undefined. Let f1, h1 be conditional expectations
of f , h on T with respect to the measure µ, and set g1 = h1 /f1 , interpreting g1 (x) as 0 if h1 (x) = 0 and f1 (x) is either
0 or undefined. Show that g1 is a conditional expectation of g on T with respect to the measure ν.

235 Notes and comments I see that I have taken up a great deal of space in this section with technicalities; the
hypotheses of the theorems vary erratically, with completeness, in particular, being invoked at apparently arbitrary
intervals, and ideas repeat themselves in a haphazard pattern. There is nothing deep, and most of the work consists in
laboriously verifying details. The trouble with this topic is that it is useful. The results here are abstract expressions
of integration-by-substitution; they have applications all over measure theory. I cannot therefore content myself with
theorems which will elegantly express the underlying ideas, but must seek formulations which I can quote in later
arguments.
I hope that the examples in 235Bb, 235H, 235N, 235Q, 234Ya and 235Ya will go some way to persuade you that there
are real traps for the unwary, and that the careful verifications written out at such length are necessary. On the other
hand, it is happily the case that in simple contexts, in which the measures µ, ν are σ-finite and the transformations
φ are Borel isomorphisms, no insuperable difficulties arise, and in particular the image measure catastrophe does not
trouble us. But for further work in this direction I refer you to the applications in §263, §265 and §271, and to Volume
4.

Chapter 24
Function spaces
The extraordinary power of Lebesgue’s theory of integration is perhaps best demonstrated by its ability to provide
structures relevant to questions quite different from those to which it was at first addressed. In this chapter I give the
constructions, and elementary properties, of some of the fundamental spaces of functional analysis.
I do not feel called on here to justify the study of normed spaces; if you have not met them before, I hope that the
introduction here will show at least that they offer a basis for a remarkable fusion of algebra and analysis. The fragments
of the theory of metric spaces, normed spaces and general topology which we shall need are sketched in §§2A2-2A5.
The principal ‘function spaces’ described in this chapter in fact combine three structural elements: they are (infinite-
dimensional) linear spaces, they are metric spaces, with associated concepts of continuity and convergence, and they
are ordered spaces, with corresponding notions of supremum and infimum. The interactions between these three types
of structure provide an inexhaustible wealth of ideas. Furthermore, many of these ideas are directly applicable to a
wide variety of problems in more or less applied mathematics, particularly in differential and integral equations, but
more generally in any system with infinitely many degrees of freedom.
I have laid out the chapter with sections on $L^0$ (the space of equivalence classes of all real-valued measurable functions,
in which all the other spaces of the chapter are embedded), $L^1$ (equivalence classes of integrable functions), $L^\infty$
(equivalence classes of bounded measurable functions) and $L^p$ (equivalence classes of pth-power-integrable functions).
While ordinary functional analysis gives much more attention to the Banach spaces $L^p$ for 1 ≤ p ≤ ∞ than to $L^0$,
from the special point of view of this book the space $L^0$ is at least as important and interesting as any of the others.
Following these four sections, I return to a study of the standard topology on $L^0$, the topology of ‘convergence in
measure’ (§245), and then to two linked sections on uniform integrability and weak compactness in $L^1$ (§§246-247).
There is a technical point here which must never be lost sight of. While it is customary and natural to call $L^1$, $L^2$
and the others ‘function spaces’, their elements are not in fact functions, but equivalence classes of functions. As you
see from the language of the preceding paragraph, my practice is to scrupulously maintain the distinction; I give my
reasons in the notes to §241.

241 $\mathcal{L}^0$ and $L^0$
The chief aim of this chapter is to discuss the spaces $L^1$, $L^\infty$ and $L^p$ of the following three sections. However it
will be convenient to regard all these as subspaces of a larger space $L^0$ of equivalence classes of (virtually) measurable
functions, and I have collected in this section the basic facts concerning the ordered linear space $L^0$.
It is almost the first principle of measure theory that sets of measure zero can often be ignored; the phrase ‘negligible
set’ itself asserts this principle. Accordingly, two functions which agree almost everywhere may often (not always!)
be treated as identical. A suitable expression of this idea is to form the space of equivalence classes of functions,
saying that two functions are equivalent if they agree on a conegligible set. This is the basis of all the constructions of
this chapter. It is a remarkable fact that the spaces of equivalence classes so constructed are actually better adapted
to certain problems than the spaces of functions from which they are derived, so that once the technique has been
mastered it is easier to do one’s thinking in the more abstract spaces.

241A The space $\mathcal{L}^0$: Definition It is time to give a name to a set of functions which has already been used more
than once. Let (X, Σ, µ) be a measure space. I write $\mathcal{L}^0$, or $\mathcal{L}^0(\mu)$, for the space of real-valued functions f defined on
conegligible subsets of X which are virtually measurable, that is, such that f↾E is measurable for some conegligible
set E ⊆ X. Recall that f is µ-virtually measurable iff it is Σ̂-measurable, where Σ̂ is the completion of Σ (212Fa).

241B Basic properties If (X, Σ, µ) is any measure space, then we have the following facts, corresponding to the
fundamental properties of measurable functions listed in §121 of Volume 1. I work through them in order, so that if
you have Volume 1 to hand you can see what has to be missed out.
(a) A constant real-valued function defined almost everywhere in X belongs to $\mathcal{L}^0$ (121Ea).

(b) $f + g \in \mathcal{L}^0$ for all f, g ∈ $\mathcal{L}^0$ (for if f↾F and g↾G are measurable, then (f + g)↾(F ∩ G) = (f↾F) + (g↾G) is
measurable) (121Eb).

(c) $cf \in \mathcal{L}^0$ for all f ∈ $\mathcal{L}^0$, c ∈ R (121Ec).

(d) $f \times g \in \mathcal{L}^0$ for all f, g ∈ $\mathcal{L}^0$ (121Ed).

(e) If f ∈ $\mathcal{L}^0$ and h : R → R is Borel measurable, then hf ∈ $\mathcal{L}^0$ (121Eg).

(f) If $\langle f_n \rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ and $f = \lim_{n\to\infty} f_n$ is defined (as a real-valued function) almost everywhere in
X, then f ∈ $\mathcal{L}^0$ (121Fa).

(g) If $\langle f_n \rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ and $f = \sup_{n\in\mathbb{N}} f_n$ is defined (as a real-valued function) almost everywhere in X,
then f ∈ $\mathcal{L}^0$ (121Fb).

(h) If $\langle f_n \rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ and $f = \inf_{n\in\mathbb{N}} f_n$ is defined (as a real-valued function) almost everywhere in X,
then f ∈ $\mathcal{L}^0$ (121Fc).

(i) If $\langle f_n \rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ and $f = \limsup_{n\to\infty} f_n$ is defined (as a real-valued function) almost everywhere
in X, then f ∈ $\mathcal{L}^0$ (121Fd).

(j) If $\langle f_n \rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ and $f = \liminf_{n\to\infty} f_n$ is defined (as a real-valued function) almost everywhere in
X, then f ∈ $\mathcal{L}^0$ (121Fe).

(k) $\mathcal{L}^0$ is just the set of real-valued functions, defined on subsets of X, which are equal almost everywhere to
some Σ-measurable function from X to R. P P (i) If g : X → R is Σ-measurable and f =a.e. g, then F = {x : x ∈
dom f, f(x) = g(x)} is conegligible and f↾F = g↾F is measurable (121Eh), so f ∈ $\mathcal{L}^0$. (ii) If f ∈ $\mathcal{L}^0$, let E ⊆ X be a
conegligible set such that f↾E is measurable. Then D = E ∩ dom f is conegligible and f↾D is measurable, so there is
a measurable h : X → R agreeing with f on D (121I); and h =a.e. f. Q Q

241C The space $L^0$: Definition Let (X, Σ, µ) be any measure space. Then =a.e. is an equivalence relation on $\mathcal{L}^0$.
Write $L^0$, or $L^0(\mu)$, for the set of equivalence classes in $\mathcal{L}^0$ under =a.e.. For f ∈ $\mathcal{L}^0$, write f• for its equivalence class
in $L^0$.

241D The linear structure of $L^0$ Let (X, Σ, µ) be any measure space, and set $\mathcal{L}^0 = \mathcal{L}^0(\mu)$, $L^0 = L^0(\mu)$.

(a) If f1, f2, g1, g2 ∈ $\mathcal{L}^0$, f1 =a.e. f2 and g1 =a.e. g2 then f1 + g1 =a.e. f2 + g2. Accordingly we may define addition
on $L^0$ by setting f• + g• = (f + g)• for all f, g ∈ $\mathcal{L}^0$.

(b) If f1, f2 ∈ $\mathcal{L}^0$ and f1 =a.e. f2, then cf1 =a.e. cf2 for every c ∈ R. Accordingly we may define scalar multiplication
on $L^0$ by setting c · f• = (cf)• for all f ∈ $\mathcal{L}^0$ and c ∈ R.

(c) Now $L^0$ is a linear space over R, with zero 0•, where 0 is the function with domain X and constant value 0, and
negatives −(f•) = (−f)•. P P (i)
f + (g + h) = (f + g) + h for all f, g, h ∈ $\mathcal{L}^0$,
so
u + (v + w) = (u + v) + w for all u, v, w ∈ $L^0$.
(ii)
f + 0 = 0 + f = f for every f ∈ $\mathcal{L}^0$,
so
u + 0• = 0• + u = u for every u ∈ $L^0$.
(iii)
f + (−f) =a.e. 0 for every f ∈ $\mathcal{L}^0$,
so
f• + (−f)• = 0• for every f ∈ $\mathcal{L}^0$.
(iv)
f + g = g + f for all f, g ∈ $\mathcal{L}^0$,
so
u + v = v + u for all u, v ∈ $L^0$.
(v)
c(f + g) = cf + cg for all f, g ∈ $\mathcal{L}^0$ and c ∈ R,
so
c(u + v) = cu + cv for all u, v ∈ $L^0$ and c ∈ R.
(vi)
(a + b)f = af + bf for all f ∈ $\mathcal{L}^0$ and a, b ∈ R,
so
(a + b)u = au + bu for all u ∈ $L^0$ and a, b ∈ R.
(vii)
(ab)f = a(bf) for all f ∈ $\mathcal{L}^0$ and a, b ∈ R,
so
(ab)u = a(bu) for all u ∈ $L^0$ and a, b ∈ R.
(viii)
1f = f for all f ∈ $\mathcal{L}^0$,
so
1u = u for all u ∈ $L^0$. Q Q
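
The point of (a) and (b), that the operations on classes do not depend on the representatives chosen, can be tried out on a finite measure space with a negligible point. The model below is only an assumed illustration: on such a space two functions are equal almost everywhere iff they agree at the points of positive mass, so a class f• can be encoded by the tuple of values there. The names `cls`, `add`, `scale` and `pointwise_sum` are hypothetical helpers.

```python
from fractions import Fraction

# Toy measure on X = {0, 1, 2}; the point 1 is negligible.
mu = {0: Fraction(1), 1: Fraction(0), 2: Fraction(2)}
support = tuple(x for x in sorted(mu) if mu[x] > 0)

def cls(f):
    """Encode the class of f (a dict defined on a conegligible set) by its
    values at the points of positive mass."""
    return tuple(f[x] for x in support)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    return tuple(c * a for a in u)

def pointwise_sum(f, g):
    """f + g, defined on dom f intersected with dom g."""
    return {x: f[x] + g[x] for x in f.keys() & g.keys()}

# Two representatives of each class; f2 is not even defined at the null point.
f1 = {0: 1, 1: 10, 2: 3}
f2 = {0: 1, 2: 3}
g1 = {0: 5, 1: -7, 2: 0}
g2 = {0: 5, 1: 99, 2: 0}

same_classes = cls(f1) == cls(f2) and cls(g1) == cls(g2)
sum_well_defined = (cls(pointwise_sum(f1, g1)) == cls(pointwise_sum(f2, g2))
                    == add(cls(f1), cls(g1)))
scalar_well_defined = (cls({x: 4 * v for x, v in f1.items()})
                       == cls({x: 4 * v for x, v in f2.items()})
                       == scale(4, cls(f1)))
```

On a general measure space no such canonical choice of values is available, which is one reason for working with equivalence classes rather than representatives.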

241E The order structure of $L^0$ Let (X, Σ, µ) be any measure space and set $\mathcal{L}^0 = \mathcal{L}^0(\mu)$, $L^0 = L^0(\mu)$.

(a) If f1, f2, g1, g2 ∈ $\mathcal{L}^0$, f1 =a.e. f2, g1 =a.e. g2 and f1 ≤a.e. g1, then f2 ≤a.e. g2. Accordingly we may define a
relation ≤ on $L^0$ by saying that f• ≤ g• iff f ≤a.e. g.

(b) Now ≤ is a partial order on $L^0$. P P (i) If f, g, h ∈ $\mathcal{L}^0$ and f ≤a.e. g and g ≤a.e. h, then f ≤a.e. h. Accordingly
u ≤ w whenever u, v, w ∈ $L^0$, u ≤ v and v ≤ w. (ii) If f ∈ $\mathcal{L}^0$ then f ≤a.e. f; so u ≤ u for every u ∈ $L^0$. (iii) If f,
g ∈ $\mathcal{L}^0$ and f ≤a.e. g and g ≤a.e. f, then f =a.e. g, so if u ≤ v and v ≤ u then u = v. Q Q

(c) In fact $L^0$, with ≤, is a partially ordered linear space, that is, a (real) linear space with a partial order ≤
such that
if u ≤ v then u + w ≤ v + w for every w,
if 0 ≤ u then 0 ≤ cu for every c ≥ 0.
P P (i) If f, g, h ∈ $\mathcal{L}^0$ and f ≤a.e. g, then f + h ≤a.e. g + h. (ii) If f ∈ $\mathcal{L}^0$ and f ≥ 0 a.e., then cf ≥ 0 a.e. for every
c ≥ 0. Q Q

(d) More: $L^0$ is a Riesz space or vector lattice, that is, a partially ordered linear space such that u ∨ v = sup{u, v}
and u ∧ v = inf{u, v} are defined for all u, v ∈ $L^0$. P P Take f, g ∈ $\mathcal{L}^0$ such that f• = u and g• = v. Then f ∨ g,
f ∧ g ∈ $\mathcal{L}^0$, writing
(f ∨ g)(x) = max(f(x), g(x)), (f ∧ g)(x) = min(f(x), g(x))
for x ∈ dom f ∩ dom g. (Compare 241Bg-h.) Now, for any h ∈ $\mathcal{L}^0$, we have
f ∨ g ≤a.e. h ⇐⇒ f ≤a.e. h and g ≤a.e. h,
h ≤a.e. f ∧ g ⇐⇒ h ≤a.e. f and h ≤a.e. g,
so for any w ∈ $L^0$ we have
(f ∨ g)• ≤ w ⇐⇒ u ≤ w and v ≤ w,
w ≤ (f ∧ g)• ⇐⇒ w ≤ u and w ≤ v.
Thus we have
(f ∨ g)• = sup{u, v} = u ∨ v, (f ∧ g)• = inf{u, v} = u ∧ v
in $L^0$. Q Q

(e) In particular, for any u ∈ $L^0$ we can speak of |u| = u ∨ (−u); if f ∈ $\mathcal{L}^0$ then |f•| = |f|•.
If f, g ∈ $\mathcal{L}^0$, c ∈ R then
$$|cf| = |c||f|, \quad f \vee g = \tfrac12(f + g + |f - g|),$$
$$f \wedge g = \tfrac12(f + g - |f - g|), \quad |f + g| \le_{\text{a.e.}} |f| + |g|,$$
so
$$|cu| = |c||u|, \quad u \vee v = \tfrac12(u + v + |u - v|),$$
$$u \wedge v = \tfrac12(u + v - |u - v|), \quad |u + v| \le |u| + |v|$$
for all u, v ∈ $L^0$.
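
The lattice identities in (e) are pointwise facts about real numbers, and can be checked mechanically on sample values. In this sketch a class is modelled, as an assumption for illustration, by a tuple of rational values at finitely many points of positive mass, with ∨, ∧ and |·| taken coordinatewise; the helper names are hypothetical.

```python
from fractions import Fraction

def vec(*values):
    return tuple(Fraction(v) for v in values)

def add(u, v):   return tuple(a + b for a, b in zip(u, v))
def sub(u, v):   return tuple(a - b for a, b in zip(u, v))
def scale(c, u): return tuple(c * a for a in u)
def mod(u):      return tuple(abs(a) for a in u)              # |u|
def sup(u, v):   return tuple(max(a, b) for a, b in zip(u, v))  # u v v
def inf(u, v):   return tuple(min(a, b) for a, b in zip(u, v))  # u ^ v
def leq(u, v):   return all(a <= b for a, b in zip(u, v))

u = vec(1, -2, 3)
v = vec(0, 5, 3)
half = Fraction(1, 2)

sup_formula = scale(half, add(add(u, v), mod(sub(u, v))))   # (u + v + |u - v|)/2
inf_formula = scale(half, sub(add(u, v), mod(sub(u, v))))   # (u + v - |u - v|)/2
triangle = leq(mod(add(u, v)), add(mod(u), mod(v)))         # |u + v| <= |u| + |v|
modulus = mod(u) == sup(u, scale(-1, u))                    # |u| = u v (-u)
```

A check on particular values is of course no substitute for the two-line proof from the corresponding identities for real numbers.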

(f) A special notation is often useful. If f is a real-valued function, set $f^+(x) = \max(f(x), 0)$, $f^-(x) = \max(-f(x), 0)$
for x ∈ dom f, so that
$$f = f^+ - f^-, \quad |f| = f^+ + f^- = f^+ \vee f^-,$$
all these functions being defined on dom f. In $L^0$, the corresponding operations are $u^+ = u \vee 0$, $u^- = (-u) \vee 0$, and
we have
$$u = u^+ - u^-, \quad |u| = u^+ + u^- = u^+ \vee u^-, \quad u^+ \wedge u^- = 0.$$

(g) It is perhaps obvious, but I say it anyway: if u ≥ 0 in $L^0$, then there is an f ≥ 0 in $\mathcal{L}^0$ such that f• = u. P P
Take any g ∈ $\mathcal{L}^0$ such that u = g•, and set f = g ∨ 0. Q Q

241F Riesz spaces There is an extensive abstract theory of Riesz spaces, which I think it best to leave aside for the
moment; a general account may be found in Luxemburg & Zaanen 71 and Zaanen 83; my own book Fremlin 74
covers the elementary material, and Chapter 35 in the next volume repeats the most essential ideas. For our purposes
here we need only a few definitions and some simple results which are most easily proved for the special cases in which
we need them, without reference to the general theory.
(a) A Riesz space U is Archimedean if whenever u ∈ U, u > 0 (that is, u ≥ 0 and u ≠ 0), and v ∈ U, there is an n ∈ N such that nu ≰ v.

(b) A Riesz space U is Dedekind σ-complete (or σ-order-complete, or σ-complete) if every non-empty countable set A ⊆ U which is bounded above has a least upper bound in U.

(c) A Riesz space U is Dedekind complete (or order complete, or complete) if every non-empty set A ⊆ U
which is bounded above in U has a least upper bound in U .

241G Now we have the following important properties of L0 .


Theorem Let (X, Σ, µ) be a measure space. Set L0 = L0 (µ).
(a) L0 is Archimedean and Dedekind σ-complete.
(b) If (X, Σ, µ) is semi-finite, then L0 is Dedekind complete iff (X, Σ, µ) is localizable.
proof Set ℒ0 = ℒ0(µ).
(a)(i) If u, v ∈ L0 and u > 0, express u as f• and v as g• where f, g ∈ ℒ0. Then E = {x : x ∈ dom f, f(x) > 0} is not negligible. So there is an n ∈ N such that
En = {x : x ∈ dom f ∩ dom g, nf(x) > g(x)}
is not negligible, since E ∩ dom g ⊆ ⋃n∈N En. But now nu ≰ v. As u and v are arbitrary, L0 is Archimedean.
(ii) Now let A ⊆ L0 be a non-empty countable set with an upper bound w in L0. Express A as {fn• : n ∈ N} where ⟨fn⟩n∈N is a sequence in ℒ0, write un = fn• for each n, and express w as h• where h ∈ ℒ0. Set f = supn∈N fn. Then we have f(x) defined in R at any point x ∈ dom h ∩ ⋂n∈N dom fn such that fn(x) ≤ h(x) for every n ∈ N, that is, for almost every x ∈ X; so f ∈ ℒ0 (241Bg). Set u = f• ∈ L0. If v ∈ L0, say v = g• where g ∈ ℒ0, then
un ≤ v for every n ∈ N
⇐⇒ for every n ∈ N, fn ≤a.e. g
⇐⇒ for almost every x ∈ X, fn(x) ≤ g(x) for every n ∈ N
⇐⇒ f ≤a.e. g ⇐⇒ u ≤ v.
Thus u = supn∈N un in L0. As A is arbitrary, L0 is Dedekind σ-complete.
(b)(i) Suppose that (X, Σ, µ) is localizable. Let A ⊆ L0 be any non-empty set with an upper bound w0 ∈ L0. Set
𝒜 = {f : f is a measurable function from X to R, f• ∈ A};
then every member of A is of the form f• for some f ∈ 𝒜 (241Bk). For each q ∈ Q, let ℰq be the family of subsets of X expressible in the form {x : f(x) ≥ q} for some f ∈ 𝒜; then ℰq ⊆ Σ. Because (X, Σ, µ) is localizable, there is a set Fq ∈ Σ which is an essential supremum for ℰq. For x ∈ X, set
g∗(x) = sup{q : q ∈ Q, x ∈ Fq},
allowing ∞ as the supremum of a set which is not bounded above, and −∞ as sup ∅. Then
{x : g∗(x) > a} = ⋃q∈Q,q>a Fq ∈ Σ
for every a ∈ R.
If f ∈ 𝒜, then f ≤a.e. g∗. P For each q ∈ Q, set
Eq = {x : f(x) ≥ q} ∈ ℰq;
then Eq \ Fq is negligible. Set H = ⋃q∈Q (Eq \ Fq). If x ∈ X \ H, then
f(x) ≥ q =⇒ g∗(x) ≥ q,
so f(x) ≤ g∗(x); thus f ≤a.e. g∗. Q
If h : X → R is measurable and u ≤ h• for every u ∈ A, then g∗ ≤a.e. h. P Set Gq = {x : h(x) ≥ q} for each q ∈ Q. If E ∈ ℰq, there is an f ∈ 𝒜 such that E = {x : f(x) ≥ q}; now f ≤a.e. h, so E \ Gq ⊆ {x : f(x) > h(x)} is negligible. Because Fq is an essential supremum for ℰq, Fq \ Gq is negligible; and this is true for every q ∈ Q. Consequently
{x : h(x) < g∗(x)} ⊆ ⋃q∈Q (Fq \ Gq)
is negligible, and g∗ ≤a.e. h. Q
Now recall that we are assuming that A ≠ ∅ and that A has an upper bound w0 ∈ L0. Take any f0 ∈ 𝒜 and a measurable h0 : X → R such that h0• = w0; then f ≤a.e. h0 for every f ∈ 𝒜, so f0 ≤a.e. g∗ ≤a.e. h0, and g∗ must be finite a.e. Setting g(x) = g∗(x) when g∗(x) ∈ R, we have g ∈ ℒ0 and g =a.e. g∗, so that
f ≤a.e. g ≤a.e. h
whenever f, h are measurable functions from X to R, f• ∈ A and h• is an upper bound for A; that is,
u ≤ g• ≤ w
whenever u ∈ A and w is an upper bound for A. But this means that g• is a least upper bound for A in L0. As A is arbitrary, L0 is Dedekind complete.
(ii) Suppose that L0 is Dedekind complete. We are assuming that (X, Σ, µ) is semi-finite. Let ℰ be any subset of Σ. Set
A = {0} ∪ {(χE)• : E ∈ ℰ} ⊆ L0.
Then A is bounded above by (χX)• so has a least upper bound w ∈ L0. Express w as h• where h : X → R is measurable, and set F = {x : h(x) > 0}. Then F is an essential supremum for ℰ in Σ. P (α) If E ∈ ℰ, then (χE)• ≤ w so χE ≤a.e. h, that is, h(x) ≥ 1 for almost every x ∈ E, and E \ F ⊆ {x : x ∈ E, h(x) < 1} is negligible. (β) If G ∈ Σ and E \ G is negligible for every E ∈ ℰ, then χE ≤a.e. χG for every E ∈ ℰ, that is, (χE)• ≤ (χG)• for every E ∈ ℰ; so w ≤ (χG)•, that is, h ≤a.e. χG. Accordingly F \ G ⊆ {x : h(x) > (χG)(x)} is negligible. Q
As ℰ is arbitrary, (X, Σ, µ) is localizable.

241H The multiplicative structure of L0 Let (X, Σ, µ) be any measure space; write L0 = L0(µ), ℒ0 = ℒ0(µ).

(a) If f1, f2, g1, g2 ∈ ℒ0, f1 =a.e. f2 and g1 =a.e. g2 then f1 × g1 =a.e. f2 × g2. Accordingly we may define multiplication on L0 by setting f• × g• = (f × g)• for all f, g ∈ ℒ0.

(b) It is now easy to check that, for all u, v, w ∈ L0 and c ∈ R,


u × (v × w) = (u × v) × w,
u × e = e × u = u,
where e = χX • is the equivalence class of the function with constant value 1,
c(u × v) = cu × v = u × cv,
u × (v + w) = (u × v) + (u × w),
(u + v) × w = (u × w) + (v × w),
u × v = v × u,
|u × v| = |u| × |v|,
u × v = 0 iff |u| ∧ |v| = 0,
|u| ≤ |v| iff there is a w such that |w| ≤ e and u = v × w.

241I The action of Borel functions on L0 Let (X, Σ, µ) be a measure space and h : R → R a Borel measurable function. Then hf ∈ ℒ0 = ℒ0(µ) for every f ∈ ℒ0 (241Be) and hf =a.e. hg whenever f =a.e. g. So we have a function h̄ : L0 → L0 defined by setting h̄(f•) = (hf)• for every f ∈ ℒ0. For instance, if u ∈ L0 and p ≥ 1, we can consider |u|^p = h̄(u) where h(x) = |x|^p for x ∈ R.
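
The point of 241I is that the composition depends only on the equivalence class. The Python sketch below illustrates this on a hypothetical three-point measure space with one negligible point; the space `mu`, the encoding `cls` of a class by its values off the null points, and the helper `lift` are all assumptions introduced for the example.

```python
from fractions import Fraction as Fr

# Assumption: a finite measure space given as point -> mass; "c" is negligible.
mu = {"a": Fr(1, 2), "b": Fr(1, 2), "c": Fr(0)}

def cls(f):
    """Encode the class f• by the values of f off the negligible points."""
    return tuple((x, f[x]) for x in sorted(mu) if mu[x] > 0)

def lift(h):
    """Return the map f ↦ (h∘f)•; it depends only on the class of f."""
    return lambda f: cls({x: h(f[x]) for x in mu})

# Two representatives of the same class: they differ only at the null point.
f = {"a": Fr(-1, 2), "b": Fr(2), "c": Fr(7)}
g = {"a": Fr(-1, 2), "b": Fr(2), "c": Fr(-100)}

cube_abs = lift(lambda t: abs(t) ** 3)   # h(t) = |t|^p with p = 3
```

Here `cube_abs(f)` and `cube_abs(g)` agree, although f and g take very different values at the negligible point.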

241J Complex L0 The ideas of this chapter, like those of Chapters 22-23, are often applied to spaces based on
complex-valued functions instead of real-valued functions. Let (X, Σ, µ) be a measure space.

(a) We may write ℒ0C = ℒ0C(µ) for the space of complex-valued functions f such that dom f is a conegligible subset of X and there is a conegligible subset E ⊆ X such that f↾E is measurable; that is, such that the real and imaginary parts of f both belong to ℒ0(µ). Next, L0C = L0C(µ) will be the space of equivalence classes in ℒ0C under the equivalence relation =a.e.

(b) Using just the same formulae as in 241D, it is easy to describe addition and scalar multiplication rendering L0C a linear space over C. We no longer have quite the same kind of order structure, but we can identify a ‘real part’, being
{f• : f ∈ ℒ0C is real a.e.},
obviously identifiable with the real linear space L0, and corresponding maps u ↦ Re(u), u ↦ Im(u) : L0C → L0 such that u = Re(u) + i Im(u) for every u. Moreover, we have a notion of ‘modulus’, writing
|f•| = |f|• ∈ L0 for every f ∈ ℒ0C,
satisfying the basic relations |cu| = |c||u|, |u + v| ≤ |u| + |v| for u, v ∈ L0C and c ∈ C, as in 241Ef. We do of course still have a multiplication on L0C, for which all the formulae in 241H are still valid.

(c) The following fact is useful. For any u ∈ L0C, |u| is the supremum in L0 of {Re(ζu) : ζ ∈ C, |ζ| = 1}. P (i) If |ζ| = 1, then Re(ζu) ≤ |ζu| = |u|. So |u| is an upper bound of {Re(ζu) : |ζ| = 1}. (ii) If v ∈ L0 and Re(ζu) ≤ v whenever |ζ| = 1, then express u, v as f•, g• where f : X → C and g : X → R are measurable. For any q ∈ Q, x ∈ X set fq(x) = Re(e^{iq} f(x)). Then fq ≤a.e. g. Accordingly H = {x : fq(x) ≤ g(x) for every q ∈ Q} is conegligible. But of course H = {x : |f(x)| ≤ g(x)}, because {e^{iq} : q ∈ Q} is dense in the unit circle, so |f| ≤a.e. g and |u| ≤ v. As v is arbitrary, |u| is the least upper bound of {Re(ζu) : |ζ| = 1}. Q
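
The pointwise fact behind (c) is that |z| is the supremum of Re(ζz) over unit scalars ζ, and that a dense set of ζ suffices. The following Python sketch checks this numerically for a single complex number; the finite grid of 3600 angles stands in for a countable dense set and is an assumption of the example, as is the name `sup_real_parts`.

```python
import cmath
import math

def sup_real_parts(z, steps=3600):
    """Largest value of Re(zeta * z) over `steps` equally spaced unit scalars zeta."""
    return max((cmath.exp(2j * math.pi * k / steps) * z).real for k in range(steps))

z = 3 - 4j                      # |z| = 5
approx = sup_real_parts(z)      # never exceeds |z|, and is close to it
```

A finer grid brings `approx` closer to `abs(z)`; it cannot exceed it, apart from floating-point rounding.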

241X Basic exercises > (a) Let X be a set, and let µ be counting measure on X (112Bd). Show that L0(µ) can be identified with ℒ0(µ) = R^X.

> (b) Let (X, Σ, µ) be a measure space and µ̂ the completion of µ. Show that ℒ0(µ) = ℒ0(µ̂) and L0(µ) = L0(µ̂).

(c) Let (X, Σ, µ) be a measure space. (i) Show that for every u ∈ L0(µ) we may define an outer measure θu : 𝒫R → [0, ∞] by writing θu(A) = µ∗f⁻¹[A] whenever A ⊆ R and f ∈ ℒ0(µ) is such that f• = u. (ii) Show that the measure defined from θu by Carathéodory’s method measures every Borel subset of R.

(d) Let ⟨(Xi, Σi, µi)⟩i∈I be a family of measure spaces, with direct sum (X, Σ, µ) (214L). (i) Writing φi : Xi → X for the canonical maps (in the construction of 214L, φi(x) = (x, i) for x ∈ Xi), show that f ↦ ⟨fφi⟩i∈I is a bijection between ℒ0(µ) and ∏i∈I ℒ0(µi). (ii) Show that this corresponds to a bijection between L0(µ) and ∏i∈I L0(µi).

(e) Let U be a Dedekind σ-complete Riesz space and A ⊆ U a non-empty countable set which is bounded below in
U . Show that inf A is defined in U .

(f ) Let U be a Dedekind complete Riesz space and A ⊆ U a non-empty set which is bounded below in U . Show
that inf A is defined in U .
(g) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and φ : X → Y an inverse-measure-preserving function. (i) Show that we have a map T : L0(ν) → L0(µ) defined by setting Tg• = (gφ)• for every g ∈ ℒ0(ν). (ii) Show that T is linear, that T(v × w) = Tv × Tw for all v, w ∈ L0(ν), and that T(supn∈N vn) = supn∈N Tvn whenever ⟨vn⟩n∈N is a sequence in L0(ν) with an upper bound in L0(ν).
> (h) Let (X, Σ, µ) be a measure space. Suppose that r ≥ 1 and that h : R^r → R is a Borel measurable function. Show that there is a function h̄ : L0(µ)^r → L0(µ) defined by writing
h̄(f1•, . . . , fr•) = (h(f1, . . . , fr))•
for f1, . . . , fr ∈ ℒ0(µ).
(i) Let (X, Σ, µ) be a measure space and g, h, ⟨gn⟩n∈N Borel measurable functions from R to itself; write ḡ, h̄, ḡn for the corresponding functions from L0 = L0(µ) to itself (241I). (i) Show that
ḡ(u) + h̄(u) = (g + h)‾(u), ḡ(u) × h̄(u) = (g × h)‾(u), ḡ(h̄(u)) = (gh)‾(u)
for every u ∈ L0 . (ii) Show that if g(t) ≤ h(t) for every t ∈ R, then ḡ(u) ≤ h̄(u) for every u ∈ L0 . (iii) Show that if g
is non-decreasing, then ḡ(u) ≤ ḡ(v) whenever u ≤ v in L0 . (iv) Show that if h(t) = supn∈N gn (t) for every t ∈ R, then
h̄(u) = supn∈N ḡn (u) in L0 for every u ∈ L0 .

241Y Further exercises (a) Let U be any Riesz space. For u ∈ U write |u| = u ∨ (−u), u⁺ = u ∨ 0, u⁻ = (−u) ∨ 0. Show that, for any u, v ∈ U,
u = u⁺ − u⁻, |u| = u⁺ + u⁻ = u⁺ ∨ u⁻, u⁺ ∧ u⁻ = 0,
u ∨ v = ½(u + v + |u − v|) = u + (v − u)⁺,
u ∧ v = ½(u + v − |u − v|) = u − (u − v)⁺,
|u + v| ≤ |u| + |v|.

(b) Let U be a partially ordered linear space and N a linear subspace of U such that whenever u, u′ ∈ N and u ≤ v ≤ u′ then v ∈ N. (i) Show that the linear space quotient U/N is a partially ordered linear space if we say that u• ≤ v• in U/N iff there is a w ∈ N such that u ≤ v + w in U. (ii) Show that in this case U/N is a Riesz space if U is a Riesz space and |u| ∈ N for every u ∈ N.
(c) Let (X, Σ, µ) be a measure space. Write ℒ0strict for the space of all measurable functions from X to R, and N for the subspace of ℒ0strict consisting of measurable functions which are zero almost everywhere. (i) Show that ℒ0strict is a Dedekind σ-complete Riesz space. (ii) Show that L0(µ) can be identified, as ordered linear space, with the quotient ℒ0strict/N as defined in 241Yb above.
(d) Show that any Dedekind σ-complete Riesz space is Archimedean.
(e) A Riesz space U is said to have the countable sup property if for every A ⊆ U with a least upper bound in
U , there is a countable B ⊆ A such that sup B = sup A. Show that if (X, Σ, µ) is a semi-finite measure space, then it
is σ-finite iff L0 (µ) has the countable sup property.
(f) Let (X, Σ, µ) be a measure space and µ̃ the c.l.d. version of µ (213E). (i) Show that ℒ0(µ) ⊆ ℒ0(µ̃). (ii) Show that
this inclusion defines a linear operator T : L0 (µ) → L0 (µ̃) such that T (u × v) = T u × T v for all u, v ∈ L0 (µ). (iii) Show
that whenever v > 0 in L0 (µ̃) there is a u ≥ 0 in L0 (µ) such that 0 < T u ≤ v. (iv) Show that T (sup A) = sup T [A]
whenever A ⊆ L0 (µ) is a non-empty set with a least upper bound in L0 (µ). (v) Show that T is injective iff µ is
semi-finite. (vi) Show that if µ is localizable, then T is an isomorphism for the linear and order structures of L0 (µ)
and L0 (µ̃). (Hint: 213Hb.)
(g) Let (X, Σ, µ) be a measure space and Y any subset of X; let µY be the subspace measure on Y . (i) Show that
ℒ0(µY) = {f↾Y : f ∈ ℒ0(µ)}. (ii) Show that there is a canonical surjection T : L0(µ) → L0(µY) defined by setting
T(f•) = (f↾Y)• for every f ∈ ℒ0(µ), which is linear and multiplicative and preserves finite suprema and infima, so
that (in particular) T (|u|) = |T u| for every u ∈ L0 (µ). (iii) Show that T is injective iff Y has full outer measure.

(h) Suppose, in 241Yg, that Y ∈ Σ. Explain how L0 (µY ) may be identified (as ordered linear space) with the
subspace {u : u × χ(X \ Y )• = 0} of L0 (µ).

(i) Let (X, Σ, µ) be a measure space, and h : R → R a non-decreasing function which is continuous on the left. Show
that if A ⊆ L0 = L0 (µ) is a non-empty set with a supremum v ∈ L0 , then h̄(v) = supu∈A h̄(u), where h̄ : L0 → L0 is
the function described in 241I.

241 Notes and comments As hinted in 241Ya and 241Yd, the elementary properties of the space L0 which take up
most of this section are strongly interdependent; it is not difficult to develop a theory of ‘Riesz algebras’ to incorporate
the ideas of 241H into the rest. (Indeed, I sketch such a theory in §352 in the next volume.)
If we write ℒ0strict for the space of measurable functions from X to R, then ℒ0strict is also a Dedekind σ-complete Riesz space, and L0 can be identified with the quotient ℒ0strict/N, writing N for the set of functions in ℒ0strict which are zero almost everywhere. (To do this properly, we need a theory of quotients of ordered linear spaces; see 241Yb-241Yc above.) Of course ℒ0, as I define it, is not quite a linear space. I choose the slightly more awkward description of L0 as a space of equivalence classes in ℒ0 rather than in ℒ0strict because it frequently happens in practice that a member of L0 arises from a member of ℒ0 which is either not defined at every point of the underlying space, or not quite measurable; and to adjust such a function so that it becomes a member of ℒ0strict, while trivial, is an arbitrary process which to my mind is liable to distort the true nature of such a construction. Of course the same argument could be used in favour of a slightly larger space, the space ℒ0∞ of µ-virtually measurable [−∞, ∞]-valued functions defined and finite almost everywhere, relying on 135E rather than on 121E-121F. But I maintain that the operation of restricting a function in ℒ0∞ to the set on which it is finite is not arbitrary, but canonical and entirely natural.
Reading the exposition above – or, for that matter, scanning the rest of this chapter – you are sure to notice a
plethora of •s, adding a distinctive character to the pages which, I expect you will feel, is disagreeable to the eye and
daunting, or at any rate wearisome, to the spirit. Many, perhaps most, authors prefer to simplify the typography by
using the same symbol for a function in ℒ0 or ℒ0strict and for its equivalence class in L0; and indeed it is common to
use syntax which does not distinguish between them either, so that an object which has been defined as a member of
L0 will suddenly become a function with actual values at points of the underlying measure space. I prefer to maintain
a rigid distinction; you must choose for yourself whether to follow me. Since I have chosen the more cumbersome
form, I suppose the burden of proof is on me, to justify my decision. (i) Anyone would agree that there is at least a
formal difference between a function and a set of functions. This by itself does not justify insisting on the difference
in every sentence; mathematical exposition would be impossible if we always insisted on consistency in such questions
as whether (for instance) the number 3 belonging to the set N of natural numbers is exactly the same object as the
number 3 belonging to the set C of complex numbers, or the ordinal 3. But the difference between an object and
a set to which it belongs is a sufficient difference in kind to make any confusion extremely dangerous, and while I
agree that you can study this topic without using different symbols for f and f • , I do not think you can ever safely
escape a mental distinction for more than a few lines of argument. (ii) As a teacher, I have to say that quite a few
students, encountering this material for the first time, are misled by any failure to make the distinction between f and
f • into believing that no distinction need be made; and – as a teacher – I always insist on a student convincing me, by
correctly writing out the more pedantic forms of the arguments for a few weeks, that he understands the manipulations
necessary, before I allow him to go his own way. (iii) The reason why it is possible to evade the distinction in certain
types of argument is just that the Dedekind σ-complete Riesz space ℒ0strict parallels the Dedekind σ-complete Riesz
space L0 so closely that any proposition involving only countably many members of these spaces is likely to be valid in
one if and only if it is valid in the other. In my view, the implications of this correspondence are at the very heart of
measure theory. I prefer therefore to keep it constantly conspicuous, reminding myself through symbolism that every
theorem has a Siamese twin, and rising to each challenge to express the twin theorem in an appropriate language. (iv)
There are ways in which ℒ0strict and L0 are actually very different, and many interesting ideas can be expressed only in
a language which keeps them clearly separated.
For more than half my life now I have felt that these points between them are sufficient reason for being consistent
in maintaining the formal distinction between f and f • . You may feel that in (iii) and (iv) of the last paragraph I
am trying to have things both ways; I am arguing that both the similarities and the differences between ℒ0 and L0
support my case. Indeed that is exactly my position. If they were totally different, using the same language for both
would not give rise to confusion; if they were essentially the same, it would not matter if we were sometimes unclear
which we were talking about.

242 L1
While the space L0 treated in the previous section is of very great intrinsic interest, its chief use in the elementary
theory is as a space in which some of the most important spaces of functional analysis are embedded. In the next few
sections I introduce these one at a time.
The first is the space L1 of equivalence classes of integrable functions. The importance of this space is not only that
it offers a language in which to express those many theorems about integrable functions which do not depend on the
differences between two functions which are equal almost everywhere. It can also appear as the natural space in which
to seek solutions to a wide variety of integral equations, and as the completion of a space of continuous functions.

242A The space L1 Let (X, Σ, µ) be any measure space.

(a) Let ℒ1 = ℒ1(µ) be the set of real-valued functions, defined on subsets of X, which are integrable over X. Then ℒ1 ⊆ ℒ0 = ℒ0(µ), as defined in §241, and, for f ∈ ℒ0, we have f ∈ ℒ1 iff there is a g ∈ ℒ1 such that |f| ≤a.e. g; if f ∈ ℒ1, g ∈ ℒ0 and f =a.e. g, then g ∈ ℒ1. (See 122P-122R.)


(b) Let L1 = L1(µ) ⊆ L0 = L0(µ) be the set of equivalence classes of members of ℒ1. If f, g ∈ ℒ1 and f =a.e. g then ∫f = ∫g (122Rb). Accordingly we may define a functional ∫ on L1 by writing ∫f• = ∫f for every f ∈ ℒ1.

(c) It will be convenient to be able to write ∫_A u for u ∈ L1, A ⊆ X; this may be defined by saying that ∫_A f• = ∫_A f for every f ∈ ℒ1, where the integral is defined in 214D. P I have only to check that if f =a.e. g then ∫_A f = ∫_A g; and this is because f↾A = g↾A almost everywhere in A. Q
If E ∈ Σ and u ∈ L1 then ∫_E u = ∫ u × (χE)•; this is because ∫_E f = ∫ f × χE for every integrable function f (131Fa).
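
To see these definitions at work in the simplest possible case, the following Python sketch uses a hypothetical three-point measure space with one negligible point. It checks that two functions equal almost everywhere have the same integral, and that integrating over E agrees with integrating against χE. The space `mu` and the helper names are assumptions of the example.

```python
from fractions import Fraction as Fr

# Assumption: finite measure space, point -> mass, with one negligible point "c".
mu = {"a": Fr(1, 3), "b": Fr(2, 3), "c": Fr(0)}

def integral(f, over=None):
    """Integral of f over the set `over` (default: the whole space)."""
    points = mu if over is None else over
    return sum(f[x] * mu[x] for x in points)

def indicator(E):
    """The function χE."""
    return {x: Fr(1) if x in E else Fr(0) for x in mu}

def times(f, g):
    """Pointwise product f × g."""
    return {x: f[x] * g[x] for x in mu}

f = {"a": Fr(3), "b": Fr(-1), "c": Fr(10)}
g = {"a": Fr(3), "b": Fr(-1), "c": Fr(-10)}   # equal to f almost everywhere
E = {"a", "c"}
```

With these values `integral(f)` and `integral(g)` coincide, and `integral(f, E)` equals `integral(times(f, indicator(E)))`.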

(d) If u ∈ L1, there is a Σ-measurable, µ-integrable function f : X → R such that f• = u. P As noted in 241Bk, there is a measurable f : X → R such that f• = u; but of course f is integrable because it is equal almost everywhere to some integrable function. Q

242B Theorem Let (X, Σ, µ) be any measure space. Then L1(µ) is a linear subspace of L0(µ) and ∫ : L1 → R is a linear functional.
proof If u, v ∈ L1 = L1(µ) and c ∈ R let f, g be integrable functions such that u = f• and v = g•; then f + g and cf are integrable, so u + v = (f + g)• and cu = (cf)• belong to L1. Also
∫ u + v = ∫ f + g = ∫f + ∫g = ∫u + ∫v
and
∫ cu = ∫ cf = c∫f = c∫u.

242C The order structure of L1 Let (X, Σ, µ) be any measure space.

(a) L1 = L1(µ) has an order structure derived from that of L0 = L0(µ) (241E); that is, f• ≤ g• iff f ≤ g a.e. Being a linear subspace of L0, L1 must be a partially ordered linear space; the two conditions of 241Ec are obviously inherited by linear subspaces.
Note also that if u, v ∈ L1 and u ≤ v then ∫u ≤ ∫v, because if f, g are integrable functions and f ≤a.e. g then ∫f ≤ ∫g (122Od).

(b) If u ∈ L0, v ∈ L1 and |u| ≤ |v| then u ∈ L1. P Let f ∈ ℒ0 = ℒ0(µ), g ∈ ℒ1 = ℒ1(µ) be such that u = f• and v = g•; then g is integrable and |f| ≤a.e. |g|, so f is integrable and u ∈ L1. Q

(c) In particular, |u| ∈ L1 whenever u ∈ L1, and
|∫u| = max(∫u, ∫(−u)) ≤ ∫|u|,
because u, −u ≤ |u|.

(d) Because |u| ∈ L1 for every u ∈ L1,
u ∨ v = ½(u + v + |u − v|), u ∧ v = ½(u + v − |u − v|)
belong to L1 for all u, v ∈ L1. But if w ∈ L1 we surely have
w ≤ u & w ≤ v ⇐⇒ w ≤ u ∧ v,
w ≥ u & w ≥ v ⇐⇒ w ≥ u ∨ v
because these are true for all w ∈ L0, so u ∨ v = sup{u, v} and u ∧ v = inf{u, v} in L1. Thus L1 is, in itself, a Riesz space.

(e) Note that if u ∈ L1, then u ≥ 0 iff ∫_E u ≥ 0 for every E ∈ Σ; this is because if f is an integrable function on X and ∫_E f ≥ 0 for every E ∈ Σ, then f ≥ 0 a.e. (131Fb). More generally, if u, v ∈ L1 and ∫_E u ≤ ∫_E v for every E ∈ Σ, then u ≤ v. It follows at once that if u, v ∈ L1 and ∫_E u = ∫_E v for every E ∈ Σ, then u = v (cf. 131Fc).

(f) If u ≥ 0 in L1, there is a non-negative f ∈ ℒ1 such that f• = u (compare 241Eg).

242D The norm of L1 Let (X, Σ, µ) be any measure space.


(a) For f ∈ ℒ1 = ℒ1(µ) I write ‖f‖1 = ∫|f| ∈ [0, ∞[. For u ∈ L1 = L1(µ) set ‖u‖1 = ∫|u|, so that ‖f•‖1 = ‖f‖1 for every f ∈ ℒ1. Then ‖ ‖1 is a norm on L1. P (i) If u, v ∈ L1 then |u + v| ≤ |u| + |v|, by 241Ee, so
‖u + v‖1 = ∫|u + v| ≤ ∫|u| + |v| = ∫|u| + ∫|v| = ‖u‖1 + ‖v‖1.
(ii) If u ∈ L1 and c ∈ R then
‖cu‖1 = ∫|cu| = ∫|c||u| = |c|∫|u| = |c|‖u‖1.
(iii) If u ∈ L1 and ‖u‖1 = 0, express u as f•, where f ∈ ℒ1; then ∫|f| = ∫|u| = 0. Because |f| is non-negative, it must be zero almost everywhere (122Rc), so f = 0 a.e. and u = 0 in L1. Q


(b) Thus L1, with ‖ ‖1, is a normed space and ∫ : L1 → R is a linear operator; observe that ‖∫‖ ≤ 1, because
|∫u| ≤ ∫|u| = ‖u‖1
for every u ∈ L1.

(c) If u, v ∈ L1 and |u| ≤ |v|, then
‖u‖1 = ∫|u| ≤ ∫|v| = ‖v‖1.
In particular, ‖u‖1 = ‖|u|‖1 for every u ∈ L1.

(d) Note the following property of the normed Riesz space L1: if u, v ∈ L1 and u, v ≥ 0, then
‖u + v‖1 = ∫ u + v = ∫u + ∫v = ‖u‖1 + ‖v‖1.
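
The norm properties in (a)-(d) can be checked by hand in a finite setting. The Python sketch below does so on a hypothetical three-point measure space in which every point has positive mass, so that functions and classes coincide; the space `mu`, the sample functions and the helper names are assumptions of the example.

```python
from fractions import Fraction as Fr

# Assumption: finite measure space, point -> mass, with no negligible points.
mu = {"a": Fr(1, 4), "b": Fr(1, 2), "c": Fr(1, 4)}

def norm1(f):
    """The L1 norm: the integral of |f| with respect to mu."""
    return sum(abs(f[x]) * mu[x] for x in mu)

def add(f, g):
    """Pointwise sum f + g."""
    return {x: f[x] + g[x] for x in mu}

u = {"a": Fr(2), "b": Fr(-3), "c": Fr(1)}
v = {"a": Fr(-1), "b": Fr(4), "c": Fr(0)}
u_pos = {x: abs(u[x]) for x in mu}   # |u|, a non-negative function
v_pos = {x: abs(v[x]) for x in mu}   # |v|, a non-negative function
```

For u and v the triangle inequality is strict, because of cancellation; for the non-negative functions `u_pos` and `v_pos` the norm is additive, as in (d).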

(e) The set (L1)+ = {u : u ≥ 0} is closed in L1. P If v ∈ L1, u ∈ (L1)+ then ‖u − v‖1 ≥ ‖v ∧ 0‖1; this is because if f, g ∈ ℒ1 and f ≥ 0 a.e., |f(x) − g(x)| ≥ |min(g(x), 0)| whenever f(x) and g(x) are both defined and f(x) ≥ 0, which is almost everywhere, so
‖u − v‖1 = ∫|f − g| ≥ ∫|g ∧ 0| = ‖v ∧ 0‖1.
Now this means that if v ∈ L1 and v ≱ 0, the ball {w : ‖w − v‖1 < δ} does not meet (L1)+, where δ = ‖v ∧ 0‖1 > 0 because v ∧ 0 ≠ 0. Thus L1 \ (L1)+ is open and (L1)+ is closed. Q

242E For the next result we need a variant of B.Levi’s theorem.


Lemma Let (X, Σ, µ) be a measure space and ⟨fn⟩n∈N a sequence of µ-integrable real-valued functions such that ∑_{n=0}^∞ ∫|fn| < ∞. Then f = ∑_{n=0}^∞ fn is integrable and
∫f = ∑_{n=0}^∞ ∫fn, ∫|f| ≤ ∑_{n=0}^∞ ∫|fn|.
proof (a) Suppose first that every fn is non-negative. Set gn = ∑_{k=0}^n fk for each n; then ⟨gn⟩n∈N is increasing a.e. and
limn→∞ ∫gn = ∑_{k=0}^∞ ∫fk
is finite, so by B.Levi’s theorem (123A) f = limn→∞ gn is integrable and
∫f = limn→∞ ∫gn = ∑_{k=0}^∞ ∫fk.
In this case, of course,
∫|f| = ∫f = ∑_{n=0}^∞ ∫fn = ∑_{n=0}^∞ ∫|fn|.

(b) For the general case, set fn⁺ = ½(|fn| + fn), fn⁻ = ½(|fn| − fn), as in 241Ef; then fn⁺ and fn⁻ are non-negative integrable functions, and
∑_{n=0}^∞ ∫fn⁺ + ∑_{n=0}^∞ ∫fn⁻ = ∑_{n=0}^∞ ∫|fn| < ∞.
So h1 = ∑_{n=0}^∞ fn⁺ and h2 = ∑_{n=0}^∞ fn⁻ are both integrable. Now f =a.e. h1 − h2, so
∫f = ∫h1 − ∫h2 = ∑_{n=0}^∞ ∫fn⁺ − ∑_{n=0}^∞ ∫fn⁻ = ∑_{n=0}^∞ ∫fn.
Finally
∫|f| ≤ ∫|h1| + ∫|h2| = ∑_{n=0}^∞ ∫fn⁺ + ∑_{n=0}^∞ ∫fn⁻ = ∑_{n=0}^∞ ∫|fn|.

242F Theorem For any measure space (X, Σ, µ), L1(µ) is complete under its norm ‖ ‖1.
proof Let ⟨un⟩n∈N be a sequence in L1 such that ‖u_{n+1} − un‖1 ≤ 4^{−n} for every n ∈ N. Choose integrable functions fn such that f0• = u0, f_{n+1}• = u_{n+1} − un for each n ∈ N. Then
∑_{n=0}^∞ ∫|fn| = ‖u0‖1 + ∑_{n=0}^∞ ‖u_{n+1} − un‖1 < ∞.
So f = ∑_{n=0}^∞ fn is integrable, by 242E, and u = f• ∈ L1. Set gn = ∑_{j=0}^n fj for each n; then gn• = un, so
‖u − un‖1 = ∫|f − gn| ≤ ∑_{j=n+1}^∞ ∫|fj| ≤ ∑_{j=n+1}^∞ 4^{−j+1} = 4^{−n+1}/3
for each n. Thus u = limn→∞ un in L1. As ⟨un⟩n∈N is arbitrary, L1 is complete (2A4E).
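
The estimate driving this proof is a geometric tail bound: if successive steps satisfy ‖u_{k+1} − uk‖1 ≤ 4^{−k}, the triangle inequality gives ‖um − un‖1 ≤ ∑_{k=n}^{m−1} 4^{−k} for m > n, and these sums are bounded by (4/3)·4^{−n}. The Python sketch below checks that arithmetic exactly; the function names `tail` and `geometric_bound` are introduced only for this illustration.

```python
from fractions import Fraction as Fr

def tail(n, m):
    """Sum of the step bounds 4**-k for n <= k < m, an upper bound for ||u_m - u_n||_1."""
    return sum(Fr(1, 4 ** k) for k in range(n, m))

def geometric_bound(n):
    """Closed form of the full tail: the sum over k >= n of 4**-k equals (4/3) * 4**-n."""
    return Fr(4, 3) * Fr(1, 4 ** n)
```

Every finite tail lies strictly below `geometric_bound(n)`, which tends to 0 as n grows; this is what makes the sequence converge in norm.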

242G Definition It will be convenient, for later reference, to introduce the following phrase. A Banach lattice is a Riesz space U together with a norm ‖ ‖ on U such that (i) ‖u‖ ≤ ‖v‖ whenever u, v ∈ U and |u| ≤ |v|, writing |u| for u ∨ (−u), as in 241Ee; (ii) U is complete under ‖ ‖. Thus 242Dc and 242F amount to saying that the normed Riesz space (L1, ‖ ‖1) is a Banach lattice.

242H L1 as a Riesz space We can discuss the ordered linear space L1 in the language already used in 241E-241G
for L0 .
Theorem Let (X, Σ, µ) be any measure space. Then L1 = L1 (µ) is Dedekind complete.
proof (a) Let A ⊆ L1 be any non-empty set which is bounded above in L1. Set
A′ = {u0 ∨ . . . ∨ un : u0, . . . , un ∈ A}.
Then A ⊆ A′, A′ has the same upper bounds as A and u ∨ v ∈ A′ for all u, v ∈ A′. Taking w0 to be any upper bound of A and A′, we have ∫u ≤ ∫w0 for every u ∈ A′, so γ = supu∈A′ ∫u is defined in R. For each n ∈ N, choose un ∈ A′ such that ∫un ≥ γ − 2^{−n}. Because L0 = L0(µ) is Dedekind σ-complete (241Ga), u∗ = supn∈N un is defined in L0, and u0 ≤ u∗ ≤ w0 in L0. Consequently
0 ≤ u∗ − u0 ≤ w0 − u0
in L0. But w0 − u0 ∈ L1, so u∗ − u0 ∈ L1 (242Cb) and u∗ ∈ L1.
(b) The point is that u∗ is an upper bound for A. P If u ∈ A, then u ∨ un ∈ A′ for every n, so
‖u − u ∧ u∗‖1 = ∫ u − u ∧ u∗ ≤ ∫ u − u ∧ un
(because u ∧ un ≤ un ≤ u∗, so u ∧ un ≤ u ∧ u∗)
= ∫ u ∨ un − un
(because u ∨ un + u ∧ un = u + un – see the formulae in 242Cd)
= ∫ u ∨ un − ∫ un ≤ γ − (γ − 2^{−n}) = 2^{−n}
for every n; so ‖u − u ∧ u∗‖1 = 0. But this means that u = u ∧ u∗, that is, that u ≤ u∗. As u is arbitrary, u∗ is an upper bound for A. Q
(c) On the other hand, any upper bound for A is surely an upper bound for {un : n ∈ N}, so is greater than or
equal to u∗ . Thus u∗ = sup A in L1 . As A is arbitrary, L1 is Dedekind complete.
Remark Note that the order-completeness of L1 , unlike that of L0 , does not depend on any particular property of the
measure space (X, Σ, µ).

242I The Radon-Nikodým theorem I think it is worth re-writing the Radon-Nikodým theorem (232E) in the
language of this chapter.
Theorem Let (X, Σ, µ) be a measure space. Then there is a canonical bijection between L1 = L1(µ) and the set of truly continuous additive functionals ν : Σ → R, given by the formula
νF = ∫_F u for F ∈ Σ, u ∈ L1.

Remark Recall that if µ is σ-finite, then the truly continuous additive functionals are just the absolutely continuous
countably additive functionals; and that if µ is totally finite, then all absolutely continuous (finitely) additive functionals
are truly continuous (232Bd).
proof For u ∈ L1, F ∈ Σ set νuF = ∫_F u. If u ∈ L1, there is an integrable function f such that f• = u, in which case
F ↦ νuF = ∫_F f : Σ → R
is additive and truly continuous, by 232D. If ν : Σ → R is additive and truly continuous, then by 232E there is an integrable function f such that νF = ∫_F f for every F ∈ Σ; setting u = f• in L1, ν = νu. Finally, if u, v are distinct members of L1, there is an F ∈ Σ such that ∫_F u ≠ ∫_F v (242Ce), so that νu ≠ νv; thus u ↦ νu is injective as well as surjective.
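
On a finite measure space in which every point has positive mass, the correspondence u ↦ νu can be written out explicitly, and the density is recovered by evaluating ν on singletons. The Python sketch below is only a toy instance of the theorem under that assumption; the space `mu` and the names `functional` and `density` are hypothetical.

```python
from fractions import Fraction as Fr

# Assumption: finite measure space in which every point has positive mass.
mu = {"a": Fr(1, 2), "b": Fr(1, 3), "c": Fr(1, 6)}

def functional(u):
    """Return nu_u : F ↦ integral of u over F, as a function on subsets of the space."""
    return lambda F: sum(u[x] * mu[x] for x in F)

def density(nu):
    """Recover u from nu by evaluating on singletons (possible because each mass is positive)."""
    return {x: nu({x}) / mu[x] for x in mu}

u = {"a": Fr(4), "b": Fr(-3), "c": Fr(6)}
nu = functional(u)
```

Here `density(nu)` returns u again, which is the injectivity and surjectivity of u ↦ νu in miniature.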

242J Conditional expectations revisited We now have the machinery necessary for a new interpretation of
some of the ideas of §233.
(a) Let (X, Σ, µ) be a measure space, and T a σ-subalgebra of Σ, as in 233A. Then (X, T, µ↾T) is a measure space, and ℒ0(µ↾T) ⊆ ℒ0(µ); moreover, if f, g ∈ ℒ0(µ↾T), then f = g (µ↾T)-a.e. iff f = g µ-a.e. P There are µ↾T-conegligible sets F, G ∈ T such that f↾F and g↾G are T-measurable; set
E = {x : x ∈ F ∩ G, f(x) ≠ g(x)} ∈ T;
then
f = g (µ↾T)-a.e. ⇐⇒ (µ↾T)(E) = 0 ⇐⇒ µE = 0 ⇐⇒ f = g µ-a.e. Q
Accordingly we have a canonical map S : L0(µ↾T) → L0(µ) defined by saying that if u ∈ L0(µ↾T) is the equivalence class of f ∈ ℒ0(µ↾T), then Su is the equivalence class of f in L0(µ). It is easy to check, working through the operations described in 241D, 241E and 241H, that S is linear, injective and order-preserving, and that |Su| = S|u|, S(u ∨ v) = Su ∨ Sv and S(u × v) = Su × Sv for u, v ∈ L0(µ↾T).

(b) Next, if f ∈ ℒ1(µ↾T), then f ∈ ℒ1(µ) and ∫f dµ = ∫f d(µ↾T) (233B); so Su ∈ L1(µ) and ‖Su‖1 = ‖u‖1 for every u ∈ L1(µ↾T).
Observe also that every member of L1(µ) ∩ S[L0(µ↾T)] is actually in S[L1(µ↾T)]. P Take u ∈ L1(µ) ∩ S[L0(µ↾T)]. Then u is expressible both as f• where f ∈ ℒ1(µ), and as g• where g ∈ ℒ0(µ↾T). So g =a.e. f, and g is µ-integrable, therefore (µ↾T)-integrable (233B again). Q
This means that S : L1(µ↾T) → L1(µ) ∩ S[L0(µ↾T)] is a bijection.

(c) Now suppose that µX = 1, so that (X, Σ, µ) is a probability space. Recall that g is a conditional expectation of f on T if g is µ↾T-integrable, f is µ-integrable and ∫_F g = ∫_F f for every F ∈ T; and that every µ-integrable function has such a conditional expectation (233D). If g is a conditional expectation of f and f1 = f µ-a.e. then g is a conditional expectation of f1, because ∫_F f1 = ∫_F f for every F; and I have already remarked in 233Dc that if g, g1 are conditional expectations of f on T then g = g1 µ↾T-a.e.

(d) This means that we have an operator P : L1(µ) → L1(µ↾T) defined by saying that P(f•) = g• whenever g ∈ ℒ1(µ↾T) is a conditional expectation of f ∈ ℒ1(µ) on T; that is, that ∫_F Pu = ∫_F u whenever u ∈ L1(µ) and F ∈ T. If we identify L1(µ), L1(µ↾T) with the sets of absolutely continuous additive functionals defined on Σ and T, as in 242I, then P corresponds to the operation ν ↦ ν↾T.

(e) Because Pu is uniquely defined in L1(µ↾T) by the requirement ∫_F Pu = ∫_F u for every F ∈ T (242Ce), we see that P must be linear. P If u, v ∈ L1(µ) and c ∈ R, then
∫_F Pu + Pv = ∫_F Pu + ∫_F Pv = ∫_F u + ∫_F v = ∫_F u + v = ∫_F P(u + v),
∫_F P(cu) = ∫_F cu = c∫_F u = c∫_F Pu = ∫_F cPu
for every F ∈ T. Q Also, if u ≥ 0, then ∫_F Pu = ∫_F u ≥ 0 for every F ∈ T, so Pu ≥ 0 (242Ce again).
It follows at once that P is order-preserving, that is, that Pu ≤ Pv whenever u ≤ v. Consequently
|Pu| = Pu ∨ (−Pu) = Pu ∨ P(−u) ≤ P|u|
for every u ∈ L1(µ), because u ≤ |u| and −u ≤ |u|. Finally, P is a bounded linear operator, with norm 1. P The last formula tells us that
‖Pu‖1 ≤ ‖P|u|‖1 = ∫P|u| = ∫|u| = ‖u‖1
for every u ∈ L1(µ), so ‖P‖ ≤ 1. On the other hand, P(χX•) = χX• ≠ 0, so ‖P‖ = 1. Q

(f) We may legitimately regard Pu ∈ L1 (µ↾ T) as ‘the’ conditional expectation of u ∈ L1 (µ) on T; P is the conditional expectation operator.
(g) If u ∈ L1 (µ↾ T), then we have a corresponding Su ∈ L1 (µ), as in (b); now PSu = u. P P ∫_F PSu = ∫_F Su = ∫_F u for every F ∈ T. Q Q Consequently SPSP = SP : L1 (µ) → L1 (µ).

(h) The distinction drawn above between u = f • ∈ L0 (µ↾ T) and Su = f • ∈ L0 (µ) is of course pedantic. I believe
it is necessary to be aware of such distinctions, even though for nearly all purposes it is safe as well as convenient
to regard L0 (µ↾ T) as actually a subset of L0 (µ). If we do so, then (b) tells us that we can identify L1 (µ↾ T) with
L1 (µ) ∩ L0 (µ↾ T), while (g) becomes ‘P² = P’.
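
The properties of P collected in (d)-(h) can be checked by hand in the simplest case. The following is a minimal sketch, assuming a hypothetical six-point probability space in which T is generated by a finite partition; the names mu, blocks, cond_exp and norm1 are illustrative and do not come from the text. In this setting Pu is just the weighted average of u over each block.

```python
from fractions import Fraction

# Hypothetical finite model: X = {0,...,5} with point masses mu, and T the
# sigma-algebra generated by a partition of X into blocks.
mu = [Fraction(1, 12), Fraction(1, 4), Fraction(1, 6),
      Fraction(1, 6), Fraction(1, 4), Fraction(1, 12)]
blocks = [[0, 1], [2, 3, 4], [5]]

def integral(u, subset=None):
    """Integral of u over `subset` (default: all of X) with respect to mu."""
    points = range(len(mu)) if subset is None else subset
    return sum(u[x] * mu[x] for x in points)

def cond_exp(u):
    """P u: on each block, the mu-weighted average of u over that block."""
    out = [Fraction(0)] * len(mu)
    for block in blocks:
        mass = sum(mu[x] for x in block)
        avg = integral(u, block) / mass
        for x in block:
            out[x] = avg
    return out

def norm1(u):
    """The L1 norm: integral of |u|."""
    return integral([abs(value) for value in u])

u = [Fraction(v) for v in (3, -1, 4, -1, 5, -9)]
Pu = cond_exp(u)

# Defining property: the integrals of P u and u agree on every member of T
# (checking the generating blocks suffices, by additivity).
same_integrals = all(integral(Pu, b) == integral(u, b) for b in blocks)
idempotent = cond_exp(Pu) == Pu          # (g)/(h): P^2 = P
contractive = norm1(Pu) <= norm1(u)      # (e): ||P u||_1 <= ||u||_1
print(same_integrals, idempotent, contractive)
```

Exact rational arithmetic is used so that the equalities are tested exactly rather than up to rounding.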

242K The language just introduced allows the following re-formulations of 233J-233K.
Theorem Let (X, Σ, µ) be a probability space and T a σ-subalgebra of Σ. Let φ : R → R be a convex function and
φ̄ : L0 (µ) → L0 (µ) the corresponding operator defined by setting φ̄(f • ) = (φf )• (241I). If P : L1 (µ) → L1 (µ↾ T) is the
conditional expectation operator, then φ̄(P u) ≤ P (φ̄u) whenever u ∈ L1 (µ) is such that φ̄(u) ∈ L1 (µ).
proof This is just a restatement of 233J.

242L Proposition Let (X, Σ, µ) be a probability space, and T a σ-subalgebra of Σ. Let P : L1 (µ) → L1 (µ↾ T) be the corresponding conditional expectation operator. If u ∈ L1 = L1 (µ) and v ∈ L0 (µ↾ T), then u × v ∈ L1 iff P|u| × v ∈ L1, and in this case P(u × v) = Pu × v; in particular, ∫ u × v = ∫ Pu × v.
proof (I am here using the identification of L0 (µ↾ T) as a subspace of L0 (µ), as suggested in 242Jh.) Express u as
f • and v as h• , where f ∈ L1 = L1 (µ) and h ∈ L0 (µ↾ T). Let g, g0 ∈ L1 (µ↾ T) be conditional expectations of f , |f |
respectively, so that P u = g • and P |u| = g0• . Then, using 233K,
u × v ∈ L1 ⇐⇒ f × h ∈ L1 ⇐⇒ g0 × h ∈ L1 ⇐⇒ P |u| × v ∈ L1 ,
and in this case g × h is a conditional expectation of f × h, that is, P u × v = P (u × v).
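
Both 242K and 242L can be seen at work in a finite model. The sketch below assumes a hypothetical uniform probability space on six points, with T generated by a partition, and takes φ(t) = t² as the convex function; all names are illustrative rather than taken from the text.

```python
from fractions import Fraction

# Hypothetical finite probability space: uniform measure on 6 points, with T
# generated by the partition `blocks`.
n = 6
mu = [Fraction(1, n)] * n
blocks = [[0, 1, 2], [3, 4], [5]]

def cond_exp(u):
    """P u: on each block, the mu-weighted average of u over that block."""
    out = [Fraction(0)] * n
    for block in blocks:
        mass = sum(mu[x] for x in block)
        avg = sum(u[x] * mu[x] for x in block) / mass
        for x in block:
            out[x] = avg
    return out

def phi(t):
    """A convex function R -> R (here the square)."""
    return t * t

u = [Fraction(t) for t in (2, -3, 7, 1, 5, -4)]
# v is T-measurable: constant on each block of the partition.
v = [Fraction(c) for c in (2, 2, 2, -1, -1, 3)]

Pu = cond_exp(u)
P_phi_u = cond_exp([phi(t) for t in u])

# 242K (Jensen): phi(P u) <= P(phi u) pointwise.
jensen = all(phi(Pu[x]) <= P_phi_u[x] for x in range(n))
# 242L: P(u x v) = P u x v when v is T-measurable.
pull_out = cond_exp([a * b for a, b in zip(u, v)]) == [a * b for a, b in zip(Pu, v)]
print(jensen, pull_out)
```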

242M L1 as a completion I mentioned in the introduction to this section that L1 appears in functional analysis
as a completion of some important spaces; put another way, some dense subspaces of L1 are significant. The first is
elementary.
Proposition Let (X, Σ, µ) be any measure space, and write S for the space of µ-simple functions on X. Then
(a) whenever f is a µ-integrable real-valued function and ε > 0, there is an h ∈ S such that ∫ |f − h| ≤ ε;
(b) S = {f • : f ∈ S} is a dense linear subspace of L1 = L1 (µ).
proof (a)(i) If f is non-negative, then there is a simple function h such that h ≤a.e. f and ∫ h ≥ ∫ f − ½ε (122K), in which case
∫ |f − h| = ∫ (f − h) = ∫ f − ∫ h ≤ ½ε.

(ii) In the general case, f is expressible as a difference f1 − f2 of non-negative integrable functions. Now there are h1, h2 ∈ S such that ∫ |fj − hj| ≤ ½ε for both j; setting h = h1 − h2 ∈ S, we have
∫ |f − h| ≤ ∫ |f1 − h1| + ∫ |f2 − h2| ≤ ε.

(b) Because the space S of simple functions is a linear subspace of R^X included in L1 (µ), its image {f• : f ∈ S} is a linear subspace of L1. If u ∈ L1 and ε > 0, there are an integrable f such that f• = u and a simple function h such that ∫ |f − h| ≤ ε; now v = h• belongs to this image and
‖u − v‖1 = ∫ |f − h| ≤ ε.
As u and ε are arbitrary, {f• : f ∈ S} is dense in L1.

242N As always, Lebesgue measure on R r and its subsets is by far the most important example; and in this
case we have further classes of dense subspace of L1 . If you have reached this point without yet troubling to master
multi-dimensional Lebesgue measure, just take r = 1. If you feel uncomfortable with general subspace measures, take
X to be R r or [0, 1] ⊆ R or some other particular subset which you find interesting. The following term will be useful.
Definition If f is a real- or complex-valued function defined on a subset of R^r, say that the support of f is
{x : x ∈ dom f, f(x) ≠ 0}.

242O Theorem Let X be any subset of R r , where r ≥ 1, and let µ be Lebesgue measure on X, that is, the
subspace measure on X induced by Lebesgue measure on R r . Write Ck for the space of bounded continuous functions
f : R r → R which have bounded support, and S0 for the space of linear combinations of functions of the form χI where
I ⊆ R r is a bounded half-open interval. Then
(a) whenever f ∈ L1 = L1 (µ) and ε > 0, there are g ∈ Ck, h ∈ S0 such that ∫_X |f − g| ≤ ε and ∫_X |f − h| ≤ ε;
(b) {(g↾X)• : g ∈ Ck } and {(h↾X)• : h ∈ S0 } are dense linear subspaces of L1 = L1 (µ).
Remark Of course there is a redundant ‘bounded’ in the description of Ck ; see 242Xh.
proof (a) I argue in turn that the result is valid for each of an increasing number of members f of L1 = L1 (µ). Write
µr for Lebesgue measure on R r , so that µ is the subspace measure (µr )X .
(i) Suppose first that f = χI↾X where I ⊆ R r is a bounded half-open interval. Of course χI is already in S0 ,
so I have only to show that it is approximated by members of Ck . If I = ∅ the result is trivial; we can take g = 0.
Otherwise, express I as [a − b, a + b[ where a = (α1, . . . , αr), b = (β1, . . . , βr) and βj > 0 for each j. Let δ > 0 be such that
2^r ∏_{j=1}^r (βj + δ) ≤ ε + 2^r ∏_{j=1}^r βj.
For ξ ∈ R set

gj(ξ) = 1 if |ξ − αj| ≤ βj,
      = (βj + δ − |ξ − αj|)/δ if βj ≤ |ξ − αj| ≤ βj + δ,
      = 0 if |ξ − αj| ≥ βj + δ.

[Figure: the function gj, equal to 1 within distance βj of αj and falling linearly to 0 over a further distance δ on each side.]

For x = (ξ1, . . . , ξr) ∈ R^r set
g(x) = ∏_{j=1}^r gj(ξj).

Then g ∈ Ck and χI ≤ g ≤ χJ, where J = [a − b − δ1, a + b + δ1] (writing 1 = (1, . . . , 1)), so that (by the choice of δ) µr J ≤ µr I + ε, and
∫_X |g − f| ≤ ∫ (χ(J ∩ X) − χ(I ∩ X)) dµ = µ((J \ I) ∩ X) ≤ µr(J \ I) = µr J − µr I ≤ ε,
as required.
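
To see the size of the error in the construction of (i), take r = 1 and X = R, so that the condition on δ reads 2(β + δ) ≤ ε + 2β, that is, δ ≤ ½ε. The sketch below is an illustration under these assumptions; the function names and the particular values of α, β and ε are hypothetical. It computes ∫ |g − χI| exactly: the integrand is linear between consecutive breakpoints, so the midpoint rule is exact on each segment.

```python
from fractions import Fraction

def g_j(xi, alpha, beta, delta):
    """Piecewise-linear function of 242O(a)(i): 1 near alpha, 0 far away."""
    d = abs(xi - alpha)
    if d <= beta:
        return Fraction(1)
    if d <= beta + delta:
        return (beta + delta - d) / delta
    return Fraction(0)

def chi_I(xi, alpha, beta):
    """Indicator of the half-open interval [alpha - beta, alpha + beta[."""
    return Fraction(1) if alpha - beta <= xi < alpha + beta else Fraction(0)

def l1_distance(alpha, beta, delta):
    """Exact integral of |g_j - chi_I| over R.

    The integrand vanishes outside [alpha - beta - delta, alpha + beta + delta]
    and is linear on each open segment between consecutive breakpoints, so
    the midpoint rule is exact segment by segment.
    """
    pts = [alpha - beta - delta, alpha - beta, alpha + beta, alpha + beta + delta]
    total = Fraction(0)
    for left, right in zip(pts, pts[1:]):
        mid = (left + right) / 2
        gap = abs(g_j(mid, alpha, beta, delta) - chi_I(mid, alpha, beta))
        total += gap * (right - left)
    return total

alpha, beta, eps = Fraction(1), Fraction(2), Fraction(1, 10)
delta = eps / 2          # r = 1: 2(beta + delta) <= eps + 2 beta
dist = l1_distance(alpha, beta, delta)
print(dist, dist <= eps)
```

The exact distance here is δ (two triangles of area ½δ each), comfortably within the bound µ1 J − µ1 I = 2δ ≤ ε used in the proof.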
(ii) Now suppose that f = χ(X ∩ E) where E ⊆ R^r is a set of finite measure. Then there is a disjoint family I0, . . . , In of half-open intervals such that µr(E △ ⋃_{j≤n} Ij) ≤ ½ε. P P There is an open set G ⊇ E such that µr(G \ E) ≤ ¼ε (134Fa). For each m ∈ N, let Im be the family of half-open intervals in R^r of the form [a, b[ where a = (2^{−m}k1, . . . , 2^{−m}kr), k1, . . . , kr being integers, and b = a + 2^{−m}1; then Im is a disjoint family. Set Hm = ⋃{I : I ∈ Im, I ⊆ G}; then ⟨Hm⟩_{m∈N} is a non-decreasing family with union G, so that there is an m such that µr(G \ Hm) ≤ ¼ε and µr(E △ Hm) ≤ ½ε. But now Hm is expressible as a disjoint union ⋃_{j≤n} Ij where I0, . . . , In enumerate the members of Im included in Hm. (The last sentence derails if Hm is empty. But if Hm = ∅ then we can take n = 0, I0 = ∅.) Q Q
Accordingly h = Σ_{j=0}^n χIj ∈ S0 and
∫_X |f − h| = µ(X ∩ (E △ ⋃_{j≤n} Ij)) ≤ ½ε.
As for Ck, (i) tells us that there is for each j ≤ n a gj ∈ Ck such that ∫_X |gj − χIj| ≤ ε/(2(n + 1)), so that g = Σ_{j=0}^n gj ∈ Ck and
∫_X |f − g| ≤ ∫_X |f − h| + ∫_X |h − g| ≤ ½ε + Σ_{j=0}^n ∫_X |gj − χIj| ≤ ε.

(iii) If f is a simple function, express f as Σ_{k=0}^n ak χEk where each Ek is of finite measure in X. Each Ek is expressible as X ∩ Fk where µr Fk = µEk (214Ca). By (ii), we can find gk ∈ Ck, hk ∈ S0 such that
|ak| ∫_X |gk − χFk| ≤ ε/(n + 1), |ak| ∫_X |hk − χFk| ≤ ε/(n + 1)
for each k. Set g = Σ_{k=0}^n ak gk and h = Σ_{k=0}^n ak hk; then g ∈ Ck, h ∈ S0 and
∫_X |f − g| ≤ ∫_X Σ_{k=0}^n |ak||χFk − gk| = Σ_{k=0}^n |ak| ∫_X |χFk − gk| ≤ ε,
∫_X |f − h| ≤ Σ_{k=0}^n |ak| ∫_X |χFk − hk| ≤ ε,
as required.
(iv) If f is any integrable function on X, then by 242Ma we can find a simple function f0 such that ∫ |f − f0| ≤ ½ε, and now by (iii) there are g ∈ Ck, h ∈ S0 such that ∫_X |f0 − g| ≤ ½ε, ∫_X |f0 − h| ≤ ½ε; so that
∫_X |f − g| ≤ ∫_X |f − f0| + ∫_X |f0 − g| ≤ ε,
∫_X |f − h| ≤ ∫_X |f − f0| + ∫_X |f0 − h| ≤ ε.

(b)(i) We must check first that if g ∈ Ck then g↾X is actually µ-integrable. The point here is that if g ∈ Ck and
a ∈ R then
{x : x ∈ X, g(x) > a}
is the intersection of X with an open subset of R r , and is therefore measured by µ, because all open sets are measured
by µr (115G). Next, g is bounded and the set E = {x : x ∈ X, g(x) ≠ 0} is bounded in R r , therefore of finite outer
measure for µr and of finite measure for µ. Thus there is an M ≥ 0 such that |g| ≤ M χE, which is µ-integrable.
Accordingly g is µ-integrable.
Of course h↾X is µ-integrable for every h ∈ S0 because (by the definition of subspace measure) µ(I ∩ X) is defined
and finite for every bounded half-open interval I.

(ii) Now the rest follows by just the same arguments as in 242Mb. Because {g↾X : g ∈ Ck} and {h↾X : h ∈ S0} are linear subspaces of R^X included in L1 (µ), their images Ck# and S0# are linear subspaces of L1. If u ∈ L1 and ε > 0, there are an integrable f such that f• = u, and g ∈ Ck, h ∈ S0 such that ∫_X |f − g|, ∫_X |f − h| ≤ ε; now v = (g↾X)• ∈ Ck# and w = (h↾X)• ∈ S0# and
‖u − v‖1 = ∫_X |f − g| ≤ ε, ‖u − w‖1 = ∫_X |f − h| ≤ ε.

As u and ε are arbitrary, Ck# and S0# are dense in L1.

242P Complex L1 As you would, I hope, expect, we can repeat the work above with L1C , the space of complex-
valued integrable functions, in place of L1 , to construct a complex Banach space L1C . The required changes, based on
the ideas of 241J, are minor.
(a) In 242Aa, it is perhaps helpful to remark that, for f ∈ L0C ,
f ∈ L1C ⇐⇒ |f | ∈ L1 ⇐⇒ Re(f ), Im(f ) ∈ L1 .
Consequently, for u ∈ L0C ,
u ∈ L1C ⇐⇒ |u| ∈ L1 ⇐⇒ Re(u), Im(u) ∈ L1 .
(b) To prove a complex version of 242E, observe that if ⟨fn⟩_{n∈N} is a sequence in L1C such that Σ_{n=0}^∞ ∫ |fn| < ∞, then Σ_{n=0}^∞ ∫ |Re(fn)| and Σ_{n=0}^∞ ∫ |Im(fn)| are both finite, so we may apply 242E twice and see that
∫ (Σ_{n=0}^∞ fn) = ∫ (Σ_{n=0}^∞ Re(fn)) + i ∫ (Σ_{n=0}^∞ Im(fn)) = Σ_{n=0}^∞ ∫ fn.
Accordingly we can prove that L1C is complete under ‖ ‖1 by the argument of 242F.

(c) Similarly, little change is needed to adapt 242J to give a description of a conditional expectation operator
P : L1C (µ) → L1C (µ↾ T) when (X, Σ, µ) is a probability space and T is a σ-subalgebra of Σ. In the formula
|P u| ≤ P |u|
of 242Je, we need to know that
|Pu| = sup_{|ζ|=1} Re(ζPu)
in L0 (µ↾ T) (241Jc), while
Re(ζP u) = Re(P (ζu)) = P (Re(ζu)) ≤ P |u|
whenever |ζ| = 1.
(d) In 242M, we need to replace S by SC , the space of ‘complex-valued simple functions’ of the form Σ_{k=0}^n ak χEk
where each ak is a complex number and each Ek is a measurable set of finite measure; then we get a dense linear
subspace SC = {f • : f ∈ SC } of L1C . In 242O, we must replace Ck by Ck (R r ; C), the space of bounded continuous
complex-valued functions of bounded support, and S0 by the linear span over C of {χI : I is a bounded half-open
interval}.

242X Basic exercises > (a) Let X be a set, and let µ be counting measure on X. Show that L1 (µ) can be
identified with the space ℓ1 (X) of absolutely summable real-valued functions on X (see 226A). In particular, the space
ℓ1 = ℓ1 (N) of absolutely summable real-valued sequences is an L1 space. Write out proofs of 242F adapted to these
special cases.

> (b) Let (X, Σ, µ) be any measure space, and µ̂ the completion of µ. Show that L1 (µ̂) = L1 (µ) and L1 (µ̂) = L1 (µ)
(cf. 241Xb).

(c) Let ⟨(Xi, Σi, µi)⟩_{i∈I} be a family of measure spaces, and (X, Σ, µ) their direct sum. Show that the isomorphism between L0 (µ) and ∏_{i∈I} L0 (µi) (241Xd) induces an identification between L1 (µ) and
{u : u ∈ ∏_{i∈I} L1 (µi), ‖u‖ = Σ_{i∈I} ‖u(i)‖1 < ∞} ⊆ ∏_{i∈I} L1 (µi).

(d) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and φ : X → Y an inverse-measure-preserving function. Show
that g ↦ gφ : L1 (ν) → L1 (µ) (235G) induces a linear operator T : L1 (ν) → L1 (µ) such that ‖Tv‖1 = ‖v‖1 for every
v ∈ L1 (ν).

(e) Let U be a Riesz space (definition: 241Ed). A Riesz norm on U is a norm ‖ ‖ such that ‖u‖ ≤ ‖v‖ whenever |u| ≤ |v|. Show that if U is given its norm topology (2A4Bb) for such a norm, then (i) u ↦ |u| : U → U, (u, v) ↦ u ∨ v : U × U → U are continuous (ii) {u : u ≥ 0} is closed.
(f ) Show that any Banach lattice must be an Archimedean Riesz space (241Fa).
(g) Let (X, Σ, µ) be a probability space, and T a σ-subalgebra of Σ, Υ a σ-subalgebra of T. Let P1 : L1 (µ) → L1 (µ↾ T), P2 : L1 (µ↾ T) → L1 (µ↾ Υ) and P : L1 (µ) → L1 (µ↾ Υ) be the corresponding conditional expectation operators. Show that P = P2P1.
(h) Show that if g : R r → R is continuous and has bounded support it is bounded and attains its bounds. (Hint:
2A2F-2A2G.)
(i) Let µ be Lebesgue measure on R. (i) Take δ > 0. Show that if φδ(x) = exp(−1/(δ² − x²)) for |x| < δ, 0 for |x| ≥ δ then φδ is smooth, that is, differentiable arbitrarily often. (ii) Show that if Fδ(x) = ∫_{−∞}^x φδ dµ for x ∈ R then Fδ is smooth. (iii) Show that if a < b < c < d in R there is a smooth function h such that χ[b, c] ≤ h ≤ χ[a, d]. (iv) Write D for the space of smooth functions h : R → R such that {x : h(x) ≠ 0} is bounded. Show that {h• : h ∈ D} is dense in L1 (µ). (v) Let f be a real-valued function which is integrable over every bounded subset of R. Show that f × h is integrable for every h ∈ D, and that if ∫ f × h = 0 for every h ∈ D then f = 0 a.e. (Hint: 222D.)
(j) Let (X, Σ, µ) be a probability space, T a σ-subalgebra of Σ and P : L1 (µ) → L1 (µ↾ T) ⊆ L1 (µ) the corresponding conditional expectation operator. Show that if u, v ∈ L1 (µ) are such that P|u| × P|v| ∈ L1 (µ), then ∫ Pu × v = ∫ Pu × Pv = ∫ u × Pv.

242Y Further exercises (a) Let (X, Σ, µ) be a measure space. Let A ⊆ L1 = L1 (µ) be a non-empty downwards-directed set, and suppose that inf A = 0 in L1. (i) Show that inf_{u∈A} ‖u‖1 = 0. (Hint: set γ = inf_{u∈A} ‖u‖1; find a non-increasing sequence ⟨un⟩_{n∈N} in A such that lim_{n→∞} ‖un‖1 = γ; set v = inf_{n∈N} un and show that u ∧ v = v for every u ∈ A, so that v = 0.) (ii) Show that if U is any open set containing 0, there is a u ∈ A such that v ∈ U whenever 0 ≤ v ≤ u.
(b) Let (X, Σ, µ) be a measure space and Y any subset of X; let µY be the subspace measure on Y and T : L0 (µ) → L0 (µY) the canonical map described in 241Yg. (i) Show that Tu ∈ L1 (µY) and ‖Tu‖1 ≤ ‖u‖1 for every u ∈ L1 (µ). (ii) Show that if u ∈ L1 (µ) then ‖Tu‖1 = ‖u‖1 iff ∫_E u = ∫_{Y∩E} Tu for every E ∈ Σ. (iii) Show that T is surjective and that ‖v‖1 = min{‖u‖1 : u ∈ L1 (µ), Tu = v} for every v ∈ L1 (µY). (Hint: 214Eb.) (See also 244Yd below.)
(c) Let (X, Σ, µ) be a measure space. Write L1strict for the space of all integrable Σ-measurable functions from
X to R, and N for the subspace of L1strict consisting of measurable functions which are zero almost everywhere. (i)
Show that L1strict is a Dedekind σ-complete Riesz space. (ii) Show that L1 (µ) can be identified, as ordered linear
space, with the quotient L1strict /N as defined in 241Yb. (iii) Show that ‖ ‖1 is a seminorm on L1strict . (iv) Show that
f ↦ |f| : L1strict → L1strict is continuous if L1strict is given the topology defined from ‖ ‖1 . (v) Show that {f : f = 0 a.e.}
is closed in L1strict , but that {f : f ≥ 0} need not be.
(d) Let (X, Σ, µ) be a measure space, and µ̃ the c.l.d. version of µ (213E). Show that the inclusion L1 (µ) ⊆ L1 (µ̃)
induces an isomorphism, as ordered normed linear spaces, between L1 (µ̃) and L1 (µ).
(e) Let (X, Σ, µ) be a measure space and u0, . . . , un ∈ L1 (µ). (i) Suppose k0, . . . , kn ∈ Z are such that Σ_{i=0}^n ki = 1. Show that Σ_{i=0}^n Σ_{j=0}^n ki kj ‖ui − uj‖1 ≤ 0. (Hint: Σ_{i=0}^n Σ_{j=0}^n ki kj |αi − αj| ≤ 0 for all α0, . . . , αn ∈ R.) (ii) Suppose γ0, . . . , γn ∈ R are such that Σ_{i=0}^n γi = 0. Show that Σ_{i=0}^n Σ_{j=0}^n γi γj ‖ui − uj‖1 ≤ 0.

(f) Let (X, Σ, µ) be a measure space, and A ⊆ L1 = L1 (µ) a non-empty upwards-directed set. Suppose that A is bounded for the norm ‖ ‖1. (i) Show that there is a non-decreasing sequence ⟨un⟩_{n∈N} in A such that lim_{n→∞} ∫ un = sup_{u∈A} ∫ u, and that ⟨un⟩_{n∈N} is Cauchy. (ii) Show that w = sup A is defined in L1 and belongs to the norm-closure of A in L1, so that, in particular, ‖w‖1 ≤ sup_{u∈A} ‖u‖1.
(g) A Riesz norm (definition: 242Xe) on a Riesz space U is order-continuous if inf_{u∈A} ‖u‖ = 0 whenever A ⊆ U is a non-empty downwards-directed set with infimum 0. (Thus 242Ya tells us that the norms ‖ ‖1 are all order-continuous.) Show that in this case (i) any non-decreasing sequence in U which has an upper bound in U must be Cauchy (ii) if U is a Banach lattice, it is Dedekind complete. (Hint for (i): if ⟨un⟩_{n∈N} is a non-decreasing sequence with an upper bound in U, let B be the set of upper bounds of {un : n ∈ N} and show that A = {v − un : v ∈ B, n ∈ N} has infimum 0 because U is Archimedean.)

(h) Let (X, Σ, µ) be any measure space. Show that L1 (µ) has the countable sup property (241Ye).
(i) More generally, show that any Riesz space with an order-continuous Riesz norm has the countable sup property.
(j) Let (X, Σ, µ) and (Y, T, ν) be measure spaces and U ⊆ L0 (µ) a linear subspace. Let T : U → L0 (ν) be a linear
operator such that T u ≥ 0 in L0 (ν) whenever u ∈ U and u ≥ 0 in L0 (µ). Suppose that w ∈ U is such that w ≥ 0 and
T w = (χY )• . Show that whenever φ : R → R is a convex function and u ∈ L0 (µ) is such that w × u and w × φ̄(u) ∈ U ,
defining φ̄ : L0 (µ) → L0 (µ) as in 241I, then φ̄T (w × u) ≤ T (w × φ̄u). Explain how this result may be regarded as a
common generalization of Jensen’s inequality, as stated in 233I, and 242K above. See also 244M below.
(k)(i) A function φ : C → R is convex if φ(ab + (1 − a)c) ≤ aφ(b) + (1 − a)φ(c) for all b, c ∈ C and a ∈ [0, 1].
(ii) Show that such a function must be bounded on any bounded subset of C. (iii) If φ : C → R is convex and
c ∈ C, show that there is a b ∈ C such that φ(x) ≥ φ(c) + Re(b(x − c)) for every x ∈ C. (iv) If ⟨bc⟩_{c∈C} is such that φ(x) ≥ φc(x) = φ(c) + Re(bc(x − c)) for all x, c ∈ C, show that {bc : c ∈ I} is bounded for any bounded I ⊆ C. (v) Show that if D ⊆ C is any dense set, φ(x) = sup_{c∈D} φc(x) for every x ∈ C.

(l) Let (X, Σ, µ) be a probability space and T a σ-subalgebra of Σ. Let P : L1C (µ) → L1C (µ↾ T) be the conditional
expectation operator. Show that if φ : C → R is any convex function, and we define φ̄(f • ) = (φf )• for every f ∈ L0C (µ),
then φ̄(P u) ≤ P (φ̄(u)) whenever u ∈ L1C (µ) is such that φ̄(u) ∈ L1 (µ).

242 Notes and comments Of course L1 -spaces compose one of the most important classes of Riesz space, and
accordingly their properties have great prominence in the general theory; 242Xe, 242Xf, 242Ya and 242Yf-242Yi outline
some of the interrelations between these properties. I will return to these questions in Chapter 35 in the next volume.
I have mentioned in passing (242Dd) the additivity of the norm of L1 on the positive elements. This elementary fact
actually characterizes L1 spaces among Banach lattices; see 369E in the next volume.
Just as L0 (µ) can be regarded as a quotient of a linear space L0strict , so can L1 (µ) be regarded as a quotient of a
linear space L1strict (242Yc). I have discussed this question in the notes to §241; all I try to do here is to be consistent.
We now have a language in which we can speak of ‘the’ conditional expectation of a function f , the equivalence class
in L1 (µ↾ T) consisting precisely of all the conditional expectations of f on T. If we think of L1 (µ↾ T) as identified with
its image in L1 (µ), then the conditional expectation operator P : L1 (µ) → L1 (µ↾ T) becomes a projection (242Jh). We
therefore have re-statements of 233J-233K, as in 242K, 242L and 242Yj.
I give 242O in a fairly general form; but its importance already appears if we take X to be [0, 1] with one-dimensional Lebesgue measure. In this case, we have a natural norm on C([0, 1]), the space of all continuous real-valued functions on [0, 1], given by setting
‖f‖1 = ∫_0^1 |f(x)| dx
for every f ∈ C([0, 1]). The integral here can, of course, be taken to be the Riemann integral; we do not need the Lebesgue theory to show that ‖ ‖1 is a norm on C([0, 1]). It is easy to check that C([0, 1]) is not complete for this norm (if we set fn(x) = min(1, 2^n x^n) for x ∈ [0, 1], then ⟨fn⟩_{n∈N} is a ‖ ‖1-Cauchy sequence with no ‖ ‖1-limit in C([0, 1])). We can use the abstract theory of normed spaces to construct a completion of C([0, 1]); but it is much more satisfactory if this completion can be given a relatively concrete form, and this is what the identification of L1 with the completion of C([0, 1]) can do. (Note that the remark that ‖ ‖1 is a norm on C([0, 1]), that is, that ‖f‖1 ≠ 0 for every non-zero f ∈ C([0, 1]), means just that the map f ↦ f• : C([0, 1]) → L1 is injective, so that C([0, 1]) can be identified, as ordered normed space, with its image in L1.) It would be even better if we could find a realization of the completion of C([0, 1]) as a space of functions on some set Z, rather than as a space of equivalence classes of functions on [0, 1]. Unfortunately this is not practical; such realizations do exist, but necessarily involve either a thoroughly unfamiliar base set Z, or an intolerably arbitrary embedding map from C([0, 1]) into R^Z.
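
The sequence fn(x) = min(1, 2^n x^n) mentioned above can be examined numerically. For m ≤ n the two functions differ only on [0, ½], where fn(x) = (2x)^n, so ‖fm − fn‖1 = 1/(2(m + 1)) − 1/(2(n + 1)), which tends to 0; pointwise the sequence converges to the discontinuous function χ[½, 1] except at ½. The following sketch compares a midpoint-rule approximation with this closed form; the helper names and the number of steps are arbitrary choices for illustration.

```python
def f(n, x):
    """f_n(x) = min(1, 2^n x^n) on [0, 1]."""
    return min(1.0, (2.0 * x) ** n)

def l1_dist(m, n, steps=20_000):
    """Midpoint-rule approximation to the integral of |f_m - f_n| over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += abs(f(m, x) - f(n, x))
    return total * h

def closed_form(m, n):
    """Integral over [0, 1/2] of |(2x)^m - (2x)^n|."""
    return abs(1.0 / (2 * (m + 1)) - 1.0 / (2 * (n + 1)))

for m, n in [(1, 2), (5, 10), (50, 100)]:
    print(m, n, l1_dist(m, n), closed_form(m, n))
```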
You can get an idea of the obstacle to realizing the completion of C([0, 1]) as a space of functions on [0, 1] itself by considering fn(x) = (1/n)x^n for n ≥ 1. An easy calculation shows that Σ_{n=1}^∞ ‖fn‖1 < ∞, so that Σ_{n=1}^∞ fn must exist in the completion of C([0, 1]); but there is no natural value to assign to it at the point 1. Adaptations of this idea can give rise to indefinitely complicated phenomena – indeed, 242O shows that every integrable function is associated with some appropriate sequence from C([0, 1]). In §245 I shall have more to say about what ‖ ‖1-convergent sequences look like.
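
The ‘easy calculation’ for fn(x) = (1/n)x^n is that ‖fn‖1 = 1/(n(n + 1)), so that the norms telescope and sum to 1, while the partial sums evaluated at the point 1 are the harmonic numbers. A brief sketch, with hypothetical helper names, makes both facts concrete.

```python
from fractions import Fraction

def norm1(n):
    """Exact value of the integral of x**n / n over [0, 1]."""
    return Fraction(1, n * (n + 1))

def partial_norm_sum(N):
    """Sum of the norms of f_1, ..., f_N; telescopes to N / (N + 1)."""
    return sum(norm1(n) for n in range(1, N + 1))

def partial_sum_at(x, N):
    """Value at x of f_1 + ... + f_N."""
    return sum(x ** n / n for n in range(1, N + 1))

N = 1000
print(partial_norm_sum(N))        # bounded above by 1
print(partial_sum_at(0.5, N))     # partial sum of the series for -log(1 - x) at x = 1/2
print(partial_sum_at(1.0, N))     # harmonic number H_N, unbounded as N grows
```

For x < 1 the pointwise sum is −log(1 − x), which is integrable over [0, 1] but unbounded near 1, in line with the remark that no natural value can be assigned at that point.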
From the point of view of measure theory, narrowly conceived, most of the interesting ideas appear most clearly with
real functions and real linear spaces. But some of the most important applications of measure theory – important not
only as mathematics in general, but also for the measure-theoretic questions they inspire – deal with complex functions
and complex linear spaces. I therefore continue to offer sketches of the complex theory, as in 242P. I note that at
irregular intervals we need ideas not already spelt out in the real theory, as in 242Pb and 242Yl.

243 L∞
The second of the classical Banach spaces of measure theory which I treat is the space L∞. As will appear below, L∞ is the polar companion of L1, the linked opposite; for ‘ordinary’ measure spaces it is actually the dual of L1 (243F-243G).

243A Definitions Let (X, Σ, µ) be any measure space. Let L∞ = L∞ (µ) be the set of functions f ∈ L0 = L0 (µ)
which are essentially bounded, that is, such that there is some M ≥ 0 such that {x : x ∈ dom f, |f (x)| ≤ M } is
conegligible, and write
L∞ = L∞ (µ) = {f • : f ∈ L∞ (µ)} ⊆ L0 (µ).
Note that if f ∈ L∞ , g ∈ L0 and g =a.e. f , then g ∈ L∞ ; thus L∞ = {f : f ∈ L0 , f • ∈ L∞ }.

243B Theorem Let (X, Σ, µ) be any measure space. Then


(a) L∞ = L∞ (µ) is a linear subspace of L0 = L0 (µ).
(b) If u ∈ L∞ , v ∈ L0 and |v| ≤ |u| then v ∈ L∞ . Consequently |u|, u ∨ v, u ∧ v, u+ = u ∨ 0 and u− = (−u) ∨ 0
belong to L∞ for all u, v ∈ L∞ .
(c) Writing e = χX • , the equivalence class in L0 of the constant function with value 1, then an element u of L0
belongs to L∞ iff there is an M ≥ 0 such that |u| ≤ M e.
(d) If u, v ∈ L∞ then u × v ∈ L∞ .
(e) If u ∈ L∞ and v ∈ L1 = L1 (µ) then u × v ∈ L1 .
proof (a) If f, g ∈ L∞ = L∞ (µ) and c ∈ R, then f + g, cf ∈ L∞. P P We have M1, M2 ≥ 0 such that |f| ≤ M1 a.e. and |g| ≤ M2 a.e. Now
|f + g| ≤ |f| + |g| ≤ M1 + M2 a.e., |cf| ≤ |c||M1| a.e.,
so f + g, cf ∈ L∞. Q Q It follows at once that u + v, cu ∈ L∞ whenever u, v ∈ L∞ and c ∈ R.
(b)(i) Take f ∈ L∞ , g ∈ L0 = L0 (µ) such that u = f • and v = g • . Then |g| ≤a.e. |f |. Let M ≥ 0 be such that
|f | ≤ M a.e.; then |g| ≤ M a.e., so g ∈ L∞ and v ∈ L∞ .
(ii) Now | |u| | = |u| so |u| ∈ L∞ whenever u ∈ L∞. Also u ∨ v = ½(u + v + |u − v|), u ∧ v = ½(u + v − |u − v|)
belong to L∞ for all u, v ∈ L∞ .
(c)(i) If u ∈ L∞ , take f ∈ L∞ such that f • = u. Then there is an M ≥ 0 such that |f | ≤ M a.e., so that
|f | ≤a.e. M χX and |u| ≤ M e. (ii) Of course χX ∈ L∞ , so e ∈ L∞ , and if u ∈ L0 and |u| ≤ M e then u ∈ L∞ by (b).
(d) f × g ∈ L∞ whenever f, g ∈ L∞. P P If |f| ≤ M1 a.e. and |g| ≤ M2 a.e., then
|f × g| = |f| × |g| ≤ M1M2 a.e. Q Q
So u × v ∈ L∞ for all u, v ∈ L∞.

(e) If f ∈ L∞ and g ∈ L1 = L1 (µ), then there is an M ≥ 0 such that |f| ≤ M a.e., so |f × g| ≤a.e. M|g|; because M|g| is integrable and f × g is virtually measurable, f × g is integrable. So u × v ∈ L1 whenever u ∈ L∞ and v ∈ L1.

243C The order structure of L∞ Let (X, Σ, µ) be any measure space. Then L∞ = L∞ (µ), being a linear
subspace of L0 = L0 (µ), inherits a partial order which renders it a partially ordered linear space (compare 242Ca).
Because |u| ∈ L∞ whenever u ∈ L∞ (243Bb), u ∧ v and u ∨ v belong to L∞ whenever u, v ∈ L∞ , and L∞ is a Riesz
space (compare 242Cd).
The behaviour of L∞ as a Riesz space is dominated by the fact that it has an order unit e with the property that
for every u ∈ L∞ there is an M ≥ 0 such that |u| ≤ M e
(243Bc).

243D The norm of L∞ Let (X, Σ, µ) be any measure space.


(a) For f ∈ L∞ = L∞ (µ), say that the essential supremum of |f| is
ess sup |f| = inf{M : M ≥ 0, {x : x ∈ dom f, |f(x)| ≤ M} is conegligible}.
Then |f| ≤ ess sup |f| a.e. P P Set M = ess sup |f|. For each n ∈ N, there is an Mn ≤ M + 2^{−n} such that |f| ≤ Mn a.e. Now
{x : |f(x)| ≤ M} = ⋂_{n∈N} {x : |f(x)| ≤ Mn}
is conegligible, so |f| ≤ M a.e. Q Q

(b) If f, g ∈ L∞ and f =a.e. g, then ess sup |f| = ess sup |g|. Accordingly we may define a functional ‖ ‖∞ on L∞ = L∞ (µ) by setting ‖u‖∞ = ess sup |f| whenever u = f•.

(c) From (a), we see that, for any u ∈ L∞, ‖u‖∞ = min{γ : |u| ≤ γe}, where, as before, e = χX• ∈ L∞. Consequently ‖ ‖∞ is a norm on L∞. P P (i) If u, v ∈ L∞ then
|u + v| ≤ |u| + |v| ≤ (‖u‖∞ + ‖v‖∞)e
so ‖u + v‖∞ ≤ ‖u‖∞ + ‖v‖∞. (ii) If u ∈ L∞ and c ∈ R then
|cu| = |c||u| ≤ |c|‖u‖∞e,
so ‖cu‖∞ ≤ |c|‖u‖∞. (iii) If ‖u‖∞ = 0, there is an f ∈ L∞ such that f• = u and |f| ≤ ‖u‖∞ a.e.; now f = 0 a.e. so u = 0. Q Q

(d) Note also that if u ∈ L0, v ∈ L∞ and |u| ≤ |v| then |u| ≤ ‖v‖∞e so u ∈ L∞ and ‖u‖∞ ≤ ‖v‖∞; similarly,
‖u × v‖∞ ≤ ‖u‖∞‖v‖∞, ‖u ∨ v‖∞ ≤ max(‖u‖∞, ‖v‖∞)
for all u, v ∈ L∞. Thus L∞ is a commutative Banach algebra (2A4J).

(e) Moreover,
|∫ u × v| ≤ ∫ |u × v| = ‖u × v‖1 ≤ ‖u‖1‖v‖∞
whenever u ∈ L1 and v ∈ L∞, because
|u × v| = |u| × |v| ≤ |u| × ‖v‖∞e = ‖v‖∞|u|.

(f) Observe that if u, v are non-negative members of L∞ then
‖u ∨ v‖∞ = max(‖u‖∞, ‖v‖∞);
this is because, for any γ ≥ 0,
u ∨ v ≤ γe ⇐⇒ u ≤ γe and v ≤ γe.
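
The essential supremum and the inequalities of (e) and (f) are easy to test on a small example. The sketch below assumes a hypothetical four-point measure space in which one point is negligible; the names mu, norm1 and norm_inf are illustrative. It shows that values taken on the negligible point do not affect ‖ ‖∞.

```python
from fractions import Fraction

# Hypothetical finite measure space: point x has mass mu[x]; point 2 is negligible.
mu = [Fraction(1, 2), Fraction(3), Fraction(0), Fraction(1, 4)]

def integral(f):
    """Integral of f with respect to mu."""
    return sum(f[x] * mu[x] for x in range(len(mu)))

def norm1(f):
    """The L1 norm: integral of |f|."""
    return integral([abs(t) for t in f])

def norm_inf(f):
    """Essential supremum of |f|: points of zero mass are ignored."""
    return max(abs(f[x]) for x in range(len(mu)) if mu[x] > 0)

u = [Fraction(t) for t in (4, -1, 7, 2)]
v = [Fraction(t) for t in (-2, 3, 100, 1)]   # the value 100 sits on the negligible point

uv = [a * b for a, b in zip(u, v)]
# (e): |integral of u x v| <= ||u x v||_1 <= ||u||_1 ||v||_inf
holder = abs(integral(uv)) <= norm1(uv) <= norm1(u) * norm_inf(v)
# (f), applied to |u| and |v|: the norm of the supremum is the larger norm
sup_uv = [max(abs(a), abs(b)) for a, b in zip(u, v)]
lattice = norm_inf(sup_uv) == max(norm_inf(u), norm_inf(v))
print(norm_inf(v), holder, lattice)
```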

243E Theorem For any measure space (X, Σ, µ), L∞ = L∞ (µ) is a Banach lattice under ‖ ‖∞.
proof (a) We already know that ‖u‖∞ ≤ ‖v‖∞ whenever |u| ≤ |v| (243Dd); so we have just to check that L∞ is complete under ‖ ‖∞. Let ⟨un⟩_{n∈N} be a Cauchy sequence in L∞. For each n ∈ N choose fn ∈ L∞ = L∞ (µ) such that fn• = un in L∞. For all m, n ∈ N, (fm − fn)• = um − un. Consequently
Emn = {x : |fm(x) − fn(x)| > ‖um − un‖∞}
is negligible, by 243Da. This means that
E = ⋂_{n∈N} {x : x ∈ dom fn, |fn(x)| ≤ ‖un‖∞} \ ⋃_{m,n∈N} Emn
is conegligible. But for every x ∈ E, |fm(x) − fn(x)| ≤ ‖um − un‖∞ for all m, n ∈ N, so that ⟨fn(x)⟩_{n∈N} is a Cauchy sequence, with a limit in R. Thus f = lim_{n→∞} fn is defined almost everywhere. Also, at least for x ∈ E,
|f(x)| ≤ sup_{n∈N} ‖un‖∞ < ∞,
so f ∈ L∞ and u = f• ∈ L∞. If m ∈ N, then, for every x ∈ E,
|f(x) − fm(x)| ≤ sup_{n≥m} |fn(x) − fm(x)| ≤ sup_{n≥m} ‖un − um‖∞,
so
‖u − um‖∞ ≤ sup_{n≥m} ‖un − um‖∞ → 0
as m → ∞, and u = lim_{m→∞} um in L∞. As ⟨un⟩_{n∈N} is arbitrary, L∞ is complete.

243F The duality between L∞ and L1 Let (X, Σ, µ) be any measure space.
(a) I have already remarked that if u ∈ L1 = L1 (µ) and v ∈ L∞ = L∞ (µ), then u × v ∈ L1 and |∫ u × v| ≤ ‖u‖1‖v‖∞ (243Be, 243De).

(b) Consequently we have a bounded linear operator T from L∞ to the normed space dual (L1)∗ of L1, given by writing
(Tv)(u) = ∫ u × v for all u ∈ L1, v ∈ L∞.
P P (i) By (a), (Tv)(u) is well-defined for u ∈ L1, v ∈ L∞. (ii) If v ∈ L∞, u, u1, u2 ∈ L1 and c ∈ R, then
(Tv)(u1 + u2) = ∫ (u1 + u2) × v = ∫ ((u1 × v) + (u2 × v)) = ∫ u1 × v + ∫ u2 × v = (Tv)(u1) + (Tv)(u2),
(Tv)(cu) = ∫ cu × v = ∫ c(u × v) = c ∫ u × v = c(Tv)(u).
This shows that Tv : L1 → R is a linear functional for each v ∈ L∞. (iii) Next, for any u ∈ L1 and v ∈ L∞,
|(Tv)(u)| = |∫ u × v| ≤ ‖u × v‖1 ≤ ‖u‖1‖v‖∞,
as remarked in (a). This means that Tv ∈ (L1)∗ and ‖Tv‖ ≤ ‖v‖∞ for every v ∈ L∞. (iv) If v, v1, v2 ∈ L∞, u ∈ L1 and c ∈ R, then
T(v1 + v2)(u) = ∫ (v1 + v2) × u = ∫ ((v1 × u) + (v2 × u)) = ∫ v1 × u + ∫ v2 × u = (Tv1)(u) + (Tv2)(u) = (Tv1 + Tv2)(u),
T(cv)(u) = ∫ cv × u = c ∫ v × u = c(Tv)(u) = (cTv)(u).
As u is arbitrary, T(v1 + v2) = Tv1 + Tv2 and T(cv) = c(Tv); thus T : L∞ → (L1)∗ is linear. (v) Recalling from (iii) that ‖Tv‖ ≤ ‖v‖∞ for every v ∈ L∞, we see that ‖T‖ ≤ 1. Q Q

(c) Exactly the same arguments show that we have a linear operator T′ : L1 → (L∞)∗, given by writing
(T′u)(v) = ∫ u × v for all u ∈ L1, v ∈ L∞,
and that ‖T′‖ also is at most 1.

243G Theorem Let (X, Σ, µ) be a measure space, and T : L∞ (µ) → (L1 (µ))∗ the canonical map described in
243F. Then
(a) T is injective iff (X, Σ, µ) is semi-finite, and in this case is norm-preserving;
(b) T is bijective iff (X, Σ, µ) is localizable, and in this case is a normed space isomorphism.
proof (a)(i) Suppose that T is injective, and that E ∈ Σ has µE = ∞. Then χE is not equal almost everywhere to 0, so (χE)• ≠ 0 in L∞, and T(χE)• ≠ 0; let u ∈ L1 be such that T(χE)•(u) ≠ 0, that is, ∫ u × (χE)• ≠ 0. Express u as f• where f is integrable; then ∫_E f ≠ 0 so ∫_E |f| ≠ 0. Let g be a simple function such that 0 ≤ g ≤a.e. |f| and ∫ g > ∫ |f| − ∫_E |f|; then ∫_E g ≠ 0. Express g as Σ_{i=0}^n ai χEi where µEi < ∞ for each i; then 0 ≠ Σ_{i=0}^n ai µ(Ei ∩ E), so there is an i ≤ n such that µ(E ∩ Ei) ≠ 0, and now E ∩ Ei is a measurable subset of E of non-zero finite measure. As E is arbitrary, this shows that (X, Σ, µ) must be semi-finite if T is injective.
(ii) Now suppose that (X, Σ, µ) is semi-finite, and that v ∈ L∞ is non-zero. Express v as g• where g : X → R is measurable; then g ∈ L∞. Take any a ∈ ]0, ‖v‖∞[; then E = {x : |g(x)| ≥ a} has non-zero measure. Let F ⊆ E be a measurable set of non-zero finite measure, and set f(x) = |g(x)|/g(x) if x ∈ F, 0 otherwise; then f ∈ L1 and (f × g)(x) ≥ a for x ∈ F, so, setting u = f• ∈ L1, we have
(Tv)(u) = ∫ u × v = ∫ f × g ≥ aµF = a ∫ |f| = a‖u‖1 > 0.
This shows that ‖Tv‖ ≥ a; as a is arbitrary, ‖Tv‖ ≥ ‖v‖∞. We know already from 243F that ‖Tv‖ ≤ ‖v‖∞, so ‖Tv‖ = ‖v‖∞ for every non-zero v ∈ L∞; the same is surely true for v = 0, so T is norm-preserving and injective.
(b)(i) Using (a) and the definition of ‘localizable’, we see that under either of the conditions proposed (X, Σ, µ) is
semi-finite and T is injective and norm-preserving. I therefore have to show just that it is surjective iff (X, Σ, µ) is
localizable.

(ii) Suppose that T is surjective and that E ⊆ Σ. Let F be the family of finite unions of members of E, counting
∅ as the union of no members of E, so that F is closed under finite unions and, for any G ∈ Σ, E \ G is negligible for
every E ∈ E iff E \ G is negligible for every E ∈ F.
If u ∈ L1, then h(u) = lim_{E∈F, E↑} ∫_E u exists in R. P P If u is non-negative, then
h(u) = sup{∫_E u : E ∈ F} ≤ ∫ u < ∞.
For other u, we can express u as u1 − u2, where u1 and u2 are non-negative, and now h(u) = h(u1) − h(u2). Q Q Evidently h : L1 → R is linear, being a limit of the linear functionals u ↦ ∫_E u, and also
|h(u)| ≤ sup_{E∈F} |∫_E u| ≤ ∫ |u|
for every u, so h ∈ (L1)∗. Since we are supposing that T is surjective, there is a v ∈ L∞ such that Tv = h. Express v as g• where g : X → R is measurable and essentially bounded. Set G = {x : g(x) > 0} ∈ Σ.
If F ∈ Σ and µF < ∞, then
∫_F g = ∫ (χF)• × g• = (Tv)(χF)• = h(χF)• = sup_{E∈F} µ(E ∩ F).
?? If E ∈ E and E \ G is not negligible, then there is a set F ⊆ E \ G such that 0 < µF < ∞; now
µF = µ(E ∩ F) ≤ ∫_F g ≤ 0,
as g(x) ≤ 0 for x ∈ F. X X Thus E \ G is negligible for every E ∈ E.
Let H ∈ Σ be such that E \ H is negligible for every E ∈ E. ?? If G \ H is not negligible, there is a set F ⊆ G \ H
of non-zero finite measure. Now
µ(E ∩ F ) ≤ µ(H ∩ F ) = 0
R
for every E ∈ E, so µ(E ∩ F ) = 0 for every E ∈ F, and F g = 0; but g(x) > 0 for every x ∈ F , so µF = 0, which is
X Thus G \ H is negligible.
impossible. X
Accordingly G is an essential supremum of E in Σ. As E is arbitrary, (X, Σ, µ) is localizable.
(iii) For the rest of this proof, I will suppose that (X, Σ, µ) is localizable and seek to show that T is surjective.
Take $h\in(L^1)^*$ such that $\|h\|=1$. Write $\Sigma^f=\{F:F\in\Sigma,\mu F<\infty\}$, and for $F\in\Sigma^f$ define $\nu_F:\Sigma\to\mathbb{R}$ by setting
$$\nu_F E=h(\chi(E\cap F)^\bullet)$$
for every $E\in\Sigma$. Then $\nu_F\emptyset=h(0)=0$, and if $E$, $E'\in\Sigma$ are disjoint
$$\nu_F E+\nu_F E'=h(\chi(E\cap F)^\bullet)+h(\chi(E'\cap F)^\bullet)=h((\chi(E\cap F)+\chi(E'\cap F))^\bullet)=h(\chi((E\cup E')\cap F)^\bullet)=\nu_F(E\cup E').$$
Thus $\nu_F$ is additive. Also
$$|\nu_F E|\le\|\chi(E\cap F)^\bullet\|_1=\mu(E\cap F)$$
for every $E\in\Sigma$, so $\nu_F$ is truly continuous in the sense of 232Ab. By the Radon-Nikodým theorem (232E), there is an integrable function $g_F$ such that $\int_E g_F=\nu_F E$ for every $E\in\Sigma$; we may take it that $g_F$ is measurable and has domain $X$ (232He).
(iv) It is worth noting that $|g_F|\le 1$ a.e. $\mathbf{P}$ If $G=\{x:g_F(x)>1\}$, then
$$\int_G g_F=\nu_F G\le\mu(F\cap G)\le\mu G;$$
but this is possible only if $\mu G=0$. Similarly, if $G'=\{x:g_F(x)<-1\}$, then
$$\int_{G'}g_F=\nu_F G'\ge-\mu G',$$
so again $\mu G'=0$. $\mathbf{Q}$
(v) If $F$, $F'\in\Sigma^f$, then $g_F=g_{F'}$ almost everywhere in $F\cap F'$. $\mathbf{P}$ If $E\in\Sigma$ and $E\subseteq F\cap F'$, then
$$\int_E g_F=h(\chi(E\cap F)^\bullet)=h(\chi(E\cap F')^\bullet)=\int_E g_{F'}.$$
So 131Hb gives the result. $\mathbf{Q}$ 213N (applied to $\{g_F\upharpoonright F:F\in\Sigma^f\}$) now tells us that, because $\mu$ is localizable, there is a measurable function $g:X\to\mathbb{R}$ such that $g=g_F$ almost everywhere in $F$, for every $F\in\Sigma^f$.
(vi) For any $F\in\Sigma^f$, the set
$$\{x:x\in F,\ |g(x)|>1\}\subseteq\{x:|g_F(x)|>1\}\cup\{x:x\in F,\ g(x)\ne g_F(x)\}$$
is negligible; because $\mu$ is semi-finite, $\{x:|g(x)|>1\}$ is negligible, and $g\in\mathcal{L}^\infty$, with $\operatorname{ess\,sup}|g|\le 1$. Accordingly $v=g^\bullet\in L^\infty$, and we may speak of $Tv\in(L^1)^*$.
(vii) If $F\in\Sigma^f$, then
$$\int_F g=\int_F g_F=\nu_F X=h(\chi F^\bullet).$$
It follows at once that
$$(Tv)(f^\bullet)=\int f\times g=h(f^\bullet)$$
for every simple function $f:X\to\mathbb{R}$. Consequently $Tv=h$, because both $Tv$ and $h$ are continuous and the equivalence classes of simple functions form a dense subset of $L^1$ (242Mb, 2A3Uc). Thus $h=Tv$ is a value of $T$.
(viii) The argument as written above has assumed that $\|h\|=1$. But of course any non-zero member of $(L^1)^*$ is a scalar multiple of an element of norm 1, so is a value of $T$. So $T:L^\infty\to(L^1)^*$ is indeed surjective, and is therefore an isometric isomorphism, as claimed.
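For a concrete instance of this identification, one can take counting measure on a finite set, where $L^1$ and $L^\infty$ are both $\mathbb{R}^n$ carrying different norms. The following Python sketch is an illustration only, and its function names are hypothetical. It computes $(Tv)(u)$ as a finite sum and produces a norming $u$ in the manner of part (a-ii) of the proof.

```python
# Illustrative sketch: counting measure on {0, ..., n-1}, so that
# L^1 and L^infty are both R^n with different norms.

def norm1(u):
    """||u||_1 for counting measure: the sum of absolute values."""
    return sum(abs(x) for x in u)

def norm_inf(v):
    """||v||_infty for counting measure: the largest absolute value."""
    return max(abs(x) for x in v)

def pairing(u, v):
    """(Tv)(u) = integral of u x v, here a finite sum."""
    return sum(x * y for x, y in zip(u, v))

def norming_vector(v):
    """A u with ||u||_1 = 1 and (Tv)(u) = ||v||_infty, for non-zero v.

    Put unit mass, with the sign of v, at a point where |v| is largest.
    """
    j = max(range(len(v)), key=lambda i: abs(v[i]))
    u = [0.0] * len(v)
    u[j] = 1.0 if v[j] > 0 else -1.0
    return u

v = [0.5, -3.0, 2.0]
u = norming_vector(v)
```

Here `pairing(u, v)` equals `norm_inf(v)`, while for any other `w` the value `abs(pairing(w, v))` is at most `norm_inf(v) * norm1(w)`, which is the isometry $\|Tv\|=\|v\|_\infty$ in this special case.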

243H Recall that L0 is always Dedekind σ-complete and sometimes Dedekind complete (241G), while L1 is always
Dedekind complete (242H). In this respect L∞ follows L0 .
Theorem Let (X, Σ, µ) be a measure space.
(a) L∞ (µ) is Dedekind σ-complete.
(b) If µ is localizable, L∞ (µ) is Dedekind complete.
proof These are both consequences of 241G. If $A\subseteq L^\infty=L^\infty(\mu)$ is bounded above in $L^\infty$, fix $u_0\in A$ and an upper bound $w_0$ of $A$ in $L^\infty$. If $B$ is the set of upper bounds for $A$ in $L^0=L^0(\mu)$, then $B\cap L^\infty$ is the set of upper bounds for $A$ in $L^\infty$. Moreover, if $B$ has a least member $v_0$, then we must have $u_0\le v_0\le w_0$, so that
$$0\le v_0-u_0\le w_0-u_0\in L^\infty$$
and $v_0-u_0$, $v_0$ belong to $L^\infty$. (Compare part (a) of the proof of 242H.) Thus $v_0=\sup A$ in $L^\infty$.
Now we know that $L^0$ is Dedekind σ-complete; if $A\subseteq L^\infty$ is a non-empty countable set which is bounded above in $L^\infty$, it is surely bounded above in $L^0$, so has a supremum in $L^0$ which is also its supremum in $L^\infty$. As $A$ is arbitrary, $L^\infty$ is Dedekind σ-complete. While if $\mu$ is localizable, we can argue in the same way with arbitrary non-empty subsets of $L^\infty$ to see that $L^\infty$ is Dedekind complete because $L^0$ is.

243I A dense subspace of L∞ In 242M and 242O I described a couple of important dense linear subspaces of L1
spaces. The position concerning L∞ is a little different. However I can describe one important dense subspace.
Proposition Let (X, Σ, µ) be a measure space.
(a) Write $\mathcal{S}$ for the space of ‘Σ-simple’ functions on $X$, that is, the space of functions from $X$ to $\mathbb{R}$ expressible as $\sum_{k=0}^n a_k\chi E_k$ where $a_k\in\mathbb{R}$ and $E_k\in\Sigma$ for every $k\le n$. Then for every $f\in\mathcal{L}^\infty=\mathcal{L}^\infty(\mu)$ and every $\epsilon>0$, there is a $g\in\mathcal{S}$ such that $\operatorname{ess\,sup}|f-g|\le\epsilon$.
(b) $S=\{f^\bullet:f\in\mathcal{S}\}$ is a $\|\ \|_\infty$-dense linear subspace of $L^\infty=L^\infty(\mu)$.
(c) If $(X,\Sigma,\mu)$ is totally finite, then $\mathcal{S}$ is the space of $\mu$-simple functions, so $S$ becomes just the space of equivalence classes of simple functions, as in 242Mb.
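proof (a) Let $\tilde f:X\to\mathbb{R}$ be a bounded measurable function such that $f=_{\text{a.e.}}\tilde f$. Let $n\in\mathbb{N}$ be such that $|\tilde f(x)|\le n\epsilon$ for every $x\in X$. For $-n\le k\le n$ set
$$E_k=\{x:k\epsilon\le\tilde f(x)<(k+1)\epsilon\}.$$
Set
$$g=\sum_{k=-n}^n k\epsilon\chi E_k\in\mathcal{S};$$
then $0\le\tilde f(x)-g(x)\le\epsilon$ for every $x\in X$, so
$$\operatorname{ess\,sup}|f-g|=\operatorname{ess\,sup}|\tilde f-g|\le\epsilon.$$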

(b) This follows immediately, as in 242Mb.


(c) also is elementary.
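The construction in part (a) of the proof amounts to rounding each value of $\tilde f$ down to the nearest multiple of $\epsilon$. A minimal Python sketch, assuming a function given by finitely many sample values (hypothetical data) and using exact rational arithmetic so that the inequalities are not blurred by rounding:

```python
from fractions import Fraction
from math import floor

def simple_approximation(values, eps):
    """Round each value of f down to a multiple of eps.

    This is g = sum_k k*eps*chi(E_k), where
    E_k = {x : k*eps <= f(x) < (k+1)*eps}, evaluated pointwise.
    """
    return [floor(y / eps) * eps for y in values]

# Values of a bounded function at a few sample points (hypothetical data).
f_values = [Fraction(-7, 3), Fraction(0), Fraction(5, 4), Fraction(2)]
eps = Fraction(1, 2)
g_values = simple_approximation(f_values, eps)
```

Each difference `f_values[i] - g_values[i]` lies in $[0,\epsilon[$, matching the estimate $0\le\tilde f(x)-g(x)\le\epsilon$ used in the proof.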
156 Function spaces 243J

243J Conditional expectations Conditional expectations are so important that it is worth considering their
interaction with every new concept.
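(a) If $(X,\Sigma,\mu)$ is any measure space, and $\mathrm{T}$ is a σ-subalgebra of $\Sigma$, then the canonical embedding $S:L^0(\mu\upharpoonright\mathrm{T})\to L^0(\mu)$ (242Ja) embeds $L^\infty(\mu\upharpoonright\mathrm{T})$ as a subspace of $L^\infty(\mu)$, and $\|Su\|_\infty=\|u\|_\infty$ for every $u\in L^\infty(\mu\upharpoonright\mathrm{T})$. As in 242Jb, we can identify $L^\infty(\mu\upharpoonright\mathrm{T})$ with its image in $L^\infty(\mu)$.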

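(b) Now suppose that $\mu X=1$, and let $P:L^1(\mu)\to L^1(\mu\upharpoonright\mathrm{T})$ be the conditional expectation operator (242Jd). Then $L^\infty(\mu)$ is actually a linear subspace of $L^1(\mu)$. Setting $e=\chi X^\bullet\in L^\infty(\mu)$, we see that $\int_F e=(\mu\upharpoonright\mathrm{T})(F)$ for every $F\in\mathrm{T}$, so
$$Pe=\chi X^\bullet\in L^\infty(\mu\upharpoonright\mathrm{T}).$$
If $u\in L^\infty(\mu)$, then setting $M=\|u\|_\infty$ we have $-Me\le u\le Me$, so $-MPe\le Pu\le MPe$, because $P$ is order-preserving (242Je); accordingly $\|Pu\|_\infty\le M=\|u\|_\infty$. Thus $P\upharpoonright L^\infty(\mu):L^\infty(\mu)\to L^\infty(\mu\upharpoonright\mathrm{T})$ is an operator of norm 1.
If $u\in L^\infty(\mu\upharpoonright\mathrm{T})$, then $Pu=u$; so $P[L^\infty]$ is the whole of $L^\infty(\mu\upharpoonright\mathrm{T})$.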
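On a finite probability space in which $\mathrm{T}$ is generated by a partition, the conditional expectation is simply the weighted average over each block, and the two facts above can be checked by hand. The sketch below is an illustration under that assumption; the names and the data are hypothetical.

```python
from fractions import Fraction

def conditional_expectation(u, mu, blocks):
    """P u for the sigma-subalgebra generated by the partition `blocks`.

    u and mu are lists indexed by the points of X; mu is a probability
    measure giving every block non-zero mass. On each block, P u is the
    mu-weighted average of u.
    """
    pu = [None] * len(u)
    for block in blocks:
        mass = sum(mu[i] for i in block)
        avg = sum(u[i] * mu[i] for i in block) / mass
        for i in block:
            pu[i] = avg
    return pu

def norm_inf(u, mu):
    """Essential supremum of |u|: points of measure zero are ignored."""
    return max(abs(x) for x, m in zip(u, mu) if m > 0)

mu = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 3), Fraction(1, 6)]
blocks = [[0, 1], [2, 3]]
u = [Fraction(3), Fraction(-1), Fraction(2), Fraction(-4)]
pu = conditional_expectation(u, mu, blocks)
```

In this example `norm_inf(pu, mu)` does not exceed `norm_inf(u, mu)`, and applying `conditional_expectation` to `pu` returns `pu` again, in line with $\|Pu\|_\infty\le\|u\|_\infty$ and $Pu=u$ for $u\in L^\infty(\mu\upharpoonright\mathrm{T})$.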

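243K Complex $L^\infty$ All the ideas needed to adapt the work above to complex $L^\infty$ spaces have already appeared in 241J and 242P. Let $\mathcal{L}^\infty_{\mathbb{C}}$ be
$$\{f:f\in\mathcal{L}^0_{\mathbb{C}},\ \operatorname{ess\,sup}|f|<\infty\}=\{f:\mathrm{Re}(f)\in\mathcal{L}^\infty,\ \mathrm{Im}(f)\in\mathcal{L}^\infty\}.$$
Then
$$L^\infty_{\mathbb{C}}=\{f^\bullet:f\in\mathcal{L}^\infty_{\mathbb{C}}\}=\{u:u\in L^0_{\mathbb{C}},\ \mathrm{Re}(u)\in L^\infty,\ \mathrm{Im}(u)\in L^\infty\}.$$
Setting
$$\|u\|_\infty=\||u|\|_\infty=\operatorname{ess\,sup}|f|\text{ whenever }f^\bullet=u,$$
we have a norm on $L^\infty_{\mathbb{C}}$ rendering it a Banach space. We still have $u\times v\in L^\infty_{\mathbb{C}}$ and $\|u\times v\|_\infty\le\|u\|_\infty\|v\|_\infty$ for all $u$, $v\in L^\infty_{\mathbb{C}}$.
We now have a duality between $L^1_{\mathbb{C}}$ and $L^\infty_{\mathbb{C}}$ giving rise to a linear operator $T:L^\infty_{\mathbb{C}}\to(L^1_{\mathbb{C}})^*$ of norm at most 1, defined by the formula
$$(Tv)(u)=\int u\times v\text{ for every }u\in L^1_{\mathbb{C}},\ v\in L^\infty_{\mathbb{C}}.$$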
T is injective iff the underlying measure space is semi-finite, and is a bijection iff the underlying measure space is
localizable. (This can of course be proved by re-working the arguments of 243G; but it is perhaps easier to note that
T (Re(v)) = Re(T v), T (Im(v)) = Im(T v) for every v, so that the result for complex spaces can be deduced from the
result for real spaces.) To check that T is norm-preserving when it is injective, the quickest route seems to be to imitate
the argument of (a-ii) of the proof of 243G.

243X Basic exercises (a) Let $(X,\Sigma,\mu)$ be any measure space, and $\hat\mu$ the completion of $\mu$ (212C, 241Xb). Show that $\mathcal{L}^\infty(\hat\mu)=\mathcal{L}^\infty(\mu)$ and $L^\infty(\hat\mu)=L^\infty(\mu)$.

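> (b) Let $(X,\Sigma,\mu)$ be a non-empty measure space. Write $\mathcal{L}^\infty_{\text{strict}}$ for the space of bounded Σ-measurable real-valued functions with domain $X$. (i) Show that $L^\infty(\mu)=\{f^\bullet:f\in\mathcal{L}^\infty_{\text{strict}}\}\subseteq L^0=L^0(\mu)$. (ii) Show that $\mathcal{L}^\infty_{\text{strict}}$ is a Dedekind σ-complete Banach lattice if we give it the norm
$$\|f\|_\infty=\sup_{x\in X}|f(x)|\text{ for every }f\in\mathcal{L}^\infty_{\text{strict}}.$$
(iii) Show that for every $u\in L^\infty=L^\infty(\mu)$, $\|u\|_\infty=\min\{\|f\|_\infty:f\in\mathcal{L}^\infty_{\text{strict}},\ f^\bullet=u\}$.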

> (c) Let (X, Σ, µ) be any measure space, and A a subset of L∞ (µ). Show that A is bounded for the norm k k∞ iff
it is bounded above and below for the ordering of L∞ .

(d) Let (X, Σ, µ) be any measure space, and A ⊆ L∞ (µ) a non-empty set with a least upper bound w in L∞ (µ).
Show that kwk∞ ≤ supu∈A kuk∞ .

(e) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of measure spaces, and $(X,\Sigma,\mu)$ their direct sum (214L). Show that the canonical isomorphism between $L^0(\mu)$ and $\prod_{i\in I}L^0(\mu_i)$ (241Xd) induces an isomorphism between $L^\infty(\mu)$ and the subspace
$$\Bigl\{u:u\in\prod_{i\in I}L^\infty(\mu_i),\ \|u\|=\sup_{i\in I}\|u(i)\|_\infty<\infty\Bigr\}$$
of $\prod_{i\in I}L^\infty(\mu_i)$.
(f) Let $(X,\Sigma,\mu)$ be any measure space, and $u\in L^1(\mu)$. Show that there is a $v\in L^\infty(\mu)$ such that $\|v\|_\infty\le 1$ and $\int u\times v=\|u\|_1$.

(g) Let $(X,\Sigma,\mu)$ be a semi-finite measure space and $v\in L^\infty(\mu)$. Show that
$$\|v\|_\infty=\sup\Bigl\{\int u\times v:u\in L^1,\ \|u\|_1\le 1\Bigr\}=\sup\{\|u\times v\|_1:u\in L^1,\ \|u\|_1\le 1\}.$$

(h) Give an example of a probability space $(X,\Sigma,\mu)$ and a $v\in L^\infty(\mu)$ such that $\|u\times v\|_1<\|v\|_\infty$ whenever $u\in L^1(\mu)$ and $\|u\|_1\le 1$.

(i) Write out proofs of 243G adapted to the special cases (i) µX = 1 (ii) (X, Σ, µ) is σ-finite.

(j) Let (X, Σ, µ) be any measure space. Show that L0 (µ) is Dedekind complete iff L∞ (µ) is Dedekind complete.

(k) Let (X, Σ, µ) be a totally finite measure space and ν : Σ → R a functional. Show that the following are
equiveridical: (i) there is a continuous linear functional h : L1 (µ) → R such that h((χE)• ) = νE for every E ∈ Σ (ii)
ν is additive and there is an M ≥ 0 such that |νE| ≤ M µE for every E ∈ Σ.

> (l) Let $X$ be any set, and let $\mu$ be counting measure on $X$. In this case it is customary to write $\ell^\infty(X)$ for $\mathcal{L}^\infty(\mu)$, and to identify it with $L^\infty(\mu)$. Write out statements and proofs of the results of this chapter adapted to this special case (if you like, with $X=\mathbb{N}$). In particular, write out a direct proof that $(\ell^1)^*$ can be identified with $\ell^\infty$. What happens when $X$ has just two members? or three?

(m) Show that if $(X,\Sigma,\mu)$ is any measure space and $u\in L^\infty_{\mathbb{C}}(\mu)$, then
$$\|u\|_\infty=\sup\{\|\mathrm{Re}(\zeta u)\|_\infty:\zeta\in\mathbb{C},\ |\zeta|=1\}.$$

(n) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\phi:X\to Y$ an inverse-measure-preserving function. Show that $g\phi\in\mathcal{L}^\infty(\mu)$ for every $g\in\mathcal{L}^\infty(\nu)$, and that the map $g\mapsto g\phi$ induces a linear operator $T:L^\infty(\nu)\to L^\infty(\mu)$ defined by setting $T(g^\bullet)=(g\phi)^\bullet$ for every $g\in\mathcal{L}^\infty(\nu)$. (Compare 241Xg.) Show that $\|Tv\|_\infty=\|v\|_\infty$ for every $v\in L^\infty(\nu)$.

(o) For $f$, $g\in C=C([0,1])$, the space of continuous real-valued functions on the unit interval $[0,1]$, say
$$f\le g\text{ iff }f(x)\le g(x)\text{ for every }x\in[0,1],$$
$$\|f\|_\infty=\sup_{x\in[0,1]}|f(x)|.$$
Show that $C$ is a Banach lattice, and that moreover
$$\|f\vee g\|_\infty=\max(\|f\|_\infty,\|g\|_\infty)\text{ whenever }f,g\ge 0,$$
$$\|f\times g\|_\infty\le\|f\|_\infty\|g\|_\infty\text{ for all }f,g\in C,$$
$$\|f\|_\infty=\min\{\gamma:|f|\le\gamma\chi X\}\text{ for every }f\in C.$$

243Y Further exercises (a) Let (X, Σ, µ) be a measure space, and Y a subset of X; write µY for the subspace
measure on Y . Show that the canonical map from L0 (µ) onto L0 (µY ) (241Yg) induces a canonical map from L∞ (µ)
onto L∞ (µY ), which is norm-preserving iff it is injective iff Y has full outer measure.

243 Notes and comments I mention the formula
$$\|u\vee v\|_\infty=\max(\|u\|_\infty,\|v\|_\infty)\text{ for }u,v\ge 0$$
(243Df) because while it does not characterize $L^\infty$ spaces among Banach lattices (see 243Xo), it is in a sense dual to the characteristic property
$$\|u+v\|_1=\|u\|_1+\|v\|_1\text{ for }u,v\ge 0$$
of the norm of $L^1$. (I will return to this in Chapter 35 in the next volume.)
The particular set $\mathcal{L}^\infty$ I have chosen (243A) is somewhat arbitrary. The space $L^\infty$ can very well be described entirely as a subspace of $L^0$, without going back to functions at all; see 243Bc, 243Dc. Just as with $\mathcal{L}^0$ and $\mathcal{L}^1$, there are occasions when it would be simpler to work with the linear space of essentially bounded measurable functions from $X$ to $\mathbb{R}$; and we now have a third obvious candidate, the linear space $\mathcal{L}^\infty_{\text{strict}}$ of measurable functions from $X$ to $\mathbb{R}$ which are literally, rather than essentially, bounded, which is itself a Banach lattice (243Xb).
I suppose the most important theorem of this section is 243G, identifying L∞ with (L1 )∗ . This identification is
the chief reason for setting ‘localizable’ measure spaces apart. The proof of 243Gb is long because it depends on two
separate ideas. The Radon-Nikodým theorem deals, in effect, with the totally finite case, and then in parts (b-v) and
(b-vi) of the proof localizability is used to link the partial solutions gF together. Exercise 243Xi is supposed to help
you to distinguish the two operations. The map T ′ : L1 → (L∞ )∗ (243Fc) is also very interesting in its way, but I shall
leave it for Chapter 36.
243G gives another way of looking at conditional expectation operators. If (X, Σ, µ) is a probability space and T is
a σ-subalgebra of Σ, of course both µ and µ↾ T are localizable, so L∞ (µ) can be identified with (L1 (µ))∗ and L∞ (µ↾ T)
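can be identified with $(L^1(\mu\upharpoonright\mathrm{T}))^*$. Now we have the canonical embedding $S:L^1(\mu\upharpoonright\mathrm{T})\to L^1(\mu)$ (242Jb) which is a norm-preserving linear operator, so gives rise to an adjoint operator $S':L^1(\mu)^*\to L^1(\mu\upharpoonright\mathrm{T})^*$ defined by the formula
$$(S'h)(v)=h(Sv)\text{ for all }v\in L^1(\mu\upharpoonright\mathrm{T}),\ h\in L^1(\mu)^*.$$
Writing $T_\mu:L^\infty(\mu)\to L^1(\mu)^*$ and $T_{\mu\upharpoonright\mathrm{T}}:L^\infty(\mu\upharpoonright\mathrm{T})\to L^1(\mu\upharpoonright\mathrm{T})^*$ for the canonical maps, we get a map $Q=T_{\mu\upharpoonright\mathrm{T}}^{-1}S'T_\mu:L^\infty(\mu)\to L^\infty(\mu\upharpoonright\mathrm{T})$, defined by saying that
$$\int Qu\times v=(T_{\mu\upharpoonright\mathrm{T}}Qu)(v)=(S'T_\mu u)(v)=(T_\mu u)(Sv)=\int Sv\times u=\int u\times v$$
whenever $v\in L^1(\mu\upharpoonright\mathrm{T})$ and $u\in L^\infty(\mu)$. But this agrees with the formula of 242L: we have
$$\int Qu\times v=\int u\times v=\int P(u\times v)=\int Pu\times v.$$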
Because v is arbitrary, we must have Qu = P u for every u ∈ L∞ (µ). Thus a conditional expectation operator is, in a
sense, the adjoint of the appropriate embedding operator.
The discussion in the last paragraph applies, of course, only to the restriction P ↾L∞ (µ) of the conditional expectation
operator to the L∞ space. Because µ is totally finite, L∞ (µ) is a subspace of L1 (µ), and the real qualities of the operator
P are related to its behaviour on the whole space L1 . P : L1 (µ) → L1 (µ↾ T) can also be expressed as an adjoint operator,
but the expression needs more of the theory of Riesz spaces than I have space for here. I will return to this topic in
Chapter 36.
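The adjoint relation $\int Pu\times v=\int u\times v$ for $\mathrm{T}$-measurable $v$ can be seen directly on a finite probability space whose subalgebra is generated by a partition. The following Python sketch is only an illustration under that assumption; the helper names and the data are hypothetical.

```python
from fractions import Fraction

def integral(f, mu):
    """Integral of f with respect to the finite measure mu."""
    return sum(x * m for x, m in zip(f, mu))

def cond_exp(u, mu, blocks):
    """Conditional expectation on the partition `blocks`: block averages."""
    out = [None] * len(u)
    for block in blocks:
        mass = sum(mu[i] for i in block)
        avg = sum(u[i] * mu[i] for i in block) / mass
        for i in block:
            out[i] = avg
    return out

mu = [Fraction(1, 2), Fraction(1, 6), Fraction(1, 3)]
blocks = [[0], [1, 2]]
u = [Fraction(5), Fraction(3), Fraction(-3)]
v = [Fraction(2), Fraction(-1), Fraction(-1)]  # constant on each block

# Integral of (P u) x v against integral of u x v.
lhs = integral([a * b for a, b in zip(cond_exp(u, mu, blocks), v)], mu)
rhs = integral([a * b for a, b in zip(u, v)], mu)
```

The two integrals agree because `v` is constant on each block, so averaging `u` over a block does not change the contribution of that block.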

244 Lp
Continuing with our tour of the classical Banach spaces, we come to the Lp spaces for 1 < p < ∞. The case p = 2
is more important than all the others put together, and it would be reasonable, perhaps even advisable, to read this
section first with this case alone in mind. But the other spaces provide instructive examples and remain a basic part
of the education of any functional analyst.

244A Definitions Let $(X,\Sigma,\mu)$ be any measure space, and $p\in\,]1,\infty[$. Write $\mathcal{L}^p=\mathcal{L}^p(\mu)$ for the set of functions $f\in\mathcal{L}^0=\mathcal{L}^0(\mu)$ such that $|f|^p$ is integrable, and $L^p=L^p(\mu)$ for $\{f^\bullet:f\in\mathcal{L}^p\}\subseteq L^0=L^0(\mu)$.
Note that if $f\in\mathcal{L}^p$, $g\in\mathcal{L}^0$ and $f=_{\text{a.e.}}g$, then $|f|^p=_{\text{a.e.}}|g|^p$ so $|g|^p$ is integrable and $g\in\mathcal{L}^p$; thus $\mathcal{L}^p=\{f:f\in\mathcal{L}^0,\ f^\bullet\in L^p\}$.

Alternatively, we can define $u^p$ whenever $u\in L^0$, $u\ge 0$ by writing $(f^\bullet)^p=(f^p)^\bullet$ for every $f\in\mathcal{L}^0$ such that $f(x)\ge 0$ for every $x\in\operatorname{dom}f$ (compare 241I), and say that $L^p=\{u:u\in L^0,\ |u|^p\in L^1(\mu)\}$.

244B Theorem Let (X, Σ, µ) be any measure space, and p ∈ [1, ∞].
(a) Lp = Lp (µ) is a linear subspace of L0 = L0 (µ).
(b) If u ∈ Lp , v ∈ L0 and |v| ≤ |u|, then v ∈ Lp . Consequently |u|, u ∨ v and u ∧ v belong to Lp for all u, v ∈ Lp .
proof The cases $p=1$, $p=\infty$ are covered by 242B, 242C and 243B; so I suppose that $1<p<\infty$.
(a)(i) Suppose that $f$, $g\in\mathcal{L}^p=\mathcal{L}^p(\mu)$. If $a$, $b\in\mathbb{R}$ then $|a+b|^p\le 2^p\max(|a|^p,|b|^p)$, so $|f+g|^p\le_{\text{a.e.}}2^p(|f|^p\vee|g|^p)$; now $|f+g|^p\in\mathcal{L}^0$ and $2^p(|f|^p\vee|g|^p)\in\mathcal{L}^1$ so $|f+g|^p\in\mathcal{L}^1$. Thus $f+g\in\mathcal{L}^p$ for all $f$, $g\in\mathcal{L}^p$; it follows at once that $u+v\in L^p$ for all $u$, $v\in L^p$.
(ii) If $f\in\mathcal{L}^p$ and $c\in\mathbb{R}$ then $|cf|^p=|c|^p|f|^p\in\mathcal{L}^1$, so $cf\in\mathcal{L}^p$. Accordingly $cu\in L^p$ whenever $u\in L^p$ and $c\in\mathbb{R}$.
(b)(i) Express $u$ as $f^\bullet$ and $v$ as $g^\bullet$, where $f\in\mathcal{L}^p$ and $g\in\mathcal{L}^0$. Then $|g|\le_{\text{a.e.}}|f|$, so $|g|^p\le_{\text{a.e.}}|f|^p$ and $|g|^p$ is integrable; accordingly $g\in\mathcal{L}^p$ and $v\in L^p$.
(ii) Now $|\,|u|\,|=|u|$ so $|u|\in L^p$ whenever $u\in L^p$. Finally $u\vee v=\frac12(u+v+|u-v|)$ and $u\wedge v=\frac12(u+v-|u-v|)$ belong to $L^p$ for all $u$, $v\in L^p$.
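The elementary inequality $|a+b|^p\le 2^p\max(|a|^p,|b|^p)$ used in (a-i) is easy to spot-check numerically. The Python sketch below does so on a handful of arbitrary sample values; it illustrates the inequality and is not a proof, and the small tolerance only absorbs floating-point rounding.

```python
def power_sum_bound(a, b, p):
    """Return (|a+b|^p, 2^p * max(|a|^p, |b|^p))."""
    return abs(a + b) ** p, 2 ** p * max(abs(a) ** p, abs(b) ** p)

# Arbitrary sample points and exponents (hypothetical data).
samples = [(-2.0, 0.5), (1.0, 1.0), (0.3, -0.3), (3.0, 4.0)]
exponents = [1.5, 2.0, 3.0, 7.5]

ok = all(
    lhs <= rhs * (1 + 1e-12)
    for a, b in samples
    for p in exponents
    for lhs, rhs in [power_sum_bound(a, b, p)]
)
```

Equality occurs when $a=b$, as in `power_sum_bound(1.0, 1.0, 2.0)`, where both sides are 4.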
244C The order structure of Lp Let (X, Σ, µ) be any measure space, and p ∈ [1, ∞]. Then 244B is enough to
ensure that the partial order inherited from L0 (µ) makes Lp (µ) a Riesz space (compare 242C, 243C).

244D The norm of $L^p$ Let $(X,\Sigma,\mu)$ be a measure space, and $p\in\,]1,\infty[$.

(a) For $f\in\mathcal{L}^p=\mathcal{L}^p(\mu)$, set $\|f\|_p=(\int|f|^p)^{1/p}$. If $f$, $g\in\mathcal{L}^p$ and $f=_{\text{a.e.}}g$ then $|f|^p=_{\text{a.e.}}|g|^p$ so $\|f\|_p=\|g\|_p$. Accordingly we may define $\|\ \|_p:L^p=L^p(\mu)\to[0,\infty[$ by writing $\|f^\bullet\|_p=\|f\|_p$ for every $f\in\mathcal{L}^p$.
Alternatively, we can say just that $\|u\|_p=(\int|u|^p)^{1/p}$ for every $u\in L^p=L^p(\mu)$.

(b) The notation $\|\ \|_p$ carries a promise that it is a norm on $L^p$; this is indeed so, but I hold the proof over to 244F below. For the moment, however, let us note just that $\|cu\|_p=|c|\|u\|_p$ for all $u\in L^p$ and $c\in\mathbb{R}$, and that if $\|u\|_p=0$ then $\int|u|^p=0$ so $|u|^p=0$ and $u=0$.

(c) If $|u|\le|v|$ in $L^p$ then $\|u\|_p\le\|v\|_p$; this is because $|u|^p\le|v|^p$.

244E I now work through the lemmas required to show that $\|\ \|_p$ is a norm on $L^p$ and, eventually, that the normed space dual of $L^p$ may be identified with a suitable $L^q$.
Lemma Suppose $(X,\Sigma,\mu)$ is a measure space, and that $p$, $q\in\,]1,\infty[$ are such that $\frac1p+\frac1q=1$.
(a) $ab\le\frac1p a^p+\frac1q b^q$ for all real $a$, $b\ge 0$.
(b)(i) $f\times g$ is integrable and
$$\Bigl|\int f\times g\Bigr|\le\int|f\times g|\le\|f\|_p\|g\|_q$$
for all $f\in\mathcal{L}^p=\mathcal{L}^p(\mu)$, $g\in\mathcal{L}^q=\mathcal{L}^q(\mu)$;
(ii) $u\times v\in L^1=L^1(\mu)$ and
$$\Bigl|\int u\times v\Bigr|\le\|u\times v\|_1\le\|u\|_p\|v\|_q$$
for all $u\in L^p=L^p(\mu)$, $v\in L^q=L^q(\mu)$.
proof (a) If either $a$ or $b$ is 0, this is trivial. If both are non-zero, we may argue as follows. The function $x\mapsto x^{1/p}:[0,\infty[\to\mathbb{R}$ is concave, with second derivative strictly less than 0, so lies entirely below any of its tangents; in particular, below its tangent at the point $(1,1)$, which has equation $y=1+\frac1p(x-1)$. Thus we have
$$x^{1/p}\le\frac1p x+1-\frac1p=\frac1p x+\frac1q$$
for every $x\in[0,\infty[$. So if $c$, $d>0$, then
$$\Bigl(\frac cd\Bigr)^{1/p}\le\frac1p\,\frac cd+\frac1q;$$
multiplying both sides by $d$,
$$c^{1/p}d^{1/q}\le\frac1p c+\frac1q d;$$
setting $c=a^p$ and $d=b^q$, we get
$$ab\le\frac1p a^p+\frac1q b^q,$$
as claimed.
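As a sanity check on part (a), the gap $\frac1p a^p+\frac1q b^q-ab$ can be evaluated on a small grid. The Python sketch below uses arbitrary sample values and a hypothetical helper name; it illustrates the inequality rather than proving it.

```python
def young_gap(a, b, p):
    """Return a^p/p + b^q/q - a*b with q = p/(p-1); the lemma says >= 0."""
    q = p / (p - 1)
    return a ** p / p + b ** q / q - a * b

# Arbitrary non-negative sample values (hypothetical data).
grid = [0.0, 0.1, 0.5, 1.0, 2.0, 7.0]
worst = min(
    young_gap(a, b, p) for a in grid for b in grid for p in (1.5, 2.0, 3.0)
)
```

The smallest gap found is zero up to rounding. The tangent-line argument shows that equality holds exactly when $a^p=b^q$, as in `young_gap(3.0, 3.0, 2.0)`.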
(b)(i)(α) Suppose first that $\|f\|_p=\|g\|_q=1$. For every $x\in\operatorname{dom}f\cap\operatorname{dom}g$ we have
$$|f(x)g(x)|\le\frac1p|f(x)|^p+\frac1q|g(x)|^q$$
by (a). So
$$|f\times g|\le_{\text{a.e.}}\frac1p|f|^p+\frac1q|g|^q\in\mathcal{L}^1(\mu)$$
and $f\times g$ is integrable; also
$$\int|f\times g|\le\frac1p\int|f|^p+\frac1q\int|g|^q=\frac1p\|f\|_p^p+\frac1q\|g\|_q^q=\frac1p+\frac1q=1.$$
(β) If $\|f\|_p=0$, then $\int|f|^p=0$ so $|f|^p=_{\text{a.e.}}0$, $f=_{\text{a.e.}}0$, $f\times g=_{\text{a.e.}}0$ and
$$\int|f\times g|=0=\|f\|_p\|g\|_q.$$
Similarly, if $\|g\|_q=0$, then $g=_{\text{a.e.}}0$ and again
$$\int|f\times g|=0=\|f\|_p\|g\|_q.$$
(γ) Finally, for general $f\in\mathcal{L}^p$, $g\in\mathcal{L}^q$ such that $c=\|f\|_p$ and $d=\|g\|_q$ are both non-zero, we have $\|\frac1c f\|_p=\|\frac1d g\|_q=1$ so
$$f\times g=cd\Bigl(\frac1c f\times\frac1d g\Bigr)$$
is integrable, and
$$\int|f\times g|=cd\int\Bigl|\frac1c f\times\frac1d g\Bigr|\le cd,$$
as required.
(ii) Now if $u\in L^p$, $v\in L^q$ take $f\in\mathcal{L}^p$, $g\in\mathcal{L}^q$ such that $u=f^\bullet$ and $v=g^\bullet$; $f\times g$ is integrable, so $u\times v\in L^1$, and
$$\Bigl|\int u\times v\Bigr|\le\|u\times v\|_1=\int|f\times g|\le\|f\|_p\|g\|_q=\|u\|_p\|v\|_q.$$

Remark Part (b) is ‘Hölder’s inequality’. In the case p = q = 2 it is ‘Cauchy’s inequality’.
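On a finite measure space, Hölder's inequality is a statement about weighted finite sums and can be checked directly. The Python sketch below is a minimal illustration, assuming a measure given by a list of point masses; the names and the data are hypothetical.

```python
def integral(f, mu):
    """Integral of f against the point masses mu."""
    return sum(x * m for x, m in zip(f, mu))

def norm_p(f, mu, p):
    """||f||_p = (integral of |f|^p)^(1/p)."""
    return integral([abs(x) ** p for x in f], mu) ** (1.0 / p)

def holder_sides(f, g, mu, p):
    """Return (integral of |f x g|, ||f||_p ||g||_q) with q = p/(p-1)."""
    q = p / (p - 1)
    lhs = integral([abs(x * y) for x, y in zip(f, g)], mu)
    return lhs, norm_p(f, mu, p) * norm_p(g, mu, q)

mu = [0.5, 2.0, 1.0]
f = [1.0, -2.0, 3.0]
g = [4.0, 0.5, -1.0]
lhs, rhs = holder_sides(f, g, mu, 3.0)

# Cauchy's inequality (p = q = 2) with g proportional to f: equality.
cauchy = holder_sides([1.0, 2.0], [2.0, 4.0], [1.0, 1.0], 2.0)
```

In the first example the left-hand side is strictly smaller than the right. In the second, where $g=2f$, the two sides coincide up to rounding.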

244F Proposition Let $(X,\Sigma,\mu)$ be a measure space and $p\in\,]1,\infty[$. Set $q=p/(p-1)$, so that $\frac1p+\frac1q=1$.
(a) For every $u\in L^p=L^p(\mu)$, $\|u\|_p=\max\{\int u\times v:v\in L^q(\mu),\ \|v\|_q\le 1\}$.
(b) $\|\ \|_p$ is a norm on $L^p$.
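proof (a) For $u\in L^p$, set
$$\tau(u)=\sup\Bigl\{\int u\times v:v\in L^q(\mu),\ \|v\|_q\le 1\Bigr\}.$$
By 244E(b-ii), $\|u\|_p\ge\tau(u)$. If $\|u\|_p=0$ then surely
$$0=\|u\|_p=\tau(u)=\max\Bigl\{\int u\times v:v\in L^q(\mu),\ \|v\|_q\le 1\Bigr\}.$$
If $\|u\|_p=c>0$, consider
$$v=c^{-p/q}\operatorname{sgn}u\times|u|^{p/q},$$
where for $a\in\mathbb{R}$ I write $\operatorname{sgn}a=|a|/a$ if $a\ne 0$, $0$ if $a=0$, so that $\operatorname{sgn}:\mathbb{R}\to\mathbb{R}$ is a Borel measurable function; for $f\in\mathcal{L}^0$ I write $(\operatorname{sgn}f)(x)=\operatorname{sgn}(f(x))$ for $x\in\operatorname{dom}f$, so that $\operatorname{sgn}f\in\mathcal{L}^0$; and for $f\in\mathcal{L}^0$ I write $\operatorname{sgn}(f^\bullet)=(\operatorname{sgn}f)^\bullet$ to define a function $\operatorname{sgn}:L^0\to L^0$ (cf. 241I). Then $v\in L^q=L^q(\mu)$ and
$$\|v\|_q=\Bigl(\int|v|^q\Bigr)^{1/q}=c^{-p/q}\Bigl(\int|u|^p\Bigr)^{1/q}=c^{-p/q}c^{p/q}=1.$$
So
$$\tau(u)\ge\int u\times v=c^{-p/q}\int\operatorname{sgn}u\times|u|\times\operatorname{sgn}u\times|u|^{p/q}=c^{-p/q}\int|u|^{1+\frac pq}=c^{-p/q}\int|u|^p=c^{p-\frac pq}=c,$$
recalling that $1+\frac pq=p$, $p-\frac pq=1$. Thus $\tau(u)\ge\|u\|_p$ and
$$\tau(u)=\|u\|_p=\int u\times v.$$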

(b) In view of the remarks in 244Db, I have only to check that $\|u+v\|_p\le\|u\|_p+\|v\|_p$ for all $u$, $v\in L^p$. But given $u$ and $v$, let $w\in L^q$ be such that $\|w\|_q=1$ and $\int(u+v)\times w=\|u+v\|_p$. Then
$$\|u+v\|_p=\int(u+v)\times w=\int u\times w+\int v\times w\le\|u\|_p+\|v\|_p,$$
as required.
Remark The triangle inequality ‘$\|u+v\|_p\le\|u\|_p+\|v\|_p$’ is Minkowski’s inequality.
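The norming element $v=c^{-p/q}\operatorname{sgn}u\times|u|^{p/q}$ from part (a), and the triangle inequality it yields, can both be checked on a finite measure space. The Python sketch below is an illustration only, assuming a measure given by point masses; the names and the data are hypothetical.

```python
def integral(f, mu):
    """Integral of f against the point masses mu."""
    return sum(x * m for x, m in zip(f, mu))

def norm_p(f, mu, p):
    """||f||_p = (integral of |f|^p)^(1/p)."""
    return integral([abs(x) ** p for x in f], mu) ** (1.0 / p)

def sgn(a):
    """Sign of a real number: -1, 0 or 1."""
    return (a > 0) - (a < 0)

def norming_element(u, mu, p):
    """v = c^(-p/q) * sgn(u) * |u|^(p/q), where c = ||u||_p (u non-zero)."""
    q = p / (p - 1)
    c = norm_p(u, mu, p)
    return [c ** (-p / q) * sgn(x) * abs(x) ** (p / q) for x in u]

mu = [0.5, 2.0]
u = [1.0, -2.0]
w = [3.0, 1.0]
p = 3.0
q = p / (p - 1)

v = norming_element(u, mu, p)
pairing = integral([x * y for x, y in zip(u, v)], mu)
```

Up to rounding, `norm_p(v, mu, q)` is 1 and `pairing` equals `norm_p(u, mu, p)`. Minkowski's inequality for `u` and `w` can be checked in the same way.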
244G Theorem Let $(X,\Sigma,\mu)$ be any measure space, and $p\in[1,\infty]$. Then $L^p=L^p(\mu)$ is a Banach lattice under its norm $\|\ \|_p$.
proof The cases $p=1$, $p=\infty$ are covered by 242F and 243E, so let us suppose that $1<p<\infty$. We know already that $\|u\|_p\le\|v\|_p$ whenever $|u|\le|v|$, so it remains only to show that $L^p$ is complete.
Let $\langle u_n\rangle_{n\in\mathbb{N}}$ be a sequence in $L^p$ such that $\|u_{n+1}-u_n\|_p\le 4^{-n}$ for every $n\in\mathbb{N}$. Note that
$$\|u_n\|_p\le\|u_0\|_p+\sum_{k=0}^{n-1}\|u_{k+1}-u_k\|_p\le\|u_0\|_p+\sum_{k=0}^\infty 4^{-k}\le\|u_0\|_p+2$$
for every $n$. For each $n\in\mathbb{N}$, choose $f_n\in\mathcal{L}^p$ such that $f_0^\bullet=u_0$, $f_n^\bullet=u_n-u_{n-1}$ for $n\ge 1$; do this in such a way that $\operatorname{dom}f_n=X$ and $f_n$ is Σ-measurable (241Bk). Then $\|f_n\|_p\le 4^{-n+1}$ for $n\ge 1$.
For $m$, $n\in\mathbb{N}$, set
$$E_{mn}=\{x:|f_m(x)|\ge 2^{-n}\}\in\Sigma.$$
Then $|f_m(x)|^p\ge 2^{-np}$ for $x\in E_{mn}$, so
$$2^{-np}\mu E_{mn}\le\int|f_m|^p<\infty$$
and $\mu E_{mn}<\infty$. So $\chi E_{mn}\in\mathcal{L}^q=\mathcal{L}^q(\mu)$ and
$$\int_{E_{mn}}|f_k|=\int|f_k|\times\chi E_{mn}\le\|f_k\|_p\|\chi E_{mn}\|_q$$
for each $k$, by 244E(b-i). Accordingly
$$\sum_{k=0}^\infty\int_{E_{mn}}|f_k|\le\|\chi E_{mn}\|_q\sum_{k=0}^\infty\|f_k\|_p<\infty,$$
and $\sum_{k=0}^\infty f_k(x)$ exists for almost every $x\in E_{mn}$, by 242E. This is true for all $m$, $n\in\mathbb{N}$. But if $x\in X\setminus\bigcup_{m,n\in\mathbb{N}}E_{mn}$, $f_n(x)=0$ for every $n$, so $\sum_{k=0}^\infty f_k(x)$ certainly exists. Thus $g(x)=\sum_{k=0}^\infty f_k(x)$ is defined in $\mathbb{R}$ for almost every $x\in X$.
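Set $g_n=\sum_{k=0}^n f_k$; then $g_n^\bullet=u_n\in L^p$ for each $n$, and $g(x)=\lim_{n\to\infty}g_n(x)$ is defined for almost every $x$. Now consider $|g|^p=_{\text{a.e.}}\lim_{n\to\infty}|g_n|^p$. We know that
$$\liminf_{n\to\infty}\int|g_n|^p=\liminf_{n\to\infty}\|u_n\|_p^p\le(2+\|u_0\|_p)^p<\infty,$$
so by Fatou’s Lemma
$$\int|g|^p\le\liminf_{k\to\infty}\int|g_k|^p<\infty.$$
Thus $u=g^\bullet\in L^p$. Moreover, for any $m\in\mathbb{N}$, the triangle inequality gives $\|u_n-u_m\|_p\le\sum_{k=m}^{n-1}\|u_{k+1}-u_k\|_p\le\sum_{k=m}^{n-1}4^{-k}$ for $n>m$, so
$$\int|g-g_m|^p\le\liminf_{n\to\infty}\int|g_n-g_m|^p=\liminf_{n\to\infty}\|u_n-u_m\|_p^p\le\liminf_{n\to\infty}\Bigl(\sum_{k=m}^{n-1}4^{-k}\Bigr)^p=\Bigl(\sum_{k=m}^\infty 4^{-k}\Bigr)^p=\bigl(4^{-m}/(1-4^{-1})\bigr)^p.$$
So
$$\|u-u_m\|_p=\Bigl(\int|g-g_m|^p\Bigr)^{1/p}\le 4^{-m}/(1-4^{-1})\to 0$$
as $m\to\infty$. Thus $u=\lim_{m\to\infty}u_m$ in $L^p$. As $\langle u_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $L^p$ is complete.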

244H Following 242M and 242O, I note that Lp behaves like L1 in respect of certain dense subspaces.
Proposition (a) Let $(X,\Sigma,\mu)$ be any measure space, and $p\in[1,\infty[$. Then the space $S$ of equivalence classes of $\mu$-simple functions is a dense linear subspace of $L^p=L^p(\mu)$.
(b) Let $X$ be any subset of $\mathbb{R}^r$, where $r\ge 1$, and let $\mu$ be the subspace measure on $X$ induced by Lebesgue measure on $\mathbb{R}^r$. Write $C_k$ for the set of (bounded) continuous functions $g:\mathbb{R}^r\to\mathbb{R}$ such that $\{x:g(x)\ne 0\}$ is bounded, and $S_0$ for the space of linear combinations of functions of the form $\chi I$, where $I\subseteq\mathbb{R}^r$ is a bounded half-open interval. Then $\{(g\upharpoonright X)^\bullet:g\in C_k\}$ and $\{(h\upharpoonright X)^\bullet:h\in S_0\}$ are dense in $L^p(\mu)$.
proof (a) I repeat the argument of 242M with a tiny modification.
(i) Suppose that $u\in L^p(\mu)$, $u\ge 0$ and $\epsilon>0$. Express $u$ as $f^\bullet$ where $f:X\to[0,\infty[$ is a measurable function. Let $g:X\to\mathbb{R}$ be a simple function such that $0\le g\le f^p$ and $\int g\ge\int f^p-\epsilon^p$. Set $h=g^{1/p}$. Then $h$ is a simple function and $h\le f$. Because $p\ge 1$, $(f-h)^p+h^p\le f^p$ and
$$\int(f-h)^p\le\int(f^p-g)\le\epsilon^p,$$
so
$$\|u-h^\bullet\|_p=\Bigl(\int|f-h|^p\Bigr)^{1/p}\le\epsilon,$$
while $h^\bullet\in S$.
(ii) For general $u\in L^p$, $\epsilon>0$, $u$ can be expressed as $u^+-u^-$ where $u^+=u\vee 0$, $u^-=(-u)\vee 0$ belong to $L^p$ and are non-negative. By (i), we can find $v_1$, $v_2\in S$ such that $\|u^+-v_1\|_p\le\frac12\epsilon$ and $\|u^--v_2\|_p\le\frac12\epsilon$, so that $v=v_1-v_2\in S$ and $\|u-v\|_p\le\epsilon$. As $u$ and $\epsilon$ are arbitrary, $S$ is dense.
(b) Again, all the ideas are to be found in 242O; the changes needed are in the formulae, not in the method. They will go more easily if I note at the outset that whenever $f_1$, $f_2\in\mathcal{L}^p(\mu)$ and $\int|f_1|^p\le\epsilon^p$, $\int|f_2|^p\le\delta^p$ (where $\epsilon$, $\delta\ge 0$), then $\int|f_1+f_2|^p\le(\epsilon+\delta)^p$; this is just the triangle inequality for $\|\ \|_p$ (244Fb). Also I will regularly express the target relationships in the form ‘$\int_X|f-g|^p\le\epsilon^p$’, ‘$\int_X|f-h|^p\le\epsilon^p$’. Now let me run through the argument of 242Oa, rather more briskly than before.
(i) Suppose first that $f=\chi I\upharpoonright X$ where $I\subseteq\mathbb{R}^r$ is a bounded half-open interval. As before, we can set $h=\chi I$ and get $\int_X|f-h|^p=0$. This time, use the same construction to find an interval $J$ and a function $g\in C_k$ such that $\chi I\le g\le\chi J$ and $\mu_r(J\setminus I)\le\epsilon^p$; this will ensure that $\int_X|f-g|^p\le\epsilon^p$.
(ii) Now suppose that $f=\chi(X\cap E)$ where $E\subseteq\mathbb{R}^r$ is a set of finite measure. Then, for the same reasons as before, there is a disjoint family $I_0,\ldots,I_n$ of half-open intervals such that $\mu_r(E\triangle\bigcup_{j\le n}I_j)\le(\frac12\epsilon)^p$. Accordingly $h=\sum_{j=0}^n\chi I_j\in S_0$ and $\int_X|f-h|^p\le(\frac12\epsilon)^p$. And (i) tells us that there is for each $j\le n$ a $g_j\in C_k$ such that $\int_X|g_j-\chi I_j|^p\le(\epsilon/2(n+1))^p$, so that $g=\sum_{j=0}^n g_j\in C_k$ and $\int_X|f-g|^p\le\epsilon^p$.
(iii) The move to simple functions, and thence to arbitrary members of $\mathcal{L}^p(\mu)$, is just as before, but using $\|f\|_p$ in place of $\int_X|f|$. Finally, the translation from $\mathcal{L}^p$ to $L^p$ is again direct (remembering, as before, to check that $g\upharpoonright X$, $h\upharpoonright X$ belong to $\mathcal{L}^p(\mu)$ whenever $g\in C_k$ and $h\in S_0$).

*244I Corollary In the context of 244Hb, Lp (µ) is separable.


proof Let $A$ be the set
$$\Bigl\{\Bigl(\sum_{j=0}^n q_j\chi([a_j,b_j[\,\cap X)\Bigr)^\bullet:n\in\mathbb{N},\ q_0,\ldots,q_n\in\mathbb{Q},\ a_0,\ldots,a_n,b_0,\ldots,b_n\in\mathbb{Q}^r\Bigr\}.$$
Then $A$ is a countable subset of $L^p(\mu)$, and its closure must contain $(\sum_{j=0}^n c_j\chi([a_j,b_j[\,\cap X))^\bullet$ whenever $c_0,\ldots,c_n\in\mathbb{R}$ and $a_0,\ldots,a_n,b_0,\ldots,b_n\in\mathbb{R}^r$; that is, $\overline{A}$ is a closed set including $\{(h\upharpoonright X)^\bullet:h\in S_0\}$, and is the whole of $L^p(\mu)$, by 244Hb.

244J Duality in $L^p$ spaces Let $(X,\Sigma,\mu)$ be any measure space, and $p\in\,]1,\infty[$. Set $q=p/(p-1)$; note that $\frac1p+\frac1q=1$ and that $p=q/(q-1)$; the relation between $p$ and $q$ is symmetric. Now $u\times v\in L^1(\mu)$ and $\|u\times v\|_1\le\|u\|_p\|v\|_q$ whenever $u\in L^p=L^p(\mu)$ and $v\in L^q=L^q(\mu)$ (244E). Consequently we have a bounded linear operator $T$ from $L^q$ to the normed space dual $(L^p)^*$ of $L^p$, given by writing
$$(Tv)(u)=\int u\times v$$
for all $u\in L^p$ and $v\in L^q$, exactly as in 243F.

244K Theorem Let (X, Σ, µ) be a measure space, and p ∈ ]1, ∞[; set q = p/(p − 1). Then the canonical map
T : Lq (µ) → Lp (µ)∗ , described in 244J, is a normed space isomorphism.
Remark I should perhaps remind anyone who is reading this section to learn about L2 that the basic theory of Hilbert
spaces yields this theorem in the case p = q = 2 without any need for the more generally applicable argument given
below (see 244N, 244Yk).
proof We know that $T$ is a bounded linear operator of norm at most 1; I need to show (i) that $T$ is actually an isometry (that is, that $\|Tv\|=\|v\|_q$ for every $v\in L^q$), which will show incidentally that $T$ is injective (ii) that $T$ is surjective, which is the really substantial part of the theorem.
(a) If $v\in L^q$, then by 244Fa (recalling that $p=q/(q-1)$) there is a $u\in L^p$ such that $\|u\|_p\le 1$ and $\int u\times v=\|v\|_q$; thus $\|Tv\|\ge(Tv)(u)=\|v\|_q$. As we know already that $\|Tv\|\le\|v\|_q$, we have $\|Tv\|=\|v\|_q$ for every $v$, and $T$ is an isometry.
(b) The rest of the proof, therefore, will be devoted to showing that $T:L^q\to(L^p)^*$ is surjective. Fix $h\in(L^p)^*$ with $\|h\|=1$.
I need to show that $h$ ‘lives on’ a countable union of sets of finite measure in $X$, in the following sense: there is a non-decreasing sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure such that $h(f^\bullet)=0$ whenever $f\in\mathcal{L}^p$ and $f(x)=0$ for $x\in\bigcup_{n\in\mathbb{N}}E_n$. $\mathbf{P}$ Choose a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^p$ such that $\|u_n\|_p\le 1$ for every $n$ and $\lim_{n\to\infty}h(u_n)=\|h\|=1$. For each $n$, express $u_n$ as $f_n^\bullet$, where $f_n:X\to\mathbb{R}$ is a measurable function. Set
$$E_n=\Bigl\{x:\sum_{k=0}^n|f_k(x)|^p\ge 2^{-n}\Bigr\}$$
for $n\in\mathbb{N}$; because $|f_k|^p$ is measurable and integrable and has domain $X$ for every $k$, $E_n\in\Sigma$ and $\mu E_n<\infty$ for each $n$.
Now suppose that $f\in\mathcal{L}^p(\mu)$ and that $f(x)=0$ for $x\in\bigcup_{n\in\mathbb{N}}E_n$; set $u=f^\bullet\in L^p$. $\mathbf{?}$ Suppose, if possible, that $h(u)\ne 0$, and consider $h(cu)$, where
$$\operatorname{sgn}c=\operatorname{sgn}h(u),\quad 0<|c|<\bigl(p\,|h(u)|\,\|u\|_p^{-p}\bigr)^{1/(p-1)}.$$
(Of course $\|u\|_p\ne 0$ if $h(u)\ne 0$.) For each $n$, we have
$$\{x:f_n(x)\ne 0\}\subseteq\bigcup_{m\in\mathbb{N}}E_m\subseteq\{x:f(x)=0\},$$
so $|f_n+cf|^p=|f_n|^p+|cf|^p$ and
$$h(u_n+cu)\le\|u_n+cu\|_p=(\|u_n\|_p^p+\|cu\|_p^p)^{1/p}\le(1+|c|^p\|u\|_p^p)^{1/p}.$$
Letting $n\to\infty$,
$$1+ch(u)\le(1+|c|^p\|u\|_p^p)^{1/p}.$$
Because $\operatorname{sgn}c=\operatorname{sgn}h(u)$, $ch(u)=|c||h(u)|$ and we have
$$1+p|c||h(u)|\le(1+ch(u))^p\le 1+|c|^p\|u\|_p^p,$$
so that
$$p|h(u)|\le|c|^{p-1}\|u\|_p^p<p|h(u)|$$
by the choice of $c$; which is impossible. $\mathbf{X}$
This means that $h(f^\bullet)=0$ whenever $f:X\to\mathbb{R}$ is measurable, belongs to $\mathcal{L}^p$, and is zero on $\bigcup_{n\in\mathbb{N}}E_n$. $\mathbf{Q}$
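(c) Set $H_n=E_n\setminus\bigcup_{k<n}E_k$ for each $n\in\mathbb{N}$; then $\langle H_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence of sets of finite measure. Now $h(u)=\sum_{n=0}^\infty h(u\times(\chi H_n)^\bullet)$ for every $u\in L^p$. $\mathbf{P}$ Express $u$ as $f^\bullet$, where $f:X\to\mathbb{R}$ is measurable. Set $f_n=f\times\chi H_n$ for each $n$, $g=f\times\chi(X\setminus\bigcup_{n\in\mathbb{N}}H_n)$; then $h(g^\bullet)=0$, by (b), because $\bigcup_{n\in\mathbb{N}}H_n=\bigcup_{n\in\mathbb{N}}E_n$. Consider
$$g_n=g+\sum_{k=0}^n f_k\in\mathcal{L}^p$$
for each $n$. Then $\lim_{n\to\infty}f-g_n=0$, and
$$|f-g_n|^p\le|f|^p\in\mathcal{L}^1$$
for every $n$, so by either Fatou’s Lemma or Lebesgue’s Dominated Convergence Theorem
$$\lim_{n\to\infty}\int|f-g_n|^p=0,$$
and
$$\lim_{n\to\infty}\Bigl\|u-g^\bullet-\sum_{k=0}^n u\times(\chi H_k)^\bullet\Bigr\|_p=\lim_{n\to\infty}\|u-g_n^\bullet\|_p=\lim_{n\to\infty}\Bigl(\int|f-g_n|^p\Bigr)^{1/p}=0,$$
that is,
$$u=g^\bullet+\sum_{k=0}^\infty u\times\chi H_k^\bullet$$
in $L^p$. Because $h:L^p\to\mathbb{R}$ is linear and continuous, it follows that
$$h(u)=h(g^\bullet)+\sum_{k=0}^\infty h(u\times\chi H_k^\bullet)=\sum_{k=0}^\infty h(u\times\chi H_k^\bullet),$$
as claimed. $\mathbf{Q}$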
(d) For each $n\in\mathbb{N}$, define $\nu_n:\Sigma\to\mathbb{R}$ by setting
$$\nu_n F=h(\chi(F\cap H_n)^\bullet)$$
for every $F\in\Sigma$. (Note that $\nu_n F$ is always defined because $\mu(F\cap H_n)<\infty$, so that
$$\|\chi(F\cap H_n)\|_p=\mu(F\cap H_n)^{1/p}<\infty.)$$
Then $\nu_n\emptyset=h(0)=0$, and if $\langle F_k\rangle_{k\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$,
$$\Bigl\|\chi\Bigl(\bigcup_{k\in\mathbb{N}}H_n\cap F_k\Bigr)-\sum_{k=0}^m\chi(H_n\cap F_k)\Bigr\|_p=\mu\Bigl(H_n\cap\bigcup_{k=m+1}^\infty F_k\Bigr)^{1/p}\to 0$$
as $m\to\infty$, so
$$\nu_n\Bigl(\bigcup_{k\in\mathbb{N}}F_k\Bigr)=\sum_{k=0}^\infty\nu_n F_k.$$
So $\nu_n$ is countably additive. Further, $|\nu_n F|\le\mu(H_n\cap F)^{1/p}$ for every $F\in\Sigma$, so $\nu_n$ is truly continuous in the sense of 232Ab.
There is therefore an integrable function $g_n$ such that $\nu_n F=\int_F g_n$ for every $F\in\Sigma$; let us suppose that $g_n$ is measurable and defined on the whole of $X$. Set $g(x)=g_n(x)$ whenever $n\in\mathbb{N}$ and $x\in H_n$, $g(x)=0$ for $x\in X\setminus\bigcup_{n\in\mathbb{N}}H_n$.
(e) $g=\sum_{n=0}^\infty g_n\times\chi H_n$ is measurable and has the property that $\int_F g=h(\chi F^\bullet)$ whenever $n\in\mathbb{N}$ and $F$ is a measurable subset of $H_n$; consequently $\int_F g=h(\chi F^\bullet)$ whenever $n\in\mathbb{N}$ and $F$ is a measurable subset of $E_n=\bigcup_{k\le n}H_k$. Set $G=\{x:g(x)>0\}\subseteq\bigcup_{n\in\mathbb{N}}E_n$. If $F\subseteq G$ and $\mu F<\infty$, then
$$\lim_{n\to\infty}\int g\times\chi(F\cap E_n)\le\sup_{n\in\mathbb{N}}h(\chi(F\cap E_n)^\bullet)\le\sup_{n\in\mathbb{N}}\|\chi(F\cap E_n)\|_p=(\mu F)^{1/p},$$
so by B.Levi’s theorem
$$\int_F g=\int g\times\chi F=\lim_{n\to\infty}\int g\times\chi(F\cap E_n)$$
exists. Similarly, $\int_F g$ exists if $F\subseteq\{x:g(x)<0\}$ has finite measure; while obviously $\int_F g$ exists if $F\subseteq\{x:g(x)=0\}$. Accordingly $\int_F g$ exists for every set $F$ of finite measure. Moreover, by Lebesgue’s Dominated Convergence Theorem,
$$\int_F g=\lim_{n\to\infty}\int_{F\cap E_n}g=\lim_{n\to\infty}h(\chi(F\cap E_n)^\bullet)=\sum_{n=0}^\infty h(\chi(F\cap H_n)^\bullet)=h(\chi F^\bullet)$$
for such $F$, by (c) above. It follows at once that
$$\int g\times f=h(f^\bullet)$$
for every simple function $f:X\to\mathbb{R}$.
(f) Now $g\in\mathcal{L}^q$. $\mathbf{P}$ (i) We already know that $|g|^q:X\to\mathbb{R}$ is measurable, because $g$ is measurable and $a\mapsto|a|^q$ is continuous. (ii) Suppose that $f$ is a non-negative simple function and $f\le_{\text{a.e.}}|g|^q$. Then $f^{1/p}$ is a simple function, and $\operatorname{sgn}g$ is measurable and takes only the values 0, 1 and $-1$, so $f_1=f^{1/p}\times\operatorname{sgn}g$ is simple. We see that $\int|f_1|^p=\int f$, so $\|f_1\|_p=(\int f)^{1/p}$. Accordingly
$$\Bigl(\int f\Bigr)^{1/p}\ge h(f_1^\bullet)=\int g\times f_1=\int|g\times f^{1/p}|\ge\int f^{1/q}\times f^{1/p}=\int f$$
(the last inequality because $0\le f^{1/q}\le_{\text{a.e.}}|g|$), and we must have $\int f\le 1$. (iii) Thus
$$\sup\Bigl\{\int f:f\text{ is a non-negative simple function},\ f\le_{\text{a.e.}}|g|^q\Bigr\}\le 1<\infty.$$
But now observe that if $\epsilon>0$ then
$$\{x:|g(x)|^q\ge\epsilon\}=\bigcup_{n\in\mathbb{N}}\{x:x\in E_n,\ |g(x)|^q\ge\epsilon\},$$
and for each $n\in\mathbb{N}$
$$\mu\{x:x\in E_n,\ |g(x)|^q\ge\epsilon\}\le\frac1\epsilon,$$
because $f=\epsilon\chi\{x:x\in E_n,\ |g(x)|^q\ge\epsilon\}$ is a simple function less than or equal to $|g|^q$, so has integral at most 1. Accordingly
$$\mu\{x:|g(x)|^q\ge\epsilon\}=\sup_{n\in\mathbb{N}}\mu\{x:x\in E_n,\ |g(x)|^q\ge\epsilon\}\le\frac1\epsilon<\infty.$$
Thus $|g|^q$ is integrable, by the criterion in 122Ja. $\mathbf{Q}$

(g) We may therefore speak of $h_1 = T(g^\bullet) \in (L^p)^*$, and we know that it agrees with $h$ on members of $L^p$ of the form $f^\bullet$ where $f$ is a simple function. But these form a dense subset of $L^p$, by 244Ha, and both $h$ and $h_1$ are continuous, so

h = h1 , by 2A3Uc, and h is a value of T . The argument as written so far has assumed that khk = 1. But of course
any non-zero member of (Lp )∗ is a scalar multiple of an element of norm 1, so is a value of T . So T : Lq → (Lp )∗ is
indeed surjective, and is therefore an isometric isomorphism, as claimed.

244L Continuing with the same topics as in §§242 and 243, I turn to the order-completeness of Lp .
Theorem Let (X, Σ, µ) be any measure space, and p ∈ [1, ∞[. Then Lp = Lp (µ) is Dedekind complete.
proof I use 242H. Let A ⊆ Lp be a non-empty set which is bounded above in Lp . Fix u0 ∈ A and set
A′ = {u0 ∨ u : u ∈ A},
so that A′ has the same upper bounds as A and is bounded below by u0 . Fixing an upper bound w0 of A in Lp , then
u0 ≤ u ≤ w0 for every u ∈ A′ . Set
$$B = \{(u - u_0)^p : u \in A'\}.$$
Then
$$0 \le v \le (w_0 - u_0)^p \in L^1 = L^1(\mu)$$
for every $v \in B$, so $B$ is a non-empty subset of $L^1$ which is bounded above in $L^1$, and therefore has a least upper bound $v_1$ in $L^1$. Now $v_1^{1/p} \in L^p$; consider $w_1 = u_0 + v_1^{1/p}$. If $u \in A'$ then $(u - u_0)^p \le v_1$ so $u - u_0 \le v_1^{1/p}$ and $u \le w_1$; thus $w_1$ is an upper bound for $A'$. If $w \in L^p$ is an upper bound for $A'$, then $u - u_0 \le w - u_0$ and $(u - u_0)^p \le (w - u_0)^p$ for every $u \in A'$, so $(w - u_0)^p$ is an upper bound for $B$ and $v_1 \le (w - u_0)^p$, $v_1^{1/p} \le w - u_0$ and $w_1 \le w$. Thus $w_1 = \sup A' = \sup A$ in $L^p$. As $A$ is arbitrary, $L^p$ is Dedekind complete.

244M As in the last two sections, the theory of conditional expectations is worth revisiting.
Theorem Let (X, Σ, µ) be a probability space, and T a σ-subalgebra of Σ. Take p ∈ [1, ∞]. Regard L0 (µ↾ T) as
a subspace of L0 = L0 (µ), as in 242Jh, so that Lp (µ↾ T) becomes Lp (µ) ∩ L0 (µ↾ T). Let P : L1 (µ) → L1 (µ↾ T) be
the conditional expectation operator, as described in 242Jd. Then whenever u ∈ Lp = Lp (µ), |P u|p ≤ P (|u|p ), so
P u ∈ Lp (µ↾ T) and kP ukp ≤ kukp . Moreover, P [Lp ] = Lp (µ↾ T).
proof For p = ∞, this is 243Jb, so I assume henceforth that p < ∞. Concerning the identification of Lp (µ↾ T) with
Lp ∩ L0 (µ↾ T), if S : L0 (µ↾ T) → L0 is the canonical embedding described in 242J, we have |Su|p = S(|u|p ) for every
u ∈ L0 (µ↾ T), so that Su ∈ Lp iff |u|p ∈ L1 (µ↾ T) iff u ∈ Lp (µ↾ T).
Set φ(t) = |t|p for t ∈ R; then φ is a convex function (because it is absolutely continuous on any bounded interval,
and its derivative is non-decreasing), and |u|p = φ̄(u) for every u ∈ L0 = L0 (µ), where φ̄ is defined as in 241I. Now if
u ∈ Lp = Lp (µ), we surely have u ∈ L1 (because |u| ≤ |u|p ∨ (χX)• , or otherwise); so 242K tells us that |P u|p ≤ P |u|p .
But this means that P u ∈ Lp ∩ L1 (µ↾ T) = Lp (µ↾ T), and
$$\|Pu\|_p = (\textstyle\int |Pu|^p)^{1/p} \le (\int P|u|^p)^{1/p} = (\int |u|^p)^{1/p} = \|u\|_p,$$
as claimed. If u ∈ Lp (µ↾ T), then P u = u, so P [Lp ] is the whole of Lp (µ↾ T).
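
The inequalities of 244M can be checked by hand on a finite probability space in which T is generated by a partition, so that the conditional expectation is just the weighted average over each block. The following Python sketch is an illustration only and is not part of the original text; the names `conditional_expectation` and `lp_norm`, the point masses and the partition are all assumptions chosen for the example.

```python
def conditional_expectation(u, weights, blocks):
    """Replace u by its weighted mean on each block of the partition."""
    pu = [0.0] * len(u)
    for block in blocks:
        mass = sum(weights[i] for i in block)
        mean = sum(u[i] * weights[i] for i in block) / mass
        for i in block:
            pu[i] = mean
    return pu


def lp_norm(u, weights, p):
    """The p-norm of u with respect to the point masses in weights."""
    return sum(abs(x) ** p * w for x, w in zip(u, weights)) ** (1.0 / p)


weights = [0.1, 0.2, 0.3, 0.4]   # hypothetical probability measure on {0, 1, 2, 3}
blocks = [[0, 1], [2, 3]]        # atoms of the sub-sigma-algebra T
u = [3.0, -1.0, 2.0, 5.0]
p = 3.0

pu = conditional_expectation(u, weights, blocks)
abs_pu_p = [abs(x) ** p for x in pu]                                  # |Pu|^p
p_abs_u_p = conditional_expectation([abs(x) ** p for x in u], weights, blocks)  # P(|u|^p)

pointwise_ok = all(a <= b + 1e-12 for a, b in zip(abs_pu_p, p_abs_u_p))
norm_ok = lp_norm(pu, weights, p) <= lp_norm(u, weights, p)
print(pointwise_ok, norm_ok)
```

Here `pointwise_ok` corresponds to |Pu|^p ≤ P(|u|^p) and `norm_ok` to ‖Pu‖p ≤ ‖u‖p; a finite example of this kind verifies nothing in general, but it shows what the two statements say.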

244N The space $L^2$ (a) As I have already remarked, the really important function spaces are $L^0$, $L^1$, $L^2$ and $L^\infty$. $L^2$ has the special property of being an inner product space; if $(X, \Sigma, \mu)$ is any measure space and $u, v \in L^2 = L^2(\mu)$ then $u \times v \in L^1(\mu)$, by 244Eb, and we may write $(u|v) = \int u \times v$. This makes $L^2$ a real inner product space (because
$$(u_1 + u_2|v) = (u_1|v) + (u_2|v), \quad (cu|v) = c(u|v), \quad (u|v) = (v|u),$$
$$(u|u) \ge 0, \quad u = 0 \text{ whenever } (u|u) = 0$$
for all $u, u_1, u_2, v \in L^2$ and $c \in \mathbb{R}$) and its norm $\|\ \|_2$ is the associated norm (because $\|u\|_2 = \sqrt{(u|u)}$ whenever $u \in L^2$).
Because $L^2$ is complete (244G), it is a real Hilbert space. The fact that it may be identified with its own dual (244K) can of course be deduced from this.
I will use the phrase ‘square-integrable’ to describe functions in L2 (µ).

(b) Conditional expectations take a special form in the case of $L^2$. Let $(X, \Sigma, \mu)$ be a probability space, $\mathrm{T}$ a σ-subalgebra of $\Sigma$, and $P : L^1 = L^1(\mu) \to L^1(\mu\restriction\mathrm{T}) \subseteq L^1$ the corresponding conditional expectation operator. Then $P[L^2] \subseteq L^2$, where $L^2 = L^2(\mu)$ (244M), so we have an operator $P_2 = P\restriction L^2$ from $L^2$ to itself. Now $P_2$ is an orthogonal projection and its kernel is $\{u : u \in L^2, \int_F u = 0 \text{ for every } F \in \mathrm{T}\}$. P P (i) If $u \in L^1$ then $Pu = 0$ iff $\int_F u = 0$ for every $F \in \mathrm{T}$ (cf. 242Je); so surely the kernel of $P_2$ is the set described. (ii) Since $P^2 = P$, $P_2$ also is a projection; because $P_2$ has norm at most 1 (244M), and is therefore continuous,
$$U = P_2[L^2] = L^2(\mu\restriction\mathrm{T}) = \{u : u \in L^2, P_2 u = u\}, \quad V = \{u : P_2 u = 0\}$$
are closed linear subspaces of $L^2$ such that $U \oplus V = L^2$. (iii) Now suppose that $u \in U$ and $v \in V$. Then $P|v| \in L^2$, so $u \times P|v| \in L^1$ and $P(u \times v) = u \times Pv$, by 242L. Accordingly
$$(u|v) = \int u \times v = \int P(u \times v) = \int u \times Pv = 0.$$
Thus $U$ and $V$ are orthogonal subspaces of $L^2$, which is what we mean by saying that $P_2$ is an orthogonal projection. (Some readers will know that every projection of norm at most 1 on an inner product space is orthogonal.) Q Q
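
The orthogonality described in 244N(b) can also be seen numerically on a finite probability space. The sketch below is an illustration added here, not the author's; it assumes that T is generated by a partition, and the names `project_onto_T` and `inner`, like the data, are hypothetical. It checks that u − Pu lies in the kernel, that it is orthogonal to Pu, and that the norms satisfy Pythagoras' identity.

```python
import math


def project_onto_T(u, weights, blocks):
    """Conditional expectation for a partition: block-wise weighted mean."""
    out = [0.0] * len(u)
    for block in blocks:
        mass = sum(weights[i] for i in block)
        mean = sum(u[i] * weights[i] for i in block) / mass
        for i in block:
            out[i] = mean
    return out


def inner(u, v, weights):
    """(u|v) = integral of u x v with respect to the point masses."""
    return sum(a * b * w for a, b, w in zip(u, v, weights))


weights = [0.25, 0.25, 0.1, 0.4]     # hypothetical probability measure
blocks = [[0, 1], [2, 3]]            # atoms generating T
u = [1.0, 4.0, -2.0, 0.5]

pu = project_onto_T(u, weights, blocks)
v = [a - b for a, b in zip(u, pu)]   # component of u in the kernel V

# The integral of v over each atom of T vanishes, so v lies in the kernel.
kernel_integrals = [sum(v[i] * weights[i] for i in block) for block in blocks]
orthogonal = math.isclose(inner(pu, v, weights), 0.0, abs_tol=1e-12)
pythagoras = math.isclose(
    inner(u, u, weights),
    inner(pu, pu, weights) + inner(v, v, weights),
    rel_tol=1e-12,
)
print(kernel_integrals, orthogonal, pythagoras)
```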

*244O This is not the place for a detailed discussion of the geometry of Lp spaces. However there is a particularly
important fact about the shape of the unit ball which is accessible by the methods available to us here.
Theorem (Clarkson 36) Suppose that p ∈ ]1, ∞[ and (X, Σ, µ) is a measure space. Then Lp = Lp (µ) is uniformly
convex (definition: 2A4K).
proof (Hanner 56, Naor 04)
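(a)(i) For $0 < t \le 1$ and $a, b \in \mathbb{R}$, set
$$\phi_0(t) = (1 + t)^{p-1} + (1 - t)^{p-1},$$
$$\phi_1(t) = \frac{(1+t)^{p-1} - (1-t)^{p-1}}{t^{p-1}} = \Bigl(\frac{1}{t} + 1\Bigr)^{p-1} - \Bigl(\frac{1}{t} - 1\Bigr)^{p-1},$$
$$\psi_{ab}(t) = |a|^p \phi_0(t) + |b|^p \phi_1(t),$$
$$\phi_2(b) = (1 + b)^p + |1 - b|^p.$$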

(ii) We have
$$\phi_0'(t) = (p - 1)\bigl((1 + t)^{p-2} - (1 - t)^{p-2}\bigr),$$
which has the same sign as $p - 2$ (of course it is zero if $p = 2$), and
$$\begin{aligned}
\phi_1'(t) &= -\frac{p-1}{t^2}\Bigl(\bigl(\frac{1}{t} + 1\bigr)^{p-2} - \bigl(\frac{1}{t} - 1\bigr)^{p-2}\Bigr) \\
&= -\frac{p-1}{t^p}\bigl((1 + t)^{p-2} - (1 - t)^{p-2}\bigr) = -\frac{1}{t^p}\phi_0'(t)
\end{aligned}$$
for every $t \in \,]0, 1[$. Accordingly $\phi_0' - \phi_1'$ has the same sign as $p - 2$ everywhere on $]0, 1[$. Also
$$\phi_0(1) = 2^{p-1} = \phi_1(1),$$
so $\phi_0 - \phi_1$ has the same sign as $2 - p$ everywhere on $]0, 1]$.
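(iii) $\phi_2$ is strictly increasing on $[0, \infty[$. P P For $b > 0$,
$$\begin{aligned}
\phi_2'(b) &= p\bigl((1 + b)^{p-1} - (1 - b)^{p-1}\bigr) > 0 \text{ if } b \le 1, \\
&= p\bigl((1 + b)^{p-1} + (b - 1)^{p-1}\bigr) > 0 \text{ if } b \ge 1. \text{ Q Q}
\end{aligned}$$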

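(iv) If $0 < b \le a$, then
$$\begin{aligned}
\psi_{ab}\bigl(\tfrac{b}{a}\bigr) &= a^p \phi_0\bigl(\tfrac{b}{a}\bigr) + b^p \phi_1\bigl(\tfrac{b}{a}\bigr) \\
&= a^p\bigl(1 + \tfrac{b}{a}\bigr)^{p-1} + a^p\bigl(1 - \tfrac{b}{a}\bigr)^{p-1} + b^p\bigl(\tfrac{a}{b} + 1\bigr)^{p-1} - b^p\bigl(\tfrac{a}{b} - 1\bigr)^{p-1} \\
&= a(a + b)^{p-1} + a(a - b)^{p-1} + b(a + b)^{p-1} - b(a - b)^{p-1} \\
&= (a + b)^p + (a - b)^p = (a + b)^p + |a - b|^p. \qquad (\dagger)
\end{aligned}$$
Also $\psi_{ab}'(t) = \bigl(a^p - \frac{b^p}{t^p}\bigr)\phi_0'(t)$ has the sign of $2 - p$ if $0 < t < \frac{b}{a}$ and the sign of $p - 2$ if $\frac{b}{a} < t < 1$. Accordingly
if $1 < p \le 2$, $\psi_{ab}(t) \le \psi_{ab}(\frac{b}{a}) = (a + b)^p + |a - b|^p$ for every $t \in \,]0, 1]$,
if $p \ge 2$, $\psi_{ab}(t) \ge \psi_{ab}(\frac{b}{a}) = (a + b)^p + |a - b|^p$ for every $t \in \,]0, 1]$.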

(v) Now consider the case $0 < a \le b$. If $1 < p \le 2$,
$$\begin{aligned}
\psi_{ab}(t) = a^p \phi_0(t) + b^p \phi_1(t) &\le a^p \phi_0(t) + b^p \phi_1(t) + (b^p - a^p)(\phi_0(t) - \phi_1(t)) \quad\text{(by (ii))} \\
&= b^p \phi_0(t) + a^p \phi_1(t) \le (b + a)^p + (b - a)^p = (a + b)^p + |a - b|^p
\end{aligned}$$
for every $t \in \,]0, 1]$. If $p \ge 2$, on the other hand,
$$\begin{aligned}
\psi_{ab}(t) = a^p \phi_0(t) + b^p \phi_1(t) &\ge a^p \phi_0(t) + b^p \phi_1(t) + (b^p - a^p)(\phi_0(t) - \phi_1(t)) \\
&= b^p \phi_0(t) + a^p \phi_1(t) \ge (a + b)^p + |a - b|^p
\end{aligned}$$
for every $t$.
(vi) Thus we have the inequalities
$$\begin{aligned}
\psi_{ab}(t) &\le |a + b|^p + |a - b|^p \text{ if } p \in \,]1, 2], \\
&\ge |a + b|^p + |a - b|^p \text{ if } p \in [2, \infty[
\end{aligned} \qquad (*)$$
whenever $t \in \,]0, 1]$ and $a, b \in \,]0, \infty[$. Since $(a, b) \mapsto \psi_{ab}(t)$ is continuous for every $t$, the same inequalities are valid for all $a, b \in [0, \infty[$. And since
$$\psi_{ab}(t) = \psi_{|a|,|b|}(t), \quad |a + b|^p + |a - b|^p = \bigl||a| + |b|\bigr|^p + \bigl||a| - |b|\bigr|^p$$
for all $a, b \in \mathbb{R}$ and $t \in \,]0, 1]$, the inequalities (*) are valid for all $a, b \in \mathbb{R}$ and $t \in \,]0, 1]$.
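
The scalar inequalities (*) of part (a) can be spot-checked on a grid. The Python sketch below is an added illustration, not part of the proof; the helper names `phi0`, `phi1`, `psi`, `target` and `star_holds`, the grid and the tolerance are all assumptions. It evaluates the functions of (a-i) directly and also confirms the equality case (†) at t = b/a for one pair (a, b).

```python
import math


def phi0(t, p):
    return (1 + t) ** (p - 1) + (1 - t) ** (p - 1)


def phi1(t, p):
    return ((1 + t) ** (p - 1) - (1 - t) ** (p - 1)) / t ** (p - 1)


def psi(a, b, t, p):
    return abs(a) ** p * phi0(t, p) + abs(b) ** p * phi1(t, p)


def target(a, b, p):
    return abs(a + b) ** p + abs(a - b) ** p


def star_holds(p, steps=20, tol=1e-9):
    """Check (*) for a, b on a grid in [-2, 2] and t = 1/steps, ..., 1."""
    values = [k / 4.0 - 2.0 for k in range(17)]
    for a in values:
        for b in values:
            rhs = target(a, b, p)
            slack = tol * (1.0 + rhs)      # allowance for rounding error
            for k in range(1, steps + 1):
                lhs = psi(a, b, k / steps, p)
                if p <= 2 and lhs > rhs + slack:
                    return False
                if p >= 2 and lhs < rhs - slack:
                    return False
    return True


results = {p: star_holds(p) for p in (1.5, 2.0, 3.0)}
equality_case = math.isclose(psi(2.0, 1.0, 0.5, 3.0), target(2.0, 1.0, 3.0))
print(results, equality_case)
```

At p = 2 both sides reduce to 2a² + 2b², which is why the two branches of the check can share a tolerance.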
(b) Suppose that $p \ge 2$.
(i)
$$\|u + v\|_p^p + \|u - v\|_p^p \le (\|u\|_p + \|v\|_p)^p + \bigl|\|u\|_p - \|v\|_p\bigr|^p$$
for all $u, v \in L^p$. P P First consider the case $0 < \|v\|_p \le \|u\|_p$. Let $f, g : X \to \mathbb{R}$ be $\Sigma$-measurable functions such that $f^\bullet = u$ and $g^\bullet = v$. Then for any $t \in \,]0, 1]$,
$$\begin{aligned}
\|u + v\|_p^p + \|u - v\|_p^p &= \int |f(x) + g(x)|^p + |f(x) - g(x)|^p \,\mu(dx) \\
&\le \int \psi_{f(x),g(x)}(t) \,\mu(dx) \quad\text{(by the second inequality in (*))} \\
&= \int |f(x)|^p \phi_0(t) + |g(x)|^p \phi_1(t) \,\mu(dx) = \|u\|_p^p \phi_0(t) + \|v\|_p^p \phi_1(t).
\end{aligned}$$
In particular, taking $t = \|v\|_p / \|u\|_p$, and applying (†) from (a-iv),
$$\|u + v\|_p^p + \|u - v\|_p^p \le (\|u\|_p + \|v\|_p)^p + \bigl|\|u\|_p - \|v\|_p\bigr|^p.$$
Of course the result will also be true if $0 < \|u\|_p \le \|v\|_p$, and the case in which either $u$ or $v$ is zero is trivial. Q Q
(ii) Let $\epsilon \in \,]0, 2]$. Set $\delta = 2 - (2^p - \epsilon^p)^{1/p} > 0$. If $u, v \in L^p$, $\|u\|_p = \|v\|_p = 1$ and $\|u - v\|_p \ge \epsilon$, then
$$\|u + v\|_p^p + \epsilon^p \le \|u + v\|_p^p + \|u - v\|_p^p \le (\|u\|_p + \|v\|_p)^p + \bigl|\|u\|_p - \|v\|_p\bigr|^p = 2^p,$$
so $\|u + v\|_p \le (2^p - \epsilon^p)^{1/p} = 2 - \delta$. As $u$, $v$ and $\epsilon$ are arbitrary, $L^p$ is uniformly convex.

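(c) Next suppose that $p \in \,]1, 2]$.
(i)
$$(\|u\|_p + \|v\|_p)^p + \bigl|\|u\|_p - \|v\|_p\bigr|^p \le \|u + v\|_p^p + \|u - v\|_p^p$$
for all $u, v \in L^p$. P P We can repeat all the ideas, and most of the formulae, of (b-i). As before, start with the case $0 < \|v\|_p \le \|u\|_p$. Let $f, g : X \to \mathbb{R}$ be $\Sigma$-measurable functions such that $f^\bullet = u$ and $g^\bullet = v$. Taking $t = \|v\|_p / \|u\|_p$,
$$\begin{aligned}
\|u + v\|_p^p + \|u - v\|_p^p &= \int |f(x) + g(x)|^p + |f(x) - g(x)|^p \,\mu(dx) \\
&\ge \int \psi_{f(x),g(x)}(t) \,\mu(dx) \quad\text{(by the first inequality in (*))} \\
&= \|u\|_p^p \phi_0(t) + \|v\|_p^p \phi_1(t) = (\|u\|_p + \|v\|_p)^p + \bigl|\|u\|_p - \|v\|_p\bigr|^p.
\end{aligned}$$
Similarly if $0 < \|u\|_p \le \|v\|_p$, and the case in which either $u$ or $v$ is zero is trivial. Q Q
(ii) Let $\epsilon > 0$. Set $\gamma = \phi_2(\frac{\epsilon}{2}) > 2$ (see (a-iii) above) and $\delta = 2\bigl(1 - (\frac{2}{\gamma})^{1/p}\bigr) > 0$. Now suppose that $\|u\|_p = \|v\|_p = 1$ and $\|u - v\|_p \ge \epsilon$. Then $\|u + v\|_p \le 2 - \delta$. P P If $u + v = 0$ this is trivial. Otherwise, set $a = \|u + v\|_p$ and $b = \|u - v\|_p$. Then $a \le 2$ and $b \ge \epsilon$, so
$$\begin{aligned}
a^p \gamma = a^p \phi_2\bigl(\tfrac{\epsilon}{2}\bigr) &\le a^p \phi_2\bigl(\tfrac{b}{a}\bigr) \quad\text{(by (a-iii) again)} \\
&= (a + b)^p + |a - b|^p = (\|u + v\|_p + \|u - v\|_p)^p + \bigl|\|u + v\|_p - \|u - v\|_p\bigr|^p \\
&\le \|2u\|_p^p + \|2v\|_p^p \quad\text{(by (i) here)} \\
&= 2^{p+1}
\end{aligned}$$
and $a \le 2\bigl(\frac{2}{\gamma}\bigr)^{1/p} = 2 - \delta$. Q Q As $u$, $v$ and $\epsilon$ are arbitrary, $L^p$ is uniformly convex.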
Remark The inequalities in (b-i) and (c-i) of the proof are called Hanner’s inequalities.
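
Hanner's inequalities are easy to test numerically for counting measure on a finite set, where the L^p norm is the usual p-norm of a vector. The following Python sketch is an added illustration under that assumption; `norm`, `hanner_sides` and `check_hanner` are hypothetical names, and the random sample, seed and tolerance are arbitrary choices.

```python
import random


def norm(u, p):
    """p-norm for counting measure on a finite set."""
    return sum(abs(x) ** p for x in u) ** (1.0 / p)


def hanner_sides(u, v, p):
    """Return (vector side, norm side) of Hanner's inequalities."""
    plus = [a + b for a, b in zip(u, v)]
    minus = [a - b for a, b in zip(u, v)]
    vector_side = norm(plus, p) ** p + norm(minus, p) ** p
    nu, nv = norm(u, p), norm(v, p)
    norm_side = (nu + nv) ** p + abs(nu - nv) ** p
    return vector_side, norm_side


def check_hanner(p, trials=200, dim=4, seed=0):
    """vector side <= norm side if p >= 2; the reverse if 1 < p <= 2."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        vector_side, norm_side = hanner_sides(u, v, p)
        slack = 1e-9 * (1.0 + norm_side)   # allowance for rounding error
        if p >= 2 and vector_side > norm_side + slack:
            return False
        if p <= 2 and norm_side > vector_side + slack:
            return False
    return True


hanner_ok = {p: check_hanner(p) for p in (1.5, 2.0, 4.0)}
print(hanner_ok)
```

For p = 2 both directions hold at once, which is the parallelogram identity.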

244P Complex Lp Let (X, Σ, µ) be any measure space.


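(a) For any $p \in \,]1, \infty[$, set
$$\mathcal{L}^p_{\mathbb{C}} = \mathcal{L}^p_{\mathbb{C}}(\mu) = \{f : f \in \mathcal{L}^0_{\mathbb{C}}(\mu), |f|^p \text{ is integrable}\},$$
$$\begin{aligned}
L^p_{\mathbb{C}} = L^p_{\mathbb{C}}(\mu) &= \{f^\bullet : f \in \mathcal{L}^p_{\mathbb{C}}\} \\
&= \{u : u \in L^0_{\mathbb{C}}(\mu), \operatorname{Re}(u) \in L^p(\mu) \text{ and } \operatorname{Im}(u) \in L^p(\mu)\} \\
&= \{u : u \in L^0_{\mathbb{C}}(\mu), |u| \in L^p(\mu)\}.
\end{aligned}$$
Then $L^p_{\mathbb{C}}$ is a linear subspace of $L^0_{\mathbb{C}}(\mu)$. Set $\|u\|_p = \||u|\|_p = (\int |u|^p)^{1/p}$ for $u \in L^p_{\mathbb{C}}$.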
(b) The proof of 244E(b-i) applies unchanged to complex-valued functions, so taking $q = p/(p - 1)$ we get
$$\|u \times v\|_1 \le \|u\|_p \|v\|_q$$
for all $u \in L^p_{\mathbb{C}}$, $v \in L^q_{\mathbb{C}}$. 244Fa becomes
for every $u \in L^p_{\mathbb{C}}$ there is a $v \in L^q_{\mathbb{C}}$ such that $\|v\|_q \le 1$ and
$$\int u \times v = \Bigl|\int u \times v\Bigr| = \|u\|_p;$$
the same proof works, if you omit all mention of the functional $\tau$, and allow me to write $\operatorname{sgn} a = |a|/a$ for all non-zero complex numbers; it would perhaps be more natural to write $\overline{\operatorname{sgn}}\, a$ in place of $\operatorname{sgn} a$. So, just as before, we find that $\|\ \|_p$ is a norm. We can use the argument of 244G to show that $L^p_{\mathbb{C}}$ is complete. (Alternatively, note that a sequence $\langle u_n \rangle_{n\in\mathbb{N}}$ in $L^0_{\mathbb{C}}$ is Cauchy, or convergent, iff its real and imaginary parts are.) The space $S_{\mathbb{C}}$ of equivalence classes of 'complex-valued simple functions' is dense in $L^p_{\mathbb{C}}$. If $X$ is a subset of $\mathbb{R}^r$ and $\mu$ is Lebesgue measure on $X$, then the space of equivalence classes of continuous complex-valued functions on $X$ with bounded support is dense in $L^p_{\mathbb{C}}$.
(c) The canonical map $T : L^q_{\mathbb{C}} \to (L^p_{\mathbb{C}})^*$, defined by writing $(Tv)(u) = \int u \times v$, is surjective because $T\restriction L^q : L^q \to (L^p)^*$ is surjective; and it is an isometry by the remarks in (b) just above. Thus we can still identify $L^q_{\mathbb{C}}$ with $(L^p_{\mathbb{C}})^*$.
(d) When we come to the complex form of Jensen's inequality, it seems that a new idea is needed. I have relegated this to 242Yk-242Yl. But for the complex form of 244M a simpler argument will suffice. If $(X, \Sigma, \mu)$ is a probability space, $\mathrm{T}$ is a σ-subalgebra of $\Sigma$ and $P : L^1_{\mathbb{C}}(\mu) \to L^1_{\mathbb{C}}(\mu\restriction\mathrm{T})$ is the corresponding conditional expectation operator, then for any $u \in L^p_{\mathbb{C}}$ we shall have
$$|Pu|^p \le (P|u|)^p \le P(|u|^p),$$
applying 242Pc and 244M. So $\|Pu\|_p \le \|u\|_p$, as before.
(e) There is a special point arising with $L^2_{\mathbb{C}}$. We now have to define
$$(u|v) = \int u \times \bar{v}$$
for $u, v \in L^2_{\mathbb{C}}$, so that $(u|u) = \int |u|^2 = \|u\|_2^2$ for every $u$; this means that $(v|u)$ is the complex conjugate of $(u|v)$.
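
The effect of the conjugate in 244P(e) is visible already for a measure on three points. The snippet below is an added illustration, not from the original; the function name `inner`, the point masses `mu` and the vectors are assumptions made for the example.

```python
def inner(u, v, mu):
    """(u|v) = sum over points x of u(x) * conj(v(x)) * mu{x}."""
    return sum(a * b.conjugate() * w for a, b, w in zip(u, v, mu))


mu = [0.5, 1.0, 2.0]                 # hypothetical point masses
u = [1 + 2j, -1j, 3 + 0j]
v = [2 - 1j, 1 + 1j, -0.5j]

uv = inner(u, v, mu)
vu = inner(v, u, mu)
norm_sq = sum(abs(a) ** 2 * w for a, w in zip(u, mu))   # integral of |u|^2

print(uv, vu, inner(u, u, mu), norm_sq)
```

Without the conjugate, (u|u) would in general fail to be real, so it could not serve as the square of a norm.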

244X Basic exercises > (a) Let $(X, \Sigma, \mu)$ be a measure space, and $(X, \hat\Sigma, \hat\mu)$ its completion. Show that $\mathcal{L}^p(\hat\mu) = \mathcal{L}^p(\mu)$ and $L^p(\hat\mu) = L^p(\mu)$ for every $p \in [1, \infty]$.

(b) Let $(X, \Sigma, \mu)$ be a measure space, and $1 \le p \le q \le r \le \infty$. Show that $L^p(\mu) \cap L^r(\mu) \subseteq L^q(\mu) \subseteq L^p(\mu) + L^r(\mu) \subseteq L^0(\mu)$. (See also 244Yh.)

(c) Let $(X, \Sigma, \mu)$ be a measure space. Suppose that $p, q, r \in [1, \infty]$ and that $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$, setting $\frac{1}{\infty} = 0$ as usual. Show that $u \times v \in L^r(\mu)$ and $\|u \times v\|_r \le \|u\|_p \|v\|_q$ whenever $u \in L^p(\mu)$ and $v \in L^q(\mu)$. (Hint: if $r < \infty$ apply Hölder's inequality to $|u|^r \in L^{p/r}$, $|v|^r \in L^{q/r}$.)

> (d)(i) Let $(X, \Sigma, \mu)$ be a probability space. Show that if $1 \le p \le r \le \infty$ then $\|f\|_p \le \|f\|_r$ for every $f \in \mathcal{L}^r(\mu)$. (Hint: use Hölder's inequality to show that $\int |f|^p \le \||f|^p\|_{r/p}$.) In particular, $\mathcal{L}^p(\mu) \supseteq \mathcal{L}^r(\mu)$. (ii) Let $(X, \Sigma, \mu)$ be a measure space such that $\mu E \ge 1$ whenever $E \in \Sigma$ and $\mu E > 0$. (This happens, for instance, when $\mu$ is 'counting measure' on $X$.) Show that if $1 \le p \le r \le \infty$ then $L^p(\mu) \subseteq L^r(\mu)$ and $\|u\|_p \ge \|u\|_r$ for every $u \in L^p(\mu)$. (Hint: look first at the case $\|u\|_p = 1$.)

(e) Let $(X, \Sigma, \mu)$ be a semi-finite measure space, and $p, q \in [1, \infty]$ such that $\frac{1}{p} + \frac{1}{q} = 1$. Show that if $u \in L^0(\mu) \setminus L^p(\mu)$ then there is a $v \in L^q(\mu)$ such that $u \times v \notin L^1(\mu)$. (Hint: reduce to the case $u \ge 0$. Show that in this case there is for each $n \in \mathbb{N}$ a $u_n \le u$ such that $4^n \le \|u_n\|_p < \infty$; take $v_n \in L^q$ such that $\|v_n\|_q \le 2^{-n}$ and $\int u_n \times v_n \ge 2^n$, and set $v = \sum_{n=0}^{\infty} v_n$.)

(f) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i\in I}$ be a family of measure spaces, and $(X, \Sigma, \mu)$ their direct sum (214L). Take any $p \in [1, \infty[$. Show that the canonical isomorphism between $L^0(\mu)$ and $\prod_{i\in I} L^0(\mu_i)$ (241Xd) induces an isomorphism between $L^p(\mu)$ and the subspace
$$\{u : u \in \textstyle\prod_{i\in I} L^p(\mu_i), \|u\| = \bigl(\sum_{i\in I} \|u(i)\|_p^p\bigr)^{1/p} < \infty\}$$
of $\prod_{i\in I} L^p(\mu_i)$.

(g) Let (X, Σ, µ) be a measure space. Set M ∞,1 = L1 (µ) ∩ L∞ (µ). Show that for u ∈ M ∞,1 the function
p 7→ kukp : [1, ∞[ → [0, ∞[ is continuous, and that kuk∞ = limp→∞ kukp . (Hint: consider first the case in which u is
the equivalence class of a simple function.)
(h) Let $\mu$ be counting measure on $X = \{1, 2\}$, so that $\mathcal{L}^0(\mu) = \mathbb{R}^2$ and $L^p(\mu) = L^0(\mu)$ can be identified with $\mathbb{R}^2$ for every $p \in [1, \infty]$. Sketch the unit balls $\{u : \|u\|_p \le 1\}$ in $\mathbb{R}^2$ for $p = 1$, $\frac{3}{2}$, 2, 3 and $\infty$.

(i) Let $\mu$ be counting measure on $X = \{1, 2, 3\}$, so that $\mathcal{L}^0(\mu) = \mathbb{R}^3$ and $L^p(\mu) = L^0(\mu)$ can be identified with $\mathbb{R}^3$ for every $p \in [1, \infty]$. Describe the unit balls $\{u : \|u\|_p \le 1\}$ in $\mathbb{R}^3$ for $p = 1$, 2 and $\infty$.
(j) At which points does the argument of 244Hb break down if we try to apply it to L∞ with k k∞ ?
(k) Let p ∈ [1, ∞[. (i) Show that |ap − bp | ≥ |a − b|p for all a, b ≥ 0. (Hint: for a > b, differentiate both sides with
respect to a.) (ii) Let (X, Σ, µ) be a measure space and U a linear subspace of L0 (µ) such that (α) |u| ∈ U for every
u ∈ U (β) u1/p ∈ U for every u ∈ U (γ) U ∩ L1 is dense in L1 = L1 (µ). Show that U ∩ Lp is dense in Lp = Lp (µ).
(Hint: check first that {u : u ∈ U ∩ L1 , u ≥ 0} is dense in {u : u ∈ L1 , u ≥ 0}.) (iii) Use this to prove 244H from 242M
and 242O.

(l) For any measure space (X, Σ, µ) write M 1,∞ = M 1,∞ (µ) for {v + w : v ∈ L1 (µ), w ∈ L∞ (µ)} ⊆ L0 (µ). Show
that M 1,∞ is a linear subspace of L0 including Lp for every p ∈ [1, ∞], and that if u ∈ L0 , v ∈ M 1,∞ and |u| ≤ |v|
then u ∈ M 1,∞ . (Hint: u = v × w where |w| ≤ χX • .)

(m) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be two measure spaces, and let $\mathcal{T}^+$ be the set of linear operators $T : M^{1,\infty}(\mu) \to M^{1,\infty}(\nu)$ such that (α) $Tu \ge 0$ whenever $u \ge 0$ in $M^{1,\infty}(\mu)$ (β) $Tu \in L^1(\nu)$ and $\|Tu\|_1 \le \|u\|_1$ whenever $u \in L^1(\mu)$ (γ) $Tu \in L^\infty(\nu)$ and $\|Tu\|_\infty \le \|u\|_\infty$ whenever $u \in L^\infty(\mu)$. (i) Show that if $\phi : \mathbb{R} \to \mathbb{R}$ is a convex function such that $\phi(0) = 0$, and $u \in M^{1,\infty}(\mu)$ is such that $\bar\phi(u) \in M^{1,\infty}(\mu)$ (interpreting $\bar\phi : L^0(\mu) \to L^0(\mu)$ as in 241I), then $\bar\phi(Tu) \in M^{1,\infty}(\nu)$ and $\bar\phi(Tu) \le T(\bar\phi(u))$ for every $T \in \mathcal{T}^+$. (ii) Hence show that if $p \in [1, \infty]$ and $u \in L^p(\mu)$, $Tu \in L^p(\nu)$ and $\|Tu\|_p \le \|u\|_p$ for every $T \in \mathcal{T}^+$.

>(n) Let $X$ be any set, and let $\mu$ be counting measure on $X$. In this case it is customary to write $\ell^p(X)$ for $\mathcal{L}^p(\mu)$, and to identify it with $L^p(\mu)$. In particular, $\mathcal{L}^2(\mu)$ becomes identified with $\ell^2(X)$, the space of square-summable functions on $X$. Write out statements and proofs of the results of this section adapted to this special case.

(o) Let (X, Σ, µ) and (Y, T, ν) be measure spaces and φ : X → Y an inverse-measure-preserving function. Show that
the map g 7→ gφ : L0 (ν) → L0 (µ) (241Xg) induces a norm-preserving map from Lp (ν) to Lp (µ) for every p ∈ [1, ∞],
and also a map from M 1,∞ (ν) to M 1,∞ (µ) which belongs to the class T + of 244Xm.
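
For counting measure on a finite set, the behaviour described in 244Xd(ii) and 244Xg can be observed directly: the p-norm of a fixed vector does not increase with p and approaches the supremum norm. The Python lines below are an added illustration of the statements, not a solution to the exercises; the name `lp_norm` and the sample vector are assumptions.

```python
def lp_norm(u, p):
    """p-norm of u for counting measure on a finite set."""
    return sum(abs(x) ** p for x in u) ** (1.0 / p)


u = [3.0, -4.0, 1.0]
exponents = [1, 2, 4, 8, 16, 64]
norms = [lp_norm(u, p) for p in exponents]
sup_norm = max(abs(x) for x in u)

# Non-increasing in p, and bounded below by the supremum norm.
non_increasing = all(a >= b for a, b in zip(norms, norms[1:]))
print(norms, sup_norm, non_increasing)
```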

244Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a measure space, and $(X, \tilde\Sigma, \tilde\mu)$ its c.l.d. version. Show that $\mathcal{L}^p(\mu) \subseteq \mathcal{L}^p(\tilde\mu)$ and that this embedding induces a Banach lattice isomorphism between $L^p(\mu)$ and $L^p(\tilde\mu)$, for every $p \in [1, \infty[$.

(b) Let (X, Σ, µ) be any measure space, and p ∈ [1, ∞[. Show that Lp (µ) has the countable sup property in the
sense of 241Ye. (Hint: 242Yh.)
(c) Suppose that $(X, \Sigma, \mu)$ is a measure space, and that $p \in \,]0, 1[$, $q < 0$ are such that $\frac{1}{p} + \frac{1}{q} = 1$. (i) Show that $ab \ge \frac{1}{p}a^p + \frac{1}{q}b^q$ for all real $a \ge 0$, $b > 0$. (Hint: set $p' = \frac{1}{p}$, $q' = \frac{p'}{p'-1}$, $c = (ab)^p$, $d = b^{-p}$ and apply 244Ea.) (ii) Show that if $f, g \in \mathcal{L}^0(\mu)$ are non-negative and $E = \{x : x \in \operatorname{dom} g, g(x) > 0\}$, then
$$\Bigl(\int_E f^p\Bigr)^{1/p} \Bigl(\int_E g^q\Bigr)^{1/q} \le \int f \times g.$$
(iii) Show that if $f, g \in \mathcal{L}^0(\mu)$ are non-negative, then
$$\Bigl(\int f^p\Bigr)^{1/p} + \Bigl(\int g^p\Bigr)^{1/p} \le \Bigl(\int (f + g)^p\Bigr)^{1/p}.$$

(d) Let $(X, \Sigma, \mu)$ be a measure space, and $Y$ a subset of $X$; write $\mu_Y$ for the subspace measure on $Y$. Show that the canonical map $T$ from $L^0(\mu)$ onto $L^0(\mu_Y)$ (241Yg) induces a surjection from $L^p(\mu)$ onto $L^p(\mu_Y)$ for every $p \in [1, \infty]$, and also a map from $M^{1,\infty}(\mu)$ to $M^{1,\infty}(\mu_Y)$ which belongs to the class $\mathcal{T}^+$ of 244Xm. Show that the following are equiveridical: (i) there is some $p \in [1, \infty[$ such that $T\restriction L^p(\mu)$ is injective; (ii) $T : L^p(\mu) \to L^p(\mu_Y)$ is norm-preserving for every $p \in [1, \infty[$; (iii) $F \cap Y \ne \emptyset$ whenever $F \in \Sigma$ and $0 < \mu F < \infty$.

(e) Let (X, Σ, µ) be any measure space, and p ∈ [1, ∞[. Show that the norm k kp on Lp (µ) is order-continuous in
the sense of 242Yg.

(f ) Let (X, Σ, µ) be any measure space, and p ∈ [1, ∞]. Show that if A ⊆ Lp (µ) is upwards-directed and norm-
bounded, then it is bounded above. (Hint: 242Yf.)

(g) Let (X, Σ, µ) be any measure space, and p ∈ [1, ∞]. Show that if a non-empty set A ⊆ Lp (µ) is upwards-directed
and has a supremum in Lp (µ), then k sup Akp ≤ supu∈A kukp . (Hint: consider first the case 0 ∈ A.)

(h) Let $(X, \Sigma, \mu)$ be a measure space and $u \in L^0(\mu)$. (i) Show that $I = \{p : p \in [1, \infty[, u \in L^p(\mu)\}$ is an interval. Give examples to show that it may be open, closed or half-open. (ii) Show that $p \mapsto p \ln \|u\|_p : I \to \mathbb{R}$ is convex. (Hint: if $p < q$ and $t \in \,]0, 1[$, observe that $\int |u|^{tp+(1-t)q} \le \||u|^{pt}\|_{1/t} \||u|^{q(1-t)}\|_{1/(1-t)}$.) (iii) Show that if $p \le q \le r$ in $I$, then $\|u\|_q \le \max(\|u\|_p, \|u\|_r)$.

(i) Let $[a, b]$ be a non-trivial closed interval in $\mathbb{R}$ and $F : [a, b] \to \mathbb{R}$ a function; take $p \in \,]1, \infty[$. Show that the following are equiveridical: (i) $F$ is absolutely continuous and its derivative $F'$ belongs to $\mathcal{L}^p(\mu)$, where $\mu$ is Lebesgue measure on $[a, b]$ (ii)
$$c = \sup\Bigl\{\sum_{i=1}^{n} \frac{|F(a_i) - F(a_{i-1})|^p}{(a_i - a_{i-1})^{p-1}} : a \le a_0 < a_1 < \ldots < a_n \le b\Bigr\}$$
is finite, and that in this case $c = \|F'\|_p$. (Hint: (i) if $F$ is absolutely continuous and $F' \in \mathcal{L}^p$, use Hölder's inequality to show that $|F(b') - F(a')|^p \le (b' - a')^{p-1} \int_{a'}^{b'} |F'|^p$ whenever $a \le a' \le b' \le b$. (ii) If $F$ satisfies the condition, show that $(\sum_{i=0}^{n} |F(b_i) - F(a_i)|)^p \le c(\sum_{i=0}^{n} (b_i - a_i))^{p-1}$ whenever $a \le a_0 \le b_0 \le a_1 \le \ldots \le b_n \le b$, so that $F$ is absolutely continuous. Take $\langle F_n \rangle_{n\in\mathbb{N}}$ a sequence of polygonal functions approximating $F$; use 223Xj to show that $F_n' \to F'$ a.e., so that $\int |F'|^p \le \sup_{n\in\mathbb{N}} \int |F_n'|^p \le c^p$.)

(j) Let $G$ be an open set in $\mathbb{R}^r$ and write $\mu$ for Lebesgue measure on $G$. Let $C_k(G)$ be the set of continuous functions $f : G \to \mathbb{R}$ such that $\inf\{\|x - y\| : x \in G, f(x) \ne 0, y \in \mathbb{R}^r \setminus G\} > 0$ (counting $\inf \emptyset$ as $\infty$). Show that for any $p \in [1, \infty[$ the set $\{f^\bullet : f \in C_k(G)\}$ is a dense linear subspace of $L^p(\mu)$.

(k) Let U be any Hilbert space. (i) Show that if C ⊆ U is convex (that is, tu + (1 − t)v ∈ C whenever u, v ∈ C and
t ∈ [0, 1]; see 233Xd), closed and not empty, and u ∈ U , then there is a unique v ∈ C such that ku−vk = inf w∈C ku−wk,
and (u − v|v − w) ≥ 0 for every w ∈ C. (ii) Show that if h ∈ U ∗ there is a unique v ∈ U such that h(w) = (w|v) for
every w ∈ U . (Hint: apply (i) with C = {w : h(w) = 1}, u = 0.) (iii) Show that if V ⊆ U is a closed linear subspace
then there is a unique linear projection P on U such that P [U ] = V and (u − P u|v) = 0 for all u ∈ U , v ∈ V (P is
‘orthogonal’). (Hint: take P u to be the point of V nearest to u.)

(l) Let $(X, \Sigma, \mu)$ be a probability space, and $\mathrm{T}$ a σ-subalgebra of $\Sigma$. Use part (iii) of 244Yk to show that there is an orthogonal projection $P : L^2(\mu) \to L^2(\mu\restriction\mathrm{T})$ such that $\int_F Pu = \int_F u$ for every $u \in L^2(\mu)$ and $F \in \mathrm{T}$. Show that $Pu \ge 0$ whenever $u \ge 0$ and that $\int Pu = \int u$ for every $u$, so that $P$ has a unique extension to a continuous operator from $L^1(\mu)$ onto $L^1(\mu\restriction\mathrm{T})$. Use this to develop the theory of conditional expectations without using the Radon-Nikodým theorem.

(m) (Roselli & Willem 02) (i) Set $C = [0, \infty[^2 \subseteq \mathbb{R}^2$. Let $\phi : C \to \mathbb{R}$ be a continuous function such that $\phi(tz) = t\phi(z)$ for all $z \in C$. Show that $\phi$ is convex (definition: 233Xd) iff $t \mapsto \phi(1, t) : [0, \infty[ \to \mathbb{R}$ is convex. (ii) Show that if $p \in \,]1, \infty[$ and $q = \frac{p}{p-1}$ then $(s, t) \mapsto -s^{1/p} t^{1/q}$, $(s, t) \mapsto -(s^{1/p} + t^{1/p})^p : C \to \mathbb{R}$ are convex. (iii) Show that if $p \in [1, 2]$ then $(s, t) \mapsto |s^{1/p} + t^{1/p}|^p + |s^{1/p} - t^{1/p}|^p$ is convex. (iv) Show that if $p \in [2, \infty[$ then $(s, t) \mapsto -|s^{1/p} + t^{1/p}|^p - |s^{1/p} - t^{1/p}|^p$ is convex. (v) Use (ii) and 233Yj to prove Hölder's and Minkowski's inequalities. (vi) Use (iii) and (iv) to prove Hanner's inequalities. (vii) Adapt the method to answer (ii) and (iii) of 244Yc.

(n)(i) Show that any inner product space is uniformly convex. (ii) Let U be a uniformly convex Banach space, C ⊆ U
a non-empty closed convex set, and u ∈ U . Show that there is a unique v0 ∈ C such that ku − v0 k = inf v∈C ku − vk.
(iii) Let U be a uniformly convex Banach space, and A ⊆ U a non-empty bounded set. Set δ0 = inf{δ : there is some
u ∈ U such that A ⊆ B(u, δ) = {v : kv − uk ≤ δ}}. Show that there is a unique u0 ∈ U such that A ⊆ B(u0 , δ0 ).

(o) Let $(X, \Sigma, \mu)$ be a measure space, and $u \in L^0(\mu)$. Suppose that $\langle p_n \rangle_{n\in\mathbb{N}}$ is a sequence in $[1, \infty]$ with limit $p \in [1, \infty]$. Show that if $\limsup_{n\to\infty} \|u\|_{p_n}$ is finite then $\lim_{n\to\infty} \|u\|_{p_n}$ is defined and is equal to $\|u\|_p$.

244 Notes and comments At this point I feel we must leave the investigation of further function spaces. The next
stage would have to be a systematic abstract analysis of general Banach lattices. The Lp spaces give a solid foundation
for such an analysis, since they introduce the basic themes of norm-completeness, order-completeness and identification
of dual spaces. I have tried in the exercises to suggest the importance of the next layer of concepts: order-continuity of
norms and the relationship between norm-boundedness and order-boundedness. What I have not had space to discuss
is the subject of order-preserving linear operators between Riesz spaces, which is the key to understanding the order
structure of the dual spaces here. (But you can make a start by re-reading the theory of conditional expectation
operators in 242J-242L, 243J and 244M.) All these topics are treated in Fremlin 74 and in Chapters 35 and 36 of the
next volume.
I remember that one of my early teachers of analysis said that the $L^p$ spaces (for $p \ne 1, 2, \infty$) had somehow got
into the syllabus and had never been got out again. I would myself call them classics, in the sense that they have
been part of the common experience of all functional analysts since functional analysis began; and while you are at
liberty to dislike them, you can no more ignore them than you can ignore Milton if you are studying English poetry.
Hölder’s inequality, in particular, has a wealth of applications; not only 244F and 244K, but also 244Xc-244Xd and
244Yh-244Yi, for instance.
The Lp spaces, for 1 ≤ p ≤ ∞, form a kind of continuum. In terms of the concepts dealt with here, there is no
distinction to be drawn between different Lp spaces for 1 < p < ∞ except the observation that the norm of L2 is
an inner product norm, corresponding to a Euclidean geometry on its finite-dimensional subspaces. To discriminate
between the other Lp spaces we need much more refined concepts in the geometry of normed spaces.

In terms of the theorems given here, L1 seems closer to the middle range of Lp for 1 < p < ∞ than L∞ does;
thus, for all 1 ≤ p < ∞, we have Lp Dedekind complete (independent of the measure space involved), the space S of
equivalence classes of simple functions is dense in Lp (again, for every measure space), and the dual (Lp )∗ is (almost)
identifiable as another function space. All of these should be regarded as consequences in one way or another of the
order-continuity of the norm of Lp for p < ∞. The chief obstacle to the universal identification of (L1 )∗ with L∞ is
that for non-σ-finite measure spaces the space L∞ can be inadequate, rather than any pathology in the L1 space itself.
(This point, at least, I mean to return to in Volume 3.) There is also the point that for a non-semi-finite measure space
the purely infinite sets can contribute to L∞ without any corresponding contribution to L1 . For 1 < p < ∞, neither
of these problems can arise. Any member of any such Lp is supported entirely by a σ-finite part of the measure space,
and the same applies to the dual – see part (c) of the proof of 244K.
Of course L1 does have a markedly different geometry from the other Lp spaces. The first sign of this is that it is not
reflexive as a Banach space (except when it is finite-dimensional), whereas for 1 < p < ∞ the identifications of (Lp )∗
with Lq and of (Lq )∗ with Lp , where q = p/(p − 1), show that the canonical embedding of Lp in (Lp )∗∗ is surjective,
that is, that Lp is reflexive. But even when L1 is finite-dimensional the unit balls of L1 and L∞ are clearly different in
kind from the unit balls of Lp for 1 < p < ∞; they have corners instead of being smoothly rounded (244Xh-244Xi). A
direct expression of the difference is in 244O. As usual, the case p = 2 is both much more important than the general
case and enormously easier (244Yn(i)); and note how Hanner’s inequalities reverse at p = 2. (See 244Yc for the reversal
of Hölder’s and Minkowski’s inequalities at p = 1.) There are occasions on which it is useful to know that k k1 and
k k∞ can be approximated, in an exactly describable way, by uniformly convex norms (244Yo). I have written out a
proof of 244O based on ingenuity and advanced calculus, like that of 244E. With a bit more about convex sets and
functions, sketched in 233Yf-233Yj, there is a striking alternative proof (244Ym). Of course the proof of 244Ea also
uses convexity, upside down.
The proof of 244K, identifying (Lp )∗ , is a fairly long haul, and it is natural to ask whether we really have to work
so hard, especially since in the case of L2 we have a much easier argument (244Yk). Of course we can go faster if we
know a bit more about Banach lattices (§369 in Volume 3 has the relevant facts), though this route uses some theorems
quite as hard as 244K as given. There are alternative routes using the geometry of the Lp spaces, following the ideas
of 244Yk, but I do not think they are any easier, and the argument I have presented here at least has the virtue of
using some of the same ideas as the identification of (L1 )∗ in 243G. The difference is that whereas in 243G we may
have to piece together a large family of functions gF (part (b-v) of the proof), in 244K there are only countably many
gn ; consequently we can make the argument work for arbitrary measure spaces, not just localizable ones.
The geometry of Hilbert space gives us an approach to conditional expectations which does not depend on the
Radon-Nikodým theorem (244Yl). To turn these ideas into a proof of the Radon-Nikodým theorem itself, however,
requires qualities of determination and ingenuity which can be better employed elsewhere.
The convexity arguments of 233J/242K can be used on many operators besides conditional expectations (see 244Xm).
The class ‘T + ’ described there is not in fact the largest for which these arguments work; I take the ideas farther in
Chapter 37. There is also a great deal more to be said if you put an arbitrary pair of Lp spaces in place of L1 and L∞
in 244Xl. 244Yh is a start, but for the real thing (the ‘Riesz convexity theorem’) I refer you to Zygmund 59, XII.1.11
or Dunford & Schwartz 57, VI.10.11.

245 Convergence in measure


I come now to an important and interesting topology on the spaces $\mathcal{L}^0$ and $L^0$. I start with the definition (245A)
and with properties which echo those of the Lp spaces for p ≥ 1 (245D-245E). In 245G-245J I describe the most useful
relationships between this topology and the norm topologies of the Lp spaces. For σ-finite spaces, it is metrizable
(245Eb) and sequential convergence can be described in terms of pointwise convergence of sequences of functions
(245K-245L).

245A Definitions Let (X, Σ, µ) be a measure space.

(a) For any measurable set $F \subseteq X$ of finite measure, we have a functional $\tau_F$ on $\mathcal{L}^0 = \mathcal{L}^0(\mu)$ defined by setting
$$\tau_F(f) = \int |f| \wedge \chi F$$
for every $f \in \mathcal{L}^0$. (The integral exists in $\mathbb{R}$ because $|f| \wedge \chi F$ belongs to $\mathcal{L}^0$ and is dominated by the integrable function $\chi F$.) Now $\tau_F(f + g) \le \tau_F(f) + \tau_F(g)$ whenever $f, g \in \mathcal{L}^0$. P P We need only observe that
$$\min(|(f + g)(x)|, \chi F(x)) \le \min(|f(x)|, \chi F(x)) + \min(|g(x)|, \chi F(x))$$
for every $x \in \operatorname{dom} f \cap \operatorname{dom} g$, which is almost every $x \in X$. Q Q Consequently, setting $\rho_F(f, g) = \tau_F(f - g)$, we have

$$\rho_F(f, h) = \tau_F((f - g) + (g - h)) \le \tau_F(f - g) + \tau_F(g - h) = \rho_F(f, g) + \rho_F(g, h),$$
$$\rho_F(f, g) = \tau_F(f - g) \ge 0,$$
$$\rho_F(f, g) = \tau_F(f - g) = \tau_F(g - f) = \rho_F(g, f)$$
for all $f, g, h \in \mathcal{L}^0$; that is, $\rho_F$ is a pseudometric on $\mathcal{L}^0$.
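
On a finite measure space the functional of 245A(a) is just a weighted sum of the truncated values min(|f(x)|, 1) over the points of F, and the pseudometric properties can be checked directly. The Python sketch below is an added illustration under that assumption; `tau` and `rho` are hypothetical names mirroring the functionals in the text, and the point masses and functions are invented for the example.

```python
import math


def tau(f, mu, F):
    """Integral of |f| ∧ χF: only points of F count, each capped at 1."""
    return sum(min(abs(f[x]), 1.0) * mu[x] for x in F)


def rho(f, g, mu, F):
    """The pseudometric rho_F(f, g) = tau_F(f - g)."""
    return tau([a - b for a, b in zip(f, g)], mu, F)


mu = [0.5, 2.0, 1.0, 3.0]            # hypothetical point masses on {0, 1, 2, 3}
F = {0, 1}                           # a set of finite measure (mass 2.5)
f = [0.2, 5.0, 7.0, 0.0]
g = [0.2, 5.0, -1.0, 4.0]            # agrees with f on F, differs elsewhere
h = [1.0, 4.5, 0.0, 0.0]

d_fg = rho(f, g, mu, F)              # zero although f != g: only a pseudometric
d_fh = rho(f, h, mu, F)
d_gh = rho(g, h, mu, F)
triangle_ok = d_fh <= d_fg + d_gh + 1e-12
symmetric = math.isclose(d_fh, rho(h, f, mu, F))
print(d_fg, d_fh, triangle_ok, symmetric)
```

Because of the cap at 1, the value of `tau` never exceeds the measure of F, however large f may be.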

(b) The family
$$\{\rho_F : F \in \Sigma, \mu F < \infty\}$$
now defines a topology on $\mathcal{L}^0$ (2A3F); I will call it the topology of convergence in measure on $\mathcal{L}^0$.

(c) If $f, g \in \mathcal{L}^0$ and $f =_{\text{a.e.}} g$, then $|f| \wedge \chi F =_{\text{a.e.}} |g| \wedge \chi F$ and $\tau_F(f) = \tau_F(g)$, for every set $F$ of finite measure. Consequently we have functionals $\bar\tau_F$ on $L^0 = L^0(\mu)$ defined by writing
$$\bar\tau_F(f^\bullet) = \tau_F(f)$$
whenever $f \in \mathcal{L}^0$, $F \in \Sigma$ and $\mu F < \infty$. Corresponding to these we have pseudometrics $\bar\rho_F$ defined by either of the formulae
$$\bar\rho_F(u, v) = \bar\tau_F(u - v), \quad \bar\rho_F(f^\bullet, g^\bullet) = \rho_F(f, g)$$
for $u, v \in L^0$, $f, g \in \mathcal{L}^0$ and $F$ of finite measure. The family of these pseudometrics defines the topology of convergence in measure on $L^0$.

(d) I shall allow myself to say that a sequence (in $\mathcal{L}^0$ or $L^0$) converges in measure if it converges for the topology of convergence in measure (in the sense of 2A3M).

245B Remarks (a) Of course the topologies of $\mathcal{L}^0$, $L^0$ are about as closely related as it is possible for them to be. Not only is the topology of $L^0$ the quotient of the topology on $\mathcal{L}^0$ (that is, a set $G \subseteq L^0$ is open iff $\{f : f^\bullet \in G\}$ is open in $\mathcal{L}^0$), but every open set in $\mathcal{L}^0$ is the inverse image under the quotient map of an open set in $L^0$.

(b) It is convenient to note that if $F_0, \ldots, F_n$ are measurable sets of finite measure with union $F$, then, in the notation of 245A, $\tau_{F_i} \le \tau_F$ for every $i$; this means that a set $G \subseteq \mathcal{L}^0$ is open for the topology of convergence in measure iff for every $f \in G$ we can find a single set $F$ of finite measure and a $\delta > 0$ such that
$$\rho_F(g, f) \le \delta \Longrightarrow g \in G.$$
Similarly, a set $G \subseteq L^0$ is open for the topology of convergence in measure iff for every $u \in G$ we can find a set $F$ of finite measure and a $\delta > 0$ such that
$$\bar\rho_F(v, u) \le \delta \Longrightarrow v \in G.$$

(c) The phrase ‘topology of convergence in measure’ agrees well enough with standard usage when (X, Σ, µ) is
totally finite. But a warning! the phrase ‘topology of convergence in measure’ is also used for the topology defined
by the metric of 245Ye below, even when µX = ∞. I have seen the phrase local convergence in measure used for
the topology of 245A. Most authors ignore non-σ-finite spaces in this context. However I hold that 245D-245E below
are of sufficient interest to make the extension worth while.

245C Pointwise convergence The topology of convergence in measure is almost definable in terms of ‘pointwise
convergence’, which is one of the roots of measure theory. The correspondence is closest in σ-finite measure spaces (see
245K), but there is still a very important relationship in the general case, as follows. Let (X, Σ, µ) be a measure space,
and write ℒ0 = ℒ0(µ), L0 = L0(µ).

(a) If ⟨fn⟩n∈N is a sequence in ℒ0 converging almost everywhere to f ∈ ℒ0, then ⟨fn⟩n∈N → f in measure. PP By
2A3Mc, I have only to show that limn→∞ ρF(fn, f) = 0 whenever µF < ∞. But ⟨|fn − f| ∧ χF⟩n∈N converges to 0 a.e.
and is dominated by the integrable function χF, so by Lebesgue’s Dominated Convergence Theorem
limn→∞ ρF(fn, f) = limn→∞ ∫ |fn − f| ∧ χF = 0. QQ

(b) To formulate a corresponding result applicable to L0, we need the following concept. If ⟨fn⟩n∈N, ⟨gn⟩n∈N are
sequences in ℒ0 such that fn• = gn• for every n, and f, g ∈ ℒ0 are such that f• = g•, and ⟨fn⟩n∈N → f a.e., then
⟨gn⟩n∈N → g a.e., because
\[ \Bigl\{x : x \in \operatorname{dom} f \cap \operatorname{dom} g \cap \bigcap_{n\in\mathbb{N}} \operatorname{dom} f_n \cap \bigcap_{n\in\mathbb{N}} \operatorname{dom} g_n,\ g(x) = f(x) = \lim_{n\to\infty} f_n(x),\ f_n(x) = g_n(x)\ \forall\, n \in \mathbb{N}\Bigr\} \]
is conegligible. Consequently we have a definition applicable to sequences in L0; we can say that, for f, fn ∈ ℒ0,
⟨fn•⟩n∈N is order*-convergent, or order*-converges, to f• iff f =a.e. limn→∞ fn. In this case, of course, ⟨fn⟩n∈N → f
in measure. Thus, in L0, a sequence ⟨un⟩n∈N which order*-converges to u ∈ L0 also converges to u in measure.
Remark I suggest alternative descriptions of order*-convergence in 245Xc; the conditions (iii)-(vi) there are in forms
adapted to more general structures.

(c) For a typical example of a sequence which is convergent in measure without being order*-convergent, consider
the following. Take µ to be Lebesgue measure on [0, 1], and set fn(x) = 2^m if x ∈ [2^−m k, 2^−m (k + 1)], 0 otherwise,
where k = k(n) ∈ N, m = m(n) ∈ N are defined by saying that n + 1 = 2^m + k and 0 ≤ k < 2^m. Then ⟨fn⟩n∈N → 0
for the topology of convergence in measure (since ρF(fn, 0) ≤ 2^−m if F ⊆ [0, 1] is measurable and 2^m − 1 ≤ n), though
⟨fn⟩n∈N is not convergent to 0 almost everywhere (indeed, lim supn→∞ fn = ∞ everywhere).
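
The sequence of (c) can be tabulated directly. The following Python sketch is an illustration only, under stated assumptions: the helper names block_index, f and rho are hypothetical, exact rationals from the standard fractions module stand in for real numbers, and F is taken to be the whole of [0, 1]. It computes m(n) and k(n), the value of ∫ min(|fn|, 1) (the length 2^−m of the supporting interval), and the indices n < 2^10 − 1 at which fn(x) > 0 for the sample point x = 3/10.

```python
from fractions import Fraction


def block_index(n):
    """Return (m, k) with n + 1 = 2**m + k and 0 <= k < 2**m."""
    m = (n + 1).bit_length() - 1
    k = n + 1 - 2**m
    return m, k


def f(n, x):
    """Value of f_n at x: 2**m on [k / 2**m, (k + 1) / 2**m], 0 elsewhere."""
    m, k = block_index(n)
    lo, hi = Fraction(k, 2**m), Fraction(k + 1, 2**m)
    return 2**m if lo <= x <= hi else 0


def rho(n):
    """Integral of min(|f_n|, 1) over [0, 1], i.e. the length of the support."""
    m, _ = block_index(n)
    return Fraction(1, 2**m)


x = Fraction(3, 10)
# Indices n covering the blocks m = 0, ..., 9 at which f_n(x) is non-zero.
hits = [n for n in range(2**10 - 1) if f(n, x) > 0]
print([rho(n) for n in (0, 1, 3, 7, 15)])
print(len(hits), max(f(n, x) for n in hits))
```

For this x each block 2^m − 1 ≤ n < 2^(m+1) − 1 contains exactly one index with fn(x) = 2^m, so the values fn(x) are unbounded along the sequence even though rho(n) tends to 0.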

245D Proposition Let (X, Σ, µ) be any measure space.


(a) The topology of convergence in measure is a linear space topology on L0 = L0 (µ).
(b) The maps ∨, ∧ : L0 × L0 → L0, and u ↦ |u|, u ↦ u+, u ↦ u− : L0 → L0 are all continuous.
(c) The map × : L0 × L0 → L0 is continuous.
(d) For any continuous function h : R → R, the corresponding function h̄ : L0 → L0 (241I) is continuous.
proof (a) The point is that the functionals τ̄F, as defined in 245Ac, satisfy the conditions of 2A5B below. PP Fix a
set F ∈ Σ of finite measure. I noted in 245Aa that
τF(f + g) ≤ τF(f) + τF(g) for all f, g ∈ ℒ0,
so
τ̄F(u + v) ≤ τ̄F(u) + τ̄F(v) for all u, v ∈ L0.
Next,

τ̄F(cu) ≤ τ̄F(u) whenever u ∈ L0 and |c| ≤ 1 (*)

because |cf| ∧ χF ≤a.e. |f| ∧ χF whenever f ∈ ℒ0 and |c| ≤ 1. Finally, given u ∈ L0 and ε > 0, let f ∈ ℒ0 be such that
f• = u. Then
limn→∞ |2^−n f| ∧ χF = 0 a.e.,
so by Lebesgue’s Dominated Convergence Theorem
limn→∞ τ̄F(2^−n u) = limn→∞ ∫ |2^−n f| ∧ χF = 0,
and there is an n such that τ̄F(2^−n u) ≤ ε. It follows (by (*) just above) that τ̄F(cu) ≤ ε whenever |c| ≤ 2^−n. As ε is
arbitrary, limc→0 τ̄F(cu) = 0 for every u ∈ L0; which is the third condition in 2A5B. QQ
Now 2A5B tells us that the topology defined by the τ̄F is a linear space topology.
(b) For any u, v ∈ L0, ||u| − |v|| ≤ |u − v|, so ρ̄F(|u|, |v|) ≤ ρ̄F(u, v) for every set F of finite measure. By 2A3H,
| | : L0 → L0 is continuous. Now
\[ u \vee v = \tfrac12(u + v + |u - v|), \qquad u \wedge v = \tfrac12(u + v - |u - v|), \]
\[ u^+ = u \vee 0, \qquad u^- = (-u) \vee 0. \]
As addition and subtraction are continuous, so are ∨, ∧, u ↦ u+ and u ↦ u−.
(c) Take u0, v0 ∈ L0, F ∈ Σ a set of finite measure and ε > 0. Represent u0 and v0 as f0•, g0• respectively, where f0,
g0 : X → R are Σ-measurable (241Bk). If we set
Fm = {x : x ∈ F, |f0(x)| + |g0(x)| ≤ m},
then ⟨Fm⟩m∈N is a non-decreasing sequence of sets with union F, so there is an m ∈ N such that µ(F \ Fm) ≤ ½ε. Let
δ > 0 be such that (2m + µF)δ^2 + 2δ ≤ ½ε.
Now suppose that u, v ∈ L0 are such that ρ̄F(u, u0) ≤ δ^2 and ρ̄F(v, v0) ≤ δ^2. Let f, g : X → R be measurable
functions such that f• = u and g• = v. Then
µ{x : x ∈ F, |f(x) − f0(x)| ≥ δ} ≤ δ, µ{x : x ∈ F, |g(x) − g0(x)| ≥ δ} ≤ δ,
so that
µ{x : x ∈ F, |f(x) − f0(x)||g(x) − g0(x)| ≥ δ^2} ≤ 2δ
and
\[ \int_F \min(1, |f - f_0| \times |g - g_0|) \le \delta^2 \mu F + 2\delta. \]
Also
|f × g − f0 × g0| ≤ |f − f0| × |g − g0| + |f0| × |g − g0| + |f − f0| × |g0|,
so that
\[ \begin{aligned}
\bar\rho_F(u \times v, u_0 \times v_0) &= \int_F \min(1, |f \times g - f_0 \times g_0|) \\
&\le \tfrac12\epsilon + \int_{F_m} \min(1, |f \times g - f_0 \times g_0|) \\
&\le \tfrac12\epsilon + \int_{F_m} \min(1, |f - f_0| \times |g - g_0| + m|g - g_0| + m|f - f_0|) \\
&\le \tfrac12\epsilon + \int_F \min(1, |f - f_0| \times |g - g_0|) + m\int_F \min(1, |g - g_0|) + m\int_F \min(1, |f - f_0|) \\
&\le \tfrac12\epsilon + \delta^2 \mu F + 2\delta + 2m\delta^2 \le \epsilon.
\end{aligned} \]
As F and ε are arbitrary, × is continuous at (u0, v0); as u0 and v0 are arbitrary, × is continuous.

(d) Take u ∈ L0, F ∈ Σ of finite measure and ε > 0. Then there is a δ > 0 such that ρ̄F(h̄(v), h̄(u)) ≤ ε whenever
ρ̄F(v, u) ≤ δ. PP?? Otherwise, we can find, for each n ∈ N, a vn such that ρ̄F(vn, u) ≤ 4^−n but ρ̄F(h̄(vn), h̄(u)) > ε.
Express u as f• and vn as gn• where f, gn : X → R are measurable. Set

En = {x : x ∈ F, |gn(x) − f(x)| ≥ 2^−n}

for each n. Then ρ̄F(vn, u) ≥ 2^−n µEn, so µEn ≤ 2^−n for each n, and E = ⋂n∈N ⋃m≥n Em is negligible. But
limn→∞ gn(x) = f(x) for every x ∈ F \ E, so (because h is continuous) limn→∞ h(gn(x)) = h(f(x)) for every x ∈ F \ E.
Consequently (by Lebesgue’s Dominated Convergence Theorem, as always)
\[ \lim_{n\to\infty} \bar\rho_F(\bar h(v_n), \bar h(u)) = \lim_{n\to\infty} \int_F \min(1, |h(g_n(x)) - h(f(x))|)\,\mu(dx) = 0, \]
which is impossible. XXQQ
By 2A3H, h̄ is continuous.

Remark I cannot say that the topology of convergence in measure on ℒ0 is a linear space topology solely because (on
the definitions I have chosen) ℒ0 is not in general a linear space.

245E I turn now to the principal theorem relating the properties of the topological linear space L0 (µ) to the
classification of measure spaces in Chapter 21.

Theorem Let (X, Σ, µ) be a measure space. Let T be the topology of convergence in measure on L0 = L0 (µ), as
described in 245A.
(a) (X, Σ, µ) is semi-finite iff T is Hausdorff.
(b) (X, Σ, µ) is σ-finite iff T is metrizable.
(c) (X, Σ, µ) is localizable iff T is Hausdorff and L0 is complete under T.
proof I use the pseudometrics ρF on ℒ0 = ℒ0(µ), ρ̄F on L0 described in 245A.


(a)(i) Suppose that (X, Σ, µ) is semi-finite and that u, v are distinct members of L0. Express them as f• and g•
where f and g are measurable functions from X to R. Then µ{x : f(x) ≠ g(x)} > 0 so, because (X, Σ, µ) is semi-finite,
there is a set F ∈ Σ of finite measure such that µ{x : x ∈ F, f(x) ≠ g(x)} > 0. Now
ρ̄F(u, v) = ∫F min(|f(x) − g(x)|, 1)dx > 0
(see 122Rc). As u and v are arbitrary, T is Hausdorff (2A3L).
(ii) Suppose that T is Hausdorff and that E ∈ Σ, µE > 0. Then u = χE• ≠ 0 so there is an F ∈ Σ such that
µF < ∞ and ρ̄F(u, 0) ≠ 0, that is, µ(E ∩ F) > 0. Now E ∩ F is a non-negligible set of finite measure included in E.
As E is arbitrary, (X, Σ, µ) is semi-finite.
(b)(i) Suppose that (X, Σ, µ) is σ-finite. Let ⟨En⟩n∈N be a non-decreasing sequence of sets of finite measure covering
X. Set
\[ \bar\rho(u, v) = \sum_{n=0}^{\infty} \frac{\bar\rho_{E_n}(u, v)}{1 + 2^n \mu E_n} \]
for u, v ∈ L0. Then ρ̄ is a metric on L0. PP Because every ρ̄En is a pseudometric, so is ρ̄. If ρ̄(u, v) = 0, express u as
f•, v as g• where f, g ∈ ℒ0(µ); then
∫ |f − g| ∧ χEn = ρ̄En(u, v) = 0,
so f = g almost everywhere in En, for every n. Because X = ⋃n∈N En, f =a.e. g and u = v. QQ
If F ∈ Σ and µF < ∞ and ε > 0, take n such that µ(F \ En) ≤ ½ε. If u, v ∈ L0 and ρ̄(u, v) ≤ ε/(2(1 + 2^n µEn)), then
ρ̄F(u, v) ≤ ε. PP Express u as f•, v as g• where f, g ∈ ℒ0. Then
\[ \int |f - g| \wedge \chi E_n = \bar\rho_{E_n}(u, v) \le (1 + 2^n \mu E_n)\,\bar\rho(u, v) \le \frac{\epsilon}{2}, \]
while
\[ \int |f - g| \wedge \chi(F \setminus E_n) \le \mu(F \setminus E_n) \le \frac{\epsilon}{2}, \]
so
\[ \bar\rho_F(u, v) = \int |f - g| \wedge \chi F \le \int |f - g| \wedge \chi E_n + \int |f - g| \wedge \chi(F \setminus E_n) \le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]
QQ
In the other direction, given ε > 0, take n ∈ N such that 2^−n ≤ ½ε; then ρ̄(u, v) ≤ ε whenever ρ̄En(u, v) ≤ ε/(2(n + 1)).
These show that ρ̄ defines the same topology as the ρ̄F (2A3Ib), so that T, the topology defined by the ρ̄F, is
metrizable.
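
The series defining ρ̄ can be explored numerically in a toy case. The sketch below is an illustration under assumptions that are not in the text: µ is counting measure on N, En = {0, . . . , n} (so µEn = n + 1), members of L0 are represented as dictionaries of finitely many non-zero values, and the series is truncated after a fixed number of terms. The names rho_E, term and rho_partial are hypothetical. Each term is at most 2^−n because ρ̄En ≤ µEn, which is why the series converges.

```python
from fractions import Fraction


def rho_E(n, u, v):
    """rho_{E_n}(u, v) = sum over i in E_n = {0, ..., n} of min(|u(i) - v(i)|, 1)."""
    return sum(min(abs(u.get(i, 0) - v.get(i, 0)), 1) for i in range(n + 1))


def term(n, u, v):
    """n-th term of the series; here mu(E_n) = n + 1."""
    return Fraction(rho_E(n, u, v)) / (1 + 2**n * (n + 1))


def rho_partial(u, v, terms=40):
    """Partial sum of the series (a truncation, still a pseudometric)."""
    return sum(term(n, u, v) for n in range(terms))


# Three sample functions with finite support.
u = {0: 1}
v = {}
w = {3: Fraction(1, 2)}
print(float(rho_partial(u, v)), float(rho_partial(u, w)))
```

The truncation is only a device for computing; the tail beyond the chosen number of terms is bounded by a geometric series.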
(ii) Suppose that T is metrizable. Let ρ̄ be a metric defining T. For each n ∈ N there must be a measurable set
Fn of finite measure and a δn > 0 such that
ρ̄Fn(u, 0) ≤ δn =⇒ ρ̄(u, 0) ≤ 2^−n.
Set E = X \ ⋃n∈N Fn. ?? If E is not negligible, then u = χE• ≠ 0; because ρ̄ is a metric, there is an n ∈ N such that
ρ̄(u, 0) > 2^−n; now
µ(E ∩ Fn) = ρ̄Fn(u, 0) > δn.
But E ∩ Fn = ∅. XX
So µE = 0 < ∞. Now X = E ∪ ⋃n∈N Fn is a countable union of sets of finite measure, so µ is σ-finite.
(c) By (a), either hypothesis ensures that µ is semi-finite and that T is Hausdorff.
(i) Suppose that (X, Σ, µ) is localizable. Let ℱ be a Cauchy filter on L0 (2A5F). For each measurable set F of
finite measure, choose a sequence ⟨An(F)⟩n∈N of members of ℱ such that ρ̄F(u, v) ≤ 4^−n for every u, v ∈ An(F) and
every n (2A5G). Choose uFn ∈ ⋂k≤n Ak(F) for each n; then ρ̄F(uF,n+1, uFn) ≤ 4^−n for each n. Express each uFn as
fFn• where fFn is a measurable function from X to R. Then
µ{x : x ∈ F, |fF,n+1(x) − fFn(x)| ≥ 2^−n} ≤ 2^n ρ̄F(uF,n+1, uFn) ≤ 2^−n
for each n. It follows that ⟨fFn⟩n∈N must converge almost everywhere in F. PP Set
Hn = {x : x ∈ F, |fF,n+1(x) − fFn(x)| ≥ 2^−n}.
Then µHn ≤ 2^−n for each n, so
\[ \mu\Bigl(\bigcap_{n\in\mathbb{N}} \bigcup_{m\ge n} H_m\Bigr) \le \inf_{n\in\mathbb{N}} \sum_{m=n}^{\infty} 2^{-m} = 0. \]
If x ∈ F \ ⋂n∈N ⋃m≥n Hm, then there is some k such that x ∈ F \ ⋃m≥k Hm, so that |fF,m+1(x) − fFm(x)| ≤ 2^−m for
every m ≥ k, and ⟨fFn(x)⟩n∈N is Cauchy, therefore convergent. QQ
Set fF(x) = limn→∞ fFn(x) for every x ∈ F such that the limit is defined in R, so that fF is measurable and defined
almost everywhere in F.
If E, F are two sets of finite measure and E ⊆ F, then ρ̄E(uEn, uFn) ≤ 2 · 4^−n for each n. PP An(E) and An(F)
both belong to ℱ, so must have a point w in common; now
\[ \begin{aligned}
\bar\rho_E(u_{En}, u_{Fn}) &\le \bar\rho_E(u_{En}, w) + \bar\rho_E(w, u_{Fn}) \\
&\le \bar\rho_E(u_{En}, w) + \bar\rho_F(w, u_{Fn}) \le 4^{-n} + 4^{-n}.
\end{aligned} \]
QQ
Consequently
µ{x : x ∈ E, |fFn(x) − fEn(x)| ≥ 2^−n} ≤ 2^n ρ̄E(uFn, uEn) ≤ 2^(−n+1)
for each n, and limn→∞ fFn − fEn = 0 almost everywhere in E; so that fE = fF a.e. on E.
Consequently, if E and F are any two sets of finite measure, fE = fF a.e. on E ∩ F, because both are equal almost
everywhere on E ∩ F to fE∪F.
Because µ is localizable, it follows that there is an f ∈ ℒ0 such that f = fE a.e. on E for every measurable set E of
finite measure (213N). Consider u = f• ∈ L0. For any set E of finite measure,
\[ \bar\rho_E(u, u_{En}) = \int_E \min(1, |f(x) - f_{En}(x)|)\,dx = \int_E \min(1, |f_E(x) - f_{En}(x)|)\,dx \to 0 \]
as n → ∞, using Lebesgue’s Dominated Convergence Theorem. Now
\[ \begin{aligned}
\inf_{A\in\mathcal{F}} \sup_{v\in A} \bar\rho_E(v, u) &\le \inf_{n\in\mathbb{N}} \sup_{v\in A_n(E)} \bar\rho_E(v, u) \\
&\le \inf_{n\in\mathbb{N}} \sup_{v\in A_n(E)} \bigl(\bar\rho_E(v, u_{En}) + \bar\rho_E(u, u_{En})\bigr) \\
&\le \inf_{n\in\mathbb{N}} \bigl(4^{-n} + \bar\rho_E(u, u_{En})\bigr) = 0.
\end{aligned} \]
As E is arbitrary, ℱ → u. As ℱ is arbitrary, L0 is complete under T.


(ii) Now suppose that L0 is complete under T and let ℰ be any family of sets in Σ. Let ℰ′ be
{⋃ℰ0 : ℰ0 is a finite subset of ℰ}.
Then the union of any two members of ℰ′ belongs to ℰ′. Let ℱ be the set
{A : A ⊆ L0, A ⊇ AE for some E ∈ ℰ′},
where for E ∈ ℰ′ I write
AE = {χF• : F ∈ ℰ′, F ⊇ E}.
Then ℱ is a filter on L0, because AE ∩ AF = AE∪F for all E, F ∈ ℰ′.
In fact ℱ is a Cauchy filter. PP Let H be any set of finite measure and ε > 0. Set γ = supE∈ℰ′ µ(H ∩ E) and take
E ∈ ℰ′ such that µ(H ∩ E) ≥ γ − ε. Consider AE ∈ ℱ. If F, G ∈ ℰ′ and F ⊇ E, G ⊇ E then

ρ̄H(χF•, χG•) = µ(H ∩ (F△G)) = µ(H ∩ (F ∪ G)) − µ(H ∩ F ∩ G)
≤ γ − µ(H ∩ E) ≤ ε.

Thus ρ̄H(u, v) ≤ ε for all u, v ∈ AE. As H and ε are arbitrary, ℱ is Cauchy. QQ
Because L0 is complete under T, ℱ has a limit w say. Express w as h•, where h : X → R is measurable, and consider
G = {x : h(x) > ½}.
?? If E ∈ ℰ and µ(E \ G) > 0, let F ⊆ E \ G be a set of non-zero finite measure. Then {u : ρ̄F(u, w) < ½µF} belongs
to ℱ, so meets AE; let H ∈ ℰ′ be such that E ⊆ H and ρ̄F(χH•, w) < ½µF. Then
∫F min(1, |1 − h(x)|) = ρ̄F(χH•, w) < ½µF;
but because F ∩ G = ∅, 1 − h(x) ≥ ½ for every x ∈ F, so this is impossible. XX
Thus E \ G is negligible for every E ∈ ℰ.
Now suppose that H ∈ Σ and µ(G \ H) > 0. Then there is an E ∈ ℰ such that µ(E \ H) > 0. PP Let F ⊆ G \ H
be a set of non-zero finite measure. Let u ∈ A∅ be such that ρ̄F(u, w) < ½µF. Then u is of the form χC• for some
C ∈ ℰ′, and
∫F min(1, |χC(x) − h(x)|) < ½µF.
As h(x) ≥ ½ for every x ∈ F, µ(C ∩ F) > 0. But C is a finite union of members of ℰ, so there is an E ∈ ℰ such that
µ(E ∩ F) > 0, and now µ(E \ H) > 0. QQ
As H is arbitrary, G is an essential supremum of ℰ in Σ. As ℰ is arbitrary, (X, Σ, µ) is localizable.

245F Alternative description of the topology of convergence in measure Let us return to arbitrary measure
spaces (X, Σ, µ).
(a) For any F ∈ Σ of finite measure and ε > 0 define τFε : ℒ0 → [0, ∞[ by
τFε(f) = µ∗{x : x ∈ F ∩ dom f, |f(x)| > ε}
for f ∈ ℒ0, taking µ∗ to be the outer measure defined from µ (132B). If f, g ∈ ℒ0 and f =a.e. g, then
{x : x ∈ F ∩ dom f, |f(x)| > ε}△{x : x ∈ F ∩ dom g, |g(x)| > ε}
is negligible, so τFε(f) = τFε(g); accordingly we have a functional from L0 to [0, ∞[, given by
τ̄Fε(u) = τFε(f)
whenever f ∈ ℒ0 and u = f• ∈ L0.

(b) Now τFε is not (except in trivial cases) subadditive, so does not define a pseudometric on ℒ0 or L0. But we can
say that, for f ∈ ℒ0,
τF(f) ≤ ε min(1, ε) =⇒ τFε(f) ≤ ε =⇒ τF(f) ≤ ε(1 + µF).
(The point is that if E ⊆ dom f is a measurable conegligible set such that f↾E is measurable, then
τF(f) = ∫E∩F min(|f(x)|, 1)dx, τFε(f) = µ{x : x ∈ E ∩ F, |f(x)| > ε}.)
This means that a set G ⊆ ℒ0 is open for the topology of convergence in measure iff for every f ∈ G we can find a
set F of finite measure and ε, δ > 0 such that
τFε(g − f) ≤ δ =⇒ g ∈ G.
Of course τFδ(f) ≥ τFε(f) whenever δ ≤ ε, so we can equally say: G ⊆ ℒ0 is open for the topology of convergence in
measure iff for every f ∈ G we can find a set F of finite measure and ε > 0 such that
τFε(g − f) ≤ ε =⇒ g ∈ G.
Similarly, G ⊆ L0 is open for the topology of convergence in measure on L0 iff for every u ∈ G we can find a set F of
finite measure and ε > 0 such that
τ̄Fε(v − u) ≤ ε =⇒ v ∈ G.
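
The two implications in (b) can be checked mechanically on a finite model. The following Python sketch is an illustration under assumptions not in the text: F is a finite set whose points carry positive rational weights (playing the role of µ), f is a list of rational values, and the names tau, tau_eps and check are hypothetical. It tests both implications on randomly generated data with a fixed seed.

```python
from fractions import Fraction
import random


def tau(weights, values):
    """tau_F(f): integral over F of min(|f|, 1) for a finite weighted set F."""
    return sum(w * min(abs(v), 1) for w, v in zip(weights, values))


def tau_eps(weights, values, eps):
    """tau_{F,eps}(f): measure of {x in F : |f(x)| > eps}."""
    return sum(w for w, v in zip(weights, values) if abs(v) > eps)


def check(weights, values, eps):
    """True iff both implications hold for this choice of F, f and eps."""
    mu_F = sum(weights)
    t, te = tau(weights, values), tau_eps(weights, values, eps)
    first = not (t <= eps * min(1, eps)) or te <= eps
    second = not (te <= eps) or t <= eps * (1 + mu_F)
    return first and second


rng = random.Random(0)
ok = True
for _ in range(2000):
    size = rng.randint(1, 6)
    weights = [Fraction(rng.randint(1, 20), 10) for _ in range(size)]
    values = [Fraction(rng.randint(-30, 30), 10) for _ in range(size)]
    eps = Fraction(rng.randint(1, 30), 10)
    ok = ok and check(weights, values, eps)
print(ok)
```

A random search of this kind proves nothing; it only confirms that the constants ε min(1, ε) and ε(1 + µF) are of the right shape.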

(c) It follows at once that a sequence ⟨fn⟩n∈N in ℒ0 = ℒ0(µ) converges in measure to f ∈ ℒ0 iff
limn→∞ µ∗{x : x ∈ F ∩ dom f ∩ dom fn, |fn(x) − f(x)| > ε} = 0
whenever F ∈ Σ, µF < ∞ and ε > 0. Similarly, a sequence ⟨un⟩n∈N in L0 converges in measure to u iff
limn→∞ τ̄Fε(u − un) = 0 whenever µF < ∞ and ε > 0.
(d) In particular, if (X, Σ, µ) is totally finite, ⟨fn⟩n∈N → f in ℒ0 iff
limn→∞ µ∗{x : x ∈ dom f ∩ dom fn, |f(x) − fn(x)| > ε} = 0
for every ε > 0, and ⟨un⟩n∈N → u in L0 iff
limn→∞ τ̄Xε(u − un) = 0
for every ε > 0.

245G Embedding Lp in L0 : Proposition Let (X, Σ, µ) be any measure space. Then for any p ∈ [1, ∞], the
embedding of Lp = Lp (µ) in L0 = L0 (µ) is continuous for the norm topology of Lp and the topology of convergence in
measure on L0 .

proof Suppose that u, v ∈ Lp and that µF < ∞. Then (χF)• belongs to Lq, where q = p/(p − 1) (taking q = 1 if
p = ∞, q = ∞ if p = 1 as usual), and
ρ̄F(u, v) ≤ ∫ |u − v| × (χF)• ≤ ‖u − v‖p ‖χF•‖q
(244Eb). By 2A3H, this is enough to ensure that the embedding u ↦ u : Lp → L0 is continuous.

245H The case of L1 is so important that I go farther with it.


Proposition Let (X, Σ, µ) be a measure space.
(a)(i) If f ∈ ℒ1 = ℒ1(µ) and ε > 0, there are a δ > 0 and a set F ∈ Σ of finite measure such that ∫|f − g| ≤ ε
whenever g ∈ ℒ1, ∫|g| ≤ ∫|f| + δ and ρF(f, g) ≤ δ.
(ii) For any sequence ⟨fn⟩n∈N in ℒ1 and any f ∈ ℒ1, limn→∞ ∫|f − fn| = 0 iff ⟨fn⟩n∈N → f in measure and
lim supn→∞ ∫|fn| ≤ ∫|f|.
(b)(i) If u ∈ L1 = L1(µ) and ε > 0, there are a δ > 0 and a set F ∈ Σ of finite measure such that ‖u − v‖1 ≤ ε
whenever v ∈ L1, ‖v‖1 ≤ ‖u‖1 + δ and ρ̄F(u, v) ≤ δ.
(ii) For any sequence ⟨un⟩n∈N in L1 and any u ∈ L1, ⟨un⟩n∈N → u for ‖ ‖1 iff ⟨un⟩n∈N → u in measure and
lim supn→∞ ‖un‖1 ≤ ‖u‖1.
proof (a)(i) We know that there are a set F of finite measure and an η > 0 such that ∫E |f| ≤ ⅕ε whenever
µ(E ∩ F) ≤ η (225A). Take δ > 0 such that δ(ε + 5µF) ≤ εη and δ ≤ ⅕ε. Then if ∫|g| ≤ ∫|f| + δ and ρF(f, g) ≤ δ, let
G ⊆ dom f ∩ dom g be a conegligible measurable set such that f↾G and g↾G are both measurable. Set
\[ E = \Bigl\{x : x \in F \cap G,\ |f(x) - g(x)| \ge \frac{\epsilon}{\epsilon + 5\mu F}\Bigr\}; \]
then
\[ \delta \ge \rho_F(f, g) \ge \frac{\epsilon}{\epsilon + 5\mu F}\,\mu E, \]
so µE ≤ η. Set H = F \ E, so that µ(F \ H) ≤ η and ∫X\H |f| ≤ ⅕ε. On the other hand, for almost every x ∈ H,
|f(x) − g(x)| ≤ ε/(ε + 5µF), so ∫H |f − g| ≤ ⅕ε and
\[ \int_H |g| \ge \int_H |f| - \tfrac15\epsilon \ge \int |f| - \int_{X\setminus H} |f| - \tfrac15\epsilon \ge \int |f| - \tfrac25\epsilon. \]
Since ∫|g| ≤ ∫|f| + ⅕ε, ∫X\H |g| ≤ ⅗ε. Now this means that
\[ \int |g - f| \le \int_{X\setminus H} |g| + \int_{X\setminus H} |f| + \int_H |g - f| \le \tfrac35\epsilon + \tfrac15\epsilon + \tfrac15\epsilon = \epsilon, \]
as required.
(ii) If limn→∞ ∫|f − fn| = 0, that is, limn→∞ fn• = f• in L1, then by 245G we must have ⟨fn•⟩n∈N → f• in L0,
that is, ⟨fn⟩n∈N → f for the topology of convergence in measure. Also, of course, limn→∞ ∫|fn| = ∫|f|.
In the other direction, if lim supn→∞ ∫|fn| ≤ ∫|f| and ⟨fn⟩n∈N → f for the topology of convergence in measure,
then whenever δ > 0 and µF < ∞ there must be an m ∈ N such that ∫|fn| ≤ ∫|f| + δ, ρF(f, fn) ≤ δ for every n ≥ m;
so (i) tells us that limn→∞ ∫|fn − f| = 0.
(b) This now follows immediately if we express u as f•, v as g• and un as fn•.

245I Remarks (a) I think the phenomenon here is so important that it is worth looking at some elementary
examples.
(i) If µ is counting measure on N, and we set fn(n) = 1, fn(i) = 0 for i ≠ n, then ⟨fn⟩n∈N → 0 in measure, while
∫|fn| = 1 for every n.
(ii) If µ is Lebesgue measure on [0, 1], and we set fn(x) = 2^n for 0 < x ≤ 2^−n, 0 for other x, then again ⟨fn⟩n∈N → 0
in measure, while ∫|fn| = 1 for every n.
(iii) In 245Cc we have another sequence ⟨fn⟩n∈N converging to 0 in measure, while ∫|fn| = 1 for every n. In all these
cases (as required by Fatou’s Lemma, at least in (i) and (ii)) we have ∫|f| ≤ lim infn→∞ ∫|fn|. (The next proposition
shows that this applies to any sequence which is convergent in measure.)
The common feature of these examples is the way in which somehow the fn escape to infinity, either laterally (in
(i)) or vertically (in (iii)) or both (in (ii)). Note that in all three examples we can set fn′ = 2^n fn to obtain a sequence
still converging to 0 in measure, but with limn→∞ ∫|fn′| = ∞.
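
Examples (i) and (ii) reduce to elementary arithmetic, which the following Python sketch records. It is an illustration only: the function names are hypothetical, F in example (i) is taken to be a finite subset of N, and F in example (ii) is taken to be the whole of [0, 1].

```python
from fractions import Fraction


def rho_counting(F, n):
    """Example (i): rho_F(f_n, 0) for counting measure, f_n the indicator of {n}."""
    return sum(1 for i in F if i == n)


def integral_abs(n):
    """Example (ii): integral of |f_n|, height 2**n times width 2**-n."""
    return 2**n * Fraction(1, 2**n)


def rho_lebesgue(n):
    """Example (ii): integral of min(|f_n|, 1), the width of the support."""
    return Fraction(1, 2**n)


def integral_abs_scaled(n):
    """Integral of |f'_n| where f'_n = 2**n * f_n in example (ii)."""
    return 2**n * integral_abs(n)


F = {0, 3, 5}
print([rho_counting(F, n) for n in range(7)])
print(integral_abs(10), rho_lebesgue(10), integral_abs_scaled(10))
```

In example (i) the value rho_counting(F, n) is 0 as soon as n exceeds every member of F, which is the lateral escape; in example (ii) the integral stays at 1 while rho_lebesgue(n) shrinks geometrically.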
(b) In 245H, I have used the explicit formulations ‘limn→∞ ∫|fn − f| = 0’ (for sequences of functions), ‘⟨un⟩n∈N → u
for ‖ ‖1’ (for sequences in L1). These are often expressed by saying that ⟨fn⟩n∈N, ⟨un⟩n∈N are convergent in mean
to f, u respectively.

245J For semi-finite spaces we have a further relationship.


Proposition Let (X, Σ, µ) be a semi-finite measure space. Write L0 = L0(µ), etc.
(a)(i) For any a ≥ 0, the set {f : f ∈ ℒ1, ∫|f| ≤ a} is closed in ℒ0 for the topology of convergence in measure.
(ii) If ⟨fn⟩n∈N is a sequence in ℒ1 which is convergent in measure to f ∈ ℒ0, and lim infn→∞ ∫|fn| < ∞, then f
is integrable and ∫|f| ≤ lim infn→∞ ∫|fn|.
(b)(i) For any a ≥ 0, the set {u : u ∈ L1, ‖u‖1 ≤ a} is closed in L0 for the topology of convergence in measure.
(ii) If ⟨un⟩n∈N is a sequence in L1 which is convergent in measure to u ∈ L0, and lim infn→∞ ‖un‖1 < ∞, then
u ∈ L1 and ‖u‖1 ≤ lim infn→∞ ‖un‖1.

proof (a)(i) Set A = {f : f ∈ ℒ1, ∫|f| ≤ a}, and let g be any member of the closure of A in ℒ0. Let h be any simple
function such that 0 ≤ h ≤a.e. |g|, and ε > 0. If h = 0 then of course ∫h ≤ a. Otherwise, setting F = {x : h(x) > 0}
and M = supx∈X h(x), there is an f ∈ A such that µ∗{x : x ∈ F ∩ dom f ∩ dom g, |f(x) − g(x)| ≥ ε} ≤ ε (245F);
let E ⊇ {x : x ∈ F ∩ dom f ∩ dom g, |f(x) − g(x)| ≥ ε} be a measurable set of measure at most ε. Then h ≤a.e.
|f| + εχF + MχE, so ∫h ≤ a + ε(M + µF). As ε is arbitrary, ∫h ≤ a. But we are supposing that µ is semi-finite,
so this is enough to ensure that g is integrable and that ∫|g| ≤ a (213B), that is, that g ∈ A. As g is arbitrary, A is
closed.
(ii) Now if ⟨fn⟩n∈N is convergent in measure to f, and lim infn→∞ ∫|fn| = a, then for any ε > 0 there is
a subsequence ⟨fn(k)⟩k∈N such that ∫|fn(k)| ≤ a + ε for every k; since ⟨fn(k)⟩k∈N still converges in measure to f,
∫|f| ≤ a + ε. As ε is arbitrary, ∫|f| ≤ a.
(b) As in 245H, this is just a translation of part (a) into the language of L1 and L0.

245K For σ-finite measure spaces, the topology of convergence in measure on L0 is metrizable, so can be described
effectively in terms of convergent sequences; it is therefore important that we have, in this case, a sharp characterisation
of sequential convergence in measure.
Proposition Let (X, Σ, µ) be a σ-finite measure space. Then
(a) a sequence ⟨fn⟩n∈N in ℒ0 converges in measure to f ∈ ℒ0 iff every subsequence of ⟨fn⟩n∈N has a sub-subsequence
converging to f almost everywhere;
(b) a sequence ⟨un⟩n∈N in L0 converges in measure to u ∈ L0 iff every subsequence of ⟨un⟩n∈N has a sub-subsequence
which order*-converges to u.

proof (a)(i) Suppose that ⟨fn⟩n∈N → f, that is, that limn→∞ ∫|f − fn| ∧ χF = 0 for every set F of finite measure. Let
⟨Ek⟩k∈N be a non-decreasing sequence of sets of finite measure covering X, and let ⟨n(k)⟩k∈N be a strictly increasing
sequence in N such that ∫|f − fn(k)| ∧ χEk ≤ 4^−k for every k ∈ N. Then ∑_{k=0}^∞ |f − fn(k)| ∧ χEk is finite almost
everywhere (242E); but limk→∞ fn(k)(x) = f(x) whenever ∑_{k=0}^∞ min(1, |f(x) − fn(k)(x)|) < ∞, so ⟨fn(k)⟩k∈N → f a.e.
(ii) The same applies to every subsequence of ⟨fn⟩n∈N, so that every subsequence of ⟨fn⟩n∈N has a sub-subsequence
converging to f almost everywhere.
(iii) Now suppose that ⟨fn⟩n∈N does not converge to f. Then there is an open set U containing f such that
{n : fn ∉ U} is infinite, that is, ⟨fn⟩n∈N has a subsequence ⟨fn′⟩n∈N with fn′ ∉ U for every n. But now no
sub-subsequence of ⟨fn′⟩n∈N converges to f in measure, so no such sub-subsequence can converge almost everywhere, by
245Ca.
(b) This follows immediately from (a) if we express u as f•, un as fn•.
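
For the sequence of 245Cc the proposition can be seen at work on one explicit subsequence. The Python sketch below is an illustration only: the helper names f and sub are hypothetical, and the subsequence n(j) = 2^j − 1 (for which m = j and k = 0, so that the supporting interval is [0, 2^−j]) is simply one convenient choice. At any x > 0 its values are eventually 0.

```python
from fractions import Fraction


def f(n, x):
    """f_n of 245Cc: 2**m on [k / 2**m, (k + 1) / 2**m], where n + 1 = 2**m + k."""
    m = (n + 1).bit_length() - 1
    k = n + 1 - 2**m
    return 2**m if Fraction(k, 2**m) <= x <= Fraction(k + 1, 2**m) else 0


def sub(j):
    """Index of the j-th term of the chosen subsequence."""
    return 2**j - 1


x = Fraction(3, 10)
values = [f(sub(j), x) for j in range(12)]
print(values)
```

The only exceptional point is x = 0, a negligible set, so this subsequence converges to 0 almost everywhere although the full sequence does not.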

245L Corollary Let (X, Σ, µ) be a σ-finite measure space.


(a) A subset A of ℒ0 = ℒ0(µ) is closed for the topology of convergence in measure iff f ∈ A whenever f ∈ ℒ0 and
there is a sequence ⟨fn⟩n∈N in A such that f =a.e. limn→∞ fn.
(b) A subset A of L0 = L0(µ) is closed for the topology of convergence in measure iff u ∈ A whenever u ∈ L0 and
there is a sequence ⟨un⟩n∈N in A order*-converging to u.
proof (a)(i) If A is closed for the topology of convergence in measure, and ⟨fn⟩n∈N is a sequence in A converging to
f almost everywhere, then ⟨fn⟩n∈N converges to f in measure, so surely f ∈ A (since otherwise all but finitely many
of the fn would have to belong to the open set ℒ0 \ A).
(ii) If A is not closed, there is an f ∈ Ā \ A, where Ā is the closure of A. The topology can be defined by a metric ρ (245Eb), and we can
choose a sequence ⟨fn⟩n∈N in A such that ρ(fn, f) ≤ 2^−n for every n, so that ⟨fn⟩n∈N → f in measure. By 245K,
⟨fn⟩n∈N has a subsequence ⟨fn′⟩n∈N converging a.e. to f, and this witnesses that A fails to satisfy the condition.
(b) This follows immediately, because A ⊆ L0 is closed iff {f : f• ∈ A} is closed in ℒ0.

245M Complex L0 In 241J I briefly discussed the adaptations needed to construct the complex linear space L0C.
The formulae of 245A may be used unchanged to define topologies of convergence in measure on ℒ0C and L0C. I think
that every word of 245B-245L still applies if we replace each ℒ0 or L0 with ℒ0C or L0C. Alternatively, to relate the ‘real’
and ‘complex’ forms of 245E, for instance, we can observe that because

max(ρF (Re(u), Re(v)), ρF (Im(u), Im(v))) ≤ ρF (u, v)


≤ ρF (Re(u), Re(v)) + ρF (Im(u), Im(v))
for all u, v ∈ L0 and all sets F of finite measure, L0C can be identified, as uniform space, with L0 × L0 , so is Hausdorff,
or metrizable, or complete iff L0 is.

245X Basic exercises > (a) Let X be any set, and µ counting measure on X. Show that the topology of convergence
in measure on ℒ0(µ) = R^X is just the product topology on R^X regarded as a product of copies of R.

> (b) Let (X, Σ, µ) be any measure space, and (X, Σ̂, µ̂) its completion. Show that the topologies of convergence in
measure on L0 (µ) = L0 (µ̂) (241Xb), corresponding to the families {ρF : F ∈ Σ, µF < ∞}, {ρF : F ∈ Σ̂, µ̂F < ∞} are
the same.

> (c) Let (X, Σ, µ) be any measure space; set L0 = L0 (µ). Let u, un ∈ L0 for n ∈ N. Show that the following are
equiveridical:
(i) ⟨un⟩n∈N order*-converges to u in the sense of 245C;
(ii) there are measurable functions f, fn : X → R such that f• = u, fn• = un for every n ∈ N, and f(x) =
limn→∞ fn(x) for every x ∈ X;
(iii) u = infn∈N supm≥n um = supn∈N infm≥n um, the infima and suprema being taken in L0;
(iv) infn∈N supm≥n |u − um| = 0 in L0;
(v) there is a non-increasing sequence ⟨vn⟩n∈N in L0 such that infn∈N vn = 0 in L0 and u − vn ≤ un ≤ u + vn for
every n ∈ N;
(vi) there are sequences ⟨vn⟩n∈N, ⟨wn⟩n∈N in L0 such that ⟨vn⟩n∈N is non-decreasing, ⟨wn⟩n∈N is non-increasing,
supn∈N vn = u = infn∈N wn and vn ≤ un ≤ wn for every n ∈ N.

(d) Let (X, Σ, µ) be a semi-finite measure space. Show that a sequence ⟨un⟩n∈N in L0 = L0(µ) is order*-convergent
to u ∈ L0 iff {|un| : n ∈ N} is bounded above in L0 and ⟨supm≥n |um − u|⟩n∈N → 0 for the topology of convergence in
measure.

(e) Write out proofs that L0 (µ) is complete (as linear topological space) adapted to the special cases (i) µX = 1
(ii) µ is σ-finite, taking advantage of any simplifications you can find.

(f) Let (X, Σ, µ) be a measure space and r ≥ 1; let h : R^r → R be a continuous function. (i) Suppose that for 1 ≤ k ≤
r we are given a sequence ⟨fkn⟩n∈N in ℒ0 = ℒ0(µ) converging in measure to fk ∈ ℒ0. Show that ⟨h(f1n, . . . , frn)⟩n∈N
converges in measure to h(f1, . . . , fr). (ii) Generally, show that (f1, . . . , fr) ↦ h(f1, . . . , fr) : (ℒ0)^r → ℒ0 is continuous
for the topology of convergence in measure. (iii) Show that the corresponding function h̄ : (L0)^r → L0 (241Xh) is
continuous for the topology of convergence in measure.

(g) Let (X, Σ, µ) be a measure space and u ∈ L1(µ). Show that v ↦ ∫ u × v : L∞ → R is continuous for the topology
of convergence in measure on the unit ball of L∞, but not, as a rule, on the whole of L∞.

(h) Let (X, Σ, µ) be a measure space and v a non-negative member of L1 = L1(µ). Show that on the set A = {u :
u ∈ L1, |u| ≤ v} the subspace topologies (2A3C) induced by the norm topology of L1 and the topology of convergence
in measure are the same. (Hint: given ε > 0, take F ∈ Σ of finite measure and M ≥ 0 such that ∫(|v| − MχF•)+ ≤ ε.
Show that ‖u − u′‖1 ≤ ε + Mρ̄F(u, u′) for all u, u′ ∈ A.)

(i) Let (X, Σ, µ) be a measure space and ℱ a filter on L1 = L1(µ) which is convergent, for the topology of convergence
in measure, to u ∈ L1. Show that ℱ → u for the norm topology of L1 iff infA∈ℱ supv∈A ‖v‖1 ≤ ‖u‖1.

(j) Let (X, Σ, µ) be a measure space and p ∈ [1, ∞[. Suppose that ⟨un⟩n∈N is a sequence in Lp(µ) which converges
for ‖ ‖p to u ∈ Lp(µ). Show that ⟨|un|^p⟩n∈N → |u|^p for ‖ ‖1. (Hint: 245G, 245Dd, 245H.)

> (k) Let (X, Σ, µ) be a semi-finite measure space and p ∈ [1, ∞], a ≥ 0. Show that {u : u ∈ Lp(µ), ‖u‖p ≤ a} is
closed in L0(µ) for the topology of convergence in measure.

(l) Let (X, Σ, µ) be a measure space, and ⟨un⟩n∈N a sequence in Lp = Lp(µ), where 1 ≤ p < ∞. Let u ∈ Lp. Show
that the following are equiveridical: (i) u = limn→∞ un for the norm topology of Lp (ii) ⟨un⟩n∈N → u for the topology
of convergence in measure and limn→∞ ‖un‖p = ‖u‖p (iii) ⟨un⟩n∈N → u for the topology of convergence in measure
and lim supn→∞ ‖un‖p ≤ ‖u‖p.

(m) Let X be a set and µ, ν two measures on X with the same measurable sets and the same negligible sets. (i)
Show that ℒ0(µ) = ℒ0(ν) and L0(µ) = L0(ν). (ii) Show that if both µ and ν are semi-finite, then they define the same
topology of convergence in measure on ℒ0 and L0. (Hint: use 215A to show that if µE < ∞ then µE = sup{µF : F ⊆
E, νF < ∞}.)

245Y Further exercises (a) Let (X, Σ, µ) be a measure space and give Σ the topology described in 232Ya. Show
that χ : Σ → L0 (µ) is a homeomorphism between Σ and its image χ[Σ] in L0 , if L0 is given the topology of convergence
in measure and χ[Σ] the subspace topology.

(b) Let (X, Σ, µ) be a measure space and Y any subset of X; let µY be the subspace measure on Y . Let T :
L0 (µ) → L0 (µY ) be the canonical map defined by setting T (f • ) = (f ↾ Y )• for every f ∈ L0 (µ) (241Yg). Show that T
is continuous for the topologies of convergence in measure on L0 (µ) and L0 (µY ).

(c) Let (X, Σ, µ) be a measure space, and µ̃ the c.l.d. version of µ. Show that the map T : L0 (µ) → L0 (µ̃) induced
by the inclusion L0 (µ) ⊆ L0 (µ̃) (241Yf) is continuous for the topologies of convergence in measure.

(d) Let (X, Σ, µ) be a measure space, and give L0 = L0(µ) the topology of convergence in measure. Let A ⊆ L0 be
a non-empty downwards-directed set, and suppose that inf A = 0 in L0. (i) Let F ∈ Σ be any set of finite measure,
and define τ̄F as in 245A; show that infu∈A τ̄F(u) = 0. (Hint: set γ = infu∈A τ̄F(u); find a non-increasing sequence
⟨un⟩n∈N in A such that limn→∞ τ̄F(un) = γ; set v = (χF)• ∧ infn∈N un and show that u ∧ v = v for every u ∈ A, so
that v = 0.) (ii) Show that if U is any open set containing 0, there is a u ∈ A such that v ∈ U whenever 0 ≤ v ≤ u.

(e) Let (X, Σ, µ) be a measure space. (i) Show that for u ∈ L0 = L0(µ) we may define ψa(u), for a ≥ 0, by setting
ψa(u) = µ{x : |f(x)| ≥ a} whenever f : X → R is a measurable function and f• = u. (ii) Define ρ : L0 × L0 → [0, 1]
by setting ρ(u, v) = min({1} ∪ {a : a ≥ 0, ψa(u − v) ≤ a}). Show that ρ is a metric on L0, that L0 is complete under
ρ, and that +, −, ∧, ∨ : L0 × L0 → L0 are continuous for ρ. (iii) Show that c ↦ cu : R → L0 is continuous for every
u ∈ L0 iff (X, Σ, µ) is totally finite, and that in this case ρ defines the topology of convergence in measure on L0.

(f ) Let (X, Σ, µ) be a localizable measure space and A ⊆ L0 = L0 (µ) a non-empty upwards-directed set which is
bounded in the linear topological space sense (i.e., such that for every neighbourhood U of 0 in L0 there is a k ∈ N
such that A ⊆ kU ). Show that A is bounded above in L0 , and that its supremum belongs to its closure.

(g) Let (X, Σ, µ) be a measure space, p ∈ [1, ∞[ and v a non-negative member of Lp = Lp (µ). Show that on the set
A = {u : u ∈ Lp , |u| ≤ v} the subspace topologies induced by the norm topology of Lp and the topology of convergence
in measure are the same.

(h) Let $S$ be the set of all sequences $s : \mathbb{N} \to \mathbb{N}$ such that $\lim_{n\to\infty} s(n) = \infty$. For every $s \in S$, let $(X_s, \Sigma_s, \mu_s)$
be [0, 1] with Lebesgue measure, and let (X, Σ, µ) be the direct sum of $\langle (X_s, \Sigma_s, \mu_s)\rangle_{s\in S}$ (214L). For $s \in S$, $t \in [0, 1]$,
$n \in \mathbb{N}$ set $h_n(s, t) = f_{s(n)}(t)$, where $\langle f_n\rangle_{n\in\mathbb{N}}$ is the sequence of 245Cc. Show that $\langle h_n\rangle_{n\in\mathbb{N}} \to 0$ for the topology of
convergence in measure on $\mathcal{L}^0(\mu)$, but that $\langle h_n\rangle_{n\in\mathbb{N}}$ has no subsequence which is convergent to 0 almost everywhere.

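(i) Let $X$ be a set, and suppose we are given a relation $\rightharpoonup$ between sequences in $X$ and members of $X$ such that
(α) if $x_n = x$ for every $n$ then $\langle x_n\rangle_{n\in\mathbb{N}} \rightharpoonup x$ (β) $\langle x'_n\rangle_{n\in\mathbb{N}} \rightharpoonup x$ whenever $\langle x_n\rangle_{n\in\mathbb{N}} \rightharpoonup x$ and $\langle x'_n\rangle_{n\in\mathbb{N}}$ is a subsequence
of $\langle x_n\rangle_{n\in\mathbb{N}}$. Show that we have a topology $\mathfrak{T}$ on $X$ defined by saying that a subset $G$ of $X$ belongs to $\mathfrak{T}$ iff whenever
$\langle x_n\rangle_{n\in\mathbb{N}}$ is a sequence in $X$ and $\langle x_n\rangle_{n\in\mathbb{N}} \rightharpoonup x \in G$ then some $x_n$ belongs to $G$. Show that a sequence $\langle x_n\rangle_{n\in\mathbb{N}}$ in $X$ is
$\mathfrak{T}$-convergent to $x$ iff every subsequence of $\langle x_n\rangle_{n\in\mathbb{N}}$ has a sub-subsequence $\langle x''_n\rangle_{n\in\mathbb{N}}$ such that $\langle x''_n\rangle_{n\in\mathbb{N}} \rightharpoonup x$.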

(j) Let µ be Lebesgue measure on $\mathbb{R}^r$. Show that $L^0(\mu)$ is separable for the topology of convergence in measure.
(Hint: 244I.)

245 Notes and comments In this section I am inviting you to regard the topology of (local) convergence in measure
as the standard topology on L0 , just as the norms define the standard topologies on Lp spaces for p ≥ 1. The definition
I have chosen is designed to make addition and scalar multiplication and the operations ∨, ∧ and × continuous (245D);
see also 245Xf. From the point of view of functional analysis these properties are more important than metrizability
or even completeness.
Just as the algebraic and order structure of L0 can be described in terms of the general theory of Riesz spaces,
the more advanced results 241G and 245E also have interpretations in the general theory. It is not an accident that
(for semi-finite measure spaces) L0 is Dedekind complete iff it is complete as uniform space; you may find the relevant
generalizations in 23K and 24E of Fremlin 74. Of course it is exactly because the two kinds of completeness are
interrelated that I feel it necessary to use the phrase ‘Dedekind completeness’ to distinguish this particular kind of
order-completeness from the more familiar uniformity-completeness described in 2A5F.
The usefulness of the topology of convergence in measure derives in large part from 245G-245J and the $L^p$ versions
245Xk and 245Xl. Some of the ideas here can be related to a question arising out of the basic convergence theorems.
If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence of integrable functions converging (pointwise) to a function $f$, in what ways can $\int f$ fail to be
$\lim_{n\to\infty} \int f_n$? In the language of this section, this translates into: if we have a sequence (or filter) in $L^1$ converging
for the topology of convergence in measure, in what ways can it fail to converge for the norm topology of $L^1$? The
first answer is Lebesgue’s Dominated Convergence Theorem: this cannot happen if the sequence is dominated, that
is, lies within some set of the form $\{u : |u| \le v\}$ where $v \in L^1$. (See 245Xh and 245Yg.) I will return to this in the
next section. For the moment, though, 245H tells us that if $\langle u_n\rangle_{n\in\mathbb{N}}$ converges in measure to $u \in L^1$, but not for the
topology of $L^1$, it is because $\limsup_{n\to\infty} \|u_n\|_1$ is too big; some of its weight is being lost at infinity, as in the examples
of 245I. If $\langle u_n\rangle_{n\in\mathbb{N}}$ actually order*-converges to $u$, then Fatou’s Lemma tells us that $\liminf_{n\to\infty} \|u_n\|_1 \ge \|u\|_1$, that
is, that the limit cannot have greater weight (as measured by $\|\ \|_1$) than the sequence provides. 245J and 245Xk are
generalizations of this to convergence in measure. If you want a generalization of B.Levi’s theorem, then 242Yf remains
the best expression in the language of this chapter; but 245Yf is a version in terms of the concepts of the present
section.
In the case of σ-finite spaces, we have an alternative description of the topology of convergence in measure (245L)
which makes no use of any of the functionals or pseudo-metrics in 245A. This can be expressed, at least in the context
of L0 , in terms of a standard result from general topology (245Yi). You will see that that result gives a recipe for a
topology on L0 which could be applied in any measure space. What is remarkable is that for σ-finite spaces we get a
linear space topology.

246 Uniform integrability


The next topic is a fairly specialized one, but it is of great importance, for different reasons, in both probability
theory and functional analysis, and it therefore seems worth while giving a proper treatment straight away.

246A Definition Let (X, Σ, µ) be a measure space.

(a) A set $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable if for every $\epsilon > 0$ we can find a set $E \in \Sigma$, of finite measure, and an
$M \ge 0$ such that
$\int (|f| - M\chi E)^+ \le \epsilon$ for every $f \in A$.

(b) A set $A \subseteq L^1(\mu)$ is uniformly integrable if for every $\epsilon > 0$ we can find a set $E \in \Sigma$, of finite measure, and an
$M \ge 0$ such that
$\int (|u| - M\chi E^\bullet)^+ \le \epsilon$ for every $u \in A$.
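
To see the definition at work, here is a standard illustrative computation, offered as a sketch rather than as part of the formal development. The measure space ([0, 1] with Lebesgue measure) and the functions $f_n = n\chi[0, 1/n]$ are assumptions chosen for the example. The family is bounded for $\|\ \|_1$ but appears not to be uniformly integrable.

```latex
% Illustrative sketch: X = [0,1], \mu = Lebesgue measure, f_n = n\chi[0,1/n] (assumed example).
\[
  \int |f_n| \, d\mu = n \cdot \tfrac1n = 1 \quad\text{for every } n \ge 1 .
\]
% Since \chi E \le \chi X, replacing E by X can only decrease the integrand,
% so for any E \in \Sigma, M \ge 0 and n \ge M:
\[
  \int (|f_n| - M\chi E)^+ \, d\mu \;\ge\; \int (|f_n| - M\chi X)^+ \, d\mu
  = (n - M) \cdot \tfrac1n = 1 - \tfrac{M}{n} .
\]
% Hence \sup_n \int (|f_n| - M\chi E)^+ \ge 1 for every choice of M and E,
% and the condition of 246Aa fails for every \epsilon < 1.
```

By contrast, any set of functions dominated by a single integrable function satisfies the definition, by 246Bb applied to the dominating function.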

246B Remarks (a) Recall the formulae from 241Ef: $u^+ = u \vee 0$, so $(u - v)^+ = u - u \wedge v$.

(b) The phrase ‘uniformly integrable’ is not particularly helpful. But of course we can observe that for any particular
integrable function $f$, there are simple functions approximating $f$ for $\|\ \|_1$ (242M), and such functions will be bounded
(in modulus) by functions of the form $M\chi E$, with $\mu E < \infty$; thus singleton subsets of $\mathcal{L}^1$ and $L^1$ are uniformly
integrable. A general uniformly integrable set of functions is one in which $M$ and $E$ can be chosen uniformly over the
set.

(c) It will I hope be clear from the definitions that $A \subseteq \mathcal{L}^1$ is uniformly integrable iff $\{f^\bullet : f \in A\} \subseteq L^1$ is uniformly
integrable.

(d) There is a useful simplification in the definition if $\mu X < \infty$ (in particular, if (X, Σ, µ) is a probability space).
In this case a set $A \subseteq L^1(\mu)$ is uniformly integrable iff
$\inf_{M \ge 0} \sup_{u\in A} \int (|u| - Me)^+ = 0$
iff
$\lim_{M\to\infty} \sup_{u\in A} \int (|u| - Me)^+ = 0$,
writing $e = \chi X^\bullet \in L^1(\mu)$. (For if $\sup_{u\in A} \int (|u| - M\chi E^\bullet)^+ \le \epsilon$, then $\int (|u| - M'e)^+ \le \epsilon$ for every $M' \ge M$.) Similarly,
$A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable iff
$\lim_{M\to\infty} \sup_{f\in A} \int (|f| - M\chi X)^+ = 0$
iff
$\inf_{M \ge 0} \sup_{f\in A} \int (|f| - M\chi X)^+ = 0$.

Warning! Some authors use the phrase ‘uniformly integrable’ for sets satisfying the conditions in (d) even when µ is
not totally finite.
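
The force of the warning can be seen in a simple example, sketched here under the assumption that µ is Lebesgue measure on $\mathbb{R}$ and $f_n = \chi[n, n+1]$ (both chosen only for illustration). The conditions displayed in (d) hold trivially, while the definition 246A appears to fail.

```latex
% Assumed example: \mu = Lebesgue measure on \mathbb{R}, f_n = \chi[n,n+1] for n \in \mathbb{N}.
% The displayed condition of (d) holds, because |f_n| \le 1:
\[
  \sup_{n} \int (|f_n| - M\chi X)^+ \, d\mu = 0 \quad\text{for every } M \ge 1 .
\]
% But for E of finite measure and M \ge 0, (|f_n| - M\chi E)^+ = 1 on [n,n+1] \setminus E, so
\[
  \int (|f_n| - M\chi E)^+ \, d\mu \;\ge\; \mu([n,n+1] \setminus E)
  = 1 - \mu(E \cap [n,n+1]) \;\longrightarrow\; 1 \quad (n \to \infty),
\]
% because \sum_{n} \mu(E \cap [n,n+1]) \le \mu E < \infty.
% So the condition of 246Aa fails for every \epsilon < 1.
```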

246C We have the following wide-ranging stability properties of the class of uniformly integrable sets in $L^1$ or $\mathcal{L}^1$.
Proposition Let (X, Σ, µ) be a measure space and $A$ a uniformly integrable subset of $L^1(\mu)$.
(a) $A$ is bounded for the norm $\|\ \|_1$.
(b) Any subset of $A$ is uniformly integrable.
(c) For any $a \in \mathbb{R}$, $aA = \{au : u \in A\}$ is uniformly integrable.
(d) There is a uniformly integrable $C \supseteq A$ such that $C$ is convex and $\|\ \|_1$-closed and $v \in C$ whenever $u \in C$ and
$|v| \le |u|$.
(e) If $B$ is another uniformly integrable subset of $L^1$, then $A \cup B$ and $A + B = \{u + v : u \in A, v \in B\}$ are uniformly
integrable.
proof Write $\Sigma^f$ for $\{E : E \in \Sigma, \mu E < \infty\}$.
(a) There must be $E \in \Sigma^f$, $M \ge 0$ such that $\int (|u| - M\chi E^\bullet)^+ \le 1$ for every $u \in A$; now
$\|u\|_1 \le \int (|u| - M\chi E^\bullet)^+ + \int M\chi E^\bullet \le 1 + M\mu E$
for every $u \in A$, so $A$ is bounded.
(b) This is immediate from the definition 246Ab.
(c) Given $\epsilon > 0$, we can find $E \in \Sigma^f$, $M \ge 0$ such that $|a| \int (|u| - M\chi E^\bullet)^+ \le \epsilon$ for every $u \in A$; now
$\int (|v| - |a|M\chi E^\bullet)^+ \le \epsilon$ for every $v \in aA$.
(d) If $A$ is empty, take $C = A$. Otherwise, try
$C = \{v : v \in L^1(\mu), \int (|v| - w)^+ \le \sup_{u\in A} \int (|u| - w)^+ \text{ for every } w \in L^1(\mu)\}$.
Evidently $A \subseteq C$, and $C$ satisfies the definition 246Ab because $A$ does, considering $w$ of the form $M\chi E^\bullet$ where $E \in \Sigma^f$
and $M \ge 0$. The functionals
$v \mapsto \int (|v| - w)^+ : L^1(\mu) \to \mathbb{R}$
are all continuous for $\|\ \|_1$ (because the operators $v \mapsto |v|$, $v \mapsto v - w$, $v \mapsto v^+$, $v \mapsto \int v$ are continuous), so $C$ is closed.
If $|v'| \le |v|$ and $v \in C$, then
$\int (|v'| - w)^+ \le \int (|v| - w)^+ \le \sup_{u\in A} \int (|u| - w)^+$
for every $w$, and $v' \in C$. If $v = av_1 + bv_2$ where $v_1, v_2 \in C$, $a \in [0, 1]$ and $b = 1 - a$, then $|v| \le a|v_1| + b|v_2|$, so
$|v| - w \le (a|v_1| - aw) + (b|v_2| - bw) \le (a|v_1| - aw)^+ + (b|v_2| - bw)^+$
and
$(|v| - w)^+ \le a(|v_1| - w)^+ + b(|v_2| - w)^+$
for every $w$; accordingly
\[
\begin{aligned}
\int (|v| - w)^+ &\le a \int (|v_1| - w)^+ + b \int (|v_2| - w)^+ \\
&\le (a + b) \sup_{u\in A} \int (|u| - w)^+ = \sup_{u\in A} \int (|u| - w)^+
\end{aligned}
\]
for every $w$, and $v \in C$.
Thus $C$ has all the required properties.
(e) I show first that $A \cup B$ is uniformly integrable. P Given $\epsilon > 0$, let $M_1, M_2 \ge 0$ and $E_1, E_2 \in \Sigma^f$ be such that
$\int (|u| - M_1\chi E_1^\bullet)^+ \le \epsilon$ for every $u \in A$,
$\int (|u| - M_2\chi E_2^\bullet)^+ \le \epsilon$ for every $u \in B$.
Set $M = \max(M_1, M_2)$, $E = E_1 \cup E_2$; then $\mu E < \infty$ and
$\int (|u| - M\chi E^\bullet)^+ \le \epsilon$ for every $u \in A \cup B$.
As $\epsilon$ is arbitrary, $A \cup B$ is uniformly integrable. Q
Now (d) tells us that there is a convex uniformly integrable set $C$ including $A \cup B$, and in this case $A + B \subseteq 2C$, so
$A + B$ is also uniformly integrable, using (b) and (c).

246D Proposition Let (X, Σ, µ) be a probability space and $A \subseteq L^1(\mu)$ a uniformly integrable set. Then there is
a convex, $\|\ \|_1$-closed uniformly integrable set $C \subseteq L^1$ such that $A \subseteq C$, $w \in C$ whenever $v \in C$ and $|w| \le |v|$, and
$Pv \in C$ whenever $v \in C$ and $P$ is the conditional expectation operator associated with a σ-subalgebra of Σ.
proof Set
$C = \{v : v \in L^1(\mu), \int (|v| - Me)^+ \le \sup_{u\in A} \int (|u| - Me)^+ \text{ for every } M \ge 0\}$,
writing $e = \chi X^\bullet$ as usual. The arguments in the proof of 246Cd make it plain that $C \supseteq A$ is uniformly integrable,
convex and closed, and that $w \in C$ whenever $v \in C$ and $|w| \le |v|$. As for the conditional expectation operators, if
$v \in C$, T is a σ-subalgebra of Σ, $P$ is the associated conditional expectation operator, and $M \ge 0$, then
$|Pv| \le P|v| = P((|v| \wedge Me) + (|v| - Me)^+) \le Me + P((|v| - Me)^+)$,
so
$(|Pv| - Me)^+ \le P((|v| - Me)^+)$
and
$\int (|Pv| - Me)^+ \le \int P(|v| - Me)^+ = \int (|v| - Me)^+ \le \sup_{u\in A} \int (|u| - Me)^+$;
as $M$ is arbitrary, $Pv \in C$.

246E Remarks (a) Of course 246D has an expression in terms of $\mathcal{L}^1$ rather than $L^1$: if (X, Σ, µ) is a probability
space and $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable, then there is a uniformly integrable set $C \supseteq A$ such that (i) $af + (1-a)g \in C$
whenever $f, g \in C$ and $a \in [0, 1]$ (ii) $g \in C$ whenever $f \in C$, $g \in \mathcal{L}^0(\mu)$ and $|g| \le_{\text{a.e.}} |f|$ (iii) $f \in C$ whenever there
is a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $C$ such that $\lim_{n\to\infty} \int |f - f_n| = 0$ (iv) $g \in C$ whenever there is an $f \in C$ such that $g$ is a
conditional expectation of $f$ with respect to some σ-subalgebra of Σ.

(b) In fact, there are obvious extensions of 246D; the proof there already shows that $T[C] \subseteq C$ whenever
$T : L^1(\mu) \to L^1(\mu)$ is an order-preserving linear operator such that $\|Tu\|_1 \le \|u\|_1$ for every $u \in L^1(\mu)$ and $\|Tu\|_\infty \le \|u\|_\infty$
for every $u \in L^1(\mu) \cap L^\infty(\mu)$ (246Yc). If we had done a bit more of the theory of operators on Riesz spaces I should
be able to take you a good deal farther along this road; for instance, it is not in fact necessary to assume that the
operators $T$ of the last sentence are order-preserving. I will return to this in Chapter 37 in the next volume.

(c) Moreover, the main theorem of the next section will show that for any measure spaces (X, Σ, µ), (Y, T, ν),
T [A] will be uniformly integrable in L1 (ν) whenever A ⊆ L1 (µ) is uniformly integrable and T : L1 (µ) → L1 (ν) is a
continuous linear operator (247D).

246F We shall need an elementary lemma which I have not so far spelt out.
Lemma Let (X, Σ, µ) be a measure space. Then for any $u \in L^1(\mu)$,
$\|u\|_1 \le 2 \sup_{E\in\Sigma} |\int_E u|$.

proof Express $u$ as $f^\bullet$ where $f : X \to \mathbb{R}$ is measurable. Set $F = \{x : f(x) \ge 0\}$. Then
$\|u\|_1 = \int |f| = |\int_F f| + |\int_{X\setminus F} f| \le 2 \sup_{E\in\Sigma} |\int_E f| = 2 \sup_{E\in\Sigma} |\int_E u|$.
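
A small worked example, given here as a sketch, suggests that the constant 2 in the lemma cannot be improved. The space [0, 2] with Lebesgue measure and the function $f$ below are assumptions made for the illustration.

```latex
% Assumed example: X = [0,2], \mu = Lebesgue measure, f = \chi[0,1] - \chi\,]1,2], u = f^\bullet.
\[
  \|u\|_1 = \int |f| \, d\mu = 2 .
\]
% For any measurable E \subseteq [0,2]:
\[
  \int_E f \, d\mu = \mu(E \cap [0,1]) - \mu(E \cap \,]1,2]) \in [-1, 1],
  \qquad\text{with value } 1 \text{ when } E = [0,1] .
\]
% Therefore \sup_{E \in \Sigma} |\int_E u| = 1 and \|u\|_1 = 2 \sup_{E \in \Sigma} |\int_E u| exactly.
```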

246G Now we come to some of the remarkable alternative descriptions of uniform integrability.
Theorem Let (X, Σ, µ) be any measure space and $A$ a non-empty subset of $L^1(\mu)$. Then the following are equiveridical:
(i) $A$ is uniformly integrable;
(ii) $\sup_{u\in A} |\int_F u| < \infty$ for every µ-atom $F \in \Sigma$, and for every $\epsilon > 0$ there are $E \in \Sigma$, $\delta > 0$ such that $\mu E < \infty$
and $|\int_F u| \le \epsilon$ whenever $u \in A$, $F \in \Sigma$ and $\mu(F \cap E) \le \delta$;
(iii) $\sup_{u\in A} |\int_F u| < \infty$ for every µ-atom $F \in \Sigma$, and $\lim_{n\to\infty} \sup_{u\in A} |\int_{F_n} u| = 0$ whenever $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint
sequence in Σ;
(iv) $\sup_{u\in A} |\int_F u| < \infty$ for every µ-atom $F \in \Sigma$, and $\lim_{n\to\infty} \sup_{u\in A} |\int_{F_n} u| = 0$ whenever $\langle F_n\rangle_{n\in\mathbb{N}}$ is a non-increasing
sequence in Σ with empty intersection.
Remark I use the phrase ‘µ-atom’ to emphasize that I mean an atom in the measure space sense (211I).
proof (a)(i)⇒(iv) Suppose that $A$ is uniformly integrable. Then surely if $F \in \Sigma$ is a µ-atom,
$\sup_{u\in A} |\int_F u| \le \sup_{u\in A} \|u\|_1 < \infty$,
by 246Ca. Now suppose that $\langle F_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence in Σ with empty intersection, and that $\epsilon > 0$.
Take $E \in \Sigma$, $M \ge 0$ such that $\mu E < \infty$ and $\int (|u| - M\chi E^\bullet)^+ \le \frac12\epsilon$ whenever $u \in A$. Then for all $n$ large enough,
$M\mu(F_n \cap E) \le \frac12\epsilon$, so that
$|\int_{F_n} u| \le \int_{F_n} |u| \le \int (|u| - M\chi E^\bullet)^+ + \int_{F_n} M\chi E^\bullet \le \frac{\epsilon}{2} + M\mu(F_n \cap E) \le \epsilon$
for every $u \in A$. As $\epsilon$ is arbitrary, $\lim_{n\to\infty} \sup_{u\in A} |\int_{F_n} u| = 0$, and (iv) is true.
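(b)(iv)⇒(iii) Suppose that (iv) is true. Then of course $\sup_{u\in A} |\int_F u| < \infty$ for every µ-atom $F \in \Sigma$. ? Suppose,
if possible, that $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in Σ such that $\epsilon = \limsup_{n\to\infty} \sup_{u\in A} \min(1, \frac13 |\int_{F_n} u|) > 0$. Set
$H_n = \bigcup_{i\ge n} F_i$ for each $n$, so that $\langle H_n\rangle_{n\in\mathbb{N}}$ is non-increasing and has empty intersection, and $\int_{H_n} u \to 0$ as $n \to \infty$ for
every $u \in L^1(\mu)$. Choose $\langle n_i\rangle_{i\in\mathbb{N}}$, $\langle m_i\rangle_{i\in\mathbb{N}}$, $\langle u_i\rangle_{i\in\mathbb{N}}$ inductively, as follows. $n_0 = 0$. Given $n_i \in \mathbb{N}$, take $m_i \ge n_i$, $u_i \in A$
such that $|\int_{F_{m_i}} u_i| \ge 2\epsilon$. Take $n_{i+1} > m_i$ such that $\int_{H_{n_{i+1}}} |u_i| \le \epsilon$. Continue.
Set $G_k = \bigcup_{i\ge k} F_{m_i}$ for each $k$. Then $\langle G_k\rangle_{k\in\mathbb{N}}$ is a non-increasing sequence in Σ with empty intersection. But
$F_{m_i} \subseteq G_i \subseteq F_{m_i} \cup H_{n_{i+1}}$, so
$|\int_{G_i} u_i| \ge |\int_{F_{m_i}} u_i| - |\int_{G_i\setminus F_{m_i}} u_i| \ge 2\epsilon - \int_{H_{n_{i+1}}} |u_i| \ge \epsilon$
for every $i$, contradicting the hypothesis (iv). X
This means that $\lim_{n\to\infty} \sup_{u\in A} |\int_{F_n} u|$ must be zero, and (iii) is true.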
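(c)(iii)⇒(ii) We still have $\sup_{u\in A} |\int_F u| < \infty$ for every µ-atom $F$. ? Suppose, if possible, that there is an $\epsilon > 0$
such that for every measurable set $E$ of finite measure and every $\delta > 0$ there are $u \in A$, $F \in \Sigma$ such that $\mu(F \cap E) \le \delta$
and $|\int_F u| \ge \epsilon$. Choose a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure, a sequence $\langle G_n\rangle_{n\in\mathbb{N}}$ in Σ, a sequence $\langle \delta_n\rangle_{n\in\mathbb{N}}$ of
strictly positive real numbers and a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$ as follows. Given $u_k$, $E_k$, $\delta_k$ for $k < n$, choose $u_n \in A$ and
$G_n \in \Sigma$ such that $\mu(G_n \cap \bigcup_{k<n} E_k) \le 2^{-n} \min(\{1\} \cup \{\delta_k : k < n\})$ and $|\int_{G_n} u_n| \ge \epsilon$; then choose a set $E_n$ of finite
measure and a $\delta_n > 0$ such that $\int_F |u_n| \le \frac12\epsilon$ whenever $F \in \Sigma$ and $\mu(F \cap E_n) \le \delta_n$ (see 225A). Continue.
On completing the induction, set $F_n = E_n \cap G_n \setminus \bigcup_{k>n} G_k$ for each $n$; then $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in Σ. By
the choice of $G_k$,
$\mu(E_n \cap \bigcup_{k>n} G_k) \le \sum_{k=n+1}^{\infty} 2^{-k} \delta_n \le \delta_n$,
so $\mu(E_n \cap (G_n \setminus F_n)) \le \delta_n$ and $\int_{G_n\setminus F_n} |u_n| \le \frac12\epsilon$. This means that $|\int_{F_n} u_n| \ge |\int_{G_n} u_n| - \frac12\epsilon \ge \frac12\epsilon$. But this is contrary
to the hypothesis (iii). X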
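(d)(ii)⇒(i)(α) Assume (ii). Let $\epsilon > 0$. Then there are $E \in \Sigma$, $\delta > 0$ such that $\mu E < \infty$ and $|\int_F u| \le \epsilon$ whenever
$u \in A$, $F \in \Sigma$ and $\mu(F \cap E) \le \delta$. Now $\sup_{u\in A} \int_E |u| < \infty$. P Write $\mathcal{I}$ for the family of those $F \in \Sigma$ such that
$F \subseteq E$ and $\sup_{u\in A} \int_F |u|$ is finite. If $F \subseteq E$ is an atom for µ, then $\sup_{u\in A} \int_F |u| = \sup_{u\in A} |\int_F u| < \infty$, so $F \in \mathcal{I}$.
(The point is that if $f : X \to \mathbb{R}$ is a measurable function such that $f^\bullet = u$, then one of $F' = \{x : x \in F, f(x) \ge 0\}$,
$F'' = \{x : x \in F, f(x) < 0\}$ must be negligible, so that $\int_F |u|$ is either $\int_{F'} u = \int_F u$ or $-\int_{F''} u = -\int_F u$.) If $F \in \Sigma$,
$F \subseteq E$ and $\mu F \le \delta$ then
$\sup_{u\in A} \int_F |u| \le 2 \sup_{u\in A, G\in\Sigma, G\subseteq F} |\int_G u| \le 2\epsilon$
(by 246F), so $F \in \mathcal{I}$. Next, if $F, G \in \mathcal{I}$ then $\sup_{u\in A} \int_{F\cup G} |u| \le \sup_{u\in A} \int_F |u| + \sup_{u\in A} \int_G |u|$ is finite, so $F \cup G \in \mathcal{I}$.
Finally, if $\langle F_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\mathcal{I}$, and $F = \bigcup_{n\in\mathbb{N}} F_n$, there is some $n \in \mathbb{N}$ such that $\mu(F \setminus \bigcup_{i\le n} F_i) \le \delta$; now
$\bigcup_{i\le n} F_i$ and $F \setminus \bigcup_{i\le n} F_i$ both belong to $\mathcal{I}$, so $F \in \mathcal{I}$.
By 215Ab, there is an $F \in \mathcal{I}$ such that $H \setminus F$ is negligible for every $H \in \mathcal{I}$. Now observe that $E \setminus F$ cannot
include any non-negligible member of $\mathcal{I}$; in particular, cannot include either an atom or a non-negligible set of measure
less than $\delta$. But this means that the subspace measure on $E \setminus F$ is atomless, totally finite and has no non-negligible
measurable sets of measure less than $\delta$; by 215D, $\mu(E \setminus F) = 0$ and $E \setminus F$ and $E$ belong to $\mathcal{I}$, as required. Q
Since $|\int_G u| \le \epsilon$ whenever $u \in A$, $G \in \Sigma$ and $G \subseteq X \setminus E$, 246F gives $\int_{X\setminus E} |u| \le 2\epsilon$ for every $u \in A$, so $\gamma = \sup_{u\in A} \int |u|$ is finite.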
(β) Set $M = \gamma/\delta$. If $u \in A$, express $u$ as $f^\bullet$, where $f : X \to \mathbb{R}$ is measurable, and consider
$F = \{x : f(x) \ge M\chi E(x)\}$.
Then
$M\mu(F \cap E) \le \int_F f = \int_F u \le \gamma$,
so $\mu(F \cap E) \le \gamma/M = \delta$. Accordingly $\int_F u \le \epsilon$. Similarly, $\int_{F'} (-u) \le \epsilon$, writing $F' = \{x : -f(x) \ge M\chi E(x)\}$. But
this means that
$\int (|u| - M\chi E^\bullet)^+ = \int (|f| - M\chi E)^+ \le \int_{F\cup F'} |f| = \int_{F\cup F'} |u| \le 2\epsilon$
for every $u \in A$. As $\epsilon$ is arbitrary, $A$ is uniformly integrable.
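
Condition (iii) of the theorem can be tested directly on a concrete family. The following is an illustrative sketch only; the space [0, 1] with Lebesgue measure, the functions $u_k$ and the sets $F_n$ are all assumptions chosen for the example.

```latex
% Assumed example: X = [0,1] with Lebesgue measure \mu (an atomless space),
% A = \{ u_k : k \in \mathbb{N} \} with u_k = (2^k \chi[0,2^{-k}])^\bullet,
% and the disjoint sequence F_n = \,]2^{-n-1}, 2^{-n}].
% Since F_n \subseteq [0,2^{-n}], the function 2^n \chi[0,2^{-n}] equals 2^n on F_n, so
\[
  \sup_{u \in A} \Bigl| \int_{F_n} u \Bigr|
  \;\ge\; \int_{F_n} u_n
  = 2^{n} \mu(F_n)
  = 2^{n} \cdot 2^{-n-1} = \tfrac12
  \quad\text{for every } n \in \mathbb{N}.
\]
% So \sup_{u \in A} |\int_{F_n} u| does not tend to 0: condition (iii) of 246G fails,
% and A is not uniformly integrable, although \|u_k\|_1 = 1 for every k.
```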

246H Remarks (a) Of course conditions (ii)-(iv) of this theorem, like (i), have direct translations in terms of
members of $\mathcal{L}^1$. Thus a non-empty set $A \subseteq \mathcal{L}^1$ is uniformly integrable iff $\sup_{f\in A} |\int_F f|$ is finite for every atom $F \in \Sigma$
and
either for every $\epsilon > 0$ we can find $E \in \Sigma$, $\delta > 0$ such that $\mu E < \infty$ and $|\int_F f| \le \epsilon$ whenever $f \in A$,
$F \in \Sigma$ and $\mu(F \cap E) \le \delta$
or $\lim_{n\to\infty} \sup_{f\in A} |\int_{F_n} f| = 0$ for every disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in Σ
or $\lim_{n\to\infty} \sup_{f\in A} |\int_{F_n} f| = 0$ for every non-increasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in Σ with empty intersection.

(b) There are innumerable further equivalent expressions characterizing uniform integrability; every author has his
own favourite. Many of them are variants on (i)-(iv) of this theorem, as in 246I and 246Yd-246Yf. For a condition of
a quite different kind, see Theorem 247C.

246I Corollary Let (X, Σ, µ) be a probability space. For $f \in \mathcal{L}^0(\mu)$, $M \ge 0$ set $F(f, M) = \{x : x \in \operatorname{dom} f, |f(x)| \ge M\}$.
Then a non-empty set $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable iff $\lim_{M\to\infty} \sup_{f\in A} \int_{F(f,M)} |f| = 0$.
proof (a) If $A$ satisfies the condition, then
$\inf_{M\ge 0} \sup_{f\in A} \int (|f| - M\chi X)^+ \le \inf_{M\ge 0} \sup_{f\in A} \int_{F(f,M)} |f| = 0$,
so $A$ is uniformly integrable.
(b) If $A$ is uniformly integrable, and $\epsilon > 0$, there is an $M_0 \ge 0$ such that $\int (|f| - M_0\chi X)^+ \le \epsilon$ for every $f \in A$; also,
$\gamma = \sup_{f\in A} \int |f|$ is finite (246Ca). Take any $M \ge M_0 \max(1, (1 + \gamma)/\epsilon)$. If $f \in A$, then
$|f| \times \chi F(f, M) \le (|f| - M_0\chi X)^+ + M_0\chi F(f, M) \le (|f| - M_0\chi X)^+ + \frac{\epsilon}{\gamma+1} |f|$
everywhere on $\operatorname{dom} f$, so
$\int_{F(f,M)} |f| \le \int (|f| - M_0\chi X)^+ + \frac{\epsilon}{\gamma+1} \int |f| \le 2\epsilon$.
As $\epsilon$ is arbitrary, $\lim_{M\to\infty} \sup_{f\in A} \int_{F(f,M)} |f| = 0$.
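
As an illustration of how the criterion of 246I can be applied (a sketch under stated assumptions, in the spirit of 246Xb), suppose that $p > 1$, $K \ge 0$ and $A$ is the set of functions in $\mathcal{L}^p(\mu)$ with $\|f\|_p \le K$. The symbols $p$ and $K$ are introduced here only for the example.

```latex
% Assumed setting: (X,\Sigma,\mu) a probability space, p > 1, K \ge 0,
% A = \{ f \in \mathcal{L}^p(\mu) : \|f\|_p \le K \}, F(f,M) as in 246I, and M > 0.
% On F(f,M) we have |f| \ge M, hence |f|^{1-p} \le M^{1-p} there (because 1 - p < 0), and
\[
  \int_{F(f,M)} |f| \, d\mu
  = \int_{F(f,M)} |f|^{p} \, |f|^{1-p} \, d\mu
  \le M^{1-p} \int |f|^{p} \, d\mu
  \le M^{1-p} K^{p} .
\]
% The bound does not depend on f \in A and tends to 0 as M \to \infty,
% so the criterion of 246I is satisfied and A is uniformly integrable.
```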

246J The next step is to set out some remarkable connexions between uniform integrability and the topology of
convergence in measure discussed in the last section.
Theorem Let (X, Σ, µ) be a measure space.
(a) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a uniformly integrable sequence of real-valued functions on $X$, and $f(x) = \lim_{n\to\infty} f_n(x)$ for almost
every $x \in X$, then $f$ is integrable and $\lim_{n\to\infty} \int |f_n - f| = 0$; consequently $\int f = \lim_{n\to\infty} \int f_n$.
(b) If $A \subseteq L^1 = L^1(\mu)$ is uniformly integrable, then the norm topology of $L^1$ and the topology of convergence in
measure of $L^0 = L^0(\mu)$ agree on $A$.
(c) For any $u \in L^1$ and any sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^1$, the following are equiveridical:
(i) $u = \lim_{n\to\infty} u_n$ for $\|\ \|_1$;
(ii) $\{u_n : n \in \mathbb{N}\}$ is uniformly integrable and $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ in measure.
(d) If (X, Σ, µ) is semi-finite, and $A \subseteq L^1$ is uniformly integrable, then the closure $\overline{A}$ of $A$ in $L^0$ for the topology of
convergence in measure is still a uniformly integrable subset of $L^1$.
proof (a) Note first that because $\sup_{n\in\mathbb{N}} \int |f_n| < \infty$ (246Ca) and $|f| = \liminf_{n\to\infty} |f_n|$, Fatou’s Lemma assures us
that $|f|$ is integrable, with $\int |f| \le \limsup_{n\to\infty} \int |f_n|$. It follows immediately that $\{f_n - f : n \in \mathbb{N}\}$ is uniformly
integrable, being the sum of two uniformly integrable sets (246Cc, 246Ce).
Given $\epsilon > 0$, there are $M \ge 0$, $E \in \Sigma$ such that $\mu E < \infty$ and $\int (|f_n - f| - M\chi E)^+ \le \epsilon$ for every $n \in \mathbb{N}$. Also
$|f_n - f| \wedge M\chi E \to 0$ a.e., so
\[
\begin{aligned}
\limsup_{n\to\infty} \int |f_n - f| &\le \limsup_{n\to\infty} \int (|f_n - f| - M\chi E)^+ + \limsup_{n\to\infty} \int |f_n - f| \wedge M\chi E \\
&\le \epsilon,
\end{aligned}
\]
by Lebesgue’s Dominated Convergence Theorem. As $\epsilon$ is arbitrary, $\lim_{n\to\infty} \int |f_n - f| = 0$ and $\lim_{n\to\infty} \int (f_n - f) = 0$.
(b) Let $\mathfrak{T}_A$, $\mathfrak{S}_A$ be the topologies on $A$ induced by the norm topology of $L^1$ and the topology of convergence in
measure on $L^0$ respectively.
(i) Given $\epsilon > 0$, let $F \in \Sigma$, $M \ge 0$ be such that $\mu F < \infty$ and $\int (|v| - M\chi F^\bullet)^+ \le \epsilon$ for every $v \in A$, and consider
$\bar\rho_F$, defined as in 245A. Then for any $f, g \in \mathcal{L}^0$,
$|f - g| \le (|f| - M\chi F)^+ + (|g| - M\chi F)^+ + M(|f - g| \wedge \chi F)$
everywhere on $\operatorname{dom} f \cap \operatorname{dom} g$, so
$|u - v| \le (|u| - M\chi F^\bullet)^+ + (|v| - M\chi F^\bullet)^+ + M(|u - v| \wedge \chi F^\bullet)$
for all $u, v \in L^0$. Consequently
$\|u - v\|_1 \le 2\epsilon + M\bar\rho_F(u, v)$
for all $u, v \in A$.
This means that, given $\epsilon > 0$, we can find $F$, $M$ such that, for $u, v \in A$,
$\bar\rho_F(u, v) \le \frac{\epsilon}{1+M} \implies \|u - v\|_1 \le 3\epsilon$.
It follows that every subset of $A$ which is open for $\mathfrak{T}_A$ is open for $\mathfrak{S}_A$ (2A3Ib).
(ii) In the other direction, we have $\bar\rho_F(u, v) \le \|u - v\|_1$ for all $u, v \in L^1$ and every set $F$ of finite measure, so
every subset of $A$ which is open for $\mathfrak{S}_A$ is open for $\mathfrak{T}_A$.
(c) If $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ for $\|\ \|_1$, $A = \{u_n : n \in \mathbb{N}\}$ is uniformly integrable. P Given $\epsilon > 0$, let $m$ be such that
$\|u_n - u\|_1 \le \epsilon$ whenever $n \ge m$. Set $v = |u| + \sum_{i\le m} |u_i| \in L^1$, and let $M \ge 0$, $E \in \Sigma$ be such that $\mu E$ is finite and
$\int (v - M\chi E^\bullet)^+ \le \epsilon$. Then, for $w \in A$,
$(|w| - M\chi E^\bullet)^+ \le (|w| - v)^+ + (v - M\chi E^\bullet)^+$,
so
$\int (|w| - M\chi E^\bullet)^+ \le \|(|w| - v)^+\|_1 + \int (v - M\chi E^\bullet)^+ \le 2\epsilon$. Q
Thus on either hypothesis we can be sure that $\{u_n : n \in \mathbb{N}\}$ and $A = \{u\} \cup \{u_n : n \in \mathbb{N}\}$ are uniformly integrable,
so that the two topologies agree on $A$ (by (b)) and $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ in one topology iff it converges to $u$ in the
other.

(d) Because $A$ is $\|\ \|_1$-bounded (246Ca) and µ is semi-finite, $\overline{A} \subseteq L^1$ (245J(b-i)). Given $\epsilon > 0$, let $M \ge 0$, $E \in \Sigma$
be such that $\mu E < \infty$ and $\int (|u| - M\chi E^\bullet)^+ \le \epsilon$ for every $u \in A$. Now the maps $u \mapsto |u|$, $u \mapsto u - M\chi E^\bullet$,
$u \mapsto u^+ : L^0 \to L^0$ are all continuous for the topology of convergence in measure (245D), while $\{u : \|u\|_1 \le \epsilon\}$ is
closed for the same topology (245J again), so $\{u : u \in L^0, \int (|u| - M\chi E^\bullet)^+ \le \epsilon\}$ is closed and must include $\overline{A}$. Thus
$\int (|u| - M\chi E^\bullet)^+ \le \epsilon$ for every $u \in \overline{A}$. As $\epsilon$ is arbitrary, $\overline{A}$ is uniformly integrable.
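
Part (c) can be checked against a familiar sequence. The following is a sketch only; the space [0, 1] with Lebesgue measure and the functions $f_n = n\chi[0, 1/n]$ are assumptions made for the example, and convergence in measure is tested through the integrals $\int |f| \wedge \chi F$ that underlie $\bar\rho_F$ in part (b) of the proof.

```latex
% Assumed example: X = [0,1] with Lebesgue measure \mu, f_n = n\chi[0,1/n], u_n = f_n^\bullet, n \ge 1.
% Convergence in measure to 0: for every measurable F,
\[
  \int |f_n| \wedge \chi F \, d\mu \;\le\; \int |f_n| \wedge \chi X \, d\mu = \tfrac1n \longrightarrow 0 .
\]
% No convergence for the norm of L^1:
\[
  \|u_n - 0\|_1 = \int |f_n| \, d\mu = 1 \quad\text{for every } n \ge 1 .
\]
% So by 246Jc the set \{u_n : n \ge 1\} cannot be uniformly integrable; directly, for n \ge M,
\[
  \int (|f_n| - M\chi X)^+ \, d\mu = 1 - \tfrac{M}{n} \longrightarrow 1 \quad (n \to \infty).
\]
```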

246K Complex $\mathcal{L}^1$ and $L^1$ The definitions and theorems above can be repeated without difficulty for spaces
of (equivalence classes of) complex-valued functions, with just one variation: in the complex equivalent of 246F, the
constant must be changed. It is easy to see that, for $u \in L^1_{\mathbb{C}}(\mu)$,
\[
\begin{aligned}
\|u\|_1 &\le \|\operatorname{Re}(u)\|_1 + \|\operatorname{Im}(u)\|_1 \\
&\le 2 \sup_{F\in\Sigma} \Bigl|\int_F \operatorname{Re}(u)\Bigr| + 2 \sup_{F\in\Sigma} \Bigl|\int_F \operatorname{Im}(u)\Bigr| \le 4 \sup_{F\in\Sigma} \Bigl|\int_F u\Bigr|.
\end{aligned}
\]
(In fact, $\|u\|_1 \le \pi \sup_{F\in\Sigma} |\int_F u|$; see 246Yl and 252Yt.) Consequently some of the arguments of 246G need to be
written out with different constants, but the results, as stated, are unaffected.
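
The function of 246Yl suggests why $\pi$ is the natural constant here. The computation below is a sketch which takes the bound of 246Yl for granted; Lebesgue measure on $[-\pi, \pi]$ is assumed.

```latex
% Assumed example (compare 246Yl): \mu = Lebesgue measure on [-\pi,\pi], f(x) = e^{ix}.
\[
  \|f\|_1 = \int_{-\pi}^{\pi} |e^{ix}| \, dx = 2\pi ,
  \qquad
  \int_{-\pi/2}^{\pi/2} e^{ix} \, dx = \Bigl[ \tfrac{1}{i} e^{ix} \Bigr]_{-\pi/2}^{\pi/2}
  = \tfrac{1}{i} \bigl( i - (-i) \bigr) = 2 .
\]
% Taking for granted the bound |\int_E f| \le 2 of 246Yl, \sup_E |\int_E f| = 2, so
\[
  \|f\|_1 = 2\pi = \pi \sup_{E} \Bigl| \int_E f \Bigr| ,
\]
% and the constant \pi cannot be replaced by anything smaller.
```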

246X Basic exercises (a) Let (X, Σ, µ) be a measure space and $A$ a subset of $L^1 = L^1(\mu)$. Show that the following
are equiveridical: (i) $A$ is uniformly integrable; (ii) for every $\epsilon > 0$ there is a $w \ge 0$ in $L^1$ such that $\int (|u| - w)^+ \le \epsilon$
for every $u \in A$; (iii) $\langle (|u_{n+1}| - \sup_{i\le n} |u_i|)^+ \rangle_{n\in\mathbb{N}} \to 0$ in $L^1$ for every sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$. (Hint: for (ii)⇒(iii), set
$v_n = \sup_{i\le n} |u_i|$ and note that $\langle v_n \wedge w\rangle_{n\in\mathbb{N}}$ is convergent in $L^1$ for every $w \ge 0$.)

> (b) Let (X, Σ, µ) be a totally finite measure space. Show that for any $p > 1$ and $M \ge 0$ the set
$\{f : f \in \mathcal{L}^p(\mu), \|f\|_p \le M\}$ is uniformly integrable. (Hint: $\int (|f| - M\chi X)^+ \le M^{1-p} \int |f|^p$.)

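> (c) Let µ be counting measure on $\mathbb{N}$. Show that a set $A \subseteq \mathcal{L}^1(\mu) = \ell^1$ is uniformly integrable iff (i) $\sup_{f\in A} |f(n)| < \infty$
for every $n \in \mathbb{N}$ (ii) for every $\epsilon > 0$ there is an $m \in \mathbb{N}$ such that $\sum_{n=m}^{\infty} |f(n)| \le \epsilon$ for every $f \in A$.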

(d) Let $X$ be a set, and let µ be counting measure on $X$. Show that a set $A \subseteq \mathcal{L}^1(\mu) = \ell^1(X)$ is uniformly integrable
iff (i) $\sup_{f\in A} |f(x)| < \infty$ for every $x \in X$ (ii) for every $\epsilon > 0$ there is a finite set $I \subseteq X$ such that $\sum_{x\in X\setminus I} |f(x)| \le \epsilon$
for every $f \in A$. Show that in this case $A$ is relatively compact for the norm topology of $\ell^1(X)$.

(e) Let (X, Σ, µ) be a measure space, δ > 0, and I ⊆ Σ a family such that (i) every atom belongs to I (ii) E ∈ I
whenever E ∈ Σ and µE ≤ δ (iii) E ∪ F ∈ I whenever E, F ∈ I and E ∩ F = ∅. Show that every set of finite measure
belongs to I.

(f ) Let (X, Σ, µ) and (Y, T, ν) be measure spaces and φ : X → Y an inverse-measure-preserving function. Show
that a set A ⊆ L1 (ν) is uniformly integrable iff {gφ : g ∈ A} is uniformly integrable in L1 (µ). (Hint: use 246G for ‘if’,
246A for ‘only if’.)
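> (g) Let (X, Σ, µ) be a measure space and $p \in [1, \infty[$. Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\mathcal{L}^p = \mathcal{L}^p(\mu)$ such that
$\{|f_n|^p : n \in \mathbb{N}\}$ is uniformly integrable and $f_n \to f$ a.e. Show that $f \in \mathcal{L}^p$ and $\lim_{n\to\infty} \int |f_n - f|^p = 0$.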

(h) Let (X, Σ, µ) be a semi-finite measure space and $p \in [1, \infty[$. Let $\langle u_n\rangle_{n\in\mathbb{N}}$ be a sequence in $L^p = L^p(\mu)$ and
$u \in L^0(\mu)$. Show that the following are equiveridical: (i) $u \in L^p$ and $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ for $\|\ \|_p$ (ii) $\langle u_n\rangle_{n\in\mathbb{N}}$
converges in measure to $u$ and $\{|u_n|^p : n \in \mathbb{N}\}$ is uniformly integrable. (Hint: 245Xl.)

(i) Let (X, Σ, µ) be a totally finite measure space, and $1 \le p < r \le \infty$. Let $\langle u_n\rangle_{n\in\mathbb{N}}$ be a $\|\ \|_r$-bounded sequence
in $L^r(\mu)$ which converges in measure to $u \in L^0(\mu)$. Show that $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ for $\|\ \|_p$. (Hint: show that
$\{|u_n|^p : n \in \mathbb{N}\}$ is uniformly integrable.)

246Y Further exercises (a) Let (X, Σ, µ) be a totally finite measure space. Show that $A \subseteq \mathcal{L}^1(\mu)$ is uniformly
integrable iff there is a convex function $\phi : [0, \infty[ \to \mathbb{R}$ such that $\lim_{a\to\infty} \phi(a)/a = \infty$ and $\sup_{f\in A} \int \phi(|f|) < \infty$.

(b) For any metric space $(Z, \rho)$, let $\mathcal{C}_Z$ be the family of closed subsets of $Z$, and for $F, F' \in \mathcal{C}_Z \setminus \{\emptyset\}$ set
$\tilde\rho(F, F') = \min(1, \max(\sup_{z\in F} \inf_{z'\in F'} \rho(z, z'), \sup_{z'\in F'} \inf_{z\in F} \rho(z, z')))$. Show that $\tilde\rho$ is a metric on $\mathcal{C}_Z \setminus \{\emptyset\}$ (it is the Hausdorff
metric). Show that if $(Z, \rho)$ is complete then the family $\mathcal{K}_Z \setminus \{\emptyset\}$ of non-empty compact subsets of $Z$ is closed for $\tilde\rho$.
Now let (X, Σ, µ) be any measure space and take $Z = L^1 = L^1(\mu)$, $\rho(z, z') = \|z - z'\|_1$ for $z, z' \in Z$. Show that the
family of non-empty closed uniformly integrable subsets of $L^1$ is a closed subset of $\mathcal{C}_Z \setminus \{\emptyset\}$ including $\mathcal{K}_Z \setminus \{\emptyset\}$.

(c) Let (X, Σ, µ) be a totally finite measure space and $A \subseteq L^1(\mu)$ a uniformly integrable set. Show that there is
a uniformly integrable set $C \supseteq A$ such that (i) $C$ is convex and closed in $L^0(\mu)$ for the topology of convergence in
measure (ii) if $u \in C$ and $|v| \le |u|$ then $v \in C$ (iii) if $T$ belongs to the set $\mathcal{T}^+$ of operators from $L^1(\mu) = M^{1,\infty}(\mu)$ to
itself, as described in 244Xm, then $T[C] \subseteq C$.
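
(d) Let µ be Lebesgue measure on $\mathbb{R}$. Show that a set $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable iff $\lim_{n\to\infty} \int_{F_n} f_n = 0$
for every disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of compact sets in $\mathbb{R}$ and every sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $A$.

(e) Let µ be Lebesgue measure on $\mathbb{R}$. Show that a set $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable iff $\lim_{n\to\infty} \int_{G_n} f_n = 0$
for every disjoint sequence $\langle G_n\rangle_{n\in\mathbb{N}}$ of open sets in $\mathbb{R}$ and every sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $A$.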

(f) Repeat 246Yd and 246Ye for Lebesgue measure on arbitrary subsets of $\mathbb{R}^r$.

(g) Let $X$ be a set and Σ a σ-algebra of subsets of $X$. Let $\langle \nu_n\rangle_{n\in\mathbb{N}}$ be a sequence of countably additive functionals on
Σ such that $\nu E = \lim_{n\to\infty} \nu_n E$ is defined for every $E \in \Sigma$. Show that $\lim_{n\to\infty} \nu_n F_n = 0$ whenever $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint
sequence in Σ. (Hint: suppose otherwise. By taking suitable subsequences reduce to the case in which $|\nu_n F_i - \nu F_i| \le 2^{-n}\epsilon$
for $i < n$, $|\nu_n F_n| \ge 3\epsilon$, $|\nu_n F_i| \le 2^{-i}\epsilon$ for $i > n$. Set $F = \bigcup_{i\in\mathbb{N}} F_{2i+1}$ and show that $|\nu_{2n+1} F - \nu_{2n} F| \ge \epsilon$ for every
$n$.)
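
(h) Let (X, Σ, µ) be a measure space and $\langle u_n\rangle_{n\in\mathbb{N}}$ a sequence in $L^1 = L^1(\mu)$ such that $\lim_{n\to\infty} \int_F u_n$ is defined for
every $F \in \Sigma$. Show that $\{u_n : n \in \mathbb{N}\}$ is uniformly integrable. (Hint: suppose not. Then there are a disjoint sequence
$\langle F_n\rangle_{n\in\mathbb{N}}$ in Σ and a subsequence $\langle u'_n\rangle_{n\in\mathbb{N}}$ of $\langle u_n\rangle_{n\in\mathbb{N}}$ such that $\inf_{n\in\mathbb{N}} |\int_{F_n} u'_n| = \epsilon > 0$. But this contradicts 246Yg.)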

(i) In 246Yg, show that $\nu$ is countably additive. (Hint: Set $\mu = \sum_{n=0}^{\infty} a_n \nu_n$ for a suitable sequence $\langle a_n\rangle_{n\in\mathbb{N}}$ of
strictly positive numbers. For each $n$ choose a Radon-Nikodým derivative $f_n$ of $\nu_n$ with respect to µ. Show that
$\{f_n : n \in \mathbb{N}\}$ is uniformly integrable, so that $\nu$ is truly continuous.) (This is the Vitali-Hahn-Saks theorem.)

(j) Let (X, Σ, µ) be any measure space, and $A \subseteq L^1(\mu)$. Show that the following are equiveridical: (i) $A$ is $\|\ \|_1$-bounded;
(ii) $\sup_{u\in A} |\int_F u| < \infty$ for every µ-atom $F \in \Sigma$ and $\limsup_{n\to\infty} \sup_{u\in A} |\int_{F_n} u| < \infty$ for every disjoint
sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of measurable sets of finite measure; (iii) $\sup_{u\in A} |\int_E u| < \infty$ for every $E \in \Sigma$. (Hint: show that
$\langle a_n u_n\rangle_{n\in\mathbb{N}}$ is uniformly integrable whenever $\lim_{n\to\infty} a_n = 0$ in $\mathbb{R}$ and $\langle u_n\rangle_{n\in\mathbb{N}}$ is a sequence in $A$.)

(k) Let (X, Σ, µ) be a measure space and $A \subseteq L^1(\mu)$ a non-empty set. Show that the following are equiveridical:
(i) $A$ is uniformly integrable; (ii) whenever $B \subseteq L^\infty(\mu)$ is non-empty and downwards-directed and has infimum 0 in
$L^\infty(\mu)$ then $\inf_{v\in B} \sup_{u\in A} |\int u \times v| = 0$. (Hint: for (i)⇒(ii), note that $\inf_{v\in B} w \times v = 0$ for every $w \ge 0$ in $L^0$. For
(ii)⇒(i), use 246G(iv).)

(l) Set $f(x) = e^{ix}$ for $x \in [-\pi, \pi]$. Show that $|\int_E f| \le 2$ for every $E \subseteq [-\pi, \pi]$.

246 Notes and comments I am holding over to the next section the most striking property of uniformly integrable
sets (they are the relatively weakly compact sets in L1 ) because this demands some non-trivial ideas from functional
analysis and general topology. In this section I give the results which can be regarded as essentially measure-theoretic
in inspiration. The most important new concept, or technique, is that of ‘disjoint-sequence theorem’. A typical example
is in condition (iii) of 246G, relating uniform integrability to the behaviour of functionals on disjoint sequences of sets.
I give variants of this in 246Yd-246Yf, and 246Yg-246Yj are further results in which similar methods can be used. The
central result of the next section (247C) will also use disjoint sequences in the proof, and they will appear more than
once in Chapter 35 in the next volume.
The phrase ‘uniformly integrable’ ought to mean something like ‘uniformly approximable by simple functions’, and
the definition 246A can be forced into such a form, but I do not think it very useful to do so. However condition
(ii) of 246G amounts to something like ‘uniformly truly continuous’, if we think of members of L1 as truly continuous
functionals on Σ, as in 242I. (See 246Yi.) Note that in each of the statements (ii)-(iv) of 246G we need to take special
note of any atoms for the measure, since they are not controlled by the main condition imposed. In an atomless measure
space, of course, we have a simplification here, as in 246Yd-246Yf.
Another way of justifying the ‘uniformly’ in ‘uniformly integrable’ is by considering functionals $\theta_w$ where $w \ge 0$ in
$L^1$, setting $\theta_w(u) = \int (|u| - w)^+$ for $u \in L^1$; then $A \subseteq L^1$ is uniformly integrable iff $\theta_w \to 0$ uniformly on $A$ as $w$ rises
in $L^1$ (246Xa). It is sometimes useful to know that if this is true at all then it is necessarily witnessed by elements $w$
which can be built directly from materials at hand (see (iii) of 246Xa). Furthermore, the sets $A_{w\epsilon} = \{u : \theta_w(u) \le \epsilon\}$ are
always convex, $\|\ \|_1$-closed and ‘solid’ (if $u \in A_{w\epsilon}$ and $|v| \le |u|$ then $v \in A_{w\epsilon}$) (246Cd); they are closed under pointwise
convergence of sequences (246Ja) and in semi-finite measure spaces they are closed for the topology of convergence in
measure (246Jd); in probability spaces, for level $w$, they are closed under conditional expectations (246D) and similar
operators (246Yc). Consequently we can expect that any uniformly integrable set will be included in a uniformly
integrable set which is closed under operations of several different types.
Yet another ‘uniform’ property of uniformly integrable sets is in 246Yk. The norm $\|\ \|_\infty$ is never (in interesting
cases) order-continuous in the way that other $\|\ \|_p$ are (244Ye); but the uniformly integrable subsets of $L^1$ provide
interesting order-continuous seminorms on $L^\infty$.
246J supplements results from §245. In the notes to that section I mentioned the question: if $\langle f_n\rangle_{n\in\mathbb N} \to f$ a.e., in
what ways can $\langle \int f_n\rangle_{n\in\mathbb N}$ fail to converge to $\int f$? Here we find that $\langle \int |f_n - f|\rangle_{n\in\mathbb N} \to 0$ iff $\{f_n : n \in \mathbb N\}$ is uniformly
integrable; this is a way of making precise the expression ‘none of the weight of the sequence is lost at infinity’. Generally,
for sequences, convergence in $\|\,\|_p$, for $p \in [1, \infty[$, is convergence in measure for $p$th-power-uniformly-integrable sequences
(246Xh).
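
The phrase ‘weight lost at infinity’ can be checked on the standard moving-hump example. The following Python sketch is an illustration only, not part of the argument of §246. It assumes Lebesgue measure on [0, 1] and functions of the form $f_n = h(n)\chi[0, 1/n]$, for which $\int (f_n - M)^+ = \max(h(n) - M, 0)/n$ in closed form; the names `excess_integral` and `sup_excess` and the cut-off `n_max` are choices made for this example. With $h(n) = n$ the supremum over $n$ stays close to 1 however large the constant $M$ is, so the family is not uniformly integrable; with $h(n) = \sqrt n$ the supremum is at most $1/(4M)$, which tends to 0 as $M$ grows.

```python
import math

def excess_integral(height, width, M):
    """Integral of (f - M)^+ for f = height * indicator of [0, width]."""
    return max(height - M, 0.0) * width

def sup_excess(height_of, M, n_max):
    """Largest excess integral among f_1, ..., f_{n_max}, where
    f_n = height_of(n) * indicator of [0, 1/n]."""
    return max(excess_integral(height_of(n), 1.0 / n, M)
               for n in range(1, n_max + 1))

for M in (10.0, 100.0):
    tall = sup_excess(lambda n: float(n), M, 10**5)   # heights n: not uniformly integrable
    mild = sup_excess(math.sqrt, M, 10**5)            # heights sqrt(n): uniformly integrable
    print(M, tall, mild, 1.0 / (4.0 * M))
```

In the first family each $f_n$ has integral 1 while $f_n \to 0$ almost everywhere, which is exactly the failure of convergence of integrals discussed above.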

247 Weak compactness in L1


I now come to the most striking feature of uniform integrability: it provides a description of the relatively weakly
compact subsets of L1 (247C). I have put this into a separate section because it demands some knowledge of functional
analysis – in particular, of course, of weak topologies on Banach spaces. I will try to give an account in terms which are
accessible to novices in the theory of normed spaces because the result is essentially measure-theoretic, as well as being
of vital importance to applications in probability theory. I have written out the essential definitions in §§2A3-2A5.

247A Part of the argument of the main theorem below will run more smoothly if I separate out an idea which is,
in effect, a simple special case of a theme which has been running through the exercises of this chapter (241Yg, 242Yb,
243Ya, 244Yd).
Lemma Let (X, Σ, µ) be a measure space, and G any member of Σ. Let µG be the subspace measure on G, so that
µG E = µE for E ⊆ G, E ∈ Σ. Set
U = {u : u ∈ L1 (µ), u × χG• = u} ⊆ L1 (µ).
Then we have an isomorphism S between the ordered normed spaces U and L1 (µG ), given by writing
S(f • ) = (f ↾G)•
for every f ∈ L1 (µ) such that f • ∈ U .
proof Of course I should remark explicitly that U is a linear subspace of L1 (µ). I have discussed integration over
subspaces in §§131 and 214; in particular, I noted that f ↾G is integrable, and that
$$\int |f\restriction G|\,d\mu_G = \int |f| \times \chi G\,d\mu \le \int |f|\,d\mu$$
for every f ∈ L1 (µ) (131Fa). If f , g ∈ L1 (µ) and f = g µ-a.e., then f ↾G = g↾G µG -a.e.; so the proposed formula for
S does indeed define a map from U to L1 (µG ).
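Because
$$(f + g)\restriction G = (f\restriction G) + (g\restriction G), \quad (cf)\restriction G = c(f\restriction G)$$
for all $f$, $g \in L^1(\mu)$ and all $c \in \mathbb R$, $S$ is linear. Because
$$f \le g\ \ \mu\text{-a.e.} \implies f\restriction G \le g\restriction G\ \ \mu_G\text{-a.e.},$$
$S$ is order-preserving. Because $\int |f\restriction G|\,d\mu_G \le \int |f|\,d\mu$ for every $f \in L^1(\mu)$, $\|Su\|_1 \le \|u\|_1$ for every $u \in U$.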
To see that S is surjective, take any v ∈ L1 (µG ). Express v as g • where g ∈ L1 (µG ). By 131E, f ∈ L1 (µ), where
f (x) = g(x) for x ∈ dom g, 0 for x ∈ X \ G; so that f • ∈ U and f ↾G = g and v = S(f • ) ∈ S[U ].
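To see that $S$ is norm-preserving, note that, for any $f \in L^1(\mu)$,
$$\int |f\restriction G|\,d\mu_G = \int |f| \times \chi G\,d\mu,$$
so that if $u = f^\bullet \in U$ we shall have
$$\|Su\|_1 = \int |f\restriction G|\,d\mu_G = \int |f| \times \chi G\,d\mu = \|u \times \chi G^\bullet\|_1 = \|u\|_1.$$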

247B Corollary Let (X, Σ, µ) be any measure space, and let G ∈ Σ be a measurable set expressible as a countable
union of sets of finite measure. Define $U$ as in 247A, and let $h : L^1(\mu) \to \mathbb R$ be any continuous linear functional. Then
there is a $v \in L^\infty(\mu)$ such that $h(u) = \int u \times v\,d\mu$ for every $u \in U$.
proof Let S : U → L1 (µG ) be the isomorphism described in 247A. Then S −1 : L1 (µG ) → U is linear and continuous,
so h1 = hS −1 belongs to the normed space dual (L1 (µG ))∗ of L1 (µG ). Now of course µG is σ-finite, therefore localizable
(211L), so 243Gb tells us that there is a v1 ∈ L∞ (µG ) such that
$$h_1(u) = \int u \times v_1\,d\mu_G$$
for every u ∈ L1 (µG ).
Express v1 as g1• where g1 : G → R is a bounded measurable function. Set g(x) = g1 (x) for x ∈ G, 0 for x ∈ X \ G;
then g : X → R is a bounded measurable function, and v = g • ∈ L∞ (µ). If u ∈ U , express u as f • where f ∈ L1 (µ);
then

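$$\begin{aligned}
h(u) = h(S^{-1}Su) = h_1((f\restriction G)^\bullet) &= \int (f\restriction G) \times g_1\,d\mu_G \\
&= \int (f \times g)\restriction G\,d\mu_G = \int f \times g \times \chi G\,d\mu = \int f \times g\,d\mu = \int u \times v.
\end{aligned}$$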

As u is arbitrary, this proves the result.

247C Theorem Let (X, Σ, µ) be any measure space and A a subset of L1 = L1 (µ). Then A is uniformly integrable
iff it is relatively compact in L1 for the weak topology of L1 .
proof (a) Suppose that A is relatively compact for the weak topology. I seek to show that it satisfies the condition
(iii) of 246G.
(i) If $F \in \Sigma$, then surely $\sup_{u\in A} |\int_F u| < \infty$, because $u \mapsto \int_F u$ belongs to $(L^1)^*$, and if $h \in (L^1)^*$ then the image
of any relatively weakly compact set under $h$ must be bounded (2A5Ie).
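(ii) Now suppose that $\langle F_n\rangle_{n\in\mathbb N}$ is a disjoint sequence in $\Sigma$. ?? Suppose, if possible, that $\langle \sup_{u\in A} |\int_{F_n} u|\rangle_{n\in\mathbb N}$ does
not converge to 0. Then there is a strictly increasing sequence $\langle n(k)\rangle_{k\in\mathbb N}$ in $\mathbb N$ such that
$$\gamma = \tfrac12 \inf_{k\in\mathbb N} \sup_{u\in A} \Big|\int_{F_{n(k)}} u\Big| > 0.$$
For each $k$, choose $u_k \in A$ such that $|\int_{F_{n(k)}} u_k| \ge \gamma$. Because $A$ is relatively compact for the weak topology, there is a
cluster point $u$ of $\langle u_k\rangle_{k\in\mathbb N}$ in $L^1$ for the weak topology (2A3Ob). Set $\eta_j = 2^{-j}\gamma/6 > 0$ for each $j \in \mathbb N$.
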
We can now choose a strictly increasing sequence $\langle k(j)\rangle_{j\in\mathbb N}$ inductively so that, for each $j$,
$$\int_{F_{n(k(j))}} \Big(|u| + \sum_{i=0}^{j-1} |u_{k(i)}|\Big) \le \eta_j,$$
$$\sum_{i=0}^{j-1} \Big|\int_{F_{n(k(i))}} u - \int_{F_{n(k(i))}} u_{k(j)}\Big| \le \eta_j$$
for every $j$, interpreting $\sum_{i=0}^{-1}$ as 0. P P Given $\langle k(i)\rangle_{i<j}$, set $v^* = |u| + \sum_{i=0}^{j-1} |u_{k(i)}|$; then $\lim_{k\to\infty} \int_{F_{n(k)}} v^* = 0$, by
Lebesgue’s Dominated Convergence Theorem or otherwise, so there is a $k^*$ such that $k^* > k(i)$ for every $i < j$ and
$\int_{F_{n(k)}} v^* \le \eta_j$ for every $k \ge k^*$. Next,
$$w \mapsto \sum_{i=0}^{j-1} \Big|\int_{F_{n(k(i))}} u - \int_{F_{n(k(i))}} w\Big| : L^1 \to \mathbb R$$
is continuous for the weak topology of $L^1$ and zero at $u$, and $u$ belongs to every weakly closed set containing $\{u_k : k \ge k^*\}$,
so there is a $k(j) \ge k^*$ such that $\sum_{i=0}^{j-1} |\int_{F_{n(k(i))}} u - \int_{F_{n(k(i))}} u_{k(j)}| < \eta_j$, which continues the construction. Q Q
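Let $v$ be any cluster point in $L^1$, for the weak topology, of $\langle u_{k(j)}\rangle_{j\in\mathbb N}$. Setting $G_i = F_{n(k(i))}$, we have
$|\int_{G_i} u - \int_{G_i} u_{k(j)}| \le \eta_j$ whenever $i < j$, so $\lim_{j\to\infty} \int_{G_i} u_{k(j)}$ exists $= \int_{G_i} u$ for each $i$, and $\int_{G_i} v = \int_{G_i} u$ for every $i$; setting
$G = \bigcup_{i\in\mathbb N} G_i$,
$$\int_G v = \sum_{i=0}^\infty \int_{G_i} v = \sum_{i=0}^\infty \int_{G_i} u = \int_G u,$$
by 232D, because $\langle G_i\rangle_{i\in\mathbb N}$ is disjoint.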


For each j ∈ N,
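$$\begin{aligned}
\sum_{i=0}^{j-1} \Big|\int_{G_i} u_{k(j)}\Big| + \sum_{i=j+1}^\infty \Big|\int_{G_i} u_{k(j)}\Big|
&\le \sum_{i=0}^{j-1} \int_{G_i} |u| + \sum_{i=0}^{j-1} \Big|\int_{G_i} u - \int_{G_i} u_{k(j)}\Big| + \sum_{i=j+1}^\infty \int_{G_i} |u_{k(j)}| \\
&\le \sum_{i=0}^{j-1} \eta_i + \eta_j + \sum_{i=j+1}^\infty \eta_i = \sum_{i=0}^\infty \eta_i = \frac{\gamma}{3}.
\end{aligned}$$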

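On the other hand, $|\int_{G_j} u_{k(j)}| \ge \gamma$. So
$$\Big|\int_G u_{k(j)}\Big| = \Big|\sum_{i=0}^\infty \int_{G_i} u_{k(j)}\Big| \ge \tfrac23\gamma.$$
This is true for every $j$; because every weakly open set containing $v$ meets $\{u_{k(j)} : j \in \mathbb N\}$, $|\int_G v| \ge \frac23\gamma$ and
$|\int_G u| \ge \frac23\gamma$. On the other hand,
$$\Big|\int_G u\Big| = \Big|\sum_{i=0}^\infty \int_{G_i} u\Big| \le \sum_{i=0}^\infty \int_{G_i} |u| \le \sum_{i=0}^\infty \eta_i = \frac{\gamma}{3},$$
which is absurd. XX
This contradiction shows that $\lim_{n\to\infty} \sup_{u\in A} |\int_{F_n} u| = 0$. As $\langle F_n\rangle_{n\in\mathbb N}$ is arbitrary, $A$ satisfies the condition 246G(iii)
and is uniformly integrable.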
(b) Now assume that A is uniformly integrable. I seek a weakly compact set C ⊇ A.
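(i) For each $n \in \mathbb N$, choose $E_n \in \Sigma$, $M_n \ge 0$ such that $\mu E_n < \infty$ and $\int (|u| - M_n\chi E_n^\bullet)^+ \le 2^{-n}$ for every $u \in A$.
Set
$$C = \Big\{v : v \in L^1,\ \Big|\int_F v\Big| \le M_n\mu(F \cap E_n) + 2^{-n}\ \ \forall\ n \in \mathbb N,\ F \in \Sigma\Big\},$$
and note that $A \subseteq C$, because if $u \in A$ and $F \in \Sigma$,
$$\Big|\int_F u\Big| \le \int_F (|u| - M_n\chi E_n^\bullet)^+ + \int_F M_n\chi E_n^\bullet \le 2^{-n} + M_n\mu(F \cap E_n)$$
for every $n$. Observe also that $C$ is $\|\,\|_1$-bounded, because
$$\|u\|_1 \le 2\sup_{F\in\Sigma} \Big|\int_F u\Big| \le 2\sup_{F\in\Sigma}(1 + M_0\mu(F \cap E_0)) \le 2(1 + M_0\mu E_0)$$
for every $u \in C$ (using 246F).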
(ii) Because I am seeking to prove this theorem for arbitrary measure spaces (X, Σ, µ), I cannot use 243G to
identify the dual of $L^1$. Nevertheless, 247B above shows that 243Gb is ‘nearly’ valid, in the following sense: if
$h \in (L^1)^*$, there is a $v \in L^\infty$ such that $h(u) = \int u \times v$ for every $u \in C$. P P Set $G = \bigcup_{n\in\mathbb N} E_n \in \Sigma$, and define $U \subseteq L^1$
as in 247A-247B. By 247B, there is a $v \in L^\infty$ such that $h(u) = \int u \times v$ for every $u \in U$. But if $u \in C$, we can express
$u$ as $f^\bullet$ where $f : X \to \mathbb R$ is measurable. If $F \in \Sigma$ and $F \cap G = \emptyset$, then
$$\Big|\int_F f\Big| = \Big|\int_F u\Big| \le 2^{-n} + M_n\mu(F \cap E_n) = 2^{-n}$$
for every $n \in \mathbb N$, so $\int_F f = 0$; it follows that $f = 0$ a.e. on $X \setminus G$ (131Fc), so that $f \times \chi G =_{\text{a.e.}} f$ and $u = u \times \chi G^\bullet$, that
is, $u \in U$, and $h(u) = \int u \times v$, as required. Q Q
(iii) So we may proceed, having an adequate description, not of $(L^1(\mu))^*$ itself, but of its action on $C$.
Let $\mathcal F$ be any ultrafilter on $L^1$ containing $C$ (see 2A3R). For each $F \in \Sigma$, set
$$\nu F = \lim_{u\to\mathcal F} \int_F u;$$
because
$$\sup_{u\in C} \Big|\int_F u\Big| \le \sup_{u\in C} \|u\|_1 < \infty,$$
this is well-defined in $\mathbb R$ (2A3Se). If $E$, $F$ are disjoint members of $\Sigma$, then $\int_{E\cup F} u = \int_E u + \int_F u$ for every $u \in C$, so
$$\nu(E \cup F) = \lim_{u\to\mathcal F} \int_{E\cup F} u = \lim_{u\to\mathcal F} \int_E u + \lim_{u\to\mathcal F} \int_F u = \nu E + \nu F$$
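(2A3Sf). Thus $\nu : \Sigma \to \mathbb R$ is additive. Next, it is truly continuous with respect to $\mu$. P P Given $\epsilon > 0$, take $n \in \mathbb N$ such
that $2^{-n} \le \frac12\epsilon$, set $\delta = \epsilon/(2(M_n + 1)) > 0$ and observe that
$$|\nu F| \le \sup_{u\in C} \Big|\int_F u\Big| \le 2^{-n} + M_n\mu(F \cap E_n) \le \epsilon$$
whenever $\mu(F \cap E_n) \le \delta$. Q Q By the Radon-Nikodým theorem (232E), there is an $f_0 \in L^1$ such that $\int_F f_0 = \nu F$ for
every $F \in \Sigma$. Set $u_0 = f_0^\bullet \in L^1$. If $n \in \mathbb N$, $F \in \Sigma$ then
$$\Big|\int_F u_0\Big| = |\nu F| \le \sup_{u\in C} \Big|\int_F u\Big| \le 2^{-n} + M_n\mu(F \cap E_n),$$
so $u_0 \in C$.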

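(iv) Of course the point is that $\mathcal F$ converges to $u_0$. P P Let $h \in (L^1)^*$. Then there is a $v \in L^\infty$ such that
$h(u) = \int u \times v$ for every $u \in C$. Express $v$ as $g^\bullet$, where $g : X \to \mathbb R$ is bounded and $\Sigma$-measurable. Let $\epsilon > 0$. Take
$a_0 \le a_1 \le \ldots \le a_n$ such that $a_{i+1} - a_i \le \epsilon$ for each $i$ while $a_0 \le g(x) < a_n$ for each $x \in X$. Set $F_i = \{x : a_{i-1} \le g(x) < a_i\}$
for $1 \le i \le n$, and set $\tilde g = \sum_{i=1}^n a_i\chi F_i$, $\tilde v = \tilde g^\bullet$; then $\|\tilde v - v\|_\infty \le \epsilon$. We have
$$\begin{aligned}
\int u_0 \times \tilde v &= \sum_{i=1}^n a_i \int_{F_i} u_0 = \sum_{i=1}^n a_i \nu F_i \\
&= \sum_{i=1}^n a_i \lim_{u\to\mathcal F} \int_{F_i} u = \lim_{u\to\mathcal F} \sum_{i=1}^n a_i \int_{F_i} u = \lim_{u\to\mathcal F} \int u \times \tilde v.
\end{aligned}$$
Consequently
$$\begin{aligned}
\limsup_{u\to\mathcal F} \Big|\int u \times v - \int u_0 \times v\Big|
&\le \Big|\int u_0 \times v - \int u_0 \times \tilde v\Big| + \sup_{u\in C} \Big|\int u \times v - \int u \times \tilde v\Big| \\
&\le \|u_0\|_1\|v - \tilde v\|_\infty + \sup_{u\in C} \|u\|_1\|v - \tilde v\|_\infty \\
&\le 2\epsilon \sup_{u\in C} \|u\|_1.
\end{aligned}$$
As $\epsilon$ is arbitrary,
$$\limsup_{u\to\mathcal F} |h(u) - h(u_0)| = \limsup_{u\to\mathcal F} \Big|\int u \times v - \int u_0 \times v\Big| = 0.$$
As $h$ is arbitrary, $u_0$ is a limit of $\mathcal F$ in $C$ for the weak topology of $L^1$. Q Q
As $\mathcal F$ is arbitrary, $C$ is weakly compact in $L^1$, and the proof is complete.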

247D Corollary Let (X, Σ, µ) and (Y, T, ν) be any two measure spaces, and T : L1 (µ) → L1 (ν) a continuous linear
operator. Then T [A] is a uniformly integrable subset of L1 (ν) whenever A is a uniformly integrable subset of L1 (µ).
proof The point is that T is continuous for the respective weak topologies (2A5If). If A ⊆ L1 (µ) is uniformly
integrable, then there is a weakly compact C ⊇ A, by 247C; T [C], being the image of a compact set under a continuous
map, must be weakly compact (2A3N(b-ii)); so T [C] and T [A] are uniformly integrable by the other half of 247C.

247E Complex L1 There are no difficulties, and no surprises, in proving 247C for L1C . If we follow the same proof,
everything works, but of course we must remember to change the constant when applying 246F, or rather 246K, in
part (b-i) of the proof.

247X Basic exercises > (a) Let (X, Σ, µ) be any measure space. Show that if A ⊆ L1 = L1 (µ) is relatively weakly
compact, then {v : v ∈ L1 , |v| ≤ |u| for some u ∈ A} is relatively weakly compact.

(b) Let (X, Σ, µ) be a measure space. On $L^1 = L^1(\mu)$ define pseudometrics $\rho_F$, $\rho'_w$ for $F \in \Sigma$, $w \in L^\infty(\mu)$ by setting
$\rho_F(u, v) = |\int_F u - \int_F v|$, $\rho'_w(u, v) = |\int u \times w - \int v \times w|$ for $u$, $v \in L^1$. Show that on any $\|\,\|_1$-bounded subset of $L^1$,
the topology defined by $\{\rho_F : F \in \Sigma\}$ agrees with the topology generated by $\{\rho'_w : w \in L^\infty\}$.

> (c) Show that for any set X a subset of ℓ1 = ℓ1 (X) is compact for the weak topology of ℓ1 iff it is compact for the
norm topology of ℓ1 . (Hint: 246Xd.)

(d) Use the argument of (a-ii) in the proof of 247C to show directly that if A ⊆ ℓ1 (N) is weakly compact then
inf n∈N |un (n)| = 0 for any sequence hun in∈N in A.
(e) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and T : L2 (ν) → L1 (µ) any bounded linear operator. Show that
{T u : u ∈ L2 (ν), kuk2 ≤ 1} is uniformly integrable in L1 (µ). (Hint: use 244K to see that {u : kuk2 ≤ 1} is weakly
compact in L2 (ν).)

247Y Further exercises (a) Let (X, Σ, µ) be a measure space. Take 1 < p < ∞ and M ≥ 0 and set A = {u : u ∈
Lp = Lp (µ), kukp ≤ M }. Write SA for the topology of convergence in measure on A, that is, the subspace topology
induced by the topology of convergence in measure on L0 (µ). Show that if h ∈ (Lp )∗ then h↾A is continuous for SA ;
so that if T is the weak topology on Lp , then the subspace topology TA is included in SA .
(b) Let (X, Σ, µ) be a measure space and $\langle u_n\rangle_{n\in\mathbb N}$ a sequence in $L^1 = L^1(\mu)$ such that $\lim_{n\to\infty} \int_F u_n$ is defined for
every $F \in \Sigma$. Show that $\{u_n : n \in \mathbb N\}$ is weakly convergent. (Hint: 246Yh.) Find an alternative argument relying on
2A5J and the result of 246Yj.

247 Notes and comments In 247D and 247Xa I try to suggest the power of the identification between weak
compactness and uniform integrability. That a continuous image of a weakly compact set should be weakly compact is
a commonplace of functional analysis; that the solid hull of a uniformly integrable set should be uniformly integrable
is immediate from the definition. But I see no simple arguments to show that a continuous image of a uniformly
integrable set should be uniformly integrable, or that the solid hull of a weakly compact set should be relatively weakly
compact. (Concerning the former, an alternative route does exist; see 371Xf in the next volume.)
I can distinguish two important ideas in the proof of 247C. The first, in (a-ii) of the proof, is a careful manipulation
of sequences; it is the argument needed to show that a weakly compact subset of ℓ1 is norm-compact. (You may find
it helpful to write out a solution to 247Xd.) The Fn(k) and uk are chosen to mimic the situation in which we have a
sequence in $\ell^1$ such that $u_k(k) = 1$ for each $k$. The $k(i)$ are chosen so that the ‘hump’ moves sufficiently rapidly along
for $u_{k(j)}(k(i))$ to be very small whenever $i \ne j$. But this means that $\sum_{i=0}^\infty u_{k(j)}(k(i))$ (corresponding to $\int_G u_{k(j)}$ in
the proof) is always substantial, while $\sum_{i=0}^\infty v(k(i))$ will be small for any putative cluster point $v$ of $\langle u_{k(j)}\rangle_{j\in\mathbb N}$. I used
similar techniques in §246; compare 246Yg.
In the other half of the proof of 247C, the strategy is clearer. Members of L1 correspond to truly continuous
functionals on Σ; the uniform integrability of C makes the corresponding set of functionals ‘uniformly truly continuous’,
so that any limit functional will also be truly continuous and will give us a member of $L^1$ via the Radon-Nikodým
theorem. A straightforward approximation argument ((b-iv) in the proof, and 247Xb) shows that $\lim_{u\to\mathcal F} \int u \times w =
\int v \times w$ for every $w \in L^\infty$. For localizable measures µ, this would complete the proof. For the general case, we need
another step, here done in 247A-247B; a uniformly integrable subset of L1 effectively lives on a σ-finite part of the
measure space, so that we can ignore the rest of the measure and suppose that we have a localizable measure space.
The conditions (ii)-(iv) of 246G make it plain that weak compactness in L1 can be effectively discussed in terms
of sequences; see also 246Yh. I should remark that this is a general feature of weak compactness in Banach spaces
(2A5J). Of course the disjoint-sequence formulations in 246G are characteristic of L1 – I mean that while there are
similar results applicable elsewhere (see Fremlin 74, chap. 8), the ideas are clearest and most dramatically expressed
in their application to L1 .

Chapter 25
Product Measures
I come now to another chapter on ‘pure’ measure theory, discussing a fundamental construction – or, as you may
prefer to consider it, two constructions, since the problems involved in forming the product of two arbitrary measure
spaces (§251) are rather different from those arising in the product of arbitrarily many probability spaces (§254). This
work is going to stretch our technique to the utmost, for while the fundamental theorems to which we are moving are
natural aims, the proofs are lengthy and there are many pitfalls beside the true paths.
The central idea is that of ‘repeated integration’. You have probably already seen formulae of the type ‘$\iint f(x, y)\,dx\,dy$’
used to calculate the integral of a function of two real variables over a region in the plane. One of the basic techniques
of advanced calculus is reversing the order of integration; for instance, we expect $\int_0^1 (\int_y^1 f(x, y)\,dx)\,dy$ to be equal to
$\int_0^1 (\int_0^x f(x, y)\,dy)\,dx$. As I have developed the subject, we already have a third calculation to compare with these two:
$\int_D f$, where $D = \{(x, y) : 0 \le y \le x \le 1\}$ and the integral is taken with respect to Lebesgue measure on the plane.
The first two sections of this chapter are devoted to an analysis of the relationship between one- and two-dimensional
Lebesgue measure which makes these operations valid – some of the time; part of the work has to be devoted to a
careful description of the exact conditions which must be imposed on f and D if we are to be safe.
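
As a purely numerical illustration of the two repeated integrals over the triangle $D$, the following Python sketch approximates both by nested midpoint sums. It is a sketch under stated assumptions, not a substitute for the theorems of this chapter: the helper names (`midpoint_integral`, `dx_then_dy`, `dy_then_dx`), the grid size and the test function $f(x, y) = xy$ are all choices made for this example, and the method is only appropriate for well-behaved (here continuous) integrands. For $f(x, y) = xy$ both repeated integrals are equal to $1/8$.

```python
def midpoint_integral(g, a, b, n=400):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def dx_then_dy(f, n=400):
    # integral over y in [0, 1] of ( integral over x in [y, 1] of f(x, y) dx )
    return midpoint_integral(
        lambda y: midpoint_integral(lambda x: f(x, y), y, 1.0, n), 0.0, 1.0, n)

def dy_then_dx(f, n=400):
    # integral over x in [0, 1] of ( integral over y in [0, x] of f(x, y) dy )
    return midpoint_integral(
        lambda x: midpoint_integral(lambda y: f(x, y), 0.0, x, n), 0.0, 1.0, n)

f = lambda x, y: x * y      # both repeated integrals equal 1/8 for this f
print(dx_then_dy(f), dy_then_dx(f))
```

Agreement of the two numbers for one smooth $f$ proves nothing in general; the point of the chapter is to say exactly when such agreement is guaranteed.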
Repeated integration, in one form or another, appears everywhere in measure theory, and it is therefore necessary
sooner or later to develop the most general possible expression of the idea. The standard method is through the theory
of products of general measure spaces. Given measure spaces (X, Σ, µ) and (Y, T, ν), the aim is to find a measure λ
on X × Y which will, at least, give the right measure µE · νF to a ‘rectangle’ E × F where E ∈ Σ and F ∈ T. It
turns out that there are already difficulties in deciding what ‘the’ product measure is, and to do the job properly I
find I need, even at this stage, to describe two related but distinguishable constructions. These constructions and their
elementary properties take up the whole of §251. In §252 I turn to integration over the product, with Fubini’s and
Tonelli’s theorems relating $\int f\,d\lambda$ with $\iint f(x, y)\,\mu(dx)\nu(dy)$. Because the construction of $\lambda$ is symmetric between the
two factors, this automatically provides theorems relating $\iint f(x, y)\,\mu(dx)\nu(dy)$ with $\iint f(x, y)\,\nu(dy)\mu(dx)$. §253 looks
at the space $L^1(\lambda)$ and its relationship with $L^1(\mu)$ and $L^1(\nu)$.
For general measure spaces, there are obstacles in the way of forming an infinite product; to start with, if $\langle (X_n, \mu_n)\rangle_{n\in\mathbb N}$
is a sequence of measure spaces, then a product measure $\lambda$ on $X = \prod_{n\in\mathbb N} X_n$ ought to set $\lambda X = \prod_{n=0}^\infty \mu_n X_n$, and
there is no guarantee that the product will converge, or behave well when it does. But for probability spaces, when
µn Xn = 1 for every n, this problem at least evaporates. It is possible to define the product of any family of probability
spaces; this is the burden of §254.
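
The convergence problem for $\prod_{n=0}^\infty \mu_n X_n$ is easy to see on the partial products. The following Python sketch (illustrative only; the function name `partial_products` and the sample total masses are assumptions made for this example) computes the running products for three choices of constant total mass: 2 in every factor, 1/2 in every factor, and 1 in every factor.

```python
from fractions import Fraction

def partial_products(total_masses):
    """Running products mu_0 X_0 * ... * mu_n X_n for a finite list of total masses."""
    out, running = [], Fraction(1)
    for m in total_masses:
        running *= Fraction(m)
        out.append(running)
    return out

print(partial_products([2] * 5))                 # total mass 2 in every factor: grows without bound
print(partial_products([Fraction(1, 2)] * 5))    # total mass 1/2 in every factor: shrinks to 0
print(partial_products([1] * 5))                 # probability spaces: constantly 1
```

Only the last case gives a limit that is neither 0 nor infinite, which is the situation treated in §254.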
I end the chapter with three sections which are a preparation for Chapters 27 and 28, but are also important in
their own right as an investigation of the way in which the group structure of $\mathbb R^r$ interacts with Lebesgue and other
measures. §255 deals with the ‘convolution’ $f * g$ of two functions, where $(f * g)(x) = \int f(y)g(x - y)\,dy$ (the integration
being with respect to Lebesgue measure). In §257 I show that some of the same ideas, suitably transformed, can be
used to describe a convolution $\nu_1 * \nu_2$ of two measures on $\mathbb R^r$; in preparation for this I include a section on Radon
measures on $\mathbb R^r$ (§256).
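
To make the convolution formula concrete, here is a small Python sketch which approximates $(f * g)(x) = \int f(y)g(x - y)\,dy$ by a midpoint sum when $f = g$ is the indicator function of [0, 1]. The names `indicator_unit_interval` and `convolve_at`, the grid size, and the restriction of the integral to an interval containing the support of $f$ are assumptions made for this example. For this choice the convolution is the ‘triangle’ function, equal to $x$ on [0, 1], to $2 - x$ on [1, 2], and to 0 elsewhere.

```python
def indicator_unit_interval(t):
    """Indicator function of [0, 1]."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def convolve_at(f, g, x, a=0.0, b=1.0, n=1000):
    """Midpoint-rule approximation to the integral of f(y) g(x - y) dy over [a, b];
    [a, b] is assumed to contain the support of f."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) * g(x - (a + (i + 0.5) * h))
                   for i in range(n))

chi = indicator_unit_interval
for x in (0.5, 1.0, 1.5, 2.5):
    print(x, convolve_at(chi, chi, x))
```

The printed values should be close to 0.5, 1, 0.5 and 0 respectively, matching the triangle function.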

251 Finite products


The first construction to set up is the product of a pair of measure spaces. It turns out that there are already
substantial technical difficulties in the way of finding a canonical universally applicable method. I find myself therefore
describing two related, but distinct, constructions, the ‘primitive’ and ‘c.l.d.’ product measures (251C, 251F). After
listing the fundamental properties of the c.l.d. product measure (251I-251J), I work through the identification of the
product of Lebesgue measure with itself (251N) and a fairly thorough discussion of subspaces (251O-251S).

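251A Definition Let (X, Σ, µ) and (Y, T, ν) be two measure spaces. For A ⊆ X × Y set
$$\theta A = \inf\Big\{\sum_{n=0}^\infty \mu E_n \cdot \nu F_n : E_n \in \Sigma,\ F_n \in \mathrm T\ \ \forall\ n \in \mathbb N,\ A \subseteq \bigcup_{n\in\mathbb N} E_n \times F_n\Big\}.$$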

Remark In the products µEn · νFn , 0 · ∞ is to be taken as 0, as in §135.

251B Lemma In the context of 251A, θ is an outer measure on X × Y .


proof (a) Setting En = Fn = ∅ for every n ∈ N, we see that θ∅ = 0.
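(b) If $A \subseteq B \subseteq X \times Y$, then whenever $B \subseteq \bigcup_{n\in\mathbb N} E_n \times F_n$ we shall have $A \subseteq \bigcup_{n\in\mathbb N} E_n \times F_n$; so $\theta A \le \theta B$.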
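(c) Let $\langle A_n\rangle_{n\in\mathbb N}$ be a sequence of subsets of $X \times Y$, with union $A$. For any $\epsilon > 0$, we may choose, for each $n \in \mathbb N$,
sequences $\langle E_{nm}\rangle_{m\in\mathbb N}$ in $\Sigma$ and $\langle F_{nm}\rangle_{m\in\mathbb N}$ in $\mathrm T$ such that $A_n \subseteq \bigcup_{m\in\mathbb N} E_{nm} \times F_{nm}$ and $\sum_{m=0}^\infty \mu E_{nm} \cdot \nu F_{nm} \le \theta A_n + 2^{-n}\epsilon$.
Because $\mathbb N \times \mathbb N$ is countable, we have a bijection $k \mapsto (n_k, m_k) : \mathbb N \to \mathbb N \times \mathbb N$, and now
$$A \subseteq \bigcup_{n,m\in\mathbb N} E_{nm} \times F_{nm} = \bigcup_{k\in\mathbb N} E_{n_km_k} \times F_{n_km_k},$$
so that
$$\begin{aligned}
\theta A &\le \sum_{k=0}^\infty \mu E_{n_km_k} \cdot \nu F_{n_km_k} = \sum_{n=0}^\infty \sum_{m=0}^\infty \mu E_{nm} \cdot \nu F_{nm} \\
&\le \sum_{n=0}^\infty (\theta A_n + 2^{-n}\epsilon) = 2\epsilon + \sum_{n=0}^\infty \theta A_n.
\end{aligned}$$
As $\epsilon$ is arbitrary, $\theta A \le \sum_{n=0}^\infty \theta A_n$.
As $\langle A_n\rangle_{n\in\mathbb N}$ is arbitrary, $\theta$ is an outer measure.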

251C Definition Let (X, Σ, µ) and (Y, T, ν) be measure spaces. By the primitive product measure on X × Y
I shall mean the measure λ0 derived by Carathéodory’s method (113C) from the outer measure θ defined in 251A.
Remark I ought to point out that there is no general agreement on what ‘the’ product measure on X × Y should be.
Indeed in 251F below I will introduce an alternative one, and in the notes to this section I will mention a third.

251D Definition It is convenient to have a name for a natural construction for σ-algebras. If $X$ and $Y$ are
sets with σ-algebras $\Sigma \subseteq \mathcal PX$ and $\mathrm T \subseteq \mathcal PY$, I will write $\Sigma \widehat\otimes \mathrm T$ for the σ-algebra of subsets of $X \times Y$ generated by
$\{E \times F : E \in \Sigma, F \in \mathrm T\}$.

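251E Proposition Let (X, Σ, µ) and (Y, T, ν) be measure spaces; let $\lambda_0$ be the primitive product measure on $X \times Y$,
and $\Lambda$ its domain. Then $\Sigma \widehat\otimes \mathrm T \subseteq \Lambda$ and $\lambda_0(E \times F) = \mu E \cdot \nu F$ for all $E \in \Sigma$ and $F \in \mathrm T$.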
proof Throughout this proof, write Σf = {E : E ∈ Σ, µE < ∞}, Tf = {F : F ∈ T, νF < ∞}.
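(a) Suppose that $E \in \Sigma$ and $A \subseteq X \times Y$. For any $\epsilon > 0$, there are sequences $\langle E_n\rangle_{n\in\mathbb N}$ in $\Sigma$ and $\langle F_n\rangle_{n\in\mathbb N}$ in $\mathrm T$ such
that $A \subseteq \bigcup_{n\in\mathbb N} E_n \times F_n$ and $\sum_{n=0}^\infty \mu E_n \cdot \nu F_n \le \theta A + \epsilon$. Now
$$A \cap (E \times Y) \subseteq \bigcup_{n\in\mathbb N} (E_n \cap E) \times F_n, \quad A \setminus (E \times Y) \subseteq \bigcup_{n\in\mathbb N} (E_n \setminus E) \times F_n,$$
so
$$\begin{aligned}
\theta(A \cap (E \times Y)) + \theta(A \setminus (E \times Y)) &\le \sum_{n=0}^\infty \mu(E_n \cap E) \cdot \nu F_n + \sum_{n=0}^\infty \mu(E_n \setminus E) \cdot \nu F_n \\
&= \sum_{n=0}^\infty \mu E_n \cdot \nu F_n \le \theta A + \epsilon.
\end{aligned}$$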

As ǫ is arbitrary, θ(A ∩ (E × Y )) + θ(A \ (E × Y )) ≤ θA. And this is enough to ensure that E × Y ∈ Λ (see 113D).
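(b) Similarly, $X \times F \in \Lambda$ for every $F \in \mathrm T$, so $E \times F = (E \times Y) \cap (X \times F) \in \Lambda$ for every $E \in \Sigma$, $F \in \mathrm T$.
Because $\Lambda$ is a σ-algebra, it must include the smallest σ-algebra containing all the products $E \times F$, that is, $\Lambda \supseteq \Sigma \widehat\otimes \mathrm T$.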
(c) Take E ∈ Σ, F ∈ T. We know that E × F ∈ Λ; setting E0 = E, F0 = F , En = Fn = ∅ for n ≥ 1 in the definition
of θ, we have
λ0 (E × F ) = θ(E × F ) ≤ µE · νF .

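We have come to the central idea of the construction. In fact $\theta(E \times F) = \mu E \cdot \nu F$. P P Suppose that $E \times F \subseteq
\bigcup_{n\in\mathbb N} E_n \times F_n$ where $E_n \in \Sigma$ and $F_n \in \mathrm T$ for every $n$. Set $u = \sum_{n=0}^\infty \mu E_n \cdot \nu F_n$. If $u = \infty$ or $\mu E = 0$ or $\nu F = 0$ then
of course $\mu E \cdot \nu F \le u$. Otherwise, set
$$I = \{n : n \in \mathbb N, \mu E_n = 0\}, \quad J = \{n : n \in \mathbb N, \nu F_n = 0\}, \quad K = \mathbb N \setminus (I \cup J),$$
$$E' = E \setminus \bigcup_{n\in I} E_n, \quad F' = F \setminus \bigcup_{n\in J} F_n.$$
Then $\mu E' = \mu E$ and $\nu F' = \nu F$; $E' \times F' \subseteq \bigcup_{n\in K} E_n \times F_n$; and for $n \in K$, $\mu E_n < \infty$ and $\nu F_n < \infty$, since
$\mu E_n \cdot \nu F_n \le u < \infty$ and neither $\mu E_n$ nor $\nu F_n$ is zero. Set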
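$$f_n = \nu F_n\,\chi E_n : X \to \mathbb R$$
if $n \in K$, and $f_n = 0 : X \to \mathbb R$ if $n \in I \cup J$. Then $f_n$ is a simple function and $\int f_n = \nu F_n\,\mu E_n$ for $n \in K$, 0 otherwise,
so
$$\sum_{n=0}^\infty \int f_n(x)\,\mu(dx) = \sum_{n=0}^\infty \mu E_n \cdot \nu F_n \le u.$$
By B.Levi’s theorem (123A), applied to $\langle \sum_{k=0}^n f_k\rangle_{n\in\mathbb N}$, $g = \sum_{n=0}^\infty f_n$ is integrable and $\int g\,d\mu \le u$. Write $E''$ for
$\{x : x \in E', g(x) < \infty\}$, so that $\mu E'' = \mu E' = \mu E$. Now take any $x \in E''$ and set $K_x = \{n : n \in K, x \in E_n\}$. Because
$E' \times F' \subseteq \bigcup_{n\in K} E_n \times F_n$, $F' \subseteq \bigcup_{n\in K_x} F_n$ and
$$\nu F = \nu F' \le \sum_{n\in K_x} \nu F_n = \sum_{n=0}^\infty f_n(x) = g(x).$$
Thus $g(x) \ge \nu F$ for every $x \in E''$. We are supposing that $0 < \mu E = \mu E''$ and $0 < \nu F$, so we must have $\nu F < \infty$,
$\mu E'' < \infty$. Now $g \ge \nu F\,\chi E''$, so
$$\mu E \cdot \nu F = \mu E'' \cdot \nu F = \int \nu F\,\chi E'' \le \int g \le u = \sum_{n=0}^\infty \mu E_n \cdot \nu F_n.$$
As $\langle E_n\rangle_{n\in\mathbb N}$, $\langle F_n\rangle_{n\in\mathbb N}$ are arbitrary, $\theta(E \times F) \ge \mu E \cdot \nu F$ and $\theta(E \times F) = \mu E \cdot \nu F$. Q Q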
Thus
λ0 (E × F ) = θ(E × F ) = µE · νF
for all E ∈ Σ, F ∈ T.
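
The formula $\lambda_0(E \times F) = \mu E \cdot \nu F$ can be seen in miniature on finite spaces. The following Python sketch assumes finite sets $X$ and $Y$ in which every subset is measurable and every point has finite mass; in that setting the product measure of a set $W \subseteq X \times Y$ is the sum of the products of point masses over $W$. The names `measure` and `product_measure` and the particular masses are hypothetical choices for this example, and exact rational arithmetic is used so that the two sides can be compared for equality.

```python
from fractions import Fraction
from itertools import product

def measure(point_mass, subset):
    """Measure of a subset of a finite space in which every subset is measurable."""
    return sum((point_mass[p] for p in subset), Fraction(0))

def product_measure(mu, nu, W):
    """Measure of W, a set of pairs (x, y), built from the point masses mu{x} * nu{y}."""
    return sum((mu[x] * nu[y] for (x, y) in W), Fraction(0))

mu = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(0)}
nu = {0: Fraction(2), 1: Fraction(1, 3)}

E, F = {"a", "b"}, {1}
rectangle = set(product(E, F))      # the measurable rectangle E x F
print(product_measure(mu, nu, rectangle), measure(mu, E) * measure(nu, F))
```

The two printed values agree, as the proposition requires; the content of 251E is that the same identity survives in arbitrary measure spaces, where no such pointwise description is available.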

251F Definition Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ0 the primitive product measure defined in
251C. By the c.l.d. product measure on X × Y I shall mean the function λ : dom λ0 → [0, ∞] defined by setting
λW = sup{λ0 (W ∩ (E × F )) : E ∈ Σ, F ∈ T, µE < ∞, νF < ∞}
for W ∈ dom λ0 .

251G Remark I had better show at once that λ is a measure. P P Of course its domain Λ = dom λ0 is a σ-algebra,
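and $\lambda\emptyset = \lambda_0\emptyset = 0$. If $\langle W_n\rangle_{n\in\mathbb N}$ is a disjoint sequence in $\Lambda$, then for any $E \in \Sigma$, $F \in \mathrm T$ of finite measure
$$\lambda_0\Big(\bigcup_{n\in\mathbb N} W_n \cap (E \times F)\Big) = \sum_{n=0}^\infty \lambda_0(W_n \cap (E \times F)) \le \sum_{n=0}^\infty \lambda W_n,$$
so $\lambda(\bigcup_{n\in\mathbb N} W_n) \le \sum_{n=0}^\infty \lambda W_n$. On the other hand, if $a < \sum_{n=0}^\infty \lambda W_n$, then we can find $m \in \mathbb N$ and $a_0, \ldots, a_m$ such
that $a \le \sum_{n=0}^m a_n$ and $a_n < \lambda W_n$ for each $n \le m$; now there are $E_0, \ldots, E_m \in \Sigma$ and $F_0, \ldots, F_m \in \mathrm T$, all of finite
measure, such that $a_n \le \lambda_0(W_n \cap (E_n \times F_n))$ for each $n$. Setting $E = \bigcup_{n\le m} E_n$ and $F = \bigcup_{n\le m} F_n$, we have $\mu E < \infty$
and $\nu F < \infty$, so
$$\begin{aligned}
\lambda\Big(\bigcup_{n\in\mathbb N} W_n\Big) &\ge \lambda_0\Big(\bigcup_{n\in\mathbb N} W_n \cap (E \times F)\Big) = \sum_{n=0}^\infty \lambda_0(W_n \cap (E \times F)) \\
&\ge \sum_{n=0}^m \lambda_0(W_n \cap (E_n \times F_n)) \ge \sum_{n=0}^m a_n \ge a.
\end{aligned}$$
As $a$ is arbitrary, $\lambda(\bigcup_{n\in\mathbb N} W_n) \ge \sum_{n=0}^\infty \lambda W_n$ and $\lambda(\bigcup_{n\in\mathbb N} W_n) = \sum_{n=0}^\infty \lambda W_n$. As $\langle W_n\rangle_{n\in\mathbb N}$ is arbitrary, $\lambda$ is a measure.
Q Q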

251H We need a simple property of the measure λ0 .

Lemma Let (X, Σ, µ) and (Y, T, ν) be two measure spaces; let λ0 be the primitive product measure on X × Y , and Λ
its domain. If H ⊆ X × Y and H ∩ (E × F ) ∈ Λ whenever µE < ∞ and νF < ∞, then H ∈ Λ.

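proof Let $\theta$ be the outer measure described in 251A. Suppose that $A \subseteq X \times Y$ and $\theta A < \infty$. Let $\epsilon > 0$. Let $\langle E_n\rangle_{n\in\mathbb N}$,
$\langle F_n\rangle_{n\in\mathbb N}$ be sequences in $\Sigma$, $\mathrm T$ respectively such that $A \subseteq \bigcup_{n\in\mathbb N} E_n \times F_n$ and $\sum_{n=0}^\infty \mu E_n \cdot \nu F_n \le \theta A + \epsilon$. Now, for each
$n$, the product of the measures $\mu E_n$, $\nu F_n$ is finite, so either one is zero or both are finite. If $\mu E_n = 0$ or $\nu F_n = 0$ then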
of course
µEn · νFn = 0 = θ((En × Fn ) ∩ H) + θ((En × Fn ) \ H).
If µEn < ∞ and νFn < ∞ then
µEn · νFn = λ0 (En × Fn )


= λ0 ((En × Fn ) ∩ H) + λ0 ((En × Fn ) \ H)
= θ((En × Fn ) ∩ H) + θ((En × Fn ) \ H).

Accordingly, because θ is an outer measure,



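$$\begin{aligned}
\theta(A \cap H) + \theta(A \setminus H) &\le \sum_{n=0}^\infty \theta((E_n \times F_n) \cap H) + \sum_{n=0}^\infty \theta((E_n \times F_n) \setminus H) \\
&= \sum_{n=0}^\infty \mu E_n \cdot \nu F_n \le \theta A + \epsilon.
\end{aligned}$$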

As ǫ is arbitrary, θ(A ∩ H) + θ(A \ H) ≤ θA. As A is arbitrary, H ∈ Λ.

251I Now for the fundamental properties of the c.l.d. product measure.
Theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces; let λ be the c.l.d. product measure on X × Y , and Λ its
domain. Then
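(a) $\Sigma \widehat\otimes \mathrm T \subseteq \Lambda$ and $\lambda(E \times F) = \mu E \cdot \nu F$ whenever $E \in \Sigma$, $F \in \mathrm T$ and $\mu E \cdot \nu F < \infty$;
(b) for every $W \in \Lambda$ there is a $V \in \Sigma \widehat\otimes \mathrm T$ such that $V \subseteq W$ and $\lambda V = \lambda W$;
(c) $(X \times Y, \Lambda, \lambda)$ is complete and locally determined, and in fact is the c.l.d. version of $(X \times Y, \Lambda, \lambda_0)$ as described
in 213D-213E; in particular, $\lambda W = \lambda_0 W$ whenever $\lambda_0 W < \infty$;
(d) if $W \in \Lambda$ and $\lambda W > 0$ then there are $E \in \Sigma$, $F \in \mathrm T$ such that $\mu E < \infty$, $\nu F < \infty$ and $\lambda(W \cap (E \times F)) > 0$;
(e) if $W \in \Lambda$ and $\lambda W < \infty$, then for every $\epsilon > 0$ there are $E_0, \ldots, E_n \in \Sigma$, $F_0, \ldots, F_n \in \mathrm T$, all of finite measure,
such that $\lambda(W \triangle \bigcup_{i\le n} (E_i \times F_i)) \le \epsilon$.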
proof Take θ to be the outer measure of 251A and λ0 the primitive product measure of 251C. Set Σf = {E : E ∈
Σ, µE < ∞} and Tf = {F : F ∈ T, νF < ∞}.
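(a) By 251E, $\Sigma \widehat\otimes \mathrm T \subseteq \Lambda$. If $E \in \Sigma$ and $F \in \mathrm T$ and $\mu E \cdot \nu F < \infty$, either $\mu E \cdot \nu F = 0$ and $\lambda(E \times F) = \lambda_0(E \times F) = 0$
or both $\mu E$ and $\nu F$ are finite and again $\lambda(E \times F) = \lambda_0(E \times F) = \mu E \cdot \nu F$.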
(b)(i) Take any a < λW . Then there are E ∈ Σf , F ∈ Tf such that λ0 (W ∩ (E × F )) > a (251F); now

θ((E × F ) \ W ) = λ0 ((E × F ) \ W )
= λ0 (E × F ) − λ0 (W ∩ (E × F )) < λ0 (E × F ) − a.
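Let $\langle E_n\rangle_{n\in\mathbb N}$, $\langle F_n\rangle_{n\in\mathbb N}$ be sequences in $\Sigma$, $\mathrm T$ respectively such that $(E \times F) \setminus W \subseteq \bigcup_{n\in\mathbb N} E_n \times F_n$ and $\sum_{n=0}^\infty \mu E_n \cdot \nu F_n \le
\lambda_0(E \times F) - a$. Consider
$$V = (E \times F) \setminus \bigcup_{n\in\mathbb N} E_n \times F_n \in \Sigma \widehat\otimes \mathrm T;$$
then $V \subseteq W$, and
$$\begin{aligned}
\lambda V = \lambda_0 V &= \lambda_0(E \times F) - \lambda_0((E \times F) \setminus V) \\
&\ge \lambda_0(E \times F) - \lambda_0\Big(\bigcup_{n\in\mathbb N} E_n \times F_n\Big)
\end{aligned}$$
(because $(E \times F) \setminus V \subseteq \bigcup_{n\in\mathbb N} E_n \times F_n$)
$$\ge \lambda_0(E \times F) - \sum_{n=0}^\infty \mu E_n \cdot \nu F_n \ge a$$
(by the choice of the $E_n$, $F_n$).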


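(ii) Thus for every $a < \lambda W$ there is a $V \in \Sigma \widehat\otimes \mathrm T$ such that $V \subseteq W$ and $\lambda V \ge a$. Now choose a sequence $\langle a_n\rangle_{n\in\mathbb N}$
strictly increasing to $\lambda W$, and for each $a_n$ a corresponding $V_n$; then $V = \bigcup_{n\in\mathbb N} V_n$ belongs to the σ-algebra $\Sigma \widehat\otimes \mathrm T$, is
included in $W$, and has measure at least $\sup_{n\in\mathbb N} \lambda V_n$ and at most $\lambda W$; so $\lambda V = \lambda W$, as required.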
(c)(i) If H ⊆ X × Y is λ-negligible, there is a W ∈ Λ such that H ⊆ W and λW = 0. If E ∈ Σ, F ∈ T are of


finite measure, λ0 (W ∩ (E × F )) = 0; but λ0 , being derived from the outer measure θ by Carathéodory’s method, is
complete (212A), so H ∩ (E × F ) ∈ Λ and λ0 (H ∩ (E × F )) = 0. Because E and F are arbitrary, H ∈ Λ, by 251H. As
H is arbitrary, λ is complete.
(ii) If W ∈ Λ and λW = ∞, then there must be E ∈ Σ, F ∈ T such that µE < ∞, νF < ∞ and λ0 (W ∩(E ×F )) >
0; now
0 < λ(W ∩ (E × F )) ≤ µE · νF < ∞.
Thus λ is semi-finite.
(iii) If H ⊆ X × Y and H ∩ W ∈ Λ whenever λW < ∞, then, in particular, H ∩ (E × F ) ∈ Λ whenever µE < ∞
and νF < ∞; by 251H again, H ∈ Λ. Thus λ is locally determined.
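(iv) If $W \in \Lambda$ and $\lambda_0 W < \infty$, then we have sequences $\langle E_n\rangle_{n\in\mathbb N}$ in $\Sigma$, $\langle F_n\rangle_{n\in\mathbb N}$ in $\mathrm T$ such that $W \subseteq \bigcup_{n\in\mathbb N} (E_n \times F_n)$
and $\sum_{n=0}^\infty \mu E_n \cdot \nu F_n < \infty$. Set
$$I = \{n : \mu E_n = \infty\}, \quad J = \{n : \nu F_n = \infty\}, \quad K = \mathbb N \setminus (I \cup J);$$
then $\nu(\bigcup_{n\in I} F_n) = \mu(\bigcup_{n\in J} E_n) = 0$, so $\lambda_0(W \setminus W') = 0$, where
$$W' = W \cap \bigcup_{n\in K} (E_n \times F_n) \supseteq W \setminus \Big(\Big(\bigcup_{n\in J} E_n \times Y\Big) \cup \Big(X \times \bigcup_{n\in I} F_n\Big)\Big).$$
Now set $E'_n = \bigcup_{i\in K, i\le n} E_i$, $F'_n = \bigcup_{i\in K, i\le n} F_i$ for each $n$. We have $W' = \bigcup_{n\in\mathbb N} W' \cap (E'_n \times F'_n)$, so
$$\lambda W \le \lambda_0 W = \lambda_0 W' = \lim_{n\to\infty} \lambda_0(W' \cap (E'_n \times F'_n)) \le \lambda W' \le \lambda W,$$
and $\lambda W = \lambda_0 W$.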
(v) Following the terminology of 213D, let us write
Λ̃ = {W : W ⊆ X × Y, W ∩ V ∈ Λ whenever V ∈ Λ and λ0 V < ∞},

λ̃W = sup{λ0 (W ∩ V ) : V ∈ Λ, λ0 V < ∞}.


Because λ0 (E × F ) < ∞ whenever µE < ∞ and νF < ∞, Λ̃ ⊆ Λ and Λ̃ = Λ.
Now for any W ∈ Λ we have

λ̃W = sup{λ0 (W ∩ V ) : V ∈ Λ, λ0 V < ∞}


≥ sup{λ0 (W ∩ (E × F )) : E ∈ Σf , F ∈ Tf }
= λW
≥ sup{λ(W ∩ V ) : V ∈ Λ, λ0 V < ∞}
= sup{λ0 (W ∩ V ) : V ∈ Λ, λ0 V < ∞},

using (iv) just above, so that λ = λ̃ is the c.l.d. version of λ0 .


(d) If W ∈ Λ and λW > 0, there are E ∈ Σf and F ∈ Tf such that λ(W ∩ (E × F )) = λ0 (W ∩ (E × F )) > 0.
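(e) There are $E \in \Sigma^f$, $F \in \mathrm T^f$ such that $\lambda_0(W \cap (E \times F)) \ge \lambda W - \frac13\epsilon$; set $V_1 = W \cap (E \times F)$; then
$$\lambda(W \setminus V_1) = \lambda W - \lambda V_1 = \lambda W - \lambda_0 V_1 \le \tfrac13\epsilon.$$
There are sequences $\langle E'_n\rangle_{n\in\mathbb N}$ in $\Sigma$, $\langle F'_n\rangle_{n\in\mathbb N}$ in $\mathrm T$ such that $V_1 \subseteq \bigcup_{n\in\mathbb N} E'_n \times F'_n$ and $\sum_{n=0}^\infty \mu E'_n \cdot \nu F'_n \le \lambda_0 V_1 + \frac13\epsilon$.
Replacing $E'_n$, $F'_n$ by $E'_n \cap E$, $F'_n \cap F$ if necessary, we may suppose that $E'_n \in \Sigma^f$ and $F'_n \in \mathrm T^f$ for every $n$. Set
$V_2 = \bigcup_{n\in\mathbb N} E'_n \times F'_n$; then
$$\lambda(V_2 \setminus V_1) \le \lambda_0(V_2 \setminus V_1) \le \sum_{n=0}^\infty \mu E'_n \cdot \nu F'_n - \lambda_0 V_1 \le \tfrac13\epsilon.$$
Let $m \in \mathbb N$ be such that $\sum_{n=m+1}^\infty \mu E'_n \cdot \nu F'_n \le \frac13\epsilon$, and set
$$V = \bigcup_{n=0}^m E'_n \times F'_n.$$
Then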
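$$\lambda(V_2 \setminus V) \le \sum_{n=m+1}^\infty \mu E'_n \cdot \nu F'_n \le \tfrac13\epsilon.$$
Putting these together, we have $W \triangle V \subseteq (W \setminus V_1) \cup (V_2 \setminus V_1) \cup (V_2 \setminus V)$, so
$$\lambda(W \triangle V) \le \lambda(W \setminus V_1) + \lambda(V_2 \setminus V_1) + \lambda(V_2 \setminus V) \le \tfrac13\epsilon + \tfrac13\epsilon + \tfrac13\epsilon = \epsilon.$$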
And V is of the required form.
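
Part (e) says that sets of finite measure can be approximated by finite unions of measurable rectangles. As a concrete and deliberately simple illustration, the following Python sketch takes the triangle $D = \{(x, y) : 0 \le y \le x \le 1\}$ in the plane, assumes that its measure is $1/2$ (identifying the product of Lebesgue measure with itself with Lebesgue measure on the plane, as in 251N), and approximates it from inside by the ‘staircase’ $V_n = \bigcup_{i<n} [i/n, (i+1)/n] \times [0, i/n]$. The function names are hypothetical choices for this example. Since $V_n \subseteq D$, the measure of $D \triangle V_n$ is $\frac12 - \lambda V_n = \frac{1}{2n}$, which is at most $\epsilon$ once $n \ge 1/(2\epsilon)$.

```python
from fractions import Fraction

def staircase(n):
    """Rectangles [i/n, (i+1)/n] x [0, i/n] for i < n, as ((left, right), (bottom, top))."""
    return [((Fraction(i, n), Fraction(i + 1, n)), (Fraction(0), Fraction(i, n)))
            for i in range(n)]

def total_area(rectangles):
    """Sum of width * height; this is the area of the union when the rectangles
    overlap only along edges, as the staircase rectangles do."""
    return sum(((x1 - x0) * (y1 - y0) for (x0, x1), (y0, y1) in rectangles),
               Fraction(0))

def defect(n):
    """Area of D minus V_n, where D is the triangle (area 1/2) and V_n the staircase inside it."""
    return Fraction(1, 2) - total_area(staircase(n))

print([defect(n) for n in (1, 2, 10, 100)])
```

Here the rectangles are chosen by hand; the proof above shows that suitable rectangles exist for every $W$ of finite measure, using only the definition of $\theta$.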

251J Proposition If (X, Σ, µ) and (Y, T, ν) are semi-finite measure spaces and λ is the c.l.d. product measure on
X × Y , then λ(E × F ) = µE · νF for all E ∈ Σ, F ∈ T.
proof Setting Σf = {E : E ∈ Σ, µE < ∞}, Tf = {F : F ∈ T, νF < ∞}, we have

λ(E × F ) = sup{λ0 ((E ∩ E0 ) × (F ∩ F0 )) : E0 ∈ Σf , F0 ∈ Tf }


= sup{µ(E ∩ E0 ) · ν(F ∩ F0 ) : E0 ∈ Σf , F0 ∈ Tf }
= sup{µ(E ∩ E0 ) : E0 ∈ Σf } · sup{ν(F ∩ F0 ) : F0 ∈ Tf } = µE · νF
(using 213A).

251K σ-finite spaces Of course most of the measure spaces we shall apply these results to are σ-finite, and in this
case there are some useful simplifications.
Proposition Let (X, Σ, µ) and (Y, T, ν) be σ-finite measure spaces. Then the c.l.d. product measure on $X \times Y$ is equal
to the primitive product measure, and is the completion of its restriction to $\Sigma \widehat\otimes \mathrm T$; moreover, this common product
measure is σ-finite.
proof Write λ0 , λ for the primitive and c.l.d. product measures, as usual, and Λ for their domain. Let hEn in∈N ,
hFn in∈N be non-decreasing sequences of sets of finite measure covering X, Y respectively (see 211D).
(a) For each $n \in \mathbb N$, $\lambda(E_n \times F_n) = \mu E_n \cdot \nu F_n$ is finite, by 251Ia. Since $X \times Y = \bigcup_{n\in\mathbb N} E_n \times F_n$, $\lambda$ is σ-finite.
(b) For any W ∈ Λ,
λ0 W = limn→∞ λ0 (W ∩ (En × Fn )) = limn→∞ λ(W ∩ (En × Fn )) = λW .
So λ = λ0 .
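(c) Write $\lambda_B$ for the restriction of $\lambda = \lambda_0$ to $\Sigma \widehat\otimes \mathrm T$, and $\hat\lambda_B$ for its completion.
(i) Suppose that $W \in \operatorname{dom} \hat\lambda_B$. Then there are $W'$, $W'' \in \Sigma \widehat\otimes \mathrm T$ such that $W' \subseteq W \subseteq W''$ and $\lambda_B(W'' \setminus W') = 0$
(212C). In this case, $\lambda(W'' \setminus W') = 0$; as $\lambda$ is complete, $W \in \Lambda$ and
$$\lambda W = \lambda W' = \lambda_B W' = \hat\lambda_B W.$$
Thus $\lambda$ extends $\hat\lambda_B$.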
(ii) If W ∈ Λ, then there is a V ∈ Σ⊗Tb such that V ⊆ W and λ(W \ V ) = 0. P P For each n ∈ N there is a
Vn ∈ Σ⊗T b such that Vn ⊆ W ∩ (En × Fn ) and λVn = λ(W ∩ (En × Fn )) (251Ib). But as λ(En × Fn ) = µEn · νFn is
S
b V ⊆ W and
finite, this means that λ(W ∩ (En × Fn ) \ Vn ) = 0. So if we set V = n∈N Vn , we shall have V ∈ Σ⊗T,
S S
W \ V = n∈N W ∩ (En × Fn ) \ V ⊆ n∈N W ∩ (En × Fn ) \ Vn
is λ-negligible. QQ
b such that V ′ ⊆ (X ×Y )\W and λ(((X ×Y )\W )\V ′ ) = 0. Setting V ′′ = (X ×Y )\V ′ ,
Similarly, there is a V ′ ∈ Σ⊗T
b W ⊆ V ′′ and λ(V ′′ \ W ) = 0. So
V ′′ ∈ Σ⊗T,
λB (V ′′ \ V ) = λ(V ′′ \ V ) = λ(V ′′ \ W ) + λ(W \ V ) = 0,
and W is measured by λ̂B , with λ̂B W = λB V = λW . As W is arbitrary, λ̂B = λ.
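
To see the covering sequences of the proof in a concrete case, here is a sketch under assumptions not taken from the text: $\mu$ is Lebesgue measure on $X = \mathbb{R}$ and $\nu$ is counting measure on $Y = \mathbb{N}$, both σ-finite. One possible choice of $E_n$, $F_n$ is:

```latex
\[
  E_n = [-n, n], \qquad F_n = \{0, 1, \dots, n\}, \qquad
  \lambda(E_n \times F_n) = \mu E_n \cdot \nu F_n = 2n \cdot (n+1) < \infty ,
\]
\[
  \mathbb{R} \times \mathbb{N} = \bigcup_{n \in \mathbb{N}} E_n \times F_n .
\]
```

Both sequences are non-decreasing, so part (b) of the proof applies verbatim: $\lambda_0 W$ and $\lambda W$ are both the limit of the measures of $W \cap (E_n \times F_n)$.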

*251L The following result fits in naturally here; I star it because it will be used seldom (there is a more important
version of the same idea in 254G) and the proof can be skipped until you come to need it.
Proposition Let (X1 , Σ1 , µ1 ), (X2 , Σ2 , µ2 ), (Y1 , T1 , ν1 ) and (Y2 , T2 , ν2 ) be σ-finite measure spaces; let λ1 , λ2 be the
product measures on X1 × Y1 , X2 × Y2 respectively. Suppose that f : X1 → X2 and g : Y1 → Y2 are inverse-measure-
preserving functions, and that h(x, y) = (f (x), g(y)) for x ∈ X1 , y ∈ Y1 . Then h is inverse-measure-preserving.

proof Write Λ1 , Λ2 for the domains of λ1 , λ2 respectively.


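(a) Suppose that $E \in \Sigma_2$ and $F \in \mathrm{T}_2$ have finite measure. Then $\lambda_1 h^{-1}[W \cap (E \times F)]$ is defined and equal to $\lambda_2(W \cap (E \times F))$ for every $W \in \Lambda_2$. P P
\[
\begin{aligned}
\lambda_1 h^{-1}[E \times F] &= \lambda_1(f^{-1}[E] \times g^{-1}[F]) = \mu_1 f^{-1}[E] \cdot \nu_1 g^{-1}[F] \\
&= \mu_2 E \cdot \nu_2 F = \lambda_2(E \times F)
\end{aligned}
\]
by 251E/251J. Q Q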
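(b) Take $E_0 \in \Sigma_2$ and $F_0 \in \mathrm{T}_2$ of finite measure. Let $\tilde\lambda_1$, $\tilde\lambda_2$ be the subspace measures on $f^{-1}[E_0] \times g^{-1}[F_0]$ and $E_0 \times F_0$ respectively. Set $\tilde h = h \upharpoonright f^{-1}[E_0] \times g^{-1}[F_0]$, and write $\iota$ for the identity map from $E_0 \times F_0$ to $X_2 \times Y_2$; let $\lambda = \tilde\lambda_1 \tilde h^{-1}$ and $\lambda' = \tilde\lambda_2 \iota^{-1}$ be the image measures on $X_2 \times Y_2$. Then (a) tells us that
\[
\begin{aligned}
\lambda(E \times F) &= \lambda_1(h^{-1}[(E \cap E_0) \times (F \cap F_0)]) \\
&= \lambda_2((E \cap E_0) \times (F \cap F_0)) = \lambda'(E \times F)
\end{aligned}
\]
whenever $E \in \Sigma_2$ and $F \in \mathrm{T}_2$. By the Monotone Class Theorem (136C), $\lambda$ and $\lambda'$ agree on $\Sigma_2\widehat{\otimes}\mathrm{T}_2$, that is, $\lambda_1(h^{-1}[W \cap (E_0 \times F_0)]) = \lambda_2(W \cap (E_0 \times F_0))$ for every $W \in \Sigma_2\widehat{\otimes}\mathrm{T}_2$.
If $W$ is any member of $\Lambda_2$, there are $W', W'' \in \Sigma_2\widehat{\otimes}\mathrm{T}_2$ such that $W' \subseteq W \subseteq W''$ and $\lambda_2(W'' \setminus W') = 0$ (251K). Now we must have
\[ h^{-1}[W' \cap (E_0 \times F_0)] \subseteq h^{-1}[W \cap (E_0 \times F_0)] \subseteq h^{-1}[W'' \cap (E_0 \times F_0)], \]
\[ \lambda_1(h^{-1}[W'' \cap (E_0 \times F_0)] \setminus h^{-1}[W' \cap (E_0 \times F_0)]) = \lambda_2((W'' \setminus W') \cap (E_0 \times F_0)) = 0; \]
because $\lambda_1$ is complete, $\lambda_1 h^{-1}[W \cap (E_0 \times F_0)]$ is defined and equal to
\[ \lambda_1 h^{-1}[W' \cap (E_0 \times F_0)] = \lambda_2(W' \cap (E_0 \times F_0)) = \lambda_2(W \cap (E_0 \times F_0)). \]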

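(c) Now suppose that $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle F_n\rangle_{n\in\mathbb{N}}$ are non-decreasing sequences of sets of finite measure with union $X_2$, $Y_2$ respectively. If $W \in \Lambda_2$,
\[ \lambda_1 h^{-1}[W] = \sup_{n\in\mathbb{N}} \lambda_1 h^{-1}[W \cap (E_n \times F_n)] = \sup_{n\in\mathbb{N}} \lambda_2(W \cap (E_n \times F_n)) = \lambda_2 W. \]
So h is inverse-measure-preserving, as claimed.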

251M It is time that I gave some examples. Of course the central example is Lebesgue measure. In this case we have the only reasonable result. I pause to describe the leading example of the product $\Sigma\widehat{\otimes}\mathrm{T}$ introduced in 251D.
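Proposition Let $r, s \ge 1$ be integers. Then we have a natural bijection $\phi : \mathbb{R}^r \times \mathbb{R}^s \to \mathbb{R}^{r+s}$, defined by setting
\[ \phi((\xi_1, \dots, \xi_r), (\eta_1, \dots, \eta_s)) = (\xi_1, \dots, \xi_r, \eta_1, \dots, \eta_s) \]
for $\xi_1, \dots, \xi_r, \eta_1, \dots, \eta_s \in \mathbb{R}$. If we write $\mathcal{B}_r$, $\mathcal{B}_s$ and $\mathcal{B}_{r+s}$ for the Borel σ-algebras of $\mathbb{R}^r$, $\mathbb{R}^s$ and $\mathbb{R}^{r+s}$ respectively, then $\phi$ identifies $\mathcal{B}_{r+s}$ with $\mathcal{B}_r\widehat{\otimes}\mathcal{B}_s$.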

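proof (a) Write $\mathcal{B}$ for the σ-algebra $\{\phi^{-1}[W] : W \in \mathcal{B}_{r+s}\}$ copied onto $\mathbb{R}^r \times \mathbb{R}^s$ by the bijection $\phi$; we are seeking to prove that $\mathcal{B} = \mathcal{B}_r\widehat{\otimes}\mathcal{B}_s$. We have maps $\pi_1 : \mathbb{R}^{r+s} \to \mathbb{R}^r$, $\pi_2 : \mathbb{R}^{r+s} \to \mathbb{R}^s$ defined by setting $\pi_1(\phi(x, y)) = x$, $\pi_2(\phi(x, y)) = y$. Each co-ordinate of $\pi_1$ is continuous, therefore Borel measurable (121Db), so $\pi_1^{-1}[E] \in \mathcal{B}_{r+s}$ for every $E \in \mathcal{B}_r$, by 121K. Similarly, $\pi_2^{-1}[F] \in \mathcal{B}_{r+s}$ for every $F \in \mathcal{B}_s$. So $\phi[E \times F] = \pi_1^{-1}[E] \cap \pi_2^{-1}[F]$ belongs to $\mathcal{B}_{r+s}$, that is, $E \times F \in \mathcal{B}$, whenever $E \in \mathcal{B}_r$ and $F \in \mathcal{B}_s$. Because $\mathcal{B}$ is a σ-algebra, $\mathcal{B}_r\widehat{\otimes}\mathcal{B}_s \subseteq \mathcal{B}$.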

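(b) Now examine sets of the form
\[ \{(x, y) : \xi_i \le \alpha\} = \{x : \xi_i \le \alpha\} \times \mathbb{R}^s, \]
\[ \{(x, y) : \eta_j \le \alpha\} = \mathbb{R}^r \times \{y : \eta_j \le \alpha\} \]
for $\alpha \in \mathbb{R}$, $i \le r$ and $j \le s$, taking $x = (\xi_1, \dots, \xi_r)$ and $y = (\eta_1, \dots, \eta_s)$. All of these belong to $\mathcal{B}_r\widehat{\otimes}\mathcal{B}_s$. But the σ-algebra they generate is just $\mathcal{B}$, by 121J. So $\mathcal{B} \subseteq \mathcal{B}_r\widehat{\otimes}\mathcal{B}_s$ and $\mathcal{B} = \mathcal{B}_r\widehat{\otimes}\mathcal{B}_s$.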

251N Theorem Let r, s ≥ 1 be integers. Then the bijection φ : R r × R s → R r+s described in 251M identifies
Lebesgue measure on R r+s with the c.l.d. product λ of Lebesgue measure on R r and Lebesgue measure on R s .

proof Write $\mu_r$, $\mu_s$, $\mu_{r+s}$ for the three versions of Lebesgue measure, $\mu_r^*$, $\mu_s^*$ and $\mu_{r+s}^*$ for the corresponding outer measures, and $\theta$ for the outer measure on $\mathbb{R}^r \times \mathbb{R}^s$ derived from $\mu_r$ and $\mu_s$ by the formula of 251A.
(a) If $I \subseteq \mathbb{R}^r$ and $J \subseteq \mathbb{R}^s$ are half-open intervals, then $\phi[I \times J] \subseteq \mathbb{R}^{r+s}$ is also a half-open interval, and
\[ \mu_{r+s}(\phi[I \times J]) = \mu_r I \cdot \mu_s J; \]
this is immediate from the definition of the Lebesgue measure of an interval. (I speak of ‘half-open’ intervals here, that is, intervals of the form $\prod_{1\le j\le r} [\alpha_j, \beta_j[$, because I used them in the definition of Lebesgue measure in §115. If you prefer to work with open intervals or closed intervals it makes no difference.) Note also that every half-open interval in $\mathbb{R}^{r+s}$ is expressible as $\phi[I \times J]$ for suitable $I$, $J$.
(b) For any $A \subseteq \mathbb{R}^{r+s}$, $\theta(\phi^{-1}[A]) \le \mu_{r+s}^*(A)$. P P For any $\epsilon > 0$, there is a sequence $\langle K_n\rangle_{n\in\mathbb{N}}$ of half-open intervals in $\mathbb{R}^{r+s}$ such that $A \subseteq \bigcup_{n\in\mathbb{N}} K_n$ and $\sum_{n=0}^{\infty} \mu_{r+s}(K_n) \le \mu_{r+s}^*(A) + \epsilon$. Express each $K_n$ as $\phi[I_n \times J_n]$, where $I_n$ and $J_n$ are half-open intervals in $\mathbb{R}^r$ and $\mathbb{R}^s$ respectively; then $\phi^{-1}[A] \subseteq \bigcup_{n\in\mathbb{N}} I_n \times J_n$, so that
\[ \theta(\phi^{-1}[A]) \le \sum_{n=0}^{\infty} \mu_r I_n \cdot \mu_s J_n = \sum_{n=0}^{\infty} \mu_{r+s}(K_n) \le \mu_{r+s}^*(A) + \epsilon. \]
As $\epsilon$ is arbitrary, we have the result. Q Q
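(c) If $E \subseteq \mathbb{R}^r$ and $F \subseteq \mathbb{R}^s$ are measurable, then $\mu_{r+s}^*(\phi[E \times F]) \le \mu_r E \cdot \mu_s F$.
P P (i) Consider first the case $\mu_r E < \infty$, $\mu_s F < \infty$. In this case, given $\epsilon > 0$, there are sequences $\langle I_n\rangle_{n\in\mathbb{N}}$, $\langle J_n\rangle_{n\in\mathbb{N}}$ of half-open intervals such that $E \subseteq \bigcup_{n\in\mathbb{N}} I_n$, $F \subseteq \bigcup_{n\in\mathbb{N}} J_n$,
\[ \sum_{n=0}^{\infty} \mu_r I_n \le \mu_r^* E + \epsilon = \mu_r E + \epsilon, \]
\[ \sum_{n=0}^{\infty} \mu_s J_n \le \mu_s^* F + \epsilon = \mu_s F + \epsilon. \]
Accordingly $E \times F \subseteq \bigcup_{m,n\in\mathbb{N}} I_m \times J_n$ and $\phi[E \times F] \subseteq \bigcup_{m,n\in\mathbb{N}} \phi[I_m \times J_n]$, so that
\[
\begin{aligned}
\mu_{r+s}^*(\phi[E \times F]) &\le \sum_{m,n=0}^{\infty} \mu_{r+s}(\phi[I_m \times J_n]) = \sum_{m,n=0}^{\infty} \mu_r I_m \cdot \mu_s J_n \\
&= \sum_{m=0}^{\infty} \mu_r I_m \cdot \sum_{n=0}^{\infty} \mu_s J_n \le (\mu_r E + \epsilon)(\mu_s F + \epsilon).
\end{aligned}
\]
As $\epsilon$ is arbitrary, we have the result.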


(ii) Next, if $\mu_r E = 0$, there is a sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure covering $\mathbb{R}^s \supseteq F$, so that
\[ \mu_{r+s}^*(\phi[E \times F]) \le \sum_{n=0}^{\infty} \mu_{r+s}^*(\phi[E \times F_n]) \le \sum_{n=0}^{\infty} \mu_r E \cdot \mu_s F_n = 0 = \mu_r E \cdot \mu_s F. \]
Similarly, $\mu_{r+s}^*(\phi[E \times F]) \le \mu_r E \cdot \mu_s F$ if $\mu_s F = 0$. The only remaining case is that in which both of $\mu_r E$, $\mu_s F$ are strictly positive and one is infinite; but in this case $\mu_r E \cdot \mu_s F = \infty$, so surely $\mu_{r+s}^*(\phi[E \times F]) \le \mu_r E \cdot \mu_s F$. Q Q
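(d) If $A \subseteq \mathbb{R}^{r+s}$, then $\mu_{r+s}^*(A) \le \theta(\phi^{-1}[A])$. P P Given $\epsilon > 0$, there are sequences $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle F_n\rangle_{n\in\mathbb{N}}$ of measurable sets in $\mathbb{R}^r$, $\mathbb{R}^s$ respectively such that $\phi^{-1}[A] \subseteq \bigcup_{n\in\mathbb{N}} E_n \times F_n$ and $\sum_{n=0}^{\infty} \mu_r E_n \cdot \mu_s F_n \le \theta(\phi^{-1}[A]) + \epsilon$. Now $A \subseteq \bigcup_{n\in\mathbb{N}} \phi[E_n \times F_n]$, so
\[ \mu_{r+s}^*(A) \le \sum_{n=0}^{\infty} \mu_{r+s}^*(\phi[E_n \times F_n]) \le \sum_{n=0}^{\infty} \mu_r E_n \cdot \mu_s F_n \le \theta(\phi^{-1}[A]) + \epsilon. \]
As $\epsilon$ is arbitrary, we have the result. Q Q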
(e) Putting (b) and (d) together, we have $\theta(\phi^{-1}[A]) = \mu_{r+s}^*(A)$ for every $A \subseteq \mathbb{R}^{r+s}$. Thus $\theta$ on $\mathbb{R}^r \times \mathbb{R}^s$ corresponds exactly to $\mu_{r+s}^*$ on $\mathbb{R}^{r+s}$. So the associated measures $\lambda_0$, $\mu_{r+s}$ must correspond in the same way, writing $\lambda_0$ for the primitive product measure. But 251K tells us that $\lambda_0 = \lambda$, so we have the result.
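
A one-line check of the identification on a rectangle may help fix ideas. This is only a sketch; the particular intervals, with $r = 1$ and $s = 2$, are chosen here for illustration and do not come from the text.

```latex
\[
  I = [0, 1[ \;\subseteq \mathbb{R}, \qquad
  J = [0, 2[ \times [0, 3[ \;\subseteq \mathbb{R}^2, \qquad
  \phi[I \times J] = [0, 1[ \times [0, 2[ \times [0, 3[ \;\subseteq \mathbb{R}^3 ,
\]
\[
  \mu_3(\phi[I \times J]) = 1 \cdot 2 \cdot 3 = 6
  = \mu_1 I \cdot \mu_2 J = \lambda(I \times J) .
\]
```

The theorem says that the same agreement persists for every Lebesgue measurable subset of $\mathbb{R}^{3}$, not just for intervals.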

251O In fact, a large proportion of the applications of the constructions here are to subspaces of Euclidean space,
rather than to the whole product R r × R s . It would not have been especially difficult to write 251N out to deal with
arbitrary subspaces, but I prefer to give a more general description of the product of subspace measures, as I feel that
it illuminates the method. I start with a straightforward result on strictly localizable spaces.
Proposition Let (X, Σ, µ) and (Y, T, ν) be strictly localizable measure spaces. Then the c.l.d. product measure on X × Y is strictly localizable; moreover, if $\langle X_i\rangle_{i\in I}$ and $\langle Y_j\rangle_{j\in J}$ are decompositions of X and Y respectively, $\langle X_i \times Y_j\rangle_{(i,j)\in I\times J}$ is a decomposition of X × Y.

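proof Let $\langle X_i\rangle_{i\in I}$ and $\langle Y_j\rangle_{j\in J}$ be decompositions of $X$, $Y$ respectively. Then $\langle X_i \times Y_j\rangle_{(i,j)\in I\times J}$ is a partition of $X \times Y$ into measurable sets of finite measure. If $W \subseteq X \times Y$ and $\lambda W > 0$, there are sets $E \in \Sigma$, $F \in \mathrm{T}$ such that $\mu E < \infty$, $\nu F < \infty$ and $\lambda(W \cap (E \times F)) > 0$. We know that $\mu E = \sum_{i\in I} \mu(E \cap X_i)$ and $\nu F = \sum_{j\in J} \nu(F \cap Y_j)$, so there must be finite sets $I_0 \subseteq I$, $J_0 \subseteq J$ such that
\[ \mu E \cdot \nu F - \Bigl(\sum_{i\in I_0} \mu(E \cap X_i)\Bigr)\Bigl(\sum_{j\in J_0} \nu(F \cap Y_j)\Bigr) < \lambda(W \cap (E \times F)). \]
Setting $E' = \bigcup_{i\in I_0} X_i$ and $F' = \bigcup_{j\in J_0} Y_j$ we have
\[ \lambda((E \times F) \setminus (E' \times F')) = \lambda(E \times F) - \lambda((E \cap E') \times (F \cap F')) < \lambda(W \cap (E \times F)), \]
so that $\lambda(W \cap (E' \times F')) > 0$. There must therefore be some $i \in I_0$, $j \in J_0$ such that $\lambda(W \cap (X_i \times Y_j)) > 0$.
This shows that $\{X_i \times Y_j : i \in I,\ j \in J\}$ satisfies the criterion of 213O, so that $\lambda$, being complete and locally determined, must be strictly localizable. Because $\langle X_i \times Y_j\rangle_{(i,j)\in I\times J}$ covers $X \times Y$, it is actually a decomposition of $X \times Y$ (213Ob).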
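
For orientation, here is the proposition in the most familiar case. This is a sketch under assumptions made only for illustration: both factors are $\mathbb{R}$ with Lebesgue measure $\mu_1$, and the decomposition used is the partition into unit intervals indexed by $\mathbb{Z}$.

```latex
\[
  X_i = [i, i+1[ \quad (i \in \mathbb{Z}), \qquad
  \lambda(X_i \times X_j) = \mu_1 X_i \cdot \mu_1 X_j = 1 ,
\]
\[
  \lambda W = \sum_{(i,j) \in \mathbb{Z} \times \mathbb{Z}}
      \lambda\bigl(W \cap (X_i \times X_j)\bigr)
  \quad \text{for every } W \in \Lambda .
\]
```

Here the index set is countable, so the sum is just countable additivity; the content of the proposition is that the same formula holds for uncountable decompositions.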

251P Lemma Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y . Let λ∗
be the corresponding outer measure (132B). Then
λ∗ C = sup{θ(C ∩ (E × F )) : E ∈ Σ, F ∈ T, µE < ∞, νF < ∞}
for every C ⊆ X × Y , where θ is the outer measure of 251A.
proof Write Λ for the domain of λ, Σf for {E : E ∈ Σ, µE < ∞}, Tf for {F : F ∈ T, νF < ∞}; set u =
sup{θ(C ∩ (E × F )) : E ∈ Σf , F ∈ Tf }.
(a) If C ⊆ W ∈ Λ, E ∈ Σf and F ∈ Tf , then

θ(C ∩ (E × F )) ≤ θ(W ∩ (E × F )) = λ0 (W ∩ (E × F ))
(where λ0 is the primitive product measure)
≤ λW.
As E and F are arbitrary, u ≤ λW ; as W is arbitrary, u ≤ λ∗ C.
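(b) If $u = \infty$, then of course $\lambda^* C = u$. Otherwise, let $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle F_n\rangle_{n\in\mathbb{N}}$ be sequences in $\Sigma^f$, $\mathrm{T}^f$ respectively such that
\[ u = \sup_{n\in\mathbb{N}} \theta(C \cap (E_n \times F_n)). \]
Consider $C' = C \setminus (\bigcup_{n\in\mathbb{N}} E_n \times \bigcup_{n\in\mathbb{N}} F_n)$. If $E \in \Sigma^f$ and $F \in \mathrm{T}^f$, then for every $n \in \mathbb{N}$ we have
\[
\begin{aligned}
u &\ge \theta(C \cap ((E \cup E_n) \times (F \cup F_n))) \\
&= \theta(C \cap ((E \cup E_n) \times (F \cup F_n)) \cap (E_n \times F_n)) + \theta(C \cap ((E \cup E_n) \times (F \cup F_n)) \setminus (E_n \times F_n)) \\
&\qquad \text{(because } E_n \times F_n \in \Lambda \text{, by 251E)} \\
&\ge \theta(C \cap (E_n \times F_n)) + \theta(C' \cap (E \times F)).
\end{aligned}
\]
Taking the supremum of the right-hand expression as $n$ varies, we have $u \ge u + \theta(C' \cap (E \times F))$ so
\[ \lambda(C' \cap (E \times F)) = \theta(C' \cap (E \times F)) = 0. \]
As $E$ and $F$ are arbitrary, $\lambda C' = 0$.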
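But this means that
\[
\begin{aligned}
\lambda^* C &\le \lambda^*\Bigl(C \cap \bigl(\bigcup_{n\in\mathbb{N}} E_n \times \bigcup_{n\in\mathbb{N}} F_n\bigr)\Bigr) + \lambda^* C' \\
&= \lim_{n\to\infty} \lambda^*\Bigl(C \cap \bigl(\bigcup_{i\le n} E_i \times \bigcup_{i\le n} F_i\bigr)\Bigr) \qquad \text{(using 132Ae)} \\
&\le u,
\end{aligned}
\]
as required.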

251Q Proposition Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and A ⊆ X, B ⊆ Y subsets; write µA , νB for the
subspace measures on A, B respectively. Let λ be the c.l.d. product measure on X × Y , and λ# the subspace measure
it induces on A × B. Let λ̃ be the c.l.d. product measure of µA and νB on A × B. Then
(i) λ̃ extends λ# .
(ii) If
either (α) A ∈ Σ and B ∈ T
or (β) A and B can both be covered by sequences of sets of finite measure
or (γ) µ and ν are both strictly localizable,
then λ̃ = λ# .
proof Let θ be the outer measure on X × Y defined from µ and ν by the formula of 251A, and θ̃ the outer measure on
A × B similarly defined from µA and νB . Write Λ for the domain of λ, Λ̃ for the domain of λ̃, and Λ# = {W ∩ (A × B) :
W ∈ Λ} for the domain of λ# . Set Σf = {E : µE < ∞}, Tf = {F : νF < ∞}.
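(a) The first point to observe is that $\tilde\theta C = \theta C$ for every $C \subseteq A \times B$. P P (i) If $\langle E_n\rangle_{n\in\mathbb{N}}$ and $\langle F_n\rangle_{n\in\mathbb{N}}$ are sequences in $\Sigma$, $\mathrm{T}$ respectively such that $C \subseteq \bigcup_{n\in\mathbb{N}} E_n \times F_n$, then
\[ C = C \cap (A \times B) \subseteq \bigcup_{n\in\mathbb{N}} (E_n \cap A) \times (F_n \cap B), \]
so
\[
\begin{aligned}
\tilde\theta C &\le \sum_{n=0}^{\infty} \mu_A(E_n \cap A) \cdot \nu_B(F_n \cap B) \\
&= \sum_{n=0}^{\infty} \mu^*(E_n \cap A) \cdot \nu^*(F_n \cap B) \le \sum_{n=0}^{\infty} \mu E_n \cdot \nu F_n.
\end{aligned}
\]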

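As $\langle E_n\rangle_{n\in\mathbb{N}}$ and $\langle F_n\rangle_{n\in\mathbb{N}}$ are arbitrary, $\tilde\theta C \le \theta C$. (ii) If $\langle \tilde E_n\rangle_{n\in\mathbb{N}}$, $\langle \tilde F_n\rangle_{n\in\mathbb{N}}$ are sequences in $\Sigma_A = \operatorname{dom}\mu_A$, $\mathrm{T}_B = \operatorname{dom}\nu_B$ respectively such that $C \subseteq \bigcup_{n\in\mathbb{N}} \tilde E_n \times \tilde F_n$, then for each $n \in \mathbb{N}$ we can choose $E_n \in \Sigma$, $F_n \in \mathrm{T}$ such that
\[ \tilde E_n \subseteq E_n, \quad \mu E_n = \mu^* \tilde E_n = \mu_A \tilde E_n, \]
\[ \tilde F_n \subseteq F_n, \quad \nu F_n = \nu^* \tilde F_n = \nu_B \tilde F_n, \]
and now
\[ \theta C \le \sum_{n=0}^{\infty} \mu E_n \cdot \nu F_n = \sum_{n=0}^{\infty} \mu_A \tilde E_n \cdot \nu_B \tilde F_n. \]
As $\langle \tilde E_n\rangle_{n\in\mathbb{N}}$, $\langle \tilde F_n\rangle_{n\in\mathbb{N}}$ are arbitrary, $\theta C \le \tilde\theta C$. Q Q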
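(b) It follows that $\Lambda^{\#} \subseteq \tilde\Lambda$. P P Suppose that $V \in \Lambda^{\#}$ and that $C \subseteq A \times B$. In this case there is a $W \in \Lambda$ such that $V = W \cap (A \times B)$. So
\[ \tilde\theta(C \cap V) + \tilde\theta(C \setminus V) = \theta(C \cap W) + \theta(C \setminus W) = \theta C = \tilde\theta C. \]
As $C$ is arbitrary, $V \in \tilde\Lambda$. Q Q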
Accordingly, for V ∈ Λ# ,

λ# V = λ∗ V = sup{θ(V ∩ (E × F )) : E ∈ Σf , F ∈ Tf }
= sup{θ(V ∩ (Ẽ × F̃ )) : Ẽ ∈ ΣA , F̃ ∈ TB , µA Ẽ < ∞, νB F̃ < ∞}
= sup{θ̃(V ∩ (Ẽ × F̃ )) : Ẽ ∈ ΣA , F̃ ∈ TB , µA Ẽ < ∞, νB F̃ < ∞} = λ̃V,

using 251P twice.


This proves part (i) of the proposition.
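(c) The next thing to check is that if $V \in \tilde\Lambda$ and $V \subseteq E \times F$ where $E \in \Sigma^f$ and $F \in \mathrm{T}^f$, then $V \in \Lambda^{\#}$. P P Let $W \subseteq E \times F$ be a measurable envelope of $V$ with respect to $\lambda$ (132Ee). Then
\[
\begin{aligned}
\theta(W \cap (A \times B) \setminus V) &= \tilde\theta(W \cap (A \times B) \setminus V) = \tilde\lambda(W \cap (A \times B) \setminus V) \\
&\qquad \text{(because } W \cap (A \times B) \in \Lambda^{\#} \subseteq \tilde\Lambda,\ V \in \tilde\Lambda \text{)} \\
&= \tilde\lambda(W \cap (A \times B)) - \tilde\lambda V = \tilde\theta(W \cap (A \times B)) - \tilde\theta V \\
&= \theta(W \cap (A \times B)) - \theta V = \lambda^*(W \cap (A \times B)) - \lambda^* V \\
&\le \lambda W - \lambda^* V = 0.
\end{aligned}
\]
But this means that $W' = W \cap (A \times B) \setminus V \in \Lambda$ and $V = (A \times B) \cap (W \setminus W')$ belongs to $\Lambda^{\#}$. Q Q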
(d) Now fix any V ∈ Λ̃, and look at the conditions (α)-(γ) of part (ii) of the proposition.
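(α) If $A \in \Sigma$ and $B \in \mathrm{T}$, and $C \subseteq X \times Y$, then $A \times B \in \Lambda$ (251E), so
\[
\begin{aligned}
\theta(C \cap V) + \theta(C \setminus V) &= \theta(C \cap V) + \theta((C \setminus V) \cap (A \times B)) + \theta((C \setminus V) \setminus (A \times B)) \\
&= \tilde\theta(C \cap V) + \tilde\theta(C \cap (A \times B) \setminus V) + \theta(C \setminus (A \times B)) \\
&= \tilde\theta(C \cap (A \times B)) + \theta(C \setminus (A \times B)) \\
&= \theta(C \cap (A \times B)) + \theta(C \setminus (A \times B)) = \theta C.
\end{aligned}
\]
As $C$ is arbitrary, $V \in \Lambda$, so $V = V \cap (A \times B)$ belongs to $\Lambda^{\#}$.
(β) If $A \subseteq \bigcup_{n\in\mathbb{N}} E_n$ and $B \subseteq \bigcup_{n\in\mathbb{N}} F_n$ where all the $E_n$, $F_n$ are of finite measure, then $V = \bigcup_{m,n\in\mathbb{N}} V \cap (E_m \times F_n) \in \Lambda^{\#}$, by (c).
(γ) If $\langle X_i\rangle_{i\in I}$, $\langle Y_j\rangle_{j\in J}$ are decompositions of $X$, $Y$ respectively, then for each $i \in I$, $j \in J$ we have $V \cap (X_i \times Y_j) \in \Lambda^{\#}$, that is, there is a $W_{ij} \in \Lambda$ such that $V \cap (X_i \times Y_j) = W_{ij} \cap (A \times B)$. Now $\langle X_i \times Y_j\rangle_{(i,j)\in I\times J}$ is a decomposition of $X \times Y$ for $\lambda$ (251O), so that
\[ W = \bigcup_{i\in I,\, j\in J} W_{ij} \cap (X_i \times Y_j) \in \Lambda, \]
and $V = W \cap (A \times B) \in \Lambda^{\#}$.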
(e) Thus any of the three conditions is sufficient to ensure that $\tilde\Lambda = \Lambda^{\#}$, in which case (b) tells us that $\tilde\lambda = \lambda^{\#}$.

251R Corollary Let r, s ≥ 1 be integers, and φ : Rr × R s → R r+s the natural bijection. If A ⊆ R r and B ⊆ R s ,
then the restriction of φ to A × B identifies the product of Lebesgue measure on A and Lebesgue measure on B with
Lebesgue measure on φ[A × B] ⊆ R r+s .
Remark Note that by ‘Lebesgue measure on A’ I mean the subspace measure µrA on A induced by r-dimensional
Lebesgue measure µr on R r , whether or not A is itself a measurable set.
proof By 251Q, using either of the conditions (ii-β) or (ii-γ), the product measure λ̃ on A × B is just the subspace
measure λ# on A × B induced by the product measure λ on R r × R s . But by 251N we know that φ is an isomorphism
between (R r × Rs , λ) and (R r+s , µr+s ); so it must also identify λ̃ with the subspace measure on φ[A × B].

251S Corollary Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y . If
A ⊆ X and B ⊆ Y can be covered by sequences of sets of finite measure, then λ∗ (A × B) = µ∗ A · ν ∗ B.
proof In the language of 251Q,

λ∗ (A × B) = λ# (A × B) = λ̃(A × B) = µA A · νB B
(by 251K and 251E)
= µ∗ A · ν ∗ B.

251T The next proposition gives an idea of how the technical definitions here fit together.
Proposition Let (X, Σ, µ) and (Y, T, ν) be measure spaces. Write (X, Σ̂, µ̂) and (X, Σ̃, µ̃) for the completion and c.l.d.
version of (X, Σ, µ) (212C, 213E). Let λ, λ̂ and λ̃ be the three c.l.d. product measures on X × Y obtained from the
pairs (µ, ν), (µ̂, ν) and (µ̃, ν) of factor measures. Then λ = λ̂ = λ̃.
proof Write Λ, Λ̂ and Λ̃ for the domains of λ, λ̂, λ̃ respectively; and θ, θ̂, θ̃ for the outer measures on X × Y obtained
by the formula of 251A from the three pairs of factor measures.
(a) If $E \in \Sigma$ and $\mu E < \infty$, then $\theta$, $\hat\theta$ and $\tilde\theta$ agree on subsets of $E \times Y$. P P Take $A \subseteq E \times Y$ and $\epsilon > 0$.
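(i) There are sequences $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$, $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $A \subseteq \bigcup_{n\in\mathbb{N}} E_n \times F_n$ and $\sum_{n=0}^{\infty} \mu E_n \cdot \nu F_n \le \theta A + \epsilon$. Now $\tilde\mu E_n \le \mu E_n$ for every $n$ (213Fb), so
\[ \tilde\theta A \le \sum_{n=0}^{\infty} \tilde\mu E_n \cdot \nu F_n \le \sum_{n=0}^{\infty} \mu E_n \cdot \nu F_n \le \theta A + \epsilon. \]
(ii) There are sequences $\langle \hat E_n\rangle_{n\in\mathbb{N}}$ in $\hat\Sigma$, $\langle \hat F_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $A \subseteq \bigcup_{n\in\mathbb{N}} \hat E_n \times \hat F_n$ and $\sum_{n=0}^{\infty} \hat\mu \hat E_n \cdot \nu \hat F_n \le \hat\theta A + \epsilon$. Now for each $n$ there is an $E_n' \in \Sigma$ such that $\hat E_n \subseteq E_n'$ and $\mu E_n' = \hat\mu \hat E_n$, so that
\[ \theta A \le \sum_{n=0}^{\infty} \mu E_n' \cdot \nu \hat F_n = \sum_{n=0}^{\infty} \hat\mu \hat E_n \cdot \nu \hat F_n \le \hat\theta A + \epsilon. \]
(iii) There are sequences $\langle \tilde E_n\rangle_{n\in\mathbb{N}}$ in $\tilde\Sigma$, $\langle \tilde F_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $A \subseteq \bigcup_{n\in\mathbb{N}} \tilde E_n \times \tilde F_n$ and $\sum_{n=0}^{\infty} \tilde\mu \tilde E_n \cdot \nu \tilde F_n \le \tilde\theta A + \epsilon$. Now for each $n$, $\tilde E_n \cap E \in \hat\Sigma$, so
\[ \hat\theta A \le \sum_{n=0}^{\infty} \hat\mu(\tilde E_n \cap E) \cdot \nu \tilde F_n \le \sum_{n=0}^{\infty} \tilde\mu \tilde E_n \cdot \nu \tilde F_n \le \tilde\theta A + \epsilon. \]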

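(iv) Since $A$ and $\epsilon$ are arbitrary, $\theta = \hat\theta = \tilde\theta$ on $\mathcal{P}(E \times Y)$. Q Q
(b) Consequently, the outer measures $\lambda^*$, $\hat\lambda^*$ and $\tilde\lambda^*$ are identical. P P Use 251P. Take $A \subseteq X \times Y$, $E \in \Sigma$, $\hat E \in \hat\Sigma$, $\tilde E \in \tilde\Sigma$, $F \in \mathrm{T}$ such that $\mu E$, $\hat\mu \hat E$, $\tilde\mu \tilde E$ and $\nu F$ are all finite. Then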
(i)
θ(A ∩ (E × F )) = θ̂(A ∩ (E × F )) ≤ λ̂∗ A, θ(A ∩ (E × F )) = θ̃(A ∩ (E × F )) ≤ λ̃∗ A
because µ̂E and µ̃E are both finite.
(ii) There is an E ′ ∈ Σ such that Ê ⊆ E ′ and µE ′ < ∞, so that
θ̂(A ∩ (Ê × F )) ≤ θ̂(A ∩ (E ′ × F )) = θ(A ∩ (E ′ × F )) ≤ λ∗ A.

(iii) There is an E ′′ ∈ Σ such that E ′′ ⊆ Ẽ and µ̃(Ẽ \ E ′′ ) = 0 (213Fc), so that θ̃((Ẽ \ E ′′ ) × Y ) = 0 and µE ′′ < ∞;
accordingly
θ̃(A ∩ (Ẽ × F )) = θ̃(A ∩ (E ′′ × F )) = θ(A ∩ (E ′′ × F )) ≤ λ∗ A.

(iv) Taking the suprema over E, Ê, Ẽ and F , we get


λ∗ A ≤ λ̂∗ A, λ∗ A ≤ λ̃∗ A, λ̂∗ A ≤ λ∗ A, λ̃∗ A ≤ λ∗ A.
As A is arbitrary, λ∗ = λ̂∗ = λ̃∗ . Q
Q
(c) Now λ, λ̂ and λ̃ are all complete and locally determined, so by 213C are the measures defined by Carathéodory’s
method from their own outer measures, and are therefore identical.

251U It is ‘obvious’, and an easy consequence of theorems so far proved, that the set {(x, x) : x ∈ R} is negligible
for Lebesgue measure on R 2 . The corresponding result is true in the square of any atomless measure space.
Proposition Let (X, Σ, µ) be an atomless measure space, and let λ be the c.l.d. product measure on X × X. Then ∆ = {(x, x) : x ∈ X} is λ-negligible.
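proof Let $E, F \in \Sigma$ be sets of finite measure, and $n \in \mathbb{N}$. Applying 215D repeatedly, we can find a disjoint family $\langle F_i\rangle_{i<n}$ of measurable subsets of $F$ such that $\mu F_i = \frac{\mu F}{n+1}$ for each $i$; setting $F_n = F \setminus \bigcup_{i<n} F_i$, we also have $\mu F_n = \frac{\mu F}{n+1}$. Now
\[ \Delta \cap (E \times F) \subseteq \bigcup_{i\le n} (E \cap F_i) \times F_i, \]
so
\[ \lambda^*(\Delta \cap (E \times F)) \le \sum_{i=0}^{n} \mu(E \cap F_i) \cdot \mu F_i = \frac{\mu F}{n+1} \sum_{i=0}^{n} \mu(E \cap F_i) \le \frac{1}{n+1}\, \mu E \cdot \mu F. \]
As $n$ is arbitrary, $\lambda(\Delta \cap (E \times F)) = 0$; as $E$ and $F$ are arbitrary, $\lambda\Delta = 0$.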
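
The covering used in the proof is easy to picture in the simplest case. The following is a sketch assuming, purely for illustration, that $X = [0, 1[$ with Lebesgue measure and $E = F = X$; the explicit intervals $F_i$ are my choice and are not taken from the text.

```latex
\[
  F_i = \Bigl[\tfrac{i}{n+1}, \tfrac{i+1}{n+1}\Bigr[ \quad (0 \le i \le n), \qquad
  \Delta \subseteq \bigcup_{i \le n} F_i \times F_i ,
\]
\[
  \lambda^*\Delta \le \sum_{i=0}^{n} \mu F_i \cdot \mu F_i
  = (n+1) \cdot \frac{1}{(n+1)^2} = \frac{1}{n+1} .
\]
```

The diagonal is covered by $n+1$ small squares along it, whose total area tends to $0$ as $n \to \infty$; the general proof replaces the intervals by the pieces supplied by 215D.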

*251W Products of more than two spaces The whole of this section can be repeated for arbitrary finite products. The labour is substantial but no new ideas are required. By the time we need the general construction in any formal way, it should come very naturally, and I do not think it is necessary to work through the next couple of pages before proceeding, especially as products of probability spaces are dealt with in §254. However, for completeness, and to help locate results when applications do appear, I list them here. They do of course constitute a very instructive set of exercises. The most important fragments are repeated in 251Xe-251Xf.
Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a finite family of measure spaces, and set $X = \prod_{i\in I} X_i$. Write $\Sigma_i^f = \{E : E \in \Sigma_i,\ \mu_i E < \infty\}$ for each $i \in I$.

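(a) For $A \subseteq X$ set
\[ \theta A = \inf\Bigl\{ \sum_{n=0}^{\infty} \prod_{i\in I} \mu_i E_{ni} : E_{ni} \in \Sigma_i \ \forall\ i \in I,\ n \in \mathbb{N},\ A \subseteq \bigcup_{n\in\mathbb{N}} \prod_{i\in I} E_{ni} \Bigr\}. \]
Then $\theta$ is an outer measure on $X$. Let $\lambda_0$ be the measure on $X$ derived by Carathéodory’s method from $\theta$, and $\Lambda$ its domain.

(b) If $\langle X_i\rangle_{i\in I}$ is a finite family of sets and $\Sigma_i$ is a σ-algebra of subsets of $X_i$ for each $i \in I$, then $\widehat{\bigotimes}_{i\in I} \Sigma_i$ is the σ-algebra of subsets of $X = \prod_{i\in I} X_i$ generated by $\{\prod_{i\in I} E_i : E_i \in \Sigma_i \text{ for every } i \in I\}$. (For the corresponding construction when $I$ is infinite, see 254E.)

(c) $\lambda_0(\prod_{i\in I} E_i)$ is defined and equal to $\prod_{i\in I} \mu_i E_i$ whenever $E_i \in \Sigma_i$ for each $i \in I$.

(d) The c.l.d. product measure on $X$ is the measure $\lambda$ defined by setting
\[ \lambda W = \sup\Bigl\{\lambda_0\bigl(W \cap \prod_{i\in I} E_i\bigr) : E_i \in \Sigma_i^f \text{ for each } i \in I\Bigr\} \]
for $W \in \Lambda$. If $I$ is empty, so that $X = \{\emptyset\}$, then the appropriate convention is to set $\lambda X = 1$.

(e) If $H \subseteq X$, then $H \in \Lambda$ iff $H \cap \prod_{i\in I} E_i \in \Lambda$ whenever $E_i \in \Sigma_i^f$ for each $i \in I$.

(f)(i) $\widehat{\bigotimes}_{i\in I} \Sigma_i \subseteq \Lambda$ and $\lambda(\prod_{i\in I} E_i) = \prod_{i\in I} \mu_i E_i$ whenever $E_i \in \Sigma_i^f$ for each $i$.
(ii) For every $W \in \Lambda$ there is a $V \in \widehat{\bigotimes}_{i\in I} \Sigma_i$ such that $V \subseteq W$ and $\lambda V = \lambda W$.
(iii) $\lambda$ is complete and locally determined, and is the c.l.d. version of $\lambda_0$.
(iv) If $W \in \Lambda$ and $\lambda W > 0$ then there are $E_i \in \Sigma_i^f$, for $i \in I$, such that $\lambda(W \cap \prod_{i\in I} E_i) > 0$.
(v) If $W \in \Lambda$ and $\lambda W < \infty$, then for every $\epsilon > 0$ there are $n \in \mathbb{N}$ and $E_{0i}, \dots, E_{ni} \in \Sigma_i^f$, for each $i \in I$, such that $\lambda(W \triangle \bigcup_{k\le n} \prod_{i\in I} E_{ki}) \le \epsilon$.

(g) If each $\mu_i$ is σ-finite, so is $\lambda$, and $\lambda = \lambda_0$ is the completion of its restriction to $\widehat{\bigotimes}_{i\in I} \Sigma_i$.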

(h) If $\langle I_j\rangle_{j\in J}$ is any partition of $I$, then $\lambda$ can be identified with the c.l.d. product of $\langle \lambda_j\rangle_{j\in J}$, where $\lambda_j$ is the c.l.d. product of $\langle \mu_i\rangle_{i\in I_j}$. (See the arguments in 251N and also in 254N below.)

(i) If $I = \{1, \dots, n\}$ and each $\mu_i$ is Lebesgue measure on $\mathbb{R}$, then $\lambda$ can be identified with Lebesgue measure on $\mathbb{R}^n$.

(j) If, for each $i \in I$, we have a decomposition $\langle X_{ij}\rangle_{j\in J_i}$ of $X_i$, then $\langle \prod_{i\in I} X_{i,f(i)}\rangle_{f\in\prod_{i\in I} J_i}$ is a decomposition of $X$.

(k) For any $C \subseteq X$,
\[ \lambda^* C = \sup\Bigl\{\theta\bigl(C \cap \prod_{i\in I} E_i\bigr) : E_i \in \Sigma_i^f \text{ for every } i \in I\Bigr\}. \]

(l) Suppose that $A_i \subseteq X_i$ for each $i \in I$. Write $\lambda^{\#}$ for the subspace measure on $A = \prod_{i\in I} A_i$, and $\tilde\lambda$ for the c.l.d. product of the subspace measures on the $A_i$. Then $\tilde\lambda$ extends $\lambda^{\#}$, and if
either $A_i \in \Sigma_i$ for every $i$
or every $A_i$ can be covered by a sequence of sets of finite measure
or every $\mu_i$ is strictly localizable,
then $\tilde\lambda = \lambda^{\#}$.

(m) If $A_i \subseteq X_i$ can be covered by a sequence of sets of finite measure for each $i \in I$, then $\lambda^*(\prod_{i\in I} A_i) = \prod_{i\in I} \mu_i^* A_i$.

(n) Writing $\hat\mu_i$, $\tilde\mu_i$ for the completion and c.l.d. version of each $\mu_i$, $\lambda$ is the c.l.d. product of $\langle \hat\mu_i\rangle_{i\in I}$ and also of $\langle \tilde\mu_i\rangle_{i\in I}$.

(o) If all the $(X_i, \Sigma_i, \mu_i)$ are the same atomless measure space, then $\{x : x \in X,\ i \mapsto x(i) \text{ is injective}\}$ is $\lambda$-conegligible.

(p) Now suppose that we have another family $\langle (Y_i, \mathrm{T}_i, \nu_i)\rangle_{i\in I}$ of measure spaces, with product $(Y, \Lambda', \lambda')$, and inverse-measure-preserving functions $f_i : X_i \to Y_i$ for each $i$; define $f : X \to Y$ by saying that $f(x)(i) = f_i(x(i))$ for $x \in X$ and $i \in I$. If all the $\nu_i$ are σ-finite, then $f$ is inverse-measure-preserving for $\lambda$ and $\lambda'$.

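251X Basic exercises (a) Let $X$ and $Y$ be sets, $\mathcal{A} \subseteq \mathcal{P}X$ and $\mathcal{B} \subseteq \mathcal{P}Y$. Let $\Sigma$ be the σ-algebra of subsets of $X$ generated by $\mathcal{A}$ and $\mathrm{T}$ the σ-algebra of subsets of $Y$ generated by $\mathcal{B}$. Show that $\Sigma\widehat{\otimes}\mathrm{T}$ is the σ-algebra of subsets of $X \times Y$ generated by $\{A \times B : A \in \mathcal{A},\ B \in \mathcal{B}\}$.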

(b) Let (X, Σ, µ) and (Y, T, ν) be measure spaces; let $\lambda_0$ be the primitive product measure on X × Y, and $\lambda$ the c.l.d. product measure. Show that $\lambda_0 W < \infty$ iff $\lambda W < \infty$ and $W$ is included in a set of the form
\[ (E \times Y) \cup (X \times F) \cup \bigcup_{n\in\mathbb{N}} E_n \times F_n \]
where $\mu E = \nu F = 0$ and $\mu E_n < \infty$, $\nu F_n < \infty$ for every $n$.

> (c) Show that if X and Y are any sets, with their respective counting measures, then the primitive and c.l.d.
product measures on X × Y are both counting measure on X × Y .

(d) Let (X, Σ, µ) and (Y, T, ν) be measure spaces; let λ0 be the primitive product measure on X × Y , and λ the
c.l.d. product measure. Show that

λ0 is locally determined
⇐⇒ λ0 is semi-finite
⇐⇒ λ0 = λ
⇐⇒ λ0 and λ have the same negligible sets.

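> (e) (See 251W.) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of measure spaces, where $I$ is a non-empty finite set. Set $X = \prod_{i\in I} X_i$. For $A \subseteq X$, set
\[ \theta A = \inf\Bigl\{ \sum_{n=0}^{\infty} \prod_{i\in I} \mu_i E_{ni} : E_{ni} \in \Sigma_i \ \forall\ n \in \mathbb{N},\ i \in I,\ A \subseteq \bigcup_{n\in\mathbb{N}} \prod_{i\in I} E_{ni} \Bigr\}. \]
Show that $\theta$ is an outer measure on $X$. Let $\lambda_0$ be the measure defined from $\theta$ by Carathéodory’s method, and for $W \in \operatorname{dom}\lambda_0$ set
\[ \lambda W = \sup\Bigl\{\lambda_0\bigl(W \cap \prod_{i\in I} E_i\bigr) : E_i \in \Sigma_i,\ \mu_i E_i < \infty \text{ for every } i \in I\Bigr\}. \]
Show that $\lambda$ is a measure on $X$, and is the c.l.d. version of $\lambda_0$.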

> (f) (See 251W.) Let $I$ be a non-empty finite set and $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ a family of measure spaces. For non-empty $K \subseteq I$ set $X^{(K)} = \prod_{i\in K} X_i$ and let $\lambda_0^{(K)}$, $\lambda^{(K)}$ be the measures on $X^{(K)}$ constructed as in 251Xe. Show that if $K$ is a non-empty proper subset of $I$, then the natural bijection between $X^{(I)}$ and $X^{(K)} \times X^{(I\setminus K)}$ identifies $\lambda_0^{(I)}$ with the primitive product measure of $\lambda_0^{(K)}$ and $\lambda_0^{(I\setminus K)}$, and $\lambda^{(I)}$ with the c.l.d. product measure of $\lambda^{(K)}$ and $\lambda^{(I\setminus K)}$.

> (g) Using 251Xe-251Xf above, or otherwise, show that if (X1 , Σ1 , µ1 ), (X2 , Σ2 , µ2 ), (X3 , Σ3 , µ3 ) are measure spaces
then the primitive and c.l.d. product measures λ0 , λ of (X1 × X2 ) × X3 , constructed by first taking the appropriate
product measure on X1 × X2 and then taking the product of this with the measure of X3 , are identified with the
corresponding product measures on X1 × (X2 × X3 ) by the canonical bijection between the sets (X1 × X2 ) × X3 and
X1 × (X2 × X3 ).

(h)(i) What happens in 251Xe when I is a singleton? (ii) Devise an appropriate convention to make 251Xe-251Xf
remain valid when one or more of the sets I, K, I \ K there is empty.

> (i) Let (X, Σ, µ) be a complete locally determined measure space, and $I$ any non-empty set; let ν be counting measure on $I$. Show that the c.l.d. product measure on $X \times I$ is equal to (or at any rate identifiable with) the direct sum measure of the family $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$, if we set $(X_i, \Sigma_i, \mu_i) = (X, \Sigma, \mu)$ for every $i$.

> (j) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of measure spaces, with direct sum (X, Σ, µ) (214L). Let (Y, T, ν) be any measure space, and give $X \times Y$, $X_i \times Y$ their c.l.d. product measures. Show that the natural bijection between $X \times Y$ and $Z = \bigcup_{i\in I} (X_i \times Y) \times \{i\}$ is an isomorphism between the measure of $X \times Y$ and the direct sum measure on $Z$.

> (k) Let (X, Σ, µ) be any measure space, and Y a singleton set {y}; let ν be the measure on Y such that νY = 1.
Show that the natural bijection between X × {y} and X identifies the primitive product measure on X × {y} with µ̌
as defined in 213Xa, and the c.l.d. product measure with the c.l.d. version of µ. Explain how to put this together with
251Xg and 251Ic to prove 251T.

> (l) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y. Show that λ is the c.l.d. version of its restriction to $\Sigma\widehat{\otimes}\mathrm{T}$.

(m) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with primitive and c.l.d. product measures $\lambda_0$, $\lambda$. Let $\lambda_1$ be any measure with domain $\Sigma\widehat{\otimes}\mathrm{T}$ such that $\lambda_1(E \times F) = \mu E \cdot \nu F$ whenever $E \in \Sigma$ and $F \in \mathrm{T}$. Show that $\lambda W \le \lambda_1 W \le \lambda_0 W$ for every $W \in \Sigma\widehat{\otimes}\mathrm{T}$.

(n) Let (X, Σ, µ) and (Y, T, ν) be two measure spaces, and λ0 the primitive product measure on X × Y . Show that
the corresponding outer measure λ∗0 is just the outer measure θ of 251A.

(o) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and $A \subseteq X$, $B \subseteq Y$ subsets; write $\mu_A$, $\nu_B$ for the subspace measures. Let $\lambda_0$ be the primitive product measure on $X \times Y$, and $\lambda_0^{\#}$ the subspace measure it induces on $A \times B$. Let $\tilde\lambda_0$ be the primitive product measure of $\mu_A$ and $\nu_B$ on $A \times B$. Show that $\tilde\lambda_0$ extends $\lambda_0^{\#}$. Show that if either (α) $A \in \Sigma$ and $B \in \mathrm{T}$ or (β) $A$ and $B$ can both be covered by sequences of sets of finite measure or (γ) µ and ν are both strictly localizable, then $\tilde\lambda_0 = \lambda_0^{\#}$.

(p) In 251Q, show that λ̃ and λ# will have the same null ideals, even if none of the conditions of 251Q(ii) are
satisfied.

(q) Let (X, Σ, µ) and (Y, T, ν) be any measure spaces, and λ0 the primitive product measure on X × Y . Show that
λ∗0 (A × B) = µ∗ A · ν ∗ B for any A ⊆ X and B ⊆ Y .

(r) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and µ̂ the completion of µ. Show that µ, ν and µ̂, ν have the
same primitive product measures.

(s) Let (X, Σ, µ) be a semi-finite measure space. Show that µ is atomless iff the diagonal {(x, x) : x ∈ X} is negligible
for the c.l.d. product measure on X × X.

(t) Let (X, Σ, µ) be an atomless measure space, and (Y, T, ν) any measure space. Show that the c.l.d. product
measure on X × Y is atomless.

> (u) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y . (i) Show that if µ
and ν are purely atomic, so is λ. (ii) Show that if µ and ν are point-supported, so is λ.

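251Y Further exercises (a) Let $X$, $Y$ be sets with σ-algebras of subsets $\Sigma$, $\mathrm{T}$. Suppose that $h : X \times Y \to \mathbb{R}$ is $\Sigma\widehat{\otimes}\mathrm{T}$-measurable and $\phi : X \to Y$ is $(\Sigma, \mathrm{T})$-measurable (121Yc). Show that $x \mapsto h(x, \phi(x)) : X \to \mathbb{R}$ is $\Sigma$-measurable.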

(b) Show that there are measure spaces (X1 , Σ1 , µ1 ) and (X2 , Σ2 , µ2 ), a probability space (Y, T, ν) and an inverse-
measure-preserving function f : X1 → X2 such that h : X1 × Y → X2 × Y is not inverse-measure-preserving for the
c.l.d. product measures on X1 × Y and X2 × Y , where h(x, y) = (f (x), y) for x ∈ X1 and y ∈ Y .

(c) Let (X, Σ, µ) be a complete locally determined measure space with a subspace A whose measure is not locally
determined (see 216Xb). Set Y = {0}, νY = 1 and consider the c.l.d. product measures on X × Y and A × Y ; write
Λ, Λ̃ for their domains. Show that Λ̃ properly includes {W ∩ (A × Y ) : W ∈ Λ}.

(d) Let (X, Σ, µ) be any measure space, (Y, T, ν) an atomless measure space, and f : X → Y a (Σ, T)-measurable
function. Show that {(x, f (x)) : x ∈ X} is negligible for the c.l.d. product measure on X × Y .

251 Notes and comments There are real difficulties in deciding which construction to declare as ‘the’ product of
two arbitrary measures. My phrase ‘primitive product measure’, and notation λ0 , betray a bias; my own preference is
for the c.l.d. product λ, for two principal reasons. The first is that λ0 is likely to be ‘bad’, in particular, not semi-finite,
even if µ and ν are ‘good’ (251Xd, 252Yk), while λ inherits some of the most important properties of µ and ν (e.g.,
251O); the second is that in the case of topological measure spaces X and Y , there is often a canonical topological
measure on X × Y , which is likely to be more closely related to λ than to λ0 . But for elucidation of this point I must
ask you to wait until §417 in Volume 4.
It would be possible to remove the ‘primitive’ product measure entirely from the exposition, or at least to relegate it
to the exercises. This is indeed what I expect to do in the rest of this treatise, since (in my view) all significant features
of product measures on finitely many factors can be expressed in terms of the c.l.d. product measure. For the first
introduction to product measures, however, a direct approach to the c.l.d. product measure (through the description of
λ∗ in 251P, for instance) is an uncomfortably large bite, and I have some sort of duty to present the most natural rival
to the c.l.d. product measure prominently enough for you to judge for yourself whether I am right to dismiss it. There
certainly are results associated with the primitive product measure (251Xn, 251Xq, 252Yc) which have an agreeable
simplicity.
The clash is avoided altogether, of course, if we specialize immediately to σ-finite spaces, in which the two constructions coincide (251K). But even this does not solve all problems. There is a popular alternative measure often called ‘the’ product measure: the restriction $\lambda_{0B}$ of $\lambda_0$ to the σ-algebra $\Sigma\widehat{\otimes}\mathrm{T}$. (See, for instance, Halmos 50.) The advantage of this is that if a function $f$ on $X \times Y$ is $\Sigma\widehat{\otimes}\mathrm{T}$-measurable, then $x \mapsto f(x, y)$ is $\Sigma$-measurable for every $y \in Y$. (This is because
\[ \{W : W \subseteq X \times Y,\ \{x : (x, y) \in W\} \in \Sigma \ \forall\ y \in Y\} \]
is a σ-algebra of subsets of $X \times Y$ containing $E \times F$ whenever $E \in \Sigma$ and $F \in \mathrm{T}$, and therefore including $\Sigma\widehat{\otimes}\mathrm{T}$.) The primary objection, to my mind, is that Lebesgue measure on $\mathbb{R}^2$ is no longer ‘the’ product of Lebesgue measure on $\mathbb{R}$ with itself. Generally, it is right to seek measures which measure as many sets as possible, and I prefer to face up to the technical problems (which I acknowledge are off-putting) by seeking appropriate definitions on the approach to major theorems, rather than rely on ad hoc fixes when the time comes to apply them.
I omit further examples of product measures for the moment, because the investigation of particular examples will
be much easier with the aid of results from the next section. Of course the leading example, and the one which should
come always to mind in response to the words ‘product measure’, is Lebesgue measure on $\mathbb{R}^2$, the case r = s = 1 of
251N and 251R. For an indication of what can happen when one of the factors is not σ-finite, you could look ahead to
252K.
I hope that you will see that the definition of the outer measure θ in 251A corresponds to the standard definition
of Lebesgue outer measure, with ‘measurable rectangles’ E × F taking the place of intervals, and the functional
$E \times F \mapsto \mu E \cdot \nu F$ taking the place of ‘length’ or ‘volume’ of an interval; moreover, thinking of E and F as intervals,
there is an obvious relation between Lebesgue measure on $\mathbb{R}^2$ and the product measure on $\mathbb{R} \times \mathbb{R}$. Of course an ‘obvious
relationship’ is not the same thing as a proper theorem with exact hypotheses and conclusions, but Theorem 251N
is clearly central. Long before that, however, there is another parallel between the construction of 251A and that of
Lebesgue measure. In both cases, the proof that we have an outer measure comes directly from the defining formula
(in 113Yd I gave as an exercise a general result covering 251B), and consequently a very general construction can
lead us to a measure. But the measure would be of far less interest and value if it did not measure, and measure
correctly, the basic sets, in this case the measurable rectangles. Thus 251E corresponds to the theorem that intervals
are Lebesgue measurable, with the right measure (114Db, 114G). This is the real key to the construction, and is one
of the fundamental ideas of measure theory.
Yet another parallel is in 251Xn; the outer measure defining the primitive product measure λ0 is exactly equal to
the outer measure defined from λ0 . I described the corresponding phenomenon for Lebesgue measure in 132C.
Any construction which claims the title ‘canonical’ must satisfy a variety of natural requirements; for instance, one
expects the canonical bijection between X × Y and Y × X to be an isomorphism between the corresponding product
measure spaces. ‘Commutativity’ of the product in this sense is I think obvious from the definitions in 251A-251C. It
is obviously desirable – not, I think, obviously true – that the product should be ‘associative’ in that the canonical
bijection between (X × Y ) × Z and X × (Y × Z) should also be an isomorphism between the corresponding products
of product measures. This is in fact valid for both the primitive and c.l.d. product measures (251Wh, 251Xe-251Xg).
Working through the classification of measure spaces presented in §211, we find that the primitive product measure
λ0 of arbitrary factor measures µ, ν is complete, while the c.l.d. product measure λ is always complete and locally
determined. λ0 may not be semi-finite, even if µ and ν are strictly localizable (252Yk); but λ will be strictly localizable
if µ and ν are (251O). Of course this is associated with the fact that the c.l.d. product measure is distributive over
direct sums (251Xj). If either µ or ν is atomless, so is λ (251Xt). Both λ and λ0 are σ-finite if µ and ν are (251K). It
is possible for both µ and ν to be localizable but λ not (254U).
At least if you have worked through Chapter 21, you have now done enough ‘pure’ measure theory for this kind
of investigation, however straightforward, to raise a good many questions. Apart from direct sums, we also have the
constructions of ‘completion’, ‘subspace’, ‘outer measure’ and (in particular) ‘c.l.d. version’ to integrate into the new
ideas; I offer some results in 251T and 251Xk. Concerning subspaces, some possibly surprising difficulties arise. The
problem is that the product measure on the product of two subspaces can have a larger domain than one might expect.
I give a simple example in 251Yc and a more elaborate one in 254Ye. For strictly localizable spaces, there is no
problem (251Q); but no other criterion drawn from the list of properties considered in §251 seems adequate to remove
the possibility of a disconcerting phenomenon.

252 Fubini’s theorem


Perhaps the most important feature of the concept of ‘product measure’ is the fact that we can use it to discuss
repeated integrals. In this section I give versions of Fubini’s theorem and Tonelli’s theorem (252B, 252G) with a
variety of corollaries, the most useful ones being versions for σ-finite spaces (252C, 252H). As applications I describe
the relationship between integration and measuring ordinate sets (252N) and calculate the r-dimensional volume of
a ball in $\mathbb{R}^r$ (252Q, 252Xi). I mention counter-examples showing the difficulties which can arise with non-σ-finite
measures and non-integrable functions (252K-252L, 252Xf-252Xg).

252A Repeated integrals Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and f a real-valued function defined on a set dom f ⊆ X × Y. We can seek to form the repeated integral
$$\iint f(x, y)\,\nu(dy)\,\mu(dx) = \int\Big(\int f(x, y)\,\nu(dy)\Big)\mu(dx),$$
which should be interpreted as follows: set
$$D = \Big\{x : x \in X,\ \int f(x, y)\,\nu(dy) \text{ is defined in } [-\infty, \infty]\Big\},$$
$$g(x) = \int f(x, y)\,\nu(dy) \text{ for } x \in D,$$
and then write $\iint f(x, y)\,\nu(dy)\,\mu(dx) = \int g(x)\,\mu(dx)$ if this is defined. Of course the subset of Y on which $y \mapsto f(x, y)$ is defined may vary with x, but it must always be conegligible, as must D.
Similarly, exchanging the roles of X and Y, we can seek a repeated integral
$$\iint f(x, y)\,\mu(dx)\,\nu(dy) = \int\Big(\int f(x, y)\,\mu(dx)\Big)\nu(dy).$$
The point is that, under appropriate conditions on µ and ν, we can relate these repeated integrals to each other by
connecting them both with the integral of f itself with respect to the product measure on X × Y .
As will become apparent shortly, it is essential here to allow oneself to discuss the integral of a function which is not everywhere defined. It is of less importance whether one allows integrands and integrals to take infinite values, but for definiteness let me say that I shall be following the rules of 135F; that is, $\int f = \int f^+ - \int f^-$ provided that f is defined almost everywhere, takes values in [−∞, ∞] and is virtually measurable, and at most one of $\int f^+$, $\int f^-$ is infinite.

252B Theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with c.l.d. product (X × Y, Λ, λ) (251F). Suppose that ν is σ-finite and that µ is either strictly localizable or complete and locally determined. Let f be a [−∞, ∞]-valued function such that $\int f\,d\lambda$ is defined in [−∞, ∞]. Then $\iint f(x, y)\,\nu(dy)\,\mu(dx)$ is defined and is equal to $\int f\,d\lambda$.
proof The proof of this result involves substantial technical difficulties. If you have not seen these ideas before, you
should almost certainly not go straight to the full generality of the version announced above. I will therefore start by
writing out a proof in the case in which both µ and ν are totally finite; this is already lengthy enough. I will present it
in such a way that only the central section (part (b) below) needs to be amended in the general case, and then, after
completing the proof of the special case, I will give the alternative version of (b) which is required for the full result.
(a) Write $\mathcal{L}$ for the family of [0, ∞]-valued functions f such that $\int f\,d\lambda$ and $\iint f(x, y)\,\nu(dy)\,\mu(dx)$ are defined and
equal. My aim is to show first that $f \in \mathcal{L}$ whenever f is non-negative and $\int f\,d\lambda$ is defined, and then to look at differences
of functions in L. To prove that enough functions belong to L, my strategy will be to start with ‘elementary’ functions
and work outwards through progressively larger classes. It is most efficient to begin by describing ways of building new
members of L from old, as follows.
(i) $f_1 + f_2 \in \mathcal{L}$ for all $f_1, f_2 \in \mathcal{L}$, and $cf \in \mathcal{L}$ for all $f \in \mathcal{L}$, c ∈ [0, ∞[; this is because
$$\int (f_1 + f_2)(x, y)\,\nu(dy) = \int f_1(x, y)\,\nu(dy) + \int f_2(x, y)\,\nu(dy),$$
$$\int (cf)(x, y)\,\nu(dy) = c \int f(x, y)\,\nu(dy)$$
whenever the right-hand sides are defined, which we are supposing to be the case for almost every x, so that
$$\begin{aligned}
\iint (f_1 + f_2)(x, y)\,\nu(dy)\,\mu(dx) &= \iint f_1(x, y)\,\nu(dy)\,\mu(dx) + \iint f_2(x, y)\,\nu(dy)\,\mu(dx) \\
&= \int f_1\,d\lambda + \int f_2\,d\lambda = \int (f_1 + f_2)\,d\lambda, \\
\iint (cf)(x, y)\,\nu(dy)\,\mu(dx) &= c \iint f(x, y)\,\nu(dy)\,\mu(dx) = c \int f\,d\lambda = \int (cf)\,d\lambda.
\end{aligned}$$

(ii) If $\langle f_n \rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}$ such that $f_n(x, y) \le f_{n+1}(x, y)$ whenever n ∈ N and $(x, y) \in \operatorname{dom} f_n \cap \operatorname{dom} f_{n+1}$, then $\sup_{n\in\mathbb{N}} f_n \in \mathcal{L}$. P Set $f = \sup_{n\in\mathbb{N}} f_n$; for x ∈ X, n ∈ N set $g_n(x) = \int f_n(x, y)\,\nu(dy)$ when the integral is defined in [0, ∞]. Since here I am allowing ∞ as a value of a function, it is natural to regard f as defined on $\bigcap_{n\in\mathbb{N}} \operatorname{dom} f_n$. By B.Levi’s theorem, $\int f\,d\lambda = \sup_{n\in\mathbb{N}} \int f_n\,d\lambda$; write u for this common value in [0, ∞]. Next, because $f_n \le f_{n+1}$ wherever both are defined, $g_n \le g_{n+1}$ wherever both are defined, for each n; we are supposing that $f_n \in \mathcal{L}$, so $g_n$ is defined µ-almost everywhere for each n, and
$$\sup_{n\in\mathbb{N}} \int g_n\,d\mu = \sup_{n\in\mathbb{N}} \int f_n\,d\lambda = u.$$
By B.Levi’s theorem again, $\int g\,d\mu = u$, where $g = \sup_{n\in\mathbb{N}} g_n$. Now take any $x \in \bigcap_{n\in\mathbb{N}} \operatorname{dom} g_n$, and consider the functions $f_{xn}$ on Y, setting $f_{xn}(y) = f_n(x, y)$ whenever this is defined. Each $f_{xn}$ has an integral in [0, ∞], and $f_{xn}(y) \le f_{x,n+1}(y)$ whenever both are defined, and
$$\sup_{n\in\mathbb{N}} \int f_{xn}\,d\nu = g(x);$$
so, using B.Levi’s theorem for a third time, $\int (\sup_{n\in\mathbb{N}} f_{xn})\,d\nu$ is defined and equal to g(x), that is,
$$\int f(x, y)\,\nu(dy) = g(x).$$
This is true for almost every x, so
$$\iint f(x, y)\,\nu(dy)\,\mu(dx) = \int g\,d\mu = u = \int f\,d\lambda.$$
Thus $f \in \mathcal{L}$, as claimed. Q
(iii) The expression of the ideas in the next section of the proof will go more smoothly if I introduce another term. Write $\mathcal{W}$ for $\{W : W \subseteq X \times Y,\ \chi W \in \mathcal{L}\}$. Then
(α) if $W, W' \in \mathcal{W}$ and $W \cap W' = \emptyset$, $W \cup W' \in \mathcal{W}$
by (i), because $\chi(W \cup W') = \chi W + \chi W'$,
(β) $\bigcup_{n\in\mathbb{N}} W_n \in \mathcal{W}$ whenever $\langle W_n \rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence in $\mathcal{W}$
because $\langle \chi W_n \rangle_{n\in\mathbb{N}} \uparrow \chi W$, where $W = \bigcup_{n\in\mathbb{N}} W_n$, and we can use (ii).
It is also helpful to note that, for any W ⊆ X × Y and any x ∈ X, $\int \chi W(x, y)\,\nu(dy) = \nu W[\{x\}]$, at least whenever $W[\{x\}] = \{y : (x, y) \in W\}$ is measured by ν. Moreover, because λ is complete, a set W ⊆ X × Y belongs to Λ iff χW is λ-virtually measurable iff $\int \chi W\,d\lambda$ is defined in [0, ∞], and in this case $\lambda W = \int \chi W\,d\lambda$.
(iv) Finally, we need to observe that, in appropriate circumstances, the difference of two members of $\mathcal{W}$ will belong to $\mathcal{W}$: if $W, W' \in \mathcal{W}$ and $W \subseteq W'$ and $\lambda W' < \infty$, then $W' \setminus W \in \mathcal{W}$. P We are supposing that $g(x) = \int \chi W(x, y)\,\nu(dy)$ and $g'(x) = \int \chi W'(x, y)\,\nu(dy)$ are defined for almost every x, and that $\int g\,d\mu = \lambda W$, $\int g'\,d\mu = \lambda W'$. Because $\lambda W'$ is finite, $g'$ must be finite almost everywhere, and $D = \{x : x \in \operatorname{dom} g \cap \operatorname{dom} g',\ g'(x) < \infty\}$ is conegligible. Now, for any x ∈ D, both g(x) and $g'(x)$ are finite, so
$$y \mapsto \chi(W' \setminus W)(x, y) = \chi W'(x, y) - \chi W(x, y)$$
is the difference of two integrable functions, and
$$\begin{aligned}
\int \chi(W' \setminus W)(x, y)\,\nu(dy) &= \int \chi W'(x, y) - \chi W(x, y)\,\nu(dy) \\
&= \int \chi W'(x, y)\,\nu(dy) - \int \chi W(x, y)\,\nu(dy) = g'(x) - g(x).
\end{aligned}$$
Accordingly
$$\iint \chi(W' \setminus W)(x, y)\,\nu(dy)\,\mu(dx) = \int g'(x) - g(x)\,\mu(dx) = \lambda W' - \lambda W = \lambda(W' \setminus W),$$
and $W' \setminus W$ belongs to $\mathcal{W}$. Q
(Of course the argument just above can be shortened by a few words if we allow ourselves to assume that µ and ν
are totally finite, since then g(x) and g ′ (x) will be finite whenever they are defined; but the key idea, that the difference
of integrable functions is integrable, is unchanged.)
(b) Now let us examine the class W, assuming that µ and ν are totally finite.
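(i) $E \times F \in \mathcal{W}$ for all E ∈ Σ, F ∈ T. P $\lambda(E \times F) = \mu E \cdot \nu F$ (251J), and
$$\int \chi(E \times F)(x, y)\,\nu(dy) = \nu F\,\chi E(x)$$
for each x, so
$$\begin{aligned}
\iint \chi(E \times F)(x, y)\,\nu(dy)\,\mu(dx) &= \int (\nu F\,\chi E(x))\,\mu(dx) = \mu E \cdot \nu F \\
&= \lambda(E \times F) = \int \chi(E \times F)\,d\lambda. \text{ Q}
\end{aligned}$$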

(ii) Let $\mathcal{C}$ be $\{E \times F : E \in \Sigma,\ F \in \mathrm{T}\}$. Then $\mathcal{C}$ is closed under finite intersections (because $(E \times F) \cap (E' \times F') = (E \cap E') \times (F \cap F')$) and is included in $\mathcal{W}$. In particular, $X \times Y \in \mathcal{W}$. But this, together with (a-iv) and (a-iii-β) above, means that $\mathcal{W}$ is a Dynkin class (definition: 136A), so includes the σ-algebra of subsets of X × Y generated by $\mathcal{C}$, by the Monotone Class Theorem (136B); that is, $\mathcal{W} \supseteq \Sigma\widehat{\otimes}\mathrm{T}$ (definition: 251D).
(iii) Next, $W \in \mathcal{W}$ whenever W ⊆ X × Y is λ-negligible. P By 251Ib, there is a $V \in \Sigma\widehat{\otimes}\mathrm{T}$ such that $V \subseteq (X \times Y) \setminus W$ and $\lambda V = \lambda((X \times Y) \setminus W)$. Because $\lambda(X \times Y) = \mu X \cdot \nu Y$ is finite, $V' = (X \times Y) \setminus V$ is λ-negligible, and we have $W \subseteq V' \in \Sigma\widehat{\otimes}\mathrm{T}$. Consequently
$$0 = \lambda V' = \iint \chi V'(x, y)\,\nu(dy)\,\mu(dx).$$
But this means that
$$D = \Big\{x : \int \chi V'(x, y)\,\nu(dy) \text{ is defined and equal to } 0\Big\}$$
is conegligible. If x ∈ D, then we must have $\chi V'(x, y) = 0$ for ν-almost every y, that is, $V'[\{x\}]$ is negligible; in which case $W[\{x\}] \subseteq V'[\{x\}]$ also is negligible, and $\int \chi W(x, y)\,\nu(dy) = 0$. And this is true for every x ∈ D, so $\int \chi W(x, y)\,\nu(dy)$ is defined and equal to 0 for almost every x, and
$$\iint \chi W(x, y)\,\nu(dy)\,\mu(dx) = 0 = \lambda W,$$
as required. Q
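(iv) It follows that $\Lambda \subseteq \mathcal{W}$. P If W ∈ Λ, then, by 251Ib again, there is a $V \in \Sigma\widehat{\otimes}\mathrm{T}$ such that V ⊆ W and λV = λW, so that $\lambda(W \setminus V) = 0$. Now $V \in \mathcal{W}$ by (ii) and $W \setminus V \in \mathcal{W}$ by (iii), so $W \in \mathcal{W}$ by (a-iii-α). Q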
(c) I return to the class L.
(i) If $f \in \mathcal{L}$ and g is a [0, ∞]-valued function defined and equal to f λ-a.e., then $g \in \mathcal{L}$. P Set
$$W = (X \times Y) \setminus \{(x, y) : (x, y) \in \operatorname{dom} f \cap \operatorname{dom} g,\ f(x, y) = g(x, y)\},$$
so that λW = 0. (Remember that λ is complete.) By (b), $\iint \chi W(x, y)\,\nu(dy)\,\mu(dx) = 0$, that is, $W[\{x\}]$ is ν-negligible for µ-almost every x. Let D be $\{x : x \in X,\ W[\{x\}] \text{ is } \nu\text{-negligible}\}$. Then D is µ-conegligible. If x ∈ D, then
$$W[\{x\}] = Y \setminus \{y : (x, y) \in \operatorname{dom} f \cap \operatorname{dom} g,\ f(x, y) = g(x, y)\}$$
is negligible, so that $\int f(x, y)\,\nu(dy) = \int g(x, y)\,\nu(dy)$ if either is defined. Thus the functions $x \mapsto \int f(x, y)\,\nu(dy)$, $x \mapsto \int g(x, y)\,\nu(dy)$ are equal almost everywhere, and
$$\iint g(x, y)\,\nu(dy)\,\mu(dx) = \iint f(x, y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda = \int g\,d\lambda,$$
so that $g \in \mathcal{L}$. Q
(ii) Now let f be any non-negative function such that $\int f\,d\lambda$ is defined in [0, ∞]. Then $f \in \mathcal{L}$. P For k, n ∈ N set
$$W_{nk} = \{(x, y) : (x, y) \in \operatorname{dom} f,\ f(x, y) \ge 2^{-n}k\}.$$
Because λ is complete and f is λ-virtually measurable and dom f is conegligible, every $W_{nk}$ belongs to Λ, so $\chi W_{nk} \in \mathcal{L}$, by (b). Set $f_n = \sum_{k=1}^{4^n} 2^{-n} \chi W_{nk}$, so that
$$f_n(x, y) = \begin{cases}
2^{-n}k & \text{if } k \le 4^n \text{ and } 2^{-n}k \le f(x, y) < 2^{-n}(k + 1), \\
2^n & \text{if } f(x, y) \ge 2^n, \\
0 & \text{if } (x, y) \in (X \times Y) \setminus \operatorname{dom} f.
\end{cases}$$
By (a-i), $f_n \in \mathcal{L}$ for every n ∈ N, while $\langle f_n \rangle_{n\in\mathbb{N}}$ is non-decreasing, so $f' = \sup_{n\in\mathbb{N}} f_n \in \mathcal{L}$, by (a-ii). But $f =_{\text{a.e.}} f'$, so $f \in \mathcal{L}$, by (i) just above. Q
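
The dyadic approximation used in (c-ii) can be checked numerically at a single point. The following is a minimal sketch in Python, not part of the original text: the function names are illustrative, and exact rational arithmetic is assumed so that the comparison with the case-by-case formula is not disturbed by rounding.

```python
from fractions import Fraction
from math import floor


def dyadic_approx(value, n):
    """Value of f_n at a point where f takes `value`:
    the sum over k = 1..4^n of 2^-n times the indicator of {f >= 2^-n k}."""
    step = Fraction(1, 2 ** n)
    return sum(step for k in range(1, 4 ** n + 1) if value >= k * step)


def closed_form(value, n):
    """The same quantity from the case analysis: round `value` down to a
    multiple of 2^-n, then cap the result at 2^n."""
    return min(Fraction(2 ** n), Fraction(floor(value * 2 ** n), 2 ** n))


if __name__ == "__main__":
    v = Fraction(5, 3)
    for n in range(5):
        print(n, dyadic_approx(v, n), closed_form(v, n))
```

At every point the values f_n increase with n and never exceed f, which is what allows (a-ii) to be applied to the sequence.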
(iii) Finally, let f be any [−∞, ∞]-valued function such that $\int f\,d\lambda$ is defined in [−∞, ∞]. Then $\int f^+\,d\lambda$, $\int f^-\,d\lambda$ are both defined and at most one is infinite. By (ii), both $f^+$ and $f^-$ belong to $\mathcal{L}$. Set $g(x) = \int f^+(x, y)\,\nu(dy)$, $h(x) = \int f^-(x, y)\,\nu(dy)$ whenever these are defined; then $\int g\,d\mu = \int f^+\,d\lambda$ and $\int h\,d\mu = \int f^-\,d\lambda$ are both defined in [0, ∞].
Suppose first that $\int f^-\,d\lambda$ is finite. Then $\int h\,d\mu$ is finite, so h must be finite µ-almost everywhere; set
$$D = \{x : x \in \operatorname{dom} g \cap \operatorname{dom} h,\ h(x) < \infty\}.$$
For any x ∈ D, $\int f^+(x, y)\,\nu(dy)$ and $\int f^-(x, y)\,\nu(dy)$ are defined in [0, ∞], and the latter is finite; so
$$\int f(x, y)\,\nu(dy) = \int f^+(x, y)\,\nu(dy) - \int f^-(x, y)\,\nu(dy) = g(x) - h(x)$$
is defined in ]−∞, ∞]. Because D is conegligible,
$$\begin{aligned}
\iint f(x, y)\,\nu(dy)\,\mu(dx) &= \int g(x) - h(x)\,\mu(dx) = \int g\,d\mu - \int h\,d\mu \\
&= \int f^+\,d\lambda - \int f^-\,d\lambda = \int f\,d\lambda,
\end{aligned}$$
as required.
Thus we have the result when $\int f^-\,d\lambda$ is finite. Similarly, or by applying the argument above to −f, $\iint f(x, y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda$ if $\int f^+\,d\lambda$ is finite.
Thus the theorem is proved, at least when µ and ν are totally finite.
(b*) The only point in the argument above where we needed to know anything special about the measures µ and ν
was in part (b), when showing that Λ ⊆ W. I now return to this point under the hypotheses of the theorem as stated,
that ν is σ-finite and µ is either strictly localizable or complete and locally determined.
(i) It will be helpful to note that the completion µ̂ of µ (212C) is identical with its c.l.d. version µ̃ (213E). P If µ is strictly localizable, then µ̂ = µ̃ by 213Ha. If µ is complete and locally determined, then µ̂ = µ = µ̃ (212D, 213Hf). Q
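(ii) Write $\Sigma^f = \{G : G \in \Sigma,\ \mu G < \infty\}$, $\mathrm{T}^f = \{H : H \in \mathrm{T},\ \nu H < \infty\}$. For $G \in \Sigma^f$, $H \in \mathrm{T}^f$ let $\mu_G$, $\nu_H$ and $\lambda_{G\times H}$ be the subspace measures on G, H and G × H respectively; then $\lambda_{G\times H}$ is the c.l.d. product measure of $\mu_G$ and $\nu_H$ (251Q(ii-α)). Now $W \cap (G \times H) \in \mathcal{W}$ for every W ∈ Λ. P $W \cap (G \times H)$ belongs to the domain of $\lambda_{G\times H}$, so by (b) of this proof, applied to the totally finite measures $\mu_G$ and $\nu_H$,
$$\begin{aligned}
\lambda(W \cap (G \times H)) &= \lambda_{G\times H}(W \cap (G \times H)) \\
&= \int_G \int_H \chi(W \cap (G \times H))(x, y)\,\nu_H(dy)\,\mu_G(dx) \\
&= \int_G \int_Y \chi(W \cap (G \times H))(x, y)\,\nu(dy)\,\mu_G(dx)
\end{aligned}$$
(because $\chi(W \cap (G \times H))(x, y) = 0$ if $y \in Y \setminus H$, so we can use 131E)
$$= \int_X \int_Y \chi(W \cap (G \times H))(x, y)\,\nu(dy)\,\mu(dx)$$
by 131E again, because $\int_Y \chi(W \cap (G \times H))(x, y)\,\nu(dy) = 0$ if $x \in X \setminus G$. So $W \cap (G \times H) \in \mathcal{W}$. Q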
(iii) In fact, $W \in \mathcal{W}$ for every W ∈ Λ. P Remember that we are supposing that ν is σ-finite. Let $\langle Y_n \rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence in $\mathrm{T}^f$ covering Y, and for each n ∈ N set $W_n = W \cap (X \times Y_n)$, $g_n(x) = \int \chi W_n(x, y)\,\nu(dy)$ whenever this is defined. For any $G \in \Sigma^f$,
$$\int_G g_n\,d\mu = \iint \chi(W \cap (G \times Y_n))(x, y)\,\nu(dy)\,\mu(dx)$$
is defined and equal to $\lambda(W \cap (G \times Y_n))$, by (ii). But this means, first, that $G \setminus \operatorname{dom} g_n$ is negligible, that is, that $\hat\mu(G \setminus \operatorname{dom} g_n) = 0$. Since this is so whenever µG is finite, $\tilde\mu(X \setminus \operatorname{dom} g_n) = 0$, and $g_n$ is defined µ̃-a.e.; but µ̃ = µ̂, so $g_n$ is defined µ̂-a.e., that is, µ-a.e. (212Eb). Next, if we set $E_{na} = \{x : x \in \operatorname{dom} g_n,\ g_n(x) \ge a\}$ for a ∈ R, then $E_{na} \cap G \in \hat\Sigma$ whenever $G \in \Sigma^f$, where Σ̂ is the domain of µ̂; by the definition in 213D, $E_{na}$ is measured by µ̃ = µ̂. As a is arbitrary, $g_n$ is µ-virtually measurable (212Fa).
We can therefore speak of $\int g_n\,d\mu$. Now
$$\iint \chi W_n(x, y)\,\nu(dy)\,\mu(dx) = \int g_n\,d\mu = \sup_{G \in \Sigma^f} \int_G g_n$$
(213B, because µ is semi-finite)
$$= \sup_{G \in \Sigma^f} \lambda(W \cap (G \times Y_n)) = \lambda(W \cap (X \times Y_n))$$
by the definition in 251F. Thus $W \cap (X \times Y_n) \in \mathcal{W}$.
This is true for every n ∈ N. Because $\langle Y_n \rangle_{n\in\mathbb{N}} \uparrow Y$, $W \in \mathcal{W}$, by (a-iii-β). Q
(iv) We can therefore return to part (c) of the argument above and conclude as before.

252C The theorem above is of course asymmetric, in that different hypotheses are imposed on the two factor
measures µ and ν. If we want a ‘symmetric’ theorem we have to suppose that they are both σ-finite, as follows.
Corollary Let (X, Σ, µ) and (Y, T, ν) be two σ-finite measure spaces, and λ the c.l.d. product measure on X × Y. If f is λ-integrable, then $\iint f(x, y)\,\nu(dy)\,\mu(dx)$ and $\iint f(x, y)\,\mu(dx)\,\nu(dy)$ are defined, finite and equal.
proof Since µ and ν are surely strictly localizable (211Lc), we can apply 252B from either side to conclude that
$$\iint f(x, y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda = \iint f(x, y)\,\mu(dx)\,\nu(dy).$$
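
In the simplest possible setting, with X and Y finite and every subset measurable, the three quantities in 252C reduce to finite sums and the equality can be verified directly. The sketch below is an illustration under that assumption only; the point weights `mu` and `nu`, the integrand `f` and the function names are invented for the example and do not come from the text.

```python
from fractions import Fraction

# Hypothetical finite measure spaces: each point carries a weight.
mu = {"a": Fraction(1, 2), "b": Fraction(3)}
nu = {0: Fraction(1, 3), 1: Fraction(2), 2: Fraction(5, 4)}


def f(x, y):
    """An arbitrary signed function on X x Y."""
    return (1 if x == "a" else -2) * (y - 1)


def product_integral(func):
    """Integral against the product measure, whose point weights are mu{x} * nu{y}."""
    return sum(func(x, y) * mu[x] * nu[y] for x in mu for y in nu)


def dy_then_dx(func):
    """Repeated integral: integrate in y first, then in x."""
    return sum(mu[x] * sum(func(x, y) * nu[y] for y in nu) for x in mu)


def dx_then_dy(func):
    """Repeated integral: integrate in x first, then in y."""
    return sum(nu[y] * sum(func(x, y) * mu[x] for x in mu) for y in nu)


if __name__ == "__main__":
    print(product_integral(f), dy_then_dx(f), dx_then_dy(f))
```

Here the equality is nothing more than rearrangement of a finite sum; the content of 252B and 252C is that it survives the passage to general σ-finite spaces.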

252D So many applications of Fubini’s theorem are to characteristic functions that I take a few lines to spell out
the form which 252B takes in this case, as in parts (b)-(b*) of the proof there.
Corollary Let (X, Σ, µ) and (Y, T, ν) be measure spaces and λ the c.l.d. product measure on X × Y. Suppose that ν is σ-finite and that µ is either strictly localizable or complete and locally determined.
(i) If W ∈ dom λ, then $\int \nu^* W[\{x\}]\,\mu(dx)$ is defined in [0, ∞] and equal to λW.
(ii) If ν is complete, we can write $\int \nu W[\{x\}]\,\mu(dx)$ in place of $\int \nu^* W[\{x\}]\,\mu(dx)$.
proof The point is just that $\int \chi W(x, y)\,\nu(dy) = \hat\nu W[\{x\}]$ whenever either is defined, where ν̂ is the completion of ν (212F). Now 252B tells us that
$$\lambda W = \iint \chi W(x, y)\,\nu(dy)\,\mu(dx) = \int \hat\nu W[\{x\}]\,\mu(dx).$$
We always have $\hat\nu W[\{x\}] = \nu^* W[\{x\}]$, by the definition of ν̂ (212C); and if ν is complete, then ν̂ = ν so $\lambda W = \int \nu W[\{x\}]\,\mu(dx)$.
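
For a set W in a product of two finite spaces, the formula of 252D says that λW can be computed by measuring each vertical section and integrating the result. The following minimal sketch assumes finite spaces with point weights; all the names and the particular set `W` are hypothetical.

```python
from fractions import Fraction

# Hypothetical point weights on X and Y, and a subset W of X x Y.
mu = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(1, 4)}
nu = {0: Fraction(1, 3), 1: Fraction(2)}
W = {("a", 0), ("a", 1), ("c", 1)}


def vertical_section(W, x):
    """W[{x}] = {y : (x, y) in W}."""
    return {y for (x2, y) in W if x2 == x}


def horizontal_section(W, y):
    """W^{-1}[{y}] = {x : (x, y) in W}."""
    return {x for (x, y2) in W if y2 == y}


def product_measure(W):
    """Product measure of W computed point by point."""
    return sum(mu[x] * nu[y] for (x, y) in W)


def via_vertical_sections(W):
    """Integral over x of nu(W[{x}])."""
    return sum(mu[x] * sum(nu[y] for y in vertical_section(W, x)) for x in mu)


def via_horizontal_sections(W):
    """Integral over y of mu(W^{-1}[{y}])."""
    return sum(nu[y] * sum(mu[x] for x in horizontal_section(W, y)) for y in nu)
```

With these weights all three computations give 5/3.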

252E Corollary Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with c.l.d. product (X × Y, Λ, λ). Suppose that ν is σ-finite and that µ has locally determined negligible sets (213I). Then if f is a Λ-measurable real-valued function defined on a subset of X × Y, $y \mapsto f(x, y)$ is ν-virtually measurable for µ-almost every x ∈ X.
proof Let $\tilde f$ be a Λ-measurable extension of f to a real-valued function defined everywhere in X × Y (121I), and set
$$\tilde f_x(y) = \tilde f(x, y) \text{ for all } x \in X,\ y \in Y,$$
$$D = \{x : x \in X,\ \tilde f_x \text{ is } \nu\text{-virtually measurable}\}.$$
If G ∈ Σ and µG < ∞, then $G \setminus D$ is negligible. P Let $\langle Y_n \rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of sets of finite measure covering Y, and set
$$\tilde f_n(x, y) = \begin{cases}
\tilde f(x, y) & \text{if } x \in G,\ y \in Y_n \text{ and } |\tilde f(x, y)| \le n, \\
0 & \text{for other } (x, y) \in X \times Y.
\end{cases}$$
Then each $\tilde f_n$ is λ-integrable, being bounded and Λ-measurable and zero off $G \times Y_n$. Consequently, setting $\tilde f_{nx}(y) = \tilde f_n(x, y)$,
$$\int \Big(\int \tilde f_{nx}\,d\nu\Big)\mu(dx) \text{ exists } = \int \tilde f_n\,d\lambda.$$
But this surely means that $\tilde f_{nx}$ is ν-integrable, therefore ν-virtually measurable, for almost every x ∈ X. Set
$$D_n = \{x : x \in X,\ \tilde f_{nx} \text{ is } \nu\text{-virtually measurable}\};$$
then every $D_n$ is µ-conegligible, so $\bigcap_{n\in\mathbb{N}} D_n$ is conegligible. But for any $x \in G \cap \bigcap_{n\in\mathbb{N}} D_n$, $\tilde f_x = \lim_{n\to\infty} \tilde f_{nx}$ is ν-virtually measurable. Thus $G \setminus D \subseteq X \setminus \bigcap_{n\in\mathbb{N}} D_n$ is negligible. Q
This is true whenever µG < ∞. As G is arbitrary and µ has locally determined negligible sets, D is conegligible. But, for any x ∈ D, $y \mapsto f(x, y)$ is a restriction of $\tilde f_x$ and must be ν-virtually measurable.

252F As a further corollary we can get some useful information about the c.l.d. product measure for arbitrary
measure spaces.
Corollary Let (X, Σ, µ) and (Y, T, ν) be two measure spaces, λ the c.l.d. product measure on X × Y , and Λ its domain.
Let W ∈ Λ be such that the vertical section W [{x}] is ν-negligible for µ-almost every x ∈ X. Then λW = 0.
proof Take E ∈ Σ, F ∈ T of finite measure. Let λE×F be the subspace measure on E × F . By 251Q(ii-α) again, this
is just the product of the subspace measures µE and νF . We know that W ∩ (E × F ) is measured by λE×F . At the
same time, the vertical section (W ∩ (E × F ))[{x}] = W [{x}] ∩ F is νF -negligible for µE -almost every x ∈ E. Applying
252B to µE and νF and χ(W ∩ (E × F )),
$$\lambda(W \cap (E \times F)) = \lambda_{E\times F}(W \cap (E \times F)) = \int_E \nu_F(W[\{x\}] \cap F)\,\mu_E(dx) = 0.$$
But looking at the definition in 251F, we see that this means that λW = 0, as claimed.

252G Theorem 252B and its corollaries depend on the factor measures µ and ν belonging to restricted classes.
There is a partial result which applies to all c.l.d. product measures, as follows.
Tonelli’s theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and (X × Y, Λ, λ) their c.l.d. product. Let f be a Λ-measurable [−∞, ∞]-valued function defined on a member of Λ, and suppose that either $\iint |f(x, y)|\,\mu(dx)\,\nu(dy)$ or $\iint |f(x, y)|\,\nu(dy)\,\mu(dx)$ exists in R. Then f is λ-integrable.
proof Because the construction of the product measure is symmetric in the two factors, it is enough to consider the case in which $\iint |f(x, y)|\,\nu(dy)\,\mu(dx)$ is defined and finite, as the same ideas will surely deal with the other case also.
(a) The first step is to check that f is defined and finite λ-a.e. P Set $W = \{(x, y) : (x, y) \in \operatorname{dom} f,\ f(x, y) \text{ is finite}\}$. Then W ∈ Λ. The hypothesis
$$\iint |f(x, y)|\,\nu(dy)\,\mu(dx) \text{ is defined and finite}$$
includes the assertion
$$\int |f(x, y)|\,\nu(dy) \text{ is defined and finite for } \mu\text{-almost every } x,$$
which implies that
for µ-almost every x, f(x, y) is defined and finite for ν-almost every y;
that is, that
for µ-almost every x, $W[\{x\}]$ is ν-conegligible.
But by 252F this implies that $(X \times Y) \setminus W$ is λ-negligible, as required. Q
(b) Let h be any non-negative λ-simple function such that h ≤ |f| λ-a.e. Then $\int h\,d\lambda$ cannot be greater than $\iint |f(x, y)|\,\nu(dy)\,\mu(dx)$. P Set
$$W = \{(x, y) : (x, y) \in \operatorname{dom} f,\ h(x, y) \le |f(x, y)|\}, \quad h' = h \times \chi W;$$
then $h'$ is a simple function and $h' =_{\text{a.e.}} h$. Express $h'$ as $\sum_{i=0}^n a_i \chi W_i$ where $a_i \ge 0$ and $\lambda W_i < \infty$ for each i. Let ε > 0. For each i ≤ n there are $E_i \in \Sigma$, $F_i \in \mathrm{T}$ such that $\mu E_i < \infty$, $\nu F_i < \infty$ and $\lambda(W_i \cap (E_i \times F_i)) \ge \lambda W_i - \epsilon$. Set $E = \bigcup_{i\le n} E_i$ and $F = \bigcup_{i\le n} F_i$. Consider the subspace measures $\mu_E$ and $\nu_F$ and their product $\lambda_{E\times F}$ on E × F; then $\lambda_{E\times F}$ is the subspace measure on E × F defined from λ (251Q(ii-α) once more). Accordingly, applying 252B to the product $\mu_E \times \nu_F$,
$$\int_{E\times F} h'\,d\lambda = \int_{E\times F} h'\,d\lambda_{E\times F} = \int_E \int_F h'(x, y)\,\nu_F(dy)\,\mu_E(dx).$$
For any x, we know that $h'(x, y) \le |f(x, y)|$ whenever f(x, y) is defined. So we can be sure that
$$\int_F h'(x, y)\,\nu_F(dy) = \int h'(x, y)\chi F(y)\,\nu(dy) \le \int |f(x, y)|\,\nu(dy)$$
at least whenever $\int_F h'(x, y)\,\nu_F(dy)$ and $\int |f(x, y)|\,\nu(dy)$ are both defined, which is the case for almost every x ∈ E. Consequently
$$\begin{aligned}
\int_{E\times F} h'\,d\lambda &= \int_E \int_F h'(x, y)\,\nu_F(dy)\,\mu_E(dx) \\
&\le \int_E \int |f(x, y)|\,\nu(dy)\,\mu(dx) \le \iint |f(x, y)|\,\nu(dy)\,\mu(dx).
\end{aligned}$$
On the other hand,
$$\begin{aligned}
\int h'\,d\lambda - \int_{E\times F} h'\,d\lambda &= \sum_{i=0}^n a_i \lambda(W_i \setminus (E \times F)) \\
&\le \sum_{i=0}^n a_i \lambda(W_i \setminus (E_i \times F_i)) \le \epsilon \sum_{i=0}^n a_i.
\end{aligned}$$
So
$$\int h\,d\lambda = \int h'\,d\lambda \le \iint |f(x, y)|\,\nu(dy)\,\mu(dx) + \epsilon \sum_{i=0}^n a_i.$$
As ε is arbitrary, $\int h\,d\lambda \le \iint |f(x, y)|\,\nu(dy)\,\mu(dx)$, as claimed. Q
(c) This is true whenever h is a λ-simple function less than or equal to |f | λ-a.e. But |f | is Λ-measurable and λ
is semi-finite (251Ic), so this is enough to ensure that |f | is λ-integrable (213B), which (because f is supposed to be
Λ-measurable) in turn implies that f is λ-integrable.

252H Corollary Let (X, Σ, µ) and (Y, T, ν) be σ-finite measure spaces, λ the c.l.d. product measure on X × Y, and Λ its domain. Let f be a Λ-measurable [−∞, ∞]-valued function defined on a member of Λ. Then if one of
$$\int_{X\times Y} |f(x, y)|\,\lambda(d(x, y)), \quad \int_Y \int_X |f(x, y)|\,\mu(dx)\,\nu(dy), \quad \int_X \int_Y |f(x, y)|\,\nu(dy)\,\mu(dx)$$
exists in R, so do the other two, and in this case
$$\int_{X\times Y} f(x, y)\,\lambda(d(x, y)) = \int_Y \int_X f(x, y)\,\mu(dx)\,\nu(dy) = \int_X \int_Y f(x, y)\,\nu(dy)\,\mu(dx).$$
proof (a) Suppose that $\int |f|\,d\lambda$ is finite. Because both µ and ν are σ-finite, 252B tells us that
$$\iint |f(x, y)|\,\mu(dx)\,\nu(dy), \quad \iint |f(x, y)|\,\nu(dy)\,\mu(dx)$$
both exist and are equal to $\int |f|\,d\lambda$, while
$$\iint f(x, y)\,\mu(dx)\,\nu(dy), \quad \iint f(x, y)\,\nu(dy)\,\mu(dx)$$
both exist and are equal to $\int f\,d\lambda$.
(b) Now suppose that $\iint |f(x, y)|\,\nu(dy)\,\mu(dx)$ exists in R. Then 252G tells us that |f| is λ-integrable, so we can use (a) to complete the argument. Exchanging the coordinates, the same argument applies if $\iint |f(x, y)|\,\mu(dx)\,\nu(dy)$ exists in R.

252I Corollary Let (X, Σ, µ) and (Y, T, ν) be measure spaces, λ the c.l.d. product measure on X × Y , and Λ its
domain. Take W ∈ Λ. If either of the integrals
$$\int \mu^* W^{-1}[\{y\}]\,\nu(dy), \quad \int \nu^* W[\{x\}]\,\mu(dx)$$
exists and is finite, then λW < ∞.
proof Apply 252G with f = χW , remembering that
$$\mu^* W^{-1}[\{y\}] = \int \chi W(x, y)\,\mu(dx), \quad \nu^* W[\{x\}] = \int \chi W(x, y)\,\nu(dy)$$
whenever the integrals are defined, as in the proof of 252D.

252J Remarks 252H is the basic form of Fubini’s theorem; it is not a coincidence that most authors avoid non-
σ-finite spaces in this context. The next two examples exhibit some of the difficulties which can arise if we leave the
familiar territory of more-or-less Borel measurable functions on σ-finite spaces. The first is a classic.
252K Example Let (X, Σ, µ) be [0, 1] with Lebesgue measure, and let (Y, T, ν) be [0, 1] with counting measure.

(a) Consider the set


W = {(t, t) : t ∈ [0, 1]} ⊆ X × Y .
We observe that W is expressible as
$$\bigcap_{n\in\mathbb{N}} \bigcup_{k=0}^{n} \Big[\tfrac{k}{n+1}, \tfrac{k+1}{n+1}\Big] \times \Big[\tfrac{k}{n+1}, \tfrac{k+1}{n+1}\Big] \in \Sigma\widehat{\otimes}\mathrm{T}.$$
If we look at the sections
$$W^{-1}[\{t\}] = W[\{t\}] = \{t\}$$
for t ∈ [0, 1], we have
$$\iint \chi W(x, y)\,\mu(dx)\,\nu(dy) = \int \mu W^{-1}[\{y\}]\,\nu(dy) = \int 0\,\nu(dy) = 0,$$
$$\iint \chi W(x, y)\,\nu(dy)\,\mu(dx) = \int \nu W[\{x\}]\,\mu(dx) = \int 1\,\mu(dx) = 1,$$
so the two repeated integrals differ. It is therefore not generally possible to reverse the order of repeated integration,
even for a non-negative measurable function for which both repeated integrals exist and are finite.
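
The expression of the diagonal as a countable intersection of finite unions of squares can be explored concretely. The sketch below is illustrative only and assumes exact rational coordinates; `in_nth_cover` is a name invented here. It tests membership of a point in the n-th union, and shows that diagonal points lie in every union while a point with |s − t| > 1/(n+1) is excluded from the n-th one.

```python
from fractions import Fraction


def in_nth_cover(n, s, t):
    """True if (s, t) lies in the union over k = 0..n of the closed squares
    [k/(n+1), (k+1)/(n+1)] x [k/(n+1), (k+1)/(n+1)]."""
    for k in range(n + 1):
        lo, hi = Fraction(k, n + 1), Fraction(k + 1, n + 1)
        if lo <= s <= hi and lo <= t <= hi:
            return True
    return False


if __name__ == "__main__":
    s, t = Fraction(3, 10), Fraction(1, 2)
    # An off-diagonal point can belong to coarse covers but not to fine ones.
    print([n for n in range(10) if in_nth_cover(n, s, t)])
```

Two points of a common square of side 1/(n+1) differ by at most 1/(n+1) in each coordinate, which is why only diagonal points survive in the intersection.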

(b) Because the set W of part (a) actually belongs to $\Sigma\widehat{\otimes}\mathrm{T}$, we know that it is measured by the c.l.d. product
measure λ, and 252F (applied with the coordinates reversed) tells us that λW = 0.

(c) It is in fact easy to give a full description of λ.


(i) The point is that a set W ⊆ [0, 1] × [0, 1] belongs to the domain Λ of λ iff every horizontal section $W^{-1}[\{y\}]$ is Lebesgue measurable. P (α) If W ∈ Λ, then, for every b ∈ [0, 1], $\lambda([0, 1] \times \{b\})$ is finite, so $W \cap ([0, 1] \times \{b\})$ is a set of finite measure, and
$$\lambda(W \cap ([0, 1] \times \{b\})) = \int \mu(W \cap ([0, 1] \times \{b\}))^{-1}[\{y\}]\,\nu(dy) = \mu W^{-1}[\{b\}]$$
by 252D, because µ is σ-finite, ν is both strictly localizable and complete and locally determined, and
$$(W \cap ([0, 1] \times \{b\}))^{-1}[\{y\}] = \begin{cases}
W^{-1}[\{b\}] & \text{if } y = b, \\
\emptyset & \text{otherwise.}
\end{cases}$$
As b is arbitrary, every horizontal section of W is measurable. (β) If every horizontal section of W is measurable, let F ⊆ [0, 1] be any set of finite measure for ν; then F is finite, so
$$W \cap ([0, 1] \times F) = \bigcup_{y\in F} W^{-1}[\{y\}] \times \{y\} \in \Sigma\widehat{\otimes}\mathrm{T} \subseteq \Lambda.$$
But it follows that W itself belongs to Λ, by 251H. Q
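(ii) Now some of the same calculations show that for every W ∈ Λ,
$$\lambda W = \sum_{y\in[0,1]} \mu W^{-1}[\{y\}].$$
P For any finite F ⊆ [0, 1],
$$\begin{aligned}
\lambda(W \cap ([0, 1] \times F)) &= \int \mu(W \cap ([0, 1] \times F))^{-1}[\{y\}]\,\nu(dy) \\
&= \int_F \mu W^{-1}[\{y\}]\,\nu(dy) = \sum_{y\in F} \mu W^{-1}[\{y\}].
\end{aligned}$$
So
$$\lambda W = \sup_{F \subseteq [0,1] \text{ is finite}} \sum_{y\in F} \mu W^{-1}[\{y\}] = \sum_{y\in[0,1]} \mu W^{-1}[\{y\}]. \text{ Q}$$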

252L Example For the second example, I turn to a problem that can arise if we neglect to check that a function
is measurable as a function of two variables.
Let (X, Σ, µ) = (Y, T, ν) be ω1 , the first uncountable ordinal (2A1Fc), with the countable-cocountable measure
(211R). Set
$$W = \{(\xi, \eta) : \xi \le \eta < \omega_1\} \subseteq X \times Y.$$
Then all the horizontal sections $W^{-1}[\{\eta\}] = \{\xi : \xi \le \eta\}$ are countable, so
$$\int \mu W^{-1}[\{\eta\}]\,\nu(d\eta) = \int 0\,\nu(d\eta) = 0,$$
while all the vertical sections $W[\{\xi\}] = \{\eta : \xi \le \eta < \omega_1\}$ are cocountable, so
$$\int \nu W[\{\xi\}]\,\mu(d\xi) = \int 1\,\mu(d\xi) = 1.$$
Because the two repeated integrals are different, they cannot both be equal to the measure of W, and the sole resolution
is to say that W is not measured by the product measure.

252M Remark A third kind of difficulty in the formula
$$\iint f(x, y)\,dx\,dy = \iint f(x, y)\,dy\,dx$$
can arise even on probability spaces with $\Sigma\widehat{\otimes}\mathrm{T}$-measurable real-valued functions defined everywhere if we neglect to check that f is integrable with respect to the product measure. In 252H, we do need the hypothesis that one of
$$\int_{X\times Y} |f(x, y)|\,\lambda(d(x, y)), \quad \int_Y \int_X |f(x, y)|\,\mu(dx)\,\nu(dy), \quad \int_X \int_Y |f(x, y)|\,\nu(dy)\,\mu(dx)$$
is finite. For examples to show this, see 252Xf and 252Xg.
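
A standard discrete analogue of this difficulty, which is not one of the exercises cited and lives on N with counting measure (σ-finite, not a probability space), takes f(m, n) = 1 if m = n, −1 if m = n + 1 and 0 otherwise. The following hedged sketch, with names invented for the illustration, computes the inner sums exactly (each row and column has finite support) and shows that the two repeated sums are 1 and 0 while the absolute mass grows without bound.

```python
def f(m, n):
    """1 on the diagonal, -1 immediately below it, 0 elsewhere."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0


def sum_over_n(m):
    """Exact value of the sum over all n of f(m, n); the support is n in {m - 1, m}."""
    return sum(f(m, n) for n in range(m + 1))


def sum_over_m(n):
    """Exact value of the sum over all m of f(m, n); the support is m in {n, n + 1}."""
    return sum(f(m, n) for m in range(n + 2))


def abs_mass(size):
    """Sum of |f| over the square {0..size-1} x {0..size-1}."""
    return sum(abs(f(m, n)) for m in range(size) for n in range(size))


if __name__ == "__main__":
    print(sum(sum_over_n(m) for m in range(100)))  # outer sum over m
    print(sum(sum_over_m(n) for n in range(100)))  # outer sum over n
    print(abs_mass(100))
```

Only the row m = 0 has a non-zero sum, so summing over n first gives 1; every column sums to 0, so summing over m first gives 0. The sum of |f| over an N × N square is 2N − 1, so f is not integrable for the product of the counting measures.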

252N Integration through ordinate sets I: Proposition Let (X, Σ, µ) be a complete locally determined measure
space, and λ the c.l.d. product measure on X × R, where R is given Lebesgue measure; write Λ for the domain of λ.
For any [0, ∞]-valued function f defined on a conegligible subset of X, write $\Omega_f$, $\Omega'_f$ for the ordinate sets
$$\Omega_f = \{(x, a) : x \in \operatorname{dom} f,\ 0 \le a \le f(x)\} \subseteq X \times \mathbb{R},$$
$$\Omega'_f = \{(x, a) : x \in \operatorname{dom} f,\ 0 \le a < f(x)\} \subseteq X \times \mathbb{R}.$$
Then
$$\lambda\Omega_f = \lambda\Omega'_f = \int f\,d\mu$$
in the sense that if one of these is defined in [0, ∞], so are the other two, and they are equal.
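proof (a) If $\Omega_f \in \Lambda$, then
$$\int f(x)\,\mu(dx) = \int \nu\{y : (x, y) \in \Omega_f\}\,\mu(dx) = \lambda\Omega_f$$
by 252D, writing ν for Lebesgue measure, because f is defined almost everywhere. Similarly, if $\Omega'_f \in \Lambda$,
$$\int f(x)\,\mu(dx) = \int \nu\{y : (x, y) \in \Omega'_f\}\,\mu(dx) = \lambda\Omega'_f.$$
(b) If $\int f\,d\mu$ is defined, then f is µ-virtually measurable, therefore measurable (because µ is complete); again because µ is complete, dom f ∈ Σ. So
$$\Omega'_f = \bigcup_{q\in\mathbb{Q},\,q>0} \{x : x \in \operatorname{dom} f,\ f(x) > q\} \times [0, q],$$
$$\Omega_f = \bigcap_{n\ge 1} \bigcup_{q\in\mathbb{Q},\,q>0} \Big\{x : x \in \operatorname{dom} f,\ f(x) \ge q - \tfrac{1}{n}\Big\} \times [0, q]$$
belong to Λ, so that $\lambda\Omega_f$ and $\lambda\Omega'_f$ are defined. Now both are equal to $\int f\,d\mu$, by (a).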

252O Integration through ordinate sets II: Proposition Let (X, Σ, µ) be a measure space, and f a non-
negative µ-virtually measurable function defined on a conegligible subset of X. Then
R R∞ R∞
f dµ = 0
µ∗ {x : x ∈ dom f, f (x) ≥ t}dt = 0
µ∗ {x : x ∈ dom f, f (x) > t}dt
R
in [0, ∞], where the integrals . . . dt are taken with respect to Lebesgue measure.
proof Completing µ does not change the integral of f or the outer measure µ∗ (212Fb, 212Ea), so we may suppose that
µ is complete, in which case dom f and f will be measurable. For n, k ∈ N set E_{nk} = {x : x ∈ dom f, f(x) > 2^{-n}k}, g_n = 2^{-n} ∑_{k=1}^{4^n} χE_{nk}. Then ⟨g_n⟩_{n∈N} is a non-decreasing sequence of measurable functions converging to f at every point of dom f, so ∫ f dµ = lim_{n→∞} ∫ g_n dµ and µ{x : f(x) > t} = lim_{n→∞} µ{x : g_n(x) > t} for every t ≥ 0; consequently
∫_0^∞ µ{x : f(x) > t}dt = lim_{n→∞} ∫_0^∞ µ{x : g_n(x) > t}dt.
On the other hand, µ{x : g_n(x) > t} = µE_{nk} if 1 ≤ k ≤ 4^n and 2^{-n}(k − 1) ≤ t < 2^{-n}k, 0 if t ≥ 2^n, so that
∫_0^∞ µ{x : g_n(x) > t}dt = ∑_{k=1}^{4^n} 2^{-n}µE_{nk} = ∫ g_n dµ,
for every n ∈ N. So ∫_0^∞ µ{x : f(x) > t}dt = ∫ f dµ.
Now µ{x : f(x) ≥ t} = µ{x : f(x) > t} for almost all t. PP Set C = {t : µ{x : f(x) > t} < ∞}, h(t) = µ{x : f(x) > t} for t ∈ C. If C is not empty, h : C → [0, ∞[ is monotonic, so is continuous almost everywhere in C (222A). But at any point of C \ {inf C} at which h is continuous,
µ{x : f(x) ≥ t} = lim_{s↑t} µ{x : f(x) > s} = µ{x : f(x) > t}.
So we have the result, since µ{x : f(x) ≥ t} = µ{x : f(x) > t} = ∞ for any t ∈ [0, ∞[ \ C. QQ

Accordingly ∫_0^∞ µ{x : f(x) ≥ t}dt is also equal to ∫ f dµ.
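
The identity of 252O can be checked by hand on a finite measure space, where both sides reduce to finite sums. The following is a minimal sketch under that assumption, not part of the argument above; the names `integral` and `layer_cake` are illustrative only, and exact rational arithmetic is used so that the two sides can be compared for equality.

```python
from fractions import Fraction

def integral(f, mu):
    """Integral of f over a finite measure space: sum of f(x) * mu{x}."""
    return sum(f[x] * mu[x] for x in mu)

def layer_cake(f, mu, strict=True):
    """Integral over t >= 0 of mu{x : f(x) > t} (or >= t when strict is False).

    The distribution function is constant between consecutive values of f,
    so the Lebesgue integral in t is a finite sum of rectangle areas.
    """
    levels = sorted({Fraction(0)} | {f[x] for x in mu})
    total = Fraction(0)
    for lo, hi in zip(levels, levels[1:]):
        # Sample strictly inside ]lo, hi[; the sets {f > t} and {f >= t}
        # can differ only when t is a value of f, a Lebesgue negligible set.
        t = (lo + hi) / 2
        if strict:
            m = sum(mu[x] for x in mu if f[x] > t)
        else:
            m = sum(mu[x] for x in mu if f[x] >= t)
        total += (hi - lo) * m
    return total

# A three-point space with point masses 1/2, 2, 3/4.
mu = {"a": Fraction(1, 2), "b": Fraction(2), "c": Fraction(3, 4)}
f = {"a": Fraction(3), "b": Fraction(1, 3), "c": Fraction(0)}
print(integral(f, mu), layer_cake(f, mu), layer_cake(f, mu, strict=False))
```

Sampling at the midpoint of each interval reflects the remark in the proof that the two distribution functions agree for almost all t.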

*252P If we work through the ideas of 252B for Σ⊗̂T-measurable functions, we get the following, which is sometimes useful.
Proposition Let (X, Σ, µ) be a measure space, and (Y, T, ν) a σ-finite measure space. Then for any Σ⊗̂T-measurable function f : X × Y → [0, ∞], x ↦ ∫ f(x, y)ν(dy) : X → [0, ∞] is Σ-measurable; and if µ is semi-finite, ∫∫ f(x, y)ν(dy)µ(dx) = ∫ f dλ, where λ is the c.l.d. product measure on X × Y.
proof (a) Let ⟨Y_n⟩_{n∈N} be a non-decreasing sequence of subsets of Y of finite measure with union Y. Set

A = {W : W ⊆ X × Y, W[{x}] ∈ T for every x ∈ X,
x ↦ ν(Y_n ∩ W[{x}]) is Σ-measurable for every n ∈ N}.
Then A is a Dynkin class of subsets of X × Y including {E × F : E ∈ Σ, F ∈ T}, so includes Σ⊗̂T, by the Monotone Class Theorem again (136B).
This means that if W ∈ Σ⊗̂T, then
νW[{x}] = sup_{n∈N} ν(Y_n ∩ W[{x}])
is defined for every x ∈ X and is a Σ-measurable function of x.
(b) Now, for n, k ∈ N, set
W_{nk} = {(x, y) : f(x, y) ≥ 2^{-n}k}, g_n = ∑_{k=1}^{4^n} 2^{-n}χW_{nk}.
Then if we set
h_n(x) = ∫ g_n(x, y)ν(dy) = ∑_{k=1}^{4^n} 2^{-n}νW_{nk}[{x}]
for n ∈ N and x ∈ X, h_n : X → [0, ∞] is Σ-measurable, and
lim_{n→∞} h_n(x) = ∫ (lim_{n→∞} g_n(x, y))ν(dy) = ∫ f(x, y)ν(dy)
for every x, because ⟨g_n(x, y)⟩_{n∈N} is a non-decreasing sequence with limit f(x, y) for all x ∈ X, y ∈ Y. So x ↦ ∫ f(x, y)ν(dy) is defined everywhere in X and is Σ-measurable.
(c) If E ⊆ X is measurable and has finite measure, then ∫_E ∫ f(x, y)ν(dy)µ(dx) = ∫_{E×Y} f dλ, applying 252B to the product of the subspace measure µ_E and ν (and using 251Q to check that the product of µ_E and ν is the subspace measure on E × Y). Now if λW is defined and finite, there must be a non-decreasing sequence ⟨E_n⟩_{n∈N} of subsets of X of finite measure such that λW = sup_{n∈N} λ(W ∩ (E_n × Y)), so that W \ ⋃_{n∈N}(E_n × Y) is negligible, and

∫_W f dλ = lim_{n→∞} ∫_{W∩(E_n×Y)} f dλ
(by B.Levi’s theorem applied to ⟨f × χ(W ∩ (E_n × Y))⟩_{n∈N})
≤ lim_{n→∞} ∫_{E_n×Y} f dλ = lim_{n→∞} ∫_{E_n} ∫ f(x, y)ν(dy)µ(dx)
≤ ∫∫ f(x, y)ν(dy)µ(dx).

By 213B once more,
∫ f dλ = sup_{λW<∞} ∫_W f dλ ≤ ∫∫ f(x, y)ν(dy)µ(dx).
But also, if µ is semi-finite,
∫∫ f(x, y)ν(dy)µ(dx) = sup_{µE<∞} ∫_E ∫ f(x, y)ν(dy)µ(dx) ≤ ∫ f dλ,
so ∫ f dλ = ∫∫ f(x, y)ν(dy)µ(dx), as claimed.

252Q The volume of a ball We now have all the essential machinery to perform a little calculation which is, I suppose, desirable simply as general knowledge: the volume of the unit ball {x : ‖x‖ ≤ 1} = {(ξ_1, . . . , ξ_r) : ∑_{i=1}^r ξ_i^2 ≤ 1} in R^r. In fact, from a theoretical point of view, I think we could very nearly just call it β_r and leave it at that; but since
there is a general formula in terms of β2 = π and factorials, it seems shameful not to present it. The calculation has
nothing to do with Lebesgue integration, and I could dismiss it as mere advanced calculus; but since only a minority of
mathematicians are now taught calculus to this level with reasonable rigour before being introduced to the Lebesgue
integral, I do not doubt that many readers, like myself, missed some of the subtleties involved. I therefore take the
space to spell the details out in the style used elsewhere in this volume, recognising that the machinery employed is a
good deal more elaborate than is really necessary for this result.

(a) The first basic fact we need is that, for any n ≥ 1,

I_n = ∫_{-π/2}^{π/2} cos^n t dt = ((2k)!/(2^k k!)^2) π if n = 2k is even,
= 2(2^k k!)^2/(2k+1)! if n = 2k + 1 is odd.

PP For n = 0, of course,
I_0 = ∫_{-π/2}^{π/2} 1 dt = π = (0!/(2^0 0!)^2) π,
while for n = 1 we have
I_1 = sin(π/2) − sin(−π/2) = 2 = 2(2^0 0!)^2/1!,

using the Fundamental Theorem of Calculus (225L) and the fact that sin′ = cos is bounded. For the inductive step to
n + 1 ≥ 2, we can use integration by parts (225F):
I_{n+1} = ∫_{-π/2}^{π/2} cos t cos^n t dt
= sin(π/2) cos^n(π/2) − sin(−π/2) cos^n(−π/2) + ∫_{-π/2}^{π/2} sin t · n cos^{n−1} t · sin t dt
= n ∫_{-π/2}^{π/2} (1 − cos^2 t) cos^{n−1} t dt = n(I_{n−1} − I_{n+1}),

so that I_{n+1} = (n/(n+1)) I_{n−1}. Now the given formulae follow by an easy induction. QQ
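
As a sanity check on the formulae of (a), one can compare the closed forms with a direct numerical quadrature and with the two-step recurrence from the integration by parts. This is a sketch only; `I_closed` and `I_numeric` are names chosen here for illustration, and the midpoint rule is an arbitrary choice of quadrature.

```python
import math

def I_closed(n):
    """Closed form for the integral of cos^n t over [-pi/2, pi/2], as in 252Q(a)."""
    k = n // 2
    d = (2 ** k * math.factorial(k)) ** 2
    if n % 2 == 0:
        return math.factorial(2 * k) / d * math.pi
    return 2 * d / math.factorial(2 * k + 1)

def I_numeric(n, steps=20000):
    """Midpoint-rule approximation to the same integral."""
    h = math.pi / steps
    return h * sum(math.cos(-math.pi / 2 + (j + 0.5) * h) ** n
                   for j in range(steps))

for n in range(6):
    print(n, I_closed(n), I_numeric(n))
```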

(b) The next result is that, for any n ∈ N and any a ≥ 0,
∫_{-a}^{a} (a^2 − s^2)^{n/2} ds = I_{n+1} a^{n+1}.
PP Of course this is an integration by substitution; but the singularity of the integrand at s = ±a complicates the issue slightly. I offer the following argument. If a = 0 the result is trivial; take a > 0. For −a ≤ b ≤ a, set F(b) = ∫_{-a}^{b} (a^2 − s^2)^{n/2} ds. Because the integrand is continuous, F′(b) exists and is equal to (a^2 − b^2)^{n/2} for −a < b < a (222H). Set G(t) = F(a sin t); then G is continuous and
G′(t) = aF′(a sin t) cos t = a^{n+1} cos^{n+1} t
for −π/2 < t < π/2. Consequently

∫_{-a}^{a} (a^2 − s^2)^{n/2} ds = F(a) − F(−a) = G(π/2) − G(−π/2) = ∫_{-π/2}^{π/2} G′(t)dt
(by 225L, as before)
= a^{n+1} I_{n+1},

as required. QQ
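
The identity in (b) can also be tested numerically. The sketch below assumes nothing beyond the recurrence I_{n+1} = (n/(n+1)) I_{n−1} with I_0 = π, I_1 = 2 from (a); the function names `I` and `lhs` are illustrative, and the midpoint rule is used because it never evaluates the integrand at the endpoints s = ±a.

```python
import math

def I(n):
    """I_n computed from I_0 = pi, I_1 = 2 and I_m = ((m - 1) / m) * I_{m-2}."""
    val = math.pi if n % 2 == 0 else 2.0
    for m in range(2 + n % 2, n + 1, 2):
        val *= (m - 1) / m
    return val

def lhs(n, a, steps=100000):
    """Midpoint-rule approximation to the integral of (a^2 - s^2)^(n/2) over [-a, a]."""
    h = 2 * a / steps
    total = 0.0
    for j in range(steps):
        s = -a + (j + 0.5) * h
        total += (a * a - s * s) ** (n / 2)
    return h * total

for n in range(4):
    print(n, lhs(n, 2.0), I(n + 1) * 2.0 ** (n + 1))
```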

(c) Now at last we are ready for the balls B_r = {x : x ∈ R^r, ‖x‖ ≤ 1}. Let µ_r be Lebesgue measure on R^r, and set β_r = I_1 I_2 . . . I_r for r ≥ 1. I claim that, writing
B_r(a) = {x : x ∈ R^r, ‖x‖ ≤ a},
we have µ_r(B_r(a)) = β_r a^r for every a ≥ 0. PP Induce on r. For r = 1 we have β_1 = 2, B_1(a) = [−a, a], so the result is trivial. For the inductive step to r + 1, we have

µ_{r+1}B_{r+1}(a) = ∫ µ_r{x : (x, t) ∈ B_{r+1}(a)}dt
(putting 251N and 252D together, and using the fact that B_{r+1}(a) is closed, therefore measurable)
= ∫_{-a}^{a} µ_r B_r(√(a^2 − t^2))dt
(because (x, t) ∈ B_{r+1}(a) iff |t| ≤ a and ‖x‖ ≤ √(a^2 − t^2))
= ∫_{-a}^{a} β_r(a^2 − t^2)^{r/2} dt
(by the inductive hypothesis)
= β_r a^{r+1} I_{r+1}
(by (b) above)
= β_{r+1} a^{r+1}
(by the definition of β_{r+1}). Thus the induction continues. QQ

(d) In particular, the r-dimensional Lebesgue measure of the r-dimensional ball B_r = B_r(1) is just β_r = I_1 . . . I_r. Now an easy induction on k shows that

β_r = (1/k!) π^k if r = 2k is even,
= (2^{2k+1} k!/(2k+1)!) π^k if r = 2k + 1 is odd.

(e) Note that in part (c) of the proof we saw that {x : x ∈ R^r, ‖x‖ ≤ a} has measure β_r a^r for every a ≥ 0. The formulae here are consistent with the assignation β_0 = 1; which corresponds to saying that R^0 = {∅}, that µ_0 R^0 = 1 and that B_0 = {∅}. Taking µ_0 R^0 to be 1 is itself consistent with the idea that, following 251N, the product measure µ_0 × µ_r ought to match µ_{0+r} on R^{0+r}.
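
The three descriptions of β_r available in this section (the product I_1 . . . I_r of (c), the factorial formulae of (d), and the expression π^{r/2}/Γ(1 + r/2) of 252Xi) can be compared numerically. This is a minimal sketch; the function names are hypothetical, and β_0 = 1 is taken as the empty product, in line with (e).

```python
import math

def I(n):
    """I_n from I_0 = pi, I_1 = 2 and I_m = ((m - 1) / m) * I_{m-2}."""
    val = math.pi if n % 2 == 0 else 2.0
    for m in range(2 + n % 2, n + 1, 2):
        val *= (m - 1) / m
    return val

def beta_product(r):
    """beta_r = I_1 * I_2 * ... * I_r (empty product 1 when r = 0)."""
    out = 1.0
    for n in range(1, r + 1):
        out *= I(n)
    return out

def beta_closed(r):
    """The factorial formulae of 252Q(d)."""
    k = r // 2
    if r % 2 == 0:
        return math.pi ** k / math.factorial(k)
    return 2 ** (2 * k + 1) * math.factorial(k) / math.factorial(2 * k + 1) * math.pi ** k

def beta_gamma(r):
    """pi^(r/2) / Gamma(1 + r/2)."""
    return math.pi ** (r / 2) / math.gamma(1 + r / 2)

for r in range(6):
    print(r, beta_product(r), beta_closed(r), beta_gamma(r))
```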

252R Complex-valued functions It is easy to apply the results of 252B-252I above to complex-valued functions,
by considering their real and imaginary parts. Specifically:
(a) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y. Suppose that ν is σ-finite and that µ is either strictly localizable or complete and locally determined. Let f be a λ-integrable complex-valued function. Then ∫∫ f(x, y)ν(dy)µ(dx) is defined and equal to ∫ f dλ.

(b) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, λ the c.l.d. product measure on X × Y, and Λ its domain. Let f be a Λ-measurable complex-valued function defined on a member of Λ, and suppose that either ∫∫ |f(x, y)|µ(dx)ν(dy) or ∫∫ |f(x, y)|ν(dy)µ(dx) is defined and finite. Then f is λ-integrable.

(c) Let (X, Σ, µ) and (Y, T, ν) be σ-finite measure spaces, λ the c.l.d. product measure on X × Y , and Λ its domain.
Let f be a Λ-measurable complex-valued function defined on a member of Λ. Then if one of
∫_{X×Y} |f(x, y)|λ(d(x, y)), ∫_Y ∫_X |f(x, y)|µ(dx)ν(dy), ∫_X ∫_Y |f(x, y)|ν(dy)µ(dx)
exists in R, so do the other two, and in this case
∫_{X×Y} f(x, y)λ(d(x, y)) = ∫_Y ∫_X f(x, y)µ(dx)ν(dy) = ∫_X ∫_Y f(x, y)ν(dy)µ(dx).

252X Basic exercises (a) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y. Let f be a λ-integrable real-valued function such that ∫_{E×F} f = 0 whenever E ∈ Σ, F ∈ T, µE < ∞ and νF < ∞. Show that f = 0 λ-a.e. (Hint: use 251Ie to show that ∫_W f = 0 whenever λW < ∞.)

(b) Let f , g : R → R be two non-decreasing functions, and µf , µg the associated Lebesgue-Stieltjes measures (see
114Xa). Set
f(x+) = lim_{t↓x} f(t), f(x−) = lim_{t↑x} f(t)
for each x ∈ R, and define g(x+), g(x−) similarly. Show that whenever a ≤ b in R,
∫_{[a,b]} f(x−)µ_g(dx) + ∫_{[a,b]} g(x+)µ_f(dx) = g(b+)f(b+) − g(a−)f(a−)
= ∫_{[a,b]} ½(f(x−) + f(x+))µ_g(dx) + ∫_{[a,b]} ½(g(x−) + g(x+))µ_f(dx).

(Hint: find two expressions for (µf × µg ){(x, y) : a ≤ x < y ≤ b}.)

>(c) Let (X, Σ, µ) and (Y, T, ν) be complete locally determined measure spaces, λ the c.l.d. product measure on
X × Y , and Λ its domain. Suppose that A ⊆ X and B ⊆ Y . Show that A × B ∈ Λ iff either µA = 0 or νB = 0 or
A ∈ Σ and B ∈ T. (Hint: if B is not negligible and A × B ∈ Λ, take H such that νH < ∞ and B ∩ H is not negligible.
Then W = A × (B ∩ H) is measured by µ × νH , where νH is the subspace measure on H. Now apply 252D to µ, νH
and W to see that A ∈ Σ.)

> (d) Let (X_1, Σ_1, µ_1), (X_2, Σ_2, µ_2), (X_3, Σ_3, µ_3) be three σ-finite measure spaces, and f a real-valued function defined almost everywhere on X_1 × X_2 × X_3 and Λ-measurable, where Λ is the domain of the product measure described in 251W or 251Xg. Show that if ∫∫∫ |f(x_1, x_2, x_3)|dx_1dx_2dx_3 is defined in R, then ∫∫∫ f(x_1, x_2, x_3)dx_2dx_3dx_1 and ∫∫∫ f(x_1, x_2, x_3)dx_3dx_1dx_2 exist and are equal.

(e) Give an example of strictly localizable measure spaces (X, Σ, µ), (Y, T, ν) and a W ∈ Σ⊗̂T such that x ↦ νW[{x}] is not Σ-measurable. (Hint: in 252Kb, try Y a proper subset of [0, 1].)
> (f) Set f(x, y) = sin(x − y) if 0 ≤ y ≤ x ≤ y + 2π, 0 for other (x, y) ∈ R^2. Show that ∫∫ f(x, y)dx dy = 0 and ∫∫ f(x, y)dy dx = 2π, taking all integrals with respect to Lebesgue measure.

> (g) Set f(x, y) = (x^2 − y^2)/(x^2 + y^2)^2 for x, y ∈ ]0, 1]. Show that ∫_0^1 ∫_0^1 f(x, y)dydx = π/4, ∫_0^1 ∫_0^1 f(x, y)dxdy = −π/4.

> (h) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and f a Σ⊗̂T-measurable function defined on a subset of X × Y. Show that y ↦ f(x, y) is T-measurable for every x ∈ X.

(i) Let r ≥ 1 be an integer, and write β_r for the Lebesgue measure of the unit ball in R^r. Set g_r(t) = rβ_r t^{r−1} for t ≥ 0, φ(x) = ‖x‖ for x ∈ R^r. (i) Writing µ_r for Lebesgue measure on R^r, show that µ_r φ^{-1}[E] = ∫_E rβ_r t^{r−1} µ_1(dt) for every Lebesgue measurable set E ⊆ [0, ∞[. (Hint: start with intervals E, noting from 115Xe that µ_r{x : ‖x‖ ≤ a} = β_r a^r for a ≥ 0, and progress to open sets, negligible sets and general measurable sets.) (ii) Using 235R, show that
∫ e^{−‖x‖^2/2} µ_r(dx) = ∫_0^∞ rβ_r t^{r−1} e^{−t^2/2} µ_1(dt) = 2^{(r−2)/2} rβ_r Γ(r/2)
= 2^{r/2} β_r Γ(1 + r/2) = (√2 Γ(1/2))^r
where Γ is the Γ-function (225Xj). (iii) Show that
2Γ(1/2)^2 = 2β_2Γ(2) = 2β_2 ∫_0^∞ te^{−t^2/2} dt = 2π,
and hence that β_r = π^{r/2}/Γ(1 + r/2) and ∫_{−∞}^∞ e^{−t^2/2} dt = √(2π).
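
For 252Xg above, the disagreement between the two repeated integrals can be seen numerically. The sketch below is one way of doing this under a stated simplification: the inner integrals are taken in closed form, using the antiderivatives y/(x^2 + y^2) in y and −x/(x^2 + y^2) in x, because the integrand is unbounded near the origin; only the outer integrals are approximated. The names `midpoint`, `inner_dy` and `inner_dx` are illustrative.

```python
import math

def f(x, y):
    return (x * x - y * y) / (x * x + y * y) ** 2

def midpoint(g, a, b, steps=20000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (j + 0.5) * h) for j in range(steps))

def inner_dy(x):
    """Closed form of the integral of f(x, y) dy over ]0, 1], for x > 0."""
    return 1.0 / (1.0 + x * x)

def inner_dx(y):
    """Closed form of the integral of f(x, y) dx over ]0, 1], for y > 0."""
    return -1.0 / (1.0 + y * y)

dy_dx = midpoint(inner_dy, 0.0, 1.0)   # integrate in y first, then in x
dx_dy = midpoint(inner_dx, 0.0, 1.0)   # integrate in x first, then in y
print(dy_dx, dx_dy, math.pi / 4)
```

The closed forms themselves can be spot-checked away from the origin, for instance by comparing `midpoint(lambda y: f(0.5, y), 0.0, 1.0)` with `inner_dy(0.5)`.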

252Y Further exercises (a) Let (X, Σ, µ) be a measure space. Show that the following are equiveridical: (i) the completion of µ is locally determined; (ii) the completion of µ coincides with the c.l.d. version of µ; (iii) whenever (Y, T, ν) is a σ-finite measure space and λ the c.l.d. product measure on X × Y and f is a function such that ∫ f dλ is defined in [−∞, ∞], then ∫∫ f(x, y)ν(dy)µ(dx) is defined and equal to ∫ f dλ.

(b) Let (X, Σ, µ) be a measure space. Show that the following are equiveridical: (i) µ has locally determined negligible sets; (ii) whenever (Y, T, ν) is a σ-finite measure space and λ the c.l.d. product measure on X × Y, then ∫∫ f(x, y)ν(dy)µ(dx) is defined and equal to ∫ f dλ for any λ-integrable function f.

(c) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ_0 the primitive product measure on X × Y (251C). Let f be any λ_0-integrable real-valued function. Show that ∫∫ f(x, y)ν(dy)µ(dx) = ∫ f dλ_0. (Hint: show that there are sequences ⟨G_n⟩_{n∈N}, ⟨H_n⟩_{n∈N} of sets of finite measure such that f(x, y) is defined and equal to 0 for every (x, y) ∈ (X × Y) \ ⋃_{n∈N} G_n × H_n.)

(d) Let (X, Σ, µ) and (Y, T, ν) be measure spaces; let λ_0 be the primitive product measure on X × Y, and λ the c.l.d. product measure. Show that if f is a λ_0-integrable real-valued function, it is λ-integrable, and ∫ f dλ = ∫ f dλ_0.

(e) Let (X, Σ, µ) be a complete locally determined measure space and a < b in R, endowed with Lebesgue measure; let Λ be the domain of the c.l.d. product measure λ on X × [a, b]. Let f : X × [a, b] → R be a Λ-measurable function such that t ↦ f(x, t) : [a, b] → R is continuous on [a, b] and differentiable on ]a, b[ for every x ∈ X. (i) Show that the partial derivative ∂f/∂t with respect to the second variable is Λ-measurable. (ii) Now suppose that ∂f/∂t is λ-integrable and that ∫ f(x, t_0)µ(dx) is defined and finite for some t_0 ∈ ]a, b[. Show that F(t) = ∫ f(x, t)µ(dx) is defined in R for every t ∈ [a, b], that F is absolutely continuous, and that F′(t) = ∫ (∂f/∂t)(x, t)µ(dx) for almost every t ∈ ]a, b[. (Hint: F(c) = F(a) + ∫_{X×[a,c]} (∂f/∂t) dλ for every c ∈ [a, b].)
∂t

(f) Show that Γ(a)Γ(b)/Γ(a+b) = ∫_0^1 t^{a−1}(1 − t)^{b−1} dt for all a, b > 0. (Hint: show that
∫_0^∞ t^{a−1} ∫_t^∞ e^{−x}(x − t)^{b−1} dxdt = ∫_0^∞ e^{−x} ∫_0^x t^{a−1}(x − t)^{b−1} dtdx.)

(g) Let (X, Σ, µ) and (Y, T, ν) be σ-finite measure spaces and λ the c.l.d. product measure on X × Y. Suppose that f ∈ L^0(λ) and that 1 < p < ∞. Show that (∫ |∫ f(x, y)dx|^p dy)^{1/p} ≤ ∫ (∫ |f(x, y)|^p dy)^{1/p} dx. (Hint: set q = p/(p−1) and consider the integral ∫ |f(x, y)g(y)|λ(d(x, y)) for g ∈ L^q(ν), using 244K.)

(h) Let ν be Lebesgue measure on [0, ∞[; suppose that f ∈ L^p(ν) where 1 < p < ∞. Set F(y) = (1/y) ∫_0^y f for y > 0. Show that ‖F‖_p ≤ (p/(p−1))‖f‖_p. (Hint: F(y) = ∫_0^1 f(xy)dx; use 252Yg with X = [0, 1], Y = [0, ∞[.)

(i) Show that if p is any non-zero (real) polynomial in r variables, then {x : x ∈ R^r, p(x) = 0} is Lebesgue negligible.

(j) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with c.l.d. product (X × Y, Λ, λ). Let f be a non-negative
Λ-measurable real-valued function defined on a λ-conegligible set, and suppose that
∫ (∫ f(x, y)µ(dx)) ν(dy)
is finite. Show that f is λ-integrable.

(k) Let (X, Σ, µ) be the unit interval [0, 1] with Lebesgue measure, and (Y, T, ν) the interval with counting measure, as in 252K; let λ_0 be the primitive product measure on [0, 1]^2. (i) Setting ∆ = {(t, t) : t ∈ [0, 1]}, show that λ_0∆ = ∞. (ii) Show that λ_0 is not semi-finite. (iii) Show that if W ∈ dom λ_0, then λ_0W = ∑_{y∈[0,1]} µW^{-1}[{y}] if there are a countable set A ⊆ [0, 1] and a Lebesgue negligible set E ⊆ [0, 1] such that W ⊆ ([0, 1] × A) ∪ (E × [0, 1]), ∞ otherwise.

(l) Let (X, Σ, µ) be a measure space, and λ_0 the primitive product measure on X × R, where R is given Lebesgue measure; write Λ for its domain. For any [0, ∞]-valued function f defined on a conegligible subset of X, write Ωf, Ω′f for the corresponding ordinate sets, as in 252N. Show that if any of λ_0Ωf, λ_0Ω′f, ∫ f dµ is defined and finite, so are the others, and all three are equal.

(m) Let (X, Σ, µ) be a complete locally determined measure space, and f a non-negative function defined on a conegligible subset of X. Write Ωf, Ω′f for the corresponding ordinate sets, as in 252N. Let λ be the c.l.d. product measure on X × R, where R is given Lebesgue measure. Show that ∫ f dµ = λ∗Ωf = λ∗Ω′f.

(n) Let (X, Σ, µ) be a measure space and f : X → [0, ∞[ a function. Show that ∫ f dµ = ∫_0^∞ µ∗{x : f(x) ≥ t}dt.

(o) Let (X, Σ, µ) be a complete measure space and write M0,∞ for the set {f : f ∈ L^0(µ), µ{x : |f(x)| ≥ a} is finite for some a ∈ [0, ∞[}. (i) Show that for each f ∈ M0,∞ there is a non-increasing f∗ : ]0, ∞[ → R such that µ_L{t : f∗(t) ≥ α} = µ{x : |f(x)| ≥ α} for every α > 0, writing µ_L for Lebesgue measure. (ii) Show that ∫_E |f|dµ ≤ ∫_0^{µE} f∗dµ_L for every E ∈ Σ (allowing ∞). (Hint: (f × χE)∗ ≤ f∗.) (iii) Show that ‖f∗‖_p = ‖f‖_p for every p ∈ [1, ∞], f ∈ M0,∞. (Hint: (|f|^p)∗ = (f∗)^p.) (iv) Show that if f, g ∈ M0,∞ then ∫ |f × g|dµ ≤ ∫ f∗ × g∗dµ_L. (Hint: look at simple functions first.) (v) Show that if µ is atomless then ∫_0^a f∗dµ_L = sup_{E∈Σ, µE≤a} ∫_E |f| for every a ≥ 0. (Hint: 215D.) (vi) Show that A ⊆ L^1(µ) is uniformly integrable iff {f∗ : f ∈ A} is uniformly integrable in L^1(µ_L). (f∗ is called the decreasing rearrangement of f.)

(p) Let (X, Σ, µ) be a complete locally determined measure space, and write ν for Lebesgue measure on [0, 1]. Show
that the c.l.d. product measure λ on X × [0, 1] is localizable iff µ is localizable. (Hints: (i) if E ⊆ Σ, show that F ∈ Σ
is an essential supremum for E in Σ iff F × [0, 1] is an essential supremum for {E × [0, 1] : E ∈ E} in Λ = dom λ. (ii)
For W ∈ Λ, n ∈ N, k < 2^n set
W_{nk} = {x : x ∈ X, ν∗{t : (x, t) ∈ W, 2^{-n}k ≤ t ≤ 2^{-n}(k + 1)} ≥ 2^{-n-1}}.
Show that if W ⊆ Λ and F_{nk} is an essential supremum for {W_{nk} : W ∈ W} in Σ for all n, k, then
⋃_{n∈N} ⋂_{m≥n} ⋃_{k<2^m} F_{mk} × [2^{-m}k, 2^{-m}(k + 1)]
is an essential supremum for W in Λ.)

(q) Let (X, Σ, µ) be the space of Example 216D, and give Lebesgue measure to [0, 1]. Show that the c.l.d. product
measure on X × [0, 1] is complete, locally determined, atomless and not localizable.

(r) Let (X, Σ, µ) be a complete locally determined measure space and (Y, T, ν) a semi-finite measure space with
νY > 0. Show that if the c.l.d. product measure on X × Y is strictly localizable, then µ is strictly localizable. (Hint:
take F ∈ T, 0 < νF < ∞. Let ⟨W_i⟩_{i∈I} be a decomposition of X × Y. For i ∈ I, n ∈ N set E_{in} = {x : ν∗{y : y ∈ F, (x, y) ∈ W_i} ≥ 2^{-n}}. Apply 213Ye to {E_{in} : i ∈ I, n ∈ N}.)

(s) Let (X, Σ, µ) be the space of Example 216E, and give Lebesgue measure to [0, 1]. Show that the c.l.d. product
measure on X × [0, 1] is complete, locally determined, atomless and localizable, but not strictly localizable.

(t) Let (X, Σ, µ) be a measure space and f a µ-integrable complex-valued function. For α ∈ ]−π, π] set H_α = {x : x ∈ dom f, Re(e^{−iα}f(x)) > 0}. Show that ∫_{−π}^{π} Re(e^{−iα} ∫_{H_α} f)dα = 2∫ |f|, and hence that there is some α such that |∫_{H_α} f| ≥ (1/π)∫ |f|. (Compare 246F.)
(u) Set f(t) = t − ln(t + 1) for t > −1. (i) Show that Γ(a + 1) = a^{a+1}e^{−a} ∫_{−1}^∞ e^{−af(u)}du for every a > 0. (Hint: substitute u = at − 1 in 225Xj(iii).) (ii) Show that there is a δ > 0 such that f(t) ≥ (1/3)t^2 for −1 ≤ t ≤ δ. (iii) Setting α = (1/2)f(δ), show that (for a ≥ 1)
√a ∫_δ^∞ e^{−af(t)}dt ≤ √a e^{−aα} ∫_0^∞ e^{−f(t)/2}dt → 0
as a → ∞. (iv) Set g_a(t) = e^{−af(t/√a)} if −√a < t ≤ δ√a, 0 otherwise. Show that g_a(t) ≤ e^{−t^2/3} for all a, t and that lim_{a→∞} g_a(t) = e^{−t^2/2} for all t, so that
lim_{a→∞} e^aΓ(a+1)/a^{a+1/2} = lim_{a→∞} √a ∫_{−1}^∞ e^{−af(t)}dt = lim_{a→∞} √a ∫_{−1}^δ e^{−af(t)}dt
= lim_{a→∞} ∫_{−∞}^∞ g_a(t)dt = ∫_{−∞}^∞ e^{−t^2/2}dt = √(2π).
(v) Show that lim_{n→∞} n!/(e^{−n}n^n√n) = √(2π). (This is Stirling’s formula.)

(v) Let (X, Σ, µ) be a complete locally determined measure space and f, g two real-valued, µ-virtually measurable functions defined almost everywhere in X. (i) Let λ be the c.l.d. product of µ and Lebesgue measure on R. Setting Ω∗f = {(x, a) : x ∈ dom f, a ∈ R, a ≤ f(x)} and Ω∗g = {(x, a) : x ∈ dom g, a ∈ R, a ≤ g(x)}, show that λ(Ω∗f \ Ω∗g) = ∫ (f − g)^+ dµ and λ(Ω∗f △ Ω∗g) = ∫ |f − g|dµ. (ii) Suppose that µ is σ-finite. Show that
∫ |f − g|dµ = ∫_{−∞}^∞ µ{x : x ∈ dom f ∩ dom g, (f(x) − a)(g(x) − a) < 0}da.
(iii) Suppose that µ is σ-finite, that T is a σ-subalgebra of Σ, that E ∈ Σ and that g : X → [0, 1] is T-measurable. Show that there is an F ∈ T such that µ(E△F) ≤ ∫ |χE − g|dµ.
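
The limit in 252Yu(v) above converges quickly enough to be observed numerically. The following sketch evaluates the ratio n!/(e^{−n}n^n√n) through logarithms, using the standard-library function `math.lgamma` to avoid overflow; the name `stirling_ratio` is chosen here for illustration.

```python
import math

def stirling_ratio(n):
    """n! / (e^(-n) * n^n * sqrt(n)), computed as exp of a difference of logarithms."""
    log_ratio = math.lgamma(n + 1) + n - (n + 0.5) * math.log(n)
    return math.exp(log_ratio)

for n in (10, 100, 1000, 10 ** 6):
    print(n, stirling_ratio(n), math.sqrt(2 * math.pi))
```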

252 Notes and comments For a volume and a half now I have asked you to accept the idea of integrating partially-
defined functions, insisting that sooner or later they would appear at the core of the subject. The moment has now
come. If we wish to apply Fubini’s and Tonelli’s theorems in the most fundamental of all cases, with both factors
equal to Lebesgue measure on the unit interval, it is surely natural to look at all functions which are integrable on
the square for two-dimensional Lebesgue measure. Now two-dimensional Lebesgue measure is a complete measure, so,
in particular, assigns zero measure to any set of the form {(x, b) : x ∈ A} or {(a, y) : y ∈ A}, whether or not the
set A is measured by one-dimensional measure. Accordingly, if f is a function of two variables which is integrable
for two-dimensional Lebesgue measure, there is no reason why any particular section x ↦ f(x, b) or y ↦ f(a, y) should be measurable, let alone integrable. Consequently, even if f itself is defined everywhere, the outer integral of ∫∫ f(x, y)dxdy is likely to be applied to a function which is not defined for every y. Let me remark that the problem
does not concern ‘∞’; the awkward functions are those with sections so irregular that they cannot be assigned an
integral at all.
I have seen many approaches to this particular nettle, generally less whole-hearted than the one I have determined
on for this treatise. Part of the difficulty is that Fubini’s theorem really is at the centre of measure theory. Over large
parts of the subject, it is possible to assert that a result is non-trivial if and only if it depends on Fubini’s theorem.
I am therefore unwilling to insert any local fix, saying that ‘in this chapter, we shall integrate functions which are
not defined everywhere’; before long, such a provision would have to be interpolated into the preambles to half the
best theorems, or an explanation offered of why it wasn’t necessary in their particular contexts. I suppose that one of the commonest responses is (like Halmos 50) to restrict attention to Σ⊗̂T-measurable functions, which eliminates measurability problems for the moment (252Xh, 252P); but unhappily (or rather, to my mind, happily) there are crucial applications in which the functions are not actually Σ⊗̂T-measurable, but belong to some wider class, and
this restriction sooner or later leads to undignified contortions as we are forced to adapt limited results to unforeseen
contexts. Besides, it leaves unsaid the really rather important information that if f is a measurable function of two
variables then (under appropriate conditions) almost all its sections are measurable (252E).
In 252B and its corollaries there is a clumsy restriction: we assume that one of the measures is σ-finite and the
other is either strictly localizable or complete and locally determined. The obvious question is, whether we need these
hypotheses. From 252K we see that the hypothesis ‘σ-finite’ on the second factor can certainly not be abandoned, even
when the first factor is a complete probability measure. The requirement ‘µ is either strictly localizable or complete
and locally determined’ is in fact fractionally stronger than what is needed, as well as disagreeably elaborate. The
‘right’ hypothesis is that the completion of µ should be locally determined (see 252Ya). The point is that because the
product of two measures is the same as the product of their c.l.d. versions (251T), no theorem which leads from the
product measure to the factor measures can distinguish between a measure and its c.l.d. version; so that, in 252B, we
must expect to need µ and its c.l.d. version to give rise to the same integrals. The proof of 252B would be better
focused if the hypothesis was simplified to ‘ν is σ-finite and µ is complete and locally determined’. But this would just
transfer part of the argument into the proof of 252C.
We also have to work a little harder in 252B in order to cover functions and integrals taking the values ±∞. Fubini’s
theorem is so central to measure theory that I believe it is worth taking a bit of extra trouble to state the results in
maximal generality. This is especially important because we frequently apply it in multiply repeated integrals, as in
252Xd, in which we have even less control than usual over the intermediate functions to be integrated.
I have expressed all the main results of this section in terms of the ‘c.l.d.’ product measure. In the case of σ-finite
spaces, of course, which is where the theory works best, we could just as well use the ‘primitive’ product measure.
Indeed, Fubini’s theorem itself has a version in terms of the primitive product measure which is rather more elegant
than 252B as stated (252Yc), and covers the great majority of applications. (Integrals with respect to the primitive and
c.l.d. product measures are of course very closely related; see 252Yd.) But we do sometimes need to look at non-σ-finite
spaces, and in these cases the asymmetric form in 252B is close to the best we can do. Using the primitive product
measure does not help at all with the most substantial obstacle, the phenomenon in 252K (see 252Yk).
The pre-calculus concept of an integral as ‘the area under a curve’ is given expression in 252N: the integral of a
non-negative function is the measure of its ordinate set. This is unsatisfactory as a definition of the integral, not just
because of the requirement that the base space should be complete and locally determined (which can be dealt with
by using the primitive product measure, as in 252Yl), but because the construction of the product measure involves
integration (part (c) of the proof of 251E). The idea of 252N is to relate the measure of an ordinate set to the integral
of the measures of its vertical sections. Curiously, if instead we integrate the measures of its horizontal sections, as in
252O, we get a more versatile result. (Indeed this one does not involve the concept of ‘product measure’, and could have appeared at any point after §123.) Note that the integral ∫_0^∞ . . . dt here is applied to a monotonic function, so
may be interpreted as an improper Riemann integral. If you think you know enough about the Riemann integral to
make this a tempting alternative to the construction in §122, the tricky bit now becomes the proof that the integral is
additive.
A different line of argument is to use integration over sections to define a product measure. The difficulty with this
approach is that unless we take great care we may find ourselves with an asymmetric construction. My own view is
that such an asymmetry is acceptable only when there is no alternative. But in Chapter 43 of Volume 4 I will describe
a couple of examples.
Of the two examples I give here, 252K is supposed to show that when I call for σ-finite spaces they are really
necessary, while 252L is supposed to show that joint measurability is essential in Tonelli’s theorem and its corollaries.
The factor spaces in 252K, Lebesgue measure and counting measure, are chosen to show that it is only the lack of
σ-finiteness that can be the problem; they are otherwise as regular as one can reasonably ask. In 252L I have used the
countable-cocountable measure on ω1 , which you may feel is fit only for counter-examples; and the question does arise,
whether the same phenomenon occurs with Lebesgue measure. This leads into deep water, and I will return to it in
Chapter 53 of Volume 5.
I ought perhaps to note explicitly that in Fubini’s theorem, we really do need to have a function which is integrable for the product measure. I include 252Xf and 252Xg to remind you that even in the best-regulated circumstances, the repeated integrals ∫∫ f dxdy, ∫∫ f dydx may fail to be equal if f is not integrable as a function of two variables.
There are many ways to calculate the volume βr of an r-dimensional ball; the one I have used in 252Q follows a line
that would have been natural to me before I ever heard of measure theory. In 252Xi I suggest another method. The idea
of integration-by-substitution, used in part (b) of the argument for 252Q, is there supported by an ad hoc argument; I
will present a different, more generally applicable, approach in Chapter 26. Elsewhere (252Xi, 252Yf, 252Yh, 252Yu)
I find myself taking for granted substitutions of the form t ↦ at, t ↦ a + t, t ↦ t^2; for a systematic justification, see
§263. Of course an enormous number of other formulae of advanced calculus are also based on repeated integration of
one kind or another, and I give a sample handful of such results (252Xb, 252Ye-252Yh, 252Yu).

253 Tensor products


The theorems of the last section show that the integrable functions on a product of two measure spaces can be
effectively studied in terms of integration on each factor space separately. In this section I present a very striking
relationship between the L1 space of a product measure and the L1 spaces of its factors, which actually determines
the product L1 up to isomorphism as a Banach lattice. I start with a brief note on bilinear operators (253A) and a
description of the canonical bilinear operator from L1 (µ) × L1 (ν) to L1 (µ × ν) (253B-253E). The main theorem of the
section is 253F, showing that this canonical map is universal for continuous bilinear operators from L1 (µ) × L1 (ν) to
Banach spaces; it also determines the ordering of L1 (µ × ν) (253G). I end with a description of a fundamental type
of conditional expectation operator (253H) and notes on products of indefinite-integral measures (253I) and upper
integrals of special kinds of function (253J, 253K).

253A Bilinear operators Before looking at any of the measure theory in this section, I introduce a concept from
the theory of linear spaces.
(a) Let U , V and W be linear spaces over R (or, indeed, any field). A map φ : U × V → W is bilinear if it is linear
in each variable separately, that is,
φ(u1 + u2 , v) = φ(u1 , v) + φ(u2 , v),

φ(u, v1 + v2 ) = φ(u, v1 ) + φ(u, v2 ),

φ(αu, v) = αφ(u, v) = φ(u, αv)


for all u, u1 , u2 ∈ U , v, v1 , v2 ∈ V and scalars α. Observe that such a φ gives rise to, and in turn can be defined by, a
linear operator T : U → L(V ; W ), writing L(V ; W ) for the space of linear operators from V to W , where
(T u)(v) = φ(u, v)
for all u ∈ U , v ∈ V . Hence, or otherwise, we can see, for instance, that φ(0, v) = φ(u, 0) = 0 whenever u ∈ U and
v ∈ V.
If W ′ is another linear space over the same field, and S : W → W ′ is a linear operator, then Sφ : U × V → W ′ is
bilinear.

(b) Now suppose that U, V and W are normed spaces, and φ : U × V → W a bilinear operator. Then we say that φ is bounded if sup{‖φ(u, v)‖ : ‖u‖ ≤ 1, ‖v‖ ≤ 1} is finite, and in this case we call this supremum the norm ‖φ‖ of φ. Note that ‖φ(u, v)‖ ≤ ‖φ‖‖u‖‖v‖ for all u ∈ U, v ∈ V (because
‖φ(u, v)‖ = αβ‖φ(α^{-1}u, β^{-1}v)‖ ≤ αβ‖φ‖
whenever α > ‖u‖, β > ‖v‖).
If W′ is another normed space and S : W → W′ is a bounded linear operator, then Sφ : U × V → W′ is a bounded bilinear operator, and ‖Sφ‖ ≤ ‖S‖‖φ‖.
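
A finite-dimensional illustration of 253A may help fix ideas. The sketch below takes U = R^m, V = R^n, W = R with Euclidean norms and φ(u, v) = ∑_{i,j} A[i][j] u[i] v[j] for a matrix A; these choices, and every function name, are assumptions made for the example. It uses the Frobenius norm of A, which by the Cauchy-Schwarz inequality is an upper bound for ‖φ‖ but not in general equal to it.

```python
from fractions import Fraction
import math

def make_bilinear(A):
    """Return phi(u, v) = sum over i, j of A[i][j] * u[i] * v[j]."""
    def phi(u, v):
        return sum(A[i][j] * u[i] * v[j]
                   for i in range(len(A)) for j in range(len(A[0])))
    return phi

def add(u, w):
    return [x + y for x, y in zip(u, w)]

def scale(c, u):
    return [c * x for x in u]

def norm(u):
    """Euclidean norm."""
    return math.sqrt(sum(float(x) ** 2 for x in u))

def frobenius(A):
    """Frobenius norm of A: an upper bound for the norm of phi."""
    return math.sqrt(sum(float(x) ** 2 for row in A for x in row))

A = [[Fraction(1), Fraction(-2), Fraction(0)],
     [Fraction(3, 2), Fraction(0), Fraction(5)]]
phi = make_bilinear(A)
u = [Fraction(1), Fraction(2)]
v = [Fraction(4), Fraction(-1), Fraction(1, 3)]
print(phi(u, v), frobenius(A) * norm(u) * norm(v))
```

Exact rational entries make the linearity identities of (a) checkable by equality rather than by approximation.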

253B Definition The most important bilinear operators of this section are based on the following idea. Let f and
g be real-valued functions. I will write f ⊗ g for the function (x, y) ↦ f(x)g(y) : dom f × dom g → R.
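
On finite sets the definition of f ⊗ g can be made completely concrete. In the sketch below a partially-defined function is represented by a dictionary whose keys form its domain, and a measure by a dictionary of point masses; this representation and the names `tensor`, `product_measure` and `integral` are assumptions of the example. It also checks, in this finite setting only, the product formula for integrals stated in 253D.

```python
from fractions import Fraction

def tensor(f, g):
    """f ⊗ g : (x, y) -> f(x) * g(y), with domain dom f × dom g."""
    return {(x, y): f[x] * g[y] for x in f for y in g}

def product_measure(mu, nu):
    """Point masses of the product of two measures on finite sets."""
    return {(x, y): mu[x] * nu[y] for x in mu for y in nu}

def integral(h, weights):
    """Sum of h(p) * weight(p) over dom h; points outside dom h are assumed negligible."""
    return sum(h[p] * weights[p] for p in h)

mu = {0: Fraction(1, 2), 1: Fraction(0), 2: Fraction(3)}
nu = {"a": Fraction(1), "b": Fraction(1, 3)}
f = {0: Fraction(2), 2: Fraction(-1)}      # not defined at the negligible point 1
g = {"a": Fraction(4), "b": Fraction(6)}

h = tensor(f, g)
lam = product_measure(mu, nu)
print(integral(h, lam), integral(f, mu) * integral(g, nu))
```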

253C Proposition (a) Let X and Y be sets, and Σ, T σ-algebras of subsets of X, Y respectively. If f is a Σ-measurable real-valued function defined on a subset of X, and g is a T-measurable real-valued function defined on a subset of Y, then f ⊗ g, as defined in 253B, is Σ⊗̂T-measurable.
(b) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y . If f ∈ L0 (µ) and
g ∈ L0 (ν), then f ⊗ g ∈ L0 (λ).
Remark Recall from 241A that L0 (µ) is the space of µ-virtually measurable real-valued functions defined on µ-
conegligible subsets of X.
proof (a) The point is that f ⊗ χY is Σ⊗̂T-measurable, because for any α ∈ R there is an E ∈ Σ such that
{x : f(x) ≥ α} = E ∩ dom f,
so that
{(x, y) : (f ⊗ χY)(x, y) ≥ α} = (E ∩ dom f) × Y = (E × Y) ∩ dom(f ⊗ χY),
and of course E × Y ∈ Σ⊗̂T. Similarly, χX ⊗ g is Σ⊗̂T-measurable and f ⊗ g = (f ⊗ χY) × (χX ⊗ g) is Σ⊗̂T-measurable.
(b) Let E ∈ Σ, F ∈ T be conegligible subsets of X, Y respectively such that E ⊆ dom f, F ⊆ dom g, f↾E is
Σ-measurable and g↾F is T-measurable. Write Λ for the domain of λ. Then Σ⊗̂T ⊆ Λ (251Ia). Also E × F is
λ-conegligible, because
λ((X × Y) \ (E × F)) ≤ λ((X \ E) × Y) + λ(X × (Y \ F)) = µ(X \ E) · νY + µX · ν(Y \ F) = 0
(also from 251Ia). So dom(f ⊗ g) ⊇ E × F is conegligible. Also, by (a), (f ⊗ g)↾(E × F) = (f↾E) ⊗ (g↾F) is
Σ⊗̂T-measurable, therefore Λ-measurable, and f ⊗ g is virtually measurable. Thus f ⊗ g ∈ L0(λ), as claimed.

253D Now we can apply the ideas of 253B-253C to integrable functions.


Proposition Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and write λ for the c.l.d. product measure on X × Y. If
f ∈ L1(µ) and g ∈ L1(ν), then f ⊗ g ∈ L1(λ) and ∫ f ⊗ g dλ = ∫ f dµ ∫ g dν.
Remark I follow §242 in writing L1 (µ) for the space of µ-integrable real-valued functions.
proof (a) Consider first the case f = χE, g = χF where E ∈ Σ, F ∈ T have finite measure; then f ⊗ g = χ(E × F) is
λ-integrable with integral
λ(E × F) = µE · νF = ∫ f dµ · ∫ g dν,
by 251Ia.
(b) It follows at once that f ⊗ g is λ-simple, with ∫ f ⊗ g dλ = ∫ f dµ ∫ g dν, whenever f is a µ-simple function and
g is a ν-simple function.
(c) If f and g are non-negative integrable functions, there are non-decreasing sequences ⟨fn⟩n∈N, ⟨gn⟩n∈N of
non-negative simple functions converging almost everywhere to f, g respectively; now note that if E ⊆ X, F ⊆ Y are
conegligible, E × F is conegligible in X × Y, as remarked in the proof of 253C, so the non-decreasing sequence
⟨fn ⊗ gn⟩n∈N of λ-simple functions converges almost everywhere to f ⊗ g, and
∫ f ⊗ g dλ = lim n→∞ ∫ fn ⊗ gn dλ = lim n→∞ ∫ fn dµ ∫ gn dν = ∫ f dµ ∫ g dν
by B.Levi’s theorem.

(d) Finally, for general f and g, we can express them as the differences f⁺ − f⁻, g⁺ − g⁻ of non-negative integrable
functions, and see that
∫ f ⊗ g dλ = ∫ (f⁺ ⊗ g⁺ − f⁺ ⊗ g⁻ − f⁻ ⊗ g⁺ + f⁻ ⊗ g⁻) dλ = ∫ f dµ ∫ g dν.
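
As a sanity check on the product formula of 253D, the following sketch works on two finite measure spaces in which every point carries a point mass, so that integrals are weighted sums. The particular weights `mu`, `nu`, the functions `f`, `g`, and the helper names are illustrative assumptions, not part of the text; exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction as Fr

# Hypothetical point masses: mu on X = {0, 1, 2}, nu on Y = {0, 1}.
mu = [Fr(1, 2), Fr(2), Fr(1, 3)]
nu = [Fr(3), Fr(1, 4)]

def integral(h, weights):
    """Integral of h against the measure given by the point masses."""
    return sum(hx * w for hx, w in zip(h, weights))

def tensor(f, g):
    """(f ⊗ g)(x, y) = f(x) g(y), stored as a nested list indexed [x][y]."""
    return [[fx * gy for gy in g] for fx in f]

def product_integral(h):
    """Integral of h over X × Y for the product measure with mass mu[x] * nu[y] at (x, y)."""
    return sum(h[x][y] * mu[x] * nu[y]
               for x in range(len(mu)) for y in range(len(nu)))

f = [Fr(1), Fr(-2), Fr(5)]
g = [Fr(-1, 2), Fr(4)]

lhs = product_integral(tensor(f, g))        # integral of f ⊗ g over the product
rhs = integral(f, mu) * integral(g, nu)     # product of the two integrals
print(lhs == rhs)
```

In this finite setting the identity is just the distributive law; the content of 253D is that it survives the passage to arbitrary measure spaces and the c.l.d. product.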

253E The canonical map L1 × L1 → L1 I continue the argument from 253D. Because E × F is conegligible in
X × Y whenever E and F are conegligible subsets of X and Y , f1 ⊗ g1 = f ⊗ g λ-a.e. whenever f = f1 µ-a.e. and
g = g1 ν-a.e. We may therefore define u ⊗ v ∈ L1 (λ), for u ∈ L1 (µ) and v ∈ L1 (ν), by saying that u ⊗ v = (f ⊗ g)•
whenever u = f • and v = g • .
Now if f , f1 , f2 ∈ L1 (µ), g, g1 , g2 ∈ L1 (ν) and a ∈ R,
(f1 + f2 ) ⊗ g = (f1 ⊗ g) + (f2 ⊗ g),

f ⊗ (g1 + g2 ) = (f ⊗ g1 ) + (f ⊗ g2 ),

(af ) ⊗ g = a(f ⊗ g) = f ⊗ (ag).


It follows at once that the map (u, v) ↦ u ⊗ v is bilinear.
Moreover, if f ∈ L1(µ) and g ∈ L1(ν), |f| ⊗ |g| = |f ⊗ g|, so ∫ |f ⊗ g| dλ = ∫ |f| dµ ∫ |g| dν. Accordingly
‖u ⊗ v‖₁ = ‖u‖₁‖v‖₁
for all u ∈ L1(µ), v ∈ L1(ν). In particular, the bilinear operator ⊗ is bounded, with norm 1 (except in the trivial case
in which one of L1(µ), L1(ν) is 0-dimensional).
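
The two facts of 253E, additivity of ⊗ in each variable and the norm identity, can be checked numerically in the same kind of finite model. The sketch below assumes point masses `mu`, `nu` and vectors `u1`, `u2`, `v` chosen purely for illustration; on a finite space with no negligible points, functions and their equivalence classes coincide.

```python
from fractions import Fraction as Fr

# Hypothetical point masses: mu on X = {0, 1, 2}, nu on Y = {0, 1}.
mu = [Fr(1, 2), Fr(2), Fr(1, 3)]
nu = [Fr(3), Fr(1, 4)]

def tensor(u, v):
    """(u ⊗ v)(x, y) = u(x) v(y)."""
    return [[ux * vy for vy in v] for ux in u]

def norm1(u, weights):
    """L1-norm of u for the measure given by the point masses."""
    return sum(abs(ux) * w for ux, w in zip(u, weights))

def norm1_product(w):
    """L1-norm of w on X × Y for the product measure."""
    return sum(abs(w[x][y]) * mu[x] * nu[y]
               for x in range(len(mu)) for y in range(len(nu)))

def add(a, b):
    """Pointwise sum of two functions on the same finite set."""
    return [s + t for s, t in zip(a, b)]

u1 = [Fr(1), Fr(-2), Fr(5)]
u2 = [Fr(0), Fr(3), Fr(-1, 2)]
v = [Fr(-1, 2), Fr(4)]

# (u1 + u2) ⊗ v agrees with u1 ⊗ v + u2 ⊗ v entry by entry.
additive = tensor(add(u1, u2), v) == [
    add(r1, r2) for r1, r2 in zip(tensor(u1, v), tensor(u2, v))]

# The norm of u1 ⊗ v is the product of the norms.
norm_identity = norm1_product(tensor(u1, v)) == norm1(u1, mu) * norm1(v, nu)
print(additive, norm_identity)
```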

253F We are now ready for the main theorem of this section.

Theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and let λ be the c.l.d. product measure on X × Y. Let W
be any Banach space and φ : L1(µ) × L1(ν) → W a bounded bilinear operator. Then there is a unique bounded linear
operator T : L1(λ) → W such that T(u ⊗ v) = φ(u, v) for all u ∈ L1(µ) and v ∈ L1(ν), and ‖T‖ = ‖φ‖.

proof (a) The centre of the argument is the following fact: if E0, ..., En are measurable sets of finite measure in
X, F0, ..., Fn are measurable sets of finite measure in Y, a0, ..., an ∈ R and $\sum_{i=0}^n a_i \chi(E_i \times F_i) = 0$ λ-a.e., then
$\sum_{i=0}^n a_i \phi(\chi E_i^{\bullet}, \chi F_i^{\bullet}) = 0$ in W. P We can find a disjoint family ⟨Gj⟩j≤m of measurable sets of finite measure in
X such that each Ei is expressible as a union of some subfamily of the Gj; so that χEi is expressible in the form
$\sum_{j=0}^m b_{ij} \chi G_j$ (see 122Ca). Similarly, we can find a disjoint family ⟨Hk⟩k≤l of measurable sets of finite measure in Y
such that each χFi is expressible as $\sum_{k=0}^l c_{ik} \chi H_k$. Now
$\sum_{j=0}^m \sum_{k=0}^l \bigl(\sum_{i=0}^n a_i b_{ij} c_{ik}\bigr) \chi(G_j \times H_k) = \sum_{i=0}^n a_i \chi(E_i \times F_i) = 0$ λ-a.e.
Because the Gj × Hk are disjoint, and λ(Gj × Hk) = µGj · νHk for all j, k, it follows that for every j ≤ m, k ≤ l we
have either µGj = 0 or νHk = 0 or $\sum_{i=0}^n a_i b_{ij} c_{ik} = 0$. In any of these three cases,
$\bigl(\sum_{i=0}^n a_i b_{ij} c_{ik}\bigr) \phi(\chi G_j^{\bullet}, \chi H_k^{\bullet}) = 0$ in W. But this means that
$0 = \sum_{j=0}^m \sum_{k=0}^l \bigl(\sum_{i=0}^n a_i b_{ij} c_{ik}\bigr) \phi(\chi G_j^{\bullet}, \chi H_k^{\bullet}) = \sum_{i=0}^n a_i \phi(\chi E_i^{\bullet}, \chi F_i^{\bullet})$,
as claimed. Q

(b) It follows that if E0, ..., En, E′0, ..., E′m are measurable sets of finite measure in X, F0, ..., Fn, F′0, ..., F′m are
measurable sets of finite measure in Y, a0, ..., an, a′0, ..., a′m ∈ R and
$\sum_{i=0}^n a_i \chi(E_i \times F_i) = \sum_{i=0}^m a'_i \chi(E'_i \times F'_i)$ λ-a.e.,
then
$\sum_{i=0}^n a_i \phi(\chi E_i^{\bullet}, \chi F_i^{\bullet}) = \sum_{i=0}^m a'_i \phi(\chi E_i'^{\bullet}, \chi F_i'^{\bullet})$
in W. Let M be the linear subspace of L1(λ) generated by
{χ(E × F)• : E ∈ Σ, µE < ∞, F ∈ T, νF < ∞};
then we have a unique map T0 : M → W such that
$T_0\bigl(\sum_{i=0}^n a_i \chi(E_i \times F_i)^{\bullet}\bigr) = \sum_{i=0}^n a_i \phi(\chi E_i^{\bullet}, \chi F_i^{\bullet})$
whenever E0, ..., En are measurable sets of finite measure in X, F0, ..., Fn are measurable sets of finite measure in
Y and a0, ..., an ∈ R. Of course T0 is linear.

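(c) Some of the same calculations show that ‖T0 u‖ ≤ ‖φ‖‖u‖₁ for every u ∈ M. P If u ∈ M, then, by the arguments
of (a), we can express u as $\sum_{j=0}^m \sum_{k=0}^l a_{jk} \chi(G_j \times H_k)^{\bullet}$, where ⟨Gj⟩j≤m and ⟨Hk⟩k≤l are disjoint families of sets of
finite measure. Now
$\|T_0 u\| = \bigl\|\sum_{j=0}^m \sum_{k=0}^l a_{jk} \phi(\chi G_j^{\bullet}, \chi H_k^{\bullet})\bigr\| \le \sum_{j=0}^m \sum_{k=0}^l |a_{jk}| \, \|\phi(\chi G_j^{\bullet}, \chi H_k^{\bullet})\|$
$\le \sum_{j=0}^m \sum_{k=0}^l |a_{jk}| \, \|\phi\| \, \|\chi G_j^{\bullet}\|_1 \|\chi H_k^{\bullet}\|_1 = \|\phi\| \sum_{j=0}^m \sum_{k=0}^l |a_{jk}| \, \mu G_j \cdot \nu H_k$
$= \|\phi\| \sum_{j=0}^m \sum_{k=0}^l |a_{jk}| \, \lambda(G_j \times H_k) = \|\phi\| \|u\|_1$,
as claimed. Q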
(d) The next point is to observe that M is dense in L1(λ) for ‖ ‖₁. P Repeating the ideas above once again,
we observe that if E0, ..., En are sets of finite measure in X and F0, ..., Fn are sets of finite measure in Y, then
$\chi(\bigcup_{i \le n} E_i \times F_i)^{\bullet} \in M$; this is because, expressing each Ei as a union of Gj, where the Gj are disjoint, we have
$\bigcup_{i \le n} E_i \times F_i = \bigcup_{j \le m} G_j \times F'_j$,
where $F'_j = \bigcup \{F_i : G_j \subseteq E_i\}$ for each j; now ⟨Gj × F′j⟩j≤m is disjoint, so
$\chi(\bigcup_{j \le m} G_j \times F'_j)^{\bullet} = \sum_{j=0}^m \chi(G_j \times F'_j)^{\bullet} \in M$.
So 251Ie tells us that whenever λH < ∞ and ε > 0 there is a G such that λ(H△G) ≤ ε and χG• ∈ M; now
‖χH• − χG•‖₁ = λ(G△H) ≤ ε,
so χH• is approximated arbitrarily closely by members of M, and belongs to the closure $\overline{M}$ of M in L1(λ). Because M
is a linear subspace of L1(λ), so is $\overline{M}$ (2A4Cb); accordingly $\overline{M}$ contains the equivalence classes of all λ-simple functions;
but these are dense in L1(λ) (242Mb), so $\overline{M}$ = L1(λ), as claimed. Q
(e) Because W is a Banach space, it follows that there is a bounded linear operator T : L1(λ) → W extending T0,
with ‖T‖ = ‖T0‖ ≤ ‖φ‖ (2A4I). Now T(u ⊗ v) = φ(u, v) for all u ∈ L1(µ), v ∈ L1(ν). P If u = χE• and v = χF•,
where E, F are measurable sets of finite measure, then
T(u ⊗ v) = T(χ(E × F)•) = T0(χ(E × F)•) = φ(χE•, χF•) = φ(u, v).
Because φ and ⊗ are bilinear and T is linear,
T(f• ⊗ g•) = φ(f•, g•)
whenever f and g are simple functions. Now whenever u ∈ L1(µ), v ∈ L1(ν) and ε > 0, there are simple functions f,
g such that ‖u − f•‖₁ ≤ ε, ‖v − g•‖₁ ≤ ε (242Mb again); so that
‖φ(u, v) − φ(f•, g•)‖ ≤ ‖φ(u − f•, v − g•)‖ + ‖φ(u, g• − v)‖ + ‖φ(f• − u, v)‖ ≤ ‖φ‖(ε² + ε‖u‖₁ + ε‖v‖₁).
Similarly
‖u ⊗ v − f• ⊗ g•‖₁ ≤ ε(ε + ‖u‖₁ + ‖v‖₁),
so
‖T(u ⊗ v) − T(f• ⊗ g•)‖ ≤ ε‖T‖(ε + ‖u‖₁ + ‖v‖₁);
because T(f• ⊗ g•) = φ(f•, g•),
‖T(u ⊗ v) − φ(u, v)‖ ≤ ε(‖T‖ + ‖φ‖)(ε + ‖u‖₁ + ‖v‖₁).
As ε is arbitrary, T(u ⊗ v) = φ(u, v), as required. Q
(f) The argument of (e) ensured that ‖T‖ ≤ ‖φ‖. Because ‖u ⊗ v‖₁ ≤ ‖u‖₁‖v‖₁ for all u ∈ L1(µ) and v ∈ L1(ν),
‖φ(u, v)‖ ≤ ‖T‖‖u‖₁‖v‖₁ for all u, v, and ‖φ‖ ≤ ‖T‖; so ‖T‖ = ‖φ‖.
(g) Thus T has the required properties. To see that it is unique, we have only to observe that any bounded linear
operator S : L1 (λ) → W such that S(u ⊗ v) = φ(u, v) for all u ∈ L1 (µ), v ∈ L1 (ν) must agree with T on objects of the
form χ(E × F )• where E and F are of finite measure, and therefore on every member of M ; because M is dense and
both S and T are continuous, they agree everywhere in L1 (λ).
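
The factorisation in 253F can be seen concretely when X and Y are finite and W = R. The sketch below is an illustration under assumptions of my own choosing: the point masses `mu`, `nu` and the coefficient table `c` defining the bilinear functional `phi` are hypothetical. The operator `T` is fixed by its values on indicators of points, mirroring the definition of T0 on the subspace M. The norms are computed by evaluating at the indicators of points scaled to norm 1, which are the extreme points of the unit balls in this finite setting.

```python
from fractions import Fraction as Fr

# Hypothetical point masses: mu on X = {0, 1}, nu on Y = {0, 1, 2}.
mu = [Fr(1, 2), Fr(2)]
nu = [Fr(3), Fr(1, 4), Fr(1)]
J, K = range(len(mu)), range(len(nu))

# Hypothetical bilinear functional phi(u, v) = sum of c[j][k] * u[j] * v[k].
c = [[Fr(1), Fr(-2), Fr(0)],
     [Fr(3), Fr(1, 2), Fr(-1)]]

def phi(u, v):
    return sum(c[j][k] * u[j] * v[k] for j in J for k in K)

def tensor(u, v):
    return [[u[j] * v[k] for k in K] for j in J]

def T(w):
    """Linear functional on functions on X × Y, with T(indicator of (j, k)) = c[j][k]."""
    return sum(c[j][k] * w[j][k] for j in J for k in K)

def unit(i, weights):
    """Indicator of the point i, scaled to have L1-norm 1."""
    return [Fr(1) / weights[i] if a == i else Fr(0) for a in range(len(weights))]

u = [Fr(2), Fr(-1, 3)]
v = [Fr(1), Fr(4), Fr(-5)]
factorises = T(tensor(u, v)) == phi(u, v)

# Suprema over the unit balls are attained at the scaled indicators.
norm_phi = max(abs(phi(unit(j, mu), unit(k, nu))) for j in J for k in K)
norm_T = max(abs(T(tensor(unit(j, mu), unit(k, nu)))) for j in J for k in K)
print(factorises, norm_phi == norm_T)
```

In the finite case the construction is immediate because every function on X × Y is a finite combination of indicators of points; the density argument of part (d) is what replaces this in general.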

253G The order structure of L1 In 253F I have treated the L1 spaces exclusively as normed linear spaces.
In general, however, the order structure of an L1 space (see 242C) is as important as its norm. The map ⊗ :
L1 (µ) × L1 (ν) → L1 (λ) respects the order structures of the three spaces in the following strong sense.
Proposition Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y . Then
(a) u ⊗ v ≥ 0 in L1 (λ) whenever u ≥ 0 in L1 (µ) and v ≥ 0 in L1 (ν).
(b) The positive cone {w : w ≥ 0} of L1 (λ) is precisely the closed convex hull C of {u ⊗ v : u ≥ 0, v ≥ 0} in L1 (λ).
*(c) Let W be any Banach lattice, and T : L1 (λ) → W a bounded linear operator. Then the following are
equiveridical:
(i) T w ≥ 0 in W whenever w ≥ 0 in L1 (λ);
(ii) T (u ⊗ v) ≥ 0 in W whenever u ≥ 0 in L1 (µ) and v ≥ 0 in L1 (ν).
proof (a) If u, v ≥ 0 then they are expressible as f • , g • where f ∈ L1 (µ), g ∈ L1 (ν), f ≥ 0 and g ≥ 0. Now f ⊗ g ≥ 0
so u ⊗ v = (f ⊗ g)• ≥ 0.
(b)(i) Write L1 (λ)+ for {w : w ∈ L1 (λ), w ≥ 0}. Then L1 (λ)+ is a closed convex set in L1 (λ) (242De); by (a), it
contains u ⊗ v whenever u ∈ L1 (µ)+ and v ∈ L1 (ν)+ , so it must include C.

(ii)(α) Of course 0 = 0 ⊗ 0 ∈ C. (β) If u ∈ M, as defined in the proof of 253F, and u > 0, then u is expressible
as $\sum_{j \le m, k \le l} a_{jk} \chi(G_j \times H_k)^{\bullet}$, where G0, ..., Gm and H0, ..., Hl are disjoint sequences of sets of finite measure, as in
(a) of the proof of 253F. Now ajk can be negative only if χ(Gj × Hk)• = 0, so replacing every ajk by max(0, ajk) if
necessary, we can suppose that ajk ≥ 0 for all j, k. Not all the ajk can be zero, so $a = \sum_{j \le m, k \le l} a_{jk} > 0$, and
$u = \sum_{j \le m, k \le l} \frac{a_{jk}}{a} \cdot a\chi(G_j \times H_k)^{\bullet} = \sum_{j \le m, k \le l} \frac{a_{jk}}{a} \cdot (a\chi G_j^{\bullet}) \otimes \chi H_k^{\bullet} \in C$.

(γ) If w ∈ L1(λ)⁺ and ε > 0, express w as h• where h ≥ 0 in L1(λ). There is a simple function h1 ≥ 0 such that
h1 ≤a.e. h and ∫ h ≤ ∫ h1 + ε. Express h1 as $\sum_{i=0}^n a_i \chi H_i$ where λHi < ∞ and ai ≥ 0 for each i, and for each
i ≤ n choose sets $G_{i0}, \ldots, G_{im_i} \in \Sigma$, $F_{i0}, \ldots, F_{im_i} \in \mathrm{T}$, all of finite measure, such that $G_{i0}, \ldots, G_{im_i}$ are disjoint and
$\lambda(H_i \triangle \bigcup_{j \le m_i} G_{ij} \times F_{ij}) \le \epsilon/(n+1)(a_i+1)$, as in (d) of the proof of 253F. Set
$w_0 = \sum_{i=0}^n a_i \sum_{j=0}^{m_i} \chi(G_{ij} \times F_{ij})^{\bullet}$.
Then w0 ∈ C because w0 ∈ M and w0 ≥ 0. Also
$\|w - w_0\|_1 \le \|w - h_1^{\bullet}\|_1 + \|h_1^{\bullet} - w_0\|_1$
$\le \int (h - h_1)\,d\lambda + \sum_{i=0}^n a_i \int \bigl|\chi H_i - \sum_{j=0}^{m_i} \chi(G_{ij} \times F_{ij})\bigr|\,d\lambda$
$\le \epsilon + \sum_{i=0}^n a_i \lambda(H_i \triangle \bigcup_{j \le m_i} G_{ij} \times F_{ij}) \le 2\epsilon$.
As ε is arbitrary and C is closed, w ∈ C. As w is arbitrary, L1(λ)⁺ ⊆ C and C = L1(λ)⁺.


(c) Part (a) tells us that (i)⇒(ii). For the reverse implication, we need a fragment from the theory of Banach
lattices: W⁺ = {w : w ∈ W, w ≥ 0} is a closed set in W. P If w, w′ ∈ W, then
w = (w − w′) + w′ ≤ |w − w′| + w′ ≤ |w − w′| + |w′|,
−w = (w′ − w) − w′ ≤ |w − w′| − w′ ≤ |w − w′| + |w′|,
|w| ≤ |w − w′| + |w′|, |w| − |w′| ≤ |w − w′|,
because |w| = w ∨ (−w) and the order of W is translation-invariant (241Ec). Similarly, |w′| − |w| ≤ |w − w′| and
||w| − |w′|| ≤ |w − w′|, so ‖|w| − |w′|‖ ≤ ‖w − w′‖, by the definition of Banach lattice (242G). Setting φ(w) = |w| − w,
we see that ‖φ(w) − φ(w′)‖ ≤ 2‖w − w′‖ for all w, w′ ∈ W, so that φ is continuous.
Now, because the order is invariant under multiplication by positive scalars,
w ≥ 0 ⇐⇒ 2w ≥ 0 ⇐⇒ w ≥ −w ⇐⇒ w = |w| ⇐⇒ φ(w) = 0,
so W⁺ = {w : φ(w) = 0} is closed. Q

Now suppose that (ii) is true, and set C1 = {w : w ∈ L1(λ), Tw ≥ 0}. Then C1 contains u ⊗ v whenever u, v ≥ 0;
but also it is convex, because T is linear, and closed, because T is continuous and C1 = T⁻¹[W⁺]. By (b), C1 includes
{w : w ∈ L1(λ), w ≥ 0}, as required by (i).

253H Conditional expectations The ideas of this section and the preceding one provide us with some of the
most important examples of conditional expectations.
Theorem Let (X, Σ, µ) and (Y, T, ν) be complete probability spaces, with c.l.d. product (X × Y, Λ, λ). Set Λ1 =
{E × Y : E ∈ Σ}. Then Λ1 is a σ-subalgebra of Λ. Given a λ-integrable real-valued function f , set
g(x, y) = ∫ f(x, z) ν(dz)
whenever x ∈ X, y ∈ Y and the integral is defined in R. Then g is a conditional expectation of f on Λ1 .
proof We know that Λ1 ⊆ Λ, by 251Ia, and Λ1 is a σ-algebra of sets because Σ is. Fubini’s theorem (252B, 252C)
tells us that f1(x) = ∫ f(x, z) ν(dz) is defined for almost every x, and therefore that g = f1 ⊗ χY is defined almost
everywhere in X × Y. f1 is µ-virtually measurable; because µ is complete, f1 is Σ-measurable, so g is Λ1-measurable
(since {(x, y) : g(x, y) ≤ α} = {x : f1(x) ≤ α} × Y for every α ∈ R). Finally, if W ∈ Λ1, then W = E × Y for some
E ∈ Σ, so
∫_W g dλ = ∫ (f1 ⊗ χY) × (χE ⊗ χY) dλ = ∫ f1 × χE dµ ∫ χY dν
(by 253D)
= ∫∫ χE(x) f(x, y) ν(dy) µ(dx) = ∫ f × χ(E × Y) dλ
(by Fubini’s theorem)
= ∫_W f dλ.
So g is a conditional expectation of f.
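
A finite illustration of 253H may help: on finite probability spaces, integrating out the second coordinate gives a function depending on x alone whose integral over every set E × Y matches that of f. The probabilities `mu`, `nu` and the table `f` below are assumptions chosen for the example, and every subset E of X is treated as measurable.

```python
from fractions import Fraction as Fr
from itertools import combinations

# Hypothetical probability weights on X = {0, 1, 2} and Y = {0, 1}.
mu = [Fr(1, 2), Fr(1, 3), Fr(1, 6)]
nu = [Fr(1, 4), Fr(3, 4)]
X, Y = range(len(mu)), range(len(nu))

# An arbitrary function f on X × Y, indexed f[x][y].
f = [[Fr(1), Fr(-2)],
     [Fr(0), Fr(5)],
     [Fr(3), Fr(1, 2)]]

# f1(x) is the integral of f(x, z) with respect to nu; g(x, y) = f1(x) ignores y.
f1 = [sum(f[x][z] * nu[z] for z in Y) for x in X]
g = [[f1[x] for _ in Y] for x in X]

def integral_over(h, E):
    """Integral of h over E × Y with respect to the product measure."""
    return sum(h[x][y] * mu[x] * nu[y] for x in E for y in Y)

# Every set in Lambda_1 has the form E × Y; enumerate all subsets E of X.
subsets = [E for r in range(len(mu) + 1) for E in combinations(X, r)]
is_conditional_expectation = all(
    integral_over(g, E) == integral_over(f, E) for E in subsets)
print(f1, is_conditional_expectation)
```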

253I This is a convenient moment to set out a useful result on products of indefinite-integral measures.
Proposition Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and f ∈ L0 (µ), g ∈ L0 (ν) non-negative functions. Let
µ′ , ν ′ be the corresponding indefinite-integral measures (see §234). Let λ be the c.l.d. product of µ and ν, and λ′ the
indefinite-integral measure defined from λ and f ⊗ g ∈ L0 (λ) (253Cb). Then λ′ is the c.l.d. product of µ′ and ν ′ .
proof Write θ for the c.l.d. product of µ′ and ν ′ .
(a) If we replace µ by its completion, we do not change µ′ (234Ke); at the same time, we do not change λ, by 251T.
The same applies to ν. So it will be enough to prove the result on the assumption that µ and ν are complete; in which
case f and g are measurable and have measurable domains.
Set F = {x : x ∈ dom f, f (x) > 0} and G = {y : y ∈ dom g, g(y) > 0}, so that F × G = {w : w ∈ dom(f ⊗ g), (f ⊗
g)(w) > 0}. Then F is µ′ -conegligible and G is ν ′ -conegligible, so F × G is θ-conegligible as well as λ′ -conegligible.
Because both θ and λ′ are complete (251Ic, 234I), it will be enough to show that the subspace measures $\theta_{F \times G}$, $\lambda'_{F \times G}$
on F × G are equal. But note that $\theta_{F \times G}$ can be identified with the product of $\mu'_F$ and $\nu'_G$, where $\mu'_F$ and $\nu'_G$ are
the subspace measures on F, G respectively (251Q(ii-α)). At the same time, $\mu'_F$ is the indefinite-integral measure
defined from the subspace measure $\mu_F$ on F and the function f↾F, $\nu'_G$ is the indefinite-integral measure defined from
the subspace measure $\nu_G$ on G and g↾G, and $\lambda'_{F \times G}$ is defined from the subspace measure $\lambda_{F \times G}$ and (f↾F) ⊗ (g↾G).
Finally, by 251Q again, $\lambda_{F \times G}$ is the product of $\mu_F$ and $\nu_G$.
What all this means is that it will be enough to deal with the case in which F = X and G = Y , that is, f and g are
everywhere defined and strictly positive; which is what I will suppose from now on.
(b) In this case dom µ′ = Σ and dom ν′ = T (234La). Similarly, dom λ′ = Λ is just the domain of λ. Set
Fn = {x : x ∈ X, 2⁻ⁿ ≤ f(x) ≤ 2ⁿ}, Gn = {y : y ∈ Y, 2⁻ⁿ ≤ g(y) ≤ 2ⁿ}
for n ∈ N.
(c) Set
A = {W : W ∈ dom θ ∩ dom λ′ , θ(W ) = λ′ (W )}.

If µ′E and ν′H are defined and finite, then f × χE and g × χH are integrable, so
λ′(E × H) = ∫ (f ⊗ g) × χ(E × H) dλ = ∫ (f × χE) ⊗ (g × χH) dλ = ∫ f × χE dµ · ∫ g × χH dν = θ(E × H)
by 253D and 251Ia, that is, E × H ∈ A. If we now look at A_EH = {W : W ⊆ X × Y, W ∩ (E × H) ∈ A}, then we see
that
A_EH contains E′ × H′ for every E′ ∈ Σ, H′ ∈ T,
if ⟨Wn⟩n∈N is a non-decreasing sequence in A_EH then ⋃n∈N Wn ∈ A_EH,
if W, W′ ∈ A_EH and W ⊆ W′ then W′ \ W ∈ A_EH.
Thus A_EH is a Dynkin class of subsets of X × Y, and by the Monotone Class Theorem (136B) includes the σ-algebra
generated by {E′ × H′ : E′ ∈ Σ, H′ ∈ T}, which is Σ⊗̂T.
(d) Now suppose that W ∈ Λ. In this case W ∈ dom θ and θW ≤ λ′W. P Take n ∈ N, and E ∈ Σ, H ∈ T such that
µ′E and ν′H are both finite. Set E′ = E ∩ Fn, H′ = H ∩ Gn and W′ = W ∩ (E′ × H′). Then W′ ∈ Λ, while µE′ ≤ 2ⁿµ′E
and νH′ ≤ 2ⁿν′H are finite. By 251Ib there is a V ∈ Σ⊗̂T such that V ⊆ W′ and λV = λW′. Similarly, there is a
V′ ∈ Σ⊗̂T such that V′ ⊆ (E′ × H′) \ W′ and λV′ = λ((E′ × H′) \ W′). This means that λ((E′ × H′) \ (V ∪ V′)) = 0,
so λ′((E′ × H′) \ (V ∪ V′)) = 0. But (E′ × H′) \ (V ∪ V′) ∈ A, by (c), so θ((E′ × H′) \ (V ∪ V′)) = 0 and W′ ∈ dom θ,
while
θW′ = θV = λ′V ≤ λ′W.
Since E and H are arbitrary, W ∩ (Fn × Gn) ∈ dom θ (251H) and θ(W ∩ (Fn × Gn)) ≤ λ′W. Since ⟨Fn⟩n∈N, ⟨Gn⟩n∈N
are non-decreasing sequences with unions X, Y respectively,
θW = sup n∈N θ(W ∩ (Fn × Gn)) ≤ λ′W. Q

(e) In the same way, λ′W is defined and less than or equal to θW for every W ∈ dom θ. P The arguments are very
similar, but a refinement seems to be necessary at the last stage. Take n ∈ N, and E ∈ Σ, H ∈ T such that µE and
νH are both finite. Set E′ = E ∩ Fn, H′ = H ∩ Gn and W′ = W ∩ (E′ × H′). Then W′ ∈ dom θ, while µ′E′ ≤ 2ⁿµE
and ν′H′ ≤ 2ⁿνH are finite. This time, there are V, V′ ∈ Σ⊗̂T such that V ⊆ W′, V′ ⊆ (E′ × H′) \ W′, θV = θW′
and θV′ = θ((E′ × H′) \ W′). Accordingly
λ′V + λ′V′ = θV + θV′ = θ(E′ × H′) = λ′(E′ × H′),
so that λ′W′ is defined and equal to θW′.
What this means is that W ∩ (Fn × Gn) ∩ (E × H) ∈ A whenever µE and νH are finite. So W ∩ (Fn × Gn) ∈ Λ, by
251H; as n is arbitrary, W ∈ Λ and λ′W is defined.
? Suppose, if possible, that λ′W > θW. Then there is some n ∈ N such that λ′(W ∩ (Fn × Gn)) > θW. Because
λ is semi-finite, 213B tells us that there is some λ-simple function h such that h ≤ (f ⊗ g) × χ(W ∩ (Fn × Gn)) and
∫ h dλ > θW; setting V = {(x, y) : h(x, y) > 0}, we see that V ⊆ W ∩ (Fn × Gn), λV is defined and finite and
λ′V > θW. Now there must be sets E ∈ Σ, H ∈ T such that µE and νH are both finite and λ(V \ (E × H)) <
4⁻ⁿ(λ′V − θW). But in this case V ∈ Λ ⊆ dom θ (by (d)), so we can apply the argument just above to V and conclude
that V ∩ (E × H) = V ∩ (Fn × Gn) ∩ (E × H) belongs to A. And now
λ′V = λ′(V ∩ (E × H)) + λ′(V \ (E × H))
≤ θ(V ∩ (E × H)) + 4ⁿλ(V \ (E × H)) < θV + λ′V − θW ≤ λ′V,
which is absurd. X
So λ′W is defined and not greater than θW. Q
(f ) Putting this together with (d), we see that λ′ = θ, as claimed.
Remark If µ′ and ν ′ are totally finite, so that they are ‘truly continuous’ with respect to µ and ν in the sense of
232Ab, then f and g are integrable, so f ⊗ g is λ-integrable, and θ = λ′ is truly continuous with respect to λ.
The proof above can be simplified using fragments of the general theory of complete locally determined spaces, which
will be given in §412 in Volume 4.
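
In a finite model the statement of 253I reduces to a pointwise identity, which the sketch below checks. The point masses `mu`, `nu` and the non-negative densities `f`, `g` are hypothetical; `lam_prime` is the measure with density f ⊗ g with respect to the product of µ and ν, and `theta` is the product of the two indefinite-integral measures.

```python
from fractions import Fraction as Fr

# Hypothetical point masses and non-negative densities on X = {0, 1, 2}, Y = {0, 1}.
mu = [Fr(1, 2), Fr(2), Fr(1, 3)]
nu = [Fr(3), Fr(1, 4)]
f = [Fr(2), Fr(0), Fr(9)]
g = [Fr(1, 3), Fr(8)]
X, Y = range(len(mu)), range(len(nu))

# Indefinite-integral measures: the mass at each point is density times original mass.
mu_prime = [f[x] * mu[x] for x in X]
nu_prime = [g[y] * nu[y] for y in Y]

# lam_prime: density f ⊗ g against the product measure of mu and nu.
lam_prime = [[(f[x] * g[y]) * (mu[x] * nu[y]) for y in Y] for x in X]

# theta: the product of mu_prime and nu_prime.
theta = [[mu_prime[x] * nu_prime[y] for y in Y] for x in X]

agree = lam_prime == theta
print(agree)
```

Here the density f vanishes at one point, so the set F of the proof is a proper subset of X; the pointwise identity still holds, with both measures giving that row of the product measure zero.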

*253J Upper integrals The idea of 253D can be repeated in terms of upper integrals, as follows.

Proposition Let (X, Σ, µ) and (Y, T, ν) be σ-finite measure spaces, with c.l.d. product measure λ. Then for any
functions f and g, defined on conegligible subsets of X and Y respectively, and taking values in [0, ∞],
$\overline{\int} f \otimes g \, d\lambda = \overline{\int} f \, d\mu \cdot \overline{\int} g \, d\nu$.

Remark Here (f ⊗ g)(x, y) = f(x)g(y) for all x ∈ dom f and y ∈ dom g, taking 0 · ∞ = 0, as in §135.
proof (a) I show first that $\overline{\int} f \otimes g \le \overline{\int} f \, \overline{\int} g$. P If $\overline{\int} f = 0$, then f = 0 a.e., so f ⊗ g = 0 a.e. and the result is immediate.
The same argument applies if $\overline{\int} g = 0$. If both $\overline{\int} f$ and $\overline{\int} g$ are non-zero, and either is infinite, the result is trivial. So
let us suppose that both are finite. In this case there are integrable f0, g0 such that f ≤a.e. f0, g ≤a.e. g0, $\overline{\int} f = \int f_0$
and $\overline{\int} g = \int g_0$ (133Ja/135Ha). So f ⊗ g ≤a.e. f0 ⊗ g0, and
$\overline{\int} f \otimes g \le \int f_0 \otimes g_0 = \int f_0 \int g_0 = \overline{\int} f \, \overline{\int} g$,
by 253D. Q
(b) For the reverse inequality, we need consider only the case in which $\overline{\int} f \otimes g$ is finite, so that there is a λ-integrable
function h such that f ⊗ g ≤a.e. h and $\overline{\int} f \otimes g = \int h$. Set
f0(x) = ∫ h(x, y) ν(dy)
whenever this is defined in R, which is almost everywhere, by Fubini’s theorem (252B-252C). Then $f_0(x) \ge f(x) \overline{\int} g \, d\nu$
for every x ∈ dom f0 ∩ dom f, which is a conegligible set in X; so
$\overline{\int} f \otimes g = \int h \, d\lambda = \int f_0 \, d\mu \ge \overline{\int} f \, \overline{\int} g$,
as required.

*253K A similar argument applies to upper integrals of sums, as follows.


Proposition Let (X, Σ, µ) and (Y, T, ν) be probability spaces, with c.l.d. product measure λ. Then for any real-valued
functions f, g defined on conegligible subsets of X, Y respectively,
$\overline{\int} f(x) + g(y) \, \lambda(d(x, y)) = \overline{\int} f(x) \mu(dx) + \overline{\int} g(y) \nu(dy)$,
at least when the right-hand side is defined in [−∞, ∞].
proof Set h(x, y) = f(x) + g(y) for x ∈ dom f and y ∈ dom g, so that dom h is λ-conegligible.
(a) As in 253J, I start by showing that $\overline{\int} h \le \overline{\int} f + \overline{\int} g$. P If either $\overline{\int} f$ or $\overline{\int} g$ is ∞, this is trivial. Otherwise, take
integrable functions f0, g0 such that f ≤a.e. f0 and g ≤a.e. g0. Set h0 = (f0 ⊗ χY) + (χX ⊗ g0); then h ≤ h0 λ-a.e., so
$\overline{\int} h \, d\lambda \le \int h_0 \, d\lambda = \int f_0 \, d\mu + \int g_0 \, d\nu$.
As f0, g0 are arbitrary, $\overline{\int} h \le \overline{\int} f + \overline{\int} g$. Q
(b) For the reverse inequality, suppose that h ≤ h0 for λ-almost every (x, y), where h0 is λ-integrable. Set
f0(x) = ∫ h0(x, y) ν(dy) whenever this is defined in R. Then $f_0(x) \ge f(x) + \overline{\int} g \, d\nu$ whenever x ∈ dom f ∩ dom f0, so
$\int h_0 \, d\lambda = \int f_0 \, d\mu \ge \overline{\int} f \, d\mu + \overline{\int} g \, d\nu$.
As h0 is arbitrary, $\overline{\int} h \ge \overline{\int} f + \overline{\int} g$, as required.

253L Complex spaces As usual, the ideas of 253F and 253H apply essentially unchanged to complex L1 spaces.
Writing L¹_C(µ), etc., for the complex L1 spaces involved, we have the following results. Throughout, let (X, Σ, µ) and
(Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y.

(a) If f ∈ L⁰_C(µ) and g ∈ L⁰_C(ν) then f ⊗ g, defined by the formula (f ⊗ g)(x, y) = f(x)g(y) for x ∈ dom f and
y ∈ dom g, belongs to L⁰_C(λ).

(b) If f ∈ L¹_C(µ) and g ∈ L¹_C(ν) then f ⊗ g ∈ L¹_C(λ) and ∫ f ⊗ g dλ = ∫ f dµ ∫ g dν.

(c) We have a bilinear operator (u, v) ↦ u ⊗ v : L¹_C(µ) × L¹_C(ν) → L¹_C(λ) defined by writing f• ⊗ g• = (f ⊗ g)• for
all f ∈ L¹_C(µ), g ∈ L¹_C(ν).

(d) If W is any complex Banach space and φ : L¹_C(µ) × L¹_C(ν) → W is any bounded bilinear operator, then there is
a unique bounded linear operator T : L¹_C(λ) → W such that T(u ⊗ v) = φ(u, v) for every u ∈ L¹_C(µ) and v ∈ L¹_C(ν),
and ‖T‖ = ‖φ‖.

(e) If µ and ν are complete probability measures, and Λ1 = {E × Y : E ∈ Σ}, then for any f ∈ L¹_C(λ) we have a
conditional expectation g of f on Λ1 given by setting g(x, y) = ∫ f(x, z) ν(dz) whenever this is defined.

253X Basic exercises > (a) Let U , V and W be linear spaces. Show that the set of bilinear operators from U × V
to W has a natural linear structure agreeing with those of L(U ; L(V ; W )) and L(V ; L(U ; W )), writing L(U ; W ) for the
linear space of linear operators from U to W .

> (b) Let U, V and W be normed spaces. (i) Show that for a bilinear operator φ : U × V → W the following
are equiveridical: (α) φ is bounded in the sense of 253Ab; (β) φ is continuous; (γ) φ is continuous at some point of
U × V. (ii) Show that the space of bounded bilinear operators from U × V to W is a linear subspace of the space of all
bilinear operators from U × V to W, and that the functional ‖ ‖ defined in 253Ab is a norm, agreeing with the norms
of B(U; B(V; W)) and B(V; B(U; W)), writing B(U; W) for the normed space of bounded linear operators from U to
W.

(c) Let (X1, Σ1, µ1), ..., (Xn, Σn, µn) be measure spaces, and λ the c.l.d. product measure on X1 × ... × Xn, as
described in 251W. Let W be a Banach space, and suppose that φ : L1(µ1) × ... × L1(µn) → W is multilinear (that is,
linear in each variable separately) and bounded (that is, ‖φ‖ = sup{‖φ(u1, ..., un)‖ : ‖ui‖₁ ≤ 1 ∀ i ≤ n} < ∞). Show
that there is a unique bounded linear operator T : L1(λ) → W such that T⊗ = φ, where ⊗ : L1(µ1) × ... × L1(µn) →
L1(λ) is a canonical multilinear operator (to be defined).

(d) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X ×Y . Show that if A ⊆ L1 (µ)
and B ⊆ L1 (ν) are both uniformly integrable, then {u ⊗ v : u ∈ A, v ∈ B} is uniformly integrable in L1 (λ).

> (e) Let (X, Σ, µ) and (Y, T, ν) be measure spaces and λ the c.l.d. product measure on X × Y. Show that
(i) we have a bilinear operator (u, v) ↦ u ⊗ v : L0(µ) × L0(ν) → L0(λ) given by setting f• ⊗ g• = (f ⊗ g)• for all
f ∈ L0(µ) and g ∈ L0(ν);
(ii) if 1 ≤ p ≤ ∞ then u ⊗ v ∈ Lp(λ) and ‖u ⊗ v‖p = ‖u‖p‖v‖p for all u ∈ Lp(µ) and v ∈ Lp(ν);
(iii) if u, u′ ∈ L2(µ) and v, v′ ∈ L2(ν) then the inner product (u ⊗ v|u′ ⊗ v′), taken in L2(λ), is just (u|u′)(v|v′);
(iv) the map (u, v) ↦ u ⊗ v : L0(µ) × L0(ν) → L0(λ) is continuous if L0(µ), L0(ν) and L0(λ) are all given their
topologies of convergence in measure.

(f) In 253Xe, assume that µ and ν are semi-finite. Show that if u0, ..., un are linearly independent members of
L0(µ) and v0, ..., vn ∈ L0(ν) are not all 0, then $\sum_{i=0}^n u_i \otimes v_i \ne 0$ in L0(λ). (Hint: start by finding sets E ∈ Σ, F ∈ T
of finite measure such that u0 × χE•, ..., un × χE• are linearly independent and v0 × χF•, ..., vn × χF• are not all 0.)

(g) In 253Xe, assume that µ and ν are semi-finite. If U, V are linear subspaces of L0(µ) and L0(ν) respectively,
write U ⊗ V for the linear subspace of L0(λ) generated by {u ⊗ v : u ∈ U, v ∈ V}. Show that if W is any linear space
and φ : U × V → W is a bilinear operator, there is a unique linear operator T : U ⊗ V → W such that T(u ⊗ v) = φ(u, v)
for all u ∈ U, v ∈ V. (Hint: start by showing that if u0, ..., un ∈ U and v0, ..., vn ∈ V are such that $\sum_{i=0}^n u_i \otimes v_i = 0$,
then $\sum_{i=0}^n \phi(u_i, v_i) = 0$; do this by expressing the ui as linear combinations of some linearly independent family and
applying 253Xf.)

> (h) Let (X, Σ, µ) and (Y, T, ν) be complete probability spaces, with c.l.d. product measure λ. Suppose that
p ∈ [1, ∞] and that f ∈ Lp(λ). Set g(x) = ∫ f(x, y) ν(dy) whenever this is defined. Show that g ∈ Lp(µ) and that
‖g‖p ≤ ‖f‖p. (Hint: 253H, 244M.)

(i) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with c.l.d. product measure λ, and p ∈ [1, ∞[. Show that
{w : w ∈ Lp (λ), w ≥ 0} is the closed convex hull in Lp (λ) of {u ⊗ v : u ∈ Lp (µ), v ∈ Lp (ν), u ≥ 0, v ≥ 0} (see 253Xe(ii)
above).

253Y Further exercises (a) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ0 the primitive product measure
on X × Y . Show that if f ∈ L0 (µ) and g ∈ L0 (ν), then f ⊗ g ∈ L0 (λ0 ).

(b) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ0 the primitive product measure on X × Y. Show that if
f ∈ L1(µ) and g ∈ L1(ν), then f ⊗ g ∈ L1(λ0) and ∫ f ⊗ g dλ0 = ∫ f dµ ∫ g dν.

(c) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ0, λ the primitive and c.l.d. product measures on X × Y.
Show that the embedding L1(λ0) ↪ L1(λ) induces a Banach lattice isomorphism between L1(λ0) and L1(λ).

(d) Let (X, Σ, µ), (Y, T, ν) be strictly localizable measure spaces, with c.l.d. product measure λ. Show that L∞ (λ)
can be identified with L1 (λ)∗ . Show that under this identification {w : w ∈ L∞ (λ), w ≥ 0} is the weak*-closed convex
hull of {u ⊗ v : u ∈ L∞ (µ), v ∈ L∞ (ν), u ≥ 0, v ≥ 0}.

(e) Find a version of 253J valid when one of µ, ν is not σ-finite.

(f) Let (X, Σ, µ) be any measure space and V any Banach space. Write ℒ¹_V = ℒ¹_V(µ) for the set of functions f such
that (α) dom f is a conegligible subset of X (β) f takes values in V (γ) there is a conegligible set D ⊆ dom f such
that f[D] is separable and D ∩ f⁻¹[G] ∈ Σ for every open set G ⊆ V (δ) the integral ∫ ‖f(x)‖ µ(dx) is finite. (These
are the Bochner integrable functions from X to V.) For f, g ∈ ℒ¹_V write f ∼ g if f = g µ-a.e.; let L¹_V be the set of
equivalence classes in ℒ¹_V under ∼. Show that
(i) f + g, cf ∈ ℒ¹_V for all f, g ∈ ℒ¹_V, c ∈ R;
(ii) L¹_V has a natural linear space structure, defined by writing f• + g• = (f + g)•, cf• = (cf)• for f, g ∈ ℒ¹_V and
c ∈ R;
(iii) L¹_V has a norm ‖ ‖, defined by writing ‖f•‖ = ∫ ‖f(x)‖ µ(dx) for f ∈ ℒ¹_V;
(iv) L¹_V is a Banach space under this norm;
(v) there is a natural map ⊗ : ℒ¹ × V → ℒ¹_V defined by writing (f ⊗ v)(x) = f(x)v when f ∈ ℒ¹ = ℒ¹_R(µ), v ∈ V
and x ∈ dom f;
(vi) there is a canonical bilinear operator ⊗ : L¹ × V → L¹_V defined by writing f• ⊗ v = (f ⊗ v)• for f ∈ ℒ¹ and
v ∈ V;
(vii) whenever W is a Banach space and φ : L¹ × V → W is a bounded bilinear operator, there is a unique bounded
linear operator T : L¹_V → W such that T(u ⊗ v) = φ(u, v) for all u ∈ L¹ and v ∈ V, and ‖T‖ = ‖φ‖. (When W = V
and φ(u, v) = (∫ u)v for u ∈ L¹ and v ∈ V, Tf• is called the Bochner integral of f.)

(g) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ0 the primitive product measure on X × Y. If f is a
λ0-integrable function, write fx(y) = f(x, y) whenever this is defined. Show that we have a map x ↦ fx• from a
conegligible subset D0 of X to L1(ν). Show that this map is a Bochner integrable function, as defined in 253Yf, and
that its Bochner integral is ∫ f dλ0.

(h) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and suppose that φ is a function from X to a separable subset of
L1 (ν) which is measurable in the sense that φ−1 [G] ∈ Σ for every open G ⊆ L1 (ν). Show that there is a Λ-measurable
function f from X × Y to R, where Λ is the domain of the c.l.d. product measure on X × Y , such that φ(x) = fx• for
every x ∈ X, writing fx (y) = f (x, y) for x ∈ X, y ∈ Y .

(i) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y. Show that 253Yg
provides a canonical identification between L1(λ) and $L^1_{L^1(\nu)}(\mu)$.

(j) Let (X, Σ, µ) and (Y, T, ν) be complete locally determined measure spaces, with c.l.d. product measure λ. (i)
Suppose that K ∈ L2(λ), f ∈ L2(µ). Show that h(y) = ∫ K(x, y)f(x)dx is defined for almost all y ∈ Y and that
h ∈ L2(ν). (Hint: to see that h is defined a.e., consider $\int_{E \times F} K(x, y)f(x)\,d(x, y)$ for µE, νF < ∞; to see that
h ∈ L2 consider ∫ h × g where g ∈ L2(ν).) (ii) Show that the map f ↦ h corresponds to a bounded linear operator
TK : L2(µ) → L2(ν). (iii) Show that the map K ↦ TK corresponds to a bounded linear operator, of norm at most 1,
from L2(λ) to B(L2(µ); L2(ν)).

(k) Suppose that p, q ∈ [1, ∞] and that 1/p + 1/q = 1, interpreting 1/∞ as 0 as usual. Let (X, Σ, µ), (Y, T, ν) be complete
locally determined measure spaces with c.l.d. product measure λ. Show that the ideas of 253Yj can be used to define
a bounded linear operator, of norm at most 1, from Lp(λ) to B(Lq(µ); Lp(ν)).

(l) In 253Xc, suppose that W is a Banach lattice. Show that the following are equiveridical: (i) Tu ≥ 0 whenever
u ∈ L1(λ)⁺; (ii) φ(u1, ..., un) ≥ 0 whenever ui ≥ 0 in L1(µi) for each i ≤ n.

253 Notes and comments Throughout the main arguments of this section, I have written the results in terms of
the c.l.d. product measure; of course the isomorphism noted in 253Yc means that they could just as well have been
expressed in terms of the primitive product measure. The more restricted notion of integrability with respect to the
primitive product measure is indeed the one appropriate for the ideas of 253Yg.

Theorem 253F is a ‘universal mapping theorem’; it asserts that every bounded bilinear operator on L1 (µ) × L1 (ν)
factors through ⊗ : L1 (µ) × L1 (ν) → L1 (λ), at least if the range space is a Banach space. It is easy to see that
this property defines the pair (L1 (λ), ⊗) up to Banach space isomorphism, in the following sense: if V is a Banach
space, and ψ : L1 (µ) × L1 (ν) → V is a bounded bilinear operator such that for every bounded bilinear operator φ
from L1 (µ) × L1 (ν) to any Banach space W there is a unique bounded linear operator T : V → W such that T ψ = φ
and $\|T\|=\|\phi\|$, then there is an isometric Banach space isomorphism $S:L^1(\lambda)\to V$ such that $S\otimes=\psi$. There is
of course a general theory of bilinear operators between Banach spaces; in the language of this theory, L1 (λ) is, or is
isomorphic to, the ‘projective tensor product’ of L1 (µ) and L1 (ν). For an introduction to this subject, see Defant &
Floret 93, §I.3, or Semadeni 71, §20. I should perhaps emphasise, for the sake of those who have not encountered
tensor products before, that this theorem is special to L1 spaces. While some of the same ideas can be applied to other
function spaces (see 253Xe-253Xg), there is no other class to which 253F applies.
There is also a theory of tensor products of Banach lattices, for which I do not think we are quite ready (it needs
general ideas about ordered linear spaces for which I mean to wait until Chapter 35 in the next volume). However
253G shows that the ordering, and therefore the Banach lattice structure, of L1 (λ) is determined by the ordering of
L1 (µ) and L1 (ν) and the map ⊗ : L1 (µ) × L1 (ν) → L1 (λ).
The conditional expectation operators described in 253H are of very great importance, largely because in this special context we have a realization of the conditional expectation operator as a function $P_0$ from the space $\mathcal L^1(\lambda)$ of integrable functions to $\mathcal L^1(\lambda\restriction\Lambda_1)$, not just as a function from the space $L^1(\lambda)$ of equivalence classes to $L^1(\lambda\restriction\Lambda_1)$, as in 242J. As described here, $P_0(f+f')$ need not be equal, in the strict sense, to $P_0f+P_0f'$; it can have a larger domain. In applications, however, one might be willing to restrict attention to the linear space $U$ of bounded $\Sigma\widehat{\otimes}\mathrm T$-measurable functions defined everywhere on $X\times Y$, so that $P_0$ becomes an operator from $U$ to itself (see 252P).

254 Infinite products


I come now to the second basic idea of this chapter: the description of a product measure on the product of a (possibly
large) family of probability spaces. The section begins with a construction on similar lines to that of §251 (254A-254F)
and its defining property in terms of inverse-measure-preserving functions (254G). I discuss the usual measure on $\{0,1\}^I$
(254J-254K), subspace measures (254L) and various properties of subproducts (254M-254T), including a study of the
associated conditional expectation operators (254R-254T).
254A Definitions (a) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces. Set $X=\prod_{i\in I}X_i$, the family of functions $x$ with domain $I$ such that $x(i)\in X_i$ for every $i\in I$. In this context, I will say that a measurable cylinder is a subset of $X$ expressible in the form
$C=\prod_{i\in I}C_i$,
where $C_i\in\Sigma_i$ for every $i\in I$ and $\{i:C_i\ne X_i\}$ is finite. Note that for a non-empty $C\subseteq X$ this expression is unique. PP Suppose that $C=\prod_{i\in I}C_i=\prod_{i\in I}C_i'$. For each $i\in I$ set
$D_i=\{x(i):x\in C\}$.
Of course $D_i\subseteq C_i$. Because $C\ne\emptyset$, we can fix on some $z\in C$. If $i\in I$ and $\xi\in C_i$, consider $x\in X$ defined by setting
$x(i)=\xi$, $x(j)=z(j)$ for $j\ne i$;
then $x\in C$ so $\xi=x(i)\in D_i$. Thus $D_i=C_i$ for $i\in I$. Similarly, $D_i=C_i'$. QQ

(b) We can therefore define a functional $\theta_0:\mathcal C\to[0,1]$, where $\mathcal C$ is the set of measurable cylinders, by setting
$\theta_0C=\prod_{i\in I}\mu_iC_i$
whenever $C_i\in\Sigma_i$ for every $i\in I$ and $\{i:C_i\ne X_i\}$ is finite, noting that only finitely many terms in the product can differ from 1, so that it can safely be treated as a finite product. If $C=\emptyset$, one of the $C_i$ must be empty, so $\theta_0C$ is surely 0, even though the expression of $C$ as $\prod_{i\in I}C_i$ is no longer unique.

(c) Now define $\theta:\mathcal PX\to[0,1]$ by setting
$\theta A=\inf\{\sum_{n=0}^{\infty}\theta_0C_n:C_n\in\mathcal C$ for every $n\in\mathbb N$, $A\subseteq\bigcup_{n\in\mathbb N}C_n\}$.
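The functional $\theta_0$ of 254Ab can be tried out on a toy case. The following is only a sketch: it assumes three finite factor spaces (the dictionary `spaces` and its weights are invented for illustration) and represents a measurable cylinder by listing just the coordinates $i$ with $C_i\ne X_i$.

```python
from fractions import Fraction
from math import prod

# Hypothetical finite model: each factor X_i is a finite probability space,
# given as a dict mapping points to their probabilities.
spaces = {
    "a": {0: Fraction(1, 2), 1: Fraction(1, 2)},
    "b": {0: Fraction(1, 3), 1: Fraction(2, 3)},
    "c": {"x": Fraction(1, 4), "y": Fraction(3, 4)},
}

def mu(i, C_i):
    """Measure of a subset C_i of the factor X_i."""
    return sum(spaces[i][t] for t in C_i)

def theta0(cylinder):
    """theta_0 of the cylinder prod_i C_i.

    `cylinder` maps only the restricted coordinates i to C_i; every
    coordinate not mentioned has C_i = X_i and contributes a factor 1.
    """
    return prod((mu(i, C_i) for i, C_i in cylinder.items()), start=Fraction(1))

whole_space = theta0({})                      # C = X
one_coord = theta0({"b": {1}})                # {x : x(b) = 1}
two_coords = theta0({"a": {0}, "c": {"y"}})   # {x : x(a) = 0, x(c) = y}
empty = theta0({"a": set()})                  # some C_i is empty, so C is empty
```

Storing only the restricted coordinates mirrors the remark that the product has finitely many terms different from 1.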

254B Lemma The functional θ defined in 254Ac is always an outer measure on X.


proof Use exactly the same arguments as those in 251B above.

254C Definition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be any indexed family of probability spaces, and $X$ the Cartesian product $\prod_{i\in I}X_i$. The product measure on $X$ is the measure defined by Carathéodory’s method (113C) from the outer measure θ defined in 254A.
254D Remarks (a) In 254Ab, I asserted that if $C\in\mathcal C$ and no $C_i$ is empty, then nor is $C=\prod_{i\in I}C_i$. This is
the ‘Axiom of Choice’: the product of any family hCi ii∈I of non-empty sets is non-empty, that is, there is a ‘choice
function’ x with domain I picking out a distinguished member x(i) of each Ci . In this volume I have not attempted
to be scrupulous in indicating uses of the axiom of choice. In fact the use here is not an absolutely vital one; I mean,
the theory of infinite products, even uncountable products, of probability spaces does not change character completely
in the absence of the full axiom of choice (provided, that is, that we allow ourselves to use the countable axiom of choice). The point is that all we really need, in the present context, is that $X=\prod_{i\in I}X_i$ should be non-empty; and
in many contexts we can prove this, for the particular cases of interest, without using the axiom of choice, by actually
exhibiting a member of X. The simplest case in which this is difficult is when the Xi are uncontrolled Borel subsets
of [0, 1]; and even then, if they are presented with coherent descriptions, we may, with appropriate labour, be able to
construct a member of X. But clearly such a process is liable to slow us down a good deal, and for the moment I think
there is no great virtue in taking so much trouble.

(b) I have given this section the title ‘infinite products’, but it is useful to be able to apply the ideas to finite I; I
should mention in particular the cases #(I) ≤ 2.
(i) If I = ∅, X consists of the unique function with domain I, the empty function. If we identify a function with
its graph, then X is actually {∅}; in any case, X is to be a singleton set, with λX = 1.
(ii) If $I$ is a singleton $\{i\}$, then we can identify $X$ with $X_i$; $\mathcal C$ becomes identified with $\Sigma_i$ and $\theta_0$ with $\mu_i$, so that θ can be identified with $\mu_i^*$ and the ‘product measure’ becomes the measure on $X_i$ defined from $\mu_i^*$, that is, the completion of $\mu_i$ (213Xa(iv)).
(iii) If I is a doubleton {i, j}, then we can identify X with Xi × Xj ; in this case the definitions of 254A and 254C
match exactly with those of 251A and 251C, so that λ here can be identified with the primitive product measure as
defined in 251C. Because µi and µj are both totally finite, this agrees with the c.l.d. product measure of 251F.
254E Definition Let $\langle X_i\rangle_{i\in I}$ be any family of sets, and $X=\prod_{i\in I}X_i$. If $\Sigma_i$ is a σ-subalgebra of subsets of $X_i$ for each $i\in I$, I write $\widehat{\bigotimes}_{i\in I}\Sigma_i$ for the σ-algebra of subsets of $X$ generated by
$\{\{x:x\in X,\ x(i)\in E\}:i\in I,\ E\in\Sigma_i\}$.
(Compare 251D.)

254F Theorem Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and let λ be the product measure on $X=\prod_{i\in I}X_i$ defined as in 254C; let Λ be its domain.
(a) $\lambda X=1$.
(b) If $E_i\in\Sigma_i$ for every $i\in I$, and $\{i:E_i\ne X_i\}$ is countable, then $\prod_{i\in I}E_i\in\Lambda$, and $\lambda(\prod_{i\in I}E_i)=\prod_{i\in I}\mu_iE_i$. In particular, $\lambda C=\theta_0C$ for every measurable cylinder $C$, as defined in 254A, and if $j\in I$ then $x\mapsto x(j):X\to X_j$ is inverse-measure-preserving.
(c) $\widehat{\bigotimes}_{i\in I}\Sigma_i\subseteq\Lambda$.
(d) λ is complete.
(e) For every $W\in\Lambda$ and $\epsilon>0$ there is a finite family $C_0,\ldots,C_n$ of measurable cylinders such that $\lambda(W\triangle\bigcup_{k\le n}C_k)\le\epsilon$.
(f) For every $W\in\Lambda$ there are $W_1$, $W_2\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$.

Remark Perhaps I should pause to interpret the product $\prod_{i\in I}\mu_iE_i$. Because all the $\mu_iE_i$ belong to $[0,1]$, this is simply $\inf_{J\subseteq I,\,J\text{ is finite}}\prod_{i\in J}\mu_iE_i$, taking the empty product to be 1.
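To make the interpretation of the infinite product concrete, here is a small sketch. The values $\mu_iE_i=1-1/(i+2)^2$ are an assumed example, not taken from the text; because every factor is at most 1, the running products along an enumeration never increase, so their limit is the infimum over finite sets.

```python
from fractions import Fraction

def partial_products(values):
    """Running products of a finite list of numbers in [0, 1]."""
    out, acc = [], Fraction(1)   # the empty product is 1
    for v in values:
        acc *= v
        out.append(acc)
    return out

# Assumed example: mu_i E_i = 1 - 1/(i+2)^2 for i = 0, 1, 2, ...
values = [1 - Fraction(1, (i + 2) ** 2) for i in range(50)]
products = partial_products(values)

# Each factor is at most 1, so the running products never increase;
# the infimum over finite J is therefore the limit of these products.
non_increasing = all(a >= b for a, b in zip(products, products[1:]))
last = products[-1]   # telescopes to (N + 1) / (2 N) with N = 51
```

In this example the running products telescope to $(N+1)/(2N)$, so the infinite product is $\frac12$.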
proof Throughout this proof, define C, θ0 and θ as in 254A. I will write out an argument which applies to finite I as
well as infinite I, but you may reasonably prefer to assume that I is infinite on first reading.
(a) Of course $\lambda X=\theta X$, so I have to show that $\theta X=1$. Because $X$, $\emptyset\in\mathcal C$ and $\theta_0X=\prod_{i\in I}\mu_iX_i=1$ and $\theta_0\emptyset=0$,
$\theta X\le\theta_0X+\theta_0\emptyset+\ldots=1$.
I therefore have to show that $\theta X\ge1$. ?? Suppose, if possible, otherwise.

(i) There is a sequence $\langle C_n\rangle_{n\in\mathbb N}$ in $\mathcal C$, covering $X$, such that $\sum_{n=0}^{\infty}\theta_0C_n<1$. For each $n\in\mathbb N$, express $C_n$ as $\{x:x(i)\in E_{ni}\ \forall\ i\in I\}$, where every $E_{ni}\in\Sigma_i$ and $J_n=\{i:E_{ni}\ne X_i\}$ is finite. No $J_n$ can be empty, because $\theta_0C_n<1=\theta_0X$; set $J=\bigcup_{n\in\mathbb N}J_n$. Then $J$ is a countable non-empty subset of $I$. Set $K=\mathbb N$ if $J$ is infinite, $\{k:0\le k<\#(J)\}$ if $J$ is finite; let $k\mapsto i_k:K\to J$ be a bijection.
For each $k\in K$, set $L_k=\{i_j:j<k\}\subseteq J$, and set $\alpha_{nk}=\prod_{i\in I\setminus L_k}\mu_iE_{ni}$ for $n\in\mathbb N$, $k\in K$. If $J$ is finite, then we can identify $L_{\#(J)}$ with $J$, and set $\alpha_{n,\#(J)}=1$ for every $n$. We have $\alpha_{n0}=\theta_0C_n$ for each $n$, so $\sum_{n=0}^{\infty}\alpha_{n0}<1$. For $n\in\mathbb N$, $k\in K$ and $t\in X_{i_k}$ set
$f_{nk}(t)=\alpha_{n,k+1}$ if $t\in E_{n,i_k}$,
$f_{nk}(t)=0$ otherwise.
Then
$\int f_{nk}\,d\mu_{i_k}=\alpha_{n,k+1}\,\mu_{i_k}E_{n,i_k}=\alpha_{nk}$.
(ii) Choose $t_k\in X_{i_k}$ inductively, for $k\in K$, as follows. The inductive hypothesis will be that $\sum_{n\in M_k}\alpha_{nk}<1$, where $M_k=\{n:n\in\mathbb N,\ t_j\in E_{n,i_j}\ \forall\ j<k\}$; of course $M_0=\mathbb N$, so the induction starts. Given that
$1>\sum_{n\in M_k}\alpha_{nk}=\sum_{n\in M_k}\int f_{nk}\,d\mu_{i_k}=\int(\sum_{n\in M_k}f_{nk})\,d\mu_{i_k}$
(by B.Levi’s theorem), there must be a $t_k\in X_{i_k}$ such that $\sum_{n\in M_k}f_{nk}(t_k)<1$. Now for such a choice of $t_k$, $\alpha_{n,k+1}=f_{nk}(t_k)$ for every $n\in M_{k+1}$, so that $\sum_{n\in M_{k+1}}\alpha_{n,k+1}<1$, and the induction continues, unless $J$ is finite and $k+1=\#(J)$. In this last case we must just have $M_{\#(J)}=\emptyset$, because $\alpha_{n,\#(J)}=1$ for every $n$.
(iii) If $J$ is infinite, we obtain a full sequence $\langle t_k\rangle_{k\in\mathbb N}$; if $J$ is finite, we obtain just a finite sequence $\langle t_k\rangle_{k<\#(J)}$. In either case, there is an $x\in X$ such that $x(i_k)=t_k$ for each $k\in K$. Now there must be some $m\in\mathbb N$ such that $x\in C_m$. Because $J_m=\{i:E_{mi}\ne X_i\}$ is finite, there is a $k\in\mathbb N$ such that $J_m\subseteq L_k$ (allowing $k=\#(J)$ if $J$ is finite). Now $m\in M_k$, so in fact we cannot have $k=\#(J)$, and $\alpha_{mk}=1$, so $\sum_{n\in M_k}\alpha_{nk}\ge1$, contrary to the inductive hypothesis. XX
This contradiction shows that $\theta X=1$.
(b)(i) I take the particular case first. Let $j\in I$ and $E\in\Sigma_j$, and let $C\in\mathcal C$; set $W=\{x:x\in X,\ x(j)\in E\}$; then $C\cap W$ and $C\setminus W$ both belong to $\mathcal C$, and $\theta_0C=\theta_0(C\cap W)+\theta_0(C\setminus W)$. PP If $C=\prod_{i\in I}C_i$, where $C_i\in\Sigma_i$ for each $i$, then $C\cap W=\prod_{i\in I}C_i'$, where $C_i'=C_i$ if $i\ne j$, and $C_j'=C_j\cap E$; similarly, $C\setminus W=\prod_{i\in I}C_i''$, where $C_i''=C_i$ if $i\ne j$, and $C_j''=C_j\setminus E$. So both belong to $\mathcal C$, and
$\theta_0(C\cap W)+\theta_0(C\setminus W)=(\mu_j(C_j\cap E)+\mu_j(C_j\setminus E))\prod_{i\ne j}\mu_iC_i=\prod_{i\in I}\mu_iC_i=\theta_0C$. QQ
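(ii) Now suppose that $A\subseteq X$ is any set, and $\epsilon>0$. Then there is a sequence $\langle C_n\rangle_{n\in\mathbb N}$ in $\mathcal C$ such that $A\subseteq\bigcup_{n\in\mathbb N}C_n$ and $\sum_{n=0}^{\infty}\theta_0C_n\le\theta A+\epsilon$. In this case
$A\cap W\subseteq\bigcup_{n\in\mathbb N}(C_n\cap W)$, $A\setminus W\subseteq\bigcup_{n\in\mathbb N}(C_n\setminus W)$,
so
$\theta(A\cap W)\le\sum_{n=0}^{\infty}\theta_0(C_n\cap W)$, $\theta(A\setminus W)\le\sum_{n=0}^{\infty}\theta_0(C_n\setminus W)$,
and
$\theta(A\cap W)+\theta(A\setminus W)\le\sum_{n=0}^{\infty}(\theta_0(C_n\cap W)+\theta_0(C_n\setminus W))=\sum_{n=0}^{\infty}\theta_0C_n\le\theta A+\epsilon$.
As $\epsilon$ is arbitrary, $\theta(A\cap W)+\theta(A\setminus W)\le\theta A$; as $A$ is arbitrary, $W\in\Lambda$.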

(iii) I show next that if $J\subseteq I$ is finite and $C_i\in\Sigma_i$ for each $i\in J$, and $C=\{x:x\in X,\ x(i)\in C_i\ \forall\ i\in J\}$, then $C\in\Lambda$ and $\lambda C=\prod_{i\in J}\mu_iC_i$. PP Induce on $\#(J)$. If $\#(J)=0$, that is, $J=\emptyset$, then $C=X$ and this is part (a). For the inductive step to $\#(J)=n+1$, take any $j\in J$ and set $J'=J\setminus\{j\}$,
$C'=\{x:x\in X,\ x(i)\in C_i\ \forall\ i\in J'\}$,
$C''=C'\setminus C=\{x:x\in C',\ x(j)\in X_j\setminus C_j\}$.
Then $C$, $C'$, $C''$ all belong to $\mathcal C$, and $\theta_0C'=\prod_{i\in J'}\mu_iC_i=\alpha$ say, $\theta_0C=\alpha\mu_jC_j$, $\theta_0C''=\alpha(1-\mu_jC_j)$. Moreover, by the inductive hypothesis, $C'\in\Lambda$ and $\alpha=\lambda C'=\theta C'$. So $C=C'\cap\{x:x(j)\in C_j\}\in\Lambda$ by (ii), and $C''=C'\setminus C\in\Lambda$. We surely have $\lambda C=\theta C\le\theta_0C$, $\lambda C''\le\theta_0C''$; but also
$\alpha=\lambda C'=\lambda C+\lambda C''\le\theta_0C+\theta_0C''=\alpha$,

so in fact
$\lambda C=\theta_0C=\alpha\mu_jC_j=\prod_{i\in J}\mu_iC_i$,
and the induction proceeds. QQ
(iv) Now let us return to the general case of a set $W$ of the form $\prod_{i\in I}E_i$ where $E_i\in\Sigma_i$ for each $i$, and $K=\{i:E_i\ne X_i\}$ is countable. If $K$ is finite then $W=\{x:x(i)\in E_i\ \forall\ i\in K\}$ so $W\in\Lambda$ and
$\lambda W=\prod_{i\in K}\mu_iE_i=\prod_{i\in I}\mu_iE_i$.
Otherwise, let $\langle i_n\rangle_{n\in\mathbb N}$ be an enumeration of $K$. For each $n\in\mathbb N$ set $W_n=\{x:x\in X,\ x(i_k)\in E_{i_k}\ \forall\ k\le n\}$; then we know that $W_n\in\Lambda$ and that $\lambda W_n=\prod_{k=0}^{n}\mu_{i_k}E_{i_k}$. But $\langle W_n\rangle_{n\in\mathbb N}$ is a non-increasing sequence with intersection $W$, so $W\in\Lambda$ and
$\lambda W=\lim_{n\to\infty}\lambda W_n=\prod_{i\in K}\mu_iE_i=\prod_{i\in I}\mu_iE_i$.
(c) is an immediate consequence of (b) and the definition of $\widehat{\bigotimes}_{i\in I}\Sigma_i$.
(d) Because λ is constructed by Carathéodory’s method it must be complete.
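(e) Let $\langle C_n\rangle_{n\in\mathbb N}$ be a sequence in $\mathcal C$ such that $W\subseteq\bigcup_{n\in\mathbb N}C_n$ and $\sum_{n=0}^{\infty}\theta_0C_n\le\theta W+\frac12\epsilon$. Set $V=\bigcup_{n\in\mathbb N}C_n$; by (b), $V\in\Lambda$. Let $n\in\mathbb N$ be such that $\sum_{i=n+1}^{\infty}\theta_0C_i\le\frac12\epsilon$, and consider $W'=\bigcup_{k\le n}C_k$. Since $V\setminus W'\subseteq\bigcup_{i>n}C_i$,
$\lambda(W\triangle W')\le\lambda(V\setminus W')+\lambda(V\setminus W)=\lambda V-\lambda W+\lambda(V\setminus W')=\theta V-\theta W+\theta(V\setminus W')$
$\le\sum_{i=0}^{\infty}\theta_0C_i-\theta W+\sum_{i=n+1}^{\infty}\theta_0C_i\le\frac12\epsilon+\frac12\epsilon=\epsilon$.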

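(f)(i) If $W\in\Lambda$ and $\epsilon>0$ there is a $V\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W\subseteq V$ and $\lambda V\le\lambda W+\epsilon$. PP Let $\langle C_n\rangle_{n\in\mathbb N}$ be a sequence in $\mathcal C$ such that $W\subseteq\bigcup_{n\in\mathbb N}C_n$ and $\sum_{n=0}^{\infty}\theta_0C_n\le\theta W+\epsilon$. Then $C_n\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ for each $n$, so $V=\bigcup_{n\in\mathbb N}C_n\in\widehat{\bigotimes}_{i\in I}\Sigma_i$. Now $W\subseteq V$, and
$\lambda V=\theta V\le\sum_{n=0}^{\infty}\theta_0C_n\le\theta W+\epsilon=\lambda W+\epsilon$. QQ
(ii) Now, given $W\in\Lambda$, let $\langle V_n\rangle_{n\in\mathbb N}$ be a sequence of sets in $\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W\subseteq V_n$ and $\lambda V_n\le\lambda W+2^{-n}$ for each $n$; then $W_2=\bigcap_{n\in\mathbb N}V_n$ belongs to $\widehat{\bigotimes}_{i\in I}\Sigma_i$ and $W\subseteq W_2$ and $\lambda W_2=\lambda W$. Similarly, there is a $W_2'\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $X\setminus W\subseteq W_2'$ and $\lambda W_2'=\lambda(X\setminus W)$, so we may take $W_1=X\setminus W_2'$ to complete the proof.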

254G The following is a fundamental, indeed defining, property of product measures. (Compare 251L.)
Lemma Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X,\Lambda,\lambda)$. Let $(Y,\mathrm T,\nu)$ be a complete probability space and $\phi:Y\to X$ a function. Suppose that $\nu^*\phi^{-1}[C]\le\lambda C$ for every measurable cylinder $C\subseteq X$. Then φ is inverse-measure-preserving. In particular, φ is inverse-measure-preserving iff $\phi^{-1}[C]\in\mathrm T$ and $\nu\phi^{-1}[C]=\lambda C$ for every measurable cylinder $C\subseteq X$.
Remark By $\nu^*$ I mean the usual outer measure defined from ν as in §132.
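proof (a) First note that, writing θ for the outer measure of 254A, $\nu^*\phi^{-1}[A]\le\theta A$ for every $A\subseteq X$. PP Given $\epsilon>0$, there is a sequence $\langle C_n\rangle_{n\in\mathbb N}$ of measurable cylinders such that $A\subseteq\bigcup_{n\in\mathbb N}C_n$ and $\sum_{n=0}^{\infty}\theta_0C_n\le\theta A+\epsilon$, where $\theta_0$ is the functional of 254A. But we know that $\theta_0C=\lambda C$ for every measurable cylinder $C$ (254Fb), so
$\nu^*\phi^{-1}[A]\le\nu^*(\bigcup_{n\in\mathbb N}\phi^{-1}[C_n])\le\sum_{n=0}^{\infty}\nu^*\phi^{-1}[C_n]\le\sum_{n=0}^{\infty}\lambda C_n\le\theta A+\epsilon$.
As $\epsilon$ is arbitrary, $\nu^*\phi^{-1}[A]\le\theta A$. QQ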
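(b) Now take any $W\in\Lambda$. Then there are $F$, $F'\in\mathrm T$ such that
$\phi^{-1}[W]\subseteq F$, $\phi^{-1}[X\setminus W]\subseteq F'$,
$\nu F=\nu^*\phi^{-1}[W]\le\theta W=\lambda W$, $\nu F'\le\lambda(X\setminus W)$.
We have
$F\cup F'\supseteq\phi^{-1}[W]\cup\phi^{-1}[X\setminus W]=Y$,
so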

$\nu(F\cap F')=\nu F+\nu F'-\nu(F\cup F')\le\lambda W+\lambda(X\setminus W)-1=0$.
Now
$F\setminus\phi^{-1}[W]\subseteq F\cap\phi^{-1}[X\setminus W]\subseteq F\cap F'$
is ν-negligible. Because ν is complete, $F\setminus\phi^{-1}[W]\in\mathrm T$ and $\phi^{-1}[W]=F\setminus(F\setminus\phi^{-1}[W])$ belongs to $\mathrm T$. Moreover,
$1=\nu F+\nu F'\le\lambda W+\lambda(X\setminus W)=1$,
so we must have $\nu F=\lambda W$; but this means that $\nu\phi^{-1}[W]=\lambda W$. As $W$ is arbitrary, φ is inverse-measure-preserving.

254H Corollary Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ and $\langle(Y_i,\mathrm T_i,\nu_i)\rangle_{i\in I}$ be two families of probability spaces, with products $(X,\Lambda,\lambda)$ and $(Y,\Lambda',\lambda')$. Suppose that for each $i\in I$ we are given an inverse-measure-preserving function $\phi_i:X_i\to Y_i$. Set $\phi(x)=\langle\phi_i(x(i))\rangle_{i\in I}$ for $x\in X$. Then $\phi:X\to Y$ is inverse-measure-preserving.

proof If $C=\prod_{i\in I}C_i$ is a measurable cylinder in $Y$, then $\phi^{-1}[C]=\prod_{i\in I}\phi_i^{-1}[C_i]$ is a measurable cylinder in $X$, and
$\lambda\phi^{-1}[C]=\prod_{i\in I}\mu_i\phi_i^{-1}[C_i]=\prod_{i\in I}\nu_iC_i=\lambda'C$.
Since λ is a complete probability measure, 254G tells us that φ is inverse-measure-preserving.

254I Corresponding to 251T we have the following.

Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, λ the product measure on $X=\prod_{i\in I}X_i$, and Λ its domain. Then λ is also the product of the completions $\hat\mu_i$ of the $\mu_i$ (212C).

proof Write $\hat\lambda$ for the product of the $\hat\mu_i$, and $\hat\Lambda$ for its domain. (i) The identity map from $X_i$ to itself is inverse-measure-preserving if regarded as a map from $(X_i,\hat\mu_i)$ to $(X_i,\mu_i)$, so the identity map on $X$ is inverse-measure-preserving if regarded as a map from $(X,\hat\lambda)$ to $(X,\lambda)$, by 254H; that is, $\Lambda\subseteq\hat\Lambda$ and $\lambda=\hat\lambda\restriction\Lambda$. (ii) If $C$ is a measurable cylinder for $\langle\hat\mu_i\rangle_{i\in I}$, that is, $C=\prod_{i\in I}C_i$ where $C_i\in\hat\Sigma_i$ for every $i$ and $\{i:C_i\ne X_i\}$ is finite, then for each $i\in I$ we can find a $C_i'\in\Sigma_i$ such that $C_i\subseteq C_i'$ and $\mu_iC_i'=\hat\mu_iC_i$; setting $C'=\prod_{i\in I}C_i'$, we get
$\lambda^*C\le\lambda C'=\prod_{i\in I}\mu_iC_i'=\prod_{i\in I}\hat\mu_iC_i=\hat\lambda C$.
By 254G, $\lambda W$ must be defined and equal to $\hat\lambda W$ whenever $W\in\hat\Lambda$. Putting this together with (i), we see that $\lambda=\hat\lambda$.

254J The product measure on $\{0,1\}^I$ (a) Perhaps the most important of all examples of infinite product measures is the case in which each factor $X_i$ is just $\{0,1\}$ and each $\mu_i$ is the ‘fair-coin’ probability measure, setting
$\mu_i\{0\}=\mu_i\{1\}=\frac12$.
In this case, the product $X=\{0,1\}^I$ has a family $\langle E_i\rangle_{i\in I}$ of measurable sets such that, writing λ for the product measure on $X$,
$\lambda(\bigcap_{i\in J}E_i)=2^{-\#(J)}$ if $J\subseteq I$ is finite.
(Just take $E_i=\{x:x(i)=1\}$ for each $i$.) I will call this λ the usual measure on $\{0,1\}^I$. Observe that if $I$ is finite then $\lambda\{x\}=2^{-\#(I)}$ for each $x\in X$ (using 254Fb). On the other hand, if $I$ is infinite, then $\lambda\{x\}=0$ for every $x\in X$ (because, again using 254Fb, $\lambda^*\{x\}\le2^{-n}$ for every $n$).
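For a finite index set the formula for the measure of an intersection of the sets $E_i$ can be checked by brute force. This is a sketch only, with an assumed four-element $I$ (the labels are arbitrary); it uses the point masses $2^{-\#(I)}$ noted above.

```python
from fractions import Fraction
from itertools import product

I = ["p", "q", "r", "s"]   # an assumed finite index set
points = [dict(zip(I, bits)) for bits in product((0, 1), repeat=len(I))]
weight = Fraction(1, 2 ** len(I))   # lambda{x} = 2^{-#(I)} for finite I

def lam(W):
    """Measure of the set of points x for which the predicate W(x) holds."""
    return sum(weight for x in points if W(x))

def meet(J):
    """Predicate for the intersection of E_i = {x : x(i) = 1} over i in J."""
    return lambda x: all(x[i] == 1 for i in J)

check = {tuple(J): lam(meet(J)) for J in ([], ["p"], ["p", "r"], ["q", "r", "s"])}
```

Each value in `check` should come out as $2^{-\#(J)}$, with the empty intersection being the whole space.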

(b) There is a natural bijection between $\{0,1\}^I$ and $\mathcal PI$, matching $x\in\{0,1\}^I$ with $\{i:i\in I,\ x(i)=1\}$. So we get a standard measure $\tilde\lambda$ on $\mathcal PI$, which I will call the usual measure on $\mathcal PI$. Note that for any finite $b\subseteq I$ and any $c\subseteq b$ we have
$\tilde\lambda\{a:a\cap b=c\}=\lambda\{x:x(i)=1$ for $i\in c$, $x(i)=0$ for $i\in b\setminus c\}=2^{-\#(b)}$.
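The same check can be made on the power-set side. The sketch below assumes a five-element $I$ and a three-element $b$, both chosen only for illustration, and enumerates all subsets directly.

```python
from fractions import Fraction
from itertools import combinations

I = frozenset(range(5))   # assumed finite index set

def powerset(s):
    """All subsets of s, as frozensets."""
    s = sorted(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

subsets = powerset(I)
weight = Fraction(1, len(subsets))   # each subset of I has measure 2^{-#(I)}

def measure(b, c):
    """Measure of {a : a & b == c} under the usual measure on the power set."""
    return sum(weight for a in subsets if a & b == c)

b = frozenset({0, 2, 3})
results = {c: measure(b, c) for c in powerset(b)}
```

Every one of the $2^{\#(b)}$ choices of $c$ should receive the same mass $2^{-\#(b)}$.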

(c) Of course we can apply 254G to these measures; if $(Y,\mathrm T,\nu)$ is a complete probability space, a function $\phi:Y\to\{0,1\}^I$ is inverse-measure-preserving iff
$\nu\{y:y\in Y,\ \phi(y)\restriction J=z\}=2^{-\#(J)}$
whenever $J\subseteq I$ is finite and $z\in\{0,1\}^J$; this is because the measurable cylinders in $\{0,1\}^I$ are precisely the sets of the form $\{x:x\restriction J=z\}$ where $J\subseteq I$ is finite.
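The criterion in (c) is easy to test mechanically in a finite setting. The following sketch assumes $Y=\{0,\ldots,7\}$ with the uniform probability and $I=\{0,1,2\}$; the two candidate maps (`binary_digits`, `repeated_parity`) are invented for the illustration.

```python
from fractions import Fraction
from itertools import combinations, product

I = [0, 1, 2]
Y = list(range(8))                      # assumed finite probability space
nu = {y: Fraction(1, 8) for y in Y}     # uniform probability on Y

def satisfies_criterion(phi):
    """Check nu{y : phi(y) restricted to J equals z} = 2^{-#(J)} for all J, z."""
    for r in range(len(I) + 1):
        for J in combinations(I, r):
            for z in product((0, 1), repeat=r):
                mass = sum(nu[y] for y in Y
                           if all(phi(y)[i] == zi for i, zi in zip(J, z)))
                if mass != Fraction(1, 2 ** r):
                    return False
    return True

def binary_digits(y):
    """The three binary digits of y, one per coordinate."""
    return tuple((y >> i) & 1 for i in I)

def repeated_parity(y):
    """The parity of y copied into every coordinate."""
    return tuple(y & 1 for _ in I)

good = satisfies_criterion(binary_digits)
bad = satisfies_criterion(repeated_parity)
```

The first map spreads the mass evenly over all cylinders; the second fails already for a two-element $J$, since its coordinates are not independent.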

254K In the case of countably infinite $I$, we have a very important relationship between the usual product measure of $\{0,1\}^I$ and Lebesgue measure on $[0,1]$.
Proposition Let λ be the usual measure on $X=\{0,1\}^{\mathbb N}$, and let µ be Lebesgue measure on $[0,1]$; write Λ for the domain of λ and Σ for the domain of µ.
(i) For $x\in X$ set $\phi(x)=\sum_{i=0}^{\infty}2^{-i-1}x(i)$. Then
$\phi^{-1}[E]\in\Lambda$ and $\lambda\phi^{-1}[E]=\mu E$ for every $E\in\Sigma$;
$\phi[F]\in\Sigma$ and $\mu\phi[F]=\lambda F$ for every $F\in\Lambda$.
(ii) There is a bijection $\tilde\phi:X\to[0,1]$ which is equal to φ at all but countably many points, and any such bijection is an isomorphism between $(X,\Lambda,\lambda)$ and $([0,1],\Sigma,\mu)$.
proof (a) The first point to observe is that φ is nearly a bijection. Setting
$H=\{x:x\in X,\ \exists\ m\in\mathbb N,\ x(i)=x(m)\ \forall\ i\ge m\}$,
$H'=\{2^{-n}k:n\in\mathbb N,\ k\le2^n\}$,
then $H$ and $H'$ are countable and $\phi\restriction X\setminus H$ is a bijection between $X\setminus H$ and $[0,1]\setminus H'$. (For $t\in[0,1]\setminus H'$, $\phi^{-1}(t)$ is the binary expansion of $t$.) Because $H$ and $H'$ are countably infinite, there is a bijection between them; combining this with $\phi\restriction X\setminus H$, we have a bijection between $X$ and $[0,1]$ equal to φ except at countably many points. For the rest of this proof, let $\tilde\phi$ be any such bijection. Let $M$ be the countable set $\{x:x\in X,\ \phi(x)\ne\tilde\phi(x)\}$, and $N$ the countable set $\phi[M]\cup\tilde\phi[M]$; then $\phi[A]\triangle\tilde\phi[A]\subseteq N$ for every $A\subseteq X$.
(b) To see that $\lambda\tilde\phi^{-1}[E]$ exists and is equal to $\mu E$ for every $E\in\Sigma$, I consider successively more complex sets $E$.
(α) If $E=\{t\}$ then $\lambda\tilde\phi^{-1}[E]=\lambda\{\tilde\phi^{-1}(t)\}$ exists and is zero.
(β) If $E$ is of the form $[2^{-n}k,2^{-n}(k+1)[$, where $n\in\mathbb N$ and $0\le k<2^n$, then $\phi^{-1}[E]$ differs by at most two points from a set of the form $\{x:x(i)=z(i)\ \forall\ i<n\}$, so $\tilde\phi^{-1}[E]$ differs from this by a countable set, and
$\lambda\tilde\phi^{-1}[E]=2^{-n}=\mu E$.
(γ) If $E$ is of the form $[2^{-n}k,2^{-n}l[$, where $n\in\mathbb N$ and $0\le k<l\le2^n$, then $E=\bigcup_{k\le i<l}[2^{-n}i,2^{-n}(i+1)[$, so
$\lambda\tilde\phi^{-1}[E]=2^{-n}(l-k)=\mu E$.

(δ) If $E$ is of the form $[t,u[$, where $0\le t<u\le1$, then for each $n\in\mathbb N$ let $k_n$ be the integral part of $2^nt$ and $l_n$ the integral part of $2^nu$; set $E_n=[2^{-n}(k_n+1),2^{-n}l_n[$; then $\langle E_n\rangle_{n\in\mathbb N}$ is a non-decreasing sequence and $\bigcup_{n\in\mathbb N}E_n$ is $]t,u[$. So (using (α))
$\lambda\tilde\phi^{-1}[E]=\lambda\tilde\phi^{-1}[\bigcup_{n\in\mathbb N}E_n]=\lim_{n\to\infty}\lambda\tilde\phi^{-1}[E_n]=\lim_{n\to\infty}\mu E_n=\mu E$.

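(ε) If $E\in\Sigma$, then for any $\epsilon>0$ there is a sequence $\langle I_n\rangle_{n\in\mathbb N}$ of half-open subintervals of $[0,1[$ such that $E\setminus\{1\}\subseteq\bigcup_{n\in\mathbb N}I_n$ and $\sum_{n=0}^{\infty}\mu I_n\le\mu E+\epsilon$; now $\tilde\phi^{-1}[E]\subseteq\{\tilde\phi^{-1}(1)\}\cup\bigcup_{n\in\mathbb N}\tilde\phi^{-1}[I_n]$, so
$\lambda^*\tilde\phi^{-1}[E]\le\lambda(\bigcup_{n\in\mathbb N}\tilde\phi^{-1}[I_n])\le\sum_{n=0}^{\infty}\lambda\tilde\phi^{-1}[I_n]=\sum_{n=0}^{\infty}\mu I_n\le\mu E+\epsilon$.
As $\epsilon$ is arbitrary, $\lambda^*\tilde\phi^{-1}[E]\le\mu E$, and there is a $V\in\Lambda$ such that $\tilde\phi^{-1}[E]\subseteq V$ and $\lambda V\le\mu E$.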
(ζ) Similarly, there is a $V'\in\Lambda$ such that $V'\supseteq\tilde\phi^{-1}[[0,1]\setminus E]$ and $\lambda V'\le\mu([0,1]\setminus E)$. Now $V\cup V'=X$, while
$\lambda V+\lambda V'\le\mu E+(1-\mu E)=1=\lambda(V\cup V')$,
so $\lambda(V\cap V')=0$ and
$\tilde\phi^{-1}[E]=(X\setminus V')\cup(V\cap V'\cap\tilde\phi^{-1}[E])$
belongs to Λ, with
$\lambda\tilde\phi^{-1}[E]\le\lambda V\le\mu E$;
at the same time,
$1-\lambda\tilde\phi^{-1}[E]\le\lambda V'\le1-\mu E$

so λφ̃−1 [E] = µE.


(c) Now suppose that $C\subseteq X$ is a measurable cylinder of the special form $\{x:x(0)=\epsilon_0,\ldots,x(n)=\epsilon_n\}$ for some $\epsilon_0,\ldots,\epsilon_n\in\{0,1\}$. Then $\phi[C]=[t,t+2^{-n-1}]$ where $t=\sum_{i=0}^{n}2^{-i-1}\epsilon_i$, so that $\mu\phi[C]=\lambda C$. Since $\tilde\phi[C]\triangle\phi[C]\subseteq N$ is countable, $\mu\tilde\phi[C]=\lambda C$.
If $C\subseteq X$ is any measurable cylinder, then it is of the form $\{x:x\restriction J=z\}$ for some finite $J\subseteq\mathbb N$; taking $n$ so large that $J\subseteq\{0,\ldots,n\}$, $C$ is expressible as a disjoint union of $2^{n+1-\#(J)}$ sets of the form just considered, being just those in which $\epsilon_i=z(i)$ for $i\in J$. Summing their measures, we again get $\mu\tilde\phi[C]=\lambda C$. Now 254G tells us that $\tilde\phi^{-1}:[0,1]\to X$ is inverse-measure-preserving, that is, $\tilde\phi[W]$ is Lebesgue measurable, with measure $\lambda W$, for every $W\in\Lambda$.
Putting this together with (b), φ̃ must be an isomorphism between (X, Λ, λ) and ([0, 1], Σ, µ), as claimed in (ii) of
the proposition.
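(d) As for (i), if $E\in\Sigma$ then $\phi^{-1}[E]\triangle\tilde\phi^{-1}[E]\subseteq M$ is countable, so $\lambda\phi^{-1}[E]=\lambda\tilde\phi^{-1}[E]=\mu E$. While if $W\in\Lambda$, $\phi[W]\triangle\tilde\phi[W]\subseteq N$ is countable, so $\mu\phi[W]=\mu\tilde\phi[W]=\lambda W$.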
(e) Finally, if $\psi:X\to[0,1]$ is any other bijection which agrees with φ at all but countably many points, set $M'=\{x:\psi(x)\ne\phi(x)\}$, $N'=\psi[M']\cup\phi[M']$. Then
$\psi^{-1}[E]\triangle\phi^{-1}[E]\subseteq M'$, $\lambda\psi^{-1}[E]=\lambda\phi^{-1}[E]=\mu E$
for every $E\in\Sigma$, and
$\psi[F]\triangle\phi[F]\subseteq N'$, $\mu\psi[F]=\mu\phi[F]=\lambda F$
for every $F\in\Lambda$.
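Part (c) of the proof rests on the fact that φ carries a cylinder fixing the first $n+1$ coordinates onto a closed interval of length $2^{-n-1}$. The sketch below checks this arithmetic exactly with rational numbers; the helper names are hypothetical, and the endpoints are computed from the all-zero and all-one tails rather than by summing an infinite series.

```python
from fractions import Fraction
from itertools import product

def phi_prefix(bits):
    """Sum of 2^{-i-1} * bits[i] over a finite prefix of a 0-1 sequence."""
    return sum((Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits)), Fraction(0))

def cylinder_interval(eps):
    """Endpoints of phi[C] for C = {x : x(i) = eps[i] for i < len(eps)}.

    The smallest value of phi on C comes from the all-zero tail, the largest
    from the all-one tail, which adds sum_{i >= len(eps)} 2^{-i-1} = 2^{-len(eps)}.
    """
    t = phi_prefix(eps)
    return t, t + Fraction(1, 2 ** len(eps))

# The 2^3 cylinders fixing the first three coordinates tile [0, 1]
# by intervals of length 2^{-3}, matching lambda C = 2^{-3}.
intervals = sorted(cylinder_interval(eps) for eps in product((0, 1), repeat=3))
lengths = {hi - lo for lo, hi in intervals}
```

Adjacent intervals share an endpoint, which is the countable overlap that the sets $H$ and $H'$ account for.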

254L Subspaces Just as in 251Q, we can consider the product of subspace measures. There is a simplification in
the form of the result because in the present context we are restricted to probability measures.
Theorem Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $(X,\Lambda,\lambda)$ their product.
(a) For each $i\in I$, let $A_i\subseteq X_i$ be a set of full outer measure, and write $\tilde\mu_i$ for the subspace measure on $A_i$ (214B). Let $\tilde\lambda$ be the product measure on $A=\prod_{i\in I}A_i$. Then $\tilde\lambda$ is the subspace measure on $A$ induced by λ.
(b) $\lambda^*(\prod_{i\in I}A_i)=\prod_{i\in I}\mu_i^*A_i$ whenever $A_i\subseteq X_i$ for every $i$.
proof (a) Write $\lambda_A$ for the subspace measure on $A$ defined from λ, and $\Lambda_A$ for its domain; write $\tilde\Lambda$ for the domain of $\tilde\lambda$.
(i) Let $\phi:A\to X$ be the identity map. If $C\subseteq X$ is a measurable cylinder, say $C=\prod_{i\in I}C_i$ where $C_i\in\Sigma_i$ for each $i$, then $\phi^{-1}[C]=\prod_{i\in I}(C_i\cap A_i)$ is a measurable cylinder in $A$, and
$\tilde\lambda\phi^{-1}[C]=\prod_{i\in I}\tilde\mu_i(C_i\cap A_i)\le\prod_{i\in I}\mu_iC_i=\lambda C$.
By 254G, φ is inverse-measure-preserving, that is, $\tilde\lambda(A\cap W)=\lambda W$ for every $W\in\Lambda$. But this means that $\tilde\lambda V$ is defined and equal to $\lambda_AV=\lambda^*V$ for every $V\in\Lambda_A$, since for any such $V$ there is a $W\in\Lambda$ such that $V=A\cap W$ and $\lambda W=\lambda_AV$. In particular, $\lambda_AA=1$.
(ii) Now regard φ as a function from the measure space $(A,\Lambda_A,\lambda_A)$ to $(A,\tilde\Lambda,\tilde\lambda)$. If $D$ is a measurable cylinder in $A$, we can express it as $\prod_{i\in I}D_i$ where every $D_i$ belongs to the domain of $\tilde\mu_i$ and $D_i=A_i$ for all but finitely many $i$. Now for each $i$ we can find $C_i\in\Sigma_i$ such that $D_i=C_i\cap A_i$ and $\mu_iC_i=\tilde\mu_iD_i$, and we can suppose that $C_i=X_i$ whenever $D_i=A_i$. In this case $C=\prod_{i\in I}C_i\in\Lambda$ and
$\lambda C=\prod_{i\in I}\mu_iC_i=\prod_{i\in I}\tilde\mu_iD_i=\tilde\lambda D$.
Accordingly
$\lambda_A\phi^{-1}[D]=\lambda_A(A\cap C)\le\lambda C=\tilde\lambda D$.
By 254G again, φ is inverse-measure-preserving in this manifestation, that is, $\lambda_AV$ is defined and equal to $\tilde\lambda V$ for every $V\in\tilde\Lambda$. Putting this together with (i), we have $\lambda_A=\tilde\lambda$, as claimed.
(b) For each $i\in I$, choose a set $E_i\in\Sigma_i$ such that $A_i\subseteq E_i$ and $\mu_iE_i=\mu_i^*A_i$; do this in such a way that $E_i=X_i$ whenever $\mu_i^*A_i=1$. Set $B_i=A_i\cup(X_i\setminus E_i)$, so that $\mu_i^*B_i=1$ for each $i$ (if $F\in\Sigma_i$ and $F\supseteq B_i$ then $F\cap E_i\supseteq A_i$, so
$\mu_iF=\mu_i(F\cap E_i)+\mu_i(F\setminus E_i)=\mu_iE_i+\mu_i(X_i\setminus E_i)=1$.)
By (a), we can identify the subspace measure $\lambda_B$ on $B=\prod_{i\in I}B_i$ with the product of the subspace measures $\tilde\mu_i$ on $B_i$. In particular, $\lambda^*B=\lambda_BB=1$. Now $A_i=B_i\cap E_i$ so (writing $A=\prod_{i\in I}A_i$), $A=B\cap\prod_{i\in I}E_i$.

If $\prod_{i\in I}\mu_i^*A_i=0$, then for every $\epsilon>0$ there is a finite $J\subseteq I$ such that $\prod_{i\in J}\mu_i^*A_i\le\epsilon$; consequently (using 254Fb)
$\lambda^*A\le\lambda\{x:x(i)\in E_i$ for every $i\in J\}=\prod_{i\in J}\mu_iE_i\le\epsilon$.
As $\epsilon$ is arbitrary, $\lambda^*A=0$. If $\prod_{i\in I}\mu_i^*A_i>0$, then for every $n\in\mathbb N$ the set $\{i:\mu_i^*A_i\le1-2^{-n}\}$ must be finite, so
$J=\{i:\mu_i^*A_i<1\}=\{i:E_i\ne X_i\}$
is countable. By 254Fb again, applied to $\langle E_i\cap B_i\rangle_{i\in I}$ in the product $\prod_{i\in I}B_i$,
$\lambda^*(\prod_{i\in I}A_i)=\lambda_B(\prod_{i\in I}A_i)=\lambda_B\{x:x\in B,\ x(i)\in E_i\cap B_i$ for every $i\in J\}$
$=\prod_{i\in J}\tilde\mu_i(E_i\cap B_i)=\prod_{i\in I}\mu_i^*A_i$,
as required.

254M I now turn to the basic results which make it possible to use these product measures effectively. First, I
offer a vocabulary for dealing with subproducts. Let hXi ii∈I be a family of sets, with product X.
(a) For $J\subseteq I$, write $X_J$ for $\prod_{i\in J}X_i$. We have a canonical bijection $x\mapsto(x\restriction J,x\restriction I\setminus J):X\to X_J\times X_{I\setminus J}$. Associated with this we have the map $x\mapsto\pi_J(x)=x\restriction J:X\to X_J$. Now I will say that a set $W\subseteq X$ is determined by coordinates in $J$ if there is a $V\subseteq X_J$ such that $W=\pi_J^{-1}[V]$; that is, $W$ corresponds to $V\times X_{I\setminus J}\subseteq X_J\times X_{I\setminus J}$.
It is easy to see that

$W$ is determined by coordinates in $J$
⇐⇒ $x'\in W$ whenever $x\in W$, $x'\in X$ and $x'\restriction J=x\restriction J$
⇐⇒ $W=\pi_J^{-1}[\pi_J[W]]$.
It follows that if $W$ is determined by coordinates in $J$, and $J\subseteq K\subseteq I$, $W$ is also determined by coordinates in $K$. The family $\mathcal W_J$ of subsets of $X$ determined by coordinates in $J$ is closed under complementation and arbitrary unions and intersections. PP If $W\in\mathcal W_J$, then
$X\setminus W=X\setminus\pi_J^{-1}[\pi_J[W]]=\pi_J^{-1}[X_J\setminus\pi_J[W]]\in\mathcal W_J$.
If $\mathcal V\subseteq\mathcal W_J$, then
$\bigcup\mathcal V=\bigcup_{V\in\mathcal V}\pi_J^{-1}[\pi_J[V]]=\pi_J^{-1}[\bigcup_{V\in\mathcal V}\pi_J[V]]\in\mathcal W_J$. QQ
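The characterization $W=\pi_J^{-1}[\pi_J[W]]$ translates directly into a test on a finite product. The sketch below assumes three small factor sets (the names and elements are invented) and checks a sample set $W$ against several choices of $J$.

```python
from itertools import product

I = ["i0", "i1", "i2"]
factors = {"i0": (0, 1), "i1": (0, 1, 2), "i2": ("u", "v")}   # assumed finite X_i
X = [tuple(p) for p in product(*(factors[i] for i in I))]

def restrict(x, J):
    """pi_J(x): keep only the coordinates of x indexed by J."""
    return tuple(x[k] for k, i in enumerate(I) if i in J)

def determined_by(W, J):
    """True iff W equals pi_J^{-1}[pi_J[W]]."""
    image = {restrict(x, J) for x in W}
    saturated = {x for x in X if restrict(x, J) in image}
    return saturated == set(W)

W = {x for x in X if x[0] == 1 and x[1] != 2}   # depends on i0 and i1 only
```

For instance, `determined_by(W, {"i0", "i1"})` holds while `determined_by(W, {"i0"})` does not, and enlarging $J$ or passing to the complement preserves the property, as stated above.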

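(b) It follows that
$\mathcal W=\bigcup\{\mathcal W_J:J\subseteq I$ is countable$\}$,
the family of subsets of $X$ determined by coordinates in some countable set, is a σ-algebra of subsets of $X$. PP (i) $X$ and $\emptyset$ are determined by coordinates in $\emptyset$ (recall that $X_\emptyset$ is a singleton, and that $X=\pi_\emptyset^{-1}[X_\emptyset]$, $\emptyset=\pi_\emptyset^{-1}[\emptyset]$). (ii) If $W\in\mathcal W$, there is a countable $J\subseteq I$ such that $W\in\mathcal W_J$; now
$X\setminus W=\pi_J^{-1}[X_J\setminus\pi_J[W]]\in\mathcal W_J\subseteq\mathcal W$.
(iii) If $\langle W_n\rangle_{n\in\mathbb N}$ is a sequence in $\mathcal W$, then for each $n\in\mathbb N$ there is a countable $J_n\subseteq I$ such that $W_n\in\mathcal W_{J_n}$. Now $J=\bigcup_{n\in\mathbb N}J_n$ is a countable subset of $I$, and every $W_n$ belongs to $\mathcal W_J$, so
$\bigcup_{n\in\mathbb N}W_n\in\mathcal W_J\subseteq\mathcal W$. QQ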

(c) If $i\in I$ and $E\subseteq X_i$ then $\{x:x\in X,\ x(i)\in E\}$ is determined by the single coordinate $i$, so surely belongs to $\mathcal W$; accordingly $\mathcal W$ must include $\widehat{\bigotimes}_{i\in I}\mathcal PX_i$. A fortiori, if $\Sigma_i$ is a σ-algebra of subsets of $X_i$ for each $i$, $\mathcal W\supseteq\widehat{\bigotimes}_{i\in I}\Sigma_i$; that is, every member of $\widehat{\bigotimes}_{i\in I}\Sigma_i$ is determined by coordinates in some countable set.

254N Theorem Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces and $\langle K_j\rangle_{j\in J}$ a partition of $I$. For each $j\in J$ let $\lambda_j$ be the product measure on $Z_j=\prod_{i\in K_j}X_i$, and write λ for the product measure on $X=\prod_{i\in I}X_i$. Then the natural bijection
$x\mapsto\phi(x)=\langle x\restriction K_j\rangle_{j\in J}:X\to\prod_{j\in J}Z_j$

identifies λ with the product of the family $\langle\lambda_j\rangle_{j\in J}$.
In particular, if $K\subseteq I$ is any set, then λ can be identified with the c.l.d. product of the product measures on $\prod_{i\in K}X_i$ and $\prod_{i\in I\setminus K}X_i$.
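proof (Compare 251N.) Write $Z=\prod_{j\in J}Z_j$ and $\tilde\lambda$ for the product measure on $Z$; let Λ, $\tilde\Lambda$ be the domains of λ and $\tilde\lambda$.
(a) Let $C\subseteq Z$ be a measurable cylinder. Then $\lambda^*\phi^{-1}[C]\le\tilde\lambda C$. PP Express $C$ as $\prod_{j\in J}C_j$ where $C_j\subseteq Z_j$ belongs to the domain $\Lambda_j$ of $\lambda_j$ for each $j$. Set $L=\{j:C_j\ne Z_j\}$, so that $L$ is finite. Let $\epsilon>0$. For each $j\in L$ let $\langle C_{jn}\rangle_{n\in\mathbb N}$ be a sequence of measurable cylinders in $Z_j=\prod_{i\in K_j}X_i$ such that $C_j\subseteq\bigcup_{n\in\mathbb N}C_{jn}$ and $\sum_{n=0}^{\infty}\lambda_jC_{jn}\le\lambda_jC_j+\epsilon$. Express each $C_{jn}$ as $\prod_{i\in K_j}C_{jni}$ where $C_{jni}\in\Sigma_i$ for $i\in K_j$ (and $\{i:C_{jni}\ne X_i\}$ is finite).
For $f\in\mathbb N^L$, set
$D_f=\{x:x\in X,\ x(i)\in C_{j,f(j),i}$ whenever $j\in L$, $i\in K_j\}$.
Because $\bigcup_{j\in L}\{i:C_{j,f(j),i}\ne X_i\}$ is finite, $D_f$ is a measurable cylinder in $X$, and
$\lambda D_f=\prod_{j\in L}\prod_{i\in K_j}\mu_iC_{j,f(j),i}=\prod_{j\in L}\lambda_jC_{j,f(j)}$.
Also
$\bigcup\{D_f:f\in\mathbb N^L\}\supseteq\phi^{-1}[C]$
because if $\phi(x)\in C$ then $\phi(x)(j)\in C_j$ for each $j\in L$, so there must be an $f\in\mathbb N^L$ such that $\phi(x)(j)\in C_{j,f(j)}$ for every $j\in L$. But (because $\mathbb N^L$ is countable) this means that
$\lambda^*\phi^{-1}[C]\le\sum_{f\in\mathbb N^L}\lambda D_f=\sum_{f\in\mathbb N^L}\prod_{j\in L}\lambda_jC_{j,f(j)}=\prod_{j\in L}\sum_{n=0}^{\infty}\lambda_jC_{jn}\le\prod_{j\in L}(\lambda_jC_j+\epsilon)$.
As $\epsilon$ is arbitrary,
$\lambda^*\phi^{-1}[C]\le\prod_{j\in L}\lambda_jC_j=\tilde\lambda C$. QQ
By 254G, it follows that $\lambda\phi^{-1}[W]$ is defined, and equal to $\tilde\lambda W$, whenever $W\in\tilde\Lambda$.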
(b) Next, $\tilde\lambda\phi[D]=\lambda D$ for every measurable cylinder $D\subseteq X$. PP This is easy. Express $D$ as $\prod_{i\in I}D_i$ where $D_i\in\Sigma_i$ for every $i\in I$ and $\{i:D_i\ne X_i\}$ is finite. Then $\phi[D]=\prod_{j\in J}\tilde D_j$, where $\tilde D_j=\prod_{i\in K_j}D_i$ is a measurable cylinder for each $j\in J$. Because $\{j:\tilde D_j\ne Z_j\}$ must also be finite (in fact, it cannot have more members than the finite set $\{i:D_i\ne X_i\}$), $\prod_{j\in J}\tilde D_j$ is itself a measurable cylinder in $Z$, and
$\tilde\lambda\phi[D]=\prod_{j\in J}\lambda_j\tilde D_j=\prod_{j\in J}\prod_{i\in K_j}\mu_iD_i=\lambda D$. QQ

Applying 254G to $\phi^{-1}:Z\to X$, it follows that $\tilde\lambda\phi[W]$ is defined, and equal to $\lambda W$, for every $W\in\Lambda$. But together with (a) this means that for any $W\subseteq X$,
if $W\in\Lambda$ then $\phi[W]\in\tilde\Lambda$ and $\tilde\lambda\phi[W]=\lambda W$,
if $\phi[W]\in\tilde\Lambda$ then $W\in\Lambda$ and $\lambda W=\tilde\lambda\phi[W]$.
And of course this is just what is meant by saying that φ is an isomorphism between $(X,\Lambda,\lambda)$ and $(Z,\tilde\Lambda,\tilde\lambda)$.
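In a finite setting the regrouping of coordinates can be checked directly, since every set is measurable and measures are sums of point masses. The sketch below assumes four two-point factors with invented weights and the partition $\{0,1\}$, $\{2,3\}$ of the index set; it illustrates the statement only and does not reflect the covering argument of the proof.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Assumed finite factors, each a dict point -> probability.
mu = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {0: Fraction(1, 3), 1: Fraction(2, 3)},
    {0: Fraction(1, 4), 1: Fraction(3, 4)},
    {0: Fraction(1, 5), 1: Fraction(4, 5)},
]
blocks = [(0, 1), (2, 3)]   # a partition <K_j> of I = {0, 1, 2, 3}

def point_mass(indices, x):
    """Product measure of the singleton {x} in prod_{i in indices} X_i."""
    return prod(mu[i][t] for i, t in zip(indices, x))

def phi(x):
    """x |-> <x restricted to K_j>_j."""
    return tuple(tuple(x[i] for i in K) for K in blocks)

X = list(product((0, 1), repeat=4))
lam = {x: point_mass(range(4), x) for x in X}
# Product of the block measures lambda_j, evaluated at phi(x).
lam_tilde = {phi(x): prod(point_mass(K, z) for K, z in zip(blocks, phi(x))) for x in X}

W = [x for x in X if x[0] != x[3]]            # a set that is not a cylinder
lam_W = sum(lam[x] for x in W)
lam_tilde_phi_W = sum(lam_tilde[phi(x)] for x in W)
```

The two totals agree, here with common value $\frac12$, as the identification of λ with the product of the $\lambda_j$ predicts.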

254O Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces. For each $J\subseteq I$ let $\lambda_J$ be the product probability measure on $X_J=\prod_{i\in J}X_i$, and $\Lambda_J$ its domain; write $X=X_I$, $\lambda=\lambda_I$ and $\Lambda=\Lambda_I$. For $x\in X$ and $J\subseteq I$ set $\pi_J(x)=x\restriction J\in X_J$.
(a) For every $J\subseteq I$, $\lambda_J$ is the image measure $\lambda\pi_J^{-1}$ (234D); in particular, $\pi_J:X\to X_J$ is inverse-measure-preserving for λ and $\lambda_J$.
(b) If $J\subseteq I$ and $W\in\Lambda$ is determined by coordinates in $J$ (254M), then $\lambda_J\pi_J[W]$ is defined and equal to $\lambda W$. Consequently there are $W_1$, $W_2$ belonging to the σ-algebra of subsets of $X$ generated by
$\{\{x:x(i)\in E\}:i\in J,\ E\in\Sigma_i\}$
such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$.
(c) For every $W\in\Lambda$, we can find a countable set $J$ and $W_1$, $W_2\in\Lambda$, both determined by coordinates in $J$, such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$.

(d) For every $W\in\Lambda$, there is a countable set $J\subseteq I$ such that $\pi_J[W]\in\Lambda_J$ and $\lambda_J\pi_J[W]=\lambda W$; so that $W'=\pi_J^{-1}[\pi_J[W]]$ belongs to Λ, and $\lambda(W'\setminus W)=0$.
proof (a)(i) By 254N, we can identify λ with the product of $\lambda_J$ and $\lambda_{I\setminus J}$ on $X_J\times X_{I\setminus J}$. Now $\pi_J^{-1}[E]\subseteq X$ corresponds to $E\times X_{I\setminus J}\subseteq X_J\times X_{I\setminus J}$, so
$\lambda(\pi_J^{-1}[E])=\lambda_JE\cdot\lambda_{I\setminus J}X_{I\setminus J}=\lambda_JE$,
by 251E or 251Ia, whenever $E\in\Lambda_J$. This shows that $\pi_J$ is inverse-measure-preserving.
(ii) To see that $\lambda_J$ is actually the image measure, suppose that $E\subseteq X_J$ is such that $\pi_J^{-1}[E]\in\Lambda$. Identifying $\pi_J^{-1}[E]$ with $E\times X_{I\setminus J}$, as before, we are supposing that $E\times X_{I\setminus J}$ is measured by the product measure on $X_J\times X_{I\setminus J}$. But this means that for $\lambda_{I\setminus J}$-almost every $z\in X_{I\setminus J}$, $E_z=\{y:(y,z)\in E\times X_{I\setminus J}\}$ belongs to $\Lambda_J$ (252D(ii), because $\lambda_J$ is complete). Since $E_z=E$ for every $z$, $E$ itself belongs to $\Lambda_J$, as claimed.
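Part (a) says that pushing λ forward along $\pi_J$ gives exactly $\lambda_J$. For finite factors this can be confirmed by summing point masses; the sketch below assumes three two-point factors with invented weights and $J=\{i,k\}$.

```python
from fractions import Fraction
from itertools import product
from math import prod

mu = {   # assumed finite factors, each a dict point -> probability
    "i": {0: Fraction(1, 2), 1: Fraction(1, 2)},
    "j": {0: Fraction(1, 3), 1: Fraction(2, 3)},
    "k": {0: Fraction(1, 4), 1: Fraction(3, 4)},
}
I = list(mu)

def product_measure(indices):
    """Point masses of the product measure on X_J for J = indices."""
    pts = product(*(mu[i] for i in indices))
    return {p: prod(mu[i][t] for i, t in zip(indices, p)) for p in pts}

lam = product_measure(I)
J = ["i", "k"]
lam_J = product_measure(J)

def pi_J(x):
    """Restriction of x to the coordinates in J."""
    return tuple(t for i, t in zip(I, x) if i in J)

# Image measure lambda pi_J^{-1}: push each point mass forward along pi_J.
image = {}
for x, w in lam.items():
    image[pi_J(x)] = image.get(pi_J(x), Fraction(0)) + w
```

Summing out the coordinate `"j"` multiplies each mass by $\mu_j X_j=1$, which is the finite counterpart of the factor $\lambda_{I\setminus J}X_{I\setminus J}$ in (i).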
(b) If $W\in\Lambda$ is determined by coordinates in $J$, set $H=\pi_J[W]$; then $\pi_J^{-1}[H]=W$, so $H\in\Lambda_J$ by (a) just above. By 254Ff, there are $H_1$, $H_2\in\widehat{\bigotimes}_{i\in J}\Sigma_i$ such that $H_1\subseteq H\subseteq H_2$ and $\lambda_J(H_2\setminus H_1)=0$.
Let $\mathrm T_J$ be the σ-algebra of subsets of $X$ generated by sets of the form $\{x:x(i)\in E\}$ where $i\in J$ and $E\in\Sigma_i$. Consider $\mathrm T_J'=\{G:G\subseteq X_J,\ \pi_J^{-1}[G]\in\mathrm T_J\}$. This is a σ-algebra of subsets of $X_J$, and it contains $\{y:y\in X_J,\ y(i)\in E\}$ whenever $i\in J$, $E\in\Sigma_i$ (because
$\pi_J^{-1}[\{y:y\in X_J,\ y(i)\in E\}]=\{x:x\in X,\ x(i)\in E\}$
whenever $i\in J$, $E\subseteq X_i$). So $\mathrm T_J'$ must include $\widehat{\bigotimes}_{i\in J}\Sigma_i$. In particular, $H_1$ and $H_2$ both belong to $\mathrm T_J'$, that is, $W_k=\pi_J^{-1}[H_k]$ belongs to $\mathrm T_J$ for both $k$. Of course $W_1\subseteq W\subseteq W_2$, because $H_1\subseteq H\subseteq H_2$, and
$\lambda(W_2\setminus W_1)=\lambda_J(H_2\setminus H_1)=0$,
as required.
(c) Now take any $W \in \Lambda$. By 254Ff, there are $W_1$ and $W_2 \in \widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W_1 \subseteq W \subseteq W_2$ and $\lambda(W_2 \setminus W_1) = 0$.
By 254Mc, there are countable sets J1 , J2 ⊆ I such that, for each k, Wk is determined by coordinates in Jk . Setting
J = J1 ∪ J2 , J is a countable subset of I and both W1 and W2 are determined by coordinates in J.
(d) Continuing the argument from (c), πJ [W1 ], πJ [W2 ] ∈ ΛJ , by (b), and λJ (πJ [W2 ] \ πJ [W1 ]) = 0. Since πJ [W1 ] ⊆
πJ [W ] ⊆ πJ [W2 ], it follows that πJ [W ] ∈ ΛJ , with λJ πJ [W ] = λJ πJ [W2 ]; so that, setting W ′ = πJ−1 [πJ [W ]], W ′ ∈ Λ,
and
λW ′ = λJ πJ [W ] = λJ πJ [W2 ] = λπJ−1 [πJ [W2 ]] = λW2 = λW .
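The statement in (a) that πJ is inverse-measure-preserving can be checked by hand in a finite product. The Python sketch below is only an illustration under stated assumptions: the three two-point factors, the weights in `MU`, and the helper names `product_measure` and `preimage` are all hypothetical and not part of the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical factor measures on three two-point spaces: MU[i][b] = mu_i{b}.
MU = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {0: Fraction(2, 3), 1: Fraction(1, 3)},
    {0: Fraction(1, 4), 1: Fraction(3, 4)},
]


def product_measure(E, coords):
    """Product measure of a set E of tuples indexed by the list `coords`."""
    total = Fraction(0)
    for y in E:
        w = Fraction(1)
        for i, b in zip(coords, y):
            w *= MU[i][b]
        total += w
    return total


def preimage(E, J, n):
    """pi_J^{-1}[E]: all x in {0,1}^n whose restriction to J lies in E."""
    return {x for x in product((0, 1), repeat=n) if tuple(x[i] for i in J) in E}


J = [0, 2]
E = {(0, 1), (1, 1), (1, 0)}          # a subset of X_J
W = preimage(E, J, 3)                 # the corresponding subset of X
print(product_measure(W, [0, 1, 2]) == product_measure(E, J))
```

In a finite product every set is measurable, so this checks only the equality of measures, not the measurability questions that make 254O non-trivial.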

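254P Proposition Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, and for each $J \subseteq I$ let $\lambda_J$ be the product probability measure on $X_J = \prod_{i\in J} X_i$, and $\Lambda_J$ its domain; write $X = X_I$, $\Lambda = \Lambda_I$ and $\lambda = \lambda_I$. For $x \in X$ and $J \subseteq I$ set $\pi_J(x) = x{\restriction}J \in X_J$.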
(a) If J ⊆ I and g is a real-valued function defined on a subset of XJ , then g is ΛJ -measurable iff gπJ is Λ-measurable.
(b) Whenever f is a Λ-measurable real-valued function defined on a λ-conegligible subset of X, we can find a
countable set J ⊆ I and a ΛJ -measurable function g defined on a λJ -conegligible subset of XJ such that f extends
gπJ .
proof (a)(i) If g is ΛJ -measurable and a ∈ R, there is an H ∈ ΛJ such that {y : y ∈ dom g, g(y) ≥ a} = H ∩ dom g.
Now πJ−1 [H] ∈ Λ, by 254Oa, and {x : x ∈ dom gπJ , gπJ (x) ≥ a} = πJ−1 [H] ∩ dom gπJ . So gπJ is Λ-measurable.
(ii) If gπJ is Λ-measurable and a ∈ R, then there is a W ∈ Λ such that {x : gπJ (x) ≥ a} = W ∩ dom gπJ . As in
the proof of 254Oa, we may identify λ with the product of λJ and λI\J , and 252D(ii) tells us that, if we identify W
with the corresponding subset of XJ × XI\J , there is at least one z ∈ XI\J such that Wz = {y : y ∈ XJ , (y, z) ∈ W }
belongs to ΛJ . But since (on this convention) gπJ (y, z) = g(y) for every y ∈ XJ , we see that {y : y ∈ dom g,
g(y) ≥ a} = Wz ∩ dom g. As a is arbitrary, g is ΛJ -measurable.
(b) For rational numbers $q$, set $W_q = \{x : x \in \operatorname{dom} f,\ f(x) \ge q\}$. By 254Oc we can find for each $q$ a countable set $J_q \subseteq I$ and sets $W'_q$, $W''_q$, both determined by coordinates in $J_q$, such that $W'_q \subseteq W_q \subseteq W''_q$ and $\lambda(W''_q \setminus W'_q) = 0$. Set $J = \bigcup_{q\in\mathbb{Q}} J_q$, $V = X \setminus \bigcup_{q\in\mathbb{Q}} (W''_q \setminus W'_q)$; then $J$ is a countable subset of $I$ and $V$ is a conegligible subset of $X$; moreover, $V$ is determined by coordinates in $J$ because all the $W'_q$, $W''_q$ are.
For every $q \in \mathbb{Q}$, $W_q \cap V = W'_q \cap V$, because $V \cap (W_q \setminus W'_q) \subseteq V \cap (W''_q \setminus W'_q) = \emptyset$; so $W_q \cap V$ is determined by coordinates in $J$. Consequently $V \cap \operatorname{dom} f = \bigcup_{q\in\mathbb{Q}} V \cap W_q$ also is determined by coordinates in $J$. Also
$$\{x : x \in V \cap \operatorname{dom} f,\ f(x) \ge a\} = \bigcap_{q \le a} V \cap W_q$$
is determined by coordinates in J. What this means is that if x, x′ ∈ V and πJ x = πJ x′ , then x ∈ dom f iff x′ ∈ dom f
and in this case f (x) = f (x′ ). Setting H = πJ [V ∩ dom f ], we have πJ−1 [H] = V ∩ dom f a conegligible subset of X,
so (because λJ = λπJ−1 ) H is conegligible in XJ . Also, for y ∈ H, f (x) = f (x′ ) whenever πJ x = πJ x′ = y, so there
is a function g : H → R defined by saying that gπJ (x) = f (x) whenever x ∈ V ∩ dom f . Thus g is defined almost
everywhere in XJ and f extends gπJ . Finally, for any a ∈ R,
πJ−1 [{y : g(y) ≥ a}] = {x : x ∈ V ∩ dom f , f (x) ≥ a} ∈ Λ;
by 254Oa, {y : g(y) ≥ a} ∈ ΛJ ; as a is arbitrary, g is measurable.

254Q Proposition Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, and for each $J \subseteq I$ let $\lambda_J$ be the product probability measure on $X_J = \prod_{i\in J} X_i$; write $X = X_I$, $\lambda = \lambda_I$. For $x \in X$, $J \subseteq I$ set $\pi_J(x) = x{\restriction}J \in X_J$.
(a) Let $S$ be the linear subspace of $\mathbb{R}^X$ spanned by $\{\chi C : C \subseteq X$ is a measurable cylinder$\}$. Then for every λ-integrable real-valued function $f$ and every $\epsilon > 0$ there is a $g \in S$ such that $\int |f - g|\,d\lambda \le \epsilon$.
(b) Whenever $J \subseteq I$ and $g$ is a real-valued function defined on a subset of $X_J$, then $\int g\,d\lambda_J = \int g\pi_J\,d\lambda$ if either integral is defined in $[-\infty, \infty]$.
(c) Whenever $f$ is a λ-integrable real-valued function, we can find a countable set $J \subseteq I$ and a $\lambda_J$-integrable function $g$ such that $f$ extends $g\pi_J$.

proof (a)(i) Write $\overline{S}$ for the set of functions $f$ satisfying the assertion, that is, such that for every $\epsilon > 0$ there is a $g \in S$ such that $\int |f - g| \le \epsilon$. Then $f_1 + f_2$ and $cf_1 \in \overline{S}$ whenever $f_1, f_2 \in \overline{S}$ and $c \in \mathbb{R}$. P P Given $\epsilon > 0$ there are $g_1, g_2 \in S$ such that $\int |f_1 - g_1| \le \frac{\epsilon}{2+|c|}$, $\int |f_2 - g_2| \le \frac{\epsilon}{2}$; now $g_1 + g_2$, $cg_1 \in S$ and $\int |(f_1 + f_2) - (g_1 + g_2)| \le \epsilon$, $\int |cf_1 - cg_1| \le \epsilon$. Q Q Also, of course, $f \in \overline{S}$ whenever $f_0 \in \overline{S}$ and $f =_{\text{a.e.}} f_0$.
(ii) Write $\mathcal{W}$ for $\{W : W \subseteq X,\ \chi W \in \overline{S}\}$, and $\mathcal{C}$ for the family of measurable cylinders in $X$. Then it is plain from the definition in 254A that $C \cap C' \in \mathcal{C}$ for all $C, C' \in \mathcal{C}$, and of course $C \in \mathcal{W}$ for every $C \in \mathcal{C}$, because $\chi C \in S$. Next, $W \setminus V \in \mathcal{W}$ whenever $W, V \in \mathcal{W}$ and $V \subseteq W$, because then $\chi(W \setminus V) = \chi W - \chi V$. Thirdly, $\bigcup_{n\in\mathbb{N}} W_n \in \mathcal{W}$ for every non-decreasing sequence $\langle W_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{W}$. P P Set $W = \bigcup_{n\in\mathbb{N}} W_n$. Given $\epsilon > 0$, there is an $n \in \mathbb{N}$ such that $\lambda(W \setminus W_n) \le \frac{\epsilon}{2}$. Now there is a $g \in S$ such that $\int |\chi W_n - g| \le \frac{\epsilon}{2}$, so that $\int |\chi W - g| \le \epsilon$. Q Q Thus $\mathcal{W}$ is a Dynkin class of subsets of $X$.
By the Monotone Class Theorem (136B), $\mathcal{W}$ must include the σ-algebra of subsets of $X$ generated by $\mathcal{C}$, which is $\widehat{\bigotimes}_{i\in I}\Sigma_i$. But this means that $\mathcal{W}$ contains every measurable subset of $X$, since by 254Ff any measurable set differs by a negligible set from some member of $\widehat{\bigotimes}_{i\in I}\Sigma_i$.
(iii) Thus $\overline{S}$ contains the characteristic function of any measurable subset of $X$. Because it is closed under addition and scalar multiplication, it contains all simple functions. But this means that it must contain all integrable functions. P P If $f$ is a real-valued function which is integrable over $X$, and $\epsilon > 0$, there is a simple function $h : X \to \mathbb{R}$ such that $\int |f - h| \le \frac{\epsilon}{2}$ (242M), and now there is a $g \in S$ such that $\int |h - g| \le \frac{\epsilon}{2}$, so that $\int |f - g| \le \epsilon$. Q Q
This proves part (a) of the proposition.
(b) Put 254Oa and 235J together.
(c) By 254Pb, there are a countable J ⊆ I and a real-valued function g defined on a conegligible subset of XJ such
that f extends gπJ . Now dom(gπJ ) = πJ−1 [dom g] is conegligible, so f =a.e. gπJ and gπJ is λ-integrable. By (b), g is
λJ -integrable.

254R Conditional expectations again Putting the ideas of 253H together with the work above, we obtain some
results which are important not only for their direct applications but for the light they throw on the structures here.
Theorem Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X, \Lambda, \lambda)$. For $J \subseteq I$ let $\Lambda_J \subseteq \Lambda$ be the σ-subalgebra of sets determined by coordinates in $J$ (254Mb). Then we may regard $L^0(\lambda{\restriction}\Lambda_J)$ as a subspace of $L^0(\lambda)$ (242Jh). Let $P_J : L^1(\lambda) \to L^1(\lambda{\restriction}\Lambda_J) \subseteq L^1(\lambda)$ be the corresponding conditional expectation operator (242Jd). Then
(a) for any $J, K \subseteq I$, $P_{K\cap J} = P_K P_J$;
(b) for any $u \in L^1(\lambda)$, there is a countable set $J^* \subseteq I$ such that $P_J u = u$ iff $J \supseteq J^*$;
(c) for any $u \in L^0(\lambda)$, there is a unique smallest set $J^* \subseteq I$ such that $u \in L^0(\lambda{\restriction}\Lambda_{J^*})$, and this $J^*$ is countable;
(d) for any $W \in \Lambda$ there is a unique smallest set $J^* \subseteq I$ such that $W \triangle W'$ is negligible for some $W' \in \Lambda_{J^*}$, and this $J^*$ is countable;
(e) for any Λ-measurable real-valued function $f : X \to \mathbb{R}$ there is a unique smallest set $J^* \subseteq I$ such that $f$ is equal almost everywhere to a $\Lambda_{J^*}$-measurable function, and this $J^*$ is countable.
proof For $J \subseteq I$, write $X_J = \prod_{i\in J} X_i$, let $\lambda_J$ be the product measure on $X_J$, and set $\phi_J(x) = x{\restriction}J$ for $x \in X$. Write $L^0_J$ for $L^0(\lambda{\restriction}\Lambda_J)$, regarded as a subset of $L^0 = L^0_I$, and $L^1_J$ for $L^1(\lambda{\restriction}\Lambda_J) = L^1(\lambda) \cap L^0_J$, as in 242Jb; thus $L^1_J$ is the set of values of the projection $P_J$.
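(a)(i) Let $C \subseteq X$ be a measurable cylinder, expressed as $\prod_{i\in I} C_i$ where $C_i \in \Sigma_i$ for every $i$ and $L = \{i : C_i \ne X_i\}$ is finite. Set
$$C'_i = C_i \text{ for } i \in J,\quad X_i \text{ for } i \in I \setminus J,\qquad C' = \prod_{i\in I} C'_i,\qquad \alpha = \prod_{i\in I\setminus J} \mu_i C_i.$$
Then $\alpha\chi C'$ is a conditional expectation of $\chi C$ on $\Lambda_J$. P P By 254N, we can identify $\lambda$ with the product of $\lambda_J$ and $\lambda_{I\setminus J}$. This identifies $\Lambda_J$ with $\{E \times X_{I\setminus J} : E \in \operatorname{dom}\lambda_J\}$. By 253H we have a conditional expectation $g$ of $\chi C$ defined by setting
$$g(y, z) = \int \chi C(y, t)\,\lambda_{I\setminus J}(dt)$$
for $y \in X_J$, $z \in X_{I\setminus J}$. But $C$ is identified with $C_J \times C_{I\setminus J}$, where $C_J = \prod_{i\in J} C_i$, so that $g(y, z) = 0$ if $y \notin C_J$ and otherwise is $\lambda_{I\setminus J} C_{I\setminus J} = \alpha$. Thus $g = \alpha\chi(C_J \times X_{I\setminus J})$. But the identification between $X_J \times X_{I\setminus J}$ and $X$ matches $C_J \times X_{I\setminus J}$ with $C'$, as described above. So $g$ becomes identified with $\alpha\chi C'$ and $\alpha\chi C'$ is a conditional expectation of $\chi C$. Q Q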
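(ii) Next, setting
$$C''_i = C'_i \text{ for } i \in K,\quad X_i \text{ for } i \in I \setminus K,\qquad C'' = \prod_{i\in I} C''_i,$$
$$\beta = \prod_{i\in I\setminus K} \mu_i C'_i = \prod_{i\in J\setminus K} \mu_i C_i,$$
the same arguments show that $\beta\chi C''$ is a conditional expectation of $\chi C'$ on $\Lambda_K$. So we have
$$P_K P_J (\chi C)^\bullet = \beta\alpha(\chi C'')^\bullet.$$
But if we look at $\beta\alpha$, this is just $\prod_{i\in I\setminus(K\cap J)} \mu_i C_i$, while $C''_i = C_i$ if $i \in K \cap J$, $X_i$ for other $i$. So $\beta\alpha\chi C''$ is a conditional expectation of $\chi C$ on $\Lambda_{K\cap J}$, and
$$P_K P_J (\chi C)^\bullet = P_{K\cap J} (\chi C)^\bullet.$$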

(iii) Thus we see that the operators $P_K P_J$, $P_{K\cap J}$ agree on elements of the form $\chi C^\bullet$ where $C$ is a measurable cylinder. Because they are both linear, they agree on linear combinations of these, that is, $P_K P_J v = P_{K\cap J} v$ whenever $v = g^\bullet$ for some $g$ in the space $S$ of 254Q. But if $u \in L^1(\lambda)$ and $\epsilon > 0$, there is a λ-integrable function $f$ such that $f^\bullet = u$, and there is a $g \in S$ such that $\int |f - g| \le \epsilon$ (254Qa), so that $\|u - v\|_1 \le \epsilon$, where $v = g^\bullet$. Since $P_J$, $P_K$ and $P_{K\cap J}$ are all linear operators of norm 1,
$$\|P_K P_J u - P_{K\cap J} u\|_1 \le 2\|u - v\|_1 + \|P_K P_J v - P_{K\cap J} v\|_1 \le 2\epsilon.$$
As $\epsilon$ is arbitrary, $P_K P_J u = P_{K\cap J} u$; as $u$ is arbitrary, $P_K P_J = P_{K\cap J}$.
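As a finite sanity check of (a), the following Python sketch takes I = {0, 1, 2} with two-point factors and realizes the conditional expectation on the sets determined by coordinates in J as "integrate out the coordinates outside J", in the spirit of the formula from 253H used in (a)(i). The weights in `P`, the sample function `f` and the helper names are illustrative assumptions, and a finite check of this kind illustrates the identity without proving it.

```python
from fractions import Fraction
from itertools import product

# Hypothetical weights mu_i{1} for three two-point factors (illustrative only).
P = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]
POINTS = list(product((0, 1), repeat=len(P)))


def weight(i, b):
    """mu_i{b} for the i-th factor."""
    return P[i] if b == 1 else 1 - P[i]


def cond_exp(f, J):
    """Conditional expectation of f on the sets determined by coordinates in J.

    f maps points of {0,1}^3 to numbers; every coordinate outside J is
    integrated out against the product of the corresponding factor measures.
    """
    outside = [i for i in range(len(P)) if i not in J]
    g = {}
    for x in POINTS:
        total = Fraction(0)
        for t in product((0, 1), repeat=len(outside)):
            y = list(x)
            w = Fraction(1)
            for i, b in zip(outside, t):
                y[i] = b
                w *= weight(i, b)
            total += w * f[tuple(y)]
        g[x] = total
    return g


# An arbitrary test function on the eight points.
f = {x: Fraction(1 + x[0] + 3 * x[1] * x[2] + 7 * x[0] * x[2]) for x in POINTS}
J, K = {0, 1}, {1, 2}
lhs = cond_exp(cond_exp(f, J), K)   # P_K P_J f
rhs = cond_exp(f, J & K)            # P_{K ∩ J} f
print(lhs == rhs)
```

Exact rational arithmetic is used so that the comparison is an equality of functions rather than an approximate one.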
(b) Take $u \in L^1(\lambda)$. Let $\mathcal{J}$ be the family of all subsets $J$ of $I$ such that $P_J u = u$. By (a), $J \cap K \in \mathcal{J}$ for all $J, K \in \mathcal{J}$. Next, $\mathcal{J}$ contains a countable set $J_0$. P P Let $f$ be a λ-integrable function such that $f^\bullet = u$. By 254Qc, we can find a countable set $J_0 \subseteq I$ and a $\lambda_{J_0}$-integrable function $g$ such that $f =_{\text{a.e.}} g\pi_{J_0}$. Now $g\pi_{J_0}$ is $\Lambda_{J_0}$-measurable and $u = (g\pi_{J_0})^\bullet$ belongs to $L^1_{J_0}$, so $J_0 \in \mathcal{J}$. Q Q
Write $J^* = \bigcap\mathcal{J}$, so that $J^* \subseteq J_0$ is countable. Then $J^* \in \mathcal{J}$. P P Let $\epsilon > 0$. As in the proof of (a) above, there is a $g \in S$ such that $\|u - v\|_1 \le \epsilon$, where $v = g^\bullet$. But because $g$ is a finite linear combination of characteristic functions of measurable cylinders, each determined by coordinates in some finite set, there is a finite $K \subseteq I$ such that $g$ is $\Lambda_K$-measurable, so that $P_K v = v$. Because $K$ is finite, there must be $J_1, \ldots, J_n \in \mathcal{J}$ such that $J^* \cap K = \bigcap_{1\le i\le n} J_i \cap K$; but as $\mathcal{J}$ is closed under finite intersections, $J = J_1 \cap \ldots \cap J_n \in \mathcal{J}$, and $J^* \cap K = J \cap K$.
Now we have
$$P_{J^*} v = P_{J^*} P_K v = P_{J^*\cap K} v = P_{J\cap K} v = P_J P_K v = P_J v,$$
using (a) twice. Because both $P_J$ and $P_{J^*}$ have norm 1,
$$\|P_{J^*} u - u\|_1 \le \|P_{J^*} u - P_{J^*} v\|_1 + \|P_{J^*} v - P_J v\|_1 + \|P_J v - P_J u\|_1 + \|P_J u - u\|_1 \le \|u - v\|_1 + 0 + \|u - v\|_1 + 0 \le 2\epsilon.$$
As $\epsilon$ is arbitrary, $P_{J^*} u = u$ and $J^* \in \mathcal{J}$. Q Q
Now, for any $J \subseteq I$,
$$P_J u = u \implies J \in \mathcal{J} \implies J \supseteq J^* \implies P_J u = P_J P_{J^*} u = P_{J\cap J^*} u = P_{J^*} u = u.$$
Thus $J^*$ has the required properties.
(c) Set $e = (\chi X)^\bullet$, $u_n = (-ne) \vee (u \wedge ne)$ for each $n \in \mathbb{N}$. Then, for any $J \subseteq I$, $u \in L^0_J$ iff $u_n \in L^0_J$ for every $n$. P P (α) If $u \in L^0_J$, then $u$ is expressible as $f^\bullet$ for some $\Lambda_J$-measurable $f$; now $f_n = (-n\chi X) \vee (f \wedge n\chi X)$ is $\Lambda_J$-measurable, so $u_n = f_n^\bullet \in L^0_J$ for every $n$. (β) If $u_n \in L^0_J$ for each $n$, then for each $n$ we can find a $\Lambda_J$-measurable function $f_n$ such that $f_n^\bullet = u_n$. But there is also a Λ-measurable function $f$ such that $u = f^\bullet$, and we must have $f_n =_{\text{a.e.}} (-n\chi X) \vee (f \wedge n\chi X)$ for each $n$, so that $f =_{\text{a.e.}} \lim_{n\to\infty} f_n$ and $u = (\lim_{n\to\infty} f_n)^\bullet$. Since $\lim_{n\to\infty} f_n$ is $\Lambda_J$-measurable, $u \in L^0_J$. Q Q
As every $u_n$ belongs to $L^1$, we know that
$$u_n \in L^0_J \iff u_n \in L^1_J \iff P_J u_n = u_n.$$
By (b), there is for each $n$ a countable $J^*_n$ such that $P_J u_n = u_n$ iff $J \supseteq J^*_n$. So we see that $u \in L^0_J$ iff $J \supseteq J^*_n$ for every $n$, that is, $J \supseteq \bigcup_{n\in\mathbb{N}} J^*_n$. Thus $J^* = \bigcup_{n\in\mathbb{N}} J^*_n$ has the property claimed.
(d) Applying (c) to u = (χW )• , we have a (countable) unique smallest J ∗ such that u ∈ L0J ∗ . But if J ⊆ I, then
there is a W ′ ∈ ΛJ such that W ′ △W is negligible iff u ∈ L0J . So this is the J ∗ we are looking for.
(e) Again apply (c), this time to f • .

254S Proposition Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, with product $(X, \Lambda, \lambda)$.
(a) If A ⊆ X is determined by coordinates in I \ {j} for every j ∈ I, then its outer measure λ∗ A must be either 0
or 1.
(b) If W ∈ Λ and λW > 0, then for every ǫ > 0 there are a W ′ ∈ Λ and a finite set J ⊆ I such that λW ′ ≥ 1 − ǫ
and for every x ∈ W ′ there is a y ∈ W such that x↾I \ J = y↾I \ J.
proof For $J \subseteq I$ write $X_J$ for $\prod_{i\in J} X_i$ and $\lambda_J$ for the product measure on $X_J$.
(a) Let W be a measurable envelope of A. By 254Rd, there is a smallest J ⊆ I for which there is a W ′ ∈ Λ,
determined by coordinates in J, with λ(W △W ′ ) = 0. Now J = ∅. P P Take any j ∈ I. Then A is determined by
coordinates in I \ {j}, that is, can be regarded as Xj × A′ for some A′ ⊆ XI\{j} . We can also think of λ as the product
of λ{j} and λI\{j} (254N). Let ΛI\{j} be the domain of λI\{j} . By 251S,
λ∗ A = λ∗{j} Xj · λ∗I\{j} A′ = λ∗I\{j} A′ .
Let V ∈ ΛI\{j} be a measurable envelope of A′ . Then W ′ = Xj × V belongs to Λ, includes A and has measure λ∗ A, so
λ(W ∩ W ′ ) = λW = λW ′ and W △W ′ is negligible. At the same time, W ′ is determined by coordinates in I \ {j}.
This means that J must be included in I \ {j}. As j is arbitrary, J = ∅. Q
Q
But the only subsets of X which are determined by coordinates in ∅ are X and ∅. Since W differs from one of these
by a negligible set, λ∗ A = λW ∈ {0, 1}, as claimed.
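(b) Set $\eta = \frac{1}{2}\min(\epsilon, 1)\lambda W$. By 254Fe, there is a measurable set $V$, determined by coordinates in a finite subset $J$ of $I$, such that $\lambda(W \triangle V) \le \eta$. Note that
$$\lambda V \ge \lambda W - \eta \ge \tfrac{1}{2}\lambda W > 0,$$
so
$$\lambda(W \triangle V) \le \tfrac{1}{2}\epsilon\lambda W \le \epsilon\lambda V.$$
We may identify $\lambda$ with the c.l.d. product of $\lambda_J$ and $\lambda_{I\setminus J}$ (254N). Let $\tilde{W}, \tilde{V} \subseteq X_J \times X_{I\setminus J}$ be the sets corresponding to $W, V \subseteq X$. Then $\tilde{V}$ can be expressed as $U \times X_{I\setminus J}$ where $\lambda_J U = \lambda V > 0$. Set $U' = \{z : z \in X_{I\setminus J},\ \lambda_J \tilde{W}^{-1}[\{z\}] = 0\}$. Then $U'$ is measured by $\lambda_{I\setminus J}$ (252D(ii) again, because both $\lambda_J$ and $\lambda_{I\setminus J}$ are complete), and
$$\lambda_J U \cdot \lambda_{I\setminus J} U' \le \int \lambda_J(\tilde{W}^{-1}[\{z\}] \triangle U)\,\lambda_{I\setminus J}(dz)$$
(because if $z \in U'$ then $\lambda_J(\tilde{W}^{-1}[\{z\}] \triangle U) = \lambda_J U$)
$$= \int \lambda_J (\tilde{W} \triangle \tilde{V})^{-1}[\{z\}]\,\lambda_{I\setminus J}(dz) = (\lambda_J \times \lambda_{I\setminus J})(\tilde{W} \triangle \tilde{V})$$
(252D once more)
$$= \lambda(W \triangle V) \le \epsilon\lambda V = \epsilon\lambda_J U.$$
This means that $\lambda_{I\setminus J} U' \le \epsilon$. Set $W' = \{x : x \in X,\ x{\restriction}I \setminus J \notin U'\}$; then $\lambda W' \ge 1 - \epsilon$. If $x \in W'$, then $z = x{\restriction}I \setminus J \notin U'$, so $\tilde{W}^{-1}[\{z\}]$ is not empty, that is, there is a $y \in W$ such that $y{\restriction}I \setminus J = z$. So this $W'$ has the required properties.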

254T Remarks It is important to understand that the results above apply to L0 and L1 and measurable-sets-up-to-
a-negligible set, not to sets and functions themselves. One idea does apply to sets and functions, whether measurable
or not.

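(a) Let $\langle X_i\rangle_{i\in I}$ be a family of sets with Cartesian product $X$. For each $J \subseteq I$ let $\mathcal{W}_J$ be the set of subsets of $X$ determined by coordinates in $J$. Then $\mathcal{W}_J \cap \mathcal{W}_K = \mathcal{W}_{J\cap K}$ for all $J, K \subseteq I$. P P Of course $\mathcal{W}_J \cap \mathcal{W}_K \supseteq \mathcal{W}_{J\cap K}$, because $\mathcal{W}_J \supseteq \mathcal{W}_{J'}$ whenever $J' \subseteq J$. On the other hand, suppose $W \in \mathcal{W}_J \cap \mathcal{W}_K$, $x \in W$, $y \in X$ and $x{\restriction}J \cap K = y{\restriction}J \cap K$. Set $z(i) = x(i)$ for $i \in J$, $y(i)$ for $i \in I \setminus J$. Then $z{\restriction}J = x{\restriction}J$ so $z \in W$. Also $y{\restriction}K = z{\restriction}K$ so $y \in W$. As $x$, $y$ are arbitrary, $W \in \mathcal{W}_{J\cap K}$; as $W$ is arbitrary, $\mathcal{W}_J \cap \mathcal{W}_K \subseteq \mathcal{W}_{J\cap K}$. Q Q Accordingly, for any $W \subseteq X$, $\mathcal{F} = \{J : W \in \mathcal{W}_J\}$ is a filter on $I$ (unless $W = X$ or $W = \emptyset$, in which case $\mathcal{F} = \mathcal{P}I$). But $\mathcal{F}$ does not necessarily have a least element, as the following example shows.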

(b) Set $X = \{0, 1\}^{\mathbb{N}}$,
$$W = \{x : x \in X,\ \lim_{i\to\infty} x(i) = 0\}.$$
Then for every $n \in \mathbb{N}$ $W$ is determined by coordinates in $J_n = \{i : i \ge n\}$. But $W$ is not determined by coordinates in $\bigcap_{n\in\mathbb{N}} J_n = \emptyset$. Note that
$$W = \bigcup_{n\in\mathbb{N}} \bigcap_{i\ge n} \{x : x(i) = 0\}$$
is measured by the usual measure on $X$. But it is also negligible (since it is countable); in 254Rd we have $J^* = \emptyset$, $W' = \emptyset$.
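The identity of (a) can be checked mechanically when the index set is finite. The Python sketch below uses two hypothetical helpers, `determined_by` and `least_determining_set`, on the illustrative space {0,1}^4. For a finite index set the family of determining sets always has a least member, which is why the example in (b) needs infinitely many coordinates.

```python
from itertools import product

N = 4  # illustrative finite index set I = {0, 1, 2, 3}
POINTS = list(product((0, 1), repeat=N))


def determined_by(W, J):
    """True iff x in W and y agreeing with x on J imply y in W."""
    J = sorted(J)
    for x in W:
        for y in POINTS:
            if all(x[i] == y[i] for i in J) and y not in W:
                return False
    return True


def least_determining_set(W):
    """Intersection of all J such that W is determined by coordinates in J."""
    least = set(range(N))
    for mask in range(2 ** N):
        J = {i for i in range(N) if (mask >> i) & 1}
        if determined_by(W, J):
            least &= J
    return least


# Sample set: membership depends only on coordinate 1.
W = {x for x in POINTS if x[1] == 1}
J, K = {0, 1}, {1, 2, 3}
print(determined_by(W, J), determined_by(W, K), determined_by(W, J & K))
print(least_determining_set(W))
```

The brute-force search over all subsets of the index set is only feasible for very small N; it is meant as a check of the definitions, not as an algorithm.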

*254U I am now in a position to describe a counter-example answering a natural question arising out of §251.
Example There are a localizable measure space (X, Σ, µ) and a probability space (Y, T, ν) such that the c.l.d. product
measure λ on X × Y is not localizable.
proof (a) Take (X, Σ, µ) to be the space of 216E, so that X = {0, 1}I , where I = PC for some set C of cardinal
greater than c. For each γ ∈ C write Eγ for {x : x ∈ X, x({γ}) = 1} (that is, G{γ} in the notation of 216Ec); then
Eγ ∈ Σ and µEγ = 1; also every measurable set of non-zero measure meets some Eγ in a set of non-zero measure,
while Eγ ∩ Eδ is negligible for all distinct γ, δ (see 216Ee).
Let (Y, T, ν) be {0, 1}C with the usual measure (254J). For γ ∈ C, let Fγ be {y : y ∈ Y , y(γ) = 1}, so that νFγ = 1/2.
Let λ be the c.l.d. product measure on X × Y , and Λ its domain.
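(b) Consider the family $\mathcal{W} = \{E_\gamma \times F_\gamma : \gamma \in C\} \subseteq \Lambda$. ?? Suppose, if possible, that $V$ were an essential supremum of $\mathcal{W}$ in $\Lambda$ in the sense of 211G. For $\gamma \in C$ write $H_\gamma = \{x : V[\{x\}] \triangle F_\gamma$ is negligible$\}$. Because $F_\gamma \triangle F_\delta$ is non-negligible, $H_\gamma \cap H_\delta = \emptyset$ for all $\gamma \ne \delta$.
Now $E_\gamma \setminus H_\gamma$ is µ-negligible for every $\gamma \in C$. P P $\lambda((E_\gamma \times F_\gamma) \setminus V) = 0$, so $F_\gamma \setminus V[\{x\}]$ is negligible for almost every $x \in E_\gamma$, by 252D. On the other hand, if we set $F'_\gamma = Y \setminus F_\gamma$, $W_\gamma = (X \times Y) \setminus (E_\gamma \times F'_\gamma)$, then we see that
$$(E_\gamma \times F'_\gamma) \cap (E_\gamma \times F_\gamma) = \emptyset, \qquad E_\gamma \times F_\gamma \subseteq W_\gamma,$$
$$\lambda((E_\delta \times F_\delta) \setminus W_\gamma) = \lambda((E_\gamma \times F'_\gamma) \cap (E_\delta \times F_\delta)) \le \mu(E_\gamma \cap E_\delta) = 0$$
for every $\delta \ne \gamma$, so $W_\gamma$ is an essential upper bound for $\mathcal{W}$ and $V \cap (E_\gamma \times F'_\gamma) = V \setminus W_\gamma$ must be λ-negligible. Accordingly $V[\{x\}] \setminus F_\gamma = V[\{x\}] \cap F'_\gamma$ is ν-negligible for µ-almost every $x \in E_\gamma$. But this means that $V[\{x\}] \triangle F_\gamma$ is ν-negligible for µ-almost every $x \in E_\gamma$, that is, $\mu(E_\gamma \setminus H_\gamma) = 0$. Q Q
Now consider the family $\langle E_\gamma \cap H_\gamma\rangle_{\gamma\in C}$. This is a disjoint family of sets of finite measure in $X$. If $E \in \Sigma$ has non-zero measure, there is a $\gamma \in C$ such that $\mu(E_\gamma \cap H_\gamma \cap E) = \mu(E_\gamma \cap E) > 0$. But this means that $\mathcal{E} = \{E_\gamma \cap H_\gamma : \gamma \in C\}$ satisfies the conditions of 213O, and µ must be strictly localizable; which it isn’t. X X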
(c) Thus we have found a family W ⊆ Λ with no essential supremum in Λ, and λ is not localizable.
Remark If (X, Σ, µ) and (Y, T, ν) are any localizable measure spaces with a non-localizable c.l.d. product measure,
then their c.l.d. versions are still localizable (213Hb) and still have a non-localizable product (251T), which cannot be
strictly localizable; so that at least one of the factors is not strictly localizable (251O). Thus any example of the type
here must involve a complete locally determined localizable space which is not strictly localizable, as in 216E.

254V Corresponding to 251U and 251Wo, we have the following result on countable powers of atomless probability
spaces.
Proposition Let (X, Σ, µ) be an atomless probability space and I a countable set. Let λ be the product probability
measure on X I . Then {x : x ∈ X I , x is injective} is λ-conegligible.
proof For any pair $\{i, j\}$ of distinct elements of $I$, the set $\{z : z \in X^{\{i,j\}},\ z(i) = z(j)\}$ is negligible for the product measure on $X^{\{i,j\}}$, by 251U. By 254Oa, $\{x : x \in X^I,\ x(i) = x(j)\}$ is λ-negligible. Because $I$ is countable, there are only countably many such pairs $\{i, j\}$, so $\{x : x \in X^I,\ x(i) = x(j)$ for some distinct $i, j \in I\}$ is negligible, and its complement is conegligible; but this complement is just the set of injective functions from $I$ to $X$.
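The counting step of this proof can be written out as a single estimate. In the sketch below, $D_{ij}$ is a label introduced here for illustration only; the inequality is countable subadditivity of λ, applied to the countably many pairs of distinct indices.

```latex
\[
D_{ij} = \{x : x \in X^I,\ x(i) = x(j)\}, \qquad
\lambda\Bigl(\bigcup_{\{i,j\}\subseteq I,\ i\neq j} D_{ij}\Bigr)
\;\le\; \sum_{\{i,j\}\subseteq I,\ i\neq j} \lambda D_{ij} \;=\; 0 .
\]
```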

254X Basic exercises (a) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be any family of probability spaces, with product $(X, \Lambda, \lambda)$. Write $\mathcal{E}$ for the family of subsets of $X$ expressible as the union of a finite disjoint family of measurable cylinders. (i) Show that if $C \subseteq X$ is a measurable cylinder then $X \setminus C \in \mathcal{E}$. (ii) Show that $W \cap V \in \mathcal{E}$ for all $W, V \in \mathcal{E}$. (iii) Show that $X \setminus W \in \mathcal{E}$ for every $W \in \mathcal{E}$. (iv) Show that $\mathcal{E}$ is an algebra of subsets of $X$. (v) Show that for any $W \in \Lambda$, $\epsilon > 0$ there is a $V \in \mathcal{E}$ such that $\lambda(W \triangle V) \le \epsilon^2$. (vi) Show that for any $W \in \Lambda$, $\epsilon > 0$ there are disjoint measurable cylinders $C_0, \ldots, C_n$ such that $\lambda(W \cap C_j) \ge (1 - \epsilon)\lambda C_j$ for every $j$ and $\lambda(W \setminus \bigcup_{j\le n} C_j) \le \epsilon$. (Hint: select the $C_j$ from the measurable cylinders composing a set $V$ as in (v).) (vii) Show that if $f$, $g$ are λ-integrable functions and $\int_C f \le \int_C g$ for every measurable cylinder $C \subseteq X$, then $f \le_{\text{a.e.}} g$. (Hint: show that $\int_W f \le \int_W g$ for every $W \in \Lambda$.)

> (b) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, with product $(X, \Lambda, \lambda)$. Show that the outer measure $\lambda^*$ defined by $\lambda$ is exactly the outer measure $\theta$ described in 254A, that is, that $\theta$ is a regular outer measure.

(c) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, with product $(X, \Lambda, \lambda)$. Write $\lambda_0$ for the restriction of $\lambda$ to $\widehat{\bigotimes}_{i\in I}\Sigma_i$, and $\mathcal{C}$ for the family of measurable cylinders in $X$. Suppose that $(Y, \mathrm{T}, \nu)$ is a probability space and $\phi : Y \to X$ a function. (i) Show that $\phi$ is inverse-measure-preserving when regarded as a function from $(Y, \mathrm{T}, \nu)$ to $(X, \widehat{\bigotimes}_{i\in I}\Sigma_i, \lambda_0)$ iff $\phi^{-1}[C]$ belongs to $\mathrm{T}$ and $\nu\phi^{-1}[C] = \lambda_0 C$ for every $C \in \mathcal{C}$. (ii) Show that $\lambda_0$ is the only measure on $X$ with this property. (Hint: 136C.)

> (d) Let $I$ be a set and $(Y, \mathrm{T}, \nu)$ a complete probability space. Show that a function $\phi : Y \to \{0, 1\}^I$ is inverse-measure-preserving for $\nu$ and the usual measure on $\{0, 1\}^I$ iff $\nu\{y : \phi(y)(i) = 1$ for every $i \in J\} = 2^{-\#(J)}$ for every finite $J \subseteq I$.

> (e) Let $I$ be any set and $\lambda$ the usual measure on $X = \{0, 1\}^I$. Define addition on $X$ by setting $(x + y)(i) = x(i) +_2 y(i)$ for every $i \in I$, $x, y \in X$, where $0 +_2 0 = 1 +_2 1 = 0$, $0 +_2 1 = 1 +_2 0 = 1$. (i) Show that for any $y \in X$, the map $x \mapsto x + y : X \to X$ is inverse-measure-preserving. (Hint: Use 254G.) (ii) Show that the map $(x, y) \mapsto x + y : X \times X \to X$ is inverse-measure-preserving, if $X \times X$ is given its product measure.

> (f) Let $I$ be any set and $\lambda$ the usual measure on $\mathcal{P}I$. (i) Show that the map $a \mapsto a \triangle b : \mathcal{P}I \to \mathcal{P}I$ is inverse-measure-preserving for any $b \subseteq I$; in particular, $a \mapsto I \setminus a$ is inverse-measure-preserving. (ii) Show that the map $(a, b) \mapsto a \triangle b : \mathcal{P}I \times \mathcal{P}I \to \mathcal{P}I$ is inverse-measure-preserving.

> (g) Show that for any $q \in [0, 1]$ and any set $I$ there is a measure $\lambda$ on $\mathcal{P}I$ such that $\lambda\{a : J \subseteq a\} = q^{\#(J)}$ for every finite $J \subseteq I$.

> (h) Let $(Y, \mathrm{T}, \nu)$ be a complete probability space, and write $\mu$ for Lebesgue measure on $[0, 1]$. Suppose that $\phi : Y \to [0, 1]$ is a function such that $\nu\phi^{-1}[I]$ exists and is equal to $\mu I$ for every interval $I$ of the form $[2^{-n}k, 2^{-n}(k+1)]$, where $n \in \mathbb{N}$ and $0 \le k < 2^n$. Show that $\phi$ is inverse-measure-preserving for $\nu$ and $\mu$.

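(i) Let $\langle X_i\rangle_{i\in I}$ be a family of sets, and for each $i \in I$ let $\Sigma_i$ be a σ-algebra of subsets of $X_i$. Show that for every $E \in \widehat{\bigotimes}_{i\in I}\Sigma_i$ there is a countable set $J \subseteq I$ such that $E$ is expressible as $\pi_J^{-1}[F]$ for some $F \in \widehat{\bigotimes}_{i\in J}\Sigma_i$, writing $\pi_J(x) = x{\restriction}J \in \prod_{i\in J} X_i$ for $x \in \prod_{i\in I} X_i$.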
(j)(i) Let ν be the usual measure on X = {0, 1}N . Show that for any k ≥ 1, (X, ν) is isomorphic to (X k , νk ), where
νk is the measure on X k which is the product measure obtained by giving each factor X the measure ν. (ii) Writing
µ[0,1] for Lebesgue measure on [0, 1], etc., show that for any k ≥ 1, ([0, 1]k , µ[0,1]k ) is isomorphic to ([0, 1], µ[0,1] ).

(k)(i) Writing $\mu_{[0,1]}$ for Lebesgue measure on $[0, 1]$, etc., show that $([0, 1], \mu_{[0,1]})$ is isomorphic to $([0, 1[, \mu_{[0,1[})$. (ii) Show that for any $k \ge 1$, $([0, 1[^k, \mu_{[0,1[^k})$ is isomorphic to $([0, 1[, \mu_{[0,1[})$. (iii) Show that for any $k \ge 1$, $(\mathbb{R}, \mu_{\mathbb{R}})$ is isomorphic to $(\mathbb{R}^k, \mu_{\mathbb{R}^k})$.

(l) Let µ be Lebesgue measure on [0, 1] and λ the product measure on [0, 1]N . Show that ([0, 1], µ) and ([0, 1]N , λ)
are isomorphic.
(m) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of complete probability spaces and $\lambda$ the product measure on $\prod_{i\in I} X_i$, with domain $\Lambda$. Suppose that $A_i \subseteq X_i$ for each $i \in I$. Show that $\prod_{i\in I} A_i \in \Lambda$ iff either (i) $\prod_{i\in I} \mu^*_i A_i = 0$ or (ii) $A_i \in \Sigma_i$ for every $i$ and $\{i : A_i \ne X_i\}$ is countable. (Hint: assemble ideas from 252Xc, 254F, 254L and 254N.)

(n) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X, \Lambda, \lambda)$. (i) Show that, for any $A \subseteq X$,
$$\lambda^* A = \min\{\lambda^*_J \pi_J[A] : J \subseteq I \text{ is countable}\},$$
where for $J \subseteq I$ I write $\lambda_J$ for the product probability measure on $X_J = \prod_{i\in J} X_i$ and $\pi_J : X \to X_J$ for the canonical map. (ii) Show that if $J, K \subseteq I$ are disjoint and $A, B \subseteq X$ are determined by coordinates in $J$, $K$ respectively, then $\lambda^*(A \cap B) = \lambda^* A \cdot \lambda^* B$.

(o) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X, \Lambda, \lambda)$. Let $S$ be the linear span of the set of characteristic functions of measurable cylinders in $X$, as in 254Q. Show that $\{f^\bullet : f \in S\}$ is dense in $L^p(\lambda)$ for every $p \in [1, \infty[$.

(p) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $(X, \Lambda, \lambda)$ their product; for $J \subseteq I$ let $\Lambda_J$ be the σ-algebra of members of $\Lambda$ determined by coordinates in $J$ and $P_J : L^1 = L^1(\lambda) \to L^1_J = L^1(\lambda{\restriction}\Lambda_J)$ the corresponding conditional expectation. (i) Show that if $u \in L^1_J$ and $v \in L^1_{I\setminus J}$ then $u \times v \in L^1$ and $\int u \times v = \int u \cdot \int v$. (Hint: 253D.) (ii) Show that if $u \in L^1$ then $u \in L^1_J$ iff $\int_C u = \lambda C \cdot \int u$ for every measurable cylinder $C \subseteq X$ which is determined by coordinates in $I \setminus J$. (Hint: 254Xa(vii).) (iii) Show that if $\mathcal{J} \subseteq \mathcal{P}I$ is non-empty, with $J^* = \bigcap\mathcal{J}$, then $L^1_{J^*} = \bigcap_{J\in\mathcal{J}} L^1_J$.

(q)(i) Let $I$ be any set and $\lambda$ the usual measure on $\mathcal{P}I$. Let $A \subseteq \mathcal{P}I$ be such that $a \triangle b \in A$ whenever $a \in A$ and $b \subseteq I$ is finite. Show that $\lambda^* A$ must be either 0 or 1. (ii) Let $\lambda$ be the usual measure on $\{0, 1\}^{\mathbb{N}}$, and $\Lambda$ its domain. Let $f : \{0, 1\}^{\mathbb{N}} \to \mathbb{R}$ be a function such that, for $x, y \in \{0, 1\}^{\mathbb{N}}$, $f(x) = f(y) \iff \{n : n \in \mathbb{N},\ x(n) \ne y(n)\}$ is finite. Show that $f$ is not Λ-measurable. (Hint: for any $q \in \mathbb{Q}$, $\lambda^*\{x : f(x) \le q\}$ is either 0 or 1.)
(r) Let $\langle X_i\rangle_{i\in I}$ be any family of sets and $A \subseteq B \subseteq \prod_{i\in I} X_i$. Suppose that $A$ is determined by coordinates in $J \subseteq I$ and that $B$ is determined by coordinates in $K$. Show that there is a set $C$ such that $A \subseteq C \subseteq B$ and $C$ is determined by coordinates in $J \cap K$.

(s) Show that if φ̃ : {0, 1}N → [0, 1] is any bijection constructed by the method of 254K, then {φ̃−1 [E] : E ⊆ [0, 1] is
a Borel set} is just the σ-algebra of subsets of {0, 1}N generated by the sets {x : x(i) = 1} for i ∈ N.

254Y Further exercises (a) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, and for $J \subseteq I$ let $\lambda_J$ be the product measure on $X_J = \prod_{i\in J} X_i$; write $X = X_I$, $\lambda = \lambda_I$ and $\pi_J(x) = x{\restriction}J$ for $x \in X$ and $J \subseteq I$.
(i) Show that for $K \subseteq J \subseteq I$ we have a natural linear, order-preserving and norm-preserving map $T_{JK} : L^1(\lambda_K) \to L^1(\lambda_J)$ defined by writing $T_{JK}(f^\bullet) = (f\pi_{KJ})^\bullet$ for every $\lambda_K$-integrable function $f$, where $\pi_{KJ}(y) = y{\restriction}K$ for $y \in X_J$.
(ii) Write $\mathcal{K}$ for the set of finite subsets of $I$. Show that if $W$ is any Banach space and $\langle T_K\rangle_{K\in\mathcal{K}}$ is a family such that (α) $T_K$ is a bounded linear operator from $L^1(\lambda_K)$ to $W$ for every $K \in \mathcal{K}$ (β) $T_K = T_J T_{JK}$ whenever $K \subseteq J \in \mathcal{K}$ (γ) $\sup_{K\in\mathcal{K}} \|T_K\| < \infty$, then there is a unique bounded linear operator $T : L^1(\lambda) \to W$ such that $T_K = T T_{IK}$ for every $K \in \mathcal{K}$.
(iii) Write $\mathcal{J}$ for the set of countable subsets of $I$. Show that $L^1(\lambda) = \bigcup_{J\in\mathcal{J}} T_{IJ}[L^1(\lambda_J)]$.
(b) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $\lambda$ a complete measure on $X = \prod_{i\in I} X_i$. Suppose that for every complete probability space $(Y, \mathrm{T}, \nu)$ and function $\phi : Y \to X$, $\phi$ is inverse-measure-preserving for $\nu$ and $\lambda$ iff $\nu\phi^{-1}[C]$ is defined and equal to $\theta_0 C$ for every measurable cylinder $C \subseteq X$, writing $\theta_0$ for the functional of 254A. Show that $\lambda$ is the product measure on $X$.
(c) Let I be a set, and λ the usual measure on {0, 1}I . Show that L1 (λ) is separable, in its norm topology, iff I is
countable.

(d) Let I be a set, and λ the usual measure on PI. Show that if F is a non-principal ultrafilter on I then λ∗ F = 1.
(Hint: 254Xq, 254Xf.)

(e) Let (X, Σ, µ), (Y, T, ν) and λ be as in 254U. Set A = {fγ : γ ∈ C} as defined in 216E. Let µA be the subspace measure on A, and λ̃ the c.l.d. product measure of µA and ν on A × Y . Show that λ̃ is a proper extension of the subspace measure λA×Y . (Hint: consider W̃ = {(fγ , y) : γ ∈ C, y ∈ Fγ }, in the notation of 254U.)

(f ) Let (X, Σ, µ) be an atomless probability space, I a set with cardinal at most #(X), and A the set of injective
functions from I to X. Show that A has full outer measure for the product measure on X I .

254 Notes and comments While there are many reasons for studying infinite products of probability spaces, one
stands pre-eminent, from the point of view of abstract measure theory: they provide constructions of essentially new
kinds of measure space. I cannot describe the nature of this ‘newness’ effectively without venturing into the territory
of Volume 3. But the function spaces of Chapter 24 do give at least a form of words we can use: these are the first
probability spaces (X, Λ, λ) we have seen for which L1 (λ) need not be separable for its norm topology (254Yc).
The formulae of 254A, like those of 251A, lead very naturally to measures; the point at which they become more
than a curiosity is when we find that the product measure λ is a probability measure (254Fa), which must be regarded
as the crucial argument of this section, just as 251E is the essential basis of §251. It is I think remarkable that it
makes no difference to the result here whether I is finite, countably infinite or uncountable. If you write out the
proof for the case I = N, it will seem natural to expand the sets Jn until they are initial segments of I itself, thereby
avoiding altogether the auxiliary set K; but this is a misleading simplification, because it hides an essential feature of
the argument, which is that any sequence in C involves only countably many coordinates, so that as long as we are
dealing with only one such sequence the uncountability of the whole set I is irrelevant. This general principle naturally
permeates the whole of the section; in 254O I have tried to spell out the way in which many of the questions we are
interested in can be expressed in terms of countable subproducts of the factor spaces Xi . See also the exercises 254Xi,
254Xm and 254Ya(iii).
There is a slightly paradoxical side to this principle: even the best-behaved subsets $E_i$ of $X_i$ may fail to have measurable products $\prod_{i\in I} E_i$ if $E_i \ne X_i$ for uncountably many $i$. For instance, $]0, 1[^I$ is not a measurable subset of $[0, 1]^I$ if $I$ is uncountable (254Xm). It has full outer measure and its own product measure is just the subspace measure (254L), but any measurable subset must have measure zero. The point is that the empty set is the only member of $\widehat{\bigotimes}_{i\in I}\Sigma_i$, where $\Sigma_i$ is the algebra of Lebesgue measurable subsets of $[0, 1]$ for each $i$, which is included in $]0, 1[^I$ (see 254Xi).
As in §251, I use a construction which automatically produces a complete measure on the product space. I am sure
that this is the best choice for ‘the’ product measure. But there are occasions when its restriction to the σ-algebra
generated by the measurable cylinders is worth looking at; see 254Xc.
Lemma 254G is a result of a type which will be commoner in Volume 3 than in the present volume. It describes the
product measure in terms not of what it is but of what it does; specifically, in terms of a property of the associated
family of inverse-measure-preserving functions. It is therefore a ‘universal mapping theorem’. (Compare 253F.) Because
this description is sufficient to determine the product measure completely (254Yb), it is not surprising that I use it
repeatedly.
The ‘usual measure’ on {0, 1}I (254J) is sometimes called ‘coin-tossing measure’ because it can be used to model the
concept of tossing a coin arbitrarily many times indexed by the set I, taking an x ∈ {0, 1}I to represent the outcome
in which the coin is ‘heads’ for just those i ∈ I for which x(i) = 1. The sets, or ‘events’, in the class C are just those
which can be specified by declaring the outcomes of finitely many tosses, and the probability of any particular sequence
of n results is $2^{-n}$, regardless of which tosses we look at or in which order. In Chapter 27 I will return to the use of
product measures to represent probabilities involving independent events.
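To make the coin-tossing description concrete, here is a small Python sketch for a finite index set. The helper `cylinder_measure` is hypothetical: it counts the points of {0,1}^N that satisfy finitely many prescribed outcomes, and so illustrates that prescribing n tosses gives measure 2^{-n} whichever tosses are chosen.

```python
from fractions import Fraction
from itertools import product


def cylinder_measure(spec, n_coords):
    """Usual measure on {0,1}^n_coords of {x : x(i) == b for every (i, b) in spec}."""
    points = list(product((0, 1), repeat=n_coords))
    hits = sum(1 for x in points if all(x[i] == b for i, b in spec.items()))
    return Fraction(hits, len(points))


print(cylinder_measure({0: 1, 3: 0, 4: 1}, 6))  # three prescribed tosses
print(cylinder_measure({5: 0, 1: 1, 2: 1}, 6))  # a different choice of three tosses
```

Enumeration only works for a finite index set; for infinite I the same values are supplied by the product measure construction of 254A-254F.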
In 254K I come to the first case in this treatise of a non-trivial isomorphism between two measure spaces. If you
have been brought up on a conventional diet of modern abstract pure mathematics based on algebra and topology,
you may already have been struck by the absence of emphasis on any concept of ‘homomorphism’ or ‘isomorphism’.
Here indeed I start to speak of ‘isomorphisms’ between measure spaces without even troubling to define them; I hope
it really is obvious that an isomorphism between measure spaces $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ is a bijection $\phi : X \to Y$ such
that $\mathrm{T} = \{F : F \subseteq Y,\ \phi^{-1}[F] \in \Sigma\}$ and $\nu F = \mu\phi^{-1}[F]$ for every $F \in \mathrm{T}$, so that $\Sigma$ is necessarily $\{E : E \subseteq X,\ \phi[E] \in \mathrm{T}\}$
and $\mu E = \nu\phi[E]$ for every $E \in \Sigma$. Put like this, you may, if you worked through the exercises of Volume 1, be reminded
of some constructions of σ-algebras in 111Xc-111Xd and of the ‘image measures’ in 234C-234D. The result in 254K
(see also 134Yo) naturally leads to two distinct notions of ‘homomorphism’ between two measure spaces $(X, \Sigma, \mu)$ and
$(Y, \mathrm{T}, \nu)$:
(i) a function $\phi : X \to Y$ such that $\phi^{-1}[F] \in \Sigma$ and $\mu\phi^{-1}[F] = \nu F$ for every $F \in \mathrm{T}$,
(ii) a function $\phi : X \to Y$ such that $\phi[E] \in \mathrm{T}$ and $\nu\phi[E] = \mu E$ for every $E \in \Sigma$.
On either definition, we find that a bijection $\phi : X \to Y$ is an isomorphism iff $\phi$ and $\phi^{-1}$ are both homomorphisms.
(Also, of course, the composition of homomorphisms will be a homomorphism.) My own view is that (i) is the more
important, and in this treatise I study such functions at length, calling them ‘inverse-measure-preserving’. But both
have their uses. The function $\phi$ of 254K not only satisfies both definitions, but is also ‘nearly’ an isomorphism in several
different ways, of which possibly the most important is that there are conegligible sets $X' \subseteq \{0,1\}^{\mathbb{N}}$, $Y' \subseteq [0,1]$ such
that $\phi{\restriction}X'$ is an isomorphism between $X'$ and $Y'$ when both are given their subspace measures.
Having once established the isomorphism between $[0,1]$ and $\{0,1\}^{\mathbb{N}}$, we are led immediately to many more; see
254Xj-254Xl. In fact Lebesgue measure on [0, 1] is isomorphic to a large proportion of the probability spaces arising in
applications. In Volumes 3 and 4 I will discuss these isomorphisms at length.
The general notion of ‘subproduct’ is associated with some of the deepest and most characteristic results in the theory
of product measures. Because we are looking at products of arbitrary families of probability spaces, the definition must
ignore any possible structure in the index set I of 254A-254C. But many applications, naturally enough, deal with index
sets with favoured subsets or partitions, and the first essential step is the ‘associative law’ (254N; compare 251Xe-251Xf
and 251Wh). This is, for instance, the tool by which we can apply Fubini’s theorem within infinite products. The
natural projection maps from $\prod_{i\in I} X_i$ to $\prod_{i\in J} X_i$, where $J \subseteq I$, are related in a way which has already been used
as the basis of theorems in §235; the product measure on $\prod_{i\in J} X_i$ is precisely the image of the product measure on
$\prod_{i\in I} X_i$ (254Oa). In 254O-254Q I explore the consequences of this fact and the fact already noted that all measurable
sets in the product are ‘essentially’ determined by coordinates in some countable set.
In 254R I go more deeply into this notion of a set $W \subseteq \prod_{i\in I} X_i$ ‘determined by coordinates in’ a set $J \subseteq I$. In its
primitive form this is a purely set-theoretic notion (254M, 254Ta). I think that even a three-element set $I$ can give
us surprises; I invite you to try to visualize subsets of $[0,1]^3$ which are determined by pairs of coordinates. But the
interactions of this with measure-theoretic ideas, and in particular with a willingness to add or discard negligible sets,
lead to much more, and in particular to the unique minimal sets of coordinates associated with measurable sets and
functions (254R). Of course these results can be elegantly and effectively described in terms of $L^1$ and $L^0$ spaces, in
which negligible sets are swept out of sight as the spaces are constructed. The basis of all this is the fact that the
conditional expectation operators associated with subproducts multiply together in the simplest possible way (254Ra);
but some further idea is needed to show that if $\mathcal{J}$ is a non-empty family of subsets of $I$, then $L^0_{\bigcap\mathcal{J}} = \bigcap_{J\in\mathcal{J}} L^0_J$ (see
part (b) of the proof of 254R, and 254Xp(iii)).
254Sa is a version of the ‘zero-one law’ (272O below). 254Sb is a strong version of the principle that measurable sets
in a product must be approximable by sets determined by a finite set of coordinates (254Fe, 254Qa, 254Xa). Evidently
it is not a coincidence that the set W of 254Tb is negligible. In §272 I will revisit many of the ideas of 254R-254S and
254Xp, in particular, in the more general context of ‘independent σ-algebras’.
Finally, 254U and 254Ye hardly belong to this section at all; they are unfinished business from §251. They are
here because the construction of 254A-254C is the simplest way to produce an adequately complex probability space
(Y, T, ν).

255 Convolutions of functions


I devote a section to a construction which is of great importance – and will in particular be very useful in Chapters
27 and 28 – and may also be regarded as a series of exercises on the work so far.
I find it difficult to know how much repetition to indulge in in this section, because the natural unified expression
of the ideas is in the theory of topological groups, and I do not think we are yet ready for the general theory (I will
come to it in Chapter 44 in Volume 4). The groups we need for this volume are
$\mathbb{R}$;
$\mathbb{R}^r$, for $r \ge 2$;
$S^1 = \{z : z \in \mathbb{C},\ |z| = 1\}$, the ‘circle group’;
$\mathbb{Z}$, the group of integers.
All the ideas already appear in the theory of convolutions on $\mathbb{R}$, and I will therefore present this material in relatively
detailed form, before sketching the forms appropriate to the groups $\mathbb{R}^r$ and $S^1$ (or $]-\pi,\pi]$); $\mathbb{Z}$ can I think be safely left
to the exercises.

255A This being a book on measure theory, it is perhaps appropriate for me to emphasize, as the basis of the
theory of convolutions, certain measure space isomorphisms.
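Theorem Let $\mu$ be Lebesgue measure on $\mathbb{R}$ and $\mu_2$ Lebesgue measure on $\mathbb{R}^2$; write $\Sigma$, $\Sigma_2$ for their domains.
(a) For any $a \in \mathbb{R}$, the map $x \mapsto a + x : \mathbb{R} \to \mathbb{R}$ is a measure space automorphism of $(\mathbb{R}, \Sigma, \mu)$.
(b) The map $x \mapsto -x : \mathbb{R} \to \mathbb{R}$ is a measure space automorphism of $(\mathbb{R}, \Sigma, \mu)$.
(c) For any $a \in \mathbb{R}$, the map $x \mapsto a - x : \mathbb{R} \to \mathbb{R}$ is a measure space automorphism of $(\mathbb{R}, \Sigma, \mu)$.
(d) The map $(x, y) \mapsto (x + y, y) : \mathbb{R}^2 \to \mathbb{R}^2$ is a measure space automorphism of $(\mathbb{R}^2, \Sigma_2, \mu_2)$.
(e) The map $(x, y) \mapsto (x - y, y) : \mathbb{R}^2 \to \mathbb{R}^2$ is a measure space automorphism of $(\mathbb{R}^2, \Sigma_2, \mu_2)$.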
Remark I ought to remark that (b), (d) and (e) may be regarded as simple special cases of Theorem 263A in the
next chapter. I nevertheless feel that it is worth writing out separate proofs here, partly because the general case
of linear operators dealt with in 263A requires some extra machinery not needed here, but more because the result
here has nothing to do with the linear structure of R and R 2 ; it is exclusively dependent on the group structure of R,
together with the links between its topology and measure, and the arguments I give now are adaptable to the proper
generalizations to abelian topological groups.
proof (a) This is just the translation-invariance of Lebesgue measure, dealt with in §134. There I showed that if $E \in \Sigma$
then $E + a \in \Sigma$ and $\mu(E + a) = \mu E$ (134Ab); that is, writing $\phi(x) = x + a$, $\mu(\phi[E])$ exists and is equal to $\mu E$ for every
$E \in \Sigma$. But of course we also have
$$\mu(\phi^{-1}[E]) = \mu(E + (-a)) = \mu E$$
for every $E \in \Sigma$, so $\phi$ is an automorphism.
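(b) The point is that $\mu^*(A) = \mu^*(-A)$ for every $A \subseteq \mathbb{R}$. $\mathbf{P}$ (I follow the definitions of Volume 1.) If $\epsilon > 0$, there
is a sequence $\langle I_n \rangle_{n\in\mathbb{N}}$ of half-open intervals covering $A$ with $\sum_{n=0}^{\infty} \mu I_n \le \mu^*A + \epsilon$. Now $-A \subseteq \bigcup_{n\in\mathbb{N}} (-I_n)$. But if
$I_n = [a_n, b_n[$ then $-I_n = ]-b_n, -a_n]$, so
$$\mu^*(-A) \le \sum_{n=0}^{\infty} \mu(-I_n) = \sum_{n=0}^{\infty} \max(0, -a_n - (-b_n)) = \sum_{n=0}^{\infty} \mu I_n \le \mu^*A + \epsilon.$$
As $\epsilon$ is arbitrary, $\mu^*(-A) \le \mu^*A$. Also of course $\mu^*A = \mu^*(-(-A)) \le \mu^*(-A)$, so $\mu^*(-A) = \mu^*A$. $\mathbf{Q}$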
This means that, setting $\phi(x) = -x$ this time, $\phi$ is an automorphism of the structure $(\mathbb{R}, \mu^*)$. But since $\mu$ is
defined from $\mu^*$ by the abstract procedure of Carathéodory’s method, $\phi$ must also be an automorphism of the structure
$(\mathbb{R}, \Sigma, \mu)$.
(c) Put (a) and (b) together; $x \mapsto a - x$ is the composition of the automorphisms $x \mapsto -x$ and $x \mapsto a + x$, and the
composition of automorphisms is surely an automorphism.
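(d)(i) Write $\mathrm{T}$ for the set $\{E : E \in \Sigma_2,\ \phi[E] \in \Sigma_2\}$, where this time $\phi(x, y) = (x + y, y)$ for $x, y \in \mathbb{R}$, so that
$\phi : \mathbb{R}^2 \to \mathbb{R}^2$ is a bijection. Then $\mathrm{T}$ is a σ-algebra, being the intersection of the σ-algebras $\Sigma_2$ and $\{E : \phi[E] \in \Sigma_2\} =
\{\phi^{-1}[F] : F \in \Sigma_2\}$. Moreover, $\mu_2 E = \mu_2(\phi[E])$ for every $E \in \mathrm{T}$. $\mathbf{P}$ By 252D, we have
$$\mu_2 E = \int \mu\{x : (x, y) \in E\}\,\mu(dy).$$
But applying the same result to $\phi[E]$ we have
$$\begin{aligned}
\mu_2\phi[E] &= \int \mu\{x : (x, y) \in \phi[E]\}\,\mu(dy) = \int \mu\{x : (x - y, y) \in E\}\,\mu(dy) \\
&= \int \mu(E^{-1}[\{y\}] + y)\,\mu(dy) = \int \mu E^{-1}[\{y\}]\,\mu(dy) \\
&\qquad\text{(because Lebesgue measure is translation-invariant)} \\
&= \mu_2 E.
\end{aligned}$$
$\mathbf{Q}$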

(ii) Now $\phi$ and $\phi^{-1}$ are clearly continuous, so that $\phi[G]$ is open, and therefore measurable, for every open $G$;
consequently all open sets must belong to $\mathrm{T}$. Because $\mathrm{T}$ is a σ-algebra, it contains all Borel sets. Now let $E$ be any
measurable set. Then there are Borel sets $H_1$, $H_2$ such that $H_1 \subseteq E \subseteq H_2$ and $\mu_2(H_2 \setminus H_1) = 0$ (134Fb). We have
$\phi[H_1] \subseteq \phi[E] \subseteq \phi[H_2]$ and
$$\mu_2(\phi[H_2] \setminus \phi[H_1]) = \mu_2\phi[H_2 \setminus H_1] = \mu_2(H_2 \setminus H_1) = 0.$$
Thus $\phi[E] \setminus \phi[H_1]$ must be negligible, therefore measurable, and $\phi[E] = \phi[H_1] \cup (\phi[E] \setminus \phi[H_1])$ is measurable. This
shows that $\phi[E]$ is measurable whenever $E$ is.

(iii) Repeating the same arguments with $-y$ in the place of $y$, we see that $\phi^{-1}[E]$ is measurable, and $\mu_2\phi^{-1}[E] =
\mu_2 E$, for every $E \in \Sigma_2$. So $\phi$ is an automorphism of the structure $(\mathbb{R}^2, \Sigma_2, \mu_2)$.
(e) Of course this is an immediate corollary either of the proof of (d) or of (d) itself as stated, since $(x, y) \mapsto (x - y, y)$
is just the inverse of $(x, y) \mapsto (x + y, y)$.

255B Corollary (a) If $a \in \mathbb{R}$, then for any complex-valued function $f$ defined on a subset of $\mathbb{R}$
$$\int f(x)\,dx = \int f(a + x)\,dx = \int f(-x)\,dx = \int f(a - x)\,dx$$
in the sense that if one of the integrals exists so do the others, and they are then all equal.
(b) If $f$ is a complex-valued function defined on a subset of $\mathbb{R}^2$, then
$$\int f(x + y, y)\,d(x, y) = \int f(x - y, y)\,d(x, y) = \int f(x, y)\,d(x, y)$$
in the sense that if one of the integrals exists and is finite so do the others, and they are then all equal.

255C Remarks (a) I am not sure whether it ought to be ‘obvious’ that if $(X, \Sigma, \mu)$, $(Y, \mathrm{T}, \nu)$ are measure spaces
and $\phi : X \to Y$ is an isomorphism, then for any function $f$ defined on a subset of $Y$
$$\int f(\phi(x))\,\mu(dx) = \int f(y)\,\nu(dy)$$
in the sense that if one is defined so is the other, and they are then equal. If it is obvious then the obviousness must be
contingent on the nature of the definition of integration: integrability with respect to the measure $\mu$ is something which
depends on the structure $(X, \Sigma, \mu)$ and on no other properties of $X$. If it is not obvious then it is an easy deduction
from Theorem 235A above, applied in turn to $\phi$ and $\phi^{-1}$ and to the real and imaginary parts of $f$. In any case the
isomorphisms of 255A are just those needed to prove 255B.
(b) Note that in 255Bb I write $\int f(x, y)\,d(x, y)$ to emphasize that I am considering the integral of $f$ with respect to
two-dimensional Lebesgue measure. The fact that
$$\int\Big(\int f(x, y)\,dx\Big)dy = \int\Big(\int f(x + y, y)\,dx\Big)dy = \int\Big(\int f(x - y, y)\,dx\Big)dy$$
is actually easier, being an immediate consequence of the equality $\int f(a + x)\,dx = \int f(x)\,dx$. But applications of this
result often depend essentially on the fact that the functions $(x, y) \mapsto f(x + y, y)$, $(x, y) \mapsto f(x - y, y)$ are measurable
as functions of two variables.

(c) I have moved directly to complex-valued functions because these are necessary for the applications in Chapter
28. If however they give you any discomfort, either technically or aesthetically, all the measure-theoretic ideas of this
section are already to be found in the real case, and you may wish at first to read it as if only real numbers were
involved.

255D A further corollary of 255A will be useful.


Corollary Let $f$ be a complex-valued function defined on a subset of $\mathbb{R}$.
(a) If $f$ is measurable, then the functions $(x, y) \mapsto f(x + y)$, $(x, y) \mapsto f(x - y)$ are measurable.
(b) If $f$ is defined almost everywhere in $\mathbb{R}$, then the functions $(x, y) \mapsto f(x + y)$, $(x, y) \mapsto f(x - y)$ are defined almost
everywhere in $\mathbb{R}^2$.
proof Writing $g_1(x, y) = f(x + y)$, $g_2(x, y) = f(x - y)$ whenever these are defined, we have
$$g_1(x, y) = (f \otimes \chi\mathbb{R})(\phi(x, y)), \qquad g_2(x, y) = (f \otimes \chi\mathbb{R})(\phi^{-1}(x, y)),$$
writing $\phi(x, y) = (x + y, y)$ as in 255A(d-e), and $(f \otimes \chi\mathbb{R})(x, y) = f(x)$, following the notation of 253B. By 253C,
$f \otimes \chi\mathbb{R}$ is measurable if $f$ is, and defined almost everywhere if $f$ is. Because $\phi$ is a measure space automorphism,
$(f \otimes \chi\mathbb{R})\phi = g_1$ and $(f \otimes \chi\mathbb{R})\phi^{-1} = g_2$ are measurable, or defined almost everywhere, if $f$ is.

255E The basic formula Let $f$ and $g$ be measurable complex-valued functions defined almost everywhere in $\mathbb{R}$.
Write $f * g$ for the function defined by the formula
$$(f * g)(x) = \int f(x - y)g(y)\,dy$$
whenever the integral exists (with respect to Lebesgue measure, naturally) as a complex number. Then $f * g$ is the
convolution of the functions $f$ and $g$.
Observe that $\operatorname{dom}(|f| * |g|) = \operatorname{dom}(f * g)$, and that $|f * g| \le |f| * |g|$ everywhere on their common domain, for all $f$
and $g$.
Remark Note that I am here prepared to contemplate the convolution of $f$ and $g$ for arbitrary members of $\mathcal{L}^0_{\mathbb{C}}$, the
space of almost-everywhere-defined measurable complex-valued functions, even though the domain of $f * g$ may be
empty.
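
As an informal numerical companion to the basic formula, the following Python sketch approximates $(f * g)(x) = \int f(x - y)g(y)\,dy$ by a midpoint Riemann sum and compares it with a closed form in the case $f = g = \chi[0,1]$, for which $(f * g)(x) = \max(0, 1 - |x - 1|)$. The function names, the integration window and the number of cells are assumptions made for this illustration only; a Riemann sum is of course not the Lebesgue integral, merely an approximation adequate for such simple functions.

```python
def convolve_at(f, g, x, lo=-1.0, hi=3.0, n=40_000):
    """Midpoint-rule approximation to the integral of f(x - y) g(y) dy over [lo, hi].

    Assumes g vanishes outside [lo, hi], so the truncated integral equals (f*g)(x).
    """
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * step
        total += f(x - y) * g(y)
    return total * step


def unit_indicator(t):
    """Indicator function of [0, 1]."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0


def triangle(x):
    """Closed form of (unit_indicator * unit_indicator)(x)."""
    return max(0.0, 1.0 - abs(x - 1.0))


samples = [-0.5, 0.25, 0.5, 1.0, 1.5, 2.5]
errors = [abs(convolve_at(unit_indicator, unit_indicator, x) - triangle(x))
          for x in samples]
```

Here the convolution of two discontinuous functions is continuous, a small instance of the smoothing behaviour made precise in 255K.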

255F Elementary properties (a) Because integration is linear, we surely have
$$((f_1 + f_2) * g)(x) = (f_1 * g)(x) + (f_2 * g)(x),$$
$$(f * (g_1 + g_2))(x) = (f * g_1)(x) + (f * g_2)(x),$$
$$(cf * g)(x) = (f * cg)(x) = c(f * g)(x)$$
whenever the right-hand sides of the formulae are defined.

(b) If $f$ and $g$ are measurable complex-valued functions defined almost everywhere in $\mathbb{R}$, then $f * g = g * f$, in the
strict sense that they have the same domain and the same value at each point of that common domain.
$\mathbf{P}$ Take $x \in \mathbb{R}$ and apply 255Ba to see that
$$(f * g)(x) = \int f(x - y)g(y)\,dy = \int f(x - (x - y))g(x - y)\,dy = \int f(y)g(x - y)\,dy = (g * f)(x)$$
if either is defined. $\mathbf{Q}$

(c) If $f_1$, $f_2$, $g_1$, $g_2$ are measurable complex-valued functions defined almost everywhere in $\mathbb{R}$, $f_1 =_{\text{a.e.}} f_2$ and
$g_1 =_{\text{a.e.}} g_2$, then $f_1 * g_1 = f_2 * g_2$. $\mathbf{P}$ For every $x \in \mathbb{R}$ we shall have $f_1(x - y) = f_2(x - y)$ for almost every $y \in \mathbb{R}$, by
255Ac. Consequently $f_1(x - y)g_1(y) = f_2(x - y)g_2(y)$ for almost every $y$, and $(f_1 * g_1)(x) = (f_2 * g_2)(x)$ in the sense
that if one of these is defined so is the other, and they are then equal. $\mathbf{Q}$
It follows that if $u$, $v \in L^0_{\mathbb{C}}$, then we have a function $\theta(u, v)$ which is equal to $f * g$ whenever $f$, $g \in \mathcal{L}^0_{\mathbb{C}}$ are such that
$f^\bullet = u$ and $g^\bullet = v$. Observe that $\theta(u, v) = \theta(v, u)$, and that $\theta(u_1 + u_2, v)$ extends $\theta(u_1, v) + \theta(u_2, v)$, $\theta(cu, v)$ extends
$c\theta(u, v)$ for all $u$, $u_1$, $u_2$, $v \in L^0_{\mathbb{C}}$ and $c \in \mathbb{C}$.

255G I have grouped 255Fa-255Fc together because they depend only on ideas up to and including 255Ac and
255Ba. Using the second halves of 255A and 255B we get much deeper. I begin with what seems to be the fundamental
result.
Theorem Let $f$, $g$ and $h$ be measurable complex-valued functions defined almost everywhere in $\mathbb{R}$.
(a) Suppose that $\int h(x + y)f(x)g(y)\,d(x, y)$ exists in $\mathbb{C}$. Then
$$\begin{aligned}
\int h(x)(f * g)(x)\,dx &= \int h(x + y)f(x)g(y)\,d(x, y) \\
&= \iint h(x + y)f(x)g(y)\,dx\,dy = \iint h(x + y)f(x)g(y)\,dy\,dx
\end{aligned}$$
provided that in the expression $h(x)(f * g)(x)$ we interpret the product as $0$ if $h(x) = 0$ and $(f * g)(x)$ is undefined.
(b) If, on a similar interpretation of $|h(x)|(|f| * |g|)(x)$, the integral $\int |h(x)|(|f| * |g|)(x)\,dx$ is finite, then
$\int h(x + y)f(x)g(y)\,d(x, y)$ exists in $\mathbb{C}$.
proof Consider the functions
$$k_1(x, y) = h(x)f(x - y)g(y), \qquad k_2(x, y) = h(x + y)f(x)g(y)$$
wherever these are defined. 255D tells us that $k_1$ and $k_2$ are measurable and defined almost everywhere. Now setting
$\phi(x, y) = (x + y, y)$, we have $k_2 = k_1\phi$, so that
$$\int k_1(x, y)\,d(x, y) = \int k_2(x, y)\,d(x, y)$$
if either exists, by 255Bb.
If
$$\int h(x + y)f(x)g(y)\,d(x, y) = \int k_2$$
exists, then by Fubini’s theorem we have
$$\int k_2 = \int k_1(x, y)\,d(x, y) = \int\Big(\int h(x)f(x - y)g(y)\,dy\Big)dx$$
so $\int h(x)f(x - y)g(y)\,dy$ exists almost everywhere, that is, $(f * g)(x)$ exists for almost every $x$ such that $h(x) \ne 0$; on
the interpretation I am using here, $h(x)(f * g)(x)$ exists almost everywhere, and
$$\begin{aligned}
\int h(x)(f * g)(x)\,dx &= \int\Big(\int h(x)f(x - y)g(y)\,dy\Big)dx = \int k_1 \\
&= \int k_2 = \int h(x + y)f(x)g(y)\,d(x, y) \\
&= \iint h(x + y)f(x)g(y)\,dx\,dy = \iint h(x + y)f(x)g(y)\,dy\,dx
\end{aligned}$$
by Fubini’s theorem again.


If (on the same interpretation) $|h| \times (|f| * |g|)$ is integrable,
$$|k_1(x, y)| = |h(x)||f(x - y)||g(y)|$$
is measurable, and
$$\iint |h(x)||f(x - y)||g(y)|\,dy\,dx = \int |h(x)|(|f| * |g|)(x)\,dx$$
is finite, so by Tonelli’s theorem (252G, 252H) $k_1$ and $k_2$ are integrable.

255H Certain standard results are now easy.

Corollary If $f$, $g$ are complex-valued functions which are integrable over $\mathbb{R}$, then $f * g$ is integrable, with
$$\int f * g = \int f \int g, \qquad \int |f * g| \le \int |f| \int |g|.$$

proof In 255G, set $h(x) = 1$ for every $x \in \mathbb{R}$; then
$$\int h(x + y)f(x)g(y)\,d(x, y) = \int f(x)g(y)\,d(x, y) = \int f \int g$$
by 253D, so
$$\int f * g = \int h(x)(f * g)(x)\,dx = \int h(x + y)f(x)g(y)\,d(x, y) = \int f \int g,$$
as claimed. Now
$$\int |f * g| \le \int |f| * |g| = \int |f| \int |g|.$$
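
The two formulae of 255H can be checked numerically on a simple pair of integrable functions. The Python sketch below is only an illustration under stated assumptions: the choices $f(t) = (1 + t)e^{-t^2}$ and $g(t) = e^{-|t|}$, the window $[-12, 12]$, the grid size and all names are made up for this example, and integrals are replaced by midpoint Riemann sums.

```python
import math


def midpoint_grid(lo, hi, n):
    """Midpoints of n equal cells of [lo, hi], together with the cell width."""
    step = (hi - lo) / n
    return [lo + (k + 0.5) * step for k in range(n)], step


def f(t):
    return (1.0 + t) * math.exp(-t * t)   # integrable, changes sign at t = -1


def g(t):
    return math.exp(-abs(t))              # integrable, non-negative


pts, step = midpoint_grid(-12.0, 12.0, 480)
g_vals = [g(y) for y in pts]

# (f*g)(x) at each grid point x, by a Riemann sum in y.
conv = [sum(f(x - y) * gy for y, gy in zip(pts, g_vals)) * step for x in pts]

int_f = sum(f(t) for t in pts) * step
int_g = sum(g_vals) * step
int_abs_f = sum(abs(f(t)) for t in pts) * step
int_conv = sum(conv) * step
int_abs_conv = sum(abs(c) for c in conv) * step
```

Up to discretization and truncation error, `int_conv` should agree with `int_f * int_g`, and `int_abs_conv` should not exceed `int_abs_f * int_g`; the second comparison is a genuine inequality here because $f$ takes both signs.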

255I Corollary For any measurable complex-valued functions $f$, $g$ defined almost everywhere in $\mathbb{R}$, $f * g$ is
measurable and has measurable domain.

proof Set $f_n(x) = f(x)$ if $x \in \operatorname{dom} f$, $|x| \le n$ and $|f(x)| \le n$, and $0$ elsewhere in $\mathbb{R}$; define $g_n$ similarly from $g$.
Then $f_n$ and $g_n$ are integrable, $|f_n| \le |f|$ and $|g_n| \le |g|$ almost everywhere, $f =_{\text{a.e.}} \lim_{n\to\infty} f_n$ and $g =_{\text{a.e.}} \lim_{n\to\infty} g_n$.
Consequently, by Lebesgue’s Dominated Convergence Theorem,
$$\begin{aligned}
(f * g)(x) &= \int f(x - y)g(y)\,dy = \int \lim_{n\to\infty} f_n(x - y)g_n(y)\,dy \\
&= \lim_{n\to\infty} \int f_n(x - y)g_n(y)\,dy = \lim_{n\to\infty} (f_n * g_n)(x)
\end{aligned}$$
for every $x \in \operatorname{dom}(f * g)$. But $f_n * g_n$ is integrable, therefore measurable, for every $n$, so that $f * g$ must be measurable.
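As for the domain of $f * g$,
$$\begin{aligned}
x \in \operatorname{dom}(f * g) &\iff \int f(x - y)g(y)\,dy \text{ is defined in } \mathbb{C} \\
&\iff \int |f(x - y)||g(y)|\,dy \text{ is defined in } \mathbb{R} \\
&\iff \int |f_n(x - y)||g_n(y)|\,dy \text{ is defined in } \mathbb{R} \text{ for every } n \\
&\qquad\quad \text{and } \sup_{n\in\mathbb{N}} \int |f_n(x - y)||g_n(y)|\,dy < \infty.
\end{aligned}$$
Because every $|f_n| * |g_n|$ is integrable, therefore measurable and with measurable domain,
$$\operatorname{dom}(f * g) = \{x : x \in \bigcap_{n\in\mathbb{N}} \operatorname{dom}(|f_n| * |g_n|),\ \sup_{n\in\mathbb{N}} (|f_n| * |g_n|)(x) < \infty\}$$
is measurable.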

255J Theorem Let f , g and h be complex-valued measurable functions, defined almost everywhere in R, such that
f ∗ g and g ∗ h are defined a.e. Suppose that x ∈ R is such that one of (|f | ∗ (|g| ∗ |h|))(x), ((|f | ∗ |g|) ∗ |h|)(x) is defined
in R. Then f ∗ (g ∗ h) and (f ∗ g) ∗ h are defined and equal at x.
proof Set $k(y) = f(x - y)$ when this is defined, so that $k$ is measurable and defined almost everywhere (255D).
(a) If $(|f| * (|g| * |h|))(x)$ is defined, this is $\int |k(y)|(|g| * |h|)(y)\,dy$, so by 255G we have
$$\int k(y)(g * h)(y)\,dy = \int k(y + z)g(y)h(z)\,d(y, z),$$
that is,
$$\begin{aligned}
(f * (g * h))(x) &= \int f(x - y)(g * h)(y)\,dy = \int k(y)(g * h)(y)\,dy \\
&= \int k(y + z)g(y)h(z)\,d(y, z) = \iint k(y + z)g(y)h(z)\,dy\,dz \\
&= \iint f(x - y - z)g(y)h(z)\,dy\,dz = \int (f * g)(x - z)h(z)\,dz \\
&= ((f * g) * h)(x).
\end{aligned}$$

(b) If $((|f| * |g|) * |h|)(x)$ is defined, this is
$$\begin{aligned}
\int (|f| * |g|)(x - z)|h(z)|\,dz &= \iint |f(x - z - y)||g(y)||h(z)|\,dy\,dz \\
&= \iint |k(y + z)||g(y)||h(z)|\,dy\,dz.
\end{aligned}$$
By 255D again, $(y, z) \mapsto k(y + z)$ is measurable, so we can apply Tonelli’s theorem to see that $\int k(y + z)g(y)h(z)\,d(y, z)$
is defined, and is equal to $\int k(y)(g * h)(y)\,dy = (f * (g * h))(x)$ by 255Ga. On the other side, by the last two lines of
the proof of (a), $\int k(y + z)g(y)h(z)\,d(y, z)$ is also equal to $((f * g) * h)(x)$.

255K I do not think we shall need an exhaustive discussion of the question of just when (f ∗ g)(x) is defined;
this seems to be complicated. However there is a fundamental case in which we can be sure that (f ∗ g)(x) is defined
everywhere.
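Proposition Suppose that $f$, $g$ are measurable complex-valued functions defined almost everywhere in $\mathbb{R}$, and that
$f \in \mathcal{L}^p_{\mathbb{C}}$, $g \in \mathcal{L}^q_{\mathbb{C}}$ where $p, q \in [1, \infty]$ and $\frac{1}{p} + \frac{1}{q} = 1$ (writing $\frac{1}{\infty} = 0$ as usual). Then $f * g$ is defined everywhere in $\mathbb{R}$, is
uniformly continuous, and
$$\begin{aligned}
\sup_{x\in\mathbb{R}} |(f * g)(x)| &\le \|f\|_p\|g\|_q \quad\text{if } 1 < p < \infty,\ 1 < q < \infty, \\
&\le \|f\|_1 \operatorname{ess\,sup}|g| \quad\text{if } p = 1,\ q = \infty, \\
&\le \operatorname{ess\,sup}|f| \cdot \|g\|_1 \quad\text{if } p = \infty,\ q = 1.
\end{aligned}$$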
proof (a) (For an introduction to $L^p$ spaces, see §244.) For any $x \in \mathbb{R}$, the function $f_x$, defined by setting $f_x(y) = f(x - y)$
whenever $x - y \in \operatorname{dom} f$, must also belong to $\mathcal{L}^p$, because $f_x = f\phi$ for an automorphism $\phi$ of the measure space.
Consequently $(f * g)(x) = \int f_x \times g$ is defined, and of modulus at most $\|f\|_p\|g\|_q$ or $\|f\|_1 \operatorname{ess\,sup}|g|$ or $\operatorname{ess\,sup}|f| \cdot \|g\|_1$,
by 244Eb/244Pb and 243Fa/243K.
(b) To see that $f * g$ is uniformly continuous, argue as follows. Suppose first that $p < \infty$. Let $\epsilon > 0$. Let $\eta > 0$
be such that $(2 + 2^{1/p})\|g\|_q\eta \le \epsilon$. Then there is a bounded continuous function $h : \mathbb{R} \to \mathbb{C}$ such that $\{x : h(x) \ne 0\}$
is bounded and $\|f - h\|_p \le \eta$ (244Hb/244Pb); let $M \ge 1$ be such that $h(x) = 0$ whenever $|x| \ge M - 1$. Next, $h$ is
uniformly continuous, so there is a $\delta \in ]0, 1]$ such that $|h(x) - h(x')| \le M^{-1/p}\eta$ whenever $|x - x'| \le \delta$.
Suppose that $|x - x'| \le \delta$. Defining $h_x(y) = h(x - y)$, as before, we have
$$\begin{aligned}
\int |h_x - h_{x'}|^p &= \int |h(x - y) - h(x' - y)|^p\,dy = \int |h(t) - h(x' - x + t)|^p\,dt \\
&\qquad\text{(substituting } t = x - y\text{)} \\
&= \int_{-M}^{M} |h(t) - h(x' - x + t)|^p\,dt \\
&\qquad\text{(because } h(t) = h(x' - x + t) = 0 \text{ if } |t| \ge M\text{)} \\
&\le 2M(M^{-1/p}\eta)^p \\
&\qquad\text{(because } |h(t) - h(x' - x + t)| \le M^{-1/p}\eta \text{ for every } t\text{)} \\
&= 2\eta^p.
\end{aligned}$$
So $\|h_x - h_{x'}\|_p \le 2^{1/p}\eta$. On the other hand,
$$\int |h_x - f_x|^p = \int |h(x - y) - f(x - y)|^p\,dy = \int |h(y) - f(y)|^p\,dy,$$
so $\|h_x - f_x\|_p = \|h - f\|_p \le \eta$, and similarly $\|h_{x'} - f_{x'}\|_p \le \eta$. So
$$\|f_x - f_{x'}\|_p \le \|f_x - h_x\|_p + \|h_x - h_{x'}\|_p + \|h_{x'} - f_{x'}\|_p \le (2 + 2^{1/p})\eta.$$
This means that
$$\begin{aligned}
|(f * g)(x) - (f * g)(x')| &= \Big|\int f_x \times g - \int f_{x'} \times g\Big| = \Big|\int (f_x - f_{x'}) \times g\Big| \\
&\le \|f_x - f_{x'}\|_p\|g\|_q \le (2 + 2^{1/p})\|g\|_q\eta \le \epsilon.
\end{aligned}$$
As $\epsilon$ is arbitrary, $f * g$ is uniformly continuous.
The argument here supposes that $p$ is finite. But if $p = \infty$ then $q = 1$ is finite, so we can apply the method with $g$
in place of $f$ to show that $g * f$ is uniformly continuous, and $f * g = g * f$ by 255Fb.

255L The r-dimensional case I have written 255A-255K out as theorems about Lebesgue measure on $\mathbb{R}$. However
they all apply equally well to Lebesgue measure on $\mathbb{R}^r$ for any $r \ge 1$, and the modifications required are so small that
I think I need do no more than ask you to read through the arguments again, turning every $\mathbb{R}$ into an $\mathbb{R}^r$, and every
$\mathbb{R}^2$ into an $(\mathbb{R}^r)^2$. In 255A and elsewhere, the measure $\mu_2$ should be read either as Lebesgue measure on $\mathbb{R}^{2r}$ or as the
product measure on $(\mathbb{R}^r)^2$; by 251N the two may be identified. There is a trivial modification required in part (b) of
the proof; if $I_n = [a_n, b_n[$ then
$$\mu I_n = \mu(-I_n) = \prod_{i=1}^{r} \max(0, \beta_{ni} - \alpha_{ni}),$$
writing $a_n = (\alpha_{n1}, \ldots, \alpha_{nr})$ and $b_n = (\beta_{n1}, \ldots, \beta_{nr})$. In the proof of 255I, the functions $f_n$ should be defined by saying that $f_n(x) = f(x)$ if
$|f(x)| \le n$ and $\|x\| \le n$, $0$ otherwise.
In quoting these results, therefore, I shall be uninhibited in referring to the paragraphs 255A-255K as if they were
actually written out for general $r \ge 1$.

255M The case of ]−π, π] The same ideas also apply to the circle group S 1 and to the interval ]−π, π], but here
perhaps rather more explanation is in order.

(a) The first thing to establish is the appropriate group operation. If we think of $S^1$ as the set $\{z : z \in \mathbb{C},\ |z| = 1\}$,
then the group operation is complex multiplication, and in the formulae above $x + y$ must be rendered as $xy$, while
$x - y$ must be rendered as $xy^{-1}$. On the interval $]-\pi, \pi]$, the group operation is $+_{2\pi}$, where for $x, y \in ]-\pi, \pi]$ I write
$x +_{2\pi} y$ for whichever of $x + y$, $x + y + 2\pi$, $x + y - 2\pi$ belongs to $]-\pi, \pi]$. To see that this is indeed a group operation, one
method is to note that it corresponds to multiplication on $S^1$ if we use the canonical bijection $x \mapsto e^{ix} : ]-\pi, \pi] \to S^1$;
another, to note that it corresponds to the operation on the quotient group $\mathbb{R}/2\pi\mathbb{Z}$. Thus in this interpretation of the
ideas of 255A-255K, we shall wish to replace $x + y$ by $x +_{2\pi} y$, $-x$ by $-_{2\pi}x$, and $x - y$ by $x -_{2\pi} y$, where
$$-_{2\pi}x = -x \text{ if } x \in ]-\pi, \pi[, \qquad -_{2\pi}\pi = \pi,$$
and $x -_{2\pi} y$ is whichever of $x - y$, $x - y + 2\pi$, $x - y - 2\pi$ belongs to $]-\pi, \pi]$.
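
The operations $+_{2\pi}$, $-_{2\pi}$ just described are easy to experiment with. The Python sketch below implements them in floating-point arithmetic and provides the canonical bijection $x \mapsto e^{ix}$, so that the correspondence with multiplication on $S^1$ can be checked on sample points. The function names are hypothetical, and floating-point values of $\pi$ are of course only approximations to the real number.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi


def wrap(t):
    """Whichever of t, t - 2*pi, t + 2*pi lies in ]-pi, pi]; assumes -3*pi < t <= 3*pi."""
    if t > math.pi:
        return t - TWO_PI
    if t <= -math.pi:
        return t + TWO_PI
    return t


def add_2pi(x, y):
    """x +_{2 pi} y for x, y in ]-pi, pi]."""
    return wrap(x + y)


def neg_2pi(x):
    """-_{2 pi} x: the group inverse of x, with pi its own inverse."""
    return x if x == math.pi else -x


def sub_2pi(x, y):
    """x -_{2 pi} y for x, y in ]-pi, pi]."""
    return wrap(x - y)


def to_circle(x):
    """The canonical bijection x -> e^{ix} from ]-pi, pi] onto S^1."""
    return cmath.exp(1j * x)
```

For instance, `add_2pi(math.pi, math.pi)` gives the identity `0.0`, matching $(-1)(-1) = 1$ on the circle.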

(b) As for the measure, the measure to use on ]−π, π] is just Lebesgue measure. Note that because ]−π, π] is
Lebesgue measurable, there will be no confusion concerning the meaning of ‘measurable subset’, as the relatively
measurable subsets of ]−π, π] are actually measured by Lebesgue measure on R. Also we can identify the product
measure on $]-\pi, \pi] \times ]-\pi, \pi]$ with the subspace measure induced by Lebesgue measure on $\mathbb{R}^2$ (251R).
On $S^1$, we need the corresponding measure induced by the canonical bijection between $S^1$ and $]-\pi, \pi]$, which indeed
is often called ‘Lebesgue measure on $S^1$’. (We shall see in 265E that it is also equal to Hausdorff one-dimensional
measure on $S^1$.) We are very close to the level at which it would become reasonable to move to $S^1$ and this measure
(or its normalized version, in which it is reduced by a factor of $2\pi$, so as to make $S^1$ a probability space). However,
the elementary theory of Fourier series, which will be the principal application of this work in the present volume, is
generally done on intervals in R, so that formulae based on ]−π, π] are closer to the standard expressions. Henceforth,
therefore, I will express the work in terms of ]−π, π].

(c) The result corresponding to 255A now takes a slightly different form, so I spell it out.

255N Theorem Let $\mu$ be Lebesgue measure on $]-\pi, \pi]$ and $\mu_2$ Lebesgue measure on $]-\pi, \pi] \times ]-\pi, \pi]$; write $\Sigma$, $\Sigma_2$
for their domains.
(a) For any $a \in ]-\pi, \pi]$, the map $x \mapsto a +_{2\pi} x : ]-\pi, \pi] \to ]-\pi, \pi]$ is a measure space automorphism of $(]-\pi, \pi], \Sigma, \mu)$.
(b) The map $x \mapsto -_{2\pi}x : ]-\pi, \pi] \to ]-\pi, \pi]$ is a measure space automorphism of $(]-\pi, \pi], \Sigma, \mu)$.
(c) For any $a \in ]-\pi, \pi]$, the map $x \mapsto a -_{2\pi} x : ]-\pi, \pi] \to ]-\pi, \pi]$ is a measure space automorphism of $(]-\pi, \pi], \Sigma, \mu)$.
(d) The map $(x, y) \mapsto (x +_{2\pi} y, y) : ]-\pi, \pi]^2 \to ]-\pi, \pi]^2$ is a measure space automorphism of $(]-\pi, \pi]^2, \Sigma_2, \mu_2)$.
(e) The map $(x, y) \mapsto (x -_{2\pi} y, y) : ]-\pi, \pi]^2 \to ]-\pi, \pi]^2$ is a measure space automorphism of $(]-\pi, \pi]^2, \Sigma_2, \mu_2)$.
proof (a) Set $\phi(x) = a +_{2\pi} x$. Then for any $E \subseteq ]-\pi, \pi]$,
$$\phi[E] = ((E + a) \cap ]-\pi, \pi]) \cup (((E + a) \cap ]\pi, 3\pi]) - 2\pi) \cup (((E + a) \cap ]-3\pi, -\pi]) + 2\pi),$$
and these three sets are disjoint, so that
$$\begin{aligned}
\mu\phi[E] &= \mu((E + a) \cap ]-\pi, \pi]) + \mu(((E + a) \cap ]\pi, 3\pi]) - 2\pi) + \mu(((E + a) \cap ]-3\pi, -\pi]) + 2\pi) \\
&= \mu_L((E + a) \cap ]-\pi, \pi]) + \mu_L(((E + a) \cap ]\pi, 3\pi]) - 2\pi) + \mu_L(((E + a) \cap ]-3\pi, -\pi]) + 2\pi) \\
&\qquad\text{(writing } \mu_L \text{ for Lebesgue measure on } \mathbb{R}\text{)} \\
&= \mu_L((E + a) \cap ]-\pi, \pi]) + \mu_L((E + a) \cap ]\pi, 3\pi]) + \mu_L((E + a) \cap ]-3\pi, -\pi]) \\
&= \mu_L(E + a) = \mu_L E = \mu E.
\end{aligned}$$
Similarly, $\mu\phi^{-1}[E]$ is defined and equal to $\mu E$ for every $E \in \Sigma$, so that $\phi$ is an automorphism of $(]-\pi, \pi], \Sigma, \mu)$.
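(b) Of course this is quicker. Setting $\phi(x) = -_{2\pi}x$ for $x \in ]-\pi, \pi]$, we have
$$\begin{aligned}
\mu(\phi[E]) &= \mu(\phi[E] \cap ]-\pi, \pi[) = \mu(-(E \cap ]-\pi, \pi[)) \\
&= \mu_L(-(E \cap ]-\pi, \pi[)) = \mu_L(E \cap ]-\pi, \pi[) \\
&= \mu(E \cap ]-\pi, \pi[) = \mu E
\end{aligned}$$
for every $E \in \Sigma$.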
(c) This is just a matter of putting (a) and (b) together, as in 255A.
(d) We can argue as in (a), but with a little more elaboration. If $\phi(x, y) = (x +_{2\pi} y, y)$ for $x, y \in ]-\pi, \pi]$,
set $\psi(x, y) = (x + y, y)$ for $x, y \in \mathbb{R}$, and write $c = (2\pi, 0) \in \mathbb{R}^2$, $H = ]-\pi, \pi]^2$, $H' = H + c$, $H'' = H - c$. Then for any
$E \in \Sigma_2$,
$$\phi[E] = (\psi[E] \cap H) \cup ((\psi[E] \cap H') - c) \cup ((\psi[E] \cap H'') + c),$$
so
$$\begin{aligned}
\mu_2\phi[E] &= \mu_2(\psi[E] \cap H) + \mu_2((\psi[E] \cap H') - c) + \mu_2((\psi[E] \cap H'') + c) \\
&= \mu_L(\psi[E] \cap H) + \mu_L((\psi[E] \cap H') - c) + \mu_L((\psi[E] \cap H'') + c) \\
&\qquad\text{(this time writing } \mu_L \text{ for Lebesgue measure on } \mathbb{R}^2\text{)} \\
&= \mu_L(\psi[E] \cap H) + \mu_L(\psi[E] \cap H') + \mu_L(\psi[E] \cap H'') \\
&= \mu_L\psi[E] = \mu_L E = \mu_2 E.
\end{aligned}$$
In the same way, $\mu_2(\phi^{-1}[E]) = \mu_2 E$ for every $E \in \Sigma_2$, so $\phi$ is an automorphism of $(]-\pi, \pi]^2, \Sigma_2, \mu_2)$, as required.
(e) Finally, (e) is just a restatement of (d), as before.

255O Convolutions on ]−π, π] With the fundamental result established, the same arguments as in 255B-255K
now yield the following. Write µ for Lebesgue measure on ]−π, π].

(a) Let $f$ and $g$ be measurable complex-valued functions defined almost everywhere in $]-\pi, \pi]$. Write $f * g$ for the
function defined by the formula
$$(f * g)(x) = \int_{-\pi}^{\pi} f(x -_{2\pi} y)g(y)\,dy$$
whenever $x \in ]-\pi, \pi]$ and the integral exists as a complex number. Then $f * g$ is the convolution of the functions $f$
and $g$.

(b) If f and g are measurable complex-valued functions defined almost everywhere in ]−π, π], then f ∗ g = g ∗ f .

(c) Let $f$, $g$ and $h$ be measurable complex-valued functions defined almost everywhere in $]-\pi, \pi]$. Then
(i)
$$\int_{-\pi}^{\pi} h(x)(f * g)(x)\,dx = \int_{]-\pi,\pi]^2} h(x +_{2\pi} y)f(x)g(y)\,d(x, y)$$
whenever the right-hand side exists and is finite, provided that in the expression $h(x)(f * g)(x)$ we interpret the product
as $0$ if $h(x) = 0$ and $(f * g)(x)$ is undefined.
(ii) If, on the same interpretation of $|h(x)|(|f| * |g|)(x)$, the integral $\int_{-\pi}^{\pi} |h(x)|(|f| * |g|)(x)\,dx$ is finite, then
$\int_{]-\pi,\pi]^2} h(x +_{2\pi} y)f(x)g(y)\,d(x, y)$ exists in $\mathbb{C}$, so again we shall have
$$\int_{-\pi}^{\pi} h(x)(f * g)(x)\,dx = \int_{]-\pi,\pi]^2} h(x +_{2\pi} y)f(x)g(y)\,d(x, y).$$

(d) If $f$, $g$ are complex-valued functions which are integrable over $]-\pi, \pi]$, then $f * g$ is integrable, with
$$\int_{-\pi}^{\pi} f * g = \int_{-\pi}^{\pi} f \int_{-\pi}^{\pi} g, \qquad \int_{-\pi}^{\pi} |f * g| \le \int_{-\pi}^{\pi} |f| \int_{-\pi}^{\pi} |g|.$$

(e) Let f , g, h be complex-valued measurable functions defined almost everywhere in ]−π, π], such that f ∗g and g ∗h
are also defined almost everywhere. Suppose that x ∈ ]−π, π] is such that one of (|f | ∗ (|g| ∗ |h|))(x), ((|f | ∗ |g|) ∗ |h|)(x)
is defined in R. Then f ∗ (g ∗ h) and (f ∗ g) ∗ h are defined and equal at x.

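(f) Suppose that $f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^q_{\mathbb{C}}(\mu)$ where $p, q \in [1, \infty]$ and $\frac{1}{p} + \frac{1}{q} = 1$. Then $f * g$ is defined everywhere in
$]-\pi, \pi]$, and $\sup_{x\in]-\pi,\pi]} |(f * g)(x)| \le \|f\|_p\|g\|_q$, interpreting $\|\cdot\|_\infty$ as $\operatorname{ess\,sup}|\cdot|$, as in 255K.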

255X Basic exercises > (a) Let $f$, $g$ be complex-valued functions defined almost everywhere in $\mathbb{R}$. Show that for
any $x \in \mathbb{R}$, $(f * g)(x) = \int f(x + y)g(-y)\,dy$ if either is defined.

> (b) Let f and g be complex-valued functions defined almost everywhere in R. (i) Show that if f and g are even
functions, so is f ∗ g. (ii) Show that if f is even and g is odd then f ∗ g is odd. (iii) Show that if f and g are odd then
f ∗ g is even.

(c) Suppose that f , g are real-valued measurable functions defined almost everywhere in R r and such that f > 0
a.e., g ≥ 0 a.e. and {x : g(x) > 0} is not negligible. Show that f ∗ g > 0 everywhere in dom(f ∗ g).

> (d) Suppose that f : R → C is a bounded differentiable function and that f ′ is bounded. Show that for any
integrable complex-valued function g on R, f ∗ g is differentiable and (f ∗ g)′ = f ′ ∗ g everywhere. (Hint: 123D.)
(e) A complex-valued function $g$ defined almost everywhere in $\mathbb{R}$ is locally integrable if $\int_a^b g$ is defined in $\mathbb{C}$
whenever $a < b$ in $\mathbb{R}$. Suppose that $g$ is such a function and that $f : \mathbb{R} \to \mathbb{C}$ is a differentiable function, with continuous
derivative, such that $\{x : f(x) \ne 0\}$ is bounded. Show that $(f * g)' = f' * g$ everywhere.
> (f) Set $\phi_\delta(x) = \exp\big(-\frac{1}{\delta^2 - x^2}\big)$ if $|x| < \delta$, $0$ if $|x| \ge \delta$, as in 242Xi. Set $\alpha_\delta = \int \phi_\delta$, $\psi_\delta = \alpha_\delta^{-1}\phi_\delta$. Let $f$ be a
locally integrable complex-valued function on $\mathbb{R}$. (i) Show that $f * \psi_\delta$ is a smooth function defined everywhere on $\mathbb{R}$
for every $\delta > 0$. (ii) Show that $\lim_{\delta\downarrow 0} (f * \psi_\delta)(x) = f(x)$ for almost every $x \in \mathbb{R}$. (Hint: 223Yg.) (iii) Show that if $f$
is integrable then $\lim_{\delta\downarrow 0} \int |f - f * \psi_\delta| = 0$. (Hint: use (ii) and 245H(a-ii) or look first at the case $f = \chi[a, b]$ and use
242O, noting that $\int |f * \psi_\delta| \le \int |f|$.) (iv) Show that if $f$ is uniformly continuous and defined everywhere in $\mathbb{R}$ then
$\lim_{\delta\downarrow 0} \sup_{x\in\mathbb{R}} |f(x) - (f * \psi_\delta)(x)| = 0$.
> (g) For $\alpha > 0$, set $g_\alpha(t) = \frac{1}{\Gamma(\alpha)}t^{\alpha-1}$ for $t > 0$, $0$ for $t \le 0$. Show that $g_\alpha * g_\beta = g_{\alpha+\beta}$ for all $\alpha, \beta > 0$. (Hint:
252Yf.)

> (h) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. For $u$, $v$, $w \in L^0_{\mathbb{C}} = L^0_{\mathbb{C}}(\mu)$, say that $u * v = w$ if $f * g$ is defined almost
everywhere and $(f * g)^\bullet = w$ whenever $f$, $g \in \mathcal{L}^0_{\mathbb{C}}(\mu)$, $f^\bullet = u$ and $g^\bullet = v$. (i) Show that $(u_1 + u_2) * v = u_1 * v + u_2 * v$
whenever $u_1$, $u_2$, $v \in L^0_{\mathbb{C}}$ and $u_1 * v$ and $u_2 * v$ are defined in this sense. (ii) Show that $u * v = v * u$ whenever $u$,
$v \in L^0_{\mathbb{C}}$ and either $u * v$ or $v * u$ is defined. (iii) Show that if $u$, $v$, $w \in L^0_{\mathbb{C}}$, $u * v$ and $v * w$ are defined, and either
$|u| * (|v| * |w|)$ or $(|u| * |v|) * |w|$ is defined, then $u * (v * w)$ and $(u * v) * w$ are defined and equal.

> (i) Let µ be Lebesgue measure on R. (i) Show that u ∗ v, as defined in 255Xh, belongs to L1C (µ) whenever u,
v ∈ L1C (µ). (ii) Show that L1C is a commutative Banach algebra under ∗ (definition: 2A4Jb).
(j)(i) Show that if $h$ is an integrable function on $\mathbb{R}^2$, then $(Th)(x)=\int h(x-y,y)\,dy$ exists for almost every $x\in\mathbb{R}$,
and that $\int(Th)(x)\,dx=\int h(x,y)\,d(x,y)$. (ii) Write $\mu_2$ for Lebesgue measure on $\mathbb{R}^2$, $\mu$ for Lebesgue measure on $\mathbb{R}$.
Show that there is a linear operator T̃ : L1 (µ2 ) → L1 (µ) defined by setting T̃ (h• ) = (T h)• for every integrable function
h on R 2 . (iii) Show that in the language of 253E and 255Xh, T̃ (u ⊗ v) = u ∗ v for all u, v ∈ L1 (µ).
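> (k) For $\boldsymbol{a}$, $\boldsymbol{b}\in\mathbb{C}^{\mathbb{Z}}$ set $(\boldsymbol{a}*\boldsymbol{b})(n)=\sum_{i\in\mathbb{Z}}\boldsymbol{a}(n-i)\boldsymbol{b}(i)$ whenever $\sum_{i\in\mathbb{Z}}|\boldsymbol{a}(n-i)\boldsymbol{b}(i)|<\infty$. Show that
(i) $\boldsymbol{a}*\boldsymbol{b}=\boldsymbol{b}*\boldsymbol{a}$;
(ii) $\sum_{i\in\mathbb{Z}}\boldsymbol{c}(i)(\boldsymbol{a}*\boldsymbol{b})(i)=\sum_{i,j\in\mathbb{Z}}\boldsymbol{c}(i+j)\boldsymbol{a}(i)\boldsymbol{b}(j)$ if $\sum_{i,j\in\mathbb{Z}}|\boldsymbol{c}(i+j)\boldsymbol{a}(i)\boldsymbol{b}(j)|<\infty$;
(iii) if $\boldsymbol{a}$, $\boldsymbol{b}\in\ell^1(\mathbb{Z})$ then $\boldsymbol{a}*\boldsymbol{b}\in\ell^1(\mathbb{Z})$ and $\|\boldsymbol{a}*\boldsymbol{b}\|_1\le\|\boldsymbol{a}\|_1\|\boldsymbol{b}\|_1$;
(iv) if $\boldsymbol{a}$, $\boldsymbol{b}\in\ell^2(\mathbb{Z})$ then $\boldsymbol{a}*\boldsymbol{b}\in\ell^\infty(\mathbb{Z})$ and $\|\boldsymbol{a}*\boldsymbol{b}\|_\infty\le\|\boldsymbol{a}\|_2\|\boldsymbol{b}\|_2$;
(v) if $\boldsymbol{a}$, $\boldsymbol{b}$, $\boldsymbol{c}\in\mathbb{C}^{\mathbb{Z}}$ and $(|\boldsymbol{a}|*(|\boldsymbol{b}|*|\boldsymbol{c}|))(n)$ is well-defined, then $(\boldsymbol{a}*(\boldsymbol{b}*\boldsymbol{c}))(n)=((\boldsymbol{a}*\boldsymbol{b})*\boldsymbol{c})(n)$.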
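
For finitely supported sequences every sum in 255Xk is finite, and the five assertions can be tested directly. The sketch below is an illustration under that assumption; the dictionary representation, the helper names `conv`, `norm` and `pairing`, and the three sample sequences are hypothetical choices, not part of the exercise.

```python
from itertools import product

def conv(a, b):
    """(a*b)(n) = sum over i of a(n-i) b(i), for sequences stored as {index: value}."""
    out = {}
    for (i, x), (j, y) in product(a.items(), b.items()):
        out[i + j] = out.get(i + j, 0) + x * y
    return out

def norm(a, p):
    """l^p norm of a finitely supported sequence; p = float('inf') gives the sup norm."""
    values = [abs(v) for v in a.values()]
    if p == float("inf"):
        return max(values, default=0)
    return sum(v ** p for v in values) ** (1.0 / p)

def pairing(c, a):
    """Sum over i of c(i) a(i)."""
    return sum(c.get(i, 0) * v for i, v in a.items())

# Hypothetical integer-valued sequences with finite support.
a = {-2: 1, 0: -3, 1: 2}
b = {-1: 4, 3: 1}
c = {0: 2, 5: -1, 6: 1}

checks = {
    "(i)": conv(a, b) == conv(b, a),
    "(ii)": pairing(c, conv(a, b))
    == sum(c.get(i + j, 0) * x * y for (i, x), (j, y) in product(a.items(), b.items())),
    "(iii)": norm(conv(a, b), 1) <= norm(a, 1) * norm(b, 1) + 1e-12,
    "(iv)": norm(conv(a, b), float("inf")) <= norm(a, 2) * norm(b, 2),
    "(v)": conv(a, conv(b, c)) == conv(conv(a, b), c),
}
print(checks)
```

Integer entries are used so that (i), (ii) and (v) can be compared exactly; with floating-point entries a tolerance would be needed.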

255Y Further exercises (a) Let $f$ be a complex-valued function which is integrable over $\mathbb{R}$. (i) Let $x$ be any
point of the Lebesgue set of $f$. Show that for any $\epsilon>0$ there is a $\delta>0$ such that $|f(x)-(f*g)(x)|\le\epsilon$ whenever
$g:\mathbb{R}\to[0,\infty[$ is a function which is non-decreasing on $]-\infty,0]$, non-increasing on $[0,\infty[$, and has $\int g=1$ and
$\int_{-\delta}^{\delta}g\ge1-\delta$. (ii) Show that for any $\epsilon>0$ there is a $\delta>0$ such that $\|f-f*g\|_1\le\epsilon$ whenever $g:\mathbb{R}\to[0,\infty[$ is a
function which is non-decreasing on $]-\infty,0]$, non-increasing on $[0,\infty[$, and has $\int g=1$ and $\int_{-\delta}^{\delta}g\ge1-\delta$.

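(b) Let $f$ be a complex-valued function which is integrable over $\mathbb{R}$. Show that, for almost every $x\in\mathbb{R}$,
$$\lim_{a\to\infty}\frac{a}{\pi}\int_{-\infty}^{\infty}\frac{f(y)}{1+a^2(x-y)^2}\,dy,\qquad\lim_{a\to\infty}a\int_x^{\infty}f(y)e^{-a(y-x)}\,dy,$$
$$\lim_{\sigma\downarrow0}\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}f(y)e^{-(y-x)^2/2\sigma^2}\,dy$$
all exist and are equal to $f(x)$. (Hint: 263G.)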
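
To see the two-sided limits of 255Yb at work, one can evaluate them numerically at a point of continuity. The sketch below is an illustration only: the triangular test function `f`, the point x = 0.3, the helper names and the midpoint rule are assumptions, and only the Cauchy-type and Gaussian kernels are treated.

```python
import math

def f(y):
    """A hypothetical integrable test function: max(0, 1 - |y|), supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(y))

def smooth(kernel, x, steps=40000):
    """Midpoint-rule approximation to the integral of f(y) * kernel(x - y) over [-1, 1]."""
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        y = -1.0 + (k + 0.5) * h
        total += f(y) * kernel(x - y)
    return h * total

def poisson(a):
    """Kernel t -> (a/pi) / (1 + a^2 t^2), which has integral 1."""
    return lambda t: (a / math.pi) / (1.0 + (a * t) ** 2)

def gauss(sigma):
    """Kernel t -> exp(-t^2 / 2 sigma^2) / (sigma sqrt(2 pi)), which has integral 1."""
    return lambda t: math.exp(-t * t / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

x = 0.3  # a point of continuity of f, where f(x) = 0.7
for a in (10.0, 100.0, 1000.0):
    print("a =", a, smooth(poisson(a), x))
for sigma in (0.1, 0.01):
    print("sigma =", sigma, smooth(gauss(sigma), x))
```

The Gaussian values settle on f(x) much faster than the Cauchy-type ones, reflecting the heavier tails of the latter kernel.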
(c) Set $f(x)=1$ for all $x\in\mathbb{R}$, $g(x)=\frac{x}{|x|}$ for $0<|x|\le1$ and $0$ otherwise, $h(x)=\tanh x$ for all $x\in\mathbb{R}$. Show that
$f*(g*h)$ and $(f*g)*h$ are both defined (and constant) everywhere, and are different.
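
The two constants in 255Yc can be estimated numerically. In the sketch below (an illustration, not a solution of the exercise) the closed form used for $g*h$ comes from integrating tanh to log cosh; the function names, the truncation of the line to [-30, 30] and the trapezoidal rule are assumptions.

```python
import math

def log_cosh(x):
    """Numerically stable log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log 2."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)

def g_conv_h(x):
    """(g*h)(x) = int_0^1 tanh(x-y) dy - int_{-1}^0 tanh(x-y) dy, in closed form."""
    return 2.0 * log_cosh(x) - log_cosh(x - 1.0) - log_cosh(x + 1.0)

def integral_over_line(fn, radius=30.0, steps=60000):
    """Trapezoidal rule on [-radius, radius]; g*h decays like exp(-2|x|) outside."""
    h = 2.0 * radius / steps
    total = 0.5 * (fn(-radius) + fn(radius))
    total += sum(fn(-radius + k * h) for k in range(1, steps))
    return h * total

# f = 1 everywhere, so f*(g*h) is the constant value of the integral of g*h.
left = integral_over_line(g_conv_h)
# g is odd and integrable, so f*g = 0 everywhere and hence (f*g)*h = 0.
right = 0.0
print(left, right)
```

The first value comes out close to -2 and the second is 0, so the two bracketings differ, as the exercise asserts.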

(d) Discuss what can happen if, in the context of 255J, we know that (|f | ∗ (|g| ∗ |h|))(x) is defined, but have no
information on the domain of f ∗ g.

(e) Suppose that $p\in[1,\infty[$ and that $f\in\mathcal{L}^p_{\mathbb{C}}(\mu)$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$. For $a\in\mathbb{R}^r$ set $(S_af)(x)=
f(a+x)$ whenever $a+x\in\operatorname{dom}f$. Show that $S_af\in\mathcal{L}^p_{\mathbb{C}}(\mu)$, and that for every $\epsilon>0$ there is a $\delta>0$ such that
$\|S_af-f\|_p\le\epsilon$ whenever $|a|\le\delta$.

(f) Suppose that $p$, $q\in]1,\infty[$ and $\frac1p+\frac1q=1$. Take $f\in\mathcal{L}^p_{\mathbb{C}}(\mu)$ and $g\in\mathcal{L}^q_{\mathbb{C}}(\mu)$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$.
Show that $\lim_{\|x\|\to\infty}(f*g)(x)=0$. (Hint: use 244Hb.)

(g) Repeat 255Ye and 255K, this time taking $\mu$ to be Lebesgue measure on $]-\pi,\pi]$, and setting $(S_af)(x)=f(a+_{2\pi}x)$
for $a\in]-\pi,\pi]$; show that in the new version of 255K, $(f*g)(\pi)=\lim_{x\downarrow-\pi}(f*g)(x)$.

(h) Let µ be Lebesgue measure on R. For a ∈ R, f ∈ L0 = L0 (µ) set (Sa f )(x) = f (a + x) whenever a + x ∈ dom f .
(i) Show that Sa f ∈ L0 for every f ∈ L0 .
(ii) Show that we have a map S̃a : L0 → L0 defined by setting S̃a (f • ) = (Sa f )• for every f ∈ L0 .
(iii) Show that S̃a is a Riesz space isomorphism and is a homeomorphism for the topology of convergence in
measure; moreover, that S̃a (u × v) = S̃a u × S̃a v for all u, v ∈ L0 .
(iv) Show that S̃a+b = S̃a S̃b for all a, b ∈ R.
(v) Show that lima→0 S̃a u = u for the topology of convergence in measure, for every u ∈ L0 .
(vi) Show that if 1 ≤ p ≤ ∞ then S̃a ↾Lp is an isometric isomorphism of the Banach lattice Lp .
(vii) Show that if $p\in[1,\infty[$ then $\lim_{a\to0}\|\tilde S_au-u\|_p=0$ for every $u\in L^p$.
(viii) Show that if A ⊆ L1 is uniformly integrable and M ≥ 0, then {S̃a u : u ∈ A, |a| ≤ M } is uniformly integrable.
(ix) Suppose that u, v ∈ L0 are such that u ∗ v is defined in L0 in the sense of 255Xh. Show that S̃a (u ∗ v) =
(S̃a u) ∗ v = u ∗ (S̃a v) for every a ∈ R.

(i) Prove 255Nd from 255Na by the method used to prove 255Ad from 255Aa, rather than by quoting 255Ad.

(j) Let $\mu$ be Lebesgue measure on $\mathbb{R}$, and $\phi:\mathbb{R}\to\mathbb{R}$ a convex function; let $\bar\phi:L^0\to L^0=L^0(\mu)$ be the associated
operator (see 241I). Show that if $u\in L^1=L^1(\mu)$, $v\in L^0$ are such that $u\ge0$, $\int u=1$ and $u*v$, $u*\bar\phi(v)$ are both
defined in the sense of 255Xh, then $\bar\phi(u*v)\le u*\bar\phi(v)$. (Hint: 233I.)

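(k) Let $\mu$ be Lebesgue measure on $\mathbb{R}$, and $p\in[1,\infty]$. Let $f\in\mathcal{L}^1_{\mathbb{C}}(\mu)$, $g\in\mathcal{L}^p_{\mathbb{C}}(\mu)$. Show that $f*g\in\mathcal{L}^p_{\mathbb{C}}(\mu)$ and that
$\|f*g\|_p\le\|f\|_1\|g\|_p$. (Hint: argue from 255Yj, as in 244M.)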
(l) Suppose that $p$, $q$, $r\in]1,\infty[$ and that $\frac1p+\frac1q=1+\frac1r$. Let $\mu$ be Lebesgue measure on $\mathbb{R}$. (i) Show that
$$\int f\times g\le\|f\|_p^{1-p/r}\|g\|_q^{1-q/r}\Bigl(\int f^p\times g^q\Bigr)^{1/r}$$
whenever $f$, $g\ge0$ and $f\in\mathcal{L}^p(\mu)$, $g\in\mathcal{L}^q(\mu)$. (Hint: set $p'=p/(p-1)$, etc.; $f_1=f^{p/q'}$, $g_1=g^{q/p'}$, $h=(f^p\times g^q)^{1/r}$.
Use 244Xc to see that $\|f_1\times g_1\|_{r'}\le\|f_1\|_{q'}\|g_1\|_{p'}$, so that $\int f_1\times g_1\times h\le\|f_1\|_{q'}\|g_1\|_{p'}\|h\|_r$.) (ii) Show that $f*g$ is
defined a.e. and that $\|f*g\|_r\le\|f\|_p\|g\|_q$ for all $f\in\mathcal{L}^p(\mu)$, $g\in\mathcal{L}^q(\mu)$. (Hint: take $f$, $g\ge0$. Use (i) to see that
$(f*g)(x)^r\le\|f\|_p^{r-p}\|g\|_q^{r-q}\int f(y)^pg(x-y)^q\,dy$, so that $\|f*g\|_r^r\le\|f\|_p^{r-p}\|g\|_q^{r-q}\int f(y)^p\|g\|_q^q\,dy$.) (This is Young’s
inequality.)
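
Young's inequality has the same form for counting measure on $\mathbb{Z}$, where it can be tried out on short lists. The sketch below tests that discrete analogue rather than the statement for Lebesgue measure in 255Yl; the exponents, the sample sequences and the helper names are hypothetical.

```python
def conv(a, b):
    """Convolution of two finitely supported sequences given as lists indexed from 0."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def norm(a, p):
    """l^p norm of a finite list, for 1 <= p < infinity."""
    return sum(abs(v) ** p for v in a) ** (1.0 / p)

p, q = 1.5, 1.2                      # hypothetical exponents with 1/p + 1/q = 3/2
r = 1.0 / (1.0 / p + 1.0 / q - 1.0)  # chosen so that 1/p + 1/q = 1 + 1/r, giving r = 2

a = [1.0, 2.0, 0.5, 3.0]
b = [0.2, 1.0, 1.0, 4.0, 0.1]

lhs = norm(conv(a, b), r)            # ||a*b||_r
rhs = norm(a, p) * norm(b, q)        # ||a||_p ||b||_q
print(lhs, rhs, lhs <= rhs)
```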

(m) Repeat the results of this section for the group (S 1 )r , where r ≥ 2, given its product measure.

(n) Let $G$ be a group and $\mu$ a $\sigma$-finite measure on $G$ such that ($\alpha$) for every $a\in G$, the map $x\mapsto ax$ is an
automorphism of $(G,\mu)$ ($\beta$) the map $(x,y)\mapsto(x,xy)$ is an automorphism of $(G^2,\mu_2)$, where $\mu_2$ is the c.l.d. product
measure on $G\times G$. For $f$, $g\in\mathcal{L}^0_{\mathbb{C}}(\mu)$ write $(f*g)(x)=\int f(y)g(y^{-1}x)\,dy$ whenever this is defined. Show that
(i) if $f$, $g$, $h\in\mathcal{L}^0_{\mathbb{C}}(\mu)$ and $\int h(xy)f(x)g(y)\,d(x,y)$ is defined in $\mathbb{C}$, then $\int h(x)(f*g)(x)\,dx$ exists and is equal to
$\int h(xy)f(x)g(y)\,d(x,y)$, provided that in the expression $h(x)(f*g)(x)$ we interpret the product as $0$ if $h(x)=0$ and
$(f*g)(x)$ is undefined;
(ii) if $f$, $g\in\mathcal{L}^1_{\mathbb{C}}(\mu)$ then $f*g\in\mathcal{L}^1_{\mathbb{C}}(\mu)$ and $\int f*g=\int f\int g$, $\|f*g\|_1\le\|f\|_1\|g\|_1$;
(iii) if $f$, $g$, $h\in\mathcal{L}^1_{\mathbb{C}}(\mu)$ then $f*(g*h)=(f*g)*h$.
(See Halmos 50, §59.)

(o) Repeat 255Yn for counting measure on any group G.



255 Notes and comments I have tried to set this section out in such a way that it will be clear that the basis of all the
work here is 255A, and the crucial application is 255G. I hope that if and when you come to look at general topological
groups (for instance, in Chapter 44), you will find it easy to trace through the ideas in any abelian topological group
for which you can prove a version of 255A. For non-abelian groups, of course, rather more care is necessary, especially
as in some important examples we no longer have $\mu\{x^{-1}:x\in E\}=\mu E$ for every $E$; see 255Yn-255Yo for a little of
what can be done without using topological ideas.
The critical point in 255A is the move from the one-dimensional results in 255Aa-255Ab, which are just the
translation- and reflection-invariance of Lebesgue measure, to the two-dimensional results in 255Ac-255Ad. And the
living centre of the argument, as I present it, is the fact that the shear transformation φ is an automorphism of the
structure (R 2 , Σ2 ). The actual calculation of µ2 φ[E], assuming that it is measurable, is an easy application of Fubini’s
and Tonelli’s theorems and the translation-invariance of µ. It is for the measurability of φ[E] that we absolutely need the
topological properties of Lebesgue measure. I should perhaps remind you that the fact that φ is a homeomorphism is not
sufficient; in 134I I described a homeomorphism of the unit interval which does not preserve measurability, and it is easy
to adapt this to produce a homeomorphism ψ : R 2 → R 2 such that ψ[E] is not always measurable for measurable E.
The argument of 255A is dependent on the special relationships between all three of the measure, topology and group
structure of R.
I have already indulged in a few remarks on what ought, or ought not, to be ‘obvious’ (255C). But perhaps I can add
that such results as 255B and the later claim, in the proof of 255K, that a reflected version of a function in Lp is also in
Lp , can only be trivial consequences of results like 255A if every step in the construction of the integral is done in the
abstract context of general measure spaces. Even though we are here working exclusively with the Lebesgue integral,
the argument will become untrustworthy if we have at any stage in the definition of the integral even mentioned that
we are thinking of Lebesgue measure. I advance this as a solid reason for defining ‘integration’ on abstract measure
spaces from the beginning, as I did in Volume 1. Indeed, I suggest that generally in pure mathematics there are good
reasons for casting arguments into the forms appropriate to the arguments themselves.
I am writing this book for readers who are interested in proofs, and as elsewhere I have written the proofs of this
section out in detail. But most of us find it useful to go through some material in ‘advanced calculus’ mode, by which
I mean starting with a formula such as
$$(f*g)(x)=\int f(x-y)g(y)\,dy,$$
and then working out consequences by formal manipulations, for instance
$$\int h(x)(f*g)(x)\,dx=\iint h(x)f(x-y)g(y)\,dy\,dx=\iint h(x+y)f(x)g(y)\,dy\,dx,$$
without troubling about the precise applicability of the formulae to begin with. In some ways this formula-driven
approach can be more truthful to the structure of the subject than the careful analysis I habitually present. The exact
hypotheses necessary to make the theorems strictly true are surely secondary, in such contexts as this section, to the
pattern formed by the ensemble of the theorems, which can be adequately and elegantly expressed in straightforward
formulae. Of course I do still insist that we cannot properly appreciate the structure, nor safely use it, without
mastering the ideas of the proofs – and as I have said elsewhere, I believe that mastery of ideas necessarily includes
mastery of the formal details, at least in the sense of being able to reconstruct them fairly fluently on demand.
Throughout the main exposition of this section, I have worked with functions rather than equivalence classes of
functions. But all the results here have interpretations of great importance for the theory of the ‘function spaces’ of
Chapter 24. In 255Xh and the succeeding exercises, I have pointed to a definition of convolution as an operator from
a subset of L0 × L0 to L0 . It is an interesting point that if u, v ∈ L0 then u ∗ v can be interpreted as a function, not
as a member of L0 (255Fc). Thus 255H can be regarded as saying that u ∗ v ∈ ℒ1 for u, v ∈ L1 . We cannot quite
say that convolution is a bilinear operator from ℒ1 × ℒ1 to ℒ1 , because ℒ1 (the set of integrable functions, as opposed to
the space L1 of their equivalence classes), as I define it, is not strictly speaking a
linear space. If we want a bilinear operator, then we have to regard convolution as a function from L1 × L1 to L1 . But
when we look at convolution as a function on L2 × L2 , for instance, then our functions u ∗ v are defined everywhere
(255K), and indeed are continuous functions vanishing at ∞ (255Ye-255Yf). So in this case it seems more appropriate
to regard convolution as a bilinear operator from L2 × L2 to some space of continuous functions, and not as an operator
from L2 × L2 to L∞ . For an example of an interesting convolution which is not naturally representable in terms of an
operator on Lp spaces, see 255Xg.
Because convolution acts as a continuous bilinear operator from L1 (µ)×L1 (µ) to L1 (µ), where µ is Lebesgue measure
on R, Theorem 253F tells us that it must correspond to a linear operator from L1 (µ2 ) to L1 (µ), where µ2 is Lebesgue
measure on R 2 . This is the operator T̃ of 255Xj.
So far in these notes I have written as though we were concerned only with Lebesgue measure on R. However many
applications of the ideas involve R r or ]−π, π] or S 1 . The move to R r should be elementary. The move to S 1 does
require a re-formulation of the basic result 255A/255N. It should also be clear that there will be no new difficulties in
moving to $]-\pi,\pi]^r$ or $(S^1)^r$. Moreover, we can also go through the whole theory for the groups $\mathbb{Z}$ and $\mathbb{Z}^r$, where the
appropriate measure is now counting measure, so that $L^0_{\mathbb{C}}$ becomes identified with $\mathbb{C}^{\mathbb{Z}}$ or $\mathbb{C}^{\mathbb{Z}^r}$ (255Xk, 255Yo).

256 Radon measures on R r


In the next section, and again in Chapters 27 and 28, we need to consider the principal class of measures on
Euclidean spaces. For a proper discussion of this class, and the interrelationships between the measures and the
topologies involved, we must wait until Volume 4. For the moment, therefore, I present definitions adapted to the
case in hand, warning you that the correct generalizations are not quite obvious. I give the definition (256A) and a
characterization (256C) of Radon measures on Euclidean spaces, and theorems on the construction of Radon measures
as indefinite integrals (256E, 256J), as image measures (256G) and as product measures (256K). In passing I give a
version of Lusin’s theorem concerning measurable functions on Radon measure spaces (256F).

256A Definitions Let ν be a measure on R r , where r ≥ 1, and Σ its domain.

(a) ν is a topological measure if every open set belongs to Σ. Note that in this case every Borel set, and in
particular every closed set, belongs to Σ.

(b) ν is locally finite if every bounded set has finite outer measure.

(c) If ν is a topological measure, it is inner regular with respect to the compact sets if
νE = sup{νK : K ⊆ E is compact}
for every E ∈ Σ. (Because ν is a topological measure, and compact sets are closed (2A2Ec), νK is defined for every
compact set K.)

(d) ν is a Radon measure if it is a complete locally finite topological measure which is inner regular with respect
to the compact sets.

256B It will be convenient to be able to call on the following elementary facts.


Lemma Let ν be a Radon measure on R r , and Σ its domain.
(a) ν is σ-finite.
(b) For any E ∈ Σ and any ε > 0 there are a closed set F ⊆ E and an open set G ⊇ E such that ν(G \ F ) ≤ ε.
(c) For every E ∈ Σ there is a set H ⊆ E, expressible as the union of a sequence of compact sets, such that
ν(E \ H) = 0.
(d) Every continuous real-valued function on R r is Σ-measurable.
(e) If h : R r → R is continuous and has bounded support, then h is ν-integrable.
proof (a) For each $n\in\mathbb{N}$, $B(0,n)=\{x:\|x\|\le n\}$ is a closed bounded set, therefore Borel. So if $\nu$ is a Radon measure
on $\mathbb{R}^r$, $\langle B(0,n)\rangle_{n\in\mathbb{N}}$ is a cover of $\mathbb{R}^r$ by a sequence of sets of finite measure.
(b) Set $E_n=\{x:x\in E,\ n\le\|x\|<n+1\}$ for each $n$. Then $\nu E_n<\infty$, so there is a compact set $K_n\subseteq E_n$ such
that $\nu K_n\ge\nu E_n-2^{-n-2}\epsilon$. Set $F=\bigcup_{n\in\mathbb{N}}K_n$; then
$$\nu(E\setminus F)=\sum_{n=0}^{\infty}\nu(E_n\setminus K_n)\le\tfrac12\epsilon.$$
Also $F\subseteq E$ and $F$ is closed because
$$F\cap B(0,n)=\bigcup_{i\le n}K_i\cap B(0,n)$$
is closed for each $n$.
In the same way, there is a closed set $F'\subseteq\mathbb{R}^r\setminus E$ such that $\nu((\mathbb{R}^r\setminus E)\setminus F')\le\frac12\epsilon$. Setting $G=\mathbb{R}^r\setminus F'$, we see that
$G$ is open, that $G\supseteq E$ and that $\nu(G\setminus E)\le\frac12\epsilon$, so that $\nu(G\setminus F)\le\epsilon$, as required.
(c) By (b), we can choose for each $n\in\mathbb{N}$ a closed set $F_n\subseteq E$ such that $\nu(E\setminus F_n)\le2^{-n}$. Set $H=\bigcup_{n\in\mathbb{N}}F_n$; then
$H\subseteq E$ and $\nu(E\setminus H)=0$, and also $H=\bigcup_{m,n\in\mathbb{N}}B(0,m)\cap F_n$ is a countable union of compact sets.
(d) If $h:\mathbb{R}^r\to\mathbb{R}$ is continuous, all the sets $\{x:h(x)>a\}$ are open, so belong to $\Sigma$.
(e) By (d), $h$ is measurable. Now we are supposing that there is some $n\in\mathbb{N}$ such that $h(x)=0$ whenever
$x\notin B(0,n)$. Since $B(0,n)$ is compact (2A2F), $h$ is bounded on $B(0,n)$ (2A2G), and we have $|h|\le\gamma\chi B(0,n)$ for some
$\gamma$; since $\nu B(0,n)$ is finite, $h$ is $\nu$-integrable.

256C Theorem A measure ν on R r is a Radon measure iff it is the completion of a locally finite measure defined
on the σ-algebra B of Borel subsets of R r .
proof (a) Suppose first that ν is a Radon measure. Write Σ for its domain.
(i) Set ν0 = ν↾B. Then ν0 is a measure with domain B, and it is locally finite because ν0 B(0, n) = νB(0, n) is
finite for every n. Let ν̂0 be the completion of ν0 (212C).
(ii) If ν̂0 measures E, there are E1 , E2 ∈ B such that E1 ⊆ E ⊆ E2 and ν0 (E2 \ E1 ) = 0. Now E \ E1 ⊆ E2 \ E1
must be ν-negligible; as ν is complete, E ∈ Σ and
νE = νE1 = ν0 E1 = ν̂0 E.

(iii) If E ∈ Σ, then by 256Bc there is a Borel set H ⊆ E such that ν(E \ H) = 0. Equally, there is a Borel set
H ′ ⊆ R r \ E such that ν((R r \ E) \ H ′ ) = 0, so that we have H ⊆ E ⊆ R r \ H ′ and
ν0 ((R r \ H ′ ) \ H) = ν((Rr \ H ′ ) \ H) = 0.
So ν̂0 E is defined and equal to ν0 H = νE.
This shows that ν = ν̂0 is the completion of the locally finite Borel measure ν↾B. And this is true for any Radon
measure ν on R r .
(b) For the rest of the proof, I suppose that ν0 is a locally finite measure on R r with domain B, and ν is its completion. Write Σ
for the domain of ν. We say that a subset of R r is a Kσ set if it is expressible as the union of a sequence of compact
sets. Note that every Kσ set is a Borel set, so belongs to Σ. Set
A = {E : E ∈ Σ, there is a Kσ set H ⊆ E such that ν(E \ H) = 0},

Σ = {E : E ∈ A, R r \ E ∈ A}.

(c)(i) Every open set is itself a $K_\sigma$ set, so belongs to $A$. PP Let $G\subseteq\mathbb{R}^r$ be open. If $G=\emptyset$ then $G$ is compact
and the result is trivial. Otherwise, let $\mathcal{I}$ be the set of closed intervals of the form $[q,q']$, where $q$, $q'\in\mathbb{Q}^r$, which are
included in $G$. Then all the members of $\mathcal{I}$ are closed and bounded, therefore compact. If $x\in G$, there is a $\delta>0$ such
that $B(x,\delta)=\{y:\|y-x\|\le\delta\}\subseteq G$; now there is an $I\in\mathcal{I}$ such that $x\in I\subseteq B(x,\delta)$. Thus $G=\bigcup\mathcal{I}$. But $\mathcal{I}$ is
countable, so $G$ is $K_\sigma$. QQ
(ii) Every closed subset of $\mathbb{R}^r$ is $K_\sigma$, so belongs to $A$. PP If $F\subseteq\mathbb{R}^r$ is closed, then $F=\bigcup_{n\in\mathbb{N}}F\cap B(0,n)$; but every
$F\cap B(0,n)$ is closed and bounded, therefore compact. QQ
(iii) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is any sequence in $A$, then $E=\bigcup_{n\in\mathbb{N}}E_n$ belongs to $A$. PP For each $n\in\mathbb{N}$ we have a countable
family $\mathcal{K}_n$ of compact subsets of $E_n$ such that $\nu(E_n\setminus\bigcup\mathcal{K}_n)=0$; now $\mathcal{K}=\bigcup_{n\in\mathbb{N}}\mathcal{K}_n$ is a countable family of compact
subsets of $E$, and $E\setminus\bigcup\mathcal{K}\subseteq\bigcup_{n\in\mathbb{N}}(E_n\setminus\bigcup\mathcal{K}_n)$ is $\nu$-negligible. QQ
(iv) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is any sequence in $A$, then $F=\bigcap_{n\in\mathbb{N}}E_n\in A$. PP For each $n\in\mathbb{N}$, let $\langle K_{ni}\rangle_{i\in\mathbb{N}}$ be a sequence of
compact subsets of $E_n$ such that $\nu(E_n\setminus\bigcup_{i\in\mathbb{N}}K_{ni})=0$. Set $K'_{nj}=\bigcup_{i\le j}K_{ni}$ for each $j$, so that
$$\nu(E_n\cap H)=\lim_{j\to\infty}\nu(K'_{nj}\cap H)$$
for every $H\in\Sigma$. Now, for each $m$, $n\in\mathbb{N}$, choose $j(m,n)$ such that
$$\nu(E_n\cap B(0,m)\cap K'_{n,j(m,n)})\ge\nu(E_n\cap B(0,m))-2^{-(m+n)}.$$
Set $K_m=\bigcap_{n\in\mathbb{N}}K'_{n,j(m,n)}$; then $K_m$ is closed (being an intersection of closed sets) and bounded (being a subset of
$K'_{0,j(m,0)}$), therefore compact. Also $K_m\subseteq F$, because $K'_{n,j(m,n)}\subseteq E_n$ for each $n$, and
$$\nu(F\cap B(0,m)\setminus K_m)\le\sum_{n=0}^{\infty}\nu(E_n\cap B(0,m)\setminus K'_{n,j(m,n)})\le\sum_{n=0}^{\infty}2^{-(m+n)}=2^{-m+1}.$$
Consequently $H=\bigcup_{m\in\mathbb{N}}K_m$ is a $K_\sigma$ subset of $F$ and
$$\nu(F\cap B(0,m)\setminus H)\le\inf_{k\ge m}\nu(F\cap B(0,k)\setminus K_k)=0$$
for every $m$, so $\nu(F\setminus H)=0$ and $F\in A$. QQ
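(d) $\Sigma$ is a $\sigma$-algebra of subsets of $\mathbb{R}^r$. PP (i) $\emptyset$ and its complement are open, so belong to $A$ and therefore to $\Sigma$. (ii)
If $E\in\Sigma$ then both $\mathbb{R}^r\setminus E$ and $\mathbb{R}^r\setminus(\mathbb{R}^r\setminus E)=E$ belong to $A$, so $\mathbb{R}^r\setminus E\in\Sigma$. (iii) Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\Sigma$
with union $E$. By (c-iii) and (c-iv),
$$E\in A,\qquad\mathbb{R}^r\setminus E=\bigcap_{n\in\mathbb{N}}(\mathbb{R}^r\setminus E_n)\in A,$$
so $E\in\Sigma$. QQ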
(e) By (c-i) and (c-ii), every open set belongs to Σ; consequently every Borel set belongs to Σ and therefore to A.
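Now if $E$ is any member of $\Sigma$, there is a Borel set $E_1\subseteq E$ such that $\nu(E\setminus E_1)=0$ and a $K_\sigma$ set $H\subseteq E_1$ such that
$\nu(E_1\setminus H)=0$. Express $H$ as $\bigcup_{n\in\mathbb{N}}K_n$ where every $K_n$ is compact; then
$$\nu E=\nu H=\lim_{n\to\infty}\nu\Bigl(\bigcup_{i\le n}K_i\Bigr)\le\sup_{K\subseteq E\text{ is compact}}\nu K\le\nu E$$
because $\bigcup_{i\le n}K_i$ is a compact subset of $E$ for every $n$.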
(f ) Thus ν is inner regular with respect to the compact sets. But of course it is complete (being the completion of
ν0 ) and a locally finite topological measure (because ν0 is); so it is a Radon measure. This completes the proof.

256D Proposition If ν and ν ′ are two Radon measures on R r , the following are equiveridical:
(i) ν = ν ′ ;
(ii) νK = ν ′ K for every compact set K ⊆ R r ;
(iii) $\nu G=\nu'G$ for every open set $G\subseteq\mathbb{R}^r$;
(iv) $\int h\,d\nu=\int h\,d\nu'$ for every continuous function $h:\mathbb{R}^r\to\mathbb{R}$ with bounded support.
proof (a)(i)⇒(iv) is trivial.
(b)(iv)⇒(iii) If (iv) is true, and $G\subseteq\mathbb{R}^r$ is an open set, then for each $n\in\mathbb{N}$ set
$$h_n(x)=\min\bigl(1,\ 2^n\inf_{y\in\mathbb{R}^r\setminus(G\cap B(0,n))}\|y-x\|\bigr)$$
for $x\in\mathbb{R}^r$. Then $h_n$ is continuous (in fact $|h_n(x)-h_n(x')|\le2^n\|x-x'\|$ for all $x$, $x'\in\mathbb{R}^r$) and zero outside $B(0,n)$,
so $\int h_n\,d\nu=\int h_n\,d\nu'$. Next, $\langle h_n(x)\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence converging to $\chi G(x)$ for every $x\in\mathbb{R}^r$. So
$$\nu G=\lim_{n\to\infty}\int h_n\,d\nu=\lim_{n\to\infty}\int h_n\,d\nu'=\nu'G,$$
by 135Ga. As $G$ is arbitrary, (iii) is true.
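
The functions $h_n$ used in this step can be written down explicitly in dimension 1. The sketch below is an illustration under assumptions of my own: $G$ is taken to be the open interval $]0,3.5[$, $\nu$ is taken to be Lebesgue measure, and the names `h` and `integral` are hypothetical.

```python
def h(n, x, left=0.0, right=3.5):
    """h_n(x) = min(1, 2^n * distance from x to the complement of G ∩ B(0, n)),
    for the open interval G = ]left, right[ and B(0, n) = [-n, n]."""
    lower = max(left, -float(n))    # G ∩ [-n, n] is an interval with these endpoints
    upper = min(right, float(n))
    if upper <= lower:
        return 0.0                  # the complement is then at distance 0 from x
    dist = max(0.0, min(x - lower, upper - x))
    return min(1.0, 2.0 ** n * dist)

def integral(n, steps=7000):
    """Midpoint-rule approximation to the Lebesgue integral of h_n over [0, 3.5]."""
    width = 3.5 / steps
    return width * sum(h(n, (k + 0.5) * width) for k in range(steps))

for n in (0, 1, 2, 4, 8, 12):
    print(n, h(n, 1.0), h(n, 3.4), integral(n))
```

The printed rows show $h_n$ rising to 1 at points of $G$ and the integrals increasing towards the length 3.5 of $G$, which is the monotone convergence used in the proof.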
(c)(iii)⇒(ii) If (iii) is true, and $K\subseteq\mathbb{R}^r$ is compact, let $n$ be so large that $\|x\|<n$ for every $x\in K$. Set
$G=\{x:\|x\|<n\}$, $H=G\setminus K$. Then $G$ and $H$ are open and $G$ is bounded, so $\nu G=\nu'G$ is finite, and
νK = νG − νH = ν ′ G − ν ′ H = ν ′ K.
As K is arbitrary, (ii) is true.
(d)(ii)⇒(i) If ν, ν ′ agree on the compact sets, then
$$\nu E=\sup_{K\subseteq E\text{ is compact}}\nu K=\sup_{K\subseteq E\text{ is compact}}\nu'K=\nu'E$$
for every Borel set E. So ν↾B = ν ′ ↾B, where B is the σ-algebra of Borel sets. But since ν and ν ′ are both the completions
of their restrictions to B, they are identical.

256E It is I suppose time I gave some examples of Radon measures. However it will save a few lines if I first
establish some basic constructions. You may wish to glance ahead to 256H at this point.
Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and $f$ a non-negative $\Sigma$-measurable function defined on
a $\nu$-conegligible subset of $\mathbb{R}^r$. Suppose that $f$ is locally integrable in the sense that $\int_Ef\,d\nu<\infty$ for every bounded
set $E$. Then the indefinite-integral measure $\nu'$ on $\mathbb{R}^r$ defined by saying that
$$\nu'E=\int_Ef\,d\nu\quad\text{whenever }E\cap\{x:x\in\operatorname{dom}f,\ f(x)>0\}\in\Sigma$$
is a Radon measure on $\mathbb{R}^r$.
proof For the construction of ν ′ , see 234I-234L. It is a topological measure because every open set belongs to Σ and
therefore to the domain Σ′ of ν ′ . ν ′ is locally finite because f is locally integrable. To see that ν ′ is inner regular with
respect to the compact sets, take any set E ∈ Σ′ , and set E ′ = {x : x ∈ E ∩ dom f, f (x) > 0}. Then E ′ ∈ Σ, so there
is a set H ⊆ E ′ , expressible as the union of a sequence of compact sets, such that ν(E ′ \ H) = 0. In this case
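$$\nu'(E\setminus H)=\int_{E\setminus H}f\,d\nu=0.$$
Let $\langle K_n\rangle_{n\in\mathbb{N}}$ be a sequence of compact sets with union $H$; then
$$\nu'E=\nu'H=\lim_{n\to\infty}\nu'\Bigl(\bigcup_{i\le n}K_i\Bigr)\le\sup_{K\subseteq E\text{ is compact}}\nu'K\le\nu'E.$$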
As E is arbitrary, ν ′ is inner regular with respect to the compact sets.

256F Theorem Let ν be a Radon measure on R r , and Σ its domain. Let f : D → R be a Σ-measurable function,
where D ⊆ R r . Then for every ε > 0 there is a closed set F ⊆ R r such that ν(R r \ F ) ≤ ε and f ↾F is continuous.
proof By 121I, there is a $\Sigma$-measurable function $h:\mathbb{R}^r\to\mathbb{R}$ extending $f$. Enumerate $\mathbb{Q}$ as $\langle q_n\rangle_{n\in\mathbb{N}}$. For each $n\in\mathbb{N}$
set $E_n=\{x:h(x)\le q_n\}$, $E'_n=\{x:h(x)>q_n\}$ and use 256Bb to choose closed sets $F_n\subseteq E_n$, $F'_n\subseteq E'_n$ such that
$\nu(E_n\setminus F_n)\le2^{-n-2}\epsilon$, $\nu(E'_n\setminus F'_n)\le2^{-n-2}\epsilon$. Set $F=\bigcap_{n\in\mathbb{N}}(F_n\cup F'_n)$; then $F$ is closed and
$$\nu(\mathbb{R}^r\setminus F)\le\sum_{n=0}^{\infty}\nu(\mathbb{R}^r\setminus(F_n\cup F'_n))\le\sum_{n=0}^{\infty}\bigl(\nu(E_n\setminus F_n)+\nu(E'_n\setminus F'_n)\bigr)\le\epsilon.$$
I claim that $h{\restriction}F$ is continuous. PP Suppose that $x\in F$ and $\delta>0$. Then there are $m$, $n\in\mathbb{N}$ such that
$$h(x)-\delta\le q_m<h(x)\le q_n\le h(x)+\delta.$$
This means that $x\in E'_m\cap E_n$; consequently $x\notin F_m\cup F'_n$. Because $F_m\cup F'_n$ is closed, there is an $\eta>0$ such that
$y\notin F_m\cup F'_n$ whenever $\|y-x\|\le\eta$. Now suppose that $y\in F$ and $\|y-x\|\le\eta$. Then $y\in(F_m\cup F'_m)\cap(F_n\cup F'_n)$
and $y\notin F_m\cup F'_n$, so $y\in F'_m\cap F_n\subseteq E'_m\cap E_n$ and $q_m<h(y)\le q_n$. Consequently $|h(y)-h(x)|\le\delta$. As $x$ and $\delta$ are
arbitrary, $h{\restriction}F$ is continuous. QQ Consequently $f{\restriction}F=(h{\restriction}F){\restriction}D$ is continuous, as required.

256G Theorem Let ν be a Radon measure on R r , with domain Σ, and suppose that φ : R r → R s is measurable
in the sense that all its coordinates are Σ-measurable. If the image measure ν ′ = νφ−1 (234D) is locally finite, it is a
Radon measure.
proof Write Σ for the domain of ν and Σ′ for the domain of ν ′ . If φ = (φ1 , . . . , φs ), then
φ−1 [{y : ηj ≤ α}] = {x : φj (x) ≤ α} ∈ Σ,
so {y : ηj ≤ α} ∈ Σ′ for every j ≤ s, α ∈ R, where I write y = (η1 , . . . , ηs ) for y ∈ R s . Consequently every Borel subset
of R s belongs to Σ′ (121J), and ν ′ is a topological measure. It is complete by 234Eb.
The point is of course that $\nu'$ is inner regular with respect to the compact sets. PP Suppose that $F\in\Sigma'$ and that
$\gamma<\nu'F$. For each $j\le s$, there is a closed set $H_j\subseteq\mathbb{R}^r$ such that $\phi_j{\restriction}H_j$ is continuous and $\nu(\mathbb{R}^r\setminus H_j)<\frac1s(\nu'F-\gamma)$,
by 256F. Set $H=\bigcap_{j\le s}H_j$; then $H$ is closed and $\phi{\restriction}H$ is continuous and
$$\nu(\mathbb{R}^r\setminus H)<\nu'F-\gamma=\nu\phi^{-1}[F]-\gamma,$$
so that $\nu(\phi^{-1}[F]\cap H)>\gamma$. Let $K\subseteq\phi^{-1}[F]\cap H$ be a compact set such that $\nu K\ge\gamma$, and set $L=\phi[K]$. Because
$K\subseteq H$ and $\phi{\restriction}H$ is continuous, $L$ is compact (2A2Eb). Of course $L\subseteq F$, and
$$\nu'L=\nu\phi^{-1}[L]\ge\nu K\ge\gamma.$$
As $F$ and $\gamma$ are arbitrary, $\nu'$ is inner regular with respect to the compact sets. QQ
Since ν ′ is locally finite by the hypothesis of the theorem, it is a Radon measure.

256H Examples I come at last to the promised examples.

(a) Lebesgue measure on R r is a Radon measure. (It is a topological measure by 115G, and inner regular with
respect to the compact sets by 134Fb.)

(b) Let $\langle t_n\rangle_{n\in\mathbb{N}}$ be any sequence in $\mathbb{R}^r$, and $\langle a_n\rangle_{n\in\mathbb{N}}$ any summable sequence in $[0,\infty[$. For every $E\subseteq\mathbb{R}^r$ set
$$\nu E=\sum\{a_n:t_n\in E\},$$
so that $\nu$ is a totally finite point-supported measure. Then $\nu$ is a (totally finite) Radon measure on $\mathbb{R}^r$. PP Clearly
$\nu$ is complete and defined on every Borel set and gives finite measure to bounded sets. To see that it is inner regular
with respect to the compact sets, observe that for any $E\subseteq\mathbb{R}^r$ the sets
$$K_n=E\cap\{t_i:i\le n\}$$
are compact and $\nu E=\lim_{n\to\infty}\nu K_n$. QQ
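
A small computation shows the inner approximation by the finite sets $K_n$ in this example. The sketch below is an illustration under assumptions: it works in dimension 1 with the hypothetical choices $t_n=1/(n+1)$ (distinct points) and $a_n=2^{-(n+1)}$, and it truncates both sequences after `N` terms so that every sum is finite.

```python
from fractions import Fraction

N = 40  # truncation level, assumed only so that the sums are finite
points = [Fraction(1, n + 1) for n in range(N)]           # t_n = 1/(n+1), all distinct
weights = [Fraction(1, 2 ** (n + 1)) for n in range(N)]   # a_n = 2^-(n+1), summable

def nu(E):
    """nu(E) = sum of a_n over those n with t_n in E; E is given as a predicate."""
    return sum((a for t, a in zip(points, weights) if E(t)), Fraction(0))

def nu_K(E, n):
    """nu of the finite, hence compact, set K_n = E ∩ {t_i : i <= n}.

    Because the points are distinct, only the indices i <= n contribute.
    """
    pairs = list(zip(points, weights))[: n + 1]
    return sum((a for t, a in pairs if E(t)), Fraction(0))

def E(t):
    """The set E = ]-infinity, 3/10], as a predicate."""
    return t <= Fraction(3, 10)

approximations = [nu_K(E, n) for n in range(N)]
print(approximations[:6], nu(E))
```

The values $\nu K_n$ are non-decreasing and reach $\nu E$ (for the truncated sequences) once every index has been used.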

(c) Now we come to a new idea. Recall that the Cantor set C (134G) is a closed negligible subset of [0, 1], and that
the Cantor function (134H) is a non-decreasing continuous function f : [0, 1] → [0, 1] such that f (0) = 0, f (1) = 1 and
$f$ is constant on each of the intervals composing $[0,1]\setminus C$. It follows that if we set $g(x)=\frac12(x+f(x))$ for $x\in[0,1]$,
then $g:[0,1]\to[0,1]$ is a continuous bijection such that the Lebesgue measure of $g[C]$ is $\frac12$ (134I); consequently
$g^{-1}:[0,1]\to[0,1]$ is continuous. Now extend $g$ to a bijection $h:\mathbb{R}\to\mathbb{R}$ by setting $h(x)=x$ for $x\in\mathbb{R}\setminus[0,1]$. Then $h$
and $h^{-1}$ are continuous. Note that $h[C]=g[C]$ has Lebesgue measure $\frac12$.
Let $\nu_1$ be the indefinite-integral measure defined from Lebesgue measure $\mu$ on $\mathbb{R}$ and the function $2\chi(h[C])$; that
is, $\nu_1E=2\mu(E\cap h[C])$ whenever this is defined. By 256E, $\nu_1$ is a Radon measure, and $\nu_1h[C]=\nu_1\mathbb{R}=1$. Let $\nu$ be
the measure $\nu_1h$, that is, $\nu E=\nu_1h[E]$ for just those $E\subseteq\mathbb{R}$ such that $h[E]\in\operatorname{dom}\nu_1$. Then $\nu$ is a Radon probability
measure on $\mathbb{R}$, by 256G, and $\nu C=1$, $\nu(\mathbb{R}\setminus C)=\mu C=0$.
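
The claim that $g[C]$ has measure $\frac12$ can be made plausible by exact arithmetic on the intervals removed in the construction of $C$. The sketch below is an illustration only; the implementation of the Cantor function by ternary digits, the names `cantor`, `g` and `removed_intervals`, and the choice of 8 stages are assumptions.

```python
from fractions import Fraction

def cantor(x, depth=64):
    """Cantor function on [0, 1]; exact at endpoints of removed intervals,
    truncated after `depth` ternary digits otherwise."""
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        if x < Fraction(1, 3):
            x = 3 * x
        elif x > Fraction(2, 3):
            value += scale
            x = 3 * x - 2
        else:                       # x is in the closed middle third, where f is constant
            return value + scale
        scale /= 2
    return value

def g(x):
    """g(x) = (x + f(x)) / 2, with f the Cantor function."""
    return (x + cantor(x)) / 2

def removed_intervals(level, lo=Fraction(0), hi=Fraction(1)):
    """Open middle thirds removed in the first `level` stages of the construction."""
    if level == 0:
        return []
    third = (hi - lo) / 3
    a, b = lo + third, hi - third
    return [(a, b)] + removed_intervals(level - 1, lo, a) + removed_intervals(level - 1, b, hi)

level = 8
gaps = removed_intervals(level)
gap_length = sum(b - a for a, b in gaps)            # equals 1 - (2/3)^level
image_length = sum(g(b) - g(a) for a, b in gaps)    # total length of the images under g
print(len(gaps), gap_length, image_length)
```

Since $f$ is constant on each removed interval, $g$ halves its length; the images therefore fill up a set of measure approaching $\frac12$, leaving measure $\frac12$ for $g[C]$.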

256I Remarks (a) The measure ν of 256Hc, sometimes called Cantor measure, is a classic example, and as such
has many constructions, some rather more natural than the one I use here (see 256Xk, and also 264Ym below). But
I choose the method above because it yields directly, without further investigation or any appeal to more advanced
general theory, the fact that ν is a Radon measure.

(b) The examples above are chosen to represent the extremes under the ‘Lebesgue decomposition’ described in 232I.
If ν is a (totally finite) Radon measure on R r , we can use 232Ib to express its restriction ν↾B to the Borel σ-algebra as
νp + νac + νcs , where νp is the ‘point-mass’ or ‘atomic’ part of ν↾B, νac is the ‘absolutely continuous’ part (with respect
to Lebesgue measure), and νcs is the ‘atomless singular part’. In the example of 256Hb, we have ν↾B = νp ; in 256E, if
we start from Lebesgue measure, we have ν↾B = νac ; and in 256Hc we have ν↾B = νcs .

256J Absolutely continuous Radon measures It is worth pausing a moment over the indefinite-integral mea-
sures described in 256E.
Proposition Let ν be a Radon measure on R r , where r ≥ 1, and write µ for Lebesgue measure on R r . Then the
following are equiveridical:
(i) ν is an indefinite-integral measure over µ;
(ii) $\nu E=0$ whenever $E$ is a Borel subset of $\mathbb{R}^r$ and $\mu E=0$.
In this case, if $g\in\mathcal{L}^0(\mu)$ and $\int_Eg\,d\mu=\nu E$ for every Borel set $E\subseteq\mathbb{R}^r$, then $g$ is a Radon-Nikodým derivative of $\nu$
with respect to $\mu$ in the sense of 232Hf.
proof (a)(i)⇒(ii) If $f$ is a Radon-Nikodým derivative of $\nu$ with respect to $\mu$, then of course
$$\nu E=\int_Ef\,d\mu=0$$
whenever $\mu E=0$.
(ii)⇒(i) If νE = 0 for every µ-negligible Borel set E, then νE is defined and equal to 0 for every µ-negligible set
E, because ν is complete and any µ-negligible set is included in a µ-negligible Borel set. Consequently dom ν includes
the domain Σ of µ, since every Lebesgue measurable set is expressible as the union of a Borel set and a negligible set.
For each $n\in\mathbb{N}$ set $E_n=\{x:n\le\|x\|<n+1\}$, so that $\langle E_n\rangle_{n\in\mathbb{N}}$ is a partition of $\mathbb{R}^r$ into bounded Borel sets. Set
$\nu_nE=\nu(E\cap E_n)$ for every Lebesgue measurable set $E$ and every $n\in\mathbb{N}$. Now $\nu_n$ is absolutely continuous with respect
to $\mu$ (232Ba), so by the Radon-Nikodým theorem (232F) there is a $\mu$-integrable function $f_n$ such that $\int_Ef_n\,d\mu=\nu_nE$
for every Lebesgue measurable set $E$. Because $\nu_nE\ge0$ for every $E\in\Sigma$, $f_n\ge0$ a.e.; because $\nu_n(\mathbb{R}^r\setminus E_n)=0$, $f_n=0$
a.e. on $\mathbb{R}^r\setminus E_n$. Now if we set
$$f=\max\Bigl(0,\sum_{n=0}^{\infty}f_n\Bigr),$$
$f$ will be defined $\mu$-a.e. and we shall have
$$\int_Ef\,d\mu=\sum_{n=0}^{\infty}\int_Ef_n\,d\mu=\sum_{n=0}^{\infty}\nu(E\cap E_n)=\nu E$$
for every Borel set $E$, so that the indefinite-integral measure $\nu'$ defined by $f$ and $\mu$ agrees with $\nu$ on the Borel sets.
Since this ensures that ν ′ is locally finite, ν ′ is a Radon measure, by 256E, and is equal to ν, by 256D. Accordingly ν
is an indefinite-integral measure over µ.
(b) As in (a-ii) above, g must be locally integrable and the indefinite-integral measure defined by g agrees with ν
on the Borel sets, so is identical with ν.

256K Products The class of Radon measures on Euclidean spaces is stable under a wide variety of operations, as
we have already seen; in particular, we have the following.
Theorem Let ν1 , ν2 be Radon measures on R r and R s respectively, where r, s ≥ 1. Let λ be their c.l.d. product
measure on R r × R s . Then λ is a Radon measure.
Remark When I say that λ is ‘Radon’ according to the definition in 256A, I am of course identifying R r × R s with
R r+s , as in 251M-251N.

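proof (a) I hope the following rather voluminous notation will seem natural. Write $\Sigma_1$, $\Sigma_2$ for the domains of $\nu_1$, $\nu_2$;
$\mathcal{B}_r$, $\mathcal{B}_s$ for the Borel $\sigma$-algebras of $\mathbb{R}^r$, $\mathbb{R}^s$; $\Lambda$ for the domain of $\lambda$; and $\mathcal{B}$ for the Borel $\sigma$-algebra of $\mathbb{R}^{r+s}$.
Because each $\nu_i$ is the completion of its restriction to the Borel sets (256C), $\lambda$ is the product of $\nu_1{\restriction}\mathcal{B}_r$ and $\nu_2{\restriction}\mathcal{B}_s$
(251T). Because $\nu_1{\restriction}\mathcal{B}_r$ and $\nu_2{\restriction}\mathcal{B}_s$ are $\sigma$-finite (256Ba, 212Ga), $\lambda$ must be the completion of its restriction to $\mathcal{B}_r\widehat{\otimes}\mathcal{B}_s$,
which by 251M is identified with $\mathcal{B}$. Setting $Q_n=\{(x,y):\|x\|\le n,\ \|y\|\le n\}$ we have
$$\lambda Q_n=\nu_1\{x:\|x\|\le n\}\cdot\nu_2\{y:\|y\|\le n\}<\infty$$
for every $n$, while every bounded subset of $\mathbb{R}^{r+s}$ is included in some $Q_n$. So $\lambda{\restriction}\mathcal{B}$ is locally finite, and its completion $\lambda$
is a Radon measure, by 256C.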

256L Remark We see from 253I that if ν1 and ν2 are Radon measures on R r and R s respectively, and both are
indefinite-integral measures over Lebesgue measure, then their product measure on R r+s is also an indefinite-integral
measure over Lebesgue measure.

*256M For the sake of applications in §286 below, I include another result, which is in fact one of the fundamental
properties of Radon measures, as will appear in §414.
Proposition Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $D$ any subset of $\mathbb{R}^r$. Let $\Phi$ be a non-empty upwards-directed family of non-negative continuous functions from $D$ to $\mathbb{R}$. For $x\in D$ set $g(x)=\sup_{f\in\Phi}f(x)$ in $[0,\infty]$. Then
(a) $g:D\to[0,\infty]$ is lower semi-continuous, therefore Borel measurable;
(b) $\int_D g\,d\nu=\sup_{f\in\Phi}\int_D f\,d\nu$.
proof (a) For any $u\in[-\infty,\infty]$,
$$\{x:x\in D,\ g(x)>u\}=\bigcup_{f\in\Phi}\{x:x\in D,\ f(x)>u\}$$
is an open set for the subspace topology on $D$ (2A3C), so is the intersection of $D$ with a Borel subset of $\mathbb{R}^r$. This is enough to show that $g$ is Borel measurable (121B-121C).
(b) Accordingly $\int_D g\,d\nu$ will be defined in $[0,\infty]$, and of course $\int_D g\,d\nu\ge\sup_{f\in\Phi}\int_D f\,d\nu$.
For the reverse inequality, observe that there is a countable set $\Psi\subseteq\Phi$ such that $g(x)=\sup_{f\in\Psi}f(x)$ for every $x\in D$. P For $a\in\mathbb{Q}$, $q,q'\in\mathbb{Q}^r$ set
$$\Phi_{aqq'}=\{f:f\in\Phi,\ f(y)>a\text{ whenever }y\in D\cap[q,q']\},$$
interpreting $[q,q']$ as in 115G. Choose $f_{aqq'}\in\Phi_{aqq'}$ if $\Phi_{aqq'}$ is not empty, and arbitrarily in $\Phi$ otherwise; and set $\Psi=\{f_{aqq'}:a\in\mathbb{Q},\ q,q'\in\mathbb{Q}^r\}$, so that $\Psi$ is a countable subset of $\Phi$. If $x\in D$ and $b<g(x)$, there is an $a\in\mathbb{Q}$ such that $b\le a<g(x)$; there is an $\hat f\in\Phi$ such that $\hat f(x)>a$; because $\hat f$ is continuous, there are $q,q'\in\mathbb{Q}^r$ such that $q\le x\le q'$ and $\hat f(y)>a$ whenever $y\in D\cap[q,q']$; so that $\hat f\in\Phi_{aqq'}$, $\Phi_{aqq'}\ne\emptyset$, $f_{aqq'}\in\Phi_{aqq'}$ and $\sup_{f\in\Psi}f(x)\ge f_{aqq'}(x)\ge b$. As $b$ is arbitrary, $g(x)=\sup_{f\in\Psi}f(x)$. Q
Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a sequence running over $\Psi$. Because $\Phi$ is upwards-directed, we can choose $\langle f'_n\rangle_{n\in\mathbb{N}}$ in $\Phi$ inductively in such a way that $f'_{n+1}\ge\max(f'_n,f_n)$ for every $n\in\mathbb{N}$. So $\langle f'_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence in $\Phi$ and $\sup_{n\in\mathbb{N}}f'_n(x)\ge\sup_{f\in\Psi}f(x)=g(x)$ for every $x\in D$. By B.Levi’s theorem,
$$\int_D g\,d\nu\le\sup_{n\in\mathbb{N}}\int_D f'_n\,d\nu\le\sup_{f\in\Phi}\int_D f\,d\nu,$$
and we have the required inequality.

256X Basic exercises > (a) Let ν be a measure on R r . (i) Show that it is locally finite, in the sense of 256Ab, iff
for every x ∈ R r there is a δ > 0 such that ν ∗ B(x, δ) < ∞. (Hint: the sets B(0, n) are compact.) (ii) Show that in
this case ν is σ-finite.

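> (b) Let $\nu$ be a Radon measure on $\mathbb{R}^r$ and $\mathcal{G}$ a non-empty upwards-directed family of open sets in $\mathbb{R}^r$. (i) Show that $\nu(\bigcup\mathcal{G})=\sup_{G\in\mathcal{G}}\nu G$. (Hint: observe that if $K\subseteq\bigcup\mathcal{G}$ is compact, then $K\subseteq G$ for some $G\in\mathcal{G}$.) (ii) Show that $\nu(E\cap\bigcup\mathcal{G})=\sup_{G\in\mathcal{G}}\nu(E\cap G)$ for every set $E$ which is measured by $\nu$.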

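> (c) Let $\nu$ be a Radon measure on $\mathbb{R}^r$ and $\mathcal{F}$ a non-empty downwards-directed family of closed sets in $\mathbb{R}^r$ such that $\inf_{F\in\mathcal{F}}\nu F<\infty$. (i) Show that $\nu(\bigcap\mathcal{F})=\inf_{F\in\mathcal{F}}\nu F$. (Hint: apply 256Xb(ii) to $\mathcal{G}=\{\mathbb{R}^r\setminus F:F\in\mathcal{F}\}$.) (ii) Show that $\nu(E\cap\bigcap\mathcal{F})=\inf_{F\in\mathcal{F}}\nu(E\cap F)$ for every $E$ in the domain of $\nu$.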

> (d) Show that a Radon measure ν on R r is atomless iff ν{x} = 0 for every x ∈ R r . (Hint: apply 256Xc with
F = {F : F ⊆ E is closed, not negligible}.)

(e) Let $\nu_1$, $\nu_2$ be Radon measures on $\mathbb{R}^r$, and $\alpha_1,\alpha_2\in\,]0,\infty[$. Set $\Sigma=\operatorname{dom}\nu_1\cap\operatorname{dom}\nu_2$, and for $E\in\Sigma$ set $\nu E=\alpha_1\nu_1E+\alpha_2\nu_2E$. Show that $\nu$ is a Radon measure on $\mathbb{R}^r$. Show that $\nu$ is an indefinite-integral measure over Lebesgue measure iff $\nu_1$, $\nu_2$ are, and that in this case a linear combination of Radon-Nikodým derivatives of $\nu_1$ and $\nu_2$ is a Radon-Nikodým derivative of $\nu$.

> (f) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. (i) Show that there is a unique closed set $F\subseteq\mathbb{R}^r$ such that, for open sets $G\subseteq\mathbb{R}^r$, $\nu G>0$ iff $G\cap F\ne\emptyset$. ($F$ is called the support of $\nu$.) (ii) Generally, a set $A\subseteq\mathbb{R}^r$ is called self-supporting if $\nu^*(A\cap G)>0$ whenever $G\subseteq\mathbb{R}^r$ is an open set meeting $A$. Show that for every closed set $F\subseteq\mathbb{R}^r$ there is a unique self-supporting closed set $F'\subseteq F$ such that $\nu(F\setminus F')=0$.

> (g) Show that a measure $\nu$ on $\mathbb{R}$ is a Radon measure iff it is a Lebesgue-Stieltjes measure as described in 114Xa. Show that in this case $\nu$ is an indefinite-integral measure over Lebesgue measure iff the function $x\mapsto\nu\,]{-\infty},x]$ is absolutely continuous on every bounded interval.

(h) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. Let $C_k$ be the space of continuous real-valued functions on $\mathbb{R}^r$ with bounded supports. Show that for every $\nu$-integrable function $f$ and every $\epsilon>0$ there is a $g\in C_k$ such that $\int|f-g|\,d\nu\le\epsilon$. (Hint: use arguments from 242O, but in (a-i) of the proof there start with closed intervals $I$.)

(i) Let ν be a Radon measure on R r . Show that νE = inf{νG : G ⊇ E is open} for every set E in the domain of ν.

(j) Let ν, ν ′ be two Radon measures on R r , and suppose that νI = ν ′ I for every half-open interval I ⊆ R r
(definition: 115Ab). Show that ν = ν ′ .

(k) Let $\nu$ be Cantor measure (256Hc). (i) Show that if $C_n$ is the $n$th set used in the construction of the Cantor set, so that $C_n$ consists of $2^n$ intervals of length $3^{-n}$, then $\nu I=2^{-n}$ for each of the intervals $I$ composing $C_n$. (ii) Let $\lambda$ be the usual measure on $\{0,1\}^{\mathbb{N}}$ (254J). Define $\phi:\{0,1\}^{\mathbb{N}}\to\mathbb{R}$ by setting $\phi(x)=\frac{2}{3}\sum_{n=0}^{\infty}3^{-n}x(n)$ for each $x\in\{0,1\}^{\mathbb{N}}$. Show that $\phi$ is a bijection between $\{0,1\}^{\mathbb{N}}$ and $C$. (iii) Show that if $\mathcal{B}$ is the Borel σ-algebra of $\mathbb{R}$, then $\{\phi^{-1}[E]:E\in\mathcal{B}\}$ is precisely the σ-algebra of subsets of $\{0,1\}^{\mathbb{N}}$ generated by the sets $\{x:x(n)=i\}$ for $n\in\mathbb{N}$, $i\in\{0,1\}$. (iv) Show that $\phi$ is an isomorphism between $(\{0,1\}^{\mathbb{N}},\lambda)$ and $(C,\nu_C)$, where $\nu_C$ is the subspace measure on $C$ induced by $\nu$.
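As a small sanity check on the map in 256Xk(ii), and not as a solution of the exercise, the following Python sketch evaluates φ on 0/1 sequences of length n (padded with zeros) and compares the values with the left endpoints of the 2^n intervals of C_n. The helper names `phi` and `cantor_intervals` are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

def phi(bits):
    """phi(x) = (2/3) * sum_n 3^(-n) x(n) for a finite 0/1 sequence padded with zeros."""
    return Fraction(2, 3) * sum(Fraction(b, 3**n) for n, b in enumerate(bits))

def cantor_intervals(n):
    """Closed intervals (a, b) making up C_n, the n-th stage of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt.append((a, a + third))      # keep the left third
            nxt.append((b - third, b))      # keep the right third
        intervals = nxt
    return intervals

n = 4
stage = cantor_intervals(n)
left_endpoints = sorted(a for a, _ in stage)
images = sorted(phi(bits) for bits in product((0, 1), repeat=n))
```

In this finite picture each interval of C_n corresponds to one choice of the first n coordinates, a cylinder set of λ-measure 2^{-n}, which is the value asked for in part (i).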

(l) Let $\nu$ and $\nu'$ be two Radon measures on $\mathbb{R}^r$. Show that $\nu'$ is an indefinite-integral measure over $\nu$ iff $\nu'E=0$ whenever $\nu E=0$, and in this case a function $f$ is a Radon-Nikodým derivative of $\nu'$ with respect to $\nu$ iff $\int_E f\,d\nu=\nu'E$ for every Borel set $E$.

256Y Further exercises (a) Let ν be a Radon measure on R r , and X any subset of Rr ; let νX be the subspace
measure on X and ΣX its domain, and give X its subspace topology (2A3C). Show that νX has the following properties:
(i) $\nu_X$ is complete and locally determined; (ii) every open subset of $X$ belongs to $\Sigma_X$; (iii) $\nu_XE=\sup\{\nu_XF:F\subseteq E$ is closed in $X\}$ for every $E\in\Sigma_X$; (iv) whenever $\mathcal{G}$ is a non-empty upwards-directed family of open subsets of $X$, $\nu_X(\bigcup\mathcal{G})=\sup_{G\in\mathcal{G}}\nu_XG$; (v) every point of $X$ belongs to an open set of finite measure.

(b) Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and $f:\mathbb{R}^r\to\mathbb{R}$ a function. Show that the following are equiveridical: (i) $f$ is $\Sigma$-measurable; (ii) for every non-negligible set $E\in\Sigma$ there is a non-negligible $F\in\Sigma$ such that $F\subseteq E$ and $f{\restriction}F$ is continuous; (iii) for every set $E\in\Sigma$, $\nu E=\sup_{K\in\mathcal{K}_f,\,K\subseteq E}\nu K$, where $\mathcal{K}_f=\{K:K\subseteq\mathbb{R}^r$ is compact, $f{\restriction}K$ is continuous$\}$. (Hint: for (ii)⇒(i), apply 215B(iv) to $\mathcal{K}_f$.)

(c) Take ν, X, νX and ΣX as in 256Ya. Suppose that f : X → R is a function. Show that f is ΣX -measurable iff for
every non-negligible measurable set E ⊆ X there is a non-negligible measurable F ⊆ E such that f ↾F is continuous.

(d) Let $\langle\nu_n\rangle_{n\in\mathbb{N}}$ be a sequence of Radon measures on $\mathbb{R}^r$. Show that there is a Radon measure $\nu$ on $\mathbb{R}^r$ such that every $\nu_n$ is an indefinite-integral measure over $\nu$. (Hint: find a sequence $\langle\alpha_n\rangle_{n\in\mathbb{N}}$ of strictly positive numbers such that $\sum_{n=0}^{\infty}\alpha_n\nu_nB(\mathbf{0},k)<\infty$ for every $k$, and set $\nu=\sum_{n=0}^{\infty}\alpha_n\nu_n$, using the idea of 256Xe.)

(e) A set G ⊆ R N is open if for every x ∈ G there are n ∈ N, δ > 0 such that
{y : y ∈ R N , |y(i) − x(i)| < δ for every i ≤ n} ⊆ G.
The Borel σ-algebra of R N is the σ-algebra B of subsets of R N generated, in the sense of 111Gb, by the family T of
open sets. (i) Show that T is a topology (2A3A). (ii) Show that a filter F on R N converges to x ∈ R N iff πi [[F]] → x(i)
for every i ∈ N, where πi (y) = y(i) for i ∈ N, y ∈ R N . (iii) Show that B is the σ-algebra generated by sets of the
form {x : x ∈ R N , x(i) ≤ a}, where i runs over N and a runs over R. (iv) Show that if αi ≥ 0 for every i ∈ N, then
{x : |x(i)| ≤ αi ∀ i ∈ N} is compact. (Hint: 2A3R.) (v) Show that any open set in R N is the union of a sequence of
closed sets. (Hint: look at sets of the form {x : qi ≤ x(i) ≤ qi′ ∀ i ≤ n}, where qi , qi′ ∈ Q for i ≤ n.) (vi) Show that if
ν0 is any probability measure with domain B, then its completion ν is inner regular with respect to the compact sets,
and therefore may be called a ‘Radon measure on R N ’. (Hint: show that there are compact sets of measure arbitrarily
close to 1, and therefore that every open set, and every closed set, includes a Kσ set of the same measure.)

256 Notes and comments Radon measures on Euclidean spaces are very special, and the results of this section do
not give clear pointers to the direction the theory takes when applied to other kinds of topological space. With the
material here you could make a stab at developing a theory of Radon measures on complete separable metric spaces,
provided you use 256Xa as the basis for your definition of ‘locally finite’. These are the spaces for which a version of
256C is true. (See 256Ye.) But for generalizations to other types of topological space, and for the more interesting
parts of the theory on R r , I must ask you to wait for Volume 4. My purpose in introducing Radon measures here is
strictly limited; I wish only to give a basis for §257 and §271 sufficiently solid not to need later revision. In fact I think
that all we really need are the Radon probability measures.
The chief technical difficulty in the definition of ‘Radon measure’ here lies in the insistence on completeness. It
may well be that for everything studied in this volume, it would be simpler to look at locally finite measures with
domain the algebra of Borel sets. This would involve us in a number of circumlocutions when dealing with Lebesgue
measure itself and its derivates, since Lebesgue measure is defined on a larger σ-algebra; but the serious objection
arises in the more advanced theory, when non-Borel sets of various kinds become central. Since my aim in this book
is to provide secure foundations for the study of all aspects of measure theory, I ask you to take a little extra trouble
now in order to avoid the possibility of having to re-work all your ideas later. The extra trouble arises, for instance, in
256D, 256Xe and 256Xj; since different Radon measures are defined on different σ-algebras, we have to check that two
Radon measures which agree on the compact sets, or on the open sets, have the same domains. On the credit side,
some of the power of 256G arises from the fact that the Radon image measure νφ−1 is defined on the whole σ-algebra
{F : φ−1 [F ] ∈ dom(ν)}, not just on the Borel sets.
The further technical point that Radon measures are expected to be locally finite gives less difficulty; its effect is
that from most points of view there is little difference between a general Radon measure and a totally finite Radon
measure. The extra condition which obviously has to be put into the hypotheses of such results as 256E and 256G is
no burden on either intuition or memory.
In effect, we have two definitions of Radon measures on Euclidean spaces: they are the inner regular locally finite
topological measures, and they are also the completions of the locally finite Borel measures. The equivalence of these
definitions is Theorem 256C. The latter definition is the better adapted to 256K, and the former to 256G. The ‘inner
regularity’ of the basic definition refers to compact sets; we also have forms of inner regularity with respect to closed
sets (256Bb) and Kσ sets (256Bc), and a complementary notion of ‘outer regularity’ with respect to open sets (256Xi).

257 Convolutions of measures


The ideas of this chapter can be brought together in a satisfying way in the theory of convolutions of Radon measures,
which will be useful in §272 and again in §285. I give just the definition (257A) and the central property (257B) of the
convolution of totally finite Radon measures, with a few corollaries and a note on the relation between convolution of
functions and convolution of measures (257F).

257A Definition Let r ≥ 1 be an integer and ν1 , ν2 two totally finite Radon measures on Rr . Let λ be the product
measure on R r × R r ; then λ also is a (totally finite) Radon measure, by 256K. Define φ : R r × R r → R r by setting
φ(x, y) = x + y; then φ is continuous, therefore measurable in the sense of 256G. The convolution of ν1 and ν2 , ν1 ∗ ν2 ,
is the image measure λφ−1 ; by 256G, this is a Radon measure.
Note that if ν1 and ν2 are Radon probability measures, then λ and ν1 ∗ ν2 are also probability measures.

257B Theorem Let r ≥ 1 be an integer, and ν1 and ν2 two totally finite Radon measures on R r ; let ν = ν1 ∗ ν2 be
their convolution, and λ their product on R r × R r . Then for any real-valued function h defined on a subset of R r ,
$$\int h(x+y)\,\lambda(d(x,y))=\int h(x)\,\nu(dx)$$
if either integral is defined in $[-\infty,\infty]$.
proof Apply 235J with J(x, y) = 1, φ(x, y) = x + y for all x, y ∈ R r .
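For finitely supported measures on ℝ (which are totally finite Radon measures) the identity of 257B can be checked by direct summation. The sketch below is only an illustration under that assumption; the dictionary representation `{point: mass}` and the helper names `convolve` and `integrate` are choices made here, not notation from the text.

```python
from fractions import Fraction
from collections import defaultdict

def convolve(nu1, nu2):
    """Image of the product measure under (x, y) -> x + y.
    Measures are finitely supported, stored as {point: mass}."""
    out = defaultdict(Fraction)          # missing points start with mass 0
    for x, a in nu1.items():
        for y, b in nu2.items():
            out[x + y] += a * b          # mass of {(x, y)} lands on the point x + y
    return dict(out)

def integrate(h, nu):
    """Integral of h with respect to a finitely supported measure."""
    return sum(h(x) * m for x, m in nu.items())

nu1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu2 = {0: Fraction(1, 3), 2: Fraction(2, 3)}
nu3 = {-1: Fraction(1, 4), 5: Fraction(3, 4)}

nu = convolve(nu1, nu2)
h = lambda t: t * t
lhs = integrate(h, nu)                                   # integral of h(x) nu(dx)
rhs = sum(h(x + y) * a * b                               # integral of h(x+y) over the product
          for x, a in nu1.items() for y, b in nu2.items())
```

With exact rational masses the two sides agree exactly, and the same representation gives a quick check of the commutativity and associativity laws of 257D and 257E in this special case.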

257C Corollary Let r ≥ 1 be an integer, and ν1 , ν2 two totally finite Radon measures on R r ; let ν = ν1 ∗ ν2 be
their convolution, and λ their product on R r × R r ; write Λ for the domain of λ. Let h be a Λ-measurable function
defined λ-almost everywhere in R r . Suppose that any one of the integrals
$$\iint|h(x+y)|\,\nu_1(dx)\,\nu_2(dy),\quad\iint|h(x+y)|\,\nu_2(dy)\,\nu_1(dx),\quad\int h(x+y)\,\lambda(d(x,y))$$
exists and is finite. Then $h$ is $\nu$-integrable and
$$\int h(x)\,\nu(dx)=\iint h(x+y)\,\nu_1(dx)\,\nu_2(dy)=\iint h(x+y)\,\nu_2(dy)\,\nu_1(dx).$$

proof Put 257B together with Fubini’s and Tonelli’s theorems (252H).

257D Corollary If ν1 and ν2 are totally finite Radon measures on R r , then ν1 ∗ ν2 = ν2 ∗ ν1 .


proof For any Borel set $E\subseteq\mathbb{R}^r$, apply 257C to $h=\chi E$ to see that
$$(\nu_1*\nu_2)(E)=\iint\chi E(x+y)\,\nu_1(dx)\,\nu_2(dy)=\iint\chi E(x+y)\,\nu_2(dy)\,\nu_1(dx)$$
$$=\iint\chi E(y+x)\,\nu_2(dy)\,\nu_1(dx)=(\nu_2*\nu_1)(E).$$

Thus ν1 ∗ ν2 and ν2 ∗ ν1 agree on the Borel sets of R r ; because they are both Radon measures, they must be identical
(256D).

257E Corollary If ν1 , ν2 and ν3 are totally finite Radon measures on R r , then (ν1 ∗ ν2 ) ∗ ν3 = ν1 ∗ (ν2 ∗ ν3 ).
proof For any Borel set $E\subseteq\mathbb{R}^r$, apply 257B to $h=\chi E$ to see that
$$((\nu_1*\nu_2)*\nu_3)(E)=\iint\chi E(x+z)\,(\nu_1*\nu_2)(dx)\,\nu_3(dz)$$
$$=\iiint\chi E(x+y+z)\,\nu_1(dx)\,\nu_2(dy)\,\nu_3(dz)$$
(because $x\mapsto\chi E(x+z)$ is Borel measurable for every $z$)
$$=\iint\chi E(x+y)\,\nu_1(dx)\,(\nu_2*\nu_3)(dy)$$
(because $(x,y)\mapsto\chi E(x+y)$ is Borel measurable, so $y\mapsto\int\chi E(x+y)\,\nu_1(dx)$ is $(\nu_2*\nu_3)$-integrable)
$$=(\nu_1*(\nu_2*\nu_3))(E).$$
Thus $(\nu_1*\nu_2)*\nu_3$ and $\nu_1*(\nu_2*\nu_3)$ agree on the Borel sets of $\mathbb{R}^r$; because they are both Radon measures, they must be identical.

257F Theorem Suppose that ν1 and ν2 are totally finite Radon measures on R r which are indefinite-integral
measures over Lebesgue measure µ. Then ν1 ∗ ν2 also is an indefinite-integral measure over µ; if f1 and f2 are Radon-
Nikodým derivatives of ν1 , ν2 respectively, then f1 ∗ f2 is a Radon-Nikodým derivative of ν1 ∗ ν2 .
proof By 255H/255L, $f_1*f_2$ is integrable with respect to $\mu$, with $\int f_1*f_2\,d\mu=\int f_1\,d\mu\cdot\int f_2\,d\mu$, and of course $f_1*f_2$ is non-negative. If $E\subseteq\mathbb{R}^r$ is a Borel set,
$$\int_E f_1*f_2\,d\mu=\iint\chi E(x+y)f_1(x)f_2(y)\,\mu(dx)\,\mu(dy)$$
(255G)
$$=\iint\chi E(x+y)f_2(y)\,\nu_1(dx)\,\mu(dy)$$
(because $x\mapsto\chi E(x+y)$ is Borel measurable)
$$=\iint\chi E(x+y)\,\nu_1(dx)\,\nu_2(dy)$$
(because $(x,y)\mapsto\chi E(x+y)$ is Borel measurable, so $y\mapsto\int\chi E(x+y)\,\nu_1(dx)$ is $\nu_2$-integrable)
$$=(\nu_1*\nu_2)(E).$$
So $f_1*f_2$ is a Radon-Nikodým derivative of $\nu_1*\nu_2$ with respect to $\mu$, by 256J.
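A rough numerical illustration of 257F, under the assumption that ν1 = ν2 is Lebesgue measure restricted to [0, 1], so that f1 = f2 = χ[0, 1] and f1 ∗ f2 is the familiar triangular function on [0, 2]. The sketch compares (ν1 ∗ ν2)(]−∞, t]), computed as a Riemann sum for the double integral of χE(x + y), with the integral of f1 ∗ f2 over the same set. The function names and grid sizes are arbitrary choices made here.

```python
def tri(x):
    """f1*f2 for f1 = f2 = indicator of [0, 1]: the triangular function on [0, 2]."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0

def conv_measure_cdf(t, n=400):
    """(nu1*nu2)(]-inf, t]) as a midpoint Riemann sum of chi(x + y <= t) over [0, 1]^2."""
    h = 1.0 / n
    count = sum(1 for i in range(n) for j in range(n)
                if (i + 0.5) * h + (j + 0.5) * h <= t)
    return count * h * h

def density_cdf(t, n=4000):
    """Integral of tri over ]-inf, t], by the midpoint rule on [0, 2]."""
    h = 2.0 / n
    return sum(tri((k + 0.5) * h) * h for k in range(n) if (k + 0.5) * h <= t)

# The two columns should agree up to discretization error.
table = [(t, conv_measure_cdf(t), density_cdf(t)) for t in (0.5, 1.0, 1.5)]
```

The agreement is only approximate, since both sides are discretized; the exact values for these three choices of t are 1/8, 1/2 and 7/8.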

257X Basic exercises > (a) Let $r\ge1$ be an integer. Let $\delta_0$ be the Dirac measure on $\mathbb{R}^r$ concentrated at $\mathbf{0}$. Show that $\delta_0$ is a Radon probability measure on $\mathbb{R}^r$ and that $\delta_0*\nu=\nu$ for every totally finite Radon measure $\nu$ on $\mathbb{R}^r$.
(b) Let $\mu$ and $\nu$ be totally finite Radon measures on $\mathbb{R}^r$, and $E$ any set measured by their convolution $\mu*\nu$. Show that $\int\mu(E-y)\,\nu(dy)$ is defined in $[0,\infty]$ and equal to $(\mu*\nu)(E)$.
(c) Let ν1 , . . . , νn be totally finite Radon measures on R r , and let ν be the convolution ν1 ∗ . . . ∗ νn (using 257E to
see that such a bracketless expression is legitimate). Show that
$$\int h(x)\,\nu(dx)=\int\ldots\int h(x_1+\ldots+x_n)\,\nu_1(dx_1)\ldots\nu_n(dx_n)$$
for every ν-integrable function h.
(d) Let ν1 and ν2 be totally finite Radon measures on R r , with supports F1 , F2 (256Xf). Show that the support of
ν1 ∗ ν2 is {x + y : x ∈ F1 , y ∈ F2 }.
> (e) Let $\nu_1$ and $\nu_2$ be totally finite Radon measures on $\mathbb{R}^r$, and suppose that $\nu_1$ has a Radon-Nikodým derivative $f$ with respect to Lebesgue measure $\mu$. Show that $\nu_1*\nu_2$ has a Radon-Nikodým derivative $g$, where $g(x)=\int f(x-y)\,\nu_2(dy)$ for $\mu$-almost every $x\in\mathbb{R}^r$.
(f ) Suppose that ν1 , ν2 , ν1′ and ν2′ are totally finite Radon measures on R r , and that ν1′ , ν2′ are absolutely continuous
with respect to ν1 , ν2 respectively. Show that ν1′ ∗ ν2′ is absolutely continuous with respect to ν1 ∗ ν2 .

257Y Further exercises (a) Let $M$ be the space of countably additive functionals defined on the algebra $\mathcal{B}$ of Borel subsets of $\mathbb{R}$, with its norm $\|\nu\|=|\nu|(\mathbb{R})$ (see 231Yh). (i) Show that we have a unique bilinear operator $*:M\times M\to M$ such that $(\mu_1{\restriction}\mathcal{B})*(\mu_2{\restriction}\mathcal{B})=(\mu_1*\mu_2){\restriction}\mathcal{B}$ for all totally finite Radon measures $\mu_1$, $\mu_2$ on $\mathbb{R}$. (ii) Show that $*$ is commutative and associative. (iii) Show that $\|\nu_1*\nu_2\|\le\|\nu_1\|\|\nu_2\|$ for all $\nu_1,\nu_2\in M$, so that $M$ is a Banach algebra under this multiplication. (iv) Show that $M$ has a multiplicative identity. (v) Show that $L^1(\mu)$ can be regarded as a closed subalgebra of $M$, where $\mu$ is Lebesgue measure on $\mathbb{R}$ (cf. 255Xi).
(b) Let us say that a Radon measure on $]{-\pi},\pi]$ is a measure $\nu$, with domain $\Sigma$, on $]{-\pi},\pi]$ such that (i) every Borel subset of $]{-\pi},\pi]$ belongs to $\Sigma$; (ii) for every $E\in\Sigma$ there are Borel sets $E_1$, $E_2$ such that $E_1\subseteq E\subseteq E_2$ and $\nu(E_2\setminus E_1)=0$; (iii) every compact subset of $]{-\pi},\pi]$ has finite measure. Show that for any two totally finite Radon measures $\nu_1$, $\nu_2$ on $]{-\pi},\pi]$ there is a unique totally finite Radon measure $\nu$ on $]{-\pi},\pi]$ such that
$$\int h(x)\,\nu(dx)=\iint h(x+_{2\pi}y)\,\nu_1(dx)\,\nu_2(dy)$$
for every $\nu$-integrable function $h$, where $+_{2\pi}$ is defined as in 255Ma.

257 Notes and comments Of course convolution of functions and convolution of measures are very closely connected;
the obvious link being 257F, but the correspondence between 255G and 257B is also very marked. In effect, they give
us the same notion of convolution u ∗ v when u, v are positive members of L1 and u ∗ v is interpreted in L1 rather than
as a function (257Ya). But we should have to go rather deeper than the arguments here to find ideas in the theory of
convolution of measures to correspond to such results as 255K. I will return to questions of this type in §444 in Volume
4.
All the theorems of this section can be extended to general abelian locally compact Hausdorff topological groups;
but for such generality we need much more advanced ideas (see §444), and for the moment I leave only the suggestion
in 257Yb that you should try to adapt the ideas here to ]−π, π] or S 1 .

Chapter 26
Change of Variable in the Integral
I suppose most courses on basic calculus still devote a substantial amount of time to practice in the techniques of integrating standard functions. Surely the most powerful single technique is that of substitution: replacing $\int g(y)\,dy$ by $\int g(\phi(x))\phi'(x)\,dx$ for an appropriate function $\phi$. At this level one usually concentrates on the skills of guessing at
appropriate φ and getting the formulae right. I will not address such questions here, except for rare special cases; in
this book I am concerned rather with validating the process. For functions of one variable, it can usually be justified
by an appeal to the Fundamental Theorem of Calculus, and for any particular case I would normally go first to §225
in the hope that the results there would cover it. But for functions of two or more variables some much deeper ideas
are necessary.
I have already treated the general problem of integration-by-substitution in abstract measure spaces in §235. There I described conditions under which $\int g(y)\,dy=\int g(\phi(x))J(x)\,dx$ for an appropriate function $J$. The context there gave
very little scope for suggestions as to how to compute J; at best, it could be presented as a Radon-Nikodým derivative
(235M). In this chapter I give a form of the fundamental theorem for the case of Lebesgue measure, in which φ is a
more or less differentiable function between Euclidean spaces, and J is a ‘Jacobian’, the modulus of the determinant
of the derivative of φ (263D). This necessarily depends on a serious investigation of the relationship between Lebesgue
measure and geometry. The first step is to establish a form of Vitali’s theorem for r-dimensional space, together with
r-dimensional density theorems; I do this in §261, following closely the scheme of §§221 and 223 above. We need to
know quite a lot about differentiable functions between Euclidean spaces, and it turns out that the theory is intertwined
with that of ‘Lipschitz’ functions; I treat these in §262.
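To fix ideas about what a ‘Jacobian’ does in practice, here is a numerical sketch for the familiar polar parametrization φ(ρ, θ) = (ρ cos θ, ρ sin θ) of the unit disc, for which J(ρ, θ) = |det φ′(ρ, θ)| = ρ. The choice of φ, the test functions and the midpoint rule are assumptions made for this illustration; the general theorem is the subject of §263.

```python
import math

def polar_integral(g, n_rho=200, n_theta=200):
    """Midpoint-rule approximation of the integral of g(phi(rho, theta)) * J(rho, theta)
    over ]0, 1[ x ]0, 2*pi[, where phi is the polar map and J = rho."""
    d_rho, d_theta = 1.0 / n_rho, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_rho):
        rho = (i + 0.5) * d_rho
        for j in range(n_theta):
            theta = (j + 0.5) * d_theta
            y = (rho * math.cos(theta), rho * math.sin(theta))   # y = phi(rho, theta)
            total += g(y) * rho * d_rho * d_theta                # g(phi) * J * cell area
    return total

area = polar_integral(lambda y: 1.0)                          # should be close to pi
second_moment = polar_integral(lambda y: y[0]**2 + y[1]**2)   # should be close to pi / 2
```

Both values are approximations to the integral of g over the disc itself; dropping the factor ρ would give 2π for the ‘area’, which is the kind of error the Jacobian is there to prevent.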
In the next two sections of the chapter, I turn to a separate problem for which some of the same techniques turn
out to be appropriate: the description of surface measure on (smooth) surfaces in Euclidean space, like the surface of
a cone or sphere. I suppose there is no difficulty in forming a robust intuition as to what is meant by the ‘area’ of
such a surface and of suitably simple regions within it, and there is a very strong presumption that there ought to be
an expression for this intuition in terms of measure theory as presented in this book; but the details are not I think
straightforward. The first point to note is that for any calculation of the area of a region G in a surface S, one would
always turn at once to a parametrization of the region, that is, a bijection φ : D → G from some subset D of Euclidean
space. But obviously one needs to be sure that the result of the calculation is independent of the parametrization
chosen, and while it would be possible to base the theory on results showing such independence directly, that does not
seem to me to be a true reflection of the underlying intuition, which is that the area of simple surfaces, at least, is
something intrinsic to their geometry. I therefore see no acceptable alternative to a theory of ‘r-dimensional measure’
which can be described in purely geometric terms. This is the burden of §264, in which I give the definition and most
fundamental properties of Hausdorff r-dimensional measure in Euclidean spaces. With this established, we find that
the techniques of §§261-263 are sufficient to relate it to calculations through parametrizations, which is what I do in
§265.
The chapter ends with a brief account of the Brunn-Minkowski inequality (266C), which is an essential tool for the
geometric measure theory of convex sets.

261 Vitali’s theorem in R r


The main aim of this section is to give r-dimensional versions of Vitali’s theorem and Lebesgue’s Density Theorem,
following ideas already presented in §§221 and 223.

261A Notation For most of this chapter, we shall be dealing with the geometry and measure of Euclidean space;
it will save space to fix some notation.
Throughout this section and the two following, r ≥ 1 will be an integer. I will use Roman letters for members of R r
and Greek letters for their coordinates, so that a = (α1 , . . . , αr ), etc.; if you see any Greek letter with a subscript you
should look first for a nearby vector of which it might be a coordinate. The measure under consideration will nearly
always be Lebesgue measure on $\mathbb{R}^r$; so unless otherwise indicated $\mu$ should be interpreted as Lebesgue measure, and $\mu^*$ as Lebesgue outer measure. Similarly, $\int\ldots dx$ will always be integration with respect to Lebesgue measure (in a dimension determined by the context).
For $x=(\xi_1,\ldots,\xi_r)\in\mathbb{R}^r$, write $\|x\|=\sqrt{\xi_1^2+\ldots+\xi_r^2}$. Recall that $\|x+y\|\le\|x\|+\|y\|$ (1A2C) and that $\|\alpha x\|=|\alpha|\|x\|$ for any vectors $x$, $y$ and scalar $\alpha$.
I will use the same notation as in §115 for ‘intervals’, so that, in particular,
$$[a,b[\;=\{x:\alpha_i\le\xi_i<\beta_i\ \forall\,i\le r\},$$
$$]a,b[\;=\{x:\alpha_i<\xi_i<\beta_i\ \forall\,i\le r\},$$
$$[a,b]=\{x:\alpha_i\le\xi_i\le\beta_i\ \forall\,i\le r\}$$
whenever $a,b\in\mathbb{R}^r$.
$\mathbf{0}=(0,\ldots,0)$ will be the zero vector in $\mathbb{R}^r$, and $\mathbf{1}$ will be $(1,\ldots,1)$. If $x\in\mathbb{R}^r$ and $\delta>0$, $B(x,\delta)$ will be the closed ball with centre $x$ and radius $\delta$, that is, $\{y:y\in\mathbb{R}^r,\ \|y-x\|\le\delta\}$. Note that $B(x,\delta)=x+B(\mathbf{0},\delta)$; so that by the translation-invariance of Lebesgue measure we have
$$\mu B(x,\delta)=\mu B(\mathbf{0},\delta)=\beta_r\delta^r,$$
where
$$\beta_r=\frac{1}{k!}\pi^k\text{ if }r=2k\text{ is even},$$
$$\beta_r=\frac{2^{2k+1}k!}{(2k+1)!}\pi^k\text{ if }r=2k+1\text{ is odd}$$
(252Q).
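The two-case formula for β_r is easy to mistype, so a quick check may be welcome. The sketch below evaluates it and compares it with the standard closed form π^{r/2}/Γ(r/2 + 1) for the volume of the unit ball; that closed form is quoted from general knowledge rather than from this section, and the function names are chosen here.

```python
import math

def beta(r):
    """Volume of the unit ball in R^r, by the even/odd formula of 261A."""
    if r % 2 == 0:
        k = r // 2
        return math.pi**k / math.factorial(k)
    k = (r - 1) // 2
    return 2**(2 * k + 1) * math.factorial(k) * math.pi**k / math.factorial(2 * k + 1)

def beta_gamma(r):
    """The same volume via the Gamma function: pi^(r/2) / Gamma(r/2 + 1)."""
    return math.pi**(r / 2) / math.gamma(r / 2 + 1)

values = {r: beta(r) for r in range(1, 11)}
```

For r = 1, 2, 3 this gives the length 2 of [−1, 1], the area π of the unit disc and the volume 4π/3 of the unit ball.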

261B Vitali’s theorem in $\mathbb{R}^r$ Let $A\subseteq\mathbb{R}^r$ be any set, and $\mathcal{I}$ a family of closed non-trivial (that is, non-singleton, or, equivalently, non-negligible) balls in $\mathbb{R}^r$ such that every point of $A$ is contained in arbitrarily small members of $\mathcal{I}$. Then there is a countable disjoint set $\mathcal{I}_0\subseteq\mathcal{I}$ such that $\mu(A\setminus\bigcup\mathcal{I}_0)=0$.
proof (a) To begin with (down to the end of (f) below), suppose that $\|x\|<M$ for every $x\in A$, and set
$$\mathcal{I}'=\{I:I\in\mathcal{I},\ I\subseteq B(\mathbf{0},M)\}.$$
If there is a finite disjoint set $\mathcal{I}_0\subseteq\mathcal{I}'$ such that $A\subseteq\bigcup\mathcal{I}_0$ (including the possibility that $A=\mathcal{I}_0=\emptyset$), we can stop. So let us suppose henceforth that there is no such $\mathcal{I}_0$.
(b) In this case, if $\mathcal{I}_0$ is any finite disjoint subset of $\mathcal{I}'$, there is a $J\in\mathcal{I}'$ which is disjoint from any member of $\mathcal{I}_0$. P Take $x\in A\setminus\bigcup\mathcal{I}_0$. Because every member of $\mathcal{I}_0$ is closed, there is a $\delta>0$ such that $B(x,\delta)$ does not meet any member of $\mathcal{I}_0$, and as $\|x\|<M$ we can suppose that $B(x,\delta)\subseteq B(\mathbf{0},M)$. Let $J$ be a member of $\mathcal{I}$, containing $x$, and of diameter at most $\delta$; then $J\in\mathcal{I}'$ and $J\cap\bigcup\mathcal{I}_0=\emptyset$. Q
(c) We can therefore choose a sequence $\langle\gamma_n\rangle_{n\in\mathbb{N}}$ of real numbers and a disjoint sequence $\langle I_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{I}'$ inductively, as follows. Given $\langle I_j\rangle_{j<n}$ (if $n=0$, this is the empty sequence, with no members), with $I_j\in\mathcal{I}'$ for each $j<n$, and $I_j\cap I_k=\emptyset$ for $j<k<n$, set $\mathcal{J}_n=\{I:I\in\mathcal{I}',\ I\cap I_j=\emptyset$ for every $j<n\}$. We know from (b) that $\mathcal{J}_n\ne\emptyset$. Set
$$\gamma_n=\sup\{\operatorname{diam}I:I\in\mathcal{J}_n\};$$
then $\gamma_n\le2M$, because every member of $\mathcal{J}_n$ is included in $B(\mathbf{0},M)$. We can therefore find a set $I_n\in\mathcal{J}_n$ such that $\operatorname{diam}I_n\ge\frac12\gamma_n$, and this continues the induction.
(d) Because the $I_n$ are disjoint measurable subsets of the bounded set $B(\mathbf{0},M)$, we have
$$\sum_{n=0}^{\infty}\mu I_n\le\mu B(\mathbf{0},M)<\infty,$$
and $\lim_{n\to\infty}\mu I_n=0$. Also $\mu I_n\ge\beta_r(\frac14\gamma_n)^r$ for each $n$, so $\lim_{n\to\infty}\gamma_n=0$.
(e) Now define $I'_n$ to be the closed ball with the same centre as $I_n$ but five times the diameter, so that it contains every point within a distance $\gamma_n$ of $I_n$. I claim that, for any $n$, $A\subseteq\bigcup_{j<n}I_j\cup\bigcup_{j\ge n}I'_j$. P ? Suppose, if possible, otherwise. Take any $x\in A\setminus(\bigcup_{j<n}I_j\cup\bigcup_{j\ge n}I'_j)$. Let $\delta>0$ be such that
$$B(x,\delta)\subseteq B(\mathbf{0},M)\setminus\bigcup_{j<n}I_j,$$
and let $J\in\mathcal{I}$ be such that $x\in J\subseteq B(x,\delta)$. Then
$$\lim_{m\to\infty}\gamma_m=0<\operatorname{diam}J$$
(this is where we use the hypothesis that all the balls in $\mathcal{I}$ are non-trivial); let $m$ be the least integer greater than or equal to $n$ such that $\gamma_m<\operatorname{diam}J$. In this case $J$ cannot belong to $\mathcal{J}_m$, so there must be some $k<m$ such that $J\cap I_k\ne\emptyset$, because certainly $J\in\mathcal{I}'$. By the choice of $\delta$, $k$ cannot be less than $n$, so $n\le k<m$, and $\gamma_k\ge\operatorname{diam}J$. So the distance from $x$ to the nearest point of $I_k$ is at most $\operatorname{diam}J\le\gamma_k$. But this means that $x\in I'_k$; which contradicts the choice of $x$. X Q
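(f) It follows that
$$\mu^*(A\setminus\bigcup_{j<n}I_j)\le\mu(\bigcup_{j\ge n}I'_j)\le\sum_{j=n}^{\infty}\mu I'_j\le5^r\sum_{j=n}^{\infty}\mu I_j.$$
As
$$\sum_{j=0}^{\infty}\mu I_j\le\mu B(\mathbf{0},M)<\infty,$$
$\lim_{n\to\infty}\mu^*(A\setminus\bigcup_{j<n}I_j)=0$ and
$$\mu(A\setminus\bigcup_{j\in\mathbb{N}}I_j)=\mu^*(A\setminus\bigcup_{j\in\mathbb{N}}I_j)=0.$$
Thus in this case we may set $\mathcal{I}_0=\{I_n:n\in\mathbb{N}\}$ to obtain a countable disjoint family in $\mathcal{I}$ with $\mu(A\setminus\bigcup\mathcal{I}_0)=0$.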
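(g) This completes the proof if $A$ is bounded. In general, set
$$U_n=\{x:x\in\mathbb{R}^r,\ n<\|x\|<n+1\},\quad A_n=A\cap U_n,\quad\mathcal{J}_n=\{I:I\in\mathcal{I},\ I\subseteq U_n\},$$
for each $n\in\mathbb{N}$. Then for each $n$ we see that every point of $A_n$ belongs to arbitrarily small members of $\mathcal{J}_n$, so there is a countable disjoint $\mathcal{J}'_n\subseteq\mathcal{J}_n$ such that $A_n\setminus\bigcup\mathcal{J}'_n$ is negligible. Now (because the $U_n$ are disjoint) $\mathcal{I}_0=\bigcup_{n\in\mathbb{N}}\mathcal{J}'_n$ is disjoint, and of course $\mathcal{I}_0$ is a countable subset of $\mathcal{I}$; moreover,
$$A\setminus\bigcup\mathcal{I}_0\subseteq(\mathbb{R}^r\setminus\bigcup_{n\in\mathbb{N}}U_n)\cup\bigcup_{n\in\mathbb{N}}(A_n\setminus\bigcup\mathcal{J}'_n)$$
is negligible. (To see that $\mathbb{R}^r\setminus\bigcup_{n\in\mathbb{N}}U_n=\{x:\|x\|\in\mathbb{N}\}$ is negligible, note that for any $n\in\mathbb{N}$ the set
$$\{x:\|x\|=n\}\subseteq B(\mathbf{0},n)\setminus B(\mathbf{0},\delta n)$$
has measure at most $\beta_rn^r-\beta_r(\delta n)^r$ for every $\delta\in[0,1[$, so must be negligible.)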
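The selection step in (c) and the five-fold enlargement in (e) have a simple finite analogue, sketched here for closed intervals in ℝ. This is an illustration only and makes a simplifying assumption: with a finite family we can always take a longest admissible interval, which certainly has diameter at least half the supremum. The names `greedy_disjoint` and `enlarge` and the sample family are invented for the example.

```python
def greedy_disjoint(intervals):
    """From a finite family of closed intervals (a, b) with a < b, repeatedly take
    a longest interval disjoint from those already chosen."""
    chosen = []
    for a, b in sorted(intervals, key=lambda I: I[1] - I[0], reverse=True):
        # closed intervals [a, b] and [c, d] are disjoint iff b < c or d < a
        if all(b < c or d < a for c, d in chosen):
            chosen.append((a, b))
    return chosen

def enlarge(interval, factor=5):
    """Interval with the same centre and `factor` times the diameter."""
    a, b = interval
    centre, radius = (a + b) / 2, (b - a) / 2
    return (centre - factor * radius, centre + factor * radius)

family = [(0.0, 1.0), (0.5, 1.2), (1.1, 1.3), (2.0, 2.1), (2.05, 2.5), (1.25, 2.02)]
chosen = greedy_disjoint(family)
big = [enlarge(I) for I in chosen]
# every member of the family lies inside some enlarged chosen interval
covered = all(any(c <= a and b <= d for c, d in big) for a, b in family)
```

Any rejected interval meets a chosen interval that is at least as long, so it lies inside the enlargement of that interval; this is the finite counterpart of the claim proved in (e).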

261C Just as in §223, we can use the r-dimensional Vitali theorem to prove theorems on the approximation of
functions by their local mean values.
Density Theorem in R r : integral form Let D be a subset of R r , and f a real-valued function which is integrable
over D. Then
$$f(x)=\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{D\cap B(x,\delta)}f\,d\mu$$
for almost every x ∈ D.
proof (a) To begin with (down to the end of (b)), let us suppose that D = dom f = R r .
Take $n\in\mathbb{N}$ and $q,q'\in\mathbb{Q}$ with $q<q'$, and set
$$A=A_{nqq'}=\{x:\|x\|\le n,\ f(x)\le q,\ \limsup_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}f\,d\mu>q'\}.$$
? Suppose, if possible, that $\mu^*A>0$. Let $\epsilon>0$ be such that $\epsilon(1+|q|)<(q'-q)\mu^*A$, and let $\eta\in\,]0,\epsilon]$ be such that $\int_E|f|\le\epsilon$ whenever $\mu E\le\eta$ (225A). Let $G\supseteq A$ be an open set of measure at most $\mu^*A+\eta$ (134Fa). Let $\mathcal{I}$ be the set of non-trivial closed balls $B\subseteq G$ such that $\frac{1}{\mu B}\int_Bf\,d\mu\ge q'$. Then every point of $A$ is contained in (indeed, is the centre of) arbitrarily small members of $\mathcal{I}$. So there is a countable disjoint set $\mathcal{I}_0\subseteq\mathcal{I}$ such that $\mu(A\setminus\bigcup\mathcal{I}_0)=0$, by 261B; set $H=\bigcup\mathcal{I}_0$.
Because $\int_If\,d\mu\ge q'\mu I$ for each $I\in\mathcal{I}_0$, we have
$$\int_Hf\,d\mu=\sum_{I\in\mathcal{I}_0}\int_If\,d\mu\ge q'\sum_{I\in\mathcal{I}_0}\mu I=q'\mu H\ge q'\mu^*A.$$
Set
$$E=\{x:x\in G,\ f(x)\le q\}.$$
Then $E$ is measurable, and $A\subseteq E\subseteq G$; so
$$\mu^*A\le\mu E\le\mu G\le\mu^*A+\eta\le\mu^*A+\epsilon.$$
Also
$$\mu(H\setminus E)\le\mu G-\mu E\le\eta,$$
so by the choice of $\eta$, $\int_{H\setminus E}f\le\epsilon$ and
$$\int_Hf\le\epsilon+\int_{H\cap E}f\le\epsilon+q\mu(H\cap E)$$
$$\le\epsilon+q\mu^*A+|q|(\mu(H\cap E)-\mu^*A)\le q\mu^*A+\epsilon(1+|q|)$$
(because $\mu^*A=\mu^*(A\cap H)\le\mu(H\cap E)$)
$$<q'\mu^*A\le\int_Hf,$$
which is impossible. X
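Thus $A_{nqq'}$ is negligible. This is true for all $q<q'$ and all $n$, so
$$A^*=\bigcup_{q,q'\in\mathbb{Q},\,q<q'}\ \bigcup_{n\in\mathbb{N}}A_{nqq'}$$
is negligible. But
$$f(x)\ge\limsup_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}f$$
for every $x\in\mathbb{R}^r\setminus A^*$, that is, for almost all $x\in\mathbb{R}^r$.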


(b) Similarly, or applying this result to $-f$,
$$f(x)\le\liminf_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}f$$
for almost every $x$, so
$$f(x)=\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}f$$
for almost every $x$.
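(c) For the (superficially) more general case enunciated in the theorem, let $\tilde f$ be a $\mu$-integrable function extending $f{\restriction}D$, defined everywhere on $\mathbb{R}^r$, and such that $\int_F\tilde f=\int_{D\cap F}f$ for every measurable $F\subseteq\mathbb{R}^r$ (applying 214Eb to $f{\restriction}D$). Then
$$f(x)=\tilde f(x)=\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}\tilde f=\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{D\cap B(x,\delta)}f$$
for almost every $x\in D$.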
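For a continuous integrand the convergence of local mean values is elementary, and can be watched directly. The sketch below takes r = 1 and f(t) = t², both chosen here purely for illustration, and computes the mean value of f over B(x, δ) = [x − δ, x + δ] exactly from the antiderivative t³/3.

```python
from fractions import Fraction

def mean_value(x, delta):
    """(1 / mu B(x, delta)) * integral of t^2 over [x - delta, x + delta], exactly."""
    F = lambda t: t**3 / 3                  # antiderivative of t^2
    return (F(x + delta) - F(x - delta)) / (2 * delta)

x = Fraction(3, 4)
deltas = [Fraction(1, 10**k) for k in range(1, 6)]
gaps = [mean_value(x, d) - x**2 for d in deltas]   # mean value minus f(x)
```

Here the gap is exactly δ²/3, so it tends to 0 as δ ↓ 0 at every point; the content of 261C is that the same limit holds almost everywhere for an arbitrary integrable f.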

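261D Corollary (a) If $D\subseteq\mathbb{R}^r$ is any set, then
$$\lim_{\delta\downarrow0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)}=1$$
for almost every $x\in D$.
(b) If $E\subseteq\mathbb{R}^r$ is a measurable set, then
$$\lim_{\delta\downarrow0}\frac{\mu(E\cap B(x,\delta))}{\mu B(x,\delta)}=\chi E(x)$$
for almost every $x\in\mathbb{R}^r$.
(c) If $D\subseteq\mathbb{R}^r$ and $f:D\to\mathbb{R}$ is any function, then for almost every $x\in D$,
$$\lim_{\delta\downarrow0}\frac{\mu^*(\{y:y\in D,\ |f(y)-f(x)|\le\epsilon\}\cap B(x,\delta))}{\mu B(x,\delta)}=1$$
for every $\epsilon>0$.
(d) If $D\subseteq\mathbb{R}^r$ and $f:D\to\mathbb{R}$ is measurable, then for almost every $x\in D$,
$$\lim_{\delta\downarrow0}\frac{\mu^*(\{y:y\in D,\ |f(y)-f(x)|\ge\epsilon\}\cap B(x,\delta))}{\mu B(x,\delta)}=0$$
for every $\epsilon>0$.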
proof (a) Apply 261C with f = χB(0, n) to see that, for any n ∈ N,
µ∗ (D∩B(x,δ))
limδ↓0 =1
µB(x,δ)

for almost every x ∈ D with kxk < n.


(b) Apply (a) to E to see that
µ(E∩B(x,δ))
lim inf δ↓0 ≥ χE(x)
µB(x,δ)

for almost every x ∈ R r , and to E ′ = R r \ E to see that


261E Vitali’s theorem in Rr 281

µ(E∩B(x,δ)) µ(E ′ ∩B(x,δ))


lim supδ↓0 = 1 − lim inf δ↓0 ≤ 1 − χE ′ (x) = χE(x)
µB(x,δ) µB(x,δ)
for almost every x.
(c) For q, q ′ ∈ Q, set
Dqq′ = {x : x ∈ D, q ≤ f (x) ≤ q ′ },

µ∗ (Dqq′ ∩B(x,δ))
Cqq′ = {x : x ∈ Dqq′ , limδ↓0 = 1};
µB(x,δ)
now set
S
C =D\ q,q ′ ∈Q (Dqq ′ \ Cqq′ ),
so that D \ C is negligible. If x ∈ C and ǫ > 0, then there are q, q ′ ∈ Q such that f (x) − ǫ ≤ q ≤ f (x) ≤ q ′ ≤ f (x) + ǫ,
and now
µ∗ {y:y∈D∩B(x,δ), |f (y)−f (x)|≤ǫ} µ∗ (Dqq′ ∩B(x,δ))
lim inf δ↓0 ≥ lim inf δ↓0 = 1,
µB(x,δ) µB(x,δ)
so
µ∗ {y:y∈D∩B(x,δ), |f (y)−f (x)|≤ǫ}
limδ↓0 = 1.
µB(x,δ)

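(d) Define $C$ as in (c). We know from (a) that $\mu(D \setminus C') = 0$, where
$$C' = \{x : x \in D,\ \lim_{\delta\downarrow 0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)} = 1\}.$$
If $x \in C \cap C'$ and $\epsilon > 0$, we know from (c) that
$$\lim_{\delta\downarrow 0}\frac{\mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\epsilon/2\}}{\mu B(x,\delta)} = 1.$$
But because $f$ is measurable, we have
$$\mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\ge\epsilon\} + \mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\tfrac12\epsilon\} \le \mu^*(D\cap B(x,\delta))$$
for every $\delta > 0$. Accordingly
$$\limsup_{\delta\downarrow 0}\frac{\mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\ge\epsilon\}}{\mu B(x,\delta)} \le \lim_{\delta\downarrow 0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)} - \lim_{\delta\downarrow 0}\frac{\mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\epsilon/2\}}{\mu B(x,\delta)} = 0,$$
and
$$\lim_{\delta\downarrow 0}\frac{\mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\ge\epsilon\}}{\mu B(x,\delta)} = 0$$
for every $x \in C \cap C'$, that is, for almost every $x \in D$.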
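
For a concrete feel of 261Db, here is a minimal numerical sketch in one dimension, assuming the measurable set is the interval E = [0, 1] (a choice made only for illustration; the function names are hypothetical). It evaluates the ratio µ(E ∩ B(x, δ))/µB(x, δ) exactly from interval lengths. At an interior point the ratio is 1, at an exterior point it is 0, and at the endpoint 0 it is 1/2, which does not contradict the corollary because the endpoints form a negligible set.

```python
def interval_overlap(a, b, c, d):
    """Length of [a, b] intersected with [c, d] (zero if they do not meet)."""
    return max(0.0, min(b, d) - max(a, c))

def density_ratio(x, delta, a=0.0, b=1.0):
    """mu(E n B(x, delta)) / mu B(x, delta) for E = [a, b] in R,
    where B(x, delta) = [x - delta, x + delta]."""
    return interval_overlap(a, b, x - delta, x + delta) / (2.0 * delta)

# Interior point, boundary point, exterior point of E = [0, 1].
for x in (0.5, 0.0, 2.0):
    print(x, [density_ratio(x, 10.0 ** -k) for k in range(1, 5)])
```

The ratios are already constant in δ once δ is small, because E is an interval; for a general measurable set only the limit as δ ↓ 0 is controlled, and only almost everywhere.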

261E Theorem Let $f$ be a locally integrable function defined on a conegligible subset of $\mathbb{R}^r$. Then
$$\lim_{\delta\downarrow 0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}|f(y)-f(x)|\,dy = 0$$
for almost every $x \in \mathbb{R}^r$.
proof (Compare 223D.)
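(a) Fix $n \in \mathbb{N}$ for the moment, and set $G = \{x : \|x\| < n\}$. For each $q \in \mathbb{Q}$, set $g_q(x) = |f(x) - q|$ for $x \in G \cap \operatorname{dom} f$; then $g_q$ is integrable over $G$, and
$$\lim_{\delta\downarrow 0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)} g_q = g_q(x)$$
for almost every $x \in G$, by 261C. Setting
$$E_q = \{x : x \in G\cap\operatorname{dom} f,\ \lim_{\delta\downarrow 0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)} g_q = g_q(x)\},$$
we have $G \setminus E_q$ negligible for every $q$, so $G \setminus E$ is negligible, where $E = \bigcap_{q\in\mathbb{Q}} E_q$. Now
$$\lim_{\delta\downarrow 0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)}|f(y)-f(x)|\,dy = 0$$
for every $x \in E$. P P Take $x \in E$ and $\epsilon > 0$. Then there is a $q \in \mathbb{Q}$ such that $|f(x) - q| \le \epsilon$, so that
$$|f(y) - f(x)| \le |f(y) - q| + \epsilon = g_q(y) + \epsilon$$
for every $y \in G \cap \operatorname{dom} f$, and
$$\limsup_{\delta\downarrow 0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)}|f(y)-f(x)|\,dy \le \limsup_{\delta\downarrow 0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)}(g_q(y)+\epsilon)\,dy = \epsilon + g_q(x) \le 2\epsilon.$$
As $\epsilon$ is arbitrary,
$$\lim_{\delta\downarrow 0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)}|f(y)-f(x)|\,dy = 0,$$
as required. Q Q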
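(b) Because $G$ is open,
$$\lim_{\delta\downarrow 0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}|f(y)-f(x)|\,dy = \lim_{\delta\downarrow 0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)}|f(y)-f(x)|\,dy = 0$$
for almost every $x \in G$. As $n$ is arbitrary,
$$\lim_{\delta\downarrow 0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}|f(y)-f(x)|\,dy = 0$$
for almost every $x \in \mathbb{R}^r$.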
Remark The set
$$\{x : x \in \operatorname{dom} f,\ \lim_{\delta\downarrow 0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}|f(y)-f(x)|\,dy = 0\}$$
is sometimes called the Lebesgue set of $f$.

261F Another very useful consequence of 261B is the following.


Proposition Let $A \subseteq \mathbb{R}^r$ be any set, and $\epsilon > 0$. Then there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}}$ of closed balls in $\mathbb{R}^r$, all of radius at most $\epsilon$, such that $A \subseteq \bigcup_{n\in\mathbb{N}} B_n$ and $\sum_{n=0}^{\infty}\mu B_n \le \mu^*A + \epsilon$. Moreover, we may suppose that the balls in the sequence whose centres do not lie in $A$ have measures summing to at most $\epsilon$.
proof (a) Set $\beta_r = \mu B(0,1)$. The first step is the obvious remark that if $x \in \mathbb{R}^r$, $\delta > 0$ then the half-open cube $I = [x, x+\delta\mathbf{1}[$ is a subset of the ball $B(x, \delta\sqrt{r})$, which has measure $\gamma_r\delta^r = \gamma_r\mu I$, where $\gamma_r = \beta_r r^{r/2}$. It follows that if $G \subseteq \mathbb{R}^r$ is any open set, then $G$ can be covered by a sequence of balls of total measure at most $\gamma_r\mu G$. P P If $G$ is empty, we can take all the balls to be singletons. Otherwise, for each $k \in \mathbb{N}$, set
$$Q_k = \{z : z \in \mathbb{Z}^r,\ [2^{-k}z, 2^{-k}(z+\mathbf{1})[ \subseteq G\},$$
$$E_k = \bigcup_{z\in Q_k}[2^{-k}z, 2^{-k}(z+\mathbf{1})[.$$
Then $\langle E_k\rangle_{k\in\mathbb{N}}$ is a non-decreasing sequence of sets with union $G$, and $E_0$ and each of the differences $E_{k+1} \setminus E_k$ is expressible as a disjoint union of half-open cubes. Thus $G$ also is expressible as a disjoint union of a sequence $\langle I_n\rangle_{n\in\mathbb{N}}$ of half-open cubes. Each $I_n$ is covered by a ball $B_n$ of measure $\gamma_r\mu I_n$; so that $G \subseteq \bigcup_{n\in\mathbb{N}} B_n$ and
$$\sum_{n=0}^{\infty}\mu B_n \le \gamma_r\sum_{n=0}^{\infty}\mu I_n = \gamma_r\mu G.$$ Q Q
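
The size of the constant γ_r in part (a) can be tabulated directly. The following is a small sketch, assuming the standard closed form β_r = π^{r/2}/Γ(r/2 + 1) for the volume of the unit ball (a formula not derived in this section; the function names are hypothetical). It shows how quickly the cost of replacing a cube by its circumscribing ball grows with the dimension.

```python
import math

def beta(r):
    """Volume of the unit ball in R^r, using pi^(r/2) / Gamma(r/2 + 1)."""
    return math.pi ** (r / 2) / math.gamma(r / 2 + 1)

def gamma_const(r):
    """gamma_r = beta_r * r^(r/2): the ratio mu B(x, delta*sqrt(r)) / mu I
    for the half-open cube I = [x, x + delta*1[ of side delta."""
    return beta(r) * r ** (r / 2)

for r in range(1, 6):
    print(r, beta(r), gamma_const(r))
```

In dimension 1 the ratio is 2 and in dimension 2 it is 2π; this crude factor is harmless in part (b), where only negligible sets are covered.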

(b) It follows at once that if $\mu A = 0$ then for any $\epsilon > 0$ there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}}$ of balls covering $A$ of measures summing to at most $\epsilon$, because there is certainly an open set including $A$ with measure at most $\epsilon/\gamma_r$.
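(c) Now take any set $A$, and $\epsilon > 0$. Let $G \supseteq A$ be an open set with $\mu G \le \mu^*A + \frac12\epsilon$. Let $\mathcal{I}$ be the family of non-trivial closed balls included in $G$, of radius at most $\epsilon$ and with centres in $A$. Then every point of $A$ belongs to arbitrarily small members of $\mathcal{I}$, so there is a countable disjoint $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $\mu(A \setminus \bigcup\mathcal{I}_0) = 0$. Let $\langle B'_n\rangle_{n\in\mathbb{N}}$ be a sequence of balls covering $A \setminus \bigcup\mathcal{I}_0$ with $\sum_{n=0}^{\infty}\mu B'_n \le \min(\frac12\epsilon, \beta_r\epsilon^r)$; these surely all have radius at most $\epsilon$. Let $\langle B_n\rangle_{n\in\mathbb{N}}$ be a sequence amalgamating $\mathcal{I}_0$ with $\langle B'_n\rangle_{n\in\mathbb{N}}$; then $A \subseteq \bigcup_{n\in\mathbb{N}} B_n$, every $B_n$ has radius at most $\epsilon$ and
$$\sum_{n=0}^{\infty}\mu B_n = \sum_{B\in\mathcal{I}_0}\mu B + \sum_{n=0}^{\infty}\mu B'_n \le \mu G + \tfrac12\epsilon \le \mu^*A + \epsilon,$$
while the $B_n$ whose centres do not lie in $A$ must come from the sequence $\langle B'_n\rangle_{n\in\mathbb{N}}$, so their measures sum to at most $\frac12\epsilon \le \epsilon$.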

Remark In fact we can (if A is not empty) arrange that the centre of every Bn belongs to A. This is an easy
consequence of Besicovitch’s Covering Lemma (see §472 in Volume 4).

261X Basic exercises (a) Show that 261C is valid for any locally integrable real-valued function $f$; in particular, for any $f \in L^p(\mu_D)$ for any $p \ge 1$, writing $\mu_D$ for the subspace measure on $D$.

(b) Show that 261C, 261Dc, 261Dd and 261E are valid for complex-valued functions f .

> (c) Take three disks in the plane, each touching the other two, so that they enclose an open region R with three
cusps. In R let D be a disk tangent to each of the three original disks, and R0 , R1 , R2 the three components of R \ D.
In each Rj let Dj be a disk tangent to each of the disks bounding Rj , and Rj0 , Rj1 , Rj2 the three components of
Rj \ Dj . Continue, obtaining 27 regions at the next step, 81 regions at the next, and so on.
Show that the total area of the residual regions converges to zero as the process continues indefinitely. (Hint:
compare with the process in the proof of 261B.)

261Y Further exercises (a) Formulate an abstract definition of ‘Vitali cover’, meaning a family of sets satisfying
the conclusion of 261B in some sense, and corresponding generalizations of 261C-261E, covering (at least) (b)-(d) below.
 
(b) For $x \in \mathbb{R}^r$, $k \in \mathbb{N}$ let $C(x,k)$ be the half-open cube of the form $[2^{-k}z, 2^{-k}(z+\mathbf{1})[$, with $z \in \mathbb{Z}^r$, containing $x$. Show that if $f$ is an integrable function on $\mathbb{R}^r$ then
$$\lim_{k\to\infty} 2^{kr}\int_{C(x,k)} f = f(x)$$
for almost every $x \in \mathbb{R}^r$.

(c) Let $f$ be a real-valued function which is integrable over $\mathbb{R}^r$. Show that
$$\lim_{\delta\downarrow 0}\frac{1}{\delta^r}\int_{[x,x+\delta\mathbf{1}[} f = f(x)$$
for almost every $x \in \mathbb{R}^r$.

(d) Give $X = \{0,1\}^{\mathbb{N}}$ its usual measure $\nu$ (254J). For $x \in X$, $k \in \mathbb{N}$ set $C(x,k) = \{y : y \in X,\ y(i) = x(i)$ for $i < k\}$. Show that if $f$ is any real-valued function which is integrable over $X$ then $\lim_{k\to\infty} 2^k\int_{C(x,k)} f\,d\nu = f(x)$, $\lim_{k\to\infty} 2^k\int_{C(x,k)}|f(y)-f(x)|\,\nu(dy) = 0$ for almost every $x \in X$.

(e) Let $f$ be a real-valued function which is integrable over $\mathbb{R}^r$, and $x$ a point in the Lebesgue set of $f$. Show that for every $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x) - \int f(x-y)g(\|y\|)\,dy| \le \epsilon$ whenever $g : [0,\infty[ \to [0,\infty[$ is a non-increasing function such that $\int_{\mathbb{R}^r} g(\|y\|)\,dy = 1$ and $\int_{B(0,\delta)} g(\|y\|)\,dy \ge 1 - \delta$. (Hint: 223Yg.)

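(f) Let $\mathfrak{T}$ be the family of those measurable sets $G \subseteq \mathbb{R}^r$ such that $\lim_{\delta\downarrow 0}\frac{\mu(G\cap B(x,\delta))}{\mu B(x,\delta)} = 1$ for every $x \in G$. Show that $\mathfrak{T}$ is a topology on $\mathbb{R}^r$, the density topology of $\mathbb{R}^r$. Show that a function $f : \mathbb{R}^r \to \mathbb{R}$ is measurable iff it is $\mathfrak{T}$-continuous at almost every point of $\mathbb{R}^r$.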

(g) A set $A \subseteq \mathbb{R}^r$ is said to be porous at $x \in \mathbb{R}^r$ if $\limsup_{y\to x}\frac{\rho(y,A)}{\|y-x\|} > 0$, writing $\rho(y,A) = \inf_{z\in A}\|y-z\|$ (or $\infty$ if $A$ is empty). (i) Show that if $A$ is porous at all its points then it is negligible. (ii) Show that in the construction of 261A the residual set $A \setminus \bigcup\mathcal{I}_0$ is always porous.

(h) Let $A \subseteq \mathbb{R}^r$ be a bounded set and $\mathcal{I}$ a non-empty family of non-trivial closed balls covering $A$. Show that for any $\epsilon > 0$ there are disjoint $B_0, \ldots, B_n \in \mathcal{I}$ such that $\mu^*A \le (3+\epsilon)^r\sum_{k=0}^{n}\mu B_k$.

(i) Let $(X,\rho)$ be a metric space and $A \subseteq X$ any set, $x \mapsto \delta_x : A \to [0,\infty[$ any bounded function. Show that if $\gamma > 3$ then there is an $A' \subseteq A$ such that (i) $\rho(x,y) > \delta_x + \delta_y$ for all distinct $x, y \in A'$ (ii) $\bigcup_{x\in A} B(x,\delta_x) \subseteq \bigcup_{x\in A'} B(x,\gamma\delta_x)$, writing $B(x,\alpha)$ for the closed ball $\{y : \rho(y,x) \le \alpha\}$.

(j) Show that any union of non-trivial closed balls in R r is Lebesgue measurable. (Hint: induce on r. Compare
415Ye in Volume 4.)

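(k) Suppose that $A \subseteq \mathbb{R}^r$ and that $\mathcal{I}$ is a family of closed subsets of $\mathbb{R}^r$ such that
for every $x \in A$ there is an $\eta > 0$ such that for every $\epsilon > 0$ there is an $I \in \mathcal{I}$ such that $x \in I$ and $0 < \eta(\operatorname{diam} I)^r \le \mu I \le \epsilon$.
Show that there is a countable disjoint set $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $A \setminus \bigcup\mathcal{I}_0$ is negligible.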

(l) Let T′ be the family of measurable sets G ⊆ R r such that whenever x ∈ G and ǫ > 0 there is a δ > 0 such that
µ(G ∩ I) ≥ (1 − ǫ)µI whenever I is an interval containing x and included in B(x, δ). Show that T′ is a topology on R r
intermediate between the density topology (261Yf) and the Euclidean topology.

261 Notes and comments In the proofs of 261B-261E above, I have done my best to follow the lines of the one-
dimensional case; this section amounts to a series of generalizations of the work of §§221 and 223.
It will be clear that the idea of 261A/261B can be used on other shapes than balls. To make it work in the form
above, we need a family I such that there is a constant K for which
µI ′ ≤ KµI
for every I ∈ I, where we write
$$I' = \{x : \inf_{y\in I}\|x-y\| \le \operatorname{diam}(I)\}.$$
Evidently this will be true for many classes I determined by the shapes of the sets involved; for instance, if E ⊆ R r is
any bounded set of strictly positive measure, the family I = {x + δE : x ∈ R r , δ > 0} will satisfy the condition.
In 261Ya I challenge you to find an appropriate generalization of the arguments depending on the conclusion of
261B.
Another way of using 261B is to say that because sets can be essentially covered by disjoint sequences of balls, it
ought to be possible to use balls, rather than half-open intervals, in the definition of Lebesgue measure on R r . This is
indeed so (261F). The difficulty in using balls in the basic definition comes right at the start, in proving that if a ball
is covered by finitely many balls then the sum of the volumes of the covering balls is at least the volume of the covered
ball. (There is a trick, using the compactness of closed balls and the openness of open balls, to extend such a proof
to infinite covers.) Of course you could regard this fact as ‘elementary’, on the ground that Archimedes would have
noticed if it weren’t true, but nevertheless it would be something of a challenge to prove it, unless you were willing to
wait for a version of Fubini’s theorem, as some authors do.
I have given the results in 261C-261D for arbitrary subsets $D$ of $\mathbb{R}^r$ not because I have any applications in mind in which non-measurable subsets are significant, but because I wish to make it possible to notice when measurability matters. Of course it is necessary to interpret the integrals $\int_D f\,d\mu$ in the way laid down in §214. The game is given away in part (c) of the proof of 261C, where I rely on the fact that if $f$ is integrable over $D$ then there is an integrable $\tilde f : \mathbb{R}^r \to \mathbb{R}$ such that $\int_F \tilde f = \int_{D\cap F} f$ for every measurable $F \subseteq \mathbb{R}^r$. In effect, for all the questions dealt with here, we can replace $f$, $D$ by $\tilde f$, $\mathbb{R}^r$.
The idea of 261C is that, for almost every x, f (x) is approximated by its mean value on small balls B(x, δ), ignoring
the missing values on B(x, δ) \ (D ∩ dom f ); 261E is a sharper version of the same idea. The formulae of 261C-261E
mostly involve the expression $\mu B(x,\delta)$. Of course this is just $\beta_r\delta^r$. But I think that leaving it unexpanded is actually
more illuminating, as well as avoiding sub- and superscripts, since it makes it clearer what these density theorems are
really about. In §472 of Volume 4 I will revisit this material, showing that a surprisingly large proportion of the ideas
can be applied to arbitrary Radon measures on R r , even though Vitali’s theorem (in the form stated here) is no longer
valid.

262 Lipschitz and differentiable functions


In preparation for the main work of this chapter in §263, I devote a section to two important classes of functions
between Euclidean spaces. What we really need is the essentially elementary material down to 262I, together with the
technical lemma 262M and its corollaries. Theorem 262Q is not relied on in this volume, though I believe that it makes
the patterns which will develop more natural and comprehensible.

262A Lipschitz functions Suppose that $r, s \ge 1$ and $\phi : D \to \mathbb{R}^s$ is a function, where $D \subseteq \mathbb{R}^r$. We say that $\phi$ is $\gamma$-Lipschitz, where $\gamma \in [0,\infty[$, if
$$\|\phi(x) - \phi(y)\| \le \gamma\|x-y\|$$
for all $x, y \in D$, writing $\|x\| = \sqrt{\xi_1^2 + \ldots + \xi_r^2}$ if $x = (\xi_1,\ldots,\xi_r) \in \mathbb{R}^r$, $\|z\| = \sqrt{\zeta_1^2 + \ldots + \zeta_s^2}$ if $z = (\zeta_1,\ldots,\zeta_s) \in \mathbb{R}^s$. In this case, $\gamma$ is a Lipschitz constant for $\phi$.
A Lipschitz function is a function φ which is γ-Lipschitz for some γ ≥ 0. Note that in this case φ has a least
Lipschitz constant (since if A is the set of Lipschitz constants for φ, and γ0 = inf A, then γ0 is a Lipschitz constant for
φ).

262B We need the following easy facts.


Lemma Let D ⊆ R r be a set and φ : D → R s a function.
(a) φ is Lipschitz iff φi : D → R is Lipschitz for every i, writing φ(x) = (φ1 (x), . . . , φs (x)) for every x ∈ D =
dom φ ⊆ R r .
(b) In this case, there is a Lipschitz function φ̃ : R r → R s extending φ.
(c) If r = s = 1 and D = [a, b] is an interval, then φ is Lipschitz iff it is absolutely continuous and has a bounded
derivative.
proof (a) For any $x, y \in D$ and $i \le s$,
$$|\phi_i(x) - \phi_i(y)| \le \|\phi(x) - \phi(y)\| \le \sqrt{s}\,\sup_{j\le s}|\phi_j(x) - \phi_j(y)|,$$
so any Lipschitz constant for $\phi$ will be a Lipschitz constant for every $\phi_i$, and if $\gamma_j$ is a Lipschitz constant for $\phi_j$ for each $j$, then $\sqrt{s}\,\sup_{j\le s}\gamma_j$ will be a Lipschitz constant for $\phi$.

(b) By (a), it is enough to consider the case s = 1, for if every φi has a Lipschitz extension φ̃i , we can set
φ̃(x) = (φ̃1 (x), . . . , φ̃s (x)) for every x to obtain a Lipschitz extension of φ. Taking s = 1, then, note that the case
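$D = \emptyset$ is trivial; so suppose that $D \ne \emptyset$. Let $\gamma$ be a Lipschitz constant for $\phi$, and write
$$\tilde\phi(z) = \sup_{y\in D}\,(\phi(y) - \gamma\|y-z\|)$$
for every $z \in \mathbb{R}^r$. If $x \in D$, then, for any $z \in \mathbb{R}^r$ and $y \in D$,
$$\phi(y) - \gamma\|y-z\| \le \phi(x) + \gamma\|y-x\| - \gamma\|y-z\| \le \phi(x) + \gamma\|z-x\|,$$
so that $\tilde\phi(z) \le \phi(x) + \gamma\|z-x\|$; this shows, in particular, that $\tilde\phi(z) < \infty$. Also, if $z \in D$, we must have
$$\phi(z) - \gamma\|z-z\| \le \tilde\phi(z) \le \phi(z) + \gamma\|z-z\|,$$
so that $\tilde\phi$ extends $\phi$. Finally, if $w, z \in \mathbb{R}^r$ and $y \in D$,
$$\phi(y) - \gamma\|y-w\| \le \phi(y) - \gamma\|y-z\| + \gamma\|w-z\| \le \tilde\phi(z) + \gamma\|w-z\|;$$
and taking the supremum over $y \in D$,
$$\tilde\phi(w) \le \tilde\phi(z) + \gamma\|w-z\|.$$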
As w and z are arbitrary, φ̃ is Lipschitz.
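
The formula $\tilde\phi(z) = \sup_{y\in D}(\phi(y) - \gamma\|y-z\|)$ used in (b) can be tried out directly when $D$ is finite, since the supremum is then a maximum. The following is a minimal sketch under that assumption; the sample points, the values and the function names are hypothetical and chosen only so that the data are 1-Lipschitz.

```python
import math

def lipschitz_extension(points, values, gamma):
    """Return the function z -> max over y in D of (phi(y) - gamma * |y - z|),
    where D = points is finite and phi(points[i]) = values[i]."""
    def phi_tilde(z):
        return max(v - gamma * math.dist(y, z) for y, v in zip(points, values))
    return phi_tilde

# Hypothetical data: three points of R^2 carrying 1-Lipschitz values.
D = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
phi_values = [0.0, 1.0, -1.0]
ext = lipschitz_extension(D, phi_values, gamma=1.0)

print([ext(p) for p in D])          # agrees with phi_values on D
print(ext((0.5, 0.5)), ext((3.0, 3.0)))
```

The extension takes the largest values compatible with the Lipschitz constant from below; replacing the supremum by $\inf_{y\in D}(\phi(y) + \gamma\|y-z\|)$ would give another extension with the same constant.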
(c)(i) Suppose that $\phi$ is $\gamma$-Lipschitz. If $\epsilon > 0$ and $a \le a_1 \le b_1 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n}(b_i - a_i) \le \epsilon/(1+\gamma)$, then
$$\sum_{i=1}^{n}|\phi(b_i) - \phi(a_i)| \le \sum_{i=1}^{n}\gamma|b_i - a_i| \le \epsilon.$$
As $\epsilon$ is arbitrary, $\phi$ is absolutely continuous. If $x \in [a,b]$ and $\phi'(x)$ is defined, then
$$|\phi'(x)| = \lim_{y\to x}\frac{|\phi(y)-\phi(x)|}{|y-x|} \le \gamma,$$
so $\phi'$ is bounded.

(ii) Now suppose that $\phi$ is absolutely continuous and that $|\phi'(x)| \le \gamma$ for every $x \in \operatorname{dom}\phi'$, where $\gamma \ge 0$. Then whenever $a \le x \le y \le b$,
$$|\phi(y) - \phi(x)| = \left|\int_x^y\phi'\right| \le \int_x^y|\phi'| \le \gamma(y-x)$$
(using 225E for the first equality). As $x$ and $y$ are arbitrary, $\phi$ is $\gamma$-Lipschitz.

262C Remark The argument for (b) above shows that if $\phi : D \to \mathbb{R}$ is a Lipschitz function, where $D \subseteq \mathbb{R}^r$, then
φ has an extension to R r with the same Lipschitz constants. In fact it is the case that if φ : D → R s is a Lipschitz
function, then φ has an extension to φ̃ : R r → R s with the same Lipschitz constants; this is ‘Kirzbraun’s theorem’
(Kirzbraun 34, or Federer 69, 2.10.43).

262D Proposition If $\phi : D \to \mathbb{R}^r$ is a $\gamma$-Lipschitz function, where $D \subseteq \mathbb{R}^r$, then $\mu^*\phi[A] \le \gamma^r\mu^*A$ for every $A \subseteq D$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$. In particular, $\phi[D \cap A]$ is negligible for every negligible set $A \subseteq \mathbb{R}^r$.
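proof Let $\epsilon > 0$. By 261F, there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}} = \langle B(x_n,\delta_n)\rangle_{n\in\mathbb{N}}$ of closed balls in $\mathbb{R}^r$, covering $A$, such that $\sum_{n=0}^{\infty}\mu B_n \le \mu^*A + \epsilon$ and $\sum_{n\in\mathbb{N}\setminus K}\mu B_n \le \epsilon$, where $K = \{n : n \in \mathbb{N},\ x_n \in A\}$. Set
$$L = \{n : n \in \mathbb{N}\setminus K,\ B_n \cap D \ne \emptyset\},$$
and for $n \in L$ choose $y_n \in D \cap B_n$. Now set
$$B'_n = \begin{cases} B(\phi(x_n), \gamma\delta_n) & \text{if } n \in K,\\ B(\phi(y_n), 2\gamma\delta_n) & \text{if } n \in L,\\ \emptyset & \text{if } n \in \mathbb{N}\setminus(K\cup L).\end{cases}$$
Then $\phi[B_n \cap D] \subseteq B'_n$ for every $n$, so $\phi[D \cap A] \subseteq \bigcup_{n\in\mathbb{N}} B'_n$, and
$$\mu^*\phi[A\cap D] \le \sum_{n=0}^{\infty}\mu B'_n = \gamma^r\sum_{n\in K}\mu B_n + 2^r\gamma^r\sum_{n\in L}\mu B_n \le \gamma^r(\mu^*A + \epsilon) + 2^r\gamma^r\epsilon.$$
As $\epsilon$ is arbitrary, $\mu^*\phi[A \cap D] \le \gamma^r\mu^*A$, as claimed.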

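262E Corollary Let $\phi : D \to \mathbb{R}^r$ be an injective Lipschitz function, where $D \subseteq \mathbb{R}^r$, and $f$ a measurable function from a subset of $\mathbb{R}^r$ to $\mathbb{R}$.
(a) If $\phi^{-1}$ is defined almost everywhere in a subset $H$ of $\mathbb{R}^r$ and $f$ is defined almost everywhere in $\mathbb{R}^r$, then $f\phi^{-1}$ is defined almost everywhere in $H$.
(b) If $E \subseteq D$ is Lebesgue measurable then $\phi[E]$ is measurable.
(c) If $D$ is measurable then $f\phi^{-1}$ is measurable.
proof Set
$$C = \operatorname{dom}(f\phi^{-1}) = \{y : y \in \phi[D],\ \phi^{-1}(y) \in \operatorname{dom} f\} = \phi[D\cap\operatorname{dom} f].$$

(a) Because $f$ is defined almost everywhere, $\phi[D \setminus \operatorname{dom} f]$ is negligible. But now
$$C = \phi[D] \setminus \phi[D\setminus\operatorname{dom} f] = \operatorname{dom}\phi^{-1} \setminus \phi[D\setminus\operatorname{dom} f],$$
so
$$H \setminus C \subseteq (H\setminus\operatorname{dom}\phi^{-1}) \cup \phi[D\setminus\operatorname{dom} f]$$
is negligible.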

(b) Now suppose that $E \subseteq D$ and that $E$ is measurable. Let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a sequence of closed bounded subsets of $E$ such that $\mu(E \setminus \bigcup_{n\in\mathbb{N}} F_n) = 0$ (134Fb). Because $\phi$ is Lipschitz, it is continuous, so $\phi[F_n]$ is compact, therefore closed, therefore measurable for every $n$ (2A2F, 2A2E, 115G); also $\phi[E \setminus \bigcup_{n\in\mathbb{N}} F_n]$ is negligible, by 262D, therefore measurable. So
$$\phi[E] = \phi[E\setminus\bigcup_{n\in\mathbb{N}} F_n] \cup \bigcup_{n\in\mathbb{N}}\phi[F_n]$$
is measurable.
(c) For any $a \in \mathbb{R}$, take a measurable set $E \subseteq \mathbb{R}^r$ such that $\{x : f(x) \ge a\} = E \cap \operatorname{dom} f$. Then
$$\{y : y \in C,\ f\phi^{-1}(y) \ge a\} = C \cap \phi[D\cap E].$$
But $\phi[D \cap E]$ is measurable, by (b), so $\{y : f\phi^{-1}(y) \ge a\}$ is relatively measurable in $C$. As $a$ is arbitrary, $f\phi^{-1}$ is measurable.

262F Differentiability I come now to the class of functions whose properties will take up most of the rest of the
chapter.
Definitions Suppose r, s ≥ 1 and that φ is a function from a subset D = dom φ of R r to R s .
(a) $\phi$ is differentiable at $x \in D$ if there is a real $s \times r$ matrix $T$ such that
$$\lim_{y\to x}\frac{\|\phi(y)-\phi(x)-T(y-x)\|}{\|y-x\|} = 0;$$
in this case we may write $T = \phi'(x)$.

(b) I will say that $\phi$ is differentiable relative to its domain at $x$, and that $T$ is a derivative of $\phi$ at $x$, if $x \in D$ and for every $\epsilon > 0$ there is a $\delta > 0$ such that $\|\phi(y) - \phi(x) - T(y-x)\| \le \epsilon\|y-x\|$ for every $y \in B(x,\delta) \cap D$.

262G Remarks (a) The standard definition in 262Fa, involving an all-sided limit '$\lim_{y\to x}$', implicitly requires $\phi$ to be defined on some non-trivial ball centered on $x$, so that we can calculate $\phi(y) - \phi(x) - T(y-x)$ for all $y$ sufficiently near $x$. It has the advantage that the derivative $T = \phi'(x)$ is uniquely defined (because if $\lim_{z\to 0}\frac{\|T_1z - T_2z\|}{\|z\|} = 0$ then
$$\frac{\|(T_1-T_2)z\|}{\|z\|} = \lim_{\alpha\to 0}\frac{\|T_1(\alpha z) - T_2(\alpha z)\|}{\|\alpha z\|} = 0$$
for every non-zero $z$, so $T_1 - T_2$ must be the zero matrix). For our purposes here, there is some advantage in relaxing this slightly to the form in 262Fb, so that we do not need to pay special attention to the boundary of $\operatorname{dom}\phi$.

(b) If you have not seen this concept of ‘differentiability’ before, but have some familiarity with partial differentiation,
it is necessary to emphasize that the concept of ‘differentiable’ function (at least in the strict sense demanded by 262Fa)
is strictly stronger than the concept of ‘partially differentiable’ function. For purposes of computation, the most useful
method of finding true derivatives is through 262Id below. For a simple example of a function with a full set of partial
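derivatives, which is not everywhere differentiable, consider $\phi : \mathbb{R}^2 \to \mathbb{R}$ defined by
$$\phi(\xi_1,\xi_2) = \begin{cases}\dfrac{\xi_1\xi_2}{\xi_1^2+\xi_2^2} & \text{if } \xi_1^2+\xi_2^2 \ne 0,\\[1ex] 0 & \text{if } \xi_1 = \xi_2 = 0.\end{cases}$$
Then $\phi$ is not even continuous at $0$, although both partial derivatives $\frac{\partial\phi}{\partial\xi_j}$ are defined everywhere.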
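
The behaviour of this example is easy to check numerically. The sketch below (function names are hypothetical) evaluates $\phi$ along the diagonal, where it is constantly $\frac12$ however close to the origin we go, and computes the difference quotients along the axes at the origin, which are identically zero, so both partial derivatives at $0$ exist and equal $0$.

```python
def phi(x1, x2):
    """The function of 262Gb: x1*x2 / (x1^2 + x2^2) away from the origin, 0 at the origin."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2 / (x1 * x1 + x2 * x2)

def partial_quotient_at_origin(j, h):
    """Difference quotient (phi(h e_j) - phi(0)) / h along the j-th axis (j = 0 or 1)."""
    point = (h, 0.0) if j == 0 else (0.0, h)
    return (phi(*point) - phi(0.0, 0.0)) / h

for k in range(1, 6):
    t = 10.0 ** -k
    print(t, phi(t, t), partial_quotient_at_origin(0, t), partial_quotient_at_origin(1, t))
```

Since $\phi(t,t) = \frac12$ for every $t \ne 0$ while $\phi(0,0) = 0$, no matrix $T$ can satisfy the limit condition of 262Fa at the origin.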

(c) In the definition above, I speak of a derivative as being a matrix. Properly speaking, the derivative of a function
defined on a subset of R r and taking values in R s should be thought of as a bounded linear operator from R r to R s ;
the formulation in terms of matrices is acceptable just because there is a natural one-to-one correspondence between
s × r real matrices and linear operators from R r to R s , and all these linear operators are bounded. I use the ‘matrix’
description because it makes certain calculations more direct; in particular, the relationship between φ′ and the partial
derivatives of φ (262Ic), and the notion of the determinant det φ′ (x), used throughout §§263 and 265.

262H The norm of a matrix Some of the calculations below will rely on the notion of 'norm' of a matrix. The one I will use (in fact, for our purposes here, any norm would do) is the 'operator norm', defined by saying
$$\|T\| = \sup\{\|Tx\| : x \in \mathbb{R}^r,\ \|x\| \le 1\}$$
for any $s \times r$ matrix $T$. For the basic facts concerning these norms, see 2A4F-2A4G. The following will also be useful.

(a) If all the coefficients of $T$ are small, so is $\|T\|$; in fact, if $T = \langle\tau_{ij}\rangle_{i\le s,j\le r}$, and $\|x\| \le 1$, then $|\xi_j| \le 1$ for each $j$, so
$$\|Tx\| = \Bigl(\sum_{i=1}^{s}\bigl(\sum_{j=1}^{r}\tau_{ij}\xi_j\bigr)^2\Bigr)^{1/2} \le \Bigl(\sum_{i=1}^{s}\bigl(\sum_{j=1}^{r}|\tau_{ij}|\bigr)^2\Bigr)^{1/2} \le r\sqrt{s}\,\max_{i\le s,j\le r}|\tau_{ij}|,$$
and $\|T\| \le r\sqrt{s}\,\max_{i\le s,j\le r}|\tau_{ij}|$. (This is a singularly crude inequality. A better one is in 262Ya. But it tells us, in particular, that $\|T\|$ is always finite.)

(b) If $\|T\|$ is small, so are all the coefficients of $T$; in fact, writing $e_j$ for the $j$th unit vector of $\mathbb{R}^r$, then the $i$th coordinate of $Te_j$ is $\tau_{ij}$, so $|\tau_{ij}| \le \|Te_j\| \le \|T\|$.
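
The two bounds $\max|\tau_{ij}| \le \|T\| \le r\sqrt{s}\max|\tau_{ij}|$ can be checked on a small example. The sketch below estimates the operator norm by power iteration on $T^{\mathsf t}T$, a standard numerical technique that is not part of the text; the matrix, the starting vector and the function names are hypothetical, and the estimate is only a lower bound that converges for a generic starting vector.

```python
import math

def mat_vec(T, x):
    """Apply the matrix T (a list of rows) to the vector x."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def euclid(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))

def operator_norm(T, iterations=500):
    """Estimate ||T|| = sup{||Tx|| : ||x|| <= 1} by power iteration on T^t T."""
    Tt = [list(col) for col in zip(*T)]             # transpose of T
    x = [1.0 + 0.1 * j for j in range(len(T[0]))]   # arbitrary non-zero start vector
    for _ in range(iterations):
        y = mat_vec(Tt, mat_vec(T, x))
        n = euclid(y)
        if n == 0.0:                                # T x = 0: this start vector gives no information
            return 0.0
        x = [yi / n for yi in y]
    return euclid(mat_vec(T, x))

def max_abs_entry(T):
    """Largest absolute value of a coefficient of T."""
    return max(abs(t) for row in T for t in row)

# Hypothetical 2 x 3 example (s = 2, r = 3).
T = [[1.0, 2.0, 0.0], [0.0, 1.0, -3.0]]
s, r = len(T), len(T[0])
norm_T = operator_norm(T)
print(max_abs_entry(T), norm_T, r * math.sqrt(s) * max_abs_entry(T))
```

For this matrix the three printed numbers are increasing, with a wide gap between the norm and the upper bound, which illustrates how crude the estimate in (a) is.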

262I Lemma Let φ : D → R s be a function, where D ⊆ R r . For i ≤ s let φi : D → R be its ith coordinate, so that
φ(x) = (φ1 (x), . . . , φs (x)) for x ∈ D.
(a) If φ is differentiable relative to its domain at x ∈ D, then φ is continuous at x.
(b) If x ∈ D, then φ is differentiable relative to its domain at x iff each φi is differentiable relative to its domain at
x.
(c) If $\phi$ is differentiable at $x \in D$, then all the partial derivatives $\frac{\partial\phi_i}{\partial\xi_j}$ of $\phi$ are defined at $x$, and the derivative of $\phi$ at $x$ is the matrix $\langle\frac{\partial\phi_i}{\partial\xi_j}(x)\rangle_{i\le s,j\le r}$.
(d) If all the partial derivatives $\frac{\partial\phi_i}{\partial\xi_j}$, for $i \le s$ and $j \le r$, are defined in a neighbourhood of $x \in D$ and are continuous at $x$, then $\phi$ is differentiable at $x$.
proof (a) Let $T$ be a derivative of $\phi$ at $x$. Applying the definition 262Fb with $\epsilon = 1$, we see that there is a $\delta > 0$ such that
$$\|\phi(y) - \phi(x) - T(y-x)\| \le \|y-x\|$$
whenever $y \in D$ and $\|y-x\| \le \delta$. Now
$$\|\phi(y) - \phi(x)\| \le \|T(y-x)\| + \|y-x\| \le (1+\|T\|)\|y-x\|$$
whenever $y \in D$ and $\|y-x\| \le \delta$, so $\phi$ is continuous at $x$.
(b)(i) If $\phi$ is differentiable relative to its domain at $x \in D$, let $T$ be a derivative of $\phi$ at $x$. For $i \le s$ let $T_i$ be the $1 \times r$ matrix consisting of the $i$th row of $T$. Let $\epsilon > 0$. Then we have a $\delta > 0$ such that
$$|\phi_i(y) - \phi_i(x) - T_i(y-x)| \le \|\phi(y) - \phi(x) - T(y-x)\| \le \epsilon\|y-x\|$$
whenever $y \in D$ and $\|y-x\| \le \delta$, so that $T_i$ is a derivative of $\phi_i$ at $x$.
(ii) If each $\phi_i$ is differentiable relative to its domain at $x$, with corresponding derivatives $T_i$, let $T$ be the $s \times r$ matrix with rows $T_1, \ldots, T_s$. Given $\epsilon > 0$, there is for each $i \le s$ a $\delta_i > 0$ such that
$$|\phi_i(y) - \phi_i(x) - T_i(y-x)| \le \epsilon\|y-x\| \text{ whenever } y \in D,\ \|y-x\| \le \delta_i;$$
set $\delta = \min_{i\le s}\delta_i > 0$; then if $y \in D$ and $\|y-x\| \le \delta$, we shall have
$$\|\phi(y) - \phi(x) - T(y-x)\|^2 = \sum_{i=1}^{s}|\phi_i(y) - \phi_i(x) - T_i(y-x)|^2 \le s\epsilon^2\|y-x\|^2,$$
so that
$$\|\phi(y) - \phi(x) - T(y-x)\| \le \epsilon\sqrt{s}\,\|y-x\|.$$
As $\epsilon$ is arbitrary, $T$ is a derivative of $\phi$ at $x$.
(c) Set $T = \phi'(x)$. We have
$$\lim_{y\to x}\frac{\|\phi(y)-\phi(x)-T(y-x)\|}{\|y-x\|} = 0;$$
fix $j \le r$, and consider $y = x + \eta e_j$, where $e_j = (0,\ldots,0,1,0,\ldots,0)$ is the $j$th unit vector in $\mathbb{R}^r$. Then we must have
$$\lim_{\eta\to 0}\frac{\|\phi(x+\eta e_j)-\phi(x)-\eta T(e_j)\|}{|\eta|} = 0.$$
Looking at the $i$th coordinate of $\phi(x+\eta e_j) - \phi(x) - \eta T(e_j)$, we have
$$|\phi_i(x+\eta e_j) - \phi_i(x) - \tau_{ij}\eta| \le \|\phi(x+\eta e_j) - \phi(x) - \eta T(e_j)\|,$$
where $\tau_{ij}$ is the $(i,j)$th coefficient of $T$; so that
$$\lim_{\eta\to 0}\frac{|\phi_i(x+\eta e_j)-\phi_i(x)-\tau_{ij}\eta|}{|\eta|} = 0.$$
But this just says that the partial derivative $\frac{\partial\phi_i}{\partial\xi_j}(x)$ exists and is equal to $\tau_{ij}$, as claimed.
(d) Now suppose that the partial derivatives $\frac{\partial\phi_i}{\partial\xi_j}$ are defined near $x$ and continuous at $x$. Let $\epsilon > 0$. Let $\delta > 0$ be such that
$$\Bigl|\frac{\partial\phi_i}{\partial\xi_j}(y) - \tau_{ij}\Bigr| \le \epsilon$$
whenever $\|y-x\| \le \delta$, writing $\tau_{ij} = \frac{\partial\phi_i}{\partial\xi_j}(x)$. Now suppose that $\|y-x\| \le \delta$. Set
$$y = (\eta_1,\ldots,\eta_r),\quad x = (\xi_1,\ldots,\xi_r),$$
$$y_j = (\eta_1,\ldots,\eta_j,\xi_{j+1},\ldots,\xi_r) \text{ for } 0 \le j \le r,$$
so that $y_0 = x$, $y_r = y$ and the line segment between $y_{j-1}$ and $y_j$ lies wholly within $\delta$ of $x$ whenever $1 \le j \le r$, since if $z$ lies on this line segment then $\zeta_i$ lies between $\xi_i$ and $\eta_i$ for every $i$. By the ordinary mean value theorem for differentiable real functions, applied to the function
$$t \mapsto \phi_i(\eta_1,\ldots,\eta_{j-1},t,\xi_{j+1},\ldots,\xi_r),$$
there is for each $i \le s$, $j \le r$ a point $z_{ij}$ on the line segment between $y_{j-1}$ and $y_j$ such that
$$\phi_i(y_j) - \phi_i(y_{j-1}) = (\eta_j - \xi_j)\frac{\partial\phi_i}{\partial\xi_j}(z_{ij}).$$
But
$$\Bigl|\frac{\partial\phi_i}{\partial\xi_j}(z_{ij}) - \tau_{ij}\Bigr| \le \epsilon,$$
so
$$|\phi_i(y_j) - \phi_i(y_{j-1}) - \tau_{ij}(\eta_j - \xi_j)| \le \epsilon|\eta_j - \xi_j| \le \epsilon\|y-x\|.$$
Summing over $j$,
$$\Bigl|\phi_i(y) - \phi_i(x) - \sum_{j=1}^{r}\tau_{ij}(\eta_j - \xi_j)\Bigr| \le r\epsilon\|y-x\|$$
for each $i$. Summing the squares and taking the square root,
$$\|\phi(y) - \phi(x) - T(y-x)\| \le \epsilon r\sqrt{s}\,\|y-x\|,$$
where $T = \langle\tau_{ij}\rangle_{i\le s,j\le r}$. And this is true whenever $\|y-x\| \le \delta$. As $\epsilon$ is arbitrary, $\phi'(x) = T$ is defined.
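
Parts (c) and (d) together say that, for a function with continuous partial derivatives, the matrix of partials is the derivative in the sense of 262Fa. A minimal numerical sketch of this, assuming a particular smooth map $\mathbb{R}^2 \to \mathbb{R}^2$ chosen only for illustration (the map, the step size and the function names are all hypothetical), approximates the partials by central differences and then checks that the remainder ratio of 262Fa becomes small.

```python
import math

def phi(x):
    """Hypothetical smooth map R^2 -> R^2, chosen only for illustration."""
    return [x[0] ** 2 * x[1], x[0] + math.sin(x[1])]

def partial_matrix(f, x, h=1e-6):
    """s x r matrix of central-difference approximations to the partial derivatives at x."""
    s, r = len(f(x)), len(x)
    T = [[0.0] * r for _ in range(s)]
    for j in range(r):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(s):
            T[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return T

def remainder_ratio(f, x, T, z):
    """||f(x+z) - f(x) - Tz|| / ||z||, the quantity that must tend to 0 in 262Fa."""
    fx = f(x)
    fxz = f([xi + zi for xi, zi in zip(x, z)])
    Tz = [sum(t * zi for t, zi in zip(row, z)) for row in T]
    err = math.sqrt(sum((a - b - c) ** 2 for a, b, c in zip(fxz, fx, Tz)))
    return err / math.sqrt(sum(zi * zi for zi in z))

x = [1.0, 2.0]
T = partial_matrix(phi, x)
for k in range(1, 5):
    z = [10.0 ** -k, -(10.0 ** -k)]
    print(k, remainder_ratio(phi, x, T, z))
```

At $x = (1, 2)$ the exact matrix of partial derivatives of this map has rows $(4, 1)$ and $(1, \cos 2)$, and the printed ratios shrink roughly in proportion to $\|z\|$, as one expects for a twice differentiable map.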

262J Remark I am not sure if I ought to apologize for the notation $\frac{\partial}{\partial\xi_j}$. In such formulae as $(\eta_j - \xi_j)\frac{\partial\phi_i}{\partial\xi_j}(z_{ij})$ above, the two appearances of $\xi_j$ clash most violently. But I do not think that any person of good will is likely to be misled, provided that the labels $\xi_j$ (or whatever symbols are used to represent the variables involved) are adequately described when the domain of $\phi$ is first introduced (and always remembering that in partial differentiation, we are not only moving one variable, a $\xi_j$ in the present context, but holding fixed some further list of variables, not listed in the notation). I believe that the traditional notation $\frac{\partial}{\partial\xi_j}$ has survived for solid reasons, and I should like to offer a welcome to those who are more comfortable with it than with any of the many alternatives which have been proposed, but have never taken root.

262K The Cantor function revisited It is salutary to re-examine the examples of 134H-134I in the light of the present considerations. Let $f : [0,1] \to [0,1]$ be the Cantor function (134H) and set $g(x) = \frac12(x + f(x))$ for $x \in [0,1]$. Then $g : [0,1] \to [0,1]$ is a homeomorphism (134I); set $\phi = g^{-1} : [0,1] \to [0,1]$. We see that if $0 \le x \le y \le 1$ then $g(y) - g(x) \ge \frac12(y-x)$; equivalently, $\phi(y) - \phi(x) \le 2(y-x)$ whenever $0 \le x \le y \le 1$, so that $\phi$ is a Lipschitz function, therefore absolutely continuous (262Bc). If $D = \{x : \phi'(x)$ is defined$\}$, then $[0,1] \setminus D$ is negligible (225Cb), so $[0,1] \setminus \phi[D] = \phi[\,[0,1] \setminus D]$ is negligible (262Da). I noted in 134I that there is a measurable function $h : [0,1] \to \mathbb{R}$ such that the composition $h\phi$ is not measurable; now $h(\phi\restriction D) = (h\phi)\restriction D$ cannot be measurable, even though $\phi\restriction D$ is differentiable.
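
The inequality $g(y) - g(x) \ge \frac12(y-x)$, which is what makes $\phi = g^{-1}$ a 2-Lipschitz function, can be checked on a grid. The sketch below computes the Cantor function from ternary digits using exact rational arithmetic; the truncation to a fixed number of digits and the function names are assumptions of the sketch, and on points with a finite ternary expansion (such as multiples of $3^{-4}$) the computed values are exact.

```python
from fractions import Fraction

def cantor(x, digits=40):
    """Cantor function on [0, 1], read off from the first `digits` ternary digits of x."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    total, weight = Fraction(0), Fraction(1, 2)
    for _ in range(digits):
        x *= 3
        d = int(x)            # next ternary digit, 0, 1 or 2
        x -= d
        if d == 1:            # x lies in a removed middle third: the function is constant there
            return total + weight
        if d == 2:
            total += weight
        weight /= 2
    return total

def g(x):
    """g(x) = (x + f(x)) / 2, where f is the Cantor function."""
    return (Fraction(x) + cantor(x)) / 2

grid = [Fraction(k, 81) for k in range(82)]
print(all(g(y) - g(x) >= (y - x) / 2 for x, y in zip(grid, grid[1:])))
print(cantor(Fraction(1, 3)), cantor(Fraction(7, 9)), g(0), g(1))
```

The check succeeds simply because the Cantor function is non-decreasing, so that $g(y) - g(x) = \frac12(y-x) + \frac12(f(y)-f(x)) \ge \frac12(y-x)$.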

262L It will be convenient to be able to call on the following straightforward result.


Lemma Suppose that $D \subseteq \mathbb{R}^r$ and $x \in \mathbb{R}^r$ are such that $\lim_{\delta\downarrow 0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)} = 1$. Then $\lim_{z\to 0}\frac{\rho(x+z,D)}{\|z\|} = 0$, where $\rho(x+z,D) = \inf_{y\in D}\|x+z-y\|$.
proof Let $\epsilon > 0$. Let $\delta_0 > 0$ be such that
$$\mu^*(D\cap B(x,\delta)) > \Bigl(1 - \bigl(\frac{\epsilon}{1+\epsilon}\bigr)^r\Bigr)\mu B(x,\delta)$$
whenever $0 < \delta \le \delta_0$. Take any $z$ such that $0 < \|z\| \le \delta_0/(1+\epsilon)$. ?? Suppose, if possible, that $\rho(x+z,D) > \epsilon\|z\|$. Then $B(x+z,\epsilon\|z\|) \subseteq B(x,(1+\epsilon)\|z\|) \setminus D$, so
$$\mu^*(D\cap B(x,(1+\epsilon)\|z\|)) \le \mu B(x,(1+\epsilon)\|z\|) - \mu B(x+z,\epsilon\|z\|) = \Bigl(1 - \bigl(\frac{\epsilon}{1+\epsilon}\bigr)^r\Bigr)\mu B(x,(1+\epsilon)\|z\|),$$
which is impossible, as $(1+\epsilon)\|z\| \le \delta_0$. X X Thus $\rho(x+z,D) \le \epsilon\|z\|$. As $\epsilon$ is arbitrary, this proves the result.
Remark There is a word for this; see 261Yg.

262M I come now to the first result connecting Lipschitz functions with differentiable functions. I approach it
through a substantial lemma which will be the foundation of §263.
Lemma Let $r, s \ge 1$ be integers and $\phi$ a function from a subset $D$ of $\mathbb{R}^r$ to $\mathbb{R}^s$ which is differentiable at each point of its domain. For each $x \in D$ let $T(x)$ be a derivative of $\phi$. Let $M_{sr}$ be the set of $s \times r$ matrices and $\zeta : A \to\ ]0,\infty[$ a strictly positive function, where $A \subseteq M_{sr}$ is a non-empty set containing $T(x)$ for every $x \in D$. Then we can find sequences $\langle D_n\rangle_{n\in\mathbb{N}}$, $\langle T_n\rangle_{n\in\mathbb{N}}$ such that
(i) $\langle D_n\rangle_{n\in\mathbb{N}}$ is a partition of $D$ into sets which are relatively measurable in $D$, that is, are intersections of $D$ with measurable subsets of $\mathbb{R}^r$;
(ii) $T_n \in A$ for every $n$;
(iii) $\|\phi(x) - \phi(y) - T_n(x-y)\| \le \zeta(T_n)\|x-y\|$ for every $n \in \mathbb{N}$ and $x, y \in D_n$;
(iv) $\|T(x) - T_n\| \le \zeta(T_n)$ for every $x \in D_n$.
proof (a) The first step is to note that there is a sequence $\langle S_n\rangle_{n\in\mathbb{N}}$ in $A$ such that
$$A \subseteq \bigcup_{n\in\mathbb{N}}\{T : T \in M_{sr},\ \|T - S_n\| < \zeta(S_n)\}.$$
P P (Of course this is a standard result about separable metric spaces.) Write $Q$ for the set of matrices in $M_{sr}$ with rational coefficients; then there is a natural bijection between $Q$ and $\mathbb{Q}^{sr}$, so $Q$ and $Q \times \mathbb{N}$ are countable. Enumerate $Q \times \mathbb{N}$ as $\langle(R_n,k_n)\rangle_{n\in\mathbb{N}}$. For each $n \in \mathbb{N}$, choose $S_n \in A$ by the rule
- if there is an $S \in A$ such that $\{T : \|T - R_n\| \le 2^{-k_n}\} \subseteq \{T : \|T - S\| < \zeta(S)\}$, take such an $S$ for $S_n$;
- otherwise, take $S_n$ to be any member of $A$.
I claim that this works. For let $S \in A$. Then $\zeta(S) > 0$; take $k \in \mathbb{N}$ such that $2^{-k} < \zeta(S)$. Take $R^* \in Q$ such that $\|R^* - S\| < \min(\zeta(S) - 2^{-k}, 2^{-k})$; this is possible because $\|R - S\|$ will be small whenever all the coefficients of $R$ are close enough to the corresponding coefficients of $S$ (262Ha), and we can find rational numbers to achieve this. Let $n \in \mathbb{N}$ be such that $R^* = R_n$ and $k = k_n$. Then
$$\{T : \|T - R_n\| \le 2^{-k_n}\} \subseteq \{T : \|T - S\| < \zeta(S)\}$$
(because $\|T - S\| \le \|T - R_n\| + \|R_n - S\|$), so we must have chosen $S_n$ by the first part of the rule above, and
$$S \in \{T : \|T - R_n\| \le 2^{-k_n}\} \subseteq \{T : \|T - S_n\| < \zeta(S_n)\}.$$
As $S$ is arbitrary, this proves the result. Q Q
Q
(b) Enumerate $\mathbb{Q}^r \times \mathbb{Q}^r \times \mathbb{N}$ as $\langle(q_n,q'_n,m_n)\rangle_{n\in\mathbb{N}}$. For each $n \in \mathbb{N}$, set
$$H_n = \{x : x \in [q_n,q'_n]\cap D,\ \|\phi(y) - \phi(x) - S_{m_n}(y-x)\| \le \zeta(S_{m_n})\|y-x\| \text{ for every } y \in [q_n,q'_n]\cap D\}$$
$$= [q_n,q'_n] \cap D \cap \bigcap_{y\in[q_n,q'_n]\cap D}\{x : x \in D,\ \|\phi(y) - \phi(x) - S_{m_n}(y-x)\| \le \zeta(S_{m_n})\|y-x\|\}.$$
Because $\phi$ is continuous, $H_n = D \cap \overline{H}_n$, writing $\overline{H}_n$ for the closure of $H_n$, so $H_n$ is relatively measurable in $D$. Note that if $x, y \in H_n$, then $y \in D \cap [q_n,q'_n]$, so that
$$\|\phi(y) - \phi(x) - S_{m_n}(y-x)\| \le \zeta(S_{m_n})\|y-x\|.$$
Set
$$H'_n = \{x : x \in H_n,\ \|T(x) - S_{m_n}\| \le \zeta(S_{m_n})\}.$$
(c) $D = \bigcup_{n\in\mathbb{N}} H'_n$. P P Let $x \in D$. Then $T(x) \in A$, so there is a $k \in \mathbb{N}$ such that $\|T(x) - S_k\| < \zeta(S_k)$. Let $\delta > 0$ be such that
$$\|\phi(y) - \phi(x) - T(x)(y-x)\| \le (\zeta(S_k) - \|T(x) - S_k\|)\|y-x\|$$
whenever $y \in D$ and $\|y-x\| \le \delta$. Then
$$\|\phi(y) - \phi(x) - S_k(y-x)\| \le (\zeta(S_k) - \|T(x) - S_k\|)\|y-x\| + \|T(x) - S_k\|\|y-x\| \le \zeta(S_k)\|y-x\|$$
whenever $y \in D \cap B(x,\delta)$. Let $q, q' \in \mathbb{Q}^r$ be such that $x \in [q,q'] \subseteq B(x,\delta)$. Let $n$ be such that $q = q_n$, $q' = q'_n$ and $k = m_n$. Then $x \in H'_n$. Q Q
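(d) Write
$$C_n = \{x : x \in H_n,\ \lim_{\delta\downarrow 0}\frac{\mu^*(H_n\cap B(x,\delta))}{\mu B(x,\delta)} = 1\}.$$
Then $C_n \subseteq H'_n$.
P P (i) Take $x \in C_n$, and set $\tilde T = T(x) - S_{m_n}$. I have to show that $\|\tilde T\| \le \zeta(S_{m_n})$. Take $\epsilon > 0$. Let $\delta_0 > 0$ be such that
$$\|\phi(y) - \phi(x) - T(x)(y-x)\| \le \epsilon\|y-x\|$$
whenever $y \in D$ and $\|y-x\| \le \delta_0$. Since
$$\|\phi(y) - \phi(x) - S_{m_n}(y-x)\| \le \zeta(S_{m_n})\|y-x\|$$
whenever $y \in H_n$, we have
$$\|\tilde T(y-x)\| \le (\epsilon + \zeta(S_{m_n}))\|y-x\|$$
whenever $y \in H_n$ and $\|y-x\| \le \delta_0$.
(ii) By 262L, there is a $\delta_1 > 0$ such that $(1+2\epsilon)\delta_1 \le \delta_0$ and $\rho(x+z,H_n) \le \epsilon\|z\|$ whenever $0 < \|z\| \le \delta_1$. So if $\|z\| \le \delta_1$ there is a $y \in H_n$ such that $\|x+z-y\| \le 2\epsilon\|z\|$. (If $z = 0$ we can take $y = x$.) Now $\|x-y\| \le (1+2\epsilon)\|z\| \le \delta_0$, so
$$\|\tilde Tz\| \le \|\tilde T(y-x)\| + \|\tilde T(x+z-y)\|$$
$$\le (\epsilon + \zeta(S_{m_n}))\|y-x\| + \|\tilde T\|\|x+z-y\|$$
$$\le (\epsilon + \zeta(S_{m_n}))\|z\| + (\epsilon + \zeta(S_{m_n}) + \|\tilde T\|)\|x+z-y\|$$
$$\le (\epsilon + \zeta(S_{m_n}) + 2\epsilon^2 + 2\epsilon\zeta(S_{m_n}) + 2\epsilon\|\tilde T\|)\|z\|.$$
And this is true whenever $0 < \|z\| \le \delta_1$. But multiplying this inequality by suitable positive scalars we see that
$$\|\tilde Tz\| \le (\epsilon + \zeta(S_{m_n}) + 2\epsilon^2 + 2\epsilon\zeta(S_{m_n}) + 2\epsilon\|\tilde T\|)\|z\|$$
for all $z \in \mathbb{R}^r$, and
$$\|\tilde T\| \le \epsilon + \zeta(S_{m_n}) + 2\epsilon^2 + 2\epsilon\zeta(S_{m_n}) + 2\epsilon\|\tilde T\|.$$
As $\epsilon$ is arbitrary, $\|\tilde T\| \le \zeta(S_{m_n})$, as claimed. Q Q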
(e) By 261Da, $H_n \setminus C_n$ is negligible for every $n$, so $H_n \setminus H'_n$ is negligible, and
$$H'_n = D \cap (\overline{H}_n \setminus (H_n\setminus H'_n))$$
is relatively measurable in $D$. Set
$$D_n = H'_n \setminus \bigcup_{k<n} H'_k,\quad T_n = S_{m_n}$$
for each $n$; these serve.

262N Corollary Let φ be a function from a subset D of Rr to R s , and suppose that φ is differentiable relative to
its domain at each point of D. Then D can be expressed as the union of a sequence hDn in∈N of sets such that φ↾Dn
is Lipschitz for each n ∈ N.
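proof In 262M, take $\zeta(T) = 1$ for every $T \in A = M_{sr}$. If $x, y \in D_n$ then
$$\begin{aligned}
\|\phi(x) - \phi(y)\| &\le \|\phi(x) - \phi(y) - T_n(x - y)\| + \|T_n(x - y)\| \\
&\le \|x - y\| + \|T_n\|\,\|x - y\|,
\end{aligned}$$
so $\phi\restriction D_n$ is $(1 + \|T_n\|)$-Lipschitz.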
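
The decomposition in 262N can be seen in a very simple case. The following Python sketch is illustrative only: the function φ(x) = x² on R and the pieces D_n = {x : n ≤ |x| < n + 1} are choices made for this example (they are not the sets produced by the proof of 262M), and the helper names are hypothetical. It estimates, from sample points, the Lipschitz ratio of φ on each piece; the ratios stay below 2(n + 1) on D_n, although φ is not Lipschitz on the whole of R.

```python
# Illustration only: phi(x) = x**2 is differentiable on R but not Lipschitz.
# The pieces D_n = {x : n <= |x| < n + 1} are a hypothetical choice for this
# example, not the sets constructed in the proof of 262M.

def phi(x):
    return x * x

def lipschitz_ratio(points):
    """Largest |phi(x) - phi(y)| / |x - y| over pairs of distinct sample points."""
    best = 0.0
    for i, x in enumerate(points):
        for y in points[i + 1:]:
            best = max(best, abs(phi(x) - phi(y)) / abs(x - y))
    return best

def sample_piece(n, count=200):
    """Sample points of D_n = {x : n <= |x| < n + 1}, on both sides of 0."""
    right = [n + i / count for i in range(count)]
    return right + [-x for x in right if x != 0]

# Estimated Lipschitz ratios on D_0, ..., D_4; they grow with n.
ratios = [lipschitz_ratio(sample_piece(n)) for n in range(5)]
```

On D_n we have |φ(x) − φ(y)| = |x + y||x − y| ≤ 2(n + 1)|x − y|, which is the bound the sampled ratios respect.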

262O Corollary Suppose that φ is an injective function from a measurable subset D of R r to R r , and that φ is
differentiable relative to its domain at every point of D.
(a) If A ⊆ D is negligible, φ[A] is negligible.
(b) If E ⊆ D is measurable, then φ[E] is measurable.
(c) If D is measurable and f is a measurable function defined on a subset of R r , then f φ−1 is measurable.
(d) If H ⊆ R r and φ−1 is defined almost everywhere on H, and if f is a function defined almost everywhere in Rr ,
then f φ−1 is defined almost everywhere in H.
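proof Let $\langle D_n\rangle_{n\in\mathbb{N}}$ be a sequence of measurable sets with union $D$ such that $\phi\restriction D_n$ is Lipschitz for each $n$. Then $\phi[A \cap D] = \bigcup_{n\in\mathbb{N}} (\phi\restriction D_n)[A \cap D_n]$ is negligible for every negligible $A \subseteq \mathbb{R}^r$, by 262D.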
Now parts (b)-(d) follow from (a) (because φ is continuous), just as in 262E.

262P Corollary Let φ be a function from a subset D of R r to R s , and suppose that φ is differentiable relative
to its domain, with a derivative T (x), at each point x ∈ D. Then the function x 7→ T (x) is measurable in the sense
that τij : D → R is measurable for all i ≤ s and j ≤ r, where τij (x) is the (i, j)th coefficient of the matrix T (x) for all
i, j and x.
proof For each $k \in \mathbb{N}$, apply 262M with $\zeta(T) = 2^{-k}$ for each $T \in A = M_{sr}$, obtaining sequences $\langle D_{kn}\rangle_{n\in\mathbb{N}}$ of relatively measurable subsets of $D$ and $\langle T_{kn}\rangle_{n\in\mathbb{N}}$ in $M_{sr}$. Let $\tau^{(kn)}_{ij}$ be the $(i,j)$th coefficient of $T_{kn}$. Then we have functions $f_{ijk} : D \to \mathbb{R}$ defined by setting
$$f_{ijk}(x) = \tau^{(kn)}_{ij} \text{ if } x \in D_{kn}.$$
Because the $D_{kn}$ are relatively measurable, the $f_{ijk}$ are measurable functions. For $x \in D_{kn}$,
$$|\tau_{ij}(x) - f_{ijk}(x)| \le \|T(x) - T_{kn}\| \le 2^{-k},$$
so $|\tau_{ij}(x) - f_{ijk}(x)| \le 2^{-k}$ for every $x \in D$, and
$$\tau_{ij} = \lim_{k\to\infty} f_{ijk}$$
is measurable, as claimed.

*262Q This concludes the part of the section which is essential for the rest of the chapter. However the main
results of §263 will I think be better understood if you are aware of the fact that any Lipschitz function is differentiable
(relative to its domain) almost everywhere in its domain. I devote the next couple of pages to a proof of this fact,
which apart from its intrinsic interest is a useful exercise.
Rademacher’s theorem Let φ be a Lipschitz function from a subset of R r to R s , where s ≥ 1. Then φ is differentiable
relative to its domain almost everywhere in its domain.
proof (a) By 262Ba and 262Ib, it will be enough to deal with the case s = 1. By 262Bb, there is a Lipschitz function
φ̃ : R r → R extending φ; now φ is differentiable with respect to its domain at any point of dom φ at which φ̃ is
differentiable, so it will be enough if I can show that φ̃ is differentiable almost everywhere. To make the notation more
agreeable to the eye, I will suppose that φ itself was defined everywhere in R r . Let γ be a Lipschitz constant for φ.
The proof proceeds by induction on r. If r = 1, we have a Lipschitz function φ : R → R; now φ is absolutely
continuous in any bounded interval (262Bc), therefore differentiable almost everywhere. Thus the induction starts.
The rest of the proof is devoted to the inductive step to r > 1.
(b) The first step is to show that all the partial derivatives $\frac{\partial\phi}{\partial\xi_j}$ are defined almost everywhere and are Borel measurable. P Take $j \le r$. For $q \in \mathbb{Q} \setminus \{0\}$ set
$$\Delta_q(x) = \frac{1}{q}(\phi(x + qe_j) - \phi(x)),$$
writing ej for the jth unit vector of R r . Because φ is continuous, so is ∆q , so that ∆q is a Borel measurable function
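for each $q$. Next, for any $x \in \mathbb{R}^r$,
$$D^+(x) = \limsup_{\delta\to 0} \frac{1}{\delta}(\phi(x + \delta e_j) - \phi(x)) = \lim_{n\to\infty} \sup_{q\in\mathbb{Q},\,0<|q|\le 2^{-n}} \Delta_q(x),$$
so that the set on which $D^+(x)$ is defined in $\mathbb{R}$ is Borel and $D^+$ is a Borel measurable function. Similarly,
$$D^-(x) = \liminf_{\delta\to 0} \frac{1}{\delta}(\phi(x + \delta e_j) - \phi(x))$$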

is a Borel measurable function with Borel domain. So


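$$E = \Bigl\{x : \frac{\partial\phi}{\partial\xi_j}(x) \text{ exists in } \mathbb{R}\Bigr\} = \{x : D^+(x) = D^-(x) \in \mathbb{R}\}$$
is a Borel set, and $\frac{\partial\phi}{\partial\xi_j}$ is a Borel measurable function.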
On the other hand, if we identify R r with R J × R, taking J to be {1, . . . , j − 1, j + 1, . . . , r}, then we can think
of the measure µ on R r as being the product of Lebesgue measure µJ on R J with Lebesgue measure µ1 on R (251N).
Now for every y ∈ R J we have a function φy : R → R defined by writing
φy (σ) = φ(y, σ),
and E becomes
{(y, σ) : φ′y (σ) is defined},
so that all the sections
{σ : (y, σ) ∈ E}
are conegligible subsets of R, because every φy is Lipschitz, therefore differentiable almost everywhere, as remarked in
part (a) of the proof. Since we know that $E$ is measurable, it must be conegligible, by Fubini's theorem (apply 252D or 252F to the complement of $E$). Thus $\frac{\partial\phi}{\partial\xi_j}$ is defined almost everywhere, as claimed. Q
Write
$$H = \Bigl\{x : x \in \mathbb{R}^r,\ \frac{\partial\phi}{\partial\xi_j}(x) \text{ exists for every } j \le r\Bigr\},$$
so that $H$ is a conegligible Borel set in $\mathbb{R}^r$.
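(c) For the rest of this proof, I fix on the natural identification of $\mathbb{R}^r$ with $\mathbb{R}^{r-1} \times \mathbb{R}$, identifying $(\xi_1, \ldots, \xi_r)$ with $((\xi_1, \ldots, \xi_{r-1}), \xi_r)$. For $x \in H$, let $T(x)$ be the $1 \times r$ matrix $\bigl(\frac{\partial\phi}{\partial\xi_1}(x), \ldots, \frac{\partial\phi}{\partial\xi_r}(x)\bigr)$.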

(d) Set
$$H_1 = \Bigl\{x : x \in H,\ \lim_{u\to 0 \text{ in } \mathbb{R}^{r-1}} \frac{|\phi(x + (u,0)) - \phi(x) - T(x)(u,0)|}{\|u\|} = 0\Bigr\}.$$
I claim that $H_1$ is conegligible in $\mathbb{R}^r$. P This is really the same idea as in (b). For $x \in H$, $x \in H_1$ iff
for every ǫ > 0 there is a δ > 0 such that
|φ(x + (u, 0)) − φ(x) − T (x)(u, 0)| ≤ ǫkuk
whenever kuk ≤ δ,
that is, iff
for every m ∈ N there is an n ∈ N such that
|φ(x + (u, 0)) − φ(x) − T (x)(u, 0)| ≤ 2−m kuk
whenever u ∈ Qr−1 and kuk ≤ 2−n .
But for any particular m ∈ N and u ∈ Qr−1 the set
{x : |φ(x + (u, 0)) − φ(x) − T (x)(u, 0)| ≤ 2−m kuk}
is measurable, indeed Borel, because all the functions x 7→ φ(x+(u, 0)), x 7→ φ(x), x 7→ T (x)(u, 0) are Borel measurable.
So $H_1$ is of the form
$$\bigcap_{m\in\mathbb{N}} \bigcup_{n\in\mathbb{N}} \bigcap_{u\in\mathbb{Q}^{r-1},\,\|u\|\le 2^{-n}} E_{mnu}$$
where every $E_{mnu}$ is a measurable set, and $H_1$ is therefore measurable.


Now however observe that for any σ ∈ R, the function
v 7→ φσ (v) = φ(v, σ) : R r−1 → R
is Lipschitz, therefore (by the inductive hypothesis) differentiable almost everywhere in R r−1 ; and that (v, σ) ∈ H1 iff
(v, σ) ∈ H and φ′σ (v) is defined. Consequently {v : (v, σ) ∈ H1 } is conegligible whenever {v : (v, σ) ∈ H} is, that is,
for almost every σ ∈ R; so that H1 , being measurable, must be conegligible. Q Q
(e) Now, for $q, q' \in \mathbb{Q}$ and $n \in \mathbb{N}$, set
$$F(q, q', n) = \Bigl\{x : x \in \mathbb{R}^r,\ q \le \frac{\phi(x + (0,\eta)) - \phi(x)}{\eta} \le q' \text{ whenever } 0 < |\eta| \le 2^{-n}\Bigr\}.$$

Set
$$F_*(q, q', n) = \Bigl\{x : x \in F(q, q', n),\ \lim_{\delta\downarrow 0} \frac{\mu^*(F(q, q', n) \cap B(x,\delta))}{\mu B(x,\delta)} = 1\Bigr\}.$$
By 261Da, $F(q, q', n) \setminus F_*(q, q', n)$ is negligible for all $q$, $q'$, $n$, so that
$$H_2 = H_1 \setminus \bigcup_{q,q'\in\mathbb{Q},\,n\in\mathbb{N}} (F(q, q', n) \setminus F_*(q, q', n))$$
is conegligible.
(f) I claim that $\phi$ is differentiable at every point of $H_2$. P Take $x = (u, \sigma) \in H_2$. Then $\alpha = \frac{\partial\phi}{\partial\xi_r}(x)$ and $T = T(x)$ are defined. Let $\gamma$ be a Lipschitz constant for $\phi$.
Take ǫ > 0; take q, q ′ ∈ Q such that α − ǫ ≤ q < α < q ′ ≤ α + ǫ. There must be an n ∈ N such that x ∈ F (q, q ′ , n);
consequently x ∈ F∗ (q, q ′ , n), by the definition of H2 . By 262L, there is a δ0 > 0 such that ρ(x + z, F (q, q ′ , n)) ≤ ǫkzk
whenever kzk ≤ δ0 . Next, there is a δ1 > 0 such that |φ(x + (v, 0)) − φ(x) − T (v, 0)| ≤ ǫkvk whenever v ∈ R r−1 and
kvk ≤ δ1 . Set
δ = min(δ0 , δ1 , 2−n )/(1 + 2ǫ) > 0.
Suppose that z = (v, τ ) ∈ R r and that kzk ≤ δ. Because kzk ≤ δ0 there is an x′ = (u′ , σ ′ ) ∈ F (q, q ′ , n) such that
kx + z − x′ k ≤ 2ǫkzk; set x∗ = (u′ , σ). Now
max(ku − u′ k, |σ − σ ′ |) ≤ kx − x′ k ≤ (1 + 2ǫ)kzk ≤ min(δ1 , 2−n ).
so
|φ(x∗ ) − φ(x) − T (x∗ − x)| ≤ ǫku′ − uk ≤ ǫ(1 + 2ǫ)kzk.
But also
|φ(x′ ) − φ(x∗ ) − T (x′ − x∗ )| = |φ(x′ ) − φ(x∗ ) − α(σ ′ − σ)| ≤ ǫ|σ ′ − σ| ≤ ǫ(1 + 2ǫ)kzk,
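because $x' \in F(q, q', n)$ and $|\sigma - \sigma'| \le 2^{-n}$, so that (if $x' \ne x^*$)
$$\alpha - \epsilon \le q \le \frac{\phi(x^*) - \phi(x')}{\sigma - \sigma'} \le q' \le \alpha + \epsilon$$
and
$$\Bigl|\frac{\phi(x') - \phi(x^*)}{\sigma' - \sigma} - \alpha\Bigr| \le \epsilon.$$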

Finally,
|φ(x + z) − φ(x′ )| ≤ γkx + z − x′ k ≤ 2γǫkzk,

|T z − T (x′ − x)| ≤ kT kkx + z − x′ k ≤ 2ǫkT kkzk.


Putting all these together,

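$$\begin{aligned}
|\phi(x + z) - \phi(x) - Tz| &\le |\phi(x + z) - \phi(x')| + |T(x' - x) - Tz| \\
&\qquad + |\phi(x') - \phi(x^*) - T(x' - x^*)| + |\phi(x^*) - \phi(x) - T(x^* - x)| \\
&\le 2\gamma\epsilon\|z\| + 2\epsilon\|T\|\,\|z\| + \epsilon(1 + 2\epsilon)\|z\| + \epsilon(1 + 2\epsilon)\|z\| \\
&= \epsilon(2\gamma + 2\|T\| + 2 + 4\epsilon)\|z\|.
\end{aligned}$$
And this is true whenever $\|z\| \le \delta$. As $\epsilon$ is arbitrary, $\phi$ is differentiable at $x$. Q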
Thus {x : φ is differentiable at x} includes H2 and is conegligible; and the induction continues.
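
A one-dimensional example shows that the exceptional negligible set in Rademacher's theorem can be non-empty. The sketch below is only an illustration, with the function φ(x) = |x| and all names chosen for the example: φ is 1-Lipschitz, its difference quotients at 0 take the value 1 from the right and −1 from the left, so φ is not differentiable at 0, while at the sample point 1 the quotients agree.

```python
# Illustration only: phi(x) = |x| is 1-Lipschitz on R and differentiable
# everywhere except at the single point 0.

def phi(x):
    return abs(x)

def quotient(x, h):
    """Difference quotient (phi(x + h) - phi(x)) / h for h != 0."""
    return (phi(x + h) - phi(x)) / h

steps = [2.0 ** -n for n in range(1, 20)]

right_at_0 = [quotient(0.0, h) for h in steps]    # quotients from the right at 0
left_at_0 = [quotient(0.0, -h) for h in steps]    # quotients from the left at 0
at_1 = [quotient(1.0, s * h) for h in steps for s in (1, -1)]  # both sides at 1
```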

262X Basic exercises (a) Let φ and ψ be Lipschitz functions from subsets of R r to R s . Show that φ + ψ is a
Lipschitz function from dom φ ∩ dom ψ to Rs .
(b) Let φ be a Lipschitz function from a subset of R r to R s , and c ∈ R. Show that cφ is a Lipschitz function.
(c) Suppose φ : D → R s and ψ : E → R q are Lipschitz functions, where D ⊆ R r and E ⊆ R s . Show that the
composition ψφ : D ∩ φ−1 [E] → R q is Lipschitz.
(d) Suppose φ, ψ are functions from subsets of R r to R s , and suppose that x ∈ dom φ ∩ dom ψ is such that each
function is differentiable relative to its domain at x, with derivatives S, T there. Show that φ + ψ is differentiable
relative to its domain at x, and that S + T is a derivative of φ + ψ at x.

(e) Suppose that φ is a function from a subset of R r to R s , and is differentiable relative to its domain at x ∈ dom φ.
Show that cφ is differentiable relative to its domain at x for every c ∈ R.

> (f ) Suppose φ : D → R s and ψ : E → Rq are functions, where D ⊆ R r and E ⊆ R s ; suppose that φ is differentiable
relative to its domain at x ∈ D ∩ φ−1 [E], with an s × r matrix T a derivative there, and that ψ is differentiable relative
to its domain at φ(x), with a q × s matrix S a derivative there. Show that the composition ψφ is differentiable relative
to its domain at x, and that the q × r matrix ST is a derivative of ψφ at x.

(g) Let $\phi : \mathbb{R}^r \to \mathbb{R}^s$ be a linear operator, with associated matrix $T$. Show that $\phi$ is differentiable everywhere, with $\phi'(x) = T$ for every $x$.
> (h) Let $G \subseteq \mathbb{R}^r$ be a convex open set, and $\phi : G \to \mathbb{R}^s$ a function such that all the partial derivatives $\frac{\partial\phi_i}{\partial\xi_j}$ are defined everywhere in $G$. Show that $\phi$ is Lipschitz iff all the partial derivatives are bounded on $G$.

(i) Let $\phi : \mathbb{R}^r \to \mathbb{R}^s$ be a function. Show that $\phi$ is differentiable at $x \in \mathbb{R}^r$ iff for every $m \in \mathbb{N}$ there are an $n \in \mathbb{N}$ and an $s \times r$ matrix $T$ with rational coefficients such that $\|\phi(y) - \phi(x) - T(y - x)\| \le 2^{-m}\|y - x\|$ whenever $\|y - x\| \le 2^{-n}$.

> (j) Suppose that $f$ is a real-valued function which is integrable over $\mathbb{R}^r$, and that $g : \mathbb{R}^r \to \mathbb{R}$ is a bounded differentiable function such that the partial derivative $\frac{\partial g}{\partial\xi_j}$ is bounded, where $j \le r$. Let $f * g$ be the convolution of $f$ and $g$ (255L). Show that $\frac{\partial}{\partial\xi_j}(f * g)$ is defined everywhere and equal to $f * \frac{\partial g}{\partial\xi_j}$. (Hint: 255Xd.)

> (k) Let $(X, \Sigma, \mu)$ be a measure space, $G \subseteq \mathbb{R}^r$ an open set, and $f : X \times G \to \mathbb{R}$ a function. Suppose that
(i) for every $x \in X$, $t \mapsto f(x, t) : G \to \mathbb{R}$ is differentiable;
(ii) there is an integrable function $g$ on $X$ such that $|\frac{\partial f}{\partial\tau_j}(x, t)| \le g(x)$ whenever $x \in X$, $t \in G$ and $j \le r$;
(iii) $\int |f(x, t)|\,\mu(dx)$ exists in $\mathbb{R}$ for every $t \in G$.
Show that $t \mapsto \int f(x, t)\,\mu(dx) : G \to \mathbb{R}$ is differentiable. (Hint: show first that, for a suitable $M$, $|f(x, t) - f(x, t')| \le M|g(x)|\,\|t - t'\|$ for every $t, t' \in G$ and $x \in X$.)

262Y Further exercises (a) Show that if $T = \langle\tau_{ij}\rangle_{i\le s,\,j\le r}$ is an $s \times r$ matrix then the operator norm $\|T\|$, as defined in 262H, is at most $\sqrt{\sum_{i=1}^{s}\sum_{j=1}^{r} \tau_{ij}^2}$.

(b) Give an example of a measurable function $\phi : \mathbb{R}^2 \to \mathbb{R}$ such that $\operatorname{dom} \frac{\partial\phi}{\partial\xi_1}$ is not measurable.

(c) Let $\phi : D \to \mathbb{R}$ be any function, where $D \subseteq \mathbb{R}^r$. Show that $H = \{x : x \in D$, $\phi$ is differentiable relative to its domain at $x\}$ is relatively measurable in $D$, and that $\frac{\partial\phi}{\partial\xi_j}\restriction H$ is measurable for every $j \le r$.

(d) A function $\phi : \mathbb{R}^r \to \mathbb{R}$ is smooth if all its partial derivatives $\frac{\partial\ldots\partial\phi}{\partial\xi_i\partial\xi_j\ldots\partial\xi_l}$ are defined everywhere in $\mathbb{R}^r$ and are continuous. Show that if $f$ is integrable over $\mathbb{R}^r$ and $\phi : \mathbb{R}^r \to \mathbb{R}$ is smooth and has bounded support then the convolution $f * \phi$ is smooth. (Hint: 262Xj, 262Xk.)
2 2 R
(e) For δ > 0 set φ̃δ (x) = e1/(δ −kxk ) if kxk < δ, 0 if kxk ≥ δ; set αδ = φ̃δ (x)dx, φδ (x) = αδ−1 φ̃δ (x) for every
x. (i) RShow that φδ : R r → R is smooth and has bounded support. (ii) Show that if f is integrable over R r then
limδ↓0 |f (x) − (f ∗ φδ )(x)|dx = 0. (Hint: start with continuous functions f with bounded support, and use 242O.)
(f) Show that if $f$ is integrable over $\mathbb{R}^r$ and $\epsilon > 0$ there is a smooth function $h$ with bounded support such that $\int |f - h| \le \epsilon$. (Hint: either reduce to the case in which $f$ has bounded support and use 262Ye or adapt the method of 242Xi.)

(g) Suppose that $f$ is a real function which is integrable over every bounded subset of $\mathbb{R}^r$. (i) Show that $f \times \phi$ is integrable whenever $\phi : \mathbb{R}^r \to \mathbb{R}$ is a smooth function with bounded support. (ii) Show that if $\int f \times \phi = 0$ for every smooth function $\phi$ with bounded support then $f = 0$ a.e. (Hint: show that $\int_{B(x,\delta)} f = 0$ for every $x \in \mathbb{R}^r$ and $\delta > 0$, and use 261C. Alternatively show that $\int_E f = 0$ first for $E = [b, c]$, then for open sets $E$, then for arbitrary measurable sets $E$.)

(h) Let f be integrable over R r , and for δ > 0 let φδ : R r → R be the function of 262Ye. Show that limδ↓0 (f ∗φδ )(x) =
f (x) for every x in the Lebesgue set of f . (Hint: 261Ye.)

(i) Let L be the space of all Lipschitz functions from R r to R s and for φ ∈ L set
kφk = kφ(0)k + inf{γ : γ ∈ [0, ∞[, kφ(y) − φ(x)k ≤ γky − xk for every x, y ∈ R r }.
Show that (L, k k) is a Banach space.

262 Notes and comments The emphasis of this section has turned out to be on the connexions between the concepts
of ‘Lipschitz function’ and ‘differentiable function’. It is the delight of classical real analysis that such intimate
relationships arise between concepts which belong to different categories. ‘Lipschitz functions’ clearly belong to the
theory of metric spaces (I will return to this in §264), while ‘differentiable functions’ belong to the theory of differentiable
manifolds, which is outside the scope of this volume. I have written this section out carefully just in case there are
readers who have so far missed the theory of differentiable mappings between multi-dimensional Euclidean spaces; but
it also gives me a chance to work through the notion of ‘function differentiable relative to its domain’, which will make
it possible in the next section to ride smoothly past a variety of problems arising at boundaries. The difficulties I am
concerned with arise in the first place with such functions as the polar-coordinate transformation
(ρ, θ) 7→ (ρ cos θ, ρ sin θ) : {(0, 0)} ∪ (]0, ∞[ × ]−π, π]) → R 2 .
In order to make this a bijection we have to do something rather arbitrary, and the domain of the transformation
cannot be an open set. On the definitions I am using, this function is differentiable relative to its domain at every point
of its domain, and we can apply such results as 262O uninhibitedly. You will observe that in this case the non-interior
points of the domain form a negligible set {(0, 0)} ∪ (]0, ∞[ × {π}), so we can expect to be able to ignore them; and for
most of the geometrically straightforward transformations that the theory is applied to, judicious excision of negligible
sets will reduce problems to the case of honestly differentiable functions with open domains. But while open-domain
theory will deal with a large proportion of the most important examples, there is a danger that you would be left with
real misapprehensions concerning the scope of these methods.
The essence of differentiability is that a differentiable function φ is approximable, near any given point of its domain,
by an affine function. The idea of 262M is to describe a widely effective method of dissecting D = dom φ into countably
many pieces on each of which φ is well-behaved. This will be applied in §§263 and 265 to investigate the measure of
φ[D]; but we already have several straightforward consequences (262N-262P).

263 Differentiable transformations in R r


This section is devoted to the proof of a single major theorem (263D) concerning differentiable transformations
between subsets of R r . There will be a generalization of this result in §265, and those with some familiarity with the
topic, or sufficient hardihood, may wish to read §264 before taking this section and §265 together. I end with a few
simple corollaries and an extension of the main result which can be made in the one-dimensional case (263I).
Throughout this section, as in the rest of the chapter, µ will denote Lebesgue measure on R r .

263A Linear transformations I begin with the special case of linear operators, which is not only the basis of the
proof of 263D, but is also one of its most important applications, and is indeed sufficient for many very striking results.
Theorem Let $T$ be a real $r \times r$ matrix; regard $T$ as a linear operator from $\mathbb{R}^r$ to itself. Let $J = |\det T|$ be the modulus of its determinant. Then
$$\mu T[E] = J\mu E$$
for every measurable set $E \subseteq \mathbb{R}^r$. If $T$ is a bijection (that is, if $J \ne 0$), then
$$\mu F = J\mu T^{-1}[F]$$
for every measurable $F \subseteq \mathbb{R}^r$, and
$$\int_F g\,d\mu = J\int_{T^{-1}[F]} gT\,d\mu$$
for every integrable function $g$ and measurable set $F$.
proof (a) The first step is to show that $T[I]$ is measurable for every half-open interval $I \subseteq \mathbb{R}^r$. P Any non-empty half-open interval $I = [a, b[$ is a countable union of closed intervals $I_n = [a, b - 2^{-n}\mathbf{1}]$, and each $I_n$ is compact (2A2F),

so that $T[I_n]$ is compact (2A2Eb), therefore closed (2A2Ec), therefore measurable (115G), and $T[I] = \bigcup_{n\in\mathbb{N}} T[I_n]$ is measurable. Q

(b) Set $J^* = \mu T[\,[\mathbf{0}, \mathbf{1}[\,]$, where $\mathbf{0} = (0, \ldots, 0)$ and $\mathbf{1} = (1, \ldots, 1)$; because $T[\,[\mathbf{0}, \mathbf{1}[\,]$ is bounded, $J^* < \infty$. (I will eventually show that $J^* = J$.) It is convenient to deal with the case of singular $T$ first. Recall that $T$, regarded as a linear transformation from $\mathbb{R}^r$ to itself, is either bijective or onto a proper linear subspace. In the latter case, take any $e \in \mathbb{R}^r \setminus T[\mathbb{R}^r]$; then the sets
$$T[\,[\mathbf{0}, \mathbf{1}[\,] + \gamma e,$$
as $\gamma$ runs over $[0, 1]$, are disjoint and all of the same measure $J^*$, because $\mu$ is translation-invariant (134A); moreover, their union is bounded, so has finite outer measure. As there are infinitely many such $\gamma$, the common measure $J^*$ must be zero. Now observe that
$$T[\mathbb{R}^r] = \bigcup_{z\in\mathbb{Z}^r} \bigl(T[\,[\mathbf{0}, \mathbf{1}[\,] + Tz\bigr),$$
and
$$\mu(T[\,[\mathbf{0}, \mathbf{1}[\,] + Tz) = J^* = 0$$
for every $z \in \mathbb{Z}^r$, while $\mathbb{Z}^r$ is countable, so $\mu T[\mathbb{R}^r] = 0$. At the same time, because $T$ is singular, it has zero determinant, and $J = 0$. Accordingly
$$\mu T[E] = 0 = J\mu E$$
for every measurable $E \subseteq \mathbb{R}^r$, and we're done.

(c) Henceforth, therefore, let us assume that $T$ is non-singular. Note that it and its inverse are continuous, so that $T$ is a homeomorphism, and $T[G]$ is open iff $G$ is open.
If $a \in \mathbb{R}^r$ and $k \in \mathbb{N}$, then
$$\mu T[\,[a, a + 2^{-k}\mathbf{1}[\,] = 2^{-kr}J^*.$$
P Set $J^*_k = \mu T[\,[\mathbf{0}, 2^{-k}\mathbf{1}[\,]$. Now $T[\,[a, a + 2^{-k}\mathbf{1}[\,] = T[\,[\mathbf{0}, 2^{-k}\mathbf{1}[\,] + Ta$; because $\mu$ is translation-invariant, its measure is also $J^*_k$. Next, $[\mathbf{0}, \mathbf{1}[$ is expressible as a disjoint union of $2^{kr}$ sets of the form $[a, a + 2^{-k}\mathbf{1}[$; consequently, $T[\,[\mathbf{0}, \mathbf{1}[\,]$ is expressible as a disjoint union of $2^{kr}$ sets of the form $T[\,[a, a + 2^{-k}\mathbf{1}[\,]$, and
$$J^* = \mu T[\,[\mathbf{0}, \mathbf{1}[\,] = 2^{kr}J^*_k,$$
that is, $J^*_k = 2^{-kr}J^*$, as claimed. Q

(d) Consequently $\mu T[G] = J^*\mu G$ for every open set $G \subseteq \mathbb{R}^r$. P For each $k \in \mathbb{N}$, set
$$Q_k = \{z : z \in \mathbb{Z}^r,\ [2^{-k}z, 2^{-k}z + 2^{-k}\mathbf{1}[ \ \subseteq G\},$$
$$G_k = \bigcup_{z\in Q_k} [2^{-k}z, 2^{-k}z + 2^{-k}\mathbf{1}[.$$
Then $G_k$ is a disjoint union of $\#(Q_k)$ sets of the form $[2^{-k}z, 2^{-k}z + 2^{-k}\mathbf{1}[$, so $\mu G_k = 2^{-kr}\#(Q_k)$; also, $T[G_k]$ is a disjoint union of $\#(Q_k)$ sets of the form $T[\,[2^{-k}z, 2^{-k}z + 2^{-k}\mathbf{1}[\,]$, so has measure $2^{-kr}J^*\#(Q_k) = J^*\mu G_k$, using (c). Observe next that $\langle G_k\rangle_{k\in\mathbb{N}}$ is a non-decreasing sequence with union $G$, so that
$$\mu T[G] = \lim_{k\to\infty} \mu T[G_k] = \lim_{k\to\infty} J^*\mu G_k = J^*\mu G.$$
Q
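
The approximation of an open set from inside by dyadic cubes can be tried numerically. The following Python sketch is an illustration under assumptions of my own choosing: r = 2, G is the open unit disc, and the function name is hypothetical. It counts the half-open dyadic squares of side 2^{-k} contained in G and computes 2^{-kr}#(Q_k), the measure of G_k; the values increase with k towards µG = π.

```python
import math

def inner_dyadic_area(k):
    """Measure of G_k: the union of the half-open dyadic squares of side 2**-k
    contained in the open unit disc G = {x : ||x|| < 1} of R^2."""
    n = 2 ** k
    side = 2.0 ** -k
    count = 0
    for i in range(-n, n):
        for j in range(-n, n):
            # Work in units of `side`. (fx, fy) is the corner of the closed
            # square [i, i+1] x [j, j+1] farthest from the origin. The
            # half-open square lies in G exactly when this corner has norm
            # < 1, because no such corner has norm exactly 1.
            fx = max(abs(i), abs(i + 1))
            fy = max(abs(j), abs(j + 1))
            if fx * fx + fy * fy < n * n:
                count += 1
    return count * side * side  # 2**(-k*r) * #(Q_k) with r = 2

# Measures of G_1, ..., G_8: a non-decreasing sequence bounded above by pi.
areas = [inner_dyadic_area(k) for k in range(1, 9)]
```

For k = 1 only the four squares meeting at the origin qualify, giving measure 1; by k = 8 the value is within 0.05 of π.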

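(e) It follows that $\mu^* T[A] = J^*\mu^* A$ for every $A \subseteq \mathbb{R}^r$. P Given $A \subseteq \mathbb{R}^r$ and $\epsilon > 0$, there are open sets $G$, $H$ such that $G \supseteq A$, $H \supseteq T[A]$, $\mu G \le \mu^* A + \epsilon$ and $\mu H \le \mu^* T[A] + \epsilon$ (134Fa). Set $G_1 = G \cap T^{-1}[H]$; then $G_1$ is open because $T^{-1}[H]$ is. Now $\mu T[G_1] = J^*\mu G_1$, so
$$\begin{aligned}
\mu^* T[A] &\le \mu T[G_1] = J^*\mu G_1 \le J^*\mu^* A + J^*\epsilon \\
&\le J^*\mu G_1 + J^*\epsilon = \mu T[G_1] + J^*\epsilon \le \mu H + J^*\epsilon \\
&\le \mu^* T[A] + \epsilon + J^*\epsilon.
\end{aligned}$$
As $\epsilon$ is arbitrary, $\mu^* T[A] = J^*\mu^* A$. Q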

(f) Consequently $\mu T[E]$ exists and is equal to $J^*\mu E$ for every measurable $E \subseteq \mathbb{R}^r$. P Let $E \subseteq \mathbb{R}^r$ be measurable, and take any $A \subseteq \mathbb{R}^r$. Set $A' = T^{-1}[A]$. Then

$$\begin{aligned}
\mu^*(A \cap T[E]) + \mu^*(A \setminus T[E]) &= \mu^*(T[A' \cap E]) + \mu^*(T[A' \setminus E]) \\
&= J^*(\mu^*(A' \cap E) + \mu^*(A' \setminus E)) \\
&= J^*\mu^* A' = \mu^* T[A'] = \mu^* A.
\end{aligned}$$
As $A$ is arbitrary, $T[E]$ is measurable, and now
$$\mu T[E] = \mu^* T[E] = J^*\mu^* E = J^*\mu E.$$
Q

(g) We are at last ready for the calculation of $J^*$. Recall that the matrix $T$ must be expressible as $PDQ$, where $P$ and $Q$ are orthogonal matrices and $D$ is diagonal, with non-negative diagonal entries (2A6C). Now we must have
$$T[\,[\mathbf{0}, \mathbf{1}[\,] = P[D[Q[\,[\mathbf{0}, \mathbf{1}[\,]]],$$
so, using (f),
$$J^* = J^*_P J^*_D J^*_Q,$$
where $J^*_P = \mu P[\,[\mathbf{0}, \mathbf{1}[\,]$, etc. Now we find that $J^*_P = J^*_Q = 1$. P Let $B = B(\mathbf{0}, 1)$ be the unit ball of $\mathbb{R}^r$. Because $B$ is closed, it is measurable; because it is bounded, $\mu B < \infty$; and because $B$ includes the non-empty half-open interval $[\mathbf{0}, r^{-1/2}\mathbf{1}[$, $\mu B > 0$. Now $P[B] = Q[B] = B$, because $P$ and $Q$ are orthogonal matrices; so we have
$$\mu B = \mu P[B] = J^*_P\mu B,$$
and $J^*_P$ must be $1$; similarly, $J^*_Q = 1$. Q

(h) So we have only to calculate $J^*_D$. Suppose the coefficients of $D$ are $\delta_1, \ldots, \delta_r \ge 0$, so that $Dx = (\delta_1\xi_1, \ldots, \delta_r\xi_r) = d \times x$. We have been assuming since the beginning of (c) that $T$ is non-singular, so no $\delta_i$ can be $0$. Accordingly
$$D[\,[\mathbf{0}, \mathbf{1}[\,] = [\mathbf{0}, d[,$$
and
$$J^*_D = \mu[\mathbf{0}, d[ = \prod_{i=1}^{r} \delta_i = \det D.$$
Now because $P$ and $Q$ are orthogonal, both have determinant $\pm 1$, so $\det T = \pm\det D$ and $J^* = \pm\det T$; because $J^*$ is surely non-negative, $J^* = |\det T| = J$.
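
The identity J = |det T| = det D used in (g)-(h) can be checked by hand in the 2 × 2 case. The sketch below is only an illustration: the matrix T and the function name are chosen for the example, and the diagonal entries of D are computed as the square roots of the eigenvalues of TᵀT (the singular values of T), a standard description of the decomposition T = PDQ which I am assuming here rather than quoting from 2A6C.

```python
import math

def singular_values_2x2(t):
    """Singular values of a real 2x2 matrix t = ((a, b), (c, d)), computed as
    square roots of the eigenvalues of the symmetric matrix t^T t."""
    (a, b), (c, d) = t
    # Entries of t^T t = ((p, q), (q, r)).
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    mean = (p + r) / 2.0
    disc = math.sqrt(((p - r) / 2.0) ** 2 + q * q)
    return math.sqrt(mean + disc), math.sqrt(max(mean - disc, 0.0))

# Hypothetical example matrix.
T = ((2.0, 1.0), (0.5, 3.0))
delta_1, delta_2 = singular_values_2x2(T)
det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]

# det D = delta_1 * delta_2 should agree with |det T| up to rounding.
det_D = delta_1 * delta_2
```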
(i) Thus $\mu T[E] = J\mu E$ for every Lebesgue measurable $E \subseteq \mathbb{R}^r$. If $T$ is non-singular, then we may use the above argument to show that $T^{-1}[F]$ is measurable for every measurable $F$, and
$$\mu F = \mu T[T^{-1}[F]] = J\mu T^{-1}[F] = \int J \times \chi(T^{-1}[F])\,d\mu,$$
identifying $J$ with the constant function with value $J$. By 235A,
$$\int_F g\,d\mu = \int_{T^{-1}[F]} JgT\,d\mu = J\int_{T^{-1}[F]} gT\,d\mu$$
for every integrable function $g$ and measurable set $F$.

263B Remark Perhaps I should have warned you that I should be calling on the results of §235. But if they were
fresh in your mind the formulae of the statement of the theorem will have recalled them, and if not then it is perhaps
better to turn back to them now rather than before reading the theorem, since they are used only in the last sentence
of the proof.
I have taken the argument above at a leisurely, not to say pedestrian, pace. The point is that while the translation-
invariance of Lebesgue measure, and its behaviour under simple magnification of a single coordinate, are more or less
built into the definition, its behaviour under general rotations is not, since a rotation takes half-open intervals into
skew cuboids. Of course the calculation of the measure of such an object is not really anything to do with the Lebesgue
theory, and it will be clear that much of the argument would apply equally to any geometrically reasonable notion of
r-dimensional volume.
We come now to the central result of the chapter. We have already done some of the detail work in 262M. The next
basic element is the following lemma.

263C Lemma Let $T$ be any $r \times r$ matrix; set $J = |\det T|$. Then for any $\epsilon > 0$ there is a $\zeta = \zeta(T, \epsilon) > 0$ such that
(i) $|\det S - \det T| \le \epsilon$ whenever $S$ is an $r \times r$ matrix and $\|S - T\| \le \zeta$;
(ii) whenever $D \subseteq \mathbb{R}^r$ is a bounded set and $\phi : D \to \mathbb{R}^r$ is a function such that $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta\|x - y\|$ for all $x, y \in D$, then $|\mu^*\phi[D] - J\mu^* D| \le \epsilon\mu^* D$.

proof (a) Of course (i) is the easy part. Because det S is a continuous function of the coefficients of S, and the
coefficients of S must be close to those of T if kS−T k is small (262Hb), there is surely a ζ0 > 0 such that | det S−det T | ≤
ǫ whenever kS − T k ≤ ζ0 .
(b)(i) Write B = B(0, 1) for the unit ball of R r , and consider T [B]. We know that µT [B] = JµB (263A). Let
G ⊇ T [B] be an open set such that µG ≤ (J + ǫ)µB (134Fa). Because B is compact (2A2F) so is T [B], so there is a
ζ1 > 0 such that T [B] + ζ1 B ⊆ G (2A2Ed). This means that µ∗ (T [B] + ζ1 B) ≤ (J + ǫ)µB.
(ii) Now suppose that D ⊆ R r is a bounded set, and that φ : D → R r is a function such that kφ(x) − φ(y) −
T (x − y)k ≤ ζ1 kx − yk for all x, y ∈ D. Then if x ∈ D and δ > 0,
φ[D ∩ B(x, δ)] ⊆ φ(x) + δT [B] + δζ1 B,
because if y ∈ D ∩ B(x, δ) then T (y − x) ∈ δT [B] and

φ(y) = φ(x) + T (y − x) + (φ(y) − φ(x) − T (y − x))


∈ φ(x) + δT [B] + ζ1 ky − xkB ⊆ φ(x) + δT [B] + ζ1 δB.
Accordingly

µ∗ φ[D ∩ B(x, δ)] ≤ µ∗ (δT [B] + δζ1 B) = δ r µ∗ (T [B] + ζ1 B)


≤ δ r (J + ǫ)µB = (J + ǫ)µB(x, δ).
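Let $\eta > 0$. Then there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}}$ of balls in $\mathbb{R}^r$ such that $D \subseteq \bigcup_{n\in\mathbb{N}} B_n$, $\sum_{n=0}^{\infty} \mu B_n \le \mu^* D + \eta$ and the sum of the measures of those $B_n$ whose centres do not lie in $D$ is at most $\eta$ (261F). Let $K$ be the set of those $n$ such that the centre of $B_n$ lies in $D$. Then $\mu^*\phi[D \cap B_n] \le (J + \epsilon)\mu B_n$ for every $n \in K$. Also, of course, $\phi$ is $(\|T\| + \zeta_1)$-Lipschitz, so $\mu^*\phi[D \cap B_n] \le (\|T\| + \zeta_1)^r\mu B_n$ for $n \in \mathbb{N} \setminus K$ (262D). Now
$$\begin{aligned}
\mu^*\phi[D] &\le \sum_{n=0}^{\infty} \mu^*\phi[D \cap B_n] \\
&\le \sum_{n\in K} (J + \epsilon)\mu B_n + \sum_{n\in\mathbb{N}\setminus K} (\|T\| + \zeta_1)^r\mu B_n \\
&\le (J + \epsilon)(\mu^* D + \eta) + \eta(\|T\| + \zeta_1)^r.
\end{aligned}$$
As $\eta$ is arbitrary,
$$\mu^*\phi[D] \le (J + \epsilon)\mu^* D.$$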

(c) If J = 0, we can stop here, setting ζ = min(ζ0 , ζ1 ); for then we surely have | det S − det T | ≤ ǫ whenever
kS − T k ≤ ζ, while if φ : D → Rr is such that kφ(x) − φ(y) − T (x − y)k ≤ ζkx − yk for all x, y ∈ D, then
|µ∗ φ[D] − Jµ∗ D| = µ∗ φ[D] ≤ ǫµ∗ D.
If $J \ne 0$, we have more to do. Because $T$ has non-zero determinant, it has an inverse $T^{-1}$, and $|\det T^{-1}| = J^{-1}$. As in (b-i) above, there is a $\zeta_2 > 0$ such that $\mu^*(T^{-1}[B] + \zeta_2 B) \le (J^{-1} + \epsilon')\mu B$, where $\epsilon' = \epsilon/J(J + \epsilon)$. Repeating (b), we see that if $C \subseteq \mathbb{R}^r$ is bounded and $\psi : C \to \mathbb{R}^r$ is such that $\|\psi(u) - \psi(v) - T^{-1}(u - v)\| \le \zeta_2\|u - v\|$ for all $u, v \in C$, then $\mu^*\psi[C] \le (J^{-1} + \epsilon')\mu^* C$.
Now suppose that $D \subseteq \mathbb{R}^r$ is bounded and $\phi : D \to \mathbb{R}^r$ is such that $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta_2'\|x - y\|$ for all $x, y \in D$, where $\zeta_2' = \min(\zeta_2, \|T^{-1}\|)/2\|T^{-1}\|^2 > 0$. Then
$$\|T^{-1}(\phi(x) - \phi(y)) - (x - y)\| \le \|T^{-1}\|\zeta_2'\|x - y\| \le \tfrac{1}{2}\|x - y\|$$
for all $x, y \in D$, so $\phi$ must be injective; set $C = \phi[D]$ and $\psi = \phi^{-1} : C \to D$. Note that $C$ is bounded, because
$$\|\phi(x) - \phi(y)\| \le (\|T\| + \zeta_2')\|x - y\|$$
whenever $x, y \in D$. Also
$$\|T^{-1}(u - v) - (\psi(u) - \psi(v))\| \le \|T^{-1}\|\zeta_2'\|\psi(u) - \psi(v)\| \le \tfrac{1}{2}\|\psi(u) - \psi(v)\|$$
for all $u, v \in C$. But this means that
$$\|\psi(u) - \psi(v)\| - \|T^{-1}\|\,\|u - v\| \le \tfrac{1}{2}\|\psi(u) - \psi(v)\|$$

and $\|\psi(u) - \psi(v)\| \le 2\|T^{-1}\|\,\|u - v\|$ for all $u, v \in C$, so that
$$\|\psi(u) - \psi(v) - T^{-1}(u - v)\| \le 2\zeta_2'\|T^{-1}\|^2\|u - v\| \le \zeta_2\|u - v\|$$
for all $u, v \in C$.
By (b) just above, it follows that
$$\mu^* D = \mu^*\psi[C] \le (J^{-1} + \epsilon')\mu^* C = (J^{-1} + \epsilon')\mu^*\phi[D],$$
and
$$J\mu^* D \le (1 + J\epsilon')\mu^*\phi[D].$$

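(d) So if we set $\zeta = \min(\zeta_0, \zeta_1, \zeta_2') > 0$, and if $D \subseteq \mathbb{R}^r$, $\phi : D \to \mathbb{R}^r$ are such that $D$ is bounded and $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta\|x - y\|$ for all $x, y \in D$, we shall have
$$\mu^*\phi[D] \le (J + \epsilon)\mu^* D,$$
$$\mu^*\phi[D] \ge J\mu^* D - J\epsilon'\mu^*\phi[D] \ge J\mu^* D - J\epsilon'(J + \epsilon)\mu^* D = J\mu^* D - \epsilon\mu^* D,$$
so we get the required formula
$$|\mu^*\phi[D] - J\mu^* D| \le \epsilon\mu^* D.$$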

263D We are ready for the theorem.


Theorem Let $D \subseteq \mathbb{R}^r$ be any set, and $\phi : D \to \mathbb{R}^r$ a function differentiable relative to its domain at each point of $D$. For each $x \in D$ let $T(x)$ be a derivative of $\phi$ relative to $D$ at $x$, and set $J(x) = |\det T(x)|$. Then
(i) $J : D \to [0, \infty[$ is a measurable function,
(ii) $\mu^*\phi[D] \le \int_D J\,d\mu$,
allowing $\infty$ as the value of the integral. If $D$ is measurable, then
(iii) $\phi[D]$ is measurable.
If $D$ is measurable and $\phi$ is injective, then
(iv) $\mu\phi[D] = \int_D J\,d\mu$,
(v) for every real-valued function $g$ defined on a subset of $\phi[D]$,
$$\int_{\phi[D]} g\,d\mu = \int_D J \times g\phi\,d\mu$$
if either integral is defined in $[-\infty, \infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x) = 0$ and $g(\phi(x))$ is undefined.
proof (a) To see that J is measurable, use 262P; the function T 7→ | det T | is a continuous function of the coefficients
of T , and the coefficients of T (x) are measurable functions of x, by 262P, so x 7→ | det T (x)| is measurable (121K). We
also know that if D is measurable, φ[D] will be measurable, by 262Ob. Thus (i) and (iii) are done.
(b) For the moment, assume that D is bounded, and fix ǫ > 0. For r × r matrices T , take ζ(T, ǫ) > 0 as in 263C.
Take hDn in∈N , hTn in∈N as in 262M, so that hDn in∈N is a disjoint cover of D by sets which are relatively measurable in
D, and each Tn is an r × r matrix such that
kT (x) − Tn k ≤ ζ(Tn , ǫ) whenever x ∈ Dn ,

kφ(x) − φ(y) − Tn (x − y)k ≤ ζ(Tn , ǫ)kx − yk for all x, y ∈ Dn .


Then, setting Jn = | det Tn |, we have
|J(x) − Jn | ≤ ǫ for every x ∈ Dn ,

|µ∗ φ[Dn ] − Jn µ∗ Dn | ≤ ǫµ∗ Dn ,


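by the choice of $\zeta(T_n, \epsilon)$. So we have
$$\int_D J\,d\mu \le \sum_{n=0}^{\infty} J_n\mu^* D_n + \epsilon\mu^* D \le \int_D J\,d\mu + 2\epsilon\mu^* D;$$
I am using here the fact that all the $D_n$ are relatively measurable in $D$, so that, in particular, $\mu^* D = \sum_{n=0}^{\infty} \mu^* D_n$. Next,
$$\mu^*\phi[D] \le \sum_{n=0}^{\infty} \mu^*\phi[D_n] \le \sum_{n=0}^{\infty} J_n\mu^* D_n + \epsilon\mu^* D.$$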

Putting these together,
$$\mu^*\phi[D] \le \int_D J\,d\mu + 2\epsilon\mu^* D.$$
If $D$ is measurable and $\phi$ is injective, then all the $D_n$ are measurable subsets of $\mathbb{R}^r$, so all the $\phi[D_n]$ are measurable, and they are also disjoint. Accordingly
$$\int_D J\,d\mu \le \sum_{n=0}^{\infty} J_n\mu D_n + \epsilon\mu D \le \sum_{n=0}^{\infty} (\mu\phi[D_n] + \epsilon\mu D_n) + \epsilon\mu D = \mu\phi[D] + 2\epsilon\mu D.$$
Since $\epsilon$ is arbitrary, we get
$$\mu^*\phi[D] \le \int_D J\,d\mu,$$
and if $D$ is measurable and $\phi$ is injective,
$$\int_D J\,d\mu \le \mu\phi[D];$$
thus we have (ii) and (iv), on the assumption that D is bounded.
(c) For a general set $D$, set $B_k = B(\mathbf{0}, k)$; then
$$\mu^*\phi[D] = \lim_{k\to\infty} \mu^*\phi[D \cap B_k] \le \lim_{k\to\infty} \int_{D\cap B_k} J\,d\mu = \int_D J\,d\mu,$$
with equality if $\phi$ is injective and $D$ is measurable.
(d) For part (v), I seek to show that the hypotheses of 235J are satisfied, taking $X = D$ and $Y = \phi[D]$. P Set
$$G = \{x : x \in D,\ J(x) > 0\}.$$
(α) If $F \subseteq \phi[D]$ is measurable, then there are Borel sets $F_1$, $F_2$ such that $F_1 \subseteq F \subseteq F_2$ and $\mu(F_2 \setminus F_1) = 0$. Set $E_j = \phi^{-1}[F_j]$ for each $j$, so that $E_1 \subseteq \phi^{-1}[F] \subseteq E_2$, and both the sets $E_j$ are measurable, because $\phi$ and $\operatorname{dom}\phi$ are measurable. Now, applying (iv) to $\phi\restriction E_j$,
$$\int_{E_j} J\,d\mu = \mu\phi[E_j] = \mu(F_j \cap \phi[D]) = \mu F$$
for both $j$, so $\int_{E_2\setminus E_1} J\,d\mu = 0$ and $J = 0$ a.e. on $E_2 \setminus E_1$. Accordingly $J \times \chi(\phi^{-1}[F]) =_{\text{a.e.}} J \times \chi E_1$, and $\int J \times \chi(\phi^{-1}[F])\,d\mu$ exists and is equal to $\int_{E_1} J\,d\mu = \mu F$. At the same time, $(\phi^{-1}[F] \cap G) \triangle (E_1 \cap G)$ is negligible, so $\phi^{-1}[F] \cap G$ is measurable.
(β) If $F \subseteq \phi[D]$ and $G \cap \phi^{-1}[F]$ is measurable, then we know that $\mu\phi[D \setminus G] = \int_{D\setminus G} J = 0$ (by (iv)), so $F \setminus \phi[G]$ must be negligible; while $F \cap \phi[G] = \phi[G \cap \phi^{-1}[F]]$ also is measurable, by (iii). Accordingly $F$ is measurable whenever $G \cap \phi^{-1}[F]$ is measurable.
Thus all the hypotheses of 235J are satisfied. Q Now (v) can be read off from the conclusion of 235J.
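
As a check on formulae (iv) and (v), here is a small numerical experiment with the polar-coordinate transformation φ(ρ, θ) = (ρ cos θ, ρ sin θ). It is a sketch under assumptions of my own: I take D = ]0, 1[ × ]−π, π], on which φ is injective with image the unit disc less its centre, the test function g(x, y) = x² + y² is chosen for convenience, and the integrals over D are approximated by a midpoint rule with hypothetical helper names. The expected values µφ[D] = π and ∫ g dµ = π/2 are the elementary ones.

```python
import math

def derivative(rho, theta):
    """Matrix of partial derivatives of phi(rho, theta) = (rho cos theta, rho sin theta)."""
    return ((math.cos(theta), -rho * math.sin(theta)),
            (math.sin(theta), rho * math.cos(theta)))

def jacobian(rho, theta):
    """J = |det T| for the derivative T of phi at (rho, theta); equals rho."""
    (a, b), (c, d) = derivative(rho, theta)
    return abs(a * d - b * c)

def integrate_over_D(f, n=200):
    """Midpoint-rule approximation to the integral of f over D = ]0,1[ x ]-pi,pi]."""
    d_rho = 1.0 / n
    d_theta = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * d_rho
        for j in range(n):
            theta = -math.pi + (j + 0.5) * d_theta
            total += f(rho, theta) * d_rho * d_theta
    return total

def g(x, y):
    return x * x + y * y

# (iv): the integral of J over D approximates the measure of the unit disc.
area = integrate_over_D(jacobian)

# (v): the integral of J x (g phi) over D approximates the integral of g over the disc.
moment = integrate_over_D(
    lambda rho, theta: jacobian(rho, theta)
    * g(rho * math.cos(theta), rho * math.sin(theta))
)
```

The first value agrees with π up to rounding, because J(ρ, θ) = ρ is linear in ρ and the midpoint rule integrates it exactly; the second is close to π/2.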

263E Remarks (a) This is a version of the classical result on change of variable in a many-dimensional integral.
What I here call J(x) is the Jacobian of φ at x; it describes the change in volumes of objects near x, following the
rule already established in 263A for functions with constant derivative. The idea of the proof is also the classical
one: to break the set D up into small enough pieces Dm for us to be able to approximate φ by affine operators
y 7→ φ(x) + Tm (y − x) on each. The potential irregularity of the set D, which in this theorem may be any set, is
compensated for by a corresponding freedom in choosing the sets Dm . In fact there is a further decomposition of
the sets Dm hidden in part (b-ii) of the proof of 263C; each Dm is essentially covered by a disjoint family of balls,
the measures of whose images we can estimate with an adequate accuracy. There is always a danger of a negligible
exceptional set, and we need the crude inequalities of the proof of 262D to deal with it.

(b) Throughout the work of this chapter, from 261B to 263D, I have chosen balls B(x, δ) as the basic shapes to
work with. I think it should be clear that in fact any reasonable shapes would do just as well. In particular, the ‘balls’
\[ B_1(x,\delta) = \{y : \sum_{i=1}^r |\eta_i - \xi_i| \le \delta\}, \quad B_\infty(x,\delta) = \{y : |\eta_i - \xi_i| \le \delta\ \forall\, i\} \]
would serve perfectly. There are many alternatives. We could use sets of the form $C(x,k)$, for $x \in \mathbb{R}^r$ and $k \in \mathbb{N}$, defined to be the half-open cube of the form $[2^{-k}z, 2^{-k}(z+\mathbf{1})[$ with $z \in \mathbb{Z}^r$ containing $x$, instead; or even $C'(x,\delta) = [x, x+\delta\mathbf{1}[$.

In all such cases we have versions of the density theorems (261Yb-261Yc) which support the remaining theory.

(c) I have presented 263D as a theorem about differentiable functions, because that is the normal form in which one
uses it in elementary applications. However, the proof depends essentially on the fact that a differentiable function is
a countable union of Lipschitz functions, and 263D would follow at once from the same theorem proved for Lipschitz
functions only. Now the fact is that the theorem applies to any countable union of Lipschitz functions, because
a Lipschitz function is differentiable almost everywhere. For more advanced work (see Federer 69 or Evans &
Gariepy 92, or Chapter 47 in Volume 4) it seems clear that Lipschitz functions are the vital ones, so I spell out the
result.

*263F Corollary Let D ⊆ R r be any set and φ : D → R r a Lipschitz function. Let D1 be the set of points at
which φ has a derivative relative to D, and for each x ∈ D1 let T (x) be such a derivative, with J(x) = | det T (x)|. Then
(i) D \ D1 is negligible;
(ii) $J : D_1 \to [0,\infty[$ is measurable;
(iii) $\mu^*\phi[D] \le \int_D J(x)\,dx$.
If $D$ is measurable, then
(iv) $\phi[D]$ is measurable.
If $D$ is measurable and $\phi$ is injective, then
(v) $\mu\phi[D] = \int_D J\,d\mu$,
(vi) for every real-valued function $g$ defined on a subset of $\phi[D]$,
\[ \int_{\phi[D]} g\,d\mu = \int_D J \times g\phi\,d\mu \]
if either integral is defined in $[-\infty,\infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x) = 0$ and $g(\phi(x))$ is undefined.
proof This is now just a matter of putting 262Q and 263D together, with a little help from 262D. Use 262Q to show
that D \ D1 is negligible, 262D to show that φ[D \ D1 ] is negligible, and apply 263D to φ↾D1 .

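263G Polar coordinates in the plane I offer an elementary example with a useful consequence. Define $\phi : \mathbb{R}^2 \to \mathbb{R}^2$ by setting $\phi(\rho,\theta) = (\rho\cos\theta, \rho\sin\theta)$ for $\rho, \theta \in \mathbb{R}$. Then
\[ \phi'(\rho,\theta) = \begin{pmatrix} \cos\theta & -\rho\sin\theta \\ \sin\theta & \rho\cos\theta \end{pmatrix}, \]
so $J(\rho,\theta) = |\rho|$ for all $\rho$, $\theta$.
Of course $\phi$ is not injective, but if we restrict it to the domain $D = \{(0,0)\} \cup \{(\rho,\theta) : \rho > 0, -\pi < \theta \le \pi\}$ then $\phi{\restriction}D$ is a bijection between $D$ and $\mathbb{R}^2$, and
\[ \int g\,d\xi_1 d\xi_2 = \int_D g(\phi(\rho,\theta))\rho\,d\rho\,d\theta \]
for every real-valued function $g$ which is integrable over $\mathbb{R}^2$.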
Suppose, in particular, that we set
\[ g(x) = e^{-\|x\|^2/2} = e^{-\xi_1^2/2} e^{-\xi_2^2/2} \]
for $x = (\xi_1,\xi_2) \in \mathbb{R}^2$. Then
\[ \int g(x)\,dx = \int e^{-\xi_1^2/2}\,d\xi_1 \int e^{-\xi_2^2/2}\,d\xi_2, \]
as in 253D. Setting $I = \int e^{-t^2/2}\,dt$, we have $\int g = I^2$. (To see that $I$ is well-defined in $\mathbb{R}$, note that the integrand is continuous, therefore measurable, and that
\[ \int_{-1}^{1} e^{-t^2/2}\,dt \le 2, \]
\[ \int_{-\infty}^{-1} e^{-t^2/2}\,dt = \int_1^\infty e^{-t^2/2}\,dt \le \int_1^\infty e^{-t/2}\,dt = \lim_{a\to\infty} \int_1^a e^{-t/2}\,dt = 2e^{-1/2} \]
are both finite.) Now looking at the alternative expression we have

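\begin{align*}
I^2 &= \int g(x)\,dx = \int_D g(\rho\cos\theta, \rho\sin\theta)\rho\,d(\rho,\theta) \\
&= \int_D e^{-\rho^2/2}\rho\,d(\rho,\theta) = \int_0^\infty \int_{-\pi}^{\pi} \rho e^{-\rho^2/2}\,d\theta\,d\rho \\
&\qquad\text{(ignoring the point $(0,0)$, which has zero measure)} \\
&= \int_0^\infty 2\pi\rho e^{-\rho^2/2}\,d\rho = 2\pi \lim_{a\to\infty} \int_0^a \rho e^{-\rho^2/2}\,d\rho \\
&= 2\pi \lim_{a\to\infty} (-e^{-a^2/2} + 1) = 2\pi.
\end{align*}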

Consequently
\[ \int_{-\infty}^{\infty} e^{-t^2/2}\,dt = I = \sqrt{2\pi}, \]
which is one of the many facts every mathematician should know, and in particular is vital for Chapter 27 below.
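
As a purely numerical sanity check on the polar-coordinate calculation, and not as part of the argument, the following Python sketch compares the Cartesian integral $I$ with $\sqrt{2\pi}$ and the polar form of $I^2$ with $2\pi$. The helper `trapezoid`, the cutoff 12 and the number of subintervals are illustrative assumptions, not anything taken from the text.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for f on [a, b] with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def gauss(t):
    return math.exp(-t * t / 2)

# Cartesian side: I = integral of exp(-t^2/2) over R, truncated to [-12, 12]
# (an assumed cutoff; the discarded tails are far below double precision).
I_cartesian = trapezoid(gauss, -12.0, 12.0, 24000)

# Polar side: I^2 = integral over rho >= 0 of 2*pi*rho*exp(-rho^2/2).
I_squared_polar = trapezoid(lambda rho: 2 * math.pi * rho * gauss(rho), 0.0, 12.0, 24000)

print(I_cartesian, math.sqrt(2 * math.pi))
print(I_squared_polar, 2 * math.pi)
```

The two printed pairs should agree to several decimal places; the polar integrand is what the factor $J(\rho,\theta) = \rho$ contributes after the $\theta$-integration.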

263H Corollary If $k \in \mathbb{N}$ is odd,
\[ \int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx = 0; \]
if $k = 2l \in \mathbb{N}$ is even, then
\[ \int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx = \frac{(2l)!}{2^l l!}\sqrt{2\pi}. \]

proof (a) To see that all the integrals are well-defined and finite, observe that $\lim_{x\to\pm\infty} x^k e^{-x^2/4} = 0$, so that $M_k = \sup_{x\in\mathbb{R}} |x^k e^{-x^2/4}|$ is finite, and
\[ \int_{-\infty}^{\infty} |x^k e^{-x^2/2}|\,dx \le M_k \int_{-\infty}^{\infty} e^{-x^2/4}\,dx < \infty. \]

(b) If $k$ is odd, then substituting $y = -x$ we get
\[ \int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx = -\int_{-\infty}^{\infty} y^k e^{-y^2/2}\,dy, \]
so that both integrals must be zero.

(c) For even $k$, proceed by induction. Set $I_l = \int_{-\infty}^{\infty} x^{2l} e^{-x^2/2}\,dx$. $I_0 = \sqrt{2\pi} = \frac{0!}{2^0 0!}\sqrt{2\pi}$ by 263G. For the inductive step to $l + 1 \ge 1$, integrate by parts to see that
\[ \int_{-a}^{a} x^{2l+1} \cdot x e^{-x^2/2}\,dx = -a^{2l+1} e^{-a^2/2} + (-a)^{2l+1} e^{-a^2/2} + \int_{-a}^{a} (2l+1) x^{2l} e^{-x^2/2}\,dx \]
for every $a \ge 0$. Letting $a \to \infty$,
\[ I_{l+1} = (2l+1) I_l. \]
Because
\[ \frac{(2(l+1))!}{2^{l+1}(l+1)!}\sqrt{2\pi} = (2l+1)\frac{(2l)!}{2^l l!}\sqrt{2\pi}, \]
the induction proceeds.
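
The closed form in 263H can be compared with direct numerical integration. The sketch below is only an illustration under stated assumptions: the names `moment` and `closed_form`, the cutoff 14 and the grid size are hypothetical choices, and the trapezoidal rule stands in for the Lebesgue integral.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for f on [a, b] with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def moment(k, cutoff=14.0, n=28000):
    """Approximate the integral of x^k exp(-x^2/2) over R, truncated to [-cutoff, cutoff]."""
    return trapezoid(lambda x: x ** k * math.exp(-x * x / 2), -cutoff, cutoff, n)

def closed_form(k):
    """Value stated in 263H: 0 for odd k, (2l)!/(2^l l!) sqrt(2 pi) for k = 2l."""
    if k % 2 == 1:
        return 0.0
    l = k // 2
    return math.factorial(2 * l) / (2 ** l * math.factorial(l)) * math.sqrt(2 * math.pi)

for k in range(9):
    print(k, moment(k), closed_form(k))
```

For even $k$ the successive ratios of the printed values are $1, 3, 5, 7$, which is the recurrence $I_{l+1} = (2l+1)I_l$ of part (c).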

263I The one-dimensional case The restriction to injective functions φ in 263D(v) is unavoidable in the context
of the result there. But in the substitutions of elementary calculus it is not always essential. In the hope of clarifying
the position I give a result here which covers many of the standard tricks.
Theorem Let I ⊆ R be an interval with more than one point, and φ : I → R a function which is absolutely continuous
on any closed bounded subinterval of I. Write u = inf I, u′ = sup I in [−∞, ∞], and suppose that v = limx↓u φ(x)
and v ′ = limx↑u′ φ(x) are defined in [−∞, ∞]. Let g be a Lebesgue measurable real-valued function defined almost
everywhere in φ[I]. Then
\[ \int_v^{v'} g = \int_I g(\phi(x))\phi'(x)\,dx \]
whenever the right-hand side is defined in $\mathbb{R}$, on the understanding that we interpret $\int_v^{v'} g$ as $-\int_{v'}^{v} g$ when $v' < v$, and $g(\phi(x))\phi'(x)$ as $0$ when $\phi'(x) = 0$ and $g(\phi(x))$ is undefined.
proof (a) Recall that φ is differentiable almost everywhere on I (225Cb) and that φ[A] is negligible for every negligible
A ⊆ I (225G). (These results are stated for closed bounded intervals; but since any interval is expressible as the union
of a sequence of closed bounded intervals, they remain valid in the present context.) Set D = dom φ′ , so that I \ D
and φ[I \ D] are negligible. Next,
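setting $D_0 = \{x : x \in D, \phi'(x) = 0\}$, $D$ and $D_0$ are Borel sets (225J) and $\phi[D_0]$ is negligible, by 263D(ii), while $\int_I g(\phi(x))\phi'(x)\,dx = \int_{D\setminus D_0} g(\phi(x))\phi'(x)\,dx$.
Applying 262M with $A = \mathbb{R}\setminus\{0\}$ and $\zeta(\alpha) = \frac12|\alpha|$ for $\alpha \in A$, we have sequences $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle\alpha_n\rangle_{n\in\mathbb{N}}$ such that $\langle E_n\rangle_{n\in\mathbb{N}}$ is a partition of $D \setminus D_0$ into measurable sets, every $\alpha_n$ is non-zero, and $|\phi(x) - \phi(y) - \alpha_n(x-y)| \le \frac12|\alpha_n||x-y|$ for all $x, y \in E_n$; so that, in particular, $\phi{\restriction}E_n$ is injective, while $\operatorname{sgn}\phi'(x) = \operatorname{sgn}\alpha_n$ for every $x \in E_n$, writing $\operatorname{sgn}\alpha = \alpha/|\alpha|$ as usual. Set $\epsilon_n = \operatorname{sgn}\alpha_n$ for each $n$. Now 263D(v) tells us that
\[ \sum_{n=0}^\infty \int |g| \times \chi(\phi[E_n]) = \sum_{n=0}^\infty \int_{E_n} |g(\phi(x))\phi'(x)|\,dx \]
is finite.
Note that 263D(v) also shows that if $B \subseteq \mathbb{R}$ is negligible, then $E_n \cap \phi^{-1}[B]$ must be negligible for every $n$, so that $\int_{\phi^{-1}[B]} g(\phi(x))\phi'(x)\,dx = 0$.
Consequently, setting
\[ C_0 = \{y : y \in (\phi[I] \cap \operatorname{dom} g) \setminus (\{v,v'\} \cup \phi[I\setminus D] \cup \phi[D_0]),\ \sum_{n=0}^\infty |g(y)\chi(\phi[E_n])(y)| < \infty\}, \]
$\phi[I] \setminus C_0$ is negligible, and if we set $C = \{y : y \in C_0, g(y) \ne 0\}$,
\[ \int_{J\cap C} g = \int_J g \]
for every $J \subseteq \phi[I]$.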
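(b) The point of the argument is the following fact: if $y \in C$ then
\begin{align*}
\sum_{n=0}^\infty \epsilon_n \chi(\phi[E_n])(y) &= 1 \text{ if } v < y < v', \\
&= -1 \text{ if } v' < y < v, \\
&= 0 \text{ if } y < \min(v,v') \text{ or } \max(v,v') < y.
\end{align*}
P P Because $g(y) \ne 0$ and $\sum_{n=0}^\infty |g(y)\chi(\phi[E_n])(y)|$ is finite, $\{n : y \in \phi[E_n]\}$ is finite; because $y \notin \phi[I\setminus D] \cup \phi[D_0]$, and $\phi{\restriction}E_n$ is injective for every $n$, and $\bigcup_{n\in\mathbb{N}} E_n = D \setminus D_0$, $K = \phi^{-1}[\{y\}]$ is finite. For each $x \in K$, let $n_x$ be such that $x \in E_{n_x}$; then $\epsilon_{n_x} = \operatorname{sgn}\phi'(x)$. So $\sum_{n=0}^\infty \epsilon_n\chi(\phi[E_n])(y) = \sum_{x\in K}\operatorname{sgn}\phi'(x)$.
If $J \subseteq I \setminus K$ is an interval, $\phi(z) \ne y$ for $z \in J$; since $\phi$ is continuous, the Intermediate Value Theorem tells us that $\operatorname{sgn}(\phi(z) - y)$ is constant on $J$. A simple induction on $\#(K \cap {]-\infty,z[})$ shows that $\operatorname{sgn}(\phi(z) - y) = \operatorname{sgn}(v - y) + 2\sum_{x\in K, x<z}\operatorname{sgn}\phi'(x)$ for every $z \in I \setminus K$; taking the limit as $z \uparrow u'$, $\sum_{x\in K}\operatorname{sgn}\phi'(x) = \frac12(\operatorname{sgn}(v' - y) - \operatorname{sgn}(v - y))$. (Here we may have to interpret $\operatorname{sgn}(\pm\infty)$ as $\pm 1$ in the obvious way.) This turns out to be just what we need to know. Q Q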
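(c) What this means is that
\begin{align*}
\int_v^{v'} g &= \int_v^{v'} g \times \chi C = \int_C g \times \sum_{n=0}^\infty \epsilon_n\chi(\phi[E_n]) \\
&\qquad\text{(allowing for the conventional reversal if $v' < v$)} \\
&= \sum_{n=0}^\infty \epsilon_n \int_C g \times \chi(\phi[E_n]) = \sum_{n=0}^\infty \epsilon_n \int_{E_n\cap\phi^{-1}[C]} g(\phi(x))|\phi'(x)|\,dx \\
&\qquad\text{(263D(v) again, applied to $\phi{\restriction}(E_n \cap \phi^{-1}[C])$ for each $n$)} \\
&= \sum_{n=0}^\infty \epsilon_n \int_{E_n} g(\phi(x))|\phi'(x)|\,dx \\
&\qquad\text{(because $g(\phi(x))\phi'(x) = 0$ almost everywhere in $E_n \setminus \phi^{-1}[C]$)} \\
&= \sum_{n=0}^\infty \int_{E_n} g(\phi(x))\phi'(x)\,dx = \int_{D\setminus D_0} g(\phi(x))\phi'(x)\,dx = \int_I g(\phi(x))\phi'(x)\,dx,
\end{align*}
as claimed.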
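
To see the theorem at work on substitutions that are not injective, here is a small numerical sketch. Everything in it is an illustrative assumption rather than part of the text: the interval $I = [-1, 2]$, the choices $\phi(x) = x^2$ and $\phi(x) = 1 - x^2$, the integrand $g = \cos$, and the helper `trapezoid`. Calling the helper with its limits reversed returns the negated integral, which matches the convention for $\int_v^{v'} g$ when $v' < v$.

```python
import math

def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule from a to b; if a > b the step is negative,
    so the result is minus the integral over [b, a]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

def both_sides(g, phi, dphi, u, u_prime):
    """Return (integral of g from v to v', integral over I of g(phi(x)) phi'(x) dx)."""
    v, v_prime = phi(u), phi(u_prime)  # phi is continuous here, so the limits are values
    lhs = trapezoid(g, v, v_prime)
    rhs = trapezoid(lambda x: g(phi(x)) * dphi(x), u, u_prime)
    return lhs, rhs

# phi(x) = x^2 on [-1, 2] is not injective; v = 1 and v' = 4.
lhs1, rhs1 = both_sides(math.cos, lambda x: x * x, lambda x: 2 * x, -1.0, 2.0)

# phi(x) = 1 - x^2 on [-1, 2]; v = 0 and v' = -3, so the left side is reversed.
lhs2, rhs2 = both_sides(math.cos, lambda x: 1 - x * x, lambda x: -2 * x, -1.0, 2.0)

print(lhs1, rhs1)
print(lhs2, rhs2)
```

In the first case the part of $I$ on which $\phi$ decreases cancels against an equal part on which it increases, which is the role played by the signs $\epsilon_n$ in part (b) of the proof.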

263X Basic exercises (a) Let $(X,\Sigma,\mu)$ be any measure space, $f \in \mathcal{L}^0(\mu)$ and $p \in [1,\infty[$. Show that $f \in \mathcal{L}^p(\mu)$ iff
\[ \gamma = p \int_0^\infty \alpha^{p-1} \mu^*\{x : x \in \operatorname{dom} f, |f(x)| > \alpha\}\,d\alpha \]
is finite, and in this case $\|f\|_p = \gamma^{1/p}$. (Hint: $\int |f|^p = \int_0^\infty \mu^*\{x : |f(x)|^p > \beta\}\,d\beta$, by 252O; now substitute $\beta = \alpha^p$.)

(b) Let $f$ be an integrable function defined almost everywhere in $\mathbb{R}^r$. Show that if $\alpha < r - 1$ then $\sum_{n=1}^\infty n^\alpha |f(nx)|$ is finite for almost every $x \in \mathbb{R}^r$. (Hint: estimate $\sum_{n=0}^\infty n^\alpha \int_B |f(nx)|\,dx$ for any ball $B$ centered at the origin.)

(c) Let $A \subseteq {]0,1[}$ be a set such that $\mu^*A = \mu^*([0,1]\setminus A) = 1$, where $\mu$ is Lebesgue measure on $\mathbb{R}$. Set $D = A \cup \{-x : x \in {]0,1[} \setminus A\} \subseteq [-1,1]$, and set $\phi(x) = |x|$ for $x \in D$. Show that $\phi$ is injective, that $\phi$ is differentiable relative to its domain everywhere in $D$, and that $\mu^*\phi[D] < \int_D |\phi'(x)|\,dx$.

(d) Let $\phi : D \to \mathbb{R}^r$ be a function differentiable relative to $D$ at each point of $D \subseteq \mathbb{R}^r$, and suppose that for each $x \in D$ there is a non-singular derivative $T(x)$ of $\phi$ at $x$; set $J(x) = |\det T(x)|$. Show that $D$ is expressible as $\bigcup_{k\in\mathbb{N}} D_k$ where $D_k = D \cap \overline{D_k}$ and $\phi{\restriction}D_k$ is injective for each $k$.
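> (e)(i) Show that for any Lebesgue measurable $E \subseteq \mathbb{R}$, $t \in \mathbb{R}\setminus\{0\}$, $\int_{tE} \frac{1}{|u|}\,du = \int_E \frac{1}{|u|}\,du$. (ii) For $t \in \mathbb{R}$, $u \in \mathbb{R}\setminus\{0\}$ set $\phi(t,u) = (\frac{t}{u}, u)$. Show that $\int_{\phi[E]} \frac{1}{|tu|}\,d(t,u) = \int_E \frac{1}{|tu|}\,d(t,u)$ for any Lebesgue measurable $E \subseteq \mathbb{R}^2$.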

> (f ) Define φ : R 3 → R 3 by setting


φ(ρ, θ, α) = (ρ sin θ sin α, ρ sin θ cos α, ρ cos θ).
Show that det φ′ (ρ, θ, α) = ρ2 sin θ.
(g) Show that if $k = 2l + 1$ is odd, then $\int_0^\infty x^k e^{-x^2/2}\,dx = 2^l l!$. (Compare 252Xi.)
263Y Further exercises (a) Define a measure $\nu$ on $\mathbb{R}$ by setting $\nu E = \int_E \frac{1}{|x|}\,dx$ for Lebesgue measurable sets $E \subseteq \mathbb{R}$. For $f$, $g \in \mathcal{L}^1(\nu)$ set $(f * g)(x) = \int f(\frac{x}{t})g(t)\,\nu(dt)$ whenever this is defined in $\mathbb{R}$. (i) Show that $f * g = g * f \in \mathcal{L}^1(\nu)$. (ii) Show that $\int h(x)(f * g)(x)\,\nu(dx) = \int h(xy)f(x)g(y)\,\nu(dx)\nu(dy)$ for every $h \in \mathcal{L}^\infty(\nu)$. (iii) Show that $f * (g * h) = (f * g) * h$ for every $h \in \mathcal{L}^1(\nu)$. (Hint: 263Xe.)

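(b) Let $E \subseteq \mathbb{R}^2$ be a measurable set such that $\limsup_{\alpha\to\infty} \frac{1}{\alpha^2}\mu_2(E \cap B(\mathbf{0},\alpha)) > 0$, writing $\mu_2$ for Lebesgue measure on $\mathbb{R}^2$. Show that there is some $\theta \in {]-\pi,\pi]}$ such that $\mu_1 E_\theta = \infty$, where $E_\theta = \{\rho : \rho \ge 0, (\rho\cos\theta, \rho\sin\theta) \in E\}$. (Hint: show that $\frac{1}{\alpha^2}\mu_2(E \cap B(\mathbf{0},\alpha)) \le \int_{-\pi}^{\pi} \min(\frac12, \frac1\alpha\mu_1 E_\theta)\,d\theta$.) Generalize to higher dimensions and to functions other than $\chi E$.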

(c) Let E ⊆ R r be a measurable set, and φ : E → R r a function differentiable relative to its domain, with a
derivative T (x), at each point x of E; set J(x) = | det T (x)|. Show that for any integrable function g defined on φ[E],
\[ \int g(y)\,\#(\phi^{-1}[\{y\}])\,dy = \int_E J(x)g(\phi(x))\,dx. \]
(Hint: 263I.)

(d) Find a proof of 263I based on the ideas of §225. (Hint: 225Xg.)

(e) Let $f : [a,b] \to \mathbb{R}$ be a function of bounded variation, where $a < b$ in $\mathbb{R}$, with Lebesgue decomposition $f = f_p + f_{cs} + f_{ac}$ as in 226Cd; let $\mu$ be Lebesgue measure on $\mathbb{R}$. Show that the following are equiveridical: (i) $f_{cs}$ is constant; (ii) $\mu f[\,[a,b]\,] \le \int_a^b |f'|\,d\mu$; (iii) $\mu^* f[A] \le \int_A |f'|\,d\mu$ for every $A \subseteq [a,b]$; (iv) $f[A]$ is negligible for every negligible set $A \subseteq [a,b]$. (Hint: for (iv)$\Rightarrow$(i) put 226Yd and 263D(ii) together to show that $|f(d) - f(c)| \le \int_c^d |f'|\,d\mu + \operatorname{Var}_{[c,d]} f_p$ whenever $a \le c \le d \le b$, and therefore that $\operatorname{Var}_{[a,b]} f \le \operatorname{Var}_{[a,b]} f_p + \operatorname{Var}_{[a,b]} f_{ac}$.)

263 Notes and comments Yet again, approaching 263D, I find myself having to choose between giving an accessible,
relatively weak result and making the extra effort to set out a theorem which is somewhere near the natural boundary
of what is achievable within the concepts being developed in this volume; and, as usual, I go for the more powerful
form. There are three basic sources of difficulty: (i) the fact that we are dealing with more than one dimension; (ii) the
fact that we are dealing with irregular domains; (iii) the fact that we are dealing with arbitrary integrable functions. I
do not think I need to apologise for (iii) in a book on measure theory. Concerning (ii), it is quite true that the principal
applications of these results are to cases in which the transformation φ is differentiable everywhere, with continuous
derivative, and the set D has negligible boundary; and in these cases there are substantial simplifications available –
mostly because the sets Dm of the proof of 263D can be taken to be cubes. Nevertheless, I think any form of the
result which makes such assumptions is deeply unsatisfactory at this level, being an awkward compromise between
ideas natural to the Riemann integral and those natural to the Lebesgue integral. Concerning (i), it might even have
been right to lay out the whole argument for the case r = 1 before proceeding to the general case, as I did in §§114-115,
because the one-dimensional case is already important and interesting; and if you find the work above difficult – which
it is – and your immediate interests are in one-dimensional integration by substitution, then I think you might find it
worth your time to reproduce the r = 1 argument yourself, up to a proof of 263I. In fact the biggest difference is in
263A, which becomes nearly trivial; the work of 262M and 263C becomes more readable, because all the matrices turn
into scalars and we can drop the word ‘determinant’, but I do not think we can dispense with any of the ideas, at least
if we wish to obtain 263D as stated. (But see 263Yd.)
I found myself insisting, in the last paragraph, that a distinction can be made between ‘ideas natural to the Riemann
integral and those natural to the Lebesgue integral’. We are approaching deep questions here, like ‘what are books on
measure theory for?’, which I do not think can be answered without some – possibly unconscious – reference to the
question ‘what is mathematics for?’. I do of course want to present here some of the wonderful general theorems which
arise in the Lebesgue theory. But more important than any specific theorem is a general idea of what can be proved
by these methods. It is the essence of modern measure theory that continuity does not matter, or, if you prefer, that
measurable functions are in some sense so nearly continuous that we do not have to add hypotheses of continuity in our
theorems. Now this is in a sense a great liberation, and the Lebesgue integral is now the standard one. But you must
not regard the Riemann integral as outdated. The intuitions on which it is founded – for instance, that the surface
of a solid body has zero volume – remain of great value in their proper context, which certainly includes the study of
differentiable functions with continuous derivatives. What I am saying here is that I believe we can use these intuitions
best if we maintain a division, a flexible and permeable one, of course, between the ideas of the two theories; and that
when transferring a theorem from one side of the boundary to the other we should do so whole-heartedly, seeking to
express the full power of the methods we are using.
I have already said that the essential difference between the one-dimensional and multi-dimensional cases lies in
263A, where the Jacobian J = | det T | enters the argument. Shorn of the technical devices necessary to deal with
arbitrary Lebesgue measurable sets, this amounts to a calculation of the volume of the parallelepiped T [I] where I is
the interval [0, 1[. I have dealt with this by a little bit of algebra, saying that the result is essentially obvious if T is
diagonal, whereas if T is an isometry it follows from the fact that the unit ball is left invariant; and the algebra comes
in to express an arbitrary matrix as a product of diagonal and orthogonal matrices (2A6C). It is also plain from 261F
that Lebesgue measure must be rotation-invariant as well as translation-invariant; that is to say, it is invariant under
all isometries. Another way of looking at this will appear in the next section.
I feel myself that the centre of the argument for 263D is in the lemma 263C. This is where we turn the exact result
for linear operators into an approximate result for almost-linear functions; and the whole point of differentiability is
that a differentiable function is well approximated, in a neighbourhood of any point of its domain, by a linear operator.
The lemma involves two rather different ideas. To show that $\mu^*\phi[D] \le (J + \epsilon)\mu^*D$, we look first at balls and then use Vitali's theorem to see that $D$ is economically covered by balls, so that an upper bound for $\mu^*\phi[D]$ in terms of a sum $\sum_{B\in\mathcal{I}_0} \mu^*\phi[D \cap B]$ is adequate. To obtain a lower bound, we need to reverse the argument by looking at $\psi = \phi^{-1}$, which involves checking first that $\phi$ is invertible, and then that $\psi$ is appropriately linked to $T^{-1}$. I have written out exact formulae for $\epsilon'$, $\zeta_2'$ and so on, but this is only in case you do not trust your intuition; the fact that $\|\phi^{-1}(u) - \phi^{-1}(v) - T^{-1}(u - v)\|$ is small compared with $\|u - v\|$ is pretty clearly a consequence of the hypothesis that $\|\phi(x) - \phi(y) - T(x - y)\|$ is small compared with $\|x - y\|$.
The argument of 263D itself is now a matter of breaking the set D up into appropriate pieces on each of which φ is
sufficiently nearly linear for 263C to apply, so that
\[ \mu^*\phi[D] \le \sum_{m=0}^\infty \mu^*\phi[D_m] \le \sum_{m=0}^\infty (J_m + \epsilon)\mu^*D_m. \]
With a little care (taken in 263C, with its condition (i)), we can also ensure that the Jacobian $J$ is well approximated by $J_m$ almost everywhere in $D_m$, so that $\sum_{m=0}^\infty J_m\mu^*D_m \approx \int_D J(x)\,dx$.
These ideas, joined with the results of §262, bring us to the point
\[ \int_E J\,d\mu = \mu\phi[E] \]
when $\phi$ is injective and $E \subseteq D$ is measurable. We need a final trick, involving Borel sets, to translate this into
\[ \int_{\phi^{-1}[F]} J\,d\mu = \mu F \]
whenever $F \subseteq \phi[D]$ is measurable, which is what is needed for the application of 235J.
I hope that you long ago saw, and were delighted by, the device in 263G. Once again, this is not really Lebesgue
integration; but I include it just to show that the machinery of this chapter can be turned to deal with the classical
results, and that indeed we have a tiny profit from our labour, in that no apology need be made for the boundary of
the set D into which the polar coordinate system maps the plane. I have already given the actual result as an exercise
in 252Xi. That involved (if you chase through the references) a one-dimensional substitution (performed in 225Xj),
Fubini’s theorem and an application of the formulae of §235; that is to say, very much the same elements as those used
above, though in a different order. I could present this with no mention of differentiation in higher dimensions because
the first change of variable was in one dimension, and the second (involving the function x ↦ ‖x‖, in 252Xi(i)) was of
a particularly simple type, so that a different method could be used to find the function J.
The abstract ideas to which this treatise is devoted do not, indeed, lead us to many particular examples on which to
practise the ideas of this section. The ones which do arise tend to be very straightforward, as in 263G, 263Xa-263Xb
and 263Xe. I mention the last because it provides a formula needed to discuss a new type of convolution (263Ya).
In effect, this depends on the multiplicative group $\mathbb{R}\setminus\{0\}$ in place of the additive group $\mathbb{R}$ treated in §255. The formula $\frac1x$ in the definition of $\nu$ is of course the derivative of $\ln x$, and $\ln$ is an isomorphism between $(]0,\infty[, \cdot, \nu)$ and $(\mathbb{R}, +, \text{Lebesgue measure})$.
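
The scale invariance behind this remark can be observed numerically. The sketch below is an illustration only; the function name `nu_interval`, the midpoint rule and the sample intervals are assumptions made for the example. It approximates $\nu[a,b] = \int_a^b \frac1x\,dx$ for $0 < a < b$ and compares $\nu[ta,tb]$ with $\nu[a,b] = \ln b - \ln a$.

```python
import math

def nu_interval(a, b, n=20000):
    """Midpoint-rule approximation to nu([a, b]), the integral of 1/x over [a, b],
    assuming 0 < a < b."""
    h = (b - a) / n
    return h * sum(1.0 / (a + (i + 0.5) * h) for i in range(n))

a, b, t = 1.0, 3.0, 5.0
original = nu_interval(a, b)          # nu([1, 3])
scaled = nu_interval(t * a, t * b)    # nu([5, 15]), the image under x -> t x
lebesgue_of_log_image = math.log(b) - math.log(a)  # length of ln[[a, b]]

print(original, scaled, lebesgue_of_log_image)
```

All three numbers agree up to discretization error, as 263Xe(i) and the isomorphism property of $\ln$ lead one to expect.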

264 Hausdorff measures


The next topic I wish to approach is the question of ‘surface measure’; a useful example to bear in mind throughout
this section and the next is the notion of area for regions on the sphere, but any other smoothly curved two-dimensional
surface in three-dimensional space will serve equally well. It is I think more than plausible that our intuitive concepts
of ‘area’ for such surfaces should correspond to appropriate measures. But formalizing this intuition is non-trivial,
especially if we seek the generality that simple geometric ideas lead us to; I mean, not contenting ourselves with
arguments that depend on the special nature of the sphere, for instance, to describe spherical surface area. I divide
the problem into two parts. In this section I will describe a construction which enables us to define the r-dimensional
measure of an r-dimensional surface – among other things – in s-dimensional space. In the next section I will set out
the basic theorems making it possible to calculate these measures effectively in the leading cases.

264A Definitions Let s ≥ 1 be an integer, and r > 0. (I am primarily concerned with integral r, but will not insist
on this until it becomes necessary, since there are some very interesting ideas which involve non-integral ‘dimension’
r.) For any A ⊆ R s , δ > 0 set

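\[ \theta_{r\delta}A = \inf\Bigl\{\sum_{n=0}^\infty (\operatorname{diam} A_n)^r : \langle A_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of subsets of } \mathbb{R}^s \text{ covering } A,\ \operatorname{diam} A_n \le \delta \text{ for every } n \in \mathbb{N}\Bigr\}. \]
It is convenient in this context to say that $\operatorname{diam}\emptyset = 0$. Now set
\[ \theta_r A = \sup_{\delta>0} \theta_{r\delta}A; \]
$\theta_r$ is r-dimensional Hausdorff outer measure on $\mathbb{R}^s$.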
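
The infimum in the definition cannot be computed by examining one cover, but a single explicit cover does give an upper bound for $\theta_{r\delta}$, and watching how that bound behaves as the mesh shrinks shows why the exponent $r$ has to match the dimension. The sketch below uses an assumed example, the grid of $n^s$ closed cubes of side $1/n$ covering $[0,1]^s$; the function name `grid_cover_sum` is hypothetical.

```python
import math

def grid_cover_sum(s, r, n):
    """Sum of (diam)^r over the n**s closed cubes of side 1/n covering [0,1]^s.

    Each cube has Euclidean diameter sqrt(s)/n, so the value is an upper bound
    for theta_{r,delta}([0,1]^s) with delta = sqrt(s)/n, not the infimum itself.
    """
    diam = math.sqrt(s) / n
    return (n ** s) * diam ** r

# Unit square (s = 2) with exponents below, at and above the dimension.
for r in (1, 2, 3):
    print(r, [grid_cover_sum(2, r, n) for n in (1, 10, 100)])
```

For $r = 1$ the sums grow without bound as $n$ increases, for $r = 3$ they tend to $0$, and for $r = 2$ they stay at $2$. The last figure is only an upper bound: by 264I the actual value of $\mu_{H2}$ on the unit square is $4/\pi$.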

264B Of course we must immediately check the following:


Lemma θr , as defined in 264A, is always an outer measure.
proof You should be used to these arguments by now, but there is an extra step in this one, so I spell out the details.
(a) Interpreting the diameter of the empty set as 0, we have θrδ ∅ = 0 for every δ > 0, so θr ∅ = 0.
(b) If A ⊆ B ⊆ R s , then every sequence covering B also covers A, so θrδ A ≤ θrδ B for every δ and θr A ≤ θr B.
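(c) Let $\langle A_n\rangle_{n\in\mathbb{N}}$ be a sequence of subsets of $\mathbb{R}^s$ with union $A$, and take any $a < \theta_r A$. Then there is a $\delta > 0$ such that $a \le \theta_{r\delta}A$. Now $\theta_{r\delta}A \le \sum_{n=0}^\infty \theta_{r\delta}(A_n)$. P P Let $\epsilon > 0$, and for each $n \in \mathbb{N}$ choose a sequence $\langle A_{nm}\rangle_{m\in\mathbb{N}}$ of sets, covering $A_n$, with $\operatorname{diam} A_{nm} \le \delta$ for every $m$ and $\sum_{m=0}^\infty (\operatorname{diam} A_{nm})^r \le \theta_{r\delta}A_n + 2^{-n}\epsilon$. Then $\langle A_{nm}\rangle_{m,n\in\mathbb{N}}$ is a cover of $A$ by countably many sets of diameter at most $\delta$, so
\[ \theta_{r\delta}A \le \sum_{n=0}^\infty \sum_{m=0}^\infty (\operatorname{diam} A_{nm})^r \le \sum_{n=0}^\infty (\theta_{r\delta}A_n + 2^{-n}\epsilon) = 2\epsilon + \sum_{n=0}^\infty \theta_{r\delta}A_n. \]
As $\epsilon$ is arbitrary, we have the result. Q Q
Accordingly
\[ a \le \theta_{r\delta}A \le \sum_{n=0}^\infty \theta_{r\delta}A_n \le \sum_{n=0}^\infty \theta_r A_n. \]
As $a$ is arbitrary,
\[ \theta_r A \le \sum_{n=0}^\infty \theta_r A_n; \]
as $\langle A_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\theta_r$ is an outer measure.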

264C Definition If s ≥ 1 is an integer, and r > 0, then Hausdorff r-dimensional measure on R s is the measure
µHr on R s defined by Carathéodory’s method from the outer measure θr of 264A-264B.

264D Remarks (a) It is important to note that the sets used in the definition of the θrδ need not be balls; even
in R 2 not every set A can be covered by a ball of the same diameter as A.
(b) In the definitions above I require r > 0. It is sometimes appropriate to take µH0 to be counting measure.
This is nearly the result of applying the formulae above with r = 0, but there can be difficulties if we interpret them
over-literally.

(c) All Hausdorff measures must be complete, because they are defined by Carathéodory’s method (212A). For r > 0,
they are atomless (264Yg). In terms of the other criteria of §211, however, they are very ill-behaved; for instance, if
r, s are integers and 1 ≤ r < s, then µHr on R s is not semi-finite. (I will give a proof of this in 439H in Volume 4.)
Nevertheless, they do have some striking properties which make them reasonably tractable.

(d) In 264A, note that $\theta_{r\delta}A \le \theta_{r\delta'}A$ when $0 < \delta' \le \delta$; consequently, for instance, $\theta_r A = \lim_{n\to\infty} \theta_{r,2^{-n}}A$. I have allowed arbitrary sets $A_n$ in the covers, but it makes no difference if we restrict our attention to covers consisting of open sets or of closed sets (264Xc).

264E Theorem Let s ≥ 1 be an integer, and r ≥ 0; let µHr be Hausdorff r-dimensional measure on R s , and ΣHr
its domain. Then every Borel subset of R s belongs to ΣHr .

proof This is trivial if r = 0; so suppose henceforth that r > 0.

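(a) The first step is to note that if $A$, $B$ are subsets of $\mathbb{R}^s$ and $\eta > 0$ is such that $\|x - y\| \ge \eta$ for all $x \in A$, $y \in B$, then $\theta_r(A \cup B) = \theta_r A + \theta_r B$, where $\theta_r$ is r-dimensional Hausdorff outer measure on $\mathbb{R}^s$. P P Of course $\theta_r(A \cup B) \le \theta_r A + \theta_r B$, because $\theta_r$ is an outer measure. For the reverse inequality, we may suppose that $\theta_r(A \cup B) < \infty$, so that $\theta_r A$ and $\theta_r B$ are both finite. Let $\epsilon > 0$ and let $\delta_1$, $\delta_2 > 0$ be such that
\[ \theta_r A + \theta_r B \le \theta_{r\delta_1}A + \theta_{r\delta_2}B + \epsilon. \]
Set $\delta = \min(\delta_1, \delta_2, \frac12\eta) > 0$ and let $\langle A_n\rangle_{n\in\mathbb{N}}$ be a sequence of sets of diameter at most $\delta$, covering $A \cup B$, and such that $\sum_{n=0}^\infty (\operatorname{diam} A_n)^r \le \theta_{r\delta}(A \cup B) + \epsilon$. Set
\[ K = \{n : A_n \cap A \ne \emptyset\}, \quad L = \{n : A_n \cap B \ne \emptyset\}. \]
Because
\[ \|x - y\| \ge \eta > \operatorname{diam} A_n \]
whenever $x \in A$, $y \in B$ and $n \in \mathbb{N}$, $K \cap L = \emptyset$; and of course $A \subseteq \bigcup_{n\in K} A_n$, $B \subseteq \bigcup_{n\in L} A_n$. Consequently
\begin{align*}
\theta_r A + \theta_r B &\le \epsilon + \theta_{r\delta_1}A + \theta_{r\delta_2}B \\
&\le \epsilon + \sum_{n\in K} (\operatorname{diam} A_n)^r + \sum_{n\in L} (\operatorname{diam} A_n)^r \\
&\le \epsilon + \sum_{n=0}^\infty (\operatorname{diam} A_n)^r \le 2\epsilon + \theta_{r\delta}(A \cup B) \le 2\epsilon + \theta_r(A \cup B).
\end{align*}
As $\epsilon$ is arbitrary, $\theta_r(A \cup B) \ge \theta_r A + \theta_r B$, as required. Q Q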

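(b) It follows that $\theta_r A = \theta_r(A \cap G) + \theta_r(A \setminus G)$ whenever $A \subseteq \mathbb{R}^s$ and $G$ is open. P P As usual, it is enough to consider the case $\theta_r A < \infty$ and to show that in this case $\theta_r(A \cap G) + \theta_r(A \setminus G) \le \theta_r A$. Set
\[ A_n = \{x : x \in A, \|x - y\| \ge 2^{-n} \text{ for every } y \in A \setminus G\}, \]
\[ B_0 = A_0, \quad B_n = A_n \setminus A_{n-1} \text{ for } n \ge 1. \]
Observe that $A_n \subseteq A_{n+1}$ for every $n$ and $\bigcup_{n\in\mathbb{N}} A_n = \bigcup_{n\in\mathbb{N}} B_n = A \cap G$. The point is that if $m$, $n \in \mathbb{N}$ and $n \ge m + 2$, and if $x \in B_m$ and $y \in B_n$, then there is a $z \in A \setminus G$ such that $\|y - z\| < 2^{-n+1} \le 2^{-m-1}$, while $\|x - z\|$ must be at least $2^{-m}$, so $\|x - y\| \ge \|x - z\| - \|y - z\| \ge 2^{-m-1}$. It follows that for any $k \ge 0$
\[ \sum_{m=0}^k \theta_r B_{2m} = \theta_r\Bigl(\bigcup_{m\le k} B_{2m}\Bigr) \le \theta_r(A \cap G) < \infty, \]
\[ \sum_{m=0}^k \theta_r B_{2m+1} = \theta_r\Bigl(\bigcup_{m\le k} B_{2m+1}\Bigr) \le \theta_r(A \cap G) < \infty, \]
(inducing on $k$, using (a) above for the inductive step). Consequently $\sum_{n=0}^\infty \theta_r B_n < \infty$.
But now, given $\epsilon > 0$, there is an $m$ such that $\sum_{n=m}^\infty \theta_r B_n \le \epsilon$, so that
\begin{align*}
\theta_r(A \cap G) + \theta_r(A \setminus G) &\le \theta_r A_m + \sum_{n=m}^\infty \theta_r B_n + \theta_r(A \setminus G) \\
&\le \epsilon + \theta_r A_m + \theta_r(A \setminus G) = \epsilon + \theta_r(A_m \cup (A \setminus G)) \\
&\qquad\text{(by (a) again, since $\|x - y\| \ge 2^{-m}$ for $x \in A_m$, $y \in A \setminus G$)} \\
&\le \epsilon + \theta_r A.
\end{align*}
As $\epsilon$ is arbitrary, $\theta_r(A \cap G) + \theta_r(A \setminus G) \le \theta_r A$, as required. Q Q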
(c) Part (b) shows exactly that open sets belong to ΣHr . It follows at once that the Borel σ-algebra of R s is included
in ΣHr , as claimed.

264F Proposition Let s ≥ 1 be an integer, and r > 0; let θr be r-dimensional Hausdorff outer measure on R s , and
write µHr for r-dimensional Hausdorff measure on R s , ΣHr for its domain. Then
(a) for every A ⊆ R s there is a Borel set E ⊇ A such that µHr E = θr A;
(b) θr = µ∗Hr , the outer measure defined from µHr ;
(c) if E ∈ ΣHr is expressible as a countable union of sets of finite measure, there are Borel sets E ′ , E ′′ such that
E ′ ⊆ E ⊆ E ′′ and µHr (E ′′ \ E ′ ) = 0.
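proof (a) If $\theta_r A = \infty$ this is trivial – take $E = \mathbb{R}^s$. Otherwise, for each $n \in \mathbb{N}$ choose a sequence $\langle A_{nm}\rangle_{m\in\mathbb{N}}$ of sets of diameter at most $2^{-n}$, covering $A$, and such that $\sum_{m=0}^\infty (\operatorname{diam} A_{nm})^r \le \theta_{r,2^{-n}}A + 2^{-n}$. Set $F_{nm} = \overline{A_{nm}}$, $E = \bigcap_{n\in\mathbb{N}} \bigcup_{m\in\mathbb{N}} F_{nm}$; then $E$ is a Borel set in $\mathbb{R}^s$. Of course
\[ A \subseteq \bigcap_{n\in\mathbb{N}} \bigcup_{m\in\mathbb{N}} A_{nm} \subseteq \bigcap_{n\in\mathbb{N}} \bigcup_{m\in\mathbb{N}} F_{nm} = E. \]
For any $n \in \mathbb{N}$,
\[ \operatorname{diam} F_{nm} = \operatorname{diam} A_{nm} \le 2^{-n} \text{ for every } m \in \mathbb{N}, \]
\[ \sum_{m=0}^\infty (\operatorname{diam} F_{nm})^r = \sum_{m=0}^\infty (\operatorname{diam} A_{nm})^r \le \theta_{r,2^{-n}}A + 2^{-n}, \]
so
\[ \theta_{r,2^{-n}}E \le \theta_{r,2^{-n}}A + 2^{-n}. \]
Letting $n \to \infty$,
\[ \theta_r E = \lim_{n\to\infty} \theta_{r,2^{-n}}E \le \lim_{n\to\infty} (\theta_{r,2^{-n}}A + 2^{-n}) = \theta_r A; \]
of course it follows that $\theta_r A = \theta_r E$, because $A \subseteq E$. Now by 264E we know that $E \in \Sigma_{Hr}$, so we can write $\mu_{Hr}E$ in place of $\theta_r E$.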
(b) This follows at once, because we have
µ∗Hr A = inf{µHr E : E ∈ ΣHr , A ⊆ E} = inf{θr E : E ∈ ΣHr , A ⊆ E} ≥ θr A
for every A ⊆ R s . On the other hand, if A ⊆ R s , we have a Borel set E ⊇ A such that θr A = µHr E, so that
µ∗Hr A ≤ µHr E = θr A.
(c)(i) Suppose first that µHr E < ∞. By (a), there are Borel sets E ′′ ⊇ E, H ⊇ E ′′ \ E such that µHr E ′′ = θr E,
µHr H = θr (E ′′ \ E) = µHr (E ′′ \ E) = µHr E ′′ − µHr E = µHr E ′′ − θr E = 0.
So setting E ′ = E ′′ \ H, we obtain a Borel set included in E, and
µHr (E ′′ \ E ′ ) ≤ µHr H = 0.
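(ii) For the general case, express $E$ as $\bigcup_{n\in\mathbb{N}} E_n$ where $\mu_{Hr}E_n < \infty$ for each $n$; take Borel sets $E_n'$, $E_n''$ such that $E_n' \subseteq E_n \subseteq E_n''$ and $\mu_{Hr}(E_n'' \setminus E_n') = 0$ for each $n$; and set $E' = \bigcup_{n\in\mathbb{N}} E_n'$, $E'' = \bigcup_{n\in\mathbb{N}} E_n''$.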

264G Lipschitz functions The definition of Hausdorff measure is exactly adapted to the following result, corresponding to 262D.
Proposition Let $m$, $s \ge 1$ be integers, and $\phi : D \to \mathbb{R}^s$ a $\gamma$-Lipschitz function, where $D$ is a subset of $\mathbb{R}^m$. Then for any $r \ge 0$,
\[ \mu^*_{Hr}(\phi[A]) \le \gamma^r \mu^*_{Hr}A \]
for every $A \subseteq D$, writing $\mu^*_{Hr}$ for r-dimensional Hausdorff outer measure on either $\mathbb{R}^m$ or $\mathbb{R}^s$.
proof (a) The case r = 0 is trivial, since then γ r = 1 and µ∗Hr A = µH0 A = #(A) if A is finite, ∞ otherwise, while
#(φ[A]) ≤ #(A).
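(b) If $r > 0$, then take any $\delta > 0$. Set $\eta = \delta/(1 + \gamma)$ and consider $\theta_{r\eta} : \mathcal{P}\mathbb{R}^m \to [0,\infty]$, defined as in 264A. We know from 264Fb that
\[ \mu^*_{Hr}A = \theta_r A \ge \theta_{r\eta}A, \]
so there is a sequence $\langle A_n\rangle_{n\in\mathbb{N}}$ of sets, all of diameter at most $\eta$, covering $A$, with $\sum_{n=0}^\infty (\operatorname{diam} A_n)^r \le \mu^*_{Hr}A + \delta$. Now $\phi[A] \subseteq \bigcup_{n\in\mathbb{N}} \phi[A_n \cap D]$ and
\[ \operatorname{diam} \phi[A_n \cap D] \le \gamma \operatorname{diam} A_n \le \gamma\eta \le \delta \]
for every $n$. Consequently
\[ \theta_{r\delta}(\phi[A]) \le \sum_{n=0}^\infty (\operatorname{diam} \phi[A_n \cap D])^r \le \sum_{n=0}^\infty \gamma^r (\operatorname{diam} A_n)^r \le \gamma^r(\mu^*_{Hr}A + \delta), \]
and
\[ \mu^*_{Hr}(\phi[A]) = \lim_{\delta\downarrow 0} \theta_{r\delta}(\phi[A]) \le \gamma^r \mu^*_{Hr}A, \]
as claimed.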

264H The next step is to relate r-dimensional Hausdorff measure on R r to Lebesgue measure on R r . The basic
fact we need is the following, which is even more important for the idea in its proof than for the result.
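Theorem Let $r \ge 1$ be an integer, and $A$ a bounded subset of $\mathbb{R}^r$; write $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$ and $d = \operatorname{diam} A$. Then
\[ \mu_r^*(A) \le \mu_r B(\mathbf{0}, \tfrac{d}{2}) = 2^{-r}\beta_r d^r, \]
where $B(\mathbf{0}, \frac{d}{2})$ is the ball with centre $\mathbf{0}$ and diameter $d$, so that $B(\mathbf{0}, 1)$ is the unit ball in $\mathbb{R}^r$, and has measure
\begin{align*}
\beta_r &= \frac{1}{k!}\pi^k \text{ if } r = 2k \text{ is even}, \\
&= \frac{2^{2k+1}k!}{(2k+1)!}\pi^k \text{ if } r = 2k+1 \text{ is odd}.
\end{align*}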

proof (a) For the calculation of βr , see 252Q or 252Xi.


(b) The case r = 1 is elementary, for in this case A is included in an interval of length diam A, so that µ∗1 A ≤ diam A.
So henceforth let us suppose that r ≥ 2.
(c) For $1 \le i \le r$ let $S_i : \mathbb{R}^r \to \mathbb{R}^r$ be reflection in the $i$th coordinate, so that $S_i x = (\xi_1, \ldots, \xi_{i-1}, -\xi_i, \xi_{i+1}, \ldots, \xi_r)$ for every $x = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r$. Let us say that a set $C \subseteq \mathbb{R}^r$ is symmetric in coordinates in $J$, where $J \subseteq \{1, \ldots, r\}$, if $S_i[C] = C$ for $i \in J$. Now the centre of the argument is the following fact: if $C \subseteq \mathbb{R}^r$ is a bounded set which is symmetric in coordinates in $J$, where $J$ is a proper subset of $\{1, \ldots, r\}$, and $j \in \{1, \ldots, r\} \setminus J$, then there is a set $D$, symmetric in coordinates in $J \cup \{j\}$, such that $\operatorname{diam} D \le \operatorname{diam} C$ and $\mu_r^*C \le \mu_r^*D$.
P P (i) Because Lebesgue measure is invariant under permutation of coordinates, it is enough to deal with the case $j = r$. Start by writing $F = \overline{C}$, so that $\operatorname{diam} F = \operatorname{diam} C$ and $\mu_r F \ge \mu_r^*C$. Note that because $S_i$ is a homeomorphism for every $i$,
\[ S_i[F] = S_i[\overline{C}] = \overline{S_i[C]} = \overline{C} = F \]
for $i \in J$, and $F$ is symmetric in coordinates in $J$.
For $y = (\eta_1, \ldots, \eta_{r-1}) \in \mathbb{R}^{r-1}$, set
\[ F_y = \{\xi : (\eta_1, \ldots, \eta_{r-1}, \xi) \in F\}, \quad f(y) = \mu_1 F_y, \]
where $\mu_1$ is Lebesgue measure on $\mathbb{R}$. Set
\[ D = \{(y, \xi) : y \in \mathbb{R}^{r-1}, |\xi| < \tfrac12 f(y)\} \subseteq \mathbb{R}^r. \]
2

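(ii) If $H \subseteq \mathbb{R}^r$ is measurable and $H \supseteq D$, then, writing $\mu_{r-1}$ for Lebesgue measure on $\mathbb{R}^{r-1}$, we have
\begin{align*}
\mu_r H &= \int \mu_1\{\xi : (y, \xi) \in H\}\,\mu_{r-1}(dy) \\
&\qquad\text{(using 251N and 252D)} \\
&\ge \int \mu_1\{\xi : (y, \xi) \in D\}\,\mu_{r-1}(dy) = \int f(y)\,\mu_{r-1}(dy) \\
&= \int \mu_1\{\xi : (y, \xi) \in F\}\,\mu_{r-1}(dy) = \mu_r F \ge \mu_r^*C.
\end{align*}
As $H$ is arbitrary, $\mu_r^*D \ge \mu_r^*C$.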


(iii) The next step is to check that diam D ≤ diam C. If x, x′ ∈ D, express them as (y, ξr ) and (y ′ , ξr′ ). Because
F is a bounded closed set in R r , Fy and Fy′ are bounded closed subsets of R. Also both f (y) and f (y ′ ) must be greater
than 0, so that Fy , Fy′ are both non-empty. Consequently
α = inf Fy , β = sup Fy , α′ = inf Fy′ , β ′ = sup Fy′
are all defined in R, and α, β ∈ Fy , while α′ and β ′ belong to Fy′ . We have

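$$\begin{aligned}
|\xi_r - \xi_r'| &\le |\xi_r| + |\xi_r'| < \tfrac{1}{2} f(y) + \tfrac{1}{2} f(y')\\
&= \tfrac{1}{2}(\mu_1 F_y + \mu_1 F_{y'}) \le \tfrac{1}{2}(\beta - \alpha + \beta' - \alpha') \le \max(\beta' - \alpha, \beta - \alpha').
\end{aligned}$$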

So taking (ξ, ξ ′ ) to be one of (α, β ′ ) or (β, α′ ), we can find ξ ∈ Fy , ξ ′ ∈ Fy′ such that |ξ − ξ ′ | ≥ |ξr − ξr′ |. Now z = (y, ξ),
z ′ = (y ′ , ξ ′ ) both belong to F , so
$$\|x - x'\|^2 = \|y - y'\|^2 + |\xi_r - \xi_r'|^2 \le \|y - y'\|^2 + |\xi - \xi'|^2 = \|z - z'\|^2 \le (\operatorname{diam} F)^2,$$
and ‖x − x′‖ ≤ diam F . As x and x′ are arbitrary, diam D ≤ diam F = diam C, as claimed.
(iv) Evidently Sr [D] = D. Moreover, if i ∈ J, then (interpreting Si as an operator on R r−1 )
$$F_{S_i(y)} = F_y \text{ for every } y \in \mathbb{R}^{r-1},$$
so f (Si (y)) = f (y) and, for ξ ∈ R, y ∈ R r−1 ,
$$(y, \xi) \in D \iff |\xi| < \tfrac{1}{2} f(y) \iff |\xi| < \tfrac{1}{2} f(S_i(y)) \iff (S_i(y), \xi) \in D,$$
so that Si [D] = D. Thus D is symmetric in coordinates in J ∪ {r}. Q
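
The symmetrization step in (c) can be tried out on a toy example. The following Python sketch is an illustration only, under the assumption that F is a finite union of vertical closed segments in R 2 , so that each section Fy is a finite union of disjoint closed intervals; the names `slices`, `diameter`, `total_length` and `symmetrize` are hypothetical and the data are invented. It replaces each section by the centred interval of the same length, as in (c-i), and compares diameters, as in (c-iii).

```python
from itertools import combinations
from math import hypot

# Hypothetical test data: F is a closed subset of R^2 made of vertical segments.
# slices[y] lists the disjoint closed intervals [lo, hi] making up the section F_y.
slices = {
    0.0: [(0.0, 1.0), (2.5, 3.0)],
    0.5: [(-1.0, 0.5)],
    2.0: [(1.0, 4.0)],
}

def diameter(sections):
    """Diameter of a union of vertical segments {y} x [lo, hi].

    The distance between points of two segments is maximised at endpoints,
    so it is enough to compare endpoints pairwise.
    """
    pts = [(y, e) for y, ivs in sections.items() for iv in ivs for e in iv]
    return max((hypot(p[0] - q[0], p[1] - q[1]) for p, q in combinations(pts, 2)),
               default=0.0)

def total_length(sections):
    """Sum over y of f(y) = mu_1 F_y."""
    return sum(hi - lo for ivs in sections.values() for lo, hi in ivs)

def symmetrize(sections):
    """Replace each section F_y by the interval of length f(y) centred at 0.

    The proof uses the open interval |xi| < f(y)/2; only endpoints are stored
    here, which changes neither the diameter nor the length.
    """
    out = {}
    for y, ivs in sections.items():
        f = sum(hi - lo for lo, hi in ivs)
        out[y] = [(-f / 2, f / 2)]
    return out

D = symmetrize(slices)
print(diameter(slices), diameter(D))          # the diameter does not increase
print(total_length(slices), total_length(D))  # section lengths are preserved
```

In this example the set has Lebesgue measure zero in R 2 , so only the diameter comparison is being illustrated, not the comparison of outer measures.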
(d) The rest is easy. Starting from any bounded A ⊆ R r , set A0 = A and construct inductively A1 , . . . , Ar such
that
d = diam A = diam A0 ≥ diam A1 ≥ . . . ≥ diam Ar ,

µ∗r A = µ∗r A0 ≤ . . . ≤ µ∗r Ar ,

Aj is symmetric in coordinates in {1, . . . , j} for every j ≤ r.


At the end, we have Ar symmetric in coordinates in {1, . . . , r}. But this means that if x ∈ Ar then
$$-x = S_1 S_2 \ldots S_r x \in A_r,$$
so that
$$\|x\| = \tfrac{1}{2}\|x - (-x)\| \le \tfrac{1}{2}\operatorname{diam} A_r \le \tfrac{d}{2}.$$
Thus $A_r \subseteq B(0, \tfrac{d}{2})$, and
$$\mu_r^* A \le \mu_r^* A_r \le \mu_r B(0, \tfrac{d}{2}),$$
as claimed.
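
As a numerical sanity check on the statement just proved, here is a minimal Python sketch comparing the measure of two familiar sets with the measure of a ball of the same diameter. It assumes the formula βr = π^(r/2)/Γ(1 + r/2) (the one referred to in 252Xi); the function names are hypothetical.

```python
from math import pi, gamma, sqrt

def ball_volume(r, radius=1.0):
    """Lebesgue measure of a ball in R^r, assuming beta_r = pi^(r/2) / Gamma(1 + r/2)."""
    return pi ** (r / 2) / gamma(1 + r / 2) * radius ** r

def isodiametric_bound(r, diam):
    """Right-hand side of 264H: the measure of a ball of diameter diam in R^r."""
    return ball_volume(r, diam / 2)

# Unit cube [0,1]^r: measure 1, diameter sqrt(r).
cube_ratios = [1.0 / isodiametric_bound(r, sqrt(r)) for r in range(1, 9)]

# Equilateral triangle of side 1 in R^2: area sqrt(3)/4, diameter 1.
triangle_ratio = (sqrt(3) / 4) / isodiametric_bound(2, 1.0)

print(cube_ratios)      # every ratio should be at most 1 (equal to 1 when r = 1)
print(triangle_ratio)
```

The ratio is 1 for r = 1, in line with part (b) of the proof, and falls away quickly as r grows.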

264I Theorem Let r ≥ 1 be an integer; let µ be Lebesgue measure on R r , and let µHr be r-dimensional Hausdorff
measure on R r . Then µ and µHr have the same measurable sets and
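$$\mu E = 2^{-r}\beta_r\,\mu_{Hr} E$$
for every measurable set E ⊆ R r , where βr = µB(0, 1), so that the normalizing factor is
$$\begin{aligned}
2^{-r}\beta_r &= \frac{1}{2^{2k} k!}\,\pi^k \text{ if } r = 2k \text{ is even},\\
&= \frac{k!}{(2k+1)!}\,\pi^k \text{ if } r = 2k + 1 \text{ is odd}.
\end{aligned}$$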
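
The two closed forms for the normalizing factor can be checked against each other numerically. This is a sketch only, assuming the general expression βr = π^(r/2)/Γ(1 + r/2) mentioned in 252Xi; the function names `factor_gamma` and `factor_closed` are hypothetical.

```python
from math import pi, gamma, factorial

def factor_gamma(r):
    """2^(-r) * beta_r computed from beta_r = pi^(r/2) / Gamma(1 + r/2)."""
    return 2.0 ** (-r) * pi ** (r / 2) / gamma(1 + r / 2)

def factor_closed(r):
    """2^(-r) * beta_r from the even/odd formulas in the statement of 264I."""
    k, odd = divmod(r, 2)
    if odd:
        return factorial(k) * pi ** k / factorial(2 * k + 1)
    return pi ** k / (2 ** (2 * k) * factorial(k))

for r in range(1, 7):
    print(r, factor_closed(r), factor_gamma(r))
```

For r = 1, 2, 3 the factors are 1, π/4 and π/6, the measures of balls of diameter 1.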

proof (a) Of course if B = B(x, α) is any ball of radius α,


$$2^{-r}\beta_r(\operatorname{diam} B)^r = \beta_r\alpha^r = \mu B.$$

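(b) The point is that $\mu^* = 2^{-r}\beta_r\,\mu^*_{Hr}$. P Let $A \subseteq \mathbb{R}^r$.

(i) Let δ, ε > 0. By 261F, there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}}$ of balls, all of diameter at most δ, such that $A \subseteq \bigcup_{n\in\mathbb{N}} B_n$ and $\sum_{n=0}^{\infty}\mu B_n \le \mu^* A + \epsilon$. Now, defining $\theta_{r\delta}$ as in 264A,
$$2^{-r}\beta_r\,\theta_{r\delta}(A) \le 2^{-r}\beta_r\sum_{n=0}^{\infty}(\operatorname{diam} B_n)^r = \sum_{n=0}^{\infty}\mu B_n \le \mu^* A + \epsilon.$$
Letting δ ↓ 0,
$$2^{-r}\beta_r\,\mu^*_{Hr} A \le \mu^* A + \epsilon.$$
As ε is arbitrary, $2^{-r}\beta_r\,\mu^*_{Hr} A \le \mu^* A$.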
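(ii) Let ε > 0. Then there is a sequence $\langle A_n\rangle_{n\in\mathbb{N}}$ of sets of diameter at most 1 such that $A \subseteq \bigcup_{n\in\mathbb{N}} A_n$ and $\sum_{n=0}^{\infty}(\operatorname{diam} A_n)^r \le \theta_{r1} A + \epsilon$, so that
$$\mu^* A \le \sum_{n=0}^{\infty}\mu^* A_n \le \sum_{n=0}^{\infty} 2^{-r}\beta_r(\operatorname{diam} A_n)^r \le 2^{-r}\beta_r(\theta_{r1} A + \epsilon) \le 2^{-r}\beta_r(\mu^*_{Hr} A + \epsilon)$$
by 264H. As ε is arbitrary, $\mu^* A \le 2^{-r}\beta_r\,\mu^*_{Hr} A$. Q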
(c) Because µ, µHr are the measures defined from their respective outer measures by Carathéodory’s method, it
follows at once that µ = 2−r βr µHr in the strict sense required.

*264J The Cantor set I remarked in 264A that fractional ‘dimensions’ r were of interest. I have no space for
these here, and they are off the main lines of this volume, but I will give one result for its intrinsic interest.
Proposition Let C be the Cantor set in [0, 1]. Set r = ln 2/ ln 3. Then the r-dimensional Hausdorff measure of C is 1.
proof (a) Recall that $C = \bigcap_{n\in\mathbb{N}} C_n$, where each $C_n$ consists of $2^n$ closed intervals of length $3^{-n}$, and $C_{n+1}$ is obtained
from $C_n$ by deleting the middle (open) third of each interval of $C_n$. (See 134G.) Because C is closed, µHr C is defined
(264E). Note that $3^r = 2$.
(b) If δ > 0, take n such that $3^{-n} \le \delta$; then C can be covered by $2^n$ intervals of diameter $3^{-n}$, so
$$\theta_{r\delta} C \le 2^n(3^{-n})^r = 1.$$
Consequently
$$\mu_{Hr} C = \mu^*_{Hr} C = \lim_{\delta\downarrow 0}\theta_{r\delta} C \le 1.$$

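(c) We need the following elementary fact: if α, β, γ ≥ 0 and max(α, γ) ≤ β, then $\alpha^r + \gamma^r \le (\alpha + \beta + \gamma)^r$. P Because 0 < r ≤ 1,
$$\xi \mapsto (\xi + \eta)^r - \xi^r = r\int_0^{\eta}(\xi + \zeta)^{r-1}\,d\zeta$$
is non-increasing for every η ≥ 0. Consequently
$$\begin{aligned}
(\alpha + \beta + \gamma)^r - \alpha^r - \gamma^r &\ge (\beta + \beta + \gamma)^r - \beta^r - \gamma^r\\
&\ge (\beta + \beta + \beta)^r - \beta^r - \beta^r = \beta^r(3^r - 2) = 0,
\end{aligned}$$
as required. Q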
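
The inequality in (c) is easy to probe numerically. The sketch below is a brute-force check over an arbitrary sample grid (an assumption of this illustration, not part of the proof), with the hypothetical helper `gap` measuring the difference between the two sides.

```python
from math import log

r = log(2) / log(3)   # the exponent of 264J, so that 3**r equals 2 up to rounding

def gap(alpha, beta, gamma_):
    """(alpha + beta + gamma)^r - alpha^r - gamma^r; (c) says this is >= 0 when max(alpha, gamma) <= beta."""
    return (alpha + beta + gamma_) ** r - alpha ** r - gamma_ ** r

grid = [i / 20 for i in range(0, 41)]   # sample values in [0, 2]
worst = min(gap(a, b, c) for b in grid for a in grid for c in grid if max(a, c) <= b)

print(worst)               # close to 0, attained where alpha = beta = gamma
print(gap(1.0, 0.0, 1.0))  # negative: the hypothesis max(alpha, gamma) <= beta matters
```

Equality holds when α = β = γ, which is exactly the configuration of two adjacent intervals of Cn+1 and the gap between them.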
(d) Now suppose that I ⊆ R is any interval, and m ∈ N; write jm (I) for the number of the intervals composing Cm
which are included in I. Then $2^{-m} j_m(I) \le (\operatorname{diam} I)^r$. P If I does not meet Cm , this is trivial. Otherwise, induce on
$$l = \min\{i : I \text{ meets only one of the intervals composing } C_{m-i}\}.$$
If l = 0, so that I meets only one of the intervals composing Cm , then jm (I) ≤ 1, and if jm (I) = 1 then $\operatorname{diam} I \ge 3^{-m}$
so $(\operatorname{diam} I)^r \ge 2^{-m}$; thus the induction starts. For the inductive step to l ≥ 1, let J be the interval of $C_{m-l}$ which
meets I, and J′ , J″ the two intervals of $C_{m-l+1}$ included in J, so that I meets both J′ and J″ , and
$$j_m(I) = j_m(I \cap J) = j_m(I \cap J') + j_m(I \cap J'').$$
By the inductive hypothesis,
$$(\operatorname{diam}(I \cap J'))^r + (\operatorname{diam}(I \cap J''))^r \ge 2^{-m} j_m(I \cap J') + 2^{-m} j_m(I \cap J'') = 2^{-m} j_m(I).$$
On the other hand, by (c),
$$\begin{aligned}
(\operatorname{diam}(I \cap J'))^r + (\operatorname{diam}(I \cap J''))^r &\le (\operatorname{diam}(I \cap J') + 3^{-m+l-1} + \operatorname{diam}(I \cap J''))^r\\
&= (\operatorname{diam}(I \cap J))^r \le (\operatorname{diam} I)^r
\end{aligned}$$
because J′ , J″ both have diameter at most $3^{-(m-l+1)}$, the length of the interval between them. Thus the induction
continues. Q
(e) Now suppose that ε > 0. Then there is a sequence $\langle A_n\rangle_{n\in\mathbb{N}}$ of sets, covering C, such that $\sum_{n=0}^{\infty}(\operatorname{diam} A_n)^r < \mu_{Hr} C + \epsilon$. Take $\eta_n > 0$ such that $\sum_{n=0}^{\infty}(\operatorname{diam} A_n + \eta_n)^r \le \mu_{Hr} C + \epsilon$, and for each n take an open interval $I_n \supseteq A_n$ of
length at most $\operatorname{diam} A_n + \eta_n$ and with neither endpoint belonging to C; this is possible because C does not include any
non-trivial interval. Now $C \subseteq \bigcup_{n\in\mathbb{N}} I_n$; because C is compact, there is a k ∈ N such that $C \subseteq \bigcup_{n\le k} I_n$. Next, there is
an m ∈ N such that no endpoint of any $I_n$, for n ≤ k, belongs to $C_m$. Consequently each of the intervals composing
$C_m$ must be included in some $I_n$, and (in the terminology of (d) above) $\sum_{n=0}^{k} j_m(I_n) \ge 2^m$. Accordingly
$$1 \le \sum_{n=0}^{k} 2^{-m} j_m(I_n) \le \sum_{n=0}^{k}(\operatorname{diam} I_n)^r \le \sum_{n=0}^{\infty}(\operatorname{diam} A_n + \eta_n)^r \le \mu_{Hr} C + \epsilon.$$
As ε is arbitrary, µHr C ≥ 1, as required.
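
The covers used in part (b) of this proof can be generated explicitly. The following Python sketch (with hypothetical names `cantor_level` and `covering_sum`) builds the intervals of Cn by deleting middle thirds, using exact rational endpoints, and evaluates the sum of (diam)^r over them. It illustrates only the upper bound of (b); the lower bound of (d)-(e) concerns arbitrary covers and is not something a computation of this kind can establish.

```python
from fractions import Fraction
from math import log

def cantor_level(n):
    """Return the 2**n closed intervals of C_n as (left, right) pairs of Fractions."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        nxt = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            nxt.append((lo, lo + third))   # left closed third
            nxt.append((hi - third, hi))   # right closed third
        intervals = nxt
    return intervals

r = log(2) / log(3)

def covering_sum(n):
    """Sum of (diam)^r over the intervals of C_n; bounds theta_{r,delta} C above when 3**-n <= delta."""
    return sum(float(hi - lo) ** r for lo, hi in cantor_level(n))

for n in range(0, 9):
    print(n, len(cantor_level(n)), covering_sum(n))
```

Each sum is 1 up to rounding, whatever the value of n.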

*264K General metric spaces While this chapter deals exclusively with Euclidean spaces, readers familiar with
the general theory of metric spaces may find the nature of the theory clearer if they use the language of metric spaces
in the basic definitions and results. I therefore repeat the definition here, and spell out the corresponding results in
the exercises 264Yb-264Yl.
Let (X, ρ) be a metric space, and r > 0. For any A ⊆ X, δ > 0 set

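$$\theta_{r\delta} A = \inf\Bigl\{\sum_{n=0}^{\infty}(\operatorname{diam} A_n)^r : \langle A_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of subsets of } X \text{ covering } A,\ \operatorname{diam} A_n \le \delta \text{ for every } n \in \mathbb{N}\Bigr\},$$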
interpreting the diameter of the empty set as 0, and inf ∅ as ∞, so that θrδ A = ∞ if A cannot be covered by a sequence
of sets of diameter at most δ. Say that θr A = supδ>0 θrδ A is the r-dimensional Hausdorff outer measure of A, and
take the measure µHr defined by Carathéodory’s method from this outer measure to be r-dimensional Hausdorff
measure on X.

264X Basic exercises > (a) Show that all the functions θrδ of 264A are outer measures. Show that in that context,
θrδ (A) = 0 iff θr (A) = 0, for any δ > 0 and any A ⊆ R s .
(b) Let s ≥ 1 be an integer, and θ an outer measure on R s such that θ(A ∪ B) = θA + θB whenever A, B are
non-empty subsets of R s and inf x∈A,y∈B ‖x − y‖ > 0. Show that every Borel subset of R s is measured by the measure
defined from θ by Carathéodory’s method.
> (c) Let s ≥ 1 be an integer and r > 0; define θrδ as in 264A. Show that for any A ⊆ R s , δ > 0,

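$$\begin{aligned}
\theta_{r\delta} A &= \inf\Bigl\{\sum_{n=0}^{\infty}(\operatorname{diam} F_n)^r : \langle F_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of closed subsets of } \mathbb{R}^s \text{ covering } A,\ \operatorname{diam} F_n \le \delta \text{ for every } n \in \mathbb{N}\Bigr\}\\
&= \inf\Bigl\{\sum_{n=0}^{\infty}(\operatorname{diam} G_n)^r : \langle G_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of open subsets of } \mathbb{R}^s \text{ covering } A,\ \operatorname{diam} G_n \le \delta \text{ for every } n \in \mathbb{N}\Bigr\}.
\end{aligned}$$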

> (d) Let s ≥ 1 be an integer and r ≥ 0; let µHr be r-dimensional Hausdorff measure on R s . Show that for every
A ⊆ R s there is a Gδ set (that is, a set expressible as the intersection of a sequence of open sets) H ⊇ A such that
µHr H = µ∗Hr A. (Hint: use 264Xc.)

> (e) Let s ≥ 1 be an integer, and 0 ≤ r < r′ . Show that if A ⊆ R s and the r-dimensional Hausdorff outer measure
µ∗Hr A of A is finite, then µ∗Hr′ A must be zero.

(f )(i) Suppose that f : [a, b] → R has graph Γf ⊆ R 2 , where a ≤ b in R. Show that the outer measure µ∗H1 (Γf ) of Γf
for one-dimensional Hausdorff measure on R 2 is at most b − a + Var[a,b] (f ). (Hint: if f has finite variation, show that
$\operatorname{diam}(\Gamma_{f\restriction\,]t,u[}) \le u - t + \operatorname{Var}_{]t,u[}(f)$; then use 224E.) (ii) Let f : [0, 1] → [0, 1] be the Cantor function (134H). Show that
µH1 (Γf ) = 2. (Hint: 264G.)

(g) In 264A, show that



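$$\theta_{r\delta} A = \inf\Bigl\{\sum_{n=0}^{\infty}(\operatorname{diam} A_n)^r : \langle A_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of convex sets covering } A,\ \operatorname{diam} A_n \le \delta \text{ for every } n \in \mathbb{N}\Bigr\}$$
for any A ⊆ R s .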

264Y Further exercises (a) Let θ11 be the outer measure on R 2 defined in 264A, with r = δ = 1, and µ11 the
measure derived from θ11 by Carathéodory’s method, Σ11 its domain. Show that any set in Σ11 is either negligible or
conegligible.

(b) Let (X, ρ) be a metric space and r ≥ 0. Show that if A ⊆ X and µ∗Hr A < ∞, then A is separable.

(c) Let (X, ρ) be a metric space, and θ an outer measure on X such that θ(A ∪ B) = θA + θB whenever A, B
are non-empty subsets of X and inf x∈A,y∈B ρ(x, y) > 0. (Such an outer measure is called a metric outer measure.)
Show that every open subset of X is measured by the measure defined from θ by Carathéodory’s method.

(d) Let (X, ρ) be a metric space and r > 0; define θrδ as in 264K. Show that for any A ⊆ X,

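$$\begin{aligned}
\mu^*_{Hr} A &= \sup_{\delta>0}\inf\Bigl\{\sum_{n=0}^{\infty}(\operatorname{diam} F_n)^r : \langle F_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of closed subsets of } X \text{ covering } A,\ \operatorname{diam} F_n \le \delta \text{ for every } n \in \mathbb{N}\Bigr\}\\
&= \sup_{\delta>0}\inf\Bigl\{\sum_{n=0}^{\infty}(\operatorname{diam} G_n)^r : \langle G_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of open subsets of } X \text{ covering } A,\ \operatorname{diam} G_n \le \delta \text{ for every } n \in \mathbb{N}\Bigr\}.
\end{aligned}$$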

(e) Let (X, ρ) be a metric space and r ≥ 0; let µHr be r-dimensional Hausdorff measure on X. Show that for every
A ⊆ X there is a Gδ set H ⊇ A such that µHr H = µ∗Hr A is the r-dimensional Hausdorff outer measure of A.

(f ) Let (X, ρ) be a metric space and r ≥ 0; let Y be any subset of X, and give Y its induced metric ρY . (i) Show
that the r-dimensional Hausdorff outer measure $\mu^{(Y)*}_{Hr}$ on Y is just the restriction to PY of the outer measure µ∗Hr on
X. (ii) Show that if either µ∗Hr Y < ∞ or µHr measures Y then r-dimensional Hausdorff measure $\mu^{(Y)}_{Hr}$ on Y is just the
subspace measure on Y induced by the measure µHr on X.

(g) Let (X, ρ) be a metric space and r > 0. Show that r-dimensional Hausdorff measure on X is atomless. (Hint:
Let E ∈ dom µHr . (i) If E is not separable, there is an open set G such that E ∩ G and E \ G are both non-separable,
therefore both non-negligible. (ii) If there is an x ∈ E such that µHr (E ∩ B(x, δ)) > 0 for every δ > 0, then one of
these sets has non-negligible complement in E. (iii) Otherwise, µHr E = 0.)

(h) Let (X, ρ) be a metric space and r ≥ 0; let µHr be r-dimensional Hausdorff measure on X. Show that if
µHr E < ∞ then µHr E = sup{µHr F : F ⊆ E is closed and totally bounded}. (Hint: given ǫ > 0, use 264Yd to find a
closed totally bounded set F such that µHr (F \ E) = 0 and µHr (E \ F ) ≤ ǫ, and now apply 264Ye to F \ E.)

(i) Let (X, ρ) be a complete metric space and r ≥ 0; let µHr be r-dimensional Hausdorff measure on X. Show that
if µHr E < ∞ then µHr E = sup{µHr F : F ⊆ E is compact}.

(j) Let (X, ρ) and (Y, σ) be metric spaces. If D ⊆ X and φ : D → Y is a function, then φ is γ-Lipschitz if
σ(φ(x), φ(x′ )) ≤ γρ(x, x′ ) for every x, x′ ∈ D. (i) Show that in this case, if r ≥ 0, µ∗Hr (φ[A]) ≤ γ r µ∗Hr A for every
A ⊆ D, writing µ∗Hr for r-dimensional Hausdorff outer measure on either X or Y . (ii) Show that if X is complete and
µHr E is defined and finite, then µHr (φ[E]) is defined. (Hint: 264Yi.)

(k) Let (X, ρ) be a metric space, and for r ≥ 0 let µHr be Hausdorff r-dimensional measure on X. Show that there
is a unique ∆ = ∆(X) ∈ [0, ∞] such that µHr X = ∞ if r ∈ [0, ∆[, 0 if r ∈ ]∆, ∞[.

(l) Let (X, ρ) be a metric space and φ : I → X a continuous function, where I ⊆ R is an interval. Write µH1 for
one-dimensional Hausdorff measure on X. Show that
$$\mu_{H1}(\phi[I]) \le \sup\Bigl\{\sum_{i=1}^{n}\rho(\phi(t_i), \phi(t_{i-1})) : t_0, \ldots, t_n \in I,\ t_0 \le \ldots \le t_n\Bigr\},$$
the length of the curve φ, with equality if φ is injective.

(m) Set r = ln 2/ ln 3, as in 264J, and write µHr for r-dimensional Hausdorff measure on the Cantor set C. Let λ
be the usual measure on {0, 1}N (254J). Define φ : {0, 1}N → C by setting $\phi(x) = \frac{2}{3}\sum_{n=0}^{\infty} 3^{-n} x(n)$ for x ∈ {0, 1}N .
Show that φ is an isomorphism between ({0, 1}N , λ) and (C, µHr ).

(n) Set r = ln 2/ ln 3 and write µHr for r-dimensional Hausdorff measure on the Cantor set C. Let f : [0, 1] → [0, 1]
be the Cantor function and let µ be Lebesgue measure on R. Show that µf [E] = µHr E for every E ∈ dom µHr and
µHr (C ∩ f −1 [F ]) = µF for every Lebesgue measurable set F ⊆ [0, 1].

(o) Let (X, ρ) be a metric space and h : [0, ∞[ → [0, ∞[ a non-decreasing function. For A ⊆ X set

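$$\theta_h A = \sup_{\delta>0}\inf\Bigl\{\sum_{n=0}^{\infty} h(\operatorname{diam} A_n) : \langle A_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of subsets of } X \text{ covering } A,\ \operatorname{diam} A_n \le \delta \text{ for every } n \in \mathbb{N}\Bigr\},$$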
interpreting diam ∅ as 0, inf ∅ as ∞ as usual. Show that θh is an outer measure on X. State and prove theorems
corresponding to 264E and 264F. Look through 264X and 264Y for further results which might be generalizable,
perhaps on the assumption that h is continuous on the right.

(p) Let (X, ρ) be a metric space. Let us say that if a < b in R and f : [a, b] → X is a function, then f is absolutely
continuous if for every ε > 0 there is a δ > 0 such that $\sum_{i=0}^{n}\rho(f(a_i), f(b_i)) \le \epsilon$ whenever a ≤ a0 ≤ b0 ≤ . . . ≤
an ≤ bn ≤ b and $\sum_{i=0}^{n} b_i - a_i \le \delta$. Show that f : [a, b] → X is absolutely continuous iff it is continuous and of
bounded variation (in the sense of 224Ye) and µH1 f [A] = 0 whenever A ⊆ [a, b] is Lebesgue negligible, where µH1 is
1-dimensional Hausdorff measure on X. (Compare 225M.) Show that in this case µH1 f [ [a, b] ] < ∞.

(q) Let s ≥ 1 be an integer, and r ∈ [1, ∞[. For x, y ∈ R s set $\rho(x, y) = \|x - y\|^{s/r}$. (i) Show that ρ is a metric
on R s inducing the Euclidean topology. (ii) Let µHr be the associated r-dimensional Hausdorff measure. Show that
$\mu_{Hr} B(0, 1) = 2^s$.
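
The supremum of polygonal sums appearing in 264Yl can be approximated numerically. The sketch below is an illustration under stated assumptions: the curve is the upper half of the unit circle in R 2 with its Euclidean metric (an example chosen here, not taken from the text), and the partitions are uniform; `polygonal_length`, `half_circle` and `uniform` are hypothetical names.

```python
from math import cos, sin, pi, dist

def polygonal_length(phi, ts):
    """Sum of rho(phi(t_i), phi(t_{i-1})) over the partition t_0 <= ... <= t_n."""
    return sum(dist(phi(ts[i - 1]), phi(ts[i])) for i in range(1, len(ts)))

def half_circle(t):
    """Injective continuous curve on [0, pi] tracing the upper unit semicircle."""
    return (cos(t), sin(t))

def uniform(n):
    """Uniform partition of [0, pi] into n pieces."""
    return [pi * i / n for i in range(n + 1)]

# Successive partitions refine each other, so the sums can only increase.
lengths = [polygonal_length(half_circle, uniform(2 ** j)) for j in range(1, 11)]
print(lengths)
```

The sums increase towards π, the length of the curve, which for this injective φ is also the value that the exercise gives for µH1 (φ[I]).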

264 Notes and comments In this section we have come to the next step in ‘geometric measure theory’. I am taking
this very slowly, because there are real difficulties in the subject, and for the purposes of this volume we do not need
to master very much of it. The idea here is to find a definition of r-dimensional Lebesgue measure which will be
‘geometric’ in the strict sense, that is, dependent only on the metric structure of R r , and therefore applicable to sets
which have a metric structure but no linear structure. As has happened before, the definition of Hausdorff measure
from an outer measure gives no problems – the only new idea in 264A-264C is that of using a supremum θr = supδ>0 θrδ
of outer measures – and the difficult part is proving that our new measure has any useful properties. Concerning the
properties of Hausdorff measure, there are two essential objectives; first, to check that these measures, in general, share
a reasonable proportion of the properties of Lebesgue measure; and second, to justify the term ‘r-dimensional measure’
by relating Hausdorff r-dimensional measure on R r to Lebesgue measure on R r .
As for the properties of general Hausdorff measures, we have to go rather carefully. I do not give counter-examples
here because they involve concepts which belong to Volumes 4 and 5 rather than this volume, but I must warn you to
expect the worst. However, we do at least have open sets measurable, so that all Borel sets are measurable (264E).

The outer measure of a set A can be defined in terms of the Borel sets including A (264Fa), though not in general
in terms of the open sets including A; but the measure of a measurable set E is not necessarily the supremum of the
measures of the Borel sets included in E, unless E is of finite measure (264Fc). We do find that the outer measure
θr defined in 264A is the outer measure defined from µHr (264Fb), so that the phrase ‘r-dimensional Hausdorff outer
measure’ is unambiguous. A crucial property of Lebesgue measure is the fact that the measure of a measurable set E
is the supremum of the measures of the compact subsets of E; this is not generally shared by Hausdorff measures, but
is valid for sets E of finite measure in complete spaces (264Yi). Concerning subspaces, there are no problems with the
outer measures, and for sets of finite measure the subspace measures are also consistent (264Yf). Because Hausdorff
measure is defined in metric terms, it behaves regularly for Lipschitz maps (264G); one of the most natural classes of
functions to consider when studying metric spaces is that of 1-Lipschitz functions, so that (in the language of 264G)
µ∗Hr φ[A] ≤ µ∗Hr A for every A.
The second essential feature of Hausdorff measure, its relation with Lebesgue measure in the appropriate dimension,
is Theorem 264I. Because both Hausdorff measure and Lebesgue measure are translation-invariant, this can be proved
by relatively elementary means, except for the evaluation of the normalizing constant; all we need to know is that
$\mu[0, 1[^r = 1$ and $\mu_{Hr}[0, 1[^r$ are both finite and non-zero, and this is straightforward. (The arguments of part (a) of the
proof of 261F are relevant.) For the purposes of this chapter, we do not I think have to know the value of the constant;
but I cannot leave it unsettled, and therefore give Theorem 264H, the isodiametric inequality, to show that it is
just the Lebesgue measure of an r-dimensional ball of diameter 1, as one would hope and expect. The critical step
in the argument of 264H is in part (c) of the proof. This is called ‘Steiner symmetrization’; the idea is that given a
set A, we transform A through a series of steps, at each stage lowering, or at least not increasing, its diameter, and
raising, or at least not decreasing, its outer measure, progressively making A more symmetric, until at the end we have
a set which is sufficiently constrained to be amenable. The particular symmetrization operation used in this proof
is important enough; but the idea of progressive regularization of an object is one of the most powerful methods in
measure theory, and you should give all your attention to mastering any example you encounter. In my experience,
the idea is principally useful when seeking an inequality involving disparate quantities – in the present example, the
diameter and volume of a set.
Of course it is awkward having two measures on R r , differing by a constant multiple, and for the purposes of the
next section it would actually have been a little more convenient to follow Federer 69 in using ‘normalized Hausdorff
measure’ $2^{-r}\beta_r\,\mu_{Hr}$. (For non-integral r, we could take $\beta_r = \pi^{r/2}/\Gamma(1 + \tfrac{r}{2})$, as suggested in 252Xi.) However, I believe
this to be a minority position, and the striking example of Hausdorff measure on the Cantor set (264J, 264Ym-264Yn)
looks much better in the non-normalized version.
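
The map φ of 264Ym can be examined on finite strings. The following Python sketch is illustrative only: it treats a finite tuple of bits as a sequence that is zero thereafter, and checks that the images of the strings of length m are exactly the left endpoints of the intervals of Cm ; the names `phi_prefix` and `cantor_level` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def phi_prefix(bits):
    """phi(x) = (2/3) * sum_n 3**-n * x(n) for a sequence x that is 0 beyond the given bits."""
    return Fraction(2, 3) * sum(Fraction(b, 3 ** n) for n, b in enumerate(bits))

def cantor_level(m):
    """The 2**m closed intervals of C_m, built by deleting open middle thirds."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(m):
        intervals = [piece
                     for lo, hi in intervals
                     for piece in ((lo, lo + (hi - lo) / 3), (hi - (hi - lo) / 3, hi))]
    return intervals

m = 4
images = [phi_prefix(bits) for bits in product((0, 1), repeat=m)]
left_ends = [lo for lo, _ in cantor_level(m)]
print(images == left_ends)   # lexicographic order of bit strings matches the order on the line
```

Each string of length m determines a cylinder set of λ-measure 2^(-m) and an interval of Cm of length 3^(-m), whose r-th power is again 2^(-m) when r = ln 2/ ln 3; this is the coincidence which makes the non-normalized measure look natural here.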
Hausdorff (ln 2/ ln 3)-dimensional measure on the Cantor set is of course but one, perhaps the easiest, of a large
class of examples. Because the Hausdorff r-dimensional outer measure of a set A, regarded as a function of r, behaves
dramatically (falling from ∞ to 0) at a certain critical value ∆(A) (see 264Xe, 264Yk), it gives us a metric space
invariant of A; ∆(A) is the Hausdorff dimension of A. Evidently the Hausdorff dimension of C is ln 2/ ln 3, while
that of r-dimensional Euclidean space is r.
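
The dramatic change at the critical exponent can be seen already in the natural covers of the Cantor set. The sketch below (hypothetical function `natural_cover_sum`) evaluates the sum of (diam)^s over the 2^n intervals of Cn for exponents s below, at and above ln 2/ ln 3. It should be read with care: these sums only bound θsδ C from above, so their blowing up for small s is suggestive rather than a proof; the actual statement comes from 264J together with 264Xe.

```python
from math import log

r_critical = log(2) / log(3)

def natural_cover_sum(n, s):
    """Sum of (diam)^s over the 2**n intervals of C_n, each of length 3**-n."""
    return 2.0 ** n * 3.0 ** (-n * s)

for s in (0.5, r_critical, 1.0):
    print(s, [natural_cover_sum(n, s) for n in (5, 10, 20, 40)])
```

Below the critical exponent the sums grow without bound, at the critical exponent they stay at 1, and above it they tend to 0.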

265 Surface measures


In this section I offer a new version of the arguments of §263, this time not with the intention of justifying integration-
by-substitution, but instead to give a practically effective method of computing the Hausdorff r-dimensional measure
of a smooth r-dimensional surface in an s-dimensional space. The basic case to bear in mind is r = 2, s = 3, though
any other combination which you can easily visualize will also be a valuable aid to intuition. I give a fundamental
theorem (265E) providing a formula from which we can hope to calculate the r-dimensional measure of a surface in
s-dimensional space which is parametrized by a differentiable function, and work through some of the calculations in
the case of the r-sphere (265F-265H).

265A Normalized Hausdorff measure As I remarked at the end of the last section, Hausdorff measure, as
defined in 264A-264C, is not quite the most appropriate measure for our work here; so in this section I will use
normalized Hausdorff measure, meaning $\nu_r = 2^{-r}\beta_r\,\mu_{Hr}$, where µHr is r-dimensional Hausdorff measure (interpreted
in whichever space is under consideration) and βr = µr B(0, 1) is the Lebesgue measure of any ball of radius 1 in R r .
It will be convenient to take β0 = 1. As shown in 264H-264I, this normalization makes νr on R r agree with Lebesgue
measure µr . Observe that of course νr∗ = 2−r βr µ∗Hr (264Fb).

265B Linear subspaces Just as in §263, the first step is to deal with linear operators.

Theorem Suppose that r, s are integers with 1 ≤ r ≤ s, and that T is a real s × r matrix; regard T as a linear operator
from R r to R s . Set $J = \sqrt{\det T'T}$, where T ′ is the transpose of T . Write νr for normalized r-dimensional Hausdorff
measure on R s , Tr for its domain, and µr for Lebesgue measure on R r . Then
νr T [E] = Jµr E
for every measurable set E ⊆ R r . If T is injective (that is, if J ≠ 0), then
νr F = Jµr T −1 [F ]
whenever F ∈ Tr and F ⊆ T [R r ].
proof The formula for J assumes that det T ′ T is non-negative, which is a fact not in evidence; but the argument
below will establish it adequately soon.
(a) Let V be the linear subspace of R s consisting of vectors y = (η1 , . . . , ηs ) such that ηi = 0 whenever r < i ≤ s.
Let R be the r × s matrix hρij ii≤r,j≤s , where ρij = 1 if i = j ≤ r, 0 otherwise; then the s × r matrix R′ may be
regarded as a bijection from R r to V . Let W be an r-dimensional linear subspace of R s including T [R r ], and let P be
an orthogonal s × s matrix such that P [W ] = V . Then S = RP T is an r × r matrix. We have R′ Ry = y for y ∈ V , so
R′ RP T = P T and
S ′ S = T ′ P ′ R′ RP T = T ′ P ′ P T = T ′ T ;
accordingly
det T ′ T = det S ′ S = (det S)2 ≥ 0
and J = | det S|. At the same time,
P ′ R′ S = P ′ R′ RP T = P ′ P T = T .
Observe that J = 0 iff S is not injective, that is, T is not injective.
(b) If we consider the s × r matrix P ′ R′ as a map from R r to R s , we see that φ = P ′ R′ is an isometry between R r
and W , with inverse φ−1 = RP ↾ W . It follows that φ is an isomorphism between the measure spaces $(\mathbb{R}^r, \mu^{(r)}_{Hr})$ and
$(W, \mu^{(s)}_{HrW})$, where $\mu^{(r)}_{Hr}$ is r-dimensional Hausdorff measure on R r and $\mu^{(s)}_{HrW}$ is the subspace measure on W induced
by r-dimensional Hausdorff measure $\mu^{(s)}_{Hr}$ on R s .
P (i) If A ⊆ R r , A′ ⊆ W ,
$$\mu^{(s)*}_{Hr}(\phi[A]) \le \mu^{(r)*}_{Hr}(A), \qquad \mu^{(r)*}_{Hr}(\phi^{-1}[A']) \le \mu^{(s)*}_{Hr}(A'),$$
using 264G twice. Thus $\mu^{(s)*}_{Hr}(\phi[A]) = \mu^{(r)*}_{Hr}(A)$ for every A ⊆ R r .
(ii) Now because W is closed, therefore in the domain of $\mu^{(s)}_{Hr}$ (264E), the subspace measure $\mu^{(s)}_{HrW}$ is just the
measure induced by $\mu^{(s)*}_{Hr}\restriction W$ by Carathéodory’s method (214H(b-ii)). Because φ is an isomorphism between $(\mathbb{R}^r, \mu^{(r)*}_{Hr})$
and $(W, \mu^{(s)*}_{Hr}\restriction W)$, it is an isomorphism between $(\mathbb{R}^r, \mu^{(r)}_{Hr})$ and $(W, \mu^{(s)}_{HrW})$. Q
(c) It follows that φ is also an isomorphism between the normalized versions (R r , µr ) and (W, νrW ), writing νrW
for the subspace measure on W induced by νr .
Now if E ⊆ R r is Lebesgue measurable, we have µr S[E] = Jµr E, by 263A; so that
νr T [E] = νr (P ′ R′ [S[E]]) = νr (φ[S[E]]) = µr S[E] = Jµr E.
If T is injective, then S = φ−1 T must also be injective, so that J ≠ 0 and
νr F = µr (φ−1 [F ]) = Jµr (S −1 [φ−1 [F ]]) = Jµr T −1 [F ]
whenever F ∈ Tr and F ⊆ W = T [R r ].
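
For r = 2 and s = 3 the quantity J of this theorem is the familiar area of the parallelogram spanned by the columns of T. The Python sketch below is a small check of that special case only; the helper names `gram_J`, `cross_norm` and `rotate_z` and the sample matrix are hypothetical. It also checks the invariance under an orthogonal matrix P used in part (a) of the proof.

```python
from math import sqrt, cos, sin

def gram_J(T):
    """J = sqrt(det T'T) for an s x 2 matrix T, given as a list of s rows."""
    a = sum(row[0] * row[0] for row in T)
    b = sum(row[0] * row[1] for row in T)
    c = sum(row[1] * row[1] for row in T)
    return sqrt(max(a * c - b * b, 0.0))   # clamp tiny negative rounding errors

def cross_norm(T):
    """Area of the parallelogram spanned by the two columns of a 3 x 2 matrix."""
    u = [row[0] for row in T]
    v = [row[1] for row in T]
    w = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    return sqrt(sum(x * x for x in w))

def rotate_z(T, angle):
    """Apply an orthogonal 3 x 3 matrix P (rotation about the third axis) to T."""
    c, s = cos(angle), sin(angle)
    return [[c * T[0][j] - s * T[1][j] for j in range(2)],
            [s * T[0][j] + c * T[1][j] for j in range(2)],
            [T[2][j] for j in range(2)]]

T = [[1.0, 0.0], [0.0, 2.0], [2.0, 1.0]]
print(gram_J(T), cross_norm(T), gram_J(rotate_z(T, 0.7)))
```

All three printed values agree (they equal √21 for this T), and a matrix with linearly dependent columns gives J = 0, matching the remark that J = 0 iff T is not injective.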

265C Corollary Under the conditions of 265B,


νr∗ T [A] = Jµ∗r A
for every A ⊆ R r .
proof (a) If E is Lebesgue measurable and A ⊆ E, then T [A] ⊆ T [E], so
νr∗ T [A] ≤ νr T [E] = Jµr E;
as E is arbitrary, νr∗ T [A] ≤ Jµ∗r A.

(b) If J = 0 we can stop. If J ≠ 0 then T is injective, so if F ∈ Tr and T [A] ⊆ F we shall have


Jµ∗r A ≤ Jµr T −1 [F ∩ W ] = νr (F ∩ W ) ≤ νr F ;
as F is arbitrary, Jµ∗r A ≤ νr∗ T [A].

265D I now proceed to the lemma corresponding to 263C.



Lemma Suppose that 1 ≤ r ≤ s and that T is an s × r matrix; set $J = \sqrt{\det T'T}$, and suppose that J ≠ 0. Then for
any ε > 0 there is a ζ = ζ(T, ε) > 0 such that
(i) $|\sqrt{\det S'S} - J| \le \epsilon$ whenever S is an s × r matrix and ‖S − T ‖ ≤ ζ;
(ii) whenever D ⊆ R r is a bounded set and φ : D → R s is a function such that ‖φ(x) − φ(y) − T (x − y)‖ ≤ ζ‖x − y‖
for all x, y ∈ D, then |νr∗ φ[D] − Jµ∗r D| ≤ εµ∗r D.

proof (a) Because $\sqrt{\det S'S}$ is a continuous function of the coefficients of S, 262Hb tells us that there must be a ζ0 > 0
such that $|J - \sqrt{\det S'S}| \le \epsilon$ whenever ‖S − T ‖ ≤ ζ0 .
(b) Because J ≠ 0, T is injective, and there is an r × s matrix T ∗ such that T ∗ T is the identity r × r matrix. Take
ζ > 0 such that ζ ≤ ζ0 , $\zeta\|T^*\| < 1$, $J(1 + \zeta\|T^*\|)^r \le J + \epsilon$ and $1 - J^{-1}\epsilon \le (1 - \zeta\|T^*\|)^r$.
Let φ : D → R s be such that ‖φ(x) − φ(y) − T (x − y)‖ ≤ ζ‖x − y‖ whenever x, y ∈ D. Set ψ = φT ∗ , so that φ = ψT .
Then for u, v ∈ T [D]
$$\|\psi(u) - \psi(v)\| \le (1 + \zeta\|T^*\|)\|u - v\|, \qquad \|u - v\| \le (1 - \zeta\|T^*\|)^{-1}\|\psi(u) - \psi(v)\|.$$
P Take x, y ∈ D such that u = T x, v = T y; of course x = T ∗ u, y = T ∗ v. Then
$$\begin{aligned}
\|\psi(u) - \psi(v)\| &= \|\phi(T^*u) - \phi(T^*v)\| = \|\phi(x) - \phi(y)\|\\
&\le \|T(x - y)\| + \zeta\|x - y\|\\
&= \|u - v\| + \zeta\|T^*u - T^*v\| \le \|u - v\|(1 + \zeta\|T^*\|).
\end{aligned}$$
Next,
$$\begin{aligned}
\|u - v\| = \|Tx - Ty\| &\le \|\phi(x) - \phi(y)\| + \zeta\|x - y\|\\
&= \|\psi(u) - \psi(v)\| + \zeta\|T^*u - T^*v\|\\
&\le \|\psi(u) - \psi(v)\| + \zeta\|T^*\|\|u - v\|,
\end{aligned}$$
so that $(1 - \zeta\|T^*\|)\|u - v\| \le \|\psi(u) - \psi(v)\|$ and $\|u - v\| \le (1 - \zeta\|T^*\|)^{-1}\|\psi(u) - \psi(v)\|$. Q
(c) Now from 264G and 265C we see that
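$$\nu_r^*\phi[D] = \nu_r^*\psi[T[D]] \le (1 + \zeta\|T^*\|)^r\nu_r^* T[D] = (1 + \zeta\|T^*\|)^r J\mu_r^* D \le (J + \epsilon)\mu_r^* D,$$
and (provided ε ≤ J)
$$\begin{aligned}
(J - \epsilon)\mu_r^* D = (1 - J^{-1}\epsilon)\nu_r^* T[D] &\le (1 - J^{-1}\epsilon)(1 - \zeta\|T^*\|)^{-r}\nu_r^*\psi[T[D]]\\
&\le \nu_r^*\psi[T[D]] = \nu_r^*\phi[D].
\end{aligned}$$
(Of course, if ε ≥ J, then surely (J − ε)µ∗r D ≤ νr∗ φ[D].) Thus
$$(J - \epsilon)\mu_r^* D \le \nu_r^*\phi[D] \le (J + \epsilon)\mu_r^* D$$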
as required, and we have an appropriate ζ.

265E Theorem Suppose that 1 ≤ r ≤ s; write µr for Lebesgue measure on R r , νr for normalized Hausdorff
measure on R s , and Tr for the domain of νr . Let D ⊆ R r be any set, and φ : D → R s a function differentiable
relative to its domain at each point of D. For each x ∈ D let T (x) be a derivative of φ at x relative to D, and set
$J(x) = \sqrt{\det T(x)'T(x)}$. Set D′ = {x : x ∈ D, J(x) > 0}. Then
(i) J : D → [0, ∞[ is a measurable function;
(ii) $\nu_r^*\phi[D] \le \int_D J(x)\,\mu_r(dx)$,
allowing ∞ as the value of the integral;
(iii) νr∗ φ[D \ D′ ] = 0.
If D is Lebesgue measurable, then
(iv) φ[D] ∈ Tr .
If D is measurable and φ is injective, then
(v) $\nu_r\phi[D] = \int_D J\,d\mu_r$;
(vi) for any set E ⊆ φ[D], E ∈ Tr iff φ−1 [E] ∩ D′ is Lebesgue measurable, and in this case
$$\nu_r E = \int_{\phi^{-1}[E]} J(x)\,\mu_r(dx) = \int_D J \times \chi(\phi^{-1}[E])\,d\mu_r;$$
(vii) for every real-valued function g defined on a subset of φ[D],
$$\int_{\phi[D]} g\,d\nu_r = \int_D J \times g\phi\,d\mu_r,$$
if either integral is defined in [−∞, ∞], provided we interpret J(x)g(φ(x)) as zero when J(x) = 0 and g(φ(x)) is
undefined.
proof I seek to follow the line laid out in the proof of 263D.
(a) Just as in 263D, we know that J : D → R is measurable, since J(x) is a continuous function of the coefficients of
T (x), all of which are measurable, by 262P. If D is Lebesgue measurable, then there is a sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of compact
subsets of D such that $D \setminus \bigcup_{n\in\mathbb{N}} F_n$ is µr -negligible. Now φ[Fn ] is compact, therefore belongs to Tr , for each n ∈ N. As
for $\phi[D \setminus \bigcup_{n\in\mathbb{N}} F_n]$, this must be νr -negligible by 264G, because φ is a countable union of Lipschitz functions (262N).
So
$$\phi[D] = \bigcup_{n\in\mathbb{N}}\phi[F_n] \cup \phi\Bigl[D \setminus \bigcup_{n\in\mathbb{N}} F_n\Bigr] \in \mathrm{T}_r.$$
This deals with (i) and (iv).

(b) For the moment, assume that D is bounded and that J(x) > 0 for every x ∈ D, and fix ε > 0. Let $M^*_{sr}$ be the
set of s × r matrices T such that det T ′ T ≠ 0, that is, the corresponding map T : R r → R s is injective. For $T \in M^*_{sr}$
take ζ(T, ε) > 0 as in 265D.

Take $\langle D_n\rangle_{n\in\mathbb{N}}$, $\langle T_n\rangle_{n\in\mathbb{N}}$ as in 262M, with $A = M^*_{sr}$, so that $\langle D_n\rangle_{n\in\mathbb{N}}$ is a partition of D into sets which are relatively
measurable in D, and each Tn is an s × r matrix such that
$$\|T(x) - T_n\| \le \zeta(T_n, \epsilon) \text{ whenever } x \in D_n,$$
$$\|\phi(x) - \phi(y) - T_n(x - y)\| \le \zeta(T_n, \epsilon)\|x - y\| \text{ for all } x, y \in D_n.$$
Then, setting $J_n = \sqrt{\det T_n'T_n}$, we have
$$|J(x) - J_n| \le \epsilon \text{ for every } x \in D_n,$$
$$|\nu_r^*\phi[D_n] - J_n\mu_r^* D_n| \le \epsilon\mu_r^* D_n,$$
by the choice of ζ(Tn , ε). So


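$$\begin{aligned}
\nu_r^*\phi[D] &\le \sum_{n=0}^{\infty}\nu_r^*\phi[D_n]\\
&\qquad\text{(because } \phi[D] = \bigcup_{n\in\mathbb{N}}\phi[D_n]\text{)}\\
&\le \sum_{n=0}^{\infty}(J_n\mu_r^* D_n + \epsilon\mu_r^* D_n) \le \epsilon\mu_r^* D + \sum_{n=0}^{\infty} J_n\mu_r^* D_n\\
&\qquad\text{(because the } D_n \text{ are disjoint and relatively measurable in } D\text{)}\\
&= \epsilon\mu_r^* D + \int_D\sum_{n=0}^{\infty} J_n\chi D_n\,d\mu_r\\
&\le \epsilon\mu_r^* D + \int_D J(x) + \epsilon\,\mu_r(dx) = 2\epsilon\mu_r^* D + \int_D J\,d\mu_r.
\end{aligned}$$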

If D is measurable and φ is injective, then all the Dn are Lebesgue measurable subsets of R r , so all the φ[Dn ] are
measured by νr , and they are also disjoint. Accordingly
$$\begin{aligned}
\int_D J\,d\mu_r &\le \sum_{n=0}^{\infty} J_n\mu_r D_n + \epsilon\mu_r D\\
&\le \sum_{n=0}^{\infty}(\nu_r\phi[D_n] + \epsilon\mu_r D_n) + \epsilon\mu_r D = \nu_r\phi[D] + 2\epsilon\mu_r D.
\end{aligned}$$
Since ε is arbitrary, we get
$$\nu_r^*\phi[D] \le \int_D J\,d\mu_r,$$
and if D is measurable and φ is injective,
$$\int_D J\,d\mu_r \le \nu_r\phi[D];$$
thus we have (ii) and (v), on the assumption that D is bounded and J > 0 everywhere on D.

(c) Just as in 263D, we can now relax the assumption that D is bounded by considering Bk = B(0, k) ⊆ R r ;
provided J > 0 everywhere on D, we get
$$\nu_r^*\phi[D] = \lim_{k\to\infty}\nu_r^*\phi[D \cap B_k] \le \lim_{k\to\infty}\int_{D\cap B_k} J\,d\mu_r = \int_D J\,d\mu_r,$$
with equality if D is measurable and φ is injective.

(d) Now we find that νr∗ φ[D \ D′ ] = 0.

P (i) Let η ∈ ]0, 1]. Define ψη : D → R s+r by setting ψη (x) = (φ(x), ηx), identifying R s+r with R s × R r . ψη is
differentiable relative to its domain at each point of D, with derivative T̃η (x), being the (s + r) × r matrix in which the
top s rows consist of the s × r matrix T (x), and the bottom r rows are ηIr , writing Ir for the r × r identity matrix.
(Use 262Ib.) Now of course T̃η (x), regarded as a map from R r to R s+r , is injective, so
$$\tilde{J}_\eta(x) = \sqrt{\det\tilde{T}_\eta(x)'\tilde{T}_\eta(x)} = \sqrt{\det(T(x)'T(x) + \eta^2 I)} > 0.$$
We have $\lim_{\eta\downarrow 0}\tilde{J}_\eta(x) = J(x) = 0$ for x ∈ D \ D′ .

(ii) Express T (x) as $\langle\tau_{ij}(x)\rangle_{i\le s, j\le r}$ for each x ∈ D. Set
$$C_m = \{x : x \in D,\ \|x\| \le m,\ |\tau_{ij}(x)| \le m \text{ for all } i \le s,\ j \le r\}$$
for each m ≥ 1. For x ∈ Cm , all the coefficients of T̃η (x) have moduli at most m; consequently (giving the crudest
and most immediately available inequalities) all the coefficients of T̃η (x)′ T̃η (x) have moduli at most $(r + s)m^2$ and
$\tilde{J}_\eta(x) \le \sqrt{r!(s + r)^r}\,m^r$. Consequently we can use Lebesgue’s Dominated Convergence Theorem to see that
$$\lim_{\eta\downarrow 0}\int_{C_m\setminus D'}\tilde{J}_\eta\,d\mu_r = 0.$$

(iii) Let ν̃r be normalized Hausdorff r-dimensional measure on R s+r . Applying (b) of this proof to ψη ↾Cm \ D′ ,
we see that
$$\tilde{\nu}_r^*\psi_\eta[C_m \setminus D'] \le \int_{C_m\setminus D'}\tilde{J}_\eta\,d\mu_r.$$
Now we have a natural map P : R s+r → R s given by setting P (ξ1 , . . . , ξs+r ) = (ξ1 , . . . , ξs ), and P is 1-Lipschitz, so
by 264G we have (allowing for the normalizing constants $2^{-r}\beta_r$)
$$\nu_r^* P[A] \le \tilde{\nu}_r^* A$$
for every A ⊆ R s+r . In particular,
$$\nu_r^*\phi[C_m \setminus D'] = \nu_r^* P[\psi_\eta[C_m \setminus D']] \le \tilde{\nu}_r^*\psi_\eta[C_m \setminus D'] \le \int_{C_m\setminus D'}\tilde{J}_\eta\,d\mu_r \to 0$$
as η ↓ 0. But this means that $\nu_r^*\phi[C_m \setminus D'] = 0$. As $D = \bigcup_{m\ge 1} C_m$, $\nu_r^*\phi[D \setminus D'] = 0$, as claimed. Q

(d) This proves (iii) of the theorem. But of course this is enough to give (ii) and (v), because we must have
$$\nu_r^*\phi[D] = \nu_r^*\phi[D'] \le \int_{D'} J\,d\mu_r = \int_D J\,d\mu_r,$$
with equality if D is measurable and φ is injective.

(e) So let us turn to part (vi). Assume that D is measurable and that φ is injective.

(i) Suppose that E ⊆ φ[D] belongs to Tr . Let


Hk = {x : x ∈ D, kxk ≤ k, J(x) ≤ k}
for each k; then each Hk is Lebesgue measurable, so (applying (iii) to φ↾Hk ) φ[Hk ] ∈ Tr , and
νr φ[Hk ] ≤ kµr Hk < ∞.
265F Surface measures 321

Thus φ[D] can be covered by a sequence of sets of finite measure for νr , which of course are of finite measure for
r-dimensional Hausdorff measure on R s . By 264Fc, there are Borel sets E1 , E2 ⊆ R s such that E1 ⊆ E ⊆ E2 and
νr (E2 \ E1 ) = 0. Now F1 = φ−1 [E1 ], F2 = φ−1 [E2 ] are Lebesgue measurable subsets of D, and
\[ \int_{F_2\setminus F_1} J\,d\mu_r = \nu_r\phi[F_2\setminus F_1] = \nu_r(\phi[D]\cap E_2\setminus E_1) = 0. \]
Accordingly µr (D′ ∩ (F2 \ F1 )) = 0. But as
D′ ∩ F1 ⊆ D′ ∩ φ−1 [E] ⊆ D′ ∩ F2 ,
it follows that D′ ∩ φ−1 [E] is measurable, and that
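\begin{align*}
\int_{\phi^{-1}[E]} J\,d\mu_r &= \int_{D'\cap\phi^{-1}[E]} J\,d\mu_r = \int_{D'\cap F_1} J\,d\mu_r\\
&= \int_{D\cap F_1} J\,d\mu_r = \nu_r\phi[D\cap F_1] = \nu_r E_1 = \nu_r E.
\end{align*}
Moreover, $J\times\chi(\phi^{-1}[E]) = J\times\chi(D'\cap\phi^{-1}[E])$ is measurable, so we can write $\int J\times\chi(\phi^{-1}[E])$ in place of $\int_{\phi^{-1}[E]} J$.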

(ii) If E ⊆ φ[D] and D′ ∩ φ−1 [E] is measurable, then of course


E = φ[D′ ∩ φ−1 [E]] ∪ φ[(D \ D′ ) ∩ φ−1 [E]] ∈ Tr ,
because φ[G] ∈ Tr for every measurable G ⊆ D and φ[D \ D′ ] is νr -negligible.
(f) Finally, (vii) follows at once from (vi), applying 235J to $\mu_r$ and the subspace measure induced by $\nu_r$ on φ[D].

265F The surface of a sphere To show how these ideas can be applied to one of the basic cases, I give the details
of a method of describing spherical surface measure in s-dimensional space. Take r ≥ 1 and s = r + 1. Write $S_r$ for
$\{z : z \in \mathbb{R}^{r+1},\ \|z\| = 1\}$, the r-sphere. Then we have a parametrization $\phi_r$ of $S_r$ given by setting
\[ \phi_r\begin{pmatrix}\xi_1\\ \xi_2\\ \vdots\\ \xi_r\end{pmatrix} = \begin{pmatrix}
\sin\xi_1\sin\xi_2\sin\xi_3\cdots\sin\xi_r\\
\cos\xi_1\sin\xi_2\sin\xi_3\cdots\sin\xi_r\\
\cos\xi_2\sin\xi_3\cdots\sin\xi_r\\
\vdots\\
\cos\xi_{r-2}\sin\xi_{r-1}\sin\xi_r\\
\cos\xi_{r-1}\sin\xi_r\\
\cos\xi_r
\end{pmatrix}. \]
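I choose this formulation because I wish to use an inductive argument based on the fact that
\[ \phi_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}\sin\xi\,\phi_r(x)\\ \cos\xi\end{pmatrix} \]
for $x \in \mathbb{R}^r$, $\xi \in \mathbb{R}$. Every $\phi_r$ is differentiable, by 262Id. If we set
\[ D_r = \{x : \xi_1 \in \,]{-\pi},\pi],\ \xi_2,\ldots,\xi_r \in [0,\pi],\ \text{if } \xi_j \in \{0,\pi\} \text{ then } \xi_i = 0 \text{ for } i < j\}, \]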
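then it is easy to check that $D_r$ is a Borel subset of $\mathbb{R}^r$ and that $\phi_r\restriction D_r$ is a bijection between $D_r$ and $S_r$. Now let
$T_r(x)$ be the $(r+1)\times r$ matrix $\phi_r'(x)$. Then
\[ T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}\sin\xi\,T_r(x) & \cos\xi\,\phi_r(x)\\ 0 & -\sin\xi\end{pmatrix}. \]
So
\[ \Bigl(T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}\Bigr)' T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}\sin^2\xi\,T_r(x)'T_r(x) & \sin\xi\cos\xi\,T_r(x)'\phi_r(x)\\ \cos\xi\sin\xi\,\phi_r(x)'T_r(x) & \cos^2\xi\,\phi_r(x)'\phi_r(x) + \sin^2\xi\end{pmatrix}. \]
But of course $\phi_r(x)'\phi_r(x) = \|\phi_r(x)\|^2 = 1$ for every x, and (differentiating with respect to each coordinate of x, if you
wish) $T_r(x)'\phi_r(x) = 0$, $\phi_r(x)'T_r(x) = 0$. So we get
\[ \Bigl(T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}\Bigr)' T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}\sin^2\xi\,T_r(x)'T_r(x) & 0\\ 0 & 1\end{pmatrix}, \]
and writing $J_r(x) = \sqrt{\det T_r(x)'T_r(x)}$,
\[ J_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = |\sin^r\xi|\,J_r(x). \]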

At this point we induce on r to see that
\[ J_r(x) = |\sin^{r-1}\xi_r\,\sin^{r-2}\xi_{r-1}\cdots\sin\xi_2| \]
(since of course the induction starts with the case r = 1, $\phi_1(x) = \begin{pmatrix}\sin x\\ \cos x\end{pmatrix}$, $T_1(x) = \begin{pmatrix}\cos x\\ -\sin x\end{pmatrix}$, $T_1(x)'T_1(x) = 1$,
$J_1(x) = 1$).
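To find the surface measure of $S_r$, we need to calculate
\begin{align*}
\int_{D_r} J_r\,d\mu_r &= \int_0^\pi \cdots \int_0^\pi \int_{-\pi}^{\pi} \sin^{r-1}\xi_r \cdots \sin\xi_2\,d\xi_1\,d\xi_2 \cdots d\xi_r\\
&= 2\pi\prod_{k=2}^{r}\int_0^\pi \sin^{k-1}t\,dt = 2\pi\prod_{k=1}^{r-1}\int_{-\pi/2}^{\pi/2}\cos^k t\,dt
\end{align*}
(substituting $\frac{\pi}{2} - t$ for t). But in the language of 252Q, this is just
\[ 2\pi\prod_{k=1}^{r-1} I_k = 2\pi\beta_{r-1}, \]
where $\beta_{r-1}$ is the volume of the unit ball of $\mathbb{R}^{r-1}$ (interpreting $\beta_0$ as 1, if you like).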
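
As a sanity check on this calculation, the product of integrals can be evaluated numerically and compared with $2\pi\beta_{r-1}$. The sketch below is an illustration only: it assumes the standard closed form $\beta_r = \pi^{r/2}/\Gamma(\frac{r}{2}+1)$ for the volume of the unit ball, approximates each $I_k$ by a midpoint rule, and uses function names that are not part of the text.

```python
import math


def ball_volume(r):
    """beta_r, the Lebesgue measure of the unit ball of R^r (beta_0 = 1)."""
    return math.pi ** (r / 2) / math.gamma(r / 2 + 1)


def cos_power_integral(k, steps=20000):
    """Midpoint-rule approximation to I_k, the integral of cos^k over [-pi/2, pi/2]."""
    h = math.pi / steps
    return h * sum(math.cos(-math.pi / 2 + (i + 0.5) * h) ** k
                   for i in range(steps))


def sphere_area(r):
    """2*pi times the product of I_1, ..., I_{r-1}, as in the calculation above."""
    area = 2 * math.pi
    for k in range(1, r):
        area *= cos_power_integral(k)
    return area


for r in range(1, 5):
    print(r, sphere_area(r), 2 * math.pi * ball_volume(r - 1),
          (r + 1) * ball_volume(r + 1))
```

The three printed columns should agree up to quadrature error; for r = 2 they are all approximately 4π, the area of the ordinary unit sphere in three dimensions. The last column anticipates the formula $(r+1)\beta_{r+1}$ of 265H.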

265G The surface area of a sphere can also be calculated through the following result.
Theorem Let $\mu_{r+1}$ be Lebesgue measure on $\mathbb{R}^{r+1}$, and $\nu_r$ normalized r-dimensional Hausdorff measure on $\mathbb{R}^{r+1}$. If f
is a locally $\mu_{r+1}$-integrable real-valued function, $y \in \mathbb{R}^{r+1}$ and δ > 0,
\[ \int_{B(y,\delta)} f\,d\mu_{r+1} = \int_0^\delta \int_{\partial B(y,t)} f\,d\nu_r\,dt, \]
where I write $\partial B(y,s)$ for the sphere $\{x : \|x - y\| = s\}$ and the integral $\int \ldots dt$ is to be taken with respect to Lebesgue
measure on $\mathbb{R}$.
proof Take any differentiable function $\phi : \mathbb{R}^r \to S_r$ with a Borel set $F \subseteq \mathbb{R}^r$ such that $\phi\restriction F$ is a bijection between F
and $S_r$; such a pair (φ, F) is described in 265F. Define $\psi : \mathbb{R}^r \times \mathbb{R} \to \mathbb{R}^{r+1}$ by setting $\psi(z,t) = y + t\phi(z)$; then ψ is
differentiable and $\psi\restriction F\times\,]0,\delta]$ is a bijection between $F\times\,]0,\delta]$ and $B(y,\delta)\setminus\{y\}$. For $t \in \,]0,\delta]$, $z \in \mathbb{R}^r$ set $\psi_t(z) = \psi(z,t)$;
then $\psi_t\restriction F$ is a bijection between F and the sphere $\{x : \|x - y\| = t\} = \partial B(y,t)$.
The derivative of φ at z is an $(r+1)\times r$ matrix $T_1(z)$ say, and the derivative $T_t(z)$ of $\psi_t$ at z is just $tT_1(z)$; also
the derivative of ψ at (z, t) is the $(r+1)\times(r+1)$ matrix $T(z,t) = (\,tT_1(z)\ \ \phi(z)\,)$, where φ(z) is interpreted as a
column vector. If we set
\[ J_t(z) = \sqrt{\det T_t(z)'T_t(z)}, \quad J(z,t) = |\det T(z,t)|, \]
then
\begin{align*}
J(z,t)^2 &= \det T(z,t)'T(z,t) = \det\begin{pmatrix}tT_1(z)'\\ \phi(z)'\end{pmatrix}\begin{pmatrix}tT_1(z) & \phi(z)\end{pmatrix}\\
&= \det\begin{pmatrix}t^2T_1(z)'T_1(z) & 0\\ 0 & 1\end{pmatrix} = J_t(z)^2,
\end{align*}
because when we come to calculate the (i, r + 1)-coefficient of $T(z,t)'T(z,t)$, for 1 ≤ i ≤ r, it is
\[ \sum_{j=1}^{r+1} t\,\frac{\partial\phi_j}{\partial\zeta_i}(z)\,\phi_j(z) = \frac{t}{2}\,\frac{\partial}{\partial\zeta_i}\Bigl(\sum_{j=1}^{r+1}\phi_j(z)^2\Bigr) = 0, \]
where $\phi_j$ is the jth coordinate of φ; while the (r + 1, r + 1)-coefficient of $T(z,t)'T(z,t)$ is just $\sum_{j=1}^{r+1}\phi_j(z)^2 = 1$. So in
fact $J(z,t) = J_t(z)$ for all $z \in \mathbb{R}^r$, t > 0.
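Now, given $f \in \mathcal{L}^1(\mu_{r+1})$, we can calculate
\begin{align*}
\int_{B(y,\delta)} f\,d\mu_{r+1} &= \int_{B(y,\delta)\setminus\{y\}} f\,d\mu_{r+1}\\
&= \int_{F\times]0,\delta]} f(\psi(z,t))J(z,t)\,\mu_{r+1}(d(z,t))
&&\text{(by 263D)}\\
&= \int_0^\delta \int_F f(\psi_t(z))J_t(z)\,\mu_r(dz)\,dt
&&\text{(where $\mu_r$ is Lebesgue measure on $\mathbb{R}^r$, by Fubini’s theorem, 252B)}\\
&= \int_0^\delta \int_{\partial B(y,t)} f\,d\nu_r\,dt
\end{align*}
by 265E(vii).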

265H Corollary If $\nu_r$ is normalized r-dimensional Hausdorff measure on $\mathbb{R}^{r+1}$, then $\nu_r S_r = (r+1)\beta_{r+1}$.

proof In 265G, take y = 0, δ = 1, and f = χB(0, 1); then
\[ \beta_{r+1} = \int f\,d\mu_{r+1} = \int_0^1 \nu_r(\partial B(0,t))\,dt = \int_0^1 t^r\,\nu_r S_r\,dt = \frac{1}{r+1}\,\nu_r S_r, \]
applying 264G to the maps $x \mapsto tx$, $x \mapsto \frac{1}{t}x$ from $\mathbb{R}^{r+1}$ to itself to see that $\nu_r(\partial B(0,t)) = t^r\nu_r S_r$ for t > 0.

265X Basic exercises (a) Let r ≥ 1, and let $S_r(\alpha) = \{z : z \in \mathbb{R}^{r+1},\ \|z\| = \alpha\}$ be the r-sphere of radius α. Show
that $\nu_r S_r(\alpha) = 2\pi\beta_{r-1}\alpha^r = (r+1)\beta_{r+1}\alpha^r$ for every α ≥ 0.

> (b) Let r ≥ 1, and for a ∈ [−1, 1] set $C_a = \{z : z \in \mathbb{R}^{r+1},\ \|z\| = 1,\ \zeta_1 \ge a\}$, writing $z = (\zeta_1,\ldots,\zeta_{r+1})$ as usual.
Show that
\[ \nu_r C_a = r\beta_r\int_0^{\arccos a}\sin^{r-1}t\,dt. \]

> (c) Again write $C_a = \{z : z \in S_r,\ \zeta_1 \ge a\}$, where $S_r \subseteq \mathbb{R}^{r+1}$ is the unit sphere. Show that, for any a ∈ ]0, 1],
\[ \nu_r C_a \le \frac{\nu_r S_r}{2(r+1)a^2}. \]
(Hint: calculate $\sum_{i=1}^{r+1}\int_{S_r}\|\xi_i\|^2\,\nu_r(dx)$.)

> (d) Let $\phi : \,]0,1[\, \to \mathbb{R}^r$ be an injective differentiable function. Show that the ‘length’ or one-dimensional Hausdorff
measure of φ[ ]0, 1[ ] is just $\int_0^1 \|\phi'(t)\|\,dt$.

(e)(i) Show that if I is the identity r × r matrix and $z \in \mathbb{R}^r$, then $\det(I + zz') = 1 + \|z\|^2$. (Hint: induce on r.) (ii)
Write $U_{r-1}$ for the open unit ball in $\mathbb{R}^{r-1}$, where r ≥ 2. Define $\phi : U_{r-1}\times\mathbb{R} \to S_r$ by setting
\[ \phi\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}x\\ \theta(x)\cos\xi\\ \theta(x)\sin\xi\end{pmatrix}, \]
where $\theta(x) = \sqrt{1 - \|x\|^2}$. Show that
\[ \phi'\begin{pmatrix}x\\ \xi\end{pmatrix}'\,\phi'\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}I + \frac{1}{\theta(x)^2}xx' & 0\\ 0 & \theta(x)^2\end{pmatrix}, \]
so that $J\begin{pmatrix}x\\ \xi\end{pmatrix} = 1$ for all $x \in U_{r-1}$, $\xi \in \mathbb{R}$. (iii) Hence show that the normalized r-dimensional Hausdorff measure of
$\{y : y \in S_r,\ \sum_{i=1}^{r-1}\eta_i^2 < 1\}$ is just $2\pi\beta_{r-1}$, where $\beta_{r-1}$ is the Lebesgue measure of $U_{r-1}$. (iv) By considering $\psi z = \begin{pmatrix}z\\ 0\\ 0\end{pmatrix}$
for $z \in S_{r-2}$, or otherwise, show that the normalized r-dimensional Hausdorff measure of $S_r$ is $2\pi\beta_{r-1}$. (v) Setting
$C_a = \{z : z \in \mathbb{R}^{r+1},\ \|z\| = 1,\ \zeta_r \ge a\}$, as in 265Xb and 265Xc, show that $\nu_r C_a = 2\pi\mu_{r-1}\{x : x \in \mathbb{R}^{r-1},\ \|x\| \le 1,\ \xi_1 \ge a\}$
for every a ∈ [−1, 1].
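
Before attempting 265Xe, it may help to see its two determinant claims confirmed on sample vectors. The following is a numerical sketch only, not a solution to the exercise: the helper names are hypothetical, and the determinant is computed by ordinary Gaussian elimination rather than by the induction the hint suggests.

```python
def det(a):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] for row in a]
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda i: abs(m[i][c]))
        if m[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for i in range(c + 1, n):
            factor = m[i][c] / m[c][c]
            for j in range(c, n):
                m[i][j] -= factor * m[c][j]
    return result


def identity_plus_outer(z, scale=1.0):
    """Return the matrix I + scale * z z'."""
    n = len(z)
    return [[(1.0 if i == j else 0.0) + scale * z[i] * z[j] for j in range(n)]
            for i in range(n)]


def cylinder_jacobian_squared(x):
    """Determinant of the block matrix in (e)(ii), with theta(x)^2 = 1 - ||x||^2."""
    theta_sq = 1.0 - sum(c * c for c in x)
    return det(identity_plus_outer(x, 1.0 / theta_sq)) * theta_sq


z = [0.5, -1.0, 2.0]
print(det(identity_plus_outer(z)), 1 + sum(c * c for c in z))
print(cylinder_jacobian_squared([0.3, 0.4]))
```

The first line prints two values that should both be close to 6.25, and the second a value close to 1, in line with the claim that the surrounding-cylinder parametrization has Jacobian 1.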

265Y Further exercises (a) Take a < b in $\mathbb{R}$. (i) Show that $\phi : [a,b] \to \mathbb{R}^r$ is absolutely continuous in the
sense of 264Yp iff all its coordinates $\phi_i : [a,b] \to \mathbb{R}$, for i ≤ r, are absolutely continuous in the sense of §225. (ii) Let
$\phi : [a,b] \to \mathbb{R}^r$ be a continuous function, and set F = {x : x ∈ ]a, b[ , φ is differentiable at x}. Show that φ is absolutely
continuous iff $\int_F \|\phi'(x)\|\,dx$ is finite and $\nu_1(\phi[[a,b]\setminus F]) = 0$, where $\nu_1$ is normalized Hausdorff one-dimensional measure
on $\mathbb{R}^r$. (Hint: 225K.) (iii) Show that if $\phi : [a,b] \to \mathbb{R}^r$ is absolutely continuous then $\nu_1^*(\phi[D]) \le \int_D \|\phi'(x)\|\,dx$ for every
D ⊆ [a, b], with equality if D is measurable and φ↾D is injective.
(b) Suppose that a ≤ b in $\mathbb{R}$, and that $f : [a,b] \to \mathbb{R}$ is a continuous function of bounded variation with graph $\Gamma_f$.
Show that the one-dimensional Hausdorff measure of $\Gamma_f$ is $\operatorname{Var}_{[a,b]}(f) + \int_a^b \bigl(\sqrt{1 + (f')^2} - |f'|\bigr)$. (Hint: set E = dom f′
and examine $\Gamma_{f\restriction E}$, $\Gamma_f \setminus \Gamma_{f\restriction E}$ separately; use 264Xf and/or 264Yl.)

265 Notes and comments The proof of 265B seems to call on most of the second half of the alphabet. The idea is
supposed to be straightforward enough. Because T [R r ] has dimension at most r, it can be rotated by an orthogonal
transformation P into a subspace of the canonical r-dimensional subspace V , which is a natural copy of R r ; the matrix
R represents the copying process from V to R r , and φ or P ′ R′ is a copy of R r onto a subspace including T [Rr ]. All
this copying back and forth is designed to turn T into a linear operator S : R r → R r to which we can apply 263A, and
part (b) of the proof is the check that we are copying the measures as well as the linear structures.
In 265D-265E I have tried to follow 263C-263D as closely as possible. In fact only one new idea is needed. When
s = r, we have a special argument available to show that $\mu_r^*\phi[D] \le J\mu_r^*D + \epsilon\mu_r^*D$ (in the language of 263C) which
applies whether or not J = 0. When s > r, this approach fails, because we can no longer approximate νr T [B] by νr G
where G ⊇ T [B] is open. (See part (b-i) of the proof of 263C.) I therefore turn to a different argument, valid only when
J > 0, and accordingly have to find a separate method to show that {φ(x) : x ∈ D, J(x) = 0} is νr -negligible. Since
we are working without restrictions on the dimensions r, s except that r ≤ s, we can use the trick of approximating
φ : D → R s by ψη : D → R s+r , as in part (d) of the proof of 265E.
I give three methods by which the area of the r-sphere can be calculated; a bare-hands approach (265F), the
surrounding-cylinder method (265Xe) and an important repeated-integral theorem (265G). The first two provide for-
mulae for the area of a cap (265Xb, 265Xe(v)). The surrounding-cylinder method is attractive because the Jacobian
comes out to be 1, that is, we have an inverse-measure-preserving function. I note that despite having developed a
technique which allows irregular domains, I am still forced by the singularity in the function θ of 265Xe to take the
sphere in two bites. Theorem 265G is a special case of the Coarea Theorem (Evans & Gariepy 92, §3.4; Federer
69, 3.2.12).
For the next step in the geometric theory of measures on Euclidean space, see Chapter 47 in Volume 4.

*266 The Brunn-Minkowski inequality


We now have most of the essential ingredients for a proof of the Brunn-Minkowski inequality (266C) in a strong
form. I do not at present expect to use it in this treatise, but it is one of the basic results of geometric measure theory
and from where we now stand is not difficult, so I include it here. The preliminary results on arithmetic and geometric
means (266A) and essential closures (266B) are of great importance for other reasons.

266A Arithmetic and geometric means We shall need the following standard result.
Proposition If $u_0,\ldots,u_n,\ p_0,\ldots,p_n \in [0,\infty[$ and $\sum_{i=0}^n p_i = 1$, then $\prod_{i=0}^n u_i^{p_i} \le \sum_{i=0}^n p_iu_i$.
proof Induce on n. For n = 0, $p_0 = 1$ the result is trivial. If n = 1, then if $u_1 = 0$ the result is trivial (even if, as is
standard in this book, we interpret $0^0$ as 1). Otherwise, set $t = u_0/u_1$; then
\[ t^{p_0} \le p_0t + 1 - p_0 = p_0t + p_1 \]
(as in part (a) of the proof of 244E), so
\[ u_0^{p_0}u_1^{p_1} = t^{p_0}u_1 \le p_0tu_1 + p_1u_1 = p_0u_0 + p_1u_1. \]
For the inductive step to n ≥ 2, if $p_0 = \ldots = p_{n-1} = 0$ the result is trivial. Otherwise, set $q = p_0 + \ldots + p_{n-1} = 1 - p_n$;
then
\begin{align*}
\prod_{i=0}^n u_i^{p_i} = \Bigl(\prod_{i=0}^{n-1} u_i^{p_i/q}\Bigr)^q u_n^{p_n}
&\le \Bigl(\sum_{i=0}^{n-1} \frac{p_i}{q}u_i\Bigr)^q u_n^{p_n}
&&\text{(by the inductive hypothesis)}\\
&\le q\Bigl(\sum_{i=0}^{n-1} \frac{p_i}{q}u_i\Bigr) + p_nu_n
&&\text{(by the two-term case just examined)}\\
&= \sum_{i=0}^n p_iu_i,
\end{align*}
and the induction continues.
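
The inequality of 266A is easy to test numerically. The sketch below is illustrative only (the function names and the random sampling scheme are my own assumptions, not part of the text); it compares the weighted geometric and arithmetic means on a fixed example and on randomly generated weights. Python evaluates `0.0 ** 0.0` as 1.0, which agrees with the convention $0^0 = 1$ mentioned in the proof.

```python
import math
import random


def weighted_geometric_mean(u, p):
    """prod u_i^{p_i} for non-negative u_i and weights p_i summing to 1."""
    return math.prod(ui ** pi for ui, pi in zip(u, p))


def weighted_arithmetic_mean(u, p):
    """sum p_i u_i."""
    return sum(pi * ui for ui, pi in zip(u, p))


def largest_violation(trials=1000, seed=0):
    """Largest value of (geometric mean - arithmetic mean) over random cases."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        n = rng.randint(1, 6)
        raw = [rng.random() + 1e-9 for _ in range(n)]
        total = sum(raw)
        p = [w / total for w in raw]            # weights summing to 1
        u = [rng.uniform(0.0, 10.0) for _ in range(n)]
        gap = weighted_geometric_mean(u, p) - weighted_arithmetic_mean(u, p)
        worst = max(worst, gap)
    return worst


print(weighted_geometric_mean([1.0, 4.0], [0.5, 0.5]),
      weighted_arithmetic_mean([1.0, 4.0], [0.5, 0.5]))
print(largest_violation())
```

For u = (1, 4) with equal weights the two means are 2 and 2.5. The second line should print a value that is at most zero up to rounding error.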

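266B Proposition For any set $D \subseteq \mathbb{R}^r$ set
\[ \operatorname{cl}^*D = \Bigl\{x : \limsup_{\delta\downarrow 0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)} > 0\Bigr\}, \]
where µ is Lebesgue measure on $\mathbb{R}^r$.
(a) D \ cl*D is negligible.
(b) cl*D ⊆ $\overline{D}$.
(c) cl*D is a Borel set.
(d) $\mu(\operatorname{cl}^*D) = \mu^*D$.
(e) If $C \subseteq \mathbb{R}^r$ then $\overline{C} + \operatorname{cl}^*D \subseteq \operatorname{cl}^*(C + D)$, writing C + D for {x + y : x ∈ C, y ∈ D}.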
proof (a) 261Da.
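(b) If $x \in \mathbb{R}^r \setminus \overline{D}$ then D ∩ B(x, δ) = ∅ for all small δ.
(c) The point is just that $(x,\delta) \mapsto \mu^*(D\cap B(x,\delta))$ is continuous. P For any $x, y \in \mathbb{R}^r$ and δ, η ≥ 0 we have
\begin{align*}
|\mu^*(D\cap B(y,\eta)) - \mu^*(D\cap B(x,\delta))| &\le \mu(B(y,\eta)\triangle B(x,\delta))\\
&= 2\mu(B(x,\delta)\cup B(y,\eta)) - \mu B(x,\delta) - \mu B(y,\eta)\\
&\le \beta_r\bigl(2(\max(\delta,\eta) + \|x-y\|)^r - \delta^r - \eta^r\bigr)
&&\text{(where $\beta_r = \mu B(0,1)$)}\\
&\to 0
\end{align*}
as (y, η) → (x, δ). Q So
\[ x \mapsto \limsup_{\delta\downarrow 0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)} = \inf_{\alpha\in\mathbb{Q},\,\alpha>0}\ \sup_{\beta\in\mathbb{Q},\,0<\beta\le\alpha}\frac{1}{\beta_r\beta^r}\,\mu^*(D\cap B(x,\beta)) \]
is Borel measurable, and
\[ \operatorname{cl}^*D = \Bigl\{x : \limsup_{\delta\downarrow 0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)} > 0\Bigr\} \]
is a Borel set.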
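(d) By (c), µ(cl*D) is defined; by (a), $\mu(\operatorname{cl}^*D) \ge \mu^*D$. On the other hand, let E be a measurable envelope of D
(132Ee); then 261Db tells us that
\[ \limsup_{\delta\downarrow 0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)} \le \limsup_{\delta\downarrow 0}\frac{\mu(E\cap B(x,\delta))}{\mu B(x,\delta)} = 0 \]
for almost every $x \in \mathbb{R}^r \setminus E$, so cl*D \ E is negligible and
\[ \mu(\operatorname{cl}^*D) \le \mu E = \mu^*D. \]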

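(e) If $x \in \overline{C}$ and y ∈ cl*D, set
\[ \gamma = \frac{1}{3}\limsup_{\delta\downarrow 0}\frac{\mu^*(D\cap B(y,\delta))}{\mu B(y,\delta)} > 0. \]
For any η > 0, there is a δ ∈ ]0, η] such that $\mu^*(D\cap B(y,\delta)) \ge 2\gamma\mu B(x,\delta)$. Let $\delta_1 \in [0,\delta[$ be such that $\delta^r - \delta_1^r \le \gamma\delta^r$.
Then there is an x′ ∈ C such that $\|x - x'\| \le \delta - \delta_1$. In this case,
\begin{align*}
\mu^*((C+D)\cap B(x+y,\delta)) &\ge \mu^*((x'+D)\cap B(x'+y,\delta_1)) = \mu^*(D\cap B(y,\delta_1))\\
&\ge \mu^*(D\cap B(y,\delta)) - \mu B(y,\delta) + \mu B(y,\delta_1)\\
&\ge 2\beta_r\gamma\delta^r - \beta_r\delta^r + \beta_r\delta_1^r \ge \beta_r\gamma\delta^r.
\end{align*}
As η is arbitrary,
\[ \limsup_{\delta\downarrow 0}\frac{\mu^*((C+D)\cap B(x+y,\delta))}{\mu B(y,\delta)} \ge \gamma \]
and x + y ∈ cl*(C + D); as x and y are arbitrary, $\overline{C} + \operatorname{cl}^*D \subseteq \operatorname{cl}^*(C + D)$.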


Remark In this context, cl*D is called the essential closure of D.

266C Theorem Let $A, B \subseteq \mathbb{R}^r$ be non-empty sets, where r ≥ 1 is an integer. If µ is Lebesgue measure on $\mathbb{R}^r$, and
A + B = {x + y : x ∈ A, y ∈ B}, then $\mu^*(A+B)^{1/r} \ge (\mu^*A)^{1/r} + (\mu^*B)^{1/r}$.
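proof (a) Consider first the case in which A = [a, a′[ and B = [b, b′[ are half-open intervals. In this case A + B =
[a + b, a′ + b′[; writing $a = (\alpha_1,\ldots,\alpha_r)$, etc., as in §115, set
\[ u_i = \frac{\alpha_i' - \alpha_i}{\alpha_i' + \beta_i' - \alpha_i - \beta_i}, \quad v_i = \frac{\beta_i' - \beta_i}{\alpha_i' + \beta_i' - \alpha_i - \beta_i} \]
for each i. Then we have
\begin{align*}
(\mu A)^{1/r} + (\mu B)^{1/r} &= \prod_{i=1}^r (\alpha_i' - \alpha_i)^{1/r} + \prod_{i=1}^r (\beta_i' - \beta_i)^{1/r}\\
&= \mu(A+B)^{1/r}\Bigl(\prod_{i=1}^r u_i^{1/r} + \prod_{i=1}^r v_i^{1/r}\Bigr)\\
&\le \mu(A+B)^{1/r}\Bigl(\frac{1}{r}\sum_{i=1}^r u_i + \frac{1}{r}\sum_{i=1}^r v_i\Bigr)
&&\text{(266A)}\\
&= \mu(A+B)^{1/r}.
\end{align*}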

(b) Now I show by induction on m + n that if $A = \bigcup_{j=0}^m A_j$ and $B = \bigcup_{j=0}^n B_j$, where $\langle A_j\rangle_{j\le m}$ and $\langle B_j\rangle_{j\le n}$ are
both disjoint families of non-empty half-open intervals, then $\mu(A+B)^{1/r} \ge (\mu A)^{1/r} + (\mu B)^{1/r}$. P The induction starts
with the case m = n = 0, dealt with in (a). For the inductive step to m + n = l ≥ 1, one of m, n is non-zero; the
argument is the same in both cases; suppose the former. Since $A_0 \cap A_1 = \emptyset$, there must be some j ≤ r and $\alpha \in \mathbb{R}$ such
that $A_0$ and $A_1$ are separated by the hyperplane $\{x : \xi_j = \alpha\}$. Set $A' = \{x : x \in A,\ \xi_j < \alpha\}$ and $A'' = \{x : x \in A,\ \xi_j \ge \alpha\}$;
then both A′ and A′′ are non-empty and can be expressed as the union of at most m disjoint half-open
intervals. Set $\gamma = \frac{\mu A'}{\mu A} \in \,]0,1[$. The function $\beta \mapsto \mu\{x : x \in B,\ \xi_j < \beta\}$ is continuous, so there is a $\beta \in \mathbb{R}$ such that
$\mu B' = \gamma\mu B$, where $B' = \{x : x \in B,\ \xi_j < \beta\}$; set $B'' = B \setminus B'$. Then B′ and B′′ can be expressed as unions of at most
n + 1 half-open intervals. By the inductive hypothesis,
\[ \mu(A'+B')^{1/r} \ge (\mu A')^{1/r} + (\mu B')^{1/r}, \quad \mu(A''+B'')^{1/r} \ge (\mu A'')^{1/r} + (\mu B'')^{1/r}. \]
Now $A' + B' \subseteq \{x : \xi_j < \alpha + \beta\}$, while $A'' + B'' \subseteq \{x : \xi_j \ge \alpha + \beta\}$. So
\begin{align*}
\mu(A+B) &\ge \mu(A'+B') + \mu(A''+B'')\\
&\ge \bigl((\mu A')^{1/r} + (\mu B')^{1/r}\bigr)^r + \bigl((\mu A'')^{1/r} + (\mu B'')^{1/r}\bigr)^r\\
&= \bigl((\gamma\mu A)^{1/r} + (\gamma\mu B)^{1/r}\bigr)^r + \bigl(((1-\gamma)\mu A)^{1/r} + ((1-\gamma)\mu B)^{1/r}\bigr)^r\\
&= \bigl((\mu A)^{1/r} + (\mu B)^{1/r}\bigr)^r.
\end{align*}
Taking rth roots, $\mu(A+B)^{1/r} \ge (\mu A)^{1/r} + (\mu B)^{1/r}$ and the induction proceeds. Q
(c) Now suppose that A and B are compact non-empty subsets of $\mathbb{R}^r$. Then $\mu(A+B)^{1/r} \ge (\mu A)^{1/r} + (\mu B)^{1/r}$. P
A + B is compact (because $A \times B \subseteq \mathbb{R}^r \times \mathbb{R}^r$ is compact, being closed and bounded, and addition is continuous, so we
can use 2A2Eb). Let ε > 0. Let G ⊇ A + B be an open set such that µG ≤ µ(A + B) + ε (134Fa); then there is a δ > 0
such that B(x, 2δ) ⊆ G for every x ∈ A + B (2A2Ed). Let $n \in \mathbb{N}$ be such that $2^{-n}\sqrt{r} \le \delta$, and let $A_1$ be the union of
all the half-open intervals of the form $[2^{-n}z,\ 2^{-n}z + 2^{-n}e[$ which meet A, where $z \in \mathbb{Z}^r$ and e = (1, 1, . . . , 1). Then $A_1$
is a finite disjoint union of half-open intervals, $A \subseteq A_1$ and every point of $A_1$ is within a distance δ of some point of A.
Similarly, we can find a set $B_1$, a finite disjoint union of half-open intervals, including B and such that every point of
$B_1$ is within δ of some point of B. But this means that every point of $A_1 + B_1$ is within a distance 2δ of some point
of A + B, and belongs to G. Accordingly
\begin{align*}
(\mu(A+B) + \epsilon)^{1/r} \ge (\mu G)^{1/r} \ge \mu(A_1+B_1)^{1/r} &\ge (\mu A_1)^{1/r} + (\mu B_1)^{1/r}
&&\text{(by (b))}\\
&\ge (\mu A)^{1/r} + (\mu B)^{1/r}.
\end{align*}
As ε is arbitrary, $\mu(A+B)^{1/r} \ge (\mu A)^{1/r} + (\mu B)^{1/r}$. Q
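(d) Next suppose that $A, B \subseteq \mathbb{R}^r$ are Lebesgue measurable. Then
\begin{align*}
(\mu A)^{1/r} + (\mu B)^{1/r} &= \sup\{(\mu K)^{1/r} + (\mu L)^{1/r} : K \subseteq A \text{ and } L \subseteq B \text{ are compact}\}
&&\text{(134Fb)}\\
&\le \sup\{\mu(K+L)^{1/r} : K \subseteq A \text{ and } L \subseteq B \text{ are compact}\}
&&\text{(by (c))}\\
&\le \mu^*(A+B)^{1/r}.
\end{align*}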

(e) For the penultimate step, suppose that $A, B \subseteq \mathbb{R}^r$ have non-zero outer Lebesgue measure. Consider cl*A, cl*B
and cl*(A + B) as defined in 266B. Then cl*A and cl*B are non-empty and their sum is included in cl*(A + B), by
266Bb and 266Be. So we have
\begin{align*}
(\mu^*A)^{1/r} + (\mu^*B)^{1/r} &= \mu(\operatorname{cl}^*A)^{1/r} + \mu(\operatorname{cl}^*B)^{1/r}
&&\text{(266Bd)}\\
&\le \mu^*(\operatorname{cl}^*A + \operatorname{cl}^*B)^{1/r}
&&\text{(by (d) here)}\\
&\le \mu(\operatorname{cl}^*(A+B))^{1/r} = \mu^*(A+B)^{1/r}.
\end{align*}

(f) Finally, for arbitrary non-empty sets $A, B \subseteq \mathbb{R}^r$, note that if (for instance) A is negligible then we can take any
x ∈ A and see that
\[ \mu^*(A+B)^{1/r} \ge \mu^*(x+B)^{1/r} = (\mu^*B)^{1/r} = (\mu^*A)^{1/r} + (\mu^*B)^{1/r}, \]
and the result is similarly trivial if B is negligible. So all cases are covered.
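
Part (a) of the proof, the case of two half-open intervals, can be explored numerically. The sketch below is an illustration under the assumption that each interval is described only by its list of side lengths, so that the Minkowski sum has side lengths equal to the coordinatewise sums; the function names are hypothetical.

```python
import math


def box_volume(sides):
    """Lebesgue measure of a half-open interval with the given side lengths."""
    return math.prod(sides)


def brunn_minkowski_gap(a_sides, b_sides):
    """mu(A+B)^(1/r) - (mu A)^(1/r) - (mu B)^(1/r) for two boxes A and B in R^r."""
    r = len(a_sides)
    sum_sides = [s + t for s, t in zip(a_sides, b_sides)]
    return (box_volume(sum_sides) ** (1 / r)
            - box_volume(a_sides) ** (1 / r)
            - box_volume(b_sides) ** (1 / r))


print(brunn_minkowski_gap([1.0, 4.0], [4.0, 1.0]))   # dissimilar boxes
print(brunn_minkowski_gap([1.0, 2.0], [2.0, 4.0]))   # similar boxes
```

For the dissimilar pair the gap is 5 − 2 − 2 = 1, a strict inequality. For the similar pair the gap is zero up to rounding, the equality case described in 266Xc.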

266X Basic exercises (a) Let D, D′ be subsets of R r . Show that (i) cl*(D ∪ D′ ) = cl*D ∪ cl*D′ (ii) cl*D = cl*D′
iff D and D′ have a common measurable envelope (iii) cl*D \ cl*(R r \ D′ ) ⊆ cl*(D ∩ D′ ) (iv) D is Lebesgue measurable
iff cl*D ∩ cl*(R r \ D) is Lebesgue negligible (v) D ∪ cl*D is a measurable envelope of D (vi) cl*(cl*D) = cl*D.
(b) Show that, for a measurable set E ⊆ R, cl*E is just the set of real numbers which are not density points of
R \ E.
(c) In 266C, show that if A and B are similar convex sets in the same orientation then A + B is a convex set similar
to both and $\mu(A+B)^{1/r} = (\mu A)^{1/r} + (\mu B)^{1/r}$.

266 Notes and comments The proof of 266C is taken from Federer 69. There is a slightly specious generality
in the form given here. If the sets A and B are at all irregular, then $\mu^*(A+B)^{1/r}$ is likely to be much greater than
$(\mu^*A)^{1/r} + (\mu^*B)^{1/r}$. The critical case, in which A and B are similar convex sets, is much easier (266Xc). The theorem
is therefore most useful when A and B are non-similar convex sets and we get a non-trivial estimate which may be
hard to establish by other means. For this case we do not need 266B. Theorem 266C is an instructive example of the
way in which the dimension r enters formulae when we seek results applying to general Euclidean spaces. There will
be many more when I return to geometric measure theory in Chapter 47 of Volume 4.

Chapter 27
Probability theory
Lebesgue created his theory of integration in response to a number of problems in real analysis, and all his life seems
to have thought of it as a tool for use in geometry and calculus (Lebesgue 72, vols. 1 and 2). Remarkably, it turned
out, when suitably adapted, to provide a solid foundation for probability theory. The development of this approach
is generally associated with the name of Kolmogorov. It has so come to dominate modern abstract probability theory
that many authors ignore all other methods. I do not propose to commit myself to any view on whether σ-additive
measures are the only way to give a rigorous foundation to probability theory, or whether they are adequate to deal
with all probabilistic ideas; there are some serious philosophical questions here, since probability theory, at least in its
applied aspects, seeks to help us to understand the material world outside mathematics. But from my position as a
measure theorist, it is incontrovertible that probability theory is among the central applications of the concepts and
theorems of measure theory, and is one of the most vital sources of new ideas; and that every measure theorist must
be alert to the intuitions which probabilistic methods can provide.
I have written the preceding paragraph in terms suggesting that ‘probability theory’ is somehow distinguishable
from the rest of measure theory; this is another point on which I should prefer not to put forward any opinion as
definitive. But undoubtedly there is a distinction, rather deeper than the elementary point that probability deals
(almost) exclusively with spaces of measure 1. M.Loève argues persuasively (Loève 77, §10.2) that the essence of
probability theory is the artificial nature of the probability spaces themselves. In measure theory, when we wish to
integrate a function, we usually feel that we have a proper function with a domain and values. In probability theory,
when we take the expectation of a random variable, the variable is an ‘observable’ or ‘the result of an experiment’;
we are generally uncertain, or ignorant, or indifferent concerning the factors underlying the variable. Let me give an
example from the theorems below. In the proof of the Central Limit Theorem (274F), I find that I need an auxiliary
list Z0 , . . . , Zn of random variables, independent of each other and of the original sequence X0 , . . . , Xn . I create such
a sequence by taking a product space Ω × Ω′ , and writing Xi′ (ω, ω ′ ) = Xi (ω), while the Zi are functions of ω ′ . Now the
difference between the Xi and the Xi′ is of a type which a well-trained analyst would ordinarily take seriously. We do
not think that the function $x \mapsto x^2 : [0,1] \to [0,1]$ is the same thing as the function $(x_1,x_2) \mapsto x_1^2 : [0,1]^2 \to [0,1]$. But
a probabilist is likely to feel that it is positively pedantic to start writing Xi′ instead of Xi . He did not believe in the
space Ω in the first place, and if it turns out to be inadequate for his intuition he enlarges it without a qualm. Loève
calls probability spaces ‘fictions’, ‘inventions of the imagination’ in Larousse’s words; they are necessary in the models
Kolmogorov has taught us to use, but we have a vast amount of freedom in choosing them, and in their essence they
are nothing so definite as a set with points.
A probability space, therefore, is somehow a more shadowy entity in probability theory than it is in measure theory.
The important objects in probability theory are random variables and distributions, particularly joint distributions.
In this volume I shall deal exclusively with random variables which can be thought of as taking values in some power
of R; but this is not the central point. What is vital is that somehow the codomain, the potential set of values, of a
random variable, is much better defined than its domain. Consequently our attention is focused not on any features
of the artificial space which it is convenient to use as the underlying probability space – I write ‘underlying’, though it
is the most superficial and easily changed aspect of the model – but on the distribution on the codomain induced by
the random variable. Thus the Central Limit Theorem, which speaks only of distributions, is actually more important
in applied probability than the Strong Law of Large Numbers, which claims to tell us what a long-term average will
almost certainly be.
W.Feller (Feller 66) goes even farther than Loève, and as far as possible works entirely with distributions, setting
up machinery which enables him to go for long stretches without mentioning probability spaces at all. I make no
attempt to emulate him. But the approach is instructive and faithful to the essence of the subject.
Probability theory includes more mathematics than can easily be encompassed in a lifetime, and I have selected for
this introductory chapter the two limit theorems I have already mentioned, the Strong Law of Large Numbers and the
Central Limit Theorem, together with some material on martingales (§§275-276). They illustrate not only the special
character of probability theory – so that you will be able to form your own judgement on the remarks above – but also
some of its chief contributions to ‘pure’ measure theory, the concepts of ‘independence’ and ‘conditional expectation’.

271 Distributions
I start this chapter with a discussion of ‘probability distributions’, the probability measures on R n defined by families
(X1 , . . . , Xn ) of random variables. I give the basic results describing the circumstances under which two distributions
are equal (271G), integration with respect to a distribution (271E), and probability density functions (271H-271K).

271A Notation I have just spent some paragraphs on an attempt to describe the essential difference between
probability theory and measure theory. But there is a quicker test by which you may discover whether your author is a
measure theorist or a probabilist: open any page, and look for the phrases ‘measurable function’ and ‘random variable’,
and the formulae ‘$\int f\,d\mu$’ and ‘E(X)’. The first member of each pair will enable you to diagnose ‘measure’ and the second
‘probability’, with little danger of error. So far in this treatise I have firmly used measure theorists’ terminology, with
a few individual quirks. But in a chapter on probability theory I find that measure-theoretic notation, while perfectly
adequate in a formal sense, does such violence to the familiar formulations as to render them unnatural. Moreover,
you must surely at some point – if you have not already done so – become familiar with probabilists’ language. So in
this chapter I will make a substantial step in that direction. Happily, I think that this can be done without setting up
any direct conflicts, so that I shall be able, in later volumes, to call upon this work in whichever notation then seems
appropriate, without needing to re-formulate it.
(a) So let (Ω, Σ, µ) be a probability space. I take the opportunity given by a new phrase to make a technical move.
A real-valued random variable on Ω will be a member of L0 (µ), as defined in 241A; that is, a real-valued function
X defined on a conegligible subset of Ω such that X is measurable with respect to the completion µ̂ of µ, or, if you
prefer, such that X↾E is Σ-measurable for some conegligible set E ⊆ Ω.1
(b) If X is a real-valued random variable on a probability space (Ω, Σ, µ), write $E(X) = \int X\,d\mu$ if this is defined in
[−∞, ∞] in the sense of Chapter 12 and §133. In this case I will call E(X) the mean or expectation of X. Thus we
may say that ‘X has a finite expectation’ in place of ‘X is integrable’. 133A says that ‘E(X + Y ) = E(X) + E(Y )
whenever E(X) and E(Y ) and their sum are defined in [−∞, ∞]’, and 122P becomes ‘a real-valued random variable X
has a finite expectation iff E(|X|) < ∞’.

(c) If X is a real-valued random variable with finite expectation, the variance of X is


\[ \operatorname{Var}(X) = E(X - E(X))^2 = E(X^2 - 2E(X)X + E(X)^2) = E(X^2) - (E(X))^2. \]
(Note that this formula shows that $E(X)^2 \le E(X^2)$; compare 244Xd(i).) Var(X) is finite iff $E(X^2) < \infty$, that is, iff
X ∈ L2 (µ) (244A). In particular, X + Y and cX have finite variance whenever X and Y do and c ∈ R.

(d) I shall allow myself to use such formulae as


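\[ \Pr(X > a), \quad \Pr(X - \epsilon \le Y \le X + \delta), \]
where X and Y are random variables on the same probability space (Ω, Σ, µ), to mean respectively
\[ \hat\mu\{\omega : \omega \in \operatorname{dom} X,\ X(\omega) > a\}, \]
\[ \hat\mu\{\omega : \omega \in \operatorname{dom} X \cap \operatorname{dom} Y,\ X(\omega) - \epsilon \le Y(\omega) \le X(\omega) + \delta\}, \]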


writing µ̂ for the completion of µ as usual. There are two points to note here. First, Pr depends on µ̂, not on µ; in
effect, the notation automatically directs us to complete the probability space (Ω, Σ, µ). I could, of course, equally well
write
\[ \Pr(X^2 + Y^2 > 1) = \mu^*\{\omega : \omega \in \operatorname{dom} X \cap \operatorname{dom} Y,\ X(\omega)^2 + Y(\omega)^2 > 1\}, \]
taking µ∗ to be the outer measure on Ω associated with µ (132B). Secondly, I will use this notation only for predicates
corresponding to Borel measurable sets; that is to say, I shall write
\[ \Pr(\psi(X_1,\ldots,X_n)) = \hat\mu\{\omega : \omega \in \bigcap_{i\le n}\operatorname{dom} X_i,\ \psi(X_1(\omega),\ldots,X_n(\omega))\} \]
only when the set
\[ \{(\alpha_1,\ldots,\alpha_n) : \psi(\alpha_1,\ldots,\alpha_n)\} \]
is a Borel set in $\mathbb{R}^n$. Part of the reason for this restriction will appear in the next few paragraphs; $\Pr(\psi(X_1,\ldots,X_n))$
must be something calculable from knowledge of the joint distribution of $X_1,\ldots,X_n$, as defined in 271C. In fact we
can safely extend the idea to ‘universally measurable’ predicates ψ, to be discussed in Volume 4. But it could happen
that µ gave a measure to a set of the form {ω : X(ω) ∈ A} for some exceedingly irregular set A, and in such a case it
would be prudent to regard this as an accidental pathology of the probability space, and to treat it in a rather different
way.
1 For an account of how this terminology became standard, see http://www.dartmouth.edu/~chance/Doob/conversation.html.
(I see that I have rather glibly assumed that the formula above defines Pr(ψ(X1 , . . . , Xn )) for every Borel predicate
ψ. This is a consequence of 271Bb below.)
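
To see the notation of 271A at work in the simplest possible setting, the sketch below models a finite probability space as a dictionary from points to masses. This is an assumption made purely for illustration: on a finite space every subset is measurable, every random variable is defined everywhere, and completion changes nothing, so none of the technical care above is visible. The function names are my own.

```python
from fractions import Fraction


def expectation(mu, x):
    """E(X): the integral of X with respect to mu, on a finite space."""
    return sum(mass * x(w) for w, mass in mu.items())


def variance(mu, x):
    """Var(X) = E((X - E(X))^2)."""
    mean = expectation(mu, x)
    return expectation(mu, lambda w: (x(w) - mean) ** 2)


def pr(mu, predicate):
    """Pr(predicate): the measure of the set of points where the predicate holds."""
    return sum(mass for w, mass in mu.items() if predicate(w))


# A fair die: Omega = {1, ..., 6}, each point of mass 1/6, and X the score.
die = {w: Fraction(1, 6) for w in range(1, 7)}
X = lambda w: w

print(expectation(die, X))          # 7/2
print(variance(die, X))             # 35/12
print(pr(die, lambda w: X(w) > 4))  # 1/3
```

Exact rational arithmetic makes it possible to confirm the identity $\operatorname{Var}(X) = E(X^2) - (E(X))^2$ of 271Ac without rounding: here $E(X^2) = 91/6$ and $(E(X))^2 = 49/4$, with difference 35/12.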

271B Theorem Let (Ω, Σ, µ) be a probability space, and $X_1,\ldots,X_n$ real-valued random variables on Ω. Set
$\boldsymbol{X}(\omega) = (X_1(\omega),\ldots,X_n(\omega))$ for $\omega \in \bigcap_{i\le n}\operatorname{dom} X_i$.
(a) There is a unique Radon measure ν on $\mathbb{R}^n$ such that
\[ \nu\,]{-\infty},a] = \Pr(X_i \le \alpha_i \text{ for every } i \le n) \]
whenever $a = (\alpha_1,\ldots,\alpha_n) \in \mathbb{R}^n$, writing $]{-\infty},a]$ for $\prod_{i\le n}\,]{-\infty},\alpha_i]$;
(b) $\nu\mathbb{R}^n = 1$ and $\nu E = \hat\mu(\boldsymbol{X}^{-1}[E])$ whenever νE is defined, where µ̂ is the completion of µ; in particular, $\nu E =
\Pr((X_1,\ldots,X_n) \in E)$ for every Borel set $E \subseteq \mathbb{R}^n$.
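proof Let Σ̂ be the domain of µ̂, and set $D = \bigcap_{i\le n}\operatorname{dom} X_i = \operatorname{dom}\boldsymbol{X}$; then D is conegligible, so belongs to Σ̂. Let
$\hat\mu_D = \hat\mu\restriction\mathcal{P}D$ be the subspace measure on D (131B, 214B), and $\nu_0$ the image measure $\hat\mu_D\boldsymbol{X}^{-1}$ (234D); let T be the
domain of $\nu_0$.
Write $\mathcal{B}$ for the algebra of Borel sets in $\mathbb{R}^n$. Then $\mathcal{B} \subseteq \mathrm{T}$. P For i ≤ n, $\alpha \in \mathbb{R}$ set $F_{i\alpha} = \{x : x \in \mathbb{R}^n,\ \xi_i \le \alpha\}$,
$H_{i\alpha} = \{\omega : \omega \in \operatorname{dom} X_i,\ X_i(\omega) \le \alpha\}$. $X_i$ is Σ̂-measurable and its domain is in Σ̂, so $H_{i\alpha} \in \hat\Sigma$, and $\boldsymbol{X}^{-1}[F_{i\alpha}] = D \cap H_{i\alpha}$
is $\hat\mu_D$-measurable. Thus $F_{i\alpha} \in \mathrm{T}$. As T is a σ-algebra of subsets of $\mathbb{R}^n$, $\mathcal{B} \subseteq \mathrm{T}$ (121J). Q
Accordingly $\nu_0\restriction\mathcal{B}$ is a measure on $\mathbb{R}^n$ with domain $\mathcal{B}$; of course $\nu_0\mathbb{R}^n = \hat\mu D = 1$. By 256C, the completion ν of $\nu_0\restriction\mathcal{B}$
is a Radon measure on $\mathbb{R}^n$, and $\nu\mathbb{R}^n = \nu_0\mathbb{R}^n = 1$.
For $E \in \mathcal{B}$,
\[ \nu E = \nu_0E = \hat\mu_D\boldsymbol{X}^{-1}[E] = \hat\mu\boldsymbol{X}^{-1}[E] = \Pr((X_1,\ldots,X_n) \in E). \]
More generally, if E ∈ dom ν, then there are Borel sets E′, E′′ such that E′ ⊆ E ⊆ E′′ and ν(E′′ \ E′) = 0, so that
$\boldsymbol{X}^{-1}[E'] \subseteq \boldsymbol{X}^{-1}[E] \subseteq \boldsymbol{X}^{-1}[E'']$ and $\hat\mu(\boldsymbol{X}^{-1}[E''] \setminus \boldsymbol{X}^{-1}[E']) = 0$. This means that $\boldsymbol{X}^{-1}[E] \in \hat\Sigma$ and
\[ \hat\mu\boldsymbol{X}^{-1}[E] = \hat\mu\boldsymbol{X}^{-1}[E'] = \nu E' = \nu E. \]
As for the uniqueness of ν, if ν′ is any Radon measure on $\mathbb{R}^n$ such that $\nu'\,]{-\infty},a] = \Pr(X_i \le \alpha_i\ \forall\ i \le n)$ for every
$a \in \mathbb{R}^n$, then surely
\[ \nu'\mathbb{R}^n = \lim_{k\to\infty}\nu'\,]{-\infty},k\mathbf{1}] = \lim_{k\to\infty}\nu\,]{-\infty},k\mathbf{1}] = 1 = \nu\mathbb{R}^n. \]
Also $\mathcal{I} = \{]{-\infty},a] : a \in \mathbb{R}^n\}$ is closed under finite intersections, and ν and ν′ agree on $\mathcal{I}$. By the Monotone Class
Theorem (or rather, its corollary 136C), ν and ν′ agree on the σ-algebra generated by $\mathcal{I}$, which is $\mathcal{B}$ (121J), and are
identical (256D).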

271C Definition Let (Ω, Σ, µ) be a probability space and $X_1,\ldots,X_n$ real-valued random variables on Ω. By the
(joint) distribution or law $\nu_{\boldsymbol{X}}$ of the family $\boldsymbol{X} = (X_1,\ldots,X_n)$ I shall mean the Radon probability measure ν of
271B. If we think of $\boldsymbol{X}$ as a function from $\bigcap_{i\le n}\operatorname{dom} X_i$ to $\mathbb{R}^n$, then $\nu_{\boldsymbol{X}}E = \Pr(\boldsymbol{X} \in E)$ for every Borel set $E \subseteq \mathbb{R}^n$.
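
On a finite probability space the joint distribution is just the image measure, and can be computed by pushing each point mass forward along $\boldsymbol{X}$. The sketch below illustrates this in the spirit of 271B-271C, under the simplifying assumptions that Ω is finite and every $X_i$ is defined everywhere; the names `joint_distribution` and `cdf` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def joint_distribution(mu, variables):
    """Image measure of mu under omega -> (X_1(omega), ..., X_n(omega))."""
    nu = defaultdict(Fraction)
    for w, mass in mu.items():
        nu[tuple(x(w) for x in variables)] += mass
    return dict(nu)


def cdf(nu, a):
    """nu ]-inf, a] = Pr(X_i <= alpha_i for every i), for a = (alpha_1, ..., alpha_n)."""
    return sum(mass for point, mass in nu.items()
               if all(p <= alpha for p, alpha in zip(point, a)))


# Two fair coin tosses; X1 counts heads, X2 indicates a head on the first toss.
omega = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}
X1 = lambda w: w.count("H")
X2 = lambda w: 1 if w[0] == "H" else 0

nu = joint_distribution(omega, [X1, X2])
print(nu)
print(cdf(nu, (1, 0)))
```

Once `nu` has been computed, probabilities such as $\Pr(X_1 \le 1, X_2 \le 0)$ are read off from it alone, without further reference to `omega`; this is the sense in which the distribution can be divorced from the underlying space.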

271D Remarks (a) The choice of the Radon probability measure νX as ‘the’ distribution of X , with the insistence
that ‘Radon measures’ should be complete, is of course somewhat arbitrary. Apart from the general principle that one
should always complete measures, these conventions fit better with some of the work in Volume 4 and with such results
as 272G below.

(b) Observe that in order to speak of the distribution of a family X = (X1 , . . . , Xn ) of random variables, it is
essential that all the Xi should be based on the same probability space.

(c) I see that the language I have chosen allows the Xi to have different domains, so that the family (X1 , . . . , Xn )
may not be exactly identifiable with the corresponding function from ⋂i≤n dom Xi to R n . I hope however that using
the same symbol X for both will cause no confusion.

(d) It is not useful to think of the whole image measure ν0 = µ̂DX −1 in the proof of 271B as the distribution of X ,
unless it happens to be equal to ν = νX . The ‘distribution’ of a random variable is exactly that aspect of it which can
be divorced from any consideration of the underlying space (Ω, Σ, µ), and the point of such results as 271K and 272G
is that distributions can be calculated from each other, without going back to the relatively fluid and uncertain model
of a random variable in terms of a function on a probability space.

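(e) If X = (X1 , . . . , Xn ) and Y = (Y1 , . . . , Yn ) are such that Xi =a.e. Yi for each i, then
\[ \{\omega : \omega\in\bigcap_{i\le n}\operatorname{dom}X_i,\ X_i(\omega)\le\alpha_i\ \forall\, i\le n\}\ \triangle\ \{\omega : \omega\in\bigcap_{i\le n}\operatorname{dom}Y_i,\ Y_i(\omega)\le\alpha_i\ \forall\, i\le n\} \]
is negligible, so
\[ \Pr(X_i\le\alpha_i\ \forall\, i\le n) = \hat\mu\{\omega : \omega\in\bigcap_{i\le n}\operatorname{dom}X_i,\ X_i(\omega)\le\alpha_i\ \forall\, i\le n\} = \Pr(Y_i\le\alpha_i\ \forall\, i\le n) \]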

for all α1 , . . . , αn ∈ R, and νX = νY . This means that we can, if we wish, think of a distribution as a measure νu
where u = (u1 , . . . , un ) is a finite sequence in L0 (µ). In the present chapter I shall not emphasize this approach, but
it will always be at the back of my mind.

271E Measurable functions of random variables: Proposition Let X = (X1 , . . . , Xn ) be a family of random
variables (as always in such a context, I mean them all to be on the same probability space (Ω, Σ, µ)); write TX for
the domain of the distribution νX , and let h be a TX -measurable real-valued function defined νX -a.e. on R n . Then we
have a random variable Y = h(X1 , . . . , Xn ) defined by setting
h(X1 , . . . , Xn )(ω) = h(X1 (ω), . . . , Xn (ω)) for every ω ∈ X −1 [dom h].
The distribution νY of Y is the measure on R defined by the formula
νY F = νX h−1 [F ]
for just those sets F ⊆ R such that h−1 [F ] ∈ TX . Also
\[ E(Y) = \int h\,d\nu_X \]
in the sense that if one of these exists in [−∞, ∞], so does the other, and they are then equal.

proof (a)(i) Once again, write (Ω, Σ̂, µ̂) for the completion of (Ω, Σ, µ). Since
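\[ \Omega\setminus\operatorname{dom}Y \subseteq \bigcup_{i\le n}(\Omega\setminus\operatorname{dom}X_i)\cup X^{-1}[\mathbb{R}^n\setminus\operatorname{dom}h] \]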
is negligible (using 271Bb), dom Y is conegligible. If a ∈ R, then
E = {x : x ∈ dom h, h(x) ≤ a} ∈ TX ,
so
{ω : ω ∈ Ω, Y (ω) ≤ a} = X −1 [E] ∈ Σ̂.
As a is arbitrary, Y is Σ̂-measurable, and is a random variable.

(ii) Let h̃ : R n → R be any extension of h to the whole of R n . Then h̃ is TX -measurable, so the ordinary image
measure νX h̃−1 , defined on {F : h̃−1 [F ] ∈ dom νX }, is a Radon probability measure on R (256G). But for any A ⊆ R,
h̃−1 [A]△h−1 [A] ⊆ R n \ dom h
is νX -negligible, so νX h−1 [F ] = νX h̃−1 [F ] if either is defined.
If F ⊆ R is a Borel set, then
\[ \nu_Y F = \hat\mu\{\omega : Y(\omega)\in F\} = \hat\mu(X^{-1}[h^{-1}[F]]) = \nu_X(h^{-1}[F]). \]
So νY and νX h̃−1 agree on the Borel sets and are equal (256D again).

(b) Now apply Theorem 235E to the measures µ̂ and νX and the function φ = X . We have
\[ \int \chi(X^{-1}[F])\,d\hat\mu = \hat\mu(X^{-1}[F]) = \nu_X F \]
for every F ∈ TX , by 271Bb. Because h is νX -virtually measurable and defined νX -a.e., 235Eb tells us that
\[ \int h(X)\,d\mu = \int h(X)\,d\hat\mu = \int h\,d\nu_X \]
whenever either side is defined in [−∞, ∞], which is exactly the result we need.
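
The content of 271E can be checked by hand on a finite probability space, where every integral is a finite sum. The following is only an illustrative sketch: the space (two fair dice), the function h and all the Python names are assumptions made for the example, not part of the text.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite probability space: two fair dice, uniform measure mu.
omega = [(a, b) for a in range(1, 7) for b in range(1, 7)]
mu = {w: Fraction(1, 36) for w in omega}

def X(w):
    """The family X = (X1, X2): here simply the pair of coordinates."""
    return w

def h(x):
    """An arbitrarily chosen function of the pair (assumption for the example)."""
    return max(x) - min(x)

def distribution(rv):
    """Image measure of mu under rv, stored as {value: probability}."""
    nu = defaultdict(Fraction)
    for w, p in mu.items():
        nu[rv(w)] += p
    return dict(nu)

nu_X = distribution(X)                    # distribution of X on R^2
nu_Y = distribution(lambda w: h(X(w)))    # distribution of Y = h(X) on R

# E(Y) computed three ways: on Omega, against nu_X, and against nu_Y.
expect_on_omega = sum(h(X(w)) * p for w, p in mu.items())
expect_against_nu_X = sum(h(x) * p for x, p in nu_X.items())
expect_against_nu_Y = sum(y * p for y, p in nu_Y.items())

# nu_Y F = nu_X h^{-1}[F] for the one-point set F = {0}.
nu_X_preimage_of_zero = sum(p for x, p in nu_X.items() if h(x) == 0)
```

Exact rational arithmetic is used so that the three expectations can be compared with `==` rather than with a tolerance.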
271F Corollary If X is a single random variable with distribution νX , then


\[ E(X) = \int_{-\infty}^{\infty} x\,\nu_X(dx) \]
if either is defined in [−∞, ∞]. Similarly
\[ E(X^2) = \int_{-\infty}^{\infty} x^2\,\nu_X(dx) \]
(whatever X may be). If X, Y are two random variables (on the same probability space!) then we have
\[ E(X\times Y) = \int xy\,\nu_{(X,Y)}\,d(x,y) \]
if either side is defined in [−∞, ∞].

271G Distribution functions (a) If X is a real-valued random variable, its distribution function is the function
FX : R → [0, 1] defined by setting
FX (a) = Pr(X ≤ a) = νX ]−∞, a]
for every a ∈ R. (Warning! some authors prefer FX (a) = Pr(X < a).) Observe that FX is non-decreasing, that
lima→−∞ FX (a) = 0, that lima→∞ FX (a) = 1 and that limx↓a FX (x) = FX (a) for every a ∈ R. By 271Ba, X and Y
have the same distribution iff FX = FY .

(b) If X1 , . . . , Xn are real-valued random variables on the same probability space, their (joint) distribution
function is the function FX : R n → [0, 1] defined by writing
FX (a) = Pr(Xi ≤ αi ∀ i ≤ n)
whenever a = (α1 , . . . , αn ) ∈ R n . If X and Y have the same distribution function, they have the same distribution,
by the n-dimensional version of 271B.
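
For a purely atomic distribution the properties listed in 271Ga, and the difference between the two conventions mentioned in the warning, can be seen directly. This is a minimal sketch; the three-point distribution and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction

# Hypothetical distribution nu_X: point masses at -1, 0 and 2.
nu = {-1: Fraction(1, 4), 0: Fraction(1, 2), 2: Fraction(1, 4)}

def F(a):
    """Distribution function in the convention of the text: Pr(X <= a)."""
    return sum(p for x, p in nu.items() if x <= a)

def F_strict(a):
    """The alternative convention Pr(X < a) used by some authors."""
    return sum(p for x, p in nu.items() if x < a)

# At the atom 0 the two conventions differ by the mass of the atom.
jump_at_zero = F(0) - F_strict(0)
```

With the convention of the text, F is continuous on the right at each atom; with the strict convention it would be continuous on the left instead.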

271H Densities Let X = (X1 , . . . , Xn ) be a family of random variables, all defined on the same probability space.
A density function for (X1 , . . . , Xn ) is a Radon-Nikodým derivative, with respect to Lebesgue measure, for the
distribution νX ; that is, a non-negative function f , integrable with respect to Lebesgue measure µL on R n , such that
\[ \int_E f\,d\mu_L = \nu_X E = \Pr(X\in E) \]
for every Borel set E ⊆ R n (256J) – if there is such a function, of course.

271I Proposition Let X = (X1 , . . . , Xn ) be a family of random variables, all defined on the same probability
space. Write µL for Lebesgue measure on R n .
(a) There is a density function for X iff Pr(X ∈ E) = 0 for every Borel set E such that µL E = 0.
(b) A non-negative Lebesgue integrable function f is a density function for X iff $\int_{]-\infty,a]} f\,d\mu_L = \Pr(X\in\,]-\infty,a])$
for every a ∈ R n .
(c) Suppose that f is a density function for X , and G = {x : f (x) > 0}. Then if h is a Lebesgue measurable
real-valued function defined almost everywhere in G,
\[ E(h(X)) = \int h\,d\nu_X = \int h\times f\,d\mu_L \]
if any of the three integrals is defined in [−∞, ∞], interpreting (h × f )(x) as 0 if f (x) = 0 and x ∉ dom h.
proof (a) Apply 256J to the Radon probability measure νX .
(b) Of course the condition is necessary. If it is satisfied, then (by B.Levi’s theorem)
\[ \int f\,d\mu_L = \lim_{k\to\infty}\int_{]-\infty,k\mathbf{1}]} f\,d\mu_L = \lim_{k\to\infty}\nu_X\,]-\infty,k\mathbf{1}] = 1. \]
So we have a Radon probability measure ν defined by writing
\[ \nu E = \int_E f\,d\mu_L \]
whenever E ∩ {x : f (x) > 0} is Lebesgue measurable (256E). We are supposing that ν ]−∞, a] = νX ]−∞, a] for every
a ∈ R n ; by 271Ba, as usual, ν = νX , so
\[ \int_E f\,d\mu_L = \nu E = \nu_X E = \Pr(X\in E) \]
for every Borel set E ⊆ R n , and f is a density function for X .
(c) By 256E, νX is the indefinite-integral measure over µL associated with f . So, writing G = {x : f (x) > 0}, we
have
\[ \int h\,d\nu_X = \int h\times f\,d\mu_L \]
whenever either is defined in [−∞, ∞] (235K). By 234La, h is TX -measurable and defined νX -almost everywhere, where
TX = dom νX , so E(h(X)) = ∫ h dνX by 271E.

271J The machinery developed in §263 is sufficient to give a very general result on the densities of random variables
of the form φ(X), as follows.
Theorem Let X = (X1 , . . . , Xn ) be a family of random variables, and D ⊆ R n a Borel set such that Pr(X ∈ D) = 1.
Let φ : D → R n be a function which is differentiable relative to its domain everywhere in D; for x ∈ D, let T (x) be
a derivative of φ at x, and set J(x) = | det T (x)|. Suppose that J(x) ≠ 0 for each x ∈ D, and that X has a density
function f ; and suppose moreover that ⟨Dk⟩k∈N is a disjoint sequence of Borel sets, with union D, such that φ↾Dk is
injective for every k. Then φ(X) has a density function $g = \sum_{k=0}^{\infty} g_k$ where
\[ g_k(y) = \begin{cases} \dfrac{f(\phi^{-1}(y))}{J(\phi^{-1}(y))} & \text{for } y\in\phi[D_k\cap\operatorname{dom}f],\\[1ex] 0 & \text{for } y\in\mathbb{R}^n\setminus\phi[D_k]. \end{cases} \]

proof By 262Ia, φ is continuous, therefore Borel measurable, so φ(X) is a random variable.


For the moment, fix k ∈ N and a Borel set F ⊆ R n . By 263D(iii), φ[Dk ] is measurable, and by 263D(ii) φ[Dk \ dom f ]
is negligible. The function gk is such that f (x) = J(x)gk (φ(x)) for every x ∈ Dk ∩ dom f , so by 263D(v) we have
\[ \begin{aligned} \int_F g_k\,d\mu &= \int_{\phi[D_k]} g_k\times\chi F\,d\mu = \int_{D_k} J(x)\,g_k(\phi(x))\,\chi F(\phi(x))\,\mu(dx)\\ &= \int_{D_k\cap\phi^{-1}[F]} f\,d\mu = \Pr(X\in D_k\cap\phi^{-1}[F]). \end{aligned} \]
(The integral $\int_{\phi[D_k]} g_k\times\chi F$ is defined because $\int_{D_k} J\times(g_k\times\chi F)\phi$ is defined, and the integral $\int g_k\times\chi F$ is defined
because φ[Dk ] is measurable and gk is zero off φ[Dk ].)
Now sum over k. Every gk is non-negative, so by B.Levi’s theorem (123A, 123Xa)
\[ \begin{aligned} \int_F g\,d\mu &= \sum_{k=0}^{\infty}\int_F g_k\,d\mu = \sum_{k=0}^{\infty}\Pr(X\in D_k\cap\phi^{-1}[F])\\ &= \Pr(X\in\phi^{-1}[F]) = \Pr(\phi(X)\in F). \end{aligned} \]
As F is arbitrary, g is a density function for φ(X), as claimed.
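
As a sanity check on the formula of 271J, take n = 1, φ(x) = x² on D = R \ {0}, dissected into the two injective pieces D0 = ]0, ∞[ and D1 = ]−∞, 0[ (with Dk empty for k ≥ 2), so that J(x) = |2x|. The sketch below assumes that X is standard normal, which is a choice made purely for the example, and compares the resulting g with the familiar chi-squared density with one degree of freedom.

```python
import math

def f(x):
    """Density of X; a standard normal variable is assumed for this example."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def g(y):
    """Density of X^2 assembled from the two injective branches of x -> x^2."""
    if y <= 0:
        return 0.0
    r = math.sqrt(y)
    g0 = f(r) / abs(2 * r)      # branch D0 = ]0, oo[, inverse image +sqrt(y)
    g1 = f(-r) / abs(-2 * r)    # branch D1 = ]-oo, 0[, inverse image -sqrt(y)
    return g0 + g1

def chi_squared_1(y):
    """Closed form of the chi-squared density with one degree of freedom."""
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y)
```

The same dissection gives the formula asked for in 271Xc(ii) for a general density f.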

271K The application of the last theorem to ordinary transformations is sometimes indirect, so I give an example.
Proposition Let X, Y be two random variables with a joint density function f . Then X × Y has a density function
h, where
\[ h(u) = \int_{-\infty}^{\infty}\frac{1}{|v|}\,f\Big(\frac{u}{v},v\Big)\,dv \]

whenever this is defined in R.


 
proof Set φ(x, y) = (xy, y) for (x, y) ∈ R 2 . Then φ is differentiable, with derivative
\[ T(x,y) = \begin{pmatrix} y & x\\ 0 & 1 \end{pmatrix}, \]
so J(x, y) = | det T (x, y)| = |y|. Set D = {(x, y) : y ≠ 0}; then D is a conegligible Borel set in R 2 and φ↾D is injective. Now
φ[D] = D and φ−1 (u, v) = (u/v, v) for v ≠ 0. So φ(X, Y ) = (X × Y, Y ) has a density function g, where
\[ g(u,v) = \frac{f(u/v,\,v)}{|v|}\quad\text{if } v\ne 0. \]
To find a density function for X × Y , we calculate
\[ \Pr(X\times Y\le a) = \int_{]-\infty,a]\times\mathbb{R}} g = \int_{-\infty}^{a}\int_{-\infty}^{\infty} g(u,v)\,dv\,du = \int_{-\infty}^{a} h \]
by Fubini’s theorem (252B, 252C). In particular, h is defined and finite almost everywhere; and by 271Ib it is a density
function for X × Y .
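
The formula of 271K can be tested numerically in a case where h is known in closed form. The sketch below assumes that X and Y are independent and uniformly distributed on [0, 1], so that f is the indicator of the unit square and h(u) = ∫ from u to 1 of dv/v = −ln u for 0 < u < 1. The midpoint rule and the step count are arbitrary choices for illustration.

```python
import math

def f(x, y):
    """Joint density assumed for the example: uniform on the unit square."""
    return 1.0 if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def h(u, steps=100_000):
    """Midpoint-rule approximation to the integral of f(u/v, v)/|v| dv.

    Only v in ]0, 1[ is scanned, because this particular f vanishes
    whenever v lies outside [0, 1].
    """
    dv = 1.0 / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * dv
        total += f(u / v, v) / abs(v)
    return total * dv
```

For instance `h(0.5)` should be close to ln 2, the value predicted by the closed form.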
*271L When a random variable is presented as the limit of a sequence of random variables the following can be
very useful.
Proposition Let ⟨Xn⟩n∈N be a sequence of real-valued random variables converging in measure to a random variable
X (definition: 245A). Writing FXn , FX for the distribution functions of Xn , X respectively,
FX (a) = inf b>a lim inf n→∞ FXn (b) = inf b>a lim supn→∞ FXn (b)
for every a ∈ R.
proof Set γ = inf b>a lim inf n→∞ FXn (b), γ ′ = inf b>a lim supn→∞ FXn (b).
(a) FX (a) ≤ γ. P Take any b > a and ε > 0. Then there is an n0 ∈ N such that Pr(|Xn − X| ≥ b − a) ≤ ε for every
n ≥ n0 (245F). Now, for n ≥ n0 ,
FX (a) = Pr(X ≤ a) ≤ Pr(Xn ≤ b) + Pr(Xn − X ≥ b − a) ≤ FXn (b) + ε.
So FX (a) ≤ lim inf n→∞ FXn (b) + ε; as ε is arbitrary, FX (a) ≤ lim inf n→∞ FXn (b); as b is arbitrary, FX (a) ≤ γ. Q
(b) γ ′ ≤ FX (a). P Let ε > 0. Then there is a δ > 0 such that FX (a + 2δ) ≤ FX (a) + ε (271Ga). Next, there is an
n0 ∈ N such that Pr(|Xn − X| ≥ δ) ≤ ε for every n ≥ n0 . In this case, for n ≥ n0 ,

FXn (a + δ) = Pr(Xn ≤ a + δ) ≤ Pr(X ≤ a + 2δ) + Pr(X − Xn ≥ δ)
≤ FX (a + 2δ) + ε ≤ FX (a) + 2ε.
Accordingly
γ ′ ≤ lim supn→∞ FXn (a + δ) ≤ FX (a) + 2ε.
As ε is arbitrary, γ ′ ≤ FX (a). Q
(c) Since of course γ ≤ γ ′ , we must have FX (a) = γ = γ ′ , as claimed.
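
The infimum over b > a in 271L cannot be dropped. A minimal illustration, under the assumption that X = 0 and Xn = 1/(n + 1) are constant random variables (so Xn → X in measure): FXn (0) = 0 for every n although FX (0) = 1, while for each b > 0 the values FXn (b) are eventually 1. The helper `tail_value` below is only a finite proxy for the limit, valid here because each sequence n ↦ FXn (b) is eventually constant; it is not a general method for computing lim inf.

```python
from fractions import Fraction

def F_X(a):
    """Distribution function of the constant random variable X = 0."""
    return 1 if a >= 0 else 0

def F_Xn(n, a):
    """Distribution function of the constant random variable Xn = 1/(n+1)."""
    return 1 if a >= Fraction(1, n + 1) else 0

def tail_value(b, start=10_000, length=50):
    """Common value of F_Xn(b) for n in [start, start+length); a proxy for the limit.

    Assumes the sequence is already constant from `start` on, which holds
    in this example whenever b <= 0 or b >= 1/start.
    """
    values = {F_Xn(n, b) for n in range(start, start + length)}
    assert len(values) == 1, "sequence not yet constant on the sampled tail"
    return values.pop()

limit_at_zero = tail_value(0)
inf_over_b = min(tail_value(b) for b in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)))
```

Here `limit_at_zero` disagrees with `F_X(0)`, while `inf_over_b` agrees with it, as the proposition predicts.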

271X Basic exercises > (a) Let X be a real-valued random variable with finite expectation, and ε > 0. Show that
\[ \Pr(|X-E(X)|\ge\epsilon)\le\frac{1}{\epsilon^2}\operatorname{Var}(X). \]
(This is Chebyshev’s inequality.)

> (b) Let F : R → [0, 1] be a non-decreasing function such that (i) lima→−∞ F (a) = 0 (ii) lima→∞ F (a) = 1
(iii) limx↓a F (x) = F (a) for every a ∈ R. Show that there is a unique Radon probability measure ν in R such that
F (a) = ν ]−∞, a] for every a ∈ R. (Hint: 114Xa.) Hence show that F is the distribution function of some random
variable.

> (c) Let X be a real-valued random variable with a density function f . (i) Show that |X| has a density function
g1 where g1 (x) = f (x) + f (−x) whenever x ≥ 0 and f (x), f (−x) are both defined, 0 otherwise. (ii) Show that X 2 has
a density function g2 where $g_2(x) = (f(\sqrt{x}) + f(-\sqrt{x}))/2\sqrt{x}$ whenever x > 0 and this is defined, 0 for other x. (iii)
Show that if Pr(X = 0) = 0 then 1/X has a density function g3 where $g_3(x) = \frac{1}{x^2} f(\frac{1}{x})$ whenever this is defined. (iv)
Show that if Pr(X < 0) = 0 then √X has a density function g4 where g4 (x) = 2xf (x2 ) if x ≥ 0 and f (x2 ) is defined,
0 otherwise.

> (d) Let X and Y be random variables with a joint density function f : R 2 → R. Show that X + Y has a density
function h where h(u) = ∫ f (u − v, v)dv for almost every u.

(e) Let X, Y be random variables with a joint density function f : R 2 → R. Show that X/Y has a density function
h where h(u) = ∫ |v|f (uv, v)dv for almost every u.

(f ) Devise an alternative proof of 271K by using Fubini’s theorem and one-dimensional substitutions to show that
\[ \int_a^b\int_{-\infty}^{\infty}\frac{1}{|v|}\,f\Big(\frac{u}{v},v\Big)\,dv\,du = \int_{\{(u,v)\,:\,a\le uv\le b\}} f \]

whenever a ≤ b in R.

271Y Further exercises (a) Let T be the topology of R N and B the σ-algebra of Borel sets (256Ye). (i) Let I be
the family of sets of the form
{x : x ∈ R N , x(i) ≤ αi ∀ i ≤ n},
where n ∈ N and αi ∈ R for each i ≤ n. Show that B is the smallest family of subsets of R N such that (α) I ⊆ B
(β) B \ A ∈ B whenever A, B ∈ B and A ⊆ B (γ) ⋃k∈N Ak ∈ B for every non-decreasing sequence ⟨Ak⟩k∈N in
B. (ii) Show that if µ, µ′ are two totally finite measures defined on R N , and µF and µ′ F are defined and equal
for every F ∈ I, then µE and µ′ E are defined and equal for every E ∈ B. (iii) Show that if Ω is a set and Σ a
σ-algebra of subsets of Ω and X : Ω → R N is a function, then X −1 [E] ∈ Σ for every E ∈ B iff πi X is Σ-measurable
for every i ∈ N, where πi (x) = x(i) for each x ∈ R N , i ∈ N. (iv) Show that if X = ⟨Xi⟩i∈N is a sequence of
real-valued random variables on a probability space (Ω, Σ, µ), then there is a unique probability measure $\nu^{B}_{X}$, with
domain B, such that $\nu^{B}_{X}$ {x : x(i) ≤ αi ∀ i ≤ n} = Pr(Xi ≤ αi ∀ i ≤ n) for every α0 , . . . , αn ∈ R. (v) Under
the conditions of (iv), show that there is a unique Radon measure νX on R N (in the sense of 256Ye) such that
νX {x : x(i) ≤ αi ∀ i ≤ n} = Pr(Xi ≤ αi ∀ i ≤ n) for every α0 , . . . , αn ∈ R.

(b) Let F : R 2 → [0, 1] be a function. Show that the following are equiveridical: (i) F is the distribution function
of some pair (X1 , X2 ) of random variables (ii) there is a probability measure ν on R 2 such that ν ]−∞, a] = F (a) for
every a ∈ R 2 (iii)(α) F (α1 , α2 ) + F (β1 , β2 ) ≥ F (α1 , β2 ) + F (β1 , α2 ) whenever α1 ≤ β1 and α2 ≤ β2 (β) F (α1 , α2 ) =
limξ1 ↓α1 ,ξ2 ↓α2 F (ξ1 , ξ2 ) for every α1 , α2 (γ) limα→−∞ F (α, β) = limα→−∞ F (β, α) = 0 for all β (δ) limα→∞ F (α, α) = 1.
(Hint: for non-empty half-open intervals ]a, b], set λ ]a, b] = F (α1 , α2 ) + F (β1 , β2 ) − F (α1 , β2 ) − F (β1 , α2 ), and continue
as in 115B-115F.)

(c) Generalize (b) to higher dimensions, finding a suitable formula to stand in place of that in (iii-α) of (b).

(d) Let (Ω, Σ, µ) be a probability space and F a filter on L0 (µ) converging to X0 ∈ L0 (µ) for the topology of
convergence in measure. Show that, writing FX for the distribution function of X ∈ L0 (µ),
FX0 (a) = inf b>a lim inf X→F FX (b) = inf b>a lim supX→F FX (b)
for every a ∈ R.

(e) Let X, Y be non-negative random variables with the same distribution, and h : [0, ∞[ → [0, ∞[ a non-decreasing
function. Show that E(X × hY ) ≤ E(Y × hY ). (Hint: in the language of 252Yo, (Y × hY )∗ = Y ∗ × (hY )∗ .)

271 Notes and comments Most of this section seems to have been taken up with technicalities. This is perhaps
unsurprising in view of the fact that it is devoted to the relationship between a vector random variable X and the
associated distribution νX , and this necessarily leads us into the minefield which I attempted to chart in §235. Indeed,
I call on results from §235 twice; once in 271E, with φ(ω) = X(ω) and J(ω) = 1, and once in 271I, with φ(x) = x
and J(x) = f (x).
Distribution functions of one-dimensional random variables are easily characterized (271Xb); in higher dimensions
we have to work harder (271Yb-271Yc). Distributions, rather than distribution functions, can be described for infinite
sequences of random variables (271Ya); indeed, these ideas can be extended to uncountable families, but this requires
proper topological measure theory, and belongs in Volume 4.
The statement of 271J is lengthy, not to say cumbersome. The point is that many of the most important transforma-
tions φ are not themselves injective, but can easily be dissected into injective fragments (see, for instance, 271Xc and
263Xd). The point of 271K is that we frequently wish to apply the ideas here to transformations which are singular,
and indeed change the dimension of the random variable. I have not given the theorems which make such applications
routine and suggest rather that you seek out tricks such as that used in the proof of 271K, which in any case are
necessary if you want amenable formulae. Of course other methods are available (271Xf).

272 Independence
I introduce the concept of ‘independence’ for families of events, σ-algebras and random variables. The first part of
the section, down to 272G, amounts to an analysis of the elementary relationships between the three manifestations of
the idea. In 272G I give the fundamental result that the joint distribution of a (finite) independent family of random
variables is just the product of the individual distributions. Further expressions of the connexion between independence
and product measures are in 272J, 272M and 272N. I give a version of the zero-one law (272O), and I end the section
with a group of basic results from probability theory concerning sums and products of independent random variables
(272R-272W).
272A Definitions Let (Ω, Σ, µ) be a probability space.

(a) A family ⟨Ei⟩i∈I in Σ is (stochastically) independent if
\[ \mu(E_{i_1}\cap E_{i_2}\cap\ldots\cap E_{i_n}) = \prod_{j=1}^{n}\mu E_{i_j} \]
whenever i1 , . . . , in are distinct members of I.

(b) A family ⟨Σi⟩i∈I of σ-subalgebras of Σ is (stochastically) independent if
\[ \mu(E_1\cap E_2\cap\ldots\cap E_n) = \prod_{j=1}^{n}\mu E_j \]
whenever i1 , . . . , in are distinct members of I and Ej ∈ Σij for every j ≤ n.

(c) A family ⟨Xi⟩i∈I of real-valued random variables on Ω is (stochastically) independent if
\[ \Pr(X_{i_j}\le\alpha_j \text{ for every } j\le n) = \prod_{j=1}^{n}\Pr(X_{i_j}\le\alpha_j) \]
whenever i1 , . . . , in are distinct members of I and α1 , . . . , αn ∈ R.
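
Definition 272Aa asks for the product formula on every finite subfamily, not just on pairs. The following sketch checks this on a standard small example; the space (two fair coin tosses), the three events and the function names are assumptions made for illustration. The three events turn out to be independent in pairs but not as a family.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical probability space: two fair coin tosses, uniform measure.
omega = [(a, b) for a in (0, 1) for b in (0, 1)]

def mu(E):
    """Uniform probability of a set E of outcomes."""
    return Fraction(len(E), len(omega))

def independent(events):
    """Check the condition of 272Aa for a finite list of events.

    Subfamilies are chosen by position in the list, so that 'distinct
    members of I' refers to distinct indices, as in the definition.
    """
    for r in range(2, len(events) + 1):
        for sub in combinations(events, r):
            intersection = set(omega).intersection(*sub)
            product = Fraction(1)
            for E in sub:
                product *= mu(E)
            if mu(intersection) != product:
                return False
    return True

E1 = {w for w in omega if w[0] == 1}      # first toss is a head
E2 = {w for w in omega if w[1] == 1}      # second toss is a head
E3 = {w for w in omega if w[0] == w[1]}   # the two tosses agree
```

Each pair drawn from `E1, E2, E3` passes the check, but the triple fails, because the intersection of all three has measure 1/4 rather than 1/8.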

272B Remarks (a) This is perhaps the central contribution of probability theory to measure theory, and as such
deserves the most careful scrutiny. The idea of ‘independence’ comes from outside mathematics altogether, in the
notion of events which have independent causes. I suppose that 272G and 272M are the results below which most
clearly show the measure-theoretic aspects of the concept. It is not an accident that both involve product measures;
one of the wonders of measure theory is the fact that the same technical devices are used in establishing the probability
theory of stochastic independence and the geometry of multi-dimensional volume.

(b) In the following paragraphs I will try to describe some relationships between the three notions of independence
just defined. But it is worth noting at once the fact that, in all three cases, a family is independent iff all its finite
subfamilies are independent. Consequently any subfamily of an independent family is independent. Another elementary
fact which is immediate from the definitions is that if ⟨Σi⟩i∈I is an independent family of σ-algebras, and Σ′i is a
σ-subalgebra of Σi for each i, then ⟨Σ′i⟩i∈I is an independent family.

(c) A useful reformulation of 272Ab is the following: A family ⟨Σi⟩i∈I of σ-subalgebras of Σ is independent iff
\[ \mu\Big(\bigcap_{i\in I}E_i\Big) = \prod_{i\in I}\mu E_i \]
whenever Ei ∈ Σi for every i and {i : Ei ≠ Ω} is finite. (Here I follow the convention of 254F, saying that for a family
⟨αi⟩i∈I in [0, 1] we take ∏i∈I αi = 1 if I = ∅, and otherwise it is to be $\inf_{J\subseteq I,\ J\text{ is finite}}\prod_{i\in J}\alpha_i$.)

(d) In 272Aa-b I speak of sets Ei ∈ Σ and algebras Σi ⊆ Σ. In fact (272Ac already gives a hint of this) we shall
more often than not be concerned with Σ̂ rather than with Σ, if there is a difference, where (Ω, Σ̂, µ̂) is the completion
of (Ω, Σ, µ).

272C The σ-subalgebra defined by a random variable To relate 272Ab to 272Ac we need the following notion.
Let (Ω, Σ, µ) be a probability space and X a real-valued random variable defined on Ω. Write B for the σ-algebra of
Borel subsets of R, and ΣX for
{X −1 [F ] : F ∈ B} ∪ {(Ω \ dom X) ∪ X −1 [F ] : F ∈ B}.
Then ΣX is a σ-algebra of subsets of Ω. P
∅ = X −1 [∅] ∈ ΣX ;
if F ∈ B then
Ω \ X −1 [F ] = (Ω \ dom X) ∪ X −1 [R \ F ] ∈ ΣX ,

Ω \ ((Ω \ dom X) ∪ X −1 [F ]) = X −1 [R \ F ] ∈ ΣX ;
if ⟨Fk⟩k∈N is any sequence in B then
\[ \bigcup_{k\in\mathbb{N}}X^{-1}[F_k] = X^{-1}\Big[\bigcup_{k\in\mathbb{N}}F_k\Big], \]
so
\[ \bigcup_{k\in\mathbb{N}}X^{-1}[F_k],\qquad (\Omega\setminus\operatorname{dom}X)\cup\bigcup_{k\in\mathbb{N}}X^{-1}[F_k] \]
belong to ΣX . Q
Evidently ΣX is the smallest σ-algebra of subsets of Ω, containing dom X, for which X is measurable. Also ΣX is a
subalgebra of Σ̂, where Σ̂ is the domain of the completion of µ (271Aa).
Now we have the following result.

272D Proposition Let (Ω, Σ, µ) be a probability space and ⟨Xi⟩i∈I a family of real-valued random variables on Ω.
For each i ∈ I, let Σi be the σ-algebra defined by Xi , as in 272C. Then the following are equiveridical:
(i) ⟨Xi⟩i∈I is independent;
(ii) whenever i1 , . . . , in are distinct members of I and F1 , . . . , Fn are Borel subsets of R, then
\[ \Pr(X_{i_j}\in F_j \text{ for every } j\le n) = \prod_{j=1}^{n}\Pr(X_{i_j}\in F_j); \]
(iii) whenever ⟨Fi⟩i∈I is a family of Borel subsets of R, and {i : Fi ≠ R} is finite, then
\[ \hat\mu\Big(\bigcap_{i\in I}(X_i^{-1}[F_i]\cup(\Omega\setminus\operatorname{dom}X_i))\Big) = \prod_{i\in I}\Pr(X_i\in F_i), \]
where µ̂ is the completion of µ;
(iv) ⟨Σi⟩i∈I is independent with respect to µ̂.
proof (a)(i)⇒(ii) Write X = (Xi1 , . . . , Xin ). Write νX for the joint distribution of X , and for each j ≤ n write νj for
the distribution of Xij ; let ν be the product of ν1 , . . . , νn as described in 254A-254C. (I wrote §254 out as for infinite
products. If you are interested only in finite products of probability spaces, which are adequate for our needs in this
paragraph, I recommend reading §§251-252 with the mental proviso that all measures are probabilities, and then §254
with the proviso that the set I is finite.) By 256K, ν is a Radon measure on R n . (This is an induction on n, relying
on 254N for assurance that we can regard ν as the repeated product (. . . ((ν1 × ν2 ) × ν3 ) × . . . νn−1 ) × νn .) Then for
any a = (α1 , . . . , αn ) ∈ R n , we have

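\[ \begin{aligned} \nu\,]-\infty,a] &= \nu\Big(\prod_{j=1}^{n}\,]-\infty,\alpha_j]\Big) = \prod_{j=1}^{n}\nu_j\,]-\infty,\alpha_j] \quad\text{(using 254Fb)}\\ &= \prod_{j=1}^{n}\Pr(X_{i_j}\le\alpha_j) = \Pr(X_{i_j}\le\alpha_j \text{ for every } j\le n) \quad\text{(using the condition (i))}\\ &= \nu_X\,]-\infty,a]. \end{aligned} \]

By the uniqueness assertion in 271Ba, ν = νX . In particular, if F1 , . . . , Fn are Borel subsets of R,
\[ \begin{aligned} \Pr(X_{i_j}\in F_j \text{ for every } j\le n) &= \Pr\Big(X\in\prod_{j\le n}F_j\Big) = \nu_X\Big(\prod_{j\le n}F_j\Big)\\ &= \nu\Big(\prod_{j\le n}F_j\Big) = \prod_{j=1}^{n}\nu_jF_j = \prod_{j=1}^{n}\Pr(X_{i_j}\in F_j), \end{aligned} \]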

as required.
(b)(ii)⇒(i) is trivial, if we recall that all sets ]−∞, α] are Borel sets, so that the definition of independence given
in 272Ac is just a special case of (ii).
(c)(ii)⇒(iv) Assume (ii), and suppose that i1 , . . . , in are distinct members of I and Ej ∈ Σij for each j ≤ n. For
each j, set Ej′ = Ej ∩ dom Xij , so that Ej′ may be expressed as $X_{i_j}^{-1}[F_j]$ for some Borel set Fj ⊆ R. Then µ̂(Ej \ Ej′ ) = 0
for each j, so
\[ \begin{aligned} \hat\mu\Big(\bigcap_{1\le j\le n}E_j\Big) &= \hat\mu\Big(\bigcap_{1\le j\le n}E_j'\Big) = \Pr(X_{i_1}\in F_1,\ldots,X_{i_n}\in F_n)\\ &= \prod_{j=1}^{n}\Pr(X_{i_j}\in F_j) \quad\text{(using (ii))}\\ &= \prod_{j=1}^{n}\hat\mu E_j. \end{aligned} \]
As E1 , . . . , En are arbitrary, ⟨Σi⟩i∈I is independent.


(d)(iv)⇒(ii) Now suppose that ⟨Σi⟩i∈I is independent. If i1 , . . . , in are distinct members of I and F1 , . . . , Fn are
Borel sets in R, then $X_{i_j}^{-1}[F_j]$ ∈ Σij for each j, so
\[ \Pr(X_{i_1}\in F_1,\ldots,X_{i_n}\in F_n) = \hat\mu\Big(\bigcap_{1\le j\le n}X_{i_j}^{-1}[F_j]\Big) = \prod_{j=1}^{n}\hat\mu X_{i_j}^{-1}[F_j] = \prod_{j=1}^{n}\Pr(X_{i_j}\in F_j). \]

(e) Finally, observe that (iii) is nothing but a re-formulation of (ii), because if Fi = R then Pr(Xi ∈ Fi ) = 1 and
Xi−1 [Fi ] ∪ (Ω \ dom Xi ) = Ω.

272E Corollary Let ⟨Xi⟩i∈I be an independent family of real-valued random variables, and ⟨hi⟩i∈I any family of
Borel measurable functions from R to R. Then ⟨hi (Xi )⟩i∈I is independent.
proof Writing Σi for the σ-algebra defined by Xi , Σ′i for the σ-algebra defined by hi (Xi ), hi (Xi ) is Σi -measurable
(121Eg) so Σ′i ⊆ Σi for every i and ⟨Σ′i⟩i∈I is independent, as in 272Bb.

272F Similarly, we can relate the definition in 272Aa to the others.


Proposition Let (Ω, Σ, µ) be a probability space, and ⟨Ei⟩i∈I a family in Σ. Set Σi = {∅, Ei , Ω \ Ei , Ω}, the (σ-)algebra
of subsets of Ω generated by Ei , and Xi = χEi , the characteristic function of Ei . Then the following are equiveridical:
(i) ⟨Ei⟩i∈I is independent;
(ii) ⟨Σi⟩i∈I is independent;
(iii) ⟨Xi⟩i∈I is independent.
proof (i)⇒(iii) If i1 , . . . , in are distinct members of I and α1 , . . . , αn ∈ R, then for each j ≤ n the set Gj = {ω :
Xij (ω) ≤ αj } is either Ω \ Eij or ∅ or Ω. If any Gj is empty, then
\[ \Pr(X_{i_j}\le\alpha_j \text{ for every } j\le n) = 0 = \prod_{j=1}^{n}\Pr(X_{i_j}\le\alpha_j). \]
Otherwise, set K = {j : Gj = Ω \ Eij }; then
\[ \begin{aligned} \Pr(X_{i_j}\le\alpha_j \text{ for every } j\le n) &= \mu\Big(\bigcap_{j\le n}G_j\Big) = \mu\Big(\bigcap_{j\in K}(\Omega\setminus E_{i_j})\Big)\\ &= \prod_{j\in K}\mu(\Omega\setminus E_{i_j}) = \prod_{j=1}^{n}\Pr(X_{i_j}\le\alpha_j), \end{aligned} \]
because ⟨Ω \ Ei⟩i∈I is independent whenever ⟨Ei⟩i∈I is.

As i1 , . . . , in and α1 , . . . , αn are arbitrary, ⟨Xi⟩i∈I is independent.


(iii)⇒(ii) follows from (i)⇒(iv) of 272D, because Σi is the σ-algebra defined by Xi .
(ii)⇒(i) is trivial, because Ei ∈ Σi for each i.
Remark You will I hope feel that while the theory of product measures might be appropriate to 272D, it is surely
rather heavy machinery to use on what ought to be a simple combinatorial problem like (iii)⇒(ii) of this proposition.
I suggest that you construct an ‘elementary’ proof, and examine which of the ideas of the theory of product measures
(and the Monotone Class Theorem, 136B) are actually needed here.

272G Distributions of independent random variables I have not tried to describe the ‘joint distribution’ of an
infinite family of random variables. (Indications of how to deal with a countable family are offered in 271Ya and 272Yb.
For uncountable families I will wait until §454 in Volume 4.) As, however, the independence of a family of random
variables is determined by the behaviour of finite subfamilies, we can approach it through the following proposition.
Theorem Let X = (X1 , . . . , Xn ) be a finite family of real-valued random variables on a probability space. Let νX be
the corresponding distribution on R n . Then the following are equiveridical:
(i) X1 , . . . , Xn are independent;
(ii) νX can be expressed as a product of n probability measures ν1 , . . . , νn , one for each factor R of R n ;
(iii) νX is the product measure of νX1 , . . . , νXn , writing νXi for the distribution of the random variable Xi .
proof (a)(i)⇒(iii) In the proof of (i)⇒(ii) of 272D above I showed that νX is the product ν of νX1 , . . . , νXn .
(b)(iii)⇒(ii) is trivial.
(c)(ii)⇒(i) Suppose that νX is expressible as a product ν1 × . . . × νn . Take a = (α1 , . . . , αn ) in R n . Then
\[ \Pr(X_i\le\alpha_i\ \forall\, i\le n) = \Pr(X\in\,]-\infty,a]) = \nu_X(]-\infty,a]) = \prod_{i=1}^{n}\nu_i\,]-\infty,\alpha_i]. \]
On the other hand, setting Fi = {(ξ1 , . . . , ξn ) : ξi ≤ αi }, we must have
\[ \nu_i\,]-\infty,\alpha_i] = \nu_XF_i = \Pr(X\in F_i) = \Pr(X_i\le\alpha_i) \]
for each i. So we get
\[ \Pr(X_i\le\alpha_i \text{ for every } i\le n) = \prod_{i=1}^{n}\Pr(X_i\le\alpha_i), \]
as required.
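
For random variables taking finitely many values, the criterion of 272G reduces to comparing the joint point masses with the products of the marginal ones. The sketch below does this exactly; the sample space (two fair three-sided dice), the variables X1, X2, S and the helper names are assumptions made for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical probability space: two fair three-sided dice, uniform measure.
omega = list(product(range(1, 4), repeat=2))
p = Fraction(1, len(omega))

def law(rv):
    """Distribution of rv as {value: probability}."""
    masses = Counter()
    for w in omega:
        masses[rv(w)] += p
    return dict(masses)

def is_product(rvs):
    """True iff the joint distribution is the product of the marginal ones."""
    joint = law(lambda w: tuple(rv(w) for rv in rvs))
    marginals = [law(rv) for rv in rvs]
    for point in product(*(m.keys() for m in marginals)):
        expected = Fraction(1)
        for m, x in zip(marginals, point):
            expected *= m[x]
        if joint.get(point, Fraction(0)) != expected:
            return False
    return True

def X1(w):
    return w[0]

def X2(w):
    return w[1]

def S(w):
    return w[0] + w[1]
```

Here `is_product([X1, X2])` holds, while `is_product([X1, S])` fails, reflecting the fact that X1 and X1 + X2 are not independent.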

272H Corollary Suppose that ⟨Xi⟩i∈I is an independent family of real-valued random variables on a probability
space (Ω, Σ, µ), and that for each i ∈ I we are given another real-valued random variable Yi on Ω such that Yi =a.e. Xi .
Then ⟨Yi⟩i∈I is independent.
proof For every distinct i1 , . . . , in ∈ I, if we set X = (Xi1 , . . . , Xin ) and Y = (Yi1 , . . . , Yin ), then X =a.e. Y , so
νX , νY are equal (271De). By 272G, Yi1 , . . . , Yin must be independent because Xi1 , . . . , Xin are. As i1 , . . . , in are
arbitrary, the whole family ⟨Yi⟩i∈I is independent.
Remark It follows that we may speak of independent families in the space L0 (µ) of equivalence classes of random
variables (241C), saying that ⟨Xi•⟩i∈I is independent iff ⟨Xi⟩i∈I is.

272I Corollary Suppose that X1 , . . . , Xn are independent real-valued random variables with density functions
f1 , . . . , fn (271H). Then X = (X1 , . . . , Xn ) has a density function f given by setting $f(x) = \prod_{i=1}^{n} f_i(\xi_i)$ whenever
x = (ξ1 , . . . , ξn ) ∈ ∏i≤n dom(fi ) ⊆ R n .
proof For n = 2 this is covered by 253I; the general case follows by induction on n.

272J The most important theorems of the subject refer to independent families of random variables, rather than
independent families of σ-algebras. The value of the concept of independent σ-algebras lies in such results as the
following.
Proposition Let (Ω, Σ, µ) be a complete probability space, and ⟨Σi⟩i∈I a family of σ-subalgebras of Σ. For each i ∈ I
let µi be the restriction of µ to Σi , and let (ΩI , Λ, λ) be the product probability space of the family ⟨(Ω, Σi , µi )⟩i∈I .
Define φ : Ω → ΩI by setting φ(ω)(i) = ω whenever ω ∈ Ω and i ∈ I. Then φ is inverse-measure-preserving iff ⟨Σi⟩i∈I
is independent.
proof This is virtually a restatement of 254Fb and 254G. (i) If φ is inverse-measure-preserving, i1 , . . . , in ∈ I are
distinct and Ej ∈ Σij for each j, then ⋂j≤n Ej = φ−1 [{x : x(ij ) ∈ Ej for every j ≤ n}], so that
\[ \mu\Big(\bigcap_{j\le n}E_j\Big) = \lambda\{x : x(i_j)\in E_j \text{ for every } j\le n\} = \prod_{j=1}^{n}\mu_{i_j}E_j = \prod_{j=1}^{n}\mu E_j. \]
(ii) If ⟨Σi⟩i∈I is independent, Ei ∈ Σi for every i ∈ I and {i : Ei ≠ Ω} is finite, then
\[ \mu\phi^{-1}\Big[\prod_{i\in I}E_i\Big] = \mu\Big(\bigcap_{i\in I}E_i\Big) = \prod_{i\in I}\mu E_i = \prod_{i\in I}\mu_iE_i. \]
So the conditions of 254G are satisfied and µφ−1 [W ] = λW for every W ∈ Λ.

272K Proposition Let (Ω, Σ, µ) be a probability space and ⟨Σi⟩i∈I an independent family of σ-subalgebras of Σ.
Let ⟨J(s)⟩s∈S be a disjoint family of subsets of I, and for each s ∈ S let Σ̃s be the σ-algebra of subsets of Ω generated
by ⋃i∈J(s) Σi . Then ⟨Σ̃s⟩s∈S is independent.

proof Let (Ω, Σ̂, µ̂) be the completion of (Ω, Σ, µ). On ΩI let λ be the product of the measures µ↾Σi , and let φ : Ω → ΩI
be the diagonal map, as in 272J. φ is inverse-measure-preserving for µ̂ and λ, by 272J.
We can identify λ with the product of ⟨λs⟩s∈S , where for each s ∈ S λs is the product of ⟨µ↾Σi⟩i∈J(s) (254N). For
s ∈ S, let Λs be the domain of λs , and set πs (x) = x↾J(s) for x ∈ ΩI , so that πs is inverse-measure-preserving for λ
and λs (254Oa), and φs = πs φ is inverse-measure-preserving for µ̂ and λs ; of course φs is the diagonal map from Ω to
ΩJ(s) . Set $\Sigma^*_s = \{\phi_s^{-1}[H] : H\in\Lambda_s\}$. Then $\Sigma^*_s$ is a σ-subalgebra of Σ̂, and $\Sigma^*_s\supseteq\tilde\Sigma_s$, because
\[ E = \phi_s^{-1}[\{x : x(i)\in E\}]\in\Sigma^*_s \]
whenever i ∈ J(s) and E ∈ Σi .
Now suppose that s1 , . . . , sn ∈ S are distinct and that Ej ∈ Σ̃sj for each j. Then $E_j\in\Sigma^*_{s_j}$, so there are Hj ∈ Λsj
such that $E_j = \phi_{s_j}^{-1}[H_j]$ for each j. Set

W = {x : x ∈ ΩI , x↾J(sj ) ∈ Hj for every j ≤ n}.

Because we can identify λ with the product of the λs , we have
\[ \lambda W = \prod_{j=1}^{n}\lambda_{s_j}H_j = \prod_{j=1}^{n}\hat\mu(\phi_{s_j}^{-1}[H_j]) = \prod_{j=1}^{n}\hat\mu E_j = \prod_{j=1}^{n}\mu E_j. \]
On the other hand, φ−1 [W ] = ⋂j≤n Ej , so, because φ is inverse-measure-preserving,
\[ \mu\Big(\bigcap_{j\le n}E_j\Big) = \hat\mu\Big(\bigcap_{j\le n}E_j\Big) = \lambda W = \prod_{j=1}^{n}\mu E_j. \]
As E1 , . . . , En are arbitrary, ⟨Σ̃s⟩s∈S is independent.

272L I give a typical application of this result as a sample.


Corollary Let X, X1 , . . . , Xn be independent real-valued random variables and h : R n → R a Borel function. Then
X and h(X1 , . . . , Xn ) are independent.
proof Let ΣX , ΣXi be the σ-algebras defined by X, Xi (272C). Then ΣX , ΣX1 , . . . , ΣXn are independent (272D).
Let Σ∗ be the σ-algebra generated by ΣX1 ∪ . . . ∪ ΣXn . Then 272K (perhaps working in the completion of the original
probability space) tells us that ΣX and Σ∗ are independent. But every Xj is Σ∗ -measurable so Y = h(X1 , . . . , Xn ) is
Σ∗ -measurable (121Kb); also dom Y ∈ Σ∗ , so ΣY ⊆ Σ∗ and ΣX , ΣY are independent. By 272D again, X and Y are
independent, as claimed.
Remark Nearly all of us, when teaching elementary probability theory, would invite our students to treat this corollary
(with an explicit function h, of course) as ‘obvious’. In effect, the proof here is a confirmation that the formal definition
of ‘independence’ offered is a faithful representation of our intuition of independent events having independent causes.

272M Products of probability spaces and independent families of random variables We have already
seen that the concept of ‘independent random variables’ is intimately linked with that of ‘product measure’. I now give
some further manifestations of the connexion.
Proposition Let $\langle(\Omega_i, \Sigma_i, \mu_i)\rangle_{i \in I}$ be a family of probability spaces, and $(\Omega, \Sigma, \mu)$ their product.
(a) For each $i \in I$ write $\tilde\Sigma_i = \{\pi_i^{-1}[E] : E \in \Sigma_i\}$, where $\pi_i : \Omega \to \Omega_i$ is the coordinate map. Then $\langle\tilde\Sigma_i\rangle_{i \in I}$ is an independent family of σ-subalgebras of $\Sigma$.
(b) For each $i \in I$ let $\langle X_{ij}\rangle_{j \in J(i)}$ be an independent family of real-valued random variables on $\Omega_i$, and for $i \in I$, $j \in J(i)$ write $\tilde X_{ij}(\omega) = X_{ij}(\omega(i))$ for those $\omega \in \Omega$ such that $\omega(i) \in \operatorname{dom} X_{ij}$. Then $\langle\tilde X_{ij}\rangle_{i \in I, j \in J(i)}$ is an independent family of random variables, and each $\tilde X_{ij}$ has the same distribution as the corresponding $X_{ij}$.
proof (a) It is easy to check that each $\tilde\Sigma_i$ is a σ-algebra of sets. The rest amounts just to recalling from 254Fb that if $J \subseteq I$ is finite and $E_i \in \Sigma_i$ for $i \in J$, then
\[ \mu\bigl(\bigcap_{i \in J} \pi_i^{-1}[E_i]\bigr) = \mu\{\omega : \omega(i) \in E_i \text{ for every } i \in I\} = \prod_{i \in I} \mu_i E_i \]
if we set $E_i = \Omega_i$ for $i \in I \setminus J$.
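(b) We know also that $(\Omega, \Sigma, \mu)$ is the product of the completions $(\Omega_i, \hat\Sigma_i, \hat\mu_i)$ (254I). From this, we see that each $\tilde X_{ij}$ is defined $\mu$-a.e., and is $\Sigma$-measurable, with the same distribution as $X_{ij}$. Now apply condition (iii) of 272D. Suppose that $\langle F_{ij}\rangle_{i \in I, j \in J(i)}$ is a family of Borel sets in $\mathbb{R}$, and that $\{(i, j) : F_{ij} \ne \mathbb{R}\}$ is finite. Consider
\[ E_i = \bigcap_{j \in J(i)} \bigl(X_{ij}^{-1}[F_{ij}] \cup (\Omega_i \setminus \operatorname{dom} X_{ij})\bigr), \]
\[ E = \prod_{i \in I} E_i = \bigcap_{i \in I, j \in J(i)} \bigl(\tilde X_{ij}^{-1}[F_{ij}] \cup (\Omega \setminus \operatorname{dom} \tilde X_{ij})\bigr). \]
Because each family $\langle X_{ij}\rangle_{j \in J(i)}$ is independent, and $\{j : F_{ij} \ne \mathbb{R}\}$ is finite,
\[ \hat\mu_i E_i = \prod_{j \in J(i)} \Pr(X_{ij} \in F_{ij}) \]
for each $i \in I$. Because
\[ \{i : E_i \ne \Omega_i\} \subseteq \{i : \exists\, j \in J(i),\ F_{ij} \ne \mathbb{R}\} \]
is finite,
\[ \mu E = \prod_{i \in I} \hat\mu_i E_i = \prod_{i \in I, j \in J(i)} \Pr(\tilde X_{ij} \in F_{ij}); \]
as $\langle F_{ij}\rangle_{i \in I, j \in J(i)}$ is arbitrary, $\langle\tilde X_{ij}\rangle_{i \in I, j \in J(i)}$ is independent.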
Remark The formulation in (b) is more complicated than is necessary to express the idea, but is what is needed for
an application below.

272N A special case of 272J is of particular importance in general measure theory, and is most useful in an adapted
form.
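Proposition Let $(\Omega, \Sigma, \mu)$ be a complete probability space, and $\langle E_i\rangle_{i \in I}$ an independent family in $\Sigma$ such that $\mu E_i = \frac12$ for every $i \in I$. Define $\phi : \Omega \to \{0, 1\}^I$ by setting $\phi(\omega)(i) = 1$ if $\omega \in E_i$, $0$ if $\omega \in \Omega \setminus E_i$. Then $\phi$ is inverse-measure-preserving for the usual measure $\lambda$ on $\{0, 1\}^I$ (254J).
proof I use 254G again. For each $i \in I$ let $\Sigma_i$ be the algebra $\{\emptyset, E_i, \Omega \setminus E_i, \Omega\}$; then $\langle\Sigma_i\rangle_{i \in I}$ is independent (272F). For $i \in I$ set $\phi_i(\omega) = \phi(\omega)(i)$. Let $\nu$ be the usual measure of $\{0, 1\}$. Then it is easy to check that
\[ \mu\phi_i^{-1}[H] = \tfrac12 \#(H) = \nu H \]
for every $H \subseteq \{0, 1\}$. If $\langle H_i\rangle_{i \in I}$ is a family of subsets of $\{0, 1\}$, and $\{i : H_i \ne \{0, 1\}\}$ is finite, then
\[ \mu\phi^{-1}\bigl[\prod_{i \in I} H_i\bigr] = \mu\bigl(\bigcap_{i \in I} \phi_i^{-1}[H_i]\bigr) = \prod_{i \in I} \mu\phi_i^{-1}[H_i] \]
(because $\phi_i^{-1}[H_i] \in \Sigma_i$ for each $i$, and $\langle\Sigma_i\rangle_{i \in I}$ is independent)
\[ = \prod_{i \in I} \nu H_i = \lambda\bigl(\prod_{i \in I} H_i\bigr). \]
As $\langle H_i\rangle_{i \in I}$ is arbitrary, 254G gives the result.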

272O Tail σ-algebras and the zero-one law I have never been able to make up my mind whether the following
result is ‘deep’ or not. I think it is one of the many cases in mathematics where a theorem is surprising and exciting if
one comes on it unprepared, but is natural and straightforward if one approaches it from the appropriate angle.
Proposition Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle\Sigma_n\rangle_{n \in \mathbb{N}}$ an independent sequence of σ-subalgebras of $\Sigma$. Let $\Sigma^*_n$ be the σ-algebra generated by $\bigcup_{m \ge n} \Sigma_m$ for each $n$, and set $\Sigma^*_\infty = \bigcap_{n \in \mathbb{N}} \Sigma^*_n$. Then $\mu E$ is either 0 or 1 for every $E \in \Sigma^*_\infty$.
proof For each $n$, the family $(\Sigma_0, \ldots, \Sigma_n, \Sigma^*_{n+1})$ is independent, by 272K. So $(\Sigma_0, \ldots, \Sigma_n, \Sigma^*_\infty)$ is independent, because $\Sigma^*_\infty \subseteq \Sigma^*_{n+1}$. But this means that every finite subfamily of $(\Sigma^*_\infty, \Sigma_0, \Sigma_1, \ldots)$ is independent, and therefore that the whole family is (272Bb). Consequently $(\Sigma^*_\infty, \Sigma^*_0)$ must be independent, by 272K again.
Now if $E \in \Sigma^*_\infty$, then $E$ also belongs to $\Sigma^*_0$, so we must have
\[ \mu(E \cap E) = \mu E \cdot \mu E, \]
that is, $\mu E = (\mu E)^2$; so that $\mu E \in \{0, 1\}$, as claimed.

272P To support the claim that somewhere we have achieved a non-trivial insight, I give a corollary, which will be
fundamental to the understanding of the limit theorems in the next section, and does not seem to be obvious.
Corollary Let $(\Omega, \Sigma, \mu)$ be a probability space, and $\langle X_n\rangle_{n \in \mathbb{N}}$ an independent sequence of real-valued random variables on $\Omega$. Then
\[ \limsup_{n \to \infty} \frac{1}{n+1}(X_0 + \ldots + X_n) \]
is almost everywhere constant; that is, there is some $u \in [-\infty, \infty]$ such that
\[ \limsup_{n \to \infty} \frac{1}{n+1}(X_0 + \ldots + X_n) = u \]
almost everywhere.
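proof We may suppose that each $X_n$ is $\Sigma$-measurable and defined everywhere in $\Omega$, because (as remarked in 272H) changing the $X_n$ on a negligible set does not affect their independence, and it affects $\limsup_{n \to \infty} \frac{1}{n+1}(X_0 + \ldots + X_n)$ only on a negligible set. For each $n$, let $\Sigma_n$ be the σ-algebra generated by $X_n$ (272C), and $\Sigma^*_n$ the σ-algebra generated by $\bigcup_{m \ge n} \Sigma_m$; set $\Sigma^*_\infty = \bigcap_{n \in \mathbb{N}} \Sigma^*_n$. By 272D, $\langle\Sigma_n\rangle_{n \in \mathbb{N}}$ is independent, so $\mu E \in \{0, 1\}$ for every $E \in \Sigma^*_\infty$ (272O).
Now take any $a \in \mathbb{R}$ and set
\[ E_a = \{\omega : \limsup_{m \to \infty} \frac{1}{m+1}(X_0(\omega) + \ldots + X_m(\omega)) \le a\}. \]
Then, for any $n \in \mathbb{N}$,
\[ \limsup_{m \to \infty} \frac{1}{m+1}(X_0 + \ldots + X_m) = \limsup_{m \to \infty} \frac{1}{m+1}(X_n + \ldots + X_{m+n}), \]
so
\[ E_a = \{\omega : \limsup_{m \to \infty} \frac{1}{m+1}(X_n(\omega) + \ldots + X_{n+m}(\omega)) \le a\} \]
belongs to $\Sigma^*_n$ for every $n$, because $X_i$ is $\Sigma^*_n$-measurable for every $i \ge n$. So $E_a \in \Sigma^*_\infty$ and
\[ \Pr\bigl(\limsup_{m \to \infty} \frac{1}{m+1}(X_0 + \ldots + X_m) \le a\bigr) = \mu E_a \]
must be either 0 or 1. Setting
\[ u = \sup\{a : a \in \mathbb{R},\ \mu E_a = 0\} \]
(allowing $\sup\emptyset = -\infty$ and $\sup\mathbb{R} = \infty$, as usual in such contexts), we see that
\[ \limsup_{n \to \infty} \frac{1}{n+1}(X_0 + \ldots + X_n) = u \]
almost everywhere.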

*272Q I add here a result which will be useful in Volume 5 and which gives further insight into the nature of large
independent families.
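Theorem Let $(\Omega, \Sigma, \mu)$ be a probability space, and $\langle\Sigma_i\rangle_{i \in I}$ an independent family of σ-subalgebras of $\Sigma$. Let $\mathcal{E} \subseteq \Sigma$ be a family of measurable sets, and $\mathrm{T}$ the σ-algebra generated by $\mathcal{E}$. Then there is a set $J \subseteq I$ such that $\#(I \setminus J) \le \max(\omega, \#(\mathcal{E}))$ and $\mathrm{T}$, $\langle\Sigma_j\rangle_{j \in J}$ are independent, in the sense that $\mu(F \cap \bigcap_{r \le n} E_r) = \mu F \cdot \prod_{r=1}^n \mu E_r$ whenever $F \in \mathrm{T}$, $j_1, \ldots, j_n$ are distinct members of $J$ and $E_r \in \Sigma_{j_r}$ for each $r \le n$.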

proof (a) As in 272J, give $\Omega^I$ the probability measure $\lambda$ which is the product of the measures $\mu{\restriction}\Sigma_i$, and let $\phi : \Omega \to \Omega^I$ be the diagonal map, so that $\phi$ is inverse-measure-preserving for $\hat\mu$ and $\lambda$, where $\hat\mu$ is the completion of $\mu$. Write $\Lambda$ for the domain of $\lambda$. Set $\kappa = \max(\omega, \#(\mathcal{E}))$, and let $\mathcal{E}^*$ be the set $\{\bigcap_{r \le n} F_r : n \in \mathbb{N},\ F_r \in \mathcal{E} \text{ for every } r \le n\}$. Because $\#(\mathcal{E}^n) \le \kappa$ for each $n$ (2A1Lc), $\#(\mathcal{E}^*) \le \kappa$ (2A1Ld). For each $F \in \mathcal{E}^*$, define $\nu_F : \Lambda \to [0, 1]$ by setting $\nu_F W = \hat\mu(F \cap \phi^{-1}[W])$; then $\nu_F$ is countably additive and dominated by $\lambda$. It therefore has a Radon-Nikodým derivative $h_F$ with respect to $\lambda$, so that $\hat\mu(F \cap \phi^{-1}[W]) = \int_W h_F\,d\lambda$ for every $W \in \Lambda$ (232F). By 254Qc or 254Rb, we can find a function $h'_F$ equal $\lambda$-almost everywhere to $h_F$ and determined by coordinates in a countable set $J_F$, in the sense that $h'_F(w) = h'_F(w')$ whenever $w, w' \in \Omega^I$ and $w{\restriction}J_F = w'{\restriction}J_F$. (I am taking it for granted that we chose $h'_F$ to be defined everywhere on $\Omega^I$.)
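(b) Set $J = I \setminus \bigcup_{F \in \mathcal{E}^*} J_F$; by 2A1Ld, $I \setminus J = \bigcup_{F \in \mathcal{E}^*} J_F$ has cardinal at most $\kappa$. If $F \in \mathcal{E}^*$, $j_1, \ldots, j_n$ are distinct members of $J$ and $E_r \in \Sigma_{j_r}$ for each $r \le n$, then $\mu(F \cap \bigcap_{r \le n} E_r) = \mu F \cdot \prod_{r=1}^n \mu E_r$. P Set $W = \{w : w \in \Omega^I,\ w(j_r) \in E_r \text{ for each } r \le n\}$. Then
\[ \mu\bigl(F \cap \bigcap_{r \le n} E_r\bigr) = \hat\mu(F \cap \phi^{-1}[W]) = \int_W h'_F\,d\lambda = \int h'_F \times \chi W\,d\lambda. \]
But observe that $W$ is determined by coordinates in $J$, while $h'_F$ is determined by coordinates in $J_F \subseteq I \setminus J$; putting 272Ma, 272K and 272R together (or otherwise), we have
\[ \mu\bigl(F \cap \bigcap_{r \le n} E_r\bigr) = \int h'_F \times \chi W\,d\lambda = \int h'_F\,d\lambda \cdot \lambda W = \mu F \cdot \prod_{r=1}^n \mu E_r. \]
Q
(c) Now consider the family $\mathcal{A}$ of those sets $F \in \Sigma$ such that $\mu(F \cap \bigcap_{r \le n} E_r) = \mu F \cdot \prod_{r=1}^n \mu E_r$ whenever $j_1, \ldots, j_n \in J$ are distinct and $E_r \in \Sigma_{j_r}$ for every $r \le n$. It is easy to check that $\mathcal{A}$ is a Dynkin class, and we have just seen that $\mathcal{A}$ includes $\mathcal{E}^*$; as $\mathcal{E}^*$ is closed under $\cap$, $\mathcal{A}$ includes the σ-algebra $\mathrm{T}$ of sets generated by $\mathcal{E}^*$ (136B). And this is just what the theorem asserts.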

272R I must now catch up on some basic facts from elementary probability theory.
Proposition Let $X$, $Y$ be independent real-valued random variables with finite expectation (271Ab). Then $\mathbb{E}(X \times Y)$ exists and is equal to $\mathbb{E}(X)\mathbb{E}(Y)$.
proof Let $\nu_{(X,Y)}$ be the joint distribution of the pair $(X, Y)$. Then $\nu_{(X,Y)}$ is the product of the distributions $\nu_X$ and $\nu_Y$ (272G). Also $\int x\,\nu_X(dx) = \mathbb{E}(X)$ and $\int y\,\nu_Y(dy) = \mathbb{E}(Y)$ exist in $\mathbb{R}$ (271F). So
\[ \int xy\,\nu_{(X,Y)}(d(x, y)) \text{ exists } = \mathbb{E}(X)\mathbb{E}(Y) \]
(253D). But this is just $\mathbb{E}(X \times Y)$, by 271E with $h(x, y) = xy$.

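272S Bienaymé’s Equality Let $X_1, \ldots, X_n$ be independent real-valued random variables. Then $\operatorname{Var}(X_1 + \ldots + X_n) = \operatorname{Var}(X_1) + \ldots + \operatorname{Var}(X_n)$.
proof (a) Suppose first that all the $X_i$ have finite variance. Set $a_i = \mathbb{E}(X_i)$, $Y_i = X_i - a_i$, $X = X_1 + \ldots + X_n$, $Y = Y_1 + \ldots + Y_n$; then $\mathbb{E}(X) = a_1 + \ldots + a_n$, so $Y = X - \mathbb{E}(X)$ and
\[ \operatorname{Var}(X) = \mathbb{E}(Y^2) = \mathbb{E}\Bigl(\bigl(\sum_{i=1}^n Y_i\bigr)^2\Bigr) = \mathbb{E}\Bigl(\sum_{i=1}^n \sum_{j=1}^n Y_i \times Y_j\Bigr) = \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}(Y_i \times Y_j). \]
Now observe that if $i \ne j$ then $\mathbb{E}(Y_i \times Y_j) = \mathbb{E}(Y_i)\mathbb{E}(Y_j) = 0$, because $Y_i$ and $Y_j$ are independent (by 272E) and we may use 272R, while if $i = j$ then
\[ \mathbb{E}(Y_i \times Y_j) = \mathbb{E}(Y_i^2) = \mathbb{E}\bigl((X_i - \mathbb{E}(X_i))^2\bigr) = \operatorname{Var}(X_i). \]
So
\[ \operatorname{Var}(X) = \sum_{i=1}^n \mathbb{E}(Y_i^2) = \sum_{i=1}^n \operatorname{Var}(X_i). \]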

(b)(i) I show next that if $\operatorname{Var}(X_1 + X_2) < \infty$ then $\operatorname{Var}(X_1) < \infty$. P We have
\[ \iint (x + y)^2\,\nu_{X_1}(dx)\,\nu_{X_2}(dy) = \int (x + y)^2\,\nu_{(X_1,X_2)}(d(x, y)) \]
(by 272G and Fubini’s theorem)
\[ = \mathbb{E}((X_1 + X_2)^2) \]
(by 271E)
\[ < \infty. \]
So there must be some $a \in \mathbb{R}$ such that $\int (x + a)^2\,\nu_{X_1}(dx)$ is finite, that is, $\mathbb{E}((X_1 + a)^2) < \infty$; consequently $\mathbb{E}(X_1^2)$ and $\operatorname{Var}(X_1)$ are finite. Q
(ii) Now an easy induction (relying on 272L!) shows that if $\operatorname{Var}(X_1 + \ldots + X_n)$ is finite, so is $\operatorname{Var}(X_j)$ for every $j$. Turning this round, if $\sum_{j=1}^n \operatorname{Var}(X_j) = \infty$, then $\operatorname{Var}(X_1 + \ldots + X_n) = \infty$, and again the two are equal.
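The finite-variance case of Bienaymé’s equality can be checked exactly on small discrete examples. The following is a minimal sketch, not part of the original text: the three marginal distributions and the helper names (`expectation`, `variance`, `sum_of_independent`) are hypothetical, chosen only for illustration. Independence is modelled by taking the joint distribution to be the product of the marginals, as in 272G.

```python
from fractions import Fraction
from itertools import product

def expectation(dist):
    """Expectation of a finite distribution given as {value: probability}."""
    return sum(p * x for x, p in dist.items())

def variance(dist):
    """Variance E((X - E X)^2) of a finite distribution."""
    m = expectation(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

def sum_of_independent(dists):
    """Distribution of X_1 + ... + X_n when the X_i are independent,
    i.e. when the joint distribution is the product of the marginals."""
    out = {}
    for combo in product(*(d.items() for d in dists)):
        value = sum(x for x, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        out[value] = out.get(value, Fraction(0)) + prob
    return out

# Three illustrative marginals (assumed for this example only).
dists = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {-1: Fraction(1, 3), 0: Fraction(1, 3), 2: Fraction(1, 3)},
    {1: Fraction(1, 4), 5: Fraction(3, 4)},
]

var_of_sum = variance(sum_of_independent(dists))
sum_of_vars = sum(variance(d) for d in dists)
print(var_of_sum, sum_of_vars, var_of_sum == sum_of_vars)
```

Exact rational arithmetic is used so that the comparison is an identity rather than a floating-point approximation.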

272T The distribution of a sum of independent random variables: Theorem Let $X$, $Y$ be independent real-valued random variables on a probability space $(\Omega, \Sigma, \mu)$, with distributions $\nu_X$, $\nu_Y$. Then the distribution of $X + Y$ is the convolution $\nu_X * \nu_Y$ (257A).
proof Set $\nu = \nu_X * \nu_Y$. Take $a \in \mathbb{R}$ and set $h = \chi\,]{-\infty}, a]$. Then $h$ is $\nu$-integrable, so
\[ \nu\,]{-\infty}, a] = \int h\,d\nu = \int h(x + y)\,(\nu_X \times \nu_Y)(d(x, y)) \]
(by 257B, writing $\nu_X \times \nu_Y$ for the product measure on $\mathbb{R}^2$)
\[ = \int h(x + y)\,\nu_{(X,Y)}(d(x, y)) \]
(by 272G, writing $\nu_{(X,Y)}$ for the joint distribution of $(X, Y)$; this is where we use the hypothesis that $X$ and $Y$ are independent)
\[ = \mathbb{E}(h(X + Y)) \]
(applying 271E to the function $(x, y) \mapsto h(x + y)$)
\[ = \Pr(X + Y \le a). \]
As $a$ is arbitrary, $\nu_X * \nu_Y$ is the distribution of $X + Y$.

272U Corollary Suppose that X and Y are independent real-valued random variables, and that they have densities
f and g. Then the convolution f ∗ g is a density function for X + Y .
proof By 257F, f ∗ g is a density function for νX ∗ νY = νX+Y .
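For finitely supported distributions on the integers the convolution of 272T reduces to a finite sum, which makes the theorem easy to illustrate. The sketch below is an illustration under that discrete assumption, not the general measure-theoretic construction of 257A; the function name `convolve` and the choice of two fair dice are hypothetical.

```python
from fractions import Fraction

def convolve(nu_x, nu_y):
    """Convolution of two finitely supported distributions on the integers:
    (nu_x * nu_y){s} is the sum over x + y = s of nu_x{x} * nu_y{y}."""
    out = {}
    for x, p in nu_x.items():
        for y, q in nu_y.items():
            out[x + y] = out.get(x + y, Fraction(0)) + p * q
    return out

# Distribution of one fair die (illustrative choice).
die = {k: Fraction(1, 6) for k in range(1, 7)}
two_dice = convolve(die, die)

# Direct count over the 36 equally likely outcomes of two independent dice.
direct = {}
for x in range(1, 7):
    for y in range(1, 7):
        direct[x + y] = direct.get(x + y, Fraction(0)) + Fraction(1, 36)

print(two_dice == direct, two_dice[7])
```

The comparison with the direct count is where independence enters: the 36 outcomes are equally likely precisely because the joint distribution is the product of the two marginals.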

272V The following simple result will be very useful when we come to stochastic processes in Volume 4, as well as
in the next section.

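Etemadi’s lemma (Etemadi 96) Let $X_0, \ldots, X_n$ be independent real-valued random variables. For $m \le n$, set $S_m = \sum_{i=0}^m X_i$. Then
\[ \Pr(\sup_{m \le n} |S_m| \ge 3\gamma) \le 3 \max_{m \le n} \Pr(|S_m| \ge \gamma) \]
for every $\gamma > 0$.
proof As in 272P, we may suppose that every $X_i$ is a measurable function defined everywhere on a probability space $(\Omega, \Sigma, \mu)$. Set $\alpha = \max_{m \le n} \Pr(|S_m| \ge \gamma)$. For each $r \le n$, set
\[ E_r = \{\omega : |S_m(\omega)| < 3\gamma \text{ for every } m < r,\ |S_r(\omega)| \ge 3\gamma\}. \]
Then $E_0, \ldots, E_n$ is a partition of $\{\omega : \max_{m \le n} |S_m(\omega)| \ge 3\gamma\}$. Set $E'_r = \{\omega : \omega \in E_r,\ |S_n(\omega)| < \gamma\}$. Then $E'_r \subseteq \{\omega : \omega \in E_r,\ |(S_n - S_r)(\omega)| > 2\gamma\}$. But $E_r$ depends on $X_0, \ldots, X_r$ so is independent of $\{\omega : |(S_n - S_r)(\omega)| > 2\gamma\}$, which can be calculated from $X_{r+1}, \ldots, X_n$ (272K). So
\[ \mu E'_r \le \mu\{\omega : \omega \in E_r,\ |(S_n - S_r)(\omega)| > 2\gamma\} = \mu E_r \cdot \Pr(|S_n - S_r| > 2\gamma) \le \mu E_r\bigl(\Pr(|S_n| > \gamma) + \Pr(|S_r| > \gamma)\bigr) \le 2\alpha\mu E_r, \]
and $\mu(E_r \setminus E'_r) \ge (1 - 2\alpha)\mu E_r$. On the other hand, $\langle E_r \setminus E'_r\rangle_{r \le n}$ is a disjoint family of sets all included in $\{\omega : |S_n(\omega)| \ge \gamma\}$. So
\[ \alpha \ge \mu\{\omega : |S_n(\omega)| \ge \gamma\} \ge \sum_{r=0}^n \mu(E_r \setminus E'_r) \ge (1 - 2\alpha) \sum_{r=0}^n \mu E_r, \]
and
\[ \Pr(\sup_{r \le n} |S_r| \ge 3\gamma) = \sum_{r=0}^n \mu E_r \le \min\bigl(1, \frac{\alpha}{1 - 2\alpha}\bigr) \le 3\alpha \]
(considering $\alpha \le \frac13$, $\alpha \ge \frac13$ separately), as required.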
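Both sides of Etemadi’s inequality can be computed exactly for a simple random walk. The following sketch assumes, purely for illustration, that the $X_i$ are independent and uniform on $\{-1, +1\}$ and that $\gamma$ is a positive integer; the function name `etemadi_sides` is hypothetical. The left-hand side is obtained by tracking the law of the walk restricted to paths that have stayed strictly inside $]-3\gamma, 3\gamma[$.

```python
from fractions import Fraction

def etemadi_sides(n_vars, gamma):
    """Return (Pr(sup_m |S_m| >= 3*gamma), 3 * max_m Pr(|S_m| >= gamma)) for
    S_m = X_0 + ... + X_m with n_vars independent steps uniform on {-1, +1}.
    Both the step distribution and integer gamma are illustrative assumptions."""
    half = Fraction(1, 2)
    free = {0: Fraction(1)}     # law of the walk with no barrier
    alive = {0: Fraction(1)}    # law restricted to paths with |S_j| < 3*gamma so far
    alpha = Fraction(0)
    for _ in range(n_vars):
        new_free, new_alive = {}, {}
        for pos, p in free.items():
            for step in (-1, 1):
                new_free[pos + step] = new_free.get(pos + step, 0) + half * p
        for pos, p in alive.items():
            for step in (-1, 1):
                if abs(pos + step) < 3 * gamma:
                    new_alive[pos + step] = new_alive.get(pos + step, 0) + half * p
        free, alive = new_free, new_alive
        tail = sum(p for pos, p in free.items() if abs(pos) >= gamma)
        alpha = max(alpha, tail)
    lhs = 1 - sum(alive.values())
    return lhs, 3 * alpha

lhs, rhs = etemadi_sides(n_vars=100, gamma=12)
print(float(lhs), float(rhs), lhs <= rhs)
```

For small $\gamma$ the right-hand side exceeds 1 and the inequality is vacuous; it carries information only when $\alpha < \frac13$, which is why a walk of moderate length and a threshold of the order of its standard deviation are used here.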

*272W The next result is a similarly direct application of the ideas of this section. While it will not be used in this
volume, it is an accessible and useful representative of a very large number of results on tails of sums of independent
random variables.
Theorem (Hoeffding 63) Let $X_0, \ldots, X_n$ be independent real-valued random variables such that $0 \le X_i \le 1$ a.e. for every $i$. Set $S = \frac{1}{n+1} \sum_{i=0}^n X_i$ and $a = \mathbb{E}(S)$. Then
\[ \Pr(S - a \ge c) \le \exp(-2(n + 1)c^2) \]
for every $c \ge 0$.
proof (a) Set $a_i = \mathbb{E}(X_i)$ for each $i$. If $b \ge 0$ and $i \le n$, then
\[ \mathbb{E}(e^{bX_i}) \le \exp(ba_i + \tfrac18 b^2). \]
P Set $\phi(x) = e^{bx}$ for $x \in \mathbb{R}$. Then $\phi$ is convex, so
\[ \phi(x) \le 1 + x(e^b - 1) \]
whenever $x \in [0, 1]$,
\[ \phi(X_i) \le_{\text{a.e.}} 1 + (e^b - 1)X_i \]
and
\[ \mathbb{E}(e^{bX_i}) = \mathbb{E}(\phi(X_i)) \le 1 + (e^b - 1)a_i = e^{h(b)} \]
where $h(t) = \ln(1 - a_i + a_i e^t)$ for $t \in \mathbb{R}$. Now $h(0) = 0$,
\[ h'(t) = \frac{a_i e^t}{1 - a_i + a_i e^t} = 1 - \frac{1 - a_i}{1 - a_i + a_i e^t}, \quad h'(0) = a_i, \]
\[ h''(t) = \frac{1 - a_i}{1 - a_i + a_i e^t} \cdot \frac{a_i e^t}{1 - a_i + a_i e^t} \le \frac14 \]
because $a_i e^t$ and $1 - a_i$ are both greater than or equal to 0. By Taylor’s theorem with remainder, there is some $t \in [0, b]$ such that
\[ h(b) = h(0) + bh'(0) + \tfrac12 b^2 h''(t) \le ba_i + \tfrac18 b^2, \]
and
\[ \mathbb{E}(e^{bX_i}) \le \exp(ba_i + \tfrac18 b^2). \]
Q

(b) Take any $b \ge 0$. Then
\[ \Pr(S - a \ge c) = \Pr\Bigl(\sum_{i=0}^n (X_i - a_i - c) \ge 0\Bigr) \le \mathbb{E}\Bigl(\exp\bigl(b \sum_{i=0}^n (X_i - a_i - c)\bigr)\Bigr) \]
(because $\exp(b \sum_{i=0}^n (X_i - a_i - c)) \ge 1$ whenever $\sum_{i=0}^n (X_i - a_i - c) \ge 0$)
\[ = e^{-(n+1)bc} \prod_{i=0}^n e^{-ba_i}\, \mathbb{E}\Bigl(\prod_{i=0}^n \exp(bX_i)\Bigr) = e^{-(n+1)bc} \prod_{i=0}^n e^{-ba_i} \prod_{i=0}^n \mathbb{E}(\exp(bX_i)) \]
(because the random variables $\exp(bX_i)$ are independent, by 272E, so the expectation of the product is the product of the expectations, by 272R)
\[ \le e^{-(n+1)bc} \prod_{i=0}^n e^{-ba_i} \exp(ba_i + \tfrac18 b^2) \]
((a) above)
\[ = \exp\bigl(-(n + 1)bc + \tfrac{n+1}{8} b^2\bigr). \]
Now the minimum value of the quadratic $\frac{n+1}{8} b^2 - (n + 1)cb$ is $-2(n + 1)c^2$ when $b = 4c$, so $\Pr(S - a \ge c) \le \exp(-2(n + 1)c^2)$, as claimed.
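When the $X_i$ are independent Bernoulli variables the tail probability in Hoeffding’s theorem is a binomial sum, so the bound can be compared with the exact value. The sketch below makes that assumption for illustration only; the function names and the parameter choices are hypothetical, and `n_vars` stands for $n + 1$.

```python
from math import comb, exp

def binomial_upper_tail(n_vars, p, c):
    """Pr(S - p >= c), where S is the mean of n_vars independent Bernoulli(p)
    variables (an illustrative special case of the theorem)."""
    threshold = n_vars * (p + c)
    return sum(
        comb(n_vars, k) * p**k * (1 - p) ** (n_vars - k)
        for k in range(n_vars + 1)
        if k >= threshold - 1e-12   # small tolerance for floating-point rounding
    )

def hoeffding_bound(n_vars, c):
    """The bound exp(-2(n+1)c^2), with n_vars = n + 1 variables."""
    return exp(-2 * n_vars * c * c)

for c in (0.05, 0.1, 0.2):
    print(c, binomial_upper_tail(100, 0.5, c), hoeffding_bound(100, c))
```

The bound depends only on the number of variables and on $c$, not on $p$; the exact tail is typically a good deal smaller, which reflects the crude first estimate in part (a) of the proof.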

272X Basic exercises (a) Let $(\Omega, \Sigma, \mu)$ be an atomless probability space, and $\langle\epsilon_n\rangle_{n \in \mathbb{N}}$ any sequence in $[0, 1]$. Show that there is an independent sequence $\langle E_n\rangle_{n \in \mathbb{N}}$ in $\Sigma$ such that $\mu E_n = \epsilon_n$ for every $n$. (Hint: 215D.)

> (b) Let $\langle X_i\rangle_{i \in I}$ be a family of real-valued random variables. Show that it is independent iff
\[ \mathbb{E}(h_1(X_{i_1}) \times \ldots \times h_n(X_{i_n})) = \prod_{j=1}^n \mathbb{E}(h_j(X_{i_j})) \]
whenever $i_1, \ldots, i_n$ are distinct members of $I$ and $h_1, \ldots, h_n$ are Borel measurable functions from $\mathbb{R}$ to $\mathbb{R}$ such that $\mathbb{E}(h_j(X_{i_j}))$ are all finite.

(c) Write out a proof of 272F which does not use the theory of product measures.

(d) Let X = (X1 , . . . , Xn ) be a family of real-valued random variables all defined on the same probability space,
and suppose that X has a density function f expressible in the form f (ξ1 , . . . , ξn ) = f1 (ξ1 )f2 (ξ2 ) . . . fn (ξn ) for suitable
functions f1 , . . . , fn of one real variable. Show that X1 , . . . , Xn are independent.

(e) Let $X_1$, $X_2$ be independent real-valued random variables both with distribution $\nu$ and distribution function $F$. Set $Y = \max(X_1, X_2)$. Show that the distribution of $Y$ is absolutely continuous with respect to $\nu$, with a Radon-Nikodým derivative $F + F^-$, where $F^-(x) = \lim_{t \uparrow x} F(t)$ for every $x \in \mathbb{R}$.

(f ) Use 254Sa and the idea of 272J to give another proof of 272O.

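(g) Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle\Sigma_n\rangle_{n \in \mathbb{N}}$ a non-decreasing sequence of σ-subalgebras of $\Sigma$. Let $\Sigma_\infty$ be the σ-algebra generated by $\bigcup_{n \in \mathbb{N}} \Sigma_n$. Let $\mathrm{T}$ be another σ-subalgebra of $\Sigma$ such that $\Sigma_n$ and $\mathrm{T}$ are independent for each $n$. Show that $\Sigma_\infty$ and $\mathrm{T}$ are independent. (Hint: apply the Monotone Class Theorem to $\{E : \mu(E \cap F) = \mu E \cdot \mu F \text{ for every } F \in \mathrm{T}\}$.) Use this to prove 272O.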

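(h) Let $\langle X_n\rangle_{n \in \mathbb{N}}$ be a sequence of real-valued random variables and $Y$ a real-valued random variable such that $Y$ and $X_n$ are independent for each $n \in \mathbb{N}$. Suppose that $\Pr(Y \in \mathbb{N}) = 1$ and that $\sum_{n=0}^\infty \Pr(Y \ge n)\mathbb{E}(|X_n|)$ is finite. Set $Z = \sum_{n=0}^Y X_n$ (that is, $Z(\omega) = \sum_{n=0}^{Y(\omega)} X_n(\omega)$ whenever $\omega \in \operatorname{dom} Y$ is such that $Y(\omega) \in \mathbb{N}$ and $\omega \in \operatorname{dom} X_n$ for every $n \le Y(\omega)$). (i) Show that $\mathbb{E}(Z) = \sum_{n=0}^\infty \Pr(Y \ge n)\mathbb{E}(X_n)$. (Hint: set $X'_n(\omega) = X_n(\omega)$ if $Y(\omega) \ge n$, 0 otherwise.) (ii) Show that if $\mathbb{E}(X_n) = \gamma$ for every $n \in \mathbb{N}$ then $\mathbb{E}(Z) = \gamma\mathbb{E}(Y)$. (This is Wald’s equation.)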

(i) Let X1 , . . . , Xn be independent real-valued random variables. Show that if X1 + . . . + Xn has finite expectation
so does every Xj . (Hint: part (b) of the proof of 272S.)

> (j) Let $X$ and $Y$ be independent real-valued random variables with densities $f$ and $g$. Show that $X \times Y$ has a density function $h$ where $h(x) = \int_{-\infty}^\infty \frac{1}{|y|} g(y) f(\frac{x}{y})\,dy$ for almost every $x$. (Hint: 271K.)

(k) Let $X_0, \ldots, X_n$ be independent real-valued random variables such that $d_i \le X_i \le d'_i$ a.e. for every $i$. (i) Show that if $b \ge 0$ then $\mathbb{E}(e^{bX_i}) \le \exp(ba_i + \frac18 b^2 (d'_i - d_i)^2)$ for each $i$, where $a_i = \mathbb{E}(X_i)$. (ii) Set $S = \frac{1}{n+1} \sum_{i=0}^n X_i$ and $a = \mathbb{E}(S)$. Show that
\[ \Pr(S - a \ge c) \le \exp\Bigl(-\frac{2(n+1)^2 c^2}{d}\Bigr) \]
for every $c \ge 0$, where $d = \sum_{i=0}^n (d'_i - d_i)^2$.

(l) Suppose that $X_0, \ldots, X_n$ are independent real-valued random variables, all with expectation 0, such that $\Pr(|X_i| \le 1) = 1$ for every $i$. Set $S = \frac{1}{\sqrt{n+1}} \sum_{i=0}^n X_i$. Show that $\Pr(S \ge c) \le \exp(-c^2/2)$ for every $c \ge 0$.

272Y Further exercises (a) Let $X_0, \ldots, X_n$ be independent real-valued random variables with distributions $\nu_0, \ldots, \nu_n$ and distribution functions $F_0, \ldots, F_n$. Show that, for any Borel set $E \subseteq \mathbb{R}$,
\[ \Pr(\sup_{i \le n} X_i \in E) = \sum_{i=0}^n \int_E \prod_{j=0}^{i-1} F_j^-(x) \prod_{j=i+1}^n F_j(x)\,\nu_i(dx), \]
where $F_j^-(x) = \lim_{t \uparrow x} F_j(t)$ for each $j$, and we interpret the empty products $\prod_{j=0}^{-1} F_j^-(x)$, $\prod_{j=n+1}^n F_j(x)$ as 1.

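(b) Let $\boldsymbol{X} = \langle X_n\rangle_{n \in \mathbb{N}}$ be an independent sequence of real-valued random variables on a complete probability space $(\Omega, \Sigma, \mu)$. Let $\mathcal{B}$ be the Borel σ-algebra of $\mathbb{R}^{\mathbb{N}}$ (271Ya). Let $\nu_{\boldsymbol{X}}^{(\mathcal{B})}$ be the probability measure with domain $\mathcal{B}$ defined by setting $\nu_{\boldsymbol{X}}^{(\mathcal{B})} E = \mu\boldsymbol{X}^{-1}[E]$ for every $E \in \mathcal{B}$, and write $\nu_{\boldsymbol{X}}$ for the completion of $\nu_{\boldsymbol{X}}^{(\mathcal{B})}$. Show that $\nu_{\boldsymbol{X}}$ is just the product of the distributions $\nu_{X_n}$.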

(c) Let $X_1, \ldots, X_n$ be real-valued random variables such that for each $j < n$ the family
\[ (X_1, \ldots, X_j, -X_{j+1}, \ldots, -X_n) \]
has the same joint distribution as the original family $(X_1, \ldots, X_n)$. Set $S_j = X_1 + \ldots + X_j$ for each $j \le n$. (i) Show that for any $a \ge 0$
\[ \Pr(\sup_{1 \le j \le n} |S_j| \ge a) \le 2 \Pr(|S_n| \ge a). \]
(Hint: show that if $E_j = \{\omega : \omega \in \bigcap_{i \le n} \operatorname{dom} X_i,\ |S_i(\omega)| < a \text{ for } i < j,\ |S_j(\omega)| \ge a\}$ then $\mu\{\omega : \omega \in E_j,\ |S_n(\omega)| \ge |S_j(\omega)|\} \ge \frac12 \mu E_j$.) (ii) Show that $\mathbb{E}(\sup_{j \le n} |S_j|) \le 2\mathbb{E}(|S_n|)$. (iii) Show that $\mathbb{E}(\sup_{i \le n} S_i^2) \le 2\mathbb{E}(S_n^2)$.
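(d) Let $\langle X_i\rangle_{i \in \mathbb{N}}$ be an independent sequence of real-valued random variables, and set $S_n = \sum_{i=0}^n X_i$ for each $n$. Show that if $\langle S_n\rangle_{n \in \mathbb{N}}$ converges to $S$ in $\mathcal{L}^0$ for the topology of convergence in measure, then $\langle S_n\rangle_{n \in \mathbb{N}}$ converges to $S$ a.e.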

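(e) Let $(\Omega, \Sigma, \mu)$ be a probability space.
(i) Let $\langle E_n\rangle_{n \in \mathbb{N}}$ be an independent sequence in $\Sigma$. Show that for any real-valued random variable $X$ with finite expectation,
\[ \lim_{n \to \infty} \Bigl(\int_{E_n} X\,d\mu - \mu E_n\,\mathbb{E}(X)\Bigr) = 0. \]
(Hint: let $\mathrm{T}_0$ be the subalgebra of $\Sigma$ generated by $\{E_n : n \in \mathbb{N}\}$ and $\mathrm{T}$ the σ-subalgebra of $\Sigma$ generated by $\{E_n : n \in \mathbb{N}\}$. Start by considering $X = \chi E$ for $E \in \mathrm{T}_0$ and then $X = \chi E$ for $E \in \mathrm{T}$. Move from $L^1(\mu{\restriction}\mathrm{T})$ to $L^1(\mu)$ by using conditional expectations.)
(ii) Let $\langle X_n\rangle_{n \in \mathbb{N}}$ be a uniformly integrable independent sequence of real-valued random variables on $\Omega$. Show that for any bounded real-valued random variable $Y$,
\[ \lim_{n \to \infty} \bigl(\mathbb{E}(X_n \times Y) - \mathbb{E}(X_n)\mathbb{E}(Y)\bigr) = 0. \]
(iii) Suppose that $1 < p \le \infty$ and set $q = p/(p - 1)$ (taking $q = 1$ if $p = \infty$). Let $\langle X_n\rangle_{n \in \mathbb{N}}$ be an independent sequence of real-valued random variables with $\sup_{n \in \mathbb{N}} \|X_n\|_p < \infty$, and $Y$ a real-valued random variable with $\|Y\|_q < \infty$. Show that
\[ \lim_{n \to \infty} \bigl(\mathbb{E}(X_n \times Y) - \mathbb{E}(X_n)\mathbb{E}(Y)\bigr) = 0. \]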

(f) Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle Z_n\rangle_{n \in \mathbb{N}}$ a sequence of random variables on $\Omega$ such that $\Pr(Z_n \in \mathbb{N}) = 1$ for each $n$, and $\Pr(Z_m = Z_n) = 0$ for all $m \ne n$. Let $\langle X_n\rangle_{n \in \mathbb{N}}$ be a sequence of real-valued random variables on $\Omega$, all with the same distribution $\nu$, and independent of each other and the $Z_n$, in the sense that if $\Sigma_n$ is the σ-algebra defined by $X_n$, and $\mathrm{T}_n$ the σ-algebra defined by $Z_n$, and $\mathrm{T}$ is the σ-algebra generated by $\bigcup_{n \in \mathbb{N}} \mathrm{T}_n$, then $(\mathrm{T}, \Sigma_0, \Sigma_1, \ldots)$ is independent. Set $Y_n(\omega) = X_{Z_n(\omega)}(\omega)$ whenever this is defined, that is, $\omega \in \operatorname{dom} Z_n$, $Z_n(\omega) \in \mathbb{N}$ and $\omega \in \operatorname{dom} X_{Z_n(\omega)}$. Show that $\langle Y_n\rangle_{n \in \mathbb{N}}$ is an independent sequence of random variables and that every $Y_n$ has the distribution $\nu$.

(g) Show that all the ideas of this section apply equally to complex-valued random variables, subject to suitable
adjustments (to be devised).

(h) Develop a theory of independence for random variables taking values in R r , following through as many as
possible of the ideas of this section.

272 Notes and comments This section is lengthy for two reasons: I am trying to pack in the basic results associated
with one of the most fertile concepts of mathematics, and it is hard to know where to stop; and I am trying to do this
in language appropriate to abstract measure theory, insisting on a variety of distinctions which are peripheral to the
essential ideas. For while I am prepared to be flexible on the question of whether the letter X should denote a space or
a function, some of the applications of these results which are most important to me are in contexts where we expect
to be exactly clear what the domains of our functions are. Consequently it is necessary to form an opinion on such
matters as what the σ-algebra defined by a random variable really is (272C).
The point of 272Q is that the family $\mathcal{E}$ does not have to be related in any way to the family $\langle\Sigma_i\rangle_{i \in I}$, except, of course, that we are dealing with measurable sets. All we need to know is that $I$ should be large compared with $\mathcal{E}$; for instance, that $\mathcal{E}$ is countable and $I$ is uncountable. The family $\langle\Sigma_j\rangle_{j \in J}$ is now a kind of ‘tail’ of $\langle\Sigma_i\rangle_{i \in I}$, safely independent of the ‘head’ σ-algebra generated by $\mathcal{E}$.
Of course I should emphasize again that such proofs as those in 272R-272S are to be thought of as confirmations
that we have a suitable model of probability theory, rather than as reasons for believing the results to be valid in
statistical contexts. Similarly, 272T-272U can be approached by a variety of intuitions concerning discrete random
variables and random variables with continuous densities, and while the elegant general results are delightful, they are
more important to the pure mathematician than to the statistician. But I came to an odd obstacle in the proof of 272S,
when showing that if X1 + . . . + Xn has finite variance then so does every Xj . We have done enough measure theory for
this to be readily dealt with, but the connexion with ordinary probabilistic intuition, both here and in 272Xi, remains
unclear to me.
There are four ideas in 272W worth storing for future use. The first is the estimate
\[ \mathbb{E}(e^{bX_i}) \le 1 - a_i + e^b a_i \]
in part (a), a crude but effective way of using the hypothesis that $X_i$ is bounded. The second is the use of Taylor’s theorem to show that $1 - a_i + e^b a_i \le \exp(ba_i + \frac18 b^2)$. The third is the estimate
\[ \Pr(Y \ge 0) \le \mathbb{E}(e^{bY}) \text{ if } b \ge 0 \]
used in part (b); and the fourth is 272R. After this one need only be sufficiently determined to reach 272Xk. But even the special case 272W is both striking and useful.

273 The strong law of large numbers


I come now to the first of the three main theorems of this chapter. Perhaps I should call it a ‘principle’, rather
than a ‘theorem’, as I shall not attempt to enunciate any fully general form, but will give three theorems (273D, 273H,
273I), with a variety of corollaries, each setting out conditions under which the averages of a sequence of independent
random variables will almost surely converge. At the end of the section (273N) I add a result on norm-convergence of
averages.

273A It will be helpful to start with an explicit statement of a very simple but very useful lemma.
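Lemma Let $\langle E_n\rangle_{n \in \mathbb{N}}$ be a sequence of measurable sets in a measure space $(\Omega, \Sigma, \mu)$, and suppose that $\sum_{n=0}^\infty \mu E_n < \infty$. Then $\{n : \omega \in E_n\}$ is finite for almost every $\omega \in \Omega$.
proof We have
\[ \mu\{\omega : \{n : \omega \in E_n\} \text{ is infinite}\} = \mu\bigl(\bigcap_{n \in \mathbb{N}} \bigcup_{m \ge n} E_m\bigr) = \inf_{n \in \mathbb{N}} \mu\bigl(\bigcup_{m \ge n} E_m\bigr) \le \inf_{n \in \mathbb{N}} \sum_{m=n}^\infty \mu E_m = 0. \]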

273B Lemma Let $\langle X_n\rangle_{n \in \mathbb{N}}$ be an independent sequence of real-valued random variables, and set $S_n = \sum_{i=0}^n X_i$ for each $n \in \mathbb{N}$.
(a) If $\langle S_n\rangle_{n \in \mathbb{N}}$ is convergent in measure, then it is convergent almost everywhere.
(b) In particular, if $\mathbb{E}(X_n) = 0$ for every $n$ and $\sum_{n=0}^\infty \mathbb{E}(X_n^2) < \infty$, then $\sum_{n=0}^\infty X_n$ is defined, and finite, almost everywhere.
proof (a) Let $(\Omega, \Sigma, \mu)$ be the underlying probability space. If we change each $X_n$ on a negligible set, we do not change the independence of $\langle X_n\rangle_{n \in \mathbb{N}}$ (272H), and the $S_n$ are also changed only on a negligible set; so we may suppose from the beginning that every $X_n$ is a measurable function defined on the whole of $\Omega$.
Because the functional $X \mapsto \mathbb{E}(\min(1, |X|))$ is one of the pseudometrics defining the topology of convergence in measure (245A), $\lim_{m,n \to \infty} \mathbb{E}(\min(1, |S_m - S_n|)) = 0$, and we can find for each $k \in \mathbb{N}$ an $n_k \in \mathbb{N}$ such that $\mathbb{E}(\min(1, |S_m - S_{n_k}|)) \le 4^{-k}$ for every $m \ge n_k$. So $\Pr(|S_m - S_{n_k}| \ge 2^{-k}) \le 2^{-k}$ for every $m \ge n_k$. By Etemadi’s lemma (272V) applied to $\langle X_i\rangle_{i \ge n_k}$,
\[ \Pr(\sup_{n_k \le m \le n} |S_m - S_{n_k}| \ge 3 \cdot 2^{-k}) \le 3 \cdot 2^{-k} \]
for every $n \ge n_k$. Setting
\[ H_{kn} = \{\omega : \sup_{n_k \le m \le n} |S_m(\omega) - S_{n_k}(\omega)| \ge 3 \cdot 2^{-k}\} \text{ for } n \ge n_k, \qquad H_k = \bigcup_{n \ge n_k} H_{kn}, \]
we have
\[ \mu H_k = \lim_{n \to \infty} \mu H_{kn} \le 3 \cdot 2^{-k} \]
for each $k$, so $\sum_{k=0}^\infty \mu H_k$ is finite and almost every $\omega \in \Omega$ belongs to only finitely many of the $H_k$ (273A).
Now take any such $\omega$. Then there is some $r \in \mathbb{N}$ such that $\omega \notin H_k$ for any $k \ge r$. In this case, for every $k \ge r$, $\omega \notin \bigcup_{n \ge n_k} H_{kn}$, that is, $|S_n(\omega) - S_{n_k}(\omega)| < 3 \cdot 2^{-k}$ for every $n \ge n_k$. But this means that $\langle S_n(\omega)\rangle_{n \in \mathbb{N}}$ is a Cauchy sequence, therefore convergent. Since this is true for almost every $\omega$, $\langle S_n\rangle_{n \in \mathbb{N}}$ converges almost everywhere, as claimed.
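(b) Now suppose that $\mathbb{E}(X_n) = 0$ for every $n$ and that $\sum_{n=0}^\infty \mathbb{E}(X_n^2) < \infty$. In this case, for any $m < n$,
\[ \|S_n - S_m\|_1^2 \le \|\chi\Omega\|_2^2\, \|S_n - S_m\|_2^2 \]
(by Cauchy’s inequality, 244Eb)
\[ = \mathbb{E}((S_n - S_m)^2) = \operatorname{Var}(S_n - S_m) \]
(because $\mathbb{E}(S_n - S_m) = \sum_{i=m+1}^n \mathbb{E}(X_i) = 0$)
\[ = \sum_{i=m+1}^n \operatorname{Var}(X_i) \]
(by Bienaymé’s equality, 272S)
\[ \to 0 \]
as $m \to \infty$. So $\langle S_n^\bullet\rangle_{n \in \mathbb{N}}$ is a Cauchy sequence in $L^1(\mu)$ and converges in $L^1(\mu)$, by 242F; by 245G, it converges in measure in $L^0(\mu)$, that is, $\langle S_n\rangle_{n \in \mathbb{N}}$ converges in measure in $\mathcal{L}^0(\mu)$. By (a), $\langle S_n\rangle_{n \in \mathbb{N}}$ converges almost everywhere, that is, $\sum_{i=0}^\infty X_i$ is defined and finite almost everywhere.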

Remark The proof above assumes familiarity with the ideas of Chapter 24. However part (b), at least, can be
established without any of these; see 273Xa. In 276B there is a generalization of (b) based on a different approach.

273C We now need a lemma (part (b) below) from the theory of summability. I take the opportunity to include
an elementary fact which will be useful later in this section and elsewhere.
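Lemma (a) If $\lim_{n \to \infty} x_n = x$, then $\lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n x_i = x$.
(b) Let $\langle x_n\rangle_{n \in \mathbb{N}}$ be summable, and $\langle b_n\rangle_{n \in \mathbb{N}}$ a non-decreasing sequence in $[0, \infty[$ diverging to $\infty$. Then $\lim_{n \to \infty} \frac{1}{b_n} \sum_{k=0}^n b_k x_k = 0$.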
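proof (a) Let $\epsilon > 0$. Let $m$ be such that $|x_n - x| \le \epsilon$ whenever $n \ge m$. Let $m' \ge m$ be such that $|\sum_{i=0}^{m-1} (x - x_i)| \le \epsilon m'$. Then for $n \ge m'$ we have
\[ \begin{aligned}
\bigl|x - \frac{1}{n+1} \sum_{i=0}^n x_i\bigr| &= \frac{1}{n+1} \bigl|\sum_{i=0}^n (x - x_i)\bigr| \\
&\le \frac{1}{n+1} \bigl|\sum_{i=0}^{m-1} (x - x_i)\bigr| + \frac{1}{n+1} \sum_{i=m}^n |x - x_i| \\
&\le \frac{\epsilon m'}{n+1} + \frac{\epsilon(n - m + 1)}{n+1} \le 2\epsilon.
\end{aligned} \]
As $\epsilon$ is arbitrary, $\lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n x_i = x$.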
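(b) Let $\epsilon > 0$. Write $s_n = \sum_{i=0}^n x_i$ for each $n$, and
\[ s = \lim_{n \to \infty} s_n = \sum_{i=0}^\infty x_i; \]
set $s^* = \sup_{n \in \mathbb{N}} |s_n| < \infty$. Let $m \in \mathbb{N}$ be such that $|s_n - s| \le \epsilon$ whenever $n \ge m$; then $|s_n - s_j| \le 2\epsilon$ whenever $j, n \ge m$. Let $m' \ge m$ be such that $b_m s^* \le \epsilon b_{m'}$.
Take any $n \ge m'$. Then
\[ \begin{aligned}
\bigl|\sum_{k=0}^n b_k x_k\bigr| &= |b_0 s_0 + b_1(s_1 - s_0) + \ldots + b_n(s_n - s_{n-1})| \\
&= |(b_0 - b_1)s_0 + (b_1 - b_2)s_1 + \ldots + (b_{n-1} - b_n)s_{n-1} + b_n s_n| \\
&= \bigl|b_0 s_n + \sum_{i=0}^{n-1} (b_{i+1} - b_i)(s_n - s_i)\bigr| \\
&\le b_0 |s_n| + \sum_{i=0}^{m-1} (b_{i+1} - b_i)|s_n - s_i| + \sum_{i=m}^{n-1} (b_{i+1} - b_i)|s_n - s_i| \\
&\le b_0 s^* + 2s^* \sum_{i=0}^{m-1} (b_{i+1} - b_i) + 2\epsilon \sum_{i=m}^{n-1} (b_{i+1} - b_i) \\
&= b_0 s^* + 2s^*(b_m - b_0) + 2\epsilon(b_n - b_m) \le 2s^* b_m + 2\epsilon b_n.
\end{aligned} \]
Consequently, because $b_n \ge b_{m'}$,
\[ \frac{1}{b_n} \bigl|\sum_{k=0}^n b_k x_k\bigr| \le 2\frac{s^* b_m}{b_n} + 2\epsilon \le 4\epsilon. \]
As $\epsilon$ is arbitrary,
\[ \lim_{n \to \infty} \frac{1}{b_n} \sum_{k=0}^n b_k x_k = 0, \]
as required.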
Remark Part (b) above is sometimes called ‘Kronecker’s lemma’.
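Kronecker’s lemma can be watched numerically on a simple example. The sketch below assumes, for illustration only, the summable sequence $x_k = 1/(k+1)^2$ and the weights $b_k = k + 1$; the function names are hypothetical. In this case $\frac{1}{b_n} \sum_{k=0}^n b_k x_k$ is the harmonic number $H_{n+1}$ divided by $n + 1$, which tends to 0.

```python
def x(k):
    """Illustrative summable sequence x_k = 1/(k+1)^2."""
    return 1.0 / (k + 1) ** 2

def b(k):
    """Illustrative non-decreasing weights b_k = k + 1, diverging to infinity."""
    return k + 1

def kronecker_ratio(x, b, n):
    """(1/b_n) * sum_{k=0}^{n} b_k x_k for sequences given as functions of k."""
    return sum(b(k) * x(k) for k in range(n + 1)) / b(n)

for n in (10, 100, 1_000, 10_000):
    print(n, kronecker_ratio(x, b, n))
```

The decay here is only logarithmically faster than $1/\ln n$ would suggest for slowly growing weights; the lemma asserts convergence to 0 but gives no rate.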

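273D The strong law of large numbers: first form Let $\langle X_n\rangle_{n \in \mathbb{N}}$ be an independent sequence of real-valued random variables, and suppose that $\langle b_n\rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence in $]0, \infty[$, diverging to $\infty$, such that $\sum_{n=0}^\infty \frac{1}{b_n^2} \operatorname{Var}(X_n) < \infty$. Then
\[ \lim_{n \to \infty} \frac{1}{b_n} \sum_{i=0}^n (X_i - \mathbb{E}(X_i)) = 0 \]
almost everywhere.
proof As usual, write $(\Omega, \Sigma, \mu)$ for the underlying probability space. Set
\[ Y_n = \frac{1}{b_n}(X_n - \mathbb{E}(X_n)) \]
for each $n$; then $\langle Y_n\rangle_{n \in \mathbb{N}}$ is independent (272E), $\mathbb{E}(Y_n) = 0$ for each $n$, and
\[ \sum_{n=0}^\infty \mathbb{E}(Y_n^2) = \sum_{n=0}^\infty \frac{1}{b_n^2} \operatorname{Var}(X_n) < \infty. \]
By 273B, $\langle Y_n(\omega)\rangle_{n \in \mathbb{N}}$ is summable for almost every $\omega \in \Omega$. But by 273Cb,
\[ \lim_{n \to \infty} \frac{1}{b_n} \sum_{i=0}^n (X_i(\omega) - \mathbb{E}(X_i)) = \lim_{n \to \infty} \frac{1}{b_n} \sum_{i=0}^n b_i Y_i(\omega) = 0 \]
for all such $\omega$. So we have the result.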

273E Corollary Let $\langle X_n\rangle_{n \in \mathbb{N}}$ be an independent sequence of random variables such that $\mathbb{E}(X_n) = 0$ for every $n$ and $\sup_{n \in \mathbb{N}} \mathbb{E}(X_n^2) < \infty$. Then
\[ \lim_{n \to \infty} \frac{1}{b_n}(X_0 + \ldots + X_n) = 0 \]
almost everywhere whenever $\langle b_n\rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence of strictly positive numbers and $\sum_{n=0}^\infty \frac{1}{b_n^2}$ is finite. In particular,
\[ \lim_{n \to \infty} \frac{1}{n+1}(X_0 + \ldots + X_n) = 0 \]
almost everywhere.
Remark For most of the rest of this section, we shall take $b_n = n + 1$. The special virtue of 273D is that it allows other $b_n$, e.g., $b_n = \sqrt{n} \ln n$. A direct strengthening of this theorem is in 276C below.

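273F Corollary Let $\langle E_n\rangle_{n \in \mathbb{N}}$ be an independent sequence of measurable sets in a probability space $(\Omega, \Sigma, \mu)$, and suppose that
\[ \lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n \mu E_i = c. \]
Then
\[ \lim_{n \to \infty} \frac{1}{n+1} \#(\{i : i \le n,\ \omega \in E_i\}) = c \]
for almost every $\omega \in \Omega$.
proof In 273D, set $X_n = \chi E_n$, $b_n = n + 1$. For almost every $\omega$, we have
\[ \lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n (\chi E_i(\omega) - a_i) = 0, \]
writing $a_i = \mu E_i = \mathbb{E}(X_i)$ for each $i$. (I see that I am relying on 272F to support the claim that $\langle X_n\rangle_{n \in \mathbb{N}}$ is independent.) But for any such $\omega$,
\[ \lim_{n \to \infty} \Bigl(\frac{1}{n+1} \#(\{i : i \le n,\ \omega \in E_i\}) - \frac{1}{n+1} \sum_{i=0}^n a_i\Bigr) = \lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n (\chi E_i(\omega) - a_i) = 0; \]
because we are supposing that $\lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n a_i = c$, we must have
\[ \lim_{n \to \infty} \frac{1}{n+1} \#(\{i : i \le n,\ \omega \in E_i\}) = c, \]
as required.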

273G Corollary Let $\mu$ be the usual measure on $\mathcal{P}\mathbb{N}$, as described in 254Jb. Then for $\mu$-almost every set $a \subseteq \mathbb{N}$,
\[ \lim_{n \to \infty} \frac{1}{n+1} \#(a \cap \{0, \ldots, n\}) = \frac12. \]
proof The sets $E_n = \{a : n \in a\}$ are independent, with measure $\frac12$.
Remark The limit $\lim_{n \to \infty} \frac{1}{n+1} \#(a \cap \{0, \ldots, n\})$ is called the asymptotic density of $a$.
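A pseudo-random sample gives a feeling for 273G, although of course it proves nothing. In the sketch below a "random subset of $\mathbb{N}$" is approximated by its first 100000 membership decisions, each made independently with probability $\frac12$; the seed, the sample length and the function name `running_density` are assumptions made for the illustration.

```python
import random

def running_density(membership):
    """Yield #(a ∩ {0,...,n}) / (n+1) for n = 0, 1, ..., given the indicator
    sequence of a set a ⊆ N."""
    count = 0
    for n, is_member in enumerate(membership):
        count += is_member
        yield count / (n + 1)

rng = random.Random(0)                       # fixed seed, for reproducibility only
sample = [rng.random() < 0.5 for _ in range(100_000)]
densities = list(running_density(sample))
for n in (9, 99, 9_999, 99_999):
    print(n + 1, densities[n])
```

The early values fluctuate noticeably, while the later ones settle near $\frac12$, which is the behaviour the corollary asserts for almost every $a$.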

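273H Strong law of large numbers: second form Let $\langle X_n\rangle_{n \in \mathbb{N}}$ be an independent sequence of real-valued random variables, and suppose that $\sup_{n \in \mathbb{N}} \mathbb{E}(|X_n|^{1+\delta}) < \infty$ for some $\delta > 0$. Then
\[ \lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n (X_i - \mathbb{E}(X_i)) = 0 \]
almost everywhere.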
proof As usual, call the underlying probability space (Ω, Σ, µ); as in 273B we can adjust the Xn on negligible sets
so as to make them measurable and defined everywhere on Ω, without changing E(Xn ), E(|Xn |) or the convergence of
the averages except on a negligible set.
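(a) For each $n$, define a random variable $Y_n$ on $\Omega$ by setting
\[ Y_n(\omega) = X_n(\omega) \text{ if } |X_n(\omega)| \le n, \qquad Y_n(\omega) = 0 \text{ if } |X_n(\omega)| > n. \]
Then $\langle Y_n\rangle_{n \in \mathbb{N}}$ is independent (272E). For each $n \in \mathbb{N}$,
\[ \operatorname{Var}(Y_n) \le \mathbb{E}(Y_n^2) \le \mathbb{E}(n^{1-\delta} |X_n|^{1+\delta}) \le n^{1-\delta} K, \]
where $K = \sup_{n \in \mathbb{N}} \mathbb{E}(|X_n|^{1+\delta})$ (taking $\delta \le 1$, as we may: if $\delta > 1$ then $|x|^2 \le 1 + |x|^{1+\delta}$ for every $x$, so the hypothesis also holds with $\delta = 1$), so
\[ \sum_{n=0}^\infty \frac{1}{(n+1)^2} \operatorname{Var}(Y_n) \le \sum_{n=0}^\infty \frac{n^{1-\delta}}{(n+1)^2} K < \infty. \]
By 273D,
\[ G = \{\omega : \lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n (Y_i(\omega) - \mathbb{E}(Y_i)) = 0\} \]
is conegligible.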
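(b) On the other hand, setting
\[ E_n = \{\omega : Y_n(\omega) \ne X_n(\omega)\} = \{\omega : |X_n(\omega)| > n\}, \]
we have $K \ge n^{1+\delta} \mu E_n$ for each $n$, so
\[ \sum_{n=0}^\infty \mu E_n \le 1 + K \sum_{n=1}^\infty \frac{1}{n^{1+\delta}} < \infty, \]
and the set $H = \{\omega : \{n : \omega \in E_n\} \text{ is finite}\}$ is conegligible (273A). But of course
\[ \lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n (X_i(\omega) - Y_i(\omega)) = 0 \]
for every $\omega \in H$.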
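(c) Finally,
\[ |\mathbb{E}(Y_n) - \mathbb{E}(X_n)| \le \int_{E_n} |X_n| \le \int_{E_n} n^{-\delta} |X_n|^{1+\delta} \le n^{-\delta} K \]
whenever $n \ge 1$, so $\lim_{n \to \infty} (\mathbb{E}(Y_n) - \mathbb{E}(X_n)) = 0$ and
\[ \lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n (\mathbb{E}(Y_i) - \mathbb{E}(X_i)) = 0 \]
(273Ca). Putting these three together, we get
\[ \lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n (X_i(\omega) - \mathbb{E}(X_i)) = 0 \]
whenever $\omega$ belongs to the conegligible set $G \cap H$. So
\[ \lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n (X_i - \mathbb{E}(X_i)) = 0 \]
almost everywhere, as required.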

273I Strong law of large numbers: third form Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables of finite expectation, and suppose that they are identically distributed, that is, all have the same distribution. Then
\[ \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n (X_i - E(X_i)) = 0 \]
almost everywhere.
proof The proof follows the same line as that of 273H, but some of the inequalities require more delicate arguments.
As usual, call the underlying probability space (Ω, Σ, µ) and suppose that the Xn are all measurable and defined
everywhere on Ω. (We need to remember that changing a random variable on a negligible set does not change its
distribution.) Let ν be the common distribution of the Xn .
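(a) For each n, define a random variable $Y_n$ on Ω by setting
\[ Y_n(\omega) = \begin{cases} X_n(\omega) & \text{if } |X_n(\omega)| \le n,\\ 0 & \text{if } |X_n(\omega)| > n. \end{cases} \]
Then $\langle Y_n\rangle_{n\in\mathbb{N}}$ is independent (272E). For each n ∈ N,
\[ \operatorname{Var}(Y_n) \le E(Y_n^2) = \int_{[-n,n]} x^2\,\nu(dx) \]
(271E). To estimate $\sum_{n=0}^\infty \frac{1}{(n+1)^2}E(Y_n^2)$, set
\[ f_n(x) = \begin{cases} \dfrac{x^2}{(n+1)^2} & \text{if } |x| \le n,\\ 0 & \text{if } |x| > n, \end{cases} \]
so that $\frac{1}{(n+1)^2}\operatorname{Var}(Y_n) \le \int f_n\,d\nu$. If r ≥ 1 and r < |x| ≤ r + 1 then
\[ \sum_{n=0}^\infty f_n(x) \le (r+1)|x|\sum_{n=r+1}^\infty \frac{1}{(n+1)^2} \le (r+1)|x|\sum_{n=r+1}^\infty \Bigl(\frac1n - \frac1{n+1}\Bigr) \le |x|, \]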

while if |x| ≤ 1 then


\[ \sum_{n=0}^\infty f_n(x) \le \sum_{n=0}^\infty \frac{1}{(n+1)^2} = \frac{\pi^2}{6} \le 2 < \infty. \]
(You do not need to know that the sum is $\frac{\pi^2}{6}$, only that it is finite; but see 282Xo.) Consequently
\[ f(x) = \sum_{n=0}^\infty f_n(x) \le 2 + |x| \]
for every x, and $\int f\,d\nu < \infty$, because $\int |x|\,\nu(dx)$ is the common value of $E(|X_n|)$, and is finite. By any of the great convergence theorems,
\[ \sum_{n=0}^\infty \frac{1}{(n+1)^2}\operatorname{Var}(Y_n) \le \sum_{n=0}^\infty \int f_n\,d\nu = \int f\,d\nu < \infty. \]
By 273D,
\[ G = \Bigl\{\omega : \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n (Y_i(\omega) - E(Y_i)) = 0\Bigr\} \]
is conegligible.
(b) Next, setting
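\[ E_n = \{\omega : X_n(\omega) \ne Y_n(\omega)\} = \{\omega : |X_n(\omega)| > n\}, \]
we have
\[ E_n = \bigcup_{i\ge n} F_{ni}, \]
where
\[ F_{ni} = \{\omega : i < |X_n(\omega)| \le i+1\}. \]
Now
\[ \mu F_{ni} = \nu\{x : i < |x| \le i+1\} \]
for every n and i. So
\[ \sum_{n=0}^\infty \mu E_n = \sum_{n=0}^\infty\sum_{i=n}^\infty \mu F_{ni} = \sum_{i=0}^\infty\sum_{n=0}^i \mu F_{ni} = \sum_{i=0}^\infty (i+1)\,\nu\{x : i < |x| \le i+1\} \le \int (1+|x|)\,\nu(dx) < \infty. \]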

Consequently the set $H = \{\omega : \{n : X_n(\omega) \ne Y_n(\omega)\}\text{ is finite}\}$ is conegligible (273A). But of course
\[ \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n \bigl(X_i(\omega) - Y_i(\omega)\bigr) = 0 \]
for every ω ∈ H.
(c) Finally,
\[ |E(Y_n) - E(X_n)| \le \int_{E_n}|X_n| = \int_{\mathbb{R}\setminus[-n,n]} |x|\,\nu(dx) \]
whenever n ∈ N, so $\lim_{n\to\infty}\bigl(E(Y_n) - E(X_n)\bigr) = 0$ and
\[ \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n \bigl(E(Y_i) - E(X_i)\bigr) = 0 \]

(273Ca). Putting these three together, we get


\[ \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n \bigl(X_i(\omega) - E(X_i)\bigr) = 0 \]
whenever ω belongs to the conegligible set G ∩ H. So
\[ \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n \bigl(X_i - E(X_i)\bigr) = 0 \]
almost everywhere, as required.
Remarks In my own experience, this is the most important form of the strong law from the point of view of ‘pure’
measure theory. I note that 273G above can also be regarded as a consequence of this form.
For a very striking alternative proof, see 275Yn. Yet another proof treats this result as a special case of the Ergodic
Theorem (see 372Xg in Volume 3).
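To see what 273I offers beyond the variance-based forms, here is a hedged Python sketch (an illustration only, with names and seed chosen for the purpose) which averages independent copies of X = U^(-1/2), where U is uniform on ]0, 1]. For this X we have Pr(X > x) = x^(-2) for x ≥ 1, so E(X) = 2 but E(X^2) = ∞; the averages should nevertheless settle near 2.

```python
import random

def running_mean_heavy_tail(n, seed=1):
    """Average of n i.i.d. copies of X = U**(-1/2), U uniform on (0, 1].

    Pr(X > x) = x**-2 for x >= 1, so E(X) = 2 while E(X**2) is infinite.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = 1.0 - rng.random()   # lies in (0, 1], so the power is defined
        total += u ** -0.5
    return total / n

for n in (10**3, 10**4, 10**5):
    print(n, running_mean_heavy_tail(n))
```

Convergence is visibly slower than for bounded variables, which is consistent with the more delicate estimates needed in the proof.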
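273J Corollary Let (Ω, Σ, µ) be a probability space. If f is a real-valued function such that $\int f\,d\mu$ is defined in [−∞, ∞], then
\[ \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n f(\omega_i) = \int f\,d\mu \]
for λ-almost every $\boldsymbol{\omega} = \langle\omega_n\rangle_{n\in\mathbb{N}} \in \Omega^{\mathbb{N}}$, where λ is the product measure on $\Omega^{\mathbb{N}}$ (254A-254C).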

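proof (a) To begin with, suppose that f is integrable. Define functions $X_n$ on $\Omega^{\mathbb{N}}$ by setting
\[ X_n(\boldsymbol{\omega}) = f(\omega_n) \text{ whenever } \omega_n \in \operatorname{dom} f. \]
Then $\langle X_n\rangle_{n\in\mathbb{N}}$ is an independent sequence of random variables, all with the same distribution as f (272M). So
\[ \lim_{n\to\infty}\Bigl(\frac{1}{n+1}\sum_{i=0}^n f(\omega_i) - \int f\,d\mu\Bigr) = \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n \bigl(X_i(\boldsymbol{\omega}) - E(X_i)\bigr) = 0 \]
for almost every $\boldsymbol{\omega}$, by 273I, and
\[ \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n f(\omega_i) = \int f\,d\mu \]
for almost every $\boldsymbol{\omega}$.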
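(b) Next, suppose that f ≥ 0 and $\int f = \infty$. In this case, for every m ∈ N,
\[ \liminf_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n f(\omega_i) \ge \liminf_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n \min(m, f(\omega_i)) = \int \min(m, f(\omega))\,\mu(d\omega) \]
for almost every $\boldsymbol{\omega}$, so
\[ \liminf_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n f(\omega_i) \ge \sup_{m\in\mathbb{N}}\int \min(m, f(\omega))\,\mu(d\omega) = \infty \]
and
\[ \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n f(\omega_i) = \infty = \int f \]
for almost every $\boldsymbol{\omega}$.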
(c) In general, if $\int f = \infty$, this is because $\int f^+ = \infty$ and $f^-$ is integrable, so
\[ \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n f(\omega_i) = \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n f^+(\omega_i) - \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n f^-(\omega_i) = \infty - \int f^- = \int f \]
for almost every $\boldsymbol{\omega}$. Similarly,
\[ \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n f(\omega_i) = -\infty \]
for almost every $\boldsymbol{\omega}$ if $\int f\,d\mu = -\infty$.
Remark I find myself slipping here into measure-theorists’ terminology; this corollary is one of the basic applications
of the strong law to measure theory. Obviously, in view of 272J and 272M, this corollary covers 273I. It could also
(in theory) be used as a definition of integration on a probability space (see 273Ya); it is sometimes called the ‘Monte
Carlo’ method of integration.
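A minimal sketch of the 'Monte Carlo' idea, assuming Ω = [0, 1] with Lebesgue measure and f(x) = x^2 (so that ∫ f dµ = 1/3): the averages of f along a pseudo-random sequence stand in for the averages along a λ-typical point of Ω^N. The function name and seed are illustrative choices, not anything fixed by the text.

```python
import random

def monte_carlo_integral(f, n, seed=2):
    """Average of f over n pseudo-random points of [0, 1]."""
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n

estimate = monte_carlo_integral(lambda x: x * x, 100_000)
print(estimate)   # should be close to 1/3
```

The corollary guarantees almost-sure convergence but says nothing about the rate; for square-integrable f the error is typically of order n^(-1/2).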

273K It is tempting to seek extensions of 273I in which the Xn are not identically distributed, but are otherwise
well-behaved. Any such idea should be tested against the following example. I find that I need another standard result,
complementing that in 273A.
Borel-Cantelli lemma Let (Ω, Σ, µ) be a probability space and $\langle E_n\rangle_{n\in\mathbb{N}}$ a sequence of measurable subsets of Ω such that $\sum_{n=0}^\infty \mu E_n = \infty$ and $\mu(E_m \cap E_n) \le \mu E_m \cdot \mu E_n$ whenever m ≠ n. Then almost every point of Ω belongs to infinitely many of the $E_n$.
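proof For n, k ∈ N set $X_n = \sum_{i=0}^n \chi E_i$, $\beta_n = \sum_{i=0}^n \mu E_i = E(X_n)$ and $F_{nk} = \{x : x \in \Omega,\ \#(\{i : i \le n,\ x \in E_i\}) \le k\}$. Then
\[ E(X_n^2) = \sum_{i=0}^n\sum_{j=0}^n \mu(E_i \cap E_j) \le \sum_{i=0}^n \mu E_i + \sum_{i=0}^n\sum_{j\ne i} \mu E_i \cdot \mu E_j = \beta_n + \beta_n^2 - \sum_{i=0}^n (\mu E_i)^2, \]
so
\[ \operatorname{Var}(X_n) = E(X_n^2) - \beta_n^2 \le \beta_n - \sum_{i=0}^n (\mu E_i)^2 \le \beta_n. \]
Now if $k < \beta_n$,
\[ (\beta_n - k)^2 \mu F_{nk} = (\beta_n - k)^2 \Pr(X_n \le k) \le E(X_n - \beta_n)^2 = \operatorname{Var}(X_n) \le \beta_n \]
and $\mu F_{nk} \le \dfrac{\beta_n}{(\beta_n - k)^2}$.
Now recall that we are assuming that $\lim_{n\to\infty}\beta_n = \infty$. So for any k ∈ N,
\[ \mu\Bigl(\bigcap_{n\in\mathbb{N}} F_{nk}\Bigr) = \lim_{n\to\infty}\mu F_{nk} \le \lim_{n\to\infty}\frac{\beta_n}{(\beta_n - k)^2} = 0. \]
Accordingly
\[ \mu\{x : x \text{ belongs to only finitely many } E_n\} = \mu\Bigl(\bigcup_{k\in\mathbb{N}}\bigcap_{n\in\mathbb{N}} F_{nk}\Bigr) = 0, \]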
and almost every point of Ω belongs to infinitely many En .
Remark Of course this result is usually applied to an independent sequence $\langle E_n\rangle_{n\in\mathbb{N}}$. But very occasionally it is of interest to know that the weaker hypothesis here suffices. See also 273Yb.

273L Now for the promised example.


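Example There is an independent sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of non-negative random variables such that $\lim_{n\to\infty} E(X_n) = 0$ but
\[ \limsup_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n \bigl(X_i - E(X_i)\bigr) = \infty, \]
\[ \liminf_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n \bigl(X_i - E(X_i)\bigr) = 0 \]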
almost everywhere.
proof Let (Ω, Σ, µ) be a probability space with an independent sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of measurable sets such that $\mu E_n = \dfrac{1}{(n+3)\ln(n+3)}$ for each n. (I have nowhere explained exactly how to build such a sequence. Two obvious methods are available to us, and another a trifle less obvious. (i) Take $\Omega = \{0,1\}^{\mathbb{N}}$ and µ to be the product of the probabilities $\mu_n$ on {0, 1}, defined by saying that $\mu_n\{1\} = \dfrac{1}{(n+3)\ln(n+3)}$ for each n; set $E_n = \{\omega : \omega(n) = 1\}$, and appeal to 272M to check that the $E_n$ are independent. (ii) Build the $E_n$ inductively as subsets of [0, 1], arranging that each $E_n$ should be a finite union of intervals, so that when you come to choose $E_{n+1}$ the sets $E_0, \ldots, E_n$ define a partition $\mathcal{I}_n$ of [0, 1] into intervals, and you can take $E_{n+1}$ to be the union of (say) the left-hand subintervals of length a proportion $\dfrac{1}{(n+3)\ln(n+3)}$ of the intervals in $\mathcal{I}_n$. (iii) Use 215D to see that the method of (ii) can be used on any atomless probability space, as in 272Xa.)
Set $X_n = (n+3)\ln\ln(n+3)\,\chi E_n$ for each n; then $\langle X_n\rangle_{n\in\mathbb{N}}$ is an independent sequence of real-valued random variables (272F) and $E(X_n) = \dfrac{\ln\ln(n+3)}{\ln(n+3)}$ for each n, so that $E(X_n) \to 0$ as n → ∞. Thus, for instance, $\{X_n : n \in \mathbb{N}\}$ is uniformly integrable and $\langle X_n\rangle_{n\in\mathbb{N}} \to 0$ in measure (246Jc); while surely $\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n E(X_i) = 0$.
On the other hand,

\[ \sum_{n=0}^\infty \mu E_n = \sum_{n=0}^\infty \frac{1}{(n+3)\ln(n+3)} \ge \int_0^\infty \frac{1}{(x+3)\ln(x+3)}\,dx = \lim_{a\to\infty}\bigl(\ln\ln(a+3) - \ln\ln 3\bigr) = \infty, \]
so almost every ω belongs to infinitely many of the $E_n$, by the Borel-Cantelli lemma (273K). Now if we write $Y_n = \frac{1}{n+1}\sum_{i=0}^n X_i$, then if $\omega \in E_n$ we have $X_n(\omega) = (n+3)\ln\ln(n+3)$ so
\[ Y_n(\omega) \ge \frac{n+3}{n+1}\ln\ln(n+3). \]
This means that
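\[ \Bigl\{\omega : \limsup_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n \bigl(X_i(\omega) - E(X_i)\bigr) = \infty\Bigr\} = \Bigl\{\omega : \limsup_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n X_i(\omega) = \infty\Bigr\} = \bigl\{\omega : \sup_{n\in\mathbb{N}} Y_n(\omega) = \infty\bigr\} \supseteq \bigl\{\omega : \{n : \omega \in E_n\}\text{ is infinite}\bigr\} \]
is conegligible, and the strong law of large numbers does not apply to $\langle X_n\rangle_{n\in\mathbb{N}}$.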
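Because
\[ \lim_{n\to\infty}\|Y_n\|_1 = \lim_{n\to\infty}E(Y_n) = \lim_{n\to\infty}E(X_n) = 0 \]
(273Ca), $\langle Y_n\rangle_{n\in\mathbb{N}} \to 0$ for the topology of convergence in measure, and $\langle Y_n\rangle_{n\in\mathbb{N}}$ has a subsequence converging to 0 almost everywhere (245K). So
\[ \liminf_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n \bigl(X_i(\omega) - E(X_i)\bigr) = \liminf_{n\to\infty}Y_n(\omega) = 0 \]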
for almost every ω. The fact that both lim supn→∞ Yn and lim inf n→∞ Yn are constant almost everywhere is of course
a consequence of the zero-one law (272P).
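The two competing facts in this example, that E(X_n) → 0 while Σ µE_n diverges, can be checked numerically. The sketch below is only an illustration; the helper names are invented for it. It evaluates E(X_n) = ln ln(n+3)/ln(n+3) and compares the partial sums of µE_n with the integral ln ln(N+4) − ln ln 3, which bounds them from below because the summand is decreasing.

```python
import math

def mu_E(n):
    """Measure of E_n in the example: 1 / ((n+3) ln(n+3))."""
    return 1.0 / ((n + 3) * math.log(n + 3))

def expect_X(n):
    """E(X_n) = (n+3) ln ln(n+3) * mu_E(n) = ln ln(n+3) / ln(n+3)."""
    return math.log(math.log(n + 3)) / math.log(n + 3)

def partial_sum(N):
    """Sum of mu_E(n) for n = 0, ..., N."""
    return sum(mu_E(n) for n in range(N + 1))

def integral_lower_bound(N):
    """Integral of 1/((x+3) ln(x+3)) over [0, N+1]."""
    return math.log(math.log(N + 4)) - math.log(math.log(3))

for N in (10**2, 10**4, 10**6):
    print(N, expect_X(N), partial_sum(N), integral_lower_bound(N))
```

The divergence is of order ln ln N, so it is extremely slow; this is why a direct simulation of the averages Y_n would show very little at any feasible N.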

*273M All the above has been concerned with pointwise convergence of the averages of independent random
variables, and that is the important part of the work of this section. But it is perhaps worth complementing it with a
brief investigation of norm-convergence. To deal efficiently with convergence in Lp , we need the following. (I should
perhaps remark that, compared with the general case treated here, the case p = 2 is trivial; see 273Xl.)
Lemma For any p ∈ ]1, ∞[ and ε > 0, there is a δ > 0 such that $\|S + X\|_p \le 1 + \epsilon\|X\|_p$ whenever S and X are independent random variables, $\|S\|_p = 1$, $\|X\|_p \le \delta$ and E(X) = 0.
proof (a) Take ζ ∈ ]0, 1] such that pζ ≤ 2 and
\[ (1+\xi)^p \le 1 + p\xi + \frac{p^2}{2}\xi^2 \]
whenever |ξ| ≤ ζ; such exists because
\[ \lim_{\xi\to 0}\frac{(1+\xi)^p - 1 - p\xi}{\xi^2} = \frac{p(p-1)}{2} < \frac{p^2}{2}. \]
Observe that
\[ (1+\xi)^p \le \Bigl(1 + \frac1\zeta\Bigr)^p + \xi^p + 2p\xi^{p-1} \]
for every ξ ≥ 0. P If $\xi \le \frac1\zeta$, this is trivial. If $\xi \ge \frac1\zeta$, then
\[ (1+\xi)^p = \xi^p\Bigl(1 + \frac1\xi\Bigr)^p \le \xi^p\Bigl(1 + \frac p\xi + \frac{p^2}{2\xi^2}\Bigr) \le \xi^p\Bigl(1 + \frac p\xi + \frac{p^2\zeta}{2\xi}\Bigr) = \xi^p + p\xi^{p-1}\Bigl(1 + \frac{p\zeta}{2}\Bigr) \le \xi^p + 2p\xi^{p-1}. \]
Q

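Define η > 0 by declaring that $3\eta^{p-1} = \frac{\epsilon}{2}$ (this is one of the places where we need to know that p > 1). Let δ > 0 be such that
\[ \delta \le \eta\zeta, \qquad \frac{p^2}{2\eta^2}\delta + \Bigl(1 + \frac1\zeta\Bigr)^p\delta^{p-1} \le \frac{p\epsilon}{2}. \]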

(b) Now suppose that S and X are independent random variables with $\|S\|_p = 1$, $\|X\|_p \le \delta$ and E(X) = 0. If $\|X\|_p = 0$ then of course $\|S + X\|_p \le 1 + \epsilon\|X\|_p$, so suppose that X is non-trivial. Write (Ω, Σ, µ) for the underlying probability space and adjust S and X on negligible sets so that they are measurable and defined everywhere on Ω. Set $\alpha = \|X\|_p$, $\gamma = \alpha/\eta$,
\[ E = \{\omega : S(\omega) \ne 0\}, \quad F = \{\omega : |X(\omega)| > \gamma|S(\omega)|\}, \quad \beta = \|S \times \chi F\|_p. \]
Then

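\[
\begin{aligned}
\int |S+X|^p &= \int_F |S+X|^p + \int_{E\setminus F}|S+X|^p \\
&\qquad\text{(because $S$ and $X$ are both zero on $\Omega\setminus(E\cup F)$)} \\
&= \|(S\times\chi F) + (X\times\chi F)\|_p^p + \int_{E\setminus F}|S|^p\Bigl|1 + \frac XS\Bigr|^p \\
&\le \bigl(\|S\times\chi F\|_p + \|X\times\chi F\|_p\bigr)^p + \int_{E\setminus F}|S|^p\Bigl(1 + p\frac XS + \frac{p^2}{2}\gamma^2\Bigr) \\
&\qquad\text{(because $\bigl|\tfrac XS\bigr| \le \gamma \le \tfrac\delta\eta \le \zeta \le 1$ everywhere on $E\setminus F$)} \\
&\le (\beta+\alpha)^p + \Bigl(1 + \frac{p^2}{2}\gamma^2\Bigr)\int_{E\setminus F}|S|^p + p\int_{E\setminus F}|S|^{p-1}\times\operatorname{sgn}S\times X \\
&\qquad\text{(writing $\operatorname{sgn}(\xi) = \xi/|\xi|$ if $\xi\ne 0$, $0$ if $\xi = 0$)} \\
&= (\beta+\alpha)^p + \Bigl(1 + \frac{p^2}{2}\gamma^2\Bigr)\int_{\Omega\setminus F}|S|^p + p\int_{\Omega\setminus F}|S|^{p-1}\times\operatorname{sgn}S\times X \\
&\qquad\text{(because $S = 0$ on $\Omega\setminus E$)} \\
&= \alpha^p\Bigl(1 + \frac\beta\alpha\Bigr)^p + \Bigl(1 + \frac{p^2}{2}\gamma^2\Bigr)(1-\beta^p) - p\int_F|S|^{p-1}\times\operatorname{sgn}S\times X \\
&\qquad\text{(because $X$ and $|S|^{p-1}\times\operatorname{sgn}S$ are independent, by 272L, so} \\
&\qquad\text{$\int|S|^{p-1}\times\operatorname{sgn}S\times X = E(|S|^{p-1}\times\operatorname{sgn}S)E(X) = 0$)} \\
&\le \alpha^p\Bigl(\Bigl(1+\frac1\zeta\Bigr)^p + 2p\Bigl(\frac\beta\alpha\Bigr)^{p-1} + \Bigl(\frac\beta\alpha\Bigr)^p\Bigr) + \Bigl(1 + \frac{p^2}{2}\gamma^2\Bigr)(1-\beta^p) + p\int_F|S|^{p-1}\times|X| \\
&\qquad\text{(see (a) above)} \\
&\le \alpha^p\Bigl(1+\frac1\zeta\Bigr)^p + \beta^p + 2p\beta^{p-1}\alpha + \Bigl(1 + \frac{p^2}{2}\gamma^2\Bigr)(1-\beta^p) + p\int_F\frac{1}{\gamma^{p-1}}|X|^p \\
&\le \alpha^p\Bigl(1+\frac1\zeta\Bigr)^p + 2p\frac{\alpha^p}{\gamma^{p-1}} + 1 + \frac{p^2}{2}\gamma^2 + p\frac{\alpha^p}{\gamma^{p-1}} \\
&\qquad\text{(because $\beta = \|S\times\chi F\|_p \le \tfrac1\gamma\|X\times\chi F\|_p \le \tfrac\alpha\gamma$)} \\
&= \alpha^p\Bigl(1+\frac1\zeta\Bigr)^p + 3p\eta^{p-1}\alpha + 1 + \frac{p^2\alpha^2}{2\eta^2} \\
&= 1 + \Bigl(\alpha^{p-1}\Bigl(1+\frac1\zeta\Bigr)^p + 3p\eta^{p-1} + \frac{p^2}{2\eta^2}\alpha\Bigr)\alpha \\
&\le 1 + \Bigl(\delta^{p-1}\Bigl(1+\frac1\zeta\Bigr)^p + 3p\eta^{p-1} + \frac{p^2}{2\eta^2}\delta\Bigr)\alpha \\
&\le 1 + p\alpha\epsilon \le (1 + \epsilon\|X\|_p)^p.
\end{aligned}
\]
So $\|S + X\|_p \le 1 + \epsilon\|X\|_p$, as required.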


*Remark What is really happening here is that $\phi = \|\ \|_p^p : L^p \to \mathbb{R}$ is differentiable (as a real-valued function on the normed space $L^p$) and
\[ \phi'(S^\bullet)(X^\bullet) = p\int |S|^{p-1}\times\operatorname{sgn}S\times X, \]
so that in the context here
\[ \phi((S+X)^\bullet) = \phi(S^\bullet) + \phi'(S^\bullet)(X^\bullet) + o(\|X\|_p) = 1 + o(\|X\|_p) \]
and $\|S+X\|_p = 1 + o(\|X\|_p)$. The calculations above are elaborate partly because they do not appeal to any non-trivial ideas about normed spaces, and partly because we need the estimates to be uniform in S.

273N Theorem Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables with zero expectation, and set $Y_n = \frac{1}{n+1}(X_0 + \ldots + X_n)$ for each n ∈ N.
(a) If $\langle X_n\rangle_{n\in\mathbb{N}}$ is uniformly integrable, then $\lim_{n\to\infty}\|Y_n\|_1 = 0$.
*(b) If p ∈ ]1, ∞[ and $\sup_{n\in\mathbb{N}}\|X_n\|_p < \infty$, then $\lim_{n\to\infty}\|Y_n\|_p = 0$.
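proof (a) Let ε > 0. Then there is an M ≥ 0 such that $E\bigl((|X_n| - M)^+\bigr) \le \epsilon$ for every n ∈ N. Set
\[ X_n' = (-M\chi\Omega) \vee (X_n \wedge M\chi\Omega), \quad \alpha_n = E(X_n'), \quad \tilde X_n = X_n' - \alpha_n, \quad X_n'' = X_n - X_n' \]
for each n ∈ N. Then $\langle X_n'\rangle_{n\in\mathbb{N}}$ and $\langle\tilde X_n\rangle_{n\in\mathbb{N}}$ are independent and uniformly bounded, and $\|X_n''\|_1 \le \epsilon$ for every n. So if we write
\[ \tilde Y_n = \frac{1}{n+1}\sum_{i=0}^n \tilde X_i, \qquad Y_n'' = \frac{1}{n+1}\sum_{i=0}^n X_i'', \]
$\langle\tilde Y_n\rangle_{n\in\mathbb{N}} \to 0$ almost everywhere, by 273E (for instance), while $\|Y_n''\|_1 \le \epsilon$ for every n. Moreover,
\[ |\alpha_n| = |E(X_n' - X_n)| \le E(|X_n''|) \le \epsilon \]
for every n. As $|\tilde Y_n| \le 2M$ almost everywhere for each n, $\lim_{n\to\infty}\|\tilde Y_n\|_1 = 0$, by Lebesgue's Dominated Convergence Theorem. So, since $Y_n = \tilde Y_n + Y_n'' + \frac{1}{n+1}\sum_{i=0}^n\alpha_i$,
\[ \limsup_{n\to\infty}\|Y_n\|_1 \le \lim_{n\to\infty}\|\tilde Y_n\|_1 + \sup_{n\in\mathbb{N}}\|Y_n''\|_1 + \sup_{n\in\mathbb{N}}|\alpha_n| \le 2\epsilon. \]
As ε is arbitrary, $\lim_{n\to\infty}\|Y_n\|_1 = 0$, as claimed.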
*(b) Set $M = \sup_{n\in\mathbb{N}}\|X_n\|_p$. For n ∈ N, set $S_n = \sum_{i=0}^n X_i$. Let ε > 0. Then there is a δ > 0 such that $\|S + X\|_p \le 1 + \epsilon\|X\|_p$ whenever S and X are independent random variables, $\|S\|_p = 1$, $\|X\|_p \le \delta$ and E(X) = 0 (273M). It follows that $\|S + X\|_p \le \|S\|_p + \epsilon\|X\|_p$ whenever S and X are independent random variables, $\|S\|_p$ is finite, $\|X\|_p \le \delta\|S\|_p$ and E(X) = 0. In particular, $\|S_{n+1}\|_p \le \|S_n\|_p + \epsilon M$ whenever $\|S_n\|_p \ge M/\delta$. An easy induction shows that
\[ \|S_n\|_p \le \frac M\delta + M + n\epsilon M \]
for every n ∈ N. But this means that
\[ \limsup_{n\to\infty}\|Y_n\|_p = \limsup_{n\to\infty}\frac{1}{n+1}\|S_n\|_p \le \epsilon M. \]
As ε is arbitrary, $\lim_{n\to\infty}\|Y_n\|_p = 0$.
Remark There are strengthenings of (a) in 276Xd, and of (b) in 276Ya.

273X Basic exercises (a) In part (b) of the proof of 273B, use Bienaymé's equality to show that $\lim_{m\to\infty}\sup_{n\ge m}\Pr(|S_n - S_m| \ge \epsilon) = 0$ for every ε > 0, so that we can apply the argument of part (a) of the proof directly, without appealing to 242F or 245G or even 244E.

(b) Show that $\sum_{n=0}^\infty \dfrac{(-1)^{\omega(n)}}{n+1}$ is defined in R for almost every $\omega = \langle\omega(n)\rangle_{n\in\mathbb{N}}$ in $\{0,1\}^{\mathbb{N}}$, where $\{0,1\}^{\mathbb{N}}$ is given its usual measure (254J).

(c) Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of measurable sets in a probability space, all with the same non-zero measure. Let $\langle a_n\rangle_{n\in\mathbb{N}}$ be a sequence of non-negative real numbers such that $\sum_{n=0}^\infty a_n = \infty$. Show that $\sum_{n=0}^\infty a_n\chi E_n = \infty$ a.e. (Hint: Take a strictly increasing sequence $\langle k_n\rangle_{n\in\mathbb{N}}$ such that $d_n = \sum_{i=k_n+1}^{k_{n+1}} a_i \ge 1$ for each n. Set $c_i = \dfrac{a_i}{(n+1)d_n}$ for $k_n < i \le k_{n+1}$; show that $\sum_{n=0}^\infty c_n^2 < \infty = \sum_{n=0}^\infty c_n$. Apply 273D with $X_n = c_n\chi E_n$ and $b_n = \sqrt{\sum_{i=0}^n c_i}$.)

> (d) Take any q ∈ [0, 1], and give PN a measure µ such that
\[ \mu\{a : I \subseteq a\} = q^{\#(I)} \]
for every I ⊆ N, as in 254Xg. Show that for µ-almost every a ⊆ N,
\[ \lim_{n\to\infty}\frac{1}{n+1}\#(a \cap \{0,\ldots,n\}) = q. \]
n+1

> (e) Let µ be the usual probability measure on PN (254Jb), and for r ≥ 1 let $\mu^r$ be the product probability measure on $(\mathcal{P}\mathbb{N})^r$. Show that
\[ \lim_{n\to\infty}\frac{1}{n+1}\#(a_1 \cap \ldots \cap a_r \cap \{0,\ldots,n\}) = 2^{-r}, \]
\[ \lim_{n\to\infty}\frac{1}{n+1}\#((a_1 \cup \ldots \cup a_r) \cap \{0,\ldots,n\}) = 1 - 2^{-r} \]
for $\mu^r$-almost every $(a_1, \ldots, a_r) \in (\mathcal{P}\mathbb{N})^r$.

(f) Let µ be the usual probability measure on PN, and b any infinite subset of N. Show that $\lim_{n\to\infty}\dfrac{\#(a \cap b \cap \{0,\ldots,n\})}{\#(b \cap \{0,\ldots,n\})} = \dfrac12$ for almost every a ⊆ N.

> (g) For each x ∈ [0, 1], let $\epsilon_k(x)$ be the kth digit in the decimal expansion of x (choose for yourself what to do with 0·100 . . . = 0·099 . . . ). Show that $\lim_{k\to\infty}\frac1k\#(\{j : j \le k,\ \epsilon_j(x) = 7\}) = \frac{1}{10}$ for almost every x ∈ [0, 1].

(h) Let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a sequence of distribution functions for real-valued random variables, in the sense of 271Ga, and F another distribution function; suppose that $\lim_{n\to\infty}F_n(q) = F(q)$ for every q ∈ Q and $\lim_{n\to\infty}F_n(a^-) = F(a^-)$ whenever $F(a^-) < F(a)$, where I write $F(a^-)$ for $\lim_{x\uparrow a}F(x)$. Show that $F_n \to F$ uniformly.

> (i) Let (Ω, Σ, µ) be a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ an independent identically distributed sequence of real-valued random variables on Ω with common distribution function F. For a ∈ R, n ∈ N and $\omega \in \bigcap_{i\le n}\operatorname{dom}X_i$ set
\[ F_n(\omega, a) = \frac{1}{n+1}\#(\{i : i \le n,\ X_i(\omega) \le a\}). \]
Show that
\[ \lim_{n\to\infty}\sup_{a\in\mathbb{R}}|F_n(\omega, a) - F(a)| = 0 \]
for almost every ω ∈ Ω.

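(j) Let (Ω, Σ, µ) be a probability space, and λ the product measure on $\Omega^{\mathbb{N}}$. Let f : Ω → R be a function, and set $f^*(\boldsymbol{\omega}) = \limsup_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n f(\omega_i)$ for $\boldsymbol{\omega} = \langle\omega_n\rangle_{n\in\mathbb{N}} \in \Omega^{\mathbb{N}}$. Show that $\overline{\int} f^*\,d\lambda = \overline{\int} f\,d\mu$ whenever the right-hand side is finite. (Hint: 133J(a-i).)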

(k) Find an independent sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of random variables with zero expectation such that $\|X_n\|_1 = 1$ and $\bigl\|\frac{1}{n+1}\sum_{i=0}^n X_i\bigr\|_1 \ge \frac12$ for every n ∈ N. (Hint: take $\Pr(X_n \ne 0)$ very small.)

(l) Use 272S to prove 273Nb in the case p = 2.

(m) Find an independent sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of random variables with zero expectation such that $\|X_n\|_\infty = \bigl\|\frac{1}{n+1}\sum_{i=0}^n X_i\bigr\|_\infty = 1$ for every n ∈ N.

(n) Repeat the work of this section for complex-valued random variables.

273Y Further exercises (a) Let (Ω, Σ, µ) be a probability space, and λ the product measure on $\Omega^{\mathbb{N}}$. Suppose that f is a real-valued function, defined on a subset of Ω, such that
\[ h(\boldsymbol{\omega}) = \lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n f(\omega_i) \]
exists in R for λ-almost every $\boldsymbol{\omega} = \langle\omega_n\rangle_{n\in\mathbb{N}}$ in $\Omega^{\mathbb{N}}$. Show (i) that f has conegligible domain (ii) f is $\hat\Sigma$-measurable, where $\hat\Sigma$ is the domain of the completion of µ (iii) there is an a ∈ R such that h = a almost everywhere in $\Omega^{\mathbb{N}}$ (iv) f is integrable, with $\int f\,d\mu = a$.

(b) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a sequence of random variables with finite variance. Suppose that $\lim_{n\to\infty}E(X_n) = \infty$ and $\liminf_{n\to\infty}\dfrac{E(X_n^2)}{(E(X_n))^2} \le 1$. Show that $\limsup_{n\to\infty}X_n = \infty$ a.e.

273 Notes and comments I have tried in this section to offer the most useful of the standard criteria for pointwise
convergence of averages of independent random variables. In my view the strong law of large numbers, like Fubini’s
theorem, is one of the crucial steps in measure theory, where the subject changes character. Theorems depending
on the strong law have a kind of depth and subtlety to them which is missing in other parts of the subject. I have
described only a handful of applications here, but I hope that 273G, 273J, 273Xd, 273Xg and 273Xi will give an idea
of what is to be expected. These do have rather different weights. Of the four, only 273J requires the full resources of
this chapter; the others can be deduced from the essentially simpler version in 273Xi.
273Xi is the ‘fundamental theorem of statistics’ or ‘Glivenko-Cantelli theorem’. The Fn (., a) are ‘statistics’, computed
from the Xi ; they are the ‘empirical distributions’, and the theorem says that, almost surely, Fn → F uniformly. (I say
‘uniformly’ to make the result look more striking, but of course the real content is that Fn (., a) → F (a) almost surely
for each a; the extra step is just 273Xh.)
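For readers who like to see such a statement in action, here is a small Python sketch of the quantity in 273Xi, under the assumption that the X_i are uniformly distributed on [0, 1], so that F(a) = a there. The function name, the sample size m and the seed are illustrative choices.

```python
import random

def sup_distance_uniform(m, seed=3):
    """sup over a of |F_m(a) - a| for m pseudo-random uniform samples on [0, 1].

    F_m is the empirical distribution function of the samples.
    """
    rng = random.Random(seed)
    xs = sorted(rng.random() for _ in range(m))
    # F_m jumps from i/m to (i+1)/m at xs[i], so the supremum is approached
    # at, or just to the left of, one of the sample points.
    return max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(xs))

for m in (10**2, 10**3, 10**4):
    print(m, sup_distance_uniform(m))
```

The printed distances should shrink as m grows, which is the uniform convergence that the theorem asserts almost surely.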
I have included 273N to show that independence is quite as important in questions of norm-convergence as it is in
questions of pointwise convergence. It does not really rely on any form of the strong law; in the proof I quote 273E as a quick way of disposing of the ‘uniformly bounded parts’ $X_n'$, but of course Bienaymé’s equality (272S) is already enough to show that if $\langle X_n'\rangle_{n\in\mathbb{N}}$ is an independent uniformly bounded sequence of random variables with zero expectation, then $\bigl\|\frac{1}{n+1}(X_0' + \ldots + X_n')\bigr\|_p \to 0$ for p = 2, and therefore for every p < ∞.
The proofs of 273H, 273I and 273Na all involve ‘truncation’; the expression of a random variable X as the sum of
a bounded random variable and a tail. This is one of the most powerful techniques in the subject, and will appear
again in §§274 and 276. In 273Na I used a slightly different formulation of the method, solely because it matched the
definition of ‘uniformly integrable’ more closely.

274 The central limit theorem


The second of the great theorems to which this chapter is devoted is of a new type. It is a limit theorem, but the limit
involved is a limit of distributions, not of functions (as in the strong limit theorem above or the martingale theorem
below), nor of equivalence classes of functions (as in Chapter 24). I give three forms of the theorem, in 274I-274K, all
drawn as corollaries of Theorem 274G; the proof is spread over 274C-274G. In 274A-274B and 274M I give the most
elementary properties of the normal distribution.

274A The normal distribution We need some facts from basic probability theory.
(a) Recall that
\[ \int_{-\infty}^\infty e^{-x^2/2}\,dx = \sqrt{2\pi} \]
(263G). Consequently, if we set
\[ \mu_G E = \frac{1}{\sqrt{2\pi}}\int_E e^{-x^2/2}\,dx \]
for every Lebesgue measurable set E, $\mu_G$ is a Radon probability measure (256E); we call it the standard normal distribution. The corresponding distribution function is
\[ \Phi(a) = \mu_G\,]{-\infty}, a] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^a e^{-x^2/2}\,dx \]
for a ∈ R; for the rest of this section I will reserve the symbol Φ for this function.
Writing Σ for the algebra of Lebesgue measurable subsets of R, $(\mathbb{R}, \Sigma, \mu_G)$ is a probability space. Note that it is complete, and has the same negligible sets as Lebesgue measure, because $e^{-x^2/2} > 0$ for every x (cf. 234Lc).

(b) A random variable X is standard normal if its distribution is $\mu_G$; that is, if the function $x \mapsto \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$ is a density function for X. The point of the remarks in (a) is that there are such random variables; for instance, take the probability space $(\mathbb{R}, \Sigma, \mu_G)$ there, and set X(x) = x for every x ∈ R.

(c) If X is a standard normal random variable, then


\[ E(X) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty xe^{-x^2/2}\,dx = 0, \]
\[ \operatorname{Var}(X) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty x^2e^{-x^2/2}\,dx = 1 \]
by 263H.

(d) More generally, a random variable X is normal if there are a ∈ R, σ > 0 such that Z = (X − a)/σ is standard normal. In this case X = σZ + a so E(X) = σE(Z) + a = a, $\operatorname{Var}(X) = \sigma^2\operatorname{Var}(Z) = \sigma^2$.
We have, for any c ∈ R,
\[ \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^c e^{-(x-a)^2/2\sigma^2}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{(c-a)/\sigma} e^{-y^2/2}\,dy \]
(substituting x = a + σy for −∞ < y ≤ (c − a)/σ)
\[ = \Pr\Bigl(Z \le \frac{c-a}{\sigma}\Bigr) = \Pr(X \le c). \]
So $x \mapsto \frac{1}{\sigma\sqrt{2\pi}}e^{-(x-a)^2/2\sigma^2}$ is a density function for X (271Ib). Conversely, of course, a random variable with such a density function is normal, with expectation a and variance $\sigma^2$. The normal distributions are the distributions with these density functions.
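The substitution in (d) can be checked numerically. The following Python sketch is offered only as an illustration: it computes Φ through the error function, using the standard identity Φ(a) = (1 + erf(a/√2))/2, and compares Φ((c − a)/σ) with a midpoint-rule integral of the density of a normal variable with expectation a and variance σ^2. The helper names, the cut-off at ten standard deviations and the step count are assumptions of the sketch.

```python
import math

def Phi(a):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def normal_cdf(c, a, sigma):
    """Pr(X <= c) for X normal with expectation a and variance sigma**2."""
    return Phi((c - a) / sigma)

def normal_density(x, a, sigma):
    return math.exp(-(x - a) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def integrate_density(c, a, sigma, lower_sds=10.0, steps=20000):
    """Midpoint rule for the density on [a - lower_sds*sigma, c]."""
    lo = a - lower_sds * sigma
    h = (c - lo) / steps
    return h * sum(normal_density(lo + (k + 0.5) * h, a, sigma) for k in range(steps))

print(normal_cdf(3.0, 1.0, 2.0), integrate_density(3.0, 1.0, 2.0))
```

The two printed numbers should agree to several decimal places; the neglected tail below ten standard deviations is far smaller than the quadrature error.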

(e) If Z is standard normal, so is −Z, because


\[ \Pr(-Z \le a) = \Pr(Z \ge -a) = \frac{1}{\sqrt{2\pi}}\int_{-a}^\infty e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^a e^{-x^2/2}\,dx. \]
The definition in the first sentence of (d) now makes it obvious that if X is normal, so is a + bX for any a ∈ R,
b ∈ R \ {0}.

274B Proposition Let X1 , . . . , Xn be independent normal random variables. Then Y = X1 + . . . + Xn is normal,


with E(Y ) = E(X1 ) + . . . + E(Xn ) and Var(Y ) = Var(X1 ) + . . . + Var(Xn ).
proof There are innumerable proofs of this fact; the following one gives me a chance to show off the power of Chapter
26, but of course (at the price of some disagreeable algebra) 272U also gives the result.
(a) Consider first the case n = 2. Setting $a_i = E(X_i)$, $\sigma_i = \sqrt{\operatorname{Var}(X_i)}$, $Z_i = (X_i - a_i)/\sigma_i$ we get independent standard normal variables $Z_1$, $Z_2$. Set $\rho = \sqrt{\sigma_1^2 + \sigma_2^2}$, and express $\sigma_1$, $\sigma_2$ as ρ cos θ, ρ sin θ. Consider $U = \cos\theta\,Z_1 + \sin\theta\,Z_2$. We know that $(Z_1, Z_2)$ has a density function
\[ (\zeta_1, \zeta_2) \mapsto g(\zeta_1, \zeta_2) = \frac{1}{2\pi}e^{-(\zeta_1^2+\zeta_2^2)/2} \]
(272I). Consequently, for any c ∈ R,
\[ \Pr(U \le c) = \int_F g(z)\,dz, \]
where $F = \{(\zeta_1, \zeta_2) : \zeta_1\cos\theta + \zeta_2\sin\theta \le c\}$. But now let T be the matrix
\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]
Then it is easy to check that
\[ T^{-1}[F] = \{(\eta_1, \eta_2) : \eta_1 \le c\}, \]
\[ \det T = 1, \]
\[ g(Ty) = g(y) \text{ for every } y \in \mathbb{R}^2, \]
so by 263A
\[ \Pr(U \le c) = \int_F g(z)\,dz = \int_{T^{-1}[F]} g(Ty)\,dy = \int_{]-\infty,c]\times\mathbb{R}} g(y)\,dy = \Pr(Z_1 \le c) = \Phi(c). \]
As this is true for every c ∈ R, U also is standard normal (I am appealing to 271Ga again). But
X1 + X2 = σ1 Z1 + σ2 Z2 + a1 + a2 = ρU + a1 + a2 ,
so X1 + X2 is normal.
(b) Now we can induce on n. If n = 1 the result is trivial. For the inductive step to n + 1 ≥ 2, we know that
X1 + . . . + Xn is normal, by the inductive hypothesis, and that Xn+1 is independent of X1 + . . . + Xn , by 272L. So
X1 + . . . + Xn + Xn+1 is normal, by (a).
The computation of the expectation and variance of X1 + . . . + Xn is immediate from 271Ab and 272S.
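A quick simulation makes the proposition concrete. The sketch below is an illustration only, with parameters chosen arbitrarily: it adds pseudo-random samples of X_1 (expectation 1, standard deviation 2) and X_2 (expectation −3, standard deviation 1.5), and compares the sample mean, the sample variance and the empirical value of Pr(Y ≤ 0) with what 274B predicts, namely expectation −2, variance 6.25 and Φ(2/2.5).

```python
import math
import random

def Phi(a):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def simulate_sum(n_samples, seed=4):
    """Samples of Y = X1 + X2 with X1 ~ N(1, 2^2), X2 ~ N(-3, 1.5^2) independent."""
    rng = random.Random(seed)
    return [rng.gauss(1.0, 2.0) + rng.gauss(-3.0, 1.5) for _ in range(n_samples)]

ys = simulate_sum(100_000)
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
frac_below_zero = sum(1 for y in ys if y <= 0.0) / len(ys)

print(mean, var)                       # predicted: -2 and 6.25
print(frac_below_zero, Phi(2.0 / 2.5)) # predicted to be close to each other
```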

274C Lemma Let $U_0, \ldots, U_n, V_0, \ldots, V_n$ be independent real-valued random variables and h : R → R a bounded Borel measurable function. Then
\[ \Bigl|E\Bigl(h\Bigl(\sum_{i=0}^n U_i\Bigr) - h\Bigl(\sum_{i=0}^n V_i\Bigr)\Bigr)\Bigr| \le \sum_{i=0}^n \sup_{t\in\mathbb{R}}\bigl|E\bigl(h(t+U_i) - h(t+V_i)\bigr)\bigr|. \]
proof For 0 ≤ j ≤ n + 1, set $Z_j = \sum_{i=0}^{j-1}U_i + \sum_{i=j}^n V_i$, taking $Z_0 = \sum_{i=0}^n V_i$ and $Z_{n+1} = \sum_{i=0}^n U_i$, and for j ≤ n set $W_j = \sum_{i=0}^{j-1}U_i + \sum_{i=j+1}^n V_i$, so that $Z_j = W_j + V_j$ and $Z_{j+1} = W_j + U_j$ and $W_j$, $U_j$ and $V_j$ are independent (I am appealing to 272K, as in 272L). Then
\[ \Bigl|E\Bigl(h\Bigl(\sum_{i=0}^n U_i\Bigr) - h\Bigl(\sum_{i=0}^n V_i\Bigr)\Bigr)\Bigr| = \Bigl|E\Bigl(\sum_{i=0}^n h(Z_{i+1}) - h(Z_i)\Bigr)\Bigr| \le \sum_{i=0}^n \bigl|E\bigl(h(Z_{i+1}) - h(Z_i)\bigr)\bigr| = \sum_{i=0}^n \bigl|E\bigl(h(W_i+U_i) - h(W_i+V_i)\bigr)\bigr|. \]
To estimate this sum I turn it into a sum of integrals, as follows. For each i, let $\nu_{W_i}$ be the distribution of $W_i$, and so on. Because (w, u) ↦ w + u is continuous, therefore Borel measurable, (w, u) ↦ h(w + u) also is Borel measurable; accordingly (w, u, v) ↦ h(w + u) − h(w + v) is measurable for each of the product measures $\nu_{W_i}\times\nu_{U_i}\times\nu_{V_i}$ on $\mathbb{R}^3$, and 271E and 272G give us
\[
\begin{aligned}
\bigl|E\bigl(h(W_i+U_i) - h(W_i+V_i)\bigr)\bigr|
&= \Bigl|\int \bigl(h(w+u) - h(w+v)\bigr)(\nu_{W_i}\times\nu_{U_i}\times\nu_{V_i})d(w,u,v)\Bigr| \\
&= \Bigl|\int\Bigl(\int \bigl(h(w+u) - h(w+v)\bigr)(\nu_{U_i}\times\nu_{V_i})d(u,v)\Bigr)\nu_{W_i}(dw)\Bigr| \\
&\le \int\Bigl|\int \bigl(h(w+u) - h(w+v)\bigr)(\nu_{U_i}\times\nu_{V_i})d(u,v)\Bigr|\,\nu_{W_i}(dw) \\
&= \int \bigl|E\bigl(h(w+U_i) - h(w+V_i)\bigr)\bigr|\,\nu_{W_i}(dw) \\
&\le \sup_{t\in\mathbb{R}}\bigl|E\bigl(h(t+U_i) - h(t+V_i)\bigr)\bigr|.
\end{aligned}
\]
So we get
\[ \Bigl|E\Bigl(h\Bigl(\sum_{i=0}^n U_i\Bigr) - h\Bigl(\sum_{i=0}^n V_i\Bigr)\Bigr)\Bigr| \le \sum_{i=0}^n \bigl|E\bigl(h(W_i+U_i) - h(W_i+V_i)\bigr)\bigr| \le \sum_{i=0}^n \sup_{t\in\mathbb{R}}\bigl|E\bigl(h(t+U_i) - h(t+V_i)\bigr)\bigr|, \]
as required.

274D Lemma Let h : R → R be a bounded three-times-differentiable function such that $M_2 = \sup_{x\in\mathbb{R}}|h''(x)|$, $M_3 = \sup_{x\in\mathbb{R}}|h'''(x)|$ are both finite. Let ε > 0.
(a) Let U be a real-valued random variable of zero expectation and finite variance $\sigma^2$. Then for any t ∈ R we have
\[ \Bigl|E(h(t+U)) - h(t) - \frac{\sigma^2}{2}h''(t)\Bigr| \le \frac16\epsilon M_3\sigma^2 + M_2E(\psi_\epsilon(U)) \]
where $\psi_\epsilon(x) = 0$ if |x| ≤ ε, $x^2$ if |x| > ε.
(b) Let $U_0, \ldots, U_n, V_0, \ldots, V_n$ be independent random variables with finite variances, and suppose that $E(U_i) = E(V_i) = 0$, $\operatorname{Var}(U_i) = \operatorname{Var}(V_i) = \sigma_i^2$ for every i ≤ n. Then
\[ \Bigl|E\Bigl(h\Bigl(\sum_{i=0}^n U_i\Bigr) - h\Bigl(\sum_{i=0}^n V_i\Bigr)\Bigr)\Bigr| \le \frac13\epsilon M_3\sum_{i=0}^n\sigma_i^2 + M_2\sum_{i=0}^n E\bigl(\psi_\epsilon(U_i)\bigr) + M_2\sum_{i=0}^n E\bigl(\psi_\epsilon(V_i)\bigr). \]

proof (a) The point is that, by Taylor’s theorem with remainder,


\[ |h(t+x) - h(t) - xh'(t)| \le \frac12M_2x^2, \]
\[ \Bigl|h(t+x) - h(t) - xh'(t) - \frac12x^2h''(t)\Bigr| \le \frac16M_3|x|^3 \]
for every x ∈ R. So
\[ \Bigl|h(t+x) - h(t) - xh'(t) - \frac12x^2h''(t)\Bigr| \le \min\Bigl(\frac16M_3|x|^3, M_2x^2\Bigr) \le \frac16\epsilon M_3x^2 + M_2\psi_\epsilon(x). \]
Integrating with respect to the distribution of U, we get
\[
\begin{aligned}
\Bigl|E(h(t+U)) - h(t) - \frac12h''(t)\sigma^2\Bigr|
&= \Bigl|E(h(t+U)) - h(t) - h'(t)E(U) - \frac12h''(t)E(U^2)\Bigr| \\
&= \Bigl|E\Bigl(h(t+U) - h(t) - h'(t)U - \frac12h''(t)U^2\Bigr)\Bigr| \\
&\le E\Bigl(\Bigl|h(t+U) - h(t) - h'(t)U - \frac12h''(t)U^2\Bigr|\Bigr) \\
&\le E\Bigl(\frac16\epsilon M_3U^2 + M_2\psi_\epsilon(U)\Bigr) \\
&= \frac16\epsilon M_3\sigma^2 + M_2E(\psi_\epsilon(U)),
\end{aligned}
\]
as claimed.
(b) By 274C,
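\[
\begin{aligned}
\Bigl|E\Bigl(h\Bigl(\sum_{i=0}^n U_i\Bigr) - h\Bigl(\sum_{i=0}^n V_i\Bigr)\Bigr)\Bigr|
&\le \sum_{i=0}^n \sup_{t\in\mathbb{R}}\bigl|E\bigl(h(t+U_i) - h(t+V_i)\bigr)\bigr| \\
&\le \sum_{i=0}^n \sup_{t\in\mathbb{R}}\Bigl(\Bigl|E(h(t+U_i)) - h(t) - \frac12h''(t)\sigma_i^2\Bigr| + \Bigl|E(h(t+V_i)) - h(t) - \frac12h''(t)\sigma_i^2\Bigr|\Bigr),
\end{aligned}
\]
which by (a) above is at most
\[ \sum_{i=0}^n \Bigl(\frac13\epsilon M_3\sigma_i^2 + M_2E(\psi_\epsilon(U_i)) + M_2E(\psi_\epsilon(V_i))\Bigr), \]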

as claimed.
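The pointwise inequality at the heart of part (a), min(M_3|x|^3/6, M_2 x^2) ≤ εM_3 x^2/6 + M_2 ψ_ε(x), is easy to test on a grid. The Python sketch below does so for arbitrarily chosen values of ε, M_2 and M_3; it is a sanity check under those assumed values, not a proof, and the small tolerance only absorbs floating-point rounding.

```python
def psi(eps, x):
    """psi_eps(x) = 0 if |x| <= eps, x**2 if |x| > eps."""
    return x * x if abs(x) > eps else 0.0

def taylor_bound_holds(eps, M2, M3, x, tol=1e-12):
    """Check min(M3|x|^3/6, M2 x^2) <= eps*M3*x^2/6 + M2*psi_eps(x)."""
    lhs = min(M3 * abs(x) ** 3 / 6.0, M2 * x * x)
    rhs = eps * M3 * x * x / 6.0 + M2 * psi(eps, x)
    return lhs <= rhs + tol

grid = [k / 100.0 for k in range(-500, 501)]
print(all(taylor_bound_holds(0.3, 2.0, 5.0, x) for x in grid))
```

The check mirrors the two cases of the argument: for |x| ≤ ε the cubic bound is used, and for |x| > ε the quadratic one.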

274E Lemma For any ǫ > 0, there is a three-times-differentiable function h : R → [0, 1], with continuous third
derivative, such that h(x) = 1 for x ≤ −ǫ and h(x) = 0 for x ≥ ǫ.
proof Let f : ]−ε, ε[ → ]0, ∞[ be any twice-differentiable function such that
\[ \lim_{x\downarrow-\epsilon}f^{(n)}(x) = \lim_{x\uparrow\epsilon}f^{(n)}(x) = 0 \]
for n = 0, 1 and 2, writing $f^{(n)}$ for the nth derivative of f; for instance, you could take $f(x) = (\epsilon^2 - x^2)^3$, or $f(x) = \exp\bigl(-\frac{1}{\epsilon^2 - x^2}\bigr)$. Now set
\[ h(x) = 1 - \int_{-\epsilon}^x f \Big/ \int_{-\epsilon}^\epsilon f \]
for |x| ≤ ε.
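For the first suggested choice, f(x) = (ε^2 − x^2)^3, the integrals can be written in closed form, since f has antiderivative ε^6 t − ε^4 t^3 + (3/5)ε^2 t^5 − t^7/7. The following Python sketch builds the resulting h; the name `make_h` is invented for this illustration, and the values outside [−ε, ε] are those required by the statement of the lemma.

```python
def make_h(eps):
    """Return h as in 274E, built from f(x) = (eps**2 - x**2)**3."""
    def A(t):
        # An antiderivative of f(t) = eps^6 - 3 eps^4 t^2 + 3 eps^2 t^4 - t^6.
        return eps**6 * t - eps**4 * t**3 + 0.6 * eps**2 * t**5 - t**7 / 7.0

    total = A(eps) - A(-eps)          # integral of f over [-eps, eps]

    def h(x):
        if x <= -eps:
            return 1.0
        if x >= eps:
            return 0.0
        return 1.0 - (A(x) - A(-eps)) / total

    return h

h = make_h(0.5)
print(h(-1.0), h(-0.25), h(0.0), h(0.25), h(1.0))
```

By symmetry of f the value at 0 is 1/2, and h decreases from 1 to 0 across [−ε, ε].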

274F Lindeberg’s theorem Let ε > 0. Then there is a δ > 0 such that whenever $X_0, \ldots, X_n$ are independent real-valued random variables such that
\[ E(X_i) = 0 \text{ for every } i \le n, \]
\[ \sum_{i=0}^n \operatorname{Var}(X_i) = 1, \]
\[ \sum_{i=0}^n E(\psi_\delta(X_i)) \le \delta \]
(writing $\psi_\delta(x) = 0$ if |x| ≤ δ, $x^2$ if |x| > δ), then
\[ \Bigl|\Pr\Bigl(\sum_{i=0}^n X_i \le a\Bigr) - \Phi(a)\Bigr| \le \epsilon \]
for every a ∈ R.
proof (a) Let h : R → [0, 1] be a three-times-differentiable function, with continuous third derivative, such that χ ]−∞, −ε] ≤ h ≤ χ ]−∞, ε], as in 274E. Set
\[ M_2 = \sup_{x\in\mathbb{R}}|h''(x)| = \sup_{|x|\le\epsilon}|h''(x)|, \]
\[ M_3 = \sup_{x\in\mathbb{R}}|h'''(x)| = \sup_{|x|\le\epsilon}|h'''(x)|; \]
because h′′′ is continuous, both are finite. Write $\epsilon' = \epsilon\bigl(1 - \frac{2}{\sqrt{2\pi}}\bigr) > 0$, and let η > 0 be such that
\[ \Bigl(\frac13M_3 + 2M_2\Bigr)\eta \le \epsilon'. \]
Note that $\lim_{m\to\infty}\psi_m(x) = 0$ for every x, so if X is a random variable of finite variance we must have $\lim_{m\to\infty}E(\psi_m(X)) = 0$, by Lebesgue’s Dominated Convergence Theorem; let m ≥ 1 be such that $E(\psi_m(Z)) \le \eta$, where Z is some (or any) standard normal random variable. Finally, take δ > 0 such that δ ≤ η, $\delta + \delta^2 \le (\eta/m)^2$.
(I hope that you have seen enough ε-δ arguments not to be troubled by any expectation of understanding the reasons for each particular formula here before reading the rest of the argument. But the formula $\frac13M_3 + 2M_2$, in association with $\psi_\delta$, should recall 274D.)
(b) Let $X_0, \ldots, X_n$ be independent random variables with zero expectation such that $\sum_{i=0}^n \operatorname{Var}(X_i) = 1$ and $\sum_{i=0}^n E(\psi_\delta(X_i)) \le \delta$. We need an auxiliary sequence $Z_0, \ldots, Z_n$ of standard normal random variables to match against the $X_i$. To create this, I use the following device. Suppose that the probability space underlying $X_0, \ldots, X_n$ is (Ω, Σ, µ). Set $\Omega' = \Omega\times\mathbb{R}^{n+1}$, and let µ′ be the product measure on Ω′, where Ω is given the measure µ and each factor R of $\mathbb{R}^{n+1}$ is given the measure $\mu_G$. Set $X_i'(\omega, z) = X_i(\omega)$ and $Z_i(\omega, z) = \zeta_i$ for $\omega \in \operatorname{dom}X_i$, $z = (\zeta_0, \ldots, \zeta_n) \in \mathbb{R}^{n+1}$, i ≤ n. Then $X_0', \ldots, X_n', Z_0, \ldots, Z_n$ are independent, and each $X_i'$ has the same distribution as $X_i$ (272Mb). Consequently $S' = X_0' + \ldots + X_n'$ has the same distribution as $S = X_0 + \ldots + X_n$ (using 272T, or otherwise); so that E(g(S′)) = E(g(S)) for any bounded Borel measurable function g (using 271E). Also each $Z_i$ has distribution $\mu_G$, so is standard normal.
(c) Write $\sigma_i = \sqrt{\operatorname{Var}(X_i)}$ for each i, and set $K = \{i : i \le n,\ \sigma_i > 0\}$. Observe that $\eta/\sigma_i \ge m$ for each i ∈ K. P We know that
\[ \sigma_i^2 = \operatorname{Var}(X_i) = E(X_i^2) \le E(\delta^2 + \psi_\delta(X_i)) = \delta^2 + E(\psi_\delta(X_i)) \le \delta^2 + \delta, \]
so
\[ \eta/\sigma_i \ge \eta/\sqrt{\delta + \delta^2} \ge m \]
by the choice of δ. Q
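(d) Consider the independent normal random variables $\sigma_iZ_i$. We have $E(\sigma_iZ_i) = E(X_i') = 0$ and $\operatorname{Var}(\sigma_iZ_i) = \operatorname{Var}(X_i') = \sigma_i^2$ for each i, so that $Z = \sigma_0Z_0 + \ldots + \sigma_nZ_n$ has expectation 0 and variance $\sum_{i=0}^n\sigma_i^2 = 1$; moreover, by 274B, Z is normal, so in fact it is standard normal. Now we have
\[
\begin{aligned}
\sum_{i=0}^n E(\psi_\eta(\sigma_iZ_i)) &= \sum_{i\in K}E(\psi_\eta(\sigma_iZ_i)) = \sum_{i\in K}\sigma_i^2E(\psi_{\eta/\sigma_i}(Z_i)) \\
&\qquad\text{(because $\sigma^2\psi_{\eta/\sigma}(x) = \psi_\eta(\sigma x)$ whenever $x\in\mathbb{R}$, $\sigma > 0$)} \\
&= \sum_{i\in K}\sigma_i^2E(\psi_{\eta/\sigma_i}(Z)) \le \sum_{i\in K}\sigma_i^2E(\psi_m(Z)) \\
&\qquad\text{(because, by (c), $\eta/\sigma_i \ge m$ for every $i\in K$, so $\psi_{\eta/\sigma_i}(t) \le \psi_m(t)$ for every $t$)} \\
&\le \sum_{i\in K}\sigma_i^2\eta \\
&\qquad\text{(by the choice of $m$)} \\
&= \eta.
\end{aligned}
\]
On the other hand, we surely have
\[ \sum_{i=0}^n E(\psi_\eta(X_i')) = \sum_{i=0}^n E(\psi_\eta(X_i)) \le \sum_{i=0}^n E(\psi_\delta(X_i)) \le \delta \le \eta. \]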

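(e) For any real number t, set
\[ h_t(x) = h(x - t) \]
for each x ∈ R. Then $h_t$ is three-times-differentiable, with $\sup_{x\in\mathbb{R}}|h_t''(x)| = M_2$ and $\sup_{x\in\mathbb{R}}|h_t'''(x)| = M_3$. Consequently
\[ |E(h_t(S)) - E(h_t(Z))| \le \epsilon'. \]
P By 274Db,
\[
\begin{aligned}
|E(h_t(S)) - E(h_t(Z))| &= |E(h_t(S')) - E(h_t(Z))| \\
&= \Bigl|E\Bigl(h_t\Bigl(\sum_{i=0}^n X_i'\Bigr)\Bigr) - E\Bigl(h_t\Bigl(\sum_{i=0}^n \sigma_iZ_i\Bigr)\Bigr)\Bigr| \\
&\le \frac13\eta M_3\sum_{i=0}^n\sigma_i^2 + M_2\sum_{i=0}^n E(\psi_\eta(X_i)) + M_2\sum_{i=0}^n E(\psi_\eta(\sigma_iZ_i)) \\
&\le \frac13\eta M_3 + M_2\eta + M_2\eta \le \epsilon',
\end{aligned}
\]
by the choice of η. Q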
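(f) Now take any a ∈ R. We have
\[ \chi\,]{-\infty}, a - 2\epsilon] \le h_{a-\epsilon} \le \chi\,]{-\infty}, a] \le h_{a+\epsilon} \le \chi\,]{-\infty}, a + 2\epsilon]. \]
Note also that, for any b,
\[ \Phi(b + 2\epsilon) = \Phi(b) + \frac{1}{\sqrt{2\pi}}\int_b^{b+2\epsilon}e^{-x^2/2}\,dx \le \Phi(b) + \frac{2\epsilon}{\sqrt{2\pi}} = \Phi(b) + \epsilon - \epsilon'. \]
Consequently
\[
\begin{aligned}
\Phi(a) - \epsilon &\le \Phi(a - 2\epsilon) - \epsilon' = \Pr(Z \le a - 2\epsilon) - \epsilon' \\
&\le E(h_{a-\epsilon}(Z)) - \epsilon' \le E(h_{a-\epsilon}(S)) \le \Pr(S \le a) \\
&\le E(h_{a+\epsilon}(S)) \le E(h_{a+\epsilon}(Z)) + \epsilon' \le \Pr(Z \le a + 2\epsilon) + \epsilon' \\
&= \Phi(a + 2\epsilon) + \epsilon' \le \Phi(a) + \epsilon.
\end{aligned}
\]
But this means just that
\[ \Bigl|\Pr\Bigl(\sum_{i=0}^n X_i \le a\Bigr) - \Phi(a)\Bigr| \le \epsilon, \]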

as claimed.

274G Central Limit Theorem Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of random variables, all with zero expectation and finite variance; write $s_n = \sqrt{\sum_{i=0}^n \operatorname{Var}(X_i)}$ for each n. Suppose that
\[ \lim_{n\to\infty}\frac{1}{s_n^2}\sum_{i=0}^n E(\psi_{\delta s_n}(X_i)) = 0 \text{ for every } \delta > 0, \]
writing $\psi_\delta(x) = 0$ if |x| ≤ δ, $x^2$ if |x| > δ. Set
\[ S_n = \frac{1}{s_n}(X_0 + \ldots + X_n) \]
for each n ∈ N such that $s_n > 0$. Then
\[ \lim_{n\to\infty}\Pr(S_n \le a) = \Phi(a) \]
uniformly for a ∈ R.
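proof Given $\epsilon > 0$, take $\delta > 0$ as in Lindeberg's theorem (274F). Then for all $n$ large enough,
$$\frac{1}{s_n^2} \sum_{i=0}^n E(\psi_{\delta s_n}(X_i)) \le \delta.$$
Fix on any such $n$. Of course we have $s_n > 0$. Set
$$X_i' = \frac{1}{s_n} X_i \text{ for } i \le n;$$
then $X_0', \ldots, X_n'$ are independent, with zero expectation,
$$\sum_{i=0}^n \operatorname{Var}(X_i') = \sum_{i=0}^n \frac{1}{s_n^2} \operatorname{Var}(X_i) = 1,$$
$$\sum_{i=0}^n E(\psi_\delta(X_i')) = \sum_{i=0}^n \frac{1}{s_n^2} E(\psi_{\delta s_n}(X_i)) \le \delta.$$
By 274F,
$$|\Pr(S_n \le a) - \Phi(a)| = \Bigl|\Pr\bigl(\sum_{i=0}^n X_i' \le a\bigr) - \Phi(a)\Bigr| \le \epsilon$$
for every $a \in \mathbb{R}$. Since this is true for all $n$ large enough, we have the result.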
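
The hypothesis of 274G can be evaluated by hand in very simple cases. The following Python sketch is an illustration only, not part of the argument: it assumes that each $X_i$ takes the two values $\pm c_i$ with probability 1/2 each (so that $\operatorname{Var}(X_i) = c_i^2$ and $\psi_t(X_i)$ is either $c_i^2$ or 0), and the function name `lindeberg_ratio` is hypothetical.

```python
import math

def lindeberg_ratio(c, delta):
    """(1/s_n^2) * sum_i E(psi_{delta*s_n}(X_i)) when X_i = +c[i] or -c[i], each with probability 1/2."""
    s2 = sum(ci * ci for ci in c)        # s_n^2, the sum of the variances
    threshold = delta * math.sqrt(s2)    # delta * s_n
    # |X_i| = c[i] almost surely, so psi_threshold(X_i) is c[i]^2 if c[i] > threshold, else 0
    truncated = sum(ci * ci for ci in c if ci > threshold)
    return truncated / s2

bounded = [1.0] * 100                         # uniformly bounded summands
geometric = [2.0 ** i for i in range(100)]    # the last summand dominates the sum

print(lindeberg_ratio(bounded, 0.2))
print(lindeberg_ratio(geometric, 0.2))
```

For the bounded family the ratio vanishes as soon as $\delta s_n > 1$, so the condition of 274G holds. For $c_i = 2^i$ the last term alone contributes $c_n^2/s_n^2 = 3 \cdot 4^n/(4^{n+1} - 1) > 3/4$, so the ratio does not tend to 0 and 274G does not apply.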

274H Remarks (a) The condition
$$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=0}^n E(\psi_{\epsilon s_n}(X_i)) = 0 \text{ for every } \epsilon > 0$$
is called Lindeberg's condition, following Lindeberg 22.

(b) Lindeberg's condition is necessary as well as sufficient, in the following sense. Suppose that $\langle X_n \rangle_{n \in \mathbb{N}}$ is an independent sequence of real-valued random variables with zero expectation and finite variance; write $\sigma_n = \sqrt{\operatorname{Var}(X_n)}$, $s_n = \sqrt{\sum_{i=0}^n \operatorname{Var}(X_i)}$ for each $n$. Suppose that $\lim_{n \to \infty} s_n = \infty$, $\lim_{n \to \infty} \sigma_n/s_n = 0$ and that $\lim_{n \to \infty} \Pr(S_n \le a) = \Phi(a)$ for each $a \in \mathbb{R}$, where $S_n = \frac{1}{s_n}(X_0 + \ldots + X_n)$. Then
$$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=0}^n E(\psi_{\epsilon s_n}(X_i)) = 0$$
for every $\epsilon > 0$. (Feller 66, §XV.6, Theorem 3; Loève 77, §21.2.)

(c) The proof of 274F-274G here is adapted from Feller 66, §VIII.4. It has the virtue of being ‘elementary’, in that it does not involve characteristic functions. Of course this has to be paid for by a number of detailed estimations; and, what is much more serious, it leaves us without one of the most powerful techniques for describing distributions. The proof does offer a method of bounding
$$|\Pr(S_n \le a) - \Phi(a)|;$$
but it should be said that the bounds obtained are not useful ones, being grossly over-pessimistic, at least in the readily analysable cases. (For instance, a better bound, in many cases, is given by the Berry-Esséen theorem: if $\langle X_n \rangle_{n \in \mathbb{N}}$ is independent and identically distributed, with zero expectation, and the common values of $\sqrt{E(X_n^2)}$, $E(|X_n|^3)$ are $\sigma$, $\rho < \infty$, then
$$|\Pr(S_n \le a) - \Phi(a)| \le \frac{33\rho}{4\sigma^3\sqrt{n+1}};$$
see Feller 66, §XVI.5, Loève 77, §21.3, or Hall 82.) Furthermore, when $|a|$ is large, $\Phi(a)$ is exceedingly close to either 0 or 1, so that any uniform bound for $|\Pr(S \le a) - \Phi(a)|$ gives very little information; a great deal of work has been done on estimating the tails of such distributions more precisely, subject to special conditions. For instance, if $X_0, \ldots, X_n$ are independent random variables, of zero expectation, uniformly bounded with $|X_i| \le K$ almost everywhere for each $i$, $Y = X_0 + \ldots + X_n$, $s = \sqrt{\operatorname{Var}(Y)} > 0$, $S = \frac{1}{s} Y$, then for any $\alpha \in [0, s/K]$
$$\Pr(|S| \ge \alpha) \le 2\exp\Bigl(\frac{-\alpha^2}{2\bigl(1 + \frac{\alpha K}{2s}\bigr)^2}\Bigr) \bumpeq 2e^{-\alpha^2/2}$$
if $s \gg \alpha K$ (Rényi 70, §VII.4, Theorem 1).
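
To get a feeling for the size of the uniform error discussed in (c), one can compute it exactly in the simplest case. The sketch below is an illustration under stated assumptions: the $X_i$ are taken to be fair $\pm 1$ variables (so $\sigma = \rho = 1$), `N` stands for the number $n+1$ of summands, and the function names are hypothetical. Since the distribution function of $S_n$ is a step function, the supremum of $|\Pr(S_n \le a) - \Phi(a)|$ is found by comparing $\Phi$ with the left and right limits at each atom.

```python
import math

def phi(a):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def sup_distance(N):
    """Exact sup over a of |Pr(S <= a) - Phi(a)|, S = (sum of N fair +-1 variables) / sqrt(N)."""
    total = 2 ** N
    cumulative = 0
    worst = 0.0
    for k in range(N + 1):                  # k = number of +1 steps
        atom = (2 * k - N) / math.sqrt(N)   # value of S when k steps are +1
        left = cumulative / total           # Pr(S < atom)
        cumulative += math.comb(N, k)
        right = cumulative / total          # Pr(S <= atom)
        worst = max(worst, abs(left - phi(atom)), abs(right - phi(atom)))
    return worst

def berry_esseen_bound(N, sigma=1.0, rho=1.0):
    """The bound 33*rho / (4*sigma^3*sqrt(n+1)) quoted in 274Hc, with N = n + 1."""
    return 33.0 * rho / (4.0 * sigma ** 3 * math.sqrt(N))

for N in (100, 400, 1600):
    print(N, sup_distance(N), berry_esseen_bound(N))
```

In this example both quantities decay like $1/\sqrt{N}$, with the exact error well below the quoted bound.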


I now list some of the standard cases in which Lindeberg’s conditions are satisfied, so that we may apply the theorem.

274I Corollary Let $\langle X_n \rangle_{n \in \mathbb{N}}$ be an independent sequence of real-valued random variables, all with the same distribution, and suppose that their common expectation is 0 and their common variance is finite and not zero. Write $\sigma$ for the common value of $\sqrt{\operatorname{Var}(X_n)}$, and set
$$S_n = \frac{1}{\sigma\sqrt{n+1}}(X_0 + \ldots + X_n)$$
for each $n \in \mathbb{N}$. Then
$$\lim_{n \to \infty} \Pr(S_n \le a) = \Phi(a)$$
uniformly for $a \in \mathbb{R}$.

proof In the language of 274G-274H, we have $\sigma_n = \sigma$, $s_n = \sigma\sqrt{n+1}$, so the first two conditions are surely satisfied; moreover, if $\nu$ is the common distribution of the $X_n$, then
$$E(\psi_{\epsilon s_n}(X_n)) = \int_{\{x : |x| > \epsilon\sigma\sqrt{n+1}\}} x^2\,\nu(dx) \to 0$$
by Lebesgue's Dominated Convergence Theorem; so that
$$\frac{1}{s_n^2} \sum_{i=0}^n E(\psi_{\epsilon s_n}(X_n)) \to 0$$
by 273Ca. Thus Lindeberg's conditions are satisfied and 274G gives the result.

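274J Corollary Let $\langle X_n \rangle_{n \in \mathbb{N}}$ be an independent sequence of real-valued random variables with zero expectation, and suppose that $\{X_n^2 : n \in \mathbb{N}\}$ is uniformly integrable and that
$$\liminf_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n \operatorname{Var}(X_i) > 0.$$
Set
$$s_n = \sqrt{\sum_{i=0}^n \operatorname{Var}(X_i)}, \quad S_n = \frac{1}{s_n}(X_0 + \ldots + X_n)$$
for large $n \in \mathbb{N}$. Then
$$\lim_{n \to \infty} \Pr(S_n \le a) = \Phi(a)$$
uniformly for $a \in \mathbb{R}$.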
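proof The condition
$$\liminf_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n \operatorname{Var}(X_i) > 0$$
means that there are $c > 0$, $n_0 \in \mathbb{N}$ such that $s_n \ge c\sqrt{n+1}$ for every $n \ge n_0$. Let the underlying space be $(\Omega, \Sigma, \mu)$, and take $\epsilon, \eta > 0$. Writing $\psi_\delta(x) = 0$ for $|x| \le \delta$, $x^2$ for $|x| > \delta$, as in 274F-274G, we have
$$E(\psi_{\epsilon s_n}(X_i)) \le E(\psi_{c\epsilon\sqrt{n+1}}(X_i)) = \int_{F(i, c\epsilon\sqrt{n+1})} X_i^2\,d\mu$$
for $n \ge n_0$, $i \le n$, where $F(i, \gamma) = \{\omega : \omega \in \operatorname{dom} X_i,\ |X_i(\omega)| > \gamma\}$. Because $\{X_i^2 : i \in \mathbb{N}\}$ is uniformly integrable, there is a $\gamma \ge 0$ such that $\int_{F(i,\gamma)} X_i^2\,d\mu \le \eta c^2$ for every $i \in \mathbb{N}$ (246I). Let $n_1 \ge n_0$ be such that $c\epsilon\sqrt{n_1 + 1} \ge \gamma$; then for any $n \ge n_1$
$$\frac{1}{s_n^2} \sum_{i=0}^n E(\psi_{\epsilon s_n}(X_i)) \le \frac{1}{c^2(n+1)} \sum_{i=0}^n \eta c^2 = \eta.$$
As $\epsilon$, $\eta$ are arbitrary, the conditions of 274G are satisfied and the result follows.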

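274K Corollary Let $\langle X_n \rangle_{n \in \mathbb{N}}$ be an independent sequence of real-valued random variables with zero expectation, and suppose that
(i) there is some $\delta > 0$ such that $\sup_{n \in \mathbb{N}} E(|X_n|^{2+\delta}) < \infty$,
(ii) $\liminf_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^n \operatorname{Var}(X_i) > 0$.
Set $s_n = \sqrt{\sum_{i=0}^n \operatorname{Var}(X_i)}$ and
$$S_n = \frac{1}{s_n}(X_0 + \ldots + X_n)$$
for large $n \in \mathbb{N}$. Then
$$\lim_{n \to \infty} \Pr(S_n \le a) = \Phi(a)$$
uniformly for $a \in \mathbb{R}$.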
proof The point is that $\{X_n^2 : n \in \mathbb{N}\}$ is uniformly integrable. P Set $K = 1 + \sup_{n \in \mathbb{N}} E(|X_n|^{2+\delta})$. Given $\epsilon > 0$, set $M = (K/\epsilon)^{1/\delta}$. Then $(X_n^2 - M)^+ \le M^{-\delta}|X_n|^{2+\delta}$, so
$$E(X_n^2 - M)^+ \le KM^{-\delta} = \epsilon$$
for every $n \in \mathbb{N}$. As $\epsilon$ is arbitrary, $\{X_n^2 : n \in \mathbb{N}\}$ is uniformly integrable. Q
Accordingly the conditions of 274J are satisfied and we have the result.

274L Remarks (a) All the theorems of this section are devoted to finding conditions under which a random variable $S$ is ‘nearly’ standard normal, in the sense that $\Pr(S \le a) \bumpeq \Pr(Z \le a)$ uniformly for $a \in \mathbb{R}$, where $Z$ is some (or any) standard normal random variable. In all cases the random variable $S$ is normalized to have expectation 0 and variance 1, and is a sum of a large number of independent random variables. (In 274G and 274I-274K it is explicit that there must be many $X_i$, since they refer to a limit as $n \to \infty$. This is not said in so many words in the formulation I give of Lindeberg's theorem, but the proof makes it evident that $n\sqrt{\delta + \delta^2} \ge 1$, so surely $n$ will have to be large there also.)

(b) I cannot leave this section without remarking that the form of the definition of ‘nearly standard normal’ may lead your intuition astray if you try to apply it to other distributions. If we take $F$ to be the distribution function of $S$, so that $F(a) = \Pr(S \le a)$, I am saying that $S$ is ‘nearly standard normal’ if $\sup_{a \in \mathbb{R}} |F(a) - \Phi(a)|$ is small. It is natural to think of this as approximation in a metric, writing
$$\tilde\rho(\nu, \nu') = \sup_{a \in \mathbb{R}} |F_\nu(a) - F_{\nu'}(a)|$$
for distributions $\nu$, $\nu'$ on $\mathbb{R}$, where $F_\nu(a) = \nu\,]-\infty, a]$. In this form, the theorems above can be read as finding conditions under which $\lim_{n \to \infty} \tilde\rho(\nu_{S_n}, \mu_G) = 0$. But the point is that $\tilde\rho$ is not really the right metric to use. It works here because $\mu_G$ is atomless. But suppose, for instance, that $\nu$ is the distribution which gives mass 1 to the point 0 (I mean, that $\nu E = 1$ if $0 \in E \subseteq \mathbb{R}$, 0 if $0 \notin E \subseteq \mathbb{R}$), and that $\nu_n$ is the distribution of a normal random variable with expectation 0 and variance $\frac1n$, for each $n \ge 1$. Then $F_\nu(0) = 1$ and $F_{\nu_n}(0) = \frac12$, so $\tilde\rho(\nu_n, \nu) = \frac12$ for each $n \ge 1$. However, for most purposes one would regard the difference between $\nu_n$ and $\nu$ as small, and surely $\nu$ is the only distribution which one could reasonably call a limit of the $\nu_n$.

(c) The difficulties here present themselves in more than one form. A statistician would be unhappy with the
idea that the νn of the last paragraph were far from ν (and from each other), on the grounds that any measurement
involving random variables with these distributions must be subject to error, and small errors of measurement will
render them indistinguishable. A pure mathematician, looking forward to the possibility of generalizing these results,
will be unhappy with the emphasis given to the values of ν ]−∞, a], for which it may be difficult to find suitable
equivalents in more abstract spaces.

(d) These considerations join together to lead us to a rather different definition for a topology on the space $P$ of probability distributions on $\mathbb{R}$. For any bounded continuous function $h : \mathbb{R} \to \mathbb{R}$ we have a pseudometric $\rho_h : P \times P \to [0, \infty[$ defined by writing
$$\rho_h(\nu, \nu') = \Bigl|\int h\,d\nu - \int h\,d\nu'\Bigr|$$
for all $\nu, \nu' \in P$. The vague topology on $P$ is that generated by the pseudometrics $\rho_h$ (2A3F). I will not go into its properties in detail here (some are sketched in 274Ya-274Yd below; see also 285K-285L, 285S and 437J-437P in Volume 4). But I maintain that the right way to look at the results of this chapter is to say that (i) the distributions $\nu_S$ are close to $\mu_G$ for the vague topology (ii) the sets $\{\nu : \tilde\rho(\nu, \mu_G) < \epsilon\}$ are open for that topology, and that is why $\tilde\rho(\nu_S, \mu_G)$ is small.
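
The example in 274Lb can be checked numerically. The sketch below is an illustration only: it evaluates $\tilde\rho(\nu_n, \nu)$ on a finite grid of points (an assumption made for computability; the grid contains 0, where the supremum is attained), and it evaluates the pseudometric $\rho_h$ of 274Ld for the particular choice $h = \cos$, using the standard fact that $E(\cos(tZ)) = e^{-t^2/2}$ for a standard normal $Z$. The function names are hypothetical.

```python
import math

def phi(a):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def uniform_distance(n, grid):
    """max over the grid of |F_{nu_n}(a) - F_nu(a)|, nu_n = N(0, 1/n), nu = unit mass at 0."""
    worst = 0.0
    for a in grid:
        f_n = phi(a * math.sqrt(n))        # distribution function of N(0, 1/n)
        f = 1.0 if a >= 0 else 0.0         # distribution function of the point mass
        worst = max(worst, abs(f_n - f))
    return worst

def rho_cos(n):
    """rho_h(nu_n, nu) for h = cos: |E cos(Z / sqrt(n)) - cos(0)| = 1 - exp(-1/(2n))."""
    return 1.0 - math.exp(-1.0 / (2.0 * n))

grid = [k / 1000.0 for k in range(-3000, 3001)]
for n in (1, 100, 10000):
    print(n, uniform_distance(n, grid), rho_cos(n))
```

The first column of distances stays at $\frac12$ however large $n$ is, while the second tends to 0, in line with the remark that $\nu$ is the natural limit of the $\nu_n$.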

*274M I conclude with a simple pair of inequalities which are frequently useful when studying normal random
variables.
Lemma (a) $\int_x^\infty e^{-t^2/2}\,dt \le \frac{1}{x} e^{-x^2/2}$ for every $x > 0$.
(b) $\int_x^\infty e^{-t^2/2}\,dt \ge \frac{1}{2x} e^{-x^2/2}$ for every $x \ge 1$.

proof (a)
$$\int_x^\infty e^{-t^2/2}\,dt = \int_0^\infty e^{-(x+s)^2/2}\,ds \le e^{-x^2/2} \int_0^\infty e^{-xs}\,ds = \frac{1}{x} e^{-x^2/2}.$$

(b) Set
$$f(t) = e^{-t^2/2} - (1 - x(t - x))e^{-x^2/2}.$$
Then $f(x) = f'(x) = 0$ and $f''(t) = (t^2 - 1)e^{-t^2/2}$ is non-negative for $t \ge x$ (because $x \ge 1$). Accordingly $f(t) \ge 0$ for every $t \ge x$, and $\int_x^{x+1/x} f(t)\,dt \ge 0$. But this means just that
$$\int_x^\infty e^{-t^2/2}\,dt \ge \int_x^{x+\frac1x} e^{-t^2/2}\,dt \ge \int_x^{x+\frac1x} (1 - x(t - x))e^{-x^2/2}\,dt = \frac{1}{2x} e^{-x^2/2},$$
as required.
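
A quick numerical check of the two inequalities of 274M may be reassuring. This is a sketch only, assuming the identity $\int_x^\infty e^{-t^2/2}\,dt = \sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt2)$, which follows from the substitution $t = \sqrt2\,u$; the function names are hypothetical.

```python
import math

def gaussian_tail(x):
    """Integral of exp(-t^2/2) over [x, infinity), via the complementary error function."""
    return math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))

def tail_bounds(x):
    """Lower bound of 274M(b) (valid for x >= 1) and upper bound of 274M(a) (valid for x > 0)."""
    weight = math.exp(-x * x / 2.0)
    return weight / (2.0 * x), weight / x

for x in (1.0, 2.0, 4.0, 8.0):
    lower, upper = tail_bounds(x)
    print(x, lower, gaussian_tail(x), upper)
```

As $x$ grows the tail approaches the upper bound, so (a) is the sharper estimate for large $x$.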

274X Basic exercises > (a) Use 272U to give an alternative proof of 274B.

(b) Prove 274D when h′′ is M3 -Lipschitz but not necessarily differentiable.

(c) Let $\langle m_k \rangle_{k \in \mathbb{N}}$ be a strictly increasing sequence in $\mathbb{N}$ such that $m_0 = 0$ and $\lim_{k \to \infty} m_k/m_{k+1} = 0$. Let $\langle X_n \rangle_{n \in \mathbb{N}}$ be an independent sequence of random variables such that $\Pr(X_n = \sqrt{m_k}) = \Pr(X_n = -\sqrt{m_k}) = 1/2m_k$, $\Pr(X_n = 0) = 1 - 1/m_k$ whenever $m_{k-1} \le n < m_k$. Show that the Central Limit Theorem is not valid for $\langle X_n \rangle_{n \in \mathbb{N}}$. (Hint: setting $W_k = (X_0 + \ldots + X_{m_k - 1})/\sqrt{m_k}$, show that $\Pr(W_k \in G) \to 1$ for every open set $G$ including $\mathbb{Z}$.)

(d) Let $\langle X_n \rangle_{n \in \mathbb{N}}$ be any independent sequence of random variables all with the same distribution; suppose that they all have finite variance $\sigma^2 > 0$, and that their common expectation is $c$. Set $S_n = \frac{1}{\sqrt{n+1}}(X_0 + \ldots + X_n)$ for each $n$, and let $Y$ be a normal random variable with expectation $c$ and variance $\sigma^2$. Show that $\lim_{n \to \infty} \Pr(S_n \le a) = \Pr(Y \le a)$ uniformly for $a \in \mathbb{R}$.

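> (e) Show that for any $a \in \mathbb{R}$,
$$\lim_{n \to \infty} \frac{1}{2^n} \sum_{r=0}^{\lfloor \frac{n}{2} + a\frac{\sqrt{n}}{2} \rfloor} \frac{n!}{r!(n-r)!} = \lim_{n \to \infty} \frac{1}{2^n} \#\bigl(\{I : I \subseteq n,\ \#(I) \le \tfrac{n}{2} + a\tfrac{\sqrt{n}}{2}\}\bigr) = \Phi(a).$$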

(f ) Show that 274I is a special case of 274J.

(g) Let $\langle X_n \rangle_{n \in \mathbb{N}}$ be an independent sequence of real-valued random variables with zero expectation. Set $s_n = \sqrt{\sum_{i=0}^n \operatorname{Var}(X_i)}$ and
$$S_n = \frac{1}{s_n}(X_0 + \ldots + X_n)$$
for each $n \in \mathbb{N}$. Suppose that there is some $\delta > 0$ such that
$$\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=0}^n E(|X_i|^{2+\delta}) = 0.$$
Show that $\lim_{n \to \infty} \Pr(S_n \le a) = \Phi(a)$ uniformly for $a \in \mathbb{R}$. (This is a form of Liapounoff's central limit theorem; see Liapounoff 1901.)

(h) Let $P$ be the set of Radon probability measures on $\mathbb{R}$. Let $\nu_0 \in P$, $a \in \mathbb{R}$. Show that the map $\nu \mapsto \nu\,]-\infty, a] : P \to [0, 1]$ is continuous at $\nu_0$ for the vague topology on $P$ iff $\nu_0\{a\} = 0$.

(i) Suppose that $f : \mathbb{R} \to \mathbb{R}$ is absolutely continuous on every closed bounded interval, and that $\int_{-\infty}^\infty |f'(x)| e^{-ax^2}\,dx < \infty$ for every $a > 0$. Let $X$ be a normal random variable with zero expectation. Show that $E(Xf(X))$ and $E(X^2)E(f'(X))$ are defined and equal.

(j) (Steele 86) Suppose that $X_0, \ldots, X_n, Y_0, \ldots, Y_n$ are independent random variables such that, for each $i \le n$, $X_i$ and $Y_i$ have the same distribution. Let $h : \mathbb{R}^{n+1} \to \mathbb{R}$ be a Borel measurable function, and set $Z = h(X_0, \ldots, X_n)$, $Z_i = h(X_0, \ldots, X_{i-1}, Y_i, X_{i+1}, \ldots, X_n)$ for each $i$ (with $Z_0 = h(Y_0, X_1, \ldots, X_n)$ and $Z_n = h(X_0, \ldots, X_{n-1}, Y_n)$, of course). Suppose that $Z$ has finite expectation. Show that $\operatorname{Var}(Z) \le \frac12 \sum_{i=0}^n E(Z_i - Z)^2$.

(k) Let $\langle X_n \rangle_{n \in \mathbb{N}}$ be an independent identically distributed sequence of random variables with non-zero finite variance. Let $\langle t_n \rangle_{n \in \mathbb{N}}$ be a sequence in $\mathbb{R}$ such that $\sum_{n=0}^\infty t_n^2 = \infty$. Show that $\sum_{n=0}^\infty t_n X_n$ is undefined or infinite a.e. (Hint: First deal with the case in which $\langle t_n \rangle_{n \in \mathbb{N}}$ does not converge to 0. Otherwise, use 274G to show that, for any $n \in \mathbb{N}$, $\lim_{m \to \infty} \Pr(|\sum_{i=n}^m t_i X_i| \ge 1) \ge \frac12$.)

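(l) Let $\langle X_n \rangle_{n \in \mathbb{N}}$ be an independent sequence of real-valued random variables with zero expectation. Suppose that $M \ge 0$ is such that $|X_n| \le M$ a.e. for every $n$, and that $\sum_{n=0}^\infty \operatorname{Var}(X_n) = \infty$. Set $s_n = \sqrt{\sum_{i=0}^n \operatorname{Var}(X_i)}$ for each $n$, and $S_n = \frac{1}{s_n} \sum_{i=0}^n X_i$ when $s_n > 0$. Show that $\lim_{n \to \infty} \Pr(S_n \le a) = \Phi(a)$ for every $a \in \mathbb{R}$.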

274Y Further exercises (a) Write $P$ for the set of Radon probability measures on $\mathbb{R}$. For $\nu, \nu' \in P$ set
$$\rho(\nu, \nu') = \inf\{\epsilon : \epsilon \ge 0,\ \nu\,]-\infty, a - \epsilon] - \epsilon \le \nu'\,]-\infty, a] \le \nu\,]-\infty, a + \epsilon] + \epsilon \text{ for every } a \in \mathbb{R}\}.$$
Show that $\rho$ is a metric on $P$ and that it defines the vague topology on $P$. ($\rho$ is called Lévy's metric.)

(b) Write $P$ for the set of Radon probability measures on $\mathbb{R}$, and let $\tilde\rho$ be the metric on $P$ defined in 274Lb. Show that if $\nu \in P$ is atomless and $\epsilon > 0$, then $\{\nu' : \nu' \in P,\ \tilde\rho(\nu', \nu) < \epsilon\}$ is open for the vague topology on $P$.

(c) Let $\langle S_n \rangle_{n \in \mathbb{N}}$ be a sequence of real-valued random variables, and $Z$ a standard normal random variable. Show that the following are equiveridical:
(i) $\mu_G = \lim_{n \to \infty} \nu_{S_n}$ for the vague topology, writing $\nu_{S_n}$ for the distribution of $S_n$;
(ii) $E(h(Z)) = \lim_{n \to \infty} E(h(S_n))$ for every bounded continuous function $h : \mathbb{R} \to \mathbb{R}$;
(iii) $E(h(Z)) = \lim_{n \to \infty} E(h(S_n))$ for every bounded function $h : \mathbb{R} \to \mathbb{R}$ such that (α) $h$ has continuous derivatives of all orders (β) $\{x : h(x) \ne 0\}$ is bounded;
(iv) $\lim_{n \to \infty} \Pr(S_n \le a) = \Phi(a)$ for every $a \in \mathbb{R}$;
(v) $\lim_{n \to \infty} \Pr(S_n \le a) = \Phi(a)$ uniformly for $a \in \mathbb{R}$;
(vi) $\{a : \lim_{n \to \infty} \Pr(S_n \le a) = \Phi(a)\}$ is dense in $\mathbb{R}$.
(See also 285L.)

(d) Let (Ω, Σ, µ) be a probability space, and P the set of Radon probability measures on R. Show that X 7→ νX :
L0 (µ) → P is continuous for the topology of convergence in measure on L0 (µ) and the vague topology on P .

(e) Let $\langle X_n \rangle_{n \in \mathbb{N}}$ be an independent sequence of real-valued random variables. Suppose that there is an $M \ge 0$ such that $|X_n| \le M$ a.e. for every $n \in \mathbb{N}$, and that $\sum_{n=0}^\infty X_n$ is defined, as a real number, almost everywhere. Show that $\sum_{n=0}^\infty \operatorname{Var}(X_n) < \infty$.

274 Notes and comments For more than two hundred years the Central Limit Theorem has been one of the glories
of mathematics, and no branch of mathematics or science would be the same without it. I suppose it is the most
important single theorem of probability theory; and I observe that the proof hardly uses measure theory. To be sure,
I have clothed the arguments above in the language of measure and integration. But if you look at their essence, the
vital elements of the proof are
(i) a linear combination of independent normal random variables is normal (274Ae, 274B);
(ii) if U , V , W are independent random variables, and h is a bounded continuous function, then
|E(h(U, V, W ))| ≤ supt∈R |E(h(U, V, t))| (274C);
(iii) if (X0 , . . . , Xn ) are independent random variables, then we can find independent random variables
(X0′ , . . . , Xn′ , Z0 , . . . , Zn ) such that Zj is standard normal and Xj′ has the same distribution as Xj , for each
j (274F).
The rest of the argument consists of elementary calculus, careful estimations and a few of the most fundamental
properties of expectations and independence. Now (ii) and (iii) are justified above by appeals to Fubini’s theorem,
but surely they belong to the list of probabilistic intuitions which take priority over the identification of probabilities
with countably additive functionals. If they had given any insuperable difficulty it would have been a telling argument
against the model of probability we were using, but would not have affected the Central Limit Theorem. In fact (i)
seems to be the place where we really need a mathematical model of the concept of ‘distribution’, and all the relevant
calculations can be done in terms of the Riemann integral on the plane, with no mention of countable additivity. So
while I am happy and proud to have written out a version of these beautiful ideas, I have to admit that they are in no
essential way dependent on the rest of this treatise.
In §285 I will describe a quite different approach to the theorem, using much more sophisticated machinery; but
it will again be the case, perhaps more thoroughly hidden, that the relevance of measure theory will not be to the
theorem itself, but to our imagination of what an arbitrary distribution is. For here I do have a claim to make for
my subject. The characterization of distribution functions as arbitrary monotonic functions, continuous on the right,
and with the right limits at ±∞ (271Xb), together with the analysis of monotonic functions in §226, gives us a chance
of forming a mental picture of the proper class of objects to which such results as the Central Limit Theorem can be
applied.
Theorem 274F is a trifling modification of Theorem 3 of Lindeberg 22. Like the original, it emphasizes what
I believe to be vital to all the limit theorems of this chapter: they are best founded on a proper understanding of
finite sequences of random variables. Lindeberg’s condition was the culmination of a long search for the most general
conditions under which the Central Limit Theorem would be valid. I offer a version of Laplace’s theorem (274Xe) as
the starting place, and Liapounoff’s condition (274Xg) as an example of one of the intermediate stages. Naturally the
corollaries 274I, 274J, 274K and 274Xd are those one seeks to apply by choice. There is an intriguing, but as far as I
know purely coincidental, parallel between 273H/274K and 273I/274Xd. As an example of an independent sequence
$\langle X_n \rangle_{n \in \mathbb{N}}$ of random variables, all with expectation zero and variance 1, to which the Central Limit Theorem does not
apply, I offer 274Xc.

275 Martingales
This chapter so far has been dominated by independent sequences of random variables. I now turn to another of the
remarkable concepts to which probabilistic intuitions have led us. Here we study evolving systems, in which we gain
progressively more information as time progresses. I give the basic theorems on pointwise convergence of martingales
(275F-275H, 275K) and a very brief account of ‘stopping times’ (275L-275P).

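275A Definition Let $(\Omega, \Sigma, \mu)$ be a probability space with completion $(\Omega, \hat\Sigma, \hat\mu)$, and $\langle \Sigma_n \rangle_{n \in \mathbb{N}}$ a non-decreasing sequence of σ-subalgebras of $\hat\Sigma$. (Such sequences $\langle \Sigma_n \rangle_{n \in \mathbb{N}}$ are called filtrations.) A martingale adapted to $\langle \Sigma_n \rangle_{n \in \mathbb{N}}$ is a sequence $\langle X_n \rangle_{n \in \mathbb{N}}$ of integrable real-valued random variables on $\Omega$ such that (i) $\operatorname{dom} X_n \in \Sigma_n$ and $X_n$ is $\Sigma_n$-measurable for each $n \in \mathbb{N}$ (ii) whenever $m \le n \in \mathbb{N}$ and $E \in \Sigma_m$ then $\int_E X_n = \int_E X_m$.
Note that for (ii) it is enough if $\int_E X_{n+1} = \int_E X_n$ whenever $n \in \mathbb{N}$ and $E \in \Sigma_n$.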

275B Examples We have seen many contexts in which such sequences appear naturally; here are a few.
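(a) Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle \Sigma_n \rangle_{n \in \mathbb{N}}$ a non-decreasing sequence of σ-subalgebras of $\Sigma$. Let $X$ be any real-valued random variable on $\Omega$ with finite expectation, and for each $n \in \mathbb{N}$ let $X_n$ be a conditional expectation of $X$ on $\Sigma_n$, as in §233. Subject to the conditions that $\operatorname{dom} X_n \in \Sigma_n$ and $X_n$ is actually $\Sigma_n$-measurable for each $n$ (a purely technical point; see 232He), $\langle X_n \rangle_{n \in \mathbb{N}}$ will be a martingale adapted to $\langle \Sigma_n \rangle_{n \in \mathbb{N}}$, because $\int_E X_{n+1} = \int_E X = \int_E X_n$ whenever $E \in \Sigma_n$.

(b) Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle X_n \rangle_{n \in \mathbb{N}}$ an independent sequence of random variables all with zero expectation. For each $n \in \mathbb{N}$ let $\tilde\Sigma_n$ be the σ-algebra generated by $\bigcup_{i \le n} \Sigma_{X_i}$, writing $\Sigma_{X_i}$ for the σ-algebra defined by $X_i$ (272C), and set $S_n = X_0 + \ldots + X_n$. Then $\langle S_n \rangle_{n \in \mathbb{N}}$ is a martingale adapted to $\langle \tilde\Sigma_n \rangle_{n \in \mathbb{N}}$. (Use 272K to see that $\Sigma_{X_{n+1}}$ is independent of $\tilde\Sigma_n$, so that $\int_E X_{n+1} = \int X_{n+1} \times \chi E = 0$ for every $E \in \tilde\Sigma_n$, by 272R.)

(c) Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle X_n \rangle_{n \in \mathbb{N}}$ an independent sequence of random variables all with expectation 1. For each $n \in \mathbb{N}$ let $\tilde\Sigma_n$ be the σ-algebra generated by $\bigcup_{i \le n} \Sigma_{X_i}$, writing $\Sigma_{X_i}$ for the σ-algebra defined by $X_i$, and set $W_n = X_0 \times \ldots \times X_n$. Then $\langle W_n \rangle_{n \in \mathbb{N}}$ is a martingale adapted to $\langle \tilde\Sigma_n \rangle_{n \in \mathbb{N}}$.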

275C Remarks (a) It seems appropriate to the concept of a random variable X being ‘adapted’ to a σ-algebra
Σ to require that dom X ∈ Σ and that X should be Σ-measurable, even though this may mean that other random
variables, equal almost everywhere to X, may fail to be ‘adapted’ to Σ.

(b) Technical problems of this kind evaporate, of course, if all µ-negligible subsets of $\Omega$ belong to $\Sigma_0$. But examples such as 275Bb make it seem unreasonable to insist on such a simplification as a general rule.

(c) The concept of ‘martingale’ can readily be extended to other index sets than $\mathbb{N}$; indeed, if $I$ is any partially ordered set, we can say that $\langle X_i \rangle_{i \in I}$ is a martingale on $(\Omega, \Sigma, \mu)$ adapted to $\langle \Sigma_i \rangle_{i \in I}$ if (i) each $\Sigma_i$ is a σ-subalgebra of $\hat\Sigma$ (ii) each $X_i$ is an integrable real-valued $\Sigma_i$-measurable random variable such that $\operatorname{dom} X_i \in \Sigma_i$ (iii) whenever $i \le j$ in $I$, then $\Sigma_i \subseteq \Sigma_j$ and $\int_E X_i = \int_E X_j$ for every $E \in \Sigma_i$. The principal case, after $I = \mathbb{N}$, is $I = [0, \infty[$; $I = \mathbb{Z}$ also is interesting, and I think it is fair to say that the most important ideas can already be expressed in theorems about martingales indexed by finite sets $I$. But in this volume I will generally take martingales to be indexed by $\mathbb{N}$.

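(d) Given just a sequence $\langle X_n \rangle_{n \in \mathbb{N}}$ of integrable real-valued random variables on a probability space $(\Omega, \Sigma, \mu)$, we can say simply that $\langle X_n \rangle_{n \in \mathbb{N}}$ is a martingale on $(\Omega, \Sigma, \mu)$ if there is some non-decreasing sequence $\langle \Sigma_n \rangle_{n \in \mathbb{N}}$ of σ-subalgebras of $\hat\Sigma$ (the completion of $\Sigma$) such that $\langle X_n \rangle_{n \in \mathbb{N}}$ is a martingale adapted to $\langle \Sigma_n \rangle_{n \in \mathbb{N}}$. If we write $\tilde\Sigma_n$ for the σ-algebra generated by $\bigcup_{i \le n} \Sigma_{X_i}$, where $\Sigma_{X_i}$ is the σ-algebra defined by $X_i$, as in 275Bb, then it is easy to see that $\langle X_n \rangle_{n \in \mathbb{N}}$ is a martingale iff it is a martingale adapted to $\langle \tilde\Sigma_n \rangle_{n \in \mathbb{N}}$.

(e) Continuing from (d), it is also easy to see that if $\langle X_n \rangle_{n \in \mathbb{N}}$ is a martingale on $(\Omega, \Sigma, \mu)$, and $X_n' =_{\text{a.e.}} X_n$ for every $n$, then $\langle X_n' \rangle_{n \in \mathbb{N}}$ is a martingale on $(\Omega, \Sigma, \mu)$. (The point is that if $\langle X_n \rangle_{n \in \mathbb{N}}$ is adapted to $\langle \Sigma_n \rangle_{n \in \mathbb{N}}$, then both $\langle X_n \rangle_{n \in \mathbb{N}}$ and $\langle X_n' \rangle_{n \in \mathbb{N}}$ are adapted to $\langle \hat\Sigma_n \rangle_{n \in \mathbb{N}}$, where
$$\hat\Sigma_n = \{E \triangle F : E \in \Sigma_n,\ F \text{ is negligible}\}.)$$
Consequently we have a concept of ‘martingale’ as a sequence in $L^1(\mu)$, saying that a sequence $\langle X_n^\bullet \rangle_{n \in \mathbb{N}}$ in $L^1(\mu)$ is a martingale iff $\langle X_n \rangle_{n \in \mathbb{N}}$ is a martingale.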
Nevertheless, I think that the concept of ‘martingale adapted to a sequence of σ-algebras’ is the primary one, since
in all the principal applications the σ-algebras reflect some essential aspect of the problem, which may not be fully
encompassed by the random variables alone.

(f ) The word ‘martingale’ originally (in English; the history in French is more complex) referred to a strap used to
prevent a horse from throwing its head back. Later it was used as the name of a gambling system in which the gambler
doubles his stake each time he loses, and (in French) as a general term for gambling systems. These may be regarded
as a class of ‘stopped-time martingales’, as described in 275L-275P below.

275D A large part of the theory of martingales consists of inequalities of various kinds. I give two of the most
important, both due to J.L.Doob. (See also 276Xa-276Xb.)
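Lemma Let $(\Omega, \Sigma, \mu)$ be a probability space, and $\langle X_n \rangle_{n \in \mathbb{N}}$ a martingale on $\Omega$. Fix $n \in \mathbb{N}$ and set $X^* = \max(X_0, \ldots, X_n)$. Then for any $\epsilon > 0$,
$$\Pr(X^* \ge \epsilon) \le \frac{1}{\epsilon} E(X_n^+),$$
writing $X_n^+ = \max(0, X_n)$.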


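proof Write $\hat\mu$ for the completion of $\mu$, and $\hat\Sigma$ for its domain. Let $\langle \Sigma_n \rangle_{n \in \mathbb{N}}$ be a non-decreasing sequence of σ-subalgebras of $\hat\Sigma$ to which $\langle X_n \rangle_{n \in \mathbb{N}}$ is adapted. For each $i \le n$ set
$$E_i = \{\omega : \omega \in \operatorname{dom} X_i,\ X_i(\omega) \ge \epsilon\},$$
$$F_i = E_i \setminus \bigcup_{j < i} E_j.$$
Then $F_0, \ldots, F_n$ are disjoint and $F = \bigcup_{i \le n} F_i = \bigcup_{i \le n} E_i$; moreover, writing $H$ for the conegligible set $\bigcap_{i \le n} \operatorname{dom} X_i$,
$$\{\omega : X^*(\omega) \ge \epsilon\} = F \cap H,$$
so that
$$\Pr(X^* \ge \epsilon) = \hat\mu\{\omega : X^*(\omega) \ge \epsilon\} = \hat\mu F = \sum_{i=0}^n \hat\mu F_i.$$
On the other hand, $E_i$ and $F_i$ belong to $\Sigma_i$ for each $i \le n$, so
$$\int_{F_i} X_n = \int_{F_i} X_i \ge \epsilon \hat\mu F_i$$
for every $i$, and
$$\epsilon \hat\mu F = \epsilon \sum_{i=0}^n \hat\mu F_i \le \sum_{i=0}^n \int_{F_i} X_n = \int_F X_n \le \int_F X_n^+ \le E(X_n^+),$$
as required.
Remark Note that in fact we have $\epsilon \hat\mu F \le \int_F X_n$, where $F = \{\omega : X^*(\omega) \ge \epsilon\}$; this is of great importance in many applications.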
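
For a finite example the inequality of 275D can be verified exactly by enumeration. The sketch below is an illustration under stated assumptions: it takes $X_k$ to be the sum of the first $k+1$ steps of a fair $\pm 1$ coin-tossing game (a martingale of the kind described in 275Bb), lists all $2^{n+1}$ equally likely paths, and uses exact rational arithmetic. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def maximal_inequality(n, eps):
    """Return (Pr(X* >= eps), E(X_n^+)/eps) for X_k = sum of the first k+1 fair +-1 steps."""
    steps = n + 1
    weight = Fraction(1, 2 ** steps)        # probability of each individual path
    prob = Fraction(0)
    expect_pos = Fraction(0)
    for signs in product((-1, 1), repeat=steps):
        path, total = [], 0
        for s in signs:
            total += s
            path.append(total)              # path[k] is X_k on this sample point
        if max(path) >= eps:                # X* = max(X_0, ..., X_n)
            prob += weight
        expect_pos += weight * max(0, path[-1])
    return prob, expect_pos / eps

prob, bound = maximal_inequality(9, 2)
print(prob, bound, prob <= bound)
```

The enumeration is exponential in `n`, so this is only practical for short paths; its purpose is to make the two sides of the inequality concrete.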

275E Up-crossings The next lemma depends on the notion of ‘up-crossing’. Let x0 , . . . , xn be any list of real
numbers, and a < b in R. The number of up-crossings from a to b in the list x0 , . . . , xn is the number of pairs
(j, k) such that 0 ≤ j < k ≤ n, xj ≤ a, xk ≥ b and a < xi < b for j < i < k. Note that this is also the largest m such
that sm < ∞, if we write
r1 = inf{i : i ≤ n, xi ≤ a},

s1 = inf{i : r1 < i ≤ n, xi ≥ b},

r2 = inf{i : s1 < i ≤ n, xi ≤ a},

s2 = inf{i : r2 < i ≤ n, xi ≥ b}
and so on, taking inf ∅ = ∞.
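
The two descriptions of the number of up-crossings in 275E translate directly into code. The following sketch is an illustration only (the function names are hypothetical): one function counts the pairs $(j, k)$ of the first description, the other follows the indices $r_1 < s_1 < r_2 < \ldots$ of the second, using `math.inf` for $\inf \emptyset$.

```python
import math

def upcrossings_by_pairs(xs, a, b):
    """Number of pairs (j, k), j < k, with xs[j] <= a, xs[k] >= b and a < xs[i] < b for j < i < k."""
    n = len(xs)
    return sum(
        1
        for j in range(n)
        for k in range(j + 1, n)
        if xs[j] <= a and xs[k] >= b and all(a < xs[i] < b for i in range(j + 1, k))
    )

def upcrossings_by_times(xs, a, b):
    """Largest m with s_m finite, where r_m and s_m are defined as in 275E."""
    m, start = 0, 0
    while True:
        # r: first index from `start` on with value <= a
        r = next((i for i in range(start, len(xs)) if xs[i] <= a), math.inf)
        if r == math.inf:
            return m
        # s: first index after r with value >= b
        s = next((i for i in range(r + 1, len(xs)) if xs[i] >= b), math.inf)
        if s == math.inf:
            return m
        m += 1
        start = s + 1

xs = [0, 3, 1, -1, 2, 5, 0, 4]
print(upcrossings_by_pairs(xs, 0, 3), upcrossings_by_times(xs, 0, 3))
```

The second function runs in time linear in the length of the list, whereas the first is cubic; the first is included only because it mirrors the definition word for word.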

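275F Lemma Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle X_n \rangle_{n \in \mathbb{N}}$ a martingale on $\Omega$. Suppose that $n \in \mathbb{N}$ and that $a < b$ in $\mathbb{R}$. For each $\omega \in \bigcap_{i \le n} \operatorname{dom} X_i$, let $U(\omega)$ be the number of up-crossings from $a$ to $b$ in the list $X_0(\omega), \ldots, X_n(\omega)$. Then
$$E(U) \le \frac{1}{b - a} E((X_n - X_0)^+),$$
writing $(X_n - X_0)^+(\omega) = \max(0, X_n(\omega) - X_0(\omega))$ for $\omega \in \operatorname{dom} X_n \cap \operatorname{dom} X_0$.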


proof Each individual step in the proof is ‘elementary’, but the structure as a whole is non-trivial.
(a) The following fact will be useful. Suppose that $x_0, \ldots, x_n$ are real numbers; let $u$ be the number of up-crossings from $a$ to $b$ in the list $x_0, \ldots, x_n$. Set $y_i = \max(x_i, a)$ for each $i$; then $u$ is also the number of up-crossings from $a$ to $b$ in the list $y_0, \ldots, y_n$. For each $k \le n$, set $c_k = 1$ if there is a $j \le k$ such that $x_j \le a$ and $x_i < b$ for $j \le i \le k$, 0 otherwise. Then
$$(b - a)u \le \sum_{k=0}^{n-1} c_k(y_{k+1} - y_k).$$
P I induce on $m$ to show that (defining $r_m$, $s_m$ as in 275E)
$$(b - a)m \le \sum_{k=0}^{s_m - 1} c_k(y_{k+1} - y_k)$$
whenever $m \le u$. For $m = 0$ (taking $s_0 = -1$) we have $0 = 0$. For the inductive step to $m \ge 1$, we have $s_{m-1} < r_m < s_m \le n$ (because I am supposing that $m \le u$), and $c_k = 0$ if $s_{m-1} \le k < r_m$, $c_k = 1$ if $r_m \le k < s_m$. So
$$\begin{aligned}
\sum_{k=0}^{s_m - 1} c_k(y_{k+1} - y_k) &= \sum_{k=0}^{s_{m-1} - 1} c_k(y_{k+1} - y_k) + \sum_{k=r_m}^{s_m - 1} (y_{k+1} - y_k) \\
&\ge (b - a)(m - 1) + y_{s_m} - y_{r_m} \\
&\qquad \text{(by the inductive hypothesis)} \\
&\ge (b - a)m
\end{aligned}$$
(because $y_{s_m} \ge b$, $y_{r_m} = a$), and the induction proceeds.
Accordingly
$$\sum_{k=0}^{s_u - 1} c_k(y_{k+1} - y_k) \ge (b - a)u.$$
As for the sum $\sum_{k=s_u}^{n-1} c_k(y_{k+1} - y_k)$, we have $c_k = 0$ for $s_u \le k < r_{u+1}$, $c_k = 1$ for $r_{u+1} \le k < s_{u+1}$, while $s_{u+1} > n$, so if $n \le r_{u+1}$ we have
$$\sum_{k=0}^{n-1} c_k(y_{k+1} - y_k) = \sum_{k=0}^{s_u - 1} c_k(y_{k+1} - y_k) \ge (b - a)u,$$
while if $n > r_{u+1}$ we have
$$\begin{aligned}
\sum_{k=0}^{n-1} c_k(y_{k+1} - y_k) &= \sum_{k=0}^{s_u - 1} c_k(y_{k+1} - y_k) + \sum_{k=r_{u+1}}^{n-1} (y_{k+1} - y_k) \\
&\ge (b - a)u + y_n - y_{r_{u+1}} \\
&\ge (b - a)u
\end{aligned}$$
because $y_n \ge a = y_{r_{u+1}}$. Thus in both cases we have the required result. Q
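(b)(i) Now define
$$Y_k(\omega) = \max(a, X_k(\omega)) \text{ for } \omega \in \operatorname{dom} X_k,$$
$$F_k = \{\omega : \omega \in \bigcap_{i \le k} \operatorname{dom} X_i,\ \exists\, j \le k,\ X_j(\omega) \le a,\ X_i(\omega) < b \text{ if } j \le i \le k\}$$
for each $k \in \mathbb{N}$. If $\langle \Sigma_n \rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence of σ-algebras to which $\langle X_n \rangle_{n \in \mathbb{N}}$ is adapted, then $F_k \in \Sigma_k$ (because if $j \le k$ all the sets $\operatorname{dom} X_j$, $\{\omega : X_j(\omega) \le a\}$, $\{\omega : X_j(\omega) < b\}$ belong to $\Sigma_j \subseteq \Sigma_k$).

(ii) We find that $\int_F Y_k \le \int_F Y_{k+1}$ if $F \in \Sigma_k$. P Set $G = \{\omega : X_k(\omega) > a\} \in \Sigma_k$. Then
$$\begin{aligned}
\int_F Y_k &= \int_{F \cap G} X_k + a\hat\mu(F \setminus G) \\
&= \int_{F \cap G} X_{k+1} + a\hat\mu(F \setminus G) \\
&\le \int_{F \cap G} Y_{k+1} + \int_{F \setminus G} Y_{k+1} = \int_F Y_{k+1}. \text{ Q}
\end{aligned}$$

(iii) Consequently $\int_F Y_{k+1} - Y_k \le \int Y_{k+1} - Y_k$ for every $F \in \Sigma_k$. P
$$\int (Y_{k+1} - Y_k) - \int_F (Y_{k+1} - Y_k) = \int_{\Omega \setminus F} Y_{k+1} - \int_{\Omega \setminus F} Y_k \ge 0. \text{ Q}$$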
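(c) Let $H$ be the conegligible set $\operatorname{dom} U = \bigcap_{i \le n} \operatorname{dom} X_i \in \Sigma_n$. We ought to check at some point that $U$ is $\Sigma_n$-measurable; but this is clearly true, because all the relevant sets $\{\omega : X_i(\omega) \le a\}$, $\{\omega : X_i(\omega) \ge b\}$ belong to $\Sigma_n$. For each $\omega \in H$, apply (a) to the list $X_0(\omega), \ldots, X_n(\omega)$ to see that
$$(b - a)U(\omega) \le \sum_{k=0}^{n-1} \chi F_k(\omega)(Y_{k+1}(\omega) - Y_k(\omega)).$$
Because $H$ is conegligible, it follows that
$$\begin{aligned}
(b - a)E(U) &\le \sum_{k=0}^{n-1} \int_{F_k} Y_{k+1} - Y_k \le \sum_{k=0}^{n-1} \int Y_{k+1} - Y_k \\
&\qquad \text{(using (b-iii))} \\
&= E(Y_n - Y_0) \le E((X_n - X_0)^+)
\end{aligned}$$
because $Y_n - Y_0 \le (X_n - X_0)^+$ everywhere on $\operatorname{dom} X_n \cap \operatorname{dom} X_0$. This completes the proof.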
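
The purely combinatorial inequality of part (a) of the proof, $(b - a)u \le \sum_{k=0}^{n-1} c_k(y_{k+1} - y_k)$, can be tested on explicit lists. The sketch below is an illustration only, with a hypothetical function name. It relies on two observations that follow from the definitions: $c_k = 1$ if $x_k \le a$, $c_k = 0$ if $x_k \ge b$, and $c_k = c_{k-1}$ otherwise; and an up-crossing ends at index $k+1$ exactly when $c_k = 1$ and $x_{k+1} \ge b$.

```python
def upcrossing_bound(xs, a, b):
    """Return (u, total): the up-crossing count u and sum_k c_k * (y_{k+1} - y_k)."""
    ys = [max(x, a) for x in xs]        # y_i = max(x_i, a)
    u, total = 0, 0
    armed = False                       # value of c_k at the current index k
    for k in range(len(xs) - 1):
        if xs[k] <= a:
            armed = True                # a fresh j = k with x_j <= a
        elif xs[k] >= b:
            armed = False               # the run of values below b is broken
        if armed:
            total += ys[k + 1] - ys[k]
            if xs[k + 1] >= b:
                u += 1                  # an up-crossing is completed at k + 1
    return u, total

xs = [0, 3, 1, -1, 2, 5, 0, 4]
u, total = upcrossing_bound(xs, 0, 3)
print(u, total, (3 - 0) * u <= total)
```

In gambling terms, `total` is the profit of the strategy that holds one unit of $y$ whenever $c_k = 1$, and each completed up-crossing contributes at least $b - a$ to it.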

275G We are now ready for the principal theorems of this section.

Doob's Martingale Convergence Theorem Let $\langle X_n \rangle_{n \in \mathbb{N}}$ be a martingale on a probability space $(\Omega, \Sigma, \mu)$, and suppose that $\sup_{n \in \mathbb{N}} E(|X_n|) < \infty$. Then $\lim_{n \to \infty} X_n(\omega)$ is defined in $\mathbb{R}$ for almost every $\omega$ in $\Omega$.
proof (a) Set $H = \bigcap_{n \in \mathbb{N}} \operatorname{dom} X_n$, and for $\omega \in H$ set $Y(\omega) = \liminf_{n \to \infty} X_n(\omega)$, $Z(\omega) = \limsup_{n \to \infty} X_n(\omega)$, allowing $\pm\infty$ in both cases. But note that $Y \le \liminf_{n \to \infty} |X_n|$, so by Fatou's Lemma $Y(\omega) < \infty$ for almost every $\omega$; similarly $Z(\omega) > -\infty$ for almost every $\omega$. It will therefore be enough if I can show that $Y =_{\text{a.e.}} Z$, for then $Y(\omega) = Z(\omega) \in \mathbb{R}$ for almost every $\omega$, and $\langle X_n(\omega) \rangle_{n \in \mathbb{N}}$ will be convergent for almost every $\omega$.
(b) ? So suppose, if possible, that $Y$ and $Z$ are not equal almost everywhere. Of course both are $\hat\Sigma$-measurable, where $(\Omega, \hat\Sigma, \hat\mu)$ is the completion of $(\Omega, \Sigma, \mu)$, so we must have
$$\hat\mu\{\omega : \omega \in H,\ Y(\omega) < Z(\omega)\} > 0.$$
Accordingly there are rational numbers $q$, $q'$ such that $q < q'$ and $\hat\mu G > 0$, where
$$G = \{\omega : \omega \in H,\ Y(\omega) < q < q' < Z(\omega)\}.$$
Now, for each $\omega \in H$, $n \in \mathbb{N}$, let $U_n(\omega)$ be the number of up-crossings from $q$ to $q'$ in the list $X_0(\omega), \ldots, X_n(\omega)$. Then 275F tells us that
$$E(U_n) \le \frac{1}{q' - q} E((X_n - X_0)^+) \le \frac{1}{q' - q} E(|X_n| + |X_0|) \le \frac{2M}{q' - q},$$
if we write $M = \sup_{i \in \mathbb{N}} E(|X_i|)$. By B.Levi's theorem, $U(\omega) = \sup_{n \in \mathbb{N}} U_n(\omega) < \infty$ for almost every $\omega$. On the other hand, if $\omega \in G$, then there are arbitrarily large $j$, $k$ such that $X_j(\omega) < q$ and $X_k(\omega) > q'$, so $U(\omega) = \infty$. This means that $\hat\mu G$ must be 0, contrary to the choice of $q$, $q'$. X
(c) Thus we must in fact have $Y =_{\text{a.e.}} Z$, and $\langle X_n(\omega) \rangle_{n \in \mathbb{N}}$ is convergent for almost every $\omega$, as claimed.

275H Theorem Let (Ω, Σ, µ) be a probability space, and hΣn in∈N a non-decreasing sequence of σ-subalgebras of
Σ. Let hXn in∈N be a martingale adapted to hΣn in∈N . Then the following are equiveridical:
(i) there is a random variable X, of finite expectation, such that Xn is a conditional expectation of X on Σn for
every n;
(ii) {Xn : n ∈ N} is uniformly integrable;
(iii) X∞ (ω) = limn→∞ Xn (ω) is defined in R for almost every ω, and E(|X∞ |) = limn→∞ E(|Xn |) < ∞.
proof (i)⇒(ii) By 246D, the set of all conditional expectations of X is uniformly integrable, so {Xn : n ∈ N} is surely
uniformly integrable.
(ii)⇒(iii) If {Xn : n ∈ N} is uniformly integrable, we surely have supn∈N E(|Xn |) < ∞, so 275G tells us that X∞
is defined almost everywhere. By 246Ja, X∞ is integrable and limn→∞ E(|Xn − X∞ |) = 0. Consequently E(|X∞ |) =
limn→∞ E(|Xn |) < ∞.
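(iii)⇒(i) Because $E(|X_\infty|) = \lim_{n \to \infty} E(|X_n|)$, $\lim_{n \to \infty} E(|X_n - X_\infty|) = 0$ (245H(a-ii)). Now let $n \in \mathbb{N}$, $E \in \Sigma_n$. Then
$$\int_E X_n = \lim_{m \to \infty} \int_E X_m = \int_E X_\infty.$$
As $E$ is arbitrary, $X_n$ is a conditional expectation of $X_\infty$ on $\Sigma_n$.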

S space, and hΣn in∈N a non-decreasing sequence of σ-subalgebras of


275I Theorem Let (Ω, Σ, µ) be a probability
Σ; write Σ∞ for the σ-algebra generated by n∈N Σn . Let X be any real-valued random variable on Ω with finite
expectation, and for each n ∈ N let Xn be a conditional expectation of X on Σn . Then X∞ (ω) = limn→∞ Xn (ω) is
defined almost everywhere; limn→∞ E(|X∞ − Xn |) = 0, and X∞ is a conditional expectation of X on Σ∞ .
proof (a) By 275G-275H, we know that X∞ is defined almost everywhere, and, as remarked in 275H, limn→∞ E(|X∞ −
Xn |) = 0. To see that X∞ is a conditional expectation of X on Σ∞ , set
R R S
A = {E : E ∈ Σ∞ , E X∞ = E X}, I = n∈N Σn .
Now I and A satisfy the conditions of the Monotone Class Theorem (136B). P α) Of course Ω ∈ I and I is closed
P (α
under finite intersections, because hΣn in∈N is a non-decreasing sequence of σ-algebras; in fact I is a subalgebra of PΩ,
and is closed under finite unions and complements. (β β ) If E ∈ I, say E ∈ Σn ; then
$$\int_E X_\infty = \lim_{m\to\infty} \int_E X_m = \int_E X,$$
as in (iii)⇒(i) of 275H, so E ∈ A. Thus I ⊆ A. (γ) If E, F ∈ A and E ⊆ F, then
$$\int_{F\setminus E} X_\infty = \int_F X_\infty - \int_E X_\infty = \int_F X - \int_E X = \int_{F\setminus E} X,$$

so F \ E ∈ A. (δ) If ⟨Ek⟩k∈N is a non-decreasing sequence in A with union E, then
$$\int_E X_\infty = \lim_{k\to\infty} \int_{E_k} X_\infty = \lim_{k\to\infty} \int_{E_k} X = \int_E X,$$
so E ∈ A. Thus A is a Dynkin class. Q
Consequently, by 136B, A includes Σ∞ ; that is, X∞ is a conditional expectation of X on Σ∞ .
Remark I have written ‘limn→∞ E(|Xn − X∞|) = 0’; but you may prefer to say ‘X∞• = limn→∞ Xn• in L1(µ)’, as in
Chapter 24.
The importance of this theorem is such that you may be interested in a proof based on 275D rather than 275E-275G;
see 275Xd.
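
Example (a sketch under stated assumptions). Take Ω = [0, 1[ with Lebesgue measure, let Σn be generated by the dyadic intervals of length 2^{−n}, and take X(t) = t²; this choice of X, and the function name below, are assumptions made only for illustration. A conditional expectation of X on Σn is then the average of X over the dyadic interval containing t, and the theorem says these averages converge to X(t) almost everywhere.

```python
import math
from fractions import Fraction

def dyadic_average(n, t):
    """Conditional expectation of X(t) = t**2 on the level-n dyadic algebra, at t in [0, 1).

    This is the mean of X over the interval [r/2**n, (r+1)/2**n) containing t.
    X is a hypothetical choice made for this illustration.
    """
    r = math.floor(t * 2 ** n)                    # index of the dyadic interval containing t
    lo, hi = Fraction(r, 2 ** n), Fraction(r + 1, 2 ** n)
    return (hi ** 3 - lo ** 3) / 3 / (hi - lo)    # exact integral of t^2 over [lo, hi), over its length

t = Fraction(1, 3)
for n in (1, 5, 10, 20):
    # The error shrinks at least as fast as 2 * 2**-n, since |X'| <= 2 on [0, 1).
    print(n, float(abs(dyadic_average(n, t) - t ** 2)))
```

Averaging the two level-(n+1) values over the halves of a level-n interval returns the level-n value, which is the martingale property in this setting.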

*275J As a corollary of this theorem I give an important result, a kind of density theorem for product measures.
Proposition Let h(Ωn , Σn , µn )in∈N be a sequence of probability spaces with product (Ω, Σ, µ). Let X be a real-valued
random variable on Ω with finite expectation. For each n ∈ N define Xn by setting
$$X_n(\omega) = \int X(\omega_0, \ldots, \omega_n, \xi_{n+1}, \ldots)\, d(\xi_{n+1}, \ldots)$$
wherever this is defined, where I write ‘∫ . . . d(ξn+1, . . . )’ to mean integration with respect to the product measure λ′n
on ∏i≥n+1 Ωi. Then X(ω) = limn→∞ Xn(ω) for almost every ω = (ω0, ω1, . . . ) in Ω, and limn→∞ E(|X − Xn|) = 0.
proof For each n, we can identify µ with the product of λn and λ′n, where λn is the product measure on Ω0 × . . . × Ωn
(254N). So 253H tells us that Xn is a conditional expectation of X on the σ-algebra Λn = {E × ∏i>n Ωi : E ∈ dom λn}.
Since (by 254N again) we can think of λn+1 as the product of λn and µn+1, Λn ⊆ Λn+1 for each n. So 275I tells us
that ⟨Xn⟩n∈N converges almost everywhere to a conditional expectation X∞ of X on the σ-algebra Λ∞ generated by
⋃n∈N Λn. Now Λ∞ ⊆ Σ and also $\widehat{\bigotimes}_{n\in\mathbb{N}} \Sigma_n \subseteq \Lambda_\infty$, so every member of Σ is sandwiched between two members of Λ∞
of the same measure (254Ff), and X∞ must be equal to X almost everywhere. Moreover, 275I also tells us that
limn→∞ E(|X − Xn |) = limn→∞ E(|X∞ − Xn |) = 0,
as required.

275K Reverse martingales We have a result corresponding to 275I for decreasing sequences of σ-algebras. While
this is used less often than 275G-275I, it does have very important applications.
Theorem Let (Ω, Σ, µ) be a probability space, and hΣn in∈N a non-increasing sequence of σ-subalgebras of Σ, with
intersection Σ∞ . Let X be any real-valued random variable with finite expectation, and for each n ∈ N let Xn be
a conditional expectation of X on Σn . Then X∞ = limn→∞ Xn is defined almost everywhere and is a conditional
expectation of X on Σ∞ .
proof (a) Set H = ⋂n∈N dom Xn, so that H is conegligible. For n ∈ N, a < b in R, and ω ∈ H, write Uabn(ω) for the
number of up-crossings from a to b in the list Xn(ω), Xn−1(ω), . . . , X0(ω) (275E). Then
$$E(U_{abn}) \le \frac{1}{b-a} E((X_0 - X_n)^+) \le \frac{1}{b-a} E(|X_0| + |X_n|) \le \frac{2}{b-a} E(|X_0|) < \infty,$$
using 275F for the first inequality.

So limn→∞ Uabn (ω) is finite for almost every ω. But this means that
{ω : lim inf n→∞ Xn (ω) < a, lim supn→∞ Xn (ω) > b}
is negligible. As a and b are arbitrary, hXn in∈N is convergent a.e., just as in 275G. Set X∞ (ω) = limn→∞ Xn (ω)
whenever this is defined in R.
(b) By 246D, {Xn : n ∈ N} is uniformly integrable, so E(|Xn − X∞ |) → 0 as n → ∞ (246Ja), and
$$\int_E X_\infty = \lim_{n\to\infty} \int_E X_n = \int_E X_0$$
for every E ∈ Σ∞ .

(c) Now there is a conegligible set G ∈ Σ∞ such that G ⊆ dom X∞ and X∞↾G is Σ∞-measurable. P For each n ∈ N,
there is a conegligible set Gn ∈ Σn such that Gn ⊆ dom Xn and Xn↾Gn is Σn-measurable. Set G′ = ⋃n∈N ⋂m≥n Gm;
then, for any r ∈ N, G′ = ⋃n≥r ⋂m≥n Gm belongs to Σr, so G′ ∈ Σ∞, while of course G′ is conegligible. For n ∈ N,
set Xn′(ω) = Xn(ω) for ω ∈ Gn, 0 for ω ∈ Ω \ Gn; then for ω ∈ G′, limn→∞ Xn′(ω) = limn→∞ Xn(ω) if either is defined
in R. Writing X∞′ = limn→∞ Xn′ whenever this is defined in R, 121F and 121H tell us that X∞′ is Σr-measurable and
dom X∞′ ∈ Σr for every r ∈ N, so that G′′ = dom X∞′ belongs to Σ∞ and X∞′ is Σ∞-measurable. We also know, from
(b), that G′′ is conegligible. So setting G = G′ ∩ G′′ we have the result. Q
Thus X∞ is a conditional expectation of X on Σ∞ .

275L Stopping times In a sense, the main work of this section is over; I have no room for any more theorems of
importance comparable to 275G-275I. However, it would be wrong to leave this chapter without briefly describing one
of the most fruitful ideas of the subject.
Definition Let (Ω, Σ, µ) be a probability space, with completion (Ω, Σ̂, µ̂), and hΣn in∈N a non-decreasing sequence
of σ-subalgebras of Σ̂. A stopping time adapted to hΣn in∈N (also called ‘optional time’, ‘Markov time’) is a
function τ from Ω to N ∪ {∞} such that {ω : τ (ω) ≤ n} ∈ Σn for every n ∈ N.
Remark Of course the condition
{ω : τ (ω) ≤ n} ∈ Σn for every n ∈ N
can be replaced by the equivalent condition
{ω : τ (ω) = n} ∈ Σn for every n ∈ N.
I give priority to the former expression because it is more appropriate to other index sets (see 275Cc).

275M Examples (a) If ⟨Xn⟩n∈N is a martingale adapted to a sequence ⟨Σn⟩n∈N of σ-algebras, and Hn is a Borel
subset of R^{n+1} for each n, then we have a stopping time τ adapted to ⟨Σn⟩n∈N defined by the formula
$$\tau(\omega) = \inf\Bigl\{n : \omega \in \bigcap_{i\le n} \operatorname{dom} X_i,\ (X_0(\omega), \ldots, X_n(\omega)) \in H_n\Bigr\},$$
setting inf ∅ = ∞ as usual. (For by 121Ka the set En = {ω : (X0(ω), . . . , Xn(ω)) ∈ Hn} belongs to Σn for each n, and
{ω : τ(ω) ≤ n} = ⋃i≤n Ei.) In particular, for instance, the formulae
inf{n : Xn (ω) ≥ a}, inf{n : |Xn (ω)| > a}
define stopping times.

(b) Any constant function τ : Ω → N ∪ {∞} is a stopping time. If τ , τ ′ are two stopping times adapted to the same
sequence hΣn in∈N of σ-algebras, then τ ∧τ ′ is a stopping time adapted to hΣn in∈N , setting (τ ∧τ ′ )(ω) = min(τ (ω), τ ′ (ω))
for ω ∈ Ω.
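
Example (a sketch; the function names are hypothetical). Along a single sample path, the stopping time inf{n : Xn(ω) ≥ a} of (a) can be evaluated as follows. The second function decides whether τ(ω) ≤ n by looking only at X0(ω), . . . , Xn(ω), which is the practical content of the requirement {ω : τ(ω) ≤ n} ∈ Σn.

```python
import math

def first_hitting_time(path, a):
    """tau = inf{n : X_n >= a} along one sample path, with inf of the empty set equal to infinity."""
    for n, x in enumerate(path):
        if x >= a:
            return n
    return math.inf

def stopped_by(path, a, n):
    """Decide whether tau <= n using only the values X_0, ..., X_n."""
    return any(x >= a for x in path[: n + 1])

path = [0, 1, 0, 1, 2, 3]                    # one hypothetical trajectory X_0(w), ..., X_5(w)
tau = first_hitting_time(path, 2)
print(tau, min(tau, 3))                      # tau and the stopping time tau ∧ 3 of (b)
```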

275N Lemma Let (Ω, Σ, µ) be a complete probability space, and hΣn in∈N a non-decreasing sequence of σ-subalgebras
of Σ. Suppose that τ and τ ′ are stopping times on Ω, and hXn in∈N a martingale, all adapted to hΣn in∈N .
(a) The family
Σ̃τ = {E : E ∈ Σ, E ∩ {ω : τ (ω) ≤ n} ∈ Σn for every n ∈ N}
is a σ-subalgebra of Σ.
(b) If τ (ω) ≤ τ ′ (ω) for every ω, then Σ̃τ ⊆ Σ̃τ ′ .
(c) Now suppose that τ is finite almost everywhere. Set
X̃τ (ω) = Xτ (ω) (ω)
whenever τ (ω) < ∞ and ω ∈ dom Xτ (ω) . Then dom X̃τ ∈ Σ̃τ and X̃τ is Σ̃τ -measurable.
(d) If τ is essentially bounded, that is, there is some m ∈ N such that τ ≤ m almost everywhere, then E(X̃τ ) exists
and is equal to E(X0 ).
(e) If τ ≤ τ ′ almost everywhere, and τ ′ is essentially bounded, then X̃τ is a conditional expectation of X̃τ ′ on Σ̃τ .
proof (a) This is elementary. Write Hn = {ω : τ (ω) ≤ n} for each n ∈ N. The empty set belongs to Σ̃τ because it
belongs to Σn for every n. If E ∈ Σ̃τ , then
(Ω \ E) ∩ Hn = Hn \ (E ∩ Hn ) ∈ Σn

because Hn ∈ Σn; this is true for every n, so Ω \ E ∈ Σ̃τ. If ⟨Ek⟩k∈N is any sequence in Σ̃τ then
$$\Bigl(\bigcup_{k\in\mathbb{N}} E_k\Bigr) \cap H_n = \bigcup_{k\in\mathbb{N}} (E_k \cap H_n) \in \Sigma_n$$
for every n, so ⋃k∈N Ek ∈ Σ̃τ.

(b) If E ∈ Σ̃τ then of course E ∈ Σ, and if n ∈ N then {ω : τ ′ (ω) ≤ n} ⊆ {ω : τ (ω) ≤ n}, so that
E ∩ {ω : τ ′ (ω) ≤ n} = E ∩ {ω : τ (ω) ≤ n} ∩ {ω : τ ′ (ω) ≤ n}
belongs to Σn ; as n is arbitrary, E ∈ Σ̃τ ′ .
(c) Set Hn = {ω : τ (ω) ≤ n} for each n ∈ N. For any a ∈ R,

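$$H_n \cap \{\omega : \omega \in \operatorname{dom} \tilde X_\tau,\ \tilde X_\tau(\omega) \le a\} = \bigcup_{k\le n} \{\omega : \tau(\omega) = k,\ \omega \in \operatorname{dom} X_k,\ X_k(\omega) \le a\} \in \Sigma_n.$$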

As n is arbitrary,
Ga = {ω : ω ∈ dom X̃τ , X̃τ (ω) ≤ a} ∈ Σ̃τ .
As a is arbitrary, dom X̃τ = ⋃m∈N Gm ∈ Σ̃τ and X̃τ is Σ̃τ-measurable.
(d) Set Hk = {ω : τ(ω) = k} for k ≤ m. Then ⋃k≤m Hk is conegligible, so
$$E(\tilde X_\tau) = \sum_{k=0}^m \int_{H_k} X_k = \sum_{k=0}^m \int_{H_k} X_m = \int_\Omega X_m = \int_\Omega X_0.$$

(e) Suppose τ′ ≤ n almost everywhere. Set Hk = {ω : τ(ω) = k}, Hk′ = {ω : τ′(ω) = k} for each k; then both
⟨Hk⟩k≤n and ⟨Hk′⟩k≤n are partitions of conegligible subsets of Ω. Now suppose that E ∈ Σ̃τ. Then
$$\int_E \tilde X_\tau = \sum_{k=0}^n \int_{E\cap H_k} \tilde X_\tau = \sum_{k=0}^n \int_{E\cap H_k} X_k = \sum_{k=0}^n \int_{E\cap H_k} X_n = \int_E X_n$$
because E ∩ Hk ∈ Σk for every k. By (b), E ∈ Σ̃τ′, so we also have $\int_E \tilde X_{\tau'} = \int_E X_n$. Thus $\int_E \tilde X_\tau = \int_E \tilde X_{\tau'}$ for every
E ∈ Σ̃τ, as claimed.
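
Example (a sketch under stated assumptions). Part (d) can be checked by exact enumeration in a small case. Assume the simple random walk Xn = W1 + . . . + Wn with X0 = 0 and independent steps Wi = ±1 of probability ½ each, and the bounded stopping time τ = min(m, first n with Xn ≥ level). The function name and the choice of walk are illustrative only.

```python
from fractions import Fraction
from itertools import product

def stopped_expectation(m, level):
    """E(X_tau) for the simple random walk started at 0, with
    tau = min(m, first n such that X_n >= level).

    All 2**m sign patterns are enumerated, each with probability 2**-m.
    """
    total = Fraction(0)
    for steps in product((-1, 1), repeat=m):
        x = 0
        for w in steps:
            if x >= level:      # tau has already occurred: the stopped walk no longer moves
                break
            x += w
        total += Fraction(x, 2 ** m)
    return total

print(stopped_expectation(8, 2))   # equals E(X_0) = 0, as (d) predicts for bounded tau
```

Boundedness of τ matters here: the unbounded hitting time of the level is finite almost everywhere for this walk, yet the stopped value is then the level itself, whose expectation is not E(X0) (compare 275Xj).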

275O Proposition Let hXn in∈N be a martingale and τ a stopping time, both adapted to the same sequence hΣn in∈N
of σ-algebras. For each n, set (τ ∧ n)(ω) = min(τ (ω), n) for ω ∈ Ω; then τ ∧ n is a stopping time, and hX̃τ ∧n in∈N is a
martingale adapted to hΣ̃τ ∧n in∈N , defining X̃τ ∧n and Σ̃τ ∧n as in 275N.
proof As remarked in 275Mb, each τ ∧ n is a stopping time. If m ≤ n, then Σ̃τ ∧m ⊆ Σ̃τ ∧n by 275Nb. Each X̃τ ∧m is
Σ̃τ ∧m -measurable, with domain belonging to Σ̃τ ∧m , by 275Nc, and has finite expectation, by 275Nd; finally, if m ≤ n,
then X̃τ ∧m is a conditional expectation of X̃τ ∧n on Σ̃τ ∧m , by 275Ne.

275P Corollary Suppose that (Ω, Σ, µ) is a probability space and hXn in∈N is a martingale on Ω such that W =
supn∈N |Xn+1 − Xn | is finite almost everywhere and has finite expectation. Then for almost every ω ∈ Ω, either
limn→∞ Xn (ω) exists in R or supn∈N Xn (ω) = ∞ and inf n∈N Xn (ω) = −∞.

proof Let ⟨Σn⟩n∈N be a non-decreasing sequence of σ-algebras to which ⟨Xn⟩n∈N is adapted. Let H be the conegligible
set ⋂n∈N dom Xn ∩ {ω : W(ω) < ∞}. For each m ∈ N, set
τm (ω) = inf{n : ω ∈ dom Xn , Xn (ω) > m}.
As in 275Ma, τm is a stopping time adapted to hΣn in∈N . Set
Ymn = X̃τm ∧n ,
defined as in 275Nc, so that hYmn in∈N is a martingale, by 275O. If ω ∈ H, then either τm (ω) > n and
Ymn (ω) = Xn (ω) ≤ m,
or 0 < τm (ω) ≤ n and
Ymn (ω) = Xτm (ω) (ω) ≤ W (ω) + Xτm (ω)−1 (ω) ≤ W (ω) + m,
or τm (ω) = 0 and
Ymn (ω) = X0 (ω).

Thus
Ymn (ω) ≤ |X0 (ω)| + W (ω) + m
for every ω ∈ H, and
|Ymn (ω)| = 2 max(0, Ymn (ω)) − Ymn (ω) ≤ 2(|X0 (ω)| + W (ω) + m) − Ymn (ω),

E(|Ymn |) ≤ 2E(|X0 |) + 2E(W ) + 2m − E(Ymn ) = 2E(|X0 |) + 2E(W ) + 2m − E(X0 )


by 275Nd. As this is true for every n ∈ N, supn∈N E(|Ymn|) < ∞, and limn→∞ Ymn is defined in R almost everywhere,
by Doob’s Martingale Convergence Theorem (275G). Let Fm be the conegligible set on which ⟨Ymn⟩n∈N converges. Set
H∗ = H ∩ ⋂m∈N Fm, so that H∗ is conegligible.
Now consider
E = {ω : ω ∈ H ∗ , supn∈N Xn (ω) < ∞}.
For any ω ∈ E, there must be an m ∈ N such that supn∈N Xn (ω) ≤ m. Now this means that Ymn (ω) = Xn (ω) for
every n, and as ω ∈ Fm we have
limn→∞ Xn (ω) = limn→∞ Ymn (ω) ∈ R.
This means that hXn (ω)in∈N is convergent for almost every ω such that {Xn (ω) : n ∈ N} is bounded above.
Similarly, hXn (ω)in∈N is convergent for almost every ω such that {Xn (ω) : n ∈ N} is bounded below, which completes
the proof.

275X Basic exercises > (a) Let ⟨Xn⟩n∈N be an independent sequence of random variables with zero expectation
and finite variance. Set $s_n = (\sum_{i=0}^n \operatorname{Var}(X_i))^{1/2}$, $Y_n = (X_0 + \ldots + X_n)^2 - s_n^2$ for each n. Show that ⟨Yn⟩n∈N is a
martingale.

> (b) Let ⟨Xn⟩n∈N be a martingale. Show that for any ε > 0, $\Pr(\sup_{n\in\mathbb{N}} |X_n| \ge \epsilon) \le \frac{1}{\epsilon} \sup_{n\in\mathbb{N}} E(|X_n|)$.

(c) Pólya’s urn scheme Imagine a box containing red and white balls. At each move, a ball is drawn at random
from the box and replaced together with another of the same colour. (i) Writing Rn , Wn for the numbers of red
and white balls after the nth move and Xn = Rn /(Rn + Wn ), show that hXn in∈N is a martingale. (ii) Starting
from R0 = W0 = 1, find the distribution of (Rn , Wn ) for each n. (iii) Show that X = limn→∞ Xn is defined almost
everywhere, and find its distribution when R0 = W0 = 1. (See Feller 66 for a discussion of other starting values.)

> (d) Let (Ω, Σ, µ) be a probability space, and ⟨Σn⟩n∈N a non-decreasing sequence of σ-subalgebras of Σ; for each
n ∈ N let Pn : L1 → L1 be the conditional expectation operator corresponding to Σn, where L1 = L1(µ) (242J).
(i) Show that V = {u : u ∈ L1, limn→∞ ‖Pn u − u‖1 = 0} is a ‖ ‖1-closed linear subspace of L1. (ii) Show that
{E : E ∈ Σ, χE• ∈ V} is a Dynkin class including ⋃n∈N Σn, so includes the σ-algebra Σ∞ generated by ⋃n∈N Σn.
(iii) Show that if u ∈ L1 then v = supn∈N Pn|u| is defined in L1 and is of the form W• where Pr(W ≥ ε) ≤ (1/ε)‖u‖1 for
every ε > 0. (Hint: 275D.) (iv) Show that if X is a Σ∞-measurable random variable with finite expectation, and for
each n ∈ N Xn is a conditional expectation of X on Σn, then X• ∈ V and X =a.e. limn→∞ Xn. (Hint: apply (iii) to
u = (X − Xm)• for large m.)

(e) Let (Ω, Σ, µ) be a probability space, ⟨Σn⟩n∈N a non-decreasing sequence of σ-subalgebras of Σ, and Σ∞ the
σ-algebra generated by ⋃n∈N Σn. For each n ∈ N ∪ {∞} let Pn : L1 → L1 be the conditional expectation operator
corresponding to Σn, where L1 = L1(µ). Show that limn→∞ ‖Pn u − P∞ u‖p = 0 whenever p ∈ [1, ∞[ and u ∈ Lp(µ).
(Hint: 275Xd, 233J/242K, 246Xg.)

(f ) Let hXn in∈N be a martingale, and suppose that p ∈ ]1, ∞[ is such that supn∈N kXn kp < ∞. Show that
X = limn→∞ Xn is defined almost everywhere and that limn→∞ kXn − Xkp = 0.

>(g) Let (Ω, Σ, µ) be [0, 1] with Lebesgue measure. For each n ∈ N let Σn be the finite subalgebra of Σ generated
by intervals of the type [0, 2^{−n} r] for r ≤ 2^n. Use 275I to show that for any integrable X : [0, 1] → R we must
have $X(t) = \lim_{n\to\infty} 2^n \int_{I_n(t)} X$ for almost every t ∈ [0, 1[, where In(t) is the interval of the form [2^{−n} r, 2^{−n}(r + 1)[
containing t. Compare this result with 223A and 261Yd.

(h) In 275K, show that limn→∞ kXn − X∞ kp = 0 for any p ∈ [1, ∞[ such that kX0 kp is finite. (Compare 275Xe.)

(i) Let (Ω, Σ, µ) be a probability space, with completion (Ω, Σ̂, µ̂), and hΣn in∈N a non-decreasing sequence of σ-
subalgebras of Σ̂. Let hXn in∈N be a uniformly integrable martingale adapted to Σn , and set X∞ = limn→∞ Xn . Let τ
be a stopping time adapted to hΣn in∈N , and set X̃τ (ω) = Xτ (ω) (ω) whenever ω ∈ dom Xτ (ω) , allowing ∞ as a value of
τ (ω). Show that X̃τ is a conditional expectation of X∞ on Σ̃τ , as defined in 275N.

(j) Let (Ω, Σ, µ) be a probability space, with completion (Ω, Σ̂, µ̂), and hΣn in∈N a non-decreasing sequence of σ-
subalgebras of Σ̂. Let hXn in∈N be a martingale and τ a stopping time, both adapted to hΣn in∈N . Suppose that
supn∈N E(|Xn |) < ∞ and that τ is finite almost everywhere. Show that X̃τ , as defined in 275Nc, has finite expectation,
but that E(X̃τ ) need not be equal to E(X0 ).

(k)(i) Find a martingale hXn in∈N such that hX2n in∈N → 0 a.e. but |X2n+1 | ≥ 1 a.e. for every n ∈ N. (ii) Find a
martingale which converges in measure but is not convergent a.e.

275Y Further exercises (a) Let (Ω, Σ, µ) be a complete probability space, hΣn in∈N a non-decreasing sequence of
σ-subalgebras of Σ all containing every negligible set, and hXn in∈N a martingale adapted to hΣn in∈N . Let ν be another
probability measure with domain Σ which is absolutely continuous with respect to µ, with Radon-Nikodým derivative
Z. For each n ∈ N let Zn be a conditional expectation of Z on Σn (with respect to the measure µ). (i) Show that
Zn is a Radon-Nikodým derivative of ν↾Σn with respect to µ↾Σn . (ii) Set Wn (ω) = Zn (ω)/Zn−1 (ω) if this is defined
in R, otherwise 0. For n ≥ 1, let Vn be a conditional expectation of Wn × (Xn − Xn−1) on Σn−1 (with respect to the
measure µ). Set Y0 = X0, $Y_n = X_n - \sum_{k=1}^n V_k$ for n ≥ 1. Show that ⟨Yn⟩n∈N is a martingale adapted to ⟨Σn⟩n∈N with
respect to the measure ν.

(b) Combine the ideas of 275Cc with those of 275Cd-275Ce to describe a notion of ‘martingale indexed by I’, where
I is an arbitrary partially ordered set.

(c) Let ⟨Xk⟩k∈N be a martingale on a complete probability space (Ω, Σ, µ), and fix n ∈ N. Set X∗ = max(|X0|, . . . , |Xn|).
Let p ∈ ]1, ∞[. Show that $\|X^*\|_p \le \frac{p}{p-1} \|X_n\|_p$. (Hint: set Ft = {ω : X∗(ω) ≥ t}. Show that $t\mu F_t \le \int_{F_t} |X_n|$. Using
Fubini’s theorem on Ω × [0, ∞[ and on Ω × [0, ∞[ × [0, ∞[, show that
$$E((X^*)^p) = p \int_0^\infty t^{p-1} \hat\mu F_t\, dt,$$
$$\int_0^\infty t^{p-2} \int_{F_t} |X_n|\, dt = \frac{1}{p-1} E(|X_n| \times (X^*)^{p-1}),$$
$$E(|X_n| \times (X^*)^{p-1}) \le \|X_n\|_p \|X^*\|_p^{p-1}.$$
Compare 286A.)

(d) Let ⟨Xk⟩k∈N be a martingale on a complete probability space (Ω, Σ, µ), and fix n ∈ N. Set X∗ = max(|X0|, . . . , |Xn|),
Ft = {ω : X∗(ω) ≥ t}, Gt = {ω : |Xn(ω)| ≥ ½t} for t ≥ 0. (i) Show that $t\mu F_t \le 2\int_{G_t} |X_n|$ for every t ≥ 0. (ii) Show
that E(X∗) ≤ 1 + 2 ln 2 E(|Xn|) + 2E(|Xn| × ln+ |Xn|), where ln+ t = ln t for t ≥ 1, 0 for t ∈ [0, 1].

(e) Let (Ω, Σ, µ) be a probability space and ⟨Σi⟩i∈I a countable family of σ-subalgebras of Σ such that for any i,
j ∈ I either Σi ⊆ Σj or Σj ⊆ Σi. Let X be a real-valued random variable on Ω such that ‖X‖p < ∞, where 1 < p < ∞,
and suppose that Xi is a conditional expectation of X on Σi for each i ∈ I. Show that $\|\sup_{i\in I} |X_i|\|_p \le \frac{p}{p-1} \|X\|_p$.

(f) Let (Ω, Σ, µ) be a probability space, with completion (Ω, Σ̂, µ̂), and let ⟨Σn⟩n∈N be a non-decreasing sequence
of σ-subalgebras of Σ̂. Let ⟨Xn⟩n∈N be a sequence of µ-integrable real-valued functions such that dom Xn ∈ Σn and
Xn is Σn-measurable for each n ∈ N. We say that ⟨Xn⟩n∈N is a submartingale adapted to ⟨Σn⟩n∈N (also called
‘semi-martingale’) if $\int_E X_{n+1} \ge \int_E X_n$ for every n ∈ N and every E ∈ Σn. Prove versions of 275D, 275F, 275G,
275Xf for submartingales.

(g) Let hXn in∈N be a martingale, and φ : R → R a convex function. Show that hφ(Xn )in∈N is a submartingale.
(Hint: 233J.) Re-examine part (b-ii) of the proof of 275F in the light of this fact.

(h) Let ⟨Xn⟩n∈N be an independent sequence of non-negative random variables all with expectation 1. Set Wn =
X0 × . . . × Xn for every n. (i) Show that W = limn→∞ Wn is defined a.e. (ii) Show that E(W) is either 0 or 1. (Hint:
suppose E(W) > 0. Set Zn = limm→∞ Xn × . . . × Xm. Show that limn→∞ Zn = 1 when 0 < W < ∞, therefore a.e., by
the zero-one law, while E(Zn) ≤ 1, by Fatou’s lemma, so limn→∞ E(Zn) = 1, while E(W) = E(Wn)E(Zn+1) for every
n.) (iii) Set $\gamma = \prod_{n=0}^\infty E(\sqrt{X_n})$. Show that γ > 0 iff E(W) = 1. (Hint: $\Pr(W_n \ge \frac14\gamma^2) \ge \frac14\gamma^2$ for every n, so if γ > 0
then W cannot be zero a.e.; while $E(\sqrt{W}) \le \gamma$.)
(i) Let ⟨(Ωn, Σn, µn)⟩n∈N be a sequence of probability spaces with product (Ω, Σ, µ). Suppose that for each n ∈ N
we have a probability measure νn, with domain Σn, which is absolutely continuous with respect to µn, with Radon-
Nikodým derivative fn, and suppose that $\prod_{n=0}^\infty \int \sqrt{f_n}\, d\mu_n > 0$. Let ν be the product of ⟨νn⟩n∈N. Show that ν is an
indefinite-integral measure over µ, with Radon-Nikodým derivative f, where $f(\omega) = \prod_{n=0}^\infty f_n(\omega_n)$ for µ-almost every
ω = ⟨ωn⟩n∈N in Ω. (Hint: use 275Yh to show that $\int f\, d\mu = 1$.)

(j) Let ⟨pn⟩n∈N be a sequence in [0, 1]. Let µ be the usual measure on {0, 1}^N (254J) and ν the product of ⟨νn⟩n∈N,
where νn is the probability measure on {0, 1} defined by setting νn{1} = pn. Show that ν is an indefinite-integral
measure over µ iff $\sum_{n=0}^\infty |p_n - \frac12|^2 < \infty$.
(k) Let (Ω, Σ, µ) be a probability space, ⟨Σn⟩n∈N a non-decreasing sequence of σ-subalgebras of Σ and ⟨Xn⟩n∈N a
sequence of random variables on Ω such that E(supn∈N |Xn|) is finite and X = limn→∞ Xn is defined almost everywhere.
For each n, let Yn be a conditional expectation of Xn on Σn. Show that ⟨Yn⟩n∈N converges almost everywhere to a
conditional expectation of X on the σ-algebra generated by ⋃n∈N Σn.
(l) Show that 275Yk can fail if hXn in∈N is merely uniformly integrable, rather than dominated by an integrable
function.
(m) Let (Ω, Σ, µ) be a probability space, ⟨Σn⟩n∈N an independent sequence of σ-subalgebras of Σ, and X a random
variable on Ω with finite variance. Let Xn be a conditional expectation of X on Σn for each n. Show that limn→∞ Xn =
E(X) almost everywhere. (Hint: consider $\sum_{n=0}^\infty \operatorname{Var}(X_n)$.)
(n) Let (Ω, Σ, µ) be a complete probability space, and ⟨Xn⟩n∈N an independent sequence of random variables on
Ω, all with the same distribution, and of finite expectation. For each n, set $S_n = \frac{1}{n+1}(X_0 + \ldots + X_n)$; let Σn be the
σ-algebra defined by Sn and Σ∗n the σ-algebra generated by ⋃m≥n Σm. Show that Sn is a conditional expectation
of X0 on Σ∗n. (Hint: assume every Xi defined everywhere on Ω. Set φ(ω) = ⟨Xi(ω)⟩i∈N. Show that φ : Ω → R^N
is inverse-measure-preserving for a suitable product measure on R^N, and that every set in Σ∗n is of the form φ^{−1}[H]
where H ⊆ R^N is a Borel set invariant under permutations of coordinates in the set {0, . . . , n}, so that $\int_E X_i = \int_E X_j$
whenever i ≤ j ≤ n and E ∈ Σ∗n.) Hence show that ⟨Sn⟩n∈N converges almost everywhere. (Compare 273I.)
(o) Formulate and prove versions of the results of this chapter for martingales consisting of functions taking values
in C or R r rather than R.
(p)(i) Find a martingale hXn in∈N which is convergent in measure, but is not convergent a.e. (Compare 272Yd.) (ii)
Find a martingale hXn in∈N such that the sequence νXn of distributions (271C) is convergent for the vague topology
(274Ld), but hXn in∈N is not convergent in measure.
(q) Let ⟨Xn⟩n∈N be an independent sequence of real-valued random variables such that $\sum_{n=0}^\infty X_n$ is defined in R
almost everywhere. Suppose that there is an M ≥ 0 such that |Xn| ≤ M a.e. for every n. Show that $\sum_{n=0}^\infty E(X_n)$ is
defined in R. (Hint: 274Ye, 275G.)
(r) Let (Ω, Σ, µ) be a probability space and ⟨Xn⟩n∈N an independent sequence of real-valued random variables on
Ω; set En = {ω : ω ∈ dom Xn, |Xn(ω)| > 1}, Yn = Xn × χ(Ω \ En) for each n, and Zn(ω) = med(−1, Xn(ω), 1) for
n ∈ N and ω ∈ dom Xn. Show that the following are equiveridical: (i) $\sum_{n=0}^\infty X_n(\omega)$ is defined in R for almost every
ω; (ii) $\sum_{n=0}^\infty \hat\mu E_n < \infty$, $\sum_{n=0}^\infty E(Y_n)$ is defined in R, and $\sum_{n=0}^\infty \operatorname{Var}(Y_n) < \infty$; (iii) $\sum_{n=0}^\infty \hat\mu E_n < \infty$, $\sum_{n=0}^\infty E(Z_n)$ is
defined in R, and $\sum_{n=0}^\infty \operatorname{Var}(Z_n) < \infty$. (Hint: 273K, 275Yq.) (This is a version of the Three Series Theorem.)

275 Notes and comments I hope that the sketch above, though distressingly abbreviated, has suggested some of
the richness of the concepts involved, and will provide a foundation for further study. All the theorems of this section
have far-reaching implications, but the one which is simply indispensable in advanced measure theory is 275I, ‘Lévy’s
martingale convergence theorem’, which I will use in the proof of the Lifting Theorem in Chapter 34 of the next volume.
As for stopping times, I mention them partly in an attempt to cast further light on what martingales are for (see
276Ed below), and partly because the ideas of 275N-275O are so important in modern probability theory that, just
as a matter of general knowledge, you should be aware that there is something there. I add 275P as one of the most
accessible of the standard results which may be obtained by this method.

276 Martingale difference sequences


Hand in hand with the concept of ‘martingale’ is that of ‘martingale difference sequence’ (276A), a direct general-
ization of the notion of ‘independent sequence’. In this section I collect results which can be naturally expressed in
terms of difference sequences, including yet another strong law of large numbers (276C). I end the section with a proof
of Komlós’ theorem (276H).

276A Martingale difference sequences (a) If hXn in∈N is a martingale adapted to a sequence hΣn in∈N of σ-
algebras, then we have
$$\int_E (X_{n+1} - X_n) = 0$$
whenever E ∈ Σn. Let us say that if (Ω, Σ, µ) is a probability space, with completion (Ω, Σ̂, µ̂), and ⟨Σn⟩n∈N is a
non-decreasing sequence of σ-subalgebras of Σ̂, then a martingale difference sequence adapted to ⟨Σn⟩n∈N is a
sequence ⟨Xn⟩n∈N of real-valued random variables on Ω, all with finite expectation, such that (i) dom Xn ∈ Σn and
Xn is Σn-measurable, for each n ∈ N (ii) $\int_E X_{n+1} = 0$ whenever n ∈ N, E ∈ Σn.
(b) Evidently ⟨Xn⟩n∈N is a martingale difference sequence adapted to ⟨Σn⟩n∈N iff $\langle \sum_{i=0}^n X_i \rangle_{n\in\mathbb{N}}$ is a martingale
adapted to ⟨Σn⟩n∈N.

(c) Just as in 275Cd, we can say that a sequence ⟨Xn⟩n∈N is in itself a martingale difference sequence if
$\langle \sum_{i=0}^n X_i \rangle_{n\in\mathbb{N}}$ is a martingale, that is, if ⟨Xn⟩n∈N is a martingale difference sequence adapted to ⟨Σ̃n⟩n∈N, where Σ̃n is
the σ-algebra generated by ⋃i≤n ΣXi.

(d) If hXn in∈N is a martingale difference sequence then han Xn in∈N is a martingale difference sequence for any real
an .

(e) If hXn in∈N is a martingale difference sequence and Xn′ =a.e. Xn for every n, then hXn′ in∈N is a martingale
difference sequence. (Compare 275Ce.)

(f ) Of course the most important example of ‘martingale difference sequence’ is that of 275Bb: any independent
sequence of random variables with zero expectation is a martingale difference sequence. It turns out that some of the
theorems of §273 concerning such independent sequences may be generalized to martingale difference sequences.

276B Proposition Let ⟨Xn⟩n∈N be a martingale difference sequence such that $\sum_{n=0}^\infty E(X_n^2) < \infty$. Then $\sum_{n=0}^\infty X_n$
is defined, and finite, almost everywhere.
proof (a) Let (Ω, Σ, µ) be the underlying probability space, (Ω, Σ̂, µ̂) its completion, and ⟨Σn⟩n∈N a non-decreasing
sequence of σ-subalgebras of Σ̂ such that ⟨Xn⟩n∈N is adapted to ⟨Σn⟩n∈N. Set $Y_n = \sum_{i=0}^n X_i$ for each n ∈ N. Then
⟨Yn⟩n∈N is a martingale adapted to ⟨Σn⟩n∈N.
(b) E(Yn × Xn+1) = 0 for each n. P Yn is a sum of random variables with finite variance, so E(Yn²) < ∞, by
244Ba; it follows that Yn × Xn+1 has finite expectation, by 244Eb. Because the constant function 0 is a conditional
expectation of Xn+1 on Σn,
$$E(Y_n \times X_{n+1}) = E(Y_n \times 0) = 0,$$
by 242L. Q
(c) It follows that $E(Y_n^2) = \sum_{i=0}^n E(X_i^2)$ for every n. P Induce on n. For the inductive step, we have
$$E(Y_{n+1}^2) = E(Y_n^2 + 2Y_n \times X_{n+1} + X_{n+1}^2) = E(Y_n^2) + E(X_{n+1}^2)$$
because, by (b), E(Yn × Xn+1) = 0. Q
(d) Of course
$$E(|Y_n|) = \int |Y_n| \times \chi\Omega \le \|Y_n\|_2 \|\chi\Omega\|_2 = \sqrt{E(Y_n^2)},$$
so
$$\sup_{n\in\mathbb{N}} E(|Y_n|) \le \sup_{n\in\mathbb{N}} \sqrt{E(Y_n^2)} = \sqrt{\sum_{i=0}^\infty E(X_i^2)} < \infty.$$
By 275G, limn→∞ Yn is defined and finite almost everywhere, that is, $\sum_{i=0}^\infty X_i$ is defined and finite almost everywhere.
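
Example (a sketch; the stake rule and all names are hypothetical). The identity E(Yn²) = ∑i≤n E(Xi²) of part (c) does not need independence, only the martingale difference property. The Python fragment below checks it exactly for Xi = Zi × Wi, where the Wi are independent fair signs and the stake Zi depends only on W0, . . . , Wi−1.

```python
from fractions import Fraction
from itertools import product

def stake(past):
    """A hypothetical predictable stake: one plus the number of earlier wins."""
    return 1 + sum(1 for w in past if w == 1)

def second_moments(n):
    """Return (E(Y_n^2), sum of E(X_i^2) for i <= n), where
    X_i = stake(W_0, ..., W_{i-1}) * W_i and Y_n = X_0 + ... + X_n."""
    e_y2 = Fraction(0)
    e_x2 = Fraction(0)
    prob = Fraction(1, 2 ** (n + 1))          # each sign pattern is equally likely
    for ws in product((-1, 1), repeat=n + 1):
        xs = [stake(ws[:i]) * ws[i] for i in range(n + 1)]
        e_y2 += prob * sum(xs) ** 2
        e_x2 += prob * sum(x * x for x in xs)
    return e_y2, e_x2

print(second_moments(5))   # the two entries agree: the cross terms E(X_i * X_j) vanish
```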

276C The strong law of large numbers: fourth form Let ⟨Xn⟩n∈N be a martingale difference sequence, and
suppose that ⟨bn⟩n∈N is a non-decreasing sequence in ]0, ∞[, diverging to ∞, such that $\sum_{n=0}^\infty \frac{1}{b_n^2} \operatorname{Var}(X_n) < \infty$. Then
$$\lim_{n\to\infty} \frac{1}{b_n} \sum_{i=0}^n X_i = 0$$
almost everywhere.
proof (Compare 273D.) As usual, write (Ω, Σ, µ) for the underlying probability space. Set
$$\tilde X_n = \frac{1}{b_n} X_n$$
for each n; then ⟨X̃n⟩n∈N also is a martingale difference sequence, and
$$\sum_{n=1}^\infty E(\tilde X_n^2) = \sum_{n=1}^\infty \frac{1}{b_n^2} \operatorname{Var}(X_n) < \infty.$$
By 276B, ⟨X̃n(ω)⟩n∈N is summable for almost every ω ∈ Ω. But by 273Cb,
$$\lim_{n\to\infty} \frac{1}{b_n} \sum_{i=0}^n X_i(\omega) = \lim_{n\to\infty} \frac{1}{b_n} \sum_{i=0}^n b_i \tilde X_i(\omega) = 0$$
for all such ω. So we have the result.
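
Example (a simulation sketch; the stake rule, the seed and the function name are assumptions). With bn = n + 1 and bounded variances the hypothesis of 276C holds, so the averages of a dependent but ‘fair’ sequence should drift to 0. The fragment below follows one path of Xn = Zn × Wn, where Wn is a fair sign and the stake Zn is 2 after a win and 1 after a loss.

```python
import random

def normalized_sum(n_steps, seed=0):
    """Return (X_0 + ... + X_{n_steps-1}) / n_steps along one simulated path.

    X_n = stake * W_n, where W_n is a fair sign and the stake is fixed
    before W_n is observed, so the X_n form a martingale difference sequence.
    """
    rng = random.Random(seed)
    total, previous = 0.0, 1
    for _ in range(n_steps):
        w = 1 if rng.random() < 0.5 else -1
        stake = 2 if previous == 1 else 1     # depends only on the previous outcome
        total += stake * w
        previous = w
    return total / n_steps

for n in (100, 10_000, 200_000):
    print(n, normalized_sum(n))
```

A single path proves nothing, of course; the point is only to see the normalization by bn at work on a sequence that is not independent.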

276D Corollary Let ⟨Xn⟩n∈N be a martingale such that bn = E(Xn²) is finite for each n.
(a) If supn∈N bn is infinite, then $\lim_{n\to\infty} \frac{1}{b_n} X_n = 0$ a.e.
(b) If $\sup_{n\ge 1} \frac{1}{n} b_n < \infty$, then $\lim_{n\to\infty} \frac{1}{n} X_n = 0$ a.e.
proof Consider the martingale difference sequence ⟨Yn⟩n∈N = ⟨Xn+1 − Xn⟩n∈N. Then E(Yn × Xn) = 0, so
$E(Y_n^2) + E(X_n^2) = E(X_{n+1}^2)$ for each n. In particular, ⟨bn⟩n∈N must be non-decreasing.
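(a) If limn→∞ bn = ∞, take m such that bm > 0; then
$$\sum_{n=m}^\infty \frac{1}{b_{n+1}^2} \operatorname{Var}(Y_n) = \sum_{n=m}^\infty \frac{1}{b_{n+1}^2} (b_{n+1} - b_n) \le \int_{b_m}^\infty \frac{1}{t^2}\, dt < \infty.$$
By 276C (modifying bi for i < m, if necessary),
$$\lim_{n\to\infty} \frac{1}{b_n} X_n = \lim_{n\to\infty} \frac{1}{b_{n+1}} \Bigl(X_0 + \sum_{i=0}^n Y_i\Bigr) = \lim_{n\to\infty} \frac{1}{b_{n+1}} \sum_{i=0}^n Y_i = 0$$
almost everywhere.
(b) If $\gamma = \sup_{n\ge 1} \frac{1}{n} b_n < \infty$, then $\frac{1}{(n+1)^2} \le \min(1, \gamma^2/t^2)$ for bn < t ≤ bn+1, so
$$\sum_{n=0}^\infty \frac{1}{(n+1)^2} (b_{n+1} - b_n) \le \gamma + \gamma^2 \int_\gamma^\infty \frac{1}{t^2}\, dt < \infty,$$
and, by the same argument as before, $\lim_{n\to\infty} \frac{1}{n} X_n = 0$ a.e.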

276E ‘Impossibility of systems’ (a) I return to the word ‘martingale’ and the idea of a gambling system. Consider
a gambler who takes a sequence of ‘fair’ bets, that is, bets which have payoff expectations of zero, but who chooses
which bets to take on the basis of past experience. The appropriate model for such a sequence of random events is a
martingale in the sense of 275A, taking Σn to be the algebra of all events which are observable up to and including
the outcome of the nth bet, and Xn to be the gambler’s net gain at that time. (In this model it is natural to take
Σ0 = {∅, Ω} and X0 = 0.) Certain paradoxes can arise if we try to imagine this model with atomless Σn ; to begin with
it is perhaps easier to work with the discrete case, in which each Σn is finite, or is the set of unions of some countable
family of atomic events. Now suppose that the bets involved are just two-way bets, with two equally likely outcomes,
but that the gambler chooses his stake each time. In this case we can think of the outcomes as corresponding to an
independent sequence hWn in∈N of random variables, each taking the values ±1 with equal probability. The gambler’s
system must be of the form
Xn+1 = Xn + Zn+1 × Wn+1 ,
where Zn+1 is his stake on the (n + 1)-st bet, and must be constant on each atom of the σ-algebra Σn generated by
W1, . . . , Wn. The point is that because $\int_E W_{n+1} = 0$ for each E ∈ Σn, E(Zn+1 × Wn+1) = 0, so E(Xn+1) = E(Xn).

(b) The general result, of which this is a special case, is the following. If ⟨Wn⟩n∈N is a martingale difference sequence
adapted to ⟨Σn⟩n∈N, and ⟨Zn⟩n≥1 is a sequence of random variables such that (i) Zn is Σn−1-measurable (ii) Zn × Wn
has finite expectation for each n ≥ 1, then W0, Z1 × W1, Z2 × W2, . . . is a martingale difference sequence adapted to
⟨Σn⟩n∈N; the proof that $\int_E Z_{n+1} \times W_{n+1} = 0$ for every E ∈ Σn is exactly the argument of (b) of the proof of 276B.

(c) I invited you to restrict your ideas to the discrete case for a moment; but if you feel that you understand what
it means to say that a ‘system’ or predictable sequence ⟨Zn⟩n≥1 must be adapted to ⟨Σn⟩n∈N, in the sense that
every Zn is Σn−1-measurable, then any further difficulty lies in the measure theory needed to show that the integrals
$\int_E Z_{n+1} \times W_{n+1}$ are zero, which is what this book is about.

(d) Consider the gambling system mentioned in 275Cf. Here the idea is that Wn = ±1, as in (a), and Zn+1 = 2^n a
if Xn ≤ 0, 0 if Xn > 0; that is, the gambler doubles his stake each time until he wins, and then quits. Of course he
is almost sure to win eventually, so we have limn→∞ Xn = a almost everywhere, even though E(Xn) = 0 for every n.
We can compute the distribution of Xn: for n ≥ 1 we have Pr(Xn = a) = 1 − 2^{−n}, Pr(Xn = −(2^n − 1)a) = 2^{−n}.
Thus E(|Xn|) = (2 − 2^{−n+1})a and the almost-everywhere convergence of the Xn is an example of Doob’s Martingale
Convergence Theorem.
In the language of stopping times (275N), Xn = Ỹτ∧n, where $Y_n = \sum_{k=0}^n 2^k a W_k$ and τ = min{n : Yn > 0}.
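
Example (a sketch; the function name is hypothetical). The distribution of Xn stated in (d) can be confirmed by enumerating all 2^n outcomes of the first n bets, applying the rule Zk+1 = 2^k a while Xk ≤ 0 and Zk+1 = 0 afterwards.

```python
from fractions import Fraction
from itertools import product

def doubling_law(n, a=1):
    """Exact law of X_n under the doubling system of (d).

    Bet k+1 carries the stake 2**k * a while X_k <= 0, and the stake 0 once X_k > 0.
    Returns a dict mapping each value of X_n to its probability.
    """
    law = {}
    for ws in product((-1, 1), repeat=n):
        x = 0
        for k, w in enumerate(ws):
            z = 2 ** k * a if x <= 0 else 0   # the stake is fixed before the outcome w is seen
            x += z * w
        law[x] = law.get(x, Fraction(0)) + Fraction(1, 2 ** n)
    return law

law = doubling_law(6)
mean = sum(x * p for x, p in law.items())
mean_abs = sum(abs(x) * p for x, p in law.items())
print(law, mean, mean_abs)
```

For n = 6 and a = 1 this gives Pr(X6 = 1) = 63/64 and Pr(X6 = −63) = 1/64, with E(X6) = 0 and E(|X6|) = 63/32 = 2 − 2^{−5}, matching the formulae above.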

*276F I come now to Komlós’ theorem. The first step is a trifling refinement of 276C.
Lemma Let (Ω, Σ, µ) be a probability space, and ⟨Σn⟩n∈N a non-decreasing sequence of σ-subalgebras of Σ. Suppose
that ⟨Xn⟩n∈N is a sequence of random variables on Ω such that (i) Xn is Σn-measurable for each n (ii) $\sum_{n=0}^\infty \frac{1}{(n+1)^2} E(X_n^2)$
is finite (iii) limn→∞ Xn′ = 0 a.e., where Xn′ is a conditional expectation of Xn on Σn−1 for each n ≥ 1. Then
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{k=0}^n X_k = 0 \text{ a.e.}$$

proof Making suitable adjustments on a negligible set if necessary, we may suppose that Xn′ is Σn−1-measurable for
n ≥ 1 and that every Xn and Xn′ is defined on the whole of Ω. Set X0′ = X0 and Yn = Xn − Xn′ for n ∈ N. Then
⟨Yn⟩n∈N is a martingale difference sequence adapted to ⟨Σn⟩n∈N. Also E(Yn²) ≤ E(Xn²) for every n. P If n ≥ 1, Xn′ is
square-integrable (244M), and E(Yn × Xn′) = 0, as in part (b) of the proof of 276B. Now
$$E(X_n^2) = E((Y_n + X_n')^2) = E(Y_n^2) + 2E(Y_n \times X_n') + E((X_n')^2) \ge E(Y_n^2).$$
Q
This means that $\sum_{n=0}^\infty \frac{1}{(n+1)^2} E(Y_n^2)$ must be finite. By 276C, $\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^n Y_i = 0$ a.e. But by 273Ca we also
have $\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^n X_i' = 0$ whenever limn→∞ Xn′ = 0, which is almost everywhere. So $\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^n X_i = 0$
a.e.

*276G Lemma Let (Ω, Σ, µ) be a probability space, and ⟨Xn⟩n∈N a sequence of random variables on Ω such that
supn∈N E(|Xn|) is finite. For k ∈ N and x ∈ R set Fk(x) = x if |x| ≤ k, 0 otherwise. Let F be an ultrafilter on N.
(a) For each k ∈ N there is a measurable function Yk : Ω → [−k, k] such that $\lim_{n\to\mathcal{F}} \int_E F_k(X_n) = \int_E Y_k$ for every
E ∈ Σ.
(b) limn→F E((Fk(Xn) − Yk)²) ≤ limn→F E(Fk(Xn)²) for each k.
(c) Y = limk→∞ Yk is defined a.e. and limk→∞ E(|Y − Yk|) = 0.
proof (a) For each $k$, $|F_k(X_n)| \le_{\text{a.e.}} k\chi\Omega$ for every $n$, so that $\{F_k(X_n) : n \in \mathbb{N}\}$ is uniformly integrable, and $\{F_k(X_n)^\bullet : n \in \mathbb{N}\}$ is relatively weakly compact in $L^1 = L^1(\mu)$ (247C). Accordingly $v_k = \lim_{n\to\mathcal{F}} F_k(X_n)^\bullet$ is defined in $L^1$ (2A3Se); take $Y_k : \Omega \to \mathbb{R}$ to be a measurable function such that $Y_k^\bullet = v_k$. For any $E \in \Sigma$,
$$\int_E Y_k = \int v_k \times (\chi E)^\bullet = \lim_{n\to\mathcal{F}} \int_E F_k(X_n).$$
In particular,
$$\Bigl|\int_E Y_k\Bigr| \le \sup_{n\in\mathbb{N}} \Bigl|\int_E F_k(X_n)\Bigr| \le k\mu E$$
for every $E$, so that $\{\omega : Y_k(\omega) > k\}$ and $\{\omega : Y_k(\omega) < -k\}$ are both negligible; changing $Y_k$ on a negligible set if necessary, we may suppose that $|Y_k(\omega)| \le k$ for every $\omega \in \Omega$.
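(b) Because $Y_k$ is bounded, $Y_k^\bullet \in L^\infty(\mu)$, and
$$\lim_{n\to\mathcal{F}} \int F_k(X_n) \times Y_k = \lim_{n\to\mathcal{F}} \int F_k(X_n)^\bullet \times Y_k^\bullet = \int Y_k^\bullet \times Y_k^\bullet = \int Y_k^2.$$
Accordingly
$$\lim_{n\to\mathcal{F}} \int (F_k(X_n) - Y_k)^2 = \lim_{n\to\mathcal{F}} \int F_k(X_n)^2 - 2\lim_{n\to\mathcal{F}} \int F_k(X_n) \times Y_k + \int Y_k^2 = \lim_{n\to\mathcal{F}} \int F_k(X_n)^2 - \int Y_k^2 \le \lim_{n\to\mathcal{F}} \int F_k(X_n)^2.$$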

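(c) Set $W_0 = Y_0 = 0$, $W_k = Y_k - Y_{k-1}$ for $k \ge 1$. Then $E(|W_k|) \le \lim_{n\to\mathcal{F}} E(|F_k(X_n) - F_{k-1}(X_n)|)$ for every $k \ge 1$. P Set $E = \{\omega : W_k(\omega) \ge 0\}$. Then
$$\int_E W_k = \int_E Y_k - \int_E Y_{k-1} = \lim_{n\to\mathcal{F}} \int_E F_k(X_n) - \lim_{n\to\mathcal{F}} \int_E F_{k-1}(X_n) = \lim_{n\to\mathcal{F}} \int_E \bigl(F_k(X_n) - F_{k-1}(X_n)\bigr) \le \lim_{n\to\mathcal{F}} \int_E |F_k(X_n) - F_{k-1}(X_n)|.$$
Similarly,
$$\Bigl|\int_{\Omega\setminus E} W_k\Bigr| \le \lim_{n\to\mathcal{F}} \int_{\Omega\setminus E} |F_k(X_n) - F_{k-1}(X_n)|.$$
So
$$E(|W_k|) = \int_E W_k - \int_{\Omega\setminus E} W_k \le \lim_{n\to\mathcal{F}} \int |F_k(X_n) - F_{k-1}(X_n)|.$$ Q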
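It follows that $\sum_{k=0}^\infty E(|W_k|)$ is finite. P For any $m \ge 1$,
$$\sum_{k=0}^m E(|W_k|) \le \sum_{k=1}^m \lim_{n\to\mathcal{F}} E(|F_k(X_n) - F_{k-1}(X_n)|) = \lim_{n\to\mathcal{F}} E\Bigl(\sum_{k=1}^m |F_k(X_n) - F_{k-1}(X_n)|\Bigr) = \lim_{n\to\mathcal{F}} E(|F_m(X_n)|) \le \sup_{n\in\mathbb{N}} E(|X_n|).$$
So $\sum_{k=0}^\infty E(|W_k|) \le \sup_{n\in\mathbb{N}} E(|X_n|)$ is finite. Q
By B.Levi's theorem (123A), $\lim_{m\to\infty} \sum_{k=0}^m |W_k|$ is finite a.e., so that
$$Y = \lim_{m\to\infty} Y_m = \sum_{k=0}^\infty W_k$$
is defined a.e.; and moreover
$$E(|Y - Y_k|) \le \lim_{m\to\infty} E\Bigl(\sum_{j=k+1}^m |W_j|\Bigr) \to 0$$
as $k \to \infty$.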
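The step $\sum_{k=1}^m |F_k(X_n) - F_{k-1}(X_n)| = |F_m(X_n)|$ used in part (c) is a pointwise identity for the truncation functions $F_k$. The Python sketch below checks it on a few sample values; the helper names `F` and `telescoped` are illustrative and not part of the text.

```python
def F(k, x):
    """Truncation from 276G: F_k(x) = x if |x| <= k, and 0 otherwise."""
    return x if abs(x) <= k else 0

def telescoped(m, x):
    """Sum over k = 1..m of |F_k(x) - F_{k-1}(x)|."""
    return sum(abs(F(k, x) - F(k - 1, x)) for k in range(1, m + 1))

if __name__ == "__main__":
    m = 4
    for x in [-3.5, -2, 0, 0.5, 1, 2.5, 4, 10]:
        # The two printed values agree: at most one term of the sum is
        # non-zero, namely the one with k - 1 < |x| <= k.
        print(x, telescoped(m, x), abs(F(m, x)))
```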

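*276H Komlós’ theorem (Komlós 67) Let $(\Omega, \Sigma, \mu)$ be any measure space, and $\langle X_n\rangle_{n\in\mathbb{N}}$ a sequence of integrable real-valued functions on $\Omega$ such that $\sup_{n\in\mathbb{N}} \int |X_n|$ is finite. Then there are a subsequence $\langle X_n'\rangle_{n\in\mathbb{N}}$ of $\langle X_n\rangle_{n\in\mathbb{N}}$ and an integrable function $Y$ such that $Y =_{\text{a.e.}} \lim_{n\to\infty} \frac{1}{n+1}\sum_{i=0}^n X_i''$ whenever $\langle X_n''\rangle_{n\in\mathbb{N}}$ is a subsequence of $\langle X_n'\rangle_{n\in\mathbb{N}}$.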
proof Since neither the hypothesis nor the conclusion is affected by changing the $X_n$ on a negligible set, we may suppose throughout that every $X_n$ is measurable and defined on the whole of $\Omega$. In addition, to begin with (down to the end of (e) below), let us suppose that $\mu\Omega = 1$. As in 276G, set $F_k(x) = x$ for $|x| \le k$, 0 for $|x| > k$.
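(a) Let $\mathcal{F}$ be any non-principal ultrafilter on $\mathbb{N}$ (2A1O). For $j \in \mathbb{N}$ set $p_j = \lim_{n\to\mathcal{F}} \Pr(|X_n| > j)$. Then $\sum_{j=0}^\infty p_j$ is finite. P For any $k \in \mathbb{N}$,
$$\sum_{j=0}^k p_j = \sum_{j=0}^k \lim_{n\to\mathcal{F}} \Pr(|X_n| > j) = \lim_{n\to\mathcal{F}} \sum_{j=0}^k \Pr(|X_n| > j) \le \lim_{n\to\mathcal{F}} \int (1 + |X_n|) \le 1 + \sup_{n\in\mathbb{N}} \int |X_n|.$$
So $\sum_{j=0}^\infty p_j \le 1 + \sup_{n\in\mathbb{N}} \int |X_n|$ is finite. Q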
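Setting
$$p_j' = p_j - p_{j+1} = \lim_{n\to\mathcal{F}} \Pr(j < |X_n| \le j + 1)$$
for each $j$, we have
$$\sum_{j=0}^\infty (j+1)p_j' = \lim_{m\to\infty} \Bigl(\sum_{j=0}^m (j+1)p_j - \sum_{j=0}^m (j+1)p_{j+1}\Bigr) = \lim_{m\to\infty} \Bigl(\sum_{j=0}^m p_j - (m+1)p_{m+1}\Bigr) \le \sum_{j=0}^\infty p_j < \infty.$$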
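The rearrangement in the last display is a summation by parts. As a sanity check, the following Python sketch verifies the finite identity $\sum_{j=0}^m (j+1)(p_j - p_{j+1}) = \sum_{j=0}^m p_j - (m+1)p_{m+1}$ in exact arithmetic; the particular sequence $p_j = 1/(j+1)^3$ is a hypothetical example chosen only for illustration, and `abel_sides` is not a name from the text.

```python
from fractions import Fraction

def abel_sides(p, m):
    """Return both sides of the summation-by-parts identity for p[0..m+1]."""
    lhs = sum((j + 1) * (p[j] - p[j + 1]) for j in range(m + 1))
    rhs = sum(p[j] for j in range(m + 1)) - (m + 1) * p[m + 1]
    return lhs, rhs

if __name__ == "__main__":
    # Hypothetical non-increasing sequence standing in for the p_j.
    p = [Fraction(1, (j + 1) ** 3) for j in range(50)]
    for m in (0, 1, 5, 20):
        lhs, rhs = abel_sides(p, m)
        print(m, lhs == rhs, float(lhs))
```

Because the $p_j$ are non-negative, dropping the term $(m+1)p_{m+1}$ gives the bound by $\sum_j p_j$ used in the proof.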

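Next,
$$\lim_{n\to\mathcal{F}} \int F_k(X_n)^2 \le \sum_{j=0}^k (j+1)^2 p_j'$$
for each $k$. P Setting $E_{jn} = \{\omega : j < |X_n(\omega)| \le j + 1\}$ for $j$, $n \in \mathbb{N}$, so that $\lim_{n\to\mathcal{F}} \mu E_{jn} = p_j'$, we have $F_k(X_n)^2 \le \sum_{j=0}^k (j+1)^2 \chi E_{jn}$, so
$$\lim_{n\to\mathcal{F}} \int F_k(X_n)^2 \le \lim_{n\to\mathcal{F}} \sum_{j=0}^k (j+1)^2 \mu E_{jn} = \sum_{j=0}^k (j+1)^2 p_j'.$$ Q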

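(b) Define $\langle Y_k\rangle_{k\in\mathbb{N}}$ and $Y =_{\text{a.e.}} \lim_{k\to\infty} Y_k$ from $\langle X_n\rangle_{n\in\mathbb{N}}$ and $\mathcal{F}$ as in Lemma 276G. Then
$$J_k = \Bigl\{n : n \in \mathbb{N},\ \int (F_k(X_n) - Y_k)^2 \le 1 + \sum_{j=0}^k (j+1)^2 p_j'\Bigr\}$$
belongs to $\mathcal{F}$ for every $k \in \mathbb{N}$. P By (a) above and 276Gb,
$$\lim_{n\to\mathcal{F}} \int (F_k(X_n) - Y_k)^2 \le \lim_{n\to\mathcal{F}} \int F_k(X_n)^2 \le \sum_{j=0}^k (j+1)^2 p_j'.$$ Q
Also, of course,
$$K_k = \{n : n \in \mathbb{N},\ \Pr(F_j(X_n) \ne X_n) \le p_j + 2^{-j} \text{ for every } j \le k\}$$
belongs to $\mathcal{F}$ for every $k$.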
(c) For $n$, $k \in \mathbb{N}$ let $Z_{kn}$ be a simple function such that $|Z_{kn}| \le |F_k(X_n) - Y_k|$ and $\int |F_k(X_n) - Y_k - Z_{kn}| \le 2^{-k}$. For $m \in \mathbb{N}$ let $\Sigma_m$ be the algebra of subsets of $\Omega$ generated by sets of the form $\{\omega : Z_{kn}(\omega) = \alpha\}$ for $k$, $n \le m$ and $\alpha \in \mathbb{R}$. Because each $Z_{kn}$ takes only finitely many values, $\Sigma_m$ is finite (and is therefore a σ-subalgebra of $\Sigma$); and of course $\Sigma_m \subseteq \Sigma_{m+1}$ for every $m$.
We need to look at conditional expectations on the $\Sigma_m$, and because $\Sigma_m$ is always finite these have a particularly straightforward expression. Let $\mathcal{A}_m$ be the set of ‘atoms’, or minimal non-empty sets, in $\Sigma_m$; that is, the set of equivalence classes in $\Omega$ under the relation $\omega \sim \omega'$ if $Z_{kn}(\omega) = Z_{kn}(\omega')$ for all $k$, $n \le m$. For any integrable random variable $X$ on $\Omega$, define $E_m(X)$ by setting
$$E_m(X)(\omega) = \frac{1}{\mu A}\int_A X \text{ if } \omega \in A \in \mathcal{A}_m \text{ and } \mu A > 0,$$
$$E_m(X)(\omega) = 0 \text{ if } \omega \in A \in \mathcal{A}_m \text{ and } \mu A = 0.$$
Then $E_m(X)$ is a conditional expectation of $X$ on $\Sigma_m$.
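To make the formula for $E_m(X)$ concrete, here is a minimal Python sketch on a finite probability space, assuming the atoms are described by a labelling function on the points. The names `conditional_expectation`, `mu`, `X` and `label` are illustrative assumptions, not notation from the text.

```python
from collections import defaultdict
from fractions import Fraction

def conditional_expectation(mu, X, label):
    """Average X over each atom of a finite algebra.

    mu:    dict mapping each point to its mass
    X:     dict mapping each point to the value of the random variable
    label: function sending a point to an identifier of its atom
    Returns a dict point -> E_m(X)(point), with value 0 on null atoms.
    """
    mass = defaultdict(Fraction)      # mu A for each atom A
    integral = defaultdict(Fraction)  # integral of X over each atom A
    for w, p in mu.items():
        mass[label(w)] += p
        integral[label(w)] += X[w] * p
    return {
        w: integral[label(w)] / mass[label(w)] if mass[label(w)] > 0 else Fraction(0)
        for w in mu
    }

if __name__ == "__main__":
    # Six equally likely points; the atoms are the even and the odd points.
    mu = {w: Fraction(1, 6) for w in range(6)}
    X = {w: w * w for w in range(6)}
    print(conditional_expectation(mu, X, lambda w: w % 2))
```

The result is constant on each atom and has the same integral as $X$ over each atom, which is exactly what the defining property of a conditional expectation on a finite algebra requires.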
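Now
$$\lim_{n\to\mathcal{F}} \int |E_m(F_k(X_n) - Y_k)| = \lim_{n\to\mathcal{F}} \sum_{A\in\mathcal{A}_m} \int_A |E_m(F_k(X_n) - Y_k)| = \lim_{n\to\mathcal{F}} \sum_{A\in\mathcal{A}_m} \Bigl|\int_A E_m(F_k(X_n) - Y_k)\Bigr|$$
(because $E_m(F_k(X_n) - Y_k)$ is constant on each $A \in \mathcal{A}_m$)
$$= \lim_{n\to\mathcal{F}} \sum_{A\in\mathcal{A}_m} \Bigl|\int_A F_k(X_n) - Y_k\Bigr| = \sum_{A\in\mathcal{A}_m} \lim_{n\to\mathcal{F}} \Bigl|\int_A F_k(X_n) - Y_k\Bigr| = 0$$
by the choice of $Y_k$. So if we set
$$I_m = \Bigl\{n : n \in \mathbb{N},\ \int |E_m(F_k(X_n) - Y_k)| \le 2^{-k} \text{ for every } k \le m\Bigr\},$$
then $I_m \in \mathcal{F}$ for every $m$.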
(d) Suppose that $\langle r(n)\rangle_{n\in\mathbb{N}}$ is any strictly increasing sequence in $\mathbb{N}$ such that $r(0) > 0$, $r(n) \in J_n \cap K_n$ for every $n$ and $r(n) \in I_{r(n-1)}$ for $n \ge 1$. Then $\frac{1}{n+1}\sum_{i=0}^n X_{r(i)} \to Y$ a.e. as $n \to \infty$. P Express $X_{r(n)}$ as
$$(X_{r(n)} - F_n(X_{r(n)})) + (F_n(X_{r(n)}) - Y_n - Z_{n,r(n)}) + Y_n + Z_{n,r(n)}$$
for each $n$. Taking these pieces in turn:
(i)
$$\sum_{n=0}^\infty \Pr(X_{r(n)} \ne F_n(X_{r(n)})) \le \sum_{n=0}^\infty (p_n + 2^{-n})$$
(because $r(n) \in K_n$ for every $n$)
$$< \infty$$
by (a). But this means that $X_{r(n)} - F_n(X_{r(n)}) \to 0$ a.e., since the sequence is eventually zero at almost every point, and $\frac{1}{n+1}\sum_{i=0}^n (X_{r(i)} - F_i(X_{r(i)})) \to 0$ a.e. by 273Ca.

(ii) By the choice of the $Z_{n,r(n)}$,
$$\sum_{n=0}^\infty \int |F_n(X_{r(n)}) - Y_n - Z_{n,r(n)}| \le \sum_{n=0}^\infty 2^{-n}$$
is finite, so $F_n(X_{r(n)}) - Y_n - Z_{n,r(n)} \to 0$ a.e. and $\frac{1}{n+1}\sum_{i=0}^n (F_i(X_{r(i)}) - Y_i - Z_{i,r(i)}) \to 0$ a.e.

(iii) By 276G, $Y_n \to Y$ a.e. and $\frac{1}{n+1}\sum_{i=0}^n Y_i \to Y$ a.e.
(iv) We know that, for each $n \ge 1$, $r(n) \in I_{r(n-1)}$. So (because $r(n-1) \ge n$) $\int |E_{r(n-1)}(F_n(X_{r(n)}) - Y_n)| \le 2^{-n}$. But as also
$$\int |E_{r(n-1)}(F_n(X_{r(n)}) - Y_n - Z_{n,r(n)})| \le \int |F_n(X_{r(n)}) - Y_n - Z_{n,r(n)}| \le 2^{-n}$$
by 244M and the choice of $Z_{n,r(n)}$,
$$\int |E_{r(n-1)} Z_{n,r(n)}| = \int \bigl|E_{r(n-1)}(F_n(X_{r(n)}) - Y_n) - E_{r(n-1)}(F_n(X_{r(n)}) - Y_n - Z_{n,r(n)})\bigr| \le 2^{-n+1}$$
for every $n$. Accordingly $E_{r(n-1)} Z_{n,r(n)} \to 0$ a.e.
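On the other hand,
$$\sum_{n=0}^\infty \frac{1}{(n+1)^2} \int Z_{n,r(n)}^2 \le \sum_{n=0}^\infty \frac{1}{(n+1)^2} \int (F_n(X_{r(n)}) - Y_n)^2 \le \sum_{n=0}^\infty \frac{1}{(n+1)^2} \Bigl(1 + \sum_{j=0}^n (j+1)^2 p_j'\Bigr)$$
(because $r(n) \in J_n$)
$$\le \sum_{n=0}^\infty \frac{1}{(n+1)^2} + \sum_{j=0}^\infty (j+1)^2 p_j' \sum_{n=j}^\infty \frac{1}{(n+1)^2} \le \frac{\pi^2}{6} + 2\sum_{j=0}^\infty (j+1) p_j'$$
is finite. (I am using the estimate
$$\sum_{n=j}^\infty \frac{1}{(n+1)^2} \le \sum_{n=j}^\infty \Bigl(\frac{2}{n+1} - \frac{2}{n+2}\Bigr) = \frac{2}{j+1}.)$$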
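The telescoping estimate in the parenthesis can be checked on finite partial sums, where it reads $\sum_{n=j}^N \frac{1}{(n+1)^2} \le \frac{2}{j+1} - \frac{2}{N+2}$. The Python sketch below does this in exact arithmetic; `partial_tail` and `telescoped_bound` are illustrative names, not notation from the text.

```python
from fractions import Fraction

def partial_tail(j, N):
    """Sum of 1/(n+1)^2 for n = j..N."""
    return sum(Fraction(1, (n + 1) ** 2) for n in range(j, N + 1))

def telescoped_bound(j, N):
    """Sum of 2/(n+1) - 2/(n+2) for n = j..N, in closed form."""
    return Fraction(2, j + 1) - Fraction(2, N + 2)

if __name__ == "__main__":
    for j in range(4):
        for N in (j, j + 10, 300):
            print(j, N, float(partial_tail(j, N)), float(telescoped_bound(j, N)))
```

The termwise comparison behind it is $\frac{1}{(n+1)^2} \le \frac{2}{(n+1)(n+2)}$, which holds because $n + 2 \le 2(n+1)$ for every $n \ge 0$.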
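By 276F, applied to $\langle\Sigma_{r(n)}\rangle_{n\in\mathbb{N}}$ and $\langle Z_{n,r(n)}\rangle_{n\in\mathbb{N}}$, $\frac{1}{n+1}\sum_{i=0}^n Z_{i,r(i)} \to 0$ a.e.

(v) Adding these four components, we see that $\frac{1}{n+1}\sum_{i=0}^n X_{r(i)} \to Y$ a.e., as claimed. Q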

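(e) Now fix any strictly increasing sequence $\langle s(n)\rangle_{n\in\mathbb{N}}$ in $\mathbb{N}$ such that $s(0) > 0$, $s(n) \in J_n \cap K_n$ for every $n$ and $s(n) \in I_{s(n-1)}$ for $n \ge 1$; such a sequence exists because $J_n \cap K_n \cap I_{s(n-1)}$ belongs to $\mathcal{F}$, so is infinite, for every $n \ge 1$. Set $X_n' = X_{s(n)}$ for every $n$. If $\langle X_n''\rangle_{n\in\mathbb{N}}$ is a subsequence of $\langle X_n'\rangle_{n\in\mathbb{N}}$, then it is of the form $\langle X_{s(r(n))}\rangle_{n\in\mathbb{N}}$ for some strictly increasing sequence $\langle r(n)\rangle_{n\in\mathbb{N}}$. In this case
$$s(r(0)) \ge s(0) > 0,$$
$$s(r(n)) \in J_{r(n)} \cap K_{r(n)} \subseteq J_n \cap K_n \text{ for every } n,$$
$$s(r(n)) \in I_{s(r(n)-1)} \subseteq I_{s(r(n-1))} \text{ for every } n \ge 1.$$
So (d) tells us that $\frac{1}{n+1}\sum_{i=0}^n X_i'' \to Y$ a.e.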

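(f) Thus the theorem is proved in the case in which $(\Omega, \Sigma, \mu)$ is a probability space. Now suppose that $\mu$ is σ-finite and $\mu\Omega > 0$. In this case there is a strictly positive measurable function $f : \Omega \to \mathbb{R}$ such that $\int f\,d\mu = 1$ (215B(ix)). Let $\nu$ be the corresponding indefinite-integral measure (234J), so that $\nu$ is a probability measure on $\Omega$, and $\langle\frac{1}{f} \times X_n\rangle_{n\in\mathbb{N}}$ is a sequence of $\nu$-integrable functions such that $\sup_{n\in\mathbb{N}} \int |\frac{1}{f} \times X_n|\,d\nu$ is finite (235K). From (a)-(e) we see that there must be a $\nu$-integrable function $Y$ and a subsequence $\langle X_n'\rangle_{n\in\mathbb{N}}$ of $\langle X_n\rangle_{n\in\mathbb{N}}$ such that $\frac{1}{n+1}\sum_{i=0}^n \frac{1}{f} \times X_i'' \to Y$ $\nu$-a.e. for every subsequence $\langle X_n''\rangle_{n\in\mathbb{N}}$ of $\langle X_n'\rangle_{n\in\mathbb{N}}$. But $\mu$ and $\nu$ have the same negligible sets (234Lc), so $\frac{1}{n+1}\sum_{i=0}^n X_i'' \to f \times Y$ $\mu$-a.e. for every subsequence $\langle X_n''\rangle_{n\in\mathbb{N}}$ of $\langle X_n'\rangle_{n\in\mathbb{N}}$.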
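(g) Since the result is trivial if $\mu\Omega = 0$, the theorem is true whenever $\mu$ is σ-finite. For the general case, set
$$\tilde\Omega = \bigcup_{n\in\mathbb{N}} \{\omega : X_n(\omega) \ne 0\} = \bigcup_{m,n\in\mathbb{N}} \{\omega : |X_n(\omega)| \ge 2^{-m}\},$$
so that the subspace measure $\mu_{\tilde\Omega}$ is σ-finite. Then there are a $\mu_{\tilde\Omega}$-integrable function $\tilde Y$ and a subsequence $\langle X_n'\rangle_{n\in\mathbb{N}}$ of $\langle X_n\rangle_{n\in\mathbb{N}}$ such that $\frac{1}{n+1}\sum_{i=0}^n X_i''{\restriction}\tilde\Omega \to \tilde Y$ $\mu_{\tilde\Omega}$-a.e. for every subsequence $\langle X_n''\rangle_{n\in\mathbb{N}}$ of $\langle X_n'\rangle_{n\in\mathbb{N}}$. Setting $Y(\omega) = \tilde Y(\omega)$ if $\omega \in \tilde\Omega$, 0 for $\omega \in \Omega \setminus \tilde\Omega$, we see that $Y$ is $\mu$-integrable and that $\frac{1}{n+1}\sum_{i=0}^n X_i'' \to Y$ $\mu$-a.e. whenever $\langle X_n''\rangle_{n\in\mathbb{N}}$ is a subsequence of $\langle X_n'\rangle_{n\in\mathbb{N}}$. This completes the proof.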

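276X Basic exercises > (a) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale adapted to a sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ of σ-algebras. Show that $\int_E X_n^2 \le \int_E X_{n+1}^2$ for every $n \in \mathbb{N}$, $E \in \Sigma_n$ (allowing $\infty$ as a value of an integral). (Hint: see the proof of 276B.)

> (b) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale. Show that for any $\epsilon > 0$,
$$\Pr(\sup_{n\in\mathbb{N}} |X_n| \ge \epsilon) \le \frac{1}{\epsilon^2} \sup_{n\in\mathbb{N}} E(X_n^2).$$
(Hint: put 276Xa together with the argument for 275D.)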

(c) When does 276Xb give a sharper result than 275Xb?


(d) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale difference sequence and set $Y_n = \frac{1}{n+1}(X_0 + \ldots + X_n)$ for each $n \in \mathbb{N}$. Show that if $\langle X_n\rangle_{n\in\mathbb{N}}$ is uniformly integrable then $\lim_{n\to\infty} \|Y_n\|_1 = 0$. (Hint: use the argument of 273Na, with 276C in place of 273D, and setting $\tilde X_n = X_n' - Z_n$, where $Z_n$ is an appropriate conditional expectation of $X_n'$.)
>(e) Strong law of large numbers: fifth form A sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of random variables is exchangeable if $(X_{n_0}, \ldots, X_{n_k})$ has the same joint distribution as $(X_0, \ldots, X_k)$ whenever $n_0, \ldots, n_k$ are distinct. Show that if $\langle X_n\rangle_{n\in\mathbb{N}}$ is an exchangeable sequence of random variables with finite expectation, then $\langle\frac{1}{n+1}\sum_{i=0}^n X_i\rangle_{n\in\mathbb{N}}$ converges a.e. (Hint: 276H.)

(f) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent identically distributed sequence of random variables with zero expectation and non-zero finite variance, and $\langle t_n\rangle_{n\in\mathbb{N}}$ a sequence in $\mathbb{R}$. Show that (i) if $\sum_{n=0}^\infty t_n^2 < \infty$, then $\sum_{n=0}^\infty t_n X_n$ is defined in $\mathbb{R}$ a.e. (ii) if $\sum_{n=0}^\infty t_n^2 = \infty$ then $\sum_{n=0}^\infty t_n X_n$ is undefined a.e. (Hint: 276B, 274Xk.)
(g) Suppose that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a uniformly bounded martingale difference sequence and $\langle a_n\rangle_{n\in\mathbb{N}} \in \ell^2$. Show that $\lim_{n\to\infty} \prod_{i=0}^n (1 + a_i X_i)$ is defined and finite almost everywhere. (Hint: $\langle a_n X_n\rangle_{n\in\mathbb{N}}$ is summable and square-summable a.e.)

276Y Further exercises (a) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale difference sequence such that $\sup_{n\in\mathbb{N}} \|X_n\|_p$ is finite, where $p \in \,]1, \infty[$. Show that $\lim_{n\to\infty} \|\frac{1}{n+1}\sum_{i=0}^n X_i\|_p = 0$. (Hint: 273Nb.)

(b) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a uniformly integrable martingale difference sequence and $Y$ a bounded random variable. Show that $\lim_{n\to\infty} E(X_n \times Y) = 0$. (Compare 272Ye.)
(c) Use 275Yg to prove 276Xa.
(d) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a sequence of random variables such that, for some $\delta > 0$, $\sup_{n\in\mathbb{N}} n^\delta E(|X_n|)$ is finite. Set $S_n = \frac{1}{n+1}(X_0 + \ldots + X_n)$ for each $n$. Show that $\lim_{n\to\infty} S_n = 0$ a.e. (Hint: set $Z_k = 2^{-k}(|X_0| + \ldots + |X_{2^k-1}|)$. Show that $\sum_{k=0}^\infty E(Z_k) < \infty$. Show that $S_n \le 2Z_{k+1}$ if $2^k < n \le 2^{k+1}$.)
(e) Strong law of large numbers: sixth form Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale difference sequence such that, for some $\delta > 0$, $\sup_{n\in\mathbb{N}} E(|X_n|^{1+\delta})$ is finite. Set $S_n = \frac{1}{n+1}(X_0 + \ldots + X_n)$ for each $n$. Show that $\lim_{n\to\infty} S_n = 0$ a.e. (Hint: take a non-decreasing sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ to which $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted. Set $Y_n = X_n$ when $|X_n| \le n$, 0 otherwise. Let $U_n$ be a conditional expectation of $Y_n$ on $\Sigma_{n-1}$ and set $Z_n = Y_n - U_n$. Use ideas from 273H, 276C and 276Yd above to show that $\frac{1}{n+1}\sum_{i=0}^n V_i \to 0$ a.e. for $V_i = Z_i$, $V_i = U_i$, $V_i = X_i - Y_i$.)

(f) Show that there is a martingale $\langle X_n\rangle_{n\in\mathbb{N}}$ which converges in measure but is not convergent a.e. (Compare 273Ba.) (Hint: arrange that $\{\omega : X_{n+1}(\omega) \ne 0\} = E_n \subseteq \{\omega : |X_{n+1}(\omega) - X_n(\omega)| \ge 1\}$, where $\langle E_n\rangle_{n\in\mathbb{N}}$ is an independent sequence of sets and $\mu E_n = \frac{1}{n+1}$ for each $n$.)
(g) Give an example of an identically distributed martingale difference sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ such that $\langle\frac{1}{n+1}(X_0 + \ldots + X_n)\rangle_{n\in\mathbb{N}}$ does not converge to 0 almost everywhere. (Hint: start by devising a uniformly bounded sequence $\langle U_n\rangle_{n\in\mathbb{N}}$ such that $\lim_{n\to\infty} E(|U_n|) = 0$ but $\langle\frac{1}{n+1}(U_0 + \ldots + U_n)\rangle_{n\in\mathbb{N}}$ does not converge to 0 almost everywhere. Now repeat your construction in such a context that the $U_n$ can be derived from an identically distributed martingale difference sequence by the formulae of 276Ye.)
(h) Construct a proof of Komlós’ theorem which does not involve ultrafilters, or any other use of the full axiom of
choice, but proceeds throughout by selecting appropriate sub-subsequences. Remember to check that you can prove
any fact you use about weakly convergent sequences in L1 on the same rules.

276 Notes and comments I include two more versions of the strong law of large numbers (276C, 276Ye) not because I have any applications in mind but because I think that if you know the strong law for $\|\ \|_{1+\delta}$-bounded independent sequences, and what a martingale difference sequence is, then there is something missing if you do not know the strong law for $\|\ \|_{1+\delta}$-bounded martingale difference sequences. And then, of course, I have to add 276Yf and 276Yg, lest you be tempted to think that the strong law is ‘really’ about martingale difference sequences rather than about independent sequences. (Compare 272Yd and 275Yp.)
Komlós’ theorem is rather outside the scope of this volume; it is quite hard work and surely much less important,
to most probabilists, than many results I have omitted. It does provide a quick proof of 276Xe. However it is relevant
to questions arising in some topics treated in Volumes 3 and 4, and the proof fits naturally into this section.

Chapter 28
Fourier analysis
For the last chapter of this volume, I attempt a brief account of one of the most important topics in analysis. This is
a bold enterprise, and I cannot hope to satisfy the reasonable demands of anyone who knows and loves the subject as it
deserves. But I also cannot pass it by without being false to my own subject, since problems contributed by the study
of Fourier series and transforms have led measure theory throughout its history. What I will try to do, therefore, is to
give versions of those results which everyone ought to know in language unifying them with the rest of this treatise,
aiming to open up a channel for the transfer of intuitions and techniques between the abstract general study of measure
spaces, which is the centre of our work, and this particular family of applications of the theory of integration.
I have divided the material of this chapter, conventionally enough, into three parts: Fourier series, Fourier transforms
and the characteristic functions of probability theory. While it will be obvious that many ideas are common to all
three, I do not think it useful, at this stage, to try to formulate an explicit generalization to unify them; that belongs
to a more general theory of harmonic analysis on groups, which must wait until Volume 4. I begin however with a
section on the Stone-Weierstrass theorem (§281), which is one of the basic tools of functional analysis, as well as being
useful for this chapter. The final section (§286), a proof of Carleson’s theorem, is at a rather different level from the
rest.

281 The Stone-Weierstrass theorem


Before we begin work on the real subject of this chapter, it will be helpful to have a reasonably general statement
of a fundamental theorem on the approximation of continuous functions. In fact I give a variety of forms (281A, 281E,
281F and 281G, together with 281Ya, 281Yd and 281Yg), all of which are sometimes useful. I end the section with a
version of Weyl’s Equidistribution Theorem (281M-281N).

281A Stone-Weierstrass theorem: first form Let $X$ be a topological space and $K$ a compact subset of $X$. Write $C_b(X)$ for the space of all bounded continuous real-valued functions on $X$, so that $C_b(X)$ is a linear space over $\mathbb{R}$. Let $A \subseteq C_b(X)$ be such that
$A$ is a linear subspace of $C_b(X)$;
$|f| \in A$ for every $f \in A$;
$\chi X \in A$;
whenever $x$, $y$ are distinct points of $K$ there is an $f \in A$ such that $f(x) \ne f(y)$.
Then for every continuous $h : K \to \mathbb{R}$ and $\epsilon > 0$ there is an $f \in A$ such that
$|f(x) - h(x)| \le \epsilon$ for every $x \in K$,
if $K \ne \emptyset$, $\inf_{x\in X} f(x) \ge \inf_{x\in K} h(x)$ and $\sup_{x\in X} f(x) \le \sup_{x\in K} h(x)$.
Remark I have stated this theorem in its natural context, that of general topological spaces. But if these are unfamiliar to you, you do not in fact need to know what they are. If you read ‘let $X$ be a topological space’ as ‘let $X$ be a subset of $\mathbb{R}^r$’ and ‘$K$ is a compact subset of $X$’ as ‘$K$ is a subset of $X$ which is closed and bounded in $\mathbb{R}^r$’, you will have enough for all the applications in this chapter. In order to follow the proof, of course, you will need to know a little about compactness in $\mathbb{R}^r$; I have written out the necessary facts in §2A2.
proof (a) If $K$ is empty, then we can take $f = 0$ to be the constant function with value 0. So henceforth let us suppose that $K \ne \emptyset$.
(b) The first point to note is that if $f$, $g \in A$ then $f \wedge g$ and $f \vee g$ belong to $A$, where
$$(f \wedge g)(x) = \min(f(x), g(x)), \quad (f \vee g)(x) = \max(f(x), g(x))$$
for every $x \in X$; this is because
$$f \wedge g = \tfrac12(f + g - |f - g|), \quad f \vee g = \tfrac12(f + g + |f - g|).$$
It follows by induction on $n$ that $f_0 \wedge \ldots \wedge f_n$ and $f_0 \vee \ldots \vee f_n$ belong to $A$ for all $f_0, \ldots, f_n \in A$.
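The two lattice formulae in (b) are pointwise identities between real numbers, and can be checked directly. The Python sketch below builds $f \wedge g$ and $f \vee g$ from sums and absolute values only; `meet` and `join` are illustrative names, and the sample functions are arbitrary choices made for the demonstration.

```python
from fractions import Fraction

def meet(f, g):
    """f ∧ g, written using only +, - and | |, as in part (b)."""
    return lambda x: (f(x) + g(x) - abs(f(x) - g(x))) / 2

def join(f, g):
    """f ∨ g, written using only +, - and | |, as in part (b)."""
    return lambda x: (f(x) + g(x) + abs(f(x) - g(x))) / 2

if __name__ == "__main__":
    f = lambda x: x * x   # sample functions, chosen only for illustration
    g = lambda x: 1 - x
    for x in [Fraction(k, 4) for k in range(-8, 9)]:
        print(x, meet(f, g)(x) == min(f(x), g(x)), join(f, g)(x) == max(f(x), g(x)))
```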
(c) If $x$, $y$ are distinct points of $K$, and $a$, $b \in \mathbb{R}$, there is an $f \in A$ such that $f(x) = a$ and $f(y) = b$. P Start from $g \in A$ such that $g(x) \ne g(y)$; this is the point at which we use the last of the list of four hypotheses on $A$. Set
$$\alpha = \frac{a - b}{g(x) - g(y)}, \quad \beta = \frac{b g(x) - a g(y)}{g(x) - g(y)}, \quad f = \alpha g + \beta\chi X \in A.$$ Q
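The choice of $\alpha$ and $\beta$ in (c) is a two-point interpolation, and it is easy to confirm that it does what is claimed. A minimal Python sketch, assuming a separating function `g` is given as an ordinary callable; `interpolate` is an illustrative name rather than anything from the text.

```python
from fractions import Fraction

def interpolate(g, x, y, a, b):
    """Return f = alpha*g + beta with f(x) = a and f(y) = b, as in part (c).

    Requires g(x) != g(y); this is where the separation hypothesis is used.
    """
    gx, gy = g(x), g(y)
    if gx == gy:
        raise ValueError("g does not separate x and y")
    alpha = (a - b) / (gx - gy)
    beta = (b * gx - a * gy) / (gx - gy)
    return lambda t: alpha * g(t) + beta

if __name__ == "__main__":
    g = lambda t: t ** 3   # a sample separating function
    f = interpolate(g, Fraction(1, 2), Fraction(-2), 5, -7)
    print(f(Fraction(1, 2)), f(Fraction(-2)))
```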

(d) (The heart of the proof lies in the next two paragraphs.) Let $h : K \to \mathbb{R}$ be a continuous function and $x$ any point of $K$. For any $\epsilon > 0$, there is an $f \in A$ such that $f(x) = h(x)$ and $f(y) \le h(y) + \epsilon$ for every $y \in K$. P Let $\mathcal{G}_x$ be the family of those open sets $G \subseteq X$ for which there is some $f \in A$ such that $f(x) = h(x)$ and $f(w) \le h(w) + \epsilon$ for every $w \in K \cap G$. I claim that $K \subseteq \bigcup\mathcal{G}_x$. To see this, take any $y \in K$. By (c), there is an $f \in A$ such that $f(x) = h(x)$ and $f(y) = h(y)$. Now $h - f{\restriction}K : K \to \mathbb{R}$ is a continuous function, taking the value 0 at $y$, so there is an open subset $G$ of $X$, containing $y$, such that $(h - f{\restriction}K)(w) \ge -\epsilon$ for every $w \in G \cap K$, that is, $f(w) \le h(w) + \epsilon$ for every $w \in G \cap K$. Thus $G \in \mathcal{G}_x$ and $y \in \bigcup\mathcal{G}_x$, as required.
Because $K$ is compact, $\mathcal{G}_x$ has a finite subcover $G_0, \ldots, G_n$ say. For each $i \le n$, take $f_i \in A$ such that $f_i(x) = h(x)$ and $f_i(w) \le h(w) + \epsilon$ for every $w \in G_i \cap K$. Then
$$f = f_0 \wedge f_1 \wedge \ldots \wedge f_n \in A,$$
by (b), and evidently $f(x) = h(x)$, while if $y \in K$ there is some $i \le n$ such that $y \in G_i$, so that
$$f(y) \le f_i(y) \le h(y) + \epsilon.$$ Q

(e) If $h : K \to \mathbb{R}$ is any continuous function and $\epsilon > 0$, there is an $f \in A$ such that $|f(y) - h(y)| \le \epsilon$ for every $y \in K$. P This time, let $\mathcal{G}$ be the set of those open subsets $G$ of $X$ for which there is some $f \in A$ such that $f(y) \le h(y) + \epsilon$ for every $y \in K$ and $f(x) \ge h(x) - \epsilon$ for every $x \in G \cap K$. Once again, $\mathcal{G}$ is an open cover of $K$. To see this, take any $x \in K$. By (d), there is an $f \in A$ such that $f(x) = h(x)$ and $f(y) \le h(y) + \epsilon$ for every $y \in K$. Now $h - f{\restriction}K : K \to \mathbb{R}$ is a continuous function which is zero at $x$, so there is an open subset $G$ of $X$, containing $x$, such that $(h - f{\restriction}K)(w) \le \epsilon$ for every $w \in G \cap K$, that is, $f(w) \ge h(w) - \epsilon$ for every $w \in G \cap K$. Thus $G \in \mathcal{G}$ and $x \in \bigcup\mathcal{G}$, as required.
Because $K$ is compact, $\mathcal{G}$ has a finite subcover $G_0, \ldots, G_m$ say. For each $j \le m$, take $f_j \in A$ such that $f_j(y) \le h(y) + \epsilon$ for every $y \in K$ and $f_j(w) \ge h(w) - \epsilon$ for every $w \in G_j \cap K$. Then
$$f = f_0 \vee f_1 \vee \ldots \vee f_m \in A,$$
by (b), and evidently $f(y) \le h(y) + \epsilon$ for every $y \in K$, while if $x \in K$ there is some $j \le m$ such that $x \in G_j$, so that
$$f(x) \ge f_j(x) \ge h(x) - \epsilon.$$
Thus $|f(x) - h(x)| \le \epsilon$ for every $x \in K$, as required. Q
(f) Thus we have an $f$ satisfying the first of the two requirements of the theorem. But for the second, set $M_0 = \inf_{x\in K} h(x)$ and $M_1 = \sup_{x\in K} h(x)$, and
$$f_1 = (M_0\chi X) \vee (f \wedge M_1\chi X);$$
$f_1$ satisfies the second condition as well as the first. (I am tacitly assuming here what is in fact the case, that $M_0$ and $M_1$ are finite; this is because $K$ is compact – see 2A2G or 2A3N.)

281B We need some simple tools, belonging to the basic theory of normed spaces; but I hope they will be accessible
even if you have not encountered ‘normed spaces’ before, if you keep a finger at the beginning of §2A4 as you read the
next lemma.
Lemma Let $X$ be any set. Write $\ell^\infty(X)$ for the set of bounded functions from $X$ to $\mathbb{R}$. For $f \in \ell^\infty(X)$, set
$$\|f\|_\infty = \sup_{x\in X} |f(x)|,$$
counting the supremum as 0 if $X$ is empty. Then
(a) $\ell^\infty(X)$ is a normed space.
(b) Let $A \subseteq \ell^\infty(X)$ be a subset and $\overline{A}$ its closure (2A3D).
(i) If $A$ is a linear subspace of $\ell^\infty(X)$, so is $\overline{A}$.
(ii) If $f \times g \in A$ whenever $f$, $g \in A$, then $f \times g \in \overline{A}$ whenever $f$, $g \in \overline{A}$.
(iii) If $|f| \in A$ whenever $f \in A$, then $|f| \in \overline{A}$ whenever $f \in \overline{A}$.
proof (a) This is a routine verification. To confirm that $\ell^\infty(X)$ is a linear space over $\mathbb{R}$, we have to check that $f + g$, $cf$ belong to $\ell^\infty(X)$ whenever $f$, $g \in \ell^\infty(X)$ and $c \in \mathbb{R}$; simultaneously we can confirm that $\|\ \|_\infty$ is a norm on $\ell^\infty(X)$ by observing that
$$|(f + g)(x)| \le |f(x)| + |g(x)| \le \|f\|_\infty + \|g\|_\infty,$$
$$|cf(x)| = |c||f(x)| \le |c|\|f\|_\infty$$
whenever $f$, $g \in \ell^\infty(X)$ and $c \in \mathbb{R}$. It is worth noting at the same time that if $f$, $g \in \ell^\infty(X)$, then
$$|(f \times g)(x)| = |f(x)||g(x)| \le \|f\|_\infty\|g\|_\infty$$
for every $x \in X$, so that $\|f \times g\|_\infty \le \|f\|_\infty\|g\|_\infty$.
(Of course all these remarks are very elementary special cases of parts of §243; see 243Xl.)
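(b) Recall that
$$\overline{A} = \{f : f \in \ell^\infty(X),\ \forall\,\epsilon > 0\ \exists\, f_1 \in A,\ \|f - f_1\|_\infty \le \epsilon\}$$
(2A3Kb). Take $f$, $g \in \overline{A}$ and $c \in \mathbb{R}$, and let $\epsilon > 0$. Set
$$\eta = \min\Bigl(1, \frac{\epsilon}{2 + |c| + \|f\|_\infty + \|g\|_\infty}\Bigr) > 0.$$
Then there are $f_1$, $g_1 \in A$ such that $\|f - f_1\|_\infty \le \eta$, $\|g - g_1\|_\infty \le \eta$. Now
$$\|(f + g) - (f_1 + g_1)\|_\infty \le \|f - f_1\|_\infty + \|g - g_1\|_\infty \le 2\eta \le \epsilon,$$
$$\|cf - cf_1\|_\infty = |c|\|f - f_1\|_\infty \le |c|\eta \le \epsilon,$$
$$\|(f \times g) - (f_1 \times g_1)\|_\infty = \|(f - f_1) \times g + f \times (g - g_1) - (f - f_1) \times (g - g_1)\|_\infty$$
$$\le \|(f - f_1) \times g\|_\infty + \|f \times (g - g_1)\|_\infty + \|(f - f_1) \times (g - g_1)\|_\infty$$
$$\le \|f - f_1\|_\infty\|g\|_\infty + \|f\|_\infty\|g - g_1\|_\infty + \|f - f_1\|_\infty\|g - g_1\|_\infty$$
$$\le \eta(\|g\|_\infty + \|f\|_\infty + \eta) \le \eta(\|g\|_\infty + \|f\|_\infty + 1) \le \epsilon,$$
$$\||f| - |f_1|\|_\infty \le \|f - f_1\|_\infty \le \eta \le \epsilon.$$
(i) If $A$ is a linear subspace, then $f_1 + g_1$ and $cf_1$ belong to $A$. As $\epsilon$ is arbitrary, $f + g$ and $cf$ belong to $\overline{A}$. As $f$, $g$ and $c$ are arbitrary, $\overline{A}$ is a linear subspace of $\ell^\infty(X)$.
(ii) If $A$ is closed under multiplication, then $f_1 \times g_1 \in A$. As $\epsilon$ is arbitrary, $f \times g \in \overline{A}$.
(iii) If the absolute values of functions in $A$ belong to $A$, then $|f_1| \in A$. As $\epsilon$ is arbitrary, $|f| \in \overline{A}$.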

281C Lemma There is a sequence $\langle p_n\rangle_{n\in\mathbb{N}}$ of real polynomials such that $\lim_{n\to\infty} p_n(x) = |x|$ uniformly for $x \in [-1, 1]$.
proof (a) By the Binomial Theorem we have
$$(1 - x)^{1/2} = 1 - \frac{1}{2}x - \frac{1}{4\cdot 2!}x^2 - \frac{1\cdot 3}{2^3\cdot 3!}x^3 - \ldots = -\sum_{n=0}^\infty \frac{(2n)!}{(2n-1)(2^n n!)^2}x^n$$
whenever $|x| < 1$, with the convergence being uniform on any interval $[-a, a]$ with $0 \le a < 1$. (For a proof of this, see almost any book on real or complex analysis. If you have no favourite text to hand, you can try to construct a proof from the following facts: (i) the radius of convergence of the series is 1, so on any interval $[-a, a]$, with $0 \le a < 1$, it is uniformly absolutely summable (ii) writing $f(x)$ for the sum of the series for $|x| < 1$, use Lebesgue's Dominated Convergence Theorem to find expressions for the indefinite integrals $\int_0^x f$, $-\int_{-x}^0 f$ and show that these are $\frac23(1 - (1 - x)f(x))$, $\frac23(1 - (1 + x)f(-x))$ for $0 \le x < 1$ (iii) use the Fundamental Theorem of Calculus to show that $f(x) + 2(1 - x)f'(x) = 0$ (iv) show that $\frac{d}{dx}\Bigl(\frac{f(x)^2}{1-x}\Bigr) = 0$ and hence (v) that $f(x)^2 = 1 - x$ whenever $|x| < 1$. Finally, show that because $f$ is continuous and non-zero in $]-1, 1[$, $f(x)$ must be the positive square root of $1 - x$ throughout.)
We have a further fragment of information. If we set
$$q_0(x) = 1, \quad q_1(x) = 1 - \frac{1}{2}x, \quad q_n(x) = -\sum_{k=0}^n \frac{(2k)!}{(2k-1)(2^k k!)^2}x^k$$
for $n \ge 2$ and $x \in [0, 1]$, so that $q_n$ is the $n$th partial sum of the binomial series for $(1 - x)^{1/2}$, then we have $\lim_{n\to\infty} q_n(x) = (1 - x)^{1/2}$ for every $x \in [0, 1[$. But also every $q_n$ is non-increasing on $[0, 1]$, and $\langle q_n(x)\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence for each $x \in [0, 1]$. So we must have
$$\sqrt{1 - x} \le q_n(x) \ \forall\ n \in \mathbb{N},\ x \in [0, 1[,$$
and therefore, because all the $q_n$ are continuous,
$$\sqrt{1 - x} \le q_n(x) \ \forall\ n \in \mathbb{N},\ x \in [0, 1].$$
Moreover, given $\epsilon > 0$, set $a = 1 - \frac14\epsilon^2$, so that $\sqrt{1 - a} = \frac{\epsilon}{2}$. Then there is an $n_0 \in \mathbb{N}$ such that $q_n(x) - \sqrt{1 - x} \le \frac{\epsilon}{2}$ for every $x \in [0, a]$ and $n \ge n_0$. In particular, $q_n(a) \le \epsilon$, so $q_n(x) \le \epsilon$ and $q_n(x) - \sqrt{1 - x} \le \epsilon$ for every $x \in [a, 1]$, $n \ge n_0$. This means that
$$0 \le q_n(x) - \sqrt{1 - x} \le \epsilon \ \forall\ n \ge n_0,\ x \in [0, 1];$$
as $\epsilon$ is arbitrary, $\langle q_n(x)\rangle_{n\in\mathbb{N}} \to \sqrt{1 - x}$ uniformly on $[0, 1]$.
(b) Now set $p_n(x) = q_n(1 - x^2)$ for $x \in \mathbb{R}$. Because each $q_n$ is a real polynomial of degree $n$, each $p_n$ is a real polynomial of degree $2n$. Next,
$$\sup_{|x|\le 1} |p_n(x) - |x|| = \sup_{|x|\le 1} \bigl|q_n(1 - x^2) - \sqrt{1 - (1 - x^2)}\bigr| = \sup_{y\in[0,1]} |q_n(y) - \sqrt{1 - y}| \to 0$$
as $n \to \infty$, so $\lim_{n\to\infty} p_n(x) = |x|$ uniformly for $|x| \le 1$, as required.
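The polynomials $p_n$ are explicit enough to compute. The following Python sketch evaluates $q_n$ from the binomial coefficients (generated by the recurrence $c_{k+1} = c_k\,\frac{2k-1}{2k+2}$, which reproduces the coefficients displayed in (a)) and estimates $\sup_{|x|\le 1}|p_n(x) - |x||$ on a grid. It uses floating-point arithmetic and a finite grid, so it is an illustration rather than a proof; all function names are illustrative.

```python
def binomial_coefficients(n):
    """Coefficients c_0..c_n of the binomial series (1 - x)^(1/2) = sum c_k x^k."""
    coeffs = [1.0]
    for k in range(n):
        coeffs.append(coeffs[-1] * (2 * k - 1) / (2 * k + 2))
    return coeffs

def q(n, x):
    """n-th partial sum of the binomial series, evaluated by Horner's rule."""
    result = 0.0
    for c in reversed(binomial_coefficients(n)):
        result = result * x + c
    return result

def p(n, x):
    """p_n(x) = q_n(1 - x^2), a polynomial of degree 2n."""
    return q(n, 1.0 - x * x)

def sup_error(n, grid_size=2001):
    """Largest value of |p_n(x) - |x|| over an evenly spaced grid in [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(p(n, x) - abs(x)) for x in xs)

if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, sup_error(n))
```

The error is largest at $x = 0$, where it equals $q_n(1)$, and it decreases only slowly as $n$ grows; the lemma needs no rate, only uniform convergence.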

281D Corollary Let $X$ be a set, and $A$ a norm-closed linear subspace of $\ell^\infty(X)$ containing $\chi X$ and such that $f \times g \in A$ whenever $f$, $g \in A$. Then $|f| \in A$ for every $f \in A$.
proof Set
$$f_1 = \frac{1}{1 + \|f\|_\infty}f,$$
so that $f_1 \in A$ and $\|f_1\|_\infty \le 1$. Because $A$ contains $\chi X$ and is closed under multiplication, $p \circ f_1 \in A$ for every polynomial $p$ with real coefficients. In particular, $g_n = p_n \circ f_1 \in A$ for every $n$, where $\langle p_n\rangle_{n\in\mathbb{N}}$ is the sequence of 281C. Now, because $|f_1(x)| \le 1$ for every $x \in X$,
$$\|g_n - |f_1|\|_\infty = \sup_{x\in X} |p_n(f_1(x)) - |f_1(x)|| \le \sup_{|y|\le 1} |p_n(y) - |y|| \to 0$$
as $n \to \infty$. Because $A$ is $\|\ \|_\infty$-closed, $|f_1| \in A$; consequently $|f| \in A$, as claimed.

281E Stone-Weierstrass theorem: second form Let $X$ be a topological space and $K$ a compact subset of $X$. Write $C_b(X)$ for the space of all bounded continuous real-valued functions on $X$. Let $A \subseteq C_b(X)$ be such that
$A$ is a linear subspace of $C_b(X)$;
$f \times g \in A$ for every $f$, $g \in A$;
$\chi X \in A$;
whenever $x$, $y$ are distinct points of $K$ there is an $f \in A$ such that $f(x) \ne f(y)$.
Then for every continuous $h : K \to \mathbb{R}$ and $\epsilon > 0$ there is an $f \in A$ such that
$|f(x) - h(x)| \le \epsilon$ for every $x \in K$,
if $K \ne \emptyset$, $\inf_{x\in X} f(x) \ge \inf_{x\in K} h(x)$ and $\sup_{x\in X} f(x) \le \sup_{x\in K} h(x)$.
proof Let $\overline{A}$ be the $\|\ \|_\infty$-closure of $A$ in $\ell^\infty(X)$. It is helpful to know that $\overline{A} \subseteq C_b(X)$; this is because the uniform limit of continuous functions is continuous. (But if this is new to you, or your memory has faded, don't take time to look it up now; just read ‘$\overline{A} \cap C_b(X)$’ in place of ‘$\overline{A}$’ in the rest of this argument.) By 281B-281D, $\overline{A}$ is a linear subspace of $C_b(X)$ and $|f| \in \overline{A}$ for every $f \in \overline{A}$, so the conditions of 281A apply to $\overline{A}$.
Take a continuous $h : K \to \mathbb{R}$ and an $\epsilon > 0$. The cases in which $K = \emptyset$ or $h$ is constant are trivial, because all constant functions belong to $A$; so I suppose that $M_0 = \inf_{x\in K} h(x)$ and $M_1 = \sup_{x\in K} h(x)$ are defined and distinct. As observed at the end of the proof of 281A, $M_0$ and $M_1$ are finite. Set
$$\eta = \min\bigl(\tfrac13\epsilon, \tfrac12(M_1 - M_0)\bigr) > 0, \quad \tilde h(x) = \mathrm{med}(M_0 + \eta, h(x), M_1 - \eta) \text{ for } x \in K$$
(definition: 2A1Ac), so that $\tilde h : K \to \mathbb{R}$ is continuous and $M_0 + \eta \le \tilde h(x) \le M_1 - \eta$ for every $x \in K$. By 281A, there is an $f_0 \in \overline{A}$ such that $|f_0(x) - \tilde h(x)| \le \eta$ for every $x \in K$ and $M_0 + \eta \le f_0(x) \le M_1 - \eta$ for every $x \in X$. Now there is an $f \in A$ such that $\|f - f_0\|_\infty \le \eta$, so that
$$|f(x) - h(x)| \le |f(x) - f_0(x)| + |f_0(x) - \tilde h(x)| + |\tilde h(x) - h(x)| \le 3\eta \le \epsilon$$
for every $x \in K$, while
$$M_0 \le f_0(x) - \eta \le f(x) \le f_0(x) + \eta \le M_1$$
for every $x \in X$.

281F Corollary: Weierstrass’ theorem Let K be any closed bounded subset of R. Then every continuous
h : K → R can be uniformly approximated on K by polynomials.
proof Apply 281E with X = K (noting that K, being closed and bounded, is compact), and A the set of polynomials
with real coefficients, regarded as functions from K to R.

281G Stone-Weierstrass theorem: third form Let $X$ be a topological space and $K$ a compact subset of $X$. Write $C_b(X; \mathbb{C})$ for the space of all bounded continuous complex-valued functions on $X$, so that $C_b(X; \mathbb{C})$ is a linear space over $\mathbb{C}$. Let $A \subseteq C_b(X; \mathbb{C})$ be such that
$A$ is a linear subspace of $C_b(X; \mathbb{C})$;
$f \times g \in A$ for every $f$, $g \in A$;
$\chi X \in A$;
the complex conjugate $\bar f$ of $f$ belongs to $A$ for every $f \in A$;
whenever $x$, $y$ are distinct points of $K$ there is an $f \in A$ such that $f(x) \ne f(y)$.
Then for every continuous $h : K \to \mathbb{C}$ and $\epsilon > 0$ there is an $f \in A$ such that
$|f(x) - h(x)| \le \epsilon$ for every $x \in K$,
if $K \ne \emptyset$, $\sup_{x\in X} |f(x)| \le \sup_{x\in K} |h(x)|$.
proof If $K = \emptyset$, or $h$ is identically zero, we can take $f = 0$. So let us suppose that $M = \sup_{x\in K} |h(x)| > 0$.
(a) Set
$$A_{\mathbb{R}} = \{f : f \in A,\ f(x) \text{ is real for every } x \in X\}.$$
Then $A_{\mathbb{R}}$ satisfies the conditions of 281E. P (i) Evidently $A_{\mathbb{R}}$ is a subset of $C_b(X) = C_b(X; \mathbb{R})$, is closed under addition, multiplication by real scalars and pointwise multiplication of functions, and contains $\chi X$. If $x$, $y$ are distinct points of $K$, there is an $f \in A$ such that $f(x) \ne f(y)$. Now
$$\mathrm{Re}\, f = \frac{1}{2}(f + \bar f), \quad \mathrm{Im}\, f = \frac{1}{2i}(f - \bar f)$$
both belong to $A$ and are real-valued, so belong to $A_{\mathbb{R}}$, and at least one of them takes different values at $x$ and $y$. Q
(b) Consequently, given a continuous function $h : K \to \mathbb{C}$ and $\epsilon > 0$, we may apply 281E twice to find $f_1$, $f_2 \in A_{\mathbb{R}}$ such that
$$|f_1(x) - \mathrm{Re}(h(x))| \le \eta, \quad |f_2(x) - \mathrm{Im}(h(x))| \le \eta$$
for every $x \in K$, where $\eta = \min\bigl(\frac12, \frac{\epsilon M}{6M+4}\bigr) > 0$. Setting $g = f_1 + if_2$, we have $g \in A$ and $|g(x) - h(x)| \le 2\eta$ for every $x \in K$.
(c) Set $L = \|g\|_\infty$. If $L \le M$ we can take $f = g$ and stop. Otherwise, consider the function
$$\phi(t) = \frac{M - \eta}{\max(M, \sqrt{t})}$$
for $t \in [0, L^2]$. By Weierstrass' theorem (281F), there is a real polynomial $p$ such that $|\phi(t) - p(t)| \le \frac{\eta}{L}$ whenever $0 \le t \le L^2$. Note that $|g|^2 = g \times \bar g \in A$, so that
$$f = g \times p(|g|^2) \in A.$$
Now
$$|p(t)| \le \phi(t) + \frac{\eta}{L} \le \phi(t) + \frac{\eta}{\max(M, \sqrt{t})} = \frac{M}{\max(M, \sqrt{t})}$$
whenever $0 \le t \le L^2$, so
$$|f(x)| \le |g(x)|\frac{M}{\max(M, |g(x)|)} \le M$$
for every $x \in X$. Next, if $0 \le t \le \min(L, M + 2\eta)^2$,
$$|1 - p(t)| \le \frac{\eta}{L} + 1 - \phi(t) \le \frac{\eta}{M} + 1 - \frac{M - \eta}{M + 2\eta} \le \frac{4\eta}{M}.$$
Consequently, if $x \in K$, so that
$$|g(x)| \le \min(L, |h(x)| + 2\eta) \le \min(L, M + 2\eta),$$
we shall have
$$|1 - p(|g(x)|^2)| \le \frac{4\eta}{M},$$
and
$$|f(x) - h(x)| \le |g(x) - h(x)| + |g(x)||1 - p(|g(x)|^2)| \le 2\eta + \frac{4\eta}{M}(M + 2\eta) \le 2\eta + \frac{4\eta}{M}(M + 1) \le \epsilon,$$
as required.
Remark Of course we could have saved ourselves effort by settling for
$$\sup_{x\in X} |f(x)| \le 2\sup_{x\in K} |h(x)|,$$
which would be quite good enough for the applications below.

281H Corollary Let $[a, b] \subseteq \mathbb{R}$ be a non-empty bounded closed interval and $h : [a, b] \to \mathbb{C}$ a continuous function. Then for any $\epsilon > 0$ there are $y_0, \ldots, y_n \in \mathbb{R}$ and $c_0, \ldots, c_n \in \mathbb{C}$ such that
$$\Bigl|h(x) - \sum_{k=0}^n c_k e^{iy_k x}\Bigr| \le \epsilon \text{ for every } x \in [a, b],$$
$$\sup_{x\in\mathbb{R}} \Bigl|\sum_{k=0}^n c_k e^{iy_k x}\Bigr| \le \sup_{x\in[a,b]} |h(x)|.$$
proof Apply 281G with $X = \mathbb{R}$, $K = [a, b]$ and $A$ the linear span of the functions $x \mapsto e^{iyx}$ as $y$ runs over $\mathbb{R}$.

281I Corollary Let $S^1$ be the unit circle $\{z : |z| = 1\} \subseteq \mathbb{C}$. Then for any continuous function $h : S^1 \to \mathbb{C}$ and $\epsilon > 0$, there are $n \in \mathbb{N}$ and $c_{-n}, c_{-n+1}, \ldots, c_0, \ldots, c_n \in \mathbb{C}$ such that $|h(z) - \sum_{k=-n}^n c_k z^k| \le \epsilon$ for every $z \in S^1$.
proof Apply 281G with $X = K = S^1$ and $A$ the linear span of the functions $z \mapsto z^k$ for $k \in \mathbb{Z}$.

281J Corollary Let $h : [-\pi, \pi] \to \mathbb{C}$ be a continuous function such that $h(\pi) = h(-\pi)$. Then for any $\epsilon > 0$ there are $n \in \mathbb{N}$, $c_{-n}, \ldots, c_n \in \mathbb{C}$ such that $|h(x) - \sum_{k=-n}^n c_k e^{ikx}| \le \epsilon$ for every $x \in [-\pi, \pi]$.
proof The point is that $\tilde h : S^1 \to \mathbb{C}$ is continuous on $S^1$, where $\tilde h(z) = h(\arg z)$; this is because $\arg$ is continuous everywhere except at $-1$, and
$$\lim_{x\downarrow -\pi} h(x) = h(-\pi) = h(\pi) = \lim_{x\uparrow\pi} h(x),$$
so
$$\lim_{z\in S^1, z\to -1} \tilde h(z) = h(\pi) = \tilde h(-1).$$
Now by 281I there are $c_{-n}, \ldots, c_n \in \mathbb{C}$ such that $|\tilde h(z) - \sum_{k=-n}^n c_k z^k| \le \epsilon$ for every $z \in S^1$, and these coefficients serve equally for $h$.

281K Corollary Suppose that $r \ge 1$ and that $K \subseteq \mathbb{R}^r$ is a non-empty closed bounded set. Let $h : K \to \mathbb{C}$ be a continuous function, and $\epsilon > 0$. Then there are $y_0, \ldots, y_n \in \mathbb{Q}^r$ and $c_0, \ldots, c_n \in \mathbb{C}$ such that
$$\Bigl|h(x) - \sum_{k=0}^n c_k e^{iy_k\cdot x}\Bigr| \le \epsilon \text{ for every } x \in K,$$
$$\sup_{x\in\mathbb{R}^r} \Bigl|\sum_{k=0}^n c_k e^{iy_k\cdot x}\Bigr| \le \sup_{x\in K} |h(x)|,$$
writing $y \cdot x = \sum_{j=1}^r \eta_j\xi_j$ when $y = (\eta_1, \ldots, \eta_r)$ and $x = (\xi_1, \ldots, \xi_r)$ belong to $\mathbb{R}^r$.
proof Apply 281G with $X = \mathbb{R}^r$ and $A$ the linear span of the functions $x \mapsto e^{iy\cdot x}$ as $y$ runs over $\mathbb{Q}^r$.

281L Corollary Suppose that r ≥ 1 and that $K \subseteq \mathbb{R}^r$ is a non-empty closed bounded set. Let h : K → R be a continuous function, and ε > 0. Then there are $y_0, \ldots, y_n \in \mathbb{R}^r$ and $c_0, \ldots, c_n \in \mathbb{C}$ such that, writing $g(x) = \sum_{k=0}^n c_k e^{iy_k . x}$, g is real-valued and
\[ |h(x) - g(x)| \le \epsilon \text{ for every } x \in K, \]
\[ \inf_{y \in K} h(y) \le g(x) \le \sup_{y \in K} h(y) \text{ for every } x \in \mathbb{R}^r. \]

proof Apply 281E with $X = \mathbb{R}^r$ and A the set of real-valued functions on $\mathbb{R}^r$ which are complex linear combinations of the functions $x \mapsto e^{iy . x}$; as remarked in part (a) of the proof of 281G, A satisfies the conditions of 281E.

281M Weyl’s Equidistribution Theorem We are now ready for one of the basic results of number theory. I
shall actually apply it to provide an example in §285 below, but (at least in the one-variable case) it is surely on the
(rather long) list of things which every pure mathematician should know. For the sake of the application I have in
mind, I give the full r-dimensional version, but you may wish to take it in the first place with r = 1.
It will be helpful to have a notation for ‘fractional part’. For any real number x, write <x> for that number in [0, 1[
such that x − <x> is an integer. Now for the theorem.

281N Theorem Let $\eta_1, \ldots, \eta_r$ be real numbers such that $1, \eta_1, \ldots, \eta_r$ are linearly independent over Q. Then whenever $0 \le \alpha_j \le \beta_j \le 1$ for each j ≤ r,
\[ \lim_{n\to\infty} \frac{1}{n+1} \#(\{m : m \le n, <m\eta_j> \in [\alpha_j, \beta_j] \text{ for every } j \le r\}) = \prod_{j=1}^r (\beta_j - \alpha_j). \]

Remark Thus the theorem says that the long-term proportion of the r-tuples (<mη1 >, . . . , <mηr >) which belong to
the interval [a, b] ⊆ [0, 1] is just the Lebesgue measure µ[a, b] of the interval. Of course the condition ‘η1 , . . . , ηr are
linearly independent over Q’ is necessary as well as sufficient (281Xg).
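
As a purely numerical illustration of the statement in the case r = 1 (it plays no part in the proof), here is a minimal Python sketch. The names `fractional_part` and `proportion_in_interval` are my own, and the choices η = √2 and [α, β] = [0.2, 0.5] are arbitrary assumptions made for the example; floating-point arithmetic limits how large n can sensibly be taken.

```python
import math

def fractional_part(x):
    """Return <x>: the number in [0, 1) such that x - <x> is an integer."""
    return x - math.floor(x)

def proportion_in_interval(eta, alpha, beta, n):
    """Proportion of m in {0, ..., n} with <m * eta> in [alpha, beta]."""
    hits = sum(1 for m in range(n + 1)
               if alpha <= fractional_part(m * eta) <= beta)
    return hits / (n + 1)

# 1 and sqrt(2) are linearly independent over Q, so the theorem predicts
# that the proportions approach beta - alpha = 0.3.
eta = math.sqrt(2)
for n in (100, 10_000, 100_000):
    print(n, proportion_in_interval(eta, 0.2, 0.5, n))

# With a rational eta the hypothesis fails: <m/2> only takes the values
# 0 and 1/2, so the interval [0.2, 0.4] is never visited.
print(proportion_in_interval(0.5, 0.2, 0.4, 1000))
```

The second call shows, in a small way, why some independence hypothesis is needed: for η = 1/2 the proportion stays at 0 rather than approaching 0.2.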
proof (a) Write $y = (\eta_1, \ldots, \eta_r) \in \mathbb{R}^r$,
\[ <my> = (<m\eta_1>, \ldots, <m\eta_r>) \in [\mathbf{0}, \mathbf{1}[ = [0, 1[^r \]
for each m ∈ N. Set $I = [\mathbf{0}, \mathbf{1}] = [0, 1]^r$, and for any function f : I → R write
\[ \overline{L}(f) = \limsup_{n\to\infty} \frac{1}{n+1} \sum_{m=0}^n f(<my>), \]
\[ \underline{L}(f) = \liminf_{n\to\infty} \frac{1}{n+1} \sum_{m=0}^n f(<my>); \]
and for f : I → C write
\[ L(f) = \lim_{n\to\infty} \frac{1}{n+1} \sum_{m=0}^n f(<my>) \]
if the limit exists. It will be worth noting that for non-negative functions f, g, h : I → R such that h ≤ f + g,
\[ \overline{L}(h) \le \overline{L}(f) + \overline{L}(g), \]
and that L(cf + g) = cL(f) + L(g) for any two functions f, g : I → C such that L(f) and L(g) exist, and any c ∈ C.
(b) I mean to show that L(f) exists and is equal to $\int_I f$ for (many) continuous functions f. The key step is to consider functions of the form
\[ f(x) = e^{2\pi i k . x}, \]
where $k = (\kappa_1, \ldots, \kappa_r) \in \mathbb{Z}^r$. In this case, if k ≠ 0,
\[ k . y = \sum_{j=1}^r \kappa_j \eta_j \notin \mathbb{Z} \]
because $1, \eta_1, \ldots, \eta_r$ are linearly independent over Q. So
\[ L(f) = \lim_{n\to\infty} \frac{1}{n+1} \sum_{m=0}^n e^{2\pi i k . <my>} = \lim_{n\to\infty} \frac{1}{n+1} \sum_{m=0}^n e^{2\pi i m k . y} \]
(because $mk . y - k . <my> = \sum_{j=1}^r \kappa_j (m\eta_j - <m\eta_j>)$ is an integer)
\[ = \lim_{n\to\infty} \frac{1 - e^{2\pi i (n+1) k . y}}{(n+1)(1 - e^{2\pi i k . y})} \]
(because $e^{2\pi i k . y} \ne 1$)
\[ = 0, \]
because $|1 - e^{2\pi i (n+1) k . y}| \le 2$ for every n. Of course we can also calculate the integral of f over I, which is

\[ \int_I f(x)dx = \int_I e^{2\pi i k . x}dx = \int_I \prod_{j=1}^r e^{2\pi i \kappa_j \xi_j}dx \]
(writing $x = (\xi_1, \ldots, \xi_r)$)
\[ = \int_0^1 \ldots \int_0^1 \prod_{j=1}^r e^{2\pi i \kappa_j \xi_j} d\xi_r \ldots d\xi_1 = \int_0^1 e^{2\pi i \kappa_r \xi_r} d\xi_r \ldots \int_0^1 e^{2\pi i \kappa_1 \xi_1} d\xi_1 = 0 \]
because at least one $\kappa_j$ is non-zero, and for this j we must have
\[ \int_0^1 e^{2\pi i \kappa_j \xi_j} d\xi_j = \frac{1}{2\pi i \kappa_j}(e^{2\pi i \kappa_j} - 1) = 0. \]
So we have $L(f) = \int_I f = 0$ when k ≠ 0. On the other hand, if k = 0, then f is constant with value 1, so
\[ L(f) = \lim_{n\to\infty} \frac{1}{n+1} \sum_{m=0}^n f(<my>) = \lim_{n\to\infty} 1 = 1 = \int_I f(x)dx. \]
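
The geometric-series estimate in (b) can be checked numerically. The following Python sketch is only an illustration under assumptions of my own: `theta` stands for the real number k . y (here 3√2, with r = 1), and the function names are hypothetical. It compares the averaged exponential sum with the bound 2/((n+1)|1 − e^{2πiθ}|) which the argument above provides.

```python
import cmath
import math

def exponential_average(theta, n):
    """(1/(n+1)) * sum_{m=0}^{n} exp(2*pi*i*m*theta)."""
    total = sum(cmath.exp(2j * math.pi * m * theta) for m in range(n + 1))
    return total / (n + 1)

def geometric_bound(theta, n):
    """2 / ((n+1) * |1 - exp(2*pi*i*theta)|); needs theta not an integer."""
    return 2 / ((n + 1) * abs(1 - cmath.exp(2j * math.pi * theta)))

theta = 3 * math.sqrt(2)   # plays the role of k . y with k = 3, y = sqrt(2)
for n in (10, 100, 1000):
    print(n, abs(exponential_average(theta, n)), geometric_bound(theta, n))

# When theta is an integer (the case k = 0) every term is 1.
print(abs(exponential_average(0.0, 10)))
```

The printed averages should sit below the corresponding bounds, both shrinking like 1/(n+1), while the final line corresponds to the constant function with L(f) = 1.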

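(c) Now write $\partial I = [\mathbf{0}, \mathbf{1}] \setminus \,]\mathbf{0}, \mathbf{1}[$, the boundary of I. If f : I → C is continuous and f(x) = 0 for x ∈ ∂I, then $L(f) = \int_I f$. P As in 281I, let $S^1$ be the unit circle {z : z ∈ C, |z| = 1}, and set $K = (S^1)^r \subseteq \mathbb{C}^r$. If we think of K as a subset of $\mathbb{R}^{2r}$, it is closed and bounded. Let φ : K → I be given by
\[ \phi(\zeta_1, \ldots, \zeta_r) = \Bigl(\frac{1}{2} + \frac{\arg \zeta_1}{2\pi}, \ldots, \frac{1}{2} + \frac{\arg \zeta_r}{2\pi}\Bigr) \]
for $\zeta_1, \ldots, \zeta_r \in S^1$. Then h = fφ : K → C is continuous, because φ is continuous on $(S^1 \setminus \{-1\})^r$ and
\[ \lim_{w \to z} f\phi(w) = f\phi(z) = 0 \]
for any $z \in K \setminus (S^1 \setminus \{-1\})^r$. (Compare 281J.) Now apply 281G with X = K and A the set of polynomials in $\zeta_1, \ldots, \zeta_r, \zeta_1^{-1}, \ldots, \zeta_r^{-1}$ to see that, given ε > 0, there is a function of the form
\[ g(z) = \sum_{k \in J} c_k \zeta_1^{\kappa_1} \ldots \zeta_r^{\kappa_r}, \]
for some finite set $J \subseteq \mathbb{Z}^r$ and constants $c_k \in \mathbb{C}$ for k ∈ J, such that
\[ |g(z) - h(z)| \le \epsilon \text{ for every } z \in K. \]
Set
\[ \tilde g(x) = g(e^{\pi i (2\xi_1 - 1)}, \ldots, e^{\pi i (2\xi_r - 1)}) = \sum_{k \in J} c_k e^{\pi i k . (2x - \mathbf{1})} = \sum_{k \in J} (-1)^{k . \mathbf{1}} c_k e^{2\pi i k . x}, \]
so that $\tilde g \phi = g$, and see that
\[ \sup_{x \in I} |\tilde g(x) - f(x)| = \sup_{z \in K} |g(z) - h(z)| \le \epsilon. \]
Now $\tilde g$ is a linear combination of functions of the form dealt with in (b), so by the linearity noted in (a) we must have $L(\tilde g) = \int_I \tilde g$. Let $n_0$ be such that
\[ \Bigl|\int_I \tilde g - \frac{1}{n+1} \sum_{m=0}^n \tilde g(<my>)\Bigr| \le \epsilon \]
for every $n \ge n_0$. Then
\[ \Bigl|\int_I f - \int_I \tilde g\Bigr| \le \int_I |f - \tilde g| \le \epsilon \]
and
\[ \Bigl|\frac{1}{n+1} \sum_{m=0}^n \tilde g(<my>) - \frac{1}{n+1} \sum_{m=0}^n f(<my>)\Bigr| \le \frac{1}{n+1} \sum_{m=0}^n |\tilde g(<my>) - f(<my>)| \le \frac{1}{n+1}(n+1)\epsilon = \epsilon \]
for every n ∈ N. So for $n \ge n_0$ we must have
\[ \Bigl|\frac{1}{n+1} \sum_{m=0}^n f(<my>) - \int_I f\Bigr| \le 3\epsilon. \]
As ε is arbitrary, $L(f) = \int_I f$, as required. Q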
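(d) Observe next that if $a, b \in \,]\mathbf{0}, \mathbf{1}[ = \,]0, 1[^r$, and ε > 0, there are continuous functions $f_1$, $f_2$ such that
\[ f_1 \le \chi[a, b] \le f_2 \le \chi\,]\mathbf{0}, \mathbf{1}[, \quad \int_I f_2 - \int_I f_1 \le \epsilon. \]
P This is elementary. Write $a = (\alpha_1, \ldots, \alpha_r)$ and $b = (\beta_1, \ldots, \beta_r)$. For n ∈ N, define $h_n : \mathbb{R} \to [0, 1]$ by setting $h_n(\xi) = 0$ if ξ ≤ 0, $2^n \xi$ if $0 \le \xi \le 2^{-n}$ and 1 if $\xi \ge 2^{-n}$. Set
\[ f_{1n}(x) = \prod_{j=1}^r h_n(\xi_j - \alpha_j) h_n(\beta_j - \xi_j), \]
\[ f_{2n}(x) = \prod_{j=1}^r (1 - h_n(\alpha_j - \xi_j))(1 - h_n(\xi_j - \beta_j)) \]
for $x = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r$. (Compare the proof of 242Oa.) Then $f_{1n} \le \chi[a, b] \le f_{2n}$ for each n, $f_{2n} \le \chi\,]\mathbf{0}, \mathbf{1}[$ for all n so large that
\[ 2^{-n} \le \min(\min_{j \le r} \alpha_j, \min_{j \le r} (1 - \beta_j)), \]
and $\lim_{n\to\infty} f_{2n}(x) - f_{1n}(x) = 0$ for almost every x (the exceptions lie on the boundary of [a, b]), so
\[ \lim_{n\to\infty} \int_I f_{2n} - \int_I f_{1n} = 0. \]
Thus we can take $f_1 = f_{1n}$, $f_2 = f_{2n}$ for any n large enough. Q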
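(e) It follows that if $a, b \in \,]\mathbf{0}, \mathbf{1}[$ and a ≤ b, L(χ[a, b]) = µ[a, b]. P Let ε > 0. Take $f_1$, $f_2$ as in (d). Then, using (c),
\[ \overline{L}(\chi[a, b]) \le \overline{L}(f_2) = L(f_2) = \int_I f_2 \le \int_I f_1 + \epsilon \le \mu[a, b] + \epsilon, \]
\[ \underline{L}(\chi[a, b]) \ge \underline{L}(f_1) = L(f_1) = \int_I f_1 \ge \int_I f_2 - \epsilon \ge \mu[a, b] - \epsilon, \]
so
\[ \mu[a, b] - \epsilon \le \underline{L}(\chi[a, b]) \le \overline{L}(\chi[a, b]) \le \mu[a, b] + \epsilon. \]
As ε is arbitrary,
\[ \mu[a, b] = \overline{L}(\chi[a, b]) = \underline{L}(\chi[a, b]) = L(\chi[a, b]), \]
as required. Q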
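(f) To complete the proof, take any a, b ∈ I with a ≤ b. For $0 < \epsilon \le \frac{1}{2}$, set $I_\epsilon = [\epsilon \mathbf{1}, (1 - \epsilon)\mathbf{1}]$, so that $I_\epsilon$ is a closed interval included in $]\mathbf{0}, \mathbf{1}[$ and $\mu I_\epsilon = (1 - 2\epsilon)^r$. Of course L(χI) = µI = 1, so
\[ L(\chi(I \setminus I_\epsilon)) = L(\chi I) - L(\chi I_\epsilon) = 1 - \mu I_\epsilon, \]
and
\[ \begin{aligned}
\mu[a, b] - 1 + \mu I_\epsilon &\le \mu[a, b] + \mu I_\epsilon - \mu([a, b] \cup I_\epsilon) = \mu([a, b] \cap I_\epsilon) \\
&= L(\chi([a, b] \cap I_\epsilon)) \le \underline{L}(\chi[a, b]) \\
&\le \overline{L}(\chi[a, b]) \le \overline{L}(\chi([a, b] \cap I_\epsilon)) + \overline{L}(\chi(I \setminus I_\epsilon)) \\
&= L(\chi([a, b] \cap I_\epsilon)) + 1 - \mu I_\epsilon \\
&= \mu([a, b] \cap I_\epsilon) + 1 - \mu I_\epsilon \le \mu[a, b] + 1 - \mu I_\epsilon.
\end{aligned} \]
As ε is arbitrary,
\[ \mu[a, b] = \overline{L}(\chi[a, b]) = \underline{L}(\chi[a, b]) = L(\chi[a, b]), \]
as stated.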

281X Basic exercises (a) Let A be the set of those bounded continuous functions $f : \mathbb{R}^r \times \mathbb{R}^r \to \mathbb{R}$ which are expressible in the form $f(x, y) = \sum_{k=0}^n g_k(x) g'_k(y)$, where all the $g_k$, $g'_k$ are continuous functions from $\mathbb{R}^r$ to R. Show that for any bounded continuous function $h : \mathbb{R}^r \times \mathbb{R}^r \to \mathbb{R}$ and any bounded set $K \subseteq \mathbb{R}^r \times \mathbb{R}^r$ and any ε > 0, there is an f ∈ A such that |f(x, y) − h(x, y)| ≤ ε for every (x, y) ∈ K and $\sup_{x,y \in \mathbb{R}^r} |f(x, y)| \le \sup_{x,y \in \mathbb{R}^r} |h(x, y)|$.

(b) Let K be a closed bounded set in R r , where r ≥ 1, and h : K → R a continuous function. Show that for any
ǫ > 0 there is a polynomial p in r variables such that |h(x) − p(x)| ≤ ǫ for every x ∈ K.

> (c) Let [a, b] be a non-empty closed interval of R and h : [a, b] → R a continuous function. Show that for any ε > 0 there are $y_0, \ldots, y_n, a_0, \ldots, a_n, b_0, \ldots, b_n \in \mathbb{R}$ such that
\[ \Bigl|h(x) - \sum_{k=0}^n (a_k \cos y_k x + b_k \sin y_k x)\Bigr| \le \epsilon \text{ for every } x \in [a, b], \]
\[ \sup_{x \in \mathbb{R}} \Bigl|\sum_{k=0}^n (a_k \cos y_k x + b_k \sin y_k x)\Bigr| \le \sup_{x \in [a,b]} |h(x)|. \]

(d) Let h be a complex-valued function on ]−π, π] such that $|h|^p$ is integrable, where 1 ≤ p < ∞. Show that for every ε > 0 there is a function of the form $x \mapsto f(x) = \sum_{k=-n}^n c_k e^{ikx}$, where $c_{-n}, \ldots, c_n \in \mathbb{C}$, such that $\int_{-\pi}^{\pi} |h - f|^p \le \epsilon$. (Compare 244H.)

> (e) Let h : [−π, π] → R be a continuous function such that h(π) = h(−π), and ε > 0. Show that there are $a_0, \ldots, a_n, b_1, \ldots, b_n \in \mathbb{R}$ such that
\[ \Bigl|h(x) - \frac{1}{2} a_0 - \sum_{k=1}^n (a_k \cos kx + b_k \sin kx)\Bigr| \le \epsilon \]
for every x ∈ [−π, π].

(f) Let K be a non-empty closed bounded set in $\mathbb{R}^r$, where r ≥ 1, and h : K → R a continuous function. Show that for any ε > 0 there are $y_0, \ldots, y_n \in \mathbb{R}^r$, $a_0, \ldots, a_n, b_0, \ldots, b_n \in \mathbb{R}$ such that
\[ \Bigl|h(x) - \sum_{k=0}^n (a_k \cos y_k . x + b_k \sin y_k . x)\Bigr| \le \epsilon \text{ for every } x \in K, \]
\[ \sup_{x \in \mathbb{R}^r} \Bigl|\sum_{k=0}^n (a_k \cos y_k . x + b_k \sin y_k . x)\Bigr| \le \sup_{x \in K} |h(x)|, \]
interpreting y . x as in 281K.

(g) Let $y_1, \ldots, y_r$ be real numbers which are not linearly independent over Q. Show that there is a non-trivial interval $[a, b] \subseteq [\mathbf{0}, \mathbf{1}] \subseteq \mathbb{R}^r$ such that $(<ky_1>, \ldots, <ky_r>) \notin [a, b]$ for every k ∈ Z.

(h) Let $\eta_1, \ldots, \eta_r$ be real numbers such that $1, \eta_1, \ldots, \eta_r$ are linearly independent over Q. Suppose that $0 \le \alpha_j \le \beta_j \le 1$ for each j ≤ r. Show that for every ε > 0 there is an $n_0 \in \mathbb{N}$ such that
\[ \Bigl|\prod_{j=1}^r (\beta_j - \alpha_j) - \frac{1}{n+1} \#(\{m : k \le m \le k + n, <m\eta_j> \in [\alpha_j, \beta_j] \text{ for every } j \le r\})\Bigr| \le \epsilon \]
whenever $n \ge n_0$ and k ∈ N. (Hint: in 281N, set
\[ \overline{L}(f) = \limsup_{n\to\infty} \sup_{k \in \mathbb{N}} \frac{1}{n+1} \sum_{m=k}^{k+n} f(<my>).) \]

281Y Further exercises (a) Show that under the hypotheses of 281A, there is an $f \in \overline{A}$, the $\| \|_\infty$-closure of A in $C_b(X)$, such that f↾K = h. (Hint: take $f = \lim_{n\to\infty} f_n$ where
\[ \|f_{n+1} - f_n\|_\infty \le \sup_{x \in K} |f_n(x) - h(x)| \le 2^{-n} \]
for every n ∈ N.)

(b) Let X be a topological space and K ⊆ X a compact subset. Suppose that for any distinct points x, y of K there is a continuous function f : X → R such that f(x) ≠ f(y). Show that for any r ∈ N and any continuous $h : K \to \mathbb{R}^r$ there is a continuous $f : X \to \mathbb{R}^r$ extending h. (Hint: consider r = 1 first.)

(c) Let $\langle X_i \rangle_{i \in I}$ be any family of compact Hausdorff spaces, and X their product as topological spaces. For each i, write $C(X_i)$ for the set of continuous functions from $X_i$ to R, and $\pi_i : X \to X_i$ for the coordinate map. Show that the subalgebra of C(X) generated by $\{f\pi_i : i \in I, f \in C(X_i)\}$ is $\| \|_\infty$-dense in C(X). (Note: you will need to know that X is compact, and that if Z is any compact Hausdorff space then for any distinct z, w ∈ Z there is an f ∈ C(Z) such that f(z) ≠ f(w). For references see 3A3J and 3A3Bf in the next volume.)

(d) Let X be a topological space and K a compact subset of X. Let A be a linear subspace of the space $C_b(X)$ of real-valued continuous functions on X such that |f| ∈ A for every f ∈ A. Let h : K → R be a continuous function such that whenever x, y ∈ K there is an f ∈ A such that f(x) = h(x) and f(y) = h(y). Show that for every ε > 0 there is an f ∈ A such that |f(x) − h(x)| ≤ ε for every x ∈ K.

(e) Let X be a compact topological space and write C(X) for the set of continuous functions from X to R. Suppose that h ∈ C(X), and let A ⊆ C(X) be such that
A is a linear subspace of C(X);
either |f| ∈ A for every f ∈ A or f × g ∈ A for every f, g ∈ A or f × f ∈ A for every f ∈ A;
whenever x, y ∈ X and δ > 0 there is an f ∈ A such that |f(x) − h(x)| ≤ δ, |f(y) − h(y)| ≤ δ.
Show that for every ε > 0 there is an f ∈ A such that |h(x) − f(x)| ≤ ε for every x ∈ X.

(f) Let X be a compact topological space and A a $\| \|_\infty$-closed linear subspace of the space C(X) of continuous functions from X to R. Show that the following are equiveridical:
(i) |f| ∈ A for every f ∈ A;
(ii) f × f ∈ A for every f ∈ A;
(iii) f × g ∈ A for all f, g ∈ A,
and that in this case A is closed in C(X) for the topology defined by the pseudometrics
\[ (f, g) \mapsto |f(x) - g(x)| : C(X) \times C(X) \to [0, \infty[ \]
as x runs over X (the ‘topology of pointwise convergence’ on C(X)).

(g) Show that under the hypotheses of 281G there is an $f \in \overline{A}$, the $\| \|_\infty$-closure of A in $C_b(X; \mathbb{C})$, such that f↾K = h and (if K ≠ ∅) $\|f\|_\infty = \sup_{x \in K} |h(x)|$.

(h) Let y ∈ R be irrational. Show that for any Riemann integrable function f : [0, 1] → R,
\[ \int_0^1 f(x)dx = \lim_{n\to\infty} \frac{1}{n+1} \sum_{m=0}^n f(<my>), \]
writing <my> for the fractional part of my. (Hint: recall Riemann’s criterion: for any ε > 0, there are $a_0, \ldots, a_n$ with $0 = a_0 \le a_1 \le \ldots \le a_n = 1$ and
\[ \sum \{a_j - a_{j-1} : j \le n, \sup_{x \in [a_{j-1}, a_j]} f(x) - \inf_{x \in [a_{j-1}, a_j]} f(x) \ge \epsilon\} \le \epsilon.) \]

(i) Let $\langle t_n \rangle_{n \in \mathbb{N}}$ be a sequence in [0, 1]. Show that the following are equiveridical: (i) $\lim_{n\to\infty} \frac{1}{n+1} \sum_{k=0}^n f(t_k) = \int_0^1 f$ for every continuous function f : [0, 1] → R; (ii) $\lim_{n\to\infty} \frac{1}{n+1} \sum_{k=0}^n f(t_k) = \int_0^1 f$ for every Riemann integrable function f : [0, 1] → R; (iii) $\liminf_{n\to\infty} \frac{1}{n+1} \#(\{k : k \le n, t_k \in G\}) \ge \mu G$ for every open set G ⊆ [0, 1]; (iv) $\lim_{n\to\infty} \frac{1}{n+1} \#(\{k : k \le n, t_k \le \alpha\}) = \alpha$ for every α ∈ [0, 1]; (v) $\lim_{n\to\infty} \frac{1}{n+1} \#(\{k : k \le n, t_k \in E\}) = \mu E$ for every E ⊆ [0, 1] such that $\mu(\mathrm{int}\, E) = \mu \overline{E}$; (vi) $\lim_{n\to\infty} \frac{1}{n+1} \sum_{k=0}^n e^{2\pi i m t_k} = 0$ for every m ≥ 1. (Cf. 273J. Such sequences $\langle t_n \rangle_{n \in \mathbb{N}}$ are called equidistributed or uniformly distributed.)

(j) Show that the sequence $\langle <\ln(n + 1)> \rangle_{n \in \mathbb{N}}$ is not equidistributed.

(k) Give $[0, 1]^{\mathbb{N}}$ its product measure λ. Show that λ-almost every sequence $\langle t_n \rangle_{n \in \mathbb{N}} \in [0, 1]^{\mathbb{N}}$ is equidistributed in the sense of 281Yi. (Hint: 273J.)

(l) Let $f : [0, 1]^2 \to \mathbb{C}$ be a continuous function. Show that if γ ∈ R is irrational then $\int_{[0,1]^2} f = \lim_{a\to\infty} \frac{1}{a} \int_0^a f(<t>, <\gamma t>)dt$. (Hint: consider first functions of the form $x \mapsto e^{2\pi i k . x}$.)

(m) A sequence $\langle t_n \rangle_{n \in \mathbb{N}}$ in [0, 1] is well-distributed if $\liminf_{n\to\infty} \inf_{l \in \mathbb{N}} \frac{1}{n+1} \#(\{k : l \le k \le l + n, t_k \in G\}) \ge \mu G$ for every open set G ⊆ [0, 1]. (i) Show that $\langle t_n \rangle_{n \in \mathbb{N}}$ is well-distributed iff $\lim_{n\to\infty} \sup_{l \in \mathbb{N}} |\int_0^1 f - \frac{1}{n+1} \sum_{k=l}^{l+n} f(t_k)| = 0$ for every continuous f : [0, 1] → R. (ii) Show that $\langle <n\alpha> \rangle_{n \in \mathbb{N}}$ is well-distributed for every irrational α.

281 Notes and comments I have given three statements (281A, 281E and 281G) of the Stone-Weierstrass theorem,
with an acknowledgement (281F) of Weierstrass’ own version, and three further forms (281Ya, 281Yd, 281Yg) in the
exercises. Yet another will appear in §4A6 in Volume 4. Faced with such a multiplicity, you may wish to try your own
hand at writing out theorems which will cover some or all of these versions. I myself see no way of doing it without
setting up a confusing list of alternative hypotheses and conclusions. At which point, I ask ‘what is a theorem, anyway?’,
and answer, it is a stopping-place on our journey; it is a place where we can rest, and congratulate ourselves on our
achievement; it is a place which we can learn to recognise, and use as a starting point for new adventures; it is a place
we can describe, and share with others. For some theorems, like Fermat’s last theorem, there is a canonical statement,
an exactly locatable point. For others, like the Stone-Weierstrass theorem here, we reach a mass of closely related
results, all depending on some arrangement of the arguments laid out in 281A-281G and 281Ya (which introduces
a new idea), and all useful in different ways. I suppose, indeed, that most authors would prefer the versions 281Ya
and 281Yg, which eliminate the variable ε which appears in 281A, 281E and 281G, at the expense of taking a closed
subspace A. But I find that the corollaries which will be useful later (281H-281L) are more naturally expressed in
terms of linear subspaces which are not closed.
The applications of the theorem, or the theorems, or the method – choose your own expression – are legion; only a
few of them are here. An apparently innocent one is in 281Xa and, in a different variant, in 281Yc; these are enormously
important in their own domains. In this volume the principal application will be to 285L below, depending on 281K,
and it is perhaps right to note that there is an alternative approach to this particular result, based on ideas in 282G.
But I offer Weyl’s equidistribution theorem (281M-281N) as evidence that we can expect to find good use for these
ideas in almost any branch of mathematics.

282 Fourier series


Out of the enormous theory of Fourier series, I extract a few results which may at least provide a foundation for
further study. I give the definitions of Fourier and Fejér sums (282A), with five of the most important results concerning
their convergence (282G, 282H, 282J, 282L, 282O). On the way I include the Riemann-Lebesgue lemma (282E). I end
by mentioning convolutions (282Q).

282A Definition Let f be an integrable complex-valued function defined almost everywhere in ]−π, π].

(a) The Fourier coefficients of f are the complex numbers
\[ c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx} dx \]
for k ∈ Z.

(b) The Fourier sums of f are the functions
\[ s_n(x) = \sum_{k=-n}^n c_k e^{ikx} \]
for x ∈ ]−π, π], n ∈ N.

(c) The Fourier series of f is the series $\sum_{k=-\infty}^{\infty} c_k e^{ikx}$, or (because we ordinarily consider the symmetric partial sums $s_n$) the series $c_0 + \sum_{k=1}^{\infty} (c_k e^{ikx} + c_{-k} e^{-ikx})$.

(d) The Fejér sums of f are the functions
\[ \sigma_m = \frac{1}{m+1} \sum_{n=0}^m s_n \]
for m ∈ N.

(e) It will be convenient to have a further phrase available. If f is any function with dom f ⊆ ]−π, π], its periodic extension is the function $\tilde f$, with domain $\bigcup_{k \in \mathbb{Z}} (\mathrm{dom}\, f + 2k\pi)$, such that $\tilde f(x) = f(x - 2k\pi)$ whenever k ∈ Z and x ∈ dom f + 2kπ.
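
To make the definitions in (a), (b) and (d) concrete, here is a minimal Python sketch which approximates them numerically. It is an illustration only: the function names are my own, the integral is replaced by a midpoint rule (an assumption which is reasonable for piecewise smooth f but is not the Lebesgue integral of the definition), and the test function f(x) = x is chosen because its coefficients $c_k = i(-1)^k/k$ for k ≠ 0 can be found by integrating by parts.

```python
import cmath
import math

def fourier_coefficient(f, k, steps=4000):
    """Midpoint-rule approximation to (1/2pi) * int_{-pi}^{pi} f(x) e^{-ikx} dx."""
    h = 2 * math.pi / steps
    total = 0j
    for j in range(steps):
        x = -math.pi + (j + 0.5) * h
        total += f(x) * cmath.exp(-1j * k * x)
    return total * h / (2 * math.pi)

def fourier_sum(coeffs, n, x):
    """s_n(x) = sum_{k=-n}^{n} c_k e^{ikx}; coeffs maps k to c_k."""
    return sum(coeffs[k] * cmath.exp(1j * k * x) for k in range(-n, n + 1))

def fejer_sum(coeffs, m, x):
    """sigma_m(x) = (1/(m+1)) * sum_{n=0}^{m} s_n(x)."""
    return sum(fourier_sum(coeffs, n, x) for n in range(m + 1)) / (m + 1)

def identity(x):
    return x

coeffs = {k: fourier_coefficient(identity, k) for k in range(-6, 7)}
print(coeffs[1])                    # close to -i, the exact value of c_1
print(fourier_sum(coeffs, 6, 1.0))  # a rough approximation to f(1) = 1
print(fejer_sum(coeffs, 6, 1.0))    # the corresponding Fejér sum
```

Collecting terms in the definition shows that $\sigma_m(x) = \sum_{k=-m}^m (1 - \frac{|k|}{m+1}) c_k e^{ikx}$, which would be a cheaper way to compute Fejér sums than the direct average used here.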

282B Remarks I have made two more or less arbitrary choices here.

(a) I have chosen to express Fourier series in their ‘complex’ form rather than their ‘real’ form. From the point
of view of pure measure theory (and, indeed, from the point of view of the nineteenth-century origins of the subject)
there are gains in elegance from directing attention to real functions f and looking at the real coefficients
\[ a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx \, dx \text{ for } k \in \mathbb{N}, \]
\[ b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx \, dx \text{ for } k \ge 1. \]
If we do this we have
\[ c_0 = \frac{1}{2} a_0, \]
and for k ≥ 1 we have
\[ c_k = \frac{1}{2}(a_k - ib_k), \quad c_{-k} = \frac{1}{2}(a_k + ib_k), \quad a_k = c_k + c_{-k}, \quad b_k = i(c_k - c_{-k}), \]
so that the Fourier sums become
\[ s_n(x) = \frac{1}{2} a_0 + \sum_{k=1}^n (a_k \cos kx + b_k \sin kx). \]
The advantage of this is that real functions f correspond to real coefficients ak , bk , so that it is obvious that if f is
real-valued so are its Fourier and Fejér sums. The disadvantages are that we have to use a variety of trigonometric
equalities which are rather more complicated than the properties of the complex exponential function which they reflect,
and that we are farther away from the natural generalizations to locally compact abelian groups. So both electrical
engineers and harmonic analysts tend to prefer the coefficients ck .
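
The conversion formulae in (a) are easy to get wrong by a sign, so a small check may be useful. The Python sketch below simply transcribes them; the function names are hypothetical, and the sample values of $a_k$, $b_k$ are arbitrary.

```python
import cmath
import math

def complex_from_real(a_k, b_k):
    """Return (c_k, c_{-k}) for some k >= 1 from the real coefficients a_k, b_k."""
    return (a_k - 1j * b_k) / 2, (a_k + 1j * b_k) / 2

def real_from_complex(c_k, c_minus_k):
    """Return (a_k, b_k) for some k >= 1 from c_k and c_{-k}."""
    return c_k + c_minus_k, 1j * (c_k - c_minus_k)

# Round trip on arbitrary sample values.
c_k, c_minus_k = complex_from_real(0.3, -1.2)
print(real_from_complex(c_k, c_minus_k))

# The k-th terms of the two forms of the Fourier sum agree at a sample point.
k, x = 3, 0.7
complex_term = c_k * cmath.exp(1j * k * x) + c_minus_k * cmath.exp(-1j * k * x)
real_term = 0.3 * math.cos(k * x) - 1.2 * math.sin(k * x)
print(complex_term, real_term)
```

The second comparison is the identity $c_k e^{ikx} + c_{-k} e^{-ikx} = a_k \cos kx + b_k \sin kx$ which underlies the real form of $s_n$.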

(b) I have taken the functions f to be defined on the interval ]−π, π] rather than on the circle $S^1$ = {z : z ∈ C, |z| = 1}. There would be advantages in elegance of language in using $S^1$, though I do not recall often seeing the formula
\[ c_k = \int z^k f(z)dz \]
which is the natural translation of $c_k = \frac{1}{2\pi} \int e^{ikx} f(x)dx$ under the substitution x = arg z, dx = 2πν(dz). However, applications of the theory tend to deal with periodic functions on the real line, so I work with ]−π, π], and accept the fact that its group operation $+_{2\pi}$, writing $x +_{2\pi} y$ for whichever of x + y, x + y + 2π, x + y − 2π belongs to ]−π, π], is less familiar than multiplication on $S^1$.

(c) The remarks in (b) are supposed to remind you of §255.

(d) Observe that if f =a.e. g then f and g have the same Fourier coefficients, Fourier sums and Fejér sums. This means that we could, if we wished, regard the $c_k$, $s_n$ and $\sigma_m$ as associated with a member of $L^1_{\mathbb{C}}$, the space of equivalence classes of integrable functions (§242), rather than as associated with a particular function f. Since however the $s_n$ and $\sigma_m$ appear as actual functions, and since many of the questions we are interested in refer to their values at particular points, it is more natural to express the theory in terms of integrable functions f rather than in terms of members of $L^1_{\mathbb{C}}$.

282C The problems (a) Under what conditions, and in what senses, do the Fourier and Fejér sums sn and σm of
a function f converge to f ?

(b) How do the properties of the double-ended sequence $\langle c_k \rangle_{k \in \mathbb{Z}}$ reflect the properties of f, and vice versa?
Remark The theory of Fourier series has been one of the leading topics of analysis for nearly two hundred years, and
innumerable further problems have contributed greatly to our understanding. (For instance: can one characterize those
sequences $\langle c_k \rangle_{k \in \mathbb{Z}}$ which are the Fourier coefficients of some integrable function?) But in this outline I will concentrate
on the question (a) above, with one and a half results (282K, 282Rb) addressing (b), which will give us more than
enough material to work on.
While most people would feel that the Fourier sums are somehow closer to what we really want to know, it turns out
that the Fejér sums are easier to analyse, and there are advantages in dealing with them first. So while you may wish
to look ahead to the statements of 282J, 282L and 282O for an idea of where we are going, the first half of this section
will be largely about Fejér sums. Note that in any case in which we know that the Fourier sums converge (which is
quite common; see, for instance, the examples in 282Xh and 282Xo), then if we know that the Fejér sums converge to
f , we can deduce that the Fourier sums also do, by 273Ca.
The first step is a basic lemma showing that both the Fourier and Fejér sums of a function f can be thought of as
convolutions of f with kernels describable in terms of familiar functions.

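282D Lemma Let f be a complex-valued function which is integrable over ]−π, π], and
\[ c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx} dx, \quad s_n(x) = \sum_{k=-n}^n c_k e^{ikx}, \quad \sigma_m(x) = \frac{1}{m+1} \sum_{n=0}^m s_n(x) \]
its Fourier coefficients, Fourier sums and Fejér sums. Write $\tilde f$ for the periodic extension of f (282Ae). For m ∈ N, write
\[ \psi_m(t) = \frac{1 - \cos(m+1)t}{2\pi(m+1)(1 - \cos t)} \]
for 0 < |t| ≤ π. (If you like, you can set $\psi_m(0) = \frac{m+1}{2\pi}$ to make $\psi_m$ continuous on [−π, π].)
(a) For each n ∈ N, x ∈ ]−π, π],
\[ \begin{aligned}
s_n(x) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \frac{\sin(n+\frac{1}{2})(x-t)}{\sin\frac{1}{2}(x-t)} dt \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde f(x+t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} dt \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x -_{2\pi} t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} dt,
\end{aligned} \]
writing $x -_{2\pi} t$ for whichever of x − t, x − t − 2π, x − t + 2π belongs to ]−π, π].

(b) For each m ∈ N, x ∈ ]−π, π],
\[ \begin{aligned}
\sigma_m(x) &= \int_{-\pi}^{\pi} \tilde f(x+t) \psi_m(t) dt \\
&= \int_0^{\pi} (\tilde f(x+t) + \tilde f(x-t)) \psi_m(t) dt \\
&= \int_{-\pi}^{\pi} f(x -_{2\pi} t) \psi_m(t) dt.
\end{aligned} \]

(c) For any n ∈ N,
\[ \frac{1}{2\pi} \int_{-\pi}^0 \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} dt = \frac{1}{2\pi} \int_0^{\pi} \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} dt = \frac{1}{2}, \quad \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} dt = 1. \]

(d) For any m ∈ N,
(i) $0 \le \psi_m(t) \le \frac{m+1}{2\pi}$ for every t;
(ii) for any δ > 0, $\lim_{m\to\infty} \psi_m(t) = 0$ uniformly on {t : δ ≤ |t| ≤ π};
(iii) $\int_{-\pi}^0 \psi_m = \int_0^{\pi} \psi_m = \frac{1}{2}$, $\int_{-\pi}^{\pi} \psi_m = 1$.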
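proof Really all that these amount to is summing geometric series.
(a) For (a), we have
\[ \sum_{k=-n}^n e^{-ikt} = \frac{e^{int} - e^{-i(n+1)t}}{1 - e^{-it}} = \frac{e^{i(n+\frac{1}{2})t} - e^{-i(n+\frac{1}{2})t}}{e^{\frac{1}{2}it} - e^{-\frac{1}{2}it}} = \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}. \]
So
\[ \begin{aligned}
s_n(x) &= \sum_{k=-n}^n c_k e^{ikx} = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \Bigl(\sum_{k=-n}^n e^{ik(x-t)}\Bigr) dt \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \frac{\sin(n+\frac{1}{2})(x-t)}{\sin\frac{1}{2}(x-t)} dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde f(t) \frac{\sin(n+\frac{1}{2})(x-t)}{\sin\frac{1}{2}(x-t)} dt \\
&= \frac{1}{2\pi} \int_{-\pi-x}^{\pi-x} \tilde f(x+t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde f(x+t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} dt
\end{aligned} \]
because $\tilde f$ and $t \mapsto \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}$ are periodic with period 2π, so that the integral from −π − x to −π must be the same as the integral from π − x to π.
For the expression in terms of $f(x -_{2\pi} t)$, we have
\[ s_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde f(x+t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde f(x-t) \frac{\sin(n+\frac{1}{2})(-t)}{\sin\frac{1}{2}(-t)} dt \]
(substituting −t for t)
\[ = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x -_{2\pi} t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} dt \]
because (for x, t ∈ ]−π, π]) $f(x -_{2\pi} t) = \tilde f(x - t)$ whenever either is defined, and sin is an odd function.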
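(b) In the same way, we have
\[ \begin{aligned}
\sum_{n=0}^m \sin(n+\tfrac{1}{2})t &= \mathrm{Im}\Bigl(\sum_{n=0}^m e^{i(n+\frac{1}{2})t}\Bigr) = \mathrm{Im}\Bigl(e^{\frac{1}{2}it} \sum_{n=0}^m e^{int}\Bigr) \\
&= \mathrm{Im}\Bigl(e^{\frac{1}{2}it} \frac{1 - e^{i(m+1)t}}{1 - e^{it}}\Bigr) = \mathrm{Im}\Bigl(\frac{1 - e^{i(m+1)t}}{e^{-\frac{1}{2}it} - e^{\frac{1}{2}it}}\Bigr) \\
&= \mathrm{Im}\Bigl(\frac{1 - e^{i(m+1)t}}{-2i \sin\frac{1}{2}t}\Bigr) = \mathrm{Im}\Bigl(\frac{i(1 - e^{i(m+1)t})}{2 \sin\frac{1}{2}t}\Bigr) \\
&= \frac{1 - \cos(m+1)t}{2 \sin\frac{1}{2}t}.
\end{aligned} \]
So
\[ \sum_{n=0}^m \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} = \frac{1 - \cos(m+1)t}{2 \sin^2\frac{1}{2}t} = \frac{1 - \cos(m+1)t}{1 - \cos t} = 2\pi(m+1)\psi_m(t). \]
Accordingly,
\[ \begin{aligned}
\sigma_m(x) &= \frac{1}{m+1} \sum_{n=0}^m s_n(x) \\
&= \frac{1}{m+1} \sum_{n=0}^m \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde f(x+t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} dt \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde f(x+t) \Bigl(\frac{1}{m+1} \sum_{n=0}^m \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}\Bigr) dt \\
&= \int_{-\pi}^{\pi} \tilde f(x+t) \psi_m(t) dt = \int_{-\pi}^{\pi} f(x -_{2\pi} t) \psi_m(t) dt
\end{aligned} \]
as in (a), because cos and $\psi_m$ are even functions. For the same reason,
\[ \int_0^{\pi} \tilde f(x-t) \psi_m(t) dt = \int_{-\pi}^0 \tilde f(x+t) \psi_m(t) dt, \]
so
\[ \sigma_m(x) = \int_0^{\pi} (\tilde f(x+t) + \tilde f(x-t)) \psi_m(t) dt. \]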

(c) We need only look at where the formula $\frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}$ came from to see that
\[ \frac{1}{2\pi} \int_I \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} dt = \frac{1}{2\pi} \int_I \sum_{k=-n}^n e^{ikt} dt = \frac{1}{2\pi} \int_I \Bigl(1 + 2 \sum_{k=1}^n \cos kt\Bigr) dt = \frac{1}{2} \]
for both I = [−π, 0] and I = [0, π], because $\int_I \cos kt \, dt = 0$ for every k ≠ 0.
(d)(i) $\psi_m(t) \ge 0$ for every t because 1 − cos(m + 1)t, 1 − cos t are always greater than or equal to 0. For the upper bound, we have, using the constructions in (a) and (b),
\[ \Bigl|\frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}\Bigr| = \Bigl|\sum_{k=-n}^n e^{ikt}\Bigr| \le 2n + 1 \]
for every n, so
\[ \psi_m(t) = \frac{1}{2\pi(m+1)} \sum_{n=0}^m \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} \le \frac{1}{2\pi(m+1)} \sum_{n=0}^m (2n + 1) = \frac{m+1}{2\pi}. \]

(ii) If δ ≤ |t| ≤ π,
\[ \psi_m(t) \le \frac{1}{\pi(m+1)(1 - \cos t)} \le \frac{1}{\pi(m+1)(1 - \cos \delta)} \to 0 \]
as m → ∞.
(iii) also follows from the construction in (b), because
\[ \int_I \psi_m = \frac{1}{2\pi(m+1)} \sum_{n=0}^m \int_I \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t} dt = \frac{1}{m+1} \sum_{n=0}^m \frac{1}{2} = \frac{1}{2} \]
for both I = [−π, 0] and I = [0, π], using (c).


Remarks For a discussion of substitution in integrals, if you feel any need to justify the manipulations in part (a) of the proof, see 263I.
The functions
\[ t \mapsto \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}, \quad t \mapsto \frac{1 - \cos(m+1)t}{(m+1)(1 - \cos t)} \]
are called respectively the Dirichlet kernel and the Fejér kernel.
I give the formulae in terms of $f(x -_{2\pi} t)$ in (a) and (b) in order to provide a link with the work of 255O.
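
The identities behind the two kernels can be checked numerically. The Python sketch below is a sanity check under assumptions of my own: the function names are hypothetical, and the integrals in 282D(d-iii) are approximated by a midpoint rule whose sample points avoid t = 0, where the formula for $\psi_m$ is not defined.

```python
import cmath
import math

def dirichlet_ratio(n, t):
    """sin((n + 1/2) t) / sin(t / 2), for t not a multiple of 2*pi."""
    return math.sin((n + 0.5) * t) / math.sin(0.5 * t)

def exponential_sum(n, t):
    """sum_{k=-n}^{n} e^{ikt}, the geometric series from part (a) of the proof."""
    return sum(cmath.exp(1j * k * t) for k in range(-n, n + 1))

def psi(m, t):
    """psi_m(t) = (1 - cos((m+1)t)) / (2*pi*(m+1)*(1 - cos t)), for 0 < |t| <= pi."""
    return (1 - math.cos((m + 1) * t)) / (2 * math.pi * (m + 1) * (1 - math.cos(t)))

def psi_from_dirichlet(m, t):
    """(1/(2*pi*(m+1))) * sum_{n=0}^{m} sin((n+1/2)t)/sin(t/2), as in part (b)."""
    return sum(dirichlet_ratio(n, t) for n in range(m + 1)) / (2 * math.pi * (m + 1))

def midpoint_integral(g, lo, hi, steps=4000):
    """Midpoint-rule approximation to the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (j + 0.5) * h) for j in range(steps))

print(exponential_sum(5, 0.7), dirichlet_ratio(5, 0.7))
print(psi(6, 1.3), psi_from_dirichlet(6, 1.3))
print(midpoint_integral(lambda t: psi(5, t), -math.pi, math.pi))
print(midpoint_integral(lambda t: psi(5, t), 0.0, math.pi))
```

The first two lines compare the two expressions for each kernel at a sample point; the last two approximate $\int_{-\pi}^{\pi} \psi_m$ and $\int_0^{\pi} \psi_m$ for m = 5.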

282E The next step is a vital lemma, with a suitably distinguished name which (you will be glad to know) reflects
its importance rather than its difficulty.
The Riemann-Lebesgue lemma Let f be a complex-valued function which is integrable over R. Then
\[ \lim_{y\to\infty} \int f(x) e^{-iyx} dx = \lim_{y\to-\infty} \int f(x) e^{-iyx} dx = 0. \]

proof (a) Consider first the case in which f = χ ]a, b[, where a < b. Then
\[ \Bigl|\int f(x) e^{-iyx} dx\Bigr| = \Bigl|\int_a^b e^{-iyx} dx\Bigr| = \Bigl|\frac{1}{-iy}(e^{-iyb} - e^{-iya})\Bigr| \le \frac{2}{|y|} \]
if y ≠ 0. So in this case the result is obvious.
(b) It follows at once that the result is true if f is a step-function with bounded support, that is, if there are $a_0 \le a_1 \le \ldots \le a_n$ such that f is constant on every interval $]a_{j-1}, a_j[$ and zero outside $[a_0, a_n]$.
(c) Now, for a given integrable f and ε > 0, there is a step-function g such that $\int |f - g| \le \epsilon$ (242Oa). So
\[ \Bigl|\int f(x) e^{-iyx} dx - \int g(x) e^{-iyx} dx\Bigr| \le \int |f(x) - g(x)| dx \le \epsilon \]
for every y, and
\[ \limsup_{y\to\infty} \Bigl|\int f(x) e^{-iyx} dx\Bigr| \le \epsilon, \]
\[ \limsup_{y\to-\infty} \Bigl|\int f(x) e^{-iyx} dx\Bigr| \le \epsilon. \]
As ε is arbitrary, we have the result.
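
Parts (a) and (b) of this proof are explicit enough to compute with. The following Python sketch is an illustration only, with function names of my own choosing: it evaluates the closed form for an indicator function, and sums such terms for a step-function described by its breakpoints and values.

```python
import cmath

def indicator_transform(a, b, y):
    """int_a^b e^{-iyx} dx = (e^{-iyb} - e^{-iya}) / (-iy), for y != 0."""
    return (cmath.exp(-1j * y * b) - cmath.exp(-1j * y * a)) / (-1j * y)

def step_function_transform(breakpoints, values, y):
    """Transform of the step-function equal to values[j] on
    ]breakpoints[j], breakpoints[j+1][ and zero elsewhere."""
    return sum(v * indicator_transform(lo, hi, y)
               for v, lo, hi in zip(values, breakpoints, breakpoints[1:]))

# The modulus for chi ]0, 1[ never exceeds 2/|y|.
for y in (1.0, 10.0, 100.0, 1000.0):
    print(y, abs(indicator_transform(0.0, 1.0, y)), 2 / y)

# A step-function taking the value 1 on ]0, 1[ and -2 on ]1, 2[.
print(abs(step_function_transform([0.0, 1.0, 2.0], [1.0, -2.0], 500.0)))
```

For the step-function in the last line, the bound from (a) applied to each piece gives a modulus of at most (1 + 2) · 2/500.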

282F Corollary (a) Let f be a complex-valued function which is integrable over ]−π, π], and $\langle c_k \rangle_{k \in \mathbb{Z}}$ its sequence of Fourier coefficients. Then $\lim_{k\to\infty} c_k = \lim_{k\to-\infty} c_k = 0$.
(b) Let f be a complex-valued function which is integrable over R. Then $\lim_{y\to\infty} \int f(x) \sin yx \, dx = 0$.

proof (a) We need only identify
\[ c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx} dx \]
with $\int g(x) e^{-ikx} dx$, where g(x) = f(x)/2π for x ∈ dom f and 0 for |x| > π.
(b) This is just because
\[ \int f(x) \sin yx \, dx = \frac{1}{2i} \Bigl(\int f(x) e^{iyx} dx - \int f(x) e^{-iyx} dx\Bigr). \]

282G We are now ready for theorems on the convergence of Fejér sums. I start with an easy one, almost a
warming-up exercise.
Theorem Let f : ]−π, π] → C be a continuous function such that $\lim_{t \downarrow -\pi} f(t) = f(\pi)$. Then its sequence $\langle \sigma_m \rangle_{m \in \mathbb{N}}$ of Fejér sums converges uniformly to f on ]−π, π].
proof The conditions on f amount just to saying that its periodic extension $\tilde f$ is defined and continuous everywhere on R. Consequently it is bounded and uniformly continuous on any bounded interval, in particular, on the interval [−2π, 2π]. Set $K = \sup_{|t| \le 2\pi} |\tilde f(t)| = \sup_{t \in ]-\pi, \pi]} |f(t)|$. Write
\[ \psi_m(t) = \frac{1 - \cos(m+1)t}{2\pi(m+1)(1 - \cos t)} \]
for m ∈ N, 0 < |t| ≤ π, as in 282D.
Given ε > 0 we can find a δ ∈ ]0, π] such that $|\tilde f(x + t) - \tilde f(x)| \le \epsilon$ whenever x ∈ [−π, π] and |t| ≤ δ. Next, we can find an $m_0 \in \mathbb{N}$ such that $M_m \le \frac{\epsilon}{4\pi K}$ for every $m \ge m_0$, where $M_m = \sup_{\delta \le |t| \le \pi} \psi_m(t)$ (282D(d-ii)). Now suppose that $m \ge m_0$ and x ∈ ]−π, π]. Set $g(t) = \tilde f(x + t) - f(x)$ for |t| ≤ π. Then |g(t)| ≤ 2K for all t ∈ [−π, π] and |g(t)| ≤ ε if |t| ≤ δ, so
\[ \begin{aligned}
\Bigl|\int_{-\pi}^{\pi} g \times \psi_m\Bigr| &\le \int_{-\pi}^{-\delta} |g| \times \psi_m + \int_{-\delta}^{\delta} |g| \times \psi_m + \int_{\delta}^{\pi} |g| \times \psi_m \\
&\le 2M_m K(\pi - \delta) + \epsilon \int_{-\delta}^{\delta} \psi_m + 2M_m K(\pi - \delta) \\
&\le 4\pi M_m K + \epsilon \le 2\epsilon.
\end{aligned} \]
Consequently, using 282Db and 282D(d-iii),
\[ |\sigma_m(x) - f(x)| = \Bigl|\int_{-\pi}^{\pi} (\tilde f(x + t) - f(x)) \psi_m(t) dt\Bigr| \le 2\epsilon \]
for every $m \ge m_0$; and this is true for every x ∈ ]−π, π]. As ε is arbitrary, $\langle \sigma_m \rangle_{m \in \mathbb{N}}$ converges to f uniformly on ]−π, π].

282H I come now to a theorem describing the behaviour of the Fejér sums of general functions f . The hypothesis
of the theorem may take a little bit of digesting; you can get an idea of its intended scope by glancing at Corollary
282I.
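Theorem Let f be a complex-valued function which is integrable over ]−π, π], and $\langle \sigma_m \rangle_{m \in \mathbb{N}}$ its sequence of Fejér sums. Suppose that x ∈ ]−π, π] and c ∈ C are such that
\[ \lim_{\delta \downarrow 0} \frac{1}{\delta} \int_0^{\delta} |\tilde f(x + t) + \tilde f(x - t) - 2c| dt = 0, \]
writing $\tilde f$ for the periodic extension of f, as usual; then $\lim_{m\to\infty} \sigma_m(x) = c$.

proof Set $\phi(t) = |\tilde f(x + t) + \tilde f(x - t) - 2c|$ when this is defined, which is almost everywhere, and $\Phi(t) = \int_0^t \phi$, which is defined for every t ≥ 0, because $\tilde f$ is integrable over ]−π, π] and therefore over every bounded interval.
As in 282D, set
\[ \psi_m(t) = \frac{1 - \cos(m+1)t}{2\pi(m+1)(1 - \cos t)} \]
for m ∈ N, 0 < |t| ≤ π. We have
\[ |\sigma_m(x) - c| = \Bigl|\int_0^{\pi} (\tilde f(x + t) + \tilde f(x - t) - 2c) \psi_m(t) dt\Bigr| \le \int_0^{\pi} \phi \times \psi_m \]
by (b) and (d) of 282D.

Let ε > 0. By hypothesis, $\lim_{t \downarrow 0} \Phi(t)/t = 0$; let δ ∈ ]0, π] be such that Φ(t) ≤ εt for every t ∈ [0, δ]. Take any m ≥ π/δ. I break the integral $\int_0^{\pi} \phi \times \psi_m$ up into three parts.
(i) For the integral from 0 to 1/m, we have
\[ \int_0^{1/m} \phi \times \psi_m \le \frac{m+1}{2\pi} \int_0^{1/m} \phi = \frac{m+1}{2\pi} \Phi\Bigl(\frac{1}{m}\Bigr) \le \frac{\epsilon(m+1)}{2\pi m} \le \epsilon, \]
because $\psi_m(t) \le \frac{m+1}{2\pi}$ for every t (282D(d-i)).
(ii) For the integral from 1/m to δ, we have
\[ \int_{1/m}^{\delta} \phi \times \psi_m \le \frac{1}{\pi(m+1)} \int_{1/m}^{\delta} \phi(t) \frac{1}{1 - \cos t} dt \le \frac{\pi}{2(m+1)} \int_{1/m}^{\delta} \frac{\phi(t)}{t^2} dt \]
(because $1 - \cos(m+1)t \le 2$, as in 282D(d-ii), and $1 - \cos t \ge \frac{2t^2}{\pi^2}$ for |t| ≤ π)
\[ = \frac{\pi}{2(m+1)} \Bigl(\frac{\Phi(\delta)}{\delta^2} - \frac{\Phi(\frac{1}{m})}{(\frac{1}{m})^2} + \int_{1/m}^{\delta} \frac{2\Phi(t)}{t^3} dt\Bigr) \]
(integrating by parts; see 225F)
\[ \le \frac{\pi}{2(m+1)} \Bigl(\frac{\epsilon}{\delta} + \int_{1/m}^{\delta} \frac{2\epsilon}{t^2} dt\Bigr) \]
(because Φ(t) ≤ εt for 0 ≤ t ≤ δ)
\[ \le \frac{\pi}{2(m+1)} \Bigl(\frac{\epsilon}{\delta} + 2\epsilon m\Bigr) \le \frac{\pi\epsilon}{2(m+1)\delta} + \pi\epsilon \le \frac{\epsilon}{2} + \pi\epsilon \le 4\epsilon. \]

(iii) For the integral from δ to π, we have
\[ \int_{\delta}^{\pi} \phi \times \psi_m \le \frac{1}{\pi(m+1)(1 - \cos \delta)} \int_{\delta}^{\pi} \phi \to 0 \text{ as } m \to \infty \]
because φ is integrable over [−π, π]. There must therefore be an $m_0 \in \mathbb{N}$ such that
\[ \int_{\delta}^{\pi} \phi \times \psi_m \le \epsilon \]
for every $m \ge m_0$.
Putting these together, we see that
\[ \int_0^{\pi} \phi \times \psi_m \le \epsilon + 4\epsilon + \epsilon = 6\epsilon \]
for every $m \ge \max(m_0, \frac{\pi}{\delta})$. As ε is arbitrary, $\lim_{m\to\infty} \sigma_m(x) = c$, as claimed.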

282I Corollary Let f be a complex-valued function which is integrable over ]−π, π], and hσm im∈N its sequence of
Fejér sums.
R πm→∞ σm (x) for almost every x ∈ ]−π, π].
(a) f (x) = lim
(b) limm→∞ −π |f − σm | = 0.
(c) If g is another integrable function with the same Fourier coefficients, then f =a.e. g.
(d) If x ∈ ]−π, π[ is such that a = limt∈dom f,t↑x f (t) and b = limt∈dom f,t↓x f (t) are both defined in C, then
1
limm→∞ σm (x) = (a + b).
2
408 Fourier analysis 282I

(e) If a = limt∈dom f,t↑π f (t) and b = limt∈dom f,t↓−π f (t) are both defined in C, then
1
limm→∞ σm (π) = (a + b).
2

(f) If f is defined and continuous at x ∈ ]−π, π[, then


limm→∞ σm (x) = f (x).
(g) If f˜, the periodic extension of f , is defined and continuous at π, then
limm→∞ σm (π) = f (π).

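proof (a) We have only to recall that by 223D
$$\begin{aligned}
\limsup_{\delta\downarrow0}\frac1\delta\int_0^{\delta}|f(x+t)+f(x-t)-2f(x)|\,dt
&\le\limsup_{\delta\downarrow0}\frac1\delta\Big(\int_0^{\delta}|f(x+t)-f(x)|\,dt+\int_0^{\delta}|f(x-t)-f(x)|\,dt\Big)\\
&=\limsup_{\delta\downarrow0}\frac1\delta\int_{-\delta}^{\delta}|f(x+t)-f(x)|\,dt=0
\end{aligned}$$
for almost every x ∈ ]−π, π[.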


(b) Next observe that, in the language of 255O,
$$\sigma_m=f*\psi_m,$$
by the last formula in 282Db. Consequently, by 255Od,
$$\|\sigma_m\|_1\le\|f\|_1\|\psi_m\|_1,$$
writing $\|\sigma_m\|_1=\int_{-\pi}^{\pi}|\sigma_m|$. But this means that we have
$$f(x)=\lim_{m\to\infty}\sigma_m(x)\text{ for almost every }x,\qquad\limsup_{m\to\infty}\|\sigma_m\|_1\le\|f\|_1;$$
and it follows from 245H that $\lim_{m\to\infty}\|f-\sigma_m\|_1=0$.
(c) If g has the same Fourier coefficients as f , then it has the same Fourier and Fejér sums, so we have
g(x) = limm→∞ σm (x) = f (x)
almost everywhere.
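(d)-(e) Both of these amount to considering x ∈ ]−π, π] such that
$$\lim_{t\in\operatorname{dom}\tilde f,\,t\uparrow x}\tilde f(t)=a,\qquad\lim_{t\in\operatorname{dom}\tilde f,\,t\downarrow x}\tilde f(t)=b.$$
Setting $c=\frac12(a+b)$, $\phi(t)=|\tilde f(x+t)+\tilde f(x-t)-2c|$ whenever this is defined, we have $\lim_{t\in\operatorname{dom}\phi,\,t\downarrow0}\phi(t)=0$, so surely $\lim_{\delta\downarrow0}\frac1\delta\int_0^{\delta}\phi=0$, and the theorem applies.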
(f )-(g) are special cases of (d) and (e).
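
As a numerical illustration of parts (d) and (f), and not part of the original argument, here is a minimal Python sketch. The step function f = 1 on ]0, π], 0 on ]−π, 0] is a hypothetical example chosen here; its coefficients c_0 = 1/2, c_k = (1 − (−1)^k)/(2πik) follow from the definition in 282A. The sketch uses the standard rearrangement $\sigma_m(x)=\sum_{|k|\le m}(1-\frac{|k|}{m+1})c_ke^{ikx}$.

```python
import cmath
import math


def step_coefficient(k: int) -> complex:
    """k-th Fourier coefficient (1/2pi) * integral of f(x) e^{-ikx} dx for the
    illustrative step function f = 1 on ]0, pi], 0 on ]-pi, 0]."""
    if k == 0:
        return 0.5 + 0j
    return (1 - (-1) ** abs(k)) / (2j * math.pi * k)


def fejer_sum(m: int, x: float) -> float:
    """sigma_m(x), the mean of the Fourier sums s_0(x), ..., s_m(x), computed
    as the weighted sum over |k| <= m of (1 - |k|/(m+1)) c_k e^{ikx}."""
    total = 0j
    for k in range(-m, m + 1):
        weight = 1 - abs(k) / (m + 1)
        total += weight * step_coefficient(k) * cmath.exp(1j * k * x)
    return total.real


if __name__ == "__main__":
    # At the jump x = 0 the one-sided limits are 0 and 1, so (d) predicts 1/2.
    print(fejer_sum(50, 0.0))
    # At the continuity point x = 1, (f) predicts convergence to f(1) = 1.
    print(fejer_sum(400, 1.0))
```

At the jump the terms for k and −k cancel, so every Fejér sum there is 1/2 up to rounding; at x = 1 the sums approach 1 only at rate O(1/m), which is why a fairly large m is used.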

282J I now turn to conditions for the convergence of Fourier sums. Probably the easiest result – one which is both
striking and satisfying – is the following.
Theorem Let f be a complex-valued function which is square-integrable over ]−π, π]. Let $\langle c_k\rangle_{k\in\mathbb{Z}}$ be its Fourier coefficients and $\langle s_n\rangle_{n\in\mathbb{N}}$ its Fourier sums (282A). Then
(i) $\sum_{k=-\infty}^{\infty}|c_k|^2=\frac{1}{2\pi}\int_{-\pi}^{\pi}|f|^2$,
(ii) $\lim_{n\to\infty}\int_{-\pi}^{\pi}|f-s_n|^2=0$.

proof (a) I recall some notation from 244N/244P. Let $\mathcal{L}^2_{\mathbb{C}}$ be the space of square-integrable complex-valued functions on ]−π, π]. For g, h ∈ $\mathcal{L}^2_{\mathbb{C}}$, write
$$(g|h)=\int_{-\pi}^{\pi}g\times\bar h,\qquad\|g\|_2=\sqrt{(g|g)}.$$
Recall that $\|g+h\|_2\le\|g\|_2+\|h\|_2$ for all g, h ∈ $\mathcal{L}^2_{\mathbb{C}}$ (244Fb/244Pb). For k ∈ ℤ, x ∈ ]−π, π] set $e_k(x)=e^{ikx}$, so that
$$(f|e_k)=\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx=2\pi c_k.$$
Moreover, if |k| ≤ n,
$$(s_n|e_k)=\sum_{j=-n}^{n}c_j\int_{-\pi}^{\pi}e^{ijx}e^{-ikx}\,dx=2\pi c_k,$$
because
$$\int_{-\pi}^{\pi}e^{ijx}e^{-ikx}\,dx=\begin{cases}2\pi&\text{if }j=k,\\0&\text{if }j\ne k.\end{cases}$$
So
$$(f-s_n|e_k)=0\text{ whenever }|k|\le n;$$
in particular,
$$(f-s_n|s_n)=\sum_{k=-n}^{n}\bar c_k(f-s_n|e_k)=0$$
for every n ∈ ℕ.
(b) Fix ε > 0. The next element of the proof is the fact that there are m ∈ ℕ, $a_{-m},\ldots,a_m\in\mathbb{C}$ such that $\|f-h\|_2\le\epsilon$, where $h=\sum_{k=-m}^{m}a_ke_k$. P By 244Hb/244Pb we know that there is a continuous function g : [−π, π] → ℂ such that $\|f-g\|_2\le\frac{\epsilon}{3}$. Next, modifying g on a suitably short interval ]π − δ, π], we can find a continuous function $g_1$ : [−π, π] → ℂ such that $\|g-g_1\|_2\le\frac{\epsilon}{3}$ and $g_1(-\pi)=g_1(\pi)$. (Set $M=\sup_{x\in[-\pi,\pi]}|g(x)|$, take δ ∈ ]0, 2π] such that $(2M)^2\delta\le(\epsilon/3)^2$, and set $g_1(\pi-t\delta)=tg(\pi-\delta)+(1-t)g(-\pi)$ for t ∈ [0, 1].) Either by the Stone-Weierstrass theorem (281J), or by 282G above, there are $a_{-m},\ldots,a_m$ such that $|g_1(x)-\sum_{k=-m}^{m}a_ke^{ikx}|\le\frac{\epsilon}{3\sqrt{2\pi}}$ for every x ∈ [−π, π]; setting $h=\sum_{k=-m}^{m}a_ke_k$, we have $\|g_1-h\|_2\le\frac13\epsilon$, so that
$$\|f-h\|_2\le\|f-g\|_2+\|g-g_1\|_2+\|g_1-h\|_2\le\epsilon.\ \text{Q}$$

(c) Now take any n ≥ m. Then $s_n-h$ is a linear combination of $e_{-n},\ldots,e_n$, so $(f-s_n|s_n-h)=0$. Consequently
$$\begin{aligned}
\epsilon^2&\ge(f-h|f-h)\\
&=(f-s_n|f-s_n)+(f-s_n|s_n-h)+(s_n-h|f-s_n)+(s_n-h|s_n-h)\\
&=\|f-s_n\|_2^2+\|s_n-h\|_2^2\ge\|f-s_n\|_2^2.
\end{aligned}$$
Thus $\|f-s_n\|_2\le\epsilon$ for every n ≥ m. As ε is arbitrary, $\lim_{n\to\infty}\|f-s_n\|_2^2=0$, which proves (ii).


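(d) As for (i), we have
$$\sum_{k=-n}^{n}|c_k|^2=\frac{1}{2\pi}\sum_{k=-n}^{n}\bar c_k(s_n|e_k)=\frac{1}{2\pi}(s_n|s_n)=\frac{1}{2\pi}\|s_n\|_2^2.$$
But of course
$$\big|\|s_n\|_2-\|f\|_2\big|\le\|s_n-f\|_2\to0$$
as n → ∞, so
$$\sum_{k=-\infty}^{\infty}|c_k|^2=\frac{1}{2\pi}\lim_{n\to\infty}\|s_n\|_2^2=\frac{1}{2\pi}\|f\|_2^2=\frac{1}{2\pi}\int_{-\pi}^{\pi}|f|^2,$$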

as required.
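
To see (i) in a concrete case, the following Python sketch (an illustration added here, not part of the source) takes the hypothetical example f(x) = x on ]−π, π]. Integrating by parts gives c_0 = 0 and c_k = i(−1)^k/k for k ≠ 0, so |c_k|² = 1/k², while $\frac{1}{2\pi}\int_{-\pi}^{\pi}x^2dx=\pi^2/3$.

```python
import math


def parseval_partial_sum(n: int) -> float:
    """Sum of |c_k|^2 over 0 < |k| <= n for f(x) = x, using the closed form
    |c_k|^2 = 1/k^2 (an assumption of this example, derived by parts)."""
    return 2.0 * sum(1.0 / (k * k) for k in range(1, n + 1))


# Right-hand side of 282J(i): (1/2pi) * integral of x^2 over ]-pi, pi].
MEAN_SQUARE = math.pi ** 2 / 3

if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, parseval_partial_sum(n), MEAN_SQUARE)
```

The partial sums increase to the limit with a tail of order 2/n, consistent with the identity but giving no information about its proof.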

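282K Corollary Let $L^2_{\mathbb{C}}$ be the Hilbert space of equivalence classes of square-integrable complex-valued functions on ]−π, π], with the inner product
$$(f^\bullet|g^\bullet)=\int_{-\pi}^{\pi}f\times\bar g$$
and norm
$$\|f^\bullet\|_2=\Big(\int_{-\pi}^{\pi}|f|^2\Big)^{1/2},$$
writing $f^\bullet\in L^2_{\mathbb{C}}$ for the equivalence class of a square-integrable function f. Let $\ell^2_{\mathbb{C}}(\mathbb{Z})$ be the Hilbert space of square-summable double-ended complex sequences, with the inner product
$$(\mathbf{c}|\mathbf{d})=\sum_{k=-\infty}^{\infty}c_k\bar d_k$$
and norm
$$\|\mathbf{c}\|_2=\Big(\sum_{k=-\infty}^{\infty}|c_k|^2\Big)^{1/2}$$
for $\mathbf{c}=\langle c_k\rangle_{k\in\mathbb{Z}}$, $\mathbf{d}=\langle d_k\rangle_{k\in\mathbb{Z}}$ in $\ell^2_{\mathbb{C}}(\mathbb{Z})$. Then we have an inner-product-space isomorphism $S:L^2_{\mathbb{C}}\to\ell^2_{\mathbb{C}}(\mathbb{Z})$ defined by saying that
$$S(f^\bullet)(k)=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx$$
for every square-integrable function f and every k ∈ ℤ.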


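proof (a) As in 282J, write $\mathcal{L}^2_{\mathbb{C}}$ for the space of square-integrable functions. If f, g ∈ $\mathcal{L}^2_{\mathbb{C}}$ and $f^\bullet=g^\bullet$, then f =a.e. g, so
$$\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}g(x)e^{-ikx}\,dx$$
for every k ∈ ℤ. Thus S is well-defined.
(b) S is linear. P This is elementary. If f, g ∈ $\mathcal{L}^2_{\mathbb{C}}$ and c ∈ ℂ,
$$\begin{aligned}
S(f^\bullet+g^\bullet)(k)&=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}(f(x)+g(x))e^{-ikx}\,dx\\
&=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx+\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}g(x)e^{-ikx}\,dx\\
&=S(f^\bullet)(k)+S(g^\bullet)(k)
\end{aligned}$$
for every k ∈ ℤ, so that $S(f^\bullet+g^\bullet)=S(f^\bullet)+S(g^\bullet)$. Similarly,
$$S(cf^\bullet)(k)=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}cf(x)e^{-ikx}\,dx=\frac{c}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx=cS(f^\bullet)(k)$$
for every k ∈ ℤ, so that $S(cf^\bullet)=cS(f^\bullet)$. Q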

(c) If f ∈ $\mathcal{L}^2_{\mathbb{C}}$ has Fourier coefficients $c_k$, then $S(f^\bullet)=\langle c_k\sqrt{2\pi}\rangle_{k\in\mathbb{Z}}$, so by 282J(i)
$$\|S(f^\bullet)\|_2^2=2\pi\sum_{k=-\infty}^{\infty}|c_k|^2=\int_{-\pi}^{\pi}|f|^2=\|f^\bullet\|_2^2.$$
Thus $Su\in\ell^2_{\mathbb{C}}(\mathbb{Z})$ and $\|Su\|_2=\|u\|_2$ for every u ∈ $L^2_{\mathbb{C}}$. Because S is linear and norm-preserving, it is surely injective.
(d) It now follows that (Sv|Su) = (v|u) for every u, v ∈ $L^2_{\mathbb{C}}$. P (This is of course a standard fact about Hilbert spaces.) We know that for any t ∈ ℝ
$$\begin{aligned}
\|u\|_2^2+2\operatorname{Re}(e^{it}(v|u))+\|v\|_2^2&=(u|u)+e^{it}(v|u)+e^{-it}(u|v)+(v|v)\\
&=(u+e^{it}v|u+e^{it}v)\\
&=\|u+e^{it}v\|_2^2=\|S(u+e^{it}v)\|_2^2\\
&=\|Su\|_2^2+2\operatorname{Re}(e^{it}(Sv|Su))+\|Sv\|_2^2\\
&=\|u\|_2^2+2\operatorname{Re}(e^{it}(Sv|Su))+\|v\|_2^2,
\end{aligned}$$
so that $\operatorname{Re}(e^{it}(Sv|Su))=\operatorname{Re}(e^{it}(v|u))$. As t is arbitrary, (Sv|Su) = (v|u). Q
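(e) Finally, S is surjective. P Let $\mathbf{c}=\langle c_k\rangle_{k\in\mathbb{Z}}$ be any member of $\ell^2_{\mathbb{C}}(\mathbb{Z})$. Set $c^{(n)}_k=c_k$ if |k| ≤ n, 0 otherwise, and $\mathbf{c}^{(n)}=\langle c^{(n)}_k\rangle_{k\in\mathbb{Z}}$. Consider
$$s_n=\sum_{k=-n}^{n}c_ke_k,\qquad u_n=s_n^\bullet$$
where I write $e_k(x)=\frac{1}{\sqrt{2\pi}}e^{ikx}$ for x ∈ ]−π, π]. Then $Su_n=\mathbf{c}^{(n)}$, by the same calculations as in part (a) of the proof of 282J. Now
$$\|\mathbf{c}^{(n)}-\mathbf{c}\|_2=\sqrt{\sum_{|k|>n}|c_k|^2}\to0$$
as n → ∞, so
$$\|u_m-u_n\|_2=\|\mathbf{c}^{(m)}-\mathbf{c}^{(n)}\|_2\to0$$
as m, n → ∞, and $\langle u_n\rangle_{n\in\mathbb{N}}$ is a Cauchy sequence in $L^2_{\mathbb{C}}$. Because $L^2_{\mathbb{C}}$ is complete (244G/244Pb), $\langle u_n\rangle_{n\in\mathbb{N}}$ has a limit u ∈ $L^2_{\mathbb{C}}$, and now
$$Su=\lim_{n\to\infty}Su_n=\lim_{n\to\infty}\mathbf{c}^{(n)}=\mathbf{c}.\ \text{Q}$$
Thus $S:L^2_{\mathbb{C}}\to\ell^2_{\mathbb{C}}(\mathbb{Z})$ is an inner-product-space isomorphism.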

Remark In the language of Hilbert spaces, all that is happening here is that $\langle e_k^\bullet\rangle_{k\in\mathbb{Z}}$ is a ‘Hilbert space basis’ or ‘complete orthonormal sequence’ in $L^2_{\mathbb{C}}$, which is matched by S with the standard basis of $\ell^2_{\mathbb{C}}(\mathbb{Z})$. The only step which calls on non-trivial real analysis, as opposed to the general theory of Hilbert spaces, is the check that the linear subspace generated by $\{e_k^\bullet:k\in\mathbb{Z}\}$ is dense; this is part (b) of the proof of 282J.
Observe that while $S:L^2\to\ell^2$ is readily described, its inverse is more of a problem. If $\mathbf{c}\in\ell^2$, we should like to say that $S^{-1}\mathbf{c}$ is the equivalence class of f, where $f(x)=\frac{1}{\sqrt{2\pi}}\sum_{k=-\infty}^{\infty}c_ke^{ikx}$ for every x. This works very well if $\{k:c_k\ne0\}$ is finite, but for the general case it is less clear how to interpret the sum. It is in fact the case that if $\mathbf{c}\in\ell^2$ then
$$g(x)=\frac{1}{\sqrt{2\pi}}\lim_{n\to\infty}\sum_{k=-n}^{n}c_ke^{ikx}$$
is defined for almost every x ∈ ]−π, π], and that $S^{-1}\mathbf{c}=g^\bullet$ in $L^2$; this is, in effect, Carleson’s theorem (286V). A proof of Carleson’s theorem is out of our reach for the moment. What is covered by the results of this section is that
$$h(x)=\frac{1}{\sqrt{2\pi}}\lim_{m\to\infty}\frac{1}{m+1}\sum_{n=0}^{m}\sum_{k=-n}^{n}c_ke^{ikx}$$
is defined for almost every x ∈ ]−π, π], and that $h^\bullet=S^{-1}\mathbf{c}$. (The point is that we know from the result just proved that there is some square-integrable f such that $\mathbf{c}$ is the sequence of Fourier coefficients of f; now 282Ia declares that the Fejér sums of f converge to f almost everywhere, that is, that $h=_{\text{a.e.}}\frac{1}{\sqrt{2\pi}}f$.)

282L The next result is the easiest, and one of the most useful, theorems concerning pointwise convergence of
Fourier sums.

Theorem Let f be a complex-valued function which is integrable over ]−π, π], and $\langle s_n\rangle_{n\in\mathbb{N}}$ its sequence of Fourier sums.
(i) If f is differentiable at x ∈ ]−π, π[, then $f(x)=\lim_{n\to\infty}s_n(x)$.
(ii) If the periodic extension $\tilde f$ of f is differentiable at π, then $f(\pi)=\lim_{n\to\infty}s_n(\pi)$.

proof (a) Take x ∈ ]−π, π] such that $\tilde f$ is differentiable at x; of course this covers both parts. We have
$$s_n(x)=\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)}{\sin\frac12t}\sin(n+\tfrac12)t\,dt$$
for each n, by 282Da.

(b) Next,
$$\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{t}\,dt$$
exists in ℂ, because there is surely some δ ∈ ]0, π] such that $(\tilde f(x+t)-\tilde f(x))/t$ is bounded on {t : 0 < |t| ≤ δ}, while
$$\int_{-\pi}^{-\delta}\frac{\tilde f(x+t)-\tilde f(x)}{t}\,dt,\qquad\int_{\delta}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{t}\,dt$$
exist because 1/t is bounded on those intervals. It follows that
$$\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{\sin\frac12t}\,dt$$
exists, because $|t|\le\pi|\sin\frac12t|$ if |t| ≤ π. So by the Riemann-Lebesgue lemma (282Fb),
$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{\sin\frac12t}\sin(n+\tfrac12)t\,dt=0.$$

(c) Because
$$\frac{1}{2\pi}\tilde f(x)\int_{-\pi}^{\pi}\frac{\sin(n+\frac12)t}{\sin\frac12t}\,dt=\tilde f(x)$$
for every n (282Dc),
$$s_n(x)=\tilde f(x)+\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{\sin\frac12t}\sin(n+\tfrac12)t\,dt\to\tilde f(x)$$
as n → ∞, as required.
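
The following Python sketch, added as an illustration under stated assumptions, checks part (i) numerically for the hypothetical example f(x) = x on ]−π, π]. Its coefficients are c_0 = 0, c_k = i(−1)^k/k, so that $s_n(x)=\sum_{k=1}^{n}\frac{2(-1)^{k+1}}{k}\sin kx$. The function is differentiable at every point of ]−π, π[, so the theorem predicts s_n(x) → x there; it says nothing about x = π, where the periodic extension jumps.

```python
import math


def fourier_sum_of_identity(n: int, x: float) -> float:
    """n-th Fourier sum of f(x) = x on ]-pi, pi], from the closed-form
    coefficients c_k = i(-1)^k / k (derived by parts for this example)."""
    return sum(
        2.0 * (-1) ** (k + 1) * math.sin(k * x) / k for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in (10, 100, 2000):
        # Values should drift towards f(1) = 1 and f(-2) = -2 respectively.
        print(n, fourier_sum_of_identity(n, 1.0), fourier_sum_of_identity(n, -2.0))
```

Convergence is slow (the error is of order 1/n at a fixed interior point), which is typical of Fourier sums of a function whose periodic extension is discontinuous.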

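282M Lemma Suppose that f is a complex-valued function, defined almost everywhere and of bounded variation on ]−π, π]. Then $\sup_{k\in\mathbb{Z}}|kc_k|<\infty$, where $c_k$ is the kth Fourier coefficient of f, as in 282A.
proof Set
$$M=\lim_{x\in\operatorname{dom}f,\,x\uparrow\pi}|f(x)|+\operatorname{Var}_{]-\pi,\pi[}(f).$$
By 224J,
$$\begin{aligned}
|kc_k|&=\frac{1}{2\pi}\Big|\int_{-\pi}^{\pi}kf(t)e^{-ikt}\,dt\Big|\le\frac{1}{2\pi}M\sup_{c\in[-\pi,\pi]}\Big|\int_{-\pi}^{c}ke^{-ikt}\,dt\Big|\\
&=\frac{M}{2\pi}\sup_{c\in[-\pi,\pi]}|e^{-ikc}-e^{ik\pi}|\le\frac{M}{\pi}
\end{aligned}$$
for every k.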

282N I give another lemma, extracting the technical part of the proof of the next theorem. (Its most natural application is in 282Xn.)
Lemma Let $\langle d_k\rangle_{k\in\mathbb{N}}$ be a complex sequence, and set $t_n=\sum_{k=0}^{n}d_k$, $\tau_m=\frac{1}{m+1}\sum_{n=0}^{m}t_n$ for n, m ∈ ℕ. Suppose that $\sup_{k\in\mathbb{N}}|kd_k|=M<\infty$. Then for any j ≥ 1 and any c ∈ ℂ,
$$|t_n-c|\le\frac{M}{j}+(2j+3)\sup_{m\ge n-n/j}|\tau_m-c|$$
for every $n\ge j^2$.

proof (a) The first point to note is that for any n, n′ ∈ ℕ,
$$|t_n-t_{n'}|\le\frac{M|n-n'|}{1+\min(n,n')}.$$
P If n = n′ this is trivial. Suppose that n′ < n. Then
$$|t_n-t_{n'}|=\Big|\sum_{k=n'+1}^{n}d_k\Big|\le\sum_{k=n'+1}^{n}\frac{M}{k}\le\frac{M(n-n')}{n'+1}=\frac{M|n-n'|}{1+\min(n',n)}.$$
Of course the case n < n′ is identical. Q
(b) Now take any $n\ge j^2$. Set $\eta=\sup_{m\ge n-n/j}|\tau_m-c|$. Let m ≥ j be such that jm ≤ n < j(m + 1); then n < jm + m; also
$$n(1-\tfrac1j)\le m(j+1)(1-\tfrac1j)\le mj.$$
Set
$$\tau^*=\frac1m\sum_{n'=jm+1}^{jm+m}t_{n'}=\frac{jm+m+1}{m}\tau_{jm+m}-\frac{jm+1}{m}\tau_{jm}.$$
Then
$$\begin{aligned}
|\tau^*-c|&=\Big|\frac{jm+m+1}{m}\tau_{jm+m}-\frac{jm+1}{m}\tau_{jm}-c\Big|\\
&=\Big|\frac{jm+m+1}{m}(\tau_{jm+m}-c)-\frac{jm+1}{m}(\tau_{jm}-c)\Big|\\
&\le\frac{jm+m+1}{m}\eta+\frac{jm+1}{m}\eta\le(2j+3)\eta.
\end{aligned}$$
On the other hand,
$$\begin{aligned}
|\tau^*-t_n|&=\Big|\frac1m\sum_{n'=jm+1}^{jm+m}(t_{n'}-t_n)\Big|\le\frac1m\sum_{n'=jm+1}^{jm+m}\frac{M|n-n'|}{1+\min(n,n')}\\
&\le\frac1m\sum_{n'=jm+1}^{jm+m}\frac{Mm}{1+jm}=\frac{Mm}{1+jm}\le\frac{M}{j}.
\end{aligned}$$
Putting these together, we have
$$|t_n-c|\le|t_n-\tau^*|+|\tau^*-c|\le\frac{M}{j}+(2j+3)\eta=\frac{M}{j}+(2j+3)\sup_{m\ge n-n/j}|\tau_m-c|,$$
as required.
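
A small numerical sanity check of the inequality in 282N can be sketched in Python as follows; this is an illustration added here, not part of the proof. The sequence d_k = (−1)^k/(k+1), with sum c = log 2 and M = sup k/(k+1) = 1, is a hypothetical choice, and the supremum over all m ≥ n − n/j is replaced by a maximum over a finite range up to `horizon`. That truncation can only make the right-hand side smaller, so the check is if anything more demanding than the lemma.

```python
import math


def tauberian_check(n: int, j: int, horizon: int) -> tuple:
    """Return (|t_n - c|, M/j + (2j+3) * eta') for d_k = (-1)^k / (k+1),
    where eta' is the maximum of |tau_m - c| over n - n/j <= m <= horizon
    (a truncated stand-in for the supremum in the lemma)."""
    if not (j >= 1 and n >= j * j and horizon >= n):
        raise ValueError("need j >= 1, n >= j^2 and horizon >= n")
    c = math.log(2.0)      # sum of the alternating harmonic series
    big_m = 1.0            # sup_k |k d_k| = sup_k k/(k+1)
    partial = 0.0          # t_i, the i-th partial sum
    running = 0.0          # t_0 + ... + t_i
    t_n = 0.0
    eta = 0.0
    for i in range(horizon + 1):
        partial += (-1) ** i / (i + 1)
        running += partial
        tau = running / (i + 1)   # tau_i, the Cesaro mean
        if i == n:
            t_n = partial
        if i >= n - n / j:
            eta = max(eta, abs(tau - c))
    return abs(t_n - c), big_m / j + (2 * j + 3) * eta


if __name__ == "__main__":
    lhs, rhs = tauberian_check(100, 10, 400)
    print(lhs, rhs, lhs <= rhs)
```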

282O Theorem Let f be a complex-valued function of bounded variation, defined almost everywhere in ]−π, π], and let $\langle s_n\rangle_{n\in\mathbb{N}}$ be its sequence of Fourier sums.
(i) If x ∈ ]−π, π[, then
$$\lim_{n\to\infty}s_n(x)=\tfrac12\big(\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)\big).$$
(ii) $\lim_{n\to\infty}s_n(\pi)=\tfrac12\big(\lim_{t\in\operatorname{dom}f,\,t\uparrow\pi}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow-\pi}f(t)\big)$.
(iii) If f is defined throughout ]−π, π], is continuous, and $\lim_{t\downarrow-\pi}f(t)=f(\pi)$, then $s_n(x)\to f(x)$ uniformly on ]−π, π].
proof (a) Note first that 224F shows that the limits $\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$, $\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$ required in the formulae above always exist. We know also from 282M that $M=\sup_{k\in\mathbb{Z}}|kc_k|<\infty$, where $c_k$ is the kth Fourier coefficient of f.
Take any x ∈ ]−π, π], and set
$$c=\tfrac12\big(\lim_{t\in\operatorname{dom}\tilde f,\,t\uparrow x}\tilde f(t)+\lim_{t\in\operatorname{dom}\tilde f,\,t\downarrow x}\tilde f(t)\big),$$
writing $\tilde f$ for the periodic extension of f, as usual. We know from 282Id-282Ie that $c=\lim_{m\to\infty}\sigma_m(x)$, writing $\sigma_m$ for the Fejér sums of f. Let ε > 0. Take any j ≥ max(2, 2M/ε), and $m_0\ge1$ such that $|\sigma_m(x)-c|\le\epsilon/(2j+3)$ for every $m\ge m_0$.
Now if $n\ge\max(j^2,2m_0)$, apply Lemma 282N with
$$d_0=c_0,\qquad d_k=c_ke^{ikx}+c_{-k}e^{-ikx}\text{ for }k\ge1,$$
so that $t_n=s_n(x)$, $\tau_m=\sigma_m(x)$ and $|kd_k|\le2M$ for every k, n, m ∈ ℕ. We have $n-n/j\ge\frac12n\ge m_0$, so
$$\eta=\sup_{m\ge n-n/j}|\tau_m-c|\le\sup_{m\ge m_0}|\tau_m-c|\le\frac{\epsilon}{2j+3}.$$
So 282N tells us that
$$|s_n(x)-c|=|t_n-c|\le\frac{2M}{j}+(2j+3)\sup_{m\ge n-n/j}|\tau_m-c|\le\epsilon+(2j+3)\eta\le2\epsilon.$$
As ε is arbitrary, $\lim_{n\to\infty}s_n(x)=c$, as required.


(b) This proves (i) and (ii) of this theorem. Finally, for (iii), observe that under these conditions $\sigma_m(x)\to f(x)$ uniformly as m → ∞, by 282G. So given ε > 0 we choose j ≥ max(2, 2M/ε) and $m_0\in\mathbb{N}$ such that $|\sigma_m(x)-f(x)|\le\epsilon/(2j+3)$ whenever $m\ge m_0$ and x ∈ ]−π, π]. By the same calculation as before,
$$|s_n(x)-f(x)|\le2\epsilon$$
for every $n\ge\max(j^2,2m_0)$ and every x ∈ ]−π, π]. As ε is arbitrary, $\lim_{n\to\infty}s_n(x)=f(x)$ uniformly for x ∈ ]−π, π].

282P Corollary Let f be a complex-valued function which is integrable over ]−π, π], and $\langle s_n\rangle_{n\in\mathbb{N}}$ its sequence of Fourier sums.
(i) Suppose that x ∈ ]−π, π[ is such that f is of bounded variation on some neighbourhood of x. Then
$$\lim_{n\to\infty}s_n(x)=\tfrac12\big(\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)\big).$$
(ii) If there is a δ > 0 such that f is of bounded variation on both ]−π, −π + δ] and [π − δ, π], then
$$\lim_{n\to\infty}s_n(\pi)=\tfrac12\big(\lim_{t\in\operatorname{dom}f,\,t\uparrow\pi}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow-\pi}f(t)\big).$$

proof In case (i), take δ > 0 such that f is of bounded variation on [x − δ, x + δ] and set $f_1(t)=f(t)$ if t ∈ dom f ∩ [x − δ, x + δ], 0 for other t ∈ ]−π, π]; in case (ii), set $f_1(t)=f(t)$ if t ∈ dom f and |t| ≥ π − δ, 0 for other t ∈ ]−π, π], and say that x = π. In either case, $f_1$ is of bounded variation, so by 282O the Fourier sums $\langle s'_n\rangle_{n\in\mathbb{N}}$ of $f_1$ converge at x to the value given by the formulae above. But now observe that, writing $\tilde f$ and $\tilde f_1$ for the periodic extensions of f and $f_1$, $\tilde f-\tilde f_1=0$ on a neighbourhood of x, so
$$\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f_1(x+t)}{\sin\frac12t}\,dt$$
exists in ℂ, and by 282Fb
$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f_1(x+t)}{\sin\frac12t}\sin(n+\tfrac12)t\,dt=0,$$
that is, $\lim_{n\to\infty}(s_n(x)-s'_n(x))=0$. So $\langle s_n(x)\rangle_{n\in\mathbb{N}}$ also converges to the right limit.

282Q I cannot leave this section without mentioning one of the most important facts about Fourier series, even
though I have no space here to discuss its consequences.
Theorem Let f and g be complex-valued functions which are integrable over ]−π, π], and $\langle c_k\rangle_{k\in\mathbb{Z}}$, $\langle d_k\rangle_{k\in\mathbb{Z}}$ their Fourier coefficients. Let f ∗ g be their convolution, defined by the formula
$$(f*g)(x)=\int_{-\pi}^{\pi}f(x-_{2\pi}t)g(t)\,dt=\int_{-\pi}^{\pi}\tilde f(x-t)g(t)\,dt,$$
as in 255O, writing $\tilde f$ for the periodic extension of f. Then the Fourier coefficients of f ∗ g are $\langle2\pi c_kd_k\rangle_{k\in\mathbb{Z}}$.
proof By 255O(c-i),
$$\begin{aligned}
\frac{1}{2\pi}\int_{-\pi}^{\pi}(f*g)(x)e^{-ikx}\,dx&=\frac{1}{2\pi}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}e^{-ik(t+u)}f(t)g(u)\,dt\,du\\
&=\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-ikt}f(t)\,dt\int_{-\pi}^{\pi}e^{-iku}g(u)\,du=2\pi c_kd_k.
\end{aligned}$$
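
The identity can be checked numerically on trigonometric polynomials; the Python sketch below is an illustration added here, with `f`, `g`, `GRID` and the grid size `N` all hypothetical choices. It relies on the standard fact that an N-point equally spaced Riemann sum integrates $e^{imx}$ exactly over a period whenever 0 < |m| < N, so for the low-degree examples used the computed values agree with the true integrals up to rounding.

```python
import cmath
import math

N = 32  # number of grid points; exact for the low-degree examples below
GRID = [-math.pi + 2 * math.pi * (j + 1) / N for j in range(N)]


def coefficient(h, k: int) -> complex:
    """(1/2pi) * integral of h(x) e^{-ikx} dx over ]-pi, pi], as a Riemann sum."""
    return sum(h(x) * cmath.exp(-1j * k * x) for x in GRID) / N


def convolve(f, g):
    """Return x -> integral of f(x - t) g(t) dt over ]-pi, pi], as a Riemann
    sum; f is assumed to be 2pi-periodic and defined on the whole line."""
    def conv(x: float) -> complex:
        return (2 * math.pi / N) * sum(f(x - t) * g(t) for t in GRID)
    return conv


def f(x: float) -> complex:
    # Coefficients: c_1 = 1, c_{-2} = 3, all others 0.
    return cmath.exp(1j * x) + 3 * cmath.exp(-2j * x)


def g(x: float) -> complex:
    # Coefficients: d_0 = 0.5, d_1 = 2, d_2 = -1, all others 0.
    return 0.5 + 2 * cmath.exp(1j * x) - cmath.exp(2j * x)


if __name__ == "__main__":
    fg = convolve(f, g)
    for k in range(-3, 4):
        print(k, coefficient(fg, k), 2 * math.pi * coefficient(f, k) * coefficient(g, k))
```

Only k = 1 gives a non-zero product here, because it is the only index at which both c_k and d_k are non-zero.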

*282R In my hurry to get to the theorems on convergence of Fejér and Fourier sums, I have rather neglected the
elementary manipulations which are essential when applying the theory. One basic result is the following.
Proposition (a) Let f : [−π, π] → ℂ be an absolutely continuous function such that f(−π) = f(π), and $\langle c_k\rangle_{k\in\mathbb{Z}}$ its sequence of Fourier coefficients. Then the Fourier coefficients of f′ are $\langle ikc_k\rangle_{k\in\mathbb{Z}}$.

(b) Let f : ℝ → ℂ be a differentiable function such that f′ is absolutely continuous on [−π, π], and f(π) = f(−π). If $\langle c_k\rangle_{k\in\mathbb{Z}}$ are the Fourier coefficients of f↾]−π, π], then $\sum_{k=-\infty}^{\infty}|c_k|$ is finite.
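proof (a) By 225Cb, f′ is integrable over [−π, π]; by 225E, f is an indefinite integral of f′. So 225F tells us that
$$\int_{-\pi}^{\pi}f'(x)e^{-ikx}\,dx=f(\pi)e^{-ik\pi}-f(-\pi)e^{ik\pi}+ik\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx=2\pi ikc_k$$
for every k ∈ ℤ (the boundary terms cancel because f(π) = f(−π) and $e^{-ik\pi}=e^{ik\pi}$), so that the kth Fourier coefficient of f′ is $ikc_k$.
(b)(i) Suppose first that f′(π) = f′(−π). By (a), applied twice, the Fourier coefficients of f′′ are $\langle-k^2c_k\rangle_{k\in\mathbb{Z}}$, so $\sup_{k\in\mathbb{Z}}k^2|c_k|$ is finite; because $\sum_{k=1}^{\infty}\frac{1}{k^2}<\infty$, $\sum_{k=-\infty}^{\infty}|c_k|<\infty$.

(ii) Next, suppose that $f(x)=x^2$ for every x. Then, for k ≠ 0,
$$\begin{aligned}
c_k&=\frac{1}{2\pi}\int_{-\pi}^{\pi}x^2e^{-ikx}\,dx=\frac{1}{2\pi}\Big(-\frac{1}{ik}(\pi^2e^{-ik\pi}-\pi^2e^{ik\pi})+\int_{-\pi}^{\pi}\frac{2x}{ik}e^{-ikx}\,dx\Big)\\
&=\frac{1}{ik\pi}\Big(-\frac{1}{ik}(\pi e^{-ik\pi}+\pi e^{ik\pi})+\frac{1}{ik}\int_{-\pi}^{\pi}e^{-ikx}\,dx\Big)=\frac{2}{k^2}(-1)^k,
\end{aligned}$$
so $\sum_{k\in\mathbb{Z}}|c_k|\le|c_0|+4\sum_{k=1}^{\infty}\frac{1}{k^2}$ is finite.

(iii) In general, we can express f as $f_1+cf_2$ where $f_2(x)=x^2$ for every x, $c=\frac{1}{4\pi}(f'(\pi)-f'(-\pi))$, and $f_1$ satisfies the conditions of (i); so that $\langle c_k\rangle_{k\in\mathbb{Z}}$ is the sum of two summable sequences and is itself summable.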
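
The closed form in (b-ii) can be cross-checked by numerical quadrature. The Python sketch below is an added illustration, not part of the proof; the midpoint rule and the number of panels are arbitrary choices made here. It approximates the coefficients of $x^2$ and also confirms the relation of part (a): the coefficients of the derivative 2x, which for this example are $2i(-1)^k/k$, equal $ikc_k$.

```python
import cmath
import math


def coefficient_of_square(k: int, panels: int = 20000) -> complex:
    """Midpoint-rule approximation to (1/2pi) * integral of x^2 e^{-ikx} dx
    over [-pi, pi]."""
    h = 2 * math.pi / panels
    total = 0j
    for j in range(panels):
        x = -math.pi + (j + 0.5) * h
        total += x * x * cmath.exp(-1j * k * x)
    return total * h / (2 * math.pi)


def closed_form(k: int) -> float:
    """The value 2(-1)^k / k^2 obtained in (b-ii), for k != 0."""
    return 2.0 * (-1) ** abs(k) / (k * k)


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        numeric = coefficient_of_square(k)
        # Second and third columns should agree; the last column is i*k*c_k,
        # to be compared with the coefficient 2i(-1)^k / k of f'(x) = 2x.
        print(k, numeric, closed_form(k), 1j * k * numeric)
```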

282X Basic exercises > (a) Suppose that $\langle c_k\rangle_{k\in\mathbb{Z}}$ is an absolutely summable double-ended sequence of complex numbers. Show that $f(x)=\sum_{k=-\infty}^{\infty}c_ke^{ikx}$ exists for every x ∈ ℝ, that f is continuous and periodic, and that its Fourier coefficients are the $c_k$.

(c) Set $\phi_n(t)=\frac2t\sin(n+\frac12)t$ for t ≠ 0. (This is sometimes called the modified Dirichlet kernel.) Show that for any integrable function f on ]−π, π], with Fourier sums $\langle s_n\rangle_{n\in\mathbb{N}}$ and periodic extension $\tilde f$,
$$\lim_{n\to\infty}\Big|s_n(x)-\frac{1}{2\pi}\int_{-\pi}^{\pi}\phi_n(t)\tilde f(x+t)\,dt\Big|=0$$
for every x ∈ ]−π, π]. (Hint: show that $\frac2t-\frac{1}{\sin\frac12t}$ is bounded, and use 282E.)

(d) Give a proof of 282Ib from 242O, 255O and 282G.

(e) Give another proof of 282Ic, based on 242O and 281J instead of on 282H.

(f) Use the idea of 255Ya to shorten one of the steps in the proof of 282H, taking $g_m(t)=\min(\frac{m+1}{2\pi},\frac{\pi}{4(m+1)t^2})$ for |t| ≤ δ, so that $g_m\ge\psi_m$ on [−δ, δ].

> (g)(i) Let f be a real square-integrable function on ]−π, π], and $\langle a_k\rangle_{k\in\mathbb{N}}$, $\langle b_k\rangle_{k\ge1}$ its real Fourier coefficients (282Ba). Show that $\frac12a_0^2+\sum_{k=1}^{\infty}(a_k^2+b_k^2)=\frac1\pi\int_{-\pi}^{\pi}|f|^2$. (ii) Show that $f\mapsto(\sqrt{\frac\pi2}a_0,\sqrt\pi a_1,\sqrt\pi b_1,\ldots)$ defines an inner-product-space isomorphism between the real Hilbert space $L^2_{\mathbb{R}}$ of equivalence classes of real square-integrable functions on ]−π, π] and the real Hilbert space $\ell^2_{\mathbb{R}}$ of square-summable sequences.

(h) Show that $\frac\pi4=1-\frac13+\frac15-\frac17+\ldots$. (Hint: find the Fourier series of f where f(x) = x/|x|, and compute the sum of the series at $\frac\pi2$. Of course there are other methods, e.g., examining the Taylor series for $\arctan\frac\pi4$.)

(i) Let f be an integrable complex-valued function on ]−π, π], and $\langle s_n\rangle_{n\in\mathbb{N}}$ its sequence of Fourier sums. Suppose that x ∈ ]−π, π[, a ∈ ℂ are such that $\int_{-\pi}^{\pi}\frac{f(t)-a}{t-x}\,dt$ exists and is finite. Show that $\lim_{n\to\infty}s_n(x)=a$. Explain how this generalizes 282L. What modification is appropriate to obtain a limit $\lim_{n\to\infty}s_n(\pi)$?

(j) Suppose that α > 0, K ≥ 0 and f : ]−π, π[ → C are such that |f (x) − f (y)| ≤ K|x − y|α for all x, y ∈ ]−π, π[.
(Such functions are called Hölder continuous.) Show that the Fourier sums of f converge to f everywhere in ]−π, π[.
(Hint: use 282Xi.) (Compare 282Yb.)

(k) In 282L, show that it is enough if f˜ is differentiable with respect to its domain at x or π (see 262Fb), rather
than differentiable in the strict sense.
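(l) Show that $\lim_{a\to\infty}\int_0^a\frac{\sin t}{t}\,dt$ exists and is finite. (Hint: use 224J to estimate $\int_a^b\frac{\sin t}{t}\,dt$ for 0 < a ≤ b.)

(m) Show that $\int_0^{\infty}\frac{|\sin t|}{t}\,dt=\infty$. (Hint: show that $\sup_{a\ge0}|\int_1^a\frac{\cos2t}{t}\,dt|<\infty$, and therefore that $\sup_{a\ge0}\int_1^a\frac{\sin^2t}{t}\,dt=\infty$.)

> (n) Let $\langle d_k\rangle_{k\in\mathbb{N}}$ be a sequence in ℂ such that $\sup_{k\in\mathbb{N}}|kd_k|<\infty$ and
$$\lim_{m\to\infty}\frac{1}{m+1}\sum_{n=0}^{m}\sum_{k=0}^{n}d_k=c\in\mathbb{C}.$$
Show that $c=\sum_{k=0}^{\infty}d_k$. (Hint: 282N.)

> (o) Show that $\sum_{n=1}^{\infty}\frac{1}{n^2}=\frac{\pi^2}{6}$. (Hint: (b-ii) of the proof of 282R.)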

(p) Let f be an integrable complex-valued function on ]−π, π], and $\langle s_n\rangle_{n\in\mathbb{N}}$ its sequence of Fourier sums. Suppose that x ∈ ]−π, π[ is such that
(i) there is an a ∈ ℂ such that
either $\int_{-\pi}^{x}\frac{a-f(t)}{x-t}\,dt$ exists in ℂ
or there is some δ > 0 such that f is of bounded variation on [x − δ, x], and $a=\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$
(ii) there is a b ∈ ℂ such that
either $\int_{x}^{\pi}\frac{f(t)-b}{t-x}\,dt$ exists in ℂ
or there is some δ > 0 such that f is of bounded variation on [x, x + δ], and $b=\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$.
Show that $\lim_{n\to\infty}s_n(x)=\frac12(a+b)$. What modification is appropriate to obtain a limit $\lim_{n\to\infty}s_n(\pi)$?

> (q) Let f, g be integrable complex-valued functions on ]−π, π], and $\mathbf{c}=\langle c_k\rangle_{k\in\mathbb{Z}}$, $\mathbf{d}=\langle d_k\rangle_{k\in\mathbb{Z}}$ their sequences of Fourier coefficients. Suppose that either $\sum_{k=-\infty}^{\infty}|c_k|<\infty$ or $\sum_{k=-\infty}^{\infty}(|c_k|^2+|d_k|^2)<\infty$. Show that the sequence of Fourier coefficients of f × g is just the convolution $\mathbf{c}*\mathbf{d}$ of $\mathbf{c}$ and $\mathbf{d}$ (255Xk).

(r) In 282Ra, what happens if f(π) ≠ f(−π)?

(s) Suppose that $\langle c_k\rangle_{k\in\mathbb{Z}}$ is a double-ended sequence of complex numbers such that $\sum_{k=-\infty}^{\infty}|kc_k|<\infty$. Show that $f(x)=\sum_{k=-\infty}^{\infty}c_ke^{ikx}$ exists for every x ∈ ℝ and that f is differentiable everywhere.

(t) Let $\langle c_k\rangle_{k\in\mathbb{Z}}$ be a double-ended sequence of complex numbers such that $\sup_{k\in\mathbb{Z}}|kc_k|<\infty$. Show that there is a square-integrable function f on ]−π, π] such that the $c_k$ are the Fourier coefficients of f, that f is the limit almost everywhere of its Fourier sums, and that f ∗ f ∗ f is differentiable. (Hint: use 282K to show that there is an f, and 282Xn to show that its Fourier sums converge wherever its Fejér sums do; use 282Q and 282Xs to show that f ∗ f ∗ f is differentiable.)

282Y Further exercises (a) Let f be a non-negative integrable function on ]−π, π], with Fourier coefficients $\langle c_k\rangle_{k\in\mathbb{Z}}$. Show that
$$\sum_{j=0}^{n}\sum_{k=0}^{n}a_j\bar a_kc_{j-k}\ge0$$
for all complex numbers $a_0,\ldots,a_n$. (See also 285Xr below.)



(b) Let f : ]−π, π] → ℂ, K ≥ 0, α > 0 be such that $|f(x)-f(y)|\le K|x-y|^{\alpha}$ for all x, y ∈ ]−π, π]. Let $c_k$, $s_n$ be the Fourier coefficients and sums of f. (i) Show that $\sup_{k\in\mathbb{Z}}|k|^{\alpha}|c_k|<\infty$. (Hint: show that $c_k=\frac{1}{4\pi}\int_{-\pi}^{\pi}(f(x)-\tilde f(x+\frac\pi k))e^{-ikx}\,dx$.) (ii) Show that if $f(\pi)=\lim_{x\downarrow-\pi}f(x)$ then $s_n\to f$ uniformly. (Compare 282Xj.)

(c) Let f be a measurable complex-valued function on ]−π, π], and suppose that p ∈ [1, ∞[ is such that $\int_{-\pi}^{\pi}|f|^p<\infty$. Let $\langle\sigma_m\rangle_{m\in\mathbb{N}}$ be the sequence of Fejér sums of f. Show that $\lim_{m\to\infty}\int_{-\pi}^{\pi}|f-\sigma_m|^p=0$. (Hint: use 245Xl, 255Yk and the ideas in 282Ib.)

(d) Construct a continuous function h : [−π, π] → ℝ such that h(π) = h(−π) but the Fourier sums of h are unbounded at 0, as follows. Set $\alpha(m,n)=\int_0^{\pi}\frac{\sin(m+\frac12)t\,\sin(n+\frac12)t}{\sin\frac12t}\,dt$. Show that $\lim_{n\to\infty}\alpha(m,n)=0$ for every m, but $\lim_{n\to\infty}\alpha(n,n)=\infty$. Set $h_0(x)=\sum_{k=0}^{\infty}\delta_k\sin(m_k+\frac12)x$ for 0 ≤ x ≤ π, 0 for −π ≤ x ≤ 0, where $\delta_k>0$, $m_k\in\mathbb{N}$ are such that (α) $\delta_k\le2^{-k}$, $\delta_k|\alpha(m_k,m_n)|\le2^{-k}$ for every n < k (choosing $\delta_k$) (β) $\delta_k\alpha(m_k,m_k)\ge k$, $\delta_n|\alpha(m_k,m_n)|\le2^{-n}$ for every n < k (choosing $m_k$). Now modify $h_0$ on [−π, 0[ by adding a function of bounded variation.

(e)(i) Show that $\lim_{n\to\infty}\int_{-\pi}^{\pi}\big|\frac{\sin(n+\frac12)t}{\sin\frac12t}\big|\,dt=\infty$. (Hint: 282Xm.) (ii) Show that for any δ > 0 there are n ∈ ℕ, f ≥ 0 such that $\int_{-\pi}^{\pi}f\le\delta$, $\int_{-\pi}^{\pi}|s_n|\ge1$, where $s_n$ is the nth Fourier sum of f. (Hint: take n such that $\frac{1}{2\pi}\int_{-\pi}^{\pi}\big|\frac{\sin(n+\frac12)t}{\sin\frac12t}\big|\,dt>\frac1\delta$ and set $f(x)=\frac\delta\eta$ for 0 ≤ x ≤ η, 0 otherwise, where η is small.) (iii) Show that there is an integrable function f : ]−π, π] → ℝ such that $\sup_{n\in\mathbb{N}}\|s_n\|_1$ is infinite, where $\langle s_n\rangle_{n\in\mathbb{N}}$ is the sequence of Fourier sums of f. (Hint: it helps to know the ‘Uniform Boundedness Theorem’ of functional analysis, but f can also be constructed bare-handed by the method of 282Yd.)

(f) Let u : [−π, π] → ℝ be an absolutely continuous function such that u(π) = u(−π) and $\int_{-\pi}^{\pi}u=0$. Show that $\|u\|_2\le\|u'\|_2$. (This is Wirtinger’s inequality.)

(g) For 0 ≤ r < 1, t ∈ ℝ set $A_r(t)=\frac{1-r^2}{1-2r\cos t+r^2}$. ($A_r$ is the Poisson kernel; see 478Xl¹ in Volume 4.) (i) Show that $\frac{1}{2\pi}\int_{-\pi}^{\pi}A_r=1$. (ii) For a real function f which is integrable over ]−π, π], with real Fourier coefficients $a_k$, $b_k$ (282Ba), set $S_r(x)=\frac12a_0+\sum_{k=1}^{\infty}r^k(a_k\cos kx+b_k\sin kx)$ for x ∈ ]−π, π], r ∈ [0, 1[. Show that $S_r(x)=\frac{1}{2\pi}\int_{-\pi}^{\pi}A_r(x-t)f(t)\,dt$ for every x ∈ ]−π, π]. (Hint: $A_r(t)=1+2\sum_{n=1}^{\infty}r^n\cos nt$.) (iii) Show that $\lim_{r\uparrow1}S_r(x)=f(x)$ for every x ∈ ]−π, π[ which is in the Lebesgue set of f. (Hint: 223Yg.) (iv) Show that $\lim_{r\uparrow1}\int_{-\pi}^{\pi}|S_r-f|=0$. (v) Show that if f is defined everywhere on ]−π, π], is continuous, and $f(\pi)=\lim_{x\downarrow-\pi}f(x)$, then $\lim_{r\uparrow1}\sup_{x\in]-\pi,\pi]}|S_r(x)-f(x)|=0$.

282 Notes and comments This has been a long section with a potentially confusing collection of results, so perhaps I should recapitulate. Associated with any integrable function on ]−π, π] we have the corresponding Fourier sums, being the symmetric partial sums $\sum_{k=-n}^{n}c_ke^{ikx}$ of the complex series $\sum_{k=-\infty}^{\infty}c_ke^{ikx}$, or, equally, the partial sums $\frac12a_0+\sum_{k=1}^{n}(a_k\cos kx+b_k\sin kx)$ of the real series $\frac12a_0+\sum_{k=1}^{\infty}(a_k\cos kx+b_k\sin kx)$. The Fourier coefficients $c_k$, $a_k$, $b_k$ are the only natural ones, because if the series is to converge with any regularity at all then
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\Big(\sum_{k=-\infty}^{\infty}c_ke^{ikx}\Big)e^{-ilx}\,dx$$
ought to be simultaneously
$$\sum_{k=-\infty}^{\infty}\frac{1}{2\pi}\int_{-\pi}^{\pi}c_ke^{ikx}e^{-ilx}\,dx=c_l$$
and
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}f(x)e^{-ilx}\,dx.$$
(Compare the calculations in 282J.) The effect of taking Fejér sums $\sigma_m(x)$ rather than the Fourier sums $s_n(x)$ is to smooth the sequence out; recall that if $\lim_{n\to\infty}s_n(x)=c$ then $\lim_{m\to\infty}\sigma_m(x)=c$, by 273Ca in the last chapter.
Most of the work above is concerned with the question of when Fourier or Fejér sums converge, in some sense, to
the original function f . As has happened before, in §245 and elsewhere, we have more than one kind of convergence to
1 Later editions only.

consider. Norm convergence, for $\|\ \|_1$ or $\|\ \|_2$ or $\|\ \|_\infty$, is the simplest; the three theorems 282G, 282Ib and 282J at least are relatively straightforward. (I have given 282Ib as a corollary of 282Ia; but there is an easier proof from 282G. See 282Xd.) Respectively, we have
if f is continuous (and matches at ±π, that is, $f(\pi)=\lim_{t\downarrow-\pi}f(t)$) then $\sigma_m\to f$ uniformly, that is, for $\|\ \|_\infty$ (282G);
if f is any integrable function, then $\sigma_m\to f$ for $\|\ \|_1$ (282Ib);
if f is a square-integrable function, then $s_n\to f$ for $\|\ \|_2$ (282J);
if f is continuous and of bounded variation (and matches at ±π), then $s_n\to f$ uniformly (282O).
There are some similar results for other $\|\ \|_p$ (282Yc); but note that the Fourier sums need not converge for $\|\ \|_1$ (282Ye).
Pointwise convergence is harder. The results I give are
if f is any integrable function, then σm → f almost everywhere (282Ia);
this relies on some careful calculations in 282H, and also on the deep result 223D. Next we have the results which look
at the average of the limits of f from the two sides. Suppose I write
$$f^{\pm}(x)=\tfrac12\big(\lim_{t\uparrow x}f(t)+\lim_{t\downarrow x}f(t)\big)$$
whenever this is defined, taking $f^{\pm}(\pi)=\frac12(\lim_{t\uparrow\pi}f(t)+\lim_{t\downarrow-\pi}f(t))$. Then we have
if f is any integrable function, $\sigma_m\to f^{\pm}$ wherever $f^{\pm}$ is defined (282I);
if f is of bounded variation, $s_n\to f^{\pm}$ everywhere (282O).
Of course these apply at any point at which f is continuous, in which case f (x) = f ± (x). Yet another result of this
type is
if f is any integrable function, sn → f at any point at which f is differentiable (282L);
in fact, this can be usefully extended for very little extra labour (282Xi, 282Xp).
I cannot leave this list without mentioning the theorem I have not given. This is Carleson’s theorem:
if f is square-integrable, sn → f almost everywhere
(Carleson 66). I will come to this in §286. There is an elementary special case in 282Xt. The result is in fact valid
for many other f (see the notes to §286).
The next glaring lacuna in the exposition here is the absence of any examples to show how far these results are best
possible. There is no suggestion, indeed, that there are any natural necessary and sufficient conditions for
sn → f at every point.
Nevertheless, we have to make an effort to find a continuous function for which this is not so, and the construction of an
example by du Bois-Reymond (Bois-Reymond 1876) was an important moment in the history of analysis, not least
because it forced mathematicians to realise that some comfortable assumptions about the classification of functions –
essentially, that functions are either ‘good’ or so bad that one needn’t trouble with them – were false. The example is
instructive but I have had to omit it for lack of space; I give an outline of a possible method in 282Yd. (You can find
a detailed construction in Körner 88, chapter 18, and a proof that such a function exists in Dudley 89, 7.4.3.) If
you allow general integrable functions, then you can do much better, or perhaps I should say much worse; there is an
integrable f such that supn∈N |sn (x)| = ∞ for every x ∈ ]−π, π] (Kolmogorov 26; see Zygmund 59, §§VIII.3-4).
In 282C I mentioned two types of problem. The first – when is a Fourier series summable? – has at least been
treated at length, even though I cannot pretend to have given more than a sample of what is known. The second –
how do properties of the ck reflect properties of f ? – I have hardly touched on. I do give what seem to me to be the
three most important results in this area. The first is
if f and g have the same Fourier coefficients, they are equal almost everywhere (282Ic).
This at least tells us that we ought in principle to be able to learn almost anything about f by looking at its Fourier
series. (For instance, 282Ya describes a necessary and sufficient condition for f to be non-negative almost everywhere.)
The second is
f is square-integrable iff $\sum_{k=-\infty}^{\infty}|c_k|^2<\infty$;
in fact,
$$\sum_{k=-\infty}^{\infty}|c_k|^2=\frac{1}{2\pi}\int_{-\pi}^{\pi}|f|^2\quad\text{(282J).}$$

Of course this is fundamental, since it shows that Fourier coefficients provide a natural Hilbert space isomorphism
between L2 and ℓ2 (282K). I should perhaps remark that while the real Hilbert spaces L2R , ℓ2R are isomorphic as inner
product spaces (282Xg), they are certainly not isomorphic as Banach lattices; for instance, $\ell^2_{\mathbb{R}}$ has ‘atomic’ elements $\mathbf{c}$
such that if 0 ≤ d ≤ c then d is a multiple of c , while L2R does not. Perhaps even more important is

the Fourier coefficients of a convolution f ∗ g are just a scalar multiple of the products of the Fourier
coefficients of f and g (282Q);
but to use this effectively we need to study the Banach algebra structure of L1 , and I have no choice but to abandon
this path immediately. (It will form a conspicuous part of Chapter 44 in Volume 4.) 282Xt gives an elementary
consequence, and 282Xq a very partial description of the relationship between a product f × g of two functions and
the convolution product of their sequences of Fourier coefficients.
The Fejér sums considered in this section are one way of working around the convergence difficulties associated
with Fourier sums. When we come to look at Fourier transforms in the next two sections we shall need some further
manoeuvres. A different type of smoothing is obtained by using the Poisson kernel in place of the Dirichlet or Fejér
kernel (282Yg).
I end these notes with a remark on the number 2π. This enters nearly every formula involving Fourier series, but
could I think be removed totally from the present section, at least, by re-normalizing the measure of ]−π, π]. If instead of Lebesgue measure µ we took the measure $\nu=\frac{1}{2\pi}\mu$ throughout, then every 2π would disappear. (Compare the remark
in 282Bb concerning the possibility of doing integrals over S 1 .) But I think most of us would prefer to remember the
location of a 2π in every formula than to deal with an unfamiliar measure.

283 Fourier transforms I


I turn now to the theory of Fourier transforms on R. In the first of two sections on the subject, I present those parts
of the elementary theory which can be dealt with using the methods of the previous section on Fourier series. I find
no way of making sense of the theory, however, without introducing a fragment of L.Schwartz’ theory of distributions,
which I present in §284. As in §282, of course, this treatment also is nothing but a start in the topic.
The whole theory can also be done in $\mathbb{R}^r$. I leave this extension to the exercises, however, since there are few new
ideas, the formulae are significantly more complicated, and I shall not, in this volume at least, have any use for the
multidimensional versions of these particular theorems, though some of the same ideas will appear, in multidimensional
form, in §285.

283A Definitions Let f be a complex-valued function which is integrable over ℝ.

(a) The Fourier transform of f is the function $\hat f$ : ℝ → ℂ defined by setting
$$\hat f(y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}f(x)\,dx$$
for every y ∈ ℝ. (Of course the integral is always defined because $x\mapsto e^{-iyx}$ is bounded and continuous, therefore measurable.)

(b) The inverse Fourier transform of f is the function $\check f$ : ℝ → ℂ defined by setting
$$\check f(y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iyx}f(x)\,dx$$
for every y ∈ ℝ.
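
To make the normalization concrete, here is a minimal Python sketch, added as an illustration and not taken from the text. It approximates the two integrals by the trapezoidal rule on a truncated interval; the truncation width, the step count and the test function are assumptions of the sketch. With the $\frac{1}{\sqrt{2\pi}}$ convention adopted here, it is a standard fact that the Gaussian $e^{-x^2/2}$ is its own Fourier transform, which gives a convenient check.

```python
import cmath
import math


def fourier_transform(f, y: float, half_width: float = 20.0, steps: int = 4000) -> complex:
    """Trapezoidal approximation to (1/sqrt(2pi)) * integral of e^{-iyx} f(x) dx,
    truncated to [-half_width, half_width]; adequate only when f is
    negligible outside that interval."""
    h = 2 * half_width / steps
    total = 0j
    for j in range(steps + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * y * x) * f(x)
    return total * h / math.sqrt(2 * math.pi)


def inverse_fourier_transform(f, y: float) -> complex:
    """Same integral with e^{+iyx}; this is the transform evaluated at -y."""
    return fourier_transform(f, -y)


def gaussian(x: float) -> float:
    return math.exp(-x * x / 2)


if __name__ == "__main__":
    for y in (0.0, 1.0, 2.5):
        print(y, fourier_transform(gaussian, y), gaussian(y))
```

Under either of the other two conventions mentioned in 283Ba the same Gaussian would pick up a constant factor or a rescaling, which is one practical way to tell which definition a given source is using.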

283B Remarks (a) It is a mildly vexing feature of the theory of Fourier transforms – vexing, that is, for outsiders
like myself – that there is in fact no standard definition of ‘Fourier transform’. The commonest definitions are, I think,
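\[ \hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\mp iyx} f(x)\,dx, \]
\[ \hat f(y) = \int_{-\infty}^{\infty} e^{\mp iyx} f(x)\,dx, \]
\[ \hat f(y) = \int_{-\infty}^{\infty} e^{\mp 2\pi iyx} f(x)\,dx, \]
corresponding to inverse transforms
\[ \check f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\pm iyx} f(x)\,dx, \]
\[ \check f(y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{\pm iyx} f(x)\,dx, \]
\[ \check f(y) = \int_{-\infty}^{\infty} e^{\pm 2\pi iyx} f(x)\,dx. \]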
I leave it to you to check that the whole theory can be carried through with any of these six pairs, and to investigate
other possibilities (see 283Xa-283Xb below).

(b) The phrases ‘Fourier transform’, ‘inverse Fourier transform’ make it plain that $(\hat f)^{\vee}$ is supposed to be f, at least
some of the time. This is indeed the case, but the class of f for which this is true in the literal sense is somewhat
constrained, and we shall have to wait a little while before investigating it.

(c) No amount of juggling with constants, in the manner of (a) above, can make $\hat f$ and $\check f$ quite the same. However,
on the definitions I have chosen, we do have $\check f(y) = \hat f(-y)$ for every y, so that $\hat f$ and $\check f$ will share essentially all the
properties of interest to us here; in particular, everything in the next proposition will be valid with ∨ in place of ∧, if
you change signs at the right points in parts (c), (h) and (i).

283C Proposition Let f and g be complex-valued functions which are integrable over R.


(a) $(f + g)^{\wedge} = \hat f + \hat g$.

(b) $(cf)^{\wedge} = c\hat f$ for every c ∈ C.

(c) If c ∈ R and h(x) = f(x + c) whenever this is defined, then $\hat h(y) = e^{icy}\hat f(y)$ for every y ∈ R.

(d) If c ∈ R and $h(x) = e^{icx} f(x)$ for every x ∈ dom f, then $\hat h(y) = \hat f(y - c)$ for every y ∈ R.

(e) If c > 0 and h(x) = f(cx) whenever this is defined, then $\hat h(y) = \frac{1}{c}\hat f\big(\frac{y}{c}\big)$ for every y ∈ R.

(f) $\hat f : \mathbb{R} \to \mathbb{C}$ is continuous.

(g) $\lim_{y\to\infty} \hat f(y) = \lim_{y\to-\infty} \hat f(y) = 0$.

(h) If $\int_{-\infty}^{\infty} |x f(x)|\,dx < \infty$, then $\hat f$ is differentiable, and its derivative is
\[ \hat f'(y) = -\frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx}\, x f(x)\,dx \]
for every y ∈ R.

(i) If f is absolutely continuous on every bounded interval and f′ is integrable, then $(f')^{\wedge}(y) = iy\hat f(y)$ for every
y ∈ R.
proof (a) and (b) are trivial, and (c), (d) and (e) are elementary substitutions.
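(f) If $\langle y_n\rangle_{n\in\mathbb{N}}$ is any convergent sequence in R with limit y, then
\[ \hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \lim_{n\to\infty} e^{-iy_n x} f(x)\,dx
= \lim_{n\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iy_n x} f(x)\,dx = \lim_{n\to\infty} \hat f(y_n) \]
by Lebesgue’s Dominated Convergence Theorem, because $|e^{-iy_n x} f(x)| \le |f(x)|$ for every n ∈ N, x ∈ dom f. As $\langle y_n\rangle_{n\in\mathbb{N}}$
is arbitrary, $\hat f$ is continuous.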
(g) This is just the Riemann-Lebesgue lemma (282E).
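(h) The point is that $\big|\frac{\partial}{\partial y} e^{-iyx} f(x)\big| = |x f(x)|$ for every x ∈ dom f, y ∈ R. So by 123D
\[ \hat f'(y) = \frac{1}{\sqrt{2\pi}} \frac{d}{dy} \int_{-\infty}^{\infty} e^{-iyx} f(x)\,dx = \frac{1}{\sqrt{2\pi}} \frac{d}{dy} \int_{\mathrm{dom} f} e^{-iyx} f(x)\,dx \]
\[ = \frac{1}{\sqrt{2\pi}} \int_{\mathrm{dom} f} \frac{\partial}{\partial y} e^{-iyx} f(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} -ix\,e^{-iyx} f(x)\,dx \]
\[ = -\frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x\,e^{-iyx} f(x)\,dx. \]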

(i) Because f is absolutely continuous on every bounded interval,
\[ f(x) = f(0) + \int_0^x f' \text{ for } x \ge 0, \qquad f(x) = f(0) - \int_x^0 f' \text{ for } x \le 0. \]
Because f′ is integrable,
\[ \lim_{x\to\infty} f(x) = f(0) + \int_0^{\infty} f', \qquad \lim_{x\to-\infty} f(x) = f(0) - \int_{-\infty}^0 f' \]
both exist. Because f also is integrable, both limits must be zero. Now we can integrate by parts (225F) to see that
\[ (f')^{\wedge}(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f'(x)\,dx = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f'(x)\,dx \]
\[ = \frac{1}{\sqrt{2\pi}} \Big( \lim_{a\to\infty} e^{-iya} f(a) - \lim_{a\to-\infty} e^{-iya} f(a) \Big) + \frac{iy}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f(x)\,dx \]
\[ = iy\hat f(y). \]
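
The translation and modulation rules 283Cc and 283Cd can be checked numerically. The sketch below is an illustration only: the test function, the constants c and y, and the helper name `ft` are assumptions, and the integral is truncated to [−12, 12], outside which the chosen test function is negligible.

```python
import cmath
import math

def ft(f, y, lo=-12.0, hi=12.0, n=24000):
    """Midpoint approximation to (1/sqrt(2*pi)) * integral over [lo, hi] of exp(-i*y*x) f(x) dx."""
    h = (hi - lo) / n
    total = 0j
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += cmath.exp(-1j * y * x) * f(x)
    return total * h / math.sqrt(2 * math.pi)

def f(x):
    """A rapidly decaying test function; its tails outside [-12, 12] are negligible."""
    return math.exp(-(x - 0.3) ** 2)

c, y = 0.7, 1.3

# 283Cc: h(x) = f(x + c) has transform exp(i*c*y) times the transform of f at y.
err_shift = abs(ft(lambda x: f(x + c), y) - cmath.exp(1j * c * y) * ft(f, y))

# 283Cd: h(x) = exp(i*c*x) f(x) has transform equal to the transform of f at y - c.
err_modulate = abs(ft(lambda x: cmath.exp(1j * c * x) * f(x), y) - ft(f, y - c))

print(err_shift, err_modulate)  # both should be at rounding-error level
```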

283D Lemma (a) $\lim_{a\to\infty} \int_0^a \frac{\sin x}{x}\,dx = \frac{\pi}{2}$, $\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin x}{x}\,dx = \pi$.
(b) There is a K < ∞ such that $\big|\int_a^b \frac{\sin cx}{x}\,dx\big| \le K$ whenever a ≤ b and c ∈ R.
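proof (a)(i) Set
\[ F(a) = \int_0^a \frac{\sin x}{x}\,dx \text{ if } a \ge 0, \qquad F(a) = -\int_a^0 \frac{\sin x}{x}\,dx \text{ if } a \le 0, \]
so that F(a) = −F(−a) and $\int_a^b \frac{\sin x}{x}\,dx = F(b) - F(a)$ for all a ≤ b.
If 0 < a ≤ b, then by 224J
\[ \Big|\int_a^b \frac{\sin x}{x}\,dx\Big| \le \Big(\frac{1}{b} + \frac{1}{a} - \frac{1}{b}\Big) \sup_{c\in[a,b]} \Big|\int_a^c \sin x\,dx\Big| \le \frac{1}{a} \sup_{c\in[a,b]} |\cos c - \cos a| \le \frac{2}{a}. \]
In particular, $|F(n) - F(m)| \le \frac{2}{m}$ if 1 ≤ m ≤ n in N, and $\langle F(n)\rangle_{n\in\mathbb{N}}$ is a Cauchy sequence with limit γ say; now
\[ |\gamma - F(a)| = \lim_{n\to\infty} |F(n) - F(a)| \le \frac{2}{a} \]
for every a > 0, so $\lim_{a\to\infty} F(a) = \gamma$. Of course we also have
\[ \lim_{a\to\infty} \int_{-a}^{a} \frac{\sin x}{x}\,dx = \lim_{a\to\infty} (F(a) - F(-a)) = \lim_{a\to\infty} 2F(a) = 2\gamma. \]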

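(ii) So now I have to calculate γ. For this, observe first that
\[ 2\gamma = \lim_{a\to\infty} \int_{-\pi a}^{\pi a} \frac{\sin x}{x}\,dx = \lim_{a\to\infty} \int_{-\pi}^{\pi} \frac{\sin at}{t}\,dt \]
(substituting x = at). Next,
\[ \lim_{t\to 0} \Big( \frac{1}{t} - \frac{1}{2\sin\frac12 t} \Big) = \lim_{u\to 0} \frac{\sin u - u}{2u \sin u} = 0, \]
so
\[ \int_{-\pi}^{\pi} \Big| \frac{1}{t} - \frac{1}{2\sin\frac12 t} \Big|\,dt < \infty, \]
and by the Riemann-Lebesgue lemma (282Fb)
\[ \lim_{a\to\infty} \int_{-\pi}^{\pi} \Big( \frac{1}{t} - \frac{1}{2\sin\frac12 t} \Big) \sin at\,dt = 0. \]
But we know that
\[ \int_{-\pi}^{\pi} \frac{\sin(n+\frac12)t}{2\sin\frac12 t}\,dt = \pi \]
for every n (using 282Dc), so we must have
\[ \lim_{a\to\infty} \int_{-a}^{a} \frac{\sin t}{t}\,dt = \lim_{a\to\infty} \int_{-\pi}^{\pi} \frac{\sin at}{t}\,dt = \lim_{a\to\infty} \int_{-\pi}^{\pi} \frac{\sin at}{2\sin\frac12 t}\,dt \]
\[ = \lim_{n\to\infty} \int_{-\pi}^{\pi} \frac{\sin(n+\frac12)t}{2\sin\frac12 t}\,dt = \pi, \]
and γ = π/2, as claimed.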
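
The estimate $|\gamma - F(a)| \le 2/a$ from part (i), together with the value γ = π/2 just obtained, is easy to observe numerically. The following sketch is illustrative only; the function name and the midpoint quadrature are assumptions, not part of the text.

```python
import math

def sinc_integral(a, n=200000):
    """Midpoint-rule approximation to F(a) = integral from 0 to a of sin(x)/x dx, for a > 0."""
    h = a / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h  # midpoints never hit x = 0, so no special case is needed
        total += math.sin(x) / x
    return total * h

gamma = math.pi / 2
for a in (10.0, 50.0, 200.0):
    # second column: observed distance from pi/2; third column: the bound 2/a from the proof
    print(a, abs(gamma - sinc_integral(a)), 2 / a)
```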



(b) Because F is continuous and
\[ \lim_{a\to\infty} F(a) = \gamma = \frac{\pi}{2}, \qquad \lim_{a\to-\infty} F(a) = -\gamma = -\frac{\pi}{2}, \]
F is bounded; say $|F(a)| \le K_1$ for all a ∈ R. Try $K = 2K_1$. Now suppose that a < b and c ∈ R. If c > 0, then
\[ \Big|\int_a^b \frac{\sin cx}{x}\,dx\Big| = \Big|\int_{ac}^{bc} \frac{\sin t}{t}\,dt\Big| = |F(bc) - F(ac)| \le 2K_1 = K, \]
substituting x = t/c. If c < 0, then
\[ \Big|\int_a^b \frac{\sin cx}{x}\,dx\Big| = \Big|-\int_a^b \frac{\sin(-c)x}{x}\,dx\Big| \le K; \]
while if c = 0 then
\[ \Big|\int_a^b \frac{\sin cx}{x}\,dx\Big| = 0 \le K. \]

283E The hardest work of this section will lie in the ‘pointwise inversion theorems’ 283I and 283K below. I begin
however with a relatively easy, and at least equally important, result, showing (among other things) that an integrable
function f can (essentially) be recovered from its Fourier transform.
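Lemma Whenever c < d in R,
\[ \lim_{a\to\infty} \int_{-a}^{a} \frac{e^{idy} - e^{icy}}{y}\, e^{-iyx}\,dy = 2\pi i \text{ if } c < x < d, \]
\[ \qquad = \pi i \text{ if } x = c \text{ or } x = d, \]
\[ \qquad = 0 \text{ if } x < c \text{ or } x > d. \]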

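proof We know that for any b > 0
\[ \lim_{a\to\infty} \int_{-a}^{a} \frac{\sin bx}{x}\,dx = \lim_{a\to\infty} \int_{-ab}^{ab} \frac{\sin t}{t}\,dt = \pi \]
(substituting x = t/b), and therefore that for any b < 0
\[ \lim_{a\to\infty} \int_{-a}^{a} \frac{\sin bx}{x}\,dx = -\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin(-b)x}{x}\,dx = -\pi. \]
Now consider, for x ∈ R,
\[ \lim_{a\to\infty} \int_{-a}^{a} \frac{e^{idy} - e^{icy}}{y}\, e^{-iyx}\,dy. \]
First note that all the integrals $\int_{-a}^{a}$ exist, because
\[ \lim_{y\to 0} \frac{e^{idy} - e^{icy}}{y} = i(d - c) \]
is finite, and the integrand is certainly continuous except at 0. Now we have
\[ \int_{-a}^{a} \frac{e^{idy} - e^{icy}}{y}\, e^{-iyx}\,dy = \int_{-a}^{a} \frac{e^{i(d-x)y} - e^{i(c-x)y}}{y}\,dy \]
\[ = \int_{-a}^{a} \frac{\cos(d-x)y - \cos(c-x)y}{y}\,dy + i \int_{-a}^{a} \frac{\sin(d-x)y - \sin(c-x)y}{y}\,dy \]
\[ = i \int_{-a}^{a} \frac{\sin(d-x)y - \sin(c-x)y}{y}\,dy \]
because cos is an even function, so
\[ \int_{-a}^{a} \frac{\cos(d-x)y - \cos(c-x)y}{y}\,dy = 0 \]
for every a ≥ 0. (Once again, this integral exists because
\[ \lim_{y\to 0} \frac{\cos(d-x)y - \cos(c-x)y}{y} = 0.) \]
Accordingly
\[ \lim_{a\to\infty} \int_{-a}^{a} \frac{e^{idy} - e^{icy}}{y}\, e^{-iyx}\,dy = i \lim_{a\to\infty} \int_{-a}^{a} \frac{\sin(d-x)y}{y}\,dy - i \lim_{a\to\infty} \int_{-a}^{a} \frac{\sin(c-x)y}{y}\,dy \]
\[ = i\pi - i\pi = 0 \text{ if } x < c, \]
\[ = i\pi - 0 = \pi i \text{ if } x = c, \]
\[ = i\pi + i\pi = 2\pi i \text{ if } c < x < d, \]
\[ = 0 + i\pi = \pi i \text{ if } x = d, \]
\[ = -i\pi + i\pi = 0 \text{ if } x > d. \]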


283F Theorem Let f be a complex-valued function which is integrable over R, and $\hat f$ its Fourier transform. Then
whenever c ≤ d in R,
\[ \int_c^d f = \frac{i}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} \frac{e^{icy} - e^{idy}}{y}\, \hat f(y)\,dy. \]

proof If c = d this is trivial; let us suppose that c < d.


(a) Writing
\[ \theta_a(x) = \int_{-a}^{a} \frac{e^{idy} - e^{icy}}{y}\, e^{-iyx}\,dy \]
for x ∈ R, a ≥ 0, 283E tells us that
\[ \lim_{a\to\infty} \theta_a(x) = 2\pi i\,\theta(x) \]
where $\theta = \frac12(\chi[c,d] + \chi\,]c,d[)$ takes the value 1 inside the interval [c, d], 0 outside and the value $\frac12$ at the endpoints. At
the same time,
\[ |\theta_a(x)| = \Big|\int_{-a}^{a} \frac{\sin(d-x)y - \sin(c-x)y}{y}\,dy\Big| \le \Big|\int_{-a}^{a} \frac{\sin(d-x)y}{y}\,dy\Big| + \Big|\int_{-a}^{a} \frac{\sin(c-x)y}{y}\,dy\Big| \le 2K \]
for all a ≥ 0, x ∈ R, where K is the constant of 283Db. Consequently $|f \times \theta_a| \le 2K|f|$ everywhere on dom f, for every
a ≥ 0, and (applying Lebesgue’s Dominated Convergence Theorem to sequences $\langle f \times \theta_{a_n}\rangle_{n\in\mathbb{N}}$, where $a_n \to \infty$)
\[ \lim_{a\to\infty} \int f \times \theta_a = 2\pi i \int f \times \theta = 2\pi i \int_c^d f. \]

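(b) Now consider the limit in the statement of the theorem. We have
\[ \int_{-a}^{a} \frac{e^{icy} - e^{idy}}{y}\, \hat f(y)\,dy = \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} \int_{-\infty}^{\infty} \frac{e^{icy} - e^{idy}}{y}\, e^{-iyx} f(x)\,dx\,dy \]
\[ = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-a}^{a} \frac{e^{icy} - e^{idy}}{y}\, e^{-iyx} f(x)\,dy\,dx = -\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f \times \theta_a, \]
by Fubini’s and Tonelli’s theorems (252H), using the fact that $(e^{icy} - e^{idy})/y$ is bounded to see that
\[ \int_{-\infty}^{\infty} \int_{-a}^{a} \Big| \frac{e^{icy} - e^{idy}}{y}\, e^{-iyx} f(x) \Big|\,dy\,dx \]
is finite. Accordingly
\[ \frac{i}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} \frac{e^{icy} - e^{idy}}{y}\, \hat f(y)\,dy = -\frac{i}{2\pi} \lim_{a\to\infty} \int_{-\infty}^{\infty} f \times \theta_a = -\frac{i}{2\pi}\, 2\pi i \int_c^d f = \int_c^d f, \]
as required.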


283G Corollary If f and g are complex-valued functions which are integrable over R, then $\hat f = \hat g$ iff f =a.e. g.
proof If f =a.e. g then of course
\[ \hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} g(x)\,dx = \hat g(y) \]
for every y ∈ R. Conversely, if $\hat f = \hat g$, then by the last theorem
\[ \int_c^d f = \int_c^d g \]
for all c ≤ d, so f = g almost everywhere, by 222D.

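283H Lemma Let f be a complex-valued function which is integrable over R, and $\hat f$ its Fourier transform. Then
for any a > 0, x ∈ R,
\[ \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin a(x-t)}{x-t}\, f(t)\,dt = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin at}{t}\, f(x-t)\,dt. \]

proof We have
\[ \int_{-a}^{a} \int_{-\infty}^{\infty} |e^{ixy} e^{-iyt} f(t)|\,dt\,dy \le 2a \int_{-\infty}^{\infty} |f(t)|\,dt < \infty, \]
so (because the function $(t, y) \mapsto e^{ixy} e^{-iyt} f(t)$ is surely jointly measurable) we may reverse the order of integration,
and get
\[ \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \frac{1}{2\pi} \int_{-a}^{a} \int_{-\infty}^{\infty} e^{ixy} e^{-iyt} f(t)\,dt\,dy = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(t) \int_{-a}^{a} e^{i(x-t)y}\,dy\,dt \]
\[ = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{2\sin(x-t)a}{x-t}\, f(t)\,dt = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin au}{u}\, f(x-u)\,du, \]
substituting t = x − u.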

283I Theorem Let f be a complex-valued function which is integrable over R, and suppose that f is differentiable
at x ∈ R. Then
\[ f(x) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-ixy} \check f(y)\,dy. \]

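proof Set g(u) = f(x) if |u| ≤ 1, 0 otherwise, and observe that $\lim_{u\to 0} \frac{1}{u}(f(x-u) - g(u)) = -f'(x)$ is finite, so that
there is a δ ∈ ]0, 1] such that
\[ K = \sup_{0<|u|\le\delta} \Big| \frac{f(x-u) - g(u)}{u} \Big| < \infty. \]
Consequently
\[ \int_{-\infty}^{\infty} \Big| \frac{f(x-u) - g(u)}{u} \Big|\,du \le \frac{1}{\delta} \int_{-\infty}^{-\delta} |f(x-u)|\,du + \frac{1}{\delta} \int_{-1}^{1} |g| + \int_{-\delta}^{\delta} K + \frac{1}{\delta} \int_{\delta}^{\infty} |f(x-u)|\,du \]
\[ \le \frac{1}{\delta} \int_{-\infty}^{\infty} |f| + \frac{2}{\delta} |f(x)| + 2\delta K < \infty. \]
By the Riemann-Lebesgue lemma (282Fb),
\[ \lim_{a\to\infty} \int_{-\infty}^{\infty} \frac{\sin au}{u}\,(f(x-u) - g(u))\,du = 0. \]
If we now examine $\int \frac{\sin au}{u}\, g(u)\,du$, we get
\[ \int_{-\infty}^{\infty} \frac{\sin au}{u}\, g(u)\,du = \int_{-1}^{1} \frac{\sin au}{u}\, f(x)\,du = f(x) \int_{-a}^{a} \frac{\sin v}{v}\,dv, \]
substituting u = v/a. So we get
\[ \lim_{a\to\infty} \int_{-\infty}^{\infty} \frac{\sin au}{u}\, f(x-u)\,du = \lim_{a\to\infty} \int_{-\infty}^{\infty} \frac{\sin au}{u}\, g(u)\,du = \lim_{a\to\infty} f(x) \int_{-a}^{a} \frac{\sin v}{v}\,dv = \pi f(x), \]
by 283D. Accordingly
\[ \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \frac{1}{\pi} \lim_{a\to\infty} \int_{-\infty}^{\infty} \frac{\sin au}{u}\, f(x-u)\,du = f(x), \]
using 283H. As for the second equality,
\[ \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-ixy} \check f(y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-ixy} \hat f(-y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixu} \hat f(u)\,du = f(x) \]
(substituting y = −u).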
Remark Compare 282L.

283J Corollary Let f : R → C be an integrable function such that f is differentiable and $\hat f$ is integrable. Then
\[ f = (\hat f)^{\vee} = (\check f)^{\wedge}. \]

proof Because $\hat f$ is integrable,
\[ (\hat f)^{\vee}(x) = \lim_{a\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = f(x) \]
for every x ∈ R. Similarly,
\[ (\check f)^{\wedge}(x) = \lim_{a\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-ixy} \check f(y)\,dy = f(x). \]

283K The next proposition gives a class of functions to which the last corollary can be applied.
Proposition Suppose that f is a twice-differentiable function from R to C such that f, f′ and f′′ are all integrable.
Then $\hat f$ is integrable.
proof Because f′ and f′′ are integrable, f and f′ are absolutely continuous on any bounded interval (225L). So by
283Ci we have
\[ (f'')^{\wedge}(y) = iy\,(f')^{\wedge}(y) = -y^2 \hat f(y) \]
for every y ∈ R. At the same time, by 283Cf-283Cg, $(f'')^{\wedge}$ and $\hat f$ must be bounded; say $|\hat f(y)| + |(f'')^{\wedge}(y)| \le K$ for
every y ∈ R. Now
\[ |\hat f(y)| \le \frac{K}{1+y^2} \]
for every y, so that
\[ \int_{-\infty}^{\infty} |\hat f| \le K \int_{-\infty}^{-1} \frac{1}{y^2}\,dy + 2K + K \int_1^{\infty} \frac{1}{y^2}\,dy = 4K < \infty. \]

Remark Compare 282Rb.



283L I turn now to the result corresponding to 282O, using a slightly different approach.

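Theorem Let f be a complex-valued function which is integrable over R, with Fourier transform $\hat f$ and inverse Fourier
transform $\check f$, and suppose that f is of bounded variation on some neighbourhood of x ∈ R. Set $a = \lim_{t\in\mathrm{dom} f,\,t\uparrow x} f(t)$,
$b = \lim_{t\in\mathrm{dom} f,\,t\downarrow x} f(t)$. Then
\[ \frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{ixy} \hat f(y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{-ixy} \check f(y)\,dy = \frac12 (a + b). \]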

proof (a) The limits $\lim_{t\in\mathrm{dom} f,\,t\uparrow x} f(t)$ and $\lim_{t\in\mathrm{dom} f,\,t\downarrow x} f(t)$ exist because f is of bounded variation near x (224F).
Recall from 283Db that there is a constant K < ∞ such that
\[ \Big|\int_{\gamma}^{\delta} \frac{\sin cx}{x}\,dx\Big| \le K \]
whenever γ ≤ δ and c ∈ R.

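(b) Let ε > 0. The hypothesis is that there is some δ > 0 such that $\mathrm{Var}_{[x-\delta,x+\delta]}(f) < \infty$. Consequently
\[ \lim_{\eta\downarrow 0} \mathrm{Var}_{]x,x+\eta]}(f) = \lim_{\eta\downarrow 0} \mathrm{Var}_{[x-\eta,x[}(f) = 0 \]
(224E). There is therefore an η > 0 such that
\[ \max(\mathrm{Var}_{[x-\eta,x[}(f),\ \mathrm{Var}_{]x,x+\eta]}(f)) \le \epsilon. \]
Of course
\[ |f(t) - f(u)| \le \mathrm{Var}_{[x-\eta,x[}(f) \le \epsilon \]
whenever t, u ∈ dom f and x − η ≤ t ≤ u < x, so we shall have
\[ |f(t) - a| \le \epsilon \text{ for every } t \in \mathrm{dom} f \cap [x-\eta, x[, \]
and similarly
\[ |f(t) - b| \le \epsilon \text{ whenever } t \in \mathrm{dom} f \cap\, ]x, x+\eta]. \]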

(c) Now set


$g_1(t) = f(t)$ when t ∈ dom f and |x − t| > η, 0 otherwise,

$g_2(t) = a$ when x − η ≤ t < x, b when x < t ≤ x + η, 0 otherwise,

$g_3 = f - g_1 - g_2$.
Then $f = g_1 + g_2 + g_3$; each $g_j$ is integrable; $g_1$ is zero on a neighbourhood of x;
\[ \sup_{t\in\mathrm{dom}\, g_3,\ t\ne x} |g_3(t)| \le \epsilon, \]
\[ \mathrm{Var}_{[x-\eta,x[}(g_3) \le \epsilon, \qquad \mathrm{Var}_{]x,x+\eta]}(g_3) \le \epsilon. \]

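(d) Consider the three parts $g_1$, $g_2$, $g_3$ separately.

(i) For the first, we have
\[ \lim_{\gamma\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_1(y)\,dy = 0 \]
by 283I.

(ii) Next,
\[ \frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_2(y)\,dy = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin(x-t)\gamma}{x-t}\, g_2(t)\,dt \]
(by 283H)
\[ = \frac{a}{\pi} \int_{x-\eta}^{x} \frac{\sin(x-t)\gamma}{x-t}\,dt + \frac{b}{\pi} \int_{x}^{x+\eta} \frac{\sin(x-t)\gamma}{x-t}\,dt \]
\[ = \frac{a}{\pi} \int_0^{\gamma\eta} \frac{\sin u}{u}\,du + \frac{b}{\pi} \int_0^{\gamma\eta} \frac{\sin u}{u}\,du \]
(substituting $t = x - \frac{1}{\gamma}u$ in the first integral, $t = x + \frac{1}{\gamma}u$ in the second)
\[ \to \frac{a+b}{2} \text{ as } \gamma \to \infty \]
by 283Da.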
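(iii) As for the third, we have, for any γ > 0,
\[ \Big| \frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_3(y)\,dy \Big| = \Big| \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin(x-t)\gamma}{x-t}\, g_3(t)\,dt \Big| = \Big| \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin t\gamma}{t}\, g_3(x-t)\,dt \Big| \]
\[ \le \frac{1}{\pi} \Big| \int_{-\eta}^{0} \frac{\sin t\gamma}{t}\, g_3(x-t)\,dt \Big| + \frac{1}{\pi} \Big| \int_{0}^{\eta} \frac{\sin t\gamma}{t}\, g_3(x-t)\,dt \Big| \]
\[ \le \frac{K}{\pi} \Big( \sup_{t\in\mathrm{dom}\, g_3 \cap ]x-\eta,x[} |g_3(t)| + \mathrm{Var}_{]x-\eta,x[}(g_3) + \sup_{t\in\mathrm{dom}\, g_3 \cap ]x,x+\eta[} |g_3(t)| + \mathrm{Var}_{]x,x+\eta[}(g_3) \Big) \]
\[ \le \frac{4\epsilon K}{\pi}, \]
using 224J to bound the integrals in terms of the variation and supremum of $g_3$ and integrals of $\frac{\sin\gamma t}{t}$ over subintervals.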
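(e) We therefore have
\[ \limsup_{\gamma\to\infty} \Big| \frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat f(y)\,dy - \frac{a+b}{2} \Big| \]
\[ \le \limsup_{\gamma\to\infty} \Big| \frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_1(y)\,dy \Big| + \limsup_{\gamma\to\infty} \Big| \frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_2(y)\,dy - \frac{a+b}{2} \Big| + \limsup_{\gamma\to\infty} \Big| \frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_3(y)\,dy \Big| \]
\[ \le 0 + 0 + \frac{4K}{\pi}\epsilon \]
by the calculations in (d). As ε is arbitrary,
\[ \lim_{\gamma\to\infty} \Big| \frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat f(y)\,dy - \frac{a+b}{2} \Big| = 0. \]

(f) This is the first half of the theorem. But of course the second half follows at once, because
\[ \frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{-ixy} \check f(y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{-ixy} \hat f(-y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{ixy} \hat f(y)\,dy = \frac{a+b}{2}. \]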

Remark You will see that this argument uses some of the same ideas as those in 282O-282P. It is more direct because
(i) I am not using any concept corresponding to Fejér sums (though a very suitable one is available; see 283Xf), (ii) I
do not trouble to give the result concerning uniform convergence of the Fourier integrals when f is continuous and of
bounded variation (283Xj), (iii) I do not give any pointer to the significance of the fact that if f is of bounded variation
then $\sup_{y\in\mathbb{R}} |y\hat f(y)| < \infty$ (283Xk).
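
To see the averaging at a jump in a concrete case, take f to be the characteristic function of [−1, 1]. Its transform is $\sqrt{2/\pi}\,\sin y/y$ by direct integration, so the symmetric partial inversion integral reduces to $\frac{2}{\pi}\int_0^{\gamma} \cos(xy)\,\frac{\sin y}{y}\,dy$. The Python sketch below (an illustration with assumed names and an assumed midpoint quadrature, not part of the text) evaluates this at an interior point, at the jump x = 1, and at an exterior point.

```python
import math

def partial_inversion(x, gamma, n=200000):
    """(1/sqrt(2*pi)) * integral over [-gamma, gamma] of exp(i*x*y) * fhat(y) dy,
    for f the indicator of [-1, 1], where fhat(y) = sqrt(2/pi) * sin(y)/y.

    By symmetry this equals (2/pi) * integral from 0 to gamma of cos(x*y) * sin(y)/y dy.
    """
    h = gamma / n
    total = 0.0
    for k in range(n):
        y = (k + 0.5) * h  # midpoints avoid y = 0
        total += math.cos(x * y) * math.sin(y) / y
    return 2.0 * total * h / math.pi

for x in (0.0, 1.0, 2.0):
    print(x, partial_inversion(x, 400.0))
```

For large γ the three printed values should be close to 1, 1/2 and 0 respectively, the middle one being the average (a + b)/2 of the one-sided limits at the jump.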

283M Corresponding to 282Q, we have the following.

Theorem Let f and g be complex-valued functions which are integrable over R, and f ∗ g their convolution product,
defined by setting
\[ (f * g)(x) = \int_{-\infty}^{\infty} f(t) g(x-t)\,dt \]
whenever this is defined (255E). Then
\[ (f * g)^{\wedge}(y) = \sqrt{2\pi}\, \hat f(y) \hat g(y), \qquad (f * g)^{\vee}(y) = \sqrt{2\pi}\, \check f(y) \check g(y) \]
for every y ∈ R.

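proof For any y,
\[ (f * g)^{\wedge}(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} (f * g)(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-iy(t+u)} f(t) g(u)\,dt\,du \]
(using 255G)
\[ = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyt} f(t)\,dt \int_{-\infty}^{\infty} e^{-iyu} g(u)\,du = \sqrt{2\pi}\, \hat f(y) \hat g(y). \]
Now, of course,
\[ (f * g)^{\vee}(y) = (f * g)^{\wedge}(-y) = \sqrt{2\pi}\, \hat f(-y) \hat g(-y) = \sqrt{2\pi}\, \check f(y) \check g(y). \]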

283N I show how to compute a special Fourier transform, which will be used repeatedly in the next section.
Lemma For σ > 0, set $\psi_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-x^2/2\sigma^2}$ for x ∈ R. Then its Fourier transform and inverse Fourier transform are
\[ \hat\psi_\sigma = \check\psi_\sigma = \frac{1}{\sigma}\psi_{1/\sigma}. \]
In particular, $\hat\psi_1 = \psi_1$.

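proof (a) I begin with the special case σ = 1, using the Maclaurin series
\[ e^{-iyx} = \sum_{k=0}^{\infty} \frac{(-iyx)^k}{k!} \]
and the expressions for $\int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx$ from §263.
Fix y ∈ R. Writing
\[ g_k(x) = \frac{(-iyx)^k}{k!} e^{-x^2/2}, \qquad h_n(x) = \sum_{k=0}^{n} g_k(x), \qquad h(x) = e^{|yx| - x^2/2}, \]
we see that
\[ |g_k(x)| \le \frac{|yx|^k}{k!} e^{-x^2/2}, \]
so that
\[ |h_n(x)| \le \sum_{k=0}^{\infty} |g_k(x)| \le e^{|yx|} e^{-x^2/2} = h(x) \]
for every n; moreover, h is integrable, because $|h(x)| \le e^{-|x|}$ whenever |x| ≥ 2(1 + |y|). Consequently, using Lebesgue’s
Dominated Convergence Theorem,
\[ \hat\psi_1(y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \lim_{n\to\infty} h_n = \frac{1}{2\pi} \lim_{n\to\infty} \int_{-\infty}^{\infty} h_n \]
\[ = \frac{1}{2\pi} \sum_{k=0}^{\infty} \int_{-\infty}^{\infty} g_k = \frac{1}{2\pi} \sum_{k=0}^{\infty} \frac{(-iy)^k}{k!} \int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx \]
\[ = \frac{1}{2\pi} \sum_{j=0}^{\infty} \frac{(-iy)^{2j}}{(2j)!} \cdot \frac{(2j)!}{2^j j!} \sqrt{2\pi} \]
(by 263H)
\[ = \frac{1}{\sqrt{2\pi}} \sum_{j=0}^{\infty} \frac{(-y^2)^j}{2^j j!} = \frac{1}{\sqrt{2\pi}} e^{-y^2/2} = \psi_1(y), \]
as claimed.
(b) For the general case, $\psi_\sigma(x) = \frac{1}{\sigma}\psi_1(\frac{x}{\sigma})$, so that
\[ \hat\psi_\sigma(y) = \frac{1}{\sigma} \cdot \sigma \hat\psi_1(\sigma y) = \psi_1(\sigma y) = \frac{1}{\sigma} \psi_{1/\sigma}(y) \]
by 283Ce. Of course we now have
\[ \check\psi_\sigma(y) = \hat\psi_\sigma(-y) = \frac{1}{\sigma} \psi_{1/\sigma}(y) \]
because $\psi_{1/\sigma}$ is an even function.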
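
The identity for the transform of $\psi_\sigma$ can be confirmed numerically for a few values of σ and y. This is only an illustrative sketch: the helper names, the truncation of the integral to |x| ≤ 12σ and the midpoint quadrature are assumptions.

```python
import cmath
import math

def psi(sigma, x):
    """The Gaussian psi_sigma(x) = exp(-x^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def ft_psi(sigma, y, n=40000):
    """Midpoint approximation to the Fourier transform of psi_sigma at y,
    truncating the integral to |x| <= 12*sigma where the tails are negligible."""
    lo, hi = -12.0 * sigma, 12.0 * sigma
    h = (hi - lo) / n
    total = 0j
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += cmath.exp(-1j * y * x) * psi(sigma, x)
    return total * h / math.sqrt(2 * math.pi)

for sigma in (0.5, 1.0, 2.0):
    for y in (0.0, 0.7, 1.9):
        lhs = ft_psi(sigma, y)               # numerical transform of psi_sigma
        rhs = psi(1 / sigma, y) / sigma      # (1/sigma) * psi_{1/sigma}(y)
        print(sigma, y, abs(lhs - rhs))
```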

283O To lead into the ideas of the next section, I give the following very simple fact.
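Proposition Let f and g be two complex-valued functions which are integrable over R. Then $\int_{-\infty}^{\infty} f \times \hat g = \int_{-\infty}^{\infty} \hat f \times g$
and $\int_{-\infty}^{\infty} f \times \check g = \int_{-\infty}^{\infty} \check f \times g$.
proof Of course
\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |e^{-ixy} f(x) g(y)|\,dx\,dy = \int_{-\infty}^{\infty} |f| \int_{-\infty}^{\infty} |g| < \infty, \]
so
\[ \int_{-\infty}^{\infty} f \times \hat g = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(y) e^{-iyx} g(x)\,dx\,dy \]
\[ = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(y) e^{-ixy} g(x)\,dy\,dx = \int_{-\infty}^{\infty} \hat f \times g. \]
For the other half of the proposition, replace every $e^{-ixy}$ in the argument by $e^{ixy}$.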
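
As a concrete instance of 283O, take f to be the characteristic function of [−1, 1] and g the Gaussian $\psi_1$ of 283N. Then the transform of f is $\sqrt{2/\pi}\,\sin y/y$ (by direct integration) and the transform of g is $\psi_1$ itself, so both sides can be evaluated by one-dimensional quadrature. The sketch below is illustrative; the helper names and the truncation to [−12, 12] are assumptions.

```python
import math

def midpoint(func, lo, hi, n=100000):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

def psi1(x):
    """Standard Gaussian density; by 283N it equals its own Fourier transform."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def f_hat(y):
    """Transform of the indicator of [-1, 1]: sqrt(2/pi) * sin(y)/y (for y != 0)."""
    return math.sqrt(2 / math.pi) * math.sin(y) / y

# Left side: integral of f times the transform of g, i.e. psi1 integrated over [-1, 1].
left = midpoint(psi1, -1.0, 1.0)
# Right side: integral of the transform of f times g, truncated to [-12, 12].
right = midpoint(lambda y: f_hat(y) * psi1(y), -12.0, 12.0)
print(left, right)
```

The two printed numbers should agree to quadrature accuracy.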

283W Higher dimensions I offer a series of exercises designed to provide hints on how the work of this section
may be done in the r-dimensional case, where r ≥ 1.

(a) Let f be an integrable complex-valued function defined almost everywhere in $\mathbb{R}^r$. Its Fourier transform is
the function $\hat f : \mathbb{R}^r \to \mathbb{C}$ defined by the formula
\[ \hat f(y) = \frac{1}{(\sqrt{2\pi})^r} \int e^{-iy\cdot x} f(x)\,dx, \]
writing $y\cdot x = \eta_1\xi_1 + \ldots + \eta_r\xi_r$ for $x = (\xi_1,\ldots,\xi_r)$ and $y = (\eta_1,\ldots,\eta_r) \in \mathbb{R}^r$, and $\int \ldots dx$ for integration with respect
to Lebesgue measure on $\mathbb{R}^r$. Similarly, the inverse Fourier transform of f is the function $\check f$ given by
\[ \check f(y) = \frac{1}{(\sqrt{2\pi})^r} \int e^{iy\cdot x} f(x)\,dx = \hat f(-y). \]
Show that, for any integrable complex-valued function f on $\mathbb{R}^r$,
(i) $\hat f : \mathbb{R}^r \to \mathbb{C}$ is continuous;
(ii) $\lim_{\|y\|\to\infty} \hat f(y) = 0$, writing $\|y\| = \sqrt{y\cdot y}$ as usual;
(iii) if $\int \|x\|\,|f(x)|\,dx < \infty$, then $\hat f$ is differentiable, and
\[ \frac{\partial \hat f}{\partial\eta_j}(y) = -\frac{i}{(\sqrt{2\pi})^r} \int e^{-iy\cdot x}\, \xi_j f(x)\,dx \]
for j ≤ r, $y \in \mathbb{R}^r$, always taking $\xi_j$ to be the jth coordinate of $x \in \mathbb{R}^r$;
(iv) if j ≤ r and $\frac{\partial f}{\partial\xi_j}$ is defined everywhere and is integrable, and if $\lim_{\|x\|\to\infty} f(x) = 0$, then $\big(\frac{\partial f}{\partial\xi_j}\big)^{\wedge}(y) = i\eta_j \hat f(y)$
for every $y \in \mathbb{R}^r$.

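(b) Let f be an integrable complex-valued function on $\mathbb{R}^r$, and $\hat f$ its Fourier transform. If c ≤ d in $\mathbb{R}^r$, show that
\[ \int_{[c,d]} f = \Big(\frac{i}{\sqrt{2\pi}}\Big)^r \lim_{\alpha_1,\ldots,\alpha_r\to\infty} \int_{[-a,a]} \prod_{j=1}^{r} \frac{e^{i\gamma_j\eta_j} - e^{i\delta_j\eta_j}}{\eta_j}\, \hat f(y)\,dy, \]
setting $a = (\alpha_1, \ldots)$, $c = (\gamma_1, \ldots)$, $d = (\delta_1, \ldots)$.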



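(c) Let f be an integrable complex-valued function on $\mathbb{R}^r$, and $\hat f$ its Fourier transform. Show that if we write
\[ B_\infty(0, a) = \{y : |\eta_j| \le a \text{ for every } j \le r\}, \]
then
\[ \frac{1}{(\sqrt{2\pi})^r} \int_{B_\infty(0,a)} e^{ix\cdot y} \hat f(y)\,dy = \int \varphi_a(t) f(x-t)\,dt \]
for every a ≥ 0, where
\[ \varphi_a(t) = \frac{1}{\pi^r} \prod_{j=1}^{r} \frac{\sin a\tau_j}{\tau_j} \]
for $t = (\tau_1, \ldots, \tau_r) \in \mathbb{R}^r$.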


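(d) Let f and g be integrable complex-valued functions on $\mathbb{R}^r$. Show that $f * \check g = (\sqrt{2\pi})^r (\hat f \times g)^{\vee}$.

(e) For σ > 0, define $\psi_\sigma : \mathbb{R}^r \to \mathbb{C}$ by setting
\[ \psi_\sigma(x) = \frac{1}{(\sigma\sqrt{2\pi})^r} e^{-x\cdot x/2\sigma^2} \]
for every $x \in \mathbb{R}^r$. Show that
\[ \hat\psi_\sigma = \check\psi_\sigma = \frac{1}{\sigma^r} \psi_{1/\sigma}. \]

(f) Defining $\psi_\sigma$ as in (e), show that $\lim_{\sigma\to 0} (f * \psi_\sigma)(x) = f(x)$ for every continuous integrable $f : \mathbb{R}^r \to \mathbb{C}$, $x \in \mathbb{R}^r$.

(g) Show that if $f : \mathbb{R}^r \to \mathbb{C}$ is continuous and integrable, and $\hat f$ also is integrable, then $f = (\hat f)^{\vee}$. (Hint: Show that
both are equal at every point to
\[ \lim_{\sigma\to\infty} (\sigma\sqrt{2\pi})^r (\hat f \times \psi_\sigma)^{\vee} = \lim_{\sigma\to\infty} f * \psi_{1/\sigma}.) \]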

(h) Show that
\[ \int_{\mathbb{R}^r} \frac{1}{1+\|x\|^{r+1}}\,dx < \infty. \]

(i) Show that if $f : \mathbb{R}^r \to \mathbb{C}$ can be partially differentiated r + 1 times, and f and all its partial derivatives
$\frac{\partial^k f}{\partial\xi_{j_1}\partial\xi_{j_2}\ldots\partial\xi_{j_k}}$ are integrable for k ≤ r + 1, then $\hat f$ is integrable.

(j) Show that if f and g are integrable complex-valued functions on $\mathbb{R}^r$, then $(f * g)^{\wedge} = (\sqrt{2\pi})^r \hat f \times \hat g$.

(k) Show that if f and g are integrable complex-valued functions on $\mathbb{R}^r$, then $\int f \times \hat g = \int \hat f \times g$.

(l) Show that if $f_1, \ldots, f_r$ are integrable complex-valued functions on R with Fourier transforms $g_1, \ldots, g_r$, and we
write $f(x) = f_1(\xi_1)\ldots f_r(\xi_r)$ for $x = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r$, then the Fourier transform of f is $y \mapsto g_1(\eta_1)\ldots g_r(\eta_r)$.

(m)(i) Show that $\int_{2k\pi}^{2(k+1)\pi} \frac{\sin t}{t\sqrt{t}}\,dt > 0$ for every k ∈ N, and hence that $\int_0^{\infty} \frac{\sin t}{t\sqrt{t}}\,dt > 0$.
(ii) Set $f_1(\xi) = 1/\sqrt{|\xi|}$ for 0 < |ξ| ≤ 1, 0 for other ξ. Show that $\lim_{a\to\infty} \frac{1}{\sqrt{a}} \int_{-a}^{a} \hat f_1(\eta)\,d\eta$ exists in R and is greater
than 0.
(iii) Construct an integrable function $f_2$, zero on some neighbourhood of 0, such that there are infinitely many
m ∈ N for which $\big|\int_{-m}^{m} \hat f_2(\eta)\,d\eta\big| \ge \frac{1}{\sqrt{m}}$. (Hint: take $f_2(\xi) = 2^{-k} \sin m_k\xi$ for k + 1 ≤ ξ < k + 2, for a sufficiently rapidly
increasing sequence $\langle m_k\rangle_{k\in\mathbb{N}}$.)
(iv) Set $f(x) = f_1(\xi_1) f_2(\xi_2)$ for $x \in \mathbb{R}^2$. Show that f is integrable, that f is zero in a neighbourhood of 0, but
that
\[ \limsup_{a\to\infty} \Big| \frac{1}{2\pi} \int_{B_\infty(0,a)} \hat f(y)\,dy \Big| > 0, \]
defining $B_\infty$ as in (c).

283X Basic exercises (a) Confirm that the six alternative definitions of the transforms $\hat f$, $\check f$ offered in 283B all
lead to the same theory; find the constants involved in the new versions of 283Ch, 283Ci, 283L, 283M and 283N.

(b) If we redefined $\hat f(y)$ to be $\alpha \int_{-\infty}^{\infty} e^{i\beta xy} f(x)\,dx$, what would $\check f(y)$ be?

(c) Show that nearly every 2π would disappear from the theorems of this section if we defined a measure ν on R by
saying that $\nu E = \frac{1}{\sqrt{2\pi}} \mu E$ for every Lebesgue measurable set E, where µ is Lebesgue measure, and wrote
\[ \hat f(y) = \int_{-\infty}^{\infty} e^{-iyx} f(x)\,\nu(dx), \qquad \check f(y) = \int_{-\infty}^{\infty} e^{iyx} f(x)\,\nu(dx), \]
\[ (f * g)(x) = \int_{-\infty}^{\infty} f(t) g(x-t)\,\nu(dt). \]
What is $\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin t}{t}\,\nu(dt)$?


> (d) Let f be an integrable complex-valued function on R, with Fourier transform $\hat f$. Show that (i) if g(x) = f(−x)
whenever this is defined, then $\hat g(y) = \hat f(-y)$ for every y ∈ R; (ii) if $g(x) = \overline{f(x)}$ whenever this is defined, then
$\hat g(y) = \overline{\hat f(-y)}$ for every y.

(e) Let f be an integrable complex-valued function on R, with Fourier transform $\hat f$. Show that
\[ \int_c^d \hat f(y)\,dy = \frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{-idx} - e^{-icx}}{x}\, f(x)\,dx \]
whenever c ≤ d in R.

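> (f) For an integrable complex-valued function f on R, let its Fejér integrals be
\[ \sigma_c(x) = \frac{1}{c\sqrt{2\pi}} \int_0^c \Big( \int_{-a}^{a} e^{ixy} \hat f(y)\,dy \Big)\,da \]
for c > 0. Show that
\[ \sigma_c(x) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1-\cos ct}{ct^2}\, f(x-t)\,dt. \]

(g) Show that $\int_{-\infty}^{\infty} \frac{1-\cos at}{at^2}\,dt = \pi$ for every a > 0. (Hint: integrate by parts and use 283Da.) Show that
\[ \lim_{a\to\infty} \int_{\delta}^{\infty} \frac{1-\cos at}{at^2}\,dt = \lim_{a\to\infty} \sup_{t\ge\delta} \frac{1-\cos at}{at^2} = 0 \]
for every δ > 0.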



(h) Let f be an integrable complex-valued function on R, and define its Fejér integrals $\sigma_a$ as in 283Xf above. Show
that if x ∈ R, c ∈ C are such that
\[ \lim_{\delta\downarrow 0} \frac{1}{\delta} \int_0^{\delta} |f(x+t) + f(x-t) - 2c|\,dt = 0, \]
then $\lim_{a\to\infty} \sigma_a(x) = c$. (Hint: adapt the argument of 282H.)

> (i) Let f be an integrable complex-valued function on R, and define its Fejér integrals σa as in 283Xf above. Show
that f (x) = lima→∞ σa (x) for almost every x ∈ R.

(j) Let f : R → C be a continuous integrable complex-valued function of bounded variation, and define its Fejér
integrals σa as in 283Xf above. Show that f (x) = lima→∞ σa (x) uniformly for x ∈ R.

> (k) Let f be an integrable complex-valued function of bounded variation on R, and $\hat f$ its Fourier transform. Show
that $\sup_{y\in\mathbb{R}} |y\hat f(y)| < \infty$.

(l) Let f and g be integrable complex-valued functions on R. Show that $f * \check g = \sqrt{2\pi}\,(\hat f \times g)^{\vee}$.

(m) Let f be an integrable complex-valued function on R, and fix x ∈ R. Set
\[ \tilde f_x(y) = \int_{-\infty}^{\infty} f(t) \cos y(x-t)\,dt \]
for y ∈ R. Show that
(i) if f is differentiable at x,
\[ f(x) = \frac{1}{\pi} \lim_{a\to\infty} \int_0^a \tilde f_x(y)\,dy; \]
(ii) if there is a neighbourhood of x in which f has bounded variation, then
\[ \frac{1}{\pi} \lim_{a\to\infty} \int_0^a \tilde f_x(y)\,dy = \frac12 \Big( \lim_{t\in\mathrm{dom} f,\,t\uparrow x} f(t) + \lim_{t\in\mathrm{dom} f,\,t\downarrow x} f(t) \Big); \]
(iii) if f is twice differentiable and f′, f′′ are integrable then $\tilde f_x$ is integrable and $f(x) = \frac{1}{\pi} \int_0^{\infty} \tilde f_x$. (The formula
\[ f(x) = \frac{1}{\pi} \int_0^{\infty} \Big( \int_{-\infty}^{\infty} f(t) \cos y(x-t)\,dt \Big)\,dy, \]
valid for such functions f, is called Fourier’s integral formula.)

(n) Show that if f is a complex-valued function of bounded variation, defined almost everywhere in R, and converging
to 0 at ±∞, then
\[ g(y) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f(x)\,dx \]
is defined in C for every y ≠ 0, and that the limit is uniform in any region bounded away from 0.

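(o) Let f be an integrable complex-valued function on R. Set
\[ \hat f_c(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos yx\, f(x)\,dx, \qquad \hat f_s(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \sin yx\, f(x)\,dx \]
for y ∈ R. Show that
\[ \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \sqrt{\frac{2}{\pi}} \int_0^a \cos xy\, \hat f_c(y)\,dy + \sqrt{\frac{2}{\pi}} \int_0^a \sin xy\, \hat f_s(y)\,dy \]
for every x ∈ R, a ≥ 0.

(p) Use the fact that $\int_0^a \int_0^{\infty} e^{-xy} \sin y\,dx\,dy = \int_0^{\infty} \int_0^a e^{-xy} \sin y\,dy\,dx$ whenever a ≥ 0 to show that
\[ \lim_{a\to\infty} \int_0^a \frac{\sin y}{y}\,dy = \int_0^{\infty} \frac{1}{1+x^2}\,dx. \]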

(q) Let f : R → C be an integrable function which is absolutely continuous on every bounded interval, and suppose
that its derivative f′ is of bounded variation on R. Show that $\hat f$ is integrable and that $f = (\hat f)^{\vee}$. (Hint: 283Ci, 283Xk.)

> (r) Show that if $f(x) = e^{-\sigma|x|}$, where σ > 0, then $\hat f(y) = \frac{2\sigma}{\sqrt{2\pi}(\sigma^2+y^2)}$. Hence, or otherwise, find the Fourier
transform of $y \mapsto \frac{1}{1+y^2}$.

(s) Find the inverse Fourier transform of the characteristic function of a bounded interval in R. Show that in a
formal sense 283F can be regarded as a special case of 283O.
(t) Let f be a non-negative integrable function on R, with Fourier transform $\hat f$. Show that $\sum_{j=0}^{n} \sum_{k=0}^{n} a_j \bar a_k \hat f(y_j - y_k) \ge 0$
whenever $y_0, \ldots, y_n$ in R and $a_0, \ldots, a_n \in \mathbb{C}$.

(u) Let f be an integrable complex-valued function on R. Show that $\tilde f(x) = \sum_{n=-\infty}^{\infty} f(x + 2\pi n)$ is defined in C for
almost every x. (Hint: $\sum_{n=-\infty}^{\infty} \int_{-\pi}^{\pi} |f(x + 2\pi n)|\,dx < \infty$.) Show that $\tilde f$ is periodic. Show that the Fourier coefficients
of $\tilde f \restriction\, ]-\pi, \pi]$ are $\langle \frac{1}{\sqrt{2\pi}} \hat f(k) \rangle_{k\in\mathbb{Z}}$.

283Y Further exercises (a) Show that if f : R → C is absolutely continuous in every bounded interval, f′ is of
bounded variation on R, and $\lim_{x\to\infty} f(x) = \lim_{x\to-\infty} f(x) = 0$, then
\[ g(y) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f(x)\,dx = -\frac{i}{y\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f'(x)\,dx \]
is defined, with
\[ y^2 |g(y)| \le \frac{4}{\sqrt{2\pi}} \mathrm{Var}_{\mathbb{R}}(f'), \]
for every y ≠ 0.

(b) Let f : R → [0, ∞[ be an even function such that f is convex on [0, ∞[ and $\lim_{x\to\infty} f(x) = 0$.
(i) Show that, for any y > 0, k ∈ N, $\int_{-2k\pi/y}^{2k\pi/y} e^{-iyx} f(x)\,dx \ge 0$.
(ii) Show that $g(y) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f(x)\,dx$ exists in [0, ∞[ for every y ≠ 0.
(iii) For n ∈ N, set $f_n(x) = e^{-|x|/(n+1)} f(x)$ for every x. Show that $f_n$ is integrable and convex on [0, ∞[.
(iv) Show that $g(y) = \lim_{n\to\infty} \hat f_n(y)$ for every y ≠ 0.
(vi) Show that if f is integrable then
\[ \int_{-a}^{a} \hat f = \frac{4}{\sqrt{2\pi}} \int_0^{\infty} \frac{\sin at}{t}\, f(t)\,dt \le \frac{4a}{\sqrt{2\pi}} \int_0^{\pi/a} f \le 2\sqrt{2\pi}\, f(1) \]
for every a ≥ 0. Hence show that whether f is integrable or not, g is integrable and $f_n = (\hat f_n)^{\vee}$ for every n.
(vii) Show that $\lim_{a\downarrow 0} \sup_{n\in\mathbb{N}} \int_{-a}^{a} \hat f_n = 0$.
(viii) Show that if f′ is bounded (on its domain) then $\{\hat f_n : n \in \mathbb{N}\}$ is uniformly integrable (hint: use (vii) and
283Ya), so that $\lim_{n\to\infty} \|\hat f_n - g\|_1 = 0$ and $f = \check g$.
(ix) Show that if f′ is unbounded then for every ε > 0 we can find $h_1, h_2$ : R → [0, ∞[, both even, convex and
converging to 0 at ∞, such that $f = h_1 + h_2$, $h_1'$ is bounded, $\int h_2 \le \epsilon$ and $h_2(1) \le \epsilon$. Hence show that in this case also
$f = \check g$.

(c) Suppose that f : R → R is even, twice differentiable and convergent to 0 at ∞, that f ′′ is continuous and that
{x : f ′′ (x) = 0} is bounded in R. Show that f is the Fourier transform of an integrable function. (Hint: use 283Yb
and 283Xq.)
(d) Let g : R → R be an odd function of bounded variation such that $\int_1^{\infty} \frac{1}{x} g(x)\,dx = \infty$. Show that g cannot be
the Fourier transform of any integrable function f. (Hint: show that if $g = \hat f$ then
\[ \int_0^1 f = \frac{2i}{\sqrt{2\pi}} \lim_{a\to\infty} \int_0^a \frac{1-\cos x}{x}\, g(x)\,dx = \infty.) \]

283 Notes and comments I have tried in this section to give the elementary theory of Fourier transforms of integrable
functions on R, with an eye to the extension of the concept which will be attempted in the next section. Following
§282, I have given prominence to two theorems (283I and 283L) describing conditions for the inversion of the Fourier
transform to return to the original function; we find ourselves looking at improper integrals $\lim_{a\to\infty} \int_{-a}^{a}$, just as earlier
we needed to look at symmetric sums $\lim_{n\to\infty} \sum_{k=-n}^{n}$. I do not go quite so far as in §282, and in particular I leave
the study of square-integrable functions for the moment, since their Fourier transforms may not be describable by the
simple formulae used here.
One of the most fundamental obstacles in the subject is the lack of any effective criteria for determining which
functions are the Fourier transforms of integrable functions. (Happily, things are better for square-integrable functions;
see 284O-284P.) In 283Yb-283Yc I sketch an argument showing that ‘ordinary’ non-oscillating even functions which
converge to 0 at ±∞ are Fourier transforms of integrable functions. Strikingly, this is not true of odd functions; thus
$y \mapsto \frac{1}{\ln(e+y^2)}$ is the Fourier transform of an integrable function, but $y \mapsto \frac{\arctan y}{\ln(e+y^2)}$ is not (283Yd).
In 283W I sketch the corresponding theory of Fourier transforms in $\mathbb{R}^r$. There are few surprises. One point to note
is that where in the one-dimensional case we ask for a well-behaved second derivative, in the r-dimensional case we may
need to differentiate r + 1 times (283Wi). Another is that we lose the ‘localization principle’. In the one-dimensional
case, if f is integrable and zero on an interval ]c, d[, then $\lim_{a\to\infty} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = 0$ for every x ∈ ]c, d[; this is
immediate from either 283I or 283L. But in higher dimensions the most natural formulation of a corresponding result
is false (283Wm).

284 Fourier transforms II



The basic paradox of Fourier transforms is the fact that while for certain functions (see 283J-283K) we have $(\hat f)^{\vee} = f$,
‘ordinary’ integrable functions f (for instance, the characteristic functions of non-trivial intervals) give rise to non-integrable
Fourier transforms $\hat f$ for which there is no direct definition available for $(\hat f)^{\vee}$, making it a puzzle to decide in
what sense the formula $f = (\hat f)^{\vee}$ might be true. What now seems by far the most natural resolution of the problem lies
in declaring the Fourier transform to be an operation on distributions rather than on functions. I shall not attempt
to describe this theory properly (almost any book on ‘Distributions’ will cover the ground better than I can possibly
do here), but will try to convey the fundamental ideas, so far as they are relevant to the questions dealt with here, in
language which will make the transition to a fuller treatment straightforward. At the same time, these methods make
it easy to prove strong versions of the ‘classical’ theorems concerning Fourier transforms.

284A Test functions: Definition Throughout this section, a rapidly decreasing test function or Schwartz
function will be a function h : R → C such that h is smooth, that is, differentiable everywhere any finite number of
times, and moreover
\[ \sup_{x\in\mathbb{R}} |x|^k |h^{(m)}(x)| < \infty \]
for all k, m ∈ N, writing $h^{(m)}$ for the mth derivative of h.

284B The following elementary facts will be useful.


Lemma (a) If g and h are rapidly decreasing test functions, so are g + h and ch, for any c ∈ C.
(b) If h is a rapidly decreasing test function and y ∈ R, then $x \mapsto h(y - x)$ is a rapidly decreasing test function.
(c) If h is any rapidly decreasing test function, then h and $h^2$ are integrable.
(d) If h is a rapidly decreasing test function, so is its derivative h′.
(e) If h is a rapidly decreasing test function, so is the function $x \mapsto xh(x)$.
(f) For any ε > 0, the function $x \mapsto e^{-\epsilon x^2}$ is a rapidly decreasing test function.
proof (a) is trivial.
(b) Write $g(x) = h(y - x)$ for $x \in \mathbb{R}$. Then $g^{(m)}(x) = (-1)^m h^{(m)}(y - x)$ for every $m$, so $g$ is smooth. For any $k \in \mathbb{N}$,
$$|x|^k \le 2^k(|y|^k + |y - x|^k)$$
for every $x$, so
$$\begin{aligned}
\sup_{x\in\mathbb{R}} |x|^k |g^{(m)}(x)| &= \sup_{x\in\mathbb{R}} |x|^k |h^{(m)}(y - x)| \\
&\le 2^k|y|^k \sup_{x\in\mathbb{R}} |h^{(m)}(y - x)| + 2^k \sup_{x\in\mathbb{R}} |y - x|^k |h^{(m)}(y - x)| \\
&= 2^k|y|^k \sup_{x\in\mathbb{R}} |h^{(m)}(x)| + 2^k \sup_{x\in\mathbb{R}} |x|^k |h^{(m)}(x)| < \infty.
\end{aligned}$$

(c) Because
$$M = \sup_{x\in\mathbb{R}} \bigl(|h(x)| + x^2|h(x)|\bigr)$$
is finite, we have
$$\int |h| \le \int \frac{M}{1+x^2}\,dx < \infty.$$
Of course we now have $|h^2| \le M|h|$, so $h^2$ also is integrable.


(d) This is immediate from the definition, as every derivative of h′ is a derivative of h.
(e) Setting $g(x) = xh(x)$, $g^{(m)}(x) = xh^{(m)}(x) + mh^{(m-1)}(x)$ for $m \ge 1$, so
$$\sup_{x\in\mathbb{R}} |x^k g^{(m)}(x)| \le \sup_{x\in\mathbb{R}} |x^{k+1} h^{(m)}(x)| + m \sup_{x\in\mathbb{R}} |x^k h^{(m-1)}(x)|$$
is finite, for all $k \in \mathbb{N}$, $m \ge 1$.
(f) If $h(x) = e^{-\epsilon x^2}$, then for each $m \in \mathbb{N}$ we have $h^{(m)}(x) = p_m(x)h(x)$, where $p_0(x) = 1$ and
$p_{m+1}(x) = p_m'(x) - 2\epsilon x p_m(x)$, so that $p_m$ is a polynomial. Because $e^{\epsilon x^2} \ge \epsilon^{k+1}x^{2k+2}/(k+1)!$ for all $x$, $k \ge 0$,
$$\lim_{|x|\to\infty} |x|^k h(x) = \lim_{x\to\infty} x^k/e^{\epsilon x^2} = 0$$
for every $k$, and $\lim_{|x|\to\infty} p(x)h(x) = 0$ for every polynomial $p$; consequently
$$\lim_{|x|\to\infty} x^k h^{(m)}(x) = \lim_{|x|\to\infty} x^k p_m(x)h(x) = 0$$
for all $k$, $m$, and $h$ is a rapidly decreasing test function.
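
As a numerical illustration of part (f), and not as part of the proof, the following Python sketch builds the polynomials $p_m$ from the recurrence $p_0 = 1$, $p_{m+1} = p_m' - 2\epsilon x p_m$ and estimates $\sup_x |x|^k |h^{(m)}(x)|$ on a finite grid. The function names, the grid radius and the step count are assumptions made for the illustration; a grid maximum can only suggest, not establish, that the supremum is finite.

```python
import math

def derivative_polys(eps, max_m):
    """Return coefficient lists p_0..p_max_m with h^(m)(x) = p_m(x) * exp(-eps*x^2).

    Uses p_0 = 1 and p_{m+1} = p_m' - 2*eps*x*p_m; coefficients are listed
    from the constant term upward.
    """
    polys = [[1.0]]
    for _ in range(max_m):
        p = polys[-1]
        nxt = [0.0] * (len(p) + 1)
        for j in range(1, len(p)):
            nxt[j - 1] += j * p[j]          # contribution of p_m'
        for j, c in enumerate(p):
            nxt[j + 1] -= 2.0 * eps * c     # contribution of -2*eps*x*p_m
        polys.append(nxt)
    return polys

def evaluate(poly, x):
    """Evaluate a polynomial given by its coefficient list at x."""
    return sum(c * x**j for j, c in enumerate(poly))

def weighted_sup(eps, k, m, radius=40.0, steps=8000):
    """Grid approximation to sup_x |x|^k |h^(m)(x)| over [-radius, radius]."""
    p = derivative_polys(eps, m)[m]
    best = 0.0
    for i in range(steps + 1):
        x = -radius + 2.0 * radius * i / steps
        value = abs(x)**k * abs(evaluate(p, x)) * math.exp(-eps * x * x)
        best = max(best, value)
    return best
```

For instance, `weighted_sup(0.5, 3, 2)` estimates $\sup_x |x|^3 |h''(x)|$ for $h(x) = e^{-x^2/2}$.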

284C Proposition Let $h : \mathbb{R} \to \mathbb{C}$ be a rapidly decreasing test function. Then $\hat h : \mathbb{R} \to \mathbb{C}$ and
$\check h : \mathbb{R} \to \mathbb{C}$ are rapidly decreasing test functions, and $h^{\wedge\vee} = h^{\vee\wedge} = h$.

proof (a) Let $k$, $m \in \mathbb{N}$. Then $\sup_{x\in\mathbb{R}} (|x|^m + |x|^{m+2})|h^{(k)}(x)| < \infty$ and $\int_{-\infty}^{\infty} |x^m h^{(k)}(x)|\,dx < \infty$. We may therefore
use 283Ch-283Ci to see that $y \mapsto i^{k+m} y^k \hat h^{(m)}(y)$ is the Fourier transform of $x \mapsto x^m h^{(k)}(x)$, and therefore that
$\lim_{|y|\to\infty} y^k \hat h^{(m)}(y) = 0$, by 283Cg, so that (because $\hat h^{(m)}$ is continuous) $\sup_{y\in\mathbb{R}} |y^k \hat h^{(m)}(y)|$ is finite. As $k$ and $m$ are
arbitrary, $\hat h$ is a rapidly decreasing test function.
(b) Since $\check h(y) = \hat h(-y)$ for every $y$, it follows at once that $\check h$ is a rapidly decreasing test function.
(c) By 283J, it follows from (a) and (b) that $h^{\wedge\vee} = h^{\vee\wedge} = h$.

284D Definition I will use the phrase tempered function on R to mean a measurable complex-valued function
f , defined almost everywhere in R, such that
$$\int_{-\infty}^{\infty} \frac{1}{1+|x|^k}\,|f(x)|\,dx < \infty$$

for some k ∈ N.

284E As in 284B I spell out some elementary facts.


Lemma (a) If $f$ and $g$ are tempered functions, so are $|f|$, $f + g$ and $cf$, for any $c \in \mathbb{C}$.
(b) If $f$ is a tempered function then it is integrable over any bounded interval.
(c) If $f$ is a tempered function and $x \in \mathbb{R}$, then $t \mapsto f(x + t)$ and $t \mapsto f(x - t)$ are both tempered functions.
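proof (a) is elementary; if
$$\int_{-\infty}^{\infty} \frac{1}{1+|x|^j}\,|f(x)|\,dx < \infty, \qquad \int_{-\infty}^{\infty} \frac{1}{1+|x|^k}\,|g(x)|\,dx < \infty,$$
then
$$\int_{-\infty}^{\infty} \frac{1}{1+|x|^{j+k}}\,|(f + g)(x)|\,dx < \infty$$
because
$$1 + |x|^{j+k} \ge \max(1, |x|^{j+k}) \ge \max(1, |x|^j, |x|^k) \ge \tfrac{1}{2}\max(1 + |x|^j, 1 + |x|^k)$$
for all $x$.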
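(b) If
$$\int_{-\infty}^{\infty} \frac{1}{1+|x|^k}\,|f(x)|\,dx = M < \infty,$$
then for any $a \le b$ we have $1 + |x|^k \le 1 + |a|^k + |b|^k$ for every $x \in [a, b]$, so
$$\int_a^b |f| \le M(1 + |a|^k + |b|^k) < \infty.$$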

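(c) The idea is the same as in 284Bb. If $k \in \mathbb{N}$ is such that
$$\int_{-\infty}^{\infty} \frac{1}{1+|t|^k}\,|f(t)|\,dt = M < \infty,$$
then we have
$$1 + |x + t|^k \le 2^k(1 + |x|^k)(1 + |t|^k)$$
so that
$$\frac{1}{1+|t|^k} \le 2^k(1 + |x|^k)\,\frac{1}{1+|x+t|^k}$$
for every $t$, and
$$\int_{-\infty}^{\infty} \frac{|f(x+t)|}{1+|t|^k}\,dt \le 2^k(1 + |x|^k) \int_{-\infty}^{\infty} \frac{|f(x+t)|}{1+|x+t|^k}\,dt \le 2^k(1 + |x|^k)M < \infty.$$
Similarly,
$$\int_{-\infty}^{\infty} \frac{|f(x-t)|}{1+|t|^k}\,dt \le 2^k(1 + |x|^k)M < \infty.$$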

284F Linking the two concepts, we have the following.


Lemma Let $f$ be a tempered function on $\mathbb{R}$ and $h$ a rapidly decreasing test function. Then $f \times h$ is integrable.
proof Of course $f \times h$ is measurable. Let $k \in \mathbb{N}$ be such that $\int_{-\infty}^{\infty} \frac{1}{1+|x|^k}|f(x)|\,dx < \infty$. There is an $M$ such that
$(1 + |x|^k)|h(x)| \le M$ for every $x \in \mathbb{R}$, so that
$$\int_{-\infty}^{\infty} |f \times h| \le M \int_{-\infty}^{\infty} \frac{1}{1+|x|^k}\,|f(x)|\,dx < \infty.$$

284G Lemma Suppose that $f_1$ and $f_2$ are tempered functions and that $\int f_1 \times h = \int f_2 \times h$ for every rapidly
decreasing test function $h$. Then $f_1 =_{\text{a.e.}} f_2$.
proof (a) Set $g = f_1 - f_2$; then $\int g \times h = 0$ for every rapidly decreasing test function $h$. Of course $g$ is a tempered
function, so is integrable over any bounded interval. By 222D, it will be enough if I can show that $\int_a^b g = 0$ whenever
$a < b$, since then we shall have $g = 0$ a.e. on every bounded interval and $f_1 =_{\text{a.e.}} f_2$.
(b) Consider the function $\phi_0(x) = e^{-1/x}$ for $x > 0$. Then $\phi_0$ is differentiable arbitrarily often everywhere in $]0, \infty[$,
$0 < \phi_0(x) < 1$ for every $x > 0$, and $\lim_{x\to\infty} \phi_0(x) = 1$. Moreover, writing $\phi_0^{(m)}$ for the $m$th derivative of $\phi_0$,
$$\lim_{x\downarrow 0} \phi_0^{(m)}(x) = \lim_{x\downarrow 0} \frac{1}{x}\,\phi_0^{(m)}(x) = 0$$
for every $m \in \mathbb{N}$. P (Compare 284Bf.) We have $\phi_0^{(m)}(x) = p_m(\frac{1}{x})\phi_0(x)$, where $p_0(t) = 1$ and
$p_{m+1}(t) = t^2(p_m(t) - p_m'(t))$, so that $p_m$ is a polynomial for each $m \in \mathbb{N}$. Now for any $k \in \mathbb{N}$,
$$\lim_{t\to\infty} t^k e^{-t} \le \lim_{t\to\infty} \frac{(k+1)!\,t^k}{t^{k+1}} = 0,$$
so
$$\lim_{x\downarrow 0} \phi_0^{(m)}(x) = \lim_{t\to\infty} p_m(t)e^{-t} = 0,$$
$$\lim_{x\downarrow 0} \frac{1}{x}\,\phi_0^{(m)}(x) = \lim_{t\to\infty} t\,p_m(t)e^{-t} = 0.$$
Q

(c) Consequently, setting $\phi(x) = 0$ for $x \le 0$, $e^{-1/x}$ for $x > 0$, $\phi$ is smooth, with $m$th derivative
$$\phi^{(m)}(x) = 0 \text{ for } x \le 0, \qquad \phi^{(m)}(x) = \phi_0^{(m)}(x) \text{ for } x > 0.$$
(The proof is an easy induction on $m$.) Also $0 \le \phi(x) \le 1$ for every $x \in \mathbb{R}$, and $\lim_{x\to\infty} \phi(x) = 1$.
(d) Now take any $a < b$, and for $n \in \mathbb{N}$ set
$$\psi_n(x) = \phi(n(x - a))\,\phi(n(b - x)).$$
Then $\psi_n$ will be smooth and $\psi_n(x) = 0$ if $x \notin {]a, b[}$, so surely $\psi_n$ is a rapidly decreasing test function, and
$$\int_{-\infty}^{\infty} g \times \psi_n = 0.$$
Next, $0 \le \psi_n(x) \le 1$ for every $x$, $n$, and if $a < x < b$ then $\lim_{n\to\infty} \psi_n(x) = 1$. So
$$\int_a^b g = \int g \times \chi(]a, b[) = \int g \times (\lim_{n\to\infty} \psi_n) = \lim_{n\to\infty} \int g \times \psi_n = 0,$$
using Lebesgue’s Dominated Convergence Theorem. As $a$ and $b$ are arbitrary, $g = 0$ a.e., as required.
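
The behaviour of the functions $\psi_n$ used in part (d) can be seen numerically. The following Python sketch is an illustration only: the function names and the midpoint-rule integrator are assumptions made here, not part of the argument. It evaluates $\psi_n$ and approximates $\int_a^b \psi_n$, which should increase towards $b - a$ as $n$ grows.

```python
import math

def phi(x):
    """Smooth function equal to 0 for x <= 0 and exp(-1/x) for x > 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def psi(n, x, a, b):
    """psi_n(x) = phi(n(x - a)) * phi(n(b - x)); vanishes outside ]a, b[."""
    return phi(n * (x - a)) * phi(n * (b - x))

def integral_psi(n, a, b, steps=20000):
    """Midpoint-rule approximation to the integral of psi_n over [a, b]."""
    width = (b - a) / steps
    return width * sum(psi(n, a + (i + 0.5) * width, a, b) for i in range(steps))

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, psi(n, 0.5, 0.0, 1.0), integral_psi(n, 0.0, 1.0))
```

The pointwise values at an interior point approach 1, and the integrals approach the length of the interval, which is the convergence exploited through the Dominated Convergence Theorem.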

284H Definition Let $f$ and $g$ be tempered functions in the sense of 284D. Then I will say that $g$ represents the
Fourier transform of $f$ if
$$\int_{-\infty}^{\infty} g \times h = \int_{-\infty}^{\infty} f \times \hat h$$
for every rapidly decreasing test function $h$.

284I Remarks (a) As usual, when shifting definitions in this way, we have some checking to do. If $f$ is an integrable
complex-valued function on $\mathbb{R}$, $\hat f$ its Fourier transform, then surely $\hat f$ is a tempered function, being a bounded continuous
function; and if $h$ is any rapidly decreasing test function, then $\int \hat f \times h = \int f \times \hat h$ by 283O. Thus $\hat f$ ‘represents the
Fourier transform of $f$’ in the sense of 284H above.

(b) Note also that 284G assures us that if $g_1$, $g_2$ are two tempered functions both representing the Fourier transform
of $f$, then $g_1 =_{\text{a.e.}} g_2$, since we must have
$$\int g_1 \times h = \int f \times \hat h = \int g_2 \times h$$
for every rapidly decreasing test function $h$.

(c) Of course the value of this indirect approach is that we can assign Fourier transforms, in a sense, to many more
functions. But we must note at once that if g ‘represents the Fourier transform of f ’ then so will any function equal
almost everywhere to g; we can no longer expect to be able to speak of ‘the’ Fourier transform of f as a function. We
could say that ‘the’ Fourier transform of $f$ is a functional $\phi$ on the space of rapidly decreasing test functions, defined
by setting $\phi(h) = \int f \times \hat h$; alternatively, we could say that ‘the’ Fourier transform of $f$ is a member of $L^0_{\mathbb{C}}$, the space
of equivalence classes of almost-everywhere-defined measurable functions (241J).

(d) It is now natural to say that $g$ represents the inverse Fourier transform of $f$ just when $f$ represents
the Fourier transform of $g$; that is, when $\int f \times h = \int g \times \hat h$ for every rapidly decreasing test function $h$. Because
$h^{\wedge\vee} = h^{\vee\wedge} = h$ for every such $h$, this is the same thing as saying that $\int f \times \check h = \int g \times h$ for every rapidly decreasing test
function $h$, which is the other natural expression of what it might mean to say that ‘$g$ represents the inverse Fourier
transform of $f$’.
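(e) If $f$, $g$ are tempered functions and we write $\overleftrightarrow{g}(x) = g(-x)$ whenever this is defined, then $\overleftrightarrow{g}$ will also be a
tempered function, and we shall always have
$$\int \overleftrightarrow{g} \times \hat h = \int g(-x)\hat h(x)\,dx = \int g(x)\hat h(-x)\,dx = \int g \times \check h,$$
so that
$$\begin{aligned}
&g \text{ represents the Fourier transform of } f \\
&\quad\iff \int g \times h = \int f \times \hat h \text{ for every test function } h \\
&\quad\iff \int g \times \check h = \int f \times h^{\vee\wedge} \text{ for every } h \\
&\quad\iff \int \overleftrightarrow{g} \times \hat h = \int f \times h \text{ for every } h \\
&\quad\iff \overleftrightarrow{g} \text{ represents the inverse Fourier transform of } f.
\end{aligned}$$
Combining this with (d), we get
$$\begin{aligned}
&g \text{ represents the Fourier transform of } f \\
&\quad\iff f = \overleftrightarrow{\overleftrightarrow{f}} \text{ represents the inverse Fourier transform of } g \\
&\quad\iff \overleftrightarrow{f} \text{ represents the Fourier transform of } g.
\end{aligned}$$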


(f) Yet again, I ought to spell out the check: if $f$ is integrable and $\check f$ is its inverse Fourier transform as defined in
283Ab, then
$$\int \check f \times \hat h = \int f \times h^{\wedge\vee} = \int f \times h$$
for every rapidly decreasing test function $h$, so $\check f$ ‘represents the inverse Fourier transform of $f$’ in the sense given here.

284J Lemma Let $f$ be any tempered function and $h$ a rapidly decreasing test function. Then $f * h$, defined by the
formula
$$(f * h)(y) = \int_{-\infty}^{\infty} f(t)h(y - t)\,dt,$$
is defined everywhere.

proof Take any $y \in \mathbb{R}$. By 284Bb, $t \mapsto h(y - t)$ is a rapidly decreasing test function, so the integral is always defined
in $\mathbb{C}$, by 284F.

284K Proposition Let $f$ and $g$ be tempered functions such that $g$ represents the Fourier transform of $f$, and $h$ a
rapidly decreasing test function.
(a) The Fourier transform of the integrable function $f \times h$ is $\frac{1}{\sqrt{2\pi}}\,g * \hat h$, where $g * \hat h$ is the convolution of $g$ and $\hat h$.
(b) The Fourier transform of the continuous function $f * h$ is represented by the product $\sqrt{2\pi}\,g \times \hat h$.

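proof (a) Of course $f \times h$ is integrable, by 284F, while $g * \hat h$ is defined everywhere, by 284C and 284J.
Fix $y \in \mathbb{R}$. Set $\phi(x) = \hat h(y - x)$ for $x \in \mathbb{R}$; then $\phi$ is a rapidly decreasing test function because $\hat h$ is (284Bb). Now
$$\begin{aligned}
\hat\phi(t) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-itx}\,\hat h(y - x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-it(y-x)}\,\hat h(x)\,dx \\
&= \frac{1}{\sqrt{2\pi}}\,e^{-ity} \int_{-\infty}^{\infty} e^{itx}\,\hat h(x)\,dx = e^{-ity}\,h^{\wedge\vee}(t) = e^{-ity}h(t),
\end{aligned}$$
using 284C. Accordingly
$$\begin{aligned}
(f \times h)^{\wedge}(y) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ity} f(t)h(t)\,dt \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\hat\phi(t)\,dt = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\phi(t)\,dt \\
&\qquad\text{(because $g$ represents the Fourier transform of $f$)} \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\hat h(y - t)\,dt = \frac{1}{\sqrt{2\pi}}\,(g * \hat h)(y).
\end{aligned}$$
As $y$ is arbitrary, $\frac{1}{\sqrt{2\pi}}\,g * \hat h$ is the Fourier transform of $f \times h$.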

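(b) Write $\psi$ for the Fourier transform of $g \times \hat h$, $\overleftrightarrow{f}(x) = f(-x)$ when this is defined, and $\overleftrightarrow{h}(x) = h(-x)$ for every
$x$, so that $\overleftrightarrow{f}$ represents the Fourier transform of $g$, by 284Ie, and $\overleftrightarrow{h}$ is the Fourier transform of $\hat h$. By (a), we have
$\psi = \frac{1}{\sqrt{2\pi}}\,\overleftrightarrow{f} * \overleftrightarrow{h}$. This means that the inverse Fourier transform of $\sqrt{2\pi}\,g \times \hat h$ must be
$\sqrt{2\pi}\,\overleftrightarrow{\psi} = (\overleftrightarrow{f} * \overleftrightarrow{h})^{\leftrightarrow}$; and as
$$\begin{aligned}
(\overleftrightarrow{f} * \overleftrightarrow{h})^{\leftrightarrow}(y) &= (\overleftrightarrow{f} * \overleftrightarrow{h})(-y) \\
&= \int_{-\infty}^{\infty} \overleftrightarrow{f}(t)\,\overleftrightarrow{h}(-y - t)\,dt \\
&= \int_{-\infty}^{\infty} f(-t)h(y + t)\,dt \\
&= \int_{-\infty}^{\infty} f(t)h(y - t)\,dt = (f * h)(y),
\end{aligned}$$
the inverse Fourier transform of $\sqrt{2\pi}\,g \times \hat h$ is $f * h$ (which is therefore continuous), and $\sqrt{2\pi}\,g \times \hat h$ must represent the
Fourier transform of $f * h$.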
Remark Compare 283M. It is typical of the theory of Fourier transforms that we have formulae valid in a wide variety
of contexts, each requiring a different interpretation and a different proof.

284L We are now ready for a result corresponding to 282H. I use a different method, or at least a different
arrangement of the ideas, through the following fact, which is important in other ways.
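Proposition Let $f$ be any tempered function. Writing $\psi_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$ for $x \in \mathbb{R}$ and $\sigma > 0$, then
$$\lim_{\sigma\downarrow 0} (f * \psi_\sigma)(x) = c$$
whenever $x \in \mathbb{R}$ and $c \in \mathbb{C}$ are such that
$$\lim_{\delta\downarrow 0} \frac{1}{\delta} \int_0^\delta |f(x + t) + f(x - t) - 2c|\,dt = 0.$$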

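proof (a) By 284Bf, every $\psi_\sigma$ is a rapidly decreasing test function, so that $f * \psi_\sigma$ is defined everywhere, by 284J. We
need to know that $\int_{-\infty}^{\infty} \psi_\sigma = 1$; this is because (substituting $u = x/\sigma$)
$$\int_{-\infty}^{\infty} \psi_\sigma = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2}\,du = 1,$$
by 263G. The argument now follows the lines of 282H. Set
$$\phi(t) = |f(x + t) + f(x - t) - 2c|$$
when this is defined, which is almost everywhere, and $\Phi(t) = \int_0^t \phi$, defined for all $t \ge 0$ because $f$ is integrable over
every bounded interval (284Eb). We have
$$\begin{aligned}
|(f * \psi_\sigma)(x) - c| &= \Bigl|\int_{-\infty}^{\infty} f(x - t)\psi_\sigma(t)\,dt - c \int_{-\infty}^{\infty} \psi_\sigma(t)\,dt\Bigr| \\
&= \Bigl|\int_{-\infty}^{0} (f(x - t) - c)\psi_\sigma(t)\,dt + \int_0^{\infty} (f(x - t) - c)\psi_\sigma(t)\,dt\Bigr| \\
&= \Bigl|\int_0^{\infty} (f(x + t) - c)\psi_\sigma(t)\,dt + \int_0^{\infty} (f(x - t) - c)\psi_\sigma(t)\,dt\Bigr| \\
&\qquad\text{(because $\psi_\sigma$ is an even function)} \\
&= \Bigl|\int_0^{\infty} (f(x + t) + f(x - t) - 2c)\psi_\sigma(t)\,dt\Bigr| \\
&\le \int_0^{\infty} |f(x + t) + f(x - t) - 2c|\,\psi_\sigma(t)\,dt = \int_0^{\infty} \phi \times \psi_\sigma.
\end{aligned}$$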

(b) I should explain why this last integral is finite. Because $f$ is a tempered function, so are the functions $t \mapsto f(x + t)$,
$t \mapsto f(x - t)$ (284Ec); of course constant functions are tempered, so $t \mapsto \phi(t) = |f(x + t) + f(x - t) - 2c|$ is tempered,
and because $\psi_\sigma$ is a rapidly decreasing test function we may apply 284F to see that the product is integrable.
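(c) Let $\epsilon > 0$. By hypothesis, $\lim_{t\downarrow 0} \Phi(t)/t = 0$; let $\delta > 0$ be such that $\Phi(t) \le \epsilon t$ for every $t \in [0, \delta]$. Take any
$\sigma \in {]0, \delta]}$. I break the integral $\int_0^\infty \phi \times \psi_\sigma$ up into three parts.
(i) For the integral from $0$ to $\sigma$, we have
$$\int_0^\sigma \phi \times \psi_\sigma \le \frac{1}{\sigma\sqrt{2\pi}} \int_0^\sigma \phi = \frac{1}{\sigma\sqrt{2\pi}}\,\Phi(\sigma) \le \frac{\epsilon\sigma}{\sigma\sqrt{2\pi}} \le \epsilon,$$
because $\psi_\sigma(t) \le \frac{1}{\sigma\sqrt{2\pi}}$ for every $t$.

(ii) For the integral from $\sigma$ to $\delta$, we have
$$\begin{aligned}
\int_\sigma^\delta \phi \times \psi_\sigma &\le \frac{1}{\sigma\sqrt{2\pi}} \int_\sigma^\delta \phi(t)\,\frac{2\sigma^2}{t^2}\,dt \\
&\qquad\text{(because $e^{-t^2/2\sigma^2} = 1/e^{t^2/2\sigma^2} \le 1/(t^2/2\sigma^2) = 2\sigma^2/t^2$ for every $t \ne 0$)} \\
&= \sigma\sqrt{\frac{2}{\pi}} \int_\sigma^\delta \frac{\phi(t)}{t^2}\,dt = \sigma\sqrt{\frac{2}{\pi}}\,\Bigl(\frac{\Phi(\delta)}{\delta^2} - \frac{\Phi(\sigma)}{\sigma^2} + \int_\sigma^\delta \frac{2\Phi(t)}{t^3}\,dt\Bigr) \\
&\qquad\text{(integrating by parts; see 225F)} \\
&\le \sigma\Bigl(\frac{\epsilon}{\delta} + \int_\sigma^\delta \frac{2\epsilon}{t^2}\,dt\Bigr) \\
&\qquad\text{(because $\Phi(t) \le \epsilon t$ for $0 \le t \le \delta$ and $\sqrt{2/\pi} \le 1$)} \\
&\le \sigma\Bigl(\frac{\epsilon}{\delta} + \frac{2\epsilon}{\sigma}\Bigr) \le 3\epsilon.
\end{aligned}$$

(iii) For the integral from $\delta$ to $\infty$, we have
$$\int_\delta^\infty \phi \times \psi_\sigma = \frac{1}{\sqrt{2\pi}} \int_\delta^\infty \phi(t)\,\frac{e^{-t^2/2\sigma^2}}{\sigma}\,dt.$$
Now for any $t \ge \delta$,
$$\sigma \mapsto \frac{1}{\sigma}\,e^{-t^2/2\sigma^2} : {]0, \delta]} \to \mathbb{R}$$
is monotonically increasing, because its derivative
$$\frac{d}{d\sigma}\,\frac{1}{\sigma}\,e^{-t^2/2\sigma^2} = \frac{1}{\sigma^2}\Bigl(\frac{t^2}{\sigma^2} - 1\Bigr)e^{-t^2/2\sigma^2}$$
is positive, and
$$\lim_{\sigma\downarrow 0} \frac{1}{\sigma}\,e^{-t^2/2\sigma^2} = \lim_{a\to\infty} a e^{-a^2t^2/2} = 0.$$
So we may apply Lebesgue’s Dominated Convergence Theorem to see that
$$\lim_{n\to\infty} \int_\delta^\infty \phi(t)\,\frac{e^{-t^2/2\sigma_n^2}}{\sigma_n}\,dt = 0$$
whenever $\langle\sigma_n\rangle_{n\in\mathbb{N}}$ is a sequence in $]0, \delta]$ converging to $0$, so that
$$\lim_{\sigma\downarrow 0} \int_\delta^\infty \phi(t)\,\frac{e^{-t^2/2\sigma^2}}{\sigma}\,dt = 0.$$
There must therefore be a $\sigma_0 \in {]0, \delta]}$ such that
$$\int_\delta^\infty \phi \times \psi_\sigma \le \epsilon$$
for every $\sigma \le \sigma_0$.
Putting these together, we see that
$$|(f * \psi_\sigma)(x) - c| \le \int_0^\infty \phi \times \psi_\sigma \le \epsilon + 3\epsilon + \epsilon = 5\epsilon$$
whenever $0 < \sigma \le \sigma_0$. As $\epsilon$ is arbitrary, $\lim_{\sigma\downarrow 0} (f * \psi_\sigma)(x) = c$, as claimed.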
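
A simple case in which the limit of 284L can be checked by hand is $f = \chi[a, b]$: then $(f * \psi_\sigma)(x) = \int_a^b \psi_\sigma(x - t)\,dt$ has a closed form in terms of the error function, and the hypothesis of 284L holds with $c = \frac{1}{2}$ at the endpoints, $c = 1$ inside the interval and $c = 0$ outside. The following Python sketch is an illustration under these assumptions (the function name and the choice of interval are made up for the example), not part of the proof.

```python
import math

def smoothed_indicator(x, sigma, a=0.0, b=1.0):
    """(f * psi_sigma)(x) for f the indicator of [a, b], in closed form via erf.

    Uses the identity int_a^b psi_sigma(x - t) dt
        = (erf((x - a) / (sigma*sqrt(2))) - erf((x - b) / (sigma*sqrt(2)))) / 2.
    """
    scale = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf((x - a) / scale) - math.erf((x - b) / scale))

if __name__ == "__main__":
    # Columns: sigma, value at the jump x = 0, value at the interior point x = 0.5.
    for sigma in (1.0, 0.1, 0.01):
        print(sigma, smoothed_indicator(0.0, sigma), smoothed_indicator(0.5, sigma))
```

At the jump the smoothed values settle at the average of the one-sided limits, which is the behaviour 284M(a-ii) and 284M(b-ii) describe.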

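284M Theorem Let $f$ and $g$ be tempered functions such that $g$ represents the Fourier transform of $f$. Then
(a)(i) $g(y) = \lim_{\epsilon\downarrow 0} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} e^{-\epsilon x^2} f(x)\,dx$ for almost every $y \in \mathbb{R}$.
(ii) If $y \in \mathbb{R}$ is such that $a = \lim_{t\in\operatorname{dom} g,\,t\uparrow y} g(t)$ and $b = \lim_{t\in\operatorname{dom} g,\,t\downarrow y} g(t)$ are both defined in $\mathbb{C}$, then
$$\lim_{\epsilon\downarrow 0} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} e^{-\epsilon x^2} f(x)\,dx = \frac{1}{2}(a + b).$$
(b)(i) $f(x) = \lim_{\epsilon\downarrow 0} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ixy} e^{-\epsilon y^2} g(y)\,dy$ for almost every $x \in \mathbb{R}$.
(ii) If $x \in \mathbb{R}$ is such that $a = \lim_{t\in\operatorname{dom} f,\,t\uparrow x} f(t)$ and $b = \lim_{t\in\operatorname{dom} f,\,t\downarrow x} f(t)$ are both defined in $\mathbb{C}$, then
$$\lim_{\epsilon\downarrow 0} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ixy} e^{-\epsilon y^2} g(y)\,dy = \frac{1}{2}(a + b).$$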

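proof (a)(i) By 223D,
$$\lim_{\delta\downarrow 0} \frac{1}{2\delta} \int_{-\delta}^{\delta} |g(y + t) - g(y)|\,dt = 0$$
for almost every $y \in \mathbb{R}$, because $g$ is integrable over any bounded interval. Fix any such $y$. Set
$\phi(t) = |g(y + t) + g(y - t) - 2g(y)|$ whenever this is defined. Then, as in 282Ia,
$$\int_0^\delta \phi \le \int_{-\delta}^{\delta} |g(y + t) - g(y)|\,dt,$$
so $\lim_{\delta\downarrow 0} \frac{1}{\delta} \int_0^\delta \phi = 0$. Consequently, by 284L,
$$g(y) = \lim_{\sigma\to\infty} (g * \psi_{1/\sigma})(y).$$
We know from 283N that the Fourier transform of $\psi_\sigma$ is $\frac{1}{\sigma}\psi_{1/\sigma}$ for any $\sigma > 0$. Accordingly, by 284K, $g * \psi_{1/\sigma}$ is the
Fourier transform of $\sigma\sqrt{2\pi}\,f \times \psi_\sigma$, that is,
$$(g * \psi_{1/\sigma})(y) = \int_{-\infty}^{\infty} e^{-iyx}\,\sigma\psi_\sigma(x) f(x)\,dx.$$
So
$$\begin{aligned}
g(y) &= \lim_{\sigma\to\infty} \int_{-\infty}^{\infty} e^{-iyx}\,\sigma\psi_\sigma(x) f(x)\,dx \\
&= \lim_{\sigma\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} e^{-x^2/2\sigma^2} f(x)\,dx \\
&= \lim_{\epsilon\downarrow 0} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} e^{-\epsilon x^2} f(x)\,dx.
\end{aligned}$$
And this is true for almost every $y$.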


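(ii) Again, setting $c = \frac{1}{2}(a + b)$, $\phi(t) = |g(y + t) + g(y - t) - 2c|$ whenever this is defined, we have
$\lim_{t\in\operatorname{dom}\phi,\,t\downarrow 0} \phi(t) = 0$, so of course $\lim_{\delta\downarrow 0} \frac{1}{\delta} \int_0^\delta \phi = 0$, and
$$c = \lim_{\sigma\to\infty} (g * \psi_{1/\sigma})(y) = \lim_{\epsilon\downarrow 0} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} e^{-\epsilon x^2} f(x)\,dx$$
as before.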

(b) This can be shown by similar arguments; or it may be actually deduced from (a), by observing that
$x \mapsto \overleftrightarrow{f}(x) = f(-x)$ represents the Fourier transform of $g$ (see 284Id), and applying (a) to $g$ and $\overleftrightarrow{f}$.

284N L2 spaces We are now ready for results corresponding to 282J-282K.


Lemma Let $\mathcal{L}^2_{\mathbb{C}}$ be the space of square-integrable complex-valued functions on $\mathbb{R}$, and $S$ the space of rapidly decreasing
test functions. Then for every $f \in \mathcal{L}^2_{\mathbb{C}}$ and $\epsilon > 0$ there is an $h \in S$ such that $\|f - h\|_2 \le \epsilon$.
proof Set $\phi(x) = e^{-1/x}$ for $x > 0$, zero for $x \le 0$; recall from the proof of 284G that $\phi$ is smooth. For any $a < b$, the
functions
$$x \mapsto \psi_n(x) = \phi(n(x - a))\,\phi(n(b - x))$$
provide a sequence of test functions converging to $\chi\,]a, b[$ from below, so (as in 284G)
$$\inf_{h\in S} \|\chi\,]a, b[ - h\|_2^2 \le \lim_{n\to\infty} \int_a^b |1 - \psi_n|^2 = 0.$$
Because $S$ is a linear space (284Ba), it follows that for every step-function $g$ with bounded support and every $\epsilon > 0$
there is an $h \in S$ such that $\|g - h\|_2 \le \frac{1}{2}\epsilon$. But we know from 244H that for every $f \in \mathcal{L}^2_{\mathbb{C}}$ and $\epsilon > 0$ there is a
step-function $g$ with bounded support such that $\|f - g\|_2 \le \frac{1}{2}\epsilon$; so there must be an $h \in S$ such that
$$\|f - h\|_2 \le \|f - g\|_2 + \|g - h\|_2 \le \epsilon.$$
As $f$ and $\epsilon$ are arbitrary, we have the result.

284O Theorem (a) Let $f$ be any complex-valued function which is square-integrable over $\mathbb{R}$. Then $f$ is a tempered
function and its Fourier transform is represented by another square-integrable function $g$, and $\|g\|_2 = \|f\|_2$.
(b) If $f_1$ and $f_2$ are complex-valued functions, square-integrable over $\mathbb{R}$, with Fourier transforms represented by
functions $g_1$, $g_2$, then
$$\int_{-\infty}^{\infty} f_1 \times \bar f_2 = \int_{-\infty}^{\infty} g_1 \times \bar g_2.$$
(c) If $f_1$ and $f_2$ are complex-valued functions, square-integrable over $\mathbb{R}$, with Fourier transforms represented by
functions $g_1$, $g_2$, then the integrable function $f_1 \times f_2$ has Fourier transform $\frac{1}{\sqrt{2\pi}}\,g_1 * g_2$.
(d) If $f_1$ and $f_2$ are complex-valued functions, square-integrable over $\mathbb{R}$, with Fourier transforms represented by
functions $g_1$, $g_2$, then $\sqrt{2\pi}\,g_1 \times g_2$ represents the Fourier transform of the continuous function $f_1 * f_2$.
proof (a)(i) Consider first the case in which $f$ is a rapidly decreasing test function and $g$ is its Fourier transform; we
know that $g$ is also a rapidly decreasing test function, and that $f$ is the inverse Fourier transform of $g$ (284C). Now
the complex conjugate $\bar g$ of $g$ is given by the formula
$$\bar g(y) = \overline{\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f(x)\,dx} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{iyx}\,\overline{f(x)}\,dx,$$
so that $\bar g$ is the inverse Fourier transform of $\bar f$. Accordingly
$$\int f \times \bar f = \int \check g \times \bar f = \int g \times (\bar f)^{\vee} = \int g \times \bar g,$$
using 283O for the middle equality.
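(ii) Now suppose that $f \in \mathcal{L}^2_{\mathbb{C}}$. I said that $f$ is a tempered function; this is simply because
$$\int_{-\infty}^{\infty} \Bigl(\frac{1}{1+|x|}\Bigr)^2 dx < \infty,$$
so
$$\int_{-\infty}^{\infty} \frac{|f(x)|}{1+|x|}\,dx < \infty$$
(244Eb). By 284N, there is a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ of rapidly decreasing test functions such that $\lim_{n\to\infty} \|f - f_n\|_2 = 0$.
By (i),
$$\lim_{m,n\to\infty} \|\hat f_m - \hat f_n\|_2 = \lim_{m,n\to\infty} \|f_m - f_n\|_2 = 0,$$
and the sequence $\langle \hat f_n^{\bullet}\rangle_{n\in\mathbb{N}}$ of equivalence classes is a Cauchy sequence in $L^2_{\mathbb{C}}$. Because $L^2_{\mathbb{C}}$ is complete (244G), $\langle \hat f_n^{\bullet}\rangle_{n\in\mathbb{N}}$
has a limit in $L^2_{\mathbb{C}}$, which is representable as $g^{\bullet}$ for some $g \in \mathcal{L}^2_{\mathbb{C}}$. Like $f$, $g$ must be a tempered function. Of course
$$\|g\|_2 = \lim_{n\to\infty} \|\hat f_n\|_2 = \lim_{n\to\infty} \|f_n\|_2 = \|f\|_2.$$
Now if $h$ is any rapidly decreasing test function, $h \in \mathcal{L}^2_{\mathbb{C}}$ (284Bc), so we shall have
$$\int g \times h = \lim_{n\to\infty} \int \hat f_n \times h = \lim_{n\to\infty} \int f_n \times \hat h = \int f \times \hat h.$$
So $g$ represents the Fourier transform of $f$.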
(b) Of course any functions representing the Fourier transforms of $f_1$ and $f_2$ must be equal almost everywhere to
square-integrable functions, and therefore square-integrable, with the right norms. It follows as in 282K (part (d) of
the proof) that if $g_1$, $g_2$ represent the Fourier transforms of $f_1$, $f_2$, so that $ag_1 + bg_2$ represents the Fourier transform
of $af_1 + bf_2$ and $\|ag_1 + bg_2\|_2 = \|af_1 + bf_2\|_2$ for all $a$, $b \in \mathbb{C}$, we must have
$$\int f_1 \times \bar f_2 = (f_1|f_2) = (g_1|g_2) = \int g_1 \times \bar g_2.$$

(c) Of course f1 × f2 is integrable because it is the product of two square-integrable functions (244E).
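(i) Let $y \in \mathbb{R}$ and set $f(x) = \overline{f_2(x)}\,e^{iyx}$ for $x \in \mathbb{R}$. Then $f \in \mathcal{L}^2_{\mathbb{C}}$. We need to know that the Fourier transform of
$f$ is represented by $g$, where $g(u) = \overline{g_2(y - u)}$. P Let $h$ be a rapidly decreasing test function. Then
$$\begin{aligned}
\int g \times h &= \int \overline{g_2(y - u)}\,h(u)\,du = \int \overline{g_2(u)}\,h(y - u)\,du \\
&= \int \bar g_2 \times h_1 = \int \bar f_2 \times \overline{(\bar h_1)^{\wedge}},
\end{aligned}$$
where $h_1(u) = h(y - u)$. To compute $\overline{(\bar h_1)^{\wedge}}$, we have
$$\begin{aligned}
\overline{(\bar h_1)^{\wedge}(v)} &= \overline{\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ivu}\,\overline{h_1(u)}\,du} = \overline{\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ivu}\,\overline{h(y - u)}\,du} \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ivu} h(y - u)\,du = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{iv(y-u)} h(u)\,du = e^{ivy}\hat h(v).
\end{aligned}$$
So
$$\begin{aligned}
\int g \times h &= \int \bar f_2 \times \overline{(\bar h_1)^{\wedge}} = \int \overline{f_2(v)}\;\overline{(\bar h_1)^{\wedge}(v)}\,dv \\
&= \int \overline{f_2(v)}\,e^{ivy}\,\hat h(v)\,dv = \int f \times \hat h:
\end{aligned}$$
as $h$ is arbitrary, $g$ represents the Fourier transform of $f$. Q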
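(ii) We now have
$$\begin{aligned}
(f_1 \times f_2)^{\wedge}(y) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f_1(x) f_2(x)\,dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f_1 \times \bar f = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g_1 \times \bar g \\
&\qquad\text{(using part (b))} \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g_1(u) g_2(y - u)\,du = \frac{1}{\sqrt{2\pi}}\,(g_1 * g_2)(y).
\end{aligned}$$
As $y$ is arbitrary, $(f_1 \times f_2)^{\wedge} = \frac{1}{\sqrt{2\pi}}\,g_1 * g_2$, as claimed.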



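(d) By (c), the Fourier transform of $\sqrt{2\pi}\,g_1 \times g_2$ is $\overleftrightarrow{f_1} * \overleftrightarrow{f_2}$, writing $\overleftrightarrow{f_1}(x) = f_1(-x)$, so that $\overleftrightarrow{f_1}$ represents the Fourier
transform of $g_1$. So the inverse Fourier transform of $\sqrt{2\pi}\,g_1 \times g_2$ is $(\overleftrightarrow{f_1} * \overleftrightarrow{f_2})^{\leftrightarrow}$. But, just as in the proof of 284Kb,
$(\overleftrightarrow{f_1} * \overleftrightarrow{f_2})^{\leftrightarrow} = f_1 * f_2$, so $f_1 * f_2$ is the inverse Fourier transform of $\sqrt{2\pi}\,g_1 \times g_2$, and $\sqrt{2\pi}\,g_1 \times g_2$ represents the Fourier
transform of $f_1 * f_2$, as claimed. Also $f_1 * f_2$, being the Fourier transform of an integrable function, is continuous
(283Cf; see also 255K).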

284P Corollary Writing $L^2_{\mathbb{C}}$ for the Hilbert space of equivalence classes of square-integrable complex-valued functions
on $\mathbb{R}$, we have a linear isometry $T : L^2_{\mathbb{C}} \to L^2_{\mathbb{C}}$ given by saying that $T(f^{\bullet}) = g^{\bullet}$ whenever $f$, $g \in \mathcal{L}^2_{\mathbb{C}}$ and $g$
represents the Fourier transform of $f$.

284Q Remarks (a) 284P corresponds, of course, to 282K, where the similar isometry between $\ell^2_{\mathbb{C}}(\mathbb{Z})$ and $L^2_{\mathbb{C}}(]-\pi, \pi])$
is described. In that case there was a marked asymmetry which is absent from the present situation; because the relevant
measure on $\mathbb{Z}$, counting measure, gives non-zero mass to every point, members of $\ell^2_{\mathbb{C}}$ are true functions, and it is not
surprising that we have a straightforward formula for $S(f^{\bullet}) \in \ell^2_{\mathbb{C}}$ for every $f \in \mathcal{L}^2_{\mathbb{C}}(]-\pi, \pi])$. The difficulty of describing
$S^{-1} : \ell^2_{\mathbb{C}}(\mathbb{Z}) \to L^2_{\mathbb{C}}(]-\pi, \pi])$ is very similar to the difficulty of describing $T : L^2_{\mathbb{C}}(\mathbb{R}) \to L^2_{\mathbb{C}}(\mathbb{R})$ and its inverse. 284Yg and
286U-286V show just how close this similarity is.

(b) I have spelt out parts (c) and (d) of 284O in detail, perhaps in unnecessary detail, because they give me an
opportunity to insist on the difference between ‘$\sqrt{2\pi}\,g_1 \times g_2$ represents the Fourier transform of $f_1 * f_2$’ and ‘$\frac{1}{\sqrt{2\pi}}\,g_1 * g_2$
is the Fourier transform of $f_1 \times f_2$’. The actual functions $g_1$ and $g_2$ are not well-defined by the hypothesis that they
represent the Fourier transforms of $f_1$ and $f_2$, though their equivalence classes $g_1^{\bullet}$, $g_2^{\bullet} \in L^2_{\mathbb{C}}$ are. So the product $g_1 \times g_2$
is also not uniquely defined as a function, though its equivalence class $(g_1 \times g_2)^{\bullet} = g_1^{\bullet} \times g_2^{\bullet}$ is well-defined as a member
of $L^1_{\mathbb{C}}$. However the continuous function $g_1 * g_2$ is unaffected by changes to $g_1$ and $g_2$ on negligible sets, so is well defined
as a function; and since $f_1 \times f_2$ is integrable, and has a true Fourier transform, it is to be expected that $(f_1 \times f_2)^{\wedge}$
should be exactly equal to $\frac{1}{\sqrt{2\pi}}\,g_1 * g_2$.

(c) Of course 284Oc-284Od also exhibit a characteristic feature of arguments involving Fourier transforms, the
extension by continuity of relations valid for test functions.

(d) 284Oa is a version of Plancherel’s theorem. The formula $\|\hat f\|_2 = \|f\|_2$ is Parseval’s identity.
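
Parseval’s identity can be checked numerically on the Gaussians $\psi_\sigma$ of 284L, whose Fourier transforms are $\frac{1}{\sigma}\psi_{1/\sigma}$ (283N, as quoted in the proof of 284M). The following Python sketch is an illustration only; the helper names, the truncation radius and the midpoint rule are assumptions made for the example.

```python
import math

def psi(sigma, x):
    """Gaussian psi_sigma(x) = exp(-x^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def l2_norm(func, radius=60.0, steps=40000):
    """Midpoint-rule approximation to the L^2 norm of a real function on [-radius, radius]."""
    width = 2.0 * radius / steps
    total = sum(func(-radius + (i + 0.5) * width) ** 2 for i in range(steps))
    return math.sqrt(total * width)

sigma = 2.0
# Norm of f = psi_sigma.
norm_f = l2_norm(lambda x: psi(sigma, x))
# Norm of its Fourier transform, taken to be (1/sigma) * psi_{1/sigma}.
norm_fhat = l2_norm(lambda y: psi(1.0 / sigma, y) / sigma)

if __name__ == "__main__":
    print(norm_f, norm_fhat)
```

Both norms should agree with the exact value, since $\|\psi_\sigma\|_2^2 = 1/(2\sigma\sqrt{\pi})$ by direct calculation.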

284R Dirac’s delta function Consider the tempered function χR with constant value 1. In what sense, if any,
can we assign a Fourier transform to χR?
If we examine $\int \chi\mathbb{R} \times \hat h$, as suggested in 284H, we get
$$\int_{-\infty}^{\infty} \chi\mathbb{R} \times \hat h = \int_{-\infty}^{\infty} \hat h = \sqrt{2\pi}\,h^{\wedge\vee}(0) = \sqrt{2\pi}\,h(0)$$
for every rapidly decreasing test function $h$. Of course there is no function $g$ such that $\int g \times h = \sqrt{2\pi}\,h(0)$ for every
rapidly decreasing test function $h$, since (using the arguments of 284G) we should have to have $\int_a^b g = \sqrt{2\pi}$ whenever
$a < 0 < b$, so that the indefinite integral of $g$ could not be continuous at $0$. However there is a measure on $\mathbb{R}$ with
exactly the right property, the Dirac measure $\delta_0$ concentrated at $0$; this is a Radon probability measure (257Xa), and
$\int h\,d\delta_0 = h(0)$ for every function $h$ defined at $0$. So we shall have
$$\int_{-\infty}^{\infty} \chi\mathbb{R} \times \hat h = \sqrt{2\pi} \int h\,d\delta_0$$
for every rapidly decreasing test function $h$, and we can reasonably say that the measure $\nu = \sqrt{2\pi}\,\delta_0$ ‘represents the
Fourier transform of $\chi\mathbb{R}$’.
We note with pleasure at this point that
$$\frac{1}{\sqrt{2\pi}} \int e^{ixy}\,\nu(dy) = 1$$
for every $x \in \mathbb{R}$, so that $\chi\mathbb{R}$ can be called the inverse Fourier transform of $\nu$.
If we look at the formulae of Theorem 284M, we get ideas consistent with this pairing of χR with ν. We have
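$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} e^{-\epsilon x^2} \chi\mathbb{R}(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} e^{-\epsilon x^2}\,dx = \frac{1}{\sqrt{2\epsilon}}\,e^{-y^2/4\epsilon}$$
for every $y \in \mathbb{R}$, using 283N with $\sigma = 1/\sqrt{2\epsilon}$. So
$$\lim_{\epsilon\downarrow 0} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} e^{-\epsilon x^2} \chi\mathbb{R}(x)\,dx = 0$$
for every $y \ne 0$, and the Fourier transform of $\chi\mathbb{R}$ should be zero everywhere except at $0$. On the other hand, the
functions $y \mapsto \frac{1}{\sqrt{2\epsilon}}\,e^{-y^2/4\epsilon}$ all have integral $\sqrt{2\pi}$, concentrated more and more closely about $0$ as $\epsilon$ decreases to $0$, so
also point us directly to $\nu$, the measure which gives mass $\sqrt{2\pi}$ to $0$.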
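
The two properties just described, constant total mass $\sqrt{2\pi}$ and concentration at $0$, can be observed numerically. The following Python sketch is illustrative only; the function names, the truncation radius and the midpoint rule are assumptions made for the example.

```python
import math

def approx_transform(y, eps):
    """(1/sqrt(2 eps)) * exp(-y^2 / (4 eps)), the regularized transform of the constant 1."""
    return math.exp(-y * y / (4.0 * eps)) / math.sqrt(2.0 * eps)

def total_mass(eps, radius=50.0, steps=100000):
    """Midpoint-rule approximation to the integral of approx_transform(., eps) over the line."""
    width = 2.0 * radius / steps
    return width * sum(
        approx_transform(-radius + (i + 0.5) * width, eps) for i in range(steps)
    )

if __name__ == "__main__":
    # Columns: eps, total mass, value at y = 0, value at y = 1.
    for eps in (1.0, 0.1, 0.01):
        print(eps, total_mass(eps), approx_transform(0.0, eps), approx_transform(1.0, eps))
```

As $\epsilon$ decreases the value at $0$ grows without bound while the value at any fixed $y \ne 0$ tends to $0$, and the total mass stays at $\sqrt{2\pi}$.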
Thus allowing measures, as well as functions, enables us to extend the notion of Fourier transform. Of course we
can go very much farther than this. If $h$ is any rapidly decreasing test function, then
$$\int_{-\infty}^{\infty} x\hat h(x)\,dx = -i\sqrt{2\pi}\,h'(0),$$
so that the identity function $x \mapsto x$ can be assigned, as a Fourier transform, the operator $h \mapsto -i\sqrt{2\pi}\,h'(0)$.
At this point we are entering the true theory of (Schwartzian) distributions or ‘generalized functions’, and I had
better stop. The ‘Dirac delta function’ is most naturally regarded as the measure $\delta_0$ above; alternatively, as $\frac{1}{\sqrt{2\pi}}(\chi\mathbb{R})^{\wedge}$.

284W The multidimensional case As in §283, I give exercises designed to point the way to the r-dimensional
generalization.

(a) A rapidly decreasing test function on $\mathbb{R}^r$ is a function $h : \mathbb{R}^r \to \mathbb{C}$ such that (i) $h$ is smooth, that is, all
repeated partial derivatives
$$\frac{\partial^m h}{\partial\xi_{j_1} \ldots \partial\xi_{j_m}}$$
are defined and continuous everywhere in $\mathbb{R}^r$ (ii)
$$\sup_{x\in\mathbb{R}^r} \|x\|^k |h(x)| < \infty, \qquad \sup_{x\in\mathbb{R}^r} \|x\|^k \Bigl|\frac{\partial^m h}{\partial\xi_{j_1} \ldots \partial\xi_{j_m}}(x)\Bigr| < \infty$$
for every $k \in \mathbb{N}$, $j_1, \ldots, j_m \le r$. A tempered function on $\mathbb{R}^r$ is a measurable complex-valued function $f$, defined
almost everywhere in $\mathbb{R}^r$, such that, for some $k \in \mathbb{N}$,
$$\int_{\mathbb{R}^r} \frac{1}{1+\|x\|^k}\,|f(x)|\,dx < \infty.$$
Show that if $f$ is a tempered function on $\mathbb{R}^r$ and $h$ is a rapidly decreasing test function on $\mathbb{R}^r$ then $f \times h$ is integrable.
(b) Show that if $h$ is a rapidly decreasing test function on $\mathbb{R}^r$ so is $\hat h$, and that in this case $h^{\wedge\vee} = h$.
(c) Show that if $f$ is a tempered function on $\mathbb{R}^r$ and $\int f \times h = 0$ for every rapidly decreasing test function $h$ on $\mathbb{R}^r$,
then $f = 0$ a.e.
(d) If $f$ and $g$ are tempered functions on $\mathbb{R}^r$, I say that $g$ represents the Fourier transform of $f$ if $\int g \times h = \int f \times \hat h$
for every rapidly decreasing test function $h$ on $\mathbb{R}^r$. Show that if $f$ is integrable then $\hat f$ represents the Fourier transform
of $f$ in this sense.
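(e) Let $f$ be any tempered function on $\mathbb{R}^r$. Writing $\psi_\sigma(x) = \frac{1}{(\sigma\sqrt{2\pi})^r}\,e^{-x\cdot x/2\sigma^2}$ for $x \in \mathbb{R}^r$, show that
$\lim_{\sigma\downarrow 0} (f * \psi_\sigma)(x) = c$ whenever $x \in \mathbb{R}^r$, $c \in \mathbb{C}$ are such that $\lim_{\delta\downarrow 0} \frac{1}{\delta^r} \int_{B(x,\delta)} |f(t) - c|\,dt = 0$,
writing $B(x, \delta) = \{t : \|t - x\| \le \delta\}$.

(f) Let $f$ and $g$ be tempered functions on $\mathbb{R}^r$ such that $g$ represents the Fourier transform of $f$, and $h$ a rapidly
decreasing test function. Show that (i) the Fourier transform of $f \times h$ is $\frac{1}{(\sqrt{2\pi})^r}\,g * \hat h$ (ii) $(\sqrt{2\pi})^r g \times \hat h$ represents the
Fourier transform of $f * h$.
(g) Let $f$ and $g$ be tempered functions on $\mathbb{R}^r$ such that $g$ represents the Fourier transform of $f$. Show that
$$g(y) = \lim_{\epsilon\downarrow 0} \frac{1}{(\sqrt{2\pi})^r} \int_{\mathbb{R}^r} e^{-iy\cdot x}\,e^{-\epsilon x\cdot x} f(x)\,dx$$
for almost every $y \in \mathbb{R}^r$.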
(h) Show that for any square-integrable complex-valued function $f$ on $\mathbb{R}^r$ and any $\epsilon > 0$ there is a rapidly decreasing
test function $h$ such that $\|f - h\|_2 \le \epsilon$.
(i) Let $\mathcal{L}^2_{\mathbb{C}}$ be the space of square-integrable complex-valued functions on $\mathbb{R}^r$. Show that
(i) for every $f \in \mathcal{L}^2_{\mathbb{C}}$ there is a $g \in \mathcal{L}^2_{\mathbb{C}}$ which represents the Fourier transform of $f$, and in this case $\|g\|_2 = \|f\|_2$;
(ii) if $g_1$, $g_2 \in \mathcal{L}^2_{\mathbb{C}}$ represent the Fourier transforms of $f_1$, $f_2 \in \mathcal{L}^2_{\mathbb{C}}$, then $\frac{1}{(\sqrt{2\pi})^r}\,g_1 * g_2$ is the Fourier transform of
$f_1 \times f_2$, and $(\sqrt{2\pi})^r g_1 \times g_2$ represents the Fourier transform of $f_1 * f_2$.

(j) Let $T$ be an invertible real $r \times r$ matrix, regarded as a linear operator from $\mathbb{R}^r$ to itself. (i) Show that
$\hat f = |\det T|\,(f \circ T)^{\wedge} \circ T'$ for every integrable complex-valued function $f$ on $\mathbb{R}^r$. (ii) Show that $h \circ T$ is a rapidly decreasing test
function for every rapidly decreasing test function $h$. (iii) Show that if $f$, $g$ are tempered functions and $g$ represents
the Fourier transform of $f$, then $\frac{1}{|\det T|}\,g \circ (T')^{-1}$ represents the Fourier transform of $f \circ T$; so that if $T$ is orthogonal,
then $g \circ T$ represents the Fourier transform of $f \circ T$.

284X Basic exercises (a) Show that if g and h are rapidly decreasing test functions, so is g × h.
(b) Show that there are non-zero continuous integrable functions f , g : R → C such that f ∗ g = 0 everywhere.
(Hint: take them to be Fourier transforms of suitable test functions.)
(c) Suppose that $f : \mathbb{R} \to \mathbb{C}$ is a differentiable function such that its derivative $f'$ is a tempered function and, for
some $k \in \mathbb{N}$,
$$\lim_{x\to\infty} x^{-k} f(x) = \lim_{x\to-\infty} x^{-k} f(x) = 0.$$
(i) Show that $\int f \times h' = -\int f' \times h$ for every rapidly decreasing test function $h$. (ii) Show that if $g$ is a tempered
function representing the Fourier transform of $f$, then $y \mapsto iyg(y)$ represents the Fourier transform of $f'$.

(d) Show that if $h$ is a rapidly decreasing test function and $f$ is any measurable complex-valued function, defined
almost everywhere in $\mathbb{R}$, such that $\int_{-\infty}^{\infty} |x|^k |f(x)|\,dx < \infty$ for every $k \in \mathbb{N}$, then the convolution $f * h$ is a rapidly
decreasing test function. (Hint: show that the Fourier transform of $f * h$ is a test function.)
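> (e) Let $f$ be a tempered function such that $\lim_{a\to\infty} \int_{-a}^{a} f$ exists in $\mathbb{C}$. Show that this limit is also equal to
$\lim_{\epsilon\downarrow 0} \int_{-\infty}^{\infty} e^{-\epsilon x^2} f(x)\,dx$. (Hint: set $g(x) = f(x) + f(-x)$. Use 224J to show that if $0 \le a \le b$ then
$|\int_a^b g(x)e^{-\epsilon x^2}dx| \le \sup_{c\in[a,b]} |\int_a^c g|$, so that $\lim_{a\to\infty} \int_0^a g(x)e^{-\epsilon x^2}dx$ exists uniformly in $\epsilon$, while
$\lim_{\epsilon\downarrow 0} \int_0^a g(x)e^{-\epsilon x^2}dx = \int_0^a g$ for every $a \ge 0$.)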

> (f) Let $f$ and $g$ be tempered functions on $\mathbb{R}$ such that $g$ represents the Fourier transform of $f$. Show that
$$g(y) = \lim_{a\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-iyx} f(x)\,dx$$
at almost all points $y$ for which the limit exists. (Hint: 284Xe, 284M.)
> (g) Let $f$ be an integrable complex-valued function on $\mathbb{R}$ such that $\hat f$ also is integrable. Show that $f^{\wedge\vee} = f$ at any
point at which $f$ is continuous.

(h) Show that for every $p \in [1, \infty[$, $f \in \mathcal{L}^p_{\mathbb{C}}$ and $\epsilon > 0$ there is a rapidly decreasing test function $h$ such that
$\|f - h\|_p \le \epsilon$.

> (i) Let $f$ and $g$ be square-integrable complex-valued functions on $\mathbb{R}$ such that $g$ represents the Fourier transform
of $f$. Show that
$$\int_c^d f = \frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{icy} - e^{idy}}{y}\,g(y)\,dy$$
whenever $c < d$ in $\mathbb{R}$.
(j) Let $f$ be a measurable complex-valued function, defined almost everywhere in $\mathbb{R}$, such that $\int |f|^p < \infty$, where
$1 < p \le 2$. Show that $f$ is a tempered function and that there is a tempered function $g$ representing the Fourier
transform of $f$. (Hint: express $f$ as $f_1 + f_2$, where $f_1$ is integrable and $f_2$ is square-integrable.) (Remark Defining
$\|f\|_p$, $\|g\|_q$ as in 244D, where $q = p/(p - 1)$, we have $\|g\|_q \le (2\pi)^{(p-2)/2p}\|f\|_p$; see Zygmund 59, XVI.3.2.)

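(k) Let $f$, $g$ be square-integrable complex-valued functions on $\mathbb{R}$ such that $g$ represents the Fourier transform of $f$.
(i) Show that
$$\frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{ixy} g(y)\,dy = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin at}{t}\,f(x - t)\,dt$$
for every $x \in \mathbb{R}$, $a > 0$. (Hint: find the inverse Fourier transform of $y \mapsto e^{-ixy}\chi[-a, a](y)$, and use 284Ob.)
(ii) Show that if $f(x) = 0$ for $x \in {]c, d[}$ then
$$\frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixy} g(y)\,dy = 0$$
for $x \in {]c, d[}$.
(iii) Show that if $f$ is differentiable at $x \in \mathbb{R}$, then
$$\frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixy} g(y)\,dy = f(x).$$
(iv) Show that if $f$ has bounded variation over some interval properly containing $x$, then
$$\frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixy} g(y)\,dy = \frac{1}{2}\Bigl(\lim_{t\in\operatorname{dom} f,\,t\uparrow x} f(t) + \lim_{t\in\operatorname{dom} f,\,t\downarrow x} f(t)\Bigr).$$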


(l) Let f be an integrable complex function on R. Show that if f is square-integrable, so is f .

(m) Let $f_1$, $f_2$ be square-integrable complex-valued functions on $\mathbb{R}$ with Fourier transforms represented by $g_1$, $g_2$.
Show that $\int_{-\infty}^{\infty} f_1(t) f_2(-t)\,dt = \int_{-\infty}^{\infty} g_1(t) g_2(t)\,dt$.

(n) Suppose $x \in \mathbb{R}$. Write $\delta_x$ for Dirac measure on $\mathbb{R}$ concentrated at $x$. Describe a sense in which $\sqrt{2\pi}\,\delta_x$ can be
regarded as the Fourier transform of the function $t \mapsto e^{ixt}$.

(o) For any tempered function $f$ and $x \in \mathbb{R}$, let $\delta_x$ be the Dirac measure on $\mathbb{R}$ concentrated at $x$, and set
$$(\delta_x * f)(u) = \int f(u - t)\,\delta_x(dt) = f(u - x)$$
for every $u$ for which $u - x \in \operatorname{dom} f$ (cf. 257Xe). If $g$ represents the Fourier transform of $f$, find a corresponding
representation of the Fourier transform of $\delta_x * f$, and relate it to the product of $g$ with the Fourier transform of $\delta_x$.

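(p)(i) Show that
$$\lim_{\delta\downarrow 0,\,a\to\infty}\Bigl(\int_{-a}^{-\delta}\frac1x\,e^{-iyx}\,dx + \int_{\delta}^{a}\frac1x\,e^{-iyx}\,dx\Bigr) = -\pi i\,\operatorname{sgn} y$$
for every $y \in \mathbb{R}$, writing $\operatorname{sgn} y = y/|y|$ if $y \ne 0$ and $\operatorname{sgn} 0 = 0$. (Hint: 283Da.)
(ii) Show that
$$\lim_{c\to\infty}\frac1c\int_0^c\int_{-a}^{a} e^{ixy}\operatorname{sgn} y\,dy\,da = \frac{2i}{x}$$
for every $x \ne 0$.
(iii) Show that for any rapidly decreasing test function $h$,
$$\int_0^{\infty}\frac1x\bigl(\hat h(x) - \hat h(-x)\bigr)dx = \lim_{\delta\downarrow 0,\,a\to\infty}\Bigl(\int_{-a}^{-\delta}\frac1x\,\hat h(x)\,dx + \int_{\delta}^{a}\frac1x\,\hat h(x)\,dx\Bigr) = -\frac{i\pi}{\sqrt{2\pi}}\int_{-\infty}^{\infty} h(y)\operatorname{sgn} y\,dy.$$
(iv) Show that for any rapidly decreasing test function $h$,
$$\frac{i\pi}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\hat h(x)\operatorname{sgn} x\,dx = \int_0^{\infty}\frac1y\bigl(h(y) - h(-y)\bigr)dy.$$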

(q) Let $\langle h_n\rangle_{n\in\mathbb{N}}$ be a sequence of rapidly decreasing test functions such that $\phi(f) = \lim_{n\to\infty}\int_{-\infty}^{\infty} h_n \times f$ is defined for
every rapidly decreasing test function $f$. Show that $\lim_{n\to\infty}\int_{-\infty}^{\infty} h_n' \times f$, $\lim_{n\to\infty}\int_{-\infty}^{\infty}\hat h_n \times f$ and $\lim_{n\to\infty}\int_{-\infty}^{\infty}(h_n * g)\times f$
are defined for all rapidly decreasing test functions $f$ and $g$, and are zero if $\phi$ is identically zero. (Hint: 255G will help
with the last.)

284Y Further exercises (a) Let $f$ be an integrable complex-valued function on $]-\pi,\pi]$, and $\tilde f$ its periodic
extension, as in 282Ae. Show that $\tilde f$ is a tempered function. Show that for any rapidly decreasing test function $h$,
$\int \tilde f \times \hat h = \sqrt{2\pi}\sum_{k=-\infty}^{\infty} c_k h(k)$, where $\langle c_k\rangle_{k\in\mathbb{Z}}$ is the sequence of Fourier coefficients of $f$. (Hint: begin with the case
$f(x) = e^{inx}$. Next show that
$$M = \sum_{k=-\infty}^{\infty}|h(k)| + \sum_{k=-\infty}^{\infty}\sup_{x\in[(2k-1)\pi,(2k+1)\pi]}|\hat h(x)| < \infty,$$
and that
$$\Bigl|\int \tilde f \times \hat h - \sqrt{2\pi}\sum_{k=-\infty}^{\infty} c_k h(k)\Bigr| \le M\|f\|_1.$$
Finally apply 282Ib.)

(b) Let f be a complex-valued function, defined almost everywhere in R, such that f × h is integrable for every
rapidly decreasing test function h. Show that f is tempered.

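(c) Let $f$ and $g$ be tempered functions on $\mathbb{R}$ such that $g$ represents the Fourier transform of $f$. Show that
$$\int_c^d f = \frac{i}{\sqrt{2\pi}}\lim_{\sigma\to\infty}\int_{-\infty}^{\infty}\frac{e^{icy}-e^{idy}}{y}\,e^{-y^2/2\sigma^2}g(y)\,dy$$
whenever $c \le d$ in $\mathbb{R}$. (Hint: set $\theta = \chi[c,d]$. Show that both sides are $\lim_{\sigma\to\infty}\int f \times (\theta * \psi_{1/\sigma})$, defining $\psi_\sigma$ as in 283N.)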

(d) Show that if $g : \mathbb{R} \to \mathbb{R}$ is an odd function of bounded variation such that $\int_1^{\infty}\frac1x\,g(x)\,dx = \infty$, then $g$ does not
represent the Fourier transform of any tempered function. (Hint: 283Yd, 284Yc.)

(e) Let $S$ be the space of rapidly decreasing test functions. For $k, m \in \mathbb{N}$ set $\tau_{km}(h) = \sup_{x\in\mathbb{R}}|x|^k|h^{(m)}(x)|$ for every
$h \in S$, writing $h^{(m)}$ for the $m$th derivative of $h$ as usual. (i) Show that each $\tau_{km}$ is a seminorm and that $S$ is complete
and separable for the metrizable linear space topology $\mathfrak{T}$ they define. (ii) Show that $h \mapsto \hat h : S \to S$ is continuous for $\mathfrak{T}$.
(iii) Show that if $f$ is any tempered function, then $h \mapsto \int f \times h$ is $\mathfrak{T}$-continuous. (iv) Show that if $f$ is an integrable
function such that $\int |x^k f(x)|\,dx < \infty$ for every $k \in \mathbb{N}$, then $h \mapsto f * h : S \to S$ is $\mathfrak{T}$-continuous.

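(f) Show that if $f$ is a tempered function on $\mathbb{R}$ and
$$\gamma = \lim_{c\to\infty}\frac1c\int_0^c\int_{-a}^{a} f(x)\,dx\,da$$
is defined in $\mathbb{C}$, then $\gamma$ is also
$$\lim_{\epsilon\downarrow 0}\int_{-\infty}^{\infty} f(x)e^{-\epsilon|x|}\,dx.$$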

(g) Let $f$, $g$ be square-integrable complex-valued functions on $\mathbb{R}$ such that $g$ represents the Fourier transform of $f$.
Suppose that $m \in \mathbb{Z}$ and that $(2m-1)\pi < x < (2m+1)\pi$. Set $\tilde f(t) = f(t + 2m\pi)$ for those $t \in \,]-\pi,\pi]$ such that
$t + 2m\pi \in \operatorname{dom} f$. Let $\langle c_k\rangle_{k\in\mathbb{Z}}$ be the sequence of Fourier coefficients of $\tilde f$. Show that
$$\frac{1}{\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^{a} e^{ixy}g(y)\,dy = \lim_{n\to\infty}\sum_{k=-n}^{n} c_k e^{ikx}$$
in the sense that if one limit exists in $\mathbb{C}$ so does the other, and they are then equal. (Hint: 284Xk(i), 282Da.)

(h) Show that if $f$ is integrable over $\mathbb{R}$ and there is some $M \ge 0$ such that $f(x) = \hat f(x) = 0$ for $|x| \ge M$, then $f = 0$
a.e. (Hint: reduce to the case $M = \pi$. Looking at the Fourier series of $f{\restriction}\,]-\pi,\pi]$, show that $f$ is expressible in the
form $f(x) = \sum_{k=-m}^{m} c_k e^{ikx}$ for almost every $x \in \,]-\pi,\pi]$. Now compute $\hat f(2n + \frac12)$ for large $n$.)

(i) Let $\nu$ be a Radon measure on $\mathbb{R}$ (definition: 256A) which is ‘tempered’ in the sense that $\int_{-\infty}^{\infty}\frac{1}{1+|x|^k}\,\nu(dx)$ is finite
for some $k \in \mathbb{N}$. (i) Show that every rapidly decreasing test function is $\nu$-integrable. (ii) Show that if $\nu$ has bounded
support (definition: 256Xf), and $h$ is a rapidly decreasing test function, then $\nu * h$ is a rapidly decreasing test function,
where $(\nu * h)(x) = \int_{-\infty}^{\infty} h(x-y)\,\nu(dy)$ for $x \in \mathbb{R}$. (iii) Show that there is a sequence $\langle h_n\rangle_{n\in\mathbb{N}}$ of rapidly decreasing test
functions such that $\lim_{n\to\infty}\int_{-\infty}^{\infty} h_n \times f = \int_{-\infty}^{\infty} f\,d\nu$ for every rapidly decreasing test function $f$.

(j) Let φ : S → R be a functional defined by the formula of 284Xq. Show that φ is continuous for the topology of
284Ye. (Note: it helps to know a little more about metrizable linear topological spaces than is covered in §2A5.)

284 Notes and comments Yet again I must warn you that the material above gives a very restricted view of the
subject. I have tried to indicate how the theory of Fourier transforms of ‘good’ functions – here taken to be the
rapidly decreasing test functions – may be extended, through a kind of duality, to a very much wider class of functions,
the ‘tempered functions’. Evidently, writing $S$ for the linear space of rapidly decreasing test functions, we can seek
to investigate a Fourier transform of any linear functional $\phi : S \to \mathbb{C}$, writing $\hat\phi(h) = \phi(\hat h)$ for any $h \in S$. (It is
actually commoner at this point to restrict attention to functionals φ which are continuous for the standard topology
on S, described in 284Ye; these are called tempered distributions.) By 284F-284G, we can identify some of these
functionals with equivalence classes of tempered functions, and then set out to investigate those tempered functions
whose Fourier transforms can again be represented by tempered functions.
I suppose the structure of the theory of Fourier transforms is best laid out through the formulae involved. Our aim
is to set up pairs $(f,g) = (f,\hat f) = (\check g, g)$ in such a way that we have
Inversion: $(\hat h)^{\vee} = (\check h)^{\wedge} = h$;
Reversal: $\check h(y) = \hat h(-y)$;
Linearity: $(h_1 + h_2)^{\wedge} = \hat h_1 + \hat h_2$, $(ch)^{\wedge} = c\hat h$;
Differentiation: $(h')^{\wedge}(y) = iy\,\hat h(y)$;
Shift: if $h_1(x) = h(x+c)$ then $\hat h_1(y) = e^{iyc}\hat h(y)$;
Modulation: if $h_1(x) = e^{icx}h(x)$ then $\hat h_1(y) = \hat h(y-c)$;
Symmetry: if $h_1(x) = h(-x)$ then $\hat h_1(y) = \hat h(-y)$;
Complex Conjugate: $(\bar h)^{\wedge}(y) = \overline{\hat h(-y)}$;
Dilation: if $h_1(x) = h(cx)$, where $c > 0$, then $\hat h_1(y) = \frac1c\,\hat h(\frac yc)$;
Convolution: $(h_1 * h_2)^{\wedge} = \sqrt{2\pi}\,\hat h_1 \times \hat h_2$, $(h_1 \times h_2)^{\wedge} = \frac{1}{\sqrt{2\pi}}\,\hat h_1 * \hat h_2$;
Duality: $\int_{-\infty}^{\infty} h_1 \times \hat h_2 = \int_{-\infty}^{\infty}\hat h_1 \times h_2$;
Parseval: $\int_{-\infty}^{\infty} h_1 \times \bar h_2 = \int_{-\infty}^{\infty}\hat h_1 \times \overline{\hat h_2}$;
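and, of course,
$$\hat h(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-iyx}h(x)\,dx,$$
$$\int_c^d \hat h(y)\,dy = \frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{e^{-idy}-e^{-icy}}{y}\,h(y)\,dy.$$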

(I have used the letter $h$ in the list above to suggest what is in fact the case, that all the formulae here are valid
for rapidly decreasing test functions.) On top of all this, it is often important that the operation $h \mapsto \hat h$ should be
continuous in some sense.
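A few of the formulae in the list can be checked numerically. The following Python sketch is only an illustration, not part of the theory: it approximates $\hat h$ by the trapezoid rule on a truncated interval (the names `fourier` and `gauss`, the truncation width and the step count are all choices made here) and compares both sides of the Shift and Dilation rules for $h(x) = e^{-x^2/2}$, which is its own Fourier transform under the normalization used in this chapter.

```python
import cmath
import math

def fourier(h, y, half_width=12.0, steps=4000):
    """Approximate (1/sqrt(2*pi)) * integral of exp(-i*y*x) * h(x) dx.

    Assumption: h is negligible outside [-half_width, half_width], so a
    trapezoid rule on that interval is adequate.
    """
    dx = 2.0 * half_width / steps
    total = 0j
    for n in range(steps + 1):
        x = -half_width + n * dx
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * y * x) * h(x)
    return total * dx / math.sqrt(2.0 * math.pi)

def gauss(x):
    """h(x) = exp(-x^2/2), a rapidly decreasing test function."""
    return math.exp(-x * x / 2.0)

c, y = 1.5, 0.7

# Shift: if h1(x) = h(x + c) then h1^(y) = exp(i*y*c) * h^(y).
shift_lhs = fourier(lambda x: gauss(x + c), y)
shift_rhs = cmath.exp(1j * y * c) * fourier(gauss, y)

# Dilation: if h1(x) = h(c*x), c > 0, then h1^(y) = (1/c) * h^(y/c).
dilation_lhs = fourier(lambda x: gauss(c * x), y)
dilation_rhs = fourier(gauss, y / c) / c

print(abs(shift_lhs - shift_rhs), abs(dilation_lhs - dilation_rhs))
```

Both printed differences should be at the level of rounding error; the same helper can be pointed at other rules in the list by changing the two functions being compared.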
The challenge of the ‘pure’ theory of Fourier transforms is to find the widest possible variety of objects $h$ for which
the formulae above will be valid, subject to appropriate interpretations of $^{\wedge}$, $*$ and $\int_{-\infty}^{\infty}$. I must of course remark here
that from the very beginnings, the subject has been enriched by its applications in other parts of mathematics, the
physical sciences and the social sciences, and that again and again these have suggested further possible pairs $(f,\hat f)$,
making new demands on our power to interpret the rules we seek to follow. Even the theory of distributions does not
seem to give a full canonical account of what can be done. First, there are great difficulties in interpreting the ‘product’
of two arbitrary distributions, making several of the formulae above problematic; and second, it is not obvious that
only one kind of distribution need be considered. In this section I have looked at just one space of ‘test functions’, the
space S of rapidly decreasing test functions; but at least two others are significant, the space D of smooth functions
with bounded support and the space Z of Fourier transforms of functions in D. The advantage of starting with S is
that it gives a symmetric theory, since $\hat h \in S$ for every $h \in S$; but it is easy to find objects (e.g., the function $x \mapsto e^{x^2}$,
or the function $x \mapsto 1/|x|$) which cannot be interpreted as functionals on S, so that their Fourier transforms must be
investigated by other methods, if at all. In 284Xp I sketch some of the arguments which can be used to justify the
assertion that the Fourier transform of the function $x \mapsto 1/x$ is, or can be represented by, the function $y \mapsto -i\sqrt{\pi/2}\,\operatorname{sgn} y$;
the general principle in this case being that we approach both 0 and ∞ symmetrically. For a variety of such matching
pairs, established by arguments based on the idea in 284Xq, see Lighthill 59, chap. 3.
Accordingly it seems that, after nearly two centuries, we must still proceed by carefully examining particular classes
of function, and checking appropriate interpretations of the formulae. In the work above I have repeatedly used the
concepts
$$\lim_{a\to\infty}\int_{-a}^{a} f, \qquad \lim_{\epsilon\downarrow 0}\int_{-\infty}^{\infty} e^{-\epsilon x^2}f(x)\,dx$$
as alternative interpretations of $\int_{-\infty}^{\infty} f$. (Of course they are closely related; see 284Xe.) The reasons for using the
particular kernel $e^{-\epsilon x^2}$ are that it belongs to S, it is an even function, its Fourier transform is calculable and easy to
manipulate, and it is associated with the normal probability density function $\frac{1}{\sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}$, so that any miscellaneous
facts we gather have a chance of being valuable elsewhere. But there are applications in which alternative kernels are
more manageable – e.g., $e^{-\epsilon|x|}$ (283Xr, 283Yb, 284Yf).
One of the guiding principles here is that purely formal manipulations, along the lines of those in the list above, and
(especially) changes in the order of integration, with other exchanges of limit, again and again give rise to formulae
which, suitably interpreted, are valid. First courses in analysis are often inhibitory; students are taught to distrust any
manipulation which they cannot justify. To my own eye, the delight of this subject lies chiefly in the variety of the
arguments demanded by a rigorous approach, the ground constantly shifting with the context; but there is no doubt
that cheerful sanguinity is often the best guide to the manipulations which it will be right to try to justify.
This being a book on measure theory, I am of course particularly interested in the possibility of a measure appearing
as a Fourier transform. This is what happens if we seek the Fourier transform of the constant function χR (284R).
More generally, any periodic tempered function f with period 2π can be assigned a Fourier transform which is a ‘signed
measure’ (for our present purposes, a complex linear combination of measures) concentrated on Z, the mass at each
k ∈ Z being determined by the corresponding Fourier coefficient of f ↾ ]−π, π] (284Xn, 284Ya). In the next section I will
go farther in this direction, with particular reference to probability distributions on $\mathbb{R}^r$. But the reason why positive
measures have not forced themselves on our attention so far is that we do not expect to get a positive function as a
Fourier transform unless some very special conditions are satisfied, as in 283Yb.
As in §282, I have used the Hilbert space structure of $L^2_{\mathbb{C}}$ as the basis of the discussion of Fourier transforms of
functions in $\mathcal{L}^2_{\mathbb{C}}$ (284O-284P). But as with Fourier series, Carleson’s theorem (286U) provides a more direct description.

285 Characteristic functions


I come now to one of the most effective applications of Fourier transforms, the use of ‘characteristic functions’ to
analyse probability distributions. It turns out not only that the Fourier transform of a probability distribution deter-
mines the distribution (285M) but that many of the things we want to know about a distribution are easily calculated
from its transform (285G, 285Xf). Even more strikingly, pointwise convergence of Fourier transforms corresponds (for
sequences) to convergence for the vague topology in the space of distributions, so they provide a new and extremely
powerful method for proving such results as the Central Limit Theorem and Poisson’s theorem (285Q).
As the applications of the ideas here mostly belong to probability theory, I return to probabilists’ terminology, as in
Chapter 27. There will nevertheless be many points at which it is appropriate to speak of integrals, and there will often
be more than one measure in play; so I should say directly that an integral $\int f(x)\,dx$ will always be with respect to
Lebesgue measure (usually, but not always, one-dimensional), as in the rest of the chapter, while integrals with respect
to other measures will be expressed in the forms $\int f\,d\nu$ or $\int f(x)\,\nu(dx)$.

285A Definition (a) Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$ (256A). Then the characteristic function of
$\nu$ is the function $\phi_\nu : \mathbb{R}^r \to \mathbb{C}$ given by the formula
$$\phi_\nu(y) = \int e^{iy\cdot x}\,\nu(dx)$$
for every $y \in \mathbb{R}^r$, writing $y\cdot x = \eta_1\xi_1 + \ldots + \eta_r\xi_r$ if $y = (\eta_1,\ldots,\eta_r)$, $x = (\xi_1,\ldots,\xi_r)$.

(b) Let $X_1,\ldots,X_r$ be real-valued random variables on the same probability space. The characteristic function
of $X = (X_1,\ldots,X_r)$ is the characteristic function $\phi_X = \phi_{\nu_X}$ of their joint probability distribution $\nu_X$ as defined in
271C.

285B Remarks (a) By one of the ordinary accidents of history, the definitions of ‘characteristic function’ and
‘Fourier transform’ have evolved independently. In 283Ba I remarked that the definition of the Fourier transform
remains unfixed, and that the formulae
$$\hat f(y) = \int_{-\infty}^{\infty} e^{iyx}f(x)\,dx,$$
$$\check f(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iyx}f(x)\,dx$$
are sometimes used. On the other hand, I think that nearly all authors agree on the definition of the characteristic
function as given above. You may feel therefore that I should have followed their lead, and chosen the definition of
Fourier transform which best matches the definition of characteristic function. I did not do so largely because I wished
to emphasise the symmetry between the Fourier transform and the inverse Fourier transform, and the correspondence
between Fourier transforms and Fourier series. The principal advantage of matching the definitions up would be to
make the constants in such theorems as 283F, 285Xh the same, and would be balanced by the need to remember
different constants for $\hat f$, $\check f$ in such results as 283M.

(b) A secondary reason for not trying too hard to make the formulae of this section match directly those of §§283-284
is that the r-dimensional case is at the heart of some of the most important applications of characteristic functions, so
that it seems right to introduce it from the beginning; and consequently the formulae of this section will necessarily
have new features compared with those in the body of the work so far.

285C Of course there is a direct way to describe the characteristic function of a family (X1 , . . . , Xr ) of random
variables, as follows.
Proposition Let $X_1,\ldots,X_r$ be real-valued random variables on the same probability space $(\Omega,\Sigma,\mu)$, and $\nu_X$ their
joint distribution. Then their characteristic function $\phi_{\nu_X}$ is given by
$$\phi_{\nu_X}(y) = E(e^{iy\cdot X}) = E(e^{i\eta_1X_1}e^{i\eta_2X_2}\cdots e^{i\eta_rX_r})$$
for every $y = (\eta_1,\ldots,\eta_r) \in \mathbb{R}^r$.
proof Apply 271E to the functions $h_1, h_2 : \mathbb{R}^r \to \mathbb{R}$ defined by
$$h_1(x) = \cos(y\cdot x), \quad h_2(x) = \sin(y\cdot x),$$
to see that
$$\phi_{\nu_X}(y) = \int h_1(x)\,\nu_X(dx) + i\int h_2(x)\,\nu_X(dx) = E(h_1(X)) + iE(h_2(X)) = E(e^{iy\cdot X}).$$

285D I ought to spell out the correspondence between Fourier transforms, as defined in 283A, and characteristic
functions.
Proposition Let $\nu$ be a Radon probability measure on $\mathbb{R}$. Write
$$\hat\nu(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-iyx}\,\nu(dx)$$
for every $y \in \mathbb{R}$, and $\phi_\nu$ for the characteristic function of $\nu$.
(a) $\hat\nu(y) = \frac{1}{\sqrt{2\pi}}\,\phi_\nu(-y)$ for every $y \in \mathbb{R}$.
(b) For any Lebesgue integrable complex-valued function $h$ defined almost everywhere in $\mathbb{R}$,
$$\int_{-\infty}^{\infty}\hat\nu(y)h(y)\,dy = \int_{-\infty}^{\infty}\hat h(x)\,\nu(dx).$$
(c) For any rapidly decreasing test function $h$ on $\mathbb{R}$ (see §284),
$$\int_{-\infty}^{\infty} h(x)\,\nu(dx) = \int_{-\infty}^{\infty}\check h(y)\hat\nu(y)\,dy.$$
(d) If $\nu$ is an indefinite-integral measure over Lebesgue measure, with Radon-Nikodým derivative $f$, then $\hat\nu$ is the
Fourier transform of $f$.

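proof (a) This is immediate from the definitions of $\phi_\nu$ and $\hat\nu$.
(b) Because
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}|h(y)|\,\nu(dx)\,dy = \int_{-\infty}^{\infty}|h(y)|\,dy < \infty,$$
we may change the order of integration to see that
$$\int_{-\infty}^{\infty}\hat\nu(y)h(y)\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-iyx}h(y)\,\nu(dx)\,dy$$
$$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-iyx}h(y)\,dy\,\nu(dx) = \int_{-\infty}^{\infty}\hat h(x)\,\nu(dx).$$
(c) This follows immediately from (b), because $\check h$ is integrable and $(\check h)^{\wedge} = h$ (284C).
(d) The point is just that
$$\int h\,d\nu = \int h(x)f(x)\,dx$$
for every bounded Borel measurable $h : \mathbb{R} \to \mathbb{R}$ (235K), and therefore for the functions $x \mapsto e^{-iyx} : \mathbb{R} \to \mathbb{C}$. Now
$$\hat\nu(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-iyx}\,\nu(dx) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-iyx}f(x)\,dx = \hat f(y)$$
for every $y$.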

285E Lemma Let $X$ be a normal random variable with expectation $a$ and variance $\sigma^2$, where $\sigma > 0$. Then the
characteristic function of $X$ is given by
$$\phi(y) = e^{iya}e^{-\sigma^2y^2/2}.$$

proof This is just 283N with the constants changed. We have
$$\phi(y) = E(e^{iyX}) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{iyx}e^{-(x-a)^2/2\sigma^2}\,dx$$
(taking the density function for $X$ given in 274Ad, and applying 271Ic)
$$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{iy(\sigma t+a)}e^{-t^2/2}\,dt$$
(substituting $x = \sigma t + a$)
$$= e^{iya}\sqrt{2\pi}\,\hat\psi_1(-y\sigma)$$
(setting $\psi_1(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, as in 283N)
$$= e^{iya}e^{-\sigma^2y^2/2}.$$
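The closed form in 285E can be compared with a direct numerical integration of $e^{iyx}$ against the normal density. The Python sketch below is illustrative only; the function names, the truncation to ten standard deviations and the step count are assumptions made for the example.

```python
import cmath
import math

def normal_char_numeric(y, a, sigma, width=10.0, steps=8000):
    """Trapezoid approximation of E(exp(i*y*X)) for X normal(a, sigma^2).

    Assumption: the density is negligible beyond `width` standard deviations.
    """
    lo = a - width * sigma
    dx = 2.0 * width * sigma / steps
    total = 0j
    for n in range(steps + 1):
        x = lo + n * dx
        weight = 0.5 if n in (0, steps) else 1.0
        density = math.exp(-(x - a) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * cmath.exp(1j * y * x) * density
    return total * dx

def normal_char_closed(y, a, sigma):
    """The formula of 285E: exp(i*y*a) * exp(-sigma^2 * y^2 / 2)."""
    return cmath.exp(1j * y * a) * math.exp(-sigma ** 2 * y ** 2 / 2.0)

a, sigma, y = 1.0, 2.0, 0.3
print(normal_char_numeric(y, a, sigma), normal_char_closed(y, a, sigma))
```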

285F I now give results corresponding to parts of 283C, with an extra refinement concerning independent random
variables (285I).
Proposition Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$, and $\phi$ its characteristic function.
(a) $\phi(0) = 1$.
(b) $\phi : \mathbb{R}^r \to \mathbb{C}$ is uniformly continuous.
(c) $\phi(-y) = \overline{\phi(y)}$, $|\phi(y)| \le 1$ for every $y \in \mathbb{R}^r$.
(d) If $r = 1$ and $\int |x|\,\nu(dx) < \infty$, then $\phi'(y)$ exists and is equal to $i\int xe^{ixy}\,\nu(dx)$ for every $y \in \mathbb{R}$.
(e) If $r = 1$ and $\int x^2\,\nu(dx) < \infty$, then $\phi''(y)$ exists and is equal to $-\int x^2e^{ixy}\,\nu(dx)$ for every $y \in \mathbb{R}$.
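
proof (a) $\phi(0) = \int \chi\mathbb{R}^r\,\nu(dx) = \nu(\mathbb{R}^r) = 1$.
(b) Let $\epsilon > 0$. Let $M > 0$ be such that
$$\nu\{x : \|x\| \ge M\} \le \epsilon,$$
writing $\|x\| = \sqrt{x\cdot x}$ as usual. Let $\delta > 0$ be such that $|e^{ia} - 1| \le \epsilon$ whenever $|a| \le \delta$. Now suppose that $y, y' \in \mathbb{R}^r$ are
such that $\|y - y'\| \le \delta/M$. Then whenever $\|x\| \le M$,
$$|e^{iy\cdot x} - e^{iy'\cdot x}| = |e^{iy'\cdot x}||e^{i(y-y')\cdot x} - 1| = |e^{i(y-y')\cdot x} - 1| \le \epsilon$$
because
$$|(y - y')\cdot x| \le \|y - y'\|\|x\| \le \delta.$$
Consequently
$$|\phi(y) - \phi(y')| \le \int_{\|x\|\le M}|e^{iy\cdot x} - e^{iy'\cdot x}|\,\nu(dx) + \int_{\|x\|>M}|e^{iy\cdot x}|\,\nu(dx) + \int_{\|x\|>M}|e^{iy'\cdot x}|\,\nu(dx)$$
$$\le \epsilon + \epsilon + \epsilon = 3\epsilon.$$
As $\epsilon$ is arbitrary, $\phi$ is uniformly continuous.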


(c) This is elementary;
$$\phi(-y) = \int e^{-iy\cdot x}\,\nu(dx) = \overline{\int e^{iy\cdot x}\,\nu(dx)} = \overline{\phi(y)},$$
$$|\phi(y)| = \Bigl|\int e^{iy\cdot x}\,\nu(dx)\Bigr| \le \int |e^{iy\cdot x}|\,\nu(dx) = \int \chi\mathbb{R}^r\,\nu(dx) = 1.$$

(d) The point is that $|\frac{\partial}{\partial y}e^{iyx}| = |x|$ for every $x, y \in \mathbb{R}$. So by 123D (applied, strictly speaking, to the real and
imaginary parts of the function)
$$\phi'(y) = \frac{d}{dy}\int e^{iyx}\,\nu(dx) = \int \frac{\partial}{\partial y}e^{iyx}\,\nu(dx) = \int ixe^{iyx}\,\nu(dx).$$

(e) Since we now have $|\frac{\partial}{\partial y}xe^{iyx}| = x^2$ for every $x, y$, we can repeat the argument to get
$$\phi''(y) = i\frac{d}{dy}\int xe^{iyx}\,\nu(dx) = i\int \frac{\partial}{\partial y}xe^{iyx}\,\nu(dx) = -\int x^2e^{iyx}\,\nu(dx).$$

285G Corollary (a) Let X be a real-valued random variable with finite expectation, and φ its characteristic
function. Then φ′ (0) = iE(X).
(b) Let X be a real-valued random variable with finite variance, and φ its characteristic function. Then φ′′ (0) =
−E(X 2 ).

proof We have only to match $X$ to its distribution $\nu$, and say that
‘$X$ has finite expectation’
corresponds to
‘$\int |x|\,\nu(dx) = E(|X|) < \infty$’,
so that
$$\phi'(0) = i\int x\,\nu(dx) = iE(X),$$
and that
‘$X$ has finite variance’
corresponds to
‘$\int x^2\,\nu(dx) = E(X^2) < \infty$’,
so that
$$\phi''(0) = -\int x^2\,\nu(dx) = -E(X^2),$$
as in 271E.
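The two identities of 285G can be illustrated on a small discrete distribution by replacing the derivatives with central finite differences. In the Python sketch below the distribution, the step size `step` and the variable names are arbitrary choices for the example, and the finite differences only approximate $\phi'(0)$ and $\phi''(0)$.

```python
import cmath

# A hypothetical three-point distribution: Pr(X = values[j]) = probs[j].
values = [-1.0, 0.0, 2.0]
probs = [0.2, 0.5, 0.3]

def phi(y):
    """Characteristic function E(exp(i*y*X)) of the distribution above."""
    return sum(p * cmath.exp(1j * y * x) for x, p in zip(values, probs))

step = 1e-3
# Central differences standing in for phi'(0) and phi''(0).
first = (phi(step) - phi(-step)) / (2 * step)
second = (phi(step) - 2 * phi(0.0) + phi(-step)) / step ** 2

mean = sum(p * x for x, p in zip(values, probs))
second_moment = sum(p * x * x for x, p in zip(values, probs))

print(first, 1j * mean)          # phi'(0) against i*E(X)
print(second, -second_moment)    # phi''(0) against -E(X^2)
```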


285H Remark Observe that there is no result corresponding to 283Cg (‘$\lim_{|y|\to\infty}\hat f(y) = 0$’). If $\nu$ is the Dirac
measure on R concentrated at 0, that is, the distribution of a random variable which is zero almost everywhere, then
φ(y) = 1 for every y.

285I Proposition Let $X_1,\ldots,X_n$ be independent real-valued random variables, with characteristic functions
$\phi_1,\ldots,\phi_n$. Let $\phi$ be the characteristic function of their sum $X = X_1 + \ldots + X_n$. Then
$$\phi(y) = \prod_{j=1}^{n}\phi_j(y)$$
for every $y \in \mathbb{R}$.

proof Let $y \in \mathbb{R}$. By 272E, the variables
$$Y_j = e^{iyX_j}$$
are independent, so by 272R
$$\phi(y) = E(e^{iyX}) = E(e^{iy(X_1+\ldots+X_n)}) = E\Bigl(\prod_{j=1}^{n}Y_j\Bigr) = \prod_{j=1}^{n}E(Y_j) = \prod_{j=1}^{n}\phi_j(y),$$
as required.

Remark See also 285R below.
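For finitely supported distributions the product formula of 285I can be checked by brute force. The Python sketch below is an illustration under stated assumptions: three made-up distributions are stored as dictionaries, independence is modelled by giving each joint outcome the product of the marginal probabilities, and the names `char_fn` and `char_fn_of_sum` are invented for the example.

```python
import cmath
import itertools

def char_fn(dist, y):
    """E(exp(i*y*X)) for a finite distribution given as {value: probability}."""
    return sum(p * cmath.exp(1j * y * x) for x, p in dist.items())

def char_fn_of_sum(dists, y):
    """E(exp(i*y*(X1+...+Xn))) computed from the product joint distribution."""
    total = 0j
    for outcome in itertools.product(*(d.items() for d in dists)):
        prob = 1.0
        value = 0.0
        for x, p in outcome:
            prob *= p      # independence: joint probability is the product
            value += x     # value taken by the sum on this outcome
        total += prob * cmath.exp(1j * y * value)
    return total

# Hypothetical marginal distributions of X1, X2, X3.
dists = [{0: 0.7, 1: 0.3}, {-1: 0.5, 2: 0.5}, {0.5: 0.2, 1.5: 0.8}]
y = 0.9

direct = char_fn_of_sum(dists, y)
product = 1 + 0j
for d in dists:
    product *= char_fn(d, y)

print(direct, product)
```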

285J There is an inversion theorem for characteristic functions, corresponding to 283F; I give it in 285Xh, with an
r-dimensional version in 285Yb. However, this does not seem to be as useful as the following group of results.

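Lemma Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$, and $\phi$ its characteristic function. Then for any $j \le r$ and $a > 0$,
$$\nu\{x : |\xi_j| \ge a\} \le 7a\int_0^{1/a}(1 - \operatorname{Re}\phi(te_j))\,dt,$$
where $e_j \in \mathbb{R}^r$ is the $j$th unit vector.

proof We have
$$7a\int_0^{1/a}(1 - \operatorname{Re}\phi(te_j))\,dt = 7a\int_0^{1/a}\Bigl(1 - \operatorname{Re}\int_{\mathbb{R}^r}e^{it\xi_j}\,\nu(dx)\Bigr)dt$$
$$= 7a\int_0^{1/a}\int_{\mathbb{R}^r}1 - \cos(t\xi_j)\,\nu(dx)\,dt$$
$$= 7a\int_{\mathbb{R}^r}\int_0^{1/a}1 - \cos(t\xi_j)\,dt\,\nu(dx)$$
(because $(x,t) \mapsto 1 - \cos(t\xi_j)$ is bounded and $\nu\mathbb{R}^r\cdot\frac1a$ is finite)
$$= 7a\int_{\mathbb{R}^r}\Bigl(\frac1a - \frac{1}{\xi_j}\sin\frac{\xi_j}{a}\Bigr)\nu(dx)$$
$$\ge 7a\int_{|\xi_j|\ge a}\Bigl(\frac1a - \frac{1}{\xi_j}\sin\frac{\xi_j}{a}\Bigr)\nu(dx)$$
(because $\frac1\xi\sin\frac\xi a \le \frac1a$ for every $\xi \ne 0$)
$$\ge \nu\{x : |\xi_j| \ge a\},$$
because
$$\frac{\sin\eta}{\eta} \le \frac{\sin 1}{1} \le \frac67 \text{ if } \eta \ge 1,$$
so
$$a\Bigl(\frac1a - \frac{1}{\xi_j}\sin\frac{\xi_j}{a}\Bigr) \ge \frac17$$
if $|\xi_j| \ge a$.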
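To see how much room the inequality of 285J leaves in a familiar case, one can take $r = 1$ and $\nu$ the standard normal distribution, for which $\phi(t) = e^{-t^2/2}$ by 285E. The following Python sketch is only a numerical illustration; the function names, the midpoint rule and the chosen values of $a$ are assumptions of the example.

```python
import math

def tail_mass(a):
    """nu{x : |x| >= a} for the standard normal distribution."""
    return math.erfc(a / math.sqrt(2.0))

def lemma_bound(a, steps=20000):
    """7a * integral over [0, 1/a] of (1 - Re phi(t)) dt, by the midpoint rule.

    Assumption: phi(t) = exp(-t^2/2), the standard normal characteristic function.
    """
    dt = (1.0 / a) / steps
    total = 0.0
    for n in range(steps):
        t = (n + 0.5) * dt
        total += 1.0 - math.exp(-t * t / 2.0)
    return 7.0 * a * total * dt

checks = [(a, tail_mass(a), lemma_bound(a)) for a in (0.25, 0.5, 1.0, 2.0, 4.0)]
for a, tail, bound in checks:
    print(a, tail, bound)
```

In each row the tail mass should sit below the bound, usually with a good deal to spare; the lemma is valued for its generality, not its sharpness.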

285K Characteristic functions and the vague topology The time has come to return to ideas mentioned
briefly in 274L. Fix $r \ge 1$ and let $P$ be the set of all Radon probability measures on $\mathbb{R}^r$. For any bounded continuous
function $h : \mathbb{R}^r \to \mathbb{R}$, define $\rho_h : P \times P \to \mathbb{R}$ by setting
$$\rho_h(\nu,\nu') = \Bigl|\int h\,d\nu - \int h\,d\nu'\Bigr|$$
for $\nu, \nu' \in P$. Then the vague topology on $P$ is the topology generated by the pseudometrics $\rho_h$ (274Ld).

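285L Theorem Let $\nu$, $\langle\nu_n\rangle_{n\in\mathbb{N}}$ be Radon probability measures on $\mathbb{R}^r$, with characteristic functions $\phi$, $\langle\phi_n\rangle_{n\in\mathbb{N}}$.
Then the following are equiveridical:
(i) $\nu = \lim_{n\to\infty}\nu_n$ for the vague topology;
(ii) $\int h\,d\nu = \lim_{n\to\infty}\int h\,d\nu_n$ for every bounded continuous $h : \mathbb{R}^r \to \mathbb{R}$;
(iii) $\lim_{n\to\infty}\phi_n(y) = \phi(y)$ for every $y \in \mathbb{R}^r$.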
proof (a) The equivalence of (i) and (ii) is virtually the definition of the vague topology; we have
$$\lim_{n\to\infty}\nu_n = \nu \text{ for the vague topology}$$
$$\iff \lim_{n\to\infty}\rho_h(\nu_n,\nu) = 0 \text{ for every bounded continuous } h$$
(2A3Mc)
$$\iff \lim_{n\to\infty}\Bigl|\int h\,d\nu_n - \int h\,d\nu\Bigr| = 0 \text{ for every bounded continuous } h.$$

(b) Next, (ii) obviously implies (iii), because
$$\operatorname{Re}\phi(y) = \int h_y\,d\nu = \lim_{n\to\infty}\int h_y\,d\nu_n = \lim_{n\to\infty}\operatorname{Re}\phi_n(y),$$
setting $h_y(x) = \cos x\cdot y$ for each $x$, and similarly
$$\operatorname{Im}\phi(y) = \lim_{n\to\infty}\operatorname{Im}\phi_n(y)$$
for every $y \in \mathbb{R}^r$.

(c) So we are left to prove that (iii)⇒(ii). I start by showing that, given $\epsilon > 0$, there is a closed bounded set $K$ such
that
$$\nu_n(\mathbb{R}^r \setminus K) \le \epsilon \text{ for every } n \in \mathbb{N}.$$
P We know that $\phi(0) = 1$ and that $\phi$ is continuous at 0 (285Fb). Let $a > 0$ be so large that for every $j \le r$, $|t| \le 1/a$
we have
$$1 - \operatorname{Re}\phi(te_j) \le \frac{\epsilon}{14r},$$
writing $e_j$ for the $j$th unit vector, as in 285J. Then
$$7a\int_0^{1/a}(1 - \operatorname{Re}\phi(te_j))\,dt \le \frac{\epsilon}{2r}$$
for each $j \le r$. By Lebesgue’s Dominated Convergence Theorem (since of course the functions $t \mapsto 1 - \operatorname{Re}\phi_n(te_j)$ are
uniformly bounded on $[0,\frac1a]$), there is an $n_0 \in \mathbb{N}$ such that
$$7a\int_0^{1/a}(1 - \operatorname{Re}\phi_n(te_j))\,dt \le \frac{\epsilon}{r}$$
for every $j \le r$, $n \ge n_0$. But 285J tells us that now
$$\nu_n\{x : |\xi_j| \ge a\} \le \frac{\epsilon}{r}$$
for every $j \le r$, $n \ge n_0$. On the other hand, there is surely a $b \ge a$ such that
$$\nu_n\{x : |\xi_j| \ge b\} \le \frac{\epsilon}{r}$$
for every $j \le r$, $n < n_0$. So, setting $K = \{x : |\xi_j| \le b \text{ for every } j \le r\}$,
$$\nu_n(\mathbb{R}^r \setminus K) \le \epsilon$$
for every $n \in \mathbb{N}$, as required. Q

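(d) Now take any bounded continuous $h : \mathbb{R}^r \to \mathbb{R}$ and $\epsilon > 0$. Set $M = 1 + \sup_{x\in\mathbb{R}^r}|h(x)|$, and let $K$ be a bounded
closed set such that
$$\nu_n(\mathbb{R}^r \setminus K) \le \frac{\epsilon}{M} \text{ for every } n \in \mathbb{N}, \quad \nu(\mathbb{R}^r \setminus K) \le \frac{\epsilon}{M},$$
using (c) just above. By the Stone-Weierstrass theorem (281K) there are $y_0,\ldots,y_m \in \mathbb{Q}^r$ and $c_0,\ldots,c_m \in \mathbb{C}$ such
that
$$|h(x) - g(x)| \le \epsilon \text{ for every } x \in K,$$
$$|g(x)| \le M \text{ for every } x \in \mathbb{R}^r,$$
writing $g(x) = \sum_{k=0}^{m}c_ke^{iy_k\cdot x}$ for $x \in \mathbb{R}^r$. Now
$$\lim_{n\to\infty}\int g\,d\nu_n = \lim_{n\to\infty}\sum_{k=0}^{m}c_k\phi_n(y_k) = \sum_{k=0}^{m}c_k\phi(y_k) = \int g\,d\nu.$$
On the other hand, for every $n \in \mathbb{N}$,
$$\Bigl|\int g\,d\nu_n - \int h\,d\nu_n\Bigr| \le \int_K|g - h|\,d\nu_n + 2M\nu_n(\mathbb{R}^r \setminus K) \le 3\epsilon,$$
and similarly $|\int g\,d\nu - \int h\,d\nu| \le 3\epsilon$. Consequently
$$\limsup_{n\to\infty}\Bigl|\int h\,d\nu_n - \int h\,d\nu\Bigr| \le 6\epsilon.$$
As $\epsilon$ is arbitrary,
$$\lim_{n\to\infty}\int h\,d\nu_n = \int h\,d\nu,$$
and (ii) is true.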



285M Corollary (a) Let $\nu$, $\nu'$ be two Radon probability measures on $\mathbb{R}^r$ with the same characteristic functions.
Then they are equal.
(b) Let $(X_1,\ldots,X_r)$ and $(Y_1,\ldots,Y_r)$ be two families of real-valued random variables. If
$$E(e^{i\eta_1X_1+\ldots+i\eta_rX_r}) = E(e^{i\eta_1Y_1+\ldots+i\eta_rY_r})$$
for all $\eta_1,\ldots,\eta_r \in \mathbb{R}$, then $(X_1,\ldots,X_r)$ has the same joint distribution as $(Y_1,\ldots,Y_r)$.

proof (a) Applying 285L with $\nu_n = \nu'$ for every $n$, we see that $\int h\,d\nu' = \int h\,d\nu$ for every bounded continuous
$h : \mathbb{R}^r \to \mathbb{R}$. By 256D(iv), $\nu = \nu'$.
(b) Apply (a) with ν, ν ′ the two joint distributions.

285N Remarks Probably the most important application of this theorem is to the standard proof of the Central
Limit Theorem. I sketch the ideas in 285Xn and 285Yj-285Ym; details may be found in most serious probability texts;
two on my shelf are Shiryayev 84, §III.4, and Feller 66, §XV.6. However, to get the full strength of Lindeberg’s
version of the Central Limit Theorem we have to work quite hard, and I therefore propose to illustrate the method
with a version of Poisson’s theorem (285Q) instead. I begin with two lemmas which are very frequently used in results
of this kind.

285O Lemma Let $c_0,\ldots,c_n$, $d_0,\ldots,d_n$ be complex numbers of modulus at most 1. Then
$$\Bigl|\prod_{k=0}^{n}c_k - \prod_{k=0}^{n}d_k\Bigr| \le \sum_{k=0}^{n}|c_k - d_k|.$$

proof Induce on $n$. The case $n = 0$ is trivial. For the case $n = 1$ we have
$$|c_0c_1 - d_0d_1| = |c_0(c_1 - d_1) + (c_0 - d_0)d_1|$$
$$\le |c_0||c_1 - d_1| + |c_0 - d_0||d_1| \le |c_1 - d_1| + |c_0 - d_0|,$$
which is what we need. For the inductive step to $n + 1$, we have
$$\Bigl|\prod_{k=0}^{n+1}c_k - \prod_{k=0}^{n+1}d_k\Bigr| \le \Bigl|\prod_{k=0}^{n}c_k - \prod_{k=0}^{n}d_k\Bigr| + |c_{n+1} - d_{n+1}|$$
(by the case just done, because $c_{n+1}$, $d_{n+1}$, $\prod_{k=0}^{n}c_k$ and $\prod_{k=0}^{n}d_k$ all have modulus at most 1)
$$\le \sum_{k=0}^{n}|c_k - d_k| + |c_{n+1} - d_{n+1}|$$
(by the inductive hypothesis)
$$= \sum_{k=0}^{n+1}|c_k - d_k|,$$
so the induction continues.
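The inequality of 285O is easy to try out on randomly chosen points of the closed unit disc. The Python sketch below is illustrative only; the fixed seed, the number of trials and the helper names are assumptions made for the example, and a numerical experiment of this kind is of course no substitute for the induction above.

```python
import cmath
import math
import random

def product(numbers):
    """Product of a list of complex numbers."""
    result = 1 + 0j
    for z in numbers:
        result *= z
    return result

def gap_and_bound(cs, ds):
    """Return (|prod c_k - prod d_k|, sum |c_k - d_k|) as in 285O."""
    gap = abs(product(cs) - product(ds))
    bound = sum(abs(c - d) for c, d in zip(cs, ds))
    return gap, bound

rng = random.Random(0)  # fixed seed so that the run is reproducible

def random_in_unit_disc():
    """A complex number of modulus less than 1, in polar form."""
    return cmath.rect(rng.random(), rng.uniform(0.0, 2.0 * math.pi))

trials = []
for _ in range(200):
    length = rng.randint(1, 8)
    cs = [random_in_unit_disc() for _ in range(length)]
    ds = [random_in_unit_disc() for _ in range(length)]
    trials.append(gap_and_bound(cs, ds))

print(max(gap - bound for gap, bound in trials))  # should not be positive
```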

285P Lemma Let $M$, $\epsilon > 0$. Then there are $\eta > 0$ and $y_0,\ldots,y_n \in \mathbb{R}$ such that whenever $X$, $Z$ are two real-valued
random variables with $E(|X|) \le M$, $E(|Z|) \le M$ and $|\phi_X(y_j) - \phi_Z(y_j)| \le \eta$ for every $j \le n$, then $F_X(a) \le F_Z(a + \epsilon) + \epsilon$
for every $a \in \mathbb{R}$, where I write $\phi_X$ for the characteristic function of $X$ and $F_X$ for the distribution function of $X$.
proof Set $\delta = \frac{\epsilon}{7} > 0$, $b = M/\delta$.

(a) Define $h_0 : \mathbb{R} \to [0,1]$ by setting
$$h_0(x) = 1 \text{ if } x \le 0, \quad h_0(x) = 1 - x/\delta \text{ if } 0 \le x \le \delta, \quad h_0(x) = 0 \text{ if } x \ge \delta.$$
Then $h_0$ is continuous. Let $m$ be the integral part of $b/\delta$, and for $-m \le k \le m + 1$ set $h_k(x) = h_0(x - k\delta)$.
By the Stone-Weierstrass theorem (281K), there are $y_0,\ldots,y_n \in \mathbb{R}$ and $c_0,\ldots,c_n \in \mathbb{C}$ such that, writing $g_0(x) = \sum_{j=0}^{n}c_je^{iy_jx}$,
$$|h_0(x) - g_0(x)| \le \delta \text{ for every } x \in [-b - (m+1)\delta, b + m\delta],$$
$$|g_0(x)| \le 1 \text{ for every } x \in \mathbb{R}.$$
For $-m \le k \le m + 1$, set
$$g_k(x) = g_0(x - k\delta) = \sum_{j=0}^{n}c_je^{-iy_jk\delta}e^{iy_jx}.$$
Set $\eta = \delta/(1 + \sum_{j=0}^{n}|c_j|) > 0$.
(b) Now suppose that $X$, $Z$ are random variables such that $E(|X|) \le M$, $E(|Z|) \le M$ and $|\phi_X(y_j) - \phi_Z(y_j)| \le \eta$
for every $j \le n$. Then for any $k$ we have
$$E(g_k(X)) = E\Bigl(\sum_{j=0}^{n}c_je^{-iy_jk\delta}e^{iy_jX}\Bigr) = \sum_{j=0}^{n}c_je^{-iy_jk\delta}\phi_X(y_j),$$
and similarly
$$E(g_k(Z)) = \sum_{j=0}^{n}c_je^{-iy_jk\delta}\phi_Z(y_j),$$
so
$$|E(g_k(X)) - E(g_k(Z))| \le \sum_{j=0}^{n}|c_j||\phi_X(y_j) - \phi_Z(y_j)| \le \sum_{j=0}^{n}|c_j|\eta \le \delta.$$
Next,
$$|h_k(x) - g_k(x)| \le \delta \text{ for every } x \in [-b - (m+1)\delta + k\delta, b + m\delta + k\delta] \supseteq [-b,b],$$
$$|h_k(x) - g_k(x)| \le 2 \text{ for every } x,$$
$$\Pr(|X| \ge b) \le \frac{M}{b} = \delta,$$
so $E(|h_k(X) - g_k(X)|) \le 3\delta$; and similarly $E(|h_k(Z) - g_k(Z)|) \le 3\delta$. Putting these together,
$$|E(h_k(X)) - E(h_k(Z))| \le 7\delta = \epsilon$$
whenever $-m \le k \le m + 1$.
(c) Now suppose that −b ≤ a ≤ b. Then there is a k such that −m ≤ k ≤ m + 1 and a ≤ kδ ≤ a + δ. Since
χ ]−∞, a] ≤ χ ]−∞, kδ] ≤ hk ≤ χ ]−∞, (k + 1)δ] ≤ χ ]−∞, a + 2δ],
we must have
Pr(X ≤ a) ≤ E(hk (X)),

E(hk (Z)) ≤ Pr(Z ≤ a + 2δ) ≤ Pr(Z ≤ a + ǫ).


But this means that
Pr(X ≤ a) ≤ E(hk (X)) ≤ E(hk (Z)) + ǫ ≤ Pr(Z ≤ a + ǫ) + ǫ
whenever a ∈ [−b, b].
(d) As for the cases $a \ge b$, $a \le -b$, we surely have
$$b(1 - F_Z(b)) = b\Pr(Z \ge b) \le E(|Z|) \le M,$$
so if $a \ge b$ then
$$F_X(a) \le 1 \le F_Z(a) + 1 - F_Z(b) \le F_Z(a) + \frac{M}{b} = F_Z(a) + \delta \le F_Z(a + \epsilon) + \epsilon.$$
Similarly,
$$bF_X(-b) \le E(|X|) \le M,$$
so
$$F_X(a) \le \delta \le F_Z(a + \epsilon) + \epsilon$$
for every $a \le -b$. This completes the proof.

285Q Law of Rare Events: Theorem For any $M \ge 0$ and $\epsilon > 0$ there is a $\delta > 0$ such that whenever $X_0,\ldots,X_n$
are independent $\{0,1\}$-valued random variables with $\Pr(X_k = 1) = p_k \le \delta$ for every $k \le n$, and $\sum_{k=0}^{n}p_k = \lambda \le M$,
and $X = X_0 + \ldots + X_n$, then
$$\Bigl|\Pr(X = m) - \frac{\lambda^m}{m!}e^{-\lambda}\Bigr| \le \epsilon$$
for every $m \in \mathbb{N}$.
proof (a) We should begin by calculating some characteristic functions. First, the characteristic function $\phi_k$ of $X_k$
will be given by
$$\phi_k(y) = (1 - p_k)e^{iy0} + p_ke^{iy1} = 1 + p_k(e^{iy} - 1).$$
Next, if $Z$ is a Poisson random variable with parameter $\lambda$ (that is, if $\Pr(Z = m) = \lambda^me^{-\lambda}/m!$ for every $m \in \mathbb{N}$; all
you need to know at this point about the Poisson distribution is that $\sum_{m=0}^{\infty}\lambda^me^{-\lambda}/m! = 1$), then its characteristic
function $\phi_Z$ is given by
$$\phi_Z(y) = \sum_{m=0}^{\infty}\frac{\lambda^m}{m!}e^{-\lambda}e^{iym} = e^{-\lambda}\sum_{m=0}^{\infty}\frac{(\lambda e^{iy})^m}{m!} = e^{-\lambda}e^{\lambda e^{iy}} = e^{\lambda(e^{iy}-1)}.$$

(b) Before getting down to $\delta$’s and $\eta$’s, I show how to estimate $\phi_X(y) - \phi_Z(y)$. We know that
$$\phi_X(y) = \prod_{k=0}^{n}\phi_k(y)$$
(using 285I), while
$$\phi_Z(y) = \prod_{k=0}^{n}e^{p_k(e^{iy}-1)}.$$
Because $\phi_k(y)$, $e^{p_k(e^{iy}-1)}$ all have modulus at most 1 (we have
$$|e^{p_k(e^{iy}-1)}| = e^{-p_k(1-\cos y)} \le 1,)$$
285O tells us that
$$|\phi_X(y) - \phi_Z(y)| \le \sum_{k=0}^{n}|\phi_k(y) - e^{p_k(e^{iy}-1)}| = \sum_{k=0}^{n}|e^{p_k(e^{iy}-1)} - 1 - p_k(e^{iy} - 1)|.$$

(c) So we have a little bit of analysis to do. To estimate $|e^z - 1 - z|$ where $\operatorname{Re} z \le 0$, consider the function
$$g(t) = \operatorname{Re}(c(e^{tz} - 1 - tz))$$
where $|c| = 1$. We have $g(0) = g'(0) = 0$ and
$$|g''(t)| = |\operatorname{Re}(c(z^2e^{tz}))| \le |c||z^2||e^{tz}| \le |z|^2$$
for every $t \ge 0$, so that
$$|g(1)| \le \frac12|z|^2$$
by the (real-valued) Taylor theorem with remainder, or otherwise. As $c$ is arbitrary,
$$|e^z - 1 - z| \le \frac12|z|^2$$
whenever $\operatorname{Re} z \le 0$. In particular,
$$|e^{p_k(e^{iy}-1)} - 1 - p_k(e^{iy} - 1)| \le \frac12p_k^2|e^{iy} - 1|^2 \le 2p_k^2$$
for each $k$, and
$$|\phi_X(y) - \phi_Z(y)| \le \sum_{k=0}^{n}|e^{p_k(e^{iy}-1)} - 1 - p_k(e^{iy} - 1)| \le 2\sum_{k=0}^{n}p_k^2$$
for each $y \in \mathbb{R}$.
(d) Now for the detailed estimates. Given $M \ge 0$ and $\epsilon > 0$, let $\eta > 0$ and $y_0,\ldots,y_l \in \mathbb{R}$ be such that
$$\Pr(X \le a) \le \Pr(Z \le a + \tfrac12) + \frac{\epsilon}{2}$$
whenever $X$, $Z$ are real-valued random variables, $E(|X|) \le M$, $E(|Z|) \le M$ and $|\phi_X(y_j) - \phi_Z(y_j)| \le \eta$ for every
$j \le l$ (285P). Take $\delta = \eta/(2M + 1)$ and suppose that $X_0,\ldots,X_n$ are independent $\{0,1\}$-valued random variables with
$\Pr(X_k = 1) = p_k \le \delta$ for every $k \le n$, $\lambda = \sum_{k=0}^{n}p_k \le M$. Set $X = X_0 + \ldots + X_n$ and let $Z$ be a Poisson random
variable with parameter $\lambda$; then by the arguments of (a)-(c),
$$|\phi_X(y) - \phi_Z(y)| \le 2\sum_{k=0}^{n}p_k^2 \le 2\delta\sum_{k=0}^{n}p_k = 2\delta\lambda \le \eta$$
for every $y \in \mathbb{R}$. Also
$$E(|X|) = E(X) = \sum_{k=0}^{n}p_k = \lambda \le M,$$
$$E(|Z|) = E(Z) = \sum_{m=0}^{\infty}m\frac{\lambda^m}{m!}e^{-\lambda} = e^{-\lambda}\sum_{m=1}^{\infty}\frac{\lambda^m}{(m-1)!} = e^{-\lambda}\sum_{m=0}^{\infty}\frac{\lambda^{m+1}}{m!} = \lambda \le M.$$
So
$$\Pr(X \le a) \le \Pr(Z \le a + \tfrac12) + \frac{\epsilon}{2},$$
$$\Pr(Z \le a) \le \Pr(X \le a + \tfrac12) + \frac{\epsilon}{2}$$
for every $a$. But as both $X$ and $Z$ take all their values in $\mathbb{N}$,
$$|\Pr(X \le m) - \Pr(Z \le m)| \le \frac{\epsilon}{2}$$
for every $m \in \mathbb{N}$, and
$$\Bigl|\Pr(X = m) - \frac{\lambda^m}{m!}e^{-\lambda}\Bigr| = |\Pr(X = m) - \Pr(Z = m)| \le \epsilon$$
for every $m \in \mathbb{N}$, as required.
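The theorem can be watched at work on a concrete family. The Python sketch below is an illustration under stated assumptions: the success probabilities `ps` are made up, the exact distribution of $X$ is built by adding one variable at a time, and the Poisson probabilities are computed through `math.lgamma` to avoid very large factorials. It reports the largest value of $|\Pr(X = m) - \lambda^me^{-\lambda}/m!|$ and checks the estimate $|\phi_X(y) - \phi_Z(y)| \le 2\sum_k p_k^2$ from part (c) of the proof at a few sample points.

```python
import cmath
import math

def bernoulli_sum_pmf(ps):
    """Exact distribution of X = X_0 + ... + X_n for independent {0,1}-valued X_k."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for m, mass in enumerate(pmf):
            nxt[m] += mass * (1.0 - p)   # X_k = 0
            nxt[m + 1] += mass * p       # X_k = 1
        pmf = nxt
    return pmf

def poisson_pmf(lam, m):
    """lam^m * exp(-lam) / m!, computed on the log scale (assumes lam > 0)."""
    return math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))

# Hypothetical small success probabilities p_k.
ps = [0.005 + 0.0001 * k for k in range(100)]
lam = sum(ps)

pmf_x = bernoulli_sum_pmf(ps)
max_gap = max(abs(pmf_x[m] - poisson_pmf(lam, m)) for m in range(len(pmf_x)))

def phi_x(y):
    """Characteristic function of X: the product of 1 + p_k(e^{iy} - 1)."""
    result = 1 + 0j
    for p in ps:
        result *= 1 + p * (cmath.exp(1j * y) - 1)
    return result

def phi_z(y):
    """Characteristic function of the Poisson variable Z with parameter lam."""
    return cmath.exp(lam * (cmath.exp(1j * y) - 1))

char_bound = 2 * sum(p * p for p in ps)
char_gaps = [abs(phi_x(y) - phi_z(y)) for y in (0.5, 1.0, 2.0, 3.0)]

print(lam, max_gap, char_bound, max(char_gaps))
```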

285R Convolutions Recall from 257A that if $\nu$, $\tilde\nu$ are Radon probability measures on $\mathbb{R}^r$ then they have a
convolution $\nu * \tilde\nu$ defined by writing
$$(\nu * \tilde\nu)(E) = (\nu \times \tilde\nu)\{(x,y) : x + y \in E\}$$
for every Borel set $E \subseteq \mathbb{R}^r$, which is also a Radon probability measure. We can readily compute the characteristic
function $\phi_{\nu*\tilde\nu}$ from 257B: we have
$$\phi_{\nu*\tilde\nu}(y) = \int e^{iy\cdot x}\,(\nu * \tilde\nu)(dx) = \int\!\!\int e^{iy\cdot(x+x')}\,\nu(dx)\tilde\nu(dx')$$
$$= \int\!\!\int e^{iy\cdot x}e^{iy\cdot x'}\,\nu(dx)\tilde\nu(dx') = \int e^{iy\cdot x}\,\nu(dx)\int e^{iy\cdot x'}\,\tilde\nu(dx') = \phi_\nu(y)\phi_{\tilde\nu}(y).$$

(Thus convolution of measures corresponds to pointwise multiplication of characteristic functions, just as convolution
of functions corresponds to pointwise multiplication of Fourier transforms.) Recalling that the sum of independent
random variables corresponds to convolution of their distributions (272T), this gives another way of looking at 285I.
Remember also that if ν, ν̃ have Radon-Nikodým derivatives $f$, $\tilde f$ with respect to Lebesgue measure then $f * \tilde f$ is a
Radon-Nikodým derivative of ν ∗ ν̃ (257F).
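
The identity $\phi_{\nu*\tilde\nu} = \phi_\nu\phi_{\tilde\nu}$ is easy to check by machine for finitely supported measures on R. The sketch below assumes a measure is represented as a Python dict from points to masses; this representation and the function names are illustrative choices, not notation from the text.

```python
import cmath

def char_fn(mu, y):
    """Characteristic function at y of a finitely supported probability
    measure mu on R, given as a dict {point: mass}."""
    return sum(p * cmath.exp(1j * y * x) for x, p in mu.items())

def convolve(mu, nu):
    """Convolution mu * nu: the distribution of X + X' for independent
    X with distribution mu and X' with distribution nu."""
    out = {}
    for x, p in mu.items():
        for x2, q in nu.items():
            out[x + x2] = out.get(x + x2, 0.0) + p * q
    return out

mu = {0.0: 0.5, 1.0: 0.5}
nu = {-1.0: 0.25, 0.5: 0.75}
```

With these definitions `char_fn(convolve(mu, nu), y)` and `char_fn(mu, y) * char_fn(nu, y)` agree up to rounding error for any real `y`.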

285S The vague topology and pointwise convergence of characteristic functions In 285L we saw that a
sequence $\langle\nu_n\rangle_{n\in\mathbb{N}}$ of Radon probability measures on $\mathbb{R}^r$ converges in the vague topology to a Radon probability measure
ν if and only if
$$\lim_{n\to\infty}\int e^{iy\cdot x}\nu_n(dx) = \int e^{iy\cdot x}\nu(dx)$$
for every $y \in \mathbb{R}^r$; that is, iff
$$\lim_{n\to\infty}\rho'_y(\nu_n, \nu) = 0 \text{ for every } y \in \mathbb{R}^r,$$
writing
$$\rho'_y(\nu, \nu') = \Bigl|\int e^{iy\cdot x}\nu(dx) - \int e^{iy\cdot x}\nu'(dx)\Bigr|$$
for Radon probability measures ν, ν′ on $\mathbb{R}^r$ and $y \in \mathbb{R}^r$. It is natural to ask whether the pseudometrics $\rho'_y$ actually
define the vague topology. Writing T for the vague topology and S for the topology defined by $\{\rho'_y : y \in \mathbb{R}^r\}$, we surely
have S ⊆ T, just because every $\rho'_y$ is one of the pseudometrics used in the definition of T. Also we know that S and
T give the same convergent sequences, and incidentally that T is metrizable (see 285Xq). But all this does not quite
amount to saying that the two topologies are the same, and indeed they are not, as the next result shows.

285T Proposition Let $y_0, \ldots, y_n \in \mathbb{R}$ and η > 0. Then there are infinitely many m ∈ N such that $|1 - e^{iy_km}| \le \eta$
for every k ≤ n.
proof Let $\eta_1, \ldots, \eta_r \in \mathbb{R}$ be such that $1 = \eta_0, \eta_1, \ldots, \eta_r$ are linearly independent over Q and every $y_k/2\pi$ is a linear
combination of the $\eta_j$ over Q; say $y_k = 2\pi\sum_{j=0}^r q_{kj}\eta_j$ where every $q_{kj} \in \mathbb{Q}$. Express the $q_{kj}$ as $p_{kj}/p$ where each
$p_{kj} \in \mathbb{Z}$ and p ∈ N \ {0}. Set $M = \max_{k\le n}\sum_{j=0}^r |p_{kj}|$.

Take any $m_0 \in \mathbb{N}$ and let δ > 0 be such that $|1 - e^{2\pi ix}| \le \eta$ whenever |x| ≤ 2πMδ. By Weyl’s Equidistribution
Theorem (281N), there are infinitely many m such that $<m\eta_j> \le \delta$ whenever 1 ≤ j ≤ r; in particular, there is such
an $m \ge m_0$. Let $m_j$ be the integral part of $m\eta_j$, so that $|m\eta_j - m_j| \le \delta$ for 0 ≤ j ≤ r. Then
$$\Bigl|mpy_k - 2\pi\sum_{j=0}^r p_{kj}m_j\Bigr| \le 2\pi\sum_{j=0}^r |p_{kj}||m\eta_j - m_j| \le 2\pi M\delta,$$
so that
$$|1 - e^{iy_kmp}| = \Bigl|1 - \exp\Bigl(i\Bigl(mpy_k - 2\pi\sum_{j=0}^r p_{kj}m_j\Bigr)\Bigr)\Bigr| \le \eta$$
for every k ≤ n. As $mp \ge m_0$ and $m_0$ is arbitrary, this proves the result.
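
The proposition asserts existence only; for particular $y_k$ the integers m can be found by direct search. The following sketch (a brute-force scan, not the argument of the proof; the function name is hypothetical) lists the m below a given limit which satisfy the inequality for every $y_k$.

```python
import cmath

def good_multiples(ys, eta, limit):
    """All m in {0, ..., limit-1} with |1 - e^{i*y*m}| <= eta for every y in ys."""
    return [m for m in range(limit)
            if all(abs(1 - cmath.exp(1j * y * m)) <= eta for y in ys)]
```

For example, `good_multiples([1.0], 0.1, 100)` contains 44, because 44 is within 0.02 of the multiple 14π of 2π.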

285U Corollary The topologies S and T on the space of Radon probability measures on R, as described in 285S,
are different.
proof Let $\delta_x$ be the Dirac measure on R concentrated at x. By 285T, every member of S which contains $\delta_0$ also
contains $\delta_m$ for infinitely many m ∈ N. On the other hand, the set
$$G = \{\nu : \int e^{-x^2}\nu(dx) > \tfrac{1}{2}\}$$
is a member of T, containing $\delta_0$, which does not contain $\delta_m$ for any integer m ≠ 0. So G ∈ T \ S and T ≠ S.

285X Basic exercises (a) Let ν be a Radon probability measure on $\mathbb{R}^r$, where r ≥ 1, and suppose that $\int\|x\|\nu(dx) < \infty$.
Show that the characteristic function φ of ν is differentiable (in the full sense of 262Fa) and that
$$\frac{\partial\phi}{\partial\eta_j}(y) = i\int\xi_je^{iy\cdot x}\nu(dx)$$
for every j ≤ r, $y \in \mathbb{R}^r$, using $\xi_j$, $\eta_j$ to represent the coordinates of x and y as usual.

> (b) Let X = (X1 , . . . , Xr ) be a family of real-valued random variables, with characteristic function φX . Show
that the characteristic function φXj of Xj is given by
$$\phi_{X_j}(y) = \phi_X(ye_j) \text{ for every } y \in \mathbb{R},$$
where $e_j$ is the jth unit vector of $\mathbb{R}^r$.

> (c) Let X be a real-valued random variable and φX its characteristic function. Show that
φaX+b (y) = eiyb φX (ay)
for any a, b, y ∈ R.

(d) Let X be a real-valued random variable and φ its characteristic function.
(i) Show that for any integrable complex-valued function h on R,
$$E(\hat h(X)) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(-y)h(y)\,dy,$$
writing $\hat h$ for the Fourier transform of h.
(ii) Show that for any rapidly decreasing test function h,
$$E(h(X)) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(y)\hat h(y)\,dy.$$

(e) Let ν be a Radon probability measure on R, and suppose that its characteristic function φ is square-integrable.
Show that ν is an indefinite-integral measure over Lebesgue measure and that its Radon-Nikodým derivatives are also
square-integrable. (Hint: use 284O to find a square-integrable f such that $\int h\times f = \frac{1}{\sqrt{2\pi}}\int\phi\times\hat h$ for every rapidly
decreasing test function h, and ideas from the proof of 284G to show that $\int_a^b f = \nu\,]a, b[$ whenever a < b in R.)

> (f ) Let X = (X1 , . . . , Xr ) be a family of real-valued random variables with characteristic function φX . Suppose
that φX is expressible in the form
$$\phi_X(y) = \prod_{j=1}^r\phi_j(\eta_j)$$
for some functions φ1 , . . . , φr , writing y = (η1 , . . . , ηr ) as usual. Show that X1 , . . . , Xr are independent. (Hint:
show that the φj must be the characteristic functions of the Xj ; now show that the distribution of X has the same
characteristic function as the product of the distributions of the Xj .)

(g) Let X1 , X2 be independent real-valued random variables with the same distribution, and φ the characteristic
function of X1 − X2 . Show that φ(t) = φ(−t) ≥ 0 for every t ∈ R.

(h) Let ν be a Radon probability measure on R, with characteristic function φ. Show that
$$\frac{1}{2}(\nu[c, d] + \nu\,]c, d[) = \frac{i}{2\pi}\lim_{a\to\infty}\int_{-a}^{a}\frac{e^{-idy} - e^{-icy}}{y}\phi(y)\,dy$$
whenever c ≤ d in R. (Hint: use part (a) of the proof of 283F.)

(i) Let X be a real-valued random variable and $\phi_X$ its characteristic function. Show that
$$\Pr(|X| \ge a) \le 7a\int_0^{1/a}(1 - \operatorname{Re}\phi_X(y))\,dy$$
for every a > 0.

(j) We say that a set Q of Radon probability measures on R is uniformly tight if for every ε > 0 there is an M ≥ 0
such that ν(R \ [−M, M]) ≤ ε for every ν ∈ Q. Show that if Q is any uniformly tight family of Radon probability
measures on R, and ε > 0, then there are η > 0 and $y_0, \ldots, y_n \in \mathbb{R}$ such that
$$\nu\,]-\infty, a] \le \nu'\,]-\infty, a + \epsilon] + \epsilon$$
whenever ν, ν′ ∈ Q and $|\phi_\nu(y_j) - \phi_{\nu'}(y_j)| \le \eta$ for every j ≤ n, writing $\phi_\nu$ for the characteristic function of ν.

(k) Let $\langle\nu_n\rangle_{n\in\mathbb{N}}$ be a sequence of Radon probability measures on R, and suppose that it converges for the vague
topology to a Radon probability measure ν. Show that $\{\nu\} \cup \{\nu_n : n \in \mathbb{N}\}$ is uniformly tight in the sense of 285Xj.

> (l) Let ν, ν′ be two totally finite Radon measures on $\mathbb{R}^r$ which agree on all closed half-spaces, that is, sets of the
form {x : x . y ≥ c} for c ∈ R, $y \in \mathbb{R}^r$. Show that ν = ν′. (Hint: reduce to the case $\nu\mathbb{R}^r = \nu'\mathbb{R}^r = 1$ and use 285M.)

> (m) For γ > 0, the Cauchy distribution with centre 0 and scale parameter γ is the Radon probability measure
$\nu_\gamma$ defined by the formula
$$\nu_\gamma(E) = \frac{\gamma}{\pi}\int_E\frac{1}{\gamma^2 + t^2}\,dt.$$
(i) Show that if X is a random variable with distribution $\nu_\gamma$ then $\Pr(X \ge 0) = \Pr(|X| \ge \gamma) = \tfrac{1}{2}$. (ii) Show that the
characteristic function of $\nu_\gamma$ is $y \mapsto e^{-\gamma|y|}$. (Hint: 283Xr.) (iii) Show that if X and Y are independent random variables
with Cauchy distributions, both centered at 0 and with scale parameters γ, δ respectively, and α, β are not both 0,
then αX + βY has a Cauchy distribution centered at 0 and with scale parameter |α|γ + |β|δ. (iv) Show that if X and
Y are independent normally distributed random variables with expectation 0 then X/Y has a Cauchy distribution.

> (n) Let $X_1, X_2, \ldots$ be an independent identically distributed sequence of random variables, all of zero expectation
and variance 1; let φ be their common characteristic function. For each n ≥ 1, set $S_n = \frac{1}{\sqrt{n}}(X_1 + \ldots + X_n)$.
(i) Show that the characteristic function $\phi_n$ of $S_n$ is given by the formula $\phi_n(y) = (\phi(\frac{y}{\sqrt{n}}))^n$ for each n.
(ii) Show that $|\phi_n(y) - e^{-y^2/2}| \le n|\phi(\frac{y}{\sqrt{n}}) - e^{-y^2/2n}|$.
(iii) Setting $h(y) = \phi(y) - e^{-y^2/2}$, show that h(0) = h′(0) = h′′(0) = 0 and therefore that $\lim_{n\to\infty}nh(y/\sqrt{n}) = 0$,
so that $\lim_{n\to\infty}\phi_n(y) = e^{-y^2/2}$ for every y ∈ R.
(iv) Show that $\lim_{n\to\infty}\Pr(S_n \le a) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^a e^{-x^2/2}dx$ for every a ∈ R.

> (o) A random variable X has a Poisson distribution with parameter λ > 0 if $\Pr(X = n) = e^{-\lambda}\lambda^n/n!$ for every
n ∈ N. (i) Show that in this case E(X) = Var(X) = λ. (ii) Show that if X and Y are independent random variables
with Poisson distributions then X + Y has a Poisson distribution. (iii) Find a proof of (ii) based on 285Q.

> (p) For x ∈ R r , let δx be the Dirac measure on R r concentrated at x. Show that δx ∗ δy = δx+y for all x, y ∈ R r .

(q) Let P be the set of Radon probability measures on $\mathbb{R}^r$. For $y \in \mathbb{R}^r$, set $\rho'_y(\nu, \nu') = |\phi_\nu(y) - \phi_{\nu'}(y)|$ for all ν,
ν′ ∈ P, writing $\phi_\nu$ for the characteristic function of ν. Set $\psi(x) = \frac{1}{(\sqrt{2\pi})^r}e^{-x\cdot x/2}$ for $x \in \mathbb{R}^r$. Show that the vague
topology on P is defined by the family $\{\rho_\psi\} \cup \{\rho'_y : y \in \mathbb{Q}^r\}$, defining $\rho_\psi$ as in 285K, and is therefore metrizable. (Hint:
281K; cf. 285Xj.)

> (r) Let $\phi : \mathbb{R}^r \to \mathbb{C}$ be the characteristic function of a Radon probability measure on $\mathbb{R}^r$. Show that φ(0) = 1 and
that $\sum_{j=0}^n\sum_{k=0}^n c_j\bar c_k\phi(a_j - a_k) \ge 0$ whenever $a_0, \ldots, a_n \in \mathbb{R}^r$ and $c_0, \ldots, c_n \in \mathbb{C}$. (‘Bochner’s theorem’ states that
these conditions are sufficient, as well as necessary, for φ to be a characteristic function; see 445N in Volume 4.)

(s) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables and set $S_n = \sum_{j=0}^n X_j$ for each n ∈ N.
Suppose that the sequence $\langle\nu_{S_n}\rangle_{n\in\mathbb{N}}$ of distributions is convergent for the vague topology to a distribution. Show that
$\langle S_n\rangle_{n\in\mathbb{N}}$ converges in measure, therefore a.e. (Hint: 285J, 273B.)

(t) Let X be a normal random variable with expectation a and variance $\sigma^2$. Show that $E(e^X) = \exp(a + \tfrac{1}{2}\sigma^2)$.
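
The convergence described in 285Xn can be watched in the simplest case. The sketch below assumes, purely for illustration, that each $X_k$ takes the values ±1 with probability 1/2 each, so that the common characteristic function is φ(y) = cos y; the function names are hypothetical.

```python
import math

def phi_n(y, n):
    """Characteristic function of S_n = (X_1 + ... + X_n)/sqrt(n) when
    Pr(X_k = 1) = Pr(X_k = -1) = 1/2, so that phi(y) = cos y."""
    return math.cos(y / math.sqrt(n)) ** n

def gap(y, n):
    """Distance from phi_n(y) to the standard normal value e^{-y^2/2}."""
    return abs(phi_n(y, n) - math.exp(-y * y / 2))
```

Evaluating `gap(2.0, 100)` and `gap(2.0, 10000)` shows the difference shrinking as n grows, as 285Xn(iii) predicts.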

285Y Further exercises (a) Let ν be a Radon probability measure on $\mathbb{R}^r$. Write
$$\hat\nu(y) = \frac{1}{(\sqrt{2\pi})^r}\int e^{-iy\cdot x}\nu(dx)$$
for every $y \in \mathbb{R}^r$.
(i) Writing $\phi_\nu$ for the characteristic function of ν, show that $\hat\nu(y) = \frac{1}{(\sqrt{2\pi})^r}\phi_\nu(-y)$ for every $y \in \mathbb{R}^r$.
(ii) Show that $\int\hat\nu(y)h(y)\,dy = \int\hat h(x)\nu(dx)$ for any Lebesgue integrable complex-valued function h on $\mathbb{R}^r$, defining
the Fourier transform $\hat h$ as in 283Wa.
(iii) Show that $\int h(x)\nu(dx) = \int\check h(y)\hat\nu(y)\,dy$ for any rapidly decreasing test function h on $\mathbb{R}^r$.
(iv) Show that if ν is an indefinite-integral measure over Lebesgue measure, with Radon-Nikodým derivative f,
then $\hat\nu$ is the Fourier transform of f.

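(b) Let ν be a Radon probability measure on $\mathbb{R}^r$, with characteristic function φ. Show that whenever c ≤ d in $\mathbb{R}^r$
then
$$\Bigl(\frac{i}{2\pi}\Bigr)^r\lim_{\alpha_1,\ldots,\alpha_r\to\infty}\int_{-\alpha_1}^{\alpha_1}\ldots\int_{-\alpha_r}^{\alpha_r}\Bigl(\prod_{j=1}^r\frac{e^{-i\delta_j\eta_j} - e^{-i\gamma_j\eta_j}}{\eta_j}\Bigr)\phi(y)\,dy$$
exists and lies between ν ]c, d[ and ν[c, d], writing $]c, d[ = \prod_{j\le r}]\gamma_j, \delta_j[$ if $c = (\gamma_1, \ldots, \gamma_r)$ and $d = (\delta_1, \ldots, \delta_r)$.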

(c) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent identically distributed sequence of (not-essentially-constant) random variables,
and set $S_n = \sum_{k=0}^n X_{2k+1} - X_{2k}$ for each n ∈ N. Show that $\lim_{n\to\infty}\Pr(|S_n| \ge \alpha) = 1$ for every α ∈ R. (Hint: 285Xg,
proof of 285J.) Hence, or otherwise, show that $\lim_{n\to\infty}\Pr(|\sum_{k=0}^n X_k| \ge \alpha) = 1$ for every α ∈ R.

(d) For Radon probability measures ν, ν′ on $\mathbb{R}^r$ set
$$\rho(\nu, \nu') = \inf\{\epsilon : \epsilon \ge 0,\ \nu\,]-\infty, a] \le \nu'\,]-\infty, a + \epsilon\mathbf{1}] + \epsilon \le \nu\,]-\infty, a + 2\epsilon\mathbf{1}] + 2\epsilon \text{ for every } a \in \mathbb{R}^r\},$$
writing $]-\infty, a] = \{(\xi_1, \ldots, \xi_r) : \xi_j \le \alpha_j \text{ for every } j \le r\}$ when $a = (\alpha_1, \ldots, \alpha_r)$, and $\mathbf{1} = (1, \ldots, 1) \in \mathbb{R}^r$. Show that
ρ is a metric on the set of Radon probability measures on $\mathbb{R}^r$, and that the topology it defines is the vague topology.
(Cf. 274Ya.)

(e) Let r ≥ 1. We say that a set Q of Radon probability measures on $\mathbb{R}^r$ is uniformly tight if for every ε > 0 there
is a compact set $K \subseteq \mathbb{R}^r$ such that $\nu(\mathbb{R}^r \setminus K) \le \epsilon$ for every ν ∈ Q. Show that if Q is any uniformly tight family of Radon
probability measures on $\mathbb{R}^r$, and ε > 0, then there are η > 0, $y_0, \ldots, y_n \in \mathbb{R}^r$ such that $\nu\,]-\infty, a] \le \nu'\,]-\infty, a + \epsilon\mathbf{1}] + \epsilon$
whenever ν, ν′ ∈ Q and $a \in \mathbb{R}^r$ and $|\phi_\nu(y_j) - \phi_{\nu'}(y_j)| \le \eta$ for every j ≤ n, writing $\phi_\nu$ for the characteristic function
of ν.

(f) Show that for any M ≥ 0 the set of Radon probability measures ν on $\mathbb{R}^r$ such that $\int\|x\|\nu(dx) \le M$ is uniformly
tight in the sense of 285Ye.

(g) Let $C_b(\mathbb{R}^r)$ be the Banach space of bounded continuous real-valued functions on $\mathbb{R}^r$.
(i) Show that any Radon probability measure ν on $\mathbb{R}^r$ corresponds to a continuous linear functional $h_\nu : C_b(\mathbb{R}^r) \to \mathbb{R}$,
writing $h_\nu(f) = \int f\,d\nu$ for $f \in C_b(\mathbb{R}^r)$.
(ii) Show that if $h_\nu = h_{\nu'}$ then ν = ν′.
(iii) Show that the vague topology on the set of Radon probability measures corresponds to the weak* topology
on the dual $(C_b(\mathbb{R}^r))^*$ of $C_b(\mathbb{R}^r)$.

(h) Let r ≥ 1 and let P be the set of Radon probability measures on $\mathbb{R}^r$. For m ∈ N let $\rho^*_m$ be the pseudometric on
P defined by setting $\rho^*_m(\nu, \nu') = \sup_{\|y\|\le m}|\phi_\nu(y) - \phi_{\nu'}(y)|$ for ν, ν′ ∈ P, writing $\phi_\nu$ for the characteristic function of
ν. Show that $\{\rho^*_m : m \in \mathbb{N}\}$ defines the vague topology on P.

(i) Let r ≥ 1 and let P be the set of Radon probability measures on $\mathbb{R}^r$. For m ∈ N let $\tilde\rho^*_m$ be the pseudometric on
P defined by setting
$$\tilde\rho^*_m(\nu, \nu') = \int_{\{y:\|y\|\le m\}}|\phi_\nu(y) - \phi_{\nu'}(y)|\,dy$$
for ν, ν′ ∈ P, writing $\phi_\nu$ for the characteristic function of ν. Show that $\{\tilde\rho^*_m : m \in \mathbb{N}\}$ defines the vague topology on P.

(j) Let X be a real-valued random variable with finite variance. Show that for any η ≥ 0,
$$|\phi(y) - 1 - iyE(X) + \tfrac{1}{2}y^2E(X^2)| \le \tfrac{1}{6}\eta|y^3|E(X^2) + y^2E(\psi_\eta(X)),$$
writing φ for the characteristic function of X and $\psi_\eta(x) = 0$ for |x| ≤ η, $x^2$ for |x| > η.

(k) Suppose that ε ≥ δ > 0 and that $X_0, \ldots, X_n$ are independent real-valued random variables such that
$$E(X_k) = 0 \text{ for every } k \le n, \quad \sum_{k=0}^n\operatorname{Var}(X_k) = 1, \quad \sum_{k=0}^n E(\psi_\delta(X_k)) \le \delta$$
(writing $\psi_\delta(x) = 0$ if |x| ≤ δ, $x^2$ if |x| > δ). Set $\gamma = \epsilon/\sqrt{\delta^2 + \delta}$, and let Z be a standard normal random variable. Show
that
$$|\phi(y) - e^{-y^2/2}| \le \tfrac{1}{3}\epsilon|y|^3 + y^2(\delta + E(\psi_\gamma(Z)))$$
for every y ∈ R, writing φ for the characteristic function of $X = \sum_{k=0}^n X_k$. (Hint: write $\phi_k$ for the characteristic
function of $X_k$ and $\tilde\phi_k$ for the characteristic function of $\sigma_kZ$, where $\sigma_k = \sqrt{\operatorname{Var}(X_k)}$. Show that
$$|\phi_k(y) - \tilde\phi_k(y)| \le \tfrac{1}{3}\epsilon|y^3|\sigma_k^2 + y^2\bigl(E(\psi_\epsilon(X_k)) + \sigma_k^2E(\psi_\gamma(Z))\bigr).)$$
3

(l) Show that for every ε > 0 there is a δ > 0 such that whenever $X_0, \ldots, X_n$ are independent real-valued random
variables such that
$$E(X_k) = 0 \text{ for every } k \le n, \quad \sum_{k=0}^n\operatorname{Var}(X_k) = 1, \quad \sum_{k=0}^n E(\psi_\delta(X_k)) \le \delta$$
(writing $\psi_\delta(x) = 0$ if |x| ≤ δ, $x^2$ if |x| > δ), then $|\phi(y) - e^{-y^2/2}| \le \epsilon(y^2 + |y^3|)$ for every y ∈ R, writing φ for the
characteristic function of $X = X_0 + \ldots + X_n$.

(m) Use 285Yl to prove Lindeberg’s theorem (274F).

(n) Let r ≥ 1 and let P be the set of Radon probability measures on R r . Show that convolution, regarded as a map
from P × P to P , is continuous when P is given the vague topology. (Hint: 281Xa and 257B will help.)

(o) Let S be the topology on R defined by $\{\rho'_y : y \in \mathbb{R}\}$, where $\rho'_y(x, x') = |e^{iyx} - e^{iyx'}|$ (compare 285S). Show that
addition and subtraction are continuous for S in the sense of 2A5A.

(p) Let ν be a Radon probability measure on R r with bounded support (definition: 256Xf). Show that its charac-
teristic function is smooth.

(q) Let (Ω, Σ, µ) be a probability space. Suppose that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a sequence of real-valued random variables on Ω,
and X another real-valued random variable on Ω; let $\phi_{X_n}$, $\phi_X$ be the corresponding characteristic functions. Show that
the following are equiveridical: (i) $\lim_{n\to\infty}E(f(X_n)) = E(f(X))$ for every bounded continuous function f : R → R;
(ii) $\lim_{n\to\infty}\phi_{X_n}(y) = \phi_X(y)$ for every y ∈ R. (In this case we say that $\langle X_n\rangle_{n\in\mathbb{N}}$ converges in distribution to X.)

(r) Let (Ω, Σ, µ) be a probability space, and P the set of Radon probability measures on R. (i) Show that we have
a function $\psi : L^0(\mu) \to P$ defined by saying that $\psi(X^\bullet)$ is the distribution of X whenever X is a real-valued random
variable on Ω. (ii) Show that ψ is continuous for the topology of convergence in measure on $L^0(\mu)$ and the vague
topology on P. (Compare 271Yd.)

285 Notes and comments Just as with Fourier transforms, the power of methods which use the characteristic functions
of distributions is based on three points: (i) the characteristic function of a distribution determines the distribution
(285M); (ii) the properties of interest in a distribution are reflected in accessible properties of its characteristic function
(285G, 285I, 285J); (iii) these properties of the characteristic function are actually different from the corresponding
properties of the distribution, and are amenable to different kinds of investigation. Above all, the fact that (for
sequences!) convergence in the vague topology of distributions corresponds to pointwise convergence for characteristic
functions (285L) provides us with a path to the classic limit theorems, as in 285Q and 285Xn. In 285S-285U I show that
this result for sequences does not correspond immediately to any alternative characterization of the vague topology,
though it can be adapted in more than one way to give such a characterization (see 285Yh-285Yi).
Concerning the Central Limit Theorem there is one conspicuous difference between the method suggested here and
that of §274. The previous approach offered at least a theoretical possibility of giving an explicit formula for δ in 274F
as a function of ǫ, and hence an estimate of the rate of convergence to be expected in the Central Limit Theorem. The
arguments in the present chapter, involving as they do an entirely non-constructive compactness argument in 281A,
leave us with no way of achieving such an estimate. But in fact the method of characteristic functions, suitably refined,
is the basis of the best estimates known, such as the Berry-Esséen theorem (274Hc).
In 285D I try to show how the characteristic function $\phi_\nu$ of a Radon probability measure can be related to a ‘Fourier
transform’ $\hat\nu$ of ν which corresponds directly to the Fourier transforms of functions discussed in §§283-284. If f is a
non-negative Lebesgue integrable function and we take ν to be the corresponding indefinite-integral measure, then $\hat\nu = \hat f$.
Thus the concept of ‘Fourier transform of a measure’ is a natural extension of the Fourier transform of an integrable
function. Looking at it from the other side, the formula of 285Dc shows that ν can be thought of as representing the
inverse Fourier transform of $\hat\nu$ in the sense of 284H-284I. Taking ν to be the measure which assigns a mass 1 to the
point 0, we get the Dirac delta function, with Fourier transform the constant function χR. These ideas can be extended
without difficulty to handle convolutions of measures (285R).
It is a striking fact that while there is no satisfactory characterization of the functions which are Fourier transforms
of integrable functions, there is a characterization of the characteristic functions of probability distributions. This is
‘Bochner’s theorem’. I give the condition in 285Xr, asking you to prove its necessity as an exercise; we already have
three-quarters of the machinery to prove its sufficiency, but the last step will have to wait for Volume 4.

286 Carleson’s theorem


Carleson’s theorem (Carleson 66) was the (unexpected) solution to a long-standing problem. Remarkably, it can
be proved by ‘elementary’ arguments. The hardest part of the work below, in 286I-286L, involves only the laborious
verification of inequalities. How the inequalities were chosen is a different matter; for once, some of the ideas of the
proof lie in the statements of the lemmas. The argument here is a greatly expanded version of Lacey & Thiele 00.
The Hardy-Littlewood Maximal Theorem (286A) is important, and worth learning even if you leave the rest of the
section as an unexamined monument. I bring 286B-286D forward to the beginning of the section, even though they
are little more than worked exercises, because they also have potential uses in other contexts.
The complexity of the argument is such that it is useful to introduce a substantial number of special notations.
Rather than include these in the general index, I give a list in 286W. Among them are ten constants C1 , . . . , C10 . The
values of these numbers are of no significance. The method of proof here is quite inappropriate if we want to estimate
rates of convergence. I give recipes for the calculation of the Cn only for the sake of the linear logic in which this
treatise is written, and because they occasionally offer clues concerning the tactics being used.
In this section all integrals are with respect to Lebesgue measure µ on R unless otherwise stated.

286A The Maximal Theorem Suppose that 1 < p < ∞ and that $f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$ (definition: 244P). Set
$$f^*(x) = \sup\Bigl\{\frac{1}{b-a}\int_a^b|f| : a \le x \le b,\ a < b\Bigr\}$$
for x ∈ R. Then $\|f^*\|_p \le \frac{2^{1/p}p}{p-1}\|f\|_p$.

proof (a) It is enough to consider the case f ≥ 0. Note that if E ⊆ R has finite measure, then
$$\int_E f = \int(f\times\chi E)\times\chi E \le \|f\times\chi E\|_p(\mu E)^{1/q} \le \|f\|_p(\mu E)^{1/q}$$
is finite, where $q = \frac{p}{p-1}$, by Hölder’s inequality (244Eb). Consequently, if t > 0 and $\int_E f \ge t\mu E$, we must have
$t\mu E \le \|f\times\chi E\|_p(\mu E)^{1/q}$ and
$$\mu E = (\mu E)^{p-p/q} \le \frac{1}{t^p}\|f\times\chi E\|_p^p = \frac{1}{t^p}\int_E f^p.$$

(b) For t > 0, set
$$G_t = \{x : \int_x^a f > (a - x)t \text{ for some } a > x\}.$$

(i) $G_t$ is an open set. P P For any a ∈ R,
$$G_{ta} = \{x : x < a,\ \int_x^a f > (a - x)t\}$$
is open, because $x \mapsto \int_x^a f$ and $x \mapsto (a - x)t$ are continuous (225A); so $G_t = \bigcup_{a\in\mathbb{R}}G_{ta}$ is open. Q Q

(ii) By 2A2I, there is a partition $\mathcal{C}$ of $G_t$ into open intervals. Now C is bounded and $t\mu C \le \int_C f$ for every $C \in \mathcal{C}$.
P P Express C as ]a, b[ (for the moment, we have to allow for the possibility that one or both of a, b is infinite).
(α) If x ∈ C, there is some (finite) c > x such that $\int_x^c f > (c - x)t$. Set d = min(b, c) > x. If d = c, then of course
$\int_x^d f > (d - x)t$. If d = b < c, then (because $b \notin G_t$) $\int_b^c f \le (c - b)t$, so again
$$\int_x^d f = \int_x^b f = \int_x^c f - \int_b^c f > (c - x)t - (c - b)t = (b - x)t = (d - x)t.$$
Thus we always have some d ∈ ]x, b] such that $\int_x^d f > (d - x)t$.
(β) Now take any z ∈ C, and consider
$$A_z = \{x : z \le x \le b,\ \int_z^x f \ge (x - z)t\}.$$
Then $z \in A_z$, and $A_z$ is closed, again because the functions $x \mapsto \int_z^x f$ and $x \mapsto (x - z)t$ are continuous. Moreover, $A_z$
is bounded, because $x - z \le \frac{1}{t^p}\|f\|_p^p$ for every $x \in A_z$, by (a). ?? If $\sup A_z = x_0 < b$, then $x_0 \in A_z$, and there is a
$d \in\,]x_0, b]$ such that $\int_{x_0}^d f \ge t(d - x_0)$, by (α); but in this case $d \in A_z$, which is impossible. X X Thus $b = \sup A_z \in A_z$
(in particular, b < ∞), and $\int_z^b f \ge (b - z)t$.
(γ) Letting z decrease to a, we see that $b - a \le \frac{1}{t^p}\|f\|_p^p$, so a is finite, and also
$$\int_a^b f = \lim_{z\downarrow a}\int_z^b f \ge \lim_{z\downarrow a}(b - z)t = (b - a)t,$$
as required. Q Q

(iii) Accordingly, because $\mathcal{C}$ is countable and f is non-negative,
$$\mu G_t = \sum_{C\in\mathcal{C}}\mu C \le \sum_{C\in\mathcal{C}}\frac{1}{t^p}\int_C f^p \le \frac{1}{t^p}\int_{-\infty}^{\infty}f^p$$
is finite, and
$$\int_{G_t}f = \sum_{C\in\mathcal{C}}\int_C f \ge \sum_{C\in\mathcal{C}}t\mu C = t\mu G_t.$$

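(c) All this is true for every t > 0. Now if we set
$$f_1^*(x) = \sup_{a>x}\frac{1}{a-x}\int_x^a f$$
for x ∈ R, we have $\{x : f_1^*(x) > t\} = G_t$ for every t > 0.
For any t > 0,
$$\frac{1}{p}t\mu G_t = (1 - \frac{1}{q})t\mu G_t \le \int_{G_t}f - \frac{1}{q}t\chi\mathbb{R} \le \int_{-\infty}^{\infty}(f - \frac{1}{q}t\chi\mathbb{R})^+.$$
So
$$\int_{-\infty}^{\infty}(f_1^*)^p = \int_0^{\infty}\mu\{x : f_1^*(x)^p > t\}\,dt$$
(see 252O)
$$= p\int_0^{\infty}u^{p-1}\mu\{x : f_1^*(x) > u\}\,du$$
(substituting $t = u^p$)
$$= p\int_0^{\infty}u^{p-1}\mu G_u\,du \le p^2\int_0^{\infty}u^{p-2}\int_{-\infty}^{\infty}(f - \frac{1}{q}u\chi\mathbb{R})^+\,du$$
$$= p^2\int_{-\infty}^{\infty}\int_0^{\infty}\max(0, f(x) - \frac{1}{q}u)u^{p-2}\,du\,dx$$
(by Fubini’s theorem, 252B, because $(x, u) \mapsto u^{p-2}\max(0, f(x) - \frac{1}{q}u)$ is measurable and non-negative)
$$= p^2\int_{-\infty}^{\infty}\int_0^{qf(x)}u^{p-2}(f(x) - \frac{1}{q}u)\,du\,dx$$
$$= \frac{p^2q^{p-1}}{p(p-1)}\int_{-\infty}^{\infty}f^p = \Bigl(\frac{p}{p-1}\Bigr)^p\|f\|_p^p.$$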

(d) Similarly, setting $f_2^*(x) = \sup_{a<x}\frac{1}{x-a}\int_a^x f$ for x ∈ R, $\int_{-\infty}^{\infty}(f_2^*)^p \le (\frac{p}{p-1})^p\|f\|_p^p$. But $f^* = \max(f_1^*, f_2^*)$. P P Of
course $f_1^* \le f^*$ and $f_2^* \le f^*$. But also, if $f^*(x) > t$, there must be a non-trivial interval I containing x such that
$\int_I f > t\mu I$; if a = inf I and b = sup I, then either $\int_a^x f > (x - a)t$ and $f_2^*(x) > t$, or $\int_x^b f > (b - x)t$ and $f_1^*(x) > t$. As
x and t are arbitrary, $f^* = \max(f_1^*, f_2^*)$. Q Q
Accordingly
$$\|f^*\|_p^p = \int_{-\infty}^{\infty}(f^*)^p = \int_{-\infty}^{\infty}\max((f_1^*)^p, (f_2^*)^p)$$
$$\le \int_{-\infty}^{\infty}(f_1^*)^p + (f_2^*)^p \le 2\Bigl(\frac{p}{p-1}\Bigr)^p\|f\|_p^p.$$

Taking pth roots, we have the inequality we seek.
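
As a sanity check on the constant, take f = χ[0, 1]. A direct calculation gives $f^*(x) = 1$ on [0, 1], $f^*(x) = 1/x$ for x > 1 and $f^*(x) = 1/(1-x)$ for x < 0, so that $\|f^*\|_p^p = (p+1)/(p-1)$, comfortably below $2(p/(p-1))^p$. The sketch below confirms this numerically; the closed form for $f^*$ is hard-coded from the hand calculation, the integral is truncated to [−L, L], and all names are illustrative.

```python
def f_star(x):
    """Maximal function of f = indicator of [0, 1] (closed form, worked out by hand)."""
    if 0.0 <= x <= 1.0:
        return 1.0
    return 1.0 / x if x > 1.0 else 1.0 / (1.0 - x)

def integral_of_power(p, L=1000.0, h=0.01):
    """Midpoint-rule approximation to the integral of (f*)^p over [-L, L]."""
    n = int(round(2 * L / h))
    return h * sum(f_star(-L + (i + 0.5) * h) ** p for i in range(n))

def bound(p):
    """2 (p/(p-1))^p ||f||_p^p, with ||f||_p = 1 for this f."""
    return 2.0 * (p / (p - 1.0)) ** p
```

For p = 2 the approximation is close to 3, against a bound of 8.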

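286B Lemma Let g : R → [0, ∞[ be a function which is non-decreasing on ]−∞, α], non-increasing on [β, ∞[
and constant on [α, β], where α ≤ β. Then for any measurable function f : R → [0, ∞],
$$\int_{-\infty}^{\infty}f\times g \le \int_{-\infty}^{\infty}g\cdot\sup_{a\le\alpha,b\ge\beta,a<b}\frac{1}{b-a}\int_a^b f.$$

proof Set $\gamma = \sup_{a\le\alpha,b\ge\beta,a<b}\frac{1}{b-a}\int_a^b f$. For n, k ∈ N set $E_{nk} = \{x : \alpha - 2^n \le x \le \beta + 2^n,\ g(x) \ge 2^{-n}(k+1)\}$, so that
$E_{nk}$ is either empty or a bounded interval including [α, β], and $\int_{E_{nk}}f \le \gamma\mu E_{nk}$. For n ∈ N, set $g_n = 2^{-n}\sum_{k=0}^{4^n-1}\chi E_{nk}$;
then $\langle g_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of functions with supremum g, and
$$\int_{-\infty}^{\infty}f\times g = \sup_{n\in\mathbb{N}}\int_{-\infty}^{\infty}f\times g_n = \sup_{n\in\mathbb{N}}2^{-n}\sum_{k=0}^{4^n-1}\int_{E_{nk}}f$$
$$\le \sup_{n\in\mathbb{N}}2^{-n}\sum_{k=0}^{4^n-1}\gamma\mu E_{nk} = \sup_{n\in\mathbb{N}}\gamma\int_{-\infty}^{\infty}g_n = \gamma\int_{-\infty}^{\infty}g,$$
as claimed.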

286C Shift, modulation and dilation Some of the calculations below will be easier if we use the following
formalism. For any function f with domain included in R, and α ∈ R, we can define
$$(S_\alpha f)(x) = f(x + \alpha), \quad (M_\alpha f)(x) = e^{i\alpha x}f(x), \quad (D_\alpha f)(x) = f(\alpha x)$$
whenever the right-hand sides are defined. In the case of $S_\alpha f$ and $D_\alpha f$ it is sometimes convenient to allow ±∞ as a
value of the function. We have the following elementary facts.

(a) $S_{-\alpha}S_\alpha f = f$, $D_{1/\alpha}D_\alpha f = f$ if α ≠ 0.

(b) $S_\alpha(f\times g) = S_\alpha f\times S_\alpha g$, $D_\alpha(f\times g) = D_\alpha f\times D_\alpha g$.

(c) $D_\alpha|f| = |D_\alpha f|$.

(d) If f is integrable, then
$$(M_\alpha f)^\wedge = S_{-\alpha}\hat f, \quad (S_\alpha f)^\wedge = M_\alpha\hat f, \quad (S_\alpha f)^\vee = M_{-\alpha}\check f;$$
if moreover α > 0, then
$$\alpha(D_\alpha f)^\wedge = D_{1/\alpha}\hat f, \quad \alpha(D_\alpha f)^\vee = D_{1/\alpha}\check f$$
(283Cc-283Ce).

(e) If f belongs to $\mathcal{L}^1_{\mathbb{C}} = \mathcal{L}^1_{\mathbb{C}}(\mu)$, so do $S_\alpha f$, $M_\alpha f$ and (if α ≠ 0) $D_\alpha f$, and in this case
$$\|S_\alpha f\|_1 = \|M_\alpha f\|_1 = \|f\|_1, \quad \|D_\alpha f\|_1 = \frac{1}{|\alpha|}\|f\|_1.$$

(f) If f belongs to $\mathcal{L}^2_{\mathbb{C}}$ so do $S_\alpha f$, $M_\alpha f$ and (if α ≠ 0) $D_\alpha f$, and in this case
$$\|S_\alpha f\|_2 = \|M_\alpha f\|_2 = \|f\|_2, \quad \|D_\alpha f\|_2 = \frac{1}{\sqrt{|\alpha|}}\|f\|_2.$$

(g) If f is a rapidly decreasing test function (284A), so are $M_\alpha f$ and $S_\alpha f$ and (if α ≠ 0) $D_\alpha f$.
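
The three operators are easily modelled as higher-order functions, which gives a quick way of checking facts such as (a) and (e) on examples. This is a sketch under obvious simplifications: functions are Python callables, the $L^1$ norm is approximated by a midpoint rule on a bounded interval, and the names `shift`, `modulate`, `dilate` and `l1_norm` are illustrative.

```python
import cmath
import math

def shift(alpha, f):      # (S_alpha f)(x) = f(x + alpha)
    return lambda x: f(x + alpha)

def modulate(alpha, f):   # (M_alpha f)(x) = e^{i alpha x} f(x)
    return lambda x: cmath.exp(1j * alpha * x) * f(x)

def dilate(alpha, f):     # (D_alpha f)(x) = f(alpha x)
    return lambda x: f(alpha * x)

def l1_norm(f, L=40.0, h=0.001):
    """Midpoint-rule approximation to the integral of |f| over [-L, L]."""
    n = int(round(2 * L / h))
    return h * sum(abs(f(-L + (i + 0.5) * h)) for i in range(n))

def gauss(x):
    return math.exp(-x * x / 2)
```

For example, `l1_norm(dilate(2.0, gauss))` is half of `l1_norm(gauss)`, in agreement with (e).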

286D Lemma Suppose that f : R → [0, ∞] is a measurable function such that, for some constant C ≥ 0,
$\int_E f \le C\sqrt{\mu E}$ whenever µE < ∞. Then f is finite almost everywhere and $\int_{-\infty}^{\infty}\frac{1}{1+|x|}f(x)\,dx$ is finite.

proof For any n ≥ 1, set $E_n = \{x : |x| \le n,\ f(x) \ge n\}$; then
$$n\mu E_n \le \int_{E_n}f \le C\sqrt{\mu E_n},$$
so $\mu E_n \le \frac{C^2}{n^2}$ and
$$\{x : f(x) = \infty\} = \bigcap_{n\ge 1}\bigcup_{m\ge n}E_m$$
has measure at most $\inf_{n\ge 1}\sum_{m=n}^{\infty}\mu E_m = 0$.
As for the integral, set $F(x) = \int_0^x f$ for x ≥ 0. Then, for any a ≥ 0,
$$\int_0^a\frac{f(x)}{1+x}\,dx = \frac{F(a)}{1+a} + \int_0^a\frac{F(x)}{(1+x)^2}\,dx$$
(225F)
$$\le C\Bigl(\frac{\sqrt{a}}{1+a} + \int_0^a\frac{\sqrt{x}}{(1+x)^2}\,dx\Bigr) \le C\Bigl(1 + \int_0^{\infty}\frac{\sqrt{x}}{(1+x)^2}\,dx\Bigr),$$
so
$$\int_0^{\infty}\frac{f(x)}{1+x}\,dx \le C\Bigl(1 + \int_0^{\infty}\frac{\sqrt{x}}{(1+x)^2}\,dx\Bigr)$$
is finite. Similarly, $\int_{-\infty}^0\frac{f(x)}{1-x}\,dx$ is finite, so we have the result.

 
286E The Lacey-Thiele construction (a) Let $\mathcal{I}$ be the family of all dyadic intervals of the form $[2^kn, 2^k(n+1)[$
where k, n ∈ Z. The essential geometric property of $\mathcal{I}$ is that if $I, J \in \mathcal{I}$ then either I ⊆ J or J ⊆ I or I ∩ J = ∅. Let
Q be the set of all pairs $\sigma = (I_\sigma, J_\sigma) \in \mathcal{I}^2$ such that $\mu I_\sigma\cdot\mu J_\sigma = 1$. For σ ∈ Q, let $k_\sigma \in \mathbb{Z}$ be such that $\mu I_\sigma = 2^{-k_\sigma}$
and $\mu J_\sigma = 2^{k_\sigma}$; let $x_\sigma$ be the midpoint of $I_\sigma$, $y_\sigma$ the midpoint of $J_\sigma$, $J^l_\sigma \in \mathcal{I}$ the left-hand half-interval of $J_\sigma$, $J^r_\sigma \in \mathcal{I}$
the right-hand half-interval of $J_\sigma$, and $y^l_\sigma$ the lower quartile of $J_\sigma$, that is, the midpoint of $J^l_\sigma$.
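(b) There is a rapidly decreasing test function φ such that $\hat\phi$ is real-valued and $\chi[-\tfrac{1}{6}, \tfrac{1}{6}] \le \hat\phi \le \chi[-\tfrac{1}{5}, \tfrac{1}{5}]$. P P Look
at parts (b)-(d) of the proof of 284G. The process there can be used to provide us with a smooth function $\psi_1$ which
is zero outside the interval $[\tfrac{1}{6}, \tfrac{1}{5}]$ and strictly positive on $]\tfrac{1}{6}, \tfrac{1}{5}[$; multiplying by a suitable factor, we can arrange that
$\int_{-\infty}^{\infty}\psi_1 = 1$. So if we set $\psi_2(x) = 1 - \int_{-\infty}^x\psi_1$ for x ∈ R, $\psi_2$ will be smooth, and $\chi\,]-\infty, \tfrac{1}{6}] \le \psi_2 \le \chi\,]-\infty, \tfrac{1}{5}]$. Now
set $\psi_0(x) = \psi_2(x)\psi_2(-x)$ for x ∈ R, and $\phi = \check\psi_0$; $\hat\phi = \psi_0$ (284C) will have the required property. Q Q

For σ ∈ Q, set $\phi_\sigma = 2^{k_\sigma/2}M_{y^l_\sigma}S_{-x_\sigma}D_{2^{k_\sigma}}\phi$, so that
$$\phi_\sigma(x) = 2^{k_\sigma/2}e^{iy^l_\sigma x}\phi(2^{k_\sigma}(x - x_\sigma)).$$
Observe that $\phi_\sigma$ is a rapidly decreasing test function. Now $\hat\phi_\sigma = 2^{-k_\sigma/2}S_{-y^l_\sigma}M_{-x_\sigma}D_{2^{-k_\sigma}}\hat\phi$, that is,
$$\hat\phi_\sigma(y) = 2^{-k_\sigma/2}e^{-ix_\sigma(y - y^l_\sigma)}\hat\phi(2^{-k_\sigma}(y - y^l_\sigma)),$$
which is zero unless $|y - y^l_\sigma| \le \tfrac{1}{5}2^{k_\sigma}$; since the length of $J^l_\sigma$ is $\tfrac{1}{2}2^{k_\sigma}$, this can be true only when $y \in J^l_\sigma$. We have the
following simple facts.
(i) $\|\phi_\sigma\|_2 = 2^{k_\sigma/2}\cdot 2^{-k_\sigma/2}\|\phi\|_2 = \|\phi\|_2$ for every σ ∈ Q.
(ii) $\|\hat\phi_\sigma\|_1 = 2^{-k_\sigma/2}\cdot 2^{k_\sigma}\|\hat\phi\|_1 = 2^{k_\sigma/2}\|\hat\phi\|_1$ for every σ ∈ Q.
(iii) If σ, τ ∈ Q and $J_\sigma \ne J_\tau$ and $J^r_\sigma \cap J^r_\tau$ is non-empty, then $J^l_\sigma \cap J^l_\tau = \emptyset$ so
$$(\phi_\sigma|\phi_\tau) = (\hat\phi_\sigma|\hat\phi_\tau) = 0,$$
by 284Ob. (For $f, g \in \mathcal{L}^2_{\mathbb{C}}$, I write (f|g) for $\int f\times\bar g$.)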
(c) Set $w(x) = \frac{1}{(1+|x|)^3}$ for x ∈ R. Then there is a $C_1 > 0$ such that $|\phi(x)| \le C_1\min(w(3), w(x)^2)$ for every
x ∈ R (because $\lim_{x\to\infty}x^6\phi(x) = \lim_{x\to-\infty}x^6\phi(x) = 0$). For σ ∈ Q, set $w_\sigma = 2^{k_\sigma}S_{-x_\sigma}D_{2^{k_\sigma}}w$, so that $w_\sigma(x) =
2^{k_\sigma}w(2^{k_\sigma}(x - x_\sigma))$ for every x. Elementary calculations show that
(i) $w_\sigma$ depends only on $I_\sigma$;
(ii) $\int_{-\infty}^{\infty}w_\sigma = \int_{-\infty}^{\infty}w = 1$ for every σ;
(iii)
$$|\phi_\sigma(x)| \le C_1\min(2^{-k_\sigma/2}w_\sigma(x), 2^{-3k_\sigma/2}w_\sigma(x)^2)$$
for every x and σ (because $|\phi(x)| \le C_1w(x)^2 \le C_1w(x)$ for every x ∈ R).
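
The ‘essential geometric property’ of dyadic intervals noted in (a) can be checked exhaustively on a finite range. In this sketch an interval $[2^kn, 2^k(n+1)[$ is represented by its pair of endpoints as exact rationals; the representation and the function names are illustrative assumptions.

```python
from fractions import Fraction

def dyadic(k, n):
    """The dyadic interval [2^k n, 2^k (n+1)[ as a pair (left, right) of exact rationals."""
    scale = Fraction(2) ** k
    return (scale * n, scale * (n + 1))

def relation(I, J):
    """Classify two half-open intervals as 'I in J', 'J in I' or 'disjoint';
    return None if they overlap without either including the other."""
    if J[0] <= I[0] and I[1] <= J[1]:
        return "I in J"
    if I[0] <= J[0] and J[1] <= I[1]:
        return "J in I"
    if I[1] <= J[0] or J[1] <= I[0]:
        return "disjoint"
    return None
```

For dyadic intervals `relation` never returns `None`, whereas for general intervals such as [0, 2[ and [1, 3[ it does.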

286F Two partial orders (a) For σ, τ ∈ Q say that σ ≤ τ if $I_\sigma \subseteq I_\tau$ and $J_\tau \subseteq J_\sigma$. Then ≤ is a partial order on
Q. We have the following elementary facts.
(i) If σ ≤ τ, then $k_\sigma \ge k_\tau$.
(ii) If σ and τ are incomparable (that is, σ ≰ τ and τ ≰ σ), then $(I_\sigma\times J_\sigma) \cap (I_\tau\times J_\tau)$ is empty. P P We may
suppose that $k_\sigma \le k_\tau$. If $J_\sigma \cap J_\tau \ne \emptyset$, then $J_\sigma \subseteq J_\tau$, because both are dyadic intervals, and $J_\sigma$ is the shorter; but as
τ ≰ σ, this means that $I_\tau \not\subseteq I_\sigma$ and $I_\tau \cap I_\sigma = \emptyset$. Q Q
(iii) If σ, σ′ are incomparable and both less than or equal to τ, then $I_\sigma \cap I_{\sigma'} = \emptyset$, because $J_\tau \subseteq J_\sigma \cap J_{\sigma'}$.
(iv) If σ ≤ τ and $k_\sigma \ge k \ge k_\tau$, then there is a (unique) σ′ such that σ ≤ σ′ ≤ τ and $k_{\sigma'} = k$. (The point is that
there is a unique $I \in \mathcal{I}$ such that $I_\sigma \subseteq I \subseteq I_\tau$ and $\mu I = 2^{-k}$; and similarly there is just one candidate for $J_{\sigma'}$.)

(b) For σ, τ ∈ Q say that $\sigma \le^r \tau$ if $I_\sigma \subseteq I_\tau$ and $J^r_\tau \subseteq J^r_\sigma$ (that is, either τ = σ or $J_\tau \subseteq J^r_\sigma$), so that, in particular,
σ ≤ τ. Note that if $\sigma, \sigma' \le^r \tau$ and $k_\sigma \ne k_{\sigma'}$ then $J^r_\sigma \cap J^r_{\sigma'} \ne \emptyset$, so $(\phi_\sigma|\phi_{\sigma'}) = 0$ (286E(b-iii)).

(c) It will be convenient to have a shorthand for the following: if P, R ⊆ Q, say that $P \preccurlyeq R$ if for every σ ∈ P there
is a τ ∈ R such that σ ≤ τ.
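
Fact (a-ii) lends itself to an exhaustive check on a finite family of pairs. The sketch below represents σ ∈ Q by the endpoints of $I_\sigma$ and $J_\sigma$ as exact rationals and tests every pair in a small sample; the sample ranges and all function names are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product

def dyadic(k, n):
    """Endpoints of the dyadic interval [2^k n, 2^k (n+1)[."""
    s = Fraction(2) ** k
    return (s * n, s * (n + 1))

def tile(k, m, n):
    """A pair sigma = (I, J) with mu I = 2^-k and mu J = 2^k, so mu I * mu J = 1."""
    return (dyadic(-k, m), dyadic(k, n))

def subset(A, B):
    return B[0] <= A[0] and A[1] <= B[1]

def leq(s, t):
    """sigma <= tau iff I_sigma is included in I_tau and J_tau is included in J_sigma."""
    return subset(s[0], t[0]) and subset(t[1], s[1])

def rectangles_meet(s, t):
    """True iff (I_s x J_s) and (I_t x J_t) have a point in common."""
    def meet(A, B):
        return A[0] < B[1] and B[0] < A[1]
    return meet(s[0], t[0]) and meet(s[1], t[1])

tiles = [tile(k, m, n) for k, m, n in product(range(-2, 3), range(-4, 4), range(-4, 4))]
```

Every incomparable pair in `tiles` has disjoint rectangles, as (a-ii) asserts.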

286G We shall need the results of some elementary calculations. The first three are nearly trivial.
Lemma (a) For any m ∈ N, $\sum_{n=m}^{\infty}w(n + \tfrac{1}{2}) \le \frac{1}{2(1+m)^2}$.
(b) Suppose that σ ∈ Q and that I is an interval not containing $x_\sigma$ in its interior. Then $\int_I w_\sigma \ge w_\sigma(x)\mu I$, where x
is the midpoint of I.
(c) For any x ∈ R, $\sum_{n=-\infty}^{\infty}w(x - n) \le 2$.
(d) There is a constant $C_2 \ge 0$ such that $\int_{-\infty}^{\infty}w(x)w(\alpha x + \beta)\,dx \le C_2w(\beta)$ whenever 0 ≤ α ≤ 1 and β ∈ R.
(e) There is a constant $C_3 \ge 0$ such that $|(\phi_\sigma|\phi_\tau)| \le 2^{-k_\sigma/2}2^{k_\tau/2}C_3\int_{I_\tau}w_\sigma$ whenever σ, τ ∈ Q and $k_\sigma \le k_\tau$.
(f) There is a constant $C_4 \ge 0$ such that whenever τ ∈ Q and k ∈ Z, then
$$\sum_{\sigma\in Q,\sigma\le\tau,k_\sigma=k}\int_{\mathbb{R}\setminus I_\tau}w_\sigma \le C_4.$$

proof (a) The point is just that w is convex on ]−∞, 0] and [0, ∞[. So we can apply 233Ib with f(x) = x, or argue
directly from the fact that $w(n + \tfrac{1}{2}) \le \tfrac{1}{2}(w(n + \tfrac{1}{2} + x) + w(n + \tfrac{1}{2} - x))$ for $|x| \le \tfrac{1}{2}$, to see that $w(n + \tfrac{1}{2}) \le \int_n^{n+1}w$ for
every n ≥ 0. Accordingly
$$\sum_{n=m}^{\infty}w(n + \tfrac{1}{2}) \le \int_m^{\infty}w = \frac{1}{2(1+m)^2}.$$

(b) Similarly, because I lies all on the same side of $x_\sigma$, $w_\sigma$ is convex on I, so the same inequality yields $w_\sigma(x)\mu I \le \int_I w_\sigma$.

(c) Let m be such that $|x - m| \le \tfrac{1}{2}$. Then, using the same inequalities as before to estimate w(x − n) for n ≠ m,
we have
$$\sum_{n=-\infty}^{\infty}w(x - n) \le w(x - m) + \int_{-\infty}^{x-m-\frac{1}{2}}w + \int_{x-m+\frac{1}{2}}^{\infty}w \le 1 + \int_{-\infty}^{\infty}w = 2.$$

(d)(i) The first step is to note that

w( 12 (1 + β)) 8(1 + β)3


= ≤8
w(β) (3 + β)3
P For t ≥ 21 ,
for every β ≥ 0. Now αw(α + αβ) ≤ 4w(β) whenever β ≥ 0 and α ≥ 21 . P
d 1−2t(1+β)
tw(t + tβ) = ≤ 0,
dt (1+t+tβ)4
so
αw(α + αβ) ≤ 21 w( 12 + 12 β) ≤ 4w(β). Q
Q
Of course this means that
1 1+β
w( ) ≤ 8w(β)
α 2α
whenever β ≥ 0 and 0 < α ≤ 1.
1+β
(ii) Try C2 = 16. If 0 < α ≤ 1 and β ≥ 0, set γ = . Then, for any x ≥ −γ,

αx 1
1 + αx + β = (1 + β)(1 + ) ≥ (1 + β),
1+β 2

so w(αx + β) ≤ 8w(β) and


R∞ R∞
−γ
w(x)w(αx + β)dx ≤ 8w(β) −γ
w ≤ 8w(β).
On the other hand,
Z −γ Z ∞
w(x)w(αx + β)dx ≤ w(γ) w(αx + β)dx
−∞ −∞
Z ∞
1 1+β
= w( ) w ≤ 8w(β).
α 2α −∞
R∞
Putting these together, −∞
w(x)w(αx + β)dx ≤ 16w(β); and this is true whenever 0 < α ≤ 1 and β ≥ 0.
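(iii) If α = 0, then
$$\int_{-\infty}^{\infty} w(x)w(\alpha x+\beta)\,dx = w(\beta)\int_{-\infty}^{\infty} w = w(\beta) \le C_2 w(\beta)$$
for any β. If 0 < α ≤ 1 and β < 0, then
$$\begin{aligned}
\int_{-\infty}^{\infty} w(x)w(\alpha x+\beta)\,dx &= \int_{-\infty}^{\infty} w(-x)w(-\alpha x-\beta)\,dx && \text{(because $w$ is an even function)} \\
&= \int_{-\infty}^{\infty} w(x)w(\alpha x-\beta)\,dx \le C_2 w(-\beta) && \text{(by (ii) above)} \\
&= C_2 w(\beta).
\end{aligned}$$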

So we have the required inequality in all cases.
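As a sanity check on (d), the following sketch estimates the ratio of the integral to w(β) by a midpoint rule on a truncated interval. It assumes w(x) = (1 + |x|)^(-3); the truncation width, step count and parameter grid are arbitrary choices, and the computation illustrates the inequality with C2 = 16 rather than proving anything.

```python
def w(x: float) -> float:
    # Assumed weight function for this section.
    return (1.0 + abs(x)) ** -3


def bound_ratio(alpha: float, beta: float,
                half_width: float = 200.0, steps: int = 40_000) -> float:
    # Midpoint-rule estimate of the integral of w(x) * w(alpha*x + beta)
    # over [-half_width, half_width], divided by w(beta).
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += w(x) * w(alpha * x + beta)
    return total * h / w(beta)


alphas = [0.0, 0.25, 0.5, 1.0]
betas = [-10.0, -1.0, 0.0, 1.0, 10.0, 100.0]
worst_ratio = max(bound_ratio(a, b) for a in alphas for b in betas)
print(f"largest ratio on the grid: {worst_ratio:.4f}")
```

On this grid the ratio stays far below 16, so the value of C2 chosen in (ii) appears to be generous.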


(e) Set $C_3 = \max\bigl(C_1^2C_2,\ \|\phi\|_2^2\big/\int_{-1/2}^{1/2} w\bigr)$.

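(i) It is worth disposing immediately of the case σ = τ. In this case,
$$|(\phi_\sigma|\phi_\tau)| = \|\phi_\sigma\|_2^2 = \|\phi\|_2^2,$$
while
$$\int_{I_\tau} w_\sigma = \int_{x_\sigma-2^{-k_\sigma-1}}^{x_\sigma+2^{-k_\sigma-1}} 2^{k_\sigma}w(2^{k_\sigma}(x-x_\sigma))\,dx = \int_{-1/2}^{1/2} w,$$
so certainly $|(\phi_\sigma|\phi_\tau)| \le C_3\int_{I_\tau} w_\sigma$.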

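(ii) Now suppose that Iσ ≠ Iτ. In this case, because kσ ≤ kτ, Iτ must all lie on the same side of xσ, so $\int_{I_\tau} w_\sigma \ge w_\sigma(x_\tau)\mu I_\tau$, by (b).
We know from 286E(c-iii) that $|\phi_\sigma(x)| \le 2^{-k_\sigma/2}C_1w_\sigma(x)$ for every x. So
$$\begin{aligned}
|(\phi_\sigma|\phi_\tau)| &\le 2^{-k_\sigma/2}\,2^{-k_\tau/2}\,C_1^2\int_{-\infty}^{\infty} w_\sigma\times w_\tau \\
&= 2^{k_\sigma/2}\,2^{k_\tau/2}\,C_1^2\int_{-\infty}^{\infty} w(2^{k_\sigma}(x-x_\sigma))\,w(2^{k_\tau}(x-x_\tau))\,dx \\
&= 2^{k_\sigma/2}\,2^{-k_\tau/2}\,C_1^2\int_{-\infty}^{\infty} w(2^{k_\sigma-k_\tau}x+2^{k_\sigma}(x_\tau-x_\sigma))\,w(x)\,dx \\
&\le 2^{k_\sigma/2}\,2^{-k_\tau/2}\,C_1^2C_2\,w(2^{k_\sigma}(x_\tau-x_\sigma)) && \text{(by (d), since $2^{k_\sigma-k_\tau}\le 1$)} \\
&\le 2^{-k_\sigma/2}\,2^{-k_\tau/2}\,C_3\,w_\sigma(x_\tau) \le 2^{-k_\sigma/2}\,2^{k_\tau/2}\,C_3\int_{I_\tau} w_\sigma,
\end{aligned}$$
as required.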
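(f) Set $C_4 = 2\sum_{j=0}^{\infty}\int_{j+\frac12}^{\infty} w$; this is finite because $\int_\alpha^\infty w = \frac{1}{2(1+\alpha)^2}$ for every α ≥ 0.
If k < kτ then kσ ≠ k for any σ ≤ τ, so the result is trivial. If k ≥ kτ, then for each dyadic subinterval I of Iτ of length $2^{-k}$ there is exactly one σ ≤ τ such that Iσ = I. List these as σ0, . . . in ascending order of the centres $x_{\sigma_j}$, so that if $I_\tau = [2^{-k_\tau}m,\ 2^{-k_\tau}(m+1)[$ then $x_{\sigma_j} = 2^{-k_\tau}m + 2^{-k}(j+\tfrac12)$, for $j < 2^{k-k_\tau}$. Now
$$\begin{aligned}
\sum_{j=0}^{2^{k-k_\tau}-1}\int_{-\infty}^{2^{-k_\tau}m} w_{\sigma_j}
&= \sum_{j=0}^{2^{k-k_\tau}-1} 2^k\int_{-\infty}^{2^{-k_\tau}m} w\bigl(2^k(x-2^{-k_\tau}m)-j-\tfrac12\bigr)\,dx \\
&= \sum_{j=0}^{2^{k-k_\tau}-1}\int_{-\infty}^{0} w(x-j-\tfrac12)\,dx \\
&\le \sum_{j=0}^{\infty}\int_{j+\frac12}^{\infty} w = \tfrac12 C_4.
\end{aligned}$$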

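Similarly (since w is an even function, so the whole picture is symmetric about xτ)
$$\sum_{j=0}^{2^{k-k_\tau}-1}\int_{2^{-k_\tau}(m+1)}^{\infty} w_{\sigma_j} \le \tfrac12 C_4,$$
and
$$\sum_{\sigma\le\tau,\ k_\sigma=k}\ \int_{\mathbb{R}\setminus I_\tau} w_\sigma \le C_4,$$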
as required.
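For orientation, the constant C4 defined in (f) can be evaluated numerically. The sketch below uses only the tail formula quoted in (f); the closed form π²/2 − 4 is not stated in the text and is offered as an assumption, derived from the classical sum of 1/(2j+1)² = π²/8.

```python
import math


def tail_integral(a: float) -> float:
    # Integral of w over [a, infinity[ for a >= 0, as quoted in (f).
    return 1.0 / (2.0 * (1.0 + a) ** 2)


def c4_partial(terms: int) -> float:
    # Partial sum of the series defining C4 = 2 * sum_j tail_integral(j + 1/2).
    return 2.0 * sum(tail_integral(j + 0.5) for j in range(terms))


approx_c4 = c4_partial(200_000)
closed_form = math.pi ** 2 / 2 - 4  # assumed closed form, see the lead-in
print(f"partial sum {approx_c4:.6f}, conjectured closed form {closed_form:.6f}")
```

Both values come out a little below 1, so on this reading C4 is a modest constant.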

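286H ‘Mass’ and ‘energy’ (Lacey & Thiele 00) If P is a subset of Q, E ⊆ R is measurable, h : R → R is measurable, and $f\in L^2_{\mathbb{C}}$, set
$$\mathrm{mass}_{Eh}(P) = \sup_{\sigma\in P,\ \tau\in Q,\ \sigma\le\tau}\ \int_{E\cap h^{-1}[J_\tau]} w_\tau \le \sup_{\tau\in Q}\int_{-\infty}^{\infty} w_\tau = 1,$$
$$\mathrm{energy}_f(P) = \sup_{\tau\in Q}\ 2^{k_\tau/2}\sqrt{\sum_{\sigma\in P,\ \sigma\le^r\tau} |(f|\phi_\sigma)|^2}.$$
If P′ ⊆ P then $\mathrm{mass}_{Eh}(P')\le\mathrm{mass}_{Eh}(P)$ and $\mathrm{energy}_f(P')\le\mathrm{energy}_f(P)$. Note that $\mathrm{energy}_f(\{\sigma\}) = 2^{k_\sigma/2}|(f|\phi_\sigma)|$ for any σ ∈ Q, since if $\sigma\le^r\tau$ then kτ ≤ kσ.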

286I Lemma Set $C_5 = 2^{12}$. If P ⊆ Q is finite, E ⊆ R is measurable, h : R → R is measurable, and $\gamma\ge\mathrm{mass}_{Eh}(P)$, then we can find sets P1 ⊆ P, P2 ⊆ Q such that $\mathrm{mass}_{Eh}(P_1)\le\tfrac14\gamma$, $\gamma\sum_{\tau\in P_2}\mu I_\tau\le C_5\mu E$ and P \ P1 ≼ P2 (in the notation of 286Fc).

proof (a) Set $P_1 = \{\sigma : \sigma\in P,\ \mathrm{mass}_{Eh}(\{\sigma\})\le\tfrac14\gamma\}$. Then $\mathrm{mass}_{Eh}(P_1)\le\tfrac14\gamma$. If γ = 0 we can stop here, as P1 = P. Otherwise, for each σ ∈ P \ P1 let σ′ ∈ Q be such that σ ≤ σ′ and $\int_{E\cap h^{-1}[J_{\sigma'}]} w_{\sigma'} > \tfrac14\gamma$. Let P2 be the set of elements of {σ′ : σ ∈ P \ P1} which are maximal for ≤; then P \ P1 ≼ P2.

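(b) For k ∈ N set
$$R_k = \{\tau : \tau\in P_2,\ 2^{k_\tau}\mu(E\cap h^{-1}[J_\tau]\cap I_\tau^{(k)}) \ge 2^{2k-9}\gamma\},$$
where $I_\tau^{(k)}$ is the interval with the same centre as Iτ and $2^k$ times its length. Now $P_2 = \bigcup_{k\in\mathbb{N}} R_k$. P Take τ ∈ P2. If k ∈ N and $x\in\mathbb{R}\setminus I_\tau^{(k)}$, then $|x-x_\tau|\ge 2^{k-k_\tau-1}$, so
$$w_\tau(x) = 2^{k_\tau}w(2^{k_\tau}(x-x_\tau)) \le 2^{k_\tau}w(2^{k-1}) = 2^{k_\tau}(1+2^{k-1})^{-3}.$$
So
$$\begin{aligned}
\tfrac14\gamma < \int_{E\cap h^{-1}[J_\tau]} w_\tau
&= \int_{E\cap h^{-1}[J_\tau]\cap I_\tau} w_\tau + \sum_{k=0}^{\infty}\int_{E\cap h^{-1}[J_\tau]\cap I_\tau^{(k+1)}\setminus I_\tau^{(k)}} w_\tau \\
&\le 2^{k_\tau}\mu(E\cap h^{-1}[J_\tau]\cap I_\tau) + \sum_{k=0}^{\infty} 2^{k_\tau}\mu(E\cap h^{-1}[J_\tau]\cap I_\tau^{(k+1)})(1+2^{k-1})^{-3}.
\end{aligned}$$
It follows that either
$$2^{k_\tau}\mu(E\cap h^{-1}[J_\tau]\cap I_\tau) \ge \tfrac18\gamma$$
and τ ∈ R0, or there is some k ∈ N such that
$$2^{k_\tau}\mu(E\cap h^{-1}[J_\tau]\cap I_\tau^{(k+1)})(1+2^{k-1})^{-3} \ge 2^{-k-4}\gamma$$
and
$$2^{k_\tau}\mu(E\cap h^{-1}[J_\tau]\cap I_\tau^{(k+1)}) \ge (1+2^{k-1})^3\,2^{-k-4}\gamma \ge 2^{2k-7}\gamma,$$
so that $\tau\in R_{k+1}$. Q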
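The "either ... or" step in (b) rests on two pieces of arithmetic: the thresholds γ/8 and 2^(-k-4)γ add up to at most γ/4, so they cannot all fail, and (1 + 2^(k-1))^3 · 2^(-k-4) ≥ 2^(2k-7). A minimal exact check with rational arithmetic, with helper names chosen for illustration:

```python
from fractions import Fraction


def growth_step_holds(k: int) -> bool:
    # Checks (1 + 2^(k-1))^3 * 2^(-k-4) >= 2^(2k-7) exactly.
    lhs = (1 + Fraction(2) ** (k - 1)) ** 3 * Fraction(2) ** (-k - 4)
    rhs = Fraction(2) ** (2 * k - 7)
    return lhs >= rhs


def threshold_total(terms: int) -> Fraction:
    # 1/8 plus the first `terms` thresholds 2^(-k-4), all in units of gamma.
    return Fraction(1, 8) + sum(Fraction(2) ** (-k - 4) for k in range(terms))


print(all(growth_step_holds(k) for k in range(64)))
print(threshold_total(50) < Fraction(1, 4))
```

Since every partial total is below 1/4, a quantity exceeding γ/4 must beat at least one of the thresholds, which is the dichotomy used above.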
(c) For every k ∈ N, $\gamma\sum_{\tau\in R_k}\mu I_\tau \le 2^{11-k}\mu E$. P If $R_k=\emptyset$, this is trivial. Otherwise, enumerate $R_k$ as $\langle\tau_j\rangle_{j\le n}$ in such a way that $k_{\tau_j}\le k_{\tau_l}$ if j ≤ l ≤ n. Define q : {0, . . . , n} → {0, . . . , n} inductively by the rule
$$q(l) = \min\bigl(\{l\}\cup\{q(j) : j<l,\ (I^{(k)}_{\tau_{q(j)}}\times J_{\tau_{q(j)}})\cap(I^{(k)}_{\tau_l}\times J_{\tau_l})\ne\emptyset\}\bigr)$$
for each l ≤ n. A simple induction shows that q(q(l)) = q(l) ≤ l for every l ≤ n. Note that, for l ≤ n, $I^{(k)}_{\tau_{q(l)}}\cap I^{(k)}_{\tau_l}\ne\emptyset$, so that
$$I_{\tau_l}\subseteq I^{(k)}_{\tau_l}\subseteq I^{(k+2)}_{\tau_{q(l)}},$$
because $\mu I^{(k)}_{\tau_l}\le\mu I^{(k)}_{\tau_{q(l)}}$. Moreover, if j < l ≤ n and q(j) = q(l), then both $J_{\tau_j}$ and $J_{\tau_l}$ meet $J_{\tau_{q(j)}}$, therefore include it, and $J_{\tau_j}\subseteq J_{\tau_l}$. But as $\tau_j$ and $\tau_l$ are distinct members of P2, $\tau_l\not\le\tau_j$ and $I_{\tau_j}\cap I_{\tau_l}$ must be empty.
Set M = {q(j) : j ≤ n}. We have
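$$\begin{aligned}
\gamma\sum_{\tau\in R_k}\mu I_\tau &= \gamma\sum_{m\in M}\ \sum_{j\le n,\ q(j)=m}\mu I_{\tau_j} \\
&\le \gamma\sum_{m\in M}\mu I^{(k+2)}_{\tau_m} = 2^{k+2}\gamma\sum_{m\in M}\mu I_{\tau_m} \\
&\le 2^{k+2}\sum_{m\in M} 2^{9-2k}\mu(E\cap h^{-1}[J_{\tau_m}]\cap I^{(k)}_{\tau_m}) \\
&\le 2^{k+2}\cdot 2^{9-2k}\mu E = 2^{11-k}\mu E
\end{aligned}$$
because if l, m ∈ M and l < m then $I^{(k)}_{\tau_l}\times J_{\tau_l}$ and $I^{(k)}_{\tau_m}\times J_{\tau_m}$ are disjoint (since otherwise q(m) ≤ l and there can be no j such that q(j) = m), so that $h^{-1}[J_{\tau_l}]\cap I^{(k)}_{\tau_l}$ and $h^{-1}[J_{\tau_m}]\cap I^{(k)}_{\tau_m}$ are disjoint. Q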
(d) Accordingly
$$\gamma\sum_{\tau\in P_2}\mu I_\tau \le \gamma\sum_{k=0}^{\infty}\sum_{\tau\in R_k}\mu I_\tau \le 2^{12}\mu E,$$
as required.

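286J Lemma If P ⊆ Q is finite and $f\in L^2_{\mathbb{C}}$, then
$$\sum_{\sigma,\tau\in P,\ J_\sigma=J_\tau}\bigl|(f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f)\bigr| \le C_3\sum_{\sigma\in P}|(f|\phi_\sigma)|^2 \le C_3\Bigl\|\sum_{\sigma\in P}(f|\phi_\sigma)\phi_\sigma\Bigr\|_2\|f\|_2.$$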

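proof
$$\begin{aligned}
\sum_{\sigma,\tau\in P,\ J_\sigma=J_\tau}\bigl|(f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f)\bigr|
&\le \sum_{\sigma,\tau\in P,\ J_\sigma=J_\tau}\tfrac12\bigl(|(f|\phi_\sigma)|^2+|(f|\phi_\tau)|^2\bigr)|(\phi_\sigma|\phi_\tau)| \\
&\qquad\text{(because $|\xi\zeta|\le\tfrac12(|\xi|^2+|\zeta|^2)$ for all complex numbers $\xi$, $\zeta$)} \\
&= \sum_{\sigma\in P}|(f|\phi_\sigma)|^2\sum_{\tau\in P,\ J_\sigma=J_\tau}|(\phi_\sigma|\phi_\tau)| \\
&\le \sum_{\sigma\in P}|(f|\phi_\sigma)|^2\sum_{\tau\in P,\ J_\sigma=J_\tau} C_3\int_{I_\tau} w_\sigma \\
&\qquad\text{(by 286Ge, since $k_\sigma=k_\tau$ if $J_\sigma=J_\tau$)} \\
&\le \sum_{\sigma\in P}|(f|\phi_\sigma)|^2\,C_3\int_{-\infty}^{\infty} w_\sigma \\
&\qquad\text{(because if $\tau$, $\tau'$ are distinct members of $P$ and $J_\tau=J_{\tau'}$, then $I_\tau$ and $I_{\tau'}$ are disjoint)} \\
&= C_3\sum_{\sigma\in P}|(f|\phi_\sigma)|^2 = C_3\sum_{\sigma\in P}(f|\phi_\sigma)(\phi_\sigma|f) \\
&= C_3\Bigl(\sum_{\sigma\in P}(f|\phi_\sigma)\phi_\sigma\Bigm| f\Bigr) \le C_3\Bigl\|\sum_{\sigma\in P}(f|\phi_\sigma)\phi_\sigma\Bigr\|_2\|f\|_2
\end{aligned}$$
by Cauchy’s inequality (244Eb).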

286K Lemma Set
$$C_6 = 4\bigl(C_3+4C_3\sqrt{2C_4}\bigr).$$
Let P ⊆ Q be a finite set, $f\in L^2_{\mathbb{C}}$ and $\|f\|_2=1$. Suppose that $\gamma\ge\mathrm{energy}_f(P)$. Then we can find finite sets P1 ⊆ P and P2 ⊆ Q such that $\mathrm{energy}_f(P_1)\le\tfrac12\gamma$, $\gamma^2\sum_{\tau\in P_2}\mu I_\tau\le C_6$, and P \ P1 ≼ P2.
proof (a) We may suppose that γ > 0 and that P ≠ ∅, since otherwise we can take P1 = P and P2 = ∅.

(i) For τ ∈ Q, A ⊆ Q set
$$T_\tau = \{\sigma : \sigma\in P,\ \sigma\le^r\tau\},\qquad \Delta(A) = \sum_{\sigma\in A}|(f|\phi_\sigma)|^2.$$
There are only finitely many sets of the form Tτ ; let R ⊆ Q be a non-empty finite set such that whenever τ ∈ Q and
Tτ is not empty, there is a τ ′ ∈ R such that Tτ = Tτ ′ and kτ ′ ≥ kτ ; this is possible because if A ⊆ P is not empty then
kτ ≤ minσ∈A kσ whenever A = Tτ .

(ii) Choose τ0, τ1, . . . , P0′, P1′, . . . inductively, as follows. P0′ = P. Given that Pj′ ⊆ P is not empty, consider
$$R_j = \{\tau : \tau\in R,\ 2^{k_\tau}\Delta(P_j'\cap T_\tau)\ge\tfrac14\gamma^2\}.$$
If $R_j=\emptyset$, stop the induction and set n = j, $P_2=\{\tau_l : l<j\}$, $P_1=P_j'$. Otherwise, among the members of $R_j$ take one with $y_\tau$ as far to the left as possible, and call it $\tau_j$; set $P_{j+1}' = P_j'\setminus\{\sigma : \sigma\in P,\ \sigma\le\tau_j\}$, and continue. Note that as $R_{j+1}\subseteq R_j$ for every j, $y_{\tau_{j+1}}\ge y_{\tau_j}$ for every j.
The induction must stop at a finite stage because if it does not stop with n = j then $\Delta(P_j'\cap T_{\tau_j})>0$, so $P_j'\cap T_{\tau_j}$ is not empty and $P_{j+1}'\subseteq P_j'\setminus T_{\tau_j}$ is a proper subset of $P_j'$, while P0′ = P is finite. Since $R_n=\emptyset$,
$$\begin{aligned}
\mathrm{energy}_f(P_1) = \mathrm{energy}_f(P_n') &= \sup_{\tau\in Q} 2^{k_\tau/2}\sqrt{\Delta(P_n'\cap T_\tau)} \\
&= \max_{\tau\in R} 2^{k_\tau/2}\sqrt{\Delta(P_n'\cap T_\tau)} \le \tfrac12\gamma.
\end{aligned}$$
We also have $P\setminus P_1 \preccurlyeq \{\tau_j : j<n\}$.


(iii) Set $P_j'' = P_j'\cap T_{\tau_j}\subseteq P_j'\setminus P_{j+1}'$ for j < n, so that $\langle P_j''\rangle_{j<n}$ is disjoint, and $P' = \bigcup_{j<n}P_j''\subseteq P$. Then if σ ∈ P′, j < n and $J_{\tau_j}\subseteq J^l_\sigma$, $I_\sigma\cap I_{\tau_j}=\emptyset$. P ? Otherwise, take l < n such that $\sigma\in P_l''$. Then $J_{\tau_j}\subseteq J_\sigma$, so $k_{\tau_j}\le k_\sigma$ and Iσ must be included in $I_{\tau_j}$; thus $\sigma\le\tau_j$ and $\sigma\notin P_{j+1}'$. On the other hand,
$$y_{\tau_j}\in J_{\tau_j}\subseteq J^l_\sigma,\qquad y_{\tau_l}\in J^r_{\tau_l}\subseteq J^r_\sigma,$$
so $y_{\tau_j}<y_{\tau_l}$ and j < l. But this means that $\sigma\notin P_l'$, while we chose l such that $\sigma\in P_l''$. X Q

It follows that if σ, τ ∈ P′ are distinct and $J^l_\sigma\cap J^l_\tau$ is not empty, then Iσ ∩ Iτ = ∅. P If Jσ = Jτ this is true just because σ ≠ τ. Otherwise, since Jσ and Jτ intersect, one is included in the other; suppose that Jσ ⊂ Jτ. Since Jσ meets $J^l_\tau$, $J_\sigma\subseteq J^l_\tau$. Now let j < n be such that $\sigma\in P_j''$; then $\sigma\le\tau_j$, so $J_{\tau_j}\subseteq J_\sigma\subseteq J^l_\tau$, and $I_\sigma\cap I_\tau\subseteq I_{\tau_j}\cap I_\tau=\emptyset$ by the last remark. Q

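(b) Now let us estimate
$$\begin{aligned}
\gamma^2\sum_{j<n}\mu I_{\tau_j} = \sum_{j<n}2^{-k_{\tau_j}}\gamma^2 &\le 4\sum_{j<n}\Delta(P_j'') \\
&= 4\sum_{j<n}\sum_{\sigma\in P_j''}|(f|\phi_\sigma)|^2 = 4\sum_{\sigma\in P'}|(f|\phi_\sigma)|^2 = 4\alpha
\end{aligned}$$
say.
Because $\|f\|_2=1$, we have $\alpha\le\|\sum_{\sigma\in P'}(f|\phi_\sigma)\phi_\sigma\|_2$ (see 286J). So
$$\begin{aligned}
\alpha^2 \le \Bigl\|\sum_{\sigma\in P'}(f|\phi_\sigma)\phi_\sigma\Bigr\|_2^2 &= \sum_{\sigma,\tau\in P'}(f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f) \\
&= \sum_{\sigma,\tau\in P',\ J_\sigma=J_\tau}(f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f) + 2\sum_{\sigma,\tau\in P',\ J_\sigma\subseteq J^l_\tau}(f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f)
\end{aligned}$$
because $(\phi_\sigma|\phi_\tau)=0$ unless $J^l_\sigma\cap J^l_\tau\ne\emptyset$, as noted in 286Eb. Take these two terms separately. For the first, we have
$$\sum_{\sigma,\tau\in P',\ J_\sigma=J_\tau}(f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f) \le C_3\alpha$$
by 286J. For the second term, we have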

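$$\begin{aligned}
\sum_{\sigma,\tau\in P',\ J_\sigma\subseteq J^l_\tau}(f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f)
&\le \sum_{\sigma\in P'}|(f|\phi_\sigma)|\sum_{\tau\in P',\ J_\sigma\subseteq J^l_\tau}|(\phi_\sigma|\phi_\tau)(\phi_\tau|f)| \\
&\le \sqrt{\sum_{\sigma\in P'}|(f|\phi_\sigma)|^2}\ \sqrt{\sum_{\sigma\in P'}\Bigl(\sum_{\tau\in P',\ J_\sigma\subseteq J^l_\tau}|(\phi_\sigma|\phi_\tau)(\phi_\tau|f)|\Bigr)^2} \\
&= \sqrt{\alpha}\,\sqrt{\sum_{j<n}H_j},
\end{aligned}$$
where for j < n I set
$$H_j = \sum_{\sigma\in P_j''}\Bigl(\sum_{\tau\in P',\ J_\sigma\subseteq J^l_\tau}|(\phi_\sigma|\phi_\tau)(\phi_\tau|f)|\Bigr)^2.$$
Now we can estimate $H_j$ by observing that, for any τ ∈ P′,
$$|(\phi_\tau|f)| = 2^{-k_\tau/2}\,\mathrm{energy}_f(\{\tau\}) \le 2^{-k_\tau/2}\gamma,$$
while if σ, τ ∈ P′ and $J^l_\tau\supseteq J_\sigma$ then
$$|(\phi_\sigma|\phi_\tau)| \le 2^{-k_\sigma/2}\,2^{k_\tau/2}\,C_3\int_{I_\tau} w_\sigma$$
by 286Ge. We also need to know that if $\sigma\in P_j''$ and τ, τ′ are distinct elements of P′ such that $J_\sigma\subseteq J^l_\tau\cap J^l_{\tau'}$, then Iτ, Iτ′ and $I_{\tau_j}$ are all disjoint, by (a-iii) above, because $J_{\tau_j}\subseteq J_\sigma$. So we have
$$\begin{aligned}
H_j &\le \sum_{\sigma\in P_j''}\Bigl(\sum_{\tau\in P',\ J_\sigma\subseteq J^l_\tau} 2^{-k_\tau/2}\gamma\,2^{-k_\sigma/2}\,2^{k_\tau/2}\,C_3\int_{I_\tau} w_\sigma\Bigr)^2 \\
&= C_3^2\gamma^2\sum_{\sigma\in P_j''}2^{-k_\sigma}\Bigl(\sum_{\tau\in P',\ J_\sigma\subseteq J^l_\tau}\int_{I_\tau} w_\sigma\Bigr)^2 \\
&\le C_3^2\gamma^2\sum_{\sigma\in P_j''}2^{-k_\sigma}\Bigl(\int_{\mathbb{R}\setminus I_{\tau_j}} w_\sigma\Bigr)^2 \\
&\le C_3^2\gamma^2\sum_{k=k_{\tau_j}}^{\infty}2^{-k}\sum_{\sigma\in P_j'',\ k_\sigma=k}\int_{\mathbb{R}\setminus I_{\tau_j}} w_\sigma\cdot\int_{-\infty}^{\infty} w_\sigma \\
&\le C_3^2\gamma^2\sum_{k=k_{\tau_j}}^{\infty}2^{-k}C_4 && \text{(by 286Gf, since $\sigma\le\tau_j$ for every $\sigma\in P_j''$)} \\
&= C_3^2\gamma^2\,2^{-k_{\tau_j}+1}C_4.
\end{aligned}$$
Accordingly
$$\sum_{j<n}H_j \le 2C_3^2\gamma^2C_4\sum_{j<n}2^{-k_{\tau_j}} \le 2C_3^2C_4\cdot 4\alpha.$$
Putting these together,
$$\begin{aligned}
\alpha^2 &\le \sum_{\sigma,\tau\in P',\ J_\sigma=J_\tau}(f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f) + 2\sum_{\sigma,\tau\in P',\ J_\sigma\subseteq J^l_\tau}(f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f) \\
&\le C_3\alpha + 2\sqrt{\alpha}\,\sqrt{\sum_{j<n}H_j} \le C_3\alpha + 4C_3\sqrt{\alpha}\,\sqrt{2C_4\alpha} \\
&= \alpha\bigl(C_3+4C_3\sqrt{2C_4}\bigr) = \tfrac14\alpha C_6.
\end{aligned}$$
But this means that
$$\gamma^2\sum_{j<n}\mu I_{\tau_j} \le 4\alpha \le C_6,$$
and $P_2=\{\tau_j : j<n\}$ has the property required.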
286L Lemma Set
$$C_7 = C_1\Bigl(6+\frac{28}{w(3/2)}+\frac{4\sqrt{14C_3}}{w(3/2)}\Bigr).$$
Suppose that P is a finite subset of Q with an upper bound τ in Q. Suppose that E ⊆ R is measurable, h : R → R is measurable and $f\in L^2_{\mathbb{C}}$. Then
$$\sum_{\sigma\in P}\Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]}\phi_\sigma\Bigr| \le 2^{-k_\tau}C_7\,\mathrm{energy}_f(P)\,\mathrm{mass}_{Eh}(P).$$

proof Set $\gamma=\mathrm{energy}_f(P)$, $\gamma'=\mathrm{mass}_{Eh}(P)$. If P = ∅ the result is trivial, so suppose that P ≠ ∅.

(a)(i) For a dyadic interval $J=[2^kn,\ 2^k(n+1)[$ set $J^*=[2^k(n-1),\ 2^k(n+2)[$, the half-open interval with the same centre as J and three times its length. Let $\mathcal{J}$ be the family of those $J\in\mathcal{I}$ such that $I_\sigma\not\subseteq J^*$ for any σ ∈ P such that µIσ ≤ µJ. Because P is finite, all sufficiently small intervals belong to $\mathcal{J}$, and $\bigcup\mathcal{J}=\mathbb{R}$; let $\mathcal{K}$ be the set of maximal members of $\mathcal{J}$, so that $\mathcal{K}$ is disjoint. Then $\bigcup\mathcal{K}=\mathbb{R}$. P The point is that P ≠ ∅; fix σ ∈ P for the moment. If $J\in\mathcal{J}$, consider for each n ∈ N the interval $\tilde J^{(n)}\in\mathcal{I}$ including J with length $2^n\mu J$. Then there is some n ∈ N such that $\mu\tilde J^{(n)}\ge\mu I_\sigma$ and $I_\sigma\subseteq(\tilde J^{(n)})^*$, so that $\tilde J^{(k)}\notin\mathcal{J}$ for any k ≥ n, and there must be some k < n such that $\tilde J^{(k)}\in\mathcal{K}$. Thus $J\subseteq\tilde J^{(k)}\subseteq\bigcup\mathcal{K}$; as J is arbitrary, $\bigcup\mathcal{K}=\bigcup\mathcal{J}=\mathbb{R}$. Q
(ii) For $K\in\mathcal{K}$, let $l_K\in\mathbb{Z}$ be such that $\mu K=2^{-l_K}$. If $l_K\ge k_\tau$, so that µK ≤ µIτ, then K must lie within the interval $\hat I$ with centre xτ and length 7µIτ, since otherwise we should have $I_\tau\cap\tilde K^*=\emptyset$, where $\tilde K$ is the dyadic interval of length 2µK including K, and $\tilde K$ would belong to $\mathcal{J}$. But this means that
$$\sum_{K\in\mathcal{K},\ l_K\ge k_\tau}\mu K \le 7\mu I_\tau = 7\cdot 2^{-k_\tau},$$
because $\mathcal{K}$ is disjoint.
(iii) For any l < kτ, there are just three members K of $\mathcal{K}$ such that $l_K=l$. P If $J\in\mathcal{I}$ and µJ > µIτ, then either $I_\tau\subseteq J^*$ or $I_\tau\cap J^*=\emptyset$, and $J\in\mathcal{J}$ iff $I_\tau\cap J^*$ is empty. This means that if $K\in\mathcal{I}$ and $\mu K=2^{-l}$, $K\in\mathcal{K}$ iff $I_\tau\cap K^*$ is empty and $I_\tau\subseteq\tilde K^*$. So if $I_\tau\subseteq[2^{-l}n,\ 2^{-l}(n+1)[$ and $K=[2^{-l}m,\ 2^{-l}(m+1)[$, we shall have $K\in\mathcal{K}$ iff
either m = n − 2 or m = n + 2 or m = n − 3 is even or m = n + 3 is odd;
which for any given n happens for just three values of m. Q
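The counting claim in (a-iii) can be tested by brute force in integer coordinates. In the sketch below the unit of length is 2^(-l), the cell [n, n+1) stands for the level-l dyadic interval containing Iτ, and K = [m, m+1); the function names are illustrative, and the membership test simply encodes the criterion stated in the proof (Iτ misses K* and lies inside the tripled parent of K).

```python
def parent_star(m: int) -> tuple[int, int]:
    # The dyadic parent of K = [m, m+1) is [2p, 2p+2) with p = m // 2;
    # tripling it about its centre gives [2p-2, 2p+4).
    p = m // 2
    return 2 * p - 2, 2 * p + 4


def is_member(m: int, n: int) -> bool:
    # Criterion from (a-iii): the cell [n, n+1) misses K* = [m-1, m+2)
    # and lies inside the tripled parent of K.
    misses_k_star = not (m - 1 <= n < m + 2)
    low, high = parent_star(m)
    return misses_k_star and low <= n < high


def by_rule(m: int, n: int) -> bool:
    # The explicit description given at the end of (a-iii).
    return (m == n - 2 or m == n + 2
            or (m == n - 3 and m % 2 == 0)
            or (m == n + 3 and m % 2 == 1))


for n in range(-20, 21):
    members = [m for m in range(n - 10, n + 11) if is_member(m, n)]
    print(n, members)
```

For every n in the sample the list has exactly three entries, and they agree with the parity rule quoted above.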
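(b) For σ ∈ P, let ζσ be a complex number of modulus 1 such that
$$\zeta_\sigma(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]}\phi_\sigma = \Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]}\phi_\sigma\Bigr|.$$
Set $W=P\times\mathcal{K}$. For (σ, K) ∈ W, set
$$\alpha_{\sigma K} = (f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]\cap K}\phi_\sigma.$$
The aim of the proof is to estimate
$$\sum_{\sigma\in P}\Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]}\phi_\sigma\Bigr| = \sum_{(\sigma,K)\in W}\zeta_\sigma\alpha_{\sigma K}.$$
It will be helpful to note straight away that
$$\sum_{(\sigma,K)\in W}|\alpha_{\sigma K}| \le \sum_{\sigma\in P}|(f|\phi_\sigma)|\int_{-\infty}^{\infty}|\phi_\sigma|$$
is finite.
Set
$$\begin{aligned}
W_0 &= \{(\sigma,K) : \sigma\in P,\ K\in\mathcal{K},\ k_\tau\le l_K\le k_\sigma\}, \\
W_1 &= \{(\sigma,K) : \sigma\in P,\ K\in\mathcal{K},\ l_K<k_\tau\}, \\
W_2 &= \{(\sigma,K) : \sigma\in P,\ K\in\mathcal{K},\ k_\sigma<l_K,\ \sigma\not\le^r\tau\}, \\
W_3 &= \{(\sigma,K) : \sigma\in P,\ K\in\mathcal{K},\ k_\sigma<l_K,\ \sigma\le^r\tau\}.
\end{aligned}$$
Because kσ ≥ kτ for every σ ∈ P, W = W0 ∪ W1 ∪ W2 ∪ W3. I will give estimates for
$$\alpha_j = \sum_{(\sigma,K)\in W_j}\zeta_\sigma\alpha_{\sigma K}$$
for each j; the three components in the expression for C7 given above are bounds for |α0| + |α1|, |α2| and |α3| respectively.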

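(c)(i) Whenever $K\in\mathcal{K}$ and σ ∈ P,
$$|(f|\phi_\sigma)| \le 2^{-k_\sigma/2}\gamma,$$
by 286H, and
$$\begin{aligned}
\int_{E\cap h^{-1}[J^r_\sigma]\cap K}|\phi_\sigma| &\le 2^{-3k_\sigma/2}C_1\int_{E\cap h^{-1}[J^r_\sigma]\cap K} w_\sigma^2 && \text{(286E(c-iii))} \\
&\le 2^{-3k_\sigma/2}C_1\int_{E\cap h^{-1}[J_\sigma]} w_\sigma\cdot\sup_{x\in K}w_\sigma(x) \\
&\le 2^{-3k_\sigma/2}C_1\gamma'\sup_{x\in K}w_\sigma(x) \\
&= 2^{-k_\sigma/2}C_1\gamma'\,w(2^{k_\sigma}\rho(x_\sigma,K)),
\end{aligned}$$
where I write ρ(xσ, K) for $\inf_{x\in K}|x-x_\sigma|$. So, for fixed $K\in\mathcal{K}$ and $k\ge l_K$,
$$\begin{aligned}
\sum_{\sigma\in P,\ k_\sigma=k}|\alpha_{\sigma K}| &\le 2^{-k}C_1\gamma\gamma'\sum_{\sigma\in P,\ k_\sigma=k}w(2^k\rho(x_\sigma,K)) \\
&\le 2^{-k}C_1\gamma\gamma'\cdot 2\sum_{n=2^{k-l_K}}^{\infty}w(n+\tfrac12)
\end{aligned}$$
because the xσ, for σ ∈ P and kσ = k, are all distinct and all a distance at least $2^{-l_K}$ from K (because $I_\sigma\not\subseteq K^*$); so there are at most two σ with $\rho(x_\sigma,K)=2^{-k}(n+\tfrac12)$ for each $n\ge 2^{k-l_K}$. So we have
$$\sum_{\sigma\in P,\ k_\sigma=k}|\alpha_{\sigma K}| \le 2^{-k}C_1\gamma\gamma'(1+2^{k-l_K})^{-2} \le 2^{-k-2}C_1\gamma\gamma'$$
by 286Ga. And this is true whenever $K\in\mathcal{K}$ and $k\ge l_K$.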

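(ii) Now
$$\begin{aligned}
|\alpha_0| \le \sum_{(\sigma,K)\in W_0}|\alpha_{\sigma K}| &= \sum_{K\in\mathcal{K},\ l_K\ge k_\tau}\ \sum_{\sigma\in P,\ k_\sigma\ge l_K}|\alpha_{\sigma K}| \\
&= \sum_{K\in\mathcal{K},\ l_K\ge k_\tau}\ \sum_{k=l_K}^{\infty}\ \sum_{\sigma\in P,\ k_\sigma=k}|\alpha_{\sigma K}| \\
&\le \sum_{K\in\mathcal{K},\ l_K\ge k_\tau}\ \sum_{k=l_K}^{\infty}2^{-k-2}C_1\gamma\gamma' = C_1\gamma\gamma'\sum_{K\in\mathcal{K},\ l_K\ge k_\tau}2^{-l_K-1} \\
&= \tfrac12C_1\gamma\gamma'\sum_{K\in\mathcal{K},\ l_K\ge k_\tau}\mu K \le 4\cdot 2^{-k_\tau}C_1\gamma\gamma'
\end{aligned}$$
by the formula in (a-ii). This deals with α0.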

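(d) Next consider W1. We have
$$\begin{aligned}
|\alpha_1| \le \sum_{(\sigma,K)\in W_1}|\alpha_{\sigma K}| &= \sum_{K\in\mathcal{K},\ l_K<k_\tau}\ \sum_{k=k_\tau}^{\infty}\ \sum_{\sigma\in P,\ k_\sigma=k}|\alpha_{\sigma K}| \\
&\le C_1\gamma\gamma'\sum_{K\in\mathcal{K},\ l_K<k_\tau}\ \sum_{k=k_\tau}^{\infty}2^{-k}(1+2^{k-l_K})^{-2} && \text{(by (c-i) above)} \\
&\le C_1\gamma\gamma'\sum_{l=-\infty}^{k_\tau-1}\ \sum_{K\in\mathcal{K},\ l_K=l}(1+2^{k_\tau-l})^{-2}\sum_{k=k_\tau}^{\infty}2^{-k} \\
&= 2^{-k_\tau+1}C_1\gamma\gamma'\sum_{l=-\infty}^{k_\tau-1}\ \sum_{K\in\mathcal{K},\ l_K=l}(1+2^{k_\tau-l})^{-2} \\
&= 6\cdot 2^{-k_\tau}C_1\gamma\gamma'\sum_{l=-\infty}^{k_\tau-1}(1+2^{k_\tau-l})^{-2} && \text{(by (a-iii))} \\
&= 6\cdot 2^{-k_\tau}C_1\gamma\gamma'\sum_{l=1}^{\infty}(1+2^l)^{-2} \le 2\cdot 2^{-k_\tau}C_1\gamma\gamma'
\end{aligned}$$
because
$$\sum_{l=1}^{\infty}(1+2^l)^{-2} \le \sum_{l=1}^{\infty}\frac{1}{4^l} = \frac13.$$
This deals with α1.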
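The last step of (d) uses the comparison of (1 + 2^l)^(-2) with 4^(-l). A small exact check with rational arithmetic, offered as an illustration with hypothetical helper names:

```python
from fractions import Fraction


def weighted_partial(terms: int) -> Fraction:
    # Partial sum of (1 + 2^l)^(-2) for l = 1, ..., terms.
    return sum(Fraction(1, (1 + 2 ** l) ** 2) for l in range(1, terms + 1))


def geometric_partial(terms: int) -> Fraction:
    # Partial sum of 4^(-l) for l = 1, ..., terms; the full series sums to 1/3.
    return sum(Fraction(1, 4 ** l) for l in range(1, terms + 1))


print(float(weighted_partial(60)), float(geometric_partial(60)))
print(6 * weighted_partial(60) < 2)
```

The factor 6 multiplying the sum therefore contributes less than 2, which matches the bound 2 · 2^(-kτ) C1 γγ′ claimed for |α1|.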
(e) For $K\in\mathcal{K}$, set $G_K = K\cap E\cap\bigcup_{\sigma\in P,\ k_\sigma<l_K}h^{-1}[J_\sigma]$. Then $\mu G_K\le 2\gamma'\mu K/w(\tfrac32)$. P If $l_K\le k_\tau$, then $G_K=\emptyset$, so we may suppose that $l_K>k_\tau$. Let $\tilde K\in\mathcal{I}$ be the dyadic interval containing K and with twice the length. Then $\tilde K\notin\mathcal{J}$, so there is a σ ∈ P such that $\tilde K^*\supseteq I_\sigma$ and $\mu I_\sigma\le\mu\tilde K$, so that $k_\sigma\ge l_K-1\ge k_\tau$. Let υ ∈ Q be such that σ ≤ υ ≤ τ and $k_\upsilon=l_K-1$ (286F(a-iv)). Then Iυ meets $\tilde K^*$, so $\tilde K$ is either equal to Iυ or adjacent to it, and $|x-x_\upsilon|\le\tfrac32\cdot 2^{-k_\upsilon}$ for every $x\in\tilde K$, therefore for every x ∈ K. Accordingly
$$w_\upsilon(x) \ge 2^{k_\upsilon}w(\tfrac32) = w(\tfrac32)/(2\mu K)$$
for every x ∈ K. On the other hand, because σ ∈ P and σ ≤ υ, $\int_{E\cap h^{-1}[J_\upsilon]}w_\upsilon\le\gamma'$. So
$$\mu(E\cap h^{-1}[J_\upsilon]\cap K) \le 2\gamma'\mu K/w(\tfrac32).$$
Now suppose that σ′ ∈ P and $k_{\sigma'}<l_K$. Then Jσ′ is the dyadic interval of length $2^{k_{\sigma'}}$ including Jτ. But Jυ is the dyadic interval of length $2^{k_\upsilon}$ including Jτ, so includes Jσ′, and $h^{-1}[J_{\sigma'}]\subseteq h^{-1}[J_\upsilon]$. As σ′ is arbitrary, $G_K\subseteq E\cap h^{-1}[J_\upsilon]\cap K$ and $\mu G_K\le 2\gamma'\mu K/w(\tfrac32)$, as claimed. Q
(f)(i) For x ∈ R, set
$$v_2(x) = \Bigl|\sum_{(\sigma,K)\in W_2}\zeta_\sigma(f|\phi_\sigma)\phi_\sigma(x)\,\chi(E\cap h^{-1}[J^r_\sigma]\cap K)(x)\Bigr|.$$
Then, for any x ∈ R, either v2(x) = 0 or there is a k ≥ kτ such that
$$v_2(x) = \Bigl|\sum_{\sigma\in P,\ k_\sigma=k}\zeta_\sigma(f|\phi_\sigma)\phi_\sigma(x)\Bigr|.$$
P If v2(x) ≠ 0, then we have a pair (υ, L) ∈ W2 such that $x\in E\cap h^{-1}[J^r_\upsilon]\cap L$. Now suppose that (σ, K) ∈ W2 and either K ≠ L or kσ ≠ kυ. If K ≠ L then of course x ∉ K, because $\mathcal{K}$ is a disjoint family. If kσ ≠ kυ, then examine Jσ and Jυ. These are dyadic intervals of different lengths, and both include Jτ. On the other hand, neither of the right-hand halves $J^r_\sigma$, $J^r_\upsilon$ includes $J^r_\tau$, because $\sigma\not\le^r\tau$ and $\upsilon\not\le^r\tau$. So either $J_\sigma\cap J^r_\upsilon=\emptyset$ (if kσ < kυ) or $J_\upsilon\cap J^r_\sigma=\emptyset$ (if kσ > kυ); in either case, $J^r_\sigma\cap J^r_\upsilon$ is empty, and $x\notin h^{-1}[J^r_\sigma]$.
On the other hand, of course, if σ ∈ P and kσ = kυ, then $k_\sigma<l_L$ and $J^r_\sigma=J^r_\upsilon$ does not include $J^r_\tau$, so that (σ, L) ∈ W2 and $x\in E\cap h^{-1}[J^r_\sigma]\cap L$.
Thus
$$v_2(x) = \Bigl|\sum_{(\sigma,K)\in W_2,\ x\in h^{-1}[J^r_\sigma]\cap K}\zeta_\sigma(f|\phi_\sigma)\phi_\sigma(x)\Bigr| = \Bigl|\sum_{\sigma\in P,\ k_\sigma=k_\upsilon}\zeta_\sigma(f|\phi_\sigma)\phi_\sigma(x)\Bigr|,$$
and we can set k = kυ. Q
(ii) It follows that v2(x) ≤ 2C1γ for every x ∈ R. P If v2(x) = 0 this is trivial. Otherwise, take k from (i). Then
$$\begin{aligned}
v_2(x) \le \sum_{\sigma\in P,\ k_\sigma=k}|(f|\phi_\sigma)\phi_\sigma(x)| &\le \sum_{\sigma\in P,\ k_\sigma=k}2^{-k/2}\gamma\cdot 2^{-k/2}C_1w_\sigma(x) && \text{(by 286H and 286E(c-iii))} \\
&= C_1\gamma\sum_{\sigma\in P,\ k_\sigma=k}w(2^k(x-x_\sigma)) \le C_1\gamma\sum_{n=-\infty}^{\infty}w(2^kx-n-\tfrac12) \\
&\qquad\text{(because the $x_\sigma$, for $\sigma\in P$ and $k_\sigma=k$, are all distinct and of the form $2^{-k}(n+\tfrac12)$)} \\
&\le 2C_1\gamma
\end{aligned}$$
by 286Gc. Q
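(iii) Note also that, if v2(x) > 0, there is a pair (σ, K) ∈ W2 such that $x\in h^{-1}[J_\sigma]\cap K$, so that $k_\tau\le k_\sigma<l_K$ and $x\in G_K$. But now we have
$$\begin{aligned}
|\alpha_2| &= \Bigl|\sum_{(\sigma,K)\in W_2}\zeta_\sigma(f|\phi_\sigma)\int_{-\infty}^{\infty}\phi_\sigma\times\chi(E\cap h^{-1}[J^r_\sigma]\cap K)\Bigr| \\
&\le \int_{-\infty}^{\infty}v_2 = \sum_{K\in\mathcal{K},\ l_K>k_\tau}\int_{G_K}v_2 \le \sum_{K\in\mathcal{K},\ l_K>k_\tau}4C_1\gamma\gamma'\mu K/w(\tfrac32) \\
&\qquad\text{(putting the estimates in (e) and (ii) just above together)} \\
&\le 28\cdot 2^{-k_\tau}C_1\gamma\gamma'/w(\tfrac32)
\end{aligned}$$
by (a-ii). This deals with α2.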


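(g) Set $P'=\{\sigma : \sigma\in P,\ \sigma\le^r\tau\}$ and $\tilde g=\sum_{\sigma\in P'}\zeta_\sigma(f|\phi_\sigma)\phi_\sigma$. Then
$$\|\tilde g\|_2^2 \le 2^{-k_\tau}C_3\gamma^2.$$
P If σ, σ′ ∈ P′ and kσ ≠ kσ′, then $(\phi_\sigma|\phi_{\sigma'})=0$ (286Fb). While if kσ = kσ′, then Jσ = Jσ′, because σ and σ′ have a common upper bound τ. So
$$\begin{aligned}
\|\tilde g\|_2^2 &= \sum_{\sigma,\sigma'\in P'}\zeta_\sigma(f|\phi_\sigma)(\phi_\sigma|\phi_{\sigma'})(\phi_{\sigma'}|f)\bar\zeta_{\sigma'} \\
&\le \sum_{\sigma,\sigma'\in P',\ J_\sigma=J_{\sigma'}}\bigl|(f|\phi_\sigma)(\phi_\sigma|\phi_{\sigma'})(\phi_{\sigma'}|f)\bigr| \le C_3\sum_{\sigma\in P'}|(f|\phi_\sigma)|^2 && \text{(by 286J)} \\
&\le C_3\cdot 2^{-k_\tau}\gamma^2
\end{aligned}$$
by the definition of ‘energy’. Q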
(h) For m ∈ N, set
$$\tilde g_m = \sum_{\sigma\in P',\ k_\sigma\le m}\zeta_\sigma(f|\phi_\sigma)\phi_\sigma.$$
Then whenever x, x′ ∈ R and $|x-x'|\le 2^{-m}$, $|\tilde g_m(x)|\le\tfrac12C_1\tilde g^*(x')$, where
$$\tilde g^*(x') = \sup_{a\le x'\le b,\ a<b}\ \frac{1}{b-a}\int_a^b|\tilde g|$$
as in 286A. P (i) Since kσ ≥ kτ for every σ ∈ P′, we may take it that m ≥ kτ. Let $\hat J$ be the dyadic interval of length $2^m$ including Jτ, and ŷ its midpoint. Set $\psi=S_{-\hat y}D_{2^{-m}/3}\hat\phi$, that is, $\psi(y)=\hat\phi(\tfrac13 2^{-m}(y-\hat y))$ for y ∈ R.

(ii) If σ ∈ P′ and kσ ≤ m and $\hat\phi_\sigma(y)\ne 0$, then y ∈ Jσ. But $J_\sigma\cap\hat J\supseteq J_\tau$ is not empty, so $J_\sigma\subseteq\hat J$, $|y-\hat y|\le\tfrac12 2^m$, $|\tfrac13 2^{-m}(y-\hat y)|\le\tfrac16$ and ψ(y) = 1.

(iii) If σ ∈ P′ and kσ > m and $\hat\phi_\sigma(y)\ne 0$, then $J^r_\sigma\cap\hat J\supseteq J^r_\tau$ is non-empty, so $\hat J\subseteq J^r_\sigma$ and y ≤ yσ ≤ ŷ; now
$$\hat y-y = (\hat y-y_\sigma)+(y_\sigma-y) \ge \tfrac12\cdot 2^m+\tfrac1{20}\cdot 2^{k_\sigma} \ge \tfrac35\cdot 2^m,\qquad |\tfrac13\cdot 2^{-m}(y-\hat y)| \ge \tfrac15$$
and ψ(y) = 0.
(iv) What this means is that if σ ∈ P′ then
$$\hat\phi_\sigma\times\psi = \hat\phi_\sigma \text{ if } k_\sigma\le m,\qquad = 0 \text{ if } k_\sigma>m,$$
so that $(\tilde g_m)^\wedge = \psi\times(\tilde g)^\wedge$.
(v) By 283M, $\tilde g_m = \frac{1}{\sqrt{2\pi}}\,\tilde g*\check\psi$, where $\tilde g*\check\psi$ is the convolution of $\tilde g$ and the inverse Fourier transform $\check\psi$ of ψ. (Strictly speaking, 283M, with the help of 284C, tells us that $\tilde g_m$ and $\frac{1}{\sqrt{2\pi}}\,\tilde g*\check\psi$ have the same Fourier transforms. By 283G, they are equal almost everywhere; by 255K, the convolution is defined everywhere and is continuous; so in fact they are the same function.) Now
$$\check\psi = 3\cdot 2^m M_{\hat y}D_{3\cdot 2^m}(\hat\phi)^\vee = 3\cdot 2^m M_{\hat y}D_{3\cdot 2^m}\phi,$$
that is,
$$\check\psi(x) = 3\cdot 2^m e^{ix\hat y}\phi(3\cdot 2^mx)$$
for x ∈ R.
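(vi) Set θ(x) = min(w(3), w(x)) for x ∈ R, so that θ is non-decreasing on ]−∞, −3], non-increasing on [3, ∞[, and constant on [−3, 3], and |φ(x)| ≤ C1θ(x) for every x, by the choice of C1 (286Ec). Take any x, x′ ∈ R such that $|x-x'|\le 2^{-m}$. Then
$$\begin{aligned}
|\tilde g_m(x)| &\le \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}|\tilde g(x-t)||\check\psi(t)|\,dt = \frac{3\cdot 2^m}{\sqrt{2\pi}}\int_{-\infty}^{\infty}|\tilde g(x-t)||\phi(3\cdot 2^mt)|\,dt \\
&\le \frac{3\cdot 2^m}{\sqrt{2\pi}}C_1\int_{-\infty}^{\infty}|\tilde g(x-t)|\theta(3\cdot 2^mt)\,dt = \frac{3\cdot 2^m}{\sqrt{2\pi}}C_1\int_{-\infty}^{\infty}|\tilde g(x+t)|\theta(3\cdot 2^mt)\,dt \\
&\qquad\text{(because $\theta$ is an even function)} \\
&\le \frac{3\cdot 2^m}{\sqrt{2\pi}}C_1\int_{-\infty}^{\infty}\theta(3\cdot 2^mt)\,dt\cdot\sup_{a\le -2^{-m},\ b\ge 2^{-m}}\frac{1}{b-a}\int_a^b|\tilde g(x+t)|\,dt \\
&\qquad\text{(by 286B, because $t\mapsto\theta(3\cdot 2^mt)$ is non-decreasing on $]-\infty,-2^{-m}]$, non-increasing on $[2^{-m},\infty[$ and constant on $[-2^{-m},2^{-m}]$)} \\
&= \frac{1}{\sqrt{2\pi}}C_1\int_{-\infty}^{\infty}\theta\cdot\sup_{a\le x-2^{-m},\ b\ge x+2^{-m}}\frac{1}{b-a}\int_a^b|\tilde g| \le \tfrac12C_1\int_{-\infty}^{\infty}w\cdot\tilde g^*(x') \\
&\qquad\text{(because if $a\le x-2^{-m}$ and $b\ge x+2^{-m}$ then $a\le x'\le b$)} \\
&= \tfrac12C_1\tilde g^*(x')
\end{aligned}$$
as required. Q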
(i) For x ∈ R, set
$$v_3(x) = \Bigl|\sum_{(\sigma,K)\in W_3}\zeta_\sigma(f|\phi_\sigma)\phi_\sigma(x)\,\chi(E\cap h^{-1}[J^r_\sigma]\cap K)(x)\Bigr|.$$
Then whenever $L\in\mathcal{K}$ and x, x′ ∈ L, $|v_3(x)|\le C_1\tilde g^*(x')$. P We may suppose that v3(x) ≠ 0, so that, in particular, x ∈ E. The only pairs (σ, K) contributing to the sum forming v3(x) are those in which x ∈ K, so that K = L, and $h(x)\in J^r_\sigma$. Moreover, since we are looking only at σ such that $\sigma\le^r\tau$, so that $J^r_\tau\subseteq J^r_\sigma$, $J^r_\sigma$ will always be the dyadic interval of length $2^{k_\sigma-1}$ including $J^r_\tau$. So these intervals are nested, and there will be some m such that (for $\sigma\le^r\tau$) $h(x)\in J^r_\sigma$ iff kσ ≥ m. Accordingly
$$v_3(x) = \Bigl|\sum_{\sigma\in P',\ m\le k_\sigma<l_L}\zeta_\sigma(f|\phi_\sigma)\phi_\sigma(x)\Bigr| = |\tilde g_{l_L-1}(x)-\tilde g_{m-1}(x)|$$
(we must have $m<l_L$ because v3(x) ≠ 0). Now $|x-x'|\le 2^{-l_L+1}\le 2^{-m+1}$, so (h) tells us that both $|\tilde g_{l_L-1}(x)|$ and $|\tilde g_{m-1}(x)|$ are at most $\tfrac12C_1\tilde g^*(x')$, and $v_3(x)\le C_1\tilde g^*(x')$, as claimed. Q
It follows that $v_3(x)\le\frac{C_1}{\mu L}\int_L\tilde g^*$ for every x ∈ L.

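(j) Now we are in a position to estimate
$$\begin{aligned}
|\alpha_3| = \Bigl|\sum_{(\sigma,K)\in W_3}\zeta_\sigma\alpha_{\sigma K}\Bigr| &\le \int_{-\infty}^{\infty}v_3 \le \sum_{K\in\mathcal{K},\ l_K>k_\tau}\int_{G_K}v_3 \\
&\qquad\text{(because if $v_3(x)\ne 0$ there are $(\sigma,K)\in W_3$ such that $x\in K$, and in this case $x\in G_K$ and $l_K>k_\sigma\ge k_\tau$)} \\
&\le \sum_{K\in\mathcal{K},\ l_K>k_\tau}\mu G_K\cdot\frac{C_1}{\mu K}\int_K\tilde g^* && \text{(by (i) above, because $G_K\subseteq K$)} \\
&\le \sum_{K\in\mathcal{K},\ l_K>k_\tau}\frac{2\gamma'}{w(\frac32)}C_1\int_K\tilde g^* && \text{(by (e))} \\
&\le \frac{2C_1\gamma'}{w(\frac32)}\int_{\hat I}\tilde g^* && \text{(because if $l_K>k_\tau$ then $K\subseteq\hat I$, as noted in (a-ii))} \\
&\le \frac{2C_1\gamma'}{w(\frac32)}\sqrt{\mu\hat I}\cdot\|\tilde g^*\|_2 && \text{(by Cauchy's inequality)} \\
&\le \frac{2C_1\gamma'}{w(\frac32)}\sqrt7\cdot 2^{-k_\tau/2}\sqrt8\,\|\tilde g\|_2 && \text{(by the Maximal Theorem, 286A)} \\
&\le \frac{4C_1\gamma'\sqrt{14}}{w(\frac32)}\,2^{-k_\tau/2}\cdot 2^{-k_\tau/2}\sqrt{C_3}\,\gamma && \text{(by (g))} \\
&= \frac{4C_1\sqrt{14C_3}}{w(\frac32)}\,\gamma\gamma'\,2^{-k_\tau}.
\end{aligned}$$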

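(k) Assembling these,
$$\begin{aligned}
\sum_{\sigma\in P}\Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]}\phi_\sigma\Bigr| &= \sum_{\sigma\in P,\ K\in\mathcal{K}}\zeta_\sigma\alpha_{\sigma K} = \sum_{j=0}^{3}\ \sum_{(\sigma,K)\in W_j}\zeta_\sigma\alpha_{\sigma K} \le \sum_{j=0}^{3}|\alpha_j| \\
&\le 4\cdot 2^{-k_\tau}C_1\gamma\gamma' + 2\cdot 2^{-k_\tau}C_1\gamma\gamma' + 28\cdot 2^{-k_\tau}C_1\gamma\gamma'/w(\tfrac32) \\
&\qquad + 4\sqrt{14C_3}\cdot 2^{-k_\tau}C_1\gamma\gamma'/w(\tfrac32) \\
&= 2^{-k_\tau}C_7\gamma\gamma',
\end{aligned}$$
as claimed.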

286M The Lacey-Thiele lemma Set C8 = 3C7(C5 + C6). Then
$$\sum_{\sigma\in Q}\Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]}\phi_\sigma\Bigr| \le C_8$$
whenever $f\in L^2_{\mathbb{C}}$, $\|f\|_2=1$, µE ≤ 1 and h : R → R is measurable.


proof (a) The first step is to combine 286I and 286K, as follows: if P ⊆ Q is finite and $\max\bigl(\sqrt{\mathrm{mass}_{Eh}(P)},\ \mathrm{energy}_f(P)\bigr)\le\gamma$, there are P′ ⊆ P and R ⊆ Q such that $\max\bigl(\sqrt{\mathrm{mass}_{Eh}(P')},\ \mathrm{energy}_f(P')\bigr)\le\tfrac12\gamma$, $\gamma^2\sum_{\tau\in R}2^{-k_\tau}\le C_5+C_6$, and P \ P′ ≼ R. P Since $\mathrm{mass}_{Eh}(P)\le\gamma^2$, 286I tells us that there are P0 ⊆ P, R0 ⊆ Q such that $\mathrm{mass}_{Eh}(P_0)\le\tfrac14\gamma^2$, $\gamma^2\sum_{\tau\in R_0}2^{-k_\tau}\le C_5$, and P \ P0 ≼ R0. Now turn to 286K: since $\mathrm{energy}_f(P_0)\le\mathrm{energy}_f(P)\le\gamma$, we can find P′ ⊆ P0 and R1 ⊆ Q such that $\mathrm{energy}_f(P')\le\tfrac12\gamma$, $\gamma^2\sum_{\tau\in R_1}2^{-k_\tau}\le C_6$, and P0 \ P′ ≼ R1.
Now $\mathrm{mass}_{Eh}(P')\le\mathrm{mass}_{Eh}(P_0)\le\tfrac14\gamma^2$, so $\max\bigl(\sqrt{\mathrm{mass}_{Eh}(P')},\ \mathrm{energy}_f(P')\bigr)\le\tfrac12\gamma$; and setting R = R0 ∪ R1, $\gamma^2\sum_{\tau\in R}2^{-k_\tau}\le C_5+C_6$, while P \ P′ ≼ R. Q
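(b) Now take any finite P ⊆ Q. Let k ∈ N be such that $\max\bigl(\sqrt{\mathrm{mass}_{Eh}(P)},\ \mathrm{energy}_f(P)\bigr)\le 2^k$. By (a), we can choose $\langle P_n\rangle_{n\in\mathbb{N}}$, $\langle R_n\rangle_{n\in\mathbb{N}}$ inductively such that P0 = P and, for each n ∈ N,
$$P_{n+1}\subseteq P_n,\qquad P_n\setminus P_{n+1}\preccurlyeq R_n,$$
$$\max\bigl(\sqrt{\mathrm{mass}_{Eh}(P_n)},\ \mathrm{energy}_f(P_n)\bigr)\le 2^{k-n},\qquad 2^{2k-2n}\sum_{\tau\in R_n}2^{-k_\tau}\le C_5+C_6.$$
Since $\mathrm{energy}_f(\{\sigma\})=2^{k_\sigma/2}|(f|\phi_\sigma)|>0$ whenever (f|φσ) ≠ 0 (286H), (f|φσ) = 0 whenever $\sigma\in\bigcap_{n\in\mathbb{N}}P_n$, and
$$\begin{aligned}
\sum_{\sigma\in P}\Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]}\phi_\sigma\Bigr|
&= \sum_{\sigma\in\bigcup_{n\in\mathbb{N}}P_n\setminus P_{n+1}}\Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]}\phi_\sigma\Bigr| \\
&= \sum_{n=0}^{\infty}\ \sum_{\sigma\in P_n\setminus P_{n+1}}\Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]}\phi_\sigma\Bigr| \\
&\le \sum_{n=0}^{\infty}\ \sum_{\tau\in R_n}\ \sum_{\sigma\in P_n,\ \sigma\le\tau}\Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]}\phi_\sigma\Bigr| \\
&\le \sum_{n=0}^{\infty}\ \sum_{\tau\in R_n}2^{-k_\tau}C_7\,\mathrm{energy}_f(P_n)\,\mathrm{mass}_{Eh}(P_n) && \text{(by 286L)} \\
&\le \sum_{n=0}^{\infty}C_7(C_5+C_6)\,2^{2n-2k}\,2^{k-n}\min(1,2^{2k-2n}) \\
&\qquad\text{(because $\mathrm{mass}_{Eh}(P_n)\le 1$ for every $n$, as noted in 286H)} \\
&= C_7(C_5+C_6)\sum_{n=0}^{\infty}\min(2^{n-k},2^{k-n}) \\
&\le C_7(C_5+C_6)\sum_{n=-\infty}^{\infty}\min(2^n,2^{-n}) = 3C_7(C_5+C_6).
\end{aligned}$$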
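The factor 3 in C8 comes from the two-sided series of min(2^n, 2^(-n)). The sketch below computes exact partial sums and also the one-sided, shifted series that actually appears in (b); function names and truncation levels are illustrative.

```python
from fractions import Fraction


def two_sided(limit: int) -> Fraction:
    # Sum of min(2^n, 2^(-n)) over -limit <= n <= limit.
    return sum(min(Fraction(2) ** n, Fraction(2) ** (-n))
               for n in range(-limit, limit + 1))


def one_sided(k: int, limit: int) -> Fraction:
    # Sum of min(2^(n-k), 2^(k-n)) over 0 <= n <= limit, as in part (b).
    return sum(min(Fraction(2) ** (n - k), Fraction(2) ** (k - n))
               for n in range(limit + 1))


print(float(two_sided(50)))
print([float(one_sided(k, 80)) for k in range(5)])
```

The two-sided partial sums approach 3 from below, and the one-sided sums never exceed them, whatever the starting level k.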

(c) Since this is true for every finite P ⊆ Q,
$$\sum_{\sigma\in Q}\Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]}\phi_\sigma\Bigr| \le C_8,$$
as claimed.

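286N Lemma Set $C_9=C_8\sqrt2$. Suppose that $f\in L^2_{\mathbb{C}}$, h : R → R is measurable and µF < ∞. Then
$$\sum_{\sigma\in Q}\Bigl|(f|\phi_\sigma)\int_{F\cap h^{-1}[J^r_\sigma]}\phi_\sigma\Bigr| \le C_9\|f\|_2\sqrt{\mu F}.$$
proof This is trivial if $\|f\|_2=0$, that is, f = 0 a.e. So we may take it that $\|f\|_2>0$. Dividing both sides by $\|f\|_2$, we may suppose that $\|f\|_2=1$.
Let k ∈ Z be such that $2^{k-1}<\mu F\le 2^k$. We have a bijection σ ↦ σ* : Q → Q defined by saying that $\sigma^*=(2^{-k}I_\sigma,\ 2^kJ_\sigma)$; so that $k_{\sigma^*}=k_\sigma+k$, $x_{\sigma^*}=2^{-k}x_\sigma$, $y^l_{\sigma^*}=2^ky^l_\sigma$, $J^r_{\sigma^*}=2^kJ^r_\sigma$, and for every x ∈ R
$$\begin{aligned}
\phi_\sigma(2^kx) &= 2^{k_\sigma/2}e^{2^kiy^l_\sigma x}\phi(2^{k_\sigma}(2^kx-x_\sigma)) \\
&= 2^{k_\sigma/2}e^{iy^l_{\sigma^*}x}\phi(2^{k_\sigma+k}(x-2^{-k}x_\sigma)) \\
&= 2^{k_\sigma/2}e^{iy^l_{\sigma^*}x}\phi(2^{k_{\sigma^*}}(x-x_{\sigma^*})) = 2^{-k/2}\phi_{\sigma^*}(x).
\end{aligned}$$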

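Write $\tilde F=2^{-k}F$, so that $\mu\tilde F\le 1$, and $\tilde h(x)=2^kh(2^kx)$ for every x. Then, for σ ∈ Q,
$$\begin{aligned}
F\cap h^{-1}[J^r_\sigma] &= \{x : x\in F,\ h(x)\in J^r_\sigma\} = \{x : 2^{-k}x\in\tilde F,\ 2^{-k}\tilde h(2^{-k}x)\in J^r_\sigma\} \\
&= \{x : 2^{-k}x\in\tilde F,\ \tilde h(2^{-k}x)\in J^r_{\sigma^*}\} = 2^k\{x : x\in\tilde F,\ \tilde h(x)\in J^r_{\sigma^*}\}.
\end{aligned}$$
Write $\tilde f(x)=2^{k/2}f(2^kx)$, so that
$$\|\tilde f\|_2 = 2^{k/2}\|D_{2^k}f\|_2 = \|f\|_2 = 1,$$
while
$$(f|\phi_\sigma) = \int_{-\infty}^{\infty}f\times\bar\phi_\sigma = 2^k\int_{-\infty}^{\infty}f(2^kx)\bar\phi_\sigma(2^kx)\,dx = (\tilde f|\phi_{\sigma^*})$$
for every σ ∈ Q. Putting all these together,
$$\begin{aligned}
\sum_{\sigma\in Q}\Bigl|(f|\phi_\sigma)\int_{F\cap h^{-1}[J^r_\sigma]}\phi_\sigma\Bigr| &= 2^k\sum_{\sigma\in Q}\Bigl|(f|\phi_\sigma)\int_{2^{-k}(F\cap h^{-1}[J^r_\sigma])}\phi_\sigma(2^kx)\,dx\Bigr| \\
&= 2^{k/2}\sum_{\sigma\in Q}\Bigl|(\tilde f|\phi_{\sigma^*})\int_{\tilde F\cap\tilde h^{-1}[J^r_{\sigma^*}]}\phi_{\sigma^*}\Bigr| \\
&= 2^{k/2}\sum_{\tau\in Q}\Bigl|(\tilde f|\phi_\tau)\int_{\tilde F\cap\tilde h^{-1}[J^r_\tau]}\phi_\tau\Bigr| \\
&\le 2^{k/2}C_8 && \text{(by the Lacey-Thiele lemma, applied to $\tilde h$, $\tilde F$ and $\tilde f$)} \\
&\le C_9\sqrt{\mu F} = C_9\|f\|_2\sqrt{\mu F}.
\end{aligned}$$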
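The constant √2 in C9 comes from the choice of k with 2^(k-1) < µF ≤ 2^k, which gives 2^(k/2) ≤ √(2µF). A minimal sketch of that normalization, using `math.frexp` to find the exponent without relying on rounded logarithms; the function name and sample values are illustrative.

```python
import math


def dyadic_exponent(mu: float) -> int:
    # Returns the integer k with 2**(k-1) < mu <= 2**k, for mu > 0.
    # math.frexp gives mu = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(mu)
    return exponent - 1 if mantissa == 0.5 else exponent


for mu in [0.3, 1.0, 5.0, 8.0, 1000.0]:
    k = dyadic_exponent(mu)
    scaled_measure = mu / 2.0 ** k          # measure of the rescaled set, at most 1
    loss = 2.0 ** (k / 2) / math.sqrt(mu)   # factor lost in the rescaling, at most sqrt(2)
    print(mu, k, scaled_measure, round(loss, 4))
```

The rescaled set has measure in ]1/2, 1], so the Lacey-Thiele lemma applies, and the factor lost in passing back is never more than √2.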

286O Lemma Suppose that $f\in L^2_{\mathbb{C}}$. For x ∈ R, set
$$(Af)(x) = \sup_{z\in\mathbb{R}}\ \sum_{\sigma\in Q,\ z\in J^r_\sigma}|(f|\phi_\sigma)\phi_\sigma(x)|.$$
Then Af : R → [0, ∞] is Borel measurable, and $\int_FAf\le C_9\|f\|_2\sqrt{\mu F}$ whenever µF < ∞.
proof (a) For z ∈ R and finite P ⊆ Q, set $g_{Pz}=\sum_{\sigma\in P,\ z\in J^r_\sigma}|(f|\phi_\sigma)\phi_\sigma|$, so that
$$Af(x) = \sup\{g_{Pz}(x) : P\subseteq Q \text{ is finite},\ z\in\mathbb{R}\}.$$
Because every $g_{Pz}$ is continuous, Af is Borel measurable and
$$\int_FAf = \sup\Bigl\{\int_Fg_{P_0z_0}\vee\ldots\vee g_{P_nz_n} : P_0,\ldots,P_n\subseteq Q \text{ are finite},\ z_0,\ldots,z_n\in\mathbb{R}\Bigr\}$$
for every measurable set F (256M).
(b) Given finite sets P0, . . . , Pn ⊆ Q and z0, . . . , zn, x ∈ R, set
$$g(x) = \max_{j\le n}g_{P_jz_j}(x),\qquad l(x) = \min\{j : j\le n,\ g(x)=g_{P_jz_j}(x)\},\qquad h(x) = z_{l(x)};$$
because every $g_{P_jz_j}$ is continuous, l : R → {0, . . . , n} and h : R → R are measurable. Now
$$\begin{aligned}
\int_Fg = \int_Fg_{P_{l(x)},h(x)}(x)\,dx &= \int_F\ \sum_{\sigma\in P_{l(x)},\ h(x)\in J^r_\sigma}|(f|\phi_\sigma)\phi_\sigma(x)|\,dx \\
&\le \int_F\ \sum_{\sigma\in Q,\ h(x)\in J^r_\sigma}|(f|\phi_\sigma)\phi_\sigma(x)|\,dx \\
&= \sum_{\sigma\in Q}\int_{F\cap h^{-1}[J^r_\sigma]}|(f|\phi_\sigma)\phi_\sigma| \le C_9\|f\|_2\sqrt{\mu F}
\end{aligned}$$
by 286N. Since P0, . . . , Pn and z0, . . . , zn are arbitrary, $\int_FAf\le C_9\|f\|_2\sqrt{\mu F}$.

286P Lemma For any z ∈ R, define θz : R → [0, 1] by setting
$$\theta_z(y) = \hat\phi(2^{-k}(y-\hat y))^2$$
whenever there is a dyadic interval $J\in\mathcal{I}$ of length $2^k$ such that z belongs to the right-hand half of J and y belongs to the left-hand half of J and ŷ is the lower quartile of J, and zero if there is no such J. Then 0 ≤ θz(y) ≤ 1 for every y ∈ R, θz(y) = 0 if y ≥ z, and $2\pi|(\hat f\times\theta_z)^\vee|\le Af$ for any rapidly decreasing test function f.

proof (a) I had better start by explaining why the recipe above defines a function θz. Let M be the set of those k ∈ Z such that z belongs to the right-hand half of the dyadic interval $\hat J_k$ of length $2^k$ containing z. For k ∈ M, let $\hat y_k$ be the midpoint of the left-hand half $\hat J^l_k$ of $\hat J_k$, and set $\psi_k(y)=\hat\phi(2^{-k}(y-\hat y_k))^2$ for y ∈ R; then ψk is smooth and zero outside $\hat J^l_k$. But now observe that if k, k′ are distinct members of M, then $\hat J^l_k$ and $\hat J^l_{k'}$ are disjoint, as observed in 286E(b-iii). So θz is just the sum $\sum_{k\in M}\psi_k$. Because $\hat\phi$ takes values in [0, 1], so does θz. If y ≥ z, then of course $y\notin\hat J^l_k$ for any k ∈ M, so θz(y) = 0.
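The disjointness used in (a) can be seen concretely. The sketch below lists, for a rational z and a finite range of levels k (the range is an arbitrary cut-off for illustration), the left-hand halves of those dyadic intervals of length 2^k that contain z in their right-hand half, and checks that they are pairwise disjoint and lie to the left of z.

```python
import math
from fractions import Fraction


def left_halves(z: Fraction, levels=range(-8, 9)) -> dict:
    # For each level k such that z lies in the right-hand half of the dyadic
    # interval [2^k m, 2^k (m+1)[ containing it, record the left-hand half [a, b[.
    found = {}
    for k in levels:
        length = Fraction(2) ** k
        left_end = math.floor(z / length) * length
        if z - left_end >= length / 2:
            found[k] = (left_end, left_end + length / 2)
    return found


def pairwise_disjoint(intervals) -> bool:
    items = sorted(intervals)
    return all(items[i][1] <= items[i + 1][0] for i in range(len(items) - 1))


halves = left_halves(Fraction(3, 4))
print(halves)
print(pairwise_disjoint(halves.values()))
```

For z = 3/4 the qualifying levels in this range are k = −1 and k = 0, with left halves [1/2, 3/4[ and [0, 1/2[, so each y < z meets at most one of them.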
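(b) Fix a rapidly decreasing test function f. For k ∈ M, set $R_k=\{(I,\hat J_k) : I\in\mathcal{I},\ \mu I=2^{-k}\}$, so that $\{\sigma : \sigma\in Q,\ z\in J^r_\sigma\}=\bigcup_{k\in M}R_k$, and $\sum_{k\in M}\sum_{\sigma\in R_k}|(f|\phi_\sigma)\phi_\sigma(x)|\le(Af)(x)$.
Now, for k ∈ M, $2\pi(\hat f\times\psi_k)^\vee=\sum_{\sigma\in R_k}(f|\phi_\sigma)\phi_\sigma$. P If σ ∈ Rk, $y^l_\sigma=\hat y_k$ and xσ is of the form $2^{-k}(n+\tfrac12)$ for some n ∈ Z. So
$$\begin{aligned}
(f|\phi_\sigma) &= \int_{-\infty}^{\infty}\hat f\times\overline{\hat\phi_\sigma} && \text{(284O)} \\
&= \int_{-\infty}^{\infty}\hat f(t)\cdot 2^{-k/2}e^{2^{-k}i(n+\frac12)(t-\hat y_k)}\hat\phi(2^{-k}(t-\hat y_k))\,dt \\
&\qquad\text{(by the formula in 286Eb, because $\hat\phi$ is real-valued)} \\
&= 2^{k/2}\int_{-\infty}^{\infty}\hat f(2^kt+\hat y_k)e^{i(n+\frac12)t}\hat\phi(t)\,dt \\
&= 2^{k/2}\int_{-\pi}^{\pi}\hat f(2^kt+\hat y_k)e^{i(n+\frac12)t}\hat\phi(t)\,dt && \text{(because $\hat\phi(t)=0$ if $|t|\ge\tfrac15$)} \\
&= 2^{k/2}\int_{-\pi}^{\pi}g(t)e^{int}\,dt,
\end{aligned}$$
where $g(t)=\hat f(2^kt+\hat y_k)e^{it/2}\hat\phi(t)$ for −π < t ≤ π. So if we set $c_n=\frac{1}{2\pi}\int_{-\pi}^{\pi}g(t)e^{-int}\,dt$, as in 282A, we have
$$(f|\phi_\sigma) = 2^{k/2}\cdot 2\pi c_{-n}$$
when σ ∈ Rk and $x_\sigma=2^{-k}(n+\tfrac12)$. Note that as g is smooth and zero outside $[-\tfrac15,\tfrac15]$, $\sum_{n=-\infty}^{\infty}|c_n|<\infty$ (282Rb).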
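Now, for any $y\in\hat J^l_k$,
$$\begin{aligned}
\sum_{\sigma\in R_k}(f|\phi_\sigma)\hat\phi_\sigma(y) &= \sum_{n=-\infty}^{\infty}2^{k/2}\cdot 2\pi c_{-n}\cdot 2^{-k/2}e^{-2^{-k}i(n+\frac12)(y-\hat y_k)}\hat\phi(2^{-k}(y-\hat y_k)) \\
&= 2\pi\hat\phi(2^{-k}(y-\hat y_k))e^{-2^{-k-1}i(y-\hat y_k)}\sum_{n=-\infty}^{\infty}c_{-n}e^{-2^{-k}in(y-\hat y_k)} \\
&= 2\pi\hat\phi(2^{-k}(y-\hat y_k))e^{-2^{-k-1}i(y-\hat y_k)}\sum_{n=-\infty}^{\infty}c_ne^{in2^{-k}(y-\hat y_k)} \\
&= 2\pi\hat\phi(2^{-k}(y-\hat y_k))e^{-2^{-k-1}i(y-\hat y_k)}g(2^{-k}(y-\hat y_k)) \\
&\qquad\text{(by 282L, because $2^{-k}|y-\hat y_k|\le\tfrac14<\pi$)} \\
&= 2\pi e^{-2^{-k-1}i(y-\hat y_k)}\hat\phi(2^{-k}(y-\hat y_k))\hat f(y)e^{2^{-k-1}i(y-\hat y_k)}\hat\phi(2^{-k}(y-\hat y_k)) \\
&= 2\pi\hat f(y)\psi_k(y).
\end{aligned}$$
On the other hand, if $y\in\mathbb{R}\setminus\hat J^l_k$, $\psi_k(y)=\hat\phi_\sigma(y)=0$ for every σ ∈ Rk, so again $\sum_{\sigma\in R_k}(f|\phi_\sigma)\hat\phi_\sigma(y)=2\pi\hat f(y)\psi_k(y)$.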
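Next,
$$\sum_{\sigma\in R_k}|(f|\phi_\sigma)| = 2\pi\cdot 2^{k/2}\sum_{n=-\infty}^{\infty}|c_n|$$
and
$$\sup_{\sigma\in R_k}\int_{-\infty}^{\infty}|\hat\phi_\sigma| = 2^{k/2}\int_{-\infty}^{\infty}|\hat\phi|$$
are finite. So $\sum_{\sigma\in R_k}\int|(f|\phi_\sigma)\hat\phi_\sigma|$ is finite. Accordingly, for any x ∈ R,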

∧ X ∧ ∨
2π(f × ψk )∨ (x) = (f |φσ )φσ (x)
σ∈Rk
Z ∞ X
1 ∧
= √ (f |φσ )φσ (y)eixy dy
2π −∞ σ∈R
k

1 X Z ∞ ∧
= √ (f |φσ )φσ (y)eixy dy
2π −∞
σ∈Rk
(226E)
1 X ∧ 1 X
= √ (f |φσ )(φσ )∨ (x) = √ (f |φσ )φσ (x)
2π 2π
σ∈Rk σ∈Rk

by 284C. Q
Q
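(c) Because every ψk is non-negative, $\theta_z=\sum_{k\in M}\psi_k$ is bounded above by 1, and $\hat f$ is integrable,
$$\begin{aligned}
2\pi(\hat f\times\theta_z)^\vee(x) &= 2\pi\cdot\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{ixy}\hat f(y)\theta_z(y)\,dy \\
&= \sqrt{2\pi}\int_{-\infty}^{\infty}\sum_{k\in M}e^{ixy}\hat f(y)\psi_k(y)\,dy \\
&= \sqrt{2\pi}\sum_{k\in M}\int_{-\infty}^{\infty}e^{ixy}\hat f(y)\psi_k(y)\,dy && \text{(by 226E again)} \\
&= 2\pi\sum_{k\in M}(\hat f\times\psi_k)^\vee(x) = \sum_{k\in M}\sum_{\sigma\in R_k}(f|\phi_\sigma)\phi_\sigma(x),
\end{aligned}$$
and
$$2\pi|(\hat f\times\theta_z)^\vee(x)| \le \sum_{k\in M}\sum_{\sigma\in R_k}|(f|\phi_\sigma)\phi_\sigma(x)| \le (Af)(x).$$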


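286Q Lemma For α > 0 and y, z, β ∈ R, set $\theta'_{z\alpha\beta}(y)=\theta_{\alpha z+\beta}(\alpha y+\beta)$. Then
(a) the function $(\alpha,\beta,y,z)\mapsto\theta'_{z\alpha\beta}(y) : \ ]0,\infty[\ \times\mathbb{R}^3\to[0,1]$ is Borel measurable;
(b) for any rapidly decreasing test function f,
$$2\pi|(\hat f\times\theta'_{z\alpha\beta})^\vee| \le D_{1/\alpha}A(M_\beta D_\alpha f)$$
(in the notation of 286C) at every point.

proof (a) We need only observe that $(y,z)\mapsto\theta_z(y) : \mathbb{R}^2\to\mathbb{R}$ is Borel measurable, and that $(\alpha,\beta,y,z)\mapsto\theta'_{z\alpha\beta}(y)$ is built up from this, + and ×.

(b) Set v = αz + β, so that $\theta'_{z\alpha\beta}=D_\alpha S_\beta\theta_v$. Then
$$\begin{aligned}
\hat f\times\theta'_{z\alpha\beta} = \hat f\times D_\alpha S_\beta\theta_v &= D_\alpha S_\beta(S_{-\beta}D_{1/\alpha}\hat f\times\theta_v) \\
&= \alpha D_\alpha S_\beta(S_{-\beta}(D_\alpha f)^\wedge\times\theta_v) = \alpha D_\alpha S_\beta((M_\beta D_\alpha f)^\wedge\times\theta_v),
\end{aligned}$$
so
$$\begin{aligned}
(\hat f\times\theta'_{z\alpha\beta})^\vee &= \alpha\bigl(D_\alpha S_\beta((M_\beta D_\alpha f)^\wedge\times\theta_v)\bigr)^\vee \\
&= D_{1/\alpha}\bigl(S_\beta((M_\beta D_\alpha f)^\wedge\times\theta_v)\bigr)^\vee \\
&= D_{1/\alpha}M_{-\beta}\bigl((M_\beta D_\alpha f)^\wedge\times\theta_v\bigr)^\vee
\end{aligned}$$
and
$$2\pi|(\hat f\times\theta'_{z\alpha\beta})^\vee| = 2\pi D_{1/\alpha}\bigl|\bigl((M_\beta D_\alpha f)^\wedge\times\theta_v\bigr)^\vee\bigr| \le D_{1/\alpha}A(M_\beta D_\alpha f)$$
by 286P.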

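286R Lemma For any y, z ∈ R,
$$\tilde\theta_z(y) = \int_1^2\frac{1}{\alpha}\Bigl(\lim_{n\to\infty}\frac1n\int_0^n\theta'_{z\alpha\beta}(y)\,d\beta\Bigr)d\alpha$$
is defined, and
$$\tilde\theta_z(y) = \tilde\theta_1(0) > 0 \text{ if } y<z,\qquad = 0 \text{ if } y\ge z.$$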

proof (a) The case y ≥ z is trivial; because if y ≥ z then αy + β ≥ αz + β for all α > 0 and β ∈ R, so that $\theta'_{z\alpha\beta}(y)=0$ for every α > 0, β ∈ R and $\tilde\theta_z(y)=0$. For the rest of the proof, therefore, I look at the case y < z.
(b)(i) Given y < z ∈ R and α > 0, set $l=\lfloor\log_2(20\alpha(z-y))\rfloor$. Then $\theta'_{z,\alpha,\beta+2^l}(y)=\theta'_{z\alpha\beta}(y)$ for every β ∈ R. P If $\theta'_{z\alpha\beta}(y)=\theta_{\alpha z+\beta}(\alpha y+\beta)$ is non-zero, there must be k, m ∈ Z such that
$$2^k(m+\tfrac12) \le \alpha z+\beta < 2^k(m+1)$$
and
$$\hat\phi\bigl(2^{-k}(\alpha y+\beta)-(m+\tfrac14)\bigr)^2 = \theta'_{z\alpha\beta}(y) \ne 0,$$
so
$$2^km \le \alpha y+\beta \le 2^k(m+\tfrac{9}{20})$$
because $\hat\phi$ is zero outside $[-\tfrac15,\tfrac15]$. In this case, $\tfrac{1}{20}\cdot 2^k<\alpha(z-y)$, so that k ≤ l. We therefore have
$$2^k(m+2^{l-k}+\tfrac12) \le \alpha z+\beta+2^l < 2^k(m+2^{l-k}+1),$$
$$2^k(m+2^{l-k}) \le \alpha y+\beta+2^l < 2^k(m+2^{l-k}+\tfrac12),$$
so
$$\theta'_{z,\alpha,\beta+2^l}(y) = \hat\phi\bigl(2^{-k}(\alpha y+\beta+2^l)-(m+2^{l-k}+\tfrac14)\bigr)^2 = \theta'_{z\alpha\beta}(y).$$
Similarly,
$$2^k(m-2^{l-k}+\tfrac12) \le \alpha z+\beta-2^l < 2^k(m-2^{l-k}+1),$$
$$2^k(m-2^{l-k}) \le \alpha y+\beta-2^l < 2^k(m-2^{l-k}+\tfrac12),$$
so
$$\theta'_{z,\alpha,\beta-2^l}(y) = \hat\phi\bigl(2^{-k}(\alpha y+\beta-2^l)-(m-2^{l-k}+\tfrac14)\bigr)^2 = \theta'_{z\alpha\beta}(y).$$
What this shows is that $\theta'_{z,\alpha,\beta+2^l}(y)=\theta'_{z\alpha\beta}(y)$ if either is non-zero, so we have the equality in any case. Q

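(ii) It follows that $g(\alpha, y, z) = \lim_{b\to\infty} \frac{1}{b} \int_0^b \theta'_{z\alpha\beta}(y)\,d\beta$ is defined. P Set
$$\gamma = 2^{-l} \int_0^{2^l} \theta'_{z\alpha\beta}(y)\,d\beta.$$
From (i) we see that
$$\gamma = 2^{-l} \int_{2^l m}^{2^l(m+1)} \theta'_{z\alpha\beta}(y)\,d\beta$$
for every $m \in \mathbb{Z}$, and therefore that
$$\gamma = \frac{1}{2^l m} \int_0^{2^l m} \theta'_{z\alpha\beta}(y)\,d\beta$$
for every $m \ge 1$. Now $\theta'_{z\alpha\beta}(y)$ is always greater than or equal to 0, so if $2^l m \le b \le 2^l(m + 1)$ then
$$\frac{m}{m+1}\gamma = \frac{1}{2^l(m+1)} \int_0^{2^l m} \theta'_{z\alpha\beta}(y)\,d\beta \le \frac{1}{b} \int_0^b \theta'_{z\alpha\beta}(y)\,d\beta \le \frac{1}{2^l m} \int_0^{2^l(m+1)} \theta'_{z\alpha\beta}(y)\,d\beta = \frac{m+1}{m}\gamma,$$
and both bounds approach $\gamma$ as $b \to \infty$. Q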
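The sandwich argument in (b)(ii) says only that the running average of a non-negative periodic function converges to its average over one period. A minimal numerical sketch follows; the function `theta` is a hypothetical stand-in with period 4 (it is not the $\theta'_{z\alpha\beta}$ of the text), and the integrals are approximated by a midpoint rule.

```python
import math

PERIOD = 4.0  # plays the role of 2^l; an assumption made for illustration


def theta(beta):
    """Hypothetical non-negative function of beta with period PERIOD."""
    return max(0.0, math.sin(2 * math.pi * beta / PERIOD)) ** 2


def mean_over(b, steps=200_000):
    """Midpoint-rule approximation of (1/b) * integral_0^b theta(beta) d(beta)."""
    h = b / steps
    return sum(theta((i + 0.5) * h) for i in range(steps)) * h / b


gamma = mean_over(PERIOD)  # average over exactly one period
for b in (10.0, 101.0, 1003.0):
    m = int(b // PERIOD)   # PERIOD * m <= b <= PERIOD * (m + 1)
    print(b, m / (m + 1) * gamma, mean_over(b), (m + 1) / m * gamma)
```

Each printed row shows the lower bound, the running average and the upper bound from the displayed inequality; the three columns close up as b grows.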
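(c) Because $(\alpha, \beta) \mapsto \theta'_{z\alpha\beta}(y)$ is Borel measurable, each of the functions $\alpha \mapsto \frac{1}{n} \int_0^n \theta'_{z\alpha\beta}(y)\,d\beta$, for $n \ge 1$, is Borel measurable (putting 251M and 252P together), and $\alpha \mapsto g(\alpha, y, z) : \,]0, \infty[\, \to \mathbb{R}$ is Borel measurable; at the same time, since $0 \le \theta'_{z\alpha\beta}(y) \le 1$ for all $\alpha$ and $\beta$, $0 \le g(\alpha, y, z) \le 1$ for every $\alpha$, and $\tilde\theta_z(y) = \int_1^2 \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha$ is defined in $[0, 1]$.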

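(d) For any $y < z$, $\gamma \in \mathbb{R}$ and $\alpha > 0$, $g(\alpha, y + \gamma, z + \gamma) = g(\alpha, y, z)$. P It is enough to consider the case $\gamma \ge 0$. In this case
$$\begin{aligned}
g(\alpha, y + \gamma, z + \gamma) &= \lim_{b\to\infty} \frac{1}{b} \int_0^b \theta'_{z+\gamma,\alpha,\beta}(y + \gamma)\,d\beta \\
&= \lim_{b\to\infty} \frac{1}{b} \int_0^b \theta_{\alpha z+\alpha\gamma+\beta}(\alpha y + \alpha\gamma + \beta)\,d\beta \\
&= \lim_{b\to\infty} \frac{1}{b} \int_{\alpha\gamma}^{b+\alpha\gamma} \theta_{\alpha z+\beta}(\alpha y + \beta)\,d\beta \\
&= \lim_{b\to\infty} \frac{1}{b} \int_{\alpha\gamma}^{b+\alpha\gamma} \theta'_{z\alpha\beta}(y)\,d\beta,
\end{aligned}$$
so
$$\begin{aligned}
|g(\alpha, y + \gamma, z + \gamma) - g(\alpha, y, z)| &= \lim_{b\to\infty} \frac{1}{b} \Bigl| \int_b^{b+\alpha\gamma} \theta'_{z\alpha\beta}(y)\,d\beta - \int_0^{\alpha\gamma} \theta'_{z\alpha\beta}(y)\,d\beta \Bigr| \\
&\le \lim_{b\to\infty} \frac{2\alpha\gamma}{b} = 0. \text{ Q}
\end{aligned}$$
It follows that whenever $y < z$ and $\gamma \in \mathbb{R}$,
$$\tilde\theta_{z+\gamma}(y + \gamma) = \int_1^2 \frac{1}{\alpha} g(\alpha, y + \gamma, z + \gamma)\,d\alpha = \int_1^2 \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha = \tilde\theta_z(y).$$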

(e) The next essential fact to note is that $\theta_{2z}(2y)$ is always equal to $\theta_z(y)$. P If $\theta_z(y) \ne 0$, then (as in (b) above) there are $k, m \in \mathbb{Z}$ such that
$$2^k(m + \tfrac12) \le z < 2^k(m + 1), \quad 2^k m \le y \le 2^k(m + \tfrac12), \quad \theta_z(y) = \phi^{\wedge}\bigl(2^{-k}y - (m + \tfrac14)\bigr)^2.$$
In this case,
$$2^{k+1}(m + \tfrac12) \le 2z < 2^{k+1}(m + 1), \quad 2^{k+1}m \le 2y \le 2^{k+1}(m + \tfrac12),$$
so
$$\theta_{2z}(2y) = \phi^{\wedge}\bigl(2^{-k-1} \cdot 2y - (m + \tfrac14)\bigr)^2 = \theta_z(y).$$
Similarly,
$$2^{k-1}(m + \tfrac12) \le \tfrac12 z < 2^{k-1}(m + 1), \quad 2^{k-1}m \le \tfrac12 y \le 2^{k-1}(m + \tfrac12),$$
so
$$\theta_{\frac12 z}(\tfrac12 y) = \phi^{\wedge}\bigl(2^{-k+1} \cdot \tfrac12 y - (m + \tfrac14)\bigr)^2 = \theta_z(y).$$
This shows that $\theta_{2z}(2y) = \theta_z(y)$ if either is non-zero, and therefore in all cases. Q
Accordingly
$$\theta'_{z,2\alpha,2\beta}(y) = \theta_{2\alpha z+2\beta}(2\alpha y + 2\beta) = \theta_{\alpha z+\beta}(\alpha y + \beta) = \theta'_{z\alpha\beta}(y)$$
for all $y, z, \beta \in \mathbb{R}$ and all $\alpha > 0$.
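(f) Consequently
$$\begin{aligned}
g(2\alpha, y, z) &= \lim_{b\to\infty} \frac{1}{b} \int_0^b \theta'_{z,2\alpha,\beta}(y)\,d\beta = \lim_{b\to\infty} \frac{2}{b} \int_0^{b/2} \theta'_{z,2\alpha,2\beta}(y)\,d\beta \\
&= \lim_{b\to\infty} \frac{2}{b} \int_0^{b/2} \theta'_{z\alpha\beta}(y)\,d\beta = \lim_{b\to\infty} \frac{1}{b} \int_0^b \theta'_{z\alpha\beta}(y)\,d\beta = g(\alpha, y, z)
\end{aligned}$$
whenever $\alpha > 0$ and $y, z \in \mathbb{R}$. It follows that
$$\int_\gamma^\delta \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha = \int_\gamma^\delta \frac{1}{\alpha} g(2\alpha, y, z)\,d\alpha = \int_{2\gamma}^{2\delta} \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha$$
whenever $0 < \gamma \le \delta$, and therefore that
$$\int_\gamma^{2\gamma} \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha = \int_1^2 \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha$$
for every $\gamma > 0$. P Take $k \in \mathbb{Z}$ such that $2^k \le \gamma < 2^{k+1}$. Then
$$\begin{aligned}
\int_\gamma^{2\gamma} \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha &= \int_{2^k}^{2^{k+1}} \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha - \int_{2^k}^{\gamma} \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha + \int_{2^{k+1}}^{2\gamma} \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha \\
&= \int_{2^k}^{2^{k+1}} \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha = \int_1^2 \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha. \text{ Q}
\end{aligned}$$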
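The identity at the end of (f) uses nothing about $g$ beyond $g(2\alpha) = g(\alpha)$. As a hedged numerical illustration, the sketch below takes a hypothetical function `g` with that dilation invariance (it is not the $g(\alpha, y, z)$ of the proof) and compares $\int_\gamma^{2\gamma} \frac{1}{\alpha} g(\alpha)\,d\alpha$ with $\int_1^2 \frac{1}{\alpha} g(\alpha)\,d\alpha$ by a midpoint rule.

```python
import math


def g(alpha):
    """Hypothetical bounded function satisfying g(2 * alpha) == g(alpha)."""
    return 0.5 * (1.0 + math.sin(2 * math.pi * math.log2(alpha)))


def log_integral(lo, hi, steps=100_000):
    """Midpoint-rule approximation of integral_lo^hi g(alpha) / alpha d(alpha)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        a = lo + (i + 0.5) * h
        total += g(a) / a
    return total * h


reference = log_integral(1.0, 2.0)
for gamma in (0.3, 1.7, 5.0):
    print(gamma, log_integral(gamma, 2 * gamma), reference)
```

For this particular `g` the common value is $\frac12 \ln 2$, as the substitution $u = \log_2 \alpha$ shows.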

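(g) Now if $\alpha, \gamma > 0$ and $y < z$,
$$g(\alpha, \gamma y, \gamma z) = \lim_{b\to\infty} \frac{1}{b} \int_0^b \theta_{\alpha\gamma z+\beta}(\alpha\gamma y + \beta)\,d\beta = g(\alpha\gamma, y, z).$$
So if $\gamma > 0$ and $y < z$,
$$\begin{aligned}
\tilde\theta_{\gamma z}(\gamma y) &= \int_1^2 \frac{1}{\alpha} g(\alpha, \gamma y, \gamma z)\,d\alpha = \int_1^2 \frac{1}{\alpha} g(\alpha\gamma, y, z)\,d\alpha \\
&= \int_\gamma^{2\gamma} \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha = \int_1^2 \frac{1}{\alpha} g(\alpha, y, z)\,d\alpha = \tilde\theta_z(y).
\end{aligned}$$
Putting this together with (d), we see that if $y < z$ then
$$\tilde\theta_z(y) = \tilde\theta_{z-y}(0) = \tilde\theta_1(0).$$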

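(h) I have still to check that $\tilde\theta_1(0)$ is not zero. But suppose that $1 \le \alpha < \tfrac76$ and that there is some $m \in \mathbb{Z}$ such that $2(m + \tfrac{1}{12}) \le \beta \le 2(m + \tfrac{5}{12})$. Then $2(m + \tfrac12) \le \alpha + \beta < 2(m + 1)$, while $|\tfrac12\beta - (m + \tfrac14)| \le \tfrac16$, so
$$\theta_{\alpha+\beta}(\beta) = \phi^{\wedge}\bigl(\tfrac12\beta - (m + \tfrac14)\bigr)^2 = 1.$$
What this means is that, for $1 \le \alpha < \tfrac76$,
$$\begin{aligned}
g(\alpha, 0, 1) &= \lim_{m\to\infty} \frac{1}{2m} \int_0^{2m} \theta_{\alpha+\beta}(\beta)\,d\beta \\
&\ge \lim_{m\to\infty} \frac{1}{2m} \sum_{j=0}^{m-1} \mu[2(j + \tfrac{1}{12}), 2(j + \tfrac{5}{12})] = \frac13.
\end{aligned}$$
So
$$\tilde\theta_1(0) = \int_1^2 \frac{1}{\alpha} g(\alpha, 0, 1)\,d\alpha \ge \frac13 \int_1^{7/6} \frac{1}{\alpha}\,d\alpha > 0.$$
This completes the proof.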
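The two pieces of arithmetic in (h) are easy to confirm mechanically: the intervals $[2(j + \tfrac{1}{12}), 2(j + \tfrac{5}{12})]$ cover exactly one third of $[0, 2m]$, and the resulting lower bound $\tfrac13 \ln \tfrac76$ is strictly positive. The helper name `covered_fraction` below is illustrative only.

```python
from fractions import Fraction
import math


def covered_fraction(m):
    """Fraction of [0, 2m] covered by [2(j + 1/12), 2(j + 5/12)] for j = 0, ..., m - 1."""
    total = sum(
        2 * (j + Fraction(5, 12)) - 2 * (j + Fraction(1, 12)) for j in range(m)
    )
    return total / (2 * m)


# (1/3) * integral_1^{7/6} d(alpha)/alpha, the lower bound obtained for theta~_1(0)
lower_bound = (1 / 3) * math.log(7 / 6)

print(covered_fraction(1), covered_fraction(50), lower_bound)
```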

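286S Lemma Suppose that $f \in \mathcal{L}^2_{\mathbb{C}}$.

(a) For every $x \in \mathbb{R}$,
$$(\tilde A f)(x) = \liminf_{n\to\infty} \frac{1}{n} \int_1^2 \frac{1}{\alpha} \int_0^n (D_{1/\alpha} A M_\beta D_\alpha f)(x)\,d\beta\,d\alpha$$
is defined in $[0, \infty]$, and $\tilde A f : \mathbb{R} \to [0, \infty]$ is Borel measurable.

(b) $\int_F \tilde A f \le C_9 \|f\|_2 \sqrt{\mu F}$ whenever $\mu F < \infty$.

(c) If $f$ is a rapidly decreasing test function and $z \in \mathbb{R}$, $2\pi |(f^{\wedge} \times \tilde\theta_z)^{\vee}| \le \tilde A f$ at every point.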
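proof (a) The point here is that the function
$$(\alpha, \beta, x) \mapsto (D_{1/\alpha} A M_\beta D_\alpha f)(x) : \,]0, \infty[\, \times \mathbb{R}^2 \to [0, \infty]$$
is Borel measurable. P
$$\begin{aligned}
(D_{1/\alpha} A M_\beta D_\alpha f)(x) &= (A M_\beta D_\alpha f)(\tfrac{x}{\alpha}) \\
&= \sup_{z \in \mathbb{R},\, P \subseteq Q \text{ is finite}} \; \sum_{\sigma \in P,\, z \in J^r_\sigma} |(M_\beta D_\alpha f|\phi_\sigma)\phi_\sigma(\tfrac{x}{\alpha})|.
\end{aligned}$$
Look at the central term in this formula. For any $\sigma \in Q$, we have
$$\begin{aligned}
(M_\beta D_\alpha f|\phi_\sigma) &= \int_{-\infty}^\infty e^{i\beta t} f(\alpha t) \overline{\phi_\sigma(t)}\,dt \\
&= \frac{1}{\alpha} \int_{-\infty}^\infty e^{i\beta t/\alpha} f(t) \overline{\phi_\sigma(t/\alpha)}\,dt.
\end{aligned}$$
Now $\phi_\sigma$ is a rapidly decreasing test function, so there is some $\gamma \ge 0$ such that $|\phi_\sigma(t)| \le \gamma/(1 + t^2)$ for every $t \in \mathbb{R}$. This means that if $\alpha > 0$ and $\langle \alpha_n \rangle_{n\in\mathbb{N}}$ is a sequence in $[\tfrac12\alpha, \infty[$ and we set $g(t) = \sup_{n\in\mathbb{N}} |\phi_\sigma(t/\alpha_n)|$ for $t \in \mathbb{R}$, then $g(t) \le 4\gamma/(4 + t^2)$ for every $t$ and $g$ is integrable. So Lebesgue's Dominated Convergence Theorem tells us that if $\langle \alpha_n \rangle_{n\in\mathbb{N}} \to \alpha$ and $\langle \beta_n \rangle_{n\in\mathbb{N}} \to \beta$,
$$\frac{1}{\alpha_n} \int_{-\infty}^\infty e^{i\beta_n t/\alpha_n} f(t) \overline{\phi_\sigma(t/\alpha_n)}\,dt \to \frac{1}{\alpha} \int_{-\infty}^\infty e^{i\beta t/\alpha} f(t) \overline{\phi_\sigma(t/\alpha)}\,dt.$$
Thus
$$(\alpha, \beta) \mapsto (M_\beta D_\alpha f|\phi_\sigma) : \,]0, \infty[\, \times \mathbb{R} \to \mathbb{R}$$
is continuous; and this is true for every $\sigma \in Q$.
Accordingly
$$(\alpha, \beta, x) \mapsto \sum_{\sigma \in P,\, z \in J^r_\sigma} |(M_\beta D_\alpha f|\phi_\sigma)\phi_\sigma(\tfrac{x}{\alpha})|$$
is continuous for every $z \in \mathbb{R}$ and every finite $P \subseteq Q$, and $(\alpha, \beta, x) \mapsto (D_{1/\alpha} A M_\beta D_\alpha f)(x)$ is Borel measurable by 256Ma. Q
It follows that the repeated integrals
$$\int_1^2 \frac{1}{\alpha} \int_0^n (D_{1/\alpha} A M_\beta D_\alpha f)(x)\,d\beta\,d\alpha$$
are defined in $[0, \infty]$ and are Borel measurable functions of $x$ (252P again), so that $\tilde A f$ is Borel measurable.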
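(b) For any $n \in \mathbb{N}$,
$$\begin{aligned}
\int_F \frac{1}{n} \int_1^2 \frac{1}{\alpha} \int_0^n &(D_{1/\alpha} A M_\beta D_\alpha f)(x)\,d\beta\,d\alpha\,dx \\
&= \frac{1}{n} \int_1^2 \frac{1}{\alpha} \int_0^n \int_F (D_{1/\alpha} A M_\beta D_\alpha f)(x)\,dx\,d\beta\,d\alpha \\
&\qquad \text{(by Fubini's theorem, 252H)} \\
&= \frac{1}{n} \int_1^2 \int_0^n \int_F \frac{1}{\alpha} (A M_\beta D_\alpha f)(\tfrac{x}{\alpha})\,dx\,d\beta\,d\alpha \\
&= \frac{1}{n} \int_1^2 \int_0^n \int_{\alpha^{-1}F} (A M_\beta D_\alpha f)(x)\,dx\,d\beta\,d\alpha \\
&\le \frac{1}{n} \int_1^2 \int_0^n C_9 \|M_\beta D_\alpha f\|_2 \sqrt{\mu(\alpha^{-1}F)}\,d\beta\,d\alpha \\
&\qquad \text{(286O)} \\
&= \frac{1}{n} \int_1^2 \int_0^n C_9 \cdot \frac{1}{\sqrt\alpha} \|f\|_2 \cdot \frac{1}{\sqrt\alpha} \sqrt{\mu F}\,d\beta\,d\alpha \\
&= C_9 \|f\|_2 \sqrt{\mu F} \cdot \frac{1}{n} \int_1^2 \frac{1}{\alpha} \int_0^n d\beta\,d\alpha \\
&= C_9 \|f\|_2 \sqrt{\mu F} \ln 2 \le C_9 \|f\|_2 \sqrt{\mu F}.
\end{aligned}$$
So
$$\begin{aligned}
\int_F \tilde A f &= \int_F \liminf_{n\to\infty} \frac{1}{n} \int_1^2 \frac{1}{\alpha} \int_0^n (D_{1/\alpha} A M_\beta D_\alpha f)(x)\,d\beta\,d\alpha\,dx \\
&\le \liminf_{n\to\infty} \int_F \frac{1}{n} \int_1^2 \frac{1}{\alpha} \int_0^n (D_{1/\alpha} A M_\beta D_\alpha f)(x)\,d\beta\,d\alpha\,dx \\
&\qquad \text{(by Fatou's lemma)} \\
&\le C_9 \|f\|_2 \sqrt{\mu F}.
\end{aligned}$$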

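(c) For any $x \in \mathbb{R}$,
$$\int_{-\infty}^\infty |f^{\wedge}(y)| \sup_{n\in\mathbb{N}} \Bigl( \int_1^2 \frac{1}{\alpha} \frac{1}{n} \int_0^n \theta'_{z\alpha\beta}(y)\,d\beta\,d\alpha \Bigr) dy \le \ln 2 \cdot \int_{-\infty}^\infty |f^{\wedge}|$$
is finite. So
$$\begin{aligned}
(f^{\wedge} \times \tilde\theta_z)^{\vee}(x) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{ixy} f^{\wedge}(y) \tilde\theta_z(y)\,dy \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{ixy} f^{\wedge}(y) \int_1^2 \frac{1}{\alpha} \lim_{n\to\infty} \frac{1}{n} \int_0^n \theta'_{z\alpha\beta}(y)\,d\beta\,d\alpha\,dy \\
&= \frac{1}{\sqrt{2\pi}} \lim_{n\to\infty} \int_{-\infty}^\infty e^{ixy} f^{\wedge}(y) \int_1^2 \frac{1}{\alpha n} \int_0^n \theta'_{z\alpha\beta}(y)\,d\beta\,d\alpha\,dy \\
&\qquad \text{(by Lebesgue's Dominated Convergence Theorem)} \\
&= \frac{1}{\sqrt{2\pi}} \lim_{n\to\infty} \int_1^2 \frac{1}{\alpha n} \int_0^n \int_{-\infty}^\infty e^{ixy} f^{\wedge}(y) \theta'_{z\alpha\beta}(y)\,dy\,d\beta\,d\alpha \\
&\qquad \text{(by Fubini's theorem)} \\
&= \lim_{n\to\infty} \int_1^2 \frac{1}{\alpha n} \int_0^n (f^{\wedge} \times \theta'_{z\alpha\beta})^{\vee}(x)\,d\beta\,d\alpha,
\end{aligned}$$
and
$$\begin{aligned}
2\pi |(f^{\wedge} \times \tilde\theta_z)^{\vee}(x)| &= 2\pi \Bigl| \lim_{n\to\infty} \int_1^2 \frac{1}{\alpha n} \int_0^n (f^{\wedge} \times \theta'_{z\alpha\beta})^{\vee}(x)\,d\beta\,d\alpha \Bigr| \\
&\le 2\pi \liminf_{n\to\infty} \int_1^2 \frac{1}{\alpha n} \int_0^n |(f^{\wedge} \times \theta'_{z\alpha\beta})^{\vee}(x)|\,d\beta\,d\alpha \\
&\le \liminf_{n\to\infty} \int_1^2 \frac{1}{\alpha n} \int_0^n (D_{1/\alpha} A M_\beta D_\alpha f)(x)\,d\beta\,d\alpha \\
&\qquad \text{(286Qb)} \\
&= (\tilde A f)(x).
\end{aligned}$$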

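286T Lemma Set $C_{10} = C_9 / \pi\tilde\theta_1(0)$. For $f \in \mathcal{L}^2_{\mathbb{C}}$, define $\hat A f : \mathbb{R} \to [0, \infty]$ by setting
$$(\hat A f)(y) = \sup_{a \le b} \frac{1}{\sqrt{2\pi}} \Bigl| \int_a^b e^{-ixy} f(x)\,dx \Bigr|$$
for each $y \in \mathbb{R}$. Then $\int_F \hat A f \le C_{10} \|f\|_2 \sqrt{\mu F}$ whenever $\mu F < \infty$.
proof (a) As usual, the first step is to confirm that $\hat A f$ is measurable. P For $a \le b$, $y \mapsto \frac{1}{\sqrt{2\pi}} |\int_a^b e^{-ixy} f(x)\,dx|$ is continuous (by 283Cf, since $f \times \chi[a, b]$ is integrable), so 256M gives the result. Q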
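(b) Suppose that $f$ is a rapidly decreasing test function. Then
$$(\hat A f)(y) \le \frac{1}{\pi\tilde\theta_1(0)} (\tilde A f^{\vee})(-y)$$
for every $y \in \mathbb{R}$. P If $a \in \mathbb{R}$ then
$$\begin{aligned}
\frac{1}{\sqrt{2\pi}} \Bigl| \int_{-\infty}^a e^{-ixy} f(x)\,dx \Bigr| &= \frac{1}{\tilde\theta_1(0)\sqrt{2\pi}} \Bigl| \int_{-\infty}^\infty e^{-ixy} \tilde\theta_a(x) f(x)\,dx \Bigr| \\
&\qquad \text{(286R)} \\
&= \frac{1}{\tilde\theta_1(0)} |(f \times \tilde\theta_a)^{\vee}(-y)| = \frac{1}{\tilde\theta_1(0)} |(f^{\vee\wedge} \times \tilde\theta_a)^{\vee}(-y)| \\
&\qquad \text{(284C)} \\
&\le \frac{1}{2\pi\tilde\theta_1(0)} (\tilde A f^{\vee})(-y)
\end{aligned}$$
(286Sc). So if $a \le b$ in $\mathbb{R}$,
$$\frac{1}{\sqrt{2\pi}} \Bigl| \int_a^b e^{-ixy} f(x)\,dx \Bigr| \le \frac{1}{\pi\tilde\theta_1(0)} (\tilde A f^{\vee})(-y);$$
taking the supremum over $a$ and $b$, we have the result. Q
It follows that
$$\begin{aligned}
\int_F \hat A f &\le \frac{1}{\pi\tilde\theta_1(0)} \int_{-F} \tilde A f^{\vee} \le \frac{1}{\pi\tilde\theta_1(0)} C_9 \|f\|_2 \sqrt{\mu(-F)} \\
&\qquad \text{(286Sb, 284Oa)} \\
&= C_{10} \|f\|_2 \sqrt{\mu F}.
\end{aligned}$$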

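(c) For general square-integrable $f$, take any $\epsilon > 0$ and any $n \in \mathbb{N}$. Set
$$(\hat A_n f)(y) = \sup_{-n \le a \le b \le n} \frac{1}{\sqrt{2\pi}} \Bigl| \int_a^b e^{-ixy} f(x)\,dx \Bigr|$$
for each $y \in \mathbb{R}$. Let $g$ be a rapidly decreasing test function such that $\|f - g\|_2 \le \epsilon$ (284N). Then
$$\hat A g \ge \hat A_n g \ge \hat A_n f - \frac{\sqrt{2n}}{\sqrt{2\pi}} \epsilon$$
(using Cauchy's inequality), so
$$\int_F \hat A_n f \le \int_F \hat A g + \epsilon \mu F \sqrt{\frac{n}{\pi}} \le C_{10} (\|f\|_2 + \epsilon) \sqrt{\mu F} + \epsilon \mu F \sqrt{\frac{n}{\pi}}.$$
As $\epsilon$ is arbitrary, $\int_F \hat A_n f \le C_{10} \|f\|_2 \sqrt{\mu F}$; letting $n \to \infty$, we get $\int_F \hat A f \le C_{10} \|f\|_2 \sqrt{\mu F}$.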

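286U Theorem If $f \in \mathcal{L}^2_{\mathbb{C}}$ then
$$g(y) = \lim_{a\to-\infty,\, b\to\infty} \frac{1}{\sqrt{2\pi}} \int_a^b e^{-ixy} f(x)\,dx$$
is defined in $\mathbb{C}$ for almost every $y \in \mathbb{R}$, and $g$ represents the Fourier transform of $f$.
proof (a) For $n \in \mathbb{N}$, $y \in \mathbb{R}$ set
$$\gamma_n(y) = \sup_{a \le -n,\, b \ge n} \frac{1}{\sqrt{2\pi}} \Bigl| \int_a^b e^{-ixy} f(x)\,dx - \int_{-n}^n e^{-ixy} f(x)\,dx \Bigr|.$$
Then $g(y)$ is defined whenever $\inf_{n\in\mathbb{N}} \gamma_n(y) = 0$. P If $\inf_{n\in\mathbb{N}} \gamma_n(y) = 0$ and $\epsilon > 0$, take $m \in \mathbb{N}$ such that $\gamma_m(y) \le \tfrac12\epsilon$; then $\frac{1}{\sqrt{2\pi}} |\int_a^b e^{-ixy} f(x)\,dx - \int_{-n}^n e^{-ixy} f(x)\,dx| \le \epsilon$ whenever $n \ge m$ and $a \le -n$, $b \ge n$. But this means, first, that $\langle \int_{-n}^n e^{-ixy} f(x)\,dx \rangle_{n\in\mathbb{N}}$ is a Cauchy sequence, so has a limit $\zeta$ say, and, second, that $\zeta = \lim_{a\to-\infty,\, b\to\infty} \int_a^b e^{-ixy} f(x)\,dx$, so that $g(y) = \zeta/\sqrt{2\pi}$ is defined. Q

Also each $\gamma_n$ is a measurable function (cf. part (a) of the proof of 286T).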
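(b) ? Suppose, if possible, that $\{y : \inf_{n\in\mathbb{N}} \gamma_n(y) > 0\}$ is not negligible. Then
$$\lim_{m\to\infty} \mu\{y : |y| \le m,\ \inf_{n\in\mathbb{N}} \gamma_n(y) \ge \tfrac{1}{m}\} > 0,$$
so there is an $\epsilon > 0$ such that
$$F = \{y : |y| \le \tfrac{1}{\epsilon},\ \inf_{n\in\mathbb{N}} \gamma_n(y) \ge \epsilon\}$$
has measure greater than $\epsilon$. Let $n \in \mathbb{N}$ be such that
$$4C_{10}^2 \Bigl( \int_{-\infty}^\infty |f(x)|^2\,dx - \int_{-n}^n |f(x)|^2\,dx \Bigr) < \epsilon^3,$$
and set $f_1 = f - f \times \chi[-n, n]$; then $2C_{10} \|f_1\|_2 \le \epsilon^{3/2}$.
We have
$$\begin{aligned}
\gamma_n(y) &= \sup_{a \le -n,\, b \ge n} \frac{1}{\sqrt{2\pi}} \Bigl| \int_a^b e^{-ixy} f_1(x)\,dx - \int_{-n}^n e^{-ixy} f_1(x)\,dx \Bigr| \\
&\le 2 \sup_{a \le b} \frac{1}{\sqrt{2\pi}} \Bigl| \int_a^b e^{-ixy} f_1(x)\,dx \Bigr| \le 2(\hat A f_1)(y),
\end{aligned}$$
so that
$$\begin{aligned}
\epsilon \mu F \le \int_F \gamma_n \le 2 \int_F \hat A f_1 &\le 2C_{10} \|f_1\|_2 \sqrt{\mu F} \\
&\qquad \text{(286T)} \\
&\le \epsilon^{3/2} \sqrt{\mu F}
\end{aligned}$$
and $\mu F \le \epsilon$; but we chose $\epsilon$ so that $\mu F$ would be greater than $\epsilon$. X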
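(c) Thus $g(y)$ is defined for almost every $y \in \mathbb{R}$. Now $g$ represents the Fourier transform of $f$. P Let $h$ be a rapidly decreasing test function. Then the restriction of $\hat A f$ to the set on which it is finite is a tempered function, by 286D, so $\int_{-\infty}^\infty (\hat A f)(y) |h(y)|\,dy$ is finite, by 284F. Now
$$\begin{aligned}
\int_{-\infty}^\infty g \times h &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \Bigl( \lim_{n\to\infty} \int_{-n}^n e^{-ixy} f(x)\,dx \Bigr) h(y)\,dy \\
&= \frac{1}{\sqrt{2\pi}} \lim_{n\to\infty} \int_{-\infty}^\infty \int_{-n}^n e^{-ixy} f(x) h(y)\,dx\,dy
\end{aligned}$$
(because $\frac{1}{\sqrt{2\pi}} |\int_{-n}^n e^{-ixy} f(x)\,dx| \le \hat A f(y)$ for every $n$ and $y$, so we can use Lebesgue's Dominated Convergence Theorem)
$$= \frac{1}{\sqrt{2\pi}} \lim_{n\to\infty} \int_{-n}^n \int_{-\infty}^\infty e^{-ixy} f(x) h(y)\,dy\,dx$$
(because $\int_{-\infty}^\infty \int_{-n}^n |f(x) h(y)|\,dx\,dy$ is finite for each $n$)
$$= \lim_{n\to\infty} \int_{-n}^n f \times h^{\wedge} = \int_{-\infty}^\infty f \times h^{\wedge}$$
because $f \times h^{\wedge}$ is certainly integrable. As $h$ is arbitrary, $g$ represents the Fourier transform of $f$. Q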

286V Theorem For any square-integrable complex-valued function on ]−π, π], its sequence of Fourier sums converges
to it almost everywhere.
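As a concrete illustration of the statement (not a check of the theorem, which concerns arbitrary square-integrable functions), the sketch below evaluates the Fourier sums of the particular function $f(t) = t$ on $]-\pi, \pi]$, whose Fourier series is the standard $\sum_{k\ge1} 2(-1)^{k+1} \sin(kx)/k$. The choice of $f$ and of the sample point are assumptions made for illustration.

```python
import math


def fourier_sum(n, x):
    """n-th Fourier sum of f(t) = t on ]-pi, pi]: sum_{k=1}^n 2 (-1)^(k+1) sin(kx) / k."""
    return sum(
        2.0 * (-1) ** (k + 1) * math.sin(k * x) / k for k in range(1, n + 1)
    )


x = 1.0  # any point of ]-pi, pi[ would do
for n in (10, 100, 1000, 5000):
    print(n, fourier_sum(n, x), abs(fourier_sum(n, x) - x))
```

The error shrinks roughly like $1/n$ at interior points; at $x = \pi$ every Fourier sum is 0, which is consistent with convergence only almost everywhere.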

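proof Suppose that $f \in \mathcal{L}^2_{\mathbb{C}}(\mu_{]-\pi,\pi]})$. Set $f_1(x) = f(x)$ for $x \in \operatorname{dom} f$, $0$ for $x \in \mathbb{R} \setminus \,]-\pi, \pi]$; then $f_1 \in \mathcal{L}^2_{\mathbb{C}}(\mu)$. Let $g \in \mathcal{L}^2_{\mathbb{C}}(\mu)$ represent the inverse Fourier transform of $f_1$ (284O). Then 286U tells us that $f_2(x) = \lim_{a\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-a}^a e^{-ixy} g(y)\,dy$ is defined for almost every $x$, and that $f_2$ represents the Fourier transform of $g$, so is equal almost everywhere to $f_1$ (284Ib).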
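Now, for any $a \ge 0$, $x \in \mathbb{R}$,
$$\int_{-a}^a e^{-ixy} g(y)\,dy = (g|h_{ax})$$
(where $h_{ax}(y) = e^{ixy}$ if $|y| \le a$, $0$ otherwise)
$$= (f_2|h_{ax}^{\wedge})$$
(284Ob)
$$\begin{aligned}
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty f_2(t) \overline{\int_{-\infty}^\infty e^{-ity} h_{ax}(y)\,dy}\,dt \\
&= \frac{2}{\sqrt{2\pi}} \int_{-\infty}^\infty \frac{\sin (x-t)a}{x-t} f_2(t)\,dt = \frac{2}{\sqrt{2\pi}} \int_{-\pi}^\pi \frac{\sin (x-t)a}{x-t} f(t)\,dt.
\end{aligned}$$
So
$$f(x) = f_2(x) = \lim_{a\to\infty} \frac{1}{\pi} \int_{-\pi}^\pi \frac{\sin (x-t)a}{x-t} f(t)\,dt$$
for almost every $x \in \,]-\pi, \pi]$.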


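On the other hand, writing $\langle s_n \rangle_{n\in\mathbb{N}}$ for the sequence of Fourier sums of $f$, we have, for any $x \in \,]-\pi, \pi[$,
$$s_n(x) = \frac{1}{2\pi} \int_{-\pi}^\pi f(t) \frac{\sin (n+\frac12)(x-t)}{\sin \frac12(x-t)}\,dt$$
for each $n$, by 282Da. Now
$$\begin{aligned}
\frac{1}{2\pi} \int_{-\pi}^\pi f(t) &\frac{\sin (n+\frac12)(x-t)}{\sin \frac12(x-t)}\,dt - \frac{1}{\pi} \int_{-\pi}^\pi f(t) \frac{\sin (n+\frac12)(x-t)}{x-t}\,dt \\
&= \frac{1}{\pi} \int_{-\pi}^\pi f(t) \Bigl( \frac{\sin (n+\frac12)(x-t)}{2 \sin \frac12(x-t)} - \frac{\sin (n+\frac12)(x-t)}{x-t} \Bigr) dt \\
&= \frac{1}{\pi} \int_{x-\pi}^{x+\pi} f(x - t) \sin (n + \tfrac12)t \Bigl( \frac{1}{2 \sin \frac12 t} - \frac{1}{t} \Bigr) dt.
\end{aligned}$$
But if we look at the function
$$p_x(t) = \begin{cases} f(x - t) \Bigl( \dfrac{1}{2 \sin \frac12 t} - \dfrac{1}{t} \Bigr) & \text{if } x - \pi < t < x + \pi \text{ and } t \ne 0, \\ 0 & \text{otherwise,} \end{cases}$$
$p_x$ is integrable, because $f$ is integrable over $]-\pi, \pi]$ and $\lim_{t\to0} \frac{1}{2 \sin \frac12 t} - \frac{1}{t} = 0$, so $\sup_{t \ne 0,\, x-\pi \le t \le x+\pi} |\frac{1}{2 \sin \frac12 t} - \frac{1}{t}|$ is finite. (This is where we need to know that $|x| < \pi$.) So
$$\lim_{n\to\infty} \Bigl( s_n(x) - \frac{1}{\pi} \int_{-\pi}^\pi f(t) \frac{\sin (n+\frac12)(x-t)}{x-t}\,dt \Bigr) = \lim_{n\to\infty} \int_{-\infty}^\infty p_x(t) \sin (n + \tfrac12)t\,dt = 0$$
by the Riemann-Lebesgue lemma (282Fb). But this means that $\lim_{n\to\infty} s_n(x) = f(x)$ for any $x \in \,]-\pi, \pi[$ such that $f(x) = \lim_{a\to\infty} \frac{1}{\pi} \int_{-\pi}^\pi \frac{\sin (x-t)a}{x-t} f(t)\,dt$, which is almost every $x \in \,]-\pi, \pi]$.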
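The integrability of $p_x$ rests on the behaviour of the kernel difference $\frac{1}{2\sin\frac12 t} - \frac1t$: it tends to 0 as $t \to 0$ (it behaves like $t/24$) and stays bounded on any closed interval inside $]-2\pi, 2\pi[$, but blows up as $t \to \pm 2\pi$, which is why $|x| < \pi$ is needed. A small sketch to see this numerically; the function name `kernel_gap` is illustrative only.

```python
import math


def kernel_gap(t):
    """1 / (2 sin(t/2)) - 1/t, defined for 0 < |t| < 2*pi."""
    return 1.0 / (2.0 * math.sin(t / 2.0)) - 1.0 / t


for t in (1e-4, 1e-2, 1.0, math.pi, 6.0, 2 * math.pi - 1e-6):
    print(t, kernel_gap(t), t / 24.0)
```

For small t the second and third columns agree closely; the last row shows the blow-up near $2\pi$.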

286W Glossary The following special notations are used in more than one paragraph of this section:
µ for Lebesgue measure on R.
286A: $f^*$.
286C: $S_\alpha f$, $M_\alpha f$, $D_\alpha f$.
286Ea: $I$, $Q$, $I_\sigma$, $J_\sigma$, $k_\sigma$, $x_\sigma$, $y_\sigma$, $J^l_\sigma$, $J^r_\sigma$, $y^l_\sigma$.
286Eb: $\phi$, $\phi_\sigma$, $(f|g)$.
286Ec: $w$, $w_\sigma$, $C_1$.
286F: $\le$, $\le_r$, $\preccurlyeq$.
286G: $C_2$, $C_3$, $C_4$.
286H: mass, energy.
286I: $C_5$.
286K: $C_6$.
286L: $C_7$.
286M: $C_8$.
286N: $C_9$.
286O: $Af$.
286P: $\theta_z(y)$.
286Q: $\theta'_{z\alpha\beta}(y)$.
286R: $\tilde\theta_z(y)$.
286S: $\tilde A f$.
286T: $C_{10}$, $\hat A f$.

286X Basic exercises (a) Use 284Oa and 284Xf to shorten part (c) of the proof of 286U.

(b) Show that if $\langle c_k \rangle_{k\in\mathbb{N}}$ is a sequence of complex numbers such that $\sum_{k=0}^\infty |c_k|^2$ is finite, then $\sum_{k=0}^\infty c_k e^{ikx}$ is defined in $\mathbb{C}$ for almost all $x \in \mathbb{R}$.

286Y Further exercises (a) Show that if $f$ is a square-integrable function on $\mathbb{R}^r$, where $r \ge 2$, then
$$g(y) = \frac{1}{(\sqrt{2\pi})^r} \lim_{\alpha_1,\ldots,\alpha_r\to-\infty,\ \beta_1,\ldots,\beta_r\to\infty} \int_a^b e^{-iy \cdot x} f(x)\,dx$$
is defined in $\mathbb{C}$ for almost every $y \in \mathbb{R}^r$, and that $g$ represents the Fourier transform of $f$.

286 Notes and comments This is not the longest single section in this treatise as a whole, but it is by a substantial
margin the longest in the present volume, and thirty pages of sub-superscripts must tax the endurance of the most
enthusiastic. You will easily understand why Carleson’s theorem is not usually presented at this level. But I am trying
in this book to present complete proofs of the principal theorems, there is no natural place for Carleson’s theorem in
later volumes as at present conceived, and it is (just) accessible at this point; so I take the space to do it here.
The proof here divides naturally into two halves: the ‘combinatorial’ part in 286E-286M, up to the Lacey-Thiele lemma, followed by the ‘analytic’ part in 286N-286V, in which the averaging process
$$\int_1^2 \frac{1}{\alpha} \lim_{b\to\infty} \frac{1}{b} \int_0^b \ldots\,d\beta\,d\alpha$$
is used to transform the geometrically coherent, but analytically irregular, functions $\theta_z$ into the characteristic functions $\frac{1}{\tilde\theta_1(0)} \tilde\theta_z$. From the standpoint of ordinary Fourier analysis, this second part is essentially routine; there are many paths we could follow, and we have only to take the ordinary precautions against illegitimate operations.
Carleson (Carleson 66) stated his theorem in the Fourier-series form of 286V; but it had long been understood
that this was equiveridical with the Fourier-transform version in 286U. There are of course many ways of extending
the theorem. In particular, there are corresponding results for functions in Lp for any p > 1, and even for functions
f such that f × ln(1 + |f |) × ln ln ln(16 + |f |) is integrable (Antonov 96). The methods here do not seem to reach
so far. I ought also to remark that if we define Âf as in 286T, then there is for every p > 1 a constant C such that
kÂf kp ≤ Ckf kp for every f ∈ LpC (Hunt 67, Mozzochi 71, Jørsboe & Mejlbro 82, Arias de Reyna 02).
Note that the point of Carleson’s theorem, in either form, is that we take special limits. In the formulae
$$f^{\wedge}(y) = \frac{1}{\sqrt{2\pi}} \lim_{a\to-\infty,\, b\to\infty} \int_a^b e^{-ixy} f(x)\,dx,$$
$$f(x) = \lim_{n\to\infty} \sum_{k=-n}^n c_k e^{ikx},$$
valid almost everywhere for square-integrable functions $f$, we are not taking the ordinary integral $\int_{-\infty}^\infty e^{-ixy} f(x)\,dx$ or the unconditional sum $\sum_{k\in\mathbb{Z}} c_k e^{ikx}$. If $f$ is not integrable, or $\sum_{k=-\infty}^\infty |c_k|$ is infinite, these will not be defined at even one point. Carleson’s theorem makes sense only because we have a natural preference for particular kinds of improper integral and conditional sum. So when we return, in Chapter 44 of Volume 4, to Fourier analysis on general topological groups, there will simply be no language in which to express the theorem, and while versions have been proved for other groups (e.g., Schipp 78), they necessarily depend on some structure beyond the simple notion of ‘locally compact Hausdorff abelian topological group’. Even in $\mathbb{R}^2$, I understand that it is still unknown whether
$$\lim_{a\to\infty} \frac{1}{2\pi} \int_{B(0,a)} e^{-iy \cdot x} f(x)\,dx$$
will be defined a.e. for any square-integrable function $f$, if we use ordinary Euclidean balls $B(0, a)$ in place of the rectangles in 286Ya.

Appendix to Volume 2
Useful Facts
In the course of writing this volume, I have found that a considerable number of concepts and facts from various
branches of mathematics are necessary to us. Nearly all of them are embedded in important and well-established
theories for which many excellent textbooks are available and which I very much hope that you will one day study in
depth. Nevertheless, I am reluctant to send you off immediately to courses in general topology, functional analysis and
set theory, as if these were essential prerequisites for our work here, along with real analysis and basic linear algebra.
For this reason I have written this Appendix, setting out those results which we actually need at some point in this
volume. The great majority of them really are elementary – indeed, some are so elementary that they are not always
spelt out in detail in orthodox treatments of their subjects.
While I do not put this book forward as the proper place to learn any of these topics, I have tried to set them out
in a way that you will find easy to integrate into regular approaches. I do not expect anybody to read systematically
through this work, and I hope that the references given in the main chapters of this volume will be adequate to guide
you to the particular items you need.

2A1 Set theory


Especially for the examples in Chapter 21, we need some non-trivial set theory, which is best approached through the
standard theory of cardinals and ordinals; and elsewhere in this volume I make use of Zorn’s Lemma. Here I give a very
brief outline of the results involved, largely omitting proofs. Most of this material should be in any sound introduction
to set theory. The references I give are to books which happen to have come my way and which I can recommend as
reasonably suitable for beginners.
I do not discuss axiom systems or logical foundations. The set theory I employ is ‘naive’ in the sense that I rely on my
understanding of the collective experience of the last ninety years, rather than on any attempt at formal description,
to distinguish legitimate from unsafe arguments. There are, however, points in Volume 5 at which such a relaxed
philosophy becomes inappropriate, and I therefore use arguments which can, I believe, be translated into standard
Zermelo-Fraenkel set theory without new ideas being invoked.
Although in this volume I use the axiom of choice without scruple whenever appropriate, I will divide this section
into two parts, starting with ideas and results not dependent on the axiom of choice (2A1A-2A1I) and continuing with
the remainder (2A1J-2A1P). I believe that even at this level it helps us to understand the nature of the arguments
better if we maintain a degree of separation.

2A1A Ordered sets (a) Recall that a partially ordered set is a set P together with a relation ≤ on P such that
if p ≤ q and q ≤ r then p ≤ r
p ≤ p for every p ∈ P
if p ≤ q and q ≤ p then p = q.
In this context, I will write p ≥ q to mean q ≤ p, and p < q or q > p to mean ‘p ≤ q and p ≠ q’. ≤ is a partial order
on P .

(b) Let (P, ≤) be a partially ordered set, and A ⊆ P . A maximal element of A is a p ∈ A such that p ≮ a for any
a ∈ A. Note that A may have more than one maximal element. An upper bound for A is a p ∈ P such that a ≤ p
for every a ∈ A; a supremum or least upper bound is an upper bound p such that p ≤ q for every upper bound q
of A. There can be at most one such, because if p, p′ are both least upper bounds then p ≤ p′ and p′ ≤ p. Accordingly
we may safely write p = sup A if p is the least upper bound of A.
Similarly, a minimal element of A is a p ∈ A such that p ≯ a for every a ∈ A; a lower bound of A is a p ∈ P such
that p ≤ a for every a ∈ A; and inf A = a means that
∀ q ∈ P , a ≥ q ⇐⇒ p ≥ q for every p ∈ A.
A subset A of P is order-bounded if it has both an upper bound and a lower bound.
A subset A of P is upwards-directed if for any p, p′ ∈ A there is a q ∈ A such that p ≤ q and p′ ≤ q; that is, if
any non-empty finite subset of A has an upper bound in A. Similarly, A ⊆ P is downwards-directed if for any p,
p′ ∈ A there is a q ∈ A such that q ≤ p and q ≤ p′ ; that is, if any non-empty finite subset of A has a lower bound in A.
It is sometimes convenient to adapt the notation for closed intervals to arbitrary partially ordered sets: [p, q] will be
{r : p ≤ r ≤ q}.

(c) A totally ordered set is a partially ordered set (P, ≤) such that
for any p, q ∈ P , either p ≤ q or q ≤ p.
≤ is then a total or linear order on P . In any totally ordered set we have a median function; for p, q, r ∈ P set

med(p, q, r) = max(min(p, q), min(p, r), min(q, r))


= min(max(p, q), max(p, r), max(q, r)),

so that med(p, q, r) = q if p ≤ q ≤ r.

(d) A lattice is a partially ordered set (P, ≤) such that


for any p, q ∈ P , p ∨ q = sup{p, q} and p ∧ q = inf{p, q} are defined in P .

(e) A well-ordered set is a totally ordered set (P, ≤) such that inf A exists and belongs to A for every non-empty
set A ⊆ P ; that is, every non-empty subset of P has a least element. In this case ≤ is a well-ordering of P .
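The two expressions for the median in 2A1Ac can be compared directly on any totally ordered data. The sketch below is a minimal illustration using Python's built-in ordering on integers and strings; the function name `med` follows the text, everything else is an assumption made for the example.

```python
from itertools import permutations


def med(p, q, r):
    """Median of three elements of a totally ordered set, computed both ways."""
    via_max_of_mins = max(min(p, q), min(p, r), min(q, r))
    via_min_of_maxes = min(max(p, q), max(p, r), max(q, r))
    # The two formulae of 2A1Ac are expected to agree in any total order.
    assert via_max_of_mins == via_min_of_maxes
    return via_max_of_mins


for triple in permutations((1, 2, 3)):
    print(triple, med(*triple))
print(med("pear", "apple", "fig"))
```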

2A1B Transfinite Recursion: Theorem Let (P, ≤) be a well-ordered set and X any class. For p ∈ P write Lp
for the set {q : q ∈ P, q < p} and X^{Lp} for the class of all functions from Lp to X. Let F : ⋃_{p∈P} X^{Lp} → X be any
function. Then there is a unique function f : P → X such that f (p) = F (f ↾Lp ) for every p ∈ P .
proof There are versions of this result in Enderton 77 (p. 175) and Halmos 60 (§18). Nevertheless I write out
a proof, since it seems to me that most elementary books on set theory do not give it its proper place at the very
beginning of the theory of well-ordered sets.
(a) Let Φ be the class of all functions φ such that
(α) dom φ is a subset of P , and Lp ⊆ dom φ for every p ∈ dom φ;
(β) φ(p) ∈ X for every p ∈ dom φ, and φ(p) = F (φ↾Lp ) for every p ∈ dom φ.
(b) If φ, ψ ∈ Φ then φ and ψ agree on dom φ ∩ dom ψ. P ? If not, then A = {q : q ∈ dom φ ∩ dom ψ, φ(q) ≠ ψ(q)}
is non-empty. Because P is well-ordered, A has a least element p say. Now Lp ⊆ dom φ ∩ dom ψ and Lp ∩ A = ∅, so
φ(p) = F (φ↾Lp ) = F (ψ↾Lp ) = ψ(p),
which is impossible. X Q
(c) It follows that Φ is a set, since the function φ ↦ dom φ is an injective function from Φ to 𝒫P , and its inverse is
a surjection from a subset of 𝒫P onto Φ. We can therefore, without inhibitions, define a function f by writing
dom f = ⋃_{φ∈Φ} dom φ, f (p) = φ(p) whenever φ ∈ Φ, p ∈ dom φ.
(If you think that a function φ is just the set of ordered pairs {(p, φ(p)) : p ∈ dom φ}, then f becomes ⋃Φ.) Then
f ∈ Φ. P Of course f is a function from a subset of P to X. If p ∈ dom f , then there is a φ ∈ Φ such that p ∈ dom φ,
in which case
Lp ⊆ dom φ ⊆ dom f , f (p) = φ(p) = F (φ↾Lp ) = F (f ↾Lp ). Q

(d) f is defined everywhere in P . P ? Otherwise, P \ dom f is non-empty and has a least element r say. Now
Lr ⊆ dom f . Define a function ψ by saying that dom ψ = {r} ∪ dom f , ψ(p) = f (p) for p ∈ dom f and ψ(r) = F (f ↾Lr ).
Then ψ ∈ Φ, because if p ∈ dom ψ
either p ∈ dom f so Lp ⊆ dom f ⊆ dom ψ and
ψ(p) = f (p) = F (f ↾Lp ) = F (ψ↾Lp )
or p = r so Lp = Lr ⊆ dom f ⊆ dom ψ and
ψ(p) = F (f ↾Lr ) = F (ψ↾Lr ).
Accordingly ψ ∈ Φ and r ∈ dom ψ ⊆ dom f . X Q
(e) Thus f : P → X is a function such that f (p) = F (f ↾Lp ) for every p. To see that f is unique, observe that any
function of this type must belong to Φ, so must agree with f on their common domain, which is the whole of P .
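For a finite well-ordered set the construction in this proof reduces to ordinary recursion: f is built up along P, and at each p the rule F sees only the restriction of f to Lp. The sketch below is a finite illustration under that assumption (it says nothing about genuinely transfinite P); the names `recurse` and `one_plus_sum` are hypothetical.

```python
def recurse(P, F):
    """Build f on a finite well-ordered set P (listed in increasing order)
    so that f(p) = F(f restricted to L_p) for every p in P."""
    f = {}
    for p in P:
        # At this point f is defined exactly on L_p = {q in P : q < p}.
        f[p] = F(dict(f))
    return f


def one_plus_sum(restriction):
    """Example rule F: one more than the sum of the values already defined."""
    return 1 + sum(restriction.values())


f = recurse(range(6), one_plus_sum)
print(f)
```

With this rule the unique solution is f(p) = 2^p, since 1 + (1 + 2 + ... + 2^(p-1)) = 2^p.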
Remark If you have been taught to distinguish between the words ‘set’ and ‘class’, you will observe that my naive set
theory is a relatively tolerant one in that it is willing to allow class variables in its theorems.

2A1C Ordinals An ordinal (sometimes called a ‘von Neumann ordinal’) is a set ξ such that
if η ∈ ξ then η is a set and η ∉ η,
if η ∈ ζ ∈ ξ then η ∈ ξ,
writing ‘η ≤ ζ’ to mean ‘η ∈ ζ or η = ζ’, (ξ, ≤) is well-ordered
(Enderton 77, p. 191; Halmos 60, §19; Henle 86, p. 27; Krivine 71, p. 24; Roitman 90, 3.2.8. Of course many
set theories do not allow sets to belong to themselves, and/or take it for granted that every object of discussion is a
set, but I prefer not to take a view on such points in general.)

2A1D Basic facts about ordinals (a) If ξ is an ordinal, then every member of ξ is an ordinal. (Enderton 77,
p. 192; Henle 86, 6.4; Krivine 71, p. 14; Roitman 90, 3.2.10.)

(b) If ξ, η are ordinals then either ξ ∈ η or ξ = η or η ∈ ξ (and no two of these can occur together). (Enderton
77, p. 192; Henle 86, 6.4; Krivine 71, p. 14; Lipschutz 64, 11.12; Roitman 90, 3.2.13.) It is customary, in this
case, to write η < ξ if η ∈ ξ and η ≤ ξ if either η ∈ ξ or η = ξ. Note that η ≤ ξ iff η ⊆ ξ.

(c) If A is any non-empty class of ordinals, then there is an α ∈ A such that α ≤ ξ for every ξ ∈ A. (Henle 86,
6.7; Krivine 71, p. 15.)

(d) If ξ is an ordinal, so is ξ ∪ {ξ}; call it ‘ξ + 1’. If ξ < η then ξ + 1 ≤ η; ξ + 1 is the least ordinal greater than ξ.
(Enderton 77, p. 193; Henle 86, 6.3; Krivine 71, p. 15.) For any ordinal ξ, either there is a greatest ordinal η < ξ,
in which case ξ = η + 1 and we call ξ a successor ordinal, or ξ = ⋃ξ, in which case we call ξ a limit ordinal.

(e) The first few ordinals are 0 = ∅, 1 = 0 + 1 = {0} = {∅}, 2 = 1 + 1 = {0, 1} = {∅, {∅}}, 3 = 2 + 1 = {0, 1, 2}, . . . .
The first infinite ordinal is ω = {0, 1, 2, . . . }, which may be identified with N.

(f ) The union of any set of ordinals is an ordinal. (Enderton 77, p. 193; Henle 86, 6.8; Krivine 71, p. 15;
Roitman 90, 3.2.19.)

(g) If (P, ≤) is any well-ordered set, there is a unique ordinal ξ such that P is order-isomorphic to ξ, and the
order-isomorphism is unique. (Enderton 77, pp. 187-189; Henle 86, 6.13; Halmos 60, §20.)
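The finite von Neumann ordinals of 2A1De can be built literally as nested sets, which makes facts such as ‘η ≤ ξ iff η ⊆ ξ’ and ‘every member of an ordinal is a subset of it’ checkable by machine. A minimal sketch using Python `frozenset`s follows; it covers only the finite ordinals 0, ..., 5, and the helper names are illustrative.

```python
def successor(xi):
    """The ordinal xi + 1 = xi ∪ {xi} of 2A1Dd."""
    return frozenset(xi | {xi})


ordinals = [frozenset()]              # 0 is the empty set
for _ in range(5):
    ordinals.append(successor(ordinals[-1]))


def is_transitive(xi):
    """True if every member of xi is also a subset of xi."""
    return all(eta <= xi for eta in xi)


for n, xi in enumerate(ordinals):
    print(n, len(xi), is_transitive(xi))
```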

2A1E Initial ordinals An initial ordinal is an ordinal κ such that there is no bijection between κ and any member
of κ. (Enderton 77, p. 197; Halmos 60, §25; Henle 86, p. 34; Krivine 71, p. 24; Roitman 90, 5.1.10, p. 79).

2A1F Basic facts about initial ordinals (a) All finite ordinals, and the first infinite ordinal ω, are initial ordinals.

(b) For every well-ordered set P there is a unique initial ordinal κ such that there is a bijection between P and κ.

(c) For every ordinal ξ there is a least initial ordinal greater than ξ. (Enderton 77, p. 195; Henle 86, 7.2.1.) If
κ is an initial ordinal, write κ⁺ for the least initial ordinal greater than κ. We write ω1 for ω⁺, ω2 for ω1⁺, and so on.

(d) For any initial ordinal κ ≥ ω there is a bijection between κ × κ and κ; consequently there are bijections between
κ and κ^r for every r ≥ 1.
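For κ = ω the bijection of 2A1Fd can be written down explicitly. The sketch below uses the standard Cantor pairing function, which is one convenient choice and not necessarily the construction a general proof would use; the names `pair` and `unpair` are illustrative.

```python
import math


def pair(i, j):
    """Cantor pairing: a bijection from N x N onto N."""
    return (i + j) * (i + j + 1) // 2 + j


def unpair(n):
    """Inverse of pair."""
    w = (math.isqrt(8 * n + 1) - 1) // 2   # largest w with w(w+1)/2 <= n
    j = n - w * (w + 1) // 2
    return w - j, j


def triple(i, j, k):
    """A bijection from N^3 onto N, obtained by iterating pair (the case r = 3)."""
    return pair(pair(i, j), k)


print([unpair(n) for n in range(6)])
```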

2A1G Schröder-Bernstein theorem I remind you of the following fundamental result: if X and Y are sets and
there are injections f : X → Y , g : Y → X then there is a bijection h : X → Y . (Enderton 77, p. 147; Halmos 60,
§22; Henle 86, 7.4; Lipschutz 64, p. 145; Roitman 90, 5.1.2. It is also a special case of 344D in Volume 3.)

2A1H Countable subsets of 𝒫N The following results will be needed below.

(a) There is a bijection between 𝒫N and R. (Enderton 77, p. 149; Lipschutz 64, p. 146.)

(b) Suppose that X is any set such that there is an injection from X into 𝒫N. Let 𝒞 be the set of countable subsets of
X. Then there is a surjection from 𝒫N onto 𝒞. P Let f : X → 𝒫N be an injection. Set f1 (x) = {0} ∪ {i + 1 : i ∈ f (x)};
then f1 : X → 𝒫N is injective and f1 (x) ≠ ∅ for every x ∈ X. Define g : 𝒫N → 𝒫X by setting
g(A) = {x : ∃ n ∈ N, f1 (x) = {i : 2^n (2i + 1) ∈ A}}
for each A ⊆ N. Then g(A) is countable, since we have an injection
x ↦ min{n : f1 (x) = {i : 2^n (2i + 1) ∈ A}}
from g(A) to N. Thus g is a function from 𝒫N to 𝒞. To see that g is surjective, observe that ∅ = g(∅), while if C ⊆ X
is countable and not empty there is a surjection h : N → C; now set
A = {2^n (2i + 1) : n ∈ N, i ∈ f1 (h(n))},
and see that g(A) = C. Q
(c) Again suppose that X is a set such that there is an injection from X to 𝒫N, and write H for the set of functions
h such that dom h is a countable subset of X and h takes values in {0, 1}. Then there is a surjection from 𝒫N onto H.
P Let 𝒞 be the set of countable subsets of X and let g : 𝒫N → 𝒞 be a surjection, as in (b). For A ⊆ N set
g0 (A) = g({i : 2i ∈ A}), g1 (A) = g({i : 2i + 1 ∈ A}),
so that g0 (A), g1 (A) are countable subsets of X, and A ↦ (g0 (A), g1 (A)) is a surjection from 𝒫N onto 𝒞 × 𝒞. Let hA
be the function with domain g0 (A) ∪ g1 (A) such that hA (x) = 1 if x ∈ g1 (A), 0 if x ∈ g0 (A) \ g1 (A). Then A ↦ hA is
a surjection from 𝒫N onto H. Q
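The coding in (b) relies on the fact that every positive integer is uniquely of the form 2^n(2i + 1), so a single set A ⊆ N carries a whole sequence of 'slices' {i : 2^n(2i + 1) ∈ A}. The sketch below illustrates this on finite data; the three-point set and its injection `f1` into non-empty subsets of N are hypothetical, and `g` inspects only the first `depth` slices, which suffices when A is finite.

```python
def encode(slices):
    """A = {2^n (2i + 1) : i in slices[n]} for a finite list of finite subsets of N."""
    return {2 ** n * (2 * i + 1) for n, s in enumerate(slices) for i in s}


def slice_of(A, n):
    """The n-th slice {i : 2^n (2i + 1) in A}."""
    out = set()
    for a in A:
        q, r = divmod(a, 2 ** n)
        if r == 0 and q % 2 == 1:
            out.add((q - 1) // 2)
    return out


# Hypothetical injection f1 from X = {"a", "b", "c"} into non-empty subsets of N.
f1 = {"a": {0, 1}, "b": {0, 3}, "c": {0, 2, 5}}


def g(A, depth=8):
    """{x : some slice of A equals f1(x)}, looking only at slices 0, ..., depth - 1."""
    return {x for x, s in f1.items() if any(slice_of(A, n) == s for n in range(depth))}


A = encode([f1["c"], f1["a"]])   # corresponds to h(0) = "c", h(1) = "a"
print(sorted(A), g(A))
```

Because each f1(x) is non-empty, the empty slices of A (all n ≥ 2 here) match no point of X, which is the reason for passing from f to f1 in the text.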

2A1I Filters I pause for a moment to discuss a construction which is of great value in investigating topological
spaces, but has other uses, and in its nature belongs to elementary set theory (much more elementary, indeed, than
the work above).
(a) Let X be a non-empty set. A filter on X is a family F of subsets of X such that
X ∈ F, ∅ ∉ F,
E ∩ F ∈ F whenever E, F ∈ F,
E ∈ F whenever X ⊇ E ⊇ F ∈ F.
The second condition implies (inducing on n) that F0 ∩ . . . ∩ Fn ∈ F whenever F0 , . . . , Fn ∈ F.
(b) Let X, Y be non-empty sets, F a filter on X and f : D → Y a function, where D ∈ F. Then
{E : E ⊆ Y, f⁻¹[E] ∈ F}
is a filter on Y (because f⁻¹[Y ] = D, f⁻¹[∅] = ∅, f⁻¹[E ∩ F ] = f⁻¹[E] ∩ f⁻¹[F ], X ⊇ f⁻¹[E] ⊇ f⁻¹[F ] whenever
Y ⊇ E ⊇ F ); I will call it f [[F]], the image filter of F under f .
Remark Of course there is a hidden variable in this notation. Ordinarily in this book I regard a function f as being
defined by its domain dom f and its values on its domain; that is, it is determined by its graph {(x, f (x)) : x ∈ dom f },
and indeed I normally do not distinguish between a function and its graph. This means that when I write ‘f : D → Y
is a function’ then the class D = dom f can be recovered from the function, but the class Y cannot; all I promise is
that Y includes the class f [D] of values of f . Now in the notation f [[F]] above we do actually need to know which set
Y it is to be a filter on, even though this cannot be discovered from knowledge of f and F. So you will always have to
infer it from the context.
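On finite sets both the filter axioms and the image filter can be checked by brute force, which also makes the Remark concrete: the target set Y has to be passed in explicitly, because it cannot be recovered from f. The sketch below is an illustration only; the sets X and Y, the filter and the function are all hypothetical examples.

```python
from itertools import combinations


def powerset(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]


def is_filter(F, X):
    """Check the three conditions of 2A1Ia for a family F of subsets of X."""
    F, X = set(F), frozenset(X)
    return (
        X in F and frozenset() not in F
        and all(E & G in F for E in F for G in F)
        and all(E in F for E in powerset(X) if any(G <= E for G in F))
    )


def image_filter(F, f, Y):
    """{E subset of Y : f^{-1}[E] in F}; f is a dict whose domain belongs to F."""
    F = set(F)
    return {E for E in powerset(Y) if frozenset(x for x in f if f[x] in E) in F}


X = {0, 1, 2, 3}
F = {E for E in powerset(X) if frozenset({0, 1}) <= E}   # sets containing {0, 1}
f = {0: "u", 1: "v", 2: "v"}                              # domain {0, 1, 2} belongs to F
Y = {"u", "v", "w"}
print(sorted(sorted(E) for E in image_filter(F, f, Y)))
```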

2A1J The Axiom of Choice I come now to the second half of this section, in which I discuss concepts and
theorems dependent on the Axiom of Choice. Let me remind you of the statement of this axiom:
(AC) ‘whenever I is a set and ⟨Xi⟩_{i∈I} is a family of non-empty sets indexed by I, there is a function f , with
domain I, such that f (i) ∈ Xi for every i ∈ I’.
The function f is a choice function; it picks out one member of each of the given family of non-empty sets Xi .
I believe that one’s attitude to this principle is a matter for individual choice. It is an indispensable foundation
for very large parts of twentieth-century pure mathematics, including a substantial fraction of the present volume;
but there are also significant areas in which principles actually contradictory to it can be employed to striking effect,
leading – in my view – to equally valid mathematics. (I will describe one of these in §567 of Volume 5.) At present it
is the case that more current mathematical activity, by volume, depends on asserting the axiom of choice than on all
its rivals put together; but it is a matter of judgement and taste where the most important, or exciting, ideas are to
be found. For the present volume I follow standard practice in twentieth-century abstract analysis, using the axiom of
choice whenever necessary.

2A1K Zermelo’s Well-Ordering Theorem (a) The Axiom of Choice is equiveridical with each of the statements
‘for every set X there is a well-ordering of X’,
‘for every set X there is a bijection between X and some ordinal’,
‘for every set X there is a unique initial ordinal κ such that there is a bijection between X and κ.’
(Enderton 77, p. 196 et seq.; Halmos 60, §17; Henle 86, 9.1-9.3; Krivine 71, p. 20; Lipschutz 64, 12.1; Roitman
90, 3.6.38.)

(b) When assuming the axiom of choice, as I do nearly everywhere in this treatise, I write #(X) for that initial
ordinal κ such that there is a bijection between κ and X; I call this the cardinal of X.

2A1L Fundamental consequences of the Axiom of Choice (a) For any two sets X and Y , there is a bijection
between X and Y iff #(X) = #(Y ). More generally, there is an injection from X to Y iff #(X) ≤ #(Y ), and a
surjection from X onto Y iff either #(X) ≥ #(Y ) > 0 or #(X) = #(Y ) = 0.
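
For finite sets the three statements in (a) can be checked by brute force, with no appeal to the axiom of choice. The following Python sketch is only an illustration under that finiteness assumption; the function names are hypothetical and do not come from the text. It enumerates every function between two small sets and confirms that injections and surjections exist exactly in the cases described, including the exceptional case #(X) = #(Y) = 0.

```python
from itertools import product

def functions(m, n):
    """Every function from {0,...,m-1} to {0,...,n-1}, as a tuple of values."""
    return product(range(n), repeat=m)

def has_injection(m, n):
    # A function is injective iff its m values are all different.
    return any(len(set(f)) == m for f in functions(m, n))

def has_surjection(m, n):
    # A function is surjective iff its set of values is the whole codomain.
    return any(set(f) == set(range(n)) for f in functions(m, n))

def check(max_size=4):
    """Compare the brute-force answers with the cardinal inequalities of (a)."""
    for m in range(max_size + 1):
        for n in range(max_size + 1):
            assert has_injection(m, n) == (m <= n)
            assert has_surjection(m, n) == (m >= n > 0 or m == n == 0)
    return True
```

Note that the empty function counts as a function from the empty set, which is why the case of two empty sets has to be listed separately for surjections.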

(b) In particular, #(PN) = #(R); write c for this common value, the cardinal of the continuum. Cantor’s
theorem that PN and R are uncountable becomes the result ω < c, that is, ω1 ≤ c.

(c) If X is any infinite set, and r ≥ 1, then there is a bijection between X^r and X. (Enderton 77, p. 162; Halmos
60, §24.) (I note that we need some form of the axiom of choice to prove the result in this generality. But of course for
most of the infinite sets arising naturally in mathematics – sets like N and PR – it is easy to prove the result without
appeal to the axiom of choice.)

(d) Suppose that κ is an infinite cardinal. If I is a set of cardinal at most κ and ⟨Ai⟩_{i∈I} is a family of sets with
#(Ai) ≤ κ for every i ∈ I, then #(⋃_{i∈I} Ai) ≤ κ. Consequently #(⋃𝒜) ≤ κ whenever 𝒜 is a family of sets such that
#(𝒜) ≤ κ and #(A) ≤ κ for every A ∈ 𝒜. In particular, ω1 cannot be expressed as a countable union of countable
sets, and ω2 cannot be expressed as a countable union of sets of cardinal at most ω1.

(e) Now we can rephrase 2A1Hc as: if #(X) ≤ c, then #(H) ≤ c, where H is the set of functions from a countable
subset of X to {0, 1}. PP For we have an injection from X into PN, and therefore a surjection from PN onto H. QQ

(f ) Any non-empty class of cardinals has a least member (by 2A1Dc).

2A1M Zorn’s Lemma In 2A1K I described the well-ordering principle. I come now to another proposition which
is equiveridical with the axiom of choice:
‘Let (P, ≤) be a non-empty partially ordered set such that every non-empty totally ordered subset of P
has an upper bound in P . Then P has a maximal element.’
This is Zorn’s Lemma. For the proof that the axiom of choice implies, and is implied by, Zorn’s Lemma, see Enderton
77, p. 151; Halmos 60, §16; Henle 86, 9.1-9.3; Roitman 90, 3.6.38.

2A1N Ultrafilters A filter F on a set X is an ultrafilter if for every A ⊆ X either A ∈ F or X \ A ∈ F.


If F is an ultrafilter on X and f : D → Y is a function, where D ∈ F, then f[[F]] is an ultrafilter on Y (because
f⁻¹[Y \ A] = D \ f⁻¹[A] for every A ⊆ Y).
One type of ultrafilter can be described easily: if x is any point of a set X, then F = {F : x ∈ F ⊆ X} is an
ultrafilter on X. (You need only read the definitions. Ultrafilters of this type are called principal ultrafilters.) But
it is not obvious that there are any further ultrafilters, and indeed it is not possible to prove that there are any, without
using a strong form of the axiom of choice, as follows.
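
The difficulty described here concerns infinite sets only. On a finite set every family of subsets can be listed, so the definitions can be tested directly. The following Python sketch (all names are hypothetical, and the definition of filter used is the one applied in the proof of 2A1O: X belongs, ∅ does not, closed under finite intersections and supersets) enumerates all families of subsets of a three-point set and finds that the ultrafilters are exactly the principal ones.

```python
from itertools import chain, combinations

def powerset(X):
    """All subsets of X, as frozensets."""
    items = list(X)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def is_filter(F, X):
    """F contains X, omits the empty set, and is closed under intersections and supersets."""
    X = frozenset(X)
    subsets = powerset(X)
    if X not in F or frozenset() in F:
        return False
    if any(A & B not in F for A in F for B in F):
        return False
    return all(E in F for A in F for E in subsets if A <= E)

def is_ultrafilter(F, X):
    """A filter containing A or X \\ A for every A included in X."""
    X = frozenset(X)
    return is_filter(F, X) and all(A in F or (X - A) in F for A in powerset(X))

def principal(x, X):
    """The principal ultrafilter {F : x in F, F included in X}."""
    return frozenset(A for A in powerset(X) if x in A)

def all_ultrafilters(X):
    """Brute force over every family of subsets of X (feasible only for very small X)."""
    return [F for F in powerset(powerset(X)) if is_ultrafilter(F, X)]
```

For a set with three points there are 256 families to test, so the search is immediate; the number of families grows too fast for this to be useful much beyond four points.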

2A1O The Ultrafilter Theorem As an example of the use of Zorn’s lemma which will be of great value in studying
compact topological spaces (2A3N et seq., and §247), I give the following result.
Theorem Let X be any non-empty set, and F a filter on X. Then there is an ultrafilter H on X such that F ⊆ H.
proof (Cf. Henle 86, 9.4; Roitman 90, 3.6.37.) Let P be the set of all filters on X including F, and order P by
inclusion, so that, for G1, G2 ∈ P, G1 ≤ G2 in P iff G1 ⊆ G2. It is easy to see that P is a partially ordered set, and it is
non-empty because F ∈ P. If Q is any non-empty totally ordered subset of P, then HQ = ⋃Q ∈ P. PP Of course HQ
is a family of subsets of X. (i) Take any G0 ∈ Q; then X ∈ G0 ⊆ HQ. If G ∈ Q, then G is a filter, so ∅ ∉ G; accordingly
∅ ∉ HQ. (ii) If E, F ∈ HQ, then there are G1, G2 ∈ Q such that E ∈ G1 and F ∈ G2. Because Q is totally ordered,
either G1 ⊆ G2 or G2 ⊆ G1. In either case, G = G1 ∪ G2 ∈ Q. Now G is a filter containing both E and F, so it contains
E ∩ F, and E ∩ F ∈ HQ. (iii) If X ⊇ E ⊇ F ∈ HQ, there is a G ∈ Q such that F ∈ G; and E ∈ G ⊆ HQ. This shows
that HQ is a filter on X. (iv) Finally, HQ ⊇ G0 ⊇ F, so HQ ∈ P. QQ Now HQ is evidently an upper bound for Q in P.
We may therefore apply Zorn’s Lemma to find a maximal element H of P. This H is surely a filter on X including
F.
Now let A ⊆ X be such that A ∉ H. Consider
H1 = {E : E ⊆ X, E ∪ A ∈ H}.
This is a filter on X. PP Of course it is a family of subsets of X. (i) X ∪ A = X ∈ H, so X ∈ H1. ∅ ∪ A = A ∉ H so
∅ ∉ H1. (ii) If E, F ∈ H1 then
(E ∩ F) ∪ A = (E ∪ A) ∩ (F ∪ A) ∈ H,
so E ∩ F ∈ H1. (iii) If X ⊇ E ⊇ F ∈ H1 then E ∪ A ⊇ F ∪ A ∈ H, so E ∪ A ∈ H and E ∈ H1. QQ Also H1 ⊇ H, so
H1 ∈ P. But H is a maximal element of P, so H1 = H. Since (X \ A) ∪ A = X ∈ H, X \ A ∈ H1 and X \ A ∈ H.
As A is arbitrary, H is an ultrafilter, as required.
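
On a finite set the second half of this proof can be run as a procedure, and Zorn’s Lemma is not needed. The Python sketch below is an illustration under that assumption, with hypothetical names: it starts from a filter and, for each A not yet in the current family H, replaces H by the family {E : E ∪ A ∈ H} of the proof, which includes H and contains X \ A.

```python
from itertools import chain, combinations

def powerset(X):
    """All subsets of X, as frozensets."""
    items = list(X)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def extend_to_ultrafilter(F, X):
    """Enlarge the filter F on the finite set X to an ultrafilter, one set A at a time."""
    X = frozenset(X)
    subsets = powerset(X)
    H = frozenset(F)
    for A in subsets:
        if A not in H:
            # The family H1 of the proof: a filter including H and containing X \ A.
            H = frozenset(E for E in subsets if (E | A) in H)
    return H
```

Because H only grows, a set A that was in H when it was examined stays there, and a set that was not has its complement added; so at the end one of A, X \ A belongs to H for every A ⊆ X. Which ultrafilter is reached depends on the order in which the sets A are examined.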

2A1P I come now to a result from infinitary combinatorics for which I give a detailed proof, not because it cannot
be found in many textbooks, but because it is usually given in enormously greater generality, to the point indeed that
it may be harder to understand why the stated theorem covers the present result than to prove the latter from first
principles.
Theorem (a) Let ⟨Kα⟩_{α∈A} be a family of countable sets, with #(A) strictly greater than c, the cardinal of the
continuum. Then there are a set M, of cardinal at most c, and a set B ⊆ A, of cardinal strictly greater than c, such
that Kα ∩ Kβ ⊆ M whenever α, β are distinct members of B.
(b) Let I be a set, and ⟨fα⟩_{α∈A} a family in {0, 1}^I, the set of functions from I to {0, 1}, with #(A) > c. If ⟨Kα⟩_{α∈A}
is any family of countable subsets of I, then there is a set B ⊆ A, of cardinal greater than c, such that fα and fβ agree
on Kα ∩ Kβ for all α, β ∈ B.
(c) In particular, under the conditions of (b), there are distinct α, β ∈ A such that fα and fβ agree on Kα ∩ Kβ.
proof (a) Choose inductively a family ⟨Mξ⟩_{ξ<ω1} of sets by the rule
if there is any set N such that
(∗) N is disjoint from ⋃_{η<ξ} Mη, #(N) ≤ c and
#({α : α ∈ A, Kα ∩ N = ∅}) ≤ c,
choose such a set for Mξ;
otherwise set Mξ = ∅.
When Mξ has been chosen for every ξ < ω1, set M = ⋃_{ξ<ω1} Mξ. The rule ensures that ⟨Mξ⟩_{ξ<ω1} is disjoint and that
#(Mξ) ≤ c for every ξ < ω1, while ω1 ≤ c, so #(M) ≤ c.
Let 𝒫 be the family of sets P ⊆ A such that Kα ∩ Kβ ⊆ M for all distinct α, β ∈ P. Order 𝒫 by inclusion, so that
it is a partially ordered set. If 𝒬 ⊆ 𝒫 is totally ordered, then ⋃𝒬 ∈ 𝒫. PP If α, β are distinct members of ⋃𝒬, there
are Q1, Q2 ∈ 𝒬 such that α ∈ Q1, β ∈ Q2; now P = Q1 ∪ Q2 is equal to one of Q1, Q2, and in either case belongs to
𝒫 and contains both α and β, so Kα ∩ Kβ ⊆ M. QQ By Zorn’s Lemma, 𝒫 has a maximal element B, and we surely
have Kα ∩ Kβ ⊆ M for all distinct α, β ∈ B.
?? Suppose, if possible, that #(B) ≤ c. Set N = ⋃_{α∈B} Kα \ M. Then N has cardinal at most c, being included
in a union of at most c countable sets. For every γ ∈ A \ B, B ∪ {γ} ∉ 𝒫, so there must be some α ∈ B such that
Kα ∩ Kγ ⊈ M; that is, Kγ ∩ N ≠ ∅. Thus {γ : Kγ ∩ N = ∅} ⊆ B has cardinal at most c. But this means that
in the rule for choosing Mξ, there was always an N satisfying the condition (∗), and therefore Mξ also did. Thus
Cξ = {α : Kα ∩ Mξ = ∅} has cardinal at most c for every ξ < ω1. So C = ⋃_{ξ<ω1} Cξ also has. But the original
hypothesis was that #(A) > c, so there is an α ∈ A \ C. In this case, Kα ∩ Mξ ≠ ∅ for every ξ < ω1. But this means
that we have a surjection φ : Kα ∩ M → ω1 given by setting
φ(i) = ξ if i ∈ Kα ∩ Mξ.
Since #(Kα) ≤ ω < ω1, this is impossible. XX
Accordingly #(B) > c and we have found a suitable pair M, B.
(b) By (a), we can find a set M, of cardinal at most c, and a set B0 ⊆ A, of cardinal greater than c, such that
Kα ∩ Kβ ⊆ M for all distinct α, β ∈ B0. Let H be the set of functions from countable subsets of M to {0, 1}; then
fα′ = fα↾(Kα ∩ M) ∈ H for each α ∈ B0. Now B0 = ⋃_{h∈H} {α : α ∈ B0, fα′ = h} has cardinal greater than c, while
#(H) ≤ c (2A1Le), so there must be some h ∈ H such that B = {α : α ∈ B0, fα′ = h} has cardinal greater than c.
If α, β are distinct members of B, then Kα ∩ Kβ ⊆ M, because α, β ∈ B0; but this means that
fα↾(Kα ∩ Kβ) = h↾(Kα ∩ Kβ) = fβ↾(Kα ∩ Kβ).
Thus B has the required property. (Of course fα and fβ agree on Kα ∩ Kβ if α = β.)
(c) follows at once.
Remark The result we need in this volume (in 216E) is part (c) above. There are other proofs of this, perhaps a little
simpler; but the stronger result in part (b) will be useful in Volume 3.

2A2 The topology of Euclidean space


In the appendix to Volume 1 (§1A2) I discussed open and closed sets in R r ; the chief aim there was to support
the idea of ‘Borel set’, which is vital in the theory of Lebesgue measure, but of course they are also fundamental to
the study of continuous functions, and indeed to all aspects of real analysis. I give here a very brief introduction to
the further elementary facts about closed and compact sets and continuous functions which we need for this volume.
Much of this material can be derived from the generalizations in §2A3, but nevertheless I sketch the proofs, since for
the greater part of the volume (most of the exceptions are in Chapter 24) Euclidean space is sufficient for our needs.

2A2A Closures: Definition For any r ≥ 1 and any A ⊆ R^r, the closure of A, Ā, is the intersection of all the
closed subsets of R^r including A. This is itself closed (being the intersection of a non-empty family of closed sets, see
1A2Fd), so is the smallest closed set including A. In particular, A is closed iff Ā = A.

2A2B Lemma Let A ⊆ R^r be any set. Then for x ∈ R^r the following are equiveridical:
(i) x ∈ Ā, the closure of A;
(ii) B(x, δ) ∩ A ≠ ∅ for every δ > 0, where B(x, δ) = {y : ‖y − x‖ ≤ δ};
(iii) there is a sequence ⟨xn⟩_{n∈N} in A such that lim_{n→∞} ‖xn − x‖ = 0.
proof (a)(i)⇒(ii) Suppose that x ∈ Ā and δ > 0. Then U(x, δ) = {y : ‖y − x‖ < δ} is an open set (1A2D), so
F = R^r \ U(x, δ) is closed, while x ∉ F. Now
x ∈ Ā \ F ⟹ Ā ⊈ F ⟹ A ⊈ F ⟹ A ∩ U(x, δ) ≠ ∅ ⟹ A ∩ B(x, δ) ≠ ∅.
As δ is arbitrary, (ii) is true.
(b)(ii)⇒(iii) If (ii) is true, then for each n ∈ N we can find an xn ∈ A such that ‖xn − x‖ ≤ 2⁻ⁿ, and now
lim_{n→∞} ‖xn − x‖ = 0.
(c)(iii)⇒(i) Assume (iii). ?? Suppose, if possible, that x ∉ Ā. Then x belongs to the open set R^r \ Ā and there is
a δ > 0 such that U(x, δ) ⊆ R^r \ Ā. But now there is an n such that ‖xn − x‖ < δ, in which case xn ∈ U(x, δ) ∩ A ⊆
U(x, δ) ∩ Ā. XX

2A2C Continuous functions (a) I begin with a characterization of continuous functions in terms of open sets. If
r, s ≥ 1, D ⊆ R^r and φ : D → R^s is a function, we say that φ is continuous if for every x ∈ D and ε > 0 there is a
δ > 0 such that ‖φ(y) − φ(x)‖ ≤ ε whenever y ∈ D and ‖y − x‖ ≤ δ. Now φ is continuous iff for every open set G ⊆ R^s
there is an open set H ⊆ R^r such that φ⁻¹[G] = D ∩ H.
PP (i) Suppose that φ is continuous and that G ⊆ R^s is open. Set
H = ⋃{U : U ⊆ R^r is open, φ[U ∩ D] ⊆ G}.
Then H is a union of open sets, therefore open (1A2Bd), and H ∩ D ⊆ φ⁻¹[G]. If x ∈ φ⁻¹[G], then φ(x) ∈ G, so
there is an ε > 0 such that U(φ(x), ε) ⊆ G; now there is a δ > 0 such that ‖φ(y) − φ(x)‖ ≤ ½ε whenever y ∈ D and
‖y − x‖ ≤ δ, so that
φ[U(x, δ) ∩ D] ⊆ U(φ(x), ε) ⊆ G
and
x ∈ U(x, δ) ⊆ H.
As x is arbitrary, φ⁻¹[G] = H ∩ D. As G is arbitrary, φ satisfies the condition.
(ii) Now suppose that φ satisfies the condition. Take x ∈ D and ε > 0. Then U(φ(x), ε) is open, so there is an
open H ⊆ R^r such that H ∩ D = φ⁻¹[U(φ(x), ε)]; we see that x ∈ H, so there is a δ > 0 such that U(x, δ) ⊆ H; now
if y ∈ D and ‖y − x‖ ≤ ½δ then y ∈ D ∩ H, φ(y) ∈ U(φ(x), ε) and ‖φ(y) − φ(x)‖ ≤ ε. As x and ε are arbitrary, φ is
continuous. QQ

(b) Using the ε-δ definition of continuity, it is easy to see that a function φ from a subset D of R^r to R^s is continuous
iff all its components φi are continuous, writing φ(x) = (φ1(x), . . . , φs(x)) for x ∈ D. PP (i) If φ is continuous, i ≤ s,
x ∈ D and ε > 0, then there is a δ > 0 such that
|φi(y) − φi(x)| ≤ ‖φ(y) − φ(x)‖ ≤ ε
whenever y ∈ D and ‖y − x‖ ≤ δ. (ii) If every φi is continuous, x ∈ D and ε > 0, then there are δi > 0 such that
|φi(y) − φi(x)| ≤ ε/√s whenever y ∈ D and ‖y − x‖ ≤ δi; setting δ = min_{1≤i≤s} δi > 0, we have ‖φ(y) − φ(x)‖ ≤ ε
whenever y ∈ D and ‖y − x‖ ≤ δ. QQ

2A2D Compactness in R^r: Definition A subset F of R^r is called compact if whenever 𝒢 is a family of open
sets covering F then there is a finite subset 𝒢0 of 𝒢 still covering F.

2A2E Elementary properties of compact sets Take any r ≥ 1, and subsets D, F, G and K of R^r.

(a) If K is compact and F is closed, then K ∩ F is compact. PP Let 𝒢 be an open cover of F ∩ K. Then 𝒢 ∪ {R^r \ F}
is an open cover of K, so has a finite subcover 𝒢0 say. Now 𝒢0 \ {R^r \ F} is a finite subset of 𝒢 covering K ∩ F. As 𝒢
is arbitrary, K ∩ F is compact. QQ

(b) If s ≥ 1, φ : D → R^s is a continuous function, K is compact and K ⊆ D, then φ[K] is compact. PP Let 𝒱 be an
open cover of φ[K]. Let ℋ be
{H : H ⊆ R^r is open, ∃ V ∈ 𝒱, φ⁻¹[V] = D ∩ H}.
If x ∈ K, then φ(x) ∈ φ[K] so there is a V ∈ 𝒱 such that φ(x) ∈ V; now there is an H ∈ ℋ such that D ∩ H = φ⁻¹[V]
contains x (2A2Ca); as x is arbitrary, K ⊆ ⋃ℋ. Let ℋ0 be a finite subset of ℋ covering K. For each H ∈ ℋ0, let
VH ∈ 𝒱 be such that φ⁻¹[VH] = D ∩ H; then {VH : H ∈ ℋ0} is a finite subset of 𝒱 covering φ[K]. As 𝒱 is arbitrary,
φ[K] is compact. QQ

(c) If K is compact, it is closed. PP Write H = R^r \ K. Take any x ∈ H. Then Gn = R^r \ B(x, 2⁻ⁿ) is open for
every n ∈ N (1A2G). Also
⋃_{n∈N} Gn = {y : y ∈ R^r, ‖y − x‖ > 0} = R^r \ {x} ⊇ K.

So there is some finite set 𝒢0 ⊆ {Gn : n ∈ N} which covers K. There must be an n such that 𝒢0 ⊆ {Gi : i ≤ n}, so that
K ⊆ ⋃𝒢0 ⊆ ⋃_{i≤n} Gi = Gn,
and B(x, 2⁻ⁿ) ⊆ H. As x is arbitrary, H is open and K is closed. QQ

(d) If K is compact and G is open and K ⊆ G, then there is a δ > 0 such that K + B(0, δ) ⊆ G. PP If K = ∅, this
is trivial, as then
K + B(0, 1) = {x + y : x ∈ K, y ∈ B(0, 1)} = ∅.
Otherwise, set
𝒢 = {U(x, δ) : x ∈ R^r, δ > 0, U(x, 2δ) ⊆ G}.
Then 𝒢 is a family of open sets and ⋃𝒢 = G (because G is open), so 𝒢 is an open cover of K and has a finite subcover
𝒢0. Express 𝒢0 as {U(x0, δ0), . . . , U(xn, δn)} where U(xi, 2δi) ⊆ G for each i. Set δ = min_{i≤n} δi > 0. If x ∈ K and
y ∈ B(0, δ), then there is an i ≤ n such that x ∈ U(xi, δi); now
‖(x + y) − xi‖ ≤ ‖x − xi‖ + ‖y‖ < δi + δ ≤ 2δi,
so x + y ∈ U(xi, 2δi) ⊆ G. As x and y are arbitrary, K + B(0, δ) ⊆ G. QQ
Remark This result is a simple form of the Lebesgue covering lemma.

2A2F The value of the concept of ‘compactness’ is greatly increased by the fact that there is an effective charac-
terization of the compact subsets of R r .
Theorem For any r ≥ 1, a subset K of R r is compact iff it is closed and bounded.
proof (a) Suppose that K is compact. By 2A2Ec, it is closed. To see that it is bounded, consider 𝒢 = {U(0, n) : n ∈ N}.
𝒢 consists entirely of open sets, and ⋃𝒢 = R^r ⊇ K, so there is a finite 𝒢0 ⊆ 𝒢 covering K. There must be an n such
that 𝒢0 ⊆ {U(0, i) : i ≤ n}, so that
K ⊆ ⋃𝒢0 ⊆ ⋃_{i≤n} U(0, i) = U(0, n),
and K is bounded.
(b) Thus we are left with the converse; I have to show that a closed bounded set is compact. The main part
of the argument is a proof by induction on r that the closed interval [−𝐧, 𝐧] is compact for all n ∈ N, writing
𝐧 = (n, . . . , n) ∈ R^r.
(i) If r = 1 and n ∈ N and 𝒢 is a family of open sets in R covering [−n, n], set
A = {x : x ∈ [−n, n], there is a finite 𝒢0 ⊆ 𝒢 such that [−n, x] ⊆ ⋃𝒢0}.
Then −n ∈ A, because if −n ∈ G ∈ 𝒢 then [−n, −n] ⊆ ⋃{G}, and A is bounded above by n, so c = sup A exists and
belongs to [−n, n].
Next, c ∈ [−n, n] ⊆ ⋃𝒢, so there is a G ∈ 𝒢 containing c. Let δ > 0 be such that U(c, δ) ⊆ G. There is an x ∈ A
such that x ≥ c − δ. Let 𝒢0 be a finite subset of 𝒢 covering [−n, x]. Then 𝒢1 = 𝒢0 ∪ {G} is a finite subset of 𝒢 covering
[−n, c + ½δ]. But c + ½δ ∉ A so c + ½δ > n and 𝒢1 is a finite subset of 𝒢 covering [−n, n]. As 𝒢 is arbitrary, [−n, n] is
compact and the induction starts.
(ii) For the inductive step to r + 1, regard the closed interval F = [−𝐧, 𝐧], taken in R^(r+1), as the product of the
closed interval E = [−𝐧, 𝐧], taken in R^r, with the closed interval [−n, n] ⊆ R; by the inductive hypothesis, both E and
[−n, n] are compact. Let 𝒢 be a family of open subsets of R^(r+1) covering F. Write ℋ for the family of open subsets H
of R^r such that H × [−n, n] is covered by a finite subfamily of 𝒢. Then E ⊆ ⋃ℋ. PP Take x ∈ E. Set
𝒰x = {U : U ⊆ R is open, ∃ G ∈ 𝒢, open H ⊆ R^r, x ∈ H and H × U ⊆ G}.
Then 𝒰x is a family of open subsets of R. If ξ ∈ [−n, n], there is a G ∈ 𝒢 containing (x, ξ); there is a δ > 0 such that
U((x, ξ), δ) ⊆ G; now U(x, ½δ) and U(ξ, ½δ) are open sets in R^r, R respectively and
U(x, ½δ) × U(ξ, ½δ) ⊆ U((x, ξ), δ) ⊆ G,
so U(ξ, ½δ) ∈ 𝒰x. As ξ is arbitrary, 𝒰x is an open cover of [−n, n] in R. By (i), it has a finite subcover U0, . . . , Uk say.
For each j ≤ k we can find Hj, Gj such that Hj is an open subset of R^r containing x and Hj × Uj ⊆ Gj ∈ 𝒢. Now set
H = ⋂_{j≤k} Hj. This is an open subset of R^r containing x, and H × [−n, n] ⊆ ⋃_{j≤k} Gj is covered by a finite subfamily
of 𝒢. So x ∈ H ∈ ℋ. As x is arbitrary, ℋ covers E. QQ
(iii) Now the inductive hypothesis tells us that E is compact, so there is a finite subfamily ℋ0 of ℋ covering E.
For each H ∈ ℋ0 let 𝒢H be a finite subfamily of 𝒢 covering H × [−n, n]. Then ⋃_{H∈ℋ0} 𝒢H is a finite subfamily of 𝒢
covering E × [−n, n] = F. As 𝒢 is arbitrary, F is compact and the induction proceeds.
(iv) Thus the interval [−𝐧, 𝐧] is compact in R^r for every r, n. Now suppose that K is a closed bounded set in
R^r. Then there is an n ∈ N such that K ⊆ [−𝐧, 𝐧], that is, K = K ∩ [−𝐧, 𝐧]. As K is closed and [−𝐧, 𝐧] is compact,
K is compact, by 2A2Ea.
This completes the proof.

2A2G Corollary If φ : D → R is continuous, where D ⊆ R^r, and K ⊆ D is a non-empty compact set, then φ is
bounded and attains its bounds on K.
proof By 2A2Eb, φ[K] is compact; by 2A2F it is closed and bounded. To say that φ[K] is bounded is just to say
that φ is bounded on K. Because φ[K] is a non-empty bounded set, it has an infimum a and a supremum b; now both
belong to the closure of φ[K] (by the criterion 2A2B(ii), or otherwise); because φ[K] is closed, both belong to φ[K],
that is, φ attains its bounds.

2A2H Lim sup and lim inf revisited In §1A3 I briefly discussed lim sup_{n→∞} an, lim inf_{n→∞} an for real
sequences ⟨an⟩_{n∈N}. In this volume we need the notion of lim sup_{δ↓0} f(δ), lim inf_{δ↓0} f(δ) for real functions f. I say that
lim sup_{δ↓0} f(δ) = u ∈ [−∞, ∞] if (i) for every v > u there is an η > 0 such that f(δ) is defined and less than or equal
to v for every δ ∈ ]0, η] (ii) for every v < u and η > 0 there is a δ ∈ ]0, η] such that f(δ) is defined and greater than or
equal to v. Similarly, lim inf_{δ↓0} f(δ) = u ∈ [−∞, ∞] if (i) for every v < u there is an η > 0 such that f(δ) is defined
and greater than or equal to v for every δ ∈ ]0, η] (ii) for every v > u and η > 0 there is a δ ∈ ]0, η] such that f(δ) is
defined and less than or equal to v.

2A2I In the one-dimensional case, we have a particularly simple description of the open sets.
Proposition If G ⊆ R is any open set, it is expressible as the union of a countable disjoint family of open intervals.
proof For x, y ∈ G write x ∼ y if either x ≤ y and [x, y] ⊆ G or y ≤ x and [y, x] ⊆ G. It is easy to check that ∼
is an equivalence relation on G. Let 𝒞 be the set of equivalence classes under ∼. Then 𝒞 is a partition of G. Now
every C ∈ 𝒞 is an open interval. PP Set a = inf C, b = sup C (allowing a = −∞ and/or b = ∞ if C is unbounded). If
a < x < b, there are y, z ∈ C such that y ≤ x ≤ z, so that [y, x] ⊆ [y, z] ⊆ G and y ∼ x and x ∈ C; thus ]a, b[ ⊆ C. If
x ∈ C, there is an open interval I containing x and included in G; since x ∼ y for every y ∈ I, I ⊆ C; so
a ≤ inf I < x < sup I ≤ b
and x ∈ ]a, b[. Thus C = ]a, b[ is an open interval. QQ
To see that 𝒞 is countable, observe that every member of 𝒞 contains a member of ℚ, so that we have a surjective
function from a subset of ℚ onto 𝒞, and 𝒞 is countable (1A1E).
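
When G is given as a finite union of bounded open intervals, the equivalence classes of this proof can be computed by a single sweep from left to right. The Python sketch below is an illustration for that special case only, with a hypothetical function name; intervals are pairs (a, b) with a < b.

```python
def components(intervals):
    """Disjoint open intervals with the same union as the given open intervals (a, b), a < b."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:
            # The new interval shares a point with the current class, so it joins that class.
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            # a is at or beyond the right endpoint of the current class; that endpoint
            # belongs to no interval, so the class is complete and a new one starts.
            merged.append((a, b))
    return merged
```

The strict inequality matters: ]0, 1[ and ]1, 2[ stay separate because the point 1 is not in their union, whereas adding ]0.5, 1.5[ joins all three into ]0, 2[.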

2A3 General topology


At various points – principally §§245-247, but also for certain ideas in Chapter 27 – we need to know something about
non-metrizable topologies. I must say that you should probably take the time to look at some book on elementary
functional analysis which has the phrases ‘weak compactness’ or ‘weakly compact’ in the index. But I can list here the
concepts actually used in this volume, in a good deal less space than any orthodox, complete treatment would employ.

2A3A Topologies First we need to know what a ‘topology’ is. If X is any set, a topology on X is a family T of
subsets of X such that (i) ∅, X ∈ T (ii) if G, H ∈ T then G ∩ H ∈ T (iii) if 𝒢 ⊆ T then ⋃𝒢 ∈ T (cf. 1A2B). The pair
(X, T) is now a topological space. In this context, members of T are called open and their complements (in X) are
called closed (cf. 1A2E-1A2F).
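
For a finite set X the three conditions can be tested mechanically. The following Python sketch is an illustration only, with a hypothetical function name; it follows the definition literally, checking the union of every subfamily, which is feasible only for small families.

```python
from itertools import chain, combinations

def is_topology(T, X):
    """Check conditions (i)-(iii) of 2A3A for a family T of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(G) for G in T}
    if any(not G <= X for G in T):
        return False                      # every member must be a subset of X
    if frozenset() not in T or X not in T:
        return False                      # (i)
    if any(G & H not in T for G in T for H in T):
        return False                      # (ii)
    members = list(T)
    subfamilies = chain.from_iterable(
        combinations(members, k) for k in range(len(members) + 1))
    # (iii): the union of every subfamily lies in T (the empty subfamily has union ∅).
    return all(frozenset().union(*family) in T for family in subfamilies)
```

For example, on X = {0, 1, 2} the family {∅, {0}, {0, 1}, X} passes, while {∅, {0}, {1}, X} fails (iii) and {∅, {0, 1}, {1, 2}, X} fails (ii).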

2A3B Continuous functions (a) If (X, T) and (Y, S) are topological spaces, a function φ : X → Y is continuous
if φ⁻¹[G] ∈ T for every G ∈ S. (By 2A2Ca above, this is consistent with the ε-δ definition of continuity for functions
from one Euclidean space to another. See also 2A3H below.)

(b) If (X, T), (Y, S) and (Z, U) are topological spaces and φ : X → Y and ψ : Y → Z are continuous, then
ψφ : X → Z is continuous. PP If G ∈ U then ψ⁻¹[G] ∈ S so (ψφ)⁻¹[G] = φ⁻¹[ψ⁻¹[G]] ∈ T. QQ

(c) If (X, T) is a topological space, a function f : X → R is continuous iff {x : a < f(x) < b} is open whenever a < b
in R. PP (i) Every interval ]a, b[ is open in R, so if f is continuous its inverse image {x : a < f(x) < b} must be open.
(ii) Suppose that f⁻¹[ ]a, b[ ] is open whenever a < b, and let H ⊆ R be any open set. By the definition of ‘open’ set
in R (1A2A),
H = ⋃{]y − δ, y + δ[ : y ∈ R, δ > 0, ]y − δ, y + δ[ ⊆ H},
so
f⁻¹[H] = ⋃{f⁻¹[ ]y − δ, y + δ[ ] : y ∈ R, δ > 0, ]y − δ, y + δ[ ⊆ H}
is a union of open sets in X, therefore open. QQ

(d) If r ≥ 1, (X, T) is a topological space, and φ : X → R^r is a function, then φ is continuous iff φi : X → R
is continuous for each i ≤ r, where φ(x) = (φ1(x), . . . , φr(x)) for each x ∈ X. PP (i) Suppose that φ is continuous.
For i ≤ r, y = (η1, . . . , ηr) ∈ R^r, set πi(y) = ηi. Then |πi(y) − πi(z)| ≤ ‖y − z‖ for all y, z ∈ R^r so πi : R^r → R is
continuous. Consequently φi = πiφ is continuous, by (b) above. (ii) Suppose that every φi is continuous, and that
H ⊆ R^r is open. Set
𝒢 = {G : G ⊆ X is open, G ⊆ φ⁻¹[H]}.
Then G0 = ⋃𝒢 is open, and G0 ⊆ φ⁻¹[H]. But suppose that x0 is any point of φ⁻¹[H]. Then there is a δ > 0 such that
U(φ(x0), δ) ⊆ H, because H is open and contains φ(x0). For 1 ≤ i ≤ r set Vi = {x : φi(x0) − δ/√r < φi(x) < φi(x0) + δ/√r};
then Vi is the inverse image of an open set under the continuous map φi, so is open. Set G = ⋂_{i≤r} Vi. Then G is
open (using (ii) of the definition 2A3A), x0 ∈ G, and ‖φ(x) − φ(x0)‖ < δ for every x ∈ G, so G ⊆ φ⁻¹[H], G ∈ 𝒢 and
x0 ∈ G0. This shows that φ⁻¹[H] = G0 is open. As H is arbitrary, φ is continuous. QQ

(e) If (X, T) is a topological space, f1, . . . , fr are continuous functions from X to R, and h : R^r → R is continuous,
then h(f1, . . . , fr) : X → R is continuous. PP Set φ(x) = (f1(x), . . . , fr(x)) ∈ R^r for x ∈ X. By (d), φ is continuous, so
by 2A3Bb h(f1, . . . , fr) = hφ is continuous. QQ In particular, f + g, f × g and f − g are continuous for all continuous
functions f, g : X → R.

(f ) If (X, T) and (Y, S) are topological spaces and φ : X → Y is a continuous function, then φ−1 [F ] is closed in X
for every closed set F ⊆ Y . (For X \ φ−1 [F ] = φ−1 [Y \ F ] is open.)

2A3C Subspace topologies If (X, T) is a topological space and D ⊆ X, then TD = {G ∩ D : G ∈ T} is a
topology on D. PP (i) ∅ = ∅ ∩ D and D = X ∩ D belong to TD. (ii) If G, H ∈ TD there are G′, H′ ∈ T such that
G = G′ ∩ D, H = H′ ∩ D; now G ∩ H = G′ ∩ H′ ∩ D ∈ TD. (iii) If 𝒢 ⊆ TD set ℋ = {H : H ∈ T, H ∩ D ∈ 𝒢}; then
⋃𝒢 = (⋃ℋ) ∩ D ∈ TD. QQ
TD is called the subspace topology on D, or the topology on D induced by T. If (Y, S) is another topological
space, and φ : X → Y is (T, S)-continuous, then φ↾D : D → Y is (TD, S)-continuous. (For if H ∈ S then
(φ↾D)⁻¹[H] = D ∩ φ⁻¹[H] ∈ TD.)

2A3D Closures and interiors (a) In the proof of 2A3Bd I have already used the following idea. Let (X, T) be
any topological space and A any subset of X. Write
int A = ⋃{G : G ∈ T, G ⊆ A}.
Then int A is an open set, being a union of open sets, and is of course included in A; it must be the largest open set
included in A, and is called the interior of A.

(b) Because a set is closed iff its complement is open, we have a complementary notion:
Ā = ⋂{F : F is closed, A ⊆ F}
= X \ ⋃{X \ F : F is closed, A ⊆ F}
= X \ ⋃{G : G is open, A ∩ G = ∅}
= X \ ⋃{G : G is open, G ⊆ X \ A} = X \ int(X \ A).

Ā is closed (being the complement of an open set) and is the smallest closed set including A; it is called the closure
of A. (Compare 2A2A.) Because the union of two closed sets is closed (cf. 1A2Fc), the closure of A ∪ B is Ā ∪ B̄ for
all A, B ⊆ X.

(c) There are innumerable ways of looking at these concepts; a useful description of the closure of a set is

x ∈ Ā ⇐⇒ x ∉ int(X \ A)
⇐⇒ there is no open set containing x and included in X \ A
⇐⇒ every open set containing x meets A.
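
In a finite topological space the three descriptions of the closure can be computed and compared directly. The Python sketch below is an illustration under that finiteness assumption, with hypothetical function names; T is a list of the open sets and X the underlying set.

```python
def interior(A, T):
    """Union of all members of the topology T included in A."""
    A = frozenset(A)
    return frozenset().union(*(G for G in map(frozenset, T) if G <= A))

def closure(A, T, X):
    """Intersection of all closed sets including A (X itself is always one of them)."""
    A, X = frozenset(A), frozenset(X)
    closed = [X - frozenset(G) for G in T]
    return X.intersection(*(F for F in closed if A <= F))

def closure_by_neighbourhoods(A, T, X):
    """The points x such that every open set containing x meets A."""
    A = frozenset(A)
    return frozenset(x for x in X
                     if all(frozenset(G) & A for G in T if x in G))
```

On X = {0, 1, 2} with open sets ∅, {0}, {0, 1}, X, for instance, the closure of {1} comes out as {1, 2} by each route: the closed sets including {1} are {1, 2} and X, the interior of {0, 2} is {0}, and the only point with an open neighbourhood missing {1} is 0.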

2A3E Hausdorff topologies (a) The concept of ‘topological space’ is so widely drawn, and so widely applicable,
that a vast number of different types of topological space have been studied. For this volume we shall not need much
of the (very extensive) vocabulary which has been developed to describe this variety. But one useful word (and one
of the most important concepts) is that of ‘Hausdorff space’; a topological space X is Hausdorff if for all distinct x,
y ∈ X there are disjoint open sets G, H ⊆ X such that x ∈ G and y ∈ H.

(b) In a Hausdorff space X, finite sets are closed. PP If z ∈ X, then for any x ∈ X \ {z} there is an open set
containing x but not z, so X \ {z} is open and {z} is closed. So a finite set is a finite union of closed sets and is
therefore closed. QQ

2A3F Pseudometrics Many important topologies (not all!) can be defined by families of pseudometrics; it will be
useful to have a certain amount of technical skill with these.
(a) Let X be a set. A pseudometric on X is a function ρ : X × X → [0, ∞[ such that
ρ(x, z) ≤ ρ(x, y) + ρ(y, z) for all x, y, z ∈ X
(the ‘triangle inequality’;)
ρ(x, y) = ρ(y, x) for all x, y ∈ X;
ρ(x, x) = 0 for all x ∈ X.
A metric is a pseudometric ρ satisfying the further condition
if ρ(x, y) = 0 then x = y.

(b) Examples (i) For x, y ∈ R, set ρ(x, y) = |x − y|; then ρ is a metric on R (the ‘usual metric’ on R).
(ii) For x, y ∈ R^r, where r ≥ 1, set ρ(x, y) = ‖x − y‖, defining ‖z‖ = √(∑_{i=1}^r ζi²), as usual. Then ρ is a metric,
the Euclidean metric on R^r. (The triangle inequality for ρ comes from Cauchy’s inequality in 1A2C: if x, y, z ∈ R^r,
then
ρ(x, z) = ‖x − z‖ = ‖(x − y) + (y − z)‖ ≤ ‖x − y‖ + ‖y − z‖ = ρ(x, y) + ρ(y, z).
The other required properties of ρ are elementary. Compare 2A4Bb below.)
(iii) For an example of a pseudometric which is not a metric, take r ≥ 2 and define ρ : R r × Rr → [0, ∞[ by
setting ρ(x, y) = |ξ1 − η1 | whenever x = (ξ1 , . . . , ξr ), y = (η1 , . . . , ηr ) ∈ R r .
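
The conditions of (a) are easy to test numerically on a finite sample of points. A test of this kind can refute a candidate but cannot prove that it is a pseudometric on the whole space. The Python sketch below is an illustration only; the function names and the tolerance are hypothetical choices. It applies the test to examples (ii) and (iii) in R².

```python
from itertools import product
from math import dist, isclose

def is_pseudometric(rho, points, tol=1e-12):
    """Check the three conditions of 2A3F(a) for rho on a finite sample of points."""
    for x, y, z in product(points, repeat=3):
        if rho(x, z) > rho(x, y) + rho(y, z) + tol:      # triangle inequality
            return False
    for x, y in product(points, repeat=2):
        if rho(x, y) < 0 or not isclose(rho(x, y), rho(y, x), abs_tol=tol):
            return False                                 # values in [0, ∞[ and symmetry
    return all(rho(x, x) == 0 for x in points)

def separates(rho, points):
    """True if rho(x, y) > 0 for all distinct sample points, as a metric requires."""
    return all(rho(x, y) > 0 for x, y in product(points, repeat=2) if x != y)

def euclidean(x, y):
    """Example (ii): the Euclidean metric."""
    return dist(x, y)

def first_coordinate(x, y):
    """Example (iii): the distance between first coordinates only."""
    return abs(x[0] - y[0])
```

On the sample (0, 0), (0, 1), (1, 0), (2, 3) both functions pass the pseudometric test, but only the Euclidean one separates the points: (0, 0) and (0, 1) are at distance 0 for the pseudometric of example (iii).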

(c) Now let X be a set and P a non-empty family of pseudometrics on X. Let T be the family of those subsets G
of X such that for every x ∈ G there are ρ0, . . . , ρn ∈ P and δ > 0 such that
U(x; ρ0, . . . , ρn; δ) = {y : y ∈ X, max_{i≤n} ρi(y, x) < δ} ⊆ G.
Then T is a topology on X.
PP (Compare 1A2B.) (i) ∅ ∈ T because the condition is vacuously satisfied. X ∈ T because U(x; ρ; 1) ⊆ X
for any x ∈ X, ρ ∈ P. (ii) If G, H ∈ T and x ∈ G ∩ H, take ρ0, . . . , ρm, ρ′0, . . . , ρ′n ∈ P, δ, δ′ > 0 such that
U(x; ρ0, . . . , ρm; δ) ⊆ G, U(x; ρ′0, . . . , ρ′n; δ′) ⊆ H; then
U(x; ρ0, . . . , ρm, ρ′0, . . . , ρ′n; min(δ, δ′)) ⊆ G ∩ H.
As x is arbitrary, G ∩ H ∈ T. (iii) If 𝒢 ⊆ T and x ∈ ⋃𝒢, there is a G ∈ 𝒢 such that x ∈ G; now there are ρ0, . . . , ρn ∈ P
and δ > 0 such that
U(x; ρ0, . . . , ρn; δ) ⊆ G ⊆ ⋃𝒢.
As x is arbitrary, ⋃𝒢 ∈ T. QQ
T is the topology defined by P.

(d) You may wish to have a convention to deal with the case in which P is the empty set; in this case the topology
on X defined by P is {∅, X}.

(e) In many important cases, P is upwards-directed in the sense that for any ρ1 , ρ2 ∈ P there is a ρ ∈ P such that
ρi (x, y) ≤ ρ(x, y) for all x, y ∈ X and both i. In this case, of course, any set U (x; ρ0 , . . . , ρn ; δ), where ρ0 , . . . , ρn ∈ P,
includes some set of the form U (x; ρ; δ), where ρ ∈ P. Consequently, for instance, a set G ⊆ X is open iff for every
x ∈ G there are ρ ∈ P, δ > 0 such that U (x; ρ; δ) ⊆ G.

(f ) A topology T is metrizable if it is the topology defined by a family P consisting of a single metric. Thus the
Euclidean topology on R r is the metrizable topology defined by {ρ}, where ρ is the metric of (b-ii) above.

2A3G Proposition Let X be a set with a topology defined by a non-empty set P of pseudometrics on X. Then
U(x; ρ0, . . . , ρn; ε) is open for all x ∈ X, ρ0, . . . , ρn ∈ P and ε > 0.
proof (Compare 1A2D.) Take y ∈ U(x; ρ0, . . . , ρn; ε). Set
η = max_{i≤n} ρi(y, x), δ = ε − η > 0.
If z ∈ U(y; ρ0, . . . , ρn; δ) then
ρi(z, x) ≤ ρi(z, y) + ρi(y, x) < δ + η = ε
for each i ≤ n, so U(y; ρ0, . . . , ρn; δ) ⊆ U(x; ρ0, . . . , ρn; ε). As y is arbitrary, U(x; ρ0, . . . , ρn; ε) is open.

2A3H Now we have a result corresponding to 2A2Ca, describing continuous functions between topological spaces
defined by families of pseudometrics.
Proposition Let X and Y be sets; let P be a non-empty family of pseudometrics on X, and Θ a non-empty family
of pseudometrics on Y; let T and S be the corresponding topologies. Then a function φ : X → Y is continuous iff
whenever x ∈ X, θ ∈ Θ and ε > 0, there are ρ0, . . . , ρn ∈ P and δ > 0 such that θ(φ(y), φ(x)) ≤ ε whenever y ∈ X and
max_{i≤n} ρi(y, x) ≤ δ.
proof (a) Suppose that φ is continuous; take x ∈ X, θ ∈ Θ and ε > 0. By 2A3G, U(φ(x); θ; ε) ∈ S. So G =
φ⁻¹[U(φ(x); θ; ε)] ∈ T. Now x ∈ G, so there are ρ0, . . . , ρn ∈ P and δ > 0 such that U(x; ρ0, . . . , ρn; δ) ⊆ G. In this
case θ(φ(y), φ(x)) ≤ ε whenever y ∈ X and max_{i≤n} ρi(y, x) ≤ ½δ. As x, θ and ε are arbitrary, φ satisfies the condition.
(b) Suppose φ satisfies the condition. Take H ∈ S and consider G = φ⁻¹[H]. If x ∈ G, then φ(x) ∈ H, so there are
θ0, . . . , θn ∈ Θ and ε > 0 such that U(φ(x); θ0, . . . , θn; ε) ⊆ H. For each i ≤ n there are ρi0, . . . , ρi,mi ∈ P and δi > 0
such that θi(φ(y), φ(x)) ≤ ½ε whenever y ∈ X and max_{j≤mi} ρij(y, x) ≤ δi. Set δ = min_{i≤n} δi > 0; then
U(x; ρ00, . . . , ρ0,m0, . . . , ρn0, . . . , ρn,mn; δ) ⊆ G.
As x is arbitrary, G ∈ T. As H is arbitrary, φ is continuous.

2A3I Remarks (a) If P is upwards-directed, the condition simplifies to: for every x ∈ X, θ ∈ Θ and ε > 0, there
are ρ ∈ P and δ > 0 such that θ(φ(y), φ(x)) ≤ ε whenever y ∈ X and ρ(y, x) ≤ δ.

(b) Suppose we have a set X and two non-empty families P, Θ of pseudometrics on X, generating topologies T and
S on X. Then S ⊆ T iff the identity map φ from X to itself is a continuous function when regarded as a map from
(X, T) to (X, S), because this will mean that G = φ⁻¹[G] belongs to T whenever G ∈ S. Applying the proposition
above to φ, we see that this happens iff for every θ ∈ Θ, x ∈ X and ε > 0 there are ρ0, . . . , ρn ∈ P and δ > 0 such that
θ(y, x) ≤ ε whenever y ∈ X and max_{i≤n} ρi(y, x) ≤ δ. Similarly, reversing the roles of P and Θ, we get a criterion for
when T ⊆ S, and putting the two together we obtain a criterion to determine when T = S.

2A3J Subspaces: Proposition If X is a set, P a non-empty family of pseudometrics on X defining a topology T
on X, and D ⊆ X, then
(a) for every ρ ∈ P, the restriction ρ^(D) of ρ to D × D is a pseudometric on D;
(b) the topology defined by PD = {ρ^(D) : ρ ∈ P} on D is precisely the subspace topology TD described in 2A3C.
proof (a) is just a matter of reading through the definition in 2A3Fa. For (b), we have to think for a moment.
(i) Suppose that G belongs to the topology defined by PD . Set
ℋ = {H : H ∈ T, H ∩ D ⊆ G},
H∗ = ⋃ℋ ∈ T, G∗ = H∗ ∩ D ∈ TD;
then G∗ ⊆ G. On the other hand, if x ∈ G, then there are ρ0, …, ρn ∈ P and δ > 0 such that
U(x; ρ0^(D), …, ρn^(D); δ) = {y : y ∈ D, max_{i≤n} ρi^(D)(y, x) < δ} ⊆ G.
Consider
H = U(x; ρ0, …, ρn; δ) = {y : y ∈ X, max_{i≤n} ρi(y, x) < δ} ⊆ X.
Evidently
H ∩ D = U(x; ρ0^(D), …, ρn^(D); δ) ⊆ G.
Also H ∈ T. So H ∈ ℋ and
x ∈ H ∩ D ⊆ H∗ ∩ D = G∗.
Thus G = G∗ ∈ TD .
(ii) Now suppose that G ∈ TD . Then there is an H ∈ T such that G = H ∩D. Consider the identity map φ : D → X,
defined by saying that φ(x) = x for every x ∈ D. φ obviously satisfies the criterion of 2A3H, if we endow D with
PD and X with P, because ρ(φ(x), φ(y)) = ρ^(D)(x, y) whenever x, y ∈ D and ρ ∈ P; so φ must be continuous for the
associated topologies, and φ−1 [H] must belong to the topology defined by PD . But φ−1 [H] = G. Thus every set in TD
belongs to the topology defined by PD , and the two topologies are the same, as claimed.

2A3K Closures and interiors Let X be a set, P a non-empty family of pseudometrics on X and T the topology
defined by P.

(a) For any A ⊆ X and x ∈ X,

x ∈ int A ⇐⇒ there is an open set included in A containing x


⇐⇒ there are ρ0 , . . . , ρn ∈ P, δ > 0 such that U (x; ρ0 , . . . , ρn ; δ) ⊆ A.

(b) For any A ⊆ X and x ∈ X, x belongs to the closure A̅ of A iff U(x; ρ0, …, ρn; δ) ∩ A ≠ ∅ for every ρ0, …, ρn ∈ P and δ > 0. (Compare
2A2B(ii), 2A3Dc.)

2A3L Hausdorff topologies Recall that a topology T is Hausdorff if any two points can be separated by open
sets (2A3E). Now a topology defined on a set X by a non-empty family P of pseudometrics is Hausdorff iff for any
two different points x, y of X there is a ρ ∈ P such that ρ(x, y) > 0. P P (i) Suppose that the topology is Hausdorff
and that x, y are distinct points in X. Then there is an open set G containing x but not containing y. Now there are
ρ0, …, ρn ∈ P and δ > 0 such that U(x; ρ0, …, ρn; δ) ⊆ G, in which case ρi(y, x) ≥ δ > 0 for some i ≤ n. (ii) If P
satisfies the condition, and x, y are distinct points of X, take ρ ∈ P such that ρ(x, y) > 0, and set δ = ½ρ(x, y). Then
U (x; ρ; δ) and U (y; ρ; δ) are disjoint (because if z ∈ X, then
ρ(z, x) + ρ(z, y) ≥ ρ(x, y) = 2δ,
so at least one of ρ(z, x), ρ(z, y) is greater than or equal to δ), and they are open sets containing x, y respectively. As
x and y are arbitrary, the topology is Hausdorff. Q Q
In particular, metrizable topologies are Hausdorff.

2A3M Convergence of sequences (a) If (X, T) is any topological space, and ⟨xn⟩n∈N is a sequence in X, we say
that ⟨xn⟩n∈N converges to x ∈ X, or that x is a limit of ⟨xn⟩n∈N, or ⟨xn⟩n∈N → x, if for every open set G containing
x there is an n0 ∈ N such that xn ∈ G for every n ≥ n0.

(b) Warning In general topological spaces, it is possible for a sequence to have more than one limit, and we cannot
safely write x = limn→∞ xn. But in Hausdorff spaces, this does not occur. PP If T is Hausdorff, and x, y are distinct
points of X, there are disjoint open sets G, H such that x ∈ G and y ∈ H. If now ⟨xn⟩n∈N converges to x, there is an
n0 such that xn ∈ G for every n ≥ n0, so xn ∉ H for every n ≥ n0, and ⟨xn⟩n∈N cannot converge to y. QQ In particular,
a sequence in a metrizable space can have at most one limit.

(c) Let X be a set, and P a non-empty family of pseudometrics on X, generating a topology T; let ⟨xn⟩n∈N be a
sequence in X and x ∈ X. Then ⟨xn⟩n∈N converges to x iff limn→∞ ρ(xn, x) = 0 for every ρ ∈ P. PP (i) Suppose that
⟨xn⟩n∈N → x and that ρ ∈ P. Then for any ǫ > 0 the set G = U(x; ρ; ǫ) is an open set containing x, so there is an
n0 such that xn ∈ G for every n ≥ n0, that is, ρ(xn, x) < ǫ for every n ≥ n0. As ǫ is arbitrary, limn→∞ ρ(xn, x) = 0.
(ii) If the condition is satisfied, take any open set G containing x. Then there are ρ0, …, ρk ∈ P and δ > 0 such
that U(x; ρ0, …, ρk; δ) ⊆ G. For each i ≤ k there is an ni ∈ N such that ρi(xn, x) < δ for every n ≥ ni. Set
n∗ = max(n0, …, nk); then xn ∈ U(x; ρ0, …, ρk; δ) ⊆ G for every n ≥ n∗. As G is arbitrary, ⟨xn⟩n∈N → x. QQ

(d) Let (X, ρ) be a metric space, A a subset of X and x ∈ X. Then x belongs to the closure A̅ of A iff there is a
sequence in A converging to x. PP (i) If x ∈ A̅, then for every n ∈ N we can choose a point xn ∈ A ∩ U(x; ρ; 2^(−n))
(2A3Kb); now ⟨xn⟩n∈N → x. (ii) If ⟨xn⟩n∈N is a sequence in A converging to x, then for every open set G containing x
there is an n such that xn ∈ G, so that A ∩ G ≠ ∅; by 2A3Dc, x ∈ A̅. QQ

2A3N Compactness The next concept we need is the idea of ‘compactness’ in general topological spaces.

(a) If (X, T) is any topological space, a subset K of X is compact if whenever G is a family in T covering K, then
there is a finite G0 ⊆ G covering K. (Cf. 2A2D. A warning: many authors reserve the term ‘compact’ for Hausdorff
spaces.) A set A ⊆ X is relatively compact in X if there is a compact subset of X including A. (Warning! in
non-Hausdorff spaces, this is not the same thing as saying that A is compact.)

(b) Just as in 2A2E-2A2G (and the proofs are the same in the general case), we have the following results.
(i) If K is compact and E is closed, then K ∩ E is compact.
(ii) If K ⊆ X is compact and φ : K → Y is continuous, where (Y, S) is another topological space, then φ[K] is a
compact subset of Y .
(iii) If K ⊆ X is compact and φ : K → R is continuous, then φ is bounded and attains its bounds.

2A3O Cluster points (a) If (X, T) is a topological space, and ⟨xn⟩n∈N is a sequence in X, then a cluster point
of ⟨xn⟩n∈N is an x ∈ X such that whenever G is an open set containing x and n ∈ N then there is a k ≥ n such that
xk ∈ G.

(b) Now if (X, T) is a topological space and A ⊆ X is relatively compact, every sequence ⟨xn⟩n∈N in A has a cluster
point in X. PP Let K be a compact subset of X including A. Set
𝒢 = {G : G ∈ T, {n : xn ∈ G} is finite}.
?? If 𝒢 covers K, then there is a finite 𝒢0 ⊆ 𝒢 covering K. Now
N = {n : xn ∈ A} = {n : xn ∈ ⋃𝒢0} = ⋃_{G∈𝒢0} {n : xn ∈ G}
is a finite union of finite sets, which is absurd. XX Thus 𝒢 does not cover K. Take any x ∈ K \ ⋃𝒢. If G ∈ T and
x ∈ G and n ∈ N, then G ∉ 𝒢 so {k : xk ∈ G} is infinite and there is a k ≥ n such that xk ∈ G. Thus x is a cluster
point of ⟨xn⟩n∈N, as required. QQ

2A3P Filters In R^r, and more generally in all metrizable spaces, topological ideas can be effectively discussed
in terms of convergent sequences. (To be sure, this occasionally necessitates the use of a weak form of the axiom of
choice, in order to choose a sequence; but as measure theory without such choices is changed utterly – see Chapter 56
in Volume 5 – there is no point in fussing about them here.) For topological spaces in general, however, sequences are
quite inadequate, for very interesting reasons which I shall not enlarge upon. Instead we need to use ‘nets’ or ‘filters’.
The latter take a moment’s more effort at the beginning, but are then (in my view) much easier to work with, so I
describe this method now.

2A3Q Convergent filters (a) Let (X, T) be a topological space, F a filter on X (see 2A1I) and x a point of X.
We say that F is convergent to x, or that x is a limit of F, and write F → x, if every open set containing x belongs
to F.

(b) Let (X, T) and (Y, S) be topological spaces, φ : X → Y a continuous function, x ∈ X and F a filter on X
converging to x. Then φ[[F]] (as defined in 2A1Ib) converges to φ(x) (because φ−1 [G] is an open set containing x
whenever G is an open set containing φ(x)).

2A3R Now we have the following characterization of compactness.


Theorem Let X be a topological space, and K a subset of X. Then K is compact iff every ultrafilter on X containing
K has a limit in K.
proof (a) Suppose that K is compact and that ℱ is an ultrafilter on X containing K. Set
𝒢 = {G : G ⊆ X is open, X \ G ∈ ℱ}.
Then the union of any two members of 𝒢 belongs to 𝒢, so the union of any finite number of members of 𝒢 belongs to
𝒢; also no member of 𝒢 can include K, because X \ K ∉ ℱ. Because K is compact, it follows that 𝒢 cannot cover K.
Let x be any point of K \ ⋃𝒢. If G is any open set containing x, then G ∉ 𝒢 so X \ G ∉ ℱ; but this means that G
must belong to ℱ, because ℱ is an ultrafilter. As G is arbitrary, ℱ → x. Thus every ultrafilter on X containing K has
a limit in K.
(b) Now suppose that every ultrafilter on X containing K has a limit in K. Let 𝒢 be a cover of K by open sets in
X. ?? Suppose, if possible, that 𝒢 has no finite subcover. Set
ℱ = {F : there is a finite 𝒢0 ⊆ 𝒢, F ∪ ⋃𝒢0 ⊇ K}.
Then ℱ is a filter on X. PP (i) X ∪ ⋃∅ ⊇ K so X ∈ ℱ. Next,
∅ ∪ ⋃𝒢0 = ⋃𝒢0 ⊉ K
for any finite 𝒢0 ⊆ 𝒢, by hypothesis, so ∅ ∉ ℱ. (ii) If E, F ∈ ℱ there are finite sets 𝒢1, 𝒢2 ⊆ 𝒢 such that E ∪ ⋃𝒢1 and
F ∪ ⋃𝒢2 both include K; now (E ∩ F) ∪ ⋃(𝒢1 ∪ 𝒢2) ⊇ K so E ∩ F ∈ ℱ. (iii) If X ⊇ E ⊇ F ∈ ℱ then there is a finite
𝒢0 ⊆ 𝒢 such that F ∪ ⋃𝒢0 ⊇ K; now E ∪ ⋃𝒢0 ⊇ K and E ∈ ℱ. QQ
By the Ultrafilter Theorem (2A1O), there is an ultrafilter ℱ∗ on X including ℱ. Of course K itself belongs to ℱ,
so K ∈ ℱ∗. By hypothesis, ℱ∗ has a limit x ∈ K. But now there is a set G ∈ 𝒢 containing x, and (X \ G) ∪ G ⊇ K,
so X \ G ∈ ℱ ⊆ ℱ∗; which means that G cannot belong to ℱ∗, and x cannot be a limit of ℱ∗. XX
So 𝒢 has a finite subcover. As 𝒢 is arbitrary, K must be compact.
Remark Note that this theorem depends vitally on the Ultrafilter Theorem and therefore on the axiom of choice.

2A3S Further calculations with filters (a) In general, it is possible for a filter to have more than one limit; but
in Hausdorff spaces this does not occur. PP (Compare 2A3Mb.) If (X, T) is Hausdorff, and x, y are distinct points of
X, there are disjoint open sets G, H such that x ∈ G and y ∈ H. If now a filter F on X converges to x, G ∈ F so
H ∉ F and F does not converge to y. QQ
Accordingly we can safely write x = lim F when F → x in a Hausdorff space.

(b) Now suppose that X is a set, F is a filter on X, (Y, S) is a Hausdorff space, D ∈ F and φ : D → Y is a function.
Then we write limx→F φ(x) for lim φ[[F]] if this is defined in Y ; that is, limx→F φ(x) = y iff φ−1 [H] ∈ F for every open
set H containing y.
In the special case Y = R, limx→F φ(x) = a iff {x : |φ(x) − a| ≤ ǫ} ∈ F for every ǫ > 0 (because every open set
containing a includes a set of the form [a − ǫ, a + ǫ], which in turn includes the open set ]a − ǫ, a + ǫ[).

(c) Suppose that X and Y are sets, F is a filter on X, Θ is a non-empty family of pseudometrics on Y defining a
topology S on Y , and φ : X → Y is a function. Then the image filter φ[[F]] converges to y ∈ Y iff limx→F θ(φ(x), y) = 0
in R for every θ ∈ Θ. P P (i) Suppose that φ[[F]] → y. For every θ ∈ Θ and ǫ > 0, U (y; θ; ǫ) = {z : θ(z, y) < ǫ} is
an open set containing y (2A3G), so belongs to φ[[F]], and its inverse image {x : 0 ≤ θ(φ(x), y) < ǫ} belongs to
F. As ǫ is arbitrary, limx→F θ(φ(x), y) = 0. As θ is arbitrary, φ satisfies the condition. (ii) Now suppose that
limx→F θ(φ(x), y) = 0 for every θ ∈ Θ. Let G be any open set in Y containing y. Then there are θ0 , . . . , θn ∈ Θ and
ǫ > 0 such that
U(y; θ0, …, θn; ǫ) = ⋂_{i≤n} U(y; θi; ǫ) ⊆ G.
For each i ≤ n,
φ^(−1)[U(y; θi; ǫ)] = {x : θi(φ(x), y) < ǫ}
belongs to F; because F is closed under finite intersections, so do φ−1 [U (y; θ0 , . . . , θn ; ǫ)] and its superset φ−1 [G]. Thus
G ∈ φ[[F]]. As G is arbitrary, φ[[F]] → y. QQ

(d) In particular, taking X = Y and φ the identity map, if X has a topology T defined by a non-empty family P of
pseudometrics, then a filter F on X converges to x ∈ X iff limy→F ρ(y, x) = 0 for every ρ ∈ P.

(e)(i) If X is any set, F is an ultrafilter on X, (Y, S) is a Hausdorff space, and h : X → Y is a function such that h[F ]
is relatively compact in Y for some F ∈ F, then limx→F h(x) is defined in Y . P P Let K ⊆ Y be a compact set including
h[F ]. Then K ∈ h[[F]], which is an ultrafilter (2A1N), so h[[F]] has a limit in Y (2A3R), which is limx→F h(x). Q Q
(ii) If X is any set, F is an ultrafilter on X, and h : X → R is a function such that h[F] is bounded in R for
some set F ∈ F, then limx→F h(x) exists in R. PP The closure of h[F] is closed and bounded, therefore compact (2A2F),
so h[F] is relatively compact and we can use (i). QQ

(f ) The concepts of lim sup, lim inf can be applied to filters. Suppose that F is a filter on a set X, and that
f : X → [−∞, ∞] is any function. Then

lim sup_{x→F} f(x) = inf{u : u ∈ [−∞, ∞], {x : f(x) ≤ u} ∈ F}
= inf_{F∈F} sup_{x∈F} f(x) ∈ [−∞, ∞],

lim inf_{x→F} f(x) = sup{u : u ∈ [−∞, ∞], {x : f(x) ≥ u} ∈ F}
= sup_{F∈F} inf_{x∈F} f(x).

It is easy to see that, for any two functions f , g : X → R,


limx→F f (x) = a iff a = lim supx→F f (x) = lim inf x→F f (x),
and
lim supx→F f (x) + g(x) ≤ lim supx→F f (x) + lim supx→F g(x),

lim inf x→F f (x) + g(x) ≥ lim inf x→F f (x) + lim inf x→F g(x),

lim inf x→F (−f (x)) = − lim supx→F f (x), lim supx→F (−f (x)) = − lim inf x→F f (x),

lim inf x→F cf (x) = c lim inf x→F f (x), lim supx→F cf (x) = c lim supx→F f (x)
whenever the right-hand-sides are defined in [−∞, ∞] and c ≥ 0. So if a = limx→F f (x) and b = limx→F g(x) exist in R,
limx→F f (x) + g(x) exists and is equal to a + b and limx→F cf (x) exists and is equal to c limx→F f (x) for every c ∈ R.
We also see that if f : X → R is such that
for every ǫ > 0 there is an F ∈ F such that supx∈F f (x) ≤ ǫ + inf x∈F f (x),
then lim supx→F f (x) ≤ ǫ + lim inf x→F f (x) for every ǫ > 0, so that limx→F f (x) is defined in [−∞, ∞].

(g) Note that the standard limits of real analysis can be represented in the form described here. For instance,
limn→∞ , lim supn→∞ , lim inf n→∞ correspond to limn→F0 , lim supn→F0 , lim inf n→F0 where F0 is the Fréchet filter
on N, the filter {N \ A : A ⊆ N is finite} of cofinite subsets of N. Similarly, limδ↓a , lim supδ↓a , lim inf δ↓a correspond to
limδ→F , lim supδ→F , lim inf δ→F where
F = {A : A ⊆ R, ∃ h > 0 such that ]a, a + h] ⊆ A}.
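As a small illustration of the two formulae for lim sup in (f), here is a Python sketch on a finite set, where every filter is principal. It is not part of the original text. The names `principal_filter`, `limsup_by_levels`, `limsup_by_sets` and `liminf_by_sets` are hypothetical helpers, and the shortcut of testing only the values of f relies on X being finite and the filter being proper.

```python
from itertools import combinations

def principal_filter(X, base):
    """All subsets of the finite set X that include the non-empty set `base`."""
    X, base = frozenset(X), frozenset(base)
    rest = sorted(X - base)
    return [base | frozenset(extra)
            for r in range(len(rest) + 1)
            for extra in combinations(rest, r)]

def limsup_by_levels(f, X, filt):
    """inf{u : {x : f(x) <= u} is in the filter}; only values of f need testing here."""
    members = set(filt)
    return min(u for u in set(f.values())
               if frozenset(x for x in X if f[x] <= u) in members)

def limsup_by_sets(f, filt):
    """inf over F in the filter of sup over x in F of f(x)."""
    return min(max(f[x] for x in F) for F in filt)

def liminf_by_sets(f, filt):
    """sup over F in the filter of inf over x in F of f(x)."""
    return max(min(f[x] for x in F) for F in filt)

# Example data (assumed purely for illustration).
X = {0, 1, 2, 3}
f = {0: 5.0, 1: 1.0, 2: 3.0, 3: -2.0}
filt = principal_filter(X, {1, 2})
```

For a principal filter generated by a set B, both expressions reduce to the supremum of f over B, which is what the two functions return for the example data.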
2A3T Product topologies We need some brief remarks concerning topologies on product spaces.
(a) Let (X, T) and (Y, S) be topological spaces. Let 𝔘 be the set of subsets U of X × Y such that for every (x, y) ∈ U
there are G ∈ T, H ∈ S such that (x, y) ∈ G × H ⊆ U. Then 𝔘 is a topology on X × Y. PP (i) ∅ ∈ 𝔘 because the
condition for membership of 𝔘 is vacuously satisfied. X × Y ∈ 𝔘 because X ∈ T, Y ∈ S and (x, y) ∈ X × Y ⊆ X × Y
for every (x, y) ∈ X × Y. (ii) If U, V ∈ 𝔘 and (x, y) ∈ U ∩ V, then there are G, G′ ∈ T, H, H′ ∈ S such that
(x, y) ∈ G × H ⊆ U, (x, y) ∈ G′ × H′ ⊆ V;
now G ∩ G′ ∈ T, H ∩ H′ ∈ S and
(x, y) ∈ (G ∩ G′) × (H ∩ H′) ⊆ U ∩ V.
As (x, y) is arbitrary, U ∩ V ∈ 𝔘. (iii) If 𝒰 ⊆ 𝔘 and (x, y) ∈ ⋃𝒰, then there is a U ∈ 𝒰 such that (x, y) ∈ U; now there
are G ∈ T, H ∈ S such that (x, y) ∈ G × H ⊆ U ⊆ ⋃𝒰. As (x, y) is arbitrary, ⋃𝒰 ∈ 𝔘. QQ
𝔘 is called the product topology on X × Y.
(b) Suppose, in (a), that T and S are defined by non-empty families P, Θ of pseudometrics in the manner of 2A3F.
Then 𝔘 is defined by the family Υ = {ρ̃ : ρ ∈ P} ∪ {θ̄ : θ ∈ Θ} of pseudometrics on X × Y, where
ρ̃((x, y), (x′, y′)) = ρ(x, x′), θ̄((x, y), (x′, y′)) = θ(y, y′)
whenever x, x′ ∈ X, y, y′ ∈ Y, ρ ∈ P and θ ∈ Θ.
PP (i) Of course you should check that every ρ̃, θ̄ is a pseudometric on X × Y.
(ii) If U ∈ 𝔘 and (x, y) ∈ U, then there are G ∈ T, H ∈ S such that (x, y) ∈ G × H ⊆ U. There are ρ0, …, ρm ∈ P,
θ0, …, θn ∈ Θ, δ, δ′ > 0 such that (in the language of 2A3Fc) U(x; ρ0, …, ρm; δ) ⊆ G, U(y; θ0, …, θn; δ′) ⊆ H. Now
U((x, y); ρ̃0, …, ρ̃m, θ̄0, …, θ̄n; min(δ, δ′)) ⊆ U.
As (x, y) is arbitrary, U is open for the topology generated by Υ.
(iii) If U ⊆ X × Y is open for the topology defined by Υ, take any (x, y) ∈ U. Then there are υ0, …, υk ∈ Υ
and δ > 0 such that U((x, y); υ0, …, υk; δ) ⊆ U. Take ρ0, …, ρm ∈ P and θ0, …, θn ∈ Θ such that {υ0, …, υk} ⊆
{ρ̃0, …, ρ̃m, θ̄0, …, θ̄n}; then G = U(x; ρ0, …, ρm; δ) ∈ T (2A3G), H = U(y; θ0, …, θn; δ) ∈ S, and
(x, y) ∈ G × H = U((x, y); ρ̃0, …, ρ̃m, θ̄0, …, θ̄n; δ) ⊆ U((x, y); υ0, …, υk; δ) ⊆ U.
As (x, y) is arbitrary, U ∈ 𝔘. This completes the proof that 𝔘 is the topology defined by Υ. QQ
(c) In particular, the product topology on R^r × R^s is the Euclidean topology if we identify R^r × R^s with R^(r+s). PP
The product topology is defined by the two pseudometrics υ1, υ2, where for x, x′ ∈ R^r and y, y′ ∈ R^s I write
υ1((x, y), (x′, y′)) = ‖x − x′‖,
υ2((x, y), (x′, y′)) = ‖y − y′‖
(2A3F(b-ii)). Similarly, the Euclidean topology on R^r × R^s ≅ R^(r+s) is defined by the metric ρ, where
ρ((x, y), (x′, y′)) = ‖(x, y) − (x′, y′)‖ = √(‖x − x′‖^2 + ‖y − y′‖^2).
Now if (x, y) ∈ R^r × R^s and ǫ > 0, then
U((x, y); ρ; ǫ) ⊆ U((x, y); υj; ǫ)
for both j, while
U((x, y); υ1, υ2; ǫ/√2) ⊆ U((x, y); ρ; ǫ).
Thus, as remarked in 2A3Ib, each topology is included in the other, and they are the same. QQ
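The two ball inclusions used in (c) can be spot-checked numerically. The following Python sketch is an illustration only, not part of the proof: it samples points near the origin of R^2 × R^3 and counts any that would contradict either inclusion. The function names, the dimensions r = 2, s = 3, and the small floating-point tolerance `tol` are all assumptions made for the example.

```python
import math
import random

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def diff(a, b):
    return [s - t for s, t in zip(a, b)]

def upsilon1(p, q):
    """Pseudometric on the product: distance between first coordinates."""
    return norm(diff(p[0], q[0]))

def upsilon2(p, q):
    """Pseudometric on the product: distance between second coordinates."""
    return norm(diff(p[1], q[1]))

def rho(p, q):
    """Euclidean metric after identifying R^r x R^s with R^(r+s) by concatenation."""
    return norm(diff(p[0] + p[1], q[0] + q[1]))

def count_violations(r=2, s=3, eps=0.5, trials=20000, seed=0, tol=1e-12):
    rng = random.Random(seed)
    centre = ([0.0] * r, [0.0] * s)
    bad = 0
    for _ in range(trials):
        q = ([rng.uniform(-eps, eps) for _ in range(r)],
             [rng.uniform(-eps, eps) for _ in range(s)])
        d1, d2, d = upsilon1(centre, q), upsilon2(centre, q), rho(centre, q)
        # U(.; rho; eps) should lie inside both U(.; upsilon_j; eps).
        if d < eps and max(d1, d2) >= eps + tol:
            bad += 1
        # U(.; upsilon1, upsilon2; eps/sqrt 2) should lie inside U(.; rho; eps).
        if max(d1, d2) < eps / math.sqrt(2) and d >= eps + tol:
            bad += 1
    return bad
```

A count of zero is of course only evidence, not a proof; the proof is the pair of inequalities max(υ1, υ2) ≤ ρ ≤ √2 · max(υ1, υ2).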

2A3U Dense sets (a) If X is a topological space, a set D ⊆ X is dense in X if its closure D̅ is the whole of X, that is,
if every non-empty open set meets D. More generally, if D ⊆ A ⊆ X, then D is dense in A if it is dense for the subspace
topology of A (2A3C), that is, if A ⊆ D̅.
(b) If T is defined by a non-empty family P of pseudometrics on X, then D ⊆ X is dense iff U(x; ρ0, …, ρn; δ) ∩ D ≠ ∅
whenever x ∈ X, ρ0, …, ρn ∈ P and δ > 0.
(c) If (X, T), (Y, S) are topological spaces, of which Y is Hausdorff (in particular, if (X, ρ) and (Y, θ) are metric
spaces), and f, g : X → Y are continuous functions which agree on some dense subset D of X, then f = g. PP ??
Suppose, if possible, that there is an x ∈ X such that f(x) ≠ g(x). Then there are open sets G, H ⊆ Y such that
f(x) ∈ G, g(x) ∈ H and G ∩ H = ∅. Now f^(−1)[G] ∩ g^(−1)[H] is an open set, containing x and therefore not empty, but
it cannot meet D, so x ∉ D̅ and D is not dense. XX QQ
(d) A topological space is called separable if it has a countable dense subset. For instance, R^r is separable for
every r ≥ 1, since Q^r is dense.

2A4 Normed spaces


In Chapter 24 I discuss the spaces Lp , for 1 ≤ p ≤ ∞, and describe their most basic properties. These spaces form
a group of fundamental examples for the general theory of ‘normed spaces’, the basis of functional analysis. This is
not the book from which you should learn that theory, but once again it may save you trouble if I briefly outline those
parts of the general theory which are essential if you are to make sense of the ideas here.

2A4A The real and complex fields While the most important parts of the theory, from the point of view of
measure theory, are most effectively dealt with in terms of real linear spaces, there are many applications in which
complex linear spaces are essential. I will therefore use the phrase
‘U is a linear space over R/C’
to mean that U is either a linear space over the field R or a linear space over the field C; it being understood that in
any particular context all linear spaces considered will be over the same field. In the same way, I will write ‘α ∈ R/C’ to
mean that α belongs to whichever is the current underlying field.

2A4B Definitions (a) A normed space is a linear space U over R/C together with a norm, that is, a functional
‖ ‖ : U → [0, ∞[ such that
‖u + v‖ ≤ ‖u‖ + ‖v‖ for all u, v ∈ U,
‖αu‖ = |α|‖u‖ for u ∈ U, α ∈ R/C,
‖u‖ = 0 only when u = 0, the zero vector of U.
(Observe that if u = 0 (the zero vector) then 0u = u (where this 0 is the zero scalar) so that ‖u‖ = |0|‖u‖ = 0.)

(b) If U is a normed space, then we have a metric ρ on U defined by saying that ρ(u, v) = ‖u − v‖ for u,
v ∈ U. PP ρ(u, v) ∈ [0, ∞[ for all u, v because ‖u‖ ∈ [0, ∞[ for every u. ρ(u, v) = ρ(v, u) for all u, v because
‖v − u‖ = |−1|‖u − v‖ = ‖u − v‖ for all u, v. If u, v, w ∈ U then
ρ(u, w) = ‖u − w‖ = ‖(u − v) + (v − w)‖ ≤ ‖u − v‖ + ‖v − w‖ = ρ(u, v) + ρ(v, w).
If ρ(u, v) = 0 then ‖u − v‖ = 0 so u − v = 0 and u = v. QQ
We therefore have a corresponding topology, with open and closed sets, closures, convergent sequences and so on.

(c) If U is a normed space, a set A ⊆ U is bounded (for the norm) if {‖u‖ : u ∈ A} is bounded in R; that is, there
is some M ≥ 0 such that ‖u‖ ≤ M for every u ∈ A.

2A4C Linear subspaces (a) If U is any normed space and V is a linear subspace of U , then V is also a normed
space, if we take the norm of V to be just the restriction to V of the norm of U ; the verification is trivial.

(b) If V is a linear subspace of U, so is its closure V̅. PP Take u, u′ ∈ V̅ and α ∈ R/C. If ǫ > 0, set δ = ǫ/(2 + |α|) > 0;
then there are v, v′ ∈ V such that ‖u − v‖ ≤ δ, ‖u′ − v′‖ ≤ δ. Now v + v′, αv ∈ V and
‖(u + u′) − (v + v′)‖ ≤ ‖u − v‖ + ‖u′ − v′‖ ≤ ǫ, ‖αu − αv‖ ≤ |α|‖u − v‖ ≤ ǫ.
As ǫ is arbitrary, u + u′ and αu belong to V̅; as u, u′ and α are arbitrary, and 0 surely belongs to V ⊆ V̅, V̅ is a linear
subspace of U. QQ

2A4D Banach spaces (a) If U is a normed space, a sequence ⟨un⟩n∈N in U is Cauchy if ‖um − un‖ → 0 as m,
n → ∞, that is, for every ǫ > 0 there is an n0 ∈ N such that ‖um − un‖ ≤ ǫ for all m, n ≥ n0.

(b) A normed space U is complete if every Cauchy sequence has a limit; a complete normed space is called a
Banach space.

2A4E It is helpful to know the following result.


Lemma Let U be a normed space such that ⟨un⟩n∈N is convergent (that is, has a limit) in U whenever ⟨un⟩n∈N is a
sequence in U such that ‖u_{n+1} − un‖ ≤ 4^(−n) for every n ∈ N. Then U is complete.
proof Let ⟨un⟩n∈N be any Cauchy sequence in U. For each k ∈ N, let n_k ∈ N be such that ‖um − un‖ ≤ 4^(−k) whenever
m, n ≥ n_k. Set vk = u_{n_k} for each k. Then ‖v_{k+1} − vk‖ ≤ 4^(−k) (whether n_k ≤ n_{k+1} or n_{k+1} ≤ n_k). So ⟨vk⟩k∈N has a
limit v ∈ U. I seek to show that v is the required limit of ⟨un⟩n∈N. Given ǫ > 0, let l ∈ N be such that ‖vk − v‖ ≤ ǫ
for every k ≥ l; let k ≥ l be such that 4^(−k) ≤ ǫ; then if n ≥ n_k,
‖un − v‖ = ‖(un − vk) + (vk − v)‖ ≤ ‖un − vk‖ + ‖vk − v‖ ≤ ‖un − u_{n_k}‖ + ǫ ≤ 2ǫ.
As ǫ is arbitrary, v is a limit of ⟨un⟩n∈N. As ⟨un⟩n∈N is arbitrary, U is complete.
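The choice of the indices n_k in this proof can be made concrete for a specific Cauchy sequence. The Python sketch below is an added illustration under assumed data: it takes un = 1 − 2^(−n) in R, for which |um − un| ≤ 2^(−min(m, n)), and extracts a subsequence with steps bounded by 4^(−k). The names `u`, `modulus` and `fast_subsequence` are hypothetical.

```python
from fractions import Fraction

def u(n):
    """Example Cauchy sequence in R: u_n = 1 - 2^(-n), in exact arithmetic."""
    return 1 - Fraction(1, 2 ** n)

def modulus(eps):
    """Smallest n0 with 2^(-n0) <= eps; then |u(m) - u(n)| <= eps for all m, n >= n0."""
    n0 = 0
    while Fraction(1, 2 ** n0) > eps:
        n0 += 1
    return n0

def fast_subsequence(k_max):
    """Indices n_k = modulus(4^(-k)) and the terms v_k = u(n_k), for k < k_max."""
    idx = [modulus(Fraction(1, 4 ** k)) for k in range(k_max)]
    return idx, [u(n) for n in idx]

idx, v = fast_subsequence(5)
steps_ok = all(abs(v[k + 1] - v[k]) <= Fraction(1, 4 ** k) for k in range(4))
```

For this sequence the indices come out as n_k = 2k, and the extracted terms satisfy the hypothesis of the lemma.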

2A4F Bounded linear operators (a) Let U, V be two normed spaces. A linear operator T : U → V is bounded
if {‖Tu‖ : u ∈ U, ‖u‖ ≤ 1} is bounded. (Warning! in this context, we do not ask for the whole set of values T[U] to
be bounded; a ‘bounded linear operator’ need not be what we ordinarily call a ‘bounded function’.) Write B(U; V) for
the space of all bounded linear operators from U to V, and for T ∈ B(U; V) write ‖T‖ = sup{‖Tu‖ : u ∈ U, ‖u‖ ≤ 1}.

(b) A useful fact: ‖Tu‖ ≤ ‖T‖‖u‖ for every T ∈ B(U; V), u ∈ U. PP If |α| > ‖u‖ then
‖(1/α)u‖ = (1/|α|)‖u‖ ≤ 1,
so
‖Tu‖ = ‖αT((1/α)u)‖ = |α|‖T((1/α)u)‖ ≤ |α|‖T‖;
as α is arbitrary, ‖Tu‖ ≤ ‖T‖‖u‖. QQ
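A finite-dimensional instance may help fix the definitions in (a) and (b). The Python sketch below is an added illustration: it takes U = V = R^2 with the supremum norm, for which the operator norm of a matrix is its largest absolute row sum (a standard fact, used here without proof). The matrix `T` and the helper names are assumptions chosen for the example.

```python
import random

def sup_norm(v):
    """Supremum norm on R^n."""
    return max(abs(t) for t in v)

def apply(T, u):
    """Matrix-vector product, with T given as a list of rows."""
    return [sum(a * b for a, b in zip(row, u)) for row in T]

def operator_norm_sup(T):
    """sup{ sup_norm(Tu) : sup_norm(u) <= 1 }, equal to the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in T)

T = [[1, -2], [3, 4]]

# Integer samples keep the arithmetic exact, so no rounding tolerance is needed.
rng = random.Random(0)
samples = [[rng.randint(-50, 50), rng.randint(-50, 50)] for _ in range(1000)]
inequality_holds = all(
    sup_norm(apply(T, u)) <= operator_norm_sup(T) * sup_norm(u) for u in samples
)
```

Here the norm 7 is attained at u = (1, 1), while T(1000, 1000) has norm 7000: the operator is bounded in the sense of (a) although its set of values is not bounded.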

(c) A linear operator T : U → V is bounded iff it is continuous for the norm topologies on U and V. PP (i) If T is
bounded, u0 ∈ U and ǫ > 0, then
‖Tu − Tu0‖ = ‖T(u − u0)‖ ≤ ‖T‖‖u − u0‖ ≤ ǫ
whenever ‖u − u0‖ ≤ ǫ/(1 + ‖T‖); by 2A3H, T is continuous. (ii) If T is continuous, then there is some δ > 0 such that
‖Tu‖ = ‖Tu − T0‖ ≤ 1 whenever ‖u‖ = ‖u − 0‖ ≤ δ. If now ‖u‖ ≤ 1,
‖Tu‖ = (1/δ)‖T(δu)‖ ≤ 1/δ,
so T is a bounded operator. QQ

2A4G Theorem B(U; V) is a linear space over R/C, and ‖ ‖ is a norm on B(U; V).
proof I am rather supposing that you are aware, but in any case you will find it easy to check, that if S : U → V and
T : U → V are linear operators, and α ∈ R/C, then we have linear operators S + T and αT from U to V defined by the
formulae
(S + T)(u) = Su + Tu, (αT)(u) = α(Tu)
for every u ∈ U; moreover, that under these definitions of addition and scalar multiplication the space of all linear
operators from U to V is a linear space. Now we see that whenever S, T ∈ B(U; V), α ∈ R/C, u ∈ U and ‖u‖ ≤ 1,
‖(S + T)(u)‖ = ‖Su + Tu‖ ≤ ‖Su‖ + ‖Tu‖ ≤ ‖S‖ + ‖T‖,
‖(αT)u‖ = ‖α(Tu)‖ = |α|‖Tu‖ ≤ |α|‖T‖;
so that S + T and αT belong to B(U; V), with ‖S + T‖ ≤ ‖S‖ + ‖T‖ and ‖αT‖ ≤ |α|‖T‖. This shows that B(U; V) is
a linear subspace of the space of all linear operators and is therefore a linear space over R/C in its own right. To check
that the given formula for ‖T‖ defines a norm, most of the work has just been done; I suppose I should remark, for the
sake of form, that ‖T‖ ∈ [0, ∞[ for every T; if α = 0, then of course ‖αT‖ = 0 = |α|‖T‖; for other α,
|α|‖T‖ = |α|‖α^(−1)αT‖ ≤ |α||α^(−1)|‖αT‖ = ‖αT‖,
so ‖αT‖ = |α|‖T‖. Finally, if ‖T‖ = 0 then ‖Tu‖ ≤ ‖T‖‖u‖ = 0 for every u ∈ U, so Tu = 0 for every u and T is the
zero operator (in the space of all linear operators, and therefore in its subspace B(U; V)).

2A4H Dual spaces The most important case of B(U; V) is when V is the scalar field R/C itself (of course we can
think of R/C as a normed space over itself, writing ‖α‖ = |α| for each scalar α). In this case we call B(U; R/C) the dual of
U; it is commonly denoted U′ or U∗; I use the latter.
2A4I Extensions of bounded operators: Theorem Let U be a normed space and V ⊆ U a dense linear
subspace. Let W be a Banach space and T0 : V → W a bounded linear operator; then there is a unique bounded linear
operator T : U → W extending T0, and ‖T‖ = ‖T0‖.
proof (a) For any u ∈ U, there is a sequence ⟨vn⟩n∈N in V converging to u. Now
‖T0vm − T0vn‖ = ‖T0(vm − vn)‖ ≤ ‖T0‖‖vm − vn‖ ≤ ‖T0‖(‖vm − u‖ + ‖u − vn‖) → 0
as m, n → ∞, so ⟨T0vn⟩n∈N is Cauchy and w = limn→∞ T0vn is defined in W. If ⟨v′n⟩n∈N is another sequence in V
converging to u, then
‖w − T0v′n‖ ≤ ‖w − T0vn‖ + ‖T0(vn − v′n)‖
≤ ‖w − T0vn‖ + ‖T0‖(‖vn − u‖ + ‖u − v′n‖) → 0
as n → ∞, so w is also the limit of ⟨T0v′n⟩n∈N.
(b) We may therefore define T : U → W by setting Tu = limn→∞ T0vn whenever ⟨vn⟩n∈N is a sequence in V
converging to u. If v ∈ V, then we can set vn = v for every n to see that Tv = T0v; thus T extends T0. If u, u′ ∈ U
and α ∈ R/C, take sequences ⟨vn⟩n∈N, ⟨v′n⟩n∈N in V converging to u, u′ respectively; in this case
‖(u + u′) − (vn + v′n)‖ ≤ ‖u − vn‖ + ‖u′ − v′n‖ → 0, ‖αu − αvn‖ = |α|‖u − vn‖ → 0
as n → ∞, so that T(u + u′) = limn→∞ T0(vn + v′n), T(αu) = limn→∞ T0(αvn), and
‖T(u + u′) − Tu − Tu′‖ ≤ ‖T(u + u′) − T0(vn + v′n)‖ + ‖T0vn − Tu‖ + ‖T0v′n − Tu′‖ → 0,
‖T(αu) − αTu‖ ≤ ‖T(αu) − T0(αvn)‖ + |α|‖T0vn − Tu‖ → 0
as n → ∞. This means that ‖T(u + u′) − Tu − Tu′‖ = 0, ‖T(αu) − αTu‖ = 0 so T(u + u′) = Tu + Tu′, T(αu) = αTu;
as u, u′ and α are arbitrary, T is linear.
(c) For any u ∈ U, let ⟨vn⟩n∈N be a sequence in V converging to u. Then
‖Tu‖ ≤ ‖T0vn‖ + ‖Tu − T0vn‖ ≤ ‖T0‖‖vn‖ + ‖Tu − T0vn‖
≤ ‖T0‖(‖u‖ + ‖vn − u‖) + ‖Tu − T0vn‖ → ‖T0‖‖u‖
as n → ∞, so ‖Tu‖ ≤ ‖T0‖‖u‖. As u is arbitrary, T is bounded and ‖T‖ ≤ ‖T0‖. Of course ‖T‖ ≥ ‖T0‖ just because
T extends T0.
(d) Finally, let T̃ be any other bounded linear operator from U to W extending T0. If u ∈ U, there is a sequence
⟨vn⟩n∈N in V converging to u; now T̃vn = T0vn = Tvn for every n, so
‖T̃u − Tu‖ ≤ ‖T̃(u − vn)‖ + ‖T(vn − u)‖ ≤ (‖T̃‖ + ‖T‖)‖u − vn‖ → 0
as n → ∞, so ‖T̃u − Tu‖ = 0 and T̃u = Tu. As u is arbitrary, T̃ = T. Thus T is unique.

2A4J Normed algebras (a) A normed algebra is a normed space (U, ‖ ‖) together with a multiplication, a
binary operator × on U, such that
u × (v × w) = (u × v) × w,

u × (v + w) = (u × v) + (u × w), (u + v) × w = (u × w) + (v × w),

(αu) × v = u × (αv) = α(u × v),

‖u × v‖ ≤ ‖u‖‖v‖
for all u, v, w ∈ U and α ∈ R/C.

(b) A Banach algebra is a normed algebra which is a Banach space. A normed algebra U is commutative if its
multiplication is commutative, that is, u × v = v × u for all u, v ∈ U .

*2A4K Definition A normed space U is uniformly convex if for every ǫ > 0 there is a δ > 0 such that
‖u + v‖ ≤ 2 − δ whenever u, v ∈ U, ‖u‖ = ‖v‖ = 1 and ‖u − v‖ ≥ ǫ.
2A5 Linear topological spaces


The principal objective of §2A3 is in fact the study of certain topologies on the linear spaces of Chapter 24. I give
some fragments of the general theory.

2A5A Linear space topologies Something which is not covered in detail by every introduction to functional
analysis is the general concept of ‘linear topological space’. The ideas needed for the work of §245 are reasonably briefly
expressed.
Definition A linear topological space or topological vector space over R/C is a linear space U over R/C together
with a topology T such that the maps
(u, v) ↦ u + v : U × U → U,
(α, u) ↦ αu : R/C × U → U
are both continuous, where the product spaces U × U and R/C × U are given their product topologies (2A3T). Given a
linear space U, a topology on U satisfying the conditions above is a linear space topology. Note that
(u, v) ↦ u − v = u + (−1)v : U × U → U
will also be continuous.

2A5B All the linear topological spaces we need turn out to be readily presentable in the following terms.
Proposition Suppose that U is a linear space over R/C, and T is a family of functionals τ : U → [0, ∞[ such that
(i) τ (u + v) ≤ τ (u) + τ (v) for all u, v ∈ U , τ ∈ T;
(ii) τ (αu) ≤ τ (u) if u ∈ U , |α| ≤ 1, τ ∈ T;
(iii) limα→0 τ (αu) = 0 for every u ∈ U , τ ∈ T.
For τ ∈ T, define ρτ : U × U → [0, ∞[ by setting ρτ (u, v) = τ (u − v) for all u, v ∈ U . Then each ρτ is a pseudometric
on U , and the topology defined by P = {ρτ : τ ∈ T} renders U a linear topological space.
proof (a) It is worth noting immediately that
τ (0) = limα→0 τ (α0) = 0
for every τ ∈ T.
(b) To see that every ρτ is a pseudometric, argue as follows.
(i) ρτ takes values in [0, ∞[ because τ does.
(ii) If u, v, w ∈ U then

ρτ (u, w) = τ (u − w) = τ ((u − v) + (v − w))


≤ τ (u − v) + τ (v − w) = ρτ (u, v) + ρτ (v, w).

(iii) If u, v ∈ U , then
ρτ(v, u) = τ(v − u) = τ(−1(u − v)) ≤ τ(u − v) = ρτ(u, v),
and similarly ρτ (u, v) ≤ ρτ (v, u), so the two are equal.
(iv) If u ∈ U then ρτ (u, u) = τ (0) = 0.
(c) Let 𝔗 be the topology on U defined by {ρτ : τ ∈ T} (2A3F).
(i) Addition is continuous because, given τ ∈ T, we have

ρτ (u′ + v ′ , u + v) = τ ((u′ + v ′ ) − (u + v))


≤ τ (u′ − u) + τ (v ′ − v) ≤ ρτ (u′ , u) + ρτ (v ′ , v)
for all u, v, u′, v′ ∈ U. This means that, given ǫ > 0 and (u, v) ∈ U × U, we shall have
ρτ(u′ + v′, u + v) ≤ ǫ whenever (u′, v′) ∈ U((u, v); ρ̃τ, ρ̄τ; ǫ/2),
using the language of 2A3Tb. Because ρ̃τ, ρ̄τ are two of the pseudometrics defining the product topology of U × U
(2A3Tb), (u, v) ↦ u + v is continuous, by the criterion of 2A3H.
(ii) Scalar multiplication is continuous because if u ∈ U and n ∈ N then τ(nu) ≤ nτ(u) for every τ ∈ T (induce
on n). Consequently
τ(αu) ≤ nτ((α/n)u) ≤ nτ(u)
whenever |α| < n ∈ N and τ ∈ T. Now, given (α, u) ∈ R/C × U and ǫ > 0, take n > |α| and δ > 0 such that
δ ≤ min(n − |α|, ǫ/2n) and τ(γu) ≤ ǫ/2 whenever |γ| ≤ δ; then
ρτ(α′u′, αu) = τ(α′u′ − αu) ≤ τ(α′(u′ − u)) + τ((α′ − α)u)
≤ nτ(u′ − u) + τ((α′ − α)u)
whenever u′ ∈ U and α′ ∈ R/C and |α′| < n ∈ N. Accordingly, setting θ(α′, α) = |α′ − α| for α′, α ∈ R/C,
ρτ(α′u′, αu) ≤ nδ + ǫ/2 ≤ ǫ
whenever
(α′, u′) ∈ U((α, u); θ̃, ρ̄τ; δ).
Because θ̃ and ρ̄τ are among the pseudometrics defining the topology of R/C × U, the map (α, u) ↦ αu satisfies the
criterion of 2A3H and is continuous.
Thus the topology defined by {ρτ : τ ∈ T} is a linear space topology on U.

*2A5C We do not need it for Chapter 24, but the following is worth knowing.
Theorem Let U be a linear space and T a linear space topology on U .
(a) There is a family T of functionals satisfying the conditions (i)-(iii) of 2A5B and defining T.
(b) If T is metrizable, we can take T to consist of a single functional.
proof (a) Kelley & Namioka 76, p. 50.
(b) Köthe 69, §15.11.

2A5D Definition Let U be a linear space over ℝ or ℂ. Then a seminorm on U is a functional τ : U → [0, ∞[ such that
(i) τ (u + v) ≤ τ (u) + τ (v) for all u, v ∈ U ;
(ii) τ (αu) = |α|τ (u) if u ∈ U and α is a scalar (in ℝ or ℂ, as appropriate).
Observe that a norm is always a seminorm, and that a seminorm is always a functional of the type described in 2A5B.
In particular, the association of a metric with a norm (2A4Bb) is a special case of 2A5B.
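
To see a seminorm which is not a norm, here is a minimal sketch assuming the hypothetical functional τ(ξ1, ξ2) = |ξ1| on the plane; the helper names and test vectors are illustrative only. It checks the two defining conditions on a few integer vectors and shows that τ vanishes at a non-zero vector.

```python
def tau(u):
    """Hypothetical seminorm on R^2: the absolute value of the first coordinate."""
    return abs(u[0])


def add(u, v):
    """Coordinatewise sum of two vectors in R^2."""
    return (u[0] + v[0], u[1] + v[1])


def scale(a, u):
    """Scalar multiple a*u of a vector in R^2."""
    return (a * u[0], a * u[1])


# Arbitrary integer test data, so that all comparisons are exact.
vectors = [(0, 5), (2, -1), (-3, 4), (1, 1)]
scalars = [-3, -1, 0, 2]

# (i) tau(u + v) <= tau(u) + tau(v)
triangle = all(tau(add(u, v)) <= tau(u) + tau(v) for u in vectors for v in vectors)
# (ii) tau(a u) = |a| tau(u)
homogeneous = all(tau(scale(a, u)) == abs(a) * tau(u) for a in scalars for u in vectors)
# tau vanishes at the non-zero vector (0, 5), so tau is not a norm
vanishes_off_zero = tau((0, 5)) == 0
```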

2A5E Convex sets (a) Let U be a linear space over ℝ or ℂ. A subset C of U is convex if αu + (1 − α)v ∈ C whenever u,
v ∈ C and α ∈ [0, 1]. The intersection of any family of convex sets is convex, so for every set A ⊆ U there is a smallest
convex set including A; this is just the set of vectors expressible as Σ_{i=0}^n αi ui where u0 , . . . , un ∈ A, α0 , . . . , αn ∈ [0, 1]
and Σ_{i=0}^n αi = 1 (Bourbaki 87, II.2.3); it is the convex hull of A.
(b) If U is a linear topological space, the closure of any convex set is convex (Bourbaki 87, II.2.6). It follows that,
for any A ⊆ U , the closure of the convex hull of A is the smallest closed convex set including A; this is the closed
convex hull of A.
(c) I note for future reference that in a linear topological space, the closure of any linear subspace is a linear subspace.
(Bourbaki 87, I.1.3; Köthe 69, §15.2. Compare 2A4Cb.)
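
The description of the convex hull in (a) can be illustrated by a small computation. This is a sketch only, assuming the hypothetical set A = {(0, 0), (1, 0), (0, 1)} in the plane, whose convex hull is the closed triangle with those vertices; exact rational arithmetic is used so that the condition that the coefficients sum to 1 can be tested exactly.

```python
from fractions import Fraction


def convex_combination(points, weights):
    """Return sum_i weights[i] * points[i], after checking the weights are admissible."""
    if any(w < 0 or w > 1 for w in weights) or sum(weights) != 1:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    dim = len(points[0])
    return tuple(sum(w * p[k] for w, p in zip(weights, points)) for k in range(dim))


def in_triangle(p):
    """Membership of the closed triangle with vertices (0,0), (1,0), (0,1)."""
    x, y = p
    return x >= 0 and y >= 0 and x + y <= 1


# Hypothetical example data: three points and coefficients 1/2, 1/4, 1/4.
A = [(0, 0), (1, 0), (0, 1)]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
point = convex_combination(A, weights)
```

Here `point` is (1/4, 1/4), which lies in the triangle, as the formula predicts.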

2A5F Completeness in linear topological spaces In normed spaces, completeness can be described in terms
of Cauchy sequences (2A4D). In general linear topological spaces this is inadequate. The true theory of ‘completeness’
demands the concept of ‘uniform space’ (see §3A4 in the next volume, or Kelley 55, chap. 6; Engelking 89, §8.1;
Bourbaki 66, chap. II); I shall not describe this here, but will give a version adapted to linear spaces. I mention this
only because you will I hope some day come to the general theory (in Volume 3 of this treatise, if not before), and you
should be aware that the special case described here gives a misleading emphasis at some points.
Definitions Let U be a linear space over ℝ or ℂ, and T a linear space topology on U . A filter F on U is Cauchy if for
every open set G in U containing 0 there is an F ∈ F such that F − F = {u − v : u, v ∈ F } is included in G. U is
complete if every Cauchy filter on U is convergent.

2A5G Cauchy filters have a simple description when a linear space topology is defined by the method of 2A5B.
Lemma Let U be a linear space over ℝ or ℂ, and let T be a family of functionals defining a linear space topology on U , as
in 2A5B. Then a filter F on U is Cauchy iff for every τ ∈ T and ǫ > 0 there is an F ∈ F such that τ (u − v) ≤ ǫ for all
u, v ∈ F .
proof (a) Suppose that F is Cauchy, τ ∈ T and ǫ > 0. Then G = U (0; ρτ ; ǫ) is open (using the language of 2A3F-2A3G),
so there is an F ∈ F such that F − F ⊆ G; but this just means that τ (u − v) < ǫ for all u, v ∈ F .
(b) Suppose that F satisfies the criterion, and that G is an open set containing 0. Then there are τ0 , . . . , τn ∈ T
and ǫ > 0 such that U (0; ρτ0 , . . . , ρτn ; ǫ) ⊆ G. For each i ≤ n there is an Fi ∈ F such that τi (u − v) < ǫ/2 for all
u, v ∈ Fi ; now F = ∩i≤n Fi ∈ F and u − v ∈ G for all u, v ∈ F .

2A5H Normed spaces and sequential completeness I had better point out that for normed spaces the definition
of 2A5F agrees with that of 2A4D.
Proposition Let (U, ‖ ‖) be a normed space over ℝ or ℂ, and let T be the linear space topology on U defined by the method
of 2A5B from the set T = {‖ ‖}. Then U is complete in the sense of 2A5F iff it is complete in the sense of 2A4D.
proof (a) Suppose first that U is complete in the sense of 2A5F. Let ⟨un⟩n∈N be a sequence in U which is Cauchy in
the sense of 2A4Da. Set
F = {F : F ⊆ U, {n : un ∉ F } is finite}.
Then it is easy to check that F is a filter on U , the image of the Fréchet filter under the map n ↦ un : N → U .
If ǫ > 0, take m ∈ N such that ‖uj − uk‖ ≤ ǫ whenever j, k ≥ m; then F = {uj : j ≥ m} belongs to F, and
‖u − v‖ ≤ ǫ for all u, v ∈ F . So F is Cauchy in the sense of 2A5F, and has a limit u say. Now, for any ǫ > 0, the set
{v : ‖v − u‖ < ǫ} = U (u; ρ‖ ‖ ; ǫ) is an open set containing u, so belongs to F, and {n : ‖un − u‖ ≥ ǫ} is finite, that is,
there is an m ∈ N such that ‖un − u‖ < ǫ whenever n ≥ m. As ǫ is arbitrary, u = limn→∞ un in the sense of 2A3M.
As ⟨un⟩n∈N is arbitrary, U is complete in the sense of 2A4D.
(b) Now suppose that U is complete in the sense of 2A4D. Let F be a Cauchy filter on U . For each n ∈ N, choose
a set Fn ∈ F such that ‖u − v‖ ≤ 2^{−n} for all u, v ∈ Fn . For each n ∈ N, Fn′ = ∩i≤n Fi belongs to F, so is not empty;
choose un ∈ Fn′ . If m ∈ N and j, k ≥ m, then both uj and uk belong to Fm , so ‖uj − uk‖ ≤ 2^{−m} ; thus ⟨un⟩n∈N is a
Cauchy sequence in the sense of 2A4Da, and has a limit u say. Now take any ǫ > 0 and m ∈ N such that 2^{−m+1} ≤ ǫ.
There is surely a k ≥ m such that ‖uk − u‖ ≤ 2^{−m} ; now uk ∈ Fm , so
Fm ⊆ {v : ‖v − uk‖ ≤ 2^{−m} } ⊆ {v : ‖v − u‖ ≤ 2^{−m+1} } ⊆ {v : ρ‖ ‖ (v, u) ≤ ǫ},
and {v : ρ‖ ‖ (v, u) ≤ ǫ} ∈ F. As ǫ is arbitrary, F converges to u, by 2A3Sd. As F is arbitrary, U is complete.
(c) Thus the two definitions coincide, provided at least that we allow the countably many simultaneous choices of
the un in part (b) of the proof.

2A5I Weak topologies I come now to brief notes on ‘weak topologies’ on normed spaces; from the point of view
of this volume, these are in fact the primary examples of linear space topologies. Let U be a normed linear space over
ℝ or ℂ.

(a) Write U ∗ for its dual B(U ; ℝ) or B(U ; ℂ), as appropriate (2A4H). Let T be the set {|h| : h ∈ U ∗ }; then T satisfies
the conditions of 2A5B, so defines a linear space topology on U ; this is called the weak topology of U .

(b) A filter F on U converges to u ∈ U for the weak topology of U iff limv→F ρ|h| (v, u) = 0 for every h ∈ U ∗
(2A3Sd), that is, iff limv→F |h(v − u)| = 0 for every h ∈ U ∗ , that is, iff limv→F h(v) = h(u) for every h ∈ U ∗ .

(c) A set C ⊆ U is called weakly compact if it is compact for the weak topology of U . So (subject to the axiom
of choice) a set C ⊆ U is weakly compact iff for every ultrafilter F on U containing C there is a u ∈ C such that
limv→F h(v) = h(u) for every h ∈ U ∗ (put 2A3R together with (b) above).

(d) A subset A of U is called relatively weakly compact if it is a subset of some weakly compact subset of U .

(e) If h ∈ U ∗ , then h, regarded as a function from U to ℝ or ℂ, is continuous for the weak topology on U and the usual
topology of ℝ or ℂ; this is obvious if we apply the criterion of 2A3H. So if A ⊆ U is relatively weakly compact, h[A] must
be bounded in ℝ or ℂ. P Let C ⊇ A be a weakly compact set. Then h[C] is compact in ℝ or ℂ, by 2A3Nb, so is bounded,
by 2A2F (noting that if the underlying field is ℂ, then it can be identified, as metric space, with ℝ²). Accordingly h[A]
also is bounded. Q

(f ) If V is another normed space and T : U → V is a bounded linear operator, then T is continuous for the respective
weak topologies. P If h ∈ V ∗ then the composition hT belongs to U ∗ . Now, for any u, v ∈ U ,
ρ|h| (T u, T v) = |h(T u − T v)| = |hT (u − v)| = ρ|hT | (u, v),
taking ρ|h| , ρ|hT | to be the pseudometrics on V , U respectively defined by the formula of 2A5B. By 2A3H, T is
continuous. Q

(g) Corresponding to the weak topology on a normed space U , we have the weak* or w*-topology on its dual U ∗ ,
defined by the set T = {|û| : u ∈ U }, where I write û(f ) = f (u) for every f ∈ U ∗ , u ∈ U . As in (a), this is a linear
space topology on U ∗ . (It is essential to distinguish between the ‘weak*’ topology and the ‘weak’ topology on U ∗ . The
former depends only on the action of U on U ∗ , the latter on the action of U ∗∗ = (U ∗ )∗ . You will have no difficulty in
checking that û ∈ U ∗∗ for every u ∈ U , but the point is that there may be members of U ∗∗ not representable in this
way, leading to open sets for the weak topology which are not open for the weak* topology.)

*2A5J Angelic spaces I do not rely on the following ideas, but they may throw light on some results in §§246-247.
First, a topological space X is regular if whenever G ⊆ X is open and x ∈ G then there is an open set H such that
x ∈ H ⊆ H̄ ⊆ G, where H̄ is the closure of H. Next, a regular Hausdorff space X is angelic if whenever A ⊆ X is such
that every sequence in A has a cluster point in X, then the closure Ā of A is compact and every point of Ā is the limit
of a sequence in A. What this means is
that compactness in X, and the topologies of compact subsets of X, can be effectively described in terms of sequences.
Now the theorem (due to Eberlein and Šmulian) is that any normed space is angelic in its weak topology. (462D in
Volume 4; Köthe 69, §24; Dunford & Schwartz 57, V.6.1.) In particular, this is true of L1 spaces, which makes
it less surprising that there should be criteria for weak compactness in L1 spaces which deal only with sequences.

2A6 Factorization of matrices


I spend a couple of pages on the linear algebra of R r required for Chapter 26. I give only one proof, because this is
material which can be found in any textbook of elementary linear algebra; but I think it may be helpful to run through
the basic ideas in the language which I use for this treatise.

2A6A Determinants We need to know the following things about determinants.


(i) Every r × r real matrix T has a real determinant det T .
(ii) For any r × r matrices S and T , det ST = det S det T .
(iii) If T is a diagonal matrix, its determinant is just the product of its diagonal entries.
(iv) For any r × r matrix T , det T ′ = det T , where T ′ is the transpose of T .
(v) det T is a continuous function of the coefficients of T .
There are so many routes through this topic that I avoid even a definition of ‘determinant’; I invite you to check your
memory, or your favourite text, to confirm that you are indeed happy with the facts above.
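
Facts (ii)-(iv) are easy to test on small examples. The following sketch assumes the familiar formula ad − bc for the determinant of a 2 × 2 matrix, which is an assumption made here for illustration since no definition is given above; the matrices are arbitrary integer examples and the helper names are hypothetical.

```python
def det2(T):
    """Determinant of a 2x2 matrix given as a list of rows (formula ad - bc)."""
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]


def matmul2(S, T):
    """Product ST of two 2x2 matrices."""
    return [[sum(S[i][k] * T[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose2(T):
    """Transpose of a 2x2 matrix."""
    return [[T[j][i] for j in range(2)] for i in range(2)]


# Arbitrary integer matrices, so that the arithmetic is exact.
S = [[1, 2], [3, 4]]
T = [[0, -1], [5, 2]]
D = [[3, 0], [0, -7]]

multiplicative = det2(matmul2(S, T)) == det2(S) * det2(T)   # fact (ii)
diagonal_product = det2(D) == 3 * (-7)                      # fact (iii)
transpose_invariant = det2(transpose2(S)) == det2(S)        # fact (iv)
```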

2A6B Orthonormal families For x = (ξ1 , . . . , ξr ), y = (η1 , . . . , ηr ) ∈ R^r , write x . y = Σ_{i=1}^r ξi ηi ; of course ‖x‖,
as defined in 1A2A, is √(x . x). Recall that x1 , . . . , xk are orthonormal if xi . xj = 0 for i ≠ j, 1 for i = j. The results
we need here are:
(i) If x1 , . . . , xk are orthonormal vectors in R r , where k < r, then there are vectors xk+1 , . . . , xr in R r
such that x1 , . . . , xr are orthonormal.
(ii) An r × r matrix P is orthogonal if P ′ P is the identity matrix; equivalently, if the columns of P are
orthonormal.
(iii) For an orthogonal matrix P , det P must be ±1 (put (ii)-(iv) of 2A6A together).
(iv) If P is orthogonal, then P x .P y = P ′ P x .y = x . y for all x, y ∈ R r .
(v) If P is orthogonal, so is P ′ = P −1 .
(vi) If P and Q are orthogonal, so is P Q.
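
Result (i) can be realized computationally by the Gram-Schmidt process applied to the standard basis vectors. The text above does not say how (i) is to be proved, so the following is only one possible sketch; the function names and the numerical tolerance are assumptions made for illustration, and floating-point arithmetic means that orthonormality holds only up to rounding error.

```python
import math


def dot(x, y):
    """The scalar product x . y of two vectors of the same length."""
    return sum(a * b for a, b in zip(x, y))


def extend_to_orthonormal_basis(vectors, r, tol=1e-9):
    """Extend an orthonormal family in R^r to r orthonormal vectors.

    Each standard basis vector in turn has its components along the current
    family removed; if a non-negligible remainder is left, it is normalized
    and appended.
    """
    basis = [tuple(v) for v in vectors]
    for j in range(r):
        if len(basis) == r:
            break
        e = [1.0 if i == j else 0.0 for i in range(r)]
        for b in basis:
            c = dot(e, b)
            e = [ei - c * bi for ei, bi in zip(e, b)]
        norm = math.sqrt(dot(e, e))
        if norm > tol:
            basis.append(tuple(ei / norm for ei in e))
    return basis


# Hypothetical example: extend the single unit vector (0.6, 0.8, 0) in R^3.
basis = extend_to_orthonormal_basis([(0.6, 0.8, 0.0)], 3)
```

The matrix with the resulting vectors as columns is then orthogonal in the sense of (ii), up to rounding.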

2A6C I now give a proposition which is not always included in elementary presentations. Of course there are many
approaches to this; I offer a direct one.
Proposition Let T be any real r × r matrix. Then T is expressible as P DQ where P and Q are orthogonal matrices
and D is a diagonal matrix with non-negative coefficients.

proof I induce on r.
(a) If r = 1, then T = (τ11 ). Set D = (|τ11 |), P = (1) and Q = (1) if τ11 ≥ 0, (−1) otherwise.
(b)(i) For the inductive step to r + 1 ≥ 2, consider the unit ball B = {x : x ∈ R^{r+1} , ‖x‖ ≤ 1}. This is a closed
bounded set in R^{r+1} , so is compact (2A2F). The maps x ↦ T x : R^{r+1} → R^{r+1} and x ↦ ‖x‖ : R^{r+1} → R are
continuous, so the function x ↦ ‖T x‖ : B → R is bounded and attains its bounds (2A2G), and there is a u ∈ B
such that ‖T u‖ ≥ ‖T x‖ for every x ∈ B. Observe that ‖T u‖ must be the norm ‖T ‖ of T as defined in 262H. Set
δ = ‖T ‖ = ‖T u‖. If δ = 0, then T must be the zero matrix, and the result is trivial; so let us suppose that δ > 0. In
this case ‖u‖ must be exactly 1, since otherwise we should have u = ‖u‖u′ where ‖u′‖ = 1 and ‖T u′‖ > ‖T u‖.
(ii) If x ∈ R^{r+1} and x . u = 0, then T x . T u = 0. P? If not, set γ = T x . T u ≠ 0. Consider y = u + ηγx for small
η > 0. We have
‖y‖² = y . y = u . u + 2ηγ u . x + η²γ² x . x = ‖u‖² + η²γ²‖x‖² = 1 + η²γ²‖x‖²,
while
‖T y‖² = T y . T y = T u . T u + 2ηγ T u . T x + η²γ² T x . T x = δ² + 2ηγ² + η²γ²‖T x‖².
But also ‖T y‖² ≤ δ²‖y‖² (2A4Fb), so
δ² + 2ηγ² + η²γ²‖T x‖² ≤ δ²(1 + η²γ²‖x‖²)
and
2ηγ² ≤ δ²η²γ²‖x‖² − η²γ²‖T x‖²,
that is,
2 ≤ η(δ²‖x‖² − ‖T x‖²).
But this surely cannot be true for all η > 0, so we have a contradiction. XQ
(iii) Set v = δ⁻¹T u, so that ‖v‖ = 1. Let u1 , . . . , ur+1 be orthonormal vectors such that ur+1 = u, and let
Q0 be the orthogonal (r + 1) × (r + 1) matrix with columns u1 , . . . , ur+1 ; then, writing e1 , . . . , er+1 for the standard
orthonormal basis of R^{r+1} , we have Q0 ei = ui for each i, and Q0 er+1 = u. Similarly, there is an orthogonal matrix P0
such that P0 er+1 = v.
Set T1 = P0⁻¹ T Q0 . Then
T1 er+1 = P0⁻¹ T u = δP0⁻¹ v = δer+1 ,
while if x . er+1 = 0 then Q0 x . u = 0 (2A6B(iv)), so that
T1 x . er+1 = P0 T1 x . P0 er+1 = T Q0 x . v = 0,
by (ii). This means that T1 must be of the form
( S 0 )
( 0 δ ),
where S is an r × r matrix.
(iv) By the inductive hypothesis, S is expressible as P̃ D̃Q̃, where P̃ and Q̃ are orthogonal r × r matrices and D̃
is a diagonal r × r matrix with non-negative coefficients. Writing block matrices row by row, with the rows separated
by semicolons, set
P1 = ( P̃ 0 ; 0 1 ), Q1 = ( Q̃ 0 ; 0 1 ), D = ( D̃ 0 ; 0 δ ).
Then P1 and Q1 are orthogonal and D is diagonal, with non-negative coefficients, and P1 DQ1 = T1 . Now set
P = P0 P1 , Q = Q1 Q0⁻¹ ,
so that P and Q are orthogonal and
P DQ = P0 P1 DQ1 Q0⁻¹ = P0 T1 Q0⁻¹ = T .

Thus the induction proceeds.
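
The inductive step can be followed through numerically in the case of a 2 × 2 matrix. The sketch below is an illustration under stated assumptions rather than a general implementation: it finds a unit vector u maximizing ‖T u‖ from the closed-form angle θ = ½ atan2(2b, a − c), where a, b, c are the entries of T ′T (an elementary calculus fact assumed here, not taken from the text), and then builds P0 , Q0 , T1 and the final factors as in parts (iii)-(iv). All function names are hypothetical, and floating-point arithmetic means that T = P DQ holds only up to rounding error.

```python
import math


def matmul(A, B):
    """Product AB of two 2x2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(A):
    """Transpose of a 2x2 matrix."""
    return [[A[j][i] for j in range(2)] for i in range(2)]


def factorize_2x2(T):
    """Return (P, D, Q) with T = P D Q, P and Q orthogonal, D diagonal and >= 0."""
    # Entries a, b, c of T'T; ||T u||^2 = a cos^2 + 2b sin cos + c sin^2 at angle theta.
    a = T[0][0] ** 2 + T[1][0] ** 2
    b = T[0][0] * T[0][1] + T[1][0] * T[1][1]
    c = T[0][1] ** 2 + T[1][1] ** 2
    theta = 0.5 * math.atan2(2 * b, a - c)
    u = (math.cos(theta), math.sin(theta))
    Tu = (T[0][0] * u[0] + T[0][1] * u[1], T[1][0] * u[0] + T[1][1] * u[1])
    delta = math.hypot(Tu[0], Tu[1])
    identity = [[1.0, 0.0], [0.0, 1.0]]
    if delta == 0.0:
        # T is the zero matrix: the trivial case of the proof.
        return identity, [[0.0, 0.0], [0.0, 0.0]], identity
    v = (Tu[0] / delta, Tu[1] / delta)
    # Orthogonal matrices whose last columns are u and v respectively.
    Q0 = [[-u[1], u[0]], [u[0], u[1]]]
    P0 = [[-v[1], v[0]], [v[0], v[1]]]
    # T1 = P0^{-1} T Q0 is (up to rounding) diagonal with entries s and delta.
    T1 = matmul(matmul(transpose(P0), T), Q0)
    s = T1[0][0]
    sign = 1.0 if s >= 0 else -1.0
    # The 1x1 case: s = 1 * |s| * sign, so P1 is the identity and Q1 = diag(sign, 1).
    D = [[abs(s), 0.0], [0.0, delta]]
    Q1 = [[sign, 0.0], [0.0, 1.0]]
    return P0, D, matmul(Q1, transpose(Q0))


# Hypothetical example matrix.
T = [[1.0, 2.0], [3.0, 4.0]]
P, D, Q = factorize_2x2(T)
reconstructed = matmul(matmul(P, D), Q)
```

In practice one would use a library routine for the singular value decomposition; the point of the sketch is only to mirror the steps of the proof.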



Concordance
I list here the section and paragraph numbers which have (to my knowledge) appeared in print in references to this
chapter, and which have since been changed.

211Ya Countable-cocountable algebra of R This exercise, referred to in the 2002 edition of Volume 3, has been
moved to 211Ye.

214J Subspace measures on measurable subspaces, direct sums 214J-214M, referred to in the 2002 and
2004 editions of Volume 3, the 2003 and 2006 editions of Volume 4, and the 2008 edition of Volume 5, have been moved
to 214K-214N.

214N Upper and lower integrals This result, referred to in the 2008 edition of Volume 5, has been moved to
214J.

215Yc Measurable envelopes This exercise, referred to in the 2000 edition of Volume 1, has been moved to
216Yc.

§234 Section §234 has been rewritten, with a good deal of new material. The former paragraphs 234A-234G,
referred to in the 2002 and 2004 editions of Volume 3 and the 2003 and 2006 editions of Volume 4, are now 234I-234O.

§235 Section §235 has been re-organized, with some material moved to §234. Specifically, 235H, 235I, 235J, 235L,
235M, 235T and 235Xe, referred to in the 2002 and 2004 editions of Volume 3 and the 2003 and 2006 editions of Volume
4, are now dealt with in 234B, 235G, 235H, 235J, 235K, 235R and 234A.

241Yd Countable sup property This exercise, referred to in the 2002 edition of Volume 3, has been moved to
241Ye.

241Yh Quotient Riesz spaces This exercise, referred to in the 2002 edition of Volume 3, has been moved to
241Yc.

242Xf Inverse-measure-preserving functions This exercise, referred to in the 2002 edition of Volume 3, has
been moved to 242Xd.

242Yc Order-continuous norms This exercise, referred to in the 2002 edition of Volume 3, has been moved to
242Yg.

244O Complex Lp This paragraph, referred to in the 2002 and 2004 editions of Volume 3, and the 2003 and 2006
editions of Volume 4, is now 244P.

244Xf Lp and Lq This exercise, referred to in the 2003 edition of Volume 4, has been moved to 244Xe.

244Yd-244Yf Lp as Banach lattice These exercises, referred to in the 2002 and 2004 editions of Volume 3, are
now 244Ye-244Yg.

251N Paragraph numbers in the second half of §251, referred to in editions of Volumes 3 and 4 up to and including
2006, have been changed, so that 251M-251S are now 251N-251T.

252Yf Exercise This exercise, referred to in the first edition of Volume 1, has been moved to 252Ym.

272S Distribution of a sum of independent random variables This result, referred to in the 2002 and 2004
editions of Volume 3, and the 2003 and 2006 editions of Volume 4, is now 272T.

272U Etemadi’s lemma This result, referred to in the 2003 and 2006 editions of Volume 4, is now 272V.

272Yd This exercise, referred to in the 2002 and 2004 editions of Volume 3, is now 272Ye.

273Xh This exercise, referred to in the 2006 edition of Volume 4, is now 273Xi.

References for Volume 2


Alexits G. [78] (ed.) Fourier Analysis and Approximation Theory. North-Holland, 1978 (Colloq. Math. Soc. Janos
Bolyai 19).
Antonov N.Yu. [96] ‘Convergence of Fourier series’, East J. Approx. 7 (1996) 187-196. [§286 notes.]
Arias de Reyna J. [02] Pointwise Convergence of Fourier Series. Springer, 2002 (Lecture Notes in Mathematics
1785). [§286 notes.]
Bergelson V., March P. & Rosenblatt J. [96] (eds.) Convergence in Ergodic Theory and Probability. de Gruyter,
1996.
du Bois-Reymond P. [1876] ‘Untersuchungen über die Convergenz und Divergenz der Fouriersche Darstellungformeln’,
Abh. Akad. München 12 (1876) 1-103. [§282 notes.]
Bourbaki N. [66] General Topology. Hermann/Addison-Wesley, 1968. [2A5F.]
Bourbaki N. [87] Topological Vector Spaces. Springer, 1987. [2A5E.]
Carleson L. [66] ‘On convergence and growth of partial sums of Fourier series’, Acta Math. 116 (1966) 135-157. [§282
notes, §286 intro., §286 notes.]
Clarkson J.A. [36] ‘Uniformly convex spaces’, Trans. Amer. Math. Soc. 40 (1936) 396-414. [244O.]
Defant A. & Floret K. [93] Tensor Norms and Operator Ideals, North-Holland, 1993. [§253 notes.]
Dudley R.M. [89] Real Analysis and Probability. Wadsworth & Brooks/Cole, 1989. [§282 notes.]
Dunford N. & Schwartz J.T. [57] Linear Operators I. Wiley, 1957 (reprinted 1988). [§244 notes, 2A5J.]
Enderton H.B. [77] Elements of Set Theory. Academic, 1977. [§2A1.]
Engelking R. [89] General Topology. Heldermann, 1989 (Sigma Series in Pure Mathematics 6). [2A5F.]
Etemadi N. [96] ‘On convergence of partial sums of independent random variables’, pp. 137-144 in Bergelson
March & Rosenblatt 96. [272V.]
Evans L.C. & Gariepy R.F. [92] Measure Theory and Fine Properties of Functions. CRC Press, 1992. [263Ec, §265
notes.]
Federer H. [69] Geometric Measure Theory. Springer, 1969 (reprinted 1996). [262C, 263Ec, §264 notes, §265 notes,
§266 notes.]
Feller W. [66] An Introduction to Probability Theory and its Applications, vol. II. Wiley, 1966. [Chap. 27 intro.,
274H, 275Xc, 285N.]
Fremlin D.H. [74] Topological Riesz Spaces and Measure Theory. Cambridge U.P., 1974. [§232 notes, 241F, §244
notes, §245 notes, §247 notes.]
Fremlin D.H. [93] ‘Real-valued-measurable cardinals’, pp. 151-304 in Judah 93. [232Hc.]
Haimo D.T. [67] (ed.) Orthogonal Expansions and their Continuous Analogues. Southern Illinois University Press,
1967.
Hall P. [82] Rates of Convergence in the Central Limit Theorem. Pitman, 1982. [274Hc.]
Halmos P.R. [50] Measure Theory. Van Nostrand, 1950. [§251 notes, §252 notes, 255Yn.]
Halmos P.R. [60] Naive Set Theory. Van Nostrand, 1960. [§2A1.]
Hanner O. [56] ‘On the uniform convexity of Lp and lp ’, Arkiv för Matematik 3 (1956) 239-244. [244O.]
Henle J.M. [86] An Outline of Set Theory. Springer, 1986. [§2A1.]
Hoeffding W. [63] ‘Probability inequalities for sums of bounded random variables’, J. Amer. Statistical Association
58 (1963) 13-30. [272W.]
Hunt R.A. [67] ‘On the convergence of Fourier series’, pp. 235-255 in Haimo 67. [§286 notes.]
Jorsbøe O.G. & Mejlbro L. [82] The Carleson-Hunt Theorem on Fourier Series. Springer, 1982 (Lecture Notes in
Mathematics 911). [§286 notes.]
Judah H. [93] (ed.) Proceedings of the Bar-Ilan Conference on Set Theory and the Reals, 1991. Amer. Math. Soc.
(Israel Mathematical Conference Proceedings 6), 1993.
Kelley J.L. [55] General Topology. Van Nostrand, 1955. [2A5F.]
Kelley J.L. & Namioka I. [76] Linear Topological Spaces. Springer, 1976. [2A5C.]
Kirzbraun M.D. [34] ‘Über die zusammenziehenden und Lipschitzian Transformationen’, Fund. Math. 22 (1934)
77-108. [262C.]
Kolmogorov A.N. [26] ‘Une série de Fourier-Lebesgue divergente partout’, C. R. Acad. Sci. Paris 183 (1926) 1327-
1328. [§282 notes.]
Komlós J. [67] ‘A generalization of a problem of Steinhaus’, Acta Math. Acad. Sci. Hung. 18 (1967) 217-229. [276H.]

Körner T.W. [88] Fourier Analysis. Cambridge U.P., 1988. [§282 notes.]
Köthe G. [69] Topological Vector Spaces I. Springer, 1969. [2A5C, 2A5J.]
Krivine J.-L. [71] Introduction to Axiomatic Set Theory. D. Reidel, 1971. [§2A1.]
Lacey M. & Thiele C. [00] ‘A proof of boundedness of the Carleson operator’, Math. Research Letters 7 (2000) 1-10.
[§286 intro., 286H.]
Lebesgue H. [72] Oeuvres Scientifiques. L’Enseignement Mathématique, Institut de Mathématiques, Univ. de Genève,
1972. [Chap. 27 intro.]
Liapounoff A. [1901] ‘Nouvelle forme du théorème sur la limite de probabilité’, Mém. Acad. Imp. Sci. St-Pétersbourg
12(5) (1901) 1-24. [274Xg.]
Lighthill M.J. [59] Introduction to Fourier Analysis and Generalised Functions. Cambridge U.P., 1959. [§284 notes.]
Lindeberg J.W. [22] ‘Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung’, Math.
Zeitschrift 15 (1922) 211-225. [274Ha, §274 notes.]
Lipschutz S. [64] Set Theory and Related Topics. McGraw-Hill, 1964 (Schaum’s Outline Series). [§2A1.]
Loève M. [77] Probability Theory I. Springer, 1977. [Chap. 27 intro., 274H.]
Luxemburg W.A.J. & Zaanen A.C. [71] Riesz Spaces I. North-Holland, 1971. [241F.]
Mozzochi C.J. [71] On the Pointwise Convergence of Fourier Series. Springer, 1971 (Lecture Notes in Mathematics
199). [§286 notes.]
Naor A. [04] ‘Proof of the uniform convexity lemma’, http://www.cims.nyu.edu/∼ naor/homepage files/inequality
.pdf, 26.2.04. [244O.]
Rényi A. [70] Probability Theory. North-Holland, 1970. [274Hc.]
Roitman J. [90] An Introduction to Set Theory. Wiley, 1990. [§2A1.]
Roselli P. & Willem M. [02] ‘A convexity inequality’, Amer. Math. Monthly 109 (2002) 64-70. [244Ym.]
Schipp F. [78] ‘On Carleson’s method’, pp. 679-695 in Alexits 78. [§286 notes.]
Semadeni Z. [71] Banach spaces of continuous functions I. Polish Scientific Publishers, 1971. [§253 notes.]
Shiryayev A. [84] Probability. Springer, 1984. [285N.]
Sjölin P. [71] ‘Convergence almost everywhere of certain singular integrals and multiple Fourier series’, Arkiv för
Math. 9 (1971) 65-90. [§286 notes.]
Steele J.M. [86] ‘An Efron-Stein inequality of nonsymmetric statistics’, Annals of Statistics 14 (1986) 753-758.
[274Xj.]
Zaanen A.C. [83] Riesz Spaces II. North-Holland, 1983. [241F.]
Zygmund A. [59] Trigonometric Series. Cambridge U.P, 1959. [§244 notes, §282 notes, 284Xj.]

Index to volumes 1 and 2


Principal topics and results
The general index below is intended to be comprehensive. Inevitably the entries are voluminous to the point that
they are often unhelpful. I have therefore prepared a shorter, better-annotated, index which will, I hope, help readers
to focus on particular areas. It does not mention definitions, as the bold-type entries in the main index are supposed to
lead efficiently to these; and if you draw blank here you should always, of course, try again in the main index. Entries
in the form of mathematical assertions frequently omit essential hypotheses and should be checked against the formal
statements in the body of the work.
absolutely continuous real functions §225
—– as indefinite integrals 225E
absolutely continuous additive functionals §232
—– characterization 232B
atomless measure spaces
—– have elements of all possible measures 215D
Borel sets in R r 111G
—– and Lebesgue measure 114G, 115G, 134F
bounded variation, real functions of §224
—– as differences of monotonic functions 224D
—– integrals of their derivatives 224I
—– Lebesgue decomposition 226C
Brunn-Minkowski theorem 266C
Cantor set and function 134G, 134H
Carathéodory’s construction of measures from outer measures 113C
Carleson’s theorem (Fourier series of square-integrable functions converge a.e.) §286
Central Limit Theorem (sum of independent random variables approximately normal) §274
—– Lindeberg’s condition 274F, 274G
change of measure in the integral 235K
change of variable in the integral §235
—– ∫ J × gφ dµ = ∫ g dν 235A, 235E, 235J
—– finding J 235M;
—– —– J = | det T | for linear operators T 263A; J = | det φ′ | for differentiable operators φ 263D
—– —– —– when the measures are Hausdorff measures 265B, 265E
—– when φ is inverse-measure-preserving 235G
—– ∫ gφ dµ = ∫ J × g dν 235R
characteristic function of a probability distribution §285
—– sequences of distributions converge in vague topology iff characteristic functions converge pointwise 285L
complete measure space §212
completion of a measure 212C et seq.
concentration of measure 264H
conditional expectation
—– of a function §233
—– as operator on L1 (µ) 242J
construction of measures
—– image measures 234C
—– from outer measures (Carathéodory’s method) 113C
—– subspace measures 131A, 214A
—– product measures 251C, 251F, 251W, 254C
—– as pull-backs 234F
convergence theorems (B.Levi, Fatou, Lebesgue) §123
convergence in measure (linear space topology on L0 )
—– on L0 (µ) §245
—– when Hausdorff/complete/metrizable 245E
convex functions 233G et seq.

convolution of functions
—– (on R^r ) §255
—– ∫ h × (f ∗ g) = ∫ h(x + y)f (x)g(y)dxdy 255G
—– f ∗ (g ∗ h) = (f ∗ g) ∗ h 255J
convolution of measures
—– (on R^r ) §257
—– ∫ h d(ν1 ∗ ν2 ) = ∫ h(x + y)ν1 (dx)ν2 (dy) 257B
—– of absolutely continuous measures 257F
countable sets 111F, 1A1C et seq.
countable-cocountable measure 211R
counting measure 112Bd
Denjoy-Young-Saks theorem 222L
differentiable functions (from R r to R s ) §262, §263
direct sum of measure spaces 214L
distribution of a finite family of random variables §271
—– as a Radon measure 271B
—– of φ(X) 271J
—– of an independent family 272G
—– determined by characteristic functions 285M
Doob’s Martingale Convergence Theorem 275G
exhaustion, principle of 215A
extended real line §135
extension of measures 214P
Fatou’s Lemma (∫ lim inf ≤ lim inf ∫ for sequences of non-negative functions) 123B
Fejér sums (running averages of Fourier sums) converge to local averages of f 282H
—– uniformly if f is continuous 282G
Fourier series §282
—– norm-converge in L2 282J
—– converge at points of differentiability 282L
—– converge to midpoints of jumps, if f of bounded variation 282O
—– and convolutions 282Q
—– converge a.e. for square-integrable function 286V
Fourier transforms
—– on R §283, §284
—– formula for ∫_c^d f in terms of f^∧ 283F
—– and convolutions 283M
—– in terms of action on test functions 284H et seq.
—– of square-integrable functions 284O, 286U
—– inversion formulae for differentiable functions 283I; for functions of bounded variation 283L
—– f^∧(y) = lim_{ǫ↓0} (1/√(2π)) ∫_{−∞}^{∞} e^{−iyx} e^{−ǫx²} f (x)dx a.e. 284M
Fubini’s theorem §252
—– ∫ f d(µ × ν) = ∫∫ f (x, y)dxdy 252B
—– when both factors σ-finite 252C, 252H
—– for characteristic functions 252D, 252F
Fundamental Theorem of the Calculus ((d/dx) ∫_a^x f = f (x) a.e.) §222; (∫_a^b F ′ (x)dx = F (b) − F (a)) 225E

Hahn decomposition of a countably additive functional 231E


Hardy-Littlewood Maximal Theorem 286A
Hausdorff measures (on R r ) §264
—– are topological measures 264E
—– r-dimensional Hausdorff measure on R r a multiple of Lebesgue measure 264I
—– (r − 1)-dimensional measure on R r 265F-265H
image measures 234C

indefinite integrals
—– differentiate back to original function 222E, 222H
—– to construct measures 234I
independent random variables §272
—– joint distributions are product measures 272G
—– sums 272S, 272T, 272V, 272W
—– limit theorems §273, §274
inner regularity of measures
—– (with respect to compact sets) Lebesgue measure 134F; Radon measures §256
integration of real-valued functions, construction §122
—– as a positive linear functional 122O
—– —– acting on L1 (µ) 242B
—– by parts 225F
—– characterization of integrable functions 122P, 122R
—– over subsets 131D, 214E
—– functions and integrals with values in [−∞, ∞] §133
Jensen’s inequality 233I-233J
—– expressed in L1 (µ) 242K
Komlós’ subsequence theorem 276H
Lebesgue’s Density Theorem (in R) §223
—– lim_{h↓0} (1/(2h)) ∫_{x−h}^{x+h} f = f (x) a.e. 223A
—– lim_{h↓0} (1/(2h)) ∫_{x−h}^{x+h} |f (y) − f (x)|dy = 0 a.e. 223D
—– (in R^r ) 261C, 261E
Lebesgue measure, construction of §114, §115
—– further properties §134
Lebesgue’s Dominated Convergence Theorem (∫ lim = lim ∫ for dominated sequences of functions) 123C
B.Levi’s theorem (∫ lim = lim ∫ for monotonic sequences of functions) 123A; (∫ Σ = Σ ∫) 226E
Lipschitz functions §262
—– differentiable a.e. 262Q
localizable measure space
—– assembling partial measurable functions 213N
—– which is not strictly localizable 216E
Lusin’s theorem (measurable functions are almost continuous)
—– (on R r ) 256F
martingales §275
—– L1 -bounded martingales converge a.e. 275G
—– when of form E(X|Σn ) 275H, 275I
measurable envelopes
—– elementary properties 132E
—– existence 213L
measurable functions
—– (real-valued) §121
—– —– sums, products and other operations on finitely many functions 121E
—– —– limits, infima, suprema 121F
measure space §112
Monotone Class Theorem 136B
monotonic functions
—– are differentiable a.e. 222A
non-measurable set (for Lebesgue measure) 134B
ordering of measures 234P
outer measures constructed from measures 132A et seq.
outer regularity of Lebesgue measure 134F

Plancherel Theorem (on Fourier series and transforms of square-integrable functions) 282K, 284O
Poisson’s theorem (a count of independent rare events has an approximately Poisson distribution) 285Q
product of two measure spaces §251
—– basic properties of c.l.d. product measure 251I
—– Lebesgue measure on R r+s as a product measure 251N
—– more than two spaces 251W
—– Fubini’s theorem 252B
—– Tonelli’s theorem 252G
—– and L1 spaces §253
—– continuous bilinear maps on L1 (µ) × L1 (ν) 253F
—– conditional expectation on a factor 253H
product of any number of probability spaces §254
—– basic properties of the (completed) product 254F
—– characterization with inverse-measure-preserving functions 254G
—– products of subspaces 254L
—– products of products 254N
—– determination by countable subproducts 254O
—– subproducts and conditional expectations 254R

Rademacher’s theorem (Lipschitz functions are differentiable a.e.) 262Q


Radon measures
—– on R r §256
—– as completions of Borel measures 256C
—– indefinite-integral measures 256E
—– image measures 256G
—– product measures 256K
Radon-Nikodým theorem (truly continuous additive set-functions have densities) 232E
—– in terms of L1 (µ) 242I
repeated integrals §252
Stone-Weierstrass theorem §281
—– for Riesz subspaces of Cb (X) 281A
—– for subalgebras of Cb (X) 281E
—– for *-subalgebras of Cb (X; C) 281G
strictly localizable measures
—– sufficient condition for strict localizability 213O
strong law of large numbers (lim_{n→∞} (1/(n+1)) Σ_{i=0}^n (Xi − E(Xi )) = 0 a.e.) §273
—– when Σ_{n=0}^∞ (1/(n+1)²) Var(Xn ) < ∞ 273D
—– when supn∈N E(|Xn |1+δ ) < ∞ 273H
—– for identically distributed Xn 273I
—– for martingale difference sequences 276C, 276F
—– convergence of averages for ‖ ‖1 , ‖ ‖p 273N
subspace measures
—– for measurable subspaces §131
—– for arbitrary subspaces §214
surface measure in R r §265
tensor products of L1 spaces §253
Tonelli’s theorem (f is integrable if ∫∫ |f (x, y)|dxdy < ∞) 252G
uniformly integrable sets in L1 §246
—– criteria for uniform integrability 246G
—– and convergence in measure 246J
—– and weak compactness 247C

Vitali’s theorem (for coverings by intervals in R) 221A


—– (for coverings by balls in R r ) 261B
weak compactness in L1 (µ) §247
Weyl’s Equidistribution Theorem 281N
c.l.d. version 213D et seq.
L0 (µ) (space of equivalence classes of measurable functions) §241
—– as Riesz space 241E
L1 (µ) (space of equivalence classes of integrable functions) §242
—– norm-completeness 242F
—– density of simple functions 242M
—– (for Lebesgue measure) density of continuous functions and step functions 242O
Lp (µ) (space of equivalence classes of pth-power-integrable functions, where 1 < p < ∞) §244
—– is a Banach lattice 244G
—– has dual Lq , where 1/p + 1/q = 1 244K
—– and conditional expectations 244M
L∞ (µ) (space of equivalence classes of bounded measurable functions) §243
—– duality with L1 243F-243G
—– norm-completeness 243E
—– order-completeness 243H
σ-algebras of sets §111
—– generated by given families 136B, 136G
σ-finite measures 215B

General index
References in bold type refer to definitions; references in italics are passing references. Definitions marked with >
are those in which my usage is dangerously at variance with that of some other author or authors.
Abel’s theorem 224Yi
absolute summability 226Ac
absolutely continuous additive functional 232Aa, 232B, 232D, 232F-232I, 232Xa, 232Xb, 232Xd, 232Xf, 232Xh,
232Yk, 234Kb, 256J, 257Xf
absolutely continuous function 225B, 225C-225G, 225K-225O, 225Xa-225Xh, 225Xn, 225Xo, 225Ya, 225Yc, 232Xb,
233Xc, 244Yi, 252Ye, 256Xg, 262Bc, 263I, 264Yp, 265Ya, 274Xi, 282R, 282Yf, 283Ci
adapted martingale 275A
—– stopping time 275L
additive functional on an algebra of sets see finitely additive (136Xg, 231A), countably additive (231C)
adjoint operator 243Fc, 243 notes
algebra see algebra of sets (136E), Banach algebra (2A4Jb), normed algebra (2A4J)
algebra of sets 113Yi, 136E, 136F, 136G, 136Xg, 136Xh, 136Xk, 136Ya, 136Yb, 231A, 231B, 231Xa; see also
σ-algebra (111A)
almost continuous function 256F
almost every, almost everywhere 112Dd
almost surely 112De
analytic (complex) function 133Xc
angelic topological space 2A5J
Archimedean Riesz space 241F, 241Yb, 242Xc
area see surface measure
arithmetic mean 266A
asymptotic density 273G
asymptotically equidistributed see equidistributed (281Yi)
atom (in a measure space) 211I, 211Xb, 246G
atomic see purely atomic (211K)
atomless measure (space) 211J, 211Md, 211Q, 211Xb, 211Yd, 211Ye, 212G, 213He, 214K, 214Xd, 215D, 215Xe,
215Xf, 216A, 216Ya, 234Be, 234Nf, 234Yi, 251U, 251Wo, 251Xs, 251Xt, 251Yd, 252Yo, 252Yq, 252Ys, 254V, 254Yf,
256Xd, 264Yg
automorphism see measure space automorphism
axiom see Banach-Ulam problem, choice (2A1J), countable choice
ball (in R r ) 252Q, 252Xi, 264H; see also sphere
Banach algebra > 2A4Jb
Banach lattice 242G, 242Xc, 242Yc, 242Ye, 243E, 243Xb; see also Lp
Banach space 231Yh, 262Yi, 2A4D, 2A4E, 2A4I; see also Banach algebra (2A4Jb), Banach lattice (242G),
separable Banach space
Banach-Ulam problem 232Hc
Berry-Esséen theorem 274Hc, 285 notes
Bienaymé’s equality 272S
bilinear map §253 (253A), 255 notes
Bochner integral 253Yf, 253Yg, 253Yi
Bochner’s theorem 285Xr
Borel algebra see Borel σ-algebra (111Gd, 135C, 256Ye)
Borel-Cantelli lemma 273K
Borel measurable function 121C, 121D, 121Eg, 121H, 121K, 121Yd, 134Fd, 134Xg, 134Yt, 135Ef, 135Xc, 135Xe,
225H, 225J, 225Yf, 233Hc, 241Be, 241I, 241Xd, 256M
Borel measure (on R) 211P, 216A
Borel sets in R, R r 111G, 111Yd, 114G, 114Yd, 115G, 115Yb, 115Yd, 121Ef, 121K, 134F, 134Xd, 135C, 136D,
136Xj, 254Xs, 225J, 264E, 264F, 264Xb, 266Bc
Borel σ-algebra (of subsets of R r ) 111Gd, 114Yg-114Yi, 121J, 121Xd, 121Xe, 121Yc, 121Yd, 212Xc, 212Xd, 216A,
251M;
—– (of other spaces) 135C, 135Xb, 256Ye, 271Ya
bounded bilinear operator 253Ab, 253E, 253F, 253L, 253Xb, 253Yf
bounded linear operator 242Je, 253Ab, 253F, 253Gc, 253L, 253Xc, 253Yf, 253Yj, 2A4F, 2A4G-2A4I, 2A5If; see
also B(U ; V ) (2A4F)
bounded set (in R r ) 134E; (in a normed space) 2A4Bc; (in a linear topological space) 245Yf; see also order-bounded
(2A1Ab)
bounded support, function with see compact support
bounded variation, function of §224 (224A, 224K), 225Cb, 225M, 225Oc, 225Xh, 225Xn, 225Yc, 226Bc, 226C,
226Yd, 263Ye, 264Yp, 265Yb, 282M, 282O, 283L, 283Xj, 283Xk, 283Xm, 283Xn, 283Xq, 284Xk, 284Yd
Brunn-Minkowski inequality 266C
Cantor function 134H, 134I, 222Xd, 225N, 226Cc, 262K, 264Xf, 264Yn
Cantor measure 256Hc, 256Ia, 256Xk, 264Ym
Cantor set 134G, 134H, 134I, 134Xf, 256Hc, 256Xk, 264J, 264Ym, 264Yn; see also {0, 1}I
Carathéodory complete measure space see complete (211A)
Carathéodory’s method (of constructing measures) 113C, 113D, 113Xa, 113Xd, 113Xg, 113Yc, 114E, 114Xa, 115E,
121Ye, 132Xc, 136Ya, 212A, 212Xf, 213C, 213Xa, 213Xb, 213Xd, 213Xf, 213Xg, 213Xj, 213Ya, 214H, 214Xb, 216Xb,
251C, 251Wa, 251Xe, 264C, 264K
cardinal 2A1Kb, 2A1L
Carleson’s theorem 282K remarks, 282 notes, 284Yg, 286U, 286V
Cauchy distribution 285Xm
Cauchy filter 2A5F, 2A5G
Cauchy’s inequality 244Eb
Cauchy sequence 242Yc, 2A4D
Central Limit Theorem 274G, 274I-274K, 274Xc-274Xg, 274Xk, 274Xl, 285N, 285Xn, 285Ym
Cesàro sums 273Ca, 282Ad, 282N, 282Xn
chain rule for Radon-Nikodým derivatives 235Xh
change of variable in integration §235, 263A, 263D, 263F, 263G, 263I, 263Xc, 263Xe, 263Yc
characteristic function (of a set) 122Aa
—– (of a probability distribution) §285 (285Aa)
—– (of a random variable) §285 (285Ab)
Chebyshev’s inequality 271Xa
choice, axiom of 134C, 1A1G, 254Da, 2A1J, 2A1K-2A1M; see also countable choice
choice function 2A1J
circle group 255M, 255Ym
closed convex hull 253Gb, 2A5Eb
closed interval (in R or R r ) 114G, 115G, 1A1A; (in a general partially ordered space) 2A1Ab
closed set (in a topological space) 134Fb, 134Xd, 1A2E, 1A2F, 1A2G, 2A2A, 2A3A, 2A3D, 2A3Nb
closure of a set 2A2A, 2A2B, 2A3D, 2A3Kb, 2A5E; see also essential closure (266B)
cluster point (of a sequence) 2A3O
Coarea Theorem 265 notes
cofinite see finite-cofinite algebra (231Xa)
commutative (ring, algebra) 2A4Jb
commutative Banach algebra 224Yb, 243Dd, 255Xi, 257Ya
compact set (in a topological space) 247Xc, 2A2D, 2A2E-2A2G, > 2A3N, 2A3R; see also relatively compact
(2A3Na), relatively weakly compact (2A5Id), weakly compact (2A5Ic)
compact support, function with 242Xh, 256Be, 256D, 262Yd-262Yg; see also Ck (X), Ck (X; C)
compact support, measure with 284Yi
compact topological space > 2A3N
complete linear topological space 245Ec, 2A5F, 2A5H
complete locally determined measure (space) 213C, 213D, 213H, 213J, 213L, 213O, 213Xg, 213Xi, 213Xl, 213Yd,
213Ye, 214I, 216D, 216E, 234Nb, 251Yc, 252B, 252D, 252N, 252Xc, 252Ye, 252Ym, 252Yq-252Ys, 252Yv, 253Yj,
253Yk; see also c.l.d. version (213E)
complete measure (space) 112Df, 113Xa, 122Ya, 211A, 211M, 211N, 211R, 211Xc, 211Xd, §212, 214I, 214K, 216A,
216C-216E, 216Ya, 234Ha, 234I, 234Ld, 234Ye, 235Xl, 254Fd, 254G, 254J, 264Dc; see also complete locally determined
measure
complete metric space 224Ye; see also Banach space (2A4D)
complete normed space see Banach space (2A4D), Banach lattice (242G)
complete Riesz space see Dedekind complete (241Fc)
completed indefinite-integral measure see indefinite-integral measure (234J)
completion (of a measure (space)) §212 (212C), 213Fa, 213Xa, 213Xb, 213Xk, 214Ib, 214Xi, 216Yd, 232Xe, 234Ba,
234Ke, 234Lb, 234Xc, 234Xl, 234Ye, 234Yo, 235D, 241Xb, 242Xb, 243Xa, 244Xa, 245Xb, 251T, 251Wn, 251Xr, 252Ya,
254I, 256C
complex-valued function §133
component (in a topological space) 111Ye
conditional expectation 233D, 233E, 233J, 233K, 233Xg, 233Yc, 233Ye, 235Yb, 242J, 246Ea, 253H, 253Le, 275Ba,
275H, 275I, 275K, 275Ne, 275Xi, 275Ya, 275Yk, 275Yl
conditional expectation operator 242Jf, 242K, 242L, 242Xe, 242Xj, 242Yk, 243J, 244M, 244Yl, 246D, 254R, 254Xp,
275Xd, 275Xe
conegligible set 112Dc, 211Xg, 212Xi, 214Cc, 234Hb
connected set 222Yb
continuity, points of 224H, 224Ye, 225J
continuous function 121D, 121Yd, 262Ia, 2A2C, 2A2G, 2A3B, 2A3H, 2A3Nb, 2A3Qb
continuous linear functional 284Yj; see also dual linear space (2A4H)
continuous linear operator 2A4Fc; see also bounded linear operator (2A4F)
continuous measure see atomless (211J), absolutely continuous (232Aa)
continuous see also semi-continuous (225H)
continuum see c (2A1L)
convergence in distribution 285Yq, 285Yr
convergence in mean (in ℒ1 (µ) or L1 (µ)) 245Ib
convergence in measure (in L0 (µ)) §245 (>245A), 246J, 246Yc, 247Ya, 253Xe, 255Yh, 285Yr
—– (in ℒ0 (µ)) §245 (>245A), 271Yd, 272Yd, 274Yd, 285Xs
—– (in the algebra of measurable sets) 232Ya
—– (of sequences) 245Ad, 245C, 245H-245L, 245Xd, 245Xf, 245Xl, 245Yh, 246J, 246Xh, 246Xi, 271L, 273Ba, 275Xk,
275Yp, 276Yf
convergent almost everywhere 245C, 245K, 273Ba, 276G, 276H; see also strong law
convergent filter 2A3Q, 2A3S, 2A5Ib;
—– sequence 135D, 245Yi, 2A3M, 2A3Sg; see also convergence in measure (245Ad)
convex function 233G, 233H-233J, 233Xb-233Xf (233Xd), 233Xh, 233Ya, 233Yc, 233Yd, 233Yg, 233Yi, 233Yj,
242K, 242Yi, 242Yj, 242Yk, 244Xm, 244Yh, 244Ym, 255Yj, 275Yg; see also mid-convex (233Ya)
convex hull 2A5E; see also closed convex hull (2A5Eb)
convex set 233Xd, 244Yk, 233Yf-233Yj, 262Xh, 264Xg, 266Xc, 2A5E
convolution in L0 255Xh, 255Xi, 255Xj, 255Yh, 255Yj
convolution of functions 255E, 255F-255K, 255O, 255Xa-255Xg, 255Ya, 255Yc, 255Yd, 255Yf, 255Yg, 255Yk,
255Yl, 255Yn, 262Xj, 262Yd, 262Ye, 262Yh, 263Ya, 282Q, 282Xt, 283M, 283Wd, 283Wf, 283Wj, 283Xl, 284J, 284K,
284O, 284Wf, 284Wi, 284Xb, 284Xd
convolution of measures §257 (257A), 272T, 285R, 285Yn
convolution of measures and functions 257Xe, 284Xo, 284Yi
convolution of sequences 255Xk, 255Yo, 282Xq
countable (set) 111F, 114G, 115G, §1A1, 226Yc
countable choice, axiom of 134C, 211P
countable-cocountable algebra 211R, 211Ya, 232Hb
countable-cocountable measure 211R, 232Hb, 252L
countable sup property (in a Riesz space) 241Yd, 242Yd, 242Ye, 244Yb
countably additive functional (on a σ-algebra of sets) §231 (231C), §232, 246Yg, 246Yi
counting measure 112Bd, 122Xd, 122 notes, 211N, 211Xa, 213Xb, 226A, 241Xa, 242Xa, 243Xl, 244Xh, 244Xn,
245Xa, 246Xc, 246Xd, 251Xc, 251Xi, 252K, 255Yo, 264Db
cover see measurable envelope (132D)
Covering Lemma see Lebesgue Covering Lemma
covering theorem 221A, 261B, 261F, 261Xc, 261Ya, 261Yi, 261Yk
cylinder (in ‘measurable cylinder’) 254Aa, 254F, 254G, 254Q, 254Xa
decimal expansions 273Xg
decomposable measure (space) see strictly localizable (211E)
decomposition (of a measure space) 211E, 211Ye, 213O, 213Xh, 214Ia, 214L, 214N, 214Xh
—– see also Hahn decomposition (231F), Jordan decomposition (231F), Lebesgue decomposition (232I, 226C)
decreasing rearrangement (of a function) 252Yo
Dedekind complete ordered set 135Ba, 234Yo
—– —– Riesz space 241Fc, 241G, 241Xf, 242H, 242Yc, 243Hb, 243Xj, 244L
Dedekind σ-complete Riesz space 241Fb, 241G, 241Xe, 241Yb, 241Yh, 242Yg, 243Ha, 243Xb
delta function see Dirac’s delta function (284R)
delta system see ∆-system
Denjoy-Young-Saks theorem 222L
dense set in a topological space 136H, 242Mb, 242Ob, 242Pd, 242Xi, 243Ib, 244H, 244Pb, 244Xk, 244Yj, 254Xo,
281Yc, 2A3U, 2A4I
density function (of a random variable) 271H, 271I-271K, 271Xc-271Xe, 272U, 272Xd, 272Xj; see also Radon-
Nikodým derivative (232Hf, 234J)
density point 223B, 223Xc, 223Yb, 266Xb
density topology 223Yb, 223Yc, 223Yd, 261Yf, 261Yl
density see also asymptotic density (273G), Lebesgue’s Density Theorem (223A)
derivate see Dini derivate (222J)
derivative of a function (of one variable) 222C, 222E-222J, 222Yd, 225J, 225L, 225Of, 225Xc, 226Be, 282R; (of many
variables) 262F, 262G, 262P; see partial derivative
determinant of a matrix 2A6A
determined see locally determined measure space (211H), locally determined negligible sets (213I)
determined by coordinates (in ‘W is determined by coordinates in J’) 254M, 254O, 254R-254T, 254Xp, 254Xr
Devil’s Staircase see Cantor function (134H)
differentiability, points of 222H, 225J
differentiable function (of one variable) 123D, 222A, 224I, 224Kg, 224Yc, 225L, 225Of, 225Xc, 225Xn, 233Xc, 252Ye,
255Xd, 255Xe, 262Xk, 265Xd, 274E, 282L, 282Rb, 282Xs, 283I-283K, 283Xm, 284Xc, 284Xk; (of many variables)
262Fa, 262Gb, 262I, 262Xg, 262Xi, 262Xj; see also derivative
‘differentiable relative to its domain’ 222L, 262Fb, 262I, 262M-262Q, 262Xd-262Xf, 262Yc, 263D, 263Xc, 263Xd,
263Yc, 265E, 282Xk
differentiating through an integral 123D
diffused measure see atomless measure (211J)
dilation 286C
dimension see Hausdorff dimension (264 notes)
Dini derivates 222J, 222L, 222Ye, 225Yf
Dirac measure 112Bd, 257Xa, 284R, 284Xn, 284Xo, 285H, 285Xp
direct image (of a set under a function or relation) 1A1B
direct sum of measure spaces 214L, 214M, 214Xh-214Xk, 241Xg, 242Xd, 243Xe, 244Xf, 245Yh, 251Xi, 251Xj
directed set see downwards-directed (2A1Ab), upwards-directed (2A1Ab)
Dirichlet kernel 282D; see also modified Dirichlet kernel (282Xc)
disintegration of a measure 123Ye
disjoint family (of sets) 112Bb
disjoint sequence theorems (in topological Riesz spaces) 246G, 246Ha, 246Yd-246Yf, 246Yj
distribution see Schwartzian distribution, tempered distribution
distribution of a random variable 241Xc, 271E-271G, 272G, 272T, 272Xe, 272Ya, 272Yb, 272Yf, 285H, 285Xg; see
also Cauchy distribution (285Xm), empirical distribution (273 notes), Poisson distribution (285Xo)
—– of a finite family of random variables 271B, 271C, 271D-271I, 272G, 272Yb, 272Yc, 285Ab, 285C, 285Mb
distribution function of a random variable > 271G, 271L, 271Xb, 271Yb-271Yd, 272Xe, 272Ya, 273Xh, 273Xi,
274F-274L, 274Xd, 274Xg, 274Xh, 274Ya, 274Yc, 285P
Dominated Convergence Theorem see Lebesgue’s Dominated Convergence Theorem (123C)
Doob’s Martingale Convergence Theorem 275G
downwards-directed partially ordered set 2A1Ab
dual normed space 243G, 244K, 2A4H
Dynkin class 136A, 136B, 136Xb
Eberlein’s theorem 2A5J
Egorov’s theorem 131Ya, 215Yb
empirical distribution 273Xi, 273 notes
envelope see measurable envelope (132D)
equidistributed sequence (in a topological probability space) 281N, 281Yi, 281Yj, 281Yk; see also well-distributed
(281Ym)
Equidistribution Theorem see Weyl’s Equidistribution Theorem (281N)
equipollent sets 2A1G, 2A1Ha; see also cardinal (2A1Kb)
equiveridical 121B, 212B
essential closure 266B, 266Xa, 266Xb
essential supremum of a family of measurable sets 211G, 213K, 215B, 215C
—– of a real-valued function 243D, 243I, 255K
essentially bounded function 243A
Etemadi’s lemma 272V
Euclidean metric (on R r ) 2A3Fb
Euclidean topology §1A2, §2A2, 2A3Ff, 2A3Tc
even function 255Xb, 283Yb, 283Yc
exchangeable family of random variables 276Xe
exhaustion, principle of 215A, 215C, 215Xa, 215Xb, 232E
expectation of a random variable 271Ab, 271E, 271F, 271I, 271Xa, 271Ye, 272R, 272Xb, 272Xi, 274Xi, 285Ga,
285Xo, 285Xt; see also conditional expectation (233D)
extended real line 121C, §135
extension of finitely additive functionals 113Yi
extension of measures 113Yc, 113Yh, 132Yd, 212Xh, 214P, 214Xm, 214Xn, 214Ya-214Yc; see also completion
(212C), c.l.d. version (213E)
fair-coin probability 254J
Fatou’s Lemma 123B, 133K, 135Gb, 135Hb
Fatou norm on a Riesz space 244Yg
Fejér integral 283Xf, 283Xh-283Xj
Fejér kernel 282D
Fejér sums 282Ad, 282B-282D, 282G-282I, 282Yc
Feller, W. chap. 27 intro.
field (of sets) see algebra (136E)
filter 211Xg, 2A1I, 2A1N, 2A1O; see also Cauchy filter (2A5F), convergent filter (2A3Q), Fréchet filter (2A3Sg),
ultrafilter (2A1N)
filtration (of σ-algebras) 275A
finite-cofinite algebra 231Xa, 231Xc
finitely additive function(al) on an algebra of sets 136Xg, 136Ya, 136Yb, 231A, 231B, 231Xb-231Xe, 231Ya-231Yh,
232A, 232B, 232E, 232G, 232Ya, 232Ye, 232Yg, 243Xk; see also countably additive
Fourier coefficients 282Aa, 282B, 282Cb, 282F, 282Ic, 282J, 282M, 282Q, 282R, 282Xa, 282Xg, 282Xq, 282Xt,
282Ya, 283Xu, 284Ya, 284Yg
Fourier’s integral formula 283Xm
Fourier series 121G, §282 (282Ac)
Fourier sums 282Ab, 282B-282D, 282J, 282L, 282O, 282P, 282Xi-282Xk, 282Xp, 282Xt, 282Yd, 286V, 286Xb
Fourier transform 133Xd, 133Yc, §283 (283A, 283Wa), §284 (284H, 284Wd), 285Ba, 285D, 285Xd, 285Ya,
286U, 286Ya
Fourier-Stieltjes transform see characteristic function (285A)
Fréchet filter 2A3Sg
323Ad
Fubini’s theorem 252B, 252C, 252H, 252R, 252Yb, 252Yc
full outer measure 132F, 132Yd, 134D, 134Yt, 212Eb, 214F, 214J, 234F, 234Xa, 241Yg, 243Ya, 254Yf
function 1A1B
Fundamental Theorem of Calculus 222E, 222H, 222I, 225E
Fundamental Theorem of Statistics 273Xi, 273 notes
gamma function 225Xj, 225Xk, 252Xi, 252Yf, 252Yu, 255Xg
Gaussian distribution see standard normal distribution (274Aa)
Gaussian random variable see normal random variable (274Ad)
generated (σ-)algebra of sets 111Gb, 111Xe, 111Xf, 121J, 121Xd, 121Yc, 136B, 136C, 136G, 136H, 136Xc, 136Xk,
136Yb, 136Yc, 251D, 251Xa, 272C
geometric mean 266A
Glivenko-Cantelli theorem 273 notes
graph of a function 264Xf, 265Yb
group 255Yn, 255Yo; see also circle group
Hahn decomposition (for countably additive functionals) 231E
Hahn decomposition of an additive functional 231Eb, 231F
—– see also Vitali-Hahn-Saks theorem (246Yi)
half-open interval (in R or R r ) 114Aa, 114G, 114Yj, 115Ab, 115Xa, 115Xc, 115Yd
Hanner’s inequalities 244O, 244Ym
Hardy-Littlewood Maximal Theorem 286A
Hausdorff dimension 264 notes
Hausdorff measure §264 (264C, 264Db, 264K, 264Yo), 265Yb; see also normalized Hausdorff measure (265A)
Hausdorff metric (on a space of closed subsets) 246Yb
Hausdorff outer measure §264 (264A, 264K, 264Yo)
Hausdorff topology 2A3E, 2A3L, 2A3Mb, 2A3S
Hilbert space 244Na, 244Yk
Hoeffding’s theorem 272W, 272Xk, 272Xl
Hölder continuous function 282Xj, 282Yb
Hölder’s inequality 244Eb, 244Yc, 244Ym
hull see convex hull (2A5Ea), closed convex hull (2A5Eb)
ideal in an algebra of sets 214O, 232Xc; see also σ-ideal (112Db)
identically distributed random variables 273I, 273Xi, 274Xk, 274 notes, 276Xf, 276Yg, 285Xn, 285Yc; see also
exchangeable sequence (276Xe)
image filter 212Xi, 2A1Ib, 2A3Qb, 2A3S
image measure 112Xf, 123Ya, 132Yb, 212Xf, 212Xi, 234C-234F (234D), 234Xb-234Xd, 234Ya, 234Yc, 234Yd, 235J,
235Xl, 235Ya, 254Oa, 256G
image measure catastrophe 235H
indefinite integral 131Xa, 222D-222F, 222H, 222I, 222Xa-222Xc, 222Yc, 224Xg, 225E, 225Od, 225Xh, 232D, 232E,
232Yf, 232Yi; see also indefinite-integral measure
indefinite-integral measure 234I-234O (234J), 234Xh-234Xk, 234Yi, 234Yj, 234Yl-234Yn, 235K, 235N, 235Xh,
235Xl, 235Xm, 253I, 256E, 256J, 256L, 256Xe, 256Yd, 257F, 257Xe, 263Ya, 275Yi, 275Yj, 285Dd, 285Xe, 285Ya;
see also uncompleted indefinite-integral measure
independence §272 (272A)
independent random variables 272Ac, 272D-272I, 272L, 272M, 272P, 272R-272W, 272Xb, 272Xd, 272Xe, 272Xh-
272Xl, 272Ya, 272Yb, 272Yd-272Yh, 273B, 273D, 273E, 273H, 273I, 273L-273N, 273Xi, 273Xk, 273Xm, 274B-274D,
274F-274K, 274Xc, 274Xd, 274Xg, 274Xj, 274Xl, 274Ye, 275B, 275Yh, 275Yq, 275Yr, 276Af, 285I, 285Xf, 285Xg,
285Xm-285Xo, 285Xs, 285Yc, 285Yk, 285Yl
independent sets 272Aa, 272Bb, 272F, 272N, 273F, 273K
independent σ-algebras 272Ab, 272B, 272D, 272F, 272J, 272K, 272M, 272O, 272Q, 275Ym
induced topology see subspace topology (2A3C)
inductive definitions 2A1B
inequality see Cauchy, Chebyshev, Hölder, isodiametric, Jensen, Wirtinger, Young
infinity 112B, 133A, §135
initial ordinal 2A1E, 2A1F, 2A1K
inner measure 113Yg, 212Ya, 213Xe, 213Yc
—– —– defined by a measure 113Yh, 213Xe
inner product space 244N, 244Yn, 253Xe
inner regular measure 256A
—– —– with respect to closed sets 134Xe, 256Bb, 256Ya
—– —– with respect to compact sets 134Fb, 256Bc
integrable function §122 (122M), 123Ya, 133B, 133Db, 133Dc, 133F, 133J, 133Xa, 135Fa, 212Bc, 212Fb, 213B,
213G; see also Bochner integrable function (253Yf), L1 (µ) (242A)
integral §122 (122E, 122K, 122M); see also integrable function, Lebesgue integral (122Nb), lower integral (133I),
Riemann integral (134K), upper integral (133I)
integration by parts 225F, 225Oe, 252Xb
integration by substitution see change of variable in integration
interior of a set 2A3D, 2A3Ka
interpolation see Riesz Convexity Theorem
interval see half-open interval (114Aa, 115Ab), open interval (111Xb)
inverse Fourier transform 283Ab, 283B, 283Wa, 283Xb, 284I; see also Fourier transform
inverse image (of a set under a function or relation) 1A1B
inverse-measure-preserving function 134Yl-134Yn, 234A, 234B, 234Ea, 234F, 234Xa, 235G, 235Xm, 241Xg, 242Xd,
243Xn, 244Xo, 246Xf, 251L, 251Wp, 251Yb, 254G, 254H, 254Ka, 254O, 254Xc-254Xf, 254Xh, 254Yb; see also image
measure (234D)
Inversion Theorem (for Fourier series and transforms) 282G-282J, 282L, 282O, 282P, 283I, 283L, 284C, 284M; see
also Carleson’s theorem
isodiametric inequality 264H
isomorphism see measure space isomorphism
Jacobian 263Ea
Jensen’s inequality 233I, 233J, 233Yi, 233Yj, 242Yi
joint distribution see distribution (271C)
Jordan decomposition of an additive functional 231F, 231Ya, 232C
kernel see Dirichlet kernel (282D), Fejér kernel (282D), modified Dirichlet kernel (282Xc), Poisson kernel (282Yg)
Kirzbraun’s theorem 262C
Kolmogorov’s Strong Law of Large Numbers 273I, 275Yn
Komlós’ theorem 276H, 276Yh
König, H. 232Yk
Kronecker’s lemma 273Cb
Lacey-Thiele Lemma 286M
Laplace’s central limit theorem 274Xe
Laplace transform 123Xc, 123Yb, 133Xc, 225Xe
lattice 2A1Ad; see also Banach lattice (242G)
—– norm see Riesz norm (242Xg)
law of a random variable see distribution (271C)
law of large numbers see strong law (§273)
law of rare events 285Q
Lebesgue, H. Vol. 1 intro., chap. 27 intro.
Lebesgue Covering Lemma 2A2Ed
Lebesgue decomposition of a countably additive functional 232I, 232Yb, 232Yg, 256Ib
Lebesgue decomposition of a function of bounded variation 226C, 226Dc, 226Ya, 232Yb, 263Ye
Lebesgue’s Density Theorem §223, 261C, 275Xg
Lebesgue’s Dominated Convergence Theorem 123C, 133G
Lebesgue extension see completion (212C)
Lebesgue integrable function 122Nb, 122Yb, 122Ye, 122Yf
Lebesgue integral 122Nb
Lebesgue measurable function 121C, 121D, 134Xg, 134Xj, 225H, 233Yd, 262K, 262P, 262Yc
Lebesgue measurable set 114E, 114F, 114G, 114Xe, 114Ye, 115E, 115F, 115G
Lebesgue measure (on R) §114 (114E), 131Xb, 133Xc, 133Xd, 134G-134L, 212Xc, 216A, chap. 22, 242Xi, 246Yd,
246Ye, 252N, 252O, 252Xf, 252Xg, 252Ye, 252Yo, §255
—– —– (on R r ) §115 (115E), 132C, 132Ef, 133Yc, §134, 211M, 212Xd, 245Yj, 251N, 251Wi, 252Q, 252Xi, 252Yi,
254Xk, 255A, 255L, 255Xj, 255Ye, 255Yf, 256Ha, 256J-256L, 264H, 264I, 266C
—– —– (on [0, 1], [0, 1[) 211Q, 216Ab, 234Ya, 252Yp, 254K, 254Xh, 254Xj-254Xl
—– —– (on other subsets of R r ) 131B, 234Ya, 234Yc, 242O, 244Hb, 244I, 244Yi, 246Yf, 246Yl, 251R, 252Yh,
255M-255O, 255Yg, 255Ym
Lebesgue negligible set 114E, 115E, 134Yk
Lebesgue outer measure 114C, 114D, 114Xc, 114Yd, 115C, 115D, 115E, 115Xb, 115Xd, 115Yb, 132C, 134A, 134D,
134Fa
Lebesgue set of a function 223D, 223Xh-223Xj, 223Yg, 223Yh, 261E, 261Ye, 282Yg
Lebesgue-Stieltjes measure 114Xa, 114Xb, 114Yb, 114Yc, 114Yf, 131Xc, 132Xg, 134Xd, 211Xb, 212Xd, 212Xg,
225Xf, 232Xb, 232Yb, 235Xb, 235Xf, 235Xg, 252Xb, 256Xg, 271Xb, 224Yh
length of a curve 264Yl, 265Xd, 265Ya
length of an interval 114Ab
B.Levi’s theorem 123A, 123Xa, 133K, 135Ga, 135Hb, 226E, 242E
Levi property of a normed Riesz space 242Yb, 244Yf
Lévy’s martingale convergence theorem 275I
Lévy’s metric 274Ya, 285Yd
Liapounoff’s central limit theorem 274Xg
limit of a filter 2A3Q, 2A3R, 2A3S
limit of a sequence 2A3M, 2A3Sg
limit ordinal 2A1Dd
Lindeberg’s central limit theorem 274F-274H, 285Ym
Lindeberg’s condition 274H
linear operator 262Gc, 263A, 265B, 265C, §2A6; see also bounded linear operator (2A4F), continuous linear operator
linear order see totally ordered set (2A1Ac)
linear space topology see linear topological space (2A5A), weak topology (2A5I), weak* topology (2A5Ig)
linear subspace (of a normed space) 2A4C; (of a linear topological space) 2A5Ec
linear topological space 245D, 284Ye, 2A5A, 2A5B, 2A5C, 2A5E-2A5I
Lipschitz constant 262A, 262C, 262Yi
Lipschitz function 224Ye, 225Yc, 262A, 262B-262E, 262N, 262Q, 262Xa-262Xc, 262Xh, 262Yi, 263F, 264G, 264Yj;
see also Hölder continuous (282Xj)
local convergence in measure see convergence in measure (245A)
localizable measure (space) 211G, 211L, 211Ya, 211Yb, 212Ga, 213Hb, 213L-213N, 213Xl, 213Xm, 214Ie, 214K,
214Xa, 214Xc, 214Xe, 216C, 216E, 216Ya, 216Yb, 234Nc, 234O, 234Yk-234Yn, 241G, 241Ya, 243Gb, 243Hb, 245Ec,
245Yf, 252Yp, 252Yq, 252Ys, 254U; see also strictly localizable (211E)
locally determined measure (space) 211H, 211L, 211Ya, 214Xb, 216Xb, 216Ya, 216Yb, 251Xd, 252Ya; see also
complete locally determined measure
locally determined negligible sets 213I, 213J-213L, 213Xj-213Xl, 214Ic, 214Xf, 214Xg, 216Yb, 234Yl, 252E, 252Yb
locally finite measure 256A, 256C, 256G, 256Xa, 256Ya
locally integrable function 242Xi, 255Xe, 255Xf, 256E, 261E, 261Xa, 262Yg
Loève, M. chap. 27 intro.
lower integral 133I, 133J, 133Xf, 133Yd, 135Ha, 214Jb, 235A
lower Lebesgue density 223Yf
lower Riemann integral 134Ka
lower semi-continuous function 225H, 225I, 225Xl, 225Xm, 225Yd, 225Ye
Lusin’s theorem 134Yd, 256F
Markov time see stopping time (275L)
martingale §275 (275A, 275Cc, 275Cd, 275Ce); see also reverse martingale
martingale convergence theorems 275G-275I, 275K, 275Xf
martingale difference sequence 276A, 276B, 276C, 276E, 276Xd, 276Xg, 276Ya, 276Yb, 276Ye, 276Yg
martingale inequalities 275D, 275F, 275Xb, 275Yc-275Ye, 276Xb
maximal element in a partially ordered set 2A1Ab
maximal theorems 275D, 275Yc, 275Yd, 276Xb, 286A, 286T
mean (of a random variable) see expectation (271Ab)
Mean Ergodic Theorem see convergence in mean (245Ib)
measurable cover see measurable envelope (132D)
measurable envelope 132D, 132E, 132F, 132Xg, 134Fc, 134Xd, 213K-213M, 214G, 216Yc, 266Xa; see also full outer
measure (132F)
measurable envelope property 213Xl, 214Xk
measurable function (taking values in R) §121 (121C), 122Ya, 212B, 212Fa, 213Yd, 214Ma, 214Na, 235C, 235I,
252O, 252P, 256F, 256Yb, 256Yc
—– —– (taking values in R r ) 121Yd, 256G
—– —– (taking values in other spaces) 133Da, 133E, 133Yb, 135E, 135Xd, 135Yf
—– —– ((Σ, T)-measurable function) 121Yc, 235Xc, 251Ya, 251Yd
—– —– see also Borel measurable, Lebesgue measurable
measurable set 112A; µ-measurable set 212Cd; see also relatively measurable (121A)
measurable space 111Bc
measurable transformation §235; see also inverse-measure-preserving function
measure 112A
—– (in ‘µ measures E’, ‘E is measured by µ’) >112Be
measure algebra 211Yb, 211Yc
—– —– function see inverse-measure-preserving function (234A), measure space automorphism, measure space
isomorphism
measure space §112 (112A)
measure space automorphism 255A, 255L-255N, 255 notes
measure space isomorphism 254K, 254Xj-254Xl
median function 2A1Ac
metric 2A3F, 2A4Fb; see also Euclidean metric (2A3Fb), Hausdorff metric (246Yb), Lévy’s metric (274Ya),
pseudometric (2A3F)
metric outer measure 264Xb, 264Yc
metric space 224Ye, 261Yi; see also complete metric space, metrizable space (2A3Ff)
metrizable (topological) space 2A3Ff, 2A3L; see also metric space, separable metrizable
mid-convex function 233Ya, 233Yd
minimal element in a partially ordered set 2A1Ab
Minkowski’s inequality 244F, 244Yc, 244Ym; see also Brunn-Minkowski inequality (266C)
modified Dirichlet kernel 282Xc
modulation 286C
Monotone Class Theorem 136B
Monotone Convergence Theorem see B.Levi’s theorem (123A)
monotonic function 121D, 222A, 222C, 222Yb, 224D
Monte Carlo integration 273J, 273Ya
multilinear operator 253Xc
negligible set 112D, 131Ca, 214Cb, 234Hb; see also Lebesgue negligible (114E, 115E), null ideal (112Db)
Nikodým see Radon-Nikodým
non-decreasing sequence of sets 112Ce
non-increasing sequence of sets 112Cf
non-measurable set 134B, 134D, 134Xc
norm 2A4B; (of a linear operator) 2A4F, 2A4G, 2A4I; (of a matrix) 262H, 262Ya; (norm topology) 242Xg, 2A4Bb
normal density function 274A, 283N, 283We, 283Wf
normal distribution 274Ad, 495Xh; see also standard normal (274A)
normal distribution function 274Aa, 274F-274K, 274M, 274Xe, 274Xg
normal random variable 274A, 274B, 274Xi, 285E, 285Xm, 285Xn, 285Xt
normalized Hausdorff measure 264 notes, §265 (265A)
normed algebra 2A4J; see also Banach algebra (2A4Jb)
normed space 224Yf, §2A4 (2A4Ba); see also Banach space (2A4D)
null ideal of a measure 112Db, 251Xp
null set see negligible (112Da)
odd function 255Xb, 283Yd
open interval 111Xb, 114G, 115G, 1A1A, 2A2I
open set (in R r ) 111Gc, 111Yc, 114Yd, 115G, 115Yb, 133Xb, 134Fa, 134Xe, 135Xa, 1A2A, 1A2B, 1A2D; (in R)
111Gc, 111Ye, 114G, 134Xd, 2A2I; (in other topological spaces) 256Ye, 2A3A, 2A3G; see also topology (2A3A)
optional time see stopping time (275L)
order-bounded set (in a partially ordered space) 2A1Ab
order-complete see Dedekind complete (241Fc)
order-continuous norm (on a Riesz space) 242Yc, 242Ye, 244Ye
order*-convergent sequence (in L0 (µ)) 245C, 245K, 245L, 245Xc, 245Xd
order unit (in a Riesz space) 243C
ordered set see partially ordered set (2A1Aa), totally ordered set (2A1Ac), well-ordered set (2A1Ae)
ordering of measures 234P, 234Q, 234Xl, 234Yo
ordinal 2A1C, 2A1D-2A1F, 2A1K
ordinate set 252N, 252Yl, 252Ym, 252Yv
orthogonal matrix 2A6B, 2A6C
orthogonal projection in Hilbert space 244Nb, 244Yk, 244Yl
orthonormal vectors 2A6B
outer measure §113 (113A), 114Xd, 132B, 132Xg, 136Ya, 212Xf, 213C, 213Xa, 213Xg, 213Xk, 213Ya, 251B,
251Wa, 251Xe, 254B, 264B, 264Xa, 264Ya, 264Yo; see also Lebesgue outer measure (114C, 115C), metric outer
measure (264Yc), regular outer measure (132Xa)
—– —– defined from a measure 113Ya, 132A-132E (132B), 132Xa-132Xi, 132Xk, 132Ya-132Yc, 133Je, 212Ea,
212Xa, 212Xb, 213C, 213F, 213Xa, 213Xg-213Xj, 213Xk, 213Yd, 214Cd, 215Yc, 234Bf, 234Ya, 251P, 251S, 251Wk,
251Wm, 251Xn, 251Xq, 252D, 252I, 252O, 252Ym, 254G, 254L, 254S, 254Xb, 254Xq, 254Yd, 264Fb, 264Ye
outer regular measure 134Fa, 134Xe, 256Xi
Parseval’s identity 284Qd; see also Plancherel’s theorem
partial derivative 123D, 252Ye, 262I, 262J, 262Xh, 262Yb, 262Yc
partial order see partially ordered set (2A1Aa)
partially ordered linear space 241E, 241Yg
partially ordered set 2A1Aa, 234Qa, 234Yo
partition 1A1J
Peano curve 134Yl-134Yo
periodic extension of a function on ]−π, π] 282Ae
Plancherel Theorem (on Fourier series and transforms of square-integrable functions) 282K, 284O, 284Qd
point-supported measure 112Bd, 211K, 211O, 211Qb, 211Rc, 211Xb, 211Xf, 213Xo, 215Xr, 234Xb, 234Xe, 234Xk,
256Hb; see also Dirac measure (112Bd)
pointwise convergence (topology on a space of functions) 281Yf
pointwise convergent see order*-convergent (245Cb)
pointwise topology see pointwise convergence
Poisson distribution 285Q, 285Xo
Poisson kernel 282Yg
Poisson’s theorem 285Q
polar coordinates 263G, 263Xf
Pólya’s urn scheme 275Xc
polynomial (on R r ) 252Yi
porous set 223Ye, 261Yg, 262L
positive cone 253Gb, 253Xi, 253Yd
positive definite function 283Xt, 285Xr
power set PX (usual measure on) 254J, 254Xf, 254Xq, 254Yd
—– PN 1A1Hb, 2A1Ha, 2A1Lb; (usual measure on) 273G, 273Xe, 273Xf
predictable sequence 276Ec
presque partout 112De
primitive product measure 251C, 251E, 251F, 251H, 251K, 251Wa, 251Xb-251Xh, 251Xk, 251Xm-251Xo, 251Xq,
251Xr, 252Yc, 252Yd, 252Yl, 253Ya-253Yc, 253Yg
principal ultrafilter 2A1N
probability density function see density (271H)
probability space 211B, 211L, 211Q, 211Xb, 211Xc, 211Xd, 212Ga, 213Ha, 215B, 234Bb, 243Xi, 245Xe, 253Xh,
§254, chap. 27
product measure chap. 25; see also c.l.d. product measure (251F, 251W), primitive product measure (251C),
product probability measure (254C)
product probability measure §254 (254C), 272G, 272J, 272M, 273J, 273Xj, 275J, 275Yi, 275Yj, 281Yk; see also
{0, 1}I
product topology 281Yc, 2A3T
—– see also inner product space
pseudometric 2A3F, 2A3G-2A3M, 2A3S-2A3U, 2A5B
pseudo-simple function 122Ye, 133Ye
pull-back measures 234F, 234Ye
purely atomic measure (space) 211K, 211N, 211R, 211Xb, 211Xc, 211Xd, 212Gd, 213He, 214K, 214Xd, 234Be,
234Xe, 234Xj, 251Xu
purely infinite measurable set 213 notes
push-forward measure see image measure (234D)
quasi-Radon measure (space) 256Ya, 263Ya
quasi-simple function 122Yd, 133Yd
quotient partially ordered linear space 241Yg; see also quotient Riesz space
quotient Riesz space 241Yg, 241Yh, 242Yg
quotient topology 245Ba
Rademacher’s theorem 262Q
Radon measure (on R or R r ) §256 (256A), 257A, 284R, 284Yi
—– (on ]−π, π]) 257Yb
Radon-Nikodým derivative 232Hf, 232Yj, 234J, 234Ka, 234Yi, 234Yj, 235Xh, 256J, 257F, 257Xe, 257Xf, 272Xe,
275Ya, 275Yi, 285Dd, 285Xe, 285Ya
Radon-Nikodým theorem 232E-232G, 234O, 235Xj, 242I, 244Yl
Radon probability measure (on R or R r ) 256Hc, 257A, 257Xa, 271B, 271C, 271Xb, 274Xh, 274Yb, 285A, 285D,
285F, 285J, 285M, 285R, 285Xa, 285Xh, 285Xe, 285Xj, 285Xk, 285Xm, 285Ya, 285Yb, 285Yg, 285Yp;
—– —– —– (on other spaces) 256Ye, 271Ya
Radon product measure (of finitely many spaces) 256K
random variable 271Aa
rapidly decreasing test function §284 (284A, 284Wa), 285Dc, 285Xd, 285Ya
rearrangement see decreasing rearrangement (252Yo)
recursion 2A1B
regular measure see inner regular (256Ac)
regular outer measure 132C, 132Xa, 213C, 214Hb, 214Xb, 251Xn, 254Xb, 264Fb
regular topological space 2A5J
relation 1A1B
relatively compact set 2A3Na, 2A3Ob; see also relatively weakly compact
relatively measurable set 121A
relatively weakly compact set (in a normed space) 247C, 2A5Id
repeated integral §252 (252A); see also Fubini’s theorem, Tonelli’s theorem
reverse martingale 275K
Riemann integrable function 134K, 134L, 281Yh, 281Yi
Riemann integral 134K, 242 notes
Riemann-Lebesgue lemma 282E, 282F
Riesz Convexity Theorem 244 notes
Riesz norm 242Xg
Riesz space (= vector lattice) 231Yc, 241Ed, 241F, 241Yc, 241Yg; see also Banach lattice (242G), Riesz norm
(242Xg)
Saks see Denjoy-Young-Saks theorem (222L), Vitali-Hahn-Saks theorem (246Yi)
saltus function 226B, 226Db, 226Xa
saltus part of a function of bounded variation 226C, 226Xb, 226Xc, 226Yd
scalar multiplication of measures 234Xf, 234Xl
Schröder-Bernstein theorem 2A1G
Schwartz function see rapidly decreasing test function (284A)
Schwartzian distribution 284R, 284 notes; see also tempered distribution (284 notes)
self-supporting set (in a topological measure space) 256Xf
semi-continuous function see lower semi-continuous (225H)
semi-finite measure (space) 211F, 211L, 211Xf, 211Ya, 212Ga, 213A, 213B, 213Hc, 213Xc, 213Xd, 213Xj, 213Xl,
213Xm, 213Ya-213Yc, 213Yf, 214Xd, 214Xg, 214Ya, 215B, 216Xa, 216Yb, 234B, 234Na, 234Xe, 234Xi, 235M, 235Xd,
241G, 241Ya, 241Yd, 243Ga, 245Ea, 245J, 245Xd, 245Xk, 245Xm, 246Jd, 246Xh, 251J, 251Xd, 252P, 252Yk, 253Xf,
253Xg
semi-finite version of a measure 213Xc, 213Xd
semi-martingale see submartingale (275Yf)
seminorm 2A5D
semi-ring of sets 115Ye
separable (topological) space 2A3Ud
separable Banach space 244I, 254Yc
separable metrizable space 245Yj, 264Yb, 284Ye
Sierpiński Class Theorem see Monotone Class Theorem (136B)
signed measure see countably additive functional (231C)
simple function §122 (122A), 242M
singular additive functional 232Ac, 232I, 232Yg
singular measures 231Yf, 232Yg
smooth function (on R or R r ) 242Xi, 255Xf, 262Yd-262Yg, 284A, 284Wa, 285Yp
smoothing by convolution 261Ye
solid hull (of a subset of a Riesz space) 247Xa
space-filling curve 134Yl
sphere, surface measure on 265F-265H, 265Xa-265Xc, 265Xe
spherical polar coordinates 263Xf, 265F
square-integrable function 244Na; see also L2
standard normal distribution, standard normal random variable 274A
Steiner symmetrization 264H
step-function 226Xa
Stieltjes measure see Lebesgue-Stieltjes measure (114Xa)
Stirling’s formula 252Yu
stochastically independent see independent (272A)
Stone-Weierstrass theorem 281A, 281E, 281G, 281Ya, 281Yg
stopping time 275L, 275M-275O, 275Xi, 275Xj
strictly localizable measure (space) 211E, 211L, 211N, 211Xf, 211Ye, 212Gb, 213Ha, 213J, 213L, 213O, 213Xa,
213Xh, 213Xn, 213Ye, 214Ia, 214K, 215Xf, 216E, 216Yd, 234Nd, 235N, 251O, 251Q, 251Wl, 251Xo, 252B, 252D,
252Yr, 252Ys
strong law of large numbers 273D-273J, 275Yn, 276C, 276F, 276Ye, 276Yg
subalgebra; see σ-subalgebra (233A)
submartingale 275Yf, 275Yg
subspace measure 113Yb, 214A, 214B, 214C, 214H, 214I, 214Xb-214Xg, 214Ya, 216Xa, 216Xb, 241Ye, 242Yf,
243Ya, 244Yd, 245Yb, 251Q, 251R, 251Wl, 251Xo, 251Xp, 251Yc, 254La, 254Ye, 264Yf
—– —– on a measurable subset 131A, 131B, 131C, 132Xb, 135I, 214K, 214L, 214Xa, 214Xh, 241Yf, 247A
—– —– (integration with respect to a subspace measure) 131D, 131E-131H, 131Xa-131Xc, 133Dc, 133Xa, 135I,
214D, 214E-214G, 214N, 214J, 214Xl
subspace of a normed space 2A4C
subspace topology 2A3C, 2A3J
subspace σ-algebra 121A, 214Ce
substitution see change of variable in integration
successor cardinal 2A1Fc
—– ordinal 2A1Dd
sum over arbitrary index set 112Bd, 226A
sum of measures 112Yf, 122Xi, 234G, 234H, 234Qb, 234Xd-234Xg, 234Xi, 234Xl, 234Yf-234Yh
summable family of real numbers 226A, 226Xf
support of a topological measure 256Xf, 257Xd, 285Yp
—– see also bounded support, compact support
supported see point-supported (112Bd)
supporting see self-supporting set (256Xf), support
supremum 2A1Ab
surface measure see normalized Hausdorff measure (265A)
symmetric distribution 272Yc
symmetrization see Steiner symmetrization
tempered distribution 284 notes
—– function §284 (284D), 286D
—– measure 284Yi
tensor product of linear spaces 253 notes
test function 242Xi, 284 notes; see also rapidly decreasing test function (284A)
thick set see full outer measure (132F)
Three Series Theorem 275Yr
tight see uniformly tight (285Xj)
Tonelli’s theorem 252G, 252H, 252R
topological measure (space) 256A
topological space §2A3 (2A3A)
topological vector space see linear topological space (2A5A)
topology §2A2, §2A3 (2A3A); see also convergence in measure (245A), linear space topology (2A5A)
total order see totally ordered set (2A1Ac)
total variation (of an additive functional) 231Yh; (of a function) see variation (224A)
totally finite measure (space) 211C, 211L, 211Xb, 211Xc, 211Xd, 212Ga, 213Ha, 214Ia, 214Ka, 214Yc, 215Yc,
232Bd, 232G, 243Ic, 234Bc, 243Xk, 245Fd, 245Ye, 246Xi, 246Ya
totally ordered set 135Ba, 2A1Ac
trace (of a σ-algebra) see subspace σ-algebra (121A)
transfinite recursion 2A1B
translation-invariant measure 114Xf, 115Xd, 134A, 134Ye, 134Yf, 255A, 255N, 255Yn
truly continuous additive functional 232Ab, 232B-232E, 232H, 232I, 232Xa, 232Xb, 232Xf, 232Xh, 232Ya, 232Ye
Ulam S. see Banach-Ulam problem
ultrafilter 254Yd, 2A1N, 2A1O, 2A3R, 2A3Se; see also principal ultrafilter (2A1N)
Ultrafilter Theorem 2A1O
uncompleted indefinite-integral measure 234Kc
uniform space 2A5F
uniformly continuous function 224Xa, 255K
uniformly convex normed space 244O, 244Yn, 2A4K
uniformly distributed sequence see equidistributed (281Yi)
uniformly integrable set (in L1 ) §246 (246A), 252Yo, 272Ye, 273Na, 274J, 275H, 275Xi, 275Yl, 276Xd, 276Yb; (in
L1 (µ)) §246 (246A), 247C, 247D, 247Xe, 253Xd
uniformly tight (set of measures) 285Xj, 285Xk, 285Ye, 285Yf
unit ball in R r 252Q
universal mapping theorems 253F, 254G
up-crossing 275E, 275F
upper integral 133I, 133J-133L, 133Xf, 133Yd, 133Yf, 135H, 135I, 214Ja, 214Xl, 235A, 252Yj, 252Ym, 252Yn,
253J, 253K, 273Xj
upper Riemann integral 134Ka
upwards-directed partially ordered set 2A1Ab
usual measure on {0, 1}I 254J; see under {0, 1}I
—– —– on PX 254J; see under power set
vague topology (on a space of signed measures) 274Ld, 274Xh, 274Ya-274Yd, 275Yp, 285K, 285L, 285S, 285U,
285Xk, 285Xq, 285Xs, 285Yd, 285Yg-285Yi, 285Yn, 285Yr
variance of a random variable 271Ac, 271Xa, 272S, 274Xj, 274Ye, 285Gb, 285Xo
variation of a function §224 (224A, 224K, 224Yd, 224Ye), 226B, 226Db, 226Xc, 226Xd, 226Yb, 226Yd, 264Xf,
265Yb; see also bounded variation (224A)
—– of a measure see total variation (231Yh)
vector integration see Bochner integral (253Yf)
vector lattice see Riesz space (241E)
virtually measurable function 122Q, 122Xe, 122Xf, 135Ia, 212Bb, 212Fa, 234K, 241A, 252E, 252O
Vitali’s construction of a non-measurable set 134B
Vitali cover 261Ya
Vitali’s theorem 221A, 221Ya, 221Yc-221Ye, 261B, 261Yg, 261Yk
Vitali-Hahn-Saks theorem 246Yi
volume 115Ac
—– of a ball in R r 252Q, 252Xi
Wald’s equation 272Xh
weak topology of a normed space 247Ya, 2A5I
—– —– see also (relatively) weakly compact, weakly convergent
weak* topology on a dual space 253Yd, 285Yg, 2A5Ig; see also vague topology (274Ld)
weakly compact set (in a linear topological space) 247C, 247Xa, 247Xc, 247Xd, 2A5I; see also relatively weakly
compact (2A5Id)
weakly convergent sequence in a normed space 247Yb
Weierstrass’ approximation theorem 281F; see also Stone-Weierstrass theorem
well-distributed sequence 281Xh, 281Ym
well-ordered set 214P, 2A1Ae, 2A1B, 2A1Dg, 2A1Ka; see also ordinal (2A1C)
Well-ordering Theorem 2A1Ka
Weyl’s Equidistribution Theorem 281M, 281N, 281Xh
Wirtinger’s inequality 282Yf
Young’s inequality 255Yl
—– see also Denjoy-Young-Saks theorem (222L)
Zermelo’s Well-ordering Theorem 2A1Ka
zero-one law 254S, 272O, 272Xf, 272Xg
Zorn’s lemma 2A1M
a.e. (‘almost everywhere’) 112Dd
a.s. (‘almost surely’) 112De
B (in B(x, δ), closed ball) 261A
B (in B(U ; V ), space of bounded linear operators) 253Xb, 253Yj, 253Yk, 2A4F, 2A4G, 2A4H
c (the cardinal of R and PN) 2A1H, 2A1L, 2A1P
C = the set of complex numbers; (in R C ) 2A1A
C (in C(X), where X is a topological space) 243Xo, 281Yc, 281Ye, 281Yf; (in C([0, 1])) 242 notes
Cb (in Cb (X), where X is a topological space) 281A, 281E, 281G, 281Ya, 281Yd, 281Yg, 285Yg
Ck (in Ck (X), where X is a topological space) 242O, 244Hb, 244Yj, 256Xh; see also compact support
—– (in Ck (X; C)) 242Pd
c.l.d. product measure 251F, 251G, 251I-251L, 251N-251U, 251W, 251Xb-251Xm, 251Xp, 251Xs-251Xu, 251Yb-
251Yd, §§252-253, 254Db, 254U, 254Ye, 256K, 256L
c.l.d. version of a measure (space) 213E, 213F-213H, 213M, 213Xb-213Xe, 213Xg, 213Xj, 213Xk, 213Xn, 213Xo,
213Yb, 214Xe, 214Xi, 232Ye, 234Xl, 234Yj, 234Yo, 241Ya, 242Yh, 244Ya, 245Yc, 251Ic, 251T, 251Wf, 251Wl, 251Xe,
251Xk, 251Xl, 252Ya
D (in D+ f , D− f ) see Dini derivate (222J)
D (in D+ f , D− f ) see Dini derivate (222J)
diam (in diam A) = diameter
dom (in dom f ): the domain of a function f
E (in E(X), expectation of a random variable) 271Ab
ess sup see essential supremum (243Da)
f-algebra 241H, 241 notes
Gδ set 264Xd
ℓ1 (in ℓ1 (X)) 242Xa, 243Xl, 246Xd, 247Xc, 247Xd
ℓ1 (= ℓ1 (N)) 246Xc
ℓ2 244Xn, 282K, 282Xg
ℓp (in ℓp (X)) 244Xn
ℓ∞ (in ℓ∞ (X)) 243Xl, 281B, 281D
ℓ∞ (= ℓ∞ (N)) 243Xl
L0 (in L0 (µ)) 121Xb, §241 (241A), §245, 253C, 253Ya; see also L0 (241C), L0strict (241Yh), L0C (241J)
L0strict 241Yh
L0C (in L0C (µ)) 241J, 253L
L0 (in L0 (µ)) §241 (241A), 242B, 242J, 243A, 243B, 243D, 243Xe, 243Xj, §245, 253Xe-253Xg, 271De, 272H;
—– (in L0C (µ)) 241J
—– see also L0 (241A)
L1 (in L1 (µ)) 122Xc, 242A, 242Da, 242Pa, 242Xb; (in L1strict (µ)) 242Yg; (in L1C (µ)) 242P, 255Yn; (in L1V (µ))
253Yf; see also L1 , ‖ ‖1
L1 (in L1 (µ)) §242 (242A), 243De, 243F, 243G, 243J, 243Xf-243Xh, 245H, 245J, 245Xh, 245Xi, §246, §247, §253,
254R, 254Xp, 254Ya, 254Yc, 257Ya, 282Bd
—– (in L1V (µ)) 253Yf, 253Yi
—– see also L1 , L1C , ‖ ‖1
L1C (µ) 242P, 243K, 246K, 246Yl, 247E, 255Xi; see also convolution of functions
L2 (in L2 (µ)) 253Yj, §286; (in L2C (µ)) 284N, 284O, 284Wh, 284Wi, 284Xi, 284Xk-284Xm, 284Yg; see also L2 , Lp ,
‖ ‖2
L2 (in L2 (µ)) 244N, 244Yl, 247Xe, 253Xe; (in L2C (µ)) 244Pe, 282K, 282Xg, 284P; see also L2 , Lp , ‖ ‖2
Lp (in Lp (µ)) 244A, 244Da, 244Eb, 244Pa, 244Xa, 244Ya, 244Yi, 246Xg, 252Yh, 253Xh, 255K, 255Of, 255Ye,
255Yf, 255Yk, 255Yl, 261Xa, 263Xa, 273M, 273Nb, 281Xd, 282Yc, 284Xj, 286A; see also Lp , L2 , ‖ ‖p
Lp (in Lp (µ), 1 < p < ∞) §244 (244A), 245G, 245Xk, 245Xl, 245Yg, 246Xh, 247Ya, 253Xe, 253Xi, 253Yk, 255Yh;
see also Lp , ‖ ‖p
L∞ (in L∞ (µ)) 243A, 243D, 243I, 243Xa, 243Xl, 243Xn; see also L∞
L∞C 243K
L∞strict 243Xb
L∞ (in L∞ (µ)) §243 (243A), 253Yd; see also L∞ , L∞C , ‖ ‖∞
L∞C 243K, 243Xm
L (in L(U ; V ), space of linear operators) 253A, 253Xa
lim (in lim F) 2A3S; (in limx→F ) 2A3S
lim inf (in lim inf n→∞ ) §1A3 (1A3Aa), 2A3Sg; (in lim inf t↓0 ) 1A3D, 2A3Sg; (in lim inf x→F ) 2A3S
lim sup (in lim supn→∞ ) §1A3 (1A3Aa), 2A3Sg; (in lim supt↓0 ) 1A3D, 2A3Sg; (in lim supx→F f (x)) 2A3S
ln+ 275Yd
M0,∞ 252Yo
M 1,∞ (in M 1,∞ (µ)) 244Xl, 244Xm, 244Xo, 244Yd
med (in med(u, v, w)) see median function (2A1Ac)
N see power set
N × N 111Fb
P see power set
p.p. (‘presque partout’) 112De
Pr(X > a), Pr(X ∈ E) etc. 271Ad
Q (the set of rational numbers) 111Eb, 1A1Ef
R (the set of real numbers) 111Fe, 1A1Ha, 2A1Ha, 2A1Lb
RI 245Xa, 256Ye; see also Euclidean metric, Euclidean topology
R see extended real line (§135)
R C 2A4A
S (in S(A)) 243I; (in S f ≅ S(Af )) 242M, 244Ha
S see rapidly decreasing test function (284A)
S 1 (the unit circle, as topological group) see circle group
S r−1 (the unit sphere in R r ) see sphere

sf (in µsf ) see semi-finite version of a measure (213Xc); (in µsf ) 213Xf, 213Xg, 213Xk

T (in Tµ̄,ν̄ ) 244Xm, 244Xo, 244Yd, 246Yc


Tm see convergence in measure (245A)
Ts (in Ts (U, V )) see weak topology (2A5Ia), weak* topology (2A5Ig)
U (in U (x, δ)) 1A2A
Var (in Var(X)) see variance (271Ac); (in VarD f , Var f ) see variation (224A)
w∗ -topology see weak* topology (2A5Ig)
Z (the set of integers) 111Eb, 1A1Ee; (as topological group) 255Xk
ZFC see Zermelo-Fraenkel set theory
βr (volume of unit ball in R r ) 252Q, 252Xi, 265F, 265H, 265Xa, 265Xb, 265Xe
Γ (in Γ(z)) see gamma function (225Xj)
∆-system 2A1Pa

µG (standard normal distribution) 274Aa


νX see distribution of a random variable (271C)
π-λ Theorem see Monotone Class Theorem (136B)
σ-additive see countably additive (231C)
σ-algebra of sets 111A, 111B, 111D-111G, 111Xc-111Xf, 111Yb, 136Xb, 136Xi, 212Xh; see also Borel σ-algebra
(111G)
σ-algebra defined by a random variable 272C, 272D
σ-complete see Dedekind σ-complete (241Fb)
σ-field see σ-algebra (111A)
σ-finite measure (space) 211D, 211L, 211M, 211Xe, 212Ga, 213Ha, 213Ma, 214Ia, 214Ka, 215B, 215C, 215Xe,
215Ya, 215Yb, 216A, 232B, 232F, 234B, 234Ne, 234Xe, 235M, 235P, 235Xj, 241Yd, 243Xi, 245Eb, 245K, 245L, 245Xe,
251K, 251L, 251Wg, 251Wp, 252B-252E, 252H, 252P, 252R, 252Xd, 252Yb, 252Yg, 252Yv
σ-ideal (of sets) 112Db, 211Xc, 212Xe, 212Xh
σ-subalgebra of sets §233 (233A)
∑ (in ∑i∈I ai ) 112Bd, 222Ba, 226A
τ -additive measure 256M, 256Xb, 256Xc
Φ see normal distribution function (274Aa)
χ (in χA, where A is a set) 122Aa
ω (the first infinite ordinal) 2A1Fa
ω1 (the first uncountable ordinal) 2A1Fc
ω2 (the second uncountable initial ordinal) 2A1Fc
\ (in E \ F , ‘set difference’) 111C
△ (in E△F , ‘symmetric difference’) 111C
∪ (in ∪n∈N En ) 111C; (in ∪A) 1A1F
∩ (in ∩n∈N En ) 111C; (in ∩E) 1A2F
∨, ∧ (in a lattice) 121Xb, 2A1Ad
↾ (in f ↾A, the restriction of a function to a set) 121Eh
see closure (2A2A, 2A3Db)
¯ (in h̄(u), where h is a Borel function and u ∈ L0 ) 241I, 241Xd, 241Xi, 245Dd
=a.e. 112Dg, 112Xe, 241C
≤a.e. 112Dg, 112Xe
≥a.e. 112Dg, 112Xe
≪ (in ν ≪ µ) see absolutely continuous (232Aa)
∗ (in f ∗ g, u ∗ v, λ ∗ ν, ν ∗ f , f ∗ ν) see convolution (255E, 255O, 255Xh, 255Xk, 255Yn)
* (in weak*) see weak* topology (2A5Ig); (in U ∗ = B(U ; R), linear topological space dual) see dual (2A4H); (in
µ∗ ) see outer measure defined by a measure (132B)
∗ (in µ∗ ) see inner measure defined by a measure (113Yh)
′ (in T ′ ) see adjoint operator
∫ (in ∫f , ∫f dµ, ∫f (x)µ(dx)) 122E, 122K, 122M, 122Nb; see also upper integral, lower integral (133I)
—– (in ∫u) 242Ab, 242B, 242D
—– (in ∫A f ) 131D, 214D, 235Xe; see also subspace measure
—– (in ∫A u) 242Ac
∫̄ see upper integral (133I)
∫̲ see lower integral (133I)
R∫ see Riemann integral (134K)
| | (in a Riesz space) 241Ee, 242G
‖ ‖1 (on L1 (µ)) §242 (242D), 246F, 253E, 275Xd, 282Ye; (on L1 (µ)) 242D, 242Yg, 273Na, 273Xk
‖ ‖2 244Da, 273Xl, 282Yf; see also L2 , ‖ ‖p
‖ ‖p (for 1 < p < ∞) §244 (244Da), 245Xj, 246Xb, 246Xh, 246Xi, 252Yh, 252Yo, 253Xe, 253Xh, 273M, 273Nb,
275Xe, 275Xf, 275Xh, 276Ya; see also Lp , Lp
‖ ‖∞ 243D, 243Xb, 243Xo, 244Xg, 273Xm, 281B; see also essential supremum (243D), L∞ , L∞ , ℓ∞
⊗ (in f ⊗ g) 253B, 253C, 253I, 253J, 253L, 253Ya, 253Yb; (in u ⊗ v) 253E, 253F, 253G, 253L, 253Xc-253Xg, 253Xi,
253Yd
⊗̂ (in Σ⊗̂T) 251D, 251K, 251M, 251Xa, 251Xl, 251Ya, 252P, 252Xe, 252Xh, 253C
⨂̂ (in ⨂̂i∈I Σi ) 251Wb, 251Wf, 254E, 254F, 254Mc, 254Xc, 254Xi, 254Xs
∏ (in ∏i∈I αi ) 254F; (in ∏i∈I Xi ) 254Aa
# (in #(X), the cardinal of X) 2A1Kb
∧, ∨ (in f ∧ , f ∨ ) see Fourier transform, inverse Fourier transform (283A)
+ (in κ+ , successor cardinal) 2A1Fc; (in f + , where f is a function) 121Xa, 241Ef; (in u+ , where u belongs to a
Riesz space) 241Ef; (in F (x+ ), where F is a real function) 226Bb
− (in f − , where f is a function) 121Xa, 241Ef; (in u− , in a Riesz space) 241Ef; (in F (x− ), where F is a real
function) 226Bb
{0, 1}I (usual measure on) 254J, 254Xd, 254Xe, 254Yc, 272N, 273Xb
—– —– (when I = N) 254K, 254Xj, 254Xq, 256Xk, 261Yd
∞ see infinity
[ ] (in [a, b]) see closed interval (115G, 1A1A, 2A1Ab); (in f [A], f −1 [B], R[A], R−1 [B]) 1A1B
[[ ]] (in f [[F]]) see image filter (2A1Ib)
[ [ (in [a, b[) see half-open interval (115Ab, 1A1A)
] ] (in ]a, b]) see half-open interval (1A1A)
] [ (in ]a, b[) see open interval (115G, 1A1A)
< > (in <x>, fractional part) 281M
⌞ (in µ⌞E) 234M, 235Xe