Real Analysis: 1 σ-algebras

Real Analysis
Richard F. Bass
February 2, 2010
1 -algebras
Let X be a set. We will use the notation: A
c
= x X : x / A and
A B = A B
c
. (The notation A B is also commonly used.)
Denition 1.1 An algebra is a collection / of subsets of X such that
(a) , X /;
(b) if A /, then A
c
/;
(c) if A
1
, . . . , A
n
/, then
n
i=1
A
i
and
n
i=1
A
i
are in /.
/ is a -algebra (or -eld) if in addition
(d) if A
1
, A
2
, . . . are in /, then
i=1
A
i
and
i=1
A
i
are in /.
In (d) we allow countable unions and intersections only; we do not allow
uncountable unions and intersections.
Example 1.2 Let X = R and / be the collection of all subsets of R.
1
Example 1.3 Let X = R and let
/ = A R : A is countable or A
c
is countable.
Parts (a) and (b) of the denition are easy. Suppose A
1
, A
2
, . . . are all in /.
If each of the A
i
are countable, then
i
A
i
is countable, and so in /. If A
c
i
0
is countable for some i
0
, then
(A
i
)
c
=
i
A
c
i
A
c
i
0
is countable, and again
i
A
i
is in /. Since A
i
= (
i
A
c
i
)
c
, then the countable
intersection of sets in / is again in /.
Example 1.4 Let X = [0, 1] and / = , X, [0,
1
2
], (
1
2
, 1].
Example 1.5 X = 1, 2, 3 and / = X, , 1, 2, 3.
Example 1.6 Let X = [0, 1], and B
1
, . . . , B
8
subsets of X which are pairwise
disjoint and whose union is all of X. Let /be the collection of all nite unions
of the B
i
s as well as the empty set. (So / consists of 2
8
elements.)
Note that if we take an intersection of -algebras, we get a -algebra; this is
just a matter of checking the denition.If we have a collection ( of subsets of
X, there is at least one -algebra containing (, namely, the one consisting of
all subsets of X. We can take the intersection of all -algebras that contain
(; we denote this intersection by ((). If / is any -algebra containing (,
then / (().
If X has some additional structure, say, it is a metric space, then we can
talk about open sets. If ( is the collection of open subsets of X, then we
call (() the Borel -algebra on X, and this is often denoted B. We will see
later that when X is the real line, that B is not equal to the collection of all
subsets of X.
We end this section with the following proposition.
2
Proposition 1.7 If X = R, then the Borel -algebra is generated by each
of the following collection of sets:
(1) (
1
= (a, b) : a, b R.
(2) (
2
= [a, b] : a, b R;
(3) (
3
= (a, b] : a, b R;
(4) (
4
= (a, ) : a R;
Proof. (1) Let ( be the collection of open sets. Then (
1
( ((). (()
is the Borel -algebra and contains (
1
. Since ((
1
) is the intersection of all
-algebras containing (
1
, then ((
1
) (().
To get the reverse inclusion, if G is open, it is the countable union of open
intervals. So G ((
1
), and hence ( ((
1
). (() is the intersection of all
-algebras containing (; ((
1
) is one such, so (() ((
1
).
(2) If [a, b] (
2
, then [a, b] =
n=1
(a
1
n
, b +
1
n
) ((). So (
2
((),
and by an argument similar to that in (1), we conclude ((
2
) (().
If (a, b) (
1
, choose n
0
2/(b a) and note (a, b) =
n=n
0
[a+
1
n
, b
1
n
]
((
2
). So the Borel -algebra, which is equal to ((
1
) by part (1), is contained
in ((
2
).
(3) The proof here is similar to (2), using (a, b] =
n=1
(a, b +
1
n
) and
(a, b) =
n=n
0
(a, b
1
n
], provided n
0
is taken large enough.
(4) The proof of this comes from (3), using that (a, b] = (a, ) (b, )
and (a, ) =
n=1
(a, a + n].
2 Measures
Denition 2.1 A measure on (X, /) is a function : / [0, ] such that
(a) (A) 0 for all A /;
(b) () = 0;
3
(c) if A
i
/ are disjoint, then
(
i=1
A
i
) =
i=1
(A
i
).
Example 2.2 X is any set, / is the collection of all subsets, and (A) is
the number of elements in A. This is called counting measure.
Example 2.3 X = R, / the collection of all subsets, x
1
, x
2
, . . . R, and
a
1
, a
2
, . . . > 0. Set (A) =
{i:x
i
A}
a
i
. A particular case of this is if x
i
= i
and all the a
i
= 1. We will see later that this allows us to view innite series
as functions on this space.
Example 2.4
x
(A) = 1 if x A and 0 otherwise. This measure is called
point mass at x.
We will construct Lebesgue measure on R; this is an extension of the
notion of length. However, the construction is a bit lengthy. We will also
construct Lebesgue measure on R
n
; when n = 2, this is an extension of the
notion of area, when n = 3, of volume.
Proposition 2.5 The following hold:
(a) If A, B / with A B, then (A) (B).
(b) If A
i
/ and A =
i=1
A
i
, then (A)

i=1
(A
i
).
(c) If A
i
/, A
1
A
2
, and A =
i=1
A
i
, then (A) = lim
n
(A
n
).
(d) If A
i
/, A
1
A
2
, (A
1
) < , and A =
i=1
A
i
, then we have
(A) = lim
n
(A
n
).
4
Proof. (a) Let A
1
= A, A
2
= B A, and A
3
= A
4
= = . Now use part
(c) of the denition of measure.
(b) Let B
1
= A
1
, B
2
= A
2
B
1
, B
3
= A
3
(B
1
B
2
), and so on. The B
i
are disjoint and
i=1
B
i
=
i=1
A
i
. So (A) =
(B
i
)

(A
i
).
(c) Dene the B
i
as in (b). Since
n
i=1
B
i
=
n
i=1
A
i
, then
(A) = (
i=1
A
i
) = (
i=1
B
i
) =
i=1
(B
i
)
= lim
n
n
i=1
(B
i
) = lim
n
(
n
i=1
B
i
) = lim
n
(
n
i=1
A
i
).
(d) Apply (c) to the sets A
1
A
i
, i = 1, 2, . . ..
Example 2.6 To see that (A
1
) < is necessary, let X be the positive
integers, counting measure, and A
i
= i, i +1, . . .. Then the A
i
decrease,
(A
i
) = for all i, but (
i
A
i
) = () = 0.
Denition 2.7 A probability or probability measure is a measure such that
(X) = 1. In this case we usually write (, T, P) instead of (X, /, ).
3 Construction of Lebesgue measure
Dene m((a, b)) = ba. If G is an open set and G R, then G =
i=1
(a
i
, b
i
)
with the intervals disjoint. Dene m(G) =
i=1
(b
i
a
i
). If A R, dene
m
(A) = infm(G) : G open, A G. (3.1)

We will show the following.
(A) m
is not a measure on the collection of all subsets of R.

(B) m
is a measure on a strictly smaller -algebra that strictly contains the

Borel -algebra.
5
We will prove these two facts (and a bit more) in a moment, but lets rst
make some remarks.
A set N is a null set with respect to m
if m
(N) = 0. Let L be the

smallest -algebra containing B and all the null sets. More precisely, let ^
be the collection of all sets that are null sets with respect to m
and let
L = (B ^). L is called the Lebesgue -algebra, and sets in L are called
Lebesgue measurable.
As part of our proof of (B) we will show that m
is a measure on L.
Lebesgue measure is the measure m
on L. (A) shows that L is strictly

smaller than the collection of all subsets of R.
It is easy to get lost in the construction of Lebesgue measure, so let us
summarize our steps.
First we prove (A), which Proposition 3.1.
We then turn to the construction of Lebesgue measure. It is more conve-
nient for technical reasons to dene
m
(A) = inf
i=1
(b
i
a
i
) : A
i=1
(a
i
, b
i
]. (3.2)
There is no real dierence between this and (3.1) since it is clear that the
dierence between an open set and a set of the form
i=1
(a
i
, b
i
] is countable,
and with either denition of m
, the measure of a point is 0, so the measure

of a set consisting of countably many points is 0. However, when we talk
about Lebesgue-Stieltjes measure, then there is a real dierence.
We dene what it means to be an outer measure (Denition 3.2) and prove
that m
is an outer measure (Proposition 3.3). We then dene what it means

for a set to be m
-measurable (Denition 3.4) and prove that the collection

of m
-measurable sets is a -algebra and that m
restricted to this -algebra

is a measure.
This looks promising, but we do not yet know that enough sets are m
-
measurable. That takes one more step. We show in Proposition 3.6 that the
collection of m
-measurable sets contains the Borel -algebra.

Proposition 3.1 m
is not a measure on the collection of all subsets of R.

6
Proof. Suppose m
is a measure. Dene x y if xy is rational. This is an

equivalence relationship on [0, 1]. For each equivalence class, pick an element
out of that class (by the axiom of choice) Call the collection of such points
A. Given a set B, dene B +x = y +x : y B. Note m
(A+q) = m
(A)
since this translation invariance holds for intervals, hence for open sets, hence
for all sets. Moreover, the sets A + q are disjoint for dierent rationals q.
Now
[0, 1]
q[2,2]
(A + q),
where the sum is only over rational q, so 1

q[2,2]
m
(A + q), and
therefore m
(A) > 0. But
q[2,2]
(A + q) [6, 6],
where again the sum is only over rational q, so if m
is a measure, then
12

q[2,2]
m
(A + q), which implies m
(A) = 0, a contradiction.
Denition 3.2 A function n on the collection of all subsets satisfying
(a) n() = 0;
(b) if A B, then n(A) n(B);
(c) n(
i=1
A
i
)

i=1
n(A
i
).
is called an outer measure.
Proposition 3.3 m
dened by (3.2) is an outer measure.

Proof. (a) and (b) are obvious. To prove (c), let > 0. For each i there
exist intervals I
i1
, I
i2
, . . ., each of the form (a
ij
, b
ij
], such that A
i

j=1
I
ij
and
j
m(I
ij
) m
(A
i
) + /2
i
. Then
i=1
A
i

i,j
I
ij
and
i,j
m(I
ij
)
i
m
(A
i
) +
i
/2
i
=
i
m
(A
i
) + .
Since is arbitrary, m
i=1
A
i
)

i=1
m
(A
i
).
7
Denition 3.4 Let m
be an outer measure. A set A X is m
-measurable
if
m
(E) = m
(E A) + m
(E A
c
) (3.3)
for all E X.
Theorem 3.5 If m
is an outer measure on X, then the collection / of m
measurable sets is a -algebra and the restriction of m
to / is a measure.
Moreover, / contains all the null sets.
Proof. By Proposition 3.3,
m
(E) m
(E A) + m
(E A
c
)
for all E X. So to check (3.3) it is enough to show m
(E) m
(E A) +
m
(E A
c
). This will be trivial in the case m
(E) = .
If A /, then A
c
/ by symmetry and the denition of /. Suppose
A, B / and E X. Then
m
(E) = m
(E A) + m
(E A
c
)
= (m
(E A B) + m
(E A B
c
)) + (m
(E A
c
B)
+ m
(E A
c
B
c
).
The rst three terms on the right have a sum greater than or equal to m
(E
(A B)) because A B (A B) (A B
c
) (A
c
B). Therefore
m
(E) m
(E (A B)) + m
(E (A B)
c
),
which shows A B /. Therefore / is an algebra.
Let A
i
be disjoint sets in /, let B
n
=
n
i=1
A
i
, and B =
i=1
A
i
. If E X,
m
(E B
n
) = m
(E B
n
A
n
) + m
(E B
n
A
c
n
)
= m
(E A
n
) + m
(E B
n1
).
Repeating for m
(E B
n1
), we obtain
m
(E B
n
)
n
i=1
m
(E A
i
).
8
Since B
n
/, then
m
(E) = m
(E B
n
) + m
(E B
c
n
)
n
i=1
m
(E A
i
) + m
(E B
c
).
Let n . Recalling that m
is an outer measure,
m
(E)
i=1
m
(E A
i
) + m
(E B
c
)
m
i=1
(E A
i
)) + m
(E B
c
)
= m
(E B) + m(E B
c
)
m
(E).
This shows B /.
If we set E = B in this last equation, we obtain
m
(B) =
i=1
m
(A
i
),
or m
is countably additive on /.
If m
(A) = 0 and E X, then

m
(E A) + m
(E A
c
) = m
(E A
c
) m
(E),
which shows / contains all null sets.
Dene m(
n
i=1
(a
i
, b
i
]) =
n
i=1
(b
i
a
i
) if the (a
i
, b
i
] are disjoint. Note m is
well-dened (a set might be expressible as a union of such intervals in more
than one way).
The last step in the construction is the following.
Proposition 3.6 Every set in the Borel -algebra is m
- measurable.
Proof. Since the collection of m
-measurable sets is a -algebra, it suces

to show that every interval J of the form (a, b] is m
-measurable. Let E be
any set with m
(E) < ; we need to show

m
(E) m
(E J) + m
(E J
c
). (3.4)
9
Choose I
1
, I
2
, . . . of the form (a
i
, b
i
] such that E
i
I
i
and
m
(E)
i
(b
i
a
i
) .
Since E I
i
, we have m
(E J)

m
(I
i
J) and m
(E J
c
)
(I
i
J
c
). Hence we have
m
(E J) + m
(E J
c
)
i
[m
(I
i
J) + m
(I
i
J
c
)].
Now m
(I
i
J) is the length of an interval and m
(I
i
J
c
) is the length of
two intervals, so
m
(I
i
J) + m
(I
i
J) = m
(I
i
).
Thus
m
(E J) + m
(E J
c
)
(I
i
) m
(E) + .
Since is arbitrary, this proves (3.4).
We now drop the asterisks from m
and call m Lebesgue measure.

4 Examples and related results
Example 4.1 Recall the Cantor set is constructed by taking the interval
[0, 1], removing the middle third, removing the middle thirds of each of the
two remaining subintervals, and continuing. The Cantor set is what remains;
it is closed, uncountable, and every point is a limit point. Moreover, it
contains no intervals.
After one stage, the measure of the two intervals is 2(
1
3
), after two stages
4(1/9), and after n stages, (2/3)
n
. Since the Cantor set C is the intersection
of all these sets, the Lebesgue measure of C is 0.
Suppose we dene f
0
to be 1/2 on the interval (1/3, 2/3), 1/4 on the in-
terval (1/9, 2/9), 3/4 on the interval (7/9, 8/9), and so on. Dene f(x) =
inff
0
(y) : y x for x < 1. Dene f(1) = 1. Notice f = f
0
on the comple-
ment of the Cantor set. f is monotone, so it has only jump discontinuities.
But if it has a jump continuity, there is a rational of the form k/2
n
with
10
k 2
n
that is not in the range of f. On the other hand, by the construction,
each of these values is taken by f
0
for some point in the complement of C,
and so is taken by f. The only way this can happen is if f is continuous.
This function f is called the Cantor-Lebesgue function. We will use it in
examples later on. For now, we can see that it a function that increases only
on the Cantor set, which is of Lebesgue measure 0, yet is continuous.
Example 4.2 Let q
1
, q
2
, . . . be an enumeration of the rationals, let > 0,
and let I
i
be the interval (q
i
/2
i
, q
i
+ /2
i
). Then the measure of I
i
is
/2
i1
, so the measure of
i
I
i
is at most 2. (It is not equal to that because
there is a lot of overlap.) So the measure of A = [0, 1]
i
I
i
is larger than
1 2. But A contains no rational numbers.
Example 4.3 Let us follow the construction of the Cantor set, with this
dierence. Instead of removing the middle third at the rst stage, remove
the middle fourth, i.e., remove (3/8, 5/8). On each of the two intervals that
remain, remove the middle sixteenths. On each of the four intervals that
remain, remove the middle interval of length 1/64, and so on. The total that
we removed is
1
4
+ 2(
1
16
) + 4(
1
64
) + =
1
2
.
The set that remains contains no intervals, is closed, every point is a limit
point, is uncountable, and has measure 1/2. Such a set is called a fat Cantor
set or generalized Cantor set. Of course, other choices that 1/4, 1/16, etc.
are possible.
Let A [0, 1] be a Borel measurable set. We will show that A is al-
most equal to the countable intersection of open sets and almost equal to
the countable union of closed sets.. (A similar argument to what follows is
possible for sets that have innite measure.)
Proposition 4.4 Suppose A [0, 1] is a Borel measurable set.
(a) There exists a set H that is the countable intersection of open sets
which contains A and m(H A) = 0.
(b) There exists a set F that is the countable union of closed sets which is
contained in A and m(A F) = 0.
11
Proof. (a) For each i, there is an open set G
i
that contains A and such that
m(G
i
A) < 2
i
. This follows from the fact that m(A) = m
(A) and the

denition of m
. Then H
i
=
ji
G
j
will contain A, is open, and since it is
contained in G
i
, then m(H
i
A) < 2
i
. Let H =
i=1
H
i
. H need not be
open, but it is the intersection of countably many open sets. The set H is a
Borel set, contains A, and m(H A) m(H
i
A) < 2
i
for each i, hence
m(H A) = 0.
(b) If A [0, 1], let F
i
= [0, 1] H
i
, where H
i
is a decreasing sequence of
open sets containing A
c
such that m(H
i
A
c
) < 2
i
. (The H
i
are constructed
as in the proof of (a), but in terms of A
c
instead of A.) Then F
i
is an
increasing sequence of closed sets, F
i
A for each i, and m(A F
i
) < 2
i
for each i. Our result follows from letting F =
i
F
i
since m(A F)
m(A F
i
) < 2
i
for each i, hence m(A F) = 0..
The countable intersections of open sets are sometimes called G
sets; the
G is for geonet, the German word for open and the for Durchschnitt,
the German word for intersection. The countable unions of closed sets are
called F
sets, the F coming from ferme, the French word for closed, and
the coming from Summe, the German word for union.
Therefore, when trying to understand Lebesgue measure, we can look at
G
or F
sets, which are not so bad, and at null sets, which can be quite bad
but dont have positive measure.
Next we prove the Caratheodory extension theorem. We say that a mea-
sure is -nite if there exist E
1
, E
2
, . . . , such that (E
i
) < for all i and
X
i=1
E
i
.
Theorem 4.5 Suppose /
0
is an algebra and m restricted to /
0
is a measure.
Dene
m
(E) = inf
_

i=1
m(A
i
) : A
i
/
0
, E
i=1
A
i
_
.
Then
(a) m
(A) = m(A) if A /
0
;
(b) every set in /
0
is m
-measurable;
12
(c) if m is -nite, then there is a unique extension to the smallest -algebra
containing /
0
.
Proof. We start with (a). Suppose E /
0
. We know m
(E) m(E)
since we can take A
1
= E and A
2
, A
3
, . . . empty in the denition of m
. If
E
i=1
A
i
with A
i
/
0
, let B
n
= E (A
n

n1
i=1
A
i
). Then the B
n
are
disjoint, they are each in /
0
, and their union is E. Therefore
m(E) =
i=1
m(B
i
)
i=1
m(A
i
).
Thus m(E) m
(E).
Next we look at (b). Suppose A /
0
. Let > 0 and let E X. Pick
B
i
/
0
such that E
i=1
B
i
and
i
m(B
i
) m
(E) + . Then
m
(E) +
i=1
m(B
i
) =
i=1
m(B
i
A) +
i=1
m(B
i
A
c
)
m
(E A) + m
(E A
c
).
Since is arbitrary, m
(E) m
(EA)+m
(EA
c
). So A is m
-measurable.
Finally, suppose we have two extensions to the smallest -algebra contain-
ing /
0
; let the other extension be called n. We will show that if E is in this
smallest -algebra, then m
(E) = n(E).
Since m is -nite, we can reduce to the case where m is a nite measure:
if X =
i
K
i
with m(K
i
) < and we prove uniqueness for the measure m
i
dened by m
i
(A) = m(AK
i
), then uniqueness for m follows. So we suppose
m(X) < .
Since E must be m
-measurable,
m
(E) = inf
i=1
m(A
i
) : E
i=1
A
i
, A
i
/
0
.
But m = n on /
0
, so
i
m(A
i
) =
i
n(A
i
). Therefore n(E)

i
n(A
i
),
which implies n(E) m
(E).
Since we do not know that n is constructed via an outer measure, we must
use a dierent argument to get the reverse inequality. Let > 0 and choose
13
A
i
/
0
such that m
(E) +

i
m(A
i
) and E
i
A
i
. Let A =
i
A
i
and
B
k
=
k
i=1
A
i
. Observe m
(E) + m
(A), hence m
(AE) < . We have

m
(A) = lim
k
m
(B
k
) = lim
k
n(B
k
) = n(A).
Then
m
(E) m
(A) = n(A) = n(E) +n(AE) n(E) +m(AE) n(E) +.

Since is arbitrary, this completes the proof.
Remarks: (1) Uniqueness implies there is only one possible Lebesgue mea-
sure.
(2) We will use the Caratheodory extension theorem in the study of prod-
uct measures. It is also used in the Riesz representation theorem and in the
Daniell-Kolmogorov extension theorem.
We now dene Lebesgue-Stieltjes measures. Let : R R be nonde-
creasing and right continuous (i.e., (x+) = (x) for all x, where (x+) =
lim
yx,y>x
(y)). Suppose we dene m
((a, b]) = (b) (a), dene

m
i=1
(a
i
, b
i
]) =
i
((b
i
) (a
i
))
when the intervals (a
i
, b
i
] are disjoint, and dene
m
(A) = inf
i
((b
i
) (a
i
)) : A
i
(a
i
, b
i
].
Very much as in the previous section we can show that m
is a measure on
the Borel -algebra.
The m
measure of a point x is (x) (x), where

(x) = lim
yx,y<x
(x).
So m
(x) is equal to the size of the jump (if any) of at x.

Lebesgue measure is the special case of m
when (x) = x.
Given a measure on R such that (K) < whenever K is compact,
dene (x) = ((0, x]) if x 0 and (x) = ((x, 0]) if x < 0. Then is
nondecreasing, right continuous, and it is not hard to see that = m
.
14
5 Measurable functions
Suppose we have a set X together with a -algebra /.
Denition 5.1 f : X R is measurable or /-measurable if x : f(x) >
a / for all a R.
Example 5.2 Suppose f is identically constant. Then x : f(x) > a is
either empty or the whole space, so f is measurable.
Example 5.3 Suppose f(x) = 1 if x A and 0 otherwise. Then x : f(x) >
a is either , A, or X. So f is measurable if and only if A is in the -algebra.
Example 5.4 Suppose X is the real line with the Borel -algebra and
f(x) = x. Then x : f(x) > a = (a, ), and so f is measurable.
Proposition 5.5 The following are equivalent.
(a) x : f(x) > a / for all a;
(b) x : f(x) a / for all a;
(c) x : f(x) < a / for all a;
(d) x : f(x) a / for all a.
Proof. The equivalence of (a) and (b) and of (c) and (d) follow from taking
complements. The remaining equivalences follow from the equations
x : f(x) a =
n=1
x : f(x) > a 1/n,
x : f(x) > a =
n=1
x : f(x) a + 1/n.
Proposition 5.6 If X is a metric space, / contains all the open sets, and
f is continuous, then f is measurable.
15
Proof. x : f(x) > a = f
1
(a, ) is open.
Proposition 5.7 If f and g are measurable, so are f +g, cf, fg, max(f, g),
and min(f, g).
Proof. If f(x) +g(x) < , then f(x) < g(x), and there exists a rational
r such that f(x) < r < g(x). So
x : f(x) + g(x) < =
_
r rational
(x : f(x) < r x : g(x) < r).
f
2
is measurable since x : f(x)
2
> a) = x : f(x) >

a x : f(x) <
a. The measurability of fg follows since fg =

1
2
[(f + g)
2
f
2
g
2
].
x : max(f(x), g(x)) > a = x : f(x) > a x : g(x) > a, and the
argument for min(f, g) is similar.
Proposition 5.8 If f
i
is measurable for each i, then so are sup
i
f
i
, inf
i
f
i
,
limsup
i
f
i
, and liminf
i
f
i
.
Proof. The result will follow for limsup and liminf once we have the result
for the sup and inf by using the denitions. We have x : sup
i
f
i
> a =
i=1
x : f
i
(x) > a, and the proof for inf f
i
is similar.
Denition 5.9 We say f = g almost everywhere, written f = g a.e., if
x : f(x) ,= g(x) has measure zero. Similarly, we say f
i
f a.e., if the set
of x where this fails has measure zero.
We saw in Proposition 5.6 that all continuous functions are Borel measur-
able. The same is true for monotone functions on the real line.
Proposition 5.10 If f : R R is nondecreasing or nonincreasing, then f
is Borel measurable.
16
Proof. Let us suppose f is nondecreasing. The set A
a
= x : f(x) > a
is then either a semi-innite open interval or semi-innite closed interval.
This can be seen by a picture. To be more careful, given a R, let x
0
=
supy : f(y) a. If f(x
0
) = a, then A
a
= (x
0
, ), while if f(x
0
) ,= a, then
A
a
= [x
0
, ). In each case A
a
is a Borel set.
Proposition 5.11 Let X be a space, / a -algebra on X, and f : X R a
/-measurable function. If A is in the Borel -algebra on R, then f
1
(A) /.
Proof. Let B be the Borel -algebra on R and ( = A B : f
1
(A) /.
If A
1
, A
2
, . . . (, then since f
1
(
i
A
i
) =
i
f
1
(A
i
) /, we have that
( is closed under countable unions. Similarly ( is closed under countable
intersections and complements, so ( is a -algebra. Since f is measurable,
( contains (a, ) for every real a, hence ( contains the -algebra generated
by these intervals, that is, ( contains B.
Example 5.12 We want to construct a set that is Lebesgue measurable, but
not Borel measurable. Let F be the Cantor-Lebesgue function of Example
4.1 and dene
f(x) = infy : F(y) x.
Although f is not continuous, observe that f is strictly increasing (hence
one-to-one) and maps [0, 1] into C, the Cantor set. Since f is nondecreasing,
f
1
maps Borel measurable sets to Borel measurable sets.
Let A be the non-measurable set we constructed in Proposition 3.1. Let
B = f(A). Since f(A) C and m(C) = 0, then f(A) is a null set, hence
is Lebesgue measurable. On the other hand, f(A) is not Borel measurable,
because if it were, then A = f
1
(f(A)) would be Borel measurable, a con-
tradiction.
6 Integration
In this section we introduce the Lebesgue integral.
17
Denition 6.1 If E X, dene the characteristic function of E by
E
(x) =
_
1 x E;
0 x / E.
A simple function s is one of the form
s(x) =
n
i=1
a
i
E
i
(x)
for reals a
i
and measurable sets E
i
.
Proposition 6.2 Suppose f 0 is measurable. Then there exists a sequence
of nonnegative measurable simple functions increasing to f.
Proof. Let E
ni
= x : (i 1)/2
n
f(x) < i/2
n
and F
n
= x : f(x) n
for n = 1, 2, . . . , and i = 1, 2, . . . , n2
n
. Then dene
s
n
=
n2
n
i=1
i 1
2
n

E
ni
+ n
Fn
.
It is easy to see that s
n
has the desired properties.
Denition 6.3 If s =
n
i=1
a
i
E
i
is a nonnegative measurable simple func-
tion, dene the Lebesgue integral of s to be
_
s d =
n
i=1
a
i
(E
i
). (6.1)
If f 0 is a measurable function, dene
_
f d = sup
_
_
s d : 0 s f, s simple
_
. (6.2)
18
If f is measurable and at least one of the integrals
_
f
+
d,
_
f
d is nite,
where f
+
= max(f, 0) and f
= min(f, 0), dene

_
f d =
_
f
+
d
_
f
d. (6.3)
Finally, if f = u + iv and
_
([u[ +[v[) d is nite, dene
_
f d =
_
ud + i
_
v d. (6.4)
A few remarks are in order. A function s might be written as a simple
function in more than one way. For example
AB
=
A
+
B
is A and B
are disjoint. It is clear that the denition of
_
s d is unaected by how s
is written. Secondly, if s is a simple function, one has to think a moment to
verify that the denition of
_
s d by means of (6.1) agrees with its denition
by means of (6.2).
Denition 6.4 If
_
[f[ d < , we say f is integrable.
The proof of the next proposition follows from the denitions.
Proposition 6.5 (a) If f is measurable, a f(x) b for all x, and (X) <
, then a(X)
_
f d b(X);
(b) If f(x) g(x) for all x and f and g are measurable and integrable,
then
_
f d
_
g d.
(c) If f is integrable, then
_
cf d = c
_
f d for all real c.
(d) If (A) = 0 and f is measurable, then
_
f
A
d = 0.
The integral
_
f
A
d is often written
_
A
f d. Other notation for the
integral is to omit the if it is clear which measure is being used, to write
_
f(x) (dx), or to write
_
f(x) d(x).
Proposition 6.6 If f is integrable,
_
f

_
[f[.
19
Proof. For the real case, this is easy. f [f[, so
_
f
_
[f[. Also f [f[,
so
_
f
_
[f[. Now combine these two facts.
For the complex case,
_
f is a complex number. If it is 0, the inequality
is trivial. If it is not, then
_
f = re
i
for some r and . Then
_
f
= r = e
i
_
f =
_
e
i
f.
From the denition of
_
f when f is complex, we have Re (
_
f) =
_
Re (f).
Since [
_
f[ is real, we have
_
f
= Re
_
_
e
i
f
_
=
_
Re (e
i
f)
_
[f[.
We do not yet have that
_
(f + g) =
_
f +
_
g.
7 Limit theorems
One of the most important results concerning Lebesgue integration is the
monotone convergence theorem.
Theorem 7.1 Suppose f
n
is a sequence of nonnegative measurable functions
with f
1
(x) f
2
(x) for all x and with lim
n
f
n
(x) = f(x) for all x.
Then
_
f
n
d
_
f d.
Proof. By Proposition 6.5(b),
_
f
n
is an increasing sequence of real numbers.
Let L be the limit. Since f
n
f for all n, then L
_
f. We must show
L
_
f.
Let s =
m
i=1
a
i
E
i
be any nonnegative simple function less than f and
let c (0, 1). Let A
n
= x : f
n
(x) cs(x). Since the f
n
(x) increases to
f(x) for each x and c < 1, then A
1
A
2
, and the union of the A
n
is
20
all of X. For each n,
_
f
n

_
An
f
n
c
_
An
s
n
= c
_
An
m
i=1
a
i
E
i
= c
m
i=1
a
i
(E
i
A
n
).
If we let n , by Proposition 2.5(c), the right hand side converges to
c
m
i=1
a
i
(E
i
) = c
_
s.
Therefore L c
_
s. Since c is arbitrary in the interval (0, 1), then L
_
s.
Taking the supremum over all simple s f, we obtain L
_
f.
Example 7.2 Let X = [0, ) and f
n
(x) = 1/n for all x. Then
_
f
n
=
, but f
n
f where f = 0 and
_
f = 0. The problem here is that the f
n
are not nonnegative.
Example 7.3 Suppose f
n
= n
(0,1/n)
. Then f
n
0, f
n
0 for each x, but
_
f
n
= 1 does not converge to
_
0 = 0. The trouble here is that the f
n
do
not increase for each x.
Once we have the monotone convergence theorem, we can prove that the
Lebesgue integral is linear.
Theorem 7.4 If f
1
and f
2
are integrable, then
_
(f
1
+ f
2
) =
_
f
1
+
_
f
2
.
Proof. First suppose f
1
and f
2
are nonnegative and simple. Then it is clear
from the denition that the theorem holds in this case. Next suppose f
1
21
and f
2
are nonnegative. Take s
n
simple and increasing to f
1
and t
n
simple
and increasing to f
2
. Then s
n
+ t
n
increases to f
1
+ f
2
, so the result follows
from the monotone convergence theorem and the result for simple functions.
Finally in the general case, write f
1
= f
+
1
f
1
and similarly for f
2
, and use
the denitions and the result for nonnegative functions.
Proposition 7.5 Suppose f
n
are nonnegative measurable functions. Then
_

n=1
f
n
=
n=1
_
f
n
.
Proof. Let F
N
=
N
n=1
f
n
and write
_

n=1
f
n
=
_
lim
N
N
n=1
f
n
=
_
lim
N
F
N
= lim
N
_
F
N
(7.1)
= lim
N
N
n=1
_
f
n
=
n=1
_
f
n
,
using the monotone convergence theorem and the linearity of the integral.
The next theorem is known as Fatous lemma.
Theorem 7.6 Suppose the f
n
are nonnegative and measurable. Then
_
liminf
n
f
n
liminf
n
_
f
n
.
Proof. Let g
n
= inf
in
f
i
. Then g
n
are nonnegative and g
n
increases to
liminf f
n
. Clearly g
n
f
i
for each i n, so
_
g
n

_
f
i
. Therefore
_
g
n
inf
in
_
f
i
.
22
If we take the limit as n , on the left hand side we obtain
_
liminf f
n
by
the monotone convergence theorem, while on the right hand side we obtain
liminf
n
_
f
n
.
A typical use of Fatous lemma is the following. Suppose we have f
n
f
and sup
n
_
[f
n
[ K < . Then [f
n
[ [f[, and by Fatous lemma,
_
[f[
K.
Another very important theorem is the dominated convergence theorem.
Theorem 7.7 Suppose f
n
are measurable functions and f
n
(x) f(x). Sup-
pose there exists an integrable function g such that [f
n
(x)[ g(x) for all x.
Then
_
f
n
d
_
f d.
Proof. Since f
n
+ g 0, by Fatous lemma,
_
(f + g) liminf
_
(f
n
+ g).
Since g is integrable,
_
f liminf
_
f
n
.
Similarly, g f
n
0, so
_
(g f) liminf
_
(g f
n
),
and hence
_
f liminf
_
(f
n
) = limsup
_
f
n
.
Therefore _
f limsup
_
f
n
,
which with the above proves the theorem.
Example 7.3 is an example where the limit of the integrals is not the
integral of the limit because there is no dominating function g.
23
If in the monotone convergence theorem or dominated convergence the-
orem we have only f
n
(x) f(x) almost everywhere, the conclusion still
holds. For if the f
n
and f are measurable and A = x : f
n
(x) f(x), then
f
A
f
A
for each x. And since A
c
has measure 0, we see from Proposition
6.5(d) that
_
f
A
=
_
f, and similarly with f replaced by f
n
.
8 Properties of Lebesgue integrals
Later on we will need the following two propositions.
Proposition 8.1 Suppose f is measurable and for every measurable set A
we have
_
A
f d = 0. Then f = 0 almost everywhere.
Proof. Let A = x : f(x) > . Then
0 =
_
A
f
_
A
= (A)
since f
A

A
. Hence (A) = 0. We use this argument for = 1/n and
n = 1, 2, . . . , so x : f(x) > 0 = 0. Similarly x : f(x) < 0 = 0.
Proposition 8.2 Suppose f is measurable and nonnegative and
_
f d = 0.
Then f = 0 almost everywhere.
Proof. If f is not almost everywhere equal to 0, there exists an n such that
(A
n
) > 0 where A
n
= x : f(x) > 1/n. But then since f is nonnegative,
0 =
_
f
_
An
f
1
n
(A
n
),
a contradiction.
We give a result on approximating a function on R by continuous functions.
24
Proposition 8.3 Suppose f is a measurable function from R to R that is
integrable. Let > 0. Then there exists a continuous function that is 0
outside some bounded interval such that
_
[f g[ < .
Proof. If we have continuous functions g
1
, g
2
such that
_
[f
+
g
1
[ < /2
and
_
[f
g
2
[ < /2, where f
+
= max(f, 0) and f
= max(f, 0), then

taking g = g
1
g
2
will prove our result. So without loss of generality, we
may assume f 0.
By monotone convergence
_
f
[n,n]
increases to
_
f, so by taking n large
enough, the dierence of the integrals will be less than /2. If we nd g such
that
_
[f
[n,n]
g[ < /2, then
_
[f g[ < . Therefore we may assume
that f is 0 outside some bounded interval.
We can nd simple functions increasing to f whose integrals increase to
_
f. Let s
m
be a simple function such that s
m
f and
_
s
m

_
f /2.
If we nd g such that
_
[s
m
g[ < /2, then
_
[f g[ < . So it suces to
consider the case where f is a simple function.
If f =
p
i=1
a
i
A
i
and we nd g
i
continuous such that
_
[a
i
A
i
g
i
[ < /p,
then
p
i=1
g
i
will be the desired function. So we may assume f is a constant
times a characteristic function, and by linearity, we may assume f is equal
to
A
for some A contained in a bounded interval [n, n].
We can choose G open and F closed such that F A G and m(GF) <
. We can replace G by G (n 1, n + 1). G
c
[n 1, n + 1] and F
are compact sets, so there is a minimum distance between them, say, . Let
g(x) = max(0, 1 dist (x, F)/). Then g is continuous, 0 g 1, g is 1 on
F, g is 0 on G
c
, and g is 0 outside of [n 1, n + 1]. Therefore
[g
A
[
G
F
,
so _
[g
A
[
_
(
G
F
) = m(GF) < .
The method of proof, where one proves a result for characteristic func-
tions, then simple functions, then non-negative functions, and then nally
integrable functions is very common.
25
We nish this section with a comparison of the Lebesgue integral and
the Riemann integral. Here we are only looking at bounded functions from
[a, b] into R. If we are looking at the Lebesgue integral, we write
_
f, while,
temporarily, if we are looking at the Riemann integral, we write R(f). Recall
that the Riemann integral on [a, b] is dened as follows: if P is a partition of
[a, b], then
U(P, f) =
n
i=1
( sup
x
i=1
xx
f(x)) (x
i
x
i1
)
and
L(P, f) =
n
i=1
( inf
x
i=1
xx
f(x)) (x
i
x
i1
).
Set R(f) = infU(P, f) : P is a partition similarly R(f). Then the Rie-
mann integral exists if R(f) = R(f), and the common value is the Riemann
integral, which we denote R(f).
Theorem 8.4 A bounded measurable function f on [a, b] is Riemann inte-
grable if and only if the set of points at which f is discontinuous has Lebesgue
measure 0, and in that case, the Riemann integral is equal in value to the
Lebesgue integral.
Proof. If P is a partition, dene
T
P
(x) =
n
i=1
( sup
x
i1
yx
i
f(y))
[x
i1
,x
i
)
(x),
and
S
P
(x) =
n
i=1
( inf
x
i1
yx
i
f(y))
[x
i1
,x
i
)
(x).
We see that
_
T
P
= U(P, f) and
_
S
P
= L(P, f).
If f is Riemann integrable, there exists a sequence of partitions Q
i
such
that U(Q
i
, f) R(f) and a sequence Q
i
such that L(Q
i
, f) R(f). It is not
hard to check that adding points to a partition increases L and decreases U,
so if we let P
i
=
ji
(Q
j
Q
j
), then P
i
is an increasing sequence of partitions,
U(P
i
, f) R(f), L(P
i
, f) R(f). We see also that T
P
i
(x) decreases at each
point, say, to T(x), and S
P
i
(x) increases at each point, say, to S(x). Also
26
T(x) f(x) S(x). Then by dominated convergence (recall that f is
bounded)
_
(T S) = lim
i
_
(T
P
i
S
P
i
) = lim
i
(U(P
i
, f) L(P
i
, f)) = 0.
We conclude T = S = f a.e. If x is not in the null set where T(x) ,= S(x) nor
in
i
P
i
, which is countable and hence of Lebesgue measure 0, then T
P
i
(x)
f(x) and S
P
i
(x) f(x). This implies that f is continuous at such f. Since
R(f) = lim
i
U(P
i
, f) = lim
i
_
T
P
i
=
_
f,
we see the Riemann integral and Lebesgue integral agree.
Now suppose that f is continuous a.e. Let > 0. Let P
i
be the partition
where we divide [a, b] into 2
i
equal parts. If x is not in the null set where
f is discontinuous, nor in
i=1
P
i
, then T
P
i
(x) f(x) and S
P
i
(x) f(x). By
dominated convergence,
U(P
i
, f) =
_
T
P
i

_
f
and
L(P
i
, f) =
_
S
P
i

_
f.
This does it.
9 Modes of convergence
Denition 9.1 If is a measure, we say a sequence of measurable functions
f
n
converges to f almost everywhere (written f
n
f a.e.) if there is a set
of measure 0 and for x not in this set we have f
n
(x) f(x).
We say f
n
converges to f in measure if for each > 0
(x : [f
n
(x) f(x)[ > ) 0
as n .
27
Proposition 9.2 Suppose is a nite measure.
(a) If f
n
f, a.e., then f
n
converges to f in measure.
(b) If f
n
f in measure, there is a subsequence n
j
such that f
n
j
f,
a.e.
Proof. Let > 0. If A
n
= x : [f
n
(x) f(x)[ > , then
An
0 a.e., and
by dominated convergence,
(A
n
) =
_

An
(x) (dx) 0.
This proves (a).
To prove (b), let n
1
= 1 and choose n
j
> n
j1
inductively so that
(x : [f
n
j
(x) f(x)[ > 1/j) 2
j
.
Let A
j
= x : [f
n
j
(x) f(x)[ > 1/j. Then (A
j
) 2
j
, and
A =
k=1
j=k
A
j
has measure less than
j=k
A
j
for every k, hence less than 2
k+1
for every
k. Therefore A has measure 0. If x / A, then x /
j=k
A
j
for some k, so
[f
n
j
(x) f(x)[ 1/j for j k, which means f
n
j
f a.e. on A
c
.
Example 9.3 Part (a) of the above proposition is not true if (X) = .
Let X = R and let f
n
=
(n,n+1)
.
Example 9.4 For an example where f
n
f in measure but not almost
everywhere, let X = [0, 1], let be Lebesgue measure, and let f
n
(x) =
Fn
(x), where F
n
= y : (
n
j=1
1/j)( mod 1) y (
n+1
j=1
1/j)( mod 1).
z( mod 1) is dened as the fractional part of z (where the largest integer less
than z is subtracted from z). Let f(x) = 0 for all x.
Then (F
n
) 1/n 0, so f
n
f in measure. But any x will be in
innitely many F
n
s, so f
n
does not converge to f(x) at any point.
The following is known as Egoros theorem.
28
Theorem 9.5 If is a nite measure, > 0, and f
n
f a.e., then there
exists a measurable set A such that (A) < and f
n
f uniformly on A
c
.
This type of convergence is sometimes known as almost uniform conver-
gence.
Proof. Let
E
nk
=
m=n
x : [f
m
(x) f(x)[ > 1/k.
for xed k, E
nk
decreases as n increases, and the intersection
n
E
nk
has
measure 0. So (E
nk
) 0. Then there exists an integer n
k
such that
(E
n
k
k
) < 2
k
. Let E =
k=1
E
n
k
k
. Then (E) < , and if x / E and
n > n
k
, then [f
n
(x) f(x)[ 1/k. Thus f
n
f uniformly on E
c
.
10 Product measures
If A
1
A
2
and A =
i=1
A
i
, we write A
i
A. If A
1
A
2
and
A =
i=1
A
i
, we write A
i
A.
Denition 10.1 / is a monotone class is / is a collection of subsets of
X such that
(a) if A
i
A and each A
i
/, then A /;
(b) if A
i
A and each A
i
/, then A /.
The intersection of monotone classes is a monotone class, and the inter-
section of all monotone classes containing a given collection of sets is the
smallest monotone class containing that collection.
The next theorem, the monotone class lemma, is rather technical, but very
useful.
Theorem 10.2 Suppose /
0
is a algebra, / is the smallest -algebra con-
taining /
0
, and / is the smallest monotone class containing /
0
. Then
/ = /.
29
Proof. A -algebra is clearly a monotone class, so / /. We must show
/ /.
Let ^
1
= A / : A
c
/. Note ^
1
is contained in /, contains /
0
,
and is a monotone class. So ^
1
= /, and therefore / is closed under the
operation of taking complements.
Let ^
2
= A / : A B / for all B /
0
. ^
2
is contained in /;
^
2
contains /
0
because /
0
is an algebra; ^
2
is a monotone class because
(
i=1
A
i
) B =
i=1
(A
i
B), and similarly for intersections. Therefore
^
2
= /; in other words, if B /
0
and A /, then A B /.
Let ^
3
= A / : A B / for all B /. As in the preceding
paragraph, ^
3
is a monotone class contained in /. By the last sentence of
the preceding paragraph, ^
3
contains /
0
. Hence ^
3
= /.
We thus have that / is a monotone class closed under the operations of
taking complements and taking intersections. This shows / is a -algebra,
and so / /.
Suppose (X, /, ) and (Y, B, ) are two measure spaces, i.e., / and B are
-algebras on X and Y , resp., and and are measures on / and B, resp.
A rectangle is a set of the form A B, where A / and B B. Dene a
set function on rectangles by
(A B) = (A)(B).
Lemma 10.3 Suppose AB =
i=1
A
i
B
i
, where A, A
i
/ and B, B
i
B
and the A
i
B
i
are disjoint. Then
(A B) =
i=1
(A
i
B
i
).
Proof. We have
AB
(x, y) =
i=1
A
i
B
i
(x, y),
and so
A
(x)
B
(y) =
i=1
A
i
(x)
B
i
(y).
30
Holding x xed and integrating over y with respect to , we have, using (7.1),
A
(x)(B) =
i=1
A
i
(x)(B
i
).
Now use (7.1) again and integrate over x with respect to to obtain the
result.
Let (
0
= nite unions of rectangles. It is clear that (
0
is an algebra. By
Lemma 10.3 and linearity, we see that is a measure on (
0
. Let /B
be the smallest -algebra containing (
0
; this is called the product -algebra.
By the Caratheodory extension theorem, can be extended to a measure
on /B.
We will need the following observation. Suppose a measure is -nite.
So there exist E
i
which have nite measure and whose union is X. If we
let F
n
=
n
i=1
E
i
, then F
i
X and (F
n
) is nite for each n.
If and are both -nite, say with F
i
X and G
i
Y , then will
be -nite, using the sets F
i
G
i
.
The main result of this section is Fubinis theorem, which allows one to
interchange the order of integration.
Theorem 10.4 Suppose f : XY R is measurable with respect to /B.
If f is nonnegative or
_
[f(x, y)[ d( )(x, y) < , then
(a) the function g(x) =
_
f(x, y)(dy) is measurable with respect to /;
(b) the function h(y) =
_
f(x, y)(dx) is measurable with respect to B;
(c) we have
_
f(x, y) d( )(x, y) =
_
_
_
f(x, y) d(x)
_
d(y)
=
_
_
_
f(x, y) d(y)
_
(dx).
31
Proof. First suppose and are nite measures. If f is the characteristic
function of a rectangle, then (a)(c) are obvious. By linearity, (a)(c) hold
if f is the characteristic function of a set in (
0
, the set of nite unions of
rectangles.
Let / be the collection of sets C such that (a)(c) hold for
C
. If C
i
C
and C
i
/, then (c) holds for
C
by monotone convergence. If C
i
C,
then (c) holds for
C
by dominated convergence. (a) and (b) are easy. So
/ is a monotone class containing /
0
, so / = /B.
If and are -nite, applying monotone convergence to C (F
n
G
n
)
for suitable F
n
and G
n
and monotone convergence, we see that (a)(c) holds
for the characteristic functions of sets in /B in this case as well.
By linearity, (a)(c) hold for nonnegative simple functions. By monotone
convergence, (a)(c) hold for nonnegative functions. In the case
_
[f[ < ,
writing f = f
+
f
and using linearity proves (a)(c) for this case, too.

11 The Radon-Nikodym theorem
Suppose f is nonnegative, measurable, and integrable with respect to . If
we dene by
(A) =
_
A
f d, (11.1)
then is a measure. The only part that needs thought is the countable ad-
ditivity, and this follows from (7.1) applied to the functions f
A
i
. Moreover,
(A) is zero whenever (A) is. We sometimes write f = d/d for (11.1).
Denition 11.1 A measure is called absolutely continuous with respect
to a measure if (A) = 0 whenever (A) = 0. This is frequently written
.
Proposition 11.2 A nite measure is absolutely continuous with respect
to if and only if for all there exists such that (A) < implies (A) < .
32
Proof. If the condition given in the statement of the proposition holds, it is
clear that . Suppose now that . If the condition does not hold,
there exists E
k
such that (E
k
) < 2
k
but (E
k
) . Let F =
n=1
k=n
E
k
.
Then
(F) = lim
n
(
k=n
E
k
) lim
n
k=n
2
k
= 0,
but
(F) = lim
n
(
k=n
E
k
) ;
This contradicts the absolute continuity.
Denition 11.3 A function : / (, ] is called a signed measure if
() = 0 and (
i=1
A
i
) =
i=1
(A
i
) whenever the A
i
are disjoint and all
the A
i
are in /.
Denition 11.4 Let be a signed measure. A set A / is called a positive
set for if (B) 0 whenever B A and B /. We dene a negative
set similarly. A null set A is one where (B) = 0 whenever B A is
measurable.
Example 11.5 Suppose m is Lebesgue measure and (A) =
_
A
f dm for
some integrable f. If we let P = x : f(x) 0, then P is easily seen to be a
positive set, and if N = x : f(x) < 0, then N is a negative one. The Hahn
decomposition which we give below is a decomposition of our space (in this
case R) into positive and negative sets. This decomposition is unique, except
that C = x : f(x) = 0 could be included in N instead of P, or apportioned
partially to P and partially to N. Note, however, that C is a null set. The
Jordan decomposition below is a decomposition of into
+
and
, where
+
(A) =
_
A
f
+
dm, and similarly
(A) =
_
A
f
d.
Note that if is a signed measure, then (
i=1
A
i
) = lim
n
(
n
i=1
A
i
).
The proof is the same as in the case of positive measures.
Proposition 11.6 Let be a signed measure taking values in (, ]. Let
E be measurable with (E) < 0. Then there exists a subset F of E that is a
negative set with (F) < 0.
33
Proof. If E is a negative set, we are done. If not, there exists a subset with
positive measure. Let n
1
be the smallest positive integer such that there
exists E
1
E with (E
1
) 1/n. Let k 2. If F
k
= E (E
1
E
k1
)
is negative, we are done. If not, let n
k
be the smallest positive integer such
that there exists E
k
F
k
with (E
k
) 1/n
k
. We continue.
If the construction stops after a nite number of sets, we are done. If not,
let F =
k
F
k
= E (
k
E
k
). Since 0 > (E) > and (E
k
) 0, then
(E) = (F) +
k=1
(E
k
).
Then (F) (E) < 0, so the sum converges. If G F is measurable with
(G) > 0, then (G) 1/N for some N, which contradicts the construction.
Therefore F must be a negative set.
We write AB for (A B) (B A). The following is known as the
Hahn decomposition theorem.
Theorem 11.7 Let be a signed measure taking values in (, ]. There
exist sets E and F in / that are disjoint whose union is X and such that E
is a negative set and F is a positive set. If E
and F
are another such pair,

then EE
= FF
is a null set with respect to ..

Proof. Let L = inf(A) : A is a negative set. Choose negative sets A
n
such that (A
n
) L. Let E =
n=1
A
n
. Let B
n
= A
n
(B
1
B
n1
) for
each n. Since A
n
is a negative set, so is each B
n
. Also, the B
n
are disjoint.
If C E, then
(C) = lim
n
(C (
n
i=1
B
i
)) = lim
n
n
i=1
(C B
i
) 0.
So E is a negative set.
Since E is negative,
(E) = (A
n
) + (E A
n
) (A
n
).
Letting n , we obtain (E) = L.
34
Let F = E
c
. If F were not a positive set, there would exist B F with
(B) < 0. By Proposition 11.6 there exists a negative set C contained in B
with (C) < 0. But then E C would be a negative set with (E C) <
(E) = L, a contradiction.
To prove uniqueness, if E
, F
are another such pair of sets and A

E E
E, then (A) 0. But A E E
= F
F F
, so (A) 0.
Therefore (A) = 0. The same argument works if A E
E, and any
subset of EE
can be written as the union of A

1
and A
2
, where A
E E
and A
2
E
E.
Let us say two measures and are mutually singular if there exist two
disjoint sets E and F in / whose union is X with (E) = (F) = 0. This is
often written .
Example 11.8 If is Lebesgue measure restricted to [0, 1/2], that is, (A)
= m(A [0, 1/2]), and is Lebesgue measure restricted to [1/2, 1], then
and are mutually singular. We let E = [0, 1/2] and F = (1/2, 1]. This
example works because the Lebesgue measure of 1/2 is 0.
Example 11.9 A more interesting example is the following. Let f be the
Cantor-Lebesgue function and let be the Lebesgue-Stieltjes measure asso-
ciated with f. Let be Lebesgue measure restricted to [0, 1]. Then .
To see this, we let E = C, where C is the Cantor set, and F = [0, 1] C. We
already know that m(E) = 0 and we need to show (F) = 0. To do that,
we need to show (I) = 0 for every open interval contained in F. This will
follow if we show (J) = 0 for every interval of the form J = (a, b] contained
in F. But f is constant on every such interval, so f(b) = f(a), and therefore
(J) = f(b) f(a) = 0.
The following is known as the Jordan decomposition theorem.
Theorem 11.10 If is a signed measure, there exist measures
+
and
such that =
+

and
+
and
are mutually singular. This decom-

position is unique.
Proof. Let E and F be positive and negative sets for and let
+
(A) =
(E A),
(A) = (A F). This gives the desired decomposition.

35
If =
+

is another such decomposition with

+
,
mutually sin-
gular, let E
and F
be the sets in the denition of mutually singular. Then

X = E
gives another Hahn decomposition, hence EE
is a null set
with respect to . Then for any A /,
+
(A) = (A E
) = (A E) =
+
(A),
and similarly for
.
The measure
+
+
is called the total variation measure and is written

[[.
We now are ready for the Radon-Nikodym theorem.
Theorem 11.11 Suppose is a -nite measure and is a nite measure
such that is absolutely continuous with respect to . There exists a -
integrable nonnegative function f such that (A) =
_
A
f d for all A /.
Moreover, if g is another such function, then f = g almost everywhere with
respect to .
Proof. Let us rst prove the uniqueness assertion. For every set A we have
_
A
(f g) d = (A) (A) = 0.
By Proposition 8.1 we have f g = 0 a.e. with respect to .
Since is -nite, there exist F
i
X such that (F
i
) < for each i. Let
i
be the restriction of to F
i
, that is,
i
(A) = (A F
i
). Dene
i
, the
restriction of to F
i
, similarly. If f
i
is a function such that
i
(A) =
_
A
f
i
d
i
for all A, the argument of the rst paragraph shows that f
i
= f
j
on F
i
if
i j. If we dene f by f(x) = f
i
(x) if x F
i
, we see that f will be the
desired function. So it suces to restrict attention to the case where is
nite.
Let
T =
_
g : 0 g,
_
A
g d (A) for all A /
_
.
T is not empty because 0 T. Let L = sup
_
g d : g T, and let g
n
be
a sequence in T such that
_
g
n
d L. Let h
n
= max(g
1
, . . . , g
n
).
36
If g
1
and g
2
are in T, then h
2
= max(g
1
, g
2
) is also in T. To see this, let
B = x : g
1
(x) g
2
(x), and write
_
A
h
2
d =
_
AB
h
2
d +
_
AB
c
h
2
d
=
_
AB
g
1
d +
_
AB
c
g
2
d
(A B) + (A B
c
)
= (A).
By an induction argument, h
n
is in T.
The h
n
increase, say to f. By monotone convergence
_
f d = L and
_
A
f d (A) (11.2)
for all A.
Let A be a set where there is strict inequality in (11.2); let be chosen
suciently small so that if is dened by
(B) = (B)
_
B
f d (B),
then (A) > 0. is a signed measure; let F be the positive set as constructed
in Theorem 11.7. In particular, (F) > 0. So for every B
_
BF
f d + (B F) (B F).
We then have, using (11.2), that
_
B
(f +
F
) d =
_
B
f d + (B F)
=
_
BF
c
f d +
_
BF
f d + (B F)
(B F
c
) + (B F) = (B).
This says that f +
F
T. However,
L
_
(f +
F
) d =
_
f d + (F) = L + (F),
37
which implies (F) = 0. But then (F) = 0, and hence (F) = 0, contra-
dicting the fact that F is a positive set for F with (F) > 0.
The proof of the Lebesgue decomposition theorem is almost the same.
Theorem 11.12 Suppose and are two nite measures. There exist mea-
sures , such that = + , is absolutely continuous with respect to ,
and and are mutually singular.
Proof. Dene T and L and construct f as in the proof of the Radon-
Nikodym theorem. Let (A) =
_
A
f d and let = . We have
_
A
f d
(A), so (A) 0 for all A. To keep things straight, we record that we have
f = d/d and + = . We need to show and are mutually singular.
Suppose not. Then there exists F / with (F) > 0 and (F) > 0, and
so (F) (F) > 0. Note that for small enough, ( )(F) > 0. We
claim that there exist > 0 and E F such that E is a positive set with
respect to and (E) > 0. Given the claim, if A /,
_
A
E
d = (A E) (A E)
(A E) (A)
= (A)
_
A
f d.
This says that
_
A
(f +
E
) d (A)
for all A /, or f +
E
T. But
_
(f +
E
) d =
_
f d + (E) > L,
a contradiction to the denition of L.
It remains to prove the claim. Let F = P
n
N
n
be a Hahn decomposition
for the measure
1
n
restricted to F, let P = P
n
, and N = N
n
= F P.
Then N is a negative set for
1
n
for each n, or 0 (N)
1
n
(N); this
38
implies (N) = 0. If (P) = 0, then , and we are supposing that is
not the case. Therefore (P) > 0, hence (P
n
) > 0 for some n, and P
n
is a
positive set for
1
n
. Now take = 1/n and E = P
n
.
12 Dierentiation of real-valued functions
In this section we want to look at when f : R R is dierentiable and when
the fundamental theorem of calculus holds. Briey,
(1) Functions of bounded variation are dierentiable;
(2) The derivative of
_
x
a
f(y) dy is equal to f a.e. if f is integrable;
(3)
_
b
a
f
(y) dy = f(b) f(a) if f
is absolutely continuous.
Let E R be a measurable set and let O be a collection of intervals. We
say O is a Vitali cover of E if for each x E and each > 0 there exists
an interval G O containing x whose length is less than . m will denote
Lebesgue measure.
Lemma 12.1 Let E have nite measure and let O be a Vitali cover of E.
Given > 0 there exists a nite subcollection of disjoint intervals I
1
, . . . , I
n
such that m(E
n
i=1
I
n
) < .
Proof. We may replace each interval in O by a closed one, since the set of
endpoints of a nite subcollection will have measure 0.
Let O be an open set of nite measure containing E. Since O is a Vi-
tali cover, we may suppose without loss of generality that each set of O is
contained in O. Let a
0
= supm(I) : I O. Let I
1
be any element of
O with m(I
1
) a
0
/2. Let a
1
= supm(I) : I O, I disjoint from I
1
,and
choose I
2
O disjoint from I
1
such that m(I
2
) a
1
/2. Continue in this way,
choosing I
n+1
disjoint from I
1
, . . . , I
n
and in O with length at least one half
as large as any other such interval in O that is disjoint from I
1
, . . . , I
n
.
If the process stops at some nite stage, we are done. If not, we gener-
ate a sequence of disjoint intervals I
1
, I
2
, . . . Since they are disjoint and all
39
contained in O, then
i=1
m(I
i
) m(O) < . So there exists N such that
i=N+1
m(I
i
) < /5.
Let R = E
N
i=1
I
i
; we will show m(R) < . Let J
n
be the interval
with the same center as I
n
but ve times the length. Let x R. There
exists an interval I O containing x with I disjoint from I
1
, . . . , I
N
. Since
m(I
n
) < , then
a
n
2
m(I
n
) < , and a
n
0. So I must either
be one of the I
n
for some n > N or at least intersect it, for otherwise we
would have chosen I at some stage. Let n be the smallest integer such that
I intersects I
n
; note n > N. We have m(I) a
n1
2m(I
n
). Since x is in
I and I intersects I
n
, the distance from x to the midpoint of I
n
is at most
m(I) + m(I
n
)/2 (5/2)m(I
n
). Therefore x J
n
.
Then R
i=N+1
J
n
, so m(R)

i=N+1
m(J
n
) = 5
i=N+1
m(I
n
) < .
Given a function f, we dene the derivates of f at x by
D
+
f(x) = limsup
h0+
f(x + h) f(x)
h
, D
f(x) = limsup
h0
f(x + h) f(x)
h
D
+
f(x) = liminf
h0+
f(x + h) f(x)
h
, D
f(x) = liminf
h0
f(x + h) f(x)
h
.
If all the derivates are equal, we say that f is dierentiable at x and dene
f
(x) to be the common value.

Theorem 12.2 Suppose f is nondecreasing on [a, b]. Then f is dierentiable
almost everywhere, f
is integrable, and
_
b
a
f
(x) dx f(b) f(a).

Proof. We will show that the set where any two derivates are unequal has
measure zero. We consider the set E where D
+
f(x) > D
f(X), the other

sets being similar. Let E
u,v
= x : D
+
f(x) > u > v > D
f(x). If we show
m(E
u,v
) = 0, then taking the union of all pairs of rationals with u > v shows
m(E) = 0.
Let s = m(E
u,v
), let > 0, and choose an open set O such that E
u,v
O
and m(O) < s+. For each x E
u,v
there exists an arbitrarily small interval
[x h, x] contained in O such that f(x) f(x h) < vh. Use Lemma 12.1
to choose I
1
, . . . , I
n
which are disjoint and whose interiors cover a subset A
40
of E
u,v
of measure greater than s . Suppose I
n
= [x
n
h
n
, x
n
]. Summing
over these intervals,
N
n=1
[f(x
n
) f(x
n
h
n
)] < v
n
n=1
h
n
< vm(O) < v(s + ).
Each point y A is the left endpoint of an arbitrarily small interval
(y, y + k) that is contained in some I
n
and for which f(y + k) f(y) > uk.
Using Lemma 12.1 again, we pick out a nite collection J
1
, . . . , J
M
whose
union contains a subset of A of measure larger than s 2. Summing over
these intervals yields
M
i=1
[f(y
i
+ k
i
) f(y
i
)] > u
k
i
> u(s 2).
Each interval J
i
is contained in some interval I
n
, and if we sum over those i
for which J
i
I
n
we nd
[f(y
i
+ k
i
) f(y
i
)] f(x
n
) f(x
n
h
n
),
since f is increasing. Thus
N
n=1
[f(x
n
) f(x
n
h
n
)]
M
i=1
[f(y
i
+ k
i
) f(y
i
)],
and so v(s +) > u(s 2). This is true for each , so vs us. Since u > v,
this implies s = 0.
This shows that
g(x) = lim
h0
f(x + h) f(x)
h
is dened almost everywhere and that f is dierentiable wherever g is nite.
Dene f(x) = f(b) if x b. Let g
n
(x) = n[f(x + 1/n) f(x)]. Then
g
n
(x) g(x) for almost all x, and so g is measurable. Since f is increasing,
g
n
0. By Fatous lemma
_
b
a
g liminf
_
b
a
g
n
= liminf n
_
b
a
[f(x + 1/n) f(x)]dx
= liminf
_
n
_
b+1/n
b
f n
_
a+1/n
a
f
_
= liminf
_
f(b) n
_
a+1/n
a
f
_
f(b) f(a).
41
For the last inequality, we use the fact that f is increasing. This shows that
g is integrable and hence nite almost everywhere.
A function is of bounded variation if sup
k
i=1
[f(x
i
) f(x
i1
)[ is nite,
where the supremum is over all partitions a = x
0
< x
1
< < x
k
= b of
[a, b].
Lemma 12.3 If f is of bounded variation on [a, b], then f can be written as
the dierence of two nondecreasing functions on [a, b].
Proof. Dene
P(y) = sup
_
k
i=1
[f(x
i
)f(x
i1
)]
+
_
, N(y) = sup
_
k
i=1
[f(x
i
)f(x
i1
)]
_
,
where the supremum is over all partitions a = x
0
< x
1
< < x
k
= y for
y [a, b]. P and N are measurable since they are both increasing. Since
k
i=1
[f(x
i
) f(x
i1
)]
+
=
k
i=1
[f(x
i
) f(x
i1
)]
+ f(y) f(a),
taking the supremum over all partitions of [a, y] yields
P(y) = N(y) + f(y) f(a).
Clearly P and N are nondecreasing in y, and the result follows by solving
for f(y).
From this lemma, we see that functions of bounded variation are dif-
ferentiable a.e. But the function sin(1/x) dened on (0, 1] is dierentiable
everywhere, but is not of bounded variation.
Next we look at when the derivative of
_
x
a
f(t) dt is equal to f(x) a.e.
Dene the indenite integral of an integrable function f by
F(x) =
_
x
a
f(t) dt.
42
Lemma 12.4 If f is integrable, then F is continuous and of bounded vari-
ation.
Proof. The continuity follows from the dominated convergence theorem.
The bounded variation follows from
k
i=1
[F(x
i
) F(x
i1
)[ =
k
i=1
_
x
i
x
i1
f(t) dt
i=1
_
x
i
x
i1
[f(t)[ dt
_
b
a
[f(t)[ dt
for all partitions.
Lemma 12.5 If f is integrable and F(x) = 0 for all x, then f = 0 a.e.
Proof. For any interval,
_
d
c
f =
_
d
a
f
_
c
a
f = 0. By dominated conver-
gence and the fact that any open set is the countable union of disjoint open
intervals,
_
O
f = 0 for any open set O.
If E is any measurable set, take O
n
open that such that
On
decreases to
E
a.e. By dominated convergence,
_
E
f =
_
f
E
= lim
_
f
On
= lim
_
On
f = 0.
This with Proposition 8.1 implies f is zero a.e.
Proposition 12.6 If f is bounded and measurable, then F
(x) = f(x) for

almost every x.
Proof. By Lemma 12.4, F is continuous and of bounded variation, and so
F
exists a.e. Let K be a bound for [f[. If

f
n
(x) =
F(x + 1/n) F(x)
1/n
,
then
f
n
(x) = n
_
x+1/n
x
f(t) dt,
43
so [f
n
[ is also bounded by K. Since f
n
F
a.e., then by dominated con-

vergence,
_
c
a
F
(x) dx = lim
_
c
a
f
n
(x) dx = limn
_
c
a
[F(x + 1/n) F(x)] dx
= lim
_
n
_
c+1/n
c
F(x) dx n
_
a+c
a
F(x) dx
_
= F(c) F(a) =
_
c
a
f(x) dx,
using the fact that F is continuous. So
_
c
a
[F
(x) f(x)] dx = 0 for all c,

which implies F
= f a.e. by Lemma 12.5.

Theorem 12.7 If f is integrable, then F
= f almost everywhere.
Proof. Without loss of generality we may assume f 0. Let f
n
(x) = f(x)
if f(x) n and let f
n
(x) = n if f(x) > n. Then f f
n
0. If G
n
(x) =
_
x
a
[f f
n
], then G
n
is nondecreasing, and hence has a derivative almost
everywhere. By Proposition 12.6, we know the derivative of
_
x
a
f
n
is equal to
f
n
almost everywhere. Therefore
F
(x) = G
n
(x) +
_
_
x
a
f
n
_
f
n
(x)
a.e. Since n is arbitrary, F
f a.e. So
_
b
a
F

_
b
a
f = F(b) F(a). On
the other hand, by Theorem 12.2,
_
b
a
F
(x) dx F(b) F(a) =

_
b
a
f. We
conclude that
_
b
a
[F
f] = 0; since F
f 0, this tells us that F
= f a.e.
Finally, we look at when
_
b
a
F
(y) dy = F(b) F(a).

A function is absolutely continuous on [a, b] if given there exists such
that
k
i=1
[f(x
i
) f(x
i
)[ < whenever x
i
, x
i
) is a nite collection of
nonoverlapping intervals with
k
i=1
[x
i
x
i
[ < .
It is easy to see that absolutely continuous functions are continuous and
that the Cantor-Lebesgue function is not absolutely continuous.
44
Lemma 12.8 If F(x) =
_
x
a
f(t) dt for f integrable on [a, b], then F is abso-
lutely continuous.
Proof. Let > 0. Choose a simple function s such that
_
b
a
[f s[ < /2.
Let K be a bound for [s[ and let = /2K. If (x
i
, x
i
) is a collection of
nonoverlapping intervals, the sum of whose lengths is less than , then set
A =
k
i=1
(x
i
, x
i
) and note
_
A
[f s[ < /2 and
_
A
s < K = /2.
Lemma 12.9 If f is absolutely continuous, then it is of bounded variation.
Proof. Let correspond to = 1 in the denition of absolute continuity.
Given a partition, add points if necessary so that each subinterval has length
at most . We can then group the subintervals into at most K collections,
each of total length less than , where K is an integer larger than (1+ba)/.
So the total variation is then less than K.
Lemma 12.10 If f is absolutely continuous on [a, b] and f
(x) = 0 a.e.,
then f is constant.
The Cantor-Lebesgue function is an example to show that we need the
absolute continuity.
Proof. Let c [a, b], let E = x [a, c] : f
(x) = 0, and let > 0. For

each point x E there exists arbitrarily small intervals [x, x+h] [a, c] such
that [f(x + h) f(x)[ < h. By Lemma 12.1 we can nd a nite collection
of such intervals that cover all of E except for a set of measure less than ,
where is the in the denition of absolute continuity. If the intervals are
[x
i
, y
i
] with x
i
< y
i
x
i+1
, then
[f(x
i+1
) f(y
i
)[ < by the denition of
absolute continuity, while
[f(y
i
) f(x
i
)[ <
(y
i
x
i
) (c a). So
adding these two inequalities together,
[f(c) f(a)[ =
[f(x
i+1
) f(y
i
)] +
[f(y
i
) f(x
i
)]
+ (c a).
Since is arbitrary, then f(c) = f(a), which implies that f is constant.
45
Theorem 12.11 If F is absolutely continuous, then
F(b) F(a) =
_
b
a
F
(y) dy.
Proof. Suppose F is absolutely continuous on [a, b]. Then F is of bounded
variation, so F = F
1
F
2
where F
1
and F
2
are nondecreasing, and F
exists
a.e. Since [F
(x)[ F
1
(x) + F
2
(x), then
_
[F
(x)[ dx F
1
(b) + F
2
(b)
F
1
(a) F
2
(a), and hence F
is integrable. If G(x) =
_
x
a
F
(t) dt, then G is

absolutely continuous by Lemma 12.8, so F G is absolutely continuous.
Then (F G)
= 0 a.e., and therefore F G is constant. Thus F(x) =

_
x
a
F
(t) dt + F(a). If we set x = b, we get our result.

13 L
p
spaces
We assume throughout this section that the measure is -nite. For 1 p <
, dene the L
p
norm of f by
|f|
p
=
_
_
[f(x)[
p
d
_
1/p
.
For p = , dene the L
norm of f by
|f|
= infM : (x : [f(x)[ M) = 0.
For 1 p the space L
p
is the set f : |f|
p
< .
The L
norm of a function f is the supremum of f provided we disregard

sets of measure 0.
It is clear that |f|
p
= 0 if and only if f = 0 a.e.
Proposition 13.1 (Holders inequality) If 1 < p, q < and p
1
+q
1
= 1,
then _
[f(x)g(x)[ d |f|
p
|g|
q
.
This also holds if p = and g = 1.
46
Proof. If M = |f|
, then
_
fg M
_
[g[ and the case p = and q = 1
follows. So let us assume 1 < p, q < . If |f|
p
= 0, then f = 0 a.e and
_
fg = 0, so the result is clear if |f|
p
= 0 and similarly if |g|
q
= 0. Let
F(x) = [f(x)[/|f|
p
and G(x) = [g(x)[/|g|
q
. Note |F|
p
= 1 and |G|
q
= 1,
and it suces to show that
_
FG 1.
The second derivative of the function e
x
is again e
x
, which is positive, and
so e
x
is convex. Therefore if 0 1, we have
e
a+(1)b
e
a
+ (1 )e
b
.
If F(x), G(x) ,= 0, let a = p log F(x), b = q log G(x), = 1/p, and 1 =
1/q. We then obtain
F(x)G(x)
F(x)
p
p
+
G(x)
q
q
.
Clearly this inequality also holds if F(x) = 0 or G(x) = 0. Integrating,
_
FG
|F|
p
p
p
+
|G|
q
q
q
=
1
p
+
1
q
= 1.
One application of Holders inequality is to prove Minkowskis inequality,
which is simply the triangle inequality for L
p
.
We rst need the following lemma:
Lemma 13.2 If a, b > 0 and 1 p < , then
(a + b)
p
2
p1
a
p
+ 2
p1
b
p
.
Proof. To prove this, we may without loss of generality assume a b. The
case a = 0 is obvious, so we assume a > 0. Dividing both sides by a and
letting x = b/a, the inequality we want is equivalent to
(1 +x)
p
2
p1
+ 2
p1
x
p
, x 1. (13.1)
Clearly this inequality is valid for x = 1. So to prove (13.1) it suces to
show that the derivative of
(1 +x)
p
2
p1
2
p1
x
p
47
is less than equal to 0, or
p(1 +x)
p1
2
p1
px
p1
, x 1.
This last inequality hods because 2x 1 + x when x 1.
Proposition 13.3 (Minkowskis inequality) If 1 p , then
|f + g|
p
|f|
p
+|g|
p
.
Proof. Since [(f + g)(x)[ [f(x)[ +[g(x)[, integrating gives the case when
p = 1. The case p = is also easy. So let us suppose 1 < p < . If |f|
p
or |g|
p
is innite, the result is obvious, so we may assume both are nite.
The inequality Lemma 13.2 with a = [f(x)[ and b = [g(x)[ yields, after an
integration,
_
[(f + g)(x)[
p
d 2
p
_
[f(x)[
p
d + 2
p
_
[g(x)[
p
d.
So we have |f + g|
p
< . Clearly we may assume |f + g|
p
> 0.
Now write
[f + g[
p
[f[ [f + g[
p1
+[g[ [f + g[
p1
and apply Holders inequality with q = (1
1
p
)
1
. We obtain
_
[f + g[
p
|f|
p
_
_
[f + g[
(p1)q
_
1/q
+|g|
p
_
_
[f + g[
(p1)q
_
1/q
.
Since p
1
+ q
1
= 1, then (p 1)q = p, so we have
|f + g|
p
p

_
|f|
p
+|g|
p
_
|f + g|
p/q
p
.
Dividing both sides by |f +g|
p/q
p
and using the fact that p (p/q) = 1 gives
us our result.
Minkowskis inequality says that L
p
is a normed linear space, provided we
identify functions that are equal a.e.
We say f
n
converges to f in L
p
if |f
n
f|
p
0 as n . The next
proposition compares convergence in L
p
to convergence in measure. Before
we prove this, we prove an easy preliminary result known as Chebyshevs
inequality.
48
Lemma 13.4 If 1 p < ,
(x : [f(x)[ a)
|f|
p
p
a
p
.
Proof. If A = x : [f(x)[ a, then
(A)
_
A
[f(x)[
p
a
p
d
1
a
p
_
[f[
p
d.
Proposition 13.5 If f
n
converges to f in L
p
, then it converges in measure.
Proof. If > 0, by Chebyshevs inequality
(x : [f
n
(x)f(x)[ > ) = (x : [f
n
(x)f(x)[
p
>
p
)
|f
n
f|
p
p
p
0.
Letting f
n
= n
2
(0,1/n)
on [0, 1] with the measure being Lebesgue measure
gives an example where f
n
converges to 0 a.e. and in measure, but does not
converge in L
p
.
Example 9.4 is an example where f
n
converges to 0 in L
p
but not a.e.
We next show that L
p
is complete. This is often phrased as saying that
L
p
is a Banach space, i.e., a complete normed linear space.
Proposition 13.6 If 1 p , then L
p
is complete.
Proof. We do only the case p < ; the case p = is easy. Suppose f
n
is a Cauchy sequence in L
p
. Given = 2
(j+1)
, there exists n
j
such that if
n, m n
j
, then |f
n
f
m
|
p
2
(j+1)
. Without loss of generality we may
assume n
j
n
j1
for each j.
49
Set n
0
= 0 and dene f
0
0. If A
j
= x : [f
n
j
(x) f
n
j1
(x)[ > 2
j/2
,
then from Lemma 13.4, (A
j
) 2
jp/2
. We have
(
j=1
m=j
A
m
) = lim
j
(
m=j
A
m
) lim
j
m=j
(A
m
) = 0.
So except for a set of measure 0, for each x there is a last j for which
x
m=j
A
m
, hence a last j for which x A
j
. So for each x (except
for the null set) there is a j
0
(depending on x) such that if j j
0
, then
[f
n
j
(x) f
n
j1
(x)[ 2
j
.
Set
g
j
(x) =
j
m=1
[f
nm
(x) f
n
m1
(x)[.
g
j
(x) increases for each x, and the limit is nite for almost every x by the
preceding paragraph. Let us call the limit g(x). We have
|g
j
|
p

j
m=1
2
j
+|f
n
1
|
p
2 +|f
n
1
|
p
by Minkowskis inequality, and so by Fatous lemma, |g|
p
2+|f
n
1
|
p
< .
We have
f
n
j
(x) =
j
m=1
(f
nm
(x) f
n
m1
(x)).
Suppose x is not in the null set where g(x) is innite. Since [f
n
j
(x)f
n
k
(x)[
[g
n
j
(x) g
n
k
(x)[ 0 as j, k , then f
n
j
(x) is a Cauchy series (in R), and
hence converges, say to f(x). We have |f f
n
j
|
p
= lim
m
|f
nm
f
n
j
|
p
;
this follows by dominated convergence with the function g dened above as
the dominating function.
We have thus shown that |f f
n
j
|
p
0. Given = 2
(j+1)
, if m n
j
,
then |f f
m
|
p
|f f
n
j
|
p
+ |f
m
f
n
j
|
p
. This shows that f
m
converges
to f in L
p
norm.
Next we show:
Proposition 13.7 The set of continuous functions with compact support is
dense in L
p
(R
d
).
50
Proof. Suppose f L
p
. By dominated convergence
_
[f f
[n,n]
[
p
0
as n , the dominating function being [f[
p
. So we may suppose f has
compact support. By writing f = f
+
f
we may suppose f 0. By taking

simple functions s
m
increasing to f, we have
_
[f s
m
[
p
0 by dominated
convergence, so it suces to consider simple functions. By linearity, it suces
to consider characteristic functions with compact support. Given such a
E
and > 0 we showed in Proposition 8.3 that there exists g continuous with
compact support and with values in [0, 1] such that
_
[g
E
[ < . Since
[g
E
[ 1, then
_
[g
E
[
p
_
[g
E
[ < .
The following is very useful.
Proposition 13.8 For 1 < p < and p
1
+ q
1
= 1,
|f|
p
= sup
_
_
fg : |g|
q
1
_
. (13.2)
When p = 1 (13.2) holds if we take q = , and if p = (13.2) holds if we
take q = 1.
Proof. The right hand side of (13.2) is less than the left hand side by
Holders inequality. So we need only show that the right hand side is greater
than the left hand side.
First suppose p = 1. Take g(x) = sgn f(x), where sgn a is 1 if a > 0, is
0 if a = 0, and is 1 if a < 0. Then g is bounded by 1 and fg = [f[. This
takes care of the case p = 1.
Next suppose p = . Since is -nite, there exist sets F
n
increasing
up to X such that (F
n
) < for each n. If M = |f|
, let a be any
nite real less than M. By the denition of L
norm, the measure of A =

x F
n
: [f(x)[ > a must be positive if n is suciently large. Let g(x) =
(sgn f(x))
A
(x)/(A). Then the L
1
norm of g is 1 and
_
fg =
_
A
[f[/(A)
a. Since a is arbitrary, the supremum on the right hand side must be M.
Now suppose 1 < p < . We may suppose |f|
p
> 0. Let q
n
be a sequence
of nonnegative simple functions increasing to f
+
, r
n
a sequence of nonneg-
ative simple functions increasing to f
, and s
n
(x) = (q
n
(x) r
n
(x))
Fn
(x).
Then s
n
(x) f(x) for each x, [s
n
(x)[ [f(x)[ for each x, s
n
is a simple func-
tion, and |s
n
|
p
< for each n. If f L
p
, then |s
n
|
p
|f|
p
by dominated
51
convergence. If
_
[f[
p
= , then
_
[s
n
[
p
by monotone convergence.
For n suciently large, |s
n
|
p
> 0.
Let
g
n
(x) = (sgn f(x))
[s
n
(x)[
p1
|s
n
|
p/q
p
.
Since (p 1)q = p, then
|g
n
|
q
=
(
_
[s
n
[
(p1)q
)
1/q
|s
n
|
p/q
p
=
|s
n
|
p/q
p
|s
n
|
p/q
p
= 1.
On the other hand, since [f[ [s
n
[,
_
fg
n
=
_
[f[ [s
n
[
p1
|s
n
|
p/q
p
_
[s
n
[
p
|s
n
|
p/q
p
= |s
n
|
p(p/q)
p
.
Since p (p/q) = 1, then
_
fg
n
|s
n
|
p
, which tends to |f|
p
.
The above proof also establishes
Corollary 13.9 For 1 < p < and p
1
+ q
1
= 1,
|f|
p
= sup
_
_
fg : |g|
q
1, g simple
_
.
The space L
p
is a normed linear space. We can thus talk about its dual,
namely, the set of bounded linear functionals on L
p
. The dual of a space Y
is denoted Y
. If H is a bounded linear functional on L

p
, we dene the norm
of H to be |H| = supH(f) : |f|
p
1.
Theorem 13.10 If 1 < p < and p
1
+ q
1
= 1, then (L
p
)
= L
q
.
What this means is that if H is a bounded linear functional on L
p
, then
there exists g L
q
such that H(f) =
_
fg and that if g L
q
, then H(f) =
_
fg is a bounded linear functional on L
p
.
Proof. If g L
q
, then setting H(f) =
_
fg for f L
p
yields a bounded lin-
ear functional; the boundedness follows from Holders inequality. Moreover,
from Holders inequality and Proposition 13.8 we see that |H| = |g|
q
.
52
Now suppose we are given a bounded linear functional H on L
p
and we
must show there exists g L
q
such that H(f) =
_
fg. First suppose (X) <
. Dene (A) = H(
A
). If A and B are disjoint, then
(A B) = H(
AB
) = H(
A
+
B
) = H(
A
) + H(
B
) = (A) + (B).
To show is countably additive, it suces to show that if A
n
A, then
(A
n
) (A). But if A
n
A, then
An

A
in L
p
, and so (A
n
) =
H(
An
) H(
A
) = (A); we use here the fact that (X) < . Therefore
is a countably additive signed measure. Moreover, if (A) = 0, then
A
= 0 a.e., hence (A) = H(
A
) = 0. By writing =
+

and using
the Radon-Nikodym theorem for both the positive and negative parts, we
see there exists an integrable g such that (A) =
_
A
g for all sets A. If
s =
a
i
A
i
is a simple function, by linearity we have
H(s) =
a
i
H(
A
i
) =
a
i
(A
i
) =
a
i
_
g
A
i
=
_
gs.
By Corollary 13.9,
|g|
q
= sup
_
_
gs : |s|
p
1, s simple
_
supH(s) : |s|
p
1 |H|.
If s
n
are simple functions tending to f in L
p
, then H(s
n
) H(f), while by
Holders inequality
_
s
n
g
_
fg. We thus have H(f) =
_
fg for all f L
p
,
and |g|
p
|H|. By Holders inequality, |H| |g|
p
.
In the case where is -nite, but not nite, let F
n
X be such that
(F
n
) < for each n. Dene functionals H
n
by H
n
(f) = H(f
Fn
). Clearly
each H
n
is a bounded linear functional on L
p
. Applying the above argument,
we see there exist g
n
such that H
n
(f) =
_
fg
n
and |g
n
|
q
= |H
n
| |H|.
It is easy to see that g
n
is 0 if x / F
n
. Moreover, by the uniqueness part of
the Radon-Nikodym theorem, if n > m, then g
n
= g
m
on F
m
. Dene g by
setting g(x) = g
n
(x) if x F
n
. Then g is well dened. By Fatous lemma, g
is in L
q
with a norm bounded by |H|. Since f
Fn
f in L
p
by dominated
convergence, then H
n
(f) = H(f
Fn
) H(f), since H is a bounded linear
functional on L
p
. On the other hand H
n
(f) =
_
Fn
fg
n
=
_
Fn
fg
_
fg
by dominated convergence. So H(f) =
_
fg. Again by Holders inequality
|H| |g|
p
.
53
14 Fourier transforms
Fourier transforms give a representation of a function in terms of frequencies.
We give the basic properties here.
If f L
1
(R
n
), dene the Fourier transform

f by
f(u) =
_
R
n
e
iux
f(x)dx, u R
n
. (14.1)
We are using u x for the standard inner product in R
n
. Various books have
slightly dierent denitions. Some put a negative sign before the iu x, some
have a 2 either in front of the integral or in the exponent. The basic theory
is the same in any case.
Some basic properties of the Fourier transform are given by
Proposition 14.1 Suppose f and g are in L
1
. Then
(a)

f is bounded and continuous;
(b)

(f + g)(u) =

f(u) + g(u);

(af)(u) = a
f(u);
(c) if f
a
(x) = f(x + a), then

f
a
(u) = e
iua
f(u);
(d) if g
a
(x) = e
iax
g(x), then g
a
(u) =

f(u + a);
(e) if h
a
(x) = f(ax), then

h
a
(u) = a
n
f(u/a).
Proof. (a)

f is bounded because f L
1
and [e
iux
[ = 1. We have
f(u + h)

f(u) =
_
_
e
i(u+h)x
e
iux
_
f(x)dx.
So
[
f(u + h)

f(u)[
_

e
iux
e
ihx
1
[f(x)[dx.
The integrand is bounded by 2[f(x)[, which is integrable, and e
ihx
1 0
as h 0, and thus the continuity follows by dominated convergence.
54
(b) is obvious. (c) follows because
f
a
(u) =
_
e
iux
f(x + a)dx =
_
e
iu(xa)
f(x)dx = e
iua
f(u)
by a change of variables. For (d),
g
a
(u) =
_
e
iux
e
iax
f(x)dx =
_
e
i(u+a)x
f(x)dx =

f(u + a).
Finally for (e), by a change of variables,
h
a
(u) =
_
e
iux
f(ax)dx = a
n
_
e
iu(y/a)
f(y)dy
= a
n
_
e
i(u/a)y
f(y)dy = a
n
f(u/a).
One reason for the usefulness of Fourier transforms is that they relate
derivatives and multiplication.
Proposition 14.2 Suppose f L
1
and x
j
f(x) L
1
, where x
j
is the j
th
coordinate of x. Then
f
u
j
(u) = i
_
e
iux
x
j
f(x)dx.
Proof. Let e
j
be the unit vector in the j
th
direction. Then
f(u + he
j
)

f(u)
h
=
1
h
_
_
e
i(u+he
j
)x
e
iux
_
f(x)dx
=
_
e
iux
_
e
ihx
j
1
h
_
f(x)dx.
Since
1
h
_
e
ihx
j
1
_
[x
j
[
and x
j
f(x) L
1
, the right hand side converges to
_
e
iux
ix
j
f(x)dx by dom-
inated convergence. Therefore the left hand side converges. Of course, the
limit is
f/u
j
.
55
The convolution of f and g is dened by
f g(x) =
_
f(x y)g(y)dy.
By a change of variables, this is the same as
_
f(y)g(xy)dy, so f g = gf.
Proposition 14.3 (a) If f, g L
1
, then f g is in L
1
and |f g|
1

|f|
1
|g|
1
.
(b) The Fourier transform of f g is

f(u) g(u).
Proof. (a) We will show f g is nite a.e. by showing
_
[f g(x)[ dx < .
We have _
[f g(x)[dx
_ _
[f(x y)[ [g(y)[dy dx.
Since the integrand is nonnegative, we can apply Fubini and the right hand
side is equal to
_ _
[f(x y)[dx [g(y)[dy =
_ _
[f(x)[dx [g(y)[dy = |f|
1
|g|
1
.
The rst equality here follows by a change of variables. To verify that we can
do a change of variables, we reduce to simple functions and then characteristic
functions, and then use the translation invariance of Lebesgue measure.
(b) We have
f g(u) =
_
e
iux
_
f(x y)g(y)dy dx
=
_ _
e
iu(xy)
f(x y)dx e
iuy
g(y)dy
=
_

f(u)e
iuy
g(y)dy =

f(u) g(u).
We applied Fubini in the rst equality; this is valid because as we saw in (a),
the absolute value of the integrand is integrable.
We want to give a formula for recovering f from

f. First we need to
calculate the Fourier transform of a particular function.
56
Proposition 14.4 (a) Suppose f
1
: R R is dened by
f
1
(x) =
1
2
e
x
2
/2
.
Then

f
1
(u) = e
u
2
/2
.
(b) Suppose f
n
: R
n
R is given by
f
n
(x) =
1
(2)
n/2
e
|x|
2
/2
.
Then

f
n
(u) = e
|u|
2
/2
.
Proof. (a) may be proved using contour integration, but lets give a real
variable proof. Let g(u) =
_
e
iux
e
x
2
/2
dx. Dierentiate with respect to
u. We may dierentiate under the integral sign because (e
i(u+h)x
e
iux
)/h
is bounded in absolute value by [x[ and [x[e
x
2
/2
is integrable; therefore
dominated convergence applies. We then obtain
g
(u) = i
_
e
iux
xe
x
2
/2
dx.
By integration by parts this is equal to
u
_
e
iux
e
x
2
/2
dx = ug(u).
Solving the dierential equation g
(u) = ug(u), we have

[log g(u)]
=
g
(u)
g(u)
= u,
so log g(u) = u
2
/2 + c
1
, and so then
g(u) = c
2
e
u
2
. (14.2)
Since g(0) =
_
e
x
2
/2
dx =
2, c
2
=
2. Substituting this value of c

2
in
(14.2) and dividing both sides by
2 proves (a).
57
For (b), since f
n
(x) = f
1
(x
1
) f
1
(x
n
) if x = (x
1
, . . . , x
n
),
f
n
(u) =
_

_
e
i
P
j
u
j
x
j
f
1
(x
1
) f
1
(x
n
)dx
1
dx
n
=

f
1
(u
1
)

f
1
(u
n
) = e
|u|
2
/2
.
One more preliminary before proving the inversion theorem.
Proposition 14.5 Suppose is in L
1
and
_
(x)dx = 1. Let
A
(x) =
A
n
(x/A).
(a) Then |f
A
f|
1
0 as A 0.
(b) If f is continuous with compact support, then f
A
converges to f
pointwise.
Proof. (a) Let > 0. Choose g continuous with compact support so that
|f g|
1
< . Let h = f g. A change of variables shows that |
A
|
1
= ||
1
.
Observe
|f
A
f|
1
|g
A
g|
1
+|h
A
h|
1
and
|h
A
h|
1
|h|
1
+|h
A
|
1
|h|
1
+|h|
1
|
A
|
1
< (1 +||
1
).
So since is arbitrary, it suces to show that g
A
g in L
1
.
We start by writing
g
A
(x) g(x) =
_
g(x y)
A
(y)dy g(x) =
_
g(x Ay)(y)dy g(x)
=
_
[g(x Ay) g(x)](y)dy.
We used a change of variables and the fact that
_
(y)dy = 1. Because g
is continuous with compact support, then g is bounded, and the integral on
the right goes to 0 by dominated convergence, the dominating function being
|g|
[(y)[. Therefore g
A
(x) converges to g(x) pointwise.
58
To show the convergence in L
1
, we have
_
[g
A
(x) g(x)[dx
_ _
[g(x Ay) g(x)[ [(y)[dy dx
=
_ _
[g(x Ay) g(x)[ [(y)[dx dy.
Since g is continuous with compact support and hence bounded, for each y
G
A
(y) =
_
[g(x Ay) g(x)[dx
converges to 0 as A 0 by dominated convergence. Also
G
A
(y)
_
[g(x Ay)[dx +
_
[g(x)[dx 2|g|
1
< .
Then _
G
A
(y)[(y)[dy
converges to 0 as A 0 by dominated convergence, the dominating function
being 2|g|
1
[(y)[.
(b) This follows from the argument we used for g above.
Now we are ready to give the inversion formula. The proof seems longer
than it might be, but there is no avoiding the introduction of the function
H
a
or some similar function.
Theorem 14.6 Suppose f,

f L
1
. Then
f(y) =
1
(2)
n
_
e
iuy

f(u)du, a.e.
Proof. If g(x) = a
n
f(x/a), then its Fourier transform is

f(au). So the
Fourier transform of
1
a
n
1
(2)
n/2
e
x
2
/2a
2
is e
a
2
u
2
/2
. Therefore if we let
H
a
(x) =
1
(2)
n
e
|x|
2
/2a
2
,
59
we have
H
a
(u) = (2)
n/2
a
n
e
a
2
|u|
2
/2
.
We have
_

f(u)e
iuy
H
a
(u)du
=
_ _
e
iux
f(x)e
iuy
H
a
(u)dx du
=
_ _
e
iu(xy)
H
a
(u)duf(x) dx
=
_

H
a
(x y)f(x)dx. (14.3)
We can interchange the order of integration because
_ _
[f(x)[ [H
a
(u)[dx du < .
The left hand side of the rst line of (14.3) converges to (2)
n
_

f(u)e
iuy
dy
as a by dominated convergence and the fact that

f L
1
. The last line
of (14.3) is equal to
_

H
a
(y x)f(x)dx = f

H
a
(y) (14.4)
since

H
a
is symmetric. But by Proposition 14.5, f

H
a
converges to f in L
1
as a .
The last topic that we consider is the Plancherel theorem.
Theorem 14.7 (a) Suppose f is continuous with compact support. Then
f L
2
and
|f|
2
= (2)
n/2
|
f|
2
. (14.5)
(b) We can use the result in (a) to dene

f when f L
2
and so that
(14.5) holds.
60
Proof. (a) Let g(x) = f(x). Note
g(u) =
_
e
iux
f(x)dx =
_
e
iux
f(x)dx =
_
e
iux
f(x)dx =

f(u).
By (14.3) and (14.4) with y = 0
f g

H
a
(0) =
_

f g(u)H
a
(u)du. (14.6)
Since

f g(u) =

f(u) g(u) = [
f(u)[
2
, the right hand side of (14.6) converges
by monotone convergence to (2)
n
_
[
f(u)[
2
du as a . Since f and g
are continuous with compact support, then it is easy to see that f g is also,
and so the left hand side of (14.6) converges to f g(0) =
_
f(y)g(y)dy =
_
[f(y)[
2
dy by Proposition 14.5(b).
(b) The set of continuous functions with compact support is dense in L
2
.
Given a function f in L
2
, choose a sequence of continuous functions with
compact support f
m
such that f
m
f in L
2
. By the result in (a),
f
m
is
a Cauchy sequence in L
2
, and therefore converges to a function in L
2
, which
we call

f. If f
m
is another sequence of continuous functions with compact
support converging to f in L
2
, then f
m
f
m
is a sequence of continuous
functions with compact support converging to 0 in L
2
; by the result in (a),
f
m
m
converges to 0 in L
2
, and therefore

f is dened uniquely up to almost
everywhere equivalence. By passing to the limit in L
2
on both sides of (14.5),
we see that (14.5) holds for f L
2
.
Richard F. Bass
Department of Mathematics
University of Connecticut
Storrs, CT 06269-3009, USA
bass@math.uconn.edu
61

Real Analysis: 1 σ-algebras

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Real Analysis: 1 σ-algebras

Uploaded by

Copyright:

Available Formats

Real Analysis

(A) = infm(G) : G open, A G. (3.1)

is not a measure on the collection of all subsets of R.

is a measure on a strictly smaller -algebra that strictly contains the

(N) = 0. Let L be the

on L. (A) shows that L is strictly

, the measure of a point is 0, so the measure

is an outer measure (Proposition 3.3). We then dene what it means

-measurable (Denition 3.4) and prove that the collection

-measurable sets is a -algebra and that m

restricted to this -algebra

-measurable sets contains the Borel -algebra.

is not a measure on the collection of all subsets of R.

is a measure. Dene x y if xy is rational. This is an

(A) > 0. But

(A + q), which implies m

dened by (3.2) is an outer measure.

be an outer measure. A set A X is m

is an outer measure on X, then the collection / of m

measurable sets is a -algebra and the restriction of m

(A) = 0 and E X, then

-measurable sets is a -algebra, it suces

(E) < ; we need to show

and call m Lebesgue measure.

(A) and the

(AE) < . We have

(A) = n(A) = n(E) +n(AE) n(E) +m(AE) n(E) +.

((a, b]) = (b) (a), dene

measure of a point x is (x) (x), where

(x) is equal to the size of the jump (if any) of at x.

a. The measurability of fg follows since fg =

= min(f, 0), dene

= max(f, 0), then

and using linearity proves (a)(c) for this case, too.

are another such pair,

is a null set with respect to ..

are another such pair of sets and A

E, then (A) 0. But A E E

can be written as the union of A

are mutually singular. This decom-

(A) = (A F). This gives the desired decomposition.

is another such decomposition with

be the sets in the denition of mutually singular. Then

gives another Hahn decomposition, hence EE

is called the total variation measure and is written

(y) dy = f(b) f(a) if f

(x) to be the common value.

(x) dx f(b) f(a).

f(X), the other

(x) = f(x) for

exists a.e. Let K be a bound for [f[. If

a.e., then by dominated con-

(x) f(x)] dx = 0 for all c,

= f a.e. by Lemma 12.5.

(x) dx F(b) F(a) =

f 0, this tells us that F

(y) dy = F(b) F(a).

(x) = 0, and let > 0. For

(t) dt, then G is

= 0 a.e., and therefore F G is constant. Thus F(x) =

(t) dt + F(a). If we set x = b, we get our result.

norm of a function f is the supremum of f provided we disregard

we may suppose f 0. By taking