Professional Documents
Culture Documents
Lenya Ryzhik
December 4, 2008
Essentially nothing found here is original except for a few mistakes and misprints here and
there. These lecture notes are based on material from the following books: H. Royden Real
Analysis, L. Evans and R. Gariepy Measure Theory and Fine Properties of Functions,
J. Duoandikoetxea Fourier Analysis, and M. Pinsky Introduction to Fourier Analysis and
Wavelets.
1.1
m(En ).
(iv) m is translationally invariant, that is, m(E + x) = mE for all sets E M and x R.
The trouble is that such function does not exist, or, rather that for any such function m
the measure of any interval is either equal to zero or infinity. Let us explain why this is so. We
will do this for the interval [0, 1) but generalization to an arbitrary interval is straightforward.
Given x, y [0, 1) define
x + y,
if x + y < 1,
xy =
x + y 1, if x + y 1,
and for a set E [0, 1) we set E y = {x [0, 1) : x = e y for some e E}.
Lemma 1.1 Assume (i)-(iv) above. If E [0, 1) is a set and y [0, 1), then we have
m(E y) = m(E).
Proof. Let E1 = E [0, 1 y) and E2 = E [1 y, 1), then E1 and E2 are and disjoint, and
the same is true for E1 y = E1 + y, and E2 y = E2 + (y 1), so that
E y = (E1 y) (E2 y).
In addition, we have
m(E y) = m(E1 y) + m(E2 y) = m(E1 + y) + m(E2 + (y 1)) = m(E1 ) + m(E2 ) = m(E),
and we are done. 2
Let us introduce an equivalence relation on [0, 1): x y if xy Q. Using axiom of choice
we deduce existence of a set P which contains exactly one element from each equivalence class.
Set Pj = P qj , where qj is the j-th rational number in [0, 1) (we write Q[0, 1) = {q1 , q2 , . . .}).
Note that the sets Pj are pairwise disjoint: if x Pi Pj , then x = pi qi = pj qj , so pj pi
and thus pi = pj , and i = j, since P contains exactly one element from each equivalence class.
On the other hand, we have
[
[0, 1] =
Pj ,
j=1
X
[
m(Pn ).
Pn =
m([0, 1)) = m
n=1
n=1
Thus, we have m([0, 1)) = 0 if m(P ) = 0 or m([0, 1)) = + if m(P ) > 0. Therefore, if we
want to keep generalization of the length of an interval not totally trivial we have to drop one
of the requirements (i) - (iv), and the best candidate to do so is (i) since (ii)-(iv) come from
physical considerations.
Let us now define the (outer) Lebesgue measure of a set on the real line.
P
Definition 1.2 Let A be a subset of R. Its outer Lebesgue measure m A = inf l(In ) where
the infimum is taken over all at most countable collections of open intervals {In } such that
A n In .
Note that we obviously have (i) m () = 0, and (ii) if A B then m (A) m (B). The
condition that In are open intervals is not so important for the definition of the Lebesgue
measure but will be important for general measures later.
Proposition 1.3 If I is an interval then m (I) = l(I).
Proof. (1) If I is either an open, or a closed, or half-interval between points a and b then we
have m (I) l(a , b + ) = b a + 2 for all > 0. It follows that m (I) b a.
(2) On the other hand, to show that m ([a, b]) b a, take a cover {In } of [a, b] by open
intervals. We may choose a finite sub-cover {Jj }, j = 1, . . . , N which still covers [a, b]. As
PN
I N
j=1 Jj we have
j=1 l(Jj ) b a. Therefore, m [a, b] b a and thus, together with
(1.1)
Proof. If m (Aj ) = + then we are done. If m (Aj ) < + for all j N, then for any > 0
(j)
(j)
we may find a countable collection {Ik } of intervals such that Aj k Ik and
(j)
l(Ik )
k=1
X (j)
l(Ik ).
j m (Aj )
2
k=1
Then we have
A :=
Aj
and so
m (A)
X
j,k
(j)
l(Ik )
X
j
(j)
Ik ,
j,k
X
m (Aj ).
m (Aj ) + j = +
2
j
1.2
k=1
(Ak ) whenever A
k=1
Ak .
The term outer in the above definition is not the best since we do not always assume that
comes from some covers by open sets but we will use it anyway.
Definition 1.9 A measure defined on a set X is finite if (X) < +.
Definition 1.10 Let be an outer measure on X and let A X be a set. Then |A , a
restriction of to A is the outer measure defined by |A (B) = (A B) for B X.
Examples. (1) The Lebesgue measure on R.
(2) The counting measure: the measure # (A) is equal to the number of elements in A.
(3) The delta measure on the real line: given a subset A R, we set (A) = 1 if 0 A and
(A) = 0 if 0
/ A.
Measurable sets
Now, we have to restrict the class of sets for which we will define the notion of a measure (as
opposed to the outer measure which is defined for all sets). The following definition is due to
Caratheodory.
Definition 1.11 A set A X is -measurable if for each set B X we have
(B) = (A B) + (Ac B).
It goes without saying that if A is a measurable set then so is its complement Ac .
Note that we always have
(B) (A B) + (Ac B)
so to check measurability of A we would need only to verify that
(B) (A B) + (Ac B)
for all sets B X.
Remark. If A is a measurable set we will write (A) instead of (A).
Sets of measure zero
Lemma 1.12 If (E) = 0 then the set E is measurable.
Proof. Let A X be any set, then A E E, so
(A E) (E) = 0,
while A E c A and thus
(A) (A E c ) = (A E c ) + 0 = (A E c ) + (A E),
and thus the set E is measurable. 2
4
(1.2)
(1.3)
n
(A Ei ).
(1.4)
(A (i=1 Ei )) =
i=1
Proof. We prove this by induction. The case n = 1 is trivial. Assume that (1.4) holds for
n 1, then, as En is measurable, we have
(A ni=1 Ei ) = (A (ni=1 Ei ) En ) + (A (ni=1 Ei ) Enc )
n
X
n1
= (A En ) + (A (i=1 En ) =
(A Ei ).
i=1
The last equality above follows from the induction assumption while the second one uses
pairwise disjointness of Ei . 2
Then the sets Ej are disjoint and their union is the same as that of Ej :
E = j Ej = j Ej .
Now, take any set A and set
Fn =
n
[
Ej E.
j=1
n
X
(A Ej ) + (A E c ).
j=1
(A Ej ) + (A E c ).
j=1
(1.5)
(A Ej ).
j=1
Then we simply set In0 = In (a, ) and In00 = (, a + /2n ) this is to keep In00 an open
interval. Then we have
A1 n In0 , A2 n In00 ,
thus
m (A1 )
l(In0 ), m (A2 )
l(In00 ).
It follows that
m (A1 ) + m (A2 )
[l(In0 ) + l(In00 )]
Countable additivity
Proposition 1.20 Let be a measure and let {Ej } be a collection of pairwise disjoint measurable sets, then
!
[
X
Ej =
(Ej ).
j=1
j=1
Ej
j=1
(Ej ).
j=1
!
Ej
j=1
(Ej ).
(1.7)
j=1
However, if all Ej are measurable and pairwise disjoint, we have, according to Lemma 1.14,
with A = X, the whole measure space, for any n N
!
!
n
n
X
[
[
(Ej )
Ej
Ej =
j=1
j=1
j=1
Ej = lim (Ej ).
(1.8)
j+
j=1
Proof. Let E =
j=1
Fj ,
j=1
and all sets Fj are disjoint. It follows from Proposition 1.20 that
X
(E1 \ E) =
(Fj ).
(1.9)
j=1
On the other hand, as E E1 so that E1 = (E1 \ E) E, the same proposition implies that
(E1 \ E) = (E1 ) (E) and, similarly, (Fj ) = (Ej ) (Ej+1 ). Using this in (1.9) leads
to
n
X
X
(E1 ) (E) =
((Ej ) (Ej+1 ) = lim
((Ej ) (Ej+1 )
n+
j=1
j=1
n+
[
lim (Ej ) =
Ej .
j+
(1.10)
j=1
k
X
j=1
k
X
(Ej+1 \ Ej ).
(1.11)
j=1
We used in the last step the fact that Ej Ej+1 . Now, let k + in (1.11) and use the
fact that the sets Ej+1 \ Ej are pairwise disjoint, together with countable additivity of from
Proposition 1.20
!!
!
k
X
[ [
[
lim (Ek ) = (E1 ) +
(Ej+1 \ Ej ) = E1
(Ej+1 \ Ej )
=
Ej ,
k+
j=1
j=1
j=1
which is (1.10). 2
Exercise. The set P defined after Lemma 1.1 is not measurable.
1.3
Definition 1.23 (i) A measure on Rn is regular if for each set A Rn there exists a
-measurable set B such that A B and (A) = (B).
(ii) A measure is Borel if every Borel set is -measurable.
(iii) A measure on Rn is Borel regular if is Borel and for each set A Rn there exists a
Borel set B such that A B and (A) = (B).
(iv) A measure is a Radon measure if is Borel regular and (K) < + for each compact
set K Rn .
Example. 1. The Lebesgue measure is a Radon measure.
2. -function is a Radon measure.
Increasing sequences of sets
Theorem 1.24 Let be a regular measure, and let A1 A2 . . . An . . . be an
increasing sequence of sets which need not be measurable. Then
!
[
lim (Ak ) =
Ak .
(1.12)
k
k=1
Proof. Since the measure is regular, there exist measurable sets Ck such that Ak Ck and
(Ak ) = (Ck ). Set
\
Bk =
Cj ,
jk
[
[
lim (Ak ) = lim (Bk ) =
Bj
Aj .
k
j=1
j=1
!
Aj
j=1
10
(1.13)
(1.14)
We begin with the following lemma which addresses the statement of the theorem for the
Borel sets.
Lemma 1.27 Let be a Borel measure and B be a Borel set.
(i) If (B) < + then for any > 0 there exists a closed set C such that C B and
(B \ C) < .
(ii) If is Radon then for any > 0 there exists an open set U such that B U and
(U \ B) < .
Proof. (i) Take a Borel set B with (B) < +, and define the restriction = |B . As at
the beginning of the proof of Theorem 1.25, we deduce that is a Borel measure. In addition,
is a finite measure as (B) < +. Let F be the collection of all -measurable subsets A
of Rn such that for any > 0 we can find a closed set C A which is a subset of A and
(A \ C) < . Our goal is to show that B F. To do this we define G as the collection of all
sets A such that both A F and Ac F. It is sufficient to show that
G contains all open sets and is a -algebra.
(1.15)
Then it would follow that G contains all Borel sets, hence, in particular G contains B and
thus so does F. Hence, we set out to prove (1.15).
Step 1: Closed sets. The first trivial observation is that F contains all closed sets.
Step 2: Infinite intersections. Let us now show that if the sets Aj F for all j = 1, 2, . . . ,
then so is their intersection: A =
j=1 Aj F.
To show that, given > 0, using the fact that Aj F, we choose the closed sets Cj Aj
so that
(1.16)
(Aj \ Cj ) < j .
2
Then the closed set C =
j=1 Cj A and, moreover,
!
[
X
(A \ C)
(Aj \ Cj )
(Aj \ Cj ) < .
j=1
j=1
Therefore, indeed, A F.
11
Step 3: Infinite unions. SNext, we establish that if the sets Aj F for all j = 1, 2, . . . ,
then so is their union: A =
j=1 Aj F.
Choose the sets Cj as in (1.16), then, as (A) < + we have, using Proposition 1.21,
lim (A \ (
m+
m
[
Cj )) = (A \
j=1
[
[
Cj ) = (( Aj ) \ ( Cj ))
j=1
(Aj \ Cj )
j=1
j=
j=1
(Aj \ Cj ) < .
j=1
m0
[
Cj )) < ,
j=1
S 0
and the set C = m
j=1 Cj is closed.
Step 4: collection G contains all open sets. If O is an open set then Oc is closed and
thus Oc F automatically by Step 1. But any open set can be written as a countable union
of closed sets, hence by Step 3 collection F contains all open sets, hence, in particular, our
set O. Thus, both O and Oc are in F, so O G.
Step 5: G is a -algebra. Obviously, if A G then Ac G as well. Therefore, we only
need to check that if A1 , A2 , . . . , An , . . . G then A =
j=1 Aj G. But A F by Step 3,
while Step 2 implies that
\
c
A =
(Acj )
j=1
.
2m
c
Then B Cm
and Um B Um \ Cm , so that
B=
(Um B)
m=1
and
"
(U \ B) =
(Um \ Cm ) := U,
m=1
(Um \ Cm ) \ B
m=1
X
m=1
((Um \ Cm ) \ B) < ,
(1.17)
If A is -measurable and (A) = + define the annuli Dk = {x : k 1 |x| < k} and split
A=
(A Dk ).
k=1
Observe that
+ = (A) =
(A Dk ),
k=1
while (A Dk ) < + since is a Radon measure. We can use (1.17) to find closed sets
Ck A Dk such that
1
((A Dk ) \ Ck ) < k ,
2
n
and consider the closed sets Gn = k=1 Ck . Note that, as all Ck are pairwise disjoint,
n
n
X
X
1
(Gn ) =
(Ck )
(A Dk ) k .
(1.18)
2
k=1
k=1
As, by Proposition 1.22, we have
+ = (A) = lim
n+
(nk=1 (A
Dk )) = lim
n+
n
X
((A Dk )) ,
k=1
we deduce from (1.18) that limn+ (Gn ) = + = (A). Therefore, (1.17) actually holds
also if (A) = +. What remains is to replace the word closed in (1.17) by compact.
13
This is simple: if (A) < + given > 0 take a closed set C A such that (C) > (A)
and write C =
k=1 Ck , with Ck = C U (0, k). Then each Ck is a compact set, Ck A,
and (C) = limk+ (Ck ) because of Proposition 1.22 again. Hence, there exists a positive
integer k0 so that (Ck0 ) > (A) , and (1.14) follows. If (A) = + we can do the same
procedure by first choosing closed sets Cn A with (Cn ) > n for n N, and then writing
Cn =
k=1 Cnk , with compact sets Cnk = Cn B(0, k) A. We finish by choosing kn large
kn ) > n and noticing that (A) = + = supn (Cnkn ). 2
enough so that (Cn B(0,
2
2.1
Measurable functions
Definition and basic properties
Recall that a function is continuous if pre-image of every open set is open. Measurable
functions are defined in a similar spirit. We start with the following observation.
Proposition 2.1 Let f : Rn R be a real-valued function defined on a measurable set. Then
the following are equivalent.
(i) For any R the set {x : f (x) > } is measurable,
(ii) For any R the set {x : f (x) } is measurable,
(iii) For any R the set {x : f (x) } is measurable,
(iv) For any R the set {x : f (x) < } is measurable.
Proof. First, it is obvious that (i) and (iv) are equivalent, and so are (ii) and (iii). If we
write
[
{x : f (x) > } =
{x : f (x) + 1/n}
n=1
we see that (ii) implies (i), and, similarly, we get that (iv) implies (iii). 2
This leads to the following, somewhat more general definition. Let X be a set, Y a
topological space and assume that is a measure on X.
Definition 2.2 A function f : X Y is -measurable if for each open set U Y , the
pre-image f 1 (U ) is -measurable.
For real-valued functions it suffices to check that pre-images of the half intervals (, +) are
all open in order to establish measurability of a function.
The next proposition gives some basic properties of measurable functions which are neither
surprising nor particularly amusing.
Proposition 2.3 If f, g : X R are measurable functions and c R is a real number then
the following functions are also measurable: cf , f + c, f + g, f g and f g.
Proof. (1) To see that f + c is measurable we simply note that
{x : f (x) + c < } = {x : f (x) < c}.
14
(2) For measurability of cf with c > 0 we observe that {x : cf (x) < } = {x : f (x) < /c},
and the case c 0 is not very different.
(3) To show that f + g is measurable we decompose
[
{x : f (x) + g(x) < } =
[{x : f (x) < r} {x : g(x) < r}] .
rQ
(f + g)2 (f g)2
,
4
1jn
as well as
s(x) = lim sup fn (x) and w(x) = lim inf fn (x).
n
n
[
{g(x) > } =
j=1
j=1
which shows that gn (x) and g(x) are both measurable, and qn (x) are q(x) are measurable for
a similar reason. Now, for s(x) we use the representation
s(x) = lim sup fn (x) = inf sup fk (x) ,
n
kn
and see that s(x) is measurable by what we have just proved. The function w(x) is measurable
for a similar reason. 2
The next result gives a very convenient representation of a positive function as a sum of
simple functions. We denote by A the characteristic function of a set A:
0, if x
/ A,
A (x) =
1, if x A.
15
Definition 2.5 A measurable function f (x) is simple if it takes at most countably many
values.
Theorem 2.6 Let a non-negative function f be -measurable. Then there exist -measurable
sets Ak such that
X
1
Ak (x).
(2.1)
f (x) =
k
k=1
Proof. Set
A1 = {x : f (x) 1}
and continue inductively by setting
(
Ak =
)
k1
1 X1
x : f (x) +
A (x) .
k j=1 j j
k
X
1
j=1
and thus
f (x)
X
1
k=1
Aj (x),
Ak (x).
(2.2)
X
1
k=1
Ak (x) if f (x) = +.
2.2
Lusins theorem says, roughly speaking, that any measurable function coincides with a continuous function on a set of large measure. The catch is that you do not have a control of
the structure of the set where the two functions coincide. For instance, the Dirichlet function
which is equal to one at irrational numbers and to zero at rational ones coincides with the
function equal identically to one on the set of irrational numbers which has full measure but
lots of holes.
16
X
usj (x)
j=1
2j
Note that for x U the function (x) is continuous because usj (x) are continuous and by the
Weierstrass test. Moreover, for any x U there exists sj0 such that |x sj0 | 2dist(x, K)
since {sj } are dense. Therefore, usj0 (x) > 0 and thus (x) > 0 for all x U . Let us also set
normalized weights of each point sj
vj (x) =
Note that
2j usj (x)
.
(x)
vj (x) 1
(2.3)
j=1
(2.4)
The idea is that for x far from the boundary of K the extension is not very difficult, the
problem is with x close to K and for those x the function f(x) is defined as a weighted
average of f (sj ) with the bigger weight going to sj which are close to x.
Let us check that f(x) is continuous. The series in (2.4) for x U converges uniformly
because 0 usj (x) 1, the function (x) is continuous and |f (sj )| M since f is a
continuous function, sj K, and the set K is compact. As each individual term f (sj )vj (x) is
17
a continuous function, the uniform convergence of the series implies that f(x) is continuous
at x U .
Now, let us show that for each x K we have f (x) = limyx f(y). Given > 0 use uniform
continuity of the function f on the compact set K to choose > 0 so that |f (x) f (x0 )| <
as soon as |x x0 | < and x, x0 K. Consider y U such that |y x| < /4. Then if
|x sk | we have
|x sk | |x y| + |y sk |,
thus
3
2|x y|,
4
and hence usk (y) = vk (y) = 0 for such sk . Therefore, we have |f (x) f (sk )| < for all sk
such that vk (y) 6= 0, and we may simply estimate, using (2.3):
X
X
X
vk (y)f (sk )
vk (y)f (x)
vk (y)|f (x) f (sk )| < .
|f (y) f (x)| =
|y sk |
k=1
k=1
k=1
Therefore, the function f(x) is continuous everywhere. The claim that |f(x)| supyK |f(y)|
for all x Rn follows immediately from the definition of f(x) and (2.3). 2
Lusins Theorem
Lusins theorem says that every measurable function coincides with a continuous function on
an arbitrarily large set.
Theorem 2.8 Let be a Borel regular measure on Rn and let f : Rn Rm be -measurable.
Assume A Rn is a -measurable set with (A) < +. For any > 0 there exists a compact
set K A such that (A \ K ) < and the restriction of the function f to the compact set
K is continuous.
Proof. As usual, it is sufficient to prove this for m = 1. We will construct a compact set
K on which f (x) is a limit of a uniformly converging sequence of continuous functions and
is therefore itself continuous on K . To this end for each p N take half-open intervals
Bpj = [j/2p , (j + 1)/2p ), j Z and define the pre-images Apj = A (f 1 (Bpj )). The sets Apj
are -measurable and A =
j=1 Apj for each p fixed. Let = |A be the restriction of to
the set A, then is a Radon measure so there exists a compact set Kpj Apj such that
(Apj \ Kpj ) <
and thus
A\
2p+j
!
Kpj
<
j=1
.
2p
A \
N (p)
Kpj <
j=1
18
2p
N (p)
and set Dp = j=1 Kpj . Then the set Dp is compact. For each p and j define the function
gp (x) = j/2p for x Kpj , 1 j N (p). As the compact sets Kpj are pairwise disjoint, they
are all a finite distance apart and thus the function gp (x) is continuous on the set Dp and,
moreover, we have
1
(2.5)
|f (x) gp (x)| < p for all x Dp .
2
Finally, set K =
p=1 Dp . Then the set K (which depends on through the original choice
of the sets Kpj ) is compact, and
(A \ K)
(A \ Dp ) < .
p=1
Moreover, (2.5) implies that the sequence gp (x) converges uniformly to the function f (x) on
K and thus f is continuous on the set K . 2
A direct consequence of Theorems 2.7 and 2.8 is the following.
Corollary 2.9 Let and A be as in Lusins theorem. Then there exists a continuous function
f : Rn Rm such that {x A : f (x) 6= f(x)} < .
We note that if f (x) is a bounded function: |f (x)| M , then f(x) can be chosen so that
|f(x)| M as well this follows from the last statement in Theorem 2.7.
Egorovs theorem
Egorovs theorem shows that a point-wise converging sequence converges uniformly except
maybe on a small set.
Theorem 2.10 Let be a measure on Rn and let the functions fk : Rn R be -measurable.
Assume that the set A is -measurable with (A) < + and fk g almost everywhere on
A. Then for any > 0 there exists a -measurable set B such that (i) (A \ B ) < , and
(ii) the sequence fk converges uniformly to g on the set B .
Proof. Define a nested sequence of bad sets
[
1
Cij =
x : |fk (x) g(x)| > i ,
2
k=j
then Ci,j+1 Cij while
j=1 Cij = and so, as (A) < +, we have
lim (A Cij ) = (A (
j=1 Cij )) = 0
.
2i
Set
B = A \ (
i=1 Ci,Ni ),
then (A \ B) < and for each x B and for all n Ni we have
1
|fn (x) g(x)| i ,
2
hence fn (x) converges uniformly to g(x) on the set B. 2
19
f (x)d =
yj (f 1 (yj )) =
yj (Aj ).
(3.2)
Compared to the Riemann integral we simply turn our head sideways and compute the area
as in (3.2). This makes a world of difference and also allows the sets Aj to be just measurable,
and thus have a rather complicated structure which would rule out Riemann integrability of
f (x) of the form (3.1).
If f (x) is simple and measurable, we write f = f + f , where f + = max(f, 0) and
or
f d < +,
then we set
f d =
+
f d
f d.
The next proposition is the key step in the definition of the Lebesgue integral
Proposition 3.2 Let f be a bounded function defined on a measurable set E with E < +.
In order that
f d := inf
d = sup d := f d
f
where the infimum and the supremum are taken over all measurable simple functions f
and f , respectively, it is necessary and sufficient that f be measurable.
20
Proof. (1) Let f be a bounded measurable function, with |f (x)| M for all x E. Choose
a mesh step M/n and consider the pre-images
kM
(k 1)M
< f (x)
,
Ek = x :
n
n
with n k n, then
E=
n
[
Ek
k=n
n =
and thus
f d =
M
E,
n
f d.
(3.3)
(2) On the other hand, if (3.3) holds then for every n there exist measurable simple functions
n f and n f such that
1
(3.4)
(n n )d .
n
Set
= inf n , = sup n ,
then and are both measurable and f . Consider the set
[
1
A = {x E : (x) < (x)} =
{x E : (x) < (x) } :=
Ak .
k
k=1
k=1
Given any k N note that for n large enough we have n (x) < n (x) 1/k on the set Ak
and thus, as n n > 0, we have
n d
E
(n n )d
n d =
E
(n n )d
Ak
(Ak )
.
k
Combining this with (3.4) and letting n + we conclude that (Ak ) = 0 for all k. Thus,
= = f except on a set of measure zero, hence the function f is measurable. 2
Proposition 3.2 motivates the following.
Definition 3.3 Let f be a bounded measurable function defined on a measurable set E with
E < + then
f d = inf
E
d,
E
The next step in the hierarchy is to define the integral of a non-negative function.
Definition 3.4 If f 0 is a non-negative measurable function defined on a measurable set
E we define
f d = sup
hf
hd,
E
with supremum taken over all bounded simple functions h that vanish outside a set of finite
measure.
This gives way to the general case.
Definition
3.5 A non-negative measurable function f defined on a measurable set E is integrable if E f d < +. A measurable function g defined on a measurable set E is integrable
if both g + = max(g, 0) and g = max(0, g) are integrable.
Bounded convergence theorem
We now set to prove several theorems which address the same question: if a sequence fn (x)
converges point-wise to a function f (x), what can we say about the integral of f (x)? Let us
point out immediately two possible sources of trouble. One example is the sequence of step
functions fn (x) = [n,n+1] (x), and another is the sequence gn (x) = n[1/(2n),1/2n] (x). Both
fn (x) and gn (x) converge point-wise almost everywhere to f (x) = 0 but
gn (x)dx = 1 6 0 =
fn (x)dx =
R
f (x)dx.
R
This shows two possible reasons for the integrals of fn to fail to converge to the integral of
f (x): escape to infinity in case of fn (x) and concentration in the case of gn (x).
Bounded convergence theorem deals with the situation when neither escape to infinity nor
concentration is possible.
Theorem 3.6 Let fn be a sequence of measurable functions defined on a measurable set
E with E < +. Assume that fn are uniformly bounded: there exists M > 0 so that
|fn (x)| M for all n and all x E. Then if fn (x) f (x) point-wise almost everywhere on
E then
f d = lim
fn d.
(3.5)
n
Proof. This is trivial if fn converges uniformly to f on the set E. In general, given any
> 0 we may use Egorovs theorem to find a set A such that (A ) < , and fn converges
uniformly to f on the set E \ A . Then for large enough n we have
(fn f )d
|fn f |d +
|fn f |d (E) + 2M (A ) ((E) + 2M ),
E
E\A
22
Fatous Lemma
Fatous lemma tells us that in the limit we may only lose mass, which is exactly what happened
in the two examples (concentration and escape to infinity) mentioned at the beginning of this
section.
Theorem 3.7 Let fn be a sequence of non-negative measurable functions which converges
point-wise to a function f on a measurable set E, then
f d lim inf
n+
fn d.
(3.6)
Proof. Let h be a bounded non-negative simple function which vanishes outside a set E 0 with
E 0 < + and such that h f on E. Set hn (x) = min {h(x), fn (x)}, then hn (x) h(x) on
E. Then we have, applying the bounded convergence theorem to the sequence hn on the set
E 0:
fn d.
fn d lim inf
hn d lim inf
hd = lim
hd =
E
E0
E0
E0
f d = lim
n+
f d =
E
X
n=1
un (x)d.
E
Proof.
Apply the monotone convergence theorem to the sequence of partial sums fn (x) =
Pn
j=1 uj (x). 2
23
(g f )d lim inf
(g fn )d.
(3.8)
Moreover, the fact that |fn | g implies that the limit f is integrable, hence it follows from
(3.8) that
gd lim sup fn d,
f d
gd
E
and thus
fn d
lim sup
f d.
E
f d
gd +
E
and thus
gd + lim inf
E
fn d,
E
f d lim inf
fn d.
E
f d < +.
E
Then for any > 0 there exists > 0 so that for any measurable set A E with (A) <
we have
f d < .
A
Proof. Suppose that this fails. Then there exists 0 > 0 and a sequence of sets An E so
that (An ) < 1/2n but
f d 0 .
An
24
Consider the functions gn (x) = f (x)An (x), then gn (x) 0 as n except for points x
which lie in infinitely many An s, that is,
!
\
[
xA=
Aj .
n=1
j=n
Aj
j=n
(Aj )
j=n
1
2n1
It follows that (A) = 0 and thus gn (x) 0 a.e. on E. Now, set fn = f gn , then fn 0
and fn f a.e., so Fatous lemma can be applied to fn :
f d lim inf
E
fn d
E
f d lim sup
E
gn d
E
f d 0 ,
E
which is a contradiction. 2
Convergence in probability
Definition 3.12 A sequence of measurable functions fn converges in probability to a function
f on a set E if for every > 0 there exists N such that for all n N we have
(x E : |fn (x) f (x)| ) < .
It is quite easy to see that convergence in probability need not imply point-wise convergence
anywhere: take a sequence
!
n
X
1
sn =
(mod1)
k
k=1
and consider the functions
[sn ,sn+1 ] (x),
if 0 sn < sn+1 1
n (x) =
[0,sn+1 ] (x) + [sn ,1] (x), if 0 sn+1 < sn 1.
Then n 0 in probability but n (x) does not go to zero point-wise anywhere on [0, 1].
Nevertheless, convergence in probability implies point-wise convergence along a subsequence.
Proposition 3.13 Assume that fn converges to f in probability on a set E. Then there exists
a subsequence fnk which converges to f (x) point-wise a.e. on E.
Proof. For any j we can find a number Nj such that for any n > Nj we have
1
1
x E : |f (x) fn (x)| j j .
2
2
Define the bad sets
Ej =
1
x E : |f (x) fNj (x)| j
2
25
,
then for x
/ Dk =
j=k Ej we have
|f (x) fN (j) (x)| <
1
2j
We will now address for some time the question of when the Newton-Leibnitz formula
(4.1)
holds. Recall that we denote by m(E) the Lebesgue measure of a set E R on the line.
4.1
Proof. Let O be an open set with m(O) < + which contains E: E O. Such set exists
since m (E) < +. As O is an open set and J is a fine cover of E, if we consider the
collection J 0 of intervals in J which are contained in O, the new cover J 0 is still a fine cover
of E. Hence, we may assume from the start that all intervals in J are contained in O. Choose
any interval I1 and assume that the intervals I1 , I2 , . . . In have been already chosen. Here is
how we choose the interval In+1 . Let kn be the supremum of the lengths of intervals in J
that do not intersect any of I1 , I2 , . . . In . Then kn m(O) < + and, moreover, if kn = 0
then E nj=1 Ij . Indeed, if kn = 0 and x En = E Dn , Dn = (nj=1 Ij )c then as Dn is
open and J is a fine cover, there exists a non-trivial interval I J such that I Dnc which
contradicts kn = 0. Hence, if kn = 0 for some n then E nj=1 Ij and we are done. If kn > 0
for all n take the interval In+1 disjoint from all of Ij with 1 j n such that l(In+1 ) kn /2.
This produces a sequence of disjoint intervals In such that
X
l(In ) m(O) < +.
(4.2)
n
26
l(Ij ) <
j=N +1
and set
R=E\
N
[
Ij .
j=1
We need to verify that m (R) < . For any point x R there exists an interval I J such
that x I and I is disjoint from all intervals {I1 , I2 , . . . , IN }. Furthermore, if for some n the
interval I is disjoint from intervals {I1 , I2 , . . . In } then we have
l(I) kn < 2l(In+1 ).
(4.3)
However, (4.2) implies that l(In ) 0 as n +, thus I must intersect some interval In
with n > N because of (4.3). Let n0 be the smallest such n, then l(I) kn0 1 2l(In0 ).
Since x I and I intersects In0 , the distance from x to the midpoint of In0 is at most
l(I) +
l(In0 )
5l(In0 1 )
.
2
2
Hence, x lies in the interval In0 which has the same midpoint as In0 and is five times as long
as In0 . Therefore, the set R is covered:
R
In ,
n=N +1
and thus
m (R) <
l(In ) 5
n=N +1
l(In ) < ,
n=N +1
B
B,
BF
BJ
27
Corollary 4.4 Assume that F is a fine cover of a set A by closed balls and
sup{diamB : B F} < +.
Then there exists a countable sub-family J of disjoint balls in F such that for each finite
subset {B1 , . . . , Bn } F we have
A\
m
[
k=1
Bk
B.
BJ \{B1 ,...,Bn }
The next corollary uses the Vitali lemma repeatedly. Here we have to refer to the ndimensional Lebesgue measure. The reader may either set n = 1 or use the geometric
intuition.
Corollary 4.5 Let U Rn be an open set and > 0. There exists a countable collection of
disjoint closed balls in U such that diamB for al B J and
!
[
m U\
B = 0.
(4.4)
BJ
N1
[
Bj .
j=1
The set U1 is still open and we can find disjoint closed balls B2,1 , . . . , B2,N2 U1 so that
!
N2
[
m(U1 )
.
m U1 \
B2,j <
3
j=1
Continuing this procedure leads to a sequence of disjoint balls Bn so that (4.4) holds. 2
A key point in the proof of Vitalis lemma was the fact that the Lebesgue measure is
doubling. This means that there exists a constant c > 0 so that for any ball B(x, r) we
have a bound m(B(x, 2r)) cm(B(x, r)). Such property is not true in general, for arbitrary
measures. A difficult extension of Vitalis lemma and in particular of Corollary 4.5 is the
Besikovitch theorem that we will encounter soon which will establish this corollary for nondoubling measures.
28
One-sided derivatives
Let us go back to the question of when (4.1) holds. First, we need the definition of the
derivative and we begin with the definition of left and right derivatives.
Definition 4.6 Let f be a real-valued function defined on the real line, then
f (x + h) f (x)
f (x) f (x h)
, D f (x) = lim sup
h
h
h0
h0
f (x) f (x h)
f (x + h) f (x)
, D f (x) = lim inf
.
D+ f (x) = lim inf
h0
h0
h
h
and we will show that m (Ers ) = 0 for all r, s Q. Let l = m (Ers ) and given > 0 enclose
Ers in an open set O, Ers O, with mO < l + . For each x Ers there exists an arbitrary
small interval [x h, x] O such that f (x) f (x h) < sh. Using Vitalis lemma we can
choose a finite subcollection {I1 , . . . , IN } of such disjoint intervals whose interiors cover a set
o
A = (N
n=1 In ) Ers with l < m(A) < l + . It follows that
N
X
n=1
N
X
hn < s(l + ).
(4.5)
n=1
Next, take any point y A, then y In for some n, and, as A Ers , there exists an arbitrary
small interval [y, y + k] In such that f (y + k) f (y) > rk. Using Vitalis lemma again we
may choose intervals {J1 , . . . , JM } such that J1 , . . . , JM N
n=1 In and
m (A \
M
[
Jl ) < .
l=1
As a consequence,
M
X
n=1
29
and thus
M
X
n=1
M
X
(4.6)
n=1
On the other hand, each interval Jk is contained in some interval Ip and f is increasing so
that for each p:
X
(f (yk + hk ) f (yk )) f (xp ) f (xp hp ).
Jk Ip
Summing over p and taking into account (4.5) and (4.6) we conclude that s(l + ) r(l 2).
As this is true for all > 0, and r > s it follows that l = 0 so that m (Ers ) = 0 for all
r, s Q, and thus m (E) = 0.
Now that we know that f 0 (x) exists a.e. let us show that f 0 (x) is a measurable function.
Let us extend f (x) = f (b) for x b and set
1
(4.7)
gn (x) = n f (x + ) f (x) .
n
Then
f 0 (x) = lim gn (x)
(4.8)
(4.9)
Proof. The function f 0 (x) is measurable according to Theorem 4.7 hence the integral in the
left side of (4.9) is well defined. Let us define the approximations gn (x) by (4.7), once again
with the convention f (x) = f (b) for x > b, then gn (x) 0, thus f 0 (x) 0 by (4.8), and,
moreover, Fatous lemma implies that
b
b
b
1
0
f (x)dx lim inf
gn (x)dx = lim inf
n f (x + ) f (x) dx
n
a
a
a
b+1/n
a+1/n
b+1/n
a+1/n
f (b)dx n
= lim inf n
n
f (b)dx n
f (a)dx
a
= f (b) f (a),
and (4.9) follows. As a consequence of (4.9) we also conclude that f 0 (x) is finite a.e. 2
30
4.2
Let a = x0 < x1 < . . . < xm1 < xm = b be a partition of an interval [a, b]. For a fixed
partition we define
p=
m
X
k=1
m
X
[f (xk ) f (xk1 )] , t = n + p =
k=1
n
X
k=1
The total variation of a function f over an interval [a, b] is Tab [f ] = sup t, where supremum is
taken over all partitions on [a, b]. Similarly, we define Nab [f ] = sup n and Pab [f ] = sup p.
Definition 4.9 We say that f has a bounded total variation on [a, b] and write f BV [a, b]
if Tab [f ] < +.
The simplest example of function of bounded variation is a monotonic function on [a, b] as
Tab [f ] = |f (b) f (a)| for monotonic functions. It turns out that all functions in BV [a, b] are
a difference of two monotonic functions.
Theorem 4.10 A function f has a bounded variation on an interval [a, b] if and only if f is
a difference of two monotonic functions.
Proof. (1) Assume that f BV [a, b]. We claim that
f (x) f (a) = Pax [f ] Nax [f ].
(4.10)
|f (xi ) f (xi1 )|
i=1
n
X
i=1
n
X
|u(xi ) u(xi1 )| +
i=1
(u(xi ) u(xi1 )) +
n
X
n
X
|v(xi ) v(xi1 )|
i=1
i=1
31
Differentiation of an integral
Theorem 4.12 Let f L1 [a, b] be an integrable function, and
f (t)dt,
F (x) =
a
|F (xi ) F (xi1 |
i=1
n
X
xi
|f (t)|dt.
a
xi1
i=1
|f (t)|dt
f (s)ds = 0
(4.11)
0=
f (t)dt =
a
f (t)dt +
F
f (t)dt.
O
It follows that
f (t)dt < 0,
O
and thus, as O is a disjoint union of open intervals, there exists an interval (, ) O such
that
f (t)dt < 0,
x+1/n
f (x)dx.
x
These functions are uniformly bounded: |fn (x)| K and fn (x) F (x) a.e. The bounded
convergence theorem implies that for all x [a, b] we have
"
#
x
F 0 (t)dt = lim
a
x+1/n
a+1/n
F (t)dt n
fn (t)dt = lim n
x
32
The last step above follows from the continuity of the function F (t). Now, Lemma 4.13
implies that F 0 (x) = f (x) a.e. on [a, b].
Finally, consider the situation when f L1 [a, b] but is maybe unbounded. Without loss
of generality we may assume that f 0. Consider the cut-offs gn (x) = min{f (x), n}. Then
f gn 0, thus the functions
(f gn ))dt
Gn (x) =
a
are increasing, hence G0n (x) 0 a.e. As the functions gn are bounded for each n fixed, we
know from the first part of the proof that
d
dx
gn (t)dt = gn (x)
a
almost everywhere. It follows that F 0 (x) = G0n (x) + gn (x) gn (x) and, in particular, F 0 (x)
exists almost everywhere. Passing to the limit n we deduce that F 0 (x) f (x) a.e.
which, in turn, implies that
b
0
F (x)dx
a
f (t)dt.
a
Note that absolute continuity of a function f on [a, b] implies that f has a bounded variation
on [a, b]. To see this, simply take 0 in the definition of absolute continuity that corresponds
to = 1 and split [a, b] into a finite number of collections of non-overlapping intervals, each
of the total length less than 0 .
Another simple observation is that Proposition 3.11 implies that every indefinite integral
F (x) = F (a) +
f (t)dt
a
33
(4.12)
with f L1 [a, b] is absolutely continuous. Our goal is to show that every absolutely continuous
function is the indefinite integral of its derivative, that is, the Newton-Leibnitz formula holds
for absolutely continuous functions.
Theorem 4.15 A function F (x) is an indefinite integral, that is, it has the form (4.12) with
f L1 [a, b] if and only if F is absolutely continuous.
Proof. As we have mentioned, absolute continuity of the indefinite integral follows immediately from Proposition 3.11. Now, let F (x) be absolutely continuous, then, as we have noted
above F has bounded variation on [a, b] and thus can be written as F (x) = F1 (x) F2 (x),
where both of the functions F1 and F2 are increasing. Hence, F 0 (x) exists a.e. and |F 0 (x)|
F10 (x) + F20 (x) so that
F10 (x)dx
|F (x)|dx
a
|F 0 (t)|dt,
G(x) =
a
then G(x) is absolutely continuous and G0 (x) = F 0 (x) a.e. as follows from Theorem 4.12. Set
R(x) = F (x) G(x), then R(x) is absolutely continuous and R0 (x) = 0 a.e. Let us show that
R(x) is actually a constant (and thus is equal identically to F (a)). This will finish the proof
of Theorem 4.15. To this end we take a point c [a, b] and consider the set A of measure
m(A) = c a such that f 0 (x) = 0 on A. Given > 0 for any x A and every n < N (x) we
choose hn (x) < 1/n so that
|f (x + hn (x)) f (x)| < hn (x).
(4.13)
This produces a fine covering of A by intervals of the form In (x) = [x, x + hn (x)]. Vitalis
lemma allows us to find a finite collection Ik (xk ) = [xk , yk ], k = 1, . . . , N which covers a set
of measure (c a ()/2), where () is in the definition of absolute continuity of the
function R(x) corresponding to , that is, if we set y0 = a and xN +1 = c, we have
N
X
|xk+1 yk | < .
(4.14)
k=0
N
X
k=1
|f (yk ) f (xk )| +
N
X
|f (xk+1 ) f (yk )| (b a) + .
k=1
As > 0 is arbitrary, we deduce that R(x) = R(a) for all c [a, b]. 2
A common way to re-phrase Theorem 4.15 is to say that every absolutely continuous
function is the integral of its derivative this identifies functions which satisfy the NewtonLeibnitz formula.
34
X
( ) (S) = inf
(Aj )(Bj ) ,
j=1
j=1 (Aj
Bj ).
Our goal in this section is to prove basic statements familiar from the calculus course regarding
the connection between the iterated integrals and integrals over the product measure.
Let F be the collection of sets S X Y for which the iterated integral can be defined,
that is, the characteristic function S (x, y) is -measurable for -a.e. y Y and the function
s(y) =
S (x, y)d(x)
X
(A B) =
(A)d(y) = (A)(B).
B
From the way area is defined in elementary geometry we know that the next level of complexity
should be countable unions of such sets:
(
)
[
P1 =
(Aj Bj ) : Aj X is -measurable, and Bj Y is -measurable .
j=1
S
Note that every set S =
j=1 (Aj Bj ) P1 is in F. The point is that, using further
subdivision of Aj and Bj such S can be written as a disjoint countable union with
(Aj Bj ) (An Bn ) = for j 6= n.
Then for each y the cross-section {x : (x, y) S} is an at most countable union of
-measurable disjoint sets, and
S (x, y)dx =
X
X
Aj (x)Bj (y)dx =
X j=1
X
j=1
35
X
(S) =
(Aj )(Bj ).
j=1 (Aj
Bj ) P1 is a disjoint
j=1
Next, we note that for each set U X Y its outer measure can be approximated as in
elementary geometry:
( ) (U ) = inf{(S) : U S, S P1 }.
(5.1)
S
Indeed, this is somewhat tautological: if U S = j=1 (Aj Bj ) P1 then
!
X
X
(S) =
S (x, y)dx dy
Aj (x)Bj (y)dx dy =
(Aj )(Bj ).
Y
X j=1
j=1
As () (U ) is the infimum of all possible right sides above, by the definition of the product
measure we have
inf (S) ( ) (U ).
On the other hand, any such S can be written as a disjoint union and then
(S) =
(Aj )(Bj ) ( ) (U ).
j=1
again by the definition of the product measure. Hence, (5.1) holds. Now, we can show that
a product of two measurable sets is measurable.
Proposition 5.2 Let a set A X be -measurable and a set B Y be -measurable. Then
the set A B X Y is measurable.
Proof. Take a set S = A B such that A is -measurable and B is -measurable. Then S
is in P0 , thus in P1 so that
( ) (S) (A)(B) = (S) (R)
for all R P1 containing S. It follows from (5.1) that ( ) (A B) = (A)(B). Let us
show that A B is -measurable. Take any set T X Y and a P1 -set R containing
T . Then the sets R (A B)c and R (A B) are both disjoint and in P1 . Hence,
( ) (T (A B)c ) + ( ) (T (A B)) (R (A B)c ) + (R (A B)) = (R),
because if R and Q are in P1 , R Q = then (R Q) = (R) + (Q). Taking infimum over
all such R and using (5.1) we arrive to
( )(T (A B)c ) + ( )(T (A B)) ( )(T ),
and thus A B is a measurable set. 2
Once again, following the motivation from approximating areas in elementary geometry
we define sets that are countable intersections of those in P1 :
P2 = {
Sj , Sj P1 }.
j=1
36
Proposition 5.3 For each set S X Y there exists a set R P2 F such that S R
and (R) = ( ) (S).
Proof. If ( ) (S) = + it suffices to take R = X Y , so we may assume that
( ) (S) < + without loss of generality. Using (5.1) choose the sets Rj P1 such that
S Rj and
1
(Rj ) < ( ) (S) + .
j
T
Tk
Consider the sets R = j=1 Rj P2 and Qk = j=1 Rj and note that
R (x, y) = lim Qk (x, y).
k
R (y) =
Qk (x, y)d(x),
X
and thus R (y) is -integrable and R F. As k (y) 1 (y), it also follows that
(R) =
R (y)d(y) =
Y
Y k
(5.2)
On the other hand, since S Qk we know that ( ) (S) (Qk ) and thus (R) =
( ) (S). 2
Corollary 5.4 The measure is regular even if and are not regular.
Proof. Proposition 5.2 implies that each set in P2 is measurable, while Proposition 5.3
implies that for S P2 we have ( )(S) = (S). The same proposition implies then that
the measure is regular. 2
S
Definition 5.5 A set X is -finite if X =
j=1 Bk and the sets Bk are -measurable with
(Bk ) < +.
Theorem 5.6 (Fubini) Let a set S X Y be -finite with respect to the measure .
Then the cross-section Sy = {x : (x, y) S} is -measurable for -a.e. y, the cross-section
Sx = {x : (x, y) S} is -measurable for -a.e. x, (Sy ) is a -measurable function of y,
and (Sx ) is a -measurable function of x. Moreover,
( )(S) =
(Sy )dy =
Y
(Sx )dx .
X
37
(5.3)
Proof. If ( )(S) = 0 then there exists a set R P2 such that S R and (R) = 0.
Since S (x, y) R (x, y) it follows that S F and (S) = 0.
Now, let S X Y be -measurable and ( )(S) < +. Then there exists
R P2 , such that S R and ( )(R \ S) = 0, thus, by the above argument, (R \ S) = 0.
This means that
(x : (x, y) S) = (x : (x, y) R)
for -a.e. y and thus, as R P2 implies ( )(R) = (R),
(x : (x, y) R)d =
Y
(x : (x, y) S)d,
Y
which is (5.3).
Finally, assume thatSS is a -finite set and ( )(S) = +. Then S can be written as
a countable union S =
j=1 Bj of ( )-measurable sets Bj with ( )(Bj ) < +. We
may assume without loss of generality that all Bj are pairwise disjoint so that by what we
have just proved
( )(S) =
( )(Bj ) =
j=1
j=1
( : (x, y)
X
(x : (x, y) Bj )d =
X
(x : (x, y) Bj )d
Y j=1
(x : (x, y) S)d,
Bj )d =
Y
j=1
f (x, y)d(x)
p(y) =
X
q(x) =
f (x, y)d(y)
Y
is -measurable and
f d( ) =
XY
p(y)d(y) =
Y
q(x)d(x).
(5.4)
Proof. This follows immediately from Theorem 5.6 if f (x, y) = S (x, y) with a ( )measurable set S. If f 0 use Theorem 2.6 to write
f (x, y) =
X
1
k=1
Ak (x, y)
and then use Corollary 3.9 to integrate this relation term-wise leading both to
f (x, y)d(y) =
Y
X
1
k=1
38
(y : (x, y) Ak ),
( )(Ak ) =
(y : (x, y) Ak )d(x)
k
k X
k=1
f (x, y)d(y) d(x),
=
f d( ) =
XY
X
1
X
1
k=1
which is (5.4). 2
6
6.1
The Besicovtich theorem is a tool to study measures on Rn which do not have the doubling
in terms of (B) as in
property. The idea is to bypass having to control the measure (B)
the proof of Vitalis lemma. Here the doubling property means the following: there exists a
constant C > 0 so that for any x Rn and r > 0 we have
1
(B(x, 2r)) (B(x, r)) C(B(x, 2r)).
C
In dealing with measures which may not have this property the following theorem is extremely
helpful
Theorem 6.1 (The Besicovitch theorem.) There exists a constant N (n) depending only on
the dimension with the following property: if F is any collection of closed balls in Rn with
B
F < +
D = sup diamB|
F then there exist J1 , J2 , . . . , JN (n) such that each Jk
and A is the set of centers of balls B
is a countable collection of disjoint balls in F and
N (n)
[ [
B.
j
j=1 BJ
The key point here is that we do not have to stretch the balls as in the corollaries of Vitalis
lemma the price to pay is that we have several collections J1 , J2 , . . . , JN (n) , and a ball
from a collection Ji may intersect a ball from another collection Jj if i 6= j. However, this
is not that important since the number N (n) is a universal constant depending only on the
dimension n.
Corollary 6.2 Let be a Borel measure on Rn and F any collection of non-degenerate
closed balls. Let A denote the set of centers of the balls in F. Assume that (A) < +
r) F} = 0 for all a A. Then for each open set U Rn there exists a
and inf{r : B(a,
S
U and
countable collection J of pairwise disjoint balls in F such that BJ
B
[
= 0.
((A U ) \
B)
(6.1)
BJ
39
Proof. Let N (n) be the number of required collections in the Besicovitch theorem and take
= 1 1/(2N (n)). Then, using the Besicovitch theorem we may find a countable collection
: B
F, B
U, diamB
1} such that
J of disjoint balls in F1 = {B
[
1 (A U ).
(A U ) (
B)
N (n)
BJ
1 , . . . , B
M1
Therefore, using the increasing sets theorem, we may choose a finite sub-collection B
of J such that
!
M1
[
j ) 1 (A U ).
(A U ) ( B
2N (n)
j=1
It follows that
M1
[
1
(A U ).
(A U ) \ (
2N
(n)
j=1
S
M1
and the collection
Applying the same reasoning to the set U2 = U \
j=1 Bj
j )
B
1
: B
F, B
U2 , diamB
1}
F2 = {B
M1 +1 , . . . , B
M2 such that
we get a finite set of balls B
!
M2
[
j ) 1
(A U2 ) \ (
B
j=M1 +1
1
2N (n)
(A U2 ).
It follows that
(A U ) \ (
M2
[
!
j )
B
= (A U2 ) \ (
j=1
1
1
2N (n)
M2
[
!
j )
B
j=M1 +1
2
1
1
2N (n)
(A U2 )
(A U ).
1 , . . . , B
M so that
Continuing this procedure, for each k we obtain a finite collection of balls B
k
!
k
Mk
[
1
j ) 1
(A U ).
(A U ) \ ( B
2N (n)
j=1
1 , B
2 , . . . , B
k , . . .} satisfies (6.1). 2
Then the collection J = {B
6.2
The proof of this theorem proceeds in several technical steps. Step 1 is to reduce the problem
to the situation when the set A of the centers is bounded. Step 2 is to choose the balls
1 , B
2 , . . . , B
n , . . . this procedure is quite similar to that in Vitalis lemma. Step 3 is to
B
40
show that the balls we have chosen cover the set A. The last step is to to show that the
j can be split into N (n) separate sub-collections Jk , k = 1, . . . , N (n) such that each
balls B
Jk itself is a collection of pair-wise disjoint balls. For that one has to estimate how many of
1 , . . . , B
k1 the ball B
k intersects it turns out that this number depends only on
the balls B
the dimension (and not on k, the set A or anything else) and that is the number N (n) we are
looking for. The crux of the matter is in this estimate and it is not trivial.
k may
Reduction to counting the number of balls a given ball B
intersect
Let us first explain why our main interest is in estimating how many of the preceding balls
1 , . . . , B
k1 the ball B
k intersects.
B
1 , B
2 , . . . , B
n , . . . be a countable collection F of balls in Rn . Assume that
Lemma 6.3 Let B
n intersects at most M balls out of {B
1 , B
2 , . . . , B
n1 }.
there exists M > 0 so that each ball B
Then the collection F can be split into (M + 1) sub-collections J1 , J2 , . . . , JM +1 so that each
Jm is a collection of pair-wise disjoint balls and
[
BF
=
B
M
+1
[
B.
j
j=1 BJ
M +1 which goes
the following way: B1 goes into the basket J1 , B2 into J2 , and so on until B
1 , . . . , B
k1 were already put
into JM +1 . After that we proceed as follows: assume the balls B
k by assumption only M out the M + 1 baskets may contain
into baskets. Take the ball B
just put Bk into one of such baskets, it does not matter which one). Then we go to the next
k+1 , and so on. 2
ball B
centers
A of all balls
B F is bounded. Assume now that A is unbounded. Set D =
while
x2 {x : 3D(m 1) D/2 |x| < 3Dm + D/2},
1 + diamB
2 . Therefore, if we double the number N (n) needed
thus, |x1 x2 | 2D > diamB
to cover a bounded set we can set up the baskets as in the proof of Lemma 6.3 and cover an
unbounded set A by 2N (n) countable collections of disjoint balls.
Remark. From now on we assume that the set A is bounded.
3 D
.
4 2
Aj = A \
i
B
i=1
be the subset of A not covered by the first (j 1) balls. If Aj = , stop and set the counter
j may intersect each other
J = j (note that even in that case we are not done yet the balls B
and we still have to distribute them into N (n) baskets so that balls inside each basket do not
j = B(a
j , rj ) such that aj Aj and
intersect). If Aj 6= choose B
rj
3
r) F, a Aj .
sup r : B(a,
4
3
r) F, a Ai 3 rj ,
sup r : B(a,
4
4
as claimed. 2
j by a factor of three, the resulting
The next lemma shows that if we shrink the balls B
balls are disjoint without having to put them into any kind of separate sub-collections.
42
ri 2 3
ri rj
ri rj
ri 2ri
+
+ rj +
> + .
3
3
3
3 4
3
2
3
3
(a, rj /3)
B
j=1
|rj |n < +.
j=1
J
[
j , rj ).
B(a
j=1
Proof. If J <S this is obvious the only reason we can stop at a finite J is if the whole set A
j , rj ). Suppose J = and let a A be a center of a ball B(a,
r) F.
is covered by Jj=1 B(a
S
Assume that a is not in the union j=1 B(aj , rj ). Lemma 6.6 implies that there exists j such
S
that rj < 3r/4. This is a contradiction: the point a is not in the set j1
i=1 B(ai , ri ), hence
the ball B(a, r) was a candidate ball at stage j and its radius
S r satisfies r > 4rj /3 this is
impossible. Hence, no point in A can fail to be in the set j=1 B(aj , rj ), and we are done. 2
43
This proposition together with Lemma 6.3 completes the proof of the Besicovitch Theorem.
Hence, all we need to is to prove Proposition 6.8. The proof is rather technical. We will do
j , j = 1, . . . , m 1,
it in two steps. Given m N we will split the set of preceding balls B
m and the bad ones that do. Further, we
into the good ones which do not intersect B
m ) and large balls. Next, we will estimate
split the bad ones into small (relative to B
the number of small bad balls by 20n . Estimating the number of large balls is the final and
more daunting task.
To begin we fix a positive integer m and define the set of bad preceding indices
j B
m 6= }.
Im = {j : 1 j m 1, , B
Out of these we first consider the small bad balls:
Km = Im {j : rj 3rm }.
j , rj /3)
the stretched ball B(am , 5rm ). As Lemma 6.5 tells us that all the balls of the form B(a
are disjoint, it will follow that
X rjn
n n
5 rm
.
(6.2)
3n
jK
m
However, as j < k, we know from Lemma 6.4 that rj 3rm /4, and thus (6.2) implies that
n
5n rm
n
n
X rjn
3n rm
|Km |rm
|K
|
=
,
m
3n
4n 3n
4n
jK
m
j , rj /3)
and thus |Km | 20n . Thus, we need to show only that if j Km then B(a
m , 5rm ). To see that take a point x B(a
j , rj /3), then, as B
j and B
m intersect, and
B(a
rj 3rm , we have
|x am | |x aj | + |aj am |
rj
4
+ rj + rm = rj + rm 4rm + rm 5rm .
3
3
Proposition 6.10 There exists a number Ln which depends only on dimension n such that
the cardinality of the set Pm satisfies |Pm | Ln .
We will assume without loss of generality that the center am = 0. The key to the proof of
Proposition 6.10 is the following lemma which shows that the balls in the set Pm are sparsely
distributed in space.
Lemma 6.11 Let i, j Pm with i 6= j, and let be the angle between the two lines (ai , 0)
= 0 > 0.
and (aj , 0) that connect the centers ai and aj to am = 0. Then cos1 61
64
Before proving this technical lemma let us finish the proof of Proposition 6.10 assuming the
statement of Lemma 6.11 holds. To this end let r0 > 0 be such that if a point x Rn lies on
r0 ) are two points in a (small) ball of radius
the unit sphere in Rn , |x| = 1, and y, z B(x,
r0 around x then the angle between the lines connecting the points y and z to zero is less
than 0 from Lemma 6.11. Choose Ln so that the unit sphere {|x| = 1} Rn can be covered
by Ln balls of radius r0 but not Ln 1. Then Lemma 6.11 implies that |Pm | Ln . Indeed,
if i, j Pm then, according to this lemma, the rays connecting aj and ai to am = 0 have an
angle larger than 0 between them and thus they may not intersect the same ball of radius r0
with the center on the unit sphere. Therefore, their total number is at most Ln . 2
m and B
j , hence |ai | < rm + ri ,
addition, the balls Bm and Bi intersect, and so do the balls B
and |aj | < rm + rj . Moreover, as i, j Pm , we have ri > 3rm and rj > 3rm . Let us put these
facts together:
3rm < ri < |ai | ri + rm ,
3rm < rj < |aj | rj + rm ,
|ai | |aj |.
We claim that
|ai aj | |aj | if cos > 5/6.
45
(6.3)
< ,
2|ai ||aj |
2|ai ||aj |
2|aj |
2
6
which contradicts assumptions of Lemma 6.12. Therefore, |ai aj | |aj | is impossible and
j , |aj |) but we need a stronger condition
thus |ai aj | |aj |. This already implies that ai B(a
j , |aj |)).
ai B(aj , rj ) (recall that rj < |aj | so the ball B(aj , rj ) is smaller than B(a
Assume that ai
/ Bj we will show that this would imply that cos 5/6, which would
j , we have rj < |ai aj |, which, together with (6.3) gives
be a contradiction. As ai
/B
|ai |
(|aj | |ai aj |)(|aj | + |ai aj |)
|ai |2 + |aj |2 |ai aj |2
=
+
2|ai ||aj |
2|aj |
2|ai ||aj |
1 |aj | |ai aj |
1 |aj | rj
1 rj + rm rj
1 (|aj | |ai aj |)2|aj |
+
+
+
+
2
2|ai ||aj |
2
|ai |
2
ri
2
ri
1 rm
1 1
5
+
+ = .
2
ri
2 3
6
cos =
8|aj |
(1 cos ).
3
(6.4)
As by the assumptions of Lemma 6.13 we have ai Bj we must have i < j this follows from
j . Since i < j, we also have aj
the way we chose the balls B
/ Bi , and thus |ai aj | > ri ,
which implies (we also use our assumption that |ai | |aj | in the computation below)
|ai aj | + |ai | |aj | |ai aj | + |aj | |ai |
|ai aj | + |ai | |aj |
|aj |
|aj |
|aj ai |
2
2
2|ai ||aj |(1 cos )
2|ai |(1 cos )
|ai aj | (|ai | |aj |)
=
=
=
|aj ||ai aj |
|aj ||ai aj |
|ai aj |
2(ri + rm )(1 cos )
2 4ri (1 cos )
8(1 cos )
=
,
ri
3ri
3
0
so (6.4) holds.
j we have i < j and aj
i ,
Now, we can show that cos 61/64. Once again, as ai B
/B
so ri < |ai aj | rj , and as i < j we have rj 4ri /3. Therefore, we have
3
rj
rj
|ai aj | + |ai | |aj | ri + ri (rj + rm ) = 2ri rj rm 2 rj rj
=
4
3
6
1 3
rj 1
1
=
rj +
(rj + rm ) |aj |.
6 4
3
8
8
Returning to (6.4) it follows that
1
8|aj |
|aj |
(1 cos ),
8
3
46
and thus cos 61/64. This finishes the proof of Lemma 6.13 and hence that of the Besicovitch theorem! 2
Exercise 6.14 Find the best N (n) in dimensions n = 1 and n = 2. Warning: it is not very
difficult in one dimension but not at all simple in two dimensions.
6.3
Differentiation of measures
Let and be two Radon measures defined on Rn . The density of one measure with respect
to another is defined as follows.
Definition 6.15 We define
r))
(B(x,
lim sup
,
D (x) =
(B(x, r))
r0
+,
and
D (x) =
lim inf
r0
r))
(B(x,
r)) ,
(B(x,
+,
47
It follows that
(A)
(s + )
(B)
BJ
(s + )(U ).
(B)
BJ
Taking infimum over all open sets U containing the set A we obtain that (A) s(A). The
proof of part (ii) is almost identical. 2
Returning to the proof of Theorem 6.16 consider the set I = {x : D (x) = +}. Then
for all s > 0 we have s(I) (I), which means that (I) = 0, as (I) (Rn ) < +.
Moreover, for any b > a if we set Rab = {x : D < a < b < D }, we have, using
Lemma 6.17:
b(Rab ) (Rab ) a(Rab ),
thus (Rab ) = 0. It follows that D (x) exists and is finite -a.e. It remains to show that the
function D (x) is -measurable.
r)) (B(x,
r)) and
Lemma 6.18 For each x Rn and r > 0 we have lim sup (B(y,
yx
r)) (B(x,
r)).
lim sup (B(y,
yx
(6.5)
z
/ B(yk , r), and thus (6.5) holds. It follows that
lim inf (1 fk (z)) 1 B(x,r)
(z),
B(x,2r)
(1 B(x,r)
(z))d lim inf
B(x,2r)
(1 fk (z))d.
D (x) = lim
is also -measurable. 2
48
6.4
Definition 6.19 We say that a measure is absolutely continuous with respect to a measure
and write if for any set A such that (A) = 0 we have (A) = 0.
Theorem 6.20 Let and be Radon measures on Rn and assume that is absolutely continuous with respect to . Then for any -measurable set A we have
(A) =
D (x)d.
(6.6)
Proof. Let A be a -measurable set. We claim that A is also -measurable. Indeed, there
exists a Borel set B such that A B and (B \ A) = 0. As it follows that (B \ A) = 0
so that B \ A is -measurable, and, as B is a Borel set, B is also -measurable. Writing
A = B (B \ A)c we see that A is, indeed, -measurable.
Set now Z = {x : D (x) = 0} and I = {x : D (x) = +}. Then (I) = 0 by
Theorem 6.16 and thus (I) = 0. Moreover, for any R > 0 we have (Z B(0, R))
s(Z B(0, R)) for all s > 0 by Lemma 6.17. It follows that (Z B(0, R)) = 0 for all R > 0
and thus (Z) = 0. Summarizing, we have
(Z) =
(D )d = 0,
and (I) =
(D )d = 0.
(6.7)
+
X
m=
(Am )
+
X
m+1
(Am ) t
m=
+
X
m=
m
t (Am ) t
(D )d,
and
+
1 X m+1
1
t
(Am )
(D )d,
(A) =
(Am )
t (Am ) =
t
t
A
m=
m=
m=
+
X
+
X
S
where A = +
m= Am = A \ (Z I). Passing to the limit t 1 and using (6.7) to replace
A by A as the domain of integration we obtain (6.6). 2
6.5
Definition 6.21 We say that two Radon measures and are mutually singular and write
if there exists a Borel set B such that (Rn \ B) = (B) = 0.
49
Theorem 6.22 (The Lebesgue Decomposition) Let and be Radon measures on Rn . Then
(i) there exist measures ac and s so that = ac + s , and (ii) D (x) = D ac (x)
and D s = 0, both for -a.e. x so that for each Borel set A we have
(D )d + s (A).
(A) =
(6.8)
Proof. As before, since both and are Radon measures we may assume that (Rn ) <
and (Rn ) < +. If one or both of these measures is not finite we would simply restrict both
and to balls B(0, R) and let R + at the end of the proof.
We will define ac and s as ac = |B and s = |B c with an appropriately chosen Borel
set B. Consider the collection
F = {A Rn , A Borel, (Rn \ A) = 0.}
The set B should be, in measure-theoretical sense, the smallest element of F. To this end
choose Bk F such that
1
(Bk ) inf (A) + ,
AF
k
T
and set B = k=1 Bk . Then
n
(R \ B)
(Rn \ Bk ) = 0,
(6.9)
k=1
and thus B F and B is the smallest element of F in the sense that (B) = inf AF (A).
Note that (6.9) implies that s = |B c is mutually singular with . Let us show that ac =
|B is absolutely continuous with respect to . Let A Rn and assume that (A) = 0 but
ac
(A) > 0. Take a Borel set A0 such that A A0 , and (A0 ) = 0, while ac (A0 ) ac
(A) > 0
0
(6.10)
and
= ac (A0 ) > 0.
ac (A)
Now, (6.10) implies that B 0 = B \ A F but (6.11) means that
(6.11)
< (B),
(B 0 ) = (B) (A)
which is a contradiction. Therefore, ac is absolutely continuous with respect to .
Finally, let z > 0, consider the set Cz = {x : D s z}, and write Cz = Cz0 Cz00 with
Cz0 = Cz B, Cz00 = Cz B c . Then (Cz00 ) = 0 since B F, while Lemma 6.17 implies that
z(Cz0 ) s (Cz0 ) s (B) = 0. It follows that D s = 0 -a.e., which, in turn, means that
D = D ac -a.e. Now, Theorem 6.20 implies that (6.8) holds. 2
50
f d.
E
B(x,r)
r0
B(x,r)
(6.12)
Proof. The proof is surprisingly simple based on the Radon-Nikodym theorem. Let us
defined the measures as follows. For a Borel set B we set
(B) =
f d,
(6.13)
with f+ = max(f, 0) and f = max(f, 0), and for an arbitrary set A define
(A) = inf( (B) : A B, B Borel).
Then + and are Radon measures, absolutely continuous with respect to , thus
+ (A) =
D + d, (A) =
A
D d
(6.14)
for all -measurable sets A. Together, (6.13) and (6.14) imply that
D = f -a.e.
Indeed, consider, for instance, the set S = {x : f+ (x) > D + (x)} =
(6.15)
S
Sq , with
(f+ D + )d q(Sq ),
Sq
r0
1
+
r) [ (B(x, r)) (B(x, r))] = D + D = f+ f = f,
r0 (B(x,
f d = lim
B(x,r)
for -a.e. x. 2
The Lebesgue-Besicovitch theorem has several interesting corollaries.
51
Definition 6.24 Let f Lploc (Rn , d) with 1 p < +. A point x is a Lebesgue point of f
lim
r0
B(x,r)
|f (y) f (x)|p dy = 0.
Corollary 6.25 Let be a Radon measure on Rn , 1 p < + and let f Lploc (Rn , d)
with 1 p < +, then
lim
r0
B(x,r)
|f (y) f (x)|p dy = 0
(6.16)
for -a.e. x Rn .
Proof. Let j be a countable dense subset of R then for each j fixed we have
lim
r0
B(x,r)
|f (y) j |p dy = |f (x) j |p
(6.17)
for -a.e. x Rn . Hence, there exists a set S of full measure, (Rn \ S) = 0 so that (6.17)
holds for all j for x S. Next, given x S and > 0 choose j so that |f (x) j |p < /2p ,
then we have
lim sup
r0
B(x,r)
|f (y) f (x)|p dy
B(x,r)
B(x,r)
|j f (x)|p dy 0 + = ,
r0
and
|B(x, r) E|
= 1 for a.e. x E,
|B(x, r)|
|B(x, r) E|
= 0 for a.e. x
/ E.
r0
|B(x, r)|
lim
Proof. This follows immediately from the Lebesgue-Besicovitch theorem applied to the
function f (x) = E (x). 2
Signed measures
7.1
(ii) () = 0.
S
P
(iii)
E
=
j=1 j
j=1 (Ej ) for any sequence Ej of disjoint sets in B and the series
converges absolutely.
Definition 7.2 A set A is positive with respect to a signed measure if A B and (E) 0
for all E A.
Proposition 7.3 Let E be a measurable set, 0 < (E) < +, then there exists a positive
set A E with (A) > 0.
Proof. If E is not positive we construct a sequence of sets A1 , . . . , Ak , . . . as follows. Let n1 be
the smallest integer so that E contains a subset A1 with (A1 ) < 1/n
1 . Then, inductively,
Sk1
having chosen A1 , . . . , Ak1 choose Ak as follows. Set Ek1 = E \ ( j=1 Aj ) and let nk be
the smallest integer so that Ek1 contains a subset Q with (Q) < 1/nk . Finally, take
Ak Ek1 with (Ak ) < 1/nk . This procedure can be continued unless at some step k0 the
set Ek0 is positive. In that case we are done, as
(Ek0 ) = (E)
kX
0 1
j=1
S
On the other hand, if we never stop, we set A = E \
j=1 Aj . Note that, since (E) > 0, we
have
X
|(Aj )| < +,
j=1
7.2
f gdx,
F (f ) =
Rn
where g Lq (Rn ) and kF k kgkLq this follows from the Holder inequality. It turns out
that for 1 p < + all bounded linear functionals on Lp have this form.
Theorem 7.6 Let be a Radon measure, 1 p < +, and F : Lp (Rn , d) R be
a bounded linear functional. Then there exists a unique function g Lq (Rn , d), where
1/p + 1/q = 1, such that F (f ) = Rn f (x)g(x)d for any function f Lp (Rn , d), and
kF k = kgkLq .
Proof. The proof is long but straightforward. First, we construct the only candidate for the
function g rather explicitly in terms of the functional F . Then we checkthat the candidate
g lies in Lq (Rn , d), and, finally, we verify that, indeed, both F (f ) = f gd and kF k =
kgkLq (Rn ,d) .
First, we assume that is a finite measure: (Rn ) < + so that f 1 lies in all
Lp (Rn , d). For a -measurable set E let us set (E) = F (E ). The linearity and boundedness
of F , and finiteness of imply that is a signed measure with
|(E)| kF kkE kLp kF k[(E)]1/p kF k[(Rn )]1/p .
(7.1)
Let us decompose = + as in Corollary 7.5, and also use the Hahn decomposition of
Rn relative to : Rn = A B, so that + supported in A, and supported in B. Then (7.1)
implies that
+ (E) = (A E) = |(A E)| kF k[(A E)]1/p kF k[(E)]1/p ,
(7.2)
and thus + (and also by the same argument) is absolutely continuous with respect to .
Therefore, has the Radon-Nikodym derivative g(x)
(E) =
gd,
E
54
Let us now show that g Lq (Rn , d), where 1/p + 1/q = 1. It follows from the definition
of g that for any simple function which takes only finitely many values we have
F () =
gd.
(7.3)
Since F is a bounded linear functional and as for 1 p < + any simple function
Lp (Rn , d) of the form
X
(x) =
aj Aj (x)
j=1
N
X
aj Aj (x),
j=1
that is, k N kLp 0 as N +, (7.3) holds for all simple functions Lp (Rn , d)
and not only those that take finitely many values. Assume that 1 < p < + and let n be a
point-wise non-decreasing sequence of simple functions which take finitely many values such
1/q
that n |g|. Set n = (n )1/p sgn g, then
1/p
n d
,
kn kLp =
thus
n d =
n1/p+1/q d
1/q
|n |
= F (n ) kF kkn kLp
It follows that
|n |d |g||n |d =
1/p
.
kF k
n d
gn d
1/q
kF k
n d
and thus
|g|q d kF kq
(7.4)
by the Monotone Convergence Theorem, hence g Lq (Rn , d) and kgkLq (Rn ,d) kF k.
In order to finish the proof, note that, as g Lq (Rn , d), the linear functional
G(f ) =
f gd
is bounded: kGk kgkLq . Moreover, G() = F () for any simple function in Lp (Rn , d).
As simple functions are dense in this space, and both G and F are bounded functionals, it
follows that G(f ) = F (f ) for all f Lp (Rn , d), thus
F (f ) =
55
f gd
for all f Lp (Rn , d). Hence, kF k kgkLq , which, together with (7.4) implies that kF k =
kgkLq .
When the measure is not finite, consider the balls BR = B(0, R) and the restrictions
R = |BR . Define also the bounded linear functionals FR (f ) = F (f BR ). Then
|FR (f )| kF kkf R kLp (R,d) = kF kkf kLp (R,dR ),
so that FR is a bounded linear functional on Lp (R, dR ). It follows that there exists a unique
function gR Lq (R, dR ) such that
FR (f ) =
f gR dR ,
and kgR kLq (R,dR ) = kFR k kF k. We may assume without loss of generality that gR vanishes
outside of B(0, R). Given R0 > R00 the natural restriction of FR0 to Lp (R, dR00 ) coincides
with FR00 . Then uniqueness of the kernel gR00 implies that gR0 (x) = gR00 (x) for x B(0, R00 ).
Hence, we may pass to the limit R and Fatous lemma implies that the limit g(x) is in
Lq (Rn , d) with kgkLq (Rn ,d) kF k. Taking f = |g|q/p sgn g we note that
F (f ) =
q/p
which means that kgkLq (Rn ,d) kF k and thus kgkLq (Rn ,d) = kF k.
It remains only to show that for p = 1 we have kF k = kgkL (Rn ,d) , and it suffices to show
that kgkL (Rn ,d) kF k. Take any > 0 and consider the set
A = {x : |g(x)| > (1 )kgkL (Rn ,d) }.
Then (A ) > 0 so we can choose a subset B A with 0 < (B ) < +. Consider the
function f (x) = (sgn g)B (x), then
F (f ) =
|g|d (1 )kgkL (Rn ,d) (B ) = (1 )kgkL (Rn ,d) kf kL1 (Rn ,d) ,
f gd =
B
7.3
Theorem 7.7 Let L : Cc (Rn ; Rm ) R be a linear functional such that for each compact set
K we have
sup{L(f ) : f Cc (Rn ; Rm ), |f | 1, suppf K} < +.
(7.5)
Then there exists a Radon measure on Rn and a -measurable
function : Rn Rm such
that (i) |(x)| = 1 for -a.e. x Rn , and (ii) L(f ) =
(7.6)
It turns out that is actually a linear functional on Cc+ (Rn ). Moreover, we will show that
has an explicit form
(f ) =
f d.
(7.7)
Rn
The function will come about as follows: for every unit vector e Rm , |e| = 1, we define a
linear functional e on Cc (Rn ) by
e (f ) = L(f e).
(7.8)
We will extend e to a bounded linear functional on L1 (Rn , d) and use the Riesz representation theorem for L1 (Rn , d) to find a function e L (Rn ) so that
e (f ) =
f e d
m
X
L((f ej )ej ) =
j=1
m
X
ej (f ej ) =
Pm
j=1
m
X
(f ej )ej d =
(f )d,
j=1
j=1
j (x) 1 on Kg .
j=1
Then g =
Pk
j=1
k
X
|L(gj )|
j=1
k
X
(Vj ).
j=1
Next, let A and Aj , j 1 be arbitrary sets with A j=1 Aj . Given > 0 choose open sets
S
Vj such that Aj Vj and (Aj ) (Vj ) /2j . Then A V :=
j=1 Vj and thus
(A) (V )
X
j=1
(Vj )
X
j=1
57
X
(Aj ).
(Aj ) + j = +
2
j=1
(7.9)
simply be the definition of . Then for any pair of sets A1 and A2 with dist(A1 , A2 ) > 0 and
we can find sets V1 and V2 with dist(V1 , V2 ) > 0 which contain A1 and A2 , respectively. Then,
for any open set V containing A1 A2 we can set U1 = V V1 , U2 = V V2 , then (7.9) implies
that
(V ) = (U1 ) + (U2 ) (A1 ) + (A2 ),
thus (A1 A2 ) (A1 ) + (A2 ), and the measure is Borel. The definition of as
an outer measure immediately implies that is Borel regular: for any set A we canTchoose
open sets Vk containing A such that (Vk ) (Ak ) + 1/k, then the Borel set V =
k=1 Vk
contains A and (V ) = (A). Finally, (7.5) and the definition of imply that (K) < +
for any compact set K and thus is a Radon measure.
Step 2. Next, in order to show that e introduced in (7.8) is a bounded linear functional,
consider first the functional defined by (7.6) on Cc+ (Rn ). Let us show that is linear, that
is,
(f1 + f2 ) = (f1 ) + (f2 ).
(7.10)
Let f1 , f2 Cc+ (Rn ), take arbitrary functions g1 , g2 Cc (Rn ; Rm ) such that |g1 | f1 , |g2 | f2
and consider g10 = g1 sgn(L(g1 )), g20 = g2 sgn(L(g2 )). Then |g10 + g20 | f1 + f2 , and thus
|L(g1 )| + |L(g2 )| = L(g10 ) + L(g20 ) = L(g10 + g20 ) (f1 + f2 ).
It follows that
(f1 ) + (f2 ) (f1 + f2 ),
(7.11)
so that is super-linear. On the other hand, given g Cc (Rn ; Rm ) such that |g| f1 + f2
we may set, for j = 1, 2:
( f (x)g(x)
j
, if f1 (x) + f2 (x) > 0,
gj (x) = f1 (x) + f2 (x)
0,
if f1 (x) + f2 (x) = 0.
It is easy to check that g1 and g2 are continuous functions with compact support. Then, as
g = 0 where f1 + f2 = 0, we have g = g1 + g2 , and |gj (x)| fj (x), j = 1, 2, for all x R. It
follows that
|L(g)| |L(g1 )| + |L(g2 )| (f1 ) + (f2 ),
thus (f1 + f2 ) (f1 ) + (f2 ), which, together with (7.11) implies (7.10).
Step 3. The next step is to show that has the explicit form (7.7).
58
f d.
(f ) =
(7.12)
Rn
Proof. Given f Cc+ (Rn ) choose a partition 0 = t0 < t1 < . . . < tN = 2kf kL with
0 < ti ti1 < and so that (f 1 {tj }) = 0 for j = 1, . . . , N . Set Uj = f 1 (tj1 , tj ), then
Uj is a bounded open set, hence (Uj ) < . As is a Radon measure, there exist compact
sets Kj Uj with (Uj \ Kj ) < /N . There also exist functions gj Cc (Rn ; Rm ) with
|gj | 1, supp gj Uj , and |L(gj )| (Uj ) /N , as well as funcions hj Cc+ (Rn ) such that
supp hj Uj , 0 hj 1 and hj 1 on the compact set Kj supp gj . Then hj |gj | and
thus (hj ) |L(gj )| (Uj ) /N , while (hj ) (Uj ) since supp hj Uj and 0 hj 1.
Summarizing, we have
(Uj )
(hj ) (Uj ).
N
Consider the open set
N
X
A = {x : f (x)(1
hj (x)) > 0},
j=1
then
N
[
(A) =
!
(Uj \ {hj = 1}
j=1
N
X
(Uj \ Kj ) < .
j=1
N
X
N
n
o
X
n
m
hj ) = sup |L(g)| : g Cc (R ; R ), |g| f (1
hj )
j=1
j=1
n
o
sup |L(g)| : g Cc (Rn ; Rm ), |g| kf kL A = kf kL (A) kf kL .
It follows that
(f )
N
X
(f hj ) + kf k|L
j=1
and
N
X
N
X
tj (Uj ) + kf k|L ,
j=1
N
X
(f )
tj1 (Uj ) 2kf kL .
(f hj )
tj1 ((Uj ) )
N
j=1
j=1
j=1
As a consequence,
(f )
Rn
N
X
f d
(tj tj1 )(Uj ) + 3kf k|L (suppf ) + 3kf k|L ,
j=1
59
(f )d.
L(f ) =
(7.13)
Rn
Proof. For a fixed vector e Rn with |e| = 1 and f Cc (Rn ) define e (f ) = L(f e), Then
e is a linear functional on Cc (Rn ) and
|f |d.
(7.14)
Rn
Thus, e can be extended to a bounded linear functional on L1 (Rn , d), hence by the Riesz
representation theorem for Lp -spaces there exists e L (Rn , d) such that
e (f ) =
f e d.
(7.15)
Rn
Moreover, (7.14) implies that, as a bounded linear functional on L1 (Rn , d), e has the norm
ke k 1. Therefore, ke kL (Rn ,d) 1 as well. Setting
=
m
X
ej ej ,
j=1
m
X
L((f ej )ej ) =
j=1
m
X
j=1
(f )d,
(f ej )ej d =
Rn
Rn
which is (7.13). 2
Step 5. The last step is
Lemma 7.11 The function defined above satisfies || = 1 -a.e.
Proof. Let U be an open set, (U ) < + and set 0 (x) = (x)/|(x)| where (x) 6= 0, and
0 (x) = 0 where (x) = 0. Using Theorem 2.3 and Corollary 2.9 we may find a compact set
Kj U such that (U \ Kj ) < 1/j and 0 is continuous on Kj . Then we can extend 0 to
a continuous function fj on all of Rn so that |fj | 1. Next, since Kj is a proper compact
subset of an open set U we can find a cut-off function hj Cc (Rn ) such that 0 hj 1,
hj 1 on Kj U , and hj = 0 outside of U . This produces a sequence of functions gj = fj hj
such that |gj | 1, supp gj U and gj || in probability on U . Using Proposition 3.13
we may
pass to a subsequence jk + so that gjk || -a.e. in U . Then, as |gj | 1,
|| m and (U ) < +, bounded convergence theorem implies that
||d = lim
U
k+
(7.16)
by the definition of the measure . On the other hand, for any function f Cc (Rn ; Rm )
supported inside U with |f | 1 we have
(f )d
L(f ) =
U
||d,
U
60
thus
||d.
(U )
(7.17)
(7.18)
If (A) = + this is trivial so we assume that (A) < +. Define the sets
Cn = {x Rn : dist(x, C) 1/n}.
Then the distance dist(A \ Cn , A C) 1/n, thus, by the assumption of Lemma 7.8,
(A \ Cn ) + (A C) = ((A \ Cn ) (A C)) (A).
(7.19)
lim (A \ Cn ) = (A \ C).
(7.20)
We claim that
n
k=1
1
1o
< dist(x, C)
k+1
k
(R2k ) =
k=1
and
m
X
m
[
R2k (A),
k=1
(R2k1 ) =
k=1
m
[
R2k1 (A),
k=1
k=1
(A \ C) = (A \ Cn )
[ [
Rk ,
k=n
thus
(A \ Cn ) (A \ C) (A \ C) +
(Rk ),
k=n
and (7.20) follows. Passing to the limit n + in (7.19) with the help of (7.20) we obtain
(7.18). Therefore, all closed sets are -measurable, thus the measure is Borel. 2
61
8
8.1
Given a function f L1 (S1 ) (here S1 is the unit circle), or equivalently, a periodic function
f L1 [0, 1], we define the Fourier coefficients, for k Z:
f(k) =
f (x)e2ikx dx.
0
Trivially, we have |f(k)| kf kL1 for all k Z. The Riemann-Lebesgue lemma shows that an
L1 -signal can not have too much high-frequency content and f(k) have to decay for large k.
Lemma 8.1 (The Riemann-Lebesgue lemma) If f L1 (S1 ) then f(k) 0 as k +.
Proof. Note that
f(k) =
f (x)e
2ikx
dx =
f (x)e
2ik(x+1/(2k))
and thus
1
f(k) =
2
As a consequence, we have
f (x
0
dx =
1
|f(k)|
2
1 2ikx
)e
dx,
2k
1
f (x) f (x ) e2ikx dx.
2k
1
0
f (x) f (x 1 ) dx,
2k
hence f(k) 0 as k +. 2
A simple implication of the Riemann-Lebesgue lemma is that
f (x) sin(mx)dx 0
0
1
i(2k+1)x
f (x)e
dx =
f(k)e2ikx
k=
N
X
k=N
62
f(k)e2ikx .
f (t)
Sn f (x) =
0
N
X
2ik(xt)
f (x t)DN (t)dt.
dt =
0
k=N
N
X
k=N
e2i(2N +1)t 1
e2it 1
The definition of the Dini kernel as a sum of exponentials implies immediately that
DN (t)dt = 1
(8.1)
1
, |t| 1/2.
sin()
The problem with the Dini kernel is that its L1 -norm is not uniformly bounded in N .
Indeed, consider
1/2
|DN (t)|dt.
LN =
(8.2)
1/2
N +
(8.3)
We compute:
1/2
| sin((2N + 1)t)|
| sin((2N + 1)t)|
LN = 2
dt 2
dt
| sin t|
|t|
0
0
1/2
N +1/2
1
1
| sin(t)|
2
| sin((2N + 1)t)|
dt = 2
dt + O(1)
sin t t
t
0
0
N 1
2 X 1 | sin t|
dt + O(1) C log N + O(1),
k=0 0 t + k
1/2
which implies (8.3). This means that, first, the sequence DN does not form an approximation
of the delta function in the usual sense, that is DN does not behave like a kernel of the form
N (t) = N (N t), with L1 (S1 ), and, second, that (8.1) holds because of cancellation
of many oscillatory terms, and not because DN is uniformly bounded in L1 (S1 ). These
oscillations may cause difficulties in the convergence of the Fourier series.
63
1/2
SN f (x) f (x) =
[f (x t) f (x)]DN (t)dt
1/2
sin((2N + 1)t)
dt +
[f (x t) f (x)]
sin(t)
|t|
(8.5)
[f (x t) f (x)]
sin((2N + 1)t)
dt.
sin(t)
|t|1/2
Consider the first term above (with the change of variables t (t)):
sin((2N + 1)t)
I1 =
[f (x t) f (x)]
dt =
sin(t)
|t|
with
gx (t) =
1/2
f (x + t) f (x)
[,] (t).
sin(t)
Assumption (8.4) means that, as a function of the variable t, and for x fixed, gx L1 (S1 ).
The Riemann-Lebesgue lemma implies then that I1 0 as N +. The second term in
(8.5) is treated similarly: the function
rx (t) =
f (x + t) f (x)
[|t|1/2] (t)
sin(t)
I2 =
Proof. Let us set x = 0 for convenience. As f has a bounded variation on the interval
(, ), it is equal to the difference of two monotonic functions, and we can assume without
loss of generality that f is monotonic on (, ), and also that f (0+ ) = 0. Let us write
1/2
1/2
f (t)DN (t)dt =
SN f (0) =
1/2
Given > 0 choose > 0 so that 0 f (t) < for all t (0, ), then the first term above may
be split as
1/2
f (t)DN (t)dt =
1/2
f (t)DN (t)dt +
Then
1/2
f (t)DN (t)dt 0,
II2 =
exactly for the same reason as in the corresponding term I2 in the proof of Theorem 8.2, since
the function g(t) = f (t)/ sin(t) is uniformly bounded on the interval [, 1/2].
In order to treat I1 we recall the following basic fact: if h is an increasing function on [a, b]
and is continuous on [a, b] then there exists a point c (a, b) such that
h(x)(x)dx = h(b )
a
(x)dx + h(a+ )
c
(x)dx.
(8.7)
(y) = h(b )
(x)dx + h(a+ )
(x)dx,
(x)dx
(a) = h(b )
h(x)(x)dx,
while
(x)dx
(b) = h(a+ )
a
h(x)(x)dx,
a
thus there exists c [a, b] as in (8.7). Therefore, as f (0+ ) = 0, we have, with some c (0, ):
II1 =
f (t)DN (t)dt = f ( )
0
DN (t)dt,
c
and
1
1
sin((2N
+
1)t)
+
D
(t)dt
sin((2N
+
1)t)
dt
dt
N
sin t t
t
c
c
c
M
1
1
1
sin(t)
sup
dt = C < +,
sin t t dt + M
t
>0
0
0
65
with the constant C > 0 independent of . It follows that |II1 | C for all N N. This
shows that for a monotonic function f :
1/2
A change of variables t (t) shows that then for a monotonic function f we also have:
1/2
f (t)DN (t)dt f (0 ),
0
f (x t)DN (t)dt =
SN f (x) =
|t|1
f (x t)
|t|1 (t)
sin(t)
is in L1 (S1 ) as a function of t for each x fixed, because of the cut-off around t = 0. It follows
from the Riemann-Lebesgue lemma that SN f (x) 0 as N +. 2
The du Bois-Raymond example
In 1873, surprisingly, du Bois-Raymond proved that the Fourier series of a continuous function
may diverge at a point. In order to prove his theorem we need first a result from functional
analysis.
Theorem 8.5 (Banach-Steinhaus theorem) Let X be a Banach space, Y a normed vector
space and let {T } be a family of bounded linear operators X Y . Then either sup kT k <
+, or there exists x X so that sup kT xkY = +.
Proof. Let (x) = sup kT xkY and let Vn = {x X : (x) > n}. Each function
(x) = kT (x)kY is continuous on X, thus the set Vn is a union of open sets, hence Vn itself
is open. Let us assume that VN is not dense in X for some N . Then there exists x0 X and
r > 0 such that kxk < r implies that x0 +x
/ VN . Therefore, (x0 +x) N for all such x, thus
66
kT (x0 + x)kY N for all x B(0, r) and all . As a consequence, kT k (kT x0 kY + N )/r
for all .
On the other hand,
if all sets VN are dense, then Baire Category Theorem implies that the
T
intersection V =
V
n=1 n is not empty. In that case taking x0 V we observe that for any
n there exists n with kTn x0 kY n = (n/kx0 kX )kx0 kX , thus kTn k n/kx0 kX and thus
sup kT k = +. 2
Theorem 8.6 There exists a continuous function f C(S1 ) so that its Fourier series diverges at x = 0.
Proof. Let X = C(S1 ) and Y = C, and define TN : X Y by
1/2
f (t)DN (t)dt.
TN f = SN f (0) =
1/2
Then
1/2
|DN (t)|dt,
kTN k LN =
1/2
and, moreover, as DN changes sign only finitely many times, we may construct a sequence
of continuous functions fjN such that fjN |DN | pointwise as j +, |fjN | 1 and
|TN fjN | LN 1/j. It follows that kTN k = LN . Recall (see (8.3)) that
lim LN = +.
(8.8)
N +
With (8.8) in hand we may use the Banach-Steinhaus theorem to conclude that there exists
f C(S1 ) such that |SN f (0)| + as N +. 2
Kolmogorov showed in 1926 that an L1 -function may have a Fourier series that diverges at
every point. Then Carelson in 1965 proved that the Fourier series of an L2 -function converges
almost everywhere and then Hunt improved this result to an arbitrary Lp for p > 1.
8.2
X
1 X
1
FN (t) =
Dk (t) =
sin((2k + 1)t) sin(t)
N + 1 k=0
(N + 1) sin2 (t) k=0
N
X
1
=
[cos(2kt) cos(2(k + 1)t]
2(N + 1) sin2 (t) k=0
=
1
1 sin2 ((N + 1)t)
[1
cos(2(N
+
1)t]
=
.
N +1
2(N + 1) sin2 (t)
sin2 (t)
67
The definition and explicit form of FN show that, unlike the Dini kernel, FN is non-negative
and has L1 -norm
1
|FN (t)|dt = 1.
(8.9)
0
Moreover, its mass outside of any finite interval around zero vanishes as N +:
lim
(8.10)
<|t|<1/2
(8.11)
lim kN f f kC(S1 ) = 0.
(8.12)
Proof. We use the Minkowski inequality, with the notation ft (x) = f (x t):
1/2
N f (x) f (x) =
[f (x t) f (x)]FN (t)dt,
1/2
thus
1/2
kN f f kp
kft f kp FN (t)dt =
1/2
IN
kft f kp FN (t)dt +
|t|<
kft f kp FN (t)dt
|t|1/2
IIN .
(8.13)
|IN |
(8.14)
1/2
<
FN (t)dt = .
1/2
FN (t)dt < .
|t|1/2
This is possible because of (8.10). The second term in (8.13) may then be estimated as
|IIN |
2kf kp
Now, (8.11) follows. in order to prove (8.12) all we need to do is replace (8.14) with the
corresponding estimate in C(S1 ) and repeat the above argument. 2
Theorem 8.7 has a couple of useful corollaries. First, it shows that the trigonometric
polynomials are dense in Lp (S1 ):
68
Corollary 8.8 Trigonometric polynomials are dense in Lp (S1 ) for any p [1, +).
Proof. This follows immediately from (8.11) since each N f is a trigonometric polynomial,
for all N and f . 2
Corollary 8.9 (The Parceval identity) The map f f is an isometry between L2 (S1 ) and
l2 .
Proof. The trigonometric exponentials {e2ikx }, k Z, form an orthonormal set in L2 (S1 ),
which is complete by Corollary 8.8, hence they form a basis of L2 (S1 ) and the Fourier coefficients of f L2 (S1 ) are the coefficients in the expansion
X
f (x) =
f(k)e2ikx ,
kZ
so that
|f(k)|2 =
|f (x)|2 dx,
(8.15)
kZ
(8.16)
R (k)
R (x + )e
2ikx
R (x)e2ik(x) dx = R (k)e2ik .
dx =
0
f() =
f (x)e2ix dx.
Then, obviously, f L (Rn ), with kfkL kf kL1 . Moreover, the function f() is continuous:
0
0
9.1
For a smooth compactly supported function f Cc (Rn ) we have the following remarkable
algebraic relations between taking derivatives and multiplying by polynomials:
d
f
() = 2i f(),
xi
and
d
(2i)(x
j f )() =
f
().
j
(9.1)
(9.2)
As
Cc (Rn )
lies inside the Schwartz class, the Schwartz functions are dense in L1 (Rn ).
Convergence in S(Rn ) is defined as follows: a sequence k 0 in S(Rn ) if
lim p (k ) = 0
(9.3)
|k |p dx
|n |p dx
|x|1
+2
|x|1
|x|n+1 |n |p
dx Cn |p |p + 2Cn0 |p0 |p ,
1 + |x|n+1
with 0 = + (n + 1)/p and the constants Cn and Cn0 that depend only on the dimension n.
The main reason to introduce the Schwartz class is the following theorem.
70
Theorem 9.2 The Fourier transform is a continuous map S(Rn ) S(Rn ) such that
f(x)g(x)dx,
f (x)
g (x)dx =
Rn
(9.4)
Rn
and
f()e2ix d
f (x) =
(9.5)
Proof. First, as
2
(9.6)
with the boundary condition u(0) = 1. However, relations (9.1) and (9.2) together with (9.6)
imply that f satisfies the same differential equation (9.6), with the same boundary condition
f(0) = 0. It follows that f (x) = f(x) for all x R. 2
We continue with the proof of Theorem 9.2. Relations (9.1) and (9.2) imply that if fk 0
in S(Rn ) then for any pair of multi-indices , :
sup | D fk ()| CkD (x fk )kL1 0,
xRn
2ix
f (x)
g (x)dx =
Rn
f (x)g()e
f()g()d.
dxd =
R2n
Rn
Finally, we prove the inversion formula using a rescaling argument. Let f, g S(Rn ) then
for any > 0 we have
1
2ix
f (x)
g (x)dx =
f (x)g()e
dx = f()g()d = n
f()g
d.
Rn
Rn
R2n
Multiplying by n and changing variables on the left side we obtain
x
f
g(x)dx =
f()g
d.
Rn
Rn
Letting now using the Lebesgue dominated convergence theorem gives
f (0)
f()d,
g(x)dx = g(0)
Rn
Rn
71
(9.7)
for all functions f and g in S(Rn ). Taking g(x) = e|x| in (9.7) and using Lemma 9.3 leads
to
f (0) =
f ()d.
(9.8)
Rn
The inversion formula (9.5) now follows if we apply (9.8) to a shifted function fy (x) = f (x+y),
because
fy () =
f (x + y)e2ix dx = e2iy f(),
Rn
so that
e2iy f()d,
fy ()d =
f (y) = fy (0) =
Rn
Rn
which is (9.5). 2
The Schwartz distributions
Definition 9.4 The space S 0 (Rn ) of Schwartz distirbutions is the space of linear functionals
T on S(Rn ) such that T (k ) 0 for all sequences k 0 in S(Rn ).
Theorem 9.2 allows us to extend the Fourier transform to distributions in S 0 (Rn ) by setting
T(f ) = T (f) for T S 0 (Rn ) and f S(Rn ). The fact that T(fk ) 0 for all sequences fk 0
in S(Rn ) follows from the continuity of the Fourier transform as a map S(Rn ) S(Rn ), hence
T is a Schwartz distribution for all T S 0 (Rn ). For example, if 0 is the Schwartz distribution
such that 0 (f ) = f (0), f S(Rn ), then
0 (f ) = f(0) =
f (x)dx,
Rn
1 for all Rn .
so that ()
Similarly, since differentiation is a continuous map S(Rn ) S(Rn ), we may define the
distributional derivative as
f
T
(f ) = T
,
xj
xj
for all T S 0 (Rn ) and f S(Rn ) the minus sign here comes from the integration by parts
formula, for if T happens to have the form
Tg (f ) =
f (x)g(x)dx,
Rn
h(sgn)0 , f i = hsgn, f 0 i =
0
sgn(x)f 0 (x)dx =
f 0 (x)dx
72
9.2
The law of large numbers and the central limit theorem deal with the question of how a
sum of the large number of identically distributed random variables behaves. We will not
discuss them here in great detail but simply explain how the Fourier transform is useful in
this problem. Let Xj be a sequence of real-valued independent, identically distributed random
variables with mean zero and finite variance:
E(Xn ) = 0, E(Xn2 ) = D < +.
(9.9)
Let us define
X1 + X 2 + . . . + Xn
.
(9.10)
n
Recall that if X and Y are two random variables with probability densities pX and pY , that
is,
Zn =
E(f (X)) =
f (x)pY (x)dx,
pX (x y)pY (y)dy.
so that
pX (x)dx =
A
p(x)dx,
A
(f[
? g)() =
2ix
f (y)g(x y)e
dxdy =
pn () = pX
.
n
As
pX (x)dx = 1,
R
73
p0X (0)
= 2i
xpX (x)dx = 0,
(9.12)
2
x2 pX (x)dx = 4 2 D.
= (2i)
(9.13)
We can now compute, with the help of (9.12) and (9.13), for any R:
n
2 2 D||2
lim pn () = lim 1
= 1.
n
n
n2
As a consequence, for any test function f S(R) we have
E(f (Zn )) =
f()
pn ()d
f (x)pn (x)dx =
R
f()d = f (0).
R
Thus, the random variable Zn converges in law to a non-random value Z = 0. This is the
weak law of large numbers.
In order to get a non-trivial limit for a sum of random variables we consider the central
limit scaling:
X1 + X 2 + . . . + Xn
.
Rn =
n
As we did for Zn , we may compute the probability density qn for Rn :
qn () = pX
n
.
E(f (Rn ))
thus Rn converges in law to a random variable with the Gaussian probability density
q(x) =
2ix 2 2 D||2
d =
2ix/ 2D ||2
d
e|| /(2D)
=
.
2D
2D
9.3
Interpolation in Lp -spaces
p0 +(1)p1
|f |
p1
|f | d
p0
|f | d
(9.14)
as well as
k0 = kAkLp0 (M )Lq0 (N ) , k1 = kAkLp1 (M )Lq1 (N ) .
Theorem 9.5 (The Riesz-Thorin interpolation theorem) For any t [0, 1] there exists a
bounded linear operator At : Lpt (M ) Lqt (N ) that coincides with A on Lp0 (M ) Lp1 (M )
and whose operator norm satisfies
kAt kLpt (M )Lqt (N ) k01t k1t .
(9.15)
Before proving the Riesz-Thorin interpolation theorem we mention some of its implications.
We already know that the Fourier transform maps L1 (Rn ) to L (Rn ) and L2 (Rn ) to itself.
This allows us to extend the Fourier transform to all intermediate spaces Lp (Rn ) with 1
p 2.
Corollary 9.6 (The Hausdorff-Young inequality) If f Lp (Rn ) then its Fourier transform
1
1
0
f Lp (Rn ) with + 0 = 1 and kfkLp0 kf kLp .
p p
Proof. We take p0 = 1, p1 = 2, q0 = , q1 = 2. Then for any t [0, 1] the corresponding pt
and qt are given by
1
1t t
t 1
t
=
+ =1 ,
= ,
pt
1
2
2 qt
2
which means that 1/pt +1/qt = 1, as claimed. Furthermore, as kfkL2 = kf kL2 by the Parceval
identity and kfkL kf kL1 , it follows that kfkLpt Lqt 1. 2
The next corollary allows to estimate convolutions.
75
(9.16)
1
1 1
+1= + .
r
p q
(9.17)
with
kf ? gkL1
(9.18)
and
kf ? gkL kf kL kgkL1 .
(9.19)
(9.20)
which is a special case of (9.16) with q = 1 and r = p. On the other hand, Holders inequality
implies that
1
1
kf ? gkL kf kLp kgkLp0 ,
+ 0 = 1.
(9.21)
p p
Let us take p0 = 1, q0 = p, p1 = p0 and q1 = in the Riesz-Thorin interpolation theorem
applied to the mapping g f ? g, with f fixed. Then (9.20) and (9.21) imply that, for all
t [0, 1],
kf ? gkLr kf kLp kgkLq ,
with
1
t
1
1t
=
+ 0,
=
q
pt
1
p
and
1
1
1t
t
= =
+ .
r
qt
p
A(x, D)f =
Corollary 9.8 The family of operators A(x, D), 0 < 1, is uniformly bounded from any
Lp (Rn ), 1 p +, to itself.
76
A(x, D)f =
2ix
a(x, )f()d =
2ix+2iy
a
(x, y)f()ddy =
a
(x, y)f (x+y)dy,
where a
(x, y) is the Fourier transform of the function a(x, ) in the variable . It follows that
|
a(x, y)|dy = C1 (a)kf kL ,
xRn
and
kA(x, D)kL1
|
a(x, y)||f (x + y)|dydx
( sup |
a(z, y)|)|f (x + y)|dydx
zRn
= kf kL1
( sup |
a(z, y)|)dy = C2 (a)kf kL1 .
zRn
The Riesz-Thorin interpolation theorem implies that for any p [1, +] there exists Cp (a)
which does not depend on (0, 1] so that kA(x, D)kLp Lp Cp . 2
The three lines theorem
A key ingredient in the proof of the Riesz representation theorem is the following basic result
from complex analysis.
Theorem 9.9 Let F (z) be a bounded analytic function in the strip S = {z : 0 Rez 1},
such that |F (iy)| m0 , |F (1 + iy)| m1 , with m0 , m1 > 0 for all y R. Then
|F (x + iy)| m01x mx1 for all 0 x 1, y R.
Proof. It is convenient to set
F1 (z) =
(9.22)
F (z)
,
z
m1z
0 m1
so that |F1 (iy)| 1, |F1 (1 + iy)| 1 and F1 is uniformly bounded in S. It suffices to show
that |F (x + iy)| 1 for all (x, y) S under these assumptions. If the strip S were a bounded
domain, this would follow immediately from the maximum modulus principle.
Assume first that F1 (x+iy) 0 as |y| +, uniformly in x [0, 1]. Then |F1 (xiM )|
1/2 for all y with |y| M , and M > 0 large enough. The maximum modulus principle implies
that |F1 (x + iy)| 1 for |y| M , and, since, |F1 (x + iy)| 1/2 for all y with |y| M , it
follows that |F1 (x + iy)| 1 for all (x, y) S.
In general, set
2
Gn (z) = F1 (z)e(z 1)/n ,
then
|Gn (iy)| |F1 (iy)|e(y
2 1)/n
1,
and
2
77
2 y 2 1)/n
C0 ey
2 /n
with a constant C0 such that |F1 (z)| C0 for all z S. It follows from the previous part of
the proof that |Gn (z)| 1, hence
|F1 (z)| e(1+y
2 )/n
for all z S and all n N. Letting n + we deduce that |F1 (z)| 1 for all z S. 2
The proof of the Riesz-Thorin interpolation theorem
First, let us define the operator A on Lpt (M ) with pt as in (9.14). Given f Lpt (M ) we
decompose it as
f (x) = f1 (x) + f2 (x), f1 (x) = f (x)|f |1 (x),
Then, as pt p1 :
p1
|f1 | d =
M
p1
|f |pt d = kf kpLtpt ,
pt
|f | |f |1 d
|f | |f |1 d
and, as p0 pt :
p0
|f2 | d =
M
pt
|f |pt d = kf kpLtpt ,
pt
|f | |f |1 d
|f | |f |1 d
so that f1 Lp1 (M ) and f2 Lp0 (M ). As A is defined both on Lp0 (M ) and Lp1 (M ), we can
set
Af = Af1 + Af2 .
We need to verify that A maps Lpt (M ) to Lqt (N ) continuously. Note that the norm of a
0
bounded linear functional Lf : Lp (M ) R,
f gd, f Lp (M ),
Lf (g) =
M
To see that, for f (x) = |f (x)|ei(x) simply take g(x) = |f (x)|p/p exp{i(x)} for 1 < p < +,
g(x) = exp{i(x)} for p = 1, and g(x) = A (x) exp{i(x)}, where A is a set of a finite
measure such that |f (x)| > (1 )kf |L on A for p = +. We conclude that
kf kLp =
sup
kgk
Lp
0 =1
f gd,
M
78
1
1
+ 0 = 1.
p p
For an operator mapping Lp to Lq we have the corresponding representation for its norm:
kAkLp (M )Lq (N ) =
sup
kAf kLq (N ) =
kf kLp (M ) =1
sup
kf kLp (M ) =1
(Af )gd.
(9.23)
kgk
=1
0
Lq (N )
We will base our estimate of the norm of A : Lpt (M ) Lqt (N ) on (9.23) . Moreover,
0
as simple functions are dense in Lpt (M ) and Lqt (N ), it suffices to use in (9.23) only simple
functions f and g with kf kLpt (M ) = kgkLqt0 (N ) = 1, of the form
f (x) =
n
X
aj e
ij (x)
Aj (x), g(y) =
j=1
m
X
(9.24)
j=1
with aj , bj > 0, -measurable sets Aj and -measurable sets Bj . Since 0 < t < 1, neither pt
nor qt0 can be equal to +, hence (Aj ), (Bj ) < +.
Let us now extend the definition of pt and qt to all complex numbers with 0 Re 1:
1
1
1
1
1
=
=
=
+
.
+ ,
+ , 0
p()
p0
p1 q()
q0
q1 q ()
q00
q10
0
Fix t (0, 1) and a pair of (complex-valued) functions f Lpt (M ) and g Lqt (M ) of the
form (9.24). Consider a family of functions
u(x, ) =
n
X
p /p() ij (x)
aj t
e
Aj (x),
v(y, ) =
j=1
m
X
q 0 /q 0 () ij (y)
bj t
Bj (y),
j=1
(9.25)
As both 1/p() and 1/q 0 () are linear in , the functions u(x, ) and v(x, ) are analytic in
in the strip S = { : 0 Re 1}. Since u(x, ) and v(y, ) are simple functions of x and
y, respectively, vanishing outside of a set of finite measure for each S fixed, they lie in
0
0
Lp0 (M ) Lp1 (M ), and Lq0 (M ) Lq1 (M ), respectively. Therefore, we can define
F () =
(Au)(y, )v(y, )d =
N
n X
m
X
p /p() qt0 /q 0 ()
aj t
bk
j=1 k=1
with j (x) = eij (x) Aj (x). According to (9.23) and (9.25) , in order to prove that
kAt kLpt (M )Lqt (N ) k01t k1t ,
(9.26)
(9.27)
On the boundary of the strip S we have the following bounds: along the line = 0, for z = i,
!1/p0
X
n
[p (i)/p1 +pt (1i)/p0 ]p0
ku(x, i)kLp0 (M ) =
aj t
Aj (x)d
M j=1
n
X
!1/p0
p /p
0
= kf kLtpt (M
) = 1,
M j=1
and
X
m
kv(y, i)kLq00 (N ) =
N j=1
bj
X
m
Bj (y)d
!1/p0
qt0
|bj | Bj (y)d
N j=1
!1/q00
qt0 /q00
= kgk
Lqt (N )
= 1.
It follows that
|F (i)| k(Au)(i)kLq0 (N ) kv(i)kLq00 (N ) kAkLp0 (M )Lq0 (N ) ku(i)kLp0 (N ) kv(i)kLq00 (N ) k0 .
Similarly, along the line = 1+i we have ku(x, 1+i)kLp1 (M ) 1 and kv(x, 1+i)kLq10 (N ) 1,
which implies that |F (1 + i)| k1 . The three lines theorem implies now that |F ( + i)|
k01 k1 , hence (9.27) holds. 2
9.4
e2t|| f()e2ix d,
u(x, t) =
t 0, x Rn .
Rn
u(x, t) = Pt ? f =
Pt (x y)f (y),
with
Pt () = e2t|| ,
and
t
.
(t2 + |x|2 )(n+1)/2
Here the constant n depends only on the spatial dimension.
Pt (x) = Cn
80
u(z) =
2t||
f()e2iz d.
f()e2iz d +
f()e2ix d =
f()e2iz d.
f()e2iz d
iv(z) =
As the function
f()e2iz d
u(z) + iv(z) =
0
is analytic in the upper half-plane {Imz > 0}, the function v is the harmonic conjugate of u.
It can be written as
(isgn())e2t|| f()e2ix d = Qt ? f,
v(z) =
R
with
t () = isgn()e2t|| ,
Q
(9.28)
and
1 x
.
t2 + x2
The Poisson kernel and its conjugate are related by
Qt (x) =
i
,
(x + iy)
which is analytic in {Imz 0}. The main problem with the conjugate Poisson kernel is that
it does not decay fast enough at infinity to be in L1 (R) nor is regular at x = 0 as t 0.
The principle value of 1/x
In order to consider the limit of Qt as t 0 let us define the principal value of 1/x which is
an element of S 0 (R) defined by
1
P.V. () = lim
0
x
This is well-defined because
1
P.V. () =
x
thus
|x|<1
|x|>
(x)
dx, S(R).
x
(x) (0)
dx +
x
|x|>1
(x)
dx,
x
1
P.V. () C(k0 kL + kxkL ),
x
and therefore P.V.(1/x) is, indeed, a distribution in S 0 (R). The conjugate Poisson kernel Qt
and the principal value of 1/x are related as follows.
81
1 x
, then for any function S(R)
t2 + x2
1
1
P.V. () = lim
t0
Proof. Let
t (x) =
so that
Qt (x)(x)dx.
R
1
t<|x| (x)
x
1
P.V. () = lim
t0
x
t (x)(x)dx.
R
x(x)
(x)
(Qt (x) t (x))(x)dx =
dx
dx
2
2
R x +t
|x|>t x
x(x)
1
x
=
dx +
(x)dx
2
2
2
2
x
|x|<t x + t
|x|>t x + t
=
|x|<1
x(tx)
dx
x2 + 1
|x|>t
t2 (x)
dx =
x(x2 + t2 )
|x|<1
x(tx)
dx
x2 + 1
(9.29)
|x|>1
(tx)
dx.
x(x2 + 1)
The dominated convergence theorem implies that both integrals on the utmost right side
above tend to zero as t 0. 2
It is important to note that the computation in (9.29) worked only because the kernel
1/x is odd this produces the cancellation that saves the day. This would not happen, for
instance, for a kernel behaving as 1/|x| near x = 0.
The Hilbert transform
Motivated by the previous discussion, for a function f S(R), we define the Hilbert transform
as
f (x y)
1
dy.
Hf (x) = lim Qt ? f (x) = lim
t0
0 |y|>
y
It follows from (9.28) that
d() = lim Q
t ()f() = isgn()f().
Hf
0
(9.30)
Therefore, the Hilbert transform may be extended to an isometry L2 (R) L2 (R), with
kHf kL2 = kf kL2 , H(Hf ) = f and
(Hf )(x)g(x)dx =
f (x)(Hg)(x)dx.
(9.31)
The following extension of the Hilbert transform to Lp -spaces for 1 < p < is due to
M. Riesz.
Theorem 9.11 Given 1 < p < there exists Cp > 0 so that
kHf kLp Cp kf kLp for all f Lp (Rn ).
82
(9.32)
gn (x) =
so that
kf
gn k2L2
2/n
|f()|2 d 0 as n +.
(9.33)
2/n
2/n
|f()|d 0 as n +.
kf gn kL
(9.34)
2/n
(9.35)
(1 + sgn())f()e
2ix
f()e2ix d,
d = 2
0
f()e2iz d.
p(z) = 2
0
The function p(z) is holomorphic in the upper half-plane {Imz > 0} and is continuous up to
the boundary y = 0. Since f S0 there exists > 0 so that f() = 0 for || . Thus, p(z)
satisfies an exponential decay bound
|p(z)| 2e2y kfkL1 , z = x + iy.
(9.36)
Integrating p4 (z) along the contour CR which consists of the interval [R, R] along the real
axis and the semicircle {x2 + y 2 = R2 , y > 0}, and passing to the limit R 0 with the help
of (9.36) leads to
lim
R+
83
(Hf (x)) dx =
R
f 2 (x)(Hf )2 (x)dx
1
(2f 4 (x) + (Hf )4 (x))dx,
2
hence
4
(Hf (x)) dx 4
f 4 (x)dx,
(9.37)
for any function f S0 . As we have shown that S0 is dense in any Lp (R), 2 p < ,
(9.37) holds for all f L4 (R). Therefore, the Hilbert transform is a bounded operator
L4 (R) L4 (R). As we know that it is also bounded from L2 (R) to L2 (R), the Riesz-Thorin
interpolation theorem implies that kHf kLp Cp kf kLp for all 2 p 4.
An argument identical to the above, integrating the function p2k (z) over the same contour,
shows that H is bounded from L2k (R) to L2k (R) for all integers k. It follows then from RieszThorin interpolation theorem that kHf kLp Cp kf kLp for all 2 p < +.
It remains to consider 1 < p < 2 this is done using the duality argument. Let q > 2 be the
dual exponent, 1/p+1/q = 1. As the operator H : Lq (R) Lq (R) is bounded, so is its adjoint
H : Lp (R) Lp (R) defined by hH f, gi = hf, Hgi, with f Lp (R), g Lq (R). However,
(9.31) says that H = H, hence the boundedness of H implies that H : Lp (R) Lp (R) is
also bounded. 2
The Hilbert transform does not map L1 (R) L1 (R) but we have the following result due
to Kolmogorov.
Theorem 9.12 Let f L1 (R), then there exists C > 0 so that for any > 0 the following
estimate holds:
C
|f (x)dx.
m{x : |Hf (x)| }
R
We will not prove this theorem here.
10
10.1
1 if 0 x < 1/2,
(x) = 1 if 1/2 x < 1,
0
otherwise.
It has mean zero
(x)dx = 0,
0
84
(10.1)
2 (x)dx = 1.
0
jk (x)jk0 (x)dx = 2
R
R
0
j/2+j 0 /2
jk (x)j 0 k0 (x)dx = 2
j 0 /2j/2
j 0 j
=2
(y)(2
j 0 /2j/2
k k 0 )dy
R
1/2
=2
y+2
j 0 j
(2
j 0 j
y+2
j 0 j
j 0 /2j/2
(2j j y + 2j j k k 0 )dy.
k k )dy 2
1/2
0
Both of the integrals above equal to zero. Indeed, 2j j 2, hence, for instance,
1/2
j 0 j
(2
y+2
j 0 j
2j
jj 0
0 j1
0
(y + 2j j k k 0 )dy = 0,
k k )dy = 2
because
(y)dy = 0,
m
|jk (x)| = 2
R
|(x k)|2 dx = 1.
|(2 x k)| dx =
R
The Haar coefficients of a function f L2 (R) are defined as the inner products
cjk =
f (x)jk (x)dx,
(10.2)
cjk jk (x).
(10.3)
and the series (10.3) converges in L2 (R). In order to show that it actually converges to the
function f itself we need to prove that the Haar functions form a basis for L2 (R).
85
then
n
Pn f (x) =
f dx = 2
f dx,
Ink
for x Ink .
Ink
The function Pn f is constant on each of the dyadic intervals Ink . In particular, each Haar
function jk satisfies Pn jk (x) = 0 for j n, while Pn jk (x) = jk (x) for j < n. We claim
that, actually, for any f L2 (R) we have
X
Pn+1 f Pn f =
cnk nk (x),
(10.4)
kZ
with the Haar coefficients cnk given by (10.2). Indeed, let x Ink and write
(k 1) k i 2(k 1) (2k 1) i [ (2k 1) 2k i
[
,
=
,
,
=
I
In+1,2k .
Ink =
n+1,2k1
2n
2n
2n+1
2n+1
2n+1
2n+1
The function Pn f is constant on the whole interval Ink while Pn+1 f is constant on each of the
sub-intervals In+1,2k1 and In+1,2k . In addition,
(Pn f )dx =
(Pn+1 f )dx.
Ink
Ink
cjk jk (x),
(10.5)
j=m kZ
for all m, n Z with n > m. It remains to show that for any f L2 (R) we have
lim Pm f (x) = 0,
m+
n+
(10.6)
both in the L2 -sense. The operators Pn f are uniformly bounded because for all n, k Z we
have
2
2
n 2n
|(Pn f )(x)| dx = 2 2
f (y)dy
|f (y)|2 dy.
Ink
Ink
Ink
2
|f (x)|2 ,
|Pn f (x)|
R
86
thus kPn f kL2 kf kL2 . Uniform boundedness of Pn implies that it is sufficient to establish
both limits in (10.6) for functions f Cc (R). However, for such f we have, on one hand,
1
|Pm f (x)| m
2
|f (x)|dx 0 as m +,
R
f (x) =
j,k=
10.2
f (y)jk (y)dy.
(10.7)
P (Zn < y) =
2 dy
ey .
2
Zn ()
n (s)ds
0
n=0
is a Brownian motion.
87
(10.8)
First, we need to verify that the series (10.8) converges in L2 () for a fixed t [0, 1]. Note
that
!2
2 X
t
m
m t
m
X
X
E
k (s)ds =
k (s)ds =
Zk ()
h[0,t] , k i2 .
0
k=n
k=n
k=n
As k form a basis for L2 [0, 1], the series (10.8) satisfies the Cauchy criterion and thus converges in L2 (). Moreover, for any 0 s < t 1 we have
!2
2 X
t
t
X
X
2
E (Xt Xs ) = E
k (u)du =
Zk ()
k (u)du =
h[s,t] , k i2
=
k=0
k[s,t] k2L2
k=0
k=0
= t s,
hence the increments Xt Xs have the correct variance. Let us show that they are independent:
for 0 t0 < t1 t2 < t3 1:
!
t1
t3
X
E ((Xt3 Xt2 )(Xt1 Xt0 )) = E
k (u)du
k (u0 )du0
k=0
t2
t0
k=0
(10.9)
p
1
2
P |Zn ()| 2 log n e(2 log n) /2 = 2 ,
n
thus
X
p
P |Zn ()| 2 log n < +.
n=0
2
log
n
happens
n
only finitely many times, so that |Zn ()| < 2 log n for al n n0 () almost surely, and (10.9)
follows.
Another useful observation is that for each fixed t 0 and j N there exists only one k
so that
t
2j +k (s)ds 6= 0,
0
88
t
2j +k (s)ds 2j/2 2j = 1 .
2j/2
0
Hence, we may estimate the dyadic blocs, using (10.9):
j
j 1
2X
t
2X
p
1
t
jM1 () .
j +k (s)ds M ()
j +k ()
(j
+
1)
log
2
(s)ds
Z
jk
2
2
2j/2
0
0
k=0
k=0
Therefore, the dyadic blocs are bounded by a convergent series which does not depend on
t [0, 1], hence the sum Xt () of the series is a continuous function for a.e. .
Nowhere differentiability of the Brownian motion
Theorem 10.1 The Brownian path Xt () is nowhere differentiable for almost every .
Proof. Let us fix > 0. Then if X s exists at some s [0, 1] and |X s | < then there exists
n0 so that
2
(10.10)
|Xt Xs | 2|t s| if |t s|
n
for all n > n0 . Let An be the set of functions x(t)
S C[0, 1] for which (10.10) holds for some
s [0, 1]. Then An An+1 and the set A = n=1 An includes all functions x(t) C[0, 1]
such that |x(s)|
1
, x
, x
8 .
x
x
x
yk = max x
n
n
n
n
n
n
n
Therefore, if we denote by Bn the set of all functions x(t) C[0, 1] for which yk 8/n for
some k, then An Bn . Therefore, in order to show that P(A) = 0 it suffices to check that
lim P(Bn ) = 0.
(10.11)
This, however, can be estimated directly, using translation invariance of the Brownian motion:
n2
X
k+2
k + 1 k + 1
k k
k 1
8
P(Bn )
P max X(
)X(
) ,X(
)X( ) , X( )X(
)
n
n
n
n
n
n
n
k=1
3
2
2
1
1
8
nP max X
X
,X
X
, X
n
n
n
n
n
n
!
3
r
r
3
3
8/n
1 8
n
n 16
C
nx2 /2
= nP X
=n
e
dx n
,
n
n
2 8/n
2 n
n
which implies (10.11). It follows that P(A) = 0 as well, hence Brownian motion is nowhere
differentiable with probability one. 2
Corollary 10.2 Brownian motion does not have bounded variation with probability one.
89