T.B. Ward
Author address:
School of Mathematics, University of East Anglia, Norwich NR4 7TJ, U.K.
Email address: t.ward@uea.ac.uk
Course objectives
In order to reach the more interesting and useful ideas, we shall adopt a fairly
brutal approach to some early material. Lengthy proofs will sometimes be left
out, though full versions will be made available. By the end of the course, you
should have a good understanding of normed vector spaces, Hilbert and Banach
spaces, fixed point theorems and examples of function spaces. These ideas will be
illustrated with applications to differential equations.
Books
You do not need to buy a book for this course, but the following may be useful for
background reading. If you do buy something, the starred books are recommended.
[1] Functional Analysis, W. Rudin, McGraw–Hill (1973). This book is thorough,
sophisticated and demanding.
[2] Functional Analysis, F. Riesz and B. Sz.-Nagy, Dover (1990). This is a classic
text, also much more sophisticated than the course.
[3]* Foundations of Modern Analysis, A. Friedman, Dover (1982). Cheap and
cheerful; includes a few useful sections on background.
[4]* Essential Results of Functional Analysis, R.J. Zimmer, University of Chicago
Press (1990). Lots of good problems and a useful chapter on background.
[5]* Functional Analysis in Modern Applied Mathematics, R.F. Curtain and A.J.
Pritchard, Academic Press (1977). This book is closest to the course.
[6]* Linear Analysis, B. Bollobás, Cambridge University Press (1995). This book is
excellent but makes heavy demands on the reader.
Contents
Chapter 1. Normed Linear Spaces
1. Linear (vector) spaces
2. Linear subspaces
3. Linear independence
4. Norms
5. Isomorphism of normed linear spaces
6. Products of normed spaces
7. Continuous maps between normed spaces
8. Sequences and completeness in normed spaces
9. Topological language
10. Quotient spaces
Chapter 2. Banach spaces
1. Completions
2. Contraction mapping theorem
3. Applications to differential equations
4. Applications to integral equations
Chapter 3. Linear Transformations
1. Bounded operators
2. The space of linear operators
3. Banach algebras
4. Uniform boundedness
5. An application of uniform boundedness to Fourier series
6. Open mapping theorem
7. Hahn–Banach theorem
Chapter 4. Integration
1. Lebesgue measure
2. Product spaces and Fubini’s theorem
Chapter 5. Hilbert spaces
1. Hilbert spaces
2. Projection theorem
3. Projection and self–adjoint operators
4. Orthonormal sets
5. Gram–Schmidt orthonormalization
Chapter 6. Fourier analysis
1. Fourier series of $L^1$ functions
2. Convolution in $L^1$
3. Summability kernels and homogeneous Banach algebras
4. Fejér’s kernel
5. Pointwise convergence
6. Lebesgue’s Theorem
Appendix A.
1. Zorn’s lemma and Hamel bases
2. Baire category theorem
CHAPTER 1
Normed Linear Spaces
A linear space is simply an abstract version of the familiar vector spaces $\mathbb{R}$, $\mathbb{R}^2$,
$\mathbb{R}^3$ and so on. Recall that vector spaces have certain algebraic properties: vectors
may be added, multiplied by scalars, and vector spaces have bases and subspaces.
Linear maps between vector spaces may be described in terms of matrices. Using the
Euclidean norm or distance, vector spaces have other analytic properties (though
you may not have called them that): for example, certain functions from $\mathbb{R}$ to $\mathbb{R}$
are continuous, differentiable, Riemann integrable and so on.
We need to make three steps of generalization.
Bases: The first is familiar: instead of, for example, $\mathbb{R}^3$, we shall sometimes want
to talk about an abstract three–dimensional vector space $V$ over the field $\mathbb{R}$. This
distinction amounts to having a specific basis $\{e_1, e_2, e_3\}$ in mind, in which case
every element of $V$ corresponds to a triple $(a, b, c) = ae_1 + be_2 + ce_3$ of reals – or
choosing not to think of a specific basis, in which case the elements of $V$ are just
abstract vectors $v$. In the abstract language we talk about linear maps or operators
between vector spaces; after choosing a basis linear maps become matrices – though
in an infinite dimensional setting it is rarely useful to think in terms of matrices.
Ground fields: The second is fairly trivial and is also familiar: the ground field
can be any field. We shall only be interested in $\mathbb{R}$ (real vector spaces) and $\mathbb{C}$
(complex vector spaces). Notice that $\mathbb{C}$ is itself a two–dimensional vector space
over $\mathbb{R}$ with additional structure (multiplication). Choosing a basis $\{1, i\}$ for $\mathbb{C}$
over $\mathbb{R}$ we may identify $z \in \mathbb{C}$ with the vector $(\Re(z), \Im(z)) \in \mathbb{R}^2$.
Dimension: In linear algebra courses, you deal with finite dimensional vector
spaces. Such spaces (over a fixed ground field) are determined up to isomorphism
by their dimension. We shall be mainly looking at linear spaces that are
not finite–dimensional, and several new features appear. All of these features may
be summed up in one line: the algebra of infinite dimensional linear spaces is intimately
connected to the topology. For example, linear maps from $\mathbb{R}^2$ to $\mathbb{R}^2$
are automatically continuous. For infinite dimensional spaces, some linear maps
are not continuous.
1. Linear (vector) spaces
Definition 1.1. A linear space over a field $k$ is a set $V$ equipped with maps
$\oplus : V \times V \to V$ and $\odot : k \times V \to V$ with the properties
(1) $x \oplus y = y \oplus x$ for all $x, y \in V$ (addition is commutative);
(2) $(x \oplus y) \oplus z = x \oplus (y \oplus z)$ for all $x, y, z \in V$ (addition is associative);
(3) there is an element $0 \in V$ such that $x \oplus 0 = 0 \oplus x = x$ for all $x \in V$ (a zero
element);
(4) for each $x \in V$ there is a unique element $-x \in V$ with $x \oplus (-x) = 0$ (additive
inverses);
(notice that $(V, \oplus)$ therefore forms an abelian group)
(5) $\alpha \odot (\beta \odot x) = (\alpha\beta) \odot x$ for all $\alpha, \beta \in k$ and $x \in V$;
(6) $(\alpha + \beta) \odot x = (\alpha \odot x) \oplus (\beta \odot x)$ for all $\alpha, \beta \in k$ and $x \in V$ (scalar multiplication
distributes over scalar addition);
(7) $\alpha \odot (x \oplus y) = (\alpha \odot x) \oplus (\alpha \odot y)$ for all $\alpha \in k$ and $x, y \in V$ (scalar multiplication
distributes over vector addition);
(8) $1 \odot x = x$ for all $x \in V$, where $1$ is the multiplicative identity in the field $k$.
Example 1.1. [1] Let $V = \mathbb{R}^n = \{x = (x_1, \dots, x_n) \mid x_i \in \mathbb{R}\}$ with the usual
vector addition and scalar multiplication.
[2] Let $V$ be the set of all polynomials with coefficients in $\mathbb{R}$ of degree $\le n$ with
usual addition of polynomials and scalar multiplication.
[3] Let $V$ be the set $M_{(m,n)}(\mathbb{C})$ of complex–valued $m \times n$ matrices, with usual
addition of matrices and scalar multiplication.
[4] Let $\ell^\infty$ denote the set of infinite sequences $(x_1, x_2, x_3, \dots)$ that are bounded:
$\sup\{|x_n|\} < \infty$. Then $\ell^\infty$ is a linear space, since $\sup\{|x_n + y_n|\} \le \sup\{|x_n|\} +
\sup\{|y_n|\} < \infty$ and $\sup\{|\alpha x_n|\} = |\alpha| \sup\{|x_n|\}$.
[5] Let $C(S)$ be the set of continuous functions $f : S \to \mathbb{R}$ with addition $(f \oplus g)(x) =
f(x) + g(x)$ and scalar multiplication $(\alpha \odot f)(x) = \alpha f(x)$. Here $S$ is, for example,
any subset of $\mathbb{R}$. The dimension of $C(S)$ is infinite if $S$ is an infinite set, and is
exactly $|S|$ if not.¹
[6] Let $V$ be the set of Riemann–integrable functions $f : (0, 1) \to \mathbb{R}$ which are
square–integrable: that is, with the property that $\int_0^1 |f(x)|^2\,dx < \infty$. We need to
check that this is a linear space. Closure under scalar multiplication is clear: if
$\int_0^1 |f(x)|^2\,dx < \infty$ and $\alpha \in \mathbb{R}$ then $\int_0^1 |\alpha f(x)|^2\,dx = |\alpha|^2 \int_0^1 |f(x)|^2\,dx < \infty$. For
vector addition we need the Cauchy–Schwarz inequality:
$$\int_0^1 |f(x) + g(x)|^2\,dx \le \int_0^1 \left( |f(x)|^2 + 2|f(x)||g(x)| + |g(x)|^2 \right) dx$$
$$\le \int_0^1 |f(x)|^2\,dx + 2\left( \int_0^1 |f(x)|^2\,dx \right)^{1/2} \left( \int_0^1 |g(x)|^2\,dx \right)^{1/2} + \int_0^1 |g(x)|^2\,dx < \infty.$$
[7] Let $C^\infty[a, b]$ be the space of infinitely differentiable functions on $[a, b]$.
[8] Let $\Omega$ be an open subset of $\mathbb{R}^n$, and $C^k(\Omega)$ the space of $k$ times continuously differentiable
functions. This means that if $a = (a_1, \dots, a_n) \in \mathbb{N}^n$ has $|a| = a_1 + \cdots + a_n \le k$, then the partial derivatives
$$D^a f = \frac{\partial^{|a|} f}{\partial x_1^{a_1} \cdots \partial x_n^{a_n}}$$
exist and are continuous.
From now on we will drop the special notation $\oplus$, $\odot$ for vector addition and
scalar multiplication. We will also (normally) use plain letters $x$, $y$ and so on for
elements of linear spaces.
¹ This may be seen as follows. If $S = \{s_1, \dots, s_n\}$ is finite, then the map that sends a function
$f \in C(S)$ to the vector $(f(s_1), \dots, f(s_n)) \in \mathbb{R}^n$ is an isomorphism of linear spaces. If $S$ is infinite,
then the map that sends a polynomial $f \in \mathbb{R}[x]$ to the function $f \in C(S)$ is injective (since
two polynomials that agree on infinitely many values must be identical). This shows that $C(S)$
contains an isomorphic copy of an infinite–dimensional space, so must be infinite–dimensional.
As in the linear algebra of finite–dimensional vector spaces, subsets of linear
spaces that are themselves linear spaces are called linear subspaces.
2. Linear subspaces
Definition 1.2. Let $V$ be a linear space over the field $k$. A subset $W \subset V$
is a linear subspace of $V$ if for all $x, y \in W$ and $\alpha, \beta \in k$, the linear combination
$\alpha x + \beta y \in W$.
Example 1.2. [1] The set of vectors in $\mathbb{R}^n$ (with $n \ge 3$) of the form $(x_1, x_2, x_3, 0, \dots, 0)$
forms a three–dimensional linear subspace.
[2] The set of polynomials of degree $\le r$ forms a linear subspace of the set of
polynomials of degree $\le n$ for any $r \le n$.
[3] (cf. Example 1.1[8]) The space $C^{k+1}(\Omega)$ is a linear subspace of $C^k(\Omega)$.
3. Linear independence
Let $V$ be a linear space. Elements $x_1, x_2, \dots, x_n$ of $V$ are linearly dependent if
there are scalars $\alpha_1, \dots, \alpha_n$ (not all zero) such that
$$\alpha_1 x_1 + \cdots + \alpha_n x_n = 0.$$
If there is no such set of scalars, then they are linearly independent.
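For finitely many vectors with rational coordinates, linear independence can be tested mechanically. The sketch below is an illustration only, assuming the vectors lie in $\mathbb{Q}^m$ and are given as tuples. It row-reduces them exactly with Python's `fractions.Fraction`, which avoids floating-point round-off, and declares them independent when the rank equals their number. The helper names `rank` and `linearly_independent` are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors in Q^m, by exact Gaussian elimination."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    r = 0  # number of pivot rows found so far
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Find a row at or below r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def linearly_independent(vectors):
    """True iff a_1 x_1 + ... + a_n x_n = 0 forces a_1 = ... = a_n = 0."""
    return rank(vectors) == len(vectors)


print(linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # True
print(linearly_independent([(1, 2, 3), (2, 4, 6)]))             # False
```

Polynomials of degree at most $n$ can be tested in the same way by passing their coefficient vectors with respect to $1, t, \dots, t^n$.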
The linear span of the vectors $x_1, x_2, \dots, x_n$ is the linear subspace
$$\operatorname{span}\{x_1, \dots, x_n\} = \left\{ x = \sum_{j=1}^n \alpha_j x_j \;\middle|\; \alpha_j \in k \right\}.$$
Definition 1.3. If the linear space $V$ is equal to the span of a linearly independent
set of $n$ vectors, then $V$ is said to have dimension $n$. If there is no such
set of vectors, then $V$ is infinite–dimensional.
A linearly independent set of vectors that spans $V$ is called a basis for $V$.
Example 1.3. [1] (cf. Example 1.1[1]) The space $\mathbb{R}^n$ has dimension $n$; the
standard basis is given by the vectors $e_1 = (1, 0, \dots, 0)$, $e_2 = (0, 1, 0, \dots, 0)$, $\dots$, $e_n =
(0, \dots, 0, 1)$.
[2] (cf. Example 1.1[2]) A basis is given by $\{1, t, t^2, \dots, t^n\}$, showing the space to
have dimension $n + 1$.
[3] Examples 1.1[4], [5] (for infinite $S$), [6], [7], [8] are all infinite–dimensional.
4. Norms
A norm on a vector space is a way of measuring distance between vectors.
Definition 1.4. A norm on a linear space $V$ over $k$ is a non–negative function
$\|\cdot\| : V \to \mathbb{R}$ with the properties that
(1) $\|x\| = 0$ if and only if $x = 0$ (positive definite);
(2) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in V$ (triangle inequality);
(3) $\|\alpha x\| = |\alpha| \|x\|$ for all $x \in V$ and $\alpha \in k$.
In Definition 1.4(3) we are assuming that $k$ is $\mathbb{R}$ or $\mathbb{C}$ and $|\cdot|$ denotes the usual
absolute value. If $\|\cdot\|$ is a non–negative function with properties (2) and (3) only, it is called a
semi–norm.
Definition 1.5. A normed linear space is a linear space $V$ with a norm $\|\cdot\|$
(sometimes we write $\|\cdot\|_V$).
Definition 1.6. A set $C$ in a linear space is convex if for any two points
$x, y \in C$, $tx + (1 - t)y \in C$ for all $t \in [0, 1]$.
Definition 1.7. A norm $\|\cdot\|$ is strictly convex if $\|x\| = 1$, $\|y\| = 1$, $\|x + y\| = 2$
together imply that $x = y$.
We won’t be using convexity methods much, but for each of the examples try to
work out whether or not the norm is strictly convex. Strict convexity is automatic
for Hilbert spaces.
Example 1.4. [1] Let $V = \mathbb{R}^n$ with the usual Euclidean norm
$$\|x\| = \|x\|_2 = \left( \sum_{j=1}^n |x_j|^2 \right)^{1/2}.$$
To check this is a norm the only difficulty is the triangle inequality: for this we use
the Cauchy–Schwarz inequality.
[2] There are many other norms on $\mathbb{R}^n$, called the $p$–norms. For $1 \le p < \infty$ define
$$\|x\|_p = \left( \sum_{j=1}^n |x_j|^p \right)^{1/p}.$$
Then $\|\cdot\|_p$ is a norm on $V$: to check the triangle inequality use Minkowski’s
Inequality
$$\left( \sum_{j=1}^n |x_j + y_j|^p \right)^{1/p} \le \left( \sum_{j=1}^n |x_j|^p \right)^{1/p} + \left( \sum_{j=1}^n |y_j|^p \right)^{1/p}.$$
There is another norm corresponding to $p = \infty$,
$$\|x\|_\infty = \max_{1 \le j \le n} \{|x_j|\}.$$
It is conventional to write $\ell^n_p$ for these spaces. Notice that the linear spaces $\ell^n_p$
and $\ell^n_q$ have exactly the same elements.
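The sketch below is a quick numerical sanity check, not a proof. It evaluates the $p$–norms on $\mathbb{R}^n$ and tests Minkowski's inequality on random vectors. It also exhibits the pairs of unit vectors showing that $\|\cdot\|_1$ and $\|\cdot\|_\infty$ on $\mathbb{R}^2$ are not strictly convex in the sense of Definition 1.7. The function name `p_norm` and the use of `float("inf")` for $p = \infty$ are choices made for the sketch. The small tolerance absorbs floating-point round-off.

```python
import random


def p_norm(x, p):
    """The p-norm on R^n; pass p = float("inf") for the max norm."""
    if p == float("inf"):
        return max(abs(c) for c in x)
    return sum(abs(c) ** p for c in x) ** (1.0 / p)


def add(x, y):
    return [a + b for a, b in zip(x, y)]


# Minkowski's inequality ||x + y||_p <= ||x||_p + ||y||_p on random vectors.
random.seed(0)
for p in (1, 1.5, 2, 3, float("inf")):
    for _ in range(1000):
        x = [random.uniform(-1, 1) for _ in range(5)]
        y = [random.uniform(-1, 1) for _ in range(5)]
        assert p_norm(add(x, y), p) <= p_norm(x, p) + p_norm(y, p) + 1e-12

# Unit vectors x != y with ||x + y|| = 2, so these norms are not strictly convex.
print(p_norm(add([1, 0], [0, 1]), 1))             # 2.0 (both have 1-norm 1)
print(p_norm(add([1, 1], [1, 0]), float("inf")))  # 2   (both have max norm 1)
# For the 2-norm the first pair gives sqrt(2) < 2 instead.
print(p_norm(add([1, 0], [0, 1]), 2))
```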
[3] Let $X = \ell^\infty$ be the linear space of bounded infinite sequences (cf. Example
1.1[4]). Consider the function $\|\cdot\|_p : \ell^\infty \to \mathbb{R} \cup \{\infty\}$ given by
$$\|x\|_p = \left( \sum_{j=1}^\infty |x_j|^p \right)^{1/p}.$$
If we restrict attention to the linear subspace on which $\|\cdot\|_p$ is finite, then $\|\cdot\|_p$ is a
norm (to check this use the infinite version of Minkowski’s inequality). This gives
an infinite family of normed linear spaces,
$$\ell^p = \{x = (x_1, x_2, \dots) \mid \|x\|_p < \infty\}.$$
Notice that for $p < \infty$ there is a strict inclusion $\ell^p \subset \ell^\infty$. Indeed, for any $p < q$
there is a strict inclusion $\ell^p \subset \ell^q$, so $\ell^p$ is a linear subspace of $\ell^q$ (for example,
$(1, \frac12, \frac13, \dots)$ lies in $\ell^2$ but not in $\ell^1$). That is, the sets
$\ell^p$ and $\ell^q$ for $p \ne q$ do not contain the same elements.
[4] Let $X = C[a, b]$, and put $\|f\| = \sup_{t \in [a,b]} |f(t)|$. This is called the uniform or
supremum norm. Why is it finite?
[5] Let $X = C[a, b]$, and choose $1 \le p < \infty$. Then (using the integral form of
Minkowski’s inequality) we have the $p$–norm
$$\|f\|_p = \left( \int_a^b |f(t)|^p\,dt \right)^{1/p}.$$
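[6] (cf. Example 1.1[6]). Let $V$ be the set of Riemann–integrable functions $f :
(0, 1) \to \mathbb{R}$ which are square–integrable. Let $\|f\|_2 = \left( \int_0^1 |f(x)|^2\,dx \right)^{1/2} < \infty$. Then $V$ is
a normed linear space. (Strictly speaking, $\|\cdot\|_2$ is only a semi–norm on $V$: a function that is
non–zero at a single point and zero elsewhere has $\|f\|_2 = 0$. It becomes a norm once functions $f, g$
with $\|f - g\|_2 = 0$ are identified.)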
5. Isomorphism of normed linear spaces
Recall from linear algebra that linear spaces $V$ and $W$ are (algebraically) isomorphic
if there is a bijection $T : V \to W$ that is linear:
$$T(\alpha x + \beta y) = \alpha T(x) + \beta T(y)$$
for all $\alpha, \beta \in k$ and $x, y \in V$.
Two normed linear spaces $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ are (topologically) isomorphic
if there is a linear bijection $T : X \to Y$ with the property that there are
positive constants $a, b$ with
$$a\|x\|_X \le \|T(x)\|_Y \le b\|x\|_X. \tag{1}$$
We shall usually denote topological isomorphism by $X \cong Y$.
Lemma 1.1. If $X$ and $Y$ are $n$–dimensional normed linear spaces over $\mathbb{R}$ (or
$\mathbb{C}$) then $X$ and $Y$ are topologically isomorphic.
If the constants $a$ and $b$ in equation (1) may both be taken as 1, so $\|T(x)\|_Y =
\|x\|_X$, then $T$ is called an isometry and the normed spaces $X$ and $Y$ are called
isometric.
Example 1.5. The real linear spaces $(\mathbb{C}, |\cdot|)$ and $(\mathbb{R}^2, \|\cdot\|_2)$ are isometric.
If $Y$ is a subspace of a normed linear space $(X, \|\cdot\|_X)$ then $\|\cdot\|_X$ restricted to
$Y$ makes $Y$ into a normed subspace.
Example 1.6. Let $Y$ denote the space of infinite real sequences with only
finitely many non–zero terms. Then $Y$ is a linear subspace of $\ell^p$ for any $1 \le p \le \infty$,
so the $p$–norm makes $Y$ into a normed space.
6. Products of normed spaces
If $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ are normed linear spaces, then the product
$$X \times Y = \{(x, y) \mid x \in X, y \in Y\}$$
is a linear space which may be made into a normed space in many different ways,
a few of which follow.
Example 1.7. [1] $\|(x, y)\| = \left( \|x\|_X^p + \|y\|_Y^p \right)^{1/p}$ for $1 \le p < \infty$;
[2] $\|(x, y)\| = \max\{\|x\|_X, \|y\|_Y\}$.
7. Continuous maps between normed spaces
We have seen continuous maps between $\mathbb{R}$ and $\mathbb{R}$ in first year analysis. To make
this definition we used the distance function $|x - y|$ on $\mathbb{R}$: a function $f : \mathbb{R} \to \mathbb{R}$ is
continuous if
$$\forall\, a \in \mathbb{R},\ \forall\, \epsilon > 0,\ \exists\, \delta > 0 \text{ such that } |x - a| < \delta \implies |f(x) - f(a)| < \epsilon. \tag{2}$$
Looking at (2), we see that exactly the same definition can be made for maps between
normed linear spaces, which in view of Example 1.4 will give us the possibility
of talking about continuous maps between spaces of functions. Thus, on suitably
defined spaces, questions like “is the map $f \mapsto f'$ continuous?” or “is the map
$f \mapsto \int_0^x f$ continuous?” can be asked.
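Definition 1.8. A map $f : X \to Y$ between normed linear spaces $(X, \|\cdot\|_X)$
and $(Y, \|\cdot\|_Y)$ is continuous at $a \in X$ if
$$\forall\, \epsilon > 0\ \exists\, \delta = \delta(\epsilon, a) > 0 \text{ such that } \|x - a\|_X < \delta \implies \|f(x) - f(a)\|_Y < \epsilon.$$
If $f$ is continuous at every $a \in X$ then we simply say $f$ is continuous.
Finally, $f$ is uniformly continuous if
$$\forall\, \epsilon > 0\ \exists\, \delta = \delta(\epsilon) > 0 \text{ such that } \|x - y\|_X < \delta \implies \|f(x) - f(y)\|_Y < \epsilon \quad \forall\, x, y \in X.$$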
Example 1.8. [1] The map $x \mapsto x^2$ from $(\mathbb{R}, |\cdot|)$ to itself is continuous but not
uniformly continuous.
[2] Let $f(x) = Ax$ be the non–trivial linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$ (with Euclidean
norms) defined by the $m \times n$ matrix $A = (a_{ij})$. Using the Cauchy–Schwarz inequality,
we see that $f$ is uniformly continuous: fix $a \in \mathbb{R}^n$. Then for
any $x \in \mathbb{R}^n$ we have
$$\|Ax - Aa\|_2^2 = \sum_{i=1}^m \left| \sum_{j=1}^n a_{ij}(x_j - a_j) \right|^2 \le \sum_{i=1}^m \left( \sum_{j=1}^n |a_{ij}|^2 \right) \left( \sum_{j=1}^n |x_j - a_j|^2 \right) = C^2 \|x - a\|_2^2,$$
where $C^2 = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2 > 0$. It follows that $f$ is uniformly continuous, and
we may take $\delta = \epsilon / C$.
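The estimate above can be checked numerically. The sketch below assumes a randomly generated real $3 \times 4$ matrix. It computes the constant $C = (\sum_{i,j} |a_{ij}|^2)^{1/2}$ from the Cauchy–Schwarz step and tests $\|Ax - Aa\|_2 \le C\|x - a\|_2$ on random pairs $x, a$. The helper names are hypothetical.

```python
import math
import random


def mat_vec(A, x):
    """Product Ax for A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def norm2(v):
    return math.sqrt(sum(c * c for c in v))


def lipschitz_constant(A):
    """C = (sum_{i,j} |a_ij|^2)^(1/2), the constant from the Cauchy-Schwarz bound."""
    return math.sqrt(sum(a * a for row in A for a in row))


random.seed(1)
m, n = 3, 4
A = [[random.uniform(-2, 2) for _ in range(n)] for _ in range(m)]
C = lipschitz_constant(A)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(n)]
    a = [random.uniform(-5, 5) for _ in range(n)]
    lhs = norm2([p - q for p, q in zip(mat_vec(A, x), mat_vec(A, a))])
    assert lhs <= C * norm2([p - q for p, q in zip(x, a)]) + 1e-9

# Given epsilon, delta = epsilon / C works for every base point a at once.
epsilon = 1e-3
delta = epsilon / C
```

The same $\delta$ works for every base point, which is exactly what uniform continuity asks for.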
[3] Let $X$ be the space of continuous functions $[-1, 1] \to \mathbb{R}$ with the sup norm (cf.
Example 1.4[4]). Define a map $F : X \to X$ by $F(u) = v$, where
$$v(t) = 1 + \int_0^t (\sin u(s) + \tan s)\,ds.$$
The map $F$ is uniformly continuous on $X$. Notice that $F$ is intimately connected
to a certain differential equation: a fixed point for $F$ (that is, an element $u \in X$
for which $F(u) = u$) is a continuous solution to the ordinary differential equation
$$\frac{du}{dt} = \sin(u) + \tan(t); \quad u(0) = 1,$$
in the region $t \in [-1, 1]$. We shall see later that $F$ does indeed have a fixed point
– knowing that $F$ is uniformly continuous is a step towards this. To see that $F$ is
uniformly continuous, calculate
$$\begin{aligned}
\|F(u) - F(v)\| &= \sup_{t \in [-1,1]} |F(u)(t) - F(v)(t)| \\
&= \sup_{t \in [-1,1]} \left| \left( 1 + \int_0^t (\sin u(s) + \tan s)\,ds \right) - \left( 1 + \int_0^t (\sin v(s) + \tan s)\,ds \right) \right| \\
&= \sup_{t \in [-1,1]} \left| \int_0^t (\sin u(s) - \sin v(s))\,ds \right| \\
&\le \sup_{t \in [-1,1]} \left| \int_0^t |\sin u(s) - \sin v(s)|\,ds \right| \\
&\le \|u - v\|.
\end{aligned}$$
Notice we have used the inequality $|\sin u - \sin v| \le |u - v|$, an easy consequence of
the Mean Value Theorem, together with $|t| \le 1$. In particular, given $\epsilon > 0$ we may take $\delta = \epsilon$.
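The following is a numerical illustration only, not the existence argument given later in the notes. The sketch discretizes $F$ on a uniform grid of $[-1, 1]$ and replaces the integral by the trapezoidal rule; both choices, and the grid size `n`, are assumptions of the sketch. It first checks the bound $\|F(u) - F(v)\| \le \|u - v\|$ for the discretized map. It then iterates $u \mapsto F(u)$ starting from the constant function $1$, which should approach an approximate fixed point, that is, an approximate solution of the differential equation above.

```python
import math


def make_grid(n=2000):
    """Uniform grid on [-1, 1]; n is even so that t = 0 is a grid point."""
    h = 2.0 / n
    return [-1.0 + k * h for k in range(n + 1)], h


def apply_F(u, t, h):
    """Trapezoidal approximation of F(u)(t) = 1 + int_0^t (sin u(s) + tan s) ds."""
    g = [math.sin(ui) + math.tan(ti) for ui, ti in zip(u, t)]
    n = len(t) - 1
    mid = n // 2                     # index of t = 0
    v = [0.0] * (n + 1)
    v[mid] = 1.0                     # F(u)(0) = 1
    for k in range(mid, n):          # integrate forwards from 0 to 1
        v[k + 1] = v[k] + h * (g[k] + g[k + 1]) / 2
    for k in range(mid, 0, -1):      # integrate backwards from 0 to -1
        v[k - 1] = v[k] - h * (g[k - 1] + g[k]) / 2
    return v


def sup_dist(u, v):
    """Discrete version of the sup norm distance ||u - v||."""
    return max(abs(a - b) for a, b in zip(u, v))


t, h = make_grid()

# The bound ||F(u) - F(v)|| <= ||u - v|| on two sample functions.
u = [math.cos(3 * s) for s in t]
v = [s ** 2 for s in t]
assert sup_dist(apply_F(u, t, h), apply_F(v, t, h)) <= sup_dist(u, v) + 1e-10

# Iterate w -> F(w) from the constant function 1.
w = [1.0] * len(t)
for _ in range(40):
    w = apply_F(w, t, h)
residual = sup_dist(apply_F(w, t, h), w)   # small when w is nearly a fixed point
```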
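[4] Let $X$ be the space of complex–valued square–integrable Riemann integrable
functions on $[0, 1]$ with the 2–norm (cf. Example 1.4[6]). Define a map $F : X \to X$ by
$F(u) = v$, with
$$v(t) = \int_0^t u^2(s)\,ds.$$
Then $F$ is continuous (but not uniformly continuous). By the Cauchy–Schwarz inequality and then
Minkowski’s inequality,
$$\begin{aligned}
|Fu(t) - Fv(t)| &= \left| \int_0^t (u^2(s) - v^2(s))\,ds \right| \\
&\le \int_0^1 (|u(s)| + |v(s)|)\,|u(s) - v(s)|\,ds \\
&\le \left( \int_0^1 (|u(s)| + |v(s)|)^2\,ds \right)^{1/2} \left( \int_0^1 |u(s) - v(s)|^2\,ds \right)^{1/2} \\
&\le (\|u\|_2 + \|v\|_2)\,\|u - v\|_2,
\end{aligned}$$
so that
$$\|Fu - Fv\|_2 \le \sup_{t \in [0,1]} |Fu(t) - Fv(t)| \le (\|u\|_2 + \|v\|_2)\,\|u - v\|_2.$$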
[5] The same map as in [4] applied to square–integrable Riemann integrable functions
on $[0, \infty)$ is not continuous. To see this, let $a, b > 0$ and define
$$u(t) = \begin{cases} a, & 0 \le t \le 2b^2, \\ ia, & 2b^2 < t \le 4b^2, \\ 0, & \text{otherwise}. \end{cases}$$
Then $\|u - 0\|_2 = 2ab$. On the other hand,
$$F(u)(t) = \begin{cases} a^2 t, & 0 \le t \le 2b^2, \\ 4b^2a^2 - a^2 t, & 2b^2 \le t \le 4b^2, \\ 0, & \text{otherwise}. \end{cases}$$
Then $\|F(u) - F(0)\|_2^2 = \frac{16}{3} a^4 b^6$. Now, given any $\delta > 0$ we may choose constants
$a, b$ with $2ab < \delta$ but $\frac{16}{3} a^4 b^6 = 1$: since $a^4 b^6 = (ab)^4 b^2$, fix the product $ab < \delta/2$
and then choose $b$ large enough. That is, given any $\delta > 0$ there is a function $u$
with the property that $\|u - 0\|_2 < \delta$ but $\|F(u) - F(0)\|_2 = 1$, showing that $F$ is not
continuous.
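The arithmetic in [5] can be checked with a short sketch. Given $\delta$, it chooses $ab = \delta/4$, which is one arbitrary choice with $2ab < \delta$, and then picks $b$ so that $\frac{16}{3}a^4b^6 = 1$. It then estimates $\|F(u)\|_2$ with the midpoint rule, using the formula for $F(u)$ above. The function names and the step count are assumptions of the sketch.

```python
import math


def counterexample(delta):
    """Constants a, b > 0 with 2ab < delta and (16/3) a^4 b^6 = 1."""
    ab = delta / 4                          # any product with 2ab < delta will do
    b = math.sqrt(3 / (16 * ab ** 4))       # solves (16/3) (ab)^4 b^2 = 1
    return ab / b, b


def norms(a, b, steps=100000):
    """Return ||u||_2 (exact) and a midpoint-rule estimate of ||F(u)||_2."""
    norm_u = 2 * a * b                      # u is piecewise constant with |u| = a
    T = 4 * b * b                           # F(u) vanishes for t >= 4 b^2
    h = T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        Fu = a * a * t if t <= 2 * b * b else 4 * b * b * a * a - a * a * t
        total += Fu * Fu * h
    return norm_u, math.sqrt(total)


for delta in (1.0, 0.1, 0.01):
    a, b = counterexample(delta)
    norm_u, norm_Fu = norms(a, b)
    print(f"delta={delta}: ||u||_2={norm_u:.4g}, ||F(u)||_2={norm_Fu:.6f}")
```

As $\delta$ shrinks, $\|u\|_2$ shrinks with it while $\|F(u)\|_2$ stays at $1$. This is the failure of continuity at $0$.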
The moral is that the topological properties of infinite–dimensional spaces are a little
counter–intuitive.
8. Sequences and completeness in normed spaces
Just as for continuity, we can use the norm on a normed linear space to define
convergence for sequences and series in a normed space using the corresponding
notion for $\mathbb{R}$.
Let $X = (X, \|\cdot\|_X)$ be a normed linear space. A sequence $(x_n)$ in $X$ is said to
converge to $a \in X$ if
$$\|x_n - a\| \longrightarrow 0$$
as $n \to \infty$.
Similarly, a series $\sum_{n=1}^\infty x_n$ converges if the sequence of partial sums $(s_N)$
defined by $s_N = \sum_{n=1}^N x_n$ is a convergent sequence.
Example 1.9. [1] If $(x_j)$ is a sequence in $\mathbb{R}^n$, with $x_j = (x_j^{(1)}, \dots, x_j^{(n)})$, then
check that
$$\|x_j\|_p \to 0$$
(that is, $(x_j)$ converges to 0 in the space $\ell^n_p$) if and only if $x_j^{(k)} \to 0$ in $\mathbb{R}$ for each
$k = 1, \dots, n$.
[2] For infinite–dimensional spaces, it is not enough to check convergence on each
component using a basis. Let $(x_j)$ be the sequence in $\ell^p$ defined by
$$x_j = (0, 0, \dots, 1, \dots)$$
(where the 1 appears in the $j$th position). Then if we write $x_j = (x_j^{(1)}, x_j^{(2)}, \dots)$ we
certainly have $x_j^{(k)} \to 0$ as $j \to \infty$ for each $k$. However, we also have $\|x_j\|_p = 1$ for
all $j$, so the sequence is certainly not converging to 0. Indeed, it is not converging
to anything: for $j \ne k$ we have $\|x_j - x_k\|_p = 2^{1/p}$ (or $1$ when $p = \infty$), so the terms never
get close to one another.
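A computer can only display finite truncations of these infinite sequences. The vectors $x_j$ are finitely supported, however, so truncating to the first 50 coordinates (an arbitrary choice) loses nothing for $j \le 50$. The sketch below shows the coordinatewise convergence to $0$ alongside $\|x_j\|_p = 1$ and $\|x_j - x_k\|_p = 2^{1/p}$. The helper names are hypothetical.

```python
def unit_vector(j, length):
    """The first `length` coordinates of x_j (a 1 in position j, zeros elsewhere)."""
    return [1.0 if k == j else 0.0 for k in range(1, length + 1)]


def p_norm(x, p):
    if p == float("inf"):
        return max(abs(c) for c in x)
    return sum(abs(c) ** p for c in x) ** (1.0 / p)


N = 50                                      # truncation length
xs = [unit_vector(j, N) for j in range(1, N + 1)]

# A fixed coordinate k is eventually 0: x_j^(k) = 0 for all j > k.
k = 3
print([x[k - 1] for x in xs[:6]])           # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

# ... but every term has norm 1, and distinct terms stay 2^(1/p) apart.
p = 2
print({p_norm(x, p) for x in xs})           # {1.0}
print(p_norm([a - b for a, b in zip(xs[4], xs[9])], p))   # sqrt(2), about 1.414
```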
Lemma 1.2. A map $F : X \to Y$ between normed linear spaces is continuous at
$a \in X$ if and only if
$$\lim_{n \to \infty} F(x_n) = F(a)$$
for every sequence $(x_n)$ converging to $a$.
Proof. Replace $|\cdot|$ with $\|\cdot\|$ in the proof of this statement for functions
$\mathbb{R} \to \mathbb{R}$.
Definition 1.9. A sequence $(x_n)$ is a Cauchy sequence if
$$\forall\, \epsilon > 0\ \exists\, N \text{ such that } n, m > N \implies \|x_n - x_m\| < \epsilon.$$
It is clear that a convergent sequence is a Cauchy sequence. We know that in
the normed linear space $(\mathbb{R}, |\cdot|)$ the converse also holds, and it is a simple matter
to check that in $\mathbb{R}^n$ the converse holds. In many reasonable infinite–dimensional
normed linear spaces, however, there are Cauchy sequences that do not converge.
Definition 1.10. A normed linear space is said to be complete if all Cauchy
sequences are convergent.
Example 1.10. [1] The sequence $3, \frac{31}{10}, \frac{314}{100}, \frac{3141}{1000}, \frac{31415}{10000}, \dots$ of decimal
truncations of $\pi$ is a Cauchy sequence of rationals converging to the real number $\pi$.
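The sketch below uses exact rational arithmetic. It assumes the first 30 decimal digits of $\pi$, typed in as a string; the code does not compute $\pi$. It forms the truncations $x_n$ and checks the estimate $|x_m - x_n| < 10^{-n}$ for $m > n$. This estimate is what makes the sequence Cauchy in $\mathbb{Q}$, even though its limit is not rational.

```python
from fractions import Fraction

# First 30 decimal digits of pi, typed in by hand.
PI_DIGITS = "3.141592653589793238462643383279"


def truncation(n):
    """x_n = pi truncated to n decimal places, as an exact rational."""
    whole, frac = PI_DIGITS.split(".")
    return Fraction(int(whole + frac[:n]), 10 ** n)


xs = [truncation(n) for n in range(20)]
print(xs[:4])   # [Fraction(3, 1), Fraction(31, 10), Fraction(157, 50), Fraction(3141, 1000)]

# Cauchy estimate: |x_m - x_n| < 10^(-n) whenever m > n.
for n in range(20):
    for m in range(n + 1, 20):
        assert abs(xs[m] - xs[n]) < Fraction(1, 10 ** n)
```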
[2] Consider the space $C[0, 1]$ of continuous functions under the sup norm (cf. Example 1.4[4]).
This is complete.
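[3] The space $C[0, 1]$ under the 2–norm (cf. Example 1.4[5]) is not complete. To
see this, consider the sequence of functions
$$u_n(t) = \begin{cases} 0, & 0 \le t \le \frac12 - \frac1n, \\ \frac{nt}{2} - \frac{n}{4} + \frac12, & \frac12 - \frac1n \le t \le \frac12 + \frac1n, \\ 1, & \frac12 + \frac1n \le t \le 1. \end{cases}$$
Then $(u_n)$ is a Cauchy sequence: for $m > n$ the functions $u_m$ and $u_n$ agree outside the interval
$[\frac12 - \frac1n, \frac12 + \frac1n]$ and both take values in $[0, 1]$, so
$$\|u_m - u_n\|_2^2 = \int_0^1 |u_m(t) - u_n(t)|^2\,dt = \int_{1/2 - 1/n}^{1/2 + 1/n} |u_m(t) - u_n(t)|^2\,dt \le \frac{2}{n} \to 0 \text{ as } m > n \to \infty.$$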
We claim that the sequence $(u_n)$ is not convergent in $C[0, 1]$ under the 2–norm. To
see this, let $g$ be the function defined by $g(t) = 0$ for $0 \le t \le \frac12$ and $g(t) = 1$ for
$\frac12 < t \le 1$, and assume that there is a continuous function $f$ with $\|u_n - f\|_2 \to 0$
as $n \to \infty$. It is clear that $\|u_n - g\|_2 \to 0$ as $n \to \infty$ also (indeed a direct calculation gives
$\|u_n - g\|_2^2 = \frac{1}{6n}$), so, since $\|f - g\|_2 \le \|f - u_n\|_2 + \|u_n - g\|_2$ for every $n$, we must have
$$\|f - g\|_2 = 0. \tag{3}$$
Now examine $f(\frac12)$. If $f(\frac12) \ne g(\frac12) = 0$ then $|f - g|$ must be positive on $(\frac12 - \delta, \frac12)$
for some $\delta > 0$, which contradicts (3). We must therefore have $f(\frac12) = 0$; but in
this case $|f - g|$ must be positive on $(\frac12, \frac12 + \delta)$ for some $\delta > 0$, again contradicting
(3). We conclude that there is no continuous function $f$ that is the 2–norm limit
of the sequence $(u_n)$. Thus the normed space $(C[0, 1], \|\cdot\|_2)$ is not complete.
9. Topological language
There are certain properties of subsets of normed linear spaces (and other
more general spaces) that we use very often. Topology is a subject that begins
by attaching names to these properties and then develops a shorthand for talking
about such things.
Definition 1.11. Let $X$ be a normed linear space.
A set $C \subset X$ is closed if whenever $(c_n)$ is a sequence in $C$ that is a convergent
sequence in $X$, the limit $\lim_{n \to \infty} c_n$ also lies in $C$.
A set $U \subset X$ is open if for every $u \in U$ there exists $\epsilon > 0$ such that $\|x - u\| < \epsilon
\implies x \in U$.
A set $S \subset X$ is bounded if there is an $R < \infty$ with the property that $x \in
S \implies \|x\| < R$.
A set $S \subset X$ is connected if there do not exist open sets $A, B$ in $X$ with
$S \subset A \cup B$, $S \cap A \ne \emptyset$, $S \cap B \ne \emptyset$ and $S \cap A \cap B = \emptyset$.
Associated to any set $S \subset X$ in a normed space there are sets $S^o \subset S \subset \bar{S}$
defined as follows: the interior of $S$ is the set
$$S^o = \{x \in X \mid \exists\, \epsilon > 0 \text{ such that } \|x - y\| < \epsilon \implies y \in S\}, \tag{4}$$
and the closure of $S$,
$$\bar{S} = \{x \in X \mid \forall\, \epsilon > 0\ \exists\, s \in S \text{ such that } \|s - x\| < \epsilon\}. \tag{5}$$
Exercise 1.1. [1] Prove that a map $f : X \to Y$ is continuous (cf. Definition
1.8) if and only if for every open set $U \subset Y$, the pre–image $f^{-1}(U) \subset X$ is also
open.
[2] Show by example that for a continuous map $f : \mathbb{R} \to \mathbb{R}$ there may be open sets
$U$ for which $f(U)$ is not open.
It is clear from first year analysis that closed bounded sets (closed intervals,
for example) have special properties. For example, recall the theorem of Bolzano–Weierstrass.
Theorem 1.1. Let $S$ be a non–empty closed and bounded subset of $\mathbb{R}$. Then a continuous
function $f : S \to \mathbb{R}$ attains its bounds: there exist $\xi, \eta \in S$ with the property that
$$f(\xi) = \sup_{s \in S} f(s), \quad f(\eta) = \inf_{s \in S} f(s).$$
Definition 1.12. A subset $S$ of a normed linear space is (sequentially) compact
if and only if every sequence $(s_n)$ in $S$ has a subsequence $(s_{n_j}) = (s_{n_1}, s_{n_2}, \dots)$
that converges in $S$.
Recall the following theorem (the Heine–Borel theorem) – which is really the
same one as Theorem 1.1.
Theorem 1.2. A subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded.
By now you should be used to the idea that such a result need not extend to
infinite–dimensional normed linear spaces: Example 1.9[2] is a bounded sequence
with no convergent subsequences, so the closed unit ball of $\ell^p$ is closed and bounded
but not compact. Thus Theorem 1.2 does not extend
to infinite–dimensional normed spaces. However the analogue of Theorem 1.1 does
hold in great generality. This is also a version of the Bolzano–Weierstrass theorem.
Theorem 1.3. If $A$ is a compact subset of a normed linear space $X$, and $f :
X \to Y$ is a continuous map between normed linear spaces, then $f(A)$ is a compact
subset of $Y$.
As an exercise, convince yourself that Theorem 1.3 implies Theorem 1.1.
Some standard sets are used so often that we give them special names.
Definition 1.13. Let $X$ be a normed space. Then the open ball of radius
$\epsilon > 0$ and centre $x_0$ is the set
$$B_\epsilon(x_0) = \{x \in X \mid \|x - x_0\| < \epsilon\}.$$
The closed ball of radius $\epsilon > 0$ and centre $x_0$ is the set
$$\bar{B}_\epsilon(x_0) = \{x \in X \mid \|x - x_0\| \le \epsilon\}.$$
Exercise 1.2. Open and closed balls in normed spaces are convex (cf. Definition 1.6).
Definition 1.14. A subset $S$ of a normed space $X$ is dense if every open ball
in $X$ has non–empty intersection with $S$. A normed space is said to be separable
if there is a countable set $S = \{x_1, x_2, \dots\}$ that is dense in $X$.
10. Quotient spaces
As an application of Section 9, quotients of normed spaces may be formed.
Notice that we need both the algebraic structure (subspace of a linear space) and
a topological property (closed) to make it all work.
Recall from Definition 1.11 and Definition 1.2 that a closed linear subspace $Y$
of a normed linear space $X$ is a subset $Y \subset X$ that is itself a linear space, with the
property that any sequence $(y_n)$ of elements of $Y$ that converges in $X$ has its limit
in $Y$.
The linear space $X/Y$ (the quotient or factor space) is formed as follows. The
elements of $X/Y$ are cosets of $Y$ – sets of the form $x + Y$ for $x \in X$. The set of
cosets is a linear space under the operations
$$(x_1 + Y) \oplus (x_2 + Y) = (x_1 + x_2) + Y, \qquad \lambda \odot (x + Y) = \lambda x + Y.$$
Notice that this makes sense precisely because $Y$ is itself a linear space, so for
example $Y + Y = Y$ and $\lambda Y = Y$ for $\lambda \ne 0$. Two cosets $x_1 + Y$ and $x_2 + Y$ are
equal if as sets $x_1 + Y = x_2 + Y$, which is true if and only if $x_1 - x_2 \in Y$.
Example 1.11. [1] Let $X = \mathbb{R}^3$, and let $Y$ be the subspace spanned by $(1, 1, 0)$.
Then $X/Y$ is a two–dimensional real vector space. There are many pairs of elements
that generate $X/Y$, for example
$$(1, 0, 1) + Y \quad \text{and} \quad (0, 0, 1) + Y.$$
[2] The linear space $Y$ of finitely supported sequences in $\ell^1$ is a linear subspace. The
quotient space $\ell^1/Y$ is very hard to visualize: its elements are equivalence classes
under the relation $(x_n) \sim (y_n)$ if the sequences $(x_n)$ and $(y_n)$ differ in only finitely many
positions.
[3] The linear space $Y$ of $\ell^1$ sequences of the form $(0, \dots, 0, x_{n+1}, \dots)$ (first $n$ terms
zero) is a linear subspace of $\ell^1$. Here the quotient space $\ell^1/Y$ is quite reasonable:
in fact it is isomorphic to $\mathbb{R}^n$.
[4] We know that for $p, q \in [1, \infty]$, $p < q \implies \ell^p \subset \ell^q$. This means that for
any $p < q$ there is a linear quotient space $\ell^q/\ell^p$. These quotient spaces are very
pathological.
[5] The linear space $Y = C[0, 1]$ is a linear subspace of the space $X$ of square–integrable
Riemann–integrable functions on $[0, 1]$. The quotient $X/Y$ is again a linear space
that is impossible to work with.
[6] Let $X = C[0, 1]$, and let $Y = \{f \in X \mid f(0) = 0\}$. Then $X/Y$ is isomorphic to $\mathbb{R}$.
It is clear from these examples that not all linear subspaces are equally good:
Examples 1.11 [1], [3], and [6] are quite reasonable, whereas [2], [4] and [5] are
examples of linear spaces unlike any we have seen. The reason is the following: the
space X/Y is guaranteed to be a normed space with a norm related to the original
norm on X only when the subspace Y is itself closed. Notice that Examples 1.11
[1], [3], and [6] are precisely the ones in which the subspace is closed.
Theorem 1.4. If $X$ is a normed space, and $Y$ is a closed linear subspace,
then $X/Y$ is a normed space under the norm
$$\|x + Y\| = \inf_{z \in x + Y} \|z\|. \tag{6}$$
Before proving this theorem, try to convince yourself that the norm (6) is the
obvious candidate: if $X = \mathbb{R}^2$ and $Y = \mathbb{R}(1, 0)$, then the space $X/Y$ consists of
lines in $X$ of the form $(s, t) + Y$. Notice that each such line may be written uniquely
in the form $(0, t) + Y$, and this choice minimizes the norm of the element of $X$ that
represents the line.
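For the Euclidean norm the infimum in (6) can be written down explicitly when $Y$ is a line. The sketch below assumes $X = \mathbb{R}^n$ with $\|\cdot\|_2$ and $Y = \mathbb{R}v$. Under these assumptions $\inf_s \|x + sv\|_2$ is attained at the orthogonal projection, $s = -\langle x, v\rangle / \langle v, v\rangle$. This closed form is special to inner-product norms and does not apply to general norms. A crude grid search over $s$ is included as a cross-check. Both helper names are hypothetical.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def quotient_norm_line(x, v):
    """||x + Y|| = inf_s ||x + s v||_2 for Y = R v, with the Euclidean norm.

    The infimum is attained at the orthogonal projection s = -<x, v>/<v, v>.
    """
    s = -dot(x, v) / dot(v, v)
    w = [a + s * b for a, b in zip(x, v)]
    return math.sqrt(dot(w, w))


def quotient_norm_brute(x, v, lo=-100.0, hi=100.0, steps=20001):
    """Cross-check: minimise ||x + s v||_2 over a grid of s values in [lo, hi]."""
    best = math.inf
    for k in range(steps):
        s = lo + (hi - lo) * k / (steps - 1)
        best = min(best, math.sqrt(sum((a + s * b) ** 2 for a, b in zip(x, v))))
    return best


# X = R^2, Y = R(1, 0): the coset (s, t) + Y has norm |t|.
print(quotient_norm_line([5.0, -3.0], [1.0, 0.0]))   # 3.0
```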
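Proof. Let $z + Y$ be any coset of $Y$ in $X$, and let $(x_n) \subset z + Y$ be a convergent
sequence with $x_n \to x$. Then for any fixed $n$, $(x_n - x_m)_m$ is a sequence in $Y$ with
$x_n - x_m \to x_n - x$ as $m \to \infty$. Since $Y$ is closed, we must have $x_n - x \in Y$, so $x + Y =
x_n + Y = z + Y$. That is, the limit of the sequence defines the same coset as does
the sequence – the set $z + Y$ is a closed set.
Assume now that $\|x + Y\| = 0$. Then there is a sequence $(x_n) \subset x + Y$ with
$\|x_n\| \to 0$. Since $x + Y$ is closed and $x_n \to 0$, we must have $0 \in x + Y$, so $x + Y = Y$,
the zero element in $X/Y$.
Homogeneity is clear: for $\lambda \ne 0$ the map $z \mapsto \lambda z$ is a bijection from $x + Y$ onto $\lambda x + Y$, so
$$\|\lambda(x + Y)\| = \inf_{z \in x + Y} \|\lambda z\| = |\lambda| \inf_{z \in x + Y} \|z\| = |\lambda| \|x + Y\|,$$
while for $\lambda = 0$ both sides vanish.
Finally, the triangle inequality:
$$\begin{aligned}
\|(x_1 + Y) + (x_2 + Y)\| &= \inf_{z_1 \in x_1 + Y;\ z_2 \in x_2 + Y} \|z_1 + z_2\| \\
&\le \inf_{z_1 \in x_1 + Y} \|z_1\| + \inf_{z_2 \in x_2 + Y} \|z_2\| \\
&= \|x_1 + Y\| + \|x_2 + Y\|.
\end{aligned}$$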
Example 1.12. Even if the subspace is closed, the quotient space may be a
little odd. For example, let $c$ denote the space of all sequences $(x_n)$ with the
property that $\lim_n x_n$ exists. This is a closed subspace of $\ell^\infty$. What is the quotient
$\ell^\infty/c$?
CHAPTER 2
Banach spaces
It turns out to be very important and natural to work in complete spaces –
trying to do functional analysis in non–complete spaces is a little like trying to do
elementary analysis over the rationals.
Definition 2.1. A complete normed linear space is called¹ a Banach space.
Example 2.1. [1] We are already familiar with a large class of Banach spaces:
any finite–dimensional normed linear space is a Banach space. In our notation, this
means that $\ell^n_p$ is a Banach space for all $1 \le p \le \infty$ and all $n$.
[2] The space of continuous functions with the sup norm is a Banach space (cf.
Example 1.4[4] and Example 1.10[2]).
[3] The sequence space $\ell^p$ is a Banach space. To see this, assume that $(x_n)$ is a
Cauchy sequence in $\ell^p$, and write
$$x_n = (x_n^{(1)}, x_n^{(2)}, \dots).$$
Recall that $\|\cdot\|_p \ge \|\cdot\|_\infty$ for all $p$ (cf. Example 1.4[3]). So, given $\epsilon > 0$ we may
find $N$ with the property that
$$m, n > N \implies \|x_n - x_m\|_p < \epsilon,$$
which in turn implies that $\|x_n - x_m\|_\infty < \epsilon$, so for each $k$, $|x_n^{(k)} - x_m^{(k)}| < \epsilon$. That
is, if $(x_n)$ is a Cauchy sequence in $\ell^p$, then for each $k$ the sequence $(x_n^{(k)})$ is a Cauchy sequence
in $\mathbb{R}$. Since $\mathbb{R}$ is complete, we deduce that for each $k$ we have $x_n^{(k)} \to y^{(k)}$. Notice
that this does not imply by itself that $x_n \to y$ (cf. Example 1.9[2]). However, if we
know (as we do) that $(x_n)$ is Cauchy, then it does: we prove this for $p < \infty$ but
the $p = \infty$ case is similar. Fix $\epsilon > 0$, and use the Cauchy criterion to find $N$ such
that $n, m > N$ implies that
$$\sum_{k=1}^\infty |x_n^{(k)} - x_m^{(k)}|^p < \epsilon.$$
Now fix $n$ and let $m \to \infty$ to see that
$$\sum_{k=1}^\infty |x_n^{(k)} - y^{(k)}|^p \le \epsilon$$
(notice that $<$ has become $\le$; to justify the limit, apply it first to each finite partial sum
$\sum_{k=1}^K$ and then let $K \to \infty$). This last inequality means that
$$\|x_n - y\|_p \le \epsilon^{1/p},$$
so $x_n - y \in \ell^p$, hence $y = x_n - (x_n - y) \in \ell^p$, and $x_n \to y = (y^{(1)}, y^{(2)}, \dots)$ in $\ell^p$.

¹ After the Polish mathematician Stefan Banach (1892–1945) who gave the first abstract
treatment of complete normed spaces in his 1920 thesis (Fundamenta Math., 3, 133–181, 1922).
His later book (Théorie des opérations linéaires, Warsaw, 1932) laid the foundations of functional
analysis.
Lemma 2.1. Let $(x_n)$ be a sequence in a Banach space $X$. If the series $\sum_{n=1}^\infty x_n$
is absolutely convergent, then it is convergent.
Recall that absolutely convergent means that the numerical series $\sum_{n=1}^\infty \|x_n\|$
is convergent. The lemma is not true for general normed spaces: take, for
example, a sequence of functions $f_n$ in $C[0, 1]$ with $\|f_n\|_2 \le \frac{1}{n^2}$ whose partial sums converge
in the 2–norm to the step function $g$ of Example 1.10[3]; as in that example, $\sum_{n=1}^\infty f_n$ then has
no limit in $(C[0, 1], \|\cdot\|_2)$.
Proof. Consider the sequence of partial sums $s_m = \sum_{n=1}^m x_n$:
$$\|s_m - s_k\| \le \sum_{n=k+1}^m \|x_n\| \to 0$$
as $m > k \to \infty$. It follows that the sequence $(s_m)$ is Cauchy; since $X$ is complete
this sequence converges, so the series $\sum_{n=1}^\infty x_n$ converges.
1. Completions
Completeness is so important that in many applications we deal with non-complete normed spaces by completing them. This is analogous to the process of passing from $\mathbb{Q}$ to $\mathbb{R}$ by working with Cauchy sequences of rationals. In this section we simply outline what is done. In later sections we will see more details about what the completions look like.
Let $X$ be a normed linear space. Let $\mathcal{C}(X)$ denote the set of all Cauchy sequences in $X$. An element of $\mathcal{C}(X)$ is then a Cauchy sequence $(x_n)$. The linear space structure of $X$ extends to $\mathcal{C}(X)$ by defining $\alpha(x_n) + (y_n) = (\alpha x_n + y_n)$. The norm $\|\cdot\|$ on $X$ extends to a semi-norm on $\mathcal{C}(X)$, defined by
$$\|(x_n)\| = \lim_{n\to\infty} \|x_n\|.$$
The limit exists because $\big|\|x_n\| - \|x_m\|\big| \le \|x_n - x_m\|$, so $(\|x_n\|)$ is a Cauchy sequence of real numbers. Finally, define an equivalence relation $\sim$ on $\mathcal{C}(X)$ by $(x_n) \sim (y_n)$ if and only if $\|x_n - y_n\| \to 0$. Then the linear space operations and the semi-norm are well-defined on the space of equivalence classes $\mathcal{C}(X)/\!\sim$. This gives a complete normed linear space $\bar{X}$, called the completion of $X$.
Exercise 2.1. [1] Apply the process outlined above to the rationals Q. Try to
see why the obvious extension of the norm to the space of Cauchy sequences only
gives a semi–norm.
[2] Construct a Cauchy sequence $(f_n)$ in $(C[0,1], \|\cdot\|_2)$ with the property that $f_n \ne 0$ for every $n$ but $\|f_n\|_2 \to 0$. This means that the Cauchy sequence $(f_n)$ and the Cauchy sequence $(0)$ are not separated by the semi-norm $\|\cdot\|_2$, showing it is not a norm.
[3] Show that if $X$ is already a Banach space, then there is a bijective isomorphism between $X$ and $\bar{X}$.
It should be clear from the above that it is going to be difficult to work with elements of the completions in this formal way, where an element of $\bar{X}$ is an equivalence class of Cauchy sequences. However, all we will ever need is the simple statement: for any normed linear space $X$, there is a Banach space $\hat{X}$ such that $X$ is isomorphic to a dense subspace $\imath(X)$ of $\hat{X}$. The map $\imath$ from $X$ into $\hat{X}$ preserves all the linear space operations.
Example 2.2. [1] We have seen that $C[0,1]$ under the 2-norm is not complete (cf. Example 1.10[2]). Similar examples show that $C[0,1]$ is not complete under any of the $p$-norms with $1 \le p < \infty$. Let $X$ denote the non-complete space $(C[0,1], \|\cdot\|_p)$. A reasonable guess for $\bar{X}$ might be the space of Riemann-integrable functions with finite $p$-norm, but this is still not complete. It is easy to construct a Cauchy sequence of Riemann-integrable functions that does not converge in the $p$-norm to a Riemann-integrable function. However, if you use Lebesgue integration, you do get a complete space, called $L^p[0,1]$. For now, think of this space as consisting of all Riemann-integrable functions with finite $p$-norm, together with extra functions obtained as limits of sequences of Riemann-integrable functions. Then $L^p$ provides a further example of a Banach space.
[2] A function $f: X \to Y$ is said to have compact support if it is zero outside some compact subset of $X$. The support of $f$ is the smallest closed set containing $\{x \in X \mid f(x) \ne 0\}$. This example is important in distribution theory and in the study of partial differential equations. Let $C_0^\infty(\Omega)$ be the space of infinitely differentiable functions of compact support on $\Omega$, an open subset of $\mathbb{R}^n$. Recall the definition of higher-order derivatives $D^a$ from Example 1.1(8). For each $k \in \mathbb{N}$ and $1 \le p < \infty$ define a norm
$$\|f\|_{k,p} = \Big( \int_\Omega \sum_{|a| \le k} |D^a f(x)|^p \, dx \Big)^{1/p}.$$
This gives an infinite family of (different) normed space structures on the linear space $C_0^\infty(\Omega)$. None of these spaces is complete. There are sequences of $C^\infty$ functions whose $(k,p)$-limit is not infinitely differentiable, and for some choices of $k$, $p$ and $n$ the limit is not even continuous. The completions of these spaces are the Sobolev spaces.
2. Contraction mapping theorem
In this section we prove the simplest of the many fixed-point theorems. Such theorems are useful for solving equations. With the formalism of function spaces, one uniform treatment covers both numerical equations like $x = \cos(x)$ and differential equations like
$$\frac{dy}{dx} = x + \tan(xy), \qquad y(0) = y_0.$$
Exercise 2.2. If you have an electronic calculator, put it in "radians" mode. Starting with any initial value, press the cos button repeatedly. What happens? Can you explain why this happens? (Draw a graph.) How does this relate to the equation $x = \cos(x)$?
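The calculator experiment is easy to imitate on a computer. The Python sketch below is one way to do it. The helper name `iterate`, the starting value `1.0` and the stopping tolerance are arbitrary choices made here; they are not part of the exercise.

```python
import math

def iterate(F, x0, tol=1e-12, max_steps=1000):
    """Apply F repeatedly, starting at x0, until successive values differ by less than tol."""
    x = x0
    for step in range(1, max_steps + 1):
        x_next = F(x)
        if abs(x_next - x) < tol:
            return x_next, step
        x = x_next
    raise RuntimeError("no convergence within max_steps")

# "Press the cos button repeatedly", starting from 1.0 (radians).
x, steps = iterate(math.cos, 1.0)
print(x, steps)   # x is (numerically) a solution of x = cos(x)
```

Changing the starting value changes the number of steps, but not the limit.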
Definition 2.2. A map $F: X \to Y$ between normed linear spaces is called a contraction if there is a constant $K < 1$ for which
$$\|F(x) - F(y)\|_Y \le K\|x - y\|_X \tag{7}$$
for all $x, y \in X$.
Exercise 2.3. [1] Any contraction is uniformly continuous.
[2] If $f: [a,b] \to [a,b]$ has the property that $|f(x) - f(y)| < |x - y|$ for all $x \ne y$, must $f$ be a contraction? (Consider $f(x) = x - x^2/2$ on $[0,1]$.)
[3] Find an example of a function $f: \mathbb{R} \to \mathbb{R}$ that has the property $|f(x) - f(y)| < |x - y|$ for all $x \ne y \in \mathbb{R}$, but $f$ is not a contraction.
Theorem 2.1. If $F: X \to X$ is a contraction and $X$ is a Banach space, then there is a unique point $x^* \in X$ which is fixed by $F$ (that is, $F(x^*) = x^*$). Moreover, if $x_0$ is any point in $X$, then the sequence defined by $x_1 = F(x_0)$, $x_2 = F(x_1)$, … converges to $x^*$.
Corollary 2.1. If $S$ is a closed subset of the Banach space $X$, and $F: S \to S$ is a contraction, then $F$ has a unique fixed point in $S$.
Proof. Simply notice that $S$ is itself complete (since it is a closed subset of a complete space), and that the proof of Theorem 2.1 does not use the linear space structure of $X$.
Corollary 2.2. If $S$ is a closed subset of a Banach space, and $F: S \to S$ has the property that for some $n$ there is a $K < 1$ such that
$$\|F^n(x) - F^n(y)\| \le K\|x - y\|$$
for all $x, y \in S$, then $F$ has a unique fixed point.
Proof. Choose any point $x_0 \in S$. Then by Corollary 2.1, applied to the contraction $F^n$, we have
$$x = \lim_{k\to\infty} F^{kn}x_0,$$
where $x$ is the unique fixed point of $F^n$. If $F$ is continuous (as it is in all our applications), then
$$Fx = \lim_{k\to\infty} FF^{kn}x_0.$$
On the other hand, $F^n$ is a contraction, so
$$\|F^{kn}Fx_0 - F^{kn}x_0\| \le K\|F^{(k-1)n}Fx_0 - F^{(k-1)n}x_0\| \le \cdots \le K^k\|F(x_0) - x_0\|,$$
so
$$\|F(x) - x\| = \lim_{k\to\infty}\|FF^{kn}x_0 - F^{kn}x_0\| = 0.$$
It follows that $F(x) = x$, so $x$ is a fixed point for $F$. Continuity of $F$ can in fact be avoided. Since $F^n(Fx) = F(F^n x) = Fx$, the point $Fx$ is also a fixed point of $F^n$, so $Fx = x$ by uniqueness. This fixed point is automatically unique: if $F$ had more than one fixed point, then so would $F^n$, which is impossible by Corollary 2.1.
Exercise 2.4. [1] Give an example of a map $f: \mathbb{R} \to \mathbb{R}$ which has the property that $|f(x) - f(y)| < |x - y|$ for all $x \ne y \in \mathbb{R}$, but $f$ has no fixed point.
[2] Let $f$ be a function from $[0,1]$ to $[0,1]$. Check that the contraction condition (7) holds if $f$ has a continuous derivative $f'$ on $[0,1]$ with the property that
$$|f'(x)| \le K < 1$$
for all $x \in [0,1]$. Draw graphs to illustrate convergence of the iterates² of $f$ to a fixed point for examples with $0 < f'(x) < 1$ and $-1 < f'(x) < 0$.
² Iteration of continuous functions on the interval may be used to illustrate many of the features of dynamical systems, including frequency locking, sensitive dependence on initial conditions, period doubling, the Feigenbaum phenomena and so on. An excellent starting point is the article and demonstration "One-dimensional iteration" at the web site http://www.geom.umn.edu/java/.
Example 2.3. A basic linear problem is the following. Let $F: \mathbb{R}^n \to \mathbb{R}^n$ be the affine map defined by
$$F(x) = Ax + b,$$
where $A = (a_{ij})$ is an $n \times n$ matrix. Equivalently, $F(x) = y$, where
$$y_i = \sum_{j=1}^{n} a_{ij}x_j + b_i \quad \text{for } i = 1, \dots, n.$$
If $F$ is a contraction, then we can apply Theorem 2.1 to solve³ the equation $F(x) = x$. The conditions under which $F$ is a contraction depend on the choice of norm for $\mathbb{R}^n$. Three examples follow.
[1] Using the max norm, $\|x\|_\infty = \max_i |x_i|$. In this case,
$$\begin{aligned}
\|F(x) - F(\tilde{x})\|_\infty &= \max_i \Big|\sum_j a_{ij}(x_j - \tilde{x}_j)\Big| \le \max_i \sum_j |a_{ij}|\,|x_j - \tilde{x}_j| \\
&\le \max_i \sum_j |a_{ij}| \, \max_j |x_j - \tilde{x}_j| = \Big(\max_i \sum_j |a_{ij}|\Big)\|x - \tilde{x}\|_\infty.
\end{aligned}$$
Thus the contraction condition is
$$\sum_j |a_{ij}| \le K < 1 \quad \text{for } i = 1, \dots, n. \tag{8}$$
[2] Using the 1-norm, $\|x\|_1 = \sum_{i=1}^{n} |x_i|$. In this case,
$$\|F(x) - F(\tilde{x})\|_1 = \sum_i \Big|\sum_j a_{ij}(x_j - \tilde{x}_j)\Big| \le \sum_i \sum_j |a_{ij}|\,|x_j - \tilde{x}_j| \le \Big(\max_j \sum_i |a_{ij}|\Big)\|x - \tilde{x}\|_1.$$
The contraction condition is now
$$\sum_i |a_{ij}| \le K < 1 \quad \text{for } j = 1, \dots, n. \tag{9}$$
[3] Using the 2-norm, $\|x\|_2 = \big(\sum_{i=1}^{n} |x_i|^2\big)^{1/2}$. In this case, the Cauchy–Schwarz inequality applied to each row gives
$$\|F(x) - F(\tilde{x})\|_2^2 = \sum_i \Big(\sum_j a_{ij}(x_j - \tilde{x}_j)\Big)^2 \le \Big(\sum_i \sum_j a_{ij}^2\Big)\|x - \tilde{x}\|_2^2.$$
The contraction condition is now
$$\sum_i \sum_j a_{ij}^2 \le K < 1, \tag{10}$$
and then $F$ is a contraction with constant $\sqrt{K} < 1$.
³ Of course the equation is in one sense trivial. However, it is sometimes important computationally to avoid inverting matrices. More importantly, we want an iterative scheme that converges to a solution in some predictable fashion.
It follows that if any one of the conditions (8), (9) or (10) holds, then there exists a unique solution in $\mathbb{R}^n$ to the affine equation $Ax + b = x$. Moreover, the solution may be approximated using the iterative scheme $x_1 = F(x_0)$, $x_2 = F(x_1)$, ….
Notice that each of the conditions (8), (9), (10) is sufficient for the method to work, but none of them is necessary. In fact, for each of the three conditions there are examples in which that condition holds and the other two fail.
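To see the three conditions side by side, the Python sketch below evaluates the constants in (8), (9) and (10) for a matrix and runs the iterative scheme. The three $2 \times 2$ matrices are assumed examples chosen here, each satisfying exactly one of the conditions. The function names are hypothetical.

```python
def cond_rows(A):
    """Constant in condition (8): max_i sum_j |a_ij| (the max-norm estimate)."""
    return max(sum(abs(a) for a in row) for row in A)

def cond_cols(A):
    """Constant in condition (9): max_j sum_i |a_ij| (the 1-norm estimate)."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def cond_frobenius(A):
    """Constant in condition (10): sum_i sum_j a_ij**2 (its square root is the 2-norm constant)."""
    return sum(a * a for row in A for a in row)

def solve_by_iteration(A, b, x0, n_steps=200):
    """Iterate x_{k+1} = F(x_k) with F(x) = Ax + b."""
    x = list(x0)
    for _ in range(n_steps):
        x = [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]
    return x

# Assumed example matrices, each satisfying exactly one of (8), (9), (10):
only_8 = [[0.9, 0.0], [0.9, 0.0]]    # row sums 0.9; first column sum 1.8; sum of squares 1.62
only_9 = [[0.9, 0.9], [0.0, 0.0]]    # the transpose of only_8
only_10 = [[0.5, 0.5], [0.5, 0.0]]   # a row sum and a column sum equal 1; sum of squares 0.75
for A in (only_8, only_9, only_10):
    print(cond_rows(A) < 1, cond_cols(A) < 1, cond_frobenius(A) < 1)

x = solve_by_iteration(only_10, [1.0, 1.0], [0.0, 0.0])
print(x)   # approaches the solution of x = Ax + b, which here is (6, 4)
```

For `only_10` neither (8) nor (9) holds, but (10) does, so the iteration still converges, as the 2-norm argument predicts.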
It remains only to prove the contraction mapping theorem.
Proof of Theorem 2.1. Given any point $x_0 \in X$, define a sequence $(x_n)$ by
$$x_1 = F(x_0), \quad x_2 = F(x_1), \quad \dots.$$
Then, for any $n \le m$ we have by the contraction condition (7)
$$\begin{aligned}
\|x_n - x_m\| &= \|F^n x_0 - F^m x_0\| \\
&\le K^n\|x_0 - F^{m-n}x_0\| \\
&\le K^n\big(\|x_0 - x_1\| + \|x_1 - x_2\| + \cdots + \|x_{m-n-1} - x_{m-n}\|\big) \\
&\le K^n\|x_0 - x_1\|\big(1 + K + K^2 + \cdots + K^{m-n-1}\big) \\
&\le \frac{K^n}{1-K}\|x_0 - x_1\|.
\end{aligned}$$
Now for fixed $x_0$, the last expression converges to zero as $n$ goes to infinity, so (cf. Definition 1.9) the sequence $(x_n)$ is a Cauchy sequence. Since the linear space $X$ is complete (cf. Definition 1.10), the sequence $(x_n)$ therefore converges, say
$$x^* = \lim_{n\to\infty} x_n.$$
Since $F$ is continuous (Exercise 2.3[1]),
$$F(x^*) = F\Big(\lim_{n\to\infty} x_n\Big) = \lim_{n\to\infty} F(x_n) = \lim_{n\to\infty} x_{n+1} = x^*,$$
so $F$ has a fixed point $x^*$. To prove that $x^*$ is the only fixed point for $F$, notice that if $F(y) = y$, say, then
$$\|x^* - y\| = \|F(x^*) - F(y)\| \le K\|x^* - y\|,$$
which requires that $x^* = y$ since $K < 1$.
3. Applications to differential equations
As mentioned before, the most important applications of the contraction mapping method are to function spaces. We have seen already in Example 1.8[3] that fixed points for certain integral operators on function spaces are solutions of ordinary differential equations. The first result in this direction is due to Picard⁴.
⁴ (Charles) Émile Picard (1856–1941) was Professor of higher analysis at the Sorbonne and became permanent secretary of the Paris Academy of Sciences. Some of his deepest results lie in complex analysis: (1) a non-constant entire function can omit at most one finite value; (2) a non-polynomial entire function takes on every value (except the possible exceptional one) an infinite number of times.
Theorem 2.2. Let $f: G \to \mathbb{R}$ be a continuous function defined on a set $G$ containing a neighbourhood $\{(x,y) \mid \|(x,y) - (x_0,y_0)\|_\infty < e\}$ of $(x_0,y_0)$ for some $e > 0$. Suppose that $f$ satisfies a Lipschitz condition of the form
$$|f(x,y) - f(x,\tilde{y})| \le M|y - \tilde{y}| \tag{11}$$
in the variable $y$ on $G$. Then there is an interval $(x_0 - \delta, x_0 + \delta)$ on which the ordinary differential equation
$$\frac{dy}{dx} = f(x,y) \tag{12}$$
has a unique solution $y = \phi(x)$ satisfying the initial condition
$$\phi(x_0) = y_0. \tag{13}$$
Proof. The differential equation (12) with initial condition (13) is equivalent to the integral equation
$$\phi(x) = y_0 + \int_{x_0}^{x} f(t, \phi(t))\,dt. \tag{14}$$
Since $f$ is continuous, there is a bound
$$|f(x,y)| \le R \tag{15}$$
for all $(x,y)$ with $\|(x,y) - (x_0,y_0)\|_\infty < e'$, for some $0 < e' < e$. (The closed square $\|(x,y) - (x_0,y_0)\|_\infty \le e'$ is a compact subset of $G$, so $f$ is bounded on it.) Choose $\delta > 0$ such that
(1) $|x - x_0| \le \delta$ and $|y - y_0| \le R\delta$ together imply that $\|(x,y) - (x_0,y_0)\|_\infty < e'$;
(2) $M\delta < 1$, where $M$ is the Lipschitz constant in (11).
Let $S$ be the set of continuous functions $\phi$ defined on the interval $|x - x_0| \le \delta$ with the property that $|\phi(x) - y_0| \le R\delta$, equipped with the sup metric. The set $S$ is complete, since it is a closed subset of a complete space. Define a mapping $F: S \to S$ by the equation
$$(F(\phi))(x) = y_0 + \int_{x_0}^{x} f(t, \phi(t))\,dt. \tag{16}$$
First check that $F$ does indeed map $S$ into $S$. If $\phi \in S$, then by (1) the points $(t, \phi(t))$ lie in the region where (15) holds, so
$$|F\phi(x) - y_0| = \Big|\int_{x_0}^{x} f(t,\phi(t))\,dt\Big| \le \Big|\int_{x_0}^{x} |f(t,\phi(t))|\,dt\Big| \le R|x - x_0| \le R\delta$$
by (15), so $F(\phi) \in S$. Moreover,
$$|F\phi(x) - F\tilde{\phi}(x)| \le \Big|\int_{x_0}^{x} |f(t,\phi(t)) - f(t,\tilde{\phi}(t))|\,dt\Big| \le M\delta\max_x|\phi(x) - \tilde{\phi}(x)|,$$
so that
$$\|F(\phi) - F(\tilde{\phi})\| \le M\delta\|\phi - \tilde{\phi}\|$$
after taking sups over $x$. By construction $M\delta < 1$, so $F$ is a contraction mapping. It follows from Corollary 2.1 that the operator $F$ has a unique fixed point in $S$, so the differential equation (12) with initial condition (13) has a unique solution.
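The proof is constructive: the fixed point of (16) is the limit of the iterates $F(\phi), F^2(\phi), \dots$ (Picard iteration). The Python sketch below imitates this on a grid. It assumes the trapezoidal rule for the integrals and starts from the constant function $y_0$; `picard` is a hypothetical helper name. For the test problem $dy/dx = y$, $y(0) = 1$ we have $M = 1$. With $\delta = \tfrac12$ the conditions of the proof can be met, for instance with $e' = 2$ and $R = 3$.

```python
import math

def picard(f, x0, y0, delta, n_iter=30, n_grid=2001):
    """Apply (F phi)(x) = y0 + int_{x0}^{x} f(t, phi(t)) dt repeatedly on
    [x0 - delta, x0 + delta], starting from the constant function y0.
    Integrals use the trapezoidal rule on a uniform grid, so this approximates
    the fixed point rather than computing it exactly."""
    if n_grid % 2 == 0:
        raise ValueError("n_grid must be odd so that x0 is a grid point")
    h = 2 * delta / (n_grid - 1)
    xs = [x0 - delta + i * h for i in range(n_grid)]
    mid = (n_grid - 1) // 2                      # xs[mid] is x0
    phi = [float(y0)] * n_grid
    for _ in range(n_iter):
        g = [f(x, y) for x, y in zip(xs, phi)]   # the integrand t -> f(t, phi(t))
        new = [0.0] * n_grid
        new[mid] = float(y0)
        for i in range(mid + 1, n_grid):         # integrate to the right of x0
            new[i] = new[i - 1] + h * (g[i - 1] + g[i]) / 2
        for i in range(mid - 1, -1, -1):         # integrate to the left of x0
            new[i] = new[i + 1] - h * (g[i] + g[i + 1]) / 2
        phi = new
    return xs, phi

# Assumed example: dy/dx = y, y(0) = 1, whose exact solution is exp(x).
xs, phi = picard(lambda x, y: y, 0.0, 1.0, 0.5)
print(max(abs(p - math.exp(x)) for x, p in zip(xs, phi)))
```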
The conditions on the set $G$ used in Theorem 2.2 arise very often, so it is useful to have a short description for them. A domain in a normed linear space $X$ is an open connected set (cf. Definition 1.11). An example of a domain in $\mathbb{R}$ containing the point $a$ is an interval $(a - \delta, a + \delta)$ for some $\delta > 0$. Notice that if $G$ is a domain in $(X, \|\cdot\|_X)$ and $a \in G$, then for some $\epsilon > 0$ the open ball $B_\epsilon(a) = \{x \in X \mid \|x - a\|_X < \epsilon\}$ lies in $G$ (cf. Definition 1.13).
Picard's theorem easily generalises to systems of simultaneous differential equations. We shall see in the next section that the contraction mapping method also applies to certain integral equations.
Theorem 2.3. Let $G$ be a domain in $\mathbb{R}^{n+1}$ containing the point $(x_0, y_{01}, \dots, y_{0n})$, and let $f_1, \dots, f_n$ be continuous functions from $G$ to $\mathbb{R}$, each satisfying a Lipschitz condition
$$|f_i(x, y_1, \dots, y_n) - f_i(x, \tilde{y}_1, \dots, \tilde{y}_n)| \le M\max_{1\le j\le n}|y_j - \tilde{y}_j| \tag{17}$$
in the variables $y_1, \dots, y_n$. Then there is an interval $(x_0 - \delta, x_0 + \delta)$ on which the system of simultaneous ordinary differential equations
$$\frac{dy_i}{dx} = f_i(x, y_1, \dots, y_n) \quad \text{for } i = 1, \dots, n \tag{18}$$
has a unique solution
$$y_1 = \phi_1(x), \; \dots, \; y_n = \phi_n(x)$$
satisfying the initial conditions
$$\phi_1(x_0) = y_{01}, \; \dots, \; \phi_n(x_0) = y_{0n}. \tag{19}$$
Proof. As in the proof of Theorem 2.2, write the system defined by (18) and (19) in integral form
$$\phi_i(x) = y_{0i} + \int_{x_0}^{x} f_i(t, \phi_1(t), \dots, \phi_n(t))\,dt \quad \text{for } i = 1, \dots, n. \tag{20}$$
Since each of the functions $f_i$ is continuous on $G$, there is a bound
$$|f_i(x, y_1, \dots, y_n)| \le R \tag{21}$$
in some domain $G' \subset G$ with $G' \ni (x_0, y_{01}, \dots, y_{0n})$. For instance, take a small open cube around that point whose closure lies in $G$. Choose $\delta > 0$ with the properties that
(1) $|x - x_0| \le \delta$ and $\max_i |y_i - y_{0i}| \le R\delta$ together imply that $(x, y_1, \dots, y_n) \in G'$;
(2) $M\delta < 1$.
Now define the set $S$ to be the set of $n$-tuples $(\phi_1, \dots, \phi_n)$ of continuous functions defined on the interval $[x_0 - \delta, x_0 + \delta]$ and such that $|\phi_i(x) - y_{0i}| \le R\delta$ for all $i = 1, \dots, n$. The set $S$ may be equipped with the norm
$$\|\phi - \tilde{\phi}\| = \max_{x,i}|\phi_i(x) - \tilde{\phi}_i(x)|.$$
It is easy to check that $S$ is complete. The mapping $F$ defined by the set of integral operators
$$(F(\phi))_i(x) = y_{0i} + \int_{x_0}^{x} f_i(t, \phi_1(t), \dots, \phi_n(t))\,dt \quad \text{for } |x - x_0| \le \delta, \; i = 1, \dots, n$$
is a contraction from $S$ to itself. To see this, first notice that if $\phi = (\phi_1, \dots, \phi_n) \in S$ and $|x - x_0| \le \delta$, then
$$|(F(\phi))_i(x) - y_{0i}| = \Big|\int_{x_0}^{x} f_i(t, \phi_1(t), \dots, \phi_n(t))\,dt\Big| \le R\delta \quad \text{for } i = 1, \dots, n$$
by (21), so that $F(\phi) = ((F(\phi))_1, \dots, (F(\phi))_n)$ lies in $S$. It remains to check that $F$ is a contraction:
$$\begin{aligned}
|(F(\phi))_i(x) - (F(\tilde{\phi}))_i(x)| &\le \Big|\int_{x_0}^{x} |f_i(t, \phi_1(t), \dots, \phi_n(t)) - f_i(t, \tilde{\phi}_1(t), \dots, \tilde{\phi}_n(t))|\,dt\Big| \\
&\le M\delta\max_{t,j}|\phi_j(t) - \tilde{\phi}_j(t)|;
\end{aligned}$$
after maximising over $x$ and $i$ we have
$$\|F(\phi) - F(\tilde{\phi})\| \le M\delta\|\phi - \tilde{\phi}\|,$$
so $F: S \to S$ is a contraction. It follows that the equation (20) has a unique solution, so the system of differential equations (18) with initial conditions (19) has a unique solution.
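When the $f_i$ are polynomial, the iterates of the system version can even be computed exactly. The sketch below is an illustration chosen here, not one from the notes. It applies the integral operators of the proof to $y_1' = y_2$, $y_2' = -y_1$ with $y_1(0) = 0$, $y_2(0) = 1$, so $M = 1$ in (17). Polynomials are stored as lists of exact `Fraction` coefficients. The iterates turn out to be the Taylor polynomials of $\sin$ and $\cos$.

```python
import math
from fractions import Fraction

def integrate(poly):
    """Antiderivative vanishing at 0 of sum_k poly[k] * x**k, with exact rational coefficients."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(poly)]

def picard_system(n_iter):
    """Picard iterates for y1' = y2, y2' = -y1 with y1(0) = 0, y2(0) = 1 (x0 = 0),
    following phi_i(x) = y_0i + int_0^x f_i(t, phi(t)) dt."""
    phi1, phi2 = [Fraction(0)], [Fraction(1)]     # start from the constant initial values
    for _ in range(n_iter):
        new1 = integrate(phi2)                     # 0 + int_0^x phi2
        new2 = integrate([-c for c in phi1])       # 1 + int_0^x (-phi1)
        new2[0] += 1
        phi1, phi2 = new1, new2
    return phi1, phi2

def evaluate(poly, x):
    return sum(float(c) * x**k for k, c in enumerate(poly))

phi1, phi2 = picard_system(20)
print(phi1[:6])   # the coefficients 0, 1, 0, -1/6, 0, 1/120 of the Taylor series of sin
print(evaluate(phi1, 0.5) - math.sin(0.5), evaluate(phi2, 0.5) - math.cos(0.5))
```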
4. Applications to integral equations
Integral equations may be a little less familiar than differential equations (though we have seen already that the two are intimately connected), so we begin with some important examples. The theory of integral equations is largely modern (twentieth-century) mathematics, but several specific instances of integral equations had appeared earlier.
Certain problems in physics led to the need to "invert" the integral equation
$$g(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{ixy}f(y)\,dy \tag{22}$$
for functions $f$ and $g$ of specific kinds. This was solved (formally at least) by Fourier⁵ in 1811, who noted that (22) requires that
$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-ixy}g(y)\,dy.$$
We shall see later that this is really due to properties of particularly good Banach spaces called Hilbert spaces.
⁵ Jean Baptiste Joseph Fourier (1768–1830), who pursued interests in mathematics and mathematical physics. He became famous for his Théorie analytique de la chaleur (1822), a mathematical treatment of the theory of heat. He established the partial differential equation governing heat diffusion and solved it by using infinite series of trigonometric functions. Though these series had been used before, Fourier investigated them in much greater detail. His research, initially criticized for its lack of rigour, was later shown to be valid. It provided the impetus for later work on trigonometric series and the theory of functions of a real variable.
Abel⁶ studied generalizations of the tautochrone⁷ problem, and was led to the integral equation
$$g(x) = \int_a^x \frac{f(y)}{(x-y)^b}\,dy, \qquad b \in (0,1), \quad g(a) = 0,$$
for which he found the solution
$$f(y) = \frac{\sin \pi b}{\pi}\int_a^y \frac{g'(x)}{(y-x)^{1-b}}\,dx.$$
This equation is an example of a Volterra⁸ equation.
⁶ Niels Henrik Abel (1802–1829) was a brilliant Norwegian mathematician. He earned wide recognition at the age of 18 with his first paper, in which he proved that the general polynomial equation of the fifth degree is insolvable by algebraic procedures (problems of this sort are studied in Galois Theory). Abel was instrumental in establishing mathematical analysis on a rigorous basis. In his major work, Recherches sur les fonctions elliptiques (Investigations on Elliptic Functions, 1827), he revolutionized the understanding of elliptic functions by studying the inverse of these functions.
⁷ Also called an isochrone: a curve along which a pendulum takes the same time to make a complete oscillation, independent of the amplitude of the oscillation. The resulting differential equation was solved by James Bernoulli in May 1690, who showed that the result is a cycloid.
⁸ Vito Volterra (1860–1940) succeeded Beltrami as professor of Mathematical Physics at Rome. His method for solving the equations that carry his name is exactly the one we shall use. He worked widely in analysis and integral equations. He also helped drive Lebesgue to produce a more sophisticated theory of integration by giving an example of a function with bounded derivative whose derivative is not Riemann integrable.
We shall briefly study two kinds of integral equation (though the second is formally a special case of the first).
Example 2.4. A Fredholm equation⁹ is an integral equation of the form
$$f(x) = \lambda\int_a^b K(x,y)f(y)\,dy + \phi(x), \tag{23}$$
where $K$ and $\phi$ are two given functions, and we seek a solution $f$ in terms of the arbitrary (constant) parameter $\lambda$. The function $K$ is called the kernel of the equation, and the equation is called homogeneous if $\phi = 0$.
We assume that $K(x,y)$ and $\phi(x)$ are continuous on the square $\{(x,y) \mid a \le x \le b, \; a \le y \le b\}$. It follows in particular (see Section 1.9) that there is a bound $M$ such that
$$|K(x,y)| \le M \quad \text{for all } a \le x \le b, \; a \le y \le b.$$
Define a mapping $F: C[a,b] \to C[a,b]$ by
$$(F(f))(x) = \lambda\int_a^b K(x,y)f(y)\,dy + \phi(x). \tag{24}$$
Now
$$\begin{aligned}
\|F(f_1) - F(f_2)\| &= \max_x |F(f_1)(x) - F(f_2)(x)| \\
&\le |\lambda| M(b-a)\max_x |f_1(x) - f_2(x)| \\
&= |\lambda| M(b-a)\|f_1 - f_2\|,
\end{aligned}$$
so that $F$ is a contraction mapping if
$$|\lambda| < \frac{1}{M(b-a)}.$$
It follows by Theorem 2.1 that the equation (23) has a unique continuous solution $f$ for small enough values of $\lambda$. The solution may be obtained by starting with any continuous function $f_0$ and then iterating the scheme
$$f_{n+1}(x) = \lambda\int_a^b K(x,y)f_n(y)\,dy + \phi(x).$$
⁹ This is really a Fredholm equation "of the second kind", named after the Swedish geometer Erik Ivar Fredholm (1866–1927).
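The iterative scheme is straightforward to try numerically. The Python sketch below discretises the integral with the trapezoidal rule (an assumption; any quadrature rule would do) and iterates from $f_0 = 0$. The kernel, right-hand side and value of $\lambda$ are an assumed test problem whose exact solution is easy to find. With $K(x,y) = xy$ and $\phi(x) = x$ on $[0,1]$, one checks that $f(x) = 3x/(3-\lambda)$.

```python
def solve_fredholm(K, phi, lam, a, b, n_grid=201, n_iter=40):
    """Iterate f_{k+1}(x) = lam * int_a^b K(x, y) f_k(y) dy + phi(x) on a uniform grid,
    replacing the integral by the trapezoidal rule (a discretisation chosen for this sketch)."""
    h = (b - a) / (n_grid - 1)
    xs = [a + i * h for i in range(n_grid)]
    w = [h] * n_grid
    w[0] = w[-1] = h / 2                                    # trapezoidal weights
    Kw = [[K(x, y) * wj for y, wj in zip(xs, w)] for x in xs]
    phis = [phi(x) for x in xs]
    f = [0.0] * n_grid                                      # f_0 = 0; any continuous start works
    for _ in range(n_iter):
        f = [lam * sum(kij * fj for kij, fj in zip(row, f)) + p
             for row, p in zip(Kw, phis)]
    return xs, f

# Assumed test problem: K(x, y) = x*y on [0, 1], phi(x) = x, lam = 0.5.
# Here M = 1 and b - a = 1, so |lam| < 1/(M(b - a)); the exact solution is f(x) = 1.2x.
xs, f = solve_fredholm(lambda x, y: x * y, lambda x: x, 0.5, 0.0, 1.0)
print(max(abs(fx - 1.2 * x) for x, fx in zip(xs, f)))
```

The remaining error comes from the quadrature, not from the iteration, which converges geometrically here.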
Example 2.5. Now consider the Volterra equation
$$f(x) = \lambda\int_a^x K(x,y)f(y)\,dy + \phi(x), \tag{25}$$
which differs¹⁰ from the Fredholm equation (23) only in that the variable $x$ appears as the upper limit of integration. As before, define a function $F: C[a,b] \to C[a,b]$ by
$$(F(f))(x) = \lambda\int_a^x K(x,y)f(y)\,dy + \phi(x).$$
Then for $f_1, f_2 \in C[a,b]$ we have
$$|F(f_1)(x) - F(f_2)(x)| = \Big|\lambda\int_a^x K(x,y)[f_1(y) - f_2(y)]\,dy\Big| \le |\lambda| M(x-a)\max_t|f_1(t) - f_2(t)|,$$
where $M = \max_{x,y}|K(x,y)| < \infty$. It follows that
$$\begin{aligned}
|F^2(f_1)(x) - F^2(f_2)(x)| &= \Big|\lambda\int_a^x K(x,y)[F(f_1)(y) - F(f_2)(y)]\,dy\Big| \\
&\le |\lambda| M\int_a^x |F(f_1)(y) - F(f_2)(y)|\,dy \\
&\le |\lambda|^2 M^2\max_t|f_1(t) - f_2(t)|\int_a^x (y-a)\,dy \\
&= |\lambda|^2 M^2\frac{(x-a)^2}{2}\max_t|f_1(t) - f_2(t)|,
\end{aligned}$$
and in general (by induction on $n$)
$$\begin{aligned}
|F^n(f_1)(x) - F^n(f_2)(x)| &\le |\lambda|^n M^n\frac{(x-a)^n}{n!}\max_t|f_1(t) - f_2(t)| \\
&\le |\lambda|^n M^n\frac{(b-a)^n}{n!}\max_t|f_1(t) - f_2(t)|.
\end{aligned}$$
It follows that
$$\|F^n f_1 - F^n f_2\| \le \frac{|\lambda|^n M^n(b-a)^n}{n!}\|f_1 - f_2\|,$$
so that $F^n$ is a contraction mapping if $n$ is chosen large enough to ensure that
$$\frac{|\lambda|^n M^n(b-a)^n}{n!} < 1.$$
Such an $n$ always exists, since $c^n/n! \to 0$ for every constant $c$. It follows by Corollary 2.2 that the equation (25) has a unique solution for all $\lambda$.
¹⁰ If we extend the definition of the kernel $K(x,y)$ appearing in (25) by setting $K(x,y) = 0$ for all $y > x$, then (25) becomes an instance of the Fredholm equation (23). We do not take this route, because applying the contraction mapping method to the Volterra equation directly gives a better result: the condition on $\lambda$ can be avoided.
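The advantage of the Volterra form also shows up numerically. The Python sketch below uses the trapezoidal rule again, with an assumed test problem. Here $\lambda = 3$ is far outside the range $|\lambda| < 1/(M(b-a))$ of Example 2.4, yet the iteration still converges. Its limit is the exact solution $f(x) = e^{3x}$ of $f(x) = 3\int_0^x f(y)\,dy + 1$.

```python
import math

def solve_volterra(K, phi, lam, a, b, n_grid=401, n_iter=40):
    """Iterate f_{k+1}(x) = lam * int_a^x K(x, y) f_k(y) dy + phi(x) on a uniform grid,
    using the trapezoidal rule on [a, x] (a discretisation chosen for this sketch)."""
    h = (b - a) / (n_grid - 1)
    xs = [a + i * h for i in range(n_grid)]
    Kmat = [[K(x, xs[j]) for j in range(i + 1)] for i, x in enumerate(xs)]  # only y <= x is needed
    phis = [phi(x) for x in xs]
    f = list(phis)                                   # f_0 = phi
    for _ in range(n_iter):
        new = []
        for i, row in enumerate(Kmat):
            s = sum(r * fj for r, fj in zip(row, f))  # zip stops after the i + 1 entries of row
            integral = h * (s - (row[0] * f[0] + row[i] * f[i]) / 2)   # equals 0 when i == 0
            new.append(lam * integral + phis[i])
        f = new
    return xs, f

# Assumed test problem: K = 1, phi = 1 on [0, 1] and lam = 3, so |lam| M (b - a) = 3 > 1
# and the estimate of Example 2.4 would not apply. The exact solution is exp(3x).
xs, f = solve_volterra(lambda x, y: 1.0, lambda x: 1.0, 3.0, 0.0, 1.0)
print(max(abs(fx - math.exp(3 * x)) / math.exp(3 * x) for x, fx in zip(xs, f)))
```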
CHAPTER 3
Linear Transformations
Let $X$ and $Y$ be linear spaces, and $T$ a function from the set $D_T \subset X$ into $Y$. Sometimes such functions will be called operators, mappings or transformations. The set $D_T$ is the domain of $T$, and $T(D_T) \subset Y$ is the range of $T$. Suppose the set $D_T$ is a linear subspace of $X$ and
$$T(\alpha x + \beta y) = \alpha T(x) + \beta T(y) \quad \text{for all } \alpha, \beta \in \mathbb{R} \text{ (or } \mathbb{C}\text{)}, \; x, y \in D_T. \tag{26}$$
Then $T$ is called a linear map (or linear operator). Notice that a linear operator is injective if and only if its kernel $\{x \in D_T \mid Tx = 0\}$ is trivial.
Lemma 3.1. A linear transformation $T: X \to Y$ is continuous if and only if it is continuous at one point.
Proof. Assume that $T$ is continuous at a point $a$. Then for any sequence $x_n \to a$, $T(x_n) \to T(a)$. Let $z$ be any point in $X$, and $(y_n)$ a sequence with $y_n \to z$. Then $(y_n - z + a)$ is a sequence converging to $a$, so $T(y_n - z + a) = T(y_n) - T(z) + T(a) \to T(a)$. It follows that $T(y_n) \to T(z)$.
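The following simple observation is useful in differential equations, where it is called the principle of superposition. If $\sum_{n=1}^{\infty}\alpha_n x_n$ is convergent and $T$ is a continuous linear map, then
$$T\Big(\sum_{n=1}^{\infty}\alpha_n x_n\Big) = \sum_{n=1}^{\infty}\alpha_n Tx_n$$
(apply $T$ to the partial sums and use linearity and continuity).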
1. Bounded operators
Example 3.1. Consider a voltage $v(t)$ applied to a resistor $R$, capacitor $C$ and inductor $L$ arranged in series (an "LCR" circuit). The charge $u = u(t)$ on the capacitor satisfies the equation
$$L\frac{d^2u}{dt^2} + R\frac{du}{dt} + \frac{1}{C}u = v, \tag{27}$$
with some initial conditions, say $u(0) = 0$, $\frac{du}{dt}(0) = 0$. Assuming that $R^2 > 4L/C$, the solution of (27) is
$$u(t) = \int_0^t k(t-s)v(s)\,ds, \tag{28}$$
where
$$k(t) = \frac{e^{\lambda_1 t} - e^{\lambda_2 t}}{L(\lambda_1 - \lambda_2)}$$
and $\lambda_1, \lambda_2$ are the (distinct) roots of $L\lambda^2 + R\lambda + \frac{1}{C} = 0$.
This problem may be phrased in terms of linear operators. Let $X = C[0,\infty)$. Then the transformation defined by $T(v) = u$ in (28) is a linear operator from $X$ to $X$.
Similarly, (27) can be written in the form $S(u) = v$ for some linear operator $S$. However, $S$ cannot be defined on all of $X$, only on the dense linear subspace of twice-differentiable functions. The transformations $T$ and $S$ are closely related, and we would like to develop a framework for viewing them as inverse to each other.
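One way to convince oneself that (28) really solves (27) is to compare it with a direct numerical solution of the differential equation. The Python sketch below does this for assumed component values $L = 1$, $R = 3$, $C = 1$ (so $R^2 > 4L/C$) and an assumed input $v(t) = \sin t$. It uses the trapezoidal rule for $T(v)$ and the classical fourth-order Runge–Kutta method for (27).

```python
import math

# Assumed component values (not from the text): L = 1, R = 3, C = 1, so R**2 > 4L/C.
L_IND, R_RES, C_CAP = 1.0, 3.0, 1.0
root = math.sqrt(R_RES**2 - 4 * L_IND / C_CAP)
lam1 = (-R_RES + root) / (2 * L_IND)   # the two distinct roots of L*lam**2 + R*lam + 1/C = 0
lam2 = (-R_RES - root) / (2 * L_IND)

def k(t):
    return (math.exp(lam1 * t) - math.exp(lam2 * t)) / (L_IND * (lam1 - lam2))

def T(v, t, n=2000):
    """u(t) = int_0^t k(t - s) v(s) ds, approximated by the trapezoidal rule."""
    h = t / n
    vals = [k(t - i * h) * v(i * h) for i in range(n + 1)]
    return h * (sum(vals) - (vals[0] + vals[-1]) / 2)

def solve_ode(v, t_end, n=2000):
    """Integrate L u'' + R u' + u/C = v with u(0) = u'(0) = 0 by the classical RK4 method."""
    def rhs(t, u, w):                  # w stands for du/dt
        return w, (v(t) - R_RES * w - u / C_CAP) / L_IND
    h = t_end / n
    t, u, w = 0.0, 0.0, 0.0
    for _ in range(n):
        k1 = rhs(t, u, w)
        k2 = rhs(t + h / 2, u + h / 2 * k1[0], w + h / 2 * k1[1])
        k3 = rhs(t + h / 2, u + h / 2 * k2[0], w + h / 2 * k2[1])
        k4 = rhs(t + h, u + h * k3[0], w + h * k3[1])
        u += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        w += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return u

v = math.sin                            # an assumed input voltage
print(T(v, 5.0), solve_ode(v, 5.0))
```

The two numbers agree to within the quadrature error, which is consistent with $T$ being an inverse of the differential operator $S$ on functions satisfying the zero initial conditions.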
Definition 3.1. A linear transformation $T: X \to Y$ is (algebraically) invertible if there is a linear transformation $S: Y \to X$ with the property that $TS = 1_Y$ and $ST = 1_X$.
For example, in Example 3.1 take $X = C[0,\infty)$, and let $Y$ be the space of twice continuously differentiable functions $u$ on $[0,\infty)$ with $u(0) = u'(0) = 0$. Then $T: X \to Y$ is algebraically invertible with $T^{-1} = S$.
Definition 3.2. A linear operator $T: X \to Y$ is bounded if there is a constant $K$ such that
$$\|Tx\|_Y \le K\|x\|_X \quad \text{for all } x \in X.$$
The norm of the bounded linear operator $T$ is
$$\|T\| = \sup_{x \ne 0}\frac{\|Tx\|_Y}{\|x\|_X}. \tag{29}$$
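Consider a matrix acting between spaces equipped with the max norm. The calculation in Example 2.3[1] gives $\|A\| \le \max_i \sum_j |a_{ij}|$. Choosing $x$ with entries $\pm 1$ that match the signs in a row where the maximum is attained shows that equality holds. The Python sketch below checks this against the definition (29). The $2 \times 3$ matrix is an assumed example and the helper names are hypothetical.

```python
import itertools
import random

def apply(A, x):
    """The vector Ax for a matrix A given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def max_norm(x):
    return max(abs(xj) for xj in x)

def inf_operator_norm(A):
    """Maximal absolute row sum: the bound obtained in Example 2.3[1]."""
    return max(sum(abs(a) for a in row) for row in A)

def sup_over_sign_vectors(A):
    """max ||Ax||_inf over vectors with entries +1 or -1 (all of which have ||x||_inf = 1)."""
    n = len(A[0])
    return max(max_norm(apply(A, s)) for s in itertools.product((-1.0, 1.0), repeat=n))

A = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]]      # an assumed operator from R^3 to R^2
print(inf_operator_norm(A), sup_over_sign_vectors(A))

# Random (almost surely nonzero) vectors never give a larger ratio ||Ax|| / ||x||.
random.seed(0)
worst = max(max_norm(apply(A, x)) / max_norm(x)
            for x in ([random.uniform(-1, 1) for _ in range(3)] for _ in range(1000)))
print(worst)
```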
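Example 3.2. In Example 3.1, the operator $T$ is bounded when restricted to $C[0,a]$ for any $a > 0$, since
$$|Tv(t)| \le \int_0^t |k(t-s)|\,|v(s)|\,ds,$$
which shows that
$$\|Tv\|_\infty \le a\sup_{0\le t\le a}|k(t)|\,\|v\|_\infty \le \frac{a\|v\|_\infty}{L|\lambda_1 - \lambda_2|}.$$
The last step holds because both roots $\lambda_1, \lambda_2$ are negative, so $|e^{\lambda_1 t} - e^{\lambda_2 t}| \le 1$ for $t \ge 0$. The operator $S$ is not bounded, of course: think about what differentiation does.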
Exercise 3.1. [1] Show that $\|T\| = \sup_{\|x\|=1}\|Tx\|_Y$.
[2] Prove the following useful inequality:
$$\|Tx\|_Y \le \|T\|\,\|x\|_X \quad \text{for all } x \in X. \tag{30}$$
Theorem 3.1. A linear transformation $T: X \to Y$ is continuous if and only if it is bounded.
Proof. If $T$ is bounded and $x_n \to 0$, then by Definition 3.2, $Tx_n \to 0$ also. It follows that $T$ is continuous at $0$, so by Lemma 3.1 $T$ is continuous everywhere.
Conversely, suppose that $T$ is continuous but unbounded. Then for any $n \in \mathbb{N}$ there is a point $x_n$ with $\|Tx_n\| > n\|x_n\|$. Let
$$y_n = \frac{x_n}{n\|x_n\|},$$
so that $\|y_n\| = \frac{1}{n}$ and $y_n \to 0$ as $n \to \infty$. On the other hand, $\|Ty_n\| > 1$ and $T(0) = 0$, contradicting the assumption that $T$ is continuous at $0$.
2. The space of linear operators
The set of all linear transformations X → Y is itself a linear space with the
operations
(T +S)(x) = Tx +Sx, (λT)(x) = λTx.
Denote this linear space by L(X, Y ). If X and Y are normed spaces, denote by
B(X, Y ) the subspace of continuous linear transformations. If X = Y , then write
L(X, X) = L(X) and B(X, X) = B(X).
Lemma 3.2. Let $X$ and $Y$ be normed spaces. Then $B(X,Y)$ is a normed linear space with the norm (29). If in addition $Y$ is a Banach space, then $B(X,Y)$ is a Banach space.
Proof. We have to show that the function $T \mapsto \|T\|$ satisfies the conditions of Definition 1.4.
(1) It is clear that $\|T\| \ge 0$, since it is defined as the supremum of a set of non-negative numbers. If $\|T\| = 0$ then $\|Tx\|_Y = 0$ for all $x$, so $Tx = 0$ for all $x$; that is, $T = 0$.
(2) The triangle inequality is also clear:
$$\|T + S\| = \sup_{\|x\|=1}\|(T+S)x\| \le \sup_{\|x\|=1}\|Tx\| + \sup_{\|x\|=1}\|Sx\| = \|T\| + \|S\|.$$
(3) $\|\lambda T\| = \sup_{\|x\|=1}\|(\lambda T)x\| = |\lambda|\sup_{\|x\|=1}\|Tx\| = |\lambda|\,\|T\|$.
Finally, assume that $Y$ is a Banach space and let $(T_n)$ be a Cauchy sequence in $B(X,Y)$. Then the sequence is bounded (as any Cauchy sequence is): there is a constant $K$ with $\|T_n x\| \le K\|x\|$ for all $x \in X$ and $n \ge 1$. Since
$$\|T_n x - T_m x\| \le \|T_n - T_m\|\,\|x\| \to 0 \quad \text{as } n \ge m \to \infty,$$
the sequence $(T_n x)$ is a Cauchy sequence in $Y$ for each $x \in X$. Since $Y$ is complete, for each $x \in X$ the sequence $(T_n x)$ converges. Define $T$ by
$$Tx = \lim_{n\to\infty} T_n x.$$
It is clear that $T$ is linear, and $\|Tx\| \le K\|x\|$ for all $x$, so $T \in B(X,Y)$.
We have not yet established that $T_n \to T$ in the norm of $B(X,Y)$ (cf. (29)). Since $(T_n)$ is Cauchy, for any $\epsilon > 0$ there is an $N$ such that
$$\|T_m - T_n\| \le \epsilon \quad \text{for all } m \ge n \ge N.$$
For any $x \in X$ we therefore have
$$\|T_m x - T_n x\|_Y \le \epsilon\|x\|_X.$$
Take the limit as $m \to \infty$ to see that
$$\|Tx - T_n x\| \le \epsilon\|x\|,$$
so that $\|T - T_n\| \le \epsilon$ if $n \ge N$. This proves that $\|T - T_n\| \to 0$ as $n \to \infty$.
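Example 3.3. Once the space of linear operators is known to be complete, we can do analysis on the operators themselves. For example, if $X$ is a Banach space and $A \in B(X)$, then we may define an operator
$$e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots.$$
This makes sense because $\|A^k\| \le \|A\|^k$ (cf. Example 3.4[2]), so
$$\|e^A\| \le 1 + \|A\| + \frac{1}{2!}\|A\|^2 + \cdots = e^{\|A\|}.$$
The series is therefore absolutely convergent, so it converges in the Banach space $B(X)$ by Lemma 2.1 and Lemma 3.2. This is particularly useful in linear systems theory and control theory. If $x(t) \in \mathbb{R}^n$, then the linear differential equation
$$\frac{dx}{dt} = Ax(t), \qquad x(0) = x_0,$$
where $A$ is an $n \times n$ matrix, has as solution $x(t) = e^{At}x_0$.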
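For small matrices the series can be summed directly. The Python sketch below simply truncates it after a fixed number of terms. That is adequate for matrices of modest norm, but it is not how production libraries compute matrix exponentials. The rotation example is an assumed test case for which $e^{At}$ is known in closed form.

```python
import math

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def exp_series(A, terms=40):
    """Partial sum I + A + A^2/2! + ... of the exponential series
    (no scaling and squaring, so only sensible for matrices of modest norm)."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]               # the current term A^k / k!, starting at I
    for k in range(1, terms):
        term = [[a / k for a in row] for row in mat_mul(term, A)]
        total = [[s + a for s, a in zip(srow, arow)] for srow, arow in zip(total, term)]
    return total

# Assumed example: A = [[0, 1], [-1, 0]], for which e^{At} = [[cos t, sin t], [-sin t, cos t]].
t = 1.3
E = exp_series([[0.0, t], [-t, 0.0]])
x0 = [1.0, 0.0]
x_t = [sum(e * xj for e, xj in zip(row, x0)) for row in E]   # x(t) = e^{At} x0
print(E, x_t)
```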
3. Banach algebras
In many situations it makes sense to multiply elements of a normed linear space
together.
Definition 3.3. Let $X$ be a Banach space, and assume there is a multiplication $(x, y) \mapsto xy$ from $X \times X \to X$ such that addition and multiplication make $X$ into a ring, and
$$\|xy\| \le \|x\|\,\|y\|.$$
Then $X$ is called a Banach algebra.
Recall that a ring does not need to have a unit. If $X$ has a unit, then it is called unital.
Example 3.4. [1] The continuous functions $C[0,1]$ with the sup norm form a Banach algebra with $(fg)(x) = f(x)g(x)$.
[2] If $X$ is any Banach space, then $B(X)$ is a Banach algebra:
$$\|ST\| = \sup_{\|x\|=1}\|(ST)x\| = \sup_{\|x\|=1}\|S(Tx)\| \le \|S\|\sup_{\|x\|=1}\|Tx\| = \|S\|\,\|T\|.$$
The algebra has an identity, namely $I(x) = x$.
[3] A special case of [2] is the case $X = \mathbb{R}^n$. By choosing a basis for $\mathbb{R}^n$ we may identify $B(\mathbb{R}^n)$ with the space of $n \times n$ real matrices.
In the next few sections we will prove the more technical results about linear
transformations that provide the basic tools of functional analysis.
4. Uniform boundedness
The first theorem is the principle of uniform boundedness, or the Banach–Steinhaus theorem.
Theorem 3.2. Let $X$ be a Banach space and let $Y$ be a normed linear space. Let $\{T_\alpha\}$ be a family of bounded linear operators from $X$ into $Y$. If, for each $x \in X$, the set $\{T_\alpha x\}$ is a bounded subset of $Y$, then the set $\{\|T_\alpha\|\}$ is bounded.
Proof. Assume first that there is a ball $B_\epsilon(x_0)$ on which $\{T_\alpha x\}$ (a set of functions) is uniformly bounded. That is, there is a constant $K$ such that, for all $\alpha$,
$$\|T_\alpha x\| \le K \quad \text{if } \|x - x_0\| < \epsilon. \tag{31}$$
Then it is possible to find a uniform bound on the whole family $\{T_\alpha\}$. For any $y \ne 0$ define
$$z = \frac{\epsilon}{2\|y\|}\,y + x_0.$$
Then $\|z - x_0\| = \epsilon/2$, so $z \in B_\epsilon(x_0)$ and (31) implies that $\|T_\alpha z\| \le K$. Now by linearity of $T_\alpha$ this shows that
$$\frac{\epsilon}{2\|y\|}\|T_\alpha y\| - \|T_\alpha x_0\| \le \Big\|\frac{\epsilon}{2\|y\|}T_\alpha y + T_\alpha x_0\Big\| = \|T_\alpha z\| \le K,$$
which can be solved for $\|T_\alpha y\|$:
$$\|T_\alpha y\| \le \frac{2(K + \|T_\alpha x_0\|)}{\epsilon}\|y\| \le \frac{2(K + K')}{\epsilon}\|y\|,$$
where $K' = \sup_\alpha\|T_\alpha x_0\| < \infty$. It follows that
$$\|T_\alpha\| \le \frac{2(K + K')}{\epsilon}$$
for all $\alpha$, as required.
To finish the proof we have to show that there is a ball on which property (31) holds. This is proved by a contradiction argument: assume for now that there is no ball on which (31) holds. Fix an arbitrary ball $B_0$. By assumption there is a point $x_1 \in B_0$ such that
$$\|T_{\alpha_1}x_1\| > 1$$
for some index $\alpha_1$, say. Since each $T_\alpha$ is continuous, there is a ball $B_{\epsilon_1}(x_1) \subset B_0$ on which $\|T_{\alpha_1}x\| > 1$. Assume without loss of generality that $\epsilon_1 < 1$. By assumption, in this new ball the family $\{T_\alpha x\}$ is not bounded, so there is a point $x_2 \in B_{\epsilon_1}(x_1)$ with
$$\|T_{\alpha_2}x_2\| > 2$$
for some index $\alpha_2 \ne \alpha_1$. Continue in the same way: by continuity of $T_{\alpha_2}$ there is a ball $B_{\epsilon_2}(x_2) \subset B_{\epsilon_1}(x_1)$ on which $\|T_{\alpha_2}x\| > 2$. Assume without loss of generality that $\epsilon_2 < \frac12$.
Repeating this process produces points $x_3, x_4, x_5, \dots$, indices $\alpha_3, \alpha_4, \alpha_5, \dots$, and positive numbers $\epsilon_3, \epsilon_4, \epsilon_5, \dots$ with the following properties: $B_{\epsilon_n}(x_n) \subset B_{\epsilon_{n-1}}(x_{n-1})$, $\epsilon_n < \frac1n$, all the $\alpha_j$'s are distinct, and
$$\|T_{\alpha_n}x\| > n \quad \text{for all } x \in B_{\epsilon_n}(x_n).$$
Now the sequence $(x_n)$ is Cauchy: for $m \ge n$ we have $x_m \in B_{\epsilon_n}(x_n)$, so $\|x_m - x_n\| < \frac1n$. It therefore converges to some $z \in X$ (equivalently, prove that $\bigcap_{n=1}^{\infty}\overline{B}_{\epsilon_n}(x_n)$ contains the single point $z$). By construction and the continuity of each $T_{\alpha_n}$, $\|T_{\alpha_n}z\| \ge n$ for all $n \ge 1$. This contradicts the hypothesis that the set $\{T_\alpha z\}$ is bounded.
Recall the operator norm in Definition 3.2. Corresponding to this norm there is a notion of convergence in $B(X,Y)$. We say that a sequence $(T_n)$ is uniformly convergent if there is $T \in B(X,Y)$ with $\|T_n - T\| \to 0$ as $n \to \infty$. So uniform convergence of a sequence of operators is simply convergence in the operator norm.
Definition 3.4. A sequence $(T_n)$ in $B(X,Y)$ is strongly convergent if, for any $x \in X$, the sequence $(T_n x)$ converges in $Y$. If there is a $T \in B(X,Y)$ with $\lim_n T_n x = Tx$ for all $x \in X$, then $(T_n)$ is strongly convergent to $T$.
Exercise 3.2. [1] Prove that uniform convergence implies strong convergence.
[2] Show by example that strong convergence does not imply uniform convergence.
Theorem 3.3. Let $X$ be a Banach space, and $Y$ any normed linear space. If a sequence $(T_n)$ in $B(X,Y)$ is strongly convergent, then there exists $T \in B(X,Y)$ such that $(T_n)$ is strongly convergent to $T$.
Proof. For each $x \in X$ the sequence $(T_n x)$ is bounded, since it is convergent. By the uniform boundedness principle (Theorem 3.2), there is a constant $K$ such that $\|T_n\| \le K$ for all $n$. Hence
$$\|T_n x\| \le K\|x\| \quad \text{for all } x \in X. \tag{32}$$
Define $T$ by requiring that $Tx = \lim_{n\to\infty} T_n x$ for all $x \in X$. It is clear that $T$ is linear, and (32) shows that $\|Tx\| \le K\|x\|$ for all $x \in X$, so $T$ is bounded. The construction of $T$ means that $(T_n)$ converges strongly to $T$.
5. An application of uniform boundedness to Fourier series
This section is an application of Theorem 3.2 to Fourier analysis. We will encounter Fourier analysis again, in the context of Hilbert spaces and $L^2$ functions. For now we take a naive view of Fourier analysis: the functions will all be continuous periodic functions, and we compute Fourier coefficients using Riemann integration.
Lemma 3.3.
$$\int_0^{2\pi} \left| \frac{\sin(n+\frac12)x}{\sin\frac12 x} \right| dx \longrightarrow \infty \quad \text{as } n \to \infty.$$
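Proof. Recall that $|\sin(x)| \le |x|$ for all $x$. It follows that
$$\int_0^{2\pi} \left| \frac{\sin(n+\frac12)x}{\sin\frac12 x} \right| dx \ge \int_0^{2\pi} \frac{2}{x} \left|\sin(n+\tfrac12)x\right| dx.$$
Now $|\sin(n+\frac12)x| \ge \frac12$ for all $x$ with $(n+\frac12)x$ between $k\pi + \frac16\pi$ and $k\pi + \frac13\pi$, for $k = 0, 1, \dots, 2n$. Each such $x$-interval has length $\frac{\pi}{6}(n+\frac12)^{-1}$, and on it $\frac{2}{x} \ge 2(n+\frac12)\big(\pi(k+\frac13)\big)^{-1}$. It follows (by thinking of the Riemann approximation to the integral) that
$$\int_0^{2\pi} \frac{2}{x} \left|\sin(n+\tfrac12)x\right| dx \ge \sum_{k=0}^{2n} \frac{\pi}{6(n+\frac12)} \cdot \frac12 \cdot \frac{2(n+\frac12)}{\pi(k+\frac13)} = \frac16 \sum_{k=0}^{2n} \frac{1}{k+\frac13} \to \infty$$
as $n \to \infty$.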
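The growth in Lemma 3.3 can be observed numerically. The sketch below (Python with NumPy, an illustrative choice not taken from the notes) approximates the normalised integral $\frac{1}{2\pi}\int_0^{2\pi}|D_n(x)|\,dx$ by the midpoint rule; the function name `dirichlet_l1_norm` and the number of sample points are assumptions. These quantities are the classical Lebesgue constants, which are known to grow like $\frac{4}{\pi^2}\log n$, so the second printed column should stay roughly constant while the first keeps increasing.

```python
import numpy as np

def dirichlet_l1_norm(n, num_points=200_000):
    """Midpoint-rule estimate of (1/2π) ∫_0^{2π} |D_n(x)| dx.

    D_n(x) = sin((n + 1/2)x) / sin(x/2).  The midpoints never equal 0 or 2π,
    where this quotient is 0/0 (D_n extends continuously there with value 2n + 1).
    """
    h = 2 * np.pi / num_points
    x = (np.arange(num_points) + 0.5) * h          # midpoints of [0, 2π]
    d = np.sin((n + 0.5) * x) / np.sin(x / 2)
    return np.sum(np.abs(d)) * h / (2 * np.pi)

for n in (1, 10, 100, 1000):
    L = dirichlet_l1_norm(n)
    # second column: L_n minus the leading term (4/π²) log n
    print(n, round(L, 4), round(L - 4 / np.pi**2 * np.log(n), 4))
```

For $n = 1$ the exact value is $\frac13 + \frac{2\sqrt3}{\pi} \approx 1.436$, which gives a quick check on the quadrature.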
Definition 3.5. If $f : (0, 2\pi) \to \mathbb{R}$ is Riemann-integrable, then the Fourier series of $f$ is the series
$$s(x) = \sum_{m=-\infty}^{\infty} a_m e^{imx}, \quad \text{where } a_m = \frac{1}{2\pi}\int_0^{2\pi} f(\xi)e^{-im\xi}\,d\xi.$$
Extend the definition of $f$ to make it $2\pi$-periodic, so $f(x + 2\pi) = f(x)$ for all $x$. Define the $n$th partial sum of the Fourier series to be
$$s_n(x) = \sum_{m=-n}^{n} a_m e^{imx}.$$
The basic questions of Fourier analysis are then the following: is there any relation between $s(x)$ and $f(x)$? Does the function $s_n(x)$ approximate $f(x)$ for large $n$ in some sense?
Lemma 3.4. Let¹ $D_n(x) = \dfrac{\sin(n+\frac12)x}{\sin\frac12 x}$. Then
$$s_n(y) = \frac{1}{2\pi}\int_0^{2\pi} f(y+x)D_n(x)\,dx.$$
Proof. Exercise.
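Lemma 3.4, and the identity $D_n(x) = \sum_{j=-n}^{n} e^{ijx}$ from the footnote, can be sanity-checked numerically; this does not replace the proof asked for in the exercise. The sketch assumes a particular smooth $2\pi$-periodic test function `f` (a hypothetical choice) and approximates every integral by the equally spaced, periodic trapezoidal rule with `N` nodes; the helper names are also hypothetical.

```python
import numpy as np

def f(t):
    # A smooth 2π-periodic test function (assumed for illustration only).
    return np.exp(np.cos(t)) + np.sin(2 * t)

N = 512                                   # quadrature nodes on [0, 2π)
grid = 2 * np.pi * np.arange(N) / N

def coeff(m):
    # a_m = (1/2π) ∫ f(ξ) e^{-imξ} dξ, via the periodic trapezoidal rule
    return np.mean(f(grid) * np.exp(-1j * m * grid))

def partial_sum(n, y):
    # s_n(y) = Σ_{m=-n}^{n} a_m e^{imy}
    return sum(coeff(m) * np.exp(1j * m * y) for m in range(-n, n + 1)).real

def dirichlet(n, x):
    # D_n(x) = Σ_{j=-n}^{n} e^{ijx}; this form has no 0/0 problem at x = 0
    return sum(np.exp(1j * j * x) for j in range(-n, n + 1)).real

def kernel_form(n, y):
    # (1/2π) ∫ f(y + x) D_n(x) dx with the same quadrature rule
    return np.mean(f(y + grid) * dirichlet(n, grid))

print(partial_sum(5, 0.7), kernel_form(5, 0.7))   # the two sides of Lemma 3.4
```

Because this `f` is smooth, its Fourier coefficients decay very quickly and $s_n(y)$ is also extremely close to $f(y)$ for moderate $n$; Theorem 3.4 below shows that nothing like this can be expected for every continuous $f$.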
Now let X be the Banach space of continuous functions f : [0, 2π] → R with
f(0) = f(2π), with the uniform norm.
1
This function is called the Dirichlet kernel. For the lemma, it is helpful to notice that $D_n(x) = \sum_{j=-n}^{n} e^{ijx}$. If you read up on Fourier analysis, it will be helpful to note that the Dirichlet kernel is not a “summability kernel”.
Lemma 3.5. The linear operator $T_n : X \to \mathbb{R}$ defined by
$$T_n(f) = \frac{1}{2\pi}\int_0^{2\pi} f(x)D_n(x)\,dx$$
is bounded, and
$$\|T_n\| = \frac{1}{2\pi}\int_0^{2\pi} |D_n(x)|\,dx.$$
Proof. For any $f \in X$,
$$|T_n(f)| \le \frac{1}{2\pi}\int_0^{2\pi} |f(x)||D_n(x)|\,dx \le \frac{1}{2\pi}\|f\|\int_0^{2\pi} |D_n(x)|\,dx,$$
so
$$\|T_n\| \le \frac{1}{2\pi}\int_0^{2\pi} |D_n(x)|\,dx.$$
Assume that for some $\delta > 0$ we have
$$\|T_n\| = \frac{1}{2\pi}\int_0^{2\pi} |D_n(x)|\,dx - \delta. \tag{33}$$
Then since, for fixed $n$, $|D_n(x)| \le M_n$ is bounded, we may find a continuous function $f_n \in X$ with $|f_n| \le 1$ that differs from $\operatorname{sign}(D_n(x))$ only on a finite union of intervals whose total length does not exceed $\frac{1}{M_n}\delta$. Then (don’t think about this – just draw a picture)
$$\left|\frac{1}{2\pi}\int_0^{2\pi} f_n(x)D_n(x)\,dx\right| > \frac{1}{2\pi}\int_0^{2\pi} |D_n(x)|\,dx - \delta,$$
which, since $\|f_n\| \le 1$, contradicts the assumption (33). We conclude that
$$\|T_n\| = \frac{1}{2\pi}\int_0^{2\pi} |D_n(x)|\,dx.$$
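The approximation step in this proof can be made concrete. As an assumption for illustration (the notes do not specify $f_n$), the sketch below uses $f = \operatorname{clip}(M D_n, -1, 1)$: a continuous function with $|f| \le 1$ and $f(0) = f(2\pi)$ that agrees with $\operatorname{sign}(D_n)$ wherever $|D_n| \ge 1/M$. With the midpoint rule, the computed gap $\|T_n\| - T_n(f)$ lies between $0$ and $1/M$. The function names and grid size are hypothetical.

```python
import numpy as np

def kernel_on_grid(n, num_points=100_000):
    """D_n(x) = sin((n + 1/2)x) / sin(x/2) at the midpoints of [0, 2π]."""
    x = (np.arange(num_points) + 0.5) * (2 * np.pi / num_points)
    return np.sin((n + 0.5) * x) / np.sin(x / 2)

def operator_norm(n):
    # (1/2π) ∫ |D_n| dx; with equal midpoint weights this is a plain mean
    return np.mean(np.abs(kernel_on_grid(n)))

def T_of_clipped_sign(n, M):
    # f = clip(M D_n, -1, 1) is continuous with |f| <= 1, and f(0) = f(2π)
    # because D_n(0) = D_n(2π); it equals sign(D_n) wherever |D_n| >= 1/M.
    d = kernel_on_grid(n)
    f = np.clip(M * d, -1.0, 1.0)
    return np.mean(f * d)                  # T_n(f) = (1/2π) ∫ f D_n dx

n = 20
for M in (1, 10, 100, 1000):
    print(M, operator_norm(n) - T_of_clipped_sign(n, M))   # gap is at most 1/M
```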
We are now ready to see a genuinely non–trivial and important observation
about the basic theorems of Fourier analysis.
Theorem 3.4. There exists a continuous function $f : [0, 2\pi] \to \mathbb{R}$, with $f(0) = f(2\pi)$, such that its Fourier series diverges at $x = 0$.
Proof. By Lemma 3.4, we have
$$T_n(f) = s_n(0)$$
for all $f \in X$. Moreover, for fixed $f \in X$, if the Fourier series of $f$ converges at 0, then the family $\{T_n f\}$ is bounded as $n$ varies (since each element is just a partial sum of a convergent series). Thus if the Fourier series of $f$ converges at 0 for all $f \in X$, then for each $f \in X$ the set $\{T_n f\}$ is bounded. By Theorem 3.2, this implies that the set $\{\|T_n\|\}$ is bounded, which contradicts Lemma 3.5 together with Lemma 3.3.
The conclusion is that there must be some $f \in X$ whose Fourier series does not converge at 0.
Exercise 3.3. The problem of deciding whether or not the Fourier series of a given function converges at a specific point (or everywhere) is difficult and usually requires some degree of smoothness (differentiability). You can read about various results in many books – a good starting point is Fourier Analysis, Tom Körner, Cambridge University Press (1988).
It is more natural in functional analysis to ask for an appropriate semi-norm in which $\|s - f\| = 0$ for some class of functions $f$.
6. Open mapping theorem
Recall that a continuous map between normed spaces has the property that the pre-image of any open set is open, but in general the image of an open set is not open (Exercise 1.1). Surjective bounded linear maps between Banach spaces do not behave in this way.
Theorem 3.5. Let $X$ and $Y$ be Banach spaces, and let $T$ be a bounded linear map from $X$ onto $Y$. Then $T$ maps open sets in $X$ onto open sets in $Y$.
Of course the assumption that $T$ maps $X$ onto $Y$ is crucial: think of the projection $(x, y) \mapsto (x, 0)$ from $\mathbb{R}^2 \to \mathbb{R}^2$. This is bounded and linear, but not onto, and it sends every non-empty open set into a line, so no such image is open.
The proof of the Open–Mapping theorem is long and requires the Baire category
theorem, so it will be omitted from the lectures. For completeness it is given here
in the next three lemmas.
Some notation: use $B^X_r$ and $B^Y_r$ to denote the open balls of radius $r$ centred at 0 in $X$ and $Y$ respectively.
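Lemma 3.6. For any $\epsilon > 0$, there is a $\delta > 0$ such that
$$\overline{TB^X_{2\epsilon}} \supset B^Y_{\delta}. \tag{34}$$
Proof. Since $X = \bigcup_{n=1}^{\infty} nB^X_{\epsilon}$, and $T$ is onto, we have $Y = T(X) = \bigcup_{n=1}^{\infty} nTB^X_{\epsilon}$. By the Baire category theorem (Theorem A.4) it follows that, for some $n$, the set $n\overline{TB^X_{\epsilon}}$ contains some ball $B^Y_r(z)$ in $Y$. Then $\overline{TB^X_{\epsilon}}$ must contain the ball $B^Y_{\delta}(y_0)$, where $y_0 = \frac{1}{n}z$ and $\delta = \frac{1}{n}r$. It follows that the set
$$P = \{y_1 - y_2 \mid y_1 \in B^Y_{\delta}(y_0),\ y_2 \in B^Y_{\delta}(y_0)\}$$
is contained in the closure of the set $TQ$, where
$$Q = \{x_1 - x_2 \mid x_1 \in B^X_{\epsilon},\ x_2 \in B^X_{\epsilon}\} \subset B^X_{2\epsilon}.$$
Thus $\overline{TB^X_{2\epsilon}} \supset P$. Any point $y \in B^Y_{\delta}$ can be written in the form $y = (y + y_0) - y_0$, where both $y + y_0$ and $y_0$ lie in $B^Y_{\delta}(y_0)$, so $B^Y_{\delta} \subset P$, and (34) follows.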
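Lemma 3.7. For any $\epsilon_0 > 0$ there is a $\delta_0 > 0$ such that
$$TB^X_{2\epsilon_0} \supset B^Y_{\delta_0}. \tag{35}$$
Proof. Choose a sequence $(\epsilon_n)$ with each $\epsilon_n > 0$ and $\sum_{n=1}^{\infty}\epsilon_n < \epsilon_0$. By Lemma 3.6 (applied with $\epsilon_n/2$ in place of $\epsilon$) there is a sequence $(\delta_n)$ of positive numbers such that
$$\overline{TB^X_{\epsilon_n}} \supset B^Y_{\delta_n} \tag{36}$$
for all $n \ge 0$. Without loss of generality, assume that $\delta_n \to 0$ as $n \to \infty$.
Let $y$ be any point in $B^Y_{\delta_0}$. By (36) with $n = 0$ there is a point $x_0 \in B^X_{\epsilon_0}$ with $\|y - Tx_0\| < \delta_1$. Since $(y - Tx_0) \in B^Y_{\delta_1}$, (36) with $n = 1$ implies that there exists a point $x_1 \in B^X_{\epsilon_1}$ such that $\|y - Tx_0 - Tx_1\| < \delta_2$. Continuing, we obtain a sequence $(x_n)$ such that $x_n \in B^X_{\epsilon_n}$ for all $n$, and
$$\Big\| y - T\Big(\sum_{k=0}^{n} x_k\Big) \Big\| < \delta_{n+1}. \tag{37}$$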
Since $\|x_n\| < \epsilon_n$, the series $\sum_n x_n$ is absolutely convergent, so by Lemma 2.1 it is convergent; write $x = \sum_n x_n$. Then
$$\|x\| \le \sum_{n=0}^{\infty}\|x_n\| \le \sum_{n=0}^{\infty}\epsilon_n < 2\epsilon_0.$$
The map $T$ is continuous, so (37) shows that $y = Tx$ since $\delta_n \to 0$. That is, for any $y \in B^Y_{\delta_0}$ we have found a point $x \in B^X_{2\epsilon_0}$ such that $Tx = y$, proving (35).
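The construction of $x = \sum x_n$ above is a successive-correction scheme. The sketch below is only a finite-dimensional analogue, under an assumption that does not come from the notes: a hypothetical "approximate inverse" matrix `S` with $\|I - TS\| < 1$ plays the role of (36), producing at each step a correction $x_n$ whose image nearly matches the current residual. The residual $y - T(x_0 + \dots + x_n)$ then decays geometrically, in the same way that (37) forces it to tend to zero.

```python
import numpy as np

def solve_by_corrections(T, S, y, steps=60):
    """Accumulate x = x_0 + x_1 + ... with x_n = S(residual), cf. (37).

    Assumes (hypothetically) that the operator norm of I - T S is < 1, so the
    residual y - T(x_0 + ... + x_n) shrinks geometrically.
    """
    x = np.zeros(T.shape[1])
    residual = np.array(y, dtype=float)
    for _ in range(steps):
        x_n = S @ residual            # next correction
        x = x + x_n
        residual = y - T @ x          # y - T(x_0 + ... + x_n)
    return x, residual

T = np.array([[4.0, 1.0], [1.0, 3.0]])
S = np.diag([1 / 4, 1 / 3])           # Jacobi-style approximate inverse; ||I - TS|| = 1/3
y = np.array([1.0, 2.0])
x, r = solve_by_corrections(T, S, y)
print(x, np.linalg.norm(r))
```

In the proof the corrections come from the Baire category argument rather than from a matrix, but the bookkeeping is the same.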
Lemma 3.8. For any open set $G \subset X$ and for any point $\bar y = T\bar x$, $\bar x \in G$, there is an open ball $B^Y_{\eta}$ such that $\bar y + B^Y_{\eta} \subset T(G)$.
Notice that Lemma 3.8 proves Theorem 3.5 since it implies that $T(G)$ is open.
Proof. Since $G$ is open, there is a ball $B^X_{\epsilon}$ such that $\bar x + B^X_{\epsilon} \subset G$. By Lemma 3.7, $T(B^X_{\epsilon}) \supset B^Y_{\eta}$ for some $\eta > 0$. Hence
$$T(G) \supset T(\bar x + B^X_{\epsilon}) = T(\bar x) + T(B^X_{\epsilon}) \supset \bar y + B^Y_{\eta}.$$
As an application of Theorem 3.5, we establish a general property of inverse
maps. Generalizing Deﬁnition 3.1 slightly, we have the following.
Definition 3.6. Let $T : X \to Y$ be an injective linear operator. Define the inverse of $T$, $T^{-1}$, by requiring that
$$T^{-1}y = x \text{ if and only if } Tx = y.$$
Then the domain of $T^{-1}$ is a linear subspace of $Y$, and $T^{-1}$ is a linear operator.
It is easy to check that $T^{-1}Tx = x$ for all $x \in X$, and $TT^{-1}y = y$ for all $y$ in the domain of $T^{-1}$.
Lemma 3.9. Let $X$ and $Y$ be Banach spaces, and let $T$ be a bijective bounded linear map from $X$ onto $Y$. Then $T^{-1}$ is a bounded linear map.
Proof. Since $T^{-1}$ is a linear operator, by Theorem 3.1 we only need to show it is continuous. By Theorem 3.5, $(T^{-1})^{-1} = T$ maps open sets onto open sets. By Exercise 1.1[1], this means that $T^{-1}$ is continuous.
Corollary 3.1. If $X$ is a Banach space with respect to two norms $\|\cdot\|_{(1)}$ and $\|\cdot\|_{(2)}$ and there is a constant $K$ such that
$$\|x\|_{(1)} \le K\|x\|_{(2)},$$
then the two norms are equivalent: there is another constant $K'$ with $\|x\|_{(2)} \le K'\|x\|_{(1)}$ for all $x \in X$.
Proof. Consider the map $T : x \mapsto x$ from $(X, \|\cdot\|_{(2)})$ to $(X, \|\cdot\|_{(1)})$. By assumption, $T$ is bounded, and it is clearly bijective, so by Lemma 3.9, $T^{-1}$ is also bounded, giving the bound in the other direction.
Definition 3.7. Let $T : X \to Y$ be a linear operator from a normed linear space $X$ into a normed linear space $Y$, with domain $D_T$. The graph of $T$ is the set
$$G_T = \{(x, Tx) \mid x \in D_T\} \subset X \times Y.$$
If $G_T$ is a closed set in $X \times Y$ (see Example 1.7) then $T$ is a closed operator.
Notice as usual that this notion becomes trivial in ﬁnite dimensions: if X and
Y are ﬁnite–dimensional, then the graph of T is simply some linear subspace, which
is automatically closed. The next theorem is called the closed–graph theorem.
Theorem 3.6. Let $X$ and $Y$ be Banach spaces, and $T : X \to Y$ a linear operator (notice that the notation means $D_T = X$). If $T$ is closed, then it is continuous.
Proof. Fix the norm $\|(x, y)\| = \|x\|_X + \|y\|_Y$ on $X \times Y$. The graph $G_T$ is, by linearity of $T$, a closed linear subspace in $X \times Y$, so $G_T$ is itself a Banach space. Consider the projection $P : G_T \to X$ defined by $P(x, Tx) = x$. Then $P$ is clearly bounded, linear, and bijective. It follows by Lemma 3.9 that $P^{-1}$ is a bounded linear operator from $X$ into $G_T$, so
$$\|(x, Tx)\| = \|P^{-1}x\| \le K\|x\|_X \quad \text{for all } x \in X,$$
for some constant $K$. It follows that $\|x\|_X + \|Tx\|_Y \le K\|x\|_X$ for all $x \in X$, so $T$ is bounded – and therefore $T$ is continuous by Theorem 3.1.
7. Hahn–Banach theorem
Let $X$ be a normed linear space. A bounded linear operator from $X$ into the normed space $\mathbb{R}$ is a (real) continuous linear functional on $X$. The space of all continuous linear functionals is denoted $B(X, \mathbb{R}) = X^*$, and it is called the dual or conjugate space of $X$. All the material here may be done again with $\mathbb{C}$ instead of $\mathbb{R}$ without significant changes.
Notice that Lemma 3.2 shows that $X^*$ is itself a Banach space, whether or not $X$ is complete.
One of the most important questions one may ask of $X^*$ is the following: are there “enough” elements in $X^*$ to do what we need (for example, to separate points)? This is answered in great generality using the Hahn–Banach theorem (Theorem 3.7 below); see Corollary 3.4. First we prove the Hahn–Banach lemma.
Lemma 3.10. Let $X$ be a real normed linear space, and $p : X \to \mathbb{R}$ a continuous function with
$$p(x + y) \le p(x) + p(y), \quad p(\lambda x) = \lambda p(x) \quad \text{for all } \lambda \ge 0,\ x, y \in X.$$
Let $Y$ be a subspace of $X$, and $f \in Y^*$ with
$$f(y) \le p(y) \quad \text{for all } y \in Y.$$
Then there exists a functional $F \in X^*$ such that
$$F(x) = f(x) \text{ for } x \in Y; \quad F(x) \le p(x) \text{ for all } x \in X.$$
Proof. Let $\mathcal{A}$ be the set of all pairs $(Y_\alpha, g_\alpha)$ in which $Y_\alpha$ is a linear subspace of $X$ containing $Y$, and $g_\alpha$ is a real linear functional on $Y_\alpha$ with the properties that
$$g_\alpha(x) = f(x) \text{ for all } x \in Y, \quad g_\alpha(x) \le p(x) \text{ for all } x \in Y_\alpha.$$
Make $\mathcal{A}$ into a partially ordered set by defining the relation $(Y_\alpha, g_\alpha) \le (Y_\beta, g_\beta)$ if $Y_\alpha \subset Y_\beta$ and $g_\alpha = g_\beta$ on $Y_\alpha$. It is clear that any totally ordered subset $\{(Y_\lambda, g_\lambda)\}$ has an upper bound given by the subspace $\bigcup_\lambda Y_\lambda$ and the functional defined to be $g_\lambda$ on each $Y_\lambda$.
By Theorem A.1, there is a maximal element $(Y_0, g_0)$ in $\mathcal{A}$. All that remains is to check that $Y_0$ is all of $X$ (so we may take $F$ to be $g_0$).
Assume that $y_1 \in X \setminus Y_0$. Let $Y_1$ be the linear space spanned by $Y_0$ and $y_1$: each element $x \in Y_1$ may be expressed uniquely in the form
$$x = y + \lambda y_1, \quad y \in Y_0,\ \lambda \in \mathbb{R},$$
because $y_1$ is assumed not to be in the linear space $Y_0$. Define a linear functional $g_1$ on $Y_1$ by $g_1(y + \lambda y_1) = g_0(y) + \lambda c$.
Now we choose the constant $c$ carefully. Note that for any $x, y \in Y_0$,
$$g_0(y) - g_0(x) = g_0(y - x) \le p(y - x) \le p(y + y_1) + p(-y_1 - x),$$
so
$$-p(-y_1 - x) - g_0(x) \le p(y + y_1) - g_0(y).$$
It follows that
$$A = \sup_{x \in Y_0}\{-p(-y_1 - x) - g_0(x)\} \le \inf_{y \in Y_0}\{p(y + y_1) - g_0(y)\} = B.$$
Choose $c$ to be any number in the interval $[A, B]$. Then by construction of $A$ and $B$,
$$c \le p(y + y_1) - g_0(y) \quad \text{for all } y \in Y_0, \tag{38}$$
$$-p(-y_1 - y) - g_0(y) \le c \quad \text{for all } y \in Y_0. \tag{39}$$
Multiply (38) by $\lambda > 0$ and substitute $\frac{y}{\lambda}$ for $y$ to obtain
$$\lambda c \le p(y + \lambda y_1) - g_0(y). \tag{40}$$
Now substitute $\frac{y}{\lambda}$ for $y$ in (39), multiply by $\lambda < 0$ (which reverses the inequality), and use the homogeneity assumption on $p$ with the positive scalar $-\lambda$ to obtain (40) again. Since (40) is clear for $\lambda = 0$, we deduce that
$$g_1(y + \lambda y_1) = g_0(y) + \lambda c \le p(y + \lambda y_1)$$
for all $\lambda \in \mathbb{R}$ and $y \in Y_0$. That is, $(Y_1, g_1) \in \mathcal{A}$ and $(Y_0, g_0) \le (Y_1, g_1)$ with $Y_0 \ne Y_1$. This contradicts the maximality of $(Y_0, g_0)$.
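The role of the interval $[A, B]$ can be seen in a small example. The sketch below takes, as an illustrative assumption, $X = \mathbb{R}^2$, $Y_0 = \operatorname{span}\{(1,0)\}$ with $g_0(t, 0) = t$, $y_1 = (0, 1)$, and $p$ equal to a norm on $\mathbb{R}^2$ (a norm satisfies the hypotheses on $p$ in Lemma 3.10). It estimates $A$ and $B$ by sampling $x = (t, 0)$ and $y = (t, 0)$ on a finite grid of values of $t$, so the results are estimates rather than exact suprema and infima.

```python
import numpy as np

ts = np.linspace(-10.0, 10.0, 2001)   # sample points t; x = (t, 0) ranges over part of Y0

def extension_interval(p):
    """Estimate A and B from the proof of Lemma 3.10 for the example above."""
    # A = sup over x in Y0 of  -p(-y1 - x) - g0(x),  with -y1 - x = (-t, -1)
    A = max(-p(np.array([-t, -1.0])) - t for t in ts)
    # B = inf over y in Y0 of  p(y + y1) - g0(y),    with y + y1 = (t, 1)
    B = min(p(np.array([t, 1.0])) - t for t in ts)
    return A, B

def l1(v):
    return np.sum(np.abs(v))

def linf(v):
    return np.max(np.abs(v))

print(extension_interval(l1))     # roughly (-1, 1)
print(extension_interval(linf))   # roughly (0, 0)
```

For the $\ell^1$ norm every $c \in [-1, 1]$ gives an extension $F(a, b) = a + cb$ dominated by $\|(a, b)\|_1$, while for the $\ell^\infty$ norm the interval collapses to $\{0\}$. So the extension produced by Lemma 3.10 need not be unique.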
For real linear spaces, the Hahn–Banach theorem follows at once (for complex
spaces a little more work is needed).
Theorem 3.7. Let $X$ be a real normed space, and $Y$ a linear subspace. Then for any $y^* \in Y^*$ there corresponds an $x^* \in X^*$ such that
$$\|x^*\| = \|y^*\|, \quad \text{and} \quad x^*(y) = y^*(y) \text{ for all } y \in Y.$$
That is, any linear functional defined on a subspace may be extended to a linear functional on the whole space with the same norm.
Proof. Let $p(x) = \|y^*\|\|x\|$, $f(x) = y^*(x)$, and $x^* = F$. Apply the Hahn–Banach Lemma 3.10. To check that $\|x^*\| \le \|y^*\|$, write $x^*(x) = \theta|x^*(x)|$ for $\theta = \pm 1$. Then
$$|x^*(x)| = \theta x^*(x) = x^*(\theta x) \le p(\theta x) = \|y^*\|\|\theta x\| = \|y^*\|\|x\|.$$
The reverse inequality is clear, since $x^*$ agrees with $y^*$ on $Y$, so $\|x^*\| = \|y^*\|$.
Many useful results follow from the Hahn–Banach theorem.
Corollary 3.2. Let $Y$ be a linear subspace of a normed linear space $X$, and let $x_0 \in X$ have the property that
$$\inf_{y \in Y}\|y - x_0\| = d > 0. \tag{41}$$
Then there exists a point $x^* \in X^*$ such that
$$x^*(x_0) = 1, \quad \|x^*\| = \frac{1}{d}, \quad x^*(y) = 0 \text{ for all } y \in Y.$$
Proof. Let $Y_1$ be the linear space spanned by $Y$ and $x_0$. Since $x_0 \notin Y$, every point $x$ in $Y_1$ may be represented uniquely in the form $x = y + \lambda x_0$, with $y \in Y$, $\lambda \in \mathbb{R}$. Define a linear functional $z^* \in Y_1^*$ by $z^*(y + \lambda x_0) = \lambda$. If $\lambda \ne 0$, then
$$\|y + \lambda x_0\| = |\lambda|\,\Big\|\frac{y}{\lambda} + x_0\Big\| \ge |\lambda| d.$$
It follows that $|z^*(x)| \le \|x\|/d$ for all $x \in Y_1$, so $\|z^*\| \le \frac{1}{d}$. Choose a sequence $(y_n) \subset Y$ with $\|x_0 - y_n\| \to d$ as $n \to \infty$. Then
$$1 = z^*(x_0 - y_n) \le \|z^*\|\|x_0 - y_n\| \to \|z^*\| d,$$
so $\|z^*\| = \frac{1}{d}$. Apply Theorem 3.7 to $z^*$.
Corollary 3.3. Let $X$ be a normed linear space. Then, for any $x \ne 0$ in $X$ there is a functional $x^* \in X^*$ with $\|x^*\| = 1$ and $x^*(x) = \|x\|$.
Proof. Apply Corollary 3.2 with $Y = \{0\}$ (so that $d = \|x\|$) to find $z^* \in X^*$ such that $\|z^*\| = 1/\|x\|$, $z^*(x) = 1$. We may therefore take $x^*$ to be $\|x\|z^*$.
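In a concrete space the functional of Corollary 3.3 can be written down explicitly. As an illustrative assumption, take $X = \mathbb{R}^n$ with the $\ell^p$ norm, $1 < p < \infty$, and identify $X^*$ with $\mathbb{R}^n$ carrying the dual $\ell^q$ norm, $\frac1p + \frac1q = 1$. The standard choice $w_i = \operatorname{sign}(x_i)|x_i|^{p-1}/\|x\|_p^{p-1}$ then satisfies $w(x) = \|x\|_p$ and $\|w\|_q = 1$, so it also attains the supremum in Corollary 3.5. The helper name is hypothetical.

```python
import numpy as np

def norming_functional(x, p):
    """Return w in R^n with sum(w * x) = ||x||_p and ||w||_q = 1 (1/p + 1/q = 1).

    Sketch for the l^p norm on R^n with 1 < p < inf and x != 0; the functional
    is v -> sum(w * v).
    """
    norm_x = np.sum(np.abs(x) ** p) ** (1 / p)
    return np.sign(x) * np.abs(x) ** (p - 1) / norm_x ** (p - 1)

p = 3.0
q = p / (p - 1)
x = np.array([3.0, -4.0, 1.0])
w = norming_functional(x, p)
print(w @ x, np.sum(np.abs(x) ** p) ** (1 / p))   # equal up to rounding
print(np.sum(np.abs(w) ** q) ** (1 / q))           # 1 up to rounding
```

By Hölder's inequality $|w(v)| \le \|w\|_q\|v\|_p = \|v\|_p$, so no functional of norm 1 can do better than $\|x\|_p$ at $x$.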
Corollary 3.4. If $z \ne y$ in a normed linear space $X$, then there exists $x^* \in X^*$ such that $x^*(y) \ne x^*(z)$.
Proof. Apply Corollary 3.3 with $x = y - z$.
Corollary 3.5. If $X$ is a normed linear space, then
$$\|x\| = \sup_{x^* \ne 0}\frac{|x^*(x)|}{\|x^*\|} = \sup_{\|x^*\| = 1}|x^*(x)|.$$
Proof. The last two expressions are clearly equal. It is also clear that
$$\sup_{\|x^*\|=1}|x^*(x)| \le \|x\|.$$
For $x \ne 0$ (the case $x = 0$ is trivial), by Corollary 3.3 there exists $x_0^*$ such that $x_0^*(x) = \|x\|$ and $\|x_0^*\| = 1$, so
$$\sup_{\|x^*\|=1}|x^*(x)| \ge \|x\|.$$
Corollary 3.6. Let $Y$ be a linear subspace of the normed linear space $X$. If $Y$ is not dense in $X$, then there exists a functional $x^* \ne 0$ such that $x^*(y) = 0$ for all $y \in Y$.
Proof. Notice that if there is no point $x_0 \in X$ satisfying (41) then $Y$ must be dense in $X$. So we may choose $x_0$ with (41) and apply Corollary 3.2.
Notice finally that linear functionals allow us to decompose a linear space: let $X$ be a normed linear space, and $x^* \in X^*$. The null space or kernel of $x^*$ is the linear subspace $N_{x^*} = \{x \in X \mid x^*(x) = 0\}$. If $x^* \ne 0$, then there is a point $x_0 \ne 0$ such that $x^*(x_0) = 1$. Any element $x \in X$ can then be written $x = z + \lambda x_0$, with $\lambda = x^*(x)$ and $z = x - \lambda x_0 \in N_{x^*}$. Thus, $X = N_{x^*} \oplus Y$, where $Y$ is the one-dimensional space spanned by $x_0$.
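The decomposition $x = z + \lambda x_0$ is easy to compute when $X = \mathbb{R}^n$ and $x^*(v) = a \cdot v$ for a fixed non-zero vector $a$ (an illustrative assumption). Taking $x_0 = a/(a\cdot a)$ gives $x^*(x_0) = 1$; the particular vectors below are arbitrary.

```python
import numpy as np

a = np.array([2.0, -1.0, 3.0])       # x*(v) = a · v, a non-zero functional on R^3
x0 = a / (a @ a)                     # x*(x0) = 1
x = np.array([1.0, 5.0, -2.0])

lam = a @ x                          # λ = x*(x)
z = x - lam * x0                     # z = x - λ x0 lies in the kernel N_{x*}
print(lam, z, a @ z)                 # a @ z is zero up to rounding
```

Here $x_0$ was chosen parallel to $a$, but any $x_0$ with $x^*(x_0) = 1$ works; different choices give different complements $Y$ to the same kernel.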
CHAPTER 4
Integration
We have seen in Examples 1.10[3] that the space $C[0, 1]$ of continuous functions with the $p$-norm
$$\|f\|_p = \Big(\int_0^1 |f(t)|^p\,dt\Big)^{1/p}$$
is not complete, even if we extend the space to Riemann-integrable functions.
As discussed in the section on completions, we can think of the completion of the space in terms of (equivalence classes of) Cauchy sequences. This does not give any real sense of what kind of functions are in the completion. In this chapter we construct the completions $L^p$ for $1 \le p \le \infty$ by describing (without proofs) the Lebesgue¹ integral.
1. Lebesgue measure
Definition 4.1. Let $\mathcal{B}$ denote the smallest collection of subsets of $\mathbb{R}$ that includes all the open sets and is closed under countable unions, countable intersections and complements. These sets are called the Borel sets.
In fact the Borel sets form a σ-algebra: $\mathbb{R}, \emptyset \in \mathcal{B}$, and $\mathcal{B}$ is closed under complements and under countable unions and intersections. We will call Borel sets measurable. Many subsets of $\mathbb{R}$ are not measurable, but all the ones you can write down or that might arise in a practical setting are measurable.
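The Lebesgue measure on $\mathbb{R}$ is a map $\mu : \mathcal{B} \to \mathbb{R} \cup \{\infty\}$ with the properties that
(i) $\mu[a, b] = \mu(a, b) = b - a$;
(ii) $\mu\big(\bigcup_{n=1}^{\infty} A_n\big) = \sum_{n=1}^{\infty}\mu(A_n)$ for any sequence $(A_n)$ of pairwise disjoint measurable sets.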
Notice that the Lebesgue measure attaches a measure to all measurable sets.
Sets of measure zero are called null sets, and something that happens everywhere
except on a set of measure zero is said to happen almost everywhere, often written
simply a.e. For technical reasons, allow any subset of a null set to also be regarded
as “measurable”, with measure zero.
Exercise 4.1. [1] Prove that µ(Q) = 0. Thus a.e. real number is irrational.
1
Henri Léon Lebesgue (1875–1941) was a French mathematician who revolutionized the field
of integration by his generalization of the Riemann integral. Up to the end of the 19th century,
mathematical analysis was limited to continuous functions, based largely on the Riemann method
of integration. Building on the work of others, including that of the French mathematicians Émile
Borel and Camille Jordan, Lebesgue developed (in 1901) his theory of measure. A year later,
Lebesgue extended the usefulness of the deﬁnite integral by deﬁning the Lebesgue integral: a
method of extending the concept of area below a curve to include many discontinuous functions.
Lebesgue served on the faculty of several French universities. He made major contributions in
other areas of mathematics, including topology, potential theory, and Fourier analysis.
[2] More can be said: call a real number algebraic if it is a zero of some polynomial with rational coefficients, and transcendental if not. Then a.e. real number is transcendental.
[3] Prove that for any measurable sets $A, B$ with $\mu(A \cap B) < \infty$, $\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B)$.
[4] Can you construct² a set that is not a member of $\mathcal{B}$?
Definition 4.2. A function $f : \mathbb{R} \to \mathbb{R} \cup \{\pm\infty\}$ is a Lebesgue measurable function if $f^{-1}(A) \in \mathcal{B}$ for every $A \in \mathcal{B}$.
Example 4.1. [1] The characteristic function $\chi_{\mathbb{Q}}$, defined by $\chi_{\mathbb{Q}}(x) = 1$ if $x \in \mathbb{Q}$, and $= 0$ if $x \notin \mathbb{Q}$, is an example of a measurable function that is not Riemann integrable.
[2] All continuous functions are measurable (by Exercise 1.1[1]).
The basic idea in Riemann integration is to approximate functions by step
functions, whose “integrals” are easy to ﬁnd. These give the upper and lower
estimates. In the Lebesgue theory, we do something similar, using simple functions
instead of step functions.
A simple function is a map $f : \mathbb{R} \to \mathbb{R}$ of the form
$$f(x) = \sum_{i=1}^{n} c_i\chi_{E_i}(x), \tag{42}$$
where the $c_i$ are non-zero constants and the $E_i$ are disjoint measurable sets with $\mu(E_i) < \infty$.
The integral of the simple function (42) is defined to be
$$\int_E f\,d\mu = \sum_{i=1}^{n} c_i\,\mu(E \cap E_i)$$
for any measurable set $E$.
The basic approximation fact in the Lebesgue integral is the following: if $f : \mathbb{R} \to \mathbb{R} \cup \{\pm\infty\}$ is measurable and non-negative, then there is an increasing sequence $(f_n)$ of simple functions with the property that $f_n(t) \to f(t)$ a.e. We write this as $f_n \uparrow f$ a.e., and define the integral of $f$ to be
$$\int_E f\,d\mu = \lim_{n\to\infty}\int_E f_n\,d\mu.$$
Notice that, once we allow the value $\infty$, the limit is guaranteed to exist since the sequence is increasing.
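To see $f_n \uparrow f$ at work, the sketch below uses one standard choice, which is an assumption here since the notes only assert that some increasing sequence exists: $f_n = \min(n, 2^{-n}\lfloor 2^n f\rfloor)$, applied to $f(x) = x^2$ on $[0, 1]$, where the cap at $n$ never applies for $n \ge 1$. Because $f$ is increasing on $[0,1]$, each level set $E_k = \{x : k/2^n \le x^2 < (k+1)/2^n\}$ is an interval of length $\sqrt{(k+1)/2^n} - \sqrt{k/2^n}$, so the integrals of the simple functions can be computed directly from the definition and compared with $\int_0^1 x^2\,d\mu = \frac13$.

```python
import math

def simple_integral(n):
    """∫_[0,1] f_n dµ for f(x) = x^2 and f_n = floor(2^n f) / 2^n.

    f_n equals k/2^n on E_k = [sqrt(k/2^n), sqrt((k+1)/2^n)); the single point
    x = 1 (where f_n = 1) is a null set and is ignored.
    """
    levels = 2 ** n
    total = 0.0
    for k in range(1, levels):                     # the k = 0 term is zero
        measure = math.sqrt((k + 1) / levels) - math.sqrt(k / levels)
        total += (k / levels) * measure            # c_k · µ(E_k), as in (42)
    return total

values = [simple_integral(n) for n in range(1, 13)]
print(values[0], values[-1], 1 / 3)                # increasing towards 1/3
```

Since $0 \le f - f_n < 2^{-n}$ on $[0,1]$, the error after $n$ steps is below $2^{-n}$.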
2
This “construction” requires the use of the Axiom of Choice and is closely related to the
existence of a Hamel basis for R as a vector space over Q. The question really has two faces:
1) using the usual axioms of set theory (including the Axiom of Choice), can you exhibit a non–
measurable subset of R? 2) using the usual axioms of set theory without the Axiom of Choice, is
it still possible to exhibit a non–measurable subset of R?
The ﬁrst question is easily answered. The second question is much deeper because the answer
is “no”. This is part of a subject called Model Theory. Solovay showed that there is a model of
set theory (excluding the Axiom of Choice but including a further axiom) in which every subset
of R is measurable. Shelah tried to remove Solovay’s additional axiom, and answered a related
question by exhibiting a model of set theory (excluding the Axiom of Choice but otherwise as
usual) in which every subset of R has the Baire property. The references are R.M. Solovay, “A
model of set–theory in which every set of reals is Lebesgue measurable”, Annals of Math. 92
(1970), 1–56, and S. Shelah, “Can you take Solovay’s inaccessible away?”, Israel Journal of Math.
48 (1984), 1–47 but both of them require extensive additional background to read.
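For a general measurable function $f$, write $f = f^+ - f^-$ where both $f^+$ and $f^-$ are non-negative and measurable (for instance $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$), then define
$$\int_E f\,d\mu = \int_E f^+\,d\mu - \int_E f^-\,d\mu,$$
provided the two integrals on the right are not both infinite.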
Example 4.2. Let $f(x) = \chi_{\mathbb{Q}\cap[0,1]}(x)$. Then $f$ is itself a simple function, so
$$\int_0^1 f\,d\mu = \mu(\mathbb{Q} \cap [0, 1]) = 0.$$
A measurable function $f$ on $[a, b]$ is essentially bounded if there is a constant $K$ such that $|f(x)| \le K$ a.e. on $[a, b]$. The essential supremum of such a function is the infimum of all such essential bounds $K$, written
$$\|f\|_\infty = \operatorname{ess.sup}_{[a,b]}|f|.$$
Definition 4.3. Define $\mathcal{L}^p[a, b]$ to be the linear space of measurable functions $f$ on $[a, b]$ for which
$$\|f\|_p = \Big(\int_a^b |f|^p\,d\mu\Big)^{1/p} < \infty$$
for $p \in [1, \infty)$, and $\mathcal{L}^\infty[a, b]$ to be the linear space of essentially bounded functions.
Notice that $\|\cdot\|_p$ on $\mathcal{L}^p$ is only a semi-norm, since many non-zero functions (for example $\chi_{\mathbb{Q}\cap[a,b]}$) have $\|f\|_p = 0$. Define an equivalence relation on $\mathcal{L}^p$ by $f \sim g$ if $\{x \in [a, b] \mid f(x) \ne g(x)\}$ is a null set. Then define
$$L^p[a, b] = \mathcal{L}^p / \sim,$$
the space of $L^p$ functions.
In practice we will not think of elements of $L^p$ as equivalence classes of functions, but as functions defined a.e. A similar definition may be made of $p$-integrable functions on $\mathbb{R}$, giving the linear space $L^p(\mathbb{R})$.
The following theorems are proved in any book on measure theory or modern analysis or may be found in any of the references. Theorem 4.1 is sometimes called the Riesz–Fischer theorem; Theorem 4.2 is Hölder’s inequality.
Theorem 4.1. The normed spaces $L^p[a, b]$ and $L^p(\mathbb{R})$ are Banach spaces under the norm $\|\cdot\|_p$ (separable for $1 \le p < \infty$).
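Theorem 4.2. If $\frac{1}{r} = \frac{1}{p} + \frac{1}{q}$, then
$$\|fg\|_r \le \|f\|_p\|g\|_q$$
for any $f \in L^p[a, b]$, $g \in L^q[a, b]$. It follows (taking $g = 1$) that for any measurable $f$ on $[a, b]$ and $1 \le p \le q \le \infty$,
$$\|f\|_p \le (b - a)^{1/p - 1/q}\|f\|_q;$$
in particular, on an interval of length at most 1 (such as $[0, 1]$),
$$\|f\|_1 \le \|f\|_2 \le \|f\|_3 \le \cdots \le \|f\|_\infty.$$
Hence
$$L^1[a, b] \supset L^2[a, b] \supset \cdots \supset L^\infty[a, b].$$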
In the theorem we allow $p$ and $q$ to be anything in $[1, \infty]$ with the obvious interpretation $\frac{1}{\infty} = 0$.
Note the “opposite” behaviour to the sequence spaces $\ell^p$ in Example 1.4[3], where we saw that $\ell^1 \subset \ell^2 \subset \cdots \subset \ell^\infty$.
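The inequality $\|f\|_p \le (b-a)^{1/p-1/q}\|f\|_q$ from Theorem 4.2 can be observed numerically. The sketch below (an illustration, not part of the notes) approximates $\|f\|_p$ on an interval by the midpoint rule for an arbitrarily chosen test function. With equal weights the discrete analogue of the inequality holds exactly (it is the power-mean inequality), so the checks do not depend on the grid size. The constant function on an interval of length 4 shows why the plain chain $\|f\|_1 \le \|f\|_2 \le \cdots$ needs $b - a \le 1$.

```python
import numpy as np

def lp_norm(values, p, length):
    """Midpoint-rule estimate of ||f||_p on an interval of the given length,
    from samples of f at equally spaced midpoints; p = np.inf gives the max."""
    if p == np.inf:
        return np.max(np.abs(values))
    return (length * np.mean(np.abs(values) ** p)) ** (1 / p)

N = 10_000
t = (np.arange(N) + 0.5) / N                  # midpoints of [0, 1]
f_vals = np.sin(7 * t) + t ** 2               # an arbitrary test function on [0, 1]
ps = (1, 2, 3, 4, np.inf)
print([lp_norm(f_vals, p, 1.0) for p in ps])  # non-decreasing on [0, 1]

ones = np.ones(100)                           # f = 1 on an interval of length 4
print(lp_norm(ones, 1, 4.0), lp_norm(ones, 2, 4.0))   # 4.0 > 2.0 when b - a > 1
```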
Two easy consequences of Hölder’s inequality are the Cauchy–Schwarz inequality,
$$\|fg\|_1 \le \|f\|_2\|g\|_2,$$
and Minkowski’s inequality,
$$\|f + g\|_p \le \|f\|_p + \|g\|_p.$$
The most useful general result about Lebesgue integration is Lebesgue’s dominated convergence theorem.
Theorem 4.3. Let $(f_n)$ be a sequence of measurable functions on a measurable set $E$ such that $f_n(t) \to f(t)$ a.e. and there exists an integrable function $g$ such that $|f_n(t)| \le g(t)$ a.e. Then
$$\int_E f\,d\mu = \lim_{n\to\infty}\int_E f_n\,d\mu.$$
Exercise 4.2. [1] Prove that the $L^p$-norm is strictly convex for $1 < p < \infty$ but is not strictly convex if $p = 1$ or $\infty$.
2. Product spaces and Fubini’s theorem
Let $X$ and $Y$ be two subsets of $\mathbb{R}$. Let $\mathcal{A}$, $\mathcal{B}$ denote the σ-algebras of Borel sets in $X$ and $Y$ respectively.
Subsets of $X \times Y$ (Cartesian product) of the form
$$A \times B = \{(x, y) : x \in A,\ y \in B\}$$
with $A \in \mathcal{A}$, $B \in \mathcal{B}$ are called (measurable) rectangles. Let $\mathcal{A} \times \mathcal{B}$ denote the smallest σ-algebra on $X \times Y$ containing all the measurable rectangles. Notice that, despite the notation, this is much larger than the set of all measurable rectangles. The measurable space $(X \times Y, \mathcal{A} \times \mathcal{B})$ is the Cartesian product of $(X, \mathcal{A})$ and $(Y, \mathcal{B})$.
Let $\mu_X$ and $\mu_Y$ denote Lebesgue measure on $X$ and $Y$. Then there is a unique measure $\lambda$ on $X \times Y$ with the property that
$$\lambda(A \times B) = \mu_X(A)\,\mu_Y(B)$$
for all measurable rectangles $A \times B$. This measure is called the product measure of $\mu_X$ and $\mu_Y$ and we write $\lambda = \mu_X \times \mu_Y$.
The most important result on product measures is Fubini’s theorem.
Theorem 4.4. If $h$ is an integrable function on $X \times Y$, then $x \mapsto h(x, y)$ is an integrable function of $x$ for a.e. $y$, $y \mapsto h(x, y)$ is an integrable function of $y$ for a.e. $x$, and
$$\int h\,d(\mu_X \times \mu_Y) = \int\Big(\int h\,d\mu_X\Big)d\mu_Y = \int\Big(\int h\,d\mu_Y\Big)d\mu_X.$$
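The integrability hypothesis on $h$ in Theorem 4.4 cannot simply be dropped. A standard counterexample, added here only as an illustration, is worked below: both iterated integrals exist but they differ, so by Theorem 4.4 this $h$ cannot be integrable on $(0, 1] \times (0, 1]$. The inner integrals use $\frac{\partial}{\partial y}\frac{y}{x^2+y^2} = \frac{x^2-y^2}{(x^2+y^2)^2} = \frac{\partial}{\partial x}\Big(-\frac{x}{x^2+y^2}\Big)$.

```latex
\begin{aligned}
h(x,y) &= \frac{x^2-y^2}{(x^2+y^2)^2}, \qquad 0 < x, y \le 1,\\
\int_0^1\!\!\int_0^1 h(x,y)\,dy\,dx
  &= \int_0^1 \Big[\frac{y}{x^2+y^2}\Big]_{y=0}^{1} dx
   = \int_0^1 \frac{dx}{1+x^2} = \frac{\pi}{4},\\
\int_0^1\!\!\int_0^1 h(x,y)\,dx\,dy
  &= \int_0^1 \Big[-\frac{x}{x^2+y^2}\Big]_{x=0}^{1} dy
   = -\int_0^1 \frac{dy}{1+y^2} = -\frac{\pi}{4}.
\end{aligned}
```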
CHAPTER 5
Hilbert spaces
We have seen how useful the property of completeness is in our applications
of Banach–space methods to certain diﬀerential and integral equations. However,
some obvious ideas for use in diﬀerential equations (like Fourier analysis) seem to
go wrong in the obvious Banach space setting (cf. Theorem 3.4). It turns out that
not all Banach spaces are equally good – there are distinguished ones in which the
parallelogram law (equation (43) below) holds, and this has enormous consequences.
It makes more sense in this section to deal with complex linear spaces, so from now
on assume that the ground ﬁeld is C.
1. Hilbert spaces
Definition 5.1. A complex linear space $H$ is called a Hilbert¹ space if there is a complex-valued function $(\cdot, \cdot) : H \times H \to \mathbb{C}$ with the properties
(i) $(x, x) \ge 0$, and $(x, x) = 0$ if and only if $x = 0$;
(ii) $(x + y, z) = (x, z) + (y, z)$ for all $x, y, z \in H$;
(iii) $(\lambda x, y) = \lambda(x, y)$ for all $x, y \in H$ and $\lambda \in \mathbb{C}$;
(iv) $(x, y) = \overline{(y, x)}$ for all $x, y \in H$;
(v) the norm defined by $\|x\| = (x, x)^{1/2}$ makes $H$ into a Banach space.
If only properties (i), (ii), (iii), (iv) hold then $(H, (\cdot, \cdot))$ is called an inner-product space.
Notice that property (v) makes sense since by (i) $(x, x) \ge 0$, and we shall see below (Lemma 5.2) that $\|\cdot\|$ is indeed a norm.
The function $(\cdot, \cdot)$ is called an inner or scalar product, and so a Hilbert space is a complete inner product space.
If the scalar product is real-valued on a real linear space, then the properties determine a real Hilbert space; all the results below apply to these.
Notice that (iii) and (iv) imply that $(x, \lambda y) = \bar\lambda(x, y)$, and $(x, 0) = (0, x) = 0$.
Example 5.1. [1] If $X = \mathbb{C}^n$, then $(x, y) = \sum_{i=1}^{n} x_i\bar y_i$ makes $\mathbb{C}^n$ into an $n$-dimensional Hilbert space.
1
David Hilbert (1862–1943) was a German mathematician whose work in geometry had the
greatest influence on the field since Euclid. After making a systematic study of the axioms of
Euclidean geometry, Hilbert proposed a set of 21 such axioms and analyzed their significance.
Hilbert received his Ph.D. from the University of Königsberg and served on its faculty from 1886
to 1895. He became (1895) professor of mathematics at the University of Göttingen, where he
remained for the rest of his life. Between 1900 and 1914, many mathematicians from the United
States and elsewhere who later played an important role in the development of mathematics
went to Göttingen to study under him. Hilbert contributed to several branches of mathematics,
including algebraic number theory, functional analysis, mathematical physics, and the calculus of
variations. He also enumerated 23 unsolved problems of mathematics that he considered worthy
of further investigation. Since Hilbert’s time, many of these problems have been solved, although
some (such as the Riemann hypothesis) remain open.
[2] Let $X = C[a, b]$ (complex-valued continuous functions). Then the inner product $(f, g) = \int_a^b f(t)\overline{g(t)}\,dt$ makes $X$ into an inner-product space that is not a Hilbert space.
[3] Let $X = \ell^2$ (square-summable sequences; see Example 1.4[3]) with the inner product $((x_n), (y_n)) = \sum_{n=1}^{\infty} x_n\bar y_n$. This is well-defined by the Cauchy–Schwarz inequality (Lemma 5.1), and it is a Hilbert space by Example 2.1[3]. We shall see later that $\ell^2$ is the only $\ell^p$ space that is a Hilbert space.
[4] Let $X = L^2[a, b]$ with inner product $(f, g) = \int_a^b f(t)\overline{g(t)}\,dt$. Then $X$ is a Hilbert space (by the Cauchy–Schwarz inequality and Theorem 4.1).
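Lemma 5.1. In a Hilbert space,
$$|(x, y)| \le \|x\|\|y\|.$$
Proof. Assume that $x, y$ are non-zero (the result is clear if $x$ or $y$ is zero), and let $\lambda \in \mathbb{C}$. Then
$$\begin{aligned} 0 \le (x + \lambda y, x + \lambda y) &= \|x\|^2 + |\lambda|^2\|y\|^2 + \lambda(y, x) + \bar\lambda(x, y) \\ &= \|x\|^2 + |\lambda|^2\|y\|^2 + 2\Re[\lambda(y, x)]. \end{aligned}$$
Let $\lambda = -re^{i\theta}$ for some $r > 0$, and choose $\theta$ such that $\theta = \arg(x, y)$ if $(x, y) \ne 0$ (and $\theta = 0$ otherwise), so that $\lambda(y, x) = -r|(x, y)|$. Then
$$\|x\|^2 + r^2\|y\|^2 \ge 2r|(x, y)|.$$
Take $r = \|x\|/\|y\|$ to obtain the result.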
Lemma 5.2. The function defined by $\|x\| = (x, x)^{1/2}$ is a norm on a Hilbert space.
Proof. All the properties are clear except the triangle inequality. Since, by Lemma 5.1,
$$(x, y) + (y, x) = 2\Re(x, y) \le 2\|x\|\|y\|,$$
we have
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2 + (x, y) + (y, x) \le \|x\|^2 + \|y\|^2 + 2\|x\|\|y\| = (\|x\| + \|y\|)^2,$$
so $\|x + y\| \le \|x\| + \|y\|$.
Lemma 5.3. The norm on a Hilbert space is strictly convex (cf. Definition 1.7).
Proof. From the proof of Lemma 5.1, if $|(x, y)| = \|x\|\|y\|$, then $x = -\lambda y$. From the proof of Lemma 5.2 it follows that if $\|x + y\| = \|x\| + \|y\|$ and $y \ne 0$ then $x = -\lambda y$. Hence if $\|x\| = \|y\| = 1$ and $\|x + y\| = 2$, then $|\lambda| = 1$ and $|1 - \lambda| = 2$, so $\lambda = -1$ and $x = y$.
Next there is the peculiar parallelogram law.
Theorem 5.1. If $H$ is a Hilbert space, then
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2 \tag{43}$$
for all $x, y \in H$.
Conversely, if $H$ is a complex Banach space with norm $\|\cdot\|$ satisfying (43), then $H$ is a Hilbert space with scalar product $(\cdot, \cdot)$ satisfying $\|x\| = (x, x)^{1/2}$.
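The remark in Example 5.1[3] that $\ell^2$ is the only Hilbert space among the $\ell^p$ spaces can be seen from (43). The sketch below (NumPy, an illustrative choice) evaluates the defect $\|x+y\|^2 + \|x-y\|^2 - 2\|x\|^2 - 2\|y\|^2$ for the unit vectors $e_1, e_2$. These two vectors, padded with zeros, also lie in each $\ell^p$, so a non-zero defect shows that (43) fails there and hence, by Theorem 5.1, that the $\ell^p$ norm does not come from an inner product when $p \ne 2$. By hand, the defect is $2 \cdot 2^{2/p} - 4$ for finite $p$ and $-2$ for $p = \infty$.

```python
import numpy as np

def parallelogram_defect(x, y, p):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 for the l^p norm on R^n."""
    def norm(v):
        return np.linalg.norm(v, ord=p)
    return norm(x + y) ** 2 + norm(x - y) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])
for p in (1, 1.5, 2, 3, np.inf):
    print(p, parallelogram_defect(e1, e2, p))     # zero (up to rounding) only for p = 2
```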
Proof. The forward direction is easy: simply expand the expression
$$(x + y, x + y) + (x - y, x - y).$$
For the reverse direction, define
$$(x, y) = \frac{1}{4}\Big[\big(\|x + y\|^2 - \|x - y\|^2\big) + i\big(\|x + iy\|^2 - \|x - iy\|^2\big)\Big] \tag{44}$$
(in the real case, with the second expression simply omitted). Since
$$(x, x) = \|x\|^2 + \frac{i}{4}\|x\|^2|1 + i|^2 - \frac{i}{4}\|x\|^2|1 - i|^2 = \|x\|^2,$$
the inner-product norm $(x, x)^{1/2}$ coincides with the norm $\|x\|$.
To prove that $(\cdot, \cdot)$ satisfies condition (ii) in Definition 5.1, use (43) to show that
$$\|u + v + w\|^2 + \|u + v - w\|^2 = 2\|u + v\|^2 + 2\|w\|^2,$$
$$\|u - v + w\|^2 + \|u - v - w\|^2 = 2\|u - v\|^2 + 2\|w\|^2.$$
It follows that
$$\big(\|u + v + w\|^2 - \|u - v + w\|^2\big) + \big(\|u + v - w\|^2 - \|u - v - w\|^2\big) = 2\|u + v\|^2 - 2\|u - v\|^2,$$
showing that
$$\Re(u + w, v) + \Re(u - w, v) = 2\Re(u, v).$$
A similar argument shows that
$$\Im(u + w, v) + \Im(u - w, v) = 2\Im(u, v),$$
so
$$(u + w, v) + (u - w, v) = 2(u, v).$$
Taking $w = u$ (and noting that $(0, v) = 0$ by (44)) shows that $(2u, v) = 2(u, v)$. Taking $u + w = x$, $u - w = y$, $v = z$ then gives
$$(x, z) + (y, z) = 2\Big(\frac{x + y}{2}, z\Big) = (x + y, z).$$
To prove condition (iii) in Definition 5.1, use (ii) to show that
$$\begin{aligned}(mx, y) &= ((m-1)x + x, y) = ((m-1)x, y) + (x, y)\\ &= ((m-2)x, y) + 2(x, y)\\ &= \cdots\\ &= m(x, y).\end{aligned}$$
The same argument in reverse shows that $n(x/n, y) = (x, y)$, so $(x/n, y) = (1/n)(x, y)$. If $r = m/n$ ($m, n \in \mathbb{N}$) then
$$r(x, y) = \frac{m}{n}(x, y) = m\Big(\frac{x}{n}, y\Big) = \Big(\frac{m}{n}x, y\Big) = (rx, y).$$
Now $(x, y)$ is a continuous function in $x$ (by (44)); we deduce that $\lambda(x, y) = (\lambda x, y)$ for all $\lambda > 0$. For $\lambda < 0$,
$$\begin{aligned}\lambda(x, y) - (\lambda x, y) &= \lambda(x, y) - (|\lambda|(-x), y) = \lambda(x, y) - |\lambda|(-x, y)\\ &= \lambda(x, y) + \lambda(-x, y) = \lambda(0, y) = 0,\end{aligned}$$
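so (iii) holds for all $\lambda \in \mathbb{R}$. For $\lambda = i$, (iii) is clear from (44), so if $\lambda = \mu + i\nu$,
$$\begin{aligned}\lambda(x, y) &= \mu(x, y) + i\nu(x, y) = (\mu x, y) + i(\nu x, y)\\ &= (\mu x, y) + (i\nu x, y) = (\lambda x, y).\end{aligned}$$
Condition (iv) is clear, and (v) follows from the assumption that $H$ is a Banach space.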
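Formula (44) can be checked directly in $\mathbb{C}^n$ with the inner product of Example 5.1[1], which is linear in the first argument and conjugate-linear in the second, as in Definition 5.1. The random vectors below are an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4) + 1j * rng.normal(size=4)
y = rng.normal(size=4) + 1j * rng.normal(size=4)

def sq(v):
    return np.linalg.norm(v) ** 2            # ||v||^2 for the Euclidean norm on C^n

# right-hand side of the polarization formula (44)
polar = 0.25 * ((sq(x + y) - sq(x - y)) + 1j * (sq(x + 1j * y) - sq(x - 1j * y)))
direct = np.sum(x * np.conj(y))              # (x, y) = Σ x_i conj(y_i)
print(polar, direct)                         # agree up to rounding
```

The first bracket on its own gives $\Re(x, y)$, which is the real-case formula mentioned after (44).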
2. Projection theorem
Let $H$ be a Hilbert space. A point $x \in H$ is orthogonal to a point $y \in H$, written $x \perp y$, if $(x, y) = 0$. For sets $N, M$ in $H$, $x$ is orthogonal to $N$, written $x \perp N$, if $(x, y) = 0$ for all $y \in N$. The sets $N$ and $M$ are orthogonal (written $N \perp M$) if $x \perp M$ for all $x \in N$. The orthogonal complement of $M$ is defined as
$$M^\perp = \{x \in H \mid x \perp M\}.$$
Notice that for any $M$, $M^\perp$ is a closed linear subspace of $H$.
Lemma 5.4. Let $M$ be a non-empty closed convex set in a Hilbert space $H$. For every point $x_0 \in H$ there is a unique point $y_0 \in M$ such that
$$\|x_0 - y_0\| = \inf_{y \in M}\|x_0 - y\|. \tag{45}$$
That is, it makes sense in a Hilbert space to talk about the point in a closed convex set that is “closest” to a given point.
Proof. Let $d = \inf_{y \in M}\|x_0 - y\|$ and choose a sequence $(y_n)$ in $M$ such that $\|x_0 - y_n\| \to d$ as $n \to \infty$. By the parallelogram law (43),
$$4\Big\|x_0 - \tfrac{1}{2}(y_m + y_n)\Big\|^2 + \|y_m - y_n\|^2 = 2\|x_0 - y_m\|^2 + 2\|x_0 - y_n\|^2 \to 4d^2$$
as $m, n \to \infty$. By convexity (Definition 1.6), $\frac{1}{2}(y_m + y_n) \in M$, so
$$4\Big\|x_0 - \tfrac{1}{2}(y_m + y_n)\Big\|^2 \ge 4d^2.$$
It follows that $\|y_m - y_n\| \to 0$ as $m, n \to \infty$. Now $H$ is complete and $M$ is a closed subset, so $\lim_{n\to\infty} y_n = y_0$ exists and lies in $M$. Now $\|x_0 - y_0\| = \lim_{n\to\infty}\|x_0 - y_n\| = d$, showing (45).
It remains to check that the point $y_0$ is the only point with property (45). Let $y_1$ be another point in $M$ with
$$\|x_0 - y_1\| = \inf_{y \in M}\|x_0 - y\|.$$
Then
$$2\Big\|x_0 - \frac{y_0 + y_1}{2}\Big\| \le \|x_0 - y_0\| + \|x_0 - y_1\| \le 2\inf_{y \in M}\|x_0 - y\| \le 2\Big\|x_0 - \frac{y_0 + y_1}{2}\Big\|,$$
since $(y_0 + y_1)/2$ lies in $M$. It follows that
$$2\Big\|x_0 - \frac{y_0 + y_1}{2}\Big\| = \|x_0 - y_0\| + \|x_0 - y_1\|.$$
If $d = 0$ then $y_0 = x_0 = y_1$ directly. Otherwise, since the Hilbert norm is strictly convex (Lemma 5.3, applied to $(x_0 - y_0)/d$ and $(x_0 - y_1)/d$), we deduce that $x_0 - y_0 = x_0 - y_1$, so $y_1 = y_0$.
This gives us the Orthogonal Projection Theorem.
Theorem 5.2. Let $M$ be a closed linear subspace of a Hilbert space $H$. Then any $x_0 \in H$ can be written $x_0 = y_0 + z_0$, $y_0 \in M$, $z_0 \in M^\perp$. The elements $y_0, z_0$ are determined uniquely by $x_0$.
Proof. If $x_0 \in M$ then $y_0 = x_0$ and $z_0 = 0$. If $x_0 \notin M$, then let $y_0$ be the point in $M$ with
$$\|x_0 - y_0\| = \inf_{y \in M}\|x_0 - y\|$$
(this point exists by Lemma 5.4). Now for any $y \in M$ and $\lambda \in \mathbb{C}$, $y_0 + \lambda y \in M$, so
$$\|x_0 - y_0\|^2 \le \|x_0 - y_0 - \lambda y\|^2 = \|x_0 - y_0\|^2 - 2\Re\big(\lambda(y, x_0 - y_0)\big) + |\lambda|^2\|y\|^2.$$
Hence
$$-2\Re\big(\lambda(y, x_0 - y_0)\big) + |\lambda|^2\|y\|^2 \ge 0.$$
Assume now that $\lambda = \epsilon > 0$ and divide by $\epsilon$. As $\epsilon \to 0$ we deduce that
$$\Re(y, x_0 - y_0) \le 0. \tag{46}$$
Assume next that $\lambda = -i\epsilon$ with $\epsilon > 0$ and divide by $\epsilon$. As $\epsilon \to 0$, we get
$$\Im(y, x_0 - y_0) \le 0. \tag{47}$$
Exactly the same argument may be applied to $-y$ since $-y \in M$, showing that (46) and (47) hold with $y$ replaced by $-y$. Thus $(y, x_0 - y_0) = 0$ for all $y \in M$. It follows that the point $z_0 = x_0 - y_0$ lies in $M^\perp$.
Finally, we check that the decomposition is unique. Suppose that $x_0 = y_1 + z_1$ with $y_1 \in M$ and $z_1 \in M^\perp$. Then $y_0 - y_1 = z_1 - z_0 \in M \cap M^\perp = \{0\}$, so $y_1 = y_0$ and $z_1 = z_0$.
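The decomposition of Theorem 5.2 can be computed explicitly when $M$ is finite-dimensional. The Python sketch below (helper names are illustrative) takes $M = \operatorname{span}\{b_1, b_2\}$ in $\mathbb{C}^3$: writing $y_0 = c_1b_1 + c_2b_2$, the condition $x_0 - y_0 \in M^\perp$ becomes the $2 \times 2$ linear system $(y_0, b_j) = (x_0, b_j)$, solved here by Cramer's rule.

```python
def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def project(x0, b1, b2):
    """Orthogonal projection y0 of x0 onto span{b1, b2} (b1, b2 independent)."""
    g11, g12 = inner(b1, b1), inner(b2, b1)
    g21, g22 = inner(b1, b2), inner(b2, b2)
    r1, r2 = inner(x0, b1), inner(x0, b2)
    det = g11 * g22 - g12 * g21          # non-zero for independent b1, b2
    c1 = (r1 * g22 - g12 * r2) / det
    c2 = (g11 * r2 - g21 * r1) / det
    return [c1 * p + c2 * q for p, q in zip(b1, b2)]

b1, b2 = [1, 1j, 0], [0, 1, 1]
x0 = [2, -1, 3j]
y0 = project(x0, b1, b2)
z0 = [a - b for a, b in zip(x0, y0)]     # the component in M^perp
print(abs(inner(z0, b1)), abs(inner(z0, b2)))
```

Both printed numbers are zero up to rounding, so $z_0 \perp M$, and a point already in $M$ is returned unchanged.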
Corollary 5.1. If $M$ is a closed linear subspace and $M \ne H$, then there exists an element $z_0 \ne 0$ such that $z_0 \perp M$.
Proof. Apply the projection theorem (Theorem 5.2) to any $x_0 \in H \setminus M$; the component $z_0 \in M^\perp$ is non-zero because $x_0 \notin M$.
It follows that every bounded linear functional on a Hilbert space is given by taking the inner product with a fixed element: this is the Riesz theorem.
Theorem 5.3. For every bounded linear functional $x^*$ on a Hilbert space $H$ there exists a unique element $z \in H$ such that $x^*(x) = (x, z)$ for all $x \in H$. The norm of the functional is given by $\|x^*\| = \|z\|$.
Proof. Let $N$ be the null space of $x^*$; $N$ is a closed linear subspace of $H$. If $N = H$, then $x^* = 0$ and we may take $z = 0$, since $x^*(x) = (x, 0)$. If $N \ne H$, then by Corollary 5.1 there is a point $z_0 \in N^\perp$, $z_0 \ne 0$. By construction, $\alpha = x^*(z_0) \ne 0$. For any $x \in H$, the point $x - x^*(x)z_0/\alpha$ lies in $N$, so
$$\left(x - x^*(x)\frac{z_0}{\alpha}, z_0\right) = 0.$$
It follows that
$$x^*(x)\left(\frac{z_0}{\alpha}, z_0\right) = (x, z_0), \quad\text{that is,}\quad x^*(x) = \frac{\alpha}{(z_0, z_0)}(x, z_0).$$
If we substitute $z = \dfrac{\bar\alpha}{(z_0, z_0)}z_0$, we get $x^*(x) = (x, z)$ for all $x \in H$.
To check uniqueness, assume that $x^*(x) = (x, z')$ for all $x \in H$. Then $(x, z - z') = 0$ for all $x \in H$, so (taking $x = z - z'$) $\|z - z'\| = 0$ and therefore $z = z'$.
Finally,
$$\|x^*\| = \sup_{\|x\|=1}|x^*(x)| = \sup_{\|x\|=1}|(x, z)| \le \sup_{\|x\|=1}\|x\|\,\|z\| = \|z\|.$$
On the other hand,
$$\|z\|^2 = (z, z) = |x^*(z)| \le \|x^*\|\,\|z\|,$$
so $\|z\| \le \|x^*\|$.
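In $\mathbb{C}^n$ with the standard inner product, the representing element of Theorem 5.3 can be read off from the values on the standard basis: $z_j = \overline{x^*(e_j)}$. The Python sketch below uses a hypothetical functional `phi` on $\mathbb{C}^3$ to check $x^*(x) = (x, z)$ and the identity $x^*(z) = \|z\|^2$ used in the last step of the proof.

```python
def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def riesz_representer(functional, n):
    """Return z in C^n with functional(x) = (x, z), using z_j = conj(functional(e_j))."""
    z = []
    for j in range(n):
        e = [0j] * n
        e[j] = 1 + 0j
        z.append(complex(functional(e)).conjugate())
    return z

def phi(x):
    """A hypothetical bounded linear functional on C^3."""
    return 2 * x[0] - 1j * x[1] + 0.5 * x[2]

z = riesz_representer(phi, 3)
x = [1 + 1j, 2, -3j]
print(abs(phi(x) - inner(x, z)))   # zero up to rounding
print(abs(phi(z) - inner(z, z)))   # x*(z) = ||z||^2
```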
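Corollary 5.2. If $H$ is a Hilbert space, then the space $H^*$ is also a Hilbert space. The map $\sigma : H \to H^*$ given by $(\sigma x)(y) = (y, x)$ is an isometric embedding of $H$ onto $H^*$; note that $\sigma$ is conjugate-linear, since $\sigma(\lambda x) = \bar\lambda\,\sigma x$.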
Definition 5.2. Let $M$ and $N$ be linear subspaces of a Hilbert space $H$. If every element in the linear space $M + N$ has a unique representation in the form $x + y$, $x \in M$, $y \in N$, then we say $M + N$ is a direct sum. If $M \perp N$, then we write $M \oplus N$; this sum is automatically a direct one, since $M \cap N = \{0\}$. If $Y = M \oplus N$, then we also write $N = Y \ominus M$ and call $N$ the orthogonal complement of $M$ in $Y$.
Notice that the projection theorem says that if $M$ is a closed linear subspace of $H$, then $H = M \oplus M^\perp$.
3. Projection and self–adjoint operators
Definition 5.3. Let $M$ be a closed linear subspace of the Hilbert space $H$. By the projection theorem, every $x \in H$ can be written uniquely in the form $x = y + z$ with $y \in M$, $z \in M^\perp$. Call $y$ the projection of $x$ in $M$, and the operator $P = P_M$ defined by $Px = y$ is the projection on $M$. The space $M$ is called the subspace of the projection $P$.
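Definition 5.4. Let $T : H \to H$ be a bounded linear operator. The adjoint $T^*$ of $T$ is defined by the relation $(Tx, y) = (x, T^*y)$ for all $x, y \in H$ (the element $T^*y$ exists and is unique by the Riesz theorem applied to the functional $x \mapsto (Tx, y)$). An operator with $T = T^*$ is called self–adjoint.
Notice that if $T$ is self–adjoint, then for every $x \in H$, $(Tx, x) \in \mathbb{R}$, since $(Tx, x) = (x, Tx) = \overline{(Tx, x)}$.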
Exercise 5.1. Let $T$ and $S$ be bounded linear operators in the Hilbert space $H$, and $\lambda \in \mathbb{C}$. Prove the following: $(T + S)^* = T^* + S^*$; $(TS)^* = S^*T^*$; $(\lambda T)^* = \bar\lambda T^*$; $I^* = I$; $T^{**} = T$; $\|T^*\| = \|T\|$. If $T^{-1}$ is also a bounded linear operator with domain $H$, then $(T^*)^{-1}$ is a bounded linear map with domain $H$ and $(T^{-1})^* = (T^*)^{-1}$.
Theorem 5.4. [1] If $P$ is a projection, then $P$ is self–adjoint, $P^2 = P$ and $\|P\| = 1$ if $P \ne 0$.
[2] If $P$ is a self–adjoint operator with $P^2 = P$, then $P$ is a projection.
Proof. [1] Let $P = P_M$, and $x_i = y_i + z_i$ for $i = 1, 2$ where $y_i \in M$ and $z_i \in M^\perp$. Then $\lambda_1x_1 + \lambda_2x_2 = (\lambda_1y_1 + \lambda_2y_2) + (\lambda_1z_1 + \lambda_2z_2)$ and
$$(\lambda_1y_1 + \lambda_2y_2) \in M, \qquad (\lambda_1z_1 + \lambda_2z_2) \in M^\perp.$$
It follows that $P$ is linear. To see that $P^2 = P$, notice that $P^2x_1 = P(Px_1) = P(y_1) = y_1 = Px_1$ since $y_1 \in M$. Notice that $\|x_1\|^2 = \|y_1\|^2 + \|z_1\|^2 \ge \|y_1\|^2 = \|Px_1\|^2$, so $\|P\| \le 1$. If $P \ne 0$ then for any $x \in M \setminus \{0\}$ we have $Px = x$, so $\|P\| \ge 1$. Self–adjointness is clear:
$$(Px_1, x_2) = (y_1, x_2) = (y_1, y_2) = (x_1, y_2) = (x_1, Px_2).$$
[2] Let $M = P(H)$; then $M$ is a linear subspace of $H$. If $y_n = P(x_n)$, with $y_n \to z$, then $Py_n = P^2x_n = Px_n = y_n$, so (by continuity of $P$) $z = \lim_n y_n = \lim_n Py_n = Pz \in M$, and $M$ is closed. Since $P$ is self–adjoint and $P^2 = P$,
$$(x - Px, Py) = (Px - P^2x, y) = 0$$
for all $y \in H$, so $x - Px \in M^\perp$. This means that $x = Px + (x - Px)$ is the unique decomposition of $x$ as a sum $y + z$ with $y \in M$ and $z \in M^\perp$. That is, $P$ is the projection $P_M$.
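Theorem 5.4 can be tested on matrices. In the Python sketch below (illustrative helper names), the projection onto $M = \operatorname{span}\{u\}$ in $\mathbb{C}^3$ has matrix $u u^* / (u, u)$; it is idempotent and equal to its conjugate transpose. The matrix $Q$ shows why self-adjointness is needed in [2]: $Q^2 = Q$, but $Q$ is not self-adjoint and increases the norm of some vectors, so it is not a projection in the sense of Definition 5.3.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose: the matrix of T* for the standard inner product."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def rank_one_projection(u):
    """Matrix of P_M for M = span{u}: P x = ((x, u) / (u, u)) u."""
    s = sum(abs(c) ** 2 for c in u)
    return [[ui * complex(uj).conjugate() / s for uj in u] for ui in u]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm(x):
    return sum(abs(c) ** 2 for c in x) ** 0.5

P = rank_one_projection([1, 1j, 2])
print(close(matmul(P, P), P), close(adjoint(P), P))    # idempotent and self-adjoint

Q = [[1, 1], [0, 0]]                                   # idempotent, not self-adjoint
print(close(matmul(Q, Q), Q), close(adjoint(Q), Q))
print(norm(matvec(Q, [1, 1])), norm([1, 1]))           # 2 > sqrt(2), so ||Q|| > 1
```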
We collect all the elementary properties of projections into the next theorem.
Projections $P_1$ and $P_2$ are orthogonal if $P_1P_2 = 0$. Since projections are self–adjoint, $P_1P_2 = 0$ if and only if $P_2P_1 = 0$ (take adjoints). The projection $P_L$ is part of the projection $P_M$ if and only if $L \subset M$.
Theorem 5.5. [1] Projections $P_M$ and $P_N$ are orthogonal if and only if $M \perp N$.
[2] The sum of two projections $P_M$ and $P_N$ is a projection if and only if $P_MP_N = 0$. In that case, $P_M + P_N = P_{M\oplus N}$.
[3] The product of two projections $P_M$ and $P_N$ is another projection if and only if $P_MP_N = P_NP_M$. In that case, $P_MP_N = P_{M\cap N}$.
[4] $P_L$ is part of $P_M$ $\iff$ $P_MP_L = P_L$ $\iff$ $P_LP_M = P_L$ $\iff$ $\|P_Lx\| \le \|P_Mx\|$ for all $x \in H$.
[5] If $P$ is a projection, then $I - P$ is a projection.
[6] More generally, $P = P_M - P_L$ is a projection if and only if $P_L$ is a part of $P_M$. If so, then $P = P_{M\ominus L}$.
Proof. [1] Let $P_MP_N = 0$ (so also $P_NP_M = 0$) and $x \in M$, $y \in N$. Then
$$(x, y) = (P_Mx, P_Ny) = (P_NP_Mx, y) = 0,$$
so $M \perp N$. Conversely, if $M \perp N$ then for any $x \in H$, $P_Nx \perp M$, so $P_M(P_Nx) = 0$.
[2] If $P = P_M + P_N$ is a projection, then $P^2 = P$, so $P_MP_N + P_NP_M = 0$. Hence
$$P_MP_N + P_MP_NP_M = 0,$$
after multiplying by $P_M$ on the left. Multiplying on the right by $P_M$ then gives $2P_MP_NP_M = 0$, so $P_MP_NP_M = 0$, and substituting this into the previous identity gives $P_MP_N = 0$.
Conversely, if $P_MP_N = 0$ then $P_NP_M = 0$ also, so $P^2 = P$. Since $P$ is self–adjoint, it is a projection by Theorem 5.4[2].
Finally, it is clear that $(P_M + P_N)(H) = M \oplus N$, so $P = P_{M\oplus N}$.
[3] If $P = P_MP_N$ is a projection, then $P^* = P$, so
$$P_MP_N = (P_MP_N)^* = P_N^*P_M^* = P_NP_M.$$
Conversely, let $P_MP_N = P_NP_M = P$. Then $P^* = P$, so $P$ is self–adjoint. Also $P^2 = P_MP_NP_MP_N = P_M^2P_N^2 = P_MP_N = P$, so $P$ is a projection. Moreover, $Px = P_M(P_Nx) = P_N(P_Mx)$, so $Px \in M \cap N$. On the other hand, if $x \in M \cap N$ then $Px = P_M(P_Nx) = P_Mx = x$, so $P = P_{M\cap N}$.
[4] Assume that $P_L$ is part of $P_M$, so $L \subset M$. Then $P_Lx \in M$ for all $x \in H$. Hence $P_MP_Lx = P_Lx$, and $P_MP_L = P_L$.
If $P_MP_L = P_L$, then
$$P_L = P_L^* = (P_MP_L)^* = P_L^*P_M^* = P_LP_M,$$
so $P_LP_M = P_L$.
If $P_LP_M = P_L$, then for any $x \in H$,
$$\|P_Lx\| = \|P_LP_Mx\| \le \|P_L\|\,\|P_Mx\| \le \|P_Mx\|,$$
so that $\|P_Lx\| \le \|P_Mx\|$.
Finally, assume that $\|P_Lx\| \le \|P_Mx\|$ for all $x \in H$. If there is a point $x_0 \in L \setminus M$ then let $x_0 = y_0 + z_0$, $y_0 \in M$, $z_0 \perp M$, and $z_0 \ne 0$. Then, since $P_Lx_0 = x_0$,
$$\|P_Lx_0\|^2 = \|y_0\|^2 + \|z_0\|^2 > \|y_0\|^2 = \|P_Mx_0\|^2,$$
so there can be no such point. It follows that $L \subset M$, so $P_L$ is a part of $P_M$.
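[5] $I - P$ is self–adjoint, and $(I - P)^2 = I - P - P + P^2 = I - P$.
[6] If $P$ is a projection, then by [5] so is $I - P = (I - P_M) + P_L$. Also by [5], $I - P_M$ is a projection, so by [2] we must have $(I - P_M)P_L = 0$. That is, $P_L = P_MP_L$. Hence, by [4], $P_L$ is a part of $P_M$.
Conversely, if $P_L$ is part of $P_M$, then $P_MP_L = P_LP_M = P_L$ by [4], so $P_M - P_L$ is self–adjoint and $(P_M - P_L)^2 = P_M - 2P_L + P_L = P_M - P_L$; by Theorem 5.4[2] it is a projection. Moreover $(P_M - P_L)P_L = 0$, so $P_M - P_L$ and $P_L$ are orthogonal. If $Y$ is the subspace of $P_M - P_L$, then by [2], $P_M = (P_M - P_L) + P_L = P_{Y\oplus L}$, so $Y \oplus L = M$ and $Y = M \ominus L$.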
4. Orthonormal sets
A subset $K$ in a Hilbert space $H$ is orthonormal if each element of $K$ has norm 1, and if any two distinct elements of $K$ are orthogonal. An orthonormal set $K$ is complete if $K^\perp = \{0\}$.
Theorem 5.6. Let $\{x_n\}$ be an orthonormal sequence in $H$. Then for any $x \in H$,
$$\sum_{n=1}^\infty|(x, x_n)|^2 \le \|x\|^2. \tag{48}$$
The inequality (48) is Bessel's inequality. The scalar coefficients $(x, x_n)$ are called the Fourier coefficients of $x$ with respect to $\{x_n\}$.
Proof. We have
$$\left\|x - \sum_{n=1}^m(x, x_n)x_n\right\|^2 = \|x\|^2 - \left(x, \sum_{n=1}^m(x, x_n)x_n\right) - \left(\sum_{n=1}^m(x, x_n)x_n, x\right) + \sum_{n=1}^m(x, x_n)(x_n, x),$$
so
$$\left\|x - \sum_{n=1}^m(x, x_n)x_n\right\|^2 = \|x\|^2 - \sum_{n=1}^m|(x, x_n)|^2. \tag{49}$$
It follows that
$$\sum_{n=1}^m|(x, x_n)|^2 \le \|x\|^2,$$
and Bessel's inequality follows by taking $m \to \infty$.
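Identity (49) and Bessel's inequality are easy to check in a finite-dimensional example. The Python sketch below takes the orthonormal set $x_1 = (1, 1, 0, 0)/\sqrt2$, $x_2 = (0, 0, 1, -1)/\sqrt2$ in $\mathbb{R}^4$ (an illustrative choice) and a fixed vector $x$; here $\sum|(x, x_n)|^2 = 6.5$ and $\|x\|^2 = 39$.

```python
import math

def inner(x, y):
    """Standard inner product: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

s = 1 / math.sqrt(2)
basis = [[s, s, 0, 0], [0, 0, s, -s]]          # an orthonormal set {x_1, x_2}
x = [3.0, -1.0, 2.0, 5.0]

coeffs = [inner(x, e) for e in basis]            # Fourier coefficients (x, x_n)
approx = [sum(c * e[k] for c, e in zip(coeffs, basis)) for k in range(4)]
residual = [a - b for a, b in zip(x, approx)]

norm_sq = inner(x, x).real
bessel_sum = sum(abs(c) ** 2 for c in coeffs)
residual_sq = inner(residual, residual).real
print(bessel_sum, norm_sq, residual_sq)          # residual_sq = norm_sq - bessel_sum, as in (49)
```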
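The next result shows that the partial sums of the Fourier series of Theorem 5.6 give the best possible approximation of fixed length: among all linear combinations of $x_1, \dots, x_m$, the one with the Fourier coefficients is closest to $x$.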
Theorem 5.7. Let $\{x_n\}$ be an orthonormal sequence in a Hilbert space $H$ and let $\{\lambda_n\}$ be any sequence of scalars. Then, for any $m \ge 1$,
$$\left\|x - \sum_{n=1}^m\lambda_nx_n\right\| \ge \left\|x - \sum_{n=1}^m(x, x_n)x_n\right\|.$$
Proof. Write $c_n = (x, x_n)$. Then
$$\left\|x - \sum_{n=1}^m\lambda_nx_n\right\|^2 = \|x\|^2 - \sum_{n=1}^m\bar\lambda_nc_n - \sum_{n=1}^m\lambda_n\bar c_n + \sum_{n=1}^m|\lambda_n|^2 = \|x\|^2 - \sum_{n=1}^m|c_n|^2 + \sum_{n=1}^m|c_n - \lambda_n|^2 \ge \|x\|^2 - \sum_{n=1}^m|c_n|^2.$$
Now apply equation (49).
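The middle identity in this proof can be checked numerically. The Python sketch below reuses the illustrative orthonormal pair in $\mathbb{R}^4$ and compares the Fourier-coefficient approximation against randomly chosen complex coefficients $\lambda_n$ (seeded, so the run is reproducible): every trial satisfies the identity and none beats the Fourier coefficients.

```python
import math
import random

def inner(x, y):
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def residual_sq(x, basis, lams):
    """||x - sum_n lambda_n x_n||^2."""
    r = list(x)
    for lam, e in zip(lams, basis):
        r = [ri - lam * ei for ri, ei in zip(r, e)]
    return inner(r, r).real

s = 1 / math.sqrt(2)
basis = [[s, s, 0, 0], [0, 0, s, -s]]
x = [3.0, -1.0, 2.0, 5.0]
c = [inner(x, e) for e in basis]
best = residual_sq(x, basis, c)

rng = random.Random(0)
trials = []
for _ in range(1000):
    lams = [complex(rng.gauss(0, 2), rng.gauss(0, 2)) for _ in basis]
    value = residual_sq(x, basis, lams)
    # ||x||^2 - sum |c_n|^2 + sum |c_n - lambda_n|^2, from the proof above
    predicted = (inner(x, x).real - sum(abs(cn) ** 2 for cn in c)
                 + sum(abs(cn - ln) ** 2 for cn, ln in zip(c, lams)))
    trials.append((value, predicted))
print(best, min(v for v, _ in trials))
```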
Theorem 5.8. Let $\{x_n\}$ be an orthonormal sequence in a Hilbert space $H$, and let $\{\alpha_n\}$ be any sequence of scalars. Then the series $\sum\alpha_nx_n$ is convergent if and only if $\sum|\alpha_n|^2 < \infty$, and if so
$$\left\|\sum_{n=1}^\infty\alpha_nx_n\right\| = \left(\sum_{n=1}^\infty|\alpha_n|^2\right)^{1/2}. \tag{50}$$
Moreover, the sum $\sum\alpha_nx_n$ is independent of the order in which the terms are arranged.
Proof. For $m > n$ we have (by orthonormality)
$$\left\|\sum_{j=n}^m\alpha_jx_j\right\|^2 = \sum_{j=n}^m|\alpha_j|^2. \tag{51}$$
Since $H$ is complete, (51) shows the first part of the theorem: the partial sums of $\sum\alpha_jx_j$ form a Cauchy sequence exactly when the partial sums of $\sum|\alpha_j|^2$ do. Take $n = 1$ and $m \to \infty$ in (51) to get (50).
Assume that $\sum|\alpha_j|^2 < \infty$ and let $z = \sum\alpha_{j_n}x_{j_n}$ be a rearrangement of the series $x = \sum\alpha_jx_j$. Then
$$\|x - z\|^2 = (x, x) + (z, z) - (x, z) - (z, x), \tag{52}$$
and $(x, x) = (z, z) = \sum|\alpha_j|^2$. Write
$$s_m = \sum_{j=1}^m\alpha_jx_j, \qquad t_m = \sum_{n=1}^m\alpha_{j_n}x_{j_n}.$$
Then
$$(x, z) = \lim_m(s_m, t_m) = \sum_{j=1}^\infty|\alpha_j|^2.$$
Also, $(z, x) = \overline{(x, z)} = (x, z)$, so (52) shows that $\|x - z\|^2 = 0$ and hence $x = z$.
Theorem 5.9. Let $K$ be any orthonormal set in a Hilbert space $H$, and for each $x \in H$ let $K_x = \{y \mid y \in K, (x, y) \ne 0\}$. Then:
(i) for any $x \in H$, $K_x$ is countable;
(ii) the sum $Ex = \sum_{y\in K_x}(x, y)y$ converges independently of the order in which the terms are arranged;
(iii) $E$ is the projection operator onto the closed linear space spanned by $K$.
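Proof. From Bessel's inequality (48), for any $\epsilon > 0$ there are no more than $\|x\|^2/\epsilon^2$ points $y$ in $K$ with $|(x, y)| > \epsilon$. Taking $\epsilon = 1, \frac12, \frac13, \dots$ we see that $K_x$ is a countable union of finite sets, so $K_x$ is countable for any $x$.
Bessel's inequality and Theorem 5.8 show (ii).
Let $\overline{\langle K\rangle}$ denote the closed linear subspace spanned by $K$. If $x \perp \overline{\langle K\rangle}$ then $Ex = 0$. If $x \in \overline{\langle K\rangle}$ then for any $\epsilon > 0$ there are scalars $\lambda_1, \dots, \lambda_n$ and elements $y_1, \dots, y_n \in K$ such that
$$\left\|x - \sum_{j=1}^n\lambda_jy_j\right\| < \epsilon.$$
Then, by Theorem 5.7,
$$\left\|x - \sum_{j=1}^n(x, y_j)y_j\right\| < \epsilon. \tag{53}$$
Without loss of generality, all of the $y_j$ lie in $K_x$ (terms with $(x, y_j) = 0$ contribute nothing to (53)). Arrange the set $K_x$ in a sequence $\{y_j\}$ beginning with $y_1, \dots, y_n$. From (49) notice that the left–hand side of (53) does not increase with $n$. Taking $n \to \infty$, we get $\|x - Ex\| \le \epsilon$. Since $\epsilon > 0$ is arbitrary, we deduce that $Ex = x$ for all $x \in \overline{\langle K\rangle}$. Since $E$ is linear, this proves that $E = P_{\overline{\langle K\rangle}}$.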
Definition 5.5. A set $K$ is an orthonormal basis of $H$ if $K$ is orthonormal and for every $x \in H$,
$$x = \sum_{y\in K_x}(x, y)y. \tag{54}$$
Theorem 5.10. Let $K$ be an orthonormal set in a Hilbert space $H$. Then the following properties are equivalent.
(i) $K$ is complete;
(ii) $\overline{\langle K\rangle} = H$;
(iii) $K$ is an orthonormal basis for $H$;
(iv) for any $x \in H$, $\|x\|^2 = \sum_{y\in K_x}|(x, y)|^2$.
The equality in (iv) is called Parseval's formula.
Proof. That (i) implies (ii) follows from Corollary 5.1. Assume (ii). Then by Theorem 5.9, $Ex = x$ for all $x \in H$, so $K$ is an orthonormal basis. Now assume (iii). Arrange the elements of $K_x$ in a sequence $\{x_n\}$, and take $m \to \infty$ in (49) to obtain Parseval's formula (iv). Finally, assume (iv). If $x \perp K$, then $\|x\|^2 = \sum|(x, y)|^2 = 0$, so $x = 0$. This means that (iv) implies (i).
Theorem 5.11. Every Hilbert space has an orthonormal basis. Any orthonormal basis in a separable Hilbert space is countable.
Example 5.2. Classical Fourier analysis comes about using the orthonormal basis $\{e^{2\pi int}\}_{n\in\mathbb{Z}}$ for $L^2[0, 1]$.
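Proof. Let $H$ be a Hilbert space, and consider the class of orthonormal sets in $H$ with the partial order of inclusion. By Lemma A.1 there exists a maximal orthonormal set $K$. Since $K$ is maximal, it is complete (if $z \perp K$ and $z \ne 0$, then $K \cup \{z/\|z\|\}$ would be a strictly larger orthonormal set), and it is therefore an orthonormal basis by Theorem 5.10.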
Now let $H$ be separable, and suppose that $\{x_\alpha\}$ is an uncountable orthonormal basis. Since, for any $\alpha \ne \beta$,
$$\|x_\alpha - x_\beta\|^2 = \|x_\alpha\|^2 + \|x_\beta\|^2 = 2,$$
the balls $B_{1/2}(x_\alpha)$ are mutually disjoint. If $\{y_n\}$ is a dense sequence in $H$, then each $y_n$ lies in at most one of these uncountably many balls, so there is a ball $B_{1/2}(x_{\alpha_0})$ that does not contain any of the points $y_n$. Hence $x_{\alpha_0}$ is not in the closure of $\{y_n\}$, a contradiction.
Corollary 5.3. Any two infinite–dimensional separable Hilbert spaces are isometrically isomorphic.
Proof. Let $H_1$ and $H_2$ be two such spaces. By Theorem 5.11 there are sequences $\{x_n\}$ and $\{y_n\}$ that form orthonormal bases for $H_1$ and $H_2$ respectively. Given any points $x \in H_1$ and $y \in H_2$, we may write
$$x = \sum_{n=1}^\infty c_nx_n, \qquad y = \sum_{n=1}^\infty d_ny_n, \tag{55}$$
where $c_n = (x, x_n)$ and $d_n = (y, y_n)$ for all $n \ge 1$. Define a map $T : H_1 \to H_2$ by $Tx = y$ if $c_n = d_n$ for all $n$ in (55). It is clear that $T$ is linear, and it maps $H_1$ onto $H_2$ since, by Theorem 5.8 and Parseval's formula, the sequences $(c_n)$ and $(d_n)$ run through all of $\ell^2$. Also,
$$\|Tx\|^2 = \sum_{n=1}^\infty|d_n|^2 = \sum_{n=1}^\infty|c_n|^2 = \|x\|^2,$$
so $T$ is an isometry.
5. Gram–Schmidt orthonormalization
Starting with any linearly independent set $\{x_1, x_2, \dots\}$ in a Hilbert space $H$, we can inductively construct an orthonormal set that spans the same subspace by the Gram–Schmidt orthonormalization process (Theorem 5.12). The idea is simple: first, any non-zero vector $v$ can be reduced to unit length simply by dividing by the length $\|v\|$. Second, if $x_1$ is a fixed unit vector and $x_2$ is another unit vector with $\{x_1, x_2\}$ linearly independent, then $x_2 - (x_2, x_1)x_1$ is a non–zero vector (since $x_1$ and $x_2$ are independent), it is orthogonal to $x_1$, since
$$(x_1, x_2 - (x_2, x_1)x_1) = (x_1, x_2) - \overline{(x_2, x_1)}(x_1, x_1) = (x_1, x_2) - (x_1, x_2) = 0,$$
and $\{x_1, x_2 - (x_2, x_1)x_1\}$ spans the same space as $\{x_1, x_2\}$. This idea can be extended as follows; the notational complexity comes about because of the need to renormalize (make the new vector unit length).
We will only need this for sets whose linear span is dense.
Theorem 5.12. If $\{x_1, x_2, \dots\}$ is a linearly independent set whose linear span is dense in $H$, then the set $\{\phi_1, \phi_2, \dots\}$ defined below is an orthonormal basis for $H$:
$$\phi_1 = \frac{x_1}{\|x_1\|}, \qquad \phi_2 = \frac{x_2 - (x_2, \phi_1)\phi_1}{\|x_2 - (x_2, \phi_1)\phi_1\|},$$
and in general for any $n \ge 1$,
$$\phi_n = \frac{x_n - (x_n, \phi_1)\phi_1 - (x_n, \phi_2)\phi_2 - \cdots - (x_n, \phi_{n-1})\phi_{n-1}}{\|x_n - (x_n, \phi_1)\phi_1 - (x_n, \phi_2)\phi_2 - \cdots - (x_n, \phi_{n-1})\phi_{n-1}\|}.$$
The proof is obvious unless you try to write it down: the idea is that at each stage the piece of the next vector $x_n$ that lies in the space spanned by $\{x_1, \dots, x_{n-1}\}$ (its orthogonal projection onto that space) is subtracted. The numerator in the definition of $\phi_n$ cannot be zero by linear independence, so $\phi_n$ is well defined.
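For a finite list of vectors in $\mathbb{C}^n$ the formulas of Theorem 5.12 translate directly into code. The Python sketch below (illustrative names; a tolerance `tol` stands in for "the numerator is zero" in floating point) follows the classical form, with the coefficients $(x_n, \phi_k)$ computed from the original $x_n$, and raises an error on dependent input.

```python
import math

def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Classical Gram-Schmidt as in Theorem 5.12, for a finite list of vectors."""
    phis = []
    for x in vectors:
        w = [complex(c) for c in x]
        for phi in phis:
            c = inner(x, phi)                           # (x_n, phi_k)
            w = [wi - c * pk for wi, pk in zip(w, phi)]
        length = math.sqrt(inner(w, w).real)            # norm of the numerator
        if length < tol:
            raise ValueError("input vectors are (numerically) linearly dependent")
        phis.append([wi / length for wi in w])
    return phis

vs = [[1, 1j, 0], [1, 0, 1], [0, 1, 1]]
phis = gram_schmidt(vs)
gram = [[inner(p, q) for q in phis] for p in phis]       # should be the identity matrix
print([[round(abs(g), 12) for g in row] for row in gram])
```

In floating point the "modified" variant, which computes each coefficient from the partially reduced vector `w`, is usually preferred for stability; the classical form is kept here because it matches the displayed formulas.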
The most important situation in which this is used is to ﬁnd orthonormal bases
for certain weighted function spaces.
Given $a < b$, $a, b \in [-\infty, \infty]$ and a function $M : (a, b) \to (0, \infty)$ with the property that $\int_a^b|t|^nM(t)\,dt < \infty$ for all $n \ge 0$, define the Hilbert space $L^2_M[a, b]$ to be the linear space of measurable functions $f$ with $\|f\|_M = (f, f)_M^{1/2} < \infty$, where
$$(f, g)_M = \int_a^bM(t)f(t)\overline{g(t)}\,dt.$$
It may be shown that, for each of the weights in Example 5.3, the linearly independent set $\{1, t, t^2, t^3, \dots\}$ has a linear span dense in $L^2_M$. The Gram–Schmidt orthonormalization process may be applied to this set to produce various families of classical orthonormal functions.
Example 5.3. [1] If $M(t) = 1$ for all $t$, $a = -1$, $b = 1$, then the process generates the Legendre polynomials.
[2] If $M(t) = \dfrac{1}{\sqrt{1 - t^2}}$, $a = -1$, $b = 1$, then the process generates the Tchebychev polynomials.
[3] If $M(t) = t^{q-1}(1 - t)^{p-q}$, $a = 0$, $b = 1$ (with $q > 0$ and $p - q > -1$), then the process generates the Jacobi polynomials.
[4] If $M(t) = e^{-t^2}$, $a = -\infty$, $b = \infty$, then the process generates the Hermite polynomials.
[5] If $M(t) = e^{-t}$, $a = 0$, $b = \infty$, then the process generates the Laguerre polynomials.
In each case the classical polynomials are obtained up to constant multiples, since the classical families are usually normalized differently.
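Example 5.3[1] can be carried out in exact arithmetic. To avoid square roots, the Python sketch below runs Gram–Schmidt on $1, t, t^2, \dots$ for the weight $M(t) = 1$ on $[-1, 1]$ but skips the final normalization, producing monic orthogonal polynomials; dividing each by its $L^2_M$ norm would give the $\phi_n$ of Theorem 5.12. Polynomials are stored as coefficient lists, an illustrative choice.

```python
from fractions import Fraction

def moment(k):
    """Integral of t^k over [-1, 1], i.e. the weight M(t) = 1 of Example 5.3[1]."""
    return Fraction(2, k + 1) if k % 2 == 0 else Fraction(0)

def inner(p, q):
    """(p, q)_M for real polynomials stored as coefficient lists [c0, c1, ...]."""
    return sum(a * b * moment(i + j) for i, a in enumerate(p) for j, b in enumerate(q))

def monic_orthogonal(n_max):
    """Gram-Schmidt on 1, t, t^2, ... without the normalization step."""
    polys = []
    for n in range(n_max + 1):
        p = [Fraction(0)] * n + [Fraction(1)]          # the monomial t^n
        for q in polys:
            c = inner(p, q) / inner(q, q)               # component of p along q
            p = [a - c * (q[i] if i < len(q) else 0) for i, a in enumerate(p)]
        polys.append(p)
    return polys

for p in monic_orthogonal(4):
    print([str(c) for c in p])
```

The output lists the coefficients of $1$, $t$, $t^2 - \frac13$, $t^3 - \frac35t$ and $t^4 - \frac67t^2 + \frac{3}{35}$, which are constant multiples of the Legendre polynomials; for instance $P_2(t) = \frac32\left(t^2 - \frac13\right)$.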
CHAPTER 6
Fourier analysis
In the last chapter we saw some very general methods of “Fourier analysis” in
Hilbert space. Of course the methods started with the classical setting on periodic
complex–valued functions on the real line, and in this chapter we describe the
elementary theory of classical Fourier analysis using summability kernels. The
classical theory of Fourier series is a huge subject: the introduction below comes mostly from Katznelson¹ and from Körner²; both are highly recommended for further study.
1. Fourier series of $L^1$ functions
Denote by $L^1(\mathbb{T})$ the Banach space of complex–valued, Lebesgue integrable functions on $\mathbb{T} = [0, 2\pi)/0 \sim 2\pi$ (this just means periodic functions). Modify the $L^1$–norm on this space so that
$$\|f\|_1 = \frac{1}{2\pi}\int_0^{2\pi}|f(t)|\,dt.$$
What is going on here is simply this: to avoid writing "$2\pi$" hundreds of times, we make the unit circle have "length" $2\pi$. To recover the useful normalization that the $L^1$–norm of the constant function 1 is 1, the usual $L^1$–norm is divided by $2\pi$.
Notice that the translate $f_x$ of a function has the same norm, where $f_x(t) = f(t - x)$.
Definition 6.1. A trigonometric polynomial on $\mathbb{T}$ is an expression of the form
$$P(t) = \sum_{n=-N}^N a_ne^{int},$$
with $a_n \in \mathbb{C}$.
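Lemma 6.1. The functions $\{e^{int}\}_{n\in\mathbb{Z}}$ are pairwise orthogonal in $L^2$. That is,
$$\frac{1}{2\pi}\int_0^{2\pi}e^{int}e^{-imt}\,dt = \frac{1}{2\pi}\int_0^{2\pi}e^{i(n-m)t}\,dt = \begin{cases}1 & \text{if } n = m,\\ 0 & \text{if } n \ne m.\end{cases}$$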
It follows that if the function $P(t)$ is given, we can recover the coefficients $a_n$ by computing
$$a_n = \frac{1}{2\pi}\int_0^{2\pi}P(t)e^{-int}\,dt.$$
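For a trigonometric polynomial the integral above can even be replaced by an exact finite sum: on $K$ equally spaced points $t_k = 2\pi k/K$ one has $\frac1K\sum_{k=0}^{K-1}e^{i(n-m)t_k} = 1$ if $K$ divides $n - m$ and $0$ otherwise, a discrete version of Lemma 6.1. So $K \ge 2N + 1$ points recover every $a_n$ with $|n| \le N$. The Python sketch below checks this for an illustrative polynomial of degree $N = 2$.

```python
import cmath
import math

def trig_poly(coeffs):
    """Return t -> sum_n a_n e^{int} for a dict {n: a_n} (Definition 6.1)."""
    return lambda t: sum(a * cmath.exp(1j * n * t) for n, a in coeffs.items())

def recover_coefficient(P, n, K):
    """Riemann-sum version of a_n = (1/2pi) int_0^{2pi} P(t) e^{-int} dt on K points."""
    total = 0j
    for k in range(K):
        t = 2 * math.pi * k / K
        total += P(t) * cmath.exp(-1j * n * t)
    return total / K

coeffs = {-2: 1 - 1j, 0: 0.5, 1: 3j, 2: -2}      # degree N = 2, and a_{-1} = 0
P = trig_poly(coeffs)
K = 5                                             # 2N + 1 sample points
recovered = {n: recover_coefficient(P, n, K) for n in range(-2, 3)}
print({n: complex(round(v.real, 12), round(v.imag, 12)) for n, v in recovered.items()})
```

With fewer than $2N + 1$ points, distinct frequencies become indistinguishable on the sample grid (aliasing) and the recovered values mix several coefficients.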
¹ An Introduction to Harmonic Analysis, Y. Katznelson, Dover Publications, New York (1976).
² Fourier Analysis, T. Körner, Cambridge University Press, Cambridge.
It will be useful later to write things like
$$P \sim \sum_{n=-N}^N a_ne^{int},$$
which means that $P$ is identified with the formal sum on the right hand side. The expression $P(t) = \dots$ is a function defined by the value of the right hand side for each value of $t$.
Definition 6.2. A trigonometric series on $\mathbb{T}$ is an expression
$$S \sim \sum_{n=-\infty}^\infty a_ne^{int}. \tag{56}$$
The conjugate of $S$ is the series
$$\tilde S \sim \sum_{n=-\infty}^\infty -i\,\mathrm{sign}(n)\,a_ne^{int}, \tag{57}$$
where $\mathrm{sign}(n) = 0$ if $n = 0$ and $\mathrm{sign}(n) = n/|n|$ if not.
Notice that there is no assumption about convergence, so in general $S$ is not related to a function at all.
Definition 6.3. Let $f \in L^1(\mathbb{T})$. Define the $n$th (classical) Fourier coefficient of $f$ to be
$$\hat f(n) = \frac{1}{2\pi}\int f(t)e^{-int}\,dt \tag{58}$$
(the integration is from $0$ to $2\pi$ as usual). Associate to $f$ the Fourier series $S[f]$, which is defined to be the formal trigonometric series
$$S[f] \sim \sum_{n=-\infty}^\infty\hat f(n)e^{int}. \tag{59}$$
We say that a given trigonometric series (56) is a Fourier series if it is of the form (59) for some $f \in L^1(\mathbb{T})$.
Theorem 6.1. Let $f, g \in L^1(\mathbb{T})$. Then
[1] $\widehat{(f + g)}(n) = \hat f(n) + \hat g(n)$.
[2] For $\lambda \in \mathbb{C}$, $\widehat{(\lambda f)}(n) = \lambda\hat f(n)$.
[3] If $\bar f(t) = \overline{f(t)}$ is the complex conjugate of $f$, then $\widehat{\bar f}(n) = \overline{\hat f(-n)}$.
[4] If $f_x(t) = f(t - x)$ is the translate of $f$, then $\widehat{f_x}(n) = e^{-inx}\hat f(n)$.
[5] $|\hat f(n)| \le \frac{1}{2\pi}\int|f(t)|\,dt = \|f\|_1$.
Prove these as an exercise.
Notice that $f \mapsto \hat f$ sends a function in $L^1(\mathbb{T})$ to a function in $C(\mathbb{Z})$, the continuous functions on $\mathbb{Z}$ with the sup norm ($\hat f$ is bounded by Theorem 6.1[5]). This map is continuous in the following sense.
Corollary 6.1. Assume $(f_j)$ is a sequence in $L^1(\mathbb{T})$ with $\|f_j - f\|_1 \to 0$. Then $\hat f_j \to \hat f$ uniformly.
Proof. This follows at once from Theorem 6.1[5].
Theorem 6.2. Let $f \in L^1(\mathbb{T})$ have $\hat f(0) = 0$. Define
$$F(t) = \int_0^tf(s)\,ds.$$
Then $F$ is continuous, $2\pi$ periodic, and
$$\hat F(n) = \frac{1}{in}\hat f(n)$$
for all $n \ne 0$.
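Proof. It is clear that $F$ is continuous since it is the integral of an $L^1$ function. Also,
$$F(t + 2\pi) - F(t) = \int_t^{t+2\pi}f(s)\,ds = 2\pi\hat f(0) = 0.$$
Finally, using integration by parts,
$$\hat F(n) = \frac{1}{2\pi}\int_0^{2\pi}F(t)e^{-int}\,dt = -\frac{1}{2\pi}\int_0^{2\pi}F'(t)\frac{e^{-int}}{-in}\,dt = \frac{1}{in}\hat f(n).$$
Notice that we have used the symbol $F'$: because of the way it was defined, the function $F$ is absolutely continuous, so it is differentiable almost everywhere with $F' = f$ almost everywhere, and the integration by parts is justified.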
2. Convolution in $L^1$
In this section we introduce a form of "multiplication" on $L^1(\mathbb{T})$ that makes it into a Banach algebra (see Definition 3.3). Notice that the only real properties we will use are that the circle $\mathbb{T}$ is a group and that the measure $ds$ is translation invariant:
$$\int f_x(s)\,ds = \int f\,ds.$$
Theorem 6.3. Assume that $f, g$ are in $L^1(\mathbb{T})$. Then, for almost every $t$, the function $f(t - s)g(s)$ is integrable as a function of $s$. Define the convolution of $f$ and $g$ to be
$$(f * g)(t) = \frac{1}{2\pi}\int f(t - s)g(s)\,ds. \tag{60}$$
Then $f * g \in L^1(\mathbb{T})$, with norm
$$\|f * g\|_1 \le \|f\|_1\|g\|_1.$$
Moreover
$$\widehat{(f * g)}(n) = \hat f(n)\hat g(n).$$
Proof. It is clear that $F(t, s) = f(t - s)g(s)$ is a measurable function of the variable $(s, t)$. For almost all $s$, $F(t, s)$ is a constant multiple of $f_s$ (as a function of $t$), so is integrable. Moreover
$$\frac{1}{2\pi}\int\left(\frac{1}{2\pi}\int|F(t, s)|\,dt\right)ds = \frac{1}{2\pi}\int|g(s)|\,\|f\|_1\,ds = \|f\|_1\|g\|_1.$$
So, by Fubini's Theorem 4.4, $f(t - s)g(s)$ is integrable as a function of $s$ for almost all $t$, and
$$\frac{1}{2\pi}\int|(f * g)(t)|\,dt = \frac{1}{2\pi}\int\left|\frac{1}{2\pi}\int F(t, s)\,ds\right|dt \le \frac{1}{4\pi^2}\iint|F(t, s)|\,dt\,ds = \|f\|_1\|g\|_1,$$
showing that $\|f * g\|_1 \le \|f\|_1\|g\|_1$. Finally, using Fubini again to justify a change in the order of integration,
$$\widehat{(f * g)}(n) = \frac{1}{2\pi}\int(f * g)(t)e^{-int}\,dt = \frac{1}{4\pi^2}\iint f(t - s)e^{-in(t-s)}g(s)e^{-ins}\,dt\,ds$$
$$= \frac{1}{2\pi}\int f(t)e^{-int}\,dt\;\frac{1}{2\pi}\int g(s)e^{-ins}\,ds = \hat f(n)\hat g(n).$$
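The same computation works, with sums in place of integrals, on the cyclic group $\mathbb{Z}/N$ with normalized counting measure. This is a discrete analogue, not the $L^1(\mathbb{T})$ statement itself: the Python sketch below defines $\hat f[n] = \frac1N\sum_t f[t]e^{-2\pi int/N}$ and $(f * g)[t] = \frac1N\sum_s f[t - s]g[s]$ (indices mod $N$), mirroring (58) and (60), and checks $\widehat{f * g} = \hat f\hat g$ and $\|f * g\|_1 \le \|f\|_1\|g\|_1$ on seeded random data.

```python
import cmath
import random

def dft(f):
    """fhat[n] = (1/N) sum_t f[t] e^{-2 pi i n t / N}, a discrete analogue of (58)."""
    N = len(f)
    return [sum(f[t] * cmath.exp(-2j * cmath.pi * n * t / N) for t in range(N)) / N
            for n in range(N)]

def convolve(f, g):
    """(f*g)[t] = (1/N) sum_s f[t - s] g[s], indices mod N, a discrete analogue of (60)."""
    N = len(f)
    return [sum(f[(t - s) % N] * g[s] for s in range(N)) / N for t in range(N)]

def norm1(f):
    """Normalized l^1 norm, matching ||f||_1 = (1/2pi) int |f|."""
    return sum(abs(v) for v in f) / len(f)

rng = random.Random(1)
N = 16
f = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(N)]
g = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(N)]
lhs = dft(convolve(f, g))
rhs = [a * b for a, b in zip(dft(f), dft(g))]
print(max(abs(a - b) for a, b in zip(lhs, rhs)))   # zero up to rounding
```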
Lemma 6.2. The operation (f, g) → f ∗ g is commutative, associative, and
distributive over addition.
Prove this as an exercise.
Lemma 6.3. If $f \in L^1(\mathbb{T})$ and $k(t) = \sum_{n=-N}^Na_ne^{int}$, then
$$(k * f)(t) = \sum_{n=-N}^Na_n\hat f(n)e^{int}.$$
Thus convolving with the function $e^{int}$ picks out the $n$th Fourier coefficient.
Proof. Simply check this one term at a time: if $\chi_n(t) = e^{int}$, then
$$(\chi_n * f)(t) = \frac{1}{2\pi}\int e^{in(t-s)}f(s)\,ds = e^{int}\frac{1}{2\pi}\int f(s)e^{-ins}\,ds = \hat f(n)e^{int}.$$
3. Summability kernels and homogeneous Banach algebras
Two properties of the Banach space L
1
(T) are particularly important for Fourier
analysis.
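Theorem 6.4. If $f \in L^1(\mathbb{T})$ and $x \in \mathbb{T}$, then
$$f_x(t) = f(t - x) \in L^1(\mathbb{T}) \quad\text{and}\quad \|f_x\|_1 = \|f\|_1.$$
Also, the function $x \mapsto f_x$ is continuous as a map from $\mathbb{T}$ to $L^1(\mathbb{T})$, for each $f \in L^1(\mathbb{T})$.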
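Proof. The translation invariance is clear. In order to prove the continuity we must show that
$$\lim_{x\to x_0}\|f_x - f_{x_0}\|_1 = 0. \tag{61}$$
Now (61) is clear if $f$ is continuous, since $f$ is then uniformly continuous on $\mathbb{T}$. On the other hand, the continuous functions are dense in $L^1(\mathbb{T})$, so given $f \in L^1(\mathbb{T})$ and $\epsilon > 0$ we may choose $g \in C(\mathbb{T})$ such that
$$\|g - f\|_1 < \epsilon.$$
Then
$$\|f_x - f_{x_0}\|_1 \le \|f_x - g_x\|_1 + \|g_x - g_{x_0}\|_1 + \|g_{x_0} - f_{x_0}\|_1 = \|(f - g)_x\|_1 + \|g_x - g_{x_0}\|_1 + \|(g - f)_{x_0}\|_1 < 2\epsilon + \|g_x - g_{x_0}\|_1.$$
It follows that
$$\limsup_{x\to x_0}\|f_x - f_{x_0}\|_1 \le 2\epsilon,$$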
and since $\epsilon > 0$ is arbitrary, the theorem is proved.
Definition 6.4. A summability kernel is a sequence $(k_n)$ of continuous $2\pi$–periodic functions with the following properties:
$$\frac{1}{2\pi}\int k_n(t)\,dt = 1 \quad\text{for all } n. \tag{62}$$
There is an $R$ such that
$$\frac{1}{2\pi}\int|k_n(t)|\,dt \le R \quad\text{for all } n. \tag{63}$$
For all $\delta > 0$,
$$\lim_{n\to\infty}\int_\delta^{2\pi-\delta}|k_n(t)|\,dt = 0. \tag{64}$$
If in addition $k_n(t) \ge 0$ for all $n$ and $t$, then $(k_n)$ is called a positive summability kernel.
Theorem 6.5. Let $f \in L^1(\mathbb{T})$ and let $(k_n)$ be a summability kernel. Then
$$f = \lim_{n\to\infty}\frac{1}{2\pi}\int k_n(s)f_s\,ds$$
in the $L^1$ norm.
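Proof. Write $\phi(s) = f_s$, so that $\phi(s)(t) = f(t - s)$. By Theorem 6.4, $\phi$ is a continuous $L^1(\mathbb{T})$–valued function on $\mathbb{T}$, and $\phi(0) = f$. We will be integrating $L^1(\mathbb{T})$–valued functions; see the Appendix for a brief definition of what this means. Then for any $0 < \delta < \pi$, by (62) we have
$$\frac{1}{2\pi}\int k_n(s)\phi(s)\,ds - \phi(0) = \frac{1}{2\pi}\int k_n(s)\big(\phi(s) - \phi(0)\big)\,ds$$
$$= \frac{1}{2\pi}\int_{-\delta}^{\delta}k_n(s)\big(\phi(s) - \phi(0)\big)\,ds + \frac{1}{2\pi}\int_\delta^{2\pi-\delta}k_n(s)\big(\phi(s) - \phi(0)\big)\,ds.$$
The two parts may be estimated separately:
$$\left\|\frac{1}{2\pi}\int_{-\delta}^{\delta}k_n(s)\big(\phi(s) - \phi(0)\big)\,ds\right\|_1 \le \max_{|s|\le\delta}\|\phi(s) - \phi(0)\|_1\,\|k_n\|_1, \tag{65}$$
and
$$\left\|\frac{1}{2\pi}\int_\delta^{2\pi-\delta}k_n(s)\big(\phi(s) - \phi(0)\big)\,ds\right\|_1 \le \max_s\|\phi(s) - \phi(0)\|_1\,\frac{1}{2\pi}\int_\delta^{2\pi-\delta}|k_n(s)|\,ds. \tag{66}$$
Using (63) and the fact that $\phi$ is continuous at $s = 0$, given any $\epsilon > 0$ there is a $\delta > 0$ such that (65) is bounded by $\epsilon$ for all $n$. With the same $\delta$, (64) implies that (66) converges to 0 as $n \to \infty$, so that $\left\|\frac{1}{2\pi}\int k_n(s)\phi(s)\,ds - \phi(0)\right\|_1$ is bounded by $2\epsilon$ for large $n$.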
The integral appearing in Theorem 6.5 looks a bit like a convolution, although it is an integral of $L^1(\mathbb{T})$–valued functions. This is not a problem for us. Consider first the following lemma.
Lemma 6.4. Let $k$ be a continuous function on $\mathbb{T}$, and $f \in L^1(\mathbb{T})$. Then
$$\frac{1}{2\pi}\int k(s)f_s\,ds = k * f. \tag{67}$$
Proof. Assume first that $f$ is continuous on $\mathbb{T}$. Then, making the obvious definition for the integral,
$$\frac{1}{2\pi}\int k(s)f_s\,ds = \frac{1}{2\pi}\lim\sum_j(s_{j+1} - s_j)k(s_j)f_{s_j},$$
with the limit taken in the $L^1(\mathbb{T})$ norm as the partition of $\mathbb{T}$ defined by $\{s_1, \dots, s_j, \dots\}$ becomes finer. On the other hand,
$$\frac{1}{2\pi}\lim\sum_j(s_{j+1} - s_j)k(s_j)f(t - s_j) = (k * f)(t)$$
uniformly in $t$, proving the lemma for continuous $f$.
For arbitrary $f \in L^1(\mathbb{T})$, fix $\epsilon > 0$ and choose a continuous function $g$ with $\|f - g\|_1 < \epsilon$. Then, using the lemma for $g$,
$$\frac{1}{2\pi}\int k(s)f_s\,ds - k * f = \frac{1}{2\pi}\int k(s)(f - g)_s\,ds + k * (g - f),$$
so
$$\left\|\frac{1}{2\pi}\int k(s)f_s\,ds - k * f\right\|_1 \le 2\epsilon\|k\|_1.$$
Since $\epsilon > 0$ is arbitrary, (67) follows.
Lemma 6.4 means that Theorem 6.5 can be written in the form
$$f = \lim_{n\to\infty}k_n * f \quad\text{in } L^1. \tag{68}$$
4. Fejér's kernel
Define a sequence of functions
$$K_n(t) = \sum_{j=-n}^n\left(1 - \frac{|j|}{n + 1}\right)e^{ijt}.$$
Lemma 6.5. The sequence $(K_n)$ is a summability kernel.
Proof. Property (62) is clear. Now notice that
$$\left(-\frac14e^{-it} + \frac12 - \frac14e^{it}\right)\sum_{j=-n}^n\left(1 - \frac{|j|}{n + 1}\right)e^{ijt} = \frac{1}{n + 1}\left(-\frac14e^{-i(n+1)t} + \frac12 - \frac14e^{i(n+1)t}\right).$$
On the other hand,
$$\sin^2\frac t2 = \frac12(1 - \cos t) = -\frac14e^{-it} + \frac12 - \frac14e^{it},$$
so
$$K_n(t) = \frac{1}{n + 1}\left(\frac{\sin\frac{n+1}{2}t}{\sin\frac12t}\right)^2. \tag{69}$$
Property (64) follows, and this also shows that $K_n(t) \ge 0$ for all $n$ and $t$.
Prove property (63) as an exercise.
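The closed form (69) and the properties used above can be checked numerically. The Python sketch below compares the defining sum with (69) on a midpoint grid (which avoids $t = 0$, where (69) must be read as a limit), checks $K_n \ge 0$, and computes the mean of $K_n$ over the grid, which equals $\hat K_n(0) = 1$ up to rounding because the grid has more points than the degree of $K_n$.

```python
import math

def fejer_sum(n, t):
    """K_n(t) from its definition; K_n is real and even, so use cosines."""
    return sum((1 - abs(j) / (n + 1)) * math.cos(j * t) for j in range(-n, n + 1))

def fejer_closed(n, t):
    """Closed form (69); valid when sin(t/2) != 0."""
    return (math.sin((n + 1) * t / 2) / math.sin(t / 2)) ** 2 / (n + 1)

points = 256
grid = [2 * math.pi * (k + 0.5) / points for k in range(points)]   # avoids t = 0
for n in (1, 5, 11):
    worst = max(abs(fejer_sum(n, t) - fejer_closed(n, t)) for t in grid)
    mean = sum(fejer_sum(n, t) for t in grid) / points              # property (62)
    print(n, worst, min(fejer_sum(n, t) for t in grid), mean)
```

Since $K_n \ge 0$, property (63) holds with $R = 1$, which is one way to do the exercise above.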
The following graph is the Fejér kernel $K_{11}$.
.
Definition 6.5. Write $\sigma_n(f) = K_n * f$.
Using Lemma 6.3, it follows that
$$\sigma_n(f)(t) = \sum_{j=-n}^n\left(1 - \frac{|j|}{n + 1}\right)\hat f(j)e^{ijt}, \tag{70}$$
and (68) means that
$$\sigma_n(f) \to f$$
in the $L^1$ norm for every $f \in L^1(\mathbb{T})$. It follows at once that the trigonometric polynomials are dense in $L^1(\mathbb{T})$. The most important consequences are, however, more general statements about Fourier series.
Theorem 6.6. If $f, g \in L^1(\mathbb{T})$ have $\hat f(n) = \hat g(n)$ for all $n \in \mathbb{Z}$, then $f = g$.
Proof. It is enough to show that $\hat f(n) = 0$ for all $n$ implies that $f = 0$ (apply this to $f - g$). Using (70), we see that if $\hat f(n) = 0$ for all $n$, then $\sigma_n(f) = 0$ for all $n$; since $\sigma_n(f) \to f$, it follows that $f = 0$.
Corollary 6.2. The family of functions $\{e^{int}\}_{n\in\mathbb{Z}}$ forms a complete orthonormal system in $L^2(\mathbb{T})$.
Proof. It is enough to notice that
$$(f, e^{int}) = \hat f(n).$$
If $f \in L^2(\mathbb{T}) \subset L^1(\mathbb{T})$ is orthogonal to every $e^{int}$, then all its Fourier coefficients vanish, so $f = 0$ by Theorem 6.6.
We also find a very general statement about the decay of Fourier coefficients: the Riemann–Lebesgue Lemma.
Theorem 6.7. Let $f \in L^1(\mathbb{T})$. Then $\lim_{|n|\to\infty}\hat f(n) = 0$.
Proof. Fix an $\epsilon > 0$, and choose a trigonometric polynomial $P$ with the property that
$$\|f - P\|_1 < \epsilon.$$
If $|n|$ exceeds the degree of $P$, then
$$|\hat f(n)| = |\widehat{(f - P)}(n)| \le \|f - P\|_1 < \epsilon.$$
Recall that for $f \in L^1(\mathbb{T})$, the Fourier series was defined (formally) to be
$$S[f] \sim \sum_{n=-\infty}^\infty\hat f(n)e^{int},$$
and the $n$th partial sum corresponds to the function
$$S_n(f)(t) = \sum_{j=-n}^n\hat f(j)e^{ijt}. \tag{71}$$
Looking at equations (71) and (70), we see that $\sigma_n(f)$ is the arithmetic mean of $S_0(f), S_1(f), \dots, S_n(f)$:
$$\sigma_n(f) = \frac{1}{n + 1}\left(S_0(f) + S_1(f) + \cdots + S_n(f)\right). \tag{72}$$
It follows that if $S_n(f)$ converges in $L^1(\mathbb{T})$, then it must converge to the same thing as $\sigma_n(f)$, that is to $f$ (if this is not clear to you, look at Corollary 6.3 below).
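Identity (72) is a statement about coefficients: $S_k(f)$ contains the term $\hat f(j)e^{ijt}$ exactly when $|j| \le k$, so in the average of $S_0(f), \dots, S_n(f)$ that term appears $n + 1 - |j|$ times. The short Python sketch below counts this in exact arithmetic and compares it with the weights $1 - |j|/(n + 1)$ of (70).

```python
from fractions import Fraction

def cesaro_weight(n, j):
    """Multiplier of fhat(j) e^{ijt} in (S_0(f) + ... + S_n(f)) / (n + 1)."""
    count = sum(1 for k in range(n + 1) if abs(j) <= k)   # S_k contains j iff |j| <= k
    return Fraction(count, n + 1)

def fejer_weight(n, j):
    """Multiplier 1 - |j|/(n+1) from (70), and 0 for |j| > n."""
    return 1 - Fraction(abs(j), n + 1) if abs(j) <= n else Fraction(0)

print([str(cesaro_weight(3, j)) for j in range(-4, 5)])
# ['0', '1/4', '1/2', '3/4', '1', '3/4', '1/2', '1/4', '0']
```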
The partial sums $S_n(f)$ also have a convolution form: using Lemma 6.3 we have that $S_n(f) = D_n * f$, where $(D_n)$ is the Dirichlet kernel defined by
$$D_n(t) = \sum_{j=-n}^ne^{ijt} = \frac{\sin\left(n + \frac12\right)t}{\sin\frac12t}.$$
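The contrast between $D_n$ and $K_n$ drawn in the next paragraph can be seen numerically. The Python sketch below checks the closed form of $D_n$ and approximates $\|D_n\|_1 = \frac{1}{2\pi}\int_0^{2\pi}|D_n(t)|\,dt$ by the midpoint rule (the grid size is an arbitrary illustrative choice). For $n = 1$ the exact value is $\frac13 + \frac{2\sqrt3}{\pi} \approx 1.436$, and the values grow with $n$ (roughly like $\log n$), so no uniform bound $R$ as in (63) exists.

```python
import math

def dirichlet_sum(n, t):
    """D_n(t) = sum_{|j| <= n} e^{ijt} = 1 + 2 sum_{j=1}^n cos(jt)."""
    return 1 + 2 * sum(math.cos(j * t) for j in range(1, n + 1))

def dirichlet_closed(n, t):
    """Closed form sin((n + 1/2) t) / sin(t/2), valid when sin(t/2) != 0."""
    return math.sin((n + 0.5) * t) / math.sin(t / 2)

def l1_norm(n, points=20000):
    """Midpoint-rule approximation of ||D_n||_1 = (1/2pi) int_0^{2pi} |D_n(t)| dt."""
    h = 2 * math.pi / points
    return sum(abs(dirichlet_closed(n, (k + 0.5) * h)) for k in range(points)) / points

for n in (1, 10, 100):
    print(n, round(l1_norm(n), 3))
```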
Notice that $(D_n)$ is not a summability kernel: it has property (62) but does not have (63) (as we saw in Lemma 3.3), nor does it have (64). This explains why the question of convergence for Fourier series is so much more subtle than the problem of summability. The following graph is the Dirichlet kernel $D_{11}$.
Definition 6.6. The de la Vallée Poussin kernel is defined by
$$V_n(t) = 2K_{2n+1}(t) - K_n(t).$$
Properties (62), (63) and (64) are clear.
The next picture is the de la Vallée Poussin kernel with $n = 11$.
This kernel is useful because $V_n$ is a trigonometric polynomial of degree $2n + 1$ with $\hat V_n(j) = 1$ for $|j| \le n + 1$, so it may be used to construct approximations to a function $f$ by trigonometric polynomials having the same Fourier coefficients as $f$ for small frequencies.
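The claim about $\hat V_n$ follows from $\hat K_m(j) = 1 - |j|/(m + 1)$ for $|j| \le m$ (and $0$ otherwise) together with linearity. The Python sketch below tabulates these coefficients exactly with `fractions.Fraction` and checks that $\hat V_n(j) = 1$ for $|j| \le n + 1$ and that $V_n$ has degree exactly $2n + 1$.

```python
from fractions import Fraction

def fejer_hat(m, j):
    """Fourier coefficient of K_m at j: 1 - |j|/(m+1) for |j| <= m, else 0."""
    return 1 - Fraction(abs(j), m + 1) if abs(j) <= m else Fraction(0)

def vallee_poussin_hat(n, j):
    """Fourier coefficient of V_n = 2 K_{2n+1} - K_n at j (linearity, Theorem 6.1)."""
    return 2 * fejer_hat(2 * n + 1, j) - fejer_hat(n, j)

print([str(vallee_poussin_hat(2, j)) for j in range(-6, 7)])
# ['0', '1/3', '2/3', '1', '1', '1', '1', '1', '1', '1', '2/3', '1/3', '0']
```

Convolving with $V_n$ therefore leaves the coefficients $\hat f(j)$, $|j| \le n + 1$, unchanged, while the bound on $\|V_n\|_1$ (at most $3$, from (63) for $K_n$ with $R = 1$) keeps the approximation stable.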
5. Pointwise convergence
Recall that a sequence of elements $(x_n)$ in a normed space $(X, \|\cdot\|)$ converges to $x$ if $\|x_n - x\| \to 0$ as $n \to \infty$. If the space $X$ is a space of complex–valued functions on some set $Z$ (for example, $L^1(\mathbb{T})$, $C(\mathbb{T})$), then there is another notion of convergence: $x_n$ converges to $x$ pointwise if for every $z \in Z$, $x_n(z) \to x(z)$ as a sequence of complex numbers. The question addressed in this section is the following: does the Fourier series of a function converge pointwise to the original function?
In the last section, we showed that for $L^1$ functions on the circle, $\sigma_n(f)$ converges to $f$ with respect to the norm of any homogeneous Banach algebra containing $f$. Applying this to the Banach algebra of continuous functions with the sup norm, we have that $\sigma_n(f) \to f$ uniformly for all $f \in C(\mathbb{T})$.
If the function $f$ is not continuous on $\mathbb{T}$, then the convergence in norm of $\sigma_n(f)$ does not tell us anything about the pointwise convergence. In addition, if $\sigma_n(f, t)$ converges for some $t$, there is no real reason for the limit to be $f(t)$.
Theorem 6.8. Let $f$ be a function in $L^1(\mathbb{T})$.
(a) If
$$\lim_{h\to0}\big(f(t + h) + f(t - h)\big)$$
exists (the possibility that the limit is $\pm\infty$ is allowed), then
$$\sigma_n(f, t) \longrightarrow \frac12\lim_{h\to0}\big(f(t + h) + f(t - h)\big).$$
(b) If $f$ is continuous at $t$, then $\sigma_n(f, t) \longrightarrow f(t)$.
(c) If there is a closed interval $I \subset \mathbb{T}$ on which $f$ is continuous, then $\sigma_n(f, \cdot)$ converges uniformly to $f$ on $I$.
Corollary 6.3. If f is continuous at t, and if the Fourier series of f converges
at t, then it must converge to f(t).
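Proof. Recall equation (72):
$$\sigma_n(f) = \frac{1}{n + 1}\left(S_0(f) + S_1(f) + \cdots + S_n(f)\right).$$
By assumption and (b), $\sigma_n(f, t) \to f(t)$ and $S_n(f, t) \to S(t)$, say. Writing $S_k(t)$ for $S_k(f, t)$, split the right hand side as
$$\frac{1}{n + 1}\left(S_0(t) + S_1(t) + \cdots + S_{\lfloor\sqrt n\rfloor}(t)\right) + \frac{1}{n + 1}\left(S_{\lfloor\sqrt n\rfloor+1}(t) + \cdots + S_n(t)\right).$$
The first term converges to zero as $n \to \infty$ (since the convergent sequence $(S_n(t))$ is bounded). For the second term, fix $\epsilon > 0$ and choose $n$ so large that $|S_k(t) - S(t)| < \epsilon$ for all $k \ge \sqrt n$. Then the second term is within $\epsilon$ of $\frac{n - \lfloor\sqrt n\rfloor}{n + 1}S(t)$, and $\frac{n - \lfloor\sqrt n\rfloor}{n + 1} \to 1$. It follows that
$$\frac{1}{n + 1}\left(S_0(t) + S_1(t) + \cdots + S_n(t)\right) \to S(t)$$
as $n \to \infty$, so $S(t)$ must coincide with $\lim_{n\to\infty}\sigma_n(f, t) = f(t)$.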
Turning to the proof of Theorem 6.8, recall that the Fejér kernel $(K_n)$ (see Lemma 6.5) is a positive summability kernel with the following properties:
$$\lim_{n\to\infty}\left(\sup_{\theta<t<2\pi-\theta}K_n(t)\right) = 0 \quad\text{for any } \theta \in (0, \pi), \tag{73}$$
and
$$K_n(t) = K_n(-t). \tag{74}$$
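Proof of Theorem 6.8. Define
$$\check f(t) = \lim_{h\to0}\frac12\big(f(t + h) + f(t - h)\big),$$
and assume that this limit is finite (a similar argument works for the infinite cases). We wish to show that $\sigma_n(f, t) - \check f(t)$ is small for large $n$. Evaluate the difference, using (62):
$$\sigma_n(f, t) - \check f(t) = \frac{1}{2\pi}\int_{\mathbb{T}}K_n(\tau)\big(f(t - \tau) - \check f(t)\big)\,d\tau$$
$$= \frac{1}{2\pi}\int_{-\theta}^{\theta}K_n(\tau)\big(f(t - \tau) - \check f(t)\big)\,d\tau + \frac{1}{2\pi}\int_\theta^{2\pi-\theta}K_n(\tau)\big(f(t - \tau) - \check f(t)\big)\,d\tau.$$
Applying (74) this may be written
$$\sigma_n(f, t) - \check f(t) = \frac1\pi\left(\int_0^\theta + \int_\theta^\pi\right)K_n(\tau)\left(\frac{f(t - \tau) + f(t + \tau)}{2} - \check f(t)\right)d\tau. \tag{75}$$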
Fix > 0, and choose θ ∈ (0, π) small enough to ensure that
τ ∈ (−θ, θ) =⇒
¸
¸
¸
¸
f(t −τ) +f(t +τ)
2
−
ˇ
f(t)
¸
¸
¸
¸
< , (76)
and choose N large enough to ensure that
n > N =⇒ sup
θ<τ<2π−θ
K
n
(τ) < . (77)
Putting the estimates (76) and (77) into the expression (75) gives
$$\big|\sigma_n(f,t)-\check f(t)\big|<\epsilon+\epsilon\,\big\|f-\check f(t)\big\|_1,$$
which proves (a).
Part (b) follows at once from (a).
For (c), notice³ that $f$ must be uniformly continuous on $I$. This means that (given $\epsilon>0$) $\theta$ can be chosen so that (76) holds for all $t\in I$ (with $\check f(t)=f(t)$), and then $N$ depends only on $\theta$ and $\epsilon$. Since $f$ is bounded on $I$, the quantity $\|f-f(t)\|_1$ is bounded uniformly in $t\in I$. So a uniform estimate of the form
$$\big|\sigma_n(f,t)-f(t)\big|<\epsilon+\epsilon\,\big\|f-f(t)\big\|_1$$
can be found for all $t\in I$.
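Part (a) can be watched at a jump discontinuity. The sketch below uses a hypothetical example of our own: $f(x)=1+x$ for $0\le x<\pi$ and $f(x)=x^2$ for $-\pi\le x<0$, extended $2\pi$-periodically. At $t=0$ this gives $\frac12\lim_{h\to0}(f(h)+f(-h))=\frac12$. The sketch approximates $\sigma_n(f,0)=\frac{1}{2\pi}\int_{\mathbb T}K_n(\tau)f(-\tau)\,d\tau$ by the midpoint rule. It uses the standard closed form $K_n(\tau)=\frac{1}{n+1}\big(\sin\frac{(n+1)\tau}{2}\big/\sin\frac{\tau}{2}\big)^2$ (compare (81)). The number of quadrature nodes is an arbitrary choice.

```python
import math

def fejer_kernel(n: int, tau: float) -> float:
    """Closed form K_n(tau) = (1/(n+1)) * (sin((n+1)tau/2) / sin(tau/2))^2."""
    s = math.sin(tau / 2)
    if abs(s) < 1e-12:                 # removable singularity at tau = 0 (mod 2*pi)
        return float(n + 1)
    return (math.sin((n + 1) * tau / 2) / s) ** 2 / (n + 1)

def f(x: float) -> float:
    """Example function with a jump at 0; 2*pi-periodic (the value at odd multiples of pi is irrelevant)."""
    x = math.remainder(x, 2 * math.pi)  # reduce to [-pi, pi]
    return 1 + x if x >= 0 else x * x

def fejer_mean_at(t: float, n: int, nodes: int = 20_000) -> float:
    """Midpoint-rule approximation of (1/(2*pi)) * integral over T of K_n(tau) f(t - tau)."""
    h = 2 * math.pi / nodes
    total = 0.0
    for i in range(nodes):
        tau = -math.pi + (i + 0.5) * h  # with an even node count, tau = 0 is a cell boundary
        total += fejer_kernel(n, tau) * f(t - tau)
    return total * h / (2 * math.pi)
```

The computed values approach $\frac12$, the average of the one-sided limits, rather than either $f(0^+)=1$ or $f(0^-)=0$. They approach slowly, since the averaged integrand in (75) is only $O(\tau)$ near $0$.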
6. Lebesgue’s Theorem
The Fejér condition, that
$$\check f(t)=\lim_{h\to 0}\frac{f(t+h)+f(t-h)}{2}\qquad(78)$$
exists, is very strong, and is not preserved if the function $f$ is modified on a null set. This means that property (78) is not really well-defined on $L^1$. However, (78) implies another property: there is a number $\check f(t)$ for which
$$\lim_{h\to 0}\frac{1}{h}\int_0^h\left|\frac{f(t+\tau)+f(t-\tau)}{2}-\check f(t)\right|d\tau=0.\qquad(79)$$
This is a more robust condition, better suited to integrable functions⁴.
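To see that (79) is genuinely weaker than (78), consider a hypothetical example of our own at $t=0$. Let $f$ be even, equal to $1$ on the disjoint intervals $[1/k,\,1/k+1/k^3)$ for $k\ge 2$, and $0$ elsewhere on $[-\pi,\pi]$. Then $f(1/k)+f(-1/k)=2$ for every $k$, while $f$ vanishes between the spikes, so (78) fails at $0$. The spikes are so thin, however, that (79) holds with $\check f(0)=0$. The Python sketch below computes $\frac1h\int_0^h f(\tau)\,d\tau$, which is the quantity in (79) for this even $f$. The computation is exact up to truncating the sum at an arbitrary cut-off `kmax`; the names are illustrative.

```python
import math

def f(x: float) -> float:
    """Even function near 0: 1 on [1/k, 1/k + 1/k^3) for k >= 2 (and mirror image), else 0."""
    x = abs(x)
    if x == 0:
        return 0.0
    k = math.floor(1 / x)
    # x lies in spike j only if j is floor(1/x) or floor(1/x) + 1
    for j in (k, k + 1):
        if j >= 2 and 1 / j <= x < 1 / j + 1 / j**3:
            return 1.0
    return 0.0

def spike_mass(h: float, kmax: int = 200_000) -> float:
    """Integral of f over (0, h), i.e. the total length of the spikes inside (0, h).

    Spikes with k > kmax are ignored; together they have length below 1/(2 kmax^2).
    """
    total = 0.0
    for k in range(max(2, math.floor(1 / h)), kmax + 1):
        left = 1 / k
        if left < h:
            total += min(1 / k**3, h - left)  # a spike may be cut off at h
    return total

def average_deviation(h: float) -> float:
    """(1/h) * integral_0^h |(f(tau) + f(-tau))/2 - 0| dtau, the quantity in (79)."""
    return spike_mass(h) / h
```

Here `average_deviation(h)` behaves roughly like $h/2$, so it tends to $0$ even though $f$ keeps returning to the value $1$ arbitrarily close to $0$.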
Theorem 6.9. If $f$ has property (79) at $t$, then $\sigma_n(f,t)\to\check f(t)$. In particular (by footnote 4), for almost every value of $t$, $\sigma_n(f,t)\to\check f(t)=f(t)$.
Corollary 6.4. If the Fourier series of $f\in L^1(\mathbb T)$ converges on a set $F$ of positive measure, then almost everywhere on $F$ the Fourier series must converge to $f$. In particular, a Fourier series that converges to zero almost everywhere must have all its coefficients equal to zero.
Remark 6.1. The case of trigonometric series is different: a basic counterexample in the theory of trigonometric series is that there are non-zero trigonometric series that converge to zero almost everywhere. On the other hand, a trigonometric series that converges to zero everywhere must have all coefficients zero⁵.
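Proof of Theorem 6.9. Recall the expression (75) in the proof of Theorem 6.8,
$$\sigma_n(f,t)-\check f(t)=\frac{1}{\pi}\left(\int_0^{\theta}+\int_{\theta}^{\pi}\right)K_n(\tau)\left(\frac{f(t-\tau)+f(t+\tau)}{2}-\check f(t)\right)d\tau.\qquad(80)$$
Also, by (69),
$$K_n(\tau)=\frac{1}{n+1}\left(\frac{\sin\frac{n+1}{2}\tau}{\sin\frac{1}{2}\tau}\right)^2,\qquad(81)$$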
³ A continuous function on a closed bounded interval is uniformly continuous.
⁴ There are functions $f$ with the property that Fejér's condition (78) does not hold anywhere, but (79) does hold for any $f\in L^1(\mathbb T)$, for almost all $t$, with $\check f(t)=f(t)$. This is described in volume 1 of Trigonometric Series, A. Zygmund, Cambridge University Press, Cambridge (1959).
⁵ See Chapter 5 of Ensembles parfaits et séries trigonométriques, J.P. Kahane and R. Salem, Hermann (1963).
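and $\sin\frac{\tau}{2}>\frac{\tau}{\pi}$ for $0<\tau<\pi$, so
$$K_n(\tau)\le\min\left\{n+1,\ \frac{\pi^2}{(n+1)\tau^2}\right\}.\qquad(82)$$
It follows that the second integral in (80) will converge to zero so long as $(n+1)\theta^2\to\infty$. Indeed, by (82) its absolute value is at most
$$\frac{\pi^2}{(n+1)\theta^2}\cdot\frac{1}{\pi}\int_0^{\pi}\left|\frac{f(t-\tau)+f(t+\tau)}{2}-\check f(t)\right|d\tau,$$
and the integral is finite because $f\in L^1(\mathbb T)$. Pick $\theta=n^{-1/4}$. Then $(n+1)\theta^2\ge\sqrt n\to\infty$, which guarantees that the second integral tends to zero as $n\to\infty$.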
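The bound (82) and the choice of $\theta$ are easy to spot-check. The sketch below compares the closed form (81) with $\min\{n+1,\pi^2/((n+1)\tau^2)\}$ on a grid strictly inside $(0,\pi)$. It also evaluates $(n+1)\theta^2$ for $\theta=n^{-1/4}$. The function names, grid size and tolerance are illustrative assumptions.

```python
import math

def fejer_kernel(n: int, tau: float) -> float:
    """Closed form (81) for 0 < tau < 2*pi: (1/(n+1)) * (sin((n+1)tau/2) / sin(tau/2))^2."""
    return (math.sin((n + 1) * tau / 2) / math.sin(tau / 2)) ** 2 / (n + 1)

def bound_82(n: int, tau: float) -> float:
    """Right-hand side of (82)."""
    return min(n + 1, math.pi**2 / ((n + 1) * tau**2))

def bound_holds(n: int, samples: int = 10_000) -> bool:
    """Check K_n(tau) <= bound_82(n, tau) at grid points strictly inside (0, pi)."""
    for m in range(1, samples):
        tau = math.pi * m / samples
        if fejer_kernel(n, tau) > bound_82(n, tau) * (1 + 1e-12):  # tiny float slack
            return False
    return True

def growth(n: int) -> float:
    """(n + 1) * theta^2 for theta = n^(-1/4); this equals (n + 1)/sqrt(n) >= sqrt(n)."""
    theta = n ** -0.25
    return (n + 1) * theta**2
```

The slack factor only guards against rounding near $\tau\to 0$, where both sides are close to $n+1$.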
Now consider the first integral. Write
$$\Psi(h)=\int_0^h\left|\frac{f(t+\tau)+f(t-\tau)}{2}-\check f(t)\right|d\tau.$$
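Then
$$\left|\frac{1}{\pi}\int_0^{\theta}K_n(\tau)\left(\frac{f(t+\tau)+f(t-\tau)}{2}-\check f(t)\right)d\tau\right|$$
is bounded above by (with the same integrand in both pieces)
$$\frac{1}{\pi}\left|\int_0^{1/n}\cdots\right|+\frac{1}{\pi}\left|\int_{1/n}^{\theta}\cdots\right|\le\frac{n+1}{\pi}\Psi\Big(\frac1n\Big)+\frac{\pi}{n+1}\int_{1/n}^{\theta}\left|\frac{f(t+\tau)+f(t-\tau)}{2}-\check f(t)\right|\frac{d\tau}{\tau^2}$$
(we have used the estimate for $K_n$ from (82)). By the assumption (79), the first term $\frac{n+1}{\pi}\Psi(\frac1n)=\frac{n+1}{\pi n}\cdot\frac{\Psi(1/n)}{1/n}$ tends to zero. Applying integration by parts to the second term gives
$$\frac{\pi}{n+1}\int_{1/n}^{\theta}\left|\frac{f(t+\tau)+f(t-\tau)}{2}-\check f(t)\right|\frac{d\tau}{\tau^2}=\frac{\pi}{n+1}\left[\frac{\Psi(\tau)}{\tau^2}\right]_{1/n}^{\theta}+\frac{2\pi}{n+1}\int_{1/n}^{\theta}\frac{\Psi(\tau)}{\tau^3}\,d\tau.\qquad(83)$$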
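For given $\epsilon>0$ and $n>n(\epsilon)$, (79) gives
$$\Psi(\tau)<\epsilon\tau\quad\text{for }\tau\in(0,\theta],\ \theta=n^{-1/4}.$$
We bound the three pieces of (83) as follows:

* the boundary term at $\tau=1/n$ is non-positive, so it can be dropped;
* the boundary term at $\tau=\theta$ satisfies $\Psi(\theta)/\theta^2<\epsilon/\theta=\epsilon n^{1/4}\le\epsilon n$;
* in the last integral, $\Psi(\tau)/\tau^3<\epsilon/\tau^2$.

It follows that (83) is bounded above by
$$\frac{\pi\epsilon n}{n+1}+\frac{2\pi\epsilon}{n+1}\int_{1/n}^{\theta}\frac{d\tau}{\tau^2}<3\pi\epsilon,$$
since $\int_{1/n}^{\theta}\tau^{-2}\,d\tau<n$. As $\epsilon>0$ was arbitrary, this completes the proof.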
APPENDIX A
1. Zorn’s lemma and Hamel bases
Definition A.1. A partially ordered set or poset is a non–empty set S together
with a relation ≤ that satisﬁes the following conditions:
(i) x ≤ x for all x ∈ S;
(ii) if x ≤ y and y ≤ z then x ≤ z for all x, y, z ∈ S.
If in addition for any two elements x, y of S at least one of the relations x ≤ y
or y ≤ x holds, then we say that S is a totally ordered set.
For example, the set of subsets of a set $X$, with $\le$ meaning inclusion, is a partially ordered set.
Definition A.2. Let S be a partially ordered set, and T any subset of S. An
element x ∈ S is an upper bound of T if y ≤ x for all y ∈ T.
Definition A.3. Let $S$ be a partially ordered set. An element $x\in S$ is maximal if for any $y\in S$, $x\le y\implies y\le x$.
The next result, Zorn’s lemma, is one of the formulations of the Axiom of
Choice.
Theorem A.1. If S is a partially ordered set in which every totally ordered
subset has an upper bound, then S has a maximal element.
This result is used frequently to “construct” things, though whenever we use it all we are really able to do is assert that something must exist, assuming the Axiom of Choice. An example is the following result, which as usual is trivial in finite dimensions.
To see that the following theorem is “constructing” something a little surprising, think of the following examples: $\mathbb R$ is a linear space over $\mathbb Q$; $L^2[0,1]$ is a linear space over $\mathbb R$.
Theorem A.2. Let $X$ be a linear space over any field. Then $X$ contains a set $\mathcal A$ of linearly independent elements such that the linear subspace spanned by $\mathcal A$ coincides with $X$.
Any such set $\mathcal A$ is called a Hamel basis for $X$. It is quite a different kind of object from the spanning sets or bases usually used for infinite-dimensional spaces, where $X$ is only the closure of the span of the basis. If the Hamel basis is $\mathcal A=\{x_\lambda\}_{\lambda\in\Lambda}$, then every element of $X$ has a (unique) representation
$$x=\sum_{\lambda}a_\lambda x_\lambda$$
in which the sum is finite and the $a_\lambda$ are scalars.
Proof. Let $S$ be the set of subsets of $X$ that comprise linearly independent elements, and write $S=\{\mathcal A,\mathcal B,\mathcal C,\dots\}$. Define a partial ordering on $S$ by $\mathcal A\le\mathcal B$ if and only if $\mathcal A\subset\mathcal B$.
We first claim that if $\{\mathcal A_\alpha\}$ is a totally ordered subset of $S$, it has the upper bound $\mathcal B=\bigcup_\alpha\mathcal A_\alpha$. In order to prove this, we must show that any finite number of elements $x_1,\dots,x_n$ of $\mathcal B$ are linearly independent. Assume that $x_i\in\mathcal A_{\alpha_i}$ for $i=1,\dots,n$. Since the set $\{\mathcal A_\alpha\}$ is totally ordered, one of the finitely many subsets $\mathcal A_{\alpha_1},\dots,\mathcal A_{\alpha_n}$, say $\mathcal A_{\alpha_j}$, contains all the others. It follows that $\{x_1,\dots,x_n\}\subset\mathcal A_{\alpha_j}$, so $x_1,\dots,x_n$ are linearly independent.
We may therefore apply Theorem A.1 to conclude that $S$ has a maximal element $\mathcal A$. Suppose $y\in X$ is not a finite linear combination of elements of $\mathcal A$. Then the set $\mathcal B=\mathcal A\cup\{y\}$ belongs to $S$ (since it is linearly independent), and $\mathcal A\le\mathcal B$, but it is not true that $\mathcal B\le\mathcal A$, contradicting the maximality of $\mathcal A$.
It follows that every element of $X$ is a finite linear combination of elements of $\mathcal A$.
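In finite dimensions no choice principle is needed: the step "add $y$ if it is not in the span" can simply be repeated over a finite list. The Python sketch below is our own illustration, with hypothetical function names. It extracts a maximal linearly independent subset from a finite list of vectors in $\mathbb Q^d$, using exact rational arithmetic from the standard `fractions` module and Gaussian elimination for the span test.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by exact Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):        # clear column c in every other row
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def maximal_independent_subset(vectors):
    """Greedily keep each vector that is not in the span of those already kept."""
    basis = []
    for v in vectors:
        # v lies outside span(basis) exactly when adding it increases the rank
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis
```

Every vector in the list is then a finite combination of the kept ones, mirroring the maximality step of the proof. In infinite dimensions this loop cannot be run to completion, which is where Zorn's lemma is used instead.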
2. Baire category theorem
Most of the facts assembled here are really about metric spaces; normed spaces are a special case of metric spaces.
A subset $S\subset X$ of a normed space is nowhere dense if for every point $x$ in the closure of $S$, and for every $\epsilon>0$, the set $B_\epsilon(x)\cap(X\setminus\bar S)$ is non-empty (equivalently, $\bar S$ has empty interior).
The diameter of $S\subset X$ is defined by
$$\operatorname{diam}(S)=\sup_{a,b\in S}\|a-b\|.$$
Theorem A.3. Let $\{F_n\}$ be a decreasing sequence of non-empty closed sets (this means $F_n\supset F_{n+1}$ for all $n$) in a complete normed space $X$. If the sequence of diameters $\operatorname{diam}(F_n)$ converges to zero, then there exists exactly one point in the intersection $\bigcap_{n=1}^{\infty}F_n$.
Proof. If $x$ and $y$ are both in the intersection, then by the definition of the diameter, $\|x-y\|\le\operatorname{diam}(F_n)\to 0$, so $x=y$. It follows that there can be no more than one point in the intersection.
Now choose a point $x_n\in F_n$ for each $n$. For $n\ge m$ both $x_n$ and $x_m$ lie in $F_m$, so $\|x_n-x_m\|\le\operatorname{diam}(F_m)\to 0$ as $n\ge m\to\infty$. Thus the sequence $(x_n)$ is Cauchy, so by completeness it has a limit, $x$ say. For any $n$, $F_n$ is a closed set that contains all the $x_m$ with $m\ge n$, so $x\in F_n$. It follows that $x\in\bigcap_{n=1}^{\infty}F_n$.
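A familiar instance in $X=\mathbb R$ is the bisection method. Each step replaces a closed interval $F_n=[a_n,b_n]$, on which a continuous function changes sign, by one of its closed halves. So the $F_n$ are nested, non-empty and closed, with $\operatorname{diam}(F_n)=(b_0-a_0)/2^n\to 0$. Theorem A.3 then says the intersection is a single point, here a root. Below is a minimal Python sketch with exact rational endpoints, taking $g(x)=x^2-2$ on $[1,2]$ as an arbitrary example; the names are illustrative.

```python
from fractions import Fraction

def nested_intervals(g, a, b, steps):
    """Bisection: return the list of nested closed intervals [a_n, b_n].

    Assumes g is continuous with g(a) <= 0 <= g(b); every interval keeps the sign change.
    """
    a, b = Fraction(a), Fraction(b)
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        if g(mid) <= 0:
            a = mid            # sign change lies in [mid, b]
        else:
            b = mid            # sign change lies in [a, mid]
        intervals.append((a, b))
    return intervals

def g(x):
    return x * x - 2
```

Since $\sqrt2\in[a_n,b_n]$ for every $n$, the unique point of the intersection is $\sqrt2$. In the incomplete space $\mathbb Q$, the same nested intervals (intersected with $\mathbb Q$) have empty intersection.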
The next result is a version of the Baire¹ category theorem.
Theorem A.4. A complete normed space cannot be written as a countable union of nowhere dense sets.
In the language of metric spaces, this means that a complete normed space is of second category.
¹ René Baire (1874–1932) was one of the most influential French mathematicians of the early 20th century. His interest in the general ideas of continuity was reinforced by Volterra. In 1905, Baire became professor of analysis at the Faculty of Science in Dijon. While there, he wrote an important treatise on discontinuous functions. Baire's category theorem bears his name today, as do two other important mathematical concepts, Baire functions and Baire classes.
Proof. Let $X$ be a complete normed space, and suppose that $X=\bigcup_{j=1}^{\infty}X_j$, where each $X_j$ is nowhere dense (that is, the sets $\bar X_j$ all have empty interior). Fix a ball $B_1(x_0)$. Since $\bar X_1$ does not contain $B_1(x_0)$, there must be a point $x_1\in B_1(x_0)$ with $x_1\notin\bar X_1$. Since $\bar X_1$ is closed and $B_1(x_0)$ is open, it follows that there is a ball $B_{r_1}(x_1)$ such that $\bar B_{r_1}(x_1)\subset B_1(x_0)$ and $\bar B_{r_1}(x_1)\cap\bar X_1=\emptyset$. Assume without loss of generality that $r_1<\frac12$.
Similarly, there is a point $x_2$ and a radius $r_2$ such that $\bar B_{r_2}(x_2)\subset B_{r_1}(x_1)$ and $\bar B_{r_2}(x_2)\cap\bar X_2=\emptyset$, and without loss of generality $r_2<\frac13$. Notice that $\bar B_{r_2}(x_2)\cap\bar X_1=\emptyset$ automatically, since $\bar B_{r_2}(x_2)\subset B_{r_1}(x_1)$.
Inductively, we construct a decreasing sequence of closed balls $\bar B_{r_n}(x_n)$ such that $\bar B_{r_n}(x_n)\cap\bar X_j=\emptyset$ for $1\le j\le n$, and $r_n\to 0$ as $n\to\infty$.
Now by Theorem A.3 (the diameters are at most $2r_n\to 0$), there must be a point $x$ in the intersection of all the closed balls $\bar B_{r_n}(x_n)$, so $x\notin\bar X_j$ for all $j\ge 1$. This implies that $x\notin\bigcup_{j\ge 1}\bar X_j=X$, a contradiction.
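The same nested-ball construction can be run by hand in $X=\mathbb R$ when each $X_j=\{q_j\}$ is a single point, which is a nowhere dense set. The Python sketch below is our own illustration with hypothetical names. It takes the first few rationals $q_1,q_2,\dots$ in $[0,1]$. At step $j$ it keeps a closed third of the current interval that misses $q_j$, so the $j$-th interval avoids $q_1,\dots,q_j$ and has length $3^{-j}$. Any point of the final interval differs from every $q_j$ used. Carried on indefinitely, this is essentially the nested-interval argument that $[0,1]$ is uncountable.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """The first `count` distinct rationals p/q in [0, 1], listed by increasing denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def avoid_points(points, a=Fraction(0), b=Fraction(1)):
    """Nested closed intervals: after step j, [a, b] misses points[0], ..., points[j]."""
    for q in points:
        third = (b - a) / 3
        if not (a <= q <= a + third):
            b = a + third          # the left closed third misses q
        else:
            a = b - third          # q is in the left third, so the right third misses it
    return a, b
```

The thirds here play the role of the balls $\bar B_{r_n}(x_n)$ in the proof, with radii shrinking geometrically.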
vector spaces have other analytic properties (though you may not have called them that): for example. In the abstract language we talk about linear maps or operators between vector spaces. Such spaces (over a ﬁxed ground ﬁeld) are determined up to isomorphism by their dimension. For inﬁnite dimensional spaces. b. diﬀerentiable. (3) there is an element 0 ∈ V such that x ⊕ 0 = 0 ⊕ x = x for all x ∈ V (a zero element). Bases: The ﬁrst is familiar: instead of. multiplied by scalars. e3 } in mind. For example. Dimension: In linear algebra courses. (4) for each x ∈ V there is a unique element −x ∈ V with x ⊕ (−x) = 0 (additive inverses). for example. we shall sometimes want to talk about an abstract three–dimensional vector space V over the ﬁeld R. Notice that C is itself a two–dimensional vector space over R with additional structure (multiplication). We need to make three steps of generalization. 5 . linear maps between R2 and R2 are automatically continuous. i} for C over R we may identify z ∈ C with the vector ( (z). Using the Euclidean norm or distance. We shall be mainly looking at linear spaces that are not ﬁnite–dimensional. R2 . R and so on. A linear space over a ﬁeld k is a set V equipped with maps ⊕ : V × V → V and · : k × V → V with the properties (1) x ⊕ y = y ⊕ x for all x. This distinction amounts to having a speciﬁc basis {e1 . you deal with ﬁnite dimensional vector spaces. c) = ae1 + be2 + ce3 of reals – or choosing not to think of a speciﬁc basis. Riemann integrable and so on. after choosing a basis linear maps become matrices – though in an inﬁnite dimensional setting it is rarely useful to think in terms of matrices. (z)) ∈ R2 . in which case every element of V corresponds to a triple (a. z ∈ V (addition is associative).CHAPTER 1 Normed Linear Spaces A linear space is simply an abstract version of the familiar vector spaces R. y ∈ V (addition is commutative). Linear maps between vector spaces may be described in terms of matrices. 
Ground fields: The second is fairly trivial and is also familiar: the ground ﬁeld can be any ﬁeld. (2) (x ⊕ y) ⊕ z = x ⊕ (y ⊕ z) for all x. Linear (vector) spaces Definition 1. in which case the elements of V are just abstract vectors v. We shall only be interested in R (real vector spaces) and C (complex vector spaces). Recall that vector spaces have certain algebraic properties: vectors may be added. y. All of these features may be summed up in one line: the algebra of inﬁnite dimensional linear spaces is intimately connected to the topology. 3 1. and several new features appear. R3 . certain functions from R to R are continuous.1. Choosing a basis {1. some linear maps are not continuous. e2 . and vector spaces have bases and subspaces.
. NORMED LINEAR SPACES (notice that (V. . If S is inﬁnite. [8] Let Ω be a subset of Rn . [2] Let V be the set of all polynomials with coeﬃcients in R of degree ≤ n with usual addition of polynomials and scalar multiplication. . b] be the space of inﬁnitely diﬀerentiable functions on [a. and C k (Ω) the space of k times continuously diﬀerentiable functions. .n) (C) of complex–valued m × n matrices. β ∈ k and x ∈ V . . . If S = {s1 . · for vector addition and scalar multiplication. . [1] Let V = Rn = {x = (x1 .1. x3 . then the map that sends a polynomial f ∈ R[x] to the function f ∈ C(S) is injective (since two polynomials that agree on inﬁnitely many values must be identical). . Closure under scalar multiplication is clear: if 1 1 1 f (x)2 dx < ∞ and α ∈ R then 0 αf (x)2 dx = α2 0 f (x)2 dx < ∞. (8) 1 · x = x for all x ∈ V where 1 is the multiplicative identity in the ﬁeld k. then the partial derivatives Da f = exist and are continuous. This means that if a = (a1 . . . y ∈ V (scalar multiplication distributes over vector addition). [7] Let C ∞ [a. β ∈ k and x ∈ V (scalar multiplication distributes over scalar addition). . The dimension of C(S) is inﬁnite if S is an inﬁnite set. sn } is ﬁnite. For 0 vector addition we need the Cauchy–Schwartz inequality: 1 1 f (x) + g(x)2 dx ≤ 0 0 1 f (x)2 + 2f (x)g(x) + g(x)2 dx 1 1/2 1 1/2 ≤ 0 f (x)2 dx + 0 f (x)2 dx 0 1 g(x)2 dx + 0 g(x)2 dx < ∞. then the map that sends a function f ∈ C(S) to the vector (f (s1 ). xn )  xi ∈ R} with the usual vector addition and scalar multiplication. . . for example. From now on we will drop the special notation ⊕. 1 This ∂xa1 1 ∂ a f . y and so on for elements of linear spaces. . 1) → R which are 1 square–integrable: that is. . We need to check that this is a linear space. [6] Let V be the set of Riemann–integrable functions f : (0. [3] Let V be the set M(m. This shows that C(S) contains an isomorphic copy of an inﬁnitedimensional space. 
+) therefore forms an abelian group) (5) α · (β · x) = (αβ) · x for all α. (6) (α + β) · x = α · x ⊕ β · x for all α. . . Then ∞ is linear space. [5] Let C(S) be the set of continuous functions f : S → R with addition (f ⊕g)(x) = f (x) + g(x) and scalar multiplication (α · f )(x) = αf (x). . ∂xan n . any subset of R. (7) α · (x ⊕ y) = α · x ⊕ α · y for all α ∈ k and x. an ) ∈ Nn has a = a1 + · · · + an ≤ k. and is exactly S if not1 . ) that are bounded: sup{xn } < ∞. We will also (normally) use plain letters x. since sup{xn + yn } ≤ sup{xn } + sup{yn } < ∞ and sup{αxn } = α sup{xn }. Here S is. . may be seen as follows. Example 1. . x2 . with usual addition of matrices and scalar multiplication. f (sn )) ∈ Rn is an isomorphism of linear spaces. . b]. [4] Let ∞ denote the set of inﬁnite sequences (x1 . so must be inﬁnitedimensional. with the property that 0 f (x)2 dx < ∞.6 1.
. . . [3] Examples 1. . subsets of linear spaces that are themselves linear spaces are called linear subspaces. [1] (cf. . xn of V are linearly dependent if there are scalars α1 . . · (1) (2) (3) Definition 1. tn }. Example 1. 0).3. x + y ≤ x + y for all x.1 [4]. . then V is inﬁnite–dimensional. x2 . y ∈ W and α. αx = α x for all x ∈ V and α ∈ k. . then V is said to have dimension n. x3 . . . . xn is the linear subspace n span{x1 . Let V be a linear space over the ﬁeld k. t2 . If · is a function with properties (2) and (3) only it is called a semi–norm. .4(3) we are assuming that k is R or C and  ·  denotes the usual absolute value.2. the standard basis is given by the vectors e1 = (1. . showing the space to have dimension (n + 1). Linear subspaces Definition 1. Example 1. Example 1. . e2 = (0. . . . Norms A norm on a vector space is a way of measuring distance between vectors. . x2 .2. 3. A norm on a linear space V over k is a non–negative function : V → R with the properties that x = 0 if and only if x = 0 (positive deﬁnite).4. [1] The set of vectors in Rn of the form (x1 . . [8] are all inﬁnite–dimensional. . [6]. If there is no such set of vectors. j=1 Definition 1. 0. A subset W ⊂ V is a linear subspace of V if for all x. 0. . αn (not all zero) such that α1 x1 + · · · + αn xn = 0.1(8)) The space C k+1 (Ω) is a linear subspace of C k (Ω). Example 1. . . the linear combination αx + βy ∈ W . Linear independence Let V be a linear space. y ∈ V (triangle inequality). . .3. The linear span of the vectors x1 . x2 . β ∈ k. 0). 0) forms a three–dimensional linear subspace. . . .4. .1(1)) The space Rn has dimension n. . . . Elements x1 . 0. . . [2] The set of polynomials of degree ≤ r forms a linear subspace of the the set of polynomials of degree ≤ n for any r ≤ n. [5]. . xn } = x = αj xj  αj ∈ k . [2] (cf. then they are linearly independent. [7]. 1). Example 1. A linearly independent set of vectors that spans V is called a basis for V . t. 0. . 
NORMS 7 As in the linear algebra of ﬁnite–dimensional vector spaces. . .1[2]) A basis is given by {1. If there is no such set of scalars. In Deﬁnition 1. . 1. If the linear space V is equal to the span of a linearly independent set of n vectors. . [3] (cf. en = (0. 4. . 2.
Notice that the linear spaces n and p n q have exactly the same elements. x+y = 2 together imply that x = y. x2 . Strict convexity is automatic for Hilbert spaces. Notice that for p < ∞ there is a strict inclusion p ⊂ ∞ . 1]. A norm · is strictly convex if x = 1. To check this is a norm the only diﬃculty is the triangle inequality: for this we use the Cauchy–Schwartz inequality.5.1[4]). for any p < q there is a strict inclusion p ⊂ q so p is a linear subspace of q . y ∈ C. y = 1. [3] Let X = ∞ be the linear space of bounded inﬁnite sequences (cf. A set C in a linear space is convex if for any two points x. . the sets p and q for p = q do not contain the same elements. )  x p < ∞}. For 1 ≤ p < ∞ deﬁned 1/p n x p = j=1 xj p . · Definition 1. Example 1. . [1] Let V = Rn with the usual Euclidean norm 1/2 n x = x 2 = j=1 xj 2 . [2] There are many other norms on Rn . Indeed.7. This gives an inﬁnite family of normed linear spaces. . . A normed linear space is a linear space V with a norm (sometimes we write · V ). Then · Inequality p is a norm on V : to check the triangle inequality use Minkowski’s n j=1 xj + yj p 1/p ≤ n j=1 xj p 1/p + n j=1 yj p 1/p . Consider the function · p : ∞ → R ∪ {∞} given by 1/p ∞ x p = j=1 xj p . That is. We won’t be using convexity methods much. x n p ∞ = max {xj }. If we restrict attention to the linear subspace on which · p is ﬁnite. tx + (1 − t)y ∈ C for all t ∈ [0. Example 1. There is another norm corresponding to p = ∞.6. then · p is a norm (to check this use the inﬁnite version of Minkowski’s inequality). p = {x = (x1 .4. called the p–norms. but for each of the examples try to work out whether or not the norm is strictly convex. 1≤j≤n It is conventional to write for these spaces.8 1. Definition 1. NORMED LINEAR SPACES Definition 1.
PRODUCTS OF NORMED SPACES 9 [4] Let X = C[a. (Y. Then (using the integral form of Minkowski’s inequality) we have the p–norm b 1/p f p = a f (t) p .6. Then V is a normed linear space. The real linear spaces (C. y ∈ Y } is a linear space which may be made into a normed space in many diﬀerent ways. Example 1. A pair (X. [1] (x. · Y ) are normed linear spaces. Isomorphism of normed linear spaces Recall form linear algebra that linear spaces V and W are (algebraically) isomorphic if there is a bijection T : V → W that is linear: T (αx + βy) = αT (x) + βT (y) for all α. Then Y is a linear subspace of p for any 1 ≤ p ≤ ∞ so the p–norm makes Y into a normed space.1[6]). so T (x) Y = x X .5. = Lemma 1. X) 2) are isometric. Example 1. y) = ( x [2] (x. β ∈ k and x. Let f 2 = 0 f (x)2 dx < ∞.b] f (t). 6. y Y }.  · ) and (R2 . and choose 1 ≤ p < ∞. 5. [6] (cf. b]. 1) → R which are square–integrable. Products of normed spaces If (X. This is called the uniform or supremum norm. b with a x X ≤ T (x) Y ≤b x X. y) = max{ x X . Why is is ﬁnite? [5] Let X = C[a. b]. If the constants a and b in equation (1) may both be taken as 1.7. a few of which follow. · X then restricted to Example 1. Let Y denote the space of inﬁnite real sequences with only ﬁnitely many non–zero terms. then T is called an isometry and the normed spaces X and Y are called isometric. Let V be the set of Riemann–integrable functions f : 1 (0. and put f = supt∈[a. then the product X × Y = {(x. · If Y is a subspace of a linear normed space (X. y)  x ∈ X. Example 1. X + y Y ) 1/p . · Y makes Y into a normed subspace. (1) We shall usually denote topological isomorphism by X ∼ Y . If X and Y are n–dimensional normed linear spaces over R (or C) then X and Y are topologically isomorphic. · Y ) of normed linear spaces are (topologically) isomorphic if there is a linear bijection T : X → Y with the property that there are positive constants a. . y ∈ V . · X ).1. · X) and (Y.6.
Continuous maps between normed spaces We have seen continuous maps between R and R in ﬁrst year analysis. If f is continuous at every a ∈ X then we simply say f is continuous. NORMED LINEAR SPACES 7. [2] Let f (x) = Ax be the non–trivial linear map from Rn to Rm (with Euclidean norms) deﬁned by the m × n matrix A = (aij ). (2) Looking at (2). Then for any x ∈ Rn we have m n Ax − Aa 2 = i=1 m  j=1 aij (xj − aj )2 n ≤ i=1 j=1 a2  ij 2 n j=1 xj − aj  = C 2 m i=1 n j=1 2 2 x−a where C = aij  > 0.10 1. we see that f is uniformly continuous: ﬁx a ∈ Rn and b = Aa. on suitably deﬁned spaces. A map f : X → Y between normed linear spaces (X. Definition 1. We shall see later that F does indeed have a ﬁxed point – knowing that F is uniformly continuous is a step towards this. u(0) = 1. an element u ∈ X for which F (u) = u) is a continuous solution to the ordinary diﬀerential equation du = sin(u) + tan(t). · Y ) is continuous at a ∈ X if ∀ > 0 ∃ δ = δ( . [3] Let X be the space of continuous functions [−1. Example 1. y ∈ X. Finally. where t v(t) = 1 + 0 (sin u(s) + tan s) ds. ∀ > 0.4[4]). Example 1. It follows that f is uniformly continuous. questions like “is the map f → f continuous?” or “is the map x f → 0 f ” continuous?” can be asked. we see that exactly the same deﬁnition can be made for maps between linear normed spaces.8. Deﬁne a map F : X → X by F (u) = v. ∃ δ > 0 such that x − a < δ =⇒ f (x) − f (a) < . a) > 0 such that x − a X X) < δ =⇒ f (x) − f (a) Y < .8. 1] → R with the sup norm (cf. Thus. To see that F is . Using the Cauchy–Schwartz inequality. Notice that F is intimately connected to a certain diﬀerential equation: a ﬁxed point for F (that is. 1]. The map F is uniformly continuous on X. dt in the region t ∈ [−1. which in view of Example 1. [1] The map x → x2 from (R. To make this deﬁnition we used the distance function x − y on R: a function f : R → R is continuous if ∀ a ∈ R. 
f is uniformly continuous if ∀ > 0 ∃ δ = δ( ) > 0 such that x−y X < δ =⇒ f (x)−f (y) Y < ∀ x.  · ) to itself is continuous but not uniformly continuous. · and (Y. and we may take δ = /C.4 will give us the possibility of talking about continuous maps between spaces of functions.
CONTINUOUS MAPS BETWEEN NORMED SPACES 11 continuous.1] t sup t∈[−1. t∈[0. [4] Let X be the space of complex–valued square–integrable Riemman integrable functions on [0. ∞) is not continuous. 2 Then F (u) − F (0) 2 = 16 a4 b6 . a2 t. 0 ≤ t ≤ 2b2 u(t) = ia. an easy consequence of the Mean Value Theorem.7. b with 2ab < δ but 16 a4 b6 = 1. On the other hand. 2b2 ≤ t ≤ 4b2 0.1] u−v . Now.1] 1+ 0 (sin u(s) + tan s)ds − t 1+ 0 t (sin v(s) + tan s)ds = ≤ ≤ sup t∈[−1. Then F is continuous (but not uniformly continuous): t F u(t) − F v(t) =  0 1 (u2 (s) − v 2 (s))ds (u(s) + v(s))(u(s) − v(s))ds 0 1 1/2 1 1/2 ≤ ≤ u(s)2 + v(s)2 ds 0 0 u(s) − v(s)2 ds .1] 0 t (sin u(s) − sin v(s)) ds  sin u(s) − sin v(s) 0 sup t∈[−1. 1] with 2–norm (cf. . given any δ > 0 we may choose constants 3 a. calculate F (u) − F (v) = = sup F (u)(t) − F (v)(t) t∈[−1. Notice we have used the inequality  sin u − sin v ≤ u − v. with t v(t) = 0 u2 (s)ds. so that Fu − Fv 2 ≤ sup u(t) − v(t) ≤ ( u + v 2 ) u − v . otherwise. Deﬁne a map F : X → X by F (u) = v. That is. b ∈ R and deﬁne a. 0 ≤ t ≤ 2b2 2 2 2 F (u)(t) = 4b a − a t.1] Then u − 0 [5] The same map as in [4] applied to square–integrable Riemann integrable functions on [0. Example 1. 2b2 ≤ 4b2 0 otherwise.4[6]). let a. given any δ > 0 there is a function u 3 = 2ab. To see this.
. m > N =⇒ xn − xm < . (xj ) converges to 0 in the space n ) if and only if xj → 0 in R for each p k = 1. 0. . Replace  ·  with R → R. Example 1. and it is a simple matter to check that in Rn the converse holds. . Let (xj ) be the sequence in p deﬁned by xj = (0. However. [1] If (xj ) is a sequence in Rn . n. Lemma 1. . . showing that F is not continuous.10. . We know that in the normed linear space (R. . . It is clear that a convergent sequence is a Cauchy sequence.  · ) the converse also holds.2. . A sequence (xn ) is a Cauchy sequence if ∀ > 0 ∃ N such that n. . . . ∞ Similarly. a series n=1 xn converges if the sequence of partial sums (sN ) N deﬁned by sN = n=1 xn is a convergent sequence. we also have xj p = 1 for all j. 8. it is not enough to check convergence on each component using a basis. A map F : X → Y between normed linear spaces is continuous at a ∈ X if and only if lim F (xn ) = F (a) n→∞ (k) (1) (2) (k) (1) (n) for every sequence (xn ) converging to a. 1. ) we certainly have xj → 0 as j → ∞ for each k. . then check that xj p → 0 (that is. Definition 1. . NORMED LINEAR SPACES with the property that u − 0 < δ but F (u) − F (0) = 1. A sequence (xn ) in X is said to converge to a ∈ X if xn − a −→ 0 as n → ∞.9.12 1. . · X ) be a normed linear space. Sequences and completeness in normed spaces Just as for continuity. with xj = (xj . we can use the norm on a normed linear space to deﬁne convergence for sequences and series in a normed space using the corresponding notion for R. . Let X = (X. it is not converging to anything. Then if we write xj = (xj . . so the sequence is certainly not converging to 0. ) (where the 1 appears in the jth position. Proof. A normed linear space is said to be complete if all Cauchy sequences are convergent. In many reasonable inﬁnite–dimensional normed linear spaces however there are Cauchy sequences that do not converge. [2] For inﬁnite–dimensional spaces. xj ). . · in the proof of this statement for functions Definition 1. 
xj .9. The moral is that the topological properties of inﬁnite spaces are a little counter–intuitive. Indeed. .
A set C ⊂ X is closed if whenever (cn ) is a sequence in C that is a convergent sequence in X. Topology is a subject that begins by attaching names to these properties and then develops a shorthand for talking about such things. Definition 1. the limit limn→∞ cn also lies in C. since 1 um − un 2 2 = 0 um (t) − un (t)2 dt 1/2 1/2+1/m = 1/2−1/m um (t) − un (t)2 dt + 1/2 um (t) − un (t)2 dt → 0 as m > n → ∞. and assume that there is a continuous function f with un − f 2 → 0 as n → ∞.4[4]). 0≤t≤ 1−n 2 nt n 1 1 1 1 un (t) = 2 − 4 + 2 . . which contradicts (3). This is complete. A set U ⊂ X is open if for every u ∈ U there exists > 0 such that x − u < =⇒ x ∈ U . consider the sequence of functions 1 0. Example 1. TOPOLOGICAL LANGUAGE 13 31 314 Example 1. again contradicting 2 2 (3). We must therefore have f ( 2 ) = 0. 9. . 100 .11. is a Cauchy sequence 10000 of rationals converging to the real number π. · 2 ) is not complete. To see this. 1].10. 31415 .4[5]) is not complete. ¯ Associated to any set S ⊂ X in a normed space there are sets S o ⊂ S ⊂ S deﬁned as follows: the interior of S is the set S o = {x ∈ X  ∃ > 0 such that x−y < =⇒ y ∈ S}. (4) . . [1] The sequence 3. We claim that the sequence (un ) is not convergent in C[0. S ∩ A = ∅. A set S ⊂ X is connected if there do not exist open sets A. 1 + δ) for some δ > 0. (1 2 (3) Now examine If = = 0 then f − g must be positive on − δ. A set S ⊂ X is bounded if there is an R < ∞ with the property that x ∈ S =⇒ x < R. To see this. 1] of continuous functions under the sup norm (cf. Example 1. 1 ) 2 1 for some δ > 0. 1000 . Let X be a normed linear space. let g be the function deﬁned by g(t) = 0 for 0 ≤ t ≤ 1 and g(t) = 1 for 2 1 2 < t ≤ 1. [2] Consider the space C[0. S ∩ B = ∅ and S ∩ A ∩ B = ∅. It is clear that un − g 2 → 0 as n → ∞ also. 2 − n ≤ t ≤ 1 + n 2 1 1 1 2 + n ≤ t ≤ 1. Thus the normed space (C[0. 2 f(1) 2 g( 1 ) 2 2 = 0. 
Topological language There are certain properties of subsets of normed linear spaces (and other more general spaces) that we use very often. [3] The space C[0. B in X with S ⊂ A ∪ B. 1] under the 2–norm (cf. but in this case f − g must be positive on ( 1 . so we must have f −g f ( 1 ). Then (un ) is a Cauchy sequence. 1] under the 2–norm. We conclude that there is no continuous function f that is the 2–norm limit of the sequence (un ).9.
and the closure of $S$,
$$\bar{S} = \{x \in X \mid \forall\, \varepsilon > 0\ \exists\, s \in S \text{ such that } \|s - x\| < \varepsilon\}. \qquad (5)$$

Exercise. [1] Prove that a map $f: X \to Y$ is continuous (cf. Definition 1.8) if and only if for every open set $U \subset Y$, the pre–image $f^{-1}(U) \subset X$ is also open.
[2] Show by example that for a continuous map $f: \mathbb{R} \to \mathbb{R}$ there may be open sets $U$ for which $f(U)$ is not open.

Definition 1.12. A subset $S$ of a normed linear space is (sequentially) compact if and only if every sequence $(s_n)$ in $S$ has a subsequence $(s_{n_j}) = (s_{n_1}, s_{n_2}, \dots)$ that converges in $S$.

It is clear from first year analysis that closed bounded sets (closed intervals, for example) have special properties. For example, recall the theorem of Bolzano–Weierstrass.

Theorem 1.1. Let $S$ be a closed and bounded subset of $\mathbb{R}$. Then a continuous function $f: S \to \mathbb{R}$ attains its bounds: there exist $\xi, \eta \in S$ with the property that
$$f(\xi) = \sup_{s \in S} f(s), \qquad f(\eta) = \inf_{s \in S} f(s).$$

Recall also the following theorem (the Heine–Borel theorem).

Theorem 1.2. A subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded.

This is also a version of the Bolzano–Weierstrass theorem.

Theorem 1.3. If $A$ is a compact subset of a normed linear space $X$, and $f: X \to Y$ is a continuous map between normed linear spaces, then $f(A)$ is a compact subset of $Y$.

As an exercise, convince yourself that Theorem 1.3 (together with Theorem 1.2) implies Theorem 1.1. Thus the result Theorem 1.1 does hold in great generality. However, the analogue of Theorem 1.2 does not extend to infinite–dimensional normed spaces. By now you should be used to the idea that such results do not extend to infinite–dimensional normed linear spaces: the sequence in Example 1.9[2] is a bounded sequence with no convergent subsequences.

Some standard sets are used so often that we give them special names.

Definition 1.13. Let $X$ be a normed space. Then the open ball of radius $\varepsilon > 0$ and centre $x_0$ is the set
$$B_\varepsilon(x_0) = \{x \in X \mid \|x - x_0\| < \varepsilon\}.$$
The closed ball of radius $\varepsilon > 0$ and centre $x_0$ is the set
$$\bar{B}_\varepsilon(x_0) = \{x \in X \mid \|x - x_0\| \le \varepsilon\}.$$
Open and closed balls in normed spaces are convex.

Definition 1.14. A subset $S$ of a normed space $X$ is dense if every open ball in $X$ has non–empty intersection with $S$. A normed space is said to be separable if there is a countable set $S = \{x_1, x_2, \dots\}$ that is dense in $X$.
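A standard example of a bounded sequence with no convergent subsequence is the sequence of unit vectors $e_n$ in $\ell^p$, $1 \le p < \infty$: each has norm 1 and $\|e_n - e_m\|_p = 2^{1/p}$ for $n \ne m$, so no subsequence can be Cauchy. The short sketch below only checks that arithmetic for $p = 2$; the helper names `lp_distance` and `unit_vector` are hypothetical, and sequences are truncated to finitely many coordinates, which does not change these particular distances.

```python
def lp_distance(u, v, p):
    """||u - v||_p for finitely supported sequences stored as equal-length lists (1 <= p < inf)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def unit_vector(n, length):
    """The n-th unit vector e_n, truncated to `length` coordinates (the rest are zero)."""
    return [1.0 if k == n else 0.0 for k in range(length)]

p, length = 2.0, 10
vectors = [unit_vector(n, length) for n in range(length)]

# Every e_n lies on the unit sphere, so the sequence is bounded ...
norms = {lp_distance(v, [0.0] * length, p) for v in vectors}
# ... but any two distinct terms are a fixed distance 2**(1/p) apart.
distances = {round(lp_distance(vectors[i], vectors[j], p), 12)
             for i in range(length) for j in range(length) if i != j}
print(norms, distances)  # {1.0} {1.414213562373}
```

The same computation with any other $p$ gives the single distance $2^{1/p}$, which is the whole obstruction to extracting a convergent subsequence.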
10. Quotient spaces

As an application of Section 9, quotients of normed spaces may be formed. Recall from Definition 1.11 and Definition 1.2 that a closed linear subspace $Y$ of a normed linear space $X$ is a subset $Y \subset X$ that is itself a linear space, with the property that any sequence $(y_n)$ of elements of $Y$ that converges in $X$ has the limit in $Y$.

The linear space $X/Y$ (the quotient or factor space) is formed as follows. The elements of $X/Y$ are cosets of $Y$ – sets of the form $x + Y$ for $x \in X$. Two cosets $x_1 + Y$ and $x_2 + Y$ are equal if as sets $x_1 + Y = x_2 + Y$, which is true if and only if $x_1 - x_2 \in Y$. The set of cosets is a linear space under the operations
$$(x_1 + Y) \oplus (x_2 + Y) = (x_1 + x_2) + Y, \qquad \lambda \cdot (x + Y) = \lambda x + Y.$$
Notice that this makes sense precisely because $Y$ is itself a linear space, so for example $Y + Y = Y$ and $\lambda Y = Y$ for $\lambda \ne 0$.

Example 1.11. [1] Let $X = \mathbb{R}^3$, and let $Y$ be the subspace spanned by $(1,0,0)$. Then $X/Y$ is a two–dimensional real vector space. There are many pairs of elements that generate $X/Y$, for example $(1,0,1) + Y$ and $(0,1,1) + Y$.
[2] The linear space $Y$ of finitely supported sequences in $\ell^1$ is a linear subspace. The quotient space $\ell^1/Y$ is very hard to visualize: its elements are equivalence classes under the relation $(x_n) \sim (y_n)$ if the sequences $(x_n)$ and $(y_n)$ differ in finitely many positions.
[3] The linear space $Y$ of $\ell^1$ sequences of the form $(0, \dots, 0, x_{n+1}, x_{n+2}, \dots)$ (first $n$ are zero) is a linear subspace of $\ell^1$. Here the quotient space $\ell^1/Y$ is quite reasonable: in fact it is isomorphic to $\mathbb{R}^n$.
[4] We know that for $p, q \in [1, \infty]$, $p < q \implies \ell^p \subset \ell^q$. This means that for any $p < q$ there is a linear quotient space $\ell^q/\ell^p$.
[5] The linear space $Y = C[0,1]$ is a linear subspace of the space $X$ of square–Riemann–integrable functions on $[0,1]$. The quotient $X/Y$ is again a linear space that is impossible to work with.
[6] Let $X = C[0,1]$, and let $Y = \{f \in X \mid f(0) = 0\}$. Then $X/Y$ is isomorphic to $\mathbb{R}$.

It is clear from these examples that not all linear subspaces are equally good: Examples 1.11 [1], [3] and [6] are quite reasonable, whereas [2], [4] and [5] are examples of linear spaces unlike any we have seen. These quotient spaces are very pathological. The reason is the following: the space $X/Y$ is guaranteed to be a normed space with a norm related to the original norm on $X$ only when the subspace $Y$ is itself closed. Notice that Examples 1.11 [1], [3] and [6] are precisely the ones in which the subspace is closed. Notice also that we need both the algebraic structure (subspace of a linear space) and a topological property (closed) to make it all work.

Theorem 1.4. If $X$ is a normed space, and $Y$ is a closed linear subspace, then $X/Y$ is a normed space under the norm
$$\|x + Y\| = \inf_{z \in x + Y} \|z\|. \qquad (6)$$

Before proving this theorem, try to convince yourself that the norm (6) is the obvious candidate: if $X = \mathbb{R}^2$ and $Y = (1,0)\mathbb{R}$, then the space $X/Y$ consists of
lines in $X$ of the form $(s,t) + Y$. Notice that each such line may be written uniquely in the form $(0,t) + Y$, and this choice minimizes the norm of the element of $X$ that represents the line.

Proof of Theorem 1.4. Since $Y$ is closed, the limit of a convergent sequence lying in a coset defines the same coset as does the sequence; that is, each coset $z + Y$ is a closed set. To see this, let $(x_n) \subset z + Y$ be a convergent sequence with $x_n \to x$. Then for any fixed $n$, $(x_n - x_m)_{m \ge 1}$ is a sequence in $Y$ converging in $X$ to $x_n - x$ as $m \to \infty$. Since $Y$ is closed, we must have $x_n - x \in Y$, so $x + Y = x_n + Y = z + Y$.

Let $x + Y$ be any coset, and assume now that $\|x + Y\| = 0$. Then there is a sequence $(x_n) \subset x + Y$ with $x_n \to 0$. Since $x + Y$ is closed and $x_n \to 0$, we must have $0 \in x + Y$, so $x + Y = Y$, the zero element in $X/Y$.

Homogeneity is clear: $\|\lambda(x + Y)\| = \inf_{z \in x + Y} \|\lambda z\| = |\lambda| \inf_{z \in x + Y} \|z\| = |\lambda|\,\|x + Y\|$.

Finally, the triangle inequality:
$$\|(x_1 + Y) + (x_2 + Y)\| = \inf_{z_1 \in x_1 + Y,\ z_2 \in x_2 + Y} \|z_1 + z_2\| \le \inf_{z_1 \in x_1 + Y} \|z_1\| + \inf_{z_2 \in x_2 + Y} \|z_2\| = \|x_1 + Y\| + \|x_2 + Y\|.$$

Even if the subspace is closed, the quotient space may be a little odd.

Example 1.12. For example, let $c$ denote the space of all sequences $(x_n)$ with the property that $\lim_n x_n$ exists. This is a closed subspace of $\ell^\infty$. What is the quotient $\ell^\infty/c$?
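For the finite–dimensional picture used to motivate the norm (6), with $X = \mathbb{R}^2$ and $Y = (1,0)\mathbb{R}$, the infimum can be approximated numerically. This is a minimal sketch under stated assumptions, not part of the theory: the helper names (`quotient_norm`, `norm2`, `norm_max`) are hypothetical, $Y$ is taken to be one–dimensional, $Y = \mathrm{span}\{y\}$, and the code relies on the fact that $a \mapsto \|x + a y\|$ is a convex function of the real parameter $a$, so a ternary search over a bounded interval finds the infimum provided the minimiser lies inside that interval.

```python
import math

def norm2(v):
    """Euclidean 2-norm on R^n."""
    return math.sqrt(sum(t * t for t in v))

def norm_max(v):
    """Max norm on R^n."""
    return max(abs(t) for t in v)

def quotient_norm(x, y, norm, radius=1e3, iterations=200):
    """Approximate ||x + Y|| = inf over a of ||x + a*y|| for Y = span{y}.

    Assumes the minimising a lies in [-radius, radius]. Since a -> ||x + a*y||
    is convex, ternary search keeps a minimiser inside the shrinking interval.
    """
    def distance(a):
        return norm([xi + a * yi for xi, yi in zip(x, y)])

    lo, hi = -radius, radius
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if distance(m1) <= distance(m2):
            hi = m2
        else:
            lo = m1
    return distance((lo + hi) / 2)

# Y = (1, 0)R: the coset (s, t) + Y has norm |t|, attained at the representative (0, t).
print(round(quotient_norm([3.0, -2.0], [1.0, 0.0], norm2), 6))     # 2.0
print(round(quotient_norm([3.0, -2.0], [1.0, 0.0], norm_max), 6))  # 2.0
```

With $Y = (1,0)\mathbb{R}$ both norms give $|t|$, matching the choice of representative $(0,t)$; for a slanted subspace such as $\mathrm{span}\{(1,1)\}$ the answer depends on the norm (for $x = (1,0)$ it is $1/\sqrt{2}$ in the 2–norm and $1/2$ in the max norm).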
CHAPTER 2
Banach spaces

It turns out to be very important and natural to work in complete spaces – trying to do functional analysis in non–complete spaces is a little like trying to do elementary analysis over the rationals.

Definition 2.1. A complete normed linear space is called¹ a Banach space.

Example 2.1. [1] We are already familiar with a large class of Banach spaces: any finite–dimensional normed linear space is a Banach space. In our notation, this means that $\ell^p_n$ is a Banach space for all $1 \le p \le \infty$ and all $n$.
[2] The space of continuous functions with the sup norm is a Banach space (cf. Example 1.4[4] and Example 1.10[2]).
[3] The sequence space $\ell^p$ is a Banach space. To see this, assume that $(x_n)$ is a Cauchy sequence in $\ell^p$, and write $x_n = (x_n^{(1)}, x_n^{(2)}, \dots)$. Recall that $\|\cdot\|_p \ge \|\cdot\|_\infty$ for all $p$ (cf. Example 1.4[3]). So, given $\varepsilon > 0$ we may find $N$ with the property that $m, n > N \implies \|x_n - x_m\|_p < \varepsilon$, which in turn implies that $\|x_n - x_m\|_\infty < \varepsilon$, so for each $k$, $|x_n^{(k)} - x_m^{(k)}| < \varepsilon$. That is, if $(x_n)$ is a Cauchy sequence in $\ell^p$, then for each $k$ the sequence $(x_n^{(k)})$ is a Cauchy sequence in $\mathbb{R}$. Since $\mathbb{R}$ is complete, we deduce that for each $k$ we have $x_n^{(k)} \to y^{(k)}$. Notice that this does not imply by itself that $x_n \to y$ (cf. Example 1.9[2]). However, if we know (as we do) that $(x_n)$ is Cauchy, then it does: we prove this for $p < \infty$ but the $p = \infty$ case is similar. Fix $\varepsilon > 0$, and use the Cauchy criterion to find $N$ such that $n, m > N$ implies that
$$\sum_{k=1}^{\infty} |x_n^{(k)} - x_m^{(k)}|^p < \varepsilon.$$
Now fix $n$ and let $m \to \infty$ (first in each finite partial sum $\sum_{k=1}^{K}$, then letting $K \to \infty$) to see that
$$\sum_{k=1}^{\infty} |x_n^{(k)} - y^{(k)}|^p \le \varepsilon$$
(notice that $<$ has become $\le$). This last inequality means that $\|x_n - y\|_p \le \varepsilon^{1/p}$; in particular $x_n - y \in \ell^p$, so $y \in \ell^p$, showing that in $\ell^p$, $x_n \to y = (y^{(1)}, y^{(2)}, \dots)$.

¹ After the Polish mathematician Stefan Banach (1892–1945) who gave the first abstract treatment of complete normed spaces in his 1920 thesis (Fundamenta Math. 3, 133–181, 1922). His later book (Théorie des opérations linéaires, Warsaw, 1932) laid the foundations of functional analysis.

Lemma 2.1. Let $(x_n)$ be a sequence in a Banach space $X$. If the series $\sum_{n=1}^{\infty} x_n$ is absolutely convergent, then it is convergent.

Recall that absolutely convergent means that the numerical series $\sum_{n=1}^{\infty} \|x_n\|$ is convergent.

Proof. Consider the sequence of partial sums $s_m = \sum_{n=1}^{m} x_n$:
$$\|s_m - s_k\| \le \sum_{n=k+1}^{m} \|x_n\| \to 0 \quad \text{as } m > k \to \infty.$$
It follows that the sequence $(s_m)$ is Cauchy; since $X$ is complete this sequence converges, so the series $\sum_{n=1}^{\infty} x_n$ converges.

The lemma is clearly not true for general normed spaces: take, for example, a sequence of functions $f_n$ in $C[0,1]$, each with $\|f_n\|_2 = 1/n^2$, with the property that $\sum_{n=1}^{\infty} f_n$ is not continuous.

1. Completions

Completeness is so important that in many applications we deal with non–complete normed spaces by completing them. This is analogous to the process of passing from $\mathbb{Q}$ to $\mathbb{R}$ by working with Cauchy sequences of rationals. In this section we simply outline what is done; in later sections we will see more details about what the completions look like.

Let $X$ be a normed linear space. Let $C(X)$ denote the set of all Cauchy sequences in $X$; an element of $C(X)$ is then a Cauchy sequence $(x_n)$. The linear space structure of $X$ extends to $C(X)$ by defining $\alpha \cdot (x_n) + (y_n) = (\alpha x_n + y_n)$. The norm $\|\cdot\|$ on $X$ extends to a semi–norm on $C(X)$, defined by
$$\|(x_n)\| = \lim_{n \to \infty} \|x_n\|.$$
Try to see why the obvious extension of the norm to the space of Cauchy sequences only gives a semi–norm. Finally, define an equivalence relation $\sim$ on $C(X)$ by $(x_n) \sim (y_n)$ if and only if $\|x_n - y_n\| \to 0$. Then the linear space operations and the semi–norm are well–defined on the space of equivalence classes $C(X)/\!\sim$, giving a complete normed linear space $\bar{X}$ called the completion of $X$, where an element of $\bar{X}$ is an equivalence class of Cauchy sequences.

It should be clear from the above that it is going to be difficult to work with elements of the completions in this formal way. However, all we will ever need is the simple statement: for any normed linear space $X$, there is a Banach space $\hat{X}$ such that $X$ is isomorphic to a dense subspace $\imath(X)$ of $\hat{X}$, where the map $\imath$ from $X$ into $\hat{X}$ preserves all the linear space operations.

Exercise 2.1. [1] Apply the process outlined above to the rationals $\mathbb{Q}$.
[2] Construct a Cauchy sequence $(f_n)$ in $(C[0,1], \|\cdot\|_2)$ with the property that $f_n \ne 0$ for any $n$ but $\|f_n\|_2 \to 0$. This means that the Cauchy sequence $(f_n)$ and the Cauchy sequence $(0)$ are not separated by the semi–norm $\|\cdot\|_2$, showing it is not a norm.
[3] Show that if $X$ is already a Banach space, then there is a bijective isomorphism between $X$ and $\bar{X}$.
Example 2.2. [1] We have seen that $C[0,1]$ under the 2–norm is not complete (cf. Example 1.10). Similar examples will show that $C[0,1]$ is not complete under any of the $p$–norms. Let $X$ denote the non–complete space $(C[0,1], \|\cdot\|_p)$. A reasonable guess for $\bar{X}$ might be the space of Riemann–integrable functions with finite $p$–norm, but this is still not complete: it is easy to construct a Cauchy sequence of Riemann–integrable functions that does not converge to a Riemann–integrable function in the $p$–norm. However, if you use Lebesgue integration, you do get a complete space, called $L^p[0,1]$. For now, think of this space as consisting of all Riemann–integrable functions with finite $p$–norm together with extra functions obtained as limits of sequences of Riemann–integrable functions. Then $L^p$ provides a further example of a Banach space.
[2] A function $f: X \to Y$ is said to have compact support if it is zero outside some compact subset of $X$; the support of $f$ is the smallest closed set containing $\{x \in X \mid f(x) \ne 0\}$. Let $\Omega$ be an open subset of $\mathbb{R}^n$, and let $C_0^\infty(\Omega)$ be the space of infinitely differentiable functions of compact support on $\Omega$. Recall the definition of higher–order derivatives $D^a$ from Example 1.1(8). For each $k \in \mathbb{N}$ and $1 \le p < \infty$ define a norm
$$\|f\|_{k,p} = \Big( \int_\Omega \sum_{|a| \le k} |D^a f(x)|^p \, dx \Big)^{1/p}.$$
This gives an infinite family of (different) normed space structures on the linear space $C_0^\infty(\Omega)$. None of these spaces are complete because there are sequences of $C^\infty$ functions whose $(k,p)$–limit is not even continuous. The completions of these spaces are the Sobolev spaces. This example is of importance in distribution theory and the study of partial differential equations.

2. Contraction mapping theorem

In this section we prove the simplest of the many fixed–point theorems. Such theorems are useful for solving equations, and with the formalism of function spaces one uniform treatment may be given for numerical equations like $x = \cos(x)$ and differential equations like $\frac{dy}{dx} = x + \tan(xy)$, $y(0) = y_0$.

Definition 2.2. A map $F: X \to Y$ between normed linear spaces is called a contraction if there is a constant $K < 1$ for which
$$\|F(x) - F(y)\|_Y \le K \cdot \|x - y\|_X \qquad (7)$$
for all $x, y \in X$.

Exercise 2.2. [1] Show that any contraction is uniformly continuous.
[2] If $f: [a,b] \to [a,b]$ has the property that $|f(x) - f(y)| < |x - y|$ for all $x \ne y$, must $f$ be a contraction?
[3] Find an example of a function $f: \mathbb{R} \to \mathbb{R}$ that has the property $|f(x) - f(y)| < |x - y|$ for all $x \ne y$ in $\mathbb{R}$, but $f$ is not a contraction.

Exercise 2.3. If you have an electronic calculator, put it in "radians" mode. Starting with any initial value, press the cos button repeatedly. What happens? Can you explain why this happens? (Draw a graph.) How does this relate to the equation $x = \cos(x)$?
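The calculator experiment of pressing the cos button repeatedly can be imitated in a few lines. This is a hedged sketch: the function name `iterate_cos`, the stopping tolerance and the step limit are illustrative choices, not part of the notes. Whatever the starting value, the iterates settle down to the unique solution of $x = \cos(x)$, approximately $0.739085$. One way to see why: after one step every iterate lies in $[-1,1]$, after two it lies in $[\cos 1, 1] \subset [0.54, 1]$, cos maps that interval into itself, and there $|\cos'(x)| = |\sin x| \le \sin 1 < 0.85$, so cos is a contraction on it.

```python
import math

def iterate_cos(x0, tol=1e-12, max_steps=1000):
    """Press the cos button repeatedly: x_{n+1} = cos(x_n), stopping when steps become tiny."""
    x = x0
    for step in range(1, max_steps + 1):
        x_next = math.cos(x)
        if abs(x_next - x) < tol:
            return x_next, step
        x = x_next
    raise RuntimeError("no convergence within max_steps")

for start in (0.0, 1.0, 10.0, -250.0):
    root, steps = iterate_cos(start)
    print(f"x0 = {start:7.1f} -> {root:.9f} after {steps} steps")
```

Every starting value prints the same limit to nine decimal places; only the number of steps differs.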
Theorem 2.1. If $F: X \to X$ is a contraction and $X$ is a Banach space, then there is a unique point $x^* \in X$ which is fixed by $F$ (that is, $F(x^*) = x^*$). Moreover, if $x_0$ is any point in $X$, then the sequence defined by $x_1 = F(x_0)$, $x_2 = F(x_1), \dots$ converges to $x^*$.

Corollary 2.1. If $S$ is a closed subset of the Banach space $X$, and $F: S \to S$ is a contraction, then $F$ has a unique fixed point in $S$.

Proof. Simply notice that $S$ is itself complete (since it is a closed subset of a complete space), and the proof of Theorem 2.1 does not use the linear space structure of $X$.

Corollary 2.2. If $S$ is a closed subset of a Banach space, and $F: S \to S$ has the property that for some $n$ there is a $K < 1$ such that
$$\|F^n(x) - F^n(y)\| \le K \cdot \|x - y\|$$
for all $x, y \in S$, then $F$ has a unique fixed point.

Proof. Choose any point $x_0 \in S$. By hypothesis $F^n$ is a contraction, so by Corollary 2.1 we have
$$x = \lim_{k \to \infty} F^{kn} x_0,$$
where $x$ is the unique fixed point of $F^n$. Now $F^n(F(x)) = F(F^n(x)) = F(x)$, so $F(x)$ is also a fixed point of $F^n$, and by uniqueness $F(x) = x$; thus $x$ is a fixed point for $F$. (If $F$ is known to be continuous one can also argue directly: $\|F^{kn} F x_0 - F^{kn} x_0\| \le K \|F^{(k-1)n} F x_0 - F^{(k-1)n} x_0\| \le \dots \le K^k \|F(x_0) - x_0\|$, so $F(x) - x = \lim_{k \to \infty} (F F^{kn} x_0 - F^{kn} x_0) = 0$.) This fixed point is automatically unique: if $F$ has more than one fixed point, then so does $F^n$, which is impossible by Corollary 2.1.

Exercise 2.4. [1] Give an example of a map $f: \mathbb{R} \to \mathbb{R}$ which has the property that $|f(x) - f(y)| < |x - y|$ for all $x \ne y$ in $\mathbb{R}$ but $f$ has no fixed point.
[2] Let $f$ be a function from $[0,1]$ to $[0,1]$ with a continuous derivative $f'$ satisfying $|f'(x)| \le K < 1$ for all $x \in [0,1]$. Check that the contraction condition (7) holds. As an exercise, draw graphs to illustrate convergence of the iterates² of $f$ to a fixed point for examples with $0 < f'(x) < 1$ and $-1 < f'(x) < 0$.

Example 2.3. A basic linear problem is the following: let $F: \mathbb{R}^n \to \mathbb{R}^n$ be the affine map defined by $F(x) = Ax + b$,

² Iteration of continuous functions on the interval may be used to illustrate many of the features of dynamical systems, including frequency locking, sensitive dependence on initial conditions, period doubling, the Feigenbaum phenomena and so on. An excellent starting point is the article and demonstration "One–dimensional iteration" at the web site http://www.geom.umn.edu/java/.
where $A = (a_{ij})$ is an $n \times n$ matrix and $b \in \mathbb{R}^n$. Equivalently, $F(x) = y$, where
$$y_i = \sum_{j=1}^{n} a_{ij} x_j + b_i \quad \text{for } i = 1, \dots, n.$$
If $F$ is a contraction, then we can apply Theorem 2.1 to solve³ the equation $F(x) = x$. The conditions under which $F$ is a contraction depend on the choice of norm for $\mathbb{R}^n$. Three examples follow.

[1] Using the max norm, $\|x\|_\infty = \max_i |x_i|$. In this case,
$$\|F(x) - F(\tilde{x})\|_\infty = \max_i \Big| \sum_j a_{ij}(x_j - \tilde{x}_j) \Big| \le \max_i \sum_j |a_{ij}|\,|x_j - \tilde{x}_j| \le \max_i \sum_j |a_{ij}| \max_j |x_j - \tilde{x}_j| = \Big( \max_i \sum_j |a_{ij}| \Big) \|x - \tilde{x}\|_\infty.$$
Thus the contraction condition is
$$\sum_j |a_{ij}| \le K < 1 \quad \text{for } i = 1, \dots, n. \qquad (8)$$

[2] Using the 1–norm, $\|x\|_1 = \sum_{i=1}^{n} |x_i|$. In this case,
$$\|F(x) - F(\tilde{x})\|_1 = \sum_i \Big| \sum_j a_{ij}(x_j - \tilde{x}_j) \Big| \le \sum_i \sum_j |a_{ij}|\,|x_j - \tilde{x}_j| \le \Big( \max_j \sum_i |a_{ij}| \Big) \|x - \tilde{x}\|_1.$$
The contraction condition is now
$$\sum_i |a_{ij}| \le K < 1 \quad \text{for } j = 1, \dots, n. \qquad (9)$$

[3] Using the 2–norm, $\|x\|_2 = \big( \sum_{i=1}^{n} |x_i|^2 \big)^{1/2}$. In this case, by the Cauchy–Schwarz inequality,
$$\|F(x) - F(\tilde{x})\|_2^2 = \sum_i \Big| \sum_j a_{ij}(x_j - \tilde{x}_j) \Big|^2 \le \Big( \sum_i \sum_j a_{ij}^2 \Big) \|x - \tilde{x}\|_2^2.$$

³ Of course the equation is in one sense trivial. However, it is sometimes of importance computationally to avoid inverting matrices, and more importantly to have an iterative scheme that converges to a solution in some predictable fashion.
The contraction condition is now
$$\sum_i \sum_j a_{ij}^2 \le K < 1. \qquad (10)$$
It follows that if any one of the conditions (8), (9) or (10) holds, then there exists a unique solution in $\mathbb{R}^n$ to the affine equation $Ax + b = x$, and the solution may be approximated using the iterative scheme $x_1 = F(x_0)$, $x_2 = F(x_1), \dots$. Notice that each of the conditions (8), (9), (10) is sufficient for the method to work, but none of them are necessary. In fact, for each of the three conditions there are examples in which that condition holds and the other two fail.

Proof of Theorem 2.1. Given any point $x_0 \in X$, define a sequence $(x_n)$ by $x_1 = F(x_0)$, $x_2 = F(x_1), \dots$. Then, for any $n \le m$ we have by the contraction condition (7)
$$\begin{aligned} \|x_n - x_m\| &= \|F^n x_0 - F^m x_0\| \le K^n \|x_0 - F^{m-n} x_0\| \\ &\le K^n \big( \|x_0 - x_1\| + \|x_1 - x_2\| + \dots + \|x_{m-n-1} - x_{m-n}\| \big) \\ &\le K^n \|x_0 - x_1\| \big( 1 + K + K^2 + \dots + K^{m-n-1} \big) \\ &\le \frac{K^n}{1 - K} \|x_0 - x_1\|. \end{aligned}$$
Now for fixed $x_0$, the last expression converges to zero as $n$ goes to infinity, so (cf. Definition 1.9) the sequence $(x_n)$ is a Cauchy sequence. Since the linear space $X$ is complete (cf. Definition 1.10), the sequence $(x_n)$ therefore converges, say $x^* = \lim_{n \to \infty} x_n$. Since $F$ is continuous (any contraction is),
$$F(x^*) = F\big( \lim_{n \to \infty} x_n \big) = \lim_{n \to \infty} F(x_n) = \lim_{n \to \infty} x_{n+1} = x^*,$$
so $F$ has a fixed point $x^*$. To prove that $x^*$ is the only fixed point for $F$, notice that if $F(y) = y$ say, then
$$\|x^* - y\| = \|F(x^*) - F(y)\| \le K \|x^* - y\|,$$
which requires that $x^* = y$ since $K < 1$.

3. Applications to differential equations

As mentioned before, the most important applications of the contraction mapping method are to function spaces. We have seen already in Example 1.8[3] that fixed points for certain integral operators on function spaces are solutions of ordinary differential equations. The first result in this direction is due to Picard⁴.

⁴ (Charles) Émile Picard (1856–1941), who was Professor of higher analysis at the Sorbonne and became permanent secretary of the Paris Academy of Sciences. Some of his deepest results lie in complex analysis: 1) a non–constant entire function can omit at most one finite value, 2) a non–polynomial entire function takes on every value (except the possible exceptional one) an infinite number of times.
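The three sufficient conditions (8), (9), (10) for the affine map $F(x) = Ax + b$, and the iterative scheme $x_{k+1} = F(x_k)$, can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, matrices are plain Python lists of rows, and the $3 \times 3$ example matrix is an assumption chosen so that only condition (8) holds, which also shows that (9) and (10) are not necessary. Solving $x = Ax + b$ by hand for this matrix gives $x = (10, 10, 10)$.

```python
def max_row_sum(A):
    """Condition (8): max_i sum_j |a_ij| (the contraction constant for the max norm)."""
    return max(sum(abs(a) for a in row) for row in A)

def max_column_sum(A):
    """Condition (9): max_j sum_i |a_ij| (the contraction constant for the 1-norm)."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def sum_of_squares(A):
    """Condition (10): sum over i, j of a_ij^2."""
    return sum(a * a for row in A for a in row)

def affine_iterate(A, b, x0, steps):
    """Apply F(x) = Ax + b repeatedly: x_1 = F(x_0), x_2 = F(x_1), ..."""
    x = list(x0)
    for _ in range(steps):
        x = [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]
    return x

A = [[0.9, 0.0, 0.0],
     [0.9, 0.0, 0.0],
     [0.9, 0.0, 0.0]]
b = [1.0, 1.0, 1.0]

# (8) holds (0.9 < 1), while (9) and (10) fail (about 2.7 and 2.43, both > 1).
print(max_row_sum(A), max_column_sum(A), sum_of_squares(A))
x = affine_iterate(A, b, [0.0, 0.0, 0.0], steps=400)
print(x)  # close to the exact solution (10, 10, 10) of Ax + b = x
```

Because the max–norm contraction constant is $0.9$, the a priori bound from the proof of Theorem 2.1 guarantees an error of at most $0.9^k \|x_0 - x_1\|/(1 - 0.9)$ after $k$ steps; 400 steps is far more than needed here.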
Theorem 2.2. Let $f: G \to \mathbb{R}$ be a continuous function defined on a set $G$ containing a neighbourhood $\{(x,y) \mid \|(x,y) - (x_0,y_0)\|_\infty < e\}$ of $(x_0, y_0)$ for some $e > 0$. Suppose that $f$ satisfies a Lipschitz condition of the form
$$|f(x,y) - f(x,\tilde{y})| \le M |y - \tilde{y}| \qquad (11)$$
in the variable $y$ on $G$. Then there is an interval $(x_0 - \delta, x_0 + \delta)$ on which the ordinary differential equation
$$\frac{dy}{dx} = f(x,y) \qquad (12)$$
has a unique solution $y = \phi(x)$ satisfying the initial condition
$$\phi(x_0) = y_0. \qquad (13)$$

Proof. The differential equation (12) with initial condition (13) is equivalent to the integral equation
$$\phi(x) = y_0 + \int_{x_0}^{x} f(t, \phi(t))\,dt. \qquad (14)$$
Since $f$ is continuous there is a bound
$$|f(x,y)| \le R \qquad (15)$$
for all $(x,y)$ with $\|(x,y) - (x_0,y_0)\|_\infty \le e'$, for some $0 < e' < e$ (a continuous function is bounded on this closed square). Choose $\delta > 0$ such that
(1) $|x - x_0| \le \delta$ and $|y - y_0| \le R\delta$ together imply that $\|(x,y) - (x_0,y_0)\|_\infty \le e'$;
(2) $M\delta < 1$, where $M$ is the Lipschitz constant in (11).
Let $S$ be the set of continuous functions $\phi$ defined on the interval $|x - x_0| \le \delta$ with the property that $|\phi(x) - y_0| \le R\delta$, equipped with the sup metric. The set $S$ is complete, since it is a closed subset of a complete space. Define a mapping $F: S \to S$ by the equation
$$(F(\phi))(x) = y_0 + \int_{x_0}^{x} f(t, \phi(t))\,dt. \qquad (16)$$
First check that $F$ does indeed map $S$ into $S$: if $\phi \in S$, then
$$|F\phi(x) - y_0| = \Big| \int_{x_0}^{x} f(t, \phi(t))\,dt \Big| \le R|x - x_0| \le R\delta$$
by (15), so $F(\phi) \in S$. Moreover,
$$|F\phi(x) - F\tilde{\phi}(x)| \le \Big| \int_{x_0}^{x} |f(t, \phi(t)) - f(t, \tilde{\phi}(t))|\,dt \Big| \le M\delta \max_x |\phi(x) - \tilde{\phi}(x)|,$$
so that, after taking sups over $x$,
$$\|F(\phi) - F(\tilde{\phi})\| \le M\delta \|\phi - \tilde{\phi}\|.$$
By construction, $M\delta < 1$, so that $F$ is a contraction mapping. It follows from Corollary 2.1 that the operator $F$ has a unique fixed point in $S$, so the differential equation (12) with initial condition (13) has a unique solution.
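Picard's proof is constructive: the iterates $\phi_{k+1} = F(\phi_k)$ of the operator (16), started from the constant function $y_0$, converge uniformly to the solution on a small interval. The sketch below is a hedged numerical illustration rather than part of the notes: functions are represented by their values on a uniform grid of $[x_0, x_0 + \delta]$ and the integral in (16) is replaced by the cumulative trapezoidal rule, so the code only approximates $F$. The test problem $dy/dx = y$, $y(0) = 1$ (exact solution $e^x$) is an assumption chosen because its exact Picard iterates are the Taylor partial sums of $e^x$; here $M = 1$, so $\delta = 0.5$ gives $M\delta < 1$.

```python
import math

def picard_iterates(f, x0, y0, delta, n_iter, n_grid=2000):
    """Approximate Picard iteration phi_{k+1}(x) = y0 + int_{x0}^{x} f(t, phi_k(t)) dt.

    Functions are stored as values on a uniform grid of [x0, x0 + delta];
    the integral is approximated by the cumulative trapezoidal rule.
    """
    h = delta / n_grid
    xs = [x0 + i * h for i in range(n_grid + 1)]
    phi = [y0] * len(xs)                      # phi_0 is the constant function y0
    for _ in range(n_iter):
        g = [f(t, p) for t, p in zip(xs, phi)]
        new_phi = [y0]
        for i in range(1, len(xs)):
            new_phi.append(new_phi[-1] + 0.5 * h * (g[i - 1] + g[i]))
        phi = new_phi
    return xs, phi

# dy/dx = y, y(0) = 1 on [0, 0.5]; the Lipschitz constant is M = 1, so M * delta = 0.5 < 1.
xs, phi = picard_iterates(lambda x, y: y, 0.0, 1.0, 0.5, n_iter=20)
error = max(abs(p - math.exp(x)) for x, p in zip(xs, phi))
print(f"sup-norm error after 20 iterations: {error:.2e}")
```

The remaining error is dominated by the quadrature, not by the number of Picard steps: the contraction estimate already makes twenty iterations far more than enough on this interval.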
The conditions on the set $G$ used in Theorem 2.2 arise very often, so it is useful to have a short description for them.

Definition 2.3. A domain in a normed linear space $X$ is an open connected set (cf. Definition 1.11).

An example of a domain in $\mathbb{R}$ containing the point $a$ is an interval $(a - \delta, a + \delta)$ for some $\delta > 0$. Notice that if $G$ is a domain in $(X, \|\cdot\|_X)$ and $a \in G$, then for some $\varepsilon > 0$ the open ball $B_\varepsilon(a) = \{x \in X \mid \|x - a\|_X < \varepsilon\}$ lies in $G$ (cf. Definition 1.13).

Picard's theorem easily generalises to systems of simultaneous differential equations, and we shall see in the next section that the contraction mapping method also applies to certain integral equations.

Theorem 2.3. Let $G$ be a domain in $\mathbb{R}^{n+1}$ containing the point $(x_0, y_{01}, \dots, y_{0n})$, and let $f_1, \dots, f_n$ be continuous functions from $G$ to $\mathbb{R}$ each satisfying a Lipschitz condition
$$|f_i(x, y_1, \dots, y_n) - f_i(x, \tilde{y}_1, \dots, \tilde{y}_n)| \le M \max_{1 \le j \le n} |y_j - \tilde{y}_j| \qquad (17)$$
in the variables $y_1, \dots, y_n$. Then there is an interval $(x_0 - \delta, x_0 + \delta)$ on which the system of simultaneous ordinary differential equations
$$\frac{dy_i}{dx} = f_i(x, y_1, \dots, y_n) \quad \text{for } i = 1, \dots, n \qquad (18)$$
has a unique solution $y_1 = \phi_1(x), \dots, y_n = \phi_n(x)$ satisfying the initial conditions
$$\phi_1(x_0) = y_{01}, \dots, \phi_n(x_0) = y_{0n}. \qquad (19)$$

Proof. As in the proof of Theorem 2.2, write the system defined by (18) and (19) in integral form
$$\phi_i(x) = y_{0i} + \int_{x_0}^{x} f_i(t, \phi_1(t), \dots, \phi_n(t))\,dt \quad \text{for } i = 1, \dots, n. \qquad (20)$$
Since each of the functions $f_i$ is continuous on $G$, there is a bound
$$|f_i(x, y_1, \dots, y_n)| \le R \qquad (21)$$
in some domain $G' \subset G$ with $(x_0, y_{01}, \dots, y_{0n}) \in G'$. Choose $\delta > 0$ with the properties that
(1) $|x - x_0| \le \delta$ and $\max_i |y_i - y_{0i}| \le R\delta$ together imply that $(x, y_1, \dots, y_n) \in G'$;
(2) $M\delta < 1$.
Now define the set $S$ to be the set of $n$–tuples $(\phi_1, \dots, \phi_n)$ of continuous functions defined on the interval $[x_0 - \delta, x_0 + \delta]$ and such that $|\phi_i(x) - y_{0i}| \le R\delta$ for all $i = 1, \dots, n$ and $|x - x_0| \le \delta$. The set $S$ may be equipped with the norm
$$\|\phi - \tilde{\phi}\| = \max_{x, i} |\phi_i(x) - \tilde{\phi}_i(x)|.$$
It is easy to check that $S$ is complete. The mapping $F$ defined by the set of integral operators
$$(F(\phi))_i(x) = y_{0i} + \int_{x_0}^{x} f_i(t, \phi_1(t), \dots, \phi_n(t))\,dt, \quad i = 1, \dots, n,$$
is a contraction from $S$ to itself. To see this, first notice that if $\phi = (\phi_1, \dots, \phi_n) \in S$,
then
$$|(F(\phi))_i(x) - y_{0i}| = \Big| \int_{x_0}^{x} f_i(t, \phi_1(t), \dots, \phi_n(t))\,dt \Big| \le R\delta \quad \text{for } i = 1, \dots, n$$
by (21), so that $F(\phi) = ((F(\phi))_1, \dots, (F(\phi))_n)$ lies in $S$. It remains to check that $F$ is a contraction:
$$|(F(\phi))_i(x) - (F(\tilde{\phi}))_i(x)| \le \Big| \int_{x_0}^{x} |f_i(t, \phi_1(t), \dots, \phi_n(t)) - f_i(t, \tilde{\phi}_1(t), \dots, \tilde{\phi}_n(t))|\,dt \Big| \le M\delta \max_{x,i} |\phi_i(x) - \tilde{\phi}_i(x)|,$$
so, after maximising over $x$ and $i$, we have
$$\|F(\phi) - F(\tilde{\phi})\| \le M\delta \|\phi - \tilde{\phi}\|,$$
and $F: S \to S$ is a contraction. It follows that the equation (20) has a unique solution, so the system of differential equations (18) and (19) has a unique solution.

4. Applications to integral equations

Integral equations may be a little less familiar than differential equations (though we have seen already that the two are intimately connected), so we begin with some important examples. The theory of integral equations is largely modern (twentieth–century) mathematics, but several specific instances of integral equations had appeared earlier. Certain problems in physics led to the need to "invert" the integral equation
$$g(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ixy} f(y)\,dy \qquad (22)$$
for functions $f$ and $g$ of specific kinds. This was solved – formally at least – by Fourier⁵ in 1811, who noted that (22) requires that
$$f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ixy} g(y)\,dy.$$
We shall see later that this is really due to properties of particularly good Banach spaces called Hilbert spaces.

⁵ Jean Baptiste Joseph Fourier (1768–1830), who pursued interests in mathematics and mathematical physics. He became famous for his Théorie analytique de la Chaleur (1822), a mathematical treatment of the theory of heat. He established the partial differential equation governing heat diffusion and solved it by using infinite series of trigonometric functions. Though these series had been used before, Fourier investigated them in much greater detail. His research, initially criticized for its lack of rigour, was later shown to be valid. It provided the impetus for later work on trigonometric series and the theory of functions of a real variable.
Abel⁶ studied generalizations of the tautochrone⁷ problem, and was led to the integral equation
$$g(x) = \int_a^x \frac{f(y)}{(x - y)^b}\,dy, \qquad b \in (0,1), \quad g(a) = 0,$$
for which he found the solution
$$f(y) = \frac{\sin \pi b}{\pi} \int_a^y \frac{g'(x)}{(y - x)^{1-b}}\,dx.$$
This equation is an example of a Volterra⁸ equation. We shall briefly study two kinds of integral equation (though the second is formally a special case of the first).

Example 2.4. A Fredholm equation⁹ is an integral equation of the form
$$f(x) = \lambda \int_a^b K(x,y) f(y)\,dy + \phi(x), \qquad (23)$$
where $K$ and $\phi$ are two given functions, and we seek a solution $f$ in terms of the arbitrary (constant) parameter $\lambda$. The function $K$ is called the kernel of the equation, and the equation is called homogeneous if $\phi = 0$. We assume that $K(x,y)$ and $\phi(x)$ are continuous on the square $\{(x,y) \mid a \le x \le b,\ a \le y \le b\}$. It follows in particular (see Section 1.9) that there is a bound $M$ so that $|K(x,y)| \le M$ for all $a \le x \le b$, $a \le y \le b$. Define a mapping $F: C[a,b] \to C[a,b]$ by
$$(F(f))(x) = \lambda \int_a^b K(x,y) f(y)\,dy + \phi(x). \qquad (24)$$
Now
$$\|F(f_1) - F(f_2)\| = \max_x |F(f_1)(x) - F(f_2)(x)| \le |\lambda| M (b - a) \max_x |f_1(x) - f_2(x)| = |\lambda| M (b - a) \|f_1 - f_2\|,$$
⁶ Niels Henrik Abel (1802–1829) was a brilliant Norwegian mathematician. He earned wide recognition at the age of 18 with his first paper, in which he proved that the general polynomial equation of the fifth degree is insolvable by algebraic procedures (problems of this sort are studied in Galois Theory). Abel was instrumental in establishing mathematical analysis on a rigorous basis. In his major work, Recherches sur les fonctions elliptiques (Investigations on Elliptic Functions, 1827), he revolutionized the understanding of elliptic functions by studying the inverse of these functions.
⁷ Also called an isochrone: a curve along which a pendulum takes the same time to make a complete oscillation, independent of the amplitude of the oscillation. The resulting differential equation was solved by James Bernoulli in May 1690, who showed that the result is a cycloid.
⁸ Vito Volterra (1860–1940) succeeded Beltrami as professor of Mathematical Physics at Rome. His method for solving the equations that carry his name is exactly the one we shall use. He worked widely in analysis and integral equations, and helped drive Lebesgue to produce a more sophisticated integration by giving an example of a function with bounded derivative whose derivative is not Riemann integrable.
⁹ This is really a Fredholm equation "of the second kind", named after the Swedish geometer Erik Ivar Fredholm (1866–1927).
so that $F$ is a contraction mapping if
$$|\lambda| < \frac{1}{M(b - a)}.$$
It follows by Theorem 2.1 that the equation (23) has a unique continuous solution $f$ for small enough values of $|\lambda|$, and the solution may be obtained by starting with any continuous function $f_0$ and then iterating the scheme
$$f_{n+1}(x) = \lambda \int_a^b K(x,y) f_n(y)\,dy + \phi(x).$$
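The iteration $f_{n+1} = F(f_n)$ for the Fredholm equation can be sketched numerically by replacing the integral in (24) with a quadrature rule. The example below rests on assumptions that are not in the notes: the kernel $K(x,y) = xy$ on $[0,1]$, $\phi(x) = x$ and $\lambda = \frac12$. These are chosen because the exact solution is then known in closed form: writing $f(x) = x(1 + \lambda c)$ with $c = \int_0^1 y f(y)\,dy$ gives $c = 1/(3 - \lambda)$, so $f(x) = \frac{6x}{5}$. Here $M = 1$ and $b - a = 1$, so the condition $|\lambda| < 1/(M(b - a))$ holds. The function name `fredholm_iterate` is hypothetical.

```python
def fredholm_iterate(kernel, phi, lam, a, b, n_iter, n_grid=200):
    """Iterate f_{n+1}(x) = lam * int_a^b K(x, y) f_n(y) dy + phi(x) on a uniform grid.

    The integral is approximated by the trapezoidal rule; f_0 is the zero function.
    """
    h = (b - a) / n_grid
    xs = [a + i * h for i in range(n_grid + 1)]
    weights = [h / 2 if i in (0, n_grid) else h for i in range(n_grid + 1)]
    f = [0.0] * len(xs)                       # f_0 = 0
    for _ in range(n_iter):
        # The right-hand side uses the previous iterate f throughout.
        f = [lam * sum(w * kernel(x, y) * fy for w, y, fy in zip(weights, xs, f)) + phi(x)
             for x in xs]
    return xs, f

xs, f = fredholm_iterate(lambda x, y: x * y, lambda x: x, 0.5, 0.0, 1.0, n_iter=40)
error = max(abs(fx - 1.2 * x) for x, fx in zip(xs, f))
print(f"max error against f(x) = 6x/5: {error:.1e}")
```

The iteration contracts by a factor of at most $|\lambda| M (b - a) = \frac12$ per step, so after 40 steps the remaining error comes from the trapezoidal rule rather than from the fixed-point iteration.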
Example 2.5. Now consider the Volterra equation,
$$f(x) = \lambda \int_a^x K(x,y) f(y)\,dy + \phi(x), \qquad (25)$$
which only differs¹⁰ from the Fredholm equation (23) in that the variable $x$ appears as the upper limit of integration. As before, define a function $F: C[a,b] \to C[a,b]$ by
$$(F(f))(x) = \lambda \int_a^x K(x,y) f(y)\,dy + \phi(x).$$
Then for $f_1, f_2 \in C[a,b]$ we have
$$|F(f_1)(x) - F(f_2)(x)| = |\lambda| \Big| \int_a^x K(x,y)[f_1(y) - f_2(y)]\,dy \Big| \le |\lambda| M (x - a) \max_x |f_1(x) - f_2(x)|,$$
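where $M = \max_{x,y} |K(x,y)| < \infty$. It follows that
$$\begin{aligned} |F^2(f_1)(x) - F^2(f_2)(x)| &= |\lambda| \Big| \int_a^x K(x,y)[F(f_1)(y) - F(f_2)(y)]\,dy \Big| \le |\lambda| M \int_a^x |F(f_1)(y) - F(f_2)(y)|\,dy \\ &\le |\lambda|^2 M^2 \max_x |f_1(x) - f_2(x)| \int_a^x (y - a)\,dy = |\lambda|^2 M^2 \frac{(x - a)^2}{2} \max_x |f_1(x) - f_2(x)|, \end{aligned}$$
and in general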
$$|F^n(f_1)(x) - F^n(f_2)(x)| \le |\lambda|^n M^n \frac{(x - a)^n}{n!} \max_x |f_1(x) - f_2(x)| \le |\lambda|^n M^n \frac{(b - a)^n}{n!} \max_x |f_1(x) - f_2(x)|.$$
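It follows that
$$\|F^n f_1 - F^n f_2\| \le |\lambda|^n M^n \frac{(b - a)^n}{n!} \|f_1 - f_2\|,$$
so that $F^n$ is a contraction mapping if $n$ is chosen large enough to ensure that
$$|\lambda|^n M^n \frac{(b - a)^n}{n!} < 1.$$
Such an $n$ always exists, since $c^n/n! \to 0$ as $n \to \infty$ for every constant $c$.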
¹⁰ If we extend the definition of the kernel $K(x,y)$ appearing in (25) by setting $K(x,y) = 0$ for all $y > x$, then (25) becomes an instance of the Fredholm equation (23). This is not done because the contraction mapping method applied to the Volterra equation directly gives a better result, in that the condition on $\lambda$ can be avoided.
It follows by Corollary 2.2 that the equation (25) has a unique solution for all λ.
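The footnote's remark that the Volterra iteration needs no restriction on $\lambda$ can be seen numerically. In the hedged sketch below, the kernel $K(x,y) = 1$ on $[0,1]$, $\phi(x) = 1$ and $\lambda = 5$ are assumptions chosen for illustration: (25) then reads $f(x) = 5\int_0^x f(y)\,dy + 1$, whose solution is $f(x) = e^{5x}$. Here $|\lambda| M (b - a) = 5$, so the estimate used for the Fredholm equation does not show that $F$ is a contraction, yet the iterates still converge because $F^n$ is a contraction for large $n$. The name `volterra_iterate` is hypothetical, and the integral is approximated by the trapezoidal rule.

```python
import math

def volterra_iterate(kernel, phi, lam, a, b, n_iter, n_grid=200):
    """Iterate f_{k+1}(x) = lam * int_a^x K(x, y) f_k(y) dy + phi(x) on a uniform grid.

    For each grid point x_i the integral over [a, x_i] uses the trapezoidal rule; f_0 = 0.
    """
    h = (b - a) / n_grid
    xs = [a + i * h for i in range(n_grid + 1)]
    f = [0.0] * len(xs)                       # f_0 = 0
    for _ in range(n_iter):
        new_f = []
        for i, x in enumerate(xs):
            values = [kernel(x, y) * fy for y, fy in zip(xs[: i + 1], f[: i + 1])]
            # Trapezoidal rule on [a, x_i]; for i = 0 this gives 0.
            integral = h * (sum(values) - 0.5 * (values[0] + values[-1]))
            new_f.append(lam * integral + phi(x))
        f = new_f
    return xs, f

xs, f = volterra_iterate(lambda x, y: 1.0, lambda x: 1.0, 5.0, 0.0, 1.0, n_iter=40)
rel_error = max(abs(fx - math.exp(5 * x)) / math.exp(5 * x) for x, fx in zip(xs, f))
print(f"max relative error against exp(5x): {rel_error:.1e}")
```

With $|\lambda| = 5$ the factor $5^n/n!$ from the estimate above only drops below 1 once $n \ge 12$, and 40 iterations make the iteration error negligible compared with the quadrature error.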
CHAPTER 3
Linear Transformations

Let $X$ and $Y$ be linear spaces, and $T$ a function from the set $D_T \subset X$ into $Y$. Sometimes such functions will be called operators, mappings or transformations. The set $D_T$ is the domain of $T$, and $T(D_T) \subset Y$ is the range of $T$. If the set $D_T$ is a linear subspace of $X$, then $T$ is a linear map if
$$T(\alpha x + \beta y) = \alpha T(x) + \beta T(y) \quad \text{for all } \alpha, \beta \in \mathbb{R} \text{ or } \mathbb{C},\ x, y \in D_T. \qquad (26)$$
Notice that a linear operator is injective if and only if the kernel $\{x \in X \mid Tx = 0\}$ is trivial. A simple observation that is useful in differential equations, where it is called the principle of superposition: if $\sum_{n=1}^{\infty} \alpha_n x_n$ is convergent, and $T$ is a continuous linear map, then $T\big( \sum_{n=1}^{\infty} \alpha_n x_n \big) = \sum_{n=1}^{\infty} \alpha_n T x_n$.

Lemma 3.1. A linear transformation $T: X \to Y$ is continuous if and only if it is continuous at one point.

Proof. Assume that $T$ is continuous at a point $a$. Then for any sequence $x_n \to a$, $T(x_n) \to T(a)$. Let $z$ be any point in $X$, and $(y_n)$ a sequence with $y_n \to z$. Then $y_n - z + a$ is a sequence converging to $a$, so $T(y_n - z + a) = T(y_n) - T(z) + T(a) \to T(a)$. It follows that $T(y_n) \to T(z)$.

1. Bounded operators

Example 3.1. Consider a voltage $v(t)$ applied to a resistor $R$, capacitor $C$, and inductor $L$ arranged in series (an "LCR" circuit). The charge $u = u(t)$ on the capacitor satisfies the equation
$$L \frac{d^2 u}{dt^2} + R \frac{du}{dt} + \frac{1}{C} u = v, \qquad (27)$$
with some initial conditions, say $u(0) = 0$, $\frac{du}{dt}(0) = 0$. Assuming that $R^2 > 4L/C$, the solution of (27) is
$$u(t) = \int_0^t k(t - s) v(s)\,ds, \qquad (28)$$
where
$$k(t) = \frac{e^{\lambda_1 t} - e^{\lambda_2 t}}{L(\lambda_1 - \lambda_2)}$$
and $\lambda_1, \lambda_2$ are the (distinct) roots of $L\lambda^2 + R\lambda + \frac{1}{C} = 0$. This problem may be phrased in terms of linear operators. Let $X = C[0, \infty)$; then the transformation defined by $T(v) = u$ in (28) is a linear operator from $X$ to $X$.
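As a hedged check of (28), the sketch below evaluates the kernel $k(t)$ for one choice of component values, computes $u = T(v)$ for the constant voltage $v \equiv 1$ by the trapezoidal rule, and compares it with the closed form obtained by integrating $k$. The values $R = 3$, $L = 1$, $C = \frac12$ are illustrative assumptions (they satisfy $R^2 = 9 > 4L/C = 8$ and give roots $\lambda_1 = -1$, $\lambda_2 = -2$ of $L\lambda^2 + R\lambda + 1/C = 0$), and the helper names are hypothetical. For a constant unit voltage the charge should approach the steady state $u = C$, since then (27) reduces to $u/C = 1$.

```python
import math

R, L, C = 3.0, 1.0, 0.5                  # illustrative values with R**2 > 4*L/C
disc = R * R - 4.0 * L / C                # discriminant of L*lam**2 + R*lam + 1/C
lam1 = (-R + math.sqrt(disc)) / (2.0 * L)
lam2 = (-R - math.sqrt(disc)) / (2.0 * L)

def k(t):
    """Kernel of (28): (exp(lam1 t) - exp(lam2 t)) / (L (lam1 - lam2))."""
    return (math.exp(lam1 * t) - math.exp(lam2 * t)) / (L * (lam1 - lam2))

def T(v, t, n=2000):
    """u(t) = int_0^t k(t - s) v(s) ds, approximated by the trapezoidal rule."""
    h = t / n
    values = [k(t - i * h) * v(i * h) for i in range(n + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

def u_exact_constant_voltage(t):
    """Closed form of int_0^t k(s) ds, i.e. T(v)(t) for v = 1."""
    return ((math.exp(lam1 * t) - 1.0) / lam1
            - (math.exp(lam2 * t) - 1.0) / lam2) / (L * (lam1 - lam2))

for t in (0.5, 2.0, 10.0):
    print(t, T(lambda s: 1.0, t), u_exact_constant_voltage(t))
```

The kernel also satisfies $k(0) = 0$ and $k'(0) = 1/L$, which is exactly what makes $u = T(v)$ satisfy (27) with zero initial conditions.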
On the other hand, if we take $X = C[0,\infty)$ and $Y = C^2[0,\infty)$, then (27) can be written in the form $S(u) = v$ for some linear operator $S$. The transformations $T$ and $S$ are closely related, and we would like to develop a framework for viewing them as inverse to each other.

Definition 3.1. A linear transformation $T : X \to Y$ is (algebraically) invertible if there is a linear transformation $S : Y \to X$ with the property that $TS = 1_Y$ and $ST = 1_X$; we then write $T^{-1} = S$.

For example, in Example 3.1 the operator $S$ cannot be defined on all of $C[0,\infty)$, only on the dense linear subspace of twice-differentiable functions. Regarded as a map from $C[0,\infty)$ onto the twice-differentiable functions satisfying the initial conditions, $T$ is algebraically invertible with $T^{-1} = S$.

Definition 3.2. A linear operator $T : X \to Y$ is bounded if there is a constant $K$ such that $\|Tx\|_Y \le K\|x\|_X$ for all $x \in X$. The norm of the bounded linear operator $T$ is
$$\|T\| = \sup_{x \ne 0}\frac{\|Tx\|_Y}{\|x\|_X}. \tag{29}$$

Exercise 3.1. [1] Show that $\|T\| = \sup_{\|x\|_X = 1}\{\|Tx\|_Y\}$.
[2] Prove the following useful inequality: $\|Tx\|_Y \le \|T\| \cdot \|x\|_X$ for all $x \in X$.

Theorem 3.1. A linear transformation $T : X \to Y$ is continuous if and only if it is bounded.

Proof. If $T$ is bounded and $x_n \to 0$, then $\|Tx_n\|_Y \le K\|x_n\|_X \to 0$, so $Tx_n \to 0$ also. It follows that $T$ is continuous at $0$, so by Lemma 3.1 $T$ is continuous everywhere. Conversely, suppose that $T$ is continuous but unbounded. Then for any $n \in \mathbb{N}$ there is a point $x_n$ with $\|Tx_n\| > n\|x_n\|$. Let $y_n = \frac{x_n}{n\|x_n\|}$, so that $y_n \to 0$ as $n \to \infty$. However, $\|Ty_n\| > 1$ and $T(0) = 0$, contradicting the assumption that $T$ is continuous at $0$.

Example 3.2. In Example 3.1, the operator $T$ is bounded when restricted to $C[0,a]$ (with the sup norm) for any $a > 0$, since
$$|Tv(t)| \le \int_0^t |k(t-s)| \cdot |v(s)|\,ds,$$
which shows that
$$\|Tv\|_\infty \le a\sup_{0 \le t \le a}|k(t)|\,\|v\|_\infty < \infty. \tag{30}$$
The operator $S$ is not bounded of course: think about what differentiation does.

2. The space of linear operators

The set of all linear transformations $X \to Y$ is itself a linear space with the operations $(T + S)(x) = Tx + Sx$, $(\lambda T)(x) = \lambda Tx$. Denote this linear space by $L(X,Y)$. If $X$ and $Y$ are normed spaces, denote by $B(X,Y)$ the subspace of continuous linear transformations. If $X = Y$, then write $L(X,X) = L(X)$ and $B(X,X) = B(X)$.
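The contrast in Example 3.2 between the bounded operator $T$ and the unbounded operator $S$ shows up numerically. This sketch reuses the hypothetical values $L = 1$, $R = 3$, $C = 1$ and feeds in $v_n(s) = \sin(ns)$, each of sup norm at most 1 on $[0,a]$ with $a = 2$. The sampled $\|Tv_n\|_\infty$ stays below the constant $a\sup|k|$ of (30), while $\|Sv_n\|_\infty$ grows roughly like $Ln^2$. Function names are our own.

```python
import math

# Hypothetical circuit values (overdamped: R**2 = 9 > 4L/C = 4), chosen only for illustration.
L, R, C = 1.0, 3.0, 1.0
disc = math.sqrt(R * R - 4.0 * L / C)
lam1, lam2 = (-R + disc) / (2.0 * L), (-R - disc) / (2.0 * L)


def k(t):
    """Kernel of (28)."""
    return (math.exp(lam1 * t) - math.exp(lam2 * t)) / (L * (lam1 - lam2))


def T(v, t, steps=400):
    """Trapezoidal approximation of (Tv)(t) = int_0^t k(t - s) v(s) ds."""
    if t == 0.0:
        return 0.0
    h = t / steps
    vals = [k(t - i * h) * v(i * h) for i in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


a = 2.0
grid = [a * i / 100 for i in range(101)]       # sample points of [0, a]
bound = a * max(abs(k(t)) for t in grid)       # a * sup|k|, the constant in (30)

results = {}
for n in (1, 10, 50):
    v = lambda s, n=n: math.sin(n * s)         # ||v||_inf <= 1 on [0, a]
    sup_Tv = max(abs(T(v, t)) for t in grid)
    # (Sv)(t) = L v'' + R v' + v / C, using the exact derivatives of sin(n t).
    sup_Sv = max(abs(-L * n * n * math.sin(n * t) + R * n * math.cos(n * t)
                     + math.sin(n * t) / C) for t in grid)
    results[n] = (sup_Tv, sup_Sv)
    print(n, round(sup_Tv, 4), round(sup_Sv, 1))
```

The sup norms are only sampled on a grid, so this illustrates (30) rather than proving anything.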
Lemma 3.2. Let $X$ and $Y$ be normed spaces. Then $B(X,Y)$ is a normed linear space with the norm (29). If in addition $Y$ is a Banach space, then $B(X,Y)$ is a Banach space.

Proof. We have to show that the function $T \mapsto \|T\|$ satisfies the conditions for a norm.
(1) It is clear that $\|T\| \ge 0$, since it is defined as the supremum of a set of non-negative numbers. If $\|T\| = 0$ then $\|Tx\|_Y = 0$ for all $x$, so $Tx = 0$ for all $x$; that is, $T = 0$.
(2) The triangle inequality is also clear:
$$\|T + S\| = \sup_{\|x\|=1}\|(T+S)x\| \le \sup_{\|x\|=1}\|Tx\| + \sup_{\|x\|=1}\|Sx\| = \|T\| + \|S\|.$$
(3) $\|\lambda T\| = \sup_{\|x\|=1}\|(\lambda T)x\| = |\lambda|\sup_{\|x\|=1}\|Tx\| = |\lambda|\,\|T\|$.

Finally, assume that $Y$ is a Banach space and let $(T_n)$ be a Cauchy sequence in $B(X,Y)$. Then the sequence is bounded: since $\big|\|T_n\| - \|T_m\|\big| \le \|T_n - T_m\|$, the real sequence $(\|T_n\|)$ is Cauchy, so there is a constant $K$ with $\|T_n x\| \le K\|x\|$ for all $x \in X$ and $n \ge 1$. Since
$$\|T_n x - T_m x\| \le \|T_n - T_m\|\,\|x\| \to 0 \quad \text{as } n \ge m \to \infty,$$
the sequence $(T_n x)$ is a Cauchy sequence in $Y$ for each $x \in X$. Since $Y$ is complete, for each $x \in X$ the sequence $(T_n x)$ converges; define $T$ by
$$Tx = \lim_{n\to\infty}T_n x.$$
It is clear that $T$ is linear, and $\|Tx\| \le K\|x\|$ for all $x$, so $T \in B(X,Y)$. We have not yet established that $T_n \to T$ in the norm of $B(X,Y)$ (cf. (29)). Since $(T_n)$ is Cauchy, for any $\epsilon > 0$ there is an $N$ such that $\|T_m - T_n\| \le \epsilon$ for all $m \ge n \ge N$. For any $x \in X$ we therefore have
$$\|T_m x - T_n x\|_Y \le \epsilon\|x\|_X \quad \text{for all } m \ge n \ge N.$$
Take the limit as $m \to \infty$ to see that $\|Tx - T_n x\|_Y \le \epsilon\|x\|_X$ if $n \ge N$, so that $\|T - T_n\| \le \epsilon$. This proves that $\|T - T_n\| \to 0$ as $n \to \infty$.

Once the space of linear operators is known to be complete, we can do analysis on the operators themselves. For example, if $X$ is a Banach space and $A \in B(X)$, then we may define an operator
$$e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots,$$
which makes sense since $\|A^n\| \le \|A\|^n$ (see Example 3.4[2] below), so the series converges absolutely in the Banach space $B(X)$, and
$$\|e^A\| \le 1 + \|A\| + \frac{1}{2!}\|A\|^2 + \cdots = e^{\|A\|}.$$
This is particularly useful in linear systems theory and control theory.

Example 3.3. If $x(t) \in \mathbb{R}^n$, then the linear differential equation
$$\frac{dx}{dt} = Ax(t), \qquad x(0) = x_0,$$
where $A$ is an $n \times n$ matrix, has as solution $x(t) = e^{At}x_0$.
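The exponential series of Example 3.3 can be summed directly. This sketch is plain Python with helper names of our own, and it truncates the series after 30 terms, which is ample for matrices of moderate norm. It solves $dx/dt = Ax$ for the rotation generator $A = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}$, where the exact solution from $x_0 = (1,0)$ is $(\cos t, \sin t)$. It also checks $\|e^B\| \le e^{\|B\|}$ for one sample matrix, using the operator norm for the sup norm on $\mathbb{R}^n$, which is the largest absolute row sum (a standard fact assumed here).

```python
import math


def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(A, terms=30):
    """Partial sum I + A + A^2/2! + ... + A^(terms-1)/(terms-1)! of the series for e^A."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]           # holds A^m / m!, starting at m = 0
    for m in range(1, terms):
        term = [[x / m for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def inf_norm(M):
    """Operator norm on (R^n, sup norm): the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)


def solve_linear_ode(A, x0, t):
    """x(t) = e^{At} x0 solves dx/dt = A x, x(0) = x0."""
    E = mat_exp([[t * a for a in row] for row in A])
    return [sum(E[i][j] * x0[j] for j in range(len(x0))) for i in range(len(x0))]


A = [[0.0, -1.0], [1.0, 0.0]]        # generator of rotations of the plane
x = solve_linear_ode(A, [1.0, 0.0], 1.0)
print(x)                              # close to [cos(1), sin(1)]

B = [[0.5, 2.0], [-1.0, 0.3]]
print(inf_norm(mat_exp(B)) <= math.exp(inf_norm(B)))   # True
```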
3. Banach algebras

In many situations it makes sense to multiply elements of a normed linear space together.

Definition 3.3. Let $X$ be a Banach space, and assume there is a multiplication $(x,y) \mapsto xy$ from $X \times X \to X$ such that addition and multiplication make $X$ into a ring, and $\|xy\| \le \|x\|\,\|y\|$. Then $X$ is called a Banach algebra. Recall that a ring does not need to have a unit; if $X$ has a unit then it is called unital.

Example 3.4. [1] The continuous functions $C[0,1]$ with the sup norm form a Banach algebra with $(fg)(x) = f(x)g(x)$.
[2] If $X$ is any Banach space, then $B(X)$ is a Banach algebra:
$$\|ST\| = \sup_{\|x\|=1}\|(ST)x\| = \sup_{\|x\|=1}\|S(Tx)\| \le \|S\|\sup_{\|x\|=1}\|Tx\| = \|S\|\,\|T\|.$$
The algebra has an identity, namely $I(x) = x$.
[3] A special case of [2] is the case $X = \mathbb{R}^n$. By choosing a basis for $\mathbb{R}^n$ we may identify $B(\mathbb{R}^n)$ with the space of $n \times n$ real matrices.

4. Uniform boundedness

In the next few sections we will prove the more technical results about linear transformations that provide the basic tools of functional analysis. The first theorem is the principle of uniform boundedness or the Banach–Steinhaus theorem.

Theorem 3.2. Let $X$ be a Banach space and let $Y$ be a normed linear space. Let $\{T_\alpha\}$ be a family of bounded linear operators from $X$ into $Y$. If, for each $x \in X$, the set $\{T_\alpha x\}$ is a bounded subset of $Y$, then the set $\{\|T_\alpha\|\}$ is bounded.

Proof. Assume first that there is a ball $B_\epsilon(x_0)$ on which the family of functions $x \mapsto T_\alpha x$ is uniformly bounded: that is, there is a constant $K$ such that
$$\|T_\alpha x\| \le K \quad \text{for all } \alpha \text{ whenever } \|x - x_0\| < \epsilon. \tag{31}$$
Then it is possible to find a uniform bound on the whole family $\{\|T_\alpha\|\}$. For any $y \ne 0$ define
$$z = \frac{\epsilon}{2\|y\|}\,y + x_0.$$
Then $z \in B_\epsilon(x_0)$ by construction, so (31) implies that $\|T_\alpha z\| \le K$. Now by linearity of $T_\alpha$ this shows that
$$\frac{\epsilon}{2\|y\|}\|T_\alpha y\| - \|T_\alpha x_0\| \le \left\|\frac{\epsilon}{2\|y\|}T_\alpha y + T_\alpha x_0\right\| = \|T_\alpha z\| \le K,$$
which can be solved for $\|T_\alpha y\|$:
$$\|T_\alpha y\| \le \frac{2}{\epsilon}\left(K + \|T_\alpha x_0\|\right)\|y\| \le \frac{2}{\epsilon}\left(K + K'\right)\|y\|,$$
where $K' = \sup_\alpha\|T_\alpha x_0\| < \infty$ by hypothesis. It follows that $\|T_\alpha\| \le \frac{2}{\epsilon}(K + K')$ for all $\alpha$, as required.

To finish the proof we have to show that there is a ball on which property (31) holds. This is proved by a contradiction argument: assume for now that there is no ball on which (31) holds. Fix an arbitrary ball $B_0$. By assumption there is a point $x_1 \in B_0$ such that $\|T_{\alpha_1}x_1\| > 1$ for some index $\alpha_1$, say. Since each $T_\alpha$ is continuous, there is a ball $B_{\epsilon_1}(x_1) \subset B_0$ in which $\|T_{\alpha_1}x\| > 1$. Assume without loss of generality that $\epsilon_1 < 1$. By assumption, in this new ball the family $\{T_\alpha x\}$ is not bounded, so there is a point $x_2 \in B_{\epsilon_1}(x_1)$ with $\|T_{\alpha_2}x_2\| > 2$ for some index $\alpha_2 \ne \alpha_1$. Continue in the same way: by continuity of $T_{\alpha_2}$ there is a ball $B_{\epsilon_2}(x_2) \subset B_{\epsilon_1}(x_1)$ on which $\|T_{\alpha_2}x\| > 2$. Assume without loss of generality that $\epsilon_2 < \frac12$. Repeating this process produces points $x_3, x_4, x_5, \dots$, indices $\alpha_3, \alpha_4, \alpha_5, \dots$ and positive numbers $\epsilon_3, \epsilon_4, \epsilon_5, \dots$ such that $B_{\epsilon_n}(x_n) \subset B_{\epsilon_{n-1}}(x_{n-1})$, $\epsilon_n < \frac1n$, all the $\alpha_j$'s are distinct, and $\|T_{\alpha_n}x\| > n$ for all $x \in B_{\epsilon_n}(x_n)$. Now the sequence $(x_n)$ is clearly Cauchy and therefore converges to $z \in X$, say (equivalently, prove that $\bigcap_{n=1}^\infty \overline{B}_{\epsilon_n}(x_n)$ contains the single point $z$). By construction, $\|T_{\alpha_n}z\| \ge n$ for all $n \ge 1$, which contradicts the hypothesis that the set $\{T_\alpha z\}$ is bounded.

Recall the operator norm in Definition 3.2. Corresponding to this norm there is a notion of convergence in $B(X,Y)$: we say that a sequence $(T_n)$ is uniformly convergent if there is $T \in B(X,Y)$ with $\|T_n - T\| \to 0$ as $n \to \infty$ (so uniform convergence of a sequence of operators is simply convergence in the operator norm).

Definition 3.4. A sequence $(T_n)$ in $B(X,Y)$ is strongly convergent if, for any $x \in X$, the sequence $(T_n x)$ converges in $Y$. If there is a $T \in B(X,Y)$ with $\lim_n T_n x = Tx$ for all $x \in X$, then $(T_n)$ is strongly convergent to $T$.

Exercise 3.2. [1] Prove that uniform convergence implies strong convergence.
[2] Show by example that strong convergence does not imply uniform convergence.

Theorem 3.3. Let $X$ be a Banach space and $Y$ any normed linear space. If a sequence $(T_n)$ in $B(X,Y)$ is strongly convergent, then there exists $T \in B(X,Y)$ such that $(T_n)$ is strongly convergent to $T$.

Proof. For each $x \in X$ the sequence $(T_n x)$ is bounded since it is convergent. By the uniform boundedness principle (Theorem 3.2), there is a constant $K$ such that $\|T_n\| \le K$ for all $n$. Hence
$$\|T_n x\| \le K\|x\| \quad \text{for all } x \in X. \tag{32}$$
Define $T$ by requiring that $Tx = \lim_{n\to\infty}T_n x$ for all $x \in X$. It is clear that $T$ is linear, and (32) shows that $\|Tx\| \le K\|x\|$ for all $x \in X$, showing that $T$ is bounded. The construction of $T$ means that $(T_n)$ converges strongly to $T$.
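The inequality $\|ST\| \le \|S\|\,\|T\|$ from Example 3.4[2] can be tested on matrices, as in Example 3.4[3]. This sketch assumes that $\mathbb{R}^n$ carries the sup norm. In that case the operator norm of a matrix is its largest absolute row sum, a standard fact taken for granted here and not proved in these notes. Random $3 \times 3$ matrices never violate the inequality, and the identity matrix has norm 1. Names are our own.

```python
import random


def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def op_norm_inf(M):
    """Norm of M in B(R^n) when R^n carries the sup norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)


random.seed(0)
worst = 0.0                       # largest observed value of ||ST|| / (||S|| ||T||)
for _ in range(2000):
    S = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    T = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    ratio = op_norm_inf(mat_mul(S, T)) / (op_norm_inf(S) * op_norm_inf(T))
    worst = max(worst, ratio)

identity = [[float(i == j) for j in range(3)] for i in range(3)]
print(worst)                      # never above 1: ||ST|| <= ||S|| ||T|| in every trial
print(op_norm_inf(identity))      # 1.0
```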
5. An application of uniform boundedness to Fourier series

This section is an application of Theorem 3.2 to Fourier analysis. We will encounter Fourier analysis again, in the context of Hilbert spaces and $L^2$ functions. For now we take a naive view of Fourier analysis: the functions will all be continuous periodic functions, and we compute Fourier coefficients using Riemann integration.

Definition 3.5. If $f : (0,2\pi) \to \mathbb{R}$ is Riemann-integrable, then the Fourier series of $f$ is the series
$$s(x) = \sum_{m=-\infty}^{\infty}a_m e^{imx}, \qquad \text{where } a_m = \frac{1}{2\pi}\int_0^{2\pi}f(\xi)e^{-im\xi}\,d\xi.$$
Define the $n$th partial sum of the Fourier series to be
$$s_n(x) = \sum_{m=-n}^{n}a_m e^{imx}.$$

The basic questions of Fourier analysis are then the following: is there any relation between $s(x)$ and $f(x)$? Does the function $s_n(x)$ approximate $f(x)$ for large $n$ in some sense?

Extend the definition of $f$ to make it $2\pi$-periodic, so $f(x + 2\pi) = f(x)$ for all $x$.

Lemma 3.3. Let¹
$$D_n(x) = \frac{\sin\left((n+\frac12)x\right)}{\sin\left(\frac12x\right)}.$$
Then
$$s_n(y) = \frac{1}{2\pi}\int_0^{2\pi}f(y + x)D_n(x)\,dx.$$

Proof. Exercise. For the lemma, it is helpful to notice that $D_n(x) = \sum_{j=-n}^{n}e^{ijx}$.

¹ This function is called the Dirichlet kernel. If you read up on Fourier analysis, it will be helpful to note that the Dirichlet kernel is not a "summability kernel".

Lemma 3.4.
$$\int_0^{2\pi}\left|\frac{\sin\left((n+\frac12)x\right)}{\sin\left(\frac12x\right)}\right|dx \longrightarrow \infty \quad \text{as } n \to \infty.$$

Proof. Recall that $|\sin(x)| \le |x|$ for all $x$. It follows that
$$\int_0^{2\pi}\left|\frac{\sin\left((n+\frac12)x\right)}{\sin\left(\frac12x\right)}\right|dx \ge \int_0^{2\pi}\frac{2}{x}\left|\sin\left((n+\tfrac12)x\right)\right|dx.$$
Now $\left|\sin\left((n+\frac12)x\right)\right| \ge \frac12$ for all $x$ with $(n+\frac12)x$ between $k\pi + \frac16\pi$ and $k\pi + \frac13\pi$, for $k = 0, 1, \dots, 2n$. Each such interval of $x$ values lies in $[0,2\pi]$, has length $\frac{\pi}{6(n+\frac12)}$, and on it $\frac2x \ge \frac{2(n+\frac12)}{\pi(k+\frac13)}$. It follows (by thinking of the Riemann approximation to the integral) that
$$\int_0^{2\pi}\frac{2}{x}\left|\sin\left((n+\tfrac12)x\right)\right|dx \ge \sum_{k=0}^{2n}\frac{2(n+\frac12)}{\pi(k+\frac13)}\cdot\frac12\cdot\frac{\pi}{6(n+\frac12)} = \frac16\sum_{k=0}^{2n}\frac{1}{k+\frac13} \to \infty$$
as $n \to \infty$.

Now let $X$ be the Banach space of continuous functions $f : [0,2\pi] \to \mathbb{R}$ with $f(0) = f(2\pi)$, with the uniform norm.
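Both the identity $D_n(x) = \sum_{j=-n}^{n}e^{ijx}$ and the growth in Lemma 3.4 are easy to check numerically. This sketch is plain Python with function names of our own. It compares the two expressions for $D_5$ at a few points, then estimates $\frac{1}{2\pi}\int_0^{2\pi}|D_n(x)|\,dx$ with the midpoint rule, which never evaluates the removable singularity at $x = 0$ or $2\pi$. These numbers are the norms $\|T_n\|$ of Lemma 3.5 below, often called the Lebesgue constants. They do grow without bound, but only logarithmically.

```python
import cmath
import math


def dirichlet_sum(n, x):
    """D_n(x) computed as the sum of e^{ijx} over j = -n, ..., n (the result is real)."""
    return sum(cmath.exp(1j * j * x) for j in range(-n, n + 1)).real


def dirichlet_closed(n, x):
    """D_n(x) = sin((n + 1/2) x) / sin(x / 2), valid when x is not a multiple of 2 pi."""
    return math.sin((n + 0.5) * x) / math.sin(0.5 * x)


def mean_abs_dirichlet(n, samples=20000):
    """Midpoint-rule estimate of (1 / 2 pi) * integral_0^{2 pi} |D_n(x)| dx."""
    h = 2.0 * math.pi / samples
    total = sum(abs(dirichlet_closed(n, (i + 0.5) * h)) for i in range(samples))
    return total * h / (2.0 * math.pi)


checks = [(dirichlet_sum(5, x), dirichlet_closed(5, x)) for x in (0.3, 1.7, 4.0)]
print(checks)        # each pair agrees up to rounding

growth = {n: mean_abs_dirichlet(n) for n in (1, 4, 16, 64)}
print(growth)        # increases with n, slowly
```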
Lemma 3.5. The linear operator $T_n : X \to \mathbb{R}$ defined by
$$T_n(f) = \frac{1}{2\pi}\int_0^{2\pi}f(x)D_n(x)\,dx$$
is bounded, and
$$\|T_n\| = \frac{1}{2\pi}\int_0^{2\pi}|D_n(x)|\,dx.$$

Proof. For any $f \in X$,
$$|T_n(f)| \le \frac{1}{2\pi}\int_0^{2\pi}|f(x)D_n(x)|\,dx \le \frac{\|f\|}{2\pi}\int_0^{2\pi}|D_n(x)|\,dx,$$
so
$$\|T_n\| \le \frac{1}{2\pi}\int_0^{2\pi}|D_n(x)|\,dx.$$
Assume that for some $\delta > 0$ we have
$$\|T_n\| = \frac{1}{2\pi}\int_0^{2\pi}|D_n(x)|\,dx - \delta. \tag{33}$$
Then since for fixed $n$, $|D_n(x)| \le M_n$ is bounded, we may find a continuous function $f_n \in X$ with $|f_n| \le 1$ that differs from $\operatorname{sign}(D_n(x))$ only on a finite union of intervals whose total length does not exceed $\delta/M_n$. Then (don't think about this: just draw a picture)
$$\left|\frac{1}{2\pi}\int_0^{2\pi}f_n(x)D_n(x)\,dx\right| > \frac{1}{2\pi}\int_0^{2\pi}|D_n(x)|\,dx - \delta,$$
which, since $\|f_n\| \le 1$, contradicts the assumption (33). We conclude that
$$\|T_n\| = \frac{1}{2\pi}\int_0^{2\pi}|D_n(x)|\,dx.$$

We are now ready to see a genuinely non-trivial and important observation about the basic theorems of Fourier analysis.

Theorem 3.4. There exists a continuous function $f : [0,2\pi] \to \mathbb{R}$, with $f(0) = f(2\pi)$, such that its Fourier series diverges at $x = 0$.

Proof. By Lemma 3.3 (with $y = 0$), $T_n(f) = s_n(0)$ for all $f \in X$. If the Fourier series of $f$ converges at $0$, then the family $\{T_n f\}$ is bounded as $n$ varies (since each element is just a partial sum of a convergent series). Thus if the Fourier series of $f$ converges at $0$ for all $f \in X$, then for each $f \in X$ the set $\{T_n f\}$ is bounded. By Theorem 3.2, this implies that the set $\{\|T_n\|\}$ is bounded. Moreover, by Lemma 3.5, $\|T_n\| = \frac{1}{2\pi}\int_0^{2\pi}|D_n(x)|\,dx$, which is unbounded by Lemma 3.4: a contradiction. The conclusion is that there must be some $f \in X$ whose Fourier series does not converge at $0$.

The problem of deciding whether or not the Fourier series of a given function converges at a specific point (or everywhere) is difficult and usually requires some degree of smoothness (differentiability). You can read about various results in many books; a good starting point is Fourier Analysis, Tom Körner, Cambridge University Press (1988).

Exercise 3.3.
It is more natural in functional analysis to ask for an appropriate semi-norm in which $\|s(x) - f(x)\| = 0$ for some class of functions $f$.

6. Open mapping theorem

Recall that a continuous map between normed spaces has the property that the pre-image of any open set is open, but in general the image of an open set is not open (Exercise 1.1). Bounded linear maps from a Banach space onto a Banach space cannot do this.

Theorem 3.5. Let $X$ and $Y$ be Banach spaces, and let $T$ be a bounded linear map from $X$ onto $Y$. Then $T$ maps open sets in $X$ onto open sets in $Y$.

Of course the assumption that $T$ maps onto $Y$ is crucial: think of the projection $(x,y) \mapsto (x,0)$ from $\mathbb{R}^2 \to \mathbb{R}^2$. This is bounded and linear, but not onto, and certainly cannot send open sets to open sets.

The proof of the Open Mapping theorem is long and requires the Baire category theorem, so it will be omitted from the lectures. For completeness it is given here in the next three lemmas.

Some notation: use $B^X_r$ and $B^Y_r$ to denote the open balls of radius $r$ centre $0$ in $X$ and $Y$ respectively. Throughout, $T$ is as in Theorem 3.5.

Lemma 3.6. For any $\epsilon > 0$, there is a $\delta > 0$ such that
$$\overline{TB^X_{2\epsilon}} \supset B^Y_\delta. \tag{34}$$

Proof. Since $X = \bigcup_{n=1}^\infty nB^X_\epsilon$ and $T$ is onto, we have $Y = T(X) = \bigcup_{n=1}^\infty nTB^X_\epsilon$. By the Baire category theorem (Theorem A.4) it follows that, for some $n$, the set $\overline{nTB^X_\epsilon}$ contains some ball $B^Y_r(z)$ in $Y$. Then $\overline{TB^X_\epsilon}$ must contain the ball $B^Y_\delta(y_0)$, where $y_0 = \frac1n z$ and $\delta = \frac1n r$. It follows that the set
$$P = \{y_1 - y_2 \mid y_1 \in B^Y_\delta(y_0),\ y_2 \in B^Y_\delta(y_0)\}$$
is contained in the closure of the set $TQ$, where
$$Q = \{x_1 - x_2 \mid x_1 \in B^X_\epsilon,\ x_2 \in B^X_\epsilon\} \subset B^X_{2\epsilon}.$$
Thus $\overline{TB^X_{2\epsilon}} \supset P$. Any point $y \in B^Y_\delta$ can be written in the form $y = (y + y_0) - y_0$, so $B^Y_\delta \subset P$, and (34) follows.

Lemma 3.7. For any $\epsilon_0 > 0$ there is a $\delta_0 > 0$ such that
$$TB^X_{2\epsilon_0} \supset B^Y_{\delta_0}. \tag{35}$$

Proof. Choose a sequence $(\epsilon_n)$ with each $\epsilon_n > 0$ and $\sum_{n=1}^\infty\epsilon_n < \epsilon_0$. By Lemma 3.6 (applied with $\epsilon_n/2$ in place of $\epsilon$) there is a sequence $(\delta_n)$ of positive numbers such that
$$\overline{TB^X_{\epsilon_n}} \supset B^Y_{\delta_n} \tag{36}$$
for all $n \ge 0$. Without loss of generality, assume that $\delta_n \to 0$ as $n \to \infty$. Let $y$ be any point in $B^Y_{\delta_0}$. By (36) with $n = 0$ there is a point $x_0 \in B^X_{\epsilon_0}$ with $\|y - Tx_0\| < \delta_1$. Since $(y - Tx_0) \in B^Y_{\delta_1}$, (36) with $n = 1$ implies that there exists a point $x_1 \in B^X_{\epsilon_1}$ such that $\|y - Tx_0 - Tx_1\| < \delta_2$. Continuing, we obtain a sequence $(x_n)$ such that $x_n \in B^X_{\epsilon_n}$ for all $n$, and
$$\left\|y - T\left(\sum_{k=0}^n x_k\right)\right\| < \delta_{n+1}. \tag{37}$$
Since $\|x_n\| < \epsilon_n$, the series $\sum_n x_n$ is absolutely convergent, so it converges because $X$ is complete; write $x = \sum_{n=0}^\infty x_n$. Then
$$\|x\| \le \sum_{n=0}^\infty\|x_n\| < \epsilon_0 + \sum_{n=1}^\infty\epsilon_n < 2\epsilon_0.$$
The map $T$ is continuous, so (37) shows that $y = Tx$, since $\delta_n \to 0$. That is, for any $y \in B^Y_{\delta_0}$ we have found a point $x \in B^X_{2\epsilon_0}$ such that $Tx = y$, proving (35).

Lemma 3.8. For any open set $G \subset X$ and for any point $\bar y = T\bar x$, $\bar x \in G$, there is an open ball $B^Y_\eta$ such that $\bar y + B^Y_\eta \subset T(G)$.

Proof. Since $G$ is open, there is a ball $B^X_\epsilon$ such that $\bar x + B^X_\epsilon \subset G$. By Lemma 3.7, $T(B^X_\epsilon) \supset B^Y_\eta$ for some $\eta > 0$. Hence
$$T(G) \supset T(\bar x + B^X_\epsilon) = T(\bar x) + T(B^X_\epsilon) \supset \bar y + B^Y_\eta.$$

Notice that Lemma 3.8 proves Theorem 3.5, since it implies that $T(G)$ is open.

As an application of Theorem 3.5, we establish a general property of inverse maps. Generalizing Definition 3.1 slightly, we have the following. Let $T : X \to Y$ be an injective linear operator. Define the inverse of $T$, $T^{-1}$, by requiring that $T^{-1}y = x$ if and only if $Tx = y$. Then the domain of $T^{-1}$ is a linear subspace of $Y$, and $T^{-1}$ is a linear operator. It is easy to check that $T^{-1}Tx = x$ for all $x \in X$, and $TT^{-1}y = y$ for all $y$ in the domain of $T^{-1}$.

Lemma 3.9. Let $X$ and $Y$ be Banach spaces, and let $T$ be an injective bounded linear map from $X$ onto $Y$. Then $T^{-1}$ is a bounded linear map.

Proof. Since $T^{-1}$ is a linear operator, by Theorem 3.1 we only need to show it is continuous. By Theorem 3.5, $(T^{-1})^{-1} = T$ maps open sets onto open sets; that is, the pre-image under $T^{-1}$ of any open set is open, so $T^{-1}$ is continuous.

Corollary 3.1. If $X$ is a Banach space with respect to two norms $\|\cdot\|^{(1)}$ and $\|\cdot\|^{(2)}$, and there is a constant $K$ such that
$$\|x\|^{(1)} \le K\|x\|^{(2)} \quad \text{for all } x \in X,$$
then the two norms are equivalent: there is another constant $K'$ with
$$\|x\|^{(2)} \le K'\|x\|^{(1)} \quad \text{for all } x \in X.$$

Proof. Consider the map $T : x \mapsto x$ from $(X, \|\cdot\|^{(2)})$ to $(X, \|\cdot\|^{(1)})$. By assumption, $T$ is bounded, and it is clearly injective and onto, so by Lemma 3.9, $T^{-1}$ is also bounded, giving the bound in the other direction.
Definition 3.6. Let $T : X \to Y$ be a linear operator from a normed linear space $X$ into a normed linear space $Y$, with domain $D_T$. The graph of $T$ is the set
$$G_T = \{(x, Tx) \mid x \in D_T\} \subset X \times Y.$$
If $G_T$ is a closed set in $X \times Y$ (see Example 1.7), then $T$ is a closed operator.

Notice as usual that this notion becomes trivial in finite dimensions: if $X$ and $Y$ are finite-dimensional, then the graph of $T$ is simply some linear subspace, which is automatically closed.

The next theorem is called the closed-graph theorem.

Theorem 3.6. Let $X$ and $Y$ be Banach spaces, and $T : X \to Y$ a linear operator (notice that the notation means $D_T = X$). If $T$ is closed, then it is continuous.

Proof. Fix the norm $\|(x,y)\| = \|x\|_X + \|y\|_Y$ on $X \times Y$. The graph $G_T$ is, by linearity of $T$, a closed linear subspace in $X \times Y$, so $G_T$ is itself a Banach space. Consider the projection $P : G_T \to X$ defined by $P(x, Tx) = x$. Then $P$ is clearly bounded, linear, and bijective. It follows by Lemma 3.9 that $P^{-1}$ is a bounded linear operator from $X$ into $G_T$, so
$$\|(x, Tx)\| = \|P^{-1}x\| \le K\|x\|_X \quad \text{for all } x \in X,$$
for some constant $K$. It follows that $\|x\|_X + \|Tx\|_Y \le K\|x\|_X$, so $T$ is bounded, and therefore $T$ is continuous by Theorem 3.1.

7. Hahn–Banach theorem

Definition 3.7. Let $X$ be a normed linear space. A bounded linear operator from $X$ into the normed space $\mathbb{R}$ is a (real) continuous linear functional on $X$. The space of all continuous linear functionals is denoted $B(X,\mathbb{R}) = X^*$, and it is called the dual or conjugate space of $X$.

Notice that Lemma 3.2 shows that $X^*$ is itself a Banach space, whether or not $X$ is complete. All the material here may be done again with $\mathbb{C}$ instead of $\mathbb{R}$ without significant changes.

One of the most important questions one may ask of $X^*$ is the following: are there "enough" elements in $X^*$ (to do what we need: for example, to separate points, see Corollary 3.4)? This is answered in great generality using the Hahn–Banach theorem (Theorem 3.7 below). First we prove the Hahn–Banach lemma.

Lemma 3.10. Let $X$ be a real normed linear space, and $p : X \to \mathbb{R}$ a continuous function with
$$p(x + y) \le p(x) + p(y), \qquad p(\lambda x) = \lambda p(x) \quad \text{for all } \lambda \ge 0,\ x, y \in X.$$
Let $Y$ be a subspace of $X$, and $f \in Y^*$ with $f(x) \le p(x)$ for all $x \in Y$. Then there exists a functional $F \in X^*$ such that
$$F(x) = f(x) \text{ for } x \in Y, \qquad F(x) \le p(x) \text{ for all } x \in X.$$

Proof. Let $K$ be the set of all pairs $(Y_\alpha, g_\alpha)$ in which $Y_\alpha$ is a linear subspace of $X$ containing $Y$, and $g_\alpha$ is a real linear functional on $Y_\alpha$ with the properties that
$$g_\alpha(x) = f(x) \text{ for all } x \in Y, \qquad g_\alpha(x) \le p(x) \text{ for all } x \in Y_\alpha.$$
That is. Note that if x = y are in Y0 . so x∗ = y ∗ . − p(−y1 − y) − g0 (y) ≤ c for all y ∈ Y0 . y Now multiply (39) by λ < 0. Then x∗ (x) = θx∗ (x) = x∗ (θx) ≤ p(θx) = y ∗ θx = y ∗ x . Deﬁne a linear functional g1 ∈ Y1∗ by g1 (y + λy1 ) = g0 (y) + λc. By Theorem A. Let Y1 be the linear space spanned by Y0 and y1 : each element x ∈ Y1 may be expressed uniquely in the form x = y + λy1 . Multiply (38) by λ > 0 and substitute y λ (38) (39) (40) for y to obtain λc ≤ p(y + λy1 ) − g0 (y). Then for any y ∗ ∈ Y ∗ there corresponds an x∗ ∈ X ∗ such that x∗ = y ∗ . Assume that y1 ∈ X\Y0 . we deduce that g1 (y + λy1 ) = g0 (y) + λc ≤ p(y + λy1 ) for all λ ∈ R and y ∈ Y0 . x∈Y0 y∈Y0 Choose c to be any number in the interval [A. λ ∈ R. g0 ). Since (40) is clear for λ = 0.7. any linear functional deﬁned on a subspace may be extended to a linear functional on the whole space with the same norm. g1 ) ∈ K and (Y0 . Then by construction of A and B. then g0 (y) − g0 (x) = g0 (y − x) ≤ p(y − x) ≤ p(y + y1 ) + p(−y1 − x). gβ ) if Yα ⊂ Yβ and gα = gβ on Yα . Let X be a real normed space. write x∗ (x) = θx∗ (x) for θ = ±1. That is. To check that x∗ ≤ y . substitute λ for y and use the homegeneity assumption on p to obtain (40) again. This contradicts the maximality of (Y0 . the Hahn–Banach theorem follows at once (for complex spaces a little more work is needed).1. Proof. B]. g0 ) in K.10. and x∗ = F . It is clear that any totally ordered subset {(Yλ . c ≤ p(y + y1 ) − g0 (y) for all y ∈ Y0 . Now we choose the constant c carefully. . The reverse inequality is clear. and x∗ (y) = y ∗ (y) for all y ∈ Y. It follows that A = sup {−p(−y1 − x) − g0 (x)} ≤ inf {p(y + y1 ) − g0 (y)} = B. (Y1 . For real linear spaces. Apply the Hahn– Banach Lemma 3. Let p(x) = y ∗ x . HAHN–BANACH THEOREM 39 Make K into a partially ordered set by deﬁning the relation (Yα . there is a maximal element (Y0 . g1 ) with Y0 = Y1 . and Y a linear subspace.7. f (x) = y ∗ (x). gα ) ≤ (Yβ . 
All that remains is to check that Y0 is all of X (so we may take F to be g0 ). gλ ) has an upper bound given by the subspace λ Yλ and the functional deﬁned to be gλ on each Yλ . y ∈ Y0 . because y1 is assumed not to be in the linear space Y0 . Theorem 3. so −p(−y1 − x) − g0 (x) ≤ p(y + y1 ) − g0 (y). g0 ) ≤ (Y1 .
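The choice of $c \in [A,B]$ in the proof of Lemma 3.10 can be seen in a toy case. This sketch is plain Python with names of our own. It takes $X = \mathbb{R}^2$, $Y_0$ the horizontal axis, $g_0(t,0) = t$ and $y_1 = (0,1)$, with the sublinear functional $p$ taken to be a norm. The sup and inf defining $A$ and $B$ are only approximated on a finite grid of points of $Y_0$. With the $\ell^1$ norm the interval is $[-1,1]$, so the extension is far from unique; with the sup norm it collapses to $\{0\}$.

```python
def extension_bounds(p, ts):
    """Grid estimates of A and B from the proof of Lemma 3.10, in the toy case
    X = R^2, Y0 = span{(1, 0)}, g0(t, 0) = t, y1 = (0, 1):
    A = sup_x {-p(-y1 - x) - g0(x)},  B = inf_y {p(y + y1) - g0(y)},  x, y = (t, 0)."""
    A = max(-p((-t, -1.0)) - t for t in ts)
    B = min(p((t, 1.0)) - t for t in ts)
    return A, B


def l1(v):
    return abs(v[0]) + abs(v[1])


def linf(v):
    return max(abs(v[0]), abs(v[1]))


def dominated(p, c, pts):
    """Does g1(t, lam) = t + lam * c stay below p((t, lam)) at every sample point?"""
    return all(t + lam * c <= p((t, lam)) + 1e-12 for t, lam in pts)


ts = [i / 100 for i in range(-1000, 1001)]        # finite stand-in for Y0
pts = [(i / 10, j / 10) for i in range(-20, 21) for j in range(-20, 21)]
intervals = {name: extension_bounds(p, ts) for name, p in (("l1", l1), ("linf", linf))}
print(intervals)   # l1: about (-1, 1); linf: (0, 0)
print(dominated(l1, 0.7, pts), dominated(l1, 1.5, pts))   # True False
```

A value of $c$ outside $[A,B]$, such as $1.5$ for the $\ell^1$ norm, produces an extension that is no longer dominated by $p$.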
Many useful results follow from the Hahn–Banach theorem.

Corollary 3.2. Let $Y$ be a linear subspace of a normed linear space $X$, and let $x_0 \in X$ have the property that
$$\inf_{y \in Y}\|y - x_0\| = d > 0. \tag{41}$$
Then there exists a point $x^* \in X^*$ such that
$$x^*(x_0) = 1, \qquad \|x^*\| = \frac1d, \qquad x^*(y) = 0 \text{ for all } y \in Y.$$

Proof. Let $Y_1$ be the linear space spanned by $Y$ and $x_0$. Since $x_0 \notin Y$, every point $x$ in $Y_1$ may be represented uniquely in the form $x = y + \lambda x_0$, with $y \in Y$, $\lambda \in \mathbb{R}$. Define a linear functional $z^* \in Y_1^*$ by $z^*(y + \lambda x_0) = \lambda$. If $\lambda \ne 0$, then
$$\|y + \lambda x_0\| = |\lambda|\left\|\frac{y}{\lambda} + x_0\right\| \ge |\lambda|d.$$
It follows that $|z^*(x)| \le \|x\|/d$ for all $x \in Y_1$, so $\|z^*\| \le \frac1d$. Choose a sequence $(y_n) \subset Y$ with $\|x_0 - y_n\| \to d$ as $n \to \infty$. Then
$$1 = z^*(x_0 - y_n) \le \|z^*\|\,\|x_0 - y_n\| \to \|z^*\|\,d,$$
so $\|z^*\| = \frac1d$. Apply Theorem 3.7 to $z^*$.

Corollary 3.3. Let $X$ be a normed linear space. Then for any $x \ne 0$ in $X$ there is a functional $x^* \in X^*$ with $\|x^*\| = 1$ and $x^*(x) = \|x\|$.

Proof. Apply Corollary 3.2 with $Y = \{0\}$ and $x_0 = x$ to find $z^* \in X^*$ such that $z^*(x) = 1$ and $\|z^*\| = 1/\|x\|$. We may therefore take $x^*$ to be $\|x\|\,z^*$.

Corollary 3.4. If $z \ne y$ in a normed linear space $X$, then there exists $x^* \in X^*$ such that $x^*(y) \ne x^*(z)$.

Proof. Apply Corollary 3.3 with $x = y - z$.

Corollary 3.5. If $X$ is a normed linear space, then
$$\|x\| = \sup_{x^* \ne 0}\frac{|x^*(x)|}{\|x^*\|} = \sup_{\|x^*\|=1}|x^*(x)|.$$

Proof. The last two expressions are clearly equal. It is also clear that $\sup_{\|x^*\|=1}|x^*(x)| \le \|x\|$. By Corollary 3.3 (the case $x = 0$ being trivial), there exists $x^*$ such that $x^*(x) = \|x\|$ and $\|x^*\| = 1$, so $\sup_{\|x^*\|=1}|x^*(x)| \ge \|x\|$.

Corollary 3.6. Let $Y$ be a linear subspace of the normed linear space $X$. If $Y$ is not dense in $X$, then there exists a functional $x^* \ne 0$ such that $x^*(y) = 0$ for all $y \in Y$.

Proof. Notice that if there is no point $x_0 \in X$ satisfying (41), then $Y$ must be dense in $X$. So we may choose $x_0$ with (41) and apply Corollary 3.2.
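Corollary 3.3 only asserts that a norming functional exists. In the concrete space $(\mathbb{R}^n, \|\cdot\|_p)$ with $1 < p < \infty$ one can be written down explicitly, using the standard identification of its dual with $(\mathbb{R}^n, \|\cdot\|_q)$, $\frac1p + \frac1q = 1$. That identification is an outside fact, assumed here rather than proved in these notes. The sketch is plain Python with names of our own. It also checks, for random functionals, that no functional of norm 1 does better, in line with Corollary 3.5.

```python
import math
import random


def lp_norm(x, p):
    """The p-norm on R^n, 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def norming_functional(x, p):
    """Coefficients a with ||a||_q = 1 and sum_i a_i x_i = ||x||_p, where
    1 < p < infinity, 1/p + 1/q = 1 and x is non-zero."""
    nx = lp_norm(x, p)
    return [math.copysign(abs(t) ** (p - 1), t) / nx ** (p - 1) for t in x]


x = [3.0, -1.0, 0.0, 2.5]
p = 3.0
q = p / (p - 1.0)
a = norming_functional(x, p)
value = sum(ai * xi for ai, xi in zip(a, x))
print(value, lp_norm(x, p))     # equal up to rounding: x*(x) = ||x||
print(lp_norm(a, q))            # 1 up to rounding: ||x*|| = 1

# Random functionals, rescaled to norm 1, never exceed ||x|| (Hölder's inequality).
random.seed(2)
best = 0.0
for _ in range(5000):
    b = [random.uniform(-1.0, 1.0) for _ in x]
    best = max(best, abs(sum(bi * xi for bi, xi in zip(b, x))) / lp_norm(b, q))
print(best, lp_norm(x, p))
```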
Notice finally that linear functionals allow us to decompose a linear space: let $X$ be a normed linear space, and $x^* \in X^*$. The null space or kernel of $x^*$ is the linear subspace
$$N_{x^*} = \{x \in X \mid x^*(x) = 0\}.$$
If $x^* \ne 0$, then there is a point $x_0 \ne 0$ such that $x^*(x_0) = 1$. Any element $x \in X$ can then be written $x = z + \lambda x_0$, with $\lambda = x^*(x)$ and $z = x - \lambda x_0 \in N_{x^*}$. Thus
$$X = N_{x^*} \oplus Y,$$
where $Y$ is the one-dimensional space spanned by $x_0$.
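The decomposition $x = z + \lambda x_0$ is a two-line computation once $x_0$ with $x^*(x_0) = 1$ is known. This sketch works in $\mathbb{R}^3$. The functional $x^*(x) = x_1 + 2x_2 - x_3$, the point $x_0$ and the vector $x$ are illustrative choices, not taken from the notes, and the function names are our own.

```python
def evaluate(f, v):
    """Value of the functional with coefficient vector f at the vector v."""
    return sum(fi * vi for fi, vi in zip(f, v))


def decompose(x, f, x0):
    """Write x = z + lam * x0 with lam = f(x) and f(z) = 0; requires f(x0) = 1."""
    if abs(evaluate(f, x0) - 1.0) > 1e-12:
        raise ValueError("x0 must satisfy x*(x0) = 1")
    lam = evaluate(f, x)
    z = [xi - lam * ti for xi, ti in zip(x, x0)]
    return z, lam


f = [1.0, 2.0, -1.0]      # x*(x) = x1 + 2 x2 - x3 on R^3
x0 = [1.0, 0.0, 0.0]      # x*(x0) = 1
x = [4.0, 1.0, 3.0]
z, lam = decompose(x, f, x0)
print(z, lam, evaluate(f, z))    # [1.0, 1.0, 3.0] 3.0 0.0
```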
CHAPTER 4
Integration

We have seen in Examples 1.10[3] that the space $C[0,1]$ of continuous functions with the $p$-norm
$$\|f\|_p = \left(\int_0^1|f(t)|^p\,dt\right)^{1/p}$$
is not complete, even if we extend the space to Riemann-integrable functions. As discussed in the section on completions, we can think of the completion of the space in terms of all limit points of (equivalence classes of) Cauchy sequences. This does not give any real sense of what kind of functions are in the completion. In this chapter we construct the completions $L^p$ for $1 \le p < \infty$, together with the space $L^\infty$, by describing (without proofs) the Lebesgue¹ integral.

1. Lebesgue measure

Let $\mathcal{B}$ denote the smallest collection of subsets of $\mathbb{R}$ that includes all the open sets and is closed under countable unions, countable intersections and complements. These sets are called the Borel sets. In fact the Borel sets form a $\sigma$-algebra: $\mathbb{R}, \emptyset \in \mathcal{B}$, and $\mathcal{B}$ is closed under countable unions, countable intersections and complements.

Definition 4.1. The Lebesgue measure on $\mathbb{R}$ is a map $\mu : \mathcal{B} \to \mathbb{R} \cup \{\infty\}$ with the properties that
(i) $\mu[a,b] = \mu(a,b) = b - a$;
(ii) $\mu\left(\bigcup_{n=1}^\infty A_n\right) = \sum_{n=1}^\infty\mu(A_n)$ whenever the sets $A_n \in \mathcal{B}$ are pairwise disjoint.

We will call Borel sets measurable. Notice that the Lebesgue measure attaches a measure to all measurable sets. Many subsets of $\mathbb{R}$ are not measurable, but all the ones you can write down or that might arise in a practical setting are measurable. Sets of measure zero are called null sets, and something that happens everywhere except on a set of measure zero is said to happen almost everywhere, often written simply a.e. For technical reasons, allow any subset of a null set to also be regarded as "measurable", with measure zero.

Exercise 4.1. [1] Prove that $\mu(\mathbb{Q}) = 0$. Thus a.e. real number is irrational.
[2] More can be said: call a real number algebraic if it is a zero of some polynomial with rational coefficients, and transcendental if not. Then a.e. real number is transcendental.
[3] Prove that for any measurable sets $A, B$ of finite measure, $\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B)$.
[4] Can you construct² a set that is not a member of $\mathcal{B}$?

¹ Henri Léon Lebesgue (1875–1941) was a French mathematician who revolutionized the field of integration by his generalization of the Riemann integral. Up to the end of the 19th century, mathematical analysis was limited to continuous functions, based largely on the Riemann method of integration. Building on the work of others, including that of the French mathematicians Émile Borel and Camille Jordan, Lebesgue developed (in 1901) his theory of measure. A year later, Lebesgue extended the usefulness of the definite integral by defining the Lebesgue integral: a method of extending the concept of area below a curve to include many discontinuous functions. Lebesgue served on the faculty of several French universities. He made major contributions in other areas of mathematics, including topology, potential theory, and Fourier analysis.
² This "construction" requires the use of the Axiom of Choice and is closely related to the existence of a Hamel basis for $\mathbb{R}$ as a vector space over $\mathbb{Q}$. The question really has two faces: 1) using the usual axioms of set theory (including the Axiom of Choice), can you exhibit a non-measurable subset of $\mathbb{R}$? 2) using the usual axioms of set theory without the Axiom of Choice, is it still possible to exhibit a non-measurable subset of $\mathbb{R}$? The first question is easily answered. The second question is much deeper because the answer is "no". Solovay showed that there is a model of set theory (excluding the Axiom of Choice but including a further axiom) in which every subset of $\mathbb{R}$ is measurable. Shelah tried to remove Solovay's additional axiom, and answered a related question by exhibiting a model of set theory (excluding the Axiom of Choice but otherwise as usual) in which every subset of $\mathbb{R}$ has the Baire property. This is part of a subject called Model Theory. The references are R.M. Solovay, "A model of set-theory in which every set of reals is Lebesgue measurable", Annals of Math. 92 (1970), 1–56, and S. Shelah, "Can you take Solovay's inaccessible away?", Israel Journal of Math. 48 (1984), 1–47, but both of them require extensive additional background to read.

Definition 4.2. A function $f : \mathbb{R} \to \mathbb{R} \cup \{\pm\infty\}$ is a Lebesgue measurable function if $f^{-1}(A) \in \mathcal{B}$ for every Borel set $A$.

Example 4.1. [1] The characteristic function $\chi_{\mathbb{Q}}$, defined by $\chi_{\mathbb{Q}}(x) = 1$ if $x \in \mathbb{Q}$ and $\chi_{\mathbb{Q}}(x) = 0$ if $x \notin \mathbb{Q}$, is an example of a measurable function that is not Riemann integrable.
[2] All continuous functions are measurable (by Exercise 1.1[1]).

The basic idea in Riemann integration is to approximate functions by step functions, whose "integrals" are easy to find. These give the upper and lower estimates. In the Lebesgue theory, we do something similar, using simple functions instead of step functions.

A simple function is a map $f : \mathbb{R} \to \mathbb{R}$ of the form
$$f(x) = \sum_{i=1}^n c_i\chi_{E_i}(x), \tag{42}$$
where the $c_i$ are non-zero constants and the $E_i$ are disjoint measurable sets with $\mu(E_i) < \infty$. The integral of the simple function (42) is defined to be
$$\int_E f\,d\mu = \sum_{i=1}^n c_i\,\mu(E \cap E_i)$$
for any measurable set $E$.

The basic approximation fact in the Lebesgue integral is the following: if $f : \mathbb{R} \to \mathbb{R} \cup \{\pm\infty\}$ is measurable and non-negative, then there is an increasing sequence $(f_n)$ of simple functions with the property that $f_n(t) \to f(t)$ a.e. We write this as $f_n \uparrow f$ a.e., and define the integral of $f$ to be
$$\int_E f\,d\mu = \lim_{n\to\infty}\int_E f_n\,d\mu.$$
Notice that (once we allow the value $\infty$) the limit is guaranteed to exist, since the sequence is increasing.
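The notes do not say which increasing sequence of simple functions to use. A common choice, assumed here, is the dyadic one, $f_n = \min\left(n, \lfloor 2^n f \rfloor / 2^n\right)$. This sketch is plain Python with names of our own. It computes $\int_{[0,1]} f_n\,d\mu$ straight from the definition $\sum_i c_i\,\mu(E \cap E_i)$ for $f(x) = x^2$. It uses the fact that $\{x \in [0,1] : a \le x^2 < b\}$ is an interval of length $\sqrt{b} - \sqrt{a}$ (after clipping $a, b$ to $[0,1]$). The values increase to $\int f\,d\mu = \frac13$.

```python
import math


def simple_approx_integral(n, measure_of_level_set):
    """Integral over E of f_n = min(n, floor(2^n f) / 2^n), computed from the definition
    sum_i c_i * mu(E ∩ E_i), where E_i = {k/2^n <= f < (k+1)/2^n} carries c_i = k/2^n."""
    step = 2.0 ** -n
    total = 0.0
    for k in range(1, n * 2 ** n):                     # k = 0 contributes c_i = 0
        c = k * step
        total += c * measure_of_level_set(c, c + step)
    total += n * measure_of_level_set(n, math.inf)      # the set where f >= n
    return total


def mu_square_level(a, b):
    """Lebesgue measure of {x in [0, 1] : a <= x^2 < b}, i.e. E = [0, 1] and f(x) = x^2."""
    lo = math.sqrt(min(max(a, 0.0), 1.0))
    hi = math.sqrt(min(max(b, 0.0), 1.0))
    return hi - lo


values = [simple_approx_integral(n, mu_square_level) for n in range(1, 9)]
print(values)     # increasing, approaching 1/3
```

Since $0 \le f - f_n < 2^{-n}$ on $[0,1]$ here, the error after $n$ steps is below $2^{-n}$.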
For a general measurable function $f$, write $f = f^+ - f^-$ where both $f^+$ and $f^-$ are non-negative and measurable, then define
$$\int_E f\,d\mu = \int_E f^+\,d\mu - \int_E f^-\,d\mu,$$
provided at least one of the two integrals on the right is finite.

Example 4.2. Let $f(x) = \chi_{\mathbb{Q}\cap[0,1]}(x)$. Then $f$ is itself a simple function, so
$$\int_0^1 f\,d\mu = \mu(\mathbb{Q} \cap [0,1]) = 0.$$

Definition 4.3. Define $\mathcal{L}^p[a,b]$ to be the linear space of measurable functions $f$ on $[a,b]$ for which
$$\|f\|_p = \left(\int_a^b|f|^p\,d\mu\right)^{1/p} < \infty$$
for $p \in [1,\infty)$. Notice that $\|\cdot\|_p$ on $\mathcal{L}^p$ is only a semi-norm, since many functions will, for example, have $\|f\|_p = 0$ without being identically zero. Define an equivalence relation on $\mathcal{L}^p$ by $f \sim g$ if $\{x \in \mathbb{R} \mid f(x) \ne g(x)\}$ is a null set. Then define $L^p[a,b] = \mathcal{L}^p/\sim$, the space of $L^p$ functions. In practice we will not think of elements of $L^p$ as equivalence classes of functions, but as functions defined a.e. A similar definition may be made of $p$-integrable functions on $\mathbb{R}$, giving the linear space $L^p(\mathbb{R})$.

A measurable function $f$ on $[a,b]$ is essentially bounded if there is a constant $K$ such that $|f(x)| \le K$ a.e. on $[a,b]$. The essential supremum of such a function is the infimum of all such essential bounds $K$, written $\|f\|_\infty = \operatorname{ess\,sup}_{[a,b]}|f|$. Define $L^\infty[a,b]$ to be the linear space of essentially bounded functions (again identifying functions that agree a.e.).

The following theorems are proved in any book on measure theory or modern analysis, or may be found in any of the references.

Theorem 4.1. The normed spaces $L^p[a,b]$ and $L^p(\mathbb{R})$ are Banach spaces under the norm $\|\cdot\|_p$ for $1 \le p \le \infty$, and they are separable for $1 \le p < \infty$.

Theorem 4.1 is sometimes called the Riesz–Fischer theorem.

Theorem 4.2. If $\frac1r = \frac1p + \frac1q$, then
$$\|fg\|_r \le \|f\|_p\,\|g\|_q$$
for any $f \in L^p[a,b]$, $g \in L^q[a,b]$.

In the theorem we allow $p$ and $q$ to be anything in $[1,\infty]$ with the obvious interpretation $\frac1\infty = 0$. Theorem 4.2 is Hölder's inequality. It follows that for any measurable $f$ on $[0,1]$,
$$\|f\|_1 \le \|f\|_2 \le \|f\|_3 \le \cdots \le \|f\|_\infty$$
(on a general interval $[a,b]$ the same chain holds up to constants depending on $b - a$). Hence
$$L^1[a,b] \supset L^2[a,b] \supset \cdots \supset L^\infty[a,b].$$
Note the "opposite" behaviour to the sequence spaces $\ell^p$ in Example 1.4[3], where we saw that $\ell^1 \subset \ell^2 \subset \cdots \subset \ell^\infty$.
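Hölder's inequality and the chain $\|f\|_1 \le \|f\|_2 \le \cdots \le \|f\|_\infty$ on $[0,1]$ can be checked numerically. This sketch is plain Python; the test functions are arbitrary choices. It replaces each function by its values at $N$ midpoints, that is, by a step function. For such step functions the midpoint sums are the exact $L^p$ norms, so the inequalities hold for the computed numbers up to rounding.

```python
import math

N = 10000
xs = [(i + 0.5) / N for i in range(N)]        # midpoints of N equal cells in [0, 1]


def lp_norm(f, p):
    """Midpoint-rule approximation of ||f||_p on [0, 1]; p = math.inf gives the sampled max."""
    vals = [abs(f(x)) for x in xs]
    if p == math.inf:
        return max(vals)
    return (sum(v ** p for v in vals) / N) ** (1.0 / p)


def f(x):
    return math.exp(x) * math.sin(7.0 * x)    # an arbitrary test function


def g(x):
    return 1.0 + x * x                        # another arbitrary test function


def fg(x):
    return f(x) * g(x)


norms = [lp_norm(f, p) for p in (1, 2, 3, 4, math.inf)]
print(norms)                                  # non-decreasing in p on [0, 1]

# Hölder with p = 3, q = 6, so 1/r = 1/3 + 1/6 and r = 2.
print(lp_norm(fg, 2), lp_norm(f, 3) * lp_norm(g, 6))   # the first value is the smaller
```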
Two easy consequences of Hölder's inequality are the Cauchy–Schwartz inequality,
$$\|fg\|_1 \le \|f\|_2\,\|g\|_2,$$
and Minkowski's inequality,
$$\|f + g\|_p \le \|f\|_p + \|g\|_p.$$

Exercise 4.2. [1] Prove that the $L^p$-norm is strictly convex for $1 < p < \infty$ but is not strictly convex if $p = 1$ or $\infty$.

The most useful general result about Lebesgue integration is Lebesgue's dominated convergence theorem.

Theorem 4.3. Let $(f_n)$ be a sequence of measurable functions on a measurable set $E$ such that $f_n(t) \to f(t)$ a.e., and suppose there exists an integrable function $g$ such that $|f_n(t)| \le g(t)$ a.e. Then
$$\int_E f\,d\mu = \lim_{n\to\infty}\int_E f_n\,d\mu.$$

2. Product spaces and Fubini's theorem

Let $X$ and $Y$ be two subsets of $\mathbb{R}$. Let $\mathcal{A}$, $\mathcal{B}$ denote the $\sigma$-algebras of Borel sets in $X$ and $Y$ respectively. Subsets of $X \times Y$ (Cartesian product) of the form
$$A \times B = \{(x,y) : x \in A,\ y \in B\}$$
with $A \in \mathcal{A}$, $B \in \mathcal{B}$ are called (measurable) rectangles. Let $\mathcal{A} \times \mathcal{B}$ denote the smallest $\sigma$-algebra on $X \times Y$ containing all the measurable rectangles. Notice that, despite the notation, this is much larger than the set of all measurable rectangles. The measure space $(X \times Y, \mathcal{A} \times \mathcal{B})$ is the Cartesian product of $(X, \mathcal{A})$ and $(Y, \mathcal{B})$.

Let $\mu_X$ and $\mu_Y$ denote Lebesgue measure on $X$ and $Y$. Then there is a unique measure $\lambda$ on $X \times Y$ with the property that
$$\lambda(A \times B) = \mu_X(A)\,\mu_Y(B)$$
for all measurable rectangles $A \times B$. This measure is called the product measure of $\mu_X$ and $\mu_Y$, and we write $\lambda = \mu_X \times \mu_Y$.

The most important result on product measures is Fubini's theorem.

Theorem 4.4. If $h$ is an integrable function on $X \times Y$, then $x \mapsto h(x,y)$ is an integrable function of $x$ for a.e. $y$, $y \mapsto h(x,y)$ is an integrable function of $y$ for a.e. $x$, and
$$\int h\,d(\mu_X \times \mu_Y) = \int\left(\int h\,d\mu_X\right)d\mu_Y = \int\left(\int h\,d\mu_Y\right)d\mu_X.$$
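The domination hypothesis $|f_n| \le g$ in Theorem 4.3 cannot simply be dropped. This sketch is plain Python with the midpoint rule and names of our own. It contrasts $f_n(t) = t^n$ on $[0,1]$, which is dominated by $g = 1$, with $h_n = n\,\chi_{(0,1/n)}$. Both sequences tend to $0$ a.e., but $\sup_n h_n$ behaves like $1/t$, which is not integrable, and $\int h_n\,d\mu = 1$ for every $n$.

```python
def integral_01(f, n=100000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


ns = (1, 10, 100, 1000)

# Dominated case: f_n(t) = t^n -> 0 for t in [0, 1), and |f_n| <= g = 1, which is integrable.
dominated_vals = [integral_01(lambda t, n=n: t ** n) for n in ns]

# No integrable dominating function: h_n = n on (0, 1/n), 0 elsewhere, so h_n -> 0 pointwise.
undominated_vals = [integral_01(lambda t, n=n: float(n) if t < 1.0 / n else 0.0) for n in ns]

print(dominated_vals)      # about 1/(n + 1), tending to 0 = integral of the limit
print(undominated_vals)    # about 1 for every n, although the pointwise limit is 0
```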
where he remained for the rest of his life. x) ≥ 0. ¯ (iv) (x.2) that · is indeed a norm.CHAPTER 5 Hilbert spaces We have seen how useful the property of completeness is in our applications of Banach–space methods to certain diﬀerential and integral equations. ·) : H × H → C with the properties (i) (x. z) + (y. y). Theorem 3. (v) the norm deﬁned by x = (x. y ∈ H and λ ∈ C. (iv) hold then (H. Notice that property (v) makes sense since by (i) (x. (ii).4). Since Hilbert’s time.D. He became (1895) professor of mathematics at the University of Gottingen. then (x. x)1/2 makes H into a Banach space. ·)) is called an inner– product space. Hilbert received his Ph. A complex linear space H is called a Hilbert1 space if there is a complex–valued function (·. 1 David n i=1 xi yi makes Cn into an n– ¯ Hilbert (1862–1943) was a German mathematician whose work in geometry had the greatest inﬂuence on the ﬁeld since Euclid. 47 . many mathematicians from the United States and elsewhere who later played an important role in the development of mathematics went to Gottingen to study under him. He also enumerated 23 unsolved problems of mathematics that he considered worthy of further investigation. and the calculus of variations. and we shall see below (Lemma 5. and so a Hilbert space is a complete inner product space. Hilbert proposed a set of 21 such axioms and analyzed their signiﬁcance. y) for all x. and this has enormous consequences. y ∈ C. x) = 0 if and only if x = 0. (iii) (λx. functional analysis. x) ≥ 0. x) = 0. (·. The function (·. some obvious ideas for use in diﬀerential equations (like Fourier analysis) seem to go wrong in the obvious Banach space setting (cf. [1] If X = Cn . then the properties determine a real Hilbert space. ¯ Notice that (iii) and (iv) imply that (x. (ii) (x + y. z) for all x. z ∈ H. After making a systematic study of the axioms of Euclidean geometry. If only properties (i). and (x. nearly all of these problems have been solved. z) = (x. 
If only properties (i), (ii), (iii) and (iv) hold, then $(H,(\cdot,\cdot))$ is called an inner-product space; so a Hilbert space is a complete inner-product space. The function $(\cdot,\cdot)$ is called an inner or scalar product. Notice that property (v) makes sense since by (i) $(x,x)\ge0$, and we shall see below (Lemma 5.2) that $\|\cdot\|$ is indeed a norm. Notice also that (iii) and (iv) imply $(x,\lambda y) = \bar\lambda(x,y)$, that (ii) and (iv) imply $(x,y+z) = (x,y)+(x,z)$, and that $(x,0) = (0,x) = 0$. If the scalar product is real-valued on a real linear space, then the properties determine a real Hilbert space; all the results below apply to these.

Example 5.1. [1] If $X = \mathbb C^n$, then $(x,y) = \sum_{i=1}^n x_i\bar y_i$ makes $\mathbb C^n$ into an $n$-dimensional Hilbert space.
[2] Let $X = C[a,b]$ (complex-valued continuous functions) with inner product $(f,g) = \int_a^b f(t)\overline{g(t)}\,dt$. This makes $X$ into an inner-product space that is not a Hilbert space.
[3] Let $X = \ell^2$ (square-summable sequences) with the inner product $((x_n),(y_n)) = \sum_{n=1}^\infty x_n\bar y_n$. Then $X$ is a Hilbert space (the series converges by the Cauchy–Schwarz inequality, and $\ell^2$ is complete).
[4] Let $X = L^2[a,b]$ with inner product $(f,g) = \int_a^b f(t)\overline{g(t)}\,dt$. This is well defined by the Cauchy–Schwarz inequality, and $X$ is a Hilbert space since $L^2[a,b]$ is complete.

We shall see later that $\ell^2$ is the only $\ell^p$ space that is a Hilbert space.

Lemma 5.1 (Cauchy–Schwarz). In a Hilbert space, $|(x,y)|\le\|x\|\,\|y\|$.

Proof. Assume that $x,y$ are non-zero (the result is clear if $x$ or $y$ is zero), and let $\lambda\in\mathbb C$. Then
$$0\le(x+\lambda y,x+\lambda y) = \|x\|^2 + |\lambda|^2\|y\|^2 + \bar\lambda(x,y) + \lambda(y,x) = \|x\|^2 + |\lambda|^2\|y\|^2 + 2\,\mathrm{Re}\big[\bar\lambda(x,y)\big].$$
Let $\lambda = -re^{i\theta}$ for some $r>0$, and choose $\theta = \arg(x,y)$, so that $\bar\lambda(x,y) = -r|(x,y)|$. Then
$$2r|(x,y)|\le\|x\|^2 + r^2\|y\|^2.$$
Take $r = \|x\|/\|y\|$ to obtain the result.

From the proof of Lemma 5.1, if $|(x,y)| = \|x\|\,\|y\|$ and $y\neq0$, then $x = -\lambda y$ (with $r = \|x\|/\|y\|$ and $\theta$ as above), since then $(x+\lambda y,x+\lambda y) = 0$.

Lemma 5.2. The function defined by $\|x\| = (x,x)^{1/2}$ is a norm on a Hilbert space.

Proof. All the properties are clear except the triangle inequality. Since $(x,y)+(y,x) = 2\,\mathrm{Re}(x,y)\le2|(x,y)|\le2\|x\|\,\|y\|$ by Lemma 5.1, we have
$$\|x+y\|^2 = \|x\|^2 + \|y\|^2 + (x,y) + (y,x)\le\|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\| = (\|x\|+\|y\|)^2,$$
so $\|x+y\|\le\|x\|+\|y\|$.

Lemma 5.3. The norm on a Hilbert space is strictly convex: if $\|x\| = \|y\| = 1$ and $\|x+y\| = 2$, then $x = y$.

Proof. From the proof of Lemma 5.2 it follows that if $\|x+y\| = \|x\|+\|y\|$ and $y\neq0$, then $\mathrm{Re}(x,y) = |(x,y)| = \|x\|\,\|y\|$, so $x = -\lambda y$ for some $\lambda$ with $|\lambda| = \|x\|/\|y\|$. Hence if $\|x\| = \|y\| = 1$ and $\|x+y\| = 2$, then $|\lambda| = 1$ and $|1-\lambda| = 2$, so $\lambda = -1$ and $x = y$.

Next there is the peculiar parallelogram law.

Theorem 5.1. If $H$ is a Hilbert space, then
$$\|x+y\|^2 + \|x-y\|^2 = 2\|x\|^2 + 2\|y\|^2 \qquad (43)$$
for all $x,y\in H$. Conversely, if $H$ is a complex Banach space with norm $\|\cdot\|$ satisfying (43), then $H$ is a Hilbert space with scalar product $(\cdot,\cdot)$ satisfying $\|x\| = (x,x)^{1/2}$.
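A minimal numerical sketch of (43), assuming only the inner product of Example 5.1[1] on $\mathbb C^n$: the Euclidean ($\ell^2$) norm satisfies the parallelogram law, while the $\ell^1$ norm and the sup norm already fail it on $\mathbb C^2$, consistent with the remark that $\ell^2$ is the only $\ell^p$ space that is a Hilbert space. The function names and sample vectors are illustrative, not taken from the notes.

```python
import math

def inner(x, y):
    """(x, y) = sum_i x_i * conj(y_i), the inner product of Example 5.1[1]."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm_p(x, p):
    """l^p norm on C^n; p = math.inf gives the sup norm."""
    if p == math.inf:
        return float(max(abs(a) for a in x))
    return sum(abs(a) ** p for a in x) ** (1.0 / p)

def parallelogram_defect(x, y, p):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 in the l^p norm (zero when (43) holds)."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm_p(s, p) ** 2 + norm_p(d, p) ** 2 - 2 * norm_p(x, p) ** 2 - 2 * norm_p(y, p) ** 2

x = [1 + 2j, -1j, 0.5]
y = [2 - 1j, 3, -1 + 1j]
e1, e2 = [1, 0], [0, 1]

print(parallelogram_defect(x, y, 2))           # zero up to rounding
print(parallelogram_defect(e1, e2, 1))         # 4.0: the l^1 norm fails (43)
print(parallelogram_defect(e1, e2, math.inf))  # -2.0: so does the sup norm
```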
Proof. The forward direction is easy: simply expand the expression $(x+y,x+y)+(x-y,x-y)$. For the reverse direction, define
$$(x,y) = \tfrac14\Big(\|x+y\|^2 - \|x-y\|^2 + i\|x+iy\|^2 - i\|x-iy\|^2\Big) \qquad (44)$$
(in the real case, simply omit the two imaginary terms). Then
$$(x,x) = \tfrac14\Big(4\|x\|^2 + i|1+i|^2\|x\|^2 - i|1-i|^2\|x\|^2\Big) = \|x\|^2,$$
so the inner-product norm $(x,x)^{1/2}$ coincides with the norm $\|x\|$.

To prove that $(\cdot,\cdot)$ satisfies condition (ii) in Definition 5.1, use (43) to show that
$$\|u+v+w\|^2 + \|u+v-w\|^2 = 2\|u+v\|^2 + 2\|w\|^2,$$
$$\|u-v+w\|^2 + \|u-v-w\|^2 = 2\|u-v\|^2 + 2\|w\|^2.$$
It follows that
$$\|u+v+w\|^2 - \|u-v+w\|^2 + \|u+v-w\|^2 - \|u-v-w\|^2 = 2\|u+v\|^2 - 2\|u-v\|^2,$$
showing that the real parts satisfy $\mathrm{Re}(u+w,v) + \mathrm{Re}(u-w,v) = 2\,\mathrm{Re}(u,v)$. A similar argument (with $v$ replaced by $iv$) applies to the imaginary parts, so
$$(u+w,v) + (u-w,v) = 2(u,v).$$
Taking $w = u$ shows that $(2u,v) = 2(u,v)$, since $(0,v) = 0$. Taking $u+w = x$, $u-w = y$, $v = z$ then gives
$$(x,z) + (y,z) = 2\Big(\frac{x+y}{2},z\Big) = (x+y,z).$$

To prove condition (iii) in Definition 5.1, use (ii) to show that, for $m\in\mathbb N$,
$$(mx,y) = ((m-1)x+x,y) = ((m-1)x,y) + (x,y) = ((m-2)x,y) + 2(x,y) = \cdots = m(x,y).$$
The same argument in reverse shows that $n(x/n,y) = (x,y)$, so $(x/n,y) = \frac1n(x,y)$. If $r = m/n$ ($m,n\in\mathbb N$), then
$$(rx,y) = m\Big(\frac xn,y\Big) = \frac mn(x,y) = r(x,y).$$
Since $(x,y)$ is a continuous function of $x$ (by (44)), we deduce that $(\lambda x,y) = \lambda(x,y)$ for all $\lambda>0$. For $\lambda<0$,
$$(\lambda x,y) - \lambda(x,y) = (\lambda x,y) + ((-\lambda)x,y) = (0,y) = 0,$$
so (iii) holds for all $\lambda\in\mathbb R$. For $\lambda = i$, (iii) follows directly from (44): $(ix,y) = i(x,y)$. So if $\lambda = \mu+i\nu$ with $\mu,\nu$ real,
$$(\lambda x,y) = (\mu x,y) + (i\nu x,y) = \mu(x,y) + i(\nu x,y) = \mu(x,y) + i\nu(x,y) = \lambda(x,y).$$
Condition (iv) is clear from (44), and (v) follows from the assumption that $H$ is a Banach space.

2. Projection theorem

Let $H$ be a Hilbert space. A point $x\in H$ is orthogonal to a point $y\in H$, written $x\perp y$, if $(x,y) = 0$. For sets $N,M$ in $H$, $x$ is orthogonal to $N$, written $x\perp N$, if $(x,y) = 0$ for all $y\in N$. The sets $N$ and $M$ are orthogonal (written $N\perp M$) if $x\perp M$ for all $x\in N$. The orthogonal complement of $M$ is defined as
$$M^\perp = \{x\in H \mid x\perp M\}.$$
Notice that for any $M$, $M^\perp$ is a closed linear subspace of $H$.

Since the Hilbert norm is strictly convex (Lemma 5.3), it makes sense in a Hilbert space to talk about the point in a closed convex set that is "closest" to a given point.

Lemma 5.4. Let $M$ be a closed convex set in a Hilbert space $H$. For every point $x_0\in H$ there is a unique point $y_0\in M$ such that
$$\|x_0-y_0\| = \inf_{y\in M}\|x_0-y\|. \qquad (45)$$

Proof. Let $d = \inf_{y\in M}\|x_0-y\|$ and choose a sequence $(y_n)$ in $M$ such that $\|x_0-y_n\|\to d$ as $n\to\infty$. By the parallelogram law (43), applied to $x_0-y_m$ and $x_0-y_n$,
$$4\Big\|x_0-\tfrac12(y_m+y_n)\Big\|^2 + \|y_m-y_n\|^2 = 2\|x_0-y_m\|^2 + 2\|x_0-y_n\|^2\to4d^2$$
as $m,n\to\infty$. By convexity (Definition 1.6), $\frac12(y_m+y_n)\in M$, so $4\|x_0-\frac12(y_m+y_n)\|^2\ge4d^2$. It follows that $\|y_m-y_n\|\to0$ as $m,n\to\infty$. Now $H$ is complete and $M$ is a closed subset, so $\lim_{n\to\infty}y_n = y_0$ exists and lies in $M$. Then $\|x_0-y_0\| = \lim_{n\to\infty}\|x_0-y_n\| = d$, showing (45).

It remains to check that the point $y_0$ is the only point with property (45). Let $y_1$ be another point in $M$ with $\|x_0-y_1\| = \inf_{y\in M}\|x_0-y\|$. Then
$$2\Big\|x_0-\frac{y_0+y_1}2\Big\|\le\|x_0-y_0\| + \|x_0-y_1\| = 2\inf_{y\in M}\|x_0-y\|\le2\Big\|x_0-\frac{y_0+y_1}2\Big\|,$$
since $(y_0+y_1)/2$ lies in $M$. It follows that
$$\|(x_0-y_0)+(x_0-y_1)\| = \|x_0-y_0\| + \|x_0-y_1\|.$$
Since $\|x_0-y_0\| = \|x_0-y_1\|$ and the Hilbert norm is strictly convex (Lemma 5.3), we deduce that $x_0-y_0 = x_0-y_1$ (this is clear if $d = 0$), so $y_1 = y_0$.
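Lemma 5.4 is easy to visualise in $\mathbb R^n$. The sketch below assumes $M$ is the closed box $[0,1]^n$ (a hypothetical choice, made because its nearest-point map is simply coordinatewise clipping) and checks against random points of $M$ that the clipped point is nearest. It also checks the inequality $(x_0-y_0,y-y_0)\le0$ for $y\in M$, a standard characterisation of the nearest point that is not proved in these notes (compare (46) below).

```python
import random

def project_box(x0, lo=0.0, hi=1.0):
    """Nearest point of the closed convex box [lo, hi]^n to x0 (Euclidean norm): clip each coordinate."""
    return [min(max(t, lo), hi) for t in x0]

def dist(a, b):
    return sum((s - t) ** 2 for s, t in zip(a, b)) ** 0.5

random.seed(0)
x0 = [1.7, -0.4, 0.25]
y0 = project_box(x0)                                             # [1.0, 0.0, 0.25]
samples = [[random.random() for _ in x0] for _ in range(10_000)]  # random points of M

# no sampled point of M is closer to x0 than y0
gap = min(dist(x0, y) - dist(x0, y0) for y in samples)
# the variational inequality (x0 - y0, y - y0) <= 0 for y in M
worst = max(sum((a - b) * (c - b) for a, b, c in zip(x0, y0, y)) for y in samples)
print(y0, gap >= 0, worst <= 0)
```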
This gives us the Orthogonal Projection Theorem.

Theorem 5.2. Let $M$ be a closed linear subspace of a Hilbert space $H$. Then any $x_0\in H$ can be written $x_0 = y_0+z_0$ with $y_0\in M$, $z_0\in M^\perp$. The elements $y_0,z_0$ are determined uniquely by $x_0$.

Proof. If $x_0\in M$ then $y_0 = x_0$ and $z_0 = 0$. If $x_0\notin M$, then let $y_0$ be the point in $M$ with
$$\|x_0-y_0\| = \inf_{y\in M}\|x_0-y\|$$
(this point exists by Lemma 5.4). Now for any $y\in M$ and $\lambda\in\mathbb C$, $y_0+\lambda y\in M$, so
$$\|x_0-y_0\|^2\le\|x_0-y_0-\lambda y\|^2 = \|x_0-y_0\|^2 - 2\,\mathrm{Re}\big[\lambda(y,x_0-y_0)\big] + |\lambda|^2\|y\|^2.$$
Hence
$$-2\,\mathrm{Re}\big[\lambda(y,x_0-y_0)\big] + |\lambda|^2\|y\|^2\ge0.$$
Assume now that $\lambda = \epsilon>0$ and divide by $\epsilon$. As $\epsilon\to0$ we deduce that
$$\mathrm{Re}(y,x_0-y_0)\le0. \qquad (46)$$
Assume next that $\lambda = -i\epsilon$ with $\epsilon>0$ and divide by $\epsilon$. As $\epsilon\to0$, we get
$$\mathrm{Im}(y,x_0-y_0)\le0. \qquad (47)$$
Exactly the same argument may be applied to $-y$ since $-y\in M$, showing that (46) and (47) hold with $y$ replaced by $-y$. Thus $(y,x_0-y_0) = 0$ for all $y\in M$. It follows that the point $z_0 = x_0-y_0$ lies in $M^\perp$.

Finally, we check that the decomposition is unique. Suppose that $x_0 = y_1+z_1$ with $y_1\in M$ and $z_1\in M^\perp$. Then $y_0-y_1 = z_1-z_0\in M\cap M^\perp = \{0\}$.

Corollary 5.1. If $M$ is a closed linear subspace and $M\neq H$, then there exists an element $z_0\neq0$ such that $z_0\perp M$.

Proof. Apply the projection theorem (Theorem 5.2) to any $x_0\in H\setminus M$.

It follows that all bounded linear functionals on a Hilbert space are given by taking inner products; this is the Riesz theorem.

Theorem 5.3. For every bounded linear functional $x^*$ on a Hilbert space $H$ there exists a unique element $z\in H$ such that $x^*(x) = (x,z)$ for all $x\in H$. The norm of the functional is given by $\|x^*\| = \|z\|$.

Proof. Let $N$ be the null space of $x^*$; $N$ is a closed linear subspace of $H$. If $N = H$, then $x^* = 0$ and we may take $x^*(x) = (x,0)$. If $N\neq H$, then by Corollary 5.1 there is a point $z_0\in N^\perp$ with $z_0\neq0$. By construction, $\alpha = x^*(z_0)\neq0$. For any $x\in H$, the point $x - x^*(x)z_0/\alpha$ lies in $N$, so $(x - x^*(x)z_0/\alpha, z_0) = 0$. It follows that
$$(x,z_0) = \frac{x^*(x)}{\alpha}\,\|z_0\|^2.$$
If we substitute $z = \dfrac{\bar\alpha}{\|z_0\|^2}z_0$,
we get $x^*(x) = (x,z)$ for all $x\in H$.

To check uniqueness, assume that $x^*(x) = (x,z) = (x,z')$ for all $x\in H$. Then $(x,z-z') = 0$ for all $x\in H$, so (taking $x = z-z'$) $\|z-z'\|^2 = 0$ and therefore $z = z'$.

Finally, by Lemma 5.1,
$$\|x^*\| = \sup_{\|x\|=1}|x^*(x)| = \sup_{\|x\|=1}|(x,z)|\le\sup_{\|x\|=1}\|x\|\,\|z\| = \|z\|.$$
On the other hand, $\|z\|^2 = (z,z) = x^*(z)\le\|x^*\|\,\|z\|$, so $\|z\|\le\|x^*\|$.

Corollary 5.2. If $H$ is a Hilbert space, then the space $H^*$ is also a Hilbert space. The map $\sigma: H\to H^*$ given by $(\sigma x)(y) = (y,x)$ is a conjugate-linear isometry of $H$ onto $H^*$.

3. Projection and self-adjoint operators

Definition 5.2. Let $M$ be a closed linear subspace of the Hilbert space $H$. By the projection theorem, every $x\in H$ can be written uniquely in the form $x = y+z$ with $y\in M$, $z\in M^\perp$. Call $y$ the projection of $x$ in $M$; the operator $P = P_M$ defined by $Px = y$ is the projection on $M$. The space $M$ is called the subspace of the projection $P$.

Definition 5.3. Let $M$ and $N$ be linear subspaces of a Hilbert space $H$. If every element in the linear space $M+N$ has a unique representation in the form $x+y$ with $x\in M$, $y\in N$,
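then we say $M+N$ is a direct sum. If $M\perp N$, then we write $M\oplus N$, and this sum is automatically a direct one. If $Y = M\oplus N$, then we also write $N = Y\ominus M$ and call $N$ the orthogonal complement of $M$ in $Y$. Notice that the projection theorem says that if $M$ is a closed linear subspace of $H$, then $H = M\oplus M^\perp$.

Definition 5.4. Let $T: H\to H$ be a bounded linear operator. The adjoint $T^*$ of $T$ is defined by the relation $(Tx,y) = (x,T^*y)$ for all $x,y\in H$. An operator with $T = T^*$ is called self-adjoint. Notice that if $T$ is self-adjoint, then $(Tx,x)\in\mathbb R$ for all $x\in H$.

Exercise 5.1. Let $T$ and $S$ be bounded linear operators in the Hilbert space $H$, and let $\lambda\in\mathbb C$. Prove the following: $(T+S)^* = T^*+S^*$, $(\lambda T)^* = \bar\lambda T^*$, $(TS)^* = S^*T^*$, $T^{**} = T$, $I^* = I$, $\|T^*\| = \|T\|$, $\|T^*T\| = \|T\|^2$. If $T^{-1}$ is also a bounded linear operator with domain $H$, then $(T^*)^{-1}$ is a bounded linear map with domain $H$ and $(T^{-1})^* = (T^*)^{-1}$.

Theorem 5.4. [1] If $P$ is a projection, then $P$ is self-adjoint, $P^2 = P$, and $\|P\| = 1$ if $P\neq0$. [2] If $P$ is a self-adjoint operator with $P^2 = P$, then $P$ is a projection.

Proof. [1] Let $P = P_M$, and let $x_i = y_i+z_i$ for $i = 1,2$, where $y_i\in M$ and $z_i\in M^\perp$. Then
$$\lambda_1x_1+\lambda_2x_2 = (\lambda_1y_1+\lambda_2y_2) + (\lambda_1z_1+\lambda_2z_2)$$
with $\lambda_1y_1+\lambda_2y_2\in M$ and $\lambda_1z_1+\lambda_2z_2\in M^\perp$. It follows that $P$ is linear. To see that $P^2 = P$, notice that $P^2x_1 = P(Px_1) = P(y_1) = y_1 = Px_1$ since $y_1\in M$. Notice that $\|x_1\|^2 = \|y_1\|^2 + \|z_1\|^2\ge\|y_1\|^2 = \|Px_1\|^2$, so $\|P\|\le1$. If $P\neq0$ then for any $x\in M\setminus\{0\}$ we have $Px = x$, so $\|P\|\ge1$. Self-adjointness is clear:
$$(Px_1,x_2) = (y_1,x_2) = (y_1,y_2) = (x_1,y_2) = (x_1,Px_2).$$
[2] Let $M = P(H)$; then $M$ is a linear subspace of $H$. If $y_n = P(x_n)$ with $y_n\to z$, then $Py_n = P^2x_n = Px_n = y_n$, so $z = \lim_ny_n = \lim_nPy_n = Pz\in M$, and $M$ is closed. Since $P$ is self-adjoint and $P^2 = P$,
$$(x-Px,Py) = (Px-P^2x,y) = 0$$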
so M ⊥ N . If PL PM = PL . PL x = PL PM x ≤ PL PM x ≤ Pm x . Then PL x ∈ M for all x ∈ H. P = PM − PL is a projection if and only if PL is a part of PM . so L ⊂ M . PROJECTION AND SELF–ADJOINT OPERATORS 53 for all y ∈ H so x − P x ∈ M ⊥ . [1] Projections PM and PN are orthogonal if and only if M ⊥ N . then ∗ ∗ ∗ PL = PL = (PM PL )∗ = PL PM = PL PM . [4] PL is part of PM ⇐⇒ PM PL = PL ⇐⇒ PL PM = PL ⇐⇒ PL x ≤ PM x ∀ x ∈ H. [6] More generally. PN y) = (PN PM x. [1] Let PM PN = 0 and x ∈ M. then P ∗ = P . Multiplying on the right by PM then gives 2PM PN PM = 0 so PM PN = 0. it is clear that (PM + PN )(H) = M ⊕ N so P = PM ⊕N . y ∈ N . then P = PM L . Finally. Then P ∗ = P . 2 2 Also P 2 = PM PN PM PN = PM PN = PM PN = P . so P is a projection. then I − P is a projection. PM + PN = PM ⊕N . y0 ∈ M . Conversely. Since projections are self– adjoint. Then PL x0 2 = y0 2 + z0 2 > y0 2 = PM x0 2 . Then (x.5. P is the projection PM . if x ∈ M ∩ N then P x = PM (PN x) = PM x = x so P = PM ∩N . and PM PL = PL . Hence PM PN + PM PN PM = 0. it is a projection. This means that x = P x + (x − P x) is the unique decomposition of x as a sum y + z with y ∈ M and z ∈ M ⊥ . Since P is self– adjoint. let PM PN = PN PM = P . Proof. Projections P1 and P2 are orthogonal if P1 P2 = 0. That is. In that case. ∗ ∗ [3] If P = PM PN is a projection. On the other hand. then for any x ∈ H. so P is self–adjoint. The projection PL is part of the projection PM if and only if L ⊂ M . [2] The sum of two projections PM and PN is a projection if and only if PM PN = 0.3. If so. z0 ⊥ M . after multiplying by PM on the left. Theorem 5. then P 2 = P . [5] If P is a projection. [3] The product of two projections PM and PN is another projection if and only if PM PN = PN PM . PM PN = PM ∩N . We collect all the elementary properties of projections into the next theorem. so PL PM = PL . so PM PN = (PM PN )∗ = PN PM = PN PM . [4] Assume that PL is part of PM . Conversely. y) = (PM x. so that PL x ≤ PM x . 
Conversely, assume that $\|P_Lx\|\le\|P_Mx\|$ for all $x\in H$. If there is a point $x_0\in L\setminus M$, then write $x_0 = y_0+z_0$ with $y_0\in M$, $z_0\perp M$ and $z_0\neq0$. Then
$$\|P_Lx_0\|^2 = \|x_0\|^2 = \|y_0\|^2 + \|z_0\|^2 > \|y_0\|^2 = \|P_Mx_0\|^2,$$
so there can be no such point. It follows that $L\subset M$, so $P_L$ is part of $P_M$.

[5] $I-P$ is self-adjoint, and $(I-P)^2 = I-P-P+P^2 = I-P$.

[6] If $P = P_M-P_L$ is a projection, then by [5] so is $I-P = (I-P_M)+P_L$. Also by [5], $I-P_M$ is a projection, so by [2] we must have $(I-P_M)P_L = 0$. That is, $P_L = P_MP_L$, so by [4] $P_L$ is part of $P_M$. Conversely, if $P_L$ is part of $P_M$, then $(I-P_M)P_L = P_L-P_MP_L = 0$ by [4], so by [2] the operator $I-P = (I-P_M)+P_L$ is a projection, and hence so is $P$ by [5]. Moreover, $P$ and $P_L$ are orthogonal, since $(P_M-P_L)P_L = P_L-P_L = 0$, and $P+P_L = P_M$; by [2], the subspace $Y$ of $P$ must therefore satisfy $Y\oplus L = M$, so $Y = M\ominus L$.

4. Orthonormal sets

A subset $K$ in a Hilbert space $H$ is orthonormal if each element of $K$ has norm 1 and if any two elements of $K$ are orthogonal. An orthonormal set $K$ is complete if $K^\perp = \{0\}$.

Theorem 5.6. Let $\{x_n\}$ be an orthonormal sequence in $H$. Then for any $x\in H$,
$$\sum_{n=1}^\infty|(x,x_n)|^2\le\|x\|^2. \qquad (48)$$
The inequality (48) is Bessel's inequality. The scalar coefficients $(x,x_n)$ are called the Fourier coefficients of $x$ with respect to $\{x_n\}$.

Proof. By orthonormality,
$$\Big\|x-\sum_{n=1}^m(x,x_n)x_n\Big\|^2 = \|x\|^2 - \sum_{n=1}^m\overline{(x,x_n)}(x,x_n) - \sum_{n=1}^m(x,x_n)(x_n,x) + \sum_{n=1}^m|(x,x_n)|^2\|x_n\|^2 = \|x\|^2 - \sum_{n=1}^m|(x,x_n)|^2. \qquad (49)$$
It follows that $\sum_{n=1}^m|(x,x_n)|^2\le\|x\|^2$, and Bessel's inequality follows by taking $m\to\infty$.

The next result shows that the Fourier series of Theorem 5.6 gives the best possible approximation of fixed length.

Theorem 5.7. Let $\{x_n\}$ be an orthonormal sequence in a Hilbert space $H$ and let $\{\lambda_n\}$ be any sequence of scalars. Then, for any $x\in H$ and any $m\ge1$,
$$\Big\|x-\sum_{n=1}^m\lambda_nx_n\Big\|\ge\Big\|x-\sum_{n=1}^m(x,x_n)x_n\Big\|.$$
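Theorems 5.6 and 5.7 can be checked in $\mathbb C^4$. In the sketch below the orthonormal pair `x1`, `x2` and the vector `x` are arbitrary illustrative choices; the code verifies Bessel's inequality (48) and the identity (49), and confirms that randomly chosen coefficients never approximate `x` better than the Fourier coefficients do.

```python
import math
import random

def inner(x, y):
    """(x, y) = sum_i x_i conj(y_i) on C^n."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

# an orthonormal (but not complete) pair in C^4
x1 = [0.5, 0.5, 0.5, 0.5]
x2 = [0.5, -0.5, 0.5j, -0.5j]
basis = [x1, x2]
x = [1 + 1j, 2, -1, 0.5j]

coeffs = [inner(x, e) for e in basis]          # Fourier coefficients (x, x_n)

def residual(lams):
    """|| x - sum_n lams[n] x_n ||."""
    approx = [sum(l * e[i] for l, e in zip(lams, basis)) for i in range(len(x))]
    return norm([a - b for a, b in zip(x, approx)])

bessel = sum(abs(c) ** 2 for c in coeffs)      # left side of (48), truncated
best = residual(coeffs)

random.seed(1)
trials = [residual([complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in basis])
          for _ in range(1000)]
print(bessel, norm(x) ** 2, best, min(trials))
```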
Proof. Write $c_n = (x,x_n)$. Then
$$\Big\|x-\sum_{n=1}^m\lambda_nx_n\Big\|^2 = \|x\|^2 - \sum_{n=1}^m\bar\lambda_nc_n - \sum_{n=1}^m\lambda_n\bar c_n + \sum_{n=1}^m|\lambda_n|^2 = \|x\|^2 - \sum_{n=1}^m|c_n|^2 + \sum_{n=1}^m|c_n-\lambda_n|^2\ge\|x\|^2 - \sum_{n=1}^m|c_n|^2.$$
Now apply equation (49).

Theorem 5.8. Let $\{x_n\}$ be an orthonormal sequence in a Hilbert space $H$, and let $\{\alpha_n\}$ be any sequence of scalars. Then the series $\sum\alpha_nx_n$ is convergent if and only if $\sum|\alpha_n|^2<\infty$, and if so
$$\Big\|\sum_{n=1}^\infty\alpha_nx_n\Big\| = \Big(\sum_{n=1}^\infty|\alpha_n|^2\Big)^{1/2}. \qquad (50)$$
Moreover, $\sum\alpha_nx_n$ is independent of the order in which the terms are arranged.

Proof. For $m>n$ we have (by orthonormality)
$$\Big\|\sum_{j=n}^m\alpha_jx_j\Big\|^2 = \sum_{j=n}^m|\alpha_j|^2. \qquad (51)$$
Since $H$ is complete, (51) shows the first part of the theorem. Take $n = 1$ and $m\to\infty$ in (51) to get (50).

Assume that $\sum|\alpha_j|^2<\infty$ and let $z = \sum\alpha_{j_n}x_{j_n}$ be a rearrangement of the series $x = \sum\alpha_jx_j$. Then
$$\|x-z\|^2 = (x,x) - (x,z) - (z,x) + (z,z). \qquad (52)$$
Write $s_m = \sum_{j=1}^m\alpha_jx_j$ and $t_m = \sum_{n=1}^m\alpha_{j_n}x_{j_n}$. Then
$$(x,z) = \lim_{m\to\infty}(s_m,t_m) = \sum_{j=1}^\infty|\alpha_j|^2.$$
Also $(z,x) = \overline{(x,z)} = \sum|\alpha_j|^2$, and $(x,x) = (z,z) = \sum|\alpha_j|^2$ by (50), so (52) shows that $\|x-z\| = 0$ and hence $x = z$.

Theorem 5.9. Let $K$ be any orthonormal set in a Hilbert space $H$, and for each $x\in H$ let $K_x = \{y\in K \mid (x,y)\neq0\}$. Then:
(i) for any $x\in H$, $K_x$ is countable;
(ii) the sum $Ex = \sum_{y\in K_x}(x,y)y$ converges independently of the order in which the terms are arranged;
(iii) $E$ is the projection operator onto the closed linear space spanned by $K$.
Proof. From Bessel's inequality (48), for any $\epsilon>0$ there are no more than $\|x\|^2/\epsilon^2$ points $y$ in $K$ with $|(x,y)|>\epsilon$. Taking $\epsilon = 1,\frac12,\frac13,\dots$, we see that $K_x$ is countable for any $x$. Arrange the elements of $K_x$ in a sequence $\{x_n\}$; Bessel's inequality and Theorem 5.8 show (ii).

Let $\overline{\langle K\rangle}$ denote the closed linear subspace spanned by $K$. If $x\in\overline{\langle K\rangle}$, then for any $\epsilon>0$ there are scalars $\lambda_1,\dots,\lambda_n$ and elements $y_1,\dots,y_n\in K$ such that $\|x-\sum_{j=1}^n\lambda_jy_j\|<\epsilon$. Without loss of generality, all of the $y_j$ lie in $K_x$. Arrange the set $K_x$ in a sequence $\{y_j\}$ beginning with these elements. Then by Theorem 5.7,
$$\Big\|x-\sum_{j=1}^n(x,y_j)y_j\Big\|<\epsilon. \qquad (53)$$
From (49) notice that the left-hand side of (53) does not increase with $n$. Taking $n\to\infty$, we get $\|x-Ex\|\le\epsilon$. Since $\epsilon>0$ is arbitrary, we deduce that $Ex = x$ for all $x\in\overline{\langle K\rangle}$. If $x\perp\overline{\langle K\rangle}$ then $Ex = 0$. This proves that $E = P_{\overline{\langle K\rangle}}$.

Definition 5.6. A set $K$ is an orthonormal basis of $H$ if $K$ is orthonormal and for every $x\in H$,
$$x = \sum_{y\in K_x}(x,y)y. \qquad (54)$$

Theorem 5.10. Let $K$ be an orthonormal set in a Hilbert space $H$. Then the following properties are equivalent.
(i) $K$ is complete.
(ii) $\overline{\langle K\rangle} = H$.
(iii) $K$ is an orthonormal basis for $H$.
(iv) For any $x\in H$, $\|x\|^2 = \sum_{y\in K_x}|(x,y)|^2$.
The equality in (iv) is called Parseval's formula.

Proof. That (i) implies (ii) follows from Corollary 5.1. Assume (ii). Then by Theorem 5.9, $Ex = x$ for all $x\in H$, so $K$ is an orthonormal basis. Now assume (iii). Arrange the elements of $K_x$ in a sequence $\{x_n\}$, and take $m\to\infty$ in (49) to obtain Parseval's formula (iv). Finally, assume (iv). If $x\perp K$, then $\|x\|^2 = \sum_{y\in K_x}|(x,y)|^2 = 0$, so $x = 0$. This means that (iv) implies (i).

Example 5.2. Classical Fourier analysis comes about using the orthonormal basis $\{e^{2\pi int}\}_{n\in\mathbb Z}$ for $L^2[0,1]$.

Theorem 5.11. Every Hilbert space has an orthonormal basis. Any orthonormal basis in a separable Hilbert space is countable.
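Proof. Let $H$ be a Hilbert space, and consider the class of orthonormal sets in $H$ with the partial order of inclusion. By Lemma A.1 there exists a maximal orthonormal set $K$. Since $K$ is maximal, it is complete and is therefore an orthonormal basis (Theorem 5.10).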
Now let $H$ be separable, and suppose that $\{x_\alpha\}$ is an uncountable orthonormal basis. Since, for any $\alpha\neq\beta$,
$$\|x_\alpha-x_\beta\|^2 = \|x_\alpha\|^2 + \|x_\beta\|^2 = 2,$$
the balls $B_{1/2}(x_\alpha)$ are mutually disjoint. If $\{y_n\}$ is a dense sequence in $H$, then, since there are uncountably many disjoint balls but only countably many points $y_n$, there is a ball $B_{1/2}(x_{\alpha_0})$ that does not contain any of the points $y_n$. Hence $x_{\alpha_0}$ is not in the closure of $\{y_n\}$, a contradiction.

Corollary 5.3. Any two infinite-dimensional separable Hilbert spaces are isometrically isomorphic.

Proof. Let $H_1$ and $H_2$ be two such spaces. By Theorem 5.11 there are sequences $\{x_n\}$ and $\{y_n\}$ that form orthonormal bases for $H_1$ and $H_2$ respectively. Given any points $x\in H_1$ and $y\in H_2$, we may write
$$x = \sum_{n=1}^\infty c_nx_n,\qquad y = \sum_{n=1}^\infty d_ny_n, \qquad (55)$$
where $c_n = (x,x_n)$ and $d_n = (y,y_n)$ for all $n\ge1$. Define a map $T: H_1\to H_2$ by $Tx = y$ if $c_n = d_n$ for all $n$ in (55). It is clear that $T$ is linear, and it maps $H_1$ onto $H_2$ since the sequences $(c_n)$ and $(d_n)$ run through all of $\ell^2$ (Theorems 5.8 and 5.10). Also,
$$\|Tx\|^2 = \sum_{n=1}^\infty|d_n|^2 = \sum_{n=1}^\infty|c_n|^2 = \|x\|^2,$$
so $T$ is an isometry.
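The map $T$ of Corollary 5.3 can be imitated in finite dimensions (a sketch only; the notes treat the infinite-dimensional case). Below, `std` is the standard basis of $\mathbb C^N$ and `dft` is the orthonormal basis $f_k = N^{-1/2}(e^{2\pi ijk/N})_j$; these two bases and the helper names are illustrative choices. Sending $\sum c_ne_n$ to $\sum c_nf_n$ preserves norms, which is Parseval's formula in disguise.

```python
import cmath
import math

def inner(x, y):
    """(x, y) = sum_i x_i conj(y_i) on C^N."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

N = 3
std = [[1.0 if i == k else 0.0 for i in range(N)] for k in range(N)]
dft = [[cmath.exp(2j * math.pi * j * k / N) / math.sqrt(N) for j in range(N)] for k in range(N)]

def T(x, src=std, dst=dft):
    """Map sum_n c_n e_n (e_n in src, c_n = (x, e_n)) to sum_n c_n f_n (f_n in dst)."""
    c = [inner(x, e) for e in src]
    return [sum(cn * f[i] for cn, f in zip(c, dst)) for i in range(len(x))]

x = [1 + 2j, -0.5, 3j]
y = T(x)
print(norm(x), norm(y))   # equal up to rounding
```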
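5. Gram–Schmidt orthonormalization

Starting with any linearly independent set $\{x_1,x_2,\dots\}$, we can inductively construct an orthonormal set that spans the same subspace by the Gram–Schmidt orthonormalization process (Theorem 5.12). We will only need this for sets whose linear span is dense. The idea is simple: first, any vector $v\neq0$ can be reduced to unit length simply by dividing by its length $\|v\|$. Second, if $x_1$ is a fixed unit vector and $x_2$ is another unit vector with $\{x_1,x_2\}$ linearly independent, then $x_2-(x_2,x_1)x_1$ is a non-zero vector (since $x_1$ and $x_2$ are independent), is orthogonal to $x_1$ (since $(x_1,x_2-(x_2,x_1)x_1) = (x_1,x_2) - \overline{(x_2,x_1)}(x_1,x_1) = 0$), and $\{x_1,x_2-(x_2,x_1)x_1\}$ spans the same space as $\{x_1,x_2\}$. This idea can be extended as follows; the notational complexity comes about because of the need to renormalize (make each new vector unit length).

Theorem 5.12. If $\{x_1,x_2,\dots\}$ is a linearly independent set whose linear span is dense in a Hilbert space $H$, then the set $\{\phi_1,\phi_2,\dots\}$ defined below is an orthonormal basis for $H$:
$$\phi_1 = \frac{x_1}{\|x_1\|},\qquad\phi_2 = \frac{x_2-(x_2,\phi_1)\phi_1}{\|x_2-(x_2,\phi_1)\phi_1\|},$$
and in general, for any $n\ge1$,
$$\phi_n = \frac{x_n-(x_n,\phi_1)\phi_1-(x_n,\phi_2)\phi_2-\cdots-(x_n,\phi_{n-1})\phi_{n-1}}{\|x_n-(x_n,\phi_1)\phi_1-(x_n,\phi_2)\phi_2-\cdots-(x_n,\phi_{n-1})\phi_{n-1}\|}.$$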
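A direct transcription of the formulas in Theorem 5.12 for finitely many vectors in $\mathbb C^n$ (a sketch; the function name `gram_schmidt` and the tolerance used to detect dependence are assumptions, not part of the notes). It computes the coefficients $(x_n,\phi_k)$ from the original $x_n$, exactly as written in the theorem; in floating-point work the "modified" variant, which uses the partially reduced vector instead, is usually preferred for numerical stability.

```python
import math

def inner(x, y):
    """(x, y) = sum_i x_i conj(y_i) on C^n."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Return phi_1, phi_2, ... of Theorem 5.12 for a finite list of vectors."""
    phis = []
    for x in vectors:
        w = [complex(t) for t in x]
        for phi in phis:
            c = inner(x, phi)                     # (x_n, phi_k), using the original x_n
            w = [wi - c * pk for wi, pk in zip(w, phi)]
        length = math.sqrt(inner(w, w).real)
        if length < tol:                          # phi_n would be zero: x_n depends on earlier x_k
            raise ValueError("vectors are not linearly independent")
        phis.append([wi / length for wi in w])
    return phis

phis = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1j]])
for p in phis:
    print([round(abs(inner(p, q)), 12) for q in phis])   # rows of the identity matrix
```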
The proof is obvious unless you try to write it down: the idea is that at each stage the piece of the next vector $x_n$ that is not orthogonal to the space spanned by $\{x_1,\dots,x_{n-1}\}$ is subtracted. The vector $\phi_n$ so constructed cannot be zero by linear independence.

The most important situation in which this is used is to find orthonormal bases for certain weighted function spaces. Given $a<b$, $a,b\in[-\infty,\infty]$, and a function $M:(a,b)\to(0,\infty)$ with the property that $\int_a^b|t|^nM(t)\,dt<\infty$ for all $n\ge0$, define the Hilbert space $L^2_M[a,b]$ to be the linear space of measurable functions $f$ with $\|f\|_M = (f,f)_M^{1/2}<\infty$, where
$$(f,g)_M = \int_a^bM(t)f(t)\overline{g(t)}\,dt.$$
It may be shown that the linearly independent set $\{1,t,t^2,t^3,\dots\}$ has a linear span dense in $L^2_M$. The Gram–Schmidt orthonormalization process may be applied to this set to produce various families of classical orthonormal functions.

Example 5.3. In each case the process generates the named polynomials, up to normalization.
[1] If $M(t) = 1$ for all $t$, $a = -1$, $b = 1$, then the process generates the Legendre polynomials.
[2] If $M(t) = \frac1{\sqrt{1-t^2}}$, $a = -1$, $b = 1$, then the process generates the Tchebychev polynomials.
[3] If $M(t) = t^{q-1}(1-t)^{p-q}$, $a = 0$, $b = 1$ (with $q>0$ and $p-q>-1$), then the process generates the Jacobi polynomials.
[4] If $M(t) = e^{-t^2}$, $a = -\infty$, $b = \infty$, then the process generates the Hermite polynomials.
[5] If $M(t) = e^{-t}$, $a = 0$, $b = \infty$, then the process generates the Laguerre polynomials.
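Example 5.3[1] can be reproduced with exact rational arithmetic. The sketch below (helper names are hypothetical) runs the Gram–Schmidt recursion on $1,t,t^2,\dots$ in $L^2_M[-1,1]$ with $M\equiv1$, but skips the final division by the norm, since that would introduce square roots; it therefore returns the monic orthogonal polynomials, and the orthonormal $\phi_n$ are these divided by their $\|\cdot\|_M$-norms. Each inner product uses the partially reduced vector, which in exact arithmetic gives the same result as the formula of Theorem 5.12. The outputs are monic multiples of the Legendre polynomials; for example $t^2-\frac13 = \frac23P_2(t)$ with $P_2(t) = \frac12(3t^2-1)$.

```python
from fractions import Fraction

def ip(p, q):
    """(p, q)_M = integral_{-1}^{1} p(t) q(t) dt with M = 1; p, q are coefficient lists (index = power of t)."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:                  # odd powers integrate to 0 over [-1, 1]
                total += a * b * Fraction(2, i + j + 1)
    return total

def sub_scaled(p, c, q):
    """Coefficient list of p - c*q."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a - c * b for a, b in zip(p, q)]

def monic_orthogonal_polys(n_max):
    """Gram–Schmidt on 1, t, ..., t^n_max without the final normalisation."""
    polys = []
    for n in range(n_max + 1):
        v = [Fraction(0)] * n + [Fraction(1)]     # the monomial t^n
        for phi in polys:
            v = sub_scaled(v, ip(v, phi) / ip(phi, phi), phi)
        polys.append(v)
    return polys

for p in monic_orthogonal_polys(4):
    print([str(c) for c in p])
# prints, one per line:
# ['1'], ['0', '1'], ['-1/3', '0', '1'], ['0', '-3/5', '0', '1'], ['3/35', '0', '-6/7', '0', '1']
```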
CHAPTER 6

Fourier analysis

In the last chapter we saw some very general methods of "Fourier analysis" in Hilbert space. Of course the methods started with the classical setting of periodic complex-valued functions on the real line, and in this chapter we describe the elementary theory of classical Fourier analysis using summability kernels. The classical theory of Fourier series is a huge subject: the introduction below comes mostly from Katznelson¹ and from Körner²; both are highly recommended for further study.

1. Fourier series of L¹ functions

Denote by $L^1(\mathbb T)$ the Banach space of complex-valued, Lebesgue integrable functions on $\mathbb T = [0,2\pi)/(0\sim2\pi)$ (this just means $2\pi$-periodic functions). Modify the $L^1$-norm on this space so that
$$\|f\|_1 = \frac1{2\pi}\int_0^{2\pi}|f(t)|\,dt.$$
What is going on here is simply this: to avoid writing "$2\pi$" hundreds of times, we make the unit circle have "length" $2\pi$. To recover the useful normalization that the $L^1$-norm of the constant function 1 is 1, the usual $L^1$-norm is divided by $2\pi$. Notice that the translate $f_x$ of a function, where $f_x(t) = f(t-x)$, has the same norm.

Definition 6.1. A trigonometric polynomial on $\mathbb T$ is an expression of the form
$$P(t) = \sum_{n=-N}^Na_ne^{int},$$
with $a_n\in\mathbb C$.

Lemma 6.1. The functions $\{e^{int}\}_{n\in\mathbb Z}$ are pairwise orthogonal in $L^2$. That is,
$$\frac1{2\pi}\int_0^{2\pi}e^{int}e^{-imt}\,dt = \frac1{2\pi}\int_0^{2\pi}e^{i(n-m)t}\,dt = \begin{cases}1 & \text{if } n = m,\\ 0 & \text{if } n\neq m.\end{cases}$$
It follows that if the function $P(t)$ is given, we can recover the coefficients $a_n$ by computing
$$a_n = \frac1{2\pi}\int_0^{2\pi}P(t)e^{-int}\,dt.$$

¹An Introduction to Harmonic Analysis, Y. Katznelson, Dover Publications, New York (1976).
²Fourier Analysis, T. Körner, Cambridge University Press, Cambridge.
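Lemma 6.1 and the recovery formula for $a_n$ can be checked numerically. The sketch below replaces $\frac1{2\pi}\int_0^{2\pi}$ by an average over $N$ equally spaced points; this is exact (up to rounding) for trigonometric polynomials of degree less than $N$, because $\frac1N\sum_{k=0}^{N-1}e^{2\pi imk/N}$ equals 1 when $N$ divides $m$ and 0 otherwise. The coefficient dictionary `a` and the helper names are made-up examples.

```python
import cmath
import math

def circle_mean(func, N=64):
    """(1/2π) ∫_0^{2π} func(t) dt via an N-point equally spaced average (exact for degree < N)."""
    return sum(func(2 * math.pi * k / N) for k in range(N)) / N

def coefficient(P, n, N=64):
    """a_n = (1/2π) ∫ P(t) e^{-int} dt."""
    return circle_mean(lambda t: P(t) * cmath.exp(-1j * n * t), N)

a = {-2: 1 - 1j, 0: 3.0, 1: 0.5j, 3: -2.0}          # hypothetical coefficients
P = lambda t: sum(c * cmath.exp(1j * n * t) for n, c in a.items())

recovered = {n: coefficient(P, n) for n in range(-4, 5)}
for n, c in recovered.items():
    print(n, complex(round(c.real, 12), round(c.imag, 12)))
```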
The expression $P(t) = \sum_{n=-N}^Na_ne^{int}$ is a function, defined by the value of the right-hand side for each value of $t$. It will be useful later to write things like
$$P\sim\sum_{n=-N}^Na_ne^{int},$$
which means that $P$ is identified with the formal sum on the right-hand side.

Definition 6.2. A trigonometric series on $\mathbb T$ is an expression
$$S\sim\sum_{n=-\infty}^\infty a_ne^{int}. \qquad (56)$$
The conjugate of $S$ is the series
$$\tilde S\sim\sum_{n=-\infty}^\infty -i\,\mathrm{sign}(n)\,a_ne^{int}, \qquad (57)$$
where $\mathrm{sign}(n) = 0$ if $n = 0$ and $\mathrm{sign}(n) = n/|n|$ if not. Notice that there is no assumption about convergence, so in general $S$ is not related to a function at all.

Definition 6.3. Let $f\in L^1(\mathbb T)$. Define the $n$th (classical) Fourier coefficient of $f$ to be
$$\hat f(n) = \frac1{2\pi}\int f(t)e^{-int}\,dt \qquad (58)$$
(the integration is from 0 to $2\pi$ as usual). Associate to $f$ the Fourier series $S[f]$, which is defined to be the formal trigonometric series
$$S[f]\sim\sum_{n=-\infty}^\infty\hat f(n)e^{int}. \qquad (59)$$
We say that a given trigonometric series (56) is a Fourier series if it is of the form (59) for some $f\in L^1(\mathbb T)$.

Theorem 6.1. Let $f,g\in L^1(\mathbb T)$. Then
[1] $\widehat{(f+g)}(n) = \hat f(n) + \hat g(n)$.
[2] For $\lambda\in\mathbb C$, $\widehat{(\lambda f)}(n) = \lambda\hat f(n)$.
[3] If $\bar f(t) = \overline{f(t)}$ is the complex conjugate of $f$, then $\hat{\bar f}(n) = \overline{\hat f(-n)}$.
[4] If $f_x(t) = f(t-x)$ is the translate of $f$, then $\hat f_x(n) = e^{-inx}\hat f(n)$.
[5] $|\hat f(n)|\le\frac1{2\pi}\int|f(t)|\,dt = \|f\|_1$.
Prove these as an exercise.

Notice that $f\mapsto\hat f$ sends a function in $L^1(\mathbb T)$ to a function in $C(\mathbb Z)$, the continuous functions on $\mathbb Z$ with the sup norm. This map is continuous in the following sense.

Corollary 6.1. Assume $(f_j)$ is a sequence in $L^1(\mathbb T)$ with $\|f_j-f\|_1\to0$. Then $\hat f_j\to\hat f$ uniformly.

Proof. This follows at once from Theorem 6.1[5], applied to $f_j-f$.
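Parts [3], [4] and [5] of Theorem 6.1 (left as an exercise) can at least be tested numerically. The sketch below approximates (58) by an $N$-point average, which is extremely accurate for smooth periodic functions; the test function `f` and the shift `x` are arbitrary assumptions, and a numerical check is of course not a proof.

```python
import cmath
import math

N = 256

def fhat(f, n):
    """n-th Fourier coefficient (58) via an N-point equally spaced average."""
    return sum(f(2 * math.pi * k / N) * cmath.exp(-2j * math.pi * n * k / N) for k in range(N)) / N

f = lambda t: math.exp(math.cos(t)) * (1 + 0.3j * math.sin(t))   # hypothetical smooth f
x = 0.7
f_x = lambda t: f(t - x)                   # the translate f_x(t) = f(t - x)
f_bar = lambda t: f(t).conjugate()         # the complex conjugate of f
norm1 = sum(abs(f(2 * math.pi * k / N)) for k in range(N)) / N   # ||f||_1 with the 1/2π normalisation

for n in range(-3, 4):
    print(n,
          abs(fhat(f_x, n) - cmath.exp(-1j * n * x) * fhat(f, n)),   # [4]: close to 0
          abs(fhat(f_bar, n) - fhat(f, -n).conjugate()),            # [3]: close to 0
          abs(fhat(f, n)) <= norm1)                                 # [5]: True
```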
Theorem 6.2. Let $f\in L^1(\mathbb T)$ have $\hat f(0) = 0$. Define
$$F(t) = \int_0^tf(s)\,ds.$$
Then $F$ is continuous and $2\pi$-periodic, and
$$\hat F(n) = \frac1{in}\hat f(n)\quad\text{for all } n\neq0.$$

Proof. It is clear that $F$ is continuous since it is the integral of an $L^1$ function. Also,
$$F(t+2\pi) - F(t) = \int_t^{t+2\pi}f(s)\,ds = 2\pi\hat f(0) = 0.$$
Finally, using integration by parts (the boundary terms cancel by periodicity),
$$\hat F(n) = \frac1{2\pi}\int_0^{2\pi}F(t)e^{-int}\,dt = -\frac1{2\pi}\int_0^{2\pi}F'(t)\,\frac{e^{-int}}{-in}\,dt = \frac1{in}\hat f(n).$$
Notice that we have used the symbol $F'$: the function $F$ is differentiable almost everywhere, with $F' = f$ a.e., because of the way it was defined.

2. Convolution in L¹

In this section we introduce a form of "multiplication" on $L^1(\mathbb T)$ that makes it into a Banach algebra (see Definition 3.3). Notice that the only real property we will use is that the circle $\mathbb T$ is a group on which the measure $ds$ is translation invariant: $\int f_x(s)\,ds = \int f(s)\,ds$.

Theorem 6.3. Assume that $f,g$ are in $L^1(\mathbb T)$. Then, for almost all $t$, the function $s\mapsto f(t-s)g(s)$ is integrable. Define the convolution of $f$ and $g$ to be
$$(f*g)(t) = \frac1{2\pi}\int f(t-s)g(s)\,ds. \qquad (60)$$
Then $f*g\in L^1(\mathbb T)$, with norm
$$\|f*g\|_1\le\|f\|_1\,\|g\|_1.$$
Moreover,
$$\widehat{(f*g)}(n) = \hat f(n)\hat g(n).$$

Proof. It is clear that $F(t,s) = f(t-s)g(s)$ is a measurable function of the variable $(s,t)$. For almost all $s$, $t\mapsto F(t,s)$ is a constant multiple of $f_s$, so is integrable. Moreover,
$$\frac1{2\pi}\int\Big(\frac1{2\pi}\int|F(t,s)|\,dt\Big)ds = \frac1{2\pi}\int|g(s)|\,\|f\|_1\,ds = \|f\|_1\,\|g\|_1.$$
So, by Fubini's Theorem 4.4, $f(t-s)g(s)$ is integrable as a function of $s$ for almost all $t$, and
$$\frac1{2\pi}\int|(f*g)(t)|\,dt\le\frac1{4\pi^2}\int\!\!\int|F(t,s)|\,dt\,ds = \|f\|_1\,\|g\|_1.$$
This shows that $\|f*g\|_1\le\|f\|_1\,\|g\|_1$. Finally, using Fubini again to justify a change in the order of integration,
$$\widehat{(f*g)}(n) = \frac1{2\pi}\int(f*g)(t)e^{-int}\,dt = \frac1{4\pi^2}\int\!\!\int f(t-s)e^{-in(t-s)}g(s)e^{-ins}\,dt\,ds = \frac1{2\pi}\int f(t)e^{-int}\,dt\cdot\frac1{2\pi}\int g(s)e^{-ins}\,ds = \hat f(n)\hat g(n).$$

Theorem 6.4. The operation $(f,g)\mapsto f*g$ is commutative, associative, and distributive over addition.

Prove this as an exercise.

Lemma 6.2. If $f\in L^1(\mathbb T)$ and $k(t) = \sum_{n=-N}^Na_ne^{int}$, then
$$(k*f)(t) = \sum_{n=-N}^Na_n\hat f(n)e^{int}.$$

Proof. Simply check this one term at a time: if $\chi_n(t) = e^{int}$, then
$$(\chi_n*f)(t) = \frac1{2\pi}\int e^{in(t-s)}f(s)\,ds = e^{int}\,\frac1{2\pi}\int f(s)e^{-ins}\,ds = \hat f(n)e^{int}.$$
Thus convolving with the function $e^{int}$ picks out the $n$th Fourier coefficient.

3. Summability kernels and homogeneous Banach algebras

Two properties of the Banach space $L^1(\mathbb T)$ are particularly important for Fourier analysis.

Lemma 6.3. If $f\in L^1(\mathbb T)$ and $x\in\mathbb T$, then $f_x(t) = f(t-x)$ is in $L^1(\mathbb T)$ and $\|f_x\|_1 = \|f\|_1$. Also, the function $x\mapsto f_x$ is continuous on $\mathbb T$ for each $f\in L^1(\mathbb T)$.

Proof. The translation invariance is clear. In order to prove the continuity we must show that
$$\lim_{x\to x_0}\|f_x-f_{x_0}\|_1 = 0. \qquad (61)$$
Now (61) is clear if $f$ is continuous, by uniform continuity. On the other hand, the continuous functions are dense in $L^1(\mathbb T)$, so given $f\in L^1(\mathbb T)$ and $\epsilon>0$ we may choose $g\in C(\mathbb T)$ such that $\|g-f\|_1<\epsilon$. Then
$$\|f_x-f_{x_0}\|_1\le\|f_x-g_x\|_1 + \|g_x-g_{x_0}\|_1 + \|g_{x_0}-f_{x_0}\|_1 = \|(f-g)_x\|_1 + \|g_x-g_{x_0}\|_1 + \|(g-f)_{x_0}\|_1 < 2\epsilon + \|g_x-g_{x_0}\|_1.$$
It follows that $\limsup_{x\to x_0}\|f_x-f_{x_0}\|_1\le2\epsilon$, and since $\epsilon>0$ is arbitrary, (61) holds.
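The identity $\widehat{(f*g)}(n) = \hat f(n)\hat g(n)$ of Theorem 6.3, and hence Lemma 6.2, can be checked for trigonometric polynomials. In this sketch (helper names assumed) both the convolution integral (60) and the coefficient integral (58) are replaced by $N$-point averages, which are exact up to rounding here because all frequencies involved are much smaller than $N$; the two coefficient dictionaries are made up.

```python
import cmath
import math

N = 32
grid = [2 * math.pi * k / N for k in range(N)]

def trig(coeffs):
    """The trigonometric polynomial t -> sum_n coeffs[n] e^{int}."""
    return lambda t: sum(c * cmath.exp(1j * n * t) for n, c in coeffs.items())

def conv(f, g):
    """(f*g)(t) = (1/2π) ∫ f(t-s) g(s) ds, eq. (60), as an N-point average over s."""
    return lambda t: sum(f(t - s) * g(s) for s in grid) / N

def coeff(h, n):
    """(1/2π) ∫ h(t) e^{-int} dt as an N-point average."""
    return sum(h(t) * cmath.exp(-1j * n * t) for t in grid) / N

fc = {-1: 2.0, 0: 1j, 2: -0.5}        # hypothetical coefficients of f
gc = {0: 4.0, 2: 1 + 1j, 3: 0.25}     # hypothetical coefficients of g
f, g = trig(fc), trig(gc)
h = conv(f, g)

for n in range(-3, 4):
    print(n, coeff(h, n), fc.get(n, 0) * gc.get(n, 0))
```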
Definition 6.4. A summability kernel is a sequence $(k_n)$ of continuous $2\pi$-periodic functions with the following properties:
$$\frac1{2\pi}\int k_n(t)\,dt = 1\quad\text{for all } n; \qquad (62)$$
there is an $R$ such that
$$\frac1{2\pi}\int|k_n(t)|\,dt\le R\quad\text{for all } n; \qquad (63)$$
for all $0<\delta<\pi$,
$$\lim_{n\to\infty}\int_\delta^{2\pi-\delta}|k_n(t)|\,dt = 0. \qquad (64)$$
If in addition $k_n(t)\ge0$ for all $n$ and $t$, then $(k_n)$ is called a positive summability kernel.

We will be integrating $L^1(\mathbb T)$-valued functions; see the Appendix for a brief definition of what this means. This is not a problem for us.

Theorem 6.5. Let $f\in L^1(\mathbb T)$ and let $(k_n)$ be a summability kernel. Then
$$f = \lim_{n\to\infty}\frac1{2\pi}\int k_n(s)f_s\,ds$$
in the $L^1$ norm.

Proof. Write $\phi(s) = f_s$, so that $\phi(s)(t) = f(t-s)$. By Lemma 6.3, $\phi$ is a continuous $L^1(\mathbb T)$-valued function on $\mathbb T$, and $\phi(0) = f$. Then for any $0<\delta<\pi$, by (62) we have
$$\frac1{2\pi}\int k_n(s)\phi(s)\,ds - \phi(0) = \frac1{2\pi}\int k_n(s)\big(\phi(s)-\phi(0)\big)\,ds = \frac1{2\pi}\int_{-\delta}^{\delta}k_n(s)\big(\phi(s)-\phi(0)\big)\,ds + \frac1{2\pi}\int_\delta^{2\pi-\delta}k_n(s)\big(\phi(s)-\phi(0)\big)\,ds.$$
The two parts may be estimated separately:
$$\Big\|\frac1{2\pi}\int_{-\delta}^{\delta}k_n(s)\big(\phi(s)-\phi(0)\big)\,ds\Big\|_1\le\max_{|s|\le\delta}\|\phi(s)-\phi(0)\|_1\,\|k_n\|_1 \qquad (65)$$
and
$$\Big\|\frac1{2\pi}\int_\delta^{2\pi-\delta}k_n(s)\big(\phi(s)-\phi(0)\big)\,ds\Big\|_1\le\max_s\|\phi(s)-\phi(0)\|_1\,\frac1{2\pi}\int_\delta^{2\pi-\delta}|k_n(s)|\,ds. \qquad (66)$$
Using (63) and the fact that $\phi$ is continuous at $s = 0$, given any $\epsilon>0$ there is a $\delta>0$ such that (65) is bounded by $\epsilon$. With the same $\delta$, (64) implies that (66) converges to 0 as $n\to\infty$, so that $\|\frac1{2\pi}\int k_n(s)\phi(s)\,ds - \phi(0)\|_1$ is bounded by $2\epsilon$ for large $n$.

The integral appearing in Theorem 6.5 looks a bit like a convolution of $L^1(\mathbb T)$-valued functions. Consider the following lemma.

Lemma 6.4. Let $k$ be a continuous function on $\mathbb T$, and $f\in L^1(\mathbb T)$. Then
$$\frac1{2\pi}\int k(s)f_s\,ds = k*f. \qquad (67)$$
Proof. Assume first that $f$ is continuous on $\mathbb T$. Then, making the obvious definition for the integral,
$$\frac1{2\pi}\int k(s)f_s\,ds = \lim\frac1{2\pi}\sum_j(s_{j+1}-s_j)\,k(s_j)\,f_{s_j},$$
with the limit taken in the $L^1(\mathbb T)$ norm as the partition of $\mathbb T$ defined by $\{s_1,\dots,s_j,\dots\}$ becomes finer. On the other hand,
$$\lim\frac1{2\pi}\sum_j(s_{j+1}-s_j)\,k(s_j)\,f(t-s_j) = (k*f)(t)$$
uniformly in $t$, proving the lemma for continuous $f$. For arbitrary $f\in L^1(\mathbb T)$, fix $\epsilon>0$ and choose a continuous function $g$ with $\|f-g\|_1<\epsilon$. Then, using the continuous case for $g$,
$$\frac1{2\pi}\int k(s)f_s\,ds - k*f = \frac1{2\pi}\int k(s)(f-g)_s\,ds + k*(g-f),$$
so
$$\Big\|\frac1{2\pi}\int k(s)f_s\,ds - k*f\Big\|_1\le2\|k\|_1\,\epsilon.$$

Lemma 6.4 means that Theorem 6.5 can be written in the form
$$f = \lim_{n\to\infty}k_n*f\quad\text{in } L^1. \qquad (68)$$

4. Fejér's kernel

Define a sequence of functions
$$K_n(t) = \sum_{j=-n}^n\Big(1-\frac{|j|}{n+1}\Big)e^{ijt}.$$

Lemma 6.5. The sequence $(K_n)$ is a summability kernel.

Proof. Property (62) is clear. Now notice that
$$-\tfrac14e^{-it} + \tfrac12 - \tfrac14e^{it} = \tfrac12(1-\cos t) = \sin^2\tfrac t2.$$
On the other hand,
$$\Big(-\tfrac14e^{-it} + \tfrac12 - \tfrac14e^{it}\Big)\sum_{j=-n}^n\Big(1-\frac{|j|}{n+1}\Big)e^{ijt} = \frac1{n+1}\Big(-\tfrac14e^{-i(n+1)t} + \tfrac12 - \tfrac14e^{i(n+1)t}\Big) = \frac1{n+1}\sin^2\frac{(n+1)t}2,$$
so
$$K_n(t) = \frac1{n+1}\left(\frac{\sin\frac{n+1}2t}{\sin\frac12t}\right)^2, \qquad (69)$$
and this also shows that $K_n(t)\ge0$ for all $n$ and $t$. Property (64) follows, since $K_n(t)\le\dfrac1{(n+1)\sin^2(\delta/2)}$ for $\delta\le t\le2\pi-\delta$. Prove property (63) as an exercise.
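A quick numerical check of the closed form (69), of positivity, and of (62) for the Fejér kernel. This is a sketch with illustrative function names; since $K_n$ is real, the defining sum is written with cosines.

```python
import math

def fejer_sum(n, t):
    """K_n(t) = sum_{|j|<=n} (1 - |j|/(n+1)) e^{ijt} = 1 + 2 sum_{j=1}^n (1 - j/(n+1)) cos(jt)."""
    return 1 + 2 * sum((1 - j / (n + 1)) * math.cos(j * t) for j in range(1, n + 1))

def fejer_closed(n, t):
    """The closed form (69); valid when t is not a multiple of 2π."""
    return (math.sin((n + 1) * t / 2) / math.sin(t / 2)) ** 2 / (n + 1)

n = 11
ts = [0.01 + 0.1 * k for k in range(62)]                             # points of (0, 2π)
M = 1000
mean = sum(fejer_sum(n, 2 * math.pi * k / M) for k in range(M)) / M   # (62): should be 1

print(max(abs(fejer_sum(n, t) - fejer_closed(n, t)) for t in ts))     # rounding error only
print(min(fejer_sum(n, t) for t in ts) >= 0, mean, fejer_sum(n, 0.0)) # expected: True, 1 and n + 1 = 12 up to rounding
```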
The following graph is the Fejér kernel $K_{11}$.

[Figure: graph of the Fejér kernel $K_{11}$.]

Definition 6.5. Write $\sigma_n(f) = K_n*f$.

Using Lemma 6.2, it follows that
$$\sigma_n(f)(t) = \sum_{j=-n}^n\Big(1-\frac{|j|}{n+1}\Big)\hat f(j)e^{ijt}, \qquad (70)$$
and (68) means that $\sigma_n(f)\to f$ in the $L^1$ norm for every $f\in L^1(\mathbb T)$. It follows at once that the trigonometric polynomials are dense in $L^1(\mathbb T)$. The most important consequences are, however, more general statements about Fourier series.

Theorem 6.6. If $f,g\in L^1(\mathbb T)$ have $\hat f(n) = \hat g(n)$ for all $n\in\mathbb Z$, then $f = g$.

Proof. It is enough to show that $\hat f(n) = 0$ for all $n$ implies that $f = 0$. Using (70), we see that if $\hat f(n) = 0$ for all $n$, then $\sigma_n(f) = 0$ for all $n$; since $\sigma_n(f)\to f$, it follows that $f = 0$.

Corollary 6.2. The family of functions $\{e^{int}\}_{n\in\mathbb Z}$ forms a complete orthonormal system in $L^2(\mathbb T)$.

Proof. It is enough to notice that, for all $f\in L^2(\mathbb T)$, $(f,e^{int}) = \hat f(n)$. Then the function $f$ and its Fourier series (which converges in $L^2(\mathbb T)$ by Bessel's inequality and Theorem 5.8) have identical Fourier coefficients, so must agree by Theorem 6.6.

We also find a very general statement about the decay of Fourier coefficients: the Riemann–Lebesgue Lemma.

Theorem 6.7. Let $f\in L^1(\mathbb T)$. Then $\lim_{|n|\to\infty}\hat f(n) = 0$.

Proof. Fix $\epsilon>0$, and choose a trigonometric polynomial $P$ with the property that $\|f-P\|_1<\epsilon$. If $|n|$ exceeds the degree of $P$, then
$$|\hat f(n)| = |\widehat{(f-P)}(n)|\le\|f-P\|_1<\epsilon.$$
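To see (70) and the $L^1$ convergence $\sigma_n(f)\to f$ in action, take the (illustrative) step function $f = \mathbf 1_{[0,\pi)}$, whose coefficients are $\hat f(0) = \frac12$, $\hat f(j) = \frac1{\pi ij}$ for odd $j$, and $\hat f(j) = 0$ for even $j\neq0$. Pairing $j$ with $-j$ in (70) gives $\sigma_n(f)(t) = \frac12 + \sum_{j\text{ odd},\,j\le n}\big(1-\frac j{n+1}\big)\frac{2\sin jt}{\pi j}$. The sketch below estimates $\|\sigma_n(f)-f\|_1$ by the midpoint rule and also checks that $0\le\sigma_n(f)\le1$, as it must be since $K_n\ge0$ has mean 1. The grid size and the values of $n$ are arbitrary choices.

```python
import math

def fejer_mean_step(n, t):
    """sigma_n(f)(t) from (70) for f = indicator of [0, π)."""
    s = 0.5
    for j in range(1, n + 1, 2):                  # only odd frequencies contribute
        s += (1 - j / (n + 1)) * 2 * math.sin(j * t) / (math.pi * j)
    return s

def step(t):
    return 1.0 if (t % (2 * math.pi)) < math.pi else 0.0

M = 4000
grid = [2 * math.pi * (k + 0.5) / M for k in range(M)]   # midpoints avoid the jumps at 0 and π

def l1_error(n):
    """||sigma_n(f) - f||_1 (with the 1/2π normalisation) by the midpoint rule."""
    return sum(abs(fejer_mean_step(n, t) - step(t)) for t in grid) / M

errors = {n: l1_error(n) for n in (4, 16, 64)}
print(errors)   # decreasing in n
```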
If $|n|$ exceeds the degree of $P$, then
$$|\hat f(n)| = \big|\widehat{(f-P)}(n)\big| \le \|f-P\|_1 < \varepsilon.$$

Recall that for $f\in L^1(\mathbb{T})$, the Fourier series was defined (formally) to be
$$S[f] \sim \sum_{n=-\infty}^{\infty}\hat f(n)e^{int},$$
and the $n$th partial sum corresponds to the function
$$S_n(f)(t) = \sum_{j=-n}^{n}\hat f(j)e^{ijt}. \tag{71}$$
Looking at equations (71) and (70), we see that $\sigma_n(f)$ is the arithmetic mean of $S_0(f), S_1(f),\dots,S_n(f)$:
$$\sigma_n(f) = \frac{1}{n+1}\big(S_0(f)+S_1(f)+\cdots+S_n(f)\big). \tag{72}$$
It follows that if $S_n(f)$ converges in $L^1(\mathbb{T})$, then it must converge to the same thing as $\sigma_n(f)$, that is to $f$ (if this is not clear to you, look at Corollary 6.3 below).

The partial sums $S_n(f)$ also have a convolution form: as with (70), we have that $S_n(f)=D_n*f$, where $(D_n)$ is the Dirichlet kernel defined by
$$D_n(t) = \sum_{j=-n}^{n}e^{ijt} = \frac{\sin\big(n+\frac12\big)t}{\sin\frac12 t}.$$
Notice that $(D_n)$ is not a summability kernel: it has property (62) but does not have (63) (as we saw in Lemma 3.3) nor does it have (64). This explains why the question of convergence for Fourier series is so much more subtle than the problem of summability.

The following graph is the Dirichlet kernel $D_{11}$.

Definition 6.6. The de la Vallée Poussin kernel is defined by
$$V_n(t) = 2K_{2n+1}(t) - K_n(t).$$
For $(V_n)$, properties (62), (63) and (64) are clear.
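Both (72) and the defining property of $V_n$ are statements about Fourier coefficients, so they can be checked in exact rational arithmetic. The sketch below is illustrative only (the function names are hypothetical, chosen for this example). It compares the coefficient of $\hat f(j)e^{ijt}$ in $\frac{1}{n+1}(S_0+\cdots+S_n)$ with the weight $1-\frac{|j|}{n+1}$ in (70), and computes $\widehat{V_n}(j) = 2\widehat{K_{2n+1}}(j)-\widehat{K_n}(j)$.

```python
from fractions import Fraction

def fejer_coeff(n: int, j: int) -> Fraction:
    """K̂_n(j) = 1 - |j|/(n+1) for |j| <= n, and 0 otherwise (the weights in (70))."""
    return Fraction(max(0, n + 1 - abs(j)), n + 1)

def cesaro_coeff(n: int, j: int) -> Fraction:
    """Weight of f̂(j) e^{ijt} in (S_0 + ... + S_n)/(n+1): S_k contains it iff |j| <= k."""
    count = sum(1 for k in range(n + 1) if abs(j) <= k)
    return Fraction(count, n + 1)

def vallee_poussin_coeff(n: int, j: int) -> Fraction:
    """V̂_n(j) for V_n = 2 K_{2n+1} - K_n."""
    return 2 * fejer_coeff(2 * n + 1, j) - fejer_coeff(n, j)

if __name__ == "__main__":
    n = 3
    # (72): the two weight sequences agree for every frequency j
    print([cesaro_coeff(n, j) == fejer_coeff(n, j) for j in range(-5, 6)])
    # V̂_n(j) is 1 for |j| <= n + 1 and vanishes beyond degree 2n + 1
    print([str(vallee_poussin_coeff(n, j)) for j in range(0, 2 * n + 4)])
```

Using `Fraction` rather than floats makes the comparisons exact, so an equality test such as `cesaro_coeff(n, j) == fejer_coeff(n, j)` checks the identity itself, not an approximation to it.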
This kernel is useful because $V_n$ is a trigonometric polynomial of degree $2n+1$ with $\widehat{V_n}(j)=1$ for $|j|\le n+1$, so it may be used to construct approximations to a function $f$ by trigonometric polynomials having the same Fourier coefficients as $f$ for small frequencies.

The next picture is the de la Vallée Poussin kernel with $n=11$.

In addition, $\sigma_n(f)$ converges to $f$ with respect to the norm of any homogeneous Banach algebra containing $f$. Applying this to the Banach algebra of continuous functions with the sup norm, we have that $\sigma_n(f)\to f$ uniformly for all $f\in C(\mathbb{T})$.

5. Pointwise convergence

Recall that a sequence of elements $(x_n)$ in a normed space $(X,\|\cdot\|)$ converges to $x$ if $\|x_n-x\|\to0$ as $n\to\infty$. If the space $X$ is a space of complex-valued functions on some set $Z$ (for example, $L^1(\mathbb{T})$ or $C(\mathbb{T})$), then there is another notion of convergence: $x_n$ converges to $x$ pointwise if for every $z\in Z$, $x_n(z)\to x(z)$ as a sequence of complex numbers.

The question addressed in this section is the following: does the Fourier series of a function converge pointwise to the original function? In the last section, we showed that for $L^1$ functions on the circle, $\sigma_n(f)\to f$ in the $L^1$ norm. If the function $f$ is not continuous on $\mathbb{T}$, then the convergence in norm of $\sigma_n(f)$ does not tell us anything about the pointwise convergence: even if $\sigma_n(f,t)$ converges for some $t$, there is no real reason for the limit to be $f(t)$.

Theorem 6.8. Let $f$ be a function in $L^1(\mathbb{T})$.
(a) If $\lim_{h\to0}\big(f(t+h)+f(t-h)\big)$ exists (the possibility that the limit is $\pm\infty$ is allowed), then
$$\sigma_n(f,t)\longrightarrow\frac12\lim_{h\to0}\big(f(t+h)+f(t-h)\big).$$
(b) If $f$ is continuous at $t$, then $\sigma_n(f,t)\longrightarrow f(t)$.
(c) If there is a closed interval $I\subset\mathbb{T}$ on which $f$ is continuous, then $\sigma_n(f,\cdot)$ converges uniformly to $f$ on $I$.

Corollary 6.3. If $f$ is continuous at $t$, and if the Fourier series of $f$ converges at $t$, then it must converge to $f(t)$.
Proof. By assumption and (b), $\sigma_n(f,t)\to f(t)$ and $S_n(f,t)\to S(t)$, say. Recall equation (72):
$$\sigma_n(f) = \frac{1}{n+1}\big(S_0(f)+S_1(f)+\cdots+S_n(f)\big).$$
Writing $S_k(t)$ for $S_k(f,t)$ and $m=\lfloor\sqrt n\rfloor$, split the right hand side at $t$ as
$$\frac{1}{n+1}\big(S_0(t)+S_1(t)+\cdots+S_m(t)\big)+\frac{1}{n+1}\big(S_{m+1}(t)+\cdots+S_n(t)\big).$$
The first term converges to zero as $n\to\infty$, since the convergent sequence $(S_n(t))$ is bounded and the term has only $m+1\le\sqrt n+1$ summands. For the second term, choose and fix $\varepsilon>0$ and choose $n$ so large that $|S_k(t)-S(t)|<\varepsilon$ for all $k>m$. Then the whole second term is within $\varepsilon$ of $\frac{n-m}{n+1}S(t)$, and $\frac{n-m}{n+1}\to1$. It follows that
$$\frac{1}{n+1}\big(S_0(t)+S_1(t)+\cdots+S_n(t)\big)\to S(t)$$
as $n\to\infty$, so $S(t)$ must coincide with $\lim_{n\to\infty}\sigma_n(f,t)=f(t)$.

Turning to the proof of Theorem 6.8, recall that the Fejér kernel $(K_n)$ (see Lemma 6.5) is a positive summability kernel with the following properties:
$$\lim_{n\to\infty}\ \sup_{\theta<t<2\pi-\theta}K_n(t)=0\quad\text{for any }\theta\in(0,\pi), \tag{73}$$
and
$$K_n(t)=K_n(-t). \tag{74}$$

Proof of Theorem 6.8. Define
$$\check f(t)=\lim_{h\to0}\frac12\big(f(t+h)+f(t-h)\big),$$
and assume that this limit is finite (a similar argument works for the infinite cases). We wish to show that $\sigma_n(f,t)-\check f(t)$ is small for large $n$. Evaluate the difference: for any $\theta\in(0,\pi)$,
$$\sigma_n(f,t)-\check f(t)=\frac{1}{2\pi}\int_{\mathbb{T}}K_n(\tau)\big(f(t-\tau)-\check f(t)\big)\,d\tau=\frac{1}{2\pi}\int_{-\theta}^{\theta}K_n(\tau)\big(f(t-\tau)-\check f(t)\big)\,d\tau+\frac{1}{2\pi}\int_{\theta}^{2\pi-\theta}K_n(\tau)\big(f(t-\tau)-\check f(t)\big)\,d\tau.$$
Applying (74), this may be written
$$\sigma_n(f,t)-\check f(t)=\frac{1}{\pi}\left(\int_0^{\theta}+\int_{\theta}^{\pi}\right)K_n(\tau)\left(\frac{f(t-\tau)+f(t+\tau)}{2}-\check f(t)\right)d\tau. \tag{75}$$
Fix $\varepsilon>0$, and choose $\theta\in(0,\pi)$ small enough to ensure that
$$0<|\tau|<\theta\implies\left|\frac{f(t-\tau)+f(t+\tau)}{2}-\check f(t)\right|<\varepsilon, \tag{76}$$
and choose $N$ large enough to ensure that
$$n>N\implies\sup_{\theta<\tau<2\pi-\theta}K_n(\tau)<\varepsilon. \tag{77}$$
Putting the estimates (76) and (77) into the expression (75) gives, for $n>N$,
$$|\sigma_n(f,t)-\check f(t)|<\varepsilon+\varepsilon\,\|f-\check f(t)\|_1,$$
which proves (a). Part (b) follows at once from (a). For (c), notice³ that $f$ must be uniformly continuous on $I$. This means that (given $\varepsilon>0$) $\theta$ can be chosen so that (76) holds for all $t\in I$, and $N$ depends only on $\theta$ and $\varepsilon$. This means that a uniform estimate of the form
$$|\sigma_n(f,t)-\check f(t)|<\varepsilon+\varepsilon\,\|f-\check f(t)\|_1$$
can be found for all $t\in I$.

³ A continuous function on a closed bounded interval is uniformly continuous.

6. Lebesgue's Theorem

The Fejér condition, that
$$\check f(t)=\lim_{h\to0}\frac{f(t+h)+f(t-h)}{2}\ \text{exists}, \tag{78}$$
is very strong, and is not preserved if the function $f$ is modified on a null set. This means that property (78) is not really well-defined on $L^1$. However, (78) implies another property: there is a number $\check f(t)$ for which
$$\lim_{h\to0}\frac1h\int_0^h\left|\frac{f(t+\tau)+f(t-\tau)}{2}-\check f(t)\right|d\tau=0. \tag{79}$$
This is a more robust condition, better suited to integrable functions⁴.

⁴ There are functions $f$ with the property that Fejér's condition (78) does not hold anywhere, but (79) does hold for any $f\in L^1(\mathbb{T})$, with $\check f(t)=f(t)$, for almost every value of $t$.

Theorem 6.9. If $f$ has property (79) at $t$, then $\sigma_n(f,t)\to\check f(t)$. In particular (by the footnote), $\sigma_n(f,t)\to f(t)$ for almost every value of $t$.

Corollary 6.4. If the Fourier series of $f\in L^1(\mathbb{T})$ converges on a set $F$ of positive measure, then almost everywhere on $F$ the Fourier series must converge to $f$.

Remark 6.1. In particular, a Fourier series that converges to zero almost everywhere must have all its coefficients equal to zero. The case of trigonometric series is different: a basic counterexample in the theory of trigonometric series is that there are non-zero trigonometric series that converge to zero almost everywhere. This is described in volume 1 of Trigonometric Series, A. Zygmund, Cambridge University Press, Cambridge (1959). On the other hand, a trigonometric series that converges to zero everywhere must have all coefficients zero⁵.

⁵ See Chapter 5 of Ensembles parfaits et séries trigonométriques, J.P. Kahane and R. Salem, Hermann (1963).

Proof of Theorem 6.9. Recall the expression (75) in the proof of Theorem 6.8:
$$\sigma_n(f,t)-\check f(t)=\frac1\pi\left(\int_0^{\theta}+\int_{\theta}^{\pi}\right)K_n(\tau)\left(\frac{f(t-\tau)+f(t+\tau)}{2}-\check f(t)\right)d\tau. \tag{80}$$
Also, by (69),
$$K_n(\tau)=\frac{1}{n+1}\left(\frac{\sin\frac{n+1}{2}\tau}{\sin\frac12\tau}\right)^2. \tag{81}$$
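From the defining sum, $0\le K_n(\tau)\le\sum_{j=-n}^{n}\big(1-\frac{|j|}{n+1}\big)=n+1$; moreover $\sin\frac{\tau}{2}>\frac{\tau}{\pi}$ for $0<\tau<\pi$, so by (81)
$$K_n(\tau)\le\min\left\{n+1,\ \frac{\pi^2}{(n+1)\tau^2}\right\}. \tag{82}$$
It follows that the second integral in (80) is bounded in absolute value by $\frac{\pi^2}{(n+1)\theta^2}\,\|f-\check f(t)\|_1$, so it will converge to zero so long as $(n+1)\theta^2\to\infty$. Pick $\theta=n^{-1/4}$; this guarantees that as $n\to\infty$ the second integral tends to zero.

Now consider the first integral. Write
$$g(\tau)=\left|\frac{f(t+\tau)+f(t-\tau)}{2}-\check f(t)\right|,\qquad\Psi(h)=\int_0^h g(\tau)\,d\tau.$$
Then the absolute value of the first integral in (80) is bounded above by
$$\frac1\pi\int_0^{1/n}K_n(\tau)g(\tau)\,d\tau+\frac1\pi\int_{1/n}^{\theta}K_n(\tau)g(\tau)\,d\tau\le\frac{n+1}{\pi}\,\Psi\Big(\frac1n\Big)+\frac{\pi}{n+1}\int_{1/n}^{\theta}\frac{g(\tau)}{\tau^2}\,d\tau$$
(we have used the estimate for $K_n$ from (82)). By the assumption (79), the first term $\frac{n+1}{\pi}\Psi(\frac1n)$ tends to zero. Applying integration by parts to the second term gives
$$\frac{\pi}{n+1}\int_{1/n}^{\theta}\frac{g(\tau)}{\tau^2}\,d\tau=\frac{\pi}{n+1}\left[\frac{\Psi(\tau)}{\tau^2}\right]_{1/n}^{\theta}+\frac{2\pi}{n+1}\int_{1/n}^{\theta}\frac{\Psi(\tau)}{\tau^3}\,d\tau. \tag{83}$$
For given $\varepsilon>0$ and $n>n(\varepsilon)$, (79) gives $\Psi(\tau)<\varepsilon\tau$ for $\tau\in(0,\theta]$ (recall $\theta=n^{-1/4}$). Since $\Psi\ge0$ and $1/\theta\le n$, it follows that (83) is bounded above by
$$\frac{\pi n}{n+1}\,\varepsilon+\frac{2\pi\varepsilon}{n+1}\int_{1/n}^{\theta}\frac{d\tau}{\tau^2}<3\pi\varepsilon,$$
which completes the proof.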
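To see Theorem 6.8 and the estimate (82) at work numerically, here is a minimal sketch using an illustrative test function of our own choosing (not from the notes): $f=\mathbf{1}_{[0,\pi)}$ on $[0,2\pi)$. Its coefficients are $\hat f(0)=\tfrac12$ and $\hat f(m)=\frac{1-(-1)^m}{2\pi i m}$ for $m\neq0$, and $\sigma_n(f,t)$ is computed from (70). At the jump $t=0$ the one-sided limits are $1$ and $0$, so (a) predicts the limit $\tfrac12$. At $t=\pi/2$, $f$ is continuous, and bounding $K_n$ by (82) on $|\tau|\ge\pi/2$ shows the error there is at most $\frac{1}{n+1}$. The last part checks (82) on a grid in $(0,\pi]$.

```python
import cmath
import math

def coeff_half(m: int) -> complex:
    """f̂(m) for the illustrative f = indicator of [0, π), viewed on [0, 2π)."""
    if m == 0:
        return complex(0.5)
    return (1 - (-1) ** m) / (2j * math.pi * m)

def fejer_mean(t: float, n: int) -> float:
    """σ_n(f, t) computed from the coefficient formula (70)."""
    total = 0j
    for m in range(-n, n + 1):
        total += (1 - abs(m) / (n + 1)) * coeff_half(m) * cmath.exp(1j * m * t)
    return total.real

def fejer_closed(n: int, tau: float) -> float:
    """K_n(τ) from (81); valid when sin(τ/2) != 0."""
    return (math.sin((n + 1) * tau / 2) / math.sin(tau / 2)) ** 2 / (n + 1)

def bound_82(n: int, tau: float) -> float:
    """Right-hand side of (82): min(n + 1, π² / ((n + 1) τ²))."""
    return min(n + 1, math.pi ** 2 / ((n + 1) * tau ** 2))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        # t = 0 is a jump of f (one-sided limits 1 and 0); t = π/2 is a point of continuity.
        print(n, fejer_mean(0.0, n), fejer_mean(math.pi / 2, n))
    n = 50
    grid = [k * math.pi / 1000 for k in range(1, 1001)]
    print(all(fejer_closed(n, tau) <= bound_82(n, tau) * (1 + 1e-9) for tau in grid))
```

The small relative tolerance in the last comparison only absorbs floating-point rounding near $\tau\to0$, where both sides are close to $n+1$.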
APPENDIX A

1. Zorn's lemma and Hamel bases

Definition A.1. A partially ordered set or poset is a non-empty set $S$ together with a relation $\le$ that satisfies the following conditions:
(i) $x\le x$ for all $x\in S$;
(ii) if $x\le y$ and $y\le z$ then $x\le z$, for all $x,y,z\in S$.
If in addition for any two elements $x,y$ of $S$ at least one of the relations $x\le y$ or $y\le x$ holds, then we say that $S$ is a totally ordered set. For example, the set of subsets of a set $X$, with $\le$ meaning inclusion, defines a partially ordered set.

Definition A.2. Let $S$ be a partially ordered set, and $T$ any subset of $S$. An element $x\in S$ is an upper bound of $T$ if $y\le x$ for all $y\in T$.

Definition A.3. Let $S$ be a partially ordered set. An element $x\in S$ is maximal if for any $y\in S$, $x\le y\implies y\le x$.

The next result, Zorn's lemma, is one of the formulations of the Axiom of Choice.

Theorem A.1. If $S$ is a partially ordered set in which every totally ordered subset has an upper bound, then $S$ has a maximal element.

This result is used frequently to "construct" things, though whenever we use it all we really are able to do is assert that something must exist subject to assuming the Axiom of Choice. An example is the following result, which as usual is trivial in finite dimensions. To see that the following theorem is "constructing" something a little surprising, think of the following examples: $\mathbb{R}$ is a linear space over $\mathbb{Q}$; $L^2[0,1]$ is a linear space over $\mathbb{R}$.

Theorem A.2. Let $X$ be a linear space over any field. Then $X$ contains a set $A$ of linearly independent elements such that the linear subspace spanned by $A$ coincides with $X$.

Any such set $A$ is called a Hamel basis for $X$. If the Hamel basis is $A=\{x_\lambda\}_{\lambda\in\Lambda}$, then every element of $X$ has a (unique) representation
$$x=\sum a_\lambda x_\lambda$$
in which the sum is finite and the $a_\lambda$ are scalars. A Hamel basis is quite a different kind of object to the usual spanning set or basis, where $X$ is the closure of the span of the basis.
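In finite dimensions the maximal element that Zorn's lemma provides can actually be found by a greedy search: run through a spanning list and keep each vector that is not already in the span of those kept. The sketch below illustrates only this finite analogue, over $\mathbb{Q}$ with exact `Fraction` arithmetic. The helper names `rank` and `maximal_independent_subset` are hypothetical. No such procedure is available for examples like $\mathbb{R}$ over $\mathbb{Q}$, which is exactly why Theorem A.2 needs the Axiom of Choice.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length rational vectors, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def maximal_independent_subset(vectors):
    """Greedily keep each vector that is not in the span of the vectors already kept."""
    chosen = []
    for v in vectors:
        if rank(chosen + [v]) > len(chosen):
            chosen.append(v)
    return chosen

if __name__ == "__main__":
    vecs = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 3), (1, 2, 3)]
    print(maximal_independent_subset(vecs))
```

The kept set is maximal in the sense of Definition A.3 for the inclusion ordering used in the proof that follows: adding any further vector from the list destroys linear independence.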
While there. It follows that {x1 . contradicting the maximality of A.72 A Proof. B. . . ∞ so x ∈ Fn . It follows that there can be no more than one point in the intersection. Deﬁne a partial ordering on S by A ≤ B if and only if A ⊂ B. he wrote an important treatise on discontinuous functions. but it is not true that B ≤ A. In the langauge of metric spaces. . . 2. so x1 . We may therefore apply Theorem A. Now choose a point xn ∈ Fn for each n. as do two other important mathematical concepts. . It follows that x ∈ n=1 Fn .b∈S Theorem A. so has a limit x say by completeness. His interest in the general ideas of continuity was reinforced by Volterra. }. then there exists exactly one point in the ∞ intersection n=1 Fn . . In order to prove this. For any n. In 1905. . Then xn − xm ≤ diam(Fn ) → 0 as n ≥ m → ∞. . . We ﬁrst claim that if {Aα } is a totally ordered subset of S.4. we must show that any ﬁnite number of elements x1 . It follows that every element of X is a ﬁnite linear combination of elements of A. Baire functions and Baire classes. xn } ⊂ Aαj . Fn is a closed set that contains all the xm with m ≥ n. . A subset S ⊂ X of a normed space is nowhere dense if for every point x in the ¯ closure of S. then by the deﬁnition of the diameter. a. If x and y are both in the intersection. Let {Fn } be a decreasing sequence of non–empty closed sets (this means Fn ⊃ Fn+1 for all n) in a complete normed space X. . . this means that a complete normed space is of second category. Baire category theorem Most of the facts assembled here are really about metric spaces – normed spaces are a special case of metric spaces. then the set B = A ∪ {y} belongs to S (since it is linearly independent). C. . Thus the sequence (xn ) is Cauchy. x − y ≤ diam(Fn ) → 0 so x = y. and A ≤ B. Theorem A. Since the set {Aα } is totally ordered.1 to conclude that S has a maximal element A. one of the subsets Aαj contains all the others. . . . 
If y ∈ X is not a ﬁnite linear combination of elements of A. Baire became professor of analysis at the Faculty of Science in Dijon. and for every > 0 B (x) ∩ (X\S) is non–empty. A complete normed space cannot be written as a countable union of nowhere dense sets. n. Proof. and write S = {A. 1 Rene Baire (1874–1932) was one of the most inﬂuential French mathematicians of the early 20th century. Baire’s category theorem bears his name today. xn are linearly independent. . Let S be the set of subsets of X that comprise linearly independent elements. . The diameter of S ⊂ X is deﬁned by diam(S) = sup a − b . . it has the upper bound B = α Aα .3. xn of B are linearly independent. . Assume that xi ∈ Aαi for i = 1. If the sequence of diameters diam(Fn ) converges to zero. The next result is a version of the Baire1 category theorem.
Proof. Let $X$ be a complete normed space, and suppose that $X=\bigcup_{j=1}^{\infty}X_j$, where each $X_j$ is nowhere dense (that is, the sets $\bar X_j$ all have empty interior). Fix a ball $B_1(x_0)$. Since $\bar X_1$ does not contain $B_1(x_0)$, there must be a point $x_1\in B_1(x_0)$ with $x_1\notin\bar X_1$. It follows that there is a ball $B_{r_1}(x_1)$ such that $\bar B_{r_1}(x_1)\subset B_1(x_0)$ and $\bar B_{r_1}(x_1)\cap\bar X_1=\emptyset$. Assume without loss of generality that $r_1<\frac12$. Similarly, there is a point $x_2$ and a radius $r_2$ such that $\bar B_{r_2}(x_2)\subset B_{r_1}(x_1)$ and $\bar B_{r_2}(x_2)\cap\bar X_2=\emptyset$, and without loss of generality $r_2<\frac13$. Notice that $\bar B_{r_2}(x_2)\cap\bar X_1=\emptyset$ automatically, since $\bar B_{r_2}(x_2)\subset B_{r_1}(x_1)$. Inductively, we construct a sequence of decreasing closed balls $\bar B_{r_n}(x_n)$ such that $\bar B_{r_n}(x_n)\cap\bar X_j=\emptyset$ for $1\le j\le n$, and $r_n\to0$ as $n\to\infty$. Now by Theorem A.3, there must be a point $x$ in the intersection of all the closed balls $\bar B_{r_n}(x_n)$, so $x\notin\bar X_j$ for all $j\ge1$. This implies that $x\notin\bigcup_{j\ge1}X_j=X$, a contradiction.
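The inductive step of this proof can be imitated in the simplest case $X=\mathbb{R}$, taking each $X_j=\{q_j\}$ to be a single point (a closed, nowhere dense set). The following sketch is only an illustration of the construction for finitely many points. The function `avoid_points` and its choice of sub-ball are hypothetical: at each step it picks one of two disjoint closed balls of radius $r/4$ centred at $c\pm\frac{2r}{3}$, both inside the open ball $B_r(c)$, so that $q_j$ misses at least one of them. The radii shrink like $4^{-n}$, so the nested closed balls have diameters tending to zero, which is the situation of Theorem A.3.

```python
from fractions import Fraction

def avoid_points(points, center=Fraction(0), radius=Fraction(1)):
    """Nested closed balls [c - r, c + r] in R, the k-th of which avoids points[k-1]."""
    balls = [(center, radius)]
    for q in points:
        r_new = radius / 4
        right = center + 2 * radius / 3
        left = center - 2 * radius / 3
        # The two candidate closed balls are disjoint, so q lies in at most one of them;
        # each sits inside the open ball, since 2r/3 + r/4 = 11r/12 < r.
        center = left if abs(q - right) <= r_new else right
        radius = r_new
        balls.append((center, radius))
    return balls

if __name__ == "__main__":
    pts = [Fraction(k, 7) for k in range(-7, 8)] + [Fraction(1, 3), Fraction(-1, 2)]
    c, r = avoid_points(pts)[-1]
    print(c, r, all(q != c for q in pts))
```

Exact `Fraction` arithmetic keeps the nesting and avoidance checks free of rounding, so the final centre is provably different from every $q_j$.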