Functional Analysis I Autumn Term 2008: James C. Robinson

Introduction
I hope that these notes will be useful. They are, of course, much more wordy
than the notes you will have taken in lectures, but the maths itself is usually
done in a little more detail and should generally be ‘tighter’. You may find
that the order in which the material is presented is a little different to the
lectures, but this should make things more coherent.
Solutions to the examples sheets will follow separately.
I hope that there are relatively few mistakes, but if you find yourself
staring at something thinking that it must be wrong then it most likely is,
so do email me at j.c.robinson@warwick.ac.uk. I will post a list of errata
as and when people find them on my webpage for the course,
www.maths.warwick.ac.uk/~jcr/FAI.
These notes will form the basis of the first part of a textbook on functional
analysis, so any general comments would also be welcome.
Contents
1 Vector spaces and bases
2 Norms and normed spaces
2.1 Norms and normed spaces
2.2 Convergence
3 Compactness and equivalence of norms
3.1 Compactness
4 Completeness
4.1 The completion of a normed space
5 Lebesgue integration
5.1 Integrals of functions in L_step(R)
5.2 Increasing sequences of functions in L_step(R): L_inc(R)
5.3 The space L^1(R) of integrable functions
5.4 The Lebesgue spaces L^p
6 Inner product spaces
6.1 Inner products and norms
6.2 The Cauchy-Schwarz inequality
6.3 The relationship between inner products and their norms
7 Orthonormal bases in Hilbert spaces
7.1 Orthonormal sets
7.2 Convergence and orthonormality in Hilbert spaces
7.3 Orthonormal bases in Hilbert spaces
8 Closest points and approximation
8.1 Closest points in convex subsets
8.2 Linear subspaces and orthogonal complements
8.3 Best approximations
9 Separable Hilbert spaces and ℓ^2
10 Linear maps between Banach spaces
10.1 Bounded linear maps
10.2 Kernel and range
11 The Riesz representation theorem and the adjoint operator
11.1 Linear operators from H into H
12 Spectral Theory I: General theory
12.1 Spectrum and point spectrum
13 Spectral theory II: compact self-adjoint operators
13.1 Complexification and real eigenvalues
13.2 Compact operators
14 Sturm-Liouville problems

1
Vector spaces and bases
In all that follows we use K to denote R or C, although one can define vector spaces over arbitrary fields.
R^n is the simplest and most natural example of a vector space. We give a formal definition, but it is the closure property inherent in the definition,
f + λg ∈ V for all f, g ∈ V, λ ∈ K = R or C,
that one usually has to check.
Definition 1.1 A vector space V over K is a set V with operations + : V × V → V and ∗ : K × V → V such that
• additive and multiplicative identities exist: there exists a zero element
0 ∈ V such that x + 0 = x for all x ∈ V ; and 1 ∈ K is the identity
for scalar multiplication, 1 ∗ x = x for all x ∈ V ;
• there are additive inverses: for every x ∈ V there exists an element
−x ∈ V such that x + (−x) = 0;
• addition is commutative and associative, x+y = y+x and x+(y+z) =
(x + y) + z for all x, y, z ∈ V ;
• multiplication is associative
α ∗ (β ∗ x) = (αβ) ∗ x for all α, β ∈ K, x ∈ V
and distributive
α ∗ (x + y) = α ∗ x + α ∗ y and (α + β) ∗ x = α ∗ x + β ∗ x
for all α, β ∈ K, x, y ∈ V .
The multiplication operator ∗ can usually be understood and so we generally drop this notation.
As remarked above we will only consider K = R or C here, and will refer to real or complex vector spaces respectively. Generally we will omit the word ‘real’ or ‘complex’ unless wishing to make an explicit distinction between real and complex vector spaces.
Examples: R^n is a vector space over R; it is not a vector space over C (since i ∗ x ∉ R^n for any non-zero x ∈ R^n); C^n is a vector space over both R and C (so if we take K = R the space C^n can be thought of, somewhat unnaturally, as a ‘real vector space’).
Example 1.2 For 1 ≤ p < ∞ define the space ℓ^p(K) of all pth power summable sequences with elements in K (recall that K = R or C):
\[ \ell^p(\mathbb{K}) = \Big\{ x = (x_1, x_2, \ldots) : x_j \in \mathbb{K},\ \sum_{j=1}^{\infty} |x_j|^p < +\infty \Big\}. \]
For p = ∞, ℓ^∞(K) is the space of all bounded sequences in K.

For x, y ∈ ℓp (K) set
x + y = (x1 + y1 , x2 + y2 , . . .),
and for α ∈ K, x ∈ ℓp , define
αx = (αx1 , αx2 , . . .).
With these definitions ℓ^p(K) is a vector space. The only issue is whether x + y is still in ℓ^p(K); for 1 ≤ p < ∞ this follows since, for every n,
\[ \sum_{j=1}^{n} |x_j + y_j|^p \le \sum_{j=1}^{n} 2^p \big( |x_j|^p + |y_j|^p \big) \le 2^p \sum_{j=1}^{\infty} |x_j|^p + 2^p \sum_{j=1}^{\infty} |y_j|^p < +\infty, \]
and for p = ∞ it is clear.
Sometimes we will simply write ℓ^p for ℓ^p(R).
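The termwise estimate behind this chain is |a + b|^p ≤ (2 max(|a|, |b|))^p ≤ 2^p (|a|^p + |b|^p). As a rough numerical illustration (not a proof), the following Python sketch checks the partial-sum inequality for two sample sequences. The sequences x_j = 1/j and y_j = (−1)^j/j, the exponent p = 2, and the helper names are assumptions chosen for this example only.

```python
def termwise_bound_holds(a, b, p):
    """Check |a + b|^p <= 2^p (|a|^p + |b|^p) for one pair of terms."""
    return abs(a + b) ** p <= 2 ** p * (abs(a) ** p + abs(b) ** p)


def partial_p_sum(seq, p, n):
    """Sum of |seq(j)|^p for j = 1, ..., n, where seq maps an index to a term."""
    return sum(abs(seq(j)) ** p for j in range(1, n + 1))


def x(j):
    return 1.0 / j  # sample sequence in l^2 (an assumption for illustration)


def y(j):
    return (-1.0) ** j / j  # second sample sequence in l^2


p = 2.0
for n in (10, 100, 1000):
    lhs = partial_p_sum(lambda j: x(j) + y(j), p, n)
    rhs = 2 ** p * (partial_p_sum(x, p, n) + partial_p_sum(y, p, n))
    print(n, lhs, rhs, lhs <= rhs)
```

The right-hand side stays bounded as n grows, which is exactly what forces the partial sums on the left to converge.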
Example 1.3 The space C^0([0, 1]) of all real-valued continuous functions on the interval [0, 1] is a vector space with the obvious definitions of addition and scalar multiplication, which we give here for the one and only time: for f, g ∈ C^0([0, 1]) and α ∈ R, we denote by f + g the function whose values are given by
(f + g)(x) = f(x) + g(x), x ∈ [0, 1],
and by αf the function whose values are
(αf)(x) = α f(x), x ∈ [0, 1].
Example 1.4 Denote by L̃^1(0, 1) the set of all real-valued continuous functions on (0, 1) for which
\[ \int_0^1 |f(x)| \, dx < +\infty. \]
Then L̃^1(0, 1) is a vector space (with the obvious definitions of addition and scalar multiplication).
The only thing to check here is that f + λg ∈ L̃^1(0, 1) whenever f, g ∈ L̃^1(0, 1) and λ ∈ R. Clearly f + λg ∈ C^0(0, 1), and we have
\[ \int_0^1 |(f + \lambda g)(x)| \, dx \le \int_0^1 |f(x)| + |\lambda| |g(x)| \, dx = \int_0^1 |f(x)| \, dx + |\lambda| \int_0^1 |g(x)| \, dx < +\infty. \]
Note that if f ∈ C^0([0, 1]) then, since it is a continuous function on a closed bounded interval, it is bounded and attains its bounds. It follows that for some M ≥ 0, |f(x)| ≤ M for all x ∈ [0, 1], and so
\[ \int_0^1 |f(x)| \, dx \le M < \infty, \]
i.e. f ∈ L̃^1(0, 1).
But while the function f(x) = x^{-1/2} is not continuous on [0, 1], it is continuous on (0, 1) and
\[ \int_0^1 |x^{-1/2}| \, dx = \Big[ 2x^{1/2} \Big]_0^1 = 2 < \infty, \]
so f ∈ L̃^1(0, 1). These two examples show that C^0([0, 1]) is a strict subset of L̃^1(0, 1).
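The value 2 for the integral of x^{-1/2} can be checked numerically. The sketch below uses a midpoint rule, which never evaluates the integrand at the singular point x = 0; the helper names `midpoint_integral` and `inverse_sqrt` are hypothetical and chosen for this illustration only. Because the integrand is unbounded near 0, the approximations approach 2 slowly.

```python
def midpoint_integral(f, a, b, n):
    """Approximate the integral of f over (a, b) using n midpoint cells."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def inverse_sqrt(x):
    """The function x^(-1/2), continuous on (0, 1) but unbounded near 0."""
    return x ** -0.5


for n in (100, 10_000, 100_000):
    print(n, midpoint_integral(inverse_sqrt, 0.0, 1.0, n))
```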
We now discuss spanning sets, linear independence, and bases. Note

that the definitions – and the following arguments – also apply to infinite-
dimensional spaces. In particular the result of Lemma 1.9 is valid for infinite-
dimensional spaces.
Definition 1.5 The linear span of a subset E of a vector space V is the collection of all finite linear combinations of elements of E:
\[ \mathrm{Span}(E) = \Big\{ v \in V : v = \sum_{j=1}^{n} \alpha_j e_j,\ n \in \mathbb{N},\ \alpha_j \in \mathbb{K},\ e_j \in E \Big\}. \]
We say that E spans V if V = Span(E), i.e. every element of V can be written as a finite linear combination of elements of E.
Note that this definition requires v to be expressed as a finite linear combination of elements of E. When discussing bases for abstract vector spaces with no additional structure the only option is to take finite linear combinations, since these are defined using only the axioms for a vector space (scalar multiplication and addition of vector space elements). In order to take infinite linear combinations we require some way to discuss convergence, which is not available in a general vector space.
Definition 1.6 A set E is linearly independent if every finite collection of elements of E is linearly independent, i.e.
\[ \sum_{j=1}^{n} \alpha_j e_j = 0 \quad\Rightarrow\quad \alpha_1 = \cdots = \alpha_n = 0 \]
for any choice of n ∈ N, α_j ∈ K, and distinct e_j ∈ E.
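For a finite collection of vectors in R^n with rational entries, linear independence can be tested mechanically: the vectors are independent exactly when the matrix having them as rows has rank equal to the number of vectors. The following is a minimal sketch under that assumption; the function names `rank` and `is_linearly_independent` are illustrative and not part of the notes.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of equal-length vectors, by exact Gaussian elimination."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    pivots = 0  # number of pivot rows found so far
    for col in range(n_cols):
        # Look for a row at or below position `pivots` with a non-zero entry here.
        pivot_row = next(
            (r for r in range(pivots, len(rows)) if rows[r][col] != 0), None
        )
        if pivot_row is None:
            continue
        rows[pivots], rows[pivot_row] = rows[pivot_row], rows[pivots]
        # Eliminate this column from every row below the pivot row.
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def is_linearly_independent(vectors):
    """True if the only vanishing linear combination is the trivial one."""
    return rank(vectors) == len(vectors)


print(is_linearly_independent([(1, 0, 0), (0, 1, 0), (1, 1, 1)]))
print(is_linearly_independent([(1, 2), (2, 4)]))
```

Exact rational arithmetic is used so that no rounding error can turn a dependent family into an apparently independent one.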
Definition 1.7 A Hamel basis for V is a linearly independent spanning set.
Expansions in terms of basis elements are unique:
Lemma 1.8 If E is a Hamel basis for V then any element of V can be written uniquely in the form
\[ v = \sum_{j=1}^{n} \alpha_j e_j \]
for some n ∈ N, α_j ∈ K, and e_j ∈ E.

For a proof see Linear Algebra.
If E is a linearly independent set that spans V then it is not possible to

find an element of V that can be added to E to obtain a larger linearly
independent set (otherwise E would not span V ). We now show that this
can be reversed.
Lemma 1.9 Suppose that E ⊂ V is a maximal linearly independent set, i.e. a linearly independent set E such that E ∪ {v} is not linearly independent for any v ∈ V \ E. Then E is a Hamel basis for V.
Proof Suppose that E does not span V: in particular take v ∈ V that cannot be written as any finite linear combination of elements of E. To obtain a contradiction, choose n ∈ N and $\{e_j\}_{j=1}^{n}$ with e_j ∈ E, and suppose that
\[ \sum_{j=1}^{n} \alpha_j e_j + \alpha_{n+1} v = 0. \]
Since v cannot be written as a linear combination of any finite collection of the {e_j}, we must have α_{n+1} = 0, which leaves $\sum_{j=1}^{n} \alpha_j e_j = 0$. However, {e_j} is a finite subset of E and is thus linearly independent by assumption, and so we must have α_j = 0 for all j = 1, …, n + 1. But this says that E ∪ {v} is linearly independent, a contradiction. So E spans V.
We recall here the following fundamental theorem:
Theorem 1.10 Suppose that V has a basis consisting of a finite number of

elements. Then every basis of V contains the same number of elements.
This allows us to make the following definition:
Definition 1.11 If V has a basis consisting of a finite number of elements

then the dimension of V is the number of elements in this basis. If V has
no finite basis then V is infinite-dimensional.
Since a Hamel basis is a maximal linearly independent set (Lemma 1.9),

it follows that a space is infinite-dimensional iff for every n ∈ N one can find
a set of n linearly independent elements of V .
Example 1.12 For any n ∈ N the n elements in ℓ^2(K) given by
(1, 0, 0, 0, …), (0, 1, 0, 0, …), …, (0, 0, …, 0, 1, 0, …),
i.e. elements $e^{(j)}$, j = 1, …, n, with $e^{(j)}_j = 1$ and $e^{(j)}_i = 0$ if i ≠ j, are linearly independent. It follows that ℓ^2(K) is an infinite-dimensional vector space.
Example 1.13 Consider the functions f_n ∈ C^0([0, 1]), where f_n is zero for x ∉ I_n = [2^{-n} − 2^{-(n+2)}, 2^{-n} + 2^{-(n+2)}] and interpolates linearly between the values
f_n(2^{-n} − 2^{-(n+2)}) = 0, f_n(2^{-n}) = 1, f_n(2^{-n} + 2^{-(n+2)}) = 0.
The intervals I_n where f_n ≠ 0 are disjoint, and f_n(2^{-n}) = 1. It follows that for any n the $\{f_j\}_{j=1}^{n}$ are linearly independent (evaluating a vanishing linear combination at x = 2^{-j} shows that the jth coefficient is zero), and so C^0([0, 1]) is infinite-dimensional.
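The independence argument can be made concrete: evaluating f_j at the points 2^{-m} gives 1 if j = m and 0 otherwise, so a vanishing linear combination must have every coefficient zero. A short Python sketch of this check follows; the function names `tent` and `evaluation_matrix` are hypothetical, introduced for this illustration.

```python
def tent(n, x):
    """f_n from Example 1.13: peak 1 at 2^-n, zero outside I_n, linear in between."""
    centre = 2.0 ** -n
    half_width = 2.0 ** -(n + 2)
    return max(0.0, 1.0 - abs(x - centre) / half_width)


def evaluation_matrix(n):
    """Matrix with entries f_j(2^-m) for j, m = 1, ..., n."""
    return [[tent(j, 2.0 ** -m) for m in range(1, n + 1)] for j in range(1, n + 1)]


for row in evaluation_matrix(4):
    print(row)
```

The printed matrix is the identity, which is the finite-dimensional shadow of the disjoint-supports argument in the example.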
We end this section with the following powerful-looking theorem:
Theorem 1.14 Every vector space has a Hamel basis.
The proof makes use of Zorn’s Lemma. In order to state this result (which
in fact is more of an axiom, since it is equivalent to the axiom of choice) we
need to introduce some auxiliary concepts.
A partial order on a set P is a binary relation ⪯ on P such that
(i) a ⪯ a for all a ∈ P,
(ii) a ⪯ b and b ⪯ a implies that a = b, and
(iii) a ⪯ b and b ⪯ c implies that a ⪯ c.
The order is ‘partial’ because two arbitrary elements of P need not be ordered: consider for example, the case when P consists of all subsets of R and X ⪯ Y if X ⊆ Y; one cannot order [0, 1] and [1, 2].
A subset C of P in which all elements can be ordered is called a chain (i.e. for all a, b ∈ C, either a ⪯ b or b ⪯ a (or both, in which case a = b)).
An element b ∈ P is an upper bound for a subset S of P if s ⪯ b for all s ∈ S. An element m of P is maximal if m ⪯ a for some a ∈ P implies that a = m.
Lemma 1.15 (Zorn’s Lemma) If every chain in a non-empty partially

ordered set P has an upper bound, then P has at least one maximal element.
We can now give the proof of Theorem 1.14.
Proof If V is finite-dimensional then V has a finite basis, by definition.
So assume that V is infinite-dimensional. Let P be the collection of all linearly independent subsets of V, ordered by inclusion: E_1 ⪯ E_2 if E_1 ⊆ E_2. Let C = $\{E_i\}_{i \in I}$ be a chain in P, where I is an index set, and let
\[ E^* = \bigcup_{i \in I} E_i. \]
Note that E^* is linearly independent, since any finite collection of elements of E^* must be contained in one of the E_i (because C is a chain), and each E_i is linearly independent. Clearly E_i ⪯ E^* for all i ∈ I, so E^* is an upper bound for C.
It follows from Zorn’s Lemma that P has a maximal element, i.e. a maximal linearly independent set. By Lemma 1.9 this is a basis for V.
Exercise 1.16 Show that the space ℓ_f consisting of all sequences that contain only finitely many non-zero terms is a vector space. Show that the set {e_j : j ∈ N}, where
e_j = (0, …, 0, 1, 0, …)
(all zeros except for a single 1 in the jth position), is a Hamel basis for ℓ_f.
This is a very artificial example. No infinite-dimensional Banach space (a particularly nice kind of vector space, see later) can have a countable Hamel basis. We will show that no infinite-dimensional Hilbert space can have a countable Hamel basis in Chapter *.
2
Norms and normed spaces
2.1 Norms and normed spaces
Definition 2.1 A norm on a vector space V is a map ‖·‖ : V → R such that for all x, y ∈ V and α ∈ K
(i) ‖x‖ ≥ 0 with equality iff x = 0;
(ii) ‖αx‖ = |α|‖x‖; and
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (‘the triangle inequality’).
A vector space equipped with a norm is called a normed space.
Strictly a normed space should be written (V, ‖·‖_V) where ‖·‖_V is the particular norm on V. However, many normed spaces have standard norms, and so often the norm is not specified. E.g. unless otherwise stated, R^n is equipped with the standard norm
\[ \|x\| = \Big( \sum_{j=1}^{n} |x_j|^2 \Big)^{1/2}. \tag{2.1} \]
However, others are possible, such as
\[ \|x\|_\infty = \max_{i=1,\ldots,n} |x_i| \quad\text{and}\quad \|x\|_1 = \sum_{i=1}^{n} |x_i|. \]
Exercise 2.2 Show that ‖·‖, ‖·‖_∞, and ‖·‖_1 are all norms on R^n.
Example 2.3 For 1 ≤ p < ∞ the standard norm on ℓ^p(K) is
\[ \|x\|_{\ell^p} = \Big( \sum_{j=1}^{\infty} |x_j|^p \Big)^{1/p}, \]
where x = (x_1, x_2, x_3, …). For p = ∞ we set
\[ \|x\|_{\ell^\infty} = \sup_{j} |x_j|. \]
Note that when p = 2 and K = R this is the natural extension of the standard Euclidean norm to a countable collection of real numbers.
We now show that this really does define a norm. It is clear that ‖x‖_{ℓ^p} ≥ 0, and that if the norm is zero then x = 0. It is also clear that
\[ \|\lambda x\|_{\ell^p} = \Big( \sum_{j=1}^{\infty} |\lambda x_j|^p \Big)^{1/p} = \Big( \sum_{j=1}^{\infty} |\lambda|^p |x_j|^p \Big)^{1/p} = |\lambda| \Big( \sum_{j=1}^{\infty} |x_j|^p \Big)^{1/p} = |\lambda| \|x\|_{\ell^p}. \]
(This requirement explains why we have to take the pth root of the sum of pth powers.)
It is the triangle inequality that requires some work. Although the argument is a little long, on the way we will prove two auxiliary – and very useful – inequalities. We say that (p, q) with 1 < p, q < ∞ are conjugate if
\[ \frac{1}{p} + \frac{1}{q} = 1; \tag{2.2} \]
we extend the definition to cover the case p = 1, q = ∞.
Lemma 2.4 (Young’s inequality) Let a, b > 0 and (p, q) conjugate with 1 < p, q < ∞. Then
\[ ab \le \frac{a^p}{p} + \frac{b^q}{q}. \tag{2.3} \]
Proof Consider the function
\[ f(t) = \frac{t^p}{p} + \frac{1}{q} - t, \qquad t \ge 0. \]
Then
\[ \frac{df}{dt} = t^{p-1} - 1, \]
and this is zero only for t = 1, where f(1) = 0. The second derivative is positive for t > 0, so this is a minimum. It follows that f(t) ≥ 0 for all t ≥ 0. In particular, choosing t = ab^{-q/p} we obtain
\[ \frac{a^p b^{-q}}{p} + \frac{1}{q} - ab^{-q/p} \ge 0, \]
so that, multiplying by b^q and using q − q/p = 1,
\[ \frac{a^p}{p} + \frac{b^q}{q} \ge ab^{-q/p} b^q = ab. \]
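Young’s inequality is easy to probe numerically. The sketch below evaluates the gap a^p/p + b^q/q − ab on a grid of sample values and at one point where a^p = b^q, which corresponds to t = 1 in the proof and hence to equality. The function name `young_gap`, the grid, and the chosen exponents are assumptions made for this illustration.

```python
def young_gap(a, b, p):
    """Return a^p/p + b^q/q - ab, where q = p/(p-1) is conjugate to p."""
    q = p / (p - 1.0)
    return a ** p / p + b ** q / q - a * b


grid = [0.1 * k for k in range(1, 51)]
smallest = min(
    young_gap(a, b, p) for p in (1.5, 2.0, 3.0) for a in grid for b in grid
)
# The gap should never be negative, up to rounding error.
print(smallest >= -1e-12)
# Equality case: a = 2, p = 3, q = 3/2, b = 4, so that a^p = b^q = 8.
print(abs(young_gap(2.0, 4.0, 3.0)) < 1e-9)
```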
Lemma 2.5 (Hölder’s inequality) Let x ∈ ℓ^p and y ∈ ℓ^q, with p, q conjugate (1 ≤ p, q ≤ ∞). Then
\[ \sum_{j=1}^{\infty} |x_j y_j| \le \|x\|_{\ell^p} \|y\|_{\ell^q}. \tag{2.4} \]
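Proof For p = 1, q = ∞, for each n ∈ N
\[ \sum_{j=1}^{n} |x_j y_j| \le \Big( \max_{j=1,\ldots,n} |y_j| \Big) \sum_{j=1}^{n} |x_j| \le \|y\|_{\ell^\infty} \|x\|_{\ell^1}. \]
Taking the limit as n → ∞ it follows that
\[ \sum_{j=1}^{\infty} |x_j y_j| \le \|y\|_{\ell^\infty} \|x\|_{\ell^1}. \]
For 1 < p < ∞ we may assume that x ≠ 0 and y ≠ 0 (otherwise both sides of (2.4) are zero). Applying Young’s inequality (2.3) to each term,
\[ \sum_{j=1}^{n} \frac{|x_j|}{\|x\|_{\ell^p}} \frac{|y_j|}{\|y\|_{\ell^q}} \le \sum_{j=1}^{n} \left( \frac{1}{p} \frac{|x_j|^p}{\|x\|_{\ell^p}^p} + \frac{1}{q} \frac{|y_j|^q}{\|y\|_{\ell^q}^q} \right) \le \frac{1}{p} + \frac{1}{q} = 1. \]
It follows that for each n,
\[ \sum_{j=1}^{n} |x_j y_j| \le \|x\|_{\ell^p} \|y\|_{\ell^q}, \]
and so (2.4) follows by letting n → ∞.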
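As a sanity check of (2.4), the following sketch compares both sides for finite truncations of two sample sequences and several conjugate pairs, including the endpoint cases p = 1 and p = ∞. The sequences, the truncation length, and the helper names `lp_norm`, `conjugate`, and `holder_sides` are assumptions made for this example.

```python
import math


def lp_norm(x, p):
    """The l^p norm of a finite sequence x, with p = math.inf for the sup norm."""
    if math.isinf(p):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def conjugate(p):
    """The conjugate exponent q with 1/p + 1/q = 1 (q = inf when p = 1)."""
    if p == 1.0:
        return math.inf
    if math.isinf(p):
        return 1.0
    return p / (p - 1.0)


def holder_sides(x, y, p):
    """Return (sum |x_j y_j|, ||x||_p ||y||_q) for the conjugate pair (p, q)."""
    lhs = sum(abs(a * b) for a, b in zip(x, y))
    rhs = lp_norm(x, p) * lp_norm(y, conjugate(p))
    return lhs, rhs


x = [1.0 / j ** 2 for j in range(1, 201)]
y = [(-1.0) ** j / j ** 1.5 for j in range(1, 201)]
for p in (1.0, 1.5, 2.0, 3.0, math.inf):
    lhs, rhs = holder_sides(x, y, p)
    print(p, lhs, rhs, lhs <= rhs)
```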

Note that for p = q = 2 we obtain the Cauchy-Schwarz inequality for x, y ∈ ℓ^2,
\[ \sum_{j=1}^{\infty} |x_j y_j| \le \Big( \sum_{j=1}^{\infty} |x_j|^2 \Big)^{1/2} \Big( \sum_{j=1}^{\infty} |y_j|^2 \Big)^{1/2}. \]
Choosing x = (x_1, …, x_n, 0, 0, …) and y = (y_1, …, y_n, 0, 0, …) we recover the Cauchy-Schwarz inequality in R^n,
\[ \sum_{j=1}^{n} |x_j y_j| \le \Big( \sum_{j=1}^{n} |x_j|^2 \Big)^{1/2} \Big( \sum_{j=1}^{n} |y_j|^2 \Big)^{1/2}, \tag{2.5} \]
which is just the familiar inequality for dot products |x · y| ≤ |x||y|.
Finally we can prove the triangle inequality for the ℓp norm:
Lemma 2.6 (Minkowski’s inequality) Let x, y ∈ ℓ^p, 1 ≤ p ≤ ∞. Then x + y ∈ ℓ^p, and
\[ \|x + y\|_{\ell^p} \le \|x\|_{\ell^p} + \|y\|_{\ell^p}. \tag{2.6} \]
Proof For p = ∞ the triangle inequality is clear, since
\[ \max_{j=1,\ldots,n} |x_j + y_j| \le \max_{j=1,\ldots,n} |x_j| + \max_{j=1,\ldots,n} |y_j|, \]
and the proof follows by taking limits. Similarly for p = 1,
\[ \sum_{j=1}^{n} |x_j + y_j| \le \sum_{j=1}^{n} |x_j| + \sum_{j=1}^{n} |y_j|, \]
and again one takes the limit as n → ∞.
For 1 < p < ∞, we have already noted that x + y ∈ ℓ^p. Now let q be the conjugate exponent to p, i.e. q = p/(p − 1) (so that 1/p + 1/q = 1). It follows that (p − 1)q = p, and therefore, since x + y ∈ ℓ^p, the sequence z with z_j = |x_j + y_j|^{p−1} is an element of ℓ^q. So, applying Hölder’s inequality (to the sequences truncated after n terms) to each of the two sums in the second line, we can write
\[ \begin{aligned} \sum_{j=1}^{n} |x_j + y_j|^p &\le \sum_{j=1}^{n} |x_j + y_j|^{p-1} \big( |x_j| + |y_j| \big) \\ &= \sum_{j=1}^{n} |x_j + y_j|^{p-1} |x_j| + \sum_{j=1}^{n} |x_j + y_j|^{p-1} |y_j| \\ &\le \Big( \sum_{j=1}^{n} |x_j + y_j|^{(p-1)q} \Big)^{1/q} \big( \|x\|_{\ell^p} + \|y\|_{\ell^p} \big) \\ &= \Big( \sum_{j=1}^{n} |x_j + y_j|^p \Big)^{1/q} \big( \|x\|_{\ell^p} + \|y\|_{\ell^p} \big), \end{aligned} \]
i.e. (if the sum on the left is zero there is nothing to prove)
\[ \Big( \sum_{j=1}^{n} |x_j + y_j|^p \Big)^{1 - 1/q} \le \|x\|_{\ell^p} + \|y\|_{\ell^p}. \]
Since 1 − (1/q) = 1/p one can now let n → ∞ to obtain (2.6).
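A quick numerical check of (2.6) on finite sequences may help fix ideas. The sketch below computes the gap ‖x‖ + ‖y‖ − ‖x + y‖ in several ℓ^p norms, and also at a pair where y is a positive multiple of x, for which the triangle inequality is an equality. The vectors and the helper names `lp_norm` and `minkowski_gap` are assumptions chosen for this example.

```python
import math


def lp_norm(x, p):
    """The l^p norm of a finite sequence x, with p = math.inf for the sup norm."""
    if math.isinf(p):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def minkowski_gap(x, y, p):
    """||x||_p + ||y||_p - ||x + y||_p, which (2.6) says is non-negative."""
    total = [a + b for a, b in zip(x, y)]
    return lp_norm(x, p) + lp_norm(y, p) - lp_norm(total, p)


x = [1.0, -2.0, 0.5, 3.0]
y = [0.5, 1.0, -4.0, 2.0]
for p in (1.0, 1.5, 2.0, 4.0, math.inf):
    print(p, minkowski_gap(x, y, p))

# Equality case: y is a positive multiple of x.
print(abs(minkowski_gap(x, [2.0 * t for t in x], 2.0)) < 1e-12)
```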
One of our main concerns in what follows will be normed spaces that consist of functions. For example, the following are norms on C^0([0, 1]), the space of all continuous functions on [0, 1]: the ‘sup(remum) norm’,
\[ \|f\|_\infty = \sup_{x \in [0,1]} |f(x)| \]
(convergence in this norm is equivalent to uniform convergence) and the L^1 and L^2 norms
\[ \|f\|_{L^1} = \int_0^1 |f(x)| \, dx \quad\text{and}\quad \|f\|_{L^2} = \Big( \int_0^1 |f(x)|^2 \, dx \Big)^{1/2}. \]
Note that of the three candidates here the L^2 norm looks most like the expression (2.1) for the familiar norm in R^n.
Lemma 2.7 ‖·‖_{L^1} is a norm on C^0([0, 1]).
Proof The only part that requires much thought is (i), to make sure that ‖f‖_{L^1} = 0 iff f = 0. So suppose that f ≠ 0. Then |f(y)| = δ > 0 for some y ∈ (0, 1) (if f(0) ≠ 0 or f(1) ≠ 0 it follows from continuity that f(y) ≠ 0 for some y ∈ (0, 1)). Since f is continuous, there exists an ε > 0 such that for any x ∈ (0, 1) with |x − y| ≤ ε we have
|f(x) − f(y)| < δ/2,
and hence |f(x)| > δ/2. If necessary, reduce ε so that [y − ε, y + ε] ⊂ (0, 1). Then
\[ \int_0^1 |f(x)| \, dx \ge \int_{y-\varepsilon}^{y+\varepsilon} |f(x)| \, dx \ge \int_{y-\varepsilon}^{y+\varepsilon} \frac{\delta}{2} \, dx = \varepsilon\delta > 0. \]
(ii) and (iii) are clear since
\[ \|\alpha f\|_{L^1} = \int_0^1 |\alpha f(x)| \, dx = |\alpha| \int_0^1 |f(x)| \, dx = |\alpha| \|f\|_{L^1} \]
and
\[ \|f + g\|_{L^1} = \int_0^1 |f(x) + g(x)| \, dx \le \int_0^1 |f(x)| + |g(x)| \, dx = \|f\|_{L^1} + \|g\|_{L^1}. \]
For ‖·‖_{L^2} (i) and (ii) are the same as above; we will see (iii) below as a consequence of the Cauchy-Schwarz inequality for inner products.
Definition 2.8 Two norms ‖·‖_1 and ‖·‖_2 on a vector space V are equivalent – we write ‖·‖_1 ∼ ‖·‖_2 – if there exist constants 0 < c_1 ≤ c_2 such that
c_1‖x‖_1 ≤ ‖x‖_2 ≤ c_2‖x‖_1 for all x ∈ V.
It is clear that the above notion of ‘equivalence’ is reflexive and symmetric. It is also transitive:
Lemma 2.9 Suppose that ‖·‖_1, ‖·‖_2, and ‖·‖_3 are all norms on a vector space V, and that ‖·‖_2 and ‖·‖_3 are both equivalent to ‖·‖_1. Then ‖·‖_2 and ‖·‖_3 are equivalent.
Proof There exist constants 0 < α_1 ≤ α_2 and 0 < β_1 ≤ β_2 such that
α_1‖x‖_2 ≤ ‖x‖_1 ≤ α_2‖x‖_2 and β_1‖x‖_3 ≤ ‖x‖_1 ≤ β_2‖x‖_3,
and so
‖x‖_2 ≥ α_2^{-1}‖x‖_1 ≥ β_1 α_2^{-1}‖x‖_3
and
‖x‖_2 ≤ α_1^{-1}‖x‖_1 ≤ β_2 α_1^{-1}‖x‖_3,
i.e. β_1 α_2^{-1}‖x‖_3 ≤ ‖x‖_2 ≤ β_2 α_1^{-1}‖x‖_3 and ‖·‖_2 and ‖·‖_3 are equivalent.
Exercise 2.10 Show that the norms ‖·‖, ‖·‖_1, and ‖·‖_∞ on R^n are all equivalent.
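Before attempting the exercise it can be useful to experiment. The sketch below tests one candidate chain of inequalities, ‖x‖_∞ ≤ ‖x‖ ≤ ‖x‖_1 ≤ n‖x‖_∞, on randomly sampled vectors. This is an experiment rather than a proof; the particular chain, the sample size, and the helper names are assumptions made for this illustration.

```python
import random


def norm_1(x):
    return sum(abs(t) for t in x)


def norm_2(x):
    return sum(t * t for t in x) ** 0.5


def norm_inf(x):
    return max(abs(t) for t in x)


def chain_holds(x, tol=1e-12):
    """Check norm_inf <= norm_2 <= norm_1 <= n * norm_inf, up to rounding."""
    n = len(x)
    return (
        norm_inf(x) <= norm_2(x) + tol
        and norm_2(x) <= norm_1(x) + tol
        and norm_1(x) <= n * norm_inf(x) + tol
    )


rng = random.Random(0)  # fixed seed so that the experiment is reproducible
samples = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(1000)]
print(all(chain_holds(x) for x in samples))
```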
This is a particular case of the general result that all norms on a finite-
dimensional vector space are equivalent, which we will prove in the following
chapter. As part of this proof, the following proposition – which shows that
one can always find a norm on a finite-dimensional vector space – will be
useful.
Proposition 2.11 Let V be an n-dimensional vector space, and E = $\{e_j\}_{j=1}^{n}$ a basis for V. Define a map ‖·‖_E : V → [0, ∞) by
\[ \Big\| \sum_{j=1}^{n} \alpha_j e_j \Big\|_E = \Big( \sum_{j=1}^{n} |\alpha_j|^2 \Big)^{1/2} \]
(taking the positive square root). Then ‖·‖_E is a norm on V.
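Proof First, note that any v ∈ V can be written uniquely as $v = \sum_j \alpha_j e_j$, so the map v ↦ ‖v‖_E is well-defined. We check that ‖·‖_E satisfies the three requirements of a norm:
(i) clearly ‖v‖_E ≥ 0, and if ‖v‖_E = 0 then $v = \sum_j \alpha_j e_j$ with $\sum_j |\alpha_j|^2 = 0$, i.e. α_j = 0 for j = 1, …, n, and so v = 0.
(ii) If $v = \sum_j \alpha_j e_j$ then $\lambda v = \sum_j (\lambda\alpha_j) e_j$, and so
\[ \|\lambda v\|_E^2 = \sum_j |\lambda\alpha_j|^2 = |\lambda|^2 \sum_j |\alpha_j|^2 = |\lambda|^2 \|v\|_E^2. \]
(iii) For the triangle inequality, if $u = \sum_j \alpha_j e_j$ and $v = \sum_j \beta_j e_j$ then, using the Cauchy-Schwarz inequality (2.5),
\[ \begin{aligned} \|u + v\|_E^2 &= \Big\| \sum_j (\alpha_j + \beta_j) e_j \Big\|_E^2 \\ &= \sum_j |\alpha_j + \beta_j|^2 \\ &= \sum_j \big( |\alpha_j|^2 + \alpha_j \overline{\beta_j} + \overline{\alpha_j} \beta_j + |\beta_j|^2 \big) \\ &= \|u\|_E^2 + \sum_j \alpha_j \overline{\beta_j} + \sum_j \overline{\alpha_j} \beta_j + \|v\|_E^2 \\ &\le \|u\|_E^2 + 2 \Big( \sum_j |\alpha_j|^2 \Big)^{1/2} \Big( \sum_j |\beta_j|^2 \Big)^{1/2} + \|v\|_E^2 \\ &= \|u\|_E^2 + 2\|u\|_E \|v\|_E + \|v\|_E^2 \\ &= \big( \|u\|_E + \|v\|_E \big)^2, \end{aligned} \]
i.e. ‖u + v‖_E ≤ ‖u‖_E + ‖v‖_E.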
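To see that ‖·‖_E genuinely depends on the basis E, here is a small sketch in R^2. The basis e_1 = (1, 1), e_2 = (0, 2) and the helper names `coordinates` and `norm_E` are assumptions chosen for this example; the coordinates are found with Cramer’s rule.

```python
def coordinates(v, e1, e2):
    """Coefficients (a1, a2) with v = a1*e1 + a2*e2, via Cramer's rule."""
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("e1 and e2 do not form a basis of R^2")
    a1 = (v[0] * e2[1] - v[1] * e2[0]) / det
    a2 = (e1[0] * v[1] - e1[1] * v[0]) / det
    return a1, a2


def norm_E(v, e1, e2):
    """The norm of Proposition 2.11 with respect to the basis {e1, e2}."""
    a1, a2 = coordinates(v, e1, e2)
    return (a1 * a1 + a2 * a2) ** 0.5


E1, E2 = (1.0, 1.0), (0.0, 2.0)
for v in [(1.0, 1.0), (0.0, 2.0), (1.0, 3.0)]:
    euclidean = (v[0] ** 2 + v[1] ** 2) ** 0.5
    print(v, norm_E(v, E1, E2), euclidean)
```

With this basis the basis vectors themselves have ‖·‖_E norm 1, although their Euclidean lengths are √2 and 2.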
We now want to show that, with the norm ‖·‖_E, any finite-dimensional normed space is ‘the same’ as R^n. For two objects to be ‘the same’ we generally require an isomorphism that also preserves the essential structures of the objects. Here we want to say that two linear spaces, along with their norms, are ‘the same’. So we will need the isomorphism to be linear (so that ϕ(x) + ϕ(y) = ϕ(x + y) and ϕ(αx) = αϕ(x)) and we will also want to preserve the norm (‘an isometry’).
Definition 2.12 Two normed spaces (X, ‖·‖_X) and (Y, ‖·‖_Y) are isometrically isomorphic, or simply isometric, if there exists a linear isomorphism ϕ : X → Y that is also an isometry, i.e.
‖ϕ(x)‖_Y = ‖x‖_X for all x ∈ X.
Corollary 2.13 With the norm defined in Proposition 2.11, (V, ‖·‖_E) is isometrically isomorphic to (R^n, |·|).
Proof For v ∈ (V, ‖·‖_E) with $v = \sum_{j=1}^{n} \alpha_j e_j$, define
\[ \varphi\Big( \sum_{j=1}^{n} \alpha_j e_j \Big) = (\alpha_1, \ldots, \alpha_n), \]
which has a well-defined inverse
\[ \varphi^{-1}(\alpha_1, \ldots, \alpha_n) = \sum_{j=1}^{n} \alpha_j e_j. \]
It is clear that ϕ is one-to-one and onto, that ϕ (and its inverse) are linear, and it follows directly from the definition of ‖·‖_E that |ϕ(x)| = ‖x‖_E for all x ∈ V.
2.2 Convergence
In a normed space we can measure the distance between x and y using ‖x − y‖. So we can define notions of convergence and continuity using this idea of distance:
Definition 2.14 A sequence $\{x_k\}_{k=1}^{\infty}$ in a normed space X converges to a limit x ∈ X if for any ε > 0 there exists an N such that
‖x_k − x‖ < ε for all k ≥ N.
This definition is sensible in that limits are unique:
Exercise 2.15 Show that the limit of a convergent sequence is unique.
The following result shows that if x_n → x then the norm of x_n converges to the norm of x. This will turn out to be a very useful observation.
Lemma 2.16 If x_n → x in (X, ‖·‖) then ‖x_n‖ → ‖x‖.
Proof The triangle inequality gives
‖x_n‖ ≤ ‖x‖ + ‖x_n − x‖ and ‖x‖ ≤ ‖x_n‖ + ‖x − x_n‖
which implies that
| ‖x_n‖ − ‖x‖ | ≤ ‖x_n − x‖.
Two equivalent norms give rise to the same notion of convergence:
Lemma 2.17 Suppose that ‖·‖_1 and ‖·‖_2 are two equivalent norms on a space X. Then
‖x_n − x‖_1 → 0 iff ‖x_n − x‖_2 → 0,
i.e. convergence in one norm is equivalent to convergence in the other, with the same limit.
The proof of this lemma is immediate from the definition of the equivalence of norms, since there exist constants 0 < c_1 ≤ c_2 such that
c_1‖x_n − x‖_1 ≤ ‖x_n − x‖_2 ≤ c_2‖x_n − x‖_1.
Using convergence we can also define continuity:
Definition 2.18 A map f : (X, ‖·‖_X) → (Y, ‖·‖_Y) is continuous if
x_n → x in X ⇒ f(x_n) → f(x) in Y,
i.e. if
‖x_n − x‖_X → 0 ⇒ ‖f(x_n) − f(x)‖_Y → 0.
Exercise 2.19 Show that this is equivalent to the ε–δ definition of continuity: for each x ∈ X, for every ε > 0 there exists a δ > 0 such that
‖y − x‖_X < δ ⇒ ‖f(y) − f(x)‖_Y < ε.
Lemma 2.17 has an immediate implication for continuity:
Corollary 2.20 Suppose that ‖·‖_{X,1} ∼ ‖·‖_{X,2} are two equivalent norms on a space X, and ‖·‖_{Y,1} ∼ ‖·‖_{Y,2} are two equivalent norms on a space Y. Then a function f : (X, ‖·‖_{X,1}) → (Y, ‖·‖_{Y,1}) is continuous iff it is continuous as a map from (X, ‖·‖_{X,2}) into (Y, ‖·‖_{Y,2}).
We remarked above that all norms on a finite-dimensional space are equivalent, which means that there is essentially only one notion of ‘convergence’ and of ‘continuity’. But in infinite-dimensional spaces there are distinct norms, and the different notions of convergence implied by the norms we have introduced for continuous functions are not equivalent, as we now show.
[Fig. 2.1. (a) definition of f_k: piecewise linear, with peak value 1 at x = 1/2 and zero outside [1/2 − 1/k, 1/2 + 1/k]; (b) definition of g_k: piecewise linear, with peak value k^2 − 1 at x = 1/k and zero outside [1/(k+1), 1/(k−1)].]
First we note that convergence of f_k ∈ C^0([0, 1]) to f in the supremum norm, i.e.
\[ \sup_{x \in [0,1]} |f_k(x) - f(x)| \to 0 \tag{2.7} \]
implies that f_k → f in the L^1 norm, since clearly
\[ \int_0^1 |f_k(y) - f(y)| \, dy \le \int_0^1 \Big( \sup_{x \in [0,1]} |f_k(x) - f(x)| \Big) \, dy = \sup_{x \in [0,1]} |f_k(x) - f(x)|. \]
This inequality should make very clear the advantage of the shorthand norm notation, since it just says
‖f_k − f‖_{L^1} ≤ ‖f_k − f‖_∞.
It is also clear that if (2.7) holds then f_k(x) → f(x) for each x ∈ [0, 1], which is ‘pointwise convergence’. However, neither pointwise convergence nor L^1 convergence implies uniform convergence:
Example 2.21 Consider the sequence of functions {f_k} as illustrated in Figure 2.1(a). Then f_k → 0 in the L^1 norm, since
\[ \|f_k - 0\|_{L^1} = \|f_k\|_{L^1} = \frac{1}{k}. \]
However, f_k ↛ 0 pointwise, since f_k(1/2) = 1 for all k.
Example 2.22 Consider the sequence of functions {g_k} as illustrated in Figure 2.1(b). Then g_k → 0 pointwise, but
\[ \|g_k\|_{L^1} = \frac{1}{2} (k^2 - 1) \left[ \Big( \frac{1}{k-1} - \frac{1}{k} \Big) + \Big( \frac{1}{k} - \frac{1}{k+1} \Big) \right] = 1, \]
so g_k ↛ 0 in the L^1 norm.
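The two examples can be explored numerically. The sketch below assumes, based on the axis labels of the figure and the norms computed above, that f_k is the tent of height 1 centred at 1/2 with support [1/2 − 1/k, 1/2 + 1/k], and that g_k is the tent of height k^2 − 1 with peak at 1/k and support [1/(k+1), 1/(k−1)], for k ≥ 2. The function names and the midpoint-rule integrator are hypothetical helpers for this illustration.

```python
def f(k, x):
    """Assumed f_k: height 1 at x = 1/2, zero outside [1/2 - 1/k, 1/2 + 1/k]."""
    return max(0.0, 1.0 - k * abs(x - 0.5))


def g(k, x):
    """Assumed g_k: height k^2 - 1 at x = 1/k, zero outside [1/(k+1), 1/(k-1)]."""
    left, peak, right = 1.0 / (k + 1), 1.0 / k, 1.0 / (k - 1)
    height = k * k - 1.0
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return height * (x - left) / (peak - left)
    return height * (right - x) / (right - peak)


def l1_norm(func, k, cells=20000):
    """Midpoint-rule approximation of the L^1 norm of func(k, .) on [0, 1]."""
    h = 1.0 / cells
    return h * sum(abs(func(k, (i + 0.5) * h)) for i in range(cells))


for k in (2, 5, 10, 20):
    # L^1 norm of f_k, value of f_k at 1/2, L^1 norm of g_k, value of g_k at 0.3
    print(k, l1_norm(f, k), f(k, 0.5), l1_norm(g, k), g(k, 0.3))
```

The L^1 norms of f_k shrink like 1/k while f_k(1/2) stays at 1; the L^1 norms of g_k stay near 1 while g_k(x) is eventually 0 at any fixed x.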
3
Compactness and equivalence of norms
3.1 Compactness
One fundamental property of the real numbers is expressed by the Bolzano-

Weierstrass Theorem:
Theorem 3.1 (Bolzano-Weierstrass) A bounded sequence of real num-

bers has a convergent subsequence.
This can easily be generalised to sequences in Rn :
Corollary 3.2 (Bolzano-Weierstrass in Rn ) A bounded sequence in Rn

has a convergent subsequence.
Proof Let $\{x^{(k)} = (x_1^{(k)}, \ldots, x_n^{(k)})\}$ be a bounded sequence in R^n. Since $x_1^{(k)}$ is a bounded sequence in R, there is a subsequence $x^{(k_{1,j})}$ for which $x_1^{(k_{1,j})}$ converges. Since $x^{(k_{1,j})}$ is again a bounded sequence in R^n, $x_2^{(k_{1,j})}$ is a bounded sequence in R. We can therefore find a subsequence $x^{(k_{2,j})}$ of $x^{(k_{1,j})}$ such that $x_2^{(k_{2,j})}$ converges. Since $x^{(k_{2,j})}$ is a subsequence of $x^{(k_{1,j})}$, $x_1^{(k_{2,j})}$ still converges. We can continue this process inductively to obtain a subsequence $x^{(k_{n,j})}$ such that all the $x_i^{(k_{n,j})}$ for i = 1, …, n converge.
We now make two definitions:
Definition 3.3 A subset X of a normed space (V, ‖·‖) is bounded if there exists an M > 0 such that
‖x‖ ≤ M for all x ∈ X.
Note that if ‖·‖_1 ∼ ‖·‖_2 are two equivalent norms on V then X is bounded with respect to ‖·‖_1 iff it is bounded with respect to ‖·‖_2.
Definition 3.4 A subset X of a normed space (V, ‖·‖) is closed if whenever a sequence {x_n} with x_n ∈ X converges to some x, we must have x ∈ X.
Note that if ‖·‖_1 ∼ ‖·‖_2 are two equivalent norms on V then X is closed with respect to ‖·‖_1 iff it is closed with respect to ‖·‖_2.
Example: any closed interval in R is closed in this sense. Any product of closed intervals is closed in R^n.
Exercise 3.5 Show that if (X, ‖·‖) is a normed space then the unit ball
B_X(0, 1) = {x ∈ X : ‖x‖ ≤ 1}
and the unit sphere
S_X = {x ∈ X : ‖x‖ = 1}
are both closed.
Definition 3.6 A subset K of a normed space (V, ‖·‖) is compact if any sequence {x_n} with x_n ∈ K has a convergent subsequence $x_{n_j} \to x^*$ with x^* ∈ K.
Note that if ‖·‖_1 ∼ ‖·‖_2 are two equivalent norms on V then K is compact with respect to ‖·‖_1 iff it is compact with respect to ‖·‖_2.
Two properties of compact sets are easy to prove:
Theorem 3.7 A compact set is closed and bounded.
Proof Let K be a compact set in (V, ‖·‖) and x_n → x with x_n ∈ K. Since K is compact {x_n} has a convergent subsequence whose limit lies in K; this limit must also be x, so x ∈ K, and so K is closed.
Suppose that K is not bounded. Then for each n ∈ N there exists an x_n ∈ K such that ‖x_n‖ ≥ n. But {x_n} must have a convergent subsequence, and any convergent sequence is bounded, which yields a contradiction (every subsequence of {x_n} is unbounded).
It follows from the Bolzano-Weierstrass theorem that any closed bounded

set K in Rn is compact: A sequence in a bounded subset K of Rn has a
convergent subsequence by Corollary 3.2; since K is closed by definition this
subsequence converges to an element of K. So K is compact. We have
therefore shown:
Theorem 3.8 A subset of Rn is compact iff it is closed and bounded.
We will see later that this characterisation does not hold in infinite-
dimensional spaces (and this is one way to characterise such spaces).
We now prove two fundamental results about continuous functions on

compact sets:
Theorem 3.9 Suppose that K ⊂ (X, ‖·‖_X) is compact and that f is a continuous map from (X, ‖·‖_X) into (Y, ‖·‖_Y). Then f(K) is a compact subset of (Y, ‖·‖_Y).
Proof Let {y_n} ⊂ f(K). Then y_n = f(x_n) for some x_n ∈ K. Since {x_n} ⊂ K, and K is compact, there is a subsequence of x_n that converges, $x_{n_j} \to x^* \in K$. Since f is continuous it follows that as j → ∞
\[ y_{n_j} = f(x_{n_j}) \to f(x^*) = y^* \in f(K), \]
i.e. the subsequence $y_{n_j}$ converges to some y^* ∈ f(K), and so f(K) is compact.
From which follows:
Proposition 3.10 Let K be a compact subset of (X, ‖·‖). Then any continuous function f : K → R is bounded and attains its bounds, i.e. there exists an M > 0 such that |f(x)| ≤ M for all x ∈ K, and there exist $\underline{x}, \overline{x} \in K$ such that
\[ f(\underline{x}) = \inf_{x \in K} f(x) \quad\text{and}\quad f(\overline{x}) = \sup_{x \in K} f(x). \]
Proof Since f is continuous and K is compact, f(K) is a compact subset of R, i.e. f(K) is closed and bounded. It follows that
\[ \overline{f} = \sup_{y \in f(K)} y \in f(K), \]
and so $\overline{f} = f(\overline{x})$ for some $\overline{x} \in K$. [That sup(S) ∈ S for any non-empty closed bounded S ⊂ R is clear, since for each n there exists an s_n ∈ S such that s_n > sup(S) − 1/n. Since s_n ≤ sup(S) by definition, s_n → sup(S), and it follows from the fact that S is closed that sup(S) ∈ S.] The argument for $\underline{x}$ is identical.
This allows one to prove the equivalence of all norms on a finite-dimensional

space.
Theorem 3.11 Let V be a finite-dimensional vector space. Then all norms

on V are equivalent.
Proof Let E = $\{e_j\}_{j=1}^{n}$ be a basis for V, and let ‖·‖_E be the norm on V defined in Proposition 2.11. Let ‖·‖ be another norm on V. We will show that ‖·‖ is equivalent to ‖·‖_E. Since equivalence of norms is an equivalence relation, this will imply that all norms on V are equivalent.
Now, if $u = \sum_j \alpha_j e_j$ then
\[ \begin{aligned} \|u\| &= \Big\| \sum_j \alpha_j e_j \Big\| \\ &\le \sum_j |\alpha_j| \|e_j\| && \text{(using the triangle inequality)} \\ &\le \Big( \sum_j |\alpha_j|^2 \Big)^{1/2} \Big( \sum_j \|e_j\|^2 \Big)^{1/2} && \text{(using the Cauchy-Schwarz inequality)} \\ &= C_E \|u\|_E, \end{aligned} \]
where $C_E^2 = \sum_j \|e_j\|^2$, i.e. C_E is a constant that does not depend on u.
Now, observe that this estimate ‖u‖ ≤ C_E‖u‖_E implies for u, v ∈ V,
| ‖u‖ − ‖v‖ | ≤ ‖u − v‖ ≤ C_E‖u − v‖_E,
and so the map u ↦ ‖u‖ is continuous from (V, ‖·‖_E) into R.
Now, note that the set
S_V = {u ∈ V : ‖u‖_E = 1}
is the image of S_n = {α ∈ R^n : |α| = 1} under the map $\alpha \mapsto \sum_{j=1}^{n} \alpha_j e_j$. Since by definition
\[ \Big\| \sum_{j=1}^{n} \alpha_j e_j \Big\|_E = |\alpha|, \]
this map is continuous. Since S_n is closed and bounded, it is a compact subset of R^n; since S_V is the image of S_n under a continuous map, it is also compact.
Therefore the map v ↦ ‖v‖ is bounded on S_V, and attains its bounds. In particular, there exists an a ≥ 0 such that
‖v‖ ≥ a for every v ∈ V with ‖v‖_E = 1.
Since the bound is attained, there exists a v ∈ S_V such that ‖v‖ = a. If a = 0 then ‖v‖ = 0, i.e. v = 0. But since v ∈ S_V we have ‖v‖_E = 1, and so v cannot be zero. It follows that a > 0. Then for an arbitrary non-zero v ∈ V, we have v/‖v‖_E ∈ S_V, and so
\[ \Big\| \frac{v}{\|v\|_E} \Big\| \ge a \quad\Rightarrow\quad \|v\| \ge a \|v\|_E. \]
Combining this with ‖u‖ ≤ C_E‖u‖_E shows that ‖·‖ and ‖·‖_E are equivalent.
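The two constants in this proof can be estimated numerically in a simple case. The sketch below takes V = R^2 with the standard basis (so that ‖·‖_E is the Euclidean norm) and, as the ‘other’ norm, ‖·‖_1. It samples the unit circle S_V to estimate the minimum a and the maximum of ‖v‖_1 there, and compares the maximum with C_E. The choice of norm, the sample count, and the function names are assumptions made for this example.

```python
import math


def norm_1(v):
    """The 'other' norm on R^2 used in this illustration."""
    return abs(v[0]) + abs(v[1])


def sphere_bounds(norm, samples=3600):
    """Min and max of `norm` over sample points of the Euclidean unit circle."""
    values = [
        norm((math.cos(2 * math.pi * i / samples), math.sin(2 * math.pi * i / samples)))
        for i in range(samples)
    ]
    return min(values), max(values)


a, largest = sphere_bounds(norm_1)
# C_E^2 is the sum of the squared norms of the basis vectors e_1 and e_2.
C_E = math.sqrt(norm_1((1.0, 0.0)) ** 2 + norm_1((0.0, 1.0)) ** 2)
print(a, largest, C_E)
```

In this case the sampled values are consistent with a‖v‖_E ≤ ‖v‖_1 ≤ C_E‖v‖_E for a close to 1 and C_E = √2.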
Corollary 3.12 A subset of a finite-dimensional normed space V is compact

iff it is closed and bounded.
Proof A compact set is closed and bounded by Theorem 3.7, so we only need to prove the converse. Let K be a closed bounded subset of (V, ‖·‖). Then K is also a closed bounded subset of (V, ‖·‖_E), since ‖·‖ ∼ ‖·‖_E.
The map ϕ : (V, ‖·‖_E) → R^n defined by
\[ \varphi\Big( \sum_j \alpha_j e_j \Big) = (\alpha_1, \ldots, \alpha_n) \]
is an isometry: ϕ and ϕ^{-1} are continuous, and |ϕ(x)| = ‖x‖_E.
Since |ϕ(x)| = ‖x‖_E and K is bounded, it follows that ϕ(K) is bounded. Since ϕ^{-1} is continuous and K is closed, it follows that ϕ(K) is closed. To see this, take y_n ∈ ϕ(K), and suppose that y_n → y^*. Since ϕ^{-1} is continuous, it follows that
ϕ^{-1}(y_n) → ϕ^{-1}(y^*),
and since ϕ^{-1}(y_n) ∈ K and K is closed it follows that ϕ^{-1}(y^*) ∈ K. In particular, therefore, y^* = ϕ(ϕ^{-1}(y^*)) ∈ ϕ(K) and ϕ(K) must be closed.
So ϕ(K) is a closed bounded subset of R^n, and hence compact. ϕ^{-1} is continuous, so K = ϕ^{-1}(ϕ(K)) is the continuous image of a compact set, and hence a compact subset of (V, ‖·‖_E). Since ‖·‖ ∼ ‖·‖_E, it follows that K is a compact subset of (V, ‖·‖) as required.
4
Completeness
In the treatment of convergent sequences of real numbers, one natural ques-

tion is whether there is a way to characterise which sequences converge
without knowing their limit. The answer, of course, is yes, and is given by
the notion of a Cauchy sequence.
Theorem 4.1 A sequence of real numbers $\{x_n\}_{n=1}^\infty$ converges if and only if
it is a Cauchy sequence, i.e. given any $\epsilon > 0$ there exists an $N$ such that
\[ |x_n - x_m| < \epsilon \quad \text{for all } n, m \ge N. \]
Note that the proof makes use of the Bolzano-Weierstrass Theorem, so

is in some way entangled with compactness properties of closed bounded
subsets of R.
A sequence in a normed space (X, k · k) is Cauchy if given any ǫ > 0 there

exists an N such that
kxn − xm k < ǫ for all n, m ≥ N.
Lemma 4.2 Any Cauchy sequence is bounded.
Proof There exists an N such that
kxn − xm k < 1 for all n, m ≥ N.
It follows that in particular kxn k ≤ kxN k + 1 for all n ≥ N , and hence kxn k
is bounded.
Definition 4.3 A normed space $X$ is complete if every Cauchy sequence in
$X$ converges to some $x \in X$. A complete normed space is called a Banach
space.
Theorem 4.1 states that R with its standard norm is complete (‘R is a
Banach space’). It follows fairly straightforwardly that the same is true for
any finite-dimensional normed space.
Theorem 4.4 Every finite-dimensional normed space (V, k · k) (over R or

C) is complete.
Proof Choose a basis $E = (e_1, \ldots, e_n)$ of $V$, and define another norm $\|\cdot\|_E$
on $V$ by
\[ \Big\| \sum_{j=1}^n x_j e_j \Big\|_E = \Big( \sum_{j=1}^n |x_j|^2 \Big)^{1/2}. \]
Since all norms on V are equivalent (Theorem 3.11), a sequence {xk } that
is Cauchy in k · k is Cauchy in k · kE .
Writing $x^k = \sum_{j=1}^n x^k_j e_j$ it follows that given any $\epsilon > 0$ there exists an $N_\epsilon$
such that for $k, l \ge N_\epsilon$
\[ \|x^k - x^l\|_E^2 = \sum_{j=1}^n |x^k_j - x^l_j|^2 < \epsilon^2. \tag{4.1} \]
In particular $\{x^k_j\}_{k=1}^\infty$ is a Cauchy sequence of real (or complex) numbers for each fixed
$j = 1, \ldots, n$. It follows that for each $j = 1, \ldots, n$ we have $x^k_j \to x^*_j$ as $k \to \infty$ for some $x^*_j$.
Set $x^* = \sum_{j=1}^n x^*_j e_j$.
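Letting $l \to \infty$ in (4.1) shows that
\[ \|x^k - x^*\|_E^2 = \sum_{j=1}^n |x^k_j - x^*_j|^2 \le \epsilon^2 \quad \text{for all } k \ge N_\epsilon, \]
i.e. $x^k \to x^*$ with respect to $\|\cdot\|_E$. It follows that $x^k \to x^*$ with respect to $\|\cdot\|$, and clearly $x^* \in V$,
and so $V$ is complete.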
Note that in particular Rn is complete.
The completeness of ℓp is a little more delicate, but only in the final steps.
Proposition 4.5 (Completeness of ℓp ) The sequence space ℓp (K) (equipped

with its standard norm) is complete.
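Proof Suppose that $x^k = (x^k_1, x^k_2, \cdots)$ is a Cauchy sequence in $\ell^p(\mathbb{K})$. Then
for every $\epsilon > 0$ there exists an $N_\epsilon$ such that
\[ \|x^n - x^m\|_{\ell^p}^p = \sum_{j=1}^\infty |x^n_j - x^m_j|^p < \epsilon^p \quad \text{for all } n, m \ge N_\epsilon. \tag{4.2} \]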
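In particular $\{x^k_j\}_{k=1}^\infty$ is a Cauchy sequence in $\mathbb{K}$ for every fixed $j$. Since $\mathbb{K}$
is complete (recall $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$) it follows that for each $j \in \mathbb{N}$
\[ x^k_j \to a_j \quad \text{as } k \to \infty \]
for some $a_j \in \mathbb{K}$.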
Set $a = (a_1, a_2, \cdots)$. We want to show that $a \in \ell^p$ and that $\|x^k - a\|_{\ell^p} \to 0$
as $k \to \infty$. First, since $\{x^k\}$ is Cauchy we have from (4.2) that $\|x^n - x^m\|_{\ell^p} < \epsilon$
for all $n, m \ge N_\epsilon$, and so in particular for any $N \in \mathbb{N}$
\[ \sum_{j=1}^N |x^n_j - x^m_j|^p \le \sum_{j=1}^\infty |x^n_j - x^m_j|^p \le \epsilon^p. \]
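Letting $m \to \infty$ we obtain
\[ \sum_{j=1}^N |x^n_j - a_j|^p \le \epsilon^p, \]
and since this holds for all $N$ it follows that
\[ \sum_{j=1}^\infty |x^n_j - a_j|^p \le \epsilon^p, \]
and so $x^n - a \in \ell^p$ for every $n \ge N_\epsilon$. But since $\ell^p$ is a vector space and $x^n \in \ell^p$, this implies
that $a \in \ell^p$ and $\|x^n - a\|_{\ell^p} \le \epsilon$ for all $n \ge N_\epsilon$, i.e. $x^n \to a$ in $\ell^p$.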
Since the norm on ℓ2 is the natural generalisation of the norm on Rn ,

and since it is complete, it is tempting to think that ℓ2 will behave just like
Rn . However, it does not have the ‘Bolzano-Weierstrass property’ (bounded
sequences have a convergent subsequence) as we can see easily by considering
the sequence $\{e_j\}_{j=1}^\infty$, where $e_j$ consists entirely of zeros apart from a 1 in
the $j$th position. Then clearly $\|e_j\|_{\ell^2} = 1$ for all $j$; but if $i \neq j$ then
\[ \|e_i - e_j\|_{\ell^2}^2 = 2, \]
i.e. any two elements of the sequence are always $\sqrt{2}$ away from each other.
It follows that no subsequence of the {ej } can form a Cauchy sequence, and
so there cannot be a convergent subsequence.
This is really the first time we have seen a significant difference between Rn
and the abstract normed vector spaces that we have been considering. The
failure of the Bolzano-Weierstrass property is in fact a defining characteristic
of infinite-dimensional spaces.
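The computation behind this example is easy to check by machine. The sketch below works with the first few coordinates of each $e_j$ (the truncation length and the helper names are assumptions of the example; the discarded coordinates are all zero, so the distances are unaffected) and confirms that distinct unit vectors are exactly $\sqrt{2}$ apart.

```python
import math

def unit_vector(j, length):
    """First `length` coordinates of e_j (a 1 in position j, zeros elsewhere); j is 1-based."""
    return [1.0 if i == j else 0.0 for i in range(1, length + 1)]

def l2_distance(x, y):
    """Euclidean (l^2) distance between two finite coordinate lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

length = 10
vectors = [unit_vector(j, length) for j in range(1, length + 1)]

# Pairwise distances between distinct unit vectors e_i and e_j.
distances = [l2_distance(vectors[i], vectors[j])
             for i in range(length) for j in range(i + 1, length)]

print(min(distances), max(distances), math.sqrt(2))
```

Because every pair is the same distance apart, no choice of subsequence can make the terms cluster, which is the obstruction described above.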
Theorem 4.6 C 0 ([0, 1]) equipped with the sup norm k · k∞ is complete.
Proof Let $\{f_k\}$ be a Cauchy sequence in $C^0([0, 1])$: so given any $\epsilon > 0$ there
exists an $N$ such that
\[ \sup_{x \in [0,1]} |f_n(x) - f_m(x)| < \epsilon \quad \text{for all } n, m \ge N. \tag{4.3} \]
In particular {fk (x)} is a Cauchy sequence for each fixed x, so fk (x) con-
verges for each fixed x ∈ [0, 1]: define
f (x) = lim fk (x).

k→∞
We need to show that in fact fk → f uniformly. But this follows since for
every x ∈ [0, 1] we have from (4.3)
|fn (x) − fm (x)| < ǫ for all n, m ≥ N,
where $N$ does not depend on $x$. Letting $m \to \infty$ in this expression we obtain
\[ |f_n(x) - f(x)| \le \epsilon \quad \text{for all } n \ge N, \]
where again $N$ does not depend on $x$. It follows that
\[ \sup_{x \in [0,1]} |f_n(x) - f(x)| \le \epsilon \quad \text{for all } n \ge N, \]
i.e. fn converges uniformly to f on [0, 1]. Completeness of C 0 ([0, 1]) then

follows from the fact that the uniform limit of a sequence of continuous
functions is still continuous.
For this reason the supremum norm is the ‘standard norm’ on C 0 ([0, 1]);
if no norm is mentioned this is the norm that is intended.
Example 4.7 C 0 ([0, 1]) equipped with the L1 norm is not complete.
[Fig. 4.1. Definition of $f_k$; the horizontal axis is marked at $0$, $\tfrac12 - \tfrac1k$, $\tfrac12$ and $1$.]
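Consider the sequence of functions $\{f_k\}$ defined by
\[ f_k(x) = \begin{cases} 0 & 0 \le x \le \tfrac12 - \tfrac1k \\ k\big(x - (\tfrac12 - \tfrac1k)\big) & \tfrac12 - \tfrac1k < x < \tfrac12 \\ 1 & \tfrac12 \le x \le 1, \end{cases} \]
see Figure 4.1.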
This sequence is Cauchy in the $L^1$ norm, since for $n, m \ge N$ we have
\[ f_n(x) = f_m(x) \quad \text{for all } x < \tfrac12 - \tfrac1N \text{ and all } x > \tfrac12, \]
and so
\[ \|f_n - f_m\|_{L^1} = \int_0^1 |f_n(x) - f_m(x)| \, dx = \int_{\frac12 - \frac1N}^{\frac12} |f_n(x) - f_m(x)| \, dx \le \frac1N, \tag{4.4} \]
since $|f_n(x) - f_m(x)| \le 1$ for all $x \in [0, 1]$ and all $n, m \in \mathbb{N}$.
But what is the limit, $f(x)$, as $n \to \infty$? Clearly one would expect $f$ to be
the pointwise limit,
\[ f(x) = \begin{cases} 0 & 0 \le x < \tfrac12 \\ 1 & \tfrac12 \le x \le 1, \end{cases} \]
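but this is certainly not continuous. It is not a priori obvious that $f_k \to f$
(in the $L^1$ norm), but this is easy to check:
\begin{align*}
\|f_k - f\|_{L^1} &= \int_0^1 |f_k(x) - f(x)| \, dx \\
&= \int_{\frac12 - \frac1k}^{\frac12} |k(x - \tfrac12 + \tfrac1k)| \, dx \\
&= \frac{1}{2k} \to 0 \quad \text{as } k \to \infty.
\end{align*}
So this sequence converges in the $L^1$ norm (to a limit that is not continuous) but not in the sup norm.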
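The two behaviours can be compared with exact rational arithmetic. In the sketch below (helper names and the choice of grid are assumptions of the example) the $L^1$ distance is computed by the midpoint rule on a grid of $2k$ cells; since the breakpoints $\tfrac12 - \tfrac1k$ and $\tfrac12$ lie on that grid and the integrand is linear on each cell, the rule is exact here. The sup-norm distance is probed by evaluating $f_k - f$ just to the left of $x = \tfrac12$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def f_k(k, x):
    """The continuous ramp function f_k from Example 4.7."""
    left = HALF - Fraction(1, k)
    if x <= left:
        return Fraction(0)
    if x < HALF:
        return k * (x - left)
    return Fraction(1)

def f(x):
    """The discontinuous pointwise limit."""
    return Fraction(0) if x < HALF else Fraction(1)

def l1_distance(k):
    """Midpoint rule for the integral of |f_k - f| over [0, 1] on a grid of 2k cells."""
    cells = 2 * k
    h = Fraction(1, cells)
    total = Fraction(0)
    for i in range(cells):
        mid = (i + HALF) * h
        total += abs(f_k(k, mid) - f(mid)) * h
    return total

for k in (2, 4, 8, 16):
    x_near = HALF - Fraction(1, 1000 * k)   # a point just left of 1/2
    print(k, l1_distance(k), f_k(k, x_near) - f(x_near))
```

The first column of differences shrinks like $1/(2k)$, while the pointwise gap near $\tfrac12$ stays close to 1 however large $k$ is, so there is no uniform convergence to $f$.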
4.1 The completion of a normed space
However, every normed space has a completion, i.e. a minimal set Ṽ such
that Ṽ ⊃ V and (Ṽ , k · k) is a Banach space. Essentially Ṽ consists of all
limit points of Cauchy sequences in V (and in particular, therefore, contains
a copy of V via the constant sequence vn = v ∈ V ).
This implies that for any v ∈ Ṽ there exist vn ∈ V such that vn → v in

the norm k · k; we say that V is dense in Ṽ .
Definition 4.8 Let $(V, \|\cdot\|)$ be a normed space. Then a subset $X \subseteq V$ is dense in $V$ if
for any $v \in V$ and any $\epsilon > 0$ there is an $x \in X$ such that
\[ \|x - v\| < \epsilon. \]
This is equivalent to the fact that given any v ∈ V there exists a sequence
xn ∈ X such that
kxn − vk → 0 as n → ∞.
This is the particularly useful form of ‘density’: if X is dense in V one can
often deduce properties of V by approximating them with elements of X.
Example 4.9 R is the completion of Q in the standard norm on R.
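To see concretely why $\mathbb{Q}$ itself is not complete, one can follow a sequence of rationals whose consecutive terms get closer and closer but whose limit in $\mathbb{R}$ is irrational. The sketch below uses the classical iteration $x_{n+1} = \tfrac12(x_n + 2/x_n)$ for $\sqrt2$; the choice of this particular sequence and the function name are assumptions made for illustration.

```python
from fractions import Fraction

def babylonian_sqrt2(steps):
    """Rational iterates x_{n+1} = (x_n + 2/x_n) / 2 starting from x_0 = 1."""
    x = Fraction(1)
    iterates = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        iterates.append(x)
    return iterates

xs = babylonian_sqrt2(6)

# Gaps between consecutive terms; they shrink rapidly.
gaps = [abs(b - a) for a, b in zip(xs, xs[1:])]

for x, gap in zip(xs[1:], gaps):
    # Every iterate is rational, but none of them squares to exactly 2.
    print(x, float(gap), x * x == 2)
```

Each term is an element of $\mathbb{Q}$, yet the only possible limit is $\sqrt2$, which lies in $\mathbb{R}$ but not in $\mathbb{Q}$.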
Exercise 4.10 Recall that we defined ℓf to be the set of all sequences in

which only a finite number of terms are non-zero. Show that ℓf is dense in
ℓ2 .
The description of the completion of (V, k · k) above is not strictly cor-

rect. Clearly it must be missing some subtleties, since we are ‘adding’ to V
elements that are not in V and hence, in the setting of a general abstract
normed space (V, k · kV ), are not defined.
A completion of (V, k · kV ) is a Banach space (X, k · kX ) that contains an

isometric image of (V, k · kV ) that is dense in X. One can show that there
is ‘only one’ completion in that any two candidates must be isometric:
Theorem 4.11 Let $(X, \|\cdot\|_X)$ be a normed space. Then there exists a
complete normed space $(\mathcal{X}, \|\cdot\|_{\mathcal{X}})$ and a linear map $i : (X, \|\cdot\|_X) \to (\mathcal{X}, \|\cdot\|_{\mathcal{X}})$
that is an isometry between $X$ and its image, such that $i(X)$
is a dense subspace of $\mathcal{X}$. Furthermore $\mathcal{X}$ is unique up to isometry; if
$(\mathcal{Y}, \|\cdot\|_{\mathcal{Y}})$ is a complete normed space and $j : (X, \|\cdot\|_X) \to (\mathcal{Y}, \|\cdot\|_{\mathcal{Y}})$ is
an isometry between $X$ and its image, such that $j(X)$ is a dense subspace
of $\mathcal{Y}$, then $\mathcal{X}$ and $\mathcal{Y}$ are isometric.
Proof We consider Cauchy sequences in $X$, writing
\[ x = (x_1, x_2, \ldots), \quad x_j \in X, \]
for a sequence in $X$. We say that two Cauchy sequences $x$ and $y$ are equivalent, $x \sim y$, if
\[ \lim_{n \to \infty} \|x_n - y_n\|_X = 0. \]
We let $\mathcal{X}$ be the space of equivalence classes of Cauchy sequences in $X$
(i.e. the set of Cauchy sequences in $X$ modulo $\sim$). It is clear that $\mathcal{X}$ is a vector space, since the sum of
two Cauchy sequences in $X$ is again a Cauchy sequence in $X$. We define a
candidate for our norm on $\mathcal{X}$: if $\eta \in \mathcal{X}$ then
\[ \|\eta\|_{\mathcal{X}} = \lim_{n \to \infty} \|x_n\|_X, \tag{4.5} \]
for any $x \in \eta$ (recall that $\eta$ is an equivalence class of Cauchy sequences).

Note that (i) if $y$ is a Cauchy sequence in $X$, then $\{\|y_n\|\}$ forms a Cauchy
sequence in $\mathbb{R}$, so for a particular choice of $y \in \eta$ the right-hand side of (4.5)
exists, and (ii) if $x, y \in \eta$ then
\begin{align*}
\Big| \lim_{n\to\infty} \|x_n\| - \lim_{n\to\infty} \|y_n\| \Big| &= \Big| \lim_{n\to\infty} \big( \|x_n\| - \|y_n\| \big) \Big| \\
&= \lim_{n\to\infty} \big| \|x_n\| - \|y_n\| \big| \\
&\le \lim_{n\to\infty} \|x_n - y_n\| = 0
\end{align*}
since $x \sim y$. So the map in (4.5) is well-defined, and it is easy to check that
it satisfies the three requirements of a norm.
Now we define a map $i : X \to \mathcal{X}$, by setting
\[ i(x) = [(x, x, x, x, x, x, \ldots)]. \]
Clearly $i$ is linear, and an isometry between $X$ and its image. We want to
show that $i(X)$ is a dense subset of $\mathcal{X}$.
For any given $\eta \in \mathcal{X}$, choose some $x \in \eta$. Since $x$ is Cauchy, for any given
$\epsilon > 0$ there exists an $N$ such that
\[ \|x_n - x_m\|_X < \epsilon \quad \text{for all } n, m \ge N. \]
In particular, $\|x_n - x_N\|_X < \epsilon$ for all $n \ge N$, and so
\[ \|\eta - i(x_N)\|_{\mathcal{X}} = \lim_{n\to\infty} \|x_n - x_N\|_X \le \epsilon, \]
which, since $\epsilon$ was arbitrary, shows that $i(X)$ is dense in $\mathcal{X}$.

Finally, we have to show that $\mathcal{X}$ is complete, i.e. that any Cauchy sequence
in $\mathcal{X}$ converges to another element of $\mathcal{X}$. A Cauchy sequence in $\mathcal{X}$
is a Cauchy sequence of equivalence classes of Cauchy sequences in $X$! Take
such a Cauchy sequence, $\{\eta^{(k)}\}_{k=1}^\infty$. For each $k$, find $x_k \in X$ such that
\[ \|i(x_k) - \eta^{(k)}\|_{\mathcal{X}} < 1/k, \]
using the density of $i(X)$ in $\mathcal{X}$. Now let
\[ x = (x_1, x_2, x_3, \ldots). \]
We will show (i) that $x$ is a Cauchy sequence, and so $[x] \in \mathcal{X}$, and (ii) that
$\eta^{(k)}$ converges to $[x]$. This will show that $\mathcal{X}$ is complete.
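(i) To show that $x$ is Cauchy, observe that
\begin{align*}
\|x_n - x_m\|_X &= \|i(x_n) - i(x_m)\|_{\mathcal{X}} \\
&= \|i(x_n) - \eta^{(n)} + \eta^{(n)} - \eta^{(m)} + \eta^{(m)} - i(x_m)\|_{\mathcal{X}} \\
&\le \|i(x_n) - \eta^{(n)}\|_{\mathcal{X}} + \|\eta^{(n)} - \eta^{(m)}\|_{\mathcal{X}} + \|\eta^{(m)} - i(x_m)\|_{\mathcal{X}} \\
&\le \frac1n + \|\eta^{(n)} - \eta^{(m)}\|_{\mathcal{X}} + \frac1m.
\end{align*}
So now given $\epsilon > 0$, choose $N$ such that $\|\eta^{(n)} - \eta^{(m)}\|_{\mathcal{X}} < \epsilon/3$ for $n, m \ge N$.
If $N' = \max(N, 3/\epsilon)$, it follows that
\[ \|x_n - x_m\|_X < \epsilon \quad \text{for all } n, m \ge N', \]
i.e. $x$ is Cauchy. So $[x] \in \mathcal{X}$.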

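(ii) To show that $\eta^{(k)} \to [x]$, simply observe that
\[ \|[x] - \eta^{(k)}\|_{\mathcal{X}} \le \|[x] - i(x_k)\|_{\mathcal{X}} + \|i(x_k) - \eta^{(k)}\|_{\mathcal{X}}. \]
Given $\epsilon > 0$, choose $N$ large enough that $\|x_n - x_m\|_X < \epsilon/2$ for all $n, m \ge N$,
and then set $N' = \max(N, 2/\epsilon)$. It follows that for $k \ge N'$,
\[ \|[x] - i(x_k)\|_{\mathcal{X}} = \lim_{n\to\infty} \|x_n - x_k\|_X \le \epsilon/2 \]
and $\|i(x_k) - \eta^{(k)}\|_{\mathcal{X}} < 1/k \le \epsilon/2$, i.e.
\[ \|[x] - \eta^{(k)}\|_{\mathcal{X}} < \epsilon, \]
and so $\eta^{(k)} \to [x]$.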

We will not prove the uniqueness of $\mathcal{X}$ here.
The space $\mathcal{X}$ in the above theorem is a very abstract one, and we are
fortunate that in most situations there is a more concrete description of the
completion of ‘interesting’ normed spaces.
Definition 4.12 The space L1 (0, 1) is the completion of C 0 ([0, 1]) with re-
spect to the L1 norm.
Note that with this definition it is immediate that L1 (0, 1) is complete,

and that C 0 ([0, 1]) is dense in L1 (0, 1).
What is this space L1 (0, 1)? There are a number of possible answers:
• Heuristically, L1 (0, 1) consists of all functions that can arise as the limit
(with respect to the L1 norm) of sequences fn ∈ C 0 ([0, 1]).
However, how do we characterise these limits? Certainly L1 (0, 1) is larger

than C 0 ([0, 1]) (and larger than C 0 (0, 1)). We saw above that it contains
functions that are not continuous, and even functions whose values at individual
points (e.g. $x = \tfrac12$) are not defined.
• Formally, $L^1(0, 1)$ is isometrically isomorphic to the space of equivalence classes of
sequences in $C^0([0, 1])$ that are Cauchy in the $L^1$ norm, where $\{f_k\} \sim \{g_k\}$
if
\[ \int_0^1 |f_k(x) - g_k(x)| \, dx \to 0 \quad \text{as } k \to \infty. \]
This is hardly helpful.
• The space $L^1(0, 1)$ consists of all real-valued functions $f$ such that
\[ \int_0^1 |f(x)| \, dx \]
is finite, where the integral is understood in the sense of Lebesgue integration.
We say that $f = g$ in $L^1(0, 1)$ (the functions are essentially ‘the same’)
if
\[ \int_0^1 |f(x) - g(x)| \, dx = 0. \]
This is the most intrinsic definition, and in some ways the most ‘useful’.
But note that given this definition it is certainly not obvious that L1 (0, 1) is
complete, nor that C 0 ([0, 1]) is dense in L1 (0, 1). We will assume these prop-
erties in what follows, but at the risk of over-emphasis: if we use Definition
4.12 to define L1 these properties come for free. If we use the ‘useful’ defi-
nition above there is actually some work to do to check these (which would
be part of a proper development of the Lebesgue integral and corresponding
‘Lebesgue spaces’).
Although we cannot discuss the theory of Lebesgue integration in detail

here, we can give a quick overview of its fundamental features and give
a rigorous definition of the notion of ‘almost everywhere’. Essentially the
Lebesgue integral extends more elementary definitions of the integral in a
mathematically consistent way.
5
Lebesgue integration
We follow the presentation in Priestley (1997), and start the construction of

the Lebesgue integral by defining the integral of simple functions for which
there can be no argument as to the correct definition.
We define the measure (or length) |I| of an interval I = [a, b], (a, b), (a, b],
or [a, b) to be
|I| = b − a.
5.1 Integrals of functions in Lstep (R)
The class $L^{\rm step}(\mathbb{R})$ of step functions on $\mathbb{R}$ consists of all those functions $s(x)$
that are piecewise constant on a finite number of intervals, i.e.
\[ s(x) = \sum_{j=1}^n c_j \chi[I_j](x), \tag{5.1} \]
where $c_j \in \mathbb{R}$, each $I_j$ is an interval, and $\chi[A]$ denotes the characteristic
function of the set $A$,
\[ \chi[A](x) = \begin{cases} 1 & x \in A \\ 0 & x \notin A. \end{cases} \]
We define the integral of $s(x)$ by
\[ \int s = \sum_{j=1}^n c_j |I_j|. \tag{5.2} \]
Even though this definition appears entirely reasonable, note that we have
not specified the nature of the intervals $I_j$, so the functions
\[ \chi_{(0,1)} \quad \text{and} \quad \chi_{[0,1]} \]
have the same integral (which is 1); by extension we can change the value
of $s$ at a finite number of points and leave the value of $\int s$ unchanged.
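A step function and its integral are simple enough to model directly. In the sketch below a step function is a list of terms $(c_j, a_j, b_j)$, one per interval with endpoints $a_j \le b_j$; this representation, the function names, and the convention of treating each interval as half-open when evaluating are all assumptions of the example. It shows two different expressions of the same function giving the same value of the integral.

```python
from fractions import Fraction

def integral(step):
    """Integral (5.2): sum of c_j * |I_j| over the terms (c_j, a_j, b_j)."""
    return sum(c * (b - a) for c, a, b in step)

def evaluate(step, x):
    """Value at x, treating every interval as half-open [a_j, b_j) for definiteness."""
    return sum(c for c, a, b in step if a <= x < b)

F = Fraction

# Two expressions for the same function: 1 on [0, 2) and 3 on [2, 3).
s_overlapping = [(F(1), F(0), F(3)), (F(2), F(2), F(3))]   # chi[0,3) + 2 chi[2,3)
s_disjoint = [(F(1), F(0), F(2)), (F(3), F(2), F(3))]      # chi[0,2) + 3 chi[2,3)

print(integral(s_overlapping), integral(s_disjoint))
```

The integral only sees the coefficients and the interval lengths, which is why the choice of open or closed endpoints plays no role.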
Most of the results we state below about the integrals of functions in

Lstep (R) rely on the following two observations:
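(1) If $\phi \in L^{\rm step}(\mathbb{R})$ then $\phi$ can be written as
\[ \phi = \sum_{j=1}^n c_j \chi_{K_j} \tag{5.3} \]
where the intervals $K_j$ are disjoint.
(2) If $\phi, \psi \in L^{\rm step}(\mathbb{R})$ then one can write
\[ \phi = \sum_{j=1}^n c_j \chi_{I_j} \quad \text{and} \quad \psi = \sum_{j=1}^n d_j \chi_{I_j}, \]
i.e. where $\phi$ and $\psi$ are expressed using the same choice of intervals (in this
case some of the $c_j$ and $d_j$ may be zero).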
The following properties of step functions are easily proved:
(A) If φ ∈ Lstep (R) then |φ| ∈ Lstep (R)

(B) If φ ∈ Lstep (R) then φ+ = max(φ, 0) and φ− = min(φ, 0) are both in
Lstep (R).
It is tedious but fairly elementary to check that this integral is well-defined

on Lstep (R), so that if s(x) is given by two possible expressions (5.1) then
the integrals in (5.2) agree: to do this one uses the disjoint form in (5.3).
It is also relatively simple to check that this integral satisfies the following
three fundamental properties:
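(L) Linearity: if $\phi, \psi \in L^{\rm step}(\mathbb{R})$ and $\lambda \in \mathbb{R}$ then $\phi + \lambda\psi \in L^{\rm step}(\mathbb{R})$ and
\[ \int (\phi + \lambda\psi) = \int \phi + \lambda \int \psi. \]
Proof Use (2) above. Then
\[ \phi + \lambda\psi = \sum_{j=1}^n (c_j + \lambda d_j) \chi_{I_j}, \]
and so
\[ \int (\phi + \lambda\psi) = \sum_{j=1}^n (c_j + \lambda d_j)|I_j| = \sum_{j=1}^n c_j |I_j| + \lambda \sum_{j=1}^n d_j |I_j| = \int \phi + \lambda \int \psi. \]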
(P) Positivity: If $\phi \in L^{\rm step}(\mathbb{R})$ with $\phi \ge 0$ then $\int \phi \ge 0$.
Proof Use (1). Then
\[ \phi = \sum_{j=1}^n c_j \chi_{K_j} \]
is non-negative iff $c_j \ge 0$ for each $j$. It is immediate that $\int \phi \ge 0$.
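(M) Modulus property: If $\phi \in L^{\rm step}(\mathbb{R})$ then $|\phi| \in L^{\rm step}(\mathbb{R})$ and $\big| \int \phi \big| \le \int |\phi|$.
Proof Clearly $|\phi| \pm \phi \ge 0$, and so, since $\phi \in L^{\rm step}(\mathbb{R})$ implies (see property (A) above) that
$|\phi| \in L^{\rm step}(\mathbb{R})$, we can use properties (L) and (P) to give
\[ \int \big( |\phi| \pm \phi \big) = \int |\phi| \pm \int \phi \ge 0, \]
whence
\[ \mp \int \phi \le \int |\phi|, \]
as required.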
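(T) Translation invariance: Take $\phi \in L^{\rm step}(\mathbb{R})$. For $h \in \mathbb{R}$ define $\phi_h(x) = \phi(h + x)$.
Then $\phi_h \in L^{\rm step}(\mathbb{R})$ and $\int \phi_h = \int \phi$.
Proof Clearly if
\[ \phi = \sum_{j=1}^n c_j \chi_{I_j} \]
where $I_j = \langle a_j, b_j \rangle$, then $\phi_h$ is given by
\[ \phi_h = \sum_{j=1}^n c_j \chi_{\hat I_j}, \]
where $\hat I_j = \langle a_j - h, b_j - h \rangle$. It is clear that $|\hat I_j| = |I_j|$, and so $\int \phi_h = \int \phi$.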
Note that combining positivity and linearity gives the comparison result
which will be critical in what follows: if $\phi, \psi \in L^{\rm step}(\mathbb{R})$ and $\phi \ge \psi$ then
$\int \phi \ge \int \psi$.
5.2 Increasing sequences of functions in Lstep (R): Linc (R)
Now, if $s_n(x)$ is a monotonically increasing sequence of functions in $L^{\rm step}(\mathbb{R})$
($s_{n+1}(x) \ge s_n(x)$ for each $x \in \mathbb{R}$), then it follows from the comparison
property above that the sequence
\[ \int s_n \tag{5.4} \]
is also monotonically increasing. Provided that the integrals in (5.4) are
uniformly bounded in $n$,
\[ \lim_{n\to\infty} \int s_n \]
exists.
We would like to deduce that the functions $s_n(x)$ must converge to some
limit, but we do not know that $s_n(x)$ is bounded, only that the integral
$\int s_n$ is bounded. Nevertheless, we can show that $s_n(x)$ converges ‘almost
everywhere’, an idea that we now introduce.
First, we will say that a set A ⊂ R has “measure zero” (‘zero length’) if,
given any ǫ > 0, one can find a (possibly countably infinite) set of intervals
[aj , bj ] that cover A but whose total length is less than ǫ:
\[ A \subset \bigcup_{j=1}^\infty [a_j, b_j] \quad \text{and} \quad \sum_{j=1}^\infty (b_j - a_j) < \epsilon. \]
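The definition can be tried out on a countable set by covering the $j$th point with an interval of length $\epsilon 2^{-(j+1)}$, so that the total length stays below $\epsilon$ however many points are covered. The sketch below does this for finitely many rationals in $[0, 1]$; the enumeration used, the particular interval lengths, and the function names are assumptions of the example.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_denominator):
    """Distinct rationals p/q in [0, 1] with q <= max_denominator, in a fixed order."""
    seen = []
    for q in range(1, max_denominator + 1):
        for p in range(0, q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.append(r)
    return seen

def cover(points, eps):
    """Closed intervals; the j-th (1-based) is centred at the j-th point with length eps / 2^(j+1)."""
    intervals = []
    for j, x in enumerate(points, start=1):
        half = eps / 2 ** (j + 2)
        intervals.append((x - half, x + half))
    return intervals

eps = Fraction(1, 100)
points = rationals_in_unit_interval(12)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)

print(len(points), float(total_length), total_length < eps)
```

The total length is a partial sum of a geometric series with sum $\epsilon/2$, so the bound does not depend on how many points are listed.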
Exercise 5.1 Show that if $A_j$ has measure zero for all $j = 1, 2, \ldots$ then
\[ \bigcup_{j=1}^\infty A_j \]
also has measure zero. [Hint: $\sum_{j=n+1}^\infty 2^{-j} = 2^{-n}$.]
A property is said to hold ‘almost everywhere’ or ‘for almost every x’ if

the set of points at which it does not hold has measure zero.
Exercise 5.2 Show that if each property Pj , j = 1, 2, . . ., holds almost

everywhere in an interval I then all the Pj hold simultaneously at almost
every point in I.
We will now show that each monotonic sequence $s_n(x)$ with (5.4)
uniformly bounded tends pointwise to a function $f(x)$ almost everywhere,
i.e. except on a set of measure zero.
Theorem 5.3 Let $\{\phi_n\}$ be an increasing sequence of step functions ($\phi_{n+1}(x) \ge \phi_n(x)$)
such that $\int \phi_n \le K$ for all $n$. Then $\phi_n(x)$ converges for almost every $x$.
Proof First, replace {φn } by φ̃n := φn − φ1 . Then φ̃n satisfies the conditions
of the theorem with K replaced by some K ′ , but now φ̃n ≥ 0.
If we can show that φ̃n (x) converges for almost every x then the same
clearly follows for φn (x) = φ̃n (x) + φ1 (x).
We want to show that
E = {x : φ̃n (x) → ∞}
has measure zero. Note that, since φn (x) is non-decreasing for each x, this
is precisely the set of points where φn (x) does not converge.
Fix $m > 0$, and set
\[ E_n = \{ x : \tilde\phi_n(x) \ge K'm \}. \]
Then $E_n$ can be covered by a finite collection $I_n$ of disjoint intervals of
total length $\le 1/m$: indeed, using (1) and writing $\tilde\phi_n = \sum_j c_j \chi_{K_j}$ with $K_j$
disjoint, let $I_n$ be the collection of those $K_j$ with corresponding $c_j \ge K'm$.
Then
\[ K' \ge \int \tilde\phi_n = \sum_j c_j |K_j| \ge \sum_{i \in I_n} c_i |K_i| \ge K'm \sum_{i \in I_n} |K_i|, \]
and so
\[ \sum_{i \in I_n} |K_i| \le \frac1m. \]
Now, note that En+1 ⊇ En (since φ̃n+1 ≥ φ̃n ); by splitting up the intervals
that occur in In+1 if need be one can ensure that In+1 ⊇ In .
One can therefore list the intervals occurring in the In , {J1 , J2 , J3 , . . .} –

first list those in I1 , then the additional intervals in I2 , and so on.
Now, suppose that $x \in E$. Then for $n$ sufficiently large, $x \in E_n$. So
\[ E \subseteq \bigcup_i E_i \subseteq \bigcup_{k=1}^\infty J_k. \tag{5.5} \]
Now, for any fixed $N$, $\bigcup_{i=1}^N J_i \subseteq I_r$ for some $r$, and so $\sum_{i=1}^N |J_i| \le 1/m$.
Since this does not depend on $N$,
\[ \sum_{i=1}^\infty |J_i| \le 1/m. \tag{5.6} \]
Since m is arbitrary, it follows from (5.5) and (5.6) that E has zero
measure.
We denote the set of all functions that can be arrived at as the almost-everywhere
limit of an increasing sequence of step functions as $L^{\rm inc}(\mathbb{R})$. For
such a function ($f = \lim_{n\to\infty} s_n$), we define
\[ \int f = \lim_{n\to\infty} \int s_n. \]
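As a small worked instance of this definition, take $f(x) = x$ on $[0, 1)$ (and zero elsewhere) and approximate it from below by step functions that round $x$ down to a multiple of $2^{-n}$. The sketch below builds these step functions and computes their integrals exactly; the choice of $f$, the dyadic construction, and the function names are assumptions made for the example.

```python
from fractions import Fraction

def dyadic_step(n):
    """Terms (c, a, b) of s_n, equal to i/2^n on [i/2^n, (i+1)/2^n) for i = 0, ..., 2^n - 1."""
    N = 2 ** n
    return [(Fraction(i, N), Fraction(i, N), Fraction(i + 1, N)) for i in range(N)]

def integral(step):
    """Integral of a step function: sum of c_j * |I_j|."""
    return sum(c * (b - a) for c, a, b in step)

def evaluate(step, x):
    """Value at x, with each interval read as half-open [a, b)."""
    return sum(c for c, a, b in step if a <= x < b)

integrals = [integral(dyadic_step(n)) for n in range(1, 9)]
for n, value in enumerate(integrals, start=1):
    print(n, value)
```

The integrals increase with $n$ and stay below $\tfrac12$, in line with the value one expects for the integral of $x$ over $[0, 1)$.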
Again, we have to check that this definition does not depend on exactly
which sequence {sn } we have chosen.
The proof uses the following technical lemma whose proof, which appeals
to the Heine-Borel Theorem (compactness of [0, 1]), we omit.
Lemma 5.4 Let $\theta_n$ be a decreasing sequence of non-negative step functions
such that $\theta_n \to 0$ almost everywhere. Then $\int \theta_n \to 0$ as $n \to \infty$.
Lemma 5.5 Let $f, g \in L^{\rm inc}(\mathbb{R})$ with $f \ge g$ almost everywhere. If $\phi_n$ and $\psi_n$
are increasing sequences in $L^{\rm step}(\mathbb{R})$ tending to $f, g$, respectively, then
\[ \lim_{n\to\infty} \int \phi_n \ge \lim_{n\to\infty} \int \psi_n. \]
Proof Since $f \ge g$ almost everywhere, there exists a set $E_\ge$ of zero measure
such that $f(x) \ge g(x)$ for all $x \notin E_\ge$. Also $\phi_n(x) \uparrow f(x)$ for $x \notin E_\phi$ ($E_\phi$
with zero measure) and $\psi_n(x) \uparrow g(x)$ for $x \notin E_\psi$ ($E_\psi$ with zero measure).
Let $E = E_\ge \cup E_\phi \cup E_\psi$; then $E$ has zero measure.
For each fixed $k$, the sequence $\psi_k - \phi_n$ is decreasing in $n$, with limit $\psi_k - f$
for $x \notin E$. For $x \notin E$, $\psi_k \le g$, and so
\[ \psi_k - f \le g - f \le 0. \]
Thus $\psi_k - \phi_n$ converges to a non-positive limit for $x \notin E$, and so the sequence
$(\psi_k - \phi_n)^+$ converges to zero almost everywhere. It follows from Lemma 5.4
that for each fixed $k$,
\[ \int (\psi_k - \phi_n)^+ \to 0 \]
as $n \to \infty$. Since
\[ \int \psi_k - \int \phi_n = \int (\psi_k - \phi_n) \le \int (\psi_k - \phi_n)^+, \]
letting $n \to \infty$ it follows that
\[ \int \psi_k \le \lim_{n\to\infty} \int \phi_n, \]
and now letting $k \to \infty$
\[ \lim_{k\to\infty} \int \psi_k \le \lim_{n\to\infty} \int \phi_n. \]
It follows from this lemma that $\int f$ is well-defined for $f \in L^{\rm inc}(\mathbb{R})$, since
one can deduce that if $f = \lim \phi_n = \lim \psi_n$ then $\lim \int \phi_n = \lim \int \psi_n$.
We can prove the following properties for the integral of functions in

Linc (R):
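Proposition 5.6
(i) If $f, g \in L^{\rm inc}(\mathbb{R})$ and $f \ge g$ then $\int f \ge \int g$.
(ii) If $f, g \in L^{\rm inc}(\mathbb{R})$ then $f + g \in L^{\rm inc}(\mathbb{R})$ and $\int (f + g) = \int f + \int g$.
(iii) If $f \in L^{\rm inc}(\mathbb{R})$ and $\lambda \ge 0$ then $\int \lambda f = \lambda \int f$.
(iv) If $f, g \in L^{\rm inc}(\mathbb{R})$ then $\max(f, g), \min(f, g) \in L^{\rm inc}(\mathbb{R})$; in particular,
$f^+ = \max(f, 0) \in L^{\rm inc}(\mathbb{R})$.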
The proofs are obvious.

5.3 The space L1 (R) of integrable functions
Finally, we define the space of integrable functions on $\mathbb{R}$, written $L^1(\mathbb{R})$, to
be all functions of the form $f(x) = g(x) - h(x)$ with $g, h \in L^{\rm inc}(\mathbb{R})$, and set
\[ \int f = \int g - \int h. \]
This definition is consistent: if $f = g_1 - h_1 = g_2 - h_2$ then $g_1 + h_2 = g_2 + h_1$,
and so by the addition property for functions in $L^{\rm inc}(\mathbb{R})$,
\[ \int g_1 + \int h_2 = \int (g_1 + h_2) = \int (g_2 + h_1) = \int g_2 + \int h_1, \]
whence
\[ \int g_1 - \int h_1 = \int g_2 - \int h_2. \]
We can now show that the integral of L1 functions has the same properties
as the integral on Lstep (R):
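(L) If $f_1, f_2 \in L^1(\mathbb{R})$ and $\lambda \in \mathbb{R}$ then $f_1 + \lambda f_2 \in L^1(\mathbb{R})$ and
\[ \int (f_1 + \lambda f_2) = \int f_1 + \lambda \int f_2. \]
Proof With $f_i = g_i - h_i$, $g_i, h_i \in L^{\rm inc}(\mathbb{R})$, consider first $\lambda \ge 0$. Then
\[ f_1 + \lambda f_2 = (g_1 + \lambda g_2) - (h_1 + \lambda h_2) \]
where both the bracketed terms are in $L^{\rm inc}(\mathbb{R})$. So $f_1 + \lambda f_2 \in L^1(\mathbb{R})$, and
\begin{align*}
\int (f_1 + \lambda f_2) &= \int (g_1 + \lambda g_2) - \int (h_1 + \lambda h_2) \\
&= \Big( \int g_1 - \int h_1 \Big) + \lambda \Big( \int g_2 - \int h_2 \Big) = \int f_1 + \lambda \int f_2.
\end{align*}
Finally if $f \in L^1(\mathbb{R})$ with $f = g - h$ ($g, h \in L^{\rm inc}(\mathbb{R})$), then $-f = h - g$.
So $-f \in L^1(\mathbb{R})$, and clearly $\int (-f) = -\int f$; the case $\lambda < 0$ then follows by applying the above to $-f_2$ and $-\lambda$.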
(P) If $f \in L^1(\mathbb{R})$ and $f \ge 0$ a.e. then $\int f \ge 0$.
Proof If $f = g - h$, $g, h \in L^{\rm inc}(\mathbb{R})$, then $f \ge 0$ a.e. implies that $g \ge h$ a.e.
Since integration respects order for functions in $L^{\rm inc}(\mathbb{R})$, $\int g \ge \int h$, and
therefore
\[ \int f = \int g - \int h \ge 0. \]
Note that it is a consequence of this property that if $f = 0$ a.e. then
$\int f = 0$.
(M) If $f \in L^1(\mathbb{R})$ then $|f| \in L^1(\mathbb{R})$ and
\[ \Big| \int f \Big| \le \int |f|. \]
Proof Write $f = g - h$ with $g, h \in L^{\rm inc}(\mathbb{R})$. One can easily check that
\[ |f| = \max(g, h) - \min(g, h), \]
and so $|f| \in L^1(\mathbb{R})$ using Proposition 5.6. We need to show that
\[ \int g - \int h = \int f \le \int |f| = \int \max(g, h) - \int \min(g, h) \]
and
\[ \int h - \int g = -\int f \le \int |f| = \int \max(g, h) - \int \min(g, h). \]
By symmetry we need only check one of these, and for this it is sufficient
to show that
\[ g(x) + \min(g(x), h(x)) \le h(x) + \max(g(x), h(x)). \]
If $g < h$ then the LHS is $2g(x)$ and the RHS is $2h(x)$, so the inequality
holds; while if $g \ge h$ the LHS is $g + h$ and the RHS is $h + g$, so the
inequality holds once more.
This result will imply that there are some functions that can be integrated
but nevertheless are not in $L^1(\mathbb{R})$; see the Examples Sheet.
(T) If $f \in L^1(\mathbb{R})$ and $f_d(x) = f(x + d)$ then $f_d \in L^1(\mathbb{R})$ and $\int f_d = \int f$.
Proof Follows immediately from the same property for functions in $L^{\rm inc}(\mathbb{R})$.
We have now defined the space $L^1(\mathbb{R})$ of integrable functions, and shown
that the integral as we have defined it has the four properties we started with
for the integral on $L^{\rm step}(\mathbb{R})$.
If we want to talk about integrals on intervals, we say that f ∈ L1 (a, b) –

‘f is integrable on (a, b)’ – if f χ(a,b) ∈ L1 (R).
There are three fundamental theorems for the Lebesgue integral. The first
is the Monotone Convergence Theorem, which looks like the construction
of the Lebesgue integral, but with a monotone sequence of step functions
replaced by a monotone sequence of integrable functions.
Theorem 5.7 (Monotone Convergence Theorem) Suppose that $f_n \in L^1(\mathbb{R})$,
$f_n(x) \le f_{n+1}(x)$ almost everywhere, and $\int f_n \le K$ for some $K$
independent of $n$. Then there exists a $g \in L^1(\mathbb{R})$ such that $f_n \to g$ almost
everywhere, and
\[ \int g = \lim_{n\to\infty} \int f_n. \]
Corollary 5.8 Suppose that $f \in L^1(\mathbb{R})$ and $\int |f| = 0$. Then $f = 0$ almost
everywhere.
Proof Exercise.
Theorem 5.9 (Dominated Convergence Theorem) Suppose that fn ∈

L1 (R) and that fn → f almost everywhere. If there exists a function g ∈
L1 (R) such that |fn (x)| ≤ g(x) for almost every x, and for every n, then
f ∈ L1 (R) and
\[ \int f = \lim_{n\to\infty} \int f_n. \]
To define an integral of a function of two variables one would naturally

proceed by analogy with the construction above: take ‘step functions’ that
are constant on rectangles, construct an integral on $L^{\rm inc}(\mathbb{R}^2)$ by taking limits
of monotonic sequences, and then construct $L^1(\mathbb{R}^2)$ as the limits of differences.
But this does not relate ‘double integrals’ to single integrals. This
is achieved by the Fubini and Tonelli theorems. We give a less-than-rigorous
formulation:
Theorem 5.10 (Fubini-Tonelli) If $f : \mathbb{R}^2 \to \mathbb{R}$ is such that either
\[ \int \int |f(x, y)| \, dx \, dy < \infty \quad \text{or} \quad \int \int |f(x, y)| \, dy \, dx < \infty \]
then $f \in L^1(\mathbb{R}^2)$ and
\[ \int \int f(x, y) \, dx \, dy = \int \int f(x, y) \, dy \, dx. \]
(The less-than-rigorous nature is that the conditions require integrability
properties of $|f(x, y)|$: first that for almost every $y \in \mathbb{R}$, $|f(\cdot, y)| \in L^1(\mathbb{R})$,
and then that the resulting function $g(y) = \int |f(x, y)| \, dx$ is again in $L^1(\mathbb{R})$.)
5.4 The Lebesgue spaces Lp
We denote by $L^p(\mathbb{R})$ [$L^p(I)$] the space of functions that are $p$th power integrable
[on an interval $I$]. It is natural to try to use the $L^p$ norm
\[ \|f\|_{L^p} = \Big( \int |f(x)|^p \Big)^{1/p} \]
on $L^p(I)$. However, although this is a norm on $\tilde L^p(I)$ [recall this was continuous
functions with finite $L^p$ norm], it in fact fails to satisfy property (i)
in the definition of a norm: if $\|f\|_{L^p} = 0$ then we only have $f = 0$ almost
everywhere.
So strictly speaking if we want to consider Lp as a normed space we in

fact have to consider Lp / ∼, where f ∼ g if f = g almost everywhere. This
identification is usually done tacitly, rather than by being explicit about the
quotienting operation.
We have now gone almost full circle. We started developing these Lp

spaces because C 0 is not complete in the L1 norm. Using the abstract
definition of completion, we defined L1 as the completion of C 0 in the L1
norm. But now we have defined a space, L1 (R), which a priori has nothing
to do with C 0 and its completion.
In fact one can show that C 0 is dense in L1 fairly easily (see Exercises).
We now show that L1 as defined is complete (and so is the completion of
C 0 that we were after).
We start with a general lemma of independent interest.

Lemma 5.11 Suppose that $(X, \|\cdot\|)$ is a normed space in which
\[ \sum_{j=1}^\infty \|y_j\| < \infty \]
implies that $\lim_{n\to\infty} \sum_{j=1}^n y_j$ exists. Then $(X, \|\cdot\|)$ is complete.
Theorem 5.12 The Lebesgue space L1 (R) is complete.
In fact we prove the following:
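Proposition 5.13 Let $f_k \in L^1(\mathbb{R})$. Then
(i) If $\sum_{k=1}^\infty \|f_k\|_{L^1} < \infty$ then $\sum_{k=1}^\infty |f_k|$ converges to an element of
$L^1(\mathbb{R})$.
(ii) If $\sum_{k=1}^\infty |f_k| \in L^1(\mathbb{R})$ then $\sum_{k=1}^\infty f_k \in L^1(\mathbb{R})$.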
Proof
(i) Suppose that $f_k \in L^1(\mathbb{R})$ and
\[ K = \sum_{k=1}^\infty \int |f_k| < \infty. \]
We apply the MCT to $g_n = \sum_{k=1}^n |f_k|$, since $g_{n+1} \ge g_n$ and $\int g_n \le K$
for every $n$.
(ii) We now apply the DCT to $h_n = \sum_{k=1}^n f_k$. Each $h_n \in L^1(\mathbb{R})$, and by
the triangle inequality
\[ |h_n| \le \sum_{k=1}^n |f_k| \le \sum_{k=1}^\infty |f_k|, \]
and by assumption the right-hand side is in $L^1(\mathbb{R})$. Since $\sum_k |f_k|$
converges almost everywhere, so does $\sum_k f_k$ (if $\{a_k\} \subset \mathbb{R}$ and $\sum |a_k|$
converges then so does $\sum a_k$), and the DCT implies that $\sum_{k=1}^\infty f_k \in L^1(\mathbb{R})$.
For completeness of L2 (which follows similar lines) see the Examples

sheet.
6
Inner product spaces
If x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) are two elements of Rn then we

define their dot product as
x · y = x1 y1 + · · · + xn yn . (6.1)
This is one concrete example of an inner product on a vector space:
Definition 6.1 An inner product (·, ·) on a vector space V is a map (·, ·) :

V × V → K such that for all x, y, z ∈ V and for all α ∈ K,
(i) (x, x) ≥ 0 with equality iff x = 0,
(ii) (x + y, z) = (x, z) + (y, z),
(iii) (αx, y) = α(x, y), and
(iv) $(x, y) = \overline{(y, x)}$.
Note that
• in a real vector space the complex conjugate in (iv) is unnecessary;
• in the complex case the restriction that $(y, x) = \overline{(x, y)}$ implies in particular
that $(x, x) = \overline{(x, x)}$, i.e. that $(x, x)$ is real, and so the requirement that
$(x, x) \ge 0$ makes sense; and
• (iii) and (iv) imply that the inner product is conjugate linear in its second
element, i.e. $(x, \alpha y) = \bar\alpha (x, y)$.
A vector space equipped with an inner product is known as an inner

product space.
Example 6.2 In the space $\ell^2(\mathbb{K})$ of square summable sequences, for $x = (x_1, x_2, \ldots)$
and $y = (y_1, y_2, \ldots)$ one can define an inner product
\[ (x, y) = \sum_{j=1}^\infty x_j \bar y_j. \]
This is well-defined since $\sum_j |x_j \bar y_j| \le \tfrac12 \sum_j \big( |x_j|^2 + |y_j|^2 \big)$.
Example 6.3 The expression

\[ (f, g) = \int_a^b f(x) g(x) \, dx \]
defines an inner product on the space L2 (a, b).
6.1 Inner products and norms
Given an inner product we can define kvk by setting
kvk2 = (v, v). (6.2)
We will soon show that $\|\cdot\|$ defines a norm; we say that it is the norm
induced by the inner product (·, ·).
6.2 The Cauchy-Schwarz inequality
Lemma 6.4 (Cauchy-Schwarz inequality) Any inner product satisfies

the inequality
|(x, y)| ≤ kxkkyk for all x, y ∈ V, (6.3)
where k · k is defined in (6.2).
Proof If x = 0 or y = 0 then (6.3) is clear; so suppose that x 6= 0 and y 6= 0.

For any λ ∈ K we have
(x − λy, x − λy) = (x, x) − λ(y, x) − λ̄(x, y) + |λ|2 (y, y) ≥ 0.

Setting $\lambda = (x, y)/\|y\|^2$ we obtain
\begin{align*}
0 &\le \|x\|^2 - 2 \frac{|(x, y)|^2}{\|y\|^2} + \frac{|(x, y)|^2}{\|y\|^2} \\
&= \|x\|^2 - \frac{|(x, y)|^2}{\|y\|^2},
\end{align*}
which implies (6.3).
The Cauchy-Schwarz inequality allows us to show easily that the map

x 7→ kxk is a norm on V . Property (i) is clear, since kxk ≥ 0 and if
kxk2 = (x, x) = 0 then x = 0. Property (ii) is also clear, since
\[ \|\alpha x\|^2 = (\alpha x, \alpha x) = \alpha \bar\alpha (x, x) = |\alpha|^2 \|x\|^2. \]
Property (iii), the triangle inequality, follows from the Cauchy-Schwarz in-
equality (6.3), since
\begin{align*}
\|x + y\|^2 &= (x + y, x + y) \\
&= \|x\|^2 + (x, y) + (y, x) + \|y\|^2 \\
&\le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 \\
&= (\|x\| + \|y\|)^2,
\end{align*}
i.e. $\|x + y\| \le \|x\| + \|y\|$.
As an example of the Cauchy-Schwarz inequality, consider the standard

inner product on $\mathbb{R}^n$. As we would expect, the norm derived from this inner
product is just
\[ \|x\| = \Big( \sum_{j=1}^n |x_j|^2 \Big)^{1/2}. \]
The Cauchy-Schwarz inequality says that
\[ |(x, y)|^2 = \Big| \sum_{j=1}^n x_j y_j \Big|^2 \le \Big( \sum_{j=1}^n |x_j|^2 \Big) \Big( \sum_{j=1}^n |y_j|^2 \Big), \tag{6.4} \]
or just $|x \cdot y| \le |x||y|$.
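The inequality in $\mathbb{R}^n$ is easy to test numerically. The following sketch (dimension, sample size, seed and function names are all assumptions of the example) tracks the largest observed value of $|x \cdot y| / (|x||y|)$ over random pairs, and then looks at a pair of parallel vectors, where the two sides agree.

```python
import math
import random

def dot(x, y):
    """Standard dot product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm induced by the dot product."""
    return math.sqrt(dot(x, x))

rng = random.Random(1)  # fixed seed so the run is reproducible
worst_ratio = 0.0
for _ in range(1000):
    x = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    y = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    worst_ratio = max(worst_ratio, abs(dot(x, y)) / (norm(x) * norm(y)))

# Parallel vectors: the two sides of the inequality coincide.
x = [1.0, -2.0, 0.5, 4.0, 3.0]
y = [3.0 * t for t in x]
parallel_ratio = abs(dot(x, y)) / (norm(x) * norm(y))

print(worst_ratio, parallel_ratio)
```

The ratio never exceeds 1 (up to rounding), and it reaches 1 when one vector is a multiple of the other.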
Exercise 6.5 The norm on the sequence space $\ell^2$ derived from the inner
product $(x, y) = \sum_j x_j \bar y_j$ is
\[ \|x\|_{\ell^2} = \Big( \sum_{j=1}^\infty |x_j|^2 \Big)^{1/2}. \]
Obtain the Cauchy-Schwarz inequality for $\ell^2$ using (6.4) and a limiting argument
rather than Lemma 6.4.
The Cauchy-Schwarz inequality in $L^2(a, b)$ gives the very useful:
\[ \Big| \int_a^b f(x) g(x) \, dx \Big| \le \Big( \int_a^b |f(x)|^2 \, dx \Big)^{1/2} \Big( \int_a^b |g(x)|^2 \, dx \Big)^{1/2}, \]
for $f, g \in L^2(a, b)$. (This shows in particular that if $f, g \in L^2(a, b)$ then
$fg \in L^1(a, b)$.)
6.3 The relationship between inner products and their norms
Norms derived from inner products have one key property in addition to
(i)–(iii) of Definition 2.1:
Lemma 6.6 (Parallelogram law) Let V be an inner product space with
induced norm ‖·‖. Then
\[ \|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2) \quad \text{for all } x, y \in V. \tag{6.5} \]
Proof Simply expand the inner products:
\[ \begin{aligned}
\|x + y\|^2 + \|x - y\|^2 &= (x + y, x + y) + (x - y, x - y) \\
  &= \|x\|^2 + (y, x) + (x, y) + \|y\|^2 \\
  &\quad + \|x\|^2 - (y, x) - (x, y) + \|y\|^2 \\
  &= 2(\|x\|^2 + \|y\|^2).
\end{aligned} \]
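To see the parallelogram law separating norms that come from inner products from those that do not, here is a small Python sketch on R^2. It compares the Euclidean norm with the ℓ^1 norm |x_1| + |x_2|; the function names are illustrative, and the ℓ^1 norm on R^2 is used only as a simple finite-dimensional example.

```python
import math

def euclidean_norm(x):
    """Norm induced by the dot product on R^n."""
    return math.sqrt(sum(t * t for t in x))

def l1_norm(x):
    """The l^1 norm on R^n: sum of absolute values."""
    return sum(abs(t) for t in x)

def parallelogram_defect(norm, x, y):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2); zero when (6.5) holds for x, y."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * (norm(x) ** 2 + norm(y) ** 2)

x, y = [1.0, 0.0], [0.0, 1.0]
euclid_defect = parallelogram_defect(euclidean_norm, x, y)  # 2 + 2 - 2*(1 + 1)
l1_defect = parallelogram_defect(l1_norm, x, y)             # 4 + 4 - 2*(1 + 1)
```

The Euclidean defect vanishes up to rounding, while the ℓ^1 defect equals 4, so by Lemma 6.6 the ℓ^1 norm on R^2 cannot be induced by any inner product.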
Exercise 6.7 Show that there is no inner product on C^0([0, 1]) which induces
the sup or L^1 norms,
\[ \|f\|_\infty = \sup_{x \in [0,1]} |f(x)| \quad \text{or} \quad \|f\|_{L^1} = \int_0^1 |f(x)|\,dx. \]
Given a norm that is derived from an inner product, one can reconstruct
the inner product as follows:
Lemma 6.8 (Polarisation identity) Let V be an inner product space with
induced norm ‖·‖. Then if V is real
\[ 4(x, y) = \|x + y\|^2 - \|x - y\|^2, \tag{6.6} \]
while if V is complex
\[ 4(x, y) = \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2. \tag{6.7} \]
Proof Once again, rewrite the right-hand sides as inner products, multiply
out, and simplify.
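The complex identity (6.7) can be checked numerically. The Python sketch below recovers (x, y) on C^3 using only squared norms; `polarisation` and `combo` are hypothetical helper names, and the inner product is taken linear in the first argument, as in the notes.

```python
def inner(x, y):
    """Standard inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    """Squared induced norm ||x||^2 = (x, x)."""
    return inner(x, x).real

def polarisation(x, y):
    """Right-hand side of (6.7) divided by 4, computed from norms alone."""
    def combo(c):
        # squared norm of x + c*y
        return norm_sq([a + c * b for a, b in zip(x, y)])
    return (combo(1) - combo(-1) + 1j * combo(1j) - 1j * combo(-1j)) / 4

x = [1 + 2j, -1j, 0.5 + 0j]
y = [2 - 1j, 3 + 0j, 1j]
recovered = polarisation(x, y)
direct = inner(x, y)
```

Here `recovered` and `direct` should agree up to rounding error.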
If V is a real/complex vector space and ‖·‖ is a norm on V that
satisfies the parallelogram law then (6.6) or (6.7), respectively, defines an
inner product on V. In other words, the parallelogram law characterises those
norms that can be derived from inner products. (This argument is non-trivial.)
Lemma 6.9 If V is an inner product space with inner product (·, ·) and
derived norm ‖·‖, then x_n → x and y_n → y implies that
\[ (x_n, y_n) \to (x, y). \]
Proof Since x_n and y_n converge, ‖x_n‖ and ‖y_n‖ are bounded (the proof is
a simple exercise). Then
\[ |(x_n, y_n) - (x, y)| = |(x_n - x, y_n) + (x, y_n - y)|
   \le \|x_n - x\|\|y_n\| + \|x\|\|y_n - y\| \]
implies that (x_n, y_n) → (x, y).

This lemma is extremely useful: that we can swap limits and inner products
means that if $\sum_{j=1}^n x_j$ converges as n → ∞ (so that
$\sum_{j=1}^n x_j \to x = \sum_{j=1}^\infty x_j$) then
\[ \Big( \sum_{j=1}^\infty x_j, y \Big) = \sum_{j=1}^\infty (x_j, y), \]
i.e. we can swap inner products and sums.
Definition 6.10 A Hilbert space is a complete inner product space.
Examples: R^n with inner product and norm
\[ \big( (x_1, \ldots, x_n), (y_1, \ldots, y_n) \big) = \sum_{j=1}^n x_j y_j, \qquad
   \|(x_1, \ldots, x_n)\| = \Big( \sum_{j=1}^n |x_j|^2 \Big)^{1/2}; \]
C^n with inner product and norm
\[ \big( (w_1, \ldots, w_n), (z_1, \ldots, z_n) \big) = \sum_{j=1}^n w_j \bar z_j, \qquad
   \|(w_1, \ldots, w_n)\| = \Big( \sum_{j=1}^n |w_j|^2 \Big)^{1/2}; \]
ℓ^2(K) with inner product and norm
\[ (x, y) = \sum_{j=1}^\infty x_j \bar y_j, \qquad
   \|x\| = \Big( \sum_{j=1}^\infty |x_j|^2 \Big)^{1/2} \]
(the complex conjugate is redundant if K = R); and L^2(I) with inner product
and norm
\[ (f, g) = \int_I f(x)\,\overline{g(x)}\,dx, \qquad
   \|f\|_{L^2} = \Big( \int_I |f(x)|^2\,dx \Big)^{1/2}. \]
From now on we will assume, unless explicitly stated otherwise, that all the
above spaces are equipped with their standard inner product (and corresponding
norm).
7
Orthonormal bases in Hilbert spaces
From now on we will denote by H an arbitrary Hilbert space, with inner
product (·, ·) and norm ‖·‖; we take K = C, since the case K = R is
simplified only by removing the complex conjugates.
Our aim in this chapter is to discuss orthonormal bases for Hilbert spaces.
In contrast to the Hamel basis we considered earlier, we are now going to
allow infinite linear combinations of basis elements (called a Schauder basis).
7.1 Orthonormal sets
Definition 7.1 Two elements x and y of an inner product space are said
to be orthogonal if (x, y) = 0. (We sometimes write x ⊥ y.)
Clearly if (x, y) = 0 then
\[ \|x + y\|^2 = (x + y, x + y) = \|x\|^2 + (x, y) + (y, x) + \|y\|^2 = \|x\|^2 + \|y\|^2 \tag{7.1} \]
(Pythagoras). Sums of orthogonal vectors are therefore very useful in
calculations, since all the cross terms in their norm vanish.
Definition 7.2 A set E is orthonormal if ‖e‖ = 1 for all e ∈ E and
(e_1, e_2) = 0 for any e_1, e_2 ∈ E with e_1 ≠ e_2.
Note that this definition does not require E to be countable. Note also
that any orthonormal set must be linearly independent, since if
\[ \sum_{j=1}^n \alpha_j e_j = 0, \qquad n \in \mathbb{N},\ \alpha_j \in K,\ e_j \in E, \]
one can take the inner product with each e_j in turn to show that α_j = 0
for j = 1, . . . , n.
Example 7.3 The set $\{e_j\}_{j=1}^\infty$, where
\[ e_j = (0, 0, \ldots, 1, \ldots, 0, \ldots) \]
(with the 1 in the jth position), is an orthonormal set in ℓ^2.
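Example 7.4 Consider the space L^2(−π, π) and the set
\[ E = \Big\{ \frac{1}{\sqrt{2\pi}},\ \frac{1}{\sqrt{\pi}} \cos t,\ \frac{1}{\sqrt{\pi}} \sin t,\
   \frac{1}{\sqrt{\pi}} \sin 2t,\ \frac{1}{\sqrt{\pi}} \cos 2t,\ \ldots \Big\}. \]
Then E is orthonormal, since
\[ \int_{-\pi}^{\pi} \cos^2 nt\,dt = \int_{-\pi}^{\pi} \sin^2 nt\,dt = \pi; \]
for any n, m
\[ \int_{-\pi}^{\pi} \cos nt\,dt = \int_{-\pi}^{\pi} \sin nt\,dt = \int_{-\pi}^{\pi} \sin nt \cos mt\,dt = 0; \]
and for any n ≠ m
\[ \int_{-\pi}^{\pi} \cos nt \cos mt\,dt = \int_{-\pi}^{\pi} \sin nt \sin mt\,dt
   = \int_{-\pi}^{\pi} \sin nt \cos mt\,dt = 0. \]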
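The integrals above can be confirmed numerically. The following Python sketch approximates the L^2(−π, π) inner products of the first seven members of E with a midpoint rule and assembles their Gram matrix, which should be close to the identity. The names `l2_inner`, `trig_family` and the step count are assumptions made for this illustration.

```python
import math

def l2_inner(f, g, a=-math.pi, b=math.pi, steps=2000):
    """Midpoint-rule approximation of the real L^2(a, b) inner product of f and g."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(steps))

def trig_family(max_n):
    """The members of E from Example 7.4 with frequency at most max_n."""
    family = [lambda t: 1 / math.sqrt(2 * math.pi)]
    for n in range(1, max_n + 1):
        # n=n binds the current frequency inside each lambda
        family.append(lambda t, n=n: math.cos(n * t) / math.sqrt(math.pi))
        family.append(lambda t, n=n: math.sin(n * t) / math.sqrt(math.pi))
    return family

E = trig_family(3)
gram = [[l2_inner(f, g) for g in E] for f in E]
```

The midpoint rule is very accurate for trigonometric polynomials over a full period, so the entries of `gram` should match the Kronecker delta to many digits.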
Expansions involving orthonormal elements of an inner product space are
easy to treat, essentially due to the following simple lemma.
Lemma 7.5 Let e_1, . . . , e_n be an orthonormal set in an inner product space
V. Then for any α_j ∈ K,
\[ \Big\| \sum_{j=1}^n \alpha_j e_j \Big\|^2 = \sum_{j=1}^n |\alpha_j|^2. \]
Proof Use induction and the Pythagorean property (7.1), noting that
\[ \Big( \sum_{j=1}^{n-1} \alpha_j e_j,\ \alpha_n e_n \Big) = 0. \]
The following lemma, proved using the Gram-Schmidt orthonormalisa-

tion process (which we will revisit later), guarantees the existence of an
orthonormal basis in any finite-dimensional inner product space.
Lemma 7.6 Let (·, ·) be any inner product on a vector space V of dimension
n. Then there exists an orthonormal basis $\{e_j\}_{j=1}^n$ of V.
It follows that in some sense the dot product (6.1) is the canonical inner
product on a finite-dimensional space. Indeed, with respect to any
orthonormal basis {e_j} the inner product (·, ·) has the form (6.1), i.e.
\[ \Big( \sum_{j=1}^n x_j e_j,\ \sum_{k=1}^n y_k e_k \Big)
   = \sum_{j,k=1}^n x_j \bar y_k (e_j, e_k) = x_1 \bar y_1 + \cdots + x_n \bar y_n. \]
7.2 Convergence and orthonormality in Hilbert spaces
In an infinite-dimensional Hilbert space we cannot hope to find a finite basis,
since then the space would by definition be finite-dimensional. The best that
we can hope for is to find a countable basis $\{e_j\}_{j=1}^\infty$, in terms
of which to expand any x ∈ H as a potentially infinite series,
\[ x = \sum_{j=1}^\infty \alpha_j e_j. \]
We make the obvious definition of what this equality means.
Definition 7.7 Let (X, ‖·‖_X) be a normed space. Then
\[ \sum_{j=1}^\infty \alpha_j e_j = x \]
iff the partial sums converge to x in the norm of X, i.e.
\[ \Big\| \sum_{j=1}^n \alpha_j e_j - x \Big\|_X \to 0 \quad \text{as } n \to \infty. \]
We now formalise our notion of a basis for a Hilbert space. Note that at
present we do not require E to be countable.
Definition 7.8 A set E is a basis for H if every x can be written uniquely†
in the form
\[ x = \sum_{j=1}^\infty \alpha_j e_j \tag{7.2} \]
for some α_j ∈ K, e_j ∈ E.
(Note that if E is a basis in the sense of Definition 7.8, i.e. the expansion
in terms of the e_j is unique, then E is linearly independent, since if
\[ 0 = \sum_{j=1}^n \alpha_j e_j \]
there is a unique expansion for zero and so we must have α_j = 0 for all
j = 1, . . . , n.)
† There is a subtlety here. If E is countable then one can assume that
$E = \{e_j\}_{j=1}^\infty$, i.e. that the elements of E are specified in a
particular order. In this case the ‘uniqueness’ is clear. But if E is
uncountable ‘uniqueness’ means that one does not care about the order of the
summation in (7.2), and so certainly this order should not affect the value of
the sum itself. This requires proof: Suppose that {w_j} is a rearrangement of
the {e_j}. Set α_n = (x, e_n), β_m = (x, w_m),
\[ x_1 = \sum_{n=1}^\infty \alpha_n e_n \quad \text{and} \quad x_2 = \sum_{m=1}^\infty \beta_m w_m. \]
Then (by Lemma 6.9) it follows that (x_1, e_n) = (x, e_n) and
(x_2, w_m) = (x, w_m). Since e_n = w_m for some m = m(n), it follows that
\[ (x_1 - x_2, e_n) = (x_1, e_n) - (x_2, w_m) = (x, e_n) - (x, w_m) = 0 \]
for every n, and similarly (x_1 − x_2, w_m) = 0 for every m. Thus
\[ \begin{aligned}
\|x_1 - x_2\|^2 &= \Big( x_1 - x_2,\ \sum_{n=1}^\infty \alpha_n e_n - \sum_{m=1}^\infty \beta_m w_m \Big) \\
  &= \sum_{n=1}^\infty \bar\alpha_n (x_1 - x_2, e_n) - \sum_{m=1}^\infty \bar\beta_m (x_1 - x_2, w_m) = 0,
\end{aligned} \]
and so x_1 = x_2, as required.
If E is a basis and is orthonormal, we refer to it as an orthonormal basis.
Note that an orthonormal set is a basis provided that every x can be written
in the form (7.2), since the uniqueness then follows from the orthonormality.
Indeed, suppose that
\[ x = \sum_{j=1}^\infty \alpha_j e_j \quad \text{and} \quad x = \sum_{k=1}^\infty \beta_k f_k \]
with α_j, β_k ∈ K non-zero, and e_j, f_k ∈ E.
Write $E_1 = \{e_j\}_{j=1}^\infty$ and $E_2 = \{f_k\}_{k=1}^\infty$. Set
U = E_1 ∩ E_2, E_1′ = E_1 \ U, and E_2′ = E_2 \ U. In other words, U consists
of those basis elements common to {e_j} and {f_k}, E_1′ is those occurring
only in {e_j}, and E_2′ those occurring only in {f_k}. All these sets are at
most countable.
Then with some abuse of notation one can write
\[ x = \sum_{u \in U} \alpha_u u + \sum_{e \in E_1'} \alpha_e e \quad \text{and} \quad
   x = \sum_{u \in U} \beta_u u + \sum_{f \in E_2'} \beta_f f. \]
It follows that
\[ \sum_{u \in U} (\alpha_u - \beta_u) u + \sum_{e \in E_1'} \alpha_e e - \sum_{f \in E_2'} \beta_f f = 0. \]
Taking the inner product of this with any e ∈ E_1′ shows that α_e = 0, while
the inner product with any f ∈ E_2′ implies that β_f = 0. Since the
coefficients were assumed to be non-zero, E_1′ and E_2′ must be empty, so in
fact E_1 = E_2, and the inner product with any u ∈ U then shows that
α_u = β_u, and so the expansion is unique. (In each of these steps we use
Lemma 6.9 to swap the order of the inner product and summation.)
Rather than discuss general bases, we concentrate on orthonormal bases,
with a particular emphasis on countable bases $E = \{e_j\}_{j=1}^\infty$.
Neglecting for the moment the question of convergence, and of conditions to
guarantee that {e_j} really is a basis, suppose that the equality (7.2) holds
for some x ∈ H, i.e. that
\[ x = \sum_{j=1}^\infty \alpha_j e_j. \]
To find the coefficients α_j, simply take the inner product with some e_k to
give
\[ (x, e_k) = \Big( \sum_{j=1}^\infty \alpha_j e_j,\ e_k \Big) = \sum_{j=1}^\infty \alpha_j (e_j, e_k) = \alpha_k, \]
and so we would expect α_k = (x, e_k). (Note that if $x = \sum \alpha_j e_j$
then this manipulation is rigorous, using Lemma 6.9 to change the order of the
inner product and the sum.) So if E is an orthonormal basis we would expect to
obtain the expansion
\[ x = \sum_{j=1}^\infty (x, e_j) e_j. \]
Assuming that the Pythagoras result of Lemma 7.5 holds for infinite sums,
we would expect that
\[ \sum_{j=1}^\infty |(x, e_j)|^2 = \|x\|^2. \]
In some ways this says that ‘the {e_j} capture all directions in H’.
Presumably if {e_j} do not form an orthonormal basis we should be able to find
an x such that
\[ \sum_{j=1}^\infty |(x, e_j)|^2 < \|x\|^2. \]
We have not proved any of this yet, since we are assuming that (7.2) holds;
but it motivates the following lemma, whose result is known as Bessel’s
inequality.
Lemma 7.9 (Bessel’s inequality) Let V be an inner product space and
$\{e_n\}_{n=1}^\infty$ an orthonormal sequence. Then for any x ∈ V we have
\[ \sum_{n=1}^\infty |(x, e_n)|^2 \le \|x\|^2 \]
and in particular the left-hand side converges.
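Proof Let us denote by x_k the partial sum
\[ x_k = \sum_{j=1}^k (x, e_j) e_j. \]
Clearly
\[ \|x_k\|^2 = \sum_{j=1}^k |(x, e_j)|^2 \]
and so we have
\[ \begin{aligned}
\|x - x_k\|^2 &= (x - x_k, x - x_k) \\
  &= \|x\|^2 - (x_k, x) - (x, x_k) + \|x_k\|^2 \\
  &= \|x\|^2 - \sum_{j=1}^k (x, e_j)(e_j, x) - \sum_{j=1}^k \overline{(x, e_j)}\,(x, e_j) + \|x_k\|^2 \\
  &= \|x\|^2 - \|x_k\|^2.
\end{aligned} \]
It follows that
\[ \sum_{j=1}^k |(x, e_j)|^2 = \|x_k\|^2 = \|x\|^2 - \|x - x_k\|^2 \le \|x\|^2 \]
for every k, and letting k → ∞ gives the result.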
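A concrete instance of the identity ‖x‖^2 = ‖x_k‖^2 + ‖x − x_k‖^2 used in this proof can be computed in R^4. The Python sketch below uses an orthonormal pair that does not span the space, so Bessel's inequality is strict; the vectors and the name `bessel_terms` are chosen purely for illustration.

```python
import math

def dot(x, y):
    """Dot product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def bessel_terms(x, orthonormal):
    """Return (sum_j |(x, e_j)|^2, ||x||^2, ||x - x_k||^2) for a finite orthonormal list."""
    coeffs = [dot(x, e) for e in orthonormal]
    # x_k = sum_j (x, e_j) e_j, assembled component by component
    partial = [sum(c * e[i] for c, e in zip(coeffs, orthonormal)) for i in range(len(x))]
    residual = [a - b for a, b in zip(x, partial)]
    return sum(c * c for c in coeffs), dot(x, x), dot(residual, residual)

s = 1 / math.sqrt(2)
e1, e2 = [s, s, 0.0, 0.0], [s, -s, 0.0, 0.0]   # orthonormal, but not a basis of R^4
coeff_sq, norm_sq, resid_sq = bessel_terms([1.0, 2.0, 3.0, 4.0], [e1, e2])
```

For this choice the coefficient sum is 5, ‖x‖^2 is 30 and the residual is 25 (up to rounding), so the gap in Bessel's inequality is exactly the squared distance from x to the span of e_1 and e_2.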
We can give an interesting corollary about the coefficients (x, e) when
E is an uncountable set.
Corollary 7.10 Let E be an uncountable orthonormal set in an inner product
space V. Then for each x ∈ V,
\[ \{e \in E : (x, e) \ne 0\} \]
is at most a countable set.
Proof For each fixed m ∈ N, consider the set
\[ E_m = \{e \in E : |(x, e)| \ge 1/m\}. \]
Then this set can have no more than m^2‖x‖^2 elements. Indeed, if E_m has
N > m^2‖x‖^2 elements, one can select N elements {e_1, . . . , e_N} from E_m,
and then
\[ \sum_{j=1}^N |(x, e_j)|^2 \ge N \times \frac{1}{m^2} > \|x\|^2. \]
But this contradicts Bessel’s inequality. Thus each E_m contains only a finite
number of elements, and hence
\[ \bigcup_{m=1}^\infty E_m = \{e \in E : (x, e) \ne 0\} \]
contains at most a countable number of elements.
We now use Bessel’s inequality to give a simple criterion for the convergence
of a sum $\sum_{j=1}^\infty \alpha_j e_j$ when the {e_j} are orthonormal.
Lemma 7.11 Let H be a Hilbert space and {e_n} an orthonormal sequence
in H. The series $\sum_{n=1}^\infty \alpha_n e_n$ converges iff
\[ \sum_{n=1}^\infty |\alpha_n|^2 < +\infty \]
and then
\[ \Big\| \sum_{n=1}^\infty \alpha_n e_n \Big\|^2 = \sum_{n=1}^\infty |\alpha_n|^2. \tag{7.3} \]
We could rephrase this as: $\sum_{n=1}^\infty \alpha_n e_n$ converges iff
α = (α_1, α_2, . . .) ∈ ℓ^2.
Proof Suppose that $\sum_{j=1}^n \alpha_j e_j$ converges to x as n → ∞; then
\[ \Big\| \sum_{j=1}^n \alpha_j e_j \Big\|^2 = \sum_{j=1}^n |\alpha_j|^2 \]
converges to ‖x‖^2 as n → ∞ (see Lemma 2.16).
Conversely, if $\sum_{j=1}^\infty |\alpha_j|^2 < +\infty$ then
$\{\sum_{j=1}^n |\alpha_j|^2\}$ is a Cauchy sequence.
Setting $x_n = \sum_{j=1}^n \alpha_j e_j$ we have, taking wlog m > n,
\[ \|x_n - x_m\|^2 = \Big\| \sum_{j=n+1}^m \alpha_j e_j \Big\|^2 = \sum_{j=n+1}^m |\alpha_j|^2, \]
and so {x_n} is a Cauchy sequence and therefore converges to some x ∈ H,
since H is complete. The equality in (7.3) follows as above.
By combining this lemma with Bessel’s inequality we obtain:

Corollary 7.12 Let H be a Hilbert space and $\{e_n\}_{n=1}^\infty$ an
orthonormal sequence in H. Then for any x ∈ H the series
\[ \sum_{n=1}^\infty (x, e_n) e_n \]
converges.
In fact one can use Corollary 7.10 to deduce that for any orthonormal set
E,
\[ \sum_{e \in E} (x, e) e \]
converges (we have already seen that this is independent of the order of
summation).
7.3 Orthonormal bases in Hilbert spaces
We now show that $E = \{e_n\}_{n=1}^\infty$ forms a basis for H iff
\[ \|x\|^2 = \sum_{j=1}^\infty |(x, e_j)|^2 \quad \text{for all } x \in H. \]
The same results hold for a general orthonormal set E, but we stick to the
countable case for simplicity of presentation.
Proposition 7.13 Let $E = \{e_j\}_{j=1}^\infty$ be an orthonormal set in a
Hilbert space H. Then the following are equivalent to the statement that E is
an orthonormal basis for H:
(a) $x = \sum_{n=1}^\infty (x, e_n) e_n$ for all x ∈ H;
(b) $\|x\|^2 = \sum_{n=1}^\infty |(x, e_n)|^2$ for all x ∈ H;
(c) (x, e_n) = 0 for all n implies that x = 0; and
(d) the linear span of $E = \{e_n\}_{n=1}^\infty$ is dense in H, i.e. for any
x ∈ H and any ε > 0 there exist an n ∈ N, f_j ∈ E and α_j ∈ K such that
\[ \Big\| x - \sum_{j=1}^n \alpha_j f_j \Big\| < \epsilon. \]
Proof If E is an orthonormal basis for H then we can write
\[ x = \sum_{j=1}^\infty \alpha_j e_j, \quad \text{i.e.} \quad x = \lim_{n \to \infty} \sum_{j=1}^n \alpha_j e_j. \]
Clearly if k ≤ n we have
\[ \Big( \sum_{j=1}^n \alpha_j e_j,\ e_k \Big) = \alpha_k, \]
and using the properties of the inner product of limits we obtain α_k = (x, e_k)
and hence (a) holds. The same argument shows that if we assume (a) then
this expansion is unique and so E is a basis.
We first show that (a) ⇒ (b) ⇒ (c) ⇒ (a), and then that (a) ⇒ (d) and
(d) ⇒ (c).
(a) ⇒ (b) is immediate from (2.16).
(b) ⇒ (c) is immediate since kxk = 0 implies that x = 0.
(c) ⇒ (a) Take x ∈ H and let
\[ y = x - \sum_{j=1}^\infty (x, e_j) e_j. \]
For each m ∈ N we have, using Lemma 6.9 (continuity of the inner product),
\[ (y, e_m) = (x, e_m) - \lim_{n \to \infty} \Big( \sum_{j=1}^n (x, e_j) e_j,\ e_m \Big) = 0 \]
since eventually n ≥ m. It follows from (c) that y = 0, i.e. that
\[ x = \sum_{j=1}^\infty (x, e_j) e_j \]
as required.
(a) ⇒ (d) is clear, since given any x and ε > 0 there exists an n such that
\[ \Big\| \sum_{j=1}^n (x, e_j) e_j - x \Big\| < \epsilon. \]
(d) ⇒ (c) Take x ∈ H such that (x, e_j) = 0 for every j. Choose x_n
contained in the linear span of E such that x_n → x. Then
\[ \|x\|^2 = (x, x) = \big( \lim_{n \to \infty} x_n, x \big) = \lim_{n \to \infty} (x_n, x) = 0, \]
since x_n is a (finite) linear combination of the e_j. So x = 0.
Example 7.14 The set $\{e_j\}_{j=1}^\infty$, where
\[ e_j = (0, 0, \ldots, 1, \ldots, 0, \ldots) \]
(with the 1 in the jth position), is an orthonormal basis for ℓ^2, since it is
clear that if (x, e_j) = x_j = 0 for all j then x = 0.
Example 7.15 The sine and cosine functions given in Example 7.4 are an
orthonormal basis for L^2(−π, π).
Lemma 7.16 Any infinite-dimensional Hilbert space H contains a countably
infinite orthonormal sequence.
Proof Suppose that H contains an orthonormal set $E_k = \{e_j\}_{j=1}^k$.
Then E_k does not form a basis for H, since H is infinite-dimensional. It
follows that there exists a non-zero u_k ∈ H such that
\[ (u_k, e_j) = 0 \quad \text{for all } j = 1, \ldots, k \]
(for otherwise by characterisation (c), E_k would be a basis). Define
e_{k+1} = u_k/‖u_k‖ to obtain an orthonormal set E_{k+1} = {e_1, . . . , e_{k+1}}.
The result follows by induction, starting with e_1 = x/‖x‖ for any non-zero
x ∈ H.
Theorem 7.17 A Hilbert space is finite-dimensional iff its unit ball is
compact.
Proof The unit ball is closed and bounded. If H is finite-dimensional this
is equivalent to compactness by Corollary 3.12. If H is infinite-dimensional
then it contains a countable orthonormal set $\{e_j\}_{j=1}^\infty$, and for
i ≠ j
\[ \|e_i - e_j\|^2 = 2. \]
The {e_j} form a sequence in the unit ball that can have no convergent
subsequence.
8
Closest points and approximation
8.1 Closest points in convex subsets
We start with a general result about closest points.
Definition 8.1 A subset A of a vector space V is said to be convex if for

every x, y ∈ A and λ ∈ [0, 1], λx + (1 − λ)y ∈ A.
Lemma 8.2 Let A be a non-empty closed convex subset of a Hilbert space

H and let x ∈ H. Then there exists a unique â ∈ A such that
kx − âk = inf{kx − ak : a ∈ A}.
Proof Set δ = inf{‖x − a‖ : a ∈ A} and find a sequence a_n ∈ A such that
\[ \|x - a_n\|^2 \le \delta^2 + \frac{1}{n}. \tag{8.1} \]
We will show that {a_n} is a Cauchy sequence. To this end, we use the
parallelogram law:
\[ \|(x - a_n) + (x - a_m)\|^2 + \|(x - a_n) - (x - a_m)\|^2 = 2\big[ \|x - a_n\|^2 + \|x - a_m\|^2 \big], \]
which gives
\[ \|2x - (a_n + a_m)\|^2 + \|a_n - a_m\|^2 \le 4\delta^2 + \frac{2}{m} + \frac{2}{n} \]
or
\[ \|a_n - a_m\|^2 \le 4\delta^2 + \frac{2}{m} + \frac{2}{n} - 4\big\| x - \tfrac12 (a_n + a_m) \big\|^2. \]
Since A is convex, ½(a_n + a_m) ∈ A, and so ‖x − ½(a_n + a_m)‖^2 ≥ δ^2, which
gives
\[ \|a_n - a_m\|^2 \le \frac{2}{m} + \frac{2}{n}. \]
It follows that {a_n} is Cauchy, and so, since H is complete, a_n → â for some
â ∈ H. Since A is closed, â ∈ A, and letting n → ∞ in (8.1) shows that
‖x − â‖ = δ.
To show that â is unique, suppose that ‖x − a*‖ = δ for some a* ∈ A. Then
‖x − ½(a* + â)‖ ≥ δ since A is convex, and so, using the parallelogram law
again,
\[ \|a^* - \hat a\|^2 \le 4\delta^2 - 4\delta^2 = 0, \]
i.e. a* = â and â is unique.
8.2 Linear subspaces and orthogonal complements
In an infinite-dimensional space, linear subspaces need not be closed. For
example, the space ℓ_f(R) of all real sequences with only a finite number of
non-zero terms is a linear subspace of ℓ^2(R), but is not closed (consider the
sequence x^(n) = (1, 1/2, 1/3, . . . , 1/n, 0, . . .)).
If X is a subset of H then the orthogonal complement of X in H is

X ⊥ = {u ∈ H : (u, x) = 0 for all x ∈ X}.
Clearly if Y ⊆ X then X ⊥ ⊆ Y ⊥ .
Lemma 8.3 If X is a subset of H then X ⊥ is a closed linear subspace of

H.
Proof It is clear that X ⊥ is a linear subspace of H: if u, v ∈ X ⊥ and α ∈ K

then
(u + v, x) = (u, x) + (v, x) = 0 and (αu, x) = α(u, x) = 0
for every x ∈ X. To show that X ⊥ is closed, suppose that un ∈ X ⊥ and
un → u then
(u, x) = lim (un , x) = 0,
n→∞
and so X ⊥ is closed.
Note that in general (X ⊥ )⊥ ⊇ X; one has equality if X is a closed linear

subspace (see examples sheet).
Note that Proposition 7.13 shows that E is a basis for H iff E ⊥ = {0}
(since this is just a rephrasing of (c): (u, ej ) = 0 for all j implies that u = 0).
Note also that if Span(E) denotes the linear span of E, i.e.
\[ \mathrm{Span}(E) = \Big\{ u \in H : u = \sum_{j=1}^n \alpha_j e_j,\ n \in \mathbb{N},\ \alpha_j \in K,\ e_j \in E \Big\} \]
then E^⊥ = (Span(E))^⊥. Since E ⊆ Span(E) one immediately has the
inclusion (Span(E))^⊥ ⊆ E^⊥, and if y ∈ E^⊥, then for any u ∈ Span(E),
i.e. for any
\[ u = \sum_{j=1}^n \alpha_j e_j, \qquad n \in \mathbb{N},\ \alpha_j \in K,\ e_j \in E, \]
one has
\[ (y, u) = \Big( y,\ \sum_{j=1}^n \alpha_j e_j \Big) = \sum_{j=1}^n \bar\alpha_j (y, e_j) = 0 \]
since y ∈ E^⊥. So y ∈ (Span(E))^⊥, which shows that E^⊥ ⊆ (Span(E))^⊥ and
hence the claimed equality.
In fact one also has E^⊥ = (clin(E))^⊥. Recall that the closed linear span
of E, clin(E), is given by
\[ \mathrm{clin}(E) = \{ u \in H : \forall\, \epsilon > 0 \text{ there exists an } x \in \mathrm{Span}(E) \text{ with } \|u - x\| < \epsilon \}. \]
Since Span(E) ⊆ clin(E), we have (Span(E))^⊥ ⊇ (clin(E))^⊥. To show
equality, take y ∈ (Span(E))^⊥ and u ∈ clin(E): we want to show that
(y, u) = 0 so that y ∈ (clin(E))^⊥. Now, since u ∈ clin(E), there exists a
sequence x_n ∈ Span(E) such that x_n → u. Therefore
\[ (y, u) = \big( y, \lim_{n \to \infty} x_n \big) = \lim_{n \to \infty} (y, x_n) = 0, \]
as required.
Proposition 8.4 If U is a closed linear subspace of a Hilbert space H then
any x ∈ H can be written uniquely as
\[ x = u + v \quad \text{with } u \in U,\ v \in U^\perp, \]
i.e. H = U ⊕ U^⊥. The map P_U : H → U defined by
\[ P_U x = u \]
is called the orthogonal projection of x onto U, and satisfies
\[ P_U^2 x = P_U x \quad \text{and} \quad \|P_U x\| \le \|x\| \quad \text{for all } x \in H. \]
Proof If U is a closed linear subspace then U is closed and convex, so the

above result shows that given x ∈ H there is a unique closest point u ∈ U .
It is now simple to show that x − u ∈ U ⊥ and then such a decomposition is
unique.
Indeed, consider v = x − u; the claim is that v ∈ U^⊥, i.e. that
\[ (v, y) = 0 \quad \text{for all } y \in U. \]
Consider ‖x − (u − ty)‖ = ‖v + ty‖; then
\[ \begin{aligned}
\Delta(t) = \|v + ty\|^2 &= (v + ty, v + ty) \\
  &= \|v\|^2 + (ty, v) + (v, ty) + |t|^2 \|y\|^2 \\
  &= \|v\|^2 + t(y, v) + \overline{t(y, v)} + |t|^2 \|y\|^2 \\
  &= \|v\|^2 + 2\,\mathrm{Re}\{t(y, v)\} + |t|^2 \|y\|^2.
\end{aligned} \]
We know from the construction of u that ‖v + ty‖ is minimal when t = 0. If
t is real then this implies that dΔ/dt(0) = 2 Re{(y, v)} = 0. If t = is, with
s real, then dΔ(is)/ds = −2 Im{(y, v)} = 0 at s = 0. So (y, v) = 0 for any
y ∈ U, i.e. v ∈ U^⊥ as claimed.
Finally, the uniqueness follows easily: if x = u_1 + v_1 = u_2 + v_2, then
u_1 − u_2 = v_2 − v_1, and so
\[ \|v_1 - v_2\|^2 = (v_1 - v_2, v_1 - v_2) = (v_1 - v_2, u_2 - u_1) = 0, \]
since u_1 − u_2 ∈ U and v_2 − v_1 ∈ U^⊥.
If P_U x denotes the closest point to x in U then clearly P_U^2 = P_U, and it
follows from the definition of u that
\[ \|x\|^2 = \|u\|^2 + \|x - u\|^2, \]
thus ensuring that
\[ \|P_U x\| \le \|x\|, \]
i.e. the projection can only decrease the norm.
We will find an explicit expression for P_U in Theorem 8.5.

8.3 Best approximations
We now investigate the best approximation of elements of H using the closed

linear span of an orthonormal set E. Of course, if E is a basis then there is
no approximation involved.
Theorem 8.5 Let E be an orthonormal set E = {e_j}_{j∈J}, where
J = {1, 2, . . . , n} or N. Then for any x ∈ H, the closest point to x in
clin(E) is given by
\[ y = \sum_{j \in J} (x, e_j) e_j. \]
In particular the orthogonal projection of x onto clin(E) is given by
\[ P_{\mathrm{clin}(E)}\, x = \sum_{j \in J} (x, e_j) e_j. \]
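Proof Consider $x - \sum_j \alpha_j e_j$. Then
\[ \begin{aligned}
\Big\| x - \sum_j \alpha_j e_j \Big\|^2
  &= \|x\|^2 - \sum_j (x, \alpha_j e_j) - \sum_j (\alpha_j e_j, x) + \sum_j |\alpha_j|^2 \\
  &= \|x\|^2 - \sum_j \bar\alpha_j (x, e_j) - \sum_j \alpha_j \overline{(x, e_j)} + \sum_j |\alpha_j|^2 \\
  &= \|x\|^2 - \sum_j |(x, e_j)|^2 \\
  &\quad + \sum_j \Big[ |(x, e_j)|^2 - \bar\alpha_j (x, e_j) - \alpha_j \overline{(x, e_j)} + |\alpha_j|^2 \Big] \\
  &= \|x\|^2 - \sum_j |(x, e_j)|^2 + \sum_j |(x, e_j) - \alpha_j|^2,
\end{aligned} \]
and so the minimum occurs when α_j = (x, e_j) for all j.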
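The final identity says that moving any coefficient away from (x, e_j) adds the square of the perturbation to the error. The Python sketch below illustrates this in R^4 with a two-element orthonormal set; the vectors, the random perturbations and the name `error_sq` are illustrative assumptions.

```python
import math
import random

def dot(x, y):
    """Dot product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def error_sq(x, orthonormal, alphas):
    """||x - sum_j alpha_j e_j||^2 for real vectors."""
    approx = [sum(a * e[i] for a, e in zip(alphas, orthonormal)) for i in range(len(x))]
    return sum((xi - ai) ** 2 for xi, ai in zip(x, approx))

s = 1 / math.sqrt(2)
E = [[s, s, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]   # orthonormal in R^4
x = [1.0, 2.0, 3.0, 4.0]
best = [dot(x, e) for e in E]                  # alpha_j = (x, e_j)
best_error = error_sq(x, E, best)

random.seed(1)  # fixed seed so that the run is reproducible
perturbed_errors = [
    error_sq(x, E, [b + random.uniform(-1, 1) for b in best]) for _ in range(200)
]
```

Here `best_error` is 16.5 up to rounding, and every entry of `perturbed_errors` should be at least as large.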
Example 8.6 The best approximation of an element x ∈ ℓ^2 in terms of
$\{e_j\}_{j=1}^n$ (elements of the standard basis) is simply
\[ \sum_{j=1}^n (x, e_j) e_j = (x_1, x_2, \ldots, x_n, 0, 0, \ldots). \]
Example 8.7 If $E = \{e_j\}_{j=1}^\infty$ is an orthonormal basis in H then
the best approximation of an element x of H in terms of $\{e_j\}_{j=1}^n$ is
just given by the partial sum
\[ \sum_{j=1}^n (x, e_j) e_j. \]
Now suppose that E is a finite or countable set that is not orthonormal.

We can still find the best approximation to any u ∈ H that lies in clin(E)
by using the Gram-Schmidt orthonormalisation process:
Proposition 8.8 (Gram-Schmidt orthonormalisation) Given a set
E = {e_j}_{j∈J} with J = N or J = {1, . . . , n} there exists an orthonormal
set Ẽ = {ẽ_j}_{j∈J} such that
\[ \mathrm{Span}(e_1, \ldots, e_k) = \mathrm{Span}(\tilde e_1, \ldots, \tilde e_k) \]
for every k ∈ J.
Proof First omit all elements of {e_n} which can be written as a linear
combination of the preceding ones.
Now suppose that we already have an orthonormal set (ẽ_1, . . . , ẽ_n) whose
span is the same as (e_1, . . . , e_n). Then we can define ẽ_{n+1} by setting
\[ e_{n+1}' = e_{n+1} - \sum_{i=1}^n (e_{n+1}, \tilde e_i) \tilde e_i \quad \text{and} \quad
   \tilde e_{n+1} = \frac{e_{n+1}'}{\|e_{n+1}'\|}. \]
The span of (ẽ_1, . . . , ẽ_{n+1}) is clearly the same as the span of
(ẽ_1, . . . , ẽ_n, e_{n+1}), which is the same as the span of
(e_1, . . . , e_n, e_{n+1}) using the induction hypothesis. Clearly
‖ẽ_{n+1}‖ = 1 and for m ≤ n we have
\[ (\tilde e_{n+1}, \tilde e_m) = \frac{1}{\|e_{n+1}'\|} \Big( (e_{n+1}, \tilde e_m) - \sum_{i=1}^n (e_{n+1}, \tilde e_i)(\tilde e_i, \tilde e_m) \Big) = 0 \]
since (ẽ_1, . . . , ẽ_n) are orthonormal. Setting ẽ_1 = e_1/‖e_1‖ starts the
induction.
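The construction in this proof translates directly into code. The following Python sketch applies it to vectors in R^3, skipping any vector that is (numerically) a linear combination of its predecessors, as in the first step of the proof. The tolerance `tol` and the function name `gram_schmidt` are assumptions of this sketch.

```python
import math

def dot(x, y):
    """Dot product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalise real vectors, skipping any that depend on earlier ones."""
    basis = []
    for v in vectors:
        # e' = v - sum_i (v, e~_i) e~_i, as in the proof of Proposition 8.8
        w = list(v)
        for e in basis:
            c = dot(v, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        length = math.sqrt(dot(w, w))
        if length > tol:   # w = 0 means v was a linear combination of earlier vectors
            basis.append([wi / length for wi in w])
    return basis

vectors = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
basis = gram_schmidt(vectors)
```

The second input vector is twice the first, so it is dropped and `basis` contains three orthonormal vectors. In floating-point arithmetic the formula from the proof can lose orthogonality for badly conditioned inputs, so this is a sketch of the idea rather than a robust numerical routine.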
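Example 8.9 Consider approximation of functions in L^2(−1, 1) with
polynomials of degree up to n. We can start with the set {1, x, x^2, . . . , x^n},
and then use the Gram-Schmidt process to construct polynomials that are
orthonormal w.r.t. the L^2(−1, 1) inner product.
We begin with e_1 = 1/√2, and then consider
\[ e_2' = x - \Big( x, \frac{1}{\sqrt 2} \Big) \frac{1}{\sqrt 2} = x - \frac12 \int_{-1}^1 t\,dt = x \]
so
\[ \|e_2'\|^2 = \int_{-1}^1 t^2\,dt = \frac23; \qquad e_2 = \sqrt{\frac32}\,x. \]
Then
\[ \begin{aligned}
e_3' &= x^2 - \Big( x^2, \sqrt{\tfrac32}\,x \Big) \sqrt{\tfrac32}\,x - \Big( x^2, \frac{1}{\sqrt 2} \Big) \frac{1}{\sqrt 2} \\
     &= x^2 - \frac{3x}{2} \int_{-1}^1 t^3\,dt - \frac12 \int_{-1}^1 t^2\,dt \\
     &= x^2 - \frac13,
\end{aligned} \]
so
\[ \|e_3'\|^2 = \int_{-1}^1 \Big( t^2 - \frac13 \Big)^2 dt
   = \Big[ \frac{t^5}{5} - \frac{2t^3}{9} + \frac{t}{9} \Big]_{-1}^{1} = \frac{8}{45} \]
which gives
\[ e_3 = \sqrt{\frac58}\,\big( 3x^2 - 1 \big). \]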
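The calculation above can be reproduced exactly with rational arithmetic. The Python sketch below runs Gram-Schmidt on 1, x, x^2, x^3 in L^2(−1, 1) but omits the normalisation step, so that every coefficient stays rational; polynomials are stored as coefficient lists [c_0, c_1, ...], and the function names are illustrative.

```python
from fractions import Fraction

def poly_inner(p, q):
    """Exact L^2(-1, 1) inner product of real polynomials given as coefficient lists."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:   # odd powers integrate to zero over (-1, 1)
                total += a * b * Fraction(2, i + j + 1)
    return total

def orthogonalise_monomials(degree):
    """Gram-Schmidt on 1, x, ..., x^degree without the normalisation step."""
    result = []
    for n in range(degree + 1):
        p = [Fraction(0)] * n + [Fraction(1)]   # the monomial x^n
        for q in result:
            c = poly_inner(p, q) / poly_inner(q, q)
            # subtract c*q, treating q as padded with zeros up to the length of p
            p = [a - c * (q[k] if k < len(q) else 0) for k, a in enumerate(p)]
        result.append(p)
    return result

polys = orthogonalise_monomials(3)
norm_sq_e3_prime = poly_inner(polys[2], polys[2])
```

The third polynomial is x^2 − 1/3 with squared norm 8/45, in agreement with the example, and the fourth is x^3 − (3/5)x, a multiple of 5x^3 − 3x.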
Exercise 8.10 Show that $e_4 = \sqrt{\tfrac78}\,(5x^3 - 3x)$, and check that
this is orthogonal to e_1, e_2, and e_3.
Using these orthonormal functions we can find the best approximation of
any function f ∈ L^2(−1, 1) by a degree three polynomial:
\[ \begin{aligned}
& \frac78 \Big( \int_{-1}^1 f(t)(5t^3 - 3t)\,dt \Big) (5x^3 - 3x)
  + \frac58 \Big( \int_{-1}^1 f(t)(3t^2 - 1)\,dt \Big) (3x^2 - 1) \\
& \quad + \frac32 \Big( \int_{-1}^1 f(t)\,t\,dt \Big) x + \frac12 \int_{-1}^1 f(t)\,dt.
\end{aligned} \]
Example 8.11 The best approximation of f(x) = |x| by a third degree
polynomial is
\[ f_3(x) = \frac54 \Big( \int_0^1 3t^3 - t\,dt \Big)(3x^2 - 1) + \int_0^1 t\,dt
   = \frac{5}{16}(3x^2 - 1) + \frac12 = \frac{15x^2 + 3}{16}. \]
We have (after some tedious integration)
\[ \|f - f_3\|^2 = \frac{2}{16^2} \int_0^1 (15x^2 - 16x + 3)^2\,dx = \frac{1}{96}. \]
Of course, the meaning of the ‘best approximation’ is that this choice
minimises the L^2 norm of the difference. It is not the ‘best approximation’
in terms of the supremum norm: at x = 0 the value of f_3(x) is 3/16, while
\[ \sup_{x \in [0,1]} \Big| x - \Big( x^2 + \frac18 \Big) \Big| = \frac18. \]
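The ‘tedious integration’ for ‖f − f_3‖^2 can be delegated to exact rational arithmetic. The Python sketch below squares the polynomial 15x^2 − 16x + 3 and integrates it over [0, 1]; it relies on |x| − f_3(x) = −(15x^2 − 16x + 3)/16 on [0, 1] and on the integrand being even. The helper names are illustrative.

```python
from fractions import Fraction

def integrate_poly_0_1(coeffs):
    """Exact integral over [0, 1] of sum_k coeffs[k] x^k (integer coefficients)."""
    return sum(Fraction(c, k + 1) for k, c in enumerate(coeffs))

def poly_square(coeffs):
    """Coefficients of the square of a polynomial given as a coefficient list."""
    out = [0] * (2 * len(coeffs) - 1)
    for i, a in enumerate(coeffs):
        for j, b in enumerate(coeffs):
            out[i + j] += a * b
    return out

# On [0, 1], |x| - f_3(x) = -(15x^2 - 16x + 3)/16, and the square is even on (-1, 1).
difference_numerator = [3, -16, 15]            # 3 - 16x + 15x^2
integral = integrate_poly_0_1(poly_square(difference_numerator))
error_sq = Fraction(2, 16 ** 2) * integral
```

The integral evaluates to 4/3, so this computation gives ‖f − f_3‖^2 = 1/96. The same value follows from ‖f‖^2 − ‖f_3‖^2 = 2/3 − 21/32.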
Exercise 8.12 Find the best approximation (w.r.t. L2 (−1, 1) norm) of sin x
by a third degree polynomial.
Exercise 8.13 Find the first four polynomials that are orthogonal on L2 (0, 1)
with respect to the usual L2 inner product.
9
Separable Hilbert spaces and ℓ2
We start with a definition.
Definition 9.1 A normed space is separable if it contains a countable dense

subset.
This is an approximation property: one can find a countable set
$\{x_n\}_{n=1}^\infty$ such that given any u in the space and ε > 0, there
exists an x_j such that
\[ \|x_j - u\| < \epsilon. \]
Example 9.2 R is separable, since Q is a countable dense subset. So is R^n,
since Q^n is countable and dense. C is separable since the set of complex
numbers of the form q_1 + iq_2 with q_1, q_2 ∈ Q is countable and dense.
Example 9.3 ℓ2 is separable, since sequences of the form
x = (x1 , . . . , xn , 0, 0, 0, . . .)
with x1 , . . . , xn ∈ Q are dense.
We now show that C 0 ([0, 1]) is separable, by proving the Weierstrass ap-
proximation theorem: every continuous function can be approximated arbi-
trarily closely (in the supremum norm) by a polynomial.
Theorem 9.4 Let f(x) be a real-valued continuous function on [0, 1]. Then
the sequence of polynomials
\[ P_n(x) = \sum_{p=0}^n \binom{n}{p} f(p/n)\, x^p (1 - x)^{n-p} \]
converges uniformly to f(x) on [0, 1].
Proof Start with the identity
\[ (x + y)^n = \sum_{p=0}^n \binom{n}{p} x^p y^{n-p}. \]
Differentiate with respect to x and multiply by x to give
\[ nx(x + y)^{n-1} = \sum_{p=0}^n p \binom{n}{p} x^p y^{n-p}; \]
differentiate twice with respect to x and multiply by x^2 to give
\[ n(n - 1)x^2 (x + y)^{n-2} = \sum_{p=0}^n p(p - 1) \binom{n}{p} x^p y^{n-p}. \]
It follows that if we write
\[ r_p(x) = \binom{n}{p} x^p (1 - x)^{n-p} \]
we have
\[ \sum_{p=0}^n r_p(x) = 1, \qquad \sum_{p=0}^n p\, r_p(x) = nx, \qquad \text{and} \qquad
   \sum_{p=0}^n p(p - 1)\, r_p(x) = n(n - 1)x^2. \]
Therefore
\[ \begin{aligned}
\sum_{p=0}^n (p - nx)^2 r_p(x)
  &= n^2 x^2 \sum_{p=0}^n r_p(x) - 2nx \sum_{p=0}^n p\, r_p(x) + \sum_{p=0}^n p^2 r_p(x) \\
  &= n^2 x^2 - 2nx \cdot nx + (nx + n(n - 1)x^2) \\
  &= nx(1 - x).
\end{aligned} \]
Since f is continuous on the closed bounded interval it is bounded, |f (x)| ≤

M for some M > 0. It also follows that f is uniformly continuous on [0, 1],
so for any ǫ > 0 there exists a δ > 0 such that
|x − y| < δ ⇒ |f (x) − f (y)| < ǫ.
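Since $\sum_{p=0}^n r_p(x) = 1$ we have
\[ \begin{aligned}
\Big| f(x) - \sum_{p=0}^n f(p/n)\, r_p(x) \Big|
  &= \Big| \sum_{p=0}^n (f(x) - f(p/n))\, r_p(x) \Big| \\
  &\le \Big| \sum_{|(p/n) - x| \le \delta} (f(x) - f(p/n))\, r_p(x) \Big| \\
  &\quad + \Big| \sum_{|(p/n) - x| > \delta} (f(x) - f(p/n))\, r_p(x) \Big|.
\end{aligned} \]
For the first term on the right-hand side we have
\[ \Big| \sum_{|(p/n) - x| \le \delta} (f(x) - f(p/n))\, r_p(x) \Big| \le \epsilon \sum_{p=0}^n r_p(x) = \epsilon, \]
and for the second term on the right-hand side
\[ \begin{aligned}
\Big| \sum_{|(p/n) - x| > \delta} (f(x) - f(p/n))\, r_p(x) \Big|
  &\le 2M \sum_{|(p/n) - x| > \delta} r_p(x) \\
  &\le \frac{2M}{n^2 \delta^2} \sum_{p=0}^n (p - nx)^2 r_p(x) \\
  &= \frac{2M x(1 - x)}{n \delta^2} \le \frac{2M}{n \delta^2}
\end{aligned} \]
which tends to zero as n → ∞.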
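To watch the theorem at work, the Python sketch below evaluates the polynomials P_n for f(x) = |x − 1/2| and estimates the supremum error on a grid; it also checks the identity Σ (p − nx)^2 r_p(x) = nx(1 − x) used in the proof. The test function, the grid size and the function names are choices made for this illustration, and `math.comb` requires Python 3.8 or later.

```python
import math

def bernstein(f, n, x):
    """Value at x of the polynomial P_n from Theorem 9.4."""
    return sum(
        math.comb(n, p) * f(p / n) * x ** p * (1 - x) ** (n - p) for p in range(n + 1)
    )

def variance_sum(n, x):
    """sum_p (p - n x)^2 r_p(x), which the proof shows equals n x (1 - x)."""
    return sum(
        (p - n * x) ** 2 * math.comb(n, p) * x ** p * (1 - x) ** (n - p)
        for p in range(n + 1)
    )

def sup_error(f, n, grid=200):
    """Maximum of |f - P_n| over an evenly spaced grid in [0, 1]."""
    return max(abs(f(k / grid) - bernstein(f, n, k / grid)) for k in range(grid + 1))

def f(x):
    """Continuous on [0, 1] but not differentiable at 1/2."""
    return abs(x - 0.5)

errors = [sup_error(f, n) for n in (10, 40, 160)]
```

The entries of `errors` decrease as n grows, though only slowly: the bound in the proof is of order 1/(nδ^2) for fixed δ, and convergence of these polynomials is typically slow in practice.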
One could also state this as: the set of polynomials is dense in C 0 ([0, 1])
equipped with the supremum norm.
Proposition 9.5 C 0 ([0, 1]) is separable.
Proof Given any f ∈ C^0([0, 1]) and ε > 0, f can be approximated to within
ε/2 by some polynomial, i.e.
\[ \Big\| f - \sum_{n=0}^N a_n x^n \Big\|_\infty < \epsilon/2. \]
While the set of all polynomials is not countable, the set of all polynomials
with rational coefficients is. Since
\[ \Big\| \sum_{n=0}^N a_n x^n - \sum_{n=0}^N b_n x^n \Big\|_\infty \le \sum_{n=0}^N |a_n - b_n|, \]
one can choose b_n ∈ Q such that |a_n − b_n| < ε/2(N + 1), and then
\[ \Big\| f - \sum_{n=0}^N b_n x^n \Big\|_\infty < \epsilon. \]
If we now use the fact that C 0 ([0, 1]) is dense in L2 (0, 1), it follows that:
Proposition 9.6 L2 (0, 1) is separable.
Proof Take f ∈ L2 (0, 1). Given ǫ > 0 there exists a g ∈ C 0 ([0, 1]) such that
kf − gkL2 < ǫ/2. We know from above that there exists a polynomial h with
rational coefficients such that
kg − hk∞ < ǫ/2.
Since
\[ \|g - h\|_{L^2}^2 = \int_0^1 |g(x) - h(x)|^2\,dx \le \int_0^1 \|g - h\|_\infty^2\,dx = \|g - h\|_\infty^2, \]
it follows that
\[ \|f - h\|_{L^2} \le \|f - g\|_{L^2} + \|g - h\|_\infty < \epsilon. \]
The property of separability seems very strong, but it is a simple conse-

quence of the existence of a countable orthonormal basis, as we now show.
Proposition 9.7 An infinite-dimensional Hilbert space is separable iff it

has a countable orthonormal basis.
Note that this shows immediately that the unit ball in an infinite-dimensional
separable Hilbert space is not compact.
Proof If a Hilbert space has a countable basis then we can construct a

countable dense set by taking finite combinations of the basis elements with
rational coefficients, and so it is separable.
If H is separable, let E = {xn } be a countable dense subset. In particular,
the closed linear span of E is the whole of H. The Gram-Schmidt process
now provides a countable orthonormal set whose closed linear span is all of
H, i.e. a countable orthonormal basis.
Note that there are Hilbert spaces that are not separable. For example, if
Γ is uncountable then the space ℓ^2(Γ) consisting of all functions f : Γ → R
such that
\[ \sum_{\gamma \in \Gamma} |f(\gamma)|^2 < \infty \]
is a Hilbert space but is not separable.
By using these basis elements we can construct an isomorphism between

any separable Hilbert space and ℓ2 , so that in some sense ℓ2 is “the only”
separable infinite-dimensional Hilbert space:
Theorem 9.8 Any infinite-dimensional separable Hilbert space H is
isometric to ℓ^2(K).
Proof Since H is separable it has a countable orthonormal basis {e_j}. Define
ϕ : H → ℓ^2 by the map
\[ u \mapsto \big( (u, e_1), (u, e_2), \ldots, (u, e_n), \ldots \big); \]
clearly the inverse map is given by
\[ \alpha \mapsto \sum_{j=1}^\infty \alpha_j e_j \quad \text{where } \alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n, \ldots). \]
By the result of Lemma 7.11, u ∈ H ⇒ ϕ(u) ∈ ℓ^2 and α ∈ ℓ^2 ⇒ ϕ^{−1}(α) ∈ H,
while the characterisation of a basis in Proposition 7.13 shows that
‖u‖_H = ‖ϕ(u)‖_{ℓ^2}.
10
Linear maps between Banach spaces
We now consider linear maps between Banach spaces.
10.1 Bounded linear maps
Definition 10.1 If U and V are vector spaces over K then an operator A

from U into V is linear if
A(αx + βy) = αAx + βAy for all α, β ∈ K, x, y ∈ U.
The collection L(U, V ) of all linear operators from U into V is a vector

space.
Definition 10.2 A linear operator A from a normed space (X, ‖·‖_X) into
another normed space (Y, ‖·‖_Y) is bounded if there exists a constant M
such that
\[ \|Ax\|_Y \le M \|x\|_X \quad \text{for all } x \in X. \tag{10.1} \]
Linear operators on infinite-dimensional spaces need not be bounded.
Lemma 10.3 A linear operator T : X → Y is continuous iff it is bounded.
Proof Suppose that T is bounded; then for some M > 0
\[ \|T x_n - T x\|_Y = \|T(x_n - x)\|_Y \le M \|x_n - x\|_X, \]
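and so T is continuous. Now suppose that T is continuous; then in particular
it is continuous at zero, and so, taking ε = 1 in the definition of continuity,
there exists a δ > 0 such that
\[ \|Tx\| \le 1 \quad \text{for all } \|x\| \le \delta. \]
It follows that, for any z ≠ 0,
\[ \|Tz\| = \Big\| T \Big( \frac{\|z\|}{\delta} \frac{\delta z}{\|z\|} \Big) \Big\|
   = \frac{\|z\|}{\delta} \Big\| T \Big( \frac{\delta z}{\|z\|} \Big) \Big\| \le \frac{1}{\delta} \|z\|, \]
and so T is bounded.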
The space of all bounded linear operators from X into Y is denoted by

B(X, Y ).
Definition 10.4 The operator norm of an operator A (from X into Y ) is

the smallest value of M such that (10.1) holds,
kAkB(X,Y ) = inf{M : (10.1) holds}. (10.2)
Note that – and this is the key point – it follows that
kAxkY ≤ kAkB(X,Y ) kxkX for all x ∈ X.
(Since for each x ∈ X, kAxkY ≤ M kxkX for every M > kAkB(X,Y ) ).
Lemma 10.5 The following is an equivalent definition of the operator norm:
\[ \|A\|_{B(X,Y)} = \sup_{\|x\|_X = 1} \|Ax\|_Y. \tag{10.3} \]
Proof Let us denote by ‖A‖_1 the value defined in (10.2), and by ‖A‖_2 the
value defined in (10.3). Then given x ≠ 0 we have
\[ \left\| A\!\left( \frac{x}{\|x\|_X} \right) \right\|_Y \le \|A\|_2, \quad\text{i.e.}\quad \|Ax\|_Y \le \|A\|_2 \|x\|_X, \]
and so ‖A‖_1 ≤ ‖A‖_2. It is also clear that if ‖x‖_X = 1 then
‖Ax‖_Y ≤ ‖A‖_1 ‖x‖_X = ‖A‖_1,
and so ‖A‖_2 ≤ ‖A‖_1. It follows that ‖A‖_1 = ‖A‖_2.
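The supremum over the unit sphere can be explored numerically in a small finite-dimensional case. The following is only a sketch: the 2 × 2 matrix A, the Euclidean norm on R², and the helper names are assumptions made for illustration and do not come from the notes.

```python
import math

def apply(A, x):
    """Multiply the 2x2 matrix A (a list of rows) by the vector x."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def norm(x):
    """Euclidean norm on R^2."""
    return math.sqrt(x[0] ** 2 + x[1] ** 2)

def sup_on_unit_sphere(A, samples=3600):
    """Approximate sup{|Ax| : |x| = 1} by sampling points of the unit circle."""
    best = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        best = max(best, norm(apply(A, [math.cos(theta), math.sin(theta)])))
    return best

# Hypothetical example: a diagonal matrix, whose norm is attained at x = (1, 0).
A = [[3.0, 0.0], [0.0, 1.0]]
op_norm = sup_on_unit_sphere(A)

# The sampled value also serves as a constant M in the bound |Ax| <= M |x|.
for x in ([1.0, 2.0], [-4.0, 0.5], [0.0, 7.0]):
    assert norm(apply(A, x)) <= op_norm * norm(x) + 1e-12
```

Sampling only gives a lower estimate of the supremum in general; here the maximising vector happens to be one of the sample points.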

Exercise 10.6 Show that also
\[ \|A\|_{B(X,Y)} = \sup_{\|x\|_X \le 1} \|Ax\|_Y \]
and
\[ \|A\|_{B(X,Y)} = \sup_{x \ne 0} \frac{\|Ax\|_Y}{\|x\|_X}. \tag{10.4} \]
When there is no room for confusion we will omit the B(X, Y) subscript
on the norm, sometimes adding the subscript “op” (for “operator”) to make
things clearer (‖·‖op).
If T : X → Y then in order to find ‖T‖op one can try the following: first
show that
‖Tx‖_Y ≤ M‖x‖_X for all x ∈ X (10.5)
for some M > 0, i.e. show that T is bounded. It is then clear that ‖T‖op ≤ M
(since ‖T‖op is the infimum of all M such that (10.5) holds). Then, in order
to show that in fact ‖T‖op = M, find an example of a particular non-zero z ∈ X such
that
‖Tz‖_Y = M‖z‖_X.
This shows from the definition in (10.4) that ‖T‖op ≥ M and hence that in
fact ‖T‖op = M.
Example 10.7 Consider the right and left shift operators on ℓ², σr and σl,
given by
σr(x) = (0, x1, x2, . . .) and σl(x) = (x2, x3, x4, . . .).
Both operators are clearly linear. We have
\[ \|\sigma_r(x)\|_{\ell^2}^2 = \sum_{i=1}^{\infty} |x_i|^2 = \|x\|_{\ell^2}^2, \]
so that ‖σr‖op = 1, and
\[ \|\sigma_l(x)\|_{\ell^2}^2 = \sum_{i=2}^{\infty} |x_i|^2 \le \|x\|_{\ell^2}^2, \]
so that ‖σl‖op ≤ 1. However, if we choose a non-zero x of the form
x = (0, x2, x3, . . .)
then we have
\[ \|\sigma_l(x)\|_{\ell^2}^2 = \sum_{j=2}^{\infty} |x_j|^2 = \|x\|_{\ell^2}^2, \]
and so we must have ‖σl‖op = 1.
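The two shift computations can be checked on finitely supported sequences. This is a minimal sketch, assuming that a Python list stands for a sequence whose remaining entries are all zero; the function names and the sample vectors are illustrative only.

```python
import math

def shift_right(x):
    """Right shift: (x1, x2, ...) -> (0, x1, x2, ...)."""
    return [0.0] + list(x)

def shift_left(x):
    """Left shift: (x1, x2, ...) -> (x2, x3, ...)."""
    return list(x[1:])

def l2_norm(x):
    """The l^2 norm of a finitely supported sequence."""
    return math.sqrt(sum(abs(t) ** 2 for t in x))

x = [3.0, 4.0, 12.0]   # a generic vector
z = [0.0, 4.0, 12.0]   # a vector whose first entry is zero

right_ratio = l2_norm(shift_right(x)) / l2_norm(x)    # the right shift preserves the norm
left_ratio_x = l2_norm(shift_left(x)) / l2_norm(x)    # strictly below 1 for this x
left_ratio_z = l2_norm(shift_left(z)) / l2_norm(z)    # the bound 1 is attained for z
```

The vector z plays the role of the particular element used in the recipe above to show that the upper bound on the norm is attained.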
In slightly more involved examples one might have to be a little more
crafty; for example, given the bound ‖Tx‖_Y ≤ M‖x‖_X, find a sequence
zn ∈ X such that
\[ \frac{\|Tz_n\|_Y}{\|z_n\|_X} \to M \]
as n → ∞, which again shows using (10.4) that ‖T‖op ≥ M and hence that
‖T‖op = M.
Example 10.8 Consider the space L²(a, b) with −∞ < a < b < +∞ and
the multiplication operator T from L²(a, b) into itself given by
Tx(t) = f(t)x(t), t ∈ [a, b],
where f ∈ C⁰([a, b]). Then clearly T is linear and
\[ \|Tx\|^2 = \int_a^b |f(t)x(t)|^2\,dt = \int_a^b |f(t)|^2 |x(t)|^2\,dt \le \max_{a \le t \le b} |f(t)|^2 \int_a^b |x(t)|^2\,dt, \]
and so
‖Tx‖_{L²} ≤ ‖f‖∞ ‖x‖_{L²},
i.e. ‖T‖op ≤ ‖f‖∞.
Now let s be a point at which |f| is maximum. Assume for simplicity that
s ∈ (a, b), and for each ε > 0 (small enough that (s − ε, s + ε) ⊂ (a, b)) consider
\[ x_\varepsilon(t) = \begin{cases} 1 & |t - s| < \varepsilon, \\ 0 & \text{otherwise;} \end{cases} \]
then
\[ \frac{\|Tx_\varepsilon\|^2}{\|x_\varepsilon\|^2} = \frac{1}{2\varepsilon} \int_{s-\varepsilon}^{s+\varepsilon} |f(t)|^2\,dt \to |f(s)|^2 \quad\text{as } \varepsilon \to 0 \]
since f is continuous. Therefore in fact
‖T‖op = ‖f‖∞.
If s = a then we can replace |t − s| < ε in the definition of x_ε by a ≤ t < a + ε,
and if s = b we replace it by b − ε < t ≤ b; the rest of the argument is identical.
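The limiting argument in Example 10.8 can be watched numerically. The sketch below assumes a particular multiplier f(t) = 1 − (t − 1/2)² on [0, 1], chosen only for illustration (so s = 1/2 and ‖f‖∞ = 1), and approximates the averaged integral with the midpoint rule.

```python
def f(t):
    # Hypothetical continuous multiplier on [0, 1]; |f| is largest at s = 0.5.
    return 1.0 - (t - 0.5) ** 2

def ratio(eps, s=0.5, steps=1000):
    """Midpoint-rule value of (1 / (2 eps)) * integral of |f|^2 over (s - eps, s + eps),
    which is the quotient ||T x_eps||^2 / ||x_eps||^2 for the indicator x_eps."""
    h = 2 * eps / steps
    total = sum(f(s - eps + (k + 0.5) * h) ** 2 for k in range(steps))
    return total * h / (2 * eps)

sup_f_squared = f(0.5) ** 2                          # ||f||_inf^2 for this f
ratios = [ratio(eps) for eps in (0.2, 0.1, 0.01)]    # quotients for shrinking eps
```

For this choice of f the quotients increase towards ‖f‖∞² as ε decreases, without ever reaching it, which is why a sequence rather than a single vector is needed.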
We now give a very important particular example.
Example 10.9 Consider the map from L²(a, b) into itself given by the integral
\[ (Tx)(t) = \int_a^b K(t,s)\,x(s)\,ds \quad\text{for all } t \in [a,b], \]
where
\[ \int_a^b\!\int_a^b |K(t,s)|^2\,ds\,dt < +\infty. \]
Then T is clearly linear, and
\[ \begin{aligned} \|Tx\|^2 &= \int_a^b \left| \int_a^b K(t,s)x(s)\,ds \right|^2 dt \\ &\le \int_a^b \left( \int_a^b |K(t,s)|^2\,ds \right) \left( \int_a^b |x(s)|^2\,ds \right) dt \quad\text{by Cauchy-Schwarz} \\ &= \left( \int_a^b\!\int_a^b |K(t,s)|^2\,ds\,dt \right) \|x\|^2, \end{aligned} \]
and so
\[ \|T\|_{\mathrm{op}}^2 \le \int_a^b\!\int_a^b |K(t,s)|^2\,ds\,dt. \]
Note that this upper bound on the operator norm can be strict; see examples.
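A finite-dimensional analogue shows how the bound can fail to be sharp. In the sketch below a matrix replaces the kernel K and the sum of the squared entries replaces the double integral; this substitution, the identity matrix, and the helper names are all assumptions made for illustration.

```python
import math

def apply(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def norm(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))

def hs_bound(A):
    """Square root of the sum of the squared entries (discrete analogue of the double integral)."""
    return math.sqrt(sum(a * a for row in A for a in row))

def op_norm_estimate(A, samples=3600):
    """Sample the unit circle to estimate sup{|Ax| : |x| = 1} for a 2x2 matrix."""
    return max(
        norm(apply(A, [math.cos(2 * math.pi * k / samples),
                       math.sin(2 * math.pi * k / samples)]))
        for k in range(samples)
    )

identity = [[1.0, 0.0], [0.0, 1.0]]
bound = hs_bound(identity)              # sqrt(2)
estimate = op_norm_estimate(identity)   # about 1, strictly below the bound
```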
The space B(X, Y ) is a Banach space whenever Y is a Banach space.

Remarkably this does not depend on whether the space X is complete or
not.
Theorem 10.10 Let X be a normed space and Y a Banach space. Then

B(X, Y ) is a Banach space.
Proof Let {An} be a Cauchy sequence in B(X, Y). We need to show that
An → A for some A ∈ B(X, Y). Since {An} is Cauchy, given ε > 0 there
exists an N such that
‖An − Am‖op ≤ ε for all n, m ≥ N. (10.6)
We now show that for every fixed x ∈ X the sequence {An x} is Cauchy in
Y. This follows since
‖An x − Am x‖_Y = ‖(An − Am)x‖_Y ≤ ‖An − Am‖op ‖x‖_X, (10.7)
and {An} is Cauchy in B(X, Y). Since Y is complete, it follows that
An x → y,
where y depends on x. We can therefore define a mapping A : X → Y

by Ax = y. We still need to show, however, that A ∈ B(X, Y ) and that
An → A in the operator norm.
First, A is linear since
\[ A(x + \lambda y) = \lim_{n\to\infty} A_n(x + \lambda y) = \lim_{n\to\infty} A_n x + \lambda \lim_{n\to\infty} A_n y = Ax + \lambda Ay. \]
To show that A is bounded take n, m ≥ N (from (10.6)) in (10.7), and let
m → ∞. Since Am x → Ax this shows that
‖An x − Ax‖_Y ≤ ε‖x‖_X. (10.8)
Since (10.8) holds for every x it follows that
‖An − A‖op ≤ ε, (10.9)
and so An − A ∈ B(X, Y). Since B(X, Y) is a vector space and An ∈
B(X, Y), it follows that A ∈ B(X, Y), and (10.9) shows that An → A in
B(X, Y).
10.2 Kernel and range
Given a linear operator, we define its kernel
Ker T = {x ∈ X : T x = 0}
and its range
Range T = {y ∈ Y : y = T x for some x ∈ X}.

Corollary 10.11 If T ∈ B(X, Y ) then Ker T is a closed linear subspace of

X.
Proof It is easy to show that Ker(T ) is a linear subspace, since if x, y ∈

Ker(T ) then
T (αx + βy) = αT x + βT y = 0.
Furthermore if xn → x and T xn = 0 then since T is continuous T x =
limn→∞ T xn = 0, so Ker(T ) is closed.
Note that the range is not necessarily closed, see examples.

11
The Riesz representation theorem and the adjoint operator
If U is a normed space over K then a linear map from U into K is called a

linear functional on U .
Since K = R or C is complete, Theorem 10.10 shows that the collection of all
bounded linear functionals on U, B(U, K), is a Banach space. This space is termed
the dual space of U, and is denoted by U∗.
For any f ∈ U∗,
\[ \|f\|_{U^*} = \sup_{\|u\| = 1} |f(u)|. \]
Example 11.1 Take U = C⁰([a, b]), and consider δx defined for x ∈ [a, b]
by
δx(f) = f(x) for all f ∈ U.
Then
|δx(f)| = |f(x)| ≤ ‖f‖∞,
so that δx ∈ U∗ with ‖δx‖ ≤ 1. Choosing a non-zero function f ∈ C⁰([a, b]) such that
|f(x)| = ‖f‖∞ (for example f ≡ 1) shows that in fact ‖δx‖ = 1.
(Note: this shows that – at least for this particular choice of U – knowledge
of T (f ) for all T ∈ U ∗ determines f ∈ U . This result is in fact true in
general.)
Example 11.2 Let U be the real vector space L2 (a, b), and take φ ∈ C 0 ([a, b]).
Consider
\[ f(u) = \int_a^b \phi(t)u(t)\,dt. \]
Then
\[ |f(u)| = \left| \int_a^b \phi(t)u(t)\,dt \right| = |(\phi, u)_{L^2}| \le \|\phi\|_{L^2} \|u\|_{L^2} \quad\text{using the Cauchy-Schwarz inequality,} \]
and so f ∈ U∗ with
‖f‖ ≤ ‖φ‖_{L²}.
If φ ≠ 0 and we choose u = φ/‖φ‖_{L²} then ‖u‖_{L²} = 1 and
\[ |f(u)| = \int_a^b \frac{|\phi(t)|^2}{\|\phi\|_{L^2}}\,dt = \|\phi\|_{L^2}, \]
and so ‖f‖ = ‖φ‖_{L²}.
Exercise 11.3 Let U be C⁰([a, b]) and for some φ ∈ U consider fφ defined
as
\[ f_\phi(u) = \int_a^b \phi(t)u(t)\,dt \quad\text{for all } u \in U. \]
Show that fφ ∈ U∗ with $\|f_\phi\| \le \int_a^b |\phi(t)|\,dt$. Show that this is in fact an
equality by choosing an appropriate sequence of functions un ∈ U for which
\[ \frac{|f_\phi(u_n)|}{\|u_n\|_\infty} \to \int_a^b |\phi(t)|\,dt. \]
Example 11.4 Let H be a Hilbert space. Given any y ∈ H, define
ly(x) = (x, y). (11.1)
Then ly is clearly linear, and
|ly(x)| = |(x, y)| ≤ ‖x‖‖y‖
using the Cauchy-Schwarz inequality. It follows that ly ∈ H∗ with ‖ly‖ ≤
‖y‖. Choosing x = y in (11.1) shows that
|ly(y)| = (y, y) = ‖y‖²
and hence ‖ly‖ = ‖y‖.

The Riesz Representation Theorem shows that this example can be ‘reversed’,
i.e. every bounded linear functional on H corresponds to some inner product:
Theorem 11.5 (Riesz Representation Theorem) For every bounded

linear functional f on a Hilbert space H there exists a unique element y ∈ H
such that
f(x) = (x, y) for all x ∈ H (11.2)
and ‖y‖_H = ‖f‖_{H∗}.
Proof If f = 0 then we can take y = 0, so assume that f ≠ 0. Let K = Ker f, which is a closed linear subspace of H with K ≠ H, so that K⊥ ≠ {0}.

First we show that K ⊥ is a one-dimensional linear subspace of H. Indeed,
given u, v ∈ K ⊥ we have
f ( f (u)v − f (v)u ) = 0. (11.3)
Since u, v ∈ K ⊥ it follows that f (u)v − f (v)u ∈ K ⊥ , while (11.3) shows that

f (u)v − f (v)u ∈ K. It follows† that
f (u)v − f (v)u = 0,
and so u and v are proportional.

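Therefore we can choose z ∈ K⊥ such that ‖z‖ = 1, and decompose any
x ∈ H as
x = (x, z)z + w with w ∈ K.
Therefore, since the inner product is conjugate-linear in its second argument,
\[ f(x) = (x, z)f(z) = \bigl(x, \overline{f(z)}\,z\bigr). \]
Setting $y = \overline{f(z)}\,z$ we obtain (11.2).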

To show that this choice of y is unique, suppose that
(x, y) = (x, ŷ) for all x ∈ H.
Then (x, y − ŷ) = 0 for all x ∈ H, i.e. y − ŷ ∈ H ⊥ = {0}.

Finally, the calculation in Example 11.4 shows the equality of the norms
of y and f .
We now use this to define the adjoint of an operator.

† We always have U ∩ U ⊥ = {0}: if x ∈ U and x ∈ U ⊥ then kxk2 = (x, x) = 0.
Theorem 11.6 Let H and K be Hilbert spaces and T ∈ B(H, K). Then
there exists a unique operator T∗ ∈ B(K, H), the adjoint of T, such that
(Tx, y)K = (x, T∗y)H
for all x ∈ H, y ∈ K. In particular, ‖T∗‖_{B(K,H)} ≤ ‖T‖_{B(H,K)}.
Proof Let y ∈ K and consider the scalar-valued function f on H defined by f(x) =

(Tx, y)K. Then clearly f is linear and
|f(x)| = |(Tx, y)K|
≤ ‖Tx‖_K ‖y‖_K
≤ ‖T‖‖x‖_H ‖y‖_K.
It follows that f ∈ H ∗ , and so by the Riesz representation theorem there
exists a unique z ∈ H such that
(T x, y)K = (x, z)H for all x ∈ H.
Define T ∗ y = z. Then by definition
(T x, y)K = (x, T ∗ y) for all x ∈ H, y ∈ K.
However, it remains to show that T ∗ ∈ B(K, H). First, T ∗ is linear since
(x, T ∗ (αy1 + βy2 ))H = (T x, αy1 + βy2 )K
= ᾱ(T x, y1 )K + β̄(T x, y2 )K
= ᾱ(x, T ∗ y1 )H + β̄(x, T ∗ y2 )H
= (x, αT ∗ y1 + βT ∗ y2 )H ,
i.e.
T ∗ (αy1 + βy2 ) = αT ∗ y1 + βT ∗ y2 .
To show that T∗ is bounded, we have
‖T∗y‖² = (T∗y, T∗y)H
= (TT∗y, y)K
≤ ‖TT∗y‖_K ‖y‖_K
≤ ‖T‖‖T∗y‖_H ‖y‖_K.
If ‖T∗y‖_H = 0 then clearly ‖T∗y‖_H ≤ ‖T‖‖y‖_K; otherwise we can divide
both sides by ‖T∗y‖_H to obtain the same conclusion (that T∗ is bounded
from K into H). So ‖T∗‖ ≤ ‖T‖.
Finally suppose that (x, T∗y)H = (x, T̂y)H for all x ∈ H, y ∈ K. Then for
each y ∈ K, (x, (T∗ − T̂)y)H = 0 for all x ∈ H: this shows that (T∗ − T̂)y = 0
for all y ∈ K, i.e. that T̂ = T∗.
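Example 11.7 Let H = K = Cⁿ with its standard inner product. Then
given a matrix A = (aij) ∈ Cⁿˣⁿ we have
\[ \begin{aligned} (Ax, y) &= \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{n} a_{ij} x_j \Bigr) \bar{y}_i \\ &= \sum_{j=1}^{n} x_j \,\overline{\Bigl( \sum_{i=1}^{n} \bar{a}_{ij} y_i \Bigr)} \\ &= (x, A^* y), \end{aligned} \]
where A∗ is the Hermitian conjugate of A, i.e. $A^* = \bar{A}^T$.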
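The identity (Ax, y) = (x, A∗y) can be checked directly with Python's built-in complex numbers. This is a small sketch; the particular matrix and vectors are arbitrary choices made for illustration.

```python
def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def apply(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def adjoint(A):
    """Hermitian conjugate (conjugate transpose) of a square matrix."""
    n = len(A)
    return [[A[i][j].conjugate() for i in range(n)] for j in range(n)]

# Arbitrary illustrative data with small integer real and imaginary parts.
A = [[1 + 2j, 3 + 0j], [0j, 1j]]
x = [1 + 1j, 2 - 1j]
y = [3 + 0j, 1 + 4j]

lhs = inner(apply(A, x), y)            # (Ax, y)
rhs = inner(x, apply(adjoint(A), y))   # (x, A* y)
```

Both sides evaluate to 24 − 2i for this data; using the plain transpose instead of the conjugate transpose would in general break the identity.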
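Example 11.8 Consider H = K = L²(0, 1) and the integral operator
\[ (Tx)(t) = \int_0^1 K(t,s)x(s)\,ds. \]
Then for x, y ∈ L²(0, 1) we have
\[ \begin{aligned} (Tx, y)_H &= \int_0^1 \left( \int_0^1 K(t,s)x(s)\,ds \right) y(t)\,dt \\ &= \int_0^1\!\int_0^1 K(t,s)x(s)y(t)\,ds\,dt \\ &= \int_0^1 x(s) \left( \int_0^1 K(t,s)y(t)\,dt \right) ds \\ &= (x, T^*y)_H, \end{aligned} \]
where
\[ T^*y(s) = \int_0^1 K(t,s)y(t)\,dt. \]
(Here the functions and the kernel are taken to be real-valued; in the complex case the inner product carries a conjugate on its second argument, and K(t, s) is replaced by its complex conjugate in the formula for T∗.)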
Exercise 11.9 Show that the adjoint of the integral operator T : L²(0, 1) →
L²(0, 1) defined as
\[ (Tx)(t) = \int_0^t K(t,s)x(s)\,ds \]
is given by
\[ (T^*y)(t) = \int_t^1 K(s,t)y(s)\,ds. \]
Example 11.10 Let H = K = ℓ² and consider σr x = (0, x1, x2, . . .); then

(σr x, y) = x1 y2 + x2 y3 + x3 y4 + . . .
= (x, σr∗ y),
where σr∗ y = σl y = (y2, y3, y4, . . .), i.e. σr∗ = σl. Similarly σl∗ = σr.
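The relation between the two shifts can be checked on finitely supported real sequences. In this sketch a list stands for a sequence with zeros beyond its last entry, which is an assumption of the illustration rather than part of the notes.

```python
def shift_right(x):
    """Right shift: (x1, x2, ...) -> (0, x1, x2, ...)."""
    return [0] + list(x)

def shift_left(x):
    """Left shift: (x1, x2, ...) -> (x2, x3, ...)."""
    return list(x[1:])

def inner(x, y):
    """Real l^2 inner product of finitely supported sequences. zip stops at the
    shorter list, which matches treating the missing entries as zero."""
    return sum(a * b for a, b in zip(x, y))

x = [1, 2, 3, 4]
y = [5, 6, 7, 8]

lhs = inner(shift_right(x), y)   # x1*y2 + x2*y3 + x3*y4
rhs = inner(x, shift_left(y))    # the same sum, written as (x, left shift of y)
```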
The following lemma gives some elementary properties of the adjoint:
Lemma 11.11 Let H, K, and J be Hilbert spaces, R, S ∈ B(H, K) and

T ∈ B(K, J), then
(a) (αR + βS)∗ = ᾱR∗ + β̄S ∗ and
(b) (T R)∗ = R∗ T ∗ .
Proof
(a) Exercise
(b) Clearly
(x, (T R)∗ y)H = (T Rx, y)J
= (Rx, T ∗ y)K = (x, R∗ T ∗ y)H .
Less trivially we have the following:
Theorem 11.12 Let H and K be Hilbert spaces and T ∈ B(H, K). Then
(a) (T∗)∗ = T,
(b) ‖T∗‖ = ‖T‖, and
(c) ‖T∗T‖ = ‖T‖².
Proof
(a) Since T∗ ∈ B(K, H), (T∗)∗ ∈ B(H, K). For all x ∈ K, y ∈ H we
have
\[ (x, (T^*)^* y)_K = (T^* x, y)_H = \overline{(y, T^* x)_H} = \overline{(Ty, x)_K} = (x, Ty)_K, \]
i.e. (T∗)∗y = Ty for all y ∈ H, i.e. (T∗)∗ = T.
(b) We have already shown in the proof of Theorem 11.6 that ‖T∗‖ ≤
‖T‖. Applying this inequality to T∗ rather than to T we have
‖(T∗)∗‖ = ‖T‖ ≤ ‖T∗‖, and so ‖T∗‖ = ‖T‖.
(c) Since ‖T‖ = ‖T∗‖ we have
‖T∗T‖ ≤ ‖T∗‖‖T‖ = ‖T‖².
But also we have
‖Tx‖² = (Tx, Tx) = (x, T∗Tx)
≤ ‖x‖‖T∗Tx‖ ≤ ‖T∗T‖‖x‖²,
i.e. ‖T‖² ≤ ‖T∗T‖.
11.1 Linear operators from H into H
Definition 11.13 If H is a Hilbert space and T ∈ B(H, H) then T is normal

if
T T ∗ = T ∗T
and self-adjoint if T = T ∗ .
Note that if T is normal then T T ∗ is self-adjoint, since (T T ∗ )∗ = (T ∗ )∗ T ∗ =

T T ∗ using (b) of Lemma 11.11 ((T R)∗ = R∗ T ∗ ) and (a) of Theorem 11.12
((T ∗ )∗ = T ).
Equivalently T ∈ B(H, H) is self-adjoint iff

(x, T y) = (T x, y) for all x, y ∈ H.
Example 11.14 Let H = Rⁿ and A ∈ Rⁿˣⁿ. Then A is self-adjoint if
$A = A^T$, i.e. if A is symmetric.
Example 11.15 Let H = Cⁿ and A ∈ Cⁿˣⁿ. Then A is self-adjoint if
$A = \bar{A}^T$, i.e. if A is Hermitian.
Example 11.16 Consider the right-shift operator on ℓ2 , for which σr∗ = σl .

Then σr is not normal, since
σr σl x = σr (x2 , x3 , . . .) = (0, x2 , x3 , . . .)
and
σl σr x = σl (0, x1 , x2 , . . .) = (x1 , x2 , x3 , . . .).
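The failure of normality is visible already on a short finitely supported sequence. A minimal sketch, with lists standing for sequences that are zero beyond their last entry (an assumption of the illustration):

```python
def shift_right(x):
    """Right shift: (x1, x2, ...) -> (0, x1, x2, ...)."""
    return [0] + list(x)

def shift_left(x):
    """Left shift: (x1, x2, ...) -> (x2, x3, ...)."""
    return list(x[1:])

x = [1, 2, 3]
right_after_left = shift_right(shift_left(x))   # the first entry of x is lost
left_after_right = shift_left(shift_right(x))   # x is recovered exactly
```

The two compositions differ whenever the first entry of x is non-zero, which is exactly the obstruction to normality described in Example 11.16.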
Example 11.17 The integral operator T : L²(0, 1) → L²(0, 1) given by
\[ Tf(t) = \int_0^1 K(t,s)f(s)\,ds \]
(with K real-valued) is self-adjoint if K(t, s) = K(s, t), i.e. if K is symmetric.
It would be nice, of course, to have yet another expression for kT k, and

we can do this when T is self-adjoint.
Theorem 11.18 Let H be a Hilbert space and T ∈ B(H, H) a self-adjoint

operator. Then
(a) (Tx, x) is real for all x ∈ H and
(b) ‖T‖ = sup{|(Tx, x)| : x ∈ H, ‖x‖ = 1}.
Proof For (a) we have
\[ (Tx, x) = (x, Tx) = \overline{(Tx, x)}, \]
and so (Tx, x) is real. Now let M = sup{|(Tx, x)| : x ∈ H, ‖x‖ = 1}.
Clearly
|(Tx, x)| ≤ ‖Tx‖‖x‖ ≤ ‖T‖‖x‖² = ‖T‖
when ‖x‖ = 1, and so M ≤ ‖T‖.
For any u, v ∈ H we have
4 Re(Tu, v) = (T(u + v), u + v) − (T(u − v), u − v)
≤ M(‖u + v‖² + ‖u − v‖²)
= 2M(‖u‖² + ‖v‖²)
using the parallelogram law. If Tu ≠ 0 choose
\[ v = \frac{\|u\|}{\|Tu\|}\,Tu \]
to obtain, since ‖v‖ = ‖u‖, that
4‖u‖‖Tu‖ ≤ 4M‖u‖²,
i.e. ‖Tu‖ ≤ M‖u‖. This also holds if Tu = 0. It follows that ‖T‖ ≤ M and
therefore that ‖T‖ = M.
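For a symmetric matrix the two suprema in Theorem 11.18 can be compared numerically. The sketch below assumes a specific 2 × 2 symmetric matrix (eigenvalues 1 and 3) and samples the unit circle; both choices are made purely for illustration.

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]   # hypothetical symmetric matrix with eigenvalues 1 and 3

def apply(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def inner(x, y):
    """Real Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

samples = 3600
units = [[math.cos(2 * math.pi * k / samples), math.sin(2 * math.pi * k / samples)]
         for k in range(samples)]

# sup |(Ax, x)| over sampled unit vectors
quad_sup = max(abs(inner(apply(A, u), u)) for u in units)
# sup |Ax| over the same unit vectors
norm_sup = max(math.sqrt(inner(apply(A, u), apply(A, u))) for u in units)
```

For a matrix that is not symmetric the two quantities can differ; for example a rotation by a right angle has (Ax, x) = 0 for every real x although its norm is 1.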
12
Spectral Theory I: General theory
12.1 Spectrum and point spectrum
Let H be a complex Hilbert space and T ∈ B(H, H); then the point spectrum
of T consists of the set of all eigenvalues,
σp(T) = {λ ∈ C : Tx = λx for some non-zero x ∈ H}.
Clearly |λ| ≤ ‖T‖op for any λ ∈ σp(T), since if Tx = λx then
|λ|‖x‖ = ‖λx‖ = ‖Tx‖ ≤ ‖T‖op‖x‖.
Example 12.1 Consider the right shift operator σr on ℓ2 . This operator

has no eigenvalues, since σr x = λx implies that
(0, x1 , x2 , . . .) = λ(x1 , x2 , x3 , . . .)
and so
λx1 = 0, λx2 = x1 , λx3 = x2 , . . . .
If λ ≠ 0 then this implies that x1 = 0, and then x2 = x3 = x4 = . . . = 0,

and so λ is not an eigenvalue. If λ = 0 then we also obtain x = 0, and so
there are no eigenvalues, i.e. σp(σr) = ∅.
Example 12.2 Consider the left shift operator σl on ℓ²; λ ∈ C is an eigenvalue
if σl x = λx for some non-zero x, i.e. if
(x2, x3, x4, . . .) = λ(x1, x2, x3, . . .),
i.e. if
x2 = λx1, x3 = λx2, x4 = λx3, . . . .
Given λ this gives a candidate ‘eigenfunction’
x = (1, λ, λ², λ³, . . .),
which is an element of ℓ² provided that
\[ \sum_{n=0}^{\infty} |\lambda|^{2n} = \frac{1}{1 - |\lambda|^2} < \infty, \]
which is the case for any λ with |λ| < 1. It follows that
{λ ∈ C : |λ| < 1} ⊆ σp(σl).
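The candidate eigenvector can be examined on a truncation. This sketch assumes λ = 0.5 and keeps the first 60 entries; both numbers are arbitrary illustrative choices.

```python
import math

lam = 0.5
N = 60
x = [lam ** n for n in range(N)]   # truncation of (1, lam, lam^2, ...)

# Left shift of the truncation, compared entry by entry with lam * x.
shifted = x[1:]
residual = max(abs(s - lam * t) for s, t in zip(shifted, x))

# Partial sum of |lam|^(2n) from n = 0, compared with the closed form 1 / (1 - |lam|^2).
norm_squared = sum(abs(t) ** 2 for t in x)
limit = 1 / (1 - abs(lam) ** 2)
close_to_limit = math.isclose(norm_squared, limit, rel_tol=1e-12)
```

For |λ| ≥ 1 the partial sums grow without bound, so the same sequence is no longer an element of ℓ².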
If A is a linear operator on a finite-dimensional space V then λ ∈ C is an

eigenvalue of A if
Ax = λx for some non-zero x ∈ V.
In this case λ is an eigenvalue if and only if A−λI is not invertible (recall that
you can find the eigenvalues of an n × n matrix by solving det(A − λI) = 0).
However, this is no longer true in infinite-dimensional spaces. Before we

define the spectrum, we briefly discuss the definition of the inverse of a linear
operator.
As with the theory of matrices, the concept of the inverse of a general

linear operator is extremely useful. We say that A is injective if the equation
Ax = y
has a unique solution for every y ∈ range(A). We say that A is invertible if
it is injective and the range of A is equal to H. In this case we define the
inverse of A, A−1 , by A−1 y = x. It is easy to check that
AA−1 u = u for all u ∈ range(A) and A−1 Au = u for all u ∈ H.
Exercise 12.3 Show that if A is linear and A−1 exists then it is linear too.
The injectivity of A is equivalent to the triviality of its kernel.
Lemma 12.4 A is injective iff Ker(A) = {0}.

Proof Suppose that A is injective. Then the equation Ax = y has a unique

solution for any y ∈ Range(A). However, if Ker(A) contains some non-zero
element z then A(x + z) = y also, so Ker(A) must be {0}. Conversely, if A is
not injective then for some y ∈ Range(A) there are two distinct solutions,
x1 and x2, of Ax = y, and so A(x1 − x2) = 0, giving a non-zero element of
Ker(A).
We can now make the following definition:
Definition 12.5 The resolvent set of T , R(T ), is

R(T ) = {λ ∈ C : (T − λI)−1 ∈ B(H, H)}.
The spectrum σ(T ) of T ∈ B(H, H) is the complement of R(T ),
σ(T ) = C \ R(T ),
i.e. the spectrum of T is the set of all complex λ for which T − λI does not
have a bounded inverse defined on all of H.
Clearly σp(T) ⊆ σ(T), since if there is a non-zero z with Tz = λz then

Ker(T − λI) ≠ {0} and so T − λI is not invertible (using Lemma 12.4). But
the spectrum can be much larger than the point spectrum; a nice example
will be a consequence of the fact that σ(T∗) is the complex conjugate of σ(T).
Lemma 12.6 σ(T ∗ ) = {λ̄ : λ ∈ σ(T )}.
Proof If λ ∉ σ(T) then T − λI has a bounded inverse,
(T − λI)(T − λI)⁻¹ = I = (T − λI)⁻¹(T − λI).
Taking adjoints we obtain
[(T − λI)⁻¹]∗(T∗ − λ̄I) = I = (T∗ − λ̄I)[(T − λI)⁻¹]∗,
and so T∗ − λ̄I has a bounded inverse, i.e. λ̄ ∉ σ(T∗). Starting instead with
T∗ we deduce that λ ∉ σ(T∗) ⇒ λ̄ ∉ σ(T), which completes the proof.
Example 12.7 Let σr be the right-shift operator on ℓ2 . We saw above that

σr has no eigenvalues, but that for σr∗ = σl the interior of the unit disc is
contained in the point spectrum. It follows that
{λ ∈ C : |λ| < 1} ⊆ σ(σr )
even though σp (σr ) = ∅.
We have already seen that any eigenvalue λ of T must satisfy |λ| ≤ kT kop .
We now show that this also holds for any λ ∈ σ(T ); the argument is more
subtle, and based on considering how to solve the linear equation (I − T )x =
y.
Theorem 12.8 Suppose that T ∈ B(H, H) with ‖T‖ < 1. Then (I − T)⁻¹ ∈

B(H, H) and
(I − T)⁻¹ = I + T + T² + · · ·
with
‖(I − T)⁻¹‖ ≤ (1 − ‖T‖)⁻¹.
Proof Since
‖Tⁿx‖ ≤ ‖T‖‖Tⁿ⁻¹x‖
it follows that ‖Tⁿ‖ ≤ ‖T‖ⁿ. Therefore if we consider
Vn = I + T + · · · + Tⁿ
we have (for n > m)
\[ \begin{aligned} \|V_n - V_m\| &= \|T^{m+1} + \cdots + T^{n-1} + T^n\| \\ &\le \|T^{m+1}\| + \cdots + \|T^{n-1}\| + \|T^n\| \\ &\le \|T\|^{m+1} + \cdots + \|T\|^{n-1} + \|T\|^n \\ &\le \|T\|^{m+1}\,\frac{1}{1 - \|T\|}. \end{aligned} \]
It follows that {Vn} is Cauchy in the operator norm, and so converges to
some V ∈ B(H, H) with
‖V‖ ≤ 1 + ‖T‖ + ‖T‖² + · · · = [1 − ‖T‖]⁻¹.
Clearly
V(I − T) = (I + T + T² + · · · )(I − T) = (I + T + T² + · · · ) − (T + T² + T³ + · · · ) = I
and similarly (I − T)V = I.
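The partial sums Vn can be computed explicitly for a small matrix. The following sketch assumes a particular 2 × 2 matrix T with norm below 1 and uses 60 terms of the series; both are illustrative choices.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

identity = [[1.0, 0.0], [0.0, 1.0]]
T = [[0.2, 0.1], [0.0, 0.3]]   # hypothetical matrix; its norm is well below 1

# V = I + T + T^2 + ... + T^60
V = [row[:] for row in identity]
power = [row[:] for row in identity]
for _ in range(60):
    power = matmul(power, T)
    V = add(V, power)

# V (I - T) should be the identity up to rounding error.
product = matmul(V, sub(identity, T))
error = max(abs(product[i][j] - identity[i][j]) for i in range(2) for j in range(2))
```

Algebraically the product of the partial sum with I − T is I minus the next power of T, so the error decays like a power of ‖T‖.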
As promised:
Corollary 12.9 The spectrum of T satisfies σ(T) ⊆ {λ ∈ C : |λ| ≤ ‖T‖op}.

Proof For λ ≠ 0 we have T − λI = λ((1/λ)T − I). So if I − (1/λ)T is invertible, λ ∉ σ(T).
But for |λ| > ‖T‖op we have ‖(1/λ)T‖op < 1, and the above theorem shows that
T − λI is invertible and the result follows.
We now show that the spectrum must also be closed, by showing that its
complement (the resolvent set) is open. To this end, we prove the following
theorem, which shows that the set of bounded linear operators with bounded
inverses defined on all of H is open, i.e. that this property is stable under
perturbation.
Theorem 12.10 Let H be a Hilbert space and T ∈ B(H, H) such that

T⁻¹ ∈ B(H, H). Then for any U ∈ B(H, H) with
‖U‖ < ‖T⁻¹‖⁻¹
the operator T + U is invertible with
\[ \|(T + U)^{-1}\| \le \frac{\|T^{-1}\|}{1 - \|U\|\,\|T^{-1}\|}. \tag{12.1} \]
Proof Let P = T⁻¹(T + U) = I + T⁻¹U. Then since by assumption
‖T⁻¹‖‖U‖ < 1 it follows from Theorem 12.8 (applied to −T⁻¹U) that P is invertible with
\[ \|P^{-1}\| \le \frac{1}{1 - \|T^{-1}\|\,\|U\|}. \]
Using the definition of P we have
T⁻¹(T + U)P⁻¹ = P⁻¹T⁻¹(T + U) = I;
from the first of these identities we have
(T + U)P⁻¹T⁻¹ = I
and so
(T + U)⁻¹ = P⁻¹T⁻¹
and (12.1) follows.
Corollary 12.11 If T ∈ B(H, H) then the spectrum of T is closed.
Proof We show that the resolvent set R(T), the complement of σ(T), is
open. Indeed, if λ ∈ R(T) then T − λI is invertible. Theorem 12.10 shows

that (T − λI) − δI is invertible for all
|δ| < ‖(T − λI)⁻¹‖⁻¹,
i.e. λ + δ ∈ R(T) for all such δ, and so R(T) is open.
Lemma 12.12 The spectra of σl and of σr are both equal to the closed unit disc
in the complex plane:
σ(σ·) = {λ ∈ C : |λ| ≤ 1}.
Proof We showed earlier that for the shift operators σr and σl on ℓ², both
σ(σr) and σ(σl) contain {λ ∈ C : |λ| < 1}.
We have already shown that ‖σ·‖op = 1, so Corollary 12.9 shows that
σ(σ·) ⊆ {λ ∈ C : |λ| ≤ 1}.
Since the spectrum is closed (Corollary 12.11) and contains the open unit disc, it also contains the closure of this disc, i.e.
σ(σ·) ⊇ {λ ∈ C : |λ| ≤ 1}.
It follows that in fact
σ(σ·) = {λ ∈ C : |λ| ≤ 1}.
A question on the final examples sheet shows that if T is self-adjoint then

σ(T ) ⊆ R.
13
Spectral theory II: compact self-adjoint operators
We now consider eigenvalues of compact self-adjoint linear operators on a

Hilbert space. It is convenient to restrict attention to Hilbert spaces over C,
but this is no restriction, since we can always consider the ‘complexification’
of a Hilbert space over R.
13.1 Complexification and real eigenvalues
Exercise 13.1 Let H be a Hilbert space over R, and define its complexification
HC as the vector space
HC = {x + iy : x, y ∈ H},
equipped with operations + and ∗ defined via
(x + iy) + (w + iz) = (x + w) + i(y + z), x, y, w, z ∈ H

and
(a + ib) ∗ (x + iy) = (ax − by) + i(bx + ay), a, b ∈ R, x, y ∈ H.
Show that equipped with the inner product
(x + iy, w + iz)HC = (x, w) + i(y, w) − i(x, z) + (y, z)
HC is a Hilbert space.
It follows that
‖x + iy‖²_{HC} = ‖x‖² + ‖y‖².
Just as we can complexify a Hilbert space H to give HC, we can complexify

a linear operator T that acts on H to a linear operator T̃ that acts on HC:
Lemma 13.2 Let H be a real Hilbert space and HC its complexification.

Given T ∈ B(H, H), extend T to a linear operator T̃ : HC → HC via the
definition
T̃(x + iy) = Tx + iTy, x, y ∈ H.
Then T̃ ∈ B(HC, HC) with ‖T̃‖ = ‖T‖, any eigenvalue of T is an eigenvalue of T̃, and
any real eigenvalue of T̃ is an eigenvalue of T.
Proof Clearly
\[ \|\tilde{T}(x+iy)\|_{H_C}^2 = \|Tx\|^2 + \|Ty\|^2 \le \|T\|_{B(H,H)}^2 \bigl(\|x\|^2 + \|y\|^2\bigr) = \|T\|_{B(H,H)}^2 \|x+iy\|_{H_C}^2, \]
so ‖T̃‖ ≤ ‖T‖. But also ‖Tx‖ = ‖T̃(x + i0)‖, and so ‖T‖ ≤ ‖T̃‖. So in fact

‖T̃‖ = ‖T‖.
If λ is an eigenvalue of T with eigenvector x then
T̃ (x + i0) = T x = λx = λ(x + i0);
while if T̃ has eigenvalue λ ∈ R with eigenvector x + iy (with either x or y
non-zero), it follows that
T x + iT y = T̃ (x + iy) = λ(x + iy) = λx + iλy,
and so T x = λx and T y = λy, so since x or y is non-zero, λ is an eigenvalue
of T .
Lemma 13.3 Let H be a real Hilbert space and T ∈ B(H, H) a self-adjoint

operator. Then T̃ as defined above is a self-adjoint operator on HC .
Proof For ξ, η ∈ HC , ξ = x + iy, η = u + iv, x, y, u, v ∈ H,

(T̃ξ, η) = (T̃(x + iy), u + iv)HC
= (T x + iT y, u + iv)
= (T x, u) − i(T x, v) + i(T y, u) + (T y, v)
= (x, T u) − i(x, T v) + i(y, T u) + (y, T v)
= (x + iy, T u + iT v)
= (ξ, T̃ η).
If T is self-adjoint then all its eigenvalues are real:
Theorem 13.4 Let T be a self-adjoint operator on a Hilbert space H. Then

all the eigenvalues of T are real and the eigenvectors corresponding to distinct
eigenvalues are orthogonal.
Proof Suppose that Tx = λx with x ≠ 0. Then
λ‖x‖² = (λx, x) = (Tx, x) = (x, T∗x) = (x, Tx) = (x, λx) = λ̄‖x‖²,
i.e. λ = λ̄.
Now if λ and µ are distinct eigenvalues with Tx = λx and Ty = µy then, since µ is real,
0 = (Tx, y) − (x, Ty) = (λx, y) − (x, µy) = (λ − µ)(x, y),
and so (x, y) = 0.
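Both conclusions of Theorem 13.4 can be seen on a small symmetric matrix. The matrix and the two eigenvectors below are assumptions chosen so that all the arithmetic is exact in integers.

```python
def apply(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def inner(x, y):
    """Real Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

A = [[2, 1], [1, 2]]   # hypothetical symmetric matrix
u = [1, 1]             # eigenvector for the eigenvalue 3
v = [1, -1]            # eigenvector for the eigenvalue 1

Au = apply(A, u)
Av = apply(A, v)
orthogonality = inner(u, v)   # eigenvectors for distinct eigenvalues
```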
Corollary 13.5 If T is a self-adjoint operator on a real Hilbert space H,

and T̃ is its complexification defined above, σp (T ) = σp (T̃ ).
13.2 Compact operators
We will develop our spectral theory for operators that are self-adjoint and
‘compact’ according to the following definition:
Definition 13.6 Let X and Y be normed spaces. Then a linear operator

T : X → Y is compact if for any bounded sequence {xn} in X, the sequence
{Txn} in Y has a convergent subsequence (whose limit lies in Y).
Note that a compact operator must be bounded, since otherwise there

exists a sequence in X with ‖xn‖ = 1 but ‖Txn‖ → ∞, and clearly {Txn}
cannot have a convergent subsequence.
Example 13.7 Take T ∈ B(X, Y ) with finite-dimensional range. Then T

is compact, since any bounded sequence in a finite-dimensional space has a
convergent subsequence.
Theorem 13.8 Suppose that X is a normed space and Y is a Banach space.

If {Kn} is a sequence of compact (linear) operators in B(X, Y) converging
to some K ∈ B(X, Y) in the operator norm, i.e.
\[ \sup_{\|x\|_X = 1} \|K_n x - Kx\|_Y \to 0 \]
as n → ∞, then K is compact.
Proof Let {xn} be a bounded sequence in X. Then since K1 is compact,

K1(xn) has a convergent subsequence, $K_1(x_{n_{1,j}})$. Since $x_{n_{1,j}}$ is bounded,
$K_2(x_{n_{1,j}})$ has a convergent subsequence, $K_2(x_{n_{2,j}})$. Repeat this process to
get a family of nested subsequences, $x_{n_{k,j}}$, with $K_l(x_{n_{k,j}})$ convergent for all
l ≤ k.
Now consider the diagonal sequence $y_j = x_{n_{j,j}}$. Since {yj} is a subsequence
of $\{x_{n_{n,i}}\}$ for j ≥ n, it follows that Kn(yj) converges (as j → ∞) for every n.
We now show that K(yj) is Cauchy, and hence convergent (since Y is complete), to complete
the proof. Choose ε > 0, and use the triangle inequality to write
\[ \|K(y_i) - K(y_j)\|_Y \le \|K(y_i) - K_n(y_i)\|_Y + \|K_n(y_i) - K_n(y_j)\|_Y + \|K_n(y_j) - K(y_j)\|_Y. \]
Since {yj} is bounded and Kn → K in the operator norm, pick n large

enough that
‖K(yj) − Kn(yj)‖_Y ≤ ε/3
for all yj in the sequence. For such an n, the sequence Kn(yj) is Cauchy,
and so there exists an N such that for i, j ≥ N we can guarantee
‖Kn(yi) − Kn(yj)‖_Y ≤ ε/3.
So now
‖K(yi) − K(yj)‖_Y ≤ ε for all i, j ≥ N,
and {K(yn)} is a Cauchy sequence.
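A standard illustration of Theorem 13.8, not taken from the notes, is a diagonal operator on ℓ² with entries 1/j, approximated by the finite-rank operators that keep only the first n entries. The sketch below works on a truncation to 50 coordinates and uses the fact that the operator norm of a diagonal operator is the largest modulus of its entries; the names and sizes are assumptions.

```python
N = 50
diagonal = [1.0 / j for j in range(1, N + 1)]   # entries 1, 1/2, 1/3, ...

def truncate(d, n):
    """Keep the first n diagonal entries and set the rest to zero (a finite-rank operator)."""
    return [d[j] if j < n else 0.0 for j in range(len(d))]

def diagonal_op_norm(d):
    """Operator norm of a diagonal operator: the largest modulus of its entries."""
    return max(abs(t) for t in d)

# Norm of the difference between the operator and its rank-n truncation.
errors = [
    diagonal_op_norm([a - b for a, b in zip(diagonal, truncate(diagonal, n))])
    for n in (1, 5, 10, 20)
]
```

The error after keeping n entries is 1/(n + 1), which tends to zero, so the limiting operator is a norm limit of finite-rank (hence compact) operators.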
We now use this theorem to show the following:
Proposition 13.9 The integral operator T : L²(a, b) → L²(a, b) given by
\[ [Tu](x) = \int_a^b K(x,y)u(y)\,dy \]
with
\[ \int_a^b\!\int_a^b |K(x,y)|^2\,dx\,dy < \infty \]
is compact.
Proof Let {φj} be an orthonormal basis for L²(a, b). It follows that {φi(x)φj(y)}
is an orthonormal basis for L²((a, b) × (a, b)). If we write K(x, y) in terms
of this basis we have
\[ K(x,y) = \sum_{i,j=1}^{\infty} k_{ij}\,\phi_i(x)\phi_j(y), \]
where the coefficients kij are given by
\[ k_{ij} = \int_a^b\!\int_a^b K(x,y)\phi_i(x)\phi_j(y)\,dx\,dy, \]
and the sum converges in L²((a, b) × (a, b)). Since {φi(x)φj(y)} is a basis
we have
\[ \|K\|_{L^2((a,b)\times(a,b))}^2 = \int_a^b\!\int_a^b |K(x,y)|^2\,dx\,dy = \sum_{i,j=1}^{\infty} |k_{ij}|^2. \tag{13.1} \]
We now approximate T by operators derived from finite truncations of
the expansion of K(x, y). We set
\[ K_n(x,y) = \sum_{i,j=1}^{n} k_{ij}\,\phi_i(x)\phi_j(y), \]
and
\[ [T_n u](x) = \int_a^b K_n(x,y)u(y)\,dy. \]
If u ∈ L²(a, b) is given by $u = \sum_{l=1}^{\infty} c_l \phi_l$, then
\[ (T_n u)(x) = \sum_{i,j=1}^{n} \sum_{l=1}^{\infty} \int_a^b k_{ij}\,\phi_i(x)\phi_j(y)\,c_l\phi_l(y)\,dy = \sum_{i,j=1}^{n} k_{ij}\,c_j\,\phi_i(x). \]
Since Tn u is therefore a linear combination of $\{\phi_i\}_{i=1}^{n}$, the range of Tn has
dimension at most n. It follows that Tn is compact for each n; if we can show that
Tn → T in the operator norm then we can use Theorem 13.8 to show that T
is compact.
This is straightforward, since
\[ \|(T - T_n)u\|^2 = \int_a^b \left| \int_a^b \bigl(K(x,y) - K_n(x,y)\bigr)u(y)\,dy \right|^2 dx \le \left( \int_a^b\!\int_a^b |K(x,y) - K_n(x,y)|^2\,dx\,dy \right) \int_a^b |u(y)|^2\,dy, \]
i.e.
\[ \|T - T_n\|^2 \le \int_a^b\!\int_a^b |K(x,y) - K_n(x,y)|^2\,dx\,dy = \int_a^b\!\int_a^b \Bigl| \sum_{\max(i,j) > n} k_{ij}\,\phi_i(x)\phi_j(y) \Bigr|^2 dx\,dy = \sum_{\max(i,j) > n} |k_{ij}|^2, \]
using the expansion of K and Kn. Convergence of Tn to T follows since the
sum in (13.1) is finite.
We now show that any compact self-adjoint operator has at least one
eigenvalue. (Recall that σr is not even normal, so this is no contradiction.)
Theorem 13.10 Let H be a Hilbert space and T ∈ B(H, H) a compact

self-adjoint operator. Then at least one of ±kT kop is an eigenvalue of T .
Proof We assume that T ≠ 0, otherwise the result is trivial. From Theorem
11.18,
\[ \|T\|_{\mathrm{op}} = \sup_{\|x\|=1} |(Tx, x)|. \]
Thus there exists a sequence xn of unit vectors such that
(Txn, xn) → ±‖T‖op = α.
Since T is compact there is a subsequence $x_{n_j}$ such that $Tx_{n_j}$ is convergent
to some y. Relabel $x_{n_j}$ as xn again.
Now consider
‖Txn − αxn‖² = ‖Txn‖² + α² − 2α(Txn, xn)
≤ 2α² − 2α(Txn, xn);
by the choice of xn, the right-hand side tends to zero as n → ∞. It follows,
since Txn → y, that
αxn → y,
and since α ≠ 0 is fixed we must have xn → x for some x ∈ H. Therefore
Txn → Tx = αx. It follows that
Tx = αx
and clearly x ≠ 0, since ‖y‖ = |α|‖x‖ = ‖T‖op ≠ 0.
Note that any eigenvalue must satisfy |λ| ≤ ‖T‖op, since if Tx = λx
we have
|λ|‖x‖² = |(λx, x)| = |(Tx, x)| ≤ ‖Tx‖‖x‖ ≤ ‖T‖op‖x‖²;
it follows that ‖T‖op = sup{|λ| : λ ∈ σp(T)}.
Proposition 13.11 Let T be a compact self-adjoint operator on a Hilbert

space H. Then σp (T ) is either finite or consists of a countable sequence
tending to zero. Furthermore every distinct non-zero eigenvalue corresponds
to a finite number of linearly independent eigenvectors.
Proof Suppose that T has infinitely many eigenvalues that do not form a
sequence tending to zero. Then for some ε > 0 there exists a sequence of
distinct eigenvalues with |λn| > ε. Let xn be a corresponding sequence of
eigenvectors with ‖xn‖ = 1; then
‖Txn − Txm‖² = (Txn − Txm, Txn − Txm) = |λn|² + |λm|² ≥ 2ε²
since (xn, xm) = 0 (by Theorem 13.4). It follows that {Txn} can have no convergent
subsequence, which contradicts the compactness of T.
Now suppose that for some non-zero eigenvalue λ there exists an infinite number of
linearly independent eigenvectors $\{e_n\}_{n=1}^{\infty}$. Using the Gram-Schmidt process
we can find a countably infinite orthonormal set of eigenvectors, since any
non-zero linear combination of the {ej} is still an eigenvector:
\[ T\Bigl( \sum_{j=1}^{n} \alpha_j e_j \Bigr) = \sum_{j=1}^{n} \alpha_j T e_j = \lambda \Bigl( \sum_{j=1}^{n} \alpha_j e_j \Bigr). \]
Now, taking {en} to be this orthonormal set, for n ≠ m we have
\[ \|Te_n - Te_m\| = \|\lambda e_n - \lambda e_m\| = |\lambda|\,\|e_n - e_m\| = \sqrt{2}\,|\lambda|. \]
It follows that {Ten} can have no convergent subsequence, again contradicting
the compactness of T. (Note that this second part does not in fact use
the fact that T is self-adjoint.)
Lemma 13.12 Let T ∈ B(H, H) and let S be a closed linear subspace of H

such that T S ⊆ S. Then T ∗ S ⊥ ⊆ S ⊥ .
Proof Let x ∈ S ⊥ and y ∈ S. Then T y ∈ S and so (T y, x) = (y, T ∗ x) = 0

for all y ∈ S, i.e. T ∗ x ∈ S ⊥ .
Since we will apply this lemma when T is self-adjoint, for such a T we

have
TS ⊆ S ⇒ T S⊥ ⊆ S⊥
for any closed linear subspace S of H.
Theorem 13.13 (Hilbert-Schmidt Theorem). Let H be a Hilbert space and

T ∈ B(H, H) be a compact self-adjoint operator. Then there exists a finite
or countably infinite orthonormal sequence {wn } of eigenvectors of T with
corresponding non-zero real eigenvalues {λn } such that for all x ∈ H
\[ Tx = \sum_{j} \lambda_j (x, w_j) w_j. \tag{13.2} \]
Proof By Theorem 13.10 there exists a w1 such that ‖w1‖ = 1 and Tw1 =

±‖T‖w1. Consider the subspace of H perpendicular to w1,
H2 = {w1}⊥.
Then since T is self-adjoint, Lemma 13.12 shows that T leaves H2 invariant.
If we consider T2 = T|H2 then we have T2 ∈ B(H2, H2) with T2 compact;
this operator is still self-adjoint, since for all x, y ∈ H2
(x, T2y) = (x, Ty) = (T∗x, y) = (Tx, y) = (T2x, y).
Now apply Theorem 13.10 to the operator T2 on the Hilbert space H2 to
find an eigenvalue λ2 = ±‖T2‖ and an eigenvector w2 ∈ H2 with ‖w2‖ = 1.
Continue this process as long as Tn ≠ 0.
If Tn = 0 for some n then for any x ∈ H we have
\[ y := x - \sum_{j=1}^{n-1} (x, w_j) w_j \in H_n. \]
Then
\[ 0 = T_n y = Ty = Tx - \sum_{j=1}^{n-1} (x, w_j) T w_j = Tx - \sum_{j=1}^{n-1} \lambda_j (x, w_j) w_j, \]
which is (13.2).
If Tn is never zero then consider
\[ y_n := x - \sum_{j=1}^{n-1} (x, w_j) w_j \in H_n. \]
Then we have
\[ \|x\|^2 = \|y_n\|^2 + \sum_{j=1}^{n-1} |(x, w_j)|^2, \]
and so ‖yn‖ ≤ ‖x‖. It follows that
\[ \Bigl\| Tx - \sum_{j=1}^{n-1} \lambda_j (x, w_j) w_j \Bigr\| = \|T y_n\| \le \|T_n\|\,\|y_n\| \le |\lambda_n|\,\|x\|, \]
and since |λn| → 0 as n → ∞ (by Proposition 13.11) we have (13.2).
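In finite dimensions the expansion (13.2) is the familiar spectral decomposition of a symmetric matrix, and it can be verified by hand or in a few lines. The matrix, its normalised eigenvectors, and the test vector x below are assumptions chosen for illustration.

```python
import math

def apply(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def inner(x, y):
    """Real Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

A = [[2.0, 1.0], [1.0, 2.0]]              # hypothetical symmetric matrix
r = 1 / math.sqrt(2)
eigenpairs = [(3.0, [r, r]), (1.0, [r, -r])]   # (eigenvalue, orthonormal eigenvector)

x = [1.0, 0.0]
Ax = apply(A, x)

# Rebuild A x as the sum of lambda_j (x, w_j) w_j over the eigenpairs.
reconstruction = [0.0, 0.0]
for lam, w in eigenpairs:
    coefficient = lam * inner(x, w)
    reconstruction = [reconstruction[i] + coefficient * w[i] for i in range(2)]
```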
There is a partial converse to this theorem on the last examples sheet: if

H is a Hilbert space, {ej } is an orthonormal set in H, and T ∈ B(H, H) is
such that
\[ Tu = \sum_{j=1}^{\infty} \lambda_j (u, e_j) e_j \quad\text{with } \lambda_j \in \mathbb{R} \text{ and } \lambda_j \to 0 \text{ as } j \to \infty \]
then T is compact and self-adjoint.
The orthonormal sequence constructed in this theorem is only a basis for

the range of T ; however, we do have:
Corollary 13.14 Let H be an infinite-dimensional Hilbert space and T ∈

B(H, H) a compact self-adjoint operator. Then there exists an orthonormal
basis E of H consisting of eigenvectors of T, and for any x ∈ H
\[ Tx = \sum_{e \in E} \lambda_e (x, e) e, \]
where Te = λe e.
Proof From Theorem 13.13 we have a finite or countable sequence {wn } of

eigenvectors of T such that
∞
X
Tx = λj (x, wj )wj . (13.3)
j=1
Now let F be an orthonormal basis for Ker T (this exists since Ker(T) is
a Hilbert space in its own right, and every Hilbert space has an orthonormal
basis); each f ∈ F is an eigenvector of T with eigenvalue zero, and since
T f = 0 but T wj = λj wj with λj ≠ 0, we know that (f, wk) = 0 for all f ∈ F,
k ∈ N. So F ∪ {wj} is an orthonormal set in H.
Now, (13.3) implies that
\[ T \Big[ x - \sum_{j=1}^{\infty} (x, w_j) w_j \Big] = 0, \]
i.e. that $x - \sum_{j=1}^{\infty} (x, w_j) w_j \in \operatorname{Ker} T$. It follows that {wj} ∪ F is an
orthonormal basis for H.
Exercise 13.15 Show that if T is invertible and satisfies the conditions

of Theorem 13.13 then there is an orthonormal basis of H consisting of
eigenvectors corresponding to non-zero eigenvalues of T .
We end this chapter with a corollary of Corollary 13.14 (!) that shows that
the eigenvalues are essentially all of the spectrum of a compact self-adjoint
operator.
Theorem 13.16 Let T be a compact self-adjoint operator on a Hilbert space.
Then $\sigma(T) = \overline{\sigma_p(T)}$, the closure of the point spectrum.
Note that this means that either σ(T) = σp(T) or σ(T) = σp(T) ∪ {0},
since σp(T) has no limit points except perhaps zero. So σ(T) = σp(T)
unless there are an infinite number of eigenvalues but zero itself is not an
eigenvalue.
Proof By the corollary of the HS Theorem (Corollary 13.14), we have
\[ T x = \sum_{e \in E} \lambda_e (x, e) e \]
for some orthonormal basis E of H.

Now take µ not in the closure $\overline{\sigma_p(T)}$. For such µ, it follows that there exists a δ > 0 such
that
\[ |\mu - \lambda| \ge \delta > 0 \quad \text{for all } \lambda \in \sigma_p(T) \]
(otherwise µ would lie in the closure of σp(T)). We use this to show that T − µI is invertible with
bounded inverse.
Now,
\[ (T - \mu I) x = y \iff \sum_{e \in E} (\lambda_e - \mu)(x, e) e = \sum_{e \in E} (y, e) e. \]
Taking the inner product of both sides with a particular f ∈ E, we have
\[ (T - \mu I) x = y \iff (\lambda_f - \mu)(x, f) = (y, f). \]
So we must have
\[ (x, f) = \frac{(y, f)}{\lambda_f - \mu}. \]
Since $\sum_{f \in E} |(y, f)|^2 < \infty$ and |λf − µ| ≥ δ, it follows that
\[ x = \sum_{f \in E} (x, f) f \]
converges, and that $\|x\| \le \delta^{-1} \|y\|$. So $(T - \mu I)^{-1}$ exists and is bounded.
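The norm bound at the end of this proof follows from Parseval's identity. Spelled out (a routine step added here for completeness, with E the orthonormal basis of eigenvectors used in the proof and λf the eigenvalue belonging to f ∈ E):

```latex
\[
\|x\|^2 = \sum_{f \in E} |(x, f)|^2
        = \sum_{f \in E} \frac{|(y, f)|^2}{|\lambda_f - \mu|^2}
        \le \frac{1}{\delta^2} \sum_{f \in E} |(y, f)|^2
        = \frac{\|y\|^2}{\delta^2}.
\]
```

The same estimate shows that the series defining x converges for every y ∈ H.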
14
Sturm-Liouville problems
We consider the Sturm-Liouville problem
\[ -\frac{d}{dx}\Big( p(x) \frac{du}{dx} \Big) + q(x) u = \lambda u \quad \text{with } u(a) = u(b) = 0. \tag{14.1} \]
As a shorthand, we write L[u] for the left-hand side of (14.1), i.e.
\[ L[u] = -(p(x) u')' + q(x) u. \]
We will assume that p(x) > 0 on [a, b] and that q(x) ≥ 0 on [a, b].
It was one of the major concerns of applied mathematics throughout the

nineteenth century to show that the solutions {un (x)} of (14.1) form a com-
plete basis for some appropriate space of functions (generalising the use of
Fourier series as a basis for L2 ). We can do this easily using the theory
developed in the last section.
However, first we have to turn the differential equation (14.1) into an

integral equation. We do this as follows:
Lemma 14.1 Let u1(x) and u2(x) be two linearly independent non-zero
solutions of
\[ -\frac{d}{dx}\Big( p(x) \frac{du}{dx} \Big) + q(x) u = 0. \]
Then
\[ W_p(u_1, u_2)(x) := p(x) [u_1'(x) u_2(x) - u_2'(x) u_1(x)] \]
is a non-zero constant.
Proof First we show that Wp is constant. Differentiate Wp with respect to
x, then use the fact that L[u1] = L[u2] = 0 to substitute pu′′ = qu − p′u′,
to give:
\begin{align*}
W_p' &= p' u_1' u_2 + p u_1'' u_2 + p u_1' u_2' - p' u_1 u_2' - p u_1' u_2' - p u_1 u_2'' \\
     &= p' (u_1' u_2 - u_2' u_1) + p (u_1'' u_2 - u_2'' u_1) \\
     &= p' (u_1' u_2 - u_2' u_1) + u_2 (q u_1 - p' u_1') - u_1 (q u_2 - p' u_2') \\
     &= 0.
\end{align*}
Now, if Wp ≡ 0 then, since p ≠ 0, we have
\[ u_1' u_2 - u_2' u_1 = 0 \quad \Rightarrow \quad \frac{u_1'}{u_1} = \frac{u_2'}{u_2}, \]
which can be integrated to give ln u1 = ln u2 + c, i.e. u1 = e^c u2, which
implies that u1 and u2 are proportional, contradicting the fact that they are
linearly independent.
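As a quick sanity check of the lemma, take the illustrative choice p ≡ 1, q ≡ 1 on [0, 1] (an assumption made here, not an example from the notes). Then u1(x) = sinh x and u2(x) = sinh(1 − x) both solve −u′′ + u = 0, and the addition formula for sinh gives Wp = cosh x sinh(1 − x) + cosh(1 − x) sinh x = sinh 1. The sketch below simply evaluates Wp at several points.

```python
import math

def p(x):
    return 1.0  # illustrative coefficient p(x) = 1 (an assumption)

def u1(x):
    return math.sinh(x)

def du1(x):
    return math.cosh(x)

def u2(x):
    return math.sinh(1.0 - x)

def du2(x):
    return -math.cosh(1.0 - x)

def wronskian_p(x):
    # W_p(u1, u2)(x) = p(x) [u1'(x) u2(x) - u2'(x) u1(x)]
    return p(x) * (du1(x) * u2(x) - du2(x) * u1(x))

values = [wronskian_p(k / 10) for k in range(11)]
print(values)
print(math.sinh(1.0))
```

Every entry of `values` should match sinh 1 up to rounding, so Wp is constant and non-zero in this example.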
Theorem 14.2 Suppose that u1(x) and u2(x) are two linearly independent
non-zero solutions of
\[ -\frac{d}{dx}\Big( p(x) \frac{dy}{dx} \Big) + q(x) y = 0, \]
with u1(a) = 0 and u2(b) = 0. Set $C = W_p(u_1, u_2)^{-1}$ and define
\[ G(x, y) = \begin{cases} C u_1(x) u_2(y) & a \le x < y, \\ C u_2(x) u_1(y) & y \le x \le b; \end{cases} \tag{14.2} \]
then the solution of L[u] = f is given by
\[ u(x) = \int_a^b G(x, y) f(y) \, dy. \tag{14.3} \]
Proof Writing (14.3) out in full we have
\[ u(x) = C u_2(x) \int_a^x u_1(y) f(y) \, dy + C u_1(x) \int_x^b u_2(y) f(y) \, dy. \]
Now,
\begin{align*}
u'(x) &= C u_2(x) u_1(x) f(x) + C u_2'(x) \int_a^x u_1(y) f(y) \, dy - C u_1(x) u_2(x) f(x) + C u_1'(x) \int_x^b u_2(y) f(y) \, dy \\
      &= C u_2'(x) \int_a^x u_1(y) f(y) \, dy + C u_1'(x) \int_x^b u_2(y) f(y) \, dy,
\end{align*}
and then since CWp(u1, u2) = 1,
\begin{align*}
u''(x) &= C u_2'(x) u_1(x) f(x) + C u_2''(x) \int_a^x u_1(y) f(y) \, dy - C u_1'(x) u_2(x) f(x) + C u_1''(x) \int_x^b u_2(y) f(y) \, dy \\
       &= -\frac{f(x)}{p(x)} + C u_2''(x) \int_a^x u_1(y) f(y) \, dy + C u_1''(x) \int_x^b u_2(y) f(y) \, dy.
\end{align*}
Now we have L[u] = −pu′′ − p′ u′ + qu, and since L is linear with L[u1 ] =
L[u2 ] = 0 it follows that
L[u] = f (x)
as claimed.
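The formula (14.3) can be tested numerically in a simple case. The sketch below is an illustration under stated assumptions rather than an example from the notes: it takes p ≡ 1, q ≡ 1 on [0, 1], so that u1(x) = sinh x and u2(x) = sinh(1 − x) satisfy u1(0) = 0, u2(1) = 0 and Wp = sinh 1, and it compares a midpoint-rule approximation of the integral with the closed-form solution of −u′′ + u = 1, u(0) = u(1) = 0. The function names are hypothetical.

```python
import math

# C = 1 / W_p(u1, u2) for u1 = sinh(x), u2 = sinh(1 - x), p = 1 (assumed example).
C = 1.0 / math.sinh(1.0)

def green(x, y):
    # G(x, y) as in (14.2) with a = 0, b = 1.
    if x < y:
        return C * math.sinh(x) * math.sinh(1.0 - y)
    return C * math.sinh(1.0 - x) * math.sinh(y)

def solve(f, x, n=2000):
    # Composite midpoint rule for u(x) = int_0^1 G(x, y) f(y) dy.
    h = 1.0 / n
    return h * sum(green(x, (i + 0.5) * h) * f((i + 0.5) * h) for i in range(n))

def exact(x):
    # Closed-form solution of -u'' + u = 1 with u(0) = u(1) = 0.
    return 1.0 - math.cosh(x - 0.5) / math.cosh(0.5)

def f(y):
    return 1.0

points = (0.0, 0.25, 0.5, 0.75, 1.0)
for x in points:
    print(x, solve(f, x), exact(x))
```

The two printed columns should agree to within the quadrature error, which is of order 1/n² here because G has a kink along y = x.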
We can now define a linear operator on L2(a, b) by the right-hand side of
(14.3):
\[ T f(x) = \int_a^b G(x, y) f(y) \, dy. \]
Since G is symmetric (see (14.2)), it follows that T is self-adjoint (see
Example 11.17), and we have proved that such a T is compact in Proposition
13.9.
If L[u] = λu, then u = T (λu). Since T is linear,
L[u] = λu ⇔ u = λT u.
If we can show that λ ≠ 0 then the eigenvectors (or ‘eigenfunctions’) of
the ODE boundary value problem L[u] = λu (which is just (14.1)) will be
exactly those of the operator T (for which T u = (1/λ)u). Since the eigenvectors
of T form an orthonormal basis for L2(a, b) (we will see that Ker(T) = {0}),
the same is true of the eigenfunctions of the Sturm-Liouville problem.
Theorem 14.3 The eigenfunctions of the problem (14.1) form a complete

orthonormal basis for L2 (a, b).
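Proof We show first that λ = 0 is not an eigenvalue of (14.1), i.e. there is
no non-zero u for which L[u] = 0. Indeed, if L[u] = 0 then we have
\begin{align*}
0 = (L[u], u) &= \int_a^b -(p u')' u + q |u|^2 \, dx \\
              &= \int_a^b p |u'|^2 + q |u|^2 \, dx - \big[ p(x) u'(x) u(x) \big]_a^b \\
              &= \int_a^b p |u'|^2 + q |u|^2 \, dx.
\end{align*}
Here the boundary term vanishes because u(a) = u(b) = 0.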
Since p > 0 on [a, b], it follows that u′ = 0 on [a, b], and so u must be
constant on [a, b]. Since u(a) = 0, it follows that u ≡ 0.
We now show that Ker(T) = {0}. Indeed, T f is the solution of L[u] = f,
i.e. f = L[T f]. So if T f = 0, it follows that f = 0.
So φ is an eigenfunction of the SL problem iff it is an eigenvector for T:
\[ L[\phi] = \lambda \phi \iff T \phi = \frac{1}{\lambda} \phi. \]
Since G(x, y) is symmetric and bounded, it follows from Examples 10.9
and 11.8 that T is a bounded self-adjoint operator; Proposition 13.9 shows
that T is also compact. It follows from Theorem 13.13 that T has a set of
orthonormal eigenfunctions {φj } with
T φj = µj φj ,
and since Ker(T ) = {0} the argument of Corollary 13.14 shows that those
form an orthonormal basis for L2 (a, b).
Comparing this with our original problem we obtain an infinite set of
eigenfunctions {φj} with corresponding eigenvalues $\lambda_j = \mu_j^{-1}$. Note that
now λj → ∞ as j → ∞. As above, the eigenfunctions {φj} form an
orthonormal basis of L2(a, b).
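The relation between the eigenvalues of T and those of the Sturm-Liouville problem can be checked numerically in a simple case. The sketch below is an illustration, not an example from the notes: it assumes p ≡ 1, q ≡ 1 on [0, 1], for which the Green's function built from u1(x) = sinh x and u2(x) = sinh(1 − x) has C = 1/sinh 1, and for which φk(x) = sin kπx satisfies L[φk] = (1 + k²π²)φk together with the boundary conditions. One then expects T φk = µk φk with µk = 1/(1 + k²π²). The function names are hypothetical.

```python
import math

# Green's function for -u'' + u on [0, 1] with Dirichlet conditions (assumed example).
C = 1.0 / math.sinh(1.0)

def green(x, y):
    if x < y:
        return C * math.sinh(x) * math.sinh(1.0 - y)
    return C * math.sinh(1.0 - x) * math.sinh(y)

def apply_T(phi, x, n=2000):
    # Composite midpoint rule for (T phi)(x) = int_0^1 G(x, y) phi(y) dy.
    h = 1.0 / n
    return h * sum(green(x, (i + 0.5) * h) * phi((i + 0.5) * h) for i in range(n))

def check(k, x=0.3):
    # Returns ((T phi_k)(x), mu_k * phi_k(x)) for phi_k(y) = sin(k pi y).
    def phi(y):
        return math.sin(k * math.pi * y)
    mu = 1.0 / (1.0 + (k * math.pi) ** 2)
    return apply_T(phi, x), mu * phi(x)

for k in (1, 2, 3):
    lhs, rhs = check(k)
    # lambda_k = 1 / mu_k grows with k while mu_k tends to zero.
    print(k, 1.0 + (k * math.pi) ** 2, lhs, rhs)
```

The last two columns should agree up to quadrature error, and the second column shows λk = 1/µk increasing as µk decreases.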
This has immediate applications for Fourier series. Indeed, if we consider
\[ -\frac{d^2 u}{dx^2} = \lambda u, \qquad u(0) = 0, \quad u(1) = 0, \]
which is (14.1) with p = 1, q = 0, it follows that the eigenfunctions of this
equation will form a basis for L2(0, 1). These are easily found by elementary
methods, and are
\[ \{\sin k\pi x\}_{k=1}^{\infty}. \]
It follows that, appropriately normalised, these functions form an orthonormal
basis for L2(0, 1), i.e. that any f ∈ L2(0, 1) can be expanded in the
form
\[ f(x) = \sum_{k=1}^{\infty} \alpha_k \sin k\pi x. \]
Thus begins the theory of Fourier series...
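To see such an expansion in action, here is a small numerical sketch. The test function f(x) = x(1 − x) and the helper names are choices made for this illustration, not taken from the notes. Since ‖sin kπx‖² = 1/2 in L2(0, 1), the coefficients are αk = 2 ∫₀¹ f(x) sin kπx dx, and for this f a direct integration by parts gives αk = 8/(kπ)³ for odd k and αk = 0 for even k.

```python
import math

def sine_coefficient(f, k, n=2000):
    # alpha_k = 2 * int_0^1 f(x) sin(k pi x) dx, by the composite midpoint rule.
    h = 1.0 / n
    total = sum(f((i + 0.5) * h) * math.sin(k * math.pi * (i + 0.5) * h)
                for i in range(n))
    return 2.0 * h * total

def partial_sum(alphas, x):
    # sum_{k=1}^{K} alpha_k sin(k pi x), where alphas[k - 1] holds alpha_k.
    return sum(a * math.sin((k + 1) * math.pi * x) for k, a in enumerate(alphas))

def f(x):
    return x * (1.0 - x)

alphas = [sine_coefficient(f, k) for k in range(1, 26)]
print(alphas[0], 8.0 / math.pi ** 3)   # first coefficient vs closed form
print(alphas[1])                       # even coefficients should be near zero
print(partial_sum(alphas, 0.5), f(0.5))
```

With 25 terms the partial sum at x = 1/2 is already close to f(1/2) = 1/4; the coefficients decay like 1/k³, so the series converges quickly for this smooth f.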
Exercise 14.4 Show that the solution of −d²u/dx² = f with u(0) = u(1) = 0 is given by
\[ u(x) = \int_0^1 G(x, y) f(y) \, dy, \]
where
\[ G(x, y) = \begin{cases} x(1 - y) & 0 \le x < y, \\ y(1 - x) & y \le x \le 1. \end{cases} \]
