
Applied Functional Analysis

Jorge Aarao

July 2019
Contents

1 Motivation, metric, and integration
1.1 Motivation
1.2 Metric spaces: definition and examples
1.3 Metric topology
1.4 Visualising nearness for functions
1.5 Convergence and completeness
1.6 The Lebesgue integral

2 The spaces ℓp
2.1 The spaces ℓp
2.2 Hölder’s and Minkowski’s inequalities for ℓp
2.3 ℓp is a metric space

3 The spaces Lp
3.1 Almost a metric
3.2 The spaces Lp (I)
3.3 Hölder’s and Minkowski’s inequalities for Lp
3.4 Lp is a complete metric space
3.5 Understanding Lp

4 Banach spaces
4.1 Linear spaces
4.1.1 Linear subspaces
4.2 The norm
4.3 The unit ball
4.3.1 Compactness and the unit ball
4.3.2 Convexity and the unit ball
4.4 Banach spaces
4.4.1 Questions of convergence
4.4.2 Something more about ℓp
4.4.3 Cartesian product
4.4.4 Equivalence of norms
4.4.5 Direct sums

5 Hilbert spaces
5.1 Inner product
5.2 Hilbert spaces
5.3 Convexity and optimisation
5.4 Orthogonality
5.5 Bases
5.5.1 The Gram-Schmidt process
5.5.2 Representing x on an orthogonal basis
5.5.3 Fourier series
5.5.4 The Haar basis (wavelets)

6 Linear functionals
6.1 Boundedness and continuity
6.2 Linear functionals on Hilbert space
Topic 1

Motivation, metric, and integration

In this chapter we want to accomplish three goals: (A) Offer a brief motivation to the study of
functional analysis; (B) Review concepts from metric spaces, especially the definitions of open
and closed sets, convergence, and completeness; (C) Introduce the Lebesgue integral.

1.1 Motivation
When motivating functional analysis, it is key to look first at linear algebra. Consider the
problem of finding a vector u = (x, y, z) solving the problem
\[
\begin{pmatrix} -5 & 1 & 2 \\ -1 & 2 & 1 \\ 0 & 3 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} 3 \\ 6 \\ 9 \end{pmatrix}.
\]

You can check that x = 1, y = 2, z = 3 solves the problem. So far, so good. This problem is
in the format M u = b. Generally we would write u = M^{-1} b to solve such a problem. Here,
however, we can’t do that, because this particular M has no inverse (check that det(M ) = 0).
Another way of saying this is that the equation M u = 0 has infinitely many solutions. Even
though this general method fails, the problem still has a solution, as seen before.
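All of these claims can be checked by hand, and also by machine. Here is a minimal sketch in plain Python (the helper names `matvec` and `det3` are mine, chosen only for this illustration). It confirms that u = (1, 2, 3) solves M u = b, that det(M) = 0, and that adding a multiple of the vector (1, −1, 3), which solves M v = 0, produces another solution.

```python
# Illustrative check of the 3x3 example; the helper names are hypothetical.
M = [[-5, 1, 2],
     [-1, 2, 1],
     [0, 3, 1]]
b = [3, 6, 9]

def matvec(A, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b_, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b_ * (d * i - f * g) + c * (d * h - e * g)

u = [1, 2, 3]
null_vector = [1, -1, 3]                      # a solution of M v = 0
other = [uk + 7 * vk for uk, vk in zip(u, null_vector)]

print(matvec(M, u))             # equals b
print(det3(M))                  # equals 0, so M has no inverse
print(matvec(M, null_vector))   # the zero vector
print(matvec(M, other))         # equals b again: a second solution
```

The factor 7 is arbitrary; any multiple of the null vector works, which is exactly why there are infinitely many solutions.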
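Now consider the following problem in partial differential equations: Find a function u(x, y)
such that
\[
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = \frac{1}{1 + x^2 + y^2},
\]
for all (x, y) ∈ R². If we just take the liberty of writing this problem as
\[
\left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) u = \frac{1}{1 + x^2 + y^2},
\]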

we see that it looks (in format) like M u = b, where
\[
M = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}, \qquad b = \frac{1}{1 + x^2 + y^2}.
\]

Taking a clue from linear algebra, we want to write
\[
u = \left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right)^{-1} \frac{1}{1 + x^2 + y^2}.
\]


Can this be done? (Never mind what that inverse means, we will tackle that later in our course.)
Let’s go back to linear algebra. In our example, it could not be done because det(M ) = 0
and M has no inverse. Equivalently, the equation M u = 0 has infinitely many solutions.
So we ask ourselves the question: How many solutions are there to the equation
\[
\left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) u = 0?
\]
As it turns out, there are infinitely many solutions! For example,
\[
u = ax + by, \qquad u = x^2 - y^2, \qquad u = xy,
\]
and many more. This means that the matrix ∂²/∂x² + ∂²/∂y² has an eigenvalue 0, whatever that
means.
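If you prefer to see numbers before doing the calculus, the three solutions can be tested with a finite-difference approximation of the operator. This is only a sketch: the function name `laplacian`, the step size h and the test point (0.7, −1.2) are my own arbitrary choices, and a numerical check at one point is of course not a proof.

```python
def laplacian(u, x, y, h=1e-3):
    """Five-point finite-difference approximation of u_xx + u_yy at (x, y)."""
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
            - 4.0 * u(x, y)) / (h * h)

candidates = {
    "ax + by with a = 2, b = -3": lambda x, y: 2.0 * x - 3.0 * y,
    "x^2 - y^2": lambda x, y: x * x - y * y,
    "xy": lambda x, y: x * y,
    "x^2 + y^2 (not a solution)": lambda x, y: x * x + y * y,
}

for name, u in candidates.items():
    # The first three values are zero up to rounding; the last is close to 4.
    print(name, round(laplacian(u, 0.7, -1.2), 6))
```

The last candidate is included as a sanity check: x² + y² is not a solution, and the approximation returns a value near 4 rather than 0.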
We can already see what functional analysis will be about. A matrix in linear algebra finds its
counterpart in a partial differential operator; the notion of eigenvalue (and eigenvector) extends
from matrices to operators; vectors become functions... So vector spaces become function spaces,
and we need to understand the concepts of linear algebra in the new context of function spaces.
Every question we used to have about vectors and matrices now needs to be answered about
functions and operators: What is a basis? What does it mean to be perpendicular? What are
eigenvalues? How do we compute inverses?
The goal is simply stated: To provide a solid foundation for solving differential equations.
In the very first instance, that is what functional analysis is all about.

1.2 Metric spaces: definition and examples


We will be working with spaces of functions, and we will need to answer the all-important
question: When are two functions close to each other?
A metric is a way to measure distances; at least, that is the notion we want to implement.
Let’s proceed at once to the formal definition, and then give examples.
Definition 1.1. Suppose X is a given non-empty set where we want to define a notion of
distance. A metric, or distance on X is a function

d : X × X → [0, ∞)

where the following properties hold true for any x, y, z ∈ X:


M1 d(x, y) = 0 ⇐⇒ x = y;
M2 d(x, y) = d(y, x);
M3 d(x, y) ≤ d(x, z) + d(z, y).
The number d(x, y) is said to be the distance from x to y. ⋄
Because we defined the function d to have [0, ∞) as its target space, we see that a metric,
and therefore a distance, can’t take negative values, which is a good thing. The first property
simply states that the distance from any element to itself is 0, and moreover the distance
between two different elements can’t be 0. That is also a good thing. The second property is
simply symmetry, a desirable property for distances. Finally, the third property is the famous
triangle inequality, which boils down to saying that if x, y, z are three vertices of a hypothetical
triangle, then by going from x to y through z we can’t possibly make our trip shorter.

Important things to keep in mind.

• Given any non-empty set X, it is possible to put a metric on it.

• Usually the same set X will support more than one metric, so there is no sense in thinking
that there is some intrinsic distance from x to y. Change the metric, change the distance,
change the notion of nearness, change everything.

Let’s move on to the examples, each of which will become important in its own right when
studying functional analysis.

Example 1. We can always define the discrete metric on a non-empty set X by setting
d(x, x) = 0, and d(x, y) = 1, for any x, y ∈ X with x ≠ y. We leave it as an exercise for you to
verify that this is a metric.
We can visualise this metric in some simple cases. If for example X has only two elements,
we can think of two points on a line, at distance 1 apart. If X has three elements, we think of the
vertices of an equilateral triangle on the plane. If X has four elements, we think of the vertices
of a tetrahedron in space. If X has five elements, then we need to move to four dimensions; and
what we lose in visualisation we gain in analogy. ⋄
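On a finite set the three properties can also be tested by brute force, which is a useful habit before attempting a proof. The sketch below is an illustration under my own naming (`discrete`, `is_metric`); it checks M1, M2 and M3 over all triples of a five-element set, and for contrast it also tests the squared difference on {0, 1, 2}, which fails the triangle inequality.

```python
from itertools import product

def discrete(x, y):
    """The discrete metric: 0 if the points coincide, 1 otherwise."""
    return 0 if x == y else 1

def is_metric(d, points):
    """Brute-force check of M1, M2, M3 on a finite collection of points."""
    for x, y, z in product(points, repeat=3):
        if (d(x, y) == 0) != (x == y):      # M1
            return False
        if d(x, y) != d(y, x):              # M2
            return False
        if d(x, y) > d(x, z) + d(z, y):     # M3
            return False
    return True

print(is_metric(discrete, ["a", "b", "c", "d", "e"]))    # True
print(is_metric(lambda x, y: (x - y) ** 2, [0, 1, 2]))   # False: 4 > 1 + 1
```

A check like this does not replace the exercise, since it only looks at one finite set, but it catches wrong guesses quickly.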

Example 2. We take X = R², and let’s define a few different metrics. For this example we
denote x = (a1 , b1 ), y = (a2 , b2 ). The first metric is the usual Euclidean metric, where

\[
d(x, y) = \sqrt{(a_1 - a_2)^2 + (b_1 - b_2)^2}.
\]

Properties M1 and M2 are easy enough to verify, but property M3 can be a challenge. You try.
With this metric, the distance from x to y is most readily associated in our minds with the
straight line segment from x to y.
Our second metric will be denoted by d1 so that we can distinguish it from the d we already
defined, and we set
d1 (x, y) = |a1 − a2 | + |b1 − b2 |.
Again, M1 and M2 are straightforward, and you should try proving M3. This metric could
be called the horizontal-vertical metric (a name I just invented), because of the following
geometric picture, which you should draw. Imagine that x and y are opposite vertices of a
rectangle, and that the rectangle has sides parallel to the coordinate axes. The value |a1 − a2 |
is the horizontal distance (so to speak) from x to y, while |b1 − b2 | is the vertical distance.
Hence the distance d1 (x, y) is the usual distance, but measured along two consecutive sides of
the rectangle, from x to y. (This distance is often called the taxicab distance, in a reference to
the street grid of Manhattan, but let’s not be too American-centric here.) We point out as an
irrelevant side note that with distance d there is only one shortest path from x to y, but with
distance d1 there are infinitely many shortest paths.
Our third distance will be

d∞ (x, y) = max{|a1 − a2 |, |b1 − b2 |}.

We will justify the sub-index in a moment. Again, M1 and M2 are simple, and in this case M3 is
not too bad either. In here the distance between x and y is simply the largest of the horizontal
and vertical distances.

Now let’s justify the indices. It turns out that, if 1 ≤ p < ∞ is a given number, then
\[
d_p(x, y) = \sqrt[p]{|a_1 - a_2|^p + |b_1 - b_2|^p}
\]
is a distance. Again, M3 is an invitation to a headache. If we set p = 1 we get distance d1 , and
if we set p = 2 we get distance d. If we take a limit as p → ∞, we get d∞ . ⋄
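The limiting behaviour as p grows is easy to watch numerically. Here is a small sketch (the function names `d_p` and `d_inf` and the two sample points are my own choices) that computes the distance between (0, 0) and (3, 4) for increasing p.

```python
def d_p(x, y, p):
    """The metric d_p on R^2, for 1 <= p < infinity."""
    return (abs(x[0] - y[0]) ** p + abs(x[1] - y[1]) ** p) ** (1.0 / p)

def d_inf(x, y):
    """The metric d_infinity on R^2."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

x, y = (0.0, 0.0), (3.0, 4.0)
for p in (1, 2, 4, 16, 64):
    print(p, d_p(x, y, p))
print("inf", d_inf(x, y))
```

For these two points d1 is 7 and d is 5, and the values decrease towards d∞ = 4 as p increases, because the larger of the two coordinate differences dominates the sum.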
Example 3. Let’s move away from examples that we can visualise. Consider X to be the space
of continuous functions f : [0, 1] → R. This space is denoted by C([0, 1], R), and often simply
by C([0, 1]). If f, g ∈ X, we define
\[
d(f, g) = \int_0^1 |f(t) - g(t)| \, dt.
\]

In words, the distance from f to g is the total unsigned area enclosed between
the graphs of f and g. (So perhaps we can visualise it, after all.) While property M2 is still
simple, note that property M1 is not as simple anymore. We are required to verify that
\[
\int_0^1 |f(t) - g(t)| \, dt = 0 \implies f(t) = g(t) \text{ for all } t \in [0, 1].
\]

This needs proof, using properties of continuous functions, ε-δ arguments, etc.
It is worth pointing out that this example is in fact very analogous to the previous R²
example, in that
\[
d_p(f, g) = \sqrt[p]{\int_0^1 |f(t) - g(t)|^p \, dt}
\]
is a metric for each number 1 ≤ p < ∞, and if we take the limit as p → ∞, then
\[
d_\infty(f, g) = \max_{0 \le t \le 1} |f(t) - g(t)|.
\]

Not a bad exercise to try to do. ⋄


One last example, and we are done.
Example 4. We know about R2 , where elements have two coordinates, and R3 , where elements
have three coordinates, etc. Let’s now imagine R∞ where an element will have infinitely many
coordinates, as in
x = (a1 , a2 , a3 , a4 , · · · ),
and each aj is a real number. We will write x = (an ) as shorthand. Now, the space R∞ is not
such a great space on which to define a metric, because secretly we want to generalise Example 2,
and define the distance d(x, y) between x = (an ) and y = (bn ) by setting

\[
d(x, y) = \sqrt{|a_1 - b_1|^2 + |a_2 - b_2|^2 + \cdots}.
\]
Clearly this has every chance in the world of being infinite, if we don’t restrict the possible
values of an and bn . We restrict our space, and define X = ℓ2 (N) (better known as little ell two)
in the following way: an element x = (an ) is in ℓ2 (N) if and only if
\[
\sum_{n=1}^{\infty} |a_n|^2 < \infty.
\]

Thus, the element x = (1, 1, 1, 1, · · · ) is in R∞ , but not in ℓ2 (N). It is now an exercise to show
that if x = (an ) and y = (bn ) are both in ℓ2 (N), then
\[
d(x, y) = \sqrt{\sum_{n=1}^{\infty} |a_n - b_n|^2}
\]

indeed defines a metric on X. As usual, M3 is the troublesome property.


Again, this generalises. We define the space ℓp (N), for 1 ≤ p < ∞, to be formed by those
elements x = (an ) such that
\[
\sum_{n=1}^{\infty} |a_n|^p < \infty.
\]

We define the metric on ℓp (N) to be
\[
d_p(x, y) = \sqrt[p]{\sum_{n=1}^{\infty} |a_n - b_n|^p}.
\]

The case p = ∞ is once more a limiting case, but it is easier to define ℓ∞ (N) to be formed by
those elements x = (an ) such that the sequence an is bounded in R. With that definition, the
distance becomes
\[
d_\infty(x, y) = \sup_n |a_n - b_n|.
\]

Note that the maximum is replaced by the supremum. ⋄
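The summation condition really does depend on p. A standard illustration is the sequence an = 1/n, which belongs to ℓ2 (N) but not to ℓ1 (N). The sketch below (with a helper name, `partial_sum`, of my own choosing) prints partial sums for p = 1 and p = 2. Partial sums cannot prove convergence or divergence, so take this as numerical evidence only; the actual facts are that the harmonic series diverges and that the sum of 1/n² equals π²/6.

```python
import math

def partial_sum(p, N):
    """Partial sum of |a_n|^p for the sequence a_n = 1/n, n = 1, ..., N."""
    return sum((1.0 / n) ** p for n in range(1, N + 1))

for N in (10, 1000, 100000):
    # The p = 1 column keeps growing (like log N); the p = 2 column settles down.
    print(N, partial_sum(1, N), partial_sum(2, N))

print("pi^2 / 6 =", math.pi ** 2 / 6)
```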

1.3 Metric topology


A topology on a set X is a complete description of the open and closed sets available in X.
Once more, this is a definition that doesn’t define a thing. On X = R, for example, we are used
to thinking of the intervals [0, 1] and (0, 1) as closed and open, respectively, but in abstract spaces
the notion of open and closed can be very problematic. However, in the presence of a metric,
things become considerably simpler, and we proceed by analogy with Rn .

Definition 1.2. Suppose (X, d) is a metric space. Given a real number r ≥ 0, and x ∈ X, we
define the open ball with radius r and centred at x to be the set

B(x, r) = Br (x) = {y ∈ X | d(x, y) < r}.

In words, the open ball contains all elements of X which are closer than r to the fixed x. ⋄

Note that if r = 0 then Br (x) is empty, and if r > 0 then x ∈ Br (x).


We now give a fundamental definition.

Definition 1.3. Let (X, d) be a metric space, and let A be a subset of X. We say that x is an
interior point of A if, for some r > 0, we have that Br (x) is a subset of A. In other words, if
we have
d(x, y) < r =⇒ y ∈ A.
The set A will be called open if and only if every x ∈ A is an interior point of A. ⋄

These definitions have a reasonable aspect to them, but make no mistake, they are abstract.
Just think that these notions depend on the metric, and our examples showed us how abstract
metrics can be. Let’s “prove the obvious” (except that it is not obvious).
Proposition 1. Let (X, d) be a metric space.

1. The space X itself is open.

2. The empty subset A = ∅ is open.

3. The open ball Br (x) is open.

4. The intersection of two open sets A and B is open.

5. The union of two open sets A and B is open.

Proof. Let’s take these one by one, and start by showing that X is open. To show it we need
to show that every x ∈ X is an interior point of X. Fix x ∈ X. To show that this x is interior
to X, we need to produce a number r > 0 with the property that, if d(x, y) < r, then y ∈ X.
But, wait a minute. Every y is an element of X, since X is the whole space, and this is true
regardless of the r we choose. So we choose r = 1. Since B1 (x) is completely contained in X,
we conclude that x is interior to X. Since x was arbitrary, we conclude that every x ∈ X is
interior to X, and therefore X is open.
The empty set is open, simply because we can’t falsify the statement that the empty set is
open. Let’s try to falsify the statement. If the empty set were not open, then it would be possible
to find an element x ∈ ∅ such that, for every r > 0, the ball Br (x) is not contained in the empty set.
Since it is impossible to find such x, it is impossible to show that ∅ is not open. Conclusion:
The empty set is open.
The third statement is beautiful in its semantic twist. We defined the concept of open ball,
and we defined the concept of open set, but these are just names, and in principle there is nothing
guaranteeing that an open ball is an open set. If this were not to be the case, our nomenclature
would be poor indeed. But let’s proceed to the proof. For the open ball Br (x) to be an open
set, we need to show that each y ∈ Br (x) is interior to Br (x). Fix y ∈ Br (x). We need to
produce s > 0 such that Bs (y) is contained in Br (x). Choose s = r − d(x, y), and note that
s > 0. Supposing now that z ∈ Bs (y), we want to show that z ∈ Br (x) as well. Thus we need
to show that d(z, x) < r. Using the triangle inequality we have

d(z, x) ≤ d(z, y) + d(y, x) < s + d(x, y) = r.

This finishes the proof of the third item.


For the next item, suppose x ∈ A ∩ B. Since A and B are open, then x is interior to both.
Because x is interior to A, there must be some r1 > 0 such that Br1 (x) is contained in A.
Likewise, for some r2 > 0 we have Br2 (x) is contained in B. If we take r = min{r1 , r2 }, we see
that Br (x) is contained in both A and B. We conclude that A ∩ B is open.
Finally, if x ∈ A ∪ B, then either x ∈ A or x ∈ B (or both). Without loss of generality,
suppose x ∈ A; then, for some r > 0 we have Br (x) ⊂ A, and as a consequence Br (x) ⊂ (A ∪ B)
as well. We conclude that A ∪ B is open.
□

The following result is an important consequence of the above, and you should prove it.

Proposition 2. Let (X, d) be a metric space.


1. Any intersection of finitely many open sets is an open set.
2. Any union of open sets is an open set.
Please note that the intersection of infinitely many open sets may not be an open set.
Consider, for example, the intersection of the following open intervals on R:
\[
[0, 1] = \bigcap_{n=1}^{\infty} \left( -n^{-1},\; 1 + n^{-1} \right).
\]

Definition 1.4. Let (X, d) be a metric space. A set A ⊂ X is closed if and only if its
complement A^c is open. ⋄
Note how a closed set is defined by its complement. As a side note, an old question that
used to be asked just to confuse students is this: which are there more of, open sets or closed
sets? The answer is that there are as many of one as of the other, because to every open set
you can associate, bijectively, its complement.

1.4 Visualising nearness for functions


So, when are two functions near to each other? It depends on the metric. Consider these two
functions defined on the interval [0, 1], whose graphs are depicted below.

[Figure: graphs of the two functions f and g on the interval [0, 1].]

The blue function is f , the orange function is g. In the first distance we have (computed by
MATLAB)
\[
d_1(f, g) = \int_0^1 |f(t) - g(t)| \, dt = 0.0951,
\]
so, less than 0.1. I’d say these are close, in this distance. The second distance gives us
\[
d(f, g) = \int_0^1 |f(t) - g(t)|^2 \, dt = 0.1142,
\]

so, not as close, but still fairly close. In the infinity distance
\[
d_\infty(f, g) = \sup |f(t) - g(t)| = 2.0412.
\]
So, not near at all! (Compared with the other distances.)
To be near in the d∞ distance, f and g have to be near for every value of t. If there is one
value of t where they are not near (in this example, t = 0.5), then f and g will not be near.
However, as the example shows, f and g can be near each other in the d1 distance, even
though they are not near in the d∞ distance. That is because for distance d1 , what is important
is the area enclosed between the graphs of f and g, and in our example that area is small
(even though f (0.5) and g(0.5) are not close).
That should help develop some intuition about the different ways to measure distance
between functions.
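You can reproduce this phenomenon with functions of your own. The sketch below does not use the f and g from the picture (their formulas are not given in these notes); instead it assumes f (t) = sin(2πt) and builds g by adding a narrow triangular spike of height 2 and width 0.1 centred at t = 0.5. The integrals are approximated with the trapezoid rule on a uniform grid, and the supremum by the maximum over grid points.

```python
import math

def f(t):
    return math.sin(2 * math.pi * t)

def g(t):
    # f plus a triangular spike of height 2 and width 0.1 centred at t = 0.5
    return f(t) + max(0.0, 2.0 * (1.0 - abs(t - 0.5) / 0.05))

N = 10000
ts = [k / N for k in range(N + 1)]
diff = [abs(f(t) - g(t)) for t in ts]

# Trapezoid rule for the two integrals, maximum over grid points for the sup.
d1 = sum((diff[k] + diff[k + 1]) / 2 for k in range(N)) / N
int_sq = sum((diff[k] ** 2 + diff[k + 1] ** 2) / 2 for k in range(N)) / N
d_inf = max(diff)

print("d_1   ~", d1)
print("d_2   ~", math.sqrt(int_sq))
print("d_inf ~", d_inf)
```

For this particular spike the exact values are 0.1 for the area and 2 for the maximum gap, so the two functions are close in d1 and far apart in d∞ , just as in the picture.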
How do we visualise open sets? There is a neat way to visualise the open ball d∞ (f, g) < ε
containing all functions g which are ε-close to f in the distance d∞ . Let’s take ε = 0.1. The
following picture contains the graph of f (middle line, in blue), then the graphs of f + 0.1 and
f − 0.1 (above and below, in red).

[Figure: the graph of f (blue) on [0, 1], together with the graphs of f + 0.1 and f − 0.1 (red).]

To be close to f (closer than 0.1, that is), the graph of the function g must be contained entirely
between the two red lines. (The region between the two red lines is called a tubular neighbourhood
of f , for reasons that I hope are clear.) In the next picture, you can see (in black) the graph
of one such function g.

[Figure: the same tubular neighbourhood of f , with the graph of one such function g (black) inside it.]

It is much more difficult (in my opinion) to picture the open ball d1 (f, g) < ε, because in
this case g can vary wildly away from f , as long as the quantity $\int_0^1 |f(t) - g(t)| \, dt$ is small. For
example, in the picture below f is in blue, g is orange.

[Figure: graphs of f (blue) and g (orange) on [0, 1].]

MATLAB gives us d1 (f, g) = 0.1118, so you can see that we won’t have a neat image of
which functions g are inside the open ball.

1.5 Convergence and completeness


A metric gives us a notion of nearness and, by extension, of convergence.
Definition 1.5. Let (X, d) be a metric space, and suppose xn ∈ X is a sequence in X. We say
that xn converges to x ∈ X if, for each number ε > 0 there is an index n0 such that, if n ≥ n0 ,
then d(xn , x) < ε. ⋄
You must know this definition from other courses you took, so I will not elaborate.
Example 5. Consider the space X = C(R) of continuous functions defined on R, and let’s take
f (t) = 0 for all t. Define the function
\[
g(t) = \begin{cases}
0, & t < 0; \\
t, & 0 \le t \le 1; \\
2 - t, & 1 \le t \le 2; \\
0, & 2 < t.
\end{cases}
\]

Draw the graph of g, and convince yourself that g is in X. Now define fn (t) = g(t − n), a
translation of g by n units to the right. We observe that, for each fixed t, we have
\[
\lim_{n \to \infty} f_n(t) = 0 = f(t).
\]

However, d∞ (fn , f ) = 1 for all n, and so fn does not converge to f in the distance d∞ . ⋄
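Here is Example 5 in a few lines of Python, as a sketch with my own choice of the fixed point t = 3.5. The first list shows the values fn (3.5) for n = 1, . . . , 7, which are eventually 0; the second list shows the value of fn at its peak t = n + 1, which is always 1.

```python
def g(t):
    """The tent function of Example 5: zero outside [0, 2], peak 1 at t = 1."""
    if 0.0 <= t <= 1.0:
        return t
    if 1.0 < t <= 2.0:
        return 2.0 - t
    return 0.0

def f_n(n, t):
    """Translation of g by n units to the right."""
    return g(t - n)

t0 = 3.5
print([f_n(n, t0) for n in range(1, 8)])        # [0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
print([f_n(n, n + 1.0) for n in range(1, 8)])   # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Since g never exceeds 1 and fn (n + 1) = 1, the distance d∞ (fn , f ) is exactly 1 for every n: the bump never shrinks, it just moves out of sight.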

Example 6. Let’s change the last example a little. Same f as before, same g as before, but
now we define fn (t) = g(nt − n²). Draw a picture of fn and convince yourself that fn (t) ≠ 0
only for n < t < n + 2/n. As before, we have
\[
\lim_{n \to \infty} f_n(t) = 0 = f(t), \qquad d_\infty(f_n, f) = 1.
\]

So it is not true that fn converges to f if we measure distances using the d∞ metric. However,
\[
d_1(f_n, f) = \int_{-\infty}^{\infty} |f_n(t) - f(t)| \, dt = \int_{-\infty}^{\infty} g(nt - n^2) \, dt = \frac{1}{n} \int_{-\infty}^{\infty} g(s) \, ds = \frac{1}{n}.
\]
We conclude that, if we use the metric d1 , then fn converges to f . ⋄
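The value 1/n can be confirmed numerically. The sketch below (the name `d1_to_zero` and the number of sub-intervals are my own choices) integrates g(nt − n²) over its support [n, n + 2/n] with the midpoint rule.

```python
def g(s):
    """The tent function from Example 5."""
    if 0.0 <= s <= 1.0:
        return s
    if 1.0 < s <= 2.0:
        return 2.0 - s
    return 0.0

def d1_to_zero(n, N=2000):
    """Midpoint-rule approximation of the integral of g(n t - n^2) over [n, n + 2/n]."""
    a = float(n)
    h = (2.0 / n) / N
    return sum(g(n * (a + (k + 0.5) * h) - n * n) for k in range(N)) * h

for n in (1, 2, 5, 10):
    print(n, d1_to_zero(n), 1.0 / n)
```

Geometrically, fn is a tent of height 1 whose base has shrunk to length 2/n, so its area is 1/n even though its height never changes.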
Do you know why closed sets are called closed? Because of the following property.
Proposition 3. Let (X, d) be a metric space, and suppose F is a closed set of X. Let xn ∈ F
be a sequence converging to x ∈ X. Then x ∈ F as well.
Proof. Suppose x ∉ F . Then x ∈ F^c . Since F is closed, F^c is open. So x is an interior
point of F^c , and for some r > 0 we have that Br (x) is totally contained in F^c . But xn converges
to x, so only finitely many elements of the sequence xn are outside Br (x). We conclude that
some element xn is inside Br (x), and therefore in F^c . But no element xn can be both in F and
in F^c . We conclude that x must be in F . □

When a set is not closed, we can make it closed by adding points to it; that is, by closing it.
Definition 1.6. The closure of S in X is the set $\overline{S}$ containing all limits of sequences in S:

\[
\overline{S} = \{ x \in X \mid x = \lim x_n \text{ for some sequence } x_n \in S \}.
\]

Note that S is a subset of $\overline{S}$, because if x ∈ S then we can take xn = x for all n. ⋄


Now we come to a subtle point: What if X itself is not closed? How can we check that?
According to the definition, X is not closed if there is a sequence xn ∈ X, converging to a
point x that is not in X. But X contains the totality of points! How can xn be converging to
something that is not even inside the space X? As a matter of fact, if we don’t have x, how can
we even talk about xn converging? (The definition of convergence presupposes the existence of
x ∈ X.)
The way around this conundrum is to define convergence in a way that does not mention
the limit point x at all.
Definition 1.7. We say that
\[
\lim_{n, m \to \infty} d(x_n, x_m) = 0
\]
if, for each ε > 0, there is an index n0 such
that, if n > n0 and m > n0 , then d(xn , xm ) < ε. In this case the sequence xn is said to be a
Cauchy sequence. ⋄
The main result is that every convergent sequence is also a Cauchy sequence. (You saw this
result in Fundamentals of Real Analysis, so I won’t repeat it here. Exercise.) However, not all
Cauchy sequences are convergent.
Example 7. Let X = C([0, 2], R) be the space of real-valued continuous functions defined on
the interval [0, 2], and let’s adopt the metric
\[
d_1(f, g) = \int_0^2 |f(t) - g(t)| \, dt.
\]

We define the sequence
\[
f_n(t) = \begin{cases}
t^n, & 0 \le t \le 1; \\
1, & 1 \le t \le 2.
\end{cases}
\]
We can easily check that d1 (fn , fn ) = 0, and that for n ≠ m we have
\[
d_1(f_n, f_m) = \int_0^1 |t^n - t^m| \, dt = \frac{1}{\min\{n + 1, m + 1\}} - \frac{1}{\max\{n + 1, m + 1\}}.
\]

We see that the sequence fn is a Cauchy sequence. But, alas, it does not converge to any function
in X. Draw a picture of successive fn , and convince yourself that the correct limit should be
\[
f(t) = \begin{cases}
0, & 0 \le t < 1; \\
1, & 1 \le t \le 2.
\end{cases}
\]

But this f is not continuous, and hence, not in X. ⋄

This example shows that the space C([0, 2], R), with the metric d1 , does not contain all
possible limit points of sequences. Spaces like that are said to be incomplete.
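The closed formula for d1 (fn , fm ) in Example 7 can be compared against a direct numerical integration. The following is a sketch with helper names of my own (`d1`, `formula`), using the midpoint rule on [0, 1]; on [1, 2] the two functions coincide, so that part contributes nothing.

```python
def d1(n, m, N=20000):
    """Midpoint-rule approximation of the integral of |t^n - t^m| over [0, 1]."""
    h = 1.0 / N
    return sum(abs(((k + 0.5) * h) ** n - ((k + 0.5) * h) ** m)
               for k in range(N)) * h

def formula(n, m):
    """The closed form 1/(min + 1) - 1/(max + 1) from Example 7."""
    return 1.0 / (min(n, m) + 1) - 1.0 / (max(n, m) + 1)

for n, m in [(1, 2), (5, 10), (50, 100), (500, 1000)]:
    print(n, m, d1(n, m), formula(n, m))
```

Since the closed form is smaller than 1/(min{n, m} + 1), the distances go to 0 as both indices grow, which is the Cauchy property.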

Definition 1.8. A metric space (X, d) is said to be complete if every Cauchy sequence in X
has a limit in X. Otherwise (X, d) is incomplete. ⋄

When a space (X, d) is not complete, it is always possible to make it go through a completion
process, that is to say, find a complete space $(\hat{X}, \hat{d})$, where we can view X as a subset of
$\hat{X}$, and $\hat{d}$ as an extension of d. You probably have seen this process in action when we complete
the rational number set Q and obtain R.

1.6 The Lebesgue integral


Do you remember how we define the Riemann integral? Say we have a continuous function f :
[a, b] → R. We break up the domain into n intervals (of the same length, usually), and approximate
the height of f on each of them. Let’s label the sub-intervals in the domain. Let ∆t = (b − a)/n be the
length of each sub-interval. For j = 1 to j = n, we have the interval

\[
I_j = [a + (j - 1)\Delta t,\; a + j\Delta t].
\]

So:

\[
I_1 = [a, a + \Delta t], \quad I_2 = [a + \Delta t, a + 2\Delta t], \quad \cdots \quad I_n = [a + (n - 1)\Delta t, a + n\Delta t] = [b - \Delta t, b].
\]

Inside interval Ij we pick two values for t, call them $\underline{t}_j$ and $\overline{t}_j$, with the property that

\[
f(\underline{t}_j) \le f(t) \le f(\overline{t}_j),
\]

for any t ∈ Ij . Clearly we have
\[
L_n = \sum_{j=1}^{n} f(\underline{t}_j) \Delta t \le \sum_{j=1}^{n} f(\overline{t}_j) \Delta t = U_n.
\]

Here Ln is the lower sum, and Un is the upper sum. When we take a limit as n → ∞, we
prove that there is a common number M such that
\[
M = \lim_{n \to \infty} L_n = \lim_{n \to \infty} U_n.
\]

This number M is defined to be the integral of f :
\[
M = \int_a^b f(t) \, dt.
\]

It seems to be an elaborate way of defining integrals, but there’s good reason to do it in this
way, if we want to prove theorems.
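To see the lower and upper sums squeeze the integral, here is a sketch for the particular function f (t) = t² on [0, 1]. It relies on an assumption that is not part of the general definition: f is increasing, so the smallest value on each sub-interval is at the left endpoint and the largest at the right endpoint. The function name `riemann_sums` is mine.

```python
def riemann_sums(f, a, b, n):
    """Lower and upper sums for an INCREASING function f on [a, b],
    using n sub-intervals of equal length."""
    dt = (b - a) / n
    lower = sum(f(a + (j - 1) * dt) for j in range(1, n + 1)) * dt  # left endpoints
    upper = sum(f(a + j * dt) for j in range(1, n + 1)) * dt        # right endpoints
    return lower, upper

for n in (10, 100, 1000):
    print(n, riemann_sums(lambda t: t * t, 0.0, 1.0, n))
```

For an increasing function the gap Un − Ln equals (f (b) − f (a))∆t, which here is 1/n, so both sums are forced towards the common value 1/3.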
One by-product of the definition of the (Riemann) integral is that we can define integrals even
if the function f is not continuous. Suppose f has discontinuities at finitely many points
x1 , x2 , . . . , xm inside the interval [a, b]. It is possible (and not too difficult) to show that the
limit M still exists, so the integral still exists.
Now, one thing that is very desirable to have is the following. Suppose fn is a sequence of
functions converging to f . It would be extremely desirable to have
\[
\lim_{n \to \infty} \int_a^b f_n(t) \, dt = \int_a^b f(t) \, dt.
\]

Unfortunately, this is not true in general. And one of the reasons why it is not true is that,
even if fn is integrable (in the sense of Riemann), and f is the limit of the fn , it may be that f
is not integrable! Here is the important example.
Example 8. We consider functions defined on the interval [a, b] = [0, 1]. We define
\[
f(t) = \begin{cases}
1, & \text{if } t \text{ is rational}; \\
0, & \text{if } t \text{ is irrational}.
\end{cases}
\]
It is easy to see (but you do have to check the details) that, for this function f we have Ln = 0,
Un = 1 for all n, so there is no common number M that can be the integral of this f . This
function does not have a Riemann integral.
We proceed to define fn . The rational numbers are countable, so let x1 , x2 , . . . be an
enumeration of all rational numbers inside [0, 1]. We define fn as
\[
f_n(t) = \begin{cases}
1, & \text{if } t \in \{x_1, x_2, \ldots, x_n\}; \\
0, & \text{otherwise}.
\end{cases}
\]
Since fn has only finitely many discontinuities, fn is Riemann-integrable, and in fact
\[
\int_0^1 f_n(t) \, dt = 0.
\]
Moreover, f (t) = lim fn (t) as n → ∞, for each t. But $\int_0^1 f(t) \, dt$ does not exist. ⋄
This is a bad state of affairs. If fn is integrable, and fn converges to f , it would be nice if f
were integrable (at least!). The mathematician Henri Lebesgue (1875-1941, French) figured out
a way to re-define integration, and circumvent this problem. Lebesgue provided a new definition
of integration. Here are some features of his integral.

• Every function that was Riemann-integrable is also Lebesgue-integrable. (So, what used
to be integrable is still integrable.)

• The value of the Lebesgue integral is the same as the value of the Riemann integral (when
both values exist).

• Some new functions now are integrable (like the function f in the last example).

• If fn is Lebesgue-integrable, and f is the limit of the sequence fn , then f will be integrable
as well, under mild extra conditions (see the two convergence theorems stated at the end of
this section).

How did Lebesgue accomplish this? The key wasn’t to break the domain into parts. The
key was to break the image into parts. There are lots of technical details, but the main idea is
the following.
Suppose we are given a function f : [a, b] → R (and for the sake of argument let’s say it is
continuous, although that is not fundamental). Break R into segments of length 1/n. For each
k ∈ Z, define Jk = [k/n, (k + 1)/n). Note that the left endpoint is closed, the right endpoint is
open. Define the set Ak as follows:

t ∈ Ak ⇐⇒ f (t) ∈ Jk .

That is, t ∈ Ak means that k/n ≤ f (t) < (k + 1)/n. This definition is such that the sets Ak don’t
intersect (for different k), and their union is the interval [a, b] (because f is a function). Now
Lebesgue defines a function fn as follows.
\[
f_n(t) = \frac{k}{n}, \quad \text{if } t \in A_k.
\]
That is, fn is constant on the set Ak . Also, for every t we have fn (t) ≤ f (t). Lebesgue goes on
to define the integral of fn as
\[
\int_a^b f_n(t) \, dt = \sum_{k=-\infty}^{\infty} \frac{k}{n} |A_k|,
\]

where |Ak | is the size (measure) of the set Ak . (The measure of an interval is the length of the
interval, for example.)
(Lebesgue proves a lot of things: The measure |Ak | is defined; the sum on the right is finite,
etc. We will skip all that.)
Then Lebesgue defines the integral of f as
\[
\int_a^b f(t) \, dt = \lim_{n \to \infty} \int_a^b f_n(t) \, dt.
\]

(He has to prove that the limit exists. I am, of course, skipping lots of details.)
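The idea of slicing the image instead of the domain can be imitated on a computer, with one big caveat: the code below does not compute Lebesgue measure. It estimates |Ak | as (b − a) times the fraction of equally spaced sample points that land in Ak , which is a crude stand-in I am assuming only for illustration. The function name `lebesgue_sum` and the test function f (t) = t² are also my own choices.

```python
import math
from collections import Counter

def lebesgue_sum(f, a, b, n, samples=100000):
    """Approximate the sum over k of (k/n)|A_k|, where
    A_k = {t in [a, b] : k/n <= f(t) < (k+1)/n}.

    |A_k| is ESTIMATED by counting equally spaced sample points in A_k;
    this is an illustration, not a construction of Lebesgue measure.
    """
    h = (b - a) / samples
    # For each sample point t, record the index k of the slice J_k containing f(t).
    counts = Counter(math.floor(n * f(a + (i + 0.5) * h)) for i in range(samples))
    return sum((k / n) * count * h for k, count in counts.items())

for n in (1, 10, 100, 1000):
    print(n, lebesgue_sum(lambda t: t * t, 0.0, 1.0, n))
```

As n grows the printed values increase towards 1/3, the integral of t² on [0, 1]. They approach it from below, in agreement with the observation that fn (t) ≤ f (t) for every t.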
With all of these definitions in place, Lebesgue goes on to prove the Monotone Convergence
Theorem and the Dominated Convergence Theorem. Here is the statement of the
Monotone Convergence Theorem:

Suppose fn is a sequence of integrable functions, and that fn (t) → f (t) for each
t. Suppose also that for each t we have fn (t) ≤ fn+1 (t) (that is, the sequence fn
converges monotonically to f ).

THEN

the function f is also integrable (provided the limit on the right is finite), and
\[
\int_a^b f(t) \, dt = \lim_{n \to \infty} \int_a^b f_n(t) \, dt.
\]

And here is the statement of the Dominated Convergence Theorem:

Suppose fn is a sequence of integrable functions, and that fn (t) → f (t) for each t.
Suppose also that there is an integrable function g such that |fn (t)| ≤ g(t) for all n
and all t.

THEN
the function f is also integrable, and
\[ \int_a^b f(t)\,dt = \lim_{n\to\infty} \int_a^b f_n(t)\,dt. \]

In our course, whenever we talk about integrals, we mean the Lebesgue integral.
Topic 2

The spaces ℓp

Instead of working with abstract theory, I want to start with concrete examples that form the
basis for the abstraction. As motivation, consider Fourier series.
We start with the interval I = [−π, π], and a function f : I → C (or f : I → R, same idea).
We start by allowing f to be differentiable. (And therefore, integrable.) We define, for each
integer n ∈ Z, the Fourier coefficient
\[ c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, e^{-int}\,dt. \]

This coefficient can be computed (in principle) because the function $f(t)e^{-int}$ is continuous.
The main point of Fourier theory is that we can now write

\[ f(t) = \sum_{n=-\infty}^{\infty} c_n e^{int}. \]

What we have done is to create a correspondence between functions and coefficients:

f ⇐⇒ c0 , c±1 , c±2 , . . .

For each function there corresponds one set of coefficients, and conversely, for a given set of
coefficients, there will be only one possible function f . That is, we have a correspondence
{function space} ⇐⇒ {coefficient space}

We want to have better insight about what these function spaces and coefficient spaces may
be. What coefficients are allowed? For example, we can prove the following formula:
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(t)|^2\,dt = \sum_{n=-\infty}^{\infty} |c_n|^2. \]

This is the energy formula, also known as Parseval’s identity. If f is continuous, then the left
side is finite, which means the right side is finite. This means that not all possible coefficients
are allowed. We can only allow coefficients where the sum to the right is finite.
This is going to be our starting point.
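
The energy formula can be checked numerically on an example. The sketch below makes choices that are not in the text: it takes f(t) = t on [−π, π], approximates each coefficient with a midpoint rule, and only sums the coefficients with |n| ≤ 50. The helper name fourier_coefficient is made up for this illustration.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=2000):
    """Midpoint-rule approximation of c_n = (1/2pi) * integral of f(t) e^{-int} over [-pi, pi]."""
    h = 2 * math.pi / samples
    total = 0j
    for j in range(samples):
        t = -math.pi + (j + 0.5) * h
        total += f(t) * cmath.exp(-1j * n * t)
    return total * h / (2 * math.pi)

def identity(t):
    return t

N = 50  # only the coefficients with |n| <= N are used
coefficient_energy = sum(abs(fourier_coefficient(identity, n)) ** 2
                         for n in range(-N, N + 1))
function_energy = math.pi ** 2 / 3  # (1/2pi) * integral of t^2 over [-pi, pi]
print(coefficient_energy, function_energy)
```

The truncated sum of the |cn|^2 stays a little below the left side of the energy formula; the gap is the part of the series with |n| > 50.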


2.1 The spaces ℓp


Let p be a real number greater than or equal to 1, that is, 1 ≤ p < ∞. We define the space ℓp (N) as
all sequences of complex numbers, indexed by n ∈ N, where a certain summation condition is
true. We will give a formal definition below.

Definition 2.1. The space ℓp (N) is formed by all functions c : N → C such that the sum


\[ \sum_{n=0}^{\infty} |c(n)|^p \]

is finite. We will write cn = c(n). ⋄

The language of functions is convenient for the definition, but really, an element in ℓp is a
one-sided sequence of numbers.
We could also define the space ℓp (Z).

Definition 2.2. The space ℓp (Z) is formed by all functions c : Z → C such that the sum


\[ \sum_{n=-\infty}^{\infty} |c(n)|^p \]

is finite. We will write cn = c(n). ⋄

We can also define the space ℓ∞ (N), as follows.

Definition 2.3. The space ℓ∞ (N) is formed by all functions c : N → C such that

\[ \sup_{n \in \mathbb{N}} |c(n)| \]

is finite. That is, ℓ∞ (N) contains all bounded sequences. ⋄

The space ℓ∞ (Z) is defined in a similar way. We will simply write ℓp from now on, with
1 ≤ p ≤ ∞, and the context should make clear if we are talking about ℓp (N) or ℓp (Z). In any
case, results that work for one also work for the other.
Of all ℓp spaces, the most important are ℓ1 , ℓ2 , and ℓ∞ .

Example 9. The space ℓ1 (N) contains all sequences such that




\[ \sum_{n=0}^{\infty} |c_n| < \infty. \]

The sequence $c_n = 2^{-n}$ is in ℓ1 , but the sequences $c_n = 1/(n + 1)$ and $c_n = 1$ are not in ℓ1 . ⋄

Example 10. The space ℓ2 (N) contains all sequences such that


\[ \sum_{n=0}^{\infty} |c_n|^2 < \infty. \]

The sequences $c_n = 2^{-n}$ and $c_n = 1/(n + 1)$ are in ℓ2 , but the sequence $c_n = 1$ is not. ⋄

Example 11. The space ℓ∞ (N) contains all sequences such that

\[ \sup_{n \ge 0} |c_n| < \infty. \]

The sequences $c_n = 2^{-n}$, $c_n = 1/(n + 1)$, and $c_n = 1$ are all in ℓ∞ . ⋄
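
The three sequences in Examples 9 to 11 can be compared through their partial sums. This is a small illustrative sketch (the helper partial_sum and the cut-offs 10, 100, 1000 are choices made here, not part of the text); a finite computation can only suggest convergence or divergence, it does not prove it.

```python
def partial_sum(term, p, N):
    """Return sum_{n=0}^{N-1} |term(n)|^p."""
    return sum(abs(term(n)) ** p for n in range(N))

sequences = {
    "2^-n": lambda n: 2.0 ** (-n),
    "1/(n+1)": lambda n: 1.0 / (n + 1),
    "1": lambda n: 1.0,
}

for name, term in sequences.items():
    for p in (1, 2):
        sums = [partial_sum(term, p, N) for N in (10, 100, 1000)]
        print(name, "p =", p, sums)
```

For 2^-n the partial sums settle down for both values of p; for 1/(n + 1) they settle down for p = 2 but keep growing for p = 1; for the constant sequence they grow in both cases.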


What we will do now is prove stuff. It should be reasonably clear why we are asking these
questions and proving them. In what follows, if c0 , c1 , c2 etc, is the sequence, we will refer to it
simply as c.
Definition 2.4. Suppose a ∈ ℓp . We define
\[ \|a\|_p = \Big( \sum_{n=0}^{\infty} |a_n|^p \Big)^{1/p}, \]

if p < ∞, and we define

\[ \|a\|_\infty = \sup_{n \ge 0} |a_n|. \]

These quantities are finite, by definition of ℓp . ⋄
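
As a concrete instance of Definition 2.4, here is a minimal sketch that computes ∥a∥p for a sequence with finitely many non-zero entries. Representing the sequence by a finite Python list (all later entries being zero) is an assumption made for the illustration, and p_norm is a made-up name.

```python
import math

def p_norm(a, p):
    """Return ||a||_p for a finite list a, read as a sequence that is 0 afterwards."""
    if p == math.inf:
        return max((abs(x) for x in a), default=0.0)
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

a = [3, -4j, 0, 0]  # a_0 = 3, a_1 = -4i, and a_n = 0 for n >= 2
norms = {p: p_norm(a, p) for p in (1, 2, math.inf)}
print(norms)
```

Here |a0| = 3 and |a1| = 4, so the three norms are 3 + 4, the square root of 9 + 16, and the larger of 3 and 4.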


Definition 2.5. Suppose a ∈ ℓp , and b ∈ ℓp . Let λ be a complex number. We define a + b and
λa as
(a + b)n = an + bn , (λa)n = λan .
In other words, the sequence a + b is a0 + b0 , a1 + b1 , etc, and the sequence λa is λa0 , λa1 , etc.⋄
Proposition 4. As before, suppose a, b ∈ ℓp , and λ ∈ C. Then a + b and λa are in ℓp as well.
Moreover,
∥λa∥p = |λ| ∥a∥p .
Proof. The proof for the case p = ∞ is slightly different from the proof when p < ∞. Let’s do
these proofs separately. First the case when p = ∞.
Fix an integer N > 0. Let AN be the largest of the (finitely many) numbers |a0 |, . . . , |aN |,
that is,
\[ A_N = \max_{0 \le n \le N} |a_n|. \]

Similarly,
\[ B_N = \max_{0 \le n \le N} |b_n|. \]

Then, if 0 ≤ n ≤ N , we have

|an + bn | ≤ |an | + |bn | ≤ AN + BN .

As a consequence,
\[ \max_{0 \le n \le N} |a_n + b_n| \le A_N + B_N. \]

Since AN is one of the numbers |a0 |,. . . , |aN |, we conclude that

AN ≤ ∥a∥∞ .

Similarly, BN ≤ ∥b∥∞ . So

\[ \max_{0 \le n \le N} |a_n + b_n| \le A_N + B_N \le \|a\|_\infty + \|b\|_\infty. \]

Now take a limit when N → ∞ to obtain

\[ \|a + b\|_\infty = \lim_{N \to \infty} \max_{0 \le n \le N} |a_n + b_n| \le \|a\|_\infty + \|b\|_\infty. \]

This proves that ∥a + b∥∞ is finite. We also have

\[ \|\lambda a\|_\infty = \sup_{n \ge 0} |\lambda a_n| = \sup_{n \ge 0} |\lambda|\,|a_n| = |\lambda| \sup_{n \ge 0} |a_n| = |\lambda|\, \|a\|_\infty. \]

The case p = ∞ is done. We now turn our attention to the case p < ∞.
We have
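\[ \|\lambda a\|_p = \Big( \sum_{n=0}^{\infty} |\lambda a_n|^p \Big)^{1/p} = \Big( \sum_{n=0}^{\infty} |\lambda|^p |a_n|^p \Big)^{1/p} = \Big( |\lambda|^p \sum_{n=0}^{\infty} |a_n|^p \Big)^{1/p} = |\lambda|\, \|a\|_p. \]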

Let’s now show that (a + b) ∈ ℓp . Since 1 ≤ p < ∞, the function $f(x) = x^p$ is convex. That is,
if 0 ≤ x, y, then
\[ f\Big( \frac{x + y}{2} \Big) \le \frac{f(x) + f(y)}{2}. \]
Take x = |an | and y = |bn |. Then
\[ \Big( \frac{|a_n| + |b_n|}{2} \Big)^p \le \frac{|a_n|^p + |b_n|^p}{2}. \]

That is,
\[ |a_n + b_n|^p \le (|a_n| + |b_n|)^p \le \frac{2^p}{2} \big( |a_n|^p + |b_n|^p \big). \]
As a consequence,
\[ \sum_{n=0}^{\infty} |a_n + b_n|^p \le 2^{p-1} \Big( \sum_{n=0}^{\infty} |a_n|^p + \sum_{n=0}^{\infty} |b_n|^p \Big). \]

That is,
\[ \sum_{n=0}^{\infty} |a_n + b_n|^p \le 2^{p-1} \big( \|a\|_p^p + \|b\|_p^p \big) < \infty, \]

because ∥a∥p and ∥b∥p are both finite. But this means that ∥a + b∥p is finite, and a + b is in ℓp .
□

What the last proposition tells us is that the spaces ℓp are vector spaces (in the sense of
linear algebra). Let’s recall the abstract definition of vector space.

Definition 2.6. Let K be a field (think K = R or K = C). A non-empty set V is said to be a


vector space over K if the following conditions are true.

1. There is an operation + : V × V → V , called the sum on V . Instead of writing +(v, w)


we will write v + w;

2. For any v, w ∈ V we have v + w = w + v;

3. There is a special element O ∈ V (the zero element) such that O + v = v + O = v, for all
v ∈V;

4. Each v ∈ V has a negative w; that is, for each v ∈ V there is some w ∈ V such that
v + w = O.

5. The sum is associative: v + (w + z) = (v + w) + z, for any v, w, z ∈ V .

6. There is an operation · : K × V → V , called scalar multiplication. If λ ∈ K and v ∈ V , we


will write λv, instead of ·(λ, v).

7. If λ, µ ∈ K and v, w ∈ V , then (λ+µ)v = λv +µv, λ(v +w) = λv +λw, and (λµ)v = λ(µv).

8. If 1 ∈ K is the neutral element for the multiplication, then 1v = v for all v ∈ V . ⋄

It is an exercise to check that ℓp is a vector space over C. At this point we could ask typical
vector space questions: is there a basis for ℓp ; what are the subspaces of ℓp , etc. We will go in
a slightly different direction.

2.2 Hölder’s and Minkowski’s inequalities for ℓp


Our goal is to obtain the following inequality, valid in ℓp :

∥a + b∥p ≤ ∥a∥p + ∥b∥p .

This is called Minkowski’s inequality. In this section we will be concerned with proving it (there’s
a trick), and later we will see what it can do for us. The path to proving this inequality is long.
We will first prove Young’s inequality, then we will prove Hölder’s inequality, and then we will
be able to prove Minkowski’s inequality.

Definition 2.7. If 1 < p < ∞, the conjugate exponent of p is the only number q such that
\[ \frac{1}{p} + \frac{1}{q} = 1. \]

(Note that we also have 1 < q < ∞.) We say that p and q are conjugate exponents of each
other.
If p = 1, then q = ∞ is its conjugate exponent, and reciprocally. ⋄

The conjugate exponent of p = 2 is q = 2. The conjugate exponent of p = 3 is q = 3/2. You


get the idea.
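
Solving 1/p + 1/q = 1 for q gives q = p/(p − 1), which is easy to tabulate. The sketch below is only an illustration; the function name conjugate_exponent is made up, and exact fractions are used so that the identities can be compared without rounding.

```python
from fractions import Fraction
import math

def conjugate_exponent(p):
    """Return q with 1/p + 1/q = 1, using q = p/(p-1); p = 1 and p = inf are paired."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1
    return p / (p - 1)

p = Fraction(3)
q = conjugate_exponent(p)        # Fraction(3, 2)
identities = {
    "1/p + 1/q": 1 / p + 1 / q,  # should equal 1
    "q(p - 1)": q * (p - 1),     # should equal p
    "p(q - 1)": p * (q - 1),     # should equal q
    "p - p/q": p - p / q,        # should equal 1
}
print(q, identities)
```

The last three identities are the ones used later in the proofs of Young's, Hölder's and Minkowski's inequalities.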
Here is Young’s inequality.

Proposition 5 (Young’s inequality). Let 1 < p, q < ∞ be conjugate exponents. Let 0 ≤ a, b be


non-negative real numbers. Then
\[ ab \le \frac{a^p}{p} + \frac{b^q}{q}. \]
Proof. Inequalities like this are often tackled with calculus. For example, the left side is a
function of two variables f (a, b) = ab, while the right side is $g(a, b) = a^p/p + b^q/q$. We want to
show that f (a, b) ≤ g(a, b), or g − f ≥ 0. If we show that the minimum of g − f is not negative,
that would accomplish the goal.
Since we don’t want to be working with functions of two variables, let’s fix b ≥ 0. We want
to show that
\[ 0 \le \frac{a^p}{p} + \frac{b^q}{q} - ab. \]
Define $f(a) = \frac{a^p}{p} + \frac{b^q}{q} - ab$ (from now on f denotes this function of one variable). Let’s compute the minimum of f . Take derivatives (with respect to
a) to get
\[ f'(a) = a^{p-1} - b, \qquad f''(a) = (p - 1)\, a^{p-2}. \]
Since p > 1, we see that f ′′ (a) > 0 for a > 0, and f is convex, so it can only have a minimum. Since
$f(0) = b^q/q \ge 0$, and f (+∞) = +∞ (because p > 1), we see that f has one and only one
minimum. It must be located at f ′ (a) = 0, or $a^{p-1} = b$. That is, $a = b^{1/(p-1)}$.
Now, since p and q are conjugate, you can check that $\frac{1}{p-1} = q - 1$, and that pq = p + q. We
have
\[ f(b^{q-1}) \le f(a), \]
for all a. Let’s compute the value $f(b^{q-1})$.

\[ f(b^{q-1}) = \frac{(b^{q-1})^p}{p} + \frac{b^q}{q} - b^{q-1}\, b = \frac{b^q}{p} + \frac{b^q}{q} - b^q = 0. \]
We conclude that 0 ≤ f (a) for all a ≥ 0, as we wanted.
□

As an aside (that is somewhat important), we see that equality in Young’s inequality is attained if
and only if $a = b^{q-1}$, at the minimum. But p and q are conjugate, and you can check that
p(q − 1) = q, so equality is attained if, and only if, $a^p = b^q$. Store that fact.
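
Young's inequality and its equality case can be spot-checked numerically. This is a sketch on a finite grid of sample points, with p = 3 chosen arbitrarily; the name young_gap is made up, and a grid check illustrates the statement rather than proving it.

```python
def young_gap(a, b, p):
    """Return a^p/p + b^q/q - a*b, where q = p/(p-1) is the conjugate exponent of p > 1."""
    q = p / (p - 1)
    return a ** p / p + b ** q / q - a * b

p = 3.0
q = p / (p - 1)
grid = [0.25 * i for i in range(21)]  # sample points 0, 0.25, ..., 5
smallest_gap = min(young_gap(a, b, p) for a in grid for b in grid)
gap_at_equality = young_gap(2.0 ** (q - 1), 2.0, p)  # a = b^(q-1) with b = 2
print(smallest_gap, gap_at_equality)
```

The gap should never be negative (beyond rounding error), and it should vanish when a = b^(q−1), which is the same as a^p = b^q.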
Next we will use Young’s inequality to prove Hölder’s inequality, which is a very useful fact,
and very very deep. I will give you some intuition about it, after the proof.
Proposition 6 (Hölder’s inequality). Let p, q ≥ 1 be conjugate exponents. Suppose a ∈ ℓp and
b ∈ ℓq (not on the same space!). Define c = ab, that is, cn = an bn for all n. Then c ∈ ℓ1 , and
we have the inequality
∥ab∥1 ≤ ∥a∥p ∥b∥q .
Proof. First let’s treat the case p, q > 1. According to Young’s inequality, for each n we have
\[ |a_n b_n| \le \frac{|a_n|^p}{p} + \frac{|b_n|^q}{q}. \]
If we now sum over all n, we obtain
\[ \|ab\|_1 = \sum_{n=0}^{\infty} |a_n b_n| \le \frac{1}{p} \|a\|_p^p + \frac{1}{q} \|b\|_q^q, \]

showing that ab ∈ ℓ1 – but falling short of proving Hölder’s inequality. So we need to do a little
better than that.
If either ∥a∥p = 0 or ∥b∥q = 0, then ab = 0 and the inequality becomes 0 ≤ 0, which is true. So assume
neither ∥a∥p nor ∥b∥q is zero.
Define α = ∥a∥p , and β = ∥b∥q , and the new sequences An = an /α, Bn = bn /β. From what
we saw before, we have that the sequence A is in ℓp , and the sequence B is in ℓq . Moreover,
\[ \|A\|_p = \Big\| \frac{a}{\alpha} \Big\|_p = \frac{1}{\alpha} \|a\|_p = 1. \]
Similarly, ∥B∥q = 1. Applying Young’s inequality, we obtain

\[ |A_n B_n| \le \frac{|A_n|^p}{p} + \frac{|B_n|^q}{q}. \]

Summing over all n gives us

\[ \|AB\|_1 \le \frac{\|A\|_p^p}{p} + \frac{\|B\|_q^q}{q} = \frac{1}{p} + \frac{1}{q} = 1. \]

But then
\[ \Big\| \frac{a}{\alpha}\, \frac{b}{\beta} \Big\|_1 \le 1. \]
That is,
\[ \frac{\|ab\|_1}{\alpha \beta} \le 1. \]
But that is Hölder’s inequality.
We need to consider the case p = 1, q = ∞. Note that for all n we have |bn | ≤ ∥b∥∞ . Then

|an bn | ≤ |an | ∥b∥∞ .

Summing for all n gives us


∥ab∥1 ≤ ∥a∥1 ∥b∥∞ ,

as desired.
□
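
Hölder's inequality can be tested on finite sequences. The sketch below uses random complex entries (with a fixed seed), a few pairs of conjugate exponents chosen for the illustration, and a made-up helper p_norm. It also tries the choice bn = |an|^(p−1), for which |bn|^q = |an|^p; this mirrors the equality case of Young's inequality, and the two sides should then agree up to rounding.

```python
import math
import random

def p_norm(a, p):
    """||a||_p for a finite list a (entries beyond the list are taken to be 0)."""
    if p == math.inf:
        return max(abs(x) for x in a)
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

random.seed(0)  # fixed seed so the run is reproducible
a = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50)]
b = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50)]
ab = [x * y for x, y in zip(a, b)]

# (p, q, ||ab||_1, ||a||_p * ||b||_q) for several conjugate pairs
checks = [(p, q, p_norm(ab, 1), p_norm(a, p) * p_norm(b, q))
          for p, q in ((3.0, 1.5), (2.0, 2.0), (1.0, math.inf))]

# A case where the two sides coincide: b_n = |a_n|^(p-1), so |b_n|^q = |a_n|^p.
p, q = 3.0, 1.5
b_eq = [abs(x) ** (p - 1) for x in a]
lhs_eq = p_norm([x * y for x, y in zip(a, b_eq)], 1)
rhs_eq = p_norm(a, p) * p_norm(b_eq, q)
print(checks, lhs_eq, rhs_eq)
```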

Here is some intuition about Hölder’s inequality. It has to do with rates of convergence.
With 1 < p < ∞, what does it mean to say that a ∈ ℓp ? It means that the sum
\[ \sum_{n=0}^{\infty} |a_n|^p \]
is finite. Now, we know that, if ε > 0, then
\[ \sum_{n=1}^{\infty} \frac{1}{n} = \infty, \quad \text{and} \quad \sum_{n=1}^{\infty} \frac{1}{n^{1+\varepsilon}} < \infty. \]
So, if the sum of $|a_n|^p$ is finite, the numbers $|a_n|^p$ must (roughly speaking) be converging to 0 faster than 1/n
converges to 0. That is, |an | converges to 0 faster than $1/n^{1/p}$.
Similarly, if b ∈ ℓq , then |bn | converges to zero faster than $1/n^{1/q}$.
This means that |an bn | converges to 0 faster than
\[ \frac{1}{n^{1/p}} \cdot \frac{1}{n^{1/q}} = \frac{1}{n^{\frac{1}{p} + \frac{1}{q}}} = \frac{1}{n}. \]
That is, the sum of the |an bn | must converge.


Once we have proved Hölder’s inequality, we are ready to prove Minkowski’s inequality.

Proposition 7 (Minkowski’s inequality). If 1 ≤ p ≤ ∞, and a, b ∈ ℓp , then

∥a + b∥p ≤ ∥a∥p + ∥b∥p .



Proof. We should point out that the case p = 1 is simply a consequence of |an + bn | ≤ |an | + |bn |.
Moreover, the case p = ∞ is a consequence of the same inequality: Since

|an + bn | ≤ |an | + |bn |,

and since |an | ≤ ∥a∥∞ , and |bn | ≤ ∥b∥∞ , then

|an + bn | ≤ ∥a∥∞ + ∥b∥∞ .

But then
∥a + b∥∞ ≤ ∥a∥∞ + ∥b∥∞ .
So the cases p = 1 and p = ∞ are done. We now assume 1 < p < ∞. We compute


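\[
\begin{aligned}
\|a + b\|_p^p &= \sum_{n=0}^{\infty} |a_n + b_n|^p \\
&= \sum_{n=0}^{\infty} |a_n + b_n|\, |a_n + b_n|^{p-1} \\
&\le \sum_{n=0}^{\infty} (|a_n| + |b_n|)\, |a_n + b_n|^{p-1} \\
&= \sum_{n=0}^{\infty} |a_n|\, |a_n + b_n|^{p-1} + \sum_{n=0}^{\infty} |b_n|\, |a_n + b_n|^{p-1}.
\end{aligned}
\]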

Let q be the conjugate exponent of p. Now observe: since a, b ∈ ℓp , then (a + b) ∈ ℓp (already
proved), and so the sequence $|a_n + b_n|^{p-1}$ is in ℓq (because q(p − 1) = p). We can use Hölder’s inequality, to
obtain

\[ \sum_{n=0}^{\infty} |a_n|\, |a_n + b_n|^{p-1} \le \Big( \sum_{n=0}^{\infty} |a_n|^p \Big)^{1/p} \Big( \sum_{n=0}^{\infty} |a_n + b_n|^p \Big)^{1/q} = \|a\|_p\, \|a + b\|_p^{p/q}. \]

Similarly,

\[ \sum_{n=0}^{\infty} |b_n|\, |a_n + b_n|^{p-1} \le \Big( \sum_{n=0}^{\infty} |b_n|^p \Big)^{1/p} \Big( \sum_{n=0}^{\infty} |a_n + b_n|^p \Big)^{1/q} = \|b\|_p\, \|a + b\|_p^{p/q}. \]

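Then
\[ \|a + b\|_p^p \le (\|a\|_p + \|b\|_p)\, \|a + b\|_p^{p/q}. \]

If ∥a + b∥p = 0 there is nothing to prove. Otherwise we can divide both sides by $\|a + b\|_p^{p/q}$ to get
\[ \|a + b\|_p^{\,p - \frac{p}{q}} \le \|a\|_p + \|b\|_p. \]
But $p - \frac{p}{q} = 1$.
□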
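
The trick in the proof is that the sequence |an + bn|^(p−1) lies in ℓq, with ℓq norm equal to ∥a + b∥p raised to the power p/q. The sketch below checks this identity, and Minkowski's inequality itself, on two short sequences picked arbitrarily for the illustration; p_norm is a made-up helper.

```python
import math

def p_norm(a, p):
    """||a||_p for a finite list a (entries beyond the list are taken to be 0)."""
    if p == math.inf:
        return max(abs(x) for x in a)
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

a = [1.0, -2.0, 0.5, 3.0]
b = [0.5, 1.0, -4.0, 2.0]
s = [x + y for x, y in zip(a, b)]        # the sequence a + b

p = 3.0
q = p / (p - 1)
w = [abs(x) ** (p - 1) for x in s]       # the sequence |a_n + b_n|^(p-1)
trick_lhs = p_norm(w, q)                 # its l^q norm
trick_rhs = p_norm(s, p) ** (p / q)      # ||a + b||_p^(p/q)

# Minkowski's inequality for several exponents, with a small rounding tolerance.
minkowski_ok = all(
    p_norm(s, r) <= p_norm(a, r) + p_norm(b, r) + 1e-12
    for r in (1.0, 1.5, 2.0, 3.0, 7.0, math.inf)
)
print(trick_lhs, trick_rhs, minkowski_ok)
```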

2.3 ℓp is a metric space


Minkowski’s inequality allows us to define a metric on ℓp .

Proposition 8. If 1 ≤ p ≤ ∞, and a, b ∈ ℓp , we define

dp (a, b) = ∥a − b∥p .

Then dp is a metric in ℓp .
Proof. You should try doing it yourself before reading on. First, we clearly have dp (a, b) ≥ 0.
Now, if dp (a, b) = 0, then ∥a − b∥p = 0, and this can only happen if |an − bn | = 0 for all n. We
conclude that a = b.
Since |an − bn | = |bn − an |, we conclude that dp (a, b) = dp (b, a).
Finally, suppose a, b, c ∈ ℓp . Then

\[
\begin{aligned}
d_p(a, c) &= \|a - c\|_p \\
&= \|(a - b) + (b - c)\|_p \\
&\le \|a - b\|_p + \|b - c\|_p \\
&= d_p(a, b) + d_p(b, c).
\end{aligned}
\]

This proves the triangle inequality. □

As soon as we have this result, it is natural to ask whether ℓp is a complete metric space. (It
is.) We will need to develop some good notation here to deal with sequences. The problem is
this: We will be talking about sequences of elements in ℓp . But each element in ℓp is a sequence,
so we will be talking about sequences of sequences. Good notation is a must. I made up the
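following notation. If we write $c^n \in \ell^p$, with n as an upper index, this means that $c^0, c^1, c^2$, etc, are each of them an
element in ℓp . The element $c^0$ is the sequence

\[ c^0 = (c^0_0, c^0_1, c^0_2, \dots) \]

So, if we write $c^n_m$, this means the mth entry in $c^n$ (the upper index says which element of the sequence, the lower index says which entry). I hope this notation is clear.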
Proposition 9. The space ℓp (N, C) is complete.
Proof. We need to show: If $c^n$ ∈ ℓp is a Cauchy sequence, then there is an element d ∈ ℓp such
that $\lim_{n\to\infty} \|c^n - d\|_p = 0$.
Assume $c^n$ ∈ ℓp is a Cauchy sequence. The following diagram can help understand what we
will do.

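\[
\begin{array}{rcl}
c^0 &=& (\; c^0_0 \quad c^0_1 \quad c^0_2 \quad \cdots \quad c^0_m \quad \cdots \;) \\
c^1 &=& (\; c^1_0 \quad c^1_1 \quad c^1_2 \quad \cdots \quad c^1_m \quad \cdots \;) \\
c^2 &=& (\; c^2_0 \quad c^2_1 \quad c^2_2 \quad \cdots \quad c^2_m \quad \cdots \;) \\
\vdots & & \qquad \vdots \\
c^n &=& (\; c^n_0 \quad c^n_1 \quad c^n_2 \quad \cdots \quad c^n_m \quad \cdots \;) \\
\downarrow & & \quad\; \downarrow \quad\;\; \downarrow \quad\;\; \downarrow \qquad\quad\; \downarrow \\
d &=& (\; d_0 \quad d_1 \quad d_2 \quad \cdots \quad d_m \quad \cdots \;)
\end{array}
\]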
The diagram provides a map of the proof. The sequence $c^n$ is a Cauchy sequence. If we focus
on the mth coordinate only (so m is fixed), the sequence $c^n_m$ is a sequence of complex numbers
(also a Cauchy sequence), converging to some value $d_m$ ∈ C. This gives us the element d ∈ ℓp ,
the natural candidate to be the limit of the sequence $c^n$. Let’s get to the proof.
Our first goal is to obtain the element d ∈ ℓp . We have, for any n, m, that

\[ |c^n_m| \le \|c^n\|_p. \]

As a consequence we have, for any n, k, m, that

\[ |c^n_m - c^k_m| \le \|c^n - c^k\|_p. \]

Since $c^n$ is a Cauchy sequence in ℓp , then

\[ \lim_{n,k \to \infty} \|c^n - c^k\|_p = 0. \]

This implies that

\[ \lim_{n,k \to \infty} |c^n_m - c^k_m| = 0, \]

and (for fixed m), the sequence $c^n_m$ ∈ C is a Cauchy sequence. Since C is complete, we know
that this numerical sequence has a limit:

\[ \lim_{n \to \infty} c^n_m = d_m \in \mathbb{C}. \]

Now we define
d = (d0 , d1 , d2 , . . . ).
To finish the first goal, we must show that d is an element of ℓp (not obvious).
First we do the case 1 ≤ p < ∞. Fix N > 0. Use Minkowski’s inequality to obtain
\[ \Big( \sum_{m=0}^{N} |d_m|^p \Big)^{1/p} \le \Big( \sum_{m=0}^{N} |c^n_m - d_m|^p \Big)^{1/p} + \Big( \sum_{m=0}^{N} |c^n_m|^p \Big)^{1/p}. \]

We have
\[ \Big( \sum_{m=0}^{N} |c^n_m|^p \Big)^{1/p} \le \Big( \sum_{m=0}^{\infty} |c^n_m|^p \Big)^{1/p} = \|c^n\|_p. \]
Because $c^n$ is a Cauchy sequence, there is a number M > 0 such that, for all n we have
$\|c^n\|_p \le M$. (This is a general fact about Cauchy sequences in metric spaces.)
So now we have
\[ \Big( \sum_{m=0}^{N} |d_m|^p \Big)^{1/p} \le \Big( \sum_{m=0}^{N} |c^n_m - d_m|^p \Big)^{1/p} + M. \]
Since for each fixed m we have $c^n_m \to d_m$, we can pick n large enough so that
\[ \Big( \sum_{m=0}^{N} |c^n_m - d_m|^p \Big)^{1/p} \le 1. \]

We obtain
\[ \Big( \sum_{m=0}^{N} |d_m|^p \Big)^{1/p} \le 1 + M. \]
Since this is true for any N , we have
\[ \|d\|_p = \Big( \sum_{m=0}^{\infty} |d_m|^p \Big)^{1/p} \le 1 + M < \infty, \]

and we conclude that d ∈ ℓp .



Now for the case when p = ∞. We write

\[ |d_m| \le |d_m - c^n_m| + |c^n_m| \le |d_m - c^n_m| + \|c^n\|_\infty \le |d_m - c^n_m| + M. \]

For a fixed m, take n large enough so that $|d_m - c^n_m| \le 1$. Then

\[ |d_m| \le 1 + M \]

for all m, and as a result we have

\[ \|d\|_\infty \le 1 + M < \infty, \]

showing that d ∈ ℓ∞ .
The first goal was accomplished.
Our second (and final) goal is to show that $\lim_{n\to\infty} \|c^n - d\|_p = 0$. Again we must consider
the cases 1 ≤ p < ∞, and the case p = ∞.
The proof gets technical now. Let’s start with the case 1 ≤ p < ∞. We want to show that
the quantity
\[ \|c^n - d\|_p \]
becomes as small as we want, if we take n large. More precisely, we want to show that given
ε > 0, there’s some n0 such that if n ≥ n0 , then $\|c^n - d\|_p < \varepsilon$. Here is our strategy to do this.
We will write
\[ \|c^n - d\|_p = \Big( \sum_{m=0}^{N} |c^n_m - d_m|^p + \sum_{m=N+1}^{\infty} |c^n_m - d_m|^p \Big)^{1/p}. \]

The first sum is a finite sum, and we should have no problem in showing that the finite sum
is small if n → ∞. The second sum is the tail-end of the series. To show that this tail-end is
small, we will use the fact that $c^n$ is a Cauchy sequence, and introduce $c^k$ (another element in
the sequence), so that $c^n$ and $c^k$ are close together, and $c^k$ is also close to d.
Let’s start. Given ε > 0, there’s n1 such that, if n, k ≥ n1 , then $\|c^n - c^k\|_p < \varepsilon/3$. Fix
k ≥ n1 . With k fixed, choose N such that
\[ \Big( \sum_{m=N+1}^{\infty} |c^k_m - d_m|^p \Big)^{1/p} < \varepsilon/3. \]
(This is possible because $c^k - d$ is in ℓp , so the tail of its series is small.)

From Minkowski’s inequality we have

\[ \Big( \sum_{m=0}^{N} |c^n_m - d_m|^p + \sum_{m=N+1}^{\infty} |c^n_m - d_m|^p \Big)^{1/p} \le \Big( \sum_{m=0}^{N} |c^n_m - d_m|^p \Big)^{1/p} + \Big( \sum_{m=N+1}^{\infty} |c^n_m - d_m|^p \Big)^{1/p}. \]

Since N is fixed, there is some n0 ≥ n1 such that, if n ≥ n0 , then

\[ \Big( \sum_{m=0}^{N} |c^n_m - d_m|^p \Big)^{1/p} < \varepsilon/3. \]

Using Minkowski’s inequality once again, we have

\[ \Big( \sum_{m=N+1}^{\infty} |c^n_m - d_m|^p \Big)^{1/p} \le \Big( \sum_{m=N+1}^{\infty} |c^n_m - c^k_m|^p \Big)^{1/p} + \Big( \sum_{m=N+1}^{\infty} |c^k_m - d_m|^p \Big)^{1/p}. \]

Then
\[
\begin{aligned}
\|c^n - d\|_p &\le \Big( \sum_{m=0}^{N} |c^n_m - d_m|^p \Big)^{1/p} + \Big( \sum_{m=N+1}^{\infty} |c^n_m - c^k_m|^p \Big)^{1/p} + \Big( \sum_{m=N+1}^{\infty} |c^k_m - d_m|^p \Big)^{1/p} \\
&< \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.
\end{aligned}
\]
The last thing to do is the case p = ∞.
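Given ε > 0, let n0 be such that, if n, k ≥ n0 , then $\|c^n - c^k\|_\infty < \varepsilon/2$. Fix any m. Then

\[ |c^n_m - d_m| \le |c^n_m - c^k_m| + |c^k_m - d_m|. \]

We have
\[ |c^n_m - c^k_m| \le \|c^n - c^k\|_\infty < \varepsilon/2. \]
Hence
\[ |c^n_m - d_m| < \frac{\varepsilon}{2} + |c^k_m - d_m|. \]
Now choose k ≥ n0 such that $|c^k_m - d_m| < \varepsilon/3$. We arrive at
\[ |c^n_m - d_m| < \frac{\varepsilon}{2} + \frac{\varepsilon}{3} = \frac{5\varepsilon}{6}. \]
Since this is true of any m, we have
\[ \|c^n - d\|_\infty \le \frac{5\varepsilon}{6} < \varepsilon. \]
□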
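
Two small experiments may help to see what the proof is doing. Both are sketches on sequences cut off after 40 entries (an assumption made so that they fit in a list), with p = 2 chosen arbitrarily and made-up helper names. In the first, $c^n$ keeps the first n entries of d = (1, 1/2, 1/4, . . . ) and the distances to d shrink. In the second, every coordinate converges to 0 but the elements stay far apart, so the sequence is not Cauchy; coordinatewise convergence alone is not enough.

```python
def p_norm(a, p):
    """||a||_p for a finite list a and finite p >= 1."""
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

def difference(x, y):
    """Entrywise difference of two lists of the same length."""
    return [s - t for s, t in zip(x, y)]

LENGTH = 40  # only the first 40 coordinates are stored (a truncation, for illustration)
p = 2.0

# Example 1: c^n keeps the first n entries of d_m = 2^-m and is 0 afterwards.
d = [2.0 ** (-m) for m in range(LENGTH)]
truncations = [d[:n] + [0.0] * (LENGTH - n) for n in range(LENGTH + 1)]
errors = [p_norm(difference(c, d), p) for c in truncations]  # ||c^n - d||_p

# Example 2: e^n has a 1 in position n and 0 elsewhere. Each coordinate tends to 0,
# but consecutive elements stay at distance 2^(1/p), so e^n is not a Cauchy sequence.
units = [[1.0 if m == n else 0.0 for m in range(LENGTH)] for n in range(LENGTH)]
gaps = [p_norm(difference(units[n], units[n + 1]), p) for n in range(LENGTH - 1)]
print(errors[:5], gaps[:5])
```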
Topic 3

The spaces Lp

The spaces Lp , to be defined below, are closely related to the spaces ℓp , and are central to func-
tional analysis, being natural spaces where to define boundary conditions for partial differential
equations.

3.1 Almost a metric


In this topic, the letter I will always denote an interval of the real line, open or closed, finite or infinite. For
example, we could have I = [0, 1], or I = [0, ∞), or I = R, etc.
Let L(I) be the space of functions f : I → C such that

\[ \int_I |f(t)|\,dt < \infty. \]

These are called (Lebesgue) integrable functions. We can try to define a metric on L(I) by
setting
\[ d(f, g) = \int_I |f(t) - g(t)|\,dt. \]
We can check that most of the properties for metric are satisfied. For example, d(f, g) ≥ 0,
d(f, f ) = 0, and d(f, g) = d(g, f ) are immediate. The triangle inequality is also not too difficult:

\[
\begin{aligned}
d(f, g) &= \int_I |f(t) - g(t)|\,dt \\
&= \int_I |f(t) - h(t) + h(t) - g(t)|\,dt \\
&\le \int_I \big( |f(t) - h(t)| + |h(t) - g(t)| \big)\,dt \\
&= \int_I |f(t) - h(t)|\,dt + \int_I |h(t) - g(t)|\,dt \\
&= d(f, h) + d(h, g).
\end{aligned}
\]
The property that is not true is the following:
d(f, g) = 0 =⇒ f = g.
Here is an example. Take I = [0, 1], f (t) = 0 for all t, and g(t) = 0 for 0 ≤ t < 1, but g(1) = 3.
So, f and g differ for only one value of t, but they do differ. Hence f ̸= g. But the area between
the graphs of f and g measures zero, and so d(f, g) = 0.


We have seen that this d is almost a metric, and that it fails to be a metric for a very silly
reason. The way we get around this problem in mathematics is by declaring that f and g are
equivalent to each other when d(f, g) = 0. If we do that, then we can view d as a metric.

Definition 3.1. The space L1 (I) is the space of all Lebesgue integrable functions f : I → C,
with the proviso that if $\int_I |f - g|\,dt = 0$, then we will say that f is equivalent to g. The metric
in L1 is defined by
\[ \|f - g\|_1 = \int_I |f(t) - g(t)|\,dt. \]
⋄

The definition given above is subtle, and in fact it is more than one definition. We defined
equivalence of functions, and defined L1 . I want to take a second look at the definition of
equivalence.
Saying that f is equivalent to g when the integral $\int_I |f - g|\,dt = 0$ is fine, but we can do a
little bit better. We will keep the discussion at an informal level (that is, we will give the idea
behind proofs, but won’t go into technical details).

Definition 3.2. The length (measure) of an interval J = (a, b) ⊂ R is m(J) = |b − a|. Let
A ⊂ R be any subset of R. We say that A has measure zero (and write m(A) = 0), if the
following is true: Given any ε > 0, there are countably many open intervals Jn = (an , bn ) such
that A is contained in the union of the intervals (that is, $A \subset \bigcup_{n=1}^{\infty} J_n$), and the Jn combined
do not measure more than ε (that is, $\sum_{n=1}^{\infty} m(J_n) \le \varepsilon$). ⋄

In other words, A has measure zero if there is a bigger set containing A, and the measure of
that bigger set is smaller than ε (now do that for every ε).
In general it may be difficult to visualise sets of measure zero, but the example to keep in
mind is the example of Q.

Example 12. The rational numbers Q have measure zero (any countable set has measure zero).
Suppose xn , for n = 1, 2, 3, etc, is an enumeration of Q. Given ε > 0, define
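\[ J_n = \Big( x_n - \frac{\varepsilon}{2^{n+1}},\; x_n + \frac{\varepsilon}{2^{n+1}} \Big). \]
Then $m(J_n) = \varepsilon / 2^n$. Since xn ∈ Jn , then Q is a subset of the union of the Jn . But

\[ \sum_{n=1}^{\infty} m(J_n) = \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} = \varepsilon \Big( \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots \Big) = \varepsilon. \]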

We conclude that m(Q) = 0. ⋄
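
The covering in Example 12 can be built explicitly for the first few rationals. The sketch below makes choices that are not in the text: it lists rationals in [0, 1] only, ordered by increasing denominator (one of many possible enumerations), stops after 30 of them, and takes ε = 1/100. Exact fractions are used so that the total length involves no rounding.

```python
from fractions import Fraction
from math import gcd

def first_rationals(count):
    """First `count` rationals in [0, 1], listed by increasing denominator (one possible enumeration)."""
    found = []
    den = 1
    while len(found) < count:
        for num in range(den + 1):
            if gcd(num, den) == 1 and len(found) < count:
                found.append(Fraction(num, den))
        den += 1
    return found

def cover(points, eps):
    """Intervals J_n = (x_n - eps/2^(n+1), x_n + eps/2^(n+1)) for n = 1, 2, ..."""
    return [(x - eps / 2 ** (n + 1), x + eps / 2 ** (n + 1))
            for n, x in enumerate(points, start=1)]

eps = Fraction(1, 100)
points = first_rationals(30)
intervals = cover(points, eps)
total_length = sum(right - left for left, right in intervals)
print(total_length, float(total_length))
```

After N intervals the total length is ε(1 − 2^(−N)), which stays below ε however many rationals are covered.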

We state the following proposition without proof. (You can try proving it. I think the result
is reasonable.)

Proposition 10. (a) If m(A) = 0 and B ⊂ A, then m(B) = 0.

(b) If m(An ) = 0 for n = 1, 2, 3, etc, and $A = \bigcup_{n=1}^{\infty} A_n$, then m(A) = 0.

Now here is the big definition.

Definition 3.3. Let f, g : I → C be two functions from I ⊂ R to the complex numbers. We say
that f is equal to g almost everywhere (and write f = g a.e.) if there is a set A contained in
I, with m(A) = 0, and f (t) = g(t) if t ̸∈ A. ⋄

In other words, f = g a.e. if f and g are different over (at most) a set of measure zero.
And here is the big result.
Proposition 11. Let f, g : I → C be two integrable functions from I ⊂ R to the complex
numbers. The following two statements are equivalent.
(a) $\int_I |f(t) - g(t)|\,dt = 0$.
(b) f = g a.e.
Proof. For simplicity we write h(t) = |f (t) − g(t)|.
(a) ⇒ (b). Define A = {t ∈ I | h(t) ≠ 0}, and B = {t ∈ I | h(t) = 0}. Note that I = A ∪ B
and A ∩ B = ∅. Then
\[ 0 = \int_I h(t)\,dt = \int_A h(t)\,dt + \int_B h(t)\,dt = \int_A h(t)\,dt, \]

because h(t) = 0 when t ∈ B. Note that if t ∈ A, then h(t) > 0, and vice-versa. Define the
intervals $J_k = (2^{-k-1}, 2^{-k}]$. These intervals are disjoint, and

\[ (0, \infty) = \bigcup_{k=-\infty}^{\infty} J_k. \]

Define Ak as follows: t ∈ Ak ⇔ h(t) ∈ Jk . The sets Ak must be disjoint, and their union must
be A. So
\[ 0 = \int_A h(t)\,dt = \sum_{k=-\infty}^{\infty} \int_{A_k} h(t)\,dt. \]

But if t ∈ Ak , then $h(t) > 2^{-k-1}$, and so

\[ 0 = \sum_{k=-\infty}^{\infty} \int_{A_k} h(t)\,dt \ge \sum_{k=-\infty}^{\infty} \int_{A_k} 2^{-k-1}\,dt = \sum_{k=-\infty}^{\infty} 2^{-k-1}\, m(A_k) \ge 0. \]

We conclude that m(Ak ) = 0 for all k, and so

\[ m(A) = \sum_{k=-\infty}^{\infty} m(A_k) = 0. \]

That is, the set where f ≠ g has measure zero; that is, f = g a.e.
(b) ⇒ (a). Start from
\[ 0 \le \int_I h(t)\,dt = \int_A h(t)\,dt + \int_B h(t)\,dt, \]

with A and B as before. Again as before, since h(t) = 0 when t ∈ B, we have $\int_B h(t)\,dt = 0$, and
so
\[ 0 \le \int_I h(t)\,dt \le \int_A h(t)\,dt. \]
With the same sets Ak as before, we have
\[ 0 \le \int_I h(t)\,dt \le \int_A h(t)\,dt = \sum_{k=-\infty}^{\infty} \int_{A_k} h(t)\,dt. \]

If t ∈ Ak , then $h(t) \le 2^{-k}$, and so

\[ 0 \le \int_I h(t)\,dt \le \sum_{k=-\infty}^{\infty} \int_{A_k} 2^{-k}\,dt = \sum_{k=-\infty}^{\infty} 2^{-k}\, m(A_k). \]

But Ak is a subset of A, and m(A) = 0. Hence m(Ak ) = 0, and we obtain

\[ 0 \le \int_I h(t)\,dt \le 0. \]
□

One consequence of this equivalence between functions is that it makes no sense to talk about
the value of f at t0 . Let’s explain that. If we view f as a function, then we can talk about f (t0 )
for a fixed t0 . No problem. But if g is equivalent to f , it is not reasonable to expect the value
g(t0 ) to be the same value as f (t0 ) because, as we now know, f (t) and g(t) can have different
values for individual t, and still be equivalent. Hence, if we view f not as a function, but as a
representative of all functions equivalent to it, we can’t really refer to the value f (t0 ) (because
a different representative g can have a different value g(t0 ) ̸= f (t0 )).

3.2 The spaces Lp (I)


Definition 3.4. Let 1 ≤ p < ∞. The space Lp = Lp (I) is the space of all functions f : I → C
such that
\[ \int_I |f(t)|^p\,dt < \infty. \]
In that case we will write
\[ \|f\|_p = \Big( \int_I |f(t)|^p\,dt \Big)^{1/p}. \]
⋄
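
For a continuous function on a bounded interval the Lebesgue integral agrees with the usual Riemann integral, so the norm ∥f∥p can be estimated with a simple quadrature rule. The sketch below is an illustration only: it uses f(t) = t on I = [0, 1], a midpoint rule with 100,000 sample points, and a made-up helper name lp_norm.

```python
def lp_norm(f, a, b, p, samples=100_000):
    """Midpoint-rule approximation of (integral over [a, b] of |f(t)|^p dt)^(1/p)."""
    h = (b - a) / samples
    total = sum(abs(f(a + (j + 0.5) * h)) ** p for j in range(samples))
    return (total * h) ** (1.0 / p)

def identity(t):
    return t

norms = {p: lp_norm(identity, 0.0, 1.0, p) for p in (1, 2, 3)}
# Closed form for comparison: the integral of t^p over [0, 1] is 1/(p+1).
exact = {p: (1.0 / (p + 1)) ** (1.0 / p) for p in (1, 2, 3)}
print(norms, exact)
```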

We also need to define the space L∞ .
Definition 3.5. We say that a function f : I → C is essentially bounded if there is a set
A ⊂ I of measure zero, and a constant M > 0, such that

|f (t)| ≤ M

for all t ̸∈ A. ⋄
An essentially bounded function can be unbounded, but only on a set of measure zero.
Proposition 12. Suppose f is essentially bounded, and g is equivalent to f . Then g is also
essentially bounded.
Proof. Since f is essentially bounded, there’s a set A with m(A) = 0, and f is bounded by M
outside of A.
Since f and g are equivalent, there is a set B with m(B) = 0, and f = g outside B.
Since A and B have measure zero, then C = A ∪ B has measure zero.
If t is outside C, then t is outside both A and B. Suppose t is outside C. Then

|g(t)| = |f (t)| ≤ M,

showing that g is bounded outside C (a set of measure zero). Hence g is essentially bounded. □

Definition 3.6. Suppose f is essentially bounded. We define ∥f ∥∞ to be the smallest number


M such that
|f (t)| ≤ M
for all t outside a set A of measure zero. ⋄

Note that, because of the last proposition, if f and g are equivalent, then ∥f ∥∞ = ∥g∥∞ . So
the number ∥f ∥∞ is defined not only for f , but also for all g that are equivalent to f .

Definition 3.7. The space L∞ (I) contains all functions that are essentially bounded. ⋄

As with the spaces ℓp , we have the following result.

Proposition 13. Let 1 ≤ p ≤ ∞. Suppose f, g ∈ Lp , and λ ∈ C. Then f + g ∈ Lp , and


λf ∈ Lp . As a consequence Lp is a vector space over C.

Proof. First the case p = ∞. Let A and B be such that m(A) = m(B) = 0, |f (t)| ≤ ∥f ∥∞ if
t ̸∈ A, and |g(t)| ≤ ∥g∥∞ if t ̸∈ B. Then, if t ̸∈ (A ∪ B), we have

|f (t) + g(t)| ≤ |f (t)| + |g(t)| ≤ ∥f ∥∞ + ∥g∥∞ < ∞,

showing that f + g ∈ L∞ and also showing that

∥f + g∥∞ ≤ ∥f ∥∞ + ∥g∥∞ .

Also, if t ̸∈ A we have
|λf (t)| = |λ| |f (t)| ≤ |λ| ∥f ∥∞
showing that λf ∈ L∞ , and ∥λf ∥∞ = |λ| ∥f ∥∞ .
Now consider the case 1 ≤ p < ∞. The convexity of the function $x \mapsto x^p$ gives us
\[ \Big( \frac{|f(t)| + |g(t)|}{2} \Big)^p \le \frac{|f(t)|^p + |g(t)|^p}{2}. \]

Then
\[ \int_I |f(t) + g(t)|^p\,dt \le \int_I (|f(t)| + |g(t)|)^p\,dt \le \frac{2^p}{2} \int_I \big( |f(t)|^p + |g(t)|^p \big)\,dt < \infty. \]
Finally,
\[ \int_I |\lambda f(t)|^p\,dt = |\lambda|^p \int_I |f(t)|^p\,dt < \infty. \]

Taking pth roots we obtain ∥λf ∥p = |λ| ∥f ∥p .

The vector space result follows from this. □

3.3 Hölder’s and Minkowski’s inequalities for Lp


The proofs will be very similar to the ones for ℓp . The definition of conjugate exponents is the
same as before.

Proposition 14 (Hölder’s inequality). Let 1 ≤ p, q be conjugate exponents. Take f ∈ Lp and


g ∈ Lq . Then f g ∈ L1 , and
∥f g∥1 ≤ ∥f ∥p ∥g∥q .

Proof. First the case 1 < p, q < ∞. From Young’s inequality we have
\[ |f(t) g(t)| \le \frac{|f(t)|^p}{p} + \frac{|g(t)|^q}{q}. \]
Integrating gives us
\[ \int_I |f(t) g(t)|\,dt \le \frac{1}{p} \int_I |f(t)|^p\,dt + \frac{1}{q} \int_I |g(t)|^q\,dt, \]

showing that f g ∈ L1 .
If ∥f ∥p = 0 then |f |p = 0 a.e., and so f = 0 a.e. as well. But then f g = 0 a.e., and
∥f g∥1 = 0. The same conclusion is true if ∥g∥q = 0. In either case, Hölder’s inequality becomes
0 ≤ 0, which is true. So now we assume neither ∥f ∥p nor ∥g∥q are zero. Define F (t) = f (t)/∥f ∥p ,
and G(t) = g(t)/∥g∥q . Then ∥F ∥p = 1 and ∥G∥q = 1. From Young’s inequality we have

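\[ |F(t) G(t)| \le \frac{|F(t)|^p}{p} + \frac{|G(t)|^q}{q}. \]
Integrating gives us
\[ \int_I |F(t) G(t)|\,dt \le \frac{1}{p} \int_I |F(t)|^p\,dt + \frac{1}{q} \int_I |G(t)|^q\,dt, \]
or
\[ \|FG\|_1 \le \frac{1}{p} \|F\|_p^p + \frac{1}{q} \|G\|_q^q = \frac{1}{p} + \frac{1}{q} = 1. \]
But
\[ \|FG\|_1 = \frac{\|fg\|_1}{\|f\|_p\, \|g\|_q}, \]
proving Hölder’s inequality.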
Next we consider the case p = 1, q = ∞. There is a set A of measure zero such that
|g(t)| ≤ ∥g∥∞ for all t ̸∈ A. We have

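\[
\begin{aligned}
\|fg\|_1 &= \int_I |f(t) g(t)|\,dt \\
&= \int_A |f(t) g(t)|\,dt + \int_{A^c} |f(t) g(t)|\,dt \\
&= \int_{A^c} |f(t) g(t)|\,dt \\
&\le \int_{A^c} |f(t)|\, \|g\|_\infty\,dt \\
&= \|g\|_\infty \int_{A^c} |f(t)|\,dt \\
&= \|g\|_\infty \int_I |f(t)|\,dt \\
&= \|g\|_\infty\, \|f\|_1.
\end{aligned}
\]

This finishes the proof.
□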

We are now ready to prove Minkowski’s inequality.



Proposition 15 (Minkowski’s inequality). If 1 ≤ p ≤ ∞ and f, g ∈ Lp , then

∥f + g∥p ≤ ∥f ∥p + ∥g∥p .

Proof. The case p = ∞ was done already, when we proved that Lp is a vector space. The case
p = 1 is simple enough, because
\[ \|f + g\|_1 = \int_I |f(t) + g(t)|\,dt \le \int_I (|f(t)| + |g(t)|)\,dt = \|f\|_1 + \|g\|_1. \]

We turn our attention to the case when 1 < p < ∞. We have


\[
\begin{aligned}
\int_I |f(t) + g(t)|^p\,dt &= \int_I |f(t) + g(t)|\, |f(t) + g(t)|^{p-1}\,dt \\
&\le \int_I (|f(t)| + |g(t)|)\, |f(t) + g(t)|^{p-1}\,dt \\
&= \int_I |f(t)|\, |f(t) + g(t)|^{p-1}\,dt + \int_I |g(t)|\, |f(t) + g(t)|^{p-1}\,dt.
\end{aligned}
\]

Since f, g ∈ Lp , then (f + g) ∈ Lp (already proved), and because p, q are conjugate, we have
$|f + g|^{p-1} \in L^q$ (because p − 1 = p/q). We are in a position to use Hölder’s inequality:

\[ \int_I |f(t)|\, |f(t) + g(t)|^{p-1}\,dt \le \|f\|_p\, \|f + g\|_p^{p/q}; \]
\[ \int_I |g(t)|\, |f(t) + g(t)|^{p-1}\,dt \le \|g\|_p\, \|f + g\|_p^{p/q}. \]

Then
\[ \|f + g\|_p^p \le (\|f\|_p + \|g\|_p)\, \|f + g\|_p^{p/q}. \]

That is (assuming ∥f + g∥p ≠ 0; otherwise there is nothing to prove),
\[ \|f + g\|_p^{\,p - \frac{p}{q}} \le \|f\|_p + \|g\|_p. \]
But $p - \frac{p}{q} = 1$. □

3.4 Lp is a complete metric space


We define dp (f, g) = ∥f − g∥p .

Proposition 16. The space Lp is a metric space with the metric dp .

Proof. This could be an exercise right now. We have dp (f, g) ≥ 0, and dp (f, f ) = 0. Also, if
dp (f, g) = 0, then |f − g| = 0 a.e., and so f = g (in the sense that they are equivalent). Also,
dp (f, g) = dp (g, f ) because |f − g| = |g − f |. For the triangle inequality we use Minkowski’s
inequality:
∥f − g∥p = ∥f − h + h − g∥p ≤ ∥f − h∥p + ∥h − g∥p .
□

The big result is the following.

Theorem 1. The metric space Lp (I, C) is complete.



Proof. Let fn ∈ Lp be a Cauchy sequence, that is, given ε > 0 there is some n0 such that,
if n, m ≥ n0 , then ∥fn − fm ∥p < ε. We want to show that there is some f ∈ Lp such that
∥fn − f ∥p → 0 as n increases.
The first thing to do is to obtain f , and show that f ∈ Lp . This is not easy to do (much
harder than the ℓp case).
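The key thing is to prove the following:

The set of values t for which fn (t) is not a Cauchy sequence has measure zero.

(For p = ∞ this holds for the sequence fn itself. For 1 ≤ p < ∞ we may first have to replace fn by a
subsequence; this is harmless, because a Cauchy sequence converges as soon as one of its
subsequences converges.) That is, define

A = {t ∈ I | fn (t) does not converge}.
Then m(A) = 0. Let’s assume this is true. Then if t ∉ A we define

\[ f(t) = \lim_{n \to \infty} f_n(t), \]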

and f (t) = 0 if t ∈ A. This produces our candidate f to be the limit of the fn . Next we need
to show that f ∈ Lp . The case p = ∞ should be treated separately from the case 1 ≤ p < ∞,
so perhaps we should complete the proof for the case p = ∞ first, and then go back to treat the
case 1 ≤ p < ∞.
Assume p = ∞. Define B = Ac . Since fn is a Cauchy sequence in L∞ , there is a value of
M such that ∥fn ∥∞ ≤ M for all n (a general fact about Cauchy sequences in metric spaces).
Then, for any t ∈ B we have

|f (t)| = |f (t)−fn (t)+fn (t)| ≤ |f (t)−fn (t)|+|fn (t)| ≤ |f (t)−fn (t)|+∥fn ∥∞ ≤ |f (t)−fn (t)|+M

Since fn (t) → f (t), by taking limits we obtain

\[ |f(t)| \le \lim_{n\to\infty} |f(t) - f_n(t)| + M = M. \]

But since A has measure zero, we conclude that ∥f ∥∞ ≤ M < ∞, and f ∈ L∞ .


Next we show that ∥fn − f ∥∞ → 0 as n increases. Given ε > 0, there’s n0 such that, if
n, m ≥ n0 , then ∥fn − fm ∥∞ < ε/3. Fix t ∈ B. There is some n1 such that, if m ≥ n1 , then
|fm (t) − f (t)| < ε/3. So, for n ≥ n0 and m ≥ max{n0 , n1 } we have
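\[ |f_n(t) - f(t)| \le |f_n(t) - f_m(t)| + |f_m(t) - f(t)| \le \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \frac{2}{3}\varepsilon. \]
Then
\[ \sup_{t\in B} |f_n(t) - f(t)| \le \frac{2}{3}\varepsilon < \varepsilon. \]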
As a result we have ∥fn − f ∥∞ < ε, if n ≥ n0 , as we wanted to show. Hence L∞ is complete.
Now for the case 1 ≤ p < ∞. The proof is technical, so I will only give the main highlights.
Since fn (t) is converging to f (t) for each t a.e., then |fn (t)|p is converging to |f (t)|p a.e. as well.
It would be nice to be able to use either the Monotone Convergence Theorem, or the Dominated
Convergence Theorem (see Topic 1), but neither theorem applies here. So what we do is use
the sequence fn to define another Cauchy sequence gn such that ∥fn − gn ∥p → 0, and |gn (t)|p
increases monotonically for almost every t. (This is not difficult, but it is technical.) The gn
will be converging to f for almost every t. Now we use the Monotone Convergence Theorem to
conclude that
\[ \lim_{n\to\infty} \int_I |g_n(t)|^p\,dt = \int_I |f(t)|^p\,dt. \]

This shows that f ∈ Lp . Now we use the triangle inequality to write


∥fn − f ∥p ≤ ∥fn − gn ∥p + ∥gn − f ∥p .
As n → ∞, ∥fn − gn ∥p goes to 0 because of the way we produced gn , and ∥gn − f ∥p goes to zero
because of the Dominated Convergence Theorem.
□

3.5 Understanding Lp
I want to have a working understanding of Lp . First, let’s assume the interval I is finite. (That
is, I is one of the following: [a, b], or (a, b], or [a, b), or (a, b), with a < b finite numbers).
When I is finite, then any bounded function is in $L^p$, because if $|f(t)| \le M$ for all t ∈ I, then
\[ \|f\|_\infty \le M, \qquad \|f\|_p^p = \int_I |f(t)|^p\,dt \le \int_I M^p\,dt = M^p\,|b - a| < \infty. \]
Are there any unbounded functions in $L^p$? Well, if p = ∞, then no, because by definition $L^\infty$ contains only functions that are essentially bounded.
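For 1 ≤ p < ∞, then yes, there are unbounded functions in $L^p$. The main example to understand is the following. Let δ be any real number. If $a < t_0 < b$, then
\[ \int_a^b \frac{1}{|t - t_0|^{1+\delta}}\,dt = \begin{cases} \infty, & \text{if } \delta \ge 0; \\ \text{finite}, & \text{if } \delta < 0. \end{cases} \]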
To fix ideas, let I = [−1, 1]. If p = 1, the following functions are not in $L^1$:
\[ \frac{1}{|t|}, \qquad \frac{1}{|t|^{3/2}}, \qquad \frac{1}{|t|^{2}}, \]
and so on. But the following functions are in $L^1$:
\[ \frac{1}{|t|^{1/2}}, \qquad \frac{1}{|t|^{-1}}, \qquad \frac{1}{|t|^{2/3}}, \]
and so on. The first and last of these are unbounded.
What if p > 1? Say p = 3. If $f(t) = 1/|t|^{\alpha}$, then
\[ |f(t)|^3 = \frac{1}{|t|^{3\alpha}}. \]
So f will be in $L^3$ only if 3α < 1, or α < 1/3. In general, $1/|t|^{\alpha}$ will be in $L^p$ only if α < 1/p. We say that the exponent α = 1/p is critical: exponents below it give functions in $L^p$, exponents equal to or above it do not. So the type of singularity that is possible if f is to belong to $L^p$ is of a very special type.
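The critical exponent can also be watched numerically. The following sketch is not part of the original notes: it uses the closed form of the truncated integral of $t^{-s}$ over [ε, 1], with s = αp, and lets ε shrink. The helper name `truncated_integral` is made up for this illustration.

```python
import math

def truncated_integral(s, eps):
    """Exact value of the integral of t**(-s) over [eps, 1], for 0 < eps < 1."""
    if math.isclose(s, 1.0):
        return -math.log(eps)
    return (1.0 - eps ** (1.0 - s)) / (1.0 - s)

# s = alpha * p. With p = 3, alpha = 0.3 gives s = 0.9 (below the critical value),
# and alpha = 0.4 gives s = 1.2 (above the critical value).
for s in (0.9, 1.0, 1.2):
    values = [truncated_integral(s, 10.0 ** (-k)) for k in (2, 4, 8, 12)]
    print(s, [round(v, 3) for v in values])
```

For s = 0.9 the values stay below 1/(1 − s) = 10 however small ε is, while for s = 1 and s = 1.2 they keep growing as ε shrinks. This is only a sanity check of the rule α < 1/p, not a proof.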
Now we look at the case when I is infinite. We have already seen that singularities that
occur at specific values of t will have to be of a certain type. What we want to understand now
is how fast must f be decaying as |t| → ∞ for f to be in Lp . As it turns out, as |t| → ∞ the
behaviour is the opposite, because
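\[ \int_1^\infty \frac{1}{t^{1+\delta}}\,dt = \begin{cases} \infty, & \text{if } \delta \le 0; \\ \text{finite}, & \text{if } \delta > 0. \end{cases} \]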
This characterises the type of decay at |t| = ∞ for f to be in Lp . Again, taking f (t) = 1/tα , for
t ≥ 1, we need αp > 1 for f to be in Lp .

Example 13. Consider the function below, defined on I = R.
\[ f(t) = \begin{cases} 0, & t \le 0; \\ 1/t^{2/3}, & 0 < t < 1; \\ 1/t^{5}, & 1 \le t. \end{cases} \]

Then
\[ \int_I |f(t)|^p\,dt = \int_0^1 \frac{1}{t^{2p/3}}\,dt + \int_1^\infty \frac{1}{t^{5p}}\,dt. \]
The first integral (on the right) is finite exactly when 2p/3 < 1, or p < 1.5, and the second integral is finite exactly when 5p > 1, or p > 0.2. Since we only consider p ≥ 1, we conclude that this f is in $L^p$ exactly when 1 ≤ p < 1.5. ⋄
Topic 4

Banach spaces

Banach spaces are a straight generalisation of the concept of vector space (Rn , that is). They
are spaces where we can add elements to each other (as we can add vectors to each other), where
we can multiply elements by scalars (as we do with vectors), and where we have a notion of
length of an element. The space must also be complete.
We have already seen examples of Banach spaces, namely Rn and Cn , all spaces ℓp and Lp ,
for 1 ≤ p ≤ ∞. The reason we care about these spaces is that they form the natural setting
in which to model many problems in mathematics, especially problems in differential equations,
but also in optimisation.

4.1 Linear spaces


The definition of linear space tries to capture what is essential about Rn .

Definition 4.1. A linear space (over the scalar field R) is any non-empty set V where a
sum + : V × V → V , and a scalar multiplication · : R × V → V are defined. These two
operations must obey the following properties: The sum is commutative and associative, has a
neutral element (called zero, and denoted by 0), such that every element has an additive inverse.
Moreover, the usual distributive laws apply between the sum and the multiplication. ⋄

To be more explicit, the sum obeys the following rules, for all x, y, z ∈ V :

• x + y = y + x;

• (x + y) + z = x + (y + z);

• 0 + x = x;

• For each x there is an element −x such that x + (−x) = 0.

We won’t be too pedantic here, and will write x−x = 0 for short. Note that this neutral element
0 is not the real number 0. We are using the same symbol to denote two different things.
The multiplication obeys the following rules, for all x, y ∈ V and all a, b ∈ R:

• a · (b · x) = (ab) · x = b · (a · x);

• (a + b) · x = a · x + b · x;

• a · (x + y) = a · x + a · y;


• 1 · x = x.
We can prove all sorts of properties from these statements. For example, we can show that
0 · x = 0. (The first 0 is a real number, the second 0 isn’t.) Here it goes, you justify the steps.

y = 0 · x = (0 + 0) · x = 0 · x + 0 · x = y + y.

Now add −y to both sides.

0 = y + (−y) = (y + y) + (−y) = y + (y + (−y)) = y + 0 = y.

Hence 0 · x = y = 0.
From now on we will simply write ax instead of a · x.
We will not spend time proving properties like these, but it is nice to see how the properties
we are used to are logical consequences of the conditions we imposed to define linear spaces.
Example 14. Let V = C([0, 1], R). We define the sum and the product as follows, for any
f, g ∈ V , and a ∈ R.

(f + g)(t) = f (t) + g(t); (a · f )(t) = a f (t).

The neutral element of the sum is the function 0(t) = 0 for all t. (Note how the symbol 0
means two different things in the same equation! Bad habit, but the alternative is encumbering
notation.) It is a tedious, and some would say unnecessary exercise, to verify that all conditions
for V to be a linear space are satisfied. ⋄
Example 15. Here we let V be the space of 2-by-2 matrices with real entries, the usual matrix
sum, and the usual multiplication by a scalar. The zero matrix is the matrix where all four
entries are zero. Again, this is a linear space. ⋄
Example 16. The spaces ℓp are linear spaces (we have shown that they satisfy the definition
already). Likewise, the spaces Lp are linear spaces. ⋄
In our definition we required that the scalars are real numbers, but we could also have linear
spaces over C, or over Q, without altering the definition of linear space in essentials. We will
say that V is a linear space over K, and let K be either Q, R, or C, when the particular case is
irrelevant.

4.1.1 Linear subspaces


A linear subspace of V is itself a linear space, but it is contained in V . The proper definition
follows.
Definition 4.2. Let V be a linear space over K. A non-empty subset W of V is said to be a
(linear) subspace of V if the following two properties are true:
• x ∈ W, a ∈ K =⇒ ax ∈ W ;

• x, y ∈ W =⇒ x + y ∈ W . ⋄
The following proposition is left as an exercise.
Proposition 17. If W is a subspace of V , then W is itself a linear space when we restrict the
sum + and the scalar multiplication · to W .

Example 17. If V is a linear space, then S1 = V and S2 = {0} are subspaces of V . These are
the trivial subspaces. ⋄

Example 18. The subspaces of R3 are: (a) R3 itself; (b) Any plane going through the origin;
(c) Any straight line going through the origin; (d) Just the origin. It is a neat exercise to show
that these are subspaces, and that there are no other subspaces. ⋄

Example 19. Let V = C([0, 1], R) be a linear space over K = R. For each integer n ≥ 0 we define $f_n(x) = \cos(nx)$. We define
\[ W = \Big\{ w(x) = \sum_{n=0}^{N} a_n f_n(x) \;\Big|\; a_n \in \mathbb{R},\ N \in \mathbb{N} \Big\}. \]

In words, an element w ∈ W is a finite linear combination of some of the functions $f_n$. We claim that W is a subspace of V . Let’s see. If w ∈ W and a ∈ R, then $w = \sum_{n=0}^{N} a_n f_n$ for some N , and therefore
\[ a\,w(x) = \sum_{n=0}^{N} (a a_n) f_n(x) = \sum_{n=0}^{N} b_n f_n(x) \in W, \]
where $b_n = a a_n$. If now $v = \sum_{n=0}^{M} c_n f_n$, we have that
\[ w(x) + v(x) = \sum_{n=0}^{N} a_n f_n(x) + \sum_{n=0}^{M} c_n f_n(x). \]

Say M ≥ N (the other case is similar). We define $a_n = 0$ for N < n ≤ M , and so
\[ w(x) + v(x) = \sum_{n=0}^{M} a_n f_n(x) + \sum_{n=0}^{M} c_n f_n(x) = \sum_{n=0}^{M} (a_n + c_n) f_n(x), \]

showing that w + v ∈ W . ⋄
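The bookkeeping in this example (scaling the coefficients, and padding the shorter sum with zeros before adding) can be mimicked on a computer. The sketch below is only an illustration under my own conventions: an element of W is stored as the list of its coefficients, and the names `scale`, `add`, and `evaluate` are made up here.

```python
import math
from itertools import zip_longest

def scale(a, coeffs):
    """Coefficients of a*w, where w(x) = sum_n coeffs[n] * cos(n x)."""
    return [a * c for c in coeffs]

def add(coeffs_w, coeffs_v):
    """Coefficients of w + v; the shorter list is padded with zeros."""
    return [a + c for a, c in zip_longest(coeffs_w, coeffs_v, fillvalue=0.0)]

def evaluate(coeffs, x):
    """Value at x of the finite combination sum_n coeffs[n] * cos(n x)."""
    return sum(c * math.cos(n * x) for n, c in enumerate(coeffs))

w = [1.0, 2.0]          # w(x) = 1 + 2 cos(x)
v = [3.0, 4.0, 5.0]     # v(x) = 3 + 4 cos(x) + 5 cos(2x)
s = add(w, v)           # coefficients of w + v
x = 0.7
print(s, evaluate(s, x), evaluate(w, x) + evaluate(v, x))
```

The last two printed numbers agree (up to rounding), which is the computational counterpart of the statement that w + v is again a finite combination of the $f_n$.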

Now comes a string of examples of subspaces in ℓp . I debated with myself about making
these exercises instead of examples, but in the end I chose examples. If you decide to treat them
as exercises, though, that would be great.

Example 20. The space ℓ1 (N, C) is a subspace of ℓp (N, C), for 1 ≤ p ≤ ∞. We already know
that ℓ1 is a linear space. To show that it is a subspace of ℓp , we need to show that it is contained
in ℓp .
Suppose
x = (x0 , x1 , x2 , . . . ) ∈ ℓ1 .
Then
\[ \sum_{n=0}^{\infty} |x_n| = \|x\|_1 < \infty. \]

We want to show that, for the same x, the quantity
\[ \sum_{n=0}^{\infty} |x_n|^p \]

is finite (if p < ∞), or that the quantity
\[ \sup_{0 \le n < \infty} |x_n| \]

is finite (for p = ∞). This would be a good moment for you to treat this example as an exercise.
First take p < ∞. Since the sum $\sum_{n=0}^{\infty} |x_n|$ converges, we know that there is some $n_0$ such that if $n \ge n_0$, then $|x_n| < 1$. Hence, for $n \ge n_0$ we have $|x_n|^p \le |x_n|$ (recall that p ≥ 1). We compute:
\[ \sum_{n=0}^{\infty} |x_n|^p = \sum_{n=0}^{n_0-1} |x_n|^p + \sum_{n=n_0}^{\infty} |x_n|^p \le \sum_{n=0}^{n_0-1} |x_n|^p + \sum_{n=n_0}^{\infty} |x_n| \le \sum_{n=0}^{n_0-1} |x_n|^p + \|x\|_1 < \infty. \]

We conclude that ℓ1 is contained in ℓp , for 1 ≤ p < ∞, and is a subspace of ℓp .


Now for the case p = ∞. With the same $n_0$ as before, we note that
\[ \sup_{n_0 \le n < \infty} |x_n| \le \sup_{n_0 \le n < \infty} 1 = 1. \]

As a consequence we have
\[ \sup_{0 \le n < \infty} |x_n| \le \max\{|x_0|, \dots, |x_{n_0-1}|, 1\} < \infty. \]

This proves that $\ell^1$ is a subspace of $\ell^\infty$. ⋄


Example 21. This is an example, but it really is an exercise. Suppose 1 ≤ p1 ≤ p2 ≤ ∞. Show
that ℓp1 is a subspace of ℓp2 . Use the previous example as a guide. ⋄
Example 22. Fix $a \in \ell^\infty$. (Let’s make it more concrete: Define a by setting $a_n = \sin(n)$, certainly in $\ell^\infty$ because $|\sin t| \le 1$ for all t.) We define the following subset of $\ell^1$:
\[ W = \Big\{ x \in \ell^1 \;\Big|\; \sum_{n=0}^{\infty} a_n x_n = 0 \Big\}. \]

Let’s show that W is a subspace of $\ell^1$. Follow the steps below.

(a) W is not empty, because the element x = (0, 0, 0, . . . ) is in W . Why? Because
\[ \sum_{n=0}^{\infty} \sin(n) \cdot 0 = 0. \]

(Note: This argument works with any $a \in \ell^\infty$, not just this a.)
(b) Pick λ ∈ C. We want to show that, if x ∈ W , then λx is also in W . But
\[ \sum_{n=0}^{\infty} \sin(n) \cdot (\lambda x_n) = \lambda \sum_{n=0}^{\infty} \sin(n) \cdot x_n = 0, \]

the last equality coming from the assumption that x ∈ W . We conclude that λx is in W as well.
(Verify that this argument does not use the definition $a_n = \sin(n)$ in any essential way, and that any $a \in \ell^\infty$ would also work.)
(c) Suppose x, y ∈ W . Then
\[ \sum_{n=0}^{\infty} \sin(n) \cdot (x_n + y_n) = \sum_{n=0}^{\infty} \sin(n) \cdot x_n + \sum_{n=0}^{\infty} \sin(n) \cdot y_n = 0 + 0 = 0. \]

We conclude that x + y is in W , and W is a subspace of $\ell^1$. ⋄



Example 23. Really, an exercise. Repeat the last example, but this time don’t pick a ∈ ℓ∞ .
Instead, let a = (a0 , a1 , . . . ) be any sequence of numbers. Show that the result is still true. ⋄
Example 24. Let $a = (a_0, a_1, a_2, \dots)$ be any sequence of numbers, and define, for 1 ≤ p < ∞,
\[ W = \Big\{ x \in \ell^p \;\Big|\; \sum_{n=0}^{\infty} a_n x_n = 0 \Big\}. \]

Show that W is a subspace of ℓp . ⋄

The following result is very important.


Proposition 18. Suppose W1 and W2 are subspaces of V . Then

W1 ∩ W2 = {w ∈ V | w ∈ W1 , w ∈ W2 },
W1 + W2 = {w1 + w2 | w1 ∈ W1 , w2 ∈ W2 }

are subspaces of V .
Moreover, if W1 ∩ W2 = {0}, then any element w in W1 + W2 can be written in one and only
one way as w = w1 + w2 , with w1 ∈ W1 and w2 ∈ W2 .
Conversely, if any element w in W1 + W2 can be written in one and only one way as w =
w1 + w2 , with w1 ∈ W1 and w2 ∈ W2 , then W1 ∩ W2 = {0}.
If W1 ∩ W2 = {0}, we write W1 ⊕ W2 to denote W1 + W2 .
Proof. First let’s prove that W = W1 ∩ W2 is a subspace of V . If x ∈ W and a ∈ K, then
x ∈ W1 and also x ∈ W2 , so ax ∈ W1 , and ax ∈ W2 . We conclude that ax ∈ W . For the second
property, suppose x, y ∈ W . Then both x, y ∈ W1 , and x, y ∈ W2 . Consequently x + y is in both
W1 and W2 , so x + y ∈ W . We conclude that W = W1 ∩ W2 is a subspace of V .
Next we prove that W1 + W2 is a subspace of V . Suppose w ∈ W1 + W2 , and a ∈ K. Then
there is some w1 ∈ W1 and some w2 ∈ W2 with w = w1 + w2 . We also have aw = aw1 + aw2 .
Since aw1 ∈ W1 , and aw2 ∈ W2 , we conclude that aw ∈ W1 + W2 . If now v = v1 + v2 with
v1 ∈ W1 and v2 ∈ W2 , we have

w + v = w1 + w2 + v1 + v2 = (w1 + v1 ) + (w2 + v2 ) ∈ W1 + W2 .

We conclude that W1 + W2 is indeed a subspace of V .


Now for the interesting part. Suppose W1 ∩ W2 = {0}, that is, their intersection contains
only the zero element. Suppose we can write

w = w1 + w2 = w3 + w4 ,

with w1 , w3 ∈ W1 , and w2 , w4 ∈ W2 . Then

w1 − w3 = w4 − w2 .

Since w1 − w3 ∈ W1 , and w4 − w2 ∈ W2 , and they are equal, then w1 − w3 and w4 − w2 are in both W1 and W2 . Since the intersection contains only the zero element, we conclude that

w1 − w3 = w4 − w2 = 0.

Therefore w1 = w3 and w2 = w4 . Thus there is only one way to write w as a sum of one element
of W1 plus one element of W2 .

Conversely, suppose there is some v ≠ 0 with v ∈ W1 ∩ W2 . Then, for any w = w1 + w2 with w1 ∈ W1 and w2 ∈ W2 , we have

w = w1 + w2 = (w1 + v) + (w2 − v) = w3 + w4 ,

with w3 = w1 + v and w4 = w2 − v. Since v ∈ W1 ∩ W2 , we have w3 ∈ W1 , and likewise w4 ∈ W2 . Since v ≠ 0, this gives us two different representations of w as a sum of elements in W1 and W2 . Therefore, if only one such representation is possible, then W1 ∩ W2 = {0}. □

The space W1 + W2 is called the sum of W1 and W2 (of course it is). The space W1 ⊕ W2 is
the direct sum of W1 and W2 .
The result for intersections can be generalised.

Proposition 19. Let V be a linear space. Let Λ be any set (finite or infinite). For each λ ∈ Λ,
suppose Wλ is a subspace of V . Then

\[ W = \bigcap_{\lambda \in \Lambda} W_\lambda \]

is a subspace of V .

The proof is left as an exercise. (Hint: Copy the proof for W1 ∩ W2 , it works.)

Example 25. We have seen that every $\ell^p$ is a subspace of $\ell^\infty$. Define
\[ W = \bigcap_{1 \le p < \infty} \ell^p. \]

This is a subspace, containing all sequences x that are in all $\ell^p$, for 1 ≤ p < ∞. But $\ell^1$ is contained in all $\ell^p$! We conclude that $W = \ell^1$.
But let’s make this example a bit more interesting. Define
\[ G = \bigcap_{1 < p < \infty} \ell^p. \]

(Note the subtle difference: We are excluding ℓ1 from the intersection.) The space G is a
subspace of ℓ∞ . Now, is it true that G = W = ℓ1 ? As it happens, no, it is false. To see that it
is false we need to come up with an example of

x = (x0 , x1 , x2 , . . . ) ∈ ℓp

for all p > 1, but x ̸∈ ℓ1 . Can you think of one such example? ⋄

4.2 The norm


Linear spaces are a generalisation of Rn (vector spaces), and the norm is the generalisation of
length (of a vector).

Definition 4.3. Let (V, +, ·) be a linear space over K. The norm of x ∈ V is a real number
∥x∥, where the following properties hold true for all x, y ∈ V , a ∈ K.

1. ∥ax∥ = |a| ∥x∥;

2. ∥x∥ ≥ 0;

3. ∥x∥ = 0 ⇐⇒ x = 0;

4. ∥x + y∥ ≤ ∥x∥ + ∥y∥.

The last property is the triangle inequality. ⋄

Keen observers will notice similarities with the notion of metric. If a linear space has a norm
defined on it, then it generates a metric by defining

d(x, y) = ∥x − y∥.

This is a metric on V . The converse is not necessarily true. It is possible to have a metric on
V which does not arise from a norm, for example, the discrete metric. Exercise: Verify that
∥x − y∥ defines a metric.
As a consequence of the last observation, as soon as we have a norm, we are able to introduce
concepts like convergence, open sets, compact sets, etc, in the context of linear spaces. This is
not as unremarkable as it seems. The concept of linear space is an algebraic concept, depending
only on properties of algebraic operations. The concept of distance is an analytic (or geometric)
concept, allowing for us to talk about convergence. Normed linear spaces, therefore, represent
a place where algebra meets analysis.

Example 26. The spaces $\ell^p$ are normed linear spaces, with the norm given by
\[ \|x\|_p = \Big( \sum_{n=0}^{\infty} |x_n|^p \Big)^{1/p}, \]

if 1 ≤ p < ∞, and
\[ \|x\|_\infty = \sup_{0 \le n < \infty} |x_n|, \]

if p = ∞. (Verify this!) ⋄
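Before verifying the norm properties by hand, it can help to test them on examples. The sketch below is an assumption-laden illustration, not part of the notes: it works with finite lists (sequences whose later entries are all zero), the function name `lp_norm` is made up, and a numerical check on random data is of course no substitute for the proof.

```python
import math
import random

def lp_norm(x, p):
    """The l^p norm of a finite list x, for 1 <= p <= infinity (use p = math.inf)."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(50)]
y = [random.uniform(-1, 1) for _ in range(50)]
a = -2.5

for p in (1, 2, 3.5, math.inf):
    lhs = lp_norm([s + t for s, t in zip(x, y)], p)
    rhs = lp_norm(x, p) + lp_norm(y, p)
    homogeneous = math.isclose(lp_norm([a * t for t in x], p), abs(a) * lp_norm(x, p))
    print(p, lhs <= rhs + 1e-12, homogeneous)
```

Each line of output reports whether the triangle inequality and the property ∥ax∥ = |a| ∥x∥ held for that value of p on the sample data.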

Example 27. The spaces $L^p(I)$ are normed linear spaces, with the norm given by
\[ \|f\|_p = \Big( \int_I |f(t)|^p\,dt \Big)^{1/p}, \]

if 1 ≤ p < ∞. (Verify this!) ⋄

Example 28. The case p = ∞ requires a bit of care. A function f is in L∞ (I) if there is a
number M > 0 such that |f (t)| ≤ M a.e. Let’s rephrase it.
A function f is in L∞ if there is a number M > 0, and a set A ⊂ I, of measure zero, such
that if t ̸∈ A, then |f (t)| ≤ M .
The idea here is that measure zero sets don’t count. What happens in a set of measure zero,
stays in a set of measure zero.
We define ∥f ∥∞ to be the smallest M such that |f (t)| ≤ M outside a set of measure zero.
We call it the essential supremum of f , and write
\[ \|f\|_\infty = \operatorname*{ess\,sup}_{t \in I} |f(t)|. \]

Let’s verify that this is a norm. (You could stop reading now, and treat it as an exercise. Or
you can keep reading.)

Let A be a set of measure zero such that $|f(t)| \le \|f\|_\infty$ for all t ∉ A. If a ∈ K, then |af (t)| = |a| |f (t)| for all t ∉ A. Then
\[ \sup_{t \notin A} |a f(t)| = \sup_{t \notin A} (|a|\,|f(t)|) = |a| \sup_{t \notin A} |f(t)| = |a|\,\|f\|_\infty. \]

This shows that $\|af\|_\infty \le |a|\,\|f\|_\infty$. Now for the other inequality. If a = 0, the result is true. If a ≠ 0, we have
\[ |f(t)| = \frac{1}{|a|}\,|a f(t)|. \]
There is a set B of measure zero such that, if t ∉ B, then $|af(t)| \le \|af\|_\infty$. Hence, if t ∉ B we have
\[ |f(t)| \le \frac{1}{|a|}\,\|af\|_\infty. \]
This shows that $|a|\,|f(t)| \le \|af\|_\infty$ for t ∉ B, and as a consequence $|a|\,\|f\|_\infty \le \|af\|_\infty$. We proved $|a|\,\|f\|_\infty = \|af\|_\infty$.
The property ∥f ∥∞ ≥ 0 is true, because we are taking the supremum over quantities that
are not negative.
If ∥f ∥∞ = 0, then f (t) = 0 for all t ̸∈ A, where A has measure zero. That is, f (t) = 0 a.e.
The triangle inequality now. Suppose there are sets A, B ⊂ I, both of measure zero, such that |f (t)| ≤ ∥f ∥∞ if t ∉ A, and |g(t)| ≤ ∥g∥∞ if t ∉ B. The set C = A ∪ B also has measure zero, and both inequalities are true if t ∉ C. In that case we have

|f (t) + g(t)| ≤ |f (t)| + |g(t)| ≤ ∥f ∥∞ + ∥g∥∞ .

As a consequence we have ∥f + g∥∞ ≤ ∥f ∥∞ + ∥g∥∞ .


Question: Did you expect this to be quite so technical? That is because arguments with the
supremum are usually technical, and on top of that we have to account for the sets of measure
zero as well. ⋄

4.3 The unit ball


On a normed linear space V , the (closed) unit ball is the set

B = B1 (0) = {x ∈ V | ∥x∥ ≤ 1}.

There’s nothing special about it except that it is a convenient set to study, because many
properties of V and of the norm can be studied by considering properties of this set.

4.3.1 Compactness and the unit ball


Compactness is an important property in analysis. If a set X is compact, then any continuous
function f : X → R will attain a maximum and a minimum. That is a nice property to possess.
Let’s recall the definition of compactness, in our context.
Definition 4.4. Let V be a normed linear space. A subset X ⊂ V is compact if, given any sequence $x_n \in X$, there is a subsequence $x_{n_k}$ in X, and an element x ∈ X, such that $\lim_{n_k \to \infty} x_{n_k} = x$. ⋄
Is the unit ball compact? That is, if xn ∈ B is any sequence, does it possess a converging sub-
sequence? This desirable property is true in Rn (with the usual norm), but it fails spectacularly
in general, as the next example shows.

Example 29. Consider ℓ∞ , and let xn = (0, 0, · · · , 0, 1, 0, · · · ), where the only 1 occurs at the
nth entry. We have ∥xn ∥∞ = 1, so xn ∈ B. Suppose by contradiction that some subsequence
xnk converges to y ∈ ℓ∞ , that is, suppose ∥xnk − y∥∞ → 0 as nk → ∞. First, let’s show that
y = 0, by showing that each entry of y is 0. Fix m, and consider ym , the mth entry of y. Let
(xnk )m be the mth entry of xnk . Since we are taking a limit as nk → ∞, at some stage we will
have nk > m, and from that moment on we have (xnk )m = 0. Therefore

\[ |y_m| = \lim_{n_k \to \infty} |y_m - (x_{n_k})_m| \le \lim_{n_k \to \infty} \|y - x_{n_k}\|_\infty = 0. \]

Thus ym = 0, and y = 0. But since y = 0, then

\[ 0 = \lim_{n_k \to \infty} \|x_{n_k} - y\|_\infty = \lim_{n_k \to \infty} \|x_{n_k}\|_\infty = 1. \]

This is a contradiction, and we conclude that the unit ball is not compact. ⋄

The space ℓ∞ is not an anomaly, in the sense that compactness of the unit ball characterises
finite-dimensionality. In other words, any infinite-dimensional example will be such that the
unit ball is not compact.

Example 30. Consider the space ℓ2 , and the sequence xn as in the last example, where all
entries are 0 except for the nth entry, which is 1. Again ∥xn ∥2 = 1. You should now stop
reading, and prove (by analogy with the last example) that no subsequence xnk of xn can be
convergent. After you have done so, come back to read this example.
Suppose y ∈ ℓ2 is such that ∥y − xnk ∥2 → 0 as nk → ∞. The first step is to show that
ym = 0 for all m. We have, for nk > m,

|ym | = |ym − (xnk )m | ≤ ∥y − xnk ∥2 .

Taking a limit as nk → ∞, we obtain ym = 0, and we conclude that y = 0. But then

\[ 0 = \lim_{n_k \to \infty} \|y - x_{n_k}\|_2 = \lim_{n_k \to \infty} \|x_{n_k}\|_2 = 1. \]

It is the same proof. ⋄
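There is another way to see why no subsequence can converge. It is a variation on the argument, not the proof given in the two examples: any two distinct elements of the sequence are at distance 1 in ℓ∞ and at distance √2 in ℓ2, so no subsequence can even be Cauchy. The sketch below checks these distances on truncated sequences; the names `unit_sequence`, `dist_inf`, `dist_2` and the truncation length N are my own choices for the illustration.

```python
import math

N = 8  # number of entries kept; for n < N nothing is lost, since x_n has a single non-zero entry

def unit_sequence(n, length=N):
    """First `length` entries of x_n: zeros except for a 1 at position n."""
    return [1.0 if k == n else 0.0 for k in range(length)]

def dist_inf(x, y):
    """Distance in the sup norm."""
    return max(abs(s - t) for s, t in zip(x, y))

def dist_2(x, y):
    """Distance in the l^2 norm."""
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(x, y)))

pairs = [(n, m) for n in range(N) for m in range(n + 1, N)]
print({dist_inf(unit_sequence(n), unit_sequence(m)) for n, m in pairs})
print({round(dist_2(unit_sequence(n), unit_sequence(m)), 12) for n, m in pairs})
```

Each printed set contains a single value, because the distance does not depend on which pair n ≠ m is chosen.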

Here is a nice example to keep in mind.

Example 31. Let $V = L^1([0, 1], \mathbb{R})$ with the usual norm
\[ \|f\|_1 = \int_0^1 |f(t)|\,dt. \]

For each n ≥ 1, we define the function
\[ f_n(t) = \begin{cases} (n+1)^2\,t, & 0 \le t \le \frac{1}{n+1}; \\ 2(n+1) - (n+1)^2\,t, & \frac{1}{n+1} \le t \le \frac{2}{n+1}; \\ 0, & \frac{2}{n+1} \le t \le 1. \end{cases} \]

Please, draw the graph of $f_1$, $f_2$, and $f_3$ to see what’s going on. Convince yourself that
\[ \|f_n\|_1 = \int_0^1 |f_n(t)|\,dt = 1 \]

for all n. Each graph is essentially a triangle with height (n + 1) and base 2/(n + 1). Note that
fn (0) = 0 and fn (1) = 0 for all n. Note also that, for each fixed t with 0 < t < 1, then there is
some n0 (depending on t) such that if n > n0 , then fn (t) = 0.
Suppose $g = \lim_{n_k \to \infty} f_{n_k}$, for some subsequence $f_{n_k}$, that is, suppose
\[ \lim_{n_k \to \infty} \int_0^1 |g(y) - f_{n_k}(y)|\,dy = 0. \]

Let’s attempt to show that g(t) = 0 a.e. Fix t with 0 < t ≤ 1. Then, for $\frac{2}{n_k + 1} \le t$, the function $f_{n_k}$ vanishes on [t, 1], and we have
\[ \int_t^1 |g(y)|\,dy = \int_t^1 |g(y) - f_{n_k}(y)|\,dy \le \int_0^1 |g(y) - f_{n_k}(y)|\,dy. \]

Taking a limit as $n_k \to \infty$ we conclude that
\[ \int_t^1 |g(y)|\,dy = 0. \]
Since this is true of any t > 0, we conclude that $\int_0^1 |g(t)|\,dt = 0$, and as a result we have g(t) = 0 a.e. But then we have
\[ f_{n_k}(t) - g(t) = f_{n_k}(t) \quad \text{a.e.}, \]
and
\[ \int_0^1 |f_{n_k}(t) - g(t)|\,dt = \int_0^1 |f_{n_k}(t)|\,dt. \]
Taking limits, this gives us
\[ 0 = \lim_{n_k \to \infty} \int_0^1 |f_{n_k}(t) - g(t)|\,dt = \lim_{n_k \to \infty} \int_0^1 |f_{n_k}(t)|\,dt = 1. \]
This is a contradiction.

We conclude that the unit ball in L1 is not compact. Compare this proof to the proofs we used
for ℓ∞ and ℓ2 , and convince yourself that this is the same proof. ⋄
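If drawing the graphs by hand is not convincing enough, the two facts used in the example (the norm of each $f_n$ is 1, and $f_n(t)$ is eventually 0 for fixed t > 0) can be checked numerically. This is a rough sketch of my own: the integral is approximated with a midpoint rule, and the names `f` and `l1_norm` and the number of cells are arbitrary choices.

```python
def f(n, t):
    """The tent function f_n from the example, for n >= 1 and 0 <= t <= 1."""
    if t <= 1.0 / (n + 1):
        return (n + 1) ** 2 * t
    if t <= 2.0 / (n + 1):
        return 2 * (n + 1) - (n + 1) ** 2 * t
    return 0.0

def l1_norm(g, cells=20000):
    """Midpoint-rule approximation of the integral of |g| over [0, 1]."""
    h = 1.0 / cells
    return h * sum(abs(g((k + 0.5) * h)) for k in range(cells))

for n in (1, 2, 3, 10):
    print(n, round(l1_norm(lambda t: f(n, t)), 4), f(n, 0.3))
```

The second column should be very close to 1 for every n, while the third column (the value at the fixed point t = 0.3) drops to 0 once 2/(n + 1) < 0.3.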

4.3.2 Convexity and the unit ball


Is the unit ball convex? That is, if x, y ∈ B, and 0 ≤ t ≤ 1, can we conclude that z =
tx + (1 − t)y ∈ B?
The answer to this question is rather simple: B is always convex, because if x, y ∈ B, then

∥tx + (1 − t)y∥ ≤ ∥tx∥ + ∥(1 − t)y∥ = t∥x∥ + (1 − t)∥y∥ ≤ t + (1 − t) = 1.

Convexity is somehow built into the triangle inequality itself; that is all there is to it. This result can be used, say, to show that the formula
\[ \Big( \sum_{k=1}^{\infty} |x_k|^p \Big)^{1/p} \]

does not define a norm if 0 < p < 1, since in that case the unit ball would fail to be convex.
(Draw a picture of the unit ball for p = 1/2 in R2 to see what it looks like.)
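To go with the picture, here is a small numerical check, written as a sketch with names of my own choosing (`quasi_norm`), that the set defined by this formula with p = 1/2 is not convex in R2: the points (1, 0) and (0, 1) belong to it, but their midpoint does not.

```python
def quasi_norm(x, p):
    """The expression (sum |x_k|^p)^(1/p) for a finite list x and p > 0."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

p = 0.5
x, y = [1.0, 0.0], [0.0, 1.0]
midpoint = [0.5 * s + 0.5 * t for s, t in zip(x, y)]

print(quasi_norm(x, p), quasi_norm(y, p))  # both points lie in the "unit ball"
print(quasi_norm(midpoint, p))             # the midpoint does not
```

By hand, the value at the midpoint is $(\sqrt{1/2} + \sqrt{1/2})^2 = 2 > 1$, so the triangle inequality must fail for this formula.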
In spite of this result being so straightforward, here are two examples where we verify con-
vexity without resorting to the triangle inequality.

Example 32. In $\mathbb{R}^2$, with the usual norm, the unit ball is bounded by the circle of radius 1, and we know it looks convex, so maybe we’d better prove it. Suppose $x = (x_1, x_2)$, $y = (y_1, y_2)$, $x_1^2 + x_2^2 \le 1$, $y_1^2 + y_2^2 \le 1$. We want to show that
\[ (t x_1 + (1-t) y_1)^2 + (t x_2 + (1-t) y_2)^2 \le 1 \]
for 0 ≤ t ≤ 1. We can use calculus. Let
\[ f(t) = (t x_1 + (1-t) y_1)^2 + (t x_2 + (1-t) y_2)^2. \]
Note that f (0) ≤ 1 and f (1) ≤ 1. We also have
\[ f'(t) = 2 (t x_1 + (1-t) y_1)(x_1 - y_1) + 2 (t x_2 + (1-t) y_2)(x_2 - y_2), \]
and
\[ f''(t) = 2 (x_1 - y_1)^2 + 2 (x_2 - y_2)^2. \]
If $f''(t) = 0$ then $x_1 = y_1$, and $x_2 = y_2$, so that $f'(t) = 0$ for all t, and f (t) is constant. Since f (0) ≤ 1, we conclude that f (t) ≤ 1 for all t ∈ [0, 1]. If $f''(t) > 0$, then the solution to $f'(t) = 0$ will be a global minimum, and
\[ \max_{0 \le t \le 1} f(t) = \max\{f(0), f(1)\} \le 1. \]

We conclude that f (t) ≤ 1 if 0 ≤ t ≤ 1, and the unit ball is indeed convex. ⋄


Example 33. Consider the space $V = L^2([0, 1], \mathbb{C})$ with norm
\[ \|f\|_2 = \Big( \int_0^1 |f(x)|^2\,dx \Big)^{1/2}. \]

Suppose f, g ∈ B, let’s show that tf + (1 − t)g ∈ B, 0 ≤ t ≤ 1. You should stop reading now, and do it yourself, then come back to read the proof.
We want to prove that, with 0 ≤ t ≤ 1 (we won’t write this again)
\[ F(t) = \int_0^1 |t f(x) + (1-t) g(x)|^2\,dx \le 1. \]
Since f and g may be taking complex values, we write
\[ |t f(x) + (1-t) g(x)|^2 = (t f(x) + (1-t) g(x)) (t f(x) + (1-t) g(x))^*, \]
where as usual ∗ denotes complex conjugation. Thus
\[ F'(t) = \int_0^1 \Big[ (f(x) - g(x)) (t f(x) + (1-t) g(x))^* + (t f(x) + (1-t) g(x)) (f(x) - g(x))^* \Big]\,dx, \]
\[ F''(t) = \int_0^1 2\,|f(x) - g(x)|^2\,dx. \]
If $F''(t) = 0$ then f (x) = g(x) a.e. Then $F'(t) = 0$ for all t, and F is constant. Since F (0) ≤ 1, we conclude that F (t) ≤ 1.
If $F''(t) > 0$, then the solution to $F'(t) = 0$ is a global minimum, and
\[ \max_{0 \le t \le 1} F(t) = \max\{F(0), F(1)\} \le 1. \]

Again, the same proof, and same conclusion. ⋄


The point of these two examples was to show, first, the similarities, and second, that problems
in convexity are calculus-related.

4.4 Banach spaces


Definition 4.5. A complete normed linear space is called a Banach space. ⋄

We have already seen examples of Banach spaces: Rn , ℓp , and Lp are all Banach spaces. In
what follows we will be considering different aspects of the theory in Banach spaces.

4.4.1 Questions of convergence


On a Banach space, convergence works well (Cauchy sequences converge). The notion of ap-
proximation is of great importance, and so we concentrate on answering the following question:
How can we approximate elements in a Banach space?
The idea here is the following. It is all very well and fine to talk about abstract spaces,
but when you have to work with a concrete example, and provide concrete approximations,
sometimes the abstraction gets in the way. To that effect, for each example of Banach space,
we try to find approximating sets, that is, standard elements that we use to approximate every
other element of the space. Here is an example.

Example 34. The space ℓ1 (N, C) is a Banach space. Consider the following subset S of ℓ1 :

y ∈ S ⇐⇒ (∃n0 , n ≥ n0 ⇒ yn = 0).

In words, y ∈ S if yn = 0 for all n past a certain point. We claim that we can use the set S to
approximate any element in ℓ1 , as follows.
Pick x ∈ ℓ1 , and ε > 0. We aim to obtain y ∈ S such that ∥x − y∥1 < ε, so that y
approximates x to within ε. Our next step is to produce such y.
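The quantity
\[ \|x\|_1 = \sum_{n=0}^{\infty} |x_n| \]

is finite. Since the series converges, given ε > 0 there is some $n_0$ such that
\[ \sum_{n=n_0}^{\infty} |x_n| < \varepsilon. \]

Now we define
\[ y_n = \begin{cases} x_n, & n < n_0; \\ 0, & n_0 \le n. \end{cases} \]
We see that
\[ \|x - y\|_1 = \sum_{n=n_0}^{\infty} |x_n| < \varepsilon. \]
⋄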

The example shows that it is possible to approximate any element in ℓ1 by elements that are
essentially finite-dimensional. The following definition should make sense now.
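To see the truncation argument at work on a concrete element, take x = (1, 1/2, 1/4, . . . ), whose tails are known exactly. This particular x, and the helper names `tail`, `cutoff` and `truncate`, are my own choices for the illustration; for a general element of ℓ1 the tail would have to be estimated rather than computed in closed form.

```python
def x(n):
    """Entry n of the sample element x = (1, 1/2, 1/4, ...) of l^1."""
    return 2.0 ** (-n)

def tail(n0):
    """Exact value of sum_{n >= n0} |x_n| for this particular x (geometric series)."""
    return 2.0 ** (1 - n0)

def cutoff(eps):
    """Smallest n0 with tail(n0) < eps."""
    n0 = 0
    while tail(n0) >= eps:
        n0 += 1
    return n0

def truncate(n0):
    """The entries (y_0, ..., y_{n0-1}) of the approximation y in S; all later entries are 0."""
    return [x(n) for n in range(n0)]

eps = 1e-3
n0 = cutoff(eps)
y = truncate(n0)
print(n0, tail(n0), len(y))
```

The printed tail is the distance ∥x − y∥1 between x and its truncation, and it is smaller than the chosen ε.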

Definition 4.6. Let (X, d) be a metric space. We say that S ⊂ X is dense in X if, for any
x ∈ X and any ε > 0 there is some element y ∈ S with d(x, y) < ε.
If X possesses a countable dense subset, we say that X is separable. ⋄

Dense sets are approximating sets. The elements in S are usually simpler than those in X,
so it is easier to work with them. If we want to prove a property about x ∈ X, we often prove
the property about y ∈ S, and then we take a limit to obtain the result for x.
Example 35. Let I ⊂ R be an interval. Let J ⊂ I be a sub-interval of I. The characteristic function on J is the function
\[ \chi(t, J) = \begin{cases} 1, & t \in J; \\ 0, & t \notin J. \end{cases} \]
Given any integer m > 0, and any disjoint intervals $J_1, \dots, J_m$, each contained in I, we define the step function
\[ s(t) = \sum_{k=1}^{m} a_k\,\chi(t, J_k), \]
where ak ∈ K for all k. (That is, a step function is a function that is constant over finitely many
intervals, and zero elsewhere.) Let S be the set of all step functions. Then S is dense in L1 (I).
This is a fancy way of saying that any function f ∈ L1 can be approximated by a step function.
This statement is trivial, because that is how we defined L1 to begin with. ⋄
Example 36. Let I = [0, 1] be the unit closed interval, and consider C([0, 1], R) ⊂ L1 ([0, 1], R),
the subset of L1 containing the continuous functions. We claim that C([0, 1], R) is dense in L1 ;
that is, any function in L1 can be approximated by a continuous function. We will prove this
by showing that every step function can be approximated by a continuous function. Since the
step functions are dense in L1 , we will conclude that the continuous functions are dense in L1
as well.
Let J = (a, b) ⊂ [0, 1] be an interval, and consider the step function s(t) = cχ(t, J), where c > 0. Given ε > 0, we want to produce a continuous function f : [0, 1] → R such that
\[ \int_0^1 |f(t) - s(t)|\,dt < \varepsilon. \]
It is easier to show how to do it in a picture than to write formulas. The step function s in blue, the approximation f in orange.

[Figure: graphs of the step function s (blue) and its continuous approximation f (orange) on [0, 1].]

The sum of the areas of the two triangles (making the difference between the graphs) is the value of the integral $\int_0^1 |f - s|\,dt$. We can certainly make that smaller than ε by cutting very thin triangles. Notice that f and s are zero over the same values of t.
The result now follows. ⋄
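For readers who do want formulas, here is one possible way to write the approximation from the picture, as a sketch under my own assumptions: the step has height c on J = (a, b), the continuous function is a trapezoid with two linear ramps of width w placed inside J, and the two triangles then have total area cw. The names `step`, `trapezoid`, `l1_distance` and the specific numbers are made up for the illustration.

```python
def step(t, a, b, c):
    """The step function c * chi(t, (a, b))."""
    return c if a < t < b else 0.0

def trapezoid(t, a, b, c, w):
    """Continuous function: 0 outside (a, b), equal to c on [a + w, b - w],
    and linear on the two ramps of width w."""
    if not a < t < b:
        return 0.0
    return c * min(1.0, (t - a) / w, (b - t) / w)

def l1_distance(g, h, cells=20000):
    """Midpoint-rule approximation of the integral of |g - h| over [0, 1]."""
    dt = 1.0 / cells
    return dt * sum(abs(g((k + 0.5) * dt) - h((k + 0.5) * dt)) for k in range(cells))

a, b, c, eps = 0.25, 0.75, 1.0, 0.01
w = min(eps / (2 * c), (b - a) / 2)   # two triangles of total area c * w <= eps / 2
err = l1_distance(lambda t: step(t, a, b, c), lambda t: trapezoid(t, a, b, c, w))
print(w, err)
```

With these numbers the ramp width is w = 0.005, and the computed distance agrees with the area cw of the two triangles, which is below ε.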

Note that, in the last example, the approximating function f fails to have a derivative for
a few values of t. Can you modify the example and show that every step function can be
approximated in L1 by a function f that has a derivative for every t?
Also, answer the follow-up question: Is the same result true for Lp , with 1 ≤ p < ∞?

4.4.2 Something more about ℓp


We saw in an example that ℓ1 has a convenient dense subset formed by finite-length sequences; the same argument works for ℓp whenever 1 ≤ p < ∞. We have also seen that, if p1 < p2 , then ℓp1 is a subspace of ℓp2 . Let’s now show the following interesting result.

Proposition 20. Suppose 1 ≤ p1 < p2 < ∞. Then ℓp1 is a dense subspace of ℓp2 .

Proof. Let S be the set of finite-length sequences, as before. We know that S is a subset of ℓp
for all p. That means S is in ℓp1 . But as we saw before, S can be used to approximate any
element in ℓp2 . But then any element in ℓp2 can be approximated by elements of ℓp1 . Since ℓp1
is contained in ℓp2 , we conclude that ℓp1 is dense in ℓp2 . □

Now, why is this interesting? Because this can’t happen in Rn . In Rn , if W is a subspace of V , then W can’t be dense in V (unless W = V ). So this is the first time that the theory for infinite dimensions differs from the theory in finite dimensions.
Here is another place where the theory in infinite dimensions is significantly different from the theory in finite dimensions.

Proposition 21. Fix 1 < p′ ≤ ∞, and define
\[ W = \bigcup_{1 \le p < p'} \ell^p, \]

the union of all spaces $\ell^p$ with p < p′. Then W is a subspace of $\ell^{p'}$, and is strictly contained in $\ell^{p'}$.

Proof. Let’s show that W is a subspace. If λ ∈ K and x ∈ W , then x ∈ ℓp for some p < p′ .
Then λx ∈ ℓp (because ℓp is a linear space), and so λx ∈ W . If also y ∈ W , then y ∈ ℓq for some
q < p′ . Let r = max{p, q}. Then ℓp and ℓq are contained in ℓr , so x, y ∈ ℓr . Then their sum
x + y is in ℓr . But ℓr is contained in W , and we conclude that x + y is in W . So W is a linear
subspace of $\ell^{p'}$.
To see that W is strictly contained in $\ell^{p'}$, we need to exhibit an element of $\ell^{p'}$ that is not contained
in any ℓp (with p < p′ ). This is tricky to do.
Define the sequence

\[ x = \left( 0,\ \frac{1}{\sqrt[p']{2}\,\ln 2},\ \frac{1}{\sqrt[p']{3}\,\ln 3},\ \frac{1}{\sqrt[p']{4}\,\ln 4},\ \cdots \right). \]

We want to show this x is in $\ell^{p'}$, but if p < p′ , then this x is not in ℓp . More concretely, we want
to show that

\[ \sum_{k=2}^{\infty} \frac{1}{k (\ln k)^{p'}} \]

is finite, but

\[ \sum_{k=2}^{\infty} \frac{1}{k^{p/p'} (\ln k)^{p}} \]

is infinite. We use the integral test for both series, and the substitution y = ln x.

\[ \int_2^{\infty} \frac{dx}{x (\ln x)^{p'}} = \int_{\ln 2}^{\infty} \frac{dy}{y^{p'}} < \infty, \]

since 1 < p′ . However,

\[ \int_2^{\infty} \frac{dx}{x^{p/p'} (\ln x)^{p}} = \int_{\ln 2}^{\infty} \frac{e^{y(1 - \frac{p}{p'})}}{y^{p}}\,dy = \infty, \]

because $1 - \frac{p}{p'} > 0$. □

This can’t happen in finite dimensions. In finite dimensions, if Wk are subspaces of V with
W1 ⊂ W2 ⊂ W3 ⊂ · · · , then eventually the subspaces stabilise, that is, Wn = Wn+1 for n ≥ n0 .
As we just saw, this is not the case in infinite dimensions.
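The two series in the proof of Proposition 21 can be explored numerically. The sketch below takes the illustrative choice p′ = 2 and p = 1 (an assumption made only for this example) and compares partial sums. Partial sums prove nothing by themselves, but they show the different behaviour that the integral test predicts.

```python
import math

def partial_sum_p_prime(n, p_prime=2.0):
    """Partial sum of sum_{k=2}^{n} 1 / (k (ln k)^p'), i.e. of |x_k|^p'."""
    return sum(1.0 / (k * math.log(k) ** p_prime) for k in range(2, n + 1))

def partial_sum_p(n, p=1.0, p_prime=2.0):
    """Partial sum of sum_{k=2}^{n} 1 / (k^(p/p') (ln k)^p), i.e. of |x_k|^p."""
    return sum(1.0 / (k ** (p / p_prime) * math.log(k) ** p)
               for k in range(2, n + 1))

for n in (10**2, 10**3, 10**4):
    # First column stays bounded; second column keeps growing.
    print(n, partial_sum_p_prime(n), partial_sum_p(n))
```

For p′ = 2 the integral test bounds the first series by 1/(2 (ln 2)²) + 1/ln 2, which is a little under 2.49, while the second partial sum grows roughly like 2√n / ln n.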

4.4.3 Cartesian product


In this section we point out that it is always possible to build new normed linear spaces from
old ones via the Cartesian product. Let (A, ∥ · ∥A ) and (B, ∥ · ∥B ) be two Banach spaces. We
define V = A × B, with the norm of y = (x1 , x2 ) ∈ A × B given by

∥y∥ = ∥(x1 , x2 )∥ = ∥x1 ∥A + ∥x2 ∥B .

This is not the only possible norm, of course. Alternatives are, for example,

\[ \max\{\|x_1\|_A, \|x_2\|_B\}, \qquad \sqrt{\|x_1\|_A^2 + \|x_2\|_B^2}, \]

and many more. Of course, one has to show that these are norms.

Proposition 22. Let (A, ∥ · ∥A ) and (B, ∥ · ∥B ) be Banach spaces. Then

∥(x1 , x2 )∥ = ∥x1 ∥A + ∥x2 ∥B

defines a norm in V = A × B, making V a Banach space.

Proof. The triangle inequality needs to be verified. If x = (x1 , x2 ) and y = (y1 , y2 ), we have

∥x∥ = ∥x1 ∥A + ∥x2 ∥B , ∥y∥ = ∥y1 ∥A + ∥y2 ∥B .

Also,
∥x + y∥ = ∥(x1 + y1 , x2 + y2 )∥ = ∥x1 + y1 ∥A + ∥x2 + y2 ∥B .
Then

∥x + y∥ = ∥x1 + y1 ∥A + ∥x2 + y2 ∥B
≤ ∥x1 ∥A + ∥y1 ∥A + ∥x2 ∥B + ∥y2 ∥B
= ∥x∥ + ∥y∥.

Suppose now zn = (xn , yn ) ∈ A × B is a Cauchy sequence in V ; that is,

\[ \lim_{n,m\to\infty} \|z_n - z_m\| = 0. \]

We want to show that zn → z ∈ V , for some z.


Note that we have

∥xn − xm ∥A ≤ ∥zn − zm ∥, ∥yn − ym ∥B ≤ ∥zn − zm ∥.

Take limits to conclude that xn is a Cauchy sequence in A, and yn is a Cauchy sequence in B.


Since A and B are Banach spaces, we have xn → x ∈ A and yn → y ∈ B. Define z = (x, y).
Then
\[ \lim_{n\to\infty} \|z_n - z\| = \lim_{n\to\infty} \left( \|x_n - x\|_A + \|y_n - y\|_B \right) = 0, \]

showing that zn → z, and V is complete. □

It is an exercise to show that the definitions

\[ \max\{\|x_1\|_A, \|x_2\|_B\}, \qquad \sqrt{\|x_1\|_A^2 + \|x_2\|_B^2} \]

also produce norms.
Which brings us to our next question: Which norm is the correct norm? (If there is such a
thing.) Our next subsection goes some way to answer this question.

4.4.4 Equivalence of norms


It is often possible to put different norms on the same space, and we have already done so on
V = R2 . A natural question to ask is whether two different norms are, in some sense, equivalent.
What would that mean?
Since a norm is a generalisation of the notion of vector length, equivalent norms should be
two norms that produce comparable lengths.

Definition 4.7. Let ∥ · ∥1 and ∥ · ∥2 be two norms defined over the same linear space V . We
say these norms are equivalent if there are constants 0 < c1 ≤ c2 < ∞ such that, for all x ∈ V
we have
c1 ∥x∥1 ≤ ∥x∥2 ≤ c2 ∥x∥1 .

Here is the consequence we are most interested in.

Proposition 23. Let ∥ · ∥1 and ∥ · ∥2 be equivalent on V . If xn is a Cauchy sequence with respect


to ∥ · ∥1 , then it is also a Cauchy sequence with respect to ∥ · ∥2 (and vice versa). In particular,
if V is complete in one of the norms, then it will also be complete in the other norm.

The proof is a direct consequence of the inequalities for equivalent norms, and we leave it as
an exercise. We also leave the following as an exercise.

Proposition 24. Equivalence of norms is an equivalence relation; that is: (a) any norm is
equivalent to itself; (b) if ∥ · ∥1 is equivalent to ∥ · ∥2 , then ∥ · ∥2 is equivalent to ∥ · ∥1 ; and (c)
if ∥ · ∥1 and ∥ · ∥2 are equivalent, and ∥ · ∥2 and ∥ · ∥3 are equivalent, then ∥ · ∥1 and ∥ · ∥3 are
equivalent.

Here is an astonishing (and useful) fact.

Proposition 25. Any two norms on R2 are equivalent.



Proof. It is enough to show that any norm is equivalent to the supremum norm. With x =
(x1 , x2 ), let
∥x∥∞ = max{|x1 |, |x2 |}.
Let ∥ · ∥ be any norm on R2 . Let Q be the set
Q = {x ∈ R2 | ∥x∥∞ = 1}.
We claim that there are constants 0 < c1 ≤ c2 < ∞ such that, if x ∈ Q, then
c1 ≤ ∥x∥ ≤ c2 .
Let’s prove this. Since x = (x1 , x2 ) = x1 (1, 0) + x2 (0, 1), we have
∥x∥ ≤ |x1 | ∥(1, 0)∥ + |x2 | ∥(0, 1)∥.
Since x ∈ Q, we have |x1 | ≤ 1, and |x2 | ≤ 1, and so
∥x∥ ≤ ∥(1, 0)∥ + ∥(0, 1)∥ = c2 .
Now, if x ≠ 0, then y = x/∥x∥∞ ∈ Q, and so
\[ \|y\| \le c_2 \implies \frac{\|x\|}{\|x\|_\infty} \le c_2. \]
We conclude that, for any x ∈ R2 , we have
∥x∥ ≤ c2 ∥x∥∞ .
Now let’s prove the other inequality. Define
\[ c_1 = \inf_{x \in Q} \|x\|. \]
We want to show that c1 > 0. Suppose c1 = 0 (by way of contradiction). Then there is a
sequence xn ∈ Q such that
\[ \lim_{n\to\infty} \|x_n\| = 0. \]
Since Q is a compact set in R2 (in the norm ∥ · ∥∞ ), there is a subsequence $x_{n_k}$ of xn , with
$x_{n_k} \to x \in Q$, so that
\[ \lim_{k\to\infty} \|x_{n_k} - x\|_\infty = 0. \]
But then
\[ \lim_{k\to\infty} \|x_{n_k} - x\| \le c_2 \lim_{k\to\infty} \|x_{n_k} - x\|_\infty = 0, \]
and we conclude that $x_{n_k}$ converges to x in the norm ∥ · ∥ as well. But
\[ \|x\| \le \|x - x_{n_k}\| + \|x_{n_k}\| \longrightarrow 0 \]
as k → ∞, showing that ∥x∥ = 0. Therefore x = 0 (because ∥ · ∥ is a norm). But x ∈ Q, and
0 ∉ Q. This contradiction shows that c1 > 0. Thus, if x ∈ Q we have
\[ 0 < c_1 \le \|x\|. \]
Now, if x ≠ 0 and y = x/∥x∥∞ , then y ∈ Q, and similarly as before we conclude, for all x, that
\[ c_1 \|x\|_\infty \le \|x\|. \]
This concludes our proof. □
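The constants c1 and c2 of the proof can be estimated numerically by sampling the set Q. The sketch below does this for one particular norm, ∥x∥ = |x1| + |x2|, which is chosen here only as an illustration; the helper names and the grid size are assumptions. For a general norm, sampling gives estimates only, not exact values.

```python
def sup_norm(x):
    return max(abs(x[0]), abs(x[1]))

def one_norm(x):
    return abs(x[0]) + abs(x[1])

def sample_Q(n=100):
    """Points on Q = {x : sup_norm(x) = 1}, the boundary of the square [-1, 1]^2."""
    ts = [i / n for i in range(-n, n + 1)]
    return ([(1.0, t) for t in ts] + [(-1.0, t) for t in ts]
            + [(t, 1.0) for t in ts] + [(t, -1.0) for t in ts])

def equivalence_constants(norm, n=100):
    """Smallest and largest value of `norm` over the sampled points of Q."""
    values = [norm(x) for x in sample_Q(n)]
    return min(values), max(values)

c1, c2 = equivalence_constants(one_norm)
print(c1, c2)
```

For this norm the minimum over Q is attained at points such as (1, 0) and the maximum at the corners such as (1, 1), and both are included in the grid.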

You are now invited to prove the following.

Proposition 26. Any two norms in Rn are equivalent.

This is, ultimately, a property of finite-dimensional spaces, and it essentially has to do with
the fact that the unit ball is compact (where have we used this fact?).
We return to the question about norms on Cartesian products.

Proposition 27. The norms

\[ \|x_1\|_A + \|x_2\|_B, \qquad \max\{\|x_1\|_A, \|x_2\|_B\}, \qquad \sqrt{\|x_1\|_A^2 + \|x_2\|_B^2} \]

are all equivalent to each other.

Proof. This statement has nothing to do with norms, linear spaces, Banach spaces, none of
that. This is a statement about functions of 2 variables. Consider the three functions below,
defined for x ≥ 0, y ≥ 0.

\[ f(x, y) = x + y, \qquad g(x, y) = \max\{x, y\}, \qquad h(x, y) = \sqrt{x^2 + y^2}. \]

For instance, we want to show that there are positive numbers 0 < c1 < c2 such that

c1 f (x, y) ≤ g(x, y) ≤ c2 f (x, y).

This looks like a calculus problem, and may even be simpler than that. Consider: max{x, y} is
either x or y. Say max{x, y} = x. Then

x + y ≤ x + x = 2x = 2 max{x, y} ≤ 2x + 2y = 2(x + y).

Now divide by 2 to obtain

\[ \tfrac{1}{2} f(x, y) \le g(x, y) \le f(x, y). \]

So c1 = 1/2, c2 = 1. The same argument works if y = max{x, y}. This shows that the norms

\[ \|x_1\|_A + \|x_2\|_B, \qquad \max\{\|x_1\|_A, \|x_2\|_B\} \]

are equivalent. I will leave it as an exercise for you to show that the norms

\[ \max\{\|x_1\|_A, \|x_2\|_B\}, \qquad \sqrt{\|x_1\|_A^2 + \|x_2\|_B^2} \]

are equivalent. □
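Since the statement reduces to inequalities between functions of two non-negative variables, it is easy to test on random inputs. The sketch below checks the inequality proved above, together with one candidate pair of constants for max versus the square-root expression (1 and √2, an assumption to be confirmed by the exercise). A random test is a sanity check, not a proof.

```python
import math
import random

def f(x, y):
    return x + y

def g(x, y):
    return max(x, y)

def h(x, y):
    return math.sqrt(x * x + y * y)

def check(samples=10_000, seed=0):
    """Test the two-sided bounds on random non-negative pairs (x, y)."""
    rng = random.Random(seed)
    tol = 1e-12  # small allowance for floating-point rounding
    for _ in range(samples):
        x, y = rng.uniform(0, 10), rng.uniform(0, 10)
        assert 0.5 * f(x, y) <= g(x, y) <= f(x, y)
        assert g(x, y) <= h(x, y) * (1 + tol)
        assert h(x, y) <= math.sqrt(2) * g(x, y) * (1 + tol)
    return True

print(check())
```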

4.4.5 Direct sums


Let V be a Banach space, and let W1 , W2 be subspaces of V . In a previous proposition we
defined W1 + W2 , and more interestingly we defined W1 ⊕ W2 , when W1 ∩ W2 = {0}. This is
called the direct sum of W1 and W2 . So here is an interesting and important question: Suppose
W1 is a subspace of V . When can we find another subspace W2 such that V = W1 ⊕ W2 ?
This is an important question, because if we can answer it in the affirmative, then we have
a way of representing elements x ∈ V as a sum x = w1 + w2 in a unique way. As a consequence,
we can identify x with the ordered pair (w1 , w2 ), and identify V with the Cartesian product
W1 × W2 . This could potentially be very useful, say, in developing a coordinate system for V .
Important as this question is, it has no easy answer, and in fact we don’t want to think
about it too much, given the difficulty of the subject. But it seems so simple, doesn’t it? We
have a subspace, why shouldn’t we be able to pick a second, complementary subspace? Let’s
attempt to show that it is always possible to find such W2 . (It isn’t.)
For our first attempt: Choose W2 to be perpendicular to W1 . That would be fine, except for
the fact that perpendicularity has no meaning in the abstract theory we have developed. What
does it mean for two elements x, y ∈ V to be perpendicular? Can we give an interpretation of
perpendicularity based only on the notion of norm? Well, here is an attempt. When x and y
are perpendicular vectors in R2 , they can be viewed as two sides of a rectangle, and in that case
the rectangle diagonals have the same length. Likewise, if the main diagonals of a parallelogram
have the same length, then the parallelogram is actually a rectangle. So, we could attempt to
define, in a general Banach space: two elements x, y are perpendicular if

∥x + y∥ = ∥x − y∥.

This definition only depends on the norm. (It is not a good definition, but let’s not get into
details now.) With this notion of perpendicularity, we could try to define a perpendicular
subspace to W1 , as follows:

W2 = {y ∈ V | ∥x + y∥ = ∥x − y∥ for all x ∈ W1 }.

To clarify: start with W1 , define W2 , and now try to prove that V = W1 ⊕ W2 . This W2 contains
all elements of V that are perpendicular to all of W1 . Sounds reasonable?
Maybe it sounds reasonable, but it doesn’t work, even in the simplest examples. Let’s make
this more concrete: Take V = ℓ∞ with the supremum norm, and take W1 = ℓ1 , which we already
know is a proper subspace of ℓ∞ . Prove that zero is the only element of ℓ∞ that is perpendicular
(in this strange definition) to all of ℓ1 . So this idea doesn’t work...
Here is a second idea: How about building W2 one element at a time? Let’s proceed inductively.
If W1 = V , then W2 = {0} and we are done. If W1 ≠ V then there is some element
x1 ≠ 0 with x1 ∉ W1 . This x1 will be declared to be in W2 . If the span of W1 and x1 is the
whole of V , then stop. If not, then there is an element x2 ≠ 0 that is not in the span of W1
and x1 , and we add x2 to W2 . Proceed like this. Does that produce W2 ? Well, maybe. For
one thing, the W2 that we can produce using this method will have a countable base, and who
knows if that is enough? The space V may be huge. So we simply don’t know if this method
can work for us.
The trouble here is that Banach spaces can be very complicated indeed, and there is no
guarantee that they will necessarily have a nice direct product structure. We will come back to
this topic later.
Topic 5

Hilbert spaces

A Hilbert space is a Banach space where there is a notion of perpendicularity. This notion has
far-reaching implications in the geometry of the space, as we shall see.

5.1 Inner product


The goal here is to generalise the notion of the dot product from Rn . In what follows we do not
assume V is a Banach space. Also, there is a slight difference between the cases K = R and
K = C. If K = C, we sometimes have to take conjugates, but if K = R, conditions involving
conjugates simplify a little.

Definition 5.1. Let V be a linear space over K. An inner product is a function ⟨ , ⟩ : V ×V →


K with the following properties: For any x, y, z ∈ V and a ∈ K we have

1. ⟨x, x⟩ ≥ 0;

2. ⟨x, x⟩ = 0 ⇐⇒ x = 0;

3. ⟨x, y⟩ = ⟨y, x⟩∗ ;

4. ⟨ax + y, z⟩ = a⟨x, z⟩ + ⟨y, z⟩. ⋄

Note that if K = R then condition 3 becomes ⟨x, y⟩ = ⟨y, x⟩.


Let’s see some examples.

Example 37. Let V = R2 , over K = R, and write x = (x1 , x2 ), y = (y1 , y2 ). We define

⟨x, y⟩ = x1 y1 + x2 y2 .

This is the dot product in R2 , more traditionally denoted by x · y. It is an exercise to check


properties 1 through 4 (you probably did it already in a calculus, or linear algebra course). ⋄

Example 38. Let V = C2 over C, x = (x1 , x2 ), y = (y1 , y2 ), and note that x1 , x2 , y1 , y2 are
complex numbers. We define
⟨x, y⟩ = x1 y1∗ + x2 y2∗ .
For example, x = (2 + i, 3), y = (1 + i, 1 − 2i). Then

⟨x, y⟩ = (2 + i)(1 − i) + 3(1 + 2i) = 6 + 5i.


To verify property 3, we write

⟨x, y⟩ = x1 y1∗ + x2 y2∗ = (x∗1 y1 + x∗2 y2 )∗ = ⟨y, x⟩∗ .

We leave the other properties as an exercise. ⋄
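The arithmetic in Example 38 can be reproduced with Python's built-in complex numbers. This is a minimal sketch; the function name `inner` is an assumption, and the convention (linear in the first slot, conjugate in the second) follows Definition 5.1.

```python
def inner(x, y):
    """Inner product on C^n: sum of x_k times the conjugate of y_k."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

x = (2 + 1j, 3 + 0j)
y = (1 + 1j, 1 - 2j)

print(inner(x, y))                 # the value computed in Example 38
print(inner(y, x).conjugate())     # property 3: should agree with the line above
```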


Example 39. Take V = L∞ ([0, 1], C), that is, bounded functions f : [0, 1] → C, and define
\[ \langle f, g \rangle = \int_0^1 f(t)\, g(t)^* \, dt. \]

Because f (t)f (t)∗ = |f (t)|2 , we can verify that all properties of the inner product are satisfied.
(Do it!). ⋄
What is the meaning of the inner product? In R2 and R3 we use the dot product to detect
perpendicularity, and to obtain projections of vectors onto other vectors. For example, if x and
y are non-zero vectors in R2 or R3 , then the projection of x in the direction of y is given by a
formula involving the dot product. Let’s derive that formula here.
We want to obtain the projection of x in the direction of y. To that end, we write
$x = (x)_y + (x)_\perp$, where $(x)_y$ is the desired projection, and $(x)_\perp$ is perpendicular to y (so that
$(x)_\perp \cdot y = 0$). Since we know that $(x)_y$ is the projection in the direction of y, then we must have
$(x)_y = cy$, for some unknown constant c. Then

\[ x = cy + (x)_\perp. \]

Taking a dot product with y we obtain

\[ x \cdot y = c\, y \cdot y + (x)_\perp \cdot y = c\, y \cdot y. \]

As a result we obtain
\[ c = \frac{x \cdot y}{y \cdot y}, \]
and therefore
\[ (x)_y = \frac{x \cdot y}{y \cdot y}\, y. \]
This is our formula for the projection. For example, if x = (1, 2) and y = (5, 8), then
\[ (x)_y = \frac{21}{89}\,(5, 8). \]
Easy to use, right? Now let me point out some things about the proof of that formula. First,
we never really used the fact that we are in R2 or in R3 . All we used was that we could write x
as a sum of two vectors, one in the direction of y (the projection), and the other perpendicular
to it. That was, literally, all we used. The rest were just properties of the dot product.
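The projection formula is short enough to check with exact rational arithmetic. The sketch below recomputes the example x = (1, 2), y = (5, 8) and confirms that what is left over after removing the projection is perpendicular to y. The function names are assumptions made for this illustration.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(x, y):
    """Projection of x in the direction of a non-zero y: ((x.y)/(y.y)) y."""
    c = Fraction(dot(x, y), dot(y, y))
    return tuple(c * b for b in y)

x, y = (1, 2), (5, 8)
p = project(x, y)
residual = tuple(a - b for a, b in zip(x, p))  # the perpendicular part of x

print(p)
print(dot(residual, y))
```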
So here is the meaning of the inner product: It will allow us to study geometry (perpendicu-
larity, projections, etc) in any space where we can define an inner product. You may argue: But
what is the meaning of perpendicularity in, say, C2 ? The meaning of perpendicularity is not
geometric anymore, it is algebraic: Two vectors in C2 are perpendicular if their inner product is
zero. It is through the algebra that we will achieve visualisation. This is truly one of the great
triumphs of abstraction.
Our first proposition can be said to be one of the most important properties of the inner
product, the Cauchy-Schwarz inequality. The proof would simplify a lot if we had K = R.
But we could have K = C, and that is why we need to introduce the α you will see below in the
proof.

Proposition 28 (Cauchy-Schwarz inequality). For any x, y ∈ V we have


|⟨x, y⟩|2 ≤ ⟨x, x⟩ · ⟨y, y⟩.
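Proof. If y = 0 both sides of the inequality are 0, so we assume y ≠ 0. Also, if ⟨x, y⟩ = 0, we
are done. If not, let α ∈ C be such that |α| = 1, and α⟨x, y⟩ > 0. For any real number t we
compute
\[ 0 \le \langle \alpha x + ty, \alpha x + ty \rangle = \langle \alpha x, \alpha x \rangle + t \langle \alpha x, y \rangle + t \langle y, \alpha x \rangle + t^2 \langle y, y \rangle, \]
from which we obtain
\[ 0 \le \langle \alpha x, \alpha x \rangle + 2t\, \Re\langle \alpha x, y \rangle + t^2 \langle y, y \rangle, \]
where ℜz is the real part of the complex number z. Since y ≠ 0, we have ⟨y, y⟩ > 0. Set
a = ⟨y, y⟩, b = 2ℜ⟨αx, y⟩, and c = ⟨αx, αx⟩. The quadratic at² + bt + c attains its minimum
when t = −b/(2a). Substituting this value of t in the quadratic gives us
\[ 0 \le \langle \alpha x, \alpha x \rangle + 2 \left( \frac{-2\Re\langle \alpha x, y \rangle}{2 \langle y, y \rangle} \right) \Re\langle \alpha x, y \rangle + \left( \frac{-2\Re\langle \alpha x, y \rangle}{2 \langle y, y \rangle} \right)^2 \langle y, y \rangle. \]
(Long computation now. Do it!) This simplifies to
\[ (\Re\langle \alpha x, y \rangle)^2 \le \langle \alpha x, \alpha x \rangle \, \langle y, y \rangle. \]
Now observe the following:
\[ \Re(\langle \alpha x, y \rangle) = \Re(\alpha \langle x, y \rangle) = \alpha \langle x, y \rangle, \]
because α⟨x, y⟩ is real. Also,
\[ \langle \alpha x, \alpha x \rangle = \alpha \langle x, \alpha x \rangle = \alpha \alpha^* \langle x, x \rangle = \langle x, x \rangle, \]
because |α| = 1. We obtain
\[ 0 < (\alpha \langle x, y \rangle)^2 \le \langle x, x \rangle \, \langle y, y \rangle. \]
Since α⟨x, y⟩ is a positive number, we have
\[ (\alpha \langle x, y \rangle)^2 = |\alpha \langle x, y \rangle|^2 = |\alpha|^2 |\langle x, y \rangle|^2 = |\langle x, y \rangle|^2, \]
and the inequality is proved. □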
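Here is a small numerical sanity check of the inequality in C⁴, together with one concrete way to pick the number α used in the proof (the conjugate of ⟨x, y⟩ divided by its modulus). The function names, the dimension, and the random seed are all assumptions made for this sketch.

```python
import random

def inner(x, y):
    """Inner product on C^n, linear in the first slot as in Definition 5.1."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def unit_phase(x, y):
    """A number alpha with |alpha| = 1 and alpha * <x, y> > 0 (needs <x, y> != 0)."""
    ip = inner(x, y)
    return ip.conjugate() / abs(ip)

def cauchy_schwarz_gap(x, y):
    """<x, x><y, y> - |<x, y>|^2; the inequality says this is never negative."""
    return inner(x, x).real * inner(y, y).real - abs(inner(x, y)) ** 2

rng = random.Random(0)  # fixed seed, chosen only for reproducibility

def random_vector(n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

pairs = [(random_vector(4), random_vector(4)) for _ in range(1000)]
smallest_gap = min(cauchy_schwarz_gap(x, y) for x, y in pairs)
print(smallest_gap)
```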

Our second proposition relates the inner product to the norm. It is a fundamental result,
because it says that if a space has an inner product, then we can define a norm on the space.
Proposition 29. Let (V, ⟨ , ⟩) be an inner product space. Then
\[ \|x\| = \sqrt{\langle x, x \rangle} \]
defines a norm on V .
Proof. The triangle inequality is the tricky bit. Try proving it, then look back here. We have
\begin{align*}
\|x + y\|^2 &= \langle x + y, x + y \rangle \\
&= \langle x, x \rangle + 2\Re\langle x, y \rangle + \langle y, y \rangle \\
&\le \langle x, x \rangle + 2|\langle x, y \rangle| + \langle y, y \rangle \\
&\le \langle x, x \rangle + 2\sqrt{\langle x, x \rangle}\,\sqrt{\langle y, y \rangle} + \langle y, y \rangle \\
&= \left( \sqrt{\langle x, x \rangle} + \sqrt{\langle y, y \rangle} \right)^2.
\end{align*}
Now take square roots to obtain the triangle inequality. Note: we used the Cauchy-Schwarz
inequality from the third to the fourth line. □

From now on we will write the Cauchy-Schwarz inequality as

|⟨x, y⟩| ≤ ∥x∥ · ∥y∥,

where the norm is always understood to be the one defined by the inner product.
A natural question to ask is whether every norm comes from some inner product. This is
not difficult to settle, and in fact we already hinted at the answer elsewhere.
Proposition 30. If V is an inner product space, then, for any x, y ∈ V we have

∥x + y∥2 + ∥x − y∥2 = 2(∥x∥2 + ∥y∥2 ).

(This is called the parallelogram identity, relating the lengths of the sides of a parallelogram
to the lengths of its diagonals.)
Proof. As before, we have

∥x + y∥2 = ∥x∥2 + 2ℜ⟨x, y⟩ + ∥y∥2 ;


∥x − y∥2 = ∥x∥2 − 2ℜ⟨x, y⟩ + ∥y∥2 .

Now add them. 2

As a consequence, if we have a normed space where the parallelogram identity is not valid,
then the norm does not arise from any inner product. Show that the parallelogram identity is
not valid in ℓ∞ with the supremum norm.
The converse of the last proposition is also true.
Proposition 31. If the parallelogram identity is true for every x, y in a normed space, then the
norm does come from an inner product.
Proof. For a moment, suppose we do have such an inner product. Subtracting the two equations
in the last proposition gives us
\[ \Re\langle x, y \rangle = \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 \right). \]
In the case K = R, then ℜ⟨x, y⟩ = ⟨x, y⟩. We can then write
\[ \langle x, y \rangle = \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 \right). \]
The right-hand side depends only on the norm. We can now use it as the definition of ⟨x, y⟩, and
use the parallelogram identity to show that this definition yields an inner product. (Exercise!)
In the case K = C we need to do extra work. Now, ⟨x, y⟩ = ℜ⟨x, y⟩ + iℑ⟨x, y⟩, so that
−i⟨x, y⟩ = ℑ⟨x, y⟩ − iℜ⟨x, y⟩. We conclude that

\[ \Re\langle x, iy \rangle = \Re(-i \langle x, y \rangle) = \Im\langle x, y \rangle. \]

Then
\[ \Im\langle x, y \rangle = \Re\langle x, iy \rangle = \frac{1}{4} \left( \|x + iy\|^2 - \|x - iy\|^2 \right). \]
Finally,
\[ \langle x, y \rangle = \Re\langle x, y \rangle + i\,\Im\langle x, y \rangle = \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2 \right). \]

Thus, if we do have the inner product, then the last equation (called the polarization identity)
must be true. Note, however, that the right side is written in terms of the norm only. Now use
this equation to define the quantity ⟨x, y⟩. With this equation as the definition of ⟨x, y⟩, and
assuming the parallelogram identity, you now have to prove as an exercise that ⟨ , ⟩ is an inner
product. □
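The polarization identity can be tried out numerically: compute four norms, combine them, and compare with the inner product computed directly. The sketch below does this in C² with the vectors from Example 38. The function names are assumptions, and the comparison is only up to floating-point rounding.

```python
import math

def inner(x, y):
    """Inner product on C^n, linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(sum(abs(a) ** 2 for a in x))

def polarization(x, y):
    """Recover <x, y> from norms only, via the polarization identity."""
    def n2(scale):
        # squared norm of x + scale * y
        return norm([a + scale * b for a, b in zip(x, y)]) ** 2
    return 0.25 * (n2(1) - n2(-1) + 1j * n2(1j) - 1j * n2(-1j))

x = (2 + 1j, 3 + 0j)
y = (1 + 1j, 1 - 2j)
print(polarization(x, y), inner(x, y))
```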

Example 40. The space ℓ1 is contained in every ℓp , and therefore we can define the norm ∥ · ∥p
for the elements of ℓ1 , for any p ≥ 1. However, of all the norms we can define, only the norm
with p = 2 comes from an inner product. In other words, verify the following:
(a) If p = 2, the norm
\[ \|x\|_2 = \sqrt{\sum_{n=0}^{\infty} |x_n|^2} \]
does come from an inner product.
(b) If p ≠ 2, with 1 ≤ p < ∞, the norm
\[ \|x\|_p = \sqrt[p]{\sum_{n=0}^{\infty} |x_n|^p} \]
does not come from an inner product (does not satisfy the parallelogram identity).
(c) If p = ∞, the norm
\[ \|x\|_\infty = \sup_{0 \le n < \infty} |x_n| \]
does not come from an inner product (does not satisfy the parallelogram identity). ⋄
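Before attempting a proof, it can help to test the parallelogram identity on a single pair of vectors. The sketch below uses x = (1, 0, 0, ...) and y = (0, 1, 0, ...), stored as length-2 lists since all other entries are zero, and prints the amount by which the identity fails for several values of p. The choice of test vectors and the function names are assumptions for this illustration.

```python
import math

def p_norm(x, p):
    """The p-norm of a finite-length sequence; p = math.inf gives the sup norm."""
    if p == math.inf:
        return max(abs(a) for a in x)
    return sum(abs(a) ** p for a in x) ** (1.0 / p)

def parallelogram_defect(x, y, p):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2), computed in the p-norm."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return (p_norm(s, p) ** 2 + p_norm(d, p) ** 2
            - 2 * (p_norm(x, p) ** 2 + p_norm(y, p) ** 2))

x, y = [1.0, 0.0], [0.0, 1.0]
for p in (1, 2, 3, math.inf):
    print(p, parallelogram_defect(x, y, p))
```

A non-zero defect for one pair is enough to rule out an inner product. A zero defect for one pair, as with p = 2, proves nothing by itself and still needs the argument asked for in part (a).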

5.2 Hilbert spaces


When a space V has an inner product, it can be equipped with the norm that comes from
the inner product, becoming a normed linear space. We are therefore allowed to talk about
convergence and completeness.

Definition 5.2. A Hilbert space is a complete inner product linear space. ⋄

In simpler words, a Hilbert space is a Banach space with inner product. From now on, in
this Topic, V will always be a Hilbert space.

Example 41. Let I ⊂ R be an interval. The space L2 (I, K) with the norm
\[ \|f\|_2 = \sqrt{\int_I |f(t)|^2 \, dt} \]
is a Banach space, as we saw before. We define

\[ \langle f, g \rangle = \int_I f(t)\, g(t)^* \, dt. \]

We want to see that this is an inner product. (Once we prove that this is an inner product, then
we note that the norm ∥ · ∥2 that we already had on this space actually comes from this inner
product, that is, $\|f\|_2 = \sqrt{\langle f, f \rangle}$. Hence L2 will be a Hilbert space.)

The first thing to show is that the quantity ⟨f, g⟩ is actually finite. This is a consequence of
Hölder’s inequality:

\[ |\langle f, g \rangle| = \left| \int_I f(t)\, g(t)^* \, dt \right| \le \int_I |f(t) g(t)| \, dt = \|f g\|_1 \le \|f\|_2 \|g\|_2 < \infty. \]

Once we know this is a finite quantity, the other properties are an exercise. ⋄

Example 42. The space ℓ2 (N, C) is a Hilbert space, with inner product given by

\[ \langle x, y \rangle = \sum_{n=0}^{\infty} x_n y_n^*. \]

Use the last example as a blueprint and show that this indeed defines an inner product on ℓ2 . ⋄

The first result about Hilbert spaces has to be the following:

Proposition 32. If xn → x and yn → y, then ⟨xn , yn ⟩ → ⟨x, y⟩.

Proof. Note that xn → x means that ∥xn − x∥ → 0. Write

⟨xn , yn ⟩ − ⟨x, y⟩ = (⟨xn , yn ⟩ − ⟨xn , y⟩) + (⟨xn , y⟩ − ⟨x, y⟩).

Taking absolute values, we get

|⟨xn , yn ⟩ − ⟨x, y⟩| ≤ |⟨xn , yn ⟩ − ⟨xn , y⟩| + |⟨xn , y⟩ − ⟨x, y⟩|.

That is,
|⟨xn , yn ⟩ − ⟨x, y⟩| ≤ |⟨xn , yn − y⟩| + |⟨xn − x, y⟩|.
Use the Cauchy-Schwarz (from now on, CS) inequality to obtain

|⟨xn , yn ⟩ − ⟨x, y⟩| ≤ ∥xn ∥ ∥yn − y∥ + ∥xn − x∥ ∥y∥.

Taking limits as n → ∞ gives us ∥xn ∥ → ∥x∥, ∥yn ∥ → ∥y∥, ∥xn − x∥ → 0, ∥yn − y∥ → 0. This
proves that ⟨xn , yn ⟩ → ⟨x, y⟩. □

We will see more results about Hilbert spaces in the sections below.

5.3 Convexity and optimisation


In many ways Hilbert spaces are the ideal setting in which to do optimisation.

Definition 5.3. Let V be a linear space. A set C ⊂ V is convex if, for any x, y ∈ C, and any
0 ≤ t ≤ 1, the element tx + (1 − t)y is also in C. ⋄

In words, a set is convex if the straight segment joining any two points in C, is also in C.
Note: Any subspace is convex.

Proposition 33. Suppose C is a closed, convex subset of V , and x ∉ C. Then there is some
y ∈ C such that, for all z ∈ C,
\[ \|x - y\| \le \|x - z\|. \]
Moreover, this y is unique.

This result matters for optimisation problems: it says that a certain minimisation problem
has a unique solution y.

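Proof. Let $s = \inf_{z \in C} \|x - z\|$. Let zn ∈ C be a sequence such that $\lim_{n\to\infty} \|x - z_n\| = s$. We
claim that zn is a Cauchy sequence.
Let’s apply the parallelogram law to
\[ v = \frac{x - z_n}{2}, \qquad w = \frac{x - z_m}{2}, \qquad v + w = x - \frac{z_n + z_m}{2}, \qquad v - w = \frac{z_m - z_n}{2}. \]
We obtain
\[ \left\| x - \frac{z_n + z_m}{2} \right\|^2 + \left\| \frac{z_m - z_n}{2} \right\|^2 = 2 \left\| \frac{x - z_n}{2} \right\|^2 + 2 \left\| \frac{x - z_m}{2} \right\|^2. \]
Now observe that (zn + zm )/2 ∈ C, since C is convex, and therefore
\[ \left\| x - \frac{z_n + z_m}{2} \right\|^2 \ge s^2. \]
We obtain
\[ s^2 + \left\| \frac{z_m - z_n}{2} \right\|^2 \le 2 \left\| \frac{x - z_n}{2} \right\|^2 + 2 \left\| \frac{x - z_m}{2} \right\|^2. \]
But ∥x − zn ∥ → s, ∥x − zm ∥ → s as n, m → ∞, so the right-hand side tends to s²/2 + s²/2 = s². We conclude that
\[ \left\| \frac{z_m - z_n}{2} \right\|^2 \le 2 \left\| \frac{x - z_n}{2} \right\|^2 + 2 \left\| \frac{x - z_m}{2} \right\|^2 - s^2 \to 0 \]
as m, n → ∞, proving that zn is a Cauchy sequence. Now, V is complete, so zn → y for some
y ∈ V , and C is closed, so y ∈ C. By continuity of the norm, ∥x − y∥ = lim ∥x − zn ∥ = s, so y
is a minimiser. We leave the proof of uniqueness as an exercise. □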

The use of the parallelogram identity hints at the fact that we must be working on a Hilbert
space for this result to be true. In general Banach spaces, this result may be false!
Example 43. Consider the Banach space ℓ∞ (N, R) with the supremum norm. Let C be the
closed unit ball:
C = {z ∈ ℓ∞ | ∥z∥∞ ≤ 1}.
This set is convex, as shown before. (Every ball on a Banach space is convex.) First, let’s take

x = (3, 0, 0, 0, . . . ).

Since ∥x∥∞ = 3, we have x ∉ C. If z ∈ C, then

\[ \|x - z\|_\infty = \sup_{0 \le n < \infty} |x_n - z_n| = \sup\{|3 - z_0|, |z_1|, |z_2|, \dots\} = |3 - z_0|. \]

The last equality is true because |zn | ≤ 1 for all n, and 1 < |3 − z0 |. But since −1 ≤ z0 ≤ 1, we
must have 2 ≤ |3 − z0 | ≤ 4. We conclude that

2 ≤ ∥x − z∥∞ .

Consider these two possibilities for z:

z = (1, 0, 0, 0, . . . ), z = (1, 1, 1, 1, . . . ).

For both of these, we have ∥x − z∥∞ = 2, showing that both these choices for z are solutions to
the minimisation problem. We conclude that the minimisation problem does not have a unique
solution! (In fact, check that there are infinitely many solutions to the minimisation problem.) ⋄
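The failure of uniqueness can be seen on a computer by truncating the sequences to finitely many entries. In the sketch below the truncation length n = 8 is an arbitrary assumption (the sequences in the example are infinite), and the family z = (1, t, t, t, ...) with |t| ≤ 1 is one illustrative way to produce many minimisers.

```python
def sup_distance(x, z):
    """Supremum-norm distance between two finite-length sequences."""
    return max(abs(a - b) for a, b in zip(x, z))

n = 8  # illustrative truncation length; the sequences in the example are infinite
x = [3.0] + [0.0] * (n - 1)
z_a = [1.0] + [0.0] * (n - 1)
z_b = [1.0] * n

print(sup_distance(x, z_a), sup_distance(x, z_b))

# A whole family of points of the unit ball, all at the same distance from x.
family = [sup_distance(x, [1.0] + [t] * (n - 1))
          for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
print(family)
```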


Things could be even worse. It could be that there is no minimiser at all!


Example 44. We will consider the space of continuous functions f : [−1, 1] → R, with the norm

\[ \|f\| = \sup_{-1 \le t \le 1} |f(t)| = \max_{-1 \le t \le 1} |f(t)|. \]

(We may use max rather than sup, because every continuous function on [−1, 1] attains its
maximum.) We call this space V = C([−1, 1], R), and it can be shown that this is a Banach
space with this norm. We define the set
\[ C = \left\{ f \in V \;\middle|\; \int_{-1}^{0} f(t) \, dt = -1, \ \int_{0}^{1} f(t) \, dt = 1 \right\}. \]

This set is convex (prove it!). Also, this set is closed because if fn → f in the supremum norm,
then fn (t) → f (t) for every t, and we can apply the dominated convergence theorem to show
that
\[ \int_{-1}^{0} f(t) \, dt = \lim_{n\to\infty} \int_{-1}^{0} f_n(t) \, dt = -1, \]

with a similar statement for the other integral, showing that f ∈ C.

Now let g ∈ V be the zero function g(t) = 0 for all t. We claim that the problem of finding
a minimiser to
\[ \inf_{f \in C} \|f - g\| = \inf_{f \in C} \|f\| \]

has no solution. That is, there is no function f ∈ C that is closest to the origin. In yet
other words, given any f1 ∈ C, it is always possible to find another function f2 ∈ C such that
∥f2 ∥ < ∥f1 ∥.
You should try to prove this claim! ⋄

5.4 Orthogonality
Definition 5.4. Two elements x, y ∈ V are orthogonal (to each other) if ⟨x, y⟩ = 0. ⋄
Orthogonal is another word for perpendicular. As we pointed out before, this concept doesn’t
necessarily carry over to the more general setting of Banach spaces. Let’s see what it can do for
us.
Definition 5.5. If W is a subspace of V we define

W ⊥ = {y ∈ V | ⟨x, y⟩ = 0 for all x ∈ W },

and call it the orthogonal complement of W . ⋄


So, any element of W ⊥ is perpendicular to every element of W . Note also that 0 ∈ W ⊥
always, so W ⊥ is not empty.
Proposition 34. Let V be a Hilbert space, and let W be a subspace of V .

(a) If W = V , then W ⊥ = {0}. If W = {0}, then W ⊥ = V .

(b) If $\overline{W}$ is the closure of W , then $(\overline{W})^\perp = W^\perp$.

(c) If W is dense in V , then W ⊥ = {0}.



(d) W ⊥ is a closed subspace of V .

(e) $(W^\perp)^\perp = \overline{W}$.

(f ) If W is closed, then (W ⊥ )⊥ = W .

(g) If W is closed, then V = W ⊕ W ⊥ .

Proof. (a) Suppose W = V , and x ∈ W ⊥ . Since x ∈ W as well, x must be perpendicular to


itself: ⟨x, x⟩ = 0. But then x = 0, showing that W ⊥ = {0}. If, on the other hand, W = {0},
then for all x ∈ V we have ⟨0, x⟩ = 0, and we conclude that W ⊥ = V .
(b) Suppose x ∈ $\overline{W}$, then there is a sequence xn ∈ W such that xn → x. If y ∈ W ⊥ , then

\[ \langle y, x \rangle = \lim_{n\to\infty} \langle y, x_n \rangle = 0, \]

and we see that y ∈ $(\overline{W})^\perp$. Therefore $W^\perp \subset (\overline{W})^\perp$. On the other hand, if y ∈ $(\overline{W})^\perp$, then y is
perpendicular to every element of $\overline{W}$, and with more reason y is perpendicular to every element
in W . Thus y ∈ W ⊥ , and we conclude that $(\overline{W})^\perp \subset W^\perp$. Thus $(\overline{W})^\perp = W^\perp$.
(c) If W is dense in V , then $\overline{W} = V$, and so $W^\perp = (\overline{W})^\perp = V^\perp = \{0\}$.
(d) Suppose yn is a Cauchy sequence in W ⊥ . Since V is complete, then yn converges to
y ∈ V , and we want to show that y ∈ W ⊥ . If x ∈ W , we have

\[ \langle y, x \rangle = \lim_{n\to\infty} \langle y_n, x \rangle = 0. \]

We see that y ∈ W ⊥ , and so W ⊥ is closed.


(e), (f) We leave these as exercises.
(g) Since W is a subspace, then W is convex. As W is also closed, for each x ∈ V we define
w(x) to be the only element in W which is closest to x, so that for all y ∈ W we have

∥x − w(x)∥ ≤ ∥x − y∥.

This w(x) must exist, and must be unique, from the result we obtained in the last section. We
define v(x) = x − w(x), and we claim that v(x) ∈ W ⊥ . For any y ∈ W , any θ ∈ R, and any
t > 0 we have
\[ \|v(x)\|^2 \le \|x - w(x) + e^{i\theta} t y\|^2 = \|v(x) + e^{i\theta} t y\|^2, \]
from the minimization property of w(x). Thus

\[ 0 \le 2\Re\langle v(x), e^{i\theta} t y \rangle + t^2 \|y\|^2, \]

for any y ∈ W , any θ ∈ R, and any t > 0. Fixing y and letting θ change, we choose θ to make
$e^{-i\theta} \langle v(x), y \rangle$ real and non-positive, and obtain
\[ 2t\, |\langle v(x), y \rangle| \le t^2 \|y\|^2. \]
Now cancel t, and let t → 0, to obtain ⟨v(x), y⟩ = 0. This shows that v(x) ∈ W ⊥ , and we have
obtained an expression x = w(x) + v(x), with w(x) ∈ W and v(x) ∈ W ⊥ , so that V = W + W ⊥ .
Since W ∩ W ⊥ = {0} (there is only one element that is perpendicular to itself), we see that in
fact V = W ⊕ W ⊥ , and we are done. □

This seems to be the right place to point out that the mapping x ↦ w(x) is called the
orthogonal projection of V onto W . We will discuss this aspect of the theory later on.

5.5 Bases
Definition 5.6. Let V be a Hilbert space over K. A countable set x1 , x2 , x3 , . . . spans V if the
subspace W of finite linear combinations of the xn is dense in V . In other words, the xn span
V if any element of V can be approximated arbitrarily closely by finite linear combinations of
the spanning elements. ⋄
Example 45. In ℓ2 , take xn = (0, 0, . . . , 0, 1, 0, 0, . . . ), with 0 in every coordinate except for a
1 in the nth coordinate. This set spans ℓ2 . ⋄
Definition 5.7. A countable set xn is said to be a basis for V if it spans V , and no proper
subset of the xn spans V . ⋄
Example 46. In ℓ2 , take xn = (0, 0, . . . , 0, 1, 0, 0, . . . ), with 0 in every coordinate except for a
1 in the nth coordinate. This set is a basis for ℓ2 . ⋄
There are two ways in which a countable set xn can fail to be a basis. First, the set may
not span. Second, it spans, but it contains a proper spanning subset. Let’s take a look at the
first case. If it doesn’t span, then $\overline{W}$ is not the whole of V , and there is a non-trivial $W^\perp$. We
could then hope to produce a basis by adding some elements from $W^\perp$ to it. Let’s now look at
the second case. If the set xn spans, we can hope to make it into a basis by shedding some of
its elements away (those that can be written in terms of the others). However, either of these
procedures could be clumsy, and offers no guarantee of success if tackled head on. Instead we
take a different route.

5.5.1 The Gram-Schmidt process


Proposition 35. Let x, y ∈ V be such that y ≠ 0. The orthogonal projection of x onto y,
denoted by $P_y x$, is the element
\[ P_y x = \frac{\langle x, y \rangle}{\langle y, y \rangle}\, y. \]
Proof. Let W = {λy | λ ∈ K} be the span of y. Since this is a closed subspace, we know that
any x ∈ V can be written in a unique way as x = w(x) + v(x), with w(x) ∈ W and v(x) ∈ W ⊥ .
This w(x) is the projection we are looking for. Since any element in W is a multiple of y, we
have x = λy + v(x), for some λ ∈ K. Then ⟨x, y⟩ = λ⟨y, y⟩ + ⟨v(x), y⟩, but as v(x), y are
perpendicular, we find ⟨x, y⟩ = λ⟨y, y⟩. Then
\[ P_y x = w(x) = \lambda y = \frac{\langle x, y \rangle}{\langle y, y \rangle}\, y, \]
as desired. □

When ∥y∥ = 1, the formula simplifies to

Py x = ⟨x, y⟩ y.

The next result is called the Gram-Schmidt process.


Proposition 36. Let x1 , x2 , . . . , xn−1 , xn be non-zero elements, and let W be their span, that
is,
\[ W = \left\{ \sum_{k=1}^{n} a_k x_k \;\middle|\; a_k \in K \right\}. \]

Then we can produce a set of non-zero elements y1 , . . . , ym , with m ≤ n, having the same span
W , and such that ∥yk ∥ = 1, and if k ≠ j, then ⟨yk , yj ⟩ = 0.

Proof. Define y1 = x1 /∥x1 ∥, so that ∥y1 ∥ = 1. Note that x1 and y1 have the same span. Now
define
\[ z_2 = x_2 - P_{y_1} x_2. \]
Note that z2 is perpendicular to y1 . If z2 = 0, we throw it out and skip to the next step. If
z2 ≠ 0, then we define
\[ y_2 = z_2 / \|z_2\|. \]
Note that ∥y2 ∥ = 1, and ⟨y1 , y2 ⟩ = 0. Also, {x1 , x2 } and {y1 , y2 } have the same span.
We now define
\[ z_3 = x_3 - P_{y_1} x_3 - P_{y_2} x_3. \]
Verify that z3 is perpendicular to both y1 and y2 . If z3 = 0, skip to the next element (x4 ),
otherwise define
\[ y_3 = z_3 / \|z_3\|, \]
and note that {x1 , x2 , x3 } has the same span as {y1 , y2 , y3 }. Proceed inductively. □
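The steps of the proof translate almost line by line into code. The following is a minimal sketch for vectors in Rⁿ with the usual dot product; the function names and the tolerance `tol` (used to decide when a floating-point vector counts as zero) are assumptions, not part of the proposition.

```python
import math

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list with the same span as `vectors`."""
    basis = []
    for x in vectors:
        z = list(x)
        for y in basis:
            c = inner(x, y)  # P_y x = <x, y> y, because ||y|| = 1
            z = [zi - c * yi for zi, yi in zip(z, y)]
        length = math.sqrt(inner(z, z))
        if length > tol:  # z = 0 (up to rounding) means x adds nothing new
            basis.append([zi / length for zi in z])
    return basis

vectors = [(1.0, 1.0, 0.0), (2.0, 2.0, 0.0), (1.0, 0.0, 1.0)]
basis = gram_schmidt(vectors)
print(len(basis))
print(basis)
```

In the usage example the second vector is a multiple of the first, so it is thrown out and the output has m = 2 elements for n = 3 inputs.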

Example 47. Consider the space L2 ([0, 1], R), and let
\[ f_1(x) = 1, \qquad f_2(x) = x, \qquad f_3(x) = x^2. \]
We define
\[ g_1(x) = \frac{f_1(x)}{\sqrt{\int_0^1 |f_1(t)|^2\,dt}} = 1. \]
Next,
\[ P_{g_1} f_2(x) = \langle f_2, g_1\rangle\, g_1(x) = \int_0^1 f_2(t) f_1(t)\,dt \cdot g_1(x) = \frac{1}{2}. \]
Then
\[ z_2(x) = f_2(x) - P_{g_1} f_2(x) = x - \frac{1}{2}. \]
We compute
\[ \|z_2\|^2 = \int_0^1 \Big(t - \frac{1}{2}\Big)^2 dt = \frac{1}{3} - \frac{1}{2} + \frac{1}{4} = \frac{1}{12}. \]
We define
\[ g_2(x) = \frac{z_2(x)}{\|z_2\|} = \sqrt{3}\,(2x - 1). \]
To find g3 we first compute
\[ \langle f_3, g_1\rangle = \int_0^1 t^2\,dt = \frac{1}{3}, \qquad \langle f_3, g_2\rangle = \sqrt{3} \int_0^1 (2t^3 - t^2)\,dt = \frac{\sqrt{3}}{6}. \]
Then
\[ z_3(x) = f_3(x) - \frac{1}{3}\, g_1(x) - \frac{\sqrt{3}}{6}\, g_2(x) = x^2 - \frac{1}{3} - \Big(x - \frac{1}{2}\Big). \]
So
\[ z_3(x) = x^2 - x + \frac{1}{6}, \]
and
\[ g_3(x) = \frac{x^2 - x + \frac{1}{6}}{\sqrt{\int_0^1 (t^2 - t + \frac{1}{6})^2\,dt}} = \sqrt{5}\,(6x^2 - 6x + 1). \]

(It goes without saying that you must check all of these computations, as I am doing them in
my head late at night). ⋄
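Since the example asks for its arithmetic to be checked, here is one way to do it exactly. The sketch below represents a polynomial by its coefficient list [c0, c1, c2], uses the fact that the integral of t^i · t^j over [0, 1] is 1/(i + j + 1), and runs Gram-Schmidt without the final normalisation so that everything stays rational. The function names are ours.

```python
from fractions import Fraction

def inner(p, q):
    """<p, q> = integral over [0, 1] of p(t) q(t) dt, for coefficient lists."""
    return sum(Fraction(a) * Fraction(b) / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

def orthogonalize(polys):
    """Gram-Schmidt without the final normalisation, in exact arithmetic."""
    zs = []
    for f in polys:
        z = [Fraction(c) for c in f]
        for w in zs:
            coeff = inner(f, w) / inner(w, w)      # P_w f = (<f, w>/<w, w>) w
            z = [zi - coeff * wi for zi, wi in zip(z, w)]
        zs.append(z)
    return zs

# f1 = 1, f2 = x, f3 = x^2 as coefficient lists [c0, c1, c2]
z1, z2, z3 = orthogonalize([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
for name, z in (("z1", z1), ("z2", z2), ("z3", z3)):
    print(name, [str(c) for c in z], "norm^2 =", inner(z, z))
```

Dividing each z by the square root of its printed squared norm gives the corresponding g.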
If we apply the Gram-Schmidt process to a countable sequence of elements x1 , x2 . . . , we
obtain a countable sequence of elements y1 , y2 , . . . , and provided no xk is contained in the span
of the previous elements, then

Wk = span{x1 , . . . , xk } = span{y1 , . . . , yk }

for all k. We conclude that the countably infinite sequence x1 , x2 , . . . has the same span as
y1 , y2 , . . .
Definition 5.8. A countable basis y1 , y2 , . . . for V is orthogonal if the yk are mutually per-
pendicular, and orthonormal if it is orthogonal, and all basis elements have norm 1. ⋄
Our preceding argument shows that if V has a countable basis, then it has an orthonormal
countable basis (which can be produced by the Gram-Schmidt process).
(A much more general result is true: every Hilbert space has an orthonormal basis.)

5.5.2 Representing x on an orthogonal basis


Suppose x1 , x2 , . . . is an orthogonal basis for V , and let x ∈ V . Then for any ε > 0 there is some
finite linear combination of the basis elements that is ε-close to x, and we can write

\[ \Big\| x - \sum_{k=1}^{n} a_k x_k \Big\| < \varepsilon, \]

for some n > 0 and some coefficients ak ∈ K. We can, however, do better than that. For each
integer n > 0 let Wn be the span of x1 , . . . , xn . Then Wn is a closed subspace of V , and there is
one and only one element PWn x ∈ Wn that is closest to x. This element is

\[ P_{W_n} x = \sum_{k=1}^{n} \frac{\langle x, x_k\rangle}{\langle x_k, x_k\rangle}\, x_k, \]

as obtained by the Gram-Schmidt process. No other element in Wn is closer to x than this.

Since the xn form a basis (by assumption), then for each ε > 0 there is some n0 such that, if
n ≥ n0 then there are some coefficients ak such that

\[ \Big\| x - \sum_{k=1}^{n} a_k x_k \Big\| < \varepsilon. \]

But since

\[ \Big\| x - \sum_{k=1}^{n} \frac{\langle x, x_k\rangle}{\langle x_k, x_k\rangle}\, x_k \Big\| \le \Big\| x - \sum_{k=1}^{n} a_k x_k \Big\| \]

for any choice of ak , we conclude that, given ε > 0, there is some n0 such that, if n ≥ n0 , then

\[ \Big\| x - \sum_{k=1}^{n} \frac{\langle x, x_k\rangle}{\langle x_k, x_k\rangle}\, x_k \Big\| < \varepsilon. \]

Hence the sequence zn = ∑_{k=1}^{n} (⟨x, xk ⟩/⟨xk , xk ⟩) xk converges to x, and we write

\[ x = \sum_{k=1}^{\infty} \frac{\langle x, x_k\rangle}{\langle x_k, x_k\rangle}\, x_k. \]

This equality has to be interpreted as a limit.
Again writing zn = ∑_{k=1}^{n} (⟨x, xk ⟩/⟨xk , xk ⟩) xk , we can write

\[ \langle z_n, x\rangle \to \langle x, x\rangle, \]

which gives us Parseval’s identity:

\[ \|x\|^2 = \sum_{k=1}^{\infty} \frac{|\langle x, x_k\rangle|^2}{\langle x_k, x_k\rangle}. \]

Another popular result is Bessel’s inequality. Assume that the xk are mutually perpen-
dicular, but don’t necessarily span V . Let’s write

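\[ c_k = \frac{\langle x, x_k\rangle}{\langle x_k, x_k\rangle}. \]

We compute

\[ 0 \le \Big\| x - \sum_{k=1}^{n} c_k x_k \Big\|^2 = \|x\|^2 - \sum_{k=1}^{n} c_k^* \langle x, x_k\rangle - \sum_{k=1}^{n} c_k \langle x_k, x\rangle + \Big\| \sum_{k=1}^{n} c_k x_k \Big\|^2. \]

Because the xk are mutually orthogonal we have

\[ \Big\| \sum_{k=1}^{n} c_k x_k \Big\|^2 = \sum_{k=1}^{n} \frac{|\langle x, x_k\rangle|^2}{\langle x_k, x_k\rangle}. \]

But also

\[ \sum_{k=1}^{n} c_k^* \langle x, x_k\rangle = \sum_{k=1}^{n} c_k \langle x_k, x\rangle = \sum_{k=1}^{n} \frac{|\langle x, x_k\rangle|^2}{\langle x_k, x_k\rangle}. \]

We conclude that

\[ \sum_{k=1}^{n} \frac{|\langle x, x_k\rangle|^2}{\langle x_k, x_k\rangle} \le \|x\|^2. \]

Since this is true for all n, we can take a limit to obtain

\[ \sum_{k=1}^{\infty} \frac{|\langle x, x_k\rangle|^2}{\langle x_k, x_k\rangle} \le \|x\|^2. \]

This is Bessel’s inequality, valid for orthogonal sets (that don’t necessarily span).
Finally, we have the Riemann-Lebesgue Lemma, stating that

\[ \lim_{k\to\infty} \frac{|\langle x, x_k\rangle|^2}{\langle x_k, x_k\rangle} = 0, \]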

which is an immediate consequence of Bessel’s inequality.
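A small finite-dimensional illustration may help fix the difference between Bessel and Parseval. The sketch below works in R^3 with the usual inner product; the particular vectors are arbitrary choices of ours. Two perpendicular vectors that do not span give a strict inequality, and adding a third perpendicular vector turns it into an equality.

```python
def inner(u, v):
    """Usual inner product on R^3."""
    return sum(a * b for a, b in zip(u, v))

# Two mutually perpendicular vectors that do not span R^3.
x1 = [1.0, 1.0, 0.0]
x2 = [1.0, -1.0, 0.0]
x = [3.0, 4.0, 12.0]

bessel_sum = sum(inner(x, xk) ** 2 / inner(xk, xk) for xk in (x1, x2))
norm_sq = inner(x, x)
print(bessel_sum, "<=", norm_sq)

# A third perpendicular vector completes an orthogonal basis,
# and the inequality becomes Parseval's identity.
x3 = [0.0, 0.0, 2.0]
parseval_sum = bessel_sum + inner(x, x3) ** 2 / inner(x3, x3)
print(parseval_sum, "==", norm_sq)
```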



5.5.3 Fourier series


Consider the space L2 ([−π, π], C) (the change from [0, 1] to a different interval does not affect
any of the results we have obtained so far). Define for each k ∈ Z

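\[ e_k(x) = \frac{e^{ikx}}{\sqrt{2\pi}}. \]

Then

\[ \|e_k\|_2^2 = \int_{-\pi}^{\pi} \Big| \frac{e^{ikx}}{\sqrt{2\pi}} \Big|^2 dx = 1, \]

since |e^{ikx}| = 1. Moreover, if k ≠ m, then

\[ \langle e_k, e_m\rangle = \int_{-\pi}^{\pi} \frac{e^{ikx}}{\sqrt{2\pi}} \cdot \frac{e^{-imx}}{\sqrt{2\pi}}\,dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i(k-m)x}\,dx = 0. \]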

We can’t conclude from this that the set {ek }k∈Z forms a basis for L2 , but we can conclude both
Bessel’s inequality and the Riemann-Lebesgue Lemma. Thus, for every function f ∈ L2 we have
\[ \lim_{|n|\to\infty} \int_{-\pi}^{\pi} f(x)\, e^{-inx}\,dx = 0 \]

(from Riemann-Lebesgue), and

\[ \sum_{k=-\infty}^{\infty} \Big| \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(x)\, e^{-ikx}\,dx \Big|^2 \le \int_{-\pi}^{\pi} |f(x)|^2\,dx \]

(Bessel’s inequality). If, however, we manage to prove that the ek do form a basis for L2 , then
Bessel’s inequality becomes Parseval’s identity, and we write

\[ f(x) = \sum_{k=-\infty}^{\infty} \langle f, e_k\rangle\, e_k(x). \]

Important to note: this equality does not mean that for each value of x the left side and the
right side give the same numerical value. This equality has to be understood as a limit in L2 ,
namely,

\[ \lim_{n\to\infty} \Big\| f - \sum_{k=-n}^{n} \langle f, e_k\rangle\, e_k \Big\|_2 = 0. \]

The proof that the set ek forms a basis of L2 belongs in an advanced course in Fourier series,
and we won’t see it here.
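Bessel's inequality for the functions ek can be explored numerically. The sketch below takes the sample function f(x) = x (our choice), approximates each ⟨f, ek⟩ with a midpoint rule on N = 4000 points (also our choice, so the numbers are approximations, not exact values), and compares a partial sum of |⟨f, ek⟩|^2 with ∥f∥^2.

```python
import cmath
import math

N = 4000                                   # number of midpoint samples (assumption)
h = 2 * math.pi / N
grid = [-math.pi + (j + 0.5) * h for j in range(N)]

def f(x):
    """Sample function; any square-integrable f could be used instead."""
    return x

def coeff(k):
    """Approximate <f, e_k> = (1/sqrt(2 pi)) * integral of f(x) e^{-ikx} dx."""
    total = sum(f(x) * cmath.exp(-1j * k * x) for x in grid)
    return total * h / math.sqrt(2 * math.pi)

norm_sq = sum(f(x) ** 2 for x in grid) * h           # approximates ||f||_2^2
partial = sum(abs(coeff(k)) ** 2 for k in range(-20, 21))
print(partial, "<=", norm_sq)
```

Raising the range of k moves the partial sum closer to ∥f∥^2 from below, which is what one expects if the ek do form a basis.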

5.5.4 The Haar basis (wavelets)


We work in L2 ([0, 1), R); let’s define the Haar basis of L2 . This will be a sequence with two
indices n, k. Before we describe the indexing, we define two functions, for 0 ≤ x < 1.
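\[ \phi(x) = 1, \qquad \psi(x) = \begin{cases} 1, & 0 \le x < 1/2; \\ -1, & 1/2 \le x < 1. \end{cases} \]

For n = 0, 1, 2, 3, . . . and k = 0, 1, 2, . . . , 2^n − 1, we define, for 0 ≤ x < 1,

\[ \psi_{n,k}(x) = \begin{cases} 2^{n/2}\, \psi(2^n x - k), & \frac{k}{2^n} \le x < \frac{k+1}{2^n}; \\ 0, & \text{otherwise.} \end{cases} \]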

Note that ψ0,0 = ψ. This is the zero level (n = 0). The first level (n = 1) contains ψ1,0 and ψ1,1 .
The second level contains four functions, namely ψ2,0 to ψ2,3 . The third level contains from ψ3,0
to ψ3,7 , and so on. Note that ∥ϕ∥2 = 1, and
\[ \|\psi_{n,k}\|_2^2 = \int_{k/2^n}^{(k+1)/2^n} (2^{n/2})^2\, \psi(2^n x - k)^2\,dx = 2^n \int_{k/2^n}^{(k+1)/2^n} dx = 1. \]

(This is because ψ(y)² = 1 for all y ∈ [0, 1).)


Now a little nomenclature: The support of a function is where the function is not zero. The
support of ψn,k is the interval [k/2^n , (k + 1)/2^n ). If k ≠ k′ , then the support of ψn,k and the
support of ψn,k′ are disjoint, and therefore the product ψn,k (x) · ψn,k′ (x) = 0 for all 0 ≤ x < 1. As a
consequence
\[ \langle \psi_{n,k}, \psi_{n,k'}\rangle = 0. \]
Now, if n > n′ , and regardless of what k, k′ are, one of three things can happen with the support
of ψn,k and the support of ψn′,k′ . First, it is possible that the supports are disjoint, in which
case once more we have ⟨ψn,k , ψn′,k′ ⟩ = 0. Second, it is possible that the support of ψn,k falls
totally inside an interval where ψn′,k′ takes the constant value 2^(n′/2) . In this case we have
\[ \langle \psi_{n,k}, \psi_{n',k'}\rangle = 2^{n'/2} \int_{k/2^n}^{(k+1)/2^n} \psi_{n,k}(x)\,dx = 0. \]
Third and last possibility, the support of ψn,k falls totally inside an interval where ψn′,k′ takes
the constant value −2^(n′/2) . In this case we have
\[ \langle \psi_{n,k}, \psi_{n',k'}\rangle = -2^{n'/2} \int_{k/2^n}^{(k+1)/2^n} \psi_{n,k}(x)\,dx = 0. \]

In any case we find that ⟨ψn,k , ψn′,k′ ⟩ = 0. Since ⟨ϕ, ψn,k ⟩ = 0, we see that the set

{ϕ, ψn,k }

is an orthonormal set (but we haven’t yet proved that it is a basis). We compute

\[ \int_0^1 f(x)\,\psi_{n,k}(x)\,dx = 2^{n/2} \int_{k/2^n}^{(k+\frac12)/2^n} f(x)\,dx - 2^{n/2} \int_{(k+\frac12)/2^n}^{(k+1)/2^n} f(x)\,dx. \]

This may look strange, but with a little explanation it becomes clear. The quantity
\[ \frac{1}{1/2^{n+1}} \int_{k/2^n}^{(k+\frac12)/2^n} f(x)\,dx \]
is simply the average of f (x) on the interval [k/2^n , (k + 1/2)/2^n ). Thus the quantity ⟨f, ψn,k ⟩ is a measure
of how much the averages change in adjacent intervals. Let’s take a look at

⟨f, ϕ⟩ ϕ(x) + ⟨f, ψ0,0 ⟩ ψ0,0 (x),


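the approximation at level 0. Note that

\[ \langle f, \phi\rangle = \int_0^1 f(x)\,dx = \text{the average of } f \text{ on } [0, 1]. \]

Since
\[ \langle f, \psi_{0,0}\rangle = \int_0^{1/2} f(x)\,dx - \int_{1/2}^{1} f(x)\,dx, \]

we obtain
\[ \langle f, \phi\rangle\, \phi(x) + \langle f, \psi_{0,0}\rangle\, \psi_{0,0}(x) = \begin{cases} \int_0^1 f(x)\,dx + \int_0^{1/2} f(x)\,dx - \int_{1/2}^{1} f(x)\,dx, & 0 \le x < 1/2; \\ \int_0^1 f(x)\,dx - \int_0^{1/2} f(x)\,dx + \int_{1/2}^{1} f(x)\,dx, & 1/2 \le x < 1. \end{cases} \]

Simplifying we find
\[ \langle f, \phi\rangle\, \phi(x) + \langle f, \psi_{0,0}\rangle\, \psi_{0,0}(x) = \begin{cases} 2 \int_0^{1/2} f(x)\,dx, & 0 \le x < 1/2; \\ 2 \int_{1/2}^{1} f(x)\,dx, & 1/2 \le x < 1. \end{cases} \]

In yet other words,
\[ \langle f, \phi\rangle\, \phi(x) + \langle f, \psi_{0,0}\rangle\, \psi_{0,0}(x) = \begin{cases} \text{the average of } f \text{ on } [0, 1/2), & 0 \le x < 1/2; \\ \text{the average of } f \text{ on } [1/2, 1), & 1/2 \le x < 1. \end{cases} \]

I will leave for you to verify the following:

\[ \langle f, \phi\rangle\, \phi(x) + \langle f, \psi_{0,0}\rangle\, \psi_{0,0}(x) + \langle f, \psi_{1,0}\rangle\, \psi_{1,0}(x) + \langle f, \psi_{1,1}\rangle\, \psi_{1,1}(x) \]

is equal to
\[ \begin{cases} \text{the average of } f \text{ on } [0, 1/4), & 0 \le x < 1/4; \\ \text{the average of } f \text{ on } [1/4, 2/4), & 1/4 \le x < 2/4; \\ \text{the average of } f \text{ on } [2/4, 3/4), & 2/4 \le x < 3/4; \\ \text{the average of } f \text{ on } [3/4, 1), & 3/4 \le x < 1. \end{cases} \]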

Try to prove the general case by induction (it is not that hard, if you get used to the indices).
Now take a look at what you proved. The Haar sum up to level n is a step function, with the
steps being the averages of f over intervals of type [k/2^(n+1) , (k + 1)/2^(n+1) ). When f is
continuous, this step function surely approximates f in L2 , and
the approximation becomes better as n increases. (This needs verification, but it is not too
hard.) Since continuous functions are dense in L2 , we conclude that the Haar functions form a basis for L2 .
Bessel’s inequality, Parseval’s identity, and the Riemann-Lebesgue Lemma all hold.
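The claim that Haar partial sums are local averages can be tested before attempting the induction. The sketch below is an illustration with our own choices: the sample function is f(x) = x^2, integrated exactly with rational arithmetic, and the product ⟨f, ψn,k⟩ ψn,k(x) is computed directly so that the two factors 2^(n/2) combine into the rational number 2^n.

```python
from fractions import Fraction

def integral_f(a, b):
    """Exact integral over [a, b] of the sample function f(x) = x^2."""
    return (b ** 3 - a ** 3) / 3

def haar_term(n, k, x):
    """<f, psi_{n,k}> psi_{n,k}(x); the two factors 2^(n/2) multiply to 2^n."""
    left = Fraction(k, 2 ** n)
    mid = Fraction(2 * k + 1, 2 ** (n + 1))
    right = Fraction(k + 1, 2 ** n)
    if not (left <= x < right):
        return Fraction(0)
    sign = 1 if x < mid else -1
    return 2 ** n * (integral_f(left, mid) - integral_f(mid, right)) * sign

def haar_sum(level, x):
    """<f, phi> phi(x) plus all Haar terms with n <= level."""
    total = integral_f(Fraction(0), Fraction(1))
    for n in range(level + 1):
        total += sum(haar_term(n, k, x) for k in range(2 ** n))
    return total

def dyadic_average(level, x):
    """Average of f over the interval of length 1/2^(level+1) containing x."""
    width = Fraction(1, 2 ** (level + 1))
    j = x // width
    return integral_f(j * width, (j + 1) * width) / width

for level in range(4):
    x = Fraction(5, 16)
    print(level, haar_sum(level, x), dyadic_average(level, x))
```

The two printed columns agree at every level, which is the statement to be proved by induction.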

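Example 48. Look at Parseval’s identity for the function f (x) = x. We have

\begin{align*}
\int_0^1 f(x)\,\psi_{n,k}(x)\,dx &= 2^{n/2} \int_{k/2^n}^{(k+\frac12)/2^n} x\,dx - 2^{n/2} \int_{(k+\frac12)/2^n}^{(k+1)/2^n} x\,dx \\
&= \frac{2^{n/2}}{2} \cdot \Big[ 2 \Big( \frac{k+\frac12}{2^n} \Big)^2 - \Big( \frac{k}{2^n} \Big)^2 - \Big( \frac{k+1}{2^n} \Big)^2 \Big] \\
&= \frac{2^{n/2}}{2^{2n+1}} \cdot \Big( 2 \Big( k + \frac12 \Big)^2 - k^2 - (k+1)^2 \Big) \\
&= -\frac{2^{n/2}}{2^{2n+2}}.
\end{align*}
Squaring this we obtain
\[ |\langle f, \psi_{n,k}\rangle|^2 = \frac{2^n}{2^{4n+4}} = \frac{1}{2^{3n+4}}. \]
(This verifies Riemann-Lebesgue.) Adding these from k = 0 to k = 2^n − 1 (that is, 2^n equal
terms) produces
\[ \sum_{k=0}^{2^n - 1} |\langle f, \psi_{n,k}\rangle|^2 = \frac{1}{2^{2n+4}}. \]

Adding now from n = 0 to infinity, and not forgetting that |⟨f, ϕ⟩|² = 1/4, gives us

\[ |\langle f, \phi\rangle|^2 + \sum_{n=0}^{\infty} \sum_{k=0}^{2^n - 1} |\langle f, \psi_{n,k}\rangle|^2 = \frac14 + \sum_{n=0}^{\infty} \frac{1}{2^{2n+4}} = \frac14 + \frac{1}{16} \cdot \frac{1}{1 - \frac14} = \frac13. \]

Well, what do you know?
\[ \|f\|_2^2 = \int_0^1 x^2\,dx = \frac13. \]
What a happy coincidence! ⋄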
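The sums in this example are easy to reproduce exactly. In the sketch below (function names are ours) the squared coefficient |⟨f, ψn,k⟩|^2 is computed from the defining integrals with rational arithmetic; squaring removes the irrational factor 2^(n/2), leaving the rational factor 2^n.

```python
from fractions import Fraction

def integral_x(a, b):
    """Exact integral of f(x) = x over [a, b]."""
    return (b * b - a * a) / 2

def haar_coeff_squared(n, k):
    """|<f, psi_{n,k}>|^2 for f(x) = x; (2^(n/2))^2 = 2^n keeps it rational."""
    left = Fraction(k, 2 ** n)
    mid = Fraction(2 * k + 1, 2 ** (n + 1))
    right = Fraction(k + 1, 2 ** n)
    diff = integral_x(left, mid) - integral_x(mid, right)
    return 2 ** n * diff ** 2

def parseval_partial(levels):
    """|<f, phi>|^2 plus all Haar terms with n < levels."""
    total = integral_x(Fraction(0), Fraction(1)) ** 2
    for n in range(levels):
        total += sum(haar_coeff_squared(n, k) for k in range(2 ** n))
    return total

for levels in (1, 2, 4, 8):
    s = parseval_partial(levels)
    print(levels, s, float(s))
```

The partial sums increase towards 1/3, the value of the integral of x^2 over [0, 1].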
Topic 6

Linear functionals

The next big step in understanding Banach and Hilbert spaces is to understand the linear
functionals defined on those spaces; that is, to understand linear mappings from V to K.
Definition 6.1. A linear functional on a linear space V is a linear mapping f : V → K; that
is, a mapping satisfying
f (x + λy) = f (x) + λf (y)
for all x, y ∈ V , λ ∈ K. If f is a linear functional, we define
ker(f ) = {x ∈ V | f (x) = 0},
and call it the kernel of f , or the nullspace of f . ⋄
Note that the definition makes no mention of the norm. Here are some examples.
Example 49. Let V be the space of all polynomials with coefficients in R; then V is a vector
space over R. If p is a polynomial, we define
f (p) = p(0),
the value of the polynomial when evaluated at x = 0. Since
f (p + λq) = p(0) + λq(0) = f (p) + λf (q),
we see that f is a linear functional. The kernel of f consists of all polynomials which have constant
term 0, that is, all polynomials for which x = 0 is a solution to p(x) = 0. ⋄
Example 50. Let V = ℓ∞ (N, C), and define, for each x = (x1 , x2 , . . . ) ∈ V ,

\[ f(x) = \sum_{k=1}^{\infty} \frac{x_k}{2^k}. \]

Since x ∈ ℓ∞ we know that for some constant C > 0 we have |xk | ≤ C for all k, and therefore

\[ \Big| \sum_{k=1}^{\infty} \frac{x_k}{2^k} \Big| \le \sum_{k=1}^{\infty} \frac{|x_k|}{2^k} \le C \cdot \sum_{k=1}^{\infty} \frac{1}{2^k} = C. \]

This shows that f is defined for all x, and since the series converges absolutely (that is, in
absolute value), we have

\[ f(x + \lambda y) = \sum_{k=1}^{\infty} \frac{x_k + \lambda y_k}{2^k} = \sum_{k=1}^{\infty} \frac{x_k}{2^k} + \lambda \cdot \sum_{k=1}^{\infty} \frac{y_k}{2^k} = f(x) + \lambda f(y). \]


Example 51. Take V = ℓp (N, C), with 1 ≤ p ≤ ∞, and fix one element z ∈ ℓq , where q is the
conjugate exponent to p. We define f : ℓp → C by setting


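\[ f(x) = \sum_{n=0}^{\infty} x_n z_n^*. \]

This functional is well-defined (that is, f (x) is finite for each x) because of Hölder’s inequality:

\[ |f(x)| = \Big| \sum_{n=0}^{\infty} x_n z_n^* \Big| \le \sum_{n=0}^{\infty} |x_n|\, |z_n| = \|xz\|_1 \le \|x\|_p\, \|z\|_q < \infty. \]

Because the series defining f (x) converges absolutely, we have

\[ f(x + \lambda y) = \sum_{n=0}^{\infty} (x_n + \lambda y_n) z_n^* = \sum_{n=0}^{\infty} x_n z_n^* + \lambda \sum_{n=0}^{\infty} y_n z_n^* = f(x) + \lambda f(y). \]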

Example 52. Let V = L2 ([0, 1], C), and define for each f ∈ V (note the change in notation)
the functional
\[ F(f) = \int_0^1 f(x)\,dx. \]

Since L2 ⊂ L1 , we see that the integral is finite. The linear properties of the integral guarantee
that F (f + λg) = F (f ) + λF (g). Here ker(F ) contains all functions that have average 0. ⋄

It may be a bit difficult to see why we should be interested in linear functionals at all,
but there are a few answers to this question. First, some functionals (like the integral in the
last example) are important in their own right. Second, and more subtle, understanding the
functionals helps us understand the space V itself (this is not yet clear, but hopefully it will
become clear later on).
An important fact about linear functionals is that their kernels are subspaces. We leave the
proof of this proposition as an exercise.

Proposition 37. Let f : V → K be a linear functional. Then ker(f ) is a subspace of V .

From now on we concentrate our attention on normed linear spaces V , but we will allow V
not to be complete, that is, neither Banach nor Hilbert, and there is a reason for that. We want
to understand, very particularly, what happens to a linear functional on V , when we complete
V . But we are getting ahead of ourselves. The first thing we need to do, now that we have a
norm, is to study continuity for linear functionals.

6.1 Boundedness and continuity


Definition 6.2. Let f : V → K be a linear functional on the normed space V . We say that f
is bounded if there is some constant C > 0 (depending on f ) such that, for all x ∈ V we have

|f (x)| ≤ C ∥x∥.

(The name bounded comes from the fact that if ∥x∥ ≤ 1, then |f (x)| ≤ C, so the image of the
unit ball is bounded.) ⋄

If a linear functional is bounded, then it is continuous, and actually the two are equivalent
properties. Let’s phrase this as a proposition.

Proposition 38. Let f be a linear functional defined on a normed linear space V . The following
are equivalent:

(a) f is bounded;

(b) f is continuous;

(c) f is continuous at the origin.

Proof. We will show (a)⇒ (b)⇒(c)⇒(a).


(a)⇒ (b). Assume |f (z)| ≤ C∥z∥ for all z ∈ V . Given ε > 0 and x ∈ V , choose δ = ε/C.
Then, with x − y = z, we have
|f (x − y)| ≤ C ∥x − y∥.
Therefore

∥x − y∥ < δ =⇒ |f (x) − f (y)| = |f (x − y)| ≤ C ∥x − y∥ < C ε/C = ε.

We conclude that f is continuous.


(b)⇒(c). If f is continuous, then clearly f is continuous at the origin.
(c)⇒(a). Suppose f is continuous at the origin, that is, given ε > 0 there is some δ > 0
such that, if ∥x − 0∥ < δ, then |f (x) − f (0)| < ε. Since for a linear functional f (0) = 0, we can
rephrase this as: Given ε > 0 there is some δ > 0 such that

∥x∥ < δ =⇒ |f (x)| < ε.


Now, if ∥y∥ = 1, define xn = (δ/(1 + 1/n)) · y, so that ∥xn ∥ < δ. Then
\[ |f(y)| = \Big| f\Big( x_n\, \frac{1 + \frac1n}{\delta} \Big) \Big| < \varepsilon\, \frac{1 + \frac1n}{\delta}. \]
Since this is valid for all n > 0, we conclude that |f (y)| ≤ ε/δ = C. Now, if z ∈ V and z ≠ 0,
we set y = z/∥z∥, so
|f (z)| = |f (∥z∥ y)| = |f (y)| · ∥z∥ ≤ C ∥z∥.
Since this inequality is clearly true for z = 0, we see that f is bounded, and we are done. □

Hence the concept of boundedness is equivalent to the concept of continuity for linear func-
tionals, but boundedness is simpler to use, as we don’t need to worry about ε and δ. With
boundedness we only need to work with a single inequality relating |f (x)| and ∥x∥.

Example 53. Let V = C([0, 1], R) be the set of continuous functions. Let’s place the supremum
norm on V . With this norm, we know that V is complete (it is a Banach space, but it is not a
Hilbert space). If f ∈ V , we define a linear functional F : V → R by setting F (f ) = f (0). Then

\[ |F(f)| = |f(0)| \le \sup_{0 \le t \le 1} |f(t)| = \|f\|_\infty. \]

This shows that F is bounded (with C = 1). ⋄

Example 54. Again, take V = C([0, 1], R) to be the set of continuous functions, but this time
we change to the norm
\[ \|f\|_1 = \int_0^1 |f(t)|\,dt. \]

This space is not complete (if we complete it, we obtain L1 ). But, since the functions in V
are continuous, we can define the linear functional G : V → R, given by G(f ) = f (0). This
functional is not bounded! To see why not, for each n > 0 we define
\[ f_n(t) = \begin{cases} 2n - 2n^2 t, & 0 \le t \le \frac1n; \\ 0, & \frac1n < t \le 1. \end{cases} \]

Check that ∥fn ∥1 = 1 for all n, and fn (0) = 2n. If G were bounded, we would have, for some
fixed C > 0,
|G(fn )| = |fn (0)| ≤ C∥fn ∥1 ,
that is, we would have, for all n, that
2n ≤ C.
This is absurd, and we conclude that G is not bounded. ⋄
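The two facts the example asks you to check can be confirmed with exact arithmetic. The sketch below (our own helper names; `pieces` is an arbitrary number of subintervals) integrates fn with the midpoint rule on [0, 1/n], which is exact there because fn is linear and non-negative on that interval.

```python
from fractions import Fraction

def f_n(n, t):
    """The tent function from the example: 2n - 2n^2 t on [0, 1/n], zero afterwards."""
    return 2 * n - 2 * n * n * t if t <= Fraction(1, n) else Fraction(0)

def l1_norm(n, pieces=8):
    """Midpoint rule on [0, 1/n]; exact here since f_n is linear on that interval."""
    h = Fraction(1, n * pieces)
    return sum(f_n(n, (j + Fraction(1, 2)) * h) for j in range(pieces)) * h

for n in (1, 10, 100):
    # The norm stays equal to 1 while G(f_n) = f_n(0) = 2n grows.
    print(n, l1_norm(n), f_n(n, Fraction(0)))
```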
Continuity is, of course, related to convergence.
Proposition 39. Let f : V → K be a bounded linear functional on the normed space V , and
suppose xn ∈ V , x ∈ V , with xn → x. Then f (xn ) → f (x).
Proof. This is, of course, equivalent to continuity, but we can write

|f (xn ) − f (x)| = |f (xn − x)| ≤ C ∥xn − x∥ −→ 0

as n → ∞, proving the claim. □

The next proposition is very useful.


Proposition 40. Let K be complete, let V be a normed space with norm ∥ · ∥, and let W be a
dense subspace of V . Suppose f : W → K is linear and bounded, that is, suppose that

f (x + λy) = f (x) + λf (y), |f (x)| ≤ C ∥x∥,

for all λ ∈ K, for all x, y ∈ W , and for some C > 0 (linearity over W , boundedness over
W ). Then f may be extended (in a unique way) to the whole space V , to a linear functional
f̃ : V → K, such that for all x ∈ V we have

|f̃(x)| ≤ C ∥x∥,

and for all x ∈ W we have f̃(x) = f (x).


Proof. Let’s define f̃. Take x ∈ V . Since W is dense in V , then there is a sequence xn ∈ W
such that xn → x in V . The sequence xn is a Cauchy sequence in V . From

|f (xn ) − f (xm )| = |f (xn − xm )| ≤ C ∥xn − xm ∥,

we see that the sequence f (xn ) is a Cauchy sequence in K. Since K is complete, we see that
f (xn ) converges. Define
\[ \tilde f(x) = \lim_{n\to\infty} f(x_n). \]
The first thing to do is to show that this definition does not depend on the particular Cauchy
sequence xn . Indeed, if yn ∈ W and yn → x, then, as n → ∞,

∥xn − yn ∥ ≤ ∥xn − x∥ + ∥x − yn ∥ −→ 0.

As a consequence
|f (xn ) − f (yn )| ≤ C ∥xn − yn ∥ −→ 0,
and we see that the limits of f (xn ) and f (yn ) as n → ∞ coincide. This shows that the definition
of f̃(x) does not depend on the particular sequence xn that converges to x.
Suppose now that x, y ∈ V , xn ∈ W , yn ∈ W , xn → x and yn → y. Then

\[ \tilde f(x + \lambda y) = \lim_{n\to\infty} f(x_n + \lambda y_n) = \lim_{n\to\infty} \big( f(x_n) + \lambda f(y_n) \big) = \tilde f(x) + \lambda \tilde f(y), \]

showing that f̃ is a linear functional over V . Moreover,

\[ |\tilde f(x)| = \Big| \lim_{n\to\infty} f(x_n) \Big| = \lim_{n\to\infty} |f(x_n)| \le \lim_{n\to\infty} C\, \|x_n\| = C\, \|x\|. \]

We see that f̃ is bounded over V (with the same constant C, an interesting detail).
If x ∈ W , then xn = x is the ultimate Cauchy sequence converging to x, and so

\[ \tilde f(x) = \lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} f(x) = f(x), \]

and f̃ agrees with f over W .

Finally, suppose f̂ is another extension with the same properties as f̃: linear, bounded (with
constant C′ ), agreeing with f over W . If x ∈ V , xn ∈ W , and xn → x, then continuity implies
that
\[ \hat f(x) = \lim_{n\to\infty} \hat f(x_n) = \lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} \tilde f(x_n) = \tilde f(x). \]
This shows that the extension is unique. □

The functional f̃ is called an extension of f ; sometimes we denote the extension by the
same symbol, and simply write f : V → K.
We have already seen an example of an unbounded linear functional, but such examples
can only occur in infinite dimensions, because on Rn every linear functional is bounded.
Proposition 41. All linear functionals on Rn are bounded.
Proof. We give the proof for R2 , as the general case only has more coordinates but is otherwise
the same. With x = (x1 , x2 ), we use the norm

∥x∥1 = |x1 | + |x2 |.

Given f , let e1 = (1, 0), e2 = (0, 1), and define C1 = max{|f (e1 )|, |f (e2 )|, 1}. Then C1 > 0 and

|f (x)| = |f (x1 e1 + x2 e2 )| = |x1 f (e1 ) + x2 f (e2 )| ≤ |x1 | |f (e1 )| + |x2 | |f (e2 )| ≤ C1 (|x1 | + |x2 |),

proving that f is bounded with respect to the norm ∥ · ∥1 . If ∥ · ∥ is any other norm, then it is
equivalent to ∥ · ∥1 , and there is some constant c2 > 0 such that ∥ · ∥1 ≤ c2 ∥ · ∥. We conclude
that f is bounded with respect to ∥ · ∥ as well, with constant C = c2 C1 > 0. □

Another way to see this is to note that any linear functional on Rn is of the form

\[ f(x_1, \ldots, x_n) = \sum_{k=1}^{n} a_k x_k, \]

for some constants ak . Using this expression it is easy to see that f is continuous at the origin,
and so is bounded.
We need to move to infinitely many dimensions to see an unbounded linear functional.

Example 55. Let V = ℓ2 (N, R), and consider the dense subspace W where if x ∈ W then only
finitely many entries of x are non-zero. We define the linear functional f : W → R by setting


\[ f(x) = \sum_{k=1}^{\infty} x_k. \]

This sum converges because it is in fact a finite sum, and linearity is easy to check. This f ,
however, is not bounded on W . For each n ≥ 1, define
\[ y_n = \Big( 1, \frac12, \frac13, \frac14, \cdots, \frac1n, 0, 0, 0, 0, \cdots \Big). \]
Then
\[ \|y_n\|_2 = \sqrt{\sum_{k=1}^{n} \frac{1}{k^2}} < \sqrt{\sum_{k=1}^{\infty} \frac{1}{k^2}} = \sqrt{\frac{\pi^2}{6}}. \]

Thus the sequence yn is bounded in V (and actually converging in V , although we don’t need
this fact). But
\[ f(y_n) = \sum_{k=1}^{n} \frac{1}{k}, \]
and thus f (yn ) → ∞ as n → ∞. This shows that f can’t be bounded, since there can be no
constant C > 0 such that, for all n, we have
|f (yn )| ≤ C ∥yn ∥2 .
Therefore any extension of this f to the whole space V must also be unbounded. ⋄
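The growth of f(yn) against the bounded norms ∥yn∥2 is easy to see numerically. The following sketch uses floating-point arithmetic and our own function names; the values of n are arbitrary.

```python
import math

def y(n):
    """The sequence y_n = (1, 1/2, ..., 1/n, 0, 0, ...), stored without the zeros."""
    return [1.0 / k for k in range(1, n + 1)]

def f(x):
    """f(x) = sum of the (finitely many) non-zero entries of x."""
    return math.fsum(x)

def norm2(x):
    """The l2 norm of a finitely supported sequence."""
    return math.sqrt(math.fsum(t * t for t in x))

bound = math.pi / math.sqrt(6)            # sqrt(pi^2 / 6), the bound on ||y_n||_2
for n in (10, 1000, 100000):
    yn = y(n)
    print(n, norm2(yn), f(yn), f(yn) / norm2(yn))
```

The second column stays below the bound while the ratio in the last column keeps growing, so no single constant C can work.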
Example 56. Let V = Lp ([0, 1], C), 1 ≤ p < ∞, and let W = C([0, 1], C). We know that W
is dense in Lp because Lp can be defined as the completion of C([0, 1], C) in the p-norm. We
define the functional F : W → C by
F (f ) = f (1/2).
This is an example of an evaluation functional. It is simple to see that F is linear. It is not,
however, bounded. For each n ≥ 1 we define
\[ f_n(x) = \begin{cases} 0, & 0 \le x \le \frac12 - \frac{1}{4n}; \\ (4n^2 x + n - 2n^2)^{1/p}, & \frac12 - \frac{1}{4n} \le x \le \frac12; \\ (-4n^2 x + n + 2n^2)^{1/p}, & \frac12 \le x \le \frac12 + \frac{1}{4n}; \\ 0, & \frac12 + \frac{1}{4n} \le x \le 1. \end{cases} \]
If you go ahead and plot fn (x)^p , you will see that the graph is a triangle with base length 1/(2n),
and height n, so that
\[ \|f_n\|_p^p = \int_0^1 f_n(x)^p\,dx = \frac12 \cdot \frac{1}{2n} \cdot n = \frac14. \]
However,
F (fn ) = fn (1/2) = n^(1/p) .
If F were bounded, then for some C > 0 we would have, for all n,
|F (fn )| ≤ C ∥fn ∥p ,
or n^(1/p) ≤ C/4^(1/p) for all n, which is impossible. This shows that F is unbounded. ⋄

Bounded functionals on a Banach space have closed kernels, but much more is true.

Proposition 42. Let f : V → K be a bounded linear functional on the Banach space V . Then
ker(f ) is closed. Moreover, if x0 ∈ V is such that f (x0 ) ≠ 0, then for any x ∈ V we can write

x = λ(x)x0 + v(x),

where λ(x) ∈ K and v(x) ∈ ker(f ). This representation is unique, in the sense that, if x =
λ1 (x)x0 + v1 (x) with λ1 (x) ∈ K and v1 (x) ∈ ker(f ), then λ1 (x) = λ(x) and v1 (x) = v(x). In
this case we can write
V = span(x0 ) ⊕ ker(f ).

Proof. Suppose xn ∈ ker(f ), and xn → x. Then f (x) = limn→∞ f (xn ) = 0, showing that
x ∈ ker(f ). Hence ker(f ) is a closed subspace of V .
Suppose now that for some element x0 we have f (x0 ) ≠ 0. Let x be any element in V . The
equation
f (x − λx0 ) = 0
has exactly one solution λ ∈ K, namely

\[ \lambda = \lambda(x) = \frac{f(x)}{f(x_0)}. \]

Define v(x) = x − λ(x)x0 , so that v(x) ∈ ker(f ). This shows that V = span(x0 ) + ker(f ). If
x = λ1 (x)x0 + v1 (x), then

(λ(x) − λ1 (x))x0 = v1 (x) − v(x) ∈ ker(f ).

Applying f to both sides gives (λ(x) − λ1 (x)) f (x0 ) = 0, and since f (x0 ) ≠ 0 we get
λ(x) = λ1 (x); as a consequence v1 (x) = v(x). □

This result shows that, in some sense, the kernel of a linear functional is one dimension away
from being the whole space.
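The decomposition in Proposition 42 is completely explicit, as the following sketch shows. The functional f on R^3 and the vectors x0 and x are hypothetical choices made only for illustration.

```python
def f(x):
    """A hypothetical linear functional on R^3: f(x) = 2 x1 - x2 + 3 x3."""
    return 2 * x[0] - x[1] + 3 * x[2]

def decompose(x, x0):
    """Split x = lam * x0 + v with v in ker(f); requires f(x0) != 0."""
    lam = f(x) / f(x0)
    v = [xi - lam * x0i for xi, x0i in zip(x, x0)]
    return lam, v

x0 = [1.0, 0.0, 0.0]          # f(x0) = 2
x = [4.0, 5.0, 6.0]           # f(x) = 8 - 5 + 18 = 21
lam, v = decompose(x, x0)
print(lam, v, f(v))
```

The last printed number is f(v), which vanishes because v lies in the kernel.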

6.2 Linear functionals on Hilbert space


When V is a Hilbert space the bounded linear functionals have even more structure. Here is a
first result.

Proposition 43. Let V be a Hilbert space with countable basis xk , and f : V → K be a bounded
linear functional. If x = ∑_{k=1}^{∞} ak xk , then

\[ f(x) = \sum_{k=1}^{\infty} a_k f(x_k). \]

Proof. We are not assuming that xk is orthonormal, or anything. We only assume that xk form
a basis, so the sequence

\[ z_n = \sum_{k=1}^{n} a_k x_k \]

converges to x as n → ∞. Since zn is defined by a finite sum, we have

\[ f(z_n) = \sum_{k=1}^{n} a_k f(x_k). \]

Since f is bounded, then it is continuous, and f (zn ) → f (x) as n → ∞, therefore

\[ \sum_{k=1}^{\infty} a_k f(x_k) = \lim_{n\to\infty} \sum_{k=1}^{n} a_k f(x_k) = \lim_{n\to\infty} f(z_n) = f(x), \]

and we are done. □

Note: In the last proposition we required V to be a Hilbert space, but that was not really
necessary. All we needed was that V was a normed space, with a bounded f , and if zn → x
(with the same definitions as in the proposition), then the conclusion holds. However, it is on
Hilbert spaces that this result is most useful.
The inner product itself is a source of bounded linear functionals.
Proposition 44. Let V be a Hilbert space, and fix z ∈ V . Then

f (x) = ⟨x, z⟩

defines a bounded linear functional.


Proof. We have

f (x + λy) = ⟨x + λy, z⟩ = ⟨x, z⟩ + λ⟨y, z⟩ = f (x) + λf (y).

Also, with C = max{∥z∥, 1} we have C > 0, and

|f (x)| = |⟨x, z⟩| ≤ ∥x∥ · ∥z∥ ≤ C ∥x∥. □

The amazing fact is that the converse is true!


Theorem 2 (Riesz Representation). Let f : V → K be a bounded linear functional on the
Hilbert space V . Then there is an element z ∈ V such that for all x ∈ V we have

f (x) = ⟨x, z⟩.

Proof. How do we even start proving something like this? Assume what you want to prove, look
at possible consequences, and make a plan. For example, if the result is true, then the kernel of
f must contain the elements perpendicular to z. Therefore z must be found in the perpendicular
space to ker f . That is our starting point.
If ker(f ) = V , we take z = 0, and the result is true. If ker(f ) ≠ V , then ker(f ) is a proper
closed subspace of V , and therefore W = ker(f )⊥ is not trivial. Choose any z0 ∈ W with
z0 ≠ 0. Then we know that V = span(z0 ) ⊕ ker(f ) (the last result from last section), but also
V = W ⊕ ker(f ) (because W = (ker(f ))⊥ ). We conclude that W = span(z0 ) (because z0 ∈ W ),
so every element in W is a multiple of z0 . We know that for any x ∈ V we can find unique
λ(x) ∈ K, v(x) ∈ ker(f ) such that x = λ(x)z0 + v(x). Thus

f (x) = λ(x)f (z0 ).



Now let’s define, for each fixed λ ∈ K, the bounded linear functional

fλ (x) = ⟨x, λz0 ⟩.

Then
fλ (x) = fλ (λ(x)z0 + v(x)) = ⟨λ(x)z0 + v(x), λz0 ⟩ = λ(x)λ∗ ⟨z0 , z0 ⟩.
We want to choose λ so that, for all x we have fλ (x) = f (x). In other words, we want to solve
the equation
λ(x)λ∗ ⟨z0 , z0 ⟩ = λ(x)f (z0 ).
We define λ = f (z0 )∗ /⟨z0 , z0 ⟩, and z = λz0 . Then, with this λ,

f (x) = λ(x)f (z0 ) = λ(x)λ∗ ⟨z0 , z0 ⟩ = fλ (x) = ⟨x, λz0 ⟩ = ⟨x, z⟩,

for all x ∈ V . □

Example 57. Any linear functional in Rn with values in R is given by the expression

\[ f(x) = \sum_{k=1}^{n} a_k x_k, \]

where ak ∈ R are constants and x = (x1 , . . . , xn ). With the usual inner product, we see that

f (x) = ⟨x, a⟩,

where a = (a1 , a2 , . . . , an ). ⋄
Example 58. The “usual inner product” is not the only inner product in Rn . Consider, for
example in R2 , the inner product

⟨x, y⟩1 = 2x1 y1 + 3x2 y2 .

It is an exercise to see that this defines an inner product. Then there must be some z ∈ R2 such
that
f (x) = a1 x1 + a2 x2 = ⟨x, z⟩1
for all x = (x1 , x2 ). Thus we must have z = (a1 /2, a2 /3). ⋄
Example 59. A bit more generally, on Rn , let M be an n × n symmetric matrix, and suppose
all eigenvalues of M are strictly positive. If x ∈ Rn , we will write x as a column matrix (so x is
an n × 1 matrix) and xt is the 1 × n transpose of x. We define

⟨x, y⟩M = xt M y.

This is a 1 × 1 matrix, which we identify with a scalar. Exercise for you: Show that ⟨ , ⟩M is
an inner product (it is important that M is symmetric and all eigenvalues are positive; in fact,
every inner product on Rn is of this form, for some matrix M as described).
If f : Rn → R is a linear functional, then, for some z ∈ Rn we must have

\[ f(x) = \sum_{k=1}^{n} a_k x_k = \langle x, z\rangle_M = x^t M z. \]

If we set a = M z, we see that z = M −1 a is the element we are looking for. ⋄
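For a concrete instance in R^2, the representing element z = M^{-1} a can be computed by hand or with a few lines of code. The matrix M and the coefficients a below are hypothetical choices; M is symmetric with trace 5 and determinant 5, so both eigenvalues are positive.

```python
def solve_2x2(M, a):
    """Solve M z = a for a 2 x 2 matrix M by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    z1 = (a[0] * M[1][1] - M[0][1] * a[1]) / det
    z2 = (M[0][0] * a[1] - a[0] * M[1][0]) / det
    return [z1, z2]

def inner_M(x, y, M):
    """<x, y>_M = x^t M y."""
    My = [M[0][0] * y[0] + M[0][1] * y[1], M[1][0] * y[0] + M[1][1] * y[1]]
    return x[0] * My[0] + x[1] * My[1]

M = [[2.0, 1.0], [1.0, 3.0]]      # symmetric, eigenvalues (5 +- sqrt(5))/2 > 0
a = [4.0, 7.0]                    # the functional f(x) = 4 x1 + 7 x2
z = solve_2x2(M, a)               # the representing element z = M^{-1} a

x = [1.5, -2.0]
print(a[0] * x[0] + a[1] * x[1], inner_M(x, z, M))
```

Both printed numbers are f(x), computed once from the coefficients a and once as ⟨x, z⟩M.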



Example 60. Let V = L2 ([0, 1], C), and consider the linear functional
\[ F(f) = \int_0^1 f(x)\,dx. \]

If we take g(x) = 1 for all x, we see that F (f ) = ⟨f, g⟩. From the Cauchy-Schwarz inequality
we have
\[ \Big| \int_0^1 f(x)\,dx \Big| = |F(f)| = |\langle f, g\rangle| \le \|f\|_2 \cdot \|g\|_2 = \|f\|_2 = \sqrt{\int_0^1 |f(x)|^2\,dx}, \]

since ∥g∥2 = 1. ⋄

The Riesz Representation Theorem has some startling applications, but we are not quite
ready for them yet!
