Notes for Math2320
Analysis 1, 2002
John Hutchinson
email: John.Hutchinson@anu.edu.au
CHAPTER 1
1. PRELIMINARIES 1
1. Preliminaries
This course is both for students who do not intend at this stage to do any
further theoretical mathematics courses, and also a prerequisite for the course
Analysis 2 (Lebesgue Integration and Measure Theory; and Hilbert Spaces).
Interesting and significant applications to other areas of mathematics and to fields
outside mathematics will be given, and the underlying theoretical ideas will be
developed.
The main material for the course will be the text by Reed and these supplementary,
more theoretical, notes. I will also occasionally refer to the text by Adams
(fourth edition), last year's MATH1115 Foundations Notes and the (supplementary)
MATH1116 Calculus Notes.
There are two different printings of Reed; the second corrects some minor mistakes
from the first. Most of you will have the second printing. You can tell which
version you have by looking at the first expression on line 1 of page 53: the first
printing has 2^{-n}(M - b) and the second printing has 2^{-n+1}(M - b). I will use the
second version as the standard, but will make comments where it varies from the
first.
Thanks to the many students in 2000 who pointed out various mistakes, particularly
Tristan Bice, Kesava Jay, Ashley Norris and Griff Ware.
1.1. Real number system. Study Reed Chapter 1.1; it should be all review
material. I will also refer ahead to parts of Chapter 2 of Reed, particularly in the
following sections.
We begin with a review of the axioms for the real numbers. However,
we will not usually make deductions directly from the axioms.
Warning: What I call the Cauchy completeness axiom is called the "Completeness
axiom" by Reed. What I call the Completeness axiom is not given a
name by Reed. The terminology I use is more standard, and agrees with Adams
and last year's 1115 Notes.
Axioms for the real numbers. Recall that the real number system is a set[1] R,
together with two binary operations[2] "+" and "·", a binary relation[3] "≤", and
two particular members of R denoted 0 and 1 respectively. Moreover, it is required
that certain axioms hold.
The first two sets of axioms are the algebraic and the order axioms respectively;
see Reed (P1–P9) and (O1–O9)[4]. These axioms also hold for the rationals (but not
for the irrationals, or the complex numbers).
In order to discuss the remaining two axioms, we need the set of natural numbers
defined by
N = {1, 1 + 1, 1 + 1 + 1, . . . } = {1, 2, 3, . . . }.
Unless otherwise clear from the context, the letters m, n, i, j, k will always denote
natural numbers, or sometimes more generally will denote integers.
[1] We discuss sets in the next section.
[2] A binary operation on R is a function which assigns to any two numbers in R a third number
in R. For example, the binary operation "+" assigns to the two real numbers a and b the real
number denoted by a + b, and so in particular to the two numbers 2 and 3.4 the number 5.4.
[3] To say "≤" is a binary relation means that for any two real numbers a and b, the expression
a ≤ b is a statement and so is either true or false.
[4] See also the MATH1115 Foundations Notes, but there we had axioms for < instead of ≤.
This does not matter, since we can derive the properties of either from the axioms for the other.
In fact, last year we derived the properties of ≤ from axioms for <.
The two remaining axioms are:
• The Archimedean axiom[5]: if b > 0 then there is a natural number n
such that n > b.
• The Cauchy completeness axiom: for each Cauchy sequence[6] (a_n) of
real numbers there is a real number a to which the sequence converges.
This is the full set of axioms. All the standard properties of real numbers follow
from these axioms.[7]
The (usual) Completeness axiom[8] is:
if a set of real numbers is bounded above, then it has a least upper bound.
This is equivalent to the Archimedean axiom plus the Cauchy completeness
axiom. More precisely, if we assume the algebraic and order axioms then one can
prove (see the following Remark) that[9]:
Archimedean axiom + Cauchy completeness axiom ⇐⇒ Completeness axiom
Thus, from now on, we will assume both the Archimedean axiom and the Cauchy
completeness axiom, and as a consequence also the (standard) Completeness "axiom".
Reed only assumes the algebraic, order and Cauchy completeness axioms, but
not the Archimedean axiom, and then claims to prove first the Completeness axiom
(Theorem 2.5.1) and from this the Archimedean axiom (Theorem 2.5.2). The proof
in Theorem 2.5.2 is correct, but there is a mistake in the proof of Theorem 2.5.1,
as we see in the following Remark. In fact, not only is the proof wrong, but it
is impossible to prove the Completeness axiom from the algebraic, order and
Cauchy completeness axioms, as we will soon discuss[10].
Remark 1.1.1. The mistake in the "proof" of Theorem 2.5.1 is a hidden application
of the Archimedean axiom. On page 53 line 6 Reed says "choose n so that
2^{-n}(M - b) ≤ ε/2 . . . ". But this is the same as choosing n so that
2^n ≥ 2(M - b)/ε,
and for this you really need the Archimedean axiom. For example, you could choose
n > 2(M - b)/ε by the Archimedean axiom, and it then follows that 2^n > 2(M - b)/ε
(since 2^n > n by algebraic and order properties of the reals).
What Reed really proves is that (assuming the algebraic and order axioms):
Archimedean axiom + Cauchy completeness axiom =⇒ Completeness axiom
[5] This is equivalent to the Archimedean "property" stated in Reed page 4 line 6, which says:
for any two real numbers a, b > 0 there is a natural number n such that na > b.
Take a = 1 in the version in Reed to get the version here. Conversely, the version in Reed
follows from the version here by first replacing b in the version here by b/a, to get that n > b/a for
some natural number n, and then multiplying both sides of this inequality by a to get na > b;
this is all justified since we know the usual algebraic and order properties follow from the algebraic
and order axioms.
[6] See Reed p.45 for the definition of a Cauchy sequence.
[7] Note that the rationals do not satisfy the corresponding version of the Cauchy completeness
axiom. For example, take any irrational number such as √2. The decimal expansion leads to
a sequence of rational numbers which is Cauchy, but there is no rational number to which the
sequence converges. See MATH1115 Notes top of p.15. The rationals do satisfy the Archimedean
axiom, why?
[8] See Adams page 4 and Appendix A page 23, also the MATH1115 Notes page 8.
[9] "=⇒" means "implies" and "⇐⇒" means "implies and is implied by".
[10] Not a very auspicious beginning to the book, but fortunately that is the only really serious
blunder.
[Diagram: a "fattened up" copy of R, with points d < e < c < 0 < 1 < a < b marked
on several lines. A "number" on any line is less than any number to its right, and
less than any number on any higher line. Between any two lines there are infinitely
many other lines.]
The proof of "⇐=" is about the same level of difficulty. In Theorem 2.5.2, Reed
proves in effect that the Completeness axiom implies the Archimedean axiom. The
proof that the Completeness axiom implies the Cauchy completeness axiom is not
too hard, and I will set it later as an exercise.
Remark 1.1.2. We next ask: if we were smarter, would there be a way of
avoiding the use of the Archimedean axiom in the proof of Theorem 2.5.1? The
answer is No. The reason is that there are models in which the algebraic, order and
Cauchy completeness axioms are true but the Archimedean axiom is false; these
models are sometimes called hyperreals, see below. If we had a proof that the
algebraic, order and Cauchy completeness axioms implied the Completeness axiom
(i.e. a correct proof of Theorem 2.5.1), then combining this with the (correct) proof
in Theorem 2.5.2 we would end up with a proof that the algebraic, order and Cauchy
completeness axioms imply the Archimedean axiom. But this would contradict the
properties of the "hyperreals".
These hyperreals are quite difficult to construct rigorously, but here is a rough
idea.
Part of any such model looks like a "fattened up" copy of R, in the sense that
it contains a copy of R together with "infinitesimals" squeezed between each real a
and all reals greater than a. This part is followed and preceded by infinitely many
"copies" of itself, and between any two copies there are infinitely many other copies.
See the crude diagram above.
A property of absolute values. Note the properties of absolute value in Proposition
1.1.2 of Reed. Another useful property worth remembering is
| |x| - |y| | ≤ |x - y|.
Here is a proof from the triangle inequality, Prop. 1.1.2(c) in Reed.
Proof.
|x| = |x - y + y|
    ≤ |x - y| + |y|   by the triangle inequality.
Hence
|x| - |y| ≤ |x - y|,   (1)
and similarly
|y| - |x| ≤ |y - x| = |x - y|.   (2)
Since | |x| - |y| | equals |x| - |y| or |y| - |x|, the result follows from (1) and (2).
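The inequality just proved can also be spot-checked numerically. The following sketch (an illustration only, not a proof) tests it on randomly chosen pairs of reals using Python's built-in abs:

```python
import random

# Numerical spot-check of | |x| - |y| | <= |x - y| on random pairs.
# A small tolerance absorbs floating-point rounding.
random.seed(0)
for _ in range(10_000):
    x = random.uniform(-100, 100)
    y = random.uniform(-100, 100)
    assert abs(abs(x) - abs(y)) <= abs(x - y) + 1e-12
```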
1.2. Sets and Functions. Study Reed Chapter 1.2. I provide comments and
extra information below.
Sets and their properties. The notion of a set is basic to mathematics. In fact,
it is possible in principle to reduce all of mathematics to set theory. But in practice,
this is usually not very useful.
Sets are sometimes described by listing the members, and more often by means
of some property. For example,
Q = { m/n | m and n are integers, n ≠ 0 }.
If a is a member of the set A, we write a ∈ A. Sometimes we say a is an element
of A.
Note the definitions of A ∩ B, A ∪ B, A \ B, A^c and A ⊆ B.[11] For given sets
A and B, the first four are sets, and the fifth is a statement that is either true or
false. Note that A^c is not well defined unless we know the set containing A in which
we are taking the complement (this set will usually be clear from the context and
it is often called the universal set). Often A is a set of real numbers and R is the
universal set.
We say for sets A and B that A = B if they have the same elements. This
means that every member of A is a member of B and every member of B is also a
member of A, that is, A ⊆ B and B ⊆ A. We usually prove A = B by proving the
two separate statements A ⊆ B and B ⊆ A. For example, see the proof below of
the first of De Morgan's laws.
Theorem 1.2.1 (De Morgan's laws). For any sets A, B and C,
(A ∩ B)^c = A^c ∪ B^c
(A ∪ B)^c = A^c ∩ B^c
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Remark 1.2.1. The following diagram should help motivate the above. Which
regions represent the eight sets on either side of one of the above four equalities?
The proofs are also motivated by the diagram. But note that all the proofs
really use are the definitions of ∩, ∪, ^c and the logical meanings of and, or, not.
[11] We occasionally write A ⊂ B instead of A ⊆ B. Some texts write A ⊂ B to mean that
A ⊆ B but A ≠ B; we will not do this.
[Diagram: a Venn diagram of sets A, B and C inside a universal set X, showing the
region A ∩ B; and a schematic of the product A × B, with a function f shown as a
subset of A × B.]
Proof. We will prove the first equality by showing (A ∩ B)^c ⊆ A^c ∪ B^c and
A^c ∪ B^c ⊆ (A ∩ B)^c.
First assume a ∈ (A ∩ B)^c (note that a ∈ X where X is the universal set).
Then it is not the case that a ∈ A ∩ B.
In other words: it is not the case that (a ∈ A and a ∈ B).
Hence: (it is not the case that a ∈ A) or (it is not the case that a ∈ B).[12]
That is: a ∈ A^c or a ∈ B^c (remember that a ∈ X).
Hence: a ∈ A^c ∪ B^c.
Since a was an arbitrary element in (A ∩ B)^c, it follows that (A ∩ B)^c ⊆ A^c ∪ B^c.
The argument above can be written in essentially the reverse order (check!) to
show that A^c ∪ B^c ⊆ (A ∩ B)^c.
This completes the proof of the first equality.
The proofs of the remaining equalities are similar, and will be left as exercises.
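For concrete finite sets, all four identities can be checked directly with Python's set operations (the particular sets X, A, B, C below are chosen purely for illustration):

```python
# Checking De Morgan's laws and the distributive laws on concrete sets.
X = set(range(10))          # the universal set, for taking complements
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 8}

def complement(S):
    return X - S

assert complement(A & B) == complement(A) | complement(B)   # (A ∩ B)^c = A^c ∪ B^c
assert complement(A | B) == complement(A) & complement(B)   # (A ∪ B)^c = A^c ∩ B^c
assert A & (B | C) == (A & B) | (A & C)                     # distributive law
assert A | (B & C) == (A | B) & (A | C)                     # distributive law
```

Of course a check on particular sets proves nothing in general; the point is that the operators mirror the definitions exactly.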
Products of sets. Recall that
A × B = { (a, b) | a ∈ A, b ∈ B }.
For example, R^2 = R × R. Here (a, b) is an ordered pair, not an open interval!
We can represent A × B schematically as in the diagram above.
Remark 1.2.2. It is interesting to note that we can define ordered pairs in
terms of sets. Important properties required for an ordered pair (a, b) are that it
depends on the two objects a and b and that the order is important, i.e. (a, b) ≠ (b, a)
unless a = b. This is different from the case for sets, i.e. {a, b} = {b, a} for any a
and b.
[12] If P and Q are two statements, then "not (P and Q)" has the same meaning as "(not P)
or (not Q)".
More generally, the only property we require of ordered pairs is that
(a, b) = (c, d) iff (a = c and b = d).   (3)
There are a number of ways that we can define ordered pairs in terms of sets.
The standard definition is
(a, b) := {{a}, {a, b}}.
To show this is a good definition, we need to prove (3).
Proof. It is immediate from the definition that if a = c and b = d then
(a, b) = (c, d).
Next suppose (a, b) = (c, d), i.e. {{a}, {a, b}} = {{c}, {c, d}}. We consider the
two cases a = b and a ≠ b separately.
If a = b then {{a}, {a, b}} contains exactly one member, namely {a}, and so
{{c}, {c, d}} also contains exactly the one member {a}. This means {a} = {c} =
{c, d}. Hence a = c and c = d. In conclusion, a = b = c = d.
If a ≠ b then {{a}, {a, b}} contains exactly two (distinct) members, namely {a}
and {a, b}. Since {{a}, {a, b}} = {{c}, {c, d}} it follows that {c} ∈ {{a}, {a, b}}, and so
{c} = {a} or {c} = {a, b}. The second equality cannot be true since {a, b} contains
two members whereas {c} contains one member; hence {c} = {a}, and so c = a.
Since also {c, d} ∈ {{a}, {a, b}} it now follows that {c, d} = {a, b} (otherwise
{c, d} = {a}, but since also {c} = {a} this would imply {{c}, {c, d}}, and hence
{{a}, {a, b}}, has only one member, and we have seen this is not so). Since a and b
are distinct and {c, d} = {a, b}, it follows that c and d are distinct; since a = c it then
follows that b = d. In conclusion, a = c and b = d.
This completes the proof.
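The definition (a, b) := {{a}, {a, b}} can be mimicked in Python with frozensets (ordinary sets are not hashable, so cannot be members of other sets). This sketch only illustrates property (3) on a few examples:

```python
# Kuratowski's definition of an ordered pair as a set.
def pair(a, b):
    return frozenset({frozenset({a}), frozenset({a, b})})

assert pair(1, 2) == pair(1, 2)
assert pair(1, 2) != pair(2, 1)                   # order matters when a != b
assert pair(1, 1) == frozenset({frozenset({1})})  # the a = b case collapses to {{a}}
```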
Power sets. The power set 𝒫(A) of a set A is defined to be the set of all subsets
of A. That is,
𝒫(A) = { S | S ⊆ A }.
1.2.1. Functions. We say f is a function from the set A into the set B, and
write
f : A → B,
if f "assigns" to every a ∈ A exactly one member of B. This member of B is
denoted by f(a).
We say A is the domain of f, i.e. Dom(f) = A, and the set of all members of
B of the form f(a) is the range of f, i.e. Ran(f) = { f(a) | a ∈ A }.
Note that Ran(f) ⊆ B, but Ran(f) need not be all of B. An example is the
squaring function; see the next paragraph.
The convention on page 8 in Reed is different. He only requires that f assigns
a value to some members of A, so that for him the domain of f need not be all
of A. This is not standard, and you should normally keep to the definition here.
In particular, in Reed Example 2 page 9 one should say f is a function from R^+[13]
(not R) to R, and write f : R^+ → R, since f only assigns a value to positive
numbers.
We informally think of f as some sort of "rule". For example, the "squaring
function" f : R → R is given by the rule f(a) = a^2 for each a ∈ R. But functions
can be much more complicated than this, and in particular there may not really be
any "rule" which explicitly describes the function.
[13] R^+ = { a ∈ R | a > 0 }.
The way to avoid this problem is to define functions in terms of sets. We
can think of the squaring function as being given by the set of all ordered pairs
{ (a, a^2) | a ∈ R }. This set is a subset of R × R and has the property that to
every a ∈ R there corresponds exactly one element b ∈ R (b = a^2 for the squaring
function). Thus we are really just identifying f with its graph.
More generally, we say f is a function from A into B, and write f : A → B, if
f is a subset of A × B with the property that for every a ∈ A there is exactly one
b ∈ B such that (a, b) ∈ f. See the preceding diagram.
Note that Reed uses a different notation for the function f thought of as a
"rule" or "assignment" on the one hand, and the function f thought of as a subset
of A × B on the other (he uses a capital F in the latter case). We will use the same
notation in either case.
We normally think of f as an assignment, although the "assignment" may not
be given by any rule which can readily be written down.
Note also that Reed uses the notation a --f--> b (an arrow labelled f) to mean the
same as f(a) = b. Reed's notation is not very common and we will avoid it. Do not
confuse it with the notation f : A → B.
2. CARDINALITY 8
2. Cardinality
Some infinite sets are bigger than others, in a sense that can be made
precise.
Study Reed Chapter 1.3. In the following I make some supplementary remarks.
Note the definitions of finite and infinite sets.
A set is countable if it is infinite and can be put in one-one correspondence
with N, i.e. has the same cardinality as N. (Some books include finite sets among
the countable sets; if there is any likelihood of confusion, you can say countably
infinite.) In other words, a set is countable iff it can be written as an infinite
sequence (with all a_i distinct)
a_1, a_2, a_3, . . . , a_n, . . . .
It is not true that all infinite sets are countable, as we will soon see. If an infinite
set is not countable then we say it is uncountable.
Thus every set is finite, countable or uncountable.
Important results in this section are that an infinite subset of a countable set is
countable (Prop 1.3.2), the product of two countable sets is countable (Prop 1.3.4),
the union of two countable sets is countable (Exercise 3), the set of rational numbers
is countable (Theorem 1.3.5), the set of real numbers is uncountable (Theorem
1.3.6 and the remarks following that theorem), and the set of irrational numbers is
uncountable (second last paragraph in Chapter 1.3).
A different proof of Proposition 1.3.4, which avoids the use of the Fundamental
Theorem of Arithmetic and is more intuitive, is the following.
Proposition 2.0.2. If S and T are countable, then so is S × T.
Proof. Let S = (a_1, a_2, . . . ) and T = (b_1, b_2, . . . ). Then S × T can be
enumerated by snaking through the infinite array whose (i, j) entry is (a_i, b_j),
along a path which visits every pair exactly once.
[Diagram: the pairs (a_i, b_j) arranged in an infinite array, with arrows indicating
a path that starts at (a_1, b_1) and snakes through the array, eventually reaching
every pair.]
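Any path through the array that eventually visits every pair yields the required enumeration. The sketch below uses anti-diagonals (i + j constant) rather than the snake path of the diagram, since that is the easiest to code; the function names are my own:

```python
from itertools import count, islice

# Enumerate S x T along the anti-diagonals of the array ((a_i, b_j)).
# a and b are functions giving the i-th element of S and T (0-indexed).
def enumerate_product(a, b):
    for d in count(0):              # d = i + j
        for i in range(d + 1):
            yield (a(i), b(d - i))

# Example with S = T = N, so a_i = i + 1:
pairs = list(islice(enumerate_product(lambda i: i + 1, lambda j: j + 1), 6))
# First three anti-diagonals: (1,1); (1,2), (2,1); (1,3), (2,2), (3,1)
```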
It is not the case that all uncountable sets have the same cardinality. In fact
𝒫(A) always has a larger cardinality than A, in the sense that there is no one-one
map from A onto 𝒫(A). (There is certainly a one-one map from A into 𝒫(A): define
f(a) = {a}.) Thus we can get larger and larger infinite sets via the sequence
N, 𝒫(N), 𝒫(𝒫(N)), 𝒫(𝒫(𝒫(N))), . . . .
(One can prove that 𝒫(N) and R have the same cardinality.)
In fact, this is barely the beginning of what we can do: we could take the
union of all the above sets to get a set of even larger cardinality, and then take this
as a new beginning set instead of using N. And still we would hardly have begun!
However, in practice we rarely go beyond sets of cardinality 𝒫(𝒫(𝒫(N))), i.e. of
cardinality 𝒫(𝒫(R)).
Theorem 2.0.3. For any set A, there is no map f from A onto 𝒫(A). In
particular, A and 𝒫(A) have different cardinality.
Proof.* Consider any f : A → 𝒫(A). We have to show that f is not onto.
Define
B = { a ∈ A | a ∉ f(a) }.
Then B ⊆ A and so B ∈ 𝒫(A). We claim there is no b ∈ A such that f(b) = B.
(This implies that f is not an onto map!)
Assume (in order to obtain a contradiction) that f(b) = B for some b ∈ A.
One of the two possibilities b ∈ B or b ∉ B must be true.
If b ∈ B, i.e. b ∈ f(b), then b does not satisfy the condition defining B, and so
b ∉ B.
If b ∉ B, i.e. b ∉ f(b), then b does satisfy the condition defining B, and so
b ∈ B.
In either case we have a contradiction, and so the assumption must be wrong.
Hence there is no b ∈ A such that f(b) = B, and in particular f is not an onto
map.
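For a small finite set A the argument can even be checked exhaustively: whatever f is, the diagonal set B is never in its range. A brute-force sketch:

```python
from itertools import combinations, product

# For every f : A -> P(A), the set B = { a in A : a not in f(a) }
# is not a value of f, so f is not onto.
A = [0, 1, 2]
subsets = [frozenset(c) for r in range(len(A) + 1)
           for c in combinations(A, r)]          # all 8 members of P(A)

for images in product(subsets, repeat=len(A)):   # every possible f
    f = dict(zip(A, images))
    B = frozenset(a for a in A if a not in f[a])
    assert B not in f.values()                   # B is never in the range of f
```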
3. SEQUENCES 10
3. Sequences
Review the definitions and examples in Section 2.1 of Reed, and the propositions
and theorems in Section 2.2.
3.1. Basic definitions. Recall that a sequence is an infinite list of real numbers
(later, complex numbers or functions, etc.). We usually write
a_1, a_2, a_3, . . . , a_n, . . . ,
(a_n)_{n≥1}, (a_n)_{n=1}^∞ or just (a_n). Sometimes we start the enumeration from 0 or
another integer.
The sequence (a_n) converges to a if for any preassigned positive number (usually
called ε), all members of the sequence beyond a certain member (usually called the
Nth) will be within distance ε of a. In symbols:
for each ε > 0 there is an integer N such that
|a_n - a| ≤ ε for all n ≥ N.
(Note that the definition implies N may (and usually will) depend on ε. We often
write N(ε) to emphasise this fact.)
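For a concrete sequence such as a_n = 1/n → 0, the definition can be made completely explicit: given ε, any integer N above 1/ε works, and producing such an integer is exactly what the Archimedean axiom guarantees. A small sketch:

```python
import math

# For a_n = 1/n -> 0: N(eps) = ceil(1/eps) satisfies |a_n - 0| <= eps
# for all n >= N.  Finding an integer above 1/eps is the Archimedean
# axiom in action.
def N_of(eps):
    return math.ceil(1 / eps)

for eps in (0.5, 0.1, 0.001):
    N = N_of(eps)
    assert all(abs(1 / n) <= eps for n in range(N, N + 1000))
```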
Study Examples 1–4 in Section 2.1 of Reed. Note that the Archimedean axiom
is needed in each of these examples (page 30 line 9, page 31 line 8, page 32
line 1, page 33 line 3).
Note also the definitions of
1. diverges; e.g. if a_n = n^2 or a_n = (-1)^n, then (a_n) diverges.
2. diverges to +∞; e.g. n^2 → ∞, but not (-1)^n n^2 → ∞.
3.2. Convergence properties. From the definition of convergence we have
the following useful theorems; see Reed Section 2.2 for proofs. While the results
may appear obvious, this is partly because we may only have simple examples of
sequences in mind. (For a less simple example, think of a sequence which enumerates
the rational numbers!)
Theorem 3.2.1.
1. If (a_n) converges, then (a_n) is bounded.[14]
2. If a_n ≤ c_n ≤ b_n for all n,[15] and both a_n → L and b_n → L, then c_n → L.
3. If a_n → a and b_n → b then
(a) a_n ± b_n → a ± b,
(b) ca_n → ca (c a real number),
(c) a_n b_n → ab,
(d) a_n / b_n → a/b (assuming b ≠ 0).[16]
3.3. Sequences in R^N. We will often be interested in sequences of vectors
(i.e. points) in R^N.
[14] A sequence (a_n) is bounded if there is a real number M such that |a_n| ≤ M for all n.
[15] Or for all sufficiently large n; more precisely, for all n ≥ N (say).
[16] Since b ≠ 0, it follows that b_n ≠ 0 for all n ≥ N (say). This implies that the sequence
(a_n / b_n) is defined, at least for n ≥ N.
[Diagram: points a_1, a_3, a_4, a_5, a_8 of a sequence plotted in the plane.]
Such a sequence is said to converge if each of the N sequences of components
converges. For example,
(1 + (-1)^n / n, (sin n) e^{-n}) → (1, 0).
The analogues of 1, 3(a), 3(b) hold, as is easily checked by looking at the
components. More precisely:
Theorem 3.3.1.
1. If (u_n) converges, then (u_n) is bounded.[17]
2. If u_n → u, v_n → v and a_n → a (a sequence of real numbers) then
(a) u_n ± v_n → u ± v,
(b) a_n u_n → au.
3. Au_n → Au (where A is any matrix with N columns).
Proof.
1. (Think of the case N = 2.) Since (u_n) converges, so does each of the N
sequences of real numbers (u_n^j)_{n≥1} (where 1 ≤ j ≤ N). But this implies that
each of these N sequences of real numbers lies in a fixed interval [-M_j, M_j] for
j = 1, . . . , N. Let M be the maximum of the M_j. Since |u_n^j| ≤ M for all j and all
n, it follows that
|u_n| = √((u_n^1)^2 + · · · + (u_n^N)^2) ≤ √N M
for all n. This proves the first result.
2. Since each of the components of u_n ± v_n converges to the corresponding
component of u ± v by the previous theorem, result 2(a) follows. Similarly for 2(b).
3. This follows from the fact that Au_n is a linear combination of the columns
of A with coefficients given by the components of u_n, while Au is the corresponding
linear combination with coefficients given by the components of u.
More precisely, write A = [a_1, . . . , a_N], where a_1, . . . , a_N are the column vectors
of A. Then
Au_n = u_n^1 a_1 + · · · + u_n^N a_N   and   Au = u^1 a_1 + · · · + u^N a_N.
Since u_n^1 → u^1, . . . , u_n^N → u^N as n → ∞, the result follows from 2(b) and
repeated applications of 2(a).
[17] A sequence of points (u_n)_{n≥1} in R^N, where u_n = (u_n^1, . . . , u_n^N) for each n, is bounded if
there is a real number M such that |u_n| = √((u_n^1)^2 + · · · + (u_n^N)^2) ≤ M for all n. This just means
that all members of the sequence lie inside some single ball of sufficiently large radius M.
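Part 3 of the theorem can be illustrated numerically. In this sketch (the matrix and the sequence are invented for illustration) u_n → u componentwise, and Au_n approaches Au at the same 1/n rate:

```python
# If u_n -> u componentwise, then A u_n -> A u.
def matvec(A, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
u = [1.0, 0.0, -1.0]
Au = matvec(A, u)

for n in (10, 100, 1000):
    u_n = [u[0] + 1 / n, u[1] - 1 / n, u[2] + 2 / n]   # converges to u
    err = max(abs(x - y) for x, y in zip(matvec(A, u_n), Au))
    assert err <= 20 / n   # the error shrinks like a constant times 1/n
```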
[Diagram: the three zones E_1, E_2, E_3, with arrows between them labelled by the
transition probabilities from the matrix below.]
4. MARKOV CHAINS 12
4. Markov Chains
Important examples of infinite sequences occur via Markov chains.
See Reed Section 2.3. There is a proof in Reed of Markov's Theorem (Theorem
4.0.2 in these notes) for the case of 2 states. But the proof is algebraically
messy and does not easily generalise to more than 2 states. So you may omit Reed
page 42 and the first paragraph on page 43. Later I will give a proof of Markov's
Theorem using the Contraction Mapping Theorem.
Suppose that a taxi is in one of three possible zones E_1, E_2 or E_3 (e.g. Civic,
Belconnen or Woden). We are interested in its location at a certain sequence of
designated times, let's say at each hour of the day.
Suppose that if the taxi is in E_1 at one hour, then the probability that it is
in E_1, E_2, E_3 exactly one hour later is .5, .3, .2 respectively (see the first column of
the following matrix; note that the sum of the probabilities must be 1). Similarly
suppose that if it is in E_2 then the probability that it is in E_1, E_2, E_3 exactly one
hour later is .6, .2, .2 (second column), and if it is in E_3 then the probability that
it is in E_1, E_2, E_3 exactly one hour later is .4, .2, .4 (third column).
We can represent the situation by a matrix of probabilities,
[ .5  .6  .4 ]
[ .3  .2  .2 ]
[ .2  .2  .4 ]
The first column corresponds to starting in E_1, the second to E_2 and the third
to E_3. We can also show this in a diagram, as in the figure above.
We are interested in the long-term behaviour of the taxi. For example, after a
large number of hours, what is the probability that the taxi is in each of the states
E_1, E_2, E_3, and does this depend on the initial starting position?
More generally, consider the following situation, called a Markov process. One
has a system which can be in one of N states E_1, . . . , E_N. Let P = (p_ij) be an N × N
matrix where p_ij is the probability that if the system is in state E_j at some time
then it will be in state E_i at the next time. Thus the jth column corresponds to
what happens at the next time if the system is currently in state E_j. We say P is
a probability transition matrix.
[Diagram: the three states H, L, P, with arrows between them labelled by the
transition probabilities of the random walk example below.]
What happens after the system is in state E_2
              ↓
P = [ p_11  p_12  . . .  p_1N ]
    [ p_21  p_22  . . .  p_2N ]
    [  .     .            .   ]
    [  .     .            .   ]
    [ p_N1  p_N2  . . .  p_NN ]
If a vector consists of numbers from the interval [0, 1] and the sum of the
components is one, we say the vector is a probability vector. We require the columns
of a probability transition matrix to be probability vectors.
There are many important examples:
Cell genetics: Suppose a cell contains k particles, some of type A and some
of type B. The state of the cell is the number of particles of type A, so there
are N = k + 1 states. Suppose daughter cells are formed by subdivision.
Then one can often compute from biological considerations the probability
that a random daughter cell is in each of the N possible states.
Population genetics: Suppose flowers are bred and at each generation 1000
flowers are selected at random. A particular gene may occur either 0 or 1
or 2 times in each flower, and so there are between 0 and 2000 occurrences
of the gene in the population. This gives 2001 possible states, and one usually
knows from biological considerations the probability of passing
from any given state for one generation to another given state for the next
generation.
Random walk: Suppose an inebriated person is at one of three locations:
home (H), lamppost (L) or pub (P). Suppose the probability of passing
from one to the other an hour later is given by the diagram above.
The corresponding matrix of transition probabilities, with the three
columns corresponding to being in the state H, L, P respectively, is
[ .7  .4   0 ]
[ .3  .4  .3 ]
[  0  .2  .7 ]
Other examples occur in diffusion processes, queueing theory, telecommunications,
statistical mechanics, etc. See An Introduction to Probability Theory and Its
Applications, vol 1, by W. Feller for examples and theory.
We now return to the general theory. Suppose that at some particular time the
probabilities of being in the states E_1, . . . , E_N are given by the probability vector
a = (a_1, a_2, . . . , a_N)^T
(so the sum of the a_i is one). Then the probability at the next time of
being in state E_i is[18]
a_1 (prob. of moving from E_1 to E_i) + a_2 (prob. of moving from E_2 to E_i)
+ · · · + a_N (prob. of moving from E_N to E_i)
= p_i1 a_1 + p_i2 a_2 + · · · + p_iN a_N = (Pa)_i.
Thus if at some time the probabilities of being in each of the N states are given by
the vector a, then the corresponding probabilities at the next time are given by
Pa, and hence at the time after that by P(Pa) = P^2 a, and at the time after that by
P^3 a, etc.
In the case of the taxi, starting at Civic and hence starting with the probability
vector a = (1, 0, 0)^T, the corresponding probabilities at later times are given by
Pa = (.5, .3, .2)^T,  P^2 a = (.51, .25, .24)^T,  P^3 a = (.501, .251, .248)^T,  . . . .
This appears to converge to (.5, .25, .25)^T.
In fact it seems plausible that after a sufficiently long period of time, the probability
of being in any particular zone should be almost independent of where the
taxi begins. This is indeed the case (if we pass to the limit), as we will see later.
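This apparent convergence is easy to test by iterating a ↦ Pa in code; starting from each of the three "certain" starting zones, the iterates approach (.5, .25, .25). A sketch using plain lists:

```python
# Power iteration for the taxi chain.
P = [[0.5, 0.6, 0.4],
     [0.3, 0.2, 0.2],
     [0.2, 0.2, 0.4]]

def step(a):
    return [sum(P[i][j] * a[j] for j in range(3)) for i in range(3)]

for start in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    a = start
    for _ in range(50):
        a = step(a)
    # Independent of the start, the iterates approach (.5, .25, .25).
    assert max(abs(x - y) for x, y in zip(a, [0.5, 0.25, 0.25])) < 1e-12
```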
We return again to the general situation. Suppose a is any probability vector
with N components. We will later prove:
Theorem 4.0.2 (Markov). If all entries in the probability transition matrix P
are greater than 0, and a is a probability vector, then the sequence of vectors
a, Pa, P^2 a, . . .
converges to a probability vector v, and this vector does not depend on a.
Moreover, the same results are true even if we just assume P^k has all entries
nonzero for some integer k > 1.
The vector v is the unique nonzero solution of (P - I)v = 0.[19]
The theorem applies to the taxi, and to the inebriated person (take k = 2).
The proof that if v is the limit then (P - I)v = 0 can be done now.
Proof. Assume v = lim P^n a. Then
v = lim_{n→∞} P^n a
  = lim_{n→∞} P^{n+1} a   (as this is the same sequence, just begun one term later)
  = lim_{n→∞} P P^n a
  = P lim_{n→∞} P^n a   (from Theorem 3.3.1.3)
  = Pv.
Hence (P - I)v = 0.
[18] This follows from intuitive ideas of probability, even if you have not done any basic
probability theory.
[19] The matrix I is the identity matrix of the same size as P.
For the taxi, the nonzero solution of
[ -.5   .6   .4 ] [ v_1 ]   [ 0 ]
[  .3  -.8   .2 ] [ v_2 ] = [ 0 ]
[  .2   .2  -.6 ] [ v_3 ]   [ 0 ]
with entries summing to 1 is indeed (see the previous discussion)
(v_1, v_2, v_3)^T = (.5, .25, .25)^T,
and so in the long term the taxi spends 50% of the time in E_1 and 25% of the time
in each of E_2 and E_3.
For the inebriate, the nonzero solution of
[ -.3   .4    0 ] [ v_1 ]   [ 0 ]
[  .3  -.6   .3 ] [ v_2 ] = [ 0 ]
[   0   .2  -.3 ] [ v_3 ]   [ 0 ]
is
(v_1, v_2, v_3)^T = (4/9, 1/3, 2/9)^T,
and so in the long term (as the evening progresses) the inebriate
spends 44% of the time at home, 33% by the lamppost and 22% in the pub.
5. CAUCHY SEQUENCES AND THE BOLZANO–WEIERSTRASS THEOREM 16
5. Cauchy sequences and the Bolzano–Weierstraß theorem
A sequence converges iff it is Cauchy. This gives a criterion for
convergence which does not require knowledge of the limit. It will
follow that a bounded monotone sequence is convergent.
A sequence need not of course converge, even if it is bounded.
But by the Bolzano–Weierstraß theorem, every bounded sequence
has a convergent subsequence.
Read Sections 2.4–6 of Reed, most of it will be review. The following notes go
through the main points.
5.1. Cauchy sequences. (Study Section 2.4 of Reed.)
We first give the definition.
Definition 5.1.1. A sequence (a_n) is a Cauchy sequence if for each number
ε > 0 there exists an integer N such that
|a_m - a_n| ≤ ε whenever m ≥ N and n ≥ N.
The idea is that, given any number (“tolerance”) ε > 0, all terms (not just
successive ones) of the sequence from the Nth onwards are within (this tolerance) ε
of one another (N will normally depend on the particular “tolerance”). The smaller
we choose ε the larger we will need to choose N.
See Example 1 page 45 of Reed (note that the Archimedean axiom is used on page 45 line 8). [20]
Theorem 5.1.2. A sequence is Cauchy iﬀ it converges.
The proof that convergent implies Cauchy is easy (Proposition 2.4.1 of Reed).
The fact that Cauchy implies convergent is just the Cauchy Completeness Axiom
(or what Reed calls the Completeness axiom, see Reed page 48).
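The point of the theorem is that convergence can be verified without knowing the limit. As an illustration of my own (not from Reed), the partial sums of Σ 1/k² can be shown Cauchy by the explicit tail bound Σ_{k>n} 1/k² < 1/n, which supplies an N for each ε with no mention of the limit (which happens to be π²/6):

```python
# A sketch (my own illustration): the partial sums a_n = sum_{k<=n} 1/k^2
# are Cauchy, with an explicit N for each eps coming from the tail bound
# sum_{k>n} 1/k^2 < 1/n (compare with the telescoping sum of 1/(k(k-1))).

def a(n):
    return sum(1.0 / (k * k) for k in range(1, n + 1))

eps = 1e-3
N = int(1 / eps) + 1        # N > 1/eps, chosen from the tail bound
gap = abs(a(2 * N) - a(N))  # any m, n >= N give a gap below eps
print(gap < eps)            # True
```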
Recall that a sequence (a_n) is increasing if a_n ≤ a_{n+1} for all n, and is decreasing if a_n ≥ a_{n+1} for all n. A sequence is monotone if it is either increasing or decreasing.
Theorem 5.1.3. Every bounded monotone sequence converges.
For the proof of the theorem see Reed page 48. The proof uses a bisection
argument. Begin with a closed interval which contains all members of the sequence,
and keep subdividing it so that all members of the sequence after some index are
in the chosen subinterval. This implies the sequence is Cauchy and so it converges
by the previous theorem. (Note the use of the Archimedean Axiom on page 49 line
10 of Reed.)
Loosely speaking, the sequence of chosen subintervals “converges to a point”
which is the limit of the original sequence.
5.2. Subsequences and the Bolzano–Weierstraß theorem. Study Section 2.6 of Reed (omitting the Definition and Proposition on page 56). Also read Section 3.5 of the MATH1115 Notes and Adams, Appendix III, Theorem 2.
Suppose (a_n) is a sequence of real numbers. A subsequence is just a sequence obtained by skipping terms. For example,

a_1, a_27, a_31, a_44, a_101, ...

is a subsequence. We usually write a subsequence of (a_n) as

a_{n_1}, a_{n_2}, a_{n_3}, a_{n_4}, ..., a_{n_k}, ...,   or
a_{n(1)}, a_{n(2)}, a_{n(3)}, a_{n(4)}, ..., a_{n(k)}, ...,   or
[20] Here "line 8" means 8 lines from the bottom of the page.
(a_{n_k})_{k≥1},   or   (a_{n(k)})_{k≥1}.

Thus in the above example we have n_1 = 1, n_2 = 27, n_3 = 31, n_4 = 44, n_5 = 101, ....
Instead of the Deﬁnition on page 56 of Reed, you may use Proposition 2.6 as
the deﬁnition. That is,
Definition 5.2.1. The number d is a limit point of the sequence (a_n)_{n≥1} if there is a subsequence of (a_n)_{n≥1} which converges to d.
The limit of a convergent sequence is also a limit point of the sequence. Moreover, it is the only limit point, because any subsequence of a convergent sequence must converge to the same limit as the original sequence (why?). On the other hand, sequences which do not converge may have many limit points, even infinitely many (see the following examples).
Now look at Examples 1 and 2 on pages 56, 57 of Reed.
Here is an example of a sequence which has every number in [0, 1] as a limit point. Let

r_1, r_2, r_3, ...        (4)
be an enumeration of all the rationals in the interval [0, 1]. We know this exists,
because the rationals are countable, and indeed we can explicitly write down such
a sequence.
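One explicit enumeration (a sketch of my own, not from Reed) lists, for each denominator q = 1, 2, 3, ..., the fractions p/q in lowest terms:

```python
# A sketch (my own illustration) of one explicit enumeration of the
# rationals in [0, 1]: for each denominator q = 1, 2, 3, ... list the
# fractions p/q in lowest terms, so each rational appears exactly once.

from math import gcd

def rationals_01():
    """Yield every rational in [0, 1] exactly once, as a pair (p, q)."""
    yield (0, 1)
    yield (1, 1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield (p, q)
        q += 1

gen = rationals_01()
first = [next(gen) for _ in range(8)]
print(first)  # [(0, 1), (1, 1), (1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (1, 5)]
```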
Every number a ∈ [0, 1] has a decimal expansion, and this actually provides a sequence (a_n) of rationals converging to a (modify the decimal expansion a little to get distinct rationals if necessary).
This sequence may not be a subsequence of (4), since the a_n may occur in a different order. But we can find a subsequence (a'_n) of (a_n) (which thus also converges to a) and which is also a subsequence of (4), as follows:

a'_1 = a_1
a'_2 = first term in (a_n) which occurs after a'_1 in (4)
a'_3 = first term in (a_n) which occurs after a'_2 in (4)
a'_4 = first term in (a_n) which occurs after a'_3 in (4)
...
Theorem 5.2.2 (Bolzano–Weierstraß Theorem). If a sequence (a_n) is bounded then it has a convergent subsequence. If all members of (a_n) belong to a closed bounded interval I, then so does the limit of any convergent subsequence of (a_n).
Note that the theorem implies that any bounded sequence must have at least
one limit point. See Reed pages 57, 58.
The proof of the theorem is via a bisection argument. Suppose I is a closed bounded interval containing all terms of the sequence. Subdivide I and choose one of the subintervals so that it contains an infinite subsequence, then again subdivide and take one of the new subintervals containing an infinite subsequence of the first subsequence, etc. Now define a new subsequence of the original sequence by taking one term from the first subsequence, a later term from the second subsequence, a still later term from the third subsequence, etc. One can now show that this subsequence is Cauchy and so converges.
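The bisection argument can be sketched in code. The version below is my own illustration (not from Reed), with the "infinitely many terms" condition replaced by counting over a large finite search range: it extracts increasing indices whose values lie in nested halved intervals, for the bounded non-convergent sequence a_n = (−1)^n (1 + 1/(n+1)).

```python
# A sketch (my own illustration) of the bisection argument: at each stage
# keep the half-interval containing more of the (finitely sampled)
# remaining terms, then pick from it one term whose index exceeds all
# previously picked indices.

def bw_subsequence(a, lo, hi, stages=12, search=100000):
    picked, last = [], -1
    for _ in range(stages):
        mid = (lo + hi) / 2
        left = [n for n in range(last + 1, search) if lo <= a(n) <= mid]
        right = [n for n in range(last + 1, search) if mid < a(n) <= hi]
        if len(left) >= len(right):
            hi, pool = mid, left
        else:
            lo, pool = mid, right
        last = pool[0]           # a later term from the next subsequence
        picked.append(last)
    return picked

a = lambda n: (1 + 1 / (n + 1)) * (1 if n % 2 == 0 else -1)  # limit points 1, -1
idx = bw_subsequence(a, -2.0, 2.0)
vals = [a(n) for n in idx]
print(idx == sorted(idx))                  # True: indices strictly increase
print(abs(vals[-1] - vals[-2]) < 2e-3)     # True: the values cluster (near -1)
```

Successive chosen values lie in nested intervals whose lengths halve at each stage, which is exactly why the extracted subsequence is Cauchy.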
5.3. Completeness property of the reals. (See Section 2.5 of Reed, but
note the errors discussed below).
The usual Completeness Axiom for the real numbers is: [21]

if a non-empty set A of real numbers is bounded above, then there is a real number b which is the least upper bound for A.
The real number b need not belong to A, but it is unique. For example, the
least upper bound for each of the sets (0, 1) and [0, 1] is 1.
A common term for the “least upper bound” is the supremum or sup. Similarly,
the greatest lower bound of a set is often called the inﬁmum or inf.
See the MATH1115 2000 notes, §2.3, for a discussion. See also Adams, third edition, pages 4, A-23.
The Completeness Axiom in fact follows from the other (i.e. algebraic, order, Archimedean and Cauchy completeness) axioms. It is also possible to prove the Archimedean and Cauchy completeness axioms from the algebraic, order and Completeness axioms.
You should now go back and reread Section 1.1 of these notes.
Remarks on the text: Note that the remark at the end of Section 2.5 of Reed about
the equivalence of Theorems 2.4.2, 2.4.3 and 2.5.1 is only correct if we assume
the algebraic, order and Archimedean axioms. The assertion on page 53 before
Theorem 2.5.2 in the corrected printing, that the Archimedean property follows
from Theorem 2.5.1, is not correct, as Theorem 2.5.1 uses the Archimedean property
in its proof (in line 6 of page 53).
[21] Note that a similar statement is not true if we replace "real" by "rational". For example, let

A = { x ∈ Q | x ≥ 0, x² < 2 }.

Then although A is a non-empty set of rational numbers which is bounded above, there is no rational least upper bound for A. The (real number) least upper bound is of course √2.
6. THE QUADRATIC MAP 19
6. The Quadratic Map
This is the prototype of maps which lead to chaotic behaviour. For some excellent online software demonstrating a wide range of chaotic behaviour, go to Brian Davies' homepage, via the Department of Mathematics homepage. When you reach the page on "Exploring chaos", click on the "graphical analysis" button.
Read Section 2.7 from Reed. We will only do this section lightly, just to indicate
how quite complicated behaviour can arise from seemingly simple maps.
The quadratic map is defined by the recursive relation

x_{n+1} = r x_n (1 − x_n),        (5)

where x_0 ∈ [0, 1] and 0 < r ≤ 4. It is a simple model for certain populations, where x_n is the fraction in year n of some "maximum possible" population.

The function F(x) = rx(1 − x) is non-negative on the interval [0, 1]. Its maximum occurs at x = 1/2 and has the value r/4. It follows that

F : [0, 1] → [0, 1]

for r in the given range.
If r ∈ (0, 1], it can be seen from the diagram obtained via Brian Davies' software that x_n → 0, independently of the initial value x_0. This is proved in Theorem 2.7.1 of Reed.
The proof is easy; it is first a matter of showing that the sequence (x_n) is decreasing, as is indicated by the diagram in Reed, and hence has a limit by Theorem 5.1.3. If x* is the limit, then by passing to the limit in (5) it follows that x* = r x*(1 − x*), and so x* = 0. (The other solution of x = rx(1 − x) is x = 1 − 1/r, which does not lie in [0, 1], unless r = 1, which gives 0 again.)
For 1 < r ≤ 3 and 0 < x_0 < 1, the sequence converges to x* = 1 − 1/r. (This is proved in Reed, Theorem 2.7.2, for 1 < r ≤ 2√2. The proof is not difficult, but we will not go through it.) If x_0 = 0 or 1, then it is easy to see that the sequence converges to 0.
For 3 < r ≤ 4 and 0 < x_0 < 1 the situation becomes more complicated. If 3 < r ≤ 3.4 (approx.) then x_n will eventually oscillate back and forth between two values which become closer and closer to two of the solutions of x = F(F(x)) = F²(x) (see the numerics on Reed page 65).

As r gets larger the situation becomes more and more complicated, and in particular one needs to consider solutions of x = F^k(x) for large k.
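The three regimes discussed above are easy to observe numerically. A sketch (my own illustration, not from Reed):

```python
# A sketch (my own illustration): iterate x_{n+1} = r x_n (1 - x_n) in the
# three regimes discussed above.

def orbit(r, x0, n):
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

tail1 = orbit(0.9, 0.5, 500)[-1]      # r <= 1: tends to 0
tail2 = orbit(2.5, 0.5, 500)[-1]      # 1 < r <= 3: tends to 1 - 1/r = 0.6
tail3 = orbit(3.2, 0.5, 1000)[-4:]    # 3 < r <= 3.4 (approx.): period 2
print(round(tail1, 8), round(tail2, 8))   # 0.0 and 0.6
print([round(v, 4) for v in tail3])       # alternates between two values
```

The two values seen for r = 3.2 are precisely the extra solutions of x = F²(x), as described above.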
7. CONTINUITY 22
7. Continuity
Study Reed Sections 3.1, 3.2. See also the MATH1115 Notes, Chapter 4.
7.1. Definition of continuity. We will consider functions f : A → R, where A ⊆ R is the domain of f, also denoted by D(f). Usually A will be R, an interval, or a finite union of intervals. (To fix your ideas, consider the case A = [a, b] or A = (a, b).)

Definition 7.1.1. A function f is continuous at a ∈ D(f) if

x_n → a  ⟹  f(x_n) → f(a)

whenever (x_n) ⊆ D(f). [22] The function f is continuous if it is continuous at every point in its domain.
This definition in terms of sequences is quite natural: "as x_n gets closer and closer to a, f(x_n) gets closer and closer to f(a)". By the following theorem it is equivalent to the usual "ε–δ" definition, which does not involve sequences.
Theorem 7.1.2. A function f is continuous at a point a ∈ D(f) iff for every ε > 0 there is a δ > 0 such that:

x ∈ D(f) and |x − a| ≤ δ  ⟹  |f(x) − f(a)| ≤ ε.
(For the proof see Reed Theorem 3.1.3 page 77.)
Note that if f(x) = sin(1/x) then f is continuous on its domain (−∞, 0) ∪ (0, ∞). However, there is no continuous extension of f to all of R. On the other hand, f(x) = x sin(1/x) is both continuous on its domain (−∞, 0) ∪ (0, ∞) and also has a continuous extension to all of R. (Draw diagrams!)
Another interesting example is the function g given by

g(x) = {  x   if x ∈ Q
       { −x   if x ∈ R \ Q.

Then g is continuous at 0 and nowhere else. (Draw a diagram, and convince yourself via the definition.)
7.2. Limits and continuity. Note the definition of

lim_{x→a} f(x) = L,

on the second half of page 78 of Reed, both in terms of sequences and in terms of ε and δ. [23]

It follows (provided a is not an isolated point [24] of D(f)) that f is continuous at a ∈ D(f) iff

lim_{x→a} f(x) = f(a).

(From our definitions, if a is an isolated point of D(f) then f is continuous at a although lim_{x→a} f(x) is not actually defined. This is not an interesting situation, and we would not normally consider continuity at isolated points.)
[22] By (x_n) ⊆ D(f) we mean that each term of the sequence is a member of D(f).
[23] In the definition of lim_{x→a} f(x) = L, we restrict to sequences x_n → a with x_n ≠ a, or we require that 0 < |x − a| ≤ δ (thus x ≠ a), depending on the definition used; see Reed page 78. In the case of continuity however, because the required limit is f(a) (c.f. (21) and (22)), one can see it is not necessary to make this restriction.
[24] We say a ∈ A is an isolated point of A if there is no sequence x_n → a such that (x_n) ⊆ A \ {a}. For example, if A = [0, 1] ∪ {2} then 2 is an isolated point of A (and is the only isolated point of A).
7.3. Properties of continuity.
Theorem 7.3.1. Suppose f and g are continuous. Then the following are all continuous on their domains.
1. f + g
2. cf for any c ∈ R
3. fg
4. f/g
5. f ∘ g

The domain of each of the above functions is just the set of real numbers where it is defined. Thus the domain of f + g and of fg is D(f) ∩ D(g); the domain of cf is D(f); the domain of f/g is { x ∈ D(f) ∩ D(g) | g(x) ≠ 0 }; and the domain of f ∘ g is { x | x ∈ D(g) and g(x) ∈ D(f) }.
For the proofs of Theorem 7.3.1, see Reed Theorems 3.1.1 and 3.1.2 or the
MATH1115 notes Section 4. The proofs in terms of the sequence deﬁnition of
continuity are very easy, since the hard work has already been done in proving the
corresponding properties for limits of sequences.
See Examples 1–4 in Section 3.1 of Reed.
7.4. Deeper properties of continuous functions. These require the completeness property of the real numbers for their proofs.

Theorem 7.4.1. Every continuous function defined on a closed bounded interval is bounded above and below, and moreover has a maximum and a minimum value.
Proof. A proof using the Bolzano–Weierstraß theorem is in Reed (Theorems 3.2.1 and 3.2.2 on pp 80, 81). The proof is fairly easy, since the hard work has already been done in proving the Bolzano–Weierstraß theorem.

Note that a similar result is not true on other intervals. For example, consider f(x) = 1/x on (0, 1].
Theorem 7.4.2 (Intermediate Value Theorem). Every continuous function defined on a closed bounded interval takes all values between the values taken at the endpoints.

Proof. See Reed p 82. Again, the hard work has already been done in proving the Bolzano–Weierstraß theorem.
Corollary 7.4.3. Let f be a continuous function defined on a closed bounded interval with minimum value m and maximum value M. Then the range of f is the interval [m, M].

Proof. (See Reed p 83.) The function f must take the values m and M at points c and d (say) by Theorem 7.4.1. By the Intermediate Value Theorem applied with the endpoints c and d, f must take all values between m and M. This proves the corollary.
7.5. Uniform continuity. If we consider the ε–δ definition of continuity of f at a point a as given in Theorem 7.1.2, we see that for any given ε the required δ will normally depend on a. The steeper the graph of f near a, the smaller the δ that is required. If for each ε > 0 there is some δ > 0 which will work for every a ∈ D(f), then we say f is uniformly continuous. More precisely:
Definition 7.5.1. A function f is uniformly continuous if for every ε > 0 there is a δ > 0 such that for every x_1, x_2 ∈ D(f):

|x_1 − x_2| ≤ δ  implies  |f(x_1) − f(x_2)| ≤ ε.
Any uniformly continuous function is certainly continuous. On the other hand, the function 1/x with domain (0, 1], and the function x² with domain R, are continuous but not uniformly continuous. (See Reed Example 2 page 84.)
However, we have the following important result for closed bounded intervals.
Theorem 7.5.2. Any continuous function defined on a closed bounded interval is uniformly continuous.

Proof. See Reed p 85. I will discuss this in class.
There is an important case in which uniform continuity is easy to prove.
Definition 7.5.3. A function f is Lipschitz if there exists an M such that

|f(x_1) − f(x_2)| ≤ M|x_1 − x_2|

for all x_1, x_2 in the domain of f. We say M is a Lipschitz constant for f.
This is equivalent to requiring that

|f(x_1) − f(x_2)| / |x_1 − x_2| ≤ M

for all x_1 ≠ x_2 in the domain of f. In other words, the slope of the line joining any two points on the graph of f is at most M in absolute value.
Proposition 7.5.4. If f is Lipschitz, then it is uniformly continuous.
Proof. Let ε > 0 be any positive number. Since |f(x) − f(a)| ≤ M|x − a| for any x, a ∈ D(f), it follows that

|x − a| ≤ ε/M  ⟹  |f(x) − f(a)| ≤ ε.

This proves uniform continuity.
It follows from the Mean Value Theorem that if the domain of f is an interval I (not necessarily closed or bounded), f is differentiable, and |f′(x)| ≤ M on I, then f is Lipschitz with Lipschitz constant M.
Proof. Suppose x_1, x_2 ∈ I and x_1 < x_2. Then

|f(x_1) − f(x_2)| / |x_1 − x_2| = |f′(c)|

for some x_1 < c < x_2, by the Mean Value Theorem. Since |f′(c)| ≤ M, we are done. Similarly if x_2 < x_1.
♣ Where did we use the fact that the domain of f is an interval? Give a counterexample in case the domain is not an interval.

There are examples of non-differentiable Lipschitz functions, such as the following "sawtooth" function.
8. RIEMANN INTEGRATION 25
8. Riemann Integration
Study Reed Chapter 3.3. Also see the MATH1116 notes.
8.1. Definition. The idea is that ∫_a^b f (which we also write as ∫_a^b f(x) dx) should be a limit of lower and upper sums corresponding to rectangles, respectively below and above the graph of f. See Reed page 87 for a diagram.

Suppose f is defined on the interval [a, b] and f is bounded. (At first you may think of f as being continuous, but unless stated otherwise it is only necessary that f be bounded.)

A partition P of [a, b] is any finite sequence of points (x_0, ..., x_N) such that

a = x_0 < x_1 < ··· < x_N = b.
Let

m_i = inf{ f(x) | x ∈ [x_{i−1}, x_i] },   M_i = sup{ f(x) | x ∈ [x_{i−1}, x_i] },        (6)

for i = 1, ..., N. [25] Then the lower and upper sums for the partition P are defined by

L_P(f) = Σ_{i=1}^N m_i (x_i − x_{i−1}),   U_P(f) = Σ_{i=1}^N M_i (x_i − x_{i−1}).        (7)
They correspond to the sum of the areas of the lower rectangles and the upper
rectangles respectively.
Adding points to a partition increases lower sums and decreases upper sums. More precisely, if Q is a refinement of P, i.e. a partition which includes all the points in P, then

L_P(f) ≤ L_Q(f),   U_Q(f) ≤ U_P(f).        (8)

(See Reed Lemma 1 page 88 for the proof. Or see Lemma 7.4 page 72 of the 1998 AA1H notes; although the proof there is for continuous functions and so uses max and min in the definition of upper and lower sums, the proof is essentially the same for general bounded functions provided one uses sup and inf instead.)
Any lower sum is ≤ any upper sum:

L_P(f) ≤ U_Q(f).        (9)

Here, P and Q are arbitrary partitions. The proof is by taking a common refinement of P and Q and using (8). (See Reed Lemma 2 page 89, and the MATH1116 notes.)
For each partition P we have a number L_P(f), and the set of all such numbers is bounded above (by any upper sum). The supremum of this set of numbers is denoted by

sup_P { L_P(f) } = sup{ L_P(f) | P is a partition of [a, b] }.

Similarly, the infimum of the set of all upper sums,

inf_P { U_P(f) } = inf{ U_P(f) | P is a partition of [a, b] },

exists. If these two numbers are equal, then we say f is (Riemann) [26] integrable on [a, b] and define

∫_a^b f := inf_P { U_P(f) } = sup_P { L_P(f) }.        (10)
[25] If f is continuous, we can replace sup and inf by max and min respectively.
[26] There is a much more powerful notion called Lebesgue integrable. This is more difficult to develop, and will be treated in the first Analysis course in third year.
(For an example of a function which is bounded but not Riemann integrable, see
Reed Problem 1 page 93.)
8.2. Basic results.
Theorem 8.2.1. If f is a continuous function on a closed bounded interval [a, b], then f is integrable on [a, b].

Proof. (See Reed Lemma 3 and Theorem 3.3.1 on pages 89, 90, or the MATH1116 notes.)

The idea is first to show, essentially from the definitions of sup and inf, that it is sufficient to prove for each ε > 0 that there exists a partition P for which

U_P(f) − L_P(f) < ε.        (11)

The existence of such a P follows from the uniform continuity of f.
In order to numerically estimate ∫_a^b f we can choose a point x*_i in each interval [x_{i−1}, x_i] and compute the corresponding Riemann sum

R_P(f) := Σ_{i=1}^N f(x*_i)(x_i − x_{i−1}).

This is not a precise notation, since R_P(f) depends on the points x*_i as well as on P. It is clear that

L_P(f) ≤ R_P(f) ≤ U_P(f).
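These sums are easy to compute in simple cases. A sketch (my own illustration, not from Reed) for f(x) = x² on [0, 1] with the uniform partition x_i = i/N; since f is increasing there, m_i and M_i are attained at the interval endpoints:

```python
# A sketch (my own illustration): lower, upper and midpoint Riemann sums
# for f(x) = x^2 on [0, 1] with the uniform partition x_i = i/N.  Since f
# is increasing there, m_i and M_i are attained at the interval endpoints.

def sums(f, a, b, N):
    xs = [a + (b - a) * i / N for i in range(N + 1)]
    lower = sum(min(f(xs[i - 1]), f(xs[i])) * (xs[i] - xs[i - 1])
                for i in range(1, N + 1))   # valid for monotone f only
    upper = sum(max(f(xs[i - 1]), f(xs[i])) * (xs[i] - xs[i - 1])
                for i in range(1, N + 1))
    riemann = sum(f((xs[i - 1] + xs[i]) / 2) * (xs[i] - xs[i - 1])
                  for i in range(1, N + 1))
    return lower, riemann, upper

L, R, U = sums(lambda x: x * x, 0.0, 1.0, 1000)
print(L <= R <= U)        # True, as in the display above
print(round(U - L, 6))    # 0.001: here U - L telescopes to 1/N
```

All three sums tend to 1/3, the value of the integral, as N → ∞.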
The following theorem justifies using Riemann sums. Let ‖P‖ denote the maximum length of the intervals in the partition P. Then

Theorem 8.2.2. Suppose f is Riemann integrable on the interval [a, b]. Let (P_k) be a sequence of partitions of [a, b] such that lim_{k→∞} ‖P_k‖ = 0, and suppose R_{P_k}(f) is a corresponding sequence of Riemann sums. Then

R_{P_k}(f) → ∫_a^b f  as k → ∞.
See Reed page 91 for the proof.
8.3. Properties of the Riemann integral. For proofs of the following theorems see Reed pages 91–93.
Theorem 8.3.1. Suppose f and g are continuous on [a, b] and c and d are constants. Then

∫_a^b (cf + dg) = c ∫_a^b f + d ∫_a^b g.

(See Reed Theorem 3.3.3 page 92 for the proof.)
Theorem 8.3.2. Suppose f and g are continuous on [a, b] and f ≤ g. Then

∫_a^b f ≤ ∫_a^b g.

Theorem 8.3.3. Suppose f is continuous on [a, b]. Then

| ∫_a^b f | ≤ ∫_a^b |f|.

Corollary 8.3.4. If f is continuous on [a, b] and |f| ≤ M then

| ∫_a^b f | ≤ M(b − a).
Theorem 8.3.5. Suppose f is continuous on [a, b] and a ≤ c ≤ b. Then

∫_a^c f + ∫_c^b f = ∫_a^b f.
Remark 8.3.1. Results analogous to those above are still true if we assume
the relevant functions are integrable, but not necessarily continuous. The proofs
are then a little diﬀerent, but not that much more diﬃcult. See “Calculus” by M.
Spivak, Chapter 13.
The fundamental connection between integration and differentiation is made in Reed in Section 4.2; I return to this after we discuss differentiation. In particular, if f is continuous on [a, b], then

∫_a^b f = F(b) − F(a)

where F′(x) = f(x) for all x ∈ [a, b]. This often gives a convenient way of finding integrals.
8.4. Riemann integration for discontinuous functions. See Reed Section
3.5. This material will be done “lightly”.
• Example 1 of Reed: If f is the function defined on [0, 1] by

f(x) = { 1   if 0 ≤ x ≤ 1 and x irrational
       { 0   if 0 ≤ x ≤ 1 and x rational,

then f is not Riemann integrable, since every upper sum is 1 and every lower sum is 0.
• However, any bounded monotone increasing [27] (or decreasing) function on an interval [a, b] is Riemann integrable. See Reed Theorem 3.5.1.
A monotone function can have an infinite number of points where it is discontinuous (Reed Q8 p 112). Also, the sum of two Riemann integrable functions is Riemann integrable (Reed Q13 p 112). This shows that many quite "bad" (but still bounded) functions can be Riemann integrable.
• The next result (Theorems 3.5.2, 3.5.3) is that if a bounded function is continuous except at a finite number of points on [a, b], then it is Riemann integrable on [a, b]. (Interesting examples are f(x) = sin(1/x) if −1 ≤ x ≤ 1 and x ≠ 0, f(0) = 0; and f(x) = sin(1/x) if 0 < x ≤ 1, f(0) = 0; c.f. Reed Example 2.)
Moreover, if the points of discontinuity are a_1 < a_2 < ··· < a_k and we set a_0 = a, a_{k+1} = b, then

∫_a^b f = Σ_{j=1}^{k+1} ∫_{a_{j−1}}^{a_j} f,

and

∫_{a_{j−1}}^{a_j} f = lim_{δ→0+} ∫_{a_{j−1}+δ}^{a_j−δ} f.
Since f is continuous on each interval [a_{j−1} + δ, a_j − δ], we can often find ∫_{a_{j−1}+δ}^{a_j−δ} f by standard methods for computing integrals (i.e. find a function whose derivative is f) or by numerical methods, and then take the limit.
[ By the one-sided limit lim_{δ→0+} we mean that δ is restricted to be positive; Reed writes lim_{δ↓0} to mean the same thing. See Reed p 108 for the definition of a one-sided limit in terms of
[27] f is monotone increasing if f(x_1) ≤ f(x_2) whenever x_1 < x_2.
sequences. The "ε–δ" definition is in Definition 3.13 p 29 of the AA1H 1998 Notes. The two definitions are equivalent by the same argument as used in the proof of similar equivalences for continuity and also for ordinary limits, see Reed p 78. ]
• A particular case of importance is when the left and right limits of f exist at the discontinuity points a_j. The function f is then said to be piecewise continuous. In this case, if the function f_j is defined on [a_{j−1}, a_j] by

f_j(x) = { f(x)                     if a_{j−1} < x < a_j
         { lim_{x→a_{j−1}+} f(x)    if x = a_{j−1}
         { lim_{x→a_j−} f(x)        if x = a_j,

then f_j is continuous on [a_{j−1}, a_j],

∫_a^b f = Σ_{j=1}^{k+1} ∫_{a_{j−1}}^{a_j} f_j,
and each ∫_{a_{j−1}}^{a_j} f_j can often be computed by standard methods. See Reed Example 3 p 108 and Corollary 3.5.4 p 109.
8.5. Improper Riemann integrals. See Reed Section 3.8. This material will be done "lightly".

In previous situations we had a bounded function with a finite number of discontinuities. The Riemann integral existed according to our definition, and we saw that it could be obtained by computing the Riemann integrals of certain continuous functions and by perhaps taking limits.

If the function is not bounded, or the domain of integration is an infinite interval, the Riemann integral can often be defined by taking limits. The Riemann integral is then called an improper integral. For details and examples of the following, see Reed Section 3.6.
For example, if f is continuous on [a, b) then we define

∫_a^b f = lim_{δ→0+} ∫_a^{b−δ} f,

provided the limit exists.
If f is continuous on [a, ∞) then we define

∫_a^∞ f = lim_{b→∞} ∫_a^b f,

provided the limit exists.
[ I do not think that Reed actually defines what is meant by

lim_{x→∞} g(x),

where here g(x) = ∫_a^x f. But it is clear what the definition in terms of sequences should be. Namely,

lim_{x→∞} g(x) = L if whenever a sequence x_n → ∞ then g(x_n) → L.

The definition of x_n → ∞ (i.e. (x_n) "diverges" to infinity) is the natural one and is given on p 32 of Reed.

There is also a natural, and equivalent, definition which does not involve sequences. Namely,

lim_{x→∞} g(x) = L if for every ε > 0 there is a real number K such that x ≥ K implies |g(x) − L| ≤ ε. ]
Similar definitions apply for integrals over (a, b], (a, b), R, etc.

Note the subtlety in Reed Example 5 p 116. The integral ∫_1^∞ (sin x)/x dx exists in the limit sense defined above, because of "cancellation" of successive positive and negative bumps, but the "area" above the axis and the "area" below the axis are both infinite. This is analogous to the fact that the series 1 − 1/2 + 1/3 − 1/4 + ··· converges, although the series of positive terms and the series of negative terms each diverge.
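This cancellation is visible numerically. A sketch (my own illustration, not from Reed): the signed integral over [1, b] settles down as b grows (integration by parts bounds the tail by roughly 2/b), while the integral of |sin x|/x keeps growing like log b.

```python
# A sketch (my own illustration): the improper integral of sin(x)/x over
# [1, b] settles down as b grows, while the integral of |sin(x)|/x keeps
# growing like log b.

import math

def trapezoid(f, a, b, n=100000):
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

f = lambda x: math.sin(x) / x
g = lambda x: abs(math.sin(x)) / x

I1, I2 = trapezoid(f, 1, 100), trapezoid(f, 1, 200)
A1, A2 = trapezoid(g, 1, 100), trapezoid(g, 1, 200)
print(abs(I2 - I1) < 0.02)   # True: the signed integral is settling down
print(A2 - A1 > 0.3)         # True: the total "area" keeps growing
```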
9. DIFFERENTIATION 30
9. Diﬀerentiation
9.1. Material from last year. Read Sections 4.1, 4.2 of Reed and the MATH1116 Notes.

Suppose f : S → R, x ∈ S, and there is some open interval containing x which is a subset of S. (Usually S will itself be an interval, but not necessarily open or bounded.)

Then f′(x), the derivative of f at x, is defined in the usual way as

lim_{h→0} (f(x + h) − f(x))/h.
If S = [a, b], it is also convenient to define the derivatives at the endpoints a and b of S in the natural way by restricting h to be > 0 or < 0 respectively. See Reed p 126, three paragraphs from the bottom, and p 108 (or the MATH1116 notes).

In Reed Section 4.1 note in particular Theorem 4.1 (differentiability at any point implies continuity there), Theorem 4.2 (the usual rules for differentiating sums, products and quotients), and Theorem 4.3 (the chain rule; you may prefer the proof given in Adams).

The next main result is the justification for using derivatives to find maximum (and similarly minimum) points: if a continuous function takes a maximum at c, and f′(c) exists, then f′(c) = 0.
9.2. Fundamental theorem of calculus. This connects differentiation and integration, see Reed Theorems 4.2.4 and 4.2.5. (The proof in Reed of Theorem 4.2.4 is different from that in the MATH1116 notes.)
9.3. Mean value theorem and Taylor’s theorem. See Reed Section 4.3
and Adams pages 285–290, 584, 585.
The Mean Value Theorem (Reed Theorem 4.2.3) says geometrically that "the slope of the chord joining two points on the graph of a differentiable function equals the derivative at some point between them".

Taylor's Theorem, particularly with the Lagrange form of the remainder as in Reed page 136, Theorem 4.3.1, is a generalisation.

Note also how L'Hôpital's Rule, Theorem 4.3.2 page 138, follows from Taylor's theorem.
9.4. Newton’s method. See Reed Section 4.4 and Adams pages 192–195.
Newton's method, and its generalisation to finding zeros of functions f : R^n → R^n for n > 1 (see Adams pages 745–748), is very important for both computational and theoretical reasons.
As we see in Reed page 141, if x_n is the nth approximation to the solution x of f(x) = 0, then

x_{n+1} = x_n − f(x_n)/f′(x_n).
Remarks on proof of Theorem 4.4.1. We want to estimate |x_{n+1} − x| in terms of |x_n − x|.

Applying the above formula, and both Taylor's Theorem and the Mean Value Theorem, gives

|x_{n+1} − x| ≤ (1/|f′(x_n)|) ( |f″(τ_n)| + |f″(τ′_n)|/2! ) |x_n − x|²

for some τ_n and τ′_n between x_n and x, see Reed page 143 formula (14).
[diagram: graph of f(x) = x³]
Assume f′(x) ≠ 0 and f is C² [28] in some interval containing x. Then provided |x_n − x| < ε (say), it follows that

(1/|f′(x_n)|) ( |f″(τ_n)| + |f″(τ′_n)|/2! ) ≤ C        (12)

for some constant C (depending on f). Hence

|x_{n+1} − x| ≤ C |x_n − x|².        (13)
We say that (x_n) converges to x quadratically fast. See the discussion at the end of the first paragraph on page 144 of Reed.

The only remaining point in the proof of Theorem 4.4.1 in Reed, pages 142, 143, is to show that if |x_1 − x| < ε (and so (13) is true for n = 1) then |x_n − x| < ε for all n (and so (13) is true for all n). But from (13) with n = 1 we have

|x_2 − x| ≤ (1/2)|x_1 − x|

provided |x_1 − x| ≤ 1/(2C). Thus x_2 is even closer to x than x_1, and so (13) is true with n = 2, etc.

Note that even though the proof only explicitly states linear convergence in (16), in fact it also gives the much better quadratic convergence as noted on page 144 line 4.
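The quadratic convergence in (13) is striking in practice. A sketch (my own illustration, not from Reed) for f(x) = x² − 2, whose positive root is √2:

```python
# A sketch (my own illustration) of estimate (13): for f(x) = x^2 - 2 the
# Newton iterates x_{n+1} = x_n - f(x_n)/f'(x_n) converge to sqrt(2), and
# the error roughly squares at each step.

import math

f = lambda x: x * x - 2
fp = lambda x: 2 * x

x, errors = 1.5, []
for _ in range(5):
    errors.append(abs(x - math.sqrt(2)))
    x = x - f(x) / fp(x)

print(errors)  # roughly 8.6e-2, 2.5e-3, 2.1e-6, 1.6e-12, then machine zero
```

The number of correct decimal digits roughly doubles with each iteration, which is exactly what |x_{n+1} − x| ≤ C|x_n − x|² predicts.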
9.5. Monotonic functions. See Reed Section 4.5, Adams Section 4.1 and pages 258–260.

A function f : I → R is strictly increasing if x < y implies f(x) < f(y). Unless stated otherwise, we always take the domain I to be an interval. Similarly one defines strictly decreasing. A function is strictly monotonic if it is either strictly increasing or strictly decreasing.

If f is differentiable on I and f′ > 0, then f is strictly increasing (Reed page 150, problem 1; hint: use the Mean Value Theorem). But the derivative of a strictly increasing function may somewhere be zero, e.g. f(x) = x³.
A strictly monotonic function f is one–one and so has an inverse f^{−1}, defined by

f^{−1}(y) = the unique x such that f(x) = y.

The domain of f^{−1} is the range of f, and the range of f^{−1} is the domain of f.

If f : I → R is strictly monotonic and continuous, then its range is an interval and its inverse exists and is strictly monotonic and continuous. See Reed page 83, Corollary 3.2.4, and page 147, Theorem 4.5.1.
[28] That is, f′ and f″ exist and are continuous.
If f : I → R is strictly monotonic and continuous, and is differentiable at a with f′(a) ≠ 0, then f^{−1} is differentiable at b = f(a) and [29]

(f^{−1})′(b) = 1/f′(a).
If f : I → R is strictly monotonic and continuous, and everywhere differentiable with f′(x) ≠ 0, then by the above result f^{−1} is everywhere differentiable. Moreover, the derivative is continuous, since

(f^{−1})′(y) = 1/f′(f^{−1}(y))

by the previous formula, and both f^{−1} and f′ are continuous. (This is Reed Corollary 4.5.3.)
The change of variables formula, Theorem 4.5.4, does not really require φ to be strictly increasing, as noted in Reed page 151, problem 6. (See Adams page 326, Theorem 6, for the simpler proof of this more general fact, using the Fundamental Theorem of Calculus.) The proof in Reed does have the advantage that it generalises to situations where the Fundamental Theorem cannot be applied.

The result of Theorem 4.5.5 also follows from the Fundamental Theorem of Calculus, see Reed page 151, problem 7.
[29] Loosely speaking, dy/dx = 1/(dx/dy). But this notation can cause problems and be ambiguous.
10. BASIC METRIC AND TOPOLOGICAL NOTIONS IN R^n 33
10. Basic metric and topological notions in R^n
See Adams page 643 (third edition), or (better) fourth edition pages 601,602.
See also Reed problem 10 page 40 and the ﬁrst two paragraphs in Section 4.6. But
we do considerably more.
10.1. Euclidean n-space. We define

R^n = { (x_1, ..., x_n) | x_1, ..., x_n ∈ R }.

Thus R^n denotes the set of all (ordered) n-tuples (x_1, ..., x_n) of real numbers.
We think of R¹, R² and R³ as the real line, the plane and 3-space respectively. But for n ≥ 4 we cannot think of R^n as consisting of points in physical space. Instead, we usually think of points in R^n as just being given algebraically by n-tuples (x_1, ..., x_n) of real numbers. (We sometimes say (x_1, ..., x_n) is a vector, but then you should think of a vector as a point, rather than as an "arrow".)
In R² and R³ we usually denote the coordinates of a point by (x, y) and (x, y, z), but in higher dimensions we usually use the notation (x_1, ..., x_n), etc.

Although we usually think of points in R^n as n-tuples, rather than physical points, we are still motivated and guided by the geometry in the two and three dimensional cases. So we still speak of "points" in R^n.
We define the (Euclidean) distance [30] between x = (x_1, ..., x_n) and y = (y_1, ..., y_n) by

d(x, y) = √( (x_1 − y_1)² + ··· + (x_n − y_n)² ).

This agrees with the usual distance in case n = 1, 2, 3.
We also call the set of points (x_1, ..., x_n) ∈ R^n satisfying any equation of the form

    a_1 x_1 + ... + a_n x_n = b,

a hyperplane. Sometimes we restrict this definition to the case b = 0 (in which case the hyperplane "passes through" the origin).
10.2. The Triangle and Cauchy–Schwarz inequalities. The Euclidean distance function satisfies three important properties. Any function d on a set S (not necessarily R^n) which satisfies these properties is called a metric, and S is called a metric space with metric d. We will discuss general metrics later. But you should observe that, unless noted or otherwise clear from the context, the proofs and definitions in Sections 10–14 carry over directly to general metric spaces. We discuss this in more detail in Section 15.
Theorem 10.2.1. Let d denote the Euclidean distance function in R^n. Then
1. d(x, y) ≥ 0; d(x, y) = 0 iff x = y. (positivity)
2. d(x, y) = d(y, x). (symmetry)
3. d(x, y) ≤ d(x, z) + d(z, y). (triangle inequality)
Proof. The first two properties are immediate.
The third follows from the Cauchy–Schwarz inequality, which we prove in the next theorem. To see this let

    u_i = x_i − z_i,    v_i = z_i − y_i.

Then

    d(x, y)^2 = Σ_i (x_i − y_i)^2
              = Σ_i (x_i − z_i + z_i − y_i)^2
              = Σ_i (u_i + v_i)^2
              = Σ_i (u_i^2 + 2 u_i v_i + v_i^2)
              = Σ_i u_i^2 + 2 Σ_i u_i v_i + Σ_i v_i^2
              ≤ Σ_i u_i^2 + 2 (Σ_i u_i^2)^{1/2} (Σ_i v_i^2)^{1/2} + Σ_i v_i^2    (by the Cauchy–Schwarz inequality)
              = ( (Σ_i u_i^2)^{1/2} + (Σ_i v_i^2)^{1/2} )^2
              = ( d(x, z) + d(z, y) )^2.

The triangle inequality now follows.

Footnote 30: Later we will define other distance functions (i.e. metrics, see the next section) on R^n, but the Euclidean distance function is the most important.
The following is a fundamental inequality.[31] Here is one of a number of possible proofs.
Theorem 10.2.2 (Cauchy–Schwarz inequality). Suppose (u_1, ..., u_n) ∈ R^n and (v_1, ..., v_n) ∈ R^n. Then

    | Σ_i u_i v_i | ≤ (Σ_i u_i^2)^{1/2} (Σ_i v_i^2)^{1/2}.    (14)

Moreover, equality holds iff at least one of (u_1, ..., u_n), (v_1, ..., v_n) is a multiple of the other (in particular if one is (0, ..., 0)).
Proof. Let

    f(t) = Σ_i (u_i + t v_i)^2 = Σ_i u_i^2 + 2t Σ_i u_i v_i + t^2 Σ_i v_i^2.
Then f is a quadratic in t (provided (v_1, ..., v_n) ≠ (0, ..., 0)), and from the first expression for f(t), f(t) ≥ 0 for all t.
It follows that f(t) = 0 has no real roots (in case f(t) > 0 for all t) or two equal real roots (in case f(t) = 0 for some t). Hence
    4 (Σ_i u_i v_i)^2 − 4 Σ_i u_i^2 Σ_i v_i^2 ≤ 0.    (15)
This immediately gives the Cauchy–Schwarz inequality.
If one of (u_1, ..., u_n), (v_1, ..., v_n) is (0, ..., 0) or is a multiple of the other, then equality holds in (14). (Why?)
Conversely, if equality holds in (14), i.e. in (15), and (v_1, ..., v_n) ≠ (0, ..., 0), then f(t) is a quadratic (since the coefficient of t^2 is nonzero) with equal roots, and in particular f(t) = 0 for some t. This t gives u = −tv, from the first expression for f(t).
This completes the proof of the theorem.
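The inequality (14) and its equality case are easy to check numerically. The following Python sketch (not part of the notes; all names are my own) tests (14) on random vectors, and verifies equality when one vector is a multiple of the other.

```python
import math
import random

def cs_gap(u, v):
    # RHS minus LHS of (14); nonnegative by the Cauchy-Schwarz inequality.
    lhs = abs(sum(ui * vi for ui, vi in zip(u, v)))
    rhs = math.sqrt(sum(ui * ui for ui in u)) * math.sqrt(sum(vi * vi for vi in v))
    return rhs - lhs

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 6)
    u = [random.uniform(-10, 10) for _ in range(n)]
    v = [random.uniform(-10, 10) for _ in range(n)]
    assert cs_gap(u, v) >= -1e-9          # (14) holds, up to rounding error

# Equality when one vector is a multiple of the other:
u = [1.0, -2.0, 3.0]
v = [-2.5 * ui for ui in u]
assert abs(cs_gap(u, v)) < 1e-9
```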
Remark 10.2.1. In the preceding we discussed the (standard) metric on R^n. It can be defined from the (standard) norm, and conversely. That is,

    d(x, y) = ‖x − y‖,    ‖x‖ = d(0, x).

The triangle inequality for the metric follows easily from the triangle inequality for the norm (i.e. ‖x + y‖ ≤ ‖x‖ + ‖y‖), and conversely (Exercise).

Footnote 31: The Cauchy–Schwarz inequality can be interpreted in terms of inner products as saying that |u · v| ≤ ‖u‖ ‖v‖. But we will not need to refer to inner products here.
10.3. Preliminary Remarks on Metric Spaces. Any function d on a set M (not necessarily R^n) which satisfies the three properties in Theorem 10.2.1 is called a metric, and M together with d is called a metric space.
You should now study Section 15.1, except possibly for Example 15.1.3.
We will discuss general metrics later. But you should observe that, unless noted or otherwise clear from context, the proofs and definitions in Sections 10–14 carry over directly to general metric spaces. We discuss this in more detail in Section 15.
10.4. More on intersections and unions. The intersection and union of two sets was defined in Reed page 7. The intersection and union of more than two sets is defined similarly. The intersection and union of a (possibly infinite) collection of sets indexed by the members of some set J are defined by

    ∩_{λ∈J} A_λ = { x | x ∈ A_λ for all λ ∈ J },
    ∪_{λ∈J} A_λ = { x | x ∈ A_λ for some λ ∈ J }.
For example:

    ∩_{n∈N} [0, 1/n] = {0},    ∩_{n∈N} (0, 1/n) = ∅,    ∩_{n∈N} (−1/n, 1/n) = {0},
    ∩_{n∈N} [n, ∞) = ∅,    ∪_{r∈Q} (r − ε, r + ε) = R (for any fixed ε > 0),    ∪_{r∈R} {r} = R.
It is an easy generalisation of De Morgan's law that

    (A_1 ∪ ... ∪ A_k)^c = A_1^c ∩ ... ∩ A_k^c.
More generally,

    (∪_{λ∈J} A_λ)^c = ∩_{λ∈J} A_λ^c.
Also,

    (A_1 ∩ ... ∩ A_k)^c = A_1^c ∪ ... ∪ A_k^c

and

    (∩_{λ∈J} A_λ)^c = ∪_{λ∈J} A_λ^c.
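For finite families these identities can be verified directly; the Python sketch below (illustrative only, with a universal set and family chosen for the example) checks both forms of De Morgan's law.

```python
# A finite universal set and a family A_1, ..., A_k of subsets (chosen arbitrarily).
U = set(range(20))
family = [set(range(0, 10)), set(range(5, 15)), {1, 2, 17}]

def complement(A):
    # Complement taken relative to the universal set U.
    return U - A

# (A_1 ∪ ... ∪ A_k)^c = A_1^c ∩ ... ∩ A_k^c
assert complement(set().union(*family)) == set.intersection(*[complement(A) for A in family])

# (A_1 ∩ ... ∩ A_k)^c = A_1^c ∪ ... ∪ A_k^c
assert complement(set.intersection(*family)) == set().union(*[complement(A) for A in family])
```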
10.5. Open sets. The definitions in this and the following sections generalise the notions of open and closed intervals and endpoints on the real line to R^n. You should think of the cases n = 2, 3.
A neighbourhood of a point a ∈ R^n is a set of the form

    B_r(a) = { x ∈ R^n | d(x, a) < r }
[Diagram: a set S with an interior point and a boundary point marked; points in S, points not in S (S^c), and the boundary ∂S are labelled.]
for some r > 0. We also say that B_r(a) is the open ball of radius r centred at a. In case n = 1 we say B_r(a) is an open interval centred at a, and in case n = 2 we also call it the open disc of radius r centred at a.
Definition 10.5.1. A set S ⊆ R^n is open (in R^n)[32] if for every a ∈ S there is a neighbourhood of a which is a subset of S.
Every neighbourhood is itself open (why?). Other examples in R^2 are
1. { x ∈ R^2 | d(x, a) > r },
2. { (x, y) | x > 0 },
3. { (x, y) | y > x^2 },
4. { (x, y) | y ≠ x^2 },
5. { (x, y) | x > 3 or y < x^2 }, etc.
Typically, sets described by strict inequalities "<", ">", or "≠", are open.[33] An example in R^3 is

    { (x, y, z) | x^2 + y^2 < z^2 and sin x < cos y }.
By the following theorem, the intersection of a ﬁnite number of open sets is
open, but the union of any number (possibly inﬁnite) of open sets is open. The
third example in the previous section shows that the intersection of an inﬁnite
collection of open sets need not be open.
Theorem 10.5.2.
1. The sets ∅ and R^n are both open.
2. If A_1, ..., A_k is a finite collection of open subsets of R^n, then their intersection A_1 ∩ ... ∩ A_k is open.
3. If A_λ is an open subset of R^n for each λ ∈ J, then the union ∪_λ A_λ of all the sets A_λ is also open.
Proof.
1. The first part is trivial (the empty set is open because every point in it certainly has the required property!).
2. Consider any point a ∈ A_1 ∩ ... ∩ A_k. Then a ∈ A_k for every k, and since A_k is open there is a real number r_k > 0 such that B_{r_k}(a) ⊆ A_k. Choosing r = min{r_1, ..., r_k} we see that r > 0, and B_r(a) ⊆ A_k for each k. It follows[34] that

    B_r(a) ⊆ A_1 ∩ ... ∩ A_k.

Hence A_1 ∩ ... ∩ A_k is open.
3. Consider any point a ∈ ∪_{λ∈J} A_λ. Then a ∈ A_λ for some λ ∈ J. For this λ there is an r > 0 such that B_r(a) ⊆ A_λ. It follows (from the definition of ∪) that

    B_r(a) ⊆ ∪_λ A_λ.

Hence ∪_{λ∈J} A_λ is open.

Footnote 32: We sometimes say S is open in R^n because there is a more general notion of S being open in E for any set E such that S ⊆ E. When we say S is open we will always mean open in R^n, unless otherwise stated.

Footnote 33: Let S = { (x_1, ..., x_n) | f(x_1, ..., x_n) > g(x_1, ..., x_n) }. If f and g are continuous (we rigorously define continuity for functions of more than one variable later) and a = (a_1, ..., a_n) is a point in S, then all points in a sufficiently small neighbourhood of a will also satisfy the corresponding inequality and so be in S (see later). It follows that S is open.
If a set is described by a finite number of strict inequalities in terms of "and" and "or", then it will be obtained by taking finite unions and intersections of open sets as above, and hence be open by Theorem 10.5.2.
The reason the proof of 2. does not show the intersection of an infinite number of open sets is always open is that we would need to take r to be the infimum of an infinite set of positive numbers. Such an infimum may be 0. Consider, for example, ∩_{n∈N} (−1/n, 1/n) = {0} from the previous section. If a = 0 then r_n = 1/n, and the infimum of all the r_n is 0.
10.6. Closed sets.
Definition 10.6.1. A set C is closed (in R^n)[35] if its complement (in R^n) is open.
If, in the examples after Definition 10.5.1, ">", "<" and ≠ are replaced by "≥", "≤" and =, then we have examples of closed sets. Generally, sets defined in terms of non-strict inequalities and = are closed.[36] Closed intervals in R are closed sets.
Theorem 10.6.2.
1. The sets ∅ and R^n are both closed.
2. If A_1, ..., A_k is a finite collection of closed subsets of R^n, then their union A_1 ∪ ... ∪ A_k is closed.
3. If A_λ is a closed subset of R^n for each λ ∈ J, then the intersection ∩_{λ∈J} A_λ of all the sets A_λ is also closed.
Proof.
1. Since ∅ and R^n are complements of each other, the result follows from the corresponding part of the previous theorem.
2. By De Morgan's Law,

    (A_1 ∪ ... ∪ A_k)^c = A_1^c ∩ ... ∩ A_k^c.

Since A_1^c ∩ ... ∩ A_k^c is open from part 2 of the previous theorem, it follows that A_1 ∪ ... ∪ A_k is closed.
3. The proof is similar to 2. Namely,

    (∩_{λ∈J} A_λ)^c = ∪_{λ∈J} A_λ^c,

and so the result follows from 3 in the previous theorem.

Footnote 34: If B ⊆ A_1, ..., B ⊆ A_k, then it follows from the definition of A_1 ∩ ... ∩ A_k that B ⊆ A_1 ∩ ... ∩ A_k.

Footnote 35: By closed we will always mean closed in R^n. But as for open sets, there is a more general concept of being closed in an arbitrary E such that S ⊆ E.

Footnote 36: The complement of such a set is a set defined in terms of strict inequalities, and so is open. See the previous section.
10.7. Topological Spaces*. Motivated by the idea of open sets in R^n we make the following definition.
Definition 10.7.1. A set S, together with a collection of subsets of S (which are called the open sets in S) which satisfy the following three properties, is called a topological space.
1. The sets ∅ and S are both open.
2. If A_1, ..., A_k is a finite collection of open subsets of S, then their intersection A_1 ∩ ... ∩ A_k is open.
3. If A_λ is an open subset of S for each λ ∈ J, then the union ∪_λ A_λ of all the sets A_λ is also open.
Essentially the same arguments as used in the case of R^n with the Euclidean metric d allow us to define the notion of a neighbourhood of a point in any metric space (M, d), and then to define open sets in any metric space, and then to show as in Theorem 10.5.2 that this gives a topological space.
In any topological space (which as we have just seen includes all metric spaces) one can define the notion of a closed set in an analogous way to that in Section 10.6 and prove the analogue of Theorem 10.6.2. The notion of boundary, interior and exterior of an open set in any topological space is defined as in Section 10.8.
The theory of topological spaces is fundamental in contemporary mathematics.
10.8. Boundary, Interior and Exterior.
Definition 10.8.1. The point a is a boundary point of S if every neighbourhood of a contains both points in S and points not in S. The boundary of S (bdry S or ∂S) is the set of boundary points of S.
Thus ∂B_r(a) is the set of points whose distance from a equals r. See the diagram in Section 10.5.
Definition 10.8.2. The point a is an interior point of S if a ∈ S but a is not a boundary point of S. The point a is an exterior point of S if a ∈ S^c but a is not a boundary point of S.
The interior of S (int S) consists of all interior points of S. The exterior of S (ext S) consists of all exterior points of S.
It follows that:
• a is an interior point of S iff B_r(a) ⊆ S for some r > 0,
• a is an exterior point of S iff B_r(a) ⊆ S^c for some r > 0,
• R^n = int S ∪ ∂S ∪ ext S, and every point in R^n is in exactly one of these three sets.
Why?
Do exercises 1–8, page 647 of Adams, to reinforce the basic deﬁnitions.
10.9. Cantor Set. We next sketch a sequence of approximations A = A^(0), A^(1), A^(2), ... to the Cantor Set C, the most basic fractal.
We can think of C as obtained by first removing the open middle third (1/3, 2/3) from [0, 1]; then removing the open middle third from each of the two closed intervals which remain; then removing the open middle third from each of the four closed intervals which remain; etc.
More precisely, let

    A = A^(0) = [0, 1]
    A^(1) = [0, 1/3] ∪ [2/3, 1]
    A^(2) = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1]
    ...

Let C = ∩_{n≥0} A^(n). Since C is the intersection of a family of closed sets, C is closed.
Note that A^(n+1) ⊆ A^(n) for all n, and so the A^(n) form a decreasing family of sets.
Consider the ternary expansion of numbers x ∈ [0, 1], i.e. write each x ∈ [0, 1] in the form

    x = .a_1 a_2 ... a_n ... = a_1/3 + a_2/3^2 + ... + a_n/3^n + ...    (16)

where a_n = 0, 1 or 2. Each number has either one or two such representations, and the only way x can have two representations is if

    x = .a_1 a_2 ... a_n 222... = .a_1 a_2 ... a_{n−1} (a_n + 1) 000...

for some a_n = 0 or 1. For example, .210222... = .211000... .
Note the following:
• x ∈ A^(n) iff x has an expansion of the form (16) with each of a_1, ..., a_n taking the values 0 or 2.
• It follows that x ∈ C iff x has an expansion of the form (16) with every a_n taking the values 0 or 2.
• Each endpoint of any of the 2^n intervals associated with A^(n) has an expansion of the form (16) with each of a_1, ..., a_n taking the values 0 or 2 and the remaining a_i either all taking the value 0 or all taking the value 2.
Exercise: Show that C is uncountable.
Next let

    S_1(x) = (1/3)x,    S_2(x) = 1 + (1/3)(x − 1).

Notice that S_1 is a dilation with dilation ratio 1/3 and fixed point 0. Similarly, S_2 is a dilation with dilation ratio 1/3 and fixed point 1.
[Diagram: sets A and B with product A × B; for a = (x, y), the ball B_t(a) is contained in B_t(x) × B_t(y).]
Then

    A^(n+1) = S_1[A^(n)] ∪ S_2[A^(n)].

Moreover,

    C = S_1[C] ∪ S_2[C].
The Cantor set is a fractal. It is composed of two parts, each obtained from the original set by contracting by scaling (by 1/3).
If you think of a one, two or three dimensional set (e.g. line, square or cube), contracting by 1/3 should give a set which is (1/3)^d of the original set in "size". Thus if d is the "dimension" of the Cantor set, then we expect

    1/2 = (1/3)^d.

This gives log 2 = d log 3, or d = log 2/log 3 ≈ .6309.
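The approximations A^(n) are easy to generate by iterating the maps S_1, S_2 introduced above. The following Python sketch (my own illustration, using exact rational arithmetic) builds the intervals of A^(n), checks the total length (2/3)^n, and evaluates the dimension log 2/log 3.

```python
from fractions import Fraction
from math import log

def cantor_level(n):
    # Intervals of A^(n), built by applying S1(x) = x/3 and S2(x) = 1 + (x - 1)/3
    # to each interval of the previous level, starting from [0, 1].
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = sorted(
            [(a / 3, b / 3) for a, b in intervals] +
            [(1 + (a - 1) / 3, 1 + (b - 1) / 3) for a, b in intervals]
        )
    return intervals

assert cantor_level(1) == [(0, Fraction(1, 3)), (Fraction(2, 3), 1)]
assert len(cantor_level(5)) == 2 ** 5
# Total length (2/3)^n -> 0 as n -> oo, while the "dimension" is log 2 / log 3.
assert sum(b - a for a, b in cantor_level(5)) == Fraction(2, 3) ** 5
assert abs(log(2) / log(3) - 0.6309) < 5e-4
```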
10.10. Product sets.
Theorem 10.10.1. If A ⊆ R^m and B ⊆ R^n are both open (closed) then so is the product A × B ⊆ R^{m+n}.
Proof. For simplicity of notation, take m = n = 1.
First suppose A and B are open.
If a = (x, y) ∈ A × B ⊆ R^2 then x ∈ A and y ∈ B. Choose r and s so B_r(x) ⊆ A and B_s(y) ⊆ B. Let t = min{r, s}. Then

    B_t(a) ⊆ B_r(x) × B_s(y) ⊆ A × B.
The first "⊆" is because

    (x_0, y_0) ∈ B_t(a) ⇒ d((x_0, y_0), (x, y)) < t
                        ⇒ d(x_0, x) < t and d(y_0, y) < t
                        ⇒ x_0 ∈ B_t(x) and y_0 ∈ B_t(y)
                        ⇒ (x_0, y_0) ∈ B_r(x) × B_s(y).

Since B_t(a) ⊆ A × B it follows that A × B is open.
Next suppose A and B are closed. Note that

    (A × B)^c = (A^c × R) ∪ (R × B^c),

since (x, y) ∉ A × B iff either x ∉ A or y ∉ B (or both). Hence the right side is the union of two open sets by the above, and hence is open. Hence A × B is closed.
[Diagram: a sequence of points x_1, x_3, x_5, x_6, x_8, ... in the plane converging to a point x. The shown sequence converges, as does its projection onto the x- and y-axes.]
11. Sequences in R^p
See Reed Problem 8 page 51, Problems 10–12 on page 59 and the ﬁrst Deﬁnition
on page 152. But we do much more!
11.1. Convergence.
Definition 11.1.1. A sequence of points (x_n) from R^p converges to x if d(x_n, x) → 0.
Thus the notion of convergence of a sequence from R^p is reduced to the notion of convergence of a sequence of real numbers to 0.
Example 11.1.2. Let θ ∈ R and a = (x*, y*) ∈ R^2 be fixed, and let

    a_n = ( x* + (1/n) cos nθ, y* + (1/n) sin nθ ).

Then a_n → a as n → ∞. The sequence (a_n) spirals around a with d(a_n, a) = 1/n and with rotation by the angle θ in passing from a_n to a_{n+1}.
The next theorem shows that a sequence from R^p converges iff each sequence of components converges. Think of the case p = 2.
Proposition 11.1.3. The sequence x_n → x iff x_n^(k) → x^(k) for k = 1, ..., p.
Proof. Assume x_n → x. Then d(x_n, x) → 0. Since |x_n^(k) − x^(k)| ≤ d(x_n, x), it follows that |x_n^(k) − x^(k)| → 0 and so x_n^(k) → x^(k), as n → ∞.
Next assume x_n^(k) → x^(k) for each k, i.e. |x_n^(k) − x^(k)| → 0, as n → ∞. Since

    d(x_n, x) = √( (x_n^(1) − x^(1))^2 + ... + (x_n^(p) − x^(p))^2 ),

it follows that d(x_n, x) → 0 and so x_n → x.
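Example 11.1.2 and Proposition 11.1.3 can be illustrated numerically. In the Python sketch below (the values of θ and a are my own choices) the distance to a is exactly 1/n, and each component converges as well.

```python
import math

theta, x_star, y_star = 0.7, 2.0, -1.0   # arbitrary illustrative choices

def a_n(n):
    # The spiral of Example 11.1.2 around a = (x*, y*).
    return (x_star + math.cos(n * theta) / n, y_star + math.sin(n * theta) / n)

def dist(p, q):
    # Euclidean distance in R^2.
    return math.hypot(p[0] - q[0], p[1] - q[1])

for n in (10, 100, 1000):
    p = a_n(n)
    assert abs(dist(p, (x_star, y_star)) - 1 / n) < 1e-12   # d(a_n, a) = 1/n
    # Componentwise convergence, as in Proposition 11.1.3:
    assert abs(p[0] - x_star) <= 1 / n and abs(p[1] - y_star) <= 1 / n
```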
11.2. Closed sets. The reason we call a set "closed" is that it is closed under the operation of taking limits of any sequence of points from the set. This is the content of the next theorem.
Theorem 11.2.1. A set S ⊆ R^p is closed iff every convergent sequence (x_n) ⊆ S has its limit in S.
Proof. First suppose S is closed. Let (x_n) ⊆ S be a convergent sequence with limit x.
Assume x ∉ S, i.e. x ∈ S^c. Since S^c is open, there is an r > 0 such that B_r(x) ⊆ S^c. But since x_n → x, ultimately[37] x_n ∈ B_r(x) ⊆ S^c. This contradicts the fact x_n ∈ S for all n. Hence the assumption is wrong and so x ∈ S.

Footnote 37: Ultimately means there is an N such that this is true for all n ≥ N.

[Diagrams: (i) if x_n → x then ultimately x_n ∈ B_r(x); (ii) if for each n there is a member of S in B_{1/n}(x), then there is a sequence from S converging to x.]
Next suppose that every convergent sequence (x_n) ⊆ S has its limit in S. Assume S is not closed, i.e. S^c is not open. Then there is a point x ∈ S^c such that for no r > 0 is it true that B_r(x) ⊆ S^c. In particular, for each natural number n there is a point in B_{1/n}(x) which belongs to S. Choose such a point and call it x_n. Then x_n → x since d(x_n, x) ≤ 1/n → 0.
This means that the sequence (x_n) ⊆ S but its limit is not in S. This is a contradiction and so the assumption is wrong. Hence S is closed.
11.3. Bounded sets.
Definition 11.3.1. A set S ⊆ R^p is bounded if there is a point a ∈ R^p and a real number R such that S ⊆ B_R(a).
A sequence (x_n) is bounded if there is a point a ∈ R^p and a real number R such that x_n ∈ B_R(a) for every n.
Thus a sequence is bounded iﬀ the corresponding set of members is bounded.
Proposition 11.3.2. A set is bounded iff there is a ball (of finite radius) centred at the origin which includes the set.
Proof. This is clear in R^2 from a diagram. It is proved from the triangle inequality as follows.
Suppose S ⊆ B_R(a). Let r = d(a, 0). Then for any x ∈ S,

    d(x, 0) ≤ d(x, a) + d(a, 0) < R + r.

Hence S ⊆ B_{R+r}(0).
[Diagram: a bounded set S contained in a ball of radius R, together with its projections P_1(S) and P_2(S) onto the coordinate axes.]
The following theorem says that a set S is bounded iff its projection onto each of the axes is bounded. The projection onto the k-th axis is defined by

    P_k(S) = { a | x ∈ S for some x whose k-th component is a }.

Theorem 11.3.3. A set S ⊆ R^p is bounded iff P_k(S) is bounded for each k = 1, ..., p.
Proof. If S ⊆ B_R(0) then each P_k(S) ⊆ [−R, R] and so is bounded.
Conversely, if each P_k(S) is bounded then, by choosing the largest interval, we may assume P_k(S) ⊆ [−M, M] for every k. It follows that if x ∈ S then every component of x is at most M in absolute value. It follows that

    |x| ≤ √(p M^2),

i.e. S ⊆ B_R(0) where R = √p M.
In particular, if a sequence of points from R^p is bounded, then so are the sequences of real numbers corresponding to the first component, to the second component, etc.
Proposition 11.3.4. A convergent sequence is bounded.
Proof. Suppose x_n → x. Then ultimately x_n ∈ B_1(x), say for n ≥ N, and so the set of terms { x_n | n ≥ N } is bounded. The set {x_1, ..., x_{N−1}} is also bounded (being finite), and so the set of all terms is bounded, being the union of two bounded sets. In other words, the sequence is bounded.
11.4. Bolzano–Weierstraß theorem. As for sequences of real numbers, a subsequence of a sequence is obtained by skipping terms.
Proposition 11.4.1. If (x_n) ⊆ R^p and x_n → x, then any subsequence also converges to x.
Proof. Let (x_{n(k)})_{k≥1} be a subsequence of (x_n). Suppose ε > 0 is given. Since x_n → x there is an integer N such that

    k ≥ N ⇒ d(x_k, x) ≤ ε.

But if k ≥ N then also n(k) ≥ N and so

    k ≥ N ⇒ d(x_{n(k)}, x) ≤ ε.

That is, x_{n(k)} → x.
The following theorem is analogous to the Bolzano–Weierstraß Theorem 5.2.2 for real numbers, and in fact follows from it.
Theorem 11.4.2 (Bolzano–Weierstraß Theorem). If a sequence (a_n) ⊆ R^p is bounded then it has a convergent subsequence. If all members of (a_n) belong to a closed bounded set S, then so does the limit of any convergent subsequence of (a_n).
Proof. For notational simplicity we prove the case p = 2, but clearly the proof easily generalises to p > 2.
Suppose then that (a_n) ⊆ R^p is bounded. Write a_n = (x_n, y_n). Then the sequences (x_n) and (y_n) are also bounded (see the paragraph after Theorem 11.3.3).
Let (x_{n(k)}) be a convergent subsequence of (x_n) (this uses the Bolzano–Weierstraß Theorem 5.2.2 for real sequences).
Now consider the sequence ((x_{n(k)}, y_{n(k)})). Let (y_{n′(k)}) be a convergent subsequence of (y_{n(k)}) (again by the Bolzano–Weierstraß Theorem for real sequences). Note that (x_{n′(k)}) is a subsequence of the convergent sequence (x_{n(k)}), and so also converges.
Since (x_{n′(k)}) and (y_{n′(k)}) converge, so does (a_{n′(k)}) = ((x_{n′(k)}, y_{n′(k)})).[38]
Finally, since S is closed, any convergent subsequence from S must have its limit in S.
11.5. Cauchy sequences.
Definition 11.5.1. A sequence (x_n) is a Cauchy sequence if for any ε > 0 there is a corresponding N such that

    m, n ≥ N ⇒ d(x_m, x_n) ≤ ε.

That is, ultimately any two members of the sequence (not necessarily consecutive members) are within ε of each other.
Footnote 38: After a bit of practice, we would probably write the previous argument as follows:
Proof. Since (a_n) ⊆ R^p, where a_n = (x_n, y_n), is bounded, so is each of the component sequences.
By the Bolzano–Weierstraß Theorem for real sequences, on passing to a subsequence (but without changing notation) we may assume (x_n) converges. Again by the Bolzano–Weierstraß Theorem for real sequences, on passing to a further subsequence (without changing notation) we may assume that (y_n) also converges. (This gives a new subsequence (x_n), but it converges since any subsequence of a convergent sequence is also convergent.)
Since the subsequences (x_n) and (y_n) converge, so does the subsequence (a_n) = ((x_n, y_n)).
[Diagram: a set S and its successive images F[S], F[F[S]] under a contraction F.]
Proposition 11.5.2. The sequence (x_n)_{n≥1} in R^p is Cauchy iff the component sequences (x_n^(k))_{n≥1} (of real numbers) are Cauchy for k = 1, ..., p.
Proof. (The proof is similar to that for Proposition 11.1.3.)
Assume (x_n) is Cauchy. Since |x_m^(k) − x_n^(k)| ≤ d(x_m, x_n), it follows that (x_n^(k))_{n≥1} is a Cauchy sequence, for k = 1, ..., p.
Conversely, assume (x_n^(k))_{n≥1} is a Cauchy sequence, for k = 1, ..., p. Since

    d(x_m, x_n) = √( (x_m^(1) − x_n^(1))^2 + ... + (x_m^(p) − x_n^(p))^2 ),

it follows that (x_n) is Cauchy.
Theorem 11.5.3. A sequence in R^p is Cauchy iff it converges.
Proof. A sequence is Cauchy iff each of the component sequences is Cauchy (Proposition 11.5.2) iff each of the component sequences converges (Theorem 5.1.2) iff the sequence converges (Proposition 11.1.3).
11.6. Contraction Mapping Principle in R^p. The reference is Reed Section 5.7, but with R^p and d instead of M and ρ. In Section 15.5 of these notes we will generalise this to any complete metric space with the same proof. But until then, omit Reed Examples 1 and 3 and all except the first four lines on page 208.
Suppose S ⊆ R^p. We say F : S → S is a contraction if there exists λ ∈ [0, 1) such that

    d(F(x), F(y)) ≤ λ d(x, y)

for all x, y ∈ S.
* A contraction map is continuous, and in fact uniformly continuous. More generally, any Lipschitz map F is continuous, where Lipschitz means that for some λ ∈ [0, ∞) and all x, y in the domain of F,

    d(F(x), F(y)) ≤ λ d(x, y).

The definitions of continuity and uniform continuity are essentially the same as in the one dimensional case, see Section 13.1.3. The proof that Lipschitz implies uniform continuity is essentially the same as in the one dimensional case, see Proposition 7.5.4.
A simple example of a contraction map on R^p is the map

    F(x) = a + r(x − b),    (17)

for 0 ≤ r < 1. In this case λ = r, as is easily checked. Since

    a + r(x − b) = ( b + r(x − b) ) + a − b,

we see (17) is dilation about b by the factor r, followed by translation by the vector a − b.
We say z is a fixed point of the map F : S → S if F(z) = z.
In the preceding example, the unique fixed point for any r ≠ 1 (even r > 1) is (a − rb)/(1 − r), as is easily checked.
The following theorem (and its generalisations to closed subsets of a complete metric space) is known as the Contraction Mapping Theorem, the Contraction Mapping Principle or the Banach Fixed Point Theorem. It has many important applications.
Theorem 11.6.1. Let F : S → S be a contraction map, where S ⊆ R^p is closed. Then F has a unique fixed point x.[39]
Moreover, iterates of F applied to any initial point x_0 converge to x, and

    d(x_n, x) ≤ (λ^n/(1 − λ)) d(x_0, F(x_0)).    (18)
Proof. We will find the fixed point as the limit of a Cauchy sequence.
Let x_0 be any point in S and define a sequence (x_n)_{n≥0} by

    x_1 = F(x_0), x_2 = F(x_1), x_3 = F(x_2), ..., x_n = F(x_{n−1}), ... .

Let λ be the contraction ratio.
1. Claim: (x_n) is Cauchy.
We have

    d(x_n, x_{n+1}) = d(F(x_{n−1}), F(x_n)) ≤ λ d(x_{n−1}, x_n).

By iterating this we get

    d(x_n, x_{n+1}) ≤ λ^n d(x_0, x_1).

Thus if m > n then

    d(x_n, x_m) ≤ d(x_n, x_{n+1}) + ... + d(x_{m−1}, x_m)
               ≤ (λ^n + ... + λ^{m−1}) d(x_0, x_1).    (19)

But

    λ^n + ... + λ^{m−1} ≤ λ^n (1 + λ + λ^2 + ...) = λ^n/(1 − λ) → 0 as n → ∞.

It follows (why?) that (x_n) is Cauchy.
Hence (x_n) converges to some limit x ∈ R^p. Since S is closed it follows that x ∈ S.
2. Claim: x is a fixed point of F.
We will show that d(x, F(x)) = 0 and so x = F(x). In fact

    d(x, F(x)) ≤ d(x, x_n) + d(x_n, F(x))
              = d(x, x_n) + d(F(x_{n−1}), F(x))
              ≤ d(x, x_n) + λ d(x_{n−1}, x)
              → 0

as n → ∞. This establishes the claim.
3. Claim: The fixed point is unique.
If x and y are fixed points, then F(x) = x and F(y) = y, and so

    d(x, y) = d(F(x), F(y)) ≤ λ d(x, y).

Since 0 ≤ λ < 1 this implies d(x, y) = 0, i.e. x = y.

Footnote 39: In other words, F has exactly one fixed point.
4. From (19) and the lines which follow it, we have

    d(x_n, x_m) ≤ (λ^n/(1 − λ)) d(x_0, F(x_0)).

Since x_m → x, (18) follows.
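Theorem 11.6.1 is easy to watch in action. The Python sketch below (the parameter values are my own choices) iterates the affine contraction (17) on R and checks the error estimate (18) at every step.

```python
a, b, r = 1.0, 4.0, 0.5                 # contraction ratio lam = r
F = lambda x: a + r * (x - b)           # the map (17)
fixed = (a - r * b) / (1 - r)           # its unique fixed point, as noted in the text

x0 = 10.0
c = abs(x0 - F(x0))                     # d(x_0, F(x_0))
x = x0
for n in range(1, 30):
    x = F(x)
    # Estimate (18): d(x_n, x) <= lam^n/(1 - lam) * d(x_0, F(x_0)).
    assert abs(x - fixed) <= r**n / (1 - r) * c + 1e-12

assert abs(x - fixed) < 1e-6            # the iterates converge to the fixed point
```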
Remark 11.6.1. It is essential that there is a fixed λ < 1. For example, the function F(x) = x^2 is a contraction on [0, a] for each 0 ≤ a < 1/2, but is not a contraction on [0, 1/2]. However, in this case there is still a fixed point in [0, 1/2], namely 0.
To obtain an example where d(F(x), F(y)) < d(x, y) for all x, y ∈ S but there is no fixed point, consider a function F with the properties

    x < F(x),    0 ≤ F′(x) < 1,

for all x ∈ [0, ∞); see the following diagram. (Can you write down an analytic expression?)
By the Mean Value Theorem, d(F(x), F(y)) < d(x, y) for all x, y ∈ R. But since x < F(x) for all x, there is no fixed point.
Example 11.6.2 (Approximating √2). The map

    F(x) = (1/2)(x + 2/x)

takes [1, ∞) into itself and is a contraction with λ = 1/2.
The first claim is clear from a graph, and can be then checked in various standard ways.

[Diagram: graph of F on [1, ∞); it crosses the line y = x at √2.]
Since F′(x) = 1/2 − 1/x^2, it follows for x ∈ [1, ∞) that

    −1/2 ≤ F′(x) < 1/2.

Hence

    |F(x_1) − F(x_2)| ≤ (1/2)|x_1 − x_2|

by the Mean Value Theorem. This proves F is a contraction with λ = 1/2.
The unique fixed point is √2. It follows that we can approximate √2 by beginning from any x_0 ∈ [1, ∞) and iterating F. In particular, the following provide approximations to √2:

    1, 3/2, 17/12 ≈ 1.417, 577/408 ≈ 1.4142156.

(Whereas √2 = 1.41421356... .) This was known to the Babylonians, nearly 4000 years ago.
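The iterates listed above are easy to reproduce in exact arithmetic; the short Python sketch below (my own illustration) regenerates the fractions.

```python
from fractions import Fraction

def F(x):
    # F(x) = (x + 2/x)/2, a contraction on [1, oo) with lam = 1/2.
    return (x + 2 / x) / 2

xs = [Fraction(1)]
for _ in range(3):
    xs.append(F(xs[-1]))

# The approximations from the text: 1, 3/2, 17/12, 577/408.
assert xs == [1, Fraction(3, 2), Fraction(17, 12), Fraction(577, 408)]
assert abs(float(xs[-1]) - 2 ** 0.5) < 1e-5    # 577/408 = 1.4142156...
```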
Example 11.6.3 (Stable fixed points). See Reed, Theorem 5.7.2 on page 206 and the preceding paragraph.
Suppose F : S → S is continuously differentiable on some open interval S ⊆ R and F(x*) = x* (i.e. x* is a fixed point of F). We say x* is a stable fixed point if there is an ε > 0 such that F^n(x_0) → x* for every x_0 ∈ [x* − ε, x* + ε].
If |F′(x*)| < 1 it follows from the Mean Value Theorem and the Contraction Mapping Principle that x* is a stable fixed point of F (Reed, Theorem 5.7.2 page 206).
Example 11.6.4 (The Quadratic Map). See Reed Example 4 on page 208.
In Section 6 we considered the quadratic map

    F : [0, 1] → [0, 1],    F(x) = rx(1 − x).

The only fixed points of F are x = 0 and x = x* := 1 − 1/r. As noted in Section 6, if 1 < r ≤ 3 it can be shown that, beginning from any x_0 ∈ (0, 1), iterates of F converge to x*. The proof is quite long.
However, we can prove less, but much more easily.
Note that F′(x) = r(1 − 2x) and so F′(x*) = 2 − r. If 1 < r < 3 then |F′(x*)| < 1. Hence x* is a stable fixed point of F, and so iterates F^n(x_0) converge to x* provided x_0 is sufficiently close to x*.
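This prediction is easy to test numerically. The Python sketch below (with r = 2.5 and a starting point of my own choosing) iterates F and watches the orbit settle at x* = 1 − 1/r.

```python
r = 2.5
F = lambda x: r * x * (1 - x)           # the quadratic map
x_star = 1 - 1 / r                      # = 0.6, the nonzero fixed point

assert abs(F(x_star) - x_star) < 1e-12                  # x* is indeed fixed
assert abs(r * (1 - 2 * x_star) - (2 - r)) < 1e-12      # F'(x*) = 2 - r, here -0.5

x = 0.1
for _ in range(200):
    x = F(x)
assert abs(x - x_star) < 1e-9           # iterates converge to x*
```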
Example 11.6.5 (Newton's Method). (Recall Reed Theorem 4.4.1.) Suppose that f(x̄) = 0, f′(x̄) ≠ 0 and f″ exists and is bounded in some interval containing x̄.

[Diagram: graph of f near its zero x̄, illustrating the iteration function F(x) = x − f(x)/f′(x).]
Then Newton's Method is to iterate the function

    F(x) = x − f(x)/f′(x).

Here is a very short proof that the method works. Since

    F′(x) = ( f(x)/(f′(x))^2 ) f″(x),

we see F′(x̄) = 0. Hence x̄ is a stable fixed point of F and so iterates F^n(x_0) converge to x̄ for all x_0 sufficiently close to x̄.
* We can even show quadratic convergence with a little extra work. The main point is that the contraction ratio converges to 0 as x → x̄.
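As a concrete check (my own illustration), applying Newton's method to f(x) = x² − 2 recovers the Babylonian iteration of Example 11.6.2, and the error is roughly squared at each step, which is what quadratic convergence means.

```python
def newton(f, fprime, x0, steps):
    # Iterate F(x) = x - f(x)/f'(x) starting from x0.
    x = x0
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x

f = lambda x: x * x - 2
fp = lambda x: 2 * x

assert abs(newton(f, fp, 1.0, 10) - 2 ** 0.5) < 1e-12

# Quadratic convergence: each error is below the square of the previous one.
e = [abs(newton(f, fp, 1.0, k) - 2 ** 0.5) for k in (2, 3, 4)]
assert e[1] < e[0] ** 2 and e[2] < e[1] ** 2
```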
12. Compact subsets of R^p
In this section we see that a subset of R^p is closed and bounded iff it is sequentially compact iff it is compact. We also see that a subset of an arbitrary metric space is sequentially compact iff it is compact, that either implies the subset is closed and bounded, but that (closed and bounded) need not imply compact (or sequentially compact).
In the next section we will see that compact sets have nice properties with respect to continuous functions (Theorems 13.2.1 and 13.5.1).
12.1. Sequentially compact sets.
Definition 12.1.1. A set A ⊆ R^p is sequentially compact if every sequence (x_n) ⊆ A has a convergent subsequence with limit in A.
We saw in the Bolzano–Weierstraß Theorem 11.4.2 that a closed bounded subset of R^p is sequentially compact. The converse is also true.
Theorem 12.1.2. A set A ⊆ R^p is closed and bounded iff it is sequentially compact.
Proof. We only have one direction left to prove. So suppose A is sequentially compact.
To show that A is closed in R^p, suppose (x_n) ⊆ A and x_n → x. We need to show that x ∈ A.
By sequential compactness, some subsequence of (x_n) converges to some x′ ∈ A. But from Proposition 11.4.1 any subsequence of (x_n) must converge to the same limit as (x_n). Hence x′ = x and so x ∈ A. Thus A is closed.
To show that A is bounded, assume otherwise. Then for each n we can choose some a_n ∈ A with a_n ∉ B_n(0), i.e. with d(a_n, 0) > n. It follows that any subsequence of (a_n) is unbounded. But this means no subsequence is convergent, by Proposition 11.3.4. This contradicts the fact that A is sequentially compact, and so the assumption is false, i.e. A is bounded.
Remark 12.1.1. The above result is true in an arbitrary metric space in only one direction. Namely, sequentially compact implies closed and bounded, and the proof is essentially the same. The other direction requires the Heine–Borel theorem, which need not hold in an arbitrary metric space.
12.2. Compact sets. An open cover of a set A is a collection of open sets A_λ (indexed by λ ∈ J, say) such that
A ⊆ ⋃_{λ∈J} A_λ.
We denote the collection of sets by { A_λ | λ ∈ J }.
A finite subcover of the given cover is a finite subcollection which still covers A. That is, a finite subcover is a collection of sets { A_λ | λ ∈ J_0 } for some finite J_0 ⊆ J, such that
A ⊆ ⋃_{λ∈J_0} A_λ.
(Thus a finite subcover exists if one can keep just a finite number of sets from the original collection and still cover A.)
The following is an example of a cover of A. This particular cover is already finite.
[Diagram: a set A covered by finitely many open intervals A_i, A_j, A_k in (0, 1).]
12. COMPACT SUBSETS OF R^p
Example 12.2.1. An open cover of a set which has no finite subcover is
(0, 1) ⊆ ⋃_{n≥2} (1/n, 1).
This is easy to see, why?
Another example is
(0, 1) ⊆ ⋃_{n≥2} (1 − 2/n, 1 − 1/n).
We can also write the cover as
(0, 1/2) ∪ (1/3, 2/3) ∪ (2/4, 3/4) ∪ (3/5, 4/5) ∪ (4/6, 5/6) ∪ …
The following is a rough sketch of the first four intervals in the cover, indicated by lines with an arrow at each end.
The intervals overlap since 1 − 1/n > 1 − 2/(n+1) if n ≥ 2 (because this inequality is equivalent to the inequality 1/n < 2/(n+1), which in turn is equivalent to the inequality (n+1)/n < 2, which in turn is equivalent to the inequality 1 + 1/n < 2, which is certainly true). The intervals cover (0, 1) because they overlap and because 1 − 1/n → 1.
But it is clear that there is no finite subcover. (If n is the largest integer such that (1 − 2/n, 1 − 1/n) is in some finite collection of such intervals, then no x between 1 − 1/n and 1 is covered.)
Definition 12.2.2. A set A ⊆ R^p is compact if every open cover of A has a finite subcover.
Note that we require a finite subcover of every open cover of A.
Example 12.2.3. Every finite set A is compact. (This is a simple case of the general result that every closed bounded set is compact; see Theorems 12.1.2 and 12.3.1.)
To see this, suppose A ⊆ ⋃_{λ∈J} A_λ. For each a ∈ A choose one A_λ which contains a. The collection of A_λ chosen in this way is a finite subcover of A.
We next prove that if a subset of R^p is compact, then it is closed and bounded. The converse direction is true for subsets of R^p, but not for subsets of an arbitrary metric space; we show all this later.
Theorem 12.2.4. If A ⊆ R^p is compact, then it is closed and bounded.
[Diagram: the set A and a point b outside A.]
Proof. Suppose A ⊆ R^p is compact.
For any set A ⊆ R^p,
A ⊆ ⋃_{n≥1} B_n(0),
since the union of all such balls is R^p. Since A is compact, there is a finite subcover. If N is the radius of the largest ball in the subcover, then
A ⊆ B_N(0).
Hence A is bounded.
If A is not closed then there exists a sequence (a_n) ⊆ A with a_n → b ∉ A.
For each a ∈ A, let r_a = (1/2) d(a, b). Clearly,
A ⊆ ⋃_{a∈A} B_{r_a}(a).
By compactness there is a finite subcover:
A ⊆ B_{r_1}(a_1) ∪ … ∪ B_{r_n}(a_n),
say.
Let r be the minimum radius of any ball in the subcover (r > 0 as we are taking the minimum of a finite set of positive numbers). Every point x ∈ A is in some B_{r_i}(a_i) (i = 1, …, n) and so within distance r_i of a_i. Since the distance from a_i to b is 2r_i, the distance from x to b is at least r_i and hence at least r. But this contradicts the fact that there is a sequence from A which converges to b.
Hence A is closed.
12.3. Compact sets. We next prove that sequentially compact sets and compact sets in R^p are the same. The proofs in both directions are starred. They both generalise to subsets of arbitrary metric spaces.
It follows that the three notions of closed and bounded, of sequential compactness, and of compactness, all agree in R^p. To prove this, without being concerned about which arguments generalise to arbitrary metric spaces, it is sufficient to consider Theorem 12.1.2 (closed and bounded implies sequentially compact), Theorem 12.2.4 (compact implies closed and bounded), and the following Theorem 12.3.1 just in one direction (sequentially compact implies compact).
Theorem 12.3.1. A set A ⊆ R^p is sequentially compact iff it is compact.
Proof*. First suppose A is sequentially compact. Suppose
A ⊆ ⋃_{λ∈J} A_λ,
where { A_λ | λ ∈ J } is an open cover.
We first claim there is a subcover which is either countable or finite. (This is true for any set A, not necessarily closed or bounded.)
To see this, consider each point x = (x_1, …, x_p) ∈ A all of whose coordinates are rational, and consider each rational r > 0. For such an x and r, if the ball B_r(x) ⊆ A_λ for some λ ∈ J, choose one such A_λ. Let the collection of A_λ chosen in this manner be indexed by λ ∈ J′, say.
[Diagram: a point x with rational coordinates close to x*, with x* ∈ B_r(x) and B_r(x) ⊆ B_{r*}(x*).]
Thus the set J′ corresponds to a subset of the set Q^p × Q, which is countable by Reed Proposition 1.3.4 applied repeatedly (products of countable sets are countable). Since a subset of a countable set is either finite or countable (Reed page 16, Proposition 1.3.2), it follows that J′ is either finite or countable.
We next show that the collection of sets { A_λ | λ ∈ J′ } is a cover of A.
To see this consider any x* ∈ A. Then x* ∈ A_{λ*} for some λ* ∈ J, and so B_{r*}(x*) ⊆ A_{λ*} for some real number r* > 0, since A_{λ*} is open.
It is clear from the diagram that we can choose x with rational components close to x*, and then choose a suitable rational r > 0, such that
x* ∈ B_r(x) and B_r(x) ⊆ B_{r*}(x*).
(The precise argument uses the triangle inequality; see footnote 40.) In particular, B_r(x) ⊆ A_{λ*}, and so B_r(x) must be one of the balls used in constructing J′. In other words, B_r(x) ⊆ A_λ for some λ ∈ J′.
The collection of all B_r(x) obtained in this manner must be a cover of A (because every x* ∈ A is in at least one such B_r(x)). It follows that the collection { A_λ | λ ∈ J′ } is also a cover of A.
But we have seen that the set { A_λ | λ ∈ J′ } is finite or countable, and so we have proved the claim.
We next claim that there is a finite subcover of any countable cover.
To see this, write the countable cover as
A ⊆ A_1 ∪ A_2 ∪ A_3 ∪ … .   (20)
If there is no finite subcover, then we can choose
a_1 ∈ A \ A_1,
a_2 ∈ A \ (A_1 ∪ A_2),
a_3 ∈ A \ (A_1 ∪ A_2 ∪ A_3),
…
By sequential compactness of A there is a subsequence (a_{n_j}) which converges to some a ∈ A. From (20), a ∈ A_k for some k. Because A_k is open, a_{n_j} ∈ A_k for all sufficiently large j. But from the construction of the sequence (a_n), a_n ∉ A_k if n ≥ k. This is a contradiction.
Hence there is a finite subcover in (20), and so we have proved the claim.
Footnote 40: For example, choose x with rational components so that d(x, x*) < r*/4, and then choose rational r so that r*/4 < r < r*/2.
Then x* ∈ B_r(x) since d(x, x*) < r*/4 < r.
Moreover, if y ∈ B_r(x) then
d(x*, y) ≤ d(x*, x) + d(x, y) < r*/4 + r < r*/4 + r*/2 < r*.
Hence y ∈ B_{r*}(x*) and so B_r(x) ⊆ B_{r*}(x*).
Putting the two claims together, we see that there is a finite subcover of any cover of A, and so A is compact.
This proves one direction in the theorem.
To prove the other direction, suppose A is compact.
Assume A is not sequentially compact.
This means we can choose a sequence (a_n) ⊆ A that has no subsequence converging to any point in A. We saw in the previous theorem that A is closed. This then implies that in fact (a_n) has no subsequence converging to any point in R^p.
Let the set of values taken by (a_n) be denoted
S = { a_1, a_2, a_3, … }.
(Some values may be listed more than once.)
Claim: S is infinite. To see this, suppose S is finite. Then at least one value a would be taken an infinite number of times. There is then a constant subsequence all of whose terms equal a and which in particular converges to a. This contradicts the above and so establishes the claim.
Claim: S is closed in R^p. If S is not closed then from Theorem 11.2.1 there is a sequence (b_n) from S which converges to some b ∉ S. Choose a subsequence (b′_n) of (b_n) so that b′_1 = b_1, so that b′_2 occurs further along in the sequence (a_n) than does b′_1, so that b′_3 occurs further along in the sequence (a_n) than does b′_2, etc. Then the sequence (b′_n) is a subsequence of (a_n), and it converges to b. This contradicts what we said before about (a_n). Hence S is closed in R^p as claimed.
A similar argument shows that for each a ∈ S there is a neighbourhood U(a) = B_r(a) (for some r depending on a) such that the only point in S ∩ U(a) is a. (Otherwise we could construct, in a similar manner to the previous paragraph, a subsequence of (a_n) which converges to a.)
Next consider the following open cover of A (in fact of all of R^p):
S^c, U(a_1), U(a_2), U(a_3), … .
By compactness there is a finite subcover of A, and by adding more sets if necessary we can take a finite subcover of the form
S^c, U(a_1), …, U(a_N)
for some N.
But this is impossible, as we see by choosing a ∈ S (⊆ A) with a ≠ a_1, …, a_N (remember that S is infinite) and recalling that S ∩ U(a_i) = {a_i}. In particular, a ∉ S^c and a ∉ U(a_i) for i = 1, …, N.
Thus the assumption is false and so we have proved the theorem.
12.4. More remarks. We have seen that A ⊆ R^p is closed and bounded iff it is sequentially compact iff it is compact.
Remark 12.4.1.* In an arbitrary metric space the second "iff" is still true, with a "similar" argument to that of Theorem 12.3.1. The notion of points with rational coordinates is replaced by the notion of a "countable dense subset" of A.
In an arbitrary metric space sequentially compact implies closed and bounded (with essentially the same proof as in Theorem 12.1.2), but closed and bounded does not imply sequentially compact (the proof of Theorem 11.4.2 uses components and does not generalise to arbitrary metric spaces). The general result is that complete and totally bounded (together these are the same as closed and bounded in R^p) is equivalent to sequentially compact, which is equivalent to compact.
[Diagram: B_δ(a) is mapped by f into (f(a) − ε, f(a) + ε); equivalently B_δ(a) ⊆ f⁻¹[(f(a) − ε, f(a) + ε)].]
13. Continuous functions on R^p
Review Section 7, and Reed page 152 line 3 to the end of paragraph 2 on page 154.
We consider functions f : A → R, where A ⊆ R^p.
(Much of what we say will generalise to f : A → R where A ⊆ X and (X, ρ) is a general metric space. It also mostly further generalises on replacing R by Y, where (Y, ρ*) is another metric space.)
13.1. Basic results. Exactly as for p = 1, we have:
Definition 13.1.1. A function f : A → R is continuous at a ∈ A if
x_n → a ⇒ f(x_n) → f(a)   (21)
whenever (x_n) ⊆ A.
Theorem 13.1.2. A function f : A → R is continuous at a ∈ A iff for every ε > 0 there is a δ > 0 such that:
x ∈ A and d(x, a) ≤ δ ⇒ d(f(x), f(a)) ≤ ε.   (22)
(The proof is the same as in the one dimensional case; see Reed Theorem 3.1.3 page 77.)
The usual properties of sums, products, quotients (when defined) and compositions of continuous functions being continuous still hold, and have the same proofs.
Definition 13.1.3. A function f : A → R is uniformly continuous iff for every ε > 0 there is a δ > 0 such that:
x_1, x_2 ∈ A and d(x_1, x_2) ≤ δ ⇒ d(f(x_1), f(x_2)) ≤ ε.   (23)
This is also completely analogous to the case p = 1.
Remark 13.1.1. In the previous Theorem and Definition we can replace either or both "≤" by "<". This follows essentially from the fact that we require the statements to be true for every ε > 0. This is frequently very convenient and we will usually do so without further comment. See also Adams §1.5, Definition 1.9.
Remark 13.1.2. We can think of (22) geometrically as saying either (using < instead of ≤)
1. f[A ∩ B_δ(a)] ⊆ (f(a) − ε, f(a) + ε), or
2. A ∩ B_δ(a) ⊆ f⁻¹[(f(a) − ε, f(a) + ε)].
See Section 13.4 for the notation, although the following diagram in case A = R^p should make the idea clear.
[Diagram: B_δ(a) is mapped by f into the ball B_ε(f(a)); equivalently B_δ(a) ⊆ f⁻¹[B_ε(f(a))].]
Remark 13.1.3. Definitions 13.1.1, 13.1.3 and Theorem 13.1.2 generalise immediately, without even changing notation, to functions f : A → R^q. We can again replace either or both "≤" by "<".
The previous diagram then becomes the ball version, with the interval (f(a) − ε, f(a) + ε) replaced by the ball B_ε(f(a)).
A function f which has "rips" or "tears" is not continuous.
If f : A → R^q we can write
f(x) = (f_1(x), …, f_q(x)).
It follows from Definition 13.1.1 (or Theorem 13.1.2) applied to functions with values in R^q that f is continuous iff each component function f_1, …, f_q is continuous.
For example, if
f : R^2 → R^3,  f(x, y) = (x^2 + y^2, sin xy, e^x),
then f is continuous since the component functions
f_1(x, y) = x^2 + y^2,  f_2(x, y) = sin xy,  f_3(x, y) = e^x,
are all continuous.
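The componentwise criterion can be illustrated numerically: evaluating the example map along a sequence (x_n, y_n) → (1, 2) shows each component converging to the corresponding component of the limit. This is only a numerical sketch (added here) of Definition 13.1.1, not a proof; the error bound 10/n is a rough choice for this particular map near (1, 2).

```python
import math

def f(x, y):
    """The example map f : R^2 -> R^3 from the notes."""
    return (x**2 + y**2, math.sin(x * y), math.exp(x))

a, b = 1.0, 2.0
target = f(a, b)

# Evaluate f along the sequence (a + 1/n, b - 1/n) -> (a, b) and check
# that each component approaches the corresponding component of f(a, b).
for n in [10, 100, 1000, 10000]:
    value = f(a + 1/n, b - 1/n)
    for v, t in zip(value, target):
        assert abs(v - t) < 10 / n   # rough, illustrative error bound
```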
In this sense we can always consider real valued functions instead of functions into R^q. But this is often not very useful — in linear algebra, for example, we usually want to think of linear transformations as maps into R^q rather than as q component maps into R. And it does not help if we want to consider functions which map into a general metric space.
13.2. Deeper properties of continuous functions.
Theorem 13.2.1. Suppose f : A → R is continuous, where A ⊆ R^p is closed and bounded. Then f is bounded above and below and has a maximum and a minimum value. Moreover, f is uniformly continuous on A.
This generalises the result for functions defined on a closed bounded interval. The proof is essentially the same as in the case p = 1. See Reed Theorem 4.6.1 page 153.
The theorem is also true for A ⊆ X where X is an arbitrary metric space, provided A is sequentially compact (which is the same as compact, as we have proved). The proof is essentially the same as in the case X = R in Section 3.2 of Reed. The Bolzano-Weierstrass theorem is not needed, since we know immediately from the definition of sequential compactness that every sequence from A has a convergent subsequence.
* There is also a generalisation of the Intermediate Value Theorem, but for this we need the set A to be "connected". One definition of connected is that we cannot write
A = (E ∩ A) ∪ (F ∩ A)
where E, F are open and E ∩ A, F ∩ A are disjoint and nonempty. If f is continuous and A is connected, then one can prove that the image of f is an interval.
13.3. Limits. As in the case p = 1, limits in general can be defined either in terms of sequences or in terms of ε and δ. See Definition 13.3.1 and Theorem 13.3.2.
We say a is a limit point of A if there is a sequence from A \ {a} which converges to a. We do not require a ∈ A. From Theorem 11.2.1, a set contains all its limit points iff it is closed.
We say a ∈ A is an isolated point of A if there is no sequence x_n → a such that (x_n) ⊆ A \ {a}.
For example, if A = (0, 1] ∪ {2} ⊆ R then 2 is an isolated point of A (and is the only isolated point of A). The limit points of A are given by the set [0, 1].
The following Definition, and Theorem and its proof, are completely analogous to the case p = 1.
Definition 13.3.1. Suppose f : A → R, where A ⊆ R^p and a is a limit point of A. If for every sequence (x_n):
(x_n) ⊆ A \ {a} and x_n → a  ⇒  f(x_n) → L,
then we say the limit of f(x) as x approaches a is L, or the limit of f at a is L, and write
lim_{x→a, x∈A} f(x) = L,  or  lim_{x→a} f(x) = L.
The reason for restricting to sequences, none of whose terms equal a, is the usual one. For example, if
f : R → R,  f(x) = x^2 for x ≠ 0 and f(0) = 1,
or
g : R \ {0} → R,  g(x) = x^2,
then we want
lim_{x→0} f(x) = 0,  lim_{x→0} g(x) = 0.
Theorem 13.3.2. Suppose f : A → R, where A ⊆ R^p and a is a limit point of A. Then lim_{x→a} f(x) = L iff (see footnote 41):
for every ε > 0 there is a corresponding δ > 0 such that
x ∈ A, d(x, a) ≤ δ and x ≠ a  ⇒  |f(x) − L| ≤ ε.
It follows, exactly as in the case p = 1, that if a ∈ D(f) is a limit point of D(f) then f is continuous at a iff
lim_{x→a} f(x) = f(a).
(If a is an isolated point of D(f) then f is always continuous at a according to the definition, although lim_{x→a} f(x) is not actually defined. This is not an interesting situation, and we would not normally consider continuity at isolated points.)
Footnote 41: As usual, we could replace either or both "≤" by "<".
The usual properties of sums, products and quotients of limits hold, with the same proofs as in the case p = 1.
Example 13.3.3. Functions of two or more variables exhibit more complicated behaviour than functions of one variable. For example, let
f(x, y) = xy/(x^2 + y^2)
for (x, y) ≠ (0, 0). (Setting x = r cos θ, y = r sin θ, we see that in polar coordinates this is just the function f(r, θ) = (1/2) sin 2θ.)
If y = ax then f(x, y) = a(1 + a^2)^{-1} for x ≠ 0. Hence
lim_{(x,y)→(0,0), y=ax} f(x, y) = a/(1 + a^2).
Thus we obtain a different limit of f as (x, y) → (0, 0) along different lines. It follows that
lim_{(x,y)→(0,0)} f(x, y)
does not exist.
A partial diagram of the graph of f is:
[Diagram: partial graph of f.]
Probably a better way to visualise f is by sketching level sets (see footnote 42) of f, as shown in the next diagram. Then you can visualise the graph of f as being swept out by a straight line rotating around the origin, at a height indicated by the level sets. This may also help in understanding the previous diagram.
Footnote 42: A level set of f is a set on which f is constant.
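A quick numerical check of this computation (an illustration added here, not part of the notes): along each line y = ax the function is constant, and different slopes give different constants, so no single two-variable limit can exist.

```python
def f(x, y):
    return x * y / (x**2 + y**2)

# Along the line y = ax, f is constant: f(x, ax) = a / (1 + a^2).
for a in [0.0, 0.5, 1.0, -2.0]:
    expected = a / (1 + a**2)
    for x in [1.0, 0.1, 0.001, 1e-6]:
        assert abs(f(x, a * x) - expected) < 1e-12

# Different lines give different values, so the limit at (0, 0) cannot exist.
assert f(1e-9, 1e-9) == 0.5          # along y = x
assert f(1e-9, 0.0) == 0.0           # along the x-axis
```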
Example 13.3.4. Let
f(x, y) = x^2 y/(x^4 + y^2)
for (x, y) ≠ (0, 0). Then
lim_{(x,y)→(0,0), y=ax} f(x, y) = lim_{x→0} ax^3/(x^4 + a^2x^2) = lim_{x→0} ax/(x^2 + a^2) = 0.
Thus the limit of f as (x, y) → (0, 0) along any line y = ax is 0. The limit along the y-axis x = 0 is also easily seen to be 0.
But it is still not true that lim_{(x,y)→(0,0)} f(x, y) exists. For if we consider the limit of f as (x, y) → (0, 0) along the parabola y = bx^2, we see that f = b(1 + b^2)^{-1} on this curve and so the limit is b(1 + b^2)^{-1}.
You might like to draw level curves (corresponding to parabolas y = bx^2).
13.4. Another characterisation of continuity. One can define continuity in terms of open sets (or closed sets), without using either sequences or the ε–δ definition. See Theorem 13.4.3.
But for this we first need to introduce some general notation for functions.
Definition 13.4.1. Suppose f : X → Y. The image of E ⊆ X under f is the set
f[E] = { y ∈ Y | y = f(x) for some x ∈ E }.
The inverse image of E ⊆ Y under f is the set
f⁻¹[E] = { x ∈ X | f(x) ∈ E }.
It is important to realise that f⁻¹[E] is always defined for E ⊆ Y, even if f does not have an inverse. In fact the inverse function will exist iff f⁻¹[{y}] (see footnote 43) contains at most one element for each y ∈ Y.
The following are straightforward to check. Note that inverse images are better behaved than images. The results generalise immediately to intersections and unions of more than two, including infinitely many, sets.
Theorem 13.4.2. Suppose f : X → Y and E, E_1, E_2 ⊆ Y. Then
f⁻¹[E_1 ∩ E_2] = f⁻¹[E_1] ∩ f⁻¹[E_2],
f⁻¹[E_1 ∪ E_2] = f⁻¹[E_1] ∪ f⁻¹[E_2],
f⁻¹[E_1 \ E_2] = f⁻¹[E_1] \ f⁻¹[E_2],
E_1 ⊆ E_2 ⇒ f⁻¹[E_1] ⊆ f⁻¹[E_2],
f[f⁻¹[E]] ⊆ E.
If E, E_1, E_2 ⊆ X, then
f[E_1 ∩ E_2] ⊆ f[E_1] ∩ f[E_2],
f[E_1 ∪ E_2] = f[E_1] ∪ f[E_2],
f[E_1 \ E_2] ⊇ f[E_1] \ f[E_2],
E_1 ⊆ E_2 ⇒ f[E_1] ⊆ f[E_2],
E ⊆ f⁻¹[f[E]].
Proof. For the first, x ∈ f⁻¹[E_1 ∩ E_2] iff f(x) ∈ E_1 ∩ E_2 iff (f(x) ∈ E_1 and f(x) ∈ E_2) iff (x ∈ f⁻¹[E_1] and x ∈ f⁻¹[E_2]) iff x ∈ f⁻¹[E_1] ∩ f⁻¹[E_2]. The others are similar.
To see that equality need not hold in the sixth assertion, let f(x) = x^2, X = Y = R, E_2 = (−∞, 0] and E_1 = [0, ∞). Then f[E_1 ∩ E_2] = {0} and f[E_1] ∩ f[E_2] = [0, ∞). Similarly, f[E_1 \ E_2] = (0, ∞) while f[E_1] \ f[E_2] = ∅. Also, f⁻¹[f[E_1]] = R ⊋ E_1, and f[f⁻¹[R]] = [0, ∞) ≠ R.
You should now go back and look again at Remarks 13.1.2 and 13.1.3 and the two diagrams there.
We first state and prove the following theorem for functions defined on all of R^p. We then discuss the generalisation to f : A (⊆ R^p) → B (⊆ R).
Theorem 13.4.3. Let f : R^p → R. Then the following are equivalent:
1. f is continuous;
2. f⁻¹[E] is open in R^p whenever E is open in R;
3. f⁻¹[C] is closed in R^p whenever C is closed in R.
Footnote 43: We sometimes write f⁻¹E for f⁻¹[E].
[Diagram: B_δ(x) ⊆ f⁻¹[(f(x) − r, f(x) + r)] ⊆ f⁻¹[E].]
Proof.
(1) ⇒ (2): Assume (1). Let E be open in R. We wish to show that f⁻¹[E] is open (in R^p).
Let x ∈ f⁻¹[E]. Then f(x) ∈ E, and since E is open there exists r > 0 such that (f(x) − r, f(x) + r) ⊆ E.
From Remark 13.1.2 there exists δ > 0 such that
B_δ(x) ⊆ f⁻¹[(f(x) − r, f(x) + r)].
But
f⁻¹[(f(x) − r, f(x) + r)] ⊆ f⁻¹[E],
and so
B_δ(x) ⊆ f⁻¹[E].
Thus for every point x ∈ f⁻¹[E] there is a δ = δ_x > 0 such that the above line is true, and so f⁻¹[E] is open.
(2) ⇔ (3): Assume (2), i.e. f⁻¹[E] is open in R^p whenever E is open in R. If C is closed in R then C^c is open and so f⁻¹[C^c] is open. But (f⁻¹[C])^c = f⁻¹[C^c]. Hence f⁻¹[C] is closed.
We can similarly show (3) ⇒ (2).
(2) ⇒ (1): Assume (2).
Let x ∈ R^p. In order to prove f is continuous at x, take any r > 0.
Since (f(x) − r, f(x) + r) is open, it follows that f⁻¹[(f(x) − r, f(x) + r)] is open.
Since x ∈ f⁻¹[(f(x) − r, f(x) + r)] and this set is open, it follows that there exists δ > 0 such that B_δ(x) ⊆ f⁻¹[(f(x) − r, f(x) + r)].
It now follows from Remark 13.1.2 that f is continuous at x.
Since x was an arbitrary point in R^p, this finishes the proof.
Example 13.4.4. We can now give a simple proof that the examples at the beginning of Section 10.5 are indeed open. For example, if
A = { (x, y, z) ∈ R^3 | x^2 + y^2 < z^2 and sin x < cos y }
then A = A_1 ∩ A_2, where
A_1 = f_1⁻¹[(−∞, 0)],  f_1(x, y, z) = x^2 + y^2 − z^2,
A_2 = f_2⁻¹[(−∞, 0)],  f_2(x, y, z) = sin x − cos y.
[Diagram: sets E, A, F in R^2. E does not include its boundary, A does. F is relatively open in A, but is not open in R^2.]
Thus A_1 and A_2 are open, being inverse images of open sets, and hence their intersection A is open.
Remark 13.4.1.* In order to extend the previous characterisation of continuity to functions defined just on a subset of R^p, we need the following definitions (which are also important in other settings).
Suppose A ⊆ R^p. We say F ⊆ A is open (closed) in A, or relatively open (closed) in A, if there is an open (closed) set E in R^p such that
F = A ∩ E.
It follows (exercise) that F ⊆ A is open in A iff for each x ∈ F there is an r > 0 such that
A ∩ B_r(x) ⊆ F.
Example 13.4.5. Consider the interval A = (0, 1]. Then the set F = (1/2, 1] is open in A, as is A itself. The set F = (0, 1/2] is closed in A, as is A itself. (What could we take as the set E in each case?)
For another example, consider the square
F = { (x, y) | 0 ≤ x ≤ 1, 0 < y < 1 },
which does not contain its horizontal sides but does contain the remainder of its vertical sides. Let
A = { (x, y) | 0 ≤ x ≤ 1 },  B = { (x, y) | 0 < y < 1 }.
Then A is closed and B is open. Since F = A ∩ B, it follows that F is open in A and closed in B.
The interval (0, 1) is open in R. But it is not open when considered as a subset of R^2 via the usual inclusion of R into R^2.
Theorem 13.4.3 is true for functions f : A (⊆ R^p) → R if we replace "in R^p" by "in A".
For (1) ⇒ (2) we suppose E is open. We need to show that f⁻¹[E] is open in A. Just replace B_δ(x) by B_δ(x) ∩ A, and B_{δ_x}(x) by B_{δ_x}(x) ∩ A, throughout the proof.
The proof of (2) ⇔ (3) is similar, by taking complements in A and noting that a set is open in A iff its complement in A is closed in A. (Why?)
For (2) ⇒ (1) we suppose the inverse image of an open set in R is open in A. In the proof, replace B_δ(x) by A ∩ B_δ(x) (and use the exercise at the beginning of this Remark).
Finally, we observe that the theorem also generalises easily to functions f : A (⊆ R^p) → R^q. Just replace the interval (f(x) − r, f(x) + r) by the ball B_r(f(x)) throughout this Remark.
Remark 13.4.2. It is not true that the continuous image of an open (closed) set need be open (closed).
For example, let f : R → R be given by f(x) = x^2. Then f[(−1, 1)] = [0, 1), which is neither open nor closed.
If f(x) = e^(-x^2) then f[R] = (0, 1], which again is neither open nor closed.
13.5. Continuous image of compact sets*. The inverse image of a compact set under a continuous function need not be compact.
For example, if f : R → R is given by f(x) = e^(-x^2) then f⁻¹[[0, 1]] = R.
However, continuous images of compact sets are compact.
Theorem 13.5.1. Suppose f : E (⊆ R^p) → R^q is continuous and E is compact. Then f[E] is compact.
Proof. Suppose
f[E] ⊆ ⋃_{λ∈J} A_λ
is an open cover. Then
E ⊆ f⁻¹[f[E]] ⊆ f⁻¹[⋃_{λ∈J} A_λ] = ⋃_{λ∈J} f⁻¹[A_λ].
This gives a cover of E by relatively open sets, and by compactness (see footnote 44) there is a finite subcover
E ⊆ ⋃_{λ∈J_0} f⁻¹[A_λ].
It follows from Theorem 13.4.2 that
f[E] ⊆ f[⋃_{λ∈J_0} f⁻¹[A_λ]] = ⋃_{λ∈J_0} f[f⁻¹[A_λ]] ⊆ ⋃_{λ∈J_0} A_λ.
Thus the original cover of f[E] has a finite subcover.
Hence f[E] is compact.
It is not always true that if f is continuous and one-to-one then its inverse is continuous. For example, let
f : [0, 2π) → { (x, y) | x^2 + y^2 = 1 } (⊆ R^2),  f(t) = (cos t, sin t).
Then the inverse function is given by
f⁻¹ : { (x, y) | x^2 + y^2 = 1 } → [0, 2π),  f⁻¹(cos t, sin t) = t.
But this is not continuous, as we see by considering what happens near (1, 0). For example,
(cos(2π − 1/n), sin(2π − 1/n)) → (1, 0),
but
f⁻¹(cos(2π − 1/n), sin(2π − 1/n)) = 2π − 1/n, which does not converge to f⁻¹(1, 0) = 0.
Footnote 44: Each of these relatively open sets can be extended to an open set in R^p. By compactness there is a finite subcover from these extended sets. Taking their restrictions to E gives a finite subcover of the original cover.
However:
Theorem 13.5.2. Suppose f : E (⊆ R^p) → R^q is one-to-one and continuous, and E is compact. Then the inverse function f⁻¹ is also continuous (on its domain f[E]).
Proof. Since f is one-to-one and onto its image, its inverse function f⁻¹ : f[E] → E is certainly well-defined (namely, f⁻¹(y) = x iff f(x) = y).
To show that f⁻¹ is continuous we need to show that the inverse image under f⁻¹ of any set C ⊆ E which is closed in E is closed in f[E]. Since f is one-to-one and onto, this inverse image (f⁻¹)⁻¹[C] under f⁻¹ is the same as the image f[C] under f.
Since C is a relatively closed subset of a compact set, it is itself compact. (C is the intersection of the closed bounded set E with a closed set, and so is itself closed and bounded, hence compact; see footnote 45.) Hence f[C] is compact by the previous theorem, and in particular is closed in R^q. It follows that f[C] is closed in f[E].
Footnote 45: One can also prove this directly from the definition of sequential compactness, or from the definition of compactness via open covers. Thus this result generalises to any metric space, or even any topological space.
14. Sequences and series of functions.
The reference for this Chapter is Reed Chapter 5, §1–3, and Example 2 page 197; and Chapter 6, §6.3.
14.1. Pointwise and uniform convergence. Study Reed §5.1 carefully: the definitions and all the examples.
14.2. Properties of uniform convergence. Study Reed §5.2 carefully.
The point is that continuity and integration behave well under uniform convergence, but not under pointwise convergence.
Differentiation does not behave well, unless we also require the derivatives to converge uniformly.
Theorem 5.2.4, concerning differentiation under the integral sign, is also important.
14.3. The supremum norm and supremum metric. Study Reed §5.3 and Example 2 page 197 carefully.
One way of measuring the "size" of a bounded function f is by means of the supremum norm ‖f‖_∞ (Reed page 175). We can then measure the distance between two functions f and g (with the same domain E) by using the sup distance (or sup metric)
ρ_∞(f, g) = ‖f − g‖_∞ = sup_{x∈E} |f(x) − g(x)|.
In Reed, E ⊆ R, but exactly the same definition applies for E ⊆ R^n.
In Proposition 5.3.1 it is shown that the sup norm satisfies positivity and the triangle inequality, and is well behaved under scalar multiples.
In Example 2 on page 197 it is shown that as a consequence the sup metric is indeed a metric (i.e. satisfies positivity, symmetry and the triangle inequality; see the first paragraph in Section 10.2).
(See Reed page 177.) A sequence of functions (f_n) defined on the same domain E converges to f in the sup metric (see footnote 46) if ρ_∞(f_n, f) → 0 as n → ∞. (Note that this is completely analogous to Definition 11.1.1 for convergence of a sequence of points in R^n.)
One similarly defines the notion of Cauchy in the sup metric (or equivalently, Cauchy in the sup norm).
One proves (page 178) that if a sequence (f_n) of continuous functions defined on a closed bounded interval [a, b] is Cauchy in the sup metric, then it converges in the sup metric and the limit is also continuous on [a, b]. (This is analogous to the fact that if a sequence from R^n is Cauchy then it converges to a point in R^n.)
The same theorem and proof work if [a, b] is replaced by any set E ⊆ R^n, but then we need to restrict to functions that are continuous and bounded (otherwise the definition of ρ_∞(f, g) may give ∞). (If E is compact we need not assume boundedness, as this follows from continuity by Theorem 13.2.1.)
We say that the set of bounded and continuous functions defined on E is complete in the sup metric (or norm).
14.4. Integral norms and metrics. Study Reed pp 179–181, particularly Example 1. We cannot do much on this topic, other than introduce it.
The important "integral" norms for functions f with domain E are defined by
‖f‖_p = (∫_E |f|^p)^{1/p}
for any 1 ≤ p < ∞.
The most important are p = 2 and p = 1 (in that order). If E is not an interval in R then we need a more general notion of integration; to do it properly requires the Lebesgue integral. (But ∫_E f is still interpreted as the area between the graph of f and the set E ⊆ R, counted negatively where the graph lies below the axis. In higher dimensional cases we replace "area" by "volume" of the appropriate dimension.)
It is possible to prove for ‖f‖_p the three properties of a "norm" in Proposition 5.3.1 of Reed.
Actually, ‖f‖_p does not quite give a norm. The problem is that if f is not continuous then ‖f‖_p may equal 0 even though f is not the zero function, i.e. not everywhere zero. However, f must then be zero "almost everywhere", in a sense that can be made precise. This problem does not arise if we restrict to continuous functions, since if f is continuous and ∫_a^b |f| = 0 then f = 0 everywhere. But it is usually not desirable to restrict to continuous functions, for other reasons.
The metrics corresponding to these norms are
ρ_p(f, g) = ‖f − g‖_p,
again for 1 ≤ p < ∞.
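These integral distances can be approximated by Riemann sums. A sketch (an illustration added here; the midpoint-rule approximation and the tolerances are choices made for the illustration):

```python
def rho_p(f, g, p, a=0.0, b=1.0, n=100000):
    """Approximate the L^p distance (integral_a^b |f-g|^p)^(1/p) by a
    midpoint Riemann sum with n subintervals."""
    h = (b - a) / n
    total = sum(abs(f(a + (i + 0.5) * h) - g(a + (i + 0.5) * h)) ** p
                for i in range(n))
    return (total * h) ** (1 / p)

f = lambda x: x
g = lambda x: 0.0

# integral_0^1 x dx = 1/2, so rho_1(f, g) = 1/2.
assert abs(rho_p(f, g, 1) - 0.5) < 1e-6
# integral_0^1 x^2 dx = 1/3, so rho_2(f, g) = (1/3)^(1/2).
assert abs(rho_p(f, g, 2) - (1 / 3) ** 0.5) < 1e-6
```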
Remark 14.4.1.* The set S of continuous functions defined on a closed bounded set is a metric space in any of the metrics ρ_p, but it is not a complete metric space (unless p = ∞). If we take the "completion" of the set S we are led to the so-called Lebesgue measurable functions and the theory of Lebesgue integration.
Footnote 46: Reed says "converges in the sup norm".
14.5. Series of functions. Study Reed §6.3 carefully.
Suppose the functions f_j all have a common domain (typically [a, b]).
The infinite series of functions Σ_{j=1}^∞ f_j is said to converge pointwise to the function f iff the corresponding series of real numbers Σ_{j=1}^∞ f_j(x) converges to f(x) for each x in the (common) domain of the f_j.
From the definition of convergence of a series of real numbers, this means that the infinite sequence (S_n) of partial sums
S_n = f_1 + … + f_n
converges in the pointwise sense to f.
We say that the series
j
f converges uniformly if the sequence of partial sums
converges uniformly.
Note the important Weierstrass Mtest, Theorem 6.3.1 of Reed. This gives a
very useful criterion for uniform convergence of a series of functions.
The results about uniform convergence of a sequence of functions in Section14.2
(Reed ¸5.2) immediately give corresponding results for series (Reed Theorem 6.3.1
— last sentence; Theorem 6.3.2, Theorem 6.3.3). In particular:
• A uniformly convergent series of continuous functions has a continuous limit.
• Integration behaves well in the sense that for a uniformly convergent series
of continuous functions on a closed bounded interval, the integral of the
sum is the sum of the integrals (i.e. summation and integration can be
interchanged).
• If a series of C¹ functions converges uniformly on an interval to a function
f, and if the series of derivatives also converges uniformly to g say, then f is C¹
and f′ = g (i.e. summation and differentiation can be interchanged).
Study example 1 on page 240 of Reed. Example 2 is * material (it gives an
example of a continuous and nowhere diﬀerentiable function).
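The M-test can be checked numerically. The following sketch is my own illustration (not from Reed): for the series Σ_{j≥1} sin(jx)/j² we have |sin(jx)/j²| ≤ 1/j² = M_j with Σ M_j < ∞, so the partial sums should be uniformly Cauchy, with the sup distance between partial sums dominated by a tail of Σ M_j uniformly in x.

```python
import math

# Illustration of the Weierstrass M-test (my own example, not from Reed):
# for f_j(x) = sin(j*x)/j**2 we have |f_j(x)| <= 1/j**2 = M_j, and
# sum_j M_j < infinity, so the series converges uniformly on all of R.

def partial_sum(n, x):
    # S_n(x) = sum_{j=1}^n sin(j*x)/j**2
    return sum(math.sin(j * x) / j**2 for j in range(1, n + 1))

xs = [i / 100 for i in range(-314, 315)]   # sample points covering [-pi, pi]

def sup_diff(n, m):
    # grid estimate of the sup distance between partial sums S_n and S_m
    return max(abs(partial_sum(n, x) - partial_sum(m, x)) for x in xs)

# Uniform Cauchy estimate: sup_x |S_{2n} - S_n| <= sum_{j=n+1}^{2n} M_j,
# a bound that does not depend on x.
for n in (10, 20, 40):
    tail = sum(1.0 / j**2 for j in range(n + 1, 2 * n + 1))
    assert sup_diff(2 * n, n) <= tail + 1e-12
```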
15. Metric spaces
Study Reed §5.6.
The best way to study convergence and continuity is to do it abstractly by
means of metric spaces. This has the advantage of simplicity and generality.
15.1. Definition and examples. The reference is Reed §5.6 up to the end
of the first paragraph on page 201.
The basic idea we need is the notion of a “distance function” or a “metric”.
Definition 15.1.1. A metric space is a set M together with a function
ρ : M × M → [0, ∞) (called a metric) which satisfies, for all x, y, z ∈ M:
1. ρ(x, y) ≥ 0 and ρ(x, y) = 0 iff x = y, (positivity)
2. ρ(x, y) = ρ(y, x), (symmetry)
3. ρ(x, y) ≤ ρ(x, z) + ρ(z, y). (triangle inequality)
The idea is that ρ(x, y) measures the “distance (or length of a shortest route)
from x to y”. The three requirements are natural. In particular, the third might
be thought of as saying that one route from x to y is to take a shortest route from
x to z and then a shortest route from z to y — but there may be an even shorter
route from x to y which does not go via z.
Example 15.1.2 (Metrics on R^n). We discussed the Euclidean metric on R^n
in Sections 10.1 and 10.2.
Example 3 on page 197 of Reed is important. It can be generalised to give the
following metrics on R^n:
ρ_p(x, y) = ( |x_1 − y_1|^p + ⋯ + |x_n − y_n|^p )^{1/p} = ( Σ_{i=1}^n |x_i − y_i|^p )^{1/p},
ρ_max(x, y) = max{ |x_i − y_i| : i = 1, . . . , n },
for x, y ∈ R^n, 1 ≤ p < ∞.
We sometimes write ρ_∞ for ρ_max.
Note the diagrams in Reed showing { x ∈ R^2 | ρ_p(x, 0) ≤ 1 } for p = 1, 2, ∞.
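These formulas translate directly into code. A minimal sketch (the example points are my own choices):

```python
import math

# Direct transcription of the rho_p and rho_max formulas above.
def rho_p(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def rho_max(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1.0, 2.0, 3.0), (4.0, 0.0, 3.0)
assert abs(rho_p(x, y, 2) - math.sqrt(13)) < 1e-12   # Euclidean metric
assert rho_p(x, y, 1) == 5.0                          # "taxicab" metric
assert rho_max(x, y) == 3.0
# rho_max deserves the name rho_infinity: rho_p approaches rho_max as p grows.
assert abs(rho_p(x, y, 50) - rho_max(x, y)) < 0.1
```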
The proof that ρ_p is a metric has already been given for p = 2 (the Euclidean
metric). The cases p = 1, ∞ are Problem 1 page 202 of Reed in case n = 2, but the
proofs trivially generalise to n > 2. The case of arbitrary p is trickier, and I will
set it later as an assignment problem (with hints!).
Example 15.1.3 (Metrics on function spaces). These are extremely important.
See also Section 14.3.
The basic example is the sup metric ρ_∞ on C[a, b], the set of continuous
functions f : [a, b] → R. See Reed Example 2 page 197.
More generally, one has the sup metric on
1. B(E), the set of bounded functions f : E (⊆ R^n) → R,
2. BC(E), the set of bounded continuous functions f : E (⊆ R^n) → R.
In the second case, if E is compact, we need only require continuity, as this already
implies boundedness.
The metrics ρ_p (1 ≤ p < ∞) on C[a, b] (and generalisations to sets other than
[a, b]) have also been discussed; see Section 14.4.
* If we regard (a_1, . . . , a_n) as a function
a : {1, . . . , n} → R,
and interpret ∫ |a|^p as Σ_{i=1}^n |a(i)|^p (which is indeed the case for an
appropriate “measure”), then the metrics ρ_p (1 ≤ p < ∞) on function spaces give
the metrics ρ_p on R^n as a special case.
Example 15.1.4 (Subsets of a metric space). Any subset of a metric space is
itself a metric space with the same metric, see the ﬁrst paragraph of Reed page 199.
See also the ﬁrst example in Example 4 of Reed, page 199.
Example 15.1.5 (The discrete metric). A simple metric on any set M is given
by
d(x, y) = 1 if x ≠ y, and d(x, y) = 0 if x = y.
Check that this is a metric. It is usually called the discrete metric. (It is a particular
case of what Reed calls a “discrete metric” in Problem 10 page 202.) It is not very
useful, other than as a source of counterexamples to possible conjectures.
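As a quick sanity check (my own illustration), the three metric axioms for the discrete metric can be verified by brute force on a small finite set:

```python
import itertools

# Brute-force check of the metric axioms for the discrete metric on {0,...,4}.
def d(x, y):
    return 0 if x == y else 1

points = range(5)
for x, y, z in itertools.product(points, repeat=3):
    assert d(x, y) >= 0 and (d(x, y) == 0) == (x == y)   # positivity
    assert d(x, y) == d(y, x)                            # symmetry
    assert d(x, y) <= d(x, z) + d(z, y)                  # triangle inequality
```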
Example 15.1.6 (Metrics on strings of symbols and DNA sequences). Example
5 in Reed is a discussion about metrics as used to estimate the diﬀerence between
two DNA molecules. This is important in studying evolutionary trees.
Example 15.1.7 (The natural metric on a normed vector space). Any normed
vector space gives rise to a metric space, with metric defined by
ρ(x, y) = ‖x − y‖.
The three properties of a metric follow from the properties of a norm. Examples
are the metrics ρ_p for 1 ≤ p ≤ ∞ on R^n and the metrics ρ_p for 1 ≤ p ≤ ∞ on
spaces of functions.
But metric spaces are much more general. In particular, the discrete metric,
and Examples 4 and 5 on pages 199, 200 of Reed, are not metrics which arise from
norms.
15.2. Convergence in a metric space. Reed page 201 paragraphs 2–4.
Once one has a metric, the following is the natural notion of convergence.
Definition 15.2.1. A sequence (x_n) in a metric space (M, ρ) converges to
x ∈ M if ρ(x_n, x) → 0 as n → ∞.
Notice that convergence in a metric space is reduced to the notion of convergence
of a sequence of real numbers.
In the same way as for sequences in R, there can be at most one limit.
We discussed convergence in R^n in Section 11.1. This corresponds to convergence
with respect to the Euclidean metric. We will see in the next section that
convergence with respect to any of the metrics ρ_p on R^n is the same.
Convergence of a sequence of functions in the sup norm is the same as convergence
in the sup metric. But convergence in the other metrics ρ_p is not the
same.
For example, let
f_n(x) = 0 for −1 ≤ x ≤ 0,  nx for 0 ≤ x ≤ 1/n,  1 for 1/n ≤ x ≤ 1;
f(x) = 0 for −1 ≤ x ≤ 0,  1 for 0 < x ≤ 1.
Then ρ_∞(f_n, f) = 1 and ρ_1(f_n, f) = 1/(2n) (why?). Hence f_n → f in (R[−1, 1], ρ_1)
(the space of Riemann integrable functions on [−1, 1] with the ρ_1 metric). But
f_n does not converge to f in (R[−1, 1], ρ_∞).^47
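The claims about this example can be checked numerically. The sketch below (grid size, n = 10 and tolerances are my own choices) approximates ρ_∞ by a maximum over sample points and ρ_1 by a Riemann sum:

```python
# Numerical check of rho_inf(f_n, f) = 1 and rho_1(f_n, f) = 1/(2n) for n = 10.
def f_n(n, x):
    return 0.0 if x <= 0 else min(n * x, 1.0)

def f(x):
    return 0.0 if x <= 0 else 1.0

n = 10
dx = 1 / 10000
xs = [i * dx for i in range(-10000, 10001)]   # grid on [-1, 1]

# sup distance: |f_n - f| gets arbitrarily close to 1 just to the right of 0
sup_dist = max(abs(f_n(n, x) - f(x)) for x in xs)
assert sup_dist > 0.99

# rho_1 distance: the region between the graphs is a triangle with base 1/n
# and height 1, of area 1/(2n); approximate the integral by a Riemann sum
l1_dist = sum(abs(f_n(n, x) - f(x)) * dx for x in xs)
assert abs(l1_dist - 1 / (2 * n)) < 1e-3
```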
15.3. Uniformly equivalent metrics. The reference here is the Deﬁnition
and Theorem on Reed page 201, and Problem 11 page 203. However, we will only
consider uniformly equivalent metrics, rather than equivalent metrics.
Definition 15.3.1. Two metrics ρ and σ on the same set M are uniformly
equivalent^48 if there are positive numbers c_1 and c_2 such that
c_1 ρ(x, y) ≤ σ(x, y) ≤ c_2 ρ(x, y)
for all x, y ∈ M.
It follows that
c_2^{−1} σ(x, y) ≤ ρ(x, y) ≤ c_1^{−1} σ(x, y).
Reed also defines a weaker notion of “equivalent” (“uniformly equivalent” implies
“equivalent” but not conversely). We will just need the notion of uniformly equivalent.
The following theorem has a slightly simpler proof than the analogous one in the
text for equivalent metrics.
Theorem 15.3.2. Suppose ρ and σ are uniformly equivalent metrics on M.
Then x_n → x with respect to ρ iff x_n → x with respect to σ.
Proof. Since σ(x_n, x) ≤ c_2 ρ(x_n, x), it follows that ρ(x_n, x) → 0 implies
σ(x_n, x) → 0. Similarly for the other implication.
The following example is a generalisation of Problem 12 page 203.
Example 15.3.3. The metrics ρ_p on R^n are uniformly equivalent to one
another.
Proof. It is sufficient to show that ρ_p for 1 ≤ p < ∞ is uniformly equivalent
to ρ_∞ (since if two metrics are each uniformly equivalent to a third metric, then
from Definition 15.3.1 it follows fairly easily that they are uniformly equivalent to
one another — Exercise).
But
ρ_p(x, y) = ( |x_1 − y_1|^p + ⋯ + |x_n − y_n|^p )^{1/p} ≤ ( n max_{1≤i≤n} |x_i − y_i|^p )^{1/p} = n^{1/p} ρ_∞(x, y),
and
ρ_∞(x, y) = max_{1≤i≤n} |x_i − y_i| ≤ ρ_p(x, y).
This completes the proof.
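The two inequalities in the proof can be spot-checked numerically on random points (an illustration only; n = 6, p = 3 and the sampling are arbitrary choices of mine):

```python
import random

# Spot-check of rho_inf <= rho_p <= n**(1/p) * rho_inf on random points of R^n.
def rho_p(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def rho_inf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

random.seed(0)
n, p = 6, 3
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(n)]
    y = [random.uniform(-5, 5) for _ in range(n)]
    # the uniform-equivalence sandwich, with a small float tolerance
    assert rho_inf(x, y) - 1e-9 <= rho_p(x, y, p) <= n ** (1 / p) * rho_inf(x, y) + 1e-9
```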
Remark 15.3.1. The metrics ρ_p on function spaces are not uniformly equivalent
(or even equivalent).
It is easiest to see this for ρ_1 and ρ_∞ on R[a, b] (the space of Riemann
integrable functions on [a, b]). It follows from the example in the previous section,
since f_n → f with respect to ρ_1 but not with respect to ρ_∞.
47. Remember that “Riemann integrable” implies “bounded” according to our definitions.
48. What is here called “uniformly equivalent” is usually called “equivalent”. But for
consistency, I will keep to the convention in Reed.
15.4. Cauchy sequences and completeness. The reference here is Reed
§5.7 to the end of page 204.
The definition of a Cauchy sequence is exactly as for Definition 11.5.1 in R^p.
Definition 15.4.1. A sequence (x_n) is a Cauchy sequence if for any ε > 0
there is a corresponding N such that
m, n ≥ N ⇒ ρ(x_m, x_n) ≤ ε.
It follows from the Definition in Reed page 177 that for a sequence of functions,
“Cauchy in the sup norm” is the same as “Cauchy with respect to the sup metric”.
It follows, as for convergence, that if a sequence is Cauchy with respect to one
metric then it is Cauchy with respect to any uniformly equivalent metric.
The definition of a complete metric space is extremely important (Reed page 204).
Definition 15.4.2. A metric space (M, ρ) is complete if every Cauchy sequence
from M converges to an element in M.
Moreover (Problem 10(a) page 209):
Theorem 15.4.3. Suppose ρ and σ are uniformly equivalent metrics on M.
Then (M, ρ) is complete iff (M, σ) is complete.
Proof. Suppose (M, ρ) is complete. Let (x_n) be Cauchy with respect to σ.
Then it is Cauchy with respect to ρ and hence converges to x (say) with respect
to ρ. By Theorem 15.3.2, x_n → x with respect to σ. It follows that (M, σ) is
complete.
Example 15.4.4. The Cauchy Completeness Axiom says that (R, ρ_2) is complete,
and it follows from Proposition 11.5.3 that (R^n, ρ_2) is also complete.
It then follows from the previous theorem that (R^n, ρ_p) is complete for any
1 ≤ p ≤ ∞.
More generally, if M ⊆ R^n is closed, then (M, ρ_p) is complete for any 1 ≤ p ≤ ∞.
To see this, consider any Cauchy sequence (x_i) in (M, ρ_2). Since (x_i) is also a
Cauchy sequence in R^n, it converges to x (say) in R^n. Since M is closed, x ∈ M.
Hence (M, ρ_2) is complete. It follows from the previous theorem that (M, ρ_p) is
complete for any 1 ≤ p ≤ ∞.
If M ⊆ R^n is not closed, then (M, ρ_p) is not complete. To see this, choose a
sequence (x_i) in M such that x_i → x ∉ M. Since (x_i) converges in R^n it is Cauchy
in R^n and hence in M, but it does not converge to a member of M. Hence (M, ρ_p)
is not complete.
Example 15.4.5. (C[a, b], ρ_∞) is complete, from Reed Theorem 5.3.3 page 178.
However, (C[a, b], ρ_1) is not complete.
To see this, consider the example in Section 15.2. It is fairly easy to see that
the sequence (f_n) is Cauchy in the ρ_1 metric. In fact if m > n then
ρ_1(f_n, f_m) = ∫_0^{1/n} |f_n − f_m| ≤ ∫_0^{1/n} ( |f_n| + |f_m| ) ≤ 2/n.
Since f_n → f in the larger space (R[−1, 1], ρ_1) of Riemann integrable functions
on [−1, 1] with the ρ_1 metric, it follows that (f_n) is Cauchy in this space, and hence
is also Cauchy in (C[−1, 1], ρ_1). But f_n does not converge to a member of this
space.^49
49. If it did, it would converge to the same function in the larger space, contradicting
uniqueness of limits in the larger space.
15.5. Contraction Mapping Principle. The reference here is Reed §5.7.
The definition of a contraction in Section 11.6 for a function F : S → S where
S ⊆ R^p applies with S replaced by M and d replaced by ρ for any metric space
(M, ρ).
The Contraction Mapping Theorem, Theorem 11.6.1, applies with exactly the
same proof, for any complete metric space (M, ρ) instead of the closed set S and
the Euclidean metric d. Or see Reed Theorem 5.7.1 page 205.
Completeness is needed in the proof in order to show that the Cauchy sequence
of iterates actually converges to a member of M.
In Section 16 we will give some important applications of the Contraction
Mapping Principle. In particular, we will apply it to the complete metric space
(C[a, b], ρ_∞).
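The proof of the Contraction Mapping Theorem is constructive: iterate F from any starting point. A minimal sketch (the choice F = cos on [0, 1] is mine; it is a contraction there since |F′(x)| = |sin x| ≤ sin 1 < 1):

```python
import math

# The iteration behind the Contraction Mapping Theorem: x, F(x), F(F(x)), ...
# forms a Cauchy sequence whose limit is the unique fixed point.
def iterate(F, x0, tol=1e-12, max_iter=10000):
    x = x0
    for _ in range(max_iter):
        x_next = F(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

fixed = iterate(math.cos, 0.5)
assert abs(math.cos(fixed) - fixed) < 1e-10   # a genuine fixed point
```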
15.6. Topological notions in metric spaces*. In this section I will point
out that much of what we proved for (R^n, d) applies to an arbitrary metric space
(M, ρ) with the same proofs. The extensions are straightforward and I include
them for completeness and future reference in the Analysis II course. But we will
not actually need the material in this course.
References at about the right level are “Primer of Modern Analysis” by Kennan
T. Smith, Chapter 7, and “An Introduction to Analysis” (second edition) by
William Wade, Chapter 10.
The definitions of neighbourhood, open ball B_r(x), and open set are analogous
to those in Section 10.5. Theorem 10.5.2 remains true with R^n replaced by M.
The definition of closed set is analogous to that in Section 10.6, and Theorem
10.6.2 is true with R^n replaced by M.
The definitions of boundary point, boundary, interior point, interior, exterior
point and exterior are analogous to those in Section 10.8. The three dot points
there are true for S ⊆ M.
The product of two metric spaces (M_1, ρ_1) and (M_2, ρ_2) is the set M_1 × M_2
with the metric ρ_1 × ρ_2 defined by
(ρ_1 × ρ_2)( (x_1, x_2), (y_1, y_2) ) = √( ρ_1(x_1, y_1)^2 + ρ_2(x_2, y_2)^2 ).
If A_1 is open in (M_1, ρ_1) and A_2 is open in (M_2, ρ_2) then A_1 × A_2 is open in
(M_1 × M_2, ρ_1 × ρ_2). The proof is the same as Theorem 10.10.1.
A set is closed iff every convergent sequence from the set has its limit in the
set; the proof is the same as for Theorem 11.2.1.
The definition of a bounded set is the same as in Section 11.3, the analogue
of Proposition 11.3.2 holds (with the origin replaced by any fixed point a* ∈ M),
and convergent sequences are bounded (Proposition 11.3.4).
Bounded sequences do not necessarily have a convergent subsequence (c.f. Theorem
11.4.2). The sequence (f_n) in Section 15.2 is bounded in C[−1, 1] but no
subsequence converges. In fact it is not too hard to check that
ρ_∞(f_n, f_m) ≥ 1/2
if m ≥ 2n. To see this take x = 1/m. Then
|f_m(x) − f_n(x)| = 1 − n/m ≥ 1 − 1/2 = 1/2.
It follows that no subsequence can be Cauchy, and in particular no subsequence
can converge.
Cauchy sequences are defined as in Definition 11.5.1. Convergent sequences
are Cauchy; this is proved easily as for sequences of real numbers. But Cauchy
sequences will not necessarily converge (to a point in the metric space) unless the
metric space is complete.
A metric space (M, ρ) is sequentially compact if every sequence from M has
a convergent subsequence (to a point in M). This corresponds to the set A in
Definition 12.1.1 being sequentially compact. The metric space (M, ρ) is compact
if every open cover (i.e. a cover of M by subsets of M that are open in M) has a
finite subcover. This corresponds to the set A in Definition 12.2.2 being compact
(although A is covered by sets which are open in R^p, the restriction of these sets
to A is a cover of A by sets that are open in A).
A subset of a metric space (M, ρ) is sequentially compact (compact) iff it is
sequentially compact (compact) when regarded as a metric space with the induced
metric from M.
A metric space (M, ρ) is totally bounded if for every ε > 0 there is a cover of
M by a finite number of open balls (in M) of radius ε. Then it can be proved that
a metric space is totally bounded iff every sequence has a Cauchy subsequence; see
Smith Theorem 9.9. (Note that what Smith calls “compact” is what is here, and
usually, called “sequentially compact”.)
A metric space is complete and totally bounded iff it is sequentially compact iff
it is compact. This is the analogue of Theorems 12.1.2 and 12.3.1, where the metric
space corresponds to A with the metric induced from R^p.
The proof of the first “iff” is fairly straightforward. First suppose (M, ρ) is
complete and totally bounded. If (x_n) is a sequence from M then it has a Cauchy
subsequence by the result two paragraphs back, and this subsequence converges to
a limit in M by completeness, and so (M, ρ) is sequentially compact. Next suppose
(M, ρ) is sequentially compact. Then it is totally bounded, again by the result two
paragraphs back. To show it is complete, suppose the sequence (x_n) is Cauchy:
first, by sequential compactness it has a subsequence which converges to a member
of M, and second, we use the fact (Exercise) that if a Cauchy sequence has a
convergent subsequence then the Cauchy sequence itself must converge to the same
limit as the subsequence.
The proof of the second “iff” is similar to that of Theorem 12.3.1. The proof
that “compact” implies “sequentially compact” is essentially identical. The proof
in the other direction uses the existence of a countable dense subset of M. (This
replaces the idea of considering those points in R^p all of whose components are
rational.) We say a subset D of M is dense if for each x ∈ M and each ε > 0 there
is an x* ∈ D such that x ∈ B_ε(x*). The existence of a countable dense subset of
M follows from sequential compactness.^50
If (M, ρ) and (N, σ) are metric spaces and f : M → N then we say f is
continuous at a ∈ M if
(x_n) ⊆ M and x_n → a ⇒ f(x_n) → f(a).
It follows that f is continuous at a iff for every ε > 0 there is a δ > 0 such that for
all x ∈ M:
ρ(x, a) < δ ⇒ σ(f(x), f(a)) < ε.
The proof is the same as for Theorem 7.1.2. One defines uniform continuity as in
Definition 13.1.3.
A continuous function defined on a compact set is bounded above and below and
has a maximum and a minimum value. Moreover it is uniformly continuous. The
proof is the same as for Theorem 13.2.1, after using the fact that compactness is
equivalent to sequential compactness.
50. We have already noted that a sequentially compact set is totally bounded. Let A_k be
the finite set of points corresponding to ε = 1/k in the definition of total boundedness. Let
A = ∪_{k=1}^∞ A_k. Then A is countable. It is also dense, since if x ∈ M then there exist
points in A as close to x as we wish.
Limit points, isolated points, the deﬁnition of limit of a function at a point,
and the equivalence of this deﬁnition with the ε–δ characterisation, are completely
analogous to Deﬁnition 13.3.1 and Theorem 13.3.2.
A function f : M → N is continuous iff the inverse image of every open
(closed) set in N is open (closed) in M. The proof is essentially the same as in
Theorem 13.4.3.
The continuous image of a compact set is compact. The proof is the same as
in Theorem 13.5.1.
The inverse of a one-to-one continuous function defined on a compact set is
continuous. The proof is essentially the same as for Theorem 13.5.2.
16. Some applications of the Contraction Mapping Principle
16.1. Markov processes. Recall Theorem 4.0.2:
Theorem 16.1.1. If all entries in the probability transition matrix P are greater
than 0, and x_0 is a probability vector, then the sequence of vectors
x_0, Px_0, P^2 x_0, . . . ,
converges to a probability vector x*, and this vector does not depend on x_0.
Moreover, the same results are true even if we just assume P^k has all entries
nonzero for some integer k > 1.
The vector x* is the unique nonzero solution of (P − I)x* = 0.
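Theorem 16.1.1 is easy to illustrate numerically. The 3 × 3 matrix below is a hypothetical example of mine (columns sum to 1, all entries positive); iterating P from two different probability vectors reaches the same limit:

```python
# A hypothetical 3-state probability transition matrix (columns sum to 1,
# every entry positive, so Theorem 16.1.1 applies with k = 1).
P = [[0.5, 0.2, 0.3],
     [0.3, 0.6, 0.1],
     [0.2, 0.2, 0.6]]

def apply_P(P, x):
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

def power_iterate(P, x, steps=200):
    for _ in range(steps):
        x = apply_P(P, x)
    return x

a = power_iterate(P, [1.0, 0.0, 0.0])
b = power_iterate(P, [0.0, 0.0, 1.0])
# Both starting probability vectors converge to the same limit x*,
# which is again a probability vector.
assert all(abs(ai - bi) < 1e-10 for ai, bi in zip(a, b))
assert abs(sum(a) - 1.0) < 1e-10
```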
We will give a proof that uses the Contraction Mapping Principle. The proof
is rather subtle, since P is only a contraction with respect to the ρ_1 metric, and it
is also necessary to restrict to the subset of R^n consisting of those vectors a such
that Σ a_i = 1.
Lemma 16.1.2. Suppose P is a probability transition matrix all of whose entries
are at least ε, for some ε > 0. Then P is a contraction map on
M = { a ∈ R^n | Σ a_i = 1 }
in the ρ_1 metric, with contraction constant 1 − nε.
Proof. We will prove that
ρ_1(Pa, Pb) ≤ (1 − nε) ρ_1(a, b) (24)
for all a, b ∈ M.
Suppose a, b ∈ M. Let v = a − b. Then v ∈ H, where
H = { v ∈ R^n | Σ v_i = 0 }.
Moreover,
ρ_1(a, b) = ‖a − b‖_1 = ‖v‖_1,
ρ_1(Pa, Pb) = ‖Pa − Pb‖_1 = ‖Pv‖_1,
where ‖w‖_1 = Σ |w_i|.
Thus in order to prove (24) it is sufficient to show
‖Pv‖_1 ≤ (1 − nε) ‖v‖_1 (25)
for all v ∈ H.
From the following Lemma, we can write
v = Σ_{ij} η_{ij} (e_i − e_j), where η_{ij} > 0, ‖v‖_1 = 2 Σ η_{ij}. (26)
Then
‖Pv‖_1 = ‖ Σ_{ij} η_{ij} P(e_i − e_j) ‖_1
≤ Σ_{ij} η_{ij} ‖P(e_i − e_j)‖_1
(by the triangle inequality for ‖·‖_1 and using η_{ij} > 0)
= Σ_{ij} η_{ij} ‖ Σ_k (P_{ki} − P_{kj}) e_k ‖_1
= Σ_{ij} η_{ij} Σ_k |P_{ki} − P_{kj}|
= Σ_{ij} η_{ij} Σ_k ( P_{ki} + P_{kj} − 2 min{P_{ki}, P_{kj}} )
(since |a − b| = a + b − 2 min{a, b} for any real numbers a and b, as one sees by
checking, without loss of generality, the case b ≤ a)
≤ Σ_{ij} η_{ij} (2 − 2nε)
(since the columns in P sum to 1 and each entry is ≥ ε, so Σ_k min{P_{ki}, P_{kj}} ≥ nε)
= (1 − nε) ‖v‖_1 (from (26)).
This completes the proof.
Lemma 16.1.3. Suppose v ∈ R^n and Σ v_i = 0. Then v can be written in the
form
v = Σ_{i,j} η_{ij} (e_i − e_j), where η_{ij} > 0, ‖v‖_1 = 2 Σ η_{ij}.
Proof. If v is in the span of two basis vectors, then after renumbering we
have
v = v_1 e_1 + v_2 e_2 (where v_1 + v_2 = 0)
= v_1 (e_1 − e_2)
= v_2 (e_2 − e_1).
Since either v_1 ≥ 0 and then ‖v‖_1 = 2v_1, or v_2 ≥ 0 and then ‖v‖_1 = 2v_2, this
proves the result in this case.
Suppose the claim is true for any v ∈ H which is in the span of k basis vectors.
Assume that v is in the span of k + 1 basis vectors, and write (after renumbering if
necessary)
v = v_1 e_1 + ⋯ + v_{k+1} e_{k+1},
where v_1 + ⋯ + v_{k+1} = 0.
Choose p so that
|v_p| = max{ |v_1|, . . . , |v_{k+1}| }.
Choose q so that v_q has the opposite sign to v_p (this is possible since Σ v_i = 0).
Note that
|v_p + v_q| = |v_p| − |v_q| (27)
since |v_q| ≤ |v_p| and since v_q has the opposite sign to v_p.
Write (where ˆ indicates that the relevant term is missing from the sum)
v = ( v_1 e_1 + ⋯ + v̂_q e_q + ⋯ + (v_p + v_q) e_p + ⋯ + v_{k+1} e_{k+1} ) + v_q (e_q − e_p)
= v* + v_q (e_q − e_p).
Since v* has no e_q component, and the sum of the coefficients of v* is Σ v_i = 0,
we can apply the inductive hypothesis to v* to write
v* = Σ η_{ij} (e_i − e_j), where η_{ij} > 0, ‖v*‖_1 = 2 Σ η_{ij}.
In particular,
v = Σ η_{ij} (e_i − e_j) + |v_q| (e_q − e_p)  if v_q ≥ 0,
v = Σ η_{ij} (e_i − e_j) + |v_q| (e_p − e_q)  if v_q ≤ 0.
16. SOME APPLICATONS OF THE CONTRACTION MAPPING PRINCIPLE 78
All that remains to be proved is
‖v‖_1 = 2 Σ η_{ij} + 2|v_q|,
i.e.
‖v‖_1 = ‖v*‖_1 + 2|v_q|. (28)
But
‖v*‖_1 = |v_1| + ⋯ + |v̂_q| + ⋯ + |v_p + v_q| + ⋯ + |v_{k+1}|
= ( |v_1| + ⋯ + |v_{k+1}| ) − |v_q| − |v_p| + |v_p + v_q|
= ( |v_1| + ⋯ + |v_{k+1}| ) − 2|v_q| by (27)
= ‖v‖_1 − 2|v_q|.
This proves (28) and hence the Lemma.
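The inductive proof is effectively an algorithm for producing the decomposition, and can be sketched in code (this transcription, including the handling of signs, is my own):

```python
# Algorithmic form of the induction: repeatedly peel off a term using the
# largest-magnitude coordinate v_p and a coordinate v_q of opposite sign.
def decompose(v, tol=1e-12):
    v = list(v)                       # work on a copy
    terms = []                        # (c, i, j) stands for c*(e_i - e_j), c > 0
    while sum(1 for c in v if abs(c) > tol) > 1:
        p = max(range(len(v)), key=lambda i: abs(v[i]))
        q = next(i for i in range(len(v)) if v[i] * v[p] < 0)
        c = abs(v[q])
        # peel off c*(e_q - e_p) if v_q > 0, else c*(e_p - e_q)
        terms.append((c, q, p) if v[q] > 0 else (c, p, q))
        v[p] += v[q]
        v[q] = 0.0
    return terms

v = [3.0, -1.0, -2.5, 0.5]            # coordinates sum to 0
terms = decompose(v)
# Conclusion of the lemma: ||v||_1 = 2 * sum of the eta_ij
assert abs(sum(abs(c) for c in v) - 2 * sum(c for c, _, _ in terms)) < 1e-9
```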
Proof of Theorem 16.1.1. The first paragraph of the theorem follows from
the Contraction Mapping Principle and Lemma 16.1.2.
The last paragraph was proved after the statement of Theorem 4.0.2.
For the second paragraph we consider the sequence P^i x_0 with i → ∞. From
Lemma 16.1.2, P^k is a contraction map with contraction constant λ (say). For any
natural number i we can write i = mk + j, where 0 ≤ j < k.
Then
ρ_1(P^i x_0, x*) = ρ_1(P^{mk+j} x_0, x*)
≤ ρ_1(P^{mk+j} x_0, P^{mk} x_0) + ρ_1(P^{mk} x_0, x*)
= ρ_1( (P^k)^m P^j x_0, (P^k)^m x_0 ) + ρ_1( (P^k)^m x_0, x* )
≤ λ^m ρ_1( P^j x_0, x_0 ) + ρ_1( (P^k)^m x_0, x* )
≤ λ^m max_{0≤j<k} ρ_1( P^j x_0, x_0 ) + ρ_1( (P^k)^m x_0, x* ).
(Note that m → ∞ as i → ∞.) The first term approaches 0 since λ^m → 0,
and the second approaches 0 by the Contraction Mapping Principle applied to the
contraction map P^k.
Remark 16.1.1. The fact that we only needed to assume some power of P is a
contraction map can easily be generalised.
That is, in the Contraction Mapping Principle, Theorem 11.6.1, if we assume
the k-fold composition F ∘ ⋯ ∘ F is a contraction map for some k, then there is
still a unique fixed point. The proof is similar to that above for P.
16.2. Integral Equations. See Reed Section 5.4 for discussion.
Instead of the long proof in Reed of the main theorem, Theorem 5.4.1, we see
here that it is a consequence of the Contraction Mapping Theorem.
Theorem 16.2.1. Let f(x) be a continuous function defined on [a, b]. Suppose
K(x, y) is continuous on [a, b] × [a, b] and
M = max{ |K(x, y)| : a ≤ x ≤ b, a ≤ y ≤ b }.
Then there is a unique continuous function ψ(x) defined on [a, b] such that
ψ(x) = f(x) + λ ∫_a^b K(x, y) ψ(y) dy, (29)
provided |λ| < 1/(M(b − a)).
16. SOME APPLICATONS OF THE CONTRACTION MAPPING PRINCIPLE 79
Proof. We will use the Contraction Mapping Principle on the complete metric
space (C[a, b], ρ_∞).
Define
T : C[a, b] → C[a, b]
by
(Tψ)(x) = f(x) + λ ∫_a^b K(x, y) ψ(y) dy. (30)
Note that if ψ ∈ C[a, b], then Tψ is certainly a function defined on [a, b] — the
value of Tψ at x ∈ [a, b] is obtained by evaluating the right side of (30). For fixed
x, the integrand in (30) is a continuous function of y, and so the integral exists.
We next claim that Tψ is continuous, and so indeed T : C[a, b] → C[a, b].
The first term on the right side of (30) is continuous, being f.
The second term is also continuous. To see this let G(x, y) = K(x, y) ψ(y) and
let g(x) = ∫_a^b G(x, y) dy. Then Tψ = f + λg. To see that g is continuous on [a, b],
first note that
|g(x_1) − g(x_2)| = | ∫_a^b ( G(x_1, y) − G(x_2, y) ) dy | ≤ ∫_a^b |G(x_1, y) − G(x_2, y)| dy. (31)
Now suppose ε > 0. Since G(x, y) is continuous on the closed bounded set
[a, b] × [a, b], it is uniformly continuous there (by Theorem 13.2.1). Hence there is
a δ > 0 such that
|(x_1, y_1) − (x_2, y_2)| < δ ⇒ |G(x_1, y_1) − G(x_2, y_2)| < ε.
Using this it follows from (31) that
|x_1 − x_2| < δ ⇒ |g(x_1) − g(x_2)| < ε(b − a).
Hence g is uniformly continuous on [a, b].
This completes the proof that Tψ is continuous (in fact uniformly continuous) on [a, b].
Hence
T : C[a, b] → C[a, b].
Moreover, ψ is a fixed point of T iff ψ solves (29).
We next claim that T is a contraction map on C[a, b]. To see this we estimate
|Tψ_1(x) − Tψ_2(x)| = |λ| | ∫_a^b K(x, y) (ψ_1(y) − ψ_2(y)) dy |
≤ |λ| ∫_a^b |K(x, y)| |ψ_1(y) − ψ_2(y)| dy
≤ |λ| M(b − a) ρ_∞(ψ_1, ψ_2).
Since this is true for every x ∈ [a, b], it follows that
ρ_∞(Tψ_1, Tψ_2) ≤ |λ| M(b − a) ρ_∞(ψ_1, ψ_2).
By the assumption on λ, it follows that T is a contraction map with contraction
ratio |λ| M(b − a), which is < 1.
Because (C[a, b], ρ_∞) is a complete metric space, it follows that T has a unique
fixed point, and so there is a unique continuous function ψ solving (29).
16. SOME APPLICATONS OF THE CONTRACTION MAPPING PRINCIPLE 80
Remark 16.2.1. Note that we also have a way of approximating the solution.
We can begin with some function such as
ψ_0(x) = 0 for a ≤ x ≤ b,
and then successively apply T to find better and better approximations to the
solution ψ.
In practice, we apply some type of numerical integration to find approximations
to Tψ_0, T^2 ψ_0, T^3 ψ_0, . . . .
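Here is a sketch of this scheme in code (the choices f(x) = x, K(x, y) = xy, λ = 1/2 on [0, 1] are mine; then M = 1 and |λ|M(b − a) = 1/2 < 1 as required, and for this separable kernel the exact solution is ψ(x) = (6/5)x):

```python
# Successive approximation for psi(x) = f(x) + lam * int_0^1 K(x,y) psi(y) dy.
N = 200
xs = [i / N for i in range(N + 1)]
lam = 0.5

def f(x):
    return x

def K(x, y):
    return x * y

def T(psi):
    # one application of the map T, using the trapezoidal rule for the integral
    def integral(x):
        vals = [K(x, y) * p for y, p in zip(xs, psi)]
        return (sum(vals) - 0.5 * (vals[0] + vals[-1])) / N
    return [f(x) + lam * integral(x) for x in xs]

psi = [0.0] * (N + 1)                 # start from psi_0 = 0
for _ in range(50):
    psi = T(psi)

# Exact solution psi(x) = (6/5) x, so psi(1) should be close to 1.2.
assert abs(psi[-1] - 1.2) < 1e-3
```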
16.3. Differential Equations. In this section we will prove the Fundamental
Existence and Uniqueness Theorem for Differential Equations, Theorem 7.1.1 of
Reed.
You should first read Chapter 7 of Reed up to the beginning of Theorem 7.1.1.
We will prove Theorem 7.1.1 here, but with a simpler proof using the Contraction
Mapping Principle.
Theorem 16.3.1. Let f be continuously differentiable on the square
S = [t_0 − δ, t_0 + δ] × [y_0 − δ, y_0 + δ].
Then there is a T ≤ δ, and a unique continuously differentiable function y(t) defined
on [t_0 − T, t_0 + T], such that
dy/dt = f(t, y) on [t_0 − T, t_0 + T],  y(t_0) = y_0. (32)
Remark 16.3.1.
1. The diagram shows the square S centred at (t_0, y_0) and illustrates the
situation. We are looking for the unique curve through the point (t_0, y_0) which
at every point is tangent to the line segment of slope f(t, y).
Note that the solution through the point (t_0, y_0) “escapes” through the
top of the square S, and this is why we may need to take T < δ.
From the diagram, the absolute value of the slope of the solution will
be ≤ M, where M = max{ |f(t, y)| : (t, y) ∈ S }. For this reason, we will
require that MT ≤ δ, i.e. T ≤ δ/M.
2. The function y(t) must be diﬀerentiable for the diﬀerential equation to be
deﬁned. But it then follows from the diﬀerential equation that dy/dt is in
fact continuous.
* In fact more is true. The right side of the diﬀerential equation is
diﬀerentiable (why?) and its derivative is even continuous (why?). This
shows that dy/dt is diﬀerentiable with continuous derivative, i.e. y(t) is in
fact twice continuously diﬀerentiable. If f is inﬁnitely diﬀerentiable, it can
similarly be shown that y(t) is also inﬁnitely diﬀerentiable.
Proof of Theorem.
Step 1: We first reduce the problem to an equivalent integral equation (not the
same one as in the last section, however).
It follows by integrating (32) from t_0 to t that any solution y(t) of (32) satisfies
y(t) = y_0 + ∫_{t_0}^t f(s, y(s)) ds. (33)
Conversely, any continuous function y(t) satisfying (33) is differentiable (by the
Fundamental Theorem of Calculus) and its derivative satisfies the differential
equation in (32). It also satisfies the initial condition y(t_0) = y_0 (why?).
Step 2: We next show that solving (33) is the same as finding a certain “fixed
point”.
If y(t) is a continuous function defined on [t_0 − δ, t_0 + δ], let Fy (often written
F(y)) be the function defined by
(Fy)(t) = y_0 + ∫_{t_0}^t f(s, y(s)) ds. (34)
For the integral to be defined, we require that (s, y(s)) belong to the square
S, and for this reason we will need to restrict to functions y(t) defined on some
sufficiently small interval [t_0 − T, t_0 + T].
More precisely, since f is continuous and hence bounded on S, it follows that
there is a constant M such that
(t, y) ∈ S ⇒ |f(t, y)| ≤ M.
This implies from (34) that
|(Fy)(t) − y_0| ≤ M|t − t_0|.
It follows that if the graph of y(t) is in S, then so is the graph of (Fy)(t), provided
t satisfies M|t − t_0| ≤ δ.
For this reason, we impose the restriction on T that MT ≤ δ, i.e.
T ≤ δ/M. (35)
This ensures that (Fy)(t) is defined for t ∈ [t_0 − T, t_0 + T].
Since f(s, y(s)) is continuous, being a composition of continuous functions, it
follows from the Fundamental Theorem of Calculus that (Fy)(t) is differentiable,
and in particular is continuous. Hence
F : C[t_0 − T, t_0 + T] → C[t_0 − T, t_0 + T].
Moreover, it follows from the definition (34) that y(t) is a fixed point of F iff y(t)
is a solution of (33) on [t_0 − T, t_0 + T] and hence iff y(t) is a solution of (32) on
[t_0 − T, t_0 + T].
Step 3: We next impose a further restriction on T in order that F be a contraction
map. For this we need the fact that, since ∂f/∂y is continuous on S, there is a
constant K such that
(t, y) ∈ S ⇒ |∂f/∂y| ≤ K.
We now compute, for any t ∈ [t_0 − T, t_0 + T] and any y_1, y_2 ∈ C[t_0 − T, t_0 + T],
that
|Fy_1(t) − Fy_2(t)| = | ∫_{t_0}^t ( f(s, y_1(s)) − f(s, y_2(s)) ) ds |
≤ | ∫_{t_0}^t | f(s, y_1(s)) − f(s, y_2(s)) | ds |
≤ | ∫_{t_0}^t K |y_1(s) − y_2(s)| ds |  (by the Mean Value Theorem)
≤ KT ρ_∞(y_1, y_2),
since |t − t_0| ≤ T and since |y_1(s) − y_2(s)| ≤ ρ_∞(y_1, y_2).
Since t is any point in [t_0 − T, t_0 + T], it follows that
ρ_∞(Fy_1, Fy_2) ≤ KT ρ_∞(y_1, y_2).
We now make the second restriction on T, that
T ≤ 1/(2K). (36)
This guarantees that F is a contraction map on C[t_0 − T, t_0 + T] with contraction
constant 1/2. It follows that F has a unique fixed point, and that this is then the
unique solution of (32).
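The iteration y ↦ Fy from the proof can be carried out numerically. A sketch (the test problem y′ = y, y(0) = 1, so f(t, y) = y with K = 1 and T = 0.5 ≤ 1/(2K), and the grid are my own choices; the iterates converge to y(t) = e^t):

```python
import math

# Picard iteration y -> Fy, (Fy)(t) = y_0 + int_{t_0}^t f(s, y(s)) ds.
# Iterates are stored by their values on a grid; the integral is a cumulative
# trapezoidal sum.
N = 500
T_end = 0.5
h = T_end / N

def F(y):
    out = [1.0]                       # y_0 = 1
    acc = 0.0
    for i in range(1, N + 1):
        acc += 0.5 * h * (y[i - 1] + y[i])   # here f(t, y) = y
        out.append(1.0 + acc)
    return out

y = [1.0] * (N + 1)                   # initial guess: the constant function 1
for _ in range(30):
    y = F(y)

assert abs(y[-1] - math.exp(0.5)) < 1e-5   # y(0.5) is close to e^{0.5}
```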
Remark 16.3.2. It appears from the diagram that we do not really need the
second restriction (36) on T. In other words, we should only need (35). This is
indeed the case. One can show by repeatedly applying the previous theorem that
the solution can be continued until it either escapes through the top, bottom or
sides of S. In fact this works for much more general sets S.
In particular, if the function f(t, y) and the differential equation are defined
for all (t, y) ∈ R^2, then the solution will either approach +∞ or −∞ at some finite
time t*, or will exist for all time. For more discussion see Reed Section 7.2.
Remark 16.3.3. The same proof, with only notational changes, works for general
first order systems of differential equations of the form
dy_1/dt = f_1(t, y_1, y_2, . . . , y_n),
dy_2/dt = f_2(t, y_1, y_2, . . . , y_n),
⋮
dy_n/dt = f_n(t, y_1, y_2, . . . , y_n).
Remark 16.3.4. Second and higher order diﬀerential equations can be reduced
to ﬁrst order systems by introducing new variables for the lower order derivates.
For example, the second order diﬀerential equation
y
= F(t, y, y
)
is equivalent to the ﬁrst order system of diﬀerential equations
y
1
= y
2
, y
2
= F(t, y
1
, y
2
).
16. SOME APPLICATONS OF THE CONTRACTION MAPPING PRINCIPLE 83
To see this, suppose y(t) is a solution of the given diﬀerential equation and let
y
1
= y, y
2
= y
. Then clearly y
1
(t), y
2
(t) solve the ﬁrst order system.
Conversely, if y
1
(t), y
2
(t) solve the ﬁrst order system let y = y
1
. Then y
= y
2
and y(t) solves the second order diﬀerential equation.
It follows from the previous remark that a similar Existence and Uniqueness Theorem applies to second order differential equations, provided we specify both y(t_0) and y'(t_0).
A similar remark applies to nth order differential equations, except that one must specify the first n − 1 derivatives at t_0.
Finally, similar remarks also apply to higher order systems of differential equations. There are no new ideas involved, just notation!
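The reduction just described can be sketched in code. The example below is an illustration, not from the notes: it takes F(t, y, y') = −y, so the equation is y'' = −y with y(0) = 1, y'(0) = 0 and exact solution cos t, rewrites it as the system y_1' = y_2, y_2' = F(t, y_1, y_2), and integrates with the elementary Euler method (the step size is an arbitrary choice).

```python
import math

# Reduce y'' = F(t, y, y') to the system y1' = y2, y2' = F(t, y1, y2),
# then integrate the system with Euler's method.

def solve_second_order(F, t0, y0, dy0, t_end, h=1e-3):
    t, y1, y2 = t0, y0, dy0           # y1 = y, y2 = y'
    while t < t_end:
        y1, y2 = y1 + h * y2, y2 + h * F(t, y1, y2)
        t += h
    return y1

y_at_1 = solve_second_order(lambda t, y, dy: -y, 0.0, 1.0, 0.0, 1.0)
print(abs(y_at_1 - math.cos(1.0)))    # small: the system reproduces y'' = -y
```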
17. Differentiation of Real-Valued Functions
I have included quite a lot of additional material, for future reference.
You need only consider Subsections 1, 3, 4, 5, 7, 8, 10. What is required is just
a basic understanding of the ideas and application to straightforward examples.
Note that the definition of "differentiable" on Reed page 155 is completely non-standard. It is the same as what is here, and elsewhere, called "continuously differentiable".
17.1. Introduction. In this section we discuss the notion of derivative (i.e. differential) for functions f : D(⊂ R^n) → R. In the next chapter we consider the case for functions f : D(⊂ R^n) → R^m.
If m = 1 and n = 1 or 2, we can sometimes represent such a function by
drawing its graph. In case n = 2 (or perhaps n = 3) we can draw the level sets, as
is done in Section 17.6.
Convention. Unless stated otherwise, we consider functions f : D(⊂ R^n) → R where the domain D is open. This implies that for any x ∈ D there exists r > 0 such that B_r(x) ⊂ D.
Most of the following applies to more general domains D by taking one-sided, or otherwise restricted, limits. No essentially new ideas are involved.
17.2. Algebraic Preliminaries. The inner product in R^n is represented by

    y · x = y_1 x_1 + . . . + y_n x_n

where y = (y_1, . . . , y_n) and x = (x_1, . . . , x_n).
For each fixed y ∈ R^n the inner product enables us to define a linear function

    L_y = L : R^n → R

given by

    L(x) = y · x.
Conversely, we have the following.
Proposition 17.2.1. For any linear function

    L : R^n → R

there exists a unique y ∈ R^n such that

    L(x) = y · x   ∀x ∈ R^n.    (37)

The components of y are given by y_i = L(e_i).
(Diagram: the line L through x in the direction e_i, lying in the domain D, with the point x + te_i marked.)
Proof. Suppose L : R^n → R is linear. Define y = (y_1, . . . , y_n) by

    y_i = L(e_i),   i = 1, . . . , n.

Then

    L(x) = L(x_1 e_1 + · · · + x_n e_n)
         = x_1 L(e_1) + · · · + x_n L(e_n)
         = x_1 y_1 + · · · + x_n y_n = y · x.

This proves the existence of y satisfying (37).
The uniqueness of y follows from the fact that if (37) is true for some y, then on choosing x = e_i it follows we must have

    L(e_i) = y_i,   i = 1, . . . , n.

Note that if L is the zero operator, i.e. if L(x) = 0 for all x ∈ R^n, then the vector y corresponding to L is the zero vector.
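The proof above is constructive, and can be sketched directly in code (an illustration; the particular linear map L below is an arbitrary choice): the representing vector y is recovered componentwise as y_i = L(e_i), and then y · x agrees with L(x) for every x.

```python
n = 3
L = lambda x: 2 * x[0] - x[1] + 5 * x[2]      # some linear map R^3 -> R

def representing_vector(L, n):
    # evaluate L on the standard basis vectors e_1, ..., e_n
    return [L([1.0 if j == i else 0.0 for j in range(n)]) for i in range(n)]

y = representing_vector(L, n)
x = [3.0, -1.0, 2.0]
dot = sum(yi * xi for yi, xi in zip(y, x))
print(y, dot, L(x))      # y = [2.0, -1.0, 5.0]; dot equals L(x)
```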
17.3. Partial Derivatives.

Definition 17.3.1. The ith partial derivative of f at x is defined by

    ∂f/∂x_i (x) = lim_{t→0} ( f(x + t e_i) − f(x) ) / t    (38)
                = lim_{t→0} ( f(x_1, . . . , x_i + t, . . . , x_n) − f(x_1, . . . , x_i, . . . , x_n) ) / t,

provided the limit exists. The notation D_i f(x) is also used.
Thus ∂f/∂x_i (x) is just the usual derivative at t = 0 of the real-valued function g defined by g(t) = f(x_1, . . . , x_i + t, . . . , x_n). Think of g as being defined along the line L, with t = 0 corresponding to the point x.
17.4. Directional Derivatives.

Definition 17.4.1. The directional derivative of f at x in the direction v ≠ 0 is defined by

    D_v f(x) = lim_{t→0} ( f(x + tv) − f(x) ) / t,    (39)

provided the limit exists.
(Diagram: the line L through x in the direction v, lying in D, with the point x + tv marked.)
(Diagram: the graph of x ↦ f(x) and of its tangent line x ↦ f(a) + f'(a)(x − a) at a.)
It follows immediately from the definitions that

    ∂f/∂x_i (x) = D_{e_i} f(x).    (40)

Note that D_v f(x) is just the usual derivative at t = 0 of the real-valued function g defined by g(t) = f(x + tv). As before, think of the function g as being defined along the line L in the previous diagram.
Thus we interpret D_v f(x) as the rate of change of f at x in the direction v; at least in the case v is a unit vector.
Exercise: Show that D_{αv} f(x) = α D_v f(x) for any real number α.
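The definition and the exercise can both be checked numerically. The sketch below (an illustration; f is an arbitrary smooth choice) approximates D_v f(x) by the difference quotient (f(x + tv) − f(x))/t for small t, and shows D_{2v} f(x) ≈ 2 D_v f(x).

```python
def f(x):
    return x[0] ** 2 + x[0] * x[1]    # an arbitrary smooth function on R^2

def directional_derivative(f, x, v, t=1e-6):
    # difference quotient from Definition 17.4.1
    xt = [xi + t * vi for xi, vi in zip(x, v)]
    return (f(xt) - f(x)) / t

x, v = [1.0, 2.0], [3.0, 4.0]
d1 = directional_derivative(f, x, v)
d2 = directional_derivative(f, x, [2 * vi for vi in v])
print(round(d1, 3), round(d2, 3))     # d2 is (close to) twice d1
```

Here grad f(1, 2) = (4, 1), so D_v f = 3·4 + 4·1 = 16 and D_{2v} f = 32, matching the printed values.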
17.5. The Differential (or Derivative).

Motivation. Suppose f : I (⊂ R) → R is differentiable at a ∈ I. Then f'(a) can be used to define the best linear approximation to f(x) for x near a. Namely:

    f(x) ≈ f(a) + f'(a)(x − a).    (41)

Note that the right-hand side of (41) is linear in x. (More precisely, the right side is a polynomial in x of degree one.)
The error, or difference between the two sides of (41), approaches zero as x → a, faster than |x − a| → 0. More precisely

    |f(x) − ( f(a) + f'(a)(x − a) )| / |x − a|
        = | ( f(x) − ( f(a) + f'(a)(x − a) ) ) / (x − a) |
        = | ( f(x) − f(a) ) / (x − a) − f'(a) | → 0 as x → a.    (42)

(Diagram: the graph of x ↦ f(x), the graph of x ↦ f(a) + L(x − a), and the error f(x) − ( f(a) + L(x − a) ) at x.)
We make this the basis for the next deﬁnition in the case n > 1.
Definition 17.5.1. Suppose f : D(⊂ R^n) → R. Then f is differentiable at a ∈ D if there is a linear function L : R^n → R such that

    |f(x) − ( f(a) + L(x − a) )| / |x − a| → 0 as x → a.    (43)

The linear function L is denoted by f'(a) or df(a) and is called the derivative or differential of f at a. (We will see in Proposition 17.5.2 that if L exists, it is uniquely determined by this definition.)
The idea is that the graph of x ↦ f(a) + L(x − a) is "tangent" to the graph of f(x) at the point (a, f(a)).

Notation: We write ⟨df(a), x − a⟩ for L(x − a), and read this as "df at a applied to x − a". We think of df(a) as a linear transformation (or function) which operates on vectors x − a whose "base" is at a.
The next proposition gives the connection between the differential operating on a vector v, and the directional derivative in the direction corresponding to v. In particular, it shows that the differential is uniquely defined by Definition 17.5.1.
Temporarily, we let df(a) be any linear map satisfying the definition for the differential of f at a.
Proposition 17.5.2. Let v ∈ R^n and suppose f is differentiable at a. Then D_v f(a) exists and

    ⟨df(a), v⟩ = D_v f(a).

In particular, the differential is unique.
Proof. Let x = a + tv in (43). Then

    lim_{t→0} |f(a + tv) − ( f(a) + ⟨df(a), tv⟩ )| / |t| = 0.

Hence

    lim_{t→0} [ ( f(a + tv) − f(a) ) / t − ⟨df(a), v⟩ ] = 0.

Thus

    D_v f(a) = ⟨df(a), v⟩

as required.

Thus ⟨df(a), v⟩ is just the directional derivative at a in the direction v.
The next result shows df(a) is the linear map given by the row vector of partial derivatives of f at a.
Corollary 17.5.3. Suppose f is differentiable at a. Then for any vector v,

    ⟨df(a), v⟩ = Σ_{i=1}^{n} v_i ∂f/∂x_i (a).

That is, df(a) is the row vector

    ( ∂f/∂x_1 (a), . . . , ∂f/∂x_n (a) ).
Proof.

    ⟨df(a), v⟩ = ⟨df(a), v_1 e_1 + · · · + v_n e_n⟩
               = v_1 ⟨df(a), e_1⟩ + · · · + v_n ⟨df(a), e_n⟩
               = v_1 D_{e_1} f(a) + · · · + v_n D_{e_n} f(a)
               = v_1 ∂f/∂x_1 (a) + · · · + v_n ∂f/∂x_n (a).
Example 17.5.4. Let f(x, y, z) = x^2 + 3xy^2 + y^3 z + z.
Then

    ⟨df(a), v⟩ = v_1 ∂f/∂x (a) + v_2 ∂f/∂y (a) + v_3 ∂f/∂z (a)
               = v_1 (2a_1 + 3a_2^2) + v_2 (6a_1 a_2 + 3a_2^2 a_3) + v_3 (a_2^3 + 1).

Thus df(a) is the linear map corresponding to the row vector (2a_1 + 3a_2^2, 6a_1 a_2 + 3a_2^2 a_3, a_2^3 + 1).
If a = (1, 0, 1) then ⟨df(a), v⟩ = 2v_1 + v_3. Thus df(a) is the linear map corresponding to the row vector (2, 0, 1).
If a = (1, 0, 1) and v = e_1 then ⟨df(1, 0, 1), e_1⟩ = ∂f/∂x (1, 0, 1) = 2.
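The row vector in the example can be confirmed by finite differences (an illustrative check, not part of the notes): the partial derivatives of f(x, y, z) = x^2 + 3xy^2 + y^3 z + z at a = (1, 0, 1) should come out as (2, 0, 1).

```python
def f(x, y, z):
    return x ** 2 + 3 * x * y ** 2 + y ** 3 * z + z

def partials(f, a, t=1e-6):
    # forward difference quotient in each coordinate direction
    out = []
    for i in range(3):
        p = list(a)
        p[i] += t
        out.append((f(*p) - f(*a)) / t)
    return out

print([round(p, 4) for p in partials(f, (1.0, 0.0, 1.0))])   # [2.0, 0.0, 1.0]
```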
Definition 17.5.5. (Rates of Convergence) If a function ψ(x) has the property that

    |ψ(x)| / |x − a| → 0 as x → a,

then we say "|ψ(x)| → 0 as x → a, faster than |x − a| → 0". We write o(|x − a|) for ψ(x), and read this as "little oh of |x − a|".
If

    |ψ(x)| / |x − a| ≤ M   ∀ |x − a| < ε,

for some M and some ε > 0, i.e. if |ψ(x)| / |x − a| is bounded as x → a, then we say "|ψ(x)| → 0 as x → a, at least as fast as |x − a| → 0". We write O(|x − a|) for ψ(x), and read this as "big oh of |x − a|".

Example 17.5.6. We can write

    o(|x − a|) for |x − a|^{3/2},

and

    O(|x − a|) for sin(x − a).

Clearly, if ψ(x) can be written as o(|x − a|) then it can also be written as O(|x − a|), but the converse may not be true as the above example shows.
The next proposition gives an equivalent definition for the differential of a function.

Proposition 17.5.7. If f is differentiable at a then

    f(x) = f(a) + ⟨df(a), x − a⟩ + ψ(x),

where ψ(x) = o(|x − a|).
Conversely, suppose

    f(x) = f(a) + L(x − a) + ψ(x),

where L : R^n → R is linear and ψ(x) = o(|x − a|). Then f is differentiable at a and df(a) = L.

Proof. Suppose f is differentiable at a. Let

    ψ(x) = f(x) − ( f(a) + ⟨df(a), x − a⟩ ).

Then

    f(x) = f(a) + ⟨df(a), x − a⟩ + ψ(x),

and ψ(x) = o(|x − a|) from Definition 17.5.1.
Conversely, suppose

    f(x) = f(a) + L(x − a) + ψ(x),

where L : R^n → R is linear and ψ(x) = o(|x − a|). Then

    ( f(x) − ( f(a) + L(x − a) ) ) / |x − a| = ψ(x) / |x − a| → 0 as x → a,

and so f is differentiable at a and df(a) = L.
Finally we have:

Proposition 17.5.8. If f, g : D(⊂ R^n) → R are differentiable at a ∈ D, then so are αf and f + g. Moreover,

    d(αf)(a) = α df(a),
    d(f + g)(a) = df(a) + dg(a).

Proof. This is straightforward (exercise) from Proposition 17.5.7.

The previous proposition corresponds to the fact that the partial derivatives for f + g are the sum of the partial derivatives corresponding to f and g respectively. Similarly for αf.^51

^51 We cannot establish the differentiability of f + g (or αf) this way, since the existence of the partial derivatives does not imply differentiability.
17.6. The Gradient. Strictly speaking, df(a) is a linear operator on vectors in R^n (where, for convenience, we think of these vectors as having their "base at a").
We saw in Section 17.2 that every linear operator from R^n to R corresponds to a unique vector in R^n. In particular, the vector corresponding to the differential at a is called the gradient at a.

Definition 17.6.1. Suppose f is differentiable at a. The vector ∇f(a) ∈ R^n (uniquely) determined by

    ∇f(a) · v = ⟨df(a), v⟩   ∀v ∈ R^n,

is called the gradient of f at a.
Proposition 17.6.2. If f is differentiable at a, then

    ∇f(a) = ( ∂f/∂x_1 (a), . . . , ∂f/∂x_n (a) ).

Proof. It follows from Proposition 17.2.1 that the components of ∇f(a) are ⟨df(a), e_i⟩, i.e. ∂f/∂x_i (a).
Example 17.6.3. For the example in Section 17.5 we have

    ∇f(a) = (2a_1 + 3a_2^2, 6a_1 a_2 + 3a_2^2 a_3, a_2^3 + 1),
    ∇f(1, 0, 1) = (2, 0, 1).
Proposition 17.6.4. Suppose f is differentiable at x. Then the directional derivatives at x are given by

    D_v f(x) = v · ∇f(x).

The unit vector v for which this is a maximum is v = ∇f(x)/|∇f(x)| (assuming |∇f(x)| ≠ 0), and the directional derivative in this direction is |∇f(x)|.

Proof. From Definition 17.6.1 and Proposition 17.5.2 it follows that

    ∇f(x) · v = ⟨df(x), v⟩ = D_v f(x).

This proves the first claim.
Now suppose v is a unit vector. From the Cauchy-Schwarz Inequality we have

    ∇f(x) · v ≤ |∇f(x)|.    (44)

By the condition for equality in the Cauchy-Schwarz Inequality, equality holds in (44) iff v is a positive multiple of ∇f(x). Since v is a unit vector, this is equivalent to v = ∇f(x)/|∇f(x)|. The left side of (44) is then |∇f(x)|.
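The "steepest ascent" statement above can be checked by brute force (a sketch; the particular gradient vector is an arbitrary choice): sampling many random unit vectors v, the largest value of v · ∇f(x) should come out close to |∇f(x)|.

```python
import math, random

grad = [3.0, 4.0]                       # suppose grad f(x) = (3, 4), so |grad f(x)| = 5
norm = math.hypot(*grad)

random.seed(0)
best = -float("inf")
for _ in range(10000):                  # sample random unit vectors in R^2
    a = random.uniform(0, 2 * math.pi)
    v = [math.cos(a), math.sin(a)]
    best = max(best, v[0] * grad[0] + v[1] * grad[1])

print(round(norm, 4), round(best, 4))   # best is close to |grad f(x)| = 5
```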
Definition 17.6.5. If f : R^n → R then the level set through x is { y : f(y) = f(x) }.
For example, the contour lines on a map are the level sets of the height function.
(Diagrams: the graph and the level sets x_1^2 + x_2^2 = c of f(x) = x_1^2 + x_2^2, with ∇f(x) drawn at a point x; and the level sets x_1^2 − x_2^2 = c of f(x) = x_1^2 − x_2^2, whose graph looks like a saddle.)
Definition 17.6.6. A vector v is tangent at x to the level set S through x if D_v f(x) = 0.
This is a reasonable definition, since f is constant on S, and so the rate of change of f in any direction tangent to S should be zero.

Proposition 17.6.7. Suppose f is differentiable at x. Then ∇f(x) is orthogonal to all vectors which are tangent at x to the level set through x.

Proof. This is immediate from the previous Definition and Proposition 17.6.4.

In the previous proposition, we say ∇f(x) is orthogonal to the level set through x.
17.7. Some Interesting Examples.

Example 17.7.1. An example where the partial derivatives exist but the other directional derivatives do not exist.
Let

    f(x, y) = (xy)^{1/3}.

Then
1. ∂f/∂x (0, 0) = 0, since f = 0 on the x-axis;
2. ∂f/∂y (0, 0) = 0, since f = 0 on the y-axis;
3. Let v be any vector. Then

    D_v f(0, 0) = lim_{t→0} ( f(tv) − f(0, 0) ) / t
                = lim_{t→0} t^{2/3} (v_1 v_2)^{1/3} / t
                = lim_{t→0} (v_1 v_2)^{1/3} / t^{1/3}.

This limit does not exist, unless v_1 = 0 or v_2 = 0.
Example 17.7.2. An example where the directional derivatives at some point all exist, but the function is not differentiable at the point.
Let

    f(x, y) = { xy^2 / (x^2 + y^4)   (x, y) ≠ (0, 0)
              { 0                    (x, y) = (0, 0).

Let v = (v_1, v_2) be any non-zero vector. Then

    D_v f(0, 0) = lim_{t→0} ( f(tv) − f(0, 0) ) / t
                = lim_{t→0} ( t^3 v_1 v_2^2 / (t^2 v_1^2 + t^4 v_2^4) − 0 ) / t
                = lim_{t→0} v_1 v_2^2 / (v_1^2 + t^2 v_2^4)
                = { v_2^2 / v_1   v_1 ≠ 0
                  { 0             v_1 = 0.    (45)

Thus the directional derivatives D_v f(0, 0) exist for all v, and are given by (45). In particular

    ∂f/∂x (0, 0) = ∂f/∂y (0, 0) = 0.    (46)

But if f were differentiable at (0, 0), then we could compute any directional derivative from the partial derivatives. Thus for any vector v we would have

    D_v f(0, 0) = ⟨df(0, 0), v⟩
                = v_1 ∂f/∂x (0, 0) + v_2 ∂f/∂y (0, 0)
                = 0   from (46).

This contradicts (45).
(Diagram: the line segment L from a = g(0) to a + h = g(1), with a point x on L.)
Example 17.7.3. An example where the directional derivatives at a point all exist, but the function is not continuous at the point.
Take the same example as in Example 17.7.2. Approach the origin along the curve x = λ^2, y = λ. Then

    lim_{λ→0} f(λ^2, λ) = lim_{λ→0} λ^4 / (2λ^4) = 1/2.

But if we approach the origin along any straight line of the form (λv_1, λv_2), then we can check that the corresponding limit is 0.
Thus it is impossible to define f at (0, 0) in order to make f continuous there.
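The two approach paths can be compared numerically (an illustrative check): along the parabola x = t^2, y = t the values of f stay at 1/2, while along the line x = t, y = t they tend to 0, so f has no limit at the origin.

```python
def f(x, y):
    # the function from Examples 17.7.2 / 17.7.3
    return x * y ** 2 / (x ** 2 + y ** 4) if (x, y) != (0.0, 0.0) else 0.0

for t in [0.1, 0.01, 0.001]:
    print(f(t ** 2, t), f(t, t))   # first column stays at 0.5, second tends to 0
```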
17.8. Differentiability Implies Continuity. Despite Example 17.7.3 we have the following result.

Proposition 17.8.1. If f is differentiable at a, then it is continuous at a.

Proof. Suppose f is differentiable at a. Then

    f(x) = f(a) + Σ_{i=1}^{n} ∂f/∂x_i (a) (x_i − a_i) + o(|x − a|).

Since x_i − a_i → 0 and o(|x − a|) → 0 as x → a, it follows that f(x) → f(a) as x → a. That is, f is continuous at a.
17.9. Mean Value Theorem and Consequences.

Theorem 17.9.1. Suppose f is continuous at all points on the line segment L joining a and a + h; and is differentiable at all points on L, except possibly at the end points.
Then

    f(a + h) − f(a) = ⟨df(x), h⟩    (47)
                    = Σ_{i=1}^{n} ∂f/∂x_i (x) h_i    (48)

for some x ∈ L, x not an endpoint of L.

Proof. Note that (48) follows immediately from (47) by Corollary 17.5.3.
Define the one variable function g by

    g(t) = f(a + th).

Then g is continuous on [0, 1] (being the composition of the continuous functions t ↦ a + th and x ↦ f(x)). Moreover,

    g(0) = f(a),   g(1) = f(a + h).    (49)

We next show that g is differentiable and compute its derivative.
If 0 < t < 1, then f is differentiable at a + th, and so

    0 = lim_{w→0} ( f(a + th + w) − f(a + th) − ⟨df(a + th), w⟩ ) / |w|.    (50)

Let w = sh where s is a small real number, positive or negative. Since |w| = ±s|h|, and since we may assume h ≠ 0 (as otherwise (47) is trivial), we see from (50) that

    0 = lim_{s→0} ( f(a + (t + s)h) − f(a + th) − ⟨df(a + th), sh⟩ ) / s
      = lim_{s→0} [ ( g(t + s) − g(t) ) / s − ⟨df(a + th), h⟩ ],

using the linearity of df(a + th).
Hence g'(t) exists for 0 < t < 1, and moreover

    g'(t) = ⟨df(a + th), h⟩.    (51)

By the usual Mean Value Theorem for a function of one variable, applied to g, we have

    g(1) − g(0) = g'(t)    (52)

for some t ∈ (0, 1).
Substituting (49) and (51) in (52), the required result (47) follows.
If the norm of the gradient vector of f is bounded by M, then it is not surprising that the difference in value between f(a) and f(a + h) is bounded by M|h|. More precisely:

Corollary 17.9.2. Assume the hypotheses of the previous theorem and suppose |∇f(x)| ≤ M for all x ∈ L. Then

    |f(a + h) − f(a)| ≤ M|h|.

Proof. From the previous theorem,

    |f(a + h) − f(a)| = |⟨df(x), h⟩|   for some x ∈ L
                      = |∇f(x) · h|
                      ≤ |∇f(x)| |h|
                      ≤ M|h|.
Corollary 17.9.3. Suppose Ω ⊂ R^n is open and connected and f : Ω → R. Suppose f is differentiable in Ω and df(x) = 0 for all x ∈ Ω.^52
Then f is constant on Ω.

Proof. Choose any a ∈ Ω and suppose f(a) = α. Let

    E = { x ∈ Ω : f(x) = α }.

Then E is non-empty (as a ∈ E). We will prove E is both open and closed in Ω. Since Ω is connected, this will imply that E is all of Ω. This establishes the result.
To see E is open^53, suppose x ∈ E and choose r > 0 so that B_r(x) ⊂ Ω.
If y ∈ B_r(x), then from (47), for some u between x and y,

    f(y) − f(x) = ⟨df(u), y − x⟩ = 0, by hypothesis.

^52 Equivalently, ∇f(x) = 0 in Ω.
^53 Being open in Ω and being open in R^n is the same for subsets of Ω, since we are assuming Ω is itself open in R^n.
(Diagram: the points (a_1, a_2), (a_1 + h_1, a_2) and (a_1 + h_1, a_2 + h_2) in Ω, with the intermediate points (ξ_1, a_2) and (a_1 + h_1, ξ_2) marked on the horizontal and vertical segments.)
Thus f(y) = f(x) (= α), and so y ∈ E.
Hence B_r(x) ⊂ E and so E is open.
To show that E is closed in Ω, it is sufficient to show that E^c = { y : f(y) ≠ α } is open in Ω.
From Proposition 17.8.1 we know that f is continuous. Since we have E^c = f^{−1}[R∖{α}] and R∖{α} is open, it follows that E^c is open in Ω. Hence E is closed in Ω, as required.
Since E ≠ ∅, and E is both open and closed in Ω, it follows E = Ω (as Ω is connected).
In other words, f is constant (= α) on Ω.
17.10. Continuously Differentiable Functions. We saw in Section 17.7, Example 17.7.2, that the partial derivatives (and even all the directional derivatives) of a function can exist without the function being differentiable.
However, we do have the following important theorem:
Theorem 17.10.1. Suppose f : Ω(⊂ R^n) → R where Ω is open. If the partial derivatives of f exist and are continuous at every point in Ω, then f is differentiable everywhere in Ω.
Remark 17.10.1. If the partial derivatives of f exist in some neighbourhood
of, and are continuous at, a single point, it does not necessarily follow that f is
diﬀerentiable at that point. The hypotheses of the theorem need to hold at all
points in some open set Ω.
Proof of Theorem. We prove the theorem in case n = 2 (the proof for n > 2
is only notationally more complicated).
Suppose that the partial derivatives of f exist and are continuous in Ω. Then if a ∈ Ω and a + h is sufficiently close to a,

    f(a_1 + h_1, a_2 + h_2) = f(a_1, a_2)
        + f(a_1 + h_1, a_2) − f(a_1, a_2)
        + f(a_1 + h_1, a_2 + h_2) − f(a_1 + h_1, a_2)
    = f(a_1, a_2) + ∂f/∂x_1 (ξ_1, a_2) h_1 + ∂f/∂x_2 (a_1 + h_1, ξ_2) h_2,

for some ξ_1 between a_1 and a_1 + h_1, and some ξ_2 between a_2 and a_2 + h_2. The first partial derivative comes from applying the usual Mean Value Theorem, for a function of one variable, to the function f(x_1, a_2) obtained by fixing a_2 and taking x_1 as a variable. The second partial derivative is similarly obtained by considering the function f(a_1 + h_1, x_2), where a_1 + h_1 is fixed and x_2 is variable.
Hence

    f(a_1 + h_1, a_2 + h_2) = f(a_1, a_2) + ∂f/∂x_1 (a_1, a_2) h_1 + ∂f/∂x_2 (a_1, a_2) h_2
        + ( ∂f/∂x_1 (ξ_1, a_2) − ∂f/∂x_1 (a_1, a_2) ) h_1
        + ( ∂f/∂x_2 (a_1 + h_1, ξ_2) − ∂f/∂x_2 (a_1, a_2) ) h_2
    = f(a_1, a_2) + L(h) + ψ(h), say.

Here L is the linear map defined by

    L(h) = ∂f/∂x_1 (a_1, a_2) h_1 + ∂f/∂x_2 (a_1, a_2) h_2
         = [ ∂f/∂x_1 (a_1, a_2)   ∂f/∂x_2 (a_1, a_2) ] [ h_1 ; h_2 ].

Thus L is represented by the previous 1 × 2 matrix.
We claim that the error term

    ψ(h) = ( ∂f/∂x_1 (ξ_1, a_2) − ∂f/∂x_1 (a_1, a_2) ) h_1 + ( ∂f/∂x_2 (a_1 + h_1, ξ_2) − ∂f/∂x_2 (a_1, a_2) ) h_2

can be written as o(|h|).
This follows from the facts:
1. ∂f/∂x_1 (ξ_1, a_2) → ∂f/∂x_1 (a_1, a_2) as h → 0 (by continuity of the partial derivatives),
2. ∂f/∂x_2 (a_1 + h_1, ξ_2) → ∂f/∂x_2 (a_1, a_2) as h → 0 (again by continuity of the partial derivatives),
3. |h_1| ≤ |h|, |h_2| ≤ |h|.

It now follows from Proposition 17.5.7 that f is differentiable at a, and the differential of f is given by the previous 1 × 2 matrix of partial derivatives.
Since a ∈ Ω is arbitrary, this completes the proof.
Definition 17.10.2. If the partial derivatives of f exist and are continuous in the open set Ω, we say f is a C^1 (or continuously differentiable) function on Ω. One writes f ∈ C^1(Ω).

It follows from the previous Theorem that if f ∈ C^1(Ω) then f is indeed differentiable in Ω. Exercise: The converse may not be true; give a simple counterexample in R.
17.11. Higher-Order Partial Derivatives. Suppose f : Ω(⊂ R^n) → R. The partial derivatives ∂f/∂x_1, . . . , ∂f/∂x_n, if they exist, are also functions from Ω to R, and may themselves have partial derivatives.
The jth partial derivative of ∂f/∂x_i is denoted by

    ∂^2 f / ∂x_j ∂x_i   or   f_ij   or   D_ij f.

If all first and second partial derivatives of f exist and are continuous in Ω^54 we write f ∈ C^2(Ω).
Similar remarks apply to higher order derivatives, and we similarly define C^q(Ω) for any integer q ≥ 0.

^54 In fact, it is sufficient to assume just that the second partial derivatives are continuous. For under this assumption, each ∂f/∂x_i must be differentiable by Theorem 17.10.1 applied to ∂f/∂x_i. From Proposition 17.8.1 applied to ∂f/∂x_i it then follows that ∂f/∂x_i is continuous.
(Diagram: the square with corners A = (a_1, a_2), B = (a_1 + h, a_2), C = (a_1 + h, a_2 + h), D = (a_1, a_2 + h), used to form the second difference quotient A(h).)
Note that

    C^0(Ω) ⊃ C^1(Ω) ⊃ C^2(Ω) ⊃ . . .

The usual rules for differentiating a sum, product or quotient of functions of a single variable apply to partial derivatives. It follows that C^k(Ω) is closed under addition, products and quotients (if the denominator is non-zero).
The next theorem shows that for higher order derivatives, the actual order of differentiation does not matter, only the number of derivatives with respect to each variable is important. Thus

    ∂^2 f / ∂x_i ∂x_j = ∂^2 f / ∂x_j ∂x_i,

and so

    ∂^3 f / ∂x_i ∂x_j ∂x_k = ∂^3 f / ∂x_j ∂x_i ∂x_k = ∂^3 f / ∂x_j ∂x_k ∂x_i,   etc.
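The equality of mixed partials can be observed numerically via the second difference quotient A(h) used in the proof below (a sketch; the particular f is an arbitrary smooth choice). For f(x, y) = x^3 y^2 one has f_12 = f_21 = 6x^2 y, which equals 12 at (1, 2).

```python
def f(x, y):
    return x ** 3 * y ** 2

def mixed(f, x, y, h=1e-4):
    # the second difference quotient A(h); it approximates both f_12 and f_21
    return (f(x + h, y + h) - f(x, y + h) - f(x + h, y) + f(x, y)) / h ** 2

print(round(mixed(f, 1.0, 2.0), 2))   # close to 12
```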
Theorem 17.11.1. If f ∈ C^1(Ω)^55 and both f_ij and f_ji exist and are continuous (for some i ≠ j) in Ω, then f_ij = f_ji in Ω.
In particular, if f ∈ C^2(Ω) then f_ij = f_ji for all i ≠ j.
Proof. For notational simplicity we take n = 2. The proof for n > 2 is very similar.
Suppose a ∈ Ω and suppose h > 0 is some sufficiently small real number. Consider the second difference quotient defined by

    A(h) = (1/h^2) [ ( f(a_1 + h, a_2 + h) − f(a_1, a_2 + h) ) − ( f(a_1 + h, a_2) − f(a_1, a_2) ) ]    (53)
         = (1/h^2) [ g(a_2 + h) − g(a_2) ],    (54)

where

    g(x_2) = f(a_1 + h, x_2) − f(a_1, x_2).

From the definition of partial differentiation, g'(x_2) exists and

    g'(x_2) = ∂f/∂x_2 (a_1 + h, x_2) − ∂f/∂x_2 (a_1, x_2)    (55)

for a_2 ≤ x_2 ≤ a_2 + h.

^55 As usual, Ω is assumed to be open.
Applying the mean value theorem for a function of a single variable to (54), we see from (55) that

    A(h) = (1/h) g'(ξ_2)   for some ξ_2 ∈ (a_2, a_2 + h)
         = (1/h) [ ∂f/∂x_2 (a_1 + h, ξ_2) − ∂f/∂x_2 (a_1, ξ_2) ].    (56)

Applying the mean value theorem again, to the function ∂f/∂x_2 (x_1, ξ_2) with ξ_2 fixed, we see

    A(h) = ∂^2 f / ∂x_1 ∂x_2 (ξ_1, ξ_2)   for some ξ_1 ∈ (a_1, a_1 + h).    (57)
If we now rewrite (53) as

    A(h) = (1/h^2) [ ( f(a_1 + h, a_2 + h) − f(a_1 + h, a_2) ) − ( f(a_1, a_2 + h) − f(a_1, a_2) ) ]    (58)

and interchange the roles of x_1 and x_2 in the previous argument, we obtain

    A(h) = ∂^2 f / ∂x_2 ∂x_1 (η_1, η_2)    (59)

for some η_1 ∈ (a_1, a_1 + h), η_2 ∈ (a_2, a_2 + h).
If we let h → 0 then (ξ_1, ξ_2) and (η_1, η_2) → (a_1, a_2), and so from (57), (59) and the continuity of f_12 and f_21 at a, it follows that

    f_12(a) = f_21(a).

This completes the proof.
17.12. Taylor's Theorem. If g ∈ C^1[a, b], then we know

    g(b) = g(a) + ∫_a^b g'(t) dt.

This is the case k = 1 of the following version of Taylor's Theorem for a function of one variable.
Theorem 17.12.1 (Single Variable, Integral form of the Remainder).
Suppose g ∈ C^k[a, b]. Then

    g(b) = g(a) + g'(a)(b − a) + (1/2!) g''(a)(b − a)^2 + · · ·
         + (1/(k−1)!) g^{(k−1)}(a)(b − a)^{k−1} + ∫_a^b ( (b − t)^{k−1} / (k−1)! ) g^{(k)}(t) dt.    (60)
Proof. An elegant (but not obvious) proof is to begin by computing:

    d/dt [ g φ^{(k−1)} − g' φ^{(k−2)} + g'' φ^{(k−3)} − · · · + (−1)^{k−1} g^{(k−1)} φ ]
    = ( g φ^{(k)} + g' φ^{(k−1)} ) − ( g' φ^{(k−1)} + g'' φ^{(k−2)} ) + ( g'' φ^{(k−2)} + g''' φ^{(k−3)} ) − · · ·
      + (−1)^{k−1} ( g^{(k−1)} φ' + g^{(k)} φ )
    = g φ^{(k)} + (−1)^{k−1} g^{(k)} φ.    (61)

Now choose

    φ(t) = (b − t)^{k−1} / (k−1)!.
Then

    φ'(t) = (−1) (b − t)^{k−2} / (k−2)!
    φ''(t) = (−1)^2 (b − t)^{k−3} / (k−3)!
        ...
    φ^{(k−3)}(t) = (−1)^{k−3} (b − t)^2 / 2!
    φ^{(k−2)}(t) = (−1)^{k−2} (b − t)
    φ^{(k−1)}(t) = (−1)^{k−1}
    φ^{(k)}(t) = 0.    (62)

Hence from (61) we have

    (−1)^{k−1} d/dt [ g(t) + g'(t)(b − t) + g''(t) (b − t)^2 / 2! + · · · + g^{(k−1)}(t) (b − t)^{k−1} / (k−1)! ]
    = (−1)^{k−1} g^{(k)}(t) (b − t)^{k−1} / (k−1)!.
Dividing by (−1)^{k−1} and integrating both sides from a to b, we get

    g(b) − [ g(a) + g'(a)(b − a) + g''(a) (b − a)^2 / 2! + · · · + g^{(k−1)}(a) (b − a)^{k−1} / (k−1)! ]
    = ∫_a^b g^{(k)}(t) (b − t)^{k−1} / (k−1)! dt.

This gives formula (60).
Theorem 17.12.2 (Single Variable, Second Version). Suppose g ∈ C^k[a, b]. Then

    g(b) = g(a) + g'(a)(b − a) + (1/2!) g''(a)(b − a)^2 + · · ·
         + (1/(k−1)!) g^{(k−1)}(a)(b − a)^{k−1} + (1/k!) g^{(k)}(ξ)(b − a)^k    (63)

for some ξ ∈ (a, b).
Proof. We establish (63) from (60).
Since g^{(k)} is continuous in [a, b], it has a minimum value m, and a maximum value M, say.
By elementary properties of integrals, it follows that

    ∫_a^b m (b − t)^{k−1}/(k−1)! dt ≤ ∫_a^b g^{(k)}(t) (b − t)^{k−1}/(k−1)! dt ≤ ∫_a^b M (b − t)^{k−1}/(k−1)! dt,

i.e.

    m ≤ [ ∫_a^b g^{(k)}(t) (b − t)^{k−1}/(k−1)! dt ] / [ ∫_a^b (b − t)^{k−1}/(k−1)! dt ] ≤ M.

By the Intermediate Value Theorem, g^{(k)} takes all values in the range [m, M], and so the middle term in the previous inequality must equal g^{(k)}(ξ) for some ξ ∈ (a, b). Since

    ∫_a^b (b − t)^{k−1}/(k−1)! dt = (b − a)^k / k!,

it follows

    ∫_a^b g^{(k)}(t) (b − t)^{k−1}/(k−1)! dt = ( (b − a)^k / k! ) g^{(k)}(ξ).

Formula (63) now follows from (60).
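The remainder formula (63) can be seen at work numerically (a sketch; the choices g = exp, b = 0.5, k = 4 are arbitrary): since g^{(k)} = exp on [0, b] takes values between 1 and e^b, the error of the degree k − 1 Taylor polynomial must lie between b^k/k! and e^b b^k/k!.

```python
import math

b, k = 0.5, 4
# Taylor polynomial of exp at 0, terms j = 0, ..., k-1
poly = sum(b ** j / math.factorial(j) for j in range(k))
err = math.exp(b) - poly
# bounds from (63): min and max of g^(k)(xi) = e^xi on (0, b)
lo, hi = b ** k / math.factorial(k), math.exp(b) * b ** k / math.factorial(k)
print(lo <= err <= hi)   # the remainder lands inside the predicted range
```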
Taylor's Theorem generalises easily to functions of more than one variable.

Theorem 17.12.3 (Taylor's Formula; Several Variables).
Suppose f ∈ C^k(Ω) where Ω ⊂ R^n, and the line segment joining a and a + h is a subset of Ω.
Then

    f(a + h) = f(a) + Σ_{i=1}^{n} D_i f(a) h_i + (1/2!) Σ_{i,j=1}^{n} D_ij f(a) h_i h_j + · · ·
        + (1/(k−1)!) Σ_{i_1,...,i_{k−1}=1}^{n} D_{i_1...i_{k−1}} f(a) h_{i_1} . . . h_{i_{k−1}} + R_k(a, h)

where

    R_k(a, h) = (1/(k−1)!) Σ_{i_1,...,i_k=1}^{n} ( ∫_0^1 (1 − t)^{k−1} D_{i_1...i_k} f(a + th) dt ) h_{i_1} . . . h_{i_k}
              = (1/k!) Σ_{i_1,...,i_k=1}^{n} D_{i_1...i_k} f(a + sh) h_{i_1} . . . h_{i_k}   for some s ∈ (0, 1).
Proof. First note that for any differentiable function F : D(⊂ R^n) → R we have

    d/dt F(a + th) = Σ_{i=1}^{n} D_i F(a + th) h_i.    (64)

This is just a particular case of the chain rule, which we will discuss later. This particular version follows from (51) and Corollary 17.5.3 (with f there replaced by F).
Let

    g(t) = f(a + th).

Then g : [0, 1] → R. We will apply Taylor's Theorem for a function of one variable to g.
From (64) we have

    g'(t) = Σ_{i=1}^{n} D_i f(a + th) h_i.    (65)

Differentiating again, and applying (64) to D_i f, we obtain

    g''(t) = Σ_{i=1}^{n} ( Σ_{j=1}^{n} D_ij f(a + th) h_j ) h_i = Σ_{i,j=1}^{n} D_ij f(a + th) h_i h_j.    (66)

Similarly

    g'''(t) = Σ_{i,j,k=1}^{n} D_ijk f(a + th) h_i h_j h_k,    (67)

etc. In this way, we see g ∈ C^k[0, 1] and obtain formulae for the derivatives of g.
But from (60) and (63) we have

    g(1) = g(0) + g'(0) + (1/2!) g''(0) + · · · + (1/(k−1)!) g^{(k−1)}(0)
         + { (1/(k−1)!) ∫_0^1 (1 − t)^{k−1} g^{(k)}(t) dt,   or
           { (1/k!) g^{(k)}(s)   for some s ∈ (0, 1).

If we substitute (65), (66), (67) etc. into this, we obtain the required results.
Remark 17.12.1. The first two terms of Taylor's Formula give the best first order approximation^56 in h to f(a + h) for h near 0. The first three terms give the best second order approximation^57 in h, the first four terms give the best third order approximation, etc.
Note that the remainder term R_k(a, h) in Theorem 17.12.3 can be written as O(|h|^k) (see the Remarks on rates of convergence in Section 17.5), i.e.

    R_k(a, h) / |h|^k is bounded as h → 0.

This follows from the second version for the remainder in Theorem 17.12.3 and the facts:
1. D_{i_1...i_k} f(x) is continuous, and hence bounded on compact sets,
2. |h_{i_1} . . . h_{i_k}| ≤ |h|^k.
Example 17.12.4. Let

    f(x, y) = (1 + y^2)^{1/2} cos x.

One finds the best second order approximation to f for (x, y) near (0, 1) as follows.
First note that

    f(0, 1) = 2^{1/2}.

Moreover,

    f_1 = −(1 + y^2)^{1/2} sin x;      = 0 at (0, 1)
    f_2 = y(1 + y^2)^{−1/2} cos x;     = 2^{−1/2} at (0, 1)
    f_11 = −(1 + y^2)^{1/2} cos x;     = −2^{1/2} at (0, 1)
    f_12 = −y(1 + y^2)^{−1/2} sin x;   = 0 at (0, 1)
    f_22 = (1 + y^2)^{−3/2} cos x;     = 2^{−3/2} at (0, 1).

Hence, including the factor 1/2! on the second order terms,

    f(x, y) = 2^{1/2} + 2^{−1/2}(y − 1) − 2^{−1/2} x^2 + 2^{−5/2}(y − 1)^2 + R_3((0, 1), (x, y)),

where

    R_3((0, 1), (x, y)) = O( |(x, y) − (0, 1)|^3 ) = O( ( x^2 + (y − 1)^2 )^{3/2} ).
^56 I.e. constant plus linear term.
^57 I.e. constant plus linear term plus quadratic term.
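The second order approximation in Example 17.12.4 can be checked numerically (an illustrative sketch with arbitrary sample radii): building the Taylor polynomial from the derivative values above, with the 1/2! factors from Taylor's formula, the error should shrink roughly like the cube of the distance to (0, 1).

```python
import math

def f(x, y):
    return math.sqrt(1 + y ** 2) * math.cos(x)

def T(x, y):
    # second order Taylor polynomial of f about (0, 1):
    # sqrt(2) + 2^(-1/2)(y-1) - 2^(-1/2) x^2 + 2^(-5/2)(y-1)^2
    s = y - 1.0
    return (math.sqrt(2) + s / math.sqrt(2)
            - x ** 2 / math.sqrt(2) + s ** 2 / 2 ** 2.5)

for r in [0.1, 0.05, 0.025]:
    x, y = r, 1.0 + r
    print(r, abs(f(x, y) - T(x, y)))   # errors drop roughly by 8 as r halves
```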
18. Differentiation of Vector-Valued Functions
You need only consider Subsections 1, 2 (to the beginning of Definition 18.2.7), 3, 4, 5 (the last is not examinable). What is required is just a basic understanding of the ideas, and application to straightforward examples.
18.1. Introduction. In this chapter we consider functions

    f : D(⊂ R^n) → R^m,

with m ≥ 1.
We write

    f(x_1, . . . , x_n) = ( f_1(x_1, . . . , x_n), . . . , f_m(x_1, . . . , x_n) )

where

    f_i : D → R,   i = 1, . . . , m,

are real valued functions.
Example 18.1.1. Let

    f(x, y, z) = (x^2 − y^2, 2xz + 1).

Then f_1(x, y, z) = x^2 − y^2 and f_2(x, y, z) = 2xz + 1.
Reduction to Component Functions. For many purposes we can reduce the study of functions f, as above, to the study of the corresponding real valued functions f_1, . . . , f_m. However, this is not always a good idea, since studying the f_i involves a choice of coordinates in R^m, and this can obscure the geometry involved.
In Definitions 18.2.1, 18.3.1 and 18.4.1 we define the notion of partial derivative, directional derivative, and differential of f without reference to the component functions. In Propositions 18.2.2, 18.3.2 and 18.4.2 we show these definitions are equivalent to definitions in terms of the component functions.
18.2. Paths in R^m. In this section we consider the case corresponding to n = 1 in the notation of the previous section. This is an important case in its own right and also helps motivate the case n > 1.

Definition 18.2.1. Let I be an interval in R. If f : I → R^m then the derivative or tangent vector at t is the vector

    f'(t) = lim_{s→0} ( f(t + s) − f(t) ) / s,

provided the limit exists.^58 In this case we say f is differentiable at t. If, moreover, f'(t) ≠ 0 then f'(t)/|f'(t)| is called the unit tangent at t.

Remark 18.2.1. Although we say f'(t) is the tangent vector at t, we should really think of f'(t) as a vector with its "base" at f(t). See the next diagram.
Proposition 18.2.2. Let f(t) = ( f_1(t), . . . , f_m(t) ). Then f is differentiable at t iff f_1, . . . , f_m are differentiable at t. In this case

    f'(t) = ( f_1'(t), . . . , f_m'(t) ).

Proof. Since

    ( f(t + s) − f(t) ) / s = ( ( f_1(t + s) − f_1(t) ) / s, . . . , ( f_m(t + s) − f_m(t) ) / s ),

the theorem follows since a function into R^m converges iff its component functions converge.
^58 If t is an endpoint of I then one takes the corresponding one-sided limits.
(Diagrams: a path f in R^2, with the chord ( f(t + s) − f(t) ) / s approaching the tangent vector f'(t); and a path f on an interval I with f(t_1) = f(t_2) but distinct tangent vectors f'(t_1) and f'(t_2).)
Definition 18.2.3. If f(t) = ( f_1(t), . . . , f_m(t) ) then f is C^1 if each f_i is C^1.

We have the usual rules for differentiating the sum of two functions from I to R^m, and the product of such a function with a real valued function (exercise: formulate and prove such a result). The following rule for differentiating the inner product of two functions is useful.
Proposition 18.2.4. If f_1, f_2 : I → R^m are differentiable at t then

    d/dt ⟨f_1(t), f_2(t)⟩ = ⟨f_1'(t), f_2(t)⟩ + ⟨f_1(t), f_2'(t)⟩.
Proof. Since

    ⟨f_1(t), f_2(t)⟩ = Σ_{i=1}^m f_1^i(t) f_2^i(t),

the result follows from the usual rules for differentiating sums and products.
If f : I → R^m, we can think of f as tracing out a “curve” in R^m (we will make
this precise later). The terminology tangent vector is reasonable, as we see from
the following diagram. Sometimes we speak of the tangent vector at f(t) rather
than at t, but we need to be careful if f is not one-one, as in the second figure.
Example 18.2.5.
1. Let

    f(t) = (cos t, sin t),  t ∈ [0, 2π).

This traces out a circle in R^2 and

    f'(t) = (−sin t, cos t).

2. Let

    f(t) = (t, t^2).

This traces out a parabola in R^2 and

    f'(t) = (1, 2t).

[Figure: the circle f(t) = (cos t, sin t) with tangent f'(t) = (−sin t, cos t), and the
parabola f(t) = (t, t^2) with tangent f'(t) = (1, 2t).]
Example 18.2.6. Consider the functions
1. f_1(t) = (t, t^3), t ∈ R,
2. f_2(t) = (t^3, t^9), t ∈ R,
3. f_3(t) = (∛t, t), t ∈ R.
Then each function f_i traces out the same “cubic” curve in R^2 (i.e., the image is
the same set of points), and

    f_1(0) = f_2(0) = f_3(0) = (0, 0).

However,

    f_1'(0) = (1, 0),  f_2'(0) = (0, 0),  f_3'(0) is undefined.
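The different behaviour of these parametrisations at the origin can be seen numerically with symmetric difference quotients; a small sketch (our own check, not from the notes):

```python
# Three parametrisations of the cubic curve y = x^3; compare Example 18.2.6.
def f1(t): return (t, t**3)
def f2(t): return (t**3, t**9)

def deriv(f, t, s=1e-6):
    # symmetric difference quotient (f(t+s) - f(t-s)) / (2s), componentwise
    u, v = f(t + s), f(t - s)
    return tuple((ui - vi) / (2*s) for ui, vi in zip(u, v))

print(deriv(f1, 0.0))  # ≈ (1, 0)
print(deriv(f2, 0.0))  # ≈ (0, 0): f2 "stops" momentarily at t = 0
# both trace the same set of points: f2(t) = f1(t**3) for every t
```

The third parametrisation f_3(t) = (∛t, t) has no derivative at 0 at all, since ∛t is not differentiable there: its difference quotients blow up as s → 0.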
Intuitively, we will think of a path in R^m as a function f which neither stops nor
reverses direction. It is often convenient to consider the variable t as representing
“time”. We will think of the corresponding curve as the set of points traced out by
f. Many different paths (i.e. functions) will give the same curve; they correspond
to tracing out the curve at different times and velocities. We make this precise as
follows:
Definition 18.2.7. We say f : I → R^m is a path⁵⁹ in R^m if f is C^1 and
f'(t) ≠ 0 for t ∈ I. We say the two paths f_1 : I_1 → R^m and f_2 : I_2 → R^m are
equivalent if there exists a function φ : I_1 → I_2 such that f_1 = f_2 ∘ φ, where φ is
C^1 and φ'(t) > 0 for t ∈ I_1.

A curve is an equivalence class of paths. Any path in the equivalence class is
called a parametrisation of the curve.
We can think of φ as giving another way of measuring “time”.
We expect that the unit tangent vector to a curve should depend only on the
curve itself, and not on the particular parametrisation. This is indeed the case, as
is shown by the following Proposition.
⁵⁹ Other texts may have different terminology.
[Figure: equivalent paths f_1 : I_1 → R^m and f_2 : I_2 → R^m with f_1(t) = f_2(φ(t)),
illustrating f_1'(t)/|f_1'(t)| = f_2'(φ(t))/|f_2'(φ(t))|.]
Proposition 18.2.8. Suppose f_1 : I_1 → R^m and f_2 : I_2 → R^m are equivalent
parametrisations; and in particular f_1 = f_2 ∘ φ where φ : I_1 → I_2, φ is C^1 and
φ'(t) > 0 for t ∈ I_1. Then f_1 and f_2 have the same unit tangent vector at t and
φ(t) respectively.
Proof. From the chain rule for a function of one variable, we have

    f_1'(t) = (f_1^1'(t), ..., f_1^m'(t))
            = (f_2^1'(φ(t)) φ'(t), ..., f_2^m'(φ(t)) φ'(t))
            = f_2'(φ(t)) φ'(t).

Hence, since φ'(t) > 0,

    f_1'(t)/|f_1'(t)| = f_2'(φ(t))/|f_2'(φ(t))|.
Definition 18.2.9. If f is a path in R^m, then the acceleration at t is f''(t).
Example 18.2.10. If |f'(t)| is constant (i.e. the “speed” is constant) then the
velocity and the acceleration are orthogonal.

Proof. Since |f'(t)|^2 = ⟨f'(t), f'(t)⟩ is constant, we have from Proposition 18.2.4
that

    0 = d/dt ⟨f'(t), f'(t)⟩ = 2 ⟨f''(t), f'(t)⟩.

This gives the result.
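For uniform circular motion this orthogonality can be seen directly; a minimal numeric check (the path is the circle of Example 18.2.5):

```python
import math

def velocity(t):
    # f(t) = (cos t, sin t) has |f'(t)| = 1, i.e. constant speed
    return (-math.sin(t), math.cos(t))

def acceleration(t):
    # f''(t) = (-cos t, -sin t)
    return (-math.cos(t), -math.sin(t))

t = 1.3
v, a = velocity(t), acceleration(t)
dot = sum(vi * ai for vi, ai in zip(v, a))
print(dot)  # → 0.0: velocity and acceleration are orthogonal
```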
Arc length. Suppose f : [a, b] → R^m is a path in R^m. Let a = t_1 < t_2 < ... <
t_N = b be a partition of [a, b], where t_i − t_{i−1} = δt for all i.

We think of the length of the curve corresponding to f as being

    ≈ Σ_{i=2}^N |f(t_i) − f(t_{i−1})| = Σ_{i=2}^N (|f(t_i) − f(t_{i−1})| / δt) δt ≈ ∫_a^b |f'(t)| dt.
See the next diagram.
[Figure: a partition t_1, ..., t_{i−1}, t_i, ..., t_N of [a, b] and the corresponding points
f(t_1), ..., f(t_{i−1}), f(t_i), ..., f(t_N) on the curve, joined by chords approximating
its length.]
Motivated by this we make the following definition.

Definition 18.2.11. Let f : [a, b] → R^m be a path in R^m. Then the length of
the curve corresponding to f is given by

    ∫_a^b |f'(t)| dt.
The next result shows that this deﬁnition is independent of the particular
parametrisation chosen for the curve.
Proposition 18.2.12. Suppose f_1 : [a_1, b_1] → R^m and f_2 : [a_2, b_2] → R^m are
equivalent parametrisations; and in particular f_1 = f_2 ∘ φ where φ : [a_1, b_1] → [a_2, b_2],
φ is C^1 and φ'(t) > 0 for t ∈ [a_1, b_1]. Then

    ∫_{a_1}^{b_1} |f_1'(t)| dt = ∫_{a_2}^{b_2} |f_2'(s)| ds.
Proof. From the chain rule and then the rule for change of variable of integration,

    ∫_{a_1}^{b_1} |f_1'(t)| dt = ∫_{a_1}^{b_1} |f_2'(φ(t))| φ'(t) dt = ∫_{a_2}^{b_2} |f_2'(s)| ds.
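The invariance of arc length under reparametrisation is easy to see numerically for the unit circle: a Riemann-sum sketch (our own illustration), using f(t) = (cos t, sin t) on [0, 2π] and the reparametrisation φ(t) = t^2 on [0, √(2π)].

```python
import math

def arc_length(fprime_norm, a, b, n=100000):
    # approximate ∫_a^b |f'(t)| dt by a midpoint Riemann sum
    dt = (b - a) / n
    return sum(fprime_norm(a + (i + 0.5) * dt) for i in range(n)) * dt

# unit circle f(t) = (cos t, sin t): |f'(t)| = 1 on [0, 2π]
L1 = arc_length(lambda t: 1.0, 0.0, 2 * math.pi)

# reparametrised via φ(t) = t^2: |d/dt f(t^2)| = |f'(t^2)| · 2t = 2t
L2 = arc_length(lambda t: 2 * t, 0.0, math.sqrt(2 * math.pi))

print(L1, L2)  # both ≈ 2π
```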
18.3. Partial and Directional Derivatives. Analogous to Deﬁnitions 17.3.1
and 17.4.1 we have:
Definition 18.3.1. The ith partial derivative of f at x is defined by

    ∂f/∂x_i (x)  ( or D_i f(x) )  = lim_{t→0} (f(x + t e_i) − f(x)) / t,

provided the limit exists. More generally, the directional derivative of f at x in the
direction v is defined by

    D_v f(x) = lim_{t→0} (f(x + t v) − f(x)) / t,

provided the limit exists.
Remark 18.3.1.
1. It follows immediately from the definitions that

    ∂f/∂x_i (x) = D_{e_i} f(x).
[Figure: f : R^2 → R^3; at a ∈ R^2 the directional derivative D_v f(a) is a vector at
f(a), and at b ∈ R^2 the partial derivatives ∂f/∂x (b) and ∂f/∂y (b) are vectors at
f(b), all tangent to the image of f.]
2. The partial and directional derivatives are vectors in R^m. In the terminology
of the previous section, ∂f/∂x_i (x) is tangent to the path t → f(x + t e_i)
and D_v f(x) is tangent to the path t → f(x + t v). Note that the curves
corresponding to these paths are subsets of the image of f.
3. As we will discuss later, we may regard the partial derivatives at x as a basis
for the tangent space to the image of f at f(x)⁶⁰.
Proposition 18.3.2. If f_1, ..., f_m are the component functions of f then

    ∂f/∂x_i (a) = ( ∂f_1/∂x_i (a), ..., ∂f_m/∂x_i (a) )   for i = 1, ..., n,

    D_v f(a) = ( D_v f_1(a), ..., D_v f_m(a) ),

in the sense that if one side of either equality exists, then so does the other, and
both sides are then equal.
Proof. Essentially the same as for the proof of Proposition 18.2.2.
Example 18.3.3. Let f : R^2 → R^3 be given by

    f(x, y) = (x^2 − 2xy, x^2 + y^3, sin x).

Then

    ∂f/∂x (x, y) = ( ∂f_1/∂x, ∂f_2/∂x, ∂f_3/∂x ) = (2x − 2y, 2x, cos x),

    ∂f/∂y (x, y) = ( ∂f_1/∂y, ∂f_2/∂y, ∂f_3/∂y ) = (−2x, 3y^2, 0),

are vectors in R^3.
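These formulas can be sanity-checked with finite differences, computed componentwise as in Proposition 18.3.2; a short sketch using the function of Example 18.3.3:

```python
import math

def f(x, y):
    return (x**2 - 2*x*y, x**2 + y**3, math.sin(x))

def partial_x(x, y, h=1e-6):
    # (f(x + h, y) - f(x - h, y)) / (2h), componentwise
    u, v = f(x + h, y), f(x - h, y)
    return tuple((ui - vi) / (2*h) for ui, vi in zip(u, v))

x, y = 1.0, 2.0
print(partial_x(x, y))                 # finite-difference estimate
print((2*x - 2*y, 2*x, math.cos(x)))   # exact value (−2, 2, cos 1)
```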
18.4. The Diﬀerential. Analogous to Deﬁnition 17.5.1 we have:
Definition 18.4.1. Suppose f : D(⊂ R^n) → R^m. Then f is differentiable at
a ∈ D if there is a linear transformation L : R^n → R^m such that

    |f(x) − ( f(a) + L(x − a) )| / |x − a| → 0  as x → a.    (68)

⁶⁰ More precisely, if n ≤ m and the differential df(x) has rank n. See later.
The linear transformation L is denoted by f'(a) or df(a) and is called the derivative
or differential of f at a⁶¹.

A vector-valued function is differentiable iff the corresponding component functions
are differentiable. More precisely:
Proposition 18.4.2. f is differentiable at a iff f_1, ..., f_m are differentiable
at a. In this case the differential is given by

    ⟨df(a), v⟩ = ( ⟨df_1(a), v⟩, ..., ⟨df_m(a), v⟩ ).    (69)

In particular, the differential is unique.
Proof. For any linear map L : R^n → R^m, and for each i = 1, ..., m, let
L_i : R^n → R be the linear map defined by L_i(v) = ( L(v) )_i.

Since a function into R^m converges iff its component functions converge, it
follows that

    |f(x) − ( f(a) + L(x − a) )| / |x − a| → 0  as x → a

iff

    |f_i(x) − ( f_i(a) + L_i(x − a) )| / |x − a| → 0  as x → a,  for i = 1, ..., m.

Thus f is differentiable at a iff f_1, ..., f_m are differentiable at a.

In this case we must have

    L_i = df_i(a),  i = 1, ..., m

(by uniqueness of the differential for real-valued functions), and so

    L(v) = ( ⟨df_1(a), v⟩, ..., ⟨df_m(a), v⟩ ).

But this says that the differential df(a) is unique and is given by (69).
Corollary 18.4.3. If f is differentiable at a then the linear transformation
df(a) : R^n → R^m is represented by the matrix

    [ ∂f_1/∂x_1 (a)  ...  ∂f_1/∂x_n (a) ]
    [      ⋮                    ⋮       ]    (70)
    [ ∂f_m/∂x_1 (a)  ...  ∂f_m/∂x_n (a) ]
Proof. The ith column of the matrix corresponding to df(a) is the vector
⟨df(a), e_i⟩⁶². From Proposition 18.4.2 this is the column vector corresponding to

    ( ⟨df_1(a), e_i⟩, ..., ⟨df_m(a), e_i⟩ ),

i.e. to

    ( ∂f_1/∂x_i (a), ..., ∂f_m/∂x_i (a) ).

This proves the result.
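The corollary says the Jacobian matrix (70) can be assembled column by column from the partial derivatives. A small numerical sketch (the helper and sample function are our own, for illustration):

```python
def jacobian(f, a, h=1e-6):
    # build the matrix (70) column by column: the ith column is ∂f/∂x_i(a),
    # approximated here by central differences
    m, n = len(f(a)), len(a)
    cols = []
    for i in range(n):
        ap, am = list(a), list(a)
        ap[i] += h
        am[i] -= h
        fp, fm = f(ap), f(am)
        cols.append([(fp[j] - fm[j]) / (2*h) for j in range(m)])
    # transpose the list of columns into rows
    return [[cols[i][j] for i in range(n)] for j in range(m)]

# f(x, y) = (x^2 - 2xy, x^2 + y^3), the function of Example 18.4.5
f = lambda a: (a[0]**2 - 2*a[0]*a[1], a[0]**2 + a[1]**3)
print(jacobian(f, [1.0, 2.0]))  # ≈ [[-2, -2], [2, 12]]
```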
61
It follows from Proposition 18.4.2 that if L exists then it is unique and is given by the right
side of (69).
62
For any linear transformation L: R
n
→R
m
, the ith column of the corresponding matrix is
L(e
i
).
Remark 18.4.1. The jth column is the vector in R^m corresponding to the
partial derivative ∂f/∂x_j (a). The ith row represents df_i(a).
The following proposition is immediate.
Proposition 18.4.4. If f is differentiable at a then

    f(x) = f(a) + ⟨df(a), x − a⟩ + ψ(x),

where ψ(x) = o(|x − a|).

Conversely, suppose

    f(x) = f(a) + L(x − a) + ψ(x),

where L : R^n → R^m is linear and ψ(x) = o(|x − a|). Then f is differentiable at a
and df(a) = L.
Proof. As for Proposition 17.5.7.
Thus, as is the case for real-valued functions, the previous proposition implies
f(a) + ⟨df(a), x − a⟩ gives the best first order approximation to f(x) for x near a.
Example 18.4.5. Let f : R^2 → R^2 be given by

    f(x, y) = (x^2 − 2xy, x^2 + y^3).

Find the best first order approximation to f(x) for x near (1, 2).

Solution:

    f(1, 2) = (−3, 9),

    df(x, y) = [ 2x − 2y   −2x  ]
               [   2x      3y^2 ],

    df(1, 2) = [ −2  −2 ]
               [  2  12 ].

So the best first order approximation near (1, 2) is

    f(1, 2) + ⟨df(1, 2), (x − 1, y − 2)⟩

      = ( −3 − 2(x − 1) − 2(y − 2), 9 + 2(x − 1) + 12(y − 2) )

      = ( 3 − 2x − 2y, −17 + 2x + 12y ).
Alternatively, working with each component separately, the best first order
approximation is

    ( f_1(1, 2) + ∂f_1/∂x (1, 2)(x − 1) + ∂f_1/∂y (1, 2)(y − 2),
      f_2(1, 2) + ∂f_2/∂x (1, 2)(x − 1) + ∂f_2/∂y (1, 2)(y − 2) )

    = ( −3 − 2(x − 1) − 2(y − 2), 9 + 2(x − 1) + 12(y − 2) )

    = ( 3 − 2x − 2y, −17 + 2x + 12y ).
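The sense in which this affine map is the "best" approximation is that its error is o(|x − a|), as in Proposition 18.4.4. A quick numerical illustration of this for the worked example (the check itself is ours):

```python
def f(x, y):
    return (x**2 - 2*x*y, x**2 + y**3)

# the differential at a = (1, 2), computed in the worked example
J = [[-2.0, -2.0], [2.0, 12.0]]
a = (1.0, 2.0)
fa = f(*a)

def approx(x, y):
    # the affine map f(a) + df(a)(x - a)
    dx, dy = x - a[0], y - a[1]
    return (fa[0] + J[0][0]*dx + J[0][1]*dy,
            fa[1] + J[1][0]*dx + J[1][1]*dy)

for eps in (0.1, 0.01, 0.001):
    x, y = 1.0 + eps, 2.0 + eps
    err = max(abs(u - v) for u, v in zip(f(x, y), approx(x, y)))
    print(eps, err, err / eps)  # err/eps → 0, i.e. the error is o(|x − a|)
```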
Remark 18.4.2. One similarly obtains second and higher order approximations
by using Taylor’s formula for each component function.
Proposition 18.4.6. If f, g : D(⊂ R^n) → R^m are differentiable at a ∈ D, then
so are αf and f + g. Moreover,

    d(αf)(a) = α df(a),
    d(f + g)(a) = df(a) + dg(a).

Proof. This is straightforward (exercise) from Proposition 18.4.4.

The previous proposition corresponds to the fact that the partial derivatives for
f + g are the sum of the partial derivatives corresponding to f and g respectively.
Similarly for αf.
Higher Derivatives. We say f ∈ C^k(D) iff f_1, ..., f_m ∈ C^k(D). It follows from
the corresponding results for the component functions that
1. f ∈ C^1(D) ⇒ f is differentiable in D;
2. C^0(D) ⊃ C^1(D) ⊃ C^2(D) ⊃ ... .
18.5. The Chain Rule. The chain rule for the composition of functions of
one variable says that

    d/dx g(f(x)) = g'(f(x)) f'(x).

Or, to use a more informal notation, if g = g(f) and f = f(x), then

    dg/dx = (dg/df)(df/dx).

This is generalised in the following theorem. The theorem says that the linear
approximation to g ∘ f (computed at x) is the composition of the linear approximation
to f (computed at x) followed by the linear approximation to g (computed
at f(x)).
A Little Linear Algebra. Suppose L : R^n → R^m is a linear map. Then we define
the norm of L by

    ||L|| = max{ |L(x)| : |x| ≤ 1 }⁶³.

A simple result (exercise) is that

    |L(x)| ≤ ||L|| |x|    (71)

for any x ∈ R^n.

It is also easy to check (exercise) that || · || does define a norm on the vector
space of linear maps from R^n into R^m.
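For a concrete linear map the norm can be estimated by sampling the unit sphere (where the maximum over the unit ball is attained); a rough sketch for a 2×2 example of our own choosing:

```python
import math

# L : R^2 → R^2 given by the matrix A; estimate ||L|| = max{|L(x)| : |x| ≤ 1}
A = [[2.0, 1.0], [0.0, 1.0]]

def apply(A, x):
    return [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

# sample |L(x)| over points of the unit circle
est = max(norm(apply(A, [math.cos(t), math.sin(t)]))
          for t in (2 * math.pi * k / 10000 for k in range(10000)))
print(est)

# inequality (71): |L(x)| ≤ ||L|| |x|, checked at a sample point
x = [3.0, -4.0]
assert norm(apply(A, x)) <= (est + 1e-4) * norm(x)
```

(For this A the exact value is the largest singular value, √(3 + √5) ≈ 2.288.)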
Theorem 18.5.1 (Chain Rule). Suppose f : D(⊂ R^n) → Ω(⊂ R^m) and
g : Ω(⊂ R^m) → R^r. Suppose f is differentiable at x and g is differentiable at f(x). Then
g ∘ f is differentiable at x and

    d(g ∘ f)(x) = dg(f(x)) ∘ df(x).    (72)
Schematically:

    D(⊂ R^n) --f--> Ω(⊂ R^m) --g--> R^r,    with composition g ∘ f : D → R^r;

    R^n --df(x)--> R^m --dg(f(x))--> R^r,    with composition d(g ∘ f)(x) = dg(f(x)) ∘ df(x).
⁶³ Here |x|, |L(x)| are the usual Euclidean norms on R^n and R^m. Thus ||L|| corresponds to the
maximum value taken by |L| on the unit ball. The maximum value is achieved, as L is continuous
and {x : |x| ≤ 1} is compact.
Example 18.5.2. To see how all this corresponds to other formulations of the
chain rule, suppose we have the following:

    R^3 --f--> R^2 --g--> R^2
    (x, y, z)   (u, v)    (p, q)

Thus coordinates in R^3 are denoted by (x, y, z), coordinates in the first copy of
R^2 are denoted by (u, v), and coordinates in the second copy of R^2 are denoted by
(p, q).

The functions f and g can be written as follows:

    f : u = u(x, y, z), v = v(x, y, z),
    g : p = p(u, v), q = q(u, v).

Thus we think of u and v as functions of x, y and z; and p and q as functions of u
and v.

We can also represent p and q as functions of x, y and z via

    p = p( u(x, y, z), v(x, y, z) ),  q = q( u(x, y, z), v(x, y, z) ).
The usual version of the chain rule in terms of partial derivatives is:

    ∂p/∂x = (∂p/∂u)(∂u/∂x) + (∂p/∂v)(∂v/∂x),
        ⋮
    ∂q/∂z = (∂q/∂u)(∂u/∂z) + (∂q/∂v)(∂v/∂z).
In the first equality, ∂p/∂x is evaluated at (x, y, z), ∂p/∂u and ∂p/∂v are evaluated
at ( u(x, y, z), v(x, y, z) ), and ∂u/∂x and ∂v/∂x are evaluated at (x, y, z). Similarly
for the other equalities.
In terms of the matrices of partial derivatives:

    [ ∂p/∂x  ∂p/∂y  ∂p/∂z ]   [ ∂p/∂u  ∂p/∂v ] [ ∂u/∂x  ∂u/∂y  ∂u/∂z ]
    [ ∂q/∂x  ∂q/∂y  ∂q/∂z ] = [ ∂q/∂u  ∂q/∂v ] [ ∂v/∂x  ∂v/∂y  ∂v/∂z ],

where x = (x, y, z); the three matrices are d(g ∘ f)(x), dg(f(x)) and df(x)
respectively.
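The matrix identity (72) can be verified numerically: approximate all three Jacobians by finite differences and compare. The particular f : R^3 → R^2 and g : R^2 → R^2 below are our own illustrations, not the notes'.

```python
import math

# sample maps to illustrate (72)
def f(x, y, z): return (x*y + z, math.sin(x) + z*z)
def g(u, v):    return (u*v, u + v)

def jac(F, a, h=1e-6):
    # central-difference Jacobian of F at the point a
    m = len(F(*a))
    J = [[0.0] * len(a) for _ in range(m)]
    for i in range(len(a)):
        ap, am = list(a), list(a)
        ap[i] += h
        am[i] -= h
        Fp, Fm = F(*ap), F(*am)
        for j in range(m):
            J[j][i] = (Fp[j] - Fm[j]) / (2*h)
    return J

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a = (0.5, -1.0, 2.0)
gof = lambda x, y, z: g(*f(x, y, z))
lhs = jac(gof, a)                        # d(g∘f)(a)
rhs = matmul(jac(g, f(*a)), jac(f, a))   # dg(f(a)) ∘ df(a)
print(lhs)
print(rhs)  # the two 2×3 matrices agree up to discretisation error
```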
Proof of Chain Rule: We want to show

    (f ∘ g)(a + h) = (f ∘ g)(a) + L(h) + o(|h|),    (73)

where L = df(g(a)) ∘ dg(a). (Here the roles of f and g are interchanged relative to
the statement of the theorem.)

Now

    (f ∘ g)(a + h) = f( g(a + h) )
      = f( g(a) + g(a + h) − g(a) )
      = f( g(a) ) + ⟨ df( g(a) ), g(a + h) − g(a) ⟩ + o( |g(a + h) − g(a)| )
            ... by the differentiability of f
      = f( g(a) ) + ⟨ df( g(a) ), ⟨dg(a), h⟩ + o(|h|) ⟩ + o( |g(a + h) − g(a)| )
            ... by the differentiability of g
      = f( g(a) ) + ⟨ df( g(a) ), ⟨dg(a), h⟩ ⟩ + ⟨ df( g(a) ), o(|h|) ⟩ + o( |g(a + h) − g(a)| )
      = A + B + C + D.
But B = ⟨ df( g(a) ) ∘ dg(a), h ⟩, by definition of the “composition” of two
maps. Also C = o(|h|) from (71) (exercise). Finally, for D we have

    |g(a + h) − g(a)| = |⟨dg(a), h⟩ + o(|h|)|    ... by differentiability of g
                      ≤ ||dg(a)|| |h| + o(|h|)    ... from (71)
                      = O(|h|)    ... why?
Substituting the above expressions into A + B + C + D, we get

    (f ∘ g)(a + h) = f( g(a) ) + ⟨ df( g(a) ) ∘ dg(a), h ⟩ + o(|h|).    (74)

It follows that f ∘ g is differentiable at a, and moreover the differential equals
df(g(a)) ∘ dg(a). This proves the theorem.
I will also refer ahead to parts of Chapter 2 of Reed.1. In fact. last year we did this for the properties of ≤ from axioms for <.4.
discuss sets in the next section binary operation on R is a function which assigns to any two numbers in R a third number in R. PRELIMINARIES
1
1. and agrees with Adams and last years 1115 Notes. 3. j. and the underlying theoretical ideas will be developed. k will always denote natural numbers. The ﬁrst two sets of axioms are the algebraic and the order axioms respectively. Thanks to the many students in 2000 who pointed out various mistakes. i. We begin with a review of the axioms for the real numbers. we need the set of natural numbers deﬁned by N = {1. or the complex numbers). You can tell which version you have by looking at the ﬁrst expression on line 1 of page 53. Unless otherwise clear from the context. However. and also as a prerequisite for the course Analysis 2 (Lebesgue Integration and Measure Theory. the expression a ≤ b is a statement and so is either true or false. Axioms for the real numbers. } = {1. 1 + 1 + 1. it is required that certain axioms hold. What I call the Completeness axiom is not given a name by Reed. together with two binary operations2 “+” and “×”. a binary relation3 “≤”.4 the number 5. particularly Tristan Bice.1.1. particularly in the sections. and two particular members of R denoted 0 and 1 respectively.
2A 1 We
. the ﬁrst printing has 2−n (M − b) and the second printing has 2−n+1 (M − b). the binary operation “+” assigns to the two real numbers a and b the real number denoted by a + b. Recall that the real number system is a set1 R. n. 1. Warning: What I call the Cauchy completeness axiom is called the “Completeness axiom” by Reed. last years MATH1115 Foundations Notes and the (supplementary) MATH1116 Calculus Notes. and so in particular to the two numbers 2 and 3. This does not matter. These axioms also hold for the rationals (but not for the irrationals. Kesava Jay. but will make comments where it varies from the ﬁrst. I will use the second version as the standard. The terminology I use is more standard. 2. There are two diﬀerent printings of Reed. since we can derive the properties of either from the axioms for the other. For example. Preliminaries This course is both for students who do not intend at this stage doing any further theoretical mathematical courses. the letters m. see Reed (P1–P9) and (O1–O9)4 . In order to discuss the remaining two axioms. . the second corrects some minor mistakes from the ﬁrst. . we will not usually make deductions directly from the axioms. it should be all review material. Most of you will have the second printing. more theoretical. 1 + 1. Moreover. Ashley Norris and Griﬀ Ware. }. notes. . I will also occasionally refer to the text by Adams (fourth edition). and Hilbert Spaces). The main material for the course will be the text by Reed and these supplementary. Real number system. . . 3 To say “≤” is a binary relation means that for any two real numbers a and b. Study Reed Chapter 1. but there we had axioms for < instead of ≤. 4 See also the MATH1115 Foundations Notes. . Interesting and signiﬁcant applications to other areas of mathematics and to ﬁelds outside mathematics will be given. or sometimes more generally will denote integers.
why? 8 See Adams page 4 and Appendix A page 23. The rationals do satisfy the Archimedean axiom. For example. if we assume the algebraic and order axioms then one can prove (see the following Remark) that9 : Archimedean axiom + ⇐⇒ Completeness axiom Cauchy completeness axiom Thus.5. For example.5.45 for the deﬁnition of a Cauchy sequence.
. Conversely. More precisely. but in fact it is impossible to prove the Completeness axiom from the algebraic. ε and for this you really need the Archimedean axiom. All the standard properties of real numbers follow from these axioms.1. This is the full set of axioms. as we will soon discuss10 . order and Cauchy completeness axioms.2). which says for any two real numbers a. What Reed really proves is that (assuming the algebraic and order axioms): Archimedean axiom + =⇒ Completeness axiom Cauchy completeness axiom
5 This is equivalent to the Archimedean “property” stated in Reed page 4 line 6.1) and from this the Archimedean axiom (Theorem 2. b > 0 there is a natural number n such that na > b.5. Remark 1. and then multiplying both sides of this inequality by a to get na > b — this is all justiﬁed since we know the usual algebraic and order properties follow from the algebraic and order axioms. you could choose n > 2(M −b) by the Archimedean axiom and then if follows that 2n > 2(M −b) (since ε ε 2n > n by algebraic and order properties of the reals). Reed only assumes the algebraic. The decimal expansion leads to a sequence of rational numbers which is Cauchy. The proof in Theorem 2. See MATH1115 Notes top of p. • The Cauchy completeness axiom: for each Cauchy sequence6 (an ) of real numbers there is a real number a to which the sequence converges. . This is equivalent to the Archimedean axiom plus the Cauchy completeness axiom. But this is the same as choosing n so that 2n ≥ 2(M −b) . take any irrational number such as 2. the version in Reed follows from the version here by ﬁrst replacing b in the version here by b/a to get that n > b/a for some natural number n.15.7 The (usual ) Completeness axiom8 is: if a set of real numbers is bounded above. but not the Archimedean axiom.1. as we see in the following Remark.5. also the MATH1115 Notes page 8 9 “=⇒” means “implies” and “⇐⇒” means “implies and is implied by” 10 Not a very auspicious beginning to the book. and then claims to prove ﬁrst the Completeness axiom (Theorem 2.1 is a hidden application of the Archimedean axiom. not only is the proof wrong. from now on. In fact. PRELIMINARIES
2
The two remaining axioms are: • The Archimedean axiom5 : if b > 0 then there is a natural number n such that n > b. order and Cauchy completeness axioms. The mistake in the “proof” of Theorem 2. 7 Note that the rationals do not satisfy the corresponding version of the Cauchy completeness √ axiom.1.5. 6 See Reed p. On page 53 line 6 Reed says “choose n so that 2−n (M − b) ≤ ε/2 .1. then it has a least upper bound. .2 is correct. ”. but there is a mistake in the proof of Theorem 2. we will assume both the Archimedean axiom and the Cauchy completeness axiom. and as a consequence also the (standard) Completeness “axiom”. but fortunately that is the only really serious blunder. Take a = 1 in the version in Reed to get the version here. but there is no rational number to which the sequence converges.
5. This part is followed and preceded by inﬁnitely many “copies” of itself.. Note the properties of absolute value in Proposition 1. in the sense that it contains a copy of R together with “inﬁnitesimals” squeezed between each real a and all reals greater than a. these models are sometimes called hyperreals. order and Cauchy completeness axioms implied the Completeness axiom (i.e.2 of Reed. But this would contradict the properties of the “hyperreals”. see below.5. Remark 1. These hyperreals are quite diﬃcult to construct rigorously. Prop. See the following crude diagram.5.2(c) in Reed.
. then combining this with the (correct) proof in Theorem 2. order and Cauchy completeness axioms imply the Archimedean axiom. and between any two copies there are inﬁnitely many other copies.5. a correct proof of Theorem 2.2. x = x − y + y ≤ x − y + y Hence (1) x − y ≤ x − y by the triangle inequality. and I will set it later as a exercise. would there be a way of avoiding the use of the Archimedean axiom in the proof of Theorem 2.
A property of absolute values. The reason is that there are models in which the algebraic.1? The answer is No. but here is a rough idea. Between any two lines there is an infinite number of other lines.1. Part of any such model looks like a “fattened up” copy of R. and less than any any number on any higher line.
b a 0 c

  
1

d


e
d<e<c<0< 1<a<b
Fattened up copy of R
A ``number'' on any line is less than any number to the right . We next ask: if we were smarter. order and Cauchy completeness axioms are true but the Archimedean axiom is false. Reed proves in eﬀect that the Completeness axiom implies the Archimedean axiom.2 we would end up with a proof that the algebraic.1. Another useful property worth remembering is x − y ≤ x − y Here is a proof from the triangle inequality.1.1). Proof.1. In Theorem 2.2. PRELIMINARIES
3
The proof of “⇐=” is about the same level of diﬃculty. 1. The proof that the Completeness axiom implies the Cauchy completeness axiom is not too hard. If we had a proof that the algebraic.
not. Sets and their properties.
11 We occasionally write A ⊂ B instead of A ⊆ B. the ﬁrst four are sets. For any sets A and B (A ∩ B)c = Ac ∪ B c (A ∪ B)c = Ac ∩ B c A ∩ (B ∪ C) = (A ∩ B) ∪ (B ∩ C) A ∪ (B ∩ C) = (A ∪ B) ∩ (B ∪ C) Remark 1. Theorem 1. But in practice. If a is a member of the set A.1. Which regions represent the eight sets on either side of one of the above four equalities? The proofs are also motivated by the diagram. that is A ⊆ B and B ⊆ A. We say for sets A and B that A = B if they have the same elements.
1. This means that every member of A is a member of B and every member of B is also a member of A.1 (De Morgan’s laws). But note that all the proofs really use are the deﬁnitions of ∩.
(2)
Since x − y = x − y or y − x. Q = { m/n  m and n are integers. we will not do this. A \ B.1. it is possible in principle to reduce all of mathematics to set theory. Sets and Functions. see the proof below of the ﬁrst of DeMorgan’s laws. I provide comments and extra information below. Some texts write A ⊂ B to mean that A ⊆ B but A = B. n = 0 }. For example.2. and more often by means of some property. c and the logical meanings of and. or. We usually prove A = B by proving the two separate statements A ⊆ B and B ⊆ A.2. we write a ∈ A. A ∪ B. Often A is a set of real numbers and R is the universal set.
.2. In fact.2. Sets are sometimes described by listing the members. Ac and A ⊆ B 11 . For given sets A and B. this is usually not very useful. ∪. The notion of a set is basic to mathematics. Sometimes we say a is an element of A. For example. PRELIMINARIES
4
and similarly y − x ≤ y − x = x − y. the result follows from (1) and (2). Note the deﬁnitions of A ∩ B. Note that Ac is not well deﬁned unless we know the set containing A in which we are taking the complement (this set will usually be clear from the context and it is often called the universal set). The following diagram should help motivate the above. Study Reed Chapter 1. and the ﬁfth is a statement that is either true or false.
not an open interval! We can represent A × B schematically as follows
2
f B f
AxB A
Remark 1.2. It is interesting to note that we can deﬁne ordered pairs in terms of sets. Recall that A × B = { (a. Hence: a ∈ Ac ∪ B c . b) = (b. R = R × R. b) are that it depends on the two objects a and b and the order is important. a)
12 If P and Q are two statements. This completes the proof of the ﬁrst equality The proofs of the remaining equalities are similar.e. Here (a.1.2. The argument above can be written in essentially the reverse order (check! ) to show that Ac ∪ B c ⊆ (A ∩ B)c . Since a was an arbitrary element in (A∩B)c . Then it is not the case that a ∈ A ∩ B. it follows that (A∩B)c ⊆ Ac ∪B c . then “not (P and Q)” has the same meaning as “(not P) or (not Q)”. i. (a. Products of sets. We will prove the ﬁrst equality by showing (A ∩ B)c ⊆ Ac ∪ B c and A ∪ B c ⊆ (A ∩ B)c . and will be left as exercises. b) is an ordered pair. In other words: it is not the case that (a ∈ A and a ∈ B). b ∈ B } For example. Important properties required for an ordered pair (a.
.12 That is: a ∈ Ac or a ∈ B c (remember that a ∈ X). First assume a ∈ (A ∩ B)c (note that a ∈ X where X is the universal set). PRELIMINARIES
5
C
A∩B
A
c
B
X
Proof. b)  a ∈ A. Hence: (it is not the case that a ∈ A) or (it is not the case that a ∈ B).
Since a and b are distinct and {c. but Ran(f ) need not be all of B. and we have seen this is not so). d} = {a}. see the next paragraph. This member of B is denoted by f (a). This is diﬀerent from the case for sets. Since {{a}. and write f : A → B. b}. An example is the squaring function. The power set P(A) of a set A is deﬁned to be the set of all subsets of A. {a. d} ∈ {{a}. and the set of all members in B of the form f (a) is the range of f . b}. b} contains two members whereas {c} contains one member. Note that Ran(f ) ⊆ B. {c. b}} and so {c} = {a} or {c} = {a. This completes the proof.e. d}. b}} = {{c}. Next suppose (a. since f only assigns a value to positive numbers. and so {{c}. {a. {c. b}. In conclusion. d) iﬀ (a = c and b = d). If a = b then {{a}. a} for any a and b. in Reed Example 2 page 9 one should say f is a function from R+ 13 (not R) to R. If a = b then {{a}. Ran(f ) = { f (a)  a ∈ A }. b) = (c. and you should normally keep to the deﬁnition here. He only requires that f assigns a value to some members of A. Functions. More generally. {a. b}} has only one member. d).2. We say A is the domain of f . b}} contains exactly two (distinct) members. For example. so that for him the domain of f need not be all of A. i. {a. In particular. Dom(f ) = A. d}}. b) = (c.e.1. the only property we require of ordered pairs is that (3) (a. The second equality cannot be true since {a. b}}. d} = {a. {c. the “squaring function” f : R → R is given by the rule f (a) = a2 for each a ∈ R. and so c = a. b) = (c. we need to prove (3). b}} contains exactly one member. We say f is a function from the set A into the set B.
. In conclusion. b} = {b. There are a number of ways that we can deﬁne ordered pairs in terms of sets. i. b}} it now follows that {c. d} = {a. {a. namely {a}. PRELIMINARIES
6
unless a = b. {a. if f “assigns” to every a ∈ A exactly one member of B. and so {c} = {a}. b) := {{a}. but since also {c} = {a} this would imply {{c}. {a. 1. The standard deﬁnition is (a. i. namely {a} and {a. d}} and hence {{a}. it follows c and d are distinct. {a. It is immediate from the deﬁnition that if a = c and b = d then (a. We consider the two cases a = b and a = b separately. Hence a = c and c = d. {a. Since also {c. since a = c it then follows b = d.e. This means {a} = {c} = {c.e. {{a}. d}} also contains exactly the one member {a}. b}} = {{c}. We informally think of f as some sort of “rule”.1. That is P(A) = { S  S ⊆ A }. a = c and b = d. The convention on page 8 in Reed is diﬀerent. Power sets. i. b} (otherwise {c. To show this is a good deﬁnition. This is not standard. d}} it follows {c} ∈ {{a}. {c. and write f : R+ → R. Proof. d). But functions
13 R +
= { a ∈ R  a > 0 }. a = b = c = d.
We will use he same notation in either case.
. See the preceding diagram. Do not confuse it with the notation f : A → B. and write f : A → B. b) ∈ f . and in particular there may not really be any “rule” which explicitly describes the function.1. if f is a subset of A × B with the property that for every a ∈ A there is exactly one b ∈ B such that (a. We normally think of f as an assignment. we say f is a function from A into B. f Note also that Reed uses the notation a −→ b to mean the same as f (a) = b. PRELIMINARIES
7
can be much more complicated than this. More generally. The way to avoid this problem is to deﬁne functions in terms of sets. We can think of the squaring function as being given by the set of all ordered pairs { (a. although the “assignment” may not be given by any rule which can readily be written down. Note that Reed uses a diﬀerent notation for the function f thought of as a “rule” or “assignment” on the one hand. a2 )  a ∈ R }. Reed’s notation is not very common and we will avoid it. Thus we are really just identifying f with its graph. This set is a subset of R × R and has the property that to every a ∈ R there corresponds exactly one element b ∈ R (b = a2 for the squaring function). and the function f thought of as a subset of A × B on the other (he uses a capital F in the latter case).
2. Cardinality

Some infinite sets are bigger than others, in a sense that can be made precise. Study Reed Chapter 1.3; in the following I make some supplementary remarks.

Note the definitions of finite and infinite sets. A set is countable if it is infinite and can be put in one–one correspondence with N. In other words, a set is countable iff it can be written as an infinite sequence (with all ai distinct) a1, a2, a3, ..., an, .... If an infinite set is not countable then we say it is uncountable. (Some books include finite sets among the countable sets; if there is any likelihood of confusion, you can say countably infinite.) It is not true that all infinite sets are countable, as we will soon see. Thus every set is finite, countable or uncountable.

Important results in this section are that an infinite subset of a countable set is countable (Prop 1.3.2), the union of two countable sets is countable (Exercise 3), the product of two countable sets is countable (Prop 1.3.4), the set of rational numbers is countable (Theorem 1.3.5), the set of real numbers is uncountable (Theorem 1.3.6 and the remarks following that theorem), and the set of irrational numbers is uncountable (second last paragraph in Chapter 1.3).

Proposition 2.1. If S and T are countable, then so is S × T.

Proof. A different proof of Proposition 1.3.4, which avoids the use of the Fundamental Theorem of Arithmetic and is more intuitive, is the following. Let S = (a1, a2, a3, ...) and T = (b1, b2, b3, ...). Place the pairs (ai, bj) in an infinite array, with (ai, bj) in row i and column j. Then S × T can be enumerated by traversing the array one finite "anti-diagonal" at a time (the printed diagram indicates such a path with arrows):

  (a1, b1); (a1, b2), (a2, b1); (a1, b3), (a2, b2), (a3, b1); (a1, b4), (a2, b3), (a3, b2), (a4, b1); ...

Every pair (ai, bj) occurs exactly once in this list, at some finite position, and so S × T is countable.

Given any set A, let P(A) denote the set of all subsets of A. In fact P(A) always has a larger cardinality than A, in the sense that there is no one–one map from A onto P(A). (There is certainly a one–one map from A into P(A): for each a ∈ A define f(a) = {a}.) Thus we can get larger and larger infinite sets via the sequence N, P(N), P(P(N)), P(P(P(N))), .... (One can prove that P(N) and R have the same cardinality.) But this is barely the beginning of what we can do: we could take the union of all the above sets to get a set of even larger cardinality, and then take this as a new beginning set instead of using N. And still we would hardly have begun! However, in practice we rarely go beyond sets of cardinality P(P(R)).
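The anti-diagonal enumeration in the proof of Proposition 2.1 is easy to make computational. The following sketch is my own illustration (the function names are not from the notes): it lists the pairs (a_i, b_j) one anti-diagonal i + j = d at a time, so every pair appears at some finite position.

```python
def enumerate_product(s, t, count):
    """Return the first `count` pairs (s(i), t(j)), traversing the
    anti-diagonals i + j = 0, 1, 2, ... of the infinite array.
    s and t are the enumerations N -> S and N -> T."""
    pairs = []
    d = 0
    while len(pairs) < count:
        for i in range(d + 1):          # all pairs with i + j = d
            pairs.append((s(i), t(d - i)))
            if len(pairs) == count:
                break
        d += 1
    return pairs

# Example: S = T = N.  Every pair occurs exactly once, at a finite position.
first = enumerate_product(lambda n: n, lambda n: n, 10)
```

Running this gives the list (0,0); (0,1), (1,0); (0,2), (1,1), (2,0); ..., exactly the order used in the proof.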
0. i. In either case we have a contradiction. Proof. Deﬁne B = { a ∈ A  a ∈ f (a) }. i. then B does satisfy the condition deﬁning B. / Then B ⊆ A and so B ∈ P(A). * Consider any f : A → P(A). CARDINALITY
9
Theorem 2. We have to show that f is not onto.2.
. and so / / b ∈ B. A and P(A) have diﬀerent cardinality. In particular. b ∈ f (b). then b does not satisfy the condition deﬁning B. and so the assumption must be wrong. / If b ∈ B. and so b ∈ B. b ∈ f (b). / If b ∈ B.3. One of the two possibilities b ∈ B or b ∈ B must be true. Hence there is no b ∈ A such that f (b) = B and in particular f is not an onto map.e. We claim there is no b ∈ A such that f (b) = B. For any set A.e. there is no map f from A onto P(A). (This implies that f is not an onto map!) Assume (in order to obtain a contradiction) that f (b) = B for some b ∈ A.
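For a finite set A the argument in the proof of Theorem 2.2 can be checked exhaustively: whatever map f : A → P(A) we choose, the set B = { a ∈ A | a ∉ f(a) } is never a value of f. A small sketch (my own illustration, not part of the notes):

```python
from itertools import product

def powerset(xs):
    """All subsets of xs, as frozensets."""
    out = [frozenset()]
    for x in xs:
        out += [s | {x} for s in out]
    return out

A = [0, 1, 2]
subsets = powerset(A)           # 2**3 = 8 subsets

# Try every one of the 8**3 maps f : A -> P(A), and check that
# B = {a : a not in f(a)} is never in the image of f, so no f is onto.
for choice in product(subsets, repeat=len(A)):
    f = dict(zip(A, choice))
    B = frozenset(a for a in A if a not in f[a])
    assert all(f[a] != B for a in A)
```

Of course this is only a sanity check for one small A; the proof above handles every set at once.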
3. Sequences

3.1. Basic definitions. Review the definitions and examples in Section 2.1 of Reed. Recall that a sequence is an infinite list of real numbers (later, complex numbers or functions). We usually write a1, a2, a3, ..., an, ..., or (an)n≥1, or just (an). Sometimes we start the enumeration from 0 or another integer. We will often be interested in sequences of vectors (i.e. points) in RN.

The sequence (an) converges to a if for any preassigned positive number (usually called ε), all members of the sequence beyond a certain member (usually called the N th) will be within distance ε of a. In symbols: for each ε > 0 there is an integer N such that

  |an − a| ≤ ε for all n ≥ N.

(Note that the definition implies N may (and usually will) depend on ε. We often write N(ε) to emphasise this fact.)

Study Examples 1–4 in Section 2.1 of Reed. Note that the Archimedean axiom is needed in each of these examples (page 30 line 9, page 31 line 8, page 32 line 1, page 33 line 3). Note also the definitions of diverges (e.g. if an = n^2 or an = (−1)^n n^2, then (an) diverges), diverges to +∞ (e.g. n^2 → ∞, but not (−1)^n n^2 → ∞), and bounded.14

3.2. Convergence properties. From the definition of convergence we have the following useful theorems. While the results may appear obvious, this is partly because we may only have simple examples of sequences in mind. (For a less simple example, think of a sequence which enumerates the rational numbers!) See Reed Section 2.2 for proofs.

Theorem 3.1.
1. If (an) converges, then (an) is bounded.
2. If an → a and bn → b then
   (a) an ± bn → a ± b,
   (b) c an → c a (c is a real number),
   (c) an bn → ab,
   (d) an/bn → a/b (assuming b ≠ 0).16
3. If an ≤ cn ≤ bn for all n15 and both an → L, bn → L, then cn → L.

14 A sequence (an) is bounded if there is a real number M such that |an| ≤ M for all n.
15 Or, more precisely, for all n ≥ N (say).
16 Since b ≠ 0, it follows that bn ≠ 0 for all n ≥ N (say). This implies that the sequence (an/bn) is defined, at least for n ≥ N.
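The dependence of N on ε in the definition of convergence can be seen numerically. As an illustration (my own, using the sequence an = 1 + (−1)^n/n, whose distance from the limit 1 is exactly 1/n):

```python
def n_of_eps(eps):
    """Smallest N with |a_n - 1| <= eps for all n >= N, where
    a_n = 1 + (-1)**n / n, so that |a_n - 1| = 1/n."""
    n = 1
    while 1.0 / n > eps:
        n += 1
    return n

# The smaller the tolerance eps, the larger N must be taken:
table = {eps: n_of_eps(eps) for eps in (0.5, 0.1, 0.01)}
```

Here `table` comes out as {0.5: 2, 0.1: 10, 0.01: 100}, making the growth of N(ε) concrete.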
3.3. Sequences in RN. Consider a sequence of points (un)n≥1 in RN, where un = (un^1, ..., un^N) for each n. Such a sequence is said to converge if each of the N sequences of components converges. It is bounded17 if there is a real number M such that

  ||un|| = sqrt( (un^1)^2 + · · · + (un^N)^2 ) ≤ M for all n.

This just means that all members of the sequence lie inside some single ball of sufficiently large radius M.

[Diagram omitted: points a1, a3, a4, a5, a8, ... of a sequence plotted in the plane, together with graphs of simple examples such as 1 + (−1)^n/n and e^{−n}.]

The analogues of 1, 2(a) and 2(b) of Theorem 3.1 hold. More precisely:

Theorem 3.2.
1. If (un) converges, then (un) is bounded.
2. If un → u, vn → v and an → a (a sequence of real numbers) then
   (a) un ± vn → u ± v,
   (b) an un → a u.
3. If un → u then A un → A u, where A is any matrix with N columns.

Proof. 1. (Think of the case N = 2.) Since (un) converges, so does each of the N sequences of real numbers (un^j)n≥1 (where 1 ≤ j ≤ N). But this implies that each of these N sequences of real numbers lies in a fixed interval [−Mj, Mj], for j = 1, ..., N. Let M be the maximum of the Mj. Since |un^j| ≤ M for all j and all n, it follows that ||un|| = sqrt( (un^1)^2 + · · · + (un^N)^2 ) ≤ sqrt(N) M for all n. This proves the first result.

2. Since each of the components of un ± vn converges to the corresponding component of u ± v by the previous theorem, result 2(a) follows. Similarly for 2(b), as is easily checked by looking at the components.

3. This follows from the fact that A un is a linear combination of the columns of A with coefficients given by the components of un, while A u is the corresponding linear combination with coefficients given by the components of u. More precisely, write A = [a^1 ... a^N], where a^1, ..., a^N are the column vectors of A. Then

  A un = un^1 a^1 + · · · + un^N a^N  and  A u = u^1 a^1 + · · · + u^N a^N.

Since un^1 → u^1, ..., un^N → u^N as n → ∞, the result follows from 2(b) and repeated applications of 2(a).

17 A sequence of points (un)n≥1 in RN, where un = (un^1, ..., un^N), is bounded if there is a real number M such that ||un|| = sqrt( (un^1)^2 + · · · + (un^N)^2 ) ≤ M for all n.
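Part 3 of the theorem can be watched numerically. The sketch below is my own illustration (plain lists, no matrix library): it takes un = (1 + 1/n, 1/n) → u = (1, 0) and checks that A un approaches A u.

```python
def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1.0, 2.0],
     [3.0, 4.0]]

u = [1.0, 0.0]
Au = mat_vec(A, u)                      # the claimed limit of A u_n

def u_n(n):
    return [1.0 + 1.0 / n, 1.0 / n]    # converges componentwise to u

# The error of A u_n against A u shrinks as n grows:
errs = [max(abs(x - y) for x, y in zip(mat_vec(A, u_n(n)), Au))
        for n in (10, 100, 1000)]
```

The errors decrease by roughly a factor of 10 each time, mirroring the componentwise errors 1/n.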
4. Markov Chains

Important examples of infinite sequences occur via Markov chains. One has a system which can be in one of N states E1, E2, ..., EN, and we are interested in its location at a certain sequence of designated times, let us say at each hour of the day.

For example, consider the following situation, called a Markov process. Suppose that a taxi is in one of three possible zones E1, E2 or E3 (e.g. Civic, Belconnen or Woden). Suppose that if the taxi is in E1 at one hour, then the probability that it is in E1, E2, E3 exactly one hour later is .6, .2, .2 respectively (see the first column of the following matrix; note that the sum of the probabilities must be 1). Similarly suppose that if it is in E2 then the probability that it is in E1, E2, E3 exactly one hour later is .4, .4, .2 (second column), and if it is in E3 then the probability that it is in E1, E2, E3 exactly one hour later is .4, .2, .4 (third column). We can represent the situation by a matrix of probabilities:

        E1   E2   E3
  E1    .6   .4   .4
  E2    .2   .4   .2
  E3    .2   .2   .4

The first column corresponds to starting in E1, the second to E2 and the third to E3. Thus the jth column corresponds to what happens at the next time if the system is currently in state Ej. We can also show this in a diagram [transition diagram omitted; its edge labels are the entries of the matrix].

We are interested in the long-term behaviour of the taxi: after a large number of hours, what is the probability that the taxi is in each of the states E1, E2, E3, and does this depend on the initial starting position?

More generally, P = (pij) is an N × N matrix, where pij is the probability that if the system is in state Ej at some time then it will be in state Ei at the next time. We say P is a probability transition matrix.

There is a proof in Reed of Markov's Theorem (Theorem 4.2 in these notes) for the case of 2 states. But the proof is algebraically messy and does not easily generalise to more than 2 states. Later I will give a proof of Markov's Theorem using the Contraction Mapping Theorem. So you may omit Reed page 42 and the first paragraph on page 43.
There are many important examples:

Cell genetics: Suppose a cell contains k particles, some of type A and some of type B, and suppose daughter cells are formed by subdivision. The state of the cell is the number of particles of type A, so there are N = k + 1 states. One can often compute from biological considerations the probability that a random daughter cell is in each of the N possible states.

Population genetics: Suppose flowers are bred, and at each generation 1000 flowers are selected at random. A particular gene may occur either 0 or 1 or 2 times in each flower, and so there are between 0 and 2000 occurrences of the gene in the population. This gives 2001 possible states, and one usually knows from biological considerations the probability of passing from any given state for one generation to another given state for the next generation.

Random walk: Suppose an inebriated person is at one of three locations: home (H), lamppost (L) or pub (P). Suppose the probability of passing from one to the other an hour later is given by the following diagram [diagram omitted; its probabilities are the entries of the matrix below]. The corresponding matrix of transition probabilities, with the three columns corresponding to being in the state H, L, P respectively, is

        H    L    P
  H    .7   .4    0
  L    .3   .4   .3
  P     0   .2   .7

Other examples occur in diffusion processes, statistical mechanics, queueing theory, telecommunications, etc. See "An Introduction to Probability Theory and Its Applications", vol 1, by W. Feller, for examples and theory.

We now return to the general theory. If a vector consists of numbers from the interval [0, 1] and the sum of the components is one, we say the vector is a probability vector. We require the columns of a probability transition matrix

       p11  p12  ...  p1N
  P =  p21  p22  ...  p2N
       ...
       pN1  pN2  ...  pNN

to be probability vectors; the jth column describes what happens after the system is in state Ej.

Suppose that at some particular time the probabilities of being in the states E1, ..., EN are given by the probability vector a = (a1, a2, ..., aN) (so the sum of the ai is one).
Then the probability at the next time of being in state Ei is18

  a1 × (prob. of moving from E1 to Ei) + a2 × (prob. of moving from E2 to Ei) + · · · + aN × (prob. of moving from EN to Ei)
  = pi1 a1 + pi2 a2 + · · · + piN aN = (P a)i.

Thus if at some time the probabilities of being in each of the N states are given by the vector a, then the corresponding probabilities at the next time are given by P a, and hence at the next time by P(P a) = P^2 a, and hence at the next time by P^3 a, etc.

In the case of the taxi, starting at Civic and hence starting with the probability vector a = (1, 0, 0), the corresponding probabilities at later times are given by

  P a = (.6, .2, .2),  P^2 a = (.52, .24, .24),  P^3 a = (.504, .248, .248),  ....

This appears to converge to (.5, .25, .25). In fact it seems plausible that after a sufficiently long period of time, the probability of being in any particular zone should be almost independent of where the taxi begins. This is indeed the case (if we pass to the limit), as we will see later.

We will later prove:

Theorem 4.2 (Markov). If all entries in the probability transition matrix P are greater than 0, and a is a probability vector, then the sequence of vectors a, P a, P^2 a, ... converges to a probability vector v, and this vector does not depend on a. The vector v is the unique nonzero solution of (P − I)v = 0.19 Moreover, the same results are true even if we just assume P^k has all entries nonzero for some integer k > 1.

The theorem applies to the taxi, and to the inebriated person (take k = 2).

The proof that if v is the limit then (P − I)v = 0 can be done now.

Proof. Assume v = lim_{n→∞} P^n a. Then

  v = lim_{n→∞} P^n a
    = lim_{n→∞} P^{n+1} a    (as this is the same sequence, just begun one term later)
    = lim_{n→∞} P (P^n a)
    = P lim_{n→∞} P^n a      (from Theorem 3.2, part 3)
    = P v.

Hence (P − I)v = 0.

18 From intuitive ideas of probability; ask me about this if you have not done even very basic probability theory.
19 The matrix I is the identity matrix of the same size as P.
For the taxi, the nonzero solution of (P − I)v = 0, i.e. of

  −.4 v1 + .4 v2 + .4 v3 = 0
   .2 v1 − .6 v2 + .2 v3 = 0
   .2 v1 + .2 v2 − .6 v3 = 0,

with entries summing to 1, is indeed (see the previous discussion)

  v = (.5, .25, .25),

and so in the long term the taxi spends 50% of its time in E1, 25% in E2 and 25% in E3.

For the inebriate, the nonzero solution of

  −.3 v1 + .4 v2          = 0
   .3 v1 − .6 v2 + .3 v3 = 0
           .2 v2 − .3 v3 = 0,

with entries summing to 1, is

  v = (4/9, 1/3, 2/9),

and so in the long term (as the evening progresses) the inebriate spends 44% of the time at home, 33% by the lamppost and 22% in the pub.
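Markov's Theorem is easy to check numerically for these two examples. The sketch below is my own; the two transition matrices are my reading of the garbled tables in the source, so treat the entries as assumptions. Iterating a ↦ P a quickly reproduces the claimed limiting vectors, from any starting state.

```python
def mat_vec(P, a):
    return [sum(p * x for p, x in zip(row, a)) for row in P]

def iterate(P, a, steps=200):
    """Compute P**steps applied to a, one multiplication at a time."""
    for _ in range(steps):
        a = mat_vec(P, a)
    return a

# taxi: every entry positive, so Markov's Theorem applies directly
P_taxi = [[.6, .4, .4],
          [.2, .4, .2],
          [.2, .2, .4]]

# inebriate: some entries are 0, but P^2 has all entries positive (k = 2)
P_pub = [[.7, .4, 0.0],
         [.3, .4, .3],
         [0.0, .2, .7]]

v_taxi = iterate(P_taxi, [1.0, 0.0, 0.0])   # start at Civic
v_pub = iterate(P_pub, [0.0, 0.0, 1.0])     # start at the pub
```

Changing the starting probability vector changes nothing in the limit, in line with the theorem.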
5. Cauchy sequences and the Bolzano–Weierstraß theorem

Read Sections 2.4–6 of Reed; most of it will be review. Also read Section 3.5 of the MATH1115 Notes, and Adams, Appendix III. The following notes go through the main points.

5.1. Cauchy sequences. Study Section 2.4 of Reed. A sequence converges iff it is Cauchy. This gives a criterion for convergence which does not require knowledge of the limit. See Example 1 page 45 of Reed (note that the Archimedean axiom is used on page 45 line 8−).20

We first give the definition. Loosely speaking: given any number ("tolerance") ε > 0, all terms (not just successive ones) of the sequence from the N th onwards are within (this tolerance) ε of one another. N will normally depend on the particular "tolerance": the smaller we choose ε, the larger we will need to choose N.

Definition 5.1.1. A sequence (an) is a Cauchy sequence if for each number ε > 0 there exists an integer N such that

  |am − an| ≤ ε whenever m ≥ N and n ≥ N.

Theorem 5.1.2. A sequence is Cauchy iff it converges.

The proof that convergent implies Cauchy is easy (see Reed). The fact that Cauchy implies convergent is just the Cauchy Completeness Axiom (or what Reed calls the Completeness axiom; see Reed page 48).

Recall that a sequence (an) is increasing if an ≤ an+1 for all n, and is decreasing if an+1 ≤ an for all n. A sequence is monotone if it is either increasing or decreasing.

Theorem 5.1.3. Every bounded monotone sequence converges.

For the proof of the theorem see Reed page 48. The proof uses a bisection argument: begin with a closed interval which contains all members of the sequence, and keep subdividing it so that all members of the sequence after some index are in the chosen subinterval. The idea is that the sequence of chosen subintervals "converges to a point" which is the limit of the original sequence. This implies the sequence is Cauchy, and so it converges by the previous theorem. (Note the use of the Archimedean Axiom on page 49 line 10.)

5.2. Subsequences and the Bolzano–Weierstraß theorem. Study Section 2.5 of Reed, and Section 2.6 (omitting the Definition and Proposition on page 56). A subsequence is just a sequence obtained by skipping terms. For example, a1, a27, a31, a44, a101, ... is a subsequence. We usually write a subsequence of (an) as

  an1, an2, an3, an4, ..., ank, ...,

or (ank)k≥1, or an(1), an(2), an(3), an(4), ..., an(k), ..., or (an(k))k≥1, if there is any likelihood of confusion. Thus in the above example we have n1 = 1, n2 = 27, n3 = 31, n4 = 44, n5 = 101, etc.

20 "line 8−" means 8 lines from the bottom of the page.
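Theorem 5.1.3 can be watched in action. The sketch below is my own example: iterate a_{n+1} = sqrt(2 + a_n) starting from a_1 = 0. The sequence is increasing and bounded above by 2, so the theorem says it converges; passing to the limit in the recursion shows the limit is the solution of L = sqrt(2 + L), namely L = 2.

```python
from math import sqrt

terms = [0.0]
for _ in range(60):
    terms.append(sqrt(2.0 + terms[-1]))

# bounded and monotone, as the theorem requires:
increasing = all(x <= y for x, y in zip(terms, terms[1:]))
bounded = all(x <= 2.0 for x in terms)
```

After 60 iterations the terms are indistinguishable from 2 at machine precision, with no knowledge of the limit needed to set up the iteration.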
Definition 5.2.1. The number d is a limit point of the sequence (an)n≥1 if there is a subsequence of (an)n≥1 which converges to d.

(Instead of the Definition on page 56 of Reed, you may use Definition 5.2.1 here.)

The limit of a convergent sequence is also a limit point of the sequence, and it is the only limit point, because any subsequence of a convergent sequence must converge to the same limit as the original sequence (why?). On the other hand, sequences which do not converge may have many limit points, even infinitely many (see the following examples).

Theorem 5.2.2 (Bolzano–Weierstraß Theorem). If a sequence (an) is bounded then it has a convergent subsequence. Moreover, if all members of (an) belong to a closed bounded interval I, then so does the limit of any convergent subsequence of (an).

Note that the theorem implies that any bounded sequence must have at least one limit point. A sequence need not of course converge, even if it is bounded.

The proof of the theorem is via a bisection argument. Suppose I is a closed bounded interval containing all terms of the sequence. Subdivide I and choose one of the subintervals so that it contains an infinite subsequence; then again subdivide and take one of the new subintervals which contains an infinite subsequence of the first subsequence; and so on. Now define a new subsequence of the original sequence by taking one term from the first subsequence, a later term from the second subsequence, a still later term from the third subsequence, etc. One can now show that this subsequence is Cauchy and so converges. See Reed pages 57, 58.

Now look at Examples 1 and 2 on pages 56, 57 of Reed (but note the errors discussed below).

Here is an example of a sequence which has every number in [0, 1] as a limit point. Let

(4) r1, r2, r3, ...

be an enumeration of all the rationals in the interval [0, 1]. We know this exists, because the rationals are countable, and indeed we can explicitly write down such a sequence. Every number a ∈ [0, 1] has a decimal expansion, and this actually provides a sequence (a′n) of rationals converging to a (modify the decimal expansion a little to get distinct rationals if necessary). This sequence may not be a subsequence of (4), since the a′n may occur in a different order. But we can find a subsequence (a″n) of (a′n) (which thus also converges to a) and which is also a subsequence of (4), as follows:

  a″1 = a′1
  a″2 = first term in (a′n) which occurs after a″1 in (4)
  a″3 = first term in (a′n) which occurs after a″2 in (4)
  a″4 = first term in (a′n) which occurs after a″3 in (4)
  ...

Thus every number in [0, 1] is a limit point of the sequence (4).
5.3. Completeness property of the reals. The usual Completeness Axiom for the real numbers is:21 if a nonempty set A of real numbers is bounded above, then there is a real number b which is the least upper bound for A.

The real number b need not belong to A. For example, the least upper bound for each of the sets (0, 1) and [0, 1] is 1. But it is unique. A common term for the "least upper bound" is the supremum or sup. Similarly, the greatest lower bound of a set is often called the infimum or inf. You should now go back and reread Section 1.1 of these notes.

The Completeness Axiom in fact follows from the other (i.e. algebraic, order, Archimedean and Cauchy completeness) axioms. It is also possible to prove the Archimedean and Cauchy completeness axioms from the algebraic, order and Completeness axioms. See the MATH1115 2000 Notes, §2.3, for a discussion; see also Adams, third edition, page A23.

Remarks on the text: Note that the remark at the end of Section 2.5 of Reed, about the equivalence of Theorems 2.3 and 2.4, is only correct if we assume the algebraic, order and Archimedean axioms. The assertion on page 53 before Theorem 2.1 (Theorem 2.2 in the corrected printing), that the Archimedean property follows from that theorem, is not correct, as the theorem uses the Archimedean property in its proof (in line 6 of page 53).

21 Note that a similar statement is not true if we replace "real" by "rational". For example, let A = { x ∈ Q | x ≥ 0, x^2 < 2 }. Then although A is a nonempty set of rational numbers which is bounded above, there is no rational least upper bound for A. The (real number) least upper bound is of course √2.
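The least upper bound of A = { x ∈ Q | x ≥ 0, x^2 < 2 } can be located by the same bisection idea used throughout this chapter. A sketch (my own illustration): keep an interval [lo, hi] with lo^2 < 2 (so lo is not an upper bound of A) and hi^2 ≥ 2 (so hi is an upper bound of A), and halve it repeatedly.

```python
lo, hi = 0.0, 2.0            # lo**2 < 2 <= hi**2, so hi is an upper bound of A
for _ in range(60):
    mid = (lo + hi) / 2.0
    if mid * mid < 2.0:
        lo = mid             # mid is not yet an upper bound of A
    else:
        hi = mid             # mid is an upper bound of A
sup_A = hi                   # after 60 halvings, hi - lo = 2 / 2**60
```

The computed value agrees with √2 to machine precision, illustrating why the supremum exists as a real, but not rational, number.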
6. The Quadratic Map

This is the prototype of maps which lead to chaotic behaviour. For some excellent online software demonstrating a wide range of chaotic behaviour, go to Brian Davies' homepage, via the Department of Mathematics homepage. When you reach the page on "Exploring chaos", click on the "graphical analysis" button.

Read Section 2.7 from Reed. We will only do this section lightly, just to indicate how quite complicated behaviour can arise from seemingly simple maps. The quadratic map is defined by the recursive relation

(5) xn+1 = r xn (1 − xn),
where x0 ∈ [0, 1] and 0 < r ≤ 4. It is a simple model of certain populations, where xn is the fraction in year n of some "maximum possible" population. The function F(x) = r x(1 − x) is nonnegative on the interval [0, 1]. Its maximum occurs at x = 1/2 and has the value r/4. It follows that F : [0, 1] → [0, 1] for r in the given range.
If r ∈ (0, 1], it can be seen from the above diagram (obtained via Brian Davies' software) that xn → 0, independently of the initial value x0. This is proved in Theorem 2.7.1 of Reed. The proof is easy; it is first a matter of showing that the sequence (xn) is decreasing, as is indicated by the diagram in Reed, and hence has a limit by Theorem 5.1.3. If x* is the limit, then by passing to the limit in (5) it follows that x* = r x*(1 − x*), and so x* = 0. (The other solution of x = r x(1 − x) is x = 1 − 1/r, which does not lie in [0, 1] unless r = 1, and r = 1 gives 0 again.) For 1 < r ≤ 3 and 0 < x0 < 1, the sequence converges to x* = 1 − 1/r. (This is proved in Reed, Theorem 2.7.2, for 1 < r ≤ 2√2. The proof is not difficult, but we will not go through it.) If x0 = 0 or 1, then it is easy to see that the sequence converges to 0.
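Both behaviours can be confirmed by direct iteration of (5). The sketch below is my own check (the particular values r = 0.8, r = 2 and the starting points are arbitrary choices): for r ∈ (0, 1] the iterates die out, while for 1 < r ≤ 3 they approach the fixed point 1 − 1/r.

```python
def orbit(r, x0, steps=500):
    """Iterate x -> r*x*(1-x) `steps` times from x0 and return the result."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x

# r in (0, 1]: every orbit tends to 0
x_small = orbit(0.8, 0.9)

# 1 < r <= 3: orbits from (0, 1) tend to the fixed point 1 - 1/r = 0.5
x_mid = orbit(2.0, 0.1)
```

Changing x0 inside (0, 1) does not change either limit, matching the statements above.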
For 3 < r ≤ 4 and 0 < x0 < 1 the situation becomes more complicated. If 3 < r ≤ 3.4 (approx.) then xn will eventually oscillate back and forth between two values, which become closer and closer to two of the solutions of x = F(F(x)) = F^2(x) (see the numerics on Reed page 65).
As r gets larger the situation becomes more and more complicated, and in particular one needs to consider solutions of x = F^k(x) for large k.
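The oscillation between two values for r just above 3 can also be seen numerically. A sketch (my own; the value r = 3.2 and the transient length are arbitrary choices): after discarding a transient, the even- and odd-indexed iterates settle onto the two solutions of x = F^2(x) which are not fixed points of F.

```python
def step(r, x):
    return r * x * (1.0 - x)

r, x = 3.2, 0.3
for _ in range(2000):              # discard the transient
    x = step(r, x)

a = x                              # one of the two oscillation values
b = step(r, a)                     # the other one
residual = abs(step(r, b) - a)     # period 2 means F(F(a)) = a
```

For r = 3.2 the two values come out near 0.513 and 0.799, and the residual is at machine-precision level.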
7. Continuity

Study Reed Sections 3.1 and 3.2. See also the MATH1115 Notes, Chapter 4.

We will consider functions f : A → R, where A ⊆ R is the domain of f, also denoted by D(f). Usually A will be R, an interval, or a finite union of intervals. (To fix your ideas, consider the case A = [a, b] or A = (a, b).)

7.1. Definition of continuity.

Definition 7.1.1. A function f is continuous at a ∈ D(f) if

  xn → a, where (xn) ⊆ D(f),22  =⇒  f(xn) → f(a).

The function f is continuous if it is continuous at every point in its domain.

This definition in terms of sequences is quite natural: "as xn gets closer and closer to a, f(xn) gets closer and closer to f(a)". By the following theorem it is equivalent to the usual "ε–δ" definition, which does not involve sequences. (For the proof see Reed Theorem 3.1.3, page 77.)

Theorem 7.1.2. A function f is continuous at a point a ∈ D(f) iff for every ε > 0 there is a δ > 0 such that:

  x ∈ D(f) and |x − a| ≤ δ  ⇒  |f(x) − f(a)| ≤ ε.

(Draw a diagram, and convince yourself via the definition.)

Note that if f(x) = sin(1/x), then f is continuous on its domain (−∞, 0) ∪ (0, ∞), but there is no continuous extension of f to all of R. On the other hand, f(x) = x sin(1/x) is both continuous on its domain (−∞, 0) ∪ (0, ∞) and also has a continuous extension to all of R. (Draw diagrams!)

Another interesting example is the function g given by

  g(x) = x if x ∈ Q,  −x if x ∈ R \ Q.

Then g is continuous at 0 and nowhere else.

7.2. Limits and continuity. Note the definition of

  lim_{x→a} f(x) = L,

both in terms of sequences and in terms of ε and δ, on the second half of page 78 of Reed.23 It follows (provided a is not an isolated point24 of D(f)) that f is continuous at a ∈ D(f) iff

  lim_{x→a} f(x) = f(a).

However, if a is an isolated point of D(f), then f is continuous at a although lim_{x→a} f(x) is not actually defined. This is not an interesting situation, and we would not normally consider continuity at isolated points. For example, if A = [0, 1] ∪ {2}, then 2 is an isolated point of A (and is the only isolated point of A).

22 By (xn) ⊆ D(f) we mean that each term of the sequence is a member of D(f).
23 In the definition of lim_{x→a} f(x) = L, we restrict to sequences xn → a with xn ≠ a, or we require that 0 < |x − a| ≤ δ (thus x ≠ a), depending on the definition used; see Reed page 78. In the case of continuity, however, one can see that it is not necessary to make this restriction, because the required limit is f(a) (c.f. (21) and (22)).
24 We say a ∈ A is an isolated point of A if there is no sequence xn → a such that (xn) ⊆ A \ {a}.
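The difference between sin(1/x) and x sin(1/x) near 0 shows up numerically. A sketch (my own illustration): along the sequence xn = 1/(nπ + π/2) → 0, the values sin(1/xn) = ±1 keep oscillating, while xn sin(1/xn) → 0, consistent with g(x) = x sin(1/x), g(0) = 0, being the continuous extension.

```python
from math import sin, pi

# a sequence x_n -> 0 hitting the extrema of sin(1/x)
xs = [1.0 / (n * pi + pi / 2) for n in range(1, 200)]

osc = [sin(1.0 / x) for x in xs]         # sin(1/x): values +-1, no limit
damped = [x * sin(1.0 / x) for x in xs]  # x*sin(1/x): tends to 0
```

No choice of value at 0 can make the `osc` sequence converge, which is exactly the sequential criterion for the failure of a continuous extension of sin(1/x).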
7.3. Properties of continuity.

Theorem 7.3.1. Suppose f and g are continuous. Then the following are all continuous on their domains:
1. f + g
2. cf for any c ∈ R
3. f g
4. f /g
5. f ◦ g

The domain of each of the above functions is just the set of real numbers where it is defined. Thus the domain of f + g and of f g is D(f) ∩ D(g), and the domain of cf is D(f). From our definitions, the domain of f /g is { x ∈ D(f) ∩ D(g) | g(x) ≠ 0 }, and the domain of f ◦ g is { x | x ∈ D(g) and g(x) ∈ D(f) }.

Proof. The proofs in terms of the sequence definition of continuity are very easy, since the hard work has already been done in proving the corresponding properties for limits of sequences. See Reed p82.

See Examples 1–4 in Section 3.1 of Reed.

7.4. Deeper properties of continuous functions. These require the completeness property of the real numbers for their proofs.

Theorem 7.4.1. Every continuous function defined on a closed bounded interval is bounded above and below, and moreover has a maximum and a minimum value.

Theorem 7.4.2 (Intermediate Value Theorem). Every continuous function defined on a closed bounded interval takes all values between the values taken at the endpoints.

For the proofs, see Reed Theorems 3.2.1 and 3.2.2 on pp 80, 81, or the MATH1115 notes Section 4.2. A proof using the Bolzano–Weierstraß theorem is in Reed (see p83). The proofs are fairly easy, since the hard work has already been done in proving the Bolzano–Weierstraß theorem.

Corollary 7.4.3. Let f be a continuous function defined on a closed bounded interval, with minimum value m and maximum value M. Then the range of f is the interval [m, M].

Proof. The function f must take the values m and M at points c and d (say), by Theorem 7.4.1. By the Intermediate Value Theorem applied with the endpoints c and d, f must take all values between m and M. This proves the corollary.

Note that a similar result is not true on other intervals. For example, consider f(x) = 1/x on (0, 1].

7.5. Uniform continuity. If we consider the ε–δ definition of continuity of f at a point a, as given in Theorem 7.1.2, we see that for any given ε the required δ will normally depend on a. The steeper the graph of f near a, the smaller the δ that is required. (Draw a diagram.) If for each ε > 0 there is some δ > 0 which will work for every a ∈ D(f), then we say f is uniformly continuous. More precisely:

Definition 7.5.1. A function f is uniformly continuous if for every ε > 0 there is a δ > 0 such that for every x1, x2 ∈ D(f):

  |x1 − x2| ≤ δ implies |f(x1) − f(x2)| ≤ ε.

Any uniformly continuous function is certainly continuous. On the other hand, the function 1/x with domain (0, 1], and the function x^2 with domain R, are continuous but not uniformly continuous. (See Reed Example 2 page 84.) However, we have the following important result for closed bounded intervals.

Theorem 7.5.2. Any continuous function defined on a closed bounded interval is uniformly continuous.

Proof. See Reed p 85. I will discuss this in class.

There is an important case in which uniform continuity is easy to prove.

Definition 7.5.3. A function f is Lipschitz if there exists an M such that

  |f(x1) − f(x2)| ≤ M |x1 − x2| for all x1, x2 in the domain of f.

We say M is a Lipschitz constant for f. This is equivalent to claiming that

  |f(x1) − f(x2)| / |x1 − x2| ≤ M for all x1 ≠ x2 in the domain of f,

i.e. that the slope of the line joining any two points on the graph of f is at most M in absolute value.

Proposition 7.5.4. If f is Lipschitz, then it is uniformly continuous.

Proof. Let ε > 0 be any positive number. Since |f(x) − f(a)| ≤ M |x − a| for any x, a ∈ D(f), it follows that

  |x − a| ≤ ε/M  ⇒  |f(x) − f(a)| ≤ ε.

This proves uniform continuity.

It follows from the Mean Value Theorem that if the domain of f is an interval I (not necessarily closed or bounded), f is differentiable and |f′(x)| ≤ M on I, then f is Lipschitz with Lipschitz constant M.

Proof. Suppose x1, x2 ∈ I and x1 < x2. Then f(x1) − f(x2) = f′(c)(x1 − x2) for some x1 < c < x2, by the Mean Value Theorem. Since |f′(c)| ≤ M, we are done. Similarly if x2 < x1.

♣ Where did we use the fact that the domain of f is an interval? Give a counterexample in case the domain is not an interval.

There are examples of non-differentiable Lipschitz functions, such as the "sawtooth" function. [Diagram omitted: graph of a sawtooth function.]
8. Riemann Integration

Study Reed Chapter 3.3, and the MATH1116 notes.

8.1. Definition. The idea is that ∫_a^b f (which we also write as ∫_a^b f(x) dx) should be a limit of lower and upper sums corresponding to rectangles, respectively below and above the graph of f. See Reed page 87 for a diagram.

Suppose f is defined on the interval [a, b] and f is bounded. (At first you may think of f as being continuous, but unless stated otherwise it is only necessary that f be bounded.)

A partition P of [a, b] is any finite sequence of points (x0, x1, ..., xN) such that a = x0 < x1 < · · · < xN = b. Let

(6)  mi = inf { f(x) | x ∈ [xi−1, xi] },  Mi = sup { f(x) | x ∈ [xi−1, xi] },

for i = 1, ..., N.25 Then the lower and upper sums for the partition P are defined by

(7)  LP(f) = Σ_{i=1}^{N} mi (xi − xi−1),  UP(f) = Σ_{i=1}^{N} Mi (xi − xi−1).

They correspond to the sum of the areas of the lower rectangles and the upper rectangles respectively.

Adding points to a partition increases lower sums and decreases upper sums. More precisely, if Q is a refinement of P, i.e. a partition which includes all the points in P, then

(8)  LP(f) ≤ LQ(f),  UQ(f) ≤ UP(f).

(See Reed Lemma 1 page 88 for the proof. Or see the corresponding Lemma on page 72 of the 1998 AA1H notes, although the proof there is for continuous functions and so uses max and min in the definition of upper and lower sums; the proof is essentially the same for general bounded functions provided one uses instead sup and inf.)

Any lower sum is ≤ any upper sum: if P and Q are arbitrary partitions, then

(9)  LP(f) ≤ UQ(f).

The proof is by taking a common refinement of P and Q and using (8). (See Reed Lemma 2 page 89.)

For each partition P we have a number LP(f), and the set of all such numbers is bounded above (by any upper sum). The supremum of this set of numbers is denoted by

  sup_P {LP(f)} = sup{ LP(f) | P is a partition of [a, b] }.

Similarly, the infimum of the set of all upper sums

  inf_P {UP(f)} = inf{ UP(f) | P is a partition of [a, b] }

exists. If these two numbers are equal, then we say f is (Riemann)26 integrable on [a, b], and define

(10)  ∫_a^b f := inf_P {UP(f)} = sup_P {LP(f)}.

25 If f is continuous, we can replace sup and inf by max and min respectively.
26 There is a much more powerful notion called Lebesgue integrability. This is more difficult to develop, and will be treated in the first Analysis course in third year.
Theorem 8.1.1. If f is a continuous function on a closed bounded interval [a, b], then f is integrable on [a, b].

Proof. See Reed Theorem 3.1 on pages 89, 90, or the MATH1116 notes. The idea is first to show, essentially from the definitions of sup and inf, that it is sufficient to prove for each ε > 0 that there exists a partition P for which

(11)  UP(f) − LP(f) < ε.

The existence of such a P follows from the uniform continuity of f.

(For an example of a function which is bounded but not Riemann integrable, see Reed Problem 1 page 93.)

8.2. Riemann sums. In order to numerically estimate ∫_a^b f we can choose a point xi* in each interval [xi−1, xi] and compute the corresponding Riemann sum

RP(f) := Σ_{i=1}^N f(xi*) (xi − xi−1).

It is clear that LP(f) ≤ RP(f) ≤ UP(f). This is not a precise notation, since RP(f) depends on the points xi* as well as on P.

Let ‖P‖ denote the maximum length of the intervals in the partition P. The following theorem justifies using Riemann sums.

Theorem 8.2.1. Suppose f is continuous on [a, b]. Let Pk be a sequence of partitions of [a, b] such that lim_{k→∞} ‖Pk‖ = 0, and suppose RPk(f) is a corresponding sequence of Riemann sums. Then

RPk(f) → ∫_a^b f  as  k → ∞.

(See Reed Theorem 3.3 page 92 for the proof.)

8.3. Properties of the Riemann integral. For proofs of the following theorems see Reed pages 91–93, or the MATH1116 notes.

Theorem 8.3.1. Suppose f and g are continuous on [a, b] and c and d are constants. Then

∫_a^b (cf + dg) = c ∫_a^b f + d ∫_a^b g.

Theorem 8.3.2. Suppose f and g are continuous on [a, b] and f ≤ g. Then

∫_a^b f ≤ ∫_a^b g.

Theorem 8.3.3. Suppose f is continuous on [a, b]. Then

| ∫_a^b f | ≤ ∫_a^b |f|.
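The sums defined above, and the convergence asserted in Theorem 8.2.1, can be illustrated numerically. The following is a small Python sketch (my illustration, not part of Reed or these notes); it uses f(x) = x² on [0, 1], for which f is increasing, so on each subinterval the infimum mi is f(xi−1) and the supremum Mi is f(xi), and the integral is 1/3.

```python
# Illustration (not from Reed): lower, upper and midpoint Riemann sums for
# f(x) = x^2 on [0, 1].  Since f is increasing there, on [x_{i-1}, x_i]
# the infimum m_i is f(x_{i-1}) and the supremum M_i is f(x_i).

def sums(f, a, b, N):
    xs = [a + (b - a) * i / N for i in range(N + 1)]
    lower = upper = riemann = 0.0
    for i in range(1, N + 1):
        w = xs[i] - xs[i - 1]
        lower += f(xs[i - 1]) * w                    # m_i (f increasing)
        upper += f(xs[i]) * w                        # M_i (f increasing)
        riemann += f((xs[i - 1] + xs[i]) / 2) * w    # x_i* = midpoint
    return lower, riemann, upper

L, R, U = sums(lambda x: x * x, 0.0, 1.0, 1000)
# L_P(f) <= R_P(f) <= U_P(f), and as ||P|| -> 0 all three approach 1/3.
```

As the partition is refined, U − L shrinks like 1/N here, which is exactly the mechanism behind the integrability of continuous functions.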
Corollary 8.3.4. If f is continuous on [a, b] and |f| ≤ M, then

| ∫_a^b f | ≤ M (b − a).

Theorem 8.3.5. Suppose f is continuous on [a, b] and a ≤ c ≤ b. Then

∫_a^c f + ∫_c^b f = ∫_a^b f.

Results analogous to those above are still true if we assume the relevant functions are integrable, but not necessarily continuous. The proofs are then a little different, but not that much more difficult. In particular, the sum of two Riemann integrable functions is Riemann integrable (Reed Q13 p112).

8.4. Riemann integration for discontinuous functions. See Reed Section 3.5. This material will be done "lightly". It shows that many quite "bad" (but still bounded) functions can be Riemann integrable.

• Example 1 of Reed: If f is the function defined on [0, 1] by

f(x) = 1 if 0 ≤ x ≤ 1 and x is irrational,
f(x) = 0 if 0 ≤ x ≤ 1 and x is rational,

then f is not Riemann integrable, since every upper sum is 1 and every lower sum is 0.

• The next result (Theorems 3.2, 3.3 of Reed) is that if a bounded function is continuous except at a finite number of points on [a, b], then it is Riemann integrable on [a, b]. (Interesting examples are f(x) = sin 1/x if −1 ≤ x ≤ 1 and x ≠ 0, f(0) = 0; and f(x) = sin 1/x if 0 < x ≤ 1, f(0) = 0; c.f. Reed Example 2.)

• Moreover, any bounded monotone increasing27 (or decreasing) function on an interval [a, b] is Riemann integrable. A monotone function can have an infinite number of points where it is discontinuous (Reed Q8 p112). See "Calculus" by M. Spivak, Chapter 13.

• A particular case of importance is when the left and right limits of f exist at the discontinuity points aj, but f is not necessarily continuous there. The function f is then said to be piecewise continuous. See Reed p 108 for the definition of a one-sided limit in terms of sequences. By the one-sided limit lim_{δ→0+} we mean that δ is restricted to be positive; Reed writes lim_{δ↓0} to mean the same thing.

In this case, if the points of discontinuity are a1 < a2 < · · · < ak, and we set a0 = a, ak+1 = b, then

∫_a^b f = Σ_{j=1}^{k+1} ∫_{a_{j−1}}^{a_j} f,

and

∫_{a_{j−1}}^{a_j} f = lim_{δ→0+} ∫_{a_{j−1}+δ}^{a_j−δ} f.

Since f is continuous on each interval [a_{j−1} + δ, a_j − δ], we can often find ∫_{a_{j−1}+δ}^{a_j−δ} f by standard methods for computing integrals (i.e. find a function whose derivative is f) or by numerical methods, and then take the limit. This often gives a convenient way of finding integrals; e.g. if F′(x) = f(x) for all x ∈ [a, b], then

∫_a^b f = F(b) − F(a).

The fundamental connection between integration and differentiation is made in Reed in Section 4.2; I return to this after we discuss differentiation.

27 f is monotone increasing if f(x1) ≤ f(x2) whenever x1 < x2.
There is also a natural reformulation in terms of continuous functions. If the function fj is defined on [a_{j−1}, a_j] by

fj(x) = f(x) if a_{j−1} < x < a_j,
fj(a_{j−1}) = lim_{x→a_{j−1}+} f(x),
fj(a_j) = lim_{x→a_j−} f(x),

then fj is continuous on [a_{j−1}, a_j], and

∫_a^b f = Σ_{j=1}^{k+1} ∫_{a_{j−1}}^{a_j} fj,

and each ∫_{a_{j−1}}^{a_j} fj can often be computed by standard methods. See Reed Example 3 p 108 and Corollary 3.4 p 109.

8.5. Improper Riemann integrals. This material will be done "lightly". For details and examples of the following, see Reed.

In previous situations we had a bounded function with a finite number of discontinuities. The Riemann integral existed according to our definition, and we saw that it could be obtained by computing the Riemann integrals of certain continuous functions and by perhaps taking limits. If the function is not bounded, or the domain of integration is an infinite interval, the Riemann integral can often still be defined by taking limits. The Riemann integral is then called an improper integral.

If f is continuous on [a, ∞) then we define

∫_a^∞ f = lim_{b→∞} ∫_a^b f,

provided the limit exists. I do not think that Reed actually defines what is meant by lim_{x→∞} g(x), where here g(x) = ∫_a^x f. But it is clear what the definition in terms of sequences should be. Namely, lim_{x→∞} g(x) = L if whenever a sequence xn → ∞ then g(xn) → L. The definition of xn → ∞ (i.e. (xn) "diverges" to infinity) is the natural one and is given on p 32 of Reed. There is also a natural, and equivalent, definition which does not involve sequences. Namely, lim_{x→∞} g(x) = L if for every ε > 0 there is a real number K such that x ≥ K implies |g(x) − L| ≤ ε. The two definitions are equivalent by the same argument as used in the proof of similar equivalences for continuity and also for ordinary limits. The "ε–δ" definition is in Definition 3.13 p 29 of the AA1H 1998 Notes.

If f is continuous on [a, b) then we define

∫_a^b f = lim_{δ→0+} ∫_a^{b−δ} f,

provided the limit exists. Similar definitions apply for integrals over (a, b], (a, b), etc.

Note the subtlety in Reed Example 5 p 116. The integral ∫_1^∞ (sin x)/x dx exists in the limit sense defined above, because of "cancellation" of successive positive and negative bumps, but the "area" above the axis and the "area" below the axis are each infinite. This is analogous to the fact that the series 1 − 1/2 + 1/3 − 1/4 + · · · converges, although the series of positive terms and the series of negative terms each diverge.
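The limit definition of an improper integral is easy to probe numerically. In the following sketch (my example, not Reed's) the integrand is 1/x², for which ∫_1^b dx/x² = 1 − 1/b, so the improper integral over [1, ∞) equals 1; the finite integrals are estimated by midpoint Riemann sums.

```python
# Sketch: the improper integral of 1/x^2 over [1, oo) as a limit of proper
# integrals over [1, b].  Exact values are 1 - 1/b, tending to 1 as b -> oo.

def midpoint_integral(f, a, b, N=20000):
    h = (b - a) / N
    return h * sum(f(a + (i + 0.5) * h) for i in range(N))

values = [midpoint_integral(lambda x: 1.0 / (x * x), 1.0, b)
          for b in (10.0, 100.0, 1000.0)]
# values increases towards the improper integral, which equals 1.
```

For a convex integrand like 1/x² the midpoint sums slightly underestimate each finite integral, so the computed values increase towards 1 from below.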
9. Differentiation

9.1. Material from last year. Suppose f : S → R, x ∈ S, and there is some open interval containing x which is a subset of S. (Usually S will itself be an interval, but not necessarily open or bounded.) Then f′(x), the derivative of f at x, is defined in the usual way as

f′(x) = lim_{h→0} ( f(x + h) − f(x) ) / h.

If S = [a, b], it is also convenient to define the derivatives at the endpoints a and b of S in the natural way, by restricting h to be > 0 or < 0 respectively.

Read Sections 4.1 and 4.2 of Reed and the MATH1116 Notes. In Reed Section 4.1 note in particular Theorem 4.1 (differentiability at any point implies continuity there), and see Reed Theorems 4.2 and 4.3 (the usual rules for differentiating sums, products and quotients, and the chain rule).

9.2. Fundamental theorem of calculus. See Reed Section 4.2, in particular Theorem 4.2 page 138. This connects differentiation and integration.

9.3. Mean value theorem and Taylor's theorem. See Reed Section 4.3 and Adams pages 285–290. The next main result is the justification for using derivatives to find maximum (and similarly minimum) points: if a continuous function takes a maximum at c, and f′(c) exists, then f′(c) = 0. See Reed p 126. The Mean Value Theorem (Reed Theorem 4.3) says geometrically that "the slope of the chord joining two points on the graph of a differentiable function equals the derivative at some point between them". Taylor's Theorem, Theorem 4.4, is a generalisation, particularly with the Lagrange form of the remainder as in Reed page 136. (The proof in Reed of Theorem 4.4 is different from that in the MATH1116 notes; you may prefer the proof given in Adams, pages 584, 585.) Note also how L'Hôpital's Rule follows from Taylor's theorem.

9.4. Newton's method. See Reed Section 4.4 and Adams pages 192–195. Newton's method, and its generalisation to finding zeros of functions f : Rn → Rn for n > 1 (see Adams pages 745–748), is very important for both computational and theoretical reasons. As we see in Reed page 141, if xn is the nth approximation to the solution x̄ of f(x) = 0, then

xn+1 = xn − f(xn) / f′(xn).

Remarks on the proof of convergence (Reed pages 142, 143). We want to estimate |xn+1 − x̄| in terms of |xn − x̄|. Applying the above formula, together with Taylor's theorem (see Reed page 143, formula (14)), gives

|xn+1 − x̄| ≤ ( |f″(τn)| / (2! |f′(xn)|) ) |xn − x̄|²

for some τn between xn and x̄.
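The iteration can be seen in action with a short sketch (the choice f(x) = x² − 2, whose positive zero is √2, is my illustration, not Reed's); the error |xn − x̄| roughly squares at each step.

```python
# Newton's method x_{n+1} = x_n - f(x_n)/f'(x_n) applied to f(x) = x^2 - 2.
# The zero is sqrt(2); the error is roughly squared at each iteration.
import math

def newton_errors(f, fprime, x0, true_root, steps):
    x, errors = x0, []
    for _ in range(steps):
        x = x - f(x) / fprime(x)
        errors.append(abs(x - true_root))
    return x, errors

x, errs = newton_errors(lambda t: t * t - 2.0, lambda t: 2.0 * t,
                        1.5, math.sqrt(2.0), 5)
# errs shrinks extremely fast: each error is about C times the square
# of the previous one, as in estimate (13) below.
```

Starting from x0 = 1.5, the error drops to machine precision within a handful of steps, which is the quadratic convergence analysed next.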
Assume f(x̄) = 0, f′(x̄) ≠ 0, and f is C2 in some interval containing x̄.28 Then, provided |xn − x̄| < ε (say), it follows that

(12)  |f″(τn)| / (2! |f′(xn)|) ≤ C

for some constant C (depending on f). Hence

(13)  |xn+1 − x̄| ≤ C |xn − x̄|².

The only remaining point in the proof is to show that if |x1 − x̄| < ε (and so (13) is true for n = 1), then |xn − x̄| < ε for all n (and so (13) is true for all n). But from (13) with n = 1 we have

|x2 − x̄| ≤ C |x1 − x̄|² ≤ (1/2) |x1 − x̄|

provided |x1 − x̄| ≤ 1/(2C). Thus x2 is even closer to x̄ than x1, and so (13) is true with n = 2, etc. See the discussion at the end of the first paragraph on page 144 of Reed.

Note that even though the proof only explicitly states linear convergence in (16), it in fact also gives the much better quadratic convergence, as noted on page 144 line 4. We say that (xn) converges to x̄ quadratically fast.

28 That is, f′ and f″ exist and are continuous.

9.5. Monotonic functions. See Reed Section 4.5 and page 147 Theorem 4.5. Unless stated otherwise, we always take the domain I to be an interval.

A function f : I → R is strictly increasing if x < y implies f(x) < f(y). Similarly one defines strictly decreasing. A function is strictly monotonic if it is either strictly increasing or strictly decreasing.

If f is differentiable on I and f′ > 0, then f is strictly increasing (Reed page 150, problem 1; HINT: use the Mean Value Theorem). But the derivative of a strictly increasing function may somewhere be zero, e.g. f(x) = x³.

[Diagram: the graph of f(x) = x³.]

A strictly monotonic function f is one-one and so has an inverse f⁻¹, defined by f⁻¹(y) = the unique x such that f(x) = y. The domain of f⁻¹ is the range of f, and the range of f⁻¹ is the domain of f.

If f : I → R is strictly monotonic and continuous, then its range is an interval, and its inverse exists and is strictly monotonic and continuous; see Reed page 151.

If f : I → R is strictly monotonic and continuous, and is differentiable at a with f′(a) ≠ 0, then f⁻¹ is differentiable at b = f(a) and29

(f⁻¹)′(b) = 1 / f′(a).

(This is Reed Corollary 4.5.) If f : I → R is strictly monotonic and continuous, and everywhere differentiable with f′(x) ≠ 0, then by the above result f⁻¹ is everywhere differentiable. Moreover, the derivative of f⁻¹ is continuous, since

(f⁻¹)′(y) = 1 / f′(f⁻¹(y))

by the previous formula, and both f⁻¹ and f′ are continuous.

The result also follows from the Fundamental Theorem of Calculus; see Reed page 151, problem 7, for the simpler proof of this more general fact, using the Fundamental Theorem of Calculus. The proof in Reed does have the advantage that it generalises to situations where the Fundamental Theorem cannot be applied. The change of variables formula (Reed page 151, problem 6; see Adams page 326) does not really require φ to be strictly increasing.

29 Loosely speaking, dy/dx = 1/(dx/dy). But this notation can cause problems and be ambiguous.
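The formula (f⁻¹)′(b) = 1/f′(a) is easy to sanity-check numerically. In this sketch (my example, not from Reed) f(x) = x³ + x is strictly increasing with f′(x) = 3x² + 1 > 0, the inverse is computed by bisection, and (f⁻¹)′(b) is approximated by a central difference quotient.

```python
# Check (f^{-1})'(b) = 1/f'(a) at a = 1.5, b = f(a), for f(x) = x^3 + x.

def f(x):
    return x ** 3 + x

def f_inverse(y, lo=-10.0, hi=10.0):
    # Bisection is valid because f is strictly increasing and continuous.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

a = 1.5
b = f(a)
h = 1e-5
numeric = (f_inverse(b + h) - f_inverse(b - h)) / (2.0 * h)  # difference quotient
exact = 1.0 / (3.0 * a * a + 1.0)                            # 1 / f'(a)
```

The two values agree to many decimal places, as the formula predicts.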
10. Basic metric and topological notions in Rn

See Adams page 643 (third edition), or (better) fourth edition pages 601, 602. See also Reed problem 10 page 40 and the first two paragraphs in Section 4.6. But we do considerably more.

10.1. Euclidean n-space. We define

Rn = { (x1, . . . , xn) | x1, . . . , xn ∈ R }.

Thus Rn denotes the set of all (ordered) n-tuples (x1, . . . , xn) of real numbers. We think of R1, R2 and R3 as the real line, the plane and 3-space respectively. But for n ≥ 4 we cannot think of Rn as consisting of points in physical space. Instead, we usually think of points in Rn as just being given algebraically by n-tuples (x1, . . . , xn) of real numbers. (We sometimes say (x1, . . . , xn) is a vector, but then you should think of a vector as a point, rather than as an "arrow".) In R2 and R3 we usually denote the coordinates of a point by (x, y) and (x, y, z), but in higher dimensions we usually use the notation (x1, . . . , xn), etc.

Although we usually think of points in Rn as n-tuples, rather than physical points, we are still motivated and guided by the geometry in the two and three dimensional cases. So we still speak of "points" in Rn.

We define the (Euclidean) distance30 between x = (x1, . . . , xn) and y = (y1, . . . , yn) by

d(x, y) = √( (x1 − y1)² + · · · + (xn − yn)² ).

This agrees with the usual distance in case n = 1, 2, 3.

We also call the set of points (x1, . . . , xn) ∈ Rn satisfying any equation of the form

a1 x1 + · · · + an xn = b

a hyperplane. Sometimes we restrict this definition to the case b = 0 (in which case the hyperplane "passes through" the origin).

10.2. The Triangle and Cauchy-Schwarz inequalities. The Euclidean distance function satisfies three important properties. Any function d on a set S (not necessarily Rn) which satisfies these properties is called a metric, and S is called a metric space with metric d. We will discuss general metrics later.
But you should observe that, unless noted or otherwise clear from the context, the proofs and definitions in Sections 10–14 carry over directly to general metric spaces. We discuss this in more detail in Section 15.

Theorem 10.2.1. Let d denote the Euclidean distance function in Rn. Then
1. d(x, y) ≥ 0; d(x, y) = 0 iff x = y. (positivity)
2. d(x, y) = d(y, x). (symmetry)
3. d(x, y) ≤ d(x, z) + d(z, y). (triangle inequality)

Proof. The first two properties are immediate. The third follows from the Cauchy-Schwarz inequality, which we prove in the next theorem. To see this let

ui = xi − zi,  vi = zi − yi.

Then

d(x, y)² = Σi (xi − yi)² = Σi (xi − zi + zi − yi)²
         = Σi (ui + vi)²
         = Σi ui² + 2 Σi ui vi + Σi vi²
         ≤ Σi ui² + 2 (Σi ui²)^{1/2} (Σi vi²)^{1/2} + Σi vi²   (by the Cauchy-Schwarz inequality)
         = ( (Σi ui²)^{1/2} + (Σi vi²)^{1/2} )²
         = ( d(x, z) + d(z, y) )².

30 Later we will define other distance functions (i.e. metrics, see the next section) on Rn, but the Euclidean distance function is the most important.
The triangle inequality now follows. The following is a fundamental inequality.31 Here is one of a number of possible proofs.

Theorem 10.2.2 (Cauchy-Schwarz inequality). Suppose (u1, . . . , un) ∈ Rn and (v1, . . . , vn) ∈ Rn. Then
(14)  | Σi ui vi | ≤ (Σi ui²)^{1/2} (Σi vi²)^{1/2}.
Moreover, equality holds iﬀ at least one of (u1 , . . . , un ), (v1 , . . . , vn ) is a multiple of the other (in particular if one is (0, . . . , 0)). Proof. Let f (t) =
Σi (ui + t vi)² = Σi ui² + 2t Σi ui vi + t² Σi vi².
Then f is a quadratic in t (provided (v1, . . . , vn) ≠ (0, . . . , 0)), and from the first expression for f(t), f(t) ≥ 0 for all t. It follows that f(t) = 0 either has no real roots (in case f(t) > 0 for all t) or two equal real roots (in case f(t) = 0 for some t). Hence
(15)  4 (Σi ui vi)² − 4 (Σi ui²)(Σi vi²) ≤ 0.
This immediately gives the Cauchy-Schwarz inequality. If one of (u1, . . . , un), (v1, . . . , vn) is (0, . . . , 0) or is a multiple of the other, then equality holds in (14). (Why?) Conversely, if equality holds in (14), i.e. in (15), and (v1, . . . , vn) ≠ (0, . . . , 0), then f(t) is a quadratic (since the coefficient of t² is nonzero) with equal roots, and in particular f(t) = 0 for some t. This t gives u = −tv, from the first expression for f(t). This completes the proof of the theorem.

Remark 10.2.1. In the preceding we discussed the (standard) metric on Rn. It can be defined from the (standard) norm, and conversely. That is,

d(x, y) = ‖x − y‖,  and  ‖x‖ = d(0, x).

The triangle inequality for the metric follows easily from the triangle inequality for the norm (i.e. ‖x + y‖ ≤ ‖x‖ + ‖y‖), and conversely (Exercise).
31 The Cauchy-Schwarz inequality can be interpreted in terms of inner products as saying that |u · v| ≤ ‖u‖ ‖v‖. But we will not need to refer to inner products here.
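Both the Cauchy-Schwarz inequality and the triangle inequality are easy to spot-check numerically. The vectors in this sketch are arbitrary choices of mine, used for illustration only:

```python
# Numerical spot-check of the Cauchy-Schwarz inequality (14), its equality
# case, and the triangle inequality for the Euclidean distance.
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def d(x, y):
    return norm([xi - yi for xi, yi in zip(x, y)])

u = [1.0, -2.0, 3.0]
v = [4.0, 0.5, -1.0]
w = [-2.5 * ui for ui in u]          # a multiple of u: the equality case

cs_holds = abs(dot(u, v)) <= norm(u) * norm(v)
cs_equality = math.isclose(abs(dot(u, w)), norm(u) * norm(w))

x, y, z = [0.0, 0.0, 1.0], [2.0, -1.0, 4.0], [1.0, 1.0, 1.0]
triangle_holds = d(x, y) <= d(x, z) + d(z, y)
```

Note that equality in (14) is attained exactly when one vector is a multiple of the other, as the theorem asserts.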
10.3. Preliminary Remarks on Metric Spaces. Any function d on a set M (not necessarily Rn) which satisfies the three properties in Theorem 10.2.1 is called a metric, and M together with d is called a metric space. You should now study Section 15.1, except possibly for Example 15.1.3. We will discuss general metrics later. But you should observe that, unless noted or otherwise clear from context, the proofs and definitions in Sections 10–14 carry over directly to general metric spaces. We discuss this in more detail in Section 15.

10.4. More on intersections and unions. The intersection and union of two sets was defined in Reed page 7. The intersection and union of more than two sets is defined similarly. The intersection and union of a (possibly infinite) collection of sets indexed by the members of some set J are defined by

⋂_{λ∈J} Aλ = { x | x ∈ Aλ for all λ ∈ J },
⋃_{λ∈J} Aλ = { x | x ∈ Aλ for some λ ∈ J }.

For example:

⋂_{n∈N} [0, 1/n] = {0},  ⋂_{n∈N} (0, 1/n) = ∅,  ⋂_{n∈N} (−1/n, 1/n) = {0},  ⋂_{n∈N} [n, ∞) = ∅,

⋃_{r∈Q} (r − ε, r + ε) = R (for any fixed ε > 0),  ⋃_{r∈R} {r} = R.
It is an easy generalisation of De Morgan's law that

(A1 ∪ · · · ∪ Ak)^c = A1^c ∩ · · · ∩ Ak^c.

More generally,

( ⋃_{λ∈J} Aλ )^c = ⋂_{λ∈J} Aλ^c.

Also,

(A1 ∩ · · · ∩ Ak)^c = A1^c ∪ · · · ∪ Ak^c

and

( ⋂_{λ∈J} Aλ )^c = ⋃_{λ∈J} Aλ^c.
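For a finite family, these identities can be checked directly with Python's set type. In this sketch (mine, not part of the notes) a small finite universe U plays the role of Rn, and complements are taken within U:

```python
# De Morgan's laws for a finite family of sets, with complements taken in a
# finite "universe" U (standing in for R^n).
U = set(range(30))
family = [set(range(0, 18)), set(range(6, 24)), set(range(12, 30))]

def comp(A):
    return U - A

union = set().union(*family)
intersection = set.intersection(*family)

demorgan_union = comp(union) == set.intersection(*[comp(A) for A in family])
demorgan_intersection = comp(intersection) == set().union(*[comp(A) for A in family])
```

Of course this only illustrates the finite case; the infinite-index versions above are proved directly from the definitions of ⋃ and ⋂.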
10.5. Open sets. The deﬁnitions in this and the following sections generalise the notions of open and closed intervals and endpoints on the real line to Rn . You should think of the cases n = 2, 3. A neighbourhood of a point a ∈ Rn is a set of the form Br (a) = { x ∈ Rn  d(x, a) < r }
for some r > 0. We also say that Br(a) is the open ball of radius r centred at a, and in case n = 2 we also call it the open disc of radius r centred at a. In case n = 1, Br(a) is an open interval centred at a.

Definition 10.5.1. A set S ⊆ Rn is open (in Rn)32 if for every a ∈ S there is a neighbourhood of a which is a subset of S.

[Diagram: a set S and its complement S^c, showing an interior point, a boundary point on ∂S, points in S and points not in S.]

Every neighbourhood is itself open (why?). Other examples in R2 are

1. { (x, y) | y > x² },
2. { (x, y) | x > 0 },
3. { x ∈ R2 | d(x, a) > r },
4. { (x, y) | x > 3 or y < x² }.

An example in R3 is { (x, y, z) | x² + y² < z² and sin x < cos y }.

Typically, unless otherwise stated, sets described by strict inequalities "<", ">" or "≠" are open.33 If a set is described by a finite number of strict inequalities in terms of "and" and "or", then it will be obtained by taking finite unions and intersections of open sets as above, and hence will be open by the following theorem.

Theorem 10.5.1.
1. The sets ∅ and Rn are both open.
2. If A1, . . . , Ak is a finite collection of open subsets of Rn, then their intersection A1 ∩ · · · ∩ Ak is open.
3. If Aλ is an open subset of Rn for each λ ∈ J, then the union ⋃_λ Aλ of all the sets Aλ is also open.

Thus the intersection of a finite number of open sets is open, but the union of any number (possibly infinite) of open sets is open. The third example in the previous section shows that the intersection of an infinite collection of open sets need not be open.

Proof.
1. The first part is trivial (the empty set is open because every point in it certainly has the required property!).
2. Consider any point a ∈ A1 ∩ · · · ∩ Ak. Then a ∈ Aj for every j, and since Aj is open there is a real number rj > 0 such that B_{rj}(a) ⊆ Aj. Choosing r = min{r1, . . . , rk} we see that r > 0, and Br(a) ⊆ Aj for each j. It follows34 that Br(a) ⊆ A1 ∩ · · · ∩ Ak. Hence A1 ∩ · · · ∩ Ak is open.
3. Consider any point a ∈ ⋃_{λ∈J} Aλ. Then a ∈ Aλ for some λ ∈ J. For this λ there is an r > 0 such that Br(a) ⊆ Aλ. It follows (from the definition of ⋃) that Br(a) ⊆ ⋃_λ Aλ. Hence ⋃_{λ∈J} Aλ is open.

The reason the proof of 2 does not show that the intersection of an infinite number of open sets is always open is that we would need to take r to be the infimum of an infinite set of positive numbers, and such an infimum may be 0. Consider, for example, ⋂_{n∈N} (−1/n, 1/n) = {0} from the previous section: if a = 0 then rn = 1/n, and the infimum of all the rn is 0.

10.6. Closed sets.

Definition 10.6.1. A set C is closed (in Rn)35 if its complement (in Rn) is open.

Generally, sets defined in terms of non-strict inequalities and "=" are closed.36 Closed intervals in R are closed sets. If, in the examples after Definition 10.5.1, "<", ">" and "≠" are replaced by "≥", "≤" and "=", then we have examples of closed sets.

Theorem 10.6.1.
1. The sets ∅ and Rn are both closed.
2. If A1, . . . , Ak is a finite collection of closed subsets of Rn, then their union A1 ∪ · · · ∪ Ak is closed.
3. If Aλ is a closed subset of Rn for each λ ∈ J, then the intersection ⋂_{λ∈J} Aλ of all the sets Aλ is also closed.

Proof.
1. Since ∅ and Rn are complements of each other, the result follows from the corresponding part of the previous theorem.
2. By De Morgan's law, (A1 ∪ · · · ∪ Ak)^c = A1^c ∩ · · · ∩ Ak^c. Since A1^c ∩ · · · ∩ Ak^c is open from part 2 of the previous theorem, it follows that A1 ∪ · · · ∪ Ak is closed.
3. The proof is similar to 2. By De Morgan's law,

( ⋂_{λ∈J} Aλ )^c = ⋃_{λ∈J} Aλ^c,

and so the result follows from 3 in the previous theorem.

32 We sometimes say S is open in Rn because there is a more general notion of S being open in E for any set E such that S ⊆ E. When we say S is open we will always mean open in Rn.
33 Let S = { (x1, . . . , xn) | f(x1, . . . , xn) > g(x1, . . . , xn) }. If f and g are continuous (we rigorously define continuity for functions of more than one variable later) and a = (a1, . . . , an) is a point in S, then all points in a sufficiently small neighbourhood of a will also satisfy the corresponding inequality and so be in S (see later).
34 If B ⊆ A1, . . . , B ⊆ Ak, then it follows from the definition of A1 ∩ · · · ∩ Ak that B ⊆ A1 ∩ · · · ∩ Ak.
35 By closed we will always mean closed in Rn. There is a more general concept of being closed in an arbitrary E such that S ⊆ E.
36 The complement of such a set is a set defined in terms of strict inequalities, and so is open.
10.7. Boundary, Interior and Exterior.

Definition 10.7.1. The point a is a boundary point of S if every neighbourhood of a contains both points in S and points not in S. The point a is an interior point of S if a ∈ S but a is not a boundary point of S. The point a is an exterior point of S if a ∈ S^c but a is not a boundary point of S.

The boundary of S (bdry S or ∂S) is the set of boundary points of S. The interior of S (int S) consists of all interior points of S. The exterior of S (ext S) consists of all exterior points of S. See the diagram in Section 10.5.

It follows that:
• a is an interior point of S iff Br(a) ⊆ S for some r > 0;
• a is an exterior point of S iff Br(a) ⊆ S^c for some r > 0;
• Rn = int S ∪ ∂S ∪ ext S, and every point in Rn is in exactly one of these three sets. (Why?)

Thus ∂Br(a) is the set of points whose distance from a equals r. Do exercises 1–8, page 647 of Adams, to reinforce the basic definitions.

10.8. Topological Spaces*. Essentially the same arguments as used in the case of Rn with the Euclidean metric d allow us to define the notion of a neighbourhood of a point in any metric space (M, d), then to define open sets in any metric space, and then to show as in Theorem 10.5.1 that this gives a topological space in the sense of the following definition. Motivated by the idea of open sets in Rn, we make the following definition.

Definition 10.8.1. A set S, together with a collection of subsets of S (which are called the open sets in S) which satisfy the following three properties, is called a topological space.
1. The sets ∅ and S are both open.
2. If A1, . . . , Ak is a finite collection of open subsets of S, then their intersection A1 ∩ · · · ∩ Ak is open.
3. If Aλ is an open subset of S for each λ ∈ J, then the union ⋃_λ Aλ of all the sets Aλ is also open.

In any topological space (which, as we have just seen, includes all metric spaces) one can define the notion of a closed set in an analogous way to that in Section 10.6, and prove the analogue of Theorem 10.6.1. The notion of boundary, interior and exterior of a set in any topological space is defined as in Section 10.7. The theory of topological spaces is fundamental in contemporary mathematics.

10.9. Cantor Set. We next sketch a sequence of approximations A = A(0), A(1), A(2), . . . to the Cantor Set C, the most basic fractal.

We can think of C as obtained by first removing the open middle third (1/3, 2/3) from [0, 1], then removing the open middle third from each of the two closed intervals which remain, then removing the open middle third from each of the four closed intervals which remain, etc. More precisely, let

A = A(0) = [0, 1],
A(1) = [0, 1/3] ∪ [2/3, 1],
A(2) = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1],
. . .

Note that A(n+1) ⊆ A(n) for all n, and so the A(n) form a decreasing family of sets. Let C = ⋂_{n≥0} A(n). Since C is the intersection of a family of closed sets, C is closed.

Consider the ternary expansion of numbers x ∈ [0, 1], i.e. write each x ∈ [0, 1] in the form

(16)  x = .a1 a2 . . . an . . . = a1/3 + a2/3² + · · · + an/3ⁿ + · · ·

where each an = 0, 1 or 2. Each number has either one or two such representations, and the only way x can have two representations is if

x = .a1 a2 . . . an 222 · · · = .a1 a2 . . . a_{n−1} (an + 1) 000 . . .

for some an = 0 or 1. For example, .210222 · · · = .211000 . . . .

Note the following:
• x ∈ A(n) iff x has an expansion of the form (16) with each of a1, . . . , an taking the values 0 or 2.
• Each endpoint of any of the 2ⁿ intervals associated with A(n) has an expansion of the form (16) with each of a1, . . . , an taking the values 0 or 2 and the remaining ai either all taking the value 0 or all taking the value 2.
• It follows that x ∈ C iff x has an expansion of the form (16) with every an taking the values 0 or 2.

Exercise: Show that C is uncountable.

Next let

S1(x) = (1/3) x,  S2(x) = 1 + (1/3)(x − 1).

Notice that S1 is a dilation with dilation ratio 1/3 and fixed point 0. Similarly, S2 is a dilation with dilation ratio 1/3 and fixed point 1.
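The approximations A(n) can be generated in a few lines. The following Python sketch (my illustration, not part of the notes) builds A(n) as a list of closed intervals with exact rational endpoints, and checks the digit criterion on the example x = 1/4, whose ternary expansion .020202 . . . uses only the digits 0 and 2, so that 1/4 ∈ C:

```python
# The n-th Cantor approximation A(n) as a list of closed intervals with
# exact rational endpoints.  Each step keeps the left and right thirds of
# every interval (the images under the dilations S1 and S2).
from fractions import Fraction

def cantor_intervals(n):
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        new = []
        for a, b in intervals:
            third = (b - a) / 3
            new.append((a, a + third))       # left third  (image under S1)
            new.append((b - third, b))       # right third (image under S2)
        intervals = new
    return intervals

# 1/4 = .020202... in base 3 has only digits 0 and 2, so 1/4 lies in C,
# and hence in every approximation A(n).
in_every_An = all(any(a <= Fraction(1, 4) <= b for a, b in cantor_intervals(n))
                  for n in range(7))
```

The total length of A(n) is (2/3)ⁿ, which tends to 0; this is one way to see that C contains no interval.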
Then A(n+1) = S1[A(n)] ∪ S2[A(n)]. Moreover,

C = S1[C] ∪ S2[C].

The Cantor set is a fractal: it is composed of two parts, each obtained from the original set by contracting by scaling (by 1/3). If you think of a one, two or three dimensional set (e.g. a line, square or cube), then contracting by 1/3 should give a set which is (1/3)^d of the original set in "size", where d is the dimension. Thus, if d is the "dimension" of the Cantor set, then we expect

(1/3)^d = 1/2.

This gives log 2 = d log 3, or d = log 2 / log 3 ≈ .6309.

10.10. Product sets.

Theorem 10.10.1. If A ⊆ Rm and B ⊆ Rn are both open (closed) then so is the product A × B ⊆ Rm+n.

[Diagram: the product A × B ⊆ R², with a ball Bt(a) about a = (x, y) contained in the square Bt(x) × Bt(y) ⊆ A × B.]

Proof. For simplicity of notation, take m = n = 1.

First suppose A and B are open. If a = (x, y) ∈ A × B ⊆ R², then x ∈ A and y ∈ B. Choose r and s so Br(x) ⊆ A and Bs(y) ⊆ B, and let t = min{r, s}. Then

(x0, y0) ∈ Bt(a) ⇒ d( (x0, y0), (x, y) ) < t
             ⇒ d(x0, x) < t and d(y0, y) < t
             ⇒ x0 ∈ Bt(x) and y0 ∈ Bt(y)
             ⇒ (x0, y0) ∈ Bt(x) × Bt(y).

Hence Bt(a) ⊆ Br(x) × Bs(y) ⊆ A × B; the first "⊆" is because, as just shown, (x0, y0) ∈ Bt(a) implies (x0, y0) ∈ Bt(x) × Bt(y) ⊆ Br(x) × Bs(y). Since Bt(a) ⊆ A × B, it follows that A × B is open.

Next suppose A and B are closed. Note that

(A × B)^c = (A^c × R) ∪ (R × B^c),

since (x, y) ∉ A × B iff (either x ∉ A or y ∉ B (or both)). Each set on the right side is the product of two open sets, and hence is open by the above. Hence the right side is the union of two open sets, and hence is open. Hence A × B is closed.
11. Sequences in Rp

See Reed Problem 8 page 51, Problems 10–12 on page 59, and the first Definition on page 152. But we do much more!

11.1. Convergence.

Definition 11.1.1. A sequence of points (xn) from Rp converges to x if d(xn, x) → 0 as n → ∞.

Thus the notion of convergence of a sequence from Rp is reduced to the notion of convergence of a sequence of real numbers to 0. Think of the case p = 2.

Example 11.1.1. Let θ ∈ R and a = (x*, y*) ∈ R2 be fixed, and let

an = ( x* + (1/n) cos nθ,  y* + (1/n) sin nθ ).

Then an → a as n → ∞. The sequence (an) spirals around a, with d(an, a) = 1/n, and with rotation by the angle θ in passing from an to an+1.

[Diagram: a sequence x1, x3, x5, x6, x8, . . . converging to a point x*. The shown sequence converges, as does its projection onto each of the x and y axes.]

The next theorem shows that a sequence from Rp converges iff each sequence of components converges.

Proposition 11.1.1. The sequence xn → x iff xn^(k) → x^(k) for k = 1, . . . , p.

Proof. Assume xn → x. Since

|xn^(k) − x^(k)| ≤ d(xn, x) = √( (xn^(1) − x^(1))² + · · · + (xn^(p) − x^(p))² ),

and d(xn, x) → 0, it follows that |xn^(k) − x^(k)| → 0, and so xn^(k) → x^(k).

Next assume xn^(k) → x^(k) for each k, i.e. |xn^(k) − x^(k)| → 0 for each k. From the same formula it follows that d(xn, x) → 0, and so xn → x.
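The spiral example and the componentwise criterion are both easy to check numerically. In this sketch (the values θ = 0.7 and a = (2, −1) are arbitrary choices of mine) the distance d(an, a) is exactly 1/n, and each coordinate of an is within 1/n of the corresponding coordinate of a:

```python
# The spiral sequence a_n = (x* + cos(n*theta)/n, y* + sin(n*theta)/n)
# satisfies d(a_n, a) = 1/n -> 0, so a_n -> a = (x*, y*); each coordinate
# sequence converges to the corresponding coordinate of a.
import math

theta = 0.7
a = (2.0, -1.0)

def a_n(n):
    return (a[0] + math.cos(n * theta) / n, a[1] + math.sin(n * theta) / n)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])
```

For instance dist(a_n(10), a) is 1/10 and dist(a_n(1000), a) is 1/1000, up to rounding.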
11.2. Closed sets. The reason we call a set "closed" is that it is closed under the operation of taking limits of any sequence of points from the set. This is the content of the next theorem.

Theorem 11.2.1. A set S ⊆ Rp is closed iff every convergent sequence (xn) ⊆ S has its limit in S.

Proof. First suppose S is closed. Let (xn) ⊆ S be a convergent sequence with limit x. Assume x ∉ S, i.e. x ∈ S^c. Since S^c is open, there is an r > 0 such that Br(x) ⊆ S^c. But since xn → x, ultimately37 xn ∈ Br(x) ⊆ S^c.

[Diagram: if xn → x then ultimately xn ∈ Br(x).]

This contradicts the fact xn ∈ S for all n. Hence the assumption is wrong, and so x ∈ S.

Next suppose that every convergent sequence (xn) ⊆ S has its limit in S. Assume S is not closed, i.e. S^c is not open. Then there is a point x ∈ S^c such that for no r > 0 is it true that Br(x) ⊆ S^c. In particular, for each natural number n there is a point in B_{1/n}(x) which belongs to S. Choose such a point and call it xn. Then xn → x, since d(xn, x) ≤ 1/n → 0.

[Diagram: if for each n there is a member of S in B_{1/n}(x), then there is a sequence from S converging to x.]

This means that the sequence (xn) ⊆ S but its limit is not in S. This is a contradiction, and so the assumption is wrong. Hence S is closed.

37 Ultimately means there is an N such that this is true for all n ≥ N.

11.3. Bounded sets.

Definition 11.3.1. A set S ⊆ Rp is bounded if there is a point a ∈ Rp and a real number R such that S ⊆ BR(a). A sequence (xn) is bounded if there is a point a ∈ Rp and a real number R such that xn ∈ BR(a) for every n. Thus a sequence is bounded iff the corresponding set of members is bounded.

Proposition 11.3.1. A set is bounded iff there is a ball (of finite radius) centred at the origin which includes the set.

Proof. This is clear in R2 from a diagram. It is proved from the triangle inequality as follows. Suppose S ⊆ BR(a), and let r = d(a, 0). Then for any x ∈ S,

d(x, 0) ≤ d(x, a) + d(a, 0) < R + r.

Hence S ⊆ B_{R+r}(0).
The projection onto the kth axis is deﬁned by Pk (S) = { a  x ∈ S for some x whose kth component is a }. say for n ≥ N . √ i. if a sequence of points from Rp is bounded. being the union of two bounded sets. A convergent sequence is bounded.3.3. .
. then so are the sequences of real numbers corresponding to the ﬁrst component. In particular. . SEQUENCES IN Rp
44
S
S
R
The following theorem says that a set S is bounded iﬀ its projection on to each of the axes is bounded.11. . A set S ⊆ Rp is bounded iﬀ Pk (S) is bounded for each k = 1. if each Pk (S) is bounded then by choosing the largest interval. Proof.e. the sequence is bounded. It follows that if x ∈ S then every component of x is at most M in absolute value. In other words.3. S ⊆ BR (0) where R = kM . .
P 2(S) S
P1(S)
Theorem 11. xN −1 } is also bounded (being ﬁnite) and so the set of all terms is bounded. . we may assume Pk (S) ⊆ [−M. Then ultimately xn ∈ B1 (x). etc. to the second component.4. Conversely. Proposition 11. The set {x1 . . and so the set of terms { xn  n ≥ N } is bounded. Proof. p. Suppose xn → x. M ] for every k. . . If S ⊆ BR (0) then each Pk (S) ⊆ [−R. R] and so is bounded. It follows that √ x ≤ kM 2 .
11.3. Bolzano–Weierstraß theorem. As for sequences of real numbers, a subsequence of a sequence is obtained by skipping terms.

Proposition 11.3.1. If (xn) ⊆ Rp and xn → x, then any subsequence also converges to x.

Proof. Let (xn(k))k≥1 be a subsequence of (xn). Suppose ε > 0 is given. Since xn → x there is an integer N such that

    n ≥ N ⇒ d(xn, x) ≤ ε.

But if k ≥ N then also n(k) ≥ N, and so

    k ≥ N ⇒ d(xn(k), x) ≤ ε.

That is, xn(k) → x.

The following theorem is analogous to the Bolzano–Weierstraß Theorem 5.2 for real sequences, and in fact follows from it.

Theorem 11.3.2 (Bolzano–Weierstraß Theorem). If a sequence (an) ⊆ Rp is bounded then it has a convergent subsequence. If all members of (an) belong to a closed bounded set S, then so does the limit of any convergent subsequence of (an).

Proof. For notational simplicity we prove the case p = 2, but clearly the proof easily generalises to p > 2. Suppose then that (an) ⊆ R2 is bounded, where an = (xn, yn). Then the sequences (xn) and (yn) are also bounded (see the paragraph after Theorem 11.2.4).

Let (xn(k)) be a convergent subsequence of (xn) (this uses the Bolzano–Weierstraß Theorem 5.2 for real sequences). Now consider the sequence (an(k)) = ((xn(k), yn(k))). Again by the Bolzano–Weierstraß Theorem for real sequences, let (yn′(k)) be a convergent subsequence of (yn(k)). Note that (xn′(k)) is a subsequence of the convergent sequence (xn(k)), and so also converges. Since the subsequences (xn′(k)) and (yn′(k)) both converge, so does (an′(k)) = ((xn′(k), yn′(k))), by Proposition 11.1.1.

Finally, since S is closed, any convergent subsequence from S must have its limit in S, by Theorem 11.1.2.

After a bit of practice, we would probably write the previous argument as follows:

Proof. Write an = (xn, yn). Since (an) ⊆ R2 is bounded, so is each of the component sequences. By the Bolzano–Weierstraß Theorem for real sequences, on passing to a subsequence (but without changing notation) we may assume (xn) converges. Again by the Bolzano–Weierstraß Theorem for real sequences, on passing to a further subsequence (without changing notation) we may assume that (yn) also converges. (This gives a new subsequence (xn), but it converges since any subsequence of a convergent sequence is also convergent.) Since (xn) and (yn) converge, so does (an) = ((xn, yn)).
11.4. Cauchy sequences.

Definition 11.4.1. A sequence (xn) ⊆ Rp is a Cauchy sequence if for any ε > 0 there is a corresponding N such that

    m, n ≥ N ⇒ d(xm, xn) ≤ ε.

That is, ultimately any two members of the sequence (not necessarily consecutive members) are within ε of one another.

Proposition 11.4.2. The sequence (xn)n≥1 in Rp is Cauchy iff the component sequences (xn^(k))n≥1 (of real numbers) are Cauchy for k = 1, ..., p.

Proof. (The proof is similar to that for Proposition 11.1.1.) Assume (xn) is Cauchy. Since |xm^(k) − xn^(k)| ≤ d(xm, xn), it follows that each component sequence (xn^(k))n≥1 is a Cauchy sequence.

Conversely, assume each component sequence is Cauchy. Since

    d(xm, xn) = sqrt( (xm^(1) − xn^(1))^2 + · · · + (xm^(p) − xn^(p))^2 ),

it follows that (xn) is Cauchy.

Theorem 11.4.3. A sequence in Rp is Cauchy iff it converges.

Proof. A sequence is Cauchy iff each of the component sequences is Cauchy (Proposition 11.4.2), iff each of the component sequences converges (by the corresponding result for real sequences), iff the sequence converges (Proposition 11.1.1).

11.5. Contraction Mapping Principle in Rp. The corresponding reference is Reed, Chapter 5. In Section 15 of these notes we will generalise the main theorem below to any complete metric space, with the same proof; but until then, omit Reed Examples 1 and 3 and all except the first four lines on page 208.

Definition 11.5.1. Suppose S ⊆ Rp. We say z is a fixed point of the map F : S → S if F(z) = z. We say F : S → S is a contraction if there exists λ ∈ [0, 1) such that

    d(F(x), F(y)) ≤ λ d(x, y)   for all x, y ∈ S.

* A contraction map is continuous, and in fact uniformly continuous. More generally, any Lipschitz map F is continuous, where Lipschitz means that

    d(F(x), F(y)) ≤ λ d(x, y)   for some λ ∈ [0, ∞) and all x, y ∈ D(F).

The proof that Lipschitz implies uniform continuity is essentially the same as in the one dimensional case. (The definitions of continuity and uniform continuity are essentially the same as in the one dimensional case, but with Rp and d instead of M and ρ; see Section 13.)

[Figure: a contraction applied repeatedly: S ⊇ F[S] ⊇ F[F[S]] ⊇ · · ·.]

A simple example of a contraction map on Rp is the map

(17)    F(x) = a + r(x − b),

for 0 ≤ r < 1. Since

    a + r(x − b) = b + r(x − b) + (a − b),

we see that (17) is dilation about b by the factor r, followed by translation by the vector a − b.
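A small numerical sketch (not part of the notes; the particular values of a, b, r are our own): iterating the affine contraction (17) in R2 converges to its fixed point, which solving F(x) = x gives as (a − rb)/(1 − r), componentwise.

```python
import math

# The affine contraction F(x) = a + r(x - b) of equation (17), in R^2.
r = 0.5                       # contraction ratio, 0 <= r < 1
a = (1.0, 2.0)
b = (3.0, -1.0)

def F(x):
    return (a[0] + r * (x[0] - b[0]), a[1] + r * (x[1] - b[1]))

# Solving F(x) = x componentwise gives the fixed point (a - r b)/(1 - r).
fixed = ((a[0] - r * b[0]) / (1 - r), (a[1] - r * b[1]) / (1 - r))
assert max(abs(F(fixed)[0] - fixed[0]), abs(F(fixed)[1] - fixed[1])) < 1e-12

x = (10.0, -7.0)              # an arbitrary starting point x0
for _ in range(60):           # x_{n+1} = F(x_n)
    x = F(x)

# The distance to the fixed point shrinks by the factor r at each step.
assert math.hypot(x[0] - fixed[0], x[1] - fixed[1]) < 1e-9
```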
The following theorem (and its generalisations to closed subsets of complete metric spaces) is known as the Contraction Mapping Theorem, the Contraction Mapping Principle, or the Banach Fixed Point Theorem. It has many important applications.

Theorem 11.5.2 (Contraction Mapping Principle). Let F : S → S be a contraction map with contraction ratio λ, where S ⊆ Rp is closed. Then F has exactly one fixed point x. Moreover, iterates of F applied to any initial point x0 ∈ S converge to x; in fact

    d(xn, x) ≤ (λ^n / (1 − λ)) d(x0, F(x0)) → 0   as n → ∞.

Proof. We will find the fixed point as the limit of a Cauchy sequence. Let x0 be any point in S and define a sequence (xn)n≥0 by

    x1 = F(x0), x2 = F(x1), x3 = F(x2), ..., xn = F(xn−1), ....

Claim: (xn) is Cauchy. We have

(18)    d(xn, xn+1) = d(F(xn−1), F(xn)) ≤ λ d(xn−1, xn) ≤ · · · ≤ λ^n d(x0, x1).

Thus if m > n, then

(19)    d(xn, xm) ≤ d(xn, xn+1) + · · · + d(xm−1, xm) ≤ (λ^n + · · · + λ^{m−1}) d(x0, x1).

But

    λ^n + · · · + λ^{m−1} ≤ λ^n (1 + λ + λ^2 + · · ·) = λ^n / (1 − λ) → 0   as n → ∞.

It follows (why?) that (xn) is Cauchy. Hence (xn) converges to some limit x ∈ Rp (Theorem 11.4.3), and since S is closed it follows that x ∈ S.

Claim: x is a fixed point of F. We have

    d(x, F(x)) ≤ d(x, xn) + d(F(xn−1), F(x)) ≤ d(x, xn) + λ d(xn−1, x) → 0   as n → ∞.

Hence d(x, F(x)) = 0, and so x = F(x).

Claim: the fixed point is unique. If x and y are fixed points, then F(x) = x and F(y) = y, and so

    d(x, y) = d(F(x), F(y)) ≤ λ d(x, y).

Since 0 ≤ λ < 1, this implies d(x, y) = 0, i.e. x = y.

Finally, from (19) and the lines which follow it, we have

    d(xn, xm) ≤ (λ^n / (1 − λ)) d(x0, F(x0)).

Since xm → x as m → ∞, the same estimate holds with xm replaced by x. This establishes the claimed rate of convergence.

In the preceding example (17), the unique fixed point for any r ≠ 1 (even r > 1) is (a − rb)/(1 − r), as is easily checked.

Remark 11.5.3. It is essential in the theorem that there is a fixed λ < 1. For example, the function F(x) = x^2 is a contraction on [0, a] for each 0 ≤ a < 1/2, but is not a contraction on [0, 1/2]: we have d(F(x), F(y)) < d(x, y) for all x ≠ y in [0, 1/2], but no single λ < 1 works. However, in this case there is still a fixed point in [0, 1/2], namely 0. To obtain an example where d(F(x), F(y)) < d(x, y) for all x, y ∈ S but there is no fixed point, consider a function F with the properties

    x < F(x),   0 ≤ F′(x) < 1,   for all x ∈ [0, ∞).

By the Mean Value Theorem, d(F(x), F(y)) < d(x, y) for all x ≠ y. But since x < F(x) for all x, there is no fixed point. See the following diagram. (Can you write down an analytic expression for such an F?)

[Figure: the graph of such an F, lying strictly above the line y = x, with slope always less than 1.]

Example 11.5.4 (Approximating √2). The map

    F(x) = (1/2)(x + 2/x)

takes [1, ∞) into itself, and is a contraction with λ = 1/2. The first claim is clear from a graph, and can then be checked in various standard ways. The unique fixed point is √2, as is easily checked. Beginning from x0 = 1, the iterates

    1,  3/2,  17/12,  577/408 ≈ 1.4142157, ...

provide approximations to √2. (Whereas √2 = 1.41421356....) This method was known to the Babylonians, nearly 4000 years ago.
[Figure: graph of F(x) = (1/2)(x + 2/x) on [1, ∞), with fixed point at x = √2.]

To verify the contraction property, note that since

    F′(x) = 1/2 − 1/x^2,

it follows for x ∈ [1, ∞) that −1/2 ≤ F′(x) < 1/2. Hence

    |F(x1) − F(x2)| ≤ (1/2)|x1 − x2|

by the Mean Value Theorem. This proves F is a contraction with λ = 1/2. It follows from the Contraction Mapping Principle that we can approximate √2 by beginning from any x0 ∈ [1, ∞) and iterating F.
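The Babylonian iterates above can be reproduced exactly in rational arithmetic (a sketch, not part of the notes), and the error after three steps is well inside the bound (λ^n/(1 − λ)) d(x0, F(x0)) from the theorem, with λ = 1/2, x0 = 1 and d(x0, F(x0)) = 1/2.

```python
from fractions import Fraction
import math

def F(x):
    return (x + 2 / x) / 2        # exact in Fraction arithmetic

xs = [Fraction(1)]                # x0 = 1
for _ in range(3):
    xs.append(F(xs[-1]))

# The classical Babylonian approximations to sqrt(2):
assert xs[1] == Fraction(3, 2)
assert xs[2] == Fraction(17, 12)
assert xs[3] == Fraction(577, 408)

# Error after n = 3 steps versus the a priori bound (1/2)^3/(1 - 1/2) * 1/2:
err = abs(float(xs[3]) - math.sqrt(2))
assert err < (0.5 ** 3 / (1 - 0.5)) * 0.5
```

In fact the convergence is far faster than the geometric bound suggests; the next example explains why.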
Example 11.5.5 (Stable fixed points). Suppose F : S → S is continuously differentiable on some open interval S ⊆ R and F(x*) = x* (i.e. x* is a fixed point of F). We say x* is a stable fixed point if there is an ε > 0 such that F^n(x0) → x* for every x0 ∈ [x* − ε, x* + ε]. If |F′(x*)| < 1, it follows from the Mean Value Theorem and the Contraction Mapping Principle that x* is a stable fixed point of F (see Reed, page 206, and the preceding paragraph there).

Example 11.5.6 (The Quadratic Map). In Section 6 we considered the quadratic map

    F : [0, 1] → [0, 1],   F(x) = rx(1 − x).

The only fixed points of F are x = 0 and x = x* := 1 − 1/r. Note that F′(x) = r(1 − 2x), and so F′(x*) = 2 − r. If 1 < r < 3 then |F′(x*)| < 1. Hence x* is a stable fixed point of F, and so the iterates F^n(x0) converge to x* provided x0 is sufficiently close to x*. As noted in Section 6, if 1 < r ≤ 3 it can be shown that, beginning from any x0 ∈ (0, 1) and iterating F, the iterates converge to x*; but the proof of this is quite long. Here we can prove less, but much more easily. See also Reed, Example 4 on page 208.

Example 11.5.7 (Newton's Method). (Recall the discussion of Newton's Method in Reed, Chapter 4.) Suppose that f(x*) = 0, f′(x*) ≠ 0, and f″ exists and is bounded in some interval containing x*. Then Newton's Method is to iterate the function

    F(x) = x − f(x)/f′(x).

[Figure: F(x) is the point where the tangent to the graph of f at (x, f(x)) crosses the x-axis.]

Here is a very short proof that the method works: since

    F′(x) = f(x) f″(x) / (f′(x))^2,

we see that F′(x*) = 0. In particular |F′(x*)| < 1, and so x* is a stable fixed point of F; the iterates F^n(x0) converge to x* for all x0 sufficiently close to x*. The main point is that the contraction ratio converges to 0 as x → x*. * With a little extra work we can even show quadratic convergence.
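A numerical sketch (not part of the notes; the choice f(x) = x^2 − 2 is ours): for this f the Newton map F(x) = x − (x^2 − 2)/(2x) is again the Babylonian map (x + 2/x)/2, and the quadratic convergence shows up as the error roughly squaring at each step.

```python
import math

def F(x):
    # Newton map for f(x) = x^2 - 2:  x - f(x)/f'(x)
    return x - (x * x - 2) / (2 * x)

x, errs = 1.0, []
for _ in range(5):
    x = F(x)
    errs.append(abs(x - math.sqrt(2)))

assert errs[-1] < 1e-12           # iterates converge to the root sqrt(2)
# Quadratic convergence: each error is below the square of the previous one
# (checked while the previous error is still above rounding level).
assert all(e2 < e1 ** 2 for e1, e2 in zip(errs, errs[1:]) if e1 > 1e-8)
```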
12. Compact subsets of Rp

In this section we see that a subset of Rp is closed and bounded iff it is sequentially compact iff it is compact. In the next section we will see that compact sets have nice properties with respect to continuous functions.

12.1. Sequentially compact sets.

Definition 12.1.1. A set A ⊆ Rp is sequentially compact if every sequence (xn) ⊆ A has a convergent subsequence with limit in A.

We saw in the Bolzano–Weierstraß Theorem 11.3.2 that a closed bounded subset of Rp is sequentially compact. The converse is also true.

Theorem 12.1.2. A set A ⊆ Rp is closed and bounded iff it is sequentially compact.

Proof. By the remark above, we only have one direction left to prove. So suppose A is sequentially compact.

To show that A is closed in Rp, suppose (xn) ⊆ A and xn → x. We need to show that x ∈ A. By sequential compactness, some subsequence of (xn) converges to some x′ ∈ A. But from Proposition 11.3.1, any subsequence of (xn) must converge to the same limit as (xn). Hence x′ = x, and so x ∈ A. Thus A is closed.

To show that A is bounded, assume otherwise. Then for each n we can choose some an ∈ A with an ∉ Bn(0), i.e. with d(an, 0) > n. It follows that any subsequence of (an) is unbounded. But then no subsequence is convergent, by Proposition 11.2.3 (a convergent sequence is bounded). This contradicts the fact that A is sequentially compact, and so the assumption is false. Hence A is bounded.

Remark 12.1.3. The above result is true in an arbitrary metric space in only one direction: sequentially compact implies closed and bounded, and the proof is essentially the same. But closed and bounded need not imply sequentially compact (or compact) in an arbitrary metric space. On the other hand, a subset of an arbitrary metric space is sequentially compact iff it is compact. We show all this later.

12.2. Compact sets.

Definition 12.2.1. An open cover of a set A is a collection of open sets Aλ (indexed by λ ∈ J, say) such that

    A ⊆ ∪_{λ∈J} Aλ.

We denote the collection of sets by { Aλ | λ ∈ J }. A finite subcover of the given cover is a finite subcollection which still covers A. That is, a finite subcover is a collection of sets { Aλ | λ ∈ J0 } for some finite J0 ⊆ J, such that

    A ⊆ ∪_{λ∈J0} Aλ.

(Thus a finite subcover exists if one can keep just a finite number of sets from the original collection and still cover A.)

The following is an example of a cover of A. This particular cover is already finite.

[Figure: a set A covered by open sets Ai, Aj, Ak.]
Definition 12.2.2. A set A ⊆ Rp is compact if every open cover of A has a finite subcover.

Note that we require a finite subcover of every open cover of A.

Example 12.2.3. Every finite set A is compact. To see this, suppose A ⊆ ∪_{λ∈J} Aλ. For each a ∈ A choose one Aλ which contains a. The collection of Aλ chosen in this way is a finite subcover of A. (This is a simple case of the general result that every closed bounded subset of Rp is compact — see below.)

Example 12.2.4. An open cover of a set which has no finite subcover is

    (0, 1) ⊆ ∪_{n≥2} ( 1 − 2/n , 1 − 1/n ).

We can also write the cover as

    (0, 1) ⊆ (0, 1/2) ∪ (1/3, 2/3) ∪ (2/4, 3/4) ∪ (3/5, 4/5) ∪ · · · .

The intervals cover (0, 1) because they overlap and because 1 − 1/n → 1. The intervals overlap since 1 − 1/n > 1 − 2/(n+1) if n ≥ 2: this inequality is equivalent to the inequality 1/n < 2/(n+1), which in turn is equivalent to the inequality (n+1)/n < 2, i.e. 1 + 1/n < 2, which is certainly true. The following is a rough sketch of the first four intervals in the cover, indicated by lines with an arrow at each end.

[Figure: the first four intervals of the cover, sketched inside (0, 1).]

But it is clear that there is no finite subcover. (If n is the largest integer such that (1 − 2/n, 1 − 1/n) is in some finite collection of such intervals, then no x between 1 − 1/n and 1 is covered.)

Another example is (0, 1) ⊆ ∪_{n≥2} (1/n, 1); it is easy to see that this cover also has no finite subcover. (Why?)

We next prove that if a subset of Rp is compact, then it is closed and bounded. The converse direction is also true for subsets of Rp, but not for subsets of an arbitrary metric space — we show all this later.
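A quick computational check of the example above (not part of the notes): any finite subfamily of the cover has a largest right endpoint 1 − 1/N < 1, so a point between 1 − 1/N and 1 is left uncovered.

```python
from fractions import Fraction

def interval(n):
    # The n-th interval of the cover (1 - 2/n, 1 - 1/n), in exact arithmetic.
    return (1 - Fraction(2, n), 1 - Fraction(1, n))

# Take the finite subfamily with indices 2..N.
N = 1000
sup_right = max(interval(n)[1] for n in range(2, N + 1))
assert sup_right == 1 - Fraction(1, N) < 1

# A point strictly between 1 - 1/N and 1 lies in none of these intervals.
x = (sup_right + 1) / 2
assert 0 < x < 1
assert not any(interval(n)[0] < x < interval(n)[1] for n in range(2, N + 1))
```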
Theorem 12.2.5 (compact implies closed and bounded). If A ⊆ Rp is compact, then it is closed and bounded.

Proof. Suppose A ⊆ Rp is compact.

If A is not closed then there exists a sequence (an) ⊆ A with an → b ∉ A. For each a ∈ A, let ra = (1/2) d(a, b). Then

    A ⊆ ∪_{a∈A} B_{ra}(a).

By compactness there is a finite subcover:

    A ⊆ B_{r1}(a1) ∪ · · · ∪ B_{rn}(an).

Let r be the minimum radius of any ball in the subcover (r > 0, as we are taking the minimum of a finite set of positive numbers). Every point x ∈ A is in some B_{ri}(ai) (i = 1, ..., n), and so within distance ri of ai. Since the distance from ai to b is 2ri, the distance from x to b is at least ri, and hence at least r. But this contradicts the fact that there is a sequence from A which converges to b. Hence A is closed.

[Figure: the point b ∉ A and the balls B_{ra}(a); no point of A comes within distance r of b.]

To show that A is bounded, note that

    A ⊆ ∪_{n≥1} Bn(0),

since the union of all such balls is Rp. (This is true for any set A, not necessarily closed or bounded.) Since A is compact, there is a finite subcover. If N is the radius of the largest ball in the subcover, then A ⊆ BN(0). Hence A is bounded.

12.3. Sequentially compact iff compact. We next prove that sequentially compact sets and compact sets in Rp are the same.

Theorem 12.3.1. A set A ⊆ Rp is sequentially compact iff it is compact.

It follows from this theorem, together with Theorems 12.1.2 and 12.2.5, that the three notions of closed and bounded, of sequential compactness, and of compactness, all agree in Rp. For subsets of Rp it would in fact be sufficient to prove Theorem 12.3.1 in just one direction (sequentially compact implies compact): the converse would then follow, since compact implies closed and bounded (Theorem 12.2.5), which implies sequentially compact (Theorem 12.1.2). But that argument does not carry over to arbitrary metric spaces, so we prove both directions directly. The proofs in both directions are starred; they both generalise to subsets of arbitrary metric spaces.
Proof*. First suppose A is sequentially compact, and suppose

    A ⊆ ∪_{λ∈J} Aλ,

where { Aλ | λ ∈ J } is an open cover.

We first claim that there is a subcover which is either countable or finite. To see this, consider each point x = (x1, ..., xp) ∈ Rp all of whose coordinates are rational, and consider each rational r > 0. If the ball Br(x) satisfies Br(x) ⊆ Aλ for some λ ∈ J, choose one such Aλ. Let the collection of Aλ chosen in this manner be indexed by λ ∈ J′. Thus the set J′ corresponds to a subset of the set Q × · · · × Q × Q (one copy of Q for each coordinate of x, and one for r), which is countable, since products of countable sets are countable. Since a subset of a countable set is either finite or countable (Reed, page 16), it follows that J′ is either finite or countable.

We next show that the collection of sets { Aλ | λ ∈ J′ } is a cover of A. To see this, consider any x* ∈ A. Then x* ∈ A_{λ*} for some λ* ∈ J, and so B_{r*}(x*) ⊆ A_{λ*} for some real number r* > 0, since A_{λ*} is open.

[Figure: the ball Br(x), with rational centre x and rational radius r, satisfying x* ∈ Br(x) ⊆ B_{r*}(x*).]

It is clear from the diagram that we can choose x with rational components close to x*, and then choose a suitable rational r > 0, such that x* ∈ Br(x) and Br(x) ⊆ B_{r*}(x*). (The precise argument uses the triangle inequality. Choose x with rational components so that d(x, x*) < r*/4, and then choose rational r so that r*/4 < r < r*/2. Then x* ∈ Br(x), since d(x, x*) < r*/4 < r. Moreover, if y ∈ Br(x) then

    d(x*, y) ≤ d(x*, x) + d(x, y) < r*/4 + r < r*/4 + r*/2 < r*.

Hence y ∈ B_{r*}(x*), and so Br(x) ⊆ B_{r*}(x*).)

In particular, Br(x) ⊆ A_{λ*}, and so Br(x) must be one of the balls used in constructing J′; that is, Br(x) ⊆ Aλ for some λ ∈ J′. Since x* ∈ Br(x), it follows that x* ∈ Aλ for some λ ∈ J′. It follows that the collection { Aλ | λ ∈ J′ } is also a cover of A, and so we have proved the claim.

We next claim that there is a finite subcover of any countable cover. To see this, write the countable cover as

(20)    A ⊆ A1 ∪ A2 ∪ A3 ∪ · · · .

If there is no finite subcover, then we can choose

    a1 ∈ A \ A1,
    a2 ∈ A \ (A1 ∪ A2),
    a3 ∈ A \ (A1 ∪ A2 ∪ A3),
    ....

By sequential compactness of A there is a subsequence of (an) which converges to some a ∈ A. From (20), a ∈ Ak for some k. Because Ak is open, the members of this subsequence belong to Ak for all sufficiently large indices. But from the construction of the sequence (an),

    an ∉ Ak if n ≥ k.

This contradicts the above, and so establishes the claim.

Putting the two claims together, we see that there is a finite subcover of any open cover of A, and so A is compact. This proves one direction in the theorem.

To prove the other direction, suppose A is compact. We saw in the previous theorem that A is closed. Assume A is not sequentially compact. This means we can choose a sequence (an) ⊆ A that has no subsequence converging to any point in A. Since A is closed, any convergent subsequence from A must have its limit in A; this then implies that in fact (an) has no subsequence converging to any point in Rp.

Let the set of values taken by (an) be denoted by S = { a1, a2, a3, ... }. (Some values may be listed more than once.)

Claim: S is infinite. To see this, suppose S is finite. Then at least one value a would be taken an infinite number of times. There is then a constant subsequence, all of whose terms equal a, and which in particular converges to a. This contradicts what we said before about (an).

Claim: S is closed in Rp. If S is not closed, then from Theorem 11.1.2 there is a sequence (bn) from S which converges to some b ∉ S. Choose a subsequence (b′n) of (bn) so that b′1 = b1, b′2 occurs further along in the sequence (an) than does b′1, b′3 occurs further along in the sequence (an) than does b′2, etc. Then the sequence (b′n) is a subsequence of (an), and it converges to b. But this is impossible, since (an) has no subsequence converging to any point in Rp. Hence S is closed in Rp, as claimed.

A similar argument shows that for each a ∈ S there is a neighbourhood U(a) = Br(a) (for some r depending on a) such that the only point in S ∩ U(a) is a. (Otherwise we could construct, in a similar manner to the previous paragraph, a subsequence of (an) which converges to a.)

Next consider the following open cover of A (in fact of all of Rp):

    Sc, U(a1), U(a2), U(a3), ....

By compactness there is a finite subcover of A, and by adding more sets if necessary we can take a finite subcover of the form Sc, U(a1), ..., U(aN) for some N. But this is impossible, as we see by choosing a ∈ S (⊆ A) with a ≠ a1, ..., aN (remember that S is infinite) and recalling that S ∩ U(ai) = {ai}: then a ∉ Sc and a ∉ U(ai) for i = 1, ..., N. Thus the assumption is false, and so we have proved the theorem.

12.4. More remarks.* In an arbitrary metric space the second "iff" (sequentially compact iff compact) is still true, with a "similar" argument to that of Theorem 12.3.1; the notion of points with rational coordinates is replaced by the notion of a "countable dense subset" of A. In an arbitrary metric space, sequentially compact implies closed and bounded (with essentially the same proof as in Theorem 12.1.2), but closed and bounded does not imply sequentially compact (the proof of the Bolzano–Weierstraß Theorem 11.3.2 uses components, and does not generalise to arbitrary metric spaces). The general result is that complete and totally bounded (together, these are the same as closed and bounded in Rp) is equivalent to sequentially compact, which is equivalent to compact.
13. Continuous functions on Rp

Review Section 7, and Reed from page 152 line 3 to the end of paragraph 2 on page 154.

13.1. Basic results. We consider functions f : A → R, where A ⊆ Rp. (Much of what we say will generalise to f : A → R where A ⊆ X and (X, ρ) is a general metric space. It also mostly further generalises on replacing R by Y, where (Y, ρ*) is another metric space.)

Definition 13.1.1. A function f : A → R is continuous at a ∈ A if

(21)    whenever (xn) ⊆ A,  xn → a ⇒ f(xn) → f(a).

Exactly as for p = 1, we have:

Theorem 13.1.2. A function f : A → R is continuous at a ∈ A iff for every ε > 0 there is a δ > 0 such that:

(22)    x ∈ A and d(x, a) ≤ δ ⇒ |f(x) − f(a)| ≤ ε.

(The proof is the same as in the one dimensional case; see Reed Theorem 3.3 on page 77. The usual properties of sums, products, quotients (when defined) and compositions of continuous functions being continuous still hold, and have the same proofs.)

Remark 13.1.3. In the previous Theorem and Definition we can replace either or both "≤" by "<". This follows essentially from the fact that we require the statements to be true for every ε > 0. This is frequently very convenient, and we will usually do so without further comment.

Remark 13.1.4. We can think of (22) geometrically as saying either (using < instead of ≤)

1. f[A ∩ Bδ(a)] ⊆ (f(a) − ε, f(a) + ε), or
2. A ∩ Bδ(a) ⊆ f⁻¹[ (f(a) − ε, f(a) + ε) ].

(See Definition 13.4.1 below for the notation f[E] and f⁻¹[E].) The following diagram, in the case A = Rp, should make the idea clear.

[Figure: a with the ball Bδ(a) contained in f⁻¹[(f(a) − ε, f(a) + ε)].]

Definition 13.1.5. A function f : A → R is uniformly continuous iff for every ε > 0 there is a δ > 0 such that:

(23)    x1, x2 ∈ A and d(x1, x2) ≤ δ ⇒ |f(x1) − f(x2)| ≤ ε.

This is also completely analogous to the case p = 1. We can again replace either or both "≤" by "<".
If f : A → Rq we can write f(x) = (f1(x), ..., fq(x)). It follows from Definition 13.1.1 (or Theorem 13.1.2), applied to functions with values in Rq, that f is continuous iff each component function f1, ..., fq is continuous. For example, if f : R2 → R3,

    f(x, y) = (x^2 + y^2, sin xy, e^x),

then f is continuous, since the component functions

    f1(x, y) = x^2 + y^2,  f2(x, y) = sin xy,  f3(x, y) = e^x,

are all continuous. In this sense we can always consider real valued functions instead of functions into Rq. But this is often not very useful — in linear algebra, for example, we usually want to think of linear transformations as maps into Rq rather than as q component maps into R. And it does not help if we want to consider functions which map into a general metric space.

Definitions 13.1.1 and 13.1.5 and Theorem 13.1.2 generalise immediately, without even changing notation, to functions f : A → Rq. The previous diagram then becomes:

[Figure: the ball Bδ(a) contained in f⁻¹[Bε(f(a))].]

A function f which has "rips" or "tears" is not continuous.

13.2. Deeper properties of continuous functions.

Theorem 13.2.1. Suppose f : A → R is continuous, where A ⊆ Rp is closed and bounded. Then f is bounded above and below and has a maximum and a minimum value. Moreover, f is uniformly continuous on A.

This generalises the result for functions defined on a closed bounded interval (see Reed, page 153). The proof is essentially the same as in the case X = R in Section 3. The theorem is also true for A ⊆ X where X is an arbitrary metric space, provided A is sequentially compact (which is the same as compact, as we have proved). The Bolzano–Weierstraß theorem is not needed, since we know immediately from the definition of sequential compactness that every sequence from A has a convergent subsequence.
* There is also a generalisation of the Intermediate Value Theorem, but for this we need the set A to be "connected". One definition of connected is that we cannot write A = (E ∩ A) ∪ (F ∩ A), where E, F are open and E ∩ A, F ∩ A are disjoint and nonempty. If f is continuous and A is connected, then one can prove that the image of f is an interval.

13.3. Limits. As for the case p = 1, limits can be defined either in terms of sequences or in terms of ε and δ.

We say a is a limit point of A if there is a sequence from A \ {a} which converges to a, i.e. a sequence from A, none of whose terms equal a, converging to a. We do not require a ∈ A. We say a ∈ A is an isolated point of A if there is no sequence (xn) ⊆ A \ {a} such that xn → a.

For example, if A = (0, 1] ∪ {2} ⊆ R, then 2 is an isolated point of A (and is the only isolated point of A). The limit points of A are given by the set [0, 1]. As for the case p = 1, a set contains all its limit points iff it is closed.

Definition 13.3.1. Suppose f : A → R, where A ⊆ Rp and a is a limit point of A. If for any sequence (xn):

    (xn) ⊆ A \ {a} and xn → a ⇒ f(xn) → L,

then we say the limit of f(x) as x approaches a is L, or the limit of f at a is L, and write

    lim_{x→a, x∈A} f(x) = L,   or   lim_{x→a} f(x) = L.

The reason for restricting to sequences none of whose terms equal a is the following. If, for example, f : R → R is given by

    f(x) = x^2 for x ≠ 0,  f(0) = 1,

or g : R \ {0} → R is given by g(x) = x^2, then we want

    lim_{x→0} f(x) = 0,   lim_{x→0} g(x) = 0.

(If a is an isolated point of D(f), then f is always continuous at a according to the definition of continuity — although lim_{x→a} f(x) is not actually defined. This is not an interesting situation, and we would not normally consider continuity at isolated points.)

Theorem 13.3.2. Suppose f : A → R, where A ⊆ Rp and a is a limit point of A. Then lim_{x→a} f(x) = L iff: for every ε > 0 there is a corresponding δ > 0 such that

    x ∈ A, d(x, a) ≤ δ and x ≠ a ⇒ |f(x) − L| ≤ ε.

(As usual, we could replace either or both "≤" by "<".) The proof is exactly as in the case p = 1.

It follows, exactly as in the case p = 1, that if a ∈ D(f) is a limit point of D(f), then f is continuous at a iff

    lim_{x→a} f(x) = f(a).

The usual properties of sums, products and quotients of limits hold, with the same proofs as in the case p = 1.
Functions of two or more variables exhibit more complicated behaviour than functions of one variable.

Example 13.3.3. Let

    f(x, y) = xy / (x^2 + y^2)   for (x, y) ≠ (0, 0).

If y = ax then f(x, y) = a(1 + a^2)^{−1} for x ≠ 0. It follows that

    lim_{(x,y)→(0,0), y=ax} f(x, y) = a / (1 + a^2).

Thus we obtain a different limit of f as (x, y) → (0, 0) along different lines. The limit along the y-axis x = 0 is also easily seen to be 0. Hence lim_{(x,y)→(0,0)} f(x, y) does not exist.

(Setting x = r cos θ, y = r sin θ, we see that in polar coordinates this is just the function f(r, θ) = (1/2) sin 2θ.)

A partial diagram of the graph of f is:

[Figure: partial graph of f.]

Probably a better way to visualise f is by sketching level sets of f (a level set of f is a set on which f is constant), as shown in the next diagram. Then you can visualise the graph of f as being swept out by a straight line rotating around the origin, at a height indicated by the level sets. This may also help in understanding the previous diagram.

[Figure: level sets of f.]

Example 13.3.4. Let

    f(x, y) = x^2 y / (x^4 + y^2)   for (x, y) ≠ (0, 0).

Then

    lim_{(x,y)→(0,0), y=ax} f(x, y) = lim_{x→0} ax^3/(x^4 + a^2 x^2) = lim_{x→0} ax/(x^2 + a^2) = 0.

Thus the limit of f as (x, y) → (0, 0) along any line y = ax is 0 (and likewise along the y-axis x = 0). But it is still not true that lim_{(x,y)→(0,0)} f(x, y) exists. For if we consider the limit of f as (x, y) → (0, 0) along the parabola y = bx^2, we see that f = b(1 + b^2)^{−1} on this curve, and so the limit along this parabola is b(1 + b^2)^{−1}. You might like to draw level curves (corresponding to the parabolas y = bx^2).
4. 0] and E1 = [0. Proof. Note that inverse images are better behaved than images.3 and the two diagrams there. We then discuss the generalisation to f : A (⊆ Rp ) → B (⊆ R). Similarly.1. f −1 [E] is open in Rp whenever E is open in R. ∞). The results generalise immediately to intersections and unions of more than two. But for this we ﬁrst need to introduce some general notation for functions. You should now go back and look again at Remarks 13. Then f −1 [E1 ∩ E2 ] = f −1 [E1 ] ∩ f −1 [E2 ]. sets. f is continuous. f −1 [f [E1 ]] = R E1 and f [f −1 [R]] = [0. We ﬁrst state and prove the following theorem for functions deﬁned on all of Rp . f [E1 \ E2 ] ⊇ f [E1 ] \ f [E2 ] E1 ⊆ E2 ⇒ f [E1 ] ⊆ f [E2 ] E ⊆ f −1 [f [E] ]. f [E1 ∪ E2 ] = f [E1 ] ∪ f [E2 ].2 and 13. E1 . including inﬁnitely many. Let f : Rp → R. ∞) = R. Then f [E1 ∩ E2 ] = {0} and f [E1 ] ∩ f [E2 ] = [0. ∞) while f [E1 ] \ f [E2 ] = ∅.4. f [E1 \ E2 ] = (0. E1 ⊆ E2 ⇒ f −1 [E1 ] ⊆ f −1 [E2 ] f [f −1 [E] ] ⊆ E.4. The others are similar.3. E1 . X = Y = R. 2. The image of E ⊆ X under f is the set f [E] = { y ∈ Y  y = f (x) for some x ∈ E }. Theorem 13. then f [E1 ∩ E2 ] ⊆ f [E1 ] ∩ f [E2 ].4. f −1 [E1 ∪ E2 ] = f −1 [E1 ] ∪ f −1 [E2 ]. One can deﬁne continuity in terms of open sets (or closed sets) without using either sequences or the ε–δ deﬁnition.4. If E. Suppose f : X → Y and E. See Theorem 13.1. In fact the inverse function will exist iﬀ f −1 {y}43 contains at most one element for each y ∈ Y .2. f −1 [C] is closed in Rp whenever C is closed in R. Theorem 13. f −1 [E1 \ E2 ] = f −1 [E1 ] \ f −1 [E2 ]. Suppose f : X → Y . The inverse image of E ⊆ Y under f is the set f −1 [E] = { x ∈ X  f (x) ∈ E }.1. Also. It is important to realise that f −1 [E] is always deﬁned for E ⊆ Y . 3.3. E2 ⊆ Y . ∞). E2 = (−∞. Another characterisation of continuity. x ∈ f −1 [E1 ∩ E2 ] iﬀ f (x) ∈ E1 ∩ E2 iﬀ (f (x) ∈ E1 and f (x) ∈ E2 ) iﬀ (x ∈ f −1 [E1 ] and x ∈ f −1 [E2 ]) iﬀ x ∈ f −1 [E1 ] ∩ f −1 [E2 ]. CONTINUOUS FUNCTIONS ON Rp
13.13. For the ﬁrst. To see that equality need not hold in the sixth assertion let f (x) = x2 .
. even if f does not have an inverse.
43 We
sometimes write f −1 E for f −1 [E]. Definition 13. E2 ⊆ X. The following are straightforward to check. Then the following are equivalent: 1.
Proof.2 that f is continuous at x. From Remark 13. We can similarly show (3) ⇒ (2). In order to prove f is continuous at x take any r > 0. But (f −1 [C])c = f −1 [C c ]. f (x) + r) is open. Let x ∈ Rp . 0). f (x) + r) ⊆ f −1 [E].e.4. and so Bδ (x) ⊆ f −1 [E]. Hence f −1 [C] is closed. f (x) + r). (2) ⇒ (1): Assume (2). f(x) + r ) Bδ(x)
(2) ⇔ (3): Assume (2). f (x) + r).2 there exists δ > 0 such that Bδ (x) ⊆ f −1 (f (x) − r.
. We can now give a simple proof that the examples at the beginning of Section 10. y. Since x ∈ f −1 (f (x) − r. If C is closed in R then C c is open and so f −1 [C c ] is open.
f1 (x.4.13. y. Example 13. (1) ⇒ (2): Assume (1). For example. f −1 [E] is open in Rp whenever E is open in R.1. f (x) + r) ⊆ E. 0). We wish to show that f −1 [E] is open (in Rp ). i. z) ∈ R3  x2 + y 2 < z 2 and sin x < cos y } then A = A1 ∩ A2 where
−1 A1 = f1 (−∞. z) = x2 + y 2 − z 2 . and so f −1 [E] is open. f (x) + r) is open it follows that f −1 (f (x) − r. But f −1 (f (x) − r. f2 (x. this ﬁnishes the proof. Since x was an arbitrary point in Rp .
f1 ( f(x)r .1. Since (f (x) − r. if A = { (x. z) = sin x − cos y. −1 A2 = f2 (−∞. Let E be open in R. and since E is open there exists r > 0 such that (f (x) − r. y.5 are indeed open. it follows there exists δ > 0 such that Bδ (x) ⊆ f −1 (f (x) − r. f (x) + r) and this set is open. Thus for every point x ∈ f −1 [E] there is a δ = δx > 0 such that the above line is true. It now follows from Remark 13. Let x ∈ f −1 [E]. Then f (x) ∈ E.
1) is open in R . (What 2 could we take as the set E in each case? ) For another example. Example 13. Suppose A ⊆ Rp . Remark 13. Since F = A ∩ B it follows that F is open in A and closed in B.5. (Why? ) For (2) ⇒ (1) we suppose the inverse image of an open set in R is open in A.4. 1] is 2 open in A. Then the set F = ( 1 . and hence their intersection is open. Consider the interval A = (0. being the inverse image of open sets. we observe that the theorem also generalises easily to functions f : A (⊆ Rp )Rq . F is relatively open in A. Then A is closed and B is open. Theorem 13. CONTINUOUS FUNCTIONS ON Rp
Thus A1 and A2 are open. 1]. We say F ⊆ A is open (closed) in A or relatively open (closed) in A if there is an open (closed) set E in Rp such that F = A ∩ E.
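As an illustrative sanity check (my own, not from the notes), one can verify pointwise that membership of the set A above agrees with membership of f1−1(−∞, 0) ∩ f2−1(−∞, 0):

```python
import math, random

# Illustrative pointwise check that
#   A = { (x,y,z) : x^2 + y^2 < z^2 and sin x < cos y }
# equals f1^{-1}(-inf, 0) ∩ f2^{-1}(-inf, 0), where
#   f1(x,y,z) = x^2 + y^2 - z^2   and   f2(x,y,z) = sin x - cos y.

def f1(x, y, z):
    return x * x + y * y - z * z

def f2(x, y, z):
    return math.sin(x) - math.cos(y)

def in_A(x, y, z):
    return x * x + y * y < z * z and math.sin(x) < math.cos(y)

random.seed(0)
for _ in range(10000):
    p = tuple(random.uniform(-3, 3) for _ in range(3))
    assert in_A(*p) == (f1(*p) < 0 and f2(*p) < 0)
```

This only tests the set identity, of course; the openness of A is the content of the theorem, since (−∞, 0) is open and f1, f2 are continuous.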
E does not include its boundary.4. y)  0 ≤ x ≤ 1. 0 < y < 1 }. y)  0 ≤ x ≤ 1 }. The interval (0. as is A itself. The proof of (2) ⇔ (3) is similar. consider the square F = { (x. A does. Finally. We need to show that f −1 [E] is open in A.4. Let A = { (x. 1 ] is closed in A. we need the following deﬁnitions (which are also important in other settings). but is not open in R2 . Just replace Bδ (x) by Bδ (x) ∩ A and Bδx (x) by Bδx (x) ∩ A throughout the proof. But it is not open when considered as a subset of R2 via the usual inclusion of R into R2 . y)  0 < y < 1 }.
. which does not contain its vertical sides but does contain the remainder of its horizontal sides.3 is true for functions f : A (⊆ Rp ) → R if we replace “in Rp by “in A”.1. Just replace the interval (f (x) − r. For (1) ⇒ (2) we suppose E is open. B = { (x.13.
It follows (exercise) that F ⊆ A is open in A iﬀ for each x ∈ F there is an r > 0 such that A ∩ Br (x) ⊆ F. f (x) + r) by the ball Br (f (x)) throughout this Remark. by taking complements in A and noting that a set is open in A iﬀ its complement in A is closed in A. In the proof.* In order to extend the previous characterisation of continuity to functions deﬁned just on a subset of Rp . as is A itself. The set F = (0. replace Bδ (x) by A ∩ Bδ (x) (and use the exercise at the beginning of this Remark).
Suppose f : E (⊆ Rp ) → Rq is continuous and E is compact. For example.4.
44 Each A can be extended to an open set in R p . By compactness there is a ﬁnite subcover λ from these extended sets. Continuous image of compact sets*. as we see by considering what happens near (1.
. y)  x2 + y 2 = 1 } → [0. 2 For example.4. sin t) = t. y)  x2 + y 2 = 1 } (⊆ R2 ). Theorem 13. Taking their restriction to E gives a ﬁnite subcover of the original cover.13.
It is not true that the continuous image of an open (closed) set need be open (closed). For example, let f : R → R be given by f (x) = x². Then f [(−1, 1)] = [0, 1), which is neither open nor closed. If f (x) = e^{−x²} then f [R] = (0, 1], which again is neither open nor closed. However, continuous images of compact sets are compact.

Proof. Suppose f [E] ⊆ ∪_{λ∈J} Aλ is an open cover. Then

E ⊆ f −1 [f [E]] ⊆ f −1 [ ∪_{λ∈J} Aλ ] = ∪_{λ∈J} f −1 [Aλ].

This gives a cover of E by relatively open sets, and by compactness44 there is a finite subcover

E ⊆ ∪_{λ∈J0} f −1 [Aλ].

It follows from Theorem 13.4.2 that

f [E] ⊆ f [ ∪_{λ∈J0} f −1 [Aλ] ] = ∪_{λ∈J0} f [f −1 [Aλ]] ⊆ ∪_{λ∈J0} Aλ.

Thus the original cover of f [E] has a finite subcover. Hence f [E] is compact.

The inverse image of a compact set under a continuous function need not be compact. For example, if f : R → R is given by f (x) = e^{−x²} then f −1 [0, 1] = R.

It is not always true that if f is continuous and one-to-one then its inverse is continuous.
Remark 13.
This gives a cover of E by relatively open sets. 1) which is neither open nor closed. continuous images of compact sets are compact. 13. Then f [E] is compact. 0).
It follows from Theorem 13.5. sin 2π − n n → (1. if f : R → R is given by f (x) = e−x then f −1 [0.
Then the inverse function is given by f −1 : { (x.2. cos 2π − 1 1 . and by compactness44 there is a ﬁnite subcover E⊆
λ∈J0
f −1 [Aλ ]. The inverse image of a compact set under a continuous function need not be compact. let f : [0. 2π) → { (x. Then f (−1. For example.
this inverse image (f −1 )−1 [C] under f −1 is the same as the image f [C] under f . and so is itself closed and bounded. 1 1 . Theorem 13. Proof. Thus this result generalises to any metric space or even any topological space. then its inverse function f −1 : f [E] → E is certainly welldeﬁned (namely.5. Since C is a relatively closed subset of a compact set. and in particular is closed in Rq . f −1 (y) = x iﬀ f (x) = y).2. of any set C ⊆ E which is closed in E. To show that f −1 is continuous we need to show that the inverse image under −1 f . CONTINUOUS FUNCTIONS ON Rp
but f −1 cos 2π − However. and E is compact. n
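The discontinuity of this inverse can also be seen numerically; here is a small sketch, assuming the standard formula for the inverse via `atan2` reduced to [0, 2π):

```python
import math

# Illustrative check: f(t) = (cos t, sin t) on [0, 2*pi) is continuous,
# one-to-one and onto the circle, but its inverse is discontinuous at (1, 0):
# the image points f(2*pi - 1/n) approach (1, 0), yet f^{-1} of them
# approaches 2*pi, not f^{-1}(1, 0) = 0.

def f(t):
    return (math.cos(t), math.sin(t))

def f_inverse(x, y):
    # angle of (x, y), reduced to [0, 2*pi)  (assumed formula for f^{-1})
    return math.atan2(y, x) % (2 * math.pi)

for n in (10, 100, 1000):
    t = 2 * math.pi - 1 / n
    x, y = f(t)
    # the image points converge to (1, 0) ...
    assert abs(x - 1) < 1e-2 and abs(y) < 0.2
    # ... but their inverse images stay near 2*pi, far from f^{-1}(1, 0) = 0
    assert abs(f_inverse(x, y) - 2 * math.pi) < 0.2

assert f_inverse(1.0, 0.0) == 0.0
```

The failure is exactly the one in the text: [0, 2π) is not compact, so Theorem 13.5.2 does not apply.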
45 One can also prove this directly from the deﬁnition of sequential compactness or from the deﬁnition of compactness via open covers.
unless we also require the derivatives to converge uniformly. SEQUENCES AND SERIES OF FUNCTIONS. We can then measure the distance between two functions f and g (with the same domain E) by using the sup distance (or sup metric) ρ∞ (f. The point is that continuity and integration behave well under uniform convergence.14.3.1 carefully. the deﬁnitions and all the examples.3.3 and Example 2 page 197 carefully. is also important.2. but not under pointwise convergence. the triangle inequality.2 carefully.
14. §6. E ⊆ R. Study Reed §5.1 it is shown that the sup norm satisﬁes positivity. Study Reed §5.4. Sequences and series of functions.
. g) = f − g
∞
= sup f (x) − g(x). Pointwise and uniform convergence. §1–3. Diﬀerentiation does not behave well.e.3 14. Theorem 5. concerning diﬀerentiation under the integral sign. symmetry and transitivity — see the ﬁrst paragraph in Section 10. Properties of uniform convergence. The supremum norm and supremum metric.
The reference for this Chapter is Reed Chapter 5. and is well behaved under scalar multiples. and Example 2 page 197.2. satisﬁes positivity. In Proposition 5. In Example 2 on page 197 it is shown that as a consequence the sup metric is indeed a metric (i. but exactly the same deﬁnition applies for E ⊆ Rn . 14.1. Study Reed §5. and Chapter 6. One way of measuring the “size” of a bounded function f is by means of the supremum norm f ∞ (Reed page 175).
x∈E
In Reed. 14.2).
g) may give ∞). This problem does not arise if we restrict to continuous b functions.14. not everywhere zero.) It is possible to prove for f p the three properties of a “norm” in Proposition 5.e. We cannot do much on this topic.2. b] is replaced by any set E ⊆ Rn .) One similarly deﬁnes the notion of Cauchy in the sup metric (or equivalently Cauchy in the sup norm).1. (This is analogous to the fact that if a sequence from Rn is Cauchy then it converges to a point in Rn . g) = f − p p . But it is usually not desirable to restrict to continuous functions for other reasons.4.1 for convergence of a sequence of points in Rn . again for 1 ≤ p < ∞. particularly Example 1. but it is not a complete metric space (unless p = ∞). If we take the “completion” of the set S we are lead to the socalled Lebesgue measurable functions and the theory of Lebesgue integration.1.4.
for any 1 ≤ p < ∞. Study Reed pp 179–181. f ) → 0 as n → ∞.
46 Reed says "converges in the sup norm".
(See Reed page 177) A sequence of functions (fn ) deﬁned on the same domain E converges to f in the sup metric 46 if ρ∞ (fn . but then we need to restrict to functions that are continuous and bounded (otherwise the deﬁnition of ρ∞ (f. However f must then be zero “almost everywhere” — in a sense that can be made precise. i. b]. If E is not an interval in R then we need a more general notion of integration. Integral norms and metrics. since if f is continuous and a f  = 0 then f = 0 everywhere.
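A rough numerical illustration (a grid approximation of the sup, and an example of my own choosing): fn(x) = x^n converges to 0 pointwise on [0, 1), but its sup distance to the zero function does not tend to 0, so the convergence is not uniform:

```python
# Illustrative sketch: the sup metric rho_inf(f, g) = sup_{x in E} |f(x) - g(x)|
# estimated on a finite grid, applied to f_n(x) = x**n on E = [0, 1).

def rho_inf(f, g, grid):
    return max(abs(f(x) - g(x)) for x in grid)

grid = [i / 1000 for i in range(1000)]   # sample points of [0, 1)
zero = lambda x: 0.0

for n in (1, 5, 50):
    fn = lambda x, n=n: x ** n
    assert rho_inf(fn, zero, grid) > 0.9  # the sup distance stays near 1

# pointwise convergence at each fixed x < 1 is nevertheless clear:
assert 0.5 ** 50 < 1e-15
```

A finite grid can only bound the sup from below, but that is enough here: the grid already shows the distance does not go to 0.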
. The metrics corresponding to these norms are ρp (f. to do it properly requires the Lebesgue integral. f p does not quite give a norm. b] is Cauchy in the sup metric. 14. Actually. The important “integral” norms for functions f with domain E are deﬁned by
‖f‖_p = ( ∫_E |f|^p )^{1/p} .
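These integral norms can be approximated by Riemann sums; an illustrative sketch (the function and grid are my own choices, and the exact values ‖f‖₁ = 1/2, ‖f‖₂ = 1/√3 for f(x) = x on [0, 1] are elementary integrals):

```python
# Illustrative sketch: approximating the integral norms
#   ||f||_p = ( \int_E |f|^p )^{1/p}
# by midpoint Riemann sums on E = [0, 1], for f(x) = x.

def norm_p(f, p, n=100000):
    h = 1.0 / n
    s = sum(abs(f((i + 0.5) * h)) ** p for i in range(n)) * h
    return s ** (1.0 / p)

f = lambda x: x
assert abs(norm_p(f, 1) - 0.5) < 1e-6        # ||f||_1 = 1/2
assert abs(norm_p(f, 2) - 3 ** -0.5) < 1e-6  # ||f||_2 = 1/sqrt(3)
```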
then f is C 1 and f = g (i.3 carefully. • If a series of C 1 functions converges uniformly on an interval to a function f . Theorem 6.5.
j
f converges uniformly if the sequence of partial sums
Note the important Weierstrass Mtest. • Integration behaves well in the sense that for a uniformly convergent series of continuous functions on a closed bounded interval. This gives a very useful criterion for uniform convergence of a series of functions. Theorem 6.e. and if the derivatives also converge uniformly to g say. Study Reed §4. Series of functions. We say that the series converges uniformly.3. Theorem 6.
14. b]).e. Example 2 is * material (it gives an example of a continuous and nowhere diﬀerentiable function). SEQUENCES AND SERIES OF FUNCTIONS. Study example 1 on page 240 of Reed. ∞ The inﬁnite series of functions j=1 fj is said to converge pointwise to the ∞ function f iﬀ the corresponding series of real numbers j=1 fj (x) converges to f (x) for each x in the (common) domain of the fj .3.1 — last sentence. this means that the inﬁnite sequence (Sn ) of partial sums Sn = f1 + · · · + fn converges in the pointwise sense to f .1 of Reed. The results about uniform convergence of a sequence of functions in Section14.14. summation and diﬀerentiation can be interchanged). summation and integration can be interchanged).2.
.3. From the deﬁnition of convergence of a series of real numbers.3. In particular: • A uniformly convergent series of continuous functions has a continuous limit. Suppose the functions fj all have a common domain (typically [a.3). the integral of the sum is the sum of the integrals (i.2) immediately give corresponding results for series (Reed Theorem 6.2 (Reed §5.
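A hedged numerical illustration of the Weierstrass M-test (the particular series is my own example, not Reed's): for f(x) = Σ sin(jx)/j² one may take Mj = 1/j², and the sup error of the N-th partial sum is bounded by the tail Σ_{j>N} Mj, uniformly in x:

```python
import math

# Illustrative sketch of the Weierstrass M-test for
#   f(x) = sum_{j>=1} sin(j x) / j^2,   with M_j = 1/j^2.
# The sup error of the N-th partial sum is at most the tail sum_{j>N} M_j,
# independently of x -- which is what uniform convergence asserts.

def partial_sum(x, N):
    return sum(math.sin(j * x) / j ** 2 for j in range(1, N + 1))

N = 50
reference_N = 5000                      # stand-in for the limit function
tail = sum(1 / j ** 2 for j in range(N + 1, reference_N + 1))

grid = [i * 0.05 for i in range(126)]   # sample points of [0, 2*pi]
sup_error = max(abs(partial_sum(x, reference_N) - partial_sum(x, N))
                for x in grid)
assert sup_error <= tail
```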
y) measures the “distance (or length of a shortest route) from x to y”. .
ρmax (x. The reference is Reed §5. but the proofs trivially generalise to n > 2. we need only require continuity as this already implies boundedness. the third might be thought of as saying that one route from x to y is to take a shortest route from x to z and then a shortest route from z to y — but there may be an even shorter route from x to y which does not go via z. See also Section 14. These are extremely important. . We discussed the Euclidean metric on Rn in Sections 10.2 Example 3 on page 197 of Reed is important. BC(E). . ∞ are Problem 1 page 202 of Reed in case n = 2. . It can be generalised to give the following metrics on Rn :
n 1/p
ρp (x. the set of bounded functions f : E (⊆ Rn ) → R. See Reed Example 2 page 197. The cases p = 1. y) = ρ(y. b] → R. z ∈ M 1.1. ρ(x. .
. 0) ≤ 1 } for p = 1. METRIC SPACES
15. The three requirements are natural. The proof that ρp is a metric has already been given for p = 2 (the Euclidean metric). This has the advantage of simplicity and generality. b]) have also been discussed. The case of arbitrary p is trickier. see Section 14. B(E). Note the diagrams in Reed showing { x ∈ R2 ρp (x. The best way to study convergence and continuity is to do it abstractly by means of metric spaces. Definition 15.1 and 10. Deﬁnition and examples. 2. The metrics ρp (1 ≤ p < ∞) on C[a. (triangle inequality) The idea is that ρ(x. . The basic example is the sup metric ρ∞ on C[a. x).15. y). . .1.1. (symmetry) 3.1. 15. . Example 15. y ∈ Rn . ∞) (called a metric) which satisﬁes for all x. Example 15. and I will set it later as an assignment problem (with hints!). 2. More generally. * If we regard (a1 . In the second case. . The basic idea we need is the notion of a “distance function” or a “metric”. y) ≤ ρ(x.1. y) = ((x1 − y1 ) + . .3 (Metrics on function spaces). .2 (Metrics on Rn ). We sometimes write ρ∞ for ρmax . z) + ρ(z. n} → R. y) = 0 iﬀ x = y. ρ(x. one has the sup metric on 1. . (positivity) 2. an ) as a function a : {1. ρ(x. y) ≥ 0 and ρ(x. A metric space is a set M together with a function ρ : M × M → [0. b] (and generalisations to other sets than [a.4. y) = max{ xi − yi  : i − 1.6 up to the end of the ﬁrst paragraph on page 201. ∞. the set of bounded continuous functions f : E (⊆ Rn ) → R. In particular.6. y. (xn − yn ) )
ρp (x, y) = ( |x1 − y1|^p + · · · + |xn − yn|^p )^{1/p} = ( Σ_{i=1}^n |xi − yi|^p )^{1/p} .
Reed page 201 paragraphs 2–4. (It is a particular case of what Reed calls a “discrete metric” in Problem 10 page 202. see the ﬁrst paragraph of Reed page 199. x = y.6 (Metrics on strings of symbols and DNA sequences). y) = x − y . METRIC SPACES
and interpret ‖a‖_p as ( Σ_{i=1}^n |a(i)|^p )^{1/p}
(which is indeed the case for an appropriate “measure”). We discussed convergence in Rn in Section 11. ρ) converges to x ∈ M if ρ(xn . It is usually called the discrete metric. 15. But metric spaces are much more general. the discrete metric. The three properties of a metric follow from the properties of a norm. A sequence (xn ) in a metric space (M.1. 200 of Reed are not metrics which arise from norms. This is important in studying evolutionary trees. Example 15. We will see in the next section that convergence with respect to any of the metrics ρp on Rn is the same. Convergence in a metric space.1. there can be at most one limit. the following is the natural notion of convergence.15. y) = 1 0 x = y. and Examples 4 and 5 on pages 199.1.5 (The discrete metric). Example 5 in Reed is a discussion about metrics as used to estimate the diﬀerence between two DNA molecules. Example 15. Examples are the metrics ρp for 1 ≤ p ≤ ∞ on both Rn and the metrics ρp for 1 ≤ p ≤ ∞ on spaces of functions.2. let 0 −1 ≤ x ≤ 0 0 −1 ≤ x ≤ 0 1 f (x) = . In particular. 1 0<x≤1 1 1 n ≤x≤1
. then the metrics ρp (1 ≤ p < ∞) on function spaces give the metrics ρp on Rn as a special case.
Check that this is a metric.1. page 199.1. Example 15.
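A minimal sketch of the discrete metric, with a brute-force check of the three axioms on a small set:

```python
import itertools

# The discrete metric d(x, y) = 0 if x == y else 1, together with a
# brute-force verification of the metric axioms on a small set M.

def d(x, y):
    return 0 if x == y else 1

M = ["a", "b", "c", "d"]
for x, y, z in itertools.product(M, repeat=3):
    assert d(x, y) >= 0 and (d(x, y) == 0) == (x == y)   # positivity
    assert d(x, y) == d(y, x)                            # symmetry
    assert d(x, y) <= d(x, z) + d(z, y)                  # triangle inequality
```

The triangle inequality holds because the right side can only be 0 when x = z = y, in which case the left side is 0 as well.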
and Problem 11 page 203. y) for all x.15. But ρp (x. Since σ(xn .
that “Riemann integrable” implies “bounded” according to our deﬁnitions. Proof. It follows that c−1 σ(x. Example 15. 2 1 Reed also deﬁnes a weaker notion of “equivalent” (“uniformly equivalent” implies “equivalent” but not conversely). Two metrics ρ and σ on the same set M are uniformly equivalent 48 if there are positive numbers c1 and c2 such that c1 ρ(x.
ρ∞ (x. f ) = 1 and ρ1 (fn . b] (the space of Riemann integrablefunctions on [a. rather than equivalent metrics.
15. y) ≤ c−1 σ(x. But for consistency. 1] with the ρ1 metric).3. It follows from the example in the prevous section. Definition 15. ρ∞ )47 . 1]. we will only consider uniformly equivalent metrics. x) ≤ c2 ρ(xn . 1]. y) ≤ σ(x.3. it follows that ρ(xn . y) = x1 − y1 p + · · · + xn − yn p
ρp (x, y) = ( |x1 − y1|^p + · · · + |xn − yn|^p )^{1/p} ≤ ( n max_{1≤i≤n} |xi − yi|^p )^{1/p} = n^{1/p} ρ∞ (x, y),

and

ρ∞ (x, y) = max_{1≤i≤n} |xi − yi| ≤ ρp (x, y).
This completes the proof.3. The following theorem is a generalisation of Problem 12 page 203.
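The two inequalities in the proof can be checked numerically on random points; an illustrative sketch:

```python
import random

# Illustrative numerical check of the uniform-equivalence inequalities
#   rho_inf(x, y) <= rho_p(x, y) <= n**(1/p) * rho_inf(x, y)
# on random points of R^n.

def rho_p(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def rho_inf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

random.seed(1)
n = 5
for p in (1, 2, 3):
    for _ in range(1000):
        x = [random.uniform(-10, 10) for _ in range(n)]
        y = [random.uniform(-10, 10) for _ in range(n)]
        lo, mid, hi = rho_inf(x, y), rho_p(x, y, p), n ** (1 / p) * rho_inf(x, y)
        assert lo <= mid + 1e-9 and mid <= hi + 1e-9
```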
48 What is here called "uniformly equivalent" is usually called "equivalent".
1 Then ρ∞ (fn . The following theorem has a slightly simpler proof than the analogous one in the text for equivalent metrics. I will keep to the convention in Reed. The metrics ρp on function spaces are not uniformly equivalent (or even equivalent). We will just need the notion of equivalent. The reference here is the Deﬁnition and Theorem on Reed page 201. x) → 0. then from Deﬁnition15. But fn → f in ( R[−1. since fn → f with respect to ρ1 but not with respect to ρ∞ . Uniformly equivalent metrics. It is suﬃcient to show that ρp for 1 ≤ p < ∞ is equivalent to ρ∞ (since if two metrics are each uniformly equivalent to a third metric. y ∈ M.2. It is easiest to see this for ρ1 and ρ∞ on R[a.3.3. b]). y) = max xi − yi  ≤ ρp (x. Hence fn → f in ( R[−1. However. The metrics ρp on Rn are uniformly equivalent to one another.3.
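The two distances in this example can be computed numerically (with a grid approximation of my own choosing) to see that fn → f in ρ1 but not in ρ∞:

```python
# Illustrative computation for
#   f  = 0 on [-1, 0],  1 on (0, 1]
#   fn = 0 on [-1, 0],  n*x on [0, 1/n],  1 on [1/n, 1]
# showing rho_1(fn, f) = 1/(2n) -> 0 while rho_inf(fn, f) stays near 1.

def f(x):
    return 0.0 if x <= 0 else 1.0

def fn(x, n):
    if x <= 0:
        return 0.0
    return min(n * x, 1.0)

N = 200000
grid = [-1 + 2 * i / N for i in range(N + 1)]

for n in (2, 10, 100):
    g = lambda x, n=n: abs(fn(x, n) - f(x))
    rho1 = sum(g(x) for x in grid) * (2 / N)   # crude Riemann sum
    sup_dist = max(g(x) for x in grid)
    assert abs(rho1 - 1 / (2 * n)) < 1e-2      # area of the small triangle
    assert sup_dist > 0.99                     # sup distance is (essentially) 1
```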
It follows from the previous theorem that (M. Since (xi ) is also a Cauchy sequence in Rn it converges to x (say) in Rn . 1]. Suppose ρ and σ are uniformly equivalent metrics on M.4.4. Proof. In fact if m > n then
1/n 1/n
ρ1 (fn .3 page 178. (C[a. ρ1 ) of Riemann integrable functions on [−1. The Cauchy Completeness Axiom says that (R. then (M. σ) is complete. Since (xi ) converges in Rn it is Cauchy in Rn and hence in M .3. ρ) is complete if every Cauchy sequence from M converges to an element in M. Then it is Cauchy with respect to ρ and hence converges to x (say) with respect to ρ. xn ) ≤ . n ≥ N ⇒ ρ (xm . ρ2 ). By Theorem 15.4. ρp ) is complete for any 1 ≤ p ≤ ∞. To see this. METRIC SPACES
15. ρ) is complete iﬀ (M. To see this.7 to the end of page 204. Definition 15.2. but it does not converge to a member of M . A sequence (xn ) is a Cauchy sequence if for any there is a corresponding N such that m.4. However. then (M. Moreover (Problem 10(a) page 209): Theorem 15. and it follows from Proposition 11. ρ1 ) is not complete. consider any Cauchy sequence (xi ) in (M.5. Definition 15.5.1 in Rp . > 0
It follows from the Deﬁnition in Reed Page 177 that for a sequence of functions “Cauchy in the sup norm” is the same as “Cauchy with respect to the sup metric”. The deﬁnition of a Cauchy sequence is exactly as for Deﬁnition 11. A metric space (M. x ∈ M . To see this. Hence (M. But fn does not converge to a member of this space49 . ρ2 ) is complete. (C[a. Then (M. if M ⊆ Rn is closed. b]. consider the example in Section 15.2. that a sequence is Cauchy with respect to one metric then it is Cauchy with respect to any uniformly equivalent metric. ρ) is complete. It follows as for convergence. b]. fm ) =
0
fn − fm  ≤
0
fn  + fm  ≤ 2/n. ρp ) is not complete.5. contradicting uniqueness of limits in the larger space
.
49 If it did.4. If M ⊆ Rn is not closed. Hence (M. It is fairly easy to see that the sequence (fn ) is Cauchy in the ρ1 metric. Let (xn ) be Cauchy with respect to σ. ρ∞ ) is complete from Reed Theorem 5. More generally. The deﬁnition of a complete metric space is extremely important (Reed page 204). and hence is also Cauchy in (C[a.3.15. ρp ) is complete for any 1 ≤ p ≤ ∞. ρp ) is not complete.4. Since M is closed.
Since fn → f in the larger space (R[−1. ρ2 ) is complete. Example 15.2.3. ρ2 ) is also complete. choose a sequence (xi ) in M such that xi → x ∈ M .3 that (Rn .1. Suppose (M. b]. It follows that (M. σ) is complete. It then follows from the previous theorem that (Rn . The reference here is Reed §5. it would converge to the same function in the larger space. ρp ) is complete for any 1 ≤ p ≤ ∞. xn → x with respect to σ. ρ1 ). 1] with the ρ1 metric. Example 15. it follows that (fn ) is Cauchy in this space.4. Cauchy sequences and completeness.
b]. and in particular no subsequence can converge.4). d) applies to an arbitrary metric space (M. open ball Br (x). ρ) with the same proofs. ρ1 ) and (M2 .1. In this section I will point out that much of what we proved for (Rn . The three dot points there are true for S ⊆ M. The deﬁnitions of neighbourhood. But we will not actually need the material in this course.6. The product of two metric spaces (M1 . Convergent sequences are Cauchy.2 is bounded in C[−1. Chapter 10. The reference here is Reed §5.f.8. Topological notions in metric spaces*.6. The proof is the same as Theorem 10.1. 1] but no subsequence converges.6 and Theorem 10. In Section 16 we will give some important applications of the Contraction Mapping Principle. we will apply it to the complete metric space (C. In particular. ρ2 ) the A1 × A2 is open in (M1 × M2 .7. ρ1 × ρ2 ). A set is closed iﬀ every convergent sequence from the set has its limit in the set.1 applies with exactly the same proof. and open set are analogous to those in Section 10. References at about the right level are “Primer of Modern Analysis” by Kennan T. The deﬁnition of a bounded set is the same as in Section 11.2).5.6.1 page 205. The extensions are straightforward and I include them for completeness and future reference in the Analysis II course. Theorem 10. Chapter 7.5. y2 )2 .2. this is proved easily as for sequences of real numbers. ρ) instead of the closed set S and the Euclidean metric d. The Contraction Mapping Theorem. interior. If A1 is open in (M1 . To see this take x = m . [a. tha analogue of Proposition 11. The sequence (fn ) in Section 15. exterior point and exterior are analogous to those in Section 10. Smith. Then 1 1 n ≥1− = . 15. for any complete metric space (M. interior point. Theorem 11.6 for a function F : S → S where S ⊆ Rp applies with S replaced by M and d replaced by ρ for any metric space (M. METRIC SPACES
15. ρ2 ) is the set M1 × M2 with the metric ρ1 × ρ2 deﬁned by (ρ1 × ρ2 ) (x1 . m 2 2 It follows that no subsequence can be Cauchy. y1 ).7.4.3.1. ρ∞ ).10. and “An Introduction to Analysis” (second edition) by William Wade. fm ) ≥ 2 1 if m ≥ 2n. Bounded sequences do not necessarily have a convergent subsequence (c.2 remains true with Rn replaced by M. (x2 .2 holds (with the origin replaced by any ﬁxed point a∗ ∈ M). convergent sequences are bounded (Proposition 11. y1 )2 + ρ2 (x2 . But Cauchy fm (x) − fn (x) = 1 −
. ρ1 ) and A2 is open in (M2 .15. Theorem 11. The deﬁnition of a contraction in Section 11. Or see Reed Theorem 5.5. The deﬁnitions of boundary point.2 is true with Rn replaced by M. boundary.3. ρ). Cauchy sequences are deﬁned as in Deﬁnition 11. Contraction Mapping Principle. the proof is the same as for Theorem 11. The deﬁnition of closed set is analogous to that in Section 10.5. y2 ) = ρ1 (x1 . Completeness is needed in the proof in order to show that the Cauchy sequence of iterates actually converges to a member of M. In fact it is not too hard to check that 1 ρ∞ (fn .3.
3. A metric space (M. First suppose (M.3.50 If (M. Let A be k 1 the ﬁnite set of points corresponding to ε = k in the deﬁnition of total boundedness. The proof that “compact” implies “sequentially compact” is esentially identical.
. Then A is countable. This is the analogue of Theorems 12. ρ) and (N .1. The proof in the other direction uses the existence of a countable dense subset of M. y) < δ ⇒ σ(f (x). One deﬁnes uniform continuity as in Deﬁnition 13. Next suppose (M.15. ρ) is sequentially compact. Then it is totally bounded again by the result two paragraphs back. This corresponds to the set A in Deﬁnition 12. and usually.2. (Note that what Smith calls “compact” is what is here. If (xn ) is a sequence from M then it has a Cauchy subsequence by the result two paragraphs back. This corresponds to the set A in Deﬁnition 12.1.9. ρ) is complete and totally bounded.e. The proof of the ﬁrst ‘iﬀ” is fairly straightforward. since if x ∈ M then there exist points in A as close to x as we wish. and second we use the fact (Exercise) that if a Cauchy sequence has a convergent subsequence then the Cauchy sequence itself must converge to the same limit as the subsequence. a cover of M by subsets of M that are open in M) has a ﬁnite subcover.2 and 12. the restriction of these sets to A is a cover of A by sets that are open in A). The proof of the second “iﬀ” is similar to that on Theorem 12. A metric space (M.) We say a subset D of M is dense if for each x ∈ M and each ε > 0 there is an x∗ ∈ D such that x ∈ Bε (x∗ ). see Smith Theorem 9. The proof is the same as for Theorem 7. The existence of a countable dense subset of M follows from sequential compactness.1. Then it can be proved a metric space is totally bounded iﬀ every sequence has a Cauchy subsequence. The
50 We have already noted that a sequentially compact set is totally bounded. and this subsequence converges to a limit in M by completeness and so (M. A continuous function deﬁned on a compact set is bounded above and below and has a maximum and a minimum value. ρ) is sequentially compact. It is also dense.1.1. (This replaces the idea of considering those points in Rp all of whose components are rational.2 being compact (although A is covered by sets which are open in Rp . σ) are metric spaces and f : M → N then we say f is continuous at a ∈ M if (xn ) ⊆ M and xn → a ⇒ f (xn ) → f (a). called “sequentially compact”.2. The metric space (M. A subset of a metric space (M. It follows that f is continuous at a iﬀ for every ε > 0 there is a δ > 0 such that for all x. ρ) is sequentially compact (compact) iﬀ it is sequentially compact (compact) when regarded as a metric space with the induced metric from M. where the metric space corresponds to A with the metric induced from Rp . ρ) is totally bounded if for every ε > 0 there is a cover of M by a ﬁnite number of open balls (in M) of radius ε.1 being sequentially compact. METRIC SPACES
sequences will not necesarily converge (to a point in the metric space) unless the metric space is complete.1. ρ) is sequentially compact if every sequence from M has a convergent subsequence (to a point in M). ρ) is compact if every open cover (i. f (y)) < ε.) A metric space is complete and totally bounded iﬀ it is sequentially compact iﬀ it is compact. Let A = ∞ k=1 Ak . To show it is complete suppose the sequence (xn ) is Cauchy: ﬁrst by compactness it has a subsquence which converges to a member of M. Moreoveor it is uniformly continuous. y ∈ M: ρ(x.3.
2. after using the fact that compactness is equivalent to sequential compactness. Limit points.3. and the equivalence of this deﬁnition with the ε–δ characterisation. The proof is essentially the same as for Theorem 13.3.2.2.4. isolated points.
. The proof is the same as in Theorem 13.15. A function f : M → N is continuous iﬀ the inverse image of every open (closed) set in N is open (closed) in M.3. The inverse of a onetoone continuous function deﬁned on a compact set is continuous. the deﬁnition of limit of a function at a point.1 and Theorem 13. The proof is esentially the same as in Theorem 13.1.5.5. are completely analogous to Deﬁnition 13. METRIC SPACES
proof is the same as for Theorem 13. The continuous image of a compact set is compact.1.
. b ∈ M.1. and this vector does not depend on x0 . in the ρ1 metric. where w 1 = Thus in order to prove (24) it is suﬃcient to show (25) Pv
1
≤ (1 − nε) v
1
for all v ∈ H. and it is also necessary to restrict to the subset of Rn consisting of those vectors a such that ai = 1. Then v ∈ H. . Lemma 16. v
1
=2
ηij .1. If all entries in the probability transition matrix P are greater than 0. Then P is a contraction map on M = { a ∈ Rn  ai = 1 }. P x0 . b) = a − b
1
vi = 0 }. P 2 x0 . The vector x∗ is the unique nonzero solution of (P − I)x∗ = 0.2.0.1.
1
ρ1 (P a.2: Theorem 16.1. then the sequence of vectors x0 . b) for all a.
where ηij > 0. SOME APPLICATONS OF THE CONTRACTION MAPPING PRINCIPLE
16. Suppose P is a probability transition matrix all of whose entries are at least ε. converges to a probability vector x∗ . Moreover. We will prove that (24) ρ1 (P a. = v 1. the same results are true even if we just assume P k has all entries nonzero for some integer k > 1. Markov processes. P b) ≤ (1 − nε) ρ1 (a. The proof is rather subtle. since P is only a contraction with respect to the ρ1 metric. . Recall Theorem 4. P b) = P a − P b
= P v 1. b ∈ M. we can write (26) Then Pv
1
v=
ij
ηij (ei − ej ).16. and x0 is a probability vector. with contraction constant 1 − nε.
=
ij
ηij P (ei − ej ) ηij P (ei − ej )
ij
1 1
≤
(by the triangle inequality for =
ij
·
1
and using ηij > 0) (Pki − Pkj ) ek
1
ηij
k
=
ij
ηij
k
Pki − Pkj 
. where H = { v ∈ Rn  Moreover. Suppose a. Let v = a − b. for some ε > 0. . Proof. From the following Lemma. ρ1 (a.
wi . Some applicatons of the Contraction Mapping Principle 16. We will give a proof that uses the Contraction Mapping Principle.
. and write (after renumbering if necessary) v = v1 e1 + · · · + vk+1 ek+1 . . where v1 + · · · + vk+1 = 0. b} for any real numbers a and b. v∗
1
vi = 0.
=2
ηij . as one sees by checking.
Proof.j 1
(from (26))
vi = 0. Suppose v ∈ Rn and form v=
i. SOME APPLICATONS OF THE CONTRACTION MAPPING PRINCIPLE
77
=
ij
ηij
k
Pki + Pkj − 2 min{Pki . Choose p so vp  = max{ v1 .
=2
ηij . this proves the result in this case. Since v∗ has no eq component.3. Pkj }
(since a − b = a + b − 2 min{a. then after renumbering we have v = v1 e1 + v2 e2 = v1 (e1 − e2 ) = v2 (e2 − e1 ).
where ηij > 0. . Then v can be written in the v
1
ηij (ei − ej ).16. Choose q so vq has the opposite sign to vp (this is possible since that (27) vp + vq  = vp  − vq  vi = 0). . the case b ≤ a) =
ij
ηij (2 − 2ε)
(since the columns in P sum to 1 and each entry is ≥ ε ) = (1 − nε) v This completes the proof. If v is in the span of two basis vectors. Assume that v is spanned by k + 1 basis vectors. Write (where indicates that the relevant term is missing from the sum) v = v1 e1 + · · · + vq eq + · · · + (vp + vq )ep + · · · + vkn+1 ek+1 + vq (eq − ep ) = v∗ + vq (eq − ep ). Suppose the claim is true for any v ∈ H which is in the span of k basis vectors. vk+1  }. Since either v1 ≥ 0 and then v 1 = 2v1 . or v2 ≥ 0 and then v 1 = 2v2 . Lemma 16.1. v= ηij (ei − ej ) + vq (eq − ep ) ηij (ei − ej ) + vq (ep − eq ) vq ≥ 0 vq ≤ 0 ηij (ei − ej ). . and the sum of the coeﬃcients of v∗ is we can apply the inductive hypothesis to v∗ to write v∗ = In particular. where ηij > 0. without loss of generality. Note (where v1 + v2 = 0)
since vq  ≤ vp  and since vq has the opposite sign to vp .
Theorem 5.1.1. Then there is a unique continuous function ψ(x) deﬁned on [a. P k is a contraction map with contraction constant λ (say).1. The fact we only needed to assume some power of P was a contraction map can easily be generalised.1.2. in the Contraction Mapping Principle.2. x0 + ρ1 ((P k )m x0 . where 0 ≤ j < k. b]. b] such that
b 1 k
(29) provided λ <
1 M (b−a) . y) ψ(y) dy. From Lemma 16. For the second paragraph we consider a sequence P i (x0 ) with i → ∞. x∗ ) ≤ λm max ρ1 P j x0 .1. Integral Equations. The last paragraph was proved after the statement of Theorem 4.16.
This proves (28) and hence the Lemma. b] × [a. See Reed Section 5. 16.0.1.
. Remark 16. a ≤ y ≤ b}. The ﬁrst term approaches 0 since λm → 0 and the second approaches 0 by the Contraction Mapping Principle applied to the contraction map P k . (28) But v∗
1 1
=2 = v∗
ηij + 2vq .1.1. x∗ ) ≤ ρ1 (P mk+j x0 .
0≤j<k
(Note that m → ∞ as i → ∞). Instead of the long proof in Reed of the main theorem. Proof of Theorem 16. Theorem 11.2.4 for discussion. we see here that it is a consequence of the Contraction Mapping Theorem. y)  a ≤ x ≤ b. That is. x∗ ) ≤ ρ1 (P k )m P j x0 . + 2vq .2. For any natural number i we can write i = mk + j. P mk x0 ) + ρ1 (P mk x0 .6. Then ρ1 (P i x0 . The ﬁrst paragraph of the theorem follows from the Contraction Mapping Principle and Lemma 16. y) is continuous on [a. if we assume F ◦ · · · ◦ F is a contraction map for some k then there is still a unique ﬁxed point.e.2. SOME APPLICATONS OF THE CONTRACTION MAPPING PRINCIPLE
78
All that remains to be proved is v i. x∗ ) = ρ1 (P mk+j x0 . b] and M = max{K(x. x∗ ) ≤ λm ρ1 P j x0 . (P k )m x0 + ρ1 ((P k )m x0 .
v
1
1
= v1  + · · · + vq  + · · · + vp + vq  + · · · + vk+1  = (v1  + · · · + vk+1 ) − vq  − vp  + vp + vq  = (v1  + · · · + vk+1 ) − 2vq  = v
1
by (27)
− 2vq . Let F (x) be a continuous function deﬁned on [a. The proof is similar to that above for P . x∗ ). Theorem 16. Suppose K(x.
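The convergence in Theorem 16.1.1 is easy to observe numerically. The following Python sketch is an illustration only, not part of the notes; the matrix P is a made-up example. It iterates a column-stochastic matrix with all entries ≥ ε and checks the contraction estimate (24) along the way.

```python
import numpy as np

# Hypothetical 3x3 probability transition matrix: columns sum to 1 and every
# entry is at least eps = 0.1, so Theorem 16.1.2 gives contraction constant
# 1 - n*eps = 0.7 in the rho_1 (l^1) metric.
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
n = P.shape[0]
eps = P.min()

x = np.array([1.0, 0.0, 0.0])        # initial probability vector x0
for _ in range(100):
    x_new = P @ x
    # contraction estimate (24) with a = x, b = Px (both have coordinate sum 1)
    assert np.abs(x_new - P @ x_new).sum() <= (1 - n*eps) * np.abs(x - x_new).sum() + 1e-12
    x = x_new

assert np.allclose(P @ x, x)         # x is (numerically) the fixed point x*
assert np.isclose(x.sum(), 1.0)      # and is still a probability vector
```

For this particular P, which happens to be doubly stochastic, the limit is the uniform vector (1/3, 1/3, 1/3), whatever probability vector x0 we start from.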
16.2. Integral Equations. See Reed Chapter 5 for discussion. Instead of the long proof in Reed of the main theorem, we see here that it is a consequence of the Contraction Mapping Principle.

Theorem 16.2.1. Suppose K(x, y) is continuous on [a, b] × [a, b], and let M = max{ |K(x, y)| : a ≤ x ≤ b, a ≤ y ≤ b }. Let f(x) be a continuous function defined on [a, b]. Then there is a unique continuous function ψ(x) defined on [a, b] such that

(29)   ψ(x) = f(x) + λ ∫ₐᵇ K(x, y) ψ(y) dy,

provided |λ| < 1/(M(b − a)).

Proof. We will use the Contraction Mapping Principle on the complete metric space (C[a, b], ρ∞). Define T : C[a, b] → C[a, b] by

(30)   (Tψ)(x) = f(x) + λ ∫ₐᵇ K(x, y) ψ(y) dy.

Note that if ψ ∈ C[a, b], then Tψ is certainly a function defined on [a, b]: the value of Tψ at x ∈ [a, b] is obtained by evaluating the right side of (30), and for fixed x the integrand in (30) is a continuous function of y, so the integral exists. Moreover, ψ is a fixed point of T iff ψ solves (29).

We next claim that Tψ is continuous. The first term on the right side of (30), being f, is continuous. To see that the second term is also continuous, let G(x, y) = K(x, y) ψ(y) and let g(x) = ∫ₐᵇ G(x, y) dy. First note that

(31)   |g(x1) − g(x2)| = |∫ₐᵇ (G(x1, y) − G(x2, y)) dy| ≤ ∫ₐᵇ |G(x1, y) − G(x2, y)| dy.

Now suppose ε > 0. Since G(x, y) is continuous on the closed bounded set [a, b] × [a, b], it is uniformly continuous there (a continuous function on a compact set is uniformly continuous). Hence there is a δ > 0 such that

|(x1, y1) − (x2, y2)| < δ  ⇒  |G(x1, y1) − G(x2, y2)| < ε.

Using this, it follows from (31) that

|x1 − x2| < δ  ⇒  |g(x1) − g(x2)| < ε(b − a).

Hence g is uniformly continuous on [a, b]. This completes the proof that Tψ is continuous (in fact uniformly continuous) on [a, b], and so indeed T : C[a, b] → C[a, b].

We next claim that T is a contraction map on C[a, b]. To see this we estimate

|Tψ1(x) − Tψ2(x)| = |λ| |∫ₐᵇ K(x, y) (ψ1(y) − ψ2(y)) dy|
   ≤ |λ| ∫ₐᵇ |K(x, y)| |ψ1(y) − ψ2(y)| dy
   ≤ |λ| M (b − a) ρ∞(ψ1, ψ2).

Since this is true for every x ∈ [a, b], it follows that ρ∞(Tψ1, Tψ2) ≤ |λ| M (b − a) ρ∞(ψ1, ψ2). By the assumption on λ, the contraction ratio |λ| M (b − a) is < 1, and so T is a contraction map.

Because (C[a, b], ρ∞) is a complete metric space, it follows that T has a unique fixed point, and so there is a unique continuous function ψ solving (29).

Remark 16.2.2. Note that we also have a way of approximating the solution. We can begin with some function such as ψ0(x) = 0 for a ≤ x ≤ b, and then successively apply T to find better and better approximations to the solution ψ. In practice, we apply some type of numerical integration to find approximations to Tψ0, T²ψ0, T³ψ0, . . . .
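Remark 16.2.2 can be tried out directly. Here is a minimal Python sketch (not part of the notes) for the hypothetical special case K(x, y) = 1 and f(x) = 1 on [0, 1] with λ = 1/2: then M = 1, so |λ| < 1/(M(b − a)) holds, and the exact solution of (29) is the constant ψ(x) = 2, since c = 1 + c/2 gives c = 2.

```python
import numpy as np

a, b, lam = 0.0, 1.0, 0.5
xs = np.linspace(a, b, 401)          # grid on [a, b]
f = np.ones_like(xs)                 # f(x) = 1

psi = np.zeros_like(xs)              # start from psi_0 = 0
for _ in range(60):                  # successively apply T
    # int_a^b K(x,y) psi(y) dy with K = 1, via a left Riemann sum;
    # the iterates here are constant functions, so the sum is exact
    integral = psi[:-1].sum() * (xs[1] - xs[0])
    psi = f + lam * integral         # (T psi)(x) = f(x) + lam * integral

assert np.allclose(psi, 2.0)         # converged to the exact solution
```

Each application of T cuts the ρ∞ distance to the solution by the contraction ratio |λ|M(b − a) = 1/2, so 60 iterations are far more than enough.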
16.3. Differential Equations. In this section we will prove the Fundamental Existence and Uniqueness Theorem for Differential Equations. You should first read Chapter 7 of Reed up to the beginning of Theorem 7.1.1. We will prove Theorem 7.1.1 of Reed, but with a simpler proof using the Contraction Mapping Principle.

Theorem 16.3.1. Let f be continuously differentiable on the square S = [t0 − δ, t0 + δ] × [y0 − δ, y0 + δ]. Then there is a T ≤ δ, and a unique continuously differentiable function y(t) defined on [t0 − T, t0 + T], such that

(32)   dy/dt = f(t, y),   y(t0) = y0.

Remark 16.3.2. [Figure: the square S centred at (t0, y0), with the solution curve through (t0, y0).] The diagram shows the square S centred at (t0, y0) and illustrates the situation. We are looking for the unique curve through the point (t0, y0) which at every point is tangent to the line segment of slope f(t, y). Let

M = max{ |f(t, y)| : (t, y) ∈ S },

which exists since f is continuous and hence bounded on S. From the diagram, the absolute value of the slope of the solution will be ≤ M. Note that the solution through (t0, y0) may "escape" through the top of the square S, and this is why we may need to take T < δ. For this reason, we will require that MT ≤ δ, i.e. T ≤ δ/M.

Proof of Theorem.

Step 1: We first reduce the problem to an equivalent integral equation (not the same one as in the last section, however). It follows by integrating (32) from t0 to t that any solution y(t) of (32) satisfies

(33)   y(t) = y0 + ∫ₜ₀ᵗ f(s, y(s)) ds.

Conversely, any continuous function y(t) satisfying (33) is differentiable (by the Fundamental Theorem of Calculus) and its derivative satisfies the differential equation in (32); it also satisfies the initial condition y(t0) = y0 (why?). Note the gain here: the function y(t) must be differentiable for the differential equation even to be defined, whereas a merely continuous solution of (33) is automatically differentiable, and then, from the differential equation, dy/dt = f(t, y(t)) is in fact continuous.

Step 2: We next show that solving (33) is the same as finding a certain "fixed point". If y(t) is a continuous function defined on [t0 − T, t0 + T], let Fy (often written F(y)) be the function defined by

(34)   (Fy)(t) = y0 + ∫ₜ₀ᵗ f(s, y(s)) ds.

For the integral to be defined, we require that (s, y(s)) belong to the square S, and for this reason we will need to restrict to functions y(t) defined on some sufficiently small interval [t0 − T, t0 + T]. More precisely, we impose the restriction on T that MT ≤ δ, i.e.

(35)   T ≤ δ/M.

Since |f(t, y)| ≤ M for (t, y) ∈ S, it follows from (34) that |(Fy)(t) − y0| ≤ M|t − t0|. Hence if the graph of y(t) is in S, then so is the graph of (Fy)(t), provided t satisfies M|t − t0| ≤ δ, which is ensured by (35). Since f(s, y(s)) is continuous, being a composition of continuous functions, it follows from the Fundamental Theorem of Calculus that (Fy)(t) is differentiable, and in particular continuous. Hence

F : C[t0 − T, t0 + T] → C[t0 − T, t0 + T]

(restricted to those functions whose graphs lie in S). Moreover, it follows from the definition (34) that y(t) is a fixed point of F iff y(t) is a solution of (33) on [t0 − T, t0 + T], and hence iff y(t) is a solution of (32) on [t0 − T, t0 + T].

Step 3: We next impose a further restriction on T in order that F be a contraction map. For this we need the fact that, since ∂f/∂y is continuous on S, there is a constant K such that

(t, y) ∈ S  ⇒  |∂f/∂y (t, y)| ≤ K.

We now compute, for any t ∈ [t0 − T, t0 + T] and any y1, y2 ∈ C[t0 − T, t0 + T]:

|Fy1(t) − Fy2(t)| = |∫ₜ₀ᵗ ( f(s, y1(s)) − f(s, y2(s)) ) ds|
   ≤ |∫ₜ₀ᵗ |f(s, y1(s)) − f(s, y2(s))| ds|
   ≤ |∫ₜ₀ᵗ K |y1(s) − y2(s)| ds|   (by the Mean Value Theorem)
   ≤ K T ρ∞(y1, y2),

since |t − t0| ≤ T and |y1(s) − y2(s)| ≤ ρ∞(y1, y2). Since t is any point in [t0 − T, t0 + T], it follows that ρ∞(Fy1, Fy2) ≤ KT ρ∞(y1, y2). We now make the second restriction on T, that

(36)   T ≤ 1/(2K).

This guarantees that F is a contraction map on C[t0 − T, t0 + T] with contraction constant 1/2. It follows that F has a unique fixed point, and this is then the unique solution of (32).

Remark 16.3.3. In fact more is true. The right side of the differential equation is differentiable (why?) and its derivative is even continuous (why?); it follows that dy/dt is differentiable with continuous derivative, i.e. y(t) is in fact twice continuously differentiable. If f is infinitely differentiable, it can similarly be shown that y(t) is also infinitely differentiable.

Remark 16.3.4. It appears from the diagram that we do not really need the second restriction (36) on T; we should only need (35). This is indeed the case. One can show, by repeatedly applying the previous theorem, that the solution can be continued until it escapes through the top, bottom or sides of S; in fact this works for much more general sets S. In particular, if the function f(t, y) and the differential equation are defined for all (t, y) ∈ R², then the solution will either exist for all time, or will approach +∞ or −∞ at some finite time t*. For more discussion see Reed Chapter 7.

Remark 16.3.5. The same proof, with only notational changes, works for general first order systems of differential equations of the form

dy1/dt = f1(t, y1, y2, . . . , yn)
dy2/dt = f2(t, y1, y2, . . . , yn)
   . . .
dyn/dt = fn(t, y1, y2, . . . , yn).

Second and higher order differential equations can be reduced to first order systems by introducing new variables for the lower order derivatives. For example, the second order differential equation y″ = F(t, y, y′) is equivalent to the first order system

y1′ = y2,   y2′ = F(t, y1, y2).

To see this: if y1(t), y2(t) solve the first order system, let y = y1; then y′ = y2 and y(t) solves the second order differential equation. Conversely, suppose y(t) is a solution of the given second order differential equation and let y1 = y, y2 = y′; then clearly y1(t), y2(t) solve the first order system.

It follows from the previous remark that a similar Existence and Uniqueness Theorem applies to second order differential equations, provided we specify both y(t0) and y′(t0). A similar remark applies to nth order differential equations, except that one must specify the first n − 1 derivatives at t0, and similar remarks also apply to higher order systems of differential equations. There are no new ideas involved, just notation!
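The iteration in the proof, i.e. successively applying F from (34), is the classical Picard iteration and can be carried out numerically. A Python sketch (illustration only, not part of the notes) for the hypothetical initial value problem dy/dt = y, y(0) = 1 on [0, 1/2], whose exact solution is eᵗ:

```python
import numpy as np

T, N = 0.5, 1000
ts = np.linspace(0.0, T, N + 1)
dt = ts[1] - ts[0]

y = np.ones_like(ts)                          # y_0(t) = y0 = 1
for _ in range(30):
    # (Fy)(t) = y0 + int_{t0}^t f(s, y(s)) ds, with f(s, y) = y,
    # using the cumulative trapezoidal rule for the integral
    cumint = np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * dt)))
    y = 1.0 + cumint

assert np.allclose(y, np.exp(ts), atol=1e-5)  # matches the exact solution e^t
```

Since f(t, y) = y has ∂f/∂y = 1, restriction (36) with K = 1 allows T = 1/2 here, and each iteration halves the ρ∞ distance to the solution.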
17. Differentiation of Real-Valued Functions

I have included quite a lot of additional material. What is required is just a basic understanding of the ideas and their application to straightforward examples. You need only consider Subsections 1–5, 7, 8 and 10.

17.1. Introduction. In this section we discuss the notion of derivative (i.e. differential) for functions f : D (⊂ Rⁿ) → R. In the next chapter we consider the case of functions f : D (⊂ Rⁿ) → Rᵐ; no essentially new ideas are involved there.

If m = 1 and n = 1 or 2, we can sometimes represent such a function by drawing its graph. In case n = 2 (or perhaps n = 3) we can also draw the level sets, as is done in Section 17.6.

Note that the definition of "differentiable" on Reed page 155 is completely nonstandard; the notion Reed defines is usually called "continuously differentiable", and it is the same as what is here, and elsewhere, given that name.

Convention. Unless stated otherwise, we consider functions f : D (⊂ Rⁿ) → R where the domain D is open. This implies that for any x ∈ D there exists r > 0 such that Br(x) ⊂ D. [Figure: an open domain D with a ball Br(x) ⊂ D about a point x.] Most of the following applies to more general domains D by taking one-sided, or otherwise restricted, limits.

17.2. Algebraic Preliminaries. The inner product in Rⁿ is represented by

y · x = y1 x1 + · · · + yn xn,

where y = (y1, . . . , yn) and x = (x1, . . . , xn). For each fixed y ∈ Rⁿ the inner product enables us to define a linear function Ly = L : Rⁿ → R given by L(x) = y · x. Conversely, we have the following, for future reference.

Proposition 17.2.1. For any linear function L : Rⁿ → R there exists a unique y ∈ Rⁿ such that

(37)   L(x) = y · x   ∀x ∈ Rⁿ.

The components of y are given by yi = L(ei).

Proof. Suppose L : Rⁿ → R is linear. Define y = (y1, . . . , yn) by yi = L(ei), i = 1, . . . , n. Then

L(x) = L(x1 e1 + · · · + xn en) = x1 L(e1) + · · · + xn L(en) = x1 y1 + · · · + xn yn = y · x.

This proves the existence of y satisfying (37). The uniqueness of y follows from the fact that if (37) is true for some y, then on choosing x = ei it follows we must have yi = L(ei), i = 1, . . . , n.

Note that if L is the zero operator, i.e. if L(x) = 0 for all x ∈ Rⁿ, then the vector y corresponding to L is the zero vector.

17.3. Partial Derivatives.

Definition 17.3.1. The ith partial derivative of f at x is defined by

(38)   ∂f/∂xi (x) = lim(t→0) [ f(x + t ei) − f(x) ] / t
        = lim(t→0) [ f(x1, . . . , xi + t, . . . , xn) − f(x1, . . . , xi, . . . , xn) ] / t,

provided the limit exists. The notation Di f(x) is also used.

Thus ∂f/∂xi (x) is just the usual derivative at t = 0 of the real-valued function g defined by g(t) = f(x1, . . . , xi + t, . . . , xn). Think of g as being defined along the line L through x in the direction ei, with t = 0 corresponding to the point x. [Figure: the domain D, with the line L through x and the point x + t ei.]

17.4. Directional Derivatives.

Definition 17.4.1. The directional derivative of f at x in the direction v ≠ 0 is defined by

(39)   Dv f(x) = lim(t→0) [ f(x + t v) − f(x) ] / t,

provided the limit exists. [Figure: the domain D, with the line L through x in the direction v and the point x + t v.]

Note that Dv f(x) is just the usual derivative at t = 0 of the real-valued function g defined by g(t) = f(x + t v); as before, think of g as being defined along the line L in the previous diagram.
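Proposition 17.2.1 above is easy to check numerically: given any linear L, reading off yi = L(ei) recovers the representing vector. A small Python sketch with a made-up linear map (illustration only, not part of the notes):

```python
import numpy as np

def L(x):
    # hypothetical linear function L : R^3 -> R
    return 3.0*x[0] - x[1] + 2.0*x[2]

n = 3
y = np.array([L(np.eye(n)[i]) for i in range(n)])   # y_i = L(e_i)
assert np.allclose(y, [3.0, -1.0, 2.0])

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.normal(size=n)
    assert np.isclose(L(x), y @ x)                  # L(x) = y . x, as in (37)
```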
It follows immediately from the definitions that

(40)   ∂f/∂xi (x) = Dei f(x).

Thus we interpret Dv f(x) as the rate of change of f at x in the direction v, at least in the case where v is a unit vector.

Exercise: Show that Dαv f(x) = α Dv f(x) for any real number α.

17.5. The Differential (or Derivative).

Motivation. Suppose f : I (⊂ R) → R is differentiable at a ∈ I. Then f′(a) can be used to define the best linear approximation to f(x) for x near a, namely:

(41)   f(x) ≈ f(a) + f′(a)(x − a).

Note that the right-hand side of (41) is linear in x. (More precisely, the right side is a polynomial in x of degree one.)

[Figure: the graph of x ↦ f(x) and the graph of x ↦ f(a) + f′(a)(x − a), with the error f(x) − (f(a) + f′(a)(x − a)) marked at x.]

The error, or difference between the two sides of (41), approaches zero as x → a, faster than |x − a| → 0. More precisely,

(42)   |f(x) − (f(a) + f′(a)(x − a))| / |x − a| = | [f(x) − f(a)]/(x − a) − f′(a) | → 0 as x → a.

We make this the basis for the next definition in the case n > 1.

Definition 17.5.1. Suppose f : D (⊂ Rⁿ) → R. Then f is differentiable at a ∈ D if there is a linear function L : Rⁿ → R such that

(43)   |f(x) − (f(a) + L(x − a))| / |x − a| → 0 as x → a.

The linear function L is denoted by f′(a) or df(a) and is called the derivative or differential of f at a. (We will see in Proposition 17.5.2 that if L exists, it is uniquely determined by this definition; temporarily, we let df(a) be any linear map satisfying the definition.)

[Figure: the graph of x ↦ f(x) and the graph of x ↦ f(a) + L(x − a), with the error f(x) − (f(a) + L(x − a)) marked at x.]

The idea is that the graph of x ↦ f(a) + L(x − a) is "tangent" to the graph of f(x) at the point a. We think of df(a) as a linear transformation (or function) which operates on vectors x − a whose "base" is at a.

Notation: We write ⟨df(a), x − a⟩ for L(x − a), and read this as "df at a applied to x − a".

The next proposition gives the connection between the differential operating on a vector v, and the directional derivative in the direction corresponding to v. In particular, it shows that the differential is uniquely defined by Definition 17.5.1.
Proposition 17.5.2. Let v ∈ Rⁿ and suppose f is differentiable at a. Then Dv f(a) exists and

⟨df(a), v⟩ = Dv f(a).

In particular, the differential is unique.

Proof. Let x = a + t v in (43). Then

lim(t→0) |f(a + t v) − f(a) − ⟨df(a), t v⟩| / ‖t v‖ = 0   (t ≠ 0),

and hence, using the linearity of df(a),

lim(t→0) [ f(a + t v) − f(a) ] / t − ⟨df(a), v⟩ = 0.

Thus Dv f(a) = ⟨df(a), v⟩, as required. Since the directional derivatives do not depend on the choice of L, the differential is unique.

The next result shows df(a) is the linear map given by the row vector of partial derivatives of f at a.

Corollary 17.5.3. Suppose f is differentiable at a. Then df(a) is the linear map corresponding to the row vector

( ∂f/∂x1 (a), . . . , ∂f/∂xn (a) ),

i.e. for any vector v,

⟨df(a), v⟩ = Σi vi ∂f/∂xi (a).

Proof.

⟨df(a), v⟩ = ⟨df(a), v1 e1 + · · · + vn en⟩
   = v1 ⟨df(a), e1⟩ + · · · + vn ⟨df(a), en⟩
   = v1 De1 f(a) + · · · + vn Den f(a)
   = v1 ∂f/∂x1 (a) + · · · + vn ∂f/∂xn (a).

Example 17.5.4. Let f(x, y, z) = x² + 3xy² + y³z + z. Then

⟨df(a), v⟩ = v1 ∂f/∂x (a) + v2 ∂f/∂y (a) + v3 ∂f/∂z (a)
   = v1 (2a1 + 3a2²) + v2 (6a1 a2 + 3a2² a3) + v3 (a2³ + 1).

Thus df(a) is the linear map corresponding to the row vector (2a1 + 3a2², 6a1 a2 + 3a2² a3, a2³ + 1). If a = (1, 0, 1) then ⟨df(a), v⟩ = 2v1 + v3; thus df(a) is the linear map corresponding to the row vector (2, 0, 1). If moreover v = e1, then ⟨df(1, 0, 1), e1⟩ = ∂f/∂x (1, 0, 1) = 2.

Definition 17.5.5 (Rates of Convergence). If a function ψ(x) has the property that

|ψ(x)| / |x − a| → 0 as x → a,

then we say "ψ(x) → 0 as x → a, faster than |x − a| → 0". We write o(|x − a|) for ψ(x), and read this as "little oh of |x − a|". If

|ψ(x)| / |x − a| ≤ M for some M and all |x − a| < ε,

i.e. if |ψ(x)| / |x − a| is bounded as x → a, then we say "ψ(x) → 0 as x → a, at least as fast as |x − a| → 0". We write O(|x − a|) for ψ(x), and read this as "big oh of |x − a|".

For example, we can write o(|x − a|) for |x − a|^(3/2), and O(|x − a|) for sin(x − a). Clearly, if ψ(x) can be written as o(|x − a|), then it can also be written as O(|x − a|); the converse may not be true, as the above examples show.

The next proposition gives an equivalent definition for the differential of a function.

Proposition 17.5.6. If f is differentiable at a, then

f(x) = f(a) + ⟨df(a), x − a⟩ + ψ(x), where ψ(x) = o(|x − a|).

Conversely, if

f(x) = f(a) + L(x − a) + ψ(x), where L : Rⁿ → R is linear and ψ(x) = o(|x − a|),

then f is differentiable at a and df(a) = L.

Proof. Suppose f is differentiable at a, and let ψ(x) = f(x) − (f(a) + ⟨df(a), x − a⟩). Then f(x) = f(a) + ⟨df(a), x − a⟩ + ψ(x), and ψ(x) = o(|x − a|) by Definition 17.5.1.

Conversely, suppose f(x) = f(a) + L(x − a) + ψ(x), where L is linear and ψ(x) = o(|x − a|). Then

|f(x) − (f(a) + L(x − a))| / |x − a| = |ψ(x)| / |x − a| → 0 as x → a,

and so f is differentiable at a and df(a) = L.

Finally we have:

Proposition 17.5.7. If f, g : D (⊂ Rⁿ) → R are differentiable at a ∈ D, then so are αf and f + g. Moreover,

d(αf)(a) = α df(a),   d(f + g)(a) = df(a) + dg(a).

Proof. This is straightforward (exercise) from Proposition 17.5.6.

The previous proposition corresponds to the fact that the partial derivatives for f + g are the sums of the partial derivatives corresponding to f and g respectively; similarly for αf.⁵¹

⁵¹ We cannot establish the differentiability of f + g (or αf) this way, since the existence of the partial derivatives does not imply differentiability.
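The row-vector formula of Corollary 17.5.3 can be checked against difference quotients for the function of Example 17.5.4. A Python sketch (illustration only, not part of the notes):

```python
import numpy as np

def f(x):
    x1, x2, x3 = x
    return x1**2 + 3*x1*x2**2 + x2**3*x3 + x3

a = np.array([1.0, 0.0, 1.0])
row = np.array([2.0, 0.0, 1.0])          # df(a) from the example: <df(a), v> = 2 v1 + v3

rng = np.random.default_rng(1)
t = 1e-6
for _ in range(5):
    v = rng.normal(size=3)
    dv = (f(a + t*v) - f(a)) / t         # difference quotient approximating D_v f(a)
    assert abs(dv - row @ v) < 1e-3      # agrees with <df(a), v>
```

Of course this only probes directional derivatives; as Section 17.7 shows, their existence alone does not prove differentiability.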
17.6. The Gradient. Strictly speaking, df(a) is a linear operator on vectors in Rⁿ (where, for convenience, we think of these vectors as having their "base at a"). We saw in Section 17.2 that every linear operator from Rⁿ to R corresponds to a unique vector in Rⁿ. The vector corresponding to the differential at a is called the gradient at a.

Definition 17.6.1. Suppose f is differentiable at a. The vector ∇f(a) ∈ Rⁿ (uniquely) determined by

∇f(a) · v = ⟨df(a), v⟩   ∀v ∈ Rⁿ

is called the gradient of f at a.

Proposition 17.6.2. Suppose f is differentiable at a. Then

∇f(a) = ( ∂f/∂x1 (a), . . . , ∂f/∂xn (a) ).

Proof. From Definition 17.6.1 and Proposition 17.2.1 it follows that the components of ∇f(a) are ⟨df(a), ei⟩ = ∂f/∂xi (a).

Example 17.6.3. For the example in Section 17.5 we have

∇f(a) = (2a1 + 3a2², 6a1 a2 + 3a2² a3, a2³ + 1),   ∇f(1, 0, 1) = (2, 0, 1).

Proposition 17.6.4. Suppose f is differentiable at x. Then the directional derivatives at x are given by

Dv f(x) = v · ∇f(x)   ∀v ∈ Rⁿ.

The unit vector v for which this is a maximum is v = ∇f(x)/‖∇f(x)‖ (assuming ‖∇f(x)‖ ≠ 0), and the directional derivative in this direction is ‖∇f(x)‖.

Proof. From Definition 17.6.1 and Proposition 17.5.2 it follows that

∇f(x) · v = ⟨df(x), v⟩ = Dv f(x).

This proves the first claim. Now suppose v is a unit vector. From the Cauchy-Schwarz Inequality we have

(44)   ∇f(x) · v ≤ ‖∇f(x)‖.

By the condition for equality in the Cauchy-Schwarz Inequality, equality holds in (44) iff v is a positive multiple of ∇f(x); since v is a unit vector, this is equivalent to v = ∇f(x)/‖∇f(x)‖. The left side of (44) is then ‖∇f(x)‖.

Definition 17.6.5. If f : Rⁿ → R then the level set through x is { y : f(y) = f(x) }. For example, the contour lines on a map are the level sets of the height function.

[Figure: level sets of f(x) = x1² + x2² (circles x1² + x2² = c; the graph of f is a paraboloid) and level sets of f(x) = x1² − x2² (hyperbolas x1² − x2² = c; the graph of f looks like a saddle), with the gradient vectors ∇f(x) drawn orthogonal to the level sets.]

Definition 17.6.6. A vector v is tangent at x to the level set S through x if Dv f(x) = 0. This is a reasonable definition, since f is constant on S, and so the rate of change of f in any direction tangent to S should be zero.

Proposition 17.6.7. Suppose f is differentiable at x. Then ∇f(x) is orthogonal to all vectors which are tangent at x to the level set through x.

Proof. This is immediate from the previous Definition and Proposition 17.6.4.

In the sense of the previous proposition, we say ∇f(x) is orthogonal to the level set through x.
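The steepest-ascent statement in Proposition 17.6.4 can be seen numerically for the saddle f(x1, x2) = x1² − x2² from the level-set sketch. A Python sketch (illustration only, not part of the notes): sample many unit vectors v and check that v · ∇f(x) is largest when v points along ∇f(x).

```python
import numpy as np

x = np.array([1.0, 2.0])
grad = np.array([2*x[0], -2*x[1]])       # grad f(x) = (2 x1, -2 x2) for f = x1^2 - x2^2

# unit vectors v and the corresponding directional derivatives v . grad f(x)
thetas = np.linspace(0.0, 2*np.pi, 2000, endpoint=False)
vs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
dirderivs = vs @ grad

# the maximum is attained (approximately) at v = grad/|grad|, with value |grad|
best = vs[np.argmax(dirderivs)]
assert np.allclose(best, grad / np.linalg.norm(grad), atol=0.01)
assert abs(dirderivs.max() - np.linalg.norm(grad)) < 1e-3
```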
17.7. Some Interesting Examples.

Example 17.7.1. An example where the partial derivatives exist but the other directional derivatives do not exist. Let

f(x, y) = (xy)^(1/3).

Then:
1. ∂f/∂x (0, 0) = 0, since f = 0 on the x-axis.
2. ∂f/∂y (0, 0) = 0, since f = 0 on the y-axis.
3. Let v = (v1, v2) be any vector with v1 ≠ 0 and v2 ≠ 0. Then

Dv f(0, 0) = lim(t→0) [ f(tv) − f(0, 0) ] / t = lim(t→0) t^(2/3) (v1 v2)^(1/3) / t = lim(t→0) (v1 v2)^(1/3) / t^(1/3),

and this limit does not exist. Thus Dv f(0, 0) fails to exist unless v1 = 0 or v2 = 0.

Example 17.7.2. An example where the directional derivatives at some point all exist, but the function is not differentiable at the point. Let

f(x, y) = x y² / (x² + y⁴)   if (x, y) ≠ (0, 0),   f(0, 0) = 0.

Let v = (v1, v2) be any non-zero vector. Then

Dv f(0, 0) = lim(t→0) [ f(tv) − f(0, 0) ] / t
   = lim(t→0) [ t³ v1 v2² / (t² v1² + t⁴ v2⁴) ] / t
   = lim(t→0) v1 v2² / (v1² + t² v2⁴)

(45)   = v2²/v1 if v1 ≠ 0,   and = 0 if v1 = 0.

Thus the directional derivatives Dv f(0, 0) exist for all v, and are given by (45). In particular,

(46)   ∂f/∂x (0, 0) = ∂f/∂y (0, 0) = 0.

But if f were differentiable at (0, 0), then we could compute any directional derivative from the partial derivatives: for any vector v we would have

Dv f(0, 0) = ⟨df(0, 0), v⟩ = v1 ∂f/∂x (0, 0) + v2 ∂f/∂y (0, 0) = 0, from (46).

This contradicts (45) (take any v with v1 ≠ 0 and v2 ≠ 0).

Example 17.7.3. An example where the directional derivatives at a point all exist, but the function is not continuous at the point. Take the same example as in Example 17.7.2, and approach the origin along the curve x = λ², y = λ. Then

lim(λ→0) f(λ², λ) = lim(λ→0) λ⁴ / 2λ⁴ = 1/2.

But if we approach the origin along any straight line of the form (λv1, λv2), then we can check that the corresponding limit is 0. Thus it is impossible to define f at (0, 0) in order to make f continuous there; in particular, f is not continuous at (0, 0).

17.8. Differentiability Implies Continuity. Despite Example 17.7.3, we have the following result.

Proposition 17.8.1. If f is differentiable at a, then it is continuous at a.

Proof. Suppose f is differentiable at a. Then, by Proposition 17.5.6 and Corollary 17.5.3,

f(x) = f(a) + Σi ∂f/∂xi (a) (xi − ai) + o(|x − a|).

Since xi − ai → 0 and o(|x − a|) → 0 as x → a, it follows that f(x) → f(a) as x → a. That is, f is continuous at a.

17.9. Mean Value Theorem and Consequences.

Theorem 17.9.1. Suppose f is continuous at all points on the line segment L joining a and a + h, and is differentiable at all points on L, except possibly at the end points. Then for some x ∈ L, x not an endpoint of L,

(47)   f(a + h) − f(a) = ⟨df(x), h⟩
(48)      = Σi ∂f/∂xi (x) hi.

[Figure: the segment L from a = g(0) to a + h = g(1).]

Proof. Note that (48) follows immediately from (47) by Corollary 17.5.3. Define the one-variable function g by

g(t) = f(a + t h).

Then g is continuous on [0, 1] (being the composition of the continuous functions t ↦ a + t h and x ↦ f(x)), and

(49)   g(0) = f(a),   g(1) = f(a + h).

We next show that g is differentiable and compute its derivative. If 0 < t < 1, then f is differentiable at a + t h, and so

(50)   0 = lim(w→0) |f(a + t h + w) − f(a + t h) − ⟨df(a + t h), w⟩| / ‖w‖.

Let w = s h, where s is a small real number, positive or negative. Then ‖w‖ = |s| ‖h‖, and we may assume h ≠ 0 (as otherwise (47) is trivial). We see from (50) that

0 = lim(s→0) |f(a + (t + s)h) − f(a + t h) − ⟨df(a + t h), s h⟩| / |s| ‖h‖,

and so, using the linearity of df(a + t h),

0 = lim(s→0) [ g(t + s) − g(t) ] / s − ⟨df(a + t h), h⟩.

Hence g′(t) exists for 0 < t < 1, and moreover

(51)   g′(t) = ⟨df(a + t h), h⟩.

By the usual Mean Value Theorem for a function of one variable, applied to g,

(52)   g(1) − g(0) = g′(t)   for some t ∈ (0, 1).

Substituting (49) and (51) in (52), the required result (47) follows.

Corollary 17.9.2. Assume the hypotheses of the previous theorem and suppose ‖∇f(x)‖ ≤ M for all x ∈ L. Then

|f(a + h) − f(a)| ≤ M ‖h‖.

In other words: if the norm of the gradient vector of f is bounded by M, then it is not surprising that the difference in value between f(a) and f(a + h) is bounded by M ‖h‖.

Proof. From the previous theorem,

|f(a + h) − f(a)| = |⟨df(x), h⟩|   for some x ∈ L
   = |∇f(x) · h| ≤ ‖∇f(x)‖ ‖h‖ ≤ M ‖h‖.

Corollary 17.9.3. Suppose Ω ⊂ Rⁿ is open and connected and f : Ω → R. Suppose f is differentiable in Ω and df(x) = 0 for all x ∈ Ω.⁵² Then f is constant on Ω.

Proof. Choose any a ∈ Ω and suppose f(a) = α. Let

E = { x ∈ Ω : f(x) = α }.

Then E is non-empty (as a ∈ E). We will prove E is both open and closed in Ω. Since Ω is connected, this will imply that E is all of Ω, i.e. f is constant (= α) on Ω.

To see E is open,⁵³ suppose x ∈ E and choose r > 0 so that Br(x) ⊂ Ω. If y ∈ Br(x), then from (47),

f(y) − f(x) = ⟨df(u), y − x⟩ = 0   for some u between x and y,

since, by hypothesis, df(u) = 0. Thus f(y) = f(x) (= α), and so y ∈ E. Hence Br(x) ⊂ E, and so E is open.

To show that E is closed in Ω, it is sufficient to show that E^c = { y ∈ Ω : f(y) ≠ α } is open in Ω. Since E^c = f⁻¹[R \ {α}] and R \ {α} is open, it follows from the continuity of f that E^c is open in Ω. Hence E is closed in Ω.

Since E ≠ ∅, and E is both open and closed in Ω, it follows that E = Ω (as Ω is connected), as required.

⁵² Equivalently, ∇f(x) = 0 in Ω.
⁵³ Being open in Ω and being open in Rⁿ is the same for subsets of Ω, since we are assuming Ω is itself open in Rⁿ.
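The behaviour in Examples 17.7.2 and 17.7.3 above is easy to see numerically: along every straight line through the origin the values of f(x, y) = xy²/(x² + y⁴) tend to 0, but along the parabola x = λ², y = λ they are identically 1/2. A Python sketch (illustration only, not part of the notes):

```python
def f(x, y):
    return x * y**2 / (x**2 + y**4) if (x, y) != (0.0, 0.0) else 0.0

# along straight lines (t v1, t v2) the values tend to f(0, 0) = 0 ...
for v1, v2 in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -3.0)]:
    t = 1e-6
    assert abs(f(t*v1, t*v2)) < 1e-4

# ... but along the parabola x = s^2, y = s the value stays at 1/2,
# so f cannot be continuous at the origin
for s in [1e-2, 1e-4, 1e-6]:
    assert abs(f(s**2, s) - 0.5) < 1e-12
```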
a2 ) + 1 (ξ 1 . a single point. The hypotheses of the theorem need to hold at all points in some open set Ω. a2 ) +f (a1 + h1 . then f is diﬀerentiable everywhere in Ω. a2 ) obtained by ﬁxing a2 and taking x1 as a variable. Example (2). a2 + h2 ) − f (a1 + h1 . that the partial derivatives (and even all the directional derivatives) of a function can exist without the function being diﬀerentiable. a 2)
Ω
Hence f (a1 + h1 .8. for a function of one variable.10. a2 + h2 ) = f (a1 . we do have the following important theorem: Theorem 17. Suppose that the partial derivatives of f exist and are continuous in Ω. Since E = ∅. a2 + h2 ) = f (a1 .1 we know that f is continuous. f is constant (= α) on Ω. and E is both open and closed in Ω. a2)
(a 1+h1 . a )h + 2 (a1 . (a +h . We saw in Section 17. a2 ) +f (a1 + h1 . a2 ) + ∂f 1 2 1 ∂f (a .
. it does not necessarily follow that f is diﬀerentiable at that point.17. and some ξ 2 between a2 and a2 + h2 . and so y ∈ E. ξ ) 1 1 2 . The second partial derivative is similarly obtained by considering the function f (a1 + h1 . However. to the function f (x1 . DIFFERENTIATION OF REALVALUED FUNCTIONS
97
Thus f (y) = f (x) (= α). it is suﬃcient to show that E c ={y : f (x) = α} is open in Ω.
(a 1. From Proposition 17. ∂x ∂x for some ξ 1 between a1 and a1 + h1 . Suppose f : Ω (⊂ Rn ) → R where Ω is open. ξ 2 )h2 . and are continuous at. The ﬁrst partial derivative comes from applying the usual Mean Value Theorem.10. f (a1 + h1 .1. Then if a ∈ Ω and a + h is suﬃciently close to a. a2+h2 )
. Proof of Theorem. In other words. a2 )
Since we have E^c = f^{-1}[R \ {α}] and R \ {α} is open, it follows that E^c is open in Ω. Hence E is closed in Ω. Since E is both open and closed in Ω, it follows E = Ω (as Ω is connected), as required.

17.10. Continuously Differentiable Functions.

Theorem 17.10.1. If the partial derivatives of f exist and are continuous in the open set Ω, then f is differentiable at every point of Ω, and the differential of f is given by the 1 × n matrix of partial derivatives [∂f/∂x^1(a) ⋯ ∂f/∂x^n(a)].

Proof. We prove the theorem in case n = 2 (the proof for n > 2 is only notationally more complicated). Suppose a = (a^1, a^2) ∈ Ω. Applying the mean value theorem, first in the variable x^1 and then to the function f(a^1 + h^1, x^2), where a^1 + h^1 is fixed and x^2 is variable, we have

f(a^1 + h^1, a^2 + h^2) − f(a^1, a^2)
  = [f(a^1 + h^1, a^2) − f(a^1, a^2)] + [f(a^1 + h^1, a^2 + h^2) − f(a^1 + h^1, a^2)]
  = (∂f/∂x^1)(ξ^1, a^2) h^1 + (∂f/∂x^2)(a^1 + h^1, ξ^2) h^2

for some ξ^1 between a^1 and a^1 + h^1 and some ξ^2 between a^2 and a^2 + h^2. Hence

f(a^1 + h^1, a^2 + h^2) = f(a^1, a^2) + L(h) + ψ(h).

Here L is the linear map defined by

L(h) = (∂f/∂x^1)(a^1, a^2) h^1 + (∂f/∂x^2)(a^1, a^2) h^2;

thus L is represented by the previous 1 × 2 matrix. We claim that the error term

ψ(h) = [(∂f/∂x^1)(ξ^1, a^2) − (∂f/∂x^1)(a^1, a^2)] h^1 + [(∂f/∂x^2)(a^1 + h^1, ξ^2) − (∂f/∂x^2)(a^1, a^2)] h^2

can be written as o(|h|). This follows from the facts:
1. (∂f/∂x^1)(ξ^1, a^2) → (∂f/∂x^1)(a^1, a^2) as h → 0 (by continuity of the partial derivatives, since (ξ^1, a^2) → (a^1, a^2));
2. (∂f/∂x^2)(a^1 + h^1, ξ^2) → (∂f/∂x^2)(a^1, a^2) as h → 0 (again by continuity of the partial derivatives);
3. |h^1| ≤ |h| and |h^2| ≤ |h|.

It now follows from Proposition 17.7 that f is differentiable at a, and the differential of f is given by the previous 1 × 2 matrix of partial derivatives. Since a ∈ Ω is arbitrary, this completes the proof.

Remark. The proof only uses the fact that the partial derivatives of f exist in some neighbourhood of a and are continuous at a; under this weaker hypothesis, f is differentiable at a.

Exercise: The converse may not be true; give a simple counterexample in R.

Definition 17.10.2. If the partial derivatives of f exist and are continuous at every point in Ω, we say f is a C^1 (or continuously differentiable) function on Ω, and one writes f ∈ C^1(Ω).

It follows from the previous Theorem that if f ∈ C^1(Ω) then f is indeed differentiable in Ω.

17.11. Higher-Order Partial Derivatives. Suppose f : Ω (⊂ R^n) → R. The partial derivatives ∂f/∂x^1, …, ∂f/∂x^n, if they exist, are also functions from Ω to R, and may themselves have partial derivatives.

Definition 17.11.1. The jth partial derivative of ∂f/∂x^i is denoted by

∂²f/∂x^j ∂x^i or f_{ij} or D_{ij} f.

If all first and second partial derivatives of f exist and are continuous in Ω, we write f ∈ C^2(Ω), and we similarly define C^q(Ω) for any integer q ≥ 0. (In fact, it is sufficient to assume just that the second partial derivatives are continuous: for under this assumption, each ∂f/∂x^i must be differentiable, by the previous theorem applied to ∂f/∂x^i, and hence continuous. Similar remarks apply to higher order derivatives.)
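The theorem above (continuous partial derivatives imply differentiability, with the row matrix of partials as the differential) can be checked numerically. The following is a minimal sketch, using a hypothetical smooth function of my own choosing (not from the notes); it verifies that the error of the linear approximation is o(|h|), i.e. that the error divided by |h| shrinks with |h|.

```python
import math

# Hypothetical example function (not from the notes): f(x, y) = x^2*y + sin(y).
def f(x, y):
    return x * x * y + math.sin(y)

# Its partial derivatives, computed by hand: f_x = 2xy, f_y = x^2 + cos(y).
def grad_f(x, y):
    return (2 * x * y, x * x + math.cos(y))

def error_ratio(t, a=(1.0, 1.0)):
    """|f(a+h) - f(a) - L(h)| / |h| for h = (t, t); should tend to 0 with t."""
    (x, y), h = a, (t, t)
    gx, gy = grad_f(x, y)
    err = abs(f(x + h[0], y + h[1]) - f(x, y) - (gx * h[0] + gy * h[1]))
    return err / math.hypot(*h)

ratios = [error_ratio(10.0 ** -k) for k in (2, 3, 4)]
# The ratio shrinks roughly linearly in |h| (the error itself is O(|h|^2)).
assert ratios[1] < ratios[0] / 5 and ratios[2] < ratios[1] / 5
assert ratios[-1] < 1e-3
```

The quadratic decay of the error itself reflects the second-order Taylor remainder; only the o(|h|) behaviour is what differentiability requires.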
The usual rules for differentiating a sum, product or quotient of functions of a single variable apply to partial derivatives. It follows that C^k(Ω) is closed under addition, products and quotients (if the denominator is nonzero). Note that

C^0(Ω) ⊃ C^1(Ω) ⊃ C^2(Ω) ⊃ ⋯.

The next theorem shows that, for higher order derivatives, the actual order of differentiation does not matter; only the number of derivatives with respect to each variable is important. In particular, if f ∈ C^2(Ω) then f_{ij} = f_{ji} for all i ≠ j. Thus

∂²f/∂x^i ∂x^j = ∂²f/∂x^j ∂x^i,

and so

∂³f/∂x^i ∂x^j ∂x^k = ∂³f/∂x^j ∂x^i ∂x^k = ∂³f/∂x^j ∂x^k ∂x^i, etc.

Theorem 17.11.2. If f ∈ C^1(Ω) [Footnote 55: As usual, Ω is assumed to be open.] and both f_{ij} and f_{ji} exist and are continuous (for some i ≠ j) in Ω, then f_{ij} = f_{ji} in Ω.

Proof. For notational simplicity we take n = 2 (the proof for n > 2 is very similar). Suppose a ∈ Ω and suppose h > 0 is some sufficiently small real number. Consider the four points

C = a = (a^1, a^2), D = (a^1 + h, a^2), A = (a^1, a^2 + h), B = (a^1 + h, a^2 + h),

the corners of a square of side h. [Diagram: the square with corners C, D, A, B as above.] Consider the second difference quotient defined by

(53) A(h) = [ (f(B) − f(D)) − (f(A) − f(C)) ] / h²
   = (1/h²) [ f(a^1 + h, a^2 + h) − f(a^1 + h, a^2) − ( f(a^1, a^2 + h) − f(a^1, a^2) ) ]

(54)   = (1/h²) [ g(a^2 + h) − g(a^2) ],

where

g(x^2) = f(a^1 + h, x^2) − f(a^1, x^2).

From the definition of partial differentiation, g′(x^2) exists and

(55) g′(x^2) = (∂f/∂x^2)(a^1 + h, x^2) − (∂f/∂x^2)(a^1, x^2) for a^2 ≤ x^2 ≤ a^2 + h.

Applying the mean value theorem for a function of a single variable to (54), we see from (55) that

(56) A(h) = (1/h) g′(ξ^2) = (1/h) [ (∂f/∂x^2)(a^1 + h, ξ^2) − (∂f/∂x^2)(a^1, ξ^2) ], some ξ^2 ∈ (a^2, a^2 + h).

Applying the mean value theorem again, to the function x^1 ↦ (∂f/∂x^2)(x^1, ξ^2) with ξ^2 fixed, we see

(57) A(h) = (∂²f/∂x^1 ∂x^2)(ξ^1, ξ^2), some ξ^1 ∈ (a^1, a^1 + h).

If we now rewrite (53) as

(58) A(h) = (1/h²) [ f(a^1 + h, a^2 + h) − f(a^1, a^2 + h) − ( f(a^1 + h, a^2) − f(a^1, a^2) ) ]

and interchange the roles of x^1 and x^2 in the previous argument, we obtain

A(h) = (∂²f/∂x^2 ∂x^1)(η^1, η^2)

for some η^1 ∈ (a^1, a^1 + h), η^2 ∈ (a^2, a^2 + h). If we let h → 0, then (ξ^1, ξ^2) → (a^1, a^2) and (η^1, η^2) → (a^1, a^2), and so from (57), (58) and the continuity of f_{12} and f_{21} at a, it follows that

(59) lim_{h→0} A(h) = f_{21}(a) = f_{12}(a).

This completes the proof.

17.12. Taylor's Theorem. If g ∈ C^1[a, b], then we know

g(b) = g(a) + ∫_a^b g′(t) dt.

This is the case k = 1 of the following version of Taylor's Theorem for a function of one variable.

Theorem 17.12.1 (Single Variable, Integral form of the Remainder). Suppose g ∈ C^k[a, b]. Then

(60) g(b) = g(a) + g′(a)(b − a) + (1/2!) g″(a)(b − a)² + ⋯ + (1/(k−1)!) g^{(k−1)}(a)(b − a)^{k−1} + ∫_a^b ((b − t)^{k−1}/(k−1)!) g^{(k)}(t) dt.

Proof. An elegant (but not obvious) proof is to begin by computing:

(61) d/dt [ g φ^{(k−1)} − g′ φ^{(k−2)} + g″ φ^{(k−3)} − ⋯ + (−1)^{k−1} g^{(k−1)} φ ]
  = g φ^{(k)} + (−1)^{k−1} g^{(k)} φ,

since on differentiating each product the intermediate terms cancel in pairs. Now choose

φ(t) = (b − t)^{k−1}/(k−1)!.

Then

(62) φ′(t) = −(b − t)^{k−2}/(k−2)!, φ″(t) = (−1)² (b − t)^{k−3}/(k−3)!, …,
  φ^{(k−3)}(t) = (−1)^{k−3} (b − t)²/2!, φ^{(k−2)}(t) = (−1)^{k−2} (b − t), φ^{(k−1)}(t) = (−1)^{k−1}, φ^{(k)}(t) = 0.

Hence from (61) we have

(−1)^{k−1} d/dt [ g(t) + g′(t)(b − t) + g″(t)(b − t)²/2! + ⋯ + g^{(k−1)}(t)(b − t)^{k−1}/(k−1)! ] = (−1)^{k−1} g^{(k)}(t) (b − t)^{k−1}/(k−1)!.

Dividing by (−1)^{k−1} and integrating both sides from a to b, we get

g(b) − [ g(a) + g′(a)(b − a) + g″(a)(b − a)²/2! + ⋯ + g^{(k−1)}(a)(b − a)^{k−1}/(k−1)! ] = ∫_a^b g^{(k)}(t) ((b − t)^{k−1}/(k−1)!) dt.

This gives formula (60).

Theorem 17.12.2 (Single Variable, Second Version). Suppose g ∈ C^k[a, b]. Then

(63) g(b) = g(a) + g′(a)(b − a) + (1/2!) g″(a)(b − a)² + ⋯ + (1/(k−1)!) g^{(k−1)}(a)(b − a)^{k−1} + (1/k!) g^{(k)}(ξ)(b − a)^k

for some ξ ∈ (a, b).

Proof. We establish (63) from (60). Since g^{(k)} is continuous in [a, b], it has a minimum value m, say, and a maximum value M. By elementary properties of integrals,

m ∫_a^b ((b − t)^{k−1}/(k−1)!) dt ≤ ∫_a^b g^{(k)}(t) ((b − t)^{k−1}/(k−1)!) dt ≤ M ∫_a^b ((b − t)^{k−1}/(k−1)!) dt,

i.e.

m ≤ [ ∫_a^b g^{(k)}(t) ((b − t)^{k−1}/(k−1)!) dt ] / [ ∫_a^b ((b − t)^{k−1}/(k−1)!) dt ] ≤ M.

By the Intermediate Value Theorem, g^{(k)} takes all values in the range [m, M], and so the middle term in the previous inequality must equal g^{(k)}(ξ) for some ξ ∈ (a, b). Since

∫_a^b ((b − t)^{k−1}/(k−1)!) dt = (b − a)^k/k!,

it follows that

∫_a^b g^{(k)}(t) ((b − t)^{k−1}/(k−1)!) dt = ((b − a)^k/k!) g^{(k)}(ξ).

Formula (63) now follows from (60).
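The Lagrange form of the remainder in (63) is easy to test numerically. The sketch below (my own choice of example, not from the notes) takes g = exp on [a, b] = [0, 1] with k = 4, where every derivative of g is again exp; the remainder g(1) minus the degree-3 Taylor polynomial must equal e^ξ/4! for some ξ ∈ (0, 1), hence lie strictly between 1/4! and e/4!.

```python
import math

# g = exp, a = 0, b = 1, k = 4: all derivatives of g are exp itself.
k = 4
taylor_part = sum(1.0 / math.factorial(j) for j in range(k))  # sum_{j<k} g^(j)(0)/j!
remainder = math.e - taylor_part

# By (63), remainder = g^(k)(xi)(b-a)^k / k! = e^xi / k! for some xi in (0, 1),
# so it must lie strictly between 1/k! (as xi -> 0) and e/k! (as xi -> 1).
assert 1.0 / math.factorial(k) < remainder < math.e / math.factorial(k)

# Solving for xi explicitly confirms it is an interior point of (0, 1).
xi = math.log(remainder * math.factorial(k))
assert 0.0 < xi < 1.0
```

Here ξ works out to roughly 0.21; the theorem only guarantees existence of some such interior point, not its value.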
Taylor's Theorem generalises easily to functions of more than one variable.

Theorem 17.12.3 (Taylor's Formula, Several Variables). Suppose f ∈ C^k(Ω) where Ω ⊂ R^n, and the line segment joining a and a + h is a subset of Ω. Then

f(a + h) = f(a) + Σ_{i=1}^n D_i f(a) h_i + (1/2!) Σ_{i,j=1}^n D_{ij} f(a) h_i h_j + ⋯
  + (1/(k−1)!) Σ_{i_1,…,i_{k−1}=1}^n D_{i_1 … i_{k−1}} f(a) h_{i_1} ⋯ h_{i_{k−1}} + R_k(a, h),

where

R_k(a, h) = (1/(k−1)!) Σ_{i_1,…,i_k=1}^n ( ∫_0^1 (1 − t)^{k−1} D_{i_1 … i_k} f(a + th) dt ) h_{i_1} ⋯ h_{i_k}
  = (1/k!) Σ_{i_1,…,i_k=1}^n D_{i_1 … i_k} f(a + sh) h_{i_1} ⋯ h_{i_k}

for some s ∈ (0, 1).

Proof. Let g(t) = f(a + th). Then g : [0, 1] → R, with g(0) = f(a) and g(1) = f(a + h). We will apply Taylor's Theorem for a function of one variable to g, and obtain formulae for the derivatives of g.

First note that for any differentiable function F : D (⊂ R^n) → R we have

(64) (d/dt) F(a + th) = Σ_{i=1}^n D_i F(a + th) h_i.

This is just a particular case of the chain rule, which we will discuss later; this particular version follows from (51) (with f there replaced by F). Applying (64) with F = f, we obtain

(65) g′(t) = Σ_{i=1}^n D_i f(a + th) h_i.

Differentiating again, and applying (64) to D_i f, we obtain

(66) g″(t) = Σ_{i,j=1}^n D_{ij} f(a + th) h_j h_i.

Similarly

(67) g‴(t) = Σ_{i,j,k=1}^n D_{ijk} f(a + th) h_i h_j h_k,

etc. In this way, we see g ∈ C^k[0, 1]. But from (60) and (63) we have

g(1) = g(0) + g′(0) + (1/2!) g″(0) + ⋯ + (1/(k−1)!) g^{(k−1)}(0) + (1/(k−1)!) ∫_0^1 (1 − t)^{k−1} g^{(k)}(t) dt

and

g(1) = g(0) + g′(0) + (1/2!) g″(0) + ⋯ + (1/(k−1)!) g^{(k−1)}(0) + (1/k!) g^{(k)}(s), some s ∈ (0, 1).

If we substitute (65), (66), (67), etc. into this, we obtain the required results.
Remark 17.12.4. The first two terms of Taylor's Formula give the best first order approximation (i.e. constant plus linear term) in h to f(a + h) for h near 0. The first three terms give the best second order approximation (i.e. constant plus linear term plus quadratic term), the first four terms give the best third order approximation, etc.

Note that the remainder term R_k(a, h) in Theorem 17.12.3 can be written as O(|h|^k) (see the Remarks on rates of convergence in Section 17.5). Moreover, R_k(a, h) is bounded as h → 0. This follows from the second version for the remainder in Theorem 17.12.3 and the facts:
1. D_{i_1 … i_k} f(x) is continuous, and hence bounded on compact sets;
2. |h_{i_1} ⋯ h_{i_k}| ≤ |h|^k.

Example 17.12.5. Let f(x, y) = (1 + y²)^{1/2} cos x. One finds the best second order approximation to f(x, y) for (x, y) near (0, 1) as follows. First note that f(0, 1) = 2^{1/2}. Moreover,

f_1 = −(1 + y²)^{1/2} sin x, = 0 at (0, 1);
f_2 = y(1 + y²)^{−1/2} cos x, = 2^{−1/2} at (0, 1);
f_{11} = −(1 + y²)^{1/2} cos x, = −2^{1/2} at (0, 1);
f_{12} = −y(1 + y²)^{−1/2} sin x, = 0 at (0, 1);
f_{22} = (1 + y²)^{−3/2} cos x, = 2^{−3/2} at (0, 1).

Hence

f(x, y) = 2^{1/2} + 2^{−1/2}(y − 1) + (1/2!)( −2^{1/2} x² + 2^{−3/2}(y − 1)² ) + R_3((0, 1), (x, y)),

where

R_3((0, 1), (x, y)) = O( |(x, y) − (0, 1)|³ ) = O( (x² + (y − 1)²)^{3/2} ).
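The second order approximation just computed can be checked numerically. A minimal sketch (the quadratic below is the one from the example, with the 1/2! factors written out): the error should behave like the claimed O((x² + (y − 1)²)^{3/2}) remainder, so halving the offset from (0, 1) should divide it by roughly 8.

```python
import math

def f(x, y):
    return math.sqrt(1.0 + y * y) * math.cos(x)

# Best second order approximation at (0, 1) from the example above.
def quad(x, y):
    return (2 ** 0.5 + 2 ** -0.5 * (y - 1)
            + 0.5 * (-(2 ** 0.5) * x * x + 2 ** -1.5 * (y - 1) ** 2))

e1 = abs(f(0.1, 1.1) - quad(0.1, 1.1))
e2 = abs(f(0.05, 1.05) - quad(0.05, 1.05))
assert e1 < (0.1 ** 2 + 0.1 ** 2) ** 1.5   # consistent with an O(offset^3) remainder
assert e2 < e1 / 4                          # cubic decay under halving the offset
```

As before, the factor-of-4 test is a loose version of the asymptotic factor of 8.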
CHAPTER 18. Differentiation of Vector-Valued Functions

You need only consider Subsections 1, 2 (to the beginning of Definition 18.2.7), 3, 4, 5 (the last is not examinable). What is required is just a basic understanding of the ideas, and application to straightforward examples.

18.1. Introduction. In this chapter we consider functions f : D (⊂ R^n) → R^m, with m ≥ 1.

18.1.1. Reduction to Component Functions. For many purposes we can reduce the study of functions f, as above, to the study of the corresponding real valued functions f^1, …, f^m. We write

f(x_1, …, x_n) = ( f^1(x_1, …, x_n), …, f^m(x_1, …, x_n) ),

where f^i : D → R, i = 1, …, m, are real valued functions.

Example 18.1.1. Let f(x, y, z) = (x² − y², 2xz + 1). Then f^1(x, y, z) = x² − y² and f^2(x, y, z) = 2xz + 1.

However, this is not always a good idea, since studying the f^i involves a choice of coordinates in R^m, and this can obscure the geometry involved. In the following we define the notions of derivative, partial derivative, directional derivative and differential of f without reference to the component functions, and then show these definitions are equivalent to definitions in terms of the component functions.

18.2. Paths in R^m. In this section we consider the case corresponding to n = 1 in the notation of the previous section. This is an important case in its own right and also helps motivate the case n > 1.

Definition 18.2.1. Let I be an interval in R. If f : I → R^m then the derivative or tangent vector at t is the vector

f′(t) = lim_{s→0} ( f(t + s) − f(t) ) / s,

provided the limit exists. [Footnote 58: If t is an endpoint of I then one takes the corresponding one-sided limits.] In this case we say f is differentiable at t. If, moreover, f′(t) ≠ 0, then f′(t)/|f′(t)| is called the unit tangent at t.

Proposition 18.2.2. Let f(t) = ( f^1(t), …, f^m(t) ). Then f is differentiable at t iff f^1, …, f^m are differentiable at t. In this case

f′(t) = ( f^1′(t), …, f^m′(t) ).

Proof. Since

( f(t + s) − f(t) ) / s = ( ( f^1(t + s) − f^1(t) ) / s, …, ( f^m(t + s) − f^m(t) ) / s ),

the result follows, since a function into R^m converges iff its component functions converge.
Definition 18.2.3. If f(t) = ( f^1(t), …, f^m(t) ), then f is C^1 if each f^i is C^1.

The terminology tangent vector is reasonable, as we see from the following diagram. Although we say f′(t) is the tangent vector at t, we should really think of f′(t) as a vector with its "base" at f(t). Sometimes we speak of the tangent vector at f(t) rather than at t, but we need to be careful if f is not one-one, as in the second figure.

[Diagram: a path in R², showing f(t), f(t + s), the chord f(t + s) − f(t) and the tangent vector f′(t); a second figure showing a self-intersecting path with f(t_1) = f(t_2) for t_1 ≠ t_2 in I, but distinct tangent vectors f′(t_1) and f′(t_2).]

We have the usual rules for differentiating the sum of two functions from I to R^m, and the product of such a function with a real valued function (exercise: formulate and prove such a result). The following rule for differentiating the inner product of two functions is useful.

Proposition 18.2.4. If f_1, f_2 : I → R^m are differentiable at t, then

(d/dt) ⟨f_1(t), f_2(t)⟩ = ⟨f_1′(t), f_2(t)⟩ + ⟨f_1(t), f_2′(t)⟩.

Proof. Since

⟨f_1(t), f_2(t)⟩ = Σ_{i=1}^m f_1^i(t) f_2^i(t),

the result follows from the usual rules for differentiating sums and products of real valued functions.
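The inner product rule above can be checked against a difference quotient. The sketch below uses a hypothetical pair of curves of my own choosing, f_1(t) = (t, t²) and f_2(t) = (sin t, cos t):

```python
import math

def f1(t):  return (t, t * t)
def f2(t):  return (math.sin(t), math.cos(t))
def df1(t): return (1.0, 2 * t)               # hand-computed derivatives
def df2(t): return (math.cos(t), -math.sin(t))

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

t = 0.7
# Right side of the inner product rule: <f1', f2> + <f1, f2'>.
rule = inner(df1(t), f2(t)) + inner(f1(t), df2(t))

# Central difference approximation of d/dt <f1(t), f2(t)>.
h = 1e-5
numeric = (inner(f1(t + h), f2(t + h)) - inner(f1(t - h), f2(t - h))) / (2 * h)

assert abs(rule - numeric) < 1e-8
```

The agreement to within 1e-8 reflects the O(h²) accuracy of the central difference.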
Example 18.2.5.
1. Let f(t) = (cos t, sin t), t ∈ [0, 2π). This traces out a circle in R² and f′(t) = (−sin t, cos t).
2. Let f(t) = (t, t²). This traces out a parabola in R² and f′(t) = (1, 2t).

[Diagram: the circle f(t) = (cos t, sin t) with tangent vector f′(t) = (−sin t, cos t), and the parabola f(t) = (t, t²) with tangent vector f′(t) = (1, 2t).]

If f : I → R^m is C^1, we can think of f as tracing out a "curve" in R^m (we will make this precise later).

Example 18.2.6. Consider the functions
1. f_1(t) = (t, t³), t ∈ R,
2. f_2(t) = (t³, t⁹), t ∈ R,
3. f_3(t) = (∛t, t), t ∈ R.
Then each function f_i traces out the same "cubic" curve in R² (i.e. the image is the same set of points), and f_1(0) = f_2(0) = f_3(0) = (0, 0). However, f_1′(0) = (1, 0), f_2′(0) = (0, 0), and f_3′(0) is undefined.

Intuitively, we will think of a path in R^m as a function f which neither stops nor reverses direction. We make this precise as follows:

Definition 18.2.7. We say f : I → R^m is a path in R^m if f is C^1 and f′(t) ≠ 0 for t ∈ I. [Footnote 59: Other texts may have different terminology.] We say the two paths f_1 : I_1 → R^m and f_2 : I_2 → R^m are equivalent if there exists a function φ : I_1 → I_2 such that f_1 = f_2 ∘ φ, where φ is C^1 and φ′(t) > 0 for t ∈ I_1. A curve is an equivalence class of paths. Any path in the equivalence class is called a parametrisation of the curve.

We will think of the corresponding curve as the set of points traced out by f. Many different paths (i.e. functions) will give the same curve; they correspond to tracing out the curve at different times and velocities. It is often convenient to consider the variable t as representing "time", and we can think of φ as giving another way of measuring "time".
Definition 18.2.8. If f is a path in R^m, then the acceleration at t is f″(t).

Example 18.2.9. If |f′(t)| is constant (i.e. the "speed" is constant) then the velocity and the acceleration are orthogonal.

Proof. Since |f′(t)|² = ⟨f′(t), f′(t)⟩ is constant, we have from Proposition 18.2.4 that

0 = (d/dt) ⟨f′(t), f′(t)⟩ = 2 ⟨f″(t), f′(t)⟩.

This gives the result.

We expect that the unit tangent vector to a curve should depend only on the curve itself, and not on the particular parametrisation. This is indeed the case, as is shown by the following Proposition.

Proposition 18.2.10. Suppose f_1 : I_1 → R^m and f_2 : I_2 → R^m are equivalent parametrisations, and in particular f_1 = f_2 ∘ φ where φ : I_1 → I_2, φ is C^1 and φ′(t) > 0 for t ∈ I_1. Then f_1 and f_2 have the same unit tangent vector at t and φ(t) respectively.

Proof. From the chain rule for a function of one variable, we have

f_1′(t) = ( f_2^1′(φ(t)) φ′(t), …, f_2^m′(φ(t)) φ′(t) ) = f_2′(φ(t)) φ′(t).

Hence, since φ′(t) > 0,

f_1′(t) / |f_1′(t)| = f_2′(φ(t)) / |f_2′(φ(t))|.

[Diagram: the common curve, with f_1(t) = f_2(φ(t)) and equal unit tangents f_1′(t)/|f_1′(t)| = f_2′(φ(t))/|f_2′(φ(t))|.]

Arc length. Suppose f : [a, b] → R^m is a path in R^m. Let a = t_1 < t_2 < ⋯ < t_N = b be a partition of [a, b], where t_i − t_{i−1} = δt for all i. We think of the length of the curve corresponding to f as being

≈ Σ_{i=2}^N |f(t_i) − f(t_{i−1})| = Σ_{i=2}^N ( |f(t_i) − f(t_{i−1})| / δt ) δt ≈ ∫_a^b |f′(t)| dt.

See the next diagram.
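The heuristic above (polygonal length approaching ∫|f′|) is easy to test for the unit circle f(t) = (cos t, sin t), t ∈ [0, 2π], whose length is 2π. A minimal sketch:

```python
import math

def f(t):
    return (math.cos(t), math.sin(t))

N = 2000
ts = [2 * math.pi * i / N for i in range(N + 1)]

# Polygonal (inscribed) length: sum of |f(t_i) - f(t_{i-1})|.
poly = sum(math.dist(f(ts[i]), f(ts[i - 1])) for i in range(1, N + 1))

# Riemann sum for the integral of |f'(t)| = |(-sin t, cos t)| = 1.
riemann = sum(math.hypot(-math.sin(t), math.cos(t)) * (2 * math.pi / N)
              for t in ts[:-1])

assert abs(poly - 2 * math.pi) < 1e-4
assert abs(riemann - 2 * math.pi) < 1e-9
```

The polygonal sum is the perimeter of an inscribed regular N-gon, 2N sin(π/N), which converges to 2π from below as N grows.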
[Diagram: the polygonal approximation to the curve through the points f(t_1), …, f(t_{i−1}), f(t_i), …, f(t_N).]

Motivated by this we make the following definition.

Definition 18.2.11. Let f : [a, b] → R^m be a path in R^m. Then the length of the curve corresponding to f is given by

∫_a^b |f′(t)| dt.

The next result shows that this definition is independent of the particular parametrisation chosen for the curve.

Proposition 18.2.12. Suppose f_1 : [a_1, b_1] → R^m and f_2 : [a_2, b_2] → R^m are equivalent parametrisations, and in particular f_1 = f_2 ∘ φ where φ : [a_1, b_1] → [a_2, b_2], φ is C^1 and φ′(t) > 0 for t ∈ [a_1, b_1]. Then

∫_{a_1}^{b_1} |f_1′(t)| dt = ∫_{a_2}^{b_2} |f_2′(s)| ds.

Proof. From the chain rule and then the rule for change of variable of integration,

∫_{a_1}^{b_1} |f_1′(t)| dt = ∫_{a_1}^{b_1} |f_2′(φ(t))| φ′(t) dt = ∫_{a_2}^{b_2} |f_2′(s)| ds.

18.3. Partial and Directional Derivatives. Analogous to the corresponding definitions for real valued functions we have:

Definition 18.3.1. Suppose f : D (⊂ R^n) → R^m. The ith partial derivative of f at x is defined by

∂f/∂x^i (x) (or D_i f(x)) = lim_{t→0} ( f(x + t e_i) − f(x) ) / t,

provided the limit exists. More generally, the directional derivative of f at x in the direction v is defined by

D_v f(x) = lim_{t→0} ( f(x + tv) − f(x) ) / t,

provided the limit exists.

Remark 18.3.2. It follows immediately from the Definitions that

∂f/∂x^i (x) = D_{e_i} f(x).
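Directional derivatives of a vector valued function can be computed component by component (this is the content of the next proposition). The sketch below checks this numerically for a hypothetical map f : R² → R³ of my own choosing, comparing the hand-computed gradients of the components against a difference quotient taken straight from the limit definition.

```python
import math

# Hypothetical map (not from the notes): f(x, y) = (x^2*y, x + y^2, sin(x*y)).
def f(x, y):
    return (x * x * y, x + y * y, math.sin(x * y))

a, v = (1.0, 2.0), (0.6, 0.8)

# D_v f(a) component by component, with hand-computed gradients:
# grad f^1 = (2xy, x^2), grad f^2 = (1, 2y), grad f^3 = (y cos xy, x cos xy).
c = math.cos(a[0] * a[1])
exact = (4 * 0.6 + 1 * 0.8, 1 * 0.6 + 4 * 0.8, (2 * 0.6 + 1 * 0.8) * c)

# Directional derivative straight from the limit definition (central difference).
t = 1e-6
fp = f(a[0] + t * v[0], a[1] + t * v[1])
fm = f(a[0] - t * v[0], a[1] - t * v[1])
numeric = tuple((p - m) / (2 * t) for p, m in zip(fp, fm))

assert all(abs(e - n) < 1e-6 for e, n in zip(exact, numeric))
```

Note that D_v f(a) is a vector in R^3 here, one entry per component function, exactly as the proposition asserts.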
Proposition 18.3.3. If f^1, …, f^m are the component functions of f then

∂f/∂x^i (a) = ( ∂f^1/∂x^i (a), …, ∂f^m/∂x^i (a) ) for i = 1, …, n,
D_v f(a) = ( D_v f^1(a), …, D_v f^m(a) ),

in the sense that if one side of either equality exists, then so does the other, and both sides are then equal.

Proof. Essentially the same as for the proof of Proposition 18.2.2.

Example 18.3.4. Let f : R² → R³ be given by

f(x, y) = (x² − 2xy, x² + y³, sin x).

Then

∂f/∂x = ( ∂f^1/∂x, ∂f^2/∂x, ∂f^3/∂x ) = (2x − 2y, 2x, cos x),
∂f/∂y = ( ∂f^1/∂y, ∂f^2/∂y, ∂f^3/∂y ) = (−2x, 3y², 0)

are vectors in R³.

Remark 18.3.5. The partial and directional derivatives are vectors in R^m. In the terminology of the previous section, ∂f/∂x^i (x) is tangent to the path t ↦ f(x + t e_i) and D_v f(x) is tangent to the path t ↦ f(x + tv). Note that the curves corresponding to these paths are subsets of the image of f.

[Diagram: points a, b and the vectors e_1, e_2, v in R², mapped by f into R³, with ∂f/∂x (b), ∂f/∂y (b) and D_v f(a) tangent to the image of f.]

As we will discuss later, if n ≤ m and the differential df(x) has rank n, then we may regard the partial derivatives ∂f/∂x^1 (x), …, ∂f/∂x^n (x) as a basis for the tangent space to the image of f at f(x) (this will be made precise later).

18.4. The Differential. Analogous to the definition for real valued functions we have:

Definition 18.4.1. Suppose f : D (⊂ R^n) → R^m. Then f is differentiable at a ∈ D if there is a linear transformation L : R^n → R^m such that

(68) |f(x) − ( f(a) + L(x − a) )| / |x − a| → 0 as x → a.
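The defining quotient in (68) can be computed numerically. The sketch below does this for the map f(x, y) = (x² − 2xy, x² + y³, sin x) from the example above, with L given by the (hand-computed) matrix of partial derivatives at a = (1, 2); the quotient decays linearly in |h|, consistent with differentiability.

```python
import math

def f(x, y):
    return (x * x - 2 * x * y, x * x + y ** 3, math.sin(x))

# Jacobian at a = (1, 2): rows are the gradients of f^1, f^2, f^3, by hand.
a = (1.0, 2.0)
J = [(-2.0, -2.0),        # (2x - 2y, -2x)   at (1, 2)
     (2.0, 12.0),         # (2x, 3y^2)       at (1, 2)
     (math.cos(1.0), 0.0)]  # (cos x, 0)     at (1, 2)

def ratio(t):
    """|f(a + h) - f(a) - L(h)| / |h| for h = (t, t), the quotient in (68)."""
    h = (t, t)
    fa, fx = f(*a), f(a[0] + t, a[1] + t)
    diff = [fx[i] - fa[i] - (J[i][0] * h[0] + J[i][1] * h[1]) for i in range(3)]
    return math.hypot(*diff) / math.hypot(*h)

r1, r2 = ratio(1e-2), ratio(1e-3)
assert r2 < r1 / 5   # the quotient decays roughly linearly in |h|
assert r1 < 0.1
```

A loose factor-of-5 test is used; asymptotically shrinking t by 10 shrinks the quotient by 10.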
The linear transformation L is denoted by f′(a) or df(a) and is called the derivative or differential of f at a. (It follows from the next Proposition that if L exists then it is unique, and is given by (69).)

A vector valued function is differentiable iff the corresponding component functions are differentiable. More precisely:

Proposition 18.4.2. f is differentiable at a iff f^1, …, f^m are differentiable at a. In this case the differential is unique and is given by

(69) ⟨df(a), v⟩ = ( ⟨df^1(a), v⟩, …, ⟨df^m(a), v⟩ )

for all v ∈ R^n. If f is differentiable at a then the linear transformation df(a) is represented by the matrix

(70) [ ∂f^1/∂x^1 (a) ⋯ ∂f^1/∂x^n (a) ; … ; ∂f^m/∂x^1 (a) ⋯ ∂f^m/∂x^n (a) ] : R^n → R^m.

Proof. For any linear map L : R^n → R^m, and for each i = 1, …, m, let L^i : R^n → R be the linear map defined by L^i(v) = ⟨L(v), e_i⟩. Since a function into R^m converges iff its component functions converge, it follows that

|f(x) − ( f(a) + L(x − a) )| / |x − a| → 0 as x → a

iff

|f^i(x) − ( f^i(a) + L^i(x − a) )| / |x − a| → 0 as x → a for i = 1, …, m.

Thus f is differentiable at a iff f^1, …, f^m are differentiable at a. In this case we must have L^i = df^i(a), i = 1, …, m (by uniqueness of the differential for real valued functions), and so

L(v) = ( ⟨df^1(a), v⟩, …, ⟨df^m(a), v⟩ ).

But this says that the differential df(a) is unique and is given by (69). For any linear transformation L : R^n → R^m, the ith column of the corresponding matrix is L(e_i); the ith column of the matrix corresponding to df(a) is the vector ⟨df(a), e_i⟩, i.e. ( ⟨df^1(a), e_i⟩, …, ⟨df^m(a), e_i⟩ ), which by the corresponding result for real valued functions is the column vector of partial derivatives ( ∂f^1/∂x^i (a), …, ∂f^m/∂x^i (a) ). This proves the result.

Corollary 18.4.3. If f is differentiable at a then

f(x) = f(a) + ⟨df(a), x − a⟩ + ψ(x), where ψ(x) = o(|x − a|).

Conversely, suppose f(x) = f(a) + L(x − a) + ψ(x), where L : R^n → R^m is linear and ψ(x) = o(|x − a|). Then f is differentiable at a and df(a) = L.

Proof. This is straightforward (exercise) from Proposition 18.4.2 and the corresponding result for real valued functions.

Remark 18.4.4. Thus, as is the case for real valued functions, f(a) + ⟨df(a), x − a⟩ gives the best first order approximation to f(x) for x near a. In the matrix (70), the ith row represents df^i(a); the jth column is the vector in R^m corresponding to the partial derivative ∂f/∂x^j (a).

Example 18.4.5. Let f : R² → R² be given by f(x, y) = (x² − 2xy, x² + y³). Find the best first order approximation to f(x) for x near (1, 2).

Solution:

f(1, 2) = (−3, 9),
df(x, y) = [ 2x − 2y, −2x ; 2x, 3y² ],
df(1, 2) = [ −2, −2 ; 2, 12 ].

So the best first order approximation near (1, 2) is

f(1, 2) + ⟨df(1, 2), (x − 1, y − 2)⟩
  = (−3, 9) + ( −2(x − 1) − 2(y − 2), 2(x − 1) + 12(y − 2) )
  = ( 3 − 2x − 2y, −17 + 2x + 12y ).

Alternatively, working with each component separately, the best first order approximation is

( f^1(1, 2) + (∂f^1/∂x)(1, 2)(x − 1) + (∂f^1/∂y)(1, 2)(y − 2),
  f^2(1, 2) + (∂f^2/∂x)(1, 2)(x − 1) + (∂f^2/∂y)(1, 2)(y − 2) )
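A quick numerical check of the best first order approximation f(1, 2) + ⟨df(1, 2), x − a⟩ for this f: with ∂f^1/∂y = −2x = −2 at (1, 2), the components simplify to 3 − 2x − 2y and −17 + 2x + 12y. A sketch verifying exact agreement at the base point and quadratically small error nearby:

```python
def f(x, y):
    return (x * x - 2 * x * y, x * x + y ** 3)

def approx(x, y):
    # Best first order approximation at (1, 2), simplified.
    return (3 - 2 * x - 2 * y, -17 + 2 * x + 12 * y)

# Exact agreement at the base point (1, 2).
assert f(1.0, 2.0) == approx(1.0, 2.0) == (-3.0, 9.0)

# Near (1, 2) the error of the affine approximation is second order, hence tiny.
fx, ax = f(1.01, 2.01), approx(1.01, 2.01)
assert all(abs(p - q) < 1e-2 for p, q in zip(fx, ax))
```

If either component's linear coefficients were wrong, the error at (1.01, 2.01) would be first order (of size about 1e-2 times the coefficient error) rather than second order, so this check is sensitive to slips in the Jacobian.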
Proposition 18.4.6. If f, g : D (⊂ R^n) → R^m are differentiable at a ∈ D, then so are αf and f + g. Moreover,

d(αf)(a) = α df(a), d(f + g)(a) = df(a) + dg(a).

Proof. This is straightforward (exercise) from Proposition 18.4.2. The previous proposition corresponds to the fact that the partial derivatives for f + g are the sum of the partial derivatives corresponding to f and g respectively. Similarly for αf.

Higher Derivatives. We say f ∈ C^k(D) iff f^1, …, f^m ∈ C^k(D). It follows from the corresponding results for the component functions that
1. f ∈ C^1(D) ⇒ f is differentiable in D;
2. C^0(D) ⊃ C^1(D) ⊃ C^2(D) ⊃ ⋯.

A Little Linear Algebra. Suppose L : R^n → R^m is a linear map. Then we define the norm of L by

‖L‖ = max { |L(x)| : |x| ≤ 1 }.

[Footnote 63: Here |x| and |L(x)| are the usual Euclidean norms on R^n and R^m. Thus ‖L‖ corresponds to the maximum value taken by L on the unit ball. The maximum value is achieved, as L is continuous and {x : |x| ≤ 1} is compact.]

A simple result (exercise) is that

(71) |L(x)| ≤ ‖L‖ |x| for any x ∈ R^n.

It is also easy to check (exercise) that ‖·‖ does define a norm on the vector space of linear maps from R^n into R^m.

18.5. The Chain Rule. The chain rule for the composition of functions of one variable says that

(d/dx) g( f(x) ) = g′( f(x) ) f′(x).

Or, to use a more informal notation, if g = g(f) and f = f(x), then

dg/dx = (dg/df)(df/dx).

This is generalised in the following theorem. The theorem says that the linear approximation to g ∘ f (computed at x) is the composition of the linear approximation to f (computed at x) followed by the linear approximation to g (computed at f(x)).

Theorem 18.5.1 (Chain Rule). Suppose f : D (⊂ R^n) → Ω (⊂ R^m) and g : Ω (⊂ R^m) → R^r. Suppose f is differentiable at x and g is differentiable at f(x). Then g ∘ f is differentiable at x and

(72) d(g ∘ f)(x) = dg(f(x)) ∘ df(x).

Schematically:

D (⊂ R^n) --f--> Ω (⊂ R^m) --g--> R^r, with g ∘ f : D → R^r;
R^n --df(x)--> R^m --dg(f(x))--> R^r, with d(g ∘ f)(x) = dg(f(x)) ∘ df(x).
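The norm ‖L‖ = max{|L(x)| : |x| ≤ 1} and inequality (71) are easy to explore numerically. For a linear map the maximum over the unit ball is attained on the unit sphere (by homogeneity), so sampling the unit circle suffices in two dimensions. A sketch with the hypothetical diagonal map L(x, y) = (3x, 4y), whose operator norm is 4:

```python
import math

def L(x, y):
    return (3 * x, 4 * y)   # hypothetical linear map, matrix diag(3, 4)

# Sample |L x| over the unit circle; for diag(3, 4) the true norm is 4
# (attained at x = (0, 1)).
norm_est = max(math.hypot(*L(math.cos(th), math.sin(th)))
               for th in (2 * math.pi * k / 1000 for k in range(1000)))
assert abs(norm_est - 4.0) < 1e-3

# Inequality (71): |L x| <= ||L|| |x| for an arbitrary x.
x = (5.0, -7.0)
assert math.hypot(*L(*x)) <= 4.0 * math.hypot(*x)
```

For a general matrix the operator norm is its largest singular value; the sampling approach above is only a crude estimate, but it is enough to illustrate (71).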
Example 18.5.2. To see how all this corresponds to other formulations of the chain rule, suppose we have the following:

R³ --f--> R² --g--> R²
(x, y, z) ↦ (u, v) ↦ (p, q).

Thus coordinates in R³ are denoted by (x, y, z), coordinates in the first copy of R² are denoted by (u, v), and coordinates in the second copy of R² are denoted by (p, q). The functions f and g can be written as follows:

f : u = u(x, y, z), v = v(x, y, z),
g : p = p(u, v), q = q(u, v).

Thus we think of u and v as functions of x, y and z, and p and q as functions of u and v. We can also represent p and q as functions of x, y and z via

p = p( u(x, y, z), v(x, y, z) ), q = q( u(x, y, z), v(x, y, z) ).

The usual version of the chain rule in terms of partial derivatives is:

∂p/∂x = (∂p/∂u)(∂u/∂x) + (∂p/∂v)(∂v/∂x),
 …
∂q/∂z = (∂q/∂u)(∂u/∂z) + (∂q/∂v)(∂v/∂z).

In the first equality, ∂p/∂x, ∂u/∂x and ∂v/∂x are evaluated at (x, y, z), while ∂p/∂u and ∂p/∂v are evaluated at ( u(x, y, z), v(x, y, z) ). Similarly for the other equalities. In terms of the matrices of partial derivatives:

[ ∂p/∂x ∂p/∂y ∂p/∂z ; ∂q/∂x ∂q/∂y ∂q/∂z ] = [ ∂p/∂u ∂p/∂v ; ∂q/∂u ∂q/∂v ] [ ∂u/∂x ∂u/∂y ∂u/∂z ; ∂v/∂x ∂v/∂y ∂v/∂z ],

i.e. d(g ∘ f)(x) = dg(f(x)) ∘ df(x), where x = (x, y, z).

Proof of the Chain Rule: For the proof we relabel, taking g as the inner function and f as the outer one. We want to show

(73) (f ∘ g)(a + h) = (f ∘ g)(a) + L(h) + o(|h|), where L = df(g(a)) ∘ dg(a),

since this says exactly that f ∘ g is differentiable at a with differential df(g(a)) ∘ dg(a). Now

(f ∘ g)(a + h) = f( g(a + h) )
  = f( g(a) + [g(a + h) − g(a)] )
  = f(g(a)) + ⟨df(g(a)), g(a + h) − g(a)⟩ + o(|g(a + h) − g(a)|), by the differentiability of f,
  = f(g(a)) + ⟨df(g(a)), ⟨dg(a), h⟩ + o(|h|)⟩ + o(|g(a + h) − g(a)|), by the differentiability of g,
  = f(g(a)) + ⟨df(g(a)), ⟨dg(a), h⟩⟩ + ⟨df(g(a)), o(|h|)⟩ + o(|g(a + h) − g(a)|)
  = A + B + C + D, say.

But B = ⟨df(g(a)) ∘ dg(a), h⟩, by definition of the "composition" of two maps. Also C = o(|h|), from (71) (exercise). Finally, for D we have

g(a + h) − g(a) = ⟨dg(a), h⟩ + o(|h|), by differentiability of g,

so that

|g(a + h) − g(a)| ≤ ‖dg(a)‖ |h| + o(|h|), from (71),
  = O(|h|),

and hence D = o(|g(a + h) − g(a)|) = o(|h|) (why?). Substituting the above expressions into A + B + C + D, we get

(74) (f ∘ g)(a + h) = f(g(a)) + ⟨df(g(a)) ∘ dg(a), h⟩ + o(|h|).

It follows that f ∘ g is differentiable at a, and moreover the differential equals df(g(a)) ∘ dg(a). This proves the theorem.
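The matrix form of the chain rule can be verified with a concrete choice of maps. The sketch below uses hypothetical inner and outer maps of my own choosing in the shape of Example 18.5.2, (u, v) = (xy, y + z) and (p, q) = (u², uv), and checks that the 2 × 3 Jacobian of the composition equals the product of the 2 × 2 and 2 × 3 Jacobians, as in (72).

```python
# Hypothetical inner and outer maps (not from the notes):
#   inner : (x, y, z) -> (u, v) = (x*y, y + z)
#   outer : (u, v)    -> (p, q) = (u*u, u*v)
def d_inner(x, y, z):            # Jacobian of the inner map, 2 x 3
    return [[y, x, 0.0],
            [0.0, 1.0, 1.0]]

def d_outer(u, v):               # Jacobian of the outer map, 2 x 2
    return [[2 * u, 0.0],
            [v, u]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a = (1.0, 2.0, 3.0)
u, v = a[0] * a[1], a[1] + a[2]            # inner(a) = (2, 5)
chain = matmul(d_outer(u, v), d_inner(*a)) # dg(f(a)) composed with df(a), as in (72)

# Direct differentiation of the composition: p = x^2 y^2, q = x y (y + z),
# so grad p = (2xy^2, 2x^2 y, 0) and grad q = (y^2 + yz, 2xy + xz, xy) at (1, 2, 3).
direct = [[8.0, 4.0, 0.0],
          [10.0, 7.0, 2.0]]
assert chain == direct
```

Matrix multiplication here mirrors composition of the differentials: the (i, j) entry of the product is exactly the partial-derivative sum in the componentwise form of the chain rule.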