
NOTES, MAT 472, INTERMEDIATE ANALYSIS, FALL 2010

JACK SPIELBERG

Contents
1. Axioms for the real numbers
2. Cardinality (briefly)
3. Decimal representation of real numbers
4. Metric spaces
5. The topology of metric spaces
6. The Cantor set
7. Sequences
8. Continuous functions
9. Limits of functions
10. Sequences in R
11. Limsup and liminf
12. Infinite limits and limits at infinity
13. Cauchy sequences and complete metric spaces
14. Compactness
15. Continuity and compactness
16. Connectedness
17. Continuity and connectedness
18. Uniform continuity
19. Convergence of functions
20. Differentiation
21. Higher order derivatives and Taylor’s theorem
22. The Riemann integral
23. The “Darboux” approach
24. Measure zero and integration
25. The fundamental theorem of calculus
26. The Weierstrass approximation theorem
27. Uniform convergence and the interchange of limits
28. Infinite series
29. Series of functions
30. Power series
31. Compactness in function space
32. Conditional convergence

1. Axioms for the real numbers


In this course we will more-or-less follow an axiomatic approach. Namely, we will give
axioms for the real numbers, and prove everything in the course from these axioms. Well,
this is not strictly true — some things will be stated without proof. These may be as
simple as ordinary high school algebra, which we will assume is well-understood already
(and that you have had some experience in deriving from the axioms). We will also make
use of various functions familiar from calculus, such as the trigonometric, exponential and
logarithmic functions, even if we haven’t yet proved their existence and properties from
the axioms. However, in this case we will eventually at least sketch how this can be done
rigorously, and we promise that even if we never talk about these proofs, the use we make
of these functions isn’t needed for the proofs (thus we avoid any circularity in the logical
structure of the material). There is one theorem at the foundation of the course that we will
neither prove nor sketch. That one we will just take “on faith.” (You can read a proof in
the first chapter of Rudin’s book, and if you do, you will understand why we won’t use class
time for it.)
So now we begin. The axioms that define the real numbers come in three parts: the field
axioms, the order axioms, and the completeness axiom.
Definition 1.1. A field is a set F with two binary operations, addition (denoted +) and mul-
tiplication (denoted ·), that satisfy the following axioms (we assume that these are familiar,
so we only give them briefly).
(1) Addition and multiplication are associative and commutative.
(2) There exist identity elements for addition and multiplication, denoted 0 and 1, re-
spectively.
(3) 0 ≠ 1.
(4) Every element of F has an additive inverse.
(5) Every non-zero element of F has a multiplicative inverse.
(6) Multiplication distributes over addition.
All of the usual algebraic rules of arithmetic follow from these axioms. For example:
• Additive and multiplicative identities and inverses are unique.
• $(-1)^2 = 1$.
• $xy = 0$ if and only if $x = 0$ or $y = 0$.
• $(a + b)^n = \sum_{j=0}^{n} \binom{n}{j} a^{n-j} b^j$, where the binomial coefficients are defined by
\[ \binom{n}{j} = \frac{n!}{j!\,(n - j)!}. \]
We will assume familiarity with this stuff. It is interesting, though, to consider what is
actually included in the phrase “this stuff.” What facts from high school algebra are covered
by the field axioms? Here is an example of something that is not covered.
Example 1.2. Let F be a field. Are the elements 1, 1 + 1, 1 + 1 + 1, 1 + 1 + 1 + 1, . . .
all distinct? In fact, if we just have the field axioms, we can neither prove nor disprove
that these are all distinct elements. Notice that these are what we normally refer to as the
natural numbers (denoted N). So it isn’t clear that the natural numbers even make sense in
an arbitrary field.
Exercise 1.3. Explain why the “fact” stated in the previous example is true.
As another example, it is impossible to define the absolute value function in an arbitrary
field in an intelligent way. To resolve these problems (i.e. to make sure that our axioms
really do pick out the real numbers) we have to give more axioms.
Definition 1.4. Let F be a field. Then F is an ordered field if there is a distinguished
subset $F^+$ of F (called the positive elements of F) satisfying the following properties.
(1) For each x ∈ F, exactly one of the three statements $x \in F^+$, $-x \in F^+$, $x = 0$ is true.
(2) If $x, y \in F^+$ then $x + y \in F^+$.
(3) If $x, y \in F^+$ then $xy \in F^+$.
Now we define the usual symbols to express order. For any elements x, y ∈ F, we write
x > y to mean $x - y \in F^+$, x ≥ y to mean x > y or x = y, etc.
All of the usual rules of inequalities follow from the order axioms and the field axioms.
For example:
• If x ≤ y and y ≤ x, then x = y.
• If x ≠ 0 then $x^2 > 0$.
• If x < y and z > 0 then xz < yz.
• If 0 < x < y and n ∈ N then $x^n < y^n$.
• xy > 0 if and only if x > 0 and y > 0, or x < 0 and y < 0.
• A non-empty finite subset of F has a minimum. (Of course this need not be true for infinite subsets.)
As a further example, let’s note that the order axioms resolve the ambiguity mentioned
above. If F is an ordered field, then 1 + 1 + · · · + 1 > 0. It follows easily (exercise!) that 1,
1 + 1, 1 + 1 + 1, . . . are all distinct positive elements of F . Thus N is “contained” in every
ordered field. But then the integers, Z, are also contained in every ordered field. But then the
rational numbers, Q, are also contained in every ordered field. In fact, the rational numbers
themselves are an ordered field (this is obvious, isn’t it?). Thus the rational numbers can be
described as the smallest ordered field.
We will now dip back into high school algebra to review the absolute value function. Even
though we assume familiarity with this stuff, absolute value is so important that it’s worth
stating some of the details.
Definition 1.5. Let F be an ordered field. The absolute value function | · | : F → F is
defined by
\[ |x| = \begin{cases} x, & \text{if } x \ge 0, \\ -x, & \text{if } x < 0. \end{cases} \]
When we think of the real numbers as a “number line,” we can view |x| as the “distance”
between x and 0. The number line is great for intuition, but must not be used for proofs.
Here are the basic properties of absolute value.
(1) | − x| = |x|.
(2) |x| ≥ 0.
(3) ±x ≤ |x|.
(4) |x| < a if and only if −a < x and x < a (written −a < x < a).
(5) |x| > a if and only if x < −a or x > a (Note that this cannot be written without
using the word “or”).
(6) |x + y| ≤ |x| + |y| (the triangle inequality).
(7) |x + y| ≥ | |x| − |y| |.

(8) |x − a| < r if and only if a − r < x < a + r (draw a picture on the number line).
(9) If a < x < b and a < y < b then |x − y| < b − a.
(10) Let x ∈ F . Suppose that |x| < ε for every positive element ε ∈ F . Then x = 0.
Property 10 above can be strengthened a bit, in a way that can be very useful. (Don’t
cite property 10 when proving this.)
Exercise 1.6. Let F be an ordered field, and let x ∈ F. Suppose that $p, q \in F^+$ are such
that for every ε ∈ F with 0 < ε < p, we have |x| < qε. Then x = 0.
Remark 1.7. Here is another consequence of the ordered field axioms. Let b > 0. Then
\[ (1 + b)^n = 1 + nb + \cdots > nb. \]
Now let 0 < a < 1. Then $\frac{1}{a} > 1$, so
\[ \frac{1 - a}{a} = \frac{1}{a} - 1 > 0. \]
Let $b = \frac{1}{a} - 1$. Then $a = \frac{1}{1 + b}$, and
\[ a^n = \frac{1}{(1 + b)^n} < \frac{1}{nb} = \frac{a}{1 - a} \cdot \frac{1}{n}. \]
Now we ask the following question (assuming some familiarity with the concept of limit,
but only for the sake of the discussion): if 0 < a < 1 does $a^n$ tend towards 0, as n → ∞?
Another way to put this is to ask: if c is any fixed positive element, does there exist $n_0 \in \mathbb{N}$
such that $a^n < c$ for all $n \ge n_0$? Using the above computations, we see that we can answer
this question affirmatively if we could show that for any fixed positive element c, there exists
$n_0 \in \mathbb{N}$ such that $\frac{a}{(1-a)n} < c$ for all $n \ge n_0$. Now observe that we could do this if we could
find $n_0 \in \mathbb{N}$ such that $\frac{a}{(1-a)n_0} < c$. In other words, we could prove that $a^n \to 0$ if we could
find $n_0 \in \mathbb{N}$ such that $n_0 > \frac{a}{(1-a)c}$. But since c is an arbitrary positive element, then so
is $\frac{a}{(1-a)c}$. So this all comes down to trying to prove that for any positive element x, there
is a natural number $n_0$ such that $n_0 > x$. An ordered field in which this is true is called
Archimedean.
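The estimate in Remark 1.7 can be tried out in the ordered field Q, where the Archimedean property is available. The following is only a sketch: it uses Python's exact `Fraction` type as a stand-in for elements of Q, and the function names `bound_holds` and `first_index_above` are made up for this illustration.

```python
from fractions import Fraction

def bound_holds(a, n):
    """Check the inequality a^n < a / ((1 - a) n) for 0 < a < 1 and n >= 1."""
    return a**n < a / ((1 - a) * n)

def first_index_above(a, c):
    """Smallest natural number n0 with n0 > a / ((1 - a) c)."""
    bound = a / ((1 - a) * c)
    n0 = 1
    while n0 <= bound:  # the loop stops because Q is Archimedean
        n0 += 1
    return n0

a, c = Fraction(9, 10), Fraction(1, 100)
n0 = first_index_above(a, c)
print(n0, a**n0 < c)
```

For every n ≥ n0 the chain a^n < a/((1−a)n) ≤ a/((1−a)n0) < c then applies, which is exactly the reduction described in the remark.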
Definition 1.8. Let F be an ordered field. F is called Archimedean if for every x ∈ F there
exists a natural number n such that x < n.
It is evident that Q is an Archimedean ordered field, and we “know” that R is one too.
But we can’t prove it yet, because not all ordered fields are Archimedean!! In other words,
we don’t yet have enough axioms for the real numbers, since we can’t prove the most basic
fact from advanced calculus. Along with the field and order axioms, there is one more axiom
that is necessary to characterize the real numbers. We need some definitions before we can
present it.
Definition 1.9. Let F be an ordered field, let S ⊆ F , and let x ∈ F .
(1) x is an upper bound of S if y ≤ x for every y ∈ S.
(2) x is a lower bound of S if y ≥ x for every y ∈ S.
(3) S is bounded above if there exists an upper bound for S.
(4) S is bounded below if there exists a lower bound for S.
(5) S is bounded if it is bounded above and below.
Exercise 1.10. Is the empty set bounded?
Definition 1.11. Let S be a subset of an ordered field F, and let x ∈ F. x is a supremum
(or sup, or least upper bound, or lub) of S if
(1) x is an upper bound of S.
(2) For every upper bound z of S, x ≤ z.
The condition (2) can be expressed in the equivalent forms:
(2′) For any z ∈ F, if z < x then z is not an upper bound of S.
(2″) For any z ∈ F, if z < x then there exists y ∈ S with z < y.
In a completely analogous manner we define infimum (or inf, greatest lower bound, glb).
The details of the precise formulation are left as an exercise.
Remark 1.12. It follows immediately from condition (2) of Definition 1.11 that if S has a
supremum then it is unique (and similarly for infimum).
Exercise 1.13. Let S be a subset of an ordered field F, and let x ∈ F. Let −S = {−y :
y ∈ S}. Prove the following.
(1) S is bounded above (respectively, below) if and only if −S is bounded below (respec-
tively, above).
(2) x is an upper (respectively, lower) bound for S if and only if −x is a lower (respec-
tively, upper) bound for −S.
(3) x is a supremum (respectively, infimum) of S if and only if −x is an infimum (respec-
tively, supremum) of −S.
Now we are ready to state the last axiom of the real numbers, the completeness axiom.
Definition 1.14. Let F be an ordered field. F is complete if every non-empty subset of F
that is bounded above has a supremum.
The following is an easy consequence of Exercise 1.13.
Corollary 1.15. Let F be a complete ordered field. Then every non-empty subset of F that
is bounded below has an infimum.
The next theorem is the foundation of the course, but is the one result that we won’t
attempt to prove. As mentioned earlier, you can read a proof in Rudin’s book.
Theorem 1.16. There exists a unique complete ordered field.
The one and only complete ordered field is called the field of real numbers, and we will
write R as an abbreviation. This is the same number line that we (think we) know and love.
But even though we have lots of intuition about it, we will insist on proving EVERYTHING
about it. For example, since R is an ordered field, R contains the rational numbers Q as a
subfield. Are there any other elements of R besides Q? Well, we think we know that there
are, but how do we prove that there are? The usual way is to bring up the classical proof
that √2 is irrational. But this is sophistry! That proof “merely” shows that no rational
number has square equal to 2. It’s possible that there is an element of R having square equal
to 2. If there is such an element, then it can’t belong to Q, so it would be an element of
R \ Q. But we don’t know yet that there is a real number having square equal to 2. In fact,
there is, but this fact must be proved.
Let’s return to an even more basic point, the Archimedean property. We mentioned earlier
that R is Archimedean, and of course this is a fundamental property of the number line: the
natural numbers march off arbitrarily far to the right. Our first theorem about the real
numbers is this fact. As we pointed out before, the proof must rely on the completeness
axiom, since not all ordered fields are Archimedean.
Theorem 1.17. R is Archimedean: for every x ∈ R there exists n ∈ N such that x < n.
Proof. We suppose that R is not Archimedean, and derive a contradiction. So let x ∈ R
be such that x ≥ n for all n ∈ N. This just means that x is an upper bound for N. Thus
the (non-empty) subset N of R is bounded above. By the completeness axiom, N has a
supremum. Let z = sup(N). Now z − 1 < z. By Definition 1.11 (2″), there is an element
n ∈ N with n > z − 1. But then n + 1 > z. Since n + 1 ∈ N, this contradicts Definition 1.11
(1). Therefore R is Archimedean. 
We now present some corollaries of the Archimedean property.
Corollary 1.18. If x ∈ R with x > 0, then there exists n ∈ N with $\frac{1}{n} < x$.
Proof. By the Archimedean property there is n ∈ N with $n > \frac{1}{x}$. Then $\frac{1}{n} < x$.
Before stating the next corollary, we recall the well-ordering principle (WOP) and one of
its variations. The WOP states that a non-empty subset of N contains a smallest element.
This is a fundamental property of the natural numbers — it is logically equivalent to the
principle of mathematical induction. The variation we need states that a non-empty subset
of Z that is bounded below (in Z) contains a smallest element.
Corollary 1.19. For x ∈ R there exists a unique n ∈ Z with n ≤ x < n + 1.
Proof. Let x ∈ R. By the Archimedean property there is m ∈ N with m > |x|. Then
x > −m, so the set {k ∈ Z : k > x} is non-empty and bounded below (by −m). Let n + 1
be its smallest element. Then n + 1 > x. But since n < n + 1, n is not in this set, so
n ≤ x. This proves existence. For uniqueness, suppose that n and n′ both do the job. Then
x − 1 < n, n′ ≤ x, so (by property 9 of absolute value) we have |n − n′| < 1. Since n, n′ ∈ Z
then n = n′.
The integer n of Corollary 1.19 is denoted [x]. The function [·] : R → Z is called the greatest
integer function. (Some people denote it by ⌊x⌋; ⌊·⌋ is also called the floor function.)
Corollary 1.20. For x ∈ R and for N ∈ N there exists a unique n ∈ Z such that
\[ \frac{n}{N} \le x < \frac{n+1}{N}. \]
Proof. Apply Corollary 1.19 to Nx.
Corollary 1.21. For x, ε ∈ R with ε > 0, there exists y ∈ Q such that |x − y| < ε.
Proof. By Corollary 1.18 there is N ∈ N with $\frac{1}{N} < \varepsilon$. By Corollary 1.20 there is n ∈ Z such
that $\frac{n}{N} \le x < \frac{n+1}{N}$. Let $y = \frac{n}{N}$. Then y ∈ Q, and
\[ |x - y| = x - y < \frac{n+1}{N} - \frac{n}{N} = \frac{1}{N} < \varepsilon. \]
The conclusion of Corollary 1.21 is often expressed as: Q is dense in R.
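The proof of Corollary 1.21 is constructive, and its two steps can be sketched in code. This is an illustration under stated assumptions: exact `Fraction` values stand in for the real numbers x and ε, `math.floor` plays the role of the greatest integer function of Corollary 1.19, and the name `rational_within` is hypothetical.

```python
from fractions import Fraction
import math

def rational_within(x, eps):
    """Return y = n/N with 0 <= x - y < eps, following Corollaries 1.18 and 1.20."""
    N = math.floor(1 / eps) + 1   # a natural number with 1/N < eps
    n = math.floor(N * x)         # the unique integer with n/N <= x < (n+1)/N
    return Fraction(n, N)

x, eps = Fraction(355, 113), Fraction(1, 1000)
y = rational_within(x, eps)
print(y, x - y)
```

The approximation always lies at or below x, which matches the step |x − y| = x − y in the proof.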
The completeness axiom is actually stronger than the Archimedean property. The next
result does not follow from the Archimedean property (as can be seen from the fact that the
conclusion does not hold in Q).
Theorem 1.22. Let n ∈ N. Every positive real number has a unique positive nth root.
Proof. We first prove uniqueness. If 0 < y < z then $y^n < z^n$, so two distinct positive real
numbers cannot be nth roots of the same real number. We now prove existence. Let a > 1.
(If 0 < a < 1, then 1/a > 1. In this case, if we show that 1/a has a positive nth root, then
the inverse of that root will be a positive nth root for a. If a = 1, then 1 is an nth root of a.)
Let $E = \{x \ge 0 : x^n \le a\}$. We
note that E ≠ ∅ since 1 ∈ E. We claim that E is bounded above. To see this, note that if
x ∈ E then
\[ x^n \le a \le a^n. \]
Therefore x ≤ a, and we see that a is an upper bound for E. Thus the completeness axiom
implies that y = sup(E) exists. We will show that $y^n = a$, finishing the proof.
First note that y ≥ 1, since 1 ∈ E. We will use Exercise 1.6. Let 0 < ε < 1. First note
that since y − ε < y < y + ε, we have
(1) $(y - \varepsilon)^n < y^n < (y + \varepsilon)^n$.
Since y − ε < y, property (2″) of Definition 1.11 implies that there is x ∈ E with y − ε < x.
Then $(y - \varepsilon)^n < x^n \le a$. Also, since y + ε > y then y + ε ∉ E, and hence $a < (y + \varepsilon)^n$.
Therefore
(2) $(y - \varepsilon)^n < a < (y + \varepsilon)^n$.
From (1) and (2), and property 9 of absolute value, we have $|y^n - a| < (y + \varepsilon)^n - (y - \varepsilon)^n$.
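We have
\begin{align*}
(y + \varepsilon)^n - (y - \varepsilon)^n
&= \sum_{j=0}^{n} \binom{n}{j} \bigl( y^{n-j} \varepsilon^j - y^{n-j} (-\varepsilon)^j \bigr) \\
&= \sum_{j=0}^{n} \binom{n}{j} y^{n-j} \varepsilon^j \bigl( 1 - (-1)^j \bigr) \\
&= 2 \sum_{\substack{j=1 \\ j \text{ odd}}}^{n} \binom{n}{j} y^{n-j} \varepsilon^j \\
&\le \Biggl( 2 \sum_{\substack{j=1 \\ j \text{ odd}}}^{n} \binom{n}{j} y^{n-j} \Biggr) \varepsilon, \quad \text{since } \varepsilon < 1.
\end{align*}
By Exercise 1.6 it follows that $y^n - a = 0$, and hence that $y^n = a$.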
Now we know that R truly is bigger than Q; for example, √2 ∈ R \ Q. This one number
can be parlayed into many more.
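The set E from the proof of Theorem 1.22 also suggests a way to trap the nth root numerically. The sketch below is not the argument of the proof (which uses the completeness axiom); it is a bisection over exact rationals, with the hypothetical name `root_bracket`, that keeps one endpoint inside E and the other an upper bound of E.

```python
from fractions import Fraction

def root_bracket(a, n, eps):
    """For a > 1, return (lo, hi) with lo in E, hi an upper bound of E, hi - lo < eps.

    Here E = {x >= 0 : x^n <= a}, so sup(E) always lies in [lo, hi].
    """
    lo, hi = Fraction(1), Fraction(a)   # 1 is in E, and a is an upper bound of E
    while hi - lo >= eps:
        mid = (lo + hi) / 2
        if mid**n <= a:
            lo = mid                    # mid belongs to E
        else:
            hi = mid                    # x^n <= a < mid^n forces x < mid for x in E
    return lo, hi

lo, hi = root_bracket(Fraction(2), 2, Fraction(1, 10**6))
print(float(lo), float(hi))
```

Every endpoint produced this way is rational, so the bracket can shrink forever without ever landing on √2 itself; the limit point exists only because R is complete.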
Theorem 1.23. R \ Q is dense in R.

Proof. Let x ∈ R, and let ε > 0. By Corollary 1.21 there is z ∈ Q with |x√2 − z| < ε.
In fact, it follows also that we can assume z ≠ 0. Let y = z/√2. Then y ∈ R \ Q, and
|x − y| < ε/√2 < ε.
Definition 1.24. The elements of R \ Q are called irrational numbers.
Thus the irrational numbers are also dense in R. While Corollary 1.21 and Theorem 1.23
treat the rational and irrational numbers symmetrically, in fact the set of irrational numbers
is much bigger than the set of rationals (Corollary 2.12). Before proving this, we will first
review some basic facts about the size of sets.
2. Cardinality (briefly)
Definition 2.1. Let A and B be sets.
(1) A and B are equivalent, written A ∼ B, if there exists a bijection from A to B. In
this case, A and B are said to be of the same cardinality.
(2) A is subequivalent to B, written A ≼ B, if there is a one-to-one function from A to
B.
The proof of the following proposition is elementary.
Proposition 2.2. For any sets A, B and C,
• A ∼ A.
• If A ∼ B then B ∼ A.
• If A ∼ B and B ∼ C then A ∼ C.
The next theorem is very useful, and its proof is a nice exercise.
Theorem 2.3. (Cantor-Bernstein) Let A and B be sets. If A ≼ B and B ≼ A then A ∼ B.
Definition 2.4. Let A be a set.
(1) A is finite if there is n ∈ N ∪ {0} such that A ∼ {1, 2, . . . , n}.
(2) A is infinite if A is not finite.
(3) A is denumerable if A ∼ N.
(4) A is countable if A is finite or denumerable.
(5) A is uncountable if A is not countable.
Proposition 2.5. (1) If m ≠ n then {1, 2, . . . , m} ≁ {1, 2, . . . , n}.
(2) N is infinite.
(3) A is countable if and only if A ≼ N.
(4) Let $A_1, A_2, \ldots$ be countable sets. Then $\bigcup_{n=1}^{\infty} A_n$ is countable, and for each n, $A_1 \times \cdots \times A_n$ is countable.
(5) Q is countable.
(5) Q is countable.
Proof. The first three statements can be proved as exercises. For the fourth, let $A_n = \{x_{n1}, x_{n2}, \ldots\}$. Consider the list: $x_{11}, x_{12}, x_{21}, x_{13}, x_{22}, x_{31}, \ldots$. For each entry, delete all
subsequent occurrences. What is left is a list, without duplications, of the elements of the
union. This defines a bijection from N (or, if the union is finite, from an initial segment of N) to the union.
Suppose inductively that $A_1 \times \cdots \times A_n$ is countable. Then
\[ A_1 \times \cdots \times A_{n+1} = \bigcup_{x \in A_{n+1}} A_1 \times \cdots \times A_n \times \{x\} \]
is countable.
For the last statement, first note that Z is countable, as can be seen from the list: 0, 1,
−1, 2, −2, . . .. Since $\mathbb{Z} \sim \frac{1}{n}\mathbb{Z}$, it follows from Proposition 2.2 that $\frac{1}{n}\mathbb{Z}$ is countable. Then
\[ \mathbb{Q} = \bigcup_{n=1}^{\infty} \tfrac{1}{n}\mathbb{Z} \]
is countable.
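The listing used in this proof can be written out as a generator. This is a sketch under assumptions, not part of the notes: `integer_at` and `rationals` are hypothetical names, the jth element of (1/n)Z is taken to be the jth entry of the list 0, 1, −1, 2, −2, . . . divided by n, and a Python set does the "delete all subsequent occurrences" step.

```python
from fractions import Fraction
from itertools import count, islice

def integer_at(j):
    """The j-th entry (j >= 1) of the list 0, 1, -1, 2, -2, ..."""
    return j // 2 if j % 2 == 0 else -(j // 2)

def rationals():
    """Yield each rational exactly once, in the order x11, x12, x21, x13, x22, x31, ..."""
    seen = set()
    for s in count(2):                 # s = n + j labels one diagonal of the array
        for n in range(1, s):
            j = s - n
            q = Fraction(integer_at(j), n)   # x_{nj}: the j-th element of (1/n)Z
            if q not in seen:          # skip entries that already occurred
                seen.add(q)
                yield q

first = list(islice(rationals(), 7))
print(first)
```

Each rational p/n sits at a definite position x_{nj} of the array, so it is reached after finitely many steps; that is the content of the bijection with N.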
Example 2.6. Let $X = \prod_{1}^{\infty} \{0, 1\} = \bigl\{ (x_1, x_2, \ldots) : x_i \in \{0, 1\} \text{ for all } i \bigr\}$. (Thus X is the
set of all sequences of 0’s and 1’s.)
Proposition 2.7. X is uncountable.
Proof. We will show that if f : N → X is any function, then f is not onto. Therefore there
does not exist a bijection from N to X.
So let f : N → X be given. Let f(n) be the sequence $(x_{n1}, x_{n2}, x_{n3}, \ldots)$. Define an
element $y = (y_1, y_2, \ldots) \in X$ by $y_n = 1 - x_{nn}$. Then for each n, y and f(n) differ in the nth
slot, so that y ≠ f(n). Therefore y is not in the range of f. Therefore f is not onto.
Remark 2.8. X ∼ P(N). To define a bijection from X to P(N), send a sequence $x = (x_1, x_2, \ldots)$ to the set $\{n \in \mathbb{N} : x_n = 1\}$. It is easy to check that this works. In fact, Proposition 2.7 is a
special case of a general theorem of Cantor.
Theorem 2.9. If S is any set, and if f : S → P(S) is any function, then f is not onto.
Thus for any set S, S ≁ P(S). (Since it is evident that S ≼ P(S), we observe that P(S)
has a larger cardinality than S.)
Proof. Given f, let E = {x ∈ S : x ∉ f(x)}. It is easy to check that E is not in the range
of f. Indeed, if E = f(x) for some x ∈ S, then x ∈ E if and only if x ∉ f(x) = E, which is absurd.
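For a small finite set the statement of Theorem 2.9 can be checked by brute force. The sketch below is only an illustration (the theorem is about arbitrary sets): it takes S = {0, 1, 2}, runs over all 8³ = 512 functions f : S → P(S), and confirms that the set E from the proof is never a value of f. The names `powerset` and `cantor_set` are hypothetical.

```python
from itertools import combinations, product

def powerset(S):
    """All subsets of S, as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def cantor_set(S, f):
    """E = {x in S : x not in f(x)}, where f is a dict from S to subsets of S."""
    return frozenset(x for x in S if x not in f[x])

S = {0, 1, 2}
subsets = powerset(S)
never_hit = all(
    cantor_set(S, dict(zip(sorted(S), images))) not in images
    for images in product(subsets, repeat=len(S))   # one tuple of values per function f
)
print(len(subsets), never_hit)
```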
The next result will be proved later (Corollary 6.4).
Theorem 2.10. R ∼ X.
Corollary 2.11. R is uncountable.
The previous corollary (and hence also the next corollary) can be proved from the results
of the next section, rather than from Corollary 6.4.
Corollary 2.12. The set of irrational numbers is uncountable.

3. Decimal representation of real numbers


We like to think of elements of R as infinite decimals: $x \sim x_0.x_1x_2x_3\cdots$, where $x_0 \in \mathbb{Z}$
and $x_n \in \{0, 1, \ldots, 9\}$ for n ≥ 1. We want to make this precise without using infinite series.
Let x ∈ R. Let $x_0 = [x] \in \mathbb{Z}$. Then $x_0 \le x < x_0 + 1$, so
\[ 0 \le x - x_0 < 1. \]
Then $0 \le 10(x - x_0) < 10$. We let $x_1 = [10(x - x_0)] \in \{0, 1, \ldots, 9\}$. We have
\begin{align*}
x_1 &\le 10(x - x_0) < x_1 + 1 \\
x_1 10^{-1} &\le x - x_0 < x_1 10^{-1} + 10^{-1} \\
0 &\le x - x_0 - x_1 10^{-1} < 10^{-1}.
\end{align*}
Inductively, suppose that we have constructed $x_{n-1} \in \{0, 1, \ldots, 9\}$ such that
\[ 0 \le x - \sum_{i=0}^{n-1} x_i 10^{-i} < 10^{-(n-1)}. \]
Then $0 \le 10^n \bigl( x - \sum_{i=0}^{n-1} x_i 10^{-i} \bigr) < 10$. We set
\[ x_n = \Bigl[ 10^n \Bigl( x - \sum_{i=0}^{n-1} x_i 10^{-i} \Bigr) \Bigr] \in \{0, 1, \ldots, 9\}. \]
Then $x_n \le 10^n \bigl( x - \sum_{i=0}^{n-1} x_i 10^{-i} \bigr) < x_n + 1$, and hence
(1) $0 \le x - \sum_{i=0}^{n} x_i 10^{-i} < 10^{-n}$.
Thus we have defined $x_0 \in \mathbb{Z}$ and $x_n \in \{0, 1, \ldots, 9\}$ for n ≥ 1 so that (1) holds for all n.
(2) $x = \sup \bigl\{ \sum_{i=0}^{n} x_i 10^{-i} : n \ge 0 \bigr\}$.
Proof. The proof is left as an exercise. 
(3) $(x_n)$ is not eventually equal to 9; precisely, for every n there is m ≥ n such that
$x_m \ne 9$. The point is that, for example, if we start with x = 1, we will obtain the
expansion 1.0000 · · · , and NOT 0.9999 · · · .
Proof. The proof is left as an exercise. 
(4) If x ≠ y then there exists k such that $x_k \ne y_k$. In other words, the map that takes a
real number to its decimal expansion is one-to-one.
Proof. Let x < y. Choose n such that $10^{-n} < y - x$. Then
\[ \sum_{i=0}^{n} x_i 10^{-i} \le x < y - 10^{-n} < \sum_{i=0}^{n} y_i 10^{-i}. \]
Hence there exists k ≤ n such that $x_k \ne y_k$.
(5) Let $y_0 \in \mathbb{Z}$ and $y_n \in \{0, 1, \ldots, 9\}$ for n ∈ N be such that $(y_n)$ is not eventually equal
to 9. Then there is a real number x such that $x_n = y_n$ for all n. This will prove that
there is a one-to-one correspondence between real numbers and decimal expansions
that do not terminate in a string of 9’s.
Proof. First note that $\sum_{i=1}^{n} 9/10^i = 1 - 10^{-n}$, by summing a finite geometric series. Now we
have
\[ \sum_{i=0}^{n} y_i 10^{-i} \le y_0 + \sum_{i=1}^{n} 9 \cdot 10^{-i} = y_0 + 1 - 10^{-n} < y_0 + 1. \]
Thus the set $\bigl\{ \sum_{i=0}^{n} y_i 10^{-i} : n \ge 0 \bigr\}$ is bounded above. Let x be the supremum of this set.
Note that the elements of this set, indexed by n, form an increasing sequence. We will show
that $x_n = y_n$ for all n. For n = 0, choose k ≥ 1 such that $y_k \ne 9$. For any m ≥ k,
\[ \sum_{i=0}^{m} y_i 10^{-i} \le y_0 + \sum_{i=1}^{m} 9 \cdot 10^{-i} - 10^{-k} = y_0 + 1 - 10^{-m} - 10^{-k} < y_0 + 1 - 10^{-k}. \]
It follows that $x \le y_0 + 1 - 10^{-k} < y_0 + 1$. Therefore $x_0 \le y_0$. On the other hand, we have
that $y_0 = \sum_{i=0}^{0} y_i 10^{-i}$, so $y_0 \le x$, and hence $y_0 \le x_0$. Thus $x_0 = y_0$. Suppose inductively
that $x_i = y_i$ for i < n. Choose k > n with $y_k \ne 9$, and let m ≥ k. We have
\begin{align*}
\sum_{i=0}^{m} y_i 10^{-i} &\le \sum_{i=0}^{n} y_i 10^{-i} + \sum_{i=n+1}^{m} 9 \cdot 10^{-i} - 10^{-k} \\
&= \sum_{i=0}^{n} y_i 10^{-i} + (1 - 10^{-m}) - (1 - 10^{-n}) - 10^{-k} \\
&< \sum_{i=0}^{n} y_i 10^{-i} + 10^{-n} - 10^{-k}.
\end{align*}
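Since this is true for all m ≥ k, we have
\[ x \le \sum_{i=0}^{n} y_i 10^{-i} + 10^{-n} - 10^{-k}. \]
Since $x_i = y_i$ for i < n, we have
\begin{align*}
x - \sum_{i=0}^{n-1} x_i 10^{-i} &\le y_n 10^{-n} + 10^{-n} - 10^{-k} \\
10^n \Bigl( x - \sum_{i=0}^{n-1} x_i 10^{-i} \Bigr) &\le y_n + 1 - 10^{-(k-n)} < y_n + 1 \\
x_n = \Bigl[ 10^n \Bigl( x - \sum_{i=0}^{n-1} x_i 10^{-i} \Bigr) \Bigr] &\le y_n.
\end{align*}
For the reverse inequality,
\[ x - \sum_{i=0}^{n-1} x_i 10^{-i} = x - \sum_{i=0}^{n-1} y_i 10^{-i} \ge \sum_{i=0}^{n} y_i 10^{-i} - \sum_{i=0}^{n-1} y_i 10^{-i} = y_n 10^{-n}. \]
Hence $10^n \bigl( x - \sum_{i=0}^{n-1} x_i 10^{-i} \bigr) \ge y_n$, and hence $x_n \ge y_n$.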
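The inductive construction of the digits at the start of this section is an algorithm, and it can be run on exact rationals. The following is a sketch: `Fraction` values stand in for real numbers, `math.floor` is the greatest integer function, and `decimal_digits` is a hypothetical name.

```python
from fractions import Fraction
import math

def decimal_digits(x, count):
    """Return [x_0, x_1, ..., x_count] as defined by the inductive construction."""
    digits = [math.floor(x)]                    # x_0 = [x]
    partial = Fraction(digits[0])               # running value of sum_{i<n} x_i 10^{-i}
    for n in range(1, count + 1):
        d = math.floor(10**n * (x - partial))   # x_n = [10^n (x - partial sum)]
        digits.append(d)
        partial += Fraction(d, 10**n)
    return digits

print(decimal_digits(Fraction(1), 4))
print(decimal_digits(Fraction(1, 3), 3))
print(decimal_digits(Fraction(-1, 4), 3))
```

Note that x = 1 produces 1.0000 · · · rather than 0.9999 · · ·, as property (3) predicts, and that a negative number such as −1/4 is represented with x0 = −1 and fractional digits 7, 5, 0, since the construction always approximates from below.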
As we mentioned at the end of the previous section, the decimal representation of real
numbers can be used to prove that R is uncountable. The idea of the proof is a special case
of the proof of Cantor’s theorem. It is usually called Cantor’s diagonal argument.
Proof. (of Corollary 2.11) We suppose that R is countable, and deduce a contradiction. Let
$x_1, x_2, \ldots$ be a listing of the elements of (the supposedly countable set) R. Let $x_n$ have the
decimal representation $x_{n0}.x_{n1}x_{n2}\cdots$. Let $y_0 = 0$, and for each n ≥ 1 define $y_n$ as follows: if $x_{nn} \ne 1$ let
$y_n = 1$; if $x_{nn} = 1$, let $y_n = 2$. By construction, the sequence of digits $y_n$ is not eventually 9,
and therefore it is the decimal representation of a real number y. Now we see that for each
n, y and $x_n$ have decimal representations differing in the nth place; therefore $y \ne x_n$. Thus
y is not in the list we started with, contradicting the assumption that this list contained all
real numbers. 

4. Metric spaces
Much of what we do in analysis ultimately comes down to measuring the distance between
two real numbers. We use the absolute value for this: |x − y| is the distance between the
numbers x and y. There are many other situations where we use the distance between
points in an essential way. For example, the Pythagorean theorem is used to define the usual
distance between points in $\mathbb{R}^2$, and even in $\mathbb{R}^n$. One of the wonderful abstractions of twentieth-century
mathematics is a generalization of this notion of distance. In fact, it isn’t too hard to
notice that everything we use distance for in advanced calculus (e.g. limits, continuity, etc.)
relies only on a few very coarse aspects of the distance function. The following definition
sets these out precisely, and gives the basic setting for this course.
Definition 4.1. Let X be a set. A metric on X is a function d : X × X → R such that
(1) d(x, y) ≥ 0 for all x, y ∈ X (positivity).
(2) d(x, y) = 0 if and only if x = y (definiteness).
(3) d(x, y) = d(y, x) (symmetry).
(4) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).
Example 4.2. The usual metric on R is defined by d(x, y) = |x − y|.
Remark 4.3. Two common variations of the triangle inequality are easily proved as exercises:
(1) $d(x, y) \ge |d(x, z) - d(y, z)|$.
(2) $d(x, y) \le d(x, z_1) + d(z_1, z_2) + \cdots + d(z_{n-1}, z_n) + d(z_n, y)$.
Many important examples of metric spaces arise from norms on vector spaces.
Definition 4.4. Let V be a real vector space. A norm on V is a function ‖ · ‖ : V → R such
that
(1) ‖x‖ ≥ 0.
(2) ‖x‖ = 0 only if x = 0.
(3) ‖cx‖ = |c| ‖x‖ for all c ∈ R (and x ∈ V).
(4) ‖x + y‖ ≤ ‖x‖ + ‖y‖.
Remark 4.5. If ‖ · ‖ is a norm on V, there is an associated metric on V given by d(x, y) =
‖x − y‖.
Example 4.6. (Function space) Let S be a nonempty set. The bounded (real-valued) functions
on S are defined by B(S, R) = {f : S → R : the range of f is bounded}. It is easy
to see that B(S, R) is a vector space (with point-wise operations). The uniform norm is
defined on B(S, R) by $\|f\|_u = \sup_{x \in S} |f(x)|$. It is an (easy) exercise to check that this is a
norm.

Example 4.7. $\mathbb{R}^n = \mathbb{R} \times \cdots \times \mathbb{R}$ (n factors) can be thought of as $B(\{1, \ldots, n\}, \mathbb{R})$. The
uniform norm here is usually denoted $\|\cdot\|_\infty$: $\|(x_1, \ldots, x_n)\|_\infty = \max_{1 \le i \le n} |x_i|$.
Another important way of producing a norm on a vector space is by means of an inner
product.
Definition 4.8. Let V be a real vector space. An inner product on V is a function ⟨·, ·⟩ :
V × V → R such that
(1) ⟨x, x⟩ ≥ 0.
(2) ⟨x, x⟩ = 0 only if x = 0.
(3) ⟨x, y⟩ = ⟨y, x⟩.
(4) ⟨ax + by, z⟩ = a⟨x, z⟩ + b⟨y, z⟩ for all a, b ∈ R (and x, y, z ∈ V).
Property 4 of Definition 4.8 is called linearity in the first variable. By properties 3 and
4 it follows that inner products are also linear in the second variable. It follows from these
that ⟨0, y⟩ = ⟨y, 0⟩ = 0 for all y ∈ V.

Theorem 4.9. (Cauchy-Schwarz inequality) Let (V, ⟨·, ·⟩) be an inner product space. Then
\[ |\langle x, y\rangle| \le \langle x, x\rangle^{1/2} \langle y, y\rangle^{1/2}. \]
Proof. By the remarks before the theorem, the inequality holds if any of x, y, and ⟨x, y⟩
equals zero. So suppose that all three are non-zero. Let $a = -\operatorname{sgn}\bigl(\langle x, y\rangle\bigr) \langle y, y\rangle^{1/2}$ and
$b = \langle x, x\rangle^{1/2}$. (Recall that sgn(t) = 1 if t > 0, = −1 if t < 0, and = 0 if t = 0.) Then
\[ 0 \le \langle ax + by, ax + by\rangle = a^2 \langle x, x\rangle + 2ab \langle x, y\rangle + b^2 \langle y, y\rangle = a^2 b^2 - 2|a|\, b\, |\langle x, y\rangle| + b^2 a^2. \]
Dividing by 2|a| b gives the result.
Corollary 4.10. Let V be an inner product space. For x ∈ V let $\|x\| = \langle x, x\rangle^{1/2}$. Then ‖ · ‖
is a norm on V.
Proof. We will prove the triangle inequality, leaving the verification of the other properties
of a norm as an exercise. Let x, y ∈ V. Then by the Cauchy-Schwarz inequality,
\begin{align*}
\|x + y\|^2 &= \langle x + y, x + y\rangle = \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle = \|x\|^2 + 2\langle x, y\rangle + \|y\|^2 \\
&\le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = \bigl( \|x\| + \|y\| \bigr)^2.
\end{align*}
Example 4.11. The usual norm on $\mathbb{R}^n$ arises from the usual inner product. The corresponding
metric space is usually referred to as (n-dimensional) Euclidean space. We note
the following important inequalities for the Euclidean norm (proof by squaring).
Remark 4.12. Let $x \in \mathbb{R}^n$. Then for any i,
\[ |x_i| \le \|x\| \le |x_1| + \cdots + |x_n|. \]
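Both the Cauchy-Schwarz inequality and the inequalities of Remark 4.12 can be tested in squared form, which avoids square roots and keeps the arithmetic exact. The sketch below checks them for the usual inner product on a small grid of integer vectors in R³; the choice of grid and the name `inner` are assumptions made for this illustration, and a finite check is of course not a proof.

```python
from itertools import product

def inner(x, y):
    """The usual inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

# All vectors in R^3 with integer coordinates between -2 and 2.
vectors = list(product(range(-2, 3), repeat=3))

# Cauchy-Schwarz, squared: <x, y>^2 <= <x, x> <y, y>.
cauchy_schwarz_ok = all(
    inner(x, y) ** 2 <= inner(x, x) * inner(y, y)
    for x in vectors for y in vectors
)

# Remark 4.12, squared: max |x_i|^2 <= ||x||^2 <= (|x_1| + ... + |x_n|)^2.
remark_ok = all(
    max(abs(t) for t in x) ** 2 <= inner(x, x) <= sum(abs(t) for t in x) ** 2
    for x in vectors
)
print(cauchy_schwarz_ok, remark_ok)
```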
Definition 4.13. Let (X, d) be a metric space, and let Y ⊆ X. If we restrict the metric d
to points of Y then Y becomes a metric space, called a subspace of X.

Example 4.14. The circle (or torus) is a subspace of Euclidean space: $T = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$. (Thus, for example, $d\bigl((1, 0), (0, 1)\bigr) = \sqrt{2}$.)
It is very important to remember that, while pictures can give a lot of valuable intuition,
they are not a substitute for a proof. In this course, you may never use a picture as part
of a proof (though they can be included to help explain what you are doing). Well, it isn’t
really enough to just tell you not to touch the stove — you really have to burn yourself. The
following example is much more frequently encountered than you might imagine the first
time you see it. You should work through carefully on your own the details of the proof that
it is a metric space, and try to visualize it in some way (it’s unclear what that means!). It
provides a counterexample to many “obvious” facts about metric spaces that are not actually
true. The point is this: any theorem that we prove about metric spaces must be true for all
metric spaces. In particular, it will be true for the metric space in the next example.
Example 4.15. Recall the set X from Example 2.6. We define a metric on X as follows.
For x, y ∈ X with x ≠ y, the set $\{i : x_i \ne y_i\}$ is non-empty. By the well-ordering principle,
it has a least element. We set $k(x, y) = \min\{i : x_i \ne y_i\}$. Then we define
\[ d(x, y) = \begin{cases} \dfrac{1}{k(x, y)}, & \text{if } x \ne y, \\ 0, & \text{if } x = y. \end{cases} \]
We claim that d is a metric on X. The proofs of positive definiteness and symmetry are
immediate. We will verify the triangle inequality. In fact, we will prove something stronger,
called the ultrametric inequality.

Lemma 4.16. For any x, y, z ∈ X, d(x, y) ≤ max{d(x, z), d(y, z)}.
Proof. We will write “k(x, x) = ∞” as a kind of shorthand. (But notice that then we have
that d(x, y) < d(u, v) if and only if k(x, y) > k(u, v), for any points x, y, u, v ∈ X.)
Now let x, y, z ∈ X. If d(x, y) ≤ d(x, z) then the inequality holds. So suppose that
d(x, y) > d(x, z). Then k(x, y) < k(x, z). Since $x_i = z_i$ for i < k(x, z), we have that
$z_{k(x,y)} = x_{k(x,y)} \ne y_{k(x,y)}$. Therefore k(y, z) ≤ k(x, y), and hence d(x, y) ≤ d(y, z). Therefore
max{d(x, z), d(y, z)} = d(y, z) ≥ d(x, y).
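The metric of Example 4.15 is easy to experiment with on truncated sequences. In the sketch below a tuple of length 5 stands for an element of X all of whose later terms are 0 (this truncation is an assumption made so that the check is finite), and the names `first_difference` and `d` are hypothetical. The ultrametric inequality of Lemma 4.16 is then verified for all triples of such points.

```python
from fractions import Fraction
from itertools import product

def first_difference(x, y):
    """k(x, y): the least index i (counting from 1) with x_i != y_i, or None if x == y."""
    for i, (a, b) in enumerate(zip(x, y), start=1):
        if a != b:
            return i
    return None

def d(x, y):
    """d(x, y) = 1/k(x, y) if x != y, and 0 if x == y."""
    k = first_difference(x, y)
    return Fraction(0) if k is None else Fraction(1, k)

points = list(product((0, 1), repeat=5))
ultrametric_ok = all(
    d(x, y) <= max(d(x, z), d(y, z))
    for x in points for y in points for z in points
)
print(ultrametric_ok)
```

One consequence worth visualizing: two sequences are close exactly when they agree on a long initial segment, so every "triangle" in X is isosceles with the two longer sides equal.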
The following is another example of a metric space that varies from what our intuition
suggests. This one often seems like a stupid metric space . . . well, it is stupid, but it is
also a metric space. Every theorem about metric spaces must be true for it, and hence any
statement that is not true for this example, cannot be proven using only the axioms of a
metric space.
Example 4.17. Let S be any set. The discrete metric on S is defined by
\[ d(x, y) = \begin{cases} 1, & \text{if } x \ne y, \\ 0, & \text{if } x = y. \end{cases} \]
Remark 4.18. It is easy to see that the discrete metric on a set with n points can be
realized as a subspace of Euclidean n-space. It is a little harder to find a natural setting for
the discrete metric on N. The discrete metric on R is a useful counterexample to keep in
mind.
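One way to see the first claim of the remark is to place the n points at the scaled standard basis vectors $e_i/\sqrt{2}$ of Euclidean n-space; any two distinct ones are then at distance 1. The following numerical sketch assumes this particular embedding (the remark does not specify one), and `embedded_points` is a name chosen for the example.

```python
import math

def embedded_points(n):
    """Return the n points e_i / sqrt(2) in R^n as tuples of floats."""
    s = 1 / math.sqrt(2)
    return [tuple(s if j == i else 0.0 for j in range(n)) for i in range(n)]

pts = embedded_points(5)
# Euclidean distances between all pairs of distinct points.
dists = [math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]]
print(all(math.isclose(t, 1.0) for t in dists))
```

The comparison uses `math.isclose` because the coordinates are floating-point approximations of $1/\sqrt{2}$.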

5. The topology of metric spaces


Definition 5.1. Let (X, d) be a metric space, and let a ∈ X, r > 0. The open ball with
center a and radius r is the set
\[
B_r(a) = \{x \in X : d(x, a) < r\}.
\]
The closed ball with center a and radius r is the set
\[
\overline{B}_r(a) = \{x \in X : d(x, a) \le r\}.
\]
Example 5.2. In R, $B_r(a) = (a - r, a + r)$ and $\overline{B}_r(a) = [a - r, a + r]$. You should sketch the
pictures of the open and closed balls in $\mathbb{R}^2$. These pictures are extremely useful as intuition
when proving things. But it is NEVER permissible to use a picture as a substitute for a
proof.
Definition 5.3. Let (X, d) be a metric space, and let E ⊆ X.
(1) E is an open set if for each x ∈ E there is r > 0 such that Br (x) ⊆ E.
(2) E is a closed set if its complement $E^c$ is an open set.
Proposition 5.4. In a metric space, open balls are open sets and closed balls are closed sets.
Proof. Let a ∈ X and r > 0, and let x ∈ Br (a). We need to find an open ball centered at x
(with some positive radius) that is completely contained in Br (a). We know that d(x, a) < r,
since x ∈ Br (a). Let s = r − d(x, a). Then s > 0. We claim that Bs (x) ⊆ Br (a). To prove
this, let y ∈ Bs (x). Then d(y, x) < s. Then
d(y, a) ≤ d(y, x) + d(x, a) < s + d(x, a) = r − d(x, a) + d(x, a) = r,
and hence y ∈ Br (a). Therefore Br (a) is an open set.
The proof that closed balls are closed sets is left as an exercise. 
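The choice s = r − d(x, a) in the proof can be tested on sample points. The sketch below is an illustration only: it assumes the taxicab metric on pairs of rationals (chosen so that all arithmetic is exact), and the specific point x and grid are arbitrary.

```python
from fractions import Fraction as Q

def d1(p, q):
    """Taxicab metric on pairs of rationals (exact arithmetic)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

a, r = (Q(0), Q(0)), Q(1)
x = (Q(1, 2), Q(1, 4))      # a point of B_r(a), since d1(x, a) = 3/4 < 1
s = r - d1(x, a)            # the radius used in the proof

# Candidate points on a grid of step 1/40 around x.
grid = [(x[0] + Q(i, 40), x[1] + Q(j, 40))
        for i in range(-12, 13) for j in range(-12, 13)]
small_ball = [y for y in grid if d1(y, x) < s]

# Every sampled point of B_s(x) should lie in B_r(a).
contained = all(d1(y, a) < r for y in small_ball)
print(s, contained)
```

A finite sample cannot replace the proof; it only shows the triangle-inequality estimate at work.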
Proposition 5.5. The following hold in a metric space.
(1) The union of any collection of open sets is open.
(2) The intersection of a finite collection of open sets is open.
(3) The intersection of any collection of closed sets is closed.
(4) The union of a finite collection of closed sets is closed.

Proof. These are easy exercises using DeMorgan’s laws (and the notation for families of
sets). 

Example 5.6. (1) In any metric space X, X and ∅ are both open and closed. (It is a
fairly deep fact (to be proved later) that if $X = \mathbb{R}^n$ then these are the only sets that
are simultaneously open and closed.)
(2) A singleton set in a metric space is a closed set.

Proof. Let x ∈ X and $y \in \{x\}^c$. Then y ≠ x, so r = d(x, y) > 0. Every $z \in B_r(y)$ satisfies
d(z, y) < r = d(x, y), so z ≠ x. Hence $B_r(y) \subseteq \{x\}^c$.

(3) A finite subset of a metric space is a closed set.


(4) In $\mathbb{R}^n$, sets of the form $\{x : x_i > c\}$, $\{x : x_i < c\}$ are called open half-spaces, and are
open sets. Sets of the form $\{x : x_i \ge c\}$, $\{x : x_i \le c\}$ are called closed half-spaces,
and are closed sets.

Proof. Since a closed half-space is the complement of an open half-space, it is enough
to prove openness of open half-spaces. If $H = \{x : x_i > c\}$ and y ∈ H, let $r = y_i - c > 0$.
If $z \in B_r(y)$ then $y_i - z_i \le |z_i - y_i| \le d(z, y) < r$. Then $z_i = y_i - (y_i - z_i) > y_i - r = c$,
and hence z ∈ H. Thus $B_r(y) \subseteq H$, and we have shown that H is open. The proof
for the other kind of open half-space is left as an exercise.

Definition 5.7. An open box in $\mathbb{R}^n$ is a set of the form $(a_1, b_1) \times \cdots \times (a_n, b_n)$, where
$-\infty \le a_i \le b_i \le \infty$ for each i. Closed boxes in $\mathbb{R}^n$ are defined similarly, by including all
finite endpoints of the interval factors of the Cartesian product. We note that an open box
is a finite intersection of (at most 2n) open half-spaces, and hence is an open set. Similarly,
closed boxes are closed sets.

It is important to remember that, while the complement of an open set is a closed set, the
opposite of “open” is not “closed” — many (most, even) sets are neither open nor closed.
We next introduce the operations of interior and closure. These provide important open and
closed sets associated with arbitrary subsets of a metric space.

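Definition 5.8. Let X be a metric space and let E ⊆ X. The interior of E is the set
\[
\operatorname{int}(E) = \bigcup \{U : U \subseteq E \text{ and } U \text{ is open}\}.
\]
The closure of E is the set
\[
\overline{E} = \bigcap \{K : K \supseteq E \text{ and } K \text{ is closed}\}.
\]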

Remark 5.9. We observe that


(1) int(E) is an open set, and is the largest open set contained in E.
(2) $\overline{E}$ is a closed set, and is the smallest closed set containing E.
(3) $\overline{E} = \bigl(\operatorname{int}(E^c)\bigr)^c$, and $\operatorname{int}(E) = \bigl(\overline{E^c}\bigr)^c$.

Proof. The first two items follow immediately from the definitions. For the third item, we
have
\begin{align*}
\overline{E} &= \bigcap \{K : E \subseteq K,\ K \text{ closed}\} \\
(\overline{E})^c &= \bigcup \{K^c : E \subseteq K,\ K \text{ closed}\} \\
&= \bigcup \{K^c : K^c \subseteq E^c,\ K^c \text{ open}\} \\
&= \bigcup \{U : U \subseteq E^c,\ U \text{ open}\} \\
&= \operatorname{int}(E^c).
\end{align*}
Taking complements of both sides yields the first formula. If we apply the first formula to
$E^c$, and take complements of both sides, we obtain the second formula.
The above definitions are abstract, in that they don’t give an explicit criterion to use to
decide if a point does or does not belong to the interior or closure of a set. We now give
such criteria.
Proposition 5.10. (1) x ∈ int(E) if and only if there is r > 0 such that $B_r(x) \subseteq E$.
(2) $x \in \overline{E}$ if and only if for every r > 0, $B_r(x) \cap E \ne \emptyset$.
Proof. (1) This is almost instantly obtained from the definition, and we leave the details
as an exercise.
(2) We note that $x \in (\overline{E})^c$ if and only if $x \in \operatorname{int}(E^c)$, by Remark 5.9(3). But this is true
if and only if there is r > 0 such that $B_r(x) \subseteq E^c$, by part (1). But this is true if and
only if there is r > 0 such that $B_r(x) \cap E = \emptyset$. By negating the first and last items
in this chain of equivalent statements, we find that $x \in \overline{E}$ if and only if for all r > 0,
$B_r(x) \cap E \ne \emptyset$.

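Example 5.11. It is worth thinking about the above definitions and results in the context
of some examples. We note that in any metric space, if E is open then int(E) = E, while if
E is closed then $\overline{E} = E$.
In R,
(1) $\operatorname{int}\bigl((a, b]\bigr) = (a, b)$.
(2) $\operatorname{int}(\mathbb{Z}) = \emptyset$.
(3) $\operatorname{int}(\mathbb{Q}) = \emptyset$.
(4) $\overline{(0, 1]} = [0, 1]$.
(5) $\overline{\mathbb{Z}} = \mathbb{Z}$.
(6) $\overline{\mathbb{Q}} = \mathbb{R}$.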
It might seem tempting to try to describe the new points sucked in by the closure operation,
i.e. the points of $\overline{E}$ that are not already in E. However it turns out to be much more useful
to describe the property that brings them into the closure. This property may apply also to
some points already in E.
Definition 5.12. Let X be a metric space, E ⊆ X, and a ∈ X. The point a is a cluster
point of E (also called by some people limit point or accumulation point) if for every r > 0,
the intersection $E \cap B_r(a)$ is infinite. We write $E'$ for the set of cluster points of E.
Example 5.13. Let X = R.

(1) $\{1, \tfrac12, \tfrac13, \ldots\}' = \{0\}$.
(2) $\{0, 1, \tfrac12, \tfrac13, \ldots\}' = \{0\}$.
(3) $\mathbb{Z}' = \emptyset$.
(4) $\mathbb{Q}' = \mathbb{R}$.
Note that $E' \subseteq \overline{E}$; this follows from Proposition 5.10(2). Therefore $E \cup E' \subseteq \overline{E}$. In
fact, the two sides are equal, which is the content of the next result.
Proposition 5.14. $\overline{E} = E \cup E'$.
Before proving the proposition, we give a lemma that may seem surprising at first.
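Lemma 5.15. $a \in E'$ if and only if for each r > 0, $(E \setminus \{a\}) \cap B_r(a) \ne \emptyset$.

Proof. (⇒): We prove the contrapositive. Suppose that for some r > 0, $(E \setminus \{a\}) \cap B_r(a) = \emptyset$.
Then $E \cap B_r(a) \subseteq \{a\}$, a finite set. Hence $a \notin E'$.
(⇐): Again we prove the contrapositive. Let $a \notin E'$. Then there is r > 0 such that $E \cap B_r(a)$
is finite. If $(E \cap B_r(a)) \setminus \{a\}$ is empty we are done. Otherwise let
$(E \cap B_r(a)) \setminus \{a\} = \{x_1, x_2, \ldots, x_m\}$, and let $s = \min_i d(a, x_i) > 0$. Then $(E \setminus \{a\}) \cap B_s(a) = \emptyset$.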


Proof. (of Proposition 5.14) We already proved ⊇ in the comments before the proposition.
For ⊆, let $a \in \overline{E}$. If a ∈ E then clearly $a \in E \cup E'$. So suppose that $a \notin E$. Let
r > 0. By Proposition 5.10(2) we have $E \cap B_r(a) \ne \emptyset$. But since $a \notin E$ this implies that
$(E \setminus \{a\}) \cap B_r(a) \ne \emptyset$. By Lemma 5.15, $a \in E'$.
Corollary 5.16. E is closed if and only if $E' \subseteq E$.
Definition 5.17. Let X be a metric space. A subset E ⊆ X is called bounded if there is a
ball in X that contains E. X is bounded if it is a bounded subset of itself. (In this case we
say that the metric is bounded.)
Remark 5.18. In Definition 5.17 it doesn’t matter whether the ball is required to be open
or closed.
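Exercise 5.19. Let X be a metric space, and let E ⊆ X with E ≠ ∅. We define the
diameter of E by
\[
\operatorname{diam}(E) = \begin{cases} \sup\{d(x, y) : x, y \in E\}, & \text{if } E \text{ is bounded}, \\ \infty, & \text{if } E \text{ is unbounded}. \end{cases}
\]
(1) Prove that $\operatorname{diam}(E) < \infty$ if and only if $\operatorname{diam}(\overline{E}) < \infty$.
(2) Prove that $\operatorname{diam}(\overline{E}) = \operatorname{diam}(E)$.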

6. The Cantor set


In this section we introduce the first “interesting” set that most people come across. Let
$F_0 = [0, 1]$, $F_1 = [0, \tfrac13] \cup [\tfrac23, 1]$, $F_2 = [0, \tfrac19] \cup [\tfrac29, \tfrac13] \cup [\tfrac23, \tfrac79] \cup [\tfrac89, 1]$, and so on. Recursively,
$F_n$ is obtained by removing the open middle third from each subinterval of $F_{n-1}$. Thus $F_n$
is the disjoint union of $2^n$ closed intervals, each of length $3^{-n}$. $F_n$ is closed, nonempty, and
$F_n \supseteq F_{n+1}$.
Definition 6.1. The Cantor set, C, is the set $\bigcap_{n=0}^{\infty} F_n$.
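The recursive construction of the sets $F_n$ can be sketched in a few lines. This is a minimal illustration using exact rational endpoints; the function names `next_stage` and `stage` are chosen for the example and do not come from the notes.

```python
from fractions import Fraction

def next_stage(intervals):
    """Remove the open middle third from each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))      # left piece
        out.append((b - third, b))      # right piece
    return out

def stage(n):
    """Return F_n as a list of (left endpoint, right endpoint) pairs."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = next_stage(intervals)
    return intervals

for a, b in stage(2):
    print(a, b)
```

Calling `stage(2)` reproduces the four intervals of $F_2$ listed above, and `stage(n)` has $2^n$ intervals of length $3^{-n}$.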

It is a good idea to draw a picture. It isn’t hard to see that C is nonempty: all the
endpoints of the closed subintervals making up the $F_n$’s belong to C. Still, this set of
endpoints is a countable set. In fact, C is much bigger, as we will now see. Recall the space
X of Example 2.6. We will prove that C ∼ X.
Definition 6.2. We define f : X → C as follows. Let $x = (x_1, x_2, \ldots) \in X$. For each n
define a closed interval $I_n(x)$ recursively by
\[
I_0(x) = [0, 1], \qquad
I_{n+1}(x) = \begin{cases} \text{left piece of } I_n(x) \cap F_{n+1}, & \text{if } x_{n+1} = 0, \\ \text{right piece of } I_n(x) \cap F_{n+1}, & \text{if } x_{n+1} = 1. \end{cases}
\]
Then $I_0(x) \supseteq I_1(x) \supseteq \cdots$. Let us write $I_n(x) = [a_n, b_n]$. The nesting of these intervals
implies that
\[
a_1 \le a_2 \le \cdots \le b_2 \le b_1.
\]
Let $\alpha = \sup\{a_1, a_2, \ldots\}$ and $\beta = \inf\{b_1, b_2, \ldots\}$. We claim that $\bigcap_{n=0}^{\infty} I_n(x) = [\alpha, \beta]$. To see
this, we first note that since $a_n \le \alpha \le \beta \le b_n$ for all n, $[\alpha, \beta] \subseteq \bigcap_{n=0}^{\infty} I_n(x)$. On the other
hand, if $t \in \bigcap_{n=0}^{\infty} I_n(x)$, then $a_n \le t \le b_n$ for all n. Hence t is an upper bound for the set of $a_n$’s,
and a lower bound for the set of $b_n$’s. Thus $\alpha \le t \le \beta$. This proves the claim. Finally, since
$b_n - a_n = 3^{-n}$, we have $\beta - \alpha \le 3^{-n}$ for all n. Therefore α = β. It follows that $\bigcap_{n=0}^{\infty} I_n(x)$ is
the singleton set {α}. We define f by setting f(x) = α. More precisely, the above argument
allows us to describe f as follows:
\[
\{f(x)\} = \bigcap_{n=0}^{\infty} I_n(x).
\]
Proposition 6.3. f is bijective.
Proof. We first show that f is injective. Let x, y ∈ X with x ≠ y. Let k = k(x, y) (recall
Example 4.15). For i < k, $x_i = y_i$, so that $I_i(x) = I_i(y)$. Since $x_k \ne y_k$, $I_k(x)$ and $I_k(y)$ are
two disjoint subintervals of $I_{k-1}(x) = I_{k-1}(y)$. Since $f(x) \in I_k(x)$ and $f(y) \in I_k(y)$, we must
have f(x) ≠ f(y).
We now show that f is surjective. Let t ∈ C. Then $t \in F_n$ for all n. For each n, let $I_n$
be the subinterval of $F_n$ containing t. Since $I_n$ and $I_{n+1}$ are subintervals of $F_n$ and $F_{n+1}$,
respectively, either $I_n \supseteq I_{n+1}$ or $I_n \cap I_{n+1} = \emptyset$. Since both contain t, we must have
$I_n \supseteq I_{n+1}$. Since the length of $I_n$ is $3^{-n}$, we must have $\bigcap_{n=0}^{\infty} I_n = \{t\}$. Now let
\[
x_n = \begin{cases} 0, & \text{if } I_n \text{ is the left piece of } I_{n-1} \cap F_n, \\ 1, & \text{if } I_n \text{ is the right piece of } I_{n-1} \cap F_n. \end{cases}
\]
Letting $x = (x_1, x_2, \ldots) \in X$, we see that $I_n = I_n(x)$ for all n, so that t = f(x).
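Both halves of the proof are constructive, so they can be imitated on finite prefixes. The sketch below assumes that a point of X is represented by its first n digits only; `interval` computes $I_n(x)$ and `digits` recovers the first n digits of the x with f(x) = t, as in the surjectivity argument. Both names are chosen for this example.

```python
from fractions import Fraction
from itertools import product

def interval(prefix):
    """Return I_n(x) = (a_n, b_n), where prefix holds the first n digits of x."""
    a, length = Fraction(0), Fraction(1)
    for digit in prefix:
        length /= 3
        if digit == 1:              # move to the right piece
            a += 2 * length
    return a, a + length

def digits(t, n):
    """First n digits of the x with f(x) = t, assuming t lies in the Cantor set."""
    a, length, out = Fraction(0), Fraction(1), []
    for _ in range(n):
        length /= 3
        if t <= a + length:         # t lies in the left piece
            out.append(0)
        else:                       # t lies in the right piece
            a += 2 * length
            out.append(1)
    return tuple(out)

print(interval((0, 1)))
print(digits(Fraction(1, 4), 6))
```

For instance, 1/4 belongs to C, and its digit sequence alternates 0, 1, 0, 1, and so on. The function `digits` does not verify that t is in C; that is an assumption on the input.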
Corollary 6.4. R, C, X, and P(N) are equivalent sets. In particular, R is uncountable.
Proof. In Remark 2.8 we sketched the proof that X ∼ P(N), while in Proposition 6.3 we saw
that X ∼ C. We finish the proof by showing that R ∼ C. Since C ⊆ R we have C ≼ R. By
the Cantor-Bernstein theorem, it suffices to show that R ≼ C. Since C ∼ P(N), it suffices
to show that R ≼ P(N). But since N ∼ Q, we know that P(N) ∼ P(Q). Thus we will
be finished if we can show that R ≼ P(Q). We do that as follows. We define a function
g : R → P(Q) by
g(t) = {q ∈ Q : q < t}.

If s and t are distinct points of R, say s < t, then by the density of Q in R there exists
q ∈ Q with s < q < t. Then q ∈ g(t) and q ∉ g(s), and we have g(s) ≠ g(t). Hence g is
one-to-one.
Exercise 6.5. (1) int(C) = ∅.
(2) $C' = C$.
7. Sequences
Definition 7.1. Let X be a set. A sequence in X is a function x : N → X.
Remark 7.2. We usually write xn instead of x(n), but the latter notation is often useful
too. We sometimes write $(x_n)_{n=1}^{\infty}$, or $(x_n)$, for x. It is important to remember that in this
notation, n is a dummy variable — it is the argument of the function x. (So, in particular,
there is nothing special about the letter n used as the argument — it will often be convenient
to use a different letter.) Some texts use curly braces instead of parentheses, but we will
avoid this notation, for the following reason. The range of the sequence x is the subset
{xn : n ∈ N} of X. This is often referred to as the set of terms of (xn ). It is important
to distinguish between the sequence itself (which is a function from N to X), and its set of
terms (which is a subset of X).
While we are on the subject of the subtlety of the notation for sequences, let me point
out a common mistake to guard against. What should we make of the following statement
(taken from more than one actual homework paper!): “Let (xn ) be a sequence, and let (xi )
be another sequence.”? Of course, this deserves a quantity of red ink, but you should think
carefully about the precise error. (And PLEASE don’t make this mistake too.)
Definition 7.3. Let (xn ) be a sequence in a metric space X, and let a ∈ X. (xn ) converges
to a if for every ε > 0, there exists n0 ∈ N such that for all n ≥ n0 we have d(xn , a) < ε. We
write xn → a (as n → ∞) to indicate that (xn ) converges to a.
Lemma 7.4. A sequence in a metric space converges to at most one point.
Proof. Suppose that xn → a and xn → b. Let ε > 0. There exist n1 , n2 ∈ N such that
d(xn , a) < ε/2 for all n ≥ n1 , and d(xn , b) < ε/2 for all n ≥ n2 . Let n = max{n1 , n2 }. Then
d(a, b) ≤ d(a, xn ) + d(xn , b) < ε/2 + ε/2 = ε. Since d(a, b) < ε for all ε > 0, it follows that
a = b. 
Definition 7.5. If xn → a, a is called the limit of (xn ), and we write limn→∞ xn = a. We
say that (xn ) converges if it has a limit; otherwise it diverges.
Proposition 7.6. Let X be a metric space, let E ⊆ X, and let a ∈ X.
(1) $a \in \overline{E}$ if and only if there is a sequence in E converging to a.
(2) $a \in E'$ if and only if there is a sequence in $E \setminus \{a\}$ converging to a.
(3) E is closed if and only if every sequence in E that converges in X has its limit in E.
Proof. We prove part of the proposition, and leave the rest as an exercise.
(1) (⇒): Let $a \in \overline{E}$. By Proposition 5.10(2), for each n ∈ N we have $E \cap B_{1/n}(a) \ne \emptyset$.
Choose $x_n \in E$ with $d(x_n, a) < 1/n$. Then $x_n \to a$.
Remark 7.7. Sequences are an important tool for studying metric spaces. One can think
of a sequence as a kind of “probe” — a function from N to the space picks out a certain
countable subset in a manner indexed by the natural numbers. It is also useful to use
sequences as tools to study a sequence itself. This leads to the next definition.

Definition 7.8. Let x be a sequence in a set X, and let n be a strictly increasing sequence
in N. (Thus n : N → N satisfies n1 < n2 < n3 < · · · .) Then x ◦ n is another sequence in X.
It is called a subsequence of x.

Remark 7.9. The terms of the subsequence x ◦ n may be denoted $(x \circ n)_i = x(n(i)) = x_{n(i)} = x_{n_i}$.
Thus we may write $x \circ n = (x_{n_i})_{i=1}^{\infty}$.
The idea of a subsequence is pretty simple, but the notation can lead to lots of silly
mistakes, against which you should be on guard. For example, let $(x_n)$ be a sequence. The
expression $x_{50}$ makes sense: it is the 50th term of the sequence. Now let $(x_{n_i})$ be a
subsequence. The expression $x_{n_{50}}$ makes sense: it is the 50th term of the subsequence,
and equivalently, it is the $n_{50}$th term of the original sequence. However, the expression $x_{50_i}$
does not make sense. If we try to interpret it, we first realize that it is the value of the
function x at the argument $50_i$. So $50_i$ must be an element of the domain of x, namely a
natural number. Now $50_i$ must be the value of the function 50 at the argument i. But this
is nonsense: ‘50’ is not a function, so it can’t be ‘evaluated’ at the argument i.
Here is another example to keep in mind. Suppose that we have a bunch of sequences
in X. Say that $x_1, x_2, \ldots$ are all sequences (i.e. we have a sequence of sequences). How
should we write the terms of the nth sequence? We have that $x_n : \mathbb{N} \to X$, so we can write
$x_n = \bigl(x_n(i)\bigr)_{i=1}^{\infty}$, using function notation for $x_n$. Note carefully that i is the argument of the
function $x_n$, and not the argument of n (which is not a function). We have to be careful
about using subscript notation. If we weren’t being careful, we might write $x_n = (x_{n_i})_{i=1}^{\infty}$.
But this is the same as the notation for a subsequence of a sequence x. One resolution of this
ambiguity is to use more parentheses: $x_n = \bigl((x_n)_i\bigr)_{i=1}^{\infty}$. The more usual way is to use two
subscripts: $x_n = (x_{ni})_{i=1}^{\infty}$, and this is what we will do when we are faced with this situation.
Writing it out longhand for clarity gives $x_n = (x_{n1}, x_{n2}, x_{n3}, \ldots)$. Note that it is necessary to
write so clearly that the reader does not mistake the second subscript for a sub-subscript.
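Treating sequences literally as functions makes the subsequence notation unambiguous. The following sketch uses the hypothetical sample sequences $x_n = 1/n$ and $n_i = 2^i$ (neither comes from the notes) and builds x ◦ n by composition.

```python
def x(n):
    """A sample sequence x : N -> R, with x_n = 1/n."""
    return 1 / n

def n(i):
    """A strictly increasing index sequence n : N -> N, with n_i = 2**i."""
    return 2 ** i

def subsequence(x, n):
    """Return the composition x ∘ n, i.e. the function i -> x_{n_i}."""
    return lambda i: x(n(i))

y = subsequence(x, n)
# y(3) is the 3rd term of the subsequence, which is the n_3 = 8th term of x.
print(y(3), x(8))
```

In this picture an expression such as $x_{50_i}$ would correspond to calling the number 50 as a function, which fails for the same reason the notation is meaningless.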
Here is a simple result about subsequences.
Proposition 7.10. Let (xn ) be a convergent sequence in a metric space. Then every subse-
quence of (xn ) is also convergent, and has the same limit.
Remark 7.11. Before proving the proposition, we observe that if n : N → N is strictly
increasing, then ni ≥ i for all i. This is easily proved by induction on i, and we omit the
proof. We do point out that equality is possible. In fact, letting ni = i for all i shows that
any sequence is a subsequence of itself.
Proof. (of Proposition 7.10) Let $x_n \to a$, and let $(x_{n_i})$ be a subsequence. We will show that
$x_{n_i} \to a$. Let ε > 0. Since $x_n \to a$, there is m such that $d(x_n, a) < \varepsilon$ whenever n ≥ m. Now
if i ≥ m, then $n_i \ge i \ge m$, by the remark, so that $d(x_{n_i}, a) < \varepsilon$. Thus $x_{n_i} \to a$ (as i → ∞).
Remark 7.12. It is clear from the definition that convergence or divergence of a sequence
is unaffected if finitely many terms are changed. Convergence, divergence, and the limit (if the
sequence is convergent) are examples of properties of a sequence that depend only on the ultimate behavior
of the sequence. In fact, such properties are the only ones that are important for sequences.
One way to describe this is by means of tails of a sequence. If $(x_n)$ is a sequence, the nth
tail is the subsequence $(x_i)_{i=n}^{\infty}$. Thus, if the sequence converges to L, then every tail of
the sequence also converges to L. We sometimes say that a property holds eventually for a
sequence if it holds for some tail.

Definition 7.13. A subset of a metric space is bounded if it is contained in some ball. A
function having a metric space as codomain is bounded if its range is a bounded subset of
the codomain (cf. Example 4.6).
Of course, since a sequence in a metric space is an example of a function with the metric
space as codomain, it makes sense to talk of bounded (and unbounded) sequences. The proof
of the next result is a good exercise, but it will also follow from some later results.
Lemma 7.14. Let (xn ) be a convergent sequence (in some metric space). Then (xn ) is
bounded.

8. Continuous functions
Definition 8.1. Let (X, d) and (Y, ρ) be metric spaces, f : X → Y a function, and $x_0 \in X$.
f is continuous at $x_0$ if for every ε > 0 there exists δ > 0 such that for every x ∈ X, if
$d(x, x_0) < \delta$ then $\rho\bigl(f(x), f(x_0)\bigr) < \varepsilon$. f is continuous if it is continuous at each point of X.
Remark 8.2. Here are some equivalent formulations of continuity at a point $x_0$.
(1) For every ε > 0 there exists δ > 0 such that $f\bigl(B_\delta(x_0)\bigr) \subseteq B_\varepsilon\bigl(f(x_0)\bigr)$.
(2) For every open ball C with center $f(x_0)$, there exists an open ball B with center $x_0$
such that f(B) ⊆ C.
(3) For every ε > 0 there exists δ > 0 such that $B_\delta(x_0) \subseteq f^{-1}\bigl(B_\varepsilon(f(x_0))\bigr)$.


Example 8.3. (If the proof is not given, it is an exercise.)
(1) Let f : R → R be given by $f(x) = x^2$. Then f is continuous.
Proof. Let $x_0 \in \mathbb{R}$, and let ε > 0. Then for any x ∈ R,
\begin{align*}
|f(x) - f(x_0)| &= |x^2 - x_0^2| \\
&= |x - x_0|\,|x + x_0| \\
&\le |x - x_0|\,\bigl(|x - x_0| + 2|x_0|\bigr) \\
&\le |x - x_0|\,\bigl(1 + 2|x_0|\bigr) && \text{if } |x - x_0| < 1, \\
&< \varepsilon && \text{if } |x - x_0| < \varepsilon/(1 + 2|x_0|).
\end{align*}
Now choose δ > 0 such that $\delta < \min\{1, \varepsilon/(1 + 2|x_0|)\}$. Then $|x - x_0| < \delta$ implies
that $|x^2 - x_0^2| < \varepsilon$.
(2) Define the identity function id : X → X by id(x) = x. id is continuous.
(3) Fix y0 ∈ Y . Define f : X → Y by f (x) = y0 for all x ∈ X. Then f is continuous. (f
is called a constant function.)
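(4) Define $\chi_{\mathbb{Q}} : \mathbb{R} \to \mathbb{R}$ by
\[
\chi_{\mathbb{Q}}(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q}, \\ 0, & \text{if } x \notin \mathbb{Q}. \end{cases}
\]
$\chi_{\mathbb{Q}}$ is discontinuous at each point of R.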

(5) (The Hermite function) Define h : R → R by
\[
h(x) = \begin{cases} \dfrac{1}{n}, & \text{if } x = \dfrac{m}{n} \text{ in lowest terms, where } m, n \in \mathbb{Z} \text{ with } n > 0, \\ 0, & \text{if } x \in \mathbb{R} \setminus \mathbb{Q}. \end{cases}
\]
Then h is continuous at each irrational number, and discontinuous at each rational
number. The proof is a nice exercise. (It is interesting to consider the opposite
continuity behavior.)
(6) We define the coordinate projections on $\mathbb{R}^n$, $\pi_i : \mathbb{R}^n \to \mathbb{R}$, by $\pi_i(x) = x_i$. The $\pi_i$ are
continuous (by Remark 4.12).
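Items (1) and (5) of the example lend themselves to small experiments. The sketch below is illustrative only: `delta_for_square` returns half of the bound from the proof in item (1), and `large_values` lists the rationals in an interval where h is at least ε, a finite set (which is the key observation for the exercise in item (5)). Both function names are made up here, and h is only evaluated on rationals, since irrational inputs cannot be represented exactly.

```python
import math
from fractions import Fraction

def delta_for_square(x0, eps):
    """A delta that works for f(x) = x**2 at x0, following the proof in item (1)."""
    return 0.5 * min(1.0, eps / (1 + 2 * abs(x0)))

def h(q):
    """Item (5) on rationals: h(m/n) = 1/n; Fraction stores m/n in lowest terms."""
    return Fraction(1, q.denominator)

def large_values(lo, hi, eps):
    """All rationals q in [lo, hi] with h(q) >= eps (eps a positive Fraction)."""
    found = set()
    for den in range(1, int(1 / eps) + 1):
        for num in range(math.ceil(lo * den), math.floor(hi * den) + 1):
            found.add(Fraction(num, den))
    return sorted(found)

x0, eps = 2.5, 0.01
dlt = delta_for_square(x0, eps)
x = x0 + 0.9 * dlt                      # a point with |x - x0| < delta
print(abs(x * x - x0 * x0) < eps)
print(large_values(Fraction(0), Fraction(1), Fraction(1, 3)))
```

Because only finitely many rationals near a given point have h ≥ ε, a small enough ball around an irrational point avoids all of them.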
Earlier we said that sequences are an important tool for studying objects in analysis. As
evidence, we now show how to use sequences to characterize continuity of a function between
metric spaces.
Theorem 8.4. Let X and Y be metric spaces, and let f : X → Y be a function. f
is continuous if and only if for every convergent sequence xn → a in X, we have that
f (xn ) → f (a) in Y . (Thus f is continuous if and only if it preserves convergent sequences,
and maps the limit of a convergent sequence to the limit of the image sequence.)
Proof. The forward direction is straightforward, and we leave it as an exercise. For the
reverse direction we prove the contrapositive. Suppose that f is not continuous at a. Then
there is ε > 0 such that for every δ > 0 there is $x \in B_\delta(a)$ with $f(x) \notin B_\varepsilon\bigl(f(a)\bigr)$. We
apply this to δ = 1/n: thus there is a sequence $(x_n)$ in X such that $d(x_n, a) < 1/n$ and
$\rho\bigl(f(x_n), f(a)\bigr) \ge \varepsilon$. But then clearly $x_n \to a$ while $f(x_n) \not\to f(a)$.
We didn’t mention this before, but the word topology has a technical meaning: the topology
of a metric space is the collection of all the open subsets of the space. A property of the
space is topological if it can be defined just by using the open sets. It is very important to
know that continuity of functions is a topological property.
Theorem 8.5. f : X → Y is continuous if and only if for every open set V ⊆ Y , the inverse
image f −1 (V ) is open in X.
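Proof. (=⇒): Let V ⊆ Y be open. Let $x_0 \in f^{-1}(V)$. Then $f(x_0) \in V$. Since V is open
there is ε > 0 such that $B_\varepsilon\bigl(f(x_0)\bigr) \subseteq V$. By Remark 8.2 (3) there is δ > 0 such that
$B_\delta(x_0) \subseteq f^{-1}\bigl(B_\varepsilon(f(x_0))\bigr) \subseteq f^{-1}(V)$. Hence $f^{-1}(V)$ is open.
(⇐=): Let $x_0 \in X$ and let ε > 0. Since $B_\varepsilon\bigl(f(x_0)\bigr)$ is open, $f^{-1}\bigl(B_\varepsilon(f(x_0))\bigr)$ is open.
Since $x_0 \in f^{-1}\bigl(B_\varepsilon(f(x_0))\bigr)$ there is δ > 0 such that $B_\delta(x_0) \subseteq f^{-1}\bigl(B_\varepsilon(f(x_0))\bigr)$. Therefore f
is continuous at $x_0$ (by Remark 8.2 (3)).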
Exercise 8.6. f : X → Y is continuous if and only if for every closed set V ⊆ Y , the inverse
image f −1 (V ) is closed in X.
This is a good place to introduce the notion of “sameness” for metric spaces. First, the
definition:
Definition 8.7. Let X and Y be metric spaces. A homeomorphism from X to Y is a
function f : X → Y which is bijective, continuous, and such that its inverse function f −1
is continuous. Two metric spaces are called homeomorphic if there exists a homeomorphism
from one to the other.

Homeomorphic metric spaces have the same topological structure and properties. It is
colloquial to describe this by saying that one space can be deformed into the other by
bending and stretching without tearing. Here are some simple examples.
Example 8.8. (1) Any two open disks in $\mathbb{R}^2$ are homeomorphic.
(2) Any two closed disks in $\mathbb{R}^2$ having positive radii are homeomorphic.
(3) No open disk in $\mathbb{R}^2$ is homeomorphic to any closed disk in $\mathbb{R}^2$. (This is not an obvious
one.)
(4) Every open ball in $\mathbb{R}^n$ is homeomorphic to every open box in $\mathbb{R}^n$.
(5) The unit circle $\mathbb{T} = \{x \in \mathbb{R}^2 : \|x\| = 1\}$ is not homeomorphic to the unit interval
[0, 1] ⊆ R. (Again, it isn’t so obvious how to prove this.)
Example 8.9. Recall the function f : X → C from Definition 6.2, where $X = \prod_{1}^{\infty} \{0, 1\}$ is
as in Example 2.6, and C is the Cantor set (Definition 6.1). We will show that f and $f^{-1}$
are continuous functions. First some notation. If $(a_1, a_2, \ldots, a_n) \in \prod_{1}^{n} \{0, 1\}$, let
\[
Z(a_1, \ldots, a_n) = \{x \in X : x_i = a_i \text{ for } 1 \le i \le n\}.
\]
Such sets are called cylinder sets. Note that cylinder sets are clopen: $Z(a_1, \ldots, a_n) =
B_{1/n}(x) = \overline{B}_{1/(n+1)}(x)$ for any $x \in Z(a_1, \ldots, a_n)$. Note also that $f\bigl(Z(a_1, \ldots, a_n)\bigr) = C \cap I_n(x)$
(again for any $x \in Z(a_1, \ldots, a_n)$), which is a clopen subset of C (recall the definition of $I_n(x)$
from Definition 6.2). Thus these two families of clopen subsets are paired by the function
f. Since every open subset of X is a union of open balls, i.e. of cylinder sets, and every
open subset of C is a union of subsets of the form $C \cap I_n(x)$ (an exercise!), it follows from
Theorem 8.5 that f and $f^{-1}$ are continuous.
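The identity between cylinder sets and balls can be checked on a finite model. The sketch below assumes that X is replaced by the set of binary words of length 6, with the metric of Example 4.15 restricted to those words; the helper names are chosen for the example.

```python
from fractions import Fraction
from itertools import product

N = 6
X = list(product((0, 1), repeat=N))   # finite stand-in for X

def d(x, y):
    """Metric of Example 4.15 on words of length N."""
    for i in range(N):
        if x[i] != y[i]:
            return Fraction(1, i + 1)
    return Fraction(0)

def cylinder(prefix):
    """Z(a_1, ..., a_n): all words that start with the given prefix."""
    return {x for x in X if x[:len(prefix)] == prefix}

def open_ball(x, r):
    return {y for y in X if d(x, y) < r}

def closed_ball(x, r):
    return {y for y in X if d(x, y) <= r}

prefix = (1, 0, 1)
n = len(prefix)
center = prefix + (0,) * (N - n)      # some point of the cylinder set
same = (cylinder(prefix)
        == open_ball(center, Fraction(1, n))
        == closed_ball(center, Fraction(1, n + 1)))
print(same)
```

The agreement of the open ball of radius 1/n with the closed ball of radius 1/(n+1) reflects the fact that d takes no values strictly between these two numbers.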
The proofs of the next two results are easy, and so are left as exercises.
Corollary 8.10. (of Theorem 8.5) Let X be a metric space. f : X → R is continuous if
and only if $f^{-1}\bigl((a, b)\bigr)$ is open for all a < b in R. Equivalently, f : X → R is continuous if
and only if {f < a} and {f > a} are open for all a ∈ R.
Theorem 8.11. Let f : X → Y and g : Y → Z be functions between metric spaces, and let
x0 ∈ X. If f is continuous at x0 , and g is continuous at f (x0 ), then g ◦ f is continuous at
x0 .

9. Limits of functions
The definition we gave a while ago for the limit of a sequence is a special case of a general
notion of limit of a function — after all, a sequence is just a special kind of function. But
sequences are quite special. The definition of the limit of a function is a little bit more
involved. We will need it, in principle, when we talk about differentiation.
Definition 9.1. Let (X, d) and (Y, ρ) be metric spaces, let E ⊆ X, let f : E → Y be a function,
let $x_0 \in E'$, and let $y_0 \in Y$. The limit of f, as x approaches $x_0$, equals $y_0$ if for every ε > 0 there exists δ > 0
such that for all x ∈ E, if $0 < d(x, x_0) < \delta$ then $\rho\bigl(f(x), y_0\bigr) < \varepsilon$. (The final implication can
also be expressed as $f\bigl((E \cap B_\delta(x_0)) \setminus \{x_0\}\bigr) \subseteq B_\varepsilon(y_0)$.) We write $\lim_{x \to x_0} f(x) = y_0$.
Remark 9.2. Note that f might or might not be defined at $x_0$ (accordingly as $x_0 \in E$ or
$x_0 \notin E$). We require $x_0 \in E'$ so that for every δ > 0 there will exist points x satisfying the
hypothesis of the implication. Even if $x_0 \in E$, the definition of the limit as $x \to x_0$ never
requires that f be evaluated at $x_0$: the value of f at $x_0$ is irrelevant.

Note further, that if we tried to apply this definition to a point x0 that is not a cluster
point of E, then we would find that the definition is satisfied for any point y0 ∈ Y . To avoid
this situation, we only consider limits at cluster points of the domain of the function.
Exercise 9.3. Show that in the situation of Definition 9.1, if the limit exists it is unique.
(Be sure to note explicitly where the hypothesis that $x_0 \in E'$ is used.)

Lemma 9.4. Let f, etc., be as in Definition 9.1. Define $\tilde{f} : E \cup \{x_0\} \to Y$ by
\[
\tilde{f}(x) = \begin{cases} f(x), & \text{if } x \in E \setminus \{x_0\}, \\ y_0, & \text{if } x = x_0. \end{cases}
\]
Then $\lim_{x \to x_0} f(x) = y_0$ if and only if $\tilde{f}$ is continuous at $x_0$.

Proof. The proof is left as an exercise.
Example 9.5. (1) It is easy to show that $\lim_{t \to 0} t \sin(1/t) = 0$. Let f : R → R be given
by
\[
f(t) = \begin{cases} t \sin(1/t), & \text{if } t \ne 0, \\ 0, & \text{if } t = 0. \end{cases}
\]
Then f is continuous at 0.
(2) It is easy to show that $\lim_{t \to 0} \sin(1/t)$ does not exist. Let c ∈ R, and let g : R → R
be given by
\[
g(t) = \begin{cases} \sin(1/t), & \text{if } t \ne 0, \\ c, & \text{if } t = 0. \end{cases}
\]
Then g is not continuous at 0.
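Both claims of the example can be explored numerically. The sketch below is a floating-point illustration, not a proof: it checks the bound |t sin(1/t)| ≤ |t| on sample points, and it evaluates sin(1/t) along two sequences tending to 0 (the choices $t = 1/(n\pi)$ and $t = 1/((2n + \tfrac12)\pi)$ are the standard ones, not taken from the notes).

```python
import math

def f(t):
    """t * sin(1/t) for t != 0, and 0 at t = 0."""
    return t * math.sin(1 / t) if t != 0 else 0.0

# Sample points approaching 0 from both sides.
samples = [s * 10.0 ** (-k) for k in range(1, 12) for s in (1.0, -1.0, 0.37)]
bounded = all(abs(f(t)) <= abs(t) for t in samples)

# Two sequences tending to 0 along which sin(1/t) behaves differently.
zeros = [1 / (n * math.pi) for n in range(1, 50)]                # sin(1/t) is about 0
peaks = [1 / ((2 * n + 0.5) * math.pi) for n in range(1, 50)]    # sin(1/t) is about 1

print(bounded)
print(max(abs(math.sin(1 / t)) for t in zeros))
print(min(math.sin(1 / t) for t in peaks))
```

The bound shows that δ = ε works in item (1); the two sequences show, via the sequential criterion for limits, why no value of c can make g continuous at 0.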
Remark 9.6. Note that the definition of limit is local — it depends only on the restriction
of f to Br (x0 ), for any r > 0.

10. Sequences in R
Theorem 10.1. Let (an ) and (bn ) be sequences in R. Suppose that an → a, and bn → b.
Then
(1) an + bn → a + b.
(2) an bn → ab.
(3) If b ≠ 0 then $a_n / b_n \to a/b$ (where at most finitely many terms are not defined).
(4) If an ≤ bn for all n, then a ≤ b.
Proof. These are good exercises, so we will only prove part of the third statement; namely,
the case where $a_n = 1$ for all n. First, let’s sort out the parenthetical comment. If b ≠ 0,
then |b| > 0. By definition of convergence, there is $n_0$ such that $|b_n - b| < |b|$ for all $n \ge n_0$.
But then, for all $n \ge n_0$ we have $|b_n| = |b - (b - b_n)| \ge |b| - |b - b_n| > |b| - |b| = 0$. Therefore
$b_n \ne 0$ if $n \ge n_0$. The quotient sequence will fail to be defined if the denominator equals
zero, but this can only happen for finitely many n (all less than $n_0$).
Now let’s prove that if $b_n \to b \ne 0$, then $1/b_n \to 1/b$. Let ε > 0. Let $n_1$ be such that
|bn − b| < |b|/2 whenever n ≥ n1 . We can improve on the previous paragraph. If n ≥ n1 we

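have that $|b_n| \ge |b| - |b - b_n| > |b| - |b|/2 = |b|/2$. Now let $n_2$ be such that $|b_n - b| < |b|^2 \varepsilon / 2$
whenever $n \ge n_2$. Let $n_0 = \max\{n_1, n_2\}$. For $n \ge n_0$ we have
\[
\left| \frac{1}{b_n} - \frac{1}{b} \right| = \left| \frac{b - b_n}{b\, b_n} \right| = |b_n - b| \cdot \frac{1}{|b|} \cdot \frac{1}{|b_n|} < \frac{|b|^2 \varepsilon}{2} \cdot \frac{1}{|b|} \cdot \frac{2}{|b|} = \varepsilon.
\]
Therefore $1/b_n \to 1/b$.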
Remark 10.2. The first three statements in the theorem mean that the functions + and
· : R2 → R, and ÷ : R × (R \ {0}) → R are continuous.
Remark 10.3. (1) It follows from Theorem 10.1(4) that if an < bn for all n, then a ≤ b.
Note that even with strict inequalities in the hypotheses, the conclusion will in general
only be a weak inequality. This reflects a general principle: limits change strict
inequalities into weak inequalities.
(2) The following well-known lemma also follows from Theorem 10.1(4).
Lemma 10.4. Let (an ) and (bn ) be real sequences, suppose that |an | ≤ |bn |, and suppose that
bn → 0. Then an → 0.
Lemma 10.5. Let $(x_i)$ be a sequence in $\mathbb{R}^n$. We write the ith term of the sequence as an
n-tuple thus: $x_i = (x_{i1}, \ldots, x_{in})$ (cf. Remark 7.9). If $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$, then $x_i \to a$ if and
only if $x_{ij} \to a_j$ (as i → ∞) for all j = 1, . . ., n.
Proof. This follows easily from Remark 4.12.
We now establish convergence of some special, familiar sequences in R.

Proposition 10.6. (1) For any k ∈ N, $1/\sqrt[k]{n} \to 0$ as n → ∞.
(2) For any 0 < a < 1, $a^n \to 0$ as n → ∞.
(3) $n^{1/n} \to 1$ as n → ∞.
(4) For any a ∈ R with 0 < a < 1, and any k ∈ N, $a^n n^k \to 0$ as n → ∞.
Proof. (1) Let ε > 0. Choose $n_0 > 1/\varepsilon^k$. If $n \ge n_0$ then $n^{1/k} \ge n_0^{1/k} > 1/\varepsilon$, and hence
$1/\sqrt[k]{n} < \varepsilon$.

(2) We essentially proved this a long time ago, in Remark 1.7.


(3) It is evident that $n^{1/n} \ge 1$. Let $x_n = n^{1/n} - 1$. Then by property (1) after Definition 1.1,
for n ≥ 2 we have $n = (1 + x_n)^n > \frac{n(n-1)}{2} x_n^2$, and hence $x_n^2 < \frac{2}{n-1}$. It follows (using Lemma
10.4, and (1)) that $x_n \to 0$.
(4) This is very similar to the proof of (2). For that we referred to Remark 1.7. In that
remark we saw that if 0 < a < 1 then there is c > 0 such that $a^n < c/n$. Let’s apply this to
the number $a^{1/(k+1)}$, which also lies between 0 and 1. Thus there is a positive number d such
that $a^{n/(k+1)} < d/n$. Raising both sides to the power k + 1 gives that $a^n < d^{k+1}/n^{k+1}$, and
hence that $a^n n^k < d^{k+1}/n$. By Theorem 10.1(3) and Lemma 10.4, $a^n n^k \to 0$ as n → ∞.
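The estimate used in part (3) can be checked numerically. The sketch below evaluates $x_n = n^{1/n} - 1$ in floating point and tests the bound $x_n^2 < 2/(n-1)$ for a range of n; the range 2 to 10000 is an arbitrary choice, and the check is only a sanity test of the inequality, not a substitute for the argument.

```python
def x(n):
    """x_n = n**(1/n) - 1, computed in floating point."""
    return n ** (1.0 / n) - 1.0

# The bound from the proof of part (3): x_n**2 < 2/(n - 1) for n >= 2.
bound_holds = all(x(n) ** 2 < 2.0 / (n - 1) for n in range(2, 10001))
print(bound_holds)
print(x(10), x(1000), x(10 ** 6))
```

The printed values decrease toward 0, in line with the conclusion $n^{1/n} \to 1$.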
Definition 10.7. A sequence (xn ) in R is increasing if xn ≤ xn+1 for all n. It is called
strictly increasing if xn < xn+1 for all n. Decreasing and strictly decreasing sequences are
defined similarly. A sequence is called monotone if any of these terms apply.
Theorem 10.8. An increasing sequence that is bounded above is convergent.
Proof. Let (xn ) be an increasing sequence that is bounded above. Then the set of terms,
{xn : n ∈ N}, has a supremum, c. We claim that xn → c. Let ε > 0. Since the supremum

is an upper bound, we have xn ≤ c < c + ε for all n. Since c − ε < c, c − ε is not an upper
bound, so there exists n0 with c − ε < xn0 . Then for all n ≥ n0 we have c − ε < xn . Thus
we get that c − ε < xn < c + ε whenever n ≥ n0 . Thus xn → c. 
Exercise 10.9. A bounded monotone sequence is convergent.

Example 10.10. Does $\sqrt{2 + \sqrt{2 + \sqrt{2 + \cdots}}}$ mean anything? OK, this is phrased as a philosophical question, i.e. it’s a joke. But we can still try to give the expression some kind of sense. For example, we could argue that IF it does represent a real number, call it x, then x must satisfy the equation $x = \sqrt{2 + x}$. Then it’s easy to see that x = 2. But this is not valid since we haven’t shown that the expression does indeed represent a real number. Some people might try to make sense of it by interpreting it as a sequence: $\bigl(\sqrt{2},\ \sqrt{2 + \sqrt{2}},\ \sqrt{2 + \sqrt{2 + \sqrt{2}}},\ \ldots\bigr)$. They would define the expression to be the limit of this sequence, assuming that the sequence converges. We could argue about whether this is a reasonable definition for the expression, but we can’t argue with the intelligibility of the new problem: does the given sequence converge, and if so, to what? (Other people might argue that the limit of this sequence (if existing) is actually the definition of a different expression; namely, $\sqrt{\cdots + \sqrt{2 + \sqrt{2 + \sqrt{2}}}}$.) However we come to study this sequence, it is a nice exercise in induction to prove that it is bounded above and increasing. Therefore it converges. Using the recursive definition of the sequence ($a_{n+1} = \sqrt{2 + a_n}$), and Theorem 10.1, it is easy to prove that the limit is, in fact, 2.
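The behavior claimed in Example 10.10 can be observed numerically. The following is only a floating-point sketch, not a proof; the function name nested_radical_terms is our own choice for illustration. It lists the first few terms of the recursively defined sequence together with their distance from 2.

```python
import math

def nested_radical_terms(count):
    """Return the first `count` terms of a_1 = sqrt(2), a_{n+1} = sqrt(2 + a_n)."""
    terms = []
    a = math.sqrt(2.0)
    for _ in range(count):
        terms.append(a)
        a = math.sqrt(2.0 + a)
    return terms

terms = nested_radical_terms(12)
for n, a in enumerate(terms, start=1):
    # Each term should exceed the previous one and stay below 2.
    print(n, a, 2.0 - a)
```

In the printed table the terms increase and the gap to 2 shrinks by roughly a factor of 4 at each step, which is consistent with (but of course does not replace) the induction argument.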
We next point out that continuity of real-valued functions is preserved by pointwise arith-
metic of functions.
Corollary 10.11. Let f , g : X → R, and let a ∈ X. If f and g are continuous at a, then
so are f + g, f g, and f /g (if g(a) ≠ 0).
Proof. This follows from Theorems 10.1 and 8.4. 
Remark 10.12. The result for limits analogous to the one in Corollary 10.11 holds, as can
be seen by using Lemma 9.4.
Definition 10.13. If f : X → Rn we define the coordinate functions of f by fi = πi ◦ f :
X → R (recall the coordinate projections πi from Example 8.3 (6)). We can then write
$f(x) = \bigl(f_1(x), \ldots, f_n(x)\bigr)$.
Corollary 10.14. Let f : X → Rn . Then f is continuous if and only if all fi are continuous.
Proof. (=⇒): Use Theorem 8.11 and Example 8.3 (6).
(⇐=): Use Remark 4.12 and Theorem 8.4. 

11. Limsup and liminf


In spite of the wonderful theorem about monotone sequences from the last section, most
sequences (even bounded ones) diverge. However, there is still information to be gotten from
a divergent sequence.
Let (an ) be a bounded sequence in R, say L ≤ an ≤ M for all n. Then for each n,
sup{ak : k ≥ n} = sup{an , an+1 , an+2 . . .} exists, since the tails of (an ) are all bounded
above (by M ). (We will usually use the shorthand supk≥n ak for this sup of the nth tail
of the sequence (an ).) Notice that {ak : k ≥ n} ⊇ {ak : k ≥ n + 1}, and hence that
supk≥n ak ≥ supk≥n+1 ak . Of course, since L ≤ ak for all k, we also have L ≤ supk≥n ak for
all n. Therefore the sequence of suprema of tails, $\bigl(\sup_{k \ge n} a_k\bigr)_{n=1}^{\infty}$, is decreasing and bounded
below, and hence converges.
Definition 11.1. Let (an ) be a bounded sequence in R. The limit superior, or limsup, of (an ) is the real number
\[ \limsup_{n \to \infty} a_n = \lim_{n \to \infty} \Bigl( \sup_{k \ge n} a_k \Bigr). \]
In a completely analogous way we define the limit inferior, or liminf:
\[ \liminf_{n \to \infty} a_n = \lim_{n \to \infty} \Bigl( \inf_{k \ge n} a_k \Bigr). \]
The justification is the opposite of the above: the sequence $\bigl(\inf_{k \ge n} a_k\bigr)_{n=1}^{\infty}$ is increasing and bounded, so it has a limit.
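To get a feeling for the sequences of tail suprema and infima, here is a small numerical sketch. The example sequence a_n = (−1)^n (1 + 1/n) and the names a, tail_sup_inf and HORIZON are our own illustrative choices. A computer can only inspect finitely many terms, so the tail is cut off at a finite horizon; for this particular sequence the cut-off does not change the sup or inf, since they are attained at the first even or odd index of the tail.

```python
def a(n):
    """Hypothetical example sequence a_n = (-1)^n (1 + 1/n), for n >= 1."""
    return (-1) ** n * (1 + 1 / n)

def tail_sup_inf(seq, n, horizon):
    """Sup and inf of seq(k) over n <= k <= horizon (a finite stand-in for the nth tail)."""
    values = [seq(k) for k in range(n, horizon + 1)]
    return max(values), min(values)

HORIZON = 10_000
for n in (1, 10, 100, 1000):
    s, i = tail_sup_inf(a, n, HORIZON)
    # The sups decrease toward 1 and the infs increase toward -1.
    print(n, s, i)
```

For this sequence the limsup is 1 and the liminf is −1, so the sequence diverges even though both tail sequences converge.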
Theorem 11.2. Let (an ) be a bounded sequence in R.
(1) $\liminf_{n \to \infty} a_n \le \limsup_{n \to \infty} a_n$.
(2) (an ) converges if and only if $\liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n$, and in this case,
\[ \lim_{n \to \infty} a_n = \liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n. \]

Proof. The proof is left as an exercise. 


The following theorem is usually referred to as the Bolzano-Weierstrass theorem. It is true
in Rn as well. In fact, we will use this property as a definition later (Definition 14.17). The
proof of the Bolzano-Weierstrass theorem in Rn will be given then.
Theorem 11.3. Let (an ) be a bounded sequence in R. Then (an ) has a convergent subse-
quence.
Proof. Let $c = \limsup a_n$. Let $b_n = \sup_{j \ge n} a_j$, so that $b_1 \ge b_2 \ge \cdots$ and $\lim b_n = c$. We will use $(b_n)$ to recursively define a subsequence of $(a_n)$ that converges to c. Choose $m_1$ with $b_{m_1} < c + 1$. Then choose $n_1 \ge m_1$ with $a_{n_1} > b_{m_1} - 1$. Then
\[ c - 1 \le b_{m_1} - 1 < a_{n_1} \le b_{m_1} < c + 1, \]
so that $|a_{n_1} - c| < 1$. (Exercise: make sure you can explain each of the above inequalities.)
Recursively, having chosen $1 \le n_1 < n_2 < \cdots < n_{k-1}$ with $|a_{n_i} - c| < \frac{1}{i}$ for i = 1, . . ., k − 1, choose $m_k > n_{k-1}$ so that $b_{m_k} < c + \frac{1}{k}$. Then choose $n_k \ge m_k$ with $b_{m_k} - \frac{1}{k} < a_{n_k}$. Then we have
\[ c - \frac{1}{k} \le b_{m_k} - \frac{1}{k} < a_{n_k} \le b_{m_k} < c + \frac{1}{k}, \]
and hence $|a_{n_k} - c| < \frac{1}{k}$. Therefore we have defined $1 \le n_1 < n_2 < \cdots$ so that $|a_{n_k} - c| < \frac{1}{k}$ for all k. Thus the subsequence $(a_{n_k})_{k=1}^{\infty}$ converges to c. □
Remark 11.4. As a corollary to the proof, we see that every bounded sequence in R has a
subsequence converging to the limsup of the sequence. An analogous argument shows that
there is a(nother) subsequence converging to the liminf of the sequence.
Exercise 11.5. Let (an ) be a bounded sequence. Let E be the set of subsequential limits;
that is, E = {x ∈ R : there is a subsequence of (an ) converging to x}. Then lim inf an =
min(E) and lim sup an = max(E).
Definition 11.6. We introduce here some standard terminology regarding sequences, re-
flecting the idea that it is only the “ultimate” behavior of a sequence that is of interest (cf.
Remark 7.12). Our phrasing is very general, hence vague, but expresses a useful notion that
is easy to understand once you see the idea. Let (an ) be a sequence, and let P be some
property that the terms of the sequence might have. We say that P holds eventually if P
holds for all terms in some tail of the sequence; in other words, if there exists n0 such that
P (an ) is true for all n ≥ n0 . We say that P holds frequently if every tail of the sequence
contains a term for which P holds; in other words, if for all n0 there exists n ≥ n0 such that
P (an ) is true.
For example, you can check your understanding of these terms by working through the
following statements.
(1) (an ) converges to c if and only if for every ε > 0, an ∈ Bε (c) eventually.
(2) (an ) has a subsequence converging to c if and only if for every ε > 0, an ∈ Bε (c)
frequently.
Exercise 11.7. Let (an ) be a bounded real sequence, and let x ∈ R. Prove the following:
1. x < lim sup an =⇒ x < an frequently =⇒ x ≤ lim sup an
2. x > lim sup an =⇒ x > an eventually =⇒ x ≥ lim sup an
3. x < lim inf an =⇒ x < an eventually =⇒ x ≤ lim inf an
4. x > lim inf an =⇒ x > an frequently =⇒ x ≥ lim inf an
(The exercise is not only to prove the eight implications, but also to show that none of these
implications can be reversed.)

12. Infinite limits and limits at infinity


There are innumerable ways in which a limit can fail to exist. One of these is “regular”
enough to warrant special notation: divergence to (±) infinity.
Definition 12.1. Let X be a metric space, let x0 ∈ X′ , and let f : X → R. We say that
f diverges to infinity as x approaches x0 if for every M ∈ R there exists δ > 0 such that
for all x ∈ X, if 0 < d(x, x0 ) < δ then f (x) > M . We write limx→x0 f (x) = ∞ in this case.
Similarly, we say that f diverges to minus infinity as x approaches x0 if for every M ∈ R
there exists δ > 0 such that for all x ∈ X, if 0 < d(x, x0 ) < δ then f (x) < M . We write
limx→x0 f (x) = −∞ in this case. An analogous definition is used for sequences.
Remark 12.2. It is important always to remember that ∞ and −∞ are not real numbers.
However, a limited portion of the arithmetic of real numbers can be usefully extended to
include these two symbols. The conventions are as follows.
• For x ∈ R, x ± ∞ = ±∞.
• For x ∈ R with x ≠ 0, x · ±∞ = ± sgn(x) · ∞.
• For x ∈ R, x/ ± ∞ = 0.
• ∞ + ∞ = ∞, ∞ · ∞ = ∞.
On the other hand, certain combinations are expressly forbidden, under pain of writing
nonsense:

∞ − ∞, ∞/∞, 0 · ∞

are not defined.
With the above definition and remarks in mind, we can extend the arithmetic of limits
from Corollary 10.11 and Remark 10.12 to include infinite limits (and, of course, limits of
sequences as well as of functions). By this we mean that the limit of the sum/difference/pro-
duct/quotient of two functions equals the sum/difference/product/quotient of the two limits,
IF that arithmetic combination of the limits is permissible. We leave it as an exercise to
write a precise theorem and its proof.
A different use of the symbols ±∞ is in the description of limits at infinity.
Definition 12.3. Let X be a metric space, let f : R → X, and let x0 ∈ X. We write
limt→∞ f (t) = x0 if for every ε > 0 there exists M ∈ R such that for all t ∈ R with t ≥ M
we have d(f (t), x0 ) < ε. There is a similar definition for limits at minus infinity.
Remark 12.4. We mention that in this context, the symbols ∞ and −∞ merely indicate
“directions”, and are not to be thought of as “numbers” in any way.

13. Cauchy sequences and complete metric spaces


It may not have seemed important at the time, but the definition of convergence for
a sequence has an unfortunate limitation. Namely, in order to check the definition, it is
necessary to have the limit in hand. In order to use sequences as a tool to study spaces,
it would be very helpful to be able to give an internal characterization of convergence, one
that doesn’t refer to the limit itself. This goal cannot be achieved in general,
but the idea that came from it is very important.
Definition 13.1. Let (X, d) be a metric space. A sequence (xn ) in X is Cauchy if for every
positive real number ε, there exists n0 ∈ N such that for all m, n ≥ n0 we have d(xm , xn ) < ε.
Informally, we say that the sequence is Cauchy if its terms can be made close to each other
merely by requiring them to be far enough out in the sequence. It is an exercise in the logic
of quantifiers to convince yourself that the definition captures precisely the idea behind this
informal statement.
The following lemma provides many examples of Cauchy sequences.
Lemma 13.2. A convergent sequence is Cauchy.
Proof. Let (xn ) be convergent, with limit x. Let ε > 0 be given. By the definition of
convergence there is n0 such that for all n ≥ n0 , d(xn , x) < ε/2. Then if m, n ≥ n0 we have
d(xm , xn ) ≤ d(xm , x) + d(x, xn ) < ε/2 + ε/2 = ε. Therefore (xn ) is Cauchy. 

Example 13.3. Here is an example of a non-Cauchy sequence in R: let $x_n = \sqrt{n}$. (Exercise: prove that it’s not Cauchy.) But successive terms do get close to each other: $|x_n - x_{n+1}| = \frac{1}{\sqrt{n} + \sqrt{n+1}} < \frac{2}{\sqrt{n}}$.
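The contrast in Example 13.3 between "successive terms are close" and "Cauchy" can be seen in a few lines of arithmetic. This is only a numerical sketch; the comparison of the terms with indices n and 4n is our own choice of witness, made because the difference of their square roots is exactly the square root of n.

```python
import math

def x(n):
    """The sequence of Example 13.3: the square root of n."""
    return math.sqrt(n)

for n in (10, 1_000, 100_000):
    step = x(n + 1) - x(n)      # gap between successive terms, tends to 0
    spread = x(4 * n) - x(n)    # gap between terms n and 4n, equals sqrt(n)
    print(n, step, spread)
```

However far out we go, there are always two later terms at distance at least 1 from each other, so no n0 works for ε = 1 in Definition 13.1.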
Example 13.4. Here is an example of a Cauchy sequence that does not converge. Let
X = (0, 1) with the usual metric gotten from R. The sequence (1/n) in X is Cauchy but
not convergent. (Remember the definition of convergence (Definition 7.5): the limit has to
belong to the metric space.)
Example 13.5. Here is a more interesting example of a non-convergent Cauchy sequence.
Let V be the vector space of all finite real sequences:

V = {(x1 , x2 , . . .) : xi ∈ R, there exists i0 such that for all i > i0 , xi = 0}.
We define a norm on V by $\|x\| = \bigl(\sum_{i=1}^{\infty} x_i^2\bigr)^{1/2}$ (note that the sum is actually finite). It’s
easy to see that this is a norm: the properties defining a norm only involve finitely many
vectors at a time, and then the required property actually occurs in some Euclidean space,
where we already know the properties hold. Now, let
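\[ v_n = \Bigl(\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots, \frac{1}{2^n}, 0, 0, 0, \ldots\Bigr) \in V. \]
If m < n, we have
\[ \|v_m - v_n\|^2 = \Bigl\|\Bigl(0, 0, \ldots, 0, \frac{1}{2^{m+1}}, \ldots, \frac{1}{2^n}, 0, 0, \ldots\Bigr)\Bigr\|^2 = \sum_{i=m+1}^{n} \Bigl(\frac{1}{2^i}\Bigr)^2 = \frac{1}{4^{m+1}} \sum_{i=0}^{n-m-1} \frac{1}{4^i} < \frac{1}{4^m}. \]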
Thus (vn ) is Cauchy in V . But we claim that (vn ) does not converge. To prove this, let
y = (yn ) be an arbitrary vector in V . There is k such that yi = 0 for i > k. For n > k,
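\[ \|y - v_n\|^2 = \sum_{i=1}^{\infty} (y_i - v_{ni})^2 = \sum_{i=1}^{n} \Bigl(y_i - \frac{1}{2^i}\Bigr)^2 \ge \Bigl(y_{k+1} - \frac{1}{2^{k+1}}\Bigr)^2 = \frac{1}{4^{k+1}}. \]
Thus $d(v_n, y) \ge 2^{-(k+1)}$ for all n > k. Therefore $v_n \not\to y$.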
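The two estimates in Example 13.5 can be checked with exact rational arithmetic. The sketch below stores a finitely supported sequence as a Python list without its trailing zeros; the helper names v and dist_sq and the particular choice of y are assumptions made for illustration only.

```python
from fractions import Fraction

def v(n):
    """v_n = (1/2, 1/4, ..., 1/2^n), stored without the trailing zeros."""
    return [Fraction(1, 2 ** i) for i in range(1, n + 1)]

def dist_sq(x, y):
    """Squared norm of x - y for finitely supported sequences given as lists."""
    length = max(len(x), len(y))
    x = x + [Fraction(0)] * (length - len(x))
    y = y + [Fraction(0)] * (length - len(y))
    return sum((s - t) ** 2 for s, t in zip(x, y))

# Cauchy estimate: ||v_m - v_n||^2 < 1/4^m for m < n.
m, n = 5, 40
print(dist_sq(v(m), v(n)) < Fraction(1, 4 ** m))

# A candidate limit y supported on the first k coordinates stays far from v_n.
y = v(3)        # hypothetical choice of y, with k = 3
k = len(y)
print(all(dist_sq(y, v(n)) >= Fraction(1, 4 ** (k + 1)) for n in range(k + 1, 60)))
```

Both checks print True for these sample values. The point of the example is that the same lower bound holds for every y in V, with k depending on y.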
Definition 13.6. A metric space is called complete if every Cauchy sequence converges.
Theorem 13.7. Rn is complete.
We will give the proof after a couple of lemmas about Cauchy sequences in general metric
spaces.
Lemma 13.8. A Cauchy sequence is bounded.
Proof. Let (an ) be a Cauchy sequence. Then there is L such that d(am , an ) < 1 for all m,
n ≥ L. Let R = max{d(a1 , aL ), . . . , d(aL−1 , aL )} + 2. Then d(an , aL ) < R for all n, and
hence (an ) is bounded. □
Lemma 13.9. A Cauchy sequence having a convergent subsequence is convergent.
Proof. Let $(a_n)$ be a Cauchy sequence, and let $(a_{n_i})$ be a convergent subsequence, with limit c. We claim $a_n \to c$. Let ε > 0. Since $(a_n)$ is Cauchy there is L such that $d(a_m, a_n) < \varepsilon/2$ for all m, n ≥ L. By the definition of convergence, there is $i_0$ such that $d(a_{n_i}, c) < \varepsilon/2$ for all $i \ge i_0$. Let $i_1 \ge i_0$ be such that $n_{i_1} \ge L$. Then for any $n \ge n_{i_1}$ we have
\[ d(a_n, c) \le d(a_n, a_{n_{i_1}}) + d(a_{n_{i_1}}, c) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Hence $a_n \to c$. □
Proof. (of Theorem 13.7) We first show that R is complete. Let (an ) be a Cauchy sequence
in R. By Lemma 13.8 we know that (an ) is bounded. By Theorem 11.3 we know that (an )
has a convergent subsequence. Then by Lemma 13.9 we know that (an ) converges. Thus R
is complete. Now it follows easily from Remark 4.12 that Rn is complete (the details are left
as an exercise). 
Exercise 13.10. A closed subset of a complete metric space is complete.
Exercise 13.11. Let (X, d) be a metric space. Recall the diameter of a subset of X from
Exercise 5.19.
(1) Suppose that X is complete. Prove that for every decreasing sequence
F1 ⊇ F2 ⊇ · · ·
of nonempty closed subsets of X with limn→∞ diam(Fn ) = 0, there exists an element
a ∈ X such that
\[ \bigcap_{n=1}^{\infty} F_n = \{a\}. \]
(2) (converse of part (1)) Suppose that whenever F1 , F2 , . . . are nonempty closed subsets
of X such that F1 ⊇ F2 ⊇ · · · and limn→∞ diam(Fn ) = 0, we have $\bigcap_{n=1}^{\infty} F_n \ne \emptyset$. Prove
that X is a complete metric space.

14. Compactness
Compactness is probably the most important concept in analysis. It can be described in
various ways. The “right” way is not necessarily the easiest to understand. Before we give
the definition, here is some motivation for why it is reasonable. The basic problem that
compactness addresses is the transition from local information to global information. That
may sound cryptic, and it is meant to be a catchy phrase that will become more intelligible
as you get more used to these ideas. But it isn’t hard to see what it is about. Local (near a
point) means in an open ball centered at that point. Here is a simple example of using this
terminology. If a function is continuous at a point, then it is bounded in some open ball
centered at that point. Thus if a function is continuous on a set, it is bounded locally on
that set: each point in the set has a neighborhood on which the function is bounded. On
the other hand, global (on a set) means on the whole set. A function is “globally bounded”
if it is bounded on its domain, i.e. if it is a bounded function. Is every continuous function
bounded? Of course not! For example, a non-constant polynomial on R is continuous, but
not bounded. Local boundedness does not generally imply global boundedness. However if
the domain of the polynomial is taken to be a closed bounded interval, then the extreme
value theorem from calculus implies that the polynomial is bounded on the interval. The
great insight was that it is a property of the domain that lets us pass from local boundedness
to global boundedness, and this property is called compactness.
Now, recall what the word local means: in a neighborhood of a point. A property holds
locally on a set if for each point, there is an open ball centered at the point such that the
property holds in that ball. If the set is infinite, this will give an infinite collection of open
balls, one for each point. We could obtain the property globally if we had a finite collection
of balls instead of an infinite collection. Compactness of the set means that we can always
reduce to a finite collection.
You might notice that a lot of mathematics seems to proceed in this way: what would we
like to have? Let’s give a name to the situation where we have what we want. Now let’s
analyze the situation to see what exactly we were asking for. In fact, compactness can be
described in a variety of ways that seem very different. That means that we can prove that a
space is compact using an easy description. Then we can use compactness via a complicated
description.
OK, with that as motivation, here is the precise definition.
Definition 14.1. Let X be a set. A cover of X is a collection of sets whose union contains
X. If U is a cover of X, a subcover of U is a subcollection of U that is also a cover of X.
Example 14.2. (1) The set of all open intervals is a cover of R.
(2) {(a, b) : a < b, a, b ∈ Z} is a subcover of example (1).
Definition 14.3. Let X be a metric space, and let E ⊆ X. An open cover of E is a cover
of E whose elements are open subsets of X.
Definition 14.4. Let X be a metric space, and let E ⊆ X. E is compact if every open
cover of E has a finite subcover.
Example 14.5. (1) Example 14.2(1) is an open cover of R having a finite subcover.
(2) Example 14.2(2) is an open cover of R not having a finite subcover. In particular, it
follows that R is not compact.
Example 14.6. (1) Finite sets are compact.
(2) {0, 1, 1/2, 1/3, . . .} is a compact subset of R.
(3) [0, 1] is a compact subset of R (this is a special case of Corollary 14.30).

Proof. Let $\mathcal{U}$ be an open cover of [0, 1]. Let $E = \{x \in [0, 1] : [0, x] \text{ is finitely covered by } \mathcal{U}\}$. Note that 0 ∈ E, so E ≠ ∅. Let c = sup E. Then c ∈ [0, 1]. We first claim that c ∈ E. To see this, choose $U_0 \in \mathcal{U}$ with $c \in U_0$. Then there exists r > 0 such that $(c - r, c + r) \subseteq U_0$. By the definition of supremum, there is y ∈ E with y > c − r. By definition of E there is a finite subcollection $\mathcal{V} \subseteq \mathcal{U}$ with $[0, y] \subseteq \bigcup \mathcal{V}$. But then $\mathcal{V} \cup \{U_0\}$ is a finite subcollection of $\mathcal{U}$ covering [0, c], proving that c ∈ E. Now we note that, in fact, $\mathcal{V} \cup \{U_0\}$ covers [0, a] for any number a between c and c + r. Thus if c < 1 we could find a larger element of E than c, contradicting its status as supremum. So we have shown that c = 1. Thus [0, 1] is finitely covered by $\mathcal{U}$. □
(4) [0, 1) is not compact.
Proof. $\{(-1, 1 - \frac{1}{n}) : n \in \mathbb{N}\}$ is an open cover not having a finite subcover.
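The reason this cover of [0, 1) has no finite subcover can be made concrete: any finite subfamily of the intervals (−1, 1 − 1/n) has a largest index N, and the point 1 − 1/N of [0, 1) is then left uncovered. The sketch below uses exact fractions; the function names uncovered_point and covers and the sample subfamily are hypothetical.

```python
from fractions import Fraction

def uncovered_point(indices):
    """For finitely many indices n, return a point of [0, 1) outside every interval (-1, 1 - 1/n)."""
    largest = max(indices)
    return 1 - Fraction(1, largest)   # right endpoint of the largest interval

def covers(indices, x):
    """True if x lies in some interval (-1, 1 - 1/n) with n in indices."""
    return any(-1 < x < 1 - Fraction(1, n) for n in indices)

subfamily = [2, 5, 17]               # a sample finite subfamily
p = uncovered_point(subfamily)
print(p, 0 <= p < 1, covers(subfamily, p))
```

The same recipe works for every finite set of indices, which is exactly the statement that no finite subcover exists.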


Definition 14.7. A metric space X is compact if X is a compact subset of itself.
By now our waffling use of the qualifier “subset” after the word “compact” may be causing
some trauma. We will remedy this now, but first we need the important notion of relatively
open set.
Definition 14.8. Let X be a metric space. Recall that a subset E ⊆ X is also a metric
space (cf Definition 4.13). A subset of E is called relatively open (in E) if it is an open
subset of the metric space E.
Example 14.9. (1) Let X = R, and let E = [0, 1] ⊆ X. Then [0, 1/2) is relatively open
in E, but not open in X.
(2) Let X = R2 , and let E = R × {0} ⊆ X (we think of E as being the x-axis in R2 ).
Then (0, 1) × {0} is just the usual open unit interval in the x-axis — it is relatively
open in E, but is not open in X.
Lemma 14.10. Let X be a metric space, and let E ⊆ X. For U ⊆ E, U is relatively open
in E if and only if there exists an open subset V of X such that U = E ∩ V .
Proof. We will use a superscript E to distinguish open balls in the metric space E from open balls in X. For a ∈ E and r > 0 we see that
\[ B_r^E(a) = \{x \in E : d(x, a) < r\} = \{x \in X : d(x, a) < r\} \cap E = B_r(a) \cap E. \]
Thus U is relatively open in E if and only if for every x ∈ U there exists r(x) > 0 such that $B_{r(x)}(x) \cap E \subseteq U$. In this case, we have that $U = \bigl(\bigcup_{x \in U} B_{r(x)}(x)\bigr) \cap E$, and we may use the set in parentheses for V . Conversely, suppose that U = V ∩ E for some open set V of X. Then for a point x ∈ U there is r > 0 such that $B_r(x) \subseteq V$. Then $B_r^E(x) = B_r(x) \cap E \subseteq V \cap E = U$, so we have that U is relatively open in E. □
Proposition 14.11. Let X be a metric space, and let E ⊆ X. E is a compact subset of X
if and only if E is a compact metric space.
Proof. Suppose that E is a compact subset of X. Let $\mathcal{U}$ be an open cover of (the metric space) E. By Lemma 14.10, for each $U \in \mathcal{U}$ there is an open set $V_U \subseteq X$ such that $U = V_U \cap E$. Then
\[ E = \bigcup_{U \in \mathcal{U}} U = \bigcup_{U \in \mathcal{U}} (V_U \cap E) = \Bigl(\bigcup_{U \in \mathcal{U}} V_U\Bigr) \cap E, \]
and hence $\{V_U : U \in \mathcal{U}\}$ is an open cover of E in X. By hypothesis this open cover has a finite subcover. Thus there are $U_1, \ldots, U_k \in \mathcal{U}$ such that $E \subseteq V_{U_1} \cup \cdots \cup V_{U_k}$. Hence $E \subseteq U_1 \cup \cdots \cup U_k$, so that $\mathcal{U}$ has a finite subcover. Therefore the metric space E is compact.
The converse is left as an exercise. □
Thus compactness is an intrinsic property of a metric space, that cannot be lost when the
space is realized as a subspace of another metric space (in contrast to openness, which does
depend on the ambient metric space, as seen in Example 14.9). We now develop the chief
properties of compactness.
Proposition 14.12. A closed subset of a compact space is compact.
Proof. Let X be a compact metric space, and let E ⊆ X be a closed subset. Let U be an
open cover of E. Since E is closed, E c is open. Then U ∪ {E c } is an open cover of X. Since
X is compact, this open cover has a finite subcover. The subcover consists of finitely many
sets from U, possibly together with E c . But then the sets from U must cover E, so that U
has a finite subcover (of E). Therefore E is compact. 
Exercise 14.13. It is a nice exercise to prove a sort of converse to this. Namely, a compact
subset of a metric space is closed. We won’t do it here, as this fact will follow from a later
result (Corollary 14.20).
Proposition 14.14. A compact subset of a metric space is bounded.
Proof. Let E be a compact subset of the metric space X. Choose any point x0 ∈ X. Then
{Bn (x0 ) : n = 1, 2, 3, . . .} is an open cover of X, hence also of E. Since E is compact, there
is a finite subcover. But since the open balls increase with n, this means that there is n such
that E ⊆ Bn (x0 ). Thus E is bounded. □
Of course, the converse of Proposition 14.14 is false.
Theorem 14.15. (Finite Intersection Property, or FIP) Let X be a compact metric space.
Let {Ei }i∈I be a collection of nonempty closed subsets of X. Suppose that every finite
subcollection has nonempty intersection: for all k ∈ N, for all i1 , . . ., ik ∈ I, we have
$E_{i_1} \cap \cdots \cap E_{i_k} \ne \emptyset$. Then $\bigcap_{i \in I} E_i \ne \emptyset$.
Proof. Suppose not. Then taking complements we have $\bigcup_{i \in I} E_i^c = X$. This means that $\{E_i^c : i \in I\}$ is an open cover of X. Since X is compact there are i1 , . . ., ik ∈ I with $E_{i_1}^c \cup \cdots \cup E_{i_k}^c = X$. But then by complements again, we get that $E_{i_1} \cap \cdots \cap E_{i_k} = \emptyset$, a contradiction. □

Example 14.16. The theorem may fail if the sets are not closed: consider {(0, 1/n) : n ∈ N}. This does have the FIP, but the intersection is empty.
Definition 14.17. A metric space X is sequentially compact if every sequence in X has a
convergent subsequence (convergent in X, of course).
Example 14.18. [a, b] is sequentially compact by Theorem 11.3, and the fact that [a, b] is
closed.
Theorem 14.19. A compact metric space is sequentially compact.
Corollary 14.20. A compact subset of a metric space is closed.
The proof of the theorem will be made easier by the following preliminary “computation.”
Lemma 14.21. Let (xn ) be a sequence in a metric space, and let y be a point. Then (xn )
has a subsequence converging to y if and only if for every ε > 0 and for every m ∈ N, there
exists n ≥ m such that d(xn , y) < ε.
Proof. (⇒): Suppose limi→∞ xni = y. Let ε > 0 and m ∈ N. By the hypothesized conver-
gence there is i0 such that d(xni , y) < ε whenever i ≥ i0 . Since ni → ∞ as i → ∞ there
exists j ≥ i0 such that nj ≥ m. Then d(xnj , y) < ε. So nj is the desired ‘n’.
(⇐): Suppose the condition in the statement holds. We apply it repeatedly. First choose $n_1$ such that $d(x_{n_1}, y) < 1$. Then choose $n_2 > n_1$ such that $d(x_{n_2}, y) < 1/2$. Continuing this way we construct a subsequence $(x_{n_i})_{i=1}^{\infty}$ such that $d(x_{n_i}, y) < 1/i$ for all i. Evidently $x_{n_i} \to y$ as i → ∞. □
Proof. (of Theorem 14.19) We will prove the contrapositive of the statement in the theorem.
So suppose that X is not sequentially compact. Then there is a sequence (xn ) having no
convergent subsequence. Thus for all y ∈ X, (xn ) does not have a subsequence converging to
y. Negating the condition in Lemma 14.21, we find that for all y ∈ X there exist $\varepsilon_y > 0$ and $n_y \in \mathbb{N}$ such that for all $n \ge n_y$, $d(x_n, y) \ge \varepsilon_y$. Let $\mathcal{U} = \{B_{\varepsilon_y}(y) : y \in X\}$. $\mathcal{U}$ is obviously an open cover of X. But if $y_1, \ldots, y_k \in X$ are any finite collection of points, choose $n > \max\{n_{y_1}, \ldots, n_{y_k}\}$. Then $d(x_n, y_i) \ge \varepsilon_{y_i}$ for i = 1, . . ., k. Hence $x_n \notin \bigcup_{i=1}^{k} B_{\varepsilon_{y_i}}(y_i)$. Thus $\mathcal{U}$ has no finite subcover. Therefore X is not compact. □
Proposition 14.22. A sequentially compact metric space is complete.
Proof. This follows from Lemma 13.9. 
Exercise 14.23. A metric space X is sequentially compact if and only if every infinite subset
of X has a cluster point.
We now turn to the role of boundedness for compact metric spaces. By way of introduction,
we mention that the most famous result about compact metric spaces is the Heine-Borel
theorem: a subset of Rn is compact if and only if it is closed and bounded. We will prove this
later, but now we want to point out that this result is special to Rn — it is NOT true in
arbitrary metric spaces. The reason is that Rn is (duh!) finite dimensional. This may not
seem so special now, but many of the most important metric spaces in analysis are infinite
dimensional, and you will surely run into them (maybe not today, maybe not tomorrow,
but...yeah, yeah.)
Here is a simple part of the Heine-Borel theorem that we have essentially proved already.
For E ⊆ R, if E is bounded then every sequence in E has a convergent subsequence. If E is
both closed and bounded, then the limit of the convergent subsequence must belong to E.
Thus we see that for subsets of R, closed and bounded imply sequentially compact.
Here are two examples to show that for general metric spaces, boundedness is too weak a
notion. The first is simple-minded, but the second is more interesting.
Example 14.24. (1) Let X be an infinite set with the discrete metric (Example 4.17).
Then X is bounded, but not sequentially compact.
(2) Let V be the normed space of finite real sequences (Example 13.5). Then the closed unit ball $\overline{B}_1(0)$ is
closed and bounded, but not sequentially compact.
In fact, the situation is worse than might be realized if you just think about the
non-convergent Cauchy sequence from Example 13.5. Consider the sequence (en )
in V , where en = (0, 0, . . . , 0, 1, 0, 0, . . .) (with 1 in the nth slot). This sequence is
contained in the unit ball of V , but does not even have a Cauchy subsequence.
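The claim about the sequence (en ) rests on a one-line computation: any two distinct terms are at distance exactly the square root of 2, so no subsequence can be Cauchy. The following sketch checks this for the first few terms; the truncation length N and the helper names e and norm are our own choices.

```python
import math

def e(n, length):
    """The unit vector e_n, truncated to `length` coordinates (requires n <= length)."""
    return [1.0 if i == n else 0.0 for i in range(1, length + 1)]

def norm(x):
    """Euclidean norm of a finitely supported sequence stored as a list."""
    return math.sqrt(sum(t * t for t in x))

N = 6
for m in range(1, N + 1):
    for n in range(m + 1, N + 1):
        diff = [s - t for s, t in zip(e(m, N), e(n, N))]
        assert math.isclose(norm(diff), math.sqrt(2.0))
print("all pairwise distances equal sqrt(2)")
```

Truncating to N coordinates loses nothing here, since the vectors involved vanish beyond the Nth slot.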
These examples show that the problem with boundedness is that a huge space can hide
inside a bounded set. The correct definition is the following.
Definition 14.25. A subset E of a metric space is called totally bounded if for every ε > 0
there are finitely many balls of radius ε that cover E.
Remark 14.26. (1) The definition is unaffected by specifying the type of the balls (open
vs. closed).
(2) A totally bounded subset of a metric space is bounded. A subset of a totally bounded
set is totally bounded.
The proofs are left as exercises.
The next lemma shows what makes Rn so special.
Lemma 14.27. In Rn , every bounded subset is totally bounded.
Proof. Let $E \subseteq \mathbb{R}^n$ be bounded, and let ε > 0. Choose C > 0 such that $E \subseteq [-C, C]^n$. Choose $k > 2C\sqrt{n}/\varepsilon$. Write
\[ [-C, C] = \bigcup_{i=1}^{k} \Bigl[-C + \frac{2C(i-1)}{k},\ -C + \frac{2Ci}{k}\Bigr] = \bigcup_{i=1}^{k} S_i, \]
where $S_1, \ldots, S_k$ are closed intervals of length $2C/k < \varepsilon/\sqrt{n}$. Then
\[ [-C, C]^n = (S_1 \cup \cdots \cup S_k) \times \cdots \times (S_1 \cup \cdots \cup S_k) = \bigcup_{i_1, \ldots, i_n = 1}^{k} S_{i_1} \times \cdots \times S_{i_n} = \bigcup_{j=1}^{k^n} F_j, \]
where each $F_j$ is a closed cube of side 2C/k. Then the diameter of each $F_j$, which equals the length of the diagonal of $F_j$, equals $(2C/k)\sqrt{n} < \varepsilon$. Let $x_j \in F_j$ be arbitrary. Then $F_j \subseteq B_\varepsilon(x_j)$. It follows that $E \subseteq [-C, C]^n \subseteq \bigcup_{j=1}^{k^n} B_\varepsilon(x_j)$. □
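The construction in this proof is completely explicit, so it can be carried out by machine for small n. The sketch below is one possible instance under our own assumptions: the points x_j are taken to be the centers of the cubes (the proof allows any point of each cube), and the values of C, n and ε, as well as the names epsilon_net and distance_to_net, are hypothetical.

```python
import itertools
import math
import random

def epsilon_net(C, n, eps):
    """Centers of the k^n closed cubes of side 2C/k tiling [-C, C]^n, where k > 2C*sqrt(n)/eps."""
    k = math.floor(2 * C * math.sqrt(n) / eps) + 1
    side = 2 * C / k
    centers_1d = [-C + side * (i + 0.5) for i in range(k)]
    return k, list(itertools.product(centers_1d, repeat=n))

def distance_to_net(point, centers):
    """Distance from a point to the nearest center."""
    return min(math.dist(point, c) for c in centers)

C, n, eps = 1.0, 2, 0.5            # hypothetical sample values
k, centers = epsilon_net(C, n, eps)
random.seed(0)
sample = [tuple(random.uniform(-C, C) for _ in range(n)) for _ in range(1000)]
# Every sampled point of the cube should lie within eps of some center.
print(k, len(centers), max(distance_to_net(p, centers) for p in sample) < eps)
```

Note that the number of balls is k^n, which grows quickly with the dimension n; this is one way to see why the argument has no analogue in an infinite dimensional space.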
We now return to our development of the properties of compactness.
Proposition 14.28. A sequentially compact metric space is totally bounded.


Proof. We again prove the contrapositive. Suppose that X is a metric space that is not
totally bounded. Then there is a positive number ε such that X cannot be covered by
finitely many balls of radius ε. Let x1 ∈ X. Since $X \not\subseteq \overline{B}_\varepsilon(x_1)$ there must be x2 ∈ X with d(x1 , x2 ) > ε. Since $X \not\subseteq \overline{B}_\varepsilon(x_1) \cup \overline{B}_\varepsilon(x_2)$ there must be x3 ∈ X with d(xi , x3 ) > ε for i < 3.
Continuing this way we construct a sequence (xn ) in X such that d(xi , xn ) > ε for i < n.
This sequence has no Cauchy subsequence, hence no convergent subsequence. Therefore X
is not sequentially compact. 
We now have almost all of the pieces of the main theorem on compactness in metric spaces.
Theorem 14.29. Let X be a metric space. The following are equivalent:
(1) X is compact.
(2) X is sequentially compact.
(3) X is complete and totally bounded.
Proof. (1)⇒(2) This is Theorem 14.19.
(2)⇒(3) This follows from Propositions 14.22 and 14.28.
(3)⇒(1) We prove this by contradiction. Let X be complete and totally bounded, and
suppose that X is not compact. Then there is an open cover U having no finite subcover.
We first use total boundedness. There is a finite collection C1 of closed balls of radius 1
covering X. There must be a ball B1 ∈ C1 such that B1 is not finitely covered by U —
otherwise X would be finitely covered by U. Now since B1 is totally bounded there is a
finite collection C2 of closed balls of radius 1/2 covering B1 . There must exist B2 ∈ C2 such
that B1 ∩ B2 is not finitely covered by U. Continuing this process we construct a sequence
B1 , B2 , . . . of closed balls such that Bi has radius 1/i and such that for each i, B1 ∩ · · · ∩ Bi
is not finitely covered by U.
Now we use completeness of X. The sets Fi = B1 ∩ · · · ∩ Bi are closed, decreasing, nonempty (the empty set is finitely covered), and satisfy diam(Fi ) ≤ 2/i, so Exercise 13.11 implies that there is a point $a \in \bigcap_{i=1}^{\infty} B_i$.
Choose U0 ∈ U with a ∈ U0 . Since U0 is open there is r > 0 with Br (a) ⊆ U0 . Let n > 2/r.
We claim that Bn ⊆ Br (a). To see this, let y ∈ Bn . Then d(y, a) ≤ diam (Bn ) ≤ 2/n <
r. This proves the claim, and hence we have Bn ⊆ U0 . Therefore B1 ∩ · · · ∩ Bn ⊆ U0 ,
contradicting the fact that B1 ∩ · · · ∩ Bn is not finitely covered by U. □
Corollary 14.30. (Heine-Borel theorem) Let E ⊆ Rn . Then E is compact if and only if E
is closed and bounded.
Proof. Since Rn is complete (Theorem 13.7), E is complete if and only if it is closed. By
Lemma 14.27 (and the remark preceding that Lemma), E is totally bounded if and only if
it is bounded. 

15. Continuity and compactness


Theorem 15.1. Let X and Y be metric spaces, and let f : X → Y be a continuous function.
If X is compact then so is f (X).
Proof. Let $\mathcal{V}$ be an open cover of f (X). Then $f^{-1}(\mathcal{V}) = \{f^{-1}(V) : V \in \mathcal{V}\}$ is an open cover of X. Since X is compact, $f^{-1}(\mathcal{V})$ has a finite subcover. Thus there are $V_1, \ldots, V_k \in \mathcal{V}$ such that $X = \bigcup_{i=1}^{k} f^{-1}(V_i)$. But then $f(X) \subseteq \bigcup_{i=1}^{k} V_i$. Thus $\mathcal{V}$ admits the finite subcover $\{V_1, \ldots, V_k\}$. □
Exercise 15.2. One can also prove this theorem using sequences and sequential compact-
ness.
Corollary 15.3. If X is compact and f : X → Y is continuous, then f (X) is a closed
bounded subset of Y (in fact, totally bounded).
Corollary 15.4. (Extreme value theorem) Let X be a compact metric space, and let f :
X → R be continuous. Then f achieves its maximum and minimum at points of X: there
exist x0 , x1 ∈ X such that for all x ∈ X, f (x0 ) ≤ f (x) ≤ f (x1 ).
Proof. A (non-empty) closed bounded subset of R contains its infimum and supremum. 
Corollary 15.5. A continuous (R-valued) function on a closed bounded interval has a maximum and a minimum.
Definition 15.6. Let X and Y be metric spaces, and let f : X → Y . f is an open map if
f (A) is an open subset of Y whenever A is an open subset of X. f is a closed map if f (A)
is a closed subset of Y whenever A is a closed subset of X.
Remark 15.7. Note that the above definitions refer to the forward set map defined by f ,
which is less well behaved than the reverse set map. For the reverse map, the analogous
properties are equivalent to continuity (Theorem 8.5 and Exercise 8.6).
Theorem 15.8. Let X be compact, and let f : X → Y be continuous. Then f is a closed
map.
Proof. The proof is an exercise. 
Example 15.9. (1) Let T be the unit circle, and let f : [0, 1] → T be given by f(t) =
(cos 2πt, sin 2πt). Then f is continuous but not an open map: [0, 1/2) is an open
subset of [0, 1], but f([0, 1/2)) is not an open subset of T, since it contains its
non-interior point (1, 0).
(2) Define g : [0, ∞) → T by g(t) = (cos(2πt/(t+1)), sin(2πt/(t+1))). Then g is bijective
and continuous, but is neither a closed map nor an open map: [1, ∞) is a closed subset
of [0, ∞), but g([1, ∞)) is not a closed subset of T since it does not contain its limit
point (1, 0). As in the previous example, [0, 1) is an open subset of [0, ∞), but
g([0, 1)) is not an open subset of T.
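The failure of closedness in part (2) can be observed numerically. The following Python sketch is only an illustration (the helper names are ours and are not part of the notes): it evaluates g at increasingly large t and measures the distance from g(t) to (1, 0), which shrinks toward zero but stays positive for every t ≥ 1.

```python
import math

def g(t):
    """The map g(t) = (cos(2*pi*t/(t+1)), sin(2*pi*t/(t+1))) of Example 15.9(2)."""
    angle = 2 * math.pi * t / (t + 1)
    return (math.cos(angle), math.sin(angle))

def dist_to_one_zero(t):
    """Euclidean distance from g(t) to the point (1, 0)."""
    x, y = g(t)
    return math.hypot(x - 1.0, y)

# g(0) is the point (1, 0) itself; on [1, infinity) the distance to (1, 0)
# is positive but becomes arbitrarily small, so (1, 0) is a limit point of
# g([1, infinity)) that does not belong to it.
for t in (1.0, 10.0, 100.0, 1000.0):
    print(t, dist_to_one_zero(t))
```

The printed distances are floating point approximations of 2 sin(π/(t+1)), so they support the claim without proving it.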
Theorem 15.10. Let X and Y be metric spaces with X compact, and let f : X → Y be
continuous and bijective. Then f is an open map.
Proof. Let U ⊆ X be open. Then U^c is a closed subset of the compact space X, hence compact.
Therefore f(U^c) is compact (Theorem 15.1), hence closed. But f(U^c) = f(U)^c since f is
bijective. Therefore f(U) is open. □
Corollary 15.11. In the above theorem, f −1 is continuous.

16. Connectedness
Let’s recall for a moment Example 8.8(5): T and [0, 1] are not homeomorphic metric
spaces. How might we go about proving this? A clever observation is the following: if we
remove a point from T, the result is still “one piece” (in fact, it is easy to see that for any
z ∈ T, T \ {z} is homeomorphic to R). On the other hand, if we remove a point from [0, 1]
(other than one of the two endpoints), the result “consists of two pieces”. It is an even
cleverer observation that it is not very easy to say more precisely what we mean by “consists
of two pieces”. For example, any set containing more than one point can be divided into two
nonempty disjoint pieces. But surely, the division [0, 1] \ {1/2} = [0, 1/2) ⊔ (1/2, 1] is a special way
of dividing a set into two pieces. What is special about it?
We need a topological property, and the following is the right one: no sequence in one
of the pieces can converge to a point of the other. Well, this is clearly true of the division
of [0, 1] \ {1/2} described above. But it pushes the problem back over to the other side: can
we prove that it is not possible to divide R into two nonempty disjoint pieces such that no
sequence in one piece can converge to a point of the other piece?
At some point, we just have to bite the bullet and try to prove a hard result. In this section
we will do this, and prove the fact about R stated in the previous paragraph. This is a deep
consequence of the completeness axiom. The relevant property of R is called connectedness.
As the above discussion has indicated, connectedness is a sort of “negative” property. We
will begin with the corresponding “positive” property. First, notice that to say that no
sequence in A converges to a point of B is the same thing as saying that Ā ∩ B = ∅ (where Ā denotes the closure of A). We use
this for our definition (notice that disjointness of A and B is implied).
Definition 16.1. Let X be a metric space. We call X separated if there exist nonempty
subsets A and B such that A ∪ B = X and Ā ∩ B = ∅ = A ∩ B̄. X is called connected if it
is not separated.
Remark 16.2. If E ⊆ X is a subset, we call E separated (or connected) if as a metric space
in its own right E has that property. We note that in the above definition of separation,
if A and B are subsets of E with union equal to E, the closures may be taken relative to
E, or in X; the intersections Ā ∩ B and A ∩ B̄ will be the same. Thus being separated
or connected is an intrinsic property of E; it does not depend on whether E is given as a
subspace of another metric space.
There is another way to describe connectedness. Suppose that the metric space X is
separated, and let A and B be subsets as in the definition. Since X = A ∪ B, we know that
X = A ∪ B̄. Since A ∩ B̄ = ∅, then A = (B̄)^c. Thus A is an open set in X. Since A ∩ B = ∅
also, we know that B = A^c, hence B is closed. By the symmetry of the situation we know
that A is also closed, and B is open.
Definition 16.3. Let X be a metric space. A subset of X is clopen if it is both closed and
open.
Lemma 16.4. The metric space X is separated if and only if it contains a proper nonempty
clopen subset. X is connected if and only if its only clopen subsets are X and ∅.
Proof. The proof is elementary, and we leave it as an exercise. 
Remark 16.5. Let X be a metric space, and let E ⊆ X. What does it mean for A ⊆ E to
be relatively clopen in E? We know that A is relatively open in E if and only if A = E ∩ U
for some open set U ⊆ X. Similarly, one can check that A is relatively closed in E if and
only if A = E ∩ K for some closed set K ⊆ X. Thus A is relatively clopen in E if and only if
there are two sets U and K in X, with U open and K closed, such that A = E ∩ U = E ∩ K.
(Note that it is NOT NECESSARILY true that A equals the intersection of E with a clopen
subset of X.)
Exercise 16.6. The Cantor set (Definition 6.1) is not connected.
We now identify the connected subsets of R.


Definition 16.7. An interval is a subset I ⊆ R such that for all a < c < b in R, if a, b ∈ I
then c ∈ I. (I.e. an interval is a subset of R that is closed under ‘betweenness’.)
Example 16.8. The following are intervals (for any a ≤ b in R):
(a, b) [a, b] [a, b) (a, b] ∅
(a, ∞) [a, ∞) (−∞, b) (−∞, b] R.
Lemma 16.9. Every interval is of one of the forms in Example 16.8.

Proof. Let I be a nonempty interval. Choose c ∈ I. Let B = {x ∈ R : [c, x] ⊆ I}. B is
nonempty since c ∈ B. Let
\[
b = \begin{cases} \sup B, & \text{if } B \text{ is bounded above,} \\ \infty, & \text{else.} \end{cases}
\]
If b ∈ I and b ≥ c, then [c, b] ⊆ I and (b, ∞) ⊆ I^c. If b ∉ I, then [c, b) ⊆ I and [b, ∞) ⊆ I^c.
Similarly, define a by working on the left of c. There are four cases altogether, and I is
presented as one of the forms in Example 16.8 in each case. □
Theorem 16.10. Let I ⊆ R. Then I is connected if and only if I is an interval.
Proof. (⟹): Suppose that I is not an interval. Then there are a < c < b in R with a, b ∈ I
and c ∉ I. Put A = (−∞, c) ∩ I. Then A ≠ ∅, A ≠ I, and A = I ∩ (−∞, c) = I ∩ (−∞, c]
is clopen in I.
(⟸): Suppose that I is an interval, but that I is not connected. Let E ⊆ I be a proper
nonempty clopen subset of I. Then there are an open set U ⊆ R and a closed set K ⊆ R
such that E = I ∩ U = I ∩ K. Let a ∈ E and b ∈ I \ E. We may as well assume that a < b.
Then since I is an interval, we know that [a, b] ⊆ I. We have
\[
E \cap [a,b] = I \cap K \cap [a,b] = K \cap [a,b];
\]
\[
(I \setminus E) \cap [a,b] = \bigl(I \setminus (I \cap U)\bigr) \cap [a,b] = (I \setminus U) \cap [a,b] = [a,b] \setminus U.
\]
Thus E ∩ [a, b] and (I \ E) ∩ [a, b] are closed subsets of R. Let c = sup(E ∩ [a, b]). Then
c ∈ E ∩ [a, b] since this set is closed. Also, c < b since b ∉ E. Hence (c, b] ⊆ (I \ E) ∩ [a, b], and
so c ∈ (I \ E) ∩ [a, b] since this set is closed. This leads to the contradiction c ∈ E ∩ (I \ E). □
The following theorem is very useful, and we place it here because it deals with intervals
(although it is not a result about connectedness).
Theorem 16.11. Let U ⊆ R be open. Then U equals the union of countably many open
intervals. Moreover, U can be written as the union of a countable collection of pairwise
disjoint open intervals, and this collection is unique.
Proof. For x ∈ U choose a(x), b(x) ∈ Q with x ∈ (a(x), b(x)) ⊆ U. Let E = {(a(x), b(x)) :
x ∈ U}. Then E is a collection of open intervals. Since E ⊆ {(α, β) : α, β ∈ Q, α < β}, and
the latter collection is indexed by a subset of Q^2, we see that E is a countable collection.
It is clear that U = ∪E.
The proof of the second statement of the Theorem is left as an exercise. □
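For a finite union of open intervals, the pairwise disjoint decomposition in the second statement can be computed directly. The sketch below is a hypothetical helper written for illustration, not the proof requested in the exercise; it assumes the input is a finite list of pairs (a, b) of real numbers, each standing for the open interval (a, b).

```python
def disjoint_open_intervals(intervals):
    """Rewrite a finite union of open intervals (a, b) as a union of
    pairwise disjoint open intervals, returned in increasing order."""
    # Drop empty intervals (a >= b) and sort by left endpoint.
    pending = sorted((a, b) for a, b in intervals if a < b)
    merged = []
    for a, b in pending:
        # Two open intervals are glued together only if the new left endpoint
        # lies strictly inside the current interval. (0, 1) and (1, 2) stay
        # separate because the point 1 belongs to neither of them.
        if merged and a < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

print(disjoint_open_intervals([(0, 2), (1, 3), (5, 6)]))
```

The strict inequality in the merge test is the design choice that matters: the union (0, 1) ∪ (1, 2) is not an interval, so its two pieces must remain distinct members of the collection.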

17. Continuity and connectedness


Theorem 17.1. Let X and Y be metric spaces, and let f : X → Y be continuous. Suppose
that X is connected. Then f (X) is connected.
Proof. Since f is continuous, f −1 preserves openness and closedness, hence clopenness. Since
X is connected, f −1 (E) is clopen if and only if it equals X or ∅. Therefore any nonempty
clopen set in f (X) must equal f (X). 
Corollary 17.2. (Intermediate value theorem) Let X be a connected metric space, and
f : X → R a continuous function. Let a, b ∈ X, and let t lie between f (a) and f (b). Then
there exists x ∈ X such that f (x) = t.
Proof. By Theorem 17.1, f (X) is a connected subset of R, hence an interval. 
Example 17.3. The following is a typical “practical” illustration of the corollary. Suppose
that the temperature in Phoenix is 110 degrees, and at the same instant the temperature in
La Paz is 2 degrees. Then there must be a place on the earth’s surface where the temperature
(at the same instant) is exactly π degrees.
Example 17.4. Let n ∈ N, and define f : [0, ∞) → [0, ∞) by f(t) = t^n. Since f is
continuous and [0, ∞) is connected, it follows from Theorem 17.1 that f([0, ∞)) is connected.
Let x > 0. There is k ∈ N with x < k. Then 0 < x < k^n. Since 0, k^n ∈ f([0, ∞)), then
x ∈ f([0, ∞)). Therefore there exists y > 0 such that x = f(y). This is a new proof of the
existence of nth roots (compare with the proof of Theorem 1.22).
Now let b > 0, and consider the restriction of f: f_b := f|_{[0,b]} : [0, b] → [0, b^n]. Since [0, b]
is compact and f_b is continuous and bijective, it follows from Corollary 15.11 that (f_b)^{-1} is
continuous. This is true for all b > 0, and hence we have proved that f^{-1} is continuous.
(f^{-1}(x) = x^{1/n}, the nth root of x.)
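The existence argument above is not constructive, but the same data (an integer k with x < k, so that 0 ≤ x < k^n) supports a standard bisection search. The following Python sketch is our own illustration, with a hypothetical function name and a fixed iteration count chosen for floating point arithmetic; it approximates the y with y^n = x.

```python
def nth_root(x, n, iterations=100):
    """Approximate the y >= 0 with y**n == x by bisection on [0, k],
    where k is an integer with x < k, as in Example 17.4."""
    if x < 0 or n < 1:
        raise ValueError("need x >= 0 and n >= 1")
    # With k = int(x) + 1 we have f(0) = 0 <= x < k <= k**n = f(k).
    lo, hi = 0.0, float(int(x) + 1)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid ** n < x:
            lo = mid      # the root lies in [mid, hi]
        else:
            hi = mid      # the root lies in [lo, mid]
    return (lo + hi) / 2

print(nth_root(2.0, 2), nth_root(27.0, 3))
```

Each step halves an interval on whose endpoints f takes values on either side of x, which is exactly the situation to which the intermediate value theorem applies.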
Definition 17.5. The metric space X is path connected if for any two points a1 , a2 ∈ X,
there is a continuous function f : [t1 , t2 ] → X such that f (ti ) = ai , for i = 1, 2.
Proposition 17.6. If X is path connected, then X is connected.
Proof. Let A ⊆ X be a nonempty clopen subset. Let a ∈ X. For any x ∈ X, there
is a continuous function f : [0, 1] → X such that f (0) = a and f (1) = x. Since A is
clopen, f −1 (A) is a clopen subset of [0, 1], and is nonempty since it contains 0. Since [0, 1] is
connected, we have 1 ∈ [0, 1] ⊆ f −1 (A), and hence that x = f (1) ∈ f ([0, 1]) ⊆ A. Therefore
A = X. 

Definition 17.7. Let V be a real vector space. For x, y ∈ V let S_{x,y} = {(1 − t)x + ty : t ∈
[0, 1]}. (S_{x,y} is the line segment connecting x and y.) A subset E ⊆ V is called convex if
S_{x,y} ⊆ E whenever x, y ∈ E.
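Theorem 17.8. Let V be a real normed vector space. Every convex subset of V is connected.
Proof. Let E ⊆ V be convex. For any x, y ∈ E, the function f : [0, 1] → V defined by
f(t) = (1 − t)x + ty is continuous, and its image S_{x,y} lies in E by convexity. Thus
f : [0, 1] → E is a path from x to y, and so E is path connected, hence connected by
Proposition 17.6. □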
Corollary 17.9. Any convex subset of Rn is connected. (For example, any ball in Rn is
connected.)
Definition 17.10. Let D ⊆ R^n, and let f : D → R^m be a function. The graph of f is the
set G(f) ⊆ R^{n+m} given by G(f) = {(x, f(x)) : x ∈ D}.
Proposition 17.11. If D ⊆ Rn is connected, and f : D → Rm is continuous, then G(f ) is
connected.

Proof. Define g : D → R^{n+m} by g(x) = (x, f(x)). Then g is continuous, since all of its
coordinate functions are continuous (being either a coordinate of x, or a coordinate function
of the continuous function f). By Theorem 17.1, g(D) is connected. But g(D) = G(f). □
Example 17.12. (1) The unit circle T is connected by Theorem 17.1, being the image
of [0, 1] under the continuous function (cos 2πt, sin 2πt).
(2) The graph of sin(1/x) for x > 0 is connected, by Proposition 17.11. Let E denote
this graph: E = {(x, sin(1/x)) : x > 0}. Let F = {0} × [−1, 1]. F is also connected,
being convex. It follows from Exercise 17.13 below that the union E∪F is a connected
set. It is a nice exercise to prove that it is not path connected. You should draw a
picture (and do the exercise) in order to appreciate this bizarre example.
(3) In the last example, delete the portion of E for x > π, then include a curve below the
wiggly graph, connecting (π, 0) to (0, −1). The new set is called the Warsaw circle.
It is path connected, but there does not exist a path going “once around”.
Exercise 17.13. Let A be a connected subset of a metric space, and let A ⊆ B ⊆ Ā. Then
B is connected.
Exercise 17.14. Let X be a metric space, let A ⊆ X be a connected subset, and let E ⊆ X
be a clopen subset. Then either A ∩ E = ∅, or A ⊆ E. (Thus if a clopen set touches a
connected set, it must contain all of it.)
Exercise 17.15. Let {A_i : i ∈ I} be subsets of a metric space. If all of the A_i are connected,
and if ∩_{i∈I} A_i ≠ ∅, then ∪_{i∈I} A_i is connected.
Exercise 17.16. Let E be the following subset of R^2:
\[
E = \bigl( (0,1] \times \{0\} \bigr) \cup \Bigl( \bigcup_{n=1}^{\infty} \{\tfrac{1}{n}\} \times [0,1] \Bigr) \cup \{(0,1)\}.
\]
Then E is connected, but not path connected.


Theorem 17.17. Let X be a metric space. For x ∈ X let
\[
C(x) = \bigcup \{ A \subseteq X : x \in A \text{ and } A \text{ is connected} \}.
\]
(1) C(x) is connected.
(2) {C(x) : x ∈ X} is a partition of X.
(3) C(x) is a closed set.
(4) C(x) is a maximal connected subset of X.
Proof. (1) The sets A in the union defining C(x) all contain x. Thus C(x) is connected by
Exercise 17.15.
(2) Each x lies in C(x), since {x} is connected, so the sets C(x) cover X. Suppose that
C(x) ∩ C(y) ≠ ∅. By Exercise 17.15, C(x) ∪ C(y) is connected. Since it
contains x it is one of the sets A in the union defining C(x). Thus C(x) ∪ C(y) ⊆ C(x), and
we have that C(y) ⊆ C(x). By symmetry, C(x) ⊆ C(y), so that C(x) = C(y).
(3) By (1) and Exercise 17.13, the closure of C(x) is connected. Since the closure contains x,
it is one of the sets in the union defining C(x); thus it is contained in C(x), and so C(x) is closed.
(4) Any connected set containing C(x) is one of the sets in the union defining C(x), and
hence must equal C(x). 
Definition 17.18. Let X be a metric space. A component of X is a maximal connected
subset. Thus the components of X are the sets C(x) from Theorem 17.17.
Theorem 17.19. Let U ⊆ Rn be open. Then U has countably many components, and these
are open sets.
Proof. Let x ∈ U, and y ∈ C(x). Since U is open there is r > 0 such that B_r(y) ⊆ U. Then
C(x) ∪ B_r(y) is connected by Exercise 17.15 (and Corollary 17.9). Then C(x) ∪ B_r(y) ⊆ C(x)
by the definition of C(x), hence B_r(y) ⊆ C(x). Thus C(x) is open.
Since the components of U are open and nonempty, and Q^n is dense in R^n, we may choose an
element of Q^n in each one. This defines a map from the set of components to Q^n. Since the
distinct components are disjoint, this map is one-to-one. Since Q^n is countable, so is the
set of components. □

18. Uniform continuity


Continuity is a locally defined property. Suppose that f : X → Y is continuous. If
ε > 0 is given, and if a point x0 ∈ X is given, then continuity of f at x0 provides a
positive number δ with a certain property (Definition 8.1). The local-ness is expressed in
the order of the quantifiers in that definition (and as we have rephrased it above): the
number δ need only do its job for the one point x0 already chosen. In fact, this means that
δ (perhaps slightly modified) works throughout some ball centered at x0 . A(n open) ball
centered at x0 is a neighborhood of x0 . A property is local if each point has a neighborhood
in which the property holds. A globally defined property, on the other hand, is one that
holds everywhere. Continuity would be globally defined if the same δ worked for all points
of X. Not all continuous functions have such a strong form of continuity; those that do have
a special name.
Definition 18.1. Let X and Y be metric spaces, and let f : X → Y be a function. f is
uniformly continuous if for every ε > 0 there exists δ > 0 such that for all x_1, x_2 ∈ X, if
d_X(x_1, x_2) < δ then d_Y(f(x_1), f(x_2)) < ε.
Note that the only difference between this definition and the definition of continuity on X
is in the order in which the point and the δ are specified. Some examples will help to clarify
this.
Example 18.2. (1) Let f : [−10, 10] → R be given by f(t) = t^2. Then f is uniformly
continuous.
Proof. Let ε > 0 be given. Let δ = ε/20. If t_1, t_2 ∈ [−10, 10] are such that |t_1 − t_2| < δ,
then |f(t_1) − f(t_2)| = |t_1^2 − t_2^2| = |t_1 + t_2| · |t_1 − t_2| ≤ (|t_1| + |t_2|) |t_1 − t_2| ≤ 20 |t_1 − t_2| < 20δ = ε. □
(2) Let g : R → R be given by g(t) = t^2. Then g is not uniformly continuous.
Proof. We choose ε = 1. Let δ > 0 be given. Choose t > 1/δ, and let s = t + δ/2.
Then |s − t| = δ/2 < δ, while |s^2 − t^2| = |s − t| · |s + t| = (δ/2)(2t + δ/2) > δt > 1 = ε.
Therefore g is not uniformly continuous. □
(3) Let h : (0, 1) → R be given by h(t) = sin(1/t). Then h is not uniformly continuous.
Proof. We choose ε = 2. Let δ > 0 be given. Choose n ∈ N with n > 1/√δ. Let s = 2/[(2n + 1)π]
and let t = 2/[(2n + 3)π]. Then
\[
|s - t| = \frac{2}{\pi}\left( \frac{1}{2n+1} - \frac{1}{2n+3} \right) = \frac{2}{\pi} \cdot \frac{2}{(2n+1)(2n+3)} \le \frac{1}{n^2} < \delta.
\]
But h(s) = (−1)^n and h(t) = (−1)^{n+1}, so |h(s) − h(t)| = 2 ≥ ε. Therefore h is not uniformly continuous. □
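Parts (1) and (2) can be probed numerically. The Python sketch below is our own illustration with hypothetical helper names: it samples pairs in [−10, 10] closer than δ = ε/20 and counts failures of the ε bound, and it builds a witness pair on R in the manner of part (2), using t = 2/δ as one admissible choice of t > 1/δ.

```python
import random

def square(t):
    return t * t

def violates(s, t, delta, eps):
    """True if |s - t| < delta but |s^2 - t^2| >= eps."""
    return abs(s - t) < delta and abs(square(s) - square(t)) >= eps

def count_violations_on_interval(eps, trials=10_000, seed=0):
    """Sample pairs in [-10, 10] closer than delta = eps/20; count failures."""
    rng = random.Random(seed)
    delta = eps / 20
    bad = 0
    for _ in range(trials):
        s = rng.uniform(-10, 10)
        # Perturb s by less than delta and clamp back into [-10, 10].
        t = min(10.0, max(-10.0, s + rng.uniform(-delta, delta)))
        bad += violates(s, t, delta, eps)
    return bad

def witness_on_R(delta):
    """A pair (s, t) as in part (2): t > 1/delta (here 2/delta), s = t + delta/2."""
    t = 2.0 / delta
    return t + delta / 2, t

print(count_violations_on_interval(0.5))
print([violates(*witness_on_R(d), d, 1.0) for d in (1.0, 0.1, 0.001)])
```

Random sampling can only fail to find a counterexample on [−10, 10]; the proof in part (1) is what guarantees that none exists.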
The following theorem is a classic use of compactness to get a global result from local
information.
Theorem 18.3. Suppose f : X → Y is continuous, and X is compact. Then f is uniformly
continuous.
Proof. Let ε > 0 be given. Since f is continuous, for each x ∈ X there is r_x > 0 such that
f(B_{r_x}(x)) ⊆ B_{ε/2}(f(x)). The collection {B_{r_x/2}(x) : x ∈ X} is an open cover of X. Since X
is compact, there are x_1, ..., x_n ∈ X such that X = ∪_{i=1}^n B_{r_{x_i}/2}(x_i). Let δ = min{r_{x_i}/2 :
1 ≤ i ≤ n}. Let y, z ∈ X with d(y, z) < δ. There is i such that d(y, x_i) < r_{x_i}/2. Then
d(z, x_i) ≤ d(z, y) + d(y, x_i) < δ + r_{x_i}/2 ≤ r_{x_i}. Then f(y), f(z) ∈ B_{ε/2}(f(x_i)), so that
d(f(y), f(z)) < ε. □
19. Convergence of functions
Definition 19.1. Let X be a set. (Note that we really do mean set. Later we will let X
be a metric space, but for now, that is not relevant.) Let f_n : X → R^k for n = 1, 2, 3, ....
(We remark that R^k may be replaced by another metric space. For ease of exposition, we
restrict our attention to the case where the codomain is Euclidean space.) For a ∈ X we say
that (f_n) converges at a if (f_n(a))_{n=1}^∞ is a convergent sequence in R^k. If (f_n) converges at
each point of X, define f : X → R^k by f(x) = lim_{n→∞} f_n(x). We say that (f_n) converges to
f (pointwise).
We may specify this more precisely as: for every ε > 0, for every x ∈ X, there exists
n_0 ∈ N such that for all n ≥ n_0, ‖f_n(x) − f(x)‖ < ε. (Note that n_0 ≡ n_0(ε, x) depends on
both ε and x.)
Example 19.2. (1) Let f_n : [0, 1] → R be given by f_n(x) = (1/n)x. Then f_n → 0.
(2) Let g_n : [0, 1] → R be given by g_n(x) = x^n. Then g_n → g, where
\[
g(x) = \begin{cases} 0, & \text{if } x < 1, \\ 1, & \text{if } x = 1. \end{cases}
\]
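The dependence of n_0 on the point x is easy to see in part (2). The following Python sketch is an illustration of ours (the function name is hypothetical): for a fixed ε it computes the first index n at which x^n drops below ε, and this index grows without bound as x approaches 1.

```python
def first_index(x, eps):
    """Smallest n >= 1 with x**n < eps, assuming 0 <= x < 1 and eps > 0."""
    n, power = 1, x
    while power >= eps:
        n += 1
        power *= x   # power now equals x**n
    return n

for x in (0.5, 0.9, 0.99, 0.999):
    print(x, first_index(x, 0.1))
```

No single index works for all x in [0, 1) at once, which is the phenomenon that separates pointwise convergence from the stronger notion defined next.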
Definition 19.3. Let f, f_n : X → R^k. We say that (f_n) converges to f uniformly (on X)
if for each ε > 0, there exists n_0 ∈ N such that for every x ∈ X, and for every n ≥ n_0,
‖f_n(x) − f(x)‖ < ε. (Note that n_0 ≡ n_0(ε) depends only on ε.)
Formally, the difference between pointwise convergence and uniform convergence is only
in the order of the two quantified variables n_0 and x. The difference practically, however, is
profound, and it is important that you get a good feel for it.
Example 19.4. (1) (1/n)x → 0 uniformly on [0, 1].
(2) x^n does not converge uniformly on [0, 1] to its pointwise limit g from Example 19.2(2).
Proof. Let ε = 1/2. Let n_0 be given. We choose n = n_0. Since lim_{t→1} t^n = 1, there
is x ∈ [0, 1) such that x^n > 1/2. Then |g_n(x) − g(x)| = g_n(x) − 0 > 1/2 = ε. □
(3) (1/n)x ↛ 0 uniformly on R.
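The contrast between parts (1) and (2) can be made concrete. In the Python sketch below (our own illustration, with hypothetical names), the supremum of |x/n| over [0, 1] is computed exactly as 1/n, while for x^n we produce, for each n, an explicit point of [0, 1) at which x^n equals 0.6, which is one way to realize the x with x^n > 1/2 used in the proof.

```python
def sup_linear(n):
    """sup over [0, 1] of |x/n - 0|; it is attained at x = 1."""
    return 1.0 / n

def witness_power(n):
    """A point x in [0, 1) with x**n = 0.6 > 1/2, so that |x**n - g(x)| > 1/2."""
    return 0.6 ** (1.0 / n)

for n in (1, 10, 1000, 10**6):
    x = witness_power(n)
    print(n, sup_linear(n), x, x ** n)
```

The first column of suprema tends to zero, whereas the witness shows that the supremum of |x^n − g(x)| never falls below 1/2.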
It is useful to have an intrinsic characterization for uniform convergence, i.e. a Cauchy
condition.
Definition 19.5. Let X be a set, and let f_n : X → R^k be functions for n ∈ N. (f_n) is
uniformly Cauchy (on X) if for each ε > 0, there exists n_0 ∈ N such that for all x ∈ X, and
for all m, n ≥ n_0, ‖f_m(x) − f_n(x)‖ < ε.
Proposition 19.6. If (fn ) is uniformly Cauchy, then (fn ) is uniformly convergent.
Proof. Let ε > 0. Choose n_0 such that for all m, n ≥ n_0, and for all x ∈ X, ‖f_m(x) − f_n(x)‖ <
ε/2. This shows that for each x ∈ X, the sequence (f_n(x))_{n=1}^∞ is Cauchy in R^k. Since R^k
is complete, (f_n(x))_{n=1}^∞ converges. Define f : X → R^k by f(x) = lim_{n→∞} f_n(x). If n ≥ n_0,
then for all x ∈ X we have
\[
\|f_n(x) - f(x)\| = \lim_{m\to\infty} \|f_n(x) - f_m(x)\| \le \frac{\varepsilon}{2} < \varepsilon,
\]
where the first equality holds since y ∈ R^k ↦ ‖z − y‖ ∈ R is continuous.
Therefore f_n → f uniformly on X. □

Now we derive consequences when X is a metric space.


Theorem 19.7. Let X be a metric space, let f , fn : X → Rk , and suppose that fn → f
uniformly on X. Let a ∈ X, and suppose that fn is continuous at a for all n ∈ N. Then f
is continuous at a.
Proof. Let ε > 0. Choose n such that for all x ∈ X, ‖f_n(x) − f(x)‖ < ε/3. Since f_n is
continuous at a, there is δ > 0 such that ‖f_n(x) − f_n(a)‖ < ε/3 whenever d(x, a) < δ. Now
let x ∈ X with d(x, a) < δ. We have
\[
\|f(x) - f(a)\| \le \|f(x) - f_n(x)\| + \|f_n(x) - f_n(a)\| + \|f_n(a) - f(a)\| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon
\]
(where the first and third occurrences of ε/3 are due to the uniform approximation of f by
f_n, and the second is due to the continuity of f_n at a). Therefore f is continuous at a. □
Corollary 19.8. The uniform limit of continuous functions is continuous.
Example 19.9. (1) Consider the sequence of functions x^n on [0, 1]. We have seen that
this sequence has a pointwise limit, which is not continuous. Since x^n is continuous
for each n, the theorem implies that the convergence is not uniform (this is an easier
proof than the direct proof we gave earlier).
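(2) The above argument cannot be used in reverse. For example, let f_n : [0, 1] → R be
given by
\[
f_n(x) = \begin{cases} 2nx, & \text{if } 0 \le x \le \frac{1}{2n}, \\ -2n\left(x - \frac{1}{n}\right), & \text{if } \frac{1}{2n} \le x \le \frac{1}{n}, \\ 0, & \text{if } \frac{1}{n} \le x \le 1. \end{cases}
\]
(It will be helpful to draw a picture.) Then f_n → 0 pointwise on [0, 1], but not
uniformly (since f_n(1/(2n)) = 1 for every n), even though the limit is continuous.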
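A short computation makes the picture concrete. The Python sketch below is our own illustration; it assumes the middle piece of f_n is the line through (1/(2n), 1) and (1/n, 0), so that f_n is a continuous tent of height 1 supported on [0, 1/n]. It evaluates the peak, which stays at 1, and the value at a fixed point, which is eventually 0.

```python
def tent(n, x):
    """The tent function f_n on [0, 1]: height 1 at x = 1/(2n), zero beyond 1/n."""
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    if x <= 1 / (2 * n):
        return 2 * n * x              # rising edge
    if x <= 1 / n:
        return -2 * n * (x - 1 / n)   # falling edge
    return 0.0                        # flat part

for n in (1, 10, 100, 1000):
    # The peak value stays near 1 while the value at x = 0.05 drops to 0.
    print(n, tent(n, 1 / (2 * n)), tent(n, 0.05))
```

For each fixed x > 0 the value f_n(x) vanishes as soon as n > 1/x, yet the supremum of f_n is 1 for every n.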
Example 19.10. Recall function space from Example 4.6: if X is a set, B(X, R^k) is the
vector space of all bounded functions from X to R^k. B(X, R^k) is a normed vector space, with
norm given by ‖f‖ = sup_{x∈X} ‖f(x)‖. Thus B(X, R^k) is a metric space.
Proposition 19.11. Let f , fn : X → Rk be bounded functions.
(1) fn → f in B(X, Rk ) if and only if fn → f uniformly on X.
(2) (fn ) is Cauchy in B(X, Rk ) if and only if (fn ) is uniformly Cauchy on X.
Proof. This follows immediately from the definitions. 
Corollary 19.12. B(X, Rk ) is a complete metric space.
Proof. This follows from Proposition 19.6 and the above proposition. 
Definition 19.13. Let X be a metric space. Cb (X, Rk ) is the space of all bounded continuous
functions from X to Rk .
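Note that C_b(X, R^k) is a vector subspace of B(X, R^k), since sums and scalar multiples
of continuous functions are continuous.
Proposition 19.14. C_b(X, R^k) is a complete metric space.
Proof. By Corollary 19.8, C_b(X, R^k) is a closed subset of B(X, R^k), which is complete by
Corollary 19.12. A closed subset of a complete metric space is complete. □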
Remark 19.15. If X is a compact metric space, then C(X, Rk ) = Cb (X, Rk ).

20. Differentiation
Definition 20.1. Let I ⊆ R be open, let f : I → R, and let a ∈ I. f is differentiable at a if
\[
\lim_{x \to a} \frac{f(x) - f(a)}{x - a}
\]
exists (equivalently, if lim_{h→0} (f(a + h) − f(a))/h exists). The limit is called the derivative
of f at a, and is denoted f′(a) (or (df/dx)(a), or df/dx|_{x=a}). We say that f is differentiable on I if
it is differentiable at each point of I. We refer to the quantity (f(x) − f(a))/(x − a) as the
difference quotient.
Suppose that f is differentiable at a. Let L(x) = f(a) + f′(a)(x − a) (L is a “linear
function”, in that its graph is a straight line). The function f is well-approximated by L in
the following sense:
\[
f(a) = L(a) \tag{3}
\]
\[
\lim_{x \to a} \frac{f(x) - L(x)}{x - a} = 0. \tag{4}
\]
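Property (4) says that the error f(x) − L(x) is small even relative to x − a. As a hedged numerical illustration (our own example, taking f = exp and a = 1, with a hypothetical helper name), the Python sketch below evaluates the ratio in (4) for shrinking values of x − a.

```python
import math

def approximation_ratio(f, dfa, a, x):
    """(f(x) - L(x)) / (x - a), where L(x) = f(a) + dfa * (x - a)
    and dfa is the value of the derivative f'(a)."""
    L = f(a) + dfa * (x - a)
    return (f(x) - L) / (x - a)

a = 1.0
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    # For f = exp the derivative at a is exp(a).
    print(h, approximation_ratio(math.exp, math.exp(a), a, a + h))
```

The printed ratios shrink roughly in proportion to h, consistent with the limit in (4) being zero. For much smaller h, floating point cancellation in f(x) − L(x) would start to dominate.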
Remark 20.2. There exists at most one linear function L having these properties. Unique-
ness is an exercise, while existence is equivalent to differentiability.
There is a third equivalent formulation of differentiability. We motivate it as follows. Let
f be differentiable at a. Define u : I → R by
\[
u(x) = \begin{cases} \dfrac{f(x) - f(a) - f'(a)(x-a)}{x-a}, & \text{if } x \ne a, \\ 0, & \text{if } x = a. \end{cases}
\]
Then lim_{x→a} u(x) = lim_{x→a} (f(x) − L(x))/(x − a) = 0, so that u is continuous at a. Moreover, f(x) =
f(a) + f′(a)(x − a) + u(x)(x − a). Thus we see that if f is differentiable at a, then f differs
from L by a function that tends to zero as x tends to a, even when divided by x − a.
Theorem 20.3. f is differentiable at a if and only if there exist a linear function L(x) =
m(x − a) + b, and a function u(x), such that
(1) u(a) = 0.
(2) u is continuous at a.
(3) f (x) = L(x) + u(x)(x − a).
In this case, f′(a) = m (and of course, b = f(a)).
Proof. The ‘only if’ direction was proved in the remarks before the statement of the theorem.
For the ‘if’ direction, let L and u be as in the statement of the theorem. Letting x = a in
the third item of the statement gives f(a) = b. Then dividing by x − a, and letting x → a,
we get
\[
\lim_{x\to a} \frac{f(x)-f(a)}{x-a} = \lim_{x\to a} \frac{m(x-a) + u(x)(x-a)}{x-a} = \lim_{x\to a} \bigl(m + u(x)\bigr) = m,
\]
since u is continuous at a with value 0. □


We now present some basic properties of differentiation.
Lemma 20.4. If f is differentiable at a, then f is continuous at a.
Proof.
\[
\lim_{x\to a} f(x) = \lim_{x\to a} \bigl[ (f(x) - f(a)) + f(a) \bigr] = \lim_{x\to a} \left[ \frac{f(x)-f(a)}{x-a}\,(x-a) + f(a) \right] = f'(a)\cdot 0 + f(a) = f(a).
\]
□
Lemma 20.5.
\[
\frac{d}{dx}(kx + \ell) = k.
\]
Proof. (exercise) 
Lemma 20.6. If f and g are both differentiable at a, then so are f + g, fg, and f/g (if
g(a) ≠ 0), and
\[
\begin{aligned}
(f+g)'(a) &= f'(a) + g'(a) \\
(fg)'(a) &= f'(a)g(a) + f(a)g'(a) \\
(f/g)'(a) &= \frac{f'(a)g(a) - f(a)g'(a)}{g(a)^2}.
\end{aligned}
\]
Proof. (exercises) 
Theorem 20.7. (The chain rule.) Let I, J ⊆ R be open, let f : I → R and g : J → R, let
a ∈ I, suppose that f(a) ∈ J, and suppose that f is differentiable at a and g is differentiable
at f(a). Then g ∘ f is differentiable at a, and (g ∘ f)′(a) = g′(f(a)) f′(a).
Proof. We apply Theorem 20.3 to f and g to obtain functions u : I → R and v : J → R
such that
(1) u and v vanish at a and f(a), respectively.
(2) u and v are continuous at a and f(a), respectively.
(3)
\[
\begin{aligned}
f(x) &= f(a) + f'(a)(x-a) + u(x)(x-a) \\
g(y) &= g(f(a)) + g'(f(a))\bigl(y - f(a)\bigr) + v(y)\bigl(y - f(a)\bigr).
\end{aligned}
\]
Then we have (where we let f(x) play the role of y):
\[
\begin{aligned}
g(f(x)) &= g(f(a)) + g'(f(a))\bigl(f(x) - f(a)\bigr) + v(f(x))\bigl(f(x) - f(a)\bigr) \\
&= g(f(a)) + g'(f(a))\bigl[f'(a)(x-a) + u(x)(x-a)\bigr] + v(f(x))\bigl[f'(a)(x-a) + u(x)(x-a)\bigr] \\
&= g(f(a)) + g'(f(a))f'(a)(x-a) + \bigl[\, g'(f(a))u(x) + v(f(x))f'(a) + v(f(x))u(x) \,\bigr](x-a).
\end{aligned}
\]
Then by Theorem 20.3 it suffices to show that the expression in square brackets vanishes
and is continuous at x = a. We check this for each of the three terms separately. It is true for
the first term because it is true for u. It is true for the second term by Theorem 8.11, because
f is continuous at a (by Lemma 20.4) and v is continuous, and vanishes, at f(a). It is true
for the third term by both of the above. □
We now draw out some consequences of differentiability on intervals. First we give a
general definition.
Definition 20.8. Let X be a metric space, let U ⊆ X be open, let a ∈ U and let f : U → R.
f has a local maximum (respectively local minimum) at a if there is r > 0 such that for all
x ∈ Br (a) we have f (x) ≤ f (a) (respectively, f (x) ≥ f (a)). Local maxima and minima are
called local extrema.
Lemma 20.9. Let I ⊆ R be an open interval, let a ∈ I, and let f : I → R. Suppose that f
is differentiable at a. If f has a local extremum at a, then f′(a) = 0.
Proof. We prove the contrapositive. Suppose that f′(a) ≠ 0. For definiteness we assume
f′(a) > 0 (the proof in the case f′(a) < 0 is analogous). We then have that
lim_{x→a} (f(x) − f(a))/(x − a) > 0. Then there is δ > 0 such that (a − δ, a + δ) ⊆ I, and such that for x ∈ I,
if 0 < |x − a| < δ then (f(x) − f(a))/(x − a) > 0. Now, for any x with a − δ < x < a, we
have x − a < 0. Since the difference quotient is positive, we must have f(x) − f(a) < 0; thus
f does not have a local minimum at a. Similarly, for any x with a < x < a + δ, we have
x − a > 0. Again, since the difference quotient is positive, we must have f(x) − f(a) > 0;
thus f does not have a local maximum at a. Therefore, f does not have a local extremum
at a. 
This lemma has several famous applications.
Theorem 20.10. (Rolle’s theorem) Let f : [a, b] → R be continuous, and assume that f is
differentiable on (a, b). Suppose further that f(a) = f(b) = 0. Then there exists c ∈ (a, b)
such that f′(c) = 0.
Rolle’s theorem is a special case of the following theorem.
Theorem 20.11. (Mean value theorem) Let f : [a, b] → R be continuous, and assume that f
is differentiable on (a, b). Then there exists c ∈ (a, b) such that f′(c) = (f(b) − f(a))/(b − a).
The idea of the theorem, and the proof, is easy to see from a simple sketch: on the graph
of f, draw the straight line between the endpoints of the graph (a, f(a)) and (b, f(b)). Let
L(x) be the linear function whose graph passes through these two points. The point c in the
theorem is (one of) the place(s) where the vertical distance between the graphs of f and L
is stationary, i.e. has a local extremum. A little algebraic manipulation of the expression
f(x) − L(x) yields the beginning of the following proof.
Proof. Let h(x) = (f(x) − f(a))(b − a) − (f(b) − f(a))(x − a). Then h is continuous on [a, b]
and differentiable on (a, b). Also h(a) = h(b) = 0. By the extreme value theorem (Corollary
15.4), h takes on its maximum and minimum values on [a, b]. We note that at least one
of these occurs in the interior (a, b). For if both occur at the endpoints, then h must be
identically zero, and hence achieves its maximum and minimum at every point of [a, b].
Let c ∈ (a, b) be such a point. By Lemma 20.9 we have h′(c) = 0. Differentiating h gives
h′(x) = f′(x)(b − a) − (f(b) − f(a)). Then the equation h′(c) = 0 gives the desired result. □
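The theorem only asserts that a point c exists. In concrete cases one can often locate it numerically. The Python sketch below is our own illustration, not the method of the proof: it assumes in addition that f′ is continuous and that f′ minus the secant slope changes sign between the endpoints, so that bisection applies. The function name and the test function f(x) = x^3 on [0, 2] are hypothetical choices.

```python
def mean_value_point(f, df, a, b, iterations=100):
    """Find c in [a, b] with df(c) = (f(b) - f(a)) / (b - a) by bisection.
    Assumes df is continuous and df - slope changes sign on [a, b]."""
    slope = (f(b) - f(a)) / (b - a)
    phi = lambda c: df(c) - slope
    lo, hi = a, b
    if phi(lo) * phi(hi) > 0:
        raise ValueError("bisection needs a sign change of f' - slope")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if phi(lo) * phi(mid) <= 0:
            hi = mid      # a zero of phi lies in [lo, mid]
        else:
            lo = mid      # a zero of phi lies in [mid, hi]
    return (lo + hi) / 2

# f(x) = x^3 on [0, 2]: the secant slope is 4, and 3c^2 = 4 at c = 2/sqrt(3).
c = mean_value_point(lambda x: x ** 3, lambda x: 3 * x * x, 0.0, 2.0)
print(c)
```

When f′ is not continuous the sign change hypothesis can still be exploited in theory, but this simple search is only a sketch for the well-behaved case.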


Remark 20.12. There is an alternate phrasing of the mean value theorem that is often
convenient. Let f : I → R be differentiable, where I is an open interval. Let a ∈ I and
h ∈ R \ {0} be such that a + h ∈ I. If we wish to apply the mean value theorem to the closed
interval having a and a + h as endpoints, we would like to express the conclusion without
declaring which is the left, and which the right, endpoint. We avoid this inconvenience in the
following way: the point c lies (strictly) between a and a + h if and only if there is a number
0 < θ < 1 such that c = a + θh. Thus we reexpress the mean value theorem in the following
way: if a, a + h ∈ I then there exists 0 < θ < 1 such that f(a + h) = f(a) + h f′(a + θh).
Now we give some corollaries of the mean value theorem.
Corollary 20.13. Let I ⊆ R be an open interval, and let f : I → R be differentiable. If
f′ = 0 on I, then f is constant on I.
Proof. Let x_0 ∈ I, and apply the mean value theorem to the interval between x_0 and x,
for any x ∈ I. We find that there is c strictly between x_0 and x such that f(x) − f(x_0) =
f′(c)(x − x_0) = 0. Thus f(x) = f(x_0) for all x ∈ I. □
Corollary 20.14. Let I be as in the previous corollary, and let f, g : I → R be differentiable.
If f′ = g′ on I, then f − g is a constant function.
Proof. Apply the previous corollary to f − g. □
Definition 20.15. Let I be an interval, and f : I → R. We say that f is increasing
(respectively, decreasing) on I if for all x, y ∈ I, if x < y then f(x) ≤ f(y) (respectively,
f(x) ≥ f(y)). We say that f is strictly increasing (respectively, strictly decreasing) if the
inequalities above involving f are strict rather than weak.
Corollary 20.16. Let I be as in the previous corollaries, and let f : I → R be differentiable.
If f′ ≥ 0 (respectively f′ ≤ 0) on I, then f is increasing (respectively, decreasing) on I.
If f′ > 0 (respectively f′ < 0) on I, then f is strictly increasing (respectively, strictly
decreasing) on I.
Proof. We will give the proof in the case that f′ > 0 on I; the other parts have similar proofs.
Let x < y in I. By the mean value theorem there is x < c < y such that f(y) − f(x) =
f′(c)(y − x). Since f′(c) > 0 and y − x > 0, we get f(y) − f(x) > 0, that is, f(y) > f(x). □
Definition 20.17. Let X and Y be metric spaces. A function f : X → Y is called Lipschitz
(on X) if there is a positive constant M such that for all x_1, x_2 ∈ X we have
d(f(x_1), f(x_2)) ≤ M d(x_1, x_2).
Corollary 20.18. Let I and f be as in the previous corollary. Suppose that f′ is bounded
on I. Then f is Lipschitz on I.
Proof. Let |f′| ≤ M on I. Then for any x, y ∈ I, the mean value theorem provides c between
x and y such that f(x) − f(y) = f′(c)(x − y). It follows that |f(x) − f(y)| ≤ M|x − y|. □
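As a quick numerical illustration of Corollary 20.18, the sketch below checks the Lipschitz inequality for sin, whose derivative cos satisfies |cos| ≤ 1. The helper name lipschitz_violations, the sample grid, and the small tolerance for rounding are choices made here for illustration; a finite check of this kind is evidence, not a proof.

```python
import math

def lipschitz_violations(f, M, points, tol=1e-12):
    """Return the pairs (x, y) from `points` with |f(x) - f(y)| > M |x - y| + tol."""
    return [
        (x, y)
        for x in points
        for y in points
        if abs(f(x) - f(y)) > M * abs(x - y) + tol
    ]

# Sample points in [-3, 3]; the grid itself is an arbitrary choice.
grid = [-3.0 + 0.1 * i for i in range(61)]

# |cos| <= 1, so M = 1 should produce no violations on the grid.
print(lipschitz_violations(math.sin, 1.0, grid))

# M = 0.5 is too small a constant, and some pairs near 0 violate the inequality.
print(len(lipschitz_violations(math.sin, 0.5, grid)) > 0)
```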

We next wish to prove the inverse function theorem.


Theorem 20.19. Let I be an open interval, let f : I → R be differentiable, and suppose
that f′ ≠ 0 on I. Then f(I) is an open interval, and f : I → f(I) is a homeomorphism.
Moreover, f^{-1} is differentiable, and
\[ (f^{-1})'(y) = \frac{1}{f'\bigl(f^{-1}(y)\bigr)}. \]
The generalization of this theorem to higher dimensions is a very important result, and
somewhat surprisingly, is much harder to prove. (We will tackle that next semester.)
In dimension one, the job is easier because f is strictly monotone as soon as we know that
f′ > 0 (or f′ < 0) throughout I, and the assumption that f′ is nonzero turns out to guarantee
exactly this. If we assume that f is
continuously differentiable, then this is immediate: the intermediate value theorem would
apply to the continuous function f′, and we would know that f′ can’t take on both positive
and negative values if it is never zero. In the higher dimensional situation we will assume
that f is continuously differentiable. However, it is remarkable that in dimension one, the
result is true even if f′ is not continuous. This is because of the following simple observation:
f′ satisfies the intermediate value property even if it is not continuous.
Theorem 20.20. Let I be an interval and let f : I → R be differentiable. Let a, b ∈ I, and
assume that f′(a) < f′(b). If f′(a) < M < f′(b), then there exists c between a and b such
that f′(c) = M.
Proof. We will prove this in the special case where a < b, f′(a) < 0 < f′(b), and M = 0. The
general case follows easily from this, and we leave those details as an exercise. Since
f′(a) = lim_{h→0} h^{-1}(f(a + h) − f(a)), there is h > 0 such that a + h < b and
h^{-1}(f(a + h) − f(a)) < 0.
It follows that f(a + h) < f(a). Therefore the minimum of f on [a, b] does not occur at
a. A similar argument shows that the minimum does not occur at b. Hence f has a local
minimum in the open interval (a, b), and at this point f′ = 0. □
Proof. (of Theorem 20.19) By Theorem 20.20 we know that f′ > 0 on I, or that f′ < 0
on I. By Corollary 20.16 it follows that f is strictly monotone on I. It follows from the
intermediate value theorem that f(I) is an open interval, and that f^{-1} is continuous. We now
show that f^{-1} is differentiable, and compute its derivative. For x ∈ I let y = f(x) ∈ f(I).
For w ∈ f(I) with w ≠ y, there is t ∈ I such that w = f(t). Since f is one-to-one, t ≠ x.
We have
\[ \frac{f^{-1}(w) - f^{-1}(y)}{w - y} = \frac{t - x}{f(t) - f(x)} = \left( \frac{f(t) - f(x)}{t - x} \right)^{-1}. \]
Since f^{-1} is continuous, lim_{w→y} t = x. Moreover t ≠ x during this limiting process. Therefore
\[ \lim_{w \to y} \frac{f^{-1}(w) - f^{-1}(y)}{w - y} = \lim_{t \to x} \left( \frac{f(t) - f(x)}{t - x} \right)^{-1} = \frac{1}{f'(x)} = \frac{1}{f'\bigl(f^{-1}(y)\bigr)}. \qquad \square \]
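The derivative formula for the inverse can be checked numerically. The sketch below uses the hypothetical example f(x) = x^3 + x, which has f′(x) = 3x^2 + 1 > 0 on R; the inverse is computed by bisection, which is an assumption of this illustration and not part of the proof above.

```python
def f(x):
    # Strictly increasing on R because f'(x) = 3x^2 + 1 > 0.
    return x**3 + x

def f_prime(x):
    return 3 * x**2 + 1

def f_inverse(y, lo=-10.0, hi=10.0, iterations=200):
    """Invert f on [lo, hi] by bisection; this works because f is strictly increasing."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def inverse_derivative_formula(y):
    # The formula of Theorem 20.19: 1 / f'(f^{-1}(y)).
    return 1.0 / f_prime(f_inverse(y))

def inverse_difference_quotient(y, w):
    # (f^{-1}(w) - f^{-1}(y)) / (w - y), for w != y.
    return (f_inverse(w) - f_inverse(y)) / (w - y)

y = 2.0  # f(1) = 2, so f^{-1}(2) = 1 and the formula gives 1 / f'(1) = 1/4
for step in (1e-1, 1e-2, 1e-3):
    print(step, inverse_difference_quotient(y, y + step), inverse_derivative_formula(y))
```

The difference quotients should approach the value given by the formula as the step shrinks.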
Corollary 20.21. If f is C^r (in addition to the hypotheses of the inverse function theorem),
then so is f^{-1}.
Proof. The formula for (f^{-1})′ shows that it is continuous if f′ is continuous. Similarly, it is
differentiable if f′ is differentiable, etc. □
If you consider the function h used in the proof of the mean value theorem, you will notice
the beginnings of some symmetry: the function f and the identity function play opposite
roles. Remarkably, the identity function can be replaced by another function like f . The
result is
Theorem 20.22. (Cauchy mean value theorem.) Let f, g : [a, b] → R be continuous,
and differentiable on (a, b). Then there exists c ∈ (a, b) such that
\[ \bigl(f(b) - f(a)\bigr) g'(c) = \bigl(g(b) - g(a)\bigr) f'(c). \]
Proof. Let h(t) = (f(b) − f(a))(g(t) − g(a)) − (f(t) − f(a))(g(b) − g(a)). Then h is continuous
on [a, b], differentiable on (a, b), and h(a) = h(b) = 0. Now the mean value theorem gives
c ∈ (a, b) with h′(c) = 0, that is, (f(b) − f(a))g′(c) − f′(c)(g(b) − g(a)) = 0, which is
the result. □
We apply Cauchy’s mean value theorem to prove L’Hôpital’s rule on the computation of
indeterminate limits. The proof applies to any form of continuous limit — here we phrase
it for one-sided limits.
Theorem 20.23. (L’Hôpital’s rule.) Let f, g : (a, b) → R be differentiable. Suppose that
lim_{t→a+} f(t) = lim_{t→a+} g(t) = 0, and that g(t) ≠ 0 on (a, b). If lim_{t→a+} f′(t)/g′(t) = L, then
lim_{t→a+} f(t)/g(t) = L.
Proof. Define f(a) = g(a) = 0. Then f and g are continuous on [a, b). By the hypothesis on
the limit of f′/g′, we are implicitly assuming that g′(t) ≠ 0, at least for all t close enough to
a. Replacing b by a smaller value, we may assume that g′ ≠ 0 on (a, b). Now, for t ∈ (a, b),
we apply Cauchy’s mean value theorem to f and g on the interval [a, t]. Thus there exists
c ∈ (a, t) with (f(t) − f(a)) g′(c) = (g(t) − g(a)) f′(c). Since f(a) = g(a) = 0, we get
f(t)g′(c) = g(t)f′(c). By hypothesis we have g(t) ≠ 0. Thus we have
\[ \frac{f(t)}{g(t)} = \frac{f'(c)}{g'(c)}. \]
Moreover, a < c < t. Of course, c depends on t, but we see that as t → a+ then also c → a+.
Hence
\[ \lim_{t \to a^+} \frac{f(t)}{g(t)} = \lim_{c \to a^+} \frac{f'(c)}{g'(c)} = L. \]

With a bit more work, the same result can be proved in the case where we assume
lim_{t→a+} f(t) = lim_{t→a+} g(t) = ∞. This is an interesting exercise, or you may look up
the proof (e.g. in Rudin). If lim_{t→a+} f(t) = 0 while lim_{t→a+} g(t) = ±∞, evaluating the limit
lim_{t→a+} f(t)g(t) presents us with the third kind of indeterminate form, namely 0 · ∞. In this
case, we would instead consider the limit of f/(1/g), which is indeterminate of form 0/0.
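To see Theorem 20.23 at work on a concrete 0/0 form, the following sketch compares f(t)/g(t) with f′(t)/g′(t) near a = 0 for the hypothetical choice f(t) = 1 − cos t and g(t) = t^2. Here g(t) ≠ 0 for t > 0, and f′(t)/g′(t) = sin t/(2t) tends to 1/2. The sample values of t are arbitrary.

```python
import math

def quotient(t):
    # f(t)/g(t) with f(t) = 1 - cos t and g(t) = t^2; both tend to 0 as t -> 0+.
    return (1.0 - math.cos(t)) / t**2

def derivative_quotient(t):
    # f'(t)/g'(t) = sin t / (2t), which tends to 1/2 as t -> 0+.
    return math.sin(t) / (2.0 * t)

for t in (1e-1, 1e-2, 1e-3):
    print(t, quotient(t), derivative_quotient(t))
```

Both columns should approach 1/2. Much smaller values of t are avoided on purpose, because the subtraction 1 − cos t loses accuracy in floating point.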

21. Higher order derivatives and Taylor’s theorem


If f is differentiable on an open interval I, then f′ is itself a function on I. f′ need not be
continuous; if it is continuous, we say that f is continuously differentiable on I. Even if f is
continuously differentiable, f′ need not be differentiable. If f is differentiable on I, and if f′
is differentiable at a point a ∈ I, we write f″(a) for the derivative of f′ at a. (Note that in
order to be able to consider whether f″(a) exists, it is necessary that f be differentiable in
a neighborhood of a; here I is such a neighborhood.) If f′ is differentiable on I, we say that
f is twice differentiable on I. (Of course, if f is twice differentiable, then f is necessarily
continuously differentiable.) In general, if f′, f″, . . ., f^{(k)} exist on I, we say that f is k-times
differentiable (on I). In this case f is necessarily (k − 1)-times continuously differentiable.
Definition 21.1. Let I be an open interval, let a ∈ I, and let f : I → R. Suppose that f
is k-times differentiable at a. The kth Taylor polynomial of f at a is
\[ P_k(a, t) = \sum_{j=0}^{k} \frac{f^{(j)}(a)}{j!} (t - a)^j = f(a) + f'(a)(t - a) + \frac{1}{2!} f''(a)(t - a)^2 + \cdots + \frac{1}{k!} f^{(k)}(a)(t - a)^k. \]

Lemma 21.2. Let f : I → R be k-times differentiable at a ∈ I. Then P_k(a, t) has the
property that
\[ \frac{d^j}{dt^j} P_k(a, t) \Big|_{t=a} = f^{(j)}(a), \qquad j = 0, \ldots, k. \]
Moreover, no other polynomial of degree k (or less) has this property. (Thus the Taylor
polynomial of degree k is the best approximation to f at a among all polynomials of degree
less than or equal to k.)
Proof. It is a simple calculation to check that P_k(a, t) has the indicated property. If q(t) =
c_0 + c_1(t − a) + · · · + c_k(t − a)^k is a polynomial, then differentiating j times gives q^{(j)}(a) = j! c_j.
Thus if q^{(j)}(a) = f^{(j)}(a) for 0 ≤ j ≤ k, then we must have j! c_j = f^{(j)}(a), as required. □
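The relation q^{(j)}(a) = j! c_j from this proof can be checked mechanically. The sketch below stores a polynomial in powers of s = t − a as a coefficient list, builds the Taylor coefficients from prescribed derivative values, and recovers those values by repeated differentiation. The function names and the sample data (the derivatives of 1/(1 − t) at 0, which are j!) are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

def taylor_coefficients(derivatives_at_a):
    """Given [f(a), f'(a), ..., f^(k)(a)], return the coefficients c_j = f^(j)(a) / j!."""
    return [Fraction(d) / factorial(j) for j, d in enumerate(derivatives_at_a)]

def differentiate(coeffs):
    """Differentiate c_0 + c_1 s + ... + c_k s^k with respect to s, where s = t - a."""
    return [j * c for j, c in enumerate(coeffs)][1:]

def derivatives_at_center(coeffs):
    """Return [q(a), q'(a), ..., q^(k)(a)]: the constant term after each differentiation."""
    values = []
    current = list(coeffs)
    while current:
        values.append(current[0])
        current = differentiate(current)
    return values

# Hypothetical data: f(t) = 1/(1 - t) at a = 0 has f^(j)(0) = j!.
data = [factorial(j) for j in range(5)]
coeffs = taylor_coefficients(data)
print(coeffs)                         # every coefficient equals 1
print(derivatives_at_center(coeffs))  # recovers the derivative data
```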

We see by this lemma that it is easy to find a polynomial that approximates f well at the
point a. It is not as easy to see how well this polynomial approximates f near the point a.
For this, we have Taylor’s Theorem. One can think of it as the generalization of the mean
value theorem from order 0 to order k. The proof is a bit tricky; we will use Cauchy’s mean
value theorem.
Theorem 21.3. (Taylor’s theorem.) Let I ⊆ R be an open interval, let a ∈ I, let f : I → R. Suppose
that f is (k + 1)-times differentiable on I. For t ∈ I there exists c between a and t such that
\[ f(t) = P_k(a, t) + \frac{f^{(k+1)}(c)}{(k + 1)!} (t - a)^{k+1}. \]
Proof. Let R(t) = f(t) − P_k(a, t). (R(t) is sometimes called the kth remainder.) It follows
from Lemma 21.2 that R^{(j)}(a) = 0 for 0 ≤ j ≤ k, and since P_k(a, t) is a polynomial of degree
at most k in t, we easily see that R^{(k+1)} = f^{(k+1)} on I.
Let h(x) = (x − a)^{k+1}. Then h^{(j)}(a) = 0 for 0 ≤ j ≤ k also, and h^{(k+1)} = (k + 1)! on I.
Now let x ∈ I, x ≠ a. We apply the Cauchy mean value theorem to R and h on the
interval between a and x: thus there is c_1 between a and x such that
\[ \bigl(R(x) - R(a)\bigr) h'(c_1) = \bigl(h(x) - h(a)\bigr) R'(c_1), \]
or equivalently, since R(a) = h(a) = 0 and h, h′ ≠ 0 away from a,
\[ \frac{R(x)}{h(x)} = \frac{R'(c_1)}{h'(c_1)}. \]
Now we apply the Cauchy mean value theorem again, to R′ and h′ on the interval between a and c_1:
there is c_2 between a and c_1 such that
\[ \frac{R'(c_1)}{h'(c_1)} = \frac{R''(c_2)}{h''(c_2)} \]
(again using the facts that R′(a) = h′(a) = 0 and h′, h″ ≠ 0 away from a). We repeat this
process k + 1 times, and we obtain
\[ \frac{R(x)}{h(x)} = \frac{R'(c_1)}{h'(c_1)} = \cdots = \frac{R^{(k+1)}(c_{k+1})}{h^{(k+1)}(c_{k+1})} = \frac{f^{(k+1)}(c_{k+1})}{(k + 1)!}. \]
Unwinding this gives f(x) − P_k(a, x) = \frac{1}{(k+1)!} f^{(k+1)}(c)(x − a)^{k+1}, where c = c_{k+1} lies between
a and x. □
Corollary 21.4. Let f : I → R be twice differentiable, and assume that f″ ≥ 0 on I. Then
at each point of I, the graph of f lies above its tangent line.
Proof. Let a, a + h ∈ I. By Taylor’s theorem there is 0 < θ < 1 such that
\[ f(a + h) = f(a) + f'(a)h + \tfrac{1}{2} f''(a + \theta h)h^2 \ge f(a) + f'(a)h. \]
The last expression is the x_2-coordinate of the point on the tangent line at x_1 = a + h. □
Example 21.5. (Polynomial approximation.)
(1) Let f(x) = e^x. Then f^{(j)}(x) = e^x for all j, so f^{(j)}(0) = 1 for all j. Thus
P_k(x) = Σ_{j=0}^{k} x^j / j! is the k-th order Taylor polynomial of f at 0. By Taylor’s theorem, there
is c between 0 and x such that
\[ e^x - P_k(x) = \frac{e^c}{(k + 1)!} x^{k+1}. \]
Now fix M > 0. For |x| ≤ M, we have
\[ \bigl| e^x - P_k(x) \bigr| \le e^M \frac{M^{k+1}}{(k + 1)!} \to 0 \quad \text{as } k \to \infty. \]
Thus the Taylor polynomials converge uniformly to e^x on any bounded interval.
(2) Define g : R → R by
\[ g(x) = \begin{cases} e^{-1/x}, & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases} \]
It is a nice exercise to show that g has derivatives of all orders at 0 (this is clear at
other points of R), and that g^{(j)}(0) = 0 for all j. Thus all Taylor polynomials of g
at 0 are identically zero. Therefore the Taylor polynomials of g do not approximate
g uniformly in any neighborhood of zero.
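Both parts of Example 21.5 can be explored numerically. The sketch below compares a sampled error |e^x − P_k(x)| on [−M, M] with the bound e^M M^{k+1}/(k + 1)! from part (1), and evaluates the function g of part (2) at a few positive points, where the error of every (zero) Taylor polynomial is g(x) itself. The grid size and the choice M = 2 are assumptions of this illustration, and a sampled maximum only estimates the true supremum from below.

```python
from math import exp, factorial

def exp_taylor(x, k):
    """P_k(x) = sum_{j=0}^{k} x^j / j!, the k-th Taylor polynomial of e^x at 0."""
    return sum(x**j / factorial(j) for j in range(k + 1))

def remainder_bound(M, k):
    """The bound e^M * M^(k+1) / (k+1)!, valid for |x| <= M."""
    return exp(M) * M**(k + 1) / factorial(k + 1)

def worst_error(M, k, samples=201):
    """Largest sampled value of |e^x - P_k(x)| over a uniform grid in [-M, M]."""
    xs = [-M + 2 * M * i / (samples - 1) for i in range(samples)]
    return max(abs(exp(x) - exp_taylor(x, k)) for x in xs)

def g(x):
    """The function of part (2): e^(-1/x) for x > 0 and 0 otherwise."""
    return exp(-1.0 / x) if x > 0 else 0.0

for k in (5, 10, 15):
    print(k, worst_error(2.0, k), remainder_bound(2.0, k))

# Every Taylor polynomial of g at 0 is zero, so the error at x is g(x), whatever k is.
print([g(x) for x in (0.5, 0.1, 0.05)])
```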

22. The Riemann integral


In this section we will discuss Riemann integration. We gratefully cobbled together this
treatment from the ideas of the analogous chapter of Pugh’s book. Pugh’s approach digs a
little bit deeper than the usual ones, but it really is worth the extra effort.
What is integration all about? Of course we rely on your previous experience from calculus:
the most basic answer is that we want to find the area of a region bounded by curved lines.
(A region bounded by straight lines can be dealt with entirely by elementary geometry.) Our
definition(s) are based on this idea. The next level of abstraction comes from the fundamental
theorem of calculus: integration is the inverse operation to differentiation. That statement is
a bit glib. After all, the derivative of a function is another function, whereas the integral of
a function is a number. But the statement actually is correct, when it is fleshed out properly
— that is the role of the fundamental theorem. What we take from this, (or rather, what
we imagine that we are explaining to first year calculus students), is that integration is a
function of functions, i.e. a functional:
\[ \int : \{\text{functions}\} \to \mathbb{R}. \]
Among the first properties of integration that are presented in calculus are the “sum” and
“scalar multiple” rules:
\[ \int (f + g) = \int f + \int g; \qquad \int (cf) = c \int f. \]

In fact, these are indicating precisely that integration is a linear functional. Linear algebra
is an essential part of modern analysis, and the analysis of linear functionals, functional
analysis, is one of its broadest subdisciplines.
Well, the notion of linear map presupposes the idea of vector spaces: the domain and
codomain of a linear map should be vector spaces. This is a fundamental idea, that is
almost completely lost in a calculus course: the collection of functions that can be integrated
should be a vector space. To be candid, we don’t really talk at all about the “space of
integrable functions” in a calculus course. At best, we try to explain why certain functions
are integrable, e.g. continuous, or piecewise continuous, functions. This time, we will directly
address this question. Not only will we carefully define what integrable means, and prove that
the set of integrable functions is a vector space; we will also give an independent characterization
(due to Lebesgue) of exactly which functions are integrable. This is useful even just in the
context of Riemann integration. Many important results that would otherwise require fussy
proofs will become effortless (so to speak). But it also prods us to a larger view. Once we
are able to see the space of Riemann integrable functions as a whole, we can also begin to
see its limitations, and where it might give way to generalization. In the next semester we
will spend some time (how much???) exploring Lebesgue’s version of integration.
That is the end of the “introduction”. We have to get started, and the beginning is
very basic — after all, integration is just a lot of arithmetic. We will follow Pugh’s idea
of emphasizing the fact that there are two usual ways to present the integral; he refers to
them as the Riemann and the Darboux approaches. Without any expertise in the history of
mathematics, or any effort at tracking down that history, we will just adopt this terminology.
First we give the Riemann approach. We let f be a real-valued function on a compact interval
[a, b].
Definition 22.1. A partition of [a, b] is a finite set P ⊆ [a, b] such that a, b ∈ P .
The idea of a partition is that it defines a subdivision of [a, b] into a finite number of
subintervals. The easiest way of indicating this is by giving the set of endpoints of the
subintervals, which is what our definition does. We usually write a partition in the form
P = {x_0, x_1, . . . , x_n},
where a = x_0 < x_1 < · · · < x_n = b. This is a slight abuse of notation, since the definition
of P as a set does not indicate that the numbers in the set are given in (strictly) increasing
order. From the partition P we obtain n subintervals of [a, b]: [x_0, x_1], . . ., [x_{n−1}, x_n]. Note
that the number n associated with P is obtained from the relation n + 1 = #(P). We use
the term mesh for the length of the largest subinterval: mesh(P) = max_{1≤i≤n} (x_i − x_{i−1}).
The mesh is a rough sort of description of how fine the partition is.
Definition 22.2. A partition pair is a partition P together with a list T = (t_1, . . . , t_n) such
that x_{i−1} ≤ t_i ≤ x_i for 1 ≤ i ≤ n.
Thus the list T consists of a selection of one element from each subinterval of the partition.
Definition 22.3. Let f : [a, b] → R, and let (P, T) be a partition pair for the interval [a, b].
The Riemann sum associated to this data is the number
\[ R(f, P, T) = \sum_{i=1}^{n} f(t_i) \Delta x_i, \]
where Δx_i = x_i − x_{i−1}, the length of the ith subinterval.
Now we have the terminology we need to define Riemann integrability and the Riemann
integral. As mentioned above, Riemann sums are just a lot of (carefully organized) arith-
metic. To pass to the integral is a limiting process. The following definition is the usual
notion of limit, but is based on the mesh.
Definition 22.4. The function f : [a, b] → R is Riemann integrable if there is a number L
such that for every ε > 0, there exists δ > 0 such that for every partition pair (P, T) of [a, b],
if mesh(P) < δ then |R(f, P, T) − L| < ε.
We write L = lim_{mesh(P)→0} R(f, P, T) to indicate this limit. The number L is unique, if
it exists. This is proved in the usual way for limits, and is left to you as an exercise. If f is
Riemann integrable, we write ∫_a^b f (or ∫_a^b f dx, or ∫_a^b f(x) dx, or just ∫ f) for the number L.
We will write R[a, b] for the set of all Riemann integrable functions on [a, b].
There is an important detail hidden in the last definition. For the limit to exist it must
be the case that the approximation holds independently of the choice of the list T in the
partition pair. In other words, if P is a partition with mesh(P ) < δ, then the Riemann sum
is within ε of L for any choice of T .
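The definitions of mesh, partition pair and Riemann sum translate directly into code. The sketch below is one possible rendering, with a partition stored as a sorted list and the list T stored alongside it; the function names and the test function x^2 on [0, 1] (whose integral is 1/3 by calculus) are choices made here. It shows three different choices of T for the same fine partition giving nearby Riemann sums.

```python
def mesh(P):
    """Length of the longest subinterval of the partition P (a sorted list)."""
    return max(x1 - x0 for x0, x1 in zip(P, P[1:]))

def riemann_sum(f, P, T):
    """R(f, P, T) = sum of f(t_i) * (x_i - x_{i-1}) for a partition pair (P, T)."""
    if len(T) != len(P) - 1:
        raise ValueError("T needs exactly one point per subinterval")
    total = 0.0
    for x0, x1, t in zip(P, P[1:], T):
        if not x0 <= t <= x1:
            raise ValueError("each t_i must lie in its subinterval")
        total += f(t) * (x1 - x0)
    return total

def square(x):
    return x * x

n = 1000
P = [i / n for i in range(n + 1)]        # uniform partition of [0, 1]
left = riemann_sum(square, P, P[:-1])    # t_i = left endpoints
right = riemann_sum(square, P, P[1:])    # t_i = right endpoints
mid = riemann_sum(square, P, [(x0 + x1) / 2 for x0, x1 in zip(P, P[1:])])
print(mesh(P), left, mid, right)
```

For this increasing function the left and right sums differ by exactly (f(1) − f(0))/n = 1/n, up to rounding, so every Riemann sum for this partition lies in an interval of that length.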
We now give some consequences of the definition.
Theorem 22.5. If f is Riemann integrable then f is bounded.
Proof. We apply the definition of integrability with ε = 1: there exist L and δ > 0 such that
if P is any partition with mesh(P) < δ, then |R(f, P, T) − L| < 1. (As we mentioned above,
this estimate holds for any choice of T.) Fix one such partition P = {x_0, x_1, . . . , x_n}.
It follows from the triangle inequality that
\[ \Bigl| \sum_{i=1}^{n} f(t_i) \Delta x_i \Bigr| < 1 + |L|. \]
We will show that f is bounded on each subinterval of [a, b] defined by P. It will then follow
that f is bounded on [a, b]. Fix i_0 ∈ {1, 2, . . . , n}. For i ≠ i_0 choose t_i ∈ [x_{i−1}, x_i]. For any
t ∈ [x_{i_0−1}, x_{i_0}] we apply the above inequality to the list T = (t_1, . . . , t_{i_0−1}, t, t_{i_0+1}, . . . , t_n):
\[ |f(t)| \Delta x_{i_0} - \Bigl| \sum_{i \ne i_0} f(t_i) \Delta x_i \Bigr| \le |R(f, P, T)| < 1 + |L|. \]
We find that
\[ |f(t)| \le (\Delta x_{i_0})^{-1} \Bigl( 1 + |L| + \Bigl| \sum_{i \ne i_0} f(t_i) \Delta x_i \Bigr| \Bigr). \]
Thus the right hand side is an upper bound for |f| on [x_{i_0−1}, x_{i_0}]. □
Theorem 22.6. R[a, b] is a vector space, and integration defines a linear functional on it.
Proof. We note that for a fixed partition pair (P, T), the Riemann sum is linear in f:
\begin{align*}
R(cf + g, P, T) &= \sum_i (cf + g)(t_i) \Delta x_i \\
&= \sum_i \bigl( c f(t_i) + g(t_i) \bigr) \Delta x_i \\
&= c \sum_i f(t_i) \Delta x_i + \sum_i g(t_i) \Delta x_i \\
&= c R(f, P, T) + R(g, P, T).
\end{align*}
Since addition and multiplication in R are continuous, we get
\begin{align*}
\lim_{\mathrm{mesh}(P) \to 0} R(cf + g, P, T) &= \lim_{\mathrm{mesh}(P) \to 0} \bigl( c R(f, P, T) + R(g, P, T) \bigr) \\
&= c \lim_{\mathrm{mesh}(P) \to 0} R(f, P, T) + \lim_{\mathrm{mesh}(P) \to 0} R(g, P, T).
\end{align*}
Therefore cf + g is Riemann integrable, and hence R[a, b] is a vector space. Moreover the
above calculation shows that ∫(cf + g) = c∫f + ∫g, i.e. that integration is a linear functional
on R[a, b]. □
The following example and theorem are easy exercises using the definition of integrability.
Example 22.7. The constant function 1 is Riemann integrable, and ∫_a^b 1 = b − a.
Theorem 22.8. Let f, g ∈ R[a, b] with f ≤ g. Then ∫f ≤ ∫g. If |f| ≤ M on [a, b], then
|∫_a^b f| ≤ M(b − a).
23. The “Darboux” approach


We now discuss the second way of defining the Riemann integral, which we call the Darboux
method. Again, we need some preliminaries. Notice that for this method we must
assume that the function is bounded.
Definition 23.1. Let f : [a, b] → R be a bounded function, and let P = {x_0, x_1, . . . , x_n} be
a partition of [a, b]. We define
\begin{align*}
m_i &= \inf_{x_{i-1} \le t \le x_i} f(t), & M_i &= \sup_{x_{i-1} \le t \le x_i} f(t), \\
L(f, P) &= \sum_i m_i \Delta x_i, & U(f, P) &= \sum_i M_i \Delta x_i.
\end{align*}
These are referred to as lower and upper sums. Notice that for any partition pair (P, T)
we have that L(f, P) ≤ R(f, P, T) ≤ U(f, P). Finally we define
\[ \underline{I}(f) = \sup_P L(f, P), \qquad \overline{I}(f) = \inf_P U(f, P). \]
These are referred to as the lower and upper integrals of f on [a, b]. It is standard to write
\underline{\int_a^b} f for \underline{I}(f), and \overline{\int_a^b} f for \overline{I}(f). We say that f is Darboux integrable on [a, b] if
\underline{I} = \overline{I}, and in this case the common value is called the (Darboux) integral.
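Lower and upper sums involve an infimum and a supremum over each subinterval, which cannot be computed for an arbitrary function. The sketch below therefore assumes that f is increasing, so that m_i = f(x_{i−1}) and M_i = f(x_i); under that assumption it computes L(f, P) and U(f, P) exactly with rational arithmetic and checks the inequality L(f, P) ≤ R(f, P, T) ≤ U(f, P) for one choice of T. The function names and the example x^2 on [0, 1] are illustrative.

```python
from fractions import Fraction

def lower_upper_sums_increasing(f, P):
    """L(f, P) and U(f, P) for an increasing f, where m_i = f(x_{i-1}) and M_i = f(x_i)."""
    pairs = list(zip(P, P[1:]))
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    return lower, upper

def riemann_sum(f, P, T):
    """R(f, P, T) for a partition pair (P, T)."""
    return sum(f(t) * (x1 - x0) for x0, x1, t in zip(P, P[1:], T))

def square(x):
    return x * x

n = 10
P = [Fraction(i, n) for i in range(n + 1)]   # uniform partition of [0, 1]
midpoints = [(x0 + x1) / 2 for x0, x1 in zip(P, P[1:])]
lower, upper = lower_upper_sums_increasing(square, P)
print(lower, riemann_sum(square, P, midpoints), upper)
```

For this example U(f, P) − L(f, P) = (f(1) − f(0))/n = 1/10, so refining the uniform partition makes the gap as small as desired.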
Our goal for this section is to prove that the Riemann and Darboux approaches yield the
same result. Before doing this we need to talk a bit about refinements of partitions, and
their effect on upper and lower sums and integrals.
Definition 23.2. Let P and P′ be partitions of [a, b]. We say that P′ refines P if P ⊆ P′.
It is easy to see that P′ refines P if and only if every subinterval associated to P′ is
contained in one of the subintervals associated to P.
Lemma 23.3. (Refinement Principle) Let P′ refine P. Then L(f, P) ≤ L(f, P′) and
U(f, P′) ≤ U(f, P).
In other words, refining the partition causes the lower sum to increase, and the upper sum
to decrease. The idea of the proof is to proceed from P to P′ by adding one point at a time.
Then the change in the lower and upper sums happens on only one subinterval of P. We
leave as an exercise the writing of a precise proof.
In general, if P_1 and P_2 are two partitions of [a, b], then neither one need refine the other.
Thus there is in general no relation between the upper and lower sums for two partitions.
However, P_1 and P_2 always have a common refinement; for example, P_1 ∪ P_2 contains both
P_1 and P_2. This device gives us the following important result: every lower sum for f is less
than or equal to every upper sum for f.
Lemma 23.4. Let P_1 and P_2 be two partitions of [a, b]. Then L(f, P_1) ≤ U(f, P_2).
Proof. Let P′ be a common refinement. Then
L(f, P_1) ≤ L(f, P′) ≤ U(f, P′) ≤ U(f, P_2). □
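The chain of inequalities in this proof can be verified on a small example. The sketch below takes two partitions of [0, 1], neither of which refines the other, forms their common refinement P_1 ∪ P_2, and checks the chain for the increasing function x^2. As an assumption of this illustration, the lower and upper sums are computed from endpoint values, which is valid only for increasing functions.

```python
from fractions import Fraction

def lower_upper_sums_increasing(f, P):
    """L(f, P) and U(f, P) for an increasing f, using endpoint values."""
    pairs = list(zip(P, P[1:]))
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    return lower, upper

def common_refinement(P1, P2):
    """The partition P1 ∪ P2, listed in increasing order."""
    return sorted(set(P1) | set(P2))

def square(x):
    return x * x

P1 = [Fraction(0), Fraction(1, 3), Fraction(1)]
P2 = [Fraction(0), Fraction(1, 2), Fraction(1)]
P_common = common_refinement(P1, P2)

lower_1, upper_1 = lower_upper_sums_increasing(square, P1)
lower_2, upper_2 = lower_upper_sums_increasing(square, P2)
lower_c, upper_c = lower_upper_sums_increasing(square, P_common)

# L(f, P1) <= L(f, P') <= U(f, P') <= U(f, P2)
print(lower_1, lower_c, upper_c, upper_2)
```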
We now give a Cauchy type characterization of Darboux integrability.
Corollary 23.5. f is Darboux integrable on [a, b] if and only if for every ε > 0 there is a
partition P of [a, b] such that U(f, P) − L(f, P) < ε.
Proof. The forward direction follows easily from the definition, and we leave it as an exercise
to write it out carefully. For the reverse direction, suppose that the Cauchy condition holds.
We must show that \underline{I} = \overline{I}. We already know that \underline{I} ≤ \overline{I}. Let ε > 0. Choose a partition P
such that U(f, P) − L(f, P) < ε. Then
\[ \overline{I} \le U(f, P) < L(f, P) + \varepsilon \le \underline{I} + \varepsilon. \]
This is true for every choice of ε, and hence \overline{I} ≤ \underline{I}. □
(This Cauchy type condition corresponds to a kind of limit. The limiting process going
on here is that the partition becomes finer and finer, in the sense of refinement. This is a
different kind of limit than the others we have seen. Until now we have seen limits based on
a totally ordered set; for example, n → ∞ in N, t → t0 in R, or t → ∞ in R. The limit taken
as a partition of [a, b] becomes finer and finer is based on a partially ordered set, namely, the
set of partitions ordered by refinement. It isn’t hard to get used to this notion, and we may
write
\[ \underline{I} = \lim_{P \to \infty} L(f, P), \qquad \overline{I} = \lim_{P \to \infty} U(f, P). \]
If f is Darboux integrable, then we have ∫f = lim_{P→∞} L(f, P) = lim_{P→∞} U(f, P).)
We are now ready to prove the main theorem of this section.
Theorem 23.6. Let f : [a, b] → R. Then f is Riemann integrable if and only if f is Darboux
integrable. For an integrable function, the two integrals coincide.
Proof. We first assume that f is Riemann integrable. Let ε > 0. There exist a number L
and δ > 0 such that if P is any partition with mesh(P) < δ, then for any list T associated
to P we have |R(f, P, T) − L| < ε. Fix any partition P with mesh(P) < δ. Then we have
(for any T)
L − ε < R(f, P, T) < L + ε.
Recall that for any partition pair (P, T), we have L(f, P) ≤ R(f, P, T) ≤ U(f, P). Moreover,
it is easy to see that
\[ L(f, P) = \inf_T R(f, P, T), \qquad U(f, P) = \sup_T R(f, P, T). \]
It follows that
\[ L - \varepsilon \le L(f, P), \qquad L + \varepsilon \ge U(f, P). \]
Therefore U(f, P) − L(f, P) ≤ 2ε. Hence f is Darboux integrable.
Now we assume that f is Darboux integrable. The proof of this direction is a bit trickier
than the other one. In particular, it relies upon the standard technique of dividing the sum
into two kinds of terms, and estimating them differently. Since f is bounded, there is K
such that |f| ≤ K on [a, b]. Let L = ∫_a^b f (the Darboux integral of f). Let ε > 0. Choose a
partition P such that
U(f, P) − L(f, P) < ε.
Write P = {x_0, x_1, . . . , x_n}. Set δ = ε/n. We will show that if (Q, T) is any partition pair
with mesh(Q) < δ, then |R(f, Q, T) − L| < (2K + 1)ε, proving Riemann integrability
(and also showing that the two integrals coincide). In fact, it will suffice to show that
U(f, Q) − L(f, Q) < (2K + 1)ε, since both L and R(f, Q, T) lie between the lower and upper
sums.
So let Q = {y_0, y_1, . . . , y_k} have mesh less than δ. We will write I_i = [x_{i−1}, x_i] for 1 ≤ i ≤ n,
and J_j = [y_{j−1}, y_j] for 1 ≤ j ≤ k. We divide the subintervals associated to Q into two groups
as follows:
S_1 = {j : there exists i with x_i ∈ int(J_j)}
S_2 = {1, 2, . . . , k} \ S_1.
Thus S_2 indicates those J_j’s that are entirely contained in one of the I_i’s; S_1 indicates those
J_j’s that straddle more than one of the I_i’s. There are at most n elements in S_1 (in fact,
there are at most n − 1). Now we will use m(I) and M(I) for the infimum and supremum
of f over an interval I. For j ∈ S_1 we have
−K ≤ m(J_j) ≤ M(J_j) ≤ K.
For j ∈ S_2 there is i such that J_j ⊆ I_i. Then
m(I_i) ≤ m(J_j) ≤ M(J_j) ≤ M(I_i).
Hence for this j and i we have
M(J_j) − m(J_j) ≤ M(I_i) − m(I_i).
Now we estimate:
\begin{align*}
U(f, Q) - L(f, Q) &= \sum_{j=1}^{k} \bigl( M(J_j) - m(J_j) \bigr) \Delta y_j \\
&= \sum_{j \in S_1} \bigl( M(J_j) - m(J_j) \bigr) \Delta y_j + \sum_{j \in S_2} \bigl( M(J_j) - m(J_j) \bigr) \Delta y_j \\
&\le \sum_{j \in S_1} 2K \Delta y_j + \sum_{i=1}^{n} \sum_{\substack{j \in S_2 \\ J_j \subseteq I_i}} \bigl( M(I_i) - m(I_i) \bigr) \Delta y_j \\
&< 2Kn\delta + \sum_{i=1}^{n} \bigl( M(I_i) - m(I_i) \bigr) \Delta x_i \\
&= 2K\varepsilon + U(f, P) - L(f, P) \\
&< (2K + 1)\varepsilon. \qquad \square
\end{align*}
There are various situations where it is fairly easy to prove integrability (or non-integrability)
using the Darboux definition. These are useful exercises in working with the definition.
In the next section we will prove a deep theorem that will make them trivial to verify.
Example 23.7. (1) Continuous functions are Riemann integrable.
(2) Monotone functions are Riemann integrable.
(3) Step functions are Riemann integrable. (A step function on [a, b] is a function for
which there exists a partition of [a, b] such that the function is constant on the interior
of each subinterval.) In particular, the characteristic function χ_{[c,d]} of a subinterval
[c, d] of [a, b] is Riemann integrable over [a, b], where χ_E(x) = 1 if x ∈ E and = 0 if
x ∉ E.
(4) More generally, a bounded function that is continuous at all but finitely many points
of [a, b] is Riemann integrable.
(5) The characteristic function of Q is not Riemann integrable over any interval.

24. Measure zero and integration


In order to characterize intrinsically the property of being Riemann integrable, we need to
develop the concept of sets of measure zero. This is the first step in what is called measure
theory (which I hope to cover a bit more fully next semester). Riemann integration is built
around the elementary concept of length of an interval. It is natural to consider the “length”
of a finite union of intervals, but the question of measuring more complicated subsets of R
is not addressed in calculus. Nevertheless, this is a very important problem, resolved by
Lebesgue in the first part of the twentieth century. One of his great insights is the main
theorem below.
Definition 24.1. Let E ⊆ R. E has measure zero if for every ε > 0 there exist open
intervals U_1, U_2, . . ., such that
(1) E ⊆ ∪_{i=1}^∞ U_i.
(2) Σ_{i=1}^∞ |U_i| < ε.
In this definition we write |U_i| for the length of the interval U_i. We expect that the notion
of a convergent sum of positive real numbers is familiar from a previous course (even though
we will review this idea later this semester). In any case, the definition is unchanged if we
demand instead of (2) that Σ_{i=1}^n |U_i| < ε for all n. Here are some examples of sets of measure
zero.
Example 24.2. (1) Finite sets have measure zero. (In fact, only finitely many open
intervals are necessary in this case.)
(2) Countable sets have measure zero.
Proof. This is a convenient place to introduce the “ε/2^n trick”. Let E = {x_1, x_2, . . .}.
Let ε > 0 be given, and set U_i = (x_i − ε/2^{i+2}, x_i + ε/2^{i+2}), for i = 1, 2, . . .. Then
|U_i| = ε/2^{i+1}, so that Σ_i |U_i| = ε/2 < ε. It is obvious that E ⊆ ∪_i U_i. □
(3) Q has measure zero.
(4) A subset of a set of measure zero has measure zero.
(5) A countable union of sets of measure zero has measure zero.
(6) The definition is unchanged if arbitrary intervals (open, closed, or half-open) are used
instead of open intervals.
Proof. Proofs for the previous three assertions are left as exercises. 
(7) The Cantor set C has measure zero.
Proof. Recall from our construction of C that C = ∩_{n=1}^∞ F_n, where F_n is the union
of 2^n closed intervals, each of length 3^{-n}. Stretching each of these a little bit, we
can produce 2^n open intervals U_i each having length less than (2.5)^{-n} and having
union containing F_n (and hence C). Then Σ_{i=1}^{2^n} |U_i| < (2/2.5)^n, which tends to zero as
n → ∞. □
(8) If a < b then [a, b] does not have measure zero. This is a good exercise, even if it
isn’t homework (but it might be).
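The arithmetic behind items (2) and (7) can be checked with exact rational numbers. The sketch below sums the interval lengths ε/2^{i+1} from the “ε/2^n trick” for the first few indices, and evaluates the bound (2/2.5)^n = (4/5)^n on the total length of the cover of the Cantor set. The function names and the sample values of ε and n are assumptions made for this illustration.

```python
from fractions import Fraction

def cover_lengths(eps, count):
    """Lengths |U_i| = eps / 2^(i+1) for i = 1, ..., count, as in item (2)."""
    return [Fraction(eps) / 2**(i + 1) for i in range(1, count + 1)]

def cantor_cover_bound(n):
    """The bound (2/2.5)^n = (4/5)^n on the total length of 2^n intervals
    each shorter than (2.5)^(-n), as in item (7)."""
    return Fraction(4, 5)**n

eps = Fraction(1, 10)
lengths = cover_lengths(eps, 50)

# Partial sums stay below eps/2, so the full series sums to eps/2 < eps.
print(float(sum(lengths)), float(eps / 2))

# The Cantor bound decreases to zero as n grows.
print([float(cantor_cover_bound(n)) for n in (1, 10, 50)])
```

The stretching step in item (7) is possible because 3^{-n} < (2.5)^{-n} for every n ≥ 1.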
Before stating the main theorem, we recall the notion of oscillation of a function at a
point. The definition makes sense for a function between general metric spaces, but for
clarity we will state it only for functions whose codomain is R.
Definition 24.3. Let X be a metric space, and let f : X → R. Let a ∈ X. The oscillation
of f at a is
\[ \mathrm{osc}(f, a) = \inf_{r > 0} \sup_{x, y \in B_r(a)} \bigl( f(x) - f(y) \bigr). \]

This is the precise description of a very natural idea. Let’s briefly take the definition apart.
Fix r > 0. This defines an open ball about a. How much can the function vary over this
ball? The supremum in the parentheses is exactly how much. If we let r become smaller,
then the ball becomes smaller, so that there are fewer points in the ball to put inside of f .
Thus as r decreases, the supremum also decreases. In fact, the infimum over r is actually
equal to the limit as r → 0. This limiting value is the minimum amount that f can be made
to jump, no matter to how small a ball (centered at a) you confine its argument. That is
what we mean by the oscillation at a.
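The description above suggests a rough numerical experiment for functions on R: for each radius r, sample the ball (a − r, a + r), take the largest minus the smallest sampled value as a stand-in for the supremum of f(x) − f(y), and then take the smallest result over a decreasing list of radii. This is only a sketch under those assumptions: sampling can underestimate the supremum, and a finite list of radii can only approximate the infimum. The function names and the step function used as a test case are choices made here.

```python
def sampled_variation(f, a, r, samples=1001):
    """Estimate sup over x, y in (a - r, a + r) of f(x) - f(y) from sample points.

    The sample points lie strictly inside the open ball; the result is max - min
    of the sampled values, which never exceeds the true supremum.
    """
    xs = [a - r + 2 * r * (i + 0.5) / samples for i in range(samples)]
    values = [f(x) for x in xs]
    return max(values) - min(values)

def oscillation_estimate(f, a, radii):
    """Smallest sampled variation over the given radii (a stand-in for inf over r > 0)."""
    return min(sampled_variation(f, a, r) for r in radii)

def step(x):
    # A function with a jump of size 1 at 0.
    return 1.0 if x >= 0 else 0.0

def square(x):
    return x * x

radii = [10.0**(-k) for k in range(1, 7)]
print(oscillation_estimate(step, 0.0, radii))    # the jump at 0 persists at every radius
print(oscillation_estimate(step, 1.0, radii))    # step is constant near 1
print(oscillation_estimate(square, 1.0, radii))  # small, and shrinking with the radius
```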
We can think of the oscillation of f at a as a measure of the size of the discontinuity of f
at a. That is an interpretation of the first part of the following lemma (which should have
been homework earlier in the semester).
Lemma 24.4. Let X be a metric space, let f : X → R, and let a ∈ X.
(1) f is continuous at a if and only if osc(f, a) = 0.
(2) For c > 0, {x ∈ X : osc(f, x) ≥ c} is a closed set.
Theorem 24.5. Let f : [a, b] → R be bounded. Let E be the set of points in [a, b] where f is
discontinuous. Then f is Riemann integrable if and only if E has measure zero.
Proof. We first assume that f is Riemann integrable. Let E_n = {x ∈ [a, b] : osc(f, x) ≥ 1/n}.
By Lemma 24.4 (1), we know that E = ∪_{n=1}^∞ E_n. Thus it suffices to show that E_n has measure
zero for each n. So now fix n, let ε > 0, and choose a partition P such that U(f, P) − L(f, P) < ε/n.
Let
S = {I : I is a subinterval of P, and int(I) ∩ E_n ≠ ∅}.
For I ∈ S we have that M(I) − m(I) ≥ 1/n. (The reason is that there must exist a point
x ∈ E_n in the interior of I, so that I ⊇ B_r(x) for some r > 0.) But now we estimate
\[ \frac{1}{n} \sum_{I \in S} |I| \le U(f, P) - L(f, P) < \frac{\varepsilon}{n}, \]
so that Σ_{I∈S} |I| < ε. Now the union ∪_{I∈S} int(I) contains all points of E_n except possibly some
of the points of P, since a point of E_n that is not in P lies in the interior of a subinterval, which
then belongs to S. There are only finitely many points in P.
Let T be a collection of open intervals centered at these points with total length so small that
Σ_{I∈S} |I| + Σ_{J∈T} |J| < ε. Then {int(I) : I ∈ S} ∪ T is a finite collection of open intervals
covering E_n and having total length less than ε. Therefore E_n has measure zero.
Now we prove the converse. Suppose that E has measure zero. Let |f | ≤ K on [a, b],
and let ε > 0 be given. Let E0 = {x ∈ [a, b] : osc(f, x) ≥ ε}. Then E0 ⊆SE, so that

P0∞also has measure zero. Let U1 , U2 , . . . be open intervals such that E0 ⊆ i=1 Ui and
E
i=1 |Ui | < ε. By Lemma 24.4(2), E0 is closed.SnSince E0 ⊆ [a,b], E0 is compact. Thus there
is n such that E0 ⊆ U1 ∪ · · · ∪ Un . Let P0 = i=1 ∂Ui ∩ [a, b] ∪ {a, b}, a partition of [a, b].
We will find a suitable refinement P of P0 such that U (f, P ) − L(f, P ) < (2K + b − a)ε,
which will conclude the proof. Since P0 contains the endpoints of the Ui ’s, each subinterval
associated to P0 is either contained in some Ui , or is disjoint from all of the Ui ’s. Let S1
denote the collection of those subintervals that are contained in some Ui , and let S2 denote
the remaining subintervals. Then for I ∈ S1 we have
M (I) − m(I) ≤ 2K.
Hence
\[ \sum_{I\in S_1} \bigl(M(I) - m(I)\bigr)|I| \le 2K \sum_{i=1}^{n} |U_i| < 2K\varepsilon. \]
Now consider a subinterval I ∈ S2 . Then I ∩ E0 = ∅, so the oscillation of f at each point
of I is less than ε. Thus for each x ∈ I there is an open interval Ix centered at x such that
M (Ix ) − m(Ix ) < ε. The collection {Ix : x ∈ I} is an open cover of the compact interval I,
hence has a finite subcover: there are $x_1, \ldots, x_k \in I$ such that $I \subseteq \bigcup_{i=1}^{k} I_{x_i}$. We define P by
including into $P_0$ all endpoints of the $I_{x_i}$ that lie in I:
\[ P = P_0 \cup \bigcup_{I \in S_2} \bigcup_{i} \bigl( (\partial I_{x_i}) \cap I \bigr). \]
Let us consider the subintervals of P contained in some I ∈ S2 , let J be one such. Then
J ⊆ Ixi for some i, and hence M (J) − m(J) < ε. Therefore
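\[ \sum_{I\in S_2}\sum_{J\subseteq I} \bigl(M(J)-m(J)\bigr)|J| < \sum_{I\in S_2}\sum_{J \subseteq I} \varepsilon |J| = \varepsilon \sum_{I\in S_2} |I| \le \varepsilon(b-a). \]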
We now have
\[ U(f,P) - L(f,P) = \sum_{I\in S_1}\bigl(M(I)-m(I)\bigr)|I| + \sum_{I\in S_2}\sum_{J\subseteq I}\bigl(M(J)-m(J)\bigr)|J| < 2K\varepsilon + (b-a)\varepsilon. \qquad\square \]

We now give several consequences of this theorem. These can be proved directly from
the Riemann or Darboux definitions, but are deduced much more easily from the above
characterization. The first of these is immediate from the theorem.
Corollary 24.6. A bounded function with only finitely many discontinuities is Riemann
integrable. In particular, a piecewise continuous function is Riemann integrable.
Corollary 24.7. A function that is zero except at finitely many points is Riemann integrable.
Moreover, the integral of such a function equals 0.

Proof. The integrability follows from the previous corollary. It is easy to use the definition
of the integral to show that the integral is zero. 
Corollary 24.8. Riemann integrability, and the value of the Riemann integral, of a function
are unaffected when the function is altered at finitely many points.
Proof. The altered function equals the sum of the original function with a function that is
zero except at finitely many points. Thus the previous corollary, together with linearity of
the integral, give the result. 
Corollary 24.9. Monotone functions are Riemann integrable.
Proof. This follows from the fact that a monotone function has at most countably many
discontinuities. To see this, suppose (without loss of generality) that f is increasing, and note
that a monotone function has one-sided limits at all points, and is discontinuous at a point if
and only if the two one-sided limits at that point are distinct. If we let
\[ f(x\pm) = \lim_{t \to x^{\pm}} f(t), \]
then for any x ≠ y we have $\bigl(f(x-), f(x+)\bigr) \cap \bigl(f(y-), f(y+)\bigr) = \emptyset$. Thus if we let q(x) be
a rational number in the interval $\bigl(f(x-), f(x+)\bigr)$ for each discontinuity x of f, then q is a
one-to-one function from the set of discontinuities into Q. Therefore the set of discontinuities
is countable, and hence of measure zero. □
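As a side check, independent of the measure-zero argument, the integrability of an increasing function can also be seen directly: on a uniform partition the sum U(f, P) − L(f, P) telescopes to (f(b) − f(a))(b − a)/n. The following is a minimal sketch of that computation; the function `staircase` is a hypothetical test function (increasing, with three jumps), not something taken from the notes.

```python
import math

def upper_lower_gap(f, a, b, n):
    """U(f,P) - L(f,P) for an increasing f on the uniform partition of [a,b] into n pieces.

    For increasing f, the sup on [x_{i-1}, x_i] is f(x_i) and the inf is f(x_{i-1}).
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    dx = (b - a) / n
    return sum((f(xs[i]) - f(xs[i - 1])) * dx for i in range(1, n + 1))

def staircase(x):
    # hypothetical example: increasing on [0, 1], with jumps at 0.25, 0.5, 0.75 (and at 1)
    return x + math.floor(4 * x)

for n in (10, 100, 1000):
    print(n, upper_lower_gap(staircase, 0.0, 1.0, n))
```

The gap shrinks like 1/n regardless of where the jumps are, which is why no continuity assumption is needed for monotone functions.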
Corollary 24.10. The product of Riemann integrable functions is Riemann integrable.
Proof. The set of discontinuities of f g is contained in the union of the sets of discontinuities
of f and g separately. 
Corollary 24.11. Let f be Riemann integrable on [a, b], and let ϕ be a continuous function
defined on the range of f . Then ϕ ◦ f is Riemann integrable (also on [a, b]).
Proof. Since composition preserves continuity, the set of points where f is continuous is
contained in the set of points where ϕ ◦ f is continuous. Hence the sets of discontinuities
satisfy the reverse containment. 
Remark 24.12. The order in which the two functions are composed in the previous corollary
is crucial: f ◦ ϕ need not be integrable. (You can remember which order preserves integrabi-
lity by noting that in the corollary, the composition has the same domain as the integrable
function.)
Corollary 24.13. If f is Riemann integrable, then so is |f |.
Proof. |f | = | · | ◦ f . 
Corollary 24.14. Let f be Riemann integrable on [a, b], and let [c, d] ⊆ [a, b]. Then f is
Riemann integrable on [c, d]. Moreover, $\int_c^d f = \int_a^b f \chi_{[c,d]}$.
Proof. For the first statement, note that any discontinuity of f in [c, d] is also a discontinuity
in [a, b]. The second statement follows easily from either definition of the integral by including
{c, d} into a partition of [a, b]. 
Corollary 24.15. Let f be Riemann integrable on [a, b], and let c ∈ (a, b). Then
\[ \int_a^b f = \int_a^c f + \int_c^b f. \]

Proof. This follows from Corollary 24.14, linearity of the integral, and Corollary 24.8, since
$f\chi_{[a,b]}$ and $f\chi_{[a,c]} + f\chi_{[c,b]}$ can differ only at c. □
The image of a set of measure zero under a continuous function need not have measure
zero. This is a pretty strange phenomenon. The upshot is that continuity is not really such a
strong property. It is important that a stronger version of continuity is sufficient to preserve
measure zero sets.
Lemma 24.16. Let g : [a, b] → R be a Lipschitz function, and let E ⊆ [a, b] have measure
zero. Then g(E) has measure zero.
Proof. Let c > 0 be a Lipschitz constant for g. We claim that if I is an open interval
contained in [a, b], then $|g(I)| \le c|I|$. To see this, let I = (t − r, t + r). Then by the Lipschitz
condition, $g(I) \subseteq \bigl(g(t) - cr,\, g(t) + cr\bigr)$. Now let ε > 0. Let $U_1, U_2, \ldots$ be open intervals with
$E \subseteq \bigcup_i U_i$ and $\sum_i |U_i| < \varepsilon/c$. Let us assume that $U_i \subseteq [a,b]$; this is not a serious restriction,
as we may extend the domain of g to all of R (e.g. by letting g be constant on (−∞, a] and
on [b, ∞)) without changing the Lipschitz constant. Then $g(E) \subseteq \bigcup_i g(U_i)$, and
\[ \sum_i |g(U_i)| \le c \sum_i |U_i| < c \cdot \frac{\varepsilon}{c} = \varepsilon. \]
Therefore g(E) has measure zero. 


Corollary 24.17. Let f be Riemann integrable, and let ϕ be a continuous one-to-one function
(defined on an interval) such that its inverse function $\varphi^{-1}$ is Lipschitz. Then f ◦ ϕ is
Riemann integrable. (Compare with Corollary 24.11.)
Proof. Let E denote the set of discontinuities of f. We first claim that the set of discontinuities
of f ◦ ϕ is contained in $\varphi^{-1}(E)$. To see this, note that if $x \notin \varphi^{-1}(E)$ then $\varphi(x) \notin E$,
so that f is continuous at ϕ(x). But then f ◦ ϕ is continuous at x. Hence any point where
f ◦ ϕ is discontinuous must be contained in $\varphi^{-1}(E)$. By Lemma 24.16 (applied to the
Lipschitz function $\varphi^{-1}$), $\varphi^{-1}(E)$ has measure zero. □

25. The fundamental theorem of calculus


Before giving the fundamental theorem, we present the usual notational expediencies.
Definition 25.1. If a < b and f is Riemann integrable on [a, b], we define $\int_b^a f = -\int_a^b f$.
Theorem 25.2. Let f be bounded on an interval containing a, b and c. If two of $\int_a^b f$, $\int_b^c f$
and $\int_c^a f$ exist, then so does the third, and
\[ \int_a^b f + \int_b^c f + \int_c^a f = 0. \]

Proof. One of a, b and c lies between the other two. By symmetry, we may assume without
loss of generality that it is b that lies in the middle. Again without loss of generality, we may
assume that a < b < c. Now, if f is integrable on [a, c], then it is integrable on [a, b] and on
[b, c] by Corollary 24.14, and the identity follows from Corollary 24.15. On the other hand, if
f is integrable on [a, b] and [b, c], then f is integrable on [a, c] by Theorem 24.5, and we are
back in the previous case. □
Remark 25.3. If a < b, then $\left|\int_b^a f\right| \le \int_a^b |f|$.

Theorem 25.4. (The fundamental theorem of calculus.) Let f be Riemann integrable on
[a, b]. For x ∈ [a, b] let $F(x) = \int_a^x f$. Then F is Lipschitz (in particular, F is continuous).
If f is continuous at $x_0 \in [a,b]$, then F is differentiable at $x_0$, and $F'(x_0) = f(x_0)$ (in
particular, if f is continuous on [a, b], then F is differentiable on [a, b]).
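Proof. We leave as an exercise the proof that F is Lipschitz. Suppose that f is continuous
at $x_0$. Let ε > 0. Then there is δ > 0 such that for all x ∈ [a, b], if $|x - x_0| < \delta$ then
$|f(x) - f(x_0)| < \varepsilon$. Now for $x \in [a,b] \setminus \{x_0\}$ with $|x - x_0| < \delta$, we have
\begin{align*}
\frac{F(x) - F(x_0)}{x - x_0} &= \frac{1}{x - x_0}\left(\int_a^x f - \int_a^{x_0} f\right) \\
&= \frac{1}{x-x_0}\int_{x_0}^x f, \quad\text{hence} \\
\left|\frac{F(x)-F(x_0)}{x-x_0} - f(x_0)\right| &= \left|\frac{1}{x-x_0}\int_{x_0}^x f - \frac{1}{x-x_0}\int_{x_0}^x f(x_0)\right| \\
&= \left|\frac{1}{x-x_0}\int_{x_0}^x \bigl(f - f(x_0)\bigr)\right| \\
&\le \frac{1}{|x-x_0|}\left|\int_{x_0}^x \bigl|f - f(x_0)\bigr|\right| \\
&\le \frac{1}{|x-x_0|}\,\varepsilon |x-x_0| \\
&= \varepsilon. \qquad\square
\end{align*}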
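The estimate in this proof can be watched numerically. The sketch below approximates the difference quotient (F(x0 + h) − F(x0))/h by a midpoint Riemann sum over [x0, x0 + h] and compares it with f(x0). The integrand |t − 1/2| and the helper names are assumptions made for illustration only; the integrand is chosen because it is continuous but not differentiable at 1/2, so only continuity is being used.

```python
def integral(f, a, b, n=20000):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def difference_quotient(f, x0, h):
    """(F(x0 + h) - F(x0)) / h, where F(x) is the integral of f from a fixed base point to x.

    By additivity of the integral, F(x0 + h) - F(x0) is the integral of f over [x0, x0 + h].
    """
    return integral(f, x0, x0 + h) / h

f = lambda t: abs(t - 0.5)   # hypothetical integrand: continuous, with a corner at 0.5

for h in (1e-1, 1e-2, 1e-3):
    print(h, difference_quotient(f, 0.5, h), f(0.5))
```

As h shrinks, the quotient approaches f(x0), as the theorem predicts at every point of continuity.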
Corollary 25.5. If f is continuous on [a, b], then f has an antiderivative on [a, b].
Corollary 25.6. If f is continuous on [a, b], and if G is an antiderivative for f on [a, b],
then $\int_a^b f = G(b) - G(a)$.
Proof. Let $F(x) = \int_a^x f$. Then F and G are both antiderivatives for f on [a, b]. Thus F
and G are differentiable on [a, b], and $F' = G'$. By Corollary 20.14, F − G is constant, say
F − G = c. Then
\[ \int_a^b f = F(b) = F(b) - F(a) = \bigl(G(b) + c\bigr) - \bigl(G(a) + c\bigr) = G(b) - G(a), \]
(since F(a) = 0). □


We conclude this section with the change of variable theorem, which is the basis for the
method of integration by substitution from elementary calculus. If we assume that the
function f is continuous, then an easy short proof can be given using antiderivatives (you
might try to find it as an exercise). Our characterization of integrability by means of sets of
measure zero lets us prove a more general result, without too much extra work.
Theorem 25.7. (Change of variable theorem.) Let f be Riemann integrable on [a, b], and
let g : [c, d] → [a, b] be continuously differentiable with $g' \ne 0$ on [c, d]. Then
\[ \int_{g(c)}^{g(d)} f(y)\,dy = \int_c^d f\bigl(g(x)\bigr)\, g'(x)\,dx. \]

Proof. Since $g'$ is continuous and nonzero on [c, d], it does not change sign. We first consider
the case where $g' > 0$ on [c, d]. Then g(c) < g(d). Note that $g^{-1}$ is also continuously differentiable,
by the inverse function theorem, and hence that $g^{-1}$ is Lipschitz. By Corollary 24.17, f ◦ g
is Riemann integrable, and hence so is $(f \circ g)g'$. Let L and $L'$ be the two integrals in the
statement of the theorem, and let ε > 0. Let δ > 0 be such that for any partition pair
(Q, U) of [g(c), g(d)] with mesh(Q) < δ we have $|R(f,Q,U) - L| < \varepsilon$. Since g is uniformly
continuous on [c, d] there is $\eta_1 > 0$ such that if $|x - x'| < \eta_1$ then $|g(x) - g(x')| < \delta$.
Choose $\eta_2 > 0$ such that for any partition pair (P, T) of [c, d] with mesh(P) < $\eta_2$ we have
$\bigl|R\bigl((f\circ g)g', P, T\bigr) - L'\bigr| < \varepsilon$. Fix a partition P of [c, d] with mesh(P) < min{$\eta_1$, $\eta_2$}.
Write $P = \{x_0, x_1, \ldots, x_n\}$. Let $y_i = g(x_i)$, and let $Q = g(P) = \{y_0, y_1, \ldots, y_n\}$. Since
mesh(P) < $\eta_1$ we know that mesh(Q) < δ. The mean value theorem applied to g on
$[x_{i-1}, x_i]$ gives $t_i \in (x_{i-1}, x_i)$ such that
\[ g(x_i) - g(x_{i-1}) = g'(t_i)(x_i - x_{i-1}), \quad\text{i.e.}\quad \Delta y_i = g'(t_i)\,\Delta x_i. \]
Let $u_i = g(t_i)$, and set $T = (t_1, \ldots, t_n)$ and $U = (u_1, \ldots, u_n)$. Then (Q, U) is a partition pair of [g(c), g(d)], and
\[ R(f,Q,U) = \sum_{i=1}^{n} f(u_i)\,\Delta y_i = \sum_{i=1}^{n} f\bigl(g(t_i)\bigr)\, g'(t_i)\,\Delta x_i = R\bigl((f\circ g)g', P, T\bigr). \]
Therefore
\[ |L - L'| \le \bigl|L - R(f,Q,U)\bigr| + \bigl|R\bigl((f\circ g)g',P,T\bigr) - L'\bigr| < \varepsilon + \varepsilon = 2\varepsilon. \]
Hence $L = L'$.
If, on the other hand, $g' < 0$ on [c, d], then g(d) < g(c). Note that $\Delta y_i = -g'(t_i)\Delta x_i$ (and
i runs backward). But R(f, Q, U) approximates $\int_{g(d)}^{g(c)} f = -L$. □
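To see that the theorem really does not need f to be continuous, here is a small numerical sketch comparing the two sides for a discontinuous integrand. The particular f (with jumps at 1/3 and 2/3) and g(x) = x² on [1/2, 1] are hypothetical choices for illustration, and both integrals are approximated by midpoint sums rather than computed exactly.

```python
import math

def midpoint_sum(fn, lo, hi, n=20000):
    """Midpoint Riemann sum of fn over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(fn(lo + (i + 0.5) * dx) for i in range(n)) * dx

f = lambda y: math.floor(3 * y) + y * y   # hypothetical integrand, discontinuous at 1/3 and 2/3
g = lambda x: x * x                        # continuously differentiable, g' = 2x > 0 on [0.5, 1]
dg = lambda x: 2 * x
c, d = 0.5, 1.0

lhs = midpoint_sum(f, g(c), g(d))                        # integral of f(y) dy over [g(c), g(d)]
rhs = midpoint_sum(lambda x: f(g(x)) * dg(x), c, d)      # integral of f(g(x)) g'(x) dx over [c, d]
print(lhs, rhs)
```

The two approximations agree up to the discretization error of the sums, even though f ◦ g has jumps.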

26. The Weierstrass approximation theorem


In this section, we will apply the Riemann integral to prove a classic, and still very
important, theorem from the 19th century, on polynomial approximation. The proof relies
on a technique called smearing that is very useful in harmonic analysis, and many other areas
of mathematics. We will discuss this idea first, and then see about the Weierstrass theorem.
As an introductory example of smearing, consider a continuous function f : R → R. Then
f ∈ R[a, b] for any a < b. For n ∈ N we define the nth average of f by
\[ A_n(x) = \frac{n}{2}\int_{x-1/n}^{x+1/n} f(t)\,dt, \quad\text{for } x \in \mathbb{R}. \]
Thus An (x) is the average of f over the interval of length 2/n centered at x. We claim that
$A_n$ converges to f uniformly on any compact interval. For the proof, let [a, b] be the interval,
and let ε > 0. Choose δ > 0 as in the definition of uniform continuity for f and ε on [a − 1, b + 1]
(this larger interval contains every t with |t − x| ≤ 1/n for x ∈ [a, b]).
Let n > 1/δ. Then for any x ∈ [a, b],
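\begin{align*}
\bigl|A_n(x) - f(x)\bigr| &= \left|\frac{n}{2}\int_{x-1/n}^{x+1/n} f(t)\,dt - \frac{n}{2}\int_{x-1/n}^{x+1/n} f(x)\,dt\right| \\
&\le \frac{n}{2}\int_{x-1/n}^{x+1/n} \bigl|f(t) - f(x)\bigr|\,dt; \\
\intertext{for $t \in [x-1/n, x+1/n]$, $|f(x)-f(t)| < \varepsilon$, so}
&\le \frac{n}{2}\int_{x-1/n}^{x+1/n} \varepsilon\,dt \\
&= \varepsilon.
\end{align*}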
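The uniform convergence of the averages can be illustrated numerically. The sketch below approximates A_n by a midpoint sum and measures the largest error on a grid; the grid maximum is only a proxy for the uniform norm, and the test function |x| (whose corner is smoothed out by averaging) is a hypothetical choice, not one from the notes.

```python
def average(f, x, n, samples=2000):
    """A_n(x) = (n/2) * integral of f over [x - 1/n, x + 1/n], via a midpoint sum."""
    lo = x - 1.0 / n
    dt = (2.0 / n) / samples
    total = sum(f(lo + (k + 0.5) * dt) for k in range(samples)) * dt
    return (n / 2.0) * total

def sup_error(f, a, b, n, grid=200):
    """Largest value of |A_n(x) - f(x)| over a uniform grid in [a, b]."""
    xs = [a + (b - a) * i / grid for i in range(grid + 1)]
    return max(abs(average(f, x, n) - f(x)) for x in xs)

for n in (10, 100):
    print(n, sup_error(abs, -1.0, 1.0, n))
```

For f(x) = |x| one can compute by hand that the worst error occurs at x = 0 and equals 1/(2n), which matches what the sketch reports.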
Observe that $A_n$ is constructed from f and the auxiliary function $\frac{n}{2}\chi_{[-1/n,1/n]}$:
\[ A_n(x) = \frac{n}{2}\int_{x-1/n}^{x+1/n} f(t)\,dt = \frac{n}{2}\int_{-1/n}^{1/n} f(t+x)\,dt = \int_{\mathbb{R}} f(t+x)\,\frac{n}{2}\chi_{[-1/n,1/n]}(t)\,dt. \]
Let $g_n = \frac{n}{2}\chi_{[-1/n,1/n]}$. Thus we may express $A_n$ in the form
\[ (*) \qquad A_n(x) = \int_{\mathbb{R}} f(t+x)\, g_n(t)\,dt. \]

The sequence of functions (gn ) has the following key properties:


(1) $g_n \ge 0$.
(2) $\int_{\mathbb{R}} g_n = 1$.
(3) For any δ > 0, $\lim_{n\to\infty} \int_{-\delta}^{\delta} g_n = 1$. (This means that the “mass” of $g_n$ concentrates
at 0 as n → ∞.)
An argument analogous to the above will work for any sequence of functions having these
three properties. Such a sequence is sometimes called an approximate identity, and there
are many important examples. Here is an example that we will use to prove the Weierstrass
theorem.
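Define $h_n : \mathbb{R} \to \mathbb{R}$ by
\[ h_n(x) = \begin{cases} (1-x^2)^n, & \text{if } |x| \le 1 \\ 0, & \text{if } |x| > 1. \end{cases} \]
Let $c_n = \int_{-1}^{1} h_n$, and let $g_n = \frac{1}{c_n} h_n$. (A sketch of $h_n$, and of $g_n$, will aid in understanding.)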
It is immediate that this sequence (gn ) satisfies the first two properties above. For the third,
note that
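\begin{align*}
1 - t^2 &\ge 1 - t \quad\text{on } [0,1] \\
(1-t^2)^n &\ge (1-t)^n \\
\int_0^1 (1-t^2)^n\,dt &\ge \int_0^1 (1-t)^n\,dt = \frac{1}{n+1} \\
c_n = \int_{-1}^{1} (1-t^2)^n\,dt &\ge \frac{2}{n+1},
\end{align*}
and so, for 0 < δ < 1,
\[ \int_\delta^1 g_n(t)\,dt = \frac{1}{c_n}\int_\delta^1 (1-t^2)^n\,dt \le \frac{n+1}{2}\int_\delta^1 (1-\delta^2)^n\,dt = \frac{1-\delta}{2}(n+1)(1-\delta^2)^n \to 0 \]
as n → ∞. Similarly, $\int_{-1}^{-\delta} g_n \to 0$ as n → ∞. Hence $\int_{-\delta}^{\delta} g_n \to 1$.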
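The concentration of mass described by property (3) can be checked numerically for this kernel. The sketch below approximates c_n and the central mass by midpoint sums; the helper names and the sample values of n and δ are assumptions for illustration.

```python
def midpoint(fn, lo, hi, n=20000):
    """Midpoint Riemann sum of fn over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(fn(lo + (i + 0.5) * dx) for i in range(n)) * dx

def c(n):
    """c_n = integral of (1 - t^2)^n over [-1, 1]."""
    return midpoint(lambda t: (1 - t * t) ** n, -1.0, 1.0)

def central_mass(n, delta):
    """Integral of g_n = h_n / c_n over [-delta, delta]."""
    return midpoint(lambda t: (1 - t * t) ** n, -delta, delta) / c(n)

for n in (5, 50, 200):
    print(n, c(n), 2.0 / (n + 1), central_mass(n, 0.2))
```

The printed values let one compare c_n against the lower bound 2/(n + 1) and watch the mass on [−δ, δ] approach 1.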
Now we will state and prove the Weierstrass approximation theorem.

Theorem 26.1. Let f : [a, b] → R be continuous. There is a sequence of polynomials $p_n$
converging uniformly to f on [a, b].
Proof. The proof is easier if we first make some reductions. Suppose we prove the theorem
in the case that [a, b] = [0, 1]. Let f : [a, b] → R be continuous. Let ϕ(x) = a + (b − a)x and
ψ(x) = (x − a)/(b − a). Then ϕ and ψ are inverses of each other, and ϕ([0, 1]) = [a, b]. Now
f ◦ ϕ : [0, 1] → R is continuous, so we are assuming that there are polynomials qn converging
to f ◦ ϕ uniformly on [0, 1]. Then qn ◦ ψ converges to f ◦ ϕ ◦ ψ = f uniformly on [a, b], and
it is clear that qn ◦ ψ are also polynomials.
Now suppose that we can prove the theorem in the case that [a, b] = [0, 1] and with the
assumption that f (0) = f (1) = 0. Let f : [0, 1] → R be an arbitrary continuous function.
Let w(x) = f(0) + (f(1) − f(0))x. Then f − w is continuous on [0, 1], and vanishes at 0
and at 1. By our assumption, there is a sequence qn of polynomials converging to f − w
uniformly on [0, 1]. But then qn + w is a sequence of polynomials converging uniformly to f
on [0, 1].
The above remarks mean that if we can prove the theorem in the case where [a, b] = [0, 1] and
f(0) = f(1) = 0, then we will have proved the theorem in general. So now we consider such
a function f. Since f is continuous on the compact set [0, 1], it is bounded. Let |f| ≤ M
on [0, 1]. Extend the domain of f to all of R by setting f(t) = 0 for t ∉ [0, 1]. Then f is
continuous on R. Let $g_n(t) = (1/c_n)(1 - t^2)^n \chi_{[-1,1]}(t)$ be as above. For x ∈ [0, 1] define $p_n(x)$
by
\[ p_n(x) = \int_{\mathbb{R}} f(t+x)\, g_n(t)\,dt. \]
(Note that $p_n$ is defined just like $A_n$ via equation (∗) in our preliminary discussion.)
We now claim that pn converges to f uniformly on [0, 1]. For this, we use the properties
of gn as an approximate identity. Let ε > 0. Choose δ > 0 as in the definition of uniform
continuity of f on [0, 1]. Notice that because f vanishes outside the interval [0, 1], this δ
satisfies the definition of uniform continuity for f on all of R. By property (3) of approximate
identities, there is $n_0 \in \mathbb{N}$ such that $1 - \int_{-\delta}^{\delta} g_n < \varepsilon$ for $n \ge n_0$. Now, for any $n \ge n_0$ and any
x ∈ [0, 1], we have
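\begin{align*}
\bigl|p_n(x) - f(x)\bigr| &= \left|\int_{\mathbb{R}} f(t+x)g_n(t)\,dt - f(x)\int_{\mathbb{R}} g_n(t)\,dt\right| \\
&= \left|\int_{\mathbb{R}} \bigl(f(t+x) - f(x)\bigr)\,g_n(t)\,dt\right| \\
&\le \int_{\mathbb{R}} \bigl|f(t+x) - f(x)\bigr|\,g_n(t)\,dt \\
&= \int_{[-\delta,\delta]} \bigl|f(t+x) - f(x)\bigr|\,g_n(t)\,dt + \int_{[-1,-\delta]\cup[\delta,1]} \bigl|f(t+x)-f(x)\bigr|\,g_n(t)\,dt \\
&= C_1 + C_2,
\end{align*}
where we have restricted the integration to the interval [−1, 1] because $g_n$ vanishes outside
that interval. We estimate $C_1$ and $C_2$ separately. For |t| ≤ δ we have $|f(t+x) - f(x)| < \varepsilon$,
so
\[ C_1 \le \int_{-\delta}^{\delta} \varepsilon g_n \le \varepsilon \int_{\mathbb{R}} g_n = \varepsilon. \]
For |t| > δ we have $|f(t+x) - f(x)| \le 2M$, so
\[ C_2 \le 2M\int_{[-1,-\delta]\cup[\delta,1]} g_n = 2M\left(1 - \int_{-\delta}^{\delta} g_n\right) < 2M\varepsilon. \]
Thus $|p_n(x) - f(x)| < (2M+1)\varepsilon$ for all x ∈ [0, 1].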
We will finish the proof by showing that pn is a polynomial on [0, 1]. For x ∈ [0, 1],
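\begin{align*}
p_n(x) &= \int_{\mathbb{R}} f(t+x)g_n(t)\,dt \\
&= \int_{-x}^{1-x} f(t+x)g_n(t)\,dt, \quad\text{since $f = 0$ outside $[0,1]$,} \\
&= \int_0^1 f(u)g_n(u-x)\,du, \quad\text{by the change of variable $u = x+t$,} \\
&= \int_0^1 f(u)\frac{1}{c_n}\bigl(1-(u-x)^2\bigr)^n\,du, \quad\text{since $u - x\in[-1,1]$ when $u \in [0,1]$.}
\end{align*}
Note that $\frac{1}{c_n}\bigl(1-(u-x)^2\bigr)^n$ is a polynomial in u and x:
\[ \frac{1}{c_n}\bigl(1-(u-x)^2\bigr)^n = \sum_{i,j=0}^{2n} a_{ij}u^i x^j = \sum_{j=0}^{2n}\left(\sum_{i=0}^{2n} a_{ij}u^i\right)x^j. \]
It follows that
\[ p_n(x) = \sum_{j=0}^{2n}\left(\int_0^1 f(u)\sum_{i=0}^{2n} a_{ij}u^i\,du\right)x^j \]
is a polynomial in x. □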
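The construction in this proof is concrete enough to run. The following sketch evaluates p_n(x) from the last integral formula, using midpoint sums for both c_n and the integral over [0, 1], and reports the largest error on a grid. The target f(u) = sin(πu) is a hypothetical example satisfying f(0) = f(1) = 0, and the grid maximum only approximates the uniform norm.

```python
import math

def midpoint(fn, lo, hi, n=2000):
    """Midpoint Riemann sum of fn over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(fn(lo + (i + 0.5) * dx) for i in range(n)) * dx

def landau_approx(f, n):
    """Return x -> p_n(x) = (1/c_n) * integral_0^1 f(u) (1 - (u - x)^2)^n du, for x in [0, 1]."""
    c_n = midpoint(lambda t: (1 - t * t) ** n, -1.0, 1.0, 4000)
    def p(x):
        return midpoint(lambda u: f(u) * (1 - (u - x) ** 2) ** n, 0.0, 1.0) / c_n
    return p

def sup_error(f, n, grid=20):
    """Largest value of |p_n(x) - f(x)| over a uniform grid in [0, 1]."""
    p = landau_approx(f, n)
    return max(abs(p(i / grid) - f(i / grid)) for i in range(grid + 1))

f = lambda u: math.sin(math.pi * u)   # hypothetical target, vanishing at 0 and 1

for n in (10, 50, 200):
    print(n, sup_error(f, n))
```

The error decreases slowly with n, which is typical of this kernel: the proof gives uniform convergence but no fast rate.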

27. Uniform convergence and the interchange of limits


There are many limiting processes in analysis, and it is frequently the case that two of
them bump up against each other. We have seen this once already, in Theorem 19.7. Let’s
recall that statement: “Let f , fn : X → Rk , and let a ∈ X. Suppose that each fn is
continuous at a, and that fn → f uniformly. Then f is continuous at a.” We can rewrite
this in the following way:
\[ \lim_{x\to a}\,\lim_{n\to\infty} f_n(x) = \lim_{n\to\infty}\,\lim_{x\to a} f_n(x), \]
which shows that it is an example of the interchange of two limiting processes. We also
saw an example where the above equation does not hold — in that example, the sequence of
functions converges pointwise, but not uniformly. It is the uniform nature of the convergence
that makes the theorem true. This points out another aspect of such situations: the order
of two limiting processes may be reversed if appropriate conditions hold. As a general rule,
you should always verify such conditions explicitly when making such an interchange. From
the point of view of an instructor, the interchange of two limits in a solution is ALWAYS a
red flag, and must be justified in detail by the student.
Theorem 27.1. Let $f_n \in R[a,b]$, let f : [a, b] → R, and suppose that $f_n \to f$ uniformly on
[a, b]. Then f ∈ R[a, b], and $\int_a^b f = \lim_{n} \int_a^b f_n$.

Proof. Let ε > 0 be given. Let η = ε/(1 + 2(b − a)). Choose N such that $\|f_n - f\|_u < \eta$ for
n ≥ N. Let n ≥ N. Since $f_n \in R[a,b]$ there are step functions $g_0$, $h_0$ such that $g_0 \le f_n \le h_0$
on [a, b], and $\int_a^b (h_0 - g_0) < \eta$. Let $g = g_0 - \eta$ and $h = h_0 + \eta$. Then g and h are step
functions. If x ∈ [a, b], we have
\[ g(x) = g_0(x) - \eta \le f_n(x) - \eta < f(x) < f_n(x) + \eta \le h_0(x) + \eta = h(x). \]
Moreover,
\[ \int_a^b (h-g) = \int_a^b (h_0 - g_0 + 2\eta) = \int_a^b (h_0-g_0) + 2\eta(b-a) < \eta\bigl(1 + 2(b-a)\bigr) = \varepsilon. \]
It follows that f ∈ R[a, b]. Finally, we have
\[ \left|\int_a^b f_n - \int_a^b f\right| \le \int_a^b |f_n - f| \le \eta(b-a) < \varepsilon, \]
so we have that $\lim_{n\to\infty}\int_a^b f_n = \int_a^b f$. □

It is instructive to find an example of a sequence converging pointwise, but whose integrals
do not converge to the integral of the limit function.
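One candidate for such an example (an assumption of this sketch, not taken from the notes) is f_n(x) = n x (1 − x²)^n on [0, 1]. For each fixed x the values tend to 0, while a direct computation gives ∫ f_n = n/(2(n + 1)), which tends to 1/2. The code below checks both claims numerically with a midpoint sum.

```python
def f_n(n, x):
    """Hypothetical example: f_n(x) = n * x * (1 - x^2)^n on [0, 1]."""
    return n * x * (1 - x * x) ** n

def integral_f_n(n, samples=20000):
    """Midpoint approximation of the integral of f_n over [0, 1]."""
    dx = 1.0 / samples
    return sum(f_n(n, (i + 0.5) * dx) for i in range(samples)) * dx

for n in (5, 50, 200):
    # pointwise value at a fixed x, the integral, and the closed form n / (2(n+1))
    print(n, f_n(n, 0.3), integral_f_n(n), n / (2.0 * (n + 1)))
```

The convergence here cannot be uniform: the maximum of f_n grows with n, which is exactly what lets the integrals stay away from 0.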
We now turn to the derivative of the limit of a sequence of differentiable functions. Here,
the situation is more complicated: uniform convergence seems to have nothing to do even
with differentiability of the limit, let alone with convergence to the derivative of the limit
if the limit is differentiable. For example, we know from the Weierstrass approximation
theorem that every continuous function is a uniform limit of polynomials, which are certainly
differentiable. But the continuous limit need not be differentiable. This is an indication that
we need to assume more to get a theorem analogous to the previous one, but for derivatives.
The key idea is that we have to assume that the sequence of derivatives converges uniformly.
Actually, even though this is a very strong hypothesis, it is not quite enough. For example,
any sequence of constant functions has derivatives that converge uniformly (since they are
all identically zero), but the sequence of functions need not converge at all.
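A standard illustration of the first point (chosen here as an assumption; the notes do not name it) is f_n(x) = sin(nx)/√n. These functions converge uniformly to 0 because |f_n| ≤ 1/√n, yet the derivatives f_n'(x) = √n cos(nx) are unbounded at x = 0. A minimal sketch:

```python
import math

def f_n(n, x):
    """Hypothetical example: f_n(x) = sin(n x) / sqrt(n), uniformly small for large n."""
    return math.sin(n * x) / math.sqrt(n)

def df_n(n, x):
    """Derivative of f_n: sqrt(n) * cos(n x)."""
    return math.sqrt(n) * math.cos(n * x)

for n in (100, 10000):
    grid_max = max(abs(f_n(n, i / 100.0)) for i in range(1000))
    print(n, grid_max, df_n(n, 0.0))
```

So uniform convergence of the functions alone gives no control over the derivatives, which motivates the hypotheses of the next theorem.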

Theorem 27.2. Let I be an interval, and let $f_n : I \to \mathbb{R}$ be differentiable. Suppose that $(f_n')$
converges uniformly on I to a function g. Suppose additionally that there is a ∈ I such that
the sequence of function values $\bigl(f_n(a)\bigr)$ converges. Then $(f_n)$ converges to a differentiable
function f, and $f' = g$. Moreover, the convergence of $f_n$ to f is uniform on any bounded
subinterval of I.

Proof. We first show that there is a function f to which (fn ) converges, and that this con-
vergence is uniform on bounded subintervals.
Let ε > 0 be given. Choose N so that
$\|f_n' - f_m'\|_u < \varepsilon$ and $|f_n(a) - f_m(a)| < \varepsilon$ for all m, n ≥ N. Let J ⊆ I be a bounded
subinterval; say |x| ≤ M for x ∈ J. If x ∈ J, we have
\begin{align*}
\bigl|f_n(x) - f_m(x)\bigr| &= \bigl|(f_n - f_m)(x)\bigr| \\
&\le \bigl|(f_n-f_m)(x) - (f_n-f_m)(a)\bigr| + \bigl|f_n(a) - f_m(a)\bigr| \\
&= \bigl|(f_n-f_m)'(c)\bigr|\,|x-a| + \bigl|f_n(a) - f_m(a)\bigr|, \quad\text{for some $c$ between $x$ and $a$,} \\
&\le \varepsilon|x-a| + \varepsilon \\
&= \varepsilon\bigl(|x-a| + 1\bigr) \\
&\le \varepsilon(M + |a| + 1).
\end{align*}
Thus (fn ) is uniformly Cauchy on J, and hence converges uniformly on J (and pointwise on
all of I too).
Let f be the limit of $f_n$. We now show that f is differentiable, and that $f' = g$. Let
ε > 0. Choose N so that $\|f_n' - f_m'\|_u < \varepsilon/3$ for m, n ≥ N. Letting m → ∞, we see also that
$\|f_n' - g\|_u \le \varepsilon/3$. Now fix n ≥ N, and fix x ∈ I. For any h ≠ 0 such that x + h ∈ I, we have
\begin{align*}
\left|\frac{f_n(x+h)-f_n(x)}{h} - \frac{f(x+h)-f(x)}{h}\right| &= \lim_{m\to\infty}\left|\frac{f_n(x+h)-f_n(x)}{h} - \frac{f_m(x+h)-f_m(x)}{h}\right| \\
&= \lim_{m\to\infty}\left|\frac{(f_n-f_m)(x+h) - (f_n-f_m)(x)}{h}\right| \\
&= \lim_{m\to\infty}\bigl|(f_n-f_m)'(x+\theta h)\bigr|, \quad\text{for some $\theta\equiv\theta(n,m,x,h)\in(0,1)$,} \\
&\le \frac{\varepsilon}{3}.
\end{align*}
Next we use the differentiability of $f_n$ (at x) to choose δ > 0 such that
\[ \left|\frac{f_n(x+h)-f_n(x)}{h} - f_n'(x)\right| < \frac{\varepsilon}{3}, \]
whenever 0 < |h| < δ and x + h ∈ I. Then for such h we have
\begin{align*}
\left|\frac{f(x+h)-f(x)}{h} - g(x)\right| &\le \left|\frac{f(x+h)-f(x)}{h} - \frac{f_n(x+h)-f_n(x)}{h}\right| \\
&\quad + \left|\frac{f_n(x+h)-f_n(x)}{h} - f_n'(x)\right| + \bigl|f_n'(x) - g(x)\bigr| \\
&< \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.
\end{align*}
Therefore $\lim_{h\to 0} \bigl(f(x+h)-f(x)\bigr)/h = g(x)$. Hence f is differentiable, and $f'(x) = g(x)$. □


As a last example of the interchange of two limiting processes, we give a result on differen-
tiating an integral. For this we recall from earlier experience the notion of partial derivative.
Let f : [a, b] × [c, d] → R, and suppose that for each y ∈ [c, d] the function x ↦ f(x, y) is
differentiable on [a, b]. The partial derivative of f with respect to x is defined by
\[ \frac{\partial f}{\partial x}(x,y) = \lim_{h\to 0}\frac{f(x+h,y) - f(x,y)}{h}. \]
Theorem 27.3. Let f : [a, b] × [c, d] → R be continuous, and suppose that ∂f/∂x exists and
is continuous on [a, b] × [c, d]. Let G : [a, b] → R be defined by $G(x) = \int_c^d f(x,y)\,dy$. Then
G is differentiable on [a, b], and $G'(x) = \int_c^d (\partial f/\partial x)(x,y)\,dy$.

Proof. Let ε > 0. Since ∂f /∂x is continuous on the compact set [a, b] × [c, d], it is uniformly
continuous. Let δ > 0 be as in the definition of uniform continuity for ∂f /∂x on [a, b] × [c, d]
and for the positive quantity ε/(d − c). Now if x, x + h ∈ [a, b] with 0 < |h| < δ, then
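\begin{align*}
\left|\frac{G(x+h)-G(x)}{h} - \int_c^d \frac{\partial f}{\partial x}(x,y)\,dy\right| &= \left|\int_c^d\left(\frac{f(x+h,y) - f(x,y)}{h} - \frac{\partial f}{\partial x}(x,y)\right)dy\right| \\
&= \left|\int_c^d\left(\frac{\partial f}{\partial x}(x+\theta h, y) - \frac{\partial f}{\partial x}(x,y)\right)dy\right|, \quad\text{for some $\theta \equiv \theta(x,y,h)\in(0,1)$,} \\
&\le \frac{\varepsilon}{d-c}(d-c) = \varepsilon.
\end{align*}
It follows that $G'(x) = \int_c^d (\partial f/\partial x)(x,y)\,dy$. □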
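A quick numerical sanity check of differentiation under the integral sign: the sketch below takes the hypothetical integrand f(x, y) = sin(xy) on [0, 1] × [0, 1], computes G by a midpoint sum in y, and compares a central difference quotient of G with the integral of ∂f/∂x = y cos(xy). All function names are illustrative assumptions.

```python
import math

def integrate_y(fn, c, d, n=2000):
    """Midpoint Riemann sum of fn over [c, d]."""
    dy = (d - c) / n
    return sum(fn(c + (j + 0.5) * dy) for j in range(n)) * dy

def G(x):
    """G(x) = integral over y in [0, 1] of sin(x y)."""
    return integrate_y(lambda y: math.sin(x * y), 0.0, 1.0)

def G_prime_by_theorem(x):
    """Integral over y in [0, 1] of the partial derivative y * cos(x y)."""
    return integrate_y(lambda y: y * math.cos(x * y), 0.0, 1.0)

def G_prime_by_difference(x, h=1e-5):
    """Central difference quotient of G at x."""
    return (G(x + h) - G(x - h)) / (2 * h)

print(G_prime_by_theorem(0.7), G_prime_by_difference(0.7))
```

The two values agree to many digits, as the theorem predicts for a continuous partial derivative on a compact rectangle.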

28. Infinite series


Definition 28.1. Let $(a_n)_{n\in\mathbb{N}}$ be a sequence in R. The infinite series $\sum_{n=1}^{\infty} a_n$ is defined as
follows. For each n let $s_n = \sum_{i=1}^{n} a_i$; $s_n$ is called the nth partial sum of the series. The infinite
series is the sequence $(s_n)_{n\in\mathbb{N}}$ of partial sums. The series $\sum_{n=1}^{\infty} a_n$ converges (diverges) if the
sequence of partial sums converges (diverges). The sum of a convergent series is the limit of
the sequence of partial sums. The sum is usually denoted by $\sum_{n=1}^{\infty} a_n$.
Remark 28.2. Frequently, an infinite series uses N ∪ {0} as index set. Other intervals in Z
are also used.
Remark 28.3. The Cauchy criterion for convergence of real sequences can be translated
into the following criterion for convergence of series: $\sum a_n$ converges if and only if for every
ε > 0, there exists $n_0 \in \mathbb{N}$ such that for all $n_0 \le m \le n$ we have $\left|\sum_{i=m}^{n} a_i\right| < \varepsilon$.

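Theorem 28.4. (Test for divergence.) Let $\sum_{n=1}^{\infty} a_n$ be an infinite series. If the series
converges, then $\lim_{n\to\infty} a_n = 0$.
Proof. Suppose that $\sum_{n=1}^{\infty} a_n$ converges. Then
\begin{align*}
\lim_{n\to\infty} a_n = \lim_{n\to\infty}(s_n - s_{n-1}) &= \lim_{n\to\infty} s_n - \lim_{n\to\infty} s_{n-1}, \quad\text{since both of these limits exist,} \\
&= \sum_{n=1}^{\infty} a_n - \sum_{n=1}^{\infty} a_n = 0. \qquad\square
\end{align*}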


Example 28.5. (1) The series $\sum_{n=0}^{\infty} (-1)^n$ diverges, since $\lim_{n\to\infty}(-1)^n$ does not exist
(and hence is not equal to zero).
(2) The series $\sum_{n=1}^{\infty} n^{-1/n}$ diverges, since $\lim_{n\to\infty} n^{-1/n} = 1/\bigl(\lim_{n\to\infty}\sqrt[n]{n}\bigr) = 1$ is nonzero.
Proposition 28.6. If $\sum a_n$ and $\sum b_n$ converge, then so does $\sum(\lambda a_n + \mu b_n)$, and
$\sum(\lambda a_n + \mu b_n) = \lambda\sum a_n + \mu\sum b_n$.
Proof. This follows immediately from the corresponding results for sequences. 

Theorem 28.7. (Geometric series.) Let x ∈ R. The series $\sum_{n=0}^{\infty} x^n$ converges if and only if
|x| < 1, in which case its sum is 1/(1 − x). (The number x is called the ratio of the geometric
series.)
Proof. First suppose that |x| < 1. Note that
\[ x s_n = \sum_{i=1}^{n+1} x^i = s_n + x^{n+1} - 1, \]
and hence
\[ s_n = \frac{1 - x^{n+1}}{1-x}. \]
Since |x| < 1, we have $\lim_{n\to\infty} x^{n+1} = 0$. Thus $\lim_{n\to\infty} s_n = 1/(1-x)$.
Conversely, if |x| ≥ 1, then $(x^n)$ does not converge to zero, so the series diverges by the test for
divergence. □
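The closed form for the partial sums is easy to confirm numerically. The sketch below sums the terms one by one and compares the result with (1 − x^(n+1))/(1 − x) and with the limit 1/(1 − x); the function names and sample ratios are illustrative assumptions.

```python
def geometric_partial_sum(x, n):
    """s_n = 1 + x + ... + x^n, summed term by term."""
    total, term = 0.0, 1.0
    for _ in range(n + 1):
        total += term
        term *= x
    return total

def closed_form(x, n):
    """(1 - x^(n+1)) / (1 - x), valid for x != 1."""
    return (1 - x ** (n + 1)) / (1 - x)

for x in (0.5, -0.9):
    print(x, geometric_partial_sum(x, 50), closed_form(x, 50), 1 / (1 - x))
```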
Remark 28.8. Generally, it is very difficult to find the sum of a convergent infinite series.
Often we content ourselves with being able to prove convergence (or divergence). Geometric
series are one of the exceptions. For the next family of series, we will not worry about the
value of the sum. First we make an observation about series with nonnegative terms.
Lemma 28.9. Let $\sum_{n=1}^{\infty} a_n$ be an infinite series, and suppose that $a_n \ge 0$ for all n. Then
the series converges if and only if the sequence of partial sums is bounded.
Proof. The sequence of partial sums is increasing, since sn+1 = sn + an+1 ≥ sn . We already
know that a monotone sequence converges if and only if it is bounded. 
Theorem 28.10. (p series) Let p ∈ R. The series $\sum_{n=1}^{\infty} 1/n^p$ converges if and only if p > 1.
Proof. If p ≤ 0, then the terms of the series are nondecreasing. In this case, $1/n^p$ cannot converge
to zero, so the series diverges. Now assume that p > 0. In this case the terms of the series
are decreasing. Let $a_n = 1/n^p$, and consider two cases. First, suppose that p > 1. We have
that
\begin{align*}
\sum_{i=1}^{2^n-1} a_i &= a_1 + (a_2+a_3) + (a_4 + \cdots + a_7) + \cdots + (a_{2^{n-1}} + \cdots + a_{2^n-1}) \\
&\le a_1 + 2a_2 + 4a_4 + \cdots + 2^{n-1}a_{2^{n-1}},
\end{align*}
since the terms are decreasing. Hence
\[ s_{2^n-1} \le \sum_{j=0}^{n-1} 2^j a_{2^j} = \sum_{j=0}^{n-1} 2^{j-jp} = \sum_{j=0}^{n-1}\bigl(2^{1-p}\bigr)^j. \]
This last is the partial sum of a geometric series with ratio $2^{1-p}$. Since p > 1, the ratio is less
than 1, and hence the geometric series converges. It follows that the partial sums of $\sum 1/n^p$
are bounded, hence it converges.
Next we suppose that 0 < p ≤ 1. We have that
\begin{align*}
\sum_{i=1}^{2^n} a_i &= a_1 + a_2 + (a_3+a_4) + (a_5+\cdots+a_8) + \cdots + (a_{2^{n-1}+1} + \cdots + a_{2^n}) \\
&\ge a_2 + 2a_4 + 4a_8 + \cdots + 2^{n-1}a_{2^n},
\end{align*}
again since the terms are decreasing. Hence
\[ s_{2^n} \ge \sum_{j=0}^{n-1} 2^j a_{2^{j+1}} = \sum_{j=0}^{n-1} 2^{j-(j+1)p} = 2^{-p}\sum_{j=0}^{n-1}\bigl(2^{1-p}\bigr)^j. \]
Since p ≤ 1, the ratio of this last geometric series is $2^{1-p} \ge 1$, hence it diverges. It follows
that the partial sums of $\sum 1/n^p$ are unbounded, so it diverges also. □
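The two grouping estimates in this proof can be checked directly for sample exponents. The sketch below compares the partial sums with the geometric bounds derived above; the function names and the choices p = 2, p = 1 and n = 12 are assumptions for illustration.

```python
def partial_sum(p, N):
    """s_N = sum of 1/i^p for i = 1..N."""
    return sum(1.0 / i ** p for i in range(1, N + 1))

def condensed_upper(p, n):
    """sum_{j=0}^{n-1} (2^(1-p))^j: the upper bound for s_{2^n - 1} when p > 1."""
    return sum((2.0 ** (1 - p)) ** j for j in range(n))

def condensed_lower(p, n):
    """2^(-p) * sum_{j=0}^{n-1} (2^(1-p))^j: the lower bound for s_{2^n} when 0 < p <= 1."""
    return 2.0 ** (-p) * condensed_upper(p, n)

n = 12
print(partial_sum(2, 2 ** n - 1), condensed_upper(2, n))   # bounded case, p = 2
print(partial_sum(1, 2 ** n), condensed_lower(1, n))       # harmonic series, p = 1
```

For p = 1 the lower bound is n/2, so the partial sums of the harmonic series grow at least logarithmically in the number of terms.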
Example 28.11. We particularly draw attention to the case p = 1: $\sum_{n=1}^{\infty} 1/n$ is called the
harmonic series, and it diverges.
Having a few series whose behavior is known makes it relatively easy to establish conver-
gence or divergence of other series.
Theorem 28.12. (Comparison test.) Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be series, and suppose that
$a_n, b_n \ge 0$ for all n. Suppose further that $a_n \le b_n$ for all n.
(1) If $\sum_{n=1}^{\infty} b_n$ converges, then so does $\sum_{n=1}^{\infty} a_n$.
(2) If $\sum_{n=1}^{\infty} a_n$ diverges, then so does $\sum_{n=1}^{\infty} b_n$.
(In fact, both conclusions hold if the corresponding inequalities are valid only for $n \ge n_0$.)
Proof. Let $s_n = \sum_{i=1}^{n} a_i$ and $t_n = \sum_{i=1}^{n} b_i$. Since $a_i \le b_i$ for all i, we know that $s_n \le t_n$
for all n. If $\sum_{n=1}^{\infty} b_n$ converges, then $(t_n)$ is bounded above, and hence so is $(s_n)$. Therefore
$\sum_{n=1}^{\infty} a_n$ converges. If $\sum_{n=1}^{\infty} a_n$ diverges, then $(s_n)$ is unbounded, and hence so is $(t_n)$. It
follows that $\sum_{n=1}^{\infty} b_n$ diverges. □
Definition 28.13. Let $\sum_{n=1}^{\infty} a_n$ be a convergent infinite series. It converges absolutely if
the series $\sum_{n=1}^{\infty} |a_n|$ converges. It converges conditionally if $\sum_{n=1}^{\infty} |a_n|$ diverges.
Thus all infinite series may be classified into three (mutually exclusive) types: absolutely
convergent, conditionally convergent, and divergent. Most of the previous results concern
series whose terms are nonnegative; hence they are useful for establishing absolute conver-
gence. It is appropriate to think of absolute convergence as “robust”, and of conditional
convergence as “touchy”. In this course we will only treat absolute convergence (for lack
of time). However a few remarks about conditional convergence are in order. The simplest
example of a conditionally convergent series is the alternating harmonic series:
\[ \frac{1}{1} - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots. \]
It is an easy exercise to prove that this series converges. Since its absolute value is the har-
monic series that we already know diverges, the alternating harmonic series is conditionally
convergent. The difference between absolute and conditional convergence can be illustrated
by the fact that the sum of the alternating harmonic series can be altered by changing the
order in which the terms are added. This somewhat counter-intuitive fact can be explained
by noticing that the “subseries” of odd terms by itself is divergent, as is the subseries of even
terms. This phenomenon is completely general: the sum of an absolutely convergent series is
unaffected by rearranging the terms, while the sum, and even convergence, of a conditionally
convergent series can be changed arbitrarily by a suitable rearrangement of the terms. We
will not discuss this theorem further.
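The effect of rearrangement on the alternating harmonic series can be observed numerically. In the usual order the sum is ln 2; taking two positive terms followed by one negative term uses exactly the same terms but converges to (3/2) ln 2. That value is a standard fact quoted here without proof, and the sketch below only compares partial sums against it.

```python
import math

def alternating_harmonic(N):
    """Partial sum 1 - 1/2 + 1/3 - ... with N terms, in the usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, N + 1))

def rearranged(blocks):
    """Same terms, reordered as two positive terms followed by one negative term."""
    total = 0.0
    for k in range(1, blocks + 1):
        total += 1.0 / (4 * k - 3) + 1.0 / (4 * k - 1) - 1.0 / (2 * k)
    return total

print(alternating_harmonic(200000), math.log(2))
print(rearranged(100000), 1.5 * math.log(2))
```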
The next two theorems give the most useful tests for absolute convergence.
Theorem 28.14. (Ratio test.) Let $\sum_{n=1}^{\infty} a_n$ be a series of nonzero terms. Let
\[ L = \limsup_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right|. \]
(1) If L < 1 then the series converges absolutely.
(2) If there exists $n_0 \in \mathbb{N}$ such that $|a_{n+1}/a_n| \ge 1$ for all $n \ge n_0$, then the series diverges.

Proof. (1) Choose r such that L < r < 1. Let $n_0 \in \mathbb{N}$ be such that $|a_{n+1}/a_n| \le r$ for
$n \ge n_0$. Then for k ≥ 0 we have
\[ |a_{n_0+k}| \le r|a_{n_0+k-1}| \le r^2|a_{n_0+k-2}| \le \cdots \le r^k|a_{n_0}|. \]
It follows that for $n \ge n_0$ we have $|a_n| \le Cr^n$, where $C = |a_{n_0}|/r^{n_0}$. The absolute
convergence of $\sum a_n$ now follows by comparison with the geometric series of
ratio r.
(2) In this case, the hypotheses imply that $|a_n| \ge |a_{n_0}|$ for all $n \ge n_0$, and hence $a_n$ does
not tend to zero. □

Theorem 28.15. (Root test.) Let $\sum_{n=1}^{\infty} a_n$ be a series. Let
\[ L = \limsup_{n\to\infty}\sqrt[n]{|a_n|}. \]
(1) If L < 1 then the series converges absolutely.
(2) If L > 1 then the series diverges.

Proof. (1) Choose r such that L < r < 1. Let $n_0 \in \mathbb{N}$ be such that $\sqrt[n]{|a_n|} \le r$ for $n \ge n_0$.
Then $|a_n| \le r^n$ for $n \ge n_0$. The absolute convergence of $\sum a_n$ now follows by
comparison with the geometric series of ratio r.
(2) Since L > 1, there are infinitely many n with $\sqrt[n]{|a_n|} > 1$. Then $|a_n| > 1$ for infinitely
many n, and hence $a_n$ does not tend to zero. □

The ratio and root tests are useful in situations where the series converges at least as
strongly as some geometric series. Note that both tests are inconclusive for the p series
(exercise!). The ratio test is usually easier to apply, but the root test is more effective: if the ratio
test indicates convergence, then the root test does too. There are series for which the ratio
test is inconclusive, but the root test indicates convergence (exercises).
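One possible example of the last phenomenon (an assumption of this sketch, and one answer to the exercise among many) is a_n = 2^(−n + (−1)^n). The consecutive ratios alternate between 2 and 1/8, so the lim sup of the ratios is 2 and neither part of the ratio test applies, while the nth roots tend to 1/2, so the root test gives convergence.

```python
def a(n):
    """Hypothetical example: a_n = 2^(-n + (-1)^n)."""
    return 2.0 ** (-n + (-1) ** n)

def ratios(N):
    """a_{n+1} / a_n for n = 1, ..., N - 1."""
    return [a(n + 1) / a(n) for n in range(1, N)]

def roots(N):
    """nth roots of a_n for n = 1, ..., N."""
    return [a(n) ** (1.0 / n) for n in range(1, N + 1)]

print(sorted(set(ratios(100))))   # the distinct values taken by the ratios
print(roots(1000)[-1])            # the 1000th root of a_1000
```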
29. Series of functions
Definition 29.1. Let X be a metric space, and let $f_n : X \to \mathbb{R}$. The series of functions
$\sum_{n=1}^{\infty} f_n$ converges pointwise (uniformly) if the sequence of partial sums converges pointwise
(uniformly).
It is easy to translate results about convergence of sequences of functions to statements
about series of functions.
Theorem 29.2. (Cauchy criterion) $\sum_{n=1}^{\infty} f_n$ converges uniformly on X if and only if for
every ε > 0 there exists $n_0 \in \mathbb{N}$ such that for all $m, n \ge n_0$ and for all x ∈ X,
$\left|\sum_{i=m}^{n} f_i(x)\right| < \varepsilon$.
The following criterion for uniform convergence is called the Weierstrass M-test:

Corollary 29.3. Let $\sum_{n=1}^\infty f_n$ be a series of functions on $X$. Let $(M_n)_{n=1}^\infty$ be a sequence of
constants such that $|f_n(x)| \le M_n$ for all $x \in X$ and all $n$. If $\sum_{n=1}^\infty M_n$ converges, then
$\sum_{n=1}^\infty f_n$ converges uniformly.
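A small numerical sketch of the M-test follows. The functions $f_i(x) = \sin(ix)/i^2$ with $M_i = 1/i^2$ are an assumed example, and the supremum over $x$ is only approximated on a finite grid. The point is that a block of the series is bounded, uniformly in $x$, by the corresponding block of $\sum M_i$, which is what the Cauchy criterion needs.

```python
import math

def partial_sum(x: float, m: int, n: int) -> float:
    """Sum of f_i(x) = sin(i x) / i^2 for i = m..n (illustrative choice of f_i)."""
    return sum(math.sin(i * x) / i**2 for i in range(m, n + 1))

# Sample grid in [0, 2*pi]; a finite stand-in for "all x in X".
xs = [k * 2 * math.pi / 200 for k in range(201)]
m, n = 50, 400

sup_block = max(abs(partial_sum(x, m, n)) for x in xs)
bound = sum(1.0 / i**2 for i in range(m, n + 1))  # block of the series of M_i

print(sup_block, bound)
```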

We also have the following facts as immediate corollaries of the theorems on uniform
convergence.
Theorem 29.4. Let $f_n : X \to \mathbb{R}$ be functions.
(1) If $f_n$ is continuous for all $n$, and $\sum f_n$ converges uniformly, then $\sum f_n$ is continuous.
(2) If $X = [a, b]$, $f_n \in \mathcal{R}[a, b]$ for all $n$, and $\sum f_n$ converges uniformly, then $\sum f_n \in \mathcal{R}[a, b]$, and $\int_a^b \sum f_n = \sum \int_a^b f_n$.
(3) If $X = [a, b]$, $f_n$ is differentiable for all $n$, $\sum f_n'$ converges uniformly, and $\sum f_n(x_0)$
converges for some $x_0 \in [a, b]$, then $\sum f_n$ is differentiable, and $\left( \sum f_n \right)' = \sum f_n'$.

30. Power series


Definition 30.1. Let $x_0 \in \mathbb{R}$ and let $(a_n)_{n=0}^\infty$ be a real sequence. The series $\sum_{n=0}^\infty a_n (x - x_0)^n$
is called a power series with center $x_0$. The domain of convergence of the power series is
the set $D = \{ x \in \mathbb{R} : \sum_{n=0}^\infty a_n (x - x_0)^n \text{ converges} \}$.
Theorem 30.2. Let $\sum_{n=0}^\infty a_n (x - x_0)^n$ be a power series. There are three possibilities.
(1) The series converges absolutely for all $x \in \mathbb{R}$. The convergence is uniform on compact
sets.
(2) There is $R > 0$ such that the series converges absolutely if $|x - x_0| < R$, and diverges
if $|x - x_0| > R$. The convergence is uniform on compact subsets of the interval
$(x_0 - R, x_0 + R)$.
(3) The series diverges for all $x \ne x_0$.
Proof. Let $x, y \in \mathbb{R}$ with $|y - x_0| < |x - x_0|$. First suppose that the series converges at $x$.
Then $\lim_{n\to\infty} a_n (x - x_0)^n = 0$, so there is $M$ such that $|a_n (x - x_0)^n| \le M$ for all $n$. Now we
have
\[ |a_n (y - x_0)^n| = |a_n (x - x_0)^n| \left| \frac{y - x_0}{x - x_0} \right|^n. \]
Thus $\sum a_n (y - x_0)^n$ converges absolutely by comparison with the geometric series $\sum M r^n$,
where $r = |y - x_0|/|x - x_0| < 1$. The contrapositive of the above implication shows that, on
the other hand, if the series diverges at $y$, then it must also diverge at $x$.
Now, if cases (1) and (3) do not apply, let $R = \sup \{ |x - x_0| : \text{the series converges at } x \}$.
It follows from the above that $0 < R < \infty$, and that the series converges absolutely for
$|x - x_0| < R$ and diverges for $|x - x_0| > R$. We let $R = \infty$ in case (1), and we let $R = 0$ in
case (3). In cases (1) and (2), if $K \subseteq \{ x : |x - x_0| < R \}$ is a compact set, then there are
$r, s$ such that $0 < r < s < R$, and $K \subseteq (x_0 - r, x_0 + r)$. Let $x = x_0 + s$, and let $M$ be as
above: $|a_n (x - x_0)^n| \le M$ for all $n$. Let $M_n = M (r/s)^n$. Then $\sum_n M_n < \infty$. For any $y \in K$
we have
\[ |a_n (y - x_0)^n| = |a_n (x - x_0)^n| \left| \frac{y - x_0}{x - x_0} \right|^n \le M \left( \frac{r}{s} \right)^n = M_n. \]
Then the convergence is uniform on $K$ by the Weierstrass M-test. □
Definition 30.3. The number R in Theorem 30.2 is called the radius of convergence of the
power series.
We see that $(x_0 - R, x_0 + R) \subseteq D \subseteq [x_0 - R, x_0 + R]$, where $D$ is the domain of convergence
of the power series.
Theorem 30.4. $R = \left( \limsup_{n\to\infty} \sqrt[n]{|a_n|} \right)^{-1}$.
Proof. We use the root test: $\limsup_{n\to\infty} |a_n (x - x_0)^n|^{1/n} = \limsup_{n\to\infty} |a_n|^{1/n} |x - x_0|$.
This is less than 1 if $|x - x_0| < \left( \limsup_{n\to\infty} \sqrt[n]{|a_n|} \right)^{-1}$, and greater than 1 if $|x - x_0| >
\left( \limsup_{n\to\infty} \sqrt[n]{|a_n|} \right)^{-1}$. This identifies the number $R$ as in the statement of the theorem. □
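The formula of Theorem 30.4 can be tried on a case where $\lim |a_n|^{1/n}$ does not exist, so that the lim sup is really needed. The coefficients below ($a_n = 2^n$ for even $n$, $a_n = 1$ for odd $n$) are an assumed example, and the lim sup is replaced by a supremum over a finite tail, which is only a heuristic stand-in.

```python
def coeff(n: int) -> float:
    """Illustrative coefficients: a_n = 2^n for even n, a_n = 1 for odd n."""
    return 2.0 ** n if n % 2 == 0 else 1.0

def tail_sup_root(n0: int, n1: int) -> float:
    """sup of |a_n|^(1/n) over n0 <= n <= n1, a finite stand-in for the lim sup."""
    return max(abs(coeff(n)) ** (1.0 / n) for n in range(n0, n1 + 1))

L = tail_sup_root(100, 200)
R_estimate = 1.0 / L
print(R_estimate)  # close to 1/2; the odd-index roots equal 1 and are ignored by the sup
```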
Theorem 30.5. Let $\sum a_n (x - x_0)^n$ have a positive radius of convergence $R$. Then $\sum n a_n (x - x_0)^{n-1}$
and $\sum \frac{a_n}{n+1} (x - x_0)^{n+1}$ also have radius of convergence $R$. If $f : (x_0 - R, x_0 + R) \to \mathbb{R}$
is defined by $f(x) = \sum_{n=0}^\infty a_n (x - x_0)^n$, then these converge to $f'(x)$ and $F(x)$ for
$x \in (x_0 - R, x_0 + R)$ (where $F(x) = \int_{x_0}^x f(t)\,dt$).
Proof. For $x \ne x_0$,
\[ \sum_{n=0}^\infty n a_n (x - x_0)^{n-1} = \sum_{n=0}^\infty \frac{n a_n}{x - x_0} (x - x_0)^n. \]
Since $\lim_n n^{1/n} = \lim_n |x - x_0|^{1/n} = 1$, we have
\[ \limsup_{n\to\infty} \left| \frac{n a_n}{x - x_0} \right|^{1/n} = \limsup_{n\to\infty} |a_n|^{1/n} = R^{-1}. \]
For the antidifferentiated series, first note that
\[ 1 \le (n + 1)^{1/n} = \left( 1 + \frac{1}{n} \right)^{1/n} n^{1/n} \le 2^{1/n} n^{1/n} \to 1 \quad \text{as } n \to \infty. \]
Now we have
\[ \sum_{n=0}^\infty \frac{a_n}{n+1} (x - x_0)^{n+1} = \sum_{n=0}^\infty \left( \frac{a_n (x - x_0)}{n+1} \right) (x - x_0)^n, \]
and hence
\[ \limsup_{n\to\infty} \left| \frac{a_n (x - x_0)}{n+1} \right|^{1/n} = \limsup_{n\to\infty} |a_n|^{1/n} = R^{-1}. \]
Since the convergence of all three series is uniform on compact subsets of the interior of
the domain of convergence, we may differentiate or integrate term-by-term, by Theorem
29.4. □
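As a sanity check on Theorem 30.5, the sketch below takes the geometric series $\sum x^n = 1/(1-x)$ (so $a_n = 1$, $x_0 = 0$, $R = 1$) and compares partial sums of the differentiated and antidifferentiated series at the assumed sample point $x = 1/2$ with $f'(x) = 1/(1-x)^2$ and $F(x) = -\log(1-x)$. The truncation at 200 terms is an arbitrary choice.

```python
import math

x, N = 0.5, 200  # sample point inside (-1, 1) and number of terms (both assumed)

f = sum(x**n for n in range(N))                   # partial sum of sum x^n
df = sum(n * x**(n - 1) for n in range(1, N))     # term-by-term derivative
F = sum(x**(n + 1) / (n + 1) for n in range(N))   # term-by-term antiderivative

print(f, 1 / (1 - x))
print(df, 1 / (1 - x) ** 2)
print(F, -math.log(1 - x))
```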

31. Compactness in function space


Let X be a compact metric space (we will only consider the case where X is compact).
Recall that $C(X, \mathbb{R}^k)$ is a complete metric space with the uniform norm: $\|f - g\|_u =
\sup_{x \in X} \|f(x) - g(x)\|$. It is important to remember that the Heine-Borel theorem does not
hold in this setting: it is possible for a closed bounded set to be non-compact. In particular,
the closed unit ball $B = \{ f \in C(X, \mathbb{R}^k) : \|f\|_u \le 1 \}$ is (usually) not compact.
Here is an example, with $X = [0, 1]$. Let $f_n(x) = x^n$. Then $f_n \in B$. If $B$ were compact,
then the sequence $(f_n)$ in $B$ would have a convergent subsequence, that is, a uniformly
convergent subsequence. But we have already seen that there is no such subsequence, since
the pointwise limit of $f_n$ exists and is discontinuous.
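The failure of uniform convergence in this example can also be seen directly: substituting $t = x^n$ gives $\sup_{x \in [0,1]} |x^n - x^{2n}| = \sup_{t \in [0,1]} (t - t^2) = 1/4$ for every $n$, so $(f_n)$ is not uniformly Cauchy. The sketch below approximates this supremum on a uniform grid; the grid size and the sample values of $n$ are assumptions.

```python
def sup_dist(n: int, m: int, grid: int = 10_000) -> float:
    """Approximate sup over [0, 1] of |x^n - x^m| using a uniform grid."""
    return max(abs((k / grid) ** n - (k / grid) ** m) for k in range(grid + 1))

# Distance between f_n and f_{2n} in the uniform norm, for a few n.
dists = [sup_dist(n, 2 * n) for n in (1, 5, 25)]
print(dists)  # each value is close to 1/4
```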
We will characterize the compact subsets of $C(X, \mathbb{R}^k)$. Recall that a set is compact if
and only if it is complete and totally bounded. Since $C(X, \mathbb{R}^k)$ is already a complete metric
space, a subset is complete if and only if it is closed. Therefore we will focus our attention
on the property of total boundedness: how can we describe in a more intrinsic way what it
means for a subset of $C(X, \mathbb{R}^k)$ to be totally bounded?
Let $\mathcal{F} \subseteq C(X, \mathbb{R}^k)$ be totally bounded. Let $\varepsilon > 0$. Then there are $f_1, \ldots, f_n \in \mathcal{F}$ such
that $\mathcal{F} \subseteq \bigcup_{i=1}^n B_\varepsilon(f_i)$. Since $X$ is compact, and the $f_i$ are continuous, they are uniformly
continuous. Thus for each $i$ there is $\delta_i > 0$ such that for all $x, y \in X$, if $d(x, y) < \delta_i$ then
$\|f_i(x) - f_i(y)\| < \varepsilon$. Let $\delta = \min\{\delta_1, \ldots, \delta_n\}$. We claim that for any function $f \in \mathcal{F}$, this $\delta$
works in the definition of uniform continuity. To see this, let $f \in \mathcal{F}$, and let $x, y \in X$ with
$d(x, y) < \delta$. There is $i_0$, $1 \le i_0 \le n$, such that $\|f - f_{i_0}\|_u < \varepsilon$. Then
\[ \|f(x) - f(y)\| \le \|f(x) - f_{i_0}(x)\| + \|f_{i_0}(x) - f_{i_0}(y)\| + \|f_{i_0}(y) - f(y)\| < \varepsilon + \varepsilon + \varepsilon = 3\varepsilon. \]
Thus we have shown that the functions in the family $\mathcal{F}$ are “equally uniformly continuous”.
This phrase has been shortened to “equicontinuous”.
Definition 31.1. Let $\mathcal{F}$ be a family of functions between metric spaces $X$ and $Y$. Let
$x_0 \in X$.
(1) $\mathcal{F}$ is equicontinuous at $x_0$ if for each $\varepsilon > 0$ there is $\delta > 0$ such that for each $f \in \mathcal{F}$
and for all $x \in X$, if $d_X(x, x_0) < \delta$ then $d_Y\big(f(x), f(x_0)\big) < \varepsilon$. (I.e. $\delta$ is independent
of the choice of $f \in \mathcal{F}$.)
(2) $\mathcal{F}$ is equicontinuous (on $X$) if it is equicontinuous at each point of $X$.
(3) $\mathcal{F}$ is uniformly equicontinuous (on $X$) if for each $\varepsilon > 0$ there is $\delta > 0$ such that for
each $f \in \mathcal{F}$ and for all $x, z \in X$, if $d_X(x, z) < \delta$ then $d_Y\big(f(x), f(z)\big) < \varepsilon$.
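To see what the definition rules out, consider again the family $f_n(x) = x^n$ on $[0, 1]$ at the point $x_0 = 1$. For a fixed $\delta$, the quantity $|f_n(1) - f_n(1 - \delta)| = 1 - (1 - \delta)^n$ tends to $1$ as $n$ grows, so no single $\delta$ can serve every $f_n$ once $\varepsilon < 1$. The sketch below checks this for one assumed value of $\delta$ and a finite range of $n$; it illustrates the failure but does not prove it.

```python
def osc_at_one(n: int, delta: float) -> float:
    """|f_n(1) - f_n(1 - delta)| for f_n(x) = x^n."""
    return 1.0 - (1.0 - delta) ** n

delta = 1e-3  # an arbitrary candidate delta
worst = max(osc_at_one(n, delta) for n in range(1, 20001))
print(worst)  # close to 1: this delta fails for epsilon = 1/2, say
```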
Exercise 31.2. If X is compact, and F is equicontinuous, then F is uniformly equicontin-
uous.
Because of this exercise, when X is compact we need not distinguish between equicontinu-
ity and uniform equicontinuity. We remark that there are stupid examples of equicontinuous
families. For example, in C(X, R), we may consider the family of all constant functions.
This family is clearly equicontinuous, but is not totally bounded (or even bounded). For this
reason we identify another property of a family of functions.
Definition 31.3. $\mathcal{F} \subseteq C(X, \mathbb{R}^k)$ is pointwise bounded if for each $x \in X$, the set $\mathcal{F}(x) :=
\{ f(x) : f \in \mathcal{F} \}$ is a bounded subset of $\mathbb{R}^k$.
Exercise 31.4. If $\mathcal{F} \subseteq C(X, \mathbb{R}^k)$ is pointwise bounded and equicontinuous, then $\mathcal{F}$ is a
bounded subset (of $C(X, \mathbb{R}^k)$).
We remark that a totally bounded subset of $C(X, \mathbb{R}^k)$ is also bounded, and hence pointwise
bounded. Thus we have already proved the following result.
Lemma 31.5. Let $X$ be compact and $\mathcal{F} \subseteq C(X, \mathbb{R}^k)$. If $\mathcal{F}$ is totally bounded, then $\mathcal{F}$ is
pointwise bounded and equicontinuous.
The Arzela-Ascoli theorem is the converse of the lemma. It is usually phrased in terms of
precompactness: a subset of a metric space is precompact if its closure is compact. In the
setting of $C(X, \mathbb{R}^k)$, then, precompactness is the same as total boundedness.
Theorem 31.6. Let $X$ be a compact metric space, and let $\mathcal{F} \subseteq C(X, \mathbb{R}^k)$. Then $\mathcal{F}$ is
precompact if and only if it is pointwise bounded and equicontinuous.
Proof. As remarked above, we have already proved the “only if” direction. So we assume that
$\mathcal{F}$ is pointwise bounded and equicontinuous. We use Exercise 31.2; hence $\mathcal{F}$ is uniformly
equicontinuous. Let $\varepsilon > 0$. Choose $\delta > 0$ as in the definition of uniform equicontinuity
of $\mathcal{F}$. Since $X$ is compact, $X$ is totally bounded. Then there are $x_1, \ldots, x_p \in X$ such
that $X = \bigcup_{i=1}^p B_\delta(x_i)$. Now we use the pointwise boundedness of $\mathcal{F}$. For each $i$, the set
$\mathcal{F}(x_i) = \{ f(x_i) : f \in \mathcal{F} \}$ is a bounded subset of $\mathbb{R}^k$, hence is totally bounded (by Lemma
14.27). Then the union $\bigcup_{i=1}^p \mathcal{F}(x_i)$ is also totally bounded. So we can choose points $y_1, \ldots,
y_q \in \mathbb{R}^k$ such that
\[ \bigcup_{i=1}^p \mathcal{F}(x_i) \subseteq \bigcup_{j=1}^q B_\varepsilon(y_j). \]
Now we come to the interesting part of the argument. Let $f \in \mathcal{F}$. For each $i$, choose $j$
such that $f(x_i) \in B_\varepsilon(y_j)$. This defines a function $\eta_f : \{1, 2, \ldots, p\} \to \{1, 2, \ldots, q\}$. Thus $\eta_f$
satisfies the formula
\[ f(x_i) \in B_\varepsilon(y_{\eta_f(i)}). \]
But notice that there are only a finite number of possible functions $\eta : \{1, 2, \ldots, p\} \to
\{1, 2, \ldots, q\}$. For each such function $\eta$, let
\[ C_\eta = \{ f \in \mathcal{F} : \eta_f = \eta \}. \]
Then $\mathcal{F} \subseteq \bigcup_\eta C_\eta$, a finite union. Each $C_\eta$ is a subset of $C(X, \mathbb{R}^k)$. To finish the proof, we
will show that $C_\eta$ has diameter at most $4\varepsilon$. Let $f, g \in C_\eta$, for some $\eta$. Then for $i = 1, \ldots,
p$, we have $f(x_i), g(x_i) \in B_\varepsilon(y_{\eta(i)})$. For any $x \in X$ choose $i$ with $x \in B_\delta(x_i)$. Then
\[ \|f(x) - g(x)\| \le \|f(x) - f(x_i)\| + \|f(x_i) - g(x_i)\| + \|g(x_i) - g(x)\| < \varepsilon + \|f(x_i) - g(x_i)\| + \varepsilon, \]
by the uniform equicontinuity of $\mathcal{F}$ (and the choice of $\delta$),
\[ < \varepsilon + 2\varepsilon + \varepsilon, \]
since $f(x_i)$ and $g(x_i)$ belong to a ball of radius $\varepsilon$. Thus $\|f - g\|_u < 4\varepsilon$. Therefore $C_\eta$ has
diameter at most $4\varepsilon$. □

32. Conditional convergence


Theorem 32.1. (Abel’s theorem.) Let $\sum a_n$ have bounded partial sums, and let $(b_n)$ be a
decreasing nonnegative sequence. If $M \ge 0$ is such that $\left| \sum_{j=1}^n a_j \right| \le M$ for all $n$, then
for any $m \le n$ we have the estimate $\left| \sum_{j=m}^n a_j b_j \right| \le 2M b_m$. Moreover, if $b_n \to 0$, then
$\sum a_n b_n$ converges.
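Proof. Let $s_n = \sum_{j=1}^n a_j$ (with $s_0 = 0$). We have
\begin{align*}
\sum_{j=m}^n a_j b_j &= \sum_{j=m}^n (s_j - s_{j-1}) b_j \\
&= \sum_{j=m}^n s_j b_j - \sum_{j=m-1}^{n-1} s_j b_{j+1} \\
&= \sum_{j=m}^{n-1} s_j (b_j - b_{j+1}) + s_n b_n - s_{m-1} b_m.
\end{align*}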
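We then have
\begin{align*}
\left| \sum_{j=m}^n a_j b_j \right| &\le \left| \sum_{j=m}^{n-1} s_j (b_j - b_{j+1}) \right| + |s_n b_n| + |s_{m-1} b_m| \\
&\le \sum_{j=m}^{n-1} M (b_j - b_{j+1}) + M b_n + M b_m, \quad \text{since } b_j - b_{j+1},\ b_n, \text{ and } b_m \ge 0, \\
&= 2M b_m.
\end{align*}
If $b_m \to 0$ as $m \to \infty$, the series $\sum a_n b_n$ converges by the Cauchy criterion. □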
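The summation-by-parts identity and the estimate $2M b_m$ can be checked numerically. In the sketch below the choices $a_j = \sin j$ and $b_j = 1/j$ are assumptions made for illustration, and $M$ is taken as the largest $|s_n|$ over the finite range computed, which is all the proof uses for blocks inside that range.

```python
import math
from itertools import accumulate

N = 2000
a = [math.sin(j) for j in range(1, N + 1)]  # a_1..a_N (illustrative choice)
b = [1.0 / j for j in range(1, N + 1)]      # decreasing, nonnegative, tends to 0
s = [0.0] + list(accumulate(a))             # s[n] = a_1 + ... + a_n, with s[0] = 0
M = max(abs(v) for v in s)                  # bound on the partial sums up to N

def block(m: int, n: int) -> float:
    """Sum of a_j b_j for j = m..n (1-based indices)."""
    return sum(a[j - 1] * b[j - 1] for j in range(m, n + 1))

def by_parts(m: int, n: int) -> float:
    """Right-hand side of the summation-by-parts identity from the proof."""
    inner = sum(s[j] * (b[j - 1] - b[j]) for j in range(m, n))
    return inner + s[n] * b[n - 1] - s[m - 1] * b[m - 1]

pairs = [(1, 10), (5, 500), (100, 2000)]
for m, n in pairs:
    print(m, n, block(m, n), by_parts(m, n), 2 * M * b[m - 1])
```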

Remark 32.2. The estimate in Abel’s theorem also shows that the “remainder” after $m$
terms satisfies $\left| \sum_{j>m} a_j b_j \right| \le 2M b_{m+1}$.

Corollary 32.3. (Alternating series test.) Let $(b_n)$ be a decreasing sequence with limit 0.
Then the alternating series
\[ b_1 - b_2 + b_3 - \cdots = \sum_{n=1}^\infty (-1)^{n-1} b_n \]
converges, and $\left| \sum_{j=n+1}^\infty (-1)^{j-1} b_j \right| \le b_{n+1}$.

Proof. With $a_n = (-1)^{n-1}$, Abel’s theorem proves convergence, and gives the estimate with a
factor of 2. However, since the partial sums of $\sum a_n$ are all non-negative (either 0 or 1), the
estimate in that proof can be improved as in the statement of the corollary. We leave the
details to the interested reader. □
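The remainder estimate of the corollary is easy to test numerically. The sketch below takes $b_n = 1/n$ (an assumed choice) and uses a long partial sum as a stand-in for the infinite sum; the same bound $b_{n+1}$ applies to any finite block starting at index $n+1$, so the comparison is legitimate even though the stand-in is not the exact limit.

```python
def partial(n: int) -> float:
    """s_n = sum_{j=1}^n (-1)^(j-1) / j."""
    return sum((-1) ** (j - 1) / j for j in range(1, n + 1))

far = partial(200_000)  # finite stand-in for the infinite sum

# (n, |far - s_n|, b_{n+1}) for a few n
checks = [(n, abs(far - partial(n)), 1.0 / (n + 1)) for n in (1, 10, 100, 1000)]
for n, err, bound in checks:
    print(n, err, bound)
```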
Example 32.4. (1) (The alternating harmonic series.) $\sum_{n=1}^\infty (-1)^{n-1}/n = 1 - 1/2 +
1/3 - 1/4 + 1/5 - \cdots$ converges by the alternating series test. (We will see later that
the sum is $\log 2$.)
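(2) Let $\theta$ be an irrational number. (In fact, the argument we present applies to any
non-integral real number $\theta$.) In the following, we will apply the formula for the sum
of a finite geometric series to complex numbers.
\begin{align*}
\sum_{j=1}^n \sin 2\pi j\theta &= \operatorname{Im} \sum_{j=0}^n (\cos 2\pi j\theta + i \sin 2\pi j\theta) \\
&= \operatorname{Im} \sum_{j=0}^n (\cos 2\pi\theta + i \sin 2\pi\theta)^j \\
&= \operatorname{Im} \left( \frac{1 - (\cos 2\pi\theta + i \sin 2\pi\theta)^{n+1}}{1 - (\cos 2\pi\theta + i \sin 2\pi\theta)} \right).
\end{align*}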
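Hence
\begin{align*}
\left| \sum_{j=1}^n \sin 2\pi j\theta \right| &\le \frac{2}{\left| 1 - (\cos 2\pi\theta + i \sin 2\pi\theta) \right|} \\
&= \frac{2}{\sqrt{(1 - \cos 2\pi\theta)^2 + \sin^2 2\pi\theta}} \\
&= \sqrt{\frac{2}{1 - \cos 2\pi\theta}}.
\end{align*}
Thus the series $\sum_n \sin 2\pi n\theta$ has bounded partial sums. By Abel’s theorem, the series
$\sum_n \frac{\sin 2\pi n\theta}{n}$ converges.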
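The bound on the partial sums of $\sum \sin 2\pi n\theta$ can be checked for a particular $\theta$. The sketch below takes $\theta = \sqrt{2}$ and a finite range of $n$, both of which are assumptions; it tracks the largest partial sum seen and compares it with $\sqrt{2/(1 - \cos 2\pi\theta)}$.

```python
import math

theta = math.sqrt(2)  # an illustrative irrational theta
bound = math.sqrt(2.0 / (1.0 - math.cos(2 * math.pi * theta)))

running, largest = 0.0, 0.0
for j in range(1, 100_001):
    running += math.sin(2 * math.pi * j * theta)  # partial sum up to j
    largest = max(largest, abs(running))

print(largest, bound)
```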
Abel also proved the following theorem on the behavior of a power series at an endpoint
of the interval of convergence.
Theorem 32.5. Let $\sum_{n=0}^\infty a_n (x - x_0)^n$ have radius of convergence $0 < R < \infty$. Suppose that
the series converges at an endpoint of the interval of convergence. Then the series converges
uniformly on the closed interval from $x_0$ to that endpoint.
Corollary 32.6. With the hypotheses of the theorem, let f (x) denote the sum of the series
in its domain of convergence. Then f is continuous.
Proof. (of theorem) A linear change of variables reduces the theorem to the case where $x_0 = 0$
and $R = 1$. We consider the case where the series converges at the right-hand endpoint;
the other case has a similar proof. Thus we have a power series $\sum_{n=0}^\infty a_n x^n$ with radius of
convergence 1, and such that $\sum a_n$ converges. Let $\varepsilon > 0$ be given. Applying the Cauchy
criterion to $\sum a_n$, we obtain $n_0 \in \mathbb{N}$ such that for all $n_0 \le m \le n$ we have
\[ \left| \sum_{j=m}^n a_j \right| < \frac{\varepsilon}{2}. \]
For any $x \in [0, 1]$, the sequence $(x^n)$ is decreasing. We apply Abel’s theorem to the series
$\sum_{j=n_0}^\infty a_j x^j$ to get
\[ \left| \sum_{j=m}^n a_j x^j \right| \le 2 \cdot \frac{\varepsilon}{2}\, x^m \le \varepsilon, \]
a uniform estimate. □
Example 32.7. (1) From the geometric series $1/(1 - x) = \sum_{n=0}^\infty x^n$ for $|x| < 1$, we
integrate term-by-term to obtain
\[ -\log(1 - x) = \int_0^x \frac{dt}{1 - t} = \sum_{n=0}^\infty \int_0^x t^n \, dt = \sum_{n=0}^\infty \frac{x^{n+1}}{n+1} = \sum_{n=1}^\infty \frac{x^n}{n}, \]
still with radius of convergence equal to 1. Replacing $x$ by $-x$ we get
\[ (\ast\ast) \qquad \log(1 + x) = \sum_{n=1}^\infty (-1)^{n-1} \frac{x^n}{n}, \]
valid for $|x| < 1$. When $x = 1$ we have the alternating harmonic series, which
converges. By Abel’s theorem, the power series converges uniformly on $[0, 1]$, and
hence the limit is continuous. Since the equality in (∗∗) holds on [0, 1), and both
sides are continuous on [0, 1], the equality must hold at x = 1. This gives
\[ \log 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots. \]
(2) Again starting with the geometric series, we replace $x$ by $-x^2$ to get
\[ \frac{1}{1 + x^2} = \sum_{n=0}^\infty (-1)^n x^{2n}. \]
Since $|-x^2| < 1$ if and only if $|x| < 1$, this equation is also valid for $|x| < 1$. Now we
integrate term-by-term to get
\[ \arctan x = \int_0^x \frac{dt}{1 + t^2} = \sum_{n=0}^\infty \frac{(-1)^n}{2n + 1}\, x^{2n+1}, \]
valid for $|x| < 1$. Again, the series converges for $x = 1$ by the alternating series test.
By Abel’s theorem, the sum of the series is continuous on $[0, 1]$, and so the above equation is
still valid at $x = 1$. We obtain the classical series
\[ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots. \]
(3) We consider $f(x) = (1 + x)^\alpha$ for $\alpha > 0$, $\alpha \notin \mathbb{N}$. Repeated differentiation gives
$f^{(n)}(x) = \alpha(\alpha - 1) \cdots (\alpha - n + 1)(1 + x)^{\alpha - n}$. Thus the Taylor series for $f$ is given by
\[ 1 + \sum_{n=1}^\infty \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!}\, x^n. \]
Letting $a_n = \alpha(\alpha - 1) \cdots (\alpha - n + 1)/n!$, we compute $|a_{n+1}/a_n| = |\alpha - n|/(n + 1) \to 1$
as $n \to \infty$. Thus the ratio test implies that the radius of convergence of the series is
1. If we apply the convergence test of homework #49, we find that (for $n > \alpha$)
\[ n \left( \left| \frac{\alpha - n}{n + 1} \right| - 1 \right) = \frac{n}{n + 1}\,(-1 - \alpha) \to -1 - \alpha \]
as $n \to \infty$. Thus since $\alpha > 0$, the Taylor series converges at $x = \pm 1$.
Now we are faced with the following difficulty. Let $g(x) = 1 + \sum_{n=1}^\infty a_n x^n$ be the
sum of the Taylor series of f . Then f and g are both defined on [−1, 1], and are
continuous (by Abel’s theorem, in the case of g). We would like to know that they
are equal. In the previous two examples, we knew which function the power series
represented because we began with the geometric series. The other method that
we have seen involves using Taylor’s theorem to prove that the Taylor polynomials
converge to the function uniformly on compact subsets of the (interior of the) interval
of convergence. In the current example, it isn’t apparent how to bound the derivatives
of f so as to use Taylor’s theorem. Instead, we will present a clever trick, courtesy
of Folland’s Real Analysis.
Notice that $f'(x) = \alpha(1 + x)^{\alpha - 1}$, and hence $\alpha^{-1}(1 + x) f'(x) = (1 + x)^\alpha$. Since
we intend to prove that $g(x) = (1 + x)^\alpha$, we investigate this expression using $g$.
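Differentiating term-by-term on $(-1, 1)$, we get
\begin{align*}
\alpha^{-1}(1 + x) g'(x) &= \alpha^{-1}(1 + x) \sum_{n=1}^\infty n a_n x^{n-1} \\
&= \alpha^{-1} \left( \sum_{n=0}^\infty (n + 1) a_{n+1} x^n + \sum_{n=1}^\infty n a_n x^n \right) \\
&= \alpha^{-1} \left( a_1 + \sum_{n=1}^\infty \big( (n + 1) a_{n+1} + n a_n \big) x^n \right).
\end{align*}
Note that $a_1 = \alpha$, and for $n \ge 1$,
\begin{align*}
(n + 1) a_{n+1} + n a_n &= (n + 1)\, \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)(\alpha - n)}{(n + 1)!} + n\, \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!} \\
&= \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!}\, \big( (\alpha - n) + n \big) \\
&= a_n \alpha.
\end{align*}
Hence
\[ \alpha^{-1} \big( (n + 1) a_{n+1} + n a_n \big) = a_n. \]
Thus we see that $\alpha^{-1}(1 + x) g'(x) = 1 + \sum_{n=1}^\infty a_n x^n = g(x)$. It follows that
\begin{align*}
\frac{d}{dx} \big( (1 + x)^{-\alpha} g(x) \big) &= -\alpha(1 + x)^{-\alpha - 1} g(x) + (1 + x)^{-\alpha} g'(x) \\
&= -\alpha(1 + x)^{-\alpha - 1} \alpha^{-1}(1 + x) g'(x) + (1 + x)^{-\alpha} g'(x) \\
&= 0.
\end{align*}
Thus $(1 + x)^{-\alpha} g(x)$ is constant on $(-1, 1)$. Since $g(0) = 1$, the constant is 1, and we have
$g(x) = (1 + x)^\alpha = f(x)$ on $(-1, 1)$. By continuity, they are
equal on $[-1, 1]$.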
Finally, we remark that letting $t = 1 + x \in [0, 2]$, we obtain $t^\alpha = 1 + \sum_{n=1}^\infty a_n (t - 1)^n$,
the series converging uniformly on $[0, 2]$. This is another explicit demonstration of
Weierstrass’ approximation theorem (for these functions).
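The identity $g(x) = (1 + x)^\alpha$ from part (3) can be checked numerically. The sketch below is a minimal illustration with the assumed value $\alpha = 1/2$; the coefficients are generated by the recursion $a_{n+1} = a_n (\alpha - n)/(n + 1)$, which follows from the formula for $a_n$. Convergence is fast inside $(-1, 1)$ and slow at the endpoints, so the endpoint comparisons are only approximate.

```python
def binomial_partial_sum(alpha: float, x: float, terms: int) -> float:
    """1 + sum_{n=1}^{terms} a_n x^n with a_n = alpha(alpha-1)...(alpha-n+1)/n!."""
    total, a_n, power = 1.0, 1.0, 1.0
    for n in range(terms):
        a_n *= (alpha - n) / (n + 1)  # a_{n+1} = a_n * (alpha - n) / (n + 1)
        power *= x                    # x^(n+1)
        total += a_n * power
    return total

alpha = 0.5  # illustrative exponent with alpha > 0 and alpha not an integer
for x in (0.5, -0.5, 1.0, -1.0):
    print(x, binomial_partial_sum(alpha, x, 2000), (1 + x) ** alpha)
```

At $x = -1$ every term after the first is negative, so the partial sums decrease to $0$ slowly, which is why the agreement there is visibly worse than at interior points.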