JACK SPIELBERG
Contents
1. Axioms for the real numbers 2
2. Cardinality (briefly) 8
3. Decimal representation of real numbers 9
4. Metric spaces 11
5. The topology of metric spaces 14
6. Sequences 17
7. Continuous functions 19
8. Limits of functions 21
9. Sequences in R 22
10. Limsup and liminf 24
11. Infinite limits and limits at infinity 26
12. Cauchy sequences and complete metric spaces 27
13. Compactness 28
14. The Cantor set 34
15. Connectedness 35
16. Continuity and compactness 37
17. Continuity and connectedness 38
18. Uniform continuity 39
19. Convergence of functions 40
20. Differentiation 42
21. Higher order derivatives and Taylor’s theorem 48
22. The Riemann integral 50
23. The “Darboux” approach 53
24. Measure zero and integration 56
25. The fundamental theorem of calculus 60
26. The Weierstrass approximation theorem 62
27. Uniform convergence and the interchange of limits 65
28. Infinite series 68
29. Series of functions 71
30. Power series 72
31. Compactness in function space 73
32. Conditional convergence 75
(n choose j) = n!/(j!(n − j)!).
We will assume familiarity with this stuff. It is interesting, though, to consider what is
actually included in the phrase “this stuff.” What facts from high school algebra are covered
by the field axioms? Here is an example of something that is not covered.
Example 1.2. Let F be a field. Are the elements 1, 1 + 1, 1 + 1 + 1, 1 + 1 + 1 + 1, . . .
all distinct? In fact, if we just have the field axioms, we can neither prove nor disprove
that these are all distinct elements. Notice that these are what we normally refer to as the
natural numbers (denoted N). So it isn’t clear that the natural numbers even make sense in
an arbitrary field.
Exercise 1.3. Explain why the “fact” stated in the previous example is true.
NOTES, MAT 472, INTERMEDIATE ANALYSIS, FALL 2010 3
(8) |x − a| < r if and only if a − r < x < a + r (draw a picture on the number line).
(9) If a < x < b and a < y < b then |x − y| < b − a.
(10) Let x ∈ F . Suppose that |x| < ε for every positive element ε ∈ F . Then x = 0.
Property 10 above can be strengthened a bit, in a way that can be very useful. (Don’t
cite property 10 when proving this.)
Exercise 1.6. Let F be an ordered field, and let x ∈ F. Suppose that p, q ∈ F^+ are such that for every ε ∈ F with 0 < ε < p, we have |x| < qε. Then x = 0.
Remark 1.7. Here is another consequence of the ordered field axioms. Let b > 0. Then
(1 + b)^n = 1 + nb + · · · > nb.
Now let 0 < a < 1. Then 1/a > 1, so (1 − a)/a = 1/a − 1 > 0. Let b = 1/a − 1. Then a = 1/(1 + b), and
a^n = 1/(1 + b)^n < 1/(nb) = (a/(1 − a)) · (1/n).
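As a quick sanity check (an illustration only, not part of the notes), the bound a^n < a/((1 − a)n) can be tested numerically in Python:

```python
# Numerical sanity check of a**n < a / ((1 - a) * n) for 0 < a < 1.
a = 0.9
for n in (10, 100, 1000):
    bound = a / ((1 - a) * n)
    assert a ** n < bound      # the inequality derived above
    print(n, a ** n, bound)
```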
Now we ask the following question (assuming some familiarity with the concept of limit, but only for the sake of the discussion): if 0 < a < 1, does a^n tend towards 0 as n → ∞? Another way to put this is to ask: if c is any fixed positive element, does there exist n0 ∈ N such that a^n < c for all n ≥ n0? Using the above computations, we see that we can answer this question affirmatively if we could show that for any fixed positive element c, there exists n0 ∈ N such that a/((1 − a)n) < c for all n ≥ n0. Now observe that we could do this if we could find n0 ∈ N such that a/((1 − a)n0) < c. In other words, we could prove that a^n → 0 if we could find n0 ∈ N such that n0 > a/((1 − a)c). But since c is an arbitrary positive element, then so is a/((1 − a)c). So this all comes down to trying to prove that for any positive element x, there is a natural number n0 such that n0 > x. An ordered field in which this is true is called Archimedean.
Definition 1.8. Let F be an ordered field. F is called Archimedean if for every x ∈ F there
exists a natural number n such that x < n.
It is evident that Q is an Archimedean ordered field, and we “know” that R is one too.
But we can’t prove it yet, because not all ordered fields are Archimedean!! In other words,
we don’t yet have enough axioms for the real numbers, since we can’t prove the most basic
fact from advanced calculus. Along with the field and order axioms, there is one more axiom
that is necessary to characterize the real numbers. We need some definitions before we can
present it.
Definition 1.9. Let F be an ordered field, let S ⊆ F , and let x ∈ F .
(1) x is an upper bound of S if y ≤ x for every y ∈ S.
(2) x is a lower bound of S if y ≥ x for every y ∈ S.
(3) S is bounded above if there exists an upper bound for S.
(4) S is bounded below if there exists a lower bound for S.
(5) S is bounded if it is bounded above and below.
Exercise 1.10. Is the empty set bounded?
natural numbers march off arbitrarily far to the right. Our first theorem about the real
numbers is this fact. As we pointed out before, the proof must rely on the completeness
axiom, since not all ordered fields are Archimedean.
Theorem 1.17. R is Archimedean: for every x ∈ R there exists n ∈ N such that x < n.
Proof. We suppose that R is not Archimedean, and derive a contradiction. So let x ∈ R
be such that x ≥ n for all n ∈ N. This just means that x is an upper bound for N. Thus
the (non-empty) subset N of R is bounded above. By the completeness axiom, N has a
supremum. Let z = sup(N). Now z − 1 < z. By Definition 1.11 (2″), there is an element
n ∈ N with n > z − 1. But then n + 1 > z. Since n + 1 ∈ N, this contradicts Definition 1.11
(1). Therefore R is Archimedean.
We now present some corollaries of the Archimedean property.
Corollary 1.18. If x ∈ R with x > 0, then there exists n ∈ N with 1/n < x.
Proof. By the Archimedean property there is n ∈ N with n > 1/x. Then 1/n < x.
Before stating the next corollary, we recall the well-ordering principle (WOP) and one of
its variations. The WOP states that a non-empty subset of N contains a smallest element.
This is a fundamental property of the natural numbers — it is logically equivalent to the
principle of mathematical induction. The variation we need states that a non-empty subset
of Z that is bounded below (in Z) contains a smallest element.
Corollary 1.19. For x ∈ R there exists a unique n ∈ Z with n ≤ x < n + 1.
Proof. Let x ∈ R. By the Archimedean property there is m ∈ N with m > |x|. Then
x > −m, so the set {k ∈ Z : k > x} is non-empty and bounded below (by −m). Let n + 1
be its smallest element. Then n + 1 > x. But since n < n + 1, n is not in this set, so
n ≤ x. This proves existence. For uniqueness, suppose that n and n′ both do the job. Then x − 1 < n, n′ ≤ x, so (by property 9 of absolute value) we have |n − n′| < 1. Since n, n′ ∈ Z then n = n′.
The integer n of Corollary 1.19 is denoted [x]. The function [·] : R → Z is called the greatest integer function. (Some people denote it by ⌊x⌋; ⌊·⌋ is also called the floor function.)
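In floating-point terms, the greatest integer function of Corollary 1.19 behaves like Python's `math.floor`; the snippet below (an illustration, not part of the notes) checks the defining inequality n ≤ x < n + 1:

```python
import math

# [x] = math.floor(x): the unique integer n with n <= x < n + 1.
for x in (2.7, -2.7, 3.0, -0.5):
    n = math.floor(x)
    assert n <= x < n + 1      # the defining property from Corollary 1.19
    print(x, n)
```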
Corollary 1.20. For x ∈ R and for N ∈ N there exists a unique n ∈ Z such that n/N ≤ x < (n + 1)/N.
Proof. Apply Corollary 1.19 to Nx.
Corollary 1.21. For x, ε ∈ R with ε > 0, there exists y ∈ Q such that |x − y| < ε.
Proof. By Corollary 1.18 there is N ∈ N with 1/N < ε. By Corollary 1.20 there is n ∈ Z such that n/N ≤ x < (n + 1)/N. Let y = n/N. Then y ∈ Q, and |x − y| = x − y < (n + 1)/N − n/N = 1/N < ε.
The conclusion of Corollary 1.21 is often expressed as: Q is dense in R.
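The proof of Corollary 1.21 is constructive, and the construction can be sketched in Python (the helper name `rational_near` is mine, for illustration only): pick N with 1/N < ε, then take y = n/N with n = [Nx].

```python
import math
from fractions import Fraction

def rational_near(x, eps):
    """A rational y with |x - y| < eps, following the proof of Cor. 1.21."""
    N = math.floor(1 / eps) + 1      # Corollary 1.18: 1/N < eps
    n = math.floor(N * x)            # Corollary 1.19: n <= N*x < n + 1
    return Fraction(n, N)            # then |x - n/N| < 1/N < eps

y = rational_near(math.pi, 1e-4)
assert abs(math.pi - y) < 1e-4
print(y)
```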
The completeness axiom is actually stronger than the Archimedean property. The next
result does not follow from the Archimedean property (as can be seen from the fact that the
conclusion does not hold in Q).
Theorem 1.22. Let n ∈ N. Every positive real number has a unique positive nth root.
Proof. We first prove uniqueness. If 0 < y < z then y^n < z^n, so two distinct positive real numbers cannot be nth roots of the same real number. We now prove existence. Let a > 1. (If 0 < a < 1, then 1/a > 1. In this case, if we show that 1/a has a positive nth root, then the inverse of that root will be a positive nth root for a.) Let E = {x ≥ 0 : x^n ≤ a}. We note that E ≠ ∅ since 1 ∈ E. We claim that E is bounded above. To see this, note that if x ∈ E then
x^n ≤ a ≤ na < 1 + na + (n choose 2)a^2 + · · · = (1 + a)^n.
Therefore x < 1 + a, and we see that 1 + a is an upper bound for E. Thus the completeness
axiom implies that y = sup(E) exists. We will show that y^n = a, finishing the proof.
First note that y ≥ 1, since 1 ∈ E. We will use Exercise 1.6. Let 0 < ε < 1. Note that since y − ε < y < y + ε, we have
(1) (y − ε)^n < y^n < (y + ε)^n.
Since y − ε < y, property (2″) of Definition 1.11 implies that there is x ∈ E with y − ε < x. Then (y − ε)^n < x^n ≤ a. Also, since y + ε > y then y + ε ∉ E, and hence a < (y + ε)^n. Therefore
(2) (y − ε)^n < a < (y + ε)^n.
From (1) and (2), and property 9 of absolute value, we have |y^n − a| < (y + ε)^n − (y − ε)^n.
We have
(y + ε)^n − (y − ε)^n = Σ_{j=0}^{n} (n choose j) (y^{n−j} ε^j − y^{n−j} (−ε)^j)
= Σ_{j=0}^{n} (n choose j) y^{n−j} ε^j (1 − (−1)^j)
= 2 Σ_{j=1, j odd}^{n} (n choose j) y^{n−j} ε^j
< 2 (Σ_{j=1, j odd}^{n} (n choose j) y^{n−j}) ε, since ε < 1.
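Although the argument is axiomatic, the supremum construction in the existence proof can be illustrated numerically. The sketch below (names and method are mine; bisection stands in for the abstract supremum) approximates y = sup E for E = {x ≥ 0 : x^n ≤ a} and checks that y^n is close to a:

```python
def nth_root_via_sup(a, n, tol=1e-12):
    """Approximate sup E, E = {x >= 0 : x**n <= a}, for a > 1, by
    bisection on [0, 1 + a]; 1 + a is the upper bound for E found above."""
    lo, hi = 0.0, 1.0 + a
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** n <= a:      # mid is in E: move the lower end up
            lo = mid
        else:                  # mid is an upper bound for E
            hi = mid
    return lo

y = nth_root_via_sup(5.0, 3)
assert abs(y ** 3 - 5.0) < 1e-9   # y is (numerically) a cube root of 5
print(y)
```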
Thus the irrational numbers are also dense in R. While Corollary 1.21 and Theorem 1.23
treat the rational and irrational numbers symmetrically, in fact the set of irrational numbers
is much bigger than the set of rationals (Corollary 2.12). Before proving this, we will first
review some basic facts about the size of sets.
2. Cardinality (briefly)
Definition 2.1. Let A and B be sets.
(1) A and B are equivalent, written A ∼ B, if there exists a bijection from A to B. In
this case, A and B are said to be of the same cardinality.
(2) A is subequivalent to B, written A ≾ B, if there is a one-to-one function from A to B.
The proof of the following proposition is elementary.
Proposition 2.2. For any sets A, B and C,
• A ∼ A.
• If A ∼ B then B ∼ A.
• If A ∼ B and B ∼ C then A ∼ C.
The next theorem is very useful, and its proof is a nice exercise.
Theorem 2.3. (Cantor-Bernstein) Let A and B be sets. If A ≾ B and B ≾ A then A ∼ B.
Definition 2.4. Let A be a set.
(1) A is finite if there is n ∈ N ∪ {0} such that A ∼ {1, 2, . . . , n}.
(2) A is infinite if A is not finite.
(3) A is denumerable if A ∼ N.
(4) A is countable if A is finite or denumerable.
(5) A is uncountable if A is not countable.
Proposition 2.5. (1) If m ≠ n then {1, 2, . . . , m} ≁ {1, 2, . . . , n}.
(2) N is infinite.
(3) A is countable if and only if A ≾ N.
(4) Let A1, A2, . . . be countable sets. Then ∪_{n=1}^{∞} An is countable, and for each n, A1 × · · · × An is countable.
(5) Q is countable.
Proof. The first three statements can be proved as exercises. For the fourth, let An =
{xn1 , xn2 , . . .}. Consider the list: x11 , x12 , x21 , x13 , x22 , x31 , . . .. For each entry, delete all
subsequent occurrences. What is left is a list, without duplications, of the elements of the
union. This defines a bijection from N to the union.
Suppose inductively that A1 × · · · × An is countable. Then
A1 × · · · × A_{n+1} = ∪_{x ∈ A_{n+1}} (A1 × · · · × An × {x})
is countable.
For the last statement, first note that Z is countable, as can be seen from the list: 0, 1, −1, 2, −2, . . .. Since Z ∼ (1/n)Z, it follows from Proposition 2.2 that (1/n)Z is countable. Then Q = ∪_{n=1}^{∞} (1/n)Z is countable.
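The diagonal listing in the proof of Proposition 2.5(4) can be carried out concretely for Q = ∪_{n=1}^{∞} (1/n)Z. The Python sketch below (helper names are mine, for illustration) walks the diagonals x11, x12, x21, x13, x22, x31, . . . and deletes repeats:

```python
from fractions import Fraction

def z_list(k):
    """k-th entry (k = 1, 2, ...) of the list 0, 1, -1, 2, -2, ..."""
    return k // 2 if k % 2 == 0 else -(k // 2)

def rationals(count):
    """First `count` entries of the duplicate-free diagonal listing
    of Q = union of the sets A_n = (1/n)Z."""
    seen, out = set(), []
    d = 2
    while len(out) < count:
        for n in range(1, d):               # diagonal: n + k = d
            q = Fraction(z_list(d - n), n)  # (d-n)-th element of (1/n)Z
            if q not in seen:               # delete subsequent occurrences
                seen.add(q)
                out.append(q)
                if len(out) == count:
                    break
        d += 1
    return out

print(rationals(8))
```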
4. Metric spaces
Much of what we do in analysis ultimately comes down to measuring the distance between
two real numbers. We use the absolute value for this: |x − y| is the distance between the
numbers x and y. There are many other situations where we use the distance between
points in an essential way. For example, the Pythagorean theorem is used to define the usual
distance between points in R2 , and even in Rn . One of the wonderful abstractions of XXth
century mathematics is a generalization of this notion of distance. In fact, it isn’t too hard to
notice that everything we use distance for in advanced calculus (e.g. limits, continuity, etc.)
relies only on a few very coarse aspects of the distance function. The following definition
sets these out precisely, and gives the basic setting for this course.
Definition 4.1. Let X be a set. A metric on X is a function d : X × X → R such that
(1) d(x, y) ≥ 0 for all x, y ∈ X (positivity).
(2) d(x, y) = 0 if and only if x = y (definiteness).
Corollary 4.10. Let V be an inner product space. For x ∈ V let ‖x‖ = ⟨x, x⟩^{1/2}. Then ‖·‖ is a norm on V.
Proof. We will prove the triangle inequality, leaving the verification of the other properties of a norm as an exercise. Let x, y ∈ V. Then by the Cauchy-Schwarz inequality,
‖x + y‖² = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩ = ‖x‖² + 2⟨x, y⟩ + ‖y‖²
≤ ‖x‖² + 2‖x‖ ‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)².
Example 4.11. The usual norm on Rn arises from the usual inner product. The corre-
sponding metric space is usually referred to as (n-dimensional) Euclidean space. We note
the following important inequalities for the Euclidean norm (proof by squaring).
Remark 4.12. Let x ∈ R^n. Then for any i,
|xi| ≤ ‖x‖ ≤ |x1| + · · · + |xn|.
Definition 4.13. Let (X, d) be a metric space, and let Y ⊆ X. If we restrict the metric d
to points of Y then Y becomes a metric space, called a subspace of X.
Example 4.14. The circle (or torus) is a subspace of Euclidean space: T = {(x, y) ∈ R^2 : x^2 + y^2 = 1}. (Thus, for example, d((1, 0), (0, 1)) = √2.)
It is very important to remember that, while pictures can give a lot of valuable intuition,
they are not a substitute for a proof. In this course, you may never use a picture as part
of a proof (though they can be included to help explain what you are doing). Well, it isn’t
really enough to just tell you not to touch the stove — you really have to burn yourself. The
following example is much more frequently encountered than you might imagine the first
time you see it. You should work through carefully on your own the details of the proof that
it is a metric space, and try to visualize it in some way (it’s unclear what that means!). It
provides a counterexample to many “obvious” facts about metric spaces that are not actually
true. The point is this: any theorem that we prove about metric spaces must be true for all
metric spaces. In particular, it will be true for the metric space in the next example.
Example 4.15. Recall the set X from Example 2.6. We define a metric on X as follows. For x, y ∈ X with x ≠ y, the set {i : xi ≠ yi} is non-empty. By the well-ordering principle, it has a least element. We set k(x, y) = min{i : xi ≠ yi}. Then we define
d(x, y) = 1/k(x, y) if x ≠ y, and d(x, y) = 0 if x = y.
We claim that d is a metric on X. The proofs of positive definiteness and symmetry are
immediate. We will verify the triangle inequality. In fact, we will prove something stronger,
called the ultrametric inequality.
Lemma 4.16. For any x, y, z ∈ X, d(x, y) ≤ max{d(x, z), d(y, z)}.
Proof. We will write “k(x, x) = ∞” as a kind of shorthand. (But notice that then we have
that d(x, y) < d(u, v) if and only if k(x, y) > k(u, v), for any points x, y, u, v ∈ X.)
Now let x, y, z ∈ X. If d(x, y) ≤ d(x, z) then the inequality holds. So suppose that
d(x, y) > d(x, z). Then k(x, y) < k(x, z). Since xi = zi for i < k(x, z), we have that z_{k(x,y)} = x_{k(x,y)} ≠ y_{k(x,y)}. Therefore k(y, z) ≤ k(x, y), and hence d(x, y) ≤ d(y, z). Therefore
max{d(x, z), d(y, z)} = d(y, z) ≥ d(x, y).
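For intuition, the metric of Example 4.15 and the ultrametric inequality of Lemma 4.16 can be checked by brute force on finite truncations of the sequences (an illustration only; the names k and d mirror the notation above):

```python
import itertools

def k(x, y):
    """First index (1-based) at which the sequences x and y differ."""
    for i, (xi, yi) in enumerate(zip(x, y), start=1):
        if xi != yi:
            return i
    return None                    # plays the role of k(x, x) = "infinity"

def d(x, y):
    return 0.0 if x == y else 1.0 / k(x, y)

# Brute-force check of the ultrametric inequality on all
# {0,1}-sequences of length 5 (a finite stand-in for X).
seqs = list(itertools.product((0, 1), repeat=5))
assert all(d(x, y) <= max(d(x, z), d(y, z))
           for x in seqs for y in seqs for z in seqs)
print("checked", len(seqs) ** 3, "triples")
```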
The following is another example of a metric space that varies from what our intuition
suggests. This one often seems like a stupid metric space . . . well, it is stupid, but it is
also a metric space. Every theorem about metric spaces must be true for it, and hence any
statement that is not true for this example, cannot be proven using only the axioms of a
metric space.
Example 4.17. Let S be any set. The discrete metric on S is defined by
d(x, y) = 1 if x ≠ y, and d(x, y) = 0 if x = y.
Remark 4.18. It is easy to see that the discrete metric on a set with n points can be
realized as a subspace of Euclidean n-space. It is a little harder to find a natural setting for
the discrete metric on N. The discrete metric on R is a useful counterexample to keep in
mind.
Remark 5.14. Ē is the smallest closed set containing E. As a consequence, we see that E is closed if and only if E = Ē.
Note that the closure of a set is defined in analogy with the abstract characterization of
interior given in Corollary 5.12. A characterization of the points of the closure (in analogy
with Definition 5.9) is given in the second part of the next proposition.
Proposition 5.15. Let E be a subset of a metric space X, and let x ∈ X.
(1) Ē = (int(E^c))^c.
(2) x ∈ Ē if and only if for all r > 0 we have Br(x) ∩ E ≠ ∅.
Proof. For (1) we have
Ē = ∩{F : E ⊆ F, F closed}
(Ē)^c = ∪{F^c : E ⊆ F, F closed}
= ∪{F^c : F^c ⊆ E^c, F^c open}
= ∪{U : U ⊆ E^c, U open}
= int(E^c).
(2) (⇒): Suppose there exists r > 0 such that Br(x) ∩ E = ∅. Then E ⊆ Br(x)^c, a closed set. Then Ē ⊆ Br(x)^c, and hence x ∈ Br(x) ⊆ (Ē)^c.
(⇐): Suppose that x ∉ Ē. Then x ∈ (Ē)^c, an open set. Hence there is r > 0 such that Br(x) ⊆ (Ē)^c. But then Br(x) ∩ E ⊆ (Ē)^c ∩ E = ∅.
Definition 5.16. Let X be a metric space, E ⊆ X, and a ∈ X. The point a is a cluster point of E (also called by some people limit point or accumulation point) if for every r > 0, the intersection E ∩ Br(a) is infinite. We write E′ for the set of cluster points of E.
Example 5.17. Let X = R.
(1) {1, 1/2, 1/3, . . .}′ = {0}.
(2) {0, 1, 1/2, 1/3, . . .}′ = {0}.
(3) Z′ = ∅.
(4) Q′ = R.
Note that E′ ⊆ Ē — this follows from Proposition 5.15(2). Therefore E ∪ E′ ⊆ Ē. In fact, the two sides are equal, which fact is the content of the next result.
Proposition 5.18. Ē = E ∪ E′.
Before proving the proposition, we give a lemma that may seem surprising at first.
Lemma 5.19. a ∈ E′ if and only if for each r > 0, (E \ {a}) ∩ Br(a) ≠ ∅.
Proof. (⇒): Suppose that for some r > 0, (E \ {a}) ∩ Br(a) = ∅. Then E ∩ Br(a) ⊆ {a}, a finite set. Hence a ∉ E′.
(⇐): Let a ∉ E′. Then there is r > 0 such that E ∩ Br(a) is finite. Let E ∩ Br(a) \ {a} =
Of course, since a sequence in a metric space is an example of a function with the metric
space as codomain, it makes sense to talk of bounded (and unbounded) sequences. The proof
of the next result is a good exercise, but it will also follow from some later results.
Lemma 6.14. Let (xn ) be a convergent sequence (in some metric space). Then (xn ) is
bounded.
7. Continuous functions
Definition 7.1. Let (X, d) and (Y, ρ) be metric spaces, f : X → Y a function, and x0 ∈ X. f is continuous at x0 if for every ε > 0 there exists δ > 0 such that for every x ∈ X, if d(x, x0) < δ then ρ(f(x), f(x0)) < ε. f is continuous if it is continuous at each point of X.
Remark 7.2. Here are some equivalent formulations of continuity at a point x0.
(1) For every ε > 0 there exists δ > 0 such that f(Bδ(x0)) ⊆ Bε(f(x0)).
(2) For every open ball C with center f(x0), there exists an open ball B with center x0 such that f(B) ⊆ C.
(3) For every ε > 0 there exists δ > 0 such that Bδ(x0) ⊆ f^{-1}(Bε(f(x0))).
(2) Any two closed disks in R2 having positive radii are homeomorphic.
(3) No open disk in R2 is homeomorphic to any closed disk in R2 . (This is not an obvious
one.)
(4) Every open ball in Rn is homeomorphic to every open box in Rn .
Example 7.9. Recall the function f : X → C from Definition 14.2, where X = ∏_{1}^{∞} {0, 1} is as in Example 2.6, and C is the Cantor set (Definition 14.1). We will show that f and f^{-1} are continuous functions. First some notation. If (a1, a2, . . . , an) ∈ ∏_{1}^{n} {0, 1}, let
Z(a1, . . . , an) = {x ∈ X : xi = ai for 1 ≤ i ≤ n}.
Such sets are called cylinder sets. Note that cylinder sets are clopen: Z(a1, . . . , an) = B_{1/n}(x) = B̄_{1/(n+1)}(x) (the closed ball) for any x ∈ Z(a1, . . . , an). Note also that f(Z(a1, . . . , an)) = C ∩ In(x) (again for any x ∈ Z(a1, . . . , an)), which is a clopen subset of C (recall the definition of In(x) from Definition 14.2). Thus these two families of clopen subsets are paired by the function f. Since every open subset of X is a union of open balls, i.e. of cylinder sets, and every open subset of C is a union of subsets of the form C ∩ In(x) (an exercise!), it follows from Theorem 7.5 that f and f^{-1} are continuous.
The proofs of the next two results are easy, and so are left as exercises.
Corollary 7.10. (of Theorem 7.5) Let X be a metric space. f : X → R is continuous if and only if f^{-1}((a, b)) is open for all a < b in R. Equivalently, f : X → R is continuous if and only if {f < a} and {f > a} are open for all a ∈ R.
Theorem 7.11. Let f : X → Y and g : Y → Z be functions between metric spaces, and let
x0 ∈ X. If f is continuous at x0 , and g is continuous at f (x0 ), then g ◦ f is continuous at
x0 .
8. Limits of functions
The definition we gave a while ago for the limit of a sequence is a special case of a general
notion of limit of a function — after all, a sequence is just a special kind of function. But
sequences are quite special. The definition of the limit of a function is a little bit more
involved. We will need it, in principle, when we talk about differentiation.
Definition 8.1. Let (X, d) and (Y, ρ) be metric spaces, let E ⊆ X, let f : E → Y be a function, let x0 ∈ E′, and let y0 ∈ Y. The limit of f, as x approaches x0, equals y0 if for every ε > 0 there exists δ > 0 such that for all x ∈ E, if 0 < d(x, x0) < δ then ρ(f(x), y0) < ε. (The final implication can also be expressed as f(E ∩ Bδ(x0) \ {x0}) ⊆ Bε(y0).) We write lim_{x→x0} f(x) = y0.
Remark 8.2. Note that f might or might not be defined at x0 (accordingly as x0 ∈ E or
x0 ∉ E). We require x0 ∈ E′ so that for every δ > 0 there will exist points x satisfying the
hypothesis of the implication. Even if x0 ∈ E, the definition of the limit as x → x0 never
requires that f be evaluated at x0 — the value of f at x0 is irrelevant.
Note further, that if we tried to apply this definition to a point x0 that is not a cluster
point of E, then we would find that the definition is satisfied for any point y0 ∈ Y . To avoid
this situation, we only consider limits at cluster points of the domain of the function.
Exercise 8.3. Show that in the situation of Definition 8.1, if the limit exists it is unique.
(Be sure to note explicitly where the hypothesis that x0 ∈ E′ is used.)
9. Sequences in R
Theorem 9.1. Let (an ) and (bn ) be sequences in R. Suppose that an → a, and bn → b.
Then
(1) an + bn → a + b.
(2) an bn → ab.
(3) If b ≠ 0 then an/bn → a/b (where at most finitely many terms are not defined).
(4) If an ≤ bn for all n, then a ≤ b.
Proof. These are good exercises, so we will only prove part of the third statement; namely,
the case where an = 1 for all n. First, let’s sort out the parenthetical comment. If b 6= 0,
then |b| > 0. By definition of convergence, there is n0 such that |bn − b| < |b| for all n ≥ n0 .
But then, for all n ≥ n0 we have |bn | = |b − (b − bn )| ≥ |b| − |b − bn | > |b| − |b| = 0. Therefore
bn ≠ 0 if n ≥ n0. The quotient sequence will fail to be defined if the denominator equals
zero, but this can only happen for finitely many n (all less than n0 ).
Now let’s prove that if bn → b ≠ 0, then 1/bn → 1/b. Let ε > 0. Let n1 be such that |bn − b| < |b|/2 whenever n ≥ n1. We can improve on the previous paragraph. If n ≥ n1 we have that |bn| ≥ |b| − |b − bn| > |b| − |b|/2 = |b|/2. Now let n2 be such that |bn − b| < |b|²ε/2 whenever n ≥ n2. Let n0 = max{n1, n2}. For n ≥ n0 we have
|1/bn − 1/b| = |b − bn|/(|b| |bn|) = |bn − b| · (1/|b|) · (1/|bn|) < (|b|²ε/2) · (1/|b|) · (2/|b|) = ε.
Therefore 1/bn → 1/b.
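The thresholds n1 and n2 in this proof are completely explicit, which a numerical sketch can show. The Python helper `threshold` below is mine, for illustration only; it computes n0 = max{n1, n2} for the concrete sequence bn = b + 1/n and checks that |1/bn − 1/b| < ε from n0 on:

```python
import math

def threshold(b, eps):
    """n0 = max(n1, n2) from the proof, for b_n = b + 1/n:
    n1 forces |b_n - b| < |b|/2; n2 forces |b_n - b| < |b|**2 * eps / 2."""
    n1 = math.floor(2 / abs(b)) + 1
    n2 = math.floor(2 / (abs(b) ** 2 * eps)) + 1
    return max(n1, n2)

b, eps = 0.5, 1e-3
n0 = threshold(b, eps)
assert all(abs(1 / (b + 1 / n) - 1 / b) < eps for n in range(n0, n0 + 1000))
print("n0 =", n0)
```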
Remark 9.2. (1) It follows from Theorem 9.1(4) that if an < bn for all n, then a ≤ b.
Note that even with strict inequalities in the hypotheses, the conclusion will in general
only be a weak inequality. This reflects a general principle: limits change strict
inequalities into weak inequalities.
(2) The following well-known lemma also follows from Theorem 9.1(4).
Lemma 9.3. Let (an ) and (bn ) be real sequences, suppose that |an | ≤ |bn |, and suppose that
bn → 0. Then an → 0.
Lemma 9.4. Let (xi) be a sequence in R^n. We write the ith term of the sequence as an n-tuple thus: (xi1, . . . , xin) (cf. Remarks 6.9). If a = (a1, . . . , an) ∈ R^n, then xi → a if and only if xij → aj (as i → ∞) for all j = 1, . . ., n.
Proof. These follow easily from Remarks 4.12.
We now establish convergence of some special, familiar sequences in R.
Proposition 9.5. (1) For any k ∈ N, 1/n^{1/k} → 0 as n → ∞.
(2) For any 0 < a < 1, a^n → 0 as n → ∞.
(3) n^{1/n} → 1 as n → ∞.
(4) For any a ∈ R with 0 < a < 1, and any k ∈ N, a^n n^k → 0 as n → ∞.
Proof. (1) Let ε > 0. Choose n0 > 1/ε^k. If n ≥ n0 then n^{1/k} ≥ n0^{1/k} > 1/ε, and hence 1/n^{1/k} < ε.
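A quick numerical look (not a proof) at three of the limits in Proposition 9.5, for sample values of a and k:

```python
# Sample values for Proposition 9.5: 1/n**(1/k) -> 0, a**n * n**k -> 0,
# and n**(1/n) -> 1, illustrated for a = 0.9 and k = 3.
a, k = 0.9, 3
for n in (10, 100, 1000, 10000):
    print(n, 1 / n ** (1 / k), a ** n * n ** k, n ** (1 / n))
```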
The justification is the opposite of the above: the sequence (inf_{k≥n} ak)_{n=1}^{∞} is increasing and bounded, so it has a limit.
Theorem 10.2. Let (an) be a bounded sequence in R.
(1) lim inf_{n→∞} an ≤ lim sup_{n→∞} an.
(2) (an) converges if and only if lim inf_{n→∞} an = lim sup_{n→∞} an, and in this case,
lim_{n→∞} an = lim inf_{n→∞} an = lim sup_{n→∞} an.
contains a term for which P holds; in other words, if for all n0 there exists n ≥ n0 such that
P (an ) is true.
For example, you can check your understanding of these terms by working through the
following statements.
(1) (an ) converges to c if and only if for every ε > 0, an ∈ Bε (c) eventually.
(2) (an ) has a subsequence converging to c if and only if for every ε > 0, an ∈ Bε (c)
frequently.
Exercise 10.7. Let (an ) be a bounded real sequence, and let x ∈ R. Prove the following:
1. x < lim sup an =⇒ x < an frequently =⇒ x ≤ lim sup an
2. x > lim sup an =⇒ x > an eventually =⇒ x ≥ lim sup an
3. x < lim inf an =⇒ x < an eventually =⇒ x ≤ lim inf an
4. x > lim inf an =⇒ x > an frequently =⇒ x ≥ lim inf an
(The exercise is not only to prove the eight implications, but also to show that none of these
implications can be reversed.)
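The description of lim sup as the limit of the tail suprema sup_{k≥n} ak can be illustrated numerically (the helper `tail_sup` is mine, and it approximates a supremum over an infinite tail by a long finite one):

```python
def tail_sup(a, n, tail=100000):
    """sup_{k >= n} a_k, approximated over a long finite tail."""
    return max(a(k) for k in range(n, n + tail))

def a(n):
    return (-1) ** n + 1 / n   # lim sup = 1, lim inf = -1

print([tail_sup(a, n) for n in (1, 10, 100, 1000)])
# the tail suprema decrease, approaching lim sup a_n = 1
```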
If m < n, we have
‖vm − vn‖² = ‖(0, 0, . . . , 0, 1/2^{m+1}, . . . , 1/2^n, 0, 0, . . .)‖²
= Σ_{i=m+1}^{n} (1/2^i)² = (1/4^{m+1}) Σ_{i=0}^{n−m−1} (1/4^i) < 1/4^m.
Thus (vn ) is Cauchy in V . But we claim that (vn ) does not converge. To prove this, let
y = (yn ) be an arbitrary vector in V . There is k such that yi = 0 for i > k. For n > k,
‖y − vn‖² = Σ_{i=1}^{∞} (yi − vni)² = Σ_{i=1}^{n} (yi − 1/2^i)² ≥ (y_{k+1} − 1/2^{k+1})² = 1/4^{k+1}.
Thus d(vn, y) ≥ 2^{−(k+1)} for all n > k. Therefore vn ↛ y.
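The estimates in this example can be checked numerically on finite truncations of the vectors vn (an illustration only; a finite list of length 60 stands in for an element of V):

```python
def v(n, length=60):
    """v_n = (1/2, 1/4, ..., 1/2**n, 0, 0, ...) truncated to `length`."""
    return [1 / 2 ** i if i <= n else 0.0 for i in range(1, length + 1)]

def dist_sq(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

# The Cauchy estimate ||v_m - v_n||**2 < 1/4**m from this example.
for m, n in [(3, 7), (5, 20), (10, 50)]:
    assert dist_sq(v(m), v(n)) < 1 / 4 ** m
print("Cauchy estimate verified on samples")
```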
Definition 12.6. A metric space is called complete if every Cauchy sequence converges.
Theorem 12.7. Rn is complete.
We will give the proof after a couple of lemmas about Cauchy sequences in general metric
spaces.
Lemma 12.8. A Cauchy sequence is bounded.
Proof. Let (an) be a Cauchy sequence. Then there is L such that d(am, an) < 1 for all m, n ≥ L. Let R = max{d(a1, aL), . . . , d(aL−1, aL)} + 2. Then d(an, aL) < R for all n, and hence (an) is bounded.
Lemma 12.9. A Cauchy sequence having a convergent subsequence is convergent.
Proof. Let (an) be a Cauchy sequence, and let (ani) be a convergent subsequence, with limit c. We claim an → c. Let ε > 0. Since (an) is Cauchy there is L such that d(am, an) < ε/2 for all m, n ≥ L. By the definition of convergence, there is i0 such that d(ani, c) < ε/2 for all i ≥ i0. Let i1 ≥ i0 be such that ni1 ≥ L. Then for any n ≥ ni1 we have
d(an, c) ≤ d(an, ani1) + d(ani1, c) < ε/2 + ε/2 = ε.
Hence an → c.
Proof. (of Theorem 12.7) We first show that R is complete. Let (an ) be a Cauchy sequence
in R. By Lemma 12.8 we know that (an ) is bounded. By Theorem 10.3 we know that (an )
has a convergent subsequence. Then by Lemma 12.9 we know that (an ) converges. Thus R
is complete. Now it follows easily from Remark 4.12 that Rn is complete (the details are left
as an exercise).
Exercise 12.10. A closed subset of a complete metric space is complete.
13. Compactness
Compactness is probably the most important concept in analysis. It can be described in
various ways. The “right” way is not necessarily the easiest to understand. Before we give
the definition, here is some motivation for why it is reasonable. The basic problem that
compactness addresses is the transition from local information to global information. That
may sound cryptic, and it is meant to be a catchy phrase that will become more intelligible
as you get more used to these ideas. But it isn’t hard to see what it is about. Local (near a
point) means in an open ball centered at that point. Here is a simple example of using this
terminology. If a function is continuous at a point, then it is bounded in some open ball
centered at that point. Thus if a function is continuous on a set, it is bounded locally on
that set: each point in the set has a neighborhood on which the function is bounded. On
the other hand, global (on a set) means on the whole set. A function is “globally bounded”
if it is bounded on its domain, i.e. if it is a bounded function. Is every continuous function
bounded? Of course not! For example, a non-constant polynomial on R is continuous, but
not bounded. Local boundedness does not generally imply global boundedness. However if
the domain of the polynomial is taken to be a closed bounded interval, then the extreme
value theorem from calculus implies that the polynomial is bounded on the interval. The
great insight was that it is a property of the domain that lets us pass from local boundedness
to global boundedness, and this property is called compactness.
Now, recall what the word local means: in a neighborhood of a point. A property holds
locally on a set if for each point, there is an open ball centered at the point such that the
property holds in that ball. If the set is infinite, this will give an infinite collection of open
balls, one for each point. We could obtain the property globally if we had a finite collection
of balls instead of an infinite collection. Compactness of the set means that we can always
reduce to a finite collection.
You might notice that a lot of mathematics seems to proceed in this way: what would we
like to have? Let’s give a name to the situation where we have what we want. Now let’s
analyze the situation to see what exactly we were asking for. In fact, compactness can be
described in a variety of ways that seem very different. That means that we can prove that a
space is compact using an easy description. Then we can use compactness via a complicated
description.
OK, with that as motivation, here is the precise definition.
Definition 13.1. Let X be a set. A cover of X is a collection of sets whose union contains
X. If U is a cover of X, a subcover of U is a subcollection of U that is also a cover of X.
Example 13.2. (1) The set of all open intervals is a cover of R.
(2) {(a, b) : a < b, a, b ∈ Z} is a subcover of example (1).
Definition 13.3. Let X be a metric space, and let E ⊆ X. An open cover of E is a cover
of E whose elements are open subsets of X.
Definition 13.4. Let X be a metric space, and let E ⊆ X. E is compact if every open
cover of E has a finite subcover.
Example 13.5. (1) Example 13.2(1) is an open cover of R having a finite subcover.
(2) Example 13.2(2) is an open cover of R not having a finite subcover. In particular, it
follows that R is not compact.
Example 13.6. (1) Finite sets are compact.
(2) {0, 1, 1/2, 1/3, . . .} is a compact subset of R.
(3) [0, 1] is a compact subset of R (this is a special case of Corollary 13.30).
Proof. Let U be an open cover of [0, 1]. Let E = {x ∈ [0, 1] : [0, x] is finitely covered by U}. Note that 0 ∈ E, so E ≠ ∅. Let c = sup E. Then c ∈ [0, 1]. We first claim that c ∈ E. To see this, choose U0 ∈ U with c ∈ U0. Then there exists r > 0
Thus U is relatively open in E if and only if for every x ∈ U there exists r(x) > 0 such that B_{r(x)}(x) ∩ E ⊆ U. In this case, we have that U = (∪_{x∈U} B_{r(x)}(x)) ∩ E, and we may use the set in parentheses for V. Conversely, suppose that U = V ∩ E for some open set V of X. Then for a point x ∈ U there is r > 0 such that Br(x) ⊆ V. Then Br^E(x) = Br(x) ∩ E ⊆ V ∩ E = U, so we have that U is relatively open in E.
Proposition 13.11. Let X be a metric space, and let E ⊆ X. E is a compact subset of X
if and only if E is a compact metric space.
Proof. Suppose that E is a compact subset of X. Let U be an open cover of (the metric space) E. By Lemma 13.10, for each U ∈ U there is an open set V_U ⊆ X such that U = V_U ∩ E. Then
E = ∪_{U∈U} U = ∪_{U∈U} (V_U ∩ E) = (∪_{U∈U} V_U) ∩ E,
and hence {V_U : U ∈ U} is an open cover of E in X. By hypothesis this open cover has a finite subcover. Thus there are U_1, ..., U_k ∈ U such that E ⊆ V_{U_1} ∪ · · · ∪ V_{U_k}. Hence E ⊆ U_1 ∪ · · · ∪ U_k, so that U has a finite subcover. Therefore the metric space E is compact.
The converse is left as an exercise.
NOTES, MAT 472, INTERMEDIATE ANALYSIS, FALL 2010 31
Thus compactness is an intrinsic property of a metric space, that cannot be lost when the
space is realized as a subspace of another metric space (in contrast to openness, which does
depend on the ambient metric space, as seen in Example 13.9). We now develop the chief
properties of compactness.
Proposition 13.12. A closed subset of a compact space is compact.
Proof. Let X be a compact metric space, and let E ⊆ X be a closed subset. Let U be an open cover of E. Since E is closed, E^c is open. Then U ∪ {E^c} is an open cover of X. Since X is compact, this open cover has a finite subcover. The subcover consists of finitely many sets from U, possibly together with E^c. But then the sets from U must cover E, so that U has a finite subcover (of E). Therefore E is compact.
Exercise 13.13. It is a nice exercise to prove a sort of converse to this. Namely, a compact
subset of a metric space is closed. We won’t do it here, as this fact will follow from a later
result (Corollary 13.20).
Proposition 13.14. A compact subset of a metric space is bounded.
Proof. Let E be a compact subset of the metric space X. Choose any point x_0 ∈ X. Then {B_n(x_0) : n = 1, 2, 3, ...} is an open cover of X, hence also of E. Since E is compact, there is a finite subcover. But since the open balls increase with n, this means that there is n such that E ⊆ B_n(x_0). Thus E is bounded.
Of course, the converse of Proposition 13.14 is false.
Theorem 13.15. (Finite Intersection Property, or FIP) Let X be a compact metric space. Let {E_i}_{i∈I} be a collection of nonempty closed subsets of X. Suppose that every finite subcollection has nonempty intersection: for all k ∈ N, for all i_1, ..., i_k ∈ I, we have E_{i_1} ∩ · · · ∩ E_{i_k} ≠ ∅. Then ∩_{i∈I} E_i ≠ ∅.
Proof. Suppose not. Then taking complements we have ∪_{i∈I} E_i^c = X. This means that {E_i^c : i ∈ I} is an open cover of X. Since X is compact there are i_1, ..., i_k ∈ I with E_{i_1}^c ∪ · · · ∪ E_{i_k}^c = X. But then by complements again, we get that E_{i_1} ∩ · · · ∩ E_{i_k} = ∅, a contradiction.
Example 13.16. The theorem may fail if the sets are not closed: consider {(0, 1/n) : n ∈ N}. This collection does have the finite intersection property, but the intersection is empty.
Definition 13.17. A metric space X is sequentially compact if every sequence in X has a
convergent subsequence (convergent in X, of course).
Example 13.18. [a, b] is sequentially compact by Theorem 10.3, and the fact that [a, b] is
closed.
Theorem 13.19. A compact metric space is sequentially compact.
Corollary 13.20. A compact subset of a metric space is closed.
The proof of the theorem will be made easier by the following preliminary “computation.”
Lemma 13.21. Let (xn ) be a sequence in a metric space, and let y be a point. Then (xn )
has a subsequence converging to y if and only if for every ε > 0 and for every m ∈ N, there
exists n ≥ m such that d(xn , y) < ε.
Proof. (⇒): Suppose lim_{i→∞} x_{n_i} = y. Let ε > 0 and m ∈ N. By the hypothesized convergence there is i_0 such that d(x_{n_i}, y) < ε whenever i ≥ i_0. Since n_i → ∞ as i → ∞ there exists j ≥ i_0 such that n_j ≥ m. Then d(x_{n_j}, y) < ε. So n_j is the desired 'n'.
(⇐): Suppose the condition in the statement holds. We apply it repeatedly. First choose n_1 such that d(x_{n_1}, y) < 1. Then choose n_2 > n_1 such that d(x_{n_2}, y) < 1/2. Continuing this way we construct a subsequence (x_{n_i})_{i=1}^∞ such that d(x_{n_i}, y) < 1/i for all i. Evidently x_{n_i} → y as i → ∞.
Proof. (of Theorem 13.19) We will prove the contrapositive of the statement in the theorem. So suppose that X is not sequentially compact. Then there is a sequence (x_n) having no convergent subsequence. Thus for all y ∈ X, (x_n) does not have a subsequence converging to y. Negating the condition in Lemma 13.21, we find that for all y ∈ X there exist ε_y > 0 and n_y ∈ N such that for all n ≥ n_y, d(x_n, y) ≥ ε_y. Let U = {B_{ε_y}(y) : y ∈ X}. U is obviously an open cover of X. But if y_1, ..., y_k ∈ X are any finite collection of points, choose n > max{n_{y_1}, ..., n_{y_k}}. Then d(x_n, y_i) ≥ ε_{y_i} for i = 1, ..., k. Hence x_n ∉ ∪_{i=1}^k B_{ε_{y_i}}(y_i). Thus U has no finite subcover. Therefore X is not compact.
Proposition 13.22. A sequentially compact metric space is complete.
Proof. This follows from Lemma 12.9.
Exercise 13.23. A metric space X is sequentially compact if and only if every infinite subset
of X has a cluster point.
We now turn to the role of boundedness for compact metric spaces. By way of introduction, we mention that the most famous result about compact metric spaces is the Heine-Borel theorem: a subset of R^n is compact if and only if it is closed and bounded. We will prove this later, but now we want to point out that this result is special to R^n; it is NOT true in arbitrary metric spaces. The reason is that R^n is (duh!) finite dimensional. This may not seem so special now, but many of the most important metric spaces in analysis are infinite dimensional, and you will surely run into them (maybe not today, maybe not tomorrow, but... yeah, yeah.)
Here is a simple part of the Heine-Borel theorem that we have essentially proved already.
For E ⊆ R, if E is bounded then every sequence in E has a convergent subsequence. If E is
both closed and bounded, then the limit of the convergent subsequence must belong to E.
Thus we see that for subsets of R, closed and bounded together imply sequentially compact.
Here are two examples to show that for general metric spaces, boundedness is too weak a
notion. The first is simple-minded, but the second is more interesting.
Example 13.24. (1) Let X be an infinite set with the discrete metric (Example 4.17).
Then X is bounded, but not sequentially compact.
(2) Let V be the normed space of finite real sequences (Example 12.5). Then the closed unit ball B̄_1(0) is closed and bounded, but not sequentially compact.
In fact, the situation is worse than might be realized if you just think about the
non-convergent Cauchy sequence from Example 12.5. Consider the sequence (en )
in V , where en = (0, 0, . . . , 0, 1, 0, 0, . . .) (with 1 in the nth slot). This sequence is
contained in the unit ball of V , but does not even have a Cauchy subsequence.
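The claim that (e_n) has no Cauchy subsequence comes down to a computation: any two distinct basis sequences are at distance √2. Here is a minimal numerical sketch (the helper names are ours; we pad the sequences to a common finite length so the Euclidean distance makes sense):

```python
from math import sqrt

def e(n, length):
    """The nth basis sequence e_n, truncated to a common finite length."""
    v = [0.0] * length
    v[n - 1] = 1.0
    return v

def dist(u, v):
    """Euclidean distance, as for the norm on V."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Distinct terms of the sequence always sit at distance sqrt(2),
# so no subsequence can be Cauchy (take ε = 1, say).
print(dist(e(1, 10), e(7, 10)))
print(dist(e(2, 10), e(3, 10)))
```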
These examples show that the problem with boundedness is that a huge space can hide
inside a bounded set. The correct definition is the following.
Definition 13.25. A subset E of a metric space is called totally bounded if for every ε > 0
there are finitely many balls of radius ε that cover E.
Remark 13.26. (1) The definition is unaffected by specifying the type of the balls (open
vs. closed).
(2) A totally bounded subset of a metric space is bounded. A subset of a totally bounded
set is totally bounded.
The proofs are left as exercises.
The next lemma shows what makes R^n so special.
Lemma 13.27. In R^n, every bounded subset is totally bounded.
Proof. Let E ⊆ R^n be bounded, and let ε > 0. Choose C > 0 such that E ⊆ [−C, C]^n. Choose k > 2C√n/ε. Write
[−C, C] = ∪_{i=1}^k [−C + 2C(i−1)/k, −C + 2Ci/k] = ∪_{i=1}^k S_i,
where S_1, ..., S_k are closed intervals of length 2C/k < ε/√n. Then
[−C, C]^n = (S_1 ∪ · · · ∪ S_k) × · · · × (S_1 ∪ · · · ∪ S_k) = ∪_{i_1,...,i_n=1}^k (S_{i_1} × · · · × S_{i_n}) = ∪_{j=1}^{k^n} F_j,
where each F_j is a closed cube of side 2C/k. Then the diameter of each F_j, which equals the length of the diagonal of F_j, equals (2C/k)√n < ε. Let x_j ∈ F_j be arbitrary. Then F_j ⊆ B_ε(x_j). It follows that E ⊆ [−C, C]^n ⊆ ∪_{j=1}^{k^n} B_ε(x_j).
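To make the bookkeeping in the lemma concrete, here is a small sketch (the helper name `grid_cover` is ours): it computes the number of subdivisions k per axis, the number of cubes, and the common diameter, and confirms that the diameter comes out below ε.

```python
import math

def grid_cover(C, eps, n):
    """Parameters of the cube cover of [-C, C]^n from Lemma 13.27."""
    k = math.floor(2 * C * math.sqrt(n) / eps) + 1  # an integer k > 2C*sqrt(n)/eps
    side = 2 * C / k                                # side length of each cube F_j
    diam = side * math.sqrt(n)                      # diagonal of each cube
    return k, k ** n, diam

k, num_cubes, diam = grid_cover(C=1.0, eps=0.1, n=2)
print(k, num_cubes, diam)  # the diameter is strictly less than eps
```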
We now return to our development of the properties of compactness.
Proposition 13.28. A sequentially compact metric space is totally bounded.
Proof. We again prove the contrapositive. Suppose that X is a metric space that is not totally bounded. Then there is a positive number ε such that X cannot be covered by finitely many balls of radius ε. Let x_1 ∈ X. Since X ⊈ B̄_ε(x_1) there must be x_2 ∈ X with d(x_1, x_2) > ε. Since X ⊈ B̄_ε(x_1) ∪ B̄_ε(x_2) there must be x_3 ∈ X with d(x_i, x_3) > ε for i < 3. Continuing this way we construct a sequence (x_n) in X such that d(x_i, x_n) > ε for i < n. This sequence has no Cauchy subsequence, hence no convergent subsequence. Therefore X is not sequentially compact.
We now have almost all of the pieces of the main theorem on compactness in metric spaces.
Theorem 13.29. Let X be a metric space. The following are equivalent:
(1) X is compact.
(2) X is sequentially compact.
(3) X is complete and totally bounded.
Proof. (1)⇒(2) This is Theorem 13.19.
(2)⇒(3) This follows from Propositions 13.22 and 13.28.
(3)⇒(1) We prove this by contradiction. Let X be complete and totally bounded, and
suppose that X is not compact. Then there is an open cover U having no finite subcover.
We first use total boundedness. There is a finite collection C1 of closed balls of radius 1
covering X. There must be a ball B1 ∈ C1 such that B1 is not finitely covered by U —
It is a good idea to draw a picture. Note that it follows from the FIP that C is nonempty.
In fact, much more is true, as we will now see. Recall the space X of Definition 2.6.
Definition 14.2. We define f : X → C as follows. Let x = (x_1, x_2, ...) ∈ X. For each n define a closed interval I_n(x) recursively by
I_0(x) = [0, 1],
I_{n+1}(x) = the left piece of I_n(x) ∩ F_{n+1}, if x_{n+1} = 0,
I_{n+1}(x) = the right piece of I_n(x) ∩ F_{n+1}, if x_{n+1} = 1.
Then I_0(x) ⊇ I_1(x) ⊇ · · · . By the FIP, the intersection is nonempty. But since the length of I_n(x) equals 3^{−n}, which tends to 0, there is a unique f(x) ∈ [0, 1] such that ∩_{n=0}^∞ I_n(x) = {f(x)}. This defines the function f.
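Unwinding the recursion, choosing the left piece at stage n contributes ternary digit 0 and the right piece contributes digit 2, so f(x) = Σ_{n≥1} 2x_n 3^{−n}. This closed form is our observation, not stated in the notes, but it is easy to experiment with:

```python
from fractions import Fraction

def cantor_point(bits):
    """f(x) for a binary sequence x, truncated to finitely many digits.

    Each bit x_n contributes ternary digit 2*x_n, reflecting the
    left/right choice of subinterval in Definition 14.2.
    """
    return sum(Fraction(2 * b, 3 ** n) for n, b in enumerate(bits, start=1))

print(cantor_point([0, 0, 0]))        # the all-zeros sequence gives the left endpoint 0
print(cantor_point([0, 1]))           # 2/9
print(float(cantor_point([1] * 30)))  # the all-ones sequence approaches 1
```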
Proposition 14.3. f is bijective.
Proof. We first show that f is injective. Let x, y ∈ X with x ≠ y. Let k = k(x, y) (recall Example 4.15). For i < k, x_i = y_i, so that I_i(x) = I_i(y). Since x_k ≠ y_k, I_k(x) and I_k(y) are two disjoint subintervals of I_{k−1}(x) = I_{k−1}(y). Since f(x) ∈ I_k(x) and f(y) ∈ I_k(y), we must have f(x) ≠ f(y).
We now show that f is surjective. Let t ∈ C. Then t ∈ F_n for all n. For each n, let I_n be the subinterval of F_n containing t. Since I_n and I_{n+1} are subintervals of F_n and F_{n+1},
15. Connectedness
Definition 15.1. Let X be a metric space. A subset of X is clopen if it is both closed and
open. X is connected if the only clopen subsets of X are ∅ and X. A subset E ⊆ X is
connected if the metric space E is connected.
Remark 15.2. What does it mean for A ⊆ E to be relatively clopen in E? We know that A
is relatively open in E if and only if A = E ∩ U for some open set U ⊆ X. Similarly, one can
check that A is relatively closed in E if and only if A = E ∩ K for some closed set K ⊆ X.
Thus A is relatively clopen in E if and only if there are two sets U and K in X, with U open
and K closed, such that A = E ∩ U = E ∩ K. (Note that it is NOT NECESSARILY true
that A equals the intersection of E with a clopen subset of X.)
Exercise 15.3. We call a metric space X separated if there exist nonempty subsets A and B such that A ∪ B = X and Ā ∩ B = ∅ = A ∩ B̄ (where Ā denotes the closure of A). Prove that X is connected if and only if it is not separated.
Exercise 15.4. The Cantor set (Definition 14.1) is not connected.
Thus E ∩ [a, b] and (I \ E) ∩ [a, b] are closed subsets of R. Let c = sup(E ∩ [a, b]). Then c ∈ E ∩ [a, b] since this set is closed. Also, c < b since b ∉ E. Hence (c, b] ⊆ (I \ E) ∩ [a, b], and so c ∈ (I \ E) ∩ [a, b] since this set is closed. This leads to the contradiction c ∈ E ∩ (I \ E).
Corollary 15.9. Rn is connected.
Proof. Suppose not. Then there is a nonempty proper clopen subset A of R^n. Let a ∈ A and b ∈ A^c. For t ∈ [0, 1] let x_t = (1 − t)a + tb, and let
E = {t ∈ [0, 1] : x_t ∈ A}.
Set F = [0, 1] \ E, and note that 0 ∈ E and 1 ∈ F. So E and F are nonempty subsets of [0, 1]. Note that x_t − x_s = (t − s)(b − a). We claim that E is relatively open in [0, 1]. To see this, suppose that t ∈ E. Then x_t ∈ A. Since A is open there is r > 0 such that B_r(x_t) ⊆ A. Then for s ∈ [0, 1] with |s − t| < r/‖a − b‖, we have ‖x_t − x_s‖ < r, and hence x_s ∈ A. A similar argument, using the fact that A^c is open, shows that F is relatively open in [0, 1]. Since E and F are complements (in [0, 1]), it follows that [0, 1] is not connected, contradicting Theorem 15.8.
The following theorem is very useful, and we place it here because it deals with intervals
(although it is not a result about connectedness).
Theorem 15.10. Let U ⊆ R be open. Then U equals the union of countably many open
intervals. Moreover, U can be written as the union of a countable collection of pairwise
disjoint open intervals, and this collection is unique.
Proof. For x ∈ U choose a(x), b(x) ∈ Q with x ∈ (a(x), b(x)) ⊆ U. Let E = {(a(x), b(x)) : x ∈ U}. Then E is a collection of open intervals. Since E ⊆ {(α, β) : α, β ∈ Q, α < β}, a collection indexed by a subset of Q², we see that E is a countable collection. It is clear that U = ∪E.
The proof of the second statement of the Theorem is left as an exercise.
16. Continuity and compactness
Theorem 16.1. Let X and Y be metric spaces, and let f : X → Y be a continuous function.
If X is compact then so is f (X).
Proof. Let V be an open cover of f(X). Then f^{−1}(V) = {f^{−1}(V) : V ∈ V} is an open cover
(2) Define g : (0, ∞) → T by g(t) = (cos(2πt/(t+1)), sin(2πt/(t+1))). Then g is continuous but not a closed map: [1, ∞) is a closed subset of (0, ∞), but g([1, ∞)) is not a closed subset of T since it does not contain its limit point (1, 0).
Theorem 16.10. Let X and Y be metric spaces with X compact, and let f : X → Y be
continuous and bijective. Then f is an open map.
Proof. Let U ⊆ X be open. Then U^c is closed, hence compact. Therefore f(U^c) is compact, hence closed. But f(U^c) = f(U)^c since f is bijective. Therefore f(U) is open.
Corollary 16.11. In the above theorem, f −1 is continuous.
(3) Let h : (0, 1) → R be given by h(t) = sin(1/t). Then h is not uniformly continuous.
Proof. We choose ε = 2. Let δ > 0 be given. Choose n > 1/√δ. Let s = 2/[(2n + 1)π] and let t = 2/[(2n + 3)π]. Then
|s − t| = (2/π)[1/(2n + 1) − 1/(2n + 3)] = (2/π) · 2/[(2n + 1)(2n + 3)] ≤ 1/n² < δ.
But h(s) = sin((2n + 1)π/2) = (−1)^n and h(t) = sin((2n + 3)π/2) = (−1)^{n+1}, so |h(s) − h(t)| = 2 ≥ ε. Therefore h is not uniformly continuous.
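The witness points s and t from the proof can be checked numerically: their gap shrinks as n grows, while |h(s) − h(t)| stays equal to 2.

```python
import math

h = lambda t: math.sin(1 / t)

# s_n = 2/((2n+1)π) and t_n = 2/((2n+3)π), as in the proof above.
for n in (1, 10, 100):
    s = 2 / ((2 * n + 1) * math.pi)
    t = 2 / ((2 * n + 3) * math.pi)
    print(n, abs(s - t), abs(h(s) - h(t)))  # the gap shrinks, the jump stays 2
```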
The following theorem is a classic use of compactness to get a global result from local
information.
Theorem 18.3. Suppose f : X → Y is continuous, and X is compact. Then f is uniformly
continuous.
Proof. Let ε > 0 be given. Since f is continuous, for each x ∈ X there is r_x > 0 such that f(B_{r_x}(x)) ⊆ B_{ε/2}(f(x)). The collection {B_{r_x/2}(x) : x ∈ X} is an open cover of X. Since X is compact, there are x_1, ..., x_n ∈ X such that X = ∪_{i=1}^n B_{r_{x_i}/2}(x_i). Let δ = min{r_{x_i}/2 : 1 ≤ i ≤ n}. Let y, z ∈ X with d(y, z) < δ. There is i such that d(y, x_i) < r_{x_i}/2. Then d(z, x_i) ≤ d(z, y) + d(y, x_i) < δ + r_{x_i}/2 ≤ r_{x_i}. Then f(y), f(z) ∈ B_{ε/2}(f(x_i)), so that d(f(y), f(z)) < ε.
19. Convergence of functions
Definition 19.1. Let X be a set. (Note that we really do mean set. Later we will let X
be a metric space, but for now, that is not relevant.) Let fn : X → Rk for n = 1, 2, 3, . . ..
(We remark that Rk may be replaced by another metric space. For ease of exposition, we
restrict our attention to the case where the codomain is Euclidean space.) For a ∈ X we say that (f_n) converges at a if (f_n(a))_{n=1}^∞ is a convergent sequence in R^k. If (f_n) converges at each point of X, define f : X → R^k by f(x) = lim_{n→∞} f_n(x). We say that (f_n) converges to f (pointwise).
We may specify this more precisely as: for every ε > 0, for every x ∈ X, there exists n_0 ∈ N such that for all n ≥ n_0, ‖f_n(x) − f(x)‖ < ε. (Note that n_0 ≡ n_0(ε, x) depends on both ε and on x.)
Example 19.2. (1) Let f_n : [0, 1] → R be given by f_n(x) = (1/n)x. Then f_n → 0.
(2) Let g_n : [0, 1] → R be given by g_n(x) = x^n. Then g_n → g, where
g(x) = 0, if x < 1,
g(x) = 1, if x = 1.
Definition 19.3. Let f, f_n : X → R^k. We say that (f_n) converges to f uniformly (on X) if for each ε > 0, there exists n_0 ∈ N such that for every x ∈ X, and for every n ≥ n_0, ‖f_n(x) − f(x)‖ < ε. (Note that n_0 ≡ n_0(ε) depends only on ε.)
Formally, the difference between pointwise convergence and uniform convergence lies only in the order of the two quantified variables n_0 and x. Practically, however, the difference is profound, and it is important that you get a good feel for it.
Example 19.4. (1) (1/n)x → 0 uniformly on [0, 1].
(2) x^n does not converge uniformly on [0, 1].
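A numerical contrast between the two examples (our sketch): for f_n(x) = x/n the sup-norm distance to the limit is 1/n, which tends to 0; for g_n(x) = x^n, the point x_n = 2^{−1/n} < 1 satisfies g_n(x_n) = 1/2 exactly, so the sup-norm distance to the pointwise limit never drops below 1/2.

```python
# For f_n(x) = x/n: sup over [0,1] of |f_n(x) - 0| is 1/n, attained at x = 1.
# For g_n(x) = x^n: the pointwise limit is 0 on [0,1), yet x_n = 2^(-1/n)
# always lies in [0,1) and satisfies g_n(x_n) = 1/2.
for n in (1, 10, 100, 1000):
    sup_f = 1 / n        # sup |x/n| over [0,1]
    xn = 2 ** (-1 / n)   # a witness point for g_n
    print(n, sup_f, xn, xn ** n)
```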
(where the first and third occurrences of ε/3 are due to the uniform approximation of f by
fn , and the second is due to the continuity of fn at a). Therefore f is continuous at a.
Corollary 19.8. The uniform limit of continuous functions is continuous.
Example 19.9. (1) Consider the sequence of functions xn on [0, 1]. We have seen that
this sequence has a pointwise limit, which is not continuous. Since xn is continuous
for each n, the theorem implies that the convergence is not uniform (this is an easier
proof than the direct proof we gave earlier).
(2) The above argument cannot be used in reverse. For example, let f_n : [0, 1] → R be given by
f_n(x) = 2nx, if 0 ≤ x ≤ 1/(2n),
f_n(x) = −2n(x − 1/n), if 1/(2n) ≤ x ≤ 1/n,
f_n(x) = 0, if 1/n ≤ x ≤ 1.
(It will be helpful to draw a picture.) Then f_n → 0 pointwise on [0, 1], but not uniformly, even though the limit is continuous.
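These tent functions (our reconstruction of the formula) are easy to probe numerically: each f_n has peak value 1 at x = 1/(2n), while for any fixed x > 0 the values f_n(x) are eventually 0.

```python
def f(n, x):
    """Tent function of height 1 supported on [0, 1/n]."""
    if x <= 1 / (2 * n):
        return 2 * n * x
    if x <= 1 / n:
        return -2 * n * (x - 1 / n)
    return 0.0

# Pointwise convergence to 0: f_n(0.3) = 0 once n > 1/0.3.
# Failure of uniform convergence: the peak f_n(1/(2n)) equals 1 for every n.
for n in (1, 10, 100):
    print(n, f(n, 0.3), f(n, 1 / (2 * n)))
```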
Example 19.10. Recall function space from Example 4.6: if X is a set, B(X, R^k) is the vector space of all bounded functions from X to R^k. B(X, R^k) is a normed vector space, with norm given by ‖f‖ = sup_{x∈X} ‖f(x)‖. Thus B(X, R^k) is a metric space.
Proposition 19.11. Let f , fn : X → Rk be bounded functions.
(1) fn → f in B(X, Rk ) if and only if fn → f uniformly on X.
(2) (fn ) is Cauchy in B(X, Rk ) if and only if (fn ) is uniformly Cauchy on X.
Proof. This follows immediately from the definitions.
Corollary 19.12. B(X, Rk ) is a complete metric space.
Proof. This follows from Proposition 19.6 and the above proposition.
Definition 19.13. Let X be a metric space. Cb (X, Rk ) is the space of all bounded continuous
functions from X to Rk .
Note that Cb (X, Rk ) is a vector subspace of B(X, Rk ), since the sum and (scalar) product
of continuous functions is continuous.
Proposition 19.14. Cb (X, Rk ) is a complete metric space.
Proof. This follows from Corollary 19.8.
Remark 19.15. If X is a compact metric space, then C(X, Rk ) = Cb (X, Rk ).
20. Differentiation
Definition 20.1. Let I ⊆ R be open, let f : I → R, and let a ∈ I. f is differentiable at a if
lim_{x→a} (f(x) − f(a))/(x − a)
exists (equivalently, if lim_{h→0} (f(a + h) − f(a))/h exists). The limit is called the derivative of f at a, and is denoted f'(a) (or df/dx(a), or df/dx|_{x=a}). We say that f is differentiable on I if it is differentiable at each point of I. We refer to the quantity (f(x) − f(a))/(x − a) as the difference quotient.
Suppose that f is differentiable at a. Let L(x) = f(a) + f'(a)(x − a) (L is a "linear function", in that its graph is a straight line). The function f is well-approximated by L in the following sense:
(3) f(a) = L(a),
(4) lim_{x→a} (f(x) − L(x))/(x − a) = 0.
Remark 20.2. There exists at most one linear function L having these properties. Unique-
ness is an exercise, while existence is equivalent to differentiability.
There is a third equivalent formulation of differentiability. We motivate it as follows. Let f be differentiable at a. Define u : I → R by
u(x) = (f(x) − f(a) − f'(a)(x − a))/(x − a), if x ≠ a,
u(x) = 0, if x = a.
Then lim_{x→a} u(x) = lim_{x→a} (f(x) − L(x))/(x − a) = 0, so that u is continuous at a. Moreover, f(x) = f(a) + f'(a)(x − a) + u(x)(x − a). Thus we see that if f is differentiable at a, then f differs from L by a function that tends to zero as x tends to a, even when divided by x − a.
Theorem 20.3. f is differentiable at a if and only if there exist a linear function L(x) = m(x − a) + b, and a function u(x), such that
(1) u(a) = 0.
(2) u is continuous at a.
(3) f(x) = L(x) + u(x)(x − a).
In this case, f'(a) = m (and of course, b = f(a)).
Proof. The 'only if' direction was proved in the remarks before the statement of the theorem. For the 'if' direction, let L and u be as in the statement of the theorem. Letting x = a in the third item of the statement gives f(a) = b. Then dividing by x − a, and letting x → a, we get
lim_{x→a} (f(x) − f(a))/(x − a) = lim_{x→a} (m(x − a) + u(x)(x − a))/(x − a) = lim_{x→a} (m + u(x)) = m.
Theorem 20.7. (The chain rule.) Let I, J ⊆ R be open, let f : I → R and g : J → R, let a ∈ I, suppose that f(a) ∈ J, and suppose that f is differentiable at a and g is differentiable at f(a). Then g ◦ f is differentiable at a, and (g ◦ f)'(a) = g'(f(a)) f'(a).
Proof. We apply Theorem 20.3 to f and g to obtain functions u : I → R and v : J → R such that
(1) u and v vanish at a and f(a), respectively.
(2) u and v are continuous at a and f(a), respectively.
(3)
f(x) = f(a) + f'(a)(x − a) + u(x)(x − a),
g(y) = g(f(a)) + g'(f(a))(y − f(a)) + v(y)(y − f(a)).
Substituting y = f(x) in the second equation, and using f(x) − f(a) = f'(a)(x − a) + u(x)(x − a) from the first, we get
g(f(x)) = g(f(a)) + g'(f(a))[f'(a)(x − a) + u(x)(x − a)] + v(f(x))[f'(a)(x − a) + u(x)(x − a)]
= g(f(a)) + g'(f(a))f'(a)(x − a) + [g'(f(a))u(x) + v(f(x))f'(a) + v(f(x))u(x)](x − a).
Then by Theorem 20.3 it suffices to show that the expression in square brackets vanishes and is continuous at x = a. We check this for each of the three terms separately. It is true for the first term because it is true for u. It is true for the second term because f is continuous at a (by Lemma 20.4), v is continuous, and vanishes, at f(a), and Theorem 7.11. It is true for the third term by both of the above.
We now draw out some consequences of differentiability on intervals. First we give a
general definition.
Definition 20.8. Let X be a metric space, let U ⊆ X be open, let a ∈ U and let f : U → R.
f has a local maximum (respectively local minimum) at a if there is r > 0 such that for all
x ∈ Br (a) we have f (x) ≤ f (a) (respectively, f (x) ≥ f (a)). Local maxima and minima are
called local extrema.
Lemma 20.9. Let I ⊆ R be an open interval, let a ∈ I, and let f : I → R. Suppose that f is differentiable at a. If f has a local extremum at a, then f'(a) = 0.
Proof. We prove the contrapositive. Suppose that f'(a) ≠ 0. For definiteness we assume f'(a) > 0 (the proof in the case f'(a) < 0 is analogous). We then have that lim_{x→a} (f(x) − f(a))/(x − a) > 0. Then there is δ > 0 such that (a − δ, a + δ) ⊆ I, and such that for x ∈ I, if 0 < |x − a| < δ then (f(x) − f(a))/(x − a) > 0. Now, for any x with a − δ < x < a, we have x − a < 0. Since the difference quotient is positive, we must have f(x) − f(a) < 0; thus f does not have a local minimum at a. Similarly, for any x with a < x < a + δ, we have x − a > 0. Again, since the difference quotient is positive, we must have f(x) − f(a) > 0;
thus f does not have a local maximum at a. Therefore, f does not have a local extremum
at a.
This lemma has several famous applications.
Theorem 20.10. (Rolle's theorem) Let f : [a, b] → R be continuous, and assume that f is differentiable on (a, b). Suppose further that f(a) = f(b) = 0. Then there exists c ∈ (a, b) such that f'(c) = 0.
Rolle's theorem is a special case of the following theorem.
Theorem 20.11. (Mean value theorem) Let f : [a, b] → R be continuous, and assume that f is differentiable on (a, b). Then there exists c ∈ (a, b) such that f'(c) = (f(b) − f(a))/(b − a).
The idea of the theorem, and the proof, is easy to see from a simple sketch: on the graph of f, draw the straight line between the endpoints (a, f(a)) and (b, f(b)) of the graph. Let L(x) be the linear function whose graph passes through these two points. The point c in the theorem is (one of) the place(s) where the vertical distance between the graphs of f and L is stationary, i.e. has a local extremum. A little algebraic manipulation of the expression f(x) − L(x) yields the beginning of the following proof.
Proof. Let h(x) = (f(x) − f(a))(b − a) − (f(b) − f(a))(x − a). Then h is continuous on [a, b] and differentiable on (a, b). Also h(a) = h(b) = 0. By the extreme value theorem (Corollary 16.4), h takes on its maximum and minimum values on [a, b]. We note that at least one of these occurs in the interior (a, b). For if both occur at the endpoints, then since h(a) = h(b) = 0 the maximum and minimum are both zero, so h is identically zero, and hence achieves its maximum and minimum at every point of [a, b]. Let c ∈ (a, b) be such a point. By Lemma 20.9 we have h'(c) = 0. Differentiating h gives h'(x) = f'(x)(b − a) − (f(b) − f(a)). Then the equation h'(c) = 0 gives the desired result.
Remark 20.12. There is an alternate phrasing of the mean value theorem that is often
convenient. Let f : I → R be differentiable, where I is an open interval. Let a ∈ I and
h ∈ R \ {0} be such that a + h ∈ I. If we wish to apply the mean value theorem to the closed
interval having a and a + h as endpoints, we would like to express the conclusion without
declaring which is the left, and which the right, endpoint. We avoid this inconvenience in the
following way: the point c lies (strictly) between a and a + h if and only if there is a number
0 < θ < 1 such that c = a + θh. Thus we reexpress the mean value theorem in the following
way: if a, a + h ∈ I then there exists 0 < θ < 1 such that f(a + h) = f(a) + hf'(a + θh).
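For a concrete function one can solve for θ explicitly. A sketch with the sample function f(x) = x³ (our choice; the helper `theta_for_cubic` is hypothetical), restricted to intervals in the right half-line so the relevant root of f' is the positive one:

```python
from math import sqrt

def theta_for_cubic(a, h):
    """Solve f(a+h) = f(a) + h*f'(a+θh) for θ, where f(x) = x^3.

    Here f'(x) = 3x^2, so we need 3(a+θh)^2 = ((a+h)^3 - a^3)/h,
    and we take the positive square root (valid for the sample
    intervals below, which lie in [0, ∞)).
    """
    m = ((a + h) ** 3 - a ** 3) / h  # slope of the secant line
    c = sqrt(m / 3)                  # solves f'(c) = m with c > 0
    return (c - a) / h               # θ such that c = a + θh

for a, h in [(0.0, 1.0), (1.0, 0.5), (2.0, -1.0)]:
    th = theta_for_cubic(a, h)
    print(a, h, th)  # each θ lies strictly between 0 and 1
```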
Now we give some corollaries of the mean value theorem.
Corollary 20.13. Let I ⊆ R be an open interval, and let f : I → R be differentiable. If f' = 0 on I, then f is constant on I.
Proof. Let x_0 ∈ I, and apply the mean value theorem to the interval between x_0 and x, for any x ∈ I. We find that there is c strictly between x_0 and x such that f(x) − f(x_0) = f'(c)(x − x_0) = 0. Thus f(x) = f(x_0) for all x ∈ I.
Corollary 20.14. Let I be as in the previous corollary, and let f, g : I → R be differentiable. If f' = g' on I, then f − g is a constant function.
Proof. Apply the previous corollary to f − g.
a. A similar argument shows that the minimum does not occur at b. Hence f has a local minimum in the open interval (a, b), and at this point f' = 0.
Proof. (of Theorem 20.19) By Theorem 20.20 we know that f' > 0 on I, or that f' < 0 on I. By Corollary 20.16 it follows that f is strictly monotone on I. It follows from the intermediate value theorem that f(I) is an open interval, and that f^{−1} is continuous. We now show that f^{−1} is differentiable, and compute its derivative. For x ∈ I let y = f(x) ∈ f(I). For w ∈ f(I) with w ≠ y, there is t ∈ I such that w = f(t). Since f is one-to-one, t ≠ x. We have
(f^{−1}(w) − f^{−1}(y))/(w − y) = (t − x)/(f(t) − f(x)) = [(f(t) − f(x))/(t − x)]^{−1}.
Since f^{−1} is continuous, lim_{w→y} t = x. Moreover t ≠ x during this limiting process. Therefore
lim_{w→y} (f^{−1}(w) − f^{−1}(y))/(w − y) = lim_{t→x} [(f(t) − f(x))/(t − x)]^{−1} = 1/f'(x) = 1/f'(f^{−1}(y)).
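The formula (f^{−1})'(y) = 1/f'(f^{−1}(y)) can be sanity-checked numerically. A sketch with the sample function f(x) = x³ + x (our choice; f' = 3x² + 1 > 0, so f is strictly increasing and the theorem applies), inverting by bisection:

```python
f = lambda x: x ** 3 + x
fprime = lambda x: 3 * x ** 2 + 1

def finv(y, lo=-10.0, hi=10.0):
    """Invert f by bisection; valid because f is strictly increasing."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

y, eps = 2.0, 1e-6
numeric = (finv(y + eps) - finv(y - eps)) / (2 * eps)  # difference quotient of f^{-1}
formula = 1 / fprime(finv(y))                          # = 1/f'(f^{-1}(y))
print(numeric, formula)  # both are close to 1/4, since f(1) = 2 and f'(1) = 4
```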
Corollary 20.21. If f is C^r (in addition to the hypotheses of the inverse function theorem), then so is f^{−1}.
Proof. The formula for (f^{−1})' shows that it is continuous if f' is continuous. Similarly, it is differentiable if f' is differentiable, etc.
If you consider the function h used in the proof of the mean value theorem, you will notice
the beginnings of some symmetry: the function f and the identity function play opposite
roles. Remarkably, the identity function can be replaced by another function like f . The
result is
Theorem 20.22. (Cauchy mean value theorem.) Let f, g : [a, b] → R be continuous, and differentiable on (a, b). Then there exists c ∈ (a, b) such that (f(b) − f(a))g'(c) = (g(b) − g(a))f'(c).
Proof. Let h(t) = (f(b) − f(a))(g(t) − g(a)) − (f(t) − f(a))(g(b) − g(a)). Then h is continuous on [a, b], differentiable on (a, b), and h(a) = h(b) = 0. Now the mean value theorem gives the result.
We apply Cauchy’s mean value theorem to prove L’Hôpital’s rule on the computation of
indeterminate limits. The proof applies to any form of continuous limit — here we phrase
it for one-sided limits.
Theorem 20.23. (L'Hôpital's rule.) Let f, g : (a, b) → R be differentiable. Suppose that lim_{t→a+} f(t) = lim_{t→a+} g(t) = 0, and that g(t) ≠ 0 on (a, b). If lim_{t→a+} f'(t)/g'(t) = L, then lim_{t→a+} f(t)/g(t) = L.
Proof. Define f(a) = g(a) = 0. Then f and g are continuous on [a, b). By the hypothesis on the limit of f'/g', we are implicitly assuming that g'(t) ≠ 0, at least for all t close enough to a. Replacing b by a smaller value, we may assume that g' ≠ 0 on (a, b). Now, for t ∈ (a, b), we apply Cauchy's mean value theorem to f and g on the interval [a, t]. Thus there exists c ∈ (a, t) with (f(t) − f(a))g'(c) = (g(t) − g(a))f'(c). Since f(a) = g(a) = 0, we get f(t)g'(c) = g(t)f'(c). By hypothesis we have g(t) ≠ 0. Thus we have
f(t)/g(t) = f'(c)/g'(c).
Moreover, a < c < t. Of course, c depends on t, but we see that as t → a+ then also c → a+ .
Hence
lim_{t→a+} f(t)/g(t) = lim_{c→a+} f'(c)/g'(c) = L.
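A numerical illustration of the rule on a standard 0/0 limit, lim_{t→0+} (1 − cos t)/t², where f'(t)/g'(t) = sin t/(2t) → 1/2:

```python
import math

# Both the original ratio and the ratio of derivatives approach 1/2.
for t in (0.1, 0.01, 0.001):
    original = (1 - math.cos(t)) / t ** 2  # f(t)/g(t)
    derivatives = math.sin(t) / (2 * t)    # f'(t)/g'(t)
    print(t, original, derivatives)
```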
With a bit more work, the same result can be proved in the case where we assume lim_{t→a+} f(t) = lim_{t→a+} g(t) = ∞. This is an interesting exercise, or you may look up the proof (e.g. in Rudin). If lim_{t→a+} f(t) = 0 while lim_{t→a+} g(t) = ±∞, evaluating the limit lim_{t→a+} f(t)g(t) presents us with the third kind of indeterminate form, namely 0 · ∞. In this case, we would instead consider the limit of f/(1/g), which is indeterminate of form 0/0.
We see by this lemma that it is easy to find a polynomial that approximates f well at the
point a. It is not as easy to see how well this polynomial approximates f near the point a.
For this, we have Taylor’s Theorem. One can think of it as the generalization of the mean
value theorem from order 0 to order k. The proof is a bit tricky; we will use Cauchy’s mean
value theorem.
(2) Define g : R → R by
g(x) = e^{−1/x}, if x > 0,
g(x) = 0, if x ≤ 0.
It is a nice exercise to show that g has derivatives of all orders at 0 (this is clear at other points of R), and that g^{(j)}(0) = 0 for all j. Thus all Taylor polynomials of g at 0 are identically zero. Therefore the Taylor polynomials of g do not approximate g uniformly in any neighborhood of zero.
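Numerically, g vanishes to infinite order at 0: e^{−1/x} is eventually smaller than any power of x, which is why no nonzero Taylor polynomial can match it. A quick sketch:

```python
import math

g = lambda x: math.exp(-1 / x) if x > 0 else 0.0

# The ratio g(x)/x^10 tends to 0 as x -> 0+, even against the high power x^10.
# (For moderate x the ratio can exceed 1; the decay only wins near 0.)
for x in (0.1, 0.02, 0.01):
    print(x, g(x), g(x) / x ** 10)
```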
Among the first properties of integration that are presented in calculus are the “sum” and
“scalar multiple” rules:
∫(f + g) = ∫f + ∫g;  ∫(cf) = c∫f.
In fact, these say precisely that integration is a linear functional. Linear algebra
is an essential part of modern analysis, and the analysis of linear functionals, functional
analysis, is one of its broadest subdisciplines.
Well, the notion of linear map presupposes the idea of vector spaces: the domain and
codomain of a linear map should be vector spaces. This is a fundamental idea, that is
almost completely lost in a calculus course: the collection of functions that can be integrated
should be a vector space. To be candid, we don’t really talk at all about the “space of
integrable functions” in a calculus course. At best, we try to explain why certain functions
are integrable, e.g. continuous, or piecewise continuous, functions. This time, we will directly
address this question. Not only will we carefully define what integrable means and prove that the set of integrable functions is a vector space; we will also give an independent characterization (due to Lebesgue) of exactly which functions are integrable. This is useful even just in the
context of Riemann integration. Many important results that would otherwise require fussy
proofs will become effortless (so to speak). But it also prods us to a larger view. Once we
are able to see the space of Riemann integrable functions as a whole, we can also begin to
see its limitations, and where it might give way to generalization. In the next semester we
will spend some time (how much???) exploring Lebesgue’s version of integration.
That is the end of the “introduction”. We have to get started, and the beginning is
very basic — after all, integration is just a lot of arithmetic. We will follow Pugh’s idea
of emphasizing the fact that there are two usual ways to present the integral; he refers to
them as the Riemann and the Darboux approaches. Without any expertise in the history of
mathematics, or any effort at tracking down that history, we will just adopt this terminology.
First we give the Riemann approach. We let f be a real-valued function on a compact interval
[a, b].
Definition 22.1. A partition of [a, b] is a finite set P ⊆ [a, b] such that a, b ∈ P .
The idea of a partition is that it defines a subdivision of [a, b] into a finite number of
subintervals. The easiest way of indicating this is by giving the set of endpoints of the
subintervals, which is what our definition does. We usually write a partition in the form
P = {x0 , x1 , . . . , xn },
where a = x0 < x1 < · · · < xn = b. This is a slight abuse of notation, since the definition
of P as a set does not indicate that the numbers in the set are given in (strictly) increasing
order. From the partition P we obtain n subintervals of [a, b]: [x0 , x1 ], . . ., [xn−1 , xn ]. Note
that the number n associated with P is obtained from the relation n + 1 = #(P ). We use
the term mesh for the length of the largest subinterval: mesh(P) = max_{1≤i≤n}(x_i − x_{i−1}).
The mesh is a rough sort of description of how fine the partition is.
Definition 22.2. A partition pair is a partition P together with a list T = (t1 , . . . , tn ) such
that xi−1 ≤ ti ≤ xi for 1 ≤ i ≤ n.
Thus the list T consists of a selection of one element from each subinterval of the partition.
Definition 22.3. Let f : [a, b] → R, and let (P, T ) be a partition pair for the interval [a, b].
The Riemann sum associated to this data is the number
R(f, P, T) = Σ_{i=1}^{n} f(t_i)∆x_i,
where ∆x_i = x_i − x_{i−1}, the length of the ith subinterval.
Now we have the terminology we need to define Riemann integrability and the Riemann
integral. As mentioned above, Riemann sums are just a lot of (carefully organized) arith-
metic. To pass to the integral is a limiting process. The following definition is the usual
notion of limit, but is based on the mesh.
Definition 22.4. The function f : [a, b] → R is Riemann integrable if there is a number L
such that for every ε > 0, there exists δ > 0 such that for every partition pair (P, T ) of [a, b],
if mesh(P) < δ then |R(f, P, T) − L| < ε.
We write L = lim_{mesh(P)→0} R(f, P, T) to indicate this limit. The number L is unique, if it exists. This is proved in the usual way of limits, and is left to you as an exercise. If f is Riemann integrable, we write ∫_a^b f (or ∫_a^b f dx, or ∫_a^b f(x) dx, or just ∫f) for the number L. We will write R[a, b] for the set of all Riemann integrable functions on [a, b].
There is an important detail hidden in the last definition. For the limit to exist it must
be the case that the approximation holds independently of the choice of the list T in the
partition pair. In other words, if P is a partition with mesh(P ) < δ, then the Riemann sum
is within ε of L for any choice of T .
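The limit in Definition 22.4 can be watched numerically. The following sketch (not part of the notes; the helper names `riemann_sum` and `random_partition_pair` are ours) evaluates R(f, P, T) for f(x) = x² on [0, 1] with randomly chosen partitions and tags; as the mesh shrinks, the sums settle near 1/3 no matter how the tags are picked.

```python
import random

def riemann_sum(f, partition, tags):
    """R(f, P, T) = sum over i of f(t_i) * (x_i - x_{i-1})."""
    return sum(f(t) * (x1 - x0)
               for x0, x1, t in zip(partition, partition[1:], tags))

def random_partition_pair(a, b, n):
    """A partition of [a, b] with n subintervals, with a random tag
    chosen in each subinterval."""
    interior = sorted(random.uniform(a, b) for _ in range(n - 1))
    p = [a] + interior + [b]
    tags = [random.uniform(x0, x1) for x0, x1 in zip(p, p[1:])]
    return p, tags

random.seed(0)
f = lambda x: x * x
p, t = random_partition_pair(0.0, 1.0, 10_000)
# As mesh(P) -> 0, R(f, P, T) should approach 1/3 whatever the tags are.
print(abs(riemann_sum(f, p, t) - 1 / 3))
```

With ten thousand subintervals the discrepancy is already tiny, which is exactly the independence from T noted above.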
We now give some consequences of the definition.
Theorem 22.5. If f is Riemann integrable then f is bounded.
Proof. We apply the definition of integrability with ε = 1: there exist L and δ > 0 such that if P is any partition with mesh(P) < δ, then |R(f, P, T) − L| < 1. (As we mentioned above, this estimate holds for any choice of T.) It follows from the triangle inequality that
|Σ_{i=1}^{n} f(t_i)∆x_i| < 1 + |L|.
We will show that f is bounded on each subinterval of [a, b] defined by P. It will then follow that f is bounded on [a, b]. Fix i_0 ∈ {1, 2, . . . , n}. For i ≠ i_0 choose t_i ∈ [x_{i−1}, x_i]. For any t ∈ [x_{i_0−1}, x_{i_0}] we apply the above inequality to the list T = (t_1, . . . , t_{i_0−1}, t, t_{i_0+1}, . . . , t_n):
|f(t)|∆x_{i_0} − |Σ_{i≠i_0} f(t_i)∆x_i| ≤ |R(f, P, T)| < 1 + |L|.
We find that
|f(t)| ≤ (∆x_{i_0})^{−1} (1 + |L| + |Σ_{i≠i_0} f(t_i)∆x_i|).
Thus the right hand side is an upper bound for |f| on [x_{i_0−1}, x_{i_0}].
Theorem 22.6. R[a, b] is a vector space, and integration defines a linear functional on it.
Proof. We note that for a fixed partition pair (P, T), the Riemann sum is linear in f:
R(cf + g, P, T) = Σ_i (cf + g)(t_i)∆x_i
= Σ_i (cf(t_i) + g(t_i))∆x_i
= c Σ_i f(t_i)∆x_i + Σ_i g(t_i)∆x_i
= cR(f, P, T) + R(g, P, T).
Since addition and multiplication in R are continuous, we get
lim_{mesh(P)→0} R(cf + g, P, T) = lim_{mesh(P)→0} (cR(f, P, T) + R(g, P, T)) = c∫_a^b f + ∫_a^b g.
m_i = inf_{x_{i−1} ≤ t ≤ x_i} f(t),   M_i = sup_{x_{i−1} ≤ t ≤ x_i} f(t),
L(f, P) = Σ_i m_i ∆x_i,
U(f, P) = Σ_i M_i ∆x_i.
These are referred to as lower and upper sums. Notice that for any partition pair (P, T )
we have that L(f, P) ≤ R(f, P, T) ≤ U(f, P). Finally we define
I̲(f) = sup_P L(f, P),   Ī(f) = inf_P U(f, P).
These are referred to as the lower and upper integrals of f on [a, b]. It is standard to write ∫̲_a^b f for I̲(f), and ∫̄_a^b f for Ī(f). Finally, we say that f is Darboux integrable on [a, b] if I̲(f) = Ī(f), and in this case the common value is called the (Darboux) integral.
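As a numerical companion to the definition (this sketch is ours, not from the notes), note that for an increasing f the infimum and supremum over each subinterval are attained at its endpoints, so L(f, P) and U(f, P) are easy to evaluate; on a uniform partition of [0, 1] with f(x) = x² they pinch the integral 1/3:

```python
def darboux_sums_increasing(f, partition):
    """L(f, P) and U(f, P) for an increasing f, where the inf and sup on
    [x_{i-1}, x_i] are f(x_{i-1}) and f(x_i)."""
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    return lower, upper

n = 1000
p = [i / n for i in range(n + 1)]
lo, up = darboux_sums_increasing(lambda x: x * x, p)
print(lo, up)  # lower sum <= 1/3 <= upper sum, and they pinch together
```

Here U(f, P) − L(f, P) telescopes to (f(1) − f(0))/n, so the gap shrinks like 1/n.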
Our goal for this section is to prove that the Riemann and Darboux approaches yield the
same result. Before doing this we need to talk a bit about refinements of partitions, and
their effect on upper and lower sums and integrals.
Definition 23.2. Let P and P′ be partitions of [a, b]. We say that P′ refines P if P ⊆ P′.
It is easy to see that P′ refines P if and only if every subinterval associated to P′ is contained in one of the subintervals associated to P.
Lemma 23.3. (Refinement Principle) Let P′ refine P. Then L(f, P) ≤ L(f, P′) and U(f, P′) ≤ U(f, P).
In other words, refining the partition causes the lower sum to increase, and the upper sum
to decrease. The idea of the proof is to proceed from P to P 0 by adding one point at a time.
Then the change in the lower and upper sums happens on only one subinterval of P . We
leave as an exercise the writing of a precise proof.
In general, if P1 and P2 are two partitions of [a, b], then neither one need refine the other.
Thus there is in general no relation between the upper and lower sums for two partitions.
However, P1 and P2 always have a common refinement; for example, P1 ∪ P2 contains both
P1 and P2 . This device gives us the following important result: every lower sum for f is less
than or equal to every upper sum for f .
Lemma 23.4. Let P1 and P2 be two partitions of [a, b]. Then L(f, P1 ) ≤ U (f, P2 ).
Ī(f) = lim_{P→∞} U(f, P).
If f is Darboux integrable, then we have ∫f = lim_{P→∞} L(f, P) = lim_{P→∞} U(f, P).)
We are now ready to prove the main theorem of this section.
Theorem 23.6. Let f : [a, b] → R. Then f is Riemann integrable if and only if f is Darboux
integrable. For an integrable function, the two integrals coincide.
Proof. We first assume that f is Riemann integrable. Let ε > 0. There exist a number L and δ > 0 such that if P is any partition with mesh(P) < δ, then for any list T associated to P we have |R(f, P, T) − L| < ε. Fix any partition P with mesh(P) < δ. Then we have (for any T)
L − ε < R(f, P, T) < L + ε.
Recall that for any partition pair (P, T ), we have L(f, P ) ≤ R(f, P, T ) ≤ U (f, P ). Moreover,
it is easy to see that
L(f, P) = inf_T R(f, P, T),
U(f, P) = sup_T R(f, P, T).
It follows that
L − ε ≤ L(f, P),
U(f, P) ≤ L + ε.
Therefore U (f, P ) − L(f, P ) ≤ 2ε. Hence f is Darboux integrable.
Now we assume that f is Darboux integrable. The proof of this direction is a bit trickier
than the other one. In particular, it relies upon the standard technique of dividing the sum
into two kinds of terms, and estimating them differently. Since f is bounded, there is K such that |f| ≤ K on [a, b]. Let L = ∫_a^b f (the Darboux integral of f). Let ε > 0. Choose a partition P such that
U(f, P) − L(f, P) < ε.
Write P = {x_0, x_1, . . . , x_n}. Set δ = ε/n. We will show that if (Q, T) is any partition pair with mesh(Q) < δ, then |R(f, Q, T) − L| < (2K + 1)ε, proving Riemann integrability (and also showing that the two integrals coincide). In fact, it will suffice to show that U(f, Q) − L(f, Q) < (2K + 1)ε, since both L and R(f, Q, T) lie between the lower and upper sums.
So let Q = {y0 , y1 , . . . , yk } have mesh less than δ. We will write Ii = [xi−1 , xi ] for 1 ≤ i ≤ n,
and Jj = [yj−1 , yj ] for 1 ≤ j ≤ k. We divide the subintervals associated to Q into two groups
as follows:
S1 = {j : there exists i with xi ∈ int(Jj )}
S2 = {1, 2, . . . , k} \ S1 .
Thus S2 indicates those Jj ’s that are entirely contained in one of the Ii ’s; S1 indicates those
Jj ’s that straddle more than one of the Ii ’s. There are at most n elements in S1 (in fact,
there are at most n − 1). Now we will use m(I) and M (I) for the infimum and supremum
of f over an interval I. For j ∈ S1 we have
−K ≤ m(Jj ) ≤ M (Jj ) ≤ K.
For j ∈ S2 there is i such that Jj ⊆ Ii . Then
m(Ii ) ≤ m(Jj ) ≤ M (Jj ) ≤ M (Ii ).
Hence for this j and i we have
M (Jj ) − m(Jj ) ≤ M (Ii ) − m(Ii ).
Now we estimate:
U(f, Q) − L(f, Q) = Σ_{j=1}^{k} (M(J_j) − m(J_j))∆y_j
= Σ_{j∈S_1} (M(J_j) − m(J_j))∆y_j + Σ_{j∈S_2} (M(J_j) − m(J_j))∆y_j
≤ Σ_{j∈S_1} 2K∆y_j + Σ_{i=1}^{n} (M(I_i) − m(I_i)) Σ_{j∈S_2, J_j⊆I_i} ∆y_j
< 2Knδ + Σ_{i=1}^{n} (M(I_i) − m(I_i))∆x_i
= 2Kε + U(f, P) − L(f, P)
< (2K + 1)ε.
There are various situations where it is fairly easy to prove integrability (or non-integrability)
using the Darboux definition. These are useful exercises in working with the definition. In
the next section we will prove a deep theorem that will make them trivial to verify.
Example 23.7. (1) Continuous functions are Riemann integrable.
(2) Monotone functions are Riemann integrable.
(3) Step functions are Riemann integrable. (A step function on [a, b] is a function for
which there exists a partition of [a, b] such that the function is constant on the interior
of each subinterval.) In particular, the characteristic function χ_{[c,d]} of a subinterval [c, d] of [a, b] is Riemann integrable over [a, b], where χ_E(x) = 1 if x ∈ E and χ_E(x) = 0 if x ∉ E.
(4) More generally, a bounded function that is continuous at all but finitely many points
of [a, b] is Riemann integrable.
(5) The characteristic function of Q is not Riemann integrable over any interval.
Proof. Proofs for the previous three assertions are left as exercises.
(7) The Cantor set C has measure zero.
Proof. Recall from our construction of C that C = ∩_{n=1}^{∞} F_n, where F_n is the union of 2^n closed intervals, each of length 3^{−n}. Stretching each of these a little bit, we can produce 2^n open intervals U_i, each having length less than (2.5)^{−n} and having union containing F_n (and hence C). Then Σ_{i=1}^{2^n} |U_i| < (2/2.5)^n, which tends to zero as n → ∞.
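The construction in example (7) can be carried out exactly. A small sketch (ours, not from the notes) runs the middle-thirds construction with exact rational endpoints; after n steps there are 2^n closed intervals of total length (2/3)^n:

```python
from fractions import Fraction

def cantor_step(intervals):
    """One step of the construction: keep the outer thirds of each
    closed interval."""
    out = []
    for a, b in intervals:
        t = (b - a) / 3
        out += [(a, a + t), (b - t, b)]
    return out

F = [(Fraction(0), Fraction(1))]
for _ in range(10):
    F = cantor_step(F)
total = sum(b - a for a, b in F)
print(len(F), total)  # 2**10 intervals of total length (2/3)**10
```

Using `Fraction` keeps the endpoints exact, so the total length is literally (2/3)^10 rather than a floating-point approximation.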
(8) If a < b then [a, b] does not have measure zero. This is a good exercise, even if it
isn’t homework (but it might be).
Before stating the main theorem, we recall (from homework) the notion of oscillation of a
function at a point. The definition makes sense for a function between general metric spaces,
but for clarity we will state it only for functions whose codomain is R.
Definition 24.3. Let X be a metric space, and let f : X → R. Let a ∈ X. The oscillation
of f at a is
osc(f, a) = inf_{r>0} sup_{x,y∈B_r(a)} |f(x) − f(y)|.
This is the precise description of a very natural idea. Let’s briefly take the definition apart.
Fix r > 0. This defines an open ball about a. How much can the function vary over this
ball? The supremum in the parentheses is exactly how much. If we let r become smaller,
then the ball becomes smaller, so that there are fewer points in the ball to put inside of f .
Thus as r decreases, the supremum also decreases. In fact, the infimum over r is actually
equal to the limit as r → 0. This limiting value is the minimum amount that f can be made
to jump, no matter to how small a ball (centered at a) you confine its argument. That is
what we mean by the oscillation at a.
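The inf-of-sups in Definition 24.3 can be approximated by sampling. A rough sketch (ours; `osc_estimate` is a hypothetical helper, and grid sampling only estimates the suprema over each ball):

```python
def osc_estimate(f, a, radii, samples=2000):
    """Approximate osc(f, a) = inf over r of sup_{x,y in B_r(a)} |f(x) - f(y)|
    by sampling each ball on a grid and taking the smallest spread."""
    spreads = []
    for r in radii:
        xs = [a - r + 2 * r * k / samples for k in range(samples + 1)]
        vals = [f(x) for x in xs]
        spreads.append(max(vals) - min(vals))
    return min(spreads)

step = lambda x: 1.0 if x >= 0 else 0.0
print(osc_estimate(step, 0.0, [1.0, 0.1, 0.01]))             # jump of size 1
print(osc_estimate(lambda x: x * x, 0.0, [1.0, 0.1, 0.01]))  # near 0
```

For the step function the spread stays 1 on every ball around 0, matching a discontinuity of size 1; for the continuous x² the spreads shrink with r, consistent with Lemma 24.4(1).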
We can think of the oscillation of f at a as a measure of the size of the discontinuity of f
at a. That is an interpretation of the first part of the following lemma (which should have
been homework earlier in the semester).
Lemma 24.4. Let X be a metric space, let f : X → R, and let a ∈ X.
(1) f is continuous at a if and only if osc(f, a) = 0.
(2) For c > 0, {x ∈ X : osc(f, x) ≥ c} is a closed set.
Theorem 24.5. Let f : [a, b] → R be bounded. Let E be the set of points in [a, b] where f is
discontinuous. Then f is Riemann integrable if and only if E has measure zero.
Proof. We first assume that f is Riemann integrable. Let E_n = {x ∈ [a, b] : osc(f, x) ≥ 1/n}. By Lemma 24.4(1), we know that E = ∪_{n=1}^{∞} E_n. Thus it suffices to show that E_n has measure zero for each n. So now fix n, let ε > 0, and choose a partition P such that U(f, P) − L(f, P) < ε/n.
Let
S = {I : I is a subinterval of P, and int(I) ∩ E_n ≠ ∅}.
For I ∈ S we have that M(I) − m(I) ≥ 1/n. (The reason is that there must exist a point a ∈ E_n in the interior of I, so that I ⊇ B_r(a) for some r > 0.) But now we estimate
(1/n) Σ_{I∈S} |I| ≤ U(f, P) − L(f, P) < ε/n,
so that Σ_{I∈S} |I| < ε. Now the union ∪_{I∈S} I contains all points of E_n except possibly some of the endpoints of subintervals of P not in S. There can be only finitely many such points. Let T be a collection of open intervals centered at these points with total length so small that Σ_{I∈S} |I| + Σ_{J∈T} |J| < ε. Then {int(I) : I ∈ S} ∪ T is a finite collection of open intervals covering E_n and having total length less than ε. Therefore E_n has measure zero.
Now we prove the converse. Suppose that E has measure zero. Let |f| ≤ K on [a, b], and let ε > 0 be given. Let E_0 = {x ∈ [a, b] : osc(f, x) ≥ ε}. Then E_0 ⊆ E, so that E_0 also has measure zero. Let U_1, U_2, . . . be open intervals such that E_0 ⊆ ∪_{i=1}^{∞} U_i and Σ_{i=1}^{∞} |U_i| < ε. By Lemma 24.4(2), E_0 is closed. Since E_0 ⊆ [a, b], E_0 is compact. Thus there is n such that E_0 ⊆ U_1 ∪ · · · ∪ U_n. Let P_0 = (∪_{i=1}^{n} ∂U_i ∩ [a, b]) ∪ {a, b}, a partition of [a, b].
We will find a suitable refinement P of P0 such that U (f, P ) − L(f, P ) < (2K + b − a)ε,
which will conclude the proof. Since P0 contains the endpoints of the Ui ’s, each subinterval
associated to P0 is either contained in some Ui , or is disjoint from all of the Ui ’s. Let S1
denote the collection of those subintervals that are contained in some Ui , and let S2 denote
the remaining subintervals. Then for I ∈ S1 we have
M (I) − m(I) ≤ 2K.
Hence
Σ_{I∈S_1} (M(I) − m(I))|I| ≤ 2K Σ_{i=1}^{n} |U_i| < 2Kε.
Now consider a subinterval I ∈ S2 . Then I ∩ E0 = ∅, so the oscillation of f at each point
of I is less than ε. Thus for each x ∈ I there is an open interval Ix centered at x such that
M(I_x) − m(I_x) < ε. The collection {I_x : x ∈ I} is an open cover of the compact interval I, hence has a finite subcover: there are x_1, . . ., x_k ∈ I such that I ⊆ ∪_{i=1}^{k} I_{x_i}. We define P by including into P_0 all endpoints of the I_{x_i} that lie in I:
P = P_0 ∪ ∪_{I∈S_2} (∪_i ∂I_{x_i} ∩ I).
Let us consider the subintervals of P contained in some I ∈ S_2; let J be one such. Then J ⊆ I_{x_i} for some i, and hence M(J) − m(J) < ε. Therefore
Σ_{I∈S_2} Σ_{J⊆I} (M(J) − m(J))|J| < Σ_{I∈S_2} Σ_{J⊆I} ε|J| = ε Σ_{I∈S_2} |I| ≤ ε(b − a).
We now have
U(f, P) − L(f, P) = Σ_{I∈S_1} (M(I) − m(I))|I| + Σ_{I∈S_2} Σ_{J⊆I} (M(J) − m(J))|J|
< 2Kε + ε(b − a) = (2K + b − a)ε.
Corollary 24.7. Let f : [a, b] → R be zero except at finitely many points. Then f is Riemann integrable, and ∫_a^b f = 0.
Proof. The integrability follows from the previous corollary. It is easy to use the definition of the integral to show that the integral is zero.
Corollary 24.8. Riemann integrability, and the value of the Riemann integral, of a function
are unaffected when the function is altered at finitely many points.
Proof. The altered function equals the sum of the original function with a function that is
zero except at finitely many points. Thus the previous corollary, together with linearity of
the integral, give the result.
Corollary 24.9. Monotone functions are Riemann integrable.
Proof. This follows from the fact that a monotone function has countably many discontinu-
ities. To see this, note that a monotone function has one-sided limits at all points, and is
discontinuous at a point if and only if the two one-sided limits at that point are distinct. If
we let
f(x±) = lim_{t→x±} f(t),
then for any x ≠ y we have (f(x−), f(x+)) ∩ (f(y−), f(y+)) = ∅. Thus if we let q(x) be a rational number in the interval (f(x−), f(x+)) for each discontinuity x of f, then q is a one-to-one function from the set of discontinuities into Q. Therefore the set of discontinuities is countable, and hence of measure zero.
Corollary 24.10. The product of Riemann integrable functions is Riemann integrable.
Proof. The set of discontinuities of f g is contained in the union of the sets of discontinuities
of f and g separately.
Corollary 24.11. Let f be Riemann integrable on [a, b], and let ϕ be a continuous function
defined on the range of f . Then ϕ ◦ f is Riemann integrable (also on [a, b]).
Proof. Since composition preserves continuity, the set of points where f is continuous is
contained in the set of points where ϕ ◦ f is continuous. Hence the sets of discontinuities
satisfy the reverse containment.
Remark 24.12. The order in which the two functions are composed in the previous corollary
is crucial: f ◦ ϕ need not be integrable. (You can remember which order preserves integrability by noting that in the corollary, the composition has the same domain as the integrable function.)
Corollary 24.13. If f is Riemann integrable, then so is |f |.
Proof. |f | = | · | ◦ f .
Corollary 24.14. Let f be Riemann integrable on [a, b], and let [c, d] ⊆ [a, b]. Then f is Riemann integrable on [c, d]. Moreover, ∫_c^d f = ∫_a^b f χ_{[c,d]}.
Proof. For the first statement, note that any discontinuity of f in [c, d] is also a discontinuity
in [a, b]. The second statement follows easily from either definition of the integral by including
{c, d} into a partition of [a, b].
Corollary 24.15. Let f be Riemann integrable on [a, b], and let c ∈ (a, b). Then
∫_a^b f = ∫_a^c f + ∫_c^b f.
Proof. This follows from linearity of the integral and Corollary 24.8, since f χ_{[a,b]} and f χ_{[a,c]} + f χ_{[c,b]} can differ only at c.
The image of a set of measure zero under a continuous function need not have measure
zero. This is a pretty strange phenomenon. The upshot is that continuity is not really such a
strong property. It is important that a stronger version of continuity is sufficient to preserve
measure zero sets.
Lemma 24.16. Let g : [a, b] → R be a Lipschitz function, and let E ⊆ [a, b] have measure
zero. Then g(E) has measure zero.
Proof. Let c > 0 be a Lipschitz constant for g. We claim that if I is an open interval contained in [a, b], then |g(I)| ≤ c|I|. To see this, let I = (t − r, t + r). Then by the Lipschitz condition, g(I) ⊆ (g(t) − cr, g(t) + cr). Now let ε > 0. Let U_1, U_2, . . . be open intervals with E ⊆ ∪_i U_i and Σ_i |U_i| < ε/c. Let us assume that U_i ⊆ [a, b]; this is not a serious restriction, as we may extend the domain of g to all of R (e.g. by letting g be constant on (−∞, a] and on [b, ∞)) without changing the Lipschitz constant. Then g(E) ⊆ ∪_i g(U_i), and
Σ_i |g(U_i)| ≤ c Σ_i |U_i| < c · (ε/c) = ε.
Proof. One of a, b and c lies between the other two. By symmetry, we may assume without loss of generality that it is c that lies in the middle. Again without loss of generality, we may assume that a < c < b. Now, if f is integrable on [a, b], we are done by Corollary 24.14. On the other hand, if f is integrable on [a, c] and [c, b], then f is integrable on [a, b] by Theorem 24.5.
Remark 25.3. If a < b, then |∫_a^b f| ≤ ∫_a^b |f|.
Proof. Since g′ is continuous on [c, d], it does not change sign. We first consider the case where g′ > 0 on [c, d]. Then g(c) < g(d). Note that g^{−1} is also continuously differentiable, by the inverse function theorem, and hence that g^{−1} is Lipschitz. By Corollary 24.17, f ∘ g is Riemann integrable, and hence so is (f ∘ g)g′. Let L and L′ be the two integrals in the statement of the theorem, and let ε > 0. Let δ > 0 be such that for any partition pair (Q, U) of [g(c), g(d)] with mesh(Q) < δ we have |R(f, Q, U) − L| < ε. Since g is uniformly continuous on [c, d] there is η_1 > 0 such that if |x − x′| < η_1 then |g(x) − g(x′)| < δ. Choose η_2 > 0 such that for any partition pair (P, T) of [c, d] with mesh(P) < η_2 we have |R((f ∘ g)g′, P, T) − L′| < ε. Fix a partition P of [c, d] with mesh(P) < min{η_1, η_2}.
Write P = {x_0, x_1, . . . , x_n}. Let y_i = g(x_i), and let Q = g(P) = {y_0, y_1, . . . , y_n}. Since mesh(P) < η_1 we know that mesh(Q) < δ. The mean value theorem applied to g on [x_{i−1}, x_i] gives t_i ∈ (x_{i−1}, x_i) such that
g(x_i) − g(x_{i−1}) = g′(t_i)(x_i − x_{i−1}), i.e. ∆y_i = g′(t_i)∆x_i.
Let u_i = g(t_i), and set U = (u_1, . . . , u_n). Then (Q, U) is a partition pair of [g(c), g(d)], and
R(f, Q, U) = Σ_{i=1}^{n} f(u_i)∆y_i = Σ_{i=1}^{n} f(g(t_i))g′(t_i)∆x_i = R((f ∘ g)g′, P, T).
Therefore
|L − L′| ≤ |L − R(f, Q, U)| + |R((f ∘ g)g′, P, T) − L′| < ε + ε = 2ε.
Hence L = L′.
If, on the other hand, g′ < 0 on [c, d], then g(d) < g(c). Note that ∆y_i = −g′(t_i)∆x_i (and i runs backward). But R(f, Q, U) approximates ∫_{g(d)}^{g(c)} f = −L.
and so
∫_δ^1 g_n(t) dt = (1/c_n) ∫_δ^1 (1 − t²)^n dt ≤ ((n + 1)/2) ∫_δ^1 (1 − δ²)^n dt = ((n + 1)(1 − δ)/2)(1 − δ²)^n → 0
as n → ∞. Similarly, ∫_{−1}^{−δ} g_n → 0 as n → ∞. Hence ∫_{−δ}^{δ} g_n → 1.
Now we will state and prove the Weierstrass approximation theorem.
It follows that
p_n(x) = Σ_{j=0}^{2n} ( ∫_0^1 f(u) ( Σ_{i=0}^{2n} a_{ij} u^i ) du ) x^j
is a polynomial in x.
≤ ε|x − a| + ε = ε(|x − a| + 1) ≤ ε(M + |a| + 1).
Thus (fn ) is uniformly Cauchy on J, and hence converges uniformly on J (and pointwise on
all of I too).
Let f be the limit of f_n. We now show that f is differentiable, and that f′ = g. Let ε > 0. Choose N so that ‖f′_n − f′_m‖_u < ε/3 for m, n ≥ N. Letting m → ∞, we see also that ‖f′_n − g‖_u ≤ ε/3. Now fix n ≥ N, and fix x ∈ I. For any h ≠ 0 such that x + h ∈ I, we have
(f_n(x + h) − f_n(x))/h − (f(x + h) − f(x))/h = lim_{m→∞} [ (f_n(x + h) − f_n(x))/h − (f_m(x + h) − f_m(x))/h ]
= lim_{m→∞} ((f_n − f_m)(x + h) − (f_n − f_m)(x))/h
= lim_{m→∞} (f_n − f_m)′(x + θh),
where θ ∈ (0, 1) is supplied by the mean value theorem applied to f_n − f_m.
As a last example of the interchange of two limiting processes, we give a result on differentiating an integral. For this we recall from earlier experience the notion of partial derivative.
Let f : [a, b] × [c, d] → R, and suppose that for each y ∈ [c, d] the function x 7→ f (x, y) is
differentiable on [a, b]. The partial derivative of f with respect to x is defined by
(∂f/∂x)(x, y) = lim_{h→0} (f(x + h, y) − f(x, y))/h.
Theorem 27.3. Let f : [a, b] × [c, d] → R be continuous, and suppose that ∂f/∂x exists and is continuous on [a, b] × [c, d]. Let G : [a, b] → R be defined by G(x) = ∫_c^d f(x, y) dy. Then G is differentiable on [a, b], and G′(x) = ∫_c^d (∂f/∂x)(x, y) dy.
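Theorem 27.3 is easy to test numerically. The sketch below (ours, not from the notes) approximates G with a midpoint rule for f(x, y) = sin(xy) on [0, 1] × [0, 1] and compares a difference quotient of G with the integral of ∂f/∂x:

```python
import math

def G(x, f, c, d, n=2000):
    """Midpoint-rule approximation of the integral of f(x, y) dy over [c, d]."""
    h = (d - c) / n
    return sum(f(x, c + (k + 0.5) * h) for k in range(n)) * h

f = lambda x, y: math.sin(x * y)
dfdx = lambda x, y: y * math.cos(x * y)   # the partial derivative in x

x0, h = 1.0, 1e-5
lhs = (G(x0 + h, f, 0.0, 1.0) - G(x0 - h, f, 0.0, 1.0)) / (2 * h)  # ~ G'(x0)
rhs = G(x0, dfdx, 0.0, 1.0)   # integral of (∂f/∂x)(x0, y) dy
print(lhs, rhs)
```

The two numbers agree to several decimal places, as the theorem predicts.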
Proof. Let ε > 0. Since ∂f /∂x is continuous on the compact set [a, b] × [c, d], it is uniformly
continuous. Let δ > 0 be as in the definition of uniform continuity for ∂f /∂x on [a, b] × [c, d]
and for the positive quantity ε/(d − c). Now if x, x + h ∈ [a, b] with 0 < |h| < δ, then
| (G(x + h) − G(x))/h − ∫_c^d (∂f/∂x)(x, y) dy |
= | ∫_c^d [ (f(x + h, y) − f(x, y))/h − (∂f/∂x)(x, y) ] dy |
= | ∫_c^d [ (∂f/∂x)(x + θh, y) − (∂f/∂x)(x, y) ] dy |, for some θ ≡ θ(x, y, h) ∈ (0, 1),
≤ (ε/(d − c)) · (d − c) = ε.
It follows that G′(x) = ∫_c^d (∂f/∂x)(x, y) dy.
Example 28.5. (1) The series Σ_{n=0}^{∞} (−1)^n diverges, since lim_{n→∞} (−1)^n does not exist (and hence is not equal to zero).
(2) The series Σ_{n=1}^{∞} n^{−1/n} diverges, since lim_{n→∞} n^{−1/n} = 1/(lim_{n→∞} n^{1/n}) = 1 is nonzero.
Proposition 28.6. If Σ a_n and Σ b_n converge, then so does Σ(λa_n + µb_n), and Σ(λa_n + µb_n) = λ Σ a_n + µ Σ b_n.
Proof. This follows immediately from the corresponding results for sequences.
This last is the partial sum of a geometric series with ratio 2^{1−p}. Since p > 1, the ratio is less than 1, and hence the geometric series converges. It follows that the partial sums of Σ 1/n^p are bounded, hence it converges.
Next we suppose that 0 < p ≤ 1. We have that
Σ_{i=1}^{2^n} a_i = a_1 + a_2 + (a_3 + a_4) + (a_5 + · · · + a_8) + · · · + (a_{2^{n−1}+1} + · · · + a_{2^n})
≥ a_2 + 2a_4 + 4a_8 + · · · + 2^{n−1} a_{2^n},
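For p = 1 we have a_{2^k} = 2^{−k}, so the condensed lower bound a_2 + 2a_4 + · · · + 2^{n−1}a_{2^n} is exactly n/2, which is unbounded. A quick check (ours, not from the notes):

```python
# a_i = 1/i (the case p = 1): compare the partial sum up to 2**n with
# the condensed lower bound a_2 + 2*a_4 + ... + 2**(n-1) * a_{2**n} = n/2.
n = 15
partial = sum(1 / i for i in range(1, 2 ** n + 1))
condensed = sum(2 ** (k - 1) * (1 / 2 ** k) for k in range(1, n + 1))
print(partial, condensed)  # partial >= condensed, and condensed = n/2
```

Since n/2 → ∞, the harmonic series diverges, exactly as the condensation argument above shows.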
We also have the following facts as immediate corollaries of the theorems on uniform
convergence.
Theorem 29.4. Let f_n : X → R be functions.
(1) If f_n is continuous for all n, and Σ f_n converges uniformly, then Σ f_n is continuous.
(2) If X = [a, b], f_n ∈ R[a, b] for all n, and Σ f_n converges uniformly, then Σ f_n ∈ R[a, b], and ∫_a^b Σ f_n = Σ ∫_a^b f_n.
(3) If X = [a, b], f_n is differentiable for all n, Σ f′_n converges uniformly, and Σ f_n(x_0) converges for some x_0 ∈ [a, b], then Σ f_n is differentiable, and (Σ f_n)′ = Σ f′_n.
We see that (x0 − R, x0 +R) ⊆ D ⊆ [x0 − R, x0 +R], where D is the domain of convergence
of the power series.
Theorem 30.4. R = (lim sup_{n→∞} |a_n|^{1/n})^{−1}.
Proof. We use the root test: lim sup_{n→∞} |a_n(x − x_0)^n|^{1/n} = lim sup_{n→∞} |a_n|^{1/n} · |x − x_0|. This is less than 1 if |x − x_0| < (lim sup_{n→∞} |a_n|^{1/n})^{−1}, and greater than 1 if |x − x_0| > (lim sup_{n→∞} |a_n|^{1/n})^{−1}. This identifies the number R as in the statement of the theorem.
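The formula of Theorem 30.4 can be sanity-checked numerically, with the caveat that a finite tail of |a_n|^{1/n} is only a crude stand-in for the limsup. A sketch (ours; `radius_estimate` is a hypothetical helper):

```python
def radius_estimate(a, N=400):
    """Crude stand-in for R = 1 / limsup |a_n|^(1/n): take the largest
    value of |a_n|^(1/n) over a tail of moderately large n."""
    tail = [abs(a(n)) ** (1.0 / n) for n in range(N // 2, N)]
    return 1.0 / max(tail)

# The series sum x^n / 2^n has radius 2; the series sum n * x^n has radius 1.
print(radius_estimate(lambda n: 0.5 ** n))
print(radius_estimate(lambda n: n))
```

For a_n = 2^{−n} the estimate is exact (|a_n|^{1/n} = 1/2 for every n); for a_n = n the slow decay of n^{1/n} toward 1 makes the estimate only approximate.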
Theorem 30.5. Let Σ a_n(x − x_0)^n have a positive radius of convergence R. Then Σ n a_n (x − x_0)^{n−1} and Σ (a_n/(n + 1))(x − x_0)^{n+1} also have radius of convergence R. If f : (x_0 − R, x_0 + R) → R is defined by f(x) = Σ_{n=0}^{∞} a_n(x − x_0)^n, then these converge to f′(x) and F(x) for x ∈ (x_0 − R, x_0 + R) (where F(x) = ∫_{x_0}^x f(t) dt).
Proof. For x ≠ x_0,
Σ_{n=0}^{∞} n a_n (x − x_0)^{n−1} = Σ_{n=0}^{∞} (n a_n/(x − x_0)) (x − x_0)^n.
Since lim_n n^{1/n} = lim_n |x − x_0|^{1/n} = 1, we have
lim sup_{n→∞} |n a_n/(x − x_0)|^{1/n} = lim sup_{n→∞} |a_n|^{1/n} = R^{−1}.
We will characterize the compact subsets of C(X, Rk ). Recall that a set is compact if
and only if it is complete and totally bounded. Since C(X, Rk ) is already a complete metric
space, a subset is complete if and only if it is closed. Therefore we will focus our attention
on the property of total boundedness: how can we describe in a more intrinsic way what it
means for a subset of C(X, Rk ) to be totally bounded?
Let F ⊆ C(X, R^k) be totally bounded. Let ε > 0. Then there are f_1, . . ., f_n ∈ F such that F ⊆ ∪_{i=1}^{n} B_ε(f_i). Since X is compact, and the f_i are continuous, they are uniformly continuous. Thus for each i there is δ_i > 0 such that for all x, y ∈ X, if d(x, y) < δ_i then ‖f_i(x) − f_i(y)‖ < ε. Let δ = min{δ_1, . . . , δ_n}. We claim that for any function f ∈ F, this δ works in the definition of uniform continuity. To see this, let f ∈ F, and let x, y ∈ X with d(x, y) < δ. There is i_0, 1 ≤ i_0 ≤ n, such that ‖f − f_{i_0}‖ < ε. Then
‖f(x) − f(y)‖ ≤ ‖f(x) − f_{i_0}(x)‖ + ‖f_{i_0}(x) − f_{i_0}(y)‖ + ‖f_{i_0}(y) − f(y)‖ < ε + ε + ε = 3ε.
Thus we have shown that the functions in the family F are “equally uniformly continuous”.
This phrase has been shortened to “equicontinuous”.
Definition 31.1. Let F be a family of functions between metric spaces X and Y . Let
x0 ∈ X.
(1) F is equicontinuous at x_0 if for each ε > 0 there is δ > 0 such that for each f ∈ F and for all x ∈ X, if d_X(x, x_0) < δ then d_Y(f(x), f(x_0)) < ε. (I.e. δ is independent of the choice of f ∈ F.)
(2) F is equicontinuous (on X) if it is equicontinuous at each point of X.
(3) F is uniformly equicontinuous (on X) if for each ε > 0 there is δ > 0 such that for each f ∈ F and for all x, z ∈ X, if d_X(x, z) < δ then d_Y(f(x), f(z)) < ε.
Exercise 31.2. If X is compact, and F is equicontinuous, then F is uniformly equicontin-
uous.
Because of this exercise, when X is compact we need not distinguish between equicontinu-
ity and uniform equicontinuity. We remark that there are stupid examples of equicontinuous
families. For example, in C(X, R), we may consider the family of all constant functions.
This family is clearly equicontinuous, but is not totally bounded (or even bounded). For this
reason we identify another property of a family of functions.
Definition 31.3. F ⊆ C(X, R^k) is pointwise bounded if for each x ∈ X, the set F(x) := {f(x) : f ∈ F} is a bounded subset of R^k.
Exercise 31.4. If F ⊆ C(X, Rk ) is pointwise bounded and equicontinuous, then F is a
bounded subset (of C(X, Rk )).
We remark that a totally bounded subset of C(X, Rk ) is also bounded, and hence pointwise
bounded. Thus we have already proved the following result.
Lemma 31.5. Let X be compact and F ⊆ C(X, Rk ). If F is totally bounded, then F is
pointwise bounded and equicontinuous.
The Arzela-Ascoli theorem is the converse of the lemma. It is usually phrased in terms of
precompactness: a subset of a metric space is precompact if its closure is compact. In the
setting of C(X, Rk ), then, precompactness is the same as total boundedness.
Theorem 31.6. Let X be a compact metric space, and let F ⊆ C(X, Rk ). Then F is
precompact if and only if it is pointwise bounded and equicontinuous.
Proof. As remarked above, we have already proved the “only if” direction. So we assume that
F is pointwise bounded and equicontinuous. We use Exercise 31.2; hence F is uniformly
equicontinuous. Let ε > 0. Choose δ > 0 as in the definition of uniform equicontinuity
of F. Since X is compact, X is totally bounded. Then there are x_1, . . ., x_p ∈ X such that X = ∪_{i=1}^{p} B_δ(x_i). Now we use the pointwise boundedness of F. For each i, the set F(x_i) = {f(x_i) : f ∈ F} is a bounded subset of R^k, hence is totally bounded (by Lemma 13.27). Then the union ∪_{i=1}^{p} F(x_i) is also totally bounded. So we can choose points y_1, . . ., y_q ∈ R^k such that
∪_{i=1}^{p} F(x_i) ⊆ ∪_{j=1}^{q} B_ε(y_j).
Now we come to the interesting part of the argument. Let f ∈ F. For each i, choose j
such that f (xi ) ∈ Bε (yj ). This defines a function ηf : {1, 2, . . . , p} → {1, 2, . . . , q}. Thus ηf
satisfies the formula
f (xi ) ∈ Bε (yηf (i) ).
But notice that there are only a finite number of possible functions η : {1, 2, . . . , p} →
{1, 2, . . . , q}. For each such function η, let
Cη = {f ∈ F : ηf = η}.
Then F ⊆ ∪_η C_η, a finite union. Each C_η is a subset of C(X, R^k). To finish the proof, we will show that C_η has diameter at most 4ε. Let f, g ∈ C_η, for some η. Then for i = 1, . . ., p, we have f(x_i), g(x_i) ∈ B_ε(y_{η(i)}). For any x ∈ X choose i with x ∈ B_δ(x_i). Then
‖f(x) − g(x)‖ ≤ ‖f(x) − f(x_i)‖ + ‖f(x_i) − g(x_i)‖ + ‖g(x_i) − g(x)‖
< ε + ‖f(x_i) − g(x_i)‖ + ε, by the uniform equicontinuity of F (and the choice of δ),
< ε + 2ε + ε,
since f(x_i) and g(x_i) belong to a ball of radius ε. Thus ‖f − g‖_u < 4ε. Therefore C_η has diameter at most 4ε.
Σ_{j=m}^{n} a_j b_j = Σ_{j=m}^{n} (s_j − s_{j−1}) b_j
= Σ_{j=m}^{n} s_j b_j − Σ_{j=m−1}^{n−1} s_j b_{j+1}
= Σ_{j=m}^{n−1} s_j (b_j − b_{j+1}) + s_n b_n − s_{m−1} b_m.
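The summation-by-parts identity above is easy to check on random data. A sketch (ours; `dot_range` and `by_parts` are hypothetical helpers):

```python
import random

def dot_range(a, b, m, n):
    """Left side: sum of a_j * b_j for j = m, ..., n."""
    return sum(a[j] * b[j] for j in range(m, n + 1))

def by_parts(a, b, m, n):
    """Right side of the summation-by-parts identity; s_j are the
    partial sums of (a_j)."""
    s = [sum(a[:j + 1]) for j in range(len(a))]
    return (sum(s[j] * (b[j] - b[j + 1]) for j in range(m, n))
            + s[n] * b[n] - s[m - 1] * b[m])

random.seed(1)
a = [random.uniform(-1, 1) for _ in range(20)]
b = [random.uniform(-1, 1) for _ in range(20)]
print(abs(dot_range(a, b, 3, 15) - by_parts(a, b, 3, 15)))  # ~ 0
```

The identity is exact, so the two sides agree up to floating-point rounding.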
We then have
| Σ_{j=m}^{n} a_j b_j | ≤ Σ_{j=m}^{n−1} |s_j| (b_j − b_{j+1}) + |s_n b_n| + |s_{m−1} b_m|
≤ M Σ_{j=m}^{n−1} (b_j − b_{j+1}) + M b_n + M b_m
= 2M b_m.
If b_m → 0 as m → ∞, the series Σ a_n b_n converges by the Cauchy criterion.
Corollary 32.3. (Alternating series test.) Let (b_n) be a decreasing sequence with limit 0. Then the alternating series
b_1 − b_2 + b_3 − · · · = Σ_{n=1}^{∞} (−1)^{n−1} b_n
converges, and | Σ_{j=n+1}^{∞} (−1)^{j−1} b_j | ≤ b_{n+1}.
Proof. With a_n = (−1)^{n−1}, Abel's theorem proves convergence, and gives the estimate with a factor of 2. However, since the partial sums of Σ a_n are all non-negative (either 0 or 1), the estimate in that proof can be improved as in the statement of the corollary. We leave the details to the interested reader.
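The error bound in Corollary 32.3 can be checked against the alternating harmonic series, b_n = 1/n. A sketch (ours, not from the notes; the infinite tail is approximated by a long finite sum):

```python
def alt_tail(n, terms=200_000):
    """|sum_{j=n+1}^infinity (-1)^(j-1) / j|, approximated by a long
    finite sum (here b_j = 1/j)."""
    return abs(sum((-1) ** (j - 1) / j for j in range(n + 1, terms)))

for n in (1, 5, 50):
    print(n, alt_tail(n), 1 / (n + 1))  # the tail is bounded by b_{n+1}
```

In each case the tail is comfortably below b_{n+1} = 1/(n + 1), as the corollary promises.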
P∞ n−1
Example 32.4. (1) (The alternating harmonic series.) n=1 (−1) /n = 1 − 1/2 +
1/3 − 1/4 + 1/5 − · · · converges by the alternating series test. (We will see later that
the sum is log 2.)
(2) Let θ be an irrational number. (In fact, the argument we present applies to any
non-integral real number θ.) In the following, we will apply the formula for the sum
of a finite geometric series to complex numbers.
  Σ_{j=1}^n sin 2πjθ = Im Σ_{j=0}^n (cos 2πjθ + i sin 2πjθ)
                     = Im Σ_{j=0}^n (cos 2πθ + i sin 2πθ)^j.

Writing z = cos 2πθ + i sin 2πθ (note z ≠ 1, since θ is not an integer), the geometric
series formula gives |Σ_{j=0}^n z^j| = |1 − z^{n+1}|/|1 − z| ≤ 2/|1 − z|. Hence

  |Σ_{j=1}^n sin 2πjθ| ≤ 2/|1 − (cos 2πθ + i sin 2πθ)|
                       = 2/√((1 − cos 2πθ)² + sin² 2πθ)
                       = √(2/(1 − cos 2πθ)).

Thus the series Σ_n sin 2πnθ has bounded partial sums. By Abel’s theorem, the series
Σ_n (sin 2πnθ)/n converges.
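The boundedness of the partial sums can be sketched numerically; θ = √2 below is an arbitrary irrational choice (any non-integral real θ works), and the small slack in the assertion allows for floating-point error in evaluating sin at large arguments.

```python
import math

# Partial sums of sum_j sin(2*pi*j*theta) stay within the bound
# sqrt(2/(1 - cos(2*pi*theta))) derived above.
theta = math.sqrt(2)
bound = math.sqrt(2 / (1 - math.cos(2 * math.pi * theta)))

s, max_abs = 0.0, 0.0
for j in range(1, 2001):
    s += math.sin(2 * math.pi * j * theta)
    max_abs = max(max_abs, abs(s))

assert max_abs <= bound + 1e-6   # bounded; b_n = 1/n then gives convergence
```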
Abel also proved the following theorem on the behavior of a power series at an endpoint
of the interval of convergence.
Theorem 32.5. Let Σ_{n=0}^∞ a_n (x − x_0)^n have radius of convergence R, 0 < R < ∞. Suppose that
the series converges at an endpoint of the interval of convergence. Then the series converges
uniformly on the closed interval from x_0 to that endpoint.
Corollary 32.6. With the hypotheses of the theorem, let f (x) denote the sum of the series
in its domain of convergence. Then f is continuous.
Proof. (of theorem) A linear change of variables reduces the theorem to the case where x_0 = 0
and R = 1. We consider the case where the series converges at the right-hand endpoint;
the other case has a similar proof. Thus we have a power series Σ_{n=0}^∞ a_n x^n with radius of
convergence 1, and such that Σ a_n converges. Let ε > 0 be given. Applying the Cauchy
criterion to Σ a_n, we obtain n_0 ∈ N such that for all n_0 ≤ m ≤ n we have

  |Σ_{j=m}^n a_j| < ε/2.

For any x ∈ [0, 1], the sequence (x^n) is decreasing. We apply Abel’s theorem to the series
Σ_{j=n_0}^∞ a_j x^j to get

  |Σ_{j=m}^n a_j x^j| ≤ 2 · (ε/2) · x^m ≤ ε,

a uniform estimate.
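The uniform estimate can be sketched numerically. The coefficients a_j = (−1)^{j−1}/j below are an illustrative choice (the series for log(1 + x) in the next example); here |Σ_{j=m}^k a_j| ≤ 1/m by the alternating series test, so the Abel bound gives |Σ_{j=m}^n a_j x^j| ≤ 2/m for every x ∈ [0, 1].

```python
# Tails of sum a_j x^j, with a_j = (-1)**(j - 1)/j, are uniformly small on
# [0, 1]: |sum_{j=m}^{k} a_j| <= 1/m, so Abel's theorem bounds the tail by
# 2 * (1/m) * x**m <= 2/m, independently of x.
def tail(m, n, x):
    return sum((-1) ** (j - 1) / j * x ** j for j in range(m, n + 1))

m, n = 100, 200
for x in [i / 20 for i in range(21)]:    # grid on [0, 1]
    assert abs(tail(m, n, x)) <= 2 / m
```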
Example 32.7. (1) From the geometric series 1/(1 − x) = Σ_{n=0}^∞ x^n for |x| < 1, we
integrate term-by-term to obtain

  −log(1 − x) = ∫_0^x dt/(1 − t) = Σ_{n=0}^∞ ∫_0^x t^n dt = Σ_{n=0}^∞ x^{n+1}/(n + 1) = Σ_{n=1}^∞ x^n/n,

still with radius of convergence equal to 1. Replacing x by −x we get

  (∗∗)  log(1 + x) = Σ_{n=1}^∞ (−1)^{n−1} x^n/n,
valid for |x| < 1. When x = 1 we have the alternating harmonic series, which
converges. By Abel’s theorem, the power series converges uniformly on [0, 1], and
hence the limit is continuous. Since the equality in (∗∗) holds on [0, 1), and both
sides are continuous on [0, 1], the equality must hold at x = 1. This gives
  log 2 = 1 − 1/2 + 1/3 − 1/4 + · · · .
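A numerical sketch of this identity: partial sums of the alternating harmonic series approach log 2, with error at most the first omitted term by the alternating series test.

```python
import math

# n-th partial sum of 1 - 1/2 + 1/3 - ...; the alternating series test
# bounds the error by the first omitted term, 1/(n + 1).
n = 10000
s = sum((-1) ** (k - 1) / k for k in range(1, n + 1))
assert abs(s - math.log(2)) <= 1 / (n + 1)
```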
(2) Again starting with the geometric series, we replace x by −x² to get

  1/(1 + x²) = Σ_{n=0}^∞ (−1)^n x^{2n}.

Since |−x²| < 1 if and only if |x| < 1, this equation is also valid for |x| < 1. Now we
integrate term-by-term to get

  arctan x = ∫_0^x dt/(1 + t²) = Σ_{n=0}^∞ (−1)^n x^{2n+1}/(2n + 1),

valid for |x| < 1. Again, the series converges for x = 1 by the alternating series test.
By Abel’s theorem, the sum of the series is continuous on [0, 1], and so the above equation
is still valid at x = 1. We obtain the classical series

  π/4 = 1 − 1/3 + 1/5 − 1/7 + · · · .
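As a sketch, this series can be checked numerically; its convergence is slow, with the error after n terms bounded by the first omitted term.

```python
import math

# Partial sum of 1 - 1/3 + 1/5 - ... through the term 1/(2n + 1);
# the alternating series test bounds the error by 1/(2n + 3).
n = 100000
s = sum((-1) ** k / (2 * k + 1) for k in range(n + 1))
assert abs(s - math.pi / 4) <= 1 / (2 * n + 3)
```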
(3) We consider f(x) = (1 + x)^α for α > 0, α ∉ N. Repeated differentiation gives
f^{(n)}(x) = α(α − 1) · · · (α − n + 1)(1 + x)^{α−n}. Thus the Taylor series for f is given by

  1 + Σ_{n=1}^∞ [α(α − 1) · · · (α − n + 1)/n!] x^n.
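A numerical sketch of the binomial series, with the hypothetical test values α = 1/2, x = 1/2; the comparison with (1 + x)^α anticipates the fact (not yet established at this point in the notes) that the series sums to f(x) for |x| < 1.

```python
# Partial sums of 1 + sum_{n>=1} [alpha*(alpha-1)*...*(alpha-n+1)/n!] x^n,
# built from the ratio of successive terms, compared with (1 + x)**alpha.
alpha, x = 0.5, 0.5
term, s = 1.0, 1.0
for n in range(1, 60):
    term *= (alpha - n + 1) / n * x   # t_n = t_{n-1} * (alpha - n + 1)/n * x
    s += term
assert abs(s - (1 + x) ** alpha) < 1e-10
```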