
JACK SPIELBERG

Contents

1. Axioms for the real numbers
2. Cardinality (briefly)
3. Decimal representation of real numbers
4. Metric spaces
5. The topology of metric spaces
6. The Cantor set
7. Sequences
8. Continuous functions
9. Limits of functions
10. Sequences in R
11. Limsup and liminf
12. Infinite limits and limits at infinity
13. Cauchy sequences and complete metric spaces
14. Compactness
15. Continuity and compactness
16. Connectedness
17. Continuity and connectedness
18. Uniform continuity
19. Convergence of functions
20. Differentiation
21. Higher order derivatives and Taylor’s theorem
22. The Riemann integral
23. The “Darboux” approach
24. Measure zero and integration
25. The fundamental theorem of calculus
26. The Weierstrass approximation theorem
27. Uniform convergence and the interchange of limits
28. Infinite series
29. Series of functions
30. Power series
31. Compactness in function space
32. Conditional convergence

1. Axioms for the real numbers

In this course we will more-or-less follow an axiomatic approach. Namely, we will give

axioms for the real numbers, and prove everything in the course from these axioms. Well,

this is not strictly true — some things will be stated without proof. These may be as

simple as ordinary high school algebra, which we will assume is well-understood already

(and that you have had some experience in deriving from the axioms). We will also make

use of various functions familiar from calculus, such as the trigonometric, exponential and

logarithmic functions, even if we haven’t yet proved their existence and properties from

the axioms. However, in this case we will eventually at least sketch how this can be done

rigorously, and we promise that even if we never talk about these proofs, the use we make

of these functions isn’t needed for the proofs (thus we avoid any circularity in the logical

structure of the material). There is one theorem at the foundation of the course that we will

neither prove nor sketch. That one we will just take “on faith.” (You can read a proof in

the first chapter of Rudin’s book, and if you do, you will understand why we won’t use class

time for it.)

So now we begin. The axioms that define the real numbers come in three parts: the field

axioms, the order axioms, and the completeness axiom.

Definition 1.1. A field is a set F with two binary operations, addition (denoted +) and mul-

tiplication (denoted ·), that satisfy the following axioms (we assume that these are familiar,

so we only give them briefly).

(1) Addition and multiplication are associative and commutative.

(2) There exist identity elements for addition and multiplication, denoted 0 and 1, re-

spectively.

(3) 0 ≠ 1.

(4) Every element of F has an additive inverse.

(5) Every non-zero element of F has a multiplicative inverse.

(6) Multiplication distributes over addition.

All of the usual algebraic rules of arithmetic follow from these axioms. For example:

• Additive and multiplicative identities and inverses are unique.

• (−1)^2 = 1.

• xy = 0 if and only if x = 0 or y = 0.

• $(a + b)^n = \sum_{j=0}^{n} \binom{n}{j} a^{n-j} b^j$, where the binomial coefficients are defined by
\[ \binom{n}{j} = \frac{n!}{j!\,(n-j)!}. \]

We will assume familiarity with this stuff. It is interesting, though, to consider what is

actually included in the phrase “this stuff.” What facts from high school algebra are covered

by the field axioms? Here is an example of something that is not covered.

Example 1.2. Let F be a field. Are the elements 1, 1 + 1, 1 + 1 + 1, 1 + 1 + 1 + 1, . . .

all distinct? In fact, if we just have the field axioms, we can neither prove nor disprove

that these are all distinct elements. Notice that these are what we normally refer to as the

natural numbers (denoted N). So it isn’t clear that the natural numbers even make sense in

an arbitrary field.

Exercise 1.3. Explain why the “fact” stated in the previous example is true.

NOTES, MAT 472, INTERMEDIATE ANALYSIS, FALL 2010 3

field in an intelligent way. To resolve these problems (i.e. to make sure that our axioms

really do pick out the real numbers) we have to give more axioms.

Definition 1.4. Let F be a field. Then F is an ordered field if there is a distinguished
subset F^+ of F (called the positive elements of F) satisfying the following properties.

(1) For each x ∈ F, exactly one of the three statements x ∈ F^+, −x ∈ F^+, x = 0 is true.

(2) If x, y ∈ F^+ then x + y ∈ F^+.

(3) If x, y ∈ F^+ then xy ∈ F^+.

Now we define the usual symbols to express order. For any elements x, y ∈ F, we write
x > y to mean x − y ∈ F^+, x ≥ y to mean x > y or x = y, etc.

All of the usual rules of inequalities follow from the order axioms and the field axioms.

For example:

• If x ≤ y and y ≤ x, then x = y.

• If x ≠ 0 then x^2 > 0.

• If x < y and z > 0 then xz < yz.

• If 0 < x < y and n ∈ N then x^n < y^n.

• xy > 0 if and only if x > 0 and y > 0, or x < 0 and y < 0.

• A finite subset of F has a minimum. (Of course this is not true for infinite subsets.)

As a further example, let’s note that the order axioms resolve the ambiguity mentioned

above. If F is an ordered field, then 1 + 1 + · · · + 1 > 0. It follows easily (exercise!) that 1,

1 + 1, 1 + 1 + 1, . . . are all distinct positive elements of F . Thus N is “contained” in every

ordered field. But then the integers, Z, are also contained in every ordered field. But then the

rational numbers, Q, are also contained in every ordered field. In fact, the rational numbers

themselves are an ordered field (this is obvious, isn’t it?). Thus the rational numbers can be

described as the smallest ordered field.

We will now dip back into high school algebra to review the absolute value function. Even

though we assume familiarity with this stuff, absolute value is so important that it’s worth

stating some of the details.

Definition 1.5. Let F be an ordered field. The absolute value function | · | : F → F is
defined by
\[ |x| = \begin{cases} x, & \text{if } x \ge 0, \\ -x, & \text{if } x < 0. \end{cases} \]

When we think of the real numbers as a “number line,” we can view |x| as the “distance”

between x and 0. The number line is great for intuition, but must not be used for proofs.

Here are the basic properties of absolute value.

(1) | − x| = |x|.

(2) |x| ≥ 0.

(3) ±x ≤ |x|.

(4) |x| < a if and only if −a < x and x < a (written −a < x < a).

(5) |x| > a if and only if x < −a or x > a (Note that this cannot be written without

using the word “or”).

(6) |x + y| ≤ |x| + |y| (the triangle inequality).

(7) |x + y| ≥ | |x| − |y| |.


(8) |x − a| < r if and only if a − r < x < a + r (draw a picture on the number line).

(9) If a < x < b and a < y < b then |x − y| < b − a.

(10) Let x ∈ F . Suppose that |x| < ε for every positive element ε ∈ F . Then x = 0.

Property 10 above can be strengthened a bit, in a way that can be very useful. (Don’t

cite property 10 when proving this.)

Exercise 1.6. Let F be an ordered field, and let x ∈ F. Suppose that p, q ∈ F^+ are such

that for every ε ∈ F with 0 < ε < p, we have |x| < qε. Then x = 0.

Remark 1.7. Here is another consequence of the ordered field axioms. Let b > 0. Then
\[ (1 + b)^n = 1 + nb + \cdots > nb. \]
Now let 0 < a < 1. Then 1/a > 1, so
\[ \frac{1-a}{a} = \frac{1}{a} - 1 > 0. \]
Let b = 1/a − 1. Then a = 1/(1 + b), and
\[ a^n = \frac{1}{(1+b)^n} < \frac{1}{nb} = \frac{a}{1-a} \cdot \frac{1}{n}. \]

Now we ask the following question (assuming some familiarity with the concept of limit,
but only for the sake of the discussion): if 0 < a < 1 does a^n tend towards 0, as n → ∞?
Another way to put this is to ask: if c is any fixed positive element, does there exist n_0 ∈ N
such that a^n < c for all n ≥ n_0? Using the above computations, we see that we can answer
this question affirmatively if we could show that for any fixed positive element c, there exists
n_0 ∈ N such that $\frac{a}{(1-a)n} < c$ for all n ≥ n_0. Now observe that we could do this if we could
find n_0 ∈ N such that $\frac{a}{(1-a)n_0} < c$. In other words, we could prove that a^n → 0 if we could
find n_0 ∈ N such that $n_0 > \frac{a}{(1-a)c}$. But since c is an arbitrary positive element, then so
is $\frac{a}{(1-a)c}$. So this all comes down to trying to prove that for any positive element x, there
is a natural number n_0 such that n_0 > x. An ordered field in which this is true is called
Archimedean.
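The estimate in Remark 1.7 can be tried out in the ordered field Q, where exact arithmetic is available. The following is only an illustrative sketch: the function names `power_bound_holds` and `first_index_below` are made up for this example, and the sample values of a and c are arbitrary.

```python
from fractions import Fraction


def power_bound_holds(a: Fraction, n: int) -> bool:
    """Check the inequality a^n < a / ((1 - a) n) for 0 < a < 1 and n >= 1."""
    return a ** n < a / ((1 - a) * n)


def first_index_below(a: Fraction, c: Fraction) -> int:
    """Return a natural number n0 with n0 > a / ((1 - a) c).

    Such an n0 exists here because Q is Archimedean; by Remark 1.7,
    a^n < c then holds for all n >= n0.
    """
    bound = a / ((1 - a) * c)
    # floor(bound) + 1 is a natural number strictly greater than bound
    return bound.numerator // bound.denominator + 1


# Arbitrary sample values (assumptions for the illustration).
a, c = Fraction(9, 10), Fraction(1, 100)
n0 = first_index_below(a, c)
print(n0, all(a ** n < c for n in range(n0, n0 + 50)))
```

The n0 produced this way is usually far from the smallest index that works; the point of the remark is only that some n0 exists.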

Definition 1.8. Let F be an ordered field. F is called Archimedean if for every x ∈ F there

exists a natural number n such that x < n.

It is evident that Q is an Archimedean ordered field, and we “know” that R is one too.

But we can’t prove it yet, because not all ordered fields are Archimedean!! In other words,

we don’t yet have enough axioms for the real numbers, since we can’t prove the most basic

fact from advanced calculus. Along with the field and order axioms, there is one more axiom

that is necessary to characterize the real numbers. We need some definitions before we can

present it.

Definition 1.9. Let F be an ordered field, let S ⊆ F , and let x ∈ F .

(1) x is an upper bound of S if y ≤ x for every y ∈ S.

(2) x is a lower bound of S if y ≥ x for every y ∈ S.

(3) S is bounded above if there exists an upper bound for S.

(4) S is bounded below if there exists a lower bound for S.

(5) S is bounded if it is bounded above and below.

Exercise 1.10. Is the empty set bounded?

Definition 1.11. Let F be an ordered field, let S ⊆ F, and let x ∈ F. Then x is a supremum
(or sup, or least upper bound, or lub) of S if

(1) x is an upper bound of S.

(2) For every upper bound z of S, x ≤ z.

The condition (2) can be expressed in the equivalent forms:

(2′) For any z ∈ F, if z < x then z is not an upper bound of S.

(2″) For any z ∈ F, if z < x then there exists y ∈ S with z < y.

In a completely analogous manner we define infimum (or inf, greatest lower bound, glb).

The details of the precise formulation are left as an exercise.

Remark 1.12. It follows immediately from condition (2) of Definition 1.11 that if S has a

supremum then it is unique (and similarly for infimum).

Exercise 1.13. Let S be a subset of an ordered field F, and let x ∈ F. Let −S = {−y :
y ∈ S}. Then:

(1) S is bounded above (respectively, below) if and only if −S is bounded below (respec-

tively, above).

(2) x is an upper (respectively, lower) bound for S if and only if −x is a lower (respec-

tively, upper) bound for −S.

(3) x is a supremum (respectively, infimum) of S if and only if −x is an infimum (respec-

tively, supremum) of −S.

Now we are ready to state the last axiom of the real numbers, the completeness axiom.

Definition 1.14. Let F be an ordered field. F is complete if every non-empty subset of F

that is bounded above has a supremum.

The following is an easy consequence of Exercise 1.13.

Corollary 1.15. Let F be a complete ordered field. Then every non-empty subset of F that

is bounded below has an infimum.

The next theorem is the foundation of the course, but is the one result that we won’t

attempt to prove. As mentioned earlier, you can read a proof in Rudin’s book.

Theorem 1.16. There exists a unique complete ordered field.

The one and only complete ordered field is called the field of real numbers, and we will

write R as an abbreviation. This is the same number line that we (think we) know and love.

But even though we have lots of intuition about it, we will insist on proving EVERYTHING

about it. For example, since R is an ordered field, R contains the rational numbers Q as a

subfield. Are there any other elements of R besides Q? Well, we think we know that there

are, but how do we prove that there are? The usual way is to bring up the classical proof
that √2 is irrational. But this is sophistry! That proof “merely” shows that no rational

number has square equal to 2. It’s possible that there is an element of R having square equal

to 2. If there is such an element, then it can’t belong to Q, so it would be an element of

R \ Q. But we don’t know yet that there is a real number having square equal to 2. In fact,

there is, but this fact must be proved.

Let’s return to an even more basic point, the Archimedean property. We mentioned earlier

that R is Archimedean, and of course this is a fundamental property of the number line: the


natural numbers march off arbitrarily far to the right. Our first theorem about the real

numbers is this fact. As we pointed out before, the proof must rely on the completeness

axiom, since not all ordered fields are Archimedean.

Theorem 1.17. R is Archimedean: for every x ∈ R there exists n ∈ N such that x < n.

Proof. We suppose that R is not Archimedean, and derive a contradiction. So let x ∈ R

be such that x ≥ n for all n ∈ N. This just means that x is an upper bound for N. Thus

the (non-empty) subset N of R is bounded above. By the completeness axiom, N has a

supremum. Let z = sup(N). Now z − 1 < z. By Definition 1.11 (2″), there is an element

n ∈ N with n > z − 1. But then n + 1 > z. Since n + 1 ∈ N, this contradicts Definition 1.11

(1). Therefore R is Archimedean.

We now present some corollaries of the Archimedean property.

Corollary 1.18. If x ∈ R with x > 0, then there exists n ∈ N with 1/n < x.

Proof. By the Archimedean property there is n ∈ N with n > 1/x. Then 1/n < x.

Before stating the next corollary, we recall the well-ordering principle (WOP) and one of

its variations. The WOP states that a non-empty subset of N contains a smallest element.

This is a fundamental property of the natural numbers — it is logically equivalent to the

principle of mathematical induction. The variation we need states that a non-empty subset

of Z that is bounded below (in Z) contains a smallest element.

Corollary 1.19. For x ∈ R there exists a unique n ∈ Z with n ≤ x < n + 1.

Proof. Let x ∈ R. By the Archimedean property there is m ∈ N with m > |x|. Then

x > −m, so the set {k ∈ Z : k > x} is non-empty and bounded below (by −m). Let n + 1

be its smallest element. Then n + 1 > x. But since n < n + 1, n is not in this set, so

n ≤ x. This proves existence. For uniqueness, suppose that n and n′ both do the job. Then
x − 1 < n, n′ ≤ x, so (by property 9 of absolute value) we have |n − n′| < 1. Since n, n′ ∈ Z
then n = n′.

The integer n of Corollary 1.19 is denoted [x]. The function [·] : R → Z is called the greatest

integer function. (Some people denote it by ⌊x⌋; ⌊·⌋ is also called the floor function.)

Corollary 1.20. For x ∈ R and for N ∈ N there exists a unique n ∈ Z such that
\[ \frac{n}{N} \le x < \frac{n+1}{N}. \]

Proof. Apply Corollary 1.19 to N x.

Corollary 1.21. For x, ε ∈ R with ε > 0, there exists y ∈ Q such that |x − y| < ε.

Proof. By Corollary 1.18 there is N ∈ N with 1/N < ε. By Corollary 1.20 there is n ∈ Z such
that n/N ≤ x < (n + 1)/N. Let y = n/N. Then y ∈ Q, and
\[ |x - y| = x - y < \frac{n+1}{N} - \frac{n}{N} = \frac{1}{N} < \varepsilon. \]

The conclusion of Corollary 1.21 is often expressed as: Q is dense in R.
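The proof of Corollary 1.21 is constructive once N and the floor function are available, so it can be followed step by step. The sketch below is an illustration only: the name `rational_within` is hypothetical, and the input x is itself a Fraction (the exact value of a floating-point number), since a program cannot hold an arbitrary real number.

```python
from fractions import Fraction
import math


def rational_within(x: Fraction, eps: Fraction) -> Fraction:
    """Follow the proof of Corollary 1.21: return y = n/N with 0 <= x - y < eps."""
    # Corollary 1.18: choose N with 1/N < eps (here N > 1/eps).
    N = math.floor(1 / eps) + 1
    # Corollary 1.20: n = [N x], so that n/N <= x < (n + 1)/N.
    n = math.floor(N * x)
    return Fraction(n, N)


# Fraction(float) is the exact rational value of the float, used as a stand-in for x.
x = Fraction(math.sqrt(2))
y = rational_within(x, Fraction(1, 1000))
print(y, float(x - y))
```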

The completeness axiom is actually stronger than the Archimedean property. The next

result does not follow from the Archimedean property (as can be seen from the fact that the

conclusion does not hold in Q).

Theorem 1.22. Let n ∈ N. Every positive real number has a unique positive nth root.


Proof. We first prove uniqueness. If 0 < y < z then y^n < z^n, so two distinct positive real
numbers cannot be nth roots of the same real number. We now prove existence. Let a > 1.
(The case a = 1 is trivial. If 0 < a < 1, then 1/a > 1. In this case, if we show that 1/a has a
positive nth root, then the inverse of that root will be a positive nth root for a.) Let
E = {x ≥ 0 : x^n ≤ a}. We note that E ≠ ∅ since 1 ∈ E. We claim that E is bounded above.
To see this, note that if x ∈ E then
\[ x^n \le a \le a^n. \]
Therefore x ≤ a, and we see that a is an upper bound for E. Thus the completeness axiom
implies that y = sup(E) exists. We will show that y^n = a, finishing the proof.

First note that y ≥ 1, since 1 ∈ E. We will use Exercise 1.6. Let 0 < ε < 1. First note
that since y − ε < y < y + ε, we have
\[ (y - \varepsilon)^n < y^n < (y + \varepsilon)^n. \tag{1} \]
Since y − ε < y, property (2″) of Definition 1.11 implies that there is x ∈ E with y − ε < x.
Then (y − ε)^n < x^n ≤ a. Also, since y + ε > y then y + ε ∉ E, and hence a < (y + ε)^n.
Therefore
\[ (y - \varepsilon)^n < a < (y + \varepsilon)^n. \tag{2} \]
From (1) and (2), and property 9 of absolute value, we have |y^n − a| < (y + ε)^n − (y − ε)^n.
We have
\begin{align*}
(y + \varepsilon)^n - (y - \varepsilon)^n
&= \sum_{j=0}^{n} \binom{n}{j} \bigl( y^{n-j} \varepsilon^j - y^{n-j} (-\varepsilon)^j \bigr) \\
&= \sum_{j=0}^{n} \binom{n}{j} y^{n-j} \varepsilon^j \bigl( 1 - (-1)^j \bigr) \\
&= 2 \sum_{\substack{j=1 \\ j \text{ odd}}}^{n} \binom{n}{j} y^{n-j} \varepsilon^j \\
&\le 2 \sum_{\substack{j=1 \\ j \text{ odd}}}^{n} \binom{n}{j} y^{n-j} \varepsilon, \quad \text{since } \varepsilon < 1.
\end{align*}
Let $q = 2 \sum_{j \text{ odd}} \binom{n}{j} y^{n-j}$, a positive number that does not depend on ε. Then
|y^n − a| < qε for every ε with 0 < ε < 1, so Exercise 1.6 (with p = 1) gives y^n = a.
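The set E = {x ≥ 0 : x^n ≤ a} from the proof of Theorem 1.22 can be explored numerically. The sketch below is an assumption-laden illustration and not part of the proof: it uses bisection (a technique the notes do not use here) to trap sup(E) between an element of E and an upper bound of E, working with exact rationals. The name `nth_root_bracket` is hypothetical.

```python
from fractions import Fraction


def nth_root_bracket(a: Fraction, n: int, steps: int):
    """Return (lo, hi) with lo in E and hi an upper bound of E, for a > 1.

    E = {x >= 0 : x^n <= a}. Initially 1 is in E and a is an upper bound,
    as in the proof; each bisection step halves hi - lo.
    """
    lo, hi = Fraction(1), Fraction(a)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid ** n <= a:
            lo = mid      # mid belongs to E
        else:
            hi = mid      # mid^n > a, so mid is an upper bound of E
    return lo, hi


lo, hi = nth_root_bracket(Fraction(2), 2, 20)
print(float(lo), float(hi))
```

Every bracket endpoint is rational, so no endpoint can square to exactly 2; the supremum itself is supplied only by the completeness axiom.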

Now we know that R truly is bigger than Q; for example, √2 ∈ R \ Q. This one number
can be parlayed into many more.

Theorem 1.23. R \ Q is dense in R.

Proof. Let x ∈ R, and let ε > 0. By Corollary 1.21 there is z ∈ Q with |x√2 − z| < ε.
In fact, it follows also that we can assume z ≠ 0. Let y = z/√2. Then y ∈ R \ Q, and
|x − y| < ε/√2 < ε.

Definition 1.24. The elements of R \ Q are called irrational numbers.

Thus the irrational numbers are also dense in R. While Corollary 1.21 and Theorem 1.23

treat the rational and irrational numbers symmetrically, in fact the set of irrational numbers


is much bigger than the set of rationals (Corollary 2.12). Before proving this, we will first

review some basic facts about the size of sets.

2. Cardinality (briefly)

Definition 2.1. Let A and B be sets.

(1) A and B are equivalent, written A ∼ B, if there exists a bijection from A to B. In

this case, A and B are said to be of the same cardinality.

(2) A is subequivalent to B, written A ≼ B, if there is a one-to-one function from A to

B.

The proof of the following proposition is elementary.

Proposition 2.2. For any sets A, B and C,

• A ∼ A.

• If A ∼ B then B ∼ A.

• If A ∼ B and B ∼ C then A ∼ C.

The next theorem is very useful, and its proof is a nice exercise.

Theorem 2.3. (Cantor-Bernstein) Let A and B be sets. If A ≼ B and B ≼ A then A ∼ B.

Definition 2.4. Let A be a set.

(1) A is finite if there is n ∈ N ∪ {0} such that A ∼ {1, 2, . . . , n}.

(2) A is infinite if A is not finite.

(3) A is denumerable if A ∼ N.

(4) A is countable if A is finite or denumerable.

(5) A is uncountable if A is not countable.

Proposition 2.5. (1) If m ≠ n then {1, 2, . . . , m} ≁ {1, 2, . . . , n}.

(2) N is infinite.

(3) A is countable if and only if A ≼ N.

(4) Let A_1, A_2, . . . be countable sets. Then $\bigcup_{n=1}^{\infty} A_n$ is countable, and for each n,
A_1 × · · · × A_n is countable.

(5) Q is countable.

Proof. The first three statements can be proved as exercises. For the fourth, let A_n =
{x_{n1}, x_{n2}, . . .}. Consider the list: x_{11}, x_{12}, x_{21}, x_{13}, x_{22}, x_{31}, . . .. For each entry, delete all
subsequent occurrences. What is left is a list, without duplications, of the elements of the
union. This defines a bijection from N (or from an initial segment of N, if the union is
finite) to the union.

Suppose inductively that A_1 × · · · × A_n is countable. Then
\[ A_1 \times \cdots \times A_{n+1} = \bigcup_{x \in A_{n+1}} A_1 \times \cdots \times A_n \times \{x\} \]
is countable.

For the last statement, first note that Z is countable, as can be seen from the list: 0, 1,
−1, 2, −2, . . .. Since Z ∼ (1/n)Z, it follows from Proposition 2.2 that (1/n)Z is countable. Then
$Q = \bigcup_{n=1}^{\infty} \frac{1}{n} Z$ is countable.
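The listing argument in this proof can be carried out mechanically for Q. The following sketch is one possible reading of it, under stated assumptions: the list for (1/n)Z is taken to be 0/n, 1/n, −1/n, 2/n, . . ., the entries are visited in the order x_{11}, x_{12}, x_{21}, x_{13}, . . ., and the helper names `nth_integer` and `rationals` are invented for this illustration.

```python
from fractions import Fraction
from itertools import count, islice


def nth_integer(j: int) -> int:
    """j-th entry (j >= 1) of the list 0, 1, -1, 2, -2, ... of the integers."""
    return j // 2 if j % 2 == 0 else -(j // 2)


def rationals():
    """List Q as the union of the sets (1/n)Z, deleting repeated occurrences."""
    seen = set()
    for s in count(2):               # entries x_{nj} with n + j = s
        for n in range(1, s):
            j = s - n
            q = Fraction(nth_integer(j), n)   # j-th entry of the list for (1/n)Z
            if q not in seen:        # skip subsequent occurrences
                seen.add(q)
                yield q


print(list(islice(rationals(), 10)))
```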

Example 2.6. Let $X = \prod_{1}^{\infty} \{0, 1\} = \{(x_1, x_2, \ldots) : x_i \in \{0, 1\} \text{ for all } i\}$. (Thus X is the
set of all sequences of 0’s and 1’s.)

Proposition 2.7. X is uncountable.


Proof. We will show that if f : N → X is any function, then f is not onto. Therefore there

does not exist a bijection from N to X.

So let f : N → X be given. Let f(n) be the sequence (x_{n1}, x_{n2}, x_{n3}, . . .). Define an
element y = (y_1, y_2, . . .) ∈ X by y_n = 1 − x_{nn}. Then for each n, y and f(n) differ in the nth
slot, so that y ≠ f(n). Therefore y is not in the range of f. Therefore f is not onto.
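The construction of y in this proof is explicit, so its first few entries can be computed for any concrete f. In the sketch below a sequence in X is modelled as a Python function from indices 1, 2, . . . to {0, 1}; the particular map `f` (binary digits of n) and the name `diagonal` are assumptions chosen only for illustration.

```python
def f(n: int):
    """A sample map N -> X: f(n) is the sequence of binary digits of n."""
    return lambda i: (n >> (i - 1)) & 1


def diagonal(f, n_terms: int):
    """First n_terms entries of y, where y_n = 1 - x_{nn} and x_{nn} = f(n)(n)."""
    return [1 - f(n)(n) for n in range(1, n_terms + 1)]


y = diagonal(f, 8)
# y differs from f(n) in slot n, for every n that was examined.
print(y, all(y[n - 1] != f(n)(n) for n in range(1, 9)))
```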

Remark 2.8. X ∼ P(N). To define a bijection from X to P(N), send a sequence x =
(x_1, x_2, . . .) to the set {n ∈ N : x_n = 1}. It is easy to check that this works. In fact, this is a

special case of a general theorem of Cantor.

Theorem 2.9. If S is any set, and if f : S → P(S) is any function, then f is not onto.

Thus for any set S, S ≁ P(S). (Since it is evident that S ≼ P(S), we observe that P(S)
has a larger cardinality than S.)

Proof. Given f, let E = {x ∈ S : x ∉ f(x)}. It is easy to check that E is not in the range
of f.
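For a small finite set S, the claim in this proof can be checked exhaustively. The sketch below runs over every function f : S → P(S) for a three-element S and confirms that E = {x ∈ S : x ∉ f(x)} is never a value of f. The names `powerset` and `cantor_set`, and the choice S = {0, 1, 2}, are assumptions made for the illustration; a finite check is of course not a proof of the theorem.

```python
from itertools import combinations, product


def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def cantor_set(f, s):
    """E = {x in S : x not in f(x)}, for f given as a dict from S to subsets of S."""
    return frozenset(x for x in s if x not in f[x])


S = {0, 1, 2}
subsets = powerset(S)
# Each tuple `images` lists f(0), f(1), f(2) for one function f : S -> P(S).
never_hit = all(
    cantor_set(dict(zip(sorted(S), images)), S) not in images
    for images in product(subsets, repeat=len(S))
)
print(never_hit)
```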

The next result will be proved later (Corollary 6.4).

Theorem 2.10. R ∼ X.

Corollary 2.11. R is uncountable.

The previous corollary (and hence also the next corollary) can be proved from the results

of the next section, rather than from Corollary 6.4.

Corollary 2.12. The set of irrational numbers is uncountable.

3. Decimal representation of real numbers

We like to think of elements of R as infinite decimals: x ∼ x_0.x_1x_2x_3 · · ·, where x_0 ∈ Z
and x_n ∈ {0, 1, . . . , 9} for n ≥ 1. We want to make this precise without using infinite series.
Let x ∈ R. Let x_0 = [x] ∈ Z. Then x_0 ≤ x < x_0 + 1, so
\[ 0 \le x - x_0 < 1. \]
Then 0 ≤ 10(x − x_0) < 10. We let x_1 = [10(x − x_0)] ∈ {0, 1, . . . , 9}. We have
\begin{align*}
x_1 &\le 10(x - x_0) < x_1 + 1, \\
x_1 10^{-1} &\le x - x_0 < x_1 10^{-1} + 10^{-1}, \\
0 &\le x - x_0 - x_1 10^{-1} < 10^{-1}.
\end{align*}
Inductively, suppose that we have constructed x_{n−1} ∈ {0, 1, . . . , 9} such that
\[ 0 \le x - \sum_{i=0}^{n-1} x_i 10^{-i} < 10^{-(n-1)}. \]
Then $0 \le 10^n \bigl( x - \sum_{i=0}^{n-1} x_i 10^{-i} \bigr) < 10$. We set
\[ x_n = \Bigl[ 10^n \Bigl( x - \sum_{i=0}^{n-1} x_i 10^{-i} \Bigr) \Bigr] \in \{0, 1, \ldots, 9\}. \]
Then $x_n \le 10^n \bigl( x - \sum_{i=0}^{n-1} x_i 10^{-i} \bigr) < x_n + 1$, and hence
\[ 0 \le x - \sum_{i=0}^{n} x_i 10^{-i} < 10^{-n}. \tag{1} \]


Thus we have defined x0 ∈ Z and xn ∈ {0, 1, . . . , 9} for n ≥ 1 so that (1) holds for all n.
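The inductive definition of the digits x_n translates directly into a loop. The sketch below applies it to rational inputs, where the arithmetic is exact; the function name `decimal_digits` is hypothetical, and `math.floor` plays the role of the greatest integer function [·].

```python
from fractions import Fraction
import math


def decimal_digits(x: Fraction, n_digits: int):
    """Return [x_0, x_1, ..., x_n] with x_0 = [x] and
    x_n = [10^n (x - sum_{i<n} x_i 10^-i)], following the construction above."""
    digits = []
    partial = Fraction(0)            # sum of x_i 10^-i over the digits found so far
    for n in range(n_digits + 1):
        d = math.floor(10 ** n * (x - partial))
        digits.append(d)
        partial += Fraction(d, 10 ** n)
    return digits


print(decimal_digits(Fraction(1, 3), 4))
print(decimal_digits(Fraction(-1, 4), 3))
```

Note that for negative x the integer part x_0 = [x] lies below x, so −1/4 is represented as −1 followed by the digits 7, 5, 0, that is, −1 + 0.75.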

(2) $x = \sup \bigl\{ \sum_{i=0}^{n} x_i 10^{-i} : n \ge 0 \bigr\}$.

Proof. The proof is left as an exercise.

(3) (x_n) is not eventually equal to 9; precisely, for every n there is m ≥ n such that
x_m ≠ 9. The point is that, for example, if we start with x = 1, we will obtain the

expansion 1.0000 · · · , and NOT 0.9999 · · · .

Proof. The proof is left as an exercise.

(4) If x ≠ y then there exists k such that x_k ≠ y_k. In other words, the map that takes a
real number to its decimal expansion is one-to-one.

Proof. Let x < y. Choose n such that 10^{−n} < y − x. Then
\[ \sum_{i=0}^{n} x_i 10^{-i} \le x < y - 10^{-n} < \sum_{i=0}^{n} y_i 10^{-i}. \]
Since the two partial sums differ, x_k ≠ y_k for some k ≤ n.

(5) Let y0 ∈ Z and yn ∈ {0, 1, . . . , 9} for n ∈ N be such that (yn ) is not eventually equal

to 9. Then there is a real number x such that xn = yn for all n. This will prove that

there is a one-to-one correspondence between real numbers and decimal expansions

that do not terminate in a string of 9’s.

Proof. First note that $\sum_{i=1}^{n} 9/10^i = 1 - 10^{-n}$, by summing a finite geometric series. Now we
have
\[ \sum_{i=0}^{n} y_i 10^{-i} \le y_0 + \sum_{i=1}^{n} 9 \cdot 10^{-i} = y_0 + 1 - 10^{-n} < y_0 + 1. \]
Thus the set $\bigl\{ \sum_{i=0}^{n} y_i 10^{-i} : n \ge 0 \bigr\}$ is bounded above. Let x be the supremum of this set.
Note that the elements of this set, indexed by n, form an increasing sequence. We will show
that x_n = y_n for all n. For n = 0, choose k ≥ 1 such that y_k ≠ 9. For any m ≥ k,
\[ \sum_{i=0}^{m} y_i 10^{-i} \le y_0 + \sum_{i=1}^{m} 9 \cdot 10^{-i} - 10^{-k} = y_0 + 1 - 10^{-m} - 10^{-k} < y_0 + 1 - 10^{-k}. \]
It follows that x ≤ y_0 + 1 − 10^{−k} < y_0 + 1. Therefore x_0 ≤ y_0. On the other hand, we have
that $y_0 = \sum_{i=0}^{0} y_i 10^{-i}$, so y_0 ≤ x, and hence y_0 ≤ x_0. Thus x_0 = y_0. Suppose inductively
that x_i = y_i for i < n. Choose k > n with y_k ≠ 9, and let m ≥ k. We have
\begin{align*}
\sum_{i=0}^{m} y_i 10^{-i}
&\le \sum_{i=0}^{n} y_i 10^{-i} + \sum_{i=n+1}^{m} 9 \cdot 10^{-i} - 10^{-k} \\
&= \sum_{i=0}^{n} y_i 10^{-i} + (1 - 10^{-m}) - (1 - 10^{-n}) - 10^{-k} \\
&< \sum_{i=0}^{n} y_i 10^{-i} + 10^{-n} - 10^{-k}.
\end{align*}
Since the partial sums are increasing, it follows that
\[ x \le \sum_{i=0}^{n} y_i 10^{-i} + 10^{-n} - 10^{-k}. \]
Since x_i = y_i for i < n, we have
\begin{align*}
x - \sum_{i=0}^{n-1} x_i 10^{-i} &\le y_n 10^{-n} + 10^{-n} - 10^{-k}, \\
10^n \Bigl( x - \sum_{i=0}^{n-1} x_i 10^{-i} \Bigr) &\le y_n + 1 - 10^{-(k-n)} < y_n + 1, \\
x_n = \Bigl[ 10^n \Bigl( x - \sum_{i=0}^{n-1} x_i 10^{-i} \Bigr) \Bigr] &\le y_n.
\end{align*}
For the reverse inequality,
\[ x - \sum_{i=0}^{n-1} x_i 10^{-i} = x - \sum_{i=0}^{n-1} y_i 10^{-i} \ge \sum_{i=0}^{n} y_i 10^{-i} - \sum_{i=0}^{n-1} y_i 10^{-i} = y_n 10^{-n}. \]
Hence $10^n \bigl( x - \sum_{i=0}^{n-1} x_i 10^{-i} \bigr) \ge y_n$, and hence x_n ≥ y_n.

As we mentioned at the end of the previous section, the decimal representation of real

numbers can be used to prove that R is uncountable. The idea of the proof is a special case

of the proof of Cantor’s theorem. It is usually called Cantor’s diagonal argument.

Proof. (of Corollary 2.11) We suppose that R is countable, and deduce a contradiction. Let

x_1, x_2, . . . be a listing of the elements of (the supposedly countable set) R. Let x_n have the
decimal representation x_{n0}.x_{n1}x_{n2} · · ·. Let y_0 = 0, and for each n ≥ 1 define y_n as follows:
if x_{nn} ≠ 1 let y_n = 1; if x_{nn} = 1, let y_n = 2. By construction, the sequence of digits y_n is not
eventually 9, and therefore it is the decimal representation of a real number y. Now we see that for each
n, y and x_n have decimal representations differing in the nth place; therefore y ≠ x_n. Thus

y is not in the list we started with, contradicting the assumption that this list contained all

real numbers.

4. Metric spaces

Much of what we do in analysis ultimately comes down to measuring the distance between

two real numbers. We use the absolute value for this: |x − y| is the distance between the

numbers x and y. There are many other situations where we use the distance between

points in an essential way. For example, the Pythagorean theorem is used to define the usual

distance between points in R^2, and even in R^n. One of the wonderful abstractions of
twentieth-century mathematics is a generalization of this notion of distance. In fact, it isn’t too hard to

notice that everything we use distance for in advanced calculus (e.g. limits, continuity, etc.)

relies only on a few very coarse aspects of the distance function. The following definition

sets these out precisely, and gives the basic setting for this course.

Definition 4.1. Let X be a set. A metric on X is a function d : X × X → R such that

(1) d(x, y) ≥ 0 for all x, y ∈ X (positivity).

(2) d(x, y) = 0 if and only if x = y (definiteness).

(3) d(x, y) = d(y, x) for all x, y ∈ X (symmetry).

(4) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).

Example 4.2. The usual metric on R is defined by d(x, y) = |x − y|.

Remark 4.3. Two common variations of the triangle inequality are easily proved as exer-

cises:

(1) d(x, y) ≥ d(x, z) − d(y, z).

(2) d(x, y) ≤ d(x, z_1) + d(z_1, z_2) + · · · + d(z_{n−1}, z_n) + d(z_n, y).

Many important examples of metric spaces arise from norms on vector spaces.

Definition 4.4. Let V be a real vector space. A norm on V is a function ‖ · ‖ : V → R such
that

(1) ‖x‖ ≥ 0.

(2) ‖x‖ = 0 only if x = 0.

(3) ‖cx‖ = |c| ‖x‖ for all c ∈ R (and x ∈ V).

(4) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Remark 4.5. If ‖ · ‖ is a norm on V, there is an associated metric on V given by d(x, y) =
‖x − y‖.

Example 4.6. (Function space) Let S be a nonempty set. The bounded (real-valued) func-
tions on S are defined by B(S, R) = {f : S → R : the range of f is bounded}. It is easy
to see that B(S, R) is a vector space (with point-wise operations). The uniform norm is
defined on B(S, R) by $\|f\|_u = \sup_{x \in S} |f(x)|$. It is an (easy) exercise to check that this is a
norm.

Example 4.7. R^n = R × · · · × R (n factors) can be thought of as B({1, . . . , n}, R). The
uniform norm here is usually denoted ‖ · ‖_∞: $\|(x_1, \ldots, x_n)\|_\infty = \max_{1 \le i \le n} |x_i|$.
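When S is finite, as in Example 4.7, the supremum in the uniform norm is a maximum and can be computed directly. The sketch below is a small illustration under that assumption; the name `uniform_norm` and the sample functions f and g are invented for the example.

```python
def uniform_norm(f, S):
    """Uniform norm of f on a finite nonempty set S: the largest value of |f(s)|."""
    return max(abs(f(s)) for s in S)


S = range(1, 6)                      # the index set {1, ..., 5}
f = lambda s: s - 3                  # the vector (-2, -1, 0, 1, 2)
g = lambda s: (-1) ** s * s          # the vector (-1, 2, -3, 4, -5)
h = lambda s: f(s) + g(s)            # point-wise sum

# Triangle inequality and homogeneity for this one pair of functions.
print(uniform_norm(h, S) <= uniform_norm(f, S) + uniform_norm(g, S))
print(uniform_norm(lambda s: -4 * f(s), S) == 4 * uniform_norm(f, S))
```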

Another important way of producing a norm on a vector space is by means of an inner

product.

Definition 4.8. Let V be a real vector space. An inner product on V is a function ⟨·, ·⟩ :
V × V → R such that

(1) ⟨x, x⟩ ≥ 0.

(2) ⟨x, x⟩ = 0 only if x = 0.

(3) ⟨x, y⟩ = ⟨y, x⟩.

(4) ⟨ax + by, z⟩ = a⟨x, z⟩ + b⟨y, z⟩ for all a, b ∈ R (and x, y, z ∈ V).

Property 4 of Definition 4.8 is called linearity in the first variable. By properties 3 and
4 it follows that inner products are also linear in the second variable. It follows from these
that ⟨0, y⟩ = ⟨y, 0⟩ = 0 for all y ∈ V.

Theorem 4.9. (Cauchy-Schwarz inequality) Let (V, ⟨·, ·⟩) be an inner product space. Then
\[ |\langle x, y \rangle| \le \langle x, x \rangle^{1/2} \langle y, y \rangle^{1/2}. \]

Proof. By the remarks before the theorem, the inequality holds if any of x, y, and ⟨x, y⟩
equals zero. So suppose that all three are non-zero. Let a = −sgn(⟨x, y⟩) ⟨y, y⟩^{1/2} and
b = ⟨x, x⟩^{1/2}. (Recall that sgn(t) = 1 if t > 0, = −1 if t < 0, and = 0 if t = 0.) Then
\[ 0 \le \langle ax + by, ax + by \rangle = a^2 \langle x, x \rangle + 2ab \langle x, y \rangle + b^2 \langle y, y \rangle = a^2 b^2 - 2|a|\, b\, |\langle x, y \rangle| + b^2 a^2. \]
Dividing by 2|a| b gives the result.


Corollary 4.10. Let V be an inner product space. For x ∈ V let ‖x‖ = ⟨x, x⟩^{1/2}. Then ‖ · ‖
is a norm on V.

Proof. We will prove the triangle inequality, leaving the verification of the other properties
of a norm as an exercise. Let x, y ∈ V. Then by the Cauchy-Schwarz inequality,
\begin{align*}
\|x + y\|^2 = \langle x + y, x + y \rangle &= \langle x, x \rangle + \langle x, y \rangle + \langle y, x \rangle + \langle y, y \rangle = \|x\|^2 + 2 \langle x, y \rangle + \|y\|^2 \\
&\le \|x\|^2 + 2 \|x\| \|y\| + \|y\|^2 = \bigl( \|x\| + \|y\| \bigr)^2.
\end{align*}
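Both inequalities can be spot-checked for the usual inner product on R^n. The sketch below is an illustration under stated assumptions: vectors are Python lists with integer entries, the Cauchy-Schwarz inequality is tested in its squared form so that the integer arithmetic is exact, and the helper names `inner`, `norm` and `cauchy_schwarz_holds` are hypothetical. Passing such a check is evidence for the statements, not a proof.

```python
import math
import random


def inner(x, y):
    """Usual inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Norm induced by the inner product, as in Corollary 4.10."""
    return math.sqrt(inner(x, x))


def cauchy_schwarz_holds(x, y):
    # Squared form <x,y>^2 <= <x,x><y,y>; exact when the entries are integers.
    return inner(x, y) ** 2 <= inner(x, x) * inner(y, y)


random.seed(0)
vectors = [[random.randint(-9, 9) for _ in range(5)] for _ in range(100)]
print(all(cauchy_schwarz_holds(x, y) for x in vectors for y in vectors))
```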

Example 4.11. The usual norm on R^n arises from the usual inner product. The corre-
sponding metric space is usually referred to as (n-dimensional) Euclidean space. We note
the following important inequalities for the Euclidean norm (proof by squaring).

Remark 4.12. Let x ∈ R^n. Then for any i,
\[ |x_i| \le \|x\| \le |x_1| + \cdots + |x_n|. \]

Definition 4.13. Let (X, d) be a metric space, and let Y ⊆ X. If we restrict the metric d

to points of Y then Y becomes a metric space, called a subspace of X.

Example 4.14. The circle (or torus) is a subspace of Euclidean space: T = {(x, y) ∈ R^2 :
x^2 + y^2 = 1}. (Thus, for example, d((1, 0), (0, 1)) = √2.)

It is very important to remember that, while pictures can give a lot of valuable intuition,

they are not a substitute for a proof. In this course, you may never use a picture as part

of a proof (though they can be included to help explain what you are doing). Well, it isn’t

really enough to just tell you not to touch the stove — you really have to burn yourself. The

following example is much more frequently encountered than you might imagine the first

time you see it. You should work through carefully on your own the details of the proof that

it is a metric space, and try to visualize it in some way (it’s unclear what that means!). It

provides a counterexample to many “obvious” facts about metric spaces that are not actually

true. The point is this: any theorem that we prove about metric spaces must be true for all

metric spaces. In particular, it will be true for the metric space in the next example.

Example 4.15. Recall the set X from Example 2.6. We define a metric on X as follows.
For x, y ∈ X with x ≠ y, the set {i : x_i ≠ y_i} is non-empty. By the well-ordering principle,
it has a least element. We set k(x, y) = min{i : x_i ≠ y_i}. Then we define
\[ d(x, y) = \begin{cases} \dfrac{1}{k(x, y)}, & \text{if } x \ne y, \\ 0, & \text{if } x = y. \end{cases} \]

We claim that d is a metric on X. The proofs of positive definiteness and symmetry are

immediate. We will verify the triangle inequality. In fact, we will prove something stronger,

called the ultrametric inequality.

Lemma 4.16. For any $x, y, z \in X$, $d(x, y) \leq \max\{d(x, z), d(y, z)\}$.

Proof. We will write “$k(x, x) = \infty$” as a kind of shorthand. (Notice that then $d(x, y) < d(u, v)$ if and only if $k(x, y) > k(u, v)$, for any points $x, y, u, v \in X$.) Now let $x, y, z \in X$. If $d(x, y) \leq d(x, z)$ then the inequality holds. So suppose that $d(x, y) > d(x, z)$. Then $k(x, y) < k(x, z)$. Since $x_i = z_i$ for $i < k(x, z)$, we have that $z_{k(x,y)} = x_{k(x,y)} \neq y_{k(x,y)}$. Therefore $k(y, z) \leq k(x, y)$, and hence $d(x, y) \leq d(y, z)$. Therefore $\max\{d(x, z), d(y, z)\} = d(y, z) \geq d(x, y)$.
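The ultrametric inequality can be checked by brute force on a finite model. The sketch below is only an illustration, not part of the proof: it assumes that points of X are 0-1 sequences and replaces them by prefixes of length 4, and the names `first_diff`, `d` and `ultrametric_ok` are hypothetical helpers introduced here.

```python
from itertools import product

def first_diff(x, y):
    """Return k(x, y): the least 1-based index i with x_i != y_i, or None if x == y."""
    for i, (xi, yi) in enumerate(zip(x, y), start=1):
        if xi != yi:
            return i
    return None

def d(x, y):
    """Metric of Example 4.15 on equal-length 0-1 tuples: 1/k(x, y), or 0 if x == y."""
    k = first_diff(x, y)
    return 0.0 if k is None else 1.0 / k

# Assumption: length-4 prefixes stand in for the infinite sequences in X.
points = list(product((0, 1), repeat=4))

# Exhaustively test d(x, y) <= max(d(x, z), d(y, z)) over all triples.
ultrametric_ok = all(
    d(x, y) <= max(d(x, z), d(y, z))
    for x in points for y in points for z in points
)
```

A finite check of this kind can catch a mistake in the definition of `d`, but it proves nothing about X itself; that is the job of Lemma 4.16.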


The following is another example of a metric space that varies from what our intuition

suggests. This one often seems like a stupid metric space . . . well, it is stupid, but it is

also a metric space. Every theorem about metric spaces must be true for it, and hence any

statement that is not true for this example, cannot be proven using only the axioms of a

metric space.

Example 4.17. Let $S$ be any set. The discrete metric on $S$ is defined by
\[ d(x, y) = \begin{cases} 1, & \text{if } x \neq y, \\ 0, & \text{if } x = y. \end{cases} \]

Remark 4.18. It is easy to see that the discrete metric on a set with n points can be

realized as a subspace of Euclidean n-space. It is a little harder to find a natural setting for

the discrete metric on N. The discrete metric on R is a useful counterexample to keep in

mind.
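One way to see the first claim of Remark 4.18 is sketched below. It assumes a particular embedding that the notes do not spell out: the n points are sent to the scaled standard basis vectors $e_i/\sqrt{2}$ in Euclidean n-space, so that any two distinct points are at distance 1. The function names are hypothetical.

```python
import math

def scaled_basis(n):
    """Return the n points e_i / sqrt(2) in R^n (an illustrative choice of embedding)."""
    s = 1.0 / math.sqrt(2.0)
    return [[s if j == i else 0.0 for j in range(n)] for i in range(n)]

def euclidean(p, q):
    """Euclidean distance between two points given as equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

pts = scaled_basis(5)

# Distances between all ordered pairs of distinct points; each should be 1 up to rounding.
distances = [
    euclidean(p, q)
    for i, p in enumerate(pts)
    for j, q in enumerate(pts)
    if i != j
]
```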

5. The topology of metric spaces

Definition 5.1. Let $(X, d)$ be a metric space, and let $a \in X$, $r > 0$. The open ball with center $a$ and radius $r$ is the set
\[ B_r(a) = \{x \in X : d(x, a) < r\}. \]
The closed ball with center $a$ and radius $r$ is the set
\[ \overline{B}_r(a) = \{x \in X : d(x, a) \leq r\}. \]

Example 5.2. In $\mathbb{R}$, $B_r(a) = (a - r, a + r)$ and $\overline{B}_r(a) = [a - r, a + r]$. You should sketch the pictures of the open and closed balls in $\mathbb{R}^2$. These pictures are extremely useful as intuition when proving things. But it is NEVER permissible to use a picture as a substitute for a proof.

Definition 5.3. Let $(X, d)$ be a metric space, and let $E \subseteq X$.
(1) $E$ is an open set if for each $x \in E$ there is $r > 0$ such that $B_r(x) \subseteq E$.
(2) $E$ is a closed set if $E^c$ is an open set.

Proposition 5.4. In a metric space, open balls are open sets and closed balls are closed sets.

Proof. Let a ∈ X and r > 0, and let x ∈ Br (a). We need to find an open ball centered at x

(with some positive radius) that is completely contained in Br (a). We know that d(x, a) < r,

since x ∈ Br (a). Let s = r − d(x, a). Then s > 0. We claim that Bs (x) ⊆ Br (a). To prove

this, let y ∈ Bs (x). Then d(y, x) < s. Then

d(y, a) ≤ d(y, x) + d(x, a) < s + d(x, a) = r − d(x, a) + d(x, a) = r,

and hence y ∈ Br (a). Therefore Br (a) is an open set.

The proof that closed balls are closed sets is left as an exercise.

Proposition 5.5. The following hold in a metric space.

(1) The union of any collection of open sets is open.

(2) The intersection of a finite collection of open sets is open.

(3) The intersection of any collection of closed sets is closed.

(4) The union of a finite collection of closed sets is closed.


Proof. These are easy exercises using DeMorgan’s laws (and the notation for families of

sets).

Example 5.6. (1) In any metric space X, X and ∅ are both open and closed. (It is a

fairly deep fact (to be proved later) that if X = Rn then these are the only sets that

are simultaneously open and closed.)

(2) A singleton set in a metric space is a closed set.

{x}c .

(4) In Rn , sets of the form {x : xi > c}, {x : xi < c}, are called open half-spaces, and are

open sets. Sets of the form {x : xi ≥ c}, {x : xi ≤ c} are called closed half-spaces,

and are closed sets.

to prove openness of open half-spaces. If H = {x : xi > c} and y ∈ H, let r = yi −c >

0. If z ∈ Br (y) then yi −zi ≤ |zi −yi | ≤ d(z, y) < r. Then zi = yi −(yi −zi ) > yi −r = c,

and hence z ∈ H. Thus Br (y) ⊆ H, and we have shown that H is open. The proof

for the other kind of open half-space is left as an exercise.

Definition 5.7. An open box in Rn is a set of the form (a1 , b1 ) × · · · × (an , bn ), where

−∞ ≤ ai ≤ bi ≤ ∞ for each i. Closed boxes in Rn are defined similarly, by including all

finite endpoints of the interval factors of the Cartesian product. We note that an open box

is a finite intersection of (at most 2n) open half-spaces, and hence is an open set. Similarly,

closed boxes are closed sets.

It is important to remember that, while the complement of an open set is a closed set, the

opposite of “open” is not “closed” — many (most, even) sets are neither open nor closed.

We next introduce the operations of interior and closure. These provide important open and

closed sets associated with arbitrary subsets of a metric space.

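Definition 5.8. Let $X$ be a metric space and let $E \subseteq X$. The interior of $E$ is the set
\[ \operatorname{int}(E) = \bigcup \{U : U \subseteq E \text{ and } U \text{ is open}\}. \]
The closure of $E$ is the set
\[ \overline{E} = \bigcap \{K : K \supseteq E \text{ and } K \text{ is closed}\}. \]

Remark 5.9. (1) $\operatorname{int}(E)$ is an open set, and is the largest open set contained in $E$.
(2) $\overline{E}$ is a closed set, and is the smallest closed set containing $E$.
(3) $\overline{E} = \big(\operatorname{int}(E^c)\big)^c$, and $\operatorname{int}(E) = \big(\overline{E^c}\big)^c$.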


Proof. The first two items follow immediately from the definitions. For the third item, we have
\begin{align*}
\overline{E} &= \bigcap \{K : E \subseteq K,\ K \text{ closed}\}, \\
(\overline{E})^c &= \bigcup \{K^c : E \subseteq K,\ K \text{ closed}\} \\
&= \bigcup \{K^c : K^c \subseteq E^c,\ K^c \text{ open}\} \\
&= \bigcup \{U : U \subseteq E^c,\ U \text{ open}\} \\
&= \operatorname{int}(E^c).
\end{align*}
Taking complements of both sides yields the first formula. If we apply the first formula to $E^c$, and take complements of both sides, we obtain the second formula.

The above definitions are abstract, in that they don’t give an explicit criterion to use to

decide if a point does or does not belong to the interior or closure of a set. We now give

such criteria.

Proposition 5.10. (1) $x \in \operatorname{int}(E)$ if and only if there is $r > 0$ such that $B_r(x) \subseteq E$.
(2) $x \in \overline{E}$ if and only if for every $r > 0$, $B_r(x) \cap E \neq \emptyset$.

Proof. (1) This is almost instantly obtained from the definition, and we leave the details as an exercise.
(2) We note that $x \in (\overline{E})^c$ if and only if $x \in \operatorname{int}(E^c)$, by Remark 5.9(3). But this is true if and only if there is $r > 0$ such that $B_r(x) \subseteq E^c$, by part (1). But this is true if and only if there is $r > 0$ such that $B_r(x) \cap E = \emptyset$. By negating the first and last items in this chain of equivalent statements, we find that $x \in \overline{E}$ if and only if for all $r > 0$, $B_r(x) \cap E \neq \emptyset$.

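Example 5.11. It is worth thinking about the above definitions and results in the context of some examples. We note that in any metric space, if $E$ is open then $\operatorname{int}(E) = E$, while if $E$ is closed then $\overline{E} = E$.
In $\mathbb{R}$,
(1) $\operatorname{int}\big((a, b]\big) = (a, b)$.
(2) $\operatorname{int}(\mathbb{Z}) = \emptyset$.
(3) $\operatorname{int}(\mathbb{Q}) = \emptyset$.
(4) $\overline{(0, 1]} = [0, 1]$.
(5) $\overline{\mathbb{Z}} = \mathbb{Z}$.
(6) $\overline{\mathbb{Q}} = \mathbb{R}$.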

It might seem tempting to try to describe the new points sucked in by the closure operation, i.e. the points of $\overline{E}$ that are not already in $E$. However it turns out to be much more useful to describe the property that brings them into the closure. This property may apply also to some points already in $E$.

Definition 5.12. Let $X$ be a metric space, $E \subseteq X$, and $a \in X$. The point $a$ is a cluster point of $E$ (also called by some people limit point or accumulation point) if for every $r > 0$, the intersection $E \cap B_r(a)$ is infinite. We write $E'$ for the set of cluster points of $E$.

Example 5.13. Let X = R.


(2) $\{0, 1, \tfrac{1}{2}, \tfrac{1}{3}, \ldots\}' = \{0\}$.
(3) $\mathbb{Z}' = \emptyset$.
(4) $\mathbb{Q}' = \mathbb{R}$.

Note that $E' \subseteq \overline{E}$; this follows from Proposition 5.10(2). Therefore $E \cup E' \subseteq \overline{E}$. In fact, the two sides are equal, which is the content of the next result.

Proposition 5.14. $\overline{E} = E \cup E'$.

Before proving the proposition, we give a lemma that may seem surprising at first.

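Lemma 5.15. $a \in E'$ if and only if for each $r > 0$, $(E \setminus \{a\}) \cap B_r(a) \neq \emptyset$.

Proof. ($\Rightarrow$): Suppose that for some $r > 0$, $(E \setminus \{a\}) \cap B_r(a) = \emptyset$. Then $E \cap B_r(a) \subseteq \{a\}$, a finite set. Hence $a \notin E'$.
($\Leftarrow$): Let $a \notin E'$. Then there is $r > 0$ such that $E \cap B_r(a)$ is finite. Let $(E \cap B_r(a)) \setminus \{a\} = \{x_1, \ldots, x_m\}$. If this set is empty, then $(E \setminus \{a\}) \cap B_r(a) = \emptyset$. Otherwise, let $s = \min\{d(x_1, a), \ldots, d(x_m, a)\}$, so that $0 < s < r$. No point of $E \setminus \{a\}$ lies in $B_s(a)$, and hence $(E \setminus \{a\}) \cap B_s(a) = \emptyset$.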

Proof. (of Proposition 5.14) We already proved $\supseteq$ in the comments before the proposition. For $\subseteq$, let $a \in \overline{E}$. If $a \in E$ then clearly $a \in E \cup E'$. So suppose that $a \notin E$. Let $r > 0$. By Proposition 5.10(2) we have $E \cap B_r(a) \neq \emptyset$. But since $a \notin E$ this implies that $(E \setminus \{a\}) \cap B_r(a) \neq \emptyset$. By Lemma 5.15, $a \in E'$.

Corollary 5.16. $E$ is closed if and only if $E' \subseteq E$.

Definition 5.17. Let X be a metric space. A subset E ⊆ X is called bounded if there is a

ball in X that contains E. X is bounded if it is a bounded subset of itself. (In this case we

say that the metric is bounded.)

Remark 5.18. In Definition 5.17 it doesn’t matter whether the ball is required to be open

or closed.

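Exercise 5.19. Let $X$ be a metric space, and let $E \subseteq X$ with $E \neq \emptyset$. We define the diameter of $E$ by
\[ \operatorname{diam}(E) = \begin{cases} \sup\{d(x, y) : x, y \in E\}, & \text{if } E \text{ is bounded}, \\ \infty, & \text{if } E \text{ is unbounded}. \end{cases} \]
(2) Prove that $\operatorname{diam}(\overline{E}) = \operatorname{diam}(E)$.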

6. The Cantor set

In this section we introduce the first “interesting” set that most people come across. Let $F_0 = [0, 1]$, $F_1 = [0, \tfrac{1}{3}] \cup [\tfrac{2}{3}, 1]$, $F_2 = [0, \tfrac{1}{9}] \cup [\tfrac{2}{9}, \tfrac{1}{3}] \cup [\tfrac{2}{3}, \tfrac{7}{9}] \cup [\tfrac{8}{9}, 1]$, and so on. Recursively, $F_n$ is obtained by removing the open middle third from each subinterval of $F_{n-1}$. Thus $F_n$ is the disjoint union of $2^n$ closed intervals, each of length $3^{-n}$. $F_n$ is closed, nonempty, and $F_n \supseteq F_{n+1}$.

Definition 6.1. The Cantor set, $C$, is the set $\bigcap_{n=0}^{\infty} F_n$.
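The recursive construction of the sets $F_n$ is easy to carry out exactly with rational arithmetic. The following is a minimal sketch under that assumption; `next_stage` and `cantor_stage` are hypothetical names, and each closed interval is stored as a pair of endpoints.

```python
from fractions import Fraction

def next_stage(intervals):
    """Remove the open middle third from each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))      # left closed third
        out.append((b - third, b))      # right closed third
    return out

def cantor_stage(n):
    """Return F_n as a left-to-right list of closed intervals with exact endpoints."""
    stage = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        stage = next_stage(stage)
    return stage

F2 = cantor_stage(2)
```

Here `F2` holds the four intervals of $F_2$ listed in the text, and `cantor_stage(n)` has $2^n$ intervals, each of length $3^{-n}$.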


It is a good idea to draw a picture. It isn’t hard to see that C is nonempty: all the

endpoints of the closed subintervals making up the Fn ’s belong to C. Still, this set of

endpoints is a countable set. In fact, C is much bigger, as we will now see. Recall the space X of Example 2.6. We will prove that C ∼ X.

Definition 6.2. We define $f : X \to C$ as follows. Let $x = (x_1, x_2, \ldots) \in X$. For each $n$ define a closed interval $I_n(x)$ recursively by
\[ I_0(x) = [0, 1], \qquad I_{n+1}(x) = \begin{cases} \text{left piece of } I_n(x) \cap F_{n+1}, & \text{if } x_{n+1} = 0, \\ \text{right piece of } I_n(x) \cap F_{n+1}, & \text{if } x_{n+1} = 1. \end{cases} \]
Then $I_0(x) \supseteq I_1(x) \supseteq \cdots$. Let us write $I_n(x) = [a_n, b_n]$. The nesting of these intervals implies that
\[ a_1 \leq a_2 \leq \cdots \leq b_2 \leq b_1. \]
Let $\alpha = \sup\{a_1, a_2, \ldots\}$ and $\beta = \inf\{b_1, b_2, \ldots\}$. We claim that $\bigcap_{n=0}^{\infty} I_n(x) = [\alpha, \beta]$. To see this, we first note that since $a_n \leq \alpha \leq \beta \leq b_n$ for all $n$, $[\alpha, \beta] \subseteq \bigcap_{n=0}^{\infty} I_n(x)$. On the other hand, if $t \in \bigcap_{n=0}^{\infty} I_n(x)$, then $a_n \leq t \leq b_n$ for all $n$. Hence $t$ is an upper bound for the set of $a_n$'s, and a lower bound for the set of $b_n$'s. Thus $\alpha \leq t \leq \beta$. This proves the claim. Finally, since $b_n - a_n = 3^{-n}$, we have $\beta - \alpha \leq 3^{-n}$ for all $n$. Therefore $\alpha = \beta$. It follows that $\bigcap_{n=0}^{\infty} I_n(x)$ is the singleton set $\{\alpha\}$. We define $f$ by setting $f(x) = \alpha$. More precisely, the above argument allows us to describe $f$ as follows:
\[ \{f(x)\} = \bigcap_{n=0}^{\infty} I_n(x). \]
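The nested intervals $I_n(x)$ can be computed exactly from a finite prefix of $x$. The sketch below is an illustration under the assumption that a prefix $(x_1, \ldots, x_n)$ is given as a list of 0s and 1s; `interval_for_prefix` is a hypothetical name. Since $f(x)$ lies in every $I_n(x)$, the returned endpoints approximate $f(x)$ to within $3^{-n}$.

```python
from fractions import Fraction

def interval_for_prefix(bits):
    """Return I_n(x) = [a_n, b_n] for a finite prefix (x_1, ..., x_n) of x."""
    a, b = Fraction(0), Fraction(1)     # I_0(x) = [0, 1]
    for bit in bits:
        third = (b - a) / 3
        if bit == 0:
            b = a + third               # keep the left piece
        else:
            a = b - third               # keep the right piece
    return a, b

# Example prefixes: (0, 1) and (1, 1, 1).
left_then_right = interval_for_prefix([0, 1])
all_right = interval_for_prefix([1, 1, 1])
```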

Proposition 6.3. f is bijective.

Proof. We first show that $f$ is injective. Let $x, y \in X$ with $x \neq y$. Let $k = k(x, y)$ (recall Example 4.15). For $i < k$, $x_i = y_i$, so that $I_i(x) = I_i(y)$. Since $x_k \neq y_k$, $I_k(x)$ and $I_k(y)$ are two disjoint subintervals of $I_{k-1}(x) = I_{k-1}(y)$. Since $f(x) \in I_k(x)$ and $f(y) \in I_k(y)$, we must have $f(x) \neq f(y)$.

We now show that $f$ is surjective. Let $t \in C$. Then $t \in F_n$ for all $n$. For each $n$, let $I_n$ be the subinterval of $F_n$ containing $t$. Since $I_n$ and $I_{n+1}$ are subintervals of $F_n$ and $F_{n+1}$, respectively, then either $I_n \supseteq I_{n+1}$ or $I_n \cap I_{n+1} = \emptyset$. Since both contain $t$, we must have $I_n \supseteq I_{n+1}$. Thus we must have $\bigcap_{n=0}^{\infty} I_n = \{t\}$. Now let
\[ x_n = \begin{cases} 0, & \text{if } I_n \text{ is the left piece of } I_{n-1} \cap F_n, \\ 1, & \text{if } I_n \text{ is the right piece of } I_{n-1} \cap F_n. \end{cases} \]
Letting $x = (x_1, x_2, \ldots) \in X$, we see that $I_n = I_n(x)$ for all $n$, so that $t = f(x)$.

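Corollary 6.4. $\mathbb{R}$, $C$, $X$, and $\mathcal{P}(\mathbb{N})$ are equivalent sets. In particular, $\mathbb{R}$ is uncountable.

Proof. In Remark 2.8 we sketched the proof that $X \sim \mathcal{P}(\mathbb{N})$, while in Proposition 6.3 we saw that $X \sim C$. We finish the proof by showing that $\mathbb{R} \sim C$. Since $C \subseteq \mathbb{R}$ we have $C \preceq \mathbb{R}$. By the Cantor-Bernstein theorem, it suffices to show that $\mathbb{R} \preceq C$. Since $C \sim \mathcal{P}(\mathbb{N})$, it suffices to show that $\mathbb{R} \preceq \mathcal{P}(\mathbb{N})$. But since $\mathbb{N} \sim \mathbb{Q}$, we know that $\mathcal{P}(\mathbb{N}) \sim \mathcal{P}(\mathbb{Q})$. Thus we will be finished if we can show that $\mathbb{R} \preceq \mathcal{P}(\mathbb{Q})$. We do that as follows. We define a function $g : \mathbb{R} \to \mathcal{P}(\mathbb{Q})$ by
\[ g(t) = \{q \in \mathbb{Q} : q < t\}. \]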


If $s \neq t$ are distinct points of $\mathbb{R}$, say $s < t$, then by the density of $\mathbb{Q}$ in $\mathbb{R}$ there exists $q \in \mathbb{Q}$ with $s < q < t$. Then $q \in g(t)$ and $q \notin g(s)$, and we have $g(s) \neq g(t)$. Hence $g$ is one-to-one.

Exercise 6.5. (1) $\operatorname{int}(C) = \emptyset$.
(2) $C' = C$.

7. Sequences

Definition 7.1. Let X be a set. A sequence in X is a function x : N → X.

Remark 7.2. We usually write xn instead of x(n), but the latter notation is often useful

too. We sometimes write $(x_n)_{n=1}^{\infty}$, or $(x_n)$, for $x$. It is important to remember that in this

notation, n is a dummy variable — it is the argument of the function x. (So, in particular,

there is nothing special about the letter n used as the argument — it will often be convenient

to use a different letter.) Some texts use curly braces instead of parentheses, but we will

avoid this notation, for the following reason. The range of the sequence x is the subset

{xn : n ∈ N} of X. This is often referred to as the set of terms of (xn ). It is important

to distinguish between the sequence itself (which is a function from N to X), and its set of

terms (which is a subset of X).

While we are on the subject of the subtlety of the notation for sequences, let me point

out a common mistake to guard against. What should we make of the following statement

(taken from more than one actual homework paper!): “Let (xn ) be a sequence, and let (xi )

be another sequence.”? Of course, this deserves a quantity of red ink, but you should think

carefully about the precise error. (And PLEASE don’t make this mistake too.)

Definition 7.3. Let (xn ) be a sequence in a metric space X, and let a ∈ X. (xn ) converges

to a if for every ε > 0, there exists n0 ∈ N such that for all n ≥ n0 we have d(xn , a) < ε. We

write xn → a (as n → ∞) to indicate that (xn ) converges to a.

Lemma 7.4. A sequence in a metric space converges to at most one point.

Proof. Suppose that xn → a and xn → b. Let ε > 0. There exist n1 , n2 ∈ N such that

d(xn , a) < ε/2 for all n ≥ n1 , and d(xn , b) < ε/2 for all n ≥ n2 . Let n = max{n1 , n2 }. Then

d(a, b) ≤ d(a, xn ) + d(xn , b) < ε/2 + ε/2 = ε. Since d(a, b) < ε for all ε > 0, it follows that

a = b.

Definition 7.5. If xn → a, a is called the limit of (xn ), and we write limn→∞ xn = a. We

say that (xn ) converges if it has a limit; otherwise it diverges.

Proposition 7.6. Let $X$ be a metric space, let $E \subseteq X$, and let $a \in X$.
(1) $a \in \overline{E}$ if and only if there is a sequence in $E$ converging to $a$.
(2) $a \in E'$ if and only if there is a sequence in $E \setminus \{a\}$ converging to $a$.
(3) $E$ is closed if and only if every sequence in $E$ that converges in $X$ has its limit in $E$.

Proof. We prove part of the proposition, and leave the rest as an exercise.
(1) ($\Rightarrow$): Let $a \in \overline{E}$. By Proposition 5.10(2), for each $n \in \mathbb{N}$ we have $E \cap B_{1/n}(a) \neq \emptyset$. Choose $x_n \in E$ with $d(x_n, a) < 1/n$. Then $x_n \to a$.

Remark 7.7. Sequences are an important tool for studying metric spaces. One can think

of a sequence as a kind of “probe” — a function from N to the space picks out a certain

countable subset in a manner indexed by the natural numbers. It is also useful to use

sequences as tools to study a sequence itself. This leads to the next definition.


Definition 7.8. Let x be a sequence in a set X, and let n be a strictly increasing sequence

in N. (Thus n : N → N satisfies n1 < n2 < n3 < · · · .) Then x ◦ n is another sequence in X.

It is called a subsequence of x.

Remark 7.9. The terms of the subsequence $x \circ n$ may be denoted $(x \circ n)_i = x\big(n(i)\big) = x_{n(i)} = x_{n_i}$. Thus we may write $x \circ n = \big(x_{n_i}\big)_{i=1}^{\infty}$.

The idea of a subsequence is pretty simple, but the notation can lead to lots of silly mistakes, against which you should be on guard. For example, let $(x_n)$ be a sequence. The expression $x_{50}$ makes sense: it is the 50th term of the sequence. Now let $(x_{n_i})$ be a subsequence. The expression $x_{n_{50}}$ makes sense: it is the 50th term of the subsequence, and equivalently, it is the $n_{50}$th term of the original sequence. However, the expression $x_{50_i}$ does not make sense. If we try to interpret it, we first realize that it is the value of the function $x$ at the argument $50_i$. So $50_i$ must be an element of the domain of $x$, namely a natural number. Now $50_i$ must be the value of the function $50$ at the argument $i$. But this is nonsense: ‘50’ is not a function, so it can’t be ‘evaluated’ at the argument $i$.
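Treating sequences literally as functions makes the subsequence notation hard to get wrong. The sketch below is an illustration only: the particular sequence `x`, the index sequence `n_even`, and the helper `subsequence` are assumptions chosen for the example, with indices starting at 1.

```python
def x(n):
    """An illustrative sequence x : N -> R, here x_n = (-1)^n / n."""
    return (-1) ** n / n

def n_even(i):
    """A strictly increasing index sequence n : N -> N, here n_i = 2i."""
    return 2 * i

def subsequence(x, n):
    """Return the composition x o n, whose i-th term is x_{n_i}."""
    return lambda i: x(n(i))

y = subsequence(x, n_even)

# The 50th term of the subsequence is the n_50-th (here the 100th) term of x.
fiftieth = y(50)
```

In this form an expression like `x(50(i))` is visibly meaningless, because `50` is not callable; that is exactly the point made above about the second subscript.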

Here is another example to keep in mind. Suppose that we have a bunch of sequences in $X$. Say that $x_1, x_2, \ldots$ are all sequences (i.e. we have a sequence of sequences). How should we write the terms of the $n$th sequence? We have that $x_n : \mathbb{N} \to X$, so we can write $x_n = \big(x_n(i)\big)_{i=1}^{\infty}$, using function notation for $x_n$. Note carefully that $i$ is the argument of the function $x_n$, and not the argument of $n$ (which is not a function). We have to be careful about using subscript notation. If we weren’t being careful, we might write $x_n = (x_{n_i})_{i=1}^{\infty}$. But this is the same as the notation for a subsequence of a sequence $x$. One resolution of this ambiguity is to use more parentheses: $x_n = \big((x_n)_i\big)_{i=1}^{\infty}$. The more usual way is to use two subscripts: $x_n = (x_{ni})_{i=1}^{\infty}$, and this is what we will do when we are faced with this situation. Writing it out longhand for clarity gives $x_n = (x_{n1}, x_{n2}, x_{n3}, \ldots)$. Note that it is necessary to write so clearly that the reader does not mistake the second subscript for a sub-subscript.

Here is a simple result about subsequences.

Proposition 7.10. Let $(x_n)$ be a convergent sequence in a metric space. Then every subsequence of $(x_n)$ is also convergent, and has the same limit.

Remark 7.11. Before proving the proposition, we observe that if $n : \mathbb{N} \to \mathbb{N}$ is strictly increasing, then $n_i \geq i$ for all $i$. This is easily proved by induction on $i$, and we omit the proof. We do point out that equality is possible. In fact, letting $n_i = i$ for all $i$ shows that any sequence is a subsequence of itself.

Proof. (of Proposition 7.10) Let $x_n \to a$, and let $(x_{n_i})$ be a subsequence. We will show that $x_{n_i} \to a$. Let $\varepsilon > 0$. Since $x_n \to a$, there is $m$ such that $d(x_n, a) < \varepsilon$ whenever $n \geq m$. Now if $i \geq m$, then $n_i \geq i \geq m$, by the remark, so that $d(x_{n_i}, a) < \varepsilon$. Thus $x_{n_i} \to a$ (as $i \to \infty$).

Remark 7.12. It is clear from the definition that convergence or divergence of a sequence

is unaffected if finitely many terms are changed. Convergence, divergence, the limit if con-

vergent, are examples of properties of a sequence that depend only on the ultimate behavior

of the sequence. In fact, such properties are the only ones that are important for sequences.

One way to describe this is by means of tails of a sequence. If (xn ) is a sequence, the nth

tail is the subsequence $(x_i)_{i=n}^{\infty}$. Thus, if the sequence converges to $L$, then every tail of

the sequence also converges to L. We sometimes say that a property holds eventually for a

sequence if it holds for some tail.


Definition 7.13. A function having a metric space as codomain is bounded if its range is a bounded subset of the codomain (cf. Example 4.6).

Of course, since a sequence in a metric space is an example of a function with the metric

space as codomain, it makes sense to talk of bounded (and unbounded) sequences. The proof

of the next result is a good exercise, but it will also follow from some later results.

Lemma 7.14. Let (xn ) be a convergent sequence (in some metric space). Then (xn ) is

bounded.

8. Continuous functions

Definition 8.1. Let $(X, d)$ and $(Y, \rho)$ be metric spaces, $f : X \to Y$ a function, and $x_0 \in X$. $f$ is continuous at $x_0$ if for every $\varepsilon > 0$ there exists $\delta > 0$ such that for every $x \in X$, if $d(x, x_0) < \delta$ then $\rho\big(f(x), f(x_0)\big) < \varepsilon$. $f$ is continuous if it is continuous at each point of $X$.

Remark 8.2. Here are some equivalent formulations of continuity at a point $x_0$.
(1) For every $\varepsilon > 0$ there exists $\delta > 0$ such that $f\big(B_\delta(x_0)\big) \subseteq B_\varepsilon\big(f(x_0)\big)$.
(2) For every open ball $C$ with center $f(x_0)$, there exists an open ball $B$ with center $x_0$ such that $f(B) \subseteq C$.
(3) For every $\varepsilon > 0$ there exists $\delta > 0$ such that $B_\delta(x_0) \subseteq f^{-1}\big(B_\varepsilon(f(x_0))\big)$.

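Example 8.3. (1) Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^2$. Then $f$ is continuous.

Proof. Let $x_0 \in \mathbb{R}$, and let $\varepsilon > 0$. Then for any $x \in \mathbb{R}$,
\begin{align*}
|f(x) - f(x_0)| &= |x^2 - x_0^2| \\
&= |x - x_0|\,|x + x_0| \\
&\leq |x - x_0|\,\big(|x - x_0| + 2|x_0|\big); \\
\intertext{if $|x - x_0| < 1$, then}
&\leq |x - x_0|\,\big(1 + 2|x_0|\big); \\
\intertext{if $|x - x_0| < \varepsilon/(1 + 2|x_0|)$, then}
&< \varepsilon.
\end{align*}
Now choose $\delta > 0$ such that $\delta < \min\{1, \varepsilon/(1 + 2|x_0|)\}$. Then $|x - x_0| < \delta$ implies that $|x^2 - x_0^2| < \varepsilon$.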
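The choice of $\delta$ in this proof can be tried out numerically. The sketch below is a sanity check and not a proof: it assumes the specific choice of half of $\min\{1, \varepsilon/(1 + 2|x_0|)\}$ (any positive number below the minimum would do), and `delta_for_square` is a hypothetical name.

```python
def delta_for_square(x0, eps):
    """Return a delta that works for f(x) = x^2 at x0, following the estimate in the proof."""
    return 0.5 * min(1.0, eps / (1.0 + 2.0 * abs(x0)))

x0, eps = 3.0, 0.01
delta = delta_for_square(x0, eps)

# Sample points with |x - x0| < delta and record the largest value of |x^2 - x0^2|.
samples = [x0 + delta * (k / 100.0) for k in range(-99, 100)]
worst = max(abs(s * s - x0 * x0) for s in samples)
```

With these values `worst` stays below `eps`, as the estimate $|x^2 - x_0^2| \leq |x - x_0|\,(1 + 2|x_0|)$ predicts.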

(2) Define the identity function id : X → X by id(x) = x. id is continuous.

(3) Fix y0 ∈ Y . Define f : X → Y by f (x) = y0 for all x ∈ X. Then f is continuous. (f

is called a constant function.)

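(4) Define $\chi_{\mathbb{Q}} : \mathbb{R} \to \mathbb{R}$ by
\[ \chi_{\mathbb{Q}}(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q}, \\ 0, & \text{if } x \notin \mathbb{Q}. \end{cases} \]
$\chi_{\mathbb{Q}}$ is discontinuous at each point of $\mathbb{R}$.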


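(5) Define $h : \mathbb{R} \to \mathbb{R}$ by
\[ h(x) = \begin{cases} \dfrac{1}{n}, & \text{if } x = \dfrac{m}{n} \text{ in lowest terms, where } m, n \in \mathbb{Z} \text{ with } n > 0, \\ 0, & \text{if } x \in \mathbb{R} \setminus \mathbb{Q}. \end{cases} \]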

Then h is continuous at each irrational number, and discontinuous at each rational

number. The proof is a nice exercise. (It is interesting to consider the opposite

continuity behavior.)

(6) We define the coordinate projections on Rn , πi : Rn → R, by πi (x) = xi . The πi are

continuous (by Remark 4.12).

Earlier we said that sequences are an important tool for studying objects in analysis. As

evidence, we now show how to use sequences to characterize continuity of a function between

metric spaces.

Theorem 8.4. Let X and Y be metric spaces, and let f : X → Y be a function. f

is continuous if and only if for every convergent sequence xn → a in X, we have that

f (xn ) → f (a) in Y . (Thus f is continuous if and only if it preserves convergent sequences,

and maps the limit of a convergent sequence to the limit of the image sequence.)

Proof. The forward direction is straightforward, and we leave it as an exercise. For the reverse direction we prove the contrapositive. Suppose that $f$ is not continuous at $a$. Then there is $\varepsilon > 0$ such that for every $\delta > 0$ there is $x \in B_\delta(a)$ with $f(x) \notin B_\varepsilon\big(f(a)\big)$. We apply this to $\delta = 1/n$: thus there is a sequence $(x_n)$ in $X$ such that $d(x_n, a) < 1/n$ and $\rho\big(f(x_n), f(a)\big) \geq \varepsilon$. But then clearly $x_n \to a$ while $f(x_n) \not\to f(a)$.

We didn’t mention this before, but the word topology has a technical meaning: the topology

of a metric space is the collection of all the open subsets of the space. A property of the

space is topological if it can be defined just by using the open sets. It is very important to

know that continuity of functions is a topological property.

Theorem 8.5. f : X → Y is continuous if and only if for every open set V ⊆ Y , the inverse

image f −1 (V ) is open in X.

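Proof. ($\Longrightarrow$): Let $V \subseteq Y$ be open. Let $x_0 \in f^{-1}(V)$. Then $f(x_0) \in V$. Since $V$ is open there is $\varepsilon > 0$ such that $B_\varepsilon\big(f(x_0)\big) \subseteq V$. By Remark 8.2 (3) there is $\delta > 0$ such that $B_\delta(x_0) \subseteq f^{-1}\big(B_\varepsilon(f(x_0))\big) \subseteq f^{-1}(V)$. Hence $f^{-1}(V)$ is open.
($\Longleftarrow$): Let $x_0 \in X$ and let $\varepsilon > 0$. Since $B_\varepsilon\big(f(x_0)\big)$ is open, $f^{-1}\big(B_\varepsilon(f(x_0))\big)$ is open. Since $x_0 \in f^{-1}\big(B_\varepsilon(f(x_0))\big)$ there is $\delta > 0$ such that $B_\delta(x_0) \subseteq f^{-1}\big(B_\varepsilon(f(x_0))\big)$. Therefore $f$ is continuous at $x_0$ (by Remark 8.2 (3)).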

Exercise 8.6. f : X → Y is continuous if and only if for every closed set V ⊆ Y , the inverse

image f −1 (V ) is closed in X.

This is a good place to introduce the notion of “sameness” for metric spaces. First, the

definition:

Definition 8.7. Let X and Y be metric spaces. A homeomorphism from X to Y is a

function f : X → Y which is bijective, continuous, and such that its inverse function f −1

is continuous. Two metric spaces are called homeomorphic if there exists a homeomorphism

from one to the other.


Homeomorphic metric spaces have the same topological structure and properties. It is

colloquial to describe this by saying that one space can be deformed into the other by

bending and stretching without tearing. Here are some simple examples.

Example 8.8. (1) Any two open disks in R2 are homeomorphic.

(2) Any two closed disks in $\mathbb{R}^2$ having positive radii are homeomorphic.

(3) No open disk in R2 is homeomorphic to any closed disk in R2 . (This is not an obvious

one.)

(4) Every open ball in Rn is homeomorphic to every open box in Rn .

(5) The unit circle $T = \{x \in \mathbb{R}^2 : \|x\| = 1\}$ is not homeomorphic to the unit interval

[0, 1] ⊆ R. (Again, it isn’t so obvious how to prove this.)

Example 8.9. Recall the function $f : X \to C$ from Definition 6.2, where $X = \prod_{1}^{\infty} \{0, 1\}$ is as in Example 2.6, and $C$ is the Cantor set (Definition 6.1). We will show that $f$ and $f^{-1}$ are continuous functions. First some notation. If $(a_1, a_2, \ldots, a_n) \in \prod_{1}^{n} \{0, 1\}$, let
\[ Z(a_1, \ldots, a_n) = \{x \in X : x_i = a_i \text{ for } 1 \leq i \leq n\}. \]
Such sets are called cylinder sets. Note that cylinder sets are clopen: $Z(a_1, \ldots, a_n) = B_{1/n}(x) = \overline{B}_{1/(n+1)}(x)$ for any $x \in Z(a_1, \ldots, a_n)$. Note also that $f\big(Z(a_1, \ldots, a_n)\big) = C \cap I_n(x)$ (again for any $x \in Z(a_1, \ldots, a_n)$), which is a clopen subset of $C$ (recall the definition of $I_n(x)$ from Definition 6.2). Thus these two families of clopen subsets are paired by the function $f$. Since every open subset of $X$ is a union of open balls, i.e. of cylinder sets, and every open subset of $C$ is a union of subsets of the form $C \cap I_n(x)$ (an exercise!), it follows from Theorem 8.5 that $f$ and $f^{-1}$ are continuous.

The proofs of the next two results are easy, and so are left as exercises.

Corollary 8.10. (of Theorem 8.5) Let $X$ be a metric space. $f : X \to \mathbb{R}$ is continuous if and only if $f^{-1}\big((a, b)\big)$ is open for all $a < b$ in $\mathbb{R}$. Equivalently, $f : X \to \mathbb{R}$ is continuous if and only if $\{f < a\}$ and $\{f > a\}$ are open for all $a \in \mathbb{R}$.

Theorem 8.11. Let f : X → Y and g : Y → Z be functions between metric spaces, and let

x0 ∈ X. If f is continuous at x0 , and g is continuous at f (x0 ), then g ◦ f is continuous at

x0 .

9. Limits of functions

The definition we gave a while ago for the limit of a sequence is a special case of a general

notion of limit of a function — after all, a sequence is just a special kind of function. But

sequences are quite special. The definition of the limit of a function is a little bit more

involved. We will need it, in principle, when we talk about differentiation.

Definition 9.1. Let $(X, d)$ and $(Y, \rho)$ be metric spaces, let $E \subseteq X$, let $f : E \to Y$ be a function, let $x_0 \in E'$, and let $y_0 \in Y$. The limit of $f$, as $x$ approaches $x_0$, equals $y_0$ if for every $\varepsilon > 0$ there exists $\delta > 0$ such that for all $x \in E$, if $0 < d(x, x_0) < \delta$ then $\rho\big(f(x), y_0\big) < \varepsilon$. (The final implication can also be expressed as $f\big((E \cap B_\delta(x_0)) \setminus \{x_0\}\big) \subseteq B_\varepsilon(y_0)$.) We write $\lim_{x \to x_0} f(x) = y_0$.

Remark 9.2. Note that $f$ might or might not be defined at $x_0$ (accordingly as $x_0 \in E$ or $x_0 \notin E$). We require $x_0 \in E'$ so that for every $\delta > 0$ there will exist points $x$ satisfying the hypothesis of the implication. Even if $x_0 \in E$, the definition of the limit as $x \to x_0$ never requires that $f$ be evaluated at $x_0$; the value of $f$ at $x_0$ is irrelevant.


Note further, that if we tried to apply this definition to a point x0 that is not a cluster

point of E, then we would find that the definition is satisfied for any point y0 ∈ Y . To avoid

this situation, we only consider limits at cluster points of the domain of the function.

Exercise 9.3. Show that in the situation of Definition 9.1, if the limit exists it is unique.

(Be sure to note explicitly where the hypothesis that $x_0 \in E'$ is used.)

\[ \tilde{f}(x) = \begin{cases} f(x), & \text{if } x \in E \setminus \{x_0\}, \\ y_0, & \text{if } x = x_0. \end{cases} \]

Proof. The proof is left as an exercise.

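Example 9.5. (1) It is easy to show that $\lim_{t \to 0} t \sin(1/t) = 0$. Let $f : \mathbb{R} \to \mathbb{R}$ be given by
\[ f(t) = \begin{cases} t \sin\frac{1}{t}, & \text{if } t \neq 0, \\ 0, & \text{if } t = 0. \end{cases} \]
Then $f$ is continuous at 0.
(2) It is easy to show that $\lim_{t \to 0} \sin(1/t)$ does not exist. Let $c \in \mathbb{R}$, and let $g : \mathbb{R} \to \mathbb{R}$ be given by
\[ g(t) = \begin{cases} \sin\frac{1}{t}, & \text{if } t \neq 0, \\ c, & \text{if } t = 0. \end{cases} \]
Then $g$ is not continuous at 0.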

Remark 9.6. Note that the definition of limit is local — it depends only on the restriction

of f to Br (x0 ), for any r > 0.

10. Sequences in R

Theorem 10.1. Let (an ) and (bn ) be sequences in R. Suppose that an → a, and bn → b.

Then

(1) an + bn → a + b.

(2) an bn → ab.

(3) If $b \neq 0$ then $a_n/b_n \to a/b$ (where at most finitely many terms are not defined).

(4) If an ≤ bn for all n, then a ≤ b.

Proof. These are good exercises, so we will only prove part of the third statement; namely, the case where $a_n = 1$ for all $n$. First, let’s sort out the parenthetical comment. If $b \neq 0$, then $|b| > 0$. By definition of convergence, there is $n_0$ such that $|b_n - b| < |b|$ for all $n \geq n_0$. But then, for all $n \geq n_0$ we have $|b_n| = |b - (b - b_n)| \geq |b| - |b - b_n| > |b| - |b| = 0$. Therefore $b_n \neq 0$ if $n \geq n_0$. The quotient sequence will fail to be defined if the denominator equals zero, but this can only happen for finitely many $n$ (all less than $n_0$).

Now let’s prove that if $b_n \to b \neq 0$, then $1/b_n \to 1/b$. Let $\varepsilon > 0$. Let $n_1$ be such that $|b_n - b| < |b|/2$ whenever $n \geq n_1$. We can improve on the previous paragraph. If $n \geq n_1$ we have that $|b_n| \geq |b| - |b - b_n| > |b| - |b|/2 = |b|/2$. Now let $n_2$ be such that $|b_n - b| < |b|^2 \varepsilon/2$ whenever $n \geq n_2$. Let $n_0 = \max\{n_1, n_2\}$. For $n \geq n_0$ we have
\[ \left| \frac{1}{b_n} - \frac{1}{b} \right| = \left| \frac{b - b_n}{b\, b_n} \right| = |b_n - b| \cdot \frac{1}{|b|} \cdot \frac{1}{|b_n|} < \frac{|b|^2 \varepsilon}{2} \cdot \frac{1}{|b|} \cdot \frac{2}{|b|} = \varepsilon. \]

Therefore 1/bn → 1/b.

Remark 10.2. The first three statements in the theorem mean that the functions + and

· : R2 → R, and ÷ : R × (R \ {0}) → R are continuous.

Remark 10.3. (1) It follows from Theorem 10.1(4) that if an < bn for all n, then a ≤ b.

Note that even with strict inequalities in the hypotheses, the conclusion will in general

only be a weak inequality. This reflects a general principle: limits change strict

inequalities into weak inequalities.

(2) The following well-known lemma also follows from Theorem 10.1(4).

Lemma 10.4. Let (an ) and (bn ) be real sequences, suppose that |an | ≤ |bn |, and suppose that

bn → 0. Then an → 0.

Lemma 10.5. Let $(x_i)$ be a sequence in $\mathbb{R}^n$. We write the $i$th term of the sequence as an $n$-tuple thus: $x_i = (x_{i1}, \ldots, x_{in})$ (cf. Remarks 7.9). If $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$, then $x_i \to a$ if and only if $x_{ij} \to a_j$ (as $i \to \infty$) for all $j = 1, \ldots, n$.

Proof. These follow easily from Remarks 4.12.

We now establish convergence of some special, familiar sequences in R.

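Proposition 10.6. (1) For any $k \in \mathbb{N}$, $1/\sqrt[k]{n} \to 0$ as $n \to \infty$.
(2) For any $0 < a < 1$, $a^n \to 0$ as $n \to \infty$.
(3) $n^{1/n} \to 1$ as $n \to \infty$.
(4) For any $a \in \mathbb{R}$ with $0 < a < 1$, and any $k \in \mathbb{N}$, $a^n n^k \to 0$ as $n \to \infty$.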

Proof. (1) Let $\varepsilon > 0$. Choose $n_0 > 1/\varepsilon^k$. If $n \geq n_0$ then $n^{1/k} \geq n_0^{1/k} > 1/\varepsilon$, and hence $1/\sqrt[k]{n} < \varepsilon$.
(3) It is evident that $n^{1/n} \geq 1$. Let $x_n = n^{1/n} - 1$. Then by property (1) after Definition 1.1, for $n \geq 2$ we have $n = (1 + x_n)^n > \frac{n(n-1)}{2} x_n^2$, and hence $x_n^2 < \frac{2}{n-1}$. It follows (using Lemma 10.4, and (1)) that $x_n \to 0$.
(4) This is very similar to the proof of (2). For that we referred to Remark 1.7. In that remark we saw that if $0 < a < 1$ then there is $c > 0$ such that $a^n < c/n$. Let’s apply this to the number $a^{1/(k+1)}$, which also lies between 0 and 1. Thus there is a positive number $d$ such that $a^{n/(k+1)} < d/n$. Raising both sides to the power $k + 1$ gives that $a^n < d^{k+1}/n^{k+1}$, and hence that $a^n n^k < d^{k+1}/n$. By Theorem 10.1(3) and Lemma 10.4, $a^n n^k \to 0$ as $n \to \infty$.
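The bound used for part (3), namely $x_n^2 < 2/(n-1)$ with $x_n = n^{1/n} - 1$, can be spot-checked in floating point. This is only a numerical illustration over an arbitrarily chosen range of $n$; the names `x_n` and `bound_holds` are hypothetical.

```python
def x_n(n):
    """Return x_n = n^(1/n) - 1, computed in floating point."""
    return n ** (1.0 / n) - 1.0

# Check x_n^2 < 2/(n - 1) for 2 <= n < 2000 (the range is an arbitrary choice).
bound_holds = all(x_n(n) ** 2 < 2.0 / (n - 1) for n in range(2, 2000))

# A single far-out term, to see how small x_n has become.
far_term = x_n(10 ** 6)
```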

Definition 10.7. A sequence (xn ) in R is increasing if xn ≤ xn+1 for all n. It is called

strictly increasing if xn < xn+1 for all n. Decreasing and strictly decreasing sequences are

defined similarly. A sequence is called monotone if any of these terms apply.

Theorem 10.8. An increasing sequence that is bounded above is convergent.

Proof. Let (xn ) be an increasing sequence that is bounded above. Then the set of terms,

{xn : n ∈ N}, has a supremum, c. We claim that xn → c. Let ε > 0. Since the supremum


is an upper bound, we have xn ≤ c < c + ε for all n. Since c − ε < c, c − ε is not an upper

bound, so there exists n0 with c − ε < xn0 . Then for all n ≥ n0 we have c − ε < xn . Thus

we get that c − ε < xn < c + ε whenever n ≥ n0 . Thus xn → c.

Exercise 10.9. A bounded monotone sequence is convergent.

Example 10.10. Does $\sqrt{2 + \sqrt{2 + \sqrt{2 + \cdots}}}$ mean anything? OK, this is phrased as a philosophical question, i.e. it’s a joke. But we can still try to give the expression some kind of sense. For example, we could argue that IF it does represent a real number, call it $x$, then $x$ must satisfy the equation $x = \sqrt{2 + x}$. Then it’s easy to see that $x = 2$. But this is not valid since we haven’t shown that the expression does indeed represent a real number. Some people might try to make sense of it by interpreting it as a sequence: $\bigl(\sqrt{2},\ \sqrt{2 + \sqrt{2}},\ \sqrt{2 + \sqrt{2 + \sqrt{2}}},\ \ldots\bigr)$. They would define the expression to be the limit of this sequence, assuming that the sequence converges. We could argue about whether this is a reasonable definition for the expression, but we can’t argue with the intelligibility of the new problem: does the given sequence converge, and if so, to what? (Other people might argue that the limit of this sequence (if existing) is actually the definition of a different expression; namely, $\cdots + \sqrt{2 + \sqrt{2 + \sqrt{2}}}$.) However we come to study this sequence, it is a nice exercise in induction to prove that it is bounded above and increasing. Therefore it converges. Using the recursive definition of the sequence ($a_{n+1} = \sqrt{2 + a_n}$), and Theorem 10.1, it is easy to prove that the limit is, in fact, 2.
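The recursion in Example 10.10 is easy to try out numerically. The following is a minimal sketch (the name `nested_radicals` is ours, and floating point arithmetic only approximates the real sequence): it generates the first few terms of $a_1 = \sqrt{2}$, $a_{n+1} = \sqrt{2 + a_n}$ and prints the gap to 2.

```python
import math

def nested_radicals(count):
    """First `count` terms of a_1 = sqrt(2), a_{n+1} = sqrt(2 + a_n)."""
    a = math.sqrt(2.0)
    out = [a]
    for _ in range(count - 1):
        a = math.sqrt(2.0 + a)
        out.append(a)
    return out

seq = nested_radicals(12)
for n, a in enumerate(seq, start=1):
    print(n, a, 2.0 - a)   # the gap 2 - a_n stays positive and shrinks
```

The printed terms increase and stay below 2, which is what the induction argument predicts. Of course the computation proves nothing; it only suggests what to prove.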

We next point out that continuity of real-valued functions is preserved by pointwise arith-

metic of functions.

Corollary 10.11. Let $f, g : X \to \mathbb{R}$, and let $a \in X$. If $f$ and $g$ are continuous at $a$, then so are $f + g$, $fg$, and $f/g$ (if $g(a) \ne 0$).

Proof. This follows from Theorems 10.1 and 8.4.

Remark 10.12. The result for limits analogous to the one in Corollary 10.11 holds, as can

be seen by using Lemma 9.4.

Definition 10.13. If $f : X \to \mathbb{R}^n$ we define the coordinate functions of $f$ by $f_i = \pi_i \circ f : X \to \mathbb{R}$ (recall the coordinate projections $\pi_i$ from Example 8.3 (6)). We can then write $f(x) = \bigl(f_1(x), \ldots, f_n(x)\bigr)$.

Corollary 10.14. Let f : X → Rn . Then f is continuous if and only if all fi are continuous.

Proof. (=⇒): Use Theorem 8.11 and Example 8.3 (6).

(⇐=): Use Remark 4.12 and Theorem 8.4.

11. Limsup and liminf

In spite of the wonderful theorem about monotone sequences from the last section, most

sequences (even bounded ones) diverge. However, there is still information to be gotten from

a divergent sequence.

Let $(a_n)$ be a bounded sequence in $\mathbb{R}$, say $L \le a_n \le M$ for all $n$. Then for each $n$, $\sup\{a_k : k \ge n\} = \sup\{a_n, a_{n+1}, a_{n+2}, \ldots\}$ exists, since the tails of $(a_n)$ are all bounded above (by $M$). (We will usually use the shorthand $\sup_{k \ge n} a_k$ for this sup of the $n$th tail of the sequence $(a_n)$.) Notice that $\{a_k : k \ge n\} \supseteq \{a_k : k \ge n + 1\}$, and hence that $\sup_{k \ge n} a_k \ge \sup_{k \ge n+1} a_k$. Of course, since $L \le a_k$ for all $k$, we also have $L \le \sup_{k \ge n} a_k$ for all $n$. Therefore the sequence of suprema of tails, $\bigl(\sup_{k \ge n} a_k\bigr)_{n=1}^{\infty}$, is decreasing and bounded below, and hence converges.

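Definition 11.1. Let $(a_n)$ be a bounded sequence in $\mathbb{R}$. The limit superior, or limsup, of $(a_n)$ is the real number
$$\limsup_{n \to \infty} a_n = \lim_{n \to \infty} \sup_{k \ge n} a_k.$$
Similarly, the limit inferior, or liminf, of $(a_n)$ is the real number
$$\liminf_{n \to \infty} a_n = \lim_{n \to \infty} \inf_{k \ge n} a_k.$$
The justification is the opposite of the above: the sequence $\bigl(\inf_{k \ge n} a_k\bigr)_{n=1}^{\infty}$ is increasing and bounded, so it has a limit.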

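Theorem 11.2. Let $(a_n)$ be a bounded sequence in $\mathbb{R}$.

(1) $\liminf_{n \to \infty} a_n \le \limsup_{n \to \infty} a_n$.

(2) $(a_n)$ converges if and only if $\liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n$, and in this case,
$$\lim_{n \to \infty} a_n = \liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n.$$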
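To see the suprema and infima of tails in action, here is a small sketch for the sample sequence $a_n = (-1)^n (1 + 1/n)$, whose limsup is 1 and whose liminf is $-1$. The sequence, the truncation length `N` and the helper names are all assumptions made for the illustration. A computer can only look at a finite piece of a tail, so `tail_sup(n)` merely stands in for $\sup_{k \ge n} a_k$; for this particular sequence the two agree as long as $n < N$.

```python
def a(n):
    """Example sequence a_n = (-1)^n (1 + 1/n); limsup = 1, liminf = -1."""
    return (-1) ** n * (1.0 + 1.0 / n)

N = 2000                                  # truncation length (an assumption)
vals = [a(n) for n in range(1, N + 1)]

def tail_sup(n):
    return max(vals[n - 1:])              # stands in for sup_{k >= n} a_k

def tail_inf(n):
    return min(vals[n - 1:])              # stands in for inf_{k >= n} a_k

for n in (1, 10, 100, 1000):
    print(n, tail_sup(n), tail_inf(n))
```

The first column of values decreases toward 1 and the second increases toward $-1$, matching the monotonicity used to justify Definition 11.1. Since the two limits differ, Theorem 11.2 says that this sequence diverges.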

The following theorem is usually referred to as the Bolzano-Weierstrass theorem. It is true

in Rn as well. In fact, we will use this property as a definition later (Definition 14.17). The

proof of the Bolzano-Weierstrass theorem in Rn will be given then.

Theorem 11.3. Let (an ) be a bounded sequence in R. Then (an ) has a convergent subse-

quence.

Proof. Let $c = \limsup a_n$. Let $b_n = \sup_{j \ge n} a_j$, so that $b_1 \ge b_2 \ge \cdots$ and $\lim b_n = c$. We will use $(b_n)$ to recursively define a subsequence of $(a_n)$ that converges to $c$. Choose $m_1$ with $b_{m_1} < c + 1$. Then choose $n_1 \ge m_1$ with $a_{n_1} > b_{m_1} - 1$. Then
$$c - 1 \le b_{m_1} - 1 < a_{n_1} \le b_{m_1} < c + 1,$$
so that $|a_{n_1} - c| < 1$. (Exercise: make sure you can explain each of the above inequalities.) Recursively, having chosen $1 \le n_1 < n_2 < \cdots < n_{k-1}$ with $|a_{n_i} - c| < \frac{1}{i}$ for $i = 1, \ldots, k - 1$, choose $m_k > n_{k-1}$ so that $b_{m_k} < c + \frac{1}{k}$. Then choose $n_k \ge m_k$ with $b_{m_k} - \frac{1}{k} < a_{n_k}$. Then we have
$$c - \frac{1}{k} \le b_{m_k} - \frac{1}{k} < a_{n_k} \le b_{m_k} < c + \frac{1}{k},$$
and hence $|a_{n_k} - c| < \frac{1}{k}$. Therefore we have defined $1 \le n_1 < n_2 < \cdots$ so that $|a_{n_k} - c| < \frac{1}{k}$ for all $k$. Thus the subsequence $(a_{n_k})_{k=1}^{\infty}$ converges to $c$.
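The recursive choice of indices in this proof can be imitated on a computer for a concrete sequence. The sketch below is a simplified variant, not the construction of the proof itself: it skips the auxiliary sequence $(b_n)$ and searches directly for the next index $n_k > n_{k-1}$ with $|a_{n_k} - c| < 1/k$, which the proof guarantees exists. The sample sequence, the known value $c = 1$ of its limsup, and the safety cap `search_limit` are assumptions.

```python
def a(n):
    """Sample bounded sequence with limsup equal to 1."""
    return (-1) ** n * (1.0 + 1.0 / n)

def subsequence_indices(c, how_many, search_limit=10**6):
    """Return n_1 < n_2 < ... with |a(n_k) - c| < 1/k for each k."""
    idx, n = [], 0
    for k in range(1, how_many + 1):
        n += 1                                 # force n_k > n_{k-1}
        while abs(a(n) - c) >= 1.0 / k:
            n += 1
            if n > search_limit:
                raise RuntimeError("no suitable index found within search_limit")
        idx.append(n)
    return idx

idx = subsequence_indices(c=1.0, how_many=8)
print(idx)
print([a(n) for n in idx])
```

For this sequence the search simply picks out even indices, and the selected terms approach 1.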

Remark 11.4. As a corollary to the proof, we see that every bounded sequence in R has a

subsequence converging to the limsup of the sequence. An analogous argument shows that

there is a(nother) subsequence converging to the liminf of the sequence.

Exercise 11.5. Let (an ) be a bounded sequence. Let E be the set of subsequential limits;

that is, E = {x ∈ R : there is a subsequence of (an ) converging to x}. Then lim inf an =

min(E) and lim sup an = max(E).


Definition 11.6. We introduce here some standard terminology regarding sequences, re-

flecting the idea that it is only the “ultimate” behavior of a sequence that is of interest (cf.

Remark 7.12). Our phrasing is very general, hence vague, but expresses a useful notion that

is easy to understand once you see the idea. Let (an ) be a sequence, and let P be some

property that the terms of the sequence might have. We say that P holds eventually if P

holds for all terms in some tail of the sequence; in other words, if there exists n0 such that

P (an ) is true for all n ≥ n0 . We say that P holds frequently if every tail of the sequence

contains a term for which P holds; in other words, if for all n0 there exists n ≥ n0 such that

P (an ) is true.

For example, you can check your understanding of these terms by working through the

following statements.

(1) (an ) converges to c if and only if for every ε > 0, an ∈ Bε (c) eventually.

(2) (an ) has a subsequence converging to c if and only if for every ε > 0, an ∈ Bε (c)

frequently.

Exercise 11.7. Let (an ) be a bounded real sequence, and let x ∈ R. Prove the following:

1. x < lim sup an =⇒ x < an frequently =⇒ x ≤ lim sup an

2. x > lim sup an =⇒ x > an eventually =⇒ x ≥ lim sup an

3. x < lim inf an =⇒ x < an eventually =⇒ x ≤ lim inf an

4. x > lim inf an =⇒ x > an frequently =⇒ x ≥ lim inf an

(The exercise is not only to prove the eight implications, but also to show that none of these

implications can be reversed.)

12. Infinite limits and limits at infinity

There are innumerable ways in which a limit can fail to exist. One of these is “regular”

enough to warrant special notation: divergence to (±) infinity.

Definition 12.1. Let X be a metric space, let x0 ∈ X′, and let f : X → R. We say that

f diverges to infinity as x approaches x0 if for every M ∈ R there exists δ > 0 such that

for all x ∈ X, if 0 < d(x, x0 ) < δ then f (x) > M . We write limx→x0 f (x) = ∞ in this case.

Similarly, we say that f diverges to minus infinity as x approaches x0 if for every M ∈ R

there exists δ > 0 such that for all x ∈ X, if 0 < d(x, x0 ) < δ then f (x) < M . We write

limx→x0 f (x) = −∞ in this case. An analogous definition is used for sequences.

Remark 12.2. It is important always to remember that ∞ and −∞ are not real numbers.

However, a limited portion of the arithmetic of real numbers can be usefully extended to

include these two symbols. The conventions are as follows.

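• For $x \in \mathbb{R}$, $x \pm \infty = \pm\infty$.

• For $x \in \mathbb{R}$ with $x \ne 0$, $x \cdot (\pm\infty) = \pm \operatorname{sgn}(x) \cdot \infty$.

• For $x \in \mathbb{R}$, $x / (\pm\infty) = 0$.

• $\infty + \infty = \infty$, $\infty \cdot \infty = \infty$.

On the other hand, certain combinations are expressly forbidden, under pain of writing nonsense:
$$\infty - \infty, \qquad \frac{\infty}{\infty}, \qquad 0 \cdot \infty$$
are not defined.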
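As an aside that is not part of the notes proper, floating point arithmetic on a computer adopts very similar conventions, and playing with it is a harmless way to remember the rules of Remark 12.2. The sketch below uses Python's `math.inf`; the sample value `x = -3.0` is arbitrary.

```python
import math

inf = math.inf
x = -3.0

print(x + inf, x - inf)       # x + infinity and x - infinity
print(x * inf)                # the sign follows sgn(x)
print(x / inf)                # zero (floating point prints a signed zero here)
print(inf + inf, inf * inf)

# The forbidden combinations come out as NaN ("not a number").
for bad in (inf - inf, inf / inf, 0.0 * inf):
    print(bad, math.isnan(bad))
```

The analogy is only a mnemonic: in these notes $\infty$ and $-\infty$ remain symbols, not numbers.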


With the above definition and remarks in mind, we can extend the arithmetic of limits

from Corollary 10.11 and Remark 10.12 to include infinite limits (and, of course, limits of

sequences as well as of functions). By this we mean that the limit of the sum/difference/pro-

duct/quotient of two functions equals the sum/difference/product/quotient of the two limits,

IF that arithmetic combination of the limits is permissible. We leave it as an exercise to

write a precise theorem and its proof.

A different use of the symbols ±∞ is in the description of limits at infinity.

Definition 12.3. Let X be a metric space, let f : R → X, and let x0 ∈ X. We write

limt→∞ f (t) = x0 if for every ε > 0 there exists M ∈ R such that for all t ∈ R with t ≥ M

we have d(f(t), x0) < ε. There is a similar definition for limits at minus infinity.

Remark 12.4. We mention that in this context, the symbols ∞ and −∞ merely indicate

“directions”, and are not to be thought of as “numbers” in any way.

13. Cauchy sequences and complete metric spaces

It may not have seemed important at the time, but the definition of convergence for

a sequence has an unfortunate limitation. Namely, in order to check the definition, it is

necessary to have the limit in hand. In order to use sequences as a tool to study spaces,

it would be very helpful to be able to give an internal characterization of convergence, one

that doesn’t refer to the limit itself. This motivation is not possible to carry out in general,

but the idea that came from it is very important.

Definition 13.1. Let (X, d) be a metric space. A sequence (xn ) in X is Cauchy if for every

positive real number ε, there exists n0 ∈ N such that for all m, n ≥ n0 we have d(xm , xn ) < ε.

Informally, we say that the sequence is Cauchy if its terms can be made close to each other

merely by requiring them to be far enough out in the sequence. It is an exercise in the logic

of quantifiers to convince yourself that the definition captures precisely the idea behind this

informal statement.

The following lemma provides many examples of Cauchy sequences.

Lemma 13.2. A convergent sequence is Cauchy.

Proof. Let (xn ) be convergent, with limit x. Let ε > 0 be given. By the definition of

convergence there is n0 such that for all n ≥ n0 , d(xn , x) < ε/2. Then if m, n ≥ n0 we have

d(xm , xn ) ≤ d(xm , x) + d(x, xn ) < ε/2 + ε/2 = ε. Therefore (xn ) is Cauchy.

Example 13.3. Here is an example of a non-Cauchy sequence in $\mathbb{R}$: let $x_n = \sqrt{n}$. (Exercise: prove that it’s not Cauchy.) But successive terms do get close to each other:
$$|x_n - x_{n+1}| = \frac{1}{\sqrt{n} + \sqrt{n+1}} < \frac{2}{\sqrt{n}}.$$
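A few numbers make the contrast in Example 13.3 concrete. In the sketch below (the function name `x` and the choice of comparing the terms with indices $n$ and $4n$ are ours), the gap between successive terms shrinks, while the gap between $x_n$ and $x_{4n}$ equals $\sqrt{n}$ and so grows without bound.

```python
import math

def x(n):
    return math.sqrt(n)

for n in (10, 1000, 100000):
    step = abs(x(n) - x(n + 1))      # successive terms: tends to 0
    spread = abs(x(4 * n) - x(n))    # equals sqrt(n): grows without bound
    print(n, step, spread)
```

So closeness of neighbors is much weaker than the Cauchy condition, which demands closeness of all pairs of terms far out in the sequence.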

Example 13.4. Here is an example of a Cauchy sequence that does not converge. Let

X = (0, 1) with the usual metric gotten from R. The sequence (1/n) in X is Cauchy but

not convergent. (Remember the definition of convergence (Definition 7.5): the limit has to

belong to the metric space.)

Example 13.5. Here is a more interesting example of a non-convergent Cauchy sequence.

Let V be the vector space of all finite real sequences:

$$V = \bigl\{(x_1, x_2, \ldots) : x_i \in \mathbb{R},\ \text{there exists } i_0 \text{ such that for all } i > i_0,\ x_i = 0\bigr\}.$$
We define a norm on $V$ by $\|x\| = \bigl(\sum_{i=1}^{\infty} x_i^2\bigr)^{1/2}$ (note that the sum is actually finite). It’s easy to see that this is a norm: the properties defining a norm only involve finitely many vectors at a time, and then the required property actually occurs in some Euclidean space, where we already know the properties hold. Now, let
$$v_n = \Bigl(\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots, \frac{1}{2^n}, 0, 0, 0, \ldots\Bigr) \in V.$$
If $m < n$, we have
$$\|v_m - v_n\|^2 = \Bigl\|\Bigl(0, 0, \ldots, 0, \frac{1}{2^{m+1}}, \ldots, \frac{1}{2^n}, 0, 0, \ldots\Bigr)\Bigr\|^2 = \sum_{i=m+1}^{n} \Bigl(\frac{1}{2^i}\Bigr)^2 = \frac{1}{4^{m+1}} \sum_{i=0}^{n-m-1} \frac{1}{4^i} < \frac{1}{4^m}.$$
Thus $(v_n)$ is Cauchy in $V$. But we claim that $(v_n)$ does not converge. To prove this, let $y = (y_n)$ be an arbitrary vector in $V$. There is $k$ such that $y_i = 0$ for $i > k$. For $n > k$,
$$\|y - v_n\|^2 = \sum_{i=1}^{\infty} (y_i - v_{ni})^2 = \sum_{i=1}^{n} \Bigl(y_i - \frac{1}{2^i}\Bigr)^2 \ge \Bigl(y_{k+1} - \frac{1}{2^{k+1}}\Bigr)^2 = \frac{1}{4^{k+1}}.$$
Thus $d(v_n, y) \ge 2^{-(k+1)}$ for all $n > k$. Therefore $v_n \not\to y$.
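Both estimates in Example 13.5 can be checked with exact rational arithmetic. The following sketch stores a finitely supported sequence as a Python list without its trailing zeros; the helper names `v` and `dist_sq`, and the sample vector `y`, are assumptions made for the illustration.

```python
from fractions import Fraction

def v(n):
    """v_n = (1/2, 1/4, ..., 1/2^n), stored without the trailing zeros."""
    return [Fraction(1, 2 ** i) for i in range(1, n + 1)]

def dist_sq(x, y):
    """Squared norm of x - y for finitely supported sequences."""
    length = max(len(x), len(y))
    x = x + [Fraction(0)] * (length - len(x))
    y = y + [Fraction(0)] * (length - len(y))
    return sum((s - t) ** 2 for s, t in zip(x, y))

# Cauchy estimate: ||v_m - v_n||^2 < 1/4^m for m < n.
for m, n in [(1, 5), (3, 10), (6, 20)]:
    print(m, n, dist_sq(v(m), v(n)) < Fraction(1, 4 ** m))

# Non-convergence: a sample y supported on k = 2 slots stays at squared
# distance at least 1/4^(k+1) from every v_n with n > k.
y = [Fraction(1, 2), Fraction(1, 4)]
print(min(dist_sq(y, v(n)) for n in range(3, 30)) >= Fraction(1, 4 ** 3))
```

Any other finitely supported `y` behaves the same way, with the lower bound depending on where its support ends. That dependence on $y$ is exactly why no single $y$ can be the limit.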

Definition 13.6. A metric space is called complete if every Cauchy sequence converges.

Theorem 13.7. Rn is complete.

We will give the proof after a couple of lemmas about Cauchy sequences in general metric

spaces.

Lemma 13.8. A Cauchy sequence is bounded.

Proof. Let $(a_n)$ be a Cauchy sequence. Then there is $L$ such that $d(a_m, a_n) < 1$ for all $m, n \ge L$. Let $R = \max\{d(a_1, a_L), \ldots, d(a_{L-1}, a_L)\} + 2$. Then $d(a_n, a_L) < R$ for all $n$, and hence $(a_n)$ is bounded.

Lemma 13.9. A Cauchy sequence having a convergent subsequence is convergent.

Proof. Let $(a_n)$ be a Cauchy sequence, and let $(a_{n_i})$ be a convergent subsequence, with limit $c$. We claim $a_n \to c$. Let $\varepsilon > 0$. Since $(a_n)$ is Cauchy there is $L$ such that $d(a_m, a_n) < \varepsilon/2$ for all $m, n \ge L$. By the definition of convergence, there is $i_0$ such that $d(a_{n_i}, c) < \varepsilon/2$ for all $i \ge i_0$. Let $i_1 \ge i_0$ be such that $n_{i_1} \ge L$. Then for any $n \ge n_{i_1}$ we have
$$d(a_n, c) \le d(a_n, a_{n_{i_1}}) + d(a_{n_{i_1}}, c) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Hence $a_n \to c$.

Proof. (of Theorem 13.7) We first show that R is complete. Let (an ) be a Cauchy sequence

in R. By Lemma 13.8 we know that (an ) is bounded. By Theorem 11.3 we know that (an )

has a convergent subsequence. Then by Lemma 13.9 we know that (an ) converges. Thus R

is complete. Now it follows easily from Remark 4.12 that Rn is complete (the details are left

as an exercise).

Exercise 13.10. A closed subset of a complete metric space is complete.


Exercise 13.11. Let (X, d) be a metric space. Recall the diameter of a subset of X from

Exercise 5.19.

(1) Suppose that $X$ is complete. Prove that for every decreasing sequence
$$F_1 \supseteq F_2 \supseteq \cdots$$
of nonempty closed subsets of $X$ with $\lim_{n \to \infty} \operatorname{diam}(F_n) = 0$, there exists an element $a \in X$ such that
$$\bigcap_{n=1}^{\infty} F_n = \{a\}.$$

(2) (converse of part (1)) Suppose that whenever $F_1, F_2, \ldots$ are nonempty closed subsets of $X$ such that $F_1 \supseteq F_2 \supseteq \cdots$ and $\lim_{n \to \infty} \operatorname{diam}(F_n) = 0$, then $\bigcap_{n=1}^{\infty} F_n \ne \emptyset$. Prove that $X$ is a complete metric space.

14. Compactness

Compactness is probably the most important concept in analysis. It can be described in

various ways. The “right” way is not necessarily the easiest to understand. Before we give

the definition, here is some motivation for why it is reasonable. The basic problem that

compactness addresses is the transition from local information to global information. That

may sound cryptic, and it is meant to be a catchy phrase that will become more intelligible

as you get more used to these ideas. But it isn’t hard to see what it is about. Local (near a

point) means in an open ball centered at that point. Here is a simple example of using this

terminology. If a function is continuous at a point, then it is bounded in some open ball

centered at that point. Thus if a function is continuous on a set, it is bounded locally on

that set: each point in the set has a neighborhood on which the function is bounded. On

the other hand, global (on a set) means on the whole set. A function is “globally bounded”

if it is bounded on its domain, i.e. if it is a bounded function. Is every continuous function

bounded? Of course not! For example, a non-constant polynomial on R is continuous, but

not bounded. Local boundedness does not generally imply global boundedness. However if

the domain of the polynomial is taken to be a closed bounded interval, then the extreme

value theorem from calculus implies that the polynomial is bounded on the interval. The

great insight was that it is a property of the domain that lets us pass from local boundedness

to global boundedness, and this property is called compactness.

Now, recall what the word local means: in a neighborhood of a point. A property holds

locally on a set if for each point, there is an open ball centered at the point such that the

property holds in that ball. If the set is infinite, this will give an infinite collection of open

balls, one for each point. We could obtain the property globally if we had a finite collection

of balls instead of an infinite collection. Compactness of the set means that we can always

reduce to a finite collection.

You might notice that a lot of mathematics seems to proceed in this way: what would we

like to have? Let’s give a name to the situation where we have what we want. Now let’s

analyze the situation to see what exactly we were asking for. In fact, compactness can be

described in a variety of ways that seem very different. That means that we can prove that a

space is compact using an easy description. Then we can use compactness via a complicated

description.

OK, with that as motivation, here is the precise definition.


Definition 14.1. Let X be a set. A cover of X is a collection of sets whose union contains

X. If U is a cover of X, a subcover of U is a subcollection of U that is also a cover of X.

Example 14.2. (1) The set of all open intervals is a cover of $\mathbb{R}$.

(2) $\{(a, b) : a < b,\ a, b \in \mathbb{Z}\}$ is a subcover of example (1).

Definition 14.3. Let X be a metric space, and let E ⊆ X. An open cover of E is a cover

of E whose elements are open subsets of X.

Definition 14.4. Let X be a metric space, and let E ⊆ X. E is compact if every open

cover of E has a finite subcover.

Example 14.5. (1) Example 14.2(1) is an open cover of R having a finite subcover.

(2) Example 14.2(2) is an open cover of R not having a finite subcover. In particular, it

follows that R is not compact.

Example 14.6. (1) Finite sets are compact.

(2) {0, 1, 1/2, 1/3, . . .} is a compact subset of R.

(3) [0, 1] is a compact subset of R (this is a special case of Corollary 14.30).

Proof. Let $\mathcal{U}$ be an open cover of $[0, 1]$. Let $E = \{x \in [0, 1] : [0, x] \text{ is finitely covered by } \mathcal{U}\}$. Note that $0 \in E$, so $E \ne \emptyset$. Let $c = \sup E$. Then $c \in [0, 1]$. We first claim that $c \in E$. To see this, choose $U_0 \in \mathcal{U}$ with $c \in U_0$. Then there exists $r > 0$ such that $(c - r, c + r) \subseteq U_0$. By the definition of supremum, there is $y \in E$ with $y > c - r$. By definition of $E$ there is a finite subcollection $\mathcal{V} \subseteq \mathcal{U}$ with $[0, y] \subseteq \bigcup \mathcal{V}$. But then $\mathcal{V} \cup \{U_0\}$ is a finite subcollection of $\mathcal{U}$ covering $[0, c]$, proving that $c \in E$. Now we note that, in fact, $\mathcal{V} \cup \{U_0\}$ covers $[0, a]$ for any number $a$ between $c$ and $c + r$. Thus if $c < 1$ we could find a larger element of $E$ than $c$, contradicting its status as supremum. So we have shown that $c = 1$. Thus $[0, 1]$ is finitely covered by $\mathcal{U}$.

(4) $[0, 1)$ is not compact.

Proof. $\bigl\{(-1, 1 - \frac{1}{n}) : n \in \mathbb{N}\bigr\}$ is an open cover not having a finite subcover.
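The reason the cover in Example 14.6(4) has no finite subcover is that the intervals $(-1, 1 - \frac{1}{n})$ are nested, so any finite subfamily has the same union as its largest member. The sketch below turns this into a tiny computation; the function names and the sample subfamily are assumptions.

```python
def uncovered_point(indices):
    """Given finitely many indices n, return a point of [0, 1) that lies in
    none of the intervals (-1, 1 - 1/n) for n in `indices`."""
    biggest = max(indices)          # the union is just (-1, 1 - 1/biggest)
    return 1.0 - 1.0 / (biggest + 1)

def covered(x, indices):
    return any(-1.0 < x < 1.0 - 1.0 / n for n in indices)

family = [2, 5, 17, 400]            # an arbitrary finite subfamily (illustrative)
p = uncovered_point(family)
print(p, 0.0 <= p < 1.0, covered(p, family))
```

Whatever finite subfamily is proposed, the function produces a point of $[0, 1)$ that it misses, although the full infinite family does cover that point.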

Definition 14.7. A metric space X is compact if X is a compact subset of itself.

By now our waffling use of the qualifier “subset” after the word “compact” may be causing

some trauma. We will remedy this now, but first we need the important notion of relatively

open set.

Definition 14.8. Let X be a metric space. Recall that a subset E ⊆ X is also a metric

space (cf Definition 4.13). A subset of E is called relatively open (in E) if it is an open

subset of the metric space E.

Example 14.9. (1) Let X = R, and let E = [0, 1] ⊆ X. Then [0, 1/2) is relatively open

in E, but not open in X.

(2) Let X = R2 , and let E = R × {0} ⊆ X (we think of E as being the x-axis in R2 ).

Then (0, 1) × {0} is just the usual open unit interval in the x-axis — it is relatively

open in E, but is not open in X.

Lemma 14.10. Let X be a metric space, and let E ⊆ X. For U ⊆ E, U is relatively open

in E if and only if there exists an open subset V of X such that U = E ∩ V .

Proof. We will use a superscript $E$ to distinguish open balls in the metric space $E$ from open balls in $X$. For $a \in E$ and $r > 0$ we see that
$$B_r^E(a) = \{x \in E : d(x, a) < r\} = \{x \in X : d(x, a) < r\} \cap E = B_r(a) \cap E.$$
Thus $U$ is relatively open in $E$ if and only if for every $x \in U$ there exists $r(x) > 0$ such that $B_{r(x)}(x) \cap E \subseteq U$. In this case, we have that $U = \bigl(\bigcup_{x \in U} B_{r(x)}(x)\bigr) \cap E$, and we may use the set in parentheses for $V$. Conversely, suppose that $U = V \cap E$ for some open set $V$ of $X$. Then for a point $x \in U$ there is $r > 0$ such that $B_r(x) \subseteq V$. Then $B_r^E(x) = B_r(x) \cap E \subseteq V \cap E = U$, so we have that $U$ is relatively open in $E$.

Proposition 14.11. Let X be a metric space, and let E ⊆ X. E is a compact subset of X

if and only if E is a compact metric space.

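Proof. Suppose that $E$ is a compact subset of $X$. Let $\mathcal{U}$ be an open cover of (the metric space) $E$. By Lemma 14.10, for each $U \in \mathcal{U}$ there is an open set $V_U \subseteq X$ such that $U = V_U \cap E$. Then
$$E = \bigcup_{U \in \mathcal{U}} U = \bigcup_{U \in \mathcal{U}} (V_U \cap E) = \Bigl(\bigcup_{U \in \mathcal{U}} V_U\Bigr) \cap E,$$
and hence $\{V_U : U \in \mathcal{U}\}$ is an open cover of $E$ in $X$. By hypothesis this open cover has a finite subcover. Thus there are $U_1, \ldots, U_k \in \mathcal{U}$ such that $E \subseteq V_{U_1} \cup \cdots \cup V_{U_k}$. Hence $E \subseteq U_1 \cup \cdots \cup U_k$ (since $U_i = V_{U_i} \cap E$), so that $\mathcal{U}$ has a finite subcover. Therefore the metric space $E$ is compact.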

The converse is left as an exercise.

Thus compactness is an intrinsic property of a metric space, that cannot be lost when the

space is realized as a subspace of another metric space (in contrast to openness, which does

depend on the ambient metric space, as seen in Example 14.9). We now develop the chief

properties of compactness.

Proposition 14.12. A closed subset of a compact space is compact.

Proof. Let $X$ be a compact metric space, and let $E \subseteq X$ be a closed subset. Let $\mathcal{U}$ be an open cover of $E$. Since $E$ is closed, $E^c$ is open. Then $\mathcal{U} \cup \{E^c\}$ is an open cover of $X$. Since $X$ is compact, this open cover has a finite subcover. The subcover consists of finitely many sets from $\mathcal{U}$, possibly together with $E^c$. But then the sets from $\mathcal{U}$ must cover $E$, so that $\mathcal{U}$ has a finite subcover (of $E$). Therefore $E$ is compact.

Exercise 14.13. It is a nice exercise to prove a sort of converse to this. Namely, a compact

subset of a metric space is closed. We won’t do it here, as this fact will follow from a later

result (Corollary 14.20).

Proposition 14.14. A compact subset of a metric space is bounded.

Proof. Let $E$ be a compact subset of the metric space $X$. Choose any point $x_0 \in X$. Then $\{B_n(x_0) : n = 1, 2, 3, \ldots\}$ is an open cover of $X$, hence also of $E$. Since $E$ is compact, there is a finite subcover. But since the open balls increase with $n$, this means that there is $n$ such that $E \subseteq B_n(x_0)$. Thus $E$ is bounded.

Of course, the converse of Proposition 14.14 is false.

Theorem 14.15. (Finite Intersection Property, or FIP) Let $X$ be a compact metric space. Let $\{E_i\}_{i \in I}$ be a collection of nonempty closed subsets of $X$. Suppose that every finite subcollection has nonempty intersection: for all $k \in \mathbb{N}$, for all $i_1, \ldots, i_k \in I$, we have $E_{i_1} \cap \cdots \cap E_{i_k} \ne \emptyset$. Then $\bigcap_{i \in I} E_i \ne \emptyset$.

Proof. Suppose not. Then taking complements we have $\bigcup_{i \in I} E_i^c = X$. This means that $\{E_i^c : i \in I\}$ is an open cover of $X$. Since $X$ is compact there are $i_1, \ldots, i_k \in I$ with $E_{i_1}^c \cup \cdots \cup E_{i_k}^c = X$. But then by complements again, we get that $E_{i_1} \cap \cdots \cap E_{i_k} = \emptyset$, a contradiction.

Example 14.16. The theorem may fail if the sets are not closed: consider $\{(0, 1/n) : n \in \mathbb{N}\}$. This does have the FIP, but the intersection is empty.

Definition 14.17. A metric space X is sequentially compact if every sequence in X has a

convergent subsequence (convergent in X, of course).

Example 14.18. [a, b] is sequentially compact by Theorem 11.3, and the fact that [a, b] is

closed.

Theorem 14.19. A compact metric space is sequentially compact.

Corollary 14.20. A compact subset of a metric space is closed.

The proof of the theorem will be made easier by the following preliminary “computation.”

Lemma 14.21. Let (xn ) be a sequence in a metric space, and let y be a point. Then (xn )

has a subsequence converging to y if and only if for every ε > 0 and for every m ∈ N, there

exists n ≥ m such that d(xn , y) < ε.

Proof. (⇒): Suppose $\lim_{i \to \infty} x_{n_i} = y$. Let $\varepsilon > 0$ and $m \in \mathbb{N}$. By the hypothesized convergence there is $i_0$ such that $d(x_{n_i}, y) < \varepsilon$ whenever $i \ge i_0$. Since $n_i \to \infty$ as $i \to \infty$ there exists $j \ge i_0$ such that $n_j \ge m$. Then $d(x_{n_j}, y) < \varepsilon$. So $n_j$ is the desired ‘$n$’.

(⇐): Suppose the condition in the statement holds. We apply it repeatedly. First choose $n_1$ such that $d(x_{n_1}, y) < 1$. Then choose $n_2 > n_1$ such that $d(x_{n_2}, y) < 1/2$. Continuing this way we construct a subsequence $(x_{n_i})_{i=1}^{\infty}$ such that $d(x_{n_i}, y) < 1/i$ for all $i$. Evidently $x_{n_i} \to y$ as $i \to \infty$.

Proof. (of Theorem 14.19) We will prove the contrapositive of the statement in the theorem. So suppose that $X$ is not sequentially compact. Then there is a sequence $(x_n)$ having no convergent subsequence. Thus for all $y \in X$, $(x_n)$ does not have a subsequence converging to $y$. Negating the condition in Lemma 14.21, we find that for all $y \in X$ there exists $\varepsilon_y > 0$ and there exists $n_y \in \mathbb{N}$ such that for all $n \ge n_y$, $d(x_n, y) \ge \varepsilon_y$. Let $\mathcal{U} = \{B_{\varepsilon_y}(y) : y \in X\}$. $\mathcal{U}$ is obviously an open cover of $X$. But if $y_1, \ldots, y_k \in X$ are any finite collection of points, choose $n > \max\{n_{y_1}, \ldots, n_{y_k}\}$. Then $d(x_n, y_i) \ge \varepsilon_{y_i}$ for $i = 1, \ldots, k$. Hence $x_n \notin \bigcup_{i=1}^{k} B_{\varepsilon_{y_i}}(y_i)$. Thus $\mathcal{U}$ has no finite subcover. Therefore $X$ is not compact.

Proposition 14.22. A sequentially compact metric space is complete.

Proof. This follows from Lemma 13.9.

Exercise 14.23. A metric space X is sequentially compact if and only if every infinite subset

of X has a cluster point.

We now turn to the role of boundedness for compact metric spaces. By way of introduction,

we mention that the most famous result about compact metric spaces is the Heine-Borel

theorem: a subset of Rn is compact if and only if it is closed and bounded. We will prove this

later, but now we want to point out that this result is special to Rn — it is NOT true in

arbitrary metric spaces. The reason is that Rn is (duh!) finite dimensional. This may not


seem so special now, but many of the most important metric spaces in analysis are infinite

dimensional, and you will surely run into them (maybe not today, maybe not tomorrow,

but...yeah, yeah.)

Here is a simple part of the Heine-Borel theorem that we have essentially proved already.

For E ⊆ R, if E is bounded then every sequence in E has a convergent subsequence. If E is

both closed and bounded, then the limit of the convergent subsequence must belong to E.

Thus we see that for subsets of R, closed and bounded imply sequentially compact.

Here are two examples to show that for general metric spaces, boundedness is too weak a

notion. The first is simple-minded, but the second is more interesting.

Example 14.24. (1) Let X be an infinite set with the discrete metric (Example 4.17).

Then X is bounded, but not sequentially compact.

(2) Let $V$ be the normed space of finite real sequences (Example 13.5). Then the closed unit ball $\overline{B}_1(0)$ is closed and bounded, but not sequentially compact.

In fact, the situation is worse than might be realized if you just think about the

non-convergent Cauchy sequence from Example 13.5. Consider the sequence (en )

in V , where en = (0, 0, . . . , 0, 1, 0, 0, . . .) (with 1 in the nth slot). This sequence is

contained in the unit ball of V , but does not even have a Cauchy subsequence.
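The claim about the sequence $(e_n)$ comes down to one computation: any two distinct terms are at distance $\sqrt{2}$ from each other, so no subsequence can be Cauchy. The sketch below checks this for the first few terms; truncating each vector to `N` slots and the helper names `e` and `dist` are assumptions made so that the vectors fit in a list.

```python
import math

def e(n, length):
    """The n-th standard unit vector, truncated to `length` slots."""
    return [1.0 if i == n else 0.0 for i in range(1, length + 1)]

def dist(x, y):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(x, y)))

N = 6
for m in range(1, N + 1):
    print([round(dist(e(m, N), e(n, N)), 4) for n in range(1, N + 1)])
```

The printed table has zeros on the diagonal and the same value, about 1.4142, everywhere else. Each $e_n$ has norm 1, so the whole sequence sits in the unit ball.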

These examples show that the problem with boundedness is that a huge space can hide

inside a bounded set. The correct definition is the following.

Definition 14.25. A subset E of a metric space is called totally bounded if for every ε > 0

there are finitely many balls of radius ε that cover E.

Remark 14.26. (1) The definition is unaffected by specifying the type of the balls (open

vs. closed).

(2) A totally bounded subset of a metric space is bounded. A subset of a totally bounded

set is totally bounded.

The proofs are left as exercises.

The next lemma shows what makes Rn so special.

Lemma 14.27. In Rn , every bounded subset is totally bounded.

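Proof. Let $E \subseteq \mathbb{R}^n$ be bounded, and let $\varepsilon > 0$. Choose $C > 0$ such that $E \subseteq [-C, C]^n$. Choose $k > 2C\sqrt{n}/\varepsilon$. Write
$$[-C, C] = \bigcup_{i=1}^{k} \Bigl[-C + \frac{2C(i-1)}{k},\ -C + \frac{2Ci}{k}\Bigr] = \bigcup_{i=1}^{k} S_i,$$
where $S_1, \ldots, S_k$ are closed intervals of length $2C/k < \varepsilon/\sqrt{n}$. Then
$$[-C, C]^n = (S_1 \cup \cdots \cup S_k) \times \cdots \times (S_1 \cup \cdots \cup S_k) = \bigcup_{i_1, \ldots, i_n = 1}^{k} S_{i_1} \times \cdots \times S_{i_n} = \bigcup_{j=1}^{k^n} F_j,$$
where each $F_j$ is a closed cube of side $2C/k$. Then the diameter of each $F_j$, which equals the length of the diagonal of $F_j$, equals $(2C/k)\sqrt{n} < \varepsilon$. Let $x_j \in F_j$ be arbitrary. Then $F_j \subseteq B_\varepsilon(x_j)$. It follows that $E \subseteq [-C, C]^n \subseteq \bigcup_{j=1}^{k^n} B_\varepsilon(x_j)$.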
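The construction in this proof is completely explicit, so it can be carried out for small $n$. The sketch below makes one choice the proof leaves open: it takes each $x_j$ to be the center of its cube $F_j$. The function names, the sample values $C = 1$, $n = 2$, $\varepsilon = 0.5$, and the random test points are assumptions; the random points only spot-check the covering, they do not prove it.

```python
import itertools
import math
import random

def cube_centers(C, n, eps):
    """Centers of the k^n closed subcubes of [-C, C]^n, with k > 2*C*sqrt(n)/eps."""
    k = math.floor(2 * C * math.sqrt(n) / eps) + 1
    side = 2 * C / k
    mids = [-C + side * (i - 0.5) for i in range(1, k + 1)]
    return list(itertools.product(mids, repeat=n)), k

def dist(p, q):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(p, q)))

C, n, eps = 1.0, 2, 0.5
centers, k = cube_centers(C, n, eps)

random.seed(0)
points = [tuple(random.uniform(-C, C) for _ in range(n)) for _ in range(500)]
worst = max(min(dist(p, c) for c in centers) for p in points)
print(k, len(centers), worst < eps)
```

The number of balls is $k^n$, which grows quickly with the dimension $n$ but is finite for every fixed $n$. That is the feature of $\mathbb{R}^n$ that fails in the infinite dimensional examples above.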

We now return to our development of the properties of compactness.

Proposition 14.28. A sequentially compact metric space is totally bounded.

Proof. We again prove the contrapositive. Suppose that $X$ is a metric space that is not totally bounded. Then there is a positive number $\varepsilon$ such that $X$ cannot be covered by finitely many balls of radius $\varepsilon$. Let $x_1 \in X$. Since $X \not\subseteq \overline{B}_\varepsilon(x_1)$ there must be $x_2 \in X$ with $d(x_1, x_2) > \varepsilon$. Since $X \not\subseteq \overline{B}_\varepsilon(x_1) \cup \overline{B}_\varepsilon(x_2)$ there must be $x_3 \in X$ with $d(x_i, x_3) > \varepsilon$ for $i < 3$. Continuing this way we construct a sequence $(x_n)$ in $X$ such that $d(x_i, x_n) > \varepsilon$ for $i < n$. This sequence has no Cauchy subsequence, hence no convergent subsequence. Therefore $X$ is not sequentially compact.
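The point-picking procedure in this proof is a greedy algorithm, and it is instructive to run it on a space where it must stop. The sketch below applies it to a finite random sample of the unit square, which is an assumption made for the illustration, as are the names `greedy_separated` and `dist`. Each new pick is at distance greater than $\varepsilon$ from all earlier picks; when no further pick is possible, every sample point lies within $\varepsilon$ of some pick.

```python
import math
import random

def dist(p, q):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(p, q)))

def greedy_separated(points, eps):
    """Pick points one at a time, each at distance > eps from all earlier picks.
    On a finite sample the process must stop; the picks then form an eps-net."""
    chosen = []
    for p in points:
        if all(dist(p, q) > eps for q in chosen):
            chosen.append(p)
    return chosen

random.seed(1)
sample = [(random.random(), random.random()) for _ in range(2000)]
net = greedy_separated(sample, 0.25)
print(len(net))
```

In a space that is not totally bounded the same procedure never stops for a suitable $\varepsilon$, and the infinite list of picks is the sequence with no Cauchy subsequence.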

We now have almost all of the pieces of the main theorem on compactness in metric spaces.

Theorem 14.29. Let X be a metric space. The following are equivalent:

(1) X is compact.

(2) X is sequentially compact.

(3) X is complete and totally bounded.

Proof. (1)⇒(2) This is Theorem 14.19.

(2)⇒(3) This follows from Propositions 14.22 and 14.28.

(3)⇒(1) We prove this by contradiction. Let X be complete and totally bounded, and

suppose that X is not compact. Then there is an open cover U having no finite subcover.

We first use total boundedness. There is a finite collection C1 of closed balls of radius 1

covering X. There must be a ball B1 ∈ C1 such that B1 is not finitely covered by U —

otherwise X would be finitely covered by U. Now since B1 is totally bounded there is a

finite collection C2 of closed balls of radius 1/2 covering B1 . There must exist B2 ∈ C2 such

that B1 ∩ B2 is not finitely covered by U. Continuing this process we construct a sequence

B1 , B2 , . . . of closed balls such that Bi has radius 1/i and such that for each i, B1 ∩ · · · ∩ Bi

is not finitely covered by U.

Now we use completeness of $X$: Exercise 13.11 implies that there is a point $a \in \bigcap_{i=1}^{\infty} B_i$. Choose $U_0 \in \mathcal{U}$ with $a \in U_0$. Since $U_0$ is open there is $r > 0$ with $B_r(a) \subseteq U_0$. Let $n > 2/r$. We claim that $B_n \subseteq B_r(a)$. To see this, let $y \in B_n$. Then $d(y, a) \le \operatorname{diam}(B_n) \le 2/n < r$. This proves the claim, and hence we have $B_n \subseteq U_0$. Therefore $B_1 \cap \cdots \cap B_n \subseteq U_0$, contradicting the fact that $B_1 \cap \cdots \cap B_n$ is not finitely covered by $\mathcal{U}$.

Corollary 14.30. (Heine-Borel theorem) Let E ⊆ Rn . Then E is compact if and only if E

is closed and bounded.

Proof. Since Rn is complete (Theorem 13.7), E is complete if and only if it is closed. By

Lemma 14.27 (and the remark preceding that Lemma), E is totally bounded if and only if

it is bounded.

15. Continuity and compactness

Theorem 15.1. Let X and Y be metric spaces, and let f : X → Y be a continuous function.

If X is compact then so is f (X).

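Proof. Let $\mathcal{V}$ be an open cover of $f(X)$. Then $f^{-1}(\mathcal{V}) = \{f^{-1}(V) : V \in \mathcal{V}\}$ is an open cover of $X$, since $f$ is continuous. Since $X$ is compact, there are $V_1, \ldots, V_k \in \mathcal{V}$ such that $X = \bigcup_{i=1}^{k} f^{-1}(V_i)$. But then $f(X) \subseteq \bigcup_{i=1}^{k} V_i$. Thus $\mathcal{V}$ admits the finite subcover $\{V_1, \ldots, V_k\}$.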


Exercise 15.2. One can also prove this theorem using sequences and sequential compact-

ness.

Corollary 15.3. If X is compact and f : X → Y is continuous, then f (X) is a closed

bounded subset of Y (in fact, totally bounded).

Corollary 15.4. (Extreme value theorem) Let X be a compact metric space, and let f :

X → R be continuous. Then f achieves its maximum and minimum at points of X: there

exist x0 , x1 ∈ X such that for all x ∈ X, f (x0 ) ≤ f (x) ≤ f (x1 ).

Proof. A (non-empty) closed bounded subset of R contains its infimum and supremum.

Corollary 15.5. A continuous (R-valued) function on a closed bounded interval has a maximum and a minimum.

Definition 15.6. Let X and Y be metric spaces, and let f : X → Y . f is an open map if

f (A) is an open subset of Y whenever A is an open subset of X. f is a closed map if f (A)

is a closed subset of Y whenever A is a closed subset of X.

Remark 15.7. Note that the above definitions refer to the forward set map defined by f ,

which is less well behaved than the reverse set map. For the reverse map, the analogous

properties are equivalent to continuity (Theorem 8.5 and Exercise 8.6).

Theorem 15.8. Let X be compact, and let f : X → Y be continuous. Then f is a closed

map.

Proof. The proof is an exercise.

Example 15.9. (1) Let T be the unit circle, and let f : [0, 1] → T be given by $f(t) = (\cos 2\pi t, \sin 2\pi t)$. Then f is continuous but not an open map: [0, 1/2) is an open subset of [0, 1], but $f([0, 1/2))$ is not an open subset of T, since it contains its non-interior point (1, 0).

(2) Define g : [0, ∞) → T by $g(t) = \left( \cos \frac{2\pi t}{t+1}, \sin \frac{2\pi t}{t+1} \right)$. Then g is bijective and continuous, but is neither a closed map nor an open map: [1, ∞) is a closed subset of [0, ∞), but $g([1, \infty))$ is not a closed subset of T since it does not contain its limit point (1, 0). As in the previous example, [0, 1) is an open subset of [0, ∞), but $g([0, 1))$ is not an open subset of T.
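The failure of closedness in part (2) can be explored numerically. The following is a minimal sketch, not part of the original notes; the helper names `g` and `dist_to_one_zero` are chosen here for illustration. It evaluates g at growing values of t and records how close g(t) comes to the point (1, 0), which is attained only at t = 0.

```python
import math


def g(t: float) -> tuple[float, float]:
    """The map g(t) = (cos(2*pi*t/(t+1)), sin(2*pi*t/(t+1))) on [0, infinity)."""
    angle = 2 * math.pi * t / (t + 1)
    return (math.cos(angle), math.sin(angle))


def dist_to_one_zero(t: float) -> float:
    """Euclidean distance from g(t) to the point (1, 0)."""
    x, y = g(t)
    return math.hypot(x - 1.0, y)


# Sample points of [1, infinity): the distances shrink toward 0 but stay positive,
# which is the numerical shadow of (1, 0) being a limit point of g([1, infinity)).
distances = [dist_to_one_zero(t) for t in (1, 10, 100, 1000, 10000)]
```

A finite sample cannot prove that (1, 0) is a limit point; it only illustrates the behavior that the argument in the example establishes.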

Theorem 15.10. Let X and Y be metric spaces with X compact, and let f : X → Y be

continuous and bijective. Then f is an open map.

Proof. Let U ⊆ X be open. Then $U^c$ is closed, hence compact. Therefore $f(U^c)$ is compact, hence closed. But $f(U^c) = f(U)^c$ since f is bijective. Therefore $f(U)$ is open.

Corollary 15.11. In the above theorem, f −1 is continuous.

16. Connectedness

Let’s recall for a moment Example 8.8(5): T and [0, 1] are not homeomorphic metric

spaces. How might we go about proving this? A clever observation is the following: if we

remove a point from T, the result is still “one piece” (in fact, it is easy to see that for any

z ∈ T, T \ {z} is homeomorphic to R). On the other hand, if we remove a point from [0, 1]

(other than one of the two endpoints), the result “consists of two pieces”. It is an even

cleverer observation that it is not very easy to say more precisely what we mean by “consists

of two pieces”. For example, any set containing more than one point can be divided into two

nonempty disjoint pieces. But surely, the division $[0, 1] \setminus \{\tfrac12\} = [0, \tfrac12) \sqcup (\tfrac12, 1]$ is a special way

of dividing a set into two pieces. What is special about it?

We need a topological property, and the following is the right one: no sequence in one

of the pieces can converge to a point of the other. Well, this is clearly true of the division

of $[0, 1] \setminus \{\tfrac12\}$ described above. But it pushes the problem back over to the other side: can

we prove that it is not possible to divide R into two nonempty disjoint pieces such that no

sequence in one piece can converge to a point of the other piece?

At some point, we just have to bite the bullet and try to prove a hard result. In this section

we will do this, and prove the fact about R stated in the previous paragraph. This is a deep

consequence of the completeness axiom. The relevant property of R is called connectedness.

As the above discussion has indicated, connectedness is a sort of “negative” property. We

will begin with the corresponding “positive” property. First, notice that to say that no

sequence in A converges to a point of B is the same thing as saying that $\overline{A} \cap B = \emptyset$. We use

this for our definition (notice that disjointness of A and B is implied).

Definition 16.1. Let X be a metric space. We call X separated if there exist nonempty

subsets A and B such that A ∪ B = X and $\overline{A} \cap B = \emptyset = A \cap \overline{B}$. X is called connected if it

is not separated.

Remark 16.2. If E ⊆ X is a subset, we call E separated (or connected) if as a metric space

in its own right E has that property. We note that in the above definition of separation,

if A and B are subsets of E with union equal to E, the closures may be taken relative to

E, or in X; the intersections $\overline{A} \cap B$ and $A \cap \overline{B}$ will be the same. Thus being separated

or connected is an intrinsic property of E; it does not depend on whether E is given as a

subspace of another metric space.

There is another way to describe connectedness. Suppose that the metric space X is

separated, and let A and B be subsets as in the definition. Since X = A ∪ B, we know that $X = A \cup \overline{B}$. Since $A \cap \overline{B} = \emptyset$, then $A = (\overline{B})^c$. Thus A is an open set in X. Since A ∩ B = ∅ also, we know that $B = A^c$, hence B is closed. By the symmetry of the situation we know

that A is also closed, and B is open.

Definition 16.3. Let X be a metric space. A subset of X is clopen if it is both closed and

open.

Lemma 16.4. The metric space X is separated if and only if it contains a proper nonempty

clopen subset. X is connected if and only if its only clopen subsets are X and ∅.

Proof. The proof is elementary, and we leave it as an exercise.

Remark 16.5. Let X be a metric space, and let E ⊆ X. What does it mean for A ⊆ E to

be relatively clopen in E? We know that A is relatively open in E if and only if A = E ∩ U

for some open set U ⊆ X. Similarly, one can check that A is relatively closed in E if and

only if A = E ∩ K for some closed set K ⊆ X. Thus A is relatively clopen in E if and only if

there are two sets U and K in X, with U open and K closed, such that A = E ∩ U = E ∩ K.

(Note that it is NOT NECESSARILY true that A equals the intersection of E with a clopen

subset of X.)

Exercise 16.6. The Cantor set (Definition 6.1) is not connected.

Definition 16.7. An interval is a subset I ⊆ R such that for all a < c < b in R, if a, b ∈ I

then c ∈ I. (I.e. an interval is a subset of R that is closed under ‘betweenness’.)

Example 16.8. The following are intervals (for any a ≤ b in R):

(a, b) [a, b] [a, b) (a, b] ∅

(a, ∞) [a, ∞) (−∞, b) (−∞, b] R.

Lemma 16.9. Every interval is of one of the forms in Example 16.8.

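Proof. Let I be a nonempty interval. Choose c ∈ I. Let $B = \{ x \in \mathbb{R} : [c, x] \subseteq I \}$. B is nonempty since c ∈ B. Let

$$ b = \begin{cases} \sup B, & \text{if } B \text{ is bounded above,} \\ \infty, & \text{else.} \end{cases} $$

If b ∈ I and b ≥ c, then $[c, b] \subseteq I$ and $(b, \infty) \subseteq I^c$. If $b \notin I$, then $[c, b) \subseteq I$ and $[b, \infty) \subseteq I^c$.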

Similarly, define a by working on the left of c. There are four cases altogether, and I is

presented as one of the forms in Example 16.8 in each case.

Theorem 16.10. Let I ⊆ R. Then I is connected if and only if I is an interval.

Proof. (=⇒): Suppose that I is not an interval. Then there are a < c < b in R with a, b ∈ I and $c \notin I$. Put $A = (-\infty, c) \cap I$. Then $A \ne \emptyset$, $A \ne I$, and $A = I \cap (-\infty, c) = I \cap (-\infty, c]$ is clopen in I. Hence I is not connected, by Lemma 16.4.

(⇐=): Suppose that I is an interval, but that I is not connected. Let E ⊆ I be a proper

nonempty clopen subset of I. Then there are an open set U ⊆ R and a closed set K ⊆ R

such that E = I ∩ U = I ∩ K. Let a ∈ E and b ∈ I \ E. We may as well assume that a < b.

Then since I is an interval, we know that [a, b] ⊆ I. We have

$$ E \cap [a, b] = I \cap K \cap [a, b] = K \cap [a, b]; $$

$$ (I \setminus E) \cap [a, b] = \bigl(I \setminus (I \cap U)\bigr) \cap [a, b] = (I \setminus U) \cap [a, b] = [a, b] \setminus U. $$

Thus $E \cap [a, b]$ and $(I \setminus E) \cap [a, b]$ are closed subsets of R. Let $c = \sup\bigl(E \cap [a, b]\bigr)$. Then $c \in E \cap [a, b]$ since this set is closed. Also, c < b since $b \notin E$. Hence $(c, b] \subseteq (I \setminus E) \cap [a, b]$, and so $c \in (I \setminus E) \cap [a, b]$ since this set is closed. This leads to the contradiction $c \in E \cap (I \setminus E)$.

The following theorem is very useful, and we place it here because it deals with intervals

(although it is not a result about connectedness).

Theorem 16.11. Let U ⊆ R be open. Then U equals the union of countably many open

intervals. Moreover, U can be written as the union of a countable collection of pairwise

disjoint open intervals, and this collection is unique.

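Proof. For x ∈ U choose a(x), b(x) ∈ Q with $x \in (a(x), b(x)) \subseteq U$. Let $\mathcal{E} = \{ (a(x), b(x)) : x \in U \}$. Then $\mathcal{E}$ is a collection of open intervals. Since $\mathcal{E} \subseteq \{ (\alpha, \beta) : \alpha, \beta \in \mathbb{Q},\ \alpha < \beta \}$, and the latter collection is indexed by a subset of the countable set $\mathbb{Q}^2$, we see that $\mathcal{E}$ is a countable collection. It is clear that $U = \bigcup \mathcal{E}$.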

The proof of the second statement of the Theorem is left as an exercise.
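For a finite union of open intervals, the decomposition into pairwise disjoint open intervals can be computed directly. The sketch below is an illustration only, covering the finite case rather than the general theorem; the function name `disjoint_open_intervals` is hypothetical and intervals are modeled as pairs (a, b) of numbers with a < b.

```python
def disjoint_open_intervals(intervals):
    """Rewrite a finite union of open intervals (a, b) as pairwise disjoint open intervals."""
    # Drop empty intervals and sort the rest by left endpoint.
    pending = sorted((a, b) for a, b in intervals if a < b)
    merged = []
    for a, b in pending:
        # Two open intervals overlap only if the new left endpoint lies strictly
        # inside the last component. For instance (0, 1) and (1, 2) stay separate,
        # because the point 1 belongs to neither of them.
        if merged and a < merged[-1][1]:
            last_a, last_b = merged[-1]
            merged[-1] = (last_a, max(last_b, b))
        else:
            merged.append((a, b))
    return merged
```

For example, the union of (0, 1), (0.5, 2), (2, 3) and (5, 6) is returned as the three components (0, 2), (2, 3) and (5, 6).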

17. Continuity and connectedness

Theorem 17.1. Let X and Y be metric spaces, and let f : X → Y be continuous. Suppose

that X is connected. Then f (X) is connected.

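Proof. Regard f as a continuous map of X onto f(X). Let E be a nonempty clopen subset of f(X). Since f is continuous, $f^{-1}$ preserves openness and closedness, hence clopenness; thus $f^{-1}(E)$ is a nonempty clopen subset of X. Since X is connected, $f^{-1}(E) = X$, and therefore E = f(X). Thus any nonempty clopen set in f(X) must equal f(X), and f(X) is connected by Lemma 16.4.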

Corollary 17.2. (Intermediate value theorem) Let X be a connected metric space, and

f : X → R a continuous function. Let a, b ∈ X, and let t lie between f (a) and f (b). Then

there exists x ∈ X such that f (x) = t.

Proof. By Theorem 17.1, f(X) is a connected subset of R, hence an interval (Theorem 16.10). Since f(a) and f(b) belong to f(X), so does t.

Example 17.3. The following is a typical “practical” illustration of the corollary. Suppose

that the temperature in Phoenix is 110 degrees, and at the same instant the temperature in

La Paz is 2 degrees. Then there must be a place on the earth’s surface where the temperature

(at the same instant) is exactly π degrees.

Example 17.4. Let n ∈ N, and define f : [0, ∞) → [0, ∞) by $f(t) = t^n$. Since f is continuous and [0, ∞) is connected, it follows from Theorem 17.1 that $f([0, \infty))$ is connected. Let x > 0. There is k ∈ N with x < k. Then $0 < x < k^n$. Since $0, k^n \in f([0, \infty))$, then $x \in f([0, \infty))$. Therefore there exists y > 0 such that x = f(y). This is a new proof of the existence of nth roots (compare with the proof of Theorem 1.22).

Now let b > 0, and consider the restriction of f: $f_b := f|_{[0,b]} : [0, b] \to [0, b^n]$. Since [0, b] is compact and $f_b$ is continuous and bijective, it follows from Corollary 15.11 that $(f_b)^{-1}$ is continuous. This is true for all b > 0, and hence we have proved that $f^{-1}$ is continuous. ($f^{-1}(x) = \sqrt[n]{x}$.)
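The existence argument above is not constructive, but the same bracketing idea (0 ≤ x ≤ k^n) drives the bisection method. The following sketch is an assumption-laden illustration rather than part of the notes: the name `nth_root` and the fixed iteration count are choices made here, and floating-point arithmetic only approximates the real numbers.

```python
def nth_root(x: float, n: int, iterations: int = 200) -> float:
    """Approximate the y >= 0 with y**n == x by bisection (requires x >= 0, n >= 1)."""
    if x < 0 or n < 1:
        raise ValueError("need x >= 0 and n >= 1")
    # Bracket the root: f(lo) = 0 <= x and f(hi) = hi**n >= x, where f(t) = t**n.
    lo, hi = 0.0, max(1.0, x)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid ** n < x:
            lo = mid      # the root lies in [mid, hi]
        else:
            hi = mid      # the root lies in [lo, mid]
    return hi
```

Each step halves an interval on whose endpoints t^n − x has opposite signs (or vanishes), so the intermediate value theorem guarantees a root inside every bracket.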

Definition 17.5. The metric space X is path connected if for any two points a1 , a2 ∈ X,

there is a continuous function f : [t1 , t2 ] → X such that f (ti ) = ai , for i = 1, 2.

Proposition 17.6. If X is path connected, then X is connected.

Proof. Let A ⊆ X be a nonempty clopen subset, and let a ∈ A. For any x ∈ X, there is a continuous function f : [0, 1] → X such that f(0) = a and f(1) = x. Since A is clopen, $f^{-1}(A)$ is a clopen subset of [0, 1], and is nonempty since it contains 0. Since [0, 1] is connected, we have $f^{-1}(A) = [0, 1]$, and hence $x = f(1) \in f([0, 1]) \subseteq A$. Therefore A = X.

Definition 17.7. Let V be a real vector space. For x, y ∈ V let $S_{x,y} = \{ (1 - t)x + ty : t \in [0, 1] \}$. ($S_{x,y}$ is the line segment connecting x and y.) A subset E ⊆ V is called convex if $S_{x,y} \subseteq E$ whenever x, y ∈ E.

Theorem 17.8. Let V be a real normed vector space. Every convex subset of V is connected.

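Proof. Let E ⊆ V be convex. For any x, y ∈ E, the function f : [0, 1] → V defined by f(t) = (1 − t)x + ty is continuous, and its image $S_{x,y}$ lies in E since E is convex. Thus f : [0, 1] → E is a path from x to y, and so E is path connected, hence connected by Proposition 17.6.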

Corollary 17.9. Any convex subset of Rn is connected. (For example, any ball in Rn is

connected.)

Definition 17.10. Let $D \subseteq \mathbb{R}^n$, and let $f : D \to \mathbb{R}^m$ be a function. The graph of f is the set $G(f) \subseteq \mathbb{R}^{n+m}$ given by $G(f) = \{ (x, f(x)) : x \in D \}$.

Proposition 17.11. If D ⊆ Rn is connected, and f : D → Rm is continuous, then G(f ) is

connected.

Proof. Define $g : D \to \mathbb{R}^{n+m}$ by $g(x) = (x, f(x))$. Then g is continuous, since all of its

coordinate functions are continuous (being either a coordinate of x, or a coordinate function

of the continuous function f ). By Theorem 17.1, g(D) is connected. But g(D) = G(f ).

Example 17.12. (1) The unit circle T is connected by Theorem 17.1, being the image of [0, 1] under the continuous function $t \mapsto (\cos 2\pi t, \sin 2\pi t)$.

(2) The graph of sin(1/x) for x > 0 is connected, by Proposition 17.11. Let E denote this graph: $E = \{ (x, \sin(1/x)) : x > 0 \}$. Let F = {0} × [−1, 1]. F is also connected, being convex. Since $E \cup F = \overline{E}$, it follows from Exercise 17.13 below that the union E ∪ F is a connected set. It is a nice exercise to prove that it is not path connected. You should draw a picture (and do the exercise) in order to appreciate this bizarre example.

(3) In the last example, delete the portion of E for x > π, then include a curve below the wiggly graph, connecting $(\pi, \sin(1/\pi))$ to (0, −1). The new set is called the Warsaw circle. It is path connected, but there does not exist a path going “once around”.

Exercise 17.13. Let A be a connected subset of a metric space, and let $A \subseteq B \subseteq \overline{A}$. Then

B is connected.

Exercise 17.14. Let X be a metric space, let A ⊆ X be a connected subset, and let E ⊆ X

be a clopen subset. Then either A ∩ E = ∅, or A ⊆ E. (Thus if a clopen set touches a

connected set, it must contain all of it.)

Exercise 17.15. Let {Ai : i ∈ I} be subsets of a metric space. If all of the Ai are connected,

and if $\bigcap_{i \in I} A_i \ne \emptyset$, then $\bigcup_{i \in I} A_i$ is connected.

Exercise 17.16. Let E be the following subset of $\mathbb{R}^2$:

$$ E = \bigl( (0, 1] \times \{0\} \bigr) \cup \left( \bigcup_{n=1}^{\infty} \{\tfrac1n\} \times [0, 1] \right) \cup \{(0, 1)\}. $$

Theorem 17.17. Let X be a metric space. For x ∈ X let

$$ C(x) = \bigcup \{ A \subseteq X : x \in A \text{ and } A \text{ is connected} \}. $$

(1) C(x) is connected.

(2) $\{ C(x) : x \in X \}$ is a partition of X.

(3) C(x) is a closed set.

(4) C(x) is a maximal connected subset of X.

Proof. (1) The sets A in the union defining C(x) all contain x. Thus C(x) is connected by Exercise 17.15.

(2) Each x ∈ X lies in C(x), since {x} is connected. Suppose that $C(x) \cap C(y) \ne \emptyset$. By Exercise 17.15, C(x) ∪ C(y) is connected. Since it contains x it is one of the sets A in the union defining C(x). Thus C(x) ∪ C(y) ⊆ C(x), and we have that C(y) ⊆ C(x). By symmetry, C(x) ⊆ C(y), so that C(x) = C(y).

(3) By (1) and Exercise 17.13, $\overline{C(x)}$ is connected. Since $x \in \overline{C(x)}$, $\overline{C(x)}$ is one of the sets in the union defining C(x); thus $\overline{C(x)} \subseteq C(x)$.

(4) Any connected set containing C(x) is one of the sets in the union defining C(x), and hence must equal C(x).

Definition 17.18. Let X be a metric space. A component of X is a maximal connected

subset. Thus the components of X are the sets C(x) from Theorem 17.17.

Theorem 17.19. Let U ⊆ Rn be open. Then U has countably many components, and these

are open sets.

Proof. Let x ∈ U, and y ∈ C(x). Since U is open there is r > 0 such that $B_r(y) \subseteq U$. Then $C(x) \cup B_r(y)$ is connected by Exercise 17.15 (and Corollary 17.9). Then $C(x) \cup B_r(y) \subseteq C(x)$ by the definition of C(x), hence $B_r(y) \subseteq C(x)$. Thus C(x) is open.

Since the components of U are nonempty open sets and $\mathbb{Q}^n$ is dense in $\mathbb{R}^n$, we may choose an element of $\mathbb{Q}^n$ in each one. This defines a map from the set of components to $\mathbb{Q}^n$. Since the distinct components are disjoint, this map is one-to-one. Since $\mathbb{Q}^n$ is countable, so is the set of components.

18. Uniform continuity

Continuity is a locally defined property. Suppose that f : X → Y is continuous. If

ε > 0 is given, and if a point x0 ∈ X is given, then continuity of f at x0 provides a

positive number δ with a certain property (Definition 8.1). The local-ness is expressed in

the order of the quantifiers in that definition (and as we have rephrased it above): the

number δ need only do its job for the one point x0 already chosen. In fact, this means that

δ (perhaps slightly modified) works throughout some ball centered at x0 . A(n open) ball

centered at x0 is a neighborhood of x0 . A property is local if each point has a neighborhood

in which the property holds. A globally defined property, on the other hand, is one that

holds everywhere. Continuity would be globally defined if the same δ worked for all points

of X. Not all continuous functions have such a strong form of continuity; those that do have

a special name.

Definition 18.1. Let X and Y be metric spaces, and let f : X → Y be a function. f is

uniformly continuous if for every ε > 0 there exists δ > 0 such that for all $x_1, x_2 \in X$, if $d_X(x_1, x_2) < \delta$ then $d_Y(f(x_1), f(x_2)) < \varepsilon$.

Note that the only difference between this definition and the definition of continuity on X

is in the order in which the point and the δ are specified. Some examples will help to clarify

this.

Example 18.2. (1) Let f : [−10, 10] → R be given by $f(t) = t^2$. Then f is uniformly continuous.

Proof. Let ε > 0 be given. Let δ = ε/20. If $t_1, t_2 \in [-10, 10]$ are such that $|t_1 - t_2| < \delta$, then $|f(t_1) - f(t_2)| = |t_1^2 - t_2^2| = |t_1 + t_2| \cdot |t_1 - t_2| \le (|t_1| + |t_2|) \, |t_1 - t_2| < 20\delta = \varepsilon$.

(2) Let g : R → R be given by $g(t) = t^2$. Then g is not uniformly continuous.

Proof. We choose ε = 1. Let δ > 0 be given. Choose t > 1/δ, and let s = t + δ/2. Then |s − t| = δ/2 < δ, while $|s^2 - t^2| = |s - t| \cdot |s + t| = (\delta/2)(2t + \delta/2) > \delta t > 1 = \varepsilon$. Therefore g is not uniformly continuous.

(3) Let h : (0, 1) → R be given by h(t) = sin(1/t). Then h is not uniformly continuous.

Proof. We choose ε = 2. Let δ > 0 be given. Choose n ∈ N with $n > 1/\sqrt{\delta}$. Let s = 2/[(2n + 1)π] and let t = 2/[(2n + 3)π]. Then

$$ |s - t| = \frac{2}{\pi} \left| \frac{1}{2n+1} - \frac{1}{2n+3} \right| = \frac{2}{\pi} \cdot \frac{2}{(2n+1)(2n+3)} \le \frac{1}{n^2} < \delta. $$

But $|h(s) - h(t)| = |1 - (-1)| = 2 \ge \varepsilon$. Therefore h is not uniformly continuous.
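The choices made in parts (1) and (2) can be tried out numerically. The sketch below is only an illustration under floating-point arithmetic, with hypothetical helper names: `witness_not_uniform` reproduces the pair (s, t) from the proof of (2), and `delta_works_on_grid` spot-checks the value δ = ε/20 from the proof of (1) on finitely many sample points.

```python
def witness_not_uniform(delta):
    """For g(t) = t**2 and eps = 1, return (s, t) with |s - t| < delta
    but |s**2 - t**2| > 1, following the choice made in the proof of (2)."""
    t = 1.0 / delta + 1.0          # any t > 1/delta works
    s = t + delta / 2.0
    return s, t


def delta_works_on_grid(eps, steps=1000):
    """Spot-check delta = eps/20 for f(t) = t**2 on sample points of [-10, 10]."""
    delta = eps / 20.0
    points = [-10.0 + 20.0 * i / steps for i in range(steps + 1)]
    return all(
        abs(a * a - b * b) < eps
        for a in points
        for b in points
        if abs(a - b) < delta
    )
```

The grid check is evidence, not proof: uniform continuity quantifies over all pairs of points, while the code inspects only finitely many.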

The following theorem is a classic use of compactness to get a global result from local

information.

Theorem 18.3. Suppose f : X → Y is continuous, and X is compact. Then f is uniformly

continuous.

Proof. Let ε > 0 be given. Since f is continuous, for each x ∈ X there is $r_x > 0$ such that $f(B_{r_x}(x)) \subseteq B_{\varepsilon/2}(f(x))$. The collection $\{ B_{r_x/2}(x) : x \in X \}$ is an open cover of X. Since X is compact, there are $x_1, \ldots, x_n \in X$ such that $X = \bigcup_{i=1}^{n} B_{r_{x_i}/2}(x_i)$. Let $\delta = \min\{ r_{x_i}/2 : 1 \le i \le n \}$. Let y, z ∈ X with d(y, z) < δ. There is i such that $d(y, x_i) < r_{x_i}/2$. Then $d(z, x_i) \le d(z, y) + d(y, x_i) < \delta + r_{x_i}/2 \le r_{x_i}$. Then $f(y), f(z) \in B_{\varepsilon/2}(f(x_i))$, so that $d(f(y), f(z)) < \varepsilon$.

19. Convergence of functions

Definition 19.1. Let X be a set. (Note that we really do mean set. Later we will let X be a metric space, but for now, that is not relevant.) Let $f_n : X \to \mathbb{R}^k$ for n = 1, 2, 3, .... (We remark that $\mathbb{R}^k$ may be replaced by another metric space. For ease of exposition, we restrict our attention to the case where the codomain is Euclidean space.) For a ∈ X we say that $(f_n)$ converges at a if $(f_n(a))_{n=1}^{\infty}$ is a convergent sequence in $\mathbb{R}^k$. If $(f_n)$ converges at each point of X, define $f : X \to \mathbb{R}^k$ by $f(x) = \lim_{n \to \infty} f_n(x)$. We say that $(f_n)$ converges to f (pointwise).

We may specify this more precisely as: for every ε > 0, for every x ∈ X, there exists $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$, $\|f_n(x) - f(x)\| < \varepsilon$. (Note that $n_0 \equiv n_0(\varepsilon, x)$ depends on both ε and x.)

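Example 19.2. (1) Let $f_n : [0, 1] \to \mathbb{R}$ be given by $f_n(x) = \frac{1}{n} x$. Then $f_n \to 0$.

(2) Let $g_n : [0, 1] \to \mathbb{R}$ be given by $g_n(x) = x^n$. Then $g_n \to g$, where

$$ g(x) = \begin{cases} 0, & \text{if } x < 1, \\ 1, & \text{if } x = 1. \end{cases} $$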

Definition 19.3. Let $f, f_n : X \to \mathbb{R}^k$. We say that $(f_n)$ converges to f uniformly (on X) if for each ε > 0, there exists $n_0 \in \mathbb{N}$ such that for every x ∈ X, and for every $n \ge n_0$, $\|f_n(x) - f(x)\| < \varepsilon$. (Note that $n_0 \equiv n_0(\varepsilon)$ depends only on ε.)

Formally, the difference between pointwise convergence and uniform convergence is only

in the order of the two quantified variables $n_0$ and x. The difference practically, however, is

profound, and it is important that you get a good feel for it.

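Example 19.4. (1) $\frac{1}{n} x \to 0$ uniformly on [0, 1].

(2) $x^n \not\to g$ uniformly on [0, 1], where $g_n(x) = x^n$ and g are as in Example 19.2(2).

Proof. Let ε = 1/2. Let $n_0 \in \mathbb{N}$ be given, and let $n \ge n_0$. Since $\lim_{t \to 1} t^n = 1$, there is x ∈ [0, 1) such that $x^n > 1/2$. Then $|g_n(x) - g(x)| = g_n(x) - 0 > 1/2 = \varepsilon$.

(3) $\frac{1}{n} x \not\to 0$ uniformly on R.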
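The contrast between parts (1) and (2) shows up clearly when the largest deviation from the limit is sampled on a grid. This is a rough numerical sketch with hypothetical helper names; a maximum over finitely many sample points only bounds the true supremum from below.

```python
def sup_distance(f, g, points):
    """Largest |f(x) - g(x)| over the sample points (a lower bound for the true sup)."""
    return max(abs(f(x) - g(x)) for x in points)


def zero(x):
    """The zero function, the limit in part (1)."""
    return 0.0


def limit_g(x):
    """The pointwise limit of x**n on [0, 1]."""
    return 1.0 if x == 1.0 else 0.0


points = [i / 1000 for i in range(1001)]          # sample points in [0, 1]

# Part (1): the deviation of x/n from 0 is 1/n, which tends to 0.
linear_sups = [sup_distance(lambda x, n=n: x / n, zero, points) for n in (1, 10, 100)]

# Part (2): the deviation of x**n from its limit stays close to 1 for every n.
power_sups = [sup_distance(lambda x, n=n: x ** n, limit_g, points) for n in (1, 10, 100)]
```

With a finer grid near x = 1 the sampled values in part (2) move even closer to 1, in line with the proof above.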

It is useful to have an intrinsic characterization for uniform convergence, i.e. a Cauchy

condition.

Definition 19.5. Let X be a set, and let $f_n : X \to \mathbb{R}^k$ be functions for n ∈ N. $(f_n)$ is uniformly Cauchy (on X) if for each ε > 0, there exists $n_0 \in \mathbb{N}$ such that for all x ∈ X, and for all $m, n \ge n_0$, $\|f_m(x) - f_n(x)\| < \varepsilon$.

Proposition 19.6. If (fn ) is uniformly Cauchy, then (fn ) is uniformly convergent.

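Proof. Let ε > 0. Choose $n_0$ such that for all $m, n \ge n_0$, and for all x ∈ X, $\|f_m(x) - f_n(x)\| < \varepsilon/2$. This shows that for each x ∈ X, the sequence $(f_n(x))_{n=1}^{\infty}$ is Cauchy in $\mathbb{R}^k$. Since $\mathbb{R}^k$ is complete, $(f_n(x))_{n=1}^{\infty}$ converges. Define $f : X \to \mathbb{R}^k$ by $f(x) = \lim_{n \to \infty} f_n(x)$. If $n \ge n_0$, then for all x ∈ X we have

$$ \|f_n(x) - f(x)\| = \lim_{m \to \infty} \|f_n(x) - f_m(x)\| \le \frac{\varepsilon}{2} < \varepsilon, $$

where the first equality holds since $y \in \mathbb{R}^k \mapsto \|z - y\| \in \mathbb{R}$ is continuous. Therefore $f_n \to f$ uniformly on X.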

Theorem 19.7. Let X be a metric space, let f , fn : X → Rk , and suppose that fn → f

uniformly on X. Let a ∈ X, and suppose that fn is continuous at a for all n ∈ N. Then f

is continuous at a.

Proof. Let ε > 0. Choose n such that for all x ∈ X, $\|f_n(x) - f(x)\| < \varepsilon/3$. Since $f_n$ is continuous at a, there is δ > 0 such that $\|f_n(x) - f_n(a)\| < \varepsilon/3$ whenever d(x, a) < δ. Now let x ∈ X with d(x, a) < δ. We have

$$ \|f(x) - f(a)\| \le \|f(x) - f_n(x)\| + \|f_n(x) - f_n(a)\| + \|f_n(a) - f(a)\| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon, $$

(where the first and third occurrences of ε/3 are due to the uniform approximation of f by $f_n$, and the second is due to the continuity of $f_n$ at a). Therefore f is continuous at a.

Corollary 19.8. The uniform limit of continuous functions is continuous.

Example 19.9. (1) Consider the sequence of functions $x^n$ on [0, 1]. We have seen that this sequence has a pointwise limit, which is not continuous. Since $x^n$ is continuous for each n, the theorem implies that the convergence is not uniform (this is an easier proof than the direct proof we gave earlier).

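(2) The above argument cannot be used in reverse. For example, let $f_n : [0, 1] \to \mathbb{R}$ be given by

$$ f_n(x) = \begin{cases} 2nx, & \text{if } 0 \le x \le \frac{1}{2n}, \\ -2n\left(x - \frac{1}{n}\right), & \text{if } \frac{1}{2n} \le x \le \frac{1}{n}, \\ 0, & \text{if } \frac{1}{n} \le x \le 1. \end{cases} $$

(It will be helpful to draw a picture.) Then $f_n \to 0$ pointwise on [0, 1], but not uniformly, even though the limit is continuous.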
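In place of a picture, the tent-shaped functions of part (2) can be tabulated. The sketch below assumes the middle piece is the line through (1/(2n), 1) and (1/n, 0), so that each function is continuous; the name `tent` is chosen here for illustration.

```python
def tent(n, x):
    """Tent function on [0, 1]: rises from 0 to 1 on [0, 1/(2n)],
    falls back to 0 on [1/(2n), 1/n], and vanishes on [1/n, 1]."""
    if x <= 1.0 / (2 * n):
        return 2.0 * n * x
    if x <= 1.0 / n:
        return -2.0 * n * (x - 1.0 / n)
    return 0.0


# The peak value is 1 for every n, so the distance to the zero function never shrinks.
peaks = [tent(n, 1.0 / (2 * n)) for n in (1, 5, 50, 500)]

# At a fixed point such as x = 0.25 the values are eventually 0 (pointwise convergence).
values_at_quarter = [tent(n, 0.25) for n in (1, 5, 50, 500)]
```

The peak moves toward 0 as n grows, which is why every fixed x > 0 is eventually to the right of the tent while the maximum height stays equal to 1.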

Example 19.10. Recall function space from Example 4.6: if X is a set, $B(X, \mathbb{R}^k)$ is the vector space of all bounded functions from X to $\mathbb{R}^k$. $B(X, \mathbb{R}^k)$ is a normed vector space, with norm given by $\|f\| = \sup_{x \in X} \|f(x)\|$. Thus $B(X, \mathbb{R}^k)$ is a metric space.

Proposition 19.11. Let f , fn : X → Rk be bounded functions.

(1) fn → f in B(X, Rk ) if and only if fn → f uniformly on X.

(2) (fn ) is Cauchy in B(X, Rk ) if and only if (fn ) is uniformly Cauchy on X.

Proof. This follows immediately from the definitions.

Corollary 19.12. B(X, Rk ) is a complete metric space.

Proof. This follows from Proposition 19.6 and the above proposition.

Definition 19.13. Let X be a metric space. Cb (X, Rk ) is the space of all bounded continuous

functions from X to Rk .

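Note that $C_b(X, \mathbb{R}^k)$ is a vector subspace of $B(X, \mathbb{R}^k)$, since sums and scalar multiples of continuous functions are continuous.

Proposition 19.14. $C_b(X, \mathbb{R}^k)$ is a complete metric space.

Proof. By Corollary 19.8 (and Proposition 19.11), $C_b(X, \mathbb{R}^k)$ is a closed subset of the complete metric space $B(X, \mathbb{R}^k)$, and a closed subset of a complete metric space is complete.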

Remark 19.15. If X is a compact metric space, then C(X, Rk ) = Cb (X, Rk ).

20. Differentiation

Definition 20.1. Let I ⊆ R be open, let f : I → R, and let a ∈ I. f is differentiable at a if

$$ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} $$

exists (equivalently, if $\lim_{h \to 0} \bigl(f(a + h) - f(a)\bigr)/h$ exists). The limit is called the derivative of f at a, and is denoted $f'(a)$ (or $\frac{df}{dx}(a)$, or $\left.\frac{df}{dx}\right|_{x=a}$). We say that f is differentiable on I if it is differentiable at each point of I. We refer to the quantity $\bigl(f(x) - f(a)\bigr)/(x - a)$ as the difference quotient.

Suppose that f is differentiable at a. Let $L(x) = f(a) + f'(a)(x - a)$ (L is a “linear function”, in that its graph is a straight line). The function f is well-approximated by L in the following sense:

(3) $f(a) = L(a)$

(4) $\lim_{x \to a} \dfrac{f(x) - L(x)}{x - a} = 0.$

Remark 20.2. There exists at most one linear function L having these properties. Uniqueness is an exercise, while existence is equivalent to differentiability.

There is a third equivalent formulation of differentiability. We motivate it as follows. Let f be differentiable at a. Define u : I → R by

$$ u(x) = \begin{cases} \dfrac{f(x) - f(a) - f'(a)(x - a)}{x - a}, & \text{if } x \ne a, \\ 0, & \text{if } x = a. \end{cases} $$

Then $\lim_{x \to a} u(x) = \lim_{x \to a} \frac{f(x) - L(x)}{x - a} = 0$, so that u is continuous at a. Moreover, $f(x) = f(a) + f'(a)(x - a) + u(x)(x - a)$. Thus we see that if f is differentiable at a, then f differs from L by a function that tends to zero as x tends to a, even when divided by x − a.

Theorem 20.3. f is differentiable at a if and only if there exist a linear function L(x) =

m(x − a) + b, and a function u(x), such that

(1) u(a) = 0.

(2) u is continuous at a.

(3) f (x) = L(x) + u(x)(x − a).

In this case, $f'(a) = m$ (and of course, b = f(a)).

Proof. The ‘only if’ direction was proved in the remarks before the statement of the theorem.

For the ‘if’ direction, let L and u be as in the statement of the theorem. Letting x = a in the third item of the statement gives f(a) = b. Then dividing by x − a, and letting x → a, we get

$$ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a} \frac{m(x - a) + u(x)(x - a)}{x - a} = \lim_{x \to a} \bigl(m + u(x)\bigr) = m, $$

so f is differentiable at a, with $f'(a) = m$.
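To get a feel for the function u in Theorem 20.3, it can be tabulated for a concrete case. The sketch below is an illustration only: the helper name `remainder_u` and the test function f(x) = x³ at a = 2 (with derivative 12 there) are choices made here. For this f one can check by hand that u(x) = (x − 2)(x + 4), which tends to 0 as x → 2.

```python
def remainder_u(f, fprime_a, a, x):
    """The function u from Theorem 20.3: (f(x) - f(a) - f'(a)(x - a)) / (x - a), and 0 at x = a."""
    if x == a:
        return 0.0
    return (f(x) - f(a) - fprime_a * (x - a)) / (x - a)


def cube(x):
    """Test function f(x) = x**3, with f'(2) = 12."""
    return x ** 3


# Values of u at points approaching a = 2; they shrink roughly like 6 * (x - 2).
u_values = [remainder_u(cube, 12.0, 2.0, 2.0 + h) for h in (0.1, 0.01, 0.001)]
```

The computed values approximate h(6 + h) for h = x − 2, up to floating-point rounding.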

We now present some basic properties of differentiation.

Lemma 20.4. If f is differentiable at a, then f is continuous at a.

Proof.

$$ \lim_{x \to a} f(x) = \lim_{x \to a} \Bigl[ \bigl(f(x) - f(a)\bigr) + f(a) \Bigr] = \lim_{x \to a} \left[ \frac{f(x) - f(a)}{x - a} (x - a) + f(a) \right] = f'(a) \cdot 0 + f(a) = f(a). $$

Lemma 20.5. $\dfrac{d}{dx}(kx + \ell) = k$.

Proof. (exercise)

Lemma 20.6. If f and g are both differentiable at a, then so are f + g, fg, and f/g (if $g(a) \ne 0$), and

$$ (f + g)'(a) = f'(a) + g'(a) $$

$$ (fg)'(a) = f'(a)g(a) + f(a)g'(a) $$

$$ (f/g)'(a) = \frac{f'(a)g(a) - f(a)g'(a)}{g(a)^2}. $$

Proof. (exercises)

Theorem 20.7. (The chain rule.) Let I, J ⊆ R be open, let f : I → R and g : J → R, let

a ∈ I, suppose that f (a) ∈ J, and suppose that f is differentiable at a and g is differentiable

at f(a). Then g ◦ f is differentiable at a, and $(g \circ f)'(a) = g'(f(a)) \, f'(a)$.

Proof. We apply Theorem 20.3 to f and g to obtain functions u : I → R and v : J → R such that

(1) u and v vanish at a and f(a), respectively.

(2) u and v are continuous at a and f(a), respectively.

(3)

$$ f(x) = f(a) + f'(a)(x - a) + u(x)(x - a) $$

$$ g(y) = g(f(a)) + g'(f(a))\bigl(y - f(a)\bigr) + v(y)\bigl(y - f(a)\bigr). $$

Substituting y = f(x) in the second equation (for x ∈ I with f(x) ∈ J), and then using the first, we get

$$ \begin{aligned} g(f(x)) &= g(f(a)) + g'(f(a))\bigl(f(x) - f(a)\bigr) + v(f(x))\bigl(f(x) - f(a)\bigr) \\ &= g(f(a)) + g'(f(a))\bigl[f'(a)(x - a) + u(x)(x - a)\bigr] + v(f(x))\bigl[f'(a)(x - a) + u(x)(x - a)\bigr] \\ &= g(f(a)) + g'(f(a)) f'(a)(x - a) + \Bigl[ g'(f(a)) u(x) + v(f(x)) f'(a) + v(f(x)) u(x) \Bigr](x - a). \end{aligned} $$

Then by Theorem 20.3 it suffices to show that the expression in square brackets vanishes and is continuous at x = a. We check this for each of the three terms separately. It is true for the first term because it is true for u. It is true for the second term because f is continuous at a (by Lemma 20.4) and v is continuous, and vanishes, at f(a), so that v ◦ f is continuous, and vanishes, at a (by Theorem 8.11). It is true for the third term by both of the above.
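A quick numerical sanity check of the chain rule compares a difference quotient of g ◦ f with the product g′(f(a)) f′(a). The sketch below uses f(x) = x² and g(y) = sin y at a = 1, so that the chain rule value is 2 cos 1; these functions and the step size 10⁻⁶ are illustrative assumptions, and the derivative formulas for sin and x² are taken for granted.

```python
import math


def difference_quotient(func, a, h):
    """The difference quotient (func(a + h) - func(a)) / h."""
    return (func(a + h) - func(a)) / h


def f(x):
    return x * x


def g(y):
    return math.sin(y)


def composite(x):
    return g(f(x))


a = 1.0
chain_rule_value = math.cos(f(a)) * (2 * a)          # g'(f(a)) * f'(a)
numeric_value = difference_quotient(composite, a, 1e-6)
```

The two numbers agree to several decimal places; the small gap is the error of the difference quotient, not a defect of the theorem.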

We now draw out some consequences of differentiability on intervals. First we give a

general definition.

Definition 20.8. Let X be a metric space, let U ⊆ X be open, let a ∈ U and let f : U → R.

f has a local maximum (respectively local minimum) at a if there is r > 0 such that for all

x ∈ Br (a) we have f (x) ≤ f (a) (respectively, f (x) ≥ f (a)). Local maxima and minima are

called local extrema.

Lemma 20.9. Let I ⊆ R be an open interval, let a ∈ I, and let f : I → R. Suppose that f

is differentiable at a. If f has a local extremum at a, then $f'(a) = 0$.

Proof. We prove the contrapositive. Suppose that $f'(a) \ne 0$. For definiteness we assume $f'(a) > 0$ (the proof in the case $f'(a) < 0$ is analogous). We then have that $\lim_{x \to a} \bigl(f(x) - f(a)\bigr)/(x - a) > 0$. Then there is δ > 0 such that (a − δ, a + δ) ⊆ I, and such that for x ∈ I, if 0 < |x − a| < δ then $\bigl(f(x) - f(a)\bigr)/(x - a) > 0$. Now, for any x with a − δ < x < a, we have x − a < 0. Since the difference quotient is positive, we must have f(x) − f(a) < 0; thus f does not have a local minimum at a. Similarly, for any x with a < x < a + δ, we have x − a > 0. Again, since the difference quotient is positive, we must have f(x) − f(a) > 0; thus f does not have a local maximum at a. Therefore, f does not have a local extremum at a.

This lemma has several famous applications.

Theorem 20.10. (Rolle’s theorem) Let f : [a, b] → R be continuous, and assume that f is

differentiable on (a, b). Suppose further that f (a) = f (b) = 0. Then there exists c ∈ (a, b)

such that $f'(c) = 0$.

Rolle’s theorem is a special case of the following theorem.

Theorem 20.11. (Mean value theorem) Let f : [a, b] → R be continuous, and assume that f is differentiable on (a, b). Then there exists c ∈ (a, b) such that $f'(c) = \bigl(f(b) - f(a)\bigr)/(b - a)$.

The idea of the theorem, and the proof, is easy to see from a simple sketch: on the graph of f, draw the straight line between the endpoints of the graph, $(a, f(a))$ and $(b, f(b))$. Let

L(x) be the linear function whose graph passes through these two points. The point c in the

theorem is (one of) the place(s) where the vertical distance between the graphs of f and L

is stationary, i.e. has a local extremum. A little algebraic manipulation of the expression

f (x) − L(x) yields the beginning of the following proof.

Proof. Let $h(x) = \bigl(f(x) - f(a)\bigr)(b - a) - \bigl(f(b) - f(a)\bigr)(x - a)$. Then h is continuous on [a, b] and differentiable on (a, b). Also h(a) = h(b) = 0. By the extreme value theorem (Corollary 15.4), h takes on its maximum and minimum values on [a, b]. We note that at least one of these occurs in the interior (a, b). For if both occur at the endpoints, then h must be identically zero, and hence achieves its maximum and minimum at every point of [a, b]. Let c ∈ (a, b) be such a point. By Lemma 20.9 we have $h'(c) = 0$. Differentiating h gives $h'(x) = f'(x)(b - a) - \bigl(f(b) - f(a)\bigr)$. Then the equation $h'(c) = 0$ gives the desired result.
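The theorem only asserts that a point c exists. In a concrete case one can often locate it numerically. The sketch below is an illustration under extra assumptions that the theorem itself does not make: the derivative is supplied explicitly, is continuous, and f′ − slope changes sign on the interval, so that bisection applies. The name `mean_value_point` is hypothetical.

```python
def mean_value_point(fprime, slope, lo, hi, iterations=200):
    """Bisection for c in (lo, hi) with fprime(c) == slope.

    Assumes fprime is continuous and that fprime(lo) - slope and
    fprime(hi) - slope have opposite signs."""
    left = fprime(lo) - slope
    if left * (fprime(hi) - slope) > 0:
        raise ValueError("fprime - slope must change sign on [lo, hi]")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        value = fprime(mid) - slope
        if (value < 0) == (left < 0):
            lo = mid      # same sign as at the left endpoint: move right
        else:
            hi = mid
    return (lo + hi) / 2


# f(x) = x**3 on [0, 2]: the mean slope is (8 - 0) / (2 - 0) = 4 and f'(x) = 3 x**2,
# so the mean value point solves 3 c**2 = 4, that is, c = 2 / sqrt(3).
c = mean_value_point(lambda x: 3 * x * x, 4.0, 0.0, 2.0)
```

Here c ≈ 1.1547 lies strictly inside (0, 2), as the theorem predicts.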

Remark 20.12. There is an alternate phrasing of the mean value theorem that is often

convenient. Let f : I → R be differentiable, where I is an open interval. Let a ∈ I and

h ∈ R \ {0} be such that a + h ∈ I. If we wish to apply the mean value theorem to the closed

interval having a and a + h as endpoints, we would like to express the conclusion without

declaring which is the left, and which the right, endpoint. We avoid this inconvenience in the

following way: the point c lies (strictly) between a and a + h if and only if there is a number

0 < θ < 1 such that c = a + θh. Thus we reexpress the mean value theorem in the following

way: if a, a + h ∈ I then there exists 0 < θ < 1 such that $f(a + h) = f(a) + h f'(a + \theta h)$.

Now we give some corollaries of the mean value theorem.

Corollary 20.13. Let I ⊆ R be an open interval, and let f : I → R be differentiable. If

$f' = 0$ on I, then f is constant on I.

Proof. Let $x_0 \in I$, and apply the mean value theorem to the interval between $x_0$ and x, for any x ∈ I with $x \ne x_0$. We find that there is c strictly between $x_0$ and x such that $f(x) - f(x_0) = f'(c)(x - x_0) = 0$. Thus $f(x) = f(x_0)$ for all x ∈ I.

Corollary 20.14. Let I be as in the previous corollary, and let f , g : I → R be differentiable.

If $f' = g'$ on I, then f − g is a constant function.

Proof. Apply the previous corollary to f − g.

Definition 20.15. Let I ⊆ R be an interval, and let f : I → R. We say that f is increasing (respectively, decreasing) on I if for all x, y ∈ I, if x < y then f(x) ≤ f(y) (respectively, f(x) ≥ f(y)). We say that f is strictly increasing (respectively, strictly decreasing) if the inequalities above involving f are strict rather than weak.

Corollary 20.16. Let I be as in the previous corollaries, and let f : I → R be differentiable.

If $f' \ge 0$ (respectively $f' \le 0$) on I, then f is increasing (respectively, decreasing) on I. If $f' > 0$ (respectively $f' < 0$) on I, then f is strictly increasing (respectively, strictly decreasing) on I.

Proof. We will give the proof in the case that $f' > 0$ on I; the other parts have similar proofs. Let x < y in I. By the mean value theorem there is x < c < y such that $f(y) - f(x) = f'(c)(y - x)$. Since $f'(c) > 0$ and y − x > 0, we get f(y) − f(x) > 0, that is, f(y) > f(x).

Definition 20.17. Let X and Y be metric spaces. A function f : X → Y is called Lipschitz

(on X) if there is a positive constant M such that for all $x_1, x_2 \in X$ we have $d(f(x_1), f(x_2)) \le M \, d(x_1, x_2)$.

Corollary 20.18. Let I and f be as in the previous corollary. Suppose that $f'$ is bounded

on I. Then f is Lipschitz on I.

Proof. Let |f 0 | ≤ M on I. Then for any x, y ∈ I, the mean value

theorem provides

c between

0

x and y such that f (x) − f (y) = f (c)(x − y). It follows that f (x) − f (y) ≤ M |x − y|.
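As a small numerical illustration of Corollary 20.18 (a sketch, not part of the proof), one can sample difference quotients of sin, whose derivative cos is bounded by M = 1. The sampling range, the seed and the helper name `lipschitz_ratio` are assumptions made only for this example.

```python
import math
import random

def lipschitz_ratio(f, x, y):
    """Return |f(x) - f(y)| / |x - y| for x != y."""
    return abs(f(x) - f(y)) / abs(x - y)

rng = random.Random(0)  # fixed seed so the run is reproducible
M = 1.0                 # bound on |cos|, the derivative of sin
worst = 0.0
for _ in range(10_000):
    x = rng.uniform(-10.0, 10.0)
    y = rng.uniform(-10.0, 10.0)
    if x != y:
        worst = max(worst, lipschitz_ratio(math.sin, x, y))

# Every sampled ratio should stay below the derivative bound M.
print(worst <= M + 1e-9)
```

Sampling can only fail to find a violation; it does not prove the Lipschitz bound, which is what the mean value theorem argument does.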

Theorem 20.19. Let I be an open interval, let f : I → R be differentiable, and suppose that f' ≠ 0 on I. Then f(I) is an open interval, and f : I → f(I) is a homeomorphism. Moreover, f^{-1} is differentiable, and

\[ (f^{-1})'(y) = \frac{1}{f'\bigl(f^{-1}(y)\bigr)}. \]

The generalization of this theorem to higher dimensions is a very important result, and

somewhat surprisingly, is much much harder to prove. (We will tackle that next semester.)

In dimension one, the job is easier because the assumption that f' is nonzero means that f is monotone, provided we know that f' > 0 (or < 0) throughout I. If we assume that f is continuously differentiable, then this is immediate: the intermediate value theorem would apply to the continuous function f', and we would know that f' can’t take on both positive and negative values if it is never zero. In the higher dimensional situation we will assume that f is continuously differentiable. However, it is remarkable that in dimension one, the result is true even if f' is not continuous. This is because of the following simple observation: f' satisfies the intermediate value property even if it is not continuous.

Theorem 20.20. Let I be an interval and let f : I → R be differentiable. Let a, b ∈ I, and assume that f'(a) < f'(b). If f'(a) < M < f'(b), then there exists c between a and b such that f'(c) = M.

Proof. We will prove this in the special case where a < b, f'(a) < 0 < f'(b), and M = 0. The general case follows easily from this, and we leave those details as an exercise. Since f'(a) = lim_{h→0} h^{-1}(f(a + h) − f(a)) < 0, there is h > 0 such that a + h < b and h^{-1}(f(a + h) − f(a)) < 0. It follows that f(a + h) < f(a). Therefore the minimum of f on [a, b], which exists because f is continuous on the compact interval [a, b], does not occur at a. A similar argument shows that the minimum does not occur at b. Hence f has a local minimum in the open interval (a, b), and at this point f' = 0.

Proof. (of Theorem 20.19) By Theorem 20.20 we know that f' > 0 on I, or that f' < 0 on I. By Corollary 20.16 it follows that f is strictly monotone on I. It follows from the intermediate value theorem that f(I) is an open interval, and that f^{-1} is continuous. We now show that f^{-1} is differentiable, and compute its derivative. For x ∈ I let y = f(x) ∈ f(I). For w ∈ f(I) with w ≠ y, there is t ∈ I such that w = f(t). Since f is one-to-one, t ≠ x. We have

\[ \frac{f^{-1}(w) - f^{-1}(y)}{w - y} = \frac{t - x}{f(t) - f(x)} = \left( \frac{f(t) - f(x)}{t - x} \right)^{-1}. \]

Since f^{-1} is continuous, lim_{w→y} t = x. Moreover t ≠ x during this limiting process. Therefore

\[ \lim_{w \to y} \frac{f^{-1}(w) - f^{-1}(y)}{w - y} = \lim_{t \to x} \left( \frac{f(t) - f(x)}{t - x} \right)^{-1} = \frac{1}{f'(x)} = \frac{1}{f'\bigl(f^{-1}(y)\bigr)}. \]
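The inverse function formula can be checked numerically on a concrete example. The sketch below assumes the hypothetical choice f(x) = x^3 + x (so f' = 3x^2 + 1 > 0), inverts f by bisection, and compares a symmetric difference quotient of f^{-1} with 1/f'(f^{-1}(y)). The bracket [-10, 10] and the step `dw` are assumptions of the example.

```python
def f(x):
    return x**3 + x          # f'(x) = 3x^2 + 1 > 0, so f is strictly increasing

def f_prime(x):
    return 3 * x**2 + 1

def f_inverse(y, lo=-10.0, hi=10.0, iters=200):
    """Solve f(x) = y by bisection; valid because f is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

y = 10.0                      # f(2) = 10, so f_inverse(10) should be close to 2
x = f_inverse(y)
formula = 1 / f_prime(x)      # 1 / f'(f^{-1}(y)), which is 1/13 here
dw = 1e-6
difference_quotient = (f_inverse(y + dw) - f_inverse(y - dw)) / (2 * dw)
print(x, formula, difference_quotient)
```

The two printed derivative values should agree to several digits; the agreement is limited by the finite step `dw`, not by the theorem.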

Corollary 20.21. If f is C^r (in addition to the hypotheses of the inverse function theorem), then so is f^{-1}.

Proof. The formula for (f^{-1})' shows that it is continuous if f' is continuous. Similarly, it is differentiable if f' is differentiable, etc.

If you consider the function h used in the proof of the mean value theorem, you will notice

the beginnings of some symmetry: the function f and the identity function play opposite

roles. Remarkably, the identity function can be replaced by another function like f . The

result is

Theorem 20.22. (Cauchy mean value theorem.) Let f, g : [a, b] → R be continuous, and differentiable on (a, b). Then there exists c ∈ (a, b) such that (f(b) − f(a)) g'(c) = (g(b) − g(a)) f'(c).

Proof. Let h(t) = (f(b) − f(a))(g(t) − g(a)) − (f(t) − f(a))(g(b) − g(a)). Then h is continuous on [a, b], differentiable on (a, b), and h(a) = h(b) = 0. Now the mean value theorem gives c ∈ (a, b) with h'(c) = 0, which is the result.
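To see the Cauchy mean value theorem in a concrete case, the following sketch assumes f(t) = t^2 and g(t) = t^3 on [0, 1] (choices made only for illustration) and locates the point c by bisection on h', where h is the auxiliary function from the proof. For this pair one can solve by hand: 3c^2 = 2c gives c = 2/3.

```python
a, b = 0.0, 1.0

def f(t):
    return t**2

def g(t):
    return t**3

def df(t):
    return 2 * t

def dg(t):
    return 3 * t**2

def h_prime(t):
    # derivative of h(t) = (f(b)-f(a))(g(t)-g(a)) - (f(t)-f(a))(g(b)-g(a))
    return (f(b) - f(a)) * dg(t) - (g(b) - g(a)) * df(t)

lo, hi = 0.1, 1.0   # h' is negative at 0.1 and positive at 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if h_prime(mid) < 0:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2
print(c)
```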

We apply Cauchy’s mean value theorem to prove L’Hôpital’s rule on the computation of

indeterminate limits. The proof applies to any form of continuous limit — here we phrase

it for one-sided limits.

Theorem 20.23. (L’Hôpital’s rule.) Let f, g : (a, b) → R be differentiable. Suppose that lim_{t→a+} f(t) = lim_{t→a+} g(t) = 0, and that g(t) ≠ 0 on (a, b). If lim_{t→a+} f'(t)/g'(t) = L, then lim_{t→a+} f(t)/g(t) = L.

Proof. Define f(a) = g(a) = 0. Then f and g are continuous on [a, b). By the hypothesis on the limit of f'/g', we are implicitly assuming that g'(t) ≠ 0, at least for all t close enough to a. Replacing b by a smaller value, we may assume that g' ≠ 0 on (a, b). Now, for t ∈ (a, b), we apply Cauchy’s mean value theorem to f and g on the interval [a, t]. Thus there exists c ∈ (a, t) with (f(t) − f(a)) g'(c) = (g(t) − g(a)) f'(c). Since f(a) = g(a) = 0, we get f(t) g'(c) = g(t) f'(c). By hypothesis we have g(t) ≠ 0. Thus we have

\[ \frac{f(t)}{g(t)} = \frac{f'(c)}{g'(c)}. \]

Moreover, a < c < t. Of course, c depends on t, but we see that as t → a+ then also c → a+. Hence

\[ \lim_{t \to a^+} \frac{f(t)}{g(t)} = \lim_{c \to a^+} \frac{f'(c)}{g'(c)} = L. \]
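A quick numerical sanity check of L’Hôpital’s rule, as a sketch: take the illustrative pair f(t) = 1 − cos t and g(t) = t^2 near a = 0 (these functions are assumptions of the example, not taken from the notes). Here f'(t)/g'(t) = sin t / (2t) tends to 1/2, so f(t)/g(t) should approach 1/2 as well.

```python
import math

def f(t):
    return 1.0 - math.cos(t)

def g(t):
    return t * t

def f_prime(t):
    return math.sin(t)

def g_prime(t):
    return 2.0 * t

ts = [10.0 ** (-k) for k in range(1, 4)]          # t = 0.1, 0.01, 0.001
quotients = [f(t) / g(t) for t in ts]              # f/g along t -> 0+
derivative_quotients = [f_prime(t) / g_prime(t) for t in ts]  # f'/g' along t -> 0+
for t, q, dq in zip(ts, quotients, derivative_quotients):
    print(t, q, dq)
```

Much smaller values of t are avoided on purpose: the subtraction 1 − cos t loses floating point precision there, which is a limitation of the arithmetic and not of the theorem.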

With a bit more work, the same result can be proved in the case where we assume lim_{t→a+} f(t) = lim_{t→a+} g(t) = ∞. This is an interesting exercise, or you may look up the proof (e.g. in Rudin). If lim_{t→a+} f(t) = 0 while lim_{t→a+} g(t) = ±∞, evaluating the limit lim_{t→a+} f(t)g(t) presents us with the third kind of indeterminate form, namely 0 · ∞. In this case, we would instead consider the limit of f/(g^{-1}), where g^{-1} = 1/g, which is indeterminate of form 0/0.

21. Higher order derivatives and Taylor’s theorem

If f is differentiable on an open interval I, then f' is itself a function on I. f' need not be continuous; if it is continuous, we say that f is continuously differentiable on I. Even if f is continuously differentiable, f' need not be differentiable. If f is differentiable on I, and if f' is differentiable at a point a ∈ I, we write f''(a) for the derivative of f' at a. (Note that in order to be able to consider whether f''(a) exists, it is necessary that f be differentiable in a neighborhood of a; I is such a neighborhood.) If f' is differentiable on I, we say that f is twice differentiable on I. (Of course, if f is twice differentiable, then f is necessarily continuously differentiable.) In general, if f', f'', ..., f^{(k)} exist on I, we say that f is k-times differentiable (on I). In this case f is necessarily (k − 1)-times continuously differentiable.

Definition 21.1. Let I be an open interval, let a ∈ I, and let f : I → R. Suppose that f is k-times differentiable at a. The kth Taylor polynomial of f at a is

\[ P_k(a, t) = \sum_{j=0}^{k} \frac{f^{(j)}(a)}{j!} (t - a)^j = f(a) + f'(a)(t - a) + \frac{1}{2!} f''(a)(t - a)^2 + \cdots + \frac{1}{k!} f^{(k)}(a)(t - a)^k. \]

Lemma 21.2. Let f be k-times differentiable at a. The kth Taylor polynomial P_k(a, t) has the property that

\[ \frac{d^j}{dt^j} P_k(a, t) \Big|_{t=a} = f^{(j)}(a), \qquad j = 0, \ldots, k. \]

Moreover, no other polynomial of degree k (or less) has this property. (Thus the Taylor polynomial of degree k is the best approximation to f at a among all polynomials of degree less than or equal to k.)

Proof. It is a simple calculation to check that P_k(a, t) has the indicated property. If q(t) = c_0 + c_1(t − a) + · · · + c_k(t − a)^k is a polynomial, then differentiating j times gives q^{(j)}(a) = j! c_j. Thus if q^{(j)}(a) = f^{(j)}(a) for 0 ≤ j ≤ k, then we must have j! c_j = f^{(j)}(a), as required.
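The relation j! c_j = f^{(j)}(a) from the proof translates directly into a short computation. The sketch below builds the coefficients of a Taylor polynomial from a list of derivative values and evaluates it; the function names and the choice of sin at a = 0 are assumptions made for the example.

```python
import math

def taylor_coefficients(derivs):
    """Given [f(a), f'(a), ..., f^(k)(a)], return c_j = f^(j)(a) / j!."""
    return [d / math.factorial(j) for j, d in enumerate(derivs)]

def evaluate(coeffs, a, t):
    """Evaluate sum_j c_j (t - a)^j by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * (t - a) + c
    return result

def derivative_at_center(coeffs, j):
    """j-th derivative at t = a of sum_j c_j (t - a)^j, namely j! * c_j."""
    return math.factorial(j) * coeffs[j]

# Derivatives of sin at 0 cycle through 0, 1, 0, -1.
derivs_sin_at_0 = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
coeffs = taylor_coefficients(derivs_sin_at_0)
p5 = evaluate(coeffs, 0.0, 0.5)
print(p5, math.sin(0.5))
```

The polynomial recovers the prescribed derivatives at the center, and its value at 0.5 is close to sin(0.5); how close is exactly the question Taylor’s theorem answers.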

We see by this lemma that it is easy to find a polynomial that approximates f well at the

point a. It is not as easy to see how well this polynomial approximates f near the point a.

For this, we have Taylor’s Theorem. One can think of it as the generalization of the mean

value theorem from order 0 to order k. The proof is a bit tricky; we will use Cauchy’s mean

value theorem.

Theorem 21.3. (Taylor’s theorem.) Let I be an open interval, let a ∈ I, and let f : I → R. Suppose that f is (k + 1)-times differentiable on I. For t ∈ I there exists c between a and t such that

\[ f(t) = P_k(a, t) + \frac{f^{(k+1)}(c)}{(k + 1)!} (t - a)^{k+1}. \]

Proof. Let R(t) = f(t) − P_k(a, t). (R(t) is sometimes called the kth remainder.) It follows from Lemma 21.2 that R^{(j)}(a) = 0 for 0 ≤ j ≤ k, and we easily see that R^{(k+1)} = f^{(k+1)} on I, since P_k(a, t) has degree at most k in t. Let h(x) = (x − a)^{k+1}. Then h^{(j)}(a) = 0 for 0 ≤ j ≤ k also, and h^{(k+1)} = (k + 1)! identically.

Now let x ∈ I, x ≠ a. We apply the Cauchy mean value theorem to R and h on the interval between a and x: thus there is c_1 between a and x such that

\[ \bigl(R(x) - R(a)\bigr) h'(c_1) = \bigl(h(x) - h(a)\bigr) R'(c_1). \]

Since R(a) = h(a) = 0, and h and h' are nonzero away from a, this gives

\[ \frac{R(x)}{h(x)} = \frac{R'(c_1)}{h'(c_1)}. \]

Now we apply the Cauchy mean value theorem again, to R' and h' on the interval between a and c_1: there is c_2 between a and c_1 such that

\[ \frac{R'(c_1)}{h'(c_1)} = \frac{R''(c_2)}{h''(c_2)} \]

(again using the facts that R'(a) = h'(a) = 0 and h', h'' ≠ 0 away from a). We repeat this process k + 1 times, and we obtain

\[ \frac{R(x)}{h(x)} = \frac{R'(c_1)}{h'(c_1)} = \cdots = \frac{R^{(k+1)}(c_{k+1})}{h^{(k+1)}(c_{k+1})} = \frac{f^{(k+1)}(c_{k+1})}{(k + 1)!}. \]

Unwinding this gives f(x) − P_k(a, x) = \frac{1}{(k+1)!} f^{(k+1)}(c)(x − a)^{k+1}, where c (≡ c_{k+1}) lies between a and x.
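For the exponential function the intermediate point c in Taylor’s theorem can be solved for explicitly, which gives a concrete check of the statement. The sketch below assumes f(x) = e^x and a = 0, so that e^x − P_k(x) = e^c x^{k+1}/(k + 1)!; the helper names are hypothetical.

```python
import math

def exp_taylor(k, x):
    """k-th Taylor polynomial of exp at 0, evaluated at x."""
    return sum(x**j / math.factorial(j) for j in range(k + 1))

def intermediate_point(k, x):
    """Solve e^c / (k+1)! * x^(k+1) = e^x - P_k(x) for c (requires x != 0)."""
    remainder = math.exp(x) - exp_taylor(k, x)
    return math.log(remainder * math.factorial(k + 1) / x ** (k + 1))

c_right = intermediate_point(3, 1.0)    # should lie strictly between 0 and 1
c_left = intermediate_point(3, -1.0)    # should lie strictly between -1 and 0
print(c_right, c_left)
```

The theorem only asserts that some such c exists; for a general f there is no formula for it, and this explicit solution is special to the exponential.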

Corollary 21.4. Let f : I → R be twice differentiable, and assume that f'' ≥ 0 on I. Then at each point of I, the graph of f lies above its tangent line.

Proof. Let a, a + h ∈ I. By Taylor’s theorem there is 0 < θ < 1 such that

\[ f(a + h) = f(a) + f'(a) h + \tfrac{1}{2} f''(a + \theta h) h^2 \ge f(a) + f'(a) h. \]

The last expression is the x_2-coordinate of the point on the tangent line at x_1 = a + h.

Example 21.5. (Polynomial approximation.)

(1) Let f(x) = e^x. Then f^{(j)}(x) = e^x for all j, so f^{(j)}(0) = 1 for all j. Thus P_k(x) = \sum_{j=0}^{k} \frac{1}{j!} x^j is the k-th order Taylor polynomial of f at 0. By Taylor’s theorem, there is c between 0 and x such that

\[ e^x - P_k(x) = \frac{e^c}{(k + 1)!} x^{k+1}. \]

Now fix M > 0. For |x| ≤ M, we have

\[ \bigl| e^x - P_k(x) \bigr| \le e^M \frac{M^{k+1}}{(k + 1)!} \to 0 \quad \text{as } k \to \infty. \]

Thus the Taylor polynomials converge uniformly to e^x on any bounded interval.

(2) Define g : R → R by

\[ g(x) = \begin{cases} e^{-1/x}, & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases} \]

It is a nice exercise to show that g has derivatives of all orders at 0 (this is clear at other points of R), and that g^{(j)}(0) = 0 for all j. Thus all Taylor polynomials of g at 0 are identically zero. Therefore the Taylor polynomials of g do not approximate g uniformly in any neighborhood of zero.
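Both parts of Example 21.5 can be illustrated numerically. The sketch below compares the sampled error of the Taylor polynomials of e^x on [−M, M] with the bound e^M M^{k+1}/(k + 1)!, and then shows that for g the error of the (identically zero) Taylor polynomials on [−r, r] is the same positive number for every k. The grid sizes, M = 2 and r = 0.5 are assumptions of the example, and a grid maximum only approximates a supremum.

```python
import math

def exp_taylor(k, x):
    """k-th Taylor polynomial of exp at 0, evaluated at x."""
    return sum(x**j / math.factorial(j) for j in range(k + 1))

def g(x):
    """The flat function of part (2)."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

# Part (1): sampled sup-error versus the bound from Taylor's theorem.
M = 2.0
grid = [-M + 2 * M * i / 400 for i in range(401)]
results = []
for k in (5, 10, 15):
    sup_error = max(abs(math.exp(x) - exp_taylor(k, x)) for x in grid)
    bound = math.exp(M) * M ** (k + 1) / math.factorial(k + 1)
    results.append((k, sup_error, bound))
    print(k, sup_error, bound)

# Part (2): every Taylor polynomial of g at 0 is zero, so the error on
# [-r, r] is max |g|, independent of k.
r = 0.5
small_grid = [-r + 2 * r * i / 100 for i in range(101)]
flat_error = max(abs(g(x) - 0.0) for x in small_grid)
print(flat_error)
```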

22. The Riemann integral

In this section we will discuss Riemann integration. We gratefully cobbled together this treatment from the ideas of the analogous chapter of Pugh’s book. Pugh’s approach digs a little bit deeper than the usual ones, but it really is worth the extra effort.

What is integration all about? Of course we rely on your previous experience from calculus:

the most basic answer is that we want to find the area of a region bounded by curved lines.

(A region bounded by straight lines can be dealt with entirely by elementary geometry.) Our

definition(s) are based on this idea. The next level of abstraction comes from the fundamental

theorem of calculus: integration is the inverse operation to differentiation. That statement is

a bit glib. After all, the derivative of a function is another function, whereas the integral of

a function is a number. But the statement actually is correct, when it is fleshed out properly

— that is the role of the fundamental theorem. What we take from this, (or rather, what

we imagine that we are explaining to first year calculus students), is that integration is a

function of functions, i.e. a functional:

\[ \int : \{\text{functions}\} \to \mathbf{R}. \]

Among the first properties of integration that are presented in calculus are the “sum” and “scalar multiple” rules:

\[ \int (f + g) = \int f + \int g; \qquad \int (cf) = c \int f. \]

In fact, these are indicating precisely that integration is a linear functional. Linear algebra

is an essential part of modern analysis, and the analysis of linear functionals, functional

analysis, is one of its broadest subdisciplines.

Well, the notion of linear map presupposes the idea of vector spaces: the domain and

codomain of a linear map should be vector spaces. This is a fundamental idea, that is

almost completely lost in a calculus course: the collection of functions that can be integrated

should be a vector space. To be candid, we don’t really talk at all about the “space of

integrable functions” in a calculus course. At best, we try to explain why certain functions

are integrable, e.g. continuous, or piecewise continuous, functions. This time, we will directly

address this question. Not only will we carefully define what integrable means, and prove that

the set of integrable functions is a vector space. We will give an independent characterization

(due to Lebesgue) of exactly which functions are integrable. This is useful even just in the

context of Riemann integration. Many important results that would otherwise require fussy

proofs will become effortless (so to speak). But it also prods us to a larger view. Once we

are able to see the space of Riemann integrable functions as a whole, we can also begin to

see its limitations, and where it might give way to generalization. In the next semester we

will spend some time (how much???) exploring Lebesgue’s version of integration.

That is the end of the “introduction”. We have to get started, and the beginning is

very basic — after all, integration is just a lot of arithmetic. We will follow Pugh’s idea

of emphasizing the fact that there are two usual ways to present the integral; he refers to

them as the Riemann and the Darboux approaches. Without any expertise in the history of

mathematics, or any effort at tracking down that history, we will just adopt this terminology.

First we give the Riemann approach. We let f be a real-valued function on a compact interval

[a, b].

Definition 22.1. A partition of [a, b] is a finite set P ⊆ [a, b] such that a, b ∈ P .

The idea of a partition is that it defines a subdivision of [a, b] into a finite number of

subintervals. The easiest way of indicating this is by giving the set of endpoints of the

subintervals, which is what our definition does. We usually write a partition in the form

P = {x_0, x_1, ..., x_n},

where a = x_0 < x_1 < · · · < x_n = b. This is a slight abuse of notation, since the definition of P as a set does not indicate that the numbers in the set are given in (strictly) increasing order. From the partition P we obtain n subintervals of [a, b]: [x_0, x_1], ..., [x_{n−1}, x_n]. Note that the number n associated with P is obtained from the relation n + 1 = #(P). We use the term mesh for the length of the largest subinterval: mesh(P) = max_{1≤i≤n} (x_i − x_{i−1}).

The mesh is a rough sort of description of how fine the partition is.

Definition 22.2. A partition pair is a partition P together with a list T = (t_1, ..., t_n) such that x_{i−1} ≤ t_i ≤ x_i for 1 ≤ i ≤ n.

Thus the list T consists of a selection of one element from each subinterval of the partition.

Definition 22.3. Let f : [a, b] → R, and let (P, T) be a partition pair for the interval [a, b]. The Riemann sum associated to this data is the number

\[ R(f, P, T) = \sum_{i=1}^{n} f(t_i) \Delta x_i, \]

where Δx_i = x_i − x_{i−1}, the length of the ith subinterval.
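The definitions of partition, mesh, partition pair and Riemann sum are easy to mirror in code. The following is a minimal sketch; representing P and T as Python lists, and the function names, are assumptions of the example.

```python
def mesh(P):
    """Length of the largest subinterval of the partition P (a sorted list)."""
    return max(P[i] - P[i - 1] for i in range(1, len(P)))

def riemann_sum(f, P, T):
    """R(f, P, T) = sum_i f(t_i) * (x_i - x_{i-1}) for a partition pair (P, T)."""
    assert len(T) == len(P) - 1
    assert all(P[i - 1] <= T[i - 1] <= P[i] for i in range(1, len(P)))
    return sum(f(T[i - 1]) * (P[i] - P[i - 1]) for i in range(1, len(P)))

P = [0.0, 0.25, 0.5, 1.0]     # partition of [0, 1] with n = 3 subintervals
T = [0.0, 0.5, 0.75]          # one tag in each subinterval
value = riemann_sum(lambda x: x * x, P, T)
print(mesh(P), value)
```

With these inputs the sum is 0 · 0.25 + 0.25 · 0.25 + 0.5625 · 0.5, a crude approximation to the area under x^2 on [0, 1].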

Now we have the terminology we need to define Riemann integrability and the Riemann

integral. As mentioned above, Riemann sums are just a lot of (carefully organized) arithmetic. To pass to the integral is a limiting process. The following definition is the usual

notion of limit, but is based on the mesh.

Definition 22.4. The function f : [a, b] → R is Riemann integrable if there is a number L such that for every ε > 0, there exists δ > 0 such that for every partition pair (P, T) of [a, b], if mesh(P) < δ then |R(f, P, T) − L| < ε.

We write L = lim_{mesh(P)→0} R(f, P, T) to indicate this limit. The number L is unique, if it exists. This is proved in the usual way of limits, and is left to you as an exercise. If f is Riemann integrable, we write ∫_a^b f (or ∫_a^b f dx, or ∫_a^b f(x) dx, or just ∫ f) for the number L.

We will write R[a, b] for the set of all Riemann integrable functions on [a, b].

There is an important detail hidden in the last definition. For the limit to exist it must

be the case that the approximation holds independently of the choice of the list T in the

partition pair. In other words, if P is a partition with mesh(P ) < δ, then the Riemann sum

is within ε of L for any choice of T .
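The independence from the choice of T can be observed experimentally. The sketch below draws random partitions of [0, 1] with random tags and compares the Riemann sums of x^2 with 1/3. For this particular f one has |R(f, P, T) − 1/3| ≤ 2 · mesh(P), because |f'| ≤ 2 on [0, 1]; that bound, the seed and the helper names are assumptions of the example.

```python
import random

def riemann_sum(f, P, T):
    """Sum of f(t_i) times the length of the i-th subinterval."""
    return sum(f(t) * (P[i + 1] - P[i]) for i, t in enumerate(T))

def random_partition_pair(a, b, n, rng):
    """A partition of [a, b] with n subintervals and one random tag in each."""
    interior = sorted(rng.uniform(a, b) for _ in range(n - 1))
    P = [a] + interior + [b]
    T = [rng.uniform(P[i], P[i + 1]) for i in range(n)]
    return P, T

rng = random.Random(0)        # fixed seed for reproducibility
f = lambda x: x * x
results = []
for n in (10, 100, 1000, 10000):
    P, T = random_partition_pair(0.0, 1.0, n, rng)
    mesh = max(P[i + 1] - P[i] for i in range(n))
    error = abs(riemann_sum(f, P, T) - 1.0 / 3.0)
    results.append((mesh, error))
    print(n, mesh, error)
```

However the tags are drawn, the error is controlled by the mesh alone, which is the content of the remark above.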

We now give some consequences of the definition.

Theorem 22.5. If f is Riemann integrable then f is bounded.

Proof. We apply the definition of integrability with ε = 1: there exist L and δ > 0 such that if P is any partition with mesh(P) < δ, then |R(f, P, T) − L| < 1. (As we mentioned above, this estimate holds for any choice of T.) Fix such a partition P = {x_0, x_1, ..., x_n}. It follows from the triangle inequality that

\[ \Bigl| \sum_{i=1}^{n} f(t_i) \Delta x_i \Bigr| < 1 + |L|. \]

We will show that f is bounded on each subinterval of [a, b] defined by P. It will then follow that f is bounded on [a, b]. Fix i_0 ∈ {1, 2, ..., n}. For i ≠ i_0 choose t_i ∈ [x_{i−1}, x_i]. For any t ∈ [x_{i_0−1}, x_{i_0}] we apply the above inequality to the list T = (t_1, ..., t_{i_0−1}, t, t_{i_0+1}, ..., t_n):

\[ \bigl| f(t) \Delta x_{i_0} \bigr| - \Bigl| \sum_{i \ne i_0} f(t_i) \Delta x_i \Bigr| \le \bigl| R(f, P, T) \bigr| < 1 + |L|. \]

We find that

\[ |f(t)| \le (\Delta x_{i_0})^{-1} \Bigl( 1 + |L| + \Bigl| \sum_{i \ne i_0} f(t_i) \Delta x_i \Bigr| \Bigr). \]

Thus the right hand side is an upper bound for |f| on [x_{i_0−1}, x_{i_0}].

Theorem 22.6. R[a, b] is a vector space, and integration defines a linear functional on it.

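Proof. We note that for a fixed partition pair (P, T), the Riemann sum is linear in f:

\[ \begin{aligned} R(cf + g, P, T) &= \sum_i (cf + g)(t_i) \Delta x_i \\ &= \sum_i \bigl( c f(t_i) + g(t_i) \bigr) \Delta x_i \\ &= c \sum_i f(t_i) \Delta x_i + \sum_i g(t_i) \Delta x_i \\ &= c R(f, P, T) + R(g, P, T). \end{aligned} \]

Since addition and multiplication in R are continuous, we get

\[ \lim_{\mathrm{mesh}(P) \to 0} R(cf + g, P, T) = \lim_{\mathrm{mesh}(P) \to 0} \bigl( c R(f, P, T) + R(g, P, T) \bigr) = c \lim_{\mathrm{mesh}(P) \to 0} R(f, P, T) + \lim_{\mathrm{mesh}(P) \to 0} R(g, P, T). \]

Thus cf + g is Riemann integrable whenever f and g are; hence R[a, b] is a vector space. Moreover the above calculation shows that ∫(cf + g) = c ∫ f + ∫ g, i.e. that integration is a linear functional on R[a, b].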

The following example and theorem are easy exercises using the definition of integrability.

Example 22.7. The constant function 1 is Riemann integrable, and ∫_a^b 1 = b − a.

Theorem 22.8. Let f, g ∈ R[a, b] with f ≤ g. Then ∫ f ≤ ∫ g. If |f| ≤ M on [a, b], then |∫_a^b f| ≤ M(b − a).

23. The “Darboux” approach

We now discuss the second way of defining the Riemann integral, which we call the Darboux method. Again, we need some preliminaries. Notice that for this method we must assume that the function is bounded.

Definition 23.1. Let f : [a, b] → R be a bounded function, and let P = {x_0, x_1, ..., x_n} be a partition of [a, b]. We define

\[ m_i = \inf_{x_{i-1} \le t \le x_i} f(t), \qquad M_i = \sup_{x_{i-1} \le t \le x_i} f(t), \]

\[ L(f, P) = \sum_i m_i \Delta x_i, \qquad U(f, P) = \sum_i M_i \Delta x_i. \]

These are referred to as lower and upper sums. Notice that for any partition pair (P, T) we have that L(f, P) ≤ R(f, P, T) ≤ U(f, P). Finally we define

\[ \underline{I}(f) = \sup_P L(f, P), \qquad \overline{I}(f) = \inf_P U(f, P). \]

These are referred to as the lower and upper integrals of f on [a, b]. It is standard to write \underline{\int_a^b} f for \underline{I}(f), and \overline{\int_a^b} f for \overline{I}(f). Finally, we say that f is Darboux integrable on [a, b] if \underline{I} = \overline{I}, and in this case the common value is called the (Darboux) integral.
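Lower and upper sums are hard to compute for a general bounded function, since they involve infima and suprema. For an increasing function, however, the infimum on each subinterval is attained at the left endpoint and the supremum at the right endpoint. The sketch below uses that special case (an assumption of the example, as is the choice f(x) = x^2 on [0, 1]).

```python
def darboux_sums_increasing(f, P):
    """Lower and upper sums of an increasing f on the partition P.

    For increasing f, m_i = f(x_{i-1}) and M_i = f(x_i).
    """
    lower = sum(f(P[i - 1]) * (P[i] - P[i - 1]) for i in range(1, len(P)))
    upper = sum(f(P[i]) * (P[i] - P[i - 1]) for i in range(1, len(P)))
    return lower, upper

def square(x):
    return x * x              # increasing on [0, 1]

gaps = {}
for n in (4, 16, 64):
    P = [i / n for i in range(n + 1)]   # uniform partition of [0, 1]
    lower, upper = darboux_sums_increasing(square, P)
    gaps[n] = (lower, upper)
    print(n, lower, upper, upper - lower)
```

On a uniform partition the difference U − L telescopes to (f(1) − f(0))/n = 1/n, so the lower and upper sums squeeze the common value 1/3 as n grows.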

Our goal for this section is to prove that the Riemann and Darboux approaches yield the

same result. Before doing this we need to talk a bit about refinements of partitions, and

their effect on upper and lower sums and integrals.

Definition 23.2. Let P and P' be partitions of [a, b]. We say that P' refines P if P ⊆ P'.

It is easy to see that P' refines P if and only if every subinterval associated to P' is contained in one of the subintervals associated to P.

Lemma 23.3. (Refinement Principle) Let P' refine P. Then L(f, P) ≤ L(f, P') and U(f, P') ≤ U(f, P).

In other words, refining the partition causes the lower sum to increase, and the upper sum to decrease. The idea of the proof is to proceed from P to P' by adding one point at a time.

Then the change in the lower and upper sums happens on only one subinterval of P . We

leave as an exercise the writing of a precise proof.

In general, if P1 and P2 are two partitions of [a, b], then neither one need refine the other.

Thus there is in general no relation between the upper and lower sums for two partitions.

However, P1 and P2 always have a common refinement; for example, P1 ∪ P2 contains both

P1 and P2 . This device gives us the following important result: every lower sum for f is less

than or equal to every upper sum for f .

Lemma 23.4. Let P_1 and P_2 be two partitions of [a, b]. Then L(f, P_1) ≤ U(f, P_2).

Proof. Let P' be a common refinement of P_1 and P_2, for example P' = P_1 ∪ P_2. By the refinement principle,

\[ L(f, P_1) \le L(f, P') \le U(f, P') \le U(f, P_2). \]
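The chain of inequalities in the proof can be checked on two partitions neither of which refines the other. As before, the sketch restricts to an increasing function so that the sums are computable from endpoint values; the partitions and names are assumptions of the example.

```python
def darboux_sums_increasing(f, P):
    """Lower and upper sums of an increasing f (inf at left, sup at right endpoint)."""
    lower = sum(f(P[i - 1]) * (P[i] - P[i - 1]) for i in range(1, len(P)))
    upper = sum(f(P[i]) * (P[i] - P[i - 1]) for i in range(1, len(P)))
    return lower, upper

def square(x):
    return x * x

P1 = [0.0, 0.5, 1.0]
P2 = [0.0, 0.25, 0.75, 1.0]             # neither P1 nor P2 contains the other
common = sorted(set(P1) | set(P2))      # the common refinement P1 ∪ P2

lower_P1, upper_P1 = darboux_sums_increasing(square, P1)
lower_P2, upper_P2 = darboux_sums_increasing(square, P2)
lower_c, upper_c = darboux_sums_increasing(square, common)
print(lower_P1, lower_c, upper_c, upper_P2)
```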

We now give a Cauchy type characterization of Darboux integrability.

Corollary 23.5. f is Darboux integrable on [a, b] if and only if for every ε > 0 there is a

partition P of [a, b] such that U (f, P ) − L(f, P ) < ε.

Proof. The forward direction follows easily from the definition, and we leave it as an exercise to write it out carefully. For the reverse direction, suppose that the Cauchy condition holds. We must show that \underline{I} = \overline{I}. We already know that \underline{I} ≤ \overline{I}. Let ε > 0. Choose a partition P such that U(f, P) − L(f, P) < ε. Then

\[ \overline{I} \le U(f, P) < L(f, P) + \varepsilon \le \underline{I} + \varepsilon. \]

This is true for every choice of ε, and hence \overline{I} ≤ \underline{I}.

(This Cauchy type condition corresponds to a kind of limit. The limiting process going

on here is that the partition becomes finer and finer, in the sense of refinement. This is a

different kind of limit than the others we have seen. Until now we have seen limits based on

a totally ordered set; for example, n → ∞ in N, t → t0 in R, or t → ∞ in R. The limit taken

as a partition of [a, b] becomes finer and finer is based on a partially ordered set, namely, the

set of partitions ordered by refinement. It isn’t hard to get used to this notion, and we may write

\[ \underline{I} = \lim_{P \to \infty} L(f, P), \qquad \overline{I} = \lim_{P \to \infty} U(f, P). \]

If f is Darboux integrable, then we have ∫ f = lim_{P→∞} L(f, P) = lim_{P→∞} U(f, P).)

We are now ready to prove the main theorem of this section.

Theorem 23.6. Let f : [a, b] → R. Then f is Riemann integrable if and only if f is Darboux

integrable. For an integrable function, the two integrals coincide.

Proof. We first assume that f is Riemann integrable. Then f is bounded, by Theorem 22.5. Let ε > 0. There exist a number L and δ > 0 such that if P is any partition with mesh(P) < δ, then for any list T associated to P we have |R(f, P, T) − L| < ε. Fix any partition P with mesh(P) < δ. Then we have (for any T)

\[ L - \varepsilon < R(f, P, T) < L + \varepsilon. \]

Recall that for any partition pair (P, T), we have L(f, P) ≤ R(f, P, T) ≤ U(f, P). Moreover, it is easy to see that

\[ L(f, P) = \inf_T R(f, P, T), \qquad U(f, P) = \sup_T R(f, P, T). \]

It follows that

\[ L - \varepsilon \le L(f, P), \qquad L + \varepsilon \ge U(f, P). \]

Therefore U(f, P) − L(f, P) ≤ 2ε. Since ε was arbitrary, f is Darboux integrable by Corollary 23.5.

Now we assume that f is Darboux integrable. The proof of this direction is a bit trickier than the other one. In particular, it relies upon the standard technique of dividing the sum into two kinds of terms, and estimating them differently. Since f is bounded, there is K such that |f| ≤ K on [a, b]. Let L = ∫_a^b f (the Darboux integral of f). Let ε > 0. Choose a partition P such that

\[ U(f, P) - L(f, P) < \varepsilon. \]

Write P = {x_0, x_1, ..., x_n}. Set δ = ε/n. We will show that if (Q, T) is any partition pair with mesh(Q) < δ, then |R(f, Q, T) − L| < (2K + 1)ε, proving Riemann integrability (and also showing that the two integrals coincide). In fact, it will suffice to show that U(f, Q) − L(f, Q) < (2K + 1)ε, since both L and R(f, Q, T) lie between the lower and upper sums.

So let Q = {y_0, y_1, ..., y_k} have mesh less than δ. We will write I_i = [x_{i−1}, x_i] for 1 ≤ i ≤ n, and J_j = [y_{j−1}, y_j] for 1 ≤ j ≤ k. We divide the subintervals associated to Q into two groups as follows:

\[ S_1 = \{ j : \text{there exists } i \text{ with } x_i \in \operatorname{int}(J_j) \}, \qquad S_2 = \{1, 2, \ldots, k\} \setminus S_1. \]

Thus S_2 indicates those J_j’s that are entirely contained in one of the I_i’s; S_1 indicates those J_j’s that straddle more than one of the I_i’s. There are at most n elements in S_1 (in fact, there are at most n − 1). Now we will use m(I) and M(I) for the infimum and supremum of f over an interval I. For j ∈ S_1 we have

\[ -K \le m(J_j) \le M(J_j) \le K. \]

For j ∈ S_2 there is i such that J_j ⊆ I_i. Then

\[ m(I_i) \le m(J_j) \le M(J_j) \le M(I_i). \]

Hence for this j and i we have

\[ M(J_j) - m(J_j) \le M(I_i) - m(I_i). \]

Now we estimate:

\[ \begin{aligned} U(f, Q) - L(f, Q) &= \sum_{j=1}^{k} \bigl( M(J_j) - m(J_j) \bigr) \Delta y_j \\ &= \sum_{j \in S_1} \bigl( M(J_j) - m(J_j) \bigr) \Delta y_j + \sum_{j \in S_2} \bigl( M(J_j) - m(J_j) \bigr) \Delta y_j \\ &\le \sum_{j \in S_1} 2K \Delta y_j + \sum_{i=1}^{n} \sum_{\substack{j \in S_2 \\ J_j \subseteq I_i}} \bigl( M(I_i) - m(I_i) \bigr) \Delta y_j \\ &< 2Kn\delta + \sum_{i=1}^{n} \bigl( M(I_i) - m(I_i) \bigr) \Delta x_i \\ &= 2K\varepsilon + U(f, P) - L(f, P) \\ &< (2K + 1)\varepsilon. \end{aligned} \]

There are various situations where it is fairly easy to prove integrability (or non-integrability) using the Darboux definition. These are useful exercises in working with the definition.

In the next section we will prove a deep theorem that will make them trivial to verify.

Example 23.7. (1) Continuous functions are Riemann integrable.

(2) Monotone functions are Riemann integrable.

(3) Step functions are Riemann integrable. (A step function on [a, b] is a function for which there exists a partition of [a, b] such that the function is constant on the interior of each subinterval.) In particular, the characteristic function χ_{[c,d]} of a subinterval [c, d] of [a, b] is Riemann integrable over [a, b], where χ_E(x) = 1 if x ∈ E and = 0 if x ∉ E.

(4) More generally, a bounded function that is continuous at all but finitely many points

of [a, b] is Riemann integrable.

(5) The characteristic function of Q is not Riemann integrable over any interval.

24. Measure zero and integration

In order to characterize intrinsically the property of being Riemann integrable, we need to

develop the concept of sets of measure zero. This is the first step in what is called measure

theory (which I hope to cover a bit more fully next semester). Riemann integration is built

around the elementary concept of length of an interval. It is natural to consider the “length”

of a finite union of intervals, but the question of measuring more complicated subsets of R

is not addressed in calculus. Nevertheless, this is a very important problem, resolved by

Lebesgue in the first part of the twentieth century. One of his great insights is the main

theorem below.

Definition 24.1. Let E ⊆ R. E has measure zero if for every ε > 0 there exist open intervals U_1, U_2, ..., such that

(1) E ⊆ ⋃_{i=1}^∞ U_i.

(2) ∑_{i=1}^∞ |U_i| < ε.

In this definition we write |U_i| for the length of the interval U_i. We expect that the notion of a convergent sum of positive real numbers is familiar from a previous course (even though we will review this idea later this semester). In any case, the definition is unchanged if we demand instead of (2) that ∑_{i=1}^n |U_i| < ε for all n. Here are some examples of sets of measure zero.

Example 24.2. (1) Finite sets have measure zero. (In fact, only finitely many open

intervals are necessary in this case.)

(2) Countable sets have measure zero.

Proof. This is a convenient place to introduce the “ε/2^n trick”. Let E = {x_1, x_2, ...}. Let ε > 0 be given, and set U_i = (x_i − ε/2^{i+2}, x_i + ε/2^{i+2}), for i = 1, 2, .... Then |U_i| = ε/2^{i+1}, so that ∑_i |U_i| = ε/2 < ε. It is obvious that E ⊆ ⋃_i U_i.

(3) Q has measure zero.

(4) A subset of a set of measure zero has measure zero.

(5) A countable union of sets of measure zero has measure zero.

(6) The definition is unchanged if arbitrary intervals (open, closed, or half-open) are used

instead of open intervals.

Proof. Proofs for the previous three assertions are left as exercises.

(7) The Cantor set C has measure zero.

Proof. Recall from our construction of C that C = ⋂_{n=1}^∞ F_n, where F_n is the union of 2^n closed intervals, each of length 3^{-n}. Stretching each of these a little bit, we can produce 2^n open intervals U_i each having length less than (2.5)^{-n} and having union containing F_n (and hence C). Then ∑_{i=1}^{2^n} |U_i| < (2/2.5)^n, which tends to zero as n → ∞.

(8) If a < b then [a, b] does not have measure zero. This is a good exercise, even if it

isn’t homework (but it might be).
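The “ε/2^n trick” from item (2) can be carried out exactly with rational arithmetic. The sketch below lists finitely many rationals in [0, 1] (a stand-in for an enumeration of a countable set), covers the i-th point by an interval of length ε/2^{i+1}, and checks that the total length stays below ε/2. The enumeration order, the cutoff on denominators and the function names are assumptions of the example.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_denominator):
    """List the distinct rationals p/q in [0, 1] with q <= max_denominator."""
    seen = []
    for q in range(1, max_denominator + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.append(r)
    return seen

def cover(points, eps):
    """Interval i (1-based) is centered at x_i with half-width eps / 2^(i+2)."""
    return [(x - eps / 2 ** (i + 2), x + eps / 2 ** (i + 2))
            for i, x in enumerate(points, start=1)]

eps = Fraction(1, 100)
points = rationals_in_unit_interval(12)
intervals = cover(points, eps)
total_length = sum(hi - lo for lo, hi in intervals)
print(len(points), float(total_length), float(eps / 2))
```

No matter how many points are listed, the total length never reaches ε/2; this is why the same construction works for the full countable set.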

Before stating the main theorem, we recall the notion of oscillation of a function at a

point. The definition makes sense for a function between general metric spaces, but for

clarity we will state it only for functions whose codomain is R.

Definition 24.3. Let X be a metric space, and let f : X → R. Let a ∈ X. The oscillation of f at a is

\[ \operatorname{osc}(f, a) = \inf_{r > 0} \Bigl( \sup_{x, y \in B_r(a)} \bigl| f(x) - f(y) \bigr| \Bigr). \]

This is the precise description of a very natural idea. Let’s briefly take the definition apart.

Fix r > 0. This defines an open ball about a. How much can the function vary over this

ball? The supremum in the parentheses is exactly how much. If we let r become smaller,

then the ball becomes smaller, so that there are fewer points in the ball to put inside of f .

Thus as r decreases, the supremum also decreases. In fact, the infimum over r is actually

equal to the limit as r → 0. This limiting value is the minimum amount that f can be made

to jump, no matter to how small a ball (centered at a) you confine its argument. That is

what we mean by the oscillation at a.
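The oscillation can be estimated numerically by sampling f on shrinking intervals around a. The sketch below does this for functions on R; since a grid maximum only bounds the true supremum from below, the result is an estimate and not a computation of osc(f, a). The radii, sample count and names are assumptions of the example.

```python
import math

def sup_variation(f, a, r, samples=2001):
    """Approximate sup |f(x) - f(y)| over x, y in (a - r, a + r) using a grid.

    The supremum of |f(x) - f(y)| equals sup f - inf f on the ball.
    """
    xs = [a - r + 2 * r * (i + 0.5) / samples for i in range(samples)]
    values = [f(x) for x in xs]
    return max(values) - min(values)

def oscillation_estimate(f, a, radii=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Approximate the infimum over r by the minimum over a few small radii."""
    return min(sup_variation(f, a, r) for r in radii)

def step(x):
    return 1.0 if x >= 0 else 0.0

print(oscillation_estimate(step, 0.0))       # jump of size 1 at 0
print(oscillation_estimate(step, 1.0))       # step is constant near 1
print(oscillation_estimate(math.sin, 0.0))   # small, since sin is continuous
```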

We can think of the oscillation of f at a as a measure of the size of the discontinuity of f

at a. That is an interpretation of the first part of the following lemma (which should have

been homework earlier in the semester).

Lemma 24.4. Let X be a metric space, let f : X → R, and let a ∈ X.

(1) f is continuous at a if and only if osc(f, a) = 0.

(2) For c > 0, {x ∈ X : osc(f, x) ≥ c} is a closed set.

Theorem 24.5. Let f : [a, b] → R be bounded. Let E be the set of points in [a, b] where f is discontinuous. Then f is Riemann integrable if and only if E has measure zero.

Proof. We first assume that f is Riemann integrable. Let E_n = {x ∈ [a, b] : osc(f, x) ≥ 1/n}. By Lemma 24.4 (1), we know that E = ⋃_{n=1}^∞ E_n. Thus it suffices to show that E_n has measure zero for each n. So now fix n, let ε > 0, and choose a partition P such that U(f, P) − L(f, P) < ε/n. Let

\[ S = \{ I : I \text{ is a subinterval of } P, \text{ and } \operatorname{int}(I) \cap E_n \ne \emptyset \}. \]

For I ∈ S we have that M(I) − m(I) ≥ 1/n. (The reason is that there must exist a point a ∈ E_n in the interior of I, so that I ⊇ B_r(a) for some r > 0.) But now we estimate

\[ \sum_{I \in S} \frac{1}{n} |I| \le U(f, P) - L(f, P) < \frac{\varepsilon}{n}, \]

so that ∑_{I∈S} |I| < ε. Now the union ⋃_{I∈S} int(I) contains all points of E_n except possibly some of the points of P. There can be only finitely many such points. Let T be a collection of open intervals centered at the points of P with total length so small that ∑_{I∈S} |I| + ∑_{J∈T} |J| < ε. Then {int(I) : I ∈ S} ∪ T is a finite collection of open intervals covering E_n and having total length less than ε. Therefore E_n has measure zero.

Now we prove the converse. Suppose that $E$ has measure zero. Let $|f| \le K$ on $[a,b]$, and let $\varepsilon > 0$ be given. Let $E_0 = \{x \in [a,b] : \operatorname{osc}(f,x) \ge \varepsilon\}$. Then $E_0 \subseteq E$, so that $E_0$ also has measure zero. Let $U_1, U_2, \dots$ be open intervals such that $E_0 \subseteq \bigcup_{i=1}^{\infty} U_i$ and $\sum_{i=1}^{\infty} |U_i| < \varepsilon$. By Lemma 24.4(2), $E_0$ is closed. Since $E_0 \subseteq [a,b]$, $E_0$ is compact. Thus there is $n$ such that $E_0 \subseteq U_1 \cup \dots \cup U_n$. Let $P_0 = \bigl( \bigcup_{i=1}^{n} \partial U_i \cap [a,b] \bigr) \cup \{a,b\}$, a partition of $[a,b]$. We will find a suitable refinement $P$ of $P_0$ such that $U(f,P) - L(f,P) < (2K + b - a)\varepsilon$, which will conclude the proof. Since $P_0$ contains the endpoints of the $U_i$'s, each subinterval associated to $P_0$ is either contained in some $U_i$, or is disjoint from all of the $U_i$'s. Let $S_1$ denote the collection of those subintervals that are contained in some $U_i$, and let $S_2$ denote the remaining subintervals. Then for $I \in S_1$ we have
\[ M(I) - m(I) \le 2K. \]
Hence
\[ \sum_{I \in S_1} \bigl( M(I) - m(I) \bigr) |I| \le 2K \sum_{i=1}^{n} |U_i| < 2K\varepsilon. \]

Now consider a subinterval $I \in S_2$. Then $I \cap E_0 = \emptyset$, so the oscillation of $f$ at each point of $I$ is less than $\varepsilon$. Thus for each $x \in I$ there is an open interval $I_x$ centered at $x$ such that $M(I_x) - m(I_x) < \varepsilon$. The collection $\{I_x : x \in I\}$ is an open cover of the compact interval $I$, hence has a finite subcover: there are $x_1, \dots, x_k \in I$ such that $I \subseteq \bigcup_{i=1}^{k} I_{x_i}$. We define $P$ by including into $P_0$ all endpoints of the $I_{x_i}$ that lie in $I$:
\[ P = P_0 \cup \bigcup_{I \in S_2} \bigcup_{i} \bigl( (\partial I_{x_i}) \cap I \bigr). \]
Let us consider the subintervals of $P$ contained in some $I \in S_2$; let $J$ be one such. Then $J \subseteq I_{x_i}$ for some $i$, and hence $M(J) - m(J) < \varepsilon$. Therefore
\[ \sum_{I \in S_2} \sum_{J \subseteq I} \bigl( M(J) - m(J) \bigr) |J| < \sum_{I \in S_2} \sum_{J \subseteq I} \varepsilon |J| = \varepsilon \sum_{I \in S_2} |I| \le \varepsilon (b - a). \]

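We now have
\[ U(f,P) - L(f,P) = \sum_{I \in S_1} \bigl( M(I) - m(I) \bigr) |I| + \sum_{I \in S_2} \sum_{J \subseteq I} \bigl( M(J) - m(J) \bigr) |J| < 2K\varepsilon + \varepsilon(b-a) = (2K + b - a)\varepsilon. \]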

We now give several consequences of this theorem. These can be proved directly from

the Riemann or Darboux definitions, but are deduced much more easily from the above

characterization. The first of these is immediate from the theorem.

Corollary 24.6. A bounded function with only finitely many discontinuities is Riemann

integrable. In particular, a piecewise continuous function is Riemann integrable.

Corollary 24.7. A function that is zero except at finitely many points is Riemann integrable.

Moreover, the integral of such a function equals 0.


Proof. The integrability follows from the previous corollary. It is easy to use the definition

of the integral to show that the integral is zero.

Corollary 24.8. Riemann integrability, and the value of the Riemann integral, of a function

are unaffected when the function is altered at finitely many points.

Proof. The altered function equals the sum of the original function with a function that is

zero except at finitely many points. Thus the previous corollary, together with linearity of

the integral, give the result.

Corollary 24.9. Monotone functions are Riemann integrable.

Proof. This follows from the fact that a monotone function has countably many discontinuities. To see this, note that a monotone function has one-sided limits at all points, and is discontinuous at a point if and only if the two one-sided limits at that point are distinct. If we let
\[ f(x\pm) = \lim_{t \to x^{\pm}} f(t), \]
then for any $x \ne y$ we have $\bigl( f(x-), f(x+) \bigr) \cap \bigl( f(y-), f(y+) \bigr) = \emptyset$. Thus if we let $q(x)$ be a rational number in the interval $\bigl( f(x-), f(x+) \bigr)$ for each discontinuity $x$ of $f$, then $q$ is a one-to-one function from the set of discontinuities into $\mathbb{Q}$. Therefore the set of discontinuities is countable, and hence of measure zero.

Corollary 24.10. The product of Riemann integrable functions is Riemann integrable.

Proof. The set of discontinuities of f g is contained in the union of the sets of discontinuities

of f and g separately.

Corollary 24.11. Let f be Riemann integrable on [a, b], and let ϕ be a continuous function

defined on the range of f . Then ϕ ◦ f is Riemann integrable (also on [a, b]).

Proof. Since composition preserves continuity, the set of points where f is continuous is

contained in the set of points where ϕ ◦ f is continuous. Hence the sets of discontinuities

satisfy the reverse containment.

Remark 24.12. The order in which the two functions are composed in the previous corollary

is crucial: f ◦ ϕ need not be integrable. (You can remember which order preserves integrabi-

lity by noting that in the corollary, the composition has the same domain as the integrable

function.)

Corollary 24.13. If f is Riemann integrable, then so is |f |.

Proof. |f | = | · | ◦ f .

Corollary 24.14. Let $f$ be Riemann integrable on $[a,b]$, and let $[c,d] \subseteq [a,b]$. Then $f$ is Riemann integrable on $[c,d]$. Moreover, $\int_c^d f = \int_a^b f \chi_{[c,d]}$.

Proof. For the first statement, note that any discontinuity of f in [c, d] is also a discontinuity

in [a, b]. The second statement follows easily from either definition of the integral by including

{c, d} into a partition of [a, b].

Corollary 24.15. Let $f$ be Riemann integrable on $[a,b]$, and let $c \in (a,b)$. Then
\[ \int_a^b f = \int_a^c f + \int_c^b f. \]


Proof. This follows from linearity of the integral and Corollary 24.8, since $f \chi_{[a,b]}$ and $f \chi_{[a,c]} + f \chi_{[c,b]}$ can differ only at $c$.

The image of a set of measure zero under a continuous function need not have measure

zero. This is a pretty strange phenomenon. The upshot is that continuity is not really such a

strong property. It is important that a stronger version of continuity is sufficient to preserve

measure zero sets.

Lemma 24.16. Let g : [a, b] → R be a Lipschitz function, and let E ⊆ [a, b] have measure

zero. Then g(E) has measure zero.

Proof. Let $c > 0$ be a Lipschitz constant for $g$. We claim that if $I$ is an open interval contained in $[a,b]$, then $|g(I)| \le c|I|$. To see this, let $I = (t - r, t + r)$. Then by the Lipschitz condition, $g(I) \subseteq \bigl( g(t) - cr, g(t) + cr \bigr)$. Now let $\varepsilon > 0$. Let $U_1, U_2, \dots$ be open intervals with $E \subseteq \bigcup_i U_i$ and $\sum_i |U_i| < \varepsilon / c$. Let us assume that $U_i \subseteq [a,b]$; this is not a serious restriction, as we may extend the domain of $g$ to all of $\mathbb{R}$ (e.g. by letting $g$ be constant on $(-\infty, a]$ and on $[b, \infty)$) without changing the Lipschitz constant. Then $g(E) \subseteq \bigcup_i g(U_i)$, and
\[ \sum_i |g(U_i)| \le c \sum_i |U_i| < c \, \frac{\varepsilon}{c} = \varepsilon. \]

Corollary 24.17. Let $f$ be Riemann integrable, and let $\varphi$ be a continuous one-to-one function (defined on an interval) such that its inverse function $\varphi^{-1}$ is Lipschitz. Then $f \circ \varphi$ is Riemann integrable. (Compare with Corollary 24.11.)

Proof. Let $E$ denote the set of discontinuities of $f$. We first claim that the set of discontinuities of $f \circ \varphi$ is contained in $\varphi^{-1}(E)$. To see this, note that if $x \notin \varphi^{-1}(E)$ then $\varphi(x) \notin E$, so that $f$ is continuous at $\varphi(x)$. But then $f \circ \varphi$ is continuous at $x$. Hence any point where $f \circ \varphi$ is discontinuous must be contained in $\varphi^{-1}(E)$. By Lemma 24.16, applied to the Lipschitz function $\varphi^{-1}$, the set $\varphi^{-1}(E)$ has measure zero.

25. The fundamental theorem of calculus

Before giving the fundamental theorem, we present the usual notational expediencies.

Definition 25.1. If $a < b$ and $f$ is Riemann integrable on $[a,b]$, we define $\int_b^a f = -\int_a^b f$.

Theorem 25.2. Let $f$ be bounded on an interval containing $a$, $b$ and $c$. If two of $\int_a^b f$, $\int_b^c f$ and $\int_c^a f$ exist, then so does the third, and
\[ \int_a^b f + \int_b^c f + \int_c^a f = 0. \]

Proof. One of $a$, $b$ and $c$ lies between the other two. By symmetry, we may assume without loss of generality that it is $b$ that lies in the middle. Again without loss of generality, we may assume that $a < b < c$. Now, if $f$ is integrable on $[a,c]$, we are done by Corollary 24.14. On the other hand, if $f$ is integrable on $[a,b]$ and $[b,c]$, then $f$ is integrable on $[a,c]$ by Theorem 24.5.

Remark 25.3. If $a < b$, then $\left| \int_b^a f \right| \le \int_a^b |f|$.


Theorem 25.4. (Fundamental theorem of calculus.) Let $f$ be Riemann integrable on $[a,b]$. For $x \in [a,b]$ let $F(x) = \int_a^x f$. Then $F$ is Lipschitz (in particular, $F$ is continuous). If $f$ is continuous at $x_0 \in [a,b]$, then $F$ is differentiable at $x_0$, and $F'(x_0) = f(x_0)$ (in particular, if $f$ is continuous on $[a,b]$, then $F$ is differentiable on $[a,b]$).

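Proof. We leave as an exercise the proof that $F$ is Lipschitz. Suppose that $f$ is continuous at $x_0$. Let $\varepsilon > 0$. Then there is $\delta > 0$ such that for all $x \in [a,b]$, if $|x - x_0| < \delta$ then $|f(x) - f(x_0)| < \varepsilon$. Now for $x \in [a,b] \setminus \{x_0\}$ with $|x - x_0| < \delta$, we have
\[ \frac{F(x) - F(x_0)}{x - x_0} = \frac{1}{x - x_0} \left( \int_a^x f - \int_a^{x_0} f \right) = \frac{1}{x - x_0} \int_{x_0}^x f, \]
hence
\begin{align*}
\left| \frac{F(x) - F(x_0)}{x - x_0} - f(x_0) \right|
&= \left| \frac{1}{x - x_0} \int_{x_0}^x f - \frac{1}{x - x_0} \int_{x_0}^x f(x_0) \right| \\
&= \left| \frac{1}{x - x_0} \int_{x_0}^x \bigl( f - f(x_0) \bigr) \right| \\
&\le \frac{1}{|x - x_0|} \left| \int_{x_0}^x \bigl| f - f(x_0) \bigr| \right| \\
&\le \frac{1}{|x - x_0|} \, \varepsilon |x - x_0| \\
&= \varepsilon.
\end{align*}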

Corollary 25.5. If f is continuous on [a, b], then f has an antiderivative on [a, b].

Corollary 25.6. If $f$ is continuous on $[a,b]$, and if $G$ is an antiderivative for $f$ on $[a,b]$, then $\int_a^b f = G(b) - G(a)$.

Proof. Let $F(x) = \int_a^x f$. Then $F$ and $G$ are both antiderivatives for $f$ on $[a,b]$. Thus $F$ and $G$ are differentiable on $[a,b]$, and $F' = G'$. By Corollary 20.14, $F - G$ is constant, say $F - G = c$. Then
\[ \int_a^b f = F(b) = F(b) - F(a) = \bigl( G(b) + c \bigr) - \bigl( G(a) + c \bigr) = G(b) - G(a). \]
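As a numerical sanity check of Corollary 25.6, the following Python sketch compares a Riemann sum for the integral of cos over [0, 1] with G(1) − G(0) for the antiderivative G = sin. The helper name `riemann_sum`, the midpoint tags, and the choice of test function are illustrative assumptions, not part of the notes.

```python
import math

def riemann_sum(f, a, b, n):
    """Riemann sum of f over [a, b] on a uniform partition with n
    subintervals, tagged at the midpoints (one admissible choice of tags)."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# f = cos is continuous on [0, 1], and G = sin is an antiderivative of f.
approx = riemann_sum(math.cos, 0.0, 1.0, 1000)
exact = math.sin(1.0) - math.sin(0.0)   # G(b) - G(a)
print(approx, exact)
```

The two printed values should agree to several decimal places; refining the partition brings the Riemann sum closer to G(b) − G(a).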

We conclude this section with the change of variable theorem, which is the basis for the

method of integration by substitution from elementary calculus. If we assume that the

function f is continuous, then an easy short proof can be given using antiderivatives (you

might try to find it as an exercise). Our characterization of integrability by means of sets of

measure zero lets us prove a more general result, without too much extra work.

Theorem 25.7. (Change of variable theorem.) Let $f$ be Riemann integrable on $[a,b]$, and let $g : [c,d] \to [a,b]$ be continuously differentiable with $g' \ne 0$ on $[c,d]$. Then
\[ \int_{g(c)}^{g(d)} f(y) \, dy = \int_c^d f\bigl( g(x) \bigr) g'(x) \, dx. \]


Proof. Since $g'$ is continuous and nonzero on $[c,d]$, it does not change sign. We first consider the case where $g' > 0$ on $[c,d]$. Then $g(c) < g(d)$. Note that $g^{-1}$ is also continuously differentiable, by the inverse function theorem, and hence that $g^{-1}$ is Lipschitz. By Corollary 24.17, $f \circ g$ is Riemann integrable, and hence so is $(f \circ g) g'$. Let $L$ and $L'$ be the two integrals in the statement of the theorem, and let $\varepsilon > 0$. Let $\delta > 0$ be such that for any partition pair $(Q, U)$ of $[g(c), g(d)]$ with $\operatorname{mesh}(Q) < \delta$ we have $|R(f, Q, U) - L| < \varepsilon$. Since $g$ is uniformly continuous on $[c,d]$ there is $\eta_1 > 0$ such that if $|x - x'| < \eta_1$ then $|g(x) - g(x')| < \delta$. Choose $\eta_2 > 0$ such that for any partition pair $(P, T)$ of $[c,d]$ with $\operatorname{mesh}(P) < \eta_2$ we have $\bigl| R\bigl( (f \circ g) g', P, T \bigr) - L' \bigr| < \varepsilon$. Fix a partition $P$ of $[c,d]$ with $\operatorname{mesh}(P) < \min\{\eta_1, \eta_2\}$. Write $P = \{x_0, x_1, \dots, x_n\}$. Let $y_i = g(x_i)$, and let $Q = g(P) = \{y_0, y_1, \dots, y_n\}$. Since $\operatorname{mesh}(P) < \eta_1$ we know that $\operatorname{mesh}(Q) < \delta$. The mean value theorem applied to $g$ on $[x_{i-1}, x_i]$ gives $t_i \in (x_{i-1}, x_i)$ such that
\[ g(x_i) - g(x_{i-1}) = g'(t_i)(x_i - x_{i-1}), \quad \text{i.e.} \quad \Delta y_i = g'(t_i) \Delta x_i. \]
Let $T = (t_1, \dots, t_n)$, let $u_i = g(t_i)$, and set $U = (u_1, \dots, u_n)$. Then $(Q, U)$ is a partition pair of $[g(c), g(d)]$, and
\[ R(f, Q, U) = \sum_{i=1}^{n} f(u_i) \Delta y_i = \sum_{i=1}^{n} f\bigl( g(t_i) \bigr) g'(t_i) \Delta x_i = R\bigl( (f \circ g) g', P, T \bigr). \]
Therefore
\[ |L - L'| \le \bigl| L - R(f, Q, U) \bigr| + \bigl| R\bigl( (f \circ g) g', P, T \bigr) - L' \bigr| < \varepsilon + \varepsilon = 2\varepsilon. \]
Hence $L = L'$.

If, on the other hand, $g' < 0$ on $[c,d]$, then $g(d) < g(c)$. Note that $\Delta y_i = -g'(t_i) \Delta x_i$ (and $i$ runs backward). But $R(f, Q, U)$ approximates $\int_{g(d)}^{g(c)} f = -L$.
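The identity in Theorem 25.7 can be checked numerically on a concrete case. The sketch below is only an illustration under assumed choices: f = cos, g(x) = x² on [c, d] = [1, 2] (so g′ > 0 there), and a midpoint-rule helper named `midpoint_integral` that is not part of the notes.

```python
import math

def midpoint_integral(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

f = math.cos
g = lambda x: x * x        # maps [1, 2] onto [1, 4]
dg = lambda x: 2.0 * x     # g' > 0 on [1, 2]
c, d = 1.0, 2.0

lhs = midpoint_integral(f, g(c), g(d))                     # integral of f(y) dy
rhs = midpoint_integral(lambda x: f(g(x)) * dg(x), c, d)   # integral of f(g(x)) g'(x) dx
print(lhs, rhs)
```

Both values approximate sin 4 − sin 1, as the theorem predicts for this choice of f and g.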

26. The Weierstrass approximation theorem

In this section, we will apply the Riemann integral to prove a classic, and still very important, theorem from the 19th century, on polynomial approximation. The proof relies on a technique called smearing that is very useful in harmonic analysis, and in many other areas of mathematics. We will discuss this idea first, and then see about the Weierstrass theorem.

As an introductory example of smearing, consider a continuous function f : R → R. Then

f ∈ R[a, b] for any a < b. For n ∈ N we define the nth average of f by

\[ A_n(x) = \frac{n}{2} \int_{x - 1/n}^{x + 1/n} f(t) \, dt, \quad \text{for } x \in \mathbb{R}. \]

Thus An (x) is the average of f over the interval of length 2/n centered at x. We claim that

An converges to f uniformly on any compact interval. For the proof, let [a, b] be the interval,

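and let $\varepsilon > 0$. Choose $\delta > 0$ as in the definition of uniform continuity for $f$ and $\varepsilon$ on $[a - 1, b + 1]$ (the points $t$ below may lie slightly outside $[a,b]$). Let $n > 1/\delta$. Then for any $x \in [a,b]$,
\begin{align*}
\bigl| A_n(x) - f(x) \bigr|
&= \left| \frac{n}{2} \int_{x - 1/n}^{x + 1/n} f(t) \, dt - \frac{n}{2} \int_{x - 1/n}^{x + 1/n} f(x) \, dt \right| \\
&\le \frac{n}{2} \int_{x - 1/n}^{x + 1/n} \bigl| f(t) - f(x) \bigr| \, dt \\
&\le \frac{n}{2} \int_{x - 1/n}^{x + 1/n} \varepsilon \, dt \\
&= \varepsilon,
\end{align*}
where the third line holds because $|f(x) - f(t)| < \varepsilon$ for $t \in [x - 1/n, x + 1/n]$.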
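The uniform convergence of the averages $A_n$ can be observed numerically. The following sketch uses assumed, illustrative choices: f(x) = |x| on [−1, 1], a midpoint rule for the inner integral, and a finite grid of sample points standing in for the supremum. For this f one computes by hand that $A_n(0) = 1/(2n)$, which is where the largest error occurs.

```python
def average(f, x, n, m=200):
    """A_n(x) = (n/2) * integral of f over [x - 1/n, x + 1/n],
    approximated by the midpoint rule with m cells."""
    lo = x - 1.0 / n
    dt = (2.0 / n) / m
    return (n / 2.0) * sum(f(lo + (j + 0.5) * dt) for j in range(m)) * dt

def sup_error(f, n, a=-1.0, b=1.0, points=201):
    """Largest value of |A_n(x) - f(x)| over a uniform grid on [a, b]."""
    xs = [a + (b - a) * k / (points - 1) for k in range(points)]
    return max(abs(average(f, x, n) - f(x)) for x in xs)

errors = {n: sup_error(abs, n) for n in (10, 100, 1000)}
print(errors)
```

The grid maximum shrinks as n grows, in line with the estimate in the text.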

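Observe that $A_n$ is constructed from $f$ and the auxiliary function $\frac{n}{2} \chi_{[-1/n, 1/n]}$:
\[ A_n(x) = \frac{n}{2} \int_{x - 1/n}^{x + 1/n} f(t) \, dt = \frac{n}{2} \int_{-1/n}^{1/n} f(t + x) \, dt = \int_{\mathbb{R}} f(t + x) \, \frac{n}{2} \chi_{[-1/n, 1/n]}(t) \, dt. \]
Let $g_n = \frac{n}{2} \chi_{[-1/n, 1/n]}$. Thus we may express $A_n$ in the form
\[ (\ast) \qquad A_n(x) = \int_{\mathbb{R}} f(t + x) g_n(t) \, dt. \]
The functions $g_n$ have the following three properties.
(1) $g_n \ge 0$.
(2) $\int_{\mathbb{R}} g_n = 1$.
(3) For any $\delta > 0$, $\lim_{n \to \infty} \int_{-\delta}^{\delta} g_n = 1$. (This means that the “mass” of $g_n$ concentrates at 0 as $n \to \infty$.)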

An argument analogous to the above will work for any sequence of functions having these

three properties. Such a sequence is sometimes called an approximate identity, and there

are many important examples. Here is an example that we will use to prove the Weierstrass

theorem.

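Define $h_n : \mathbb{R} \to \mathbb{R}$ by
\[ h_n(x) = \begin{cases} (1 - x^2)^n, & \text{if } |x| \le 1 \\ 0, & \text{if } |x| > 1. \end{cases} \]
Let $c_n = \int_{-1}^{1} h_n$, and let $g_n = \frac{1}{c_n} h_n$. (A sketch of $h_n$, and of $g_n$, will aid in understanding.) It is immediate that this sequence $(g_n)$ satisfies the first two properties above. For the third, note that
\begin{align*}
1 - t^2 &\ge 1 - t \quad \text{on } [0,1] \\
(1 - t^2)^n &\ge (1 - t)^n \\
\int_0^1 (1 - t^2)^n \, dt &\ge \int_0^1 (1 - t)^n \, dt = \frac{1}{n+1} \\
c_n = \int_{-1}^{1} (1 - t^2)^n \, dt &\ge \frac{2}{n+1},
\end{align*}
and so
\[ \int_{\delta}^{1} g_n(t) \, dt = \frac{1}{c_n} \int_{\delta}^{1} (1 - t^2)^n \, dt \le \frac{n+1}{2} \int_{\delta}^{1} (1 - \delta^2)^n \, dt = \frac{1 - \delta}{2} (n+1)(1 - \delta^2)^n \to 0 \]
as $n \to \infty$. Similarly, $\int_{-1}^{-\delta} g_n \to 0$ as $n \to \infty$. Hence $\int_{-\delta}^{\delta} g_n \to 1$.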
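The two estimates used for this approximate identity, namely $c_n \ge 2/(n+1)$ and the bound on the tail mass $\int_\delta^1 g_n$, can be checked numerically. The sketch below is illustrative only: the function names `integrate`, `c` and `tail_mass`, the midpoint rule, and the value δ = 0.25 are assumptions made for the example.

```python
def integrate(f, a, b, m=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dt = (b - a) / m
    return sum(f(a + (j + 0.5) * dt) for j in range(m)) * dt

def c(n):
    """c_n = integral over [-1, 1] of (1 - t^2)^n."""
    return integrate(lambda t: (1.0 - t * t) ** n, -1.0, 1.0)

def tail_mass(n, delta):
    """Integral of g_n = h_n / c_n over [delta, 1]."""
    return integrate(lambda t: (1.0 - t * t) ** n, delta, 1.0) / c(n)

delta = 0.25
for n in (5, 50, 500):
    # bound from the text: ((1 - delta) / 2) * (n + 1) * (1 - delta^2)^n
    bound = (1 - delta) / 2 * (n + 1) * (1 - delta ** 2) ** n
    print(n, c(n), 2 / (n + 1), tail_mass(n, delta), bound)
```

The printed rows show $c_n$ staying above $2/(n+1)$ and the tail mass dropping toward 0 underneath the stated bound.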

Now we will state and prove the Weierstrass approximation theorem.


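Theorem 26.1. (Weierstrass approximation theorem.) Let $f : [a,b] \to \mathbb{R}$ be continuous. Then there is a sequence of polynomials converging uniformly to $f$ on $[a,b]$.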

Proof. The proof is easier if we first make some reductions. Suppose we prove the theorem

in the case that [a, b] = [0, 1]. Let f : [a, b] → R be continuous. Let ϕ(x) = a + (b − a)x and

ψ(x) = (x − a)/(b − a). Then ϕ and ψ are inverses of each other, and ϕ([0, 1]) = [a, b]. Now

f ◦ ϕ : [0, 1] → R is continuous, so we are assuming that there are polynomials qn converging

to f ◦ ϕ uniformly on [0, 1]. Then qn ◦ ψ converges to f ◦ ϕ ◦ ψ = f uniformly on [a, b], and

it is clear that qn ◦ ψ are also polynomials.

Now suppose that we can prove the theorem in the case that [a, b] = [0, 1] and with the

assumption that f (0) = f (1) = 0. Let f : [0, 1] → R be an arbitrary continuous function.

Let w(x) = f(0) + (f(1) − f(0))x. Then f − w is continuous on [0, 1], and vanishes at 0

and at 1. By our assumption, there is a sequence qn of polynomials converging to f − w

uniformly on [0, 1]. But then qn + w is a sequence of polynomials converging uniformly to f

on [0, 1].

The above remarks mean that if we can prove the theorem in the case where $[a,b] = [0,1]$ and $f(0) = f(1) = 0$, then we will have proved the theorem in general. So now we consider such a function $f$. Since $f$ is continuous on the compact set $[0,1]$, it is bounded. Let $|f| \le M$ on $[0,1]$. Extend the domain of $f$ to all of $\mathbb{R}$ by setting $f(t) = 0$ for $t \notin [0,1]$. Then $f$ is continuous on $\mathbb{R}$. Let $g_n(t) = (1/c_n)(1 - t^2)^n \chi_{[-1,1]}(t)$ be as above. For $x \in [0,1]$ define $p_n(x)$ by
\[ p_n(x) = \int_{\mathbb{R}} f(t + x) g_n(t) \, dt. \]

(Note that pn is defined just like An via equation (∗) in our preliminary discussion.)

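We now claim that $p_n$ converges to $f$ uniformly on $[0,1]$. For this, we use the properties of $g_n$ as an approximate identity. Let $\varepsilon > 0$. Choose $\delta > 0$ as in the definition of uniform continuity of $f$ on $[0,1]$. Notice that because $f$ vanishes outside the interval $[0,1]$, this $\delta$ satisfies the definition of uniform continuity for $f$ on all of $\mathbb{R}$. By property (3) of approximate identities, there is $n_0 \in \mathbb{N}$ such that $1 - \int_{-\delta}^{\delta} g_n < \varepsilon$ for $n \ge n_0$. Now, for any $n \ge n_0$ and any $x \in [0,1]$, we have
\begin{align*}
\bigl| p_n(x) - f(x) \bigr|
&= \left| \int_{\mathbb{R}} f(t + x) g_n(t) \, dt - f(x) \int_{\mathbb{R}} g_n(t) \, dt \right| \\
&= \left| \int_{\mathbb{R}} \bigl( f(t + x) - f(x) \bigr) g_n(t) \, dt \right| \\
&\le \int_{\mathbb{R}} \bigl| f(t + x) - f(x) \bigr| g_n(t) \, dt \\
&= \int_{[-\delta, \delta]} \bigl| f(t + x) - f(x) \bigr| g_n(t) \, dt + \int_{[-1, -\delta] \cup [\delta, 1]} \bigl| f(t + x) - f(x) \bigr| g_n(t) \, dt \\
&= C_1 + C_2,
\end{align*}
where we have restricted the integration to the interval $[-1, 1]$ because $g_n$ vanishes outside that interval. We estimate $C_1$ and $C_2$ separately. For $|t| \le \delta$ we have $|f(t + x) - f(x)| < \varepsilon$, so
\[ C_1 \le \int_{-\delta}^{\delta} \varepsilon g_n \le \varepsilon \int_{\mathbb{R}} g_n = \varepsilon. \]
For $|t| > \delta$ we have $|f(t + x) - f(x)| \le 2M$, so
\[ C_2 \le 2M \int_{[-1, -\delta] \cup [\delta, 1]} g_n = 2M \left( 1 - \int_{-\delta}^{\delta} g_n \right) < 2M\varepsilon. \]
Thus $|p_n(x) - f(x)| < (2M + 1)\varepsilon$ for all $x \in [0,1]$.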

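We will finish the proof by showing that $p_n$ is a polynomial on $[0,1]$. For $x \in [0,1]$,
\begin{align*}
p_n(x) &= \int_{\mathbb{R}} f(t + x) g_n(t) \, dt \\
&= \int_{-x}^{1 - x} f(t + x) g_n(t) \, dt, \quad \text{since } f = 0 \text{ outside } [0,1], \\
&= \int_0^1 f(u) g_n(u - x) \, du, \quad \text{by the change of variable } u = x + t, \\
&= \int_0^1 f(u) \frac{1}{c_n} \bigl( 1 - (u - x)^2 \bigr)^n \, du, \quad \text{since } u - x \in [-1, 1] \text{ when } u \in [0,1].
\end{align*}
Note that $\frac{1}{c_n} \bigl( 1 - (u - x)^2 \bigr)^n$ is a polynomial in $u$ and $x$:
\[ \frac{1}{c_n} \bigl( 1 - (u - x)^2 \bigr)^n = \sum_{i,j=0}^{2n} a_{ij} u^i x^j = \sum_{j=0}^{2n} \left( \sum_{i=0}^{2n} a_{ij} u^i \right) x^j. \]
It follows that
\[ p_n(x) = \sum_{j=0}^{2n} \left( \int_0^1 f(u) \sum_{i=0}^{2n} a_{ij} u^i \, du \right) x^j \]
is a polynomial in $x$.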
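The construction in the proof can be tried out numerically. The sketch below evaluates $p_n(x) = \int_0^1 f(u) \frac{1}{c_n} (1 - (u - x)^2)^n \, du$ by a midpoint rule for the assumed test function f(x) = sin(πx), which vanishes at 0 and 1 as the reduction requires. The names `integrate` and `smeared_polynomial`, the sample grid, and the values of n are illustrative assumptions; the maximum over the grid only stands in for the supremum over [0, 1].

```python
import math

def integrate(f, a, b, m=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dt = (b - a) / m
    return sum(f(a + (j + 0.5) * dt) for j in range(m)) * dt

def smeared_polynomial(f, n):
    """Return x -> p_n(x) = (1/c_n) * integral_0^1 f(u) (1 - (u - x)^2)^n du,
    valid for x in [0, 1]."""
    c_n = integrate(lambda t: (1.0 - t * t) ** n, -1.0, 1.0)
    def p(x):
        return integrate(lambda u: f(u) * (1.0 - (u - x) ** 2) ** n, 0.0, 1.0) / c_n
    return p

f = lambda x: math.sin(math.pi * x)    # continuous, f(0) = f(1) = 0
xs = [k / 50 for k in range(51)]
errors = {}
for n in (5, 20, 200):
    p = smeared_polynomial(f, n)
    errors[n] = max(abs(p(x) - f(x)) for x in xs)
print(errors)
```

The errors decrease as n grows, though slowly, which is consistent with the proof giving convergence but no rate.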

27. Uniform convergence and the interchange of limits

There are many limiting processes in analysis, and it is frequently the case that two of them bump up against each other. We have seen this once already, in Theorem 19.7. Let's recall that statement: “Let $f, f_n : X \to \mathbb{R}^k$, and let $a \in X$. Suppose that each $f_n$ is continuous at $a$, and that $f_n \to f$ uniformly. Then $f$ is continuous at $a$.” We can rewrite this in the following way:
\[ \lim_{x \to a} \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \lim_{x \to a} f_n(x), \]

which shows that it is an example of the interchange of two limiting processes. We also

saw an example where the above equation does not hold — in that example, the sequence of

functions converges pointwise, but not uniformly. It is the uniform nature of the convergence

that makes the theorem true. This points out another aspect of such situations: the order

of two limiting processes may be reversed if appropriate conditions hold. As a general rule,

you should always verify such conditions explicitly when making such an interchange. From

the point of view of an instructor, the interchange of two limits in a solution is ALWAYS a

red flag, and must be justified in detail by the student.

Theorem 27.1. Let $f_n \in \mathcal{R}[a,b]$, let $f : [a,b] \to \mathbb{R}$, and suppose that $f_n \to f$ uniformly on $[a,b]$. Then $f \in \mathcal{R}[a,b]$, and $\int_a^b f = \lim_n \int_a^b f_n$.


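Proof. Let $\varepsilon > 0$ be given. Let $\eta = \varepsilon / \bigl( 1 + 2(b - a) \bigr)$. Choose $N$ such that $\|f_n - f\|_u < \eta$ for $n \ge N$. Let $n \ge N$. Since $f_n \in \mathcal{R}[a,b]$ there are step functions $g_0$, $h_0$ such that $g_0 \le f_n \le h_0$ on $[a,b]$, and $\int_a^b (h_0 - g_0) < \eta$. Let $g = g_0 - \eta$ and $h = h_0 + \eta$. Then $g$ and $h$ are step functions. If $x \in [a,b]$, we have
\[ g(x) = g_0(x) - \eta \le f_n(x) - \eta < f(x) < f_n(x) + \eta \le h_0(x) + \eta = h(x). \]
Moreover,
\[ \int_a^b (h - g) = \int_a^b (h_0 - g_0 + 2\eta) = \int_a^b (h_0 - g_0) + 2\eta(b - a) < \eta \bigl( 1 + 2(b - a) \bigr) = \varepsilon. \]
Hence $f \in \mathcal{R}[a,b]$. Finally, for $n \ge N$,
\[ \left| \int_a^b f_n - \int_a^b f \right| \le \int_a^b |f_n - f| \le \eta(b - a) < \varepsilon, \]
so we have that $\lim_{n \to \infty} \int_a^b f_n = \int_a^b f$.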

do not converge to the integral of the limit function.
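To see concretely how the conclusion of Theorem 27.1 can fail without uniform convergence, here is a small numerical sketch. The sequence $f_n(x) = n x (1 - x^2)^n$ on [0, 1] is a standard textbook illustration chosen for this example, and is not necessarily the one the notes had in mind. It converges pointwise to 0, yet $\int_0^1 f_n = n / (2(n+1)) \to 1/2$.

```python
def f(n, x):
    """f_n(x) = n * x * (1 - x^2)^n, which tends to 0 for each fixed x in [0, 1]."""
    return n * x * (1.0 - x * x) ** n

def integral(n, m=20000):
    """Midpoint-rule approximation of the integral of f_n over [0, 1]."""
    dx = 1.0 / m
    return sum(f(n, (j + 0.5) * dx) for j in range(m)) * dx

for n in (10, 100, 1000):
    # value at a fixed point, numerical integral, exact integral n / (2(n+1))
    print(n, f(n, 0.5), integral(n), n / (2 * (n + 1)))
```

The pointwise values collapse to 0 while the integrals approach 1/2, so the limit and the integral cannot be interchanged for this sequence.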

We now turn to the derivative of the limit of a sequence of differentiable functions. Here,

the situation is more complicated: uniform convergence seems to have nothing to do even

with differentiability of the limit, let alone with convergence to the derivative of the limit

if the limit is differentiable. For example, we know from the Weierstrass approximation

theorem that every continuous function is a uniform limit of polynomials, which are certainly

differentiable. But the continuous limit need not be differentiable. This is an indication that

we need to assume more to get a theorem analogous to the previous one, but for derivatives.

The key idea is that we have to assume that the sequence of derivatives converges uniformly.

Actually, even though this is a very strong hypothesis, it is not quite enough. For example,

any sequence of constant functions has derivatives that converge uniformly (since they are

all identically zero), but the sequence of functions need not converge at all.

Theorem 27.2. Let $I$ be an interval, and let $f_n : I \to \mathbb{R}$ be differentiable. Suppose that $(f_n')$ converges uniformly on $I$ to a function $g$. Suppose additionally that there is $a \in I$ such that the sequence of function values $\bigl( f_n(a) \bigr)$ converges. Then $(f_n)$ converges to a differentiable function $f$, and $f' = g$. Moreover, the convergence of $f_n$ to $f$ is uniform on any bounded subinterval of $I$.

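Proof. We first show that there is a function $f$ to which $(f_n)$ converges, and that this convergence is uniform on bounded subintervals. Let $\varepsilon > 0$ be given. Choose $N$ so that $\|f_n' - f_m'\|_u < \varepsilon$ and $|f_n(a) - f_m(a)| < \varepsilon$ for all $m, n \ge N$. Let $J \subseteq I$ be a bounded subinterval; say $|x| \le M$ for $x \in J$. If $x \in J$, we have
\begin{align*}
\bigl| f_n(x) - f_m(x) \bigr| &= \bigl| (f_n - f_m)(x) \bigr| \\
&\le \bigl| (f_n - f_m)(x) - (f_n - f_m)(a) \bigr| + \bigl| f_n(a) - f_m(a) \bigr| \\
&= \bigl| (f_n - f_m)'(c) \bigr| \, |x - a| + \bigl| f_n(a) - f_m(a) \bigr|, \quad \text{for some } c \text{ between } a \text{ and } x \text{ (mean value theorem)}, \\
&\le \varepsilon |x - a| + \varepsilon \\
&= \varepsilon \bigl( |x - a| + 1 \bigr) \\
&\le \varepsilon (M + |a| + 1).
\end{align*}
Thus $(f_n)$ is uniformly Cauchy on $J$, and hence converges uniformly on $J$ (and pointwise on all of $I$ too).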

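Let $f$ be the limit of $f_n$. We now show that $f$ is differentiable, and that $f' = g$. Let $\varepsilon > 0$. Choose $N$ so that $\|f_n' - f_m'\|_u < \varepsilon/3$ for $m, n \ge N$. Letting $m \to \infty$, we see also that $\|f_n' - g\|_u \le \varepsilon/3$. Now fix $n \ge N$, and fix $x \in I$. For any $h \ne 0$ such that $x + h \in I$, we have
\begin{align*}
\left| \frac{f_n(x+h) - f_n(x)}{h} - \frac{f(x+h) - f(x)}{h} \right|
&= \lim_{m \to \infty} \left| \frac{f_n(x+h) - f_n(x)}{h} - \frac{f_m(x+h) - f_m(x)}{h} \right| \\
&= \lim_{m \to \infty} \left| \frac{(f_n - f_m)(x+h) - (f_n - f_m)(x)}{h} \right| \\
&= \lim_{m \to \infty} \bigl| (f_n - f_m)'(x + \theta h) \bigr|, \quad \text{for some } \theta \in (0,1) \text{ depending on } m, \\
&\le \frac{\varepsilon}{3}.
\end{align*}
Next we use the differentiability of $f_n$ (at $x$) to choose $\delta > 0$ such that
\[ \left| \frac{f_n(x+h) - f_n(x)}{h} - f_n'(x) \right| < \frac{\varepsilon}{3}, \]
whenever $0 < |h| < \delta$ and $x + h \in I$. Then for such $h$ we have
\begin{align*}
\left| \frac{f(x+h) - f(x)}{h} - g(x) \right|
&\le \left| \frac{f(x+h) - f(x)}{h} - \frac{f_n(x+h) - f_n(x)}{h} \right| + \left| \frac{f_n(x+h) - f_n(x)}{h} - f_n'(x) \right| + \bigl| f_n'(x) - g(x) \bigr| \\
&< \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.
\end{align*}
Therefore $\lim_{h \to 0} \bigl( f(x+h) - f(x) \bigr) / h = g(x)$. Hence $f$ is differentiable, and $f'(x) = g(x)$.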

As a last example of the interchange of two limiting processes, we give a result on differentiating an integral. For this we recall from earlier experience the notion of partial derivative. Let $f : [a,b] \times [c,d] \to \mathbb{R}$, and suppose that for each $y \in [c,d]$ the function $x \mapsto f(x,y)$ is differentiable on $[a,b]$. The partial derivative of $f$ with respect to $x$ is defined by
\[ \frac{\partial f}{\partial x}(x,y) = \lim_{h \to 0} \frac{f(x+h, y) - f(x,y)}{h}. \]

Theorem 27.3. Let $f : [a,b] \times [c,d] \to \mathbb{R}$ be continuous, and suppose that $\partial f / \partial x$ exists and is continuous on $[a,b] \times [c,d]$. Let $G : [a,b] \to \mathbb{R}$ be defined by $G(x) = \int_c^d f(x,y) \, dy$. Then $G$ is differentiable on $[a,b]$, and $G'(x) = \int_c^d (\partial f / \partial x)(x,y) \, dy$.


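Proof. Let $\varepsilon > 0$. Since $\partial f / \partial x$ is continuous on the compact set $[a,b] \times [c,d]$, it is uniformly continuous. Let $\delta > 0$ be as in the definition of uniform continuity for $\partial f / \partial x$ on $[a,b] \times [c,d]$ and for the positive quantity $\varepsilon / (d - c)$. Now if $x, x + h \in [a,b]$ with $0 < |h| < \delta$, then
\begin{align*}
\left| \frac{G(x+h) - G(x)}{h} - \int_c^d \frac{\partial f}{\partial x}(x,y) \, dy \right|
&= \left| \int_c^d \left( \frac{f(x+h, y) - f(x,y)}{h} - \frac{\partial f}{\partial x}(x,y) \right) dy \right| \\
&= \left| \int_c^d \left( \frac{\partial f}{\partial x}(x + \theta h, y) - \frac{\partial f}{\partial x}(x,y) \right) dy \right|, \quad \text{for some } \theta \equiv \theta(x,y,h) \in (0,1), \\
&\le \frac{\varepsilon}{d - c} (d - c) = \varepsilon.
\end{align*}
It follows that $G'(x) = \int_c^d (\partial f / \partial x)(x,y) \, dy$.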
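Theorem 27.3 can be illustrated numerically by comparing a difference quotient of G with the integral of the partial derivative. In the sketch below, the function f(x, y) = sin(xy) on [0, 1] × [0, 1], the point x = 0.7, the step h, and the helper `integrate` are all assumptions chosen for the example.

```python
import math

def integrate(f, a, b, m=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dy = (b - a) / m
    return sum(f(a + (j + 0.5) * dy) for j in range(m)) * dy

f = lambda x, y: math.sin(x * y)
df_dx = lambda x, y: y * math.cos(x * y)    # partial derivative of f in x

def G(x):
    """G(x) = integral over [0, 1] of f(x, y) dy."""
    return integrate(lambda y: f(x, y), 0.0, 1.0)

x, h = 0.7, 1e-5
difference_quotient = (G(x + h) - G(x - h)) / (2 * h)   # symmetric quotient for G'(x)
integral_of_partial = integrate(lambda y: df_dx(x, y), 0.0, 1.0)
print(difference_quotient, integral_of_partial)
```

The two numbers agree to many digits, matching the claim that G′(x) is obtained by differentiating under the integral sign.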

28. Infinite series

Definition 28.1. Let $(a_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{R}$. The infinite series $\sum_{n=1}^{\infty} a_n$ is defined as follows. For each $n$ let $s_n = \sum_{i=1}^{n} a_i$; $s_n$ is called the $n$th partial sum of the series. The infinite series is the sequence $(s_n)_{n \in \mathbb{N}}$ of partial sums. The series $\sum_{n=1}^{\infty} a_n$ converges (diverges) if the sequence of partial sums converges (diverges). The sum of a convergent series is the limit of the sequence of partial sums. The sum is usually denoted by $\sum_{n=1}^{\infty} a_n$.

Remark 28.2. Frequently, an infinite series uses N ∪ {0} as index set. Other intervals in Z

are also used.

Remark 28.3. The Cauchy criterion for convergence of real sequences can be translated into the following criterion for convergence of series: $\sum a_n$ converges if and only if for every $\varepsilon > 0$, there exists $n_0 \in \mathbb{N}$ such that for all $n_0 \le m \le n$ we have $\left| \sum_{i=m}^{n} a_i \right| < \varepsilon$.

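Theorem 28.4. (Test for divergence.) Let $\sum_{n=1}^{\infty} a_n$ be an infinite series. If the series converges, then $\lim_{n \to \infty} a_n = 0$.

Proof. Suppose that $\sum_{n=1}^{\infty} a_n$ converges. Then
\[ \lim_{n \to \infty} a_n = \lim_{n \to \infty} (s_n - s_{n-1}) = \lim_{n \to \infty} s_n - \lim_{n \to \infty} s_{n-1} = \sum_{n=1}^{\infty} a_n - \sum_{n=1}^{\infty} a_n = 0. \]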

Example 28.5. (1) The series $\sum_{n=0}^{\infty} (-1)^n$ diverges, since $\lim_{n \to \infty} (-1)^n$ does not exist (and hence is not equal to zero).
(2) The series $\sum_{n=1}^{\infty} n^{-1/n}$ diverges, since $\lim_{n \to \infty} n^{-1/n} = 1 / (\lim_{n \to \infty} \sqrt[n]{n}) = 1$ is nonzero.

Proposition 28.6. If $\sum a_n$ and $\sum b_n$ converge, then so does $\sum (\lambda a_n + \mu b_n)$, and $\sum (\lambda a_n + \mu b_n) = \lambda \sum a_n + \mu \sum b_n$.

Proof. This follows immediately from the corresponding results for sequences.


Theorem 28.7. (Geometric series.) The series $\sum_{n=0}^{\infty} x^n$ converges if and only if $|x| < 1$, in which case its sum is $1/(1 - x)$. (The number $x$ is called the ratio of the geometric series.)

Proof. First suppose that $|x| < 1$. Writing $s_n = \sum_{i=0}^{n} x^i$, note that
\[ x s_n = \sum_{i=1}^{n+1} x^i = s_n + x^{n+1} - 1, \]
and hence
\[ s_n = \frac{1 - x^{n+1}}{1 - x}. \]
Since $|x| < 1$, we have $\lim_{n \to \infty} x^{n+1} = 0$. Thus $\lim_{n \to \infty} s_n = 1/(1 - x)$.

Conversely, if $|x| \ge 1$, then $(x^n)$ does not converge to zero, so the series diverges by the test for divergence.
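The closed form for the partial sums of a geometric series can be verified exactly with rational arithmetic. The sketch below uses Python's `fractions.Fraction`; the helper name `partial_sum` and the ratio x = 1/3 are illustrative choices.

```python
from fractions import Fraction

def partial_sum(x, n):
    """s_n = 1 + x + x^2 + ... + x^n."""
    return sum(x ** i for i in range(n + 1))

x = Fraction(1, 3)
for n in (1, 5, 20):
    closed_form = (1 - x ** (n + 1)) / (1 - x)
    # exact equality of rationals, then a floating-point view of s_n
    print(n, partial_sum(x, n) == closed_form, float(partial_sum(x, n)))
```

For x = 1/3 the limit 1/(1 − x) equals 3/2, and the floating-point values of $s_n$ approach it quickly.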

Remark 28.8. Generally, it is very difficult to find the sum of a convergent infinite series.

Often we content ourselves with being able to prove convergence (or divergence). Geometric

series are one of the exceptions. For the next family of series, we will not worry about the

value of the sum. First we make an observation about series with nonnegative terms.

Lemma 28.9. Let $\sum_{n=1}^{\infty} a_n$ be an infinite series, and suppose that $a_n \ge 0$ for all $n$. Then the series converges if and only if the sequence of partial sums is bounded.

Proof. The sequence of partial sums is increasing, since $s_{n+1} = s_n + a_{n+1} \ge s_n$. We already know that a monotone sequence converges if and only if it is bounded.

Theorem 28.10. ($p$-series.) Let $p \in \mathbb{R}$. The series $\sum_{n=1}^{\infty} 1/n^p$ converges if and only if $p > 1$.

Proof. If $p \le 0$, then the terms of the series are nondecreasing. In this case, $1/n^p$ cannot converge to zero, so the series diverges. Now assume that $p > 0$. In this case the terms of the series are decreasing. Let $a_n = 1/n^p$, and consider two cases. First, suppose that $p > 1$. We have that
\begin{align*}
\sum_{i=1}^{2^n - 1} a_i &= a_1 + (a_2 + a_3) + (a_4 + \dots + a_7) + \dots + (a_{2^{n-1}} + \dots + a_{2^n - 1}) \\
&\le a_1 + 2a_2 + 4a_4 + \dots + 2^{n-1} a_{2^{n-1}},
\end{align*}
since the terms are decreasing. Hence
\[ s_{2^n - 1} \le \sum_{j=0}^{n-1} 2^j a_{2^j} = \sum_{j=0}^{n-1} 2^{j - jp} = \sum_{j=0}^{n-1} \bigl( 2^{1-p} \bigr)^j. \]
This last is the partial sum of a geometric series with ratio $2^{1-p}$. Since $p > 1$, the ratio is less than 1, and hence the geometric series converges. It follows that the partial sums of $\sum 1/n^p$ are bounded, hence it converges.

Next we suppose that $0 < p \le 1$. We have that
\begin{align*}
\sum_{i=1}^{2^n} a_i &= a_1 + a_2 + (a_3 + a_4) + (a_5 + \dots + a_8) + \dots + (a_{2^{n-1}+1} + \dots + a_{2^n}) \\
&\ge a_2 + 2a_4 + 4a_8 + \dots + 2^{n-1} a_{2^n},
\end{align*}
since the terms are decreasing. Hence
\[ s_{2^n} \ge \sum_{j=0}^{n-1} 2^j a_{2^{j+1}} = \sum_{j=0}^{n-1} 2^{j - (j+1)p} = 2^{-p} \sum_{j=0}^{n-1} \bigl( 2^{1-p} \bigr)^j. \]
Since $p \le 1$, the ratio of this last geometric series is $2^{1-p} \ge 1$, hence it diverges. It follows that the partial sums of $\sum 1/n^p$ are unbounded, so it diverges also.

Example 28.11. We particularly draw attention to the case $p = 1$: $\sum_{n=1}^{\infty} 1/n$ is called the harmonic series, and it diverges.
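The grouping estimates from the proof of Theorem 28.10 can be checked on actual partial sums. For p = 1 the proof gives $s_{2^n} \ge n/2$, and for p = 2 it gives $s_{2^n - 1} \le \sum_{j} 2^{-j} \le 2$. The sketch below tests both; the helper name `partial_sum` and the range of n are illustrative assumptions.

```python
def partial_sum(p, n):
    """s_n = sum of 1 / k^p for k = 1, ..., n."""
    return sum(1.0 / k ** p for k in range(1, n + 1))

for n in range(1, 15):
    harmonic = partial_sum(1.0, 2 ** n)        # p = 1: grows at least like n / 2
    squares = partial_sum(2.0, 2 ** n - 1)     # p = 2: stays below 2
    print(n, harmonic, n / 2, squares)
```

The harmonic partial sums keep growing past every bound n/2, while the p = 2 partial sums stay bounded, as the two cases of the proof predict.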

Having a few series whose behavior is known makes it relatively easy to establish conver-

gence or divergence of other series.

Theorem 28.12. (Comparison test.) Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be series, and suppose that $a_n, b_n \ge 0$ for all $n$. Suppose further that $a_n \le b_n$ for all $n$.
(1) If $\sum_{n=1}^{\infty} b_n$ converges, then so does $\sum_{n=1}^{\infty} a_n$.
(2) If $\sum_{n=1}^{\infty} a_n$ diverges, then so does $\sum_{n=1}^{\infty} b_n$.
(In fact, both conclusions hold if the corresponding inequalities are valid only for $n \ge n_0$.)

Proof. Let $s_n = \sum_{i=1}^{n} a_i$ and $t_n = \sum_{i=1}^{n} b_i$. Since $a_i \le b_i$ for all $i$, we know that $s_n \le t_n$ for all $n$. If $\sum_{n=1}^{\infty} b_n$ converges, then $(t_n)$ is bounded above, and hence so is $(s_n)$. Therefore $\sum_{n=1}^{\infty} a_n$ converges. If $\sum_{n=1}^{\infty} a_n$ diverges, then $(s_n)$ is unbounded, and hence so is $(t_n)$. It follows that $\sum_{n=1}^{\infty} b_n$ diverges.

Definition 28.13. Let $\sum_{n=1}^{\infty} a_n$ be a convergent infinite series. It converges absolutely if the series $\sum_{n=1}^{\infty} |a_n|$ converges. It converges conditionally if $\sum_{n=1}^{\infty} |a_n|$ diverges.

Thus all infinite series may be classified into three (mutually exclusive) types: absolutely

convergent, conditionally convergent, and divergent. Most of the previous results concern

series whose terms are nonnegative; hence they are useful for establishing absolute conver-

gence. It is appropriate to think of absolute convergence as “robust”, and of conditional

convergence as “touchy”. In this course we will only treat absolute convergence (for lack

of time). However a few remarks about conditional convergence are in order. The simplest

example of a conditionally convergent series is the alternating harmonic series:

\[ \frac{1}{1} - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots . \]

It is an easy exercise to prove that this series converges. Since its absolute value is the har-

monic series that we already know diverges, the alternating harmonic series is conditionally

convergent. The difference between absolute and conditional convergence can be illustrated

by the fact that the sum of the alternating harmonic series can be altered by changing the

order in which the terms are added. This somewhat counter-intuitive fact can be explained

by noticing that the “subseries” of odd terms by itself is divergent, as is the subseries of even

terms. This phenomenon is completely general: the sum of an absolutely convergent series is

unaffected by rearranging the terms, while the sum, and even convergence, of a conditionally

convergent series can be changed arbitrarily by a suitable rearrangement of the terms. We

will not discuss this theorem further.
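The effect of rearranging a conditionally convergent series can be seen numerically. The sketch below compares the alternating harmonic series, whose sum is ln 2, with one standard rearrangement (two positive terms followed by one negative term), whose sum is known to be (3/2) ln 2. The function names and the numbers of terms are illustrative assumptions.

```python
import math

def alternating_harmonic(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in the usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged(n_blocks):
    """Partial sum of the rearrangement 1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ...,
    taken in blocks of two odd-denominator terms and one even-denominator term."""
    total = 0.0
    for j in range(1, n_blocks + 1):
        total += 1.0 / (4 * j - 3) + 1.0 / (4 * j - 1) - 1.0 / (2 * j)
    return total

print(alternating_harmonic(200000), math.log(2))
print(rearranged(100000), 1.5 * math.log(2))
```

Both series use exactly the same terms, yet the partial sums settle near different values.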

The next two theorems give the most useful tests for absolute convergence.


Theorem 28.14. (Ratio test.) Let $\sum_{n=1}^{\infty} a_n$ be a series of nonzero terms. Let
\[ L = \limsup_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right|. \]
(1) If $L < 1$ then the series converges absolutely.
(2) If there exists $n_0 \in \mathbb{N}$ such that $|a_{n+1} / a_n| \ge 1$ for all $n \ge n_0$, then the series diverges.

Proof. (1) Choose $r$ such that $L < r < 1$. Let $n_0 \in \mathbb{N}$ be such that $|a_{n+1} / a_n| \le r$ for $n \ge n_0$. Then for $k \ge 0$ we have
\[ |a_{n_0 + k}| \le r |a_{n_0 + k - 1}| \le r^2 |a_{n_0 + k - 2}| \le \dots \le r^k |a_{n_0}|. \]
It follows that for $n \ge n_0$ we have $|a_n| \le C r^n$, where $C = |a_{n_0}| / r^{n_0}$. The absolute convergence of $\sum a_n$ now follows by comparison with the geometric series of ratio $r$.
(2) In this case, the hypotheses imply that $|a_n| \ge |a_{n_0}|$ for all $n \ge n_0$, and hence $a_n$ does not tend to zero.

Theorem 28.15. (Root test.) Let $\sum_{n=1}^{\infty} a_n$ be a series. Let
\[ L = \limsup_{n \to \infty} \sqrt[n]{|a_n|}. \]
(1) If $L < 1$ then the series converges absolutely.
(2) If $L > 1$ then the series diverges.

Proof. (1) Choose $r$ such that $L < r < 1$. Let $n_0 \in \mathbb{N}$ be such that $\sqrt[n]{|a_n|} \le r$ for $n \ge n_0$. Then $|a_n| \le r^n$ for $n \ge n_0$. The absolute convergence of $\sum a_n$ now follows by comparison with the geometric series of ratio $r$.
(2) Since $L > 1$, we have $\sqrt[n]{|a_n|} > 1$ for infinitely many $n$. Then $|a_n| > 1$ for infinitely many $n$, and hence $a_n$ does not tend to zero.

The ratio and root tests are useful in situations where the series converges at least as strongly as some geometric series. Note that both tests are inconclusive for the $p$-series (exercise!). The ratio test is usually easier to apply, but the root test is more effective: if the ratio test indicates convergence, then the root test does too. There are series for which the ratio test is inconclusive, but the root test indicates convergence (exercises).
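One series of this kind can be explored numerically. The sketch below uses the assumed example $a_n = 2^{-n + (-1)^n}$, which is not taken from the notes: its consecutive ratios alternate between 2 and 1/8, so neither part of the ratio test as stated in Theorem 28.14 applies, while the $n$th roots tend to 1/2.

```python
def a(n):
    """a_n = 2^(-n + (-1)^n), a series of positive terms."""
    return 2.0 ** (-n + (-1) ** n)

ratios = [a(n + 1) / a(n) for n in range(1, 200)]
roots = [a(n) ** (1.0 / n) for n in range(1, 1001)]

print(sorted(set(ratios)))   # the ratios take only two values, 1/8 and 2
print(roots[-1])             # the n-th roots approach 1/2
```

Since the lim sup of the roots is 1/2 < 1, the root test gives absolute convergence even though the ratios do not settle below 1.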

29. Series of functions

Definition 29.1. Let $X$ be a metric space, and let $f_n : X \to \mathbb{R}$. The series of functions $\sum_{n=1}^{\infty} f_n$ converges pointwise (uniformly) if the sequence of partial sums converges pointwise (uniformly).

It is easy to translate results about convergence of sequences of functions to statements about series of functions.

Theorem 29.2. (Cauchy criterion.) $\sum_{n=1}^{\infty} f_n$ converges uniformly on $X$ if and only if for every $\varepsilon > 0$ there exists $n_0 \in \mathbb{N}$ such that for all $m, n \ge n_0$ and for all $x \in X$, $\left| \sum_{i=m}^{n} f_i(x) \right| < \varepsilon$.

The following criterion for uniform convergence is called the Weierstrass M-test:


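Theorem 29.3. (Weierstrass M-test.) Let $\sum_{n=1}^{\infty} f_n$ be a series of functions on $X$. Let $(M_n)_{n=1}^{\infty}$ be a sequence of constants such that $|f_n(x)| \le M_n$ for all $x \in X$ and all $n$. If $\sum_{n=1}^{\infty} M_n$ converges, then $\sum_{n=1}^{\infty} f_n$ converges uniformly.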
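The mechanism behind the M-test is that a difference of partial sums is bounded, uniformly in x, by the corresponding tail of $\sum M_n$. The sketch below checks this for the assumed example $f_k(x) = \sin(kx)/k^2$ with $M_k = 1/k^2$; the grid of sample points and the indices m, n are illustrative choices.

```python
import math

def partial_sum(x, n):
    """S_n(x) = sum of sin(k x) / k^2 for k = 1, ..., n; here |f_k(x)| <= 1 / k^2."""
    return sum(math.sin(k * x) / k ** 2 for k in range(1, n + 1))

xs = [2 * math.pi * j / 100 for j in range(101)]
m, n = 100, 200
sup_gap = max(abs(partial_sum(x, n) - partial_sum(x, m)) for x in xs)
tail_bound = sum(1.0 / k ** 2 for k in range(m + 1, n + 1))
print(sup_gap, tail_bound)
```

The largest gap over the grid stays below the tail of $\sum 1/k^2$, which is exactly the uniform Cauchy estimate of Theorem 29.2.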

We also have the following facts as immediate corollaries of the theorems on uniform

convergence.

Theorem 29.4. Let $f_n : X \to \mathbb{R}$ be functions.
(1) If $f_n$ is continuous for all $n$, and $\sum f_n$ converges uniformly, then $\sum f_n$ is continuous.
(2) If $X = [a,b]$, $f_n \in \mathcal{R}[a,b]$ for all $n$, and $\sum f_n$ converges uniformly, then $\sum f_n \in \mathcal{R}[a,b]$, and $\int_a^b \sum f_n = \sum \int_a^b f_n$.
(3) If $X = [a,b]$, $f_n$ is differentiable for all $n$, $\sum f_n'$ converges uniformly, and $\sum f_n(x_0)$ converges for some $x_0 \in [a,b]$, then $\sum f_n$ is differentiable, and $\bigl( \sum f_n \bigr)' = \sum f_n'$.

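30. Power series

Definition 30.1. Let $x_0 \in \mathbb{R}$ and let $(a_n)_{n=0}^\infty$ be a real sequence. The series $\sum_{n=0}^\infty a_n (x - x_0)^n$ is called a power series with center $x_0$. The domain of convergence of the power series is the set $D = \left\{x \in \mathbb{R} : \sum_{n=0}^\infty a_n (x - x_0)^n \text{ converges}\right\}$.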

Theorem 30.2. Let $\sum_{n=0}^\infty a_n (x - x_0)^n$ be a power series. There are three possibilities.

(1) The series converges absolutely for all $x \in \mathbb{R}$. The convergence is uniform on compact sets.

(2) There is $R > 0$ such that the series converges absolutely if $|x - x_0| < R$, and diverges if $|x - x_0| > R$. The convergence is uniform on compact subsets of the interval $(x_0 - R, x_0 + R)$.

(3) The series diverges for all $x \ne x_0$.

Proof. Let $x, y \in \mathbb{R}$ with $|y - x_0| < |x - x_0|$. First suppose that the series converges at $x$. Then $\lim_{n\to\infty} a_n (x - x_0)^n = 0$, so there is $M$ such that $|a_n (x - x_0)^n| \le M$ for all $n$. Now we have

$$|a_n (y - x_0)^n| = |a_n (x - x_0)^n| \left|\frac{y - x_0}{x - x_0}\right|^n.$$

Thus $\sum a_n (y - x_0)^n$ converges absolutely by comparison with the geometric series $\sum M r^n$, where $r = |y - x_0|/|x - x_0| < 1$. The contrapositive of the above implication shows that, on the other hand, if the series diverges at $y$, then it must also diverge at $x$.

Now, if cases (1) and (3) do not apply, let $R = \sup\{|x - x_0| : \text{the series converges at } x\}$. It follows from the above that $0 < R < \infty$, and that the series converges absolutely for $|x - x_0| < R$ and diverges for $|x - x_0| > R$. We let $R = \infty$ in case (1), and we let $R = 0$ in case (3). In cases (1) and (2), if $K \subseteq \{x : |x - x_0| < R\}$ is a compact set, then there are $r, s$ such that $0 < r < s < R$, and $K \subseteq (x_0 - r, x_0 + r)$. Let $x = x_0 + s$, and let $M$ be as above: $|a_n (x - x_0)^n| \le M$ for all $n$. Let $M_n = M (r/s)^n$. Then $\sum_n M_n < \infty$. For any $y \in K$ we have

$$|a_n (y - x_0)^n| = |a_n (x - x_0)^n| \left|\frac{y - x_0}{x - x_0}\right|^n \le M \left(\frac{r}{s}\right)^n = M_n.$$

Then the convergence is uniform on $K$ by the Weierstrass M-test.

Definition 30.3. The number R in Theorem 30.2 is called the radius of convergence of the

power series.


Theorem 30.4. $R = \left(\limsup_{n\to\infty} \sqrt[n]{|a_n|}\right)^{-1}$.

Proof. We use the root test: $\limsup_{n\to\infty} |a_n (x - x_0)^n|^{1/n} = \limsup_{n\to\infty} |a_n|^{1/n} |x - x_0|$. This is less than 1 if $|x - x_0| < \left(\limsup_{n\to\infty} \sqrt[n]{|a_n|}\right)^{-1}$, and greater than 1 if $|x - x_0| > \left(\limsup_{n\to\infty} \sqrt[n]{|a_n|}\right)^{-1}$. This identifies the number $R$ as in the statement of the theorem.
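The formula of Theorem 30.4 can be explored numerically. The sketch below is only a heuristic: it evaluates $|a_n|^{-1/n}$ at a single large index, which approximates $R$ when the sequence $|a_n|^{1/n}$ actually converges, and it says nothing reliable about a genuine lim sup. The helper names and the two sample coefficient sequences ($a_n = 3^n$ and $a_n = 1/n$) are our own assumptions. We work with $\log |a_n|$ to avoid overflow.

```python
import math


def radius_estimate(log_abs_a, n: int = 10_000) -> float:
    """Heuristic for R: 1 / |a_n|^(1/n) at one large n, computed via logarithms."""
    return 1.0 / math.exp(log_abs_a(n) / n)


def log_abs_geometric(n: int) -> float:
    """log|a_n| for the sample coefficients a_n = 3^n (expected radius 1/3)."""
    return n * math.log(3.0)


def log_abs_harmonic(n: int) -> float:
    """log|a_n| for the sample coefficients a_n = 1/n (expected radius 1)."""
    return -math.log(n)


if __name__ == "__main__":
    print(radius_estimate(log_abs_geometric))  # close to 1/3
    print(radius_estimate(log_abs_harmonic))   # close to 1
```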

Theorem 30.5. Let $\sum a_n (x - x_0)^n$ have a positive radius of convergence $R$. Then $\sum n a_n (x - x_0)^{n-1}$ and $\sum \frac{a_n}{n+1} (x - x_0)^{n+1}$ also have radius of convergence $R$. If $f : (x_0 - R, x_0 + R) \to \mathbb{R}$ is defined by $f(x) = \sum_{n=0}^\infty a_n (x - x_0)^n$, then these converge to $f'(x)$ and $F(x)$ for $x \in (x_0 - R, x_0 + R)$ (where $F(x) = \int_{x_0}^x f(t)\,dt$).

Proof. For $x \ne x_0$,

$$\sum_{n=0}^\infty n a_n (x - x_0)^{n-1} = \sum_{n=0}^\infty \frac{n a_n}{x - x_0} (x - x_0)^n.$$

Since $\lim_n n^{1/n} = \lim_n |x - x_0|^{1/n} = 1$, we have

$$\limsup_{n\to\infty} \left|\frac{n a_n}{x - x_0}\right|^{1/n} = \limsup_{n\to\infty} |a_n|^{1/n} = R^{-1}.$$

Next, note that

$$1 \le (n + 1)^{1/n} = \left(1 + \frac{1}{n}\right)^{1/n} n^{1/n} \le 2^{1/n} n^{1/n} \to 1 \quad \text{as } n \to \infty.$$

Now we have

$$\sum_{n=0}^\infty \frac{a_n}{n+1} (x - x_0)^{n+1} = \sum_{n=0}^\infty \frac{a_n (x - x_0)}{n+1} (x - x_0)^n,$$

and hence

$$\limsup_{n\to\infty} \left|\frac{a_n (x - x_0)}{n+1}\right|^{1/n} = \limsup_{n\to\infty} |a_n|^{1/n} = R^{-1}.$$

Since the convergence of all three series is uniform on compact subsets of the interior of the domain of convergence, we may differentiate or integrate term-by-term, by Theorem 29.4.
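A quick numerical check of term-by-term differentiation and integration, using the geometric series ($a_n = 1$, $x_0 = 0$, $R = 1$) as a test case. Here $f(x) = 1/(1 - x)$, so the differentiated series should approach $1/(1 - x)^2$ and the integrated series should approach $-\log(1 - x)$. The function names are hypothetical and the truncation level $N$ is an arbitrary choice.

```python
import math


def differentiated_series(x: float, N: int) -> float:
    """Partial sum of sum_{n>=1} n x^(n-1), the term-by-term derivative of sum x^n."""
    return sum(n * x ** (n - 1) for n in range(1, N + 1))


def integrated_series(x: float, N: int) -> float:
    """Partial sum of sum_{n>=0} x^(n+1) / (n+1), the term-by-term integral of sum x^n."""
    return sum(x ** (n + 1) / (n + 1) for n in range(N + 1))


if __name__ == "__main__":
    x, N = 0.5, 200
    print(differentiated_series(x, N), 1.0 / (1.0 - x) ** 2)
    print(integrated_series(x, N), -math.log(1.0 - x))
```

Both comparisons are made strictly inside the interval of convergence, where Theorem 30.5 applies.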

31. Compactness in function space

Let $X$ be a compact metric space (we will only consider the case where $X$ is compact). Recall that $C(X, \mathbb{R}^k)$ is a complete metric space with the uniform norm: $\|f - g\|_u = \sup_{x \in X} \|f(x) - g(x)\|$. It is important to remember that the Heine-Borel theorem does not hold in this setting: it is possible for a closed bounded set to be non-compact. In particular, the closed unit ball $B = \{f \in C(X, \mathbb{R}^k) : \|f\|_u \le 1\}$ is (usually) not compact.

Here is an example, with $X = [0, 1]$. Let $f_n(x) = x^n$. Then $f_n \in B$. If $B$ were compact, then the sequence $(f_n)$ in $B$ would have a convergent subsequence, that is, a uniformly convergent subsequence. But we have already seen that there is no such subsequence, since the pointwise limit of $f_n$ exists and is discontinuous.
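The failure of compactness can also be seen numerically. The sketch below approximates the uniform distance $\|f_n - f_m\|_u$ for $f_n(x) = x^n$ by sampling a grid in $[0, 1]$; the function name and grid size are our own choices. One finds $\|f_n - f_{2n}\|_u = 1/4$ for every $n$ (the maximum of $t - t^2$ at $t = x^n = 1/2$), and for fixed $n$ the distance to $f_m$ approaches 1 as $m$ grows, so no subsequence can be uniformly Cauchy.

```python
def sup_distance(n: int, m: int, grid: int = 100_001) -> float:
    """Approximate sup over [0, 1] of |x^n - x^m| by sampling a uniform grid."""
    return max(
        abs((k / (grid - 1)) ** n - (k / (grid - 1)) ** m) for k in range(grid)
    )


if __name__ == "__main__":
    print(sup_distance(5, 10))    # about 0.25
    print(sup_distance(50, 100))  # about 0.25
    print(sup_distance(5, 5000))  # close to 1
```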


We will characterize the compact subsets of $C(X, \mathbb{R}^k)$. Recall that a set is compact if and only if it is complete and totally bounded. Since $C(X, \mathbb{R}^k)$ is already a complete metric space, a subset is complete if and only if it is closed. Therefore we will focus our attention on the property of total boundedness: how can we describe in a more intrinsic way what it means for a subset of $C(X, \mathbb{R}^k)$ to be totally bounded?

Let $\mathcal{F} \subseteq C(X, \mathbb{R}^k)$ be totally bounded. Let $\varepsilon > 0$. Then there are $f_1, \ldots, f_n \in \mathcal{F}$ such that $\mathcal{F} \subseteq \bigcup_{i=1}^n B_\varepsilon(f_i)$. Since $X$ is compact, and the $f_i$ are continuous, they are uniformly continuous. Thus for each $i$ there is $\delta_i > 0$ such that for all $x, y \in X$, if $d(x, y) < \delta_i$ then $\|f_i(x) - f_i(y)\| < \varepsilon$. Let $\delta = \min\{\delta_1, \ldots, \delta_n\}$. We claim that for any function $f \in \mathcal{F}$, this $\delta$ works in the definition of uniform continuity. To see this, let $f \in \mathcal{F}$, and let $x, y \in X$ with $d(x, y) < \delta$. There is $i_0$, $1 \le i_0 \le n$, such that $\|f - f_{i_0}\|_u < \varepsilon$. Then

$$\|f(x) - f(y)\| \le \|f(x) - f_{i_0}(x)\| + \|f_{i_0}(x) - f_{i_0}(y)\| + \|f_{i_0}(y) - f(y)\| < \varepsilon + \varepsilon + \varepsilon = 3\varepsilon.$$

Thus we have shown that the functions in the family F are “equally uniformly continuous”.

This phrase has been shortened to “equicontinuous”.

Definition 31.1. Let $\mathcal{F}$ be a family of functions between metric spaces $X$ and $Y$. Let $x_0 \in X$.

(1) $\mathcal{F}$ is equicontinuous at $x_0$ if for each $\varepsilon > 0$ there is $\delta > 0$ such that for each $f \in \mathcal{F}$ and for all $x \in X$, if $d_X(x, x_0) < \delta$ then $d_Y\big(f(x), f(x_0)\big) < \varepsilon$. (I.e. $\delta$ is independent of the choice of $f \in \mathcal{F}$.)

(2) $\mathcal{F}$ is equicontinuous (on $X$) if it is equicontinuous at each point of $X$.

(3) $\mathcal{F}$ is uniformly equicontinuous (on $X$) if for each $\varepsilon > 0$ there is $\delta > 0$ such that for each $f \in \mathcal{F}$ and for all $x, z \in X$, if $d_X(x, z) < \delta$ then $d_Y\big(f(x), f(z)\big) < \varepsilon$.

Exercise 31.2. If $X$ is compact, and $\mathcal{F}$ is equicontinuous, then $\mathcal{F}$ is uniformly equicontinuous.

Because of this exercise, when $X$ is compact we need not distinguish between equicontinuity and uniform equicontinuity. We remark that there are stupid examples of equicontinuous families. For example, in $C(X, \mathbb{R})$, we may consider the family of all constant functions. This family is clearly equicontinuous, but is not totally bounded (or even bounded). For this reason we identify another property of a family of functions.

Definition 31.3. $\mathcal{F} \subseteq C(X, \mathbb{R}^k)$ is pointwise bounded if for each $x \in X$, the set $\mathcal{F}(x) := \{f(x) : f \in \mathcal{F}\}$ is a bounded subset of $\mathbb{R}^k$.

Exercise 31.4. If $\mathcal{F} \subseteq C(X, \mathbb{R}^k)$ is pointwise bounded and equicontinuous, then $\mathcal{F}$ is a bounded subset (of $C(X, \mathbb{R}^k)$).

We remark that a totally bounded subset of $C(X, \mathbb{R}^k)$ is also bounded, and hence pointwise bounded. Thus we have already proved the following result.

Lemma 31.5. Let $X$ be compact and $\mathcal{F} \subseteq C(X, \mathbb{R}^k)$. If $\mathcal{F}$ is totally bounded, then $\mathcal{F}$ is pointwise bounded and equicontinuous.

The Arzelà-Ascoli theorem is the converse of the lemma. It is usually phrased in terms of precompactness: a subset of a metric space is precompact if its closure is compact. In the setting of $C(X, \mathbb{R}^k)$, then, precompactness is the same as total boundedness.

Theorem 31.6. Let $X$ be a compact metric space, and let $\mathcal{F} \subseteq C(X, \mathbb{R}^k)$. Then $\mathcal{F}$ is precompact if and only if it is pointwise bounded and equicontinuous.


Proof. As remarked above, we have already proved the "only if" direction. So we assume that $\mathcal{F}$ is pointwise bounded and equicontinuous. We use Exercise 31.2; hence $\mathcal{F}$ is uniformly equicontinuous. Let $\varepsilon > 0$. Choose $\delta > 0$ as in the definition of uniform equicontinuity of $\mathcal{F}$. Since $X$ is compact, $X$ is totally bounded. Then there are $x_1, \ldots, x_p \in X$ such that $X = \bigcup_{i=1}^p B_\delta(x_i)$. Now we use the pointwise boundedness of $\mathcal{F}$. For each $i$, the set $\mathcal{F}(x_i) = \{f(x_i) : f \in \mathcal{F}\}$ is a bounded subset of $\mathbb{R}^k$, hence is totally bounded (by Lemma 14.27). Then the union $\bigcup_{i=1}^p \mathcal{F}(x_i)$ is also totally bounded. So we can choose points $y_1, \ldots, y_q \in \mathbb{R}^k$ such that

$$\bigcup_{i=1}^p \mathcal{F}(x_i) \subseteq \bigcup_{j=1}^q B_\varepsilon(y_j).$$

Now we come to the interesting part of the argument. Let $f \in \mathcal{F}$. For each $i$, choose $j$ such that $f(x_i) \in B_\varepsilon(y_j)$. This defines a function $\eta_f : \{1, 2, \ldots, p\} \to \{1, 2, \ldots, q\}$. Thus $\eta_f$ satisfies the formula

$$f(x_i) \in B_\varepsilon\big(y_{\eta_f(i)}\big).$$

But notice that there are only a finite number of possible functions $\eta : \{1, 2, \ldots, p\} \to \{1, 2, \ldots, q\}$. For each such function $\eta$, let

$$C_\eta = \{f \in \mathcal{F} : \eta_f = \eta\}.$$

Then $\mathcal{F} \subseteq \bigcup_\eta C_\eta$, a finite union. Each $C_\eta$ is a subset of $C(X, \mathbb{R}^k)$. To finish the proof, we will show that $C_\eta$ has diameter at most $4\varepsilon$. Let $f, g \in C_\eta$, for some $\eta$. Then for $i = 1, \ldots, p$, we have $f(x_i), g(x_i) \in B_\varepsilon(y_{\eta(i)})$. For any $x \in X$ choose $i$ with $x \in B_\delta(x_i)$. Then

$$\|f(x) - g(x)\| \le \|f(x) - f(x_i)\| + \|f(x_i) - g(x_i)\| + \|g(x_i) - g(x)\| < \varepsilon + \|f(x_i) - g(x_i)\| + \varepsilon,$$

by the uniform equicontinuity of $\mathcal{F}$ (and the choice of $\delta$),

$$< \varepsilon + 2\varepsilon + \varepsilon,$$

since $f(x_i)$ and $g(x_i)$ belong to a ball of radius $\varepsilon$. Thus $\|f - g\|_u < 4\varepsilon$. Therefore $C_\eta$ has diameter at most $4\varepsilon$.

32. Conditional convergence

Theorem 32.1. (Abel's theorem.) Let $\sum a_n$ have bounded partial sums, and let $(b_n)$ be a decreasing nonnegative sequence. If $M \ge 0$ is such that $\left|\sum_{j=1}^n a_j\right| \le M$ for all $n$, then for any $m \le n$ we have the estimate $\left|\sum_{j=m}^n a_j b_j\right| \le 2M b_m$. Moreover, if $b_n \to 0$, then $\sum a_n b_n$ converges.

Proof. Let $s_n = \sum_{j=1}^n a_j$. We have

$$\sum_{j=m}^n a_j b_j = \sum_{j=m}^n (s_j - s_{j-1}) b_j$$

$$= \sum_{j=m}^n s_j b_j - \sum_{j=m-1}^{n-1} s_j b_{j+1}$$

$$= \sum_{j=m}^{n-1} s_j (b_j - b_{j+1}) + s_n b_n - s_{m-1} b_m.$$


We then have

$$\left|\sum_{j=m}^n a_j b_j\right| \le \left|\sum_{j=m}^{n-1} s_j (b_j - b_{j+1})\right| + |s_n b_n| + |s_{m-1} b_m|$$

$$\le \sum_{j=m}^{n-1} M (b_j - b_{j+1}) + M b_n + M b_m$$

$$= 2M b_m.$$

If $b_m \to 0$ as $m \to \infty$, the series $\sum a_n b_n$ converges by the Cauchy criterion.
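The summation-by-parts identity and the resulting estimate are easy to check numerically. In the sketch below, the function `abel_check` and the sample sequences $a_j = \sin j$, $b_j = 1/j$ are our own illustrative choices. The constant $M$ is taken to be the largest $|s_k|$ for $k \le n$, which is all that the proof uses.

```python
import math


def abel_check(a, b, m: int, n: int):
    """Return (lhs, rhs, M) for the summation-by-parts identity.

    lhs = sum_{j=m}^n a_j b_j
    rhs = sum_{j=m}^{n-1} s_j (b_j - b_{j+1}) + s_n b_n - s_{m-1} b_m
    M   = max |s_k| over 0 <= k <= n, where s_k = a_1 + ... + a_k and s_0 = 0.
    """
    s = [0.0]
    for j in range(1, n + 1):
        s.append(s[-1] + a(j))
    lhs = sum(a(j) * b(j) for j in range(m, n + 1))
    rhs = (
        sum(s[j] * (b(j) - b(j + 1)) for j in range(m, n))
        + s[n] * b(n)
        - s[m - 1] * b(m)
    )
    M = max(abs(v) for v in s)
    return lhs, rhs, M


def a_example(j: int) -> float:
    """Sample terms with bounded partial sums (assumption: a_j = sin j)."""
    return math.sin(j)


def b_example(j: int) -> float:
    """Sample decreasing nonnegative sequence (assumption: b_j = 1/j)."""
    return 1.0 / j


if __name__ == "__main__":
    lhs, rhs, M = abel_check(a_example, b_example, 5, 200)
    print(lhs, rhs, 2 * M * b_example(5))
```

The two sides of the identity agree up to rounding, and $|\text{lhs}|$ stays below $2M b_m$.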

Remark 32.2. The estimate in Abel's theorem also shows that the "remainder" after $m$ terms satisfies $\left|\sum_{j>m} a_j b_j\right| \le 2M b_{m+1}$.

Corollary 32.3. (Alternating series test.) Let $(b_n)$ be a decreasing sequence with limit 0. Then the alternating series

$$b_1 - b_2 + b_3 - \cdots = \sum_{n=1}^\infty (-1)^{n-1} b_n$$

converges, and $\left|\sum_{j=n+1}^\infty (-1)^{j-1} b_j\right| \le b_{n+1}$.

Proof. With $a_n = (-1)^{n-1}$, Abel's theorem proves convergence, and gives the estimate with a factor of 2. However, since the partial sums of $\sum a_n$ are all non-negative (either 0 or 1), the estimate in that proof can be improved as in the statement of the corollary. We leave the details to the interested reader.
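The remainder estimate can be checked on finite sections of a series: for $n < N$, the difference of partial sums $|S_N - S_n|$ should not exceed $b_{n+1}$. The sketch below does this for $b_j = 1/j$; the function names are hypothetical, and the choice of $N$ is arbitrary.

```python
def alternating_partial_sum(b, n: int) -> float:
    """S_n = sum_{j=1}^n (-1)^(j-1) b(j)."""
    return sum((-1) ** (j - 1) * b(j) for j in range(1, n + 1))


def harmonic(j: int) -> float:
    """Sample decreasing sequence with limit 0 (assumption: b_j = 1/j)."""
    return 1.0 / j


if __name__ == "__main__":
    big = alternating_partial_sum(harmonic, 2000)
    for n in (1, 10, 57):
        gap = abs(big - alternating_partial_sum(harmonic, n))
        print(n, gap, harmonic(n + 1))  # gap stays below b_{n+1}
```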

Example 32.4. (1) (The alternating harmonic series.) $\sum_{n=1}^\infty (-1)^{n-1}/n = 1 - 1/2 + 1/3 - 1/4 + 1/5 - \cdots$ converges by the alternating series test. (We will see later that the sum is $\log 2$.)

(2) Let $\theta$ be an irrational number. (In fact, the argument we present applies to any non-integral real number $\theta$.) In the following, we will apply the formula for the sum of a finite geometric series to complex numbers.

$$\sum_{j=1}^n \sin 2\pi j\theta = \operatorname{Im} \sum_{j=0}^n (\cos 2\pi j\theta + i \sin 2\pi j\theta)$$

$$= \operatorname{Im} \sum_{j=0}^n (\cos 2\pi\theta + i \sin 2\pi\theta)^j$$

$$= \operatorname{Im} \frac{1 - (\cos 2\pi\theta + i \sin 2\pi\theta)^{n+1}}{1 - (\cos 2\pi\theta + i \sin 2\pi\theta)}.$$


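Hence

$$\left|\sum_{j=1}^n \sin 2\pi j\theta\right| \le \frac{2}{\left|1 - (\cos 2\pi\theta + i \sin 2\pi\theta)\right|} = \frac{2}{\sqrt{(1 - \cos 2\pi\theta)^2 + \sin^2 2\pi\theta}} = \sqrt{\frac{2}{1 - \cos 2\pi\theta}}.$$

Thus the series $\sum \sin 2\pi n\theta$ has bounded partial sums. By Abel's theorem, the series $\sum_n \frac{\sin 2\pi n\theta}{n}$ converges.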
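The bound on the partial sums in part (2) can be tested numerically. The following sketch tracks the largest $\left|\sum_{j=1}^n \sin 2\pi j\theta\right|$ seen up to a cutoff and compares it with $\sqrt{2/(1 - \cos 2\pi\theta)}$. The function names, the cutoff, and the sample value $\theta = \sqrt{2}$ are our own assumptions.

```python
import math


def max_partial_sum(theta: float, n_max: int) -> float:
    """Largest |sum_{j=1}^n sin(2 pi j theta)| over 1 <= n <= n_max."""
    running, worst = 0.0, 0.0
    for j in range(1, n_max + 1):
        running += math.sin(2.0 * math.pi * j * theta)
        worst = max(worst, abs(running))
    return worst


def partial_sum_bound(theta: float) -> float:
    """The bound sqrt(2 / (1 - cos(2 pi theta))), valid for non-integral theta."""
    return math.sqrt(2.0 / (1.0 - math.cos(2.0 * math.pi * theta)))


theta_example = 2.0 ** 0.5  # sample irrational theta (an assumption)

if __name__ == "__main__":
    print(max_partial_sum(theta_example, 10_000), partial_sum_bound(theta_example))
```

The bound blows up as $\theta$ approaches an integer, which matches the requirement that $\theta$ be non-integral.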

Abel also proved the following theorem on the behavior of a power series at an endpoint

of the interval of convergence.

Theorem 32.5. Let $\sum_{n=0}^\infty a_n (x - x_0)^n$ have radius of convergence $0 < R < \infty$. Suppose that the series converges at an endpoint of the interval of convergence. Then the series converges uniformly on the closed interval from $x_0$ to that endpoint.

Corollary 32.6. With the hypotheses of the theorem, let f (x) denote the sum of the series

in its domain of convergence. Then f is continuous.

Proof. (of theorem) A linear change of variables reduces the theorem to the case where $x_0 = 0$ and $R = 1$. We consider the case where the series converges at the right-hand endpoint; the other case has a similar proof. Thus we have a power series $\sum_{n=0}^\infty a_n x^n$ with radius of convergence 1, and such that $\sum a_n$ converges. Let $\varepsilon > 0$ be given. Applying the Cauchy criterion to $\sum a_n$, we obtain $n_0 \in \mathbb{N}$ such that for all $n_0 \le m \le n$ we have

$$\left|\sum_{j=m}^n a_j\right| < \frac{\varepsilon}{2}.$$

For any $x \in [0, 1]$, the sequence $x^n$ is decreasing. We apply Abel's theorem to the series $\sum_{j=n_0}^\infty a_j x^j$ to get

$$\left|\sum_{j=m}^n a_j x^j\right| \le 2 \cdot \frac{\varepsilon}{2} \cdot x^m \le \varepsilon,$$

a uniform estimate.

Example 32.7. (1) From the geometric series $1/(1 - x) = \sum_{n=0}^\infty x^n$ for $|x| < 1$, we integrate term-by-term to obtain

$$-\log(1 - x) = \int_0^x \frac{dt}{1 - t} = \sum_{n=0}^\infty \int_0^x t^n\,dt = \sum_{n=0}^\infty \frac{x^{n+1}}{n+1} = \sum_{n=1}^\infty \frac{x^n}{n},$$

still with radius of convergence equal to 1. Replacing $x$ by $-x$ we get

$$\log(1 + x) = \sum_{n=1}^\infty (-1)^{n-1} \frac{x^n}{n}, \tag{$\ast\ast$}$$

valid for |x| < 1. When x = 1 we have the alternating harmonic series, which

converges. By Abel’s theorem, the power series converges uniformly on [0, 1], and


hence the limit is continuous. Since the equality in (∗∗) holds on [0, 1), and both

sides are continuous on [0, 1], the equality must hold at x = 1. This gives

$$\log 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots.$$

(2) Again starting with the geometric series, we replace $x$ by $-x^2$ to get

$$\frac{1}{1 + x^2} = \sum_{n=0}^\infty (-1)^n x^{2n}.$$

Since $|-x^2| < 1$ if and only if $|x| < 1$, this equation is also valid for $|x| < 1$. Now we integrate term-by-term to get

$$\arctan x = \int_0^x \frac{dt}{1 + t^2} = \sum_{n=0}^\infty \frac{(-1)^n}{2n + 1} x^{2n+1},$$

valid for $|x| < 1$. Again, the series converges for $x = 1$ by the alternating series test. By Abel's theorem, the series is continuous on $[0, 1]$, and so the above equation is still valid at $x = 1$. We obtain the classical series

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots.$$

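(3) We consider $f(x) = (1 + x)^\alpha$ for $\alpha > 0$, $\alpha \notin \mathbb{N}$. Repeated differentiation gives $f^{(n)}(x) = \alpha(\alpha - 1) \cdots (\alpha - n + 1)(1 + x)^{\alpha - n}$. Thus the Taylor series for $f$ is given by

$$1 + \sum_{n=1}^\infty \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!} x^n.$$

Writing $a_n$ for the coefficient of $x^n$, we have $|a_{n+1}/a_n| = |\alpha - n|/(n + 1) \to 1$ as $n \to \infty$. Thus the ratio test implies that the radius of convergence of the series is 1. If we apply the convergence test of homework #49, we find that (for $n > \alpha$)

$$n\left(\left|\frac{\alpha - n}{n + 1}\right| - 1\right) = \frac{n}{n + 1}(-1 - \alpha) \to -1 - \alpha$$

as $n \to \infty$. Thus since $\alpha > 0$, the Taylor series converges at $x = \pm 1$.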

Now we are faced with the following difficulty. Let $g(x) = 1 + \sum_{n=1}^\infty a_n x^n$ be the

sum of the Taylor series of f . Then f and g are both defined on [−1, 1], and are

continuous (by Abel’s theorem, in the case of g). We would like to know that they

are equal. In the previous two examples, we knew which function the power series

represented because we began with the geometric series. The other method that

we have seen involves using Taylor’s theorem to prove that the Taylor polynomials

converge to the function uniformly on compact subsets of the (interior of the) interval

of convergence. In the current example, it isn’t apparent how to bound the derivatives

of f so as to use Taylor’s theorem. Instead, we will present a clever trick, courtesy

of Folland’s Real Analysis.

Notice that $f'(x) = \alpha(1 + x)^{\alpha - 1}$, and hence $\alpha^{-1}(1 + x) f'(x) = (1 + x)^\alpha$. Since we intend to prove that $g(x) = (1 + x)^\alpha$, we investigate this expression using $g$.


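$$\alpha^{-1}(1 + x) g'(x) = \alpha^{-1}(1 + x) \sum_{n=1}^\infty n a_n x^{n-1}$$

$$= \alpha^{-1}\left(\sum_{n=0}^\infty (n + 1) a_{n+1} x^n + \sum_{n=1}^\infty n a_n x^n\right)$$

$$= \alpha^{-1}\left(a_1 + \sum_{n=1}^\infty \big((n + 1) a_{n+1} + n a_n\big) x^n\right).$$

Now

$$(n + 1) a_{n+1} + n a_n = (n + 1)\frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)(\alpha - n)}{(n + 1)!} + n\,\frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!}$$

$$= \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!}\big((\alpha - n) + n\big)$$

$$= a_n \alpha.$$

Hence

$$\alpha^{-1}\big((n + 1) a_{n+1} + n a_n\big) = a_n.$$

Since $a_1 = \alpha$, we get $\alpha^{-1}(1 + x) g'(x) = 1 + \sum_{n=1}^\infty a_n x^n = g(x)$. It follows that

$$\frac{d}{dx}\Big[(1 + x)^{-\alpha} g(x)\Big] = -\alpha(1 + x)^{-\alpha - 1} g(x) + (1 + x)^{-\alpha} g'(x)$$

$$= -\alpha(1 + x)^{-\alpha - 1} \alpha^{-1}(1 + x) g'(x) + (1 + x)^{-\alpha} g'(x)$$

$$= 0.$$

Since $g(0) = 1$, we have $g(x) = (1 + x)^\alpha = f(x)$ on $(-1, 1)$. By continuity, they are equal on $[-1, 1]$.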

Finally, we remark that letting $t = 1 + x \in [0, 2]$, we obtain $t^\alpha = 1 + \sum_{n=1}^\infty a_n (t - 1)^n$, the series converging uniformly on $[0, 2]$. This is another explicit demonstration of Weierstrass' approximation theorem (for these functions).
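The uniform convergence in part (3) can be observed numerically. The sketch below builds the coefficients from the recursion $a_{n+1} = a_n(\alpha - n)/(n + 1)$ with $a_0 = 1$, and samples the error $|t^\alpha - p_N(t)|$ on a grid in $[0, 2]$, where $p_N$ is the partial sum of degree $N$. The function names, the grid, and the sample value $\alpha = 1/2$ are our own assumptions. Convergence is slowest at the endpoint $t = 0$, so a fairly large $N$ is needed there.

```python
def binomial_partial_sum(alpha: float, x: float, N: int) -> float:
    """Partial sum 1 + sum_{n=1}^N a_n x^n of the binomial series for (1 + x)^alpha."""
    total, coeff, power = 1.0, 1.0, 1.0
    for n in range(N):
        coeff *= (alpha - n) / (n + 1)  # a_{n+1} = a_n (alpha - n) / (n + 1)
        power *= x                      # x^(n+1)
        total += coeff * power
    return total


def sup_error(alpha: float, N: int, grid: int = 201) -> float:
    """Largest sampled value on [0, 2] of |t^alpha - p_N(t)|, with x = t - 1."""
    worst = 0.0
    for k in range(grid):
        t = 2.0 * k / (grid - 1)
        worst = max(worst, abs(t**alpha - binomial_partial_sum(alpha, t - 1.0, N)))
    return worst


if __name__ == "__main__":
    for N in (200, 2000):
        print(N, sup_error(0.5, N))  # the sampled error shrinks as N grows
```

For $\alpha = 1/2$ these partial sums are explicit polynomials approximating $\sqrt{t}$ uniformly on $[0, 2]$.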
