
**Course notes, Fall 2012**

Michael Damron

Compiled from lectures and exercises designed with Mark McConnell

following Principles of Mathematical Analysis, Rudin

Princeton University


Contents

1 Fundamentals
    1.1 Sets
    1.2 Relations and functions
    1.3 Cardinality
    1.4 Natural numbers and induction
    1.5 Cardinality and the natural numbers
    1.6 Exercises

2 The real numbers
    2.1 Rationals and suprema
    2.2 Existence and properties of real numbers
    2.3 R^n for n ≥ 2
    2.4 Exercises

3 Metric spaces
    3.1 Definitions
    3.2 Open and closed sets
    3.3 Limit points
    3.4 Compactness
    3.5 Heine-Borel Theorem: compactness in R^n
    3.6 The Cantor set
    3.7 Exercises

4 Sequences
    4.1 Definitions
    4.2 Subsequences, Cauchy sequences and completeness
    4.3 Special sequences
    4.4 Exercises

5 Series
    5.1 Definitions
    5.2 Ratio and root tests
    5.3 Non non-negative series
    5.4 Exercises

6 Function limits and continuity
    6.1 Function limits
    6.2 Continuity
    6.3 Relations between continuity and compactness
    6.4 Connectedness and the IVT
    6.5 Discontinuities
    6.6 Exercises

7 Derivatives
    7.1 Introduction
    7.2 Properties
    7.3 Mean value theorem
    7.4 L'Hopital's rule
    7.5 Power series
    7.6 Taylor's theorem
    7.7 Exercises

8 Integration
    8.1 Definitions
    8.2 Properties of integration
    8.3 Fundamental theorems
    8.4 Change of variables, integration by parts
    8.5 Exercises

A Real powers
    A.1 Natural roots
    A.2 Rational powers
    A.3 Real powers

B Logarithm and exponential functions
    B.1 Logarithm
    B.2 Exponential function
    B.3 Sophomore's dream

C Dimension of the Cantor set
    C.1 Definitions
    C.2 The Cantor set
    C.3 Exercises


1 Fundamentals

1.1 Sets

We begin with the concepts of set, object and set membership. We will leave these as primitive in a sense; that is, undefined. You can think of a set as a collection of objects; if a is an object and A is a set, then a ∈ A means a is a member of A. If A and B are sets, we say that A is a subset of B (written A ⊂ B) if whenever a ∈ A we have a ∈ B. If A ⊂ B and B ⊂ A we say the sets are equal and we write A = B. A is a proper subset of B if A ⊂ B but A ≠ B. Note that ∅, the set with no elements, is a subset of every set.

There are many operations we can perform with sets.

• If A and B are sets, A ∪ B is the union of A and B and is the set

A ∪ B = {a : a ∈ A or a ∈ B}.

• If A and B are sets, A ∩ B is the intersection of A and B and is the set

A ∩ B = {a : a ∈ A and a ∈ B}.

• Of course we can generalize these to arbitrary numbers of sets. If 𝒞 is a (possibly infinite) collection of sets (that is, a set whose elements are themselves sets), we define

⋃_{A∈𝒞} A = {a : a ∈ A for some A ∈ 𝒞}

⋂_{A∈𝒞} A = {a : a ∈ A for all A ∈ 𝒞}.

The sets A and B are called disjoint if A ∩ B = ∅.

These operations obey the following properties:

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) .

Let us give a proof of the first. To show these sets are equal, we must show each is contained in the other. So let a ∈ A ∩ (B ∪ C). We would like to show that a ∈ (A ∩ B) ∪ (A ∩ C). We know a ∈ A and a ∈ B ∪ C. One possibility is that a ∈ A and a ∈ B, in which case a ∈ A ∩ B, giving a ∈ (A ∩ B) ∪ (A ∩ C). The only other possibility is that a ∈ A and a ∈ C, since a must be in either B or C. Then a ∈ A ∩ C and the same conclusion holds. The other direction is an exercise.

If A and B are sets then define the difference A \ B as

A \ B = {a : a ∈ A but a ∉ B}.

One can verify the following as well.

A \ (B ∪ C) = (A \ B) ∩ (A \ C)

A \ (B ∩ C) = (A \ B) ∪ (A \ C).

Finally, the symmetric difference is

A ∆ B = (A \ B) ∪ (B \ A).
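These identities are stated abstractly, but they are easy to sanity-check on small finite sets. Here is a minimal sketch using Python's built-in set type; the particular sets A, B, C are arbitrary examples, not taken from the notes.

```python
# Small, arbitrary example sets (hypothetical; any finite sets work).
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

# Distributive laws: & is intersection, | is union.
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# The two laws for the difference A \ B (written A - B in Python).
assert A - (B | C) == (A - B) & (A - C)
assert A - (B & C) == (A - B) | (A - C)

# Symmetric difference: A ^ B computes (A \ B) ∪ (B \ A).
assert A ^ B == (A - B) | (B - A)
print("all identities hold")
```

Swapping in other finite sets for A, B, C leaves every assertion true, since these are identities, not facts about particular sets.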

1.2 Relations and functions

Our last important way to build a set from other sets is the product. We write

A × B = {(a, b) : a ∈ A, b ∈ B}.

Definition 1.2.1. A relation R between sets A and B is any subset of A × B. If (a, b) ∈ R we think of a as being related to b.

We will first mention types of relations on a single set.

• A relation R between A and A is reflexive if (a, a) ∈ R for all a ∈ A.

• It is symmetric if whenever (a_1, a_2) ∈ R we have (a_2, a_1) ∈ R.

• It is transitive if whenever (a_1, a_2) ∈ R and (a_2, a_3) ∈ R we have (a_1, a_3) ∈ R.

Definition 1.2.2. A relation R on A which is reflexive, symmetric and transitive is called an equivalence relation. Given a ∈ A and an equivalence relation R on A we write

[a]_R = {a′ ∈ A : (a, a′) ∈ R}

for the equivalence class of a.

Sometimes the condition (a, a′) ∈ R is written a ∼ a′ (and sometimes R is not even mentioned). An example is equality of sets; that is, defining the relation A ∼ B if A = B gives an equivalence relation. And here we have not specified R or the “larger” set on which R is a relation. You can check that set equality is reflexive, symmetric and transitive.

Proposition 1.2.3. If R is an equivalence relation on a nonempty set A and a_1, a_2 ∈ A, then either

[a_1]_R = [a_2]_R or [a_1]_R ∩ [a_2]_R = ∅.

Proof. We will first show that both conditions cannot simultaneously hold. Then we will show that at least one must hold. To show the first, note that a_1 ∈ [a_1]_R and a_2 ∈ [a_2]_R since R is reflexive. Therefore if [a_1]_R = [a_2]_R then a_1 ∈ [a_1]_R ∩ [a_2]_R, giving nonempty intersection.

For the second claim, suppose that [a_1]_R ∩ [a_2]_R ≠ ∅, giving some a in the intersection. We claim that [a]_R = [a_1]_R. If a′ ∈ [a]_R then (a′, a) ∈ R. But a ∈ [a_1]_R, so (a, a_1) ∈ R. By transitivity, (a′, a_1) ∈ R, so a′ ∈ [a_1]_R. This proves [a]_R ⊆ [a_1]_R. To show the other containment, let a′ ∈ [a_1]_R, so that (a′, a_1) ∈ R. Again, (a, a_1) ∈ R, giving (a_1, a) ∈ R. Transitivity then implies (a′, a) ∈ R, so a′ ∈ [a]_R. Thus [a]_R = [a_1]_R, and by the same argument [a]_R = [a_2]_R, so [a_1]_R = [a_2]_R.


The picture is that the equivalence classes of R partition A.

Definition 1.2.4. A partition of A is a collection P of subsets of A such that

1. A = ⋃_{S∈P} S and

2. S_1 ∩ S_2 = ∅ whenever S_1 and S_2 in P are not equal.

Using this definition, we can say that if R is an equivalence relation on a set A then the collection

𝒞_R = {[a]_R : a ∈ A}

of equivalence classes forms a partition of A.

Just a note to conclude. If we have an equivalence relation R on a set A, it is standard notation to write

A/R = {[a]_R : a ∈ A}

for the set of equivalence classes of A under R. This is known as taking the quotient by an equivalence relation. At times the relation R is written in an implied manner using a symbol like ∼. For instance, (a, b) ∈ R would be written a ∼ b. In this case, the quotient is written A/∼.
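The passage from a relation to its set of equivalence classes can be made concrete when the relation is induced by a key function. The following sketch is a hypothetical example (the set A and the mod-3 relation are not from the notes): two elements are related iff they have the same key, and the classes are collected by key.

```python
# Illustrative example: A = {0, ..., 9}, with a ~ b iff a % 3 == b % 3.
A = range(10)
key = lambda a: a % 3

# Collect each element into the class of its key; the values of this
# dict are exactly the equivalence classes, i.e. the quotient A/~.
classes = {}
for a in A:
    classes.setdefault(key(a), set()).add(a)

quotient = list(classes.values())

# Sanity checks that the classes partition A (Proposition 1.2.3 /
# Definition 1.2.4): they cover A and are pairwise disjoint.
assert set().union(*quotient) == set(A)
assert all(S1 == S2 or not (S1 & S2) for S1 in quotient for S2 in quotient)
print(quotient)
```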

We will spend much of the course talking about functions, which are special kinds of

relations.

Definition 1.2.5. Let A and B be sets and f a relation between A and B. We say that f is a (well-defined) function from A to B, written f : A → B, if the following hold.

1. For each a ∈ A, there is at least one b ∈ B such that (a, b) ∈ f.

2. For each a ∈ A, there is at most one b ∈ B such that (a, b) ∈ f. That is, if we ever have (a, b_1) ∈ f and (a, b_2) ∈ f for b_1, b_2 ∈ B, it follows that b_1 = b_2.

The set A is called the domain of f and B is called the codomain of f.

Of course we will not continue to use this notation for a function, but the more familiar one: if (a, b) ∈ f then, because of item 2 above, we can unambiguously write f(a) = b.

We will be interested in certain types of functions.

Definition 1.2.6. The function f : A → B is called one-to-one (injective) if whenever a_1 ≠ a_2 then f(a_1) ≠ f(a_2). It is called onto (surjective) if for each b ∈ B there exists a ∈ A such that f(a) = b.

Another way to define onto is to first define the range of a function f : A → B by

f(A) = {f(a) : a ∈ A}

and say that f is onto if f(A) = B.

Many times we want to compose functions to build other ones. Suppose that f : A → B and g : B → C are functions. Then

(g ◦ f) : A → C is defined as (g ◦ f)(a) = g(f(a)).

Formally speaking, we define g ◦ f ⊆ A × C by

(a, c) ∈ g ◦ f if (a, b) ∈ f and (b, c) ∈ g for some b ∈ B.

You can check that this defines a function.

Proposition 1.2.7. Let f : A → B and g : B → C be functions.

1. If f and g are one-to-one then so is g ◦ f.

2. If f and g are onto then so is g ◦ f.

Proof. We start with the first statement. Suppose that f and g are one-to-one; we will show that g ◦ f must be one-to-one. Suppose then that a and a′ in A are such that (g ◦ f)(a) = (g ◦ f)(a′). Then by definition, g(f(a)) = g(f(a′)). But g is one-to-one, so f(a) = f(a′). Now since f is one-to-one, we find a = a′. This shows that if (g ◦ f)(a) = (g ◦ f)(a′) then a = a′, proving g ◦ f is one-to-one.

Suppose then that f and g are onto. To show that g ◦ f is onto we must show that for each c ∈ C there exists a ∈ A such that (g ◦ f)(a) = c. This is the same statement as g(f(a)) = c. We know that g is onto, so there exists b ∈ B such that g(b) = c. Furthermore, f is onto, so for this specific b, there exists a ∈ A such that f(a) = b. Putting these together,

(g ◦ f)(a) = g(f(a)) = g(b) = c.

This completes the proof.
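On finite sets, Proposition 1.2.7 can be checked directly by modelling functions as dictionaries. This is only an illustrative sketch; the particular f and g below are made-up examples.

```python
# Hypothetical finite functions, encoded as dicts mapping a -> f(a).
f = {1: 'a', 2: 'b', 3: 'c'}        # f : A -> B, one-to-one and onto
g = {'a': 'x', 'b': 'y', 'c': 'z'}  # g : B -> C, one-to-one and onto

def compose(g, f):
    """Return g ∘ f as a dict, i.e. a -> g(f(a))."""
    return {a: g[f[a]] for a in f}

def injective(h):
    # One-to-one: no two inputs share an output.
    return len(set(h.values())) == len(h)

def surjective(h, codomain):
    # Onto: every element of the codomain is hit.
    return set(h.values()) == set(codomain)

gf = compose(g, f)
assert injective(f) and injective(g) and injective(gf)
assert surjective(gf, {'x', 'y', 'z'})
```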

If a function is both one-to-one and onto we can deﬁne an inverse function.

Definition 1.2.8. If f : A → B is both one-to-one and onto, we call f a bijection.

Theorem 1.2.9. Let f : A → B. There exists a function f^{-1} : B → A such that

f^{-1} ◦ f = id_A and f ◦ f^{-1} = id_B, (1)

where id_A : A → A and id_B : B → B are the identity functions

id_A(a) = a and id_B(b) = b,

if and only if f is a bijection. The meaning of the above equations is f^{-1}(f(a)) = a and f(f^{-1}(b)) = b for all a ∈ A and b ∈ B.

Proof. Suppose that f : A → B is a bijection. Then define f^{-1} ⊆ B × A by

f^{-1} = {(b, a) : (a, b) ∈ f}.

This is clearly a relation. We claim it is a function. To show this we must prove that

• for all b ∈ B there exists a ∈ A such that (b, a) ∈ f^{-1} and

• for all b ∈ B there exists at most one a ∈ A such that (b, a) ∈ f^{-1}.

Restated, these are

• for all b ∈ B there exists a ∈ A such that f(a) = b and

• for all b ∈ B there exists at most one a ∈ A such that f(a) = b.

These are exactly the conditions that f be a bijection, so f^{-1} is a function.

Now we must show that f^{-1} ◦ f = id_A and f ◦ f^{-1} = id_B. We show only the first; the second is an exercise. For each a ∈ A, there is a b ∈ B such that f(a) = b. By definition of f^{-1}, we then have (b, a) ∈ f^{-1}; that is, f^{-1}(b) = a. Therefore (a, b) ∈ f and (b, a) ∈ f^{-1}, giving (a, a) ∈ f^{-1} ◦ f, or

(f^{-1} ◦ f)(a) = a = id_A(a).

We have now shown that if f is a bijection then there is a function f^{-1} that satisfies (1).

For the other direction, suppose that f : A → B is a function and g : B → A is a function such that

g ◦ f = id_A and f ◦ g = id_B.

We must show then that f is a bijection. To show one-to-one, suppose that f(a_1) = f(a_2). Then a_1 = id_A(a_1) = g(f(a_1)) = g(f(a_2)) = id_A(a_2) = a_2, giving that f is one-to-one. To show onto, let b ∈ B; we claim that f maps the element g(b) to b. To see this, compute b = id_B(b) = f(g(b)). This shows that f is onto and completes the proof.

Here are some more facts about inverses and injectivity/surjectivity.

• If f : A → B is a bijection then so is f^{-1} : B → A.

• If f : A → B and g : B → C are bijections then so is g ◦ f.

• The identity map id_A : A → A is a bijection.

If a function f : A → B is not a bijection then there is no inverse function f^{-1} : B → A. However, we can in all cases consider the inverse image.

Definition 1.2.10. Given f : A → B and C ⊂ B we define the inverse image of C as

f^{-1}(C) = {a ∈ A : f(a) ∈ C}.

Note that if we let C be a singleton set {b} for some b ∈ B then we retrieve all elements a ∈ A mapped to b:

f^{-1}({b}) = {a ∈ A : f(a) = b}.

In the case that f is invertible, this just gives the singleton set consisting of the point f^{-1}(b).

We note the following properties of inverse images (proved in the homework). For f : A → B and C_1, C_2 ⊂ B,

• f^{-1}(C_1 ∩ C_2) = f^{-1}(C_1) ∩ f^{-1}(C_2).

• f^{-1}(C_1 ∪ C_2) = f^{-1}(C_1) ∪ f^{-1}(C_2).
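The two displayed properties can be verified on a small example. The sketch below is illustrative only: the helper `preimage` and the sample function (squaring, which is not one-to-one) are assumptions for the demonstration, not notation from the notes.

```python
def preimage(f, domain, C):
    """f^{-1}(C) = {a in domain : f(a) in C}."""
    return {a for a in domain if f(a) in C}

# Hypothetical example: squaring on {-3, ..., 3}; not one-to-one,
# so no inverse function exists, yet inverse images still make sense.
A = range(-3, 4)
f = lambda a: a * a

C1, C2 = {1, 4}, {4, 9}
# The two properties of inverse images:
assert preimage(f, A, C1 & C2) == preimage(f, A, C1) & preimage(f, A, C2)
assert preimage(f, A, C1 | C2) == preimage(f, A, C1) | preimage(f, A, C2)
print(preimage(f, A, {4}))  # both -2 and 2 map to 4
```

Note that the analogous intersection property fails for forward images in general, which is one reason inverse images behave so well later in the course.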


1.3 Cardinality

The results of the previous section allow us to define an equivalence relation on sets:

Definition 1.3.1. If A and B are sets, we say that A and B are equivalent (written A ∼ B; equivalently, A and B have the same cardinality) if there exists a bijection f : A → B. The cardinality of a set A (written card(A)) is defined as the equivalence class of A under this relation. That is,

card(A) = {B : A ∼ B}.

To compare cardinalities, we introduce a new relation on sets.

Definition 1.3.2. If A and B are sets then we write card(A) ≤ card(B) if there exists a one-to-one function f : A → B. Write card(A) < card(B) if card(A) ≤ card(B) but card(A) ≠ card(B).

The following properties follow. (Exercise: verify the first two.)

1. (reflexivity) For each set A, card(A) ≤ card(A).

2. (transitivity) For all sets A, B, C, if card(A) ≤ card(B) and card(B) ≤ card(C) then card(A) ≤ card(C).

3. (antisymmetry) For all sets A and B, if card(A) ≤ card(B) and card(B) ≤ card(A) then card(A) = card(B).

Any relation on a set that satisfies these properties is called a partial order. For cardinality, antisymmetry is established by the Cantor–Bernstein theorem, which we will skip.

Theorem 1.3.3 (Cantor’s Theorem). For any set A, let 𝒫(A) be the power set of A; that is, the set whose elements are the subsets of A. Then card(A) < card(𝒫(A)).

Proof. We first show that card(A) ≠ card(𝒫(A)). We proceed by contradiction: suppose that A is a set but assume that card(A) = card(𝒫(A)). Then there exists a bijection f : A → 𝒫(A). Using this function, define the set

S = {a ∈ A : a ∉ f(a)}.

Since this is a subset of A, it is an element of 𝒫(A). As f is a bijection, it is onto, and therefore there exists s ∈ A such that f(s) = S. There are now two possibilities: either s ∈ S or s ∉ S. In either case we will derive a contradiction, proving that the assumption we made cannot be true: no such f can exist, and card(A) ≠ card(𝒫(A)).

In the first case, s ∈ S. Then as S = f(s), we have s ∈ f(s). But then by definition of S, it must actually be that s ∉ S, a contradiction. In the second case, s ∉ S, giving by the definition of S that s ∈ f(s). However f(s) = S, so s ∈ S, another contradiction.

Second, we must show that card(A) ≤ card(𝒫(A)). To do this we define the function

f : A → 𝒫(A) by f(a) = {a}.

To prove injectivity, suppose that f(a_1) = f(a_2). Then {a_1} = {a_2} and therefore a_1 = a_2.


Let us now give an example of two sets with the same cardinality. If A and B are sets we write B^A for the set of functions f : A → B. Let F_2 be a set with two elements, which we call 0 and 1. We claim that

card(𝒫(A)) = card(F_2^A).

To see this we must display a bijection between the two. Define f : 𝒫(A) → F_2^A by the following. To any subset S ⊂ A associate the characteristic function χ_S : A → F_2 given by

χ_S(a) =
  1 if a ∈ S
  0 if a ∉ S.

Exercise: show that the function f : 𝒫(A) → F_2^A given by f(S) = χ_S is a bijection.
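For a finite A, the claimed correspondence between 𝒫(A) and F_2^A can be enumerated exhaustively. A sketch under that assumption follows; the helper names `chi` and `subset_of` are made up for the illustration.

```python
from itertools import product

# A hypothetical three-element set.
A = ['p', 'q', 'r']

def chi(S):
    """Characteristic function of S, encoded as a dict A -> {0, 1}."""
    return {a: 1 if a in S else 0 for a in A}

def subset_of(f):
    """Inverse direction: recover S from its characteristic function."""
    return frozenset(a for a in A if f[a] == 1)

# Enumerate all of F_2^A: every 0/1 assignment to the elements of A.
functions = [dict(zip(A, bits)) for bits in product([0, 1], repeat=len(A))]

# Each function comes from exactly one subset, and chi undoes subset_of,
# so the correspondence is a bijection: both sides have 2^|A| elements.
subsets = {subset_of(f) for f in functions}
assert len(subsets) == len(functions) == 2 ** len(A)
assert all(chi(subset_of(f)) == f for f in functions)
```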

1.4 Natural numbers and induction

To introduce the natural numbers in an axiomatic way we will use the Peano axioms.

Assumption. We assume the existence of a set N, an element 1 ∈ N and a function s : N → N with the following properties.

1. For each n ∈ N, s(n) (the successor of n) is not equal to 1.

2. s is injective.

3. (Inductive axiom) If any subset S ⊂ N contains 1 and has the property that whenever n ∈ S then s(n) ∈ S, it follows that S = N.

The third property seems a bit weird at first, but actually there are many sets which satisfy the first two properties and are not N. For instance, the set {n/2 : n ∈ N} does. So we need the third axiom to really pin down N.

From these axioms many properties follow. Here is one.

• For all n ∈ N, s(n) ≠ n.

Proof. Let S = {n ∈ N : s(n) ≠ n}. Clearly 1 ∈ S, since s(1) ≠ 1 by the first axiom. Now suppose that n ∈ S for some n. Then we claim that s(n) ∈ S. To see this, note that by injectivity of s, s(n) ≠ n implies that s(s(n)) ≠ s(n). Thus s(n) ∈ S. By the inductive axiom, since 1 ∈ S and whenever n ∈ S we have s(n) ∈ S, we see that S = N. In other words, s(n) ≠ n for all n.

Addition

It is customary to call s(1) = 2, s(2) = 3, and so on. We define addition on the natural numbers in a recursive manner:

• for any n ∈ N, define n + 1 to be s(n) and

• for any n, m ∈ N, define n + s(m) to be s(n + m).

That this indeed defines a function + : N × N → N requires proof, but we will skip this and assume that addition is defined normally. Of course, addition satisfies the commutative and associative laws.
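The recursive definition of addition transcribes directly into code. In the sketch below, Python integers stand in for the abstract naturals and `s` plays the role of the successor; this encoding is an illustrative assumption, not part of the axioms.

```python
def s(n):
    """The successor function."""
    return n + 1

def add(n, m):
    """n + m, defined by: n + 1 = s(n) and n + s(m) = s(n + m)."""
    if m == 1:
        return s(n)
    # m = s(m - 1), so by the second clause n + m = s(n + (m - 1)).
    return s(add(n, m - 1))

assert add(3, 4) == 7
# Spot-check the associative and commutative laws on small cases.
assert all(add(m, add(n, r)) == add(add(m, n), r)
           for m in range(1, 5) for n in range(1, 5) for r in range(1, 5))
assert all(add(m, n) == add(n, m) for m in range(1, 6) for n in range(1, 6))
```

Notice that the recursion terminates precisely because every natural other than 1 is a successor, the fact proved below for the ordering.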

1. For any m, n, r ∈ N, m + (n + r) = (m + n) + r.

Proof. First we show the statement for r = 1 and all m, n. We have

m + (n + 1) = m + s(n) = s(m + n) = (m + n) + 1,

where we have used the inductive definition of addition. Now suppose that the formula holds for some r ∈ N; we will show it holds for s(r). Indeed,

m + (n + s(r)) = m + (n + (r + 1)) = m + ((n + r) + 1) = m + s(n + r)
= s(m + (n + r)) = s((m + n) + r) = (m + n) + s(r).

In other words, the set

S = {r ∈ N : m + (n + r) = (m + n) + r for all m, n ∈ N}

has 1 ∈ S and, whenever r ∈ S, also s(r) ∈ S. By the inductive axiom, S = N.

2. For any m, n ∈ N, m + n = n + m.

Proof. Again we use an inductive argument. Define

S = {n ∈ N : n + m = m + n for all m ∈ N}.

The first step is to show that 1 ∈ S; that is, that 1 + m = m + 1 for all m ∈ N. For this we also do an induction. Set

T = {m ∈ N : 1 + m = m + 1}.

First, 1 ∈ T since 1 + 1 = 1 + 1. Suppose then that m ∈ T. We claim that this implies m + 1 ∈ T. To see this, write

1 + (m + 1) = (1 + m) + 1 = (m + 1) + 1.

By induction, T = N.

Now that we have shown 1 ∈ S, we assume n ∈ S and prove n + 1 ∈ S. For m ∈ N,

(n + 1) + m = n + (1 + m) = n + (m + 1) = (n + m) + 1
= (m + n) + 1 = m + (n + 1).

By the inductive axiom, S = N and we are done.


3. For all n, m ∈ N, n + m ≠ n.

Proof. Define the set

S = {n ∈ N : n + m ≠ n for all m ∈ N}.

By the Peano axioms,

1 + m = s(m) ≠ 1 for all m ∈ N,

so 1 ∈ S. Suppose then that n ∈ S; that is, n is such that n + m ≠ n for all m ∈ N. Then by injectivity, for m ∈ N,

(n + 1) + m = (n + m) + 1 = s(n + m) ≠ s(n) = n + 1,

giving n + 1 ∈ S. By the inductive axiom, S = N and we are done.

Last, for proving facts about ordering we show:

• s is a bijection from N to N \ {1}.

Proof. We know s does not map any element to 1, so s is in fact a function to N \ {1}. It is also injective. To show surjectivity, consider the set

S = {1} ∪ {s(n) : n ∈ N}.

Clearly 1 ∈ S. Supposing that n ∈ S, then n ∈ N, so s(n) ∈ S. Therefore S = N. It follows that if k ≠ 1 then k = s(n) for some n ∈ N.

The above lets us define n − 1 for n ≠ 1: it is the element such that (n − 1) + 1 = n.

Ordering

We also define an ordering on the natural numbers. We say that m ≤ n for m, n ∈ N if either m = n or m + a = n for some a ∈ N. This defines a total ordering of N; that is, it is a partial ordering that also satisfies

• for all m, n ∈ N, m ≤ n or n ≤ m.

In the case that m ≤ n but m ≠ n we write m < n. Note that by item 3 above, n < n + m for all n, m ∈ N. In particular, n < s(n).

Proposition 1.4.1. ≤ is a total ordering of N.


Proof. First, each n ≤ n, so ≤ is reflexive. Next, suppose n_1 ≤ n_2 and n_2 ≤ n_3. If n_1 = n_2 or n_2 = n_3, we clearly have n_1 ≤ n_3. Otherwise there exist m_1, m_2 ∈ N such that n_1 + m_1 = n_2 and n_2 + m_2 = n_3. In this case,

n_3 = n_2 + m_2 = (n_1 + m_1) + m_2 = n_1 + (m_1 + m_2),

giving n_1 ≤ n_3.

For antisymmetry, suppose that m ≤ n and n ≤ m. For a contradiction, if m ≠ n then there exist a, b ∈ N such that m = n + a and n = m + b. Then m = (m + b) + a = m + (b + a), a contradiction with item 3 above. Therefore m = n.

So far we have proved that ≤ is a partial order. We now prove ≤ is a total ordering. To

begin with, we claim that for all n ∈ N, 1 ≤ n. Clearly this is true for n = 1. If we assume

it holds for some n then

n + 1 = 1 + n ≥ 1 ,

verifying the claim by induction.

Now for any m > 1 (that is, m ∈ N with m ≠ 1), define the set

S = {n ∈ N : n ≤ m} ∪ {n ∈ N : m ≤ n}.

By the above remarks, 1 ∈ S. Supposing now that n ∈ S for some n ∈ N, we claim that n + 1 ∈ S. To show this, we have three cases.

1. Case 1: n = m. In this case, n + 1 = m + 1 ≥ m, giving n + 1 ∈ S.

2. Case 2: n > m, so there exists a ∈ N such that n = m + a. Then n + 1 = m + a + 1 ≥ m, giving n + 1 ∈ S.

3. Case 3: n < m, so there exists a ∈ N such that m = n + a. If a = 1 then n + 1 = m, so n + 1 ∈ S. Otherwise a > 1, implying that a − 1 ∈ N (that is, a − 1 is defined), so

m = n + a = n + ((a − 1) + 1) = (n + 1) + (a − 1) > n + 1,

so that n + 1 ∈ S.

By the inductive axiom, S = N, and therefore for all n we have n ≤ m or m ≤ n.

A consequence of these properties is trichotomy of the natural numbers. For any m, n ∈

N, exactly one of the following holds: m < n, m = n or n < m.

A property that relates addition and ordering is

• if m, n, r ∈ N are such that m < n, then m + r < n + r.

Proof. There must be a ∈ N such that n = m+a. Then n+r = m+a+r = m+r +a,

giving m + r < n + r.

Clearly then if m ≤ n and r ∈ N we have m + r ≤ n + r.


• If n < k then n + 1 ≤ k.

Proof. If n < k then there exists j ∈ N such that n + j = k. Because 1 ≤ j we ﬁnd

n + 1 ≤ n + j = k.

Multiplication

We define multiplication inductively by

n · 1 = n for all n ∈ N
n · s(m) = n + (n · m).

One can prove the following properties (try it!). Let m, n, r, s ∈ N:

1. n · (m + r) = (n · m) + (n · r).

2. n · m = m · n.

3. (n · m) · r = n · (m · r).

4. if n < m and r ≤ s then r · n < s · m.
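As with addition, the inductive definition of multiplication can be transcribed and the listed properties spot-checked on small cases. Again, Python integers standing in for the abstract naturals is an illustrative assumption; the block is self-contained.

```python
def s(n):
    """The successor function."""
    return n + 1

def mul(n, m):
    """n · m, defined by n · 1 = n and n · s(m) = n + (n · m)."""
    if m == 1:
        return n
    # m = s(m - 1), so by the second clause n · m = n + (n · (m - 1)).
    return n + mul(n, m - 1)

# Spot-check distributivity, commutativity and associativity.
rng = range(1, 6)
assert all(mul(n, m + r) == mul(n, m) + mul(n, r)
           for n in rng for m in rng for r in rng)
assert all(mul(n, m) == mul(m, n) for n in rng for m in rng)
assert all(mul(mul(n, m), r) == mul(n, mul(m, r))
           for n in rng for m in rng for r in rng)
```

Of course, finitely many checks prove nothing; the inductive proofs requested in the exercise are still the real content.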

1.5 Cardinality and the natural numbers

For each n ∈ N we write the set

J_n = {m ∈ N : m ≤ n}.

Note that J_1 = {1} and, for n ≥ 1, we have

J_{n+1} = J_n ∪ {n + 1}.

To show this, let k be in the right side. If k = n + 1 then k ∈ J_{n+1}. Otherwise k ≤ n, giving by n ≤ n + 1 the inequality k ≤ n + 1, so k ∈ J_{n+1}. To prove the inclusion ⊂, suppose that k ∈ J_{n+1}. If k ∈ J_n we are done, so suppose that k ∉ J_n. Therefore k > n, so k ≥ n + 1. On the other hand, k ≤ n + 1, so k = n + 1.

Definition 1.5.1. For an arbitrary set A we say that A has cardinality n if A ∼ J_n. In this case we say A is finite and we write card(A) = n. If A is not equivalent to any J_n we say A is infinite.

In this definition, card(A) is an equivalence class of sets and n is a number, so what we have written here is purely symbolic: it means A ∼ J_n.

Lemma 1.5.2. If A and B are sets such that A ⊂ B then card(A) ≤ card(B).

Proof. Define f : A → B by f(a) = a. Then f is an injection.


Theorem 1.5.3. For all n ∈ N, card(J_n) < card(J_{n+1}) < card(N).

Proof. Each set above is a subset of the next, so the proposition holds using ≤ instead of <. We must then prove ≠ in each spot above. Assume first that we have proved that card(J_n) ≠ card(J_{n+1}) for all n ∈ N; we will show that card(J_n) ≠ card(N) for all n ∈ N. If we had equality, then we would find card(J_{n+1}) ≤ card(N) = card(J_n). This contradicts the first inequality.

To prove the inequality card(J_n) ≠ card(J_{n+1}), we use induction. Clearly it holds for n = 1, since J_1 = {1} and J_2 = {1, 2} and any function from J_1 to J_2 can only have one element in its range (so cannot be onto). Suppose then that card(J_n) ≠ card(J_{n+1}); we will prove that card(J_{n+1}) ≠ card(J_{n+2}) by contradiction. Assume that there is a bijection f : J_{n+1} → J_{n+2}. Then some element must be mapped to n + 2; call this k ∈ J_{n+1}. Define h : J_{n+1} → J_{n+1} by

h(m) =
  m      if m ≠ k, n + 1
  n + 1  if m = k
  k      if m = n + 1.

This function just swaps k and n + 1. It follows then that f̂ = f ◦ h : J_{n+1} → J_{n+2} is a bijection that maps n + 1 to n + 2.

Now J_n is just J_{n+1} \ {n + 1} and J_{n+1} is just J_{n+2} \ {n + 2}, so define g : J_n → J_{n+1} to do exactly what f̂ does: g(m) = f̂(m). It follows that g is a bijection from J_n to J_{n+1}, giving J_n ∼ J_{n+1}, a contradiction.

Because of the theorem, if a set A satisfies A ∼ N it must be infinite. In this case we say that A is countably infinite. Otherwise, if A is infinite and card(A) ≠ card(N), we say it is uncountable. From this point on, we will be more loose about working with the natural numbers. For example, we will use the terms finite and infinite in the same way that we normally do – a set is finite if it has finitely many elements and infinite otherwise. Of course every proof we write from now on could be done using the Peano axioms, but we will be spared that.

Theorem 1.5.4. Let S be an infinite subset of N. Then S is countably infinite.

Proof. We must construct a bijection from N to S. We can do this using the well-ordering property: each non-empty subset of N has a least element. Define f : N → S recursively: f(1) is the least element of S and, assuming we have defined f(1), . . . , f(n), define f(n + 1) to be the least element of S \ {f(1), . . . , f(n)}. This is a bijection.
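The recursion in this proof can be sketched as a generator: given a membership test for S, repeatedly emit the least element of N not yet listed. The particular set S used below is a made-up example, and Python ints again stand in for the naturals.

```python
from itertools import count, islice

def enumerate_subset(in_S):
    """Yield f(1), f(2), ...: the elements of S in increasing order.

    Scanning N upward means each yielded value is the least element of
    S not yet listed, exactly as in the proof of Theorem 1.5.4.
    """
    for n in count(1):
        if in_S(n):
            yield n

# Hypothetical infinite S ⊂ N: the odd numbers greater than 10.
in_S = lambda n: n % 2 == 1 and n > 10
f = enumerate_subset(in_S)
print(list(islice(f, 5)))  # first five values: 11, 13, 15, 17, 19
```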

Definition 1.5.5. We say a set A is countable if it is either finite or countably infinite.

Note that A is countable if and only if there is an injection f : A → N; that is, card(A) ≤ card(N).

Theorem 1.5.6. Let 𝒞 be a countable collection of countable sets. Then ⋃_{A∈𝒞} A is countable.


Proof. To prove this we need to construct a bijection from N. We will do this somewhat non-rigorously, thinking of a bijection from N as a listing of the elements of ⋃_{A∈𝒞} A in sequence. For example, given a countably infinite set S we may take a bijection f : N → S and list all of the elements of S as

f(1), f(2), f(3), . . .

If S is finite then this corresponds to a finite list.

Since each A ∈ 𝒞 is countable, we may list its elements. The collection 𝒞 itself is countable, so we can list the elements of ⋃_{A∈𝒞} A in an array:

a_1 a_2
b_1 b_2 b_3
c_1
d_1 d_2 d_3 d_4

Note that some rows are finite. We now list the elements according to diagonals. That is, we write the list as

a_1, b_1, a_2, c_1, b_2, d_1, b_3, d_2, . . .

Because we want the list to correspond to a bijection, we need to make sure that no element is repeated. So, for instance, if b_1 and a_2 are equal we would only include the first.
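The diagonal listing in this proof can be sketched in code for finitely many rows. The rows below are placeholders for the listings of the sets in the collection, and repeats are skipped as described.

```python
# Placeholder listings of four countable sets (some rows finite),
# mirroring the array in the text.
rows = [['a1', 'a2'],
        ['b1', 'b2', 'b3'],
        ['c1'],
        ['d1', 'd2', 'd3', 'd4']]

def diagonals(rows):
    """List entries along anti-diagonals, skipping repeats.

    Entry (row i, column j) lies on diagonal d = i + j; within a
    diagonal we take i from largest to smallest, matching the order
    a1, b1, a2, c1, b2, ... in the text.
    """
    seen, out = set(), []
    for d in range(max(len(r) for r in rows) + len(rows)):
        for i in range(d, -1, -1):
            j = d - i
            if i < len(rows) and j < len(rows[i]) and rows[i][j] not in seen:
                seen.add(rows[i][j])
                out.append(rows[i][j])
    return out

print(diagonals(rows))
# → ['a1', 'b1', 'a2', 'c1', 'b2', 'd1', 'b3', 'd2', 'd3', 'd4']
```

For infinitely many infinite rows the same diagonal order still reaches every entry after finitely many steps, which is the point of the proof.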

1.6 Exercises

1. Let f : A → B and g : B → C be functions. Show that the relation g ◦ f ⊂ A × C, defined by

(a, c) ∈ g ◦ f if (a, b) ∈ f and (b, c) ∈ g for some b ∈ B,

is a function.

2. Show that the function f : 𝒫(A) → F_2^A mentioned at the end of Section 1.3 and given by f(S) = χ_S is a bijection.

3. Prove the properties of multiplication listed at the end of Section 1.4.

4. Prove the following statements by induction.

(a) For all n ∈ N,

1 + 2 + ⋯ + n = n(n + 1)/2.

(b) For all n ∈ N,

1^2 + 2^2 + ⋯ + n^2 = n(n + 1)(2n + 1)/6.


5. Strong Induction. In this exercise we introduce strong mathematical induction,

which, although being referred to as “strong,” is actually equivalent to mathematical

induction. Suppose we are given a collection {P(n) : n ∈ N} of mathematical statements. To show P(n) is true for all n, mathematical induction dictates that we show

two things hold: P(1) is true and if P(n) is true for some n ∈ N then P(n+1) is true.

To argue instead using strong induction we prove that

• P(1) is true and

• if n ∈ N is such that P(k) is true for all k ≤ n then P(n + 1) is true.

(a) Define a sequence (a_n) of real numbers recursively by

a_1 = 1 and a_n = a_1 + · · · + a_[n/2] for n ≥ 2 .

(Here [n/2] is the largest integer no bigger than n/2.) Prove by strong induction that a_n ≤ 2^(n−1) for n ≥ 2. Is it possible to find b < 2 such that a_n ≤ b^(n−1) for all n ≥ 2?

(b) Why does strong induction follow from mathematical induction? In other words

in the second step of strong induction, why are we allowed to assume that P(k)

is true for all k ≤ n to prove that P(n + 1) is true?

6. Prove that any non-empty subset S ⊂ N has a least element. That is, there is an s ∈ S

such that for all t ∈ S we have s ≤ t. This is a major result about N, expressed by

saying that N is well-ordered.

Hint. Assume there is no least element. Let

M = {m ∈ N : ∀t ∈ S, m ≤ t} .

Use Peano’s induction axiom to prove that M = N. Does this lead to a contradiction?


2 The real numbers

2.1 Rationals and suprema

From now on we will proceed through Rudin, using the standard notations

Z = {. . . , −1, 0, 1, . . .}

Q = {m/n : m, n ∈ Z and n ≠ 0} .

When thinking about the rational numbers, we quickly come to realize that they do not

capture all that we wish to express using numbers. For instance,

Theorem 2.1.1. There is no rational number whose square is 2.

Proof. We argue by contradiction, so assume that 2 = (m/n)^2 for some m, n ∈ Z with n ≠ 0.

We may assume that m and n are not both even; otherwise, we can “reduce the fraction,”

removing enough factors of 2 from the numerator and denominator. Then

2n^2 = m^2 ,

so m^2 is even. This actually implies that m must be even, for otherwise m^2 would be odd (since the square of an odd number is odd). Therefore we can write m = 2s for some s ∈ Z.

Plugging back in, we ﬁnd

2n^2 = 4s^2 or n^2 = 2s^2 ,

so n^2 is also even, giving that n is even. This is a contradiction.

From the previous theorem, what we know as √2 is not a rational number. Therefore if we were to construct a theory from only rationals, we would have a “hole” where we think √2 should be. What is even stranger is that there are rational numbers arbitrarily close to this hole.

Theorem 2.1.2. If q ∈ Q satisfies 0 < q^2 < 2 then we can find another rational q̂ ∈ Q such that

q^2 < q̂^2 < 2 .

Similarly, for each r ∈ Q such that r^2 > 2, there is another rational r̂ such that 2 < r̂^2 < r^2 .

Proof. Suppose that q > 0 satisfies q^2 < 2 and define

q̂ = q + (2 − q^2)/(q + 2) .

Then q̂ > q and

q̂^2 − 2 = 2(q^2 − 2)/(q + 2)^2 ,

giving q̂^2 < 2.
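The construction in this proof can be iterated. As a numerical illustration (ours, not part of the notes), the following Python sketch applies the map q ↦ q + (2 − q^2)/(q + 2) repeatedly with exact rational arithmetic, producing ever larger rationals whose squares stay below 2:

```python
from fractions import Fraction

def next_q(q):
    """One step of the proof's construction: q_hat = q + (2 - q^2)/(q + 2)."""
    return q + (2 - q * q) / (q + 2)

steps = [Fraction(1)]            # start from q = 1, whose square is below 2
for _ in range(5):
    steps.append(next_q(steps[-1]))
# steps = 1, 4/3, 7/5, 24/17, 41/29, ... increasing, each with square < 2
```

Each iterate is a strictly better rational approximation from below to the “hole” where √2 should be.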


We see from above that the set {q ∈ Q : q^2 < 2} does not have a largest element. This

leads us to study largest elements of sets more carefully.

Definition 2.1.3. If A is a set with a partial ordering ≤ we say that a ∈ A is an upper bound for a subset B ⊂ A if b ≤ a for all b ∈ B. We say that a is a least upper bound for B if whenever a′ is an upper bound for B, we have a ≤ a′. We define lower bound and greatest lower bound similarly.

Note that if a is a least upper bound for B then a is unique. Indeed, assume that a and a′ are least upper bounds. Since they are both upper bounds, we have a ≤ a′ and a′ ≤ a, so by antisymmetry of partial orderings, a = a′. Because of this uniqueness, there is no harm

in writing

a = sup B when a is the least upper bound of B

and

a = inf B when a is the greatest lower bound of B .

Proposition 2.1.4. Let A be a totally ordered set and B a subset. Deﬁne C to be the set

of all upper bounds for B. Then sup B = inf C.

Proof. We are trying to show that some element (inf C) is the supremum of B, so we must

show two things: inf C is an upper bound for B and any other upper bound a for B satisﬁes

inf C ≤ a. The second statement is easy because if a is an upper bound for B then a ∈ C.

As inf C is a lower bound for C we then have inf C ≤ a.

For the ﬁrst, assume that inf C is not an upper bound for B, so there exists b ∈ B such

that inf C is not ≥ b. By trichotomy, inf C < b. We claim then that b is a lower bound for C

which is larger than the greatest lower bound, a contradiction. Why is this? If c ∈ C then

c is an upper bound for B, giving c ≥ b, or b ≤ c.

Note that Theorem 2.1.2 implies that the set {q ∈ Q : q > 0 and q^2 < 2} does not have a supremum in Q. Indeed, if it did have a supremum r, then r would be a rational upper bound for this set with r^2 > 2 (Theorem 2.1.1 rules out r^2 = 2, and if r^2 < 2 the first statement would give an element of the set larger than r). The second statement then gives a smaller r̂ that is still an upper bound, a contradiction. So one way of formulating the fact that there are “holes” in Q is to

say that it does not have the least upper bound property.

Deﬁnition 2.1.5. Let A be a totally ordered set with order ≤. We say that A has the least

upper bound property if each nonempty subset B ⊂ A with an upper bound in A has a least

upper bound in A.

2.2 Existence and properties of real numbers

Therefore we are led to extend the rational numbers to ﬁll in the holes. This is actually

quite a diﬃcult procedure and there are many routes to its end. We will not discuss these,

however, and will instead state the main theorem about the existence of the real numbers

without proof. The main point of this course will be to understand properties of the real numbers, not their existence and uniqueness.


For the statement, one needs the deﬁnition of an ordered ﬁeld, which is a certain type of

totally ordered set with multiplication and addition (like the rationals).

Theorem 2.2.1 (Existence and uniqueness of R). There exists a unique ordered ﬁeld with

the least upper bound property.

The sense in which uniqueness holds is somewhat technical; it is not that any two ordered

ﬁelds as above must be equal, but they must be isomorphic. Again we defer to Rudin for

these deﬁnitions. We will now assume the existence of R, that it contains Q and Z, and its

usual properties.

One extremely useful property of R that follows from the least upper bound property is

Theorem 2.2.2 (Archimedean property of R). Given x, y ∈ R with x ≠ 0, there exists

n ∈ Z such that

nx > y .

Proof. First let x, y ∈ R such that x, y > 0 and assume that there is no such n. Then the

set

{nx : n ∈ N}

is bounded above by y. As it is clearly nonempty, it has a supremum s. Then s −x < s, so

s −x cannot be an upper bound, giving the existence of some m ∈ N such that

s −x < mx .

However this implies that s < (m+1)x, so s was actually not an upper bound, contradiction.

This proves the statement in the case x, y > 0. The other cases can be obtained from this one

by instead considering −x and/or −y.

The Archimedean property implies

Corollary 2.2.3 (Density of Q in R). Let x, y ∈ R with x < y. There exists q ∈ Q such

that x < q < y.

Proof. Apply the Archimedean property to y −x and 1 to ﬁnd n ∈ Z such that n(y −x) > 1.

We can also find m_1 > nx and m_2 > −nx, so

−m_2 < nx < m_1 .

It follows then that there is an m ∈ Z such that m−1 ≤ nx < m. Finally,

nx < m ≤ 1 + nx < ny .

Dividing by n we get x < m/n < y.

Now we return to countability.

Theorem 2.2.4. The set Q is countable, whereas R is uncountable.


Proof. We already know that N × N is countable: this is from setting up the array

(1, 1) (2, 1) (3, 1)

(1, 2) (2, 2) (3, 2)

(1, 3) (2, 3) (3, 3)

and listing the elements along diagonals. On the other hand, there is an injection

f : Q^+ → N × N ,

where Q^+ is the set of positive rationals. One such f is given by f(m/n) = (m, n), where m/n is the “reduced fraction” for the rational, expressed with m, n ∈ N. Therefore Q^+ is countable. Similarly, Q^−, the set of negative rationals, is countable. Last, Q = Q^+ ∪ Q^− ∪ {0} is a union of 3 countable sets and is thus countable.

To prove R is uncountable, we will use decimal expansions for real numbers. In other

words, we write

x = .a_1 a_2 a_3 . . .

where a_i ∈ {0, . . . , 9} for all i. Since we have not proved anything about decimal expansions,

we are certainly assuming a lot here, but this is how things go. Note that each real number

has at most 2 decimal expansions (for instance, 1/4 = .2500 . . . = .2499 . . .).

Assume that R is countable. Then as there are at most two decimal expansions for each

real number, the set of decimal expansions is countable (check this!). Now write the set of

all expansions in a list:

1   .a_0 a_1 a_2 . . .
2   .b_0 b_1 b_2 . . .
3   .c_0 c_1 c_2 . . .

We will show that no matter what list we are given (as above), there must be a sequence

that is not in the list. This implies that there can be no such list, and thus R is uncountable.

Consider the diagonal element of the list. That is, we take a_0 for the first digit, b_1 for the second, c_2 for the third and so on:

.a_0 b_1 c_2 d_3 . . .

We now have a rule to transform this diagonal element into a new one. We can use many,

but here is one: change each digit to a 0 if it is not 0, and replace it with 9 if it is 0. For

example,

.0119020 . . . −→.9000909 . . .

Note that this procedure changes the diagonal number into a new one that diﬀers from the

diagonal element in every decimal place. Call this new expansion A = .â_0 â_1 . . .

Now our original list contains all expansions, so it must contain A at some point; let us

say that the n-th element of the list is A. Then consider the n-th digit â_n of A. On the one hand, by construction, â_n is not equal to the n-th digit of the diagonal element. On the other hand, by the position in the list, â_n equals the n-th digit of the diagonal element. This is a contradiction.
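The diagonal rule can be run on any concrete (finite) list of digit strings. A small Python illustration (ours, not the notes') showing that the output differs from the n-th row in the n-th digit:

```python
def diagonal_escape(rows):
    """Apply the text's rule along the diagonal: the n-th output digit
    is 0 if the n-th digit of row n is nonzero, and 9 if it is 0, so
    the resulting string differs from every row in some position."""
    return "".join("0" if row[n] != "0" else "9"
                   for n, row in enumerate(rows))

rows = ["0119020", "2500000", "3333333", "0000000", "1414213"]
a = diagonal_escape(rows)
```

Of course the proof applies the rule to an infinite list; the finite version above just makes the mechanism visible.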


2.3 R^n for n ≥ 2

A very important extension of R is given by n-dimensional Euclidean space.

Definition 2.3.1. For n ≥ 2, the set R^n is defined as

R^n = {a = (a_1, . . . , a_n) : a_i ∈ R for all i} .

Addition of elements is defined as

a + b = (a_1, . . . , a_n) + (b_1, . . . , b_n) = (a_1 + b_1, . . . , a_n + b_n)

and multiplication of elements by numbers is

ca = c(a_1, . . . , a_n) = (ca_1, . . . , ca_n),   c ∈ R .

Note that this deﬁnition gives us R for n = 1.

On R^n we place a distance, but to do that, we need the existence of square roots. We will take this for granted now, since we will prove it later using continuity.

Lemma 2.3.2. For each x ∈ R with x ≥ 0 there exists a unique y ≥ 0 in R such that y^2 = x. This element is written y = √x.

Definition 2.3.3. On the set R^n we define the norm

|a| = |(a_1, . . . , a_n)| = √(a_1^2 + · · · + a_n^2)

and inner product

a · b = (a_1, . . . , a_n) · (b_1, . . . , b_n) = a_1 b_1 + · · · + a_n b_n .

Theorem 2.3.4. Suppose a, b, c ∈ R^n and t ∈ R. Then

1. |a| ≥ 0 with |a| = 0 if and only if a = 0.

2. |ta| = |t||a|.

3. (Cauchy-Schwarz inequality) |a · b| ≤ |a||b|.

4. (Triangle inequality) |a + b| ≤ |a| + |b|.

5. |a − b| ≤ |a − c| + |c − b|.

Proof. The first two follow easily; for instance since a^2 ≥ 0 for all a ∈ R (this is actually part of the definition of ordered field), we get a_1^2 + · · · + a_n^2 ≥ 0 and therefore |a| ≥ 0. If |a| = 0 then by uniqueness of square roots, a_1^2 + · · · + a_n^2 = 0 and so 0 ≥ a_i^2 for all i, giving a_i = 0 for all i.

For the third item, we ﬁrst give a lemma.


Lemma 2.3.5. If ax^2 + bx + c ≥ 0 for all x ∈ R then b^2 ≤ 4ac.

Proof. If a = 0 then bx ≥ −c for all x. Then we claim b must be zero. If not, then plugging in x = −(1 + |c|)/b gives bx = −(1 + |c|) < −c, a contradiction. Therefore if a = 0 we must have b = 0 and therefore b^2 ≤ 4ac as claimed.

Otherwise a ≠ 0. First assume that a > 0. Plug in x = −b/(2a) to get

−b^2/(4a) + c ≥ 0 ,

giving b^2 ≤ 4ac. Last, the case a < 0 cannot occur: for large x the term ax^2 dominates, so ax^2 + bx + c would eventually be negative, contradicting the hypothesis.

To prove Cauchy-Schwarz, note that for all x ∈ R,

0 ≤ (a_1 x − b_1)^2 + · · · + (a_n x − b_n)^2
  = (a_1^2 + · · · + a_n^2) x^2 − 2(a_1 b_1 + · · · + a_n b_n) x + (b_1^2 + · · · + b_n^2)
  = |a|^2 x^2 − 2(a · b) x + |b|^2 .

So using the lemma, (a · b)^2 ≤ |a|^2 |b|^2 .

The last two items follow directly from the Cauchy-Schwarz inequality. Indeed,

|a + b|^2 = (a + b) · (a + b)
          = a · a + 2 a · b + b · b
          ≤ |a|^2 + 2|a||b| + |b|^2
          = (|a| + |b|)^2 .

The last inequality (item 5) follows by taking a − c and c − b in the previous.
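These inequalities are easy to spot-check numerically. A minimal Python sketch (the function names are ours) of the norm and inner product just defined, with random checks of Cauchy-Schwarz and the triangle inequality:

```python
import math
import random

def norm(a):
    """|a| = sqrt(a_1^2 + ... + a_n^2)."""
    return math.sqrt(sum(x * x for x in a))

def dot(a, b):
    """a . b = a_1 b_1 + ... + a_n b_n."""
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
for _ in range(1000):
    a = [random.uniform(-5, 5) for _ in range(4)]
    b = [random.uniform(-5, 5) for _ in range(4)]
    # Cauchy-Schwarz and triangle inequalities, up to float round-off
    assert abs(dot(a, b)) <= norm(a) * norm(b) + 1e-9
    assert norm([x + y for x, y in zip(a, b)]) <= norm(a) + norm(b) + 1e-9
```

A numerical check is of course no substitute for the proof, but it is a quick sanity test of the algebra above.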

2.4 Exercises

1. For each of the following examples, ﬁnd the supremum and the inﬁmum of the set S.

Also state whether or not they are elements of S.

(a) S = {x ∈ [0, 5] : cos x = 0}.

(b) S = {x : x^2 − 2x − 3 < 0}.

(c) S = {s_n : s_n = Σ_{i=1}^{n} 2^{−i}}.

2. Prove by induction that for all n ∈ N and real numbers x_1, . . . , x_n,

|x_1 + · · · + x_n| ≤ |x_1| + · · · + |x_n| .

3. Let A, B ⊂ R be nonempty and bounded above.


(a) Deﬁne the sum set

A + B = {a + b : a ∈ A, b ∈ B} .

Prove that sup(A + B) = sup A + sup B.

(b) Deﬁne the product set

A · B = {a · b : a ∈ A, b ∈ B} .

Is it true that sup(A · B) = (sup A)(sup B)? If so, provide a proof; otherwise, provide

a counterexample.

4. Let 𝒞 be a collection of open intervals (sets I = (a, b) for a < b) such that

• for all I ∈ 𝒞, I ≠ ∅ and

• if I, J ∈ 𝒞 satisfy I ≠ J then I ∩ J = ∅.

Prove that 𝒞 is countable.

Hint. Define a function f : 𝒞 → S for some countable set S ⊂ R by setting f(I) equal

to some carefully chosen number.


3 Metric spaces

3.1 Deﬁnitions

Definition 3.1.1. A set X with a function d : X × X → R is a metric space if for all

x, y, z ∈ X,

1. d(x, y) ≥ 0 and equals 0 if and only if x = y,

2. d(x, y) = d(y, x), and

3. d(x, y) ≤ d(x, z) + d(z, y).

Then we call d a metric.

Examples.

1. A useful example of a metric space is R^n with metric d(a, b) = |a − b|.

2. If X is any nonempty set we can define the discrete metric by

d(x, y) = 1 if x ≠ y, and d(x, y) = 0 if x = y .

3. The set F[0, 1] of bounded functions f : [0, 1] →R is a metric space with metric

d(f, g) = sup{|f(x) − g(x)| : x ∈ [0, 1]} .
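Both examples are easy to write down explicitly. A short Python sketch (ours) of the discrete metric and of the sup metric, the latter only approximated on a finite grid rather than all of [0, 1]:

```python
def discrete(x, y):
    """The discrete metric: distance 1 between distinct points, else 0."""
    return 0 if x == y else 1

def sup_dist(f, g, samples):
    """Approximate d(f, g) = sup |f(x) - g(x)| using finitely many
    sample points (the true metric takes the sup over all of [0, 1])."""
    return max(abs(f(x) - g(x)) for x in samples)

# spot-check the triangle inequality for the discrete metric
for x, y, z in [(1, 2, 3), (1, 1, 2), (2, 2, 2)]:
    assert discrete(x, y) <= discrete(x, z) + discrete(z, y)

grid = [i / 100 for i in range(101)]
d = sup_dist(lambda x: x, lambda x: x * x, grid)   # max of x - x^2 is 1/4
```

For f(x) = x and g(x) = x^2 the grid happens to contain the maximizer x = 1/2, so the approximation is exact here.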

3.2 Open and closed sets

Let (X, d) be a metric space. We are interested in the possible subsets of X and in what

ways we can describe these using the metric d. Let’s start with the simplest.

Deﬁnition 3.2.1. Let r > 0. The neighborhood of radius r centered at x ∈ X is the set

B_r(x) = {y ∈ X : d(x, y) < r} .

For example,

1. In R using the metric d(x, y) = |x − y| we have the open interval

B_r(x) = (x − r, x + r) = {y ∈ R : x − r < y < x + r} .

2. In R^n using the metric d(x, y) = |x − y| we have the open ball

B_r(x) = {(y_1, . . . , y_n) : (x_1 − y_1)^2 + · · · + (x_n − y_n)^2 < r^2} .

To make precise the sense in which these sets are open (that is, no point is on the boundary), we

introduce a formal deﬁnition of open.

Deﬁnition 3.2.2. Let (X, d) be a metric space. A set Y ⊂ X is open if for each y ∈ Y

there exists r > 0 such that B_r(y) ⊂ Y .

For each point y we must be able to ﬁt a (possibly tiny) neighborhood around y so that

it still stays in the set Y . Thinking of Y as, for example, an open ball in R^n, as our point

y approaches the boundary of this set, the radius we take for the neighborhood around this

point will have to decrease.


Proposition 3.2.3. Any neighborhood is open.

Proof. Let x ∈ X and r > 0. To show that B_r(x) is open we must choose y ∈ B_r(x) and show that there exists some s > 0 such that B_s(y) ⊂ B_r(x). The radius s will depend on how close y is to the boundary. Therefore, choose

s = r − d(x, y) .

To show that for this s we have B_s(y) ⊂ B_r(x), take z ∈ B_s(y). Then

d(x, z) ≤ d(x, y) + d(y, z) < d(x, y) + s = r .

Some more examples:

1. In R, the only intervals that are open are the (surprise!) open intervals. For instance,

let’s consider the half-open interval (0, 1] = {x ∈ R : 0 < x ≤ 1}. If it were open, we would be able to, given any x ∈ (0, 1], find r > 0 such that B_r(x) ⊂ (0, 1]. But clearly this is false because B_r(1) contains 1 + r/2.

2. In R^2, the set {(x, y) : y > 0} ∪ {(x, y) : y < −1} is open.

3. In R^3, the set {(x, y, z) : y > 0} ∪ {(0, 0, 0)} is not open.

Proposition 3.2.4. Let 𝒞 be a collection of open sets.

1. The union ∪_{O∈𝒞} O is open.

2. If 𝒞 is finite then the intersection ∩_{O∈𝒞} O is open. This need not be true if the collection is infinite.

Proof. Let x ∈ ∪_{O∈𝒞} O. Then there exists O ∈ 𝒞 such that x ∈ O. Since O is open, there exists r > 0 such that B_r(x) ⊂ O. This is also a subset of ∪_{O∈𝒞} O, so this set is open.

To show that we cannot allow inﬁnite intersections, consider the sets (−1/n, 1 + 1/n) in

R. We have

∩_{n=1}^{∞} (−1/n, 1 + 1/n) = [0, 1] ,

which is not open (under the usual metric of R).

For finite intersections, let O_1, . . . , O_n be the open sets from 𝒞 and x ∈ ∩_{i=1}^{n} O_i. Then for each i, we have x ∈ O_i and therefore there exists r_i > 0 such that B_{r_i}(x) ⊂ O_i. Letting r = min{r_1, . . . , r_n}, we have B_r(x) ⊂ B_{r_i}(x) for all i and therefore B_r(x) ⊂ O_i for all i. This implies B_r(x) is a subset of the intersection and we are done.

Deﬁnition 3.2.5. An interior point of Y ⊂ X is a point y ∈ Y such that there exists r > 0

with B_r(y) ⊂ Y . Write Y° for the set of interior points of Y .

Directly by definition, Y is open if and only if Y = Y°.


Examples:

1. The set of interior points of [0, 1] (under the usual metric) is (0, 1).

2. The set of interior points of

{(x, y) : y > 0} ∪ {(x, y) : x = −1, y ≥ 0}

is just {(x, y) : y > 0}.

3. What is the set of interior points of Q?

4. Define a metric on R^2 by d(x, y) = 1 if x ≠ y and 0 otherwise. This can be shown to be a metric. Given a set Y ⊂ R^2, what is Y°?

Definition 3.2.6. A set Y ⊂ X is closed if its complement X \ Y is open.

Sets can be both open and closed. Consider ∅, whose complement is clearly open, making

∅ closed. It is also open.

Proposition 3.2.7. Let 𝒞 be a collection of closed sets.

1. The intersection ∩_{C∈𝒞} C is closed.

2. If 𝒞 is finite then the union ∪_{C∈𝒞} C is closed.

Proof. Just use X \ [∩_{C∈𝒞} C] = ∪_{C∈𝒞} (X \ C).

3.3 Limit points

There is an alternative characterization of closed sets in terms of limit points.

Deﬁnition 3.3.1. Let Y ⊂ X. A point x ∈ X is a limit point of Y if for each r > 0 there

exists y ∈ Y such that y ≠ x and y ∈ B_r(x). Write Y′ for the set of limit points of Y .

Examples:

1. 0 is a limit point of {1, 1/2, 1/3, . . .}.

2. {1, 2, 3} has no limit points.

3. In R^2, B_1(0) ∪ {(0, y) : y ∈ R} ∪ {(10, 10)} has limit points

{(x, y) : x^2 + y^2 ≤ 1} ∪ {(0, y) : y ∈ R} .

Actually we could have given a diﬀerent deﬁnition of limit point.

Proposition 3.3.2. x ∈ X is a limit point of Y if and only if for each r > 0 there are

infinitely many points of Y in B_r(x).


Proof. We need only show that if x is a limit point of Y and r > 0 then there are infinitely many points of Y in B_r(x). We argue by contradiction; assume there are only finitely many, and label the ones that are not equal to x as y_1, . . . , y_n. Choosing s = min{d(x, y_1), . . . , d(x, y_n)}, we then have that B_s(x) contains no points of Y except possibly x. This contradicts the fact that x is a limit point of Y .

Here is yet another deﬁnition of closed.

Theorem 3.3.3. Y is closed if and only if Y′ ⊂ Y .

Proof. Suppose Y is closed and let y be a limit point of Y . If y ∉ Y then because X \ Y is open, we can find r > 0 such that B_r(y) ⊂ (X \ Y ). But for this r, there is no x ∈ B_r(y) that is also in Y , so that y is not a limit point of Y , a contradiction.

Suppose conversely that Y′ ⊂ Y ; we will show that Y is closed by showing that X \ Y is open. To do this, let z ∈ X \ Y . Since z ∉ Y and Y′ ⊂ Y , z cannot be a limit point of Y . Therefore there is an r > 0 such that B_r(z) contains no points p ≠ z with p ∈ Y . Since z is also not in Y , we must have B_r(z) ⊂ (X \ Y ), implying that X \ Y is open.

Examples:

1. Again the set {1, 2, 3} has no limit points (because from the above proposition, a finite

set cannot have limit points). However it is closed by the above theorem.

2. Is Q closed in R? How about Q^2 in R^2?

3. The set Z has no limit points in R, so it is closed.

Definition 3.3.4. The closure of Y in X is the set Y̅ = Y ∪ Y′.

Theorem 3.3.5. Let 𝒞 be the collection of all sets C ⊂ X such that C is closed and Y ⊂ C. Then

Y̅ = ∩_{C∈𝒞} C .

Proof. We first show the inclusion ⊂. To do this we need to show that each y ∈ Y and each y ∈ Y′ must be in the intersection on the right (call it J). First if y ∈ Y then because each C ∈ 𝒞 contains Y , we have y ∈ J. Second, if y ∈ Y′ and C ∈ 𝒞 we also claim that y ∈ C. This is because y, being a limit point of Y , is also a limit point of C (directly from the definition). However C is closed, so it contains its limit points, and y ∈ C.

For the inclusion ⊃, we will show that Y̅ ∈ 𝒞. This implies that Y̅ is one of the sets we are intersecting to form J, and so J ⊂ Y̅. Clearly Y̅ ⊃ Y , so we need to show that Y̅ is closed. If x ∉ Y̅ then x is not in Y and x is not a limit point of Y , so there exists r > 0 such that B_r(x) does not intersect Y . Since B_r(x) is open, each point in it has a neighborhood contained in B_r(x), which therefore does not intersect Y . This means that each point in B_r(x) is not in Y and is not a limit point of Y , giving B_r(x) ⊂ (Y̅)^c, so the complement of Y̅ is open. Thus Y̅ is closed.

From the theorem above, we have a couple of consequences:


1. For all Y ⊂ X, Y̅ is closed. This is because the intersection of closed sets is closed.

2. Y̅ = Y if and only if Y is closed. One direction is clear: if Y̅ = Y then Y is closed. For the other direction, if Y is closed then Y′ ⊂ Y and therefore Y̅ = Y ∪ Y′ ⊂ Y .

Examples:

1. The closure of Q is R.

2. The closure of R \ Q is R.

3. The closure of {1, 1/2, 1/3, . . .} is {1, 1/2, 1/3, . . .} ∪ {0}.

For some practice, we give Theorem 2.28 from Rudin:

Theorem 3.3.6. Let Y ⊂ R be nonempty and bounded above. Then sup Y ∈ Y̅ and therefore sup Y ∈ Y if Y is closed.

Proof. By the least upper bound property, s = sup Y exists. To show s ∈ Y̅ we need to show that s ∈ Y or s ∈ Y′. If s ∈ Y we are done, so we assume s ∉ Y and prove that s ∈ Y′.

Since s is the least upper bound, given r > 0 there must exist y ∈ Y such that

s − r < y ≤ s .

If this were not true, then s − r would be an upper bound for Y . But now we have found y ∈ Y such that y ≠ s and y ∈ B_r(s), proving that s is a limit point for Y .

Note that sup Y is not always a limit point of Y . Indeed, consider the set

Y = {0} .

This set has sup Y = 0 but has no limit points. The set Y can even have limit points but

just not with sup Y a limit point. Consider Y = {0} ∪ [−2, −1].

3.4 Compactness

It will be very important for us, during the study of continuity for instance, to understand

exactly which sets Y ⊂ R have the following property: for each inﬁnite subset E ⊂ Y , E

has a limit point in Y . We will soon see that the interval [0, 1] has this property, whereas

(0, 1) does not (take for example the subset {1/2, 1/3, . . .}). The reason is that we will

many times ﬁnd ourselves exactly in this situation: with an inﬁnite subset E of some set Y

and we will want to ﬁnd a limit point for E (and hope that it is also in E). This property

is what, on the problem set, we will call limit point compactness.

Limit point compactness was apparently one of the original notions of compactness (see

the discussion in Munkres’ topology book at the beginning of the compactness section –

thanks Prof. McConnell). However over time it became apparent that there was a stronger

and more general version of compactness (equivalent in metric spaces, but not in all topo-

logical spaces) which could be formulated only in terms of open sets. We give this deﬁnition,

now taken to be the standard one, below.


Definition 3.4.1. A subset K of a metric space X is compact if for every collection 𝒞 of open sets such that K ⊂ ∪_{C∈𝒞} C, there are finitely many sets C_1, . . . , C_n ∈ 𝒞 such that K ⊂ ∪_{i=1}^{n} C_i.

The collection 𝒞 is called an open cover for K and {C_1, . . . , C_n} is a finite subcover. The process of choosing this finite number of sets from 𝒞 is referred to as extracting a finite

subcover. The deﬁnition, in this language, states that K is compact if from every open cover

of K we can extract a ﬁnite subcover of K.

It is quite diﬃcult to gain intuition about the above deﬁnition, but it will develop as we

go on and use compactness in various circumstances. The main point is that ﬁnite collections

are much more useful than inﬁnite collections. This is true for example with numbers: we

already know that a set of ﬁnitely many numbers has a min and a max, whereas an inﬁnite set

does not necessarily. As we go through the course, to develop a clearer view of compactness,

you should revisit the following phrase: oftentimes, compactness allows us to pass from

“local” information (valid in each open set from the cover) to “global” information (valid on

the whole space), by patching together the sets in the ﬁnite subcover.

Let us now give some properties of compact sets and try to emphasize where the ability

to extract ﬁnite subcovers comes into the proofs.

Theorem 3.4.2. Any compact set is limit point compact.

Proof. Let K ⊂ X be compact and let E ⊂ K be an infinite set. Assume for a contradiction that E has no limit point in K, so for each x ∈ K we can find r_x > 0 such that B_{r_x}(x) intersects E only possibly at x. The collection 𝒞 = {B_{r_x}(x) : x ∈ K} is an open cover of K, so by compactness it can be reduced to a finite subcover of K (and thus of E). But each set in this finite subcover contains at most one point of E, so E must have been finite, a contradiction.

Deﬁnition 3.4.3. A set E ⊂ X is bounded if there exists x ∈ X and R > 0 such that

E ⊂ B_R(x).

Theorem 3.4.4. Any compact K ⊂ X is bounded.

Proof. Pick x ∈ X and define a collection 𝒞 of open sets by

𝒞 = {B_R(x) : R ∈ N} .

We claim that 𝒞 is an open cover of K. We need just to show that each point of X is in at least one of the sets of 𝒞. So let y ∈ X and choose R > d(y, x). Then y ∈ B_R(x).

Since K is compact, there exist C_1, . . . , C_n ∈ 𝒞 such that K ⊂ ∪_{i=1}^{n} C_i. By definition of the sets in 𝒞 we can then find R_1, . . . , R_n such that K ⊂ ∪_{i=1}^{n} B_{R_i}(x). Taking R = max{R_1, . . . , R_n}, we then have K ⊂ B_R(x), completing the proof.

In the proof it was essential to extract a finite subcover because we wanted to take R to be the maximum of the radii of the sets in the subcover. An infinite subcover need not have a largest radius, and in that case the proof would break down; that is, it fails if we are not able to extract a finite subcover.

Examples.


1. The set {1/2, 1/3, . . .} is not compact. This is because we can find an open cover that admits no finite subcover. Indeed, consider

𝒞 = { (1/n − 1/(2n), 1/n + 1/(2n)) : n ≥ 2 } .

Each one of the sets in the above collection covers only finitely many elements from {1/2, 1/3, . . .}, and so any finite subcollection cannot cover the whole set.

2. However if we add 0, by considering the set {1/2, 1/3, . . .} ∪ {0}, it becomes compact. To prove this, let 𝒞 be any open cover; we will show that there are finitely many sets from 𝒞 that still cover our set.

To do this, note first that there must be some C ∈ 𝒞 such that 0 ∈ C. Since C is open, it contains some interval (−r, r) for r > 0. Then for n > 1/r, all points 1/n are in this interval, and thus C contains all but finitely many of the points from our set. Now we just need to cover the other points, of which there are finitely many. Writing 1/2, . . . , 1/N for these points, choose for each i a set C_i from 𝒞 such that 1/i ∈ C_i. Then

{C, C_2, . . . , C_N}

is a finite subcover.
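The argument in this example is completely constructive, so it can be turned into code. A hedged Python sketch (the function name and the sample cover are ours) that extracts a finite subcover from a given cover of {0} ∪ {1/n : n ≥ 2} by open intervals:

```python
def finite_subcover(cover):
    """Follow the argument for K = {0} U {1/n : n >= 2}: find a cover
    element containing 0; it contains (-r, r), hence every 1/n with
    n > 1/r.  Then cover the finitely many remaining points directly.
    Assumes `cover` (a list of open intervals (a, b)) really covers K."""
    a0, b0 = next((a, b) for (a, b) in cover if a < 0 < b)
    chosen = [(a0, b0)]
    r = min(-a0, b0)                     # (-r, r) sits inside (a0, b0)
    n = 2
    while 1 / n >= r:                    # the finitely many leftovers
        iv = next((a, b) for (a, b) in cover if a < 1 / n < b)
        if iv not in chosen:
            chosen.append(iv)
        n += 1
    return chosen

cover = [(-0.05, 0.05)] + [(1/n - 1/(2*n*n), 1/n + 1/(2*n*n))
                           for n in range(2, 100)]
sub = finite_subcover(cover)             # finitely many sets suffice
```

For the sample cover shown, the set containing 0 swallows every 1/n with n > 20, and the nineteen remaining points each get one interval.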

The main problem in example 1 was actually that the set was not closed. It is not

immediately apparent how that was manifested in our inability to produce a ﬁnite subcover,

but it is a general fact:

Theorem 3.4.5. Any compact K ⊂ X is closed.

Proof. We will show that K^c = X \ K is open. Therefore pick x ∈ K^c; we will produce an r > 0 such that B_r(x) ⊂ K^c.

We first produce an open cover of K. For each y ∈ K, the distance d(x, y) must be positive, since x ≠ y (as x ∉ K). Therefore define the ball

B_y = B_{d(x,y)/2}(y) .

We now define the collection

𝒞 = {B_y : y ∈ K} .

Since each y ∈ B_y, this is an open cover of K.

By compactness, we can extract a finite subcover {B_{y_1}, . . . , B_{y_n}}. Choosing

r = min{d(x, y_i)/2 : i = 1, . . . , n} ,

we claim then that B_r(x) ⊂ K^c. To show this, let z ∈ B_r(x). Then d(z, x) < r and by the triangle inequality,

d(y_i, x) < d(z, y_i) + d(z, x) < d(z, y_i) + r ,

giving

d(x, y_i)/2 ≤ d(y_i, x) − r ≤ d(z, y_i) for all i = 1, . . . , n .

In other words, z ∉ B_{y_i} for all i. But the B_{y_i}’s cover K and therefore z ∉ K. This means K^c is open, or K is closed.

We now mention a useful way to produce new compact sets from old ones.

Theorem 3.4.6. If K ⊂ X is compact and L ⊂ K is closed, then L is compact.

Proof. Let 𝒞 be an open cover of L. Define

𝒟 = 𝒞 ∪ {L^c}

and note that 𝒟 is actually an open cover of K (here L^c is open because L is closed). Therefore, as K is compact, we can extract from 𝒟 a finite subcover {D_1, . . . , D_n}. If D_i ∈ 𝒞 for all i, then we are done; otherwise L^c is in this collection (say it is D_n) and we consider the collection {D_1, . . . , D_{n−1}}. This is a finite subcollection of 𝒞. We claim that it is an open cover of L as well. Indeed, if x ∈ L then there exists i = 1, . . . , n such that x ∈ D_i. Since x ∉ L^c, D_i cannot equal L^c, meaning that i ≠ n. This completes the proof.

3.5 Heine-Borel Theorem: compactness in R^n

In the above theorems, we see that a compact set is always closed and bounded. The converse holds in R^n, but not in all metric spaces. The fact that it holds in R^n is called the Heine-Borel theorem.

Theorem 3.5.1 (Heine-Borel). A set K ⊂ R^n is compact if and only if it is closed and bounded.

To prove this theorem, we will need some preliminary results. Recall that Rudin deﬁnes

an n-cell to be a subset of R^n of the form

[a_1, b_1] × · · · × [a_n, b_n] for a_i ≤ b_i , i = 1, . . . , n .

Lemma 3.5.2. Suppose that C_1, C_2, . . . are n-cells that are nested; that is, if

C_i = [a_i^(1), b_i^(1)] × · · · × [a_i^(n), b_i^(n)] ,

then

[a_i^(k), b_i^(k)] ⊃ [a_{i+1}^(k), b_{i+1}^(k)] for all i and k .

Then ∩_i C_i is nonempty.


Proof. We first consider the case n = 1. That is, take C_i = [a_i, b_i] for i ≥ 1 and a_i ≤ b_i. Define A = {a_1, a_2, . . .} and a = sup A. We claim that

a ∈ ∩_i C_i .

To see this, note that a_i ≤ b_j for all i, j. Indeed,

a_i ≤ a_j ≤ b_j if i ≤ j

and

a_i ≤ b_i ≤ b_j if i ≥ j .

Therefore each b_j is an upper bound for A. But a is the least upper bound of A, so a ≤ b_j for all j. This gives

a_i ≤ a ≤ b_i for all i ,

or a ∈ ∩_i C_i.

For the case n ≥ 2 we just do the same argument on each of the coordinates to find (a(1), . . . , a(n)) such that

a_i^(k) ≤ a(k) ≤ b_i^(k) for all i, k ,

or (a(1), . . . , a(n)) ∈ ∩_i C_i.
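For finitely many nested intervals the proof's recipe, taking the supremum of the left endpoints, can be checked directly. A tiny Python sketch (ours, not part of the notes):

```python
def common_point(intervals):
    """For nested closed intervals [a_i, b_i], the proof takes
    a = sup{a_1, a_2, ...}; with finitely many intervals this is the
    largest left endpoint, and it lies in every interval."""
    a = max(lo for lo, hi in intervals)
    # the proof's conclusion: a belongs to each [lo, hi]
    assert all(lo <= a <= hi for lo, hi in intervals)
    return a

nested = [(0.0, 1.0), (0.25, 0.75), (0.4, 0.6), (0.45, 0.5)]
p = common_point(nested)
```

The infinite case is where the least upper bound property of R does the real work; the finite check only illustrates the mechanism.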

Lemma 3.5.3. Any n-cell is compact in R^n.

Proof. For simplicity, take K = [0, 1] × · · · × [0, 1] = [0, 1]^n. Since R^n is a metric space (with the usual metric), it suffices to prove that K is limit point compact; that is, that each infinite subset of K has a limit point in K. This is from exercise 11 at the end of the chapter, which states that compactness and limit point compactness are equivalent in metric spaces.

Suppose that E ⊂ K is inﬁnite. We will produce a limit point of E inside K. We begin

by dividing K into 2

n

sub-cells by cutting each interval [0, 1] into two equal pieces. For

instance, in R

2

we would consider the 4 sub-cells

[0, 1/2] [0, 1/2], [0, 1/2] [1/2, 1], [1/2, 1] [0, 1/2], [1/2, 1] [1/2, 1] .

At least one of these 2

n

sub-cells must contain inﬁnitely many points of E. Call this sub-cell

K

1

. Repeat, by dividing K

1

into 2

n

equal sub-cells to ﬁnd a sub-sub-cell K

2

which contains

inﬁnitely many points of E.

We continue this procedure ad inﬁnitum, at stage i ≥ 1 ﬁnding a sub-cell K

i

of K of the

form

K

i

= [r

1,i

2

−i

, (r

1,i

+ 1)2

−i

] [r

n,i

2

−i

, (r

n,i

+ 1)2

−i

]

which contains inﬁnitely many points of K. Note that the K

i

’s satisfy the conditions of the

previous lemma: they are nested n-cells. Therefore there exists z ∈ ∩

i

K

i

. Because each K

i

is a subset of K, we have z ∈ K.

We claim that z is a limit point of E. To show this, let r > 0. Note that for all points

x, y ∈ K

i

we have

[x −y[

2

= (x

1

−y

1

)

2

+ + (x

n

−y

n

)

2

≤ n(2

−i

)

2

=

n

4

i

.

33

Therefore

diam(K

i

) = sup¦[x −y[ : x, y ∈ K

i

¦ ≤

√

n

2

i

≤

√

n

i

.

(You can prove this inequality i ≤ 2

i

for all i by induction.) So ﬁx any i >

√

n

r

; then for all

x ∈ K

i

we have (because z ∈ K

i

)

[x −z[ ≤ diam(K

i

) ≤

√

n

i

< r ,

so that K

i

⊂ B

r

(z). However K

i

contains inﬁnitely many points of E, so we can ﬁnd one

not equal to z in B

r

(z). This means z is a limit point of E.
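The halving argument above is effectively an algorithm, and it can be illustrated numerically. The sketch below is an added illustration, not part of the notes (the function name and the choice of sample are ours): it repeatedly halves [0, 1] in one dimension, keeping a half containing at least as many points of a large finite sample of E as the other. For E = {1/n : n ≥ 1}, whose points pile up at 0, the surviving interval shrinks onto the limit point 0.

```python
def limit_point_by_bisection(points, steps):
    """Repeatedly halve [0, 1], keeping a half with at least as many
    sample points as the other (ties go left), mimicking the proof of
    Lemma 3.5.3 with n = 1 and halves in place of 2^n sub-cells."""
    a, b = 0.0, 1.0
    for _ in range(steps):
        m = (a + b) / 2
        left = sum(1 for p in points if a <= p <= m)
        right = sum(1 for p in points if m <= p <= b)
        if left >= right:
            b = m
        else:
            a = m
    return a, b

# E = {1/n : n >= 1}, sampled finitely; its only limit point is 0
sample = [1.0 / n for n in range(1, 100001)]
a, b = limit_point_by_bisection(sample, 10)
print(a, b)   # a short interval with left endpoint 0
```

With a finite sample only finitely many halvings are meaningful (here 10), since eventually both halves contain no sample points at all; the infinite set itself would let the halving continue forever, exactly as in the proof.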

Proof of Heine-Borel. Suppose that K is closed and bounded in R^n. Then there exists an n-cell C such that K ⊂ C. By the previous lemma, C is compact. But K is a closed subset of C, so K is compact.

Suppose conversely that K is compact. Then we have already shown K is closed and bounded.

Therefore we find that

(closed and bounded) ⇒ (compact) in R^n,

(closed and bounded) ⇐ (compact) in metric spaces,

and

(compact) ⇔ (limit point compact) in metric spaces.

3.6 The Cantor set

We now give a famous example of a compact set. This is the Cantor set, and it has many interesting properties. We construct it iteratively. At stage 0, we start with

C_0 = [0, 1],

the entire unit interval in R. At stage 1, we remove the middle third of C_0 to produce

C_1 = [0, 1/3] ∪ [2/3, 1],

a set which is a union of 2 disjoint closed intervals, each of length 1/3. We continue, at stage n producing C_n from C_{n−1} by removing the middle third of each interval which comprises C_{n−1}. For example,

C_2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1].

It follows that at stage n, the set C_n is a union of 2^n disjoint closed intervals, each of length 3^{−n}. We then define

C = ∩_{n=0}^∞ C_n

to be the Cantor set.

Properties.

1. C is closed because it is an intersection of closed sets.

2. C is compact because it is closed and bounded (in R).

3. C has "total length" 0. Although we have not defined this, we can compute the length of C_n: it is composed of 2^n intervals of length 3^{−n}, so its "length" is (2/3)^n. Because this number tends to 0 as n goes to infinity (don't worry – we will define these things rigorously later),

length(C) ≤ length(C_n) = (2/3)^n

for all n, giving length(C) = 0.

4. Although it looks like all that will remain in the end is the endpoints of the intervals used to construct C, in fact there is much more. The set of such endpoints is countable, whereas C is uncountable. To see why this is true, note that each x ∈ C can be given an "address." The point x is in C_1, so it is in exactly one of the two intervals of C_1; assign the value 0 to x if it is in the first and 1 if it is in the second. Similarly, the set C_2 splits each interval of C_1 into two: give x the value 0 if it is in the left such interval and 1 if it is in the right. Continuing in this way, we can assign to x an infinite sequence of 0's and 1's:

x → 0111000110101 . . .

(In fact, this is nothing but the ternary expansion of x, with 2's replaced by 1's.) The map sending x to its sequence is actually a bijection from C to the set of sequences of 0's and 1's, which we know is uncountable.

One example of an element of C that is not an endpoint is 1/4: its address is

1/4 → 010101 . . .

(Endpoints have eventually constant addresses, like 1/3 → 011111 . . .)

5. Every point of C is a limit point of C. To show this, we will prove more. We will show that for each x ∈ C and each r > 0, there are points y and z in (x − r, x + r) such that y ≠ x ≠ z and y ∈ C, z ∉ C. To do this, choose N such that N > 1/r and note that since 2^N > N we certainly have 3^N > N, giving 3^{−N} < r. Since x ∈ C it follows that x ∈ C_N, so there is some subinterval I of C_N of length 3^{−N} such that x ∈ I. This interval is necessarily contained in (x − r, x + r), and so both endpoints of I (which survive through the construction of C) are in this neighborhood. At least one of these endpoints is not equal to x, so we have found a point of C in (x − r, x + r) not equal to x, giving that x is a limit point of C.

6. C contains no open intervals. The proof proceeds as in the previous case. If x ∈ C and r > 0, we can find some N such that C_N contains an interval I entirely contained in (x − r, x + r). In the next stage of the construction, we remove part of this interval, and so we can find some z ∈ C^c that is in (x − r, x + r). So each point of C is also a limit point of C^c. This implies that no point of C is an interior point, which in R means that C contains no open intervals.
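The addressing scheme in property 4 can be computed exactly. The following sketch is an added illustration, not from the notes (the function name is ours): it tracks the interval of C_n containing x at each stage and records 0 for the left third and 1 for the right third, recovering the addresses 1/4 → 0101. . . and 1/3 → 0111. . . mentioned above.

```python
from fractions import Fraction

def cantor_address(x, digits):
    """Return the first `digits` address bits of x in the Cantor set
    construction: at each stage, 0 if x lies in the left third of its
    current interval, 1 if it lies in the right third."""
    a, b = Fraction(0), Fraction(1)
    bits = []
    for _ in range(digits):
        third = (b - a) / 3
        if x <= a + third:        # left third survives
            bits.append(0)
            b = a + third
        elif x >= b - third:      # right third survives
            bits.append(1)
            a = b - third
        else:                     # x fell in a removed middle third
            raise ValueError("x is not in the Cantor set")
    return bits

print(cantor_address(Fraction(1, 4), 8))   # [0, 1, 0, 1, 0, 1, 0, 1]
print(cantor_address(Fraction(1, 3), 8))   # [0, 1, 1, 1, 1, 1, 1, 1]
```

Exact rational arithmetic (`Fraction`) is used so the boundary cases, such as the endpoint 1/3, are decided correctly.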

Let's finish with an observation. In exercise 9 below, you are asked to show that if (O_n) is a countable collection of open dense subsets of R, then ∩_{n=1}^∞ O_n is nonempty. From this we can actually derive uncountability of the real numbers. Indeed, assume for a contradiction that R is countable and list its elements as {x_1, x_2, . . .}. Define O_n = R ∖ {x_n}. Each O_n is open and dense in R, so the intersection of all the O_n's is nonempty. This is a contradiction, since

∩_n O_n = ∩_n [R ∖ {x_n}] = ∅.

3.7 Exercises

1. For X = R^2 define the function d : X × X → R by

d((x_1, y_1), (x_2, y_2)) = |x_1 − x_2| + |y_1 − y_2|.

Prove that d is a metric on X. Describe the unit ball centered at the origin geometrically. Repeat this question using

d((x_1, y_1), (x_2, y_2)) = max{|x_1 − x_2|, |y_1 − y_2|}.

2. Let F[0, 1] be the set of all bounded functions from [0, 1] to R. Show that d is a metric, where

d(f, g) = sup{|f(x) − g(x)| : x ∈ [0, 1]}.

3. Let X be the set of real valued sequences with only finitely many nonzero terms:

X = {x = (x_1, x_2, . . .) : x_i ∈ R and x_i ≠ 0 for only finitely many i}.

For an element x ∈ X write n(x) for the largest i ∈ N such that x_i ≠ 0. Define the function d : X × X → R by

d(x, y) = ( Σ_{i=1}^{max{n(x), n(y)}} (x_i − y_i)^2 )^{1/2}.

(a) Show that (X, d) is a metric space.

(b) For each n ∈ N define e_n as the element of X that has n-th coordinate equal to 1 and all others zero. Show that the set {e_n : n ∈ N} is closed and bounded but does not have a limit point.

4. For each of the following examples, verify that the collection 𝒢 is an open cover of E and determine if it can be reduced to a finite subcover. If it can, give a finite subcover; otherwise, show why it cannot be reduced.

(a) E = {1, 1/2, 1/4, . . .} = {2^{−n} : n ≥ 0}, 𝒢 = {(2^{−n−1}, 3 · 2^{−n−1}) : n ≥ 0}.

(b) E = [0, 1], 𝒢 = {(x − 10^{−4}, x + 10^{−4}) : x ∈ Q ∩ [0, 1]}.

5. Prove that an uncountable set E ⊂ R cannot have countably many limit points.

Hint. Argue by contradiction and assume that there is a set E ⊂ R such that E′, the set of limit points of E, is countable. What can you say about E ∖ E′?

6. An open interval I ⊂ R is a set of the form

• (a, b) = {x ∈ R : a < x < b} for a ≤ b, or

• (a, ∞) = {x ∈ R : a < x}, or

• (−∞, b) = {x ∈ R : x < b}, or

• R = (−∞, ∞).

Let O ⊂ R be a nonempty open set and let x ∈ O. Define O_x as the union of all open intervals I such that x ∈ I and I ⊂ O. Prove that O_x is a nonempty open interval.

7. Let O ⊂ R be a nonempty open set. By completing the following two steps, show that there exists a countable collection 𝒞 of open intervals such that

for all I, J ∈ 𝒞 with I ≠ J, we have I ∩ J = ∅, and (2)

∪_{I∈𝒞} I = O. (3)

(a) For x ∈ O, let O_x be defined as in exercise 6. Show that if O_x ∩ O_y ≠ ∅ for some x, y ∈ O then O_x = O_y.

(b) Define 𝒞 = {O_x : x ∈ O} and complete the proof by showing that 𝒞 is countable and has properties (2) and (3).

8. The Kuratowski closure and complement problem. For subsets A of a metric space X, consider two operations, the closure A̅ and the complement A^c = X ∖ A. We can perform these multiple times, forming sets such as the closure of A^c, the complement of A̅, and so on.

(a) Prove that, starting with a given A, one can form no more than 14 distinct sets by applying the two operations successively.

(b) Letting X = R, find a subset A ⊆ R for which the maximum of 14 is attained.

Hint to get started. Clearly (A^c)^c = A, so two complements in a row get you nothing new. What about two closures in a row? See Rudin, Thm. 2.27.

9. A subset E of a metric space X is dense if each point of X is in E or is a limit point of E (or both). Let A_1, A_2, . . . be open dense sets in R. Show that ∩_n A_n ≠ ∅.

Hint. Define a sequence of sets as follows. Choose x_1 ∈ A_1 and r_1 > 0 such that B_{r_1}(x_1) ⊂ A_1. Then argue that there exist x_2 ∈ A_1 ∩ A_2 and r_2 > 0 such that B_{r_2}(x_2) ⊂ B_{r_1/2}(x_1). Continuing, find infinite sequences r_1, r_2, . . . and x_1, x_2, . . . such that x_n ∈ ∩_{k=1}^{n−1} A_k and B_{r_n}(x_n) ⊂ B_{r_{n−1}/2}(x_{n−1}). Then, for each n, define B_n = B_{r_n/2}(x_n). What can you say about ∩_n B_n?

10. Show that both Q and R ∖ Q are dense in R with the usual metric.

11. We now extend the definition of dense. If E_1, E_2 are subsets of a metric space X, then E_1 is dense in E_2 if each point of E_2 is in E_1 or is a limit point of E_1. Show that if E_1, E_2, E_3 are subsets of X such that E_1 is dense in E_2 and E_2 is dense in E_3, then E_1 is dense in E_3.

12. We say a metric space (X, d) has the finite intersection property if whenever 𝒞 is a collection of closed sets in X such that each finite subcollection has nonempty intersection, the full collection has nonempty intersection:

∩_{C∈𝒞} C ≠ ∅.

Show that X has the finite intersection property if and only if X is compact. (You may use Rudin, Theorem 2.36.)

13. Let 𝒦 be a collection of compact subsets of a metric space X.

(a) Show that ∩_{K∈𝒦} K is compact.

(b) Show that if 𝒦 is finite then ∪_{K∈𝒦} K is compact. Is this still true if 𝒦 is infinite?

14. Let (X, d) be a metric space. We say that a subset E of X is limit point compact if every infinite subset of E has a limit point in E. We have seen in class that if E is compact then E is limit point compact. This exercise will serve to show the converse: that if E is limit point compact then it is compact. For the following questions, fix a subset E that is limit point compact.

(a) Show that E is closed.

(b) Show that if δ > 0 then there exist finitely many points x_1, . . . , x_n in E such that

E ⊂ ∪_{i=1}^n B_δ(x_i).

(c) Show that if A ⊂ E is closed, then A is also limit point compact.

(d) Show that if A_1, A_2, . . . are closed subsets of E such that A_n ⊃ A_{n+1} for all n ≥ 1, then ∩_n A_n ≠ ∅.

Hint. Define a set {x_n : n ≥ 1} by choosing x_1 ∈ A_1, x_2 ∈ A_2, and so on.

(e) Use the previous parts to argue that E is compact.

Hint. Argue by contradiction and assume that there is an open cover 𝒢 of E that cannot be reduced to a finite subcover. Begin with δ_1 = 1/2 and apply part (b) to get points x_1^{(1)}, . . . , x_{n_1}^{(1)} ∈ E such that E ⊂ ∪_{k=1}^{n_1} B_{δ_1}(x_k^{(1)}). Clearly

E ⊂ ∪_{k=1}^{n_1} [B_{δ_1}(x_k^{(1)}) ∩ E].

At least one of these sets, say B_{δ_1}(x_{j_1}^{(1)}) ∩ E, cannot be covered by a finite number of sets from 𝒢, or else we would have a contradiction. By parts (a) and (c), it has the limit point property and 𝒢 is a cover of it, so repeat the construction using this set instead of E and δ_2 = 1/4. Continue, at step n ≥ 3 using δ_n = 2^{−n}, to create a decreasing sequence of closed subsets of E. Use part (d).

15. In this exercise we will consider a construction similar to that of the Cantor set. We will define a countable collection of subsets {E_n : n ≥ 0} of the interval [0, 1] and we will set E = ∩_{n=0}^∞ E_n.

We define E_0 = [0, 1], the entire interval. To define E_1, we remove a subinterval of length 1/4 from the middle of E_0. Precisely, we set

E_1 = [0, 3/8] ∪ [5/8, 1].

Next, let E_2 be the set obtained by removing two subintervals, each of length 1/16, one from the middle of each piece of E_1. Thus

E_2 = [0, 5/32] ∪ [7/32, 3/8] ∪ [5/8, 25/32] ∪ [27/32, 1].

Continuing, at each step n ≥ 3, we create E_n by removing 2^{n−1} subintervals, each of length 4^{−n}, one from the middle of each piece of E_{n−1}. Define

E = ∩_{n=0}^∞ E_n.

(a) Show that each point of E is a limit point of E.

(b) Show that E does not contain any open interval.

(c) What is the total length of E?

4 Sequences

4.1 Definitions

Definition 4.1.1. Let (X, d) be a metric space. A sequence is a function f : N → X.

We think of a sequence as a list of its elements. We typically write x_1 = f(1), x_2 = f(2), and so on, and forget about f, denoting the sequence by (x_n) and its elements by x_1, x_2, . . ..

The most fundamental notion related to sequences is that of convergence.

Definition 4.1.2. A sequence (x_n) converges to a point x ∈ X if for every ε > 0 there exists N such that if n ≥ N then

d(x_n, x) < ε.

In this case we write x_n → x.

We can think of proving convergence of a sequence as follows. We have a sequence (x_n) and you tell me it has a limit x. I ask, "Oh yeah? Well, can you show that the terms of the sequence get very close to x?" You say yes, and I ask, "Can you show that all but finitely many terms are within distance ε = 1 of x?" You say yes and provide an N equal to 600. Then you proceed to show me that all x_n for n ≥ 600 have d(x_n, x) < 1. Temporarily satisfied, I ask, "Well, you did it for 1; what about for ε = .00001?" You then dream up an N equal to 40 billion such that for n ≥ N, d(x_n, x) < .00001. This game can continue indefinitely, and as long as you can come up with an N for each of my values of ε, we say x_n converges to x.

Example. We all believe that the sequence (x_n) given by x_n = 1/(n^2 + n) (in R) converges to 0. How do we prove it? Let ε > 0. We want |x_n − 0| < ε, so we solve:

1/(n^2 + n) < ε, which is equivalent to n^2 + n > 1/ε.

This will certainly be true if n ≥ 1/√ε, so set

N = ⌈1/√ε⌉.

Now if n ≥ N then

1/(n^2 + n) ≤ 1/(N^2 + N) < 1/N^2 ≤ ε.
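The N produced by this argument can be checked numerically. The snippet below is an added sanity check, not part of the notes (the helper names are ours): for several values of ε it computes N = ⌈1/√ε⌉ and verifies that the next thousand terms all lie within ε of 0.

```python
import math

def x(n):
    return 1.0 / (n * n + n)

def N_for(eps):
    # the cutoff from the argument above: any integer >= 1/sqrt(eps)
    return math.ceil(1.0 / math.sqrt(eps))

for eps in (1.0, 0.1, 1e-6):
    N = N_for(eps)
    assert all(x(n) < eps for n in range(N, N + 1000))
print("checked")
```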

In the previous example, to show convergence we could instead have noticed that the sequence is monotonic and bounded.

Definition 4.1.3. A sequence (x_n) in R is

1. monotone increasing if x_n < x_{n+1} for all n (monotone non-decreasing if x_n ≤ x_{n+1}), and

2. monotone decreasing if x_n > x_{n+1} for all n (monotone non-increasing if x_n ≥ x_{n+1}).

Theorem 4.1.4. If (x_n) is monotone (any of the types above) and bounded (that is, {x_n : n ∈ N} is bounded) then it converges.

Proof. Suppose (x_n) is monotone increasing; the other cases are similar. Then

X := {x_n : n ∈ N}

is nonempty and bounded above, so it has a supremum x. We claim that x_n → x. To prove this, let ε > 0. Then x − ε is not an upper bound for X, so there exists N such that x_N > x − ε. Then if n ≥ N,

x ≥ x_n ≥ x_N > x − ε,

giving |x − x_n| < ε, so x_n → x.
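A numerical illustration of this proof (added here; the sequence x_n = 1 − 1/n is our choice of example): the supremum of the terms is 1, some term x_N exceeds 1 − ε, and monotonicity then traps every later term between 1 − ε and 1.

```python
def x(n):
    # monotone increasing and bounded above by 1; sup of the terms is 1
    return 1.0 - 1.0 / n

sup = 1.0
for eps in (0.5, 1e-3, 1e-6):
    N = int(1.0 / eps) + 1       # guarantees x(N) > sup - eps
    assert x(N) > sup - eps
    # monotonicity keeps all later terms within eps of the sup
    assert all(sup - eps < x(n) <= sup for n in range(N, N + 100))
print("checked")
```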

We now recall some basic properties of limits. For one part we need a definition.

Definition 4.1.5. A sequence (x_n) in a metric space X is bounded if there exist q ∈ X and M ∈ R such that

d(x_n, q) ≤ M for all n ∈ N.

Note that this is the same as saying that the set {x_n : n ∈ N} is bounded.

Theorem 4.1.6 (Rudin, Theorem 3.2). Let (x_n) be a sequence in a metric space X.

1. (x_n) converges to x ∈ X if and only if every neighborhood of x contains x_n for all but finitely many n.

2. (Uniqueness of the limit) If x, y ∈ X and (x_n) converges to both x and y, then x = y.

3. If (x_n) converges then (x_n) is bounded.

4. If E ⊂ X and x is a limit point of E, then there is a sequence (x_n) in E converging to x.

Proof. Part 1 is just a restatement of the definition of a limit. For the second part, suppose that (x_n) converges to x and to y. Let ε > 0, so that there exist N_1 and N_2 such that

if n ≥ N_1 then d(x_n, x) < ε/2, and if n ≥ N_2 then d(x_n, y) < ε/2.

Using the triangle inequality, for N = max{N_1, N_2}, we get

d(x, y) ≤ d(x, x_N) + d(x_N, y) < ε/2 + ε/2 = ε.

Thus d(x, y) < ε for all ε > 0; this is only possible if d(x, y) = 0, and thus x = y.

For part 3, suppose that (x_n) converges to x ∈ X and let ε = 1. Then there exists N such that if n ≥ N then d(x, x_n) < 1. Now choose

r = max{1, d(x, x_1), . . . , d(x, x_{N−1})}.

It follows that d(x, x_n) ≤ r for all n, and so the set {x_n : n ∈ N} is contained in B_{2r}(x).

We now show part 4. Suppose x is a limit point of E. For each n, choose any point (call it x_n) in the set B_{1/n}(x) ∩ E. We claim that this sequence of points (x_n) converges to x. To see this, let ε > 0 and pick

N = ⌈1/ε⌉ + 1.

Then if n ≥ N, d(x_n, x) < 1/n ≤ 1/N < ε.

In the above, we see that a limit of a sequence is unique. This is in contrast to the limit points (plural!) of a subset E of X. The points in E are in no particular order, and E may have many limit points. But in a sequence, the points are ordered, and there can be at most one limit as n runs through that chosen order.

In the case that the sequence is of real numbers, there is a nice compatibility with arithmetic operations.

Properties of real sequences. Let (x_n) and (y_n) be real sequences such that x_n → x and y_n → y.

1. x_n + y_n → x + y.

2. If c ∈ R then cx_n → cx.

3. x_n y_n → xy.

4. If y ≠ 0 and y_n ≠ 0 for all n ∈ N, then x_n/y_n → x/y.

Proofs of properties. Many of these are similar, so we will prove only 1 and 3; Rudin contains all of the proofs. Suppose first that x_n → x and y_n → y. Given ε > 0, choose N_1 and N_2 such that

if n ≥ N_1 then |x_n − x| < ε/2, and

if n ≥ N_2 then |y_n − y| < ε/2.

Letting N = max{N_1, N_2}, if n ≥ N we have

|x_n + y_n − (x + y)| ≤ |x_n − x| + |y_n − y| < ε.

For the third part we write

|x_n y_n − xy| ≤ |y_n||x_n − x| + |x||y_n − y|.

Now note that since (y_n) converges, it is bounded. Therefore we can find M > 0 such that |x| ≤ M and |y_n| ≤ M for all n. Given ε > 0, choose N such that if n ≥ N then both

|x_n − x| ≤ ε/(2M) and |y_n − y| ≤ ε/(2M).

Then if n ≥ N,

|x_n y_n − xy| ≤ M ε/(2M) + M ε/(2M) = ε.

Note that for the last item above we required y_n ≠ 0 for all n. This is not actually necessary, for the following reasons.

Lemma 4.1.7. If (y_n) is a real sequence such that y_n → y and y ≠ 0, then y_n = 0 for at most finitely many n ∈ N.

Proof. Suppose that y_n → y with y ≠ 0 and let ε = |y|. Then there exists N ∈ N such that if n ≥ N then |y_n − y| < ε. By the triangle inequality, if n ≥ N then

|y_n| ≥ |y| − |y_n − y| > 0,

giving y_n ≠ 0.

The next lemma says that removing a finite number of terms from a convergent sequence does not affect the limit.

Lemma 4.1.8. Let (y_n) be a sequence in a metric space X. For a fixed k ∈ N define a sequence (z_n) by

z_n = y_{n+k} for n ∈ N.

Then (y_n) converges if and only if (z_n) does. If y_n → y then z_n → y.

Proof. Suppose y_n → y. If ε > 0 we can pick N ∈ N such that d(y_n, y) < ε for n ≥ N. For n ≥ N,

d(z_n, y) = d(y_{n+k}, y) < ε,

since n + k ≥ N also. This means z_n → y.

Conversely, if z_n → y then given ε > 0 we can find N ∈ N such that n ≥ N implies that d(z_n, y) < ε. Define N′ = N + k. Then if n ≥ N′, we have n − k ≥ N and so

d(y_n, y) = d(z_{n−k}, y) < ε.

Thus y_n → y.

Now we can restate the last property of real sequences as follows. If (x_n) and (y_n) are real sequences such that x_n → x and y_n → y with y ≠ 0, then x_n/y_n → x/y. To prove this, we use the first lemma to find k such that for all n, y_{n+k} ≠ 0. Then we can consider the sequences (x_{n+k}) and (y_{n+k}) and prove the property for them. Since they only differ from (x_n) and (y_n) by a finite number of terms, the property also holds for (x_n) and (y_n).

We will mostly deal with sequences of real numbers (or elements of an arbitrary metric space), but it is useful to understand convergence in R^k, k ≥ 2. It can be reformulated in terms of convergence of each coordinate. That is, if (x_n) is a sequence in R^k, we write x_n = (x_n^{(1)}, . . . , x_n^{(k)}). The sequence (x_n) converges to x ∈ R^k if and only if each coordinate sequence (x_n^{(j)}) converges to x^{(j)}, the j-th coordinate of x.

Theorem 4.1.9. Let (x_n) and (y_n) be sequences in R^k and (β_n) a sequence of real numbers.

1. (x_n) converges to x ∈ R^k if and only if x_n^{(j)} → x^{(j)} (in R) for all j = 1, . . . , k.

2. If x_n → x, y_n → y in R^k and β_n → β in R, then

x_n + y_n → x + y, x_n · y_n → x · y, and β_n x_n → βx,

where · is the standard dot product in R^k.

Proof. The second part follows from the first part and the properties of limits in R discussed above. To prove the first, suppose that x_n → x and let j ∈ {1, . . . , k}. Given ε > 0, let N be such that n ≥ N implies that |x_n − x| < ε. Then we have

|x_n^{(j)} − x^{(j)}| = √((x_n^{(j)} − x^{(j)})^2) ≤ √((x_n^{(1)} − x^{(1)})^2 + ⋯ + (x_n^{(k)} − x^{(k)})^2) = |x_n − x| < ε.

So x_n^{(j)} → x^{(j)}.

For the converse, suppose that x_n^{(j)} → x^{(j)} for all j = 1, . . . , k and let ε > 0. Pick N_1, . . . , N_k such that for j = 1, . . . , k, if n ≥ N_j then |x_n^{(j)} − x^{(j)}| < ε/√k. Then for N = max{N_1, . . . , N_k} and n ≥ N, we have

|x_n − x| = √((x_n^{(1)} − x^{(1)})^2 + ⋯ + (x_n^{(k)} − x^{(k)})^2) < √(ε^2/k + ⋯ + ε^2/k) = ε.
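A small numerical check of the coordinatewise criterion (added illustration; the sequence and helper names are ours): the sequence x_n = (1/n, 1 + 2/n) in R^2 has coordinate limits 0 and 1, and forcing each coordinate within ε/√2 of its limit forces the Euclidean distance below ε, as in the proof with k = 2.

```python
import math

def dist(u, v):
    # Euclidean distance in R^2
    return math.hypot(u[0] - v[0], u[1] - v[1])

limit = (0.0, 1.0)
for eps in (0.1, 1e-4):
    # each coordinate error is at most 2/n, so n > 2*sqrt(2)/eps suffices
    N = int(2.0 * math.sqrt(2.0) / eps) + 1
    assert all(dist((1.0 / n, 1.0 + 2.0 / n), limit) < eps
               for n in range(N, N + 200))
print("checked")
```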

We finish this section with the idea of convergence to infinity.

Definition 4.1.10. A real sequence (x_n) converges to ∞ if for each M > 0 there exists N ∈ N such that

n ≥ N implies x_n > M.

It converges to −∞ if (−x_n) converges to ∞.

As before, we write x_n → ∞ (or x_n → −∞) in this case. In this definition we think of M as taking the role of ε from before, and we imagine that (M, ∞) is a "neighborhood of infinity."

Clearly a sequence that converges to infinity must be unbounded. The converse is not true. Consider (x_n), defined by

x_n = 1 if n is odd, and x_n = n if n is even.

This sequence does not converge to infinity, but it is unbounded.

4.2 Subsequences, Cauchy sequences and completeness

We now move back to sequences in general metric spaces. Sometimes a sequence does not converge, but if we remove many of its terms we can make it converge. Another way to say this is that a sequence might not converge but may still have a convergent subsequence.

Definition 4.2.1. Let (x_n) be a sequence in a metric space X. Given a monotonically increasing sequence (n_k) in N (that is, n_1 < n_2 < ⋯), the sequence (x_{n_k}) is called a subsequence of (x_n). If x_{n_k} → y as k → ∞, then we call y a subsequential limit of (x_n).

Note that a sequence (x_n) converges to x if and only if each subsequence of (x_n) converges to x. To prove this, suppose first that x_n → x and let (x_{n_k}) be a subsequence. Given ε > 0 we can find N such that if n ≥ N then d(x_n, x) < ε. Because (n_k) is monotone increasing, it follows that n_k ≥ k for all k, so choose K = N. Then for k ≥ K, the element x_{n_k} is a term of the sequence (x_n) with index at least equal to N, giving d(x_{n_k}, x) < ε.

Conversely, suppose that each subsequence of (x_n) converges to x. Then as (x_n) is a subsequence of itself, we also have x_n → x!

The next theorem is one of the most important in the course. It is a restatement of compactness; in general topological spaces, the corresponding property is called sequential compactness.

Theorem 4.2.2. Let (x_n) be a sequence in a compact metric space X. Then some subsequence of (x_n) converges to a point x in X.

Proof. It may be that the set of sequence elements {x_n : n ∈ N} is finite. In this case, at least one of these elements must appear in the sequence infinitely often. That is, there exist x ∈ {x_n : n ∈ N} and a monotone increasing sequence (n_k) such that x_{n_k} = x for all k. Clearly then x_{n_k} → x, and x ∈ X because the sequence terms are.

Otherwise, the set {x_n : n ∈ N} is infinite. Because compactness implies limit point compactness, there exists x ∈ X which is a limit point of this set. We build a subsequence that converges to x as follows. Since d(x_n, x) < 1 for infinitely many n, we can pick n_1 such that d(x_{n_1}, x) < 1. Continuing in this fashion, at stage i we note that d(x_n, x) < 1/i for infinitely many n, so we can pick n_i > n_{i−1} such that d(x_{n_i}, x) < 1/i. Because n_1 < n_2 < ⋯, the sequence (x_{n_i}) is a subsequence of (x_n). Further, given ε > 0, choose I > 1/ε, so that if i ≥ I,

d(x_{n_i}, x) < 1/i ≤ 1/I < ε.

Corollary 4.2.3 (Bolzano-Weierstrass). Each bounded sequence in R^k has a convergent subsequence.

Proof. If (x_n) is bounded in R^k, then we can fit the set {x_n : n ∈ N} into a k-cell, which is compact. Now, viewing (x_n) as a sequence in this compact k-cell, we see by the previous theorem that it has a convergent subsequence.

The last topic of the section is Cauchy sequences. The motivation is as follows. Many times we are in a metric space X that has "holes." For instance, we may consider Q as a metric space inside of R (that is, using the metric d(x, y) = |x − y| from R). In this space, the sequence

(1, 1.4, 1.41, 1.414, . . .)

does not converge (it should converge only in R – to √2, but this element is not in our space). Although we cannot talk about this sequence converging, that is, getting close to some limit x, we can do the next best thing. We can say that the terms of the sequence get close to each other.
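The decimal truncations of √2 make this precise, and the Cauchy property can be verified exactly with rational arithmetic. This is an added illustration, not part of the notes; the truncation depth 12 is arbitrary.

```python
from fractions import Fraction
import math

# the rational truncations 1, 1.4, 1.41, 1.414, ... of sqrt(2)
xs = [Fraction(math.floor(math.sqrt(2) * 10 ** k), 10 ** k) for k in range(12)]

# Cauchy in Q: terms with index >= N differ by less than 10^-N
for N in range(11):
    assert all(abs(xs[m] - xs[n]) < Fraction(1, 10 ** N)
               for m in range(N, 12) for n in range(N, 12))

# the squares of the terms approach 2, but no rational squares to 2,
# so the sequence has no limit inside Q
assert abs(float(xs[-1]) ** 2 - 2.0) < 1e-10
print("checked")
```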

Definition 4.2.4. Let (x_n) be a sequence in a metric space X. We say that (x_n) is Cauchy if for each ε > 0 there exists N ∈ N such that

if m, n ≥ N then d(x_m, x_n) < ε.

Just like before, the number N gives us a cutoff in the sequence after which all terms are close to each other. Each convergent sequence (x_n) (with some limit x) is Cauchy, for if ε > 0 then we can pick N such that if n ≥ N then d(x_n, x) < ε/2. Then for m, n ≥ N,

d(x_n, x_m) ≤ d(x_n, x) + d(x_m, x) < ε/2 + ε/2 = ε.

One reason a Cauchy sequence might not converge was illustrated above: the "limit" may not be in the space. This is not possible, though, in a compact space.

Theorem 4.2.5. If X is a compact metric space, then all Cauchy sequences in X converge.

Proof. Let X be compact and (x_n) a Cauchy sequence. By the previous theorem, (x_n) has a subsequence (x_{n_k}) such that x_{n_k} → x, some point of X. We will show that since (x_n) is already Cauchy, the full sequence must converge to x. The idea is to fix some element x_{n_{k*}} of the subsequence which is close to x. This term is chosen far enough along the initial sequence so that all later terms are close to it, and thus close to x.

Let ε > 0 and choose N such that if m, n ≥ N then d(x_m, x_n) < ε/2. Choose also some K such that if k ≥ K then d(x_{n_k}, x) < ε/2. Last, set N′ = max{N, K}. Because (n_k) is monotone increasing, we can fix k* ≥ K such that n_{k*} ≥ N′. Then for any n ≥ N′, we have

d(x_n, x) ≤ d(x_n, x_{n_{k*}}) + d(x_{n_{k*}}, x) < ε/2 + ε/2 = ε.

Definition 4.2.6. A metric space X in which all Cauchy sequences converge is said to be complete.

The above theorem says that compact spaces are complete. This is also true of R^k, though it is not compact.

Theorem 4.2.7. R^k (with the usual metric) is complete.

Proof. Let (x_n) be a Cauchy sequence. We claim that it is bounded. The proof is almost the same as that of the fact that a convergent sequence is bounded. We can find N such that if n, m ≥ N then d(x_n, x_m) < 1. Therefore d(x_n, x_N) < 1 for all n ≥ N. Putting R = max{d(x_N, x_1), . . . , d(x_N, x_{N−1}), 1}, we then have d(x_j, x_N) ≤ R for all j, so (x_n) is bounded.

Since (x_n) is bounded, we can put it in a k-cell C. Then we can view the sequence as being in the space C, which is compact. Now we use the fact that compact spaces are complete, giving some x ∈ C such that x_n → x. But x ∈ R^k, so we are done.

4.3 Special sequences

Here we will list the limits Rudin gives in the book and prove a couple of them. One interesting one is the fourth. Imagine taking α = 10^{10} and p = 10^{−10}. Then it says that (1 + 10^{−10})^n eventually outgrows n^{10^{10}}.

Theorem 4.3.1. The limits below evaluate as follows.

• If p > 0 then n^{−p} → 0.

• If p > 0 then p^{1/n} → 1.

• n^{1/n} → 1.

• If p > 0 and α ∈ R then n^α/(1 + p)^n → 0.

• If |x| < 1 then x^n → 0.

Proof. For the first limit, let ε > 0 and choose N = ⌊(1/ε)^{1/p}⌋ + 1. Then if n ≥ N we have n > (1/ε)^{1/p} and therefore n^{−p} < ε.

For the second, Rudin uses the binomial theorem:

Lemma 4.3.2. For x, y ∈ R and n ∈ N,

(x + y)^n = Σ_{j=0}^n C(n, j) x^j y^{n−j},

where C(n, j) denotes the binomial coefficient.

Proof. The proof is by induction. For n = 0 both sides equal 1. Assuming the formula holds for some n, we show it holds for n + 1. We have

(x + y)^{n+1} = (x + y)(x + y)^n = (x + y) Σ_{j=0}^n C(n, j) x^j y^{n−j}

= Σ_{j=0}^n C(n, j) x^{j+1} y^{n−j} + Σ_{j=0}^n C(n, j) x^j y^{n+1−j}

= Σ_{j=1}^{n+1} C(n, j − 1) x^j y^{n+1−j} + Σ_{j=0}^n C(n, j) x^j y^{n+1−j}

= C(n, 0) y^{n+1} + C(n, n) x^{n+1} + Σ_{j=1}^n [C(n, j − 1) + C(n, j)] x^j y^{n+1−j}.

But now we use the identity C(n, j − 1) + C(n, j) = C(n + 1, j), valid for n ≥ 0 and j = 1, . . . , n. This gives

y^{n+1} + x^{n+1} + Σ_{j=1}^n C(n + 1, j) x^j y^{n+1−j},

which is Σ_{j=0}^{n+1} C(n + 1, j) x^j y^{n+1−j}.
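Both the expansion and the Pascal identity used in the induction step are easy to spot-check numerically; the snippet below is an added check (Python's math.comb plays the role of the binomial coefficient C(n, j)):

```python
from math import comb

# check (x + y)^n = sum_j comb(n, j) x^j y^(n-j) for a few cases
for n in range(8):
    for x, y in ((2.0, 3.0), (1.5, -0.5)):
        lhs = (x + y) ** n
        rhs = sum(comb(n, j) * x ** j * y ** (n - j) for j in range(n + 1))
        assert abs(lhs - rhs) < 1e-9

# the identity comb(n, j-1) + comb(n, j) == comb(n+1, j) driving the induction
assert all(comb(n, j - 1) + comb(n, j) == comb(n + 1, j)
           for n in range(20) for j in range(1, n + 1))
print("checked")
```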

Returning to the proof of the second limit, we first assume that p > 1 and set y_n = p^{1/n} − 1 (note y_n ≥ 0). Computing,

p = (y_n + 1)^n ≥ 1 + n y_n,

where we have kept only the first two terms from the binomial theorem. This means 0 ≤ y_n ≤ (p − 1)/n, and letting n → ∞ we get y_n → 0, completing the proof in the case p > 1. The case p = 1 is trivial, and if 0 < p < 1 then we consider 1/p and see that (1/p)^{1/n} → 1. Taking reciprocals, we get p^{1/n} → 1.

For the third limit, we use a different term in the binomial theorem. Set x_n = n^{1/n} − 1 and compute

n = (1 + x_n)^n ≥ C(n, 2) x_n^2 = [n(n − 1)/2] x_n^2,

so 0 ≤ x_n ≤ √(2/(n − 1)) ≤ √(4/n) if n ≥ 2. Since n^{−1/2} → 0 we are done.

The fourth limit is a bit more difficult. Choose any integer k > α and consider n > 2k. Then

(1 + p)^n ≥ C(n, k) p^k = [n(n − 1) ⋯ (n − k + 1) / k!] p^k ≥ (n/2)^k p^k / k! .

This gives

0 ≤ n^α / (1 + p)^n ≤ [k! / (p/2)^k] n^(α−k) → 0

since α < k.

The last limit is proved in Chapter 5 in the theorem on geometric series.
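Although nothing in the proof requires it, these limits are easy to sanity-check numerically. The following Python sketch (purely illustrative; the parameter values are our own choices) evaluates each expression at increasing n; the fourth quantity is computed through logarithms so that (1 + p)^n does not overflow:

```python
# Numerical sanity check of the limits in Theorem 4.3.1 (illustration only).
import math

p, alpha, x = 0.5, 3.0, -0.9
for n in (10, 1000, 100000):
    print(n,
          n ** (-p),                     # n^(-p)      -> 0
          p ** (1 / n),                  # p^(1/n)     -> 1
          n ** (1 / n),                  # n^(1/n)     -> 1
          math.exp(alpha * math.log(n) - n * math.log(1 + p)),  # n^a/(1+p)^n -> 0
          x ** n)                        # x^n         -> 0
```

Note how slowly n^(1/n) approaches 1 compared with how quickly n^α/(1 + p)^n collapses to 0: the exponential in the denominator dominates any fixed power of n.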


4.4 Exercises

1. For the following sequences, find the limit and prove your answer (using an ε − N argument).

   (a) x_n = √(n^2 + 1) − n, n ∈ N.

   (b) x_n = n 2^(−n), n ∈ N.

2. Determine whether or not the following sequence converges. If it does not, give a convergent subsequence (if one exists).

   x_n = sin(nπ/2) + cos(nπ), n ∈ N .

3. Let a_1, . . . , a_k be positive numbers. Show that

   lim_{n→∞} [(a_1^n + ⋯ + a_k^n)/k]^(1/n) = max{a_1, . . . , a_k} .

4. We have seen that if a metric space (X, d) is compact then it must be complete. In this exercise we investigate the converse.

   (a) Show that if X is complete then it need not be compact.

   (b) We say that a metric space (X, d) is totally bounded if for each δ > 0 we can find finitely many points x_1, . . . , x_n such that

       X ⊂ ⋃_{i=1}^n B_δ(x_i) .

       (Here, B_r(x) is the neighborhood {y ∈ X : d(x, y) < r}.) Show that X is compact if and only if X is both totally bounded and complete via the following steps.

       i. Show that if X is compact then X is totally bounded.

       ii. Assume that X is totally bounded and let E ⊂ X be infinite. We will try to construct a limit point for E in X. Begin by finding x^(1)_1, . . . , x^(1)_{n_1} ∈ X such that

           X ⊂ ⋃_{i=1}^{n_1} B_{1/2}(x^(1)_i) .

           There must be k_1 ∈ {1, . . . , n_1} such that

           E_1 = B_{1/2}(x^(1)_{k_1}) ∩ E is infinite .

           Continue, at stage n ≥ 2 choosing x^(n)_{k_n} ∈ X such that E_n = B_{2^(−n)}(x^(n)_{k_n}) ∩ E_{n−1} is infinite. Show that (x^(1)_{k_1}, x^(2)_{k_2}, . . .) is a Cauchy sequence.

           Hint. You may want to use the following fact (without proof). For n ≥ 1, define s_n = 1/2 + 1/4 + ⋯ + 1/2^n. Then (s_n) converges.

       iii. Assume that X is totally bounded and complete and show that any E ⊂ X which is infinite has a limit point in X. Conclude that X is compact.

   (c) Show that if X is totally bounded then it need not be compact.

5. Sometimes we want to analyze sequences that do not converge. For this purpose we define upper and lower limits, numbers that exist for all real sequences. Let (a_n) be a sequence in R and for each n ≥ 1 define

   u_n = sup{a_n, a_{n+1}, . . .} and l_n = inf{a_n, a_{n+1}, . . .} .

   (Here we write u_n = ∞ if {a_k : k ≥ n} is not bounded above and l_n = −∞ if it is not bounded below.)

   (a) Show that (u_n) and (l_n) are monotonic. If (a_n) is bounded, show that there exist numbers u, l ∈ R such that u_n → u and l_n → l. We call these numbers the limit superior (upper limit) and limit inferior (lower limit) of (a_n) and write

       limsup_{n→∞} a_n = u, liminf_{n→∞} a_n = l .

   (b) Give reasonable definitions of limsup_{n→∞} a_n and liminf_{n→∞} a_n in the unbounded case. (Here your definitions should allow for the possibilities ±∞.)

   (c) Show that (a_n) converges if and only if

       limsup_{n→∞} a_n = liminf_{n→∞} a_n .

       (You may want to separate into cases depending on whether the liminf and/or limsup is finite or infinite.)

6. Let (a_n) be a real sequence and write E for the set of all subsequential limits of (a_n); that is,

   E = {x ∈ R : there is a subsequence (a_{n_k}) of (a_n) such that a_{n_k} → x as k → ∞} .

   Assume that (a_n) is bounded and prove that limsup_{n→∞} a_n = sup E. Explain how you would modify your proof to show that liminf_{n→∞} a_n = inf E. (These results are also true if (a_n) is unbounded but you do not have to prove that.)

7. Let a_0, b_0 be real numbers with 0 < b_0 ≤ a_0. Define sequences {a_n}, {b_n} by

   a_{n+1} = (a_n + b_n)/2 and b_{n+1} = √(a_n b_n) for n ≥ 0 .

   (a) Prove the arithmetic-geometric mean inequality: b_1 ≤ a_1. This trivially extends to b_n ≤ a_n for all n.

   (b) Prove that {a_n} and {b_n} converge, and to the same limit. This limit is called the arithmetic-geometric mean of a_0 and b_0.

8. This problem is not assigned; it is just for fun. Let x_0 = 1 and x_{n+1} = sin x_n.

   (a) Prove x_n → 0.

   (b) Find lim_{n→∞} √n x_n.
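For intuition on exercise 5, the tail suprema u_n and tail infima l_n can be computed directly for a truncated sequence. Here is a short illustrative Python sketch (our own, not part of the exercise) for a_n = (−1)^n (1 + 1/n), whose limsup is 1 and liminf is −1:

```python
# Tail sup/inf (exercise 5) for a_n = (-1)^n (1 + 1/n); illustration only.
N = 1000
a = [(-1) ** n * (1 + 1 / n) for n in range(1, N + 1)]
u = [max(a[i:]) for i in range(N)]   # u_n = sup over the tail starting at a_n
l = [min(a[i:]) for i in range(N)]   # l_n = inf over the tail starting at a_n
print(u[0], l[0])       # sup/inf of the whole sequence: 1.5 and -2.0
print(u[900], l[900])   # far out, the tails squeeze toward limsup 1, liminf -1
```

The printout shows (u_n) decreasing and (l_n) increasing, exactly the monotonicity that part (a) asks you to prove.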


5 Series

We now introduce series, which are special types of sequences. We will concentrate on them for the next couple of lectures.

5.1 Definitions

Definition 5.1.1. Let (x_n) be a real sequence. For each n ∈ N, define the partial sum

s_n = x_1 + ⋯ + x_n = Σ_{j=1}^n x_j .

We say that the series Σ x_n converges if (s_n) converges.

Just as before, the tail behavior is all that matters (we can chop off as many initial terms as we want). In other words,

Σ_{n=1}^∞ x_n converges iff Σ_{n=N}^∞ x_n converges for each N ≥ 1 .

The proof is the same as that for sequences.

We would like to characterize which series converge. We start with a simple criterion that must be satisfied.

Theorem 5.1.2. If the series Σ x_n converges then the terms (x_n) converge to 0.

Proof. Let ε > 0 and suppose that Σ_n x_n = s (that is, s_n → s). Then there exists N ∈ N such that if n ≥ N then |s_n − s| < ε/2. Now for n ≥ N + 1,

|s_n − s| < ε/2 and |s_{n−1} − s| < ε/2 ,

implying that |s_n − s_{n−1}| < ε. Therefore

|x_n − 0| = |s_n − s_{n−1}| < ε

and we are done.

The above tells us that many series cannot converge. For instance, Σ x_n diverges, where x_n = (−1)^n. However, it is not true that all series Σ x_n with x_n → 0 converge. For example,

Theorem 5.1.3. The harmonic series, Σ_{n=1}^∞ 1/n, diverges.

Proof. To prove this, we give a lemma that allows us to handle series of non-negative terms more easily.

Lemma 5.1.4. Let (x_n) be a sequence of non-negative terms. Then Σ x_n converges if and only if the sequence of partial sums (s_n) is bounded.

Proof. This comes directly from the monotone convergence theorem. If x_n ≥ 0 for all n, then

s_{n+1} = s_n + x_{n+1} ≥ s_n ,

giving that (s_n) is monotone, so it converges if and only if it is bounded.

Returning to the proof, we will show that the partial sums of the harmonic series are unbounded. Let M > 0 and choose n of the form n = 2^k for k > 2M. Then we give a lower bound:

s_n = 1 + 1/2 + 1/3 + ⋯ + 1/2^k
    > 1 + (1/2 + 1/3) + (1/4 + 1/5 + 1/6 + 1/7) + ⋯ + (1/2^(k−1) + ⋯ + 1/(2^k − 1))
    > 1/2 + 2(1/4) + 4(1/8) + ⋯ + 2^(k−1) (1/2^k)
    = (1/2)(1 + 1 + ⋯ + 1) = k/2 > M .

So given any M > 0, there exists n such that s_n > M. This implies that (s_n) is unbounded and we are done.

In the proof above, we used an argument that can be generalized a bit.

Theorem 5.1.5 (Comparison test). Let (x_n) and (y_n) be non-negative real sequences such that x_n ≥ y_n for all n.

1. If Σ x_n converges, then so does Σ y_n.

2. If Σ y_n diverges, then so does Σ x_n.

Proof. The first part is implied by the second, so we need only show the second. Write (s_n) and (t_n) for the partial sums

s_n = x_1 + ⋯ + x_n and t_n = y_1 + ⋯ + y_n .

Since y_n ≥ 0 for all n we can use the above lemma to say that (t_n) is unbounded, so given M > 0 choose N such that n ≥ N implies that t_n > M. Now for such n,

s_n = x_1 + ⋯ + x_n ≥ y_1 + ⋯ + y_n = t_n > M ,

so that (s_n) is unbounded and Σ x_n diverges.

This test can be generalized in at least two ways:

1. We only need x_n ≥ y_n for n greater than some N_0. This is because we can consider convergence/divergence of Σ_{n=N_0}^∞ x_n, etc.

2. In the first part, we do not even need y_n ≥ 0 as long as we modify the statement. Suppose that (x_n) is non-negative such that Σ x_n converges and |y_n| ≤ x_n for all n. Then setting s_n and t_n as before, we can just show that (t_n) is Cauchy. Since (s_n) is, given ε > 0 we can find N such that if n > m ≥ N then |s_n − s_m| < ε. Then

|t_n − t_m| = |y_{m+1} + ⋯ + y_n| ≤ |y_{m+1}| + ⋯ + |y_n| ≤ x_{m+1} + ⋯ + x_n = |s_n − s_m| < ε .

To use the comparison test, let us first introduce one of the simplest series of all time.

Theorem 5.1.6 (Geometric series). For a ∈ R define a sequence (x_n) by x_n = a^n. Then the geometric series Σ_n x_n converges if and only if |a| < 1. Furthermore,

Σ_{n=0}^∞ a^n = 1/(1 − a) if |a| < 1 .

Proof. The first thing to note is that a^n → 0 if |a| < 1. We can prove this by showing that |a|^n → 0. So if 0 ≤ |a| < 1 then the sequence |a|^n is monotone decreasing:

|a|^(n+1) = |a| · |a|^n ≤ |a|^n

and is bounded below, so it has a limit, say L. Then we get

L = lim_{n→∞} |a|^n = |a| lim_{n→∞} |a|^(n−1) = |a| L ,

and since |a| ≠ 1 this forces L = 0.

Now continue to assume that |a| < 1 and compute the partial sum for n ≥ 1:

s_n + a^(n+1) = s_{n+1} = 1 + a + a^2 + ⋯ + a^(n+1) = 1 + a(1 + a + a^2 + ⋯ + a^n) = 1 + a s_n .

Solving for s_n, we find s_n (1 − a) = 1 − a^(n+1). Since a ≠ 1,

s_n = (1 − a^(n+1)) / (1 − a) .

We let n → ∞ to get the result.

If |a| ≥ 1 then the terms a^n do not even go to zero, since |a|^n ≥ |a| > 0, so the series diverges.
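The closed form is easy to check numerically. Here is a short illustrative Python sketch (our own; it assumes nothing beyond the theorem) comparing a partial sum s_n with 1/(1 − a):

```python
# Partial sums of the geometric series vs the closed form 1/(1 - a); illustration.
a = 0.3
closed = 1 / (1 - a)
s = 0.0
for n in range(50):
    s += a ** n          # after this step, s = s_n = 1 + a + ... + a^n
print(s, closed)         # the two agree once a^(n+1) is negligible
```

Already at n = 49 the discrepancy is a^50/(1 − a), far below floating-point precision, which matches the formula s_n = (1 − a^(n+1))/(1 − a).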

Now we can prove facts about the “p-series.”

Theorem 5.1.7. The series Σ n^(−p) converges if and only if p > 1.

Proof. For p ≤ 1 we have n^(−p) ≥ 1/n and so the comparison test gives divergence. Suppose then that p > 1. We can group terms as before: taking n = 2^k − 1,

1 + 1/2^p + 1/3^p + ⋯ + 1/(2^k − 1)^p
  = 1 + (1/2^p + 1/3^p) + (1/4^p + 1/5^p + 1/6^p + 1/7^p) + ⋯ + (1/2^((k−1)p) + ⋯ + 1/(2^k − 1)^p)
  ≤ 1 + 2(1/2^p) + 4(1/4^p) + ⋯ + 2^(k−1) (1/2^((k−1)p))
  = 1 + 2^(1−p) + 4^(1−p) + ⋯ + (2^(k−1))^(1−p)
  = (2^(1−p))^0 + (2^(1−p))^1 + ⋯ + (2^(1−p))^(k−1)
  ≤ 1/(1 − 2^(1−p)) < ∞ since p > 1 .

This means that if s_n = Σ_{j=1}^n j^(−p), then s_{2^k − 1} ≤ 1/(1 − 2^(1−p)) for all k. Since (s_n) is monotone, it is then bounded and the series converges.
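The dyadic bound in the proof can be watched numerically. An illustrative Python sketch (our own) for p = 2, where the geometric bound 1/(1 − 2^(1−p)) equals 2:

```python
# Dyadic-block bound from the p-series proof, for p = 2; illustration only.
import math

p, k = 2, 15
s = sum(j ** (-p) for j in range(1, 2 ** k))   # partial sum s_{2^k - 1}
bound = 1 / (1 - 2 ** (1 - p))                 # geometric bound; equals 2 here
print(s, bound)   # s stays below the bound (s_n in fact increases to pi^2/6)
```

The partial sums indeed stay below 2; the exact value of the sum, π²/6, is a famous computation of Euler that is well beyond the bound we need here.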

5.2 Ratio and root tests

We will continue to deal with series of non-negative terms. So far we have used the comparison test to compare series to the geometric series, which we could sum exactly. We will still do that, but in a more “refined” way.

Theorem 5.2.1 (Root test). Let (x_n) be a real sequence and define α = limsup_{n→∞} |x_n|^(1/n).

• If α < 1 then Σ x_n converges.

• If α > 1 then Σ x_n diverges.

• If α = 1 then Σ x_n could converge or diverge.

Proof. Define

u_n = sup{|x_n|^(1/n), |x_{n+1}|^(1/(n+1)), . . .} .

In the homework set, limsup_{n→∞} x_n was defined as lim_{n→∞} u_n (at least in the bounded case — you can adapt this proof to the unbounded case). If α < 1 this means that given p ∈ (α, 1) there exists N such that if n ≥ N then u_n ≤ p, or

n ≥ N implies |x_n| ≤ p^n .

Now we just use the comparison test. Since Σ p^n converges (as 0 < p < 1), so does Σ x_n.

Suppose now that α > 1. Recall from the homework that given a real sequence (y_n), there always exists a subsequence (y_{n_k}) such that y_{n_k} → limsup_{n→∞} y_n. So we can find an increasing sequence (n_k) such that |x_{n_k}|^(1/n_k) → α. Thus there exists K such that

k ≥ K implies |x_{n_k}| ≥ 1^(n_k) = 1

and since (x_n) does not converge to zero, we cannot have Σ x_n convergent.

Last, if α = 1 we cannot tell anything. First (1/n)^(1/n) → 1 but also (1/n^2)^(1/n) = [(1/n)^(1/n)]^2 → 1. Since Σ 1/n diverges and Σ 1/n^2 converges, the root test tells us nothing.

Applications.

1. The series Σ n^2/2^n converges. We can see this by the root test:

   limsup_{n→∞} (n^2/2^n)^(1/n) = limsup_{n→∞} (n^(1/n))^2 / 2 = 1/2 < 1 .

2. Power series. Let x ∈ R and for a given real sequence (a_n), consider the series

   Σ_{n=0}^∞ a_n x^n .

   We would like to know for which values of x this series converges. To solve for this, we simply use the root test. Consider

   limsup_{n→∞} |a_n x^n|^(1/n) = |x| limsup_{n→∞} |a_n|^(1/n) .

   Setting α = limsup_{n→∞} |a_n|^(1/n), we find that the series converges if |x| < 1/α and diverges if |x| > 1/α. So it makes sense to define

   R := 1/α as the radius of convergence of the power series Σ_{n=0}^∞ a_n x^n .

   Of course we cannot tell from the root test what happens when x = ±R.

3. Consider the series Σ_{n=0}^∞ 1/n!. We have

   (1/n!)^(1/n) ≤ [1/(n(n − 1) ⋯ ⌈n/2⌉)]^(1/n) ≤ [1/(n/2)^(n/2)]^(1/n) = √(2/n) → 0 .

   So the root test gives convergence.
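The quantity (1/n!)^(1/n) from application 3 can be evaluated directly. An illustrative Python sketch (our own):

```python
# The n-th roots (1/n!)^(1/n) from application 3 tend to 0; illustration only.
import math

vals = [(1 / math.factorial(n)) ** (1 / n) for n in (5, 20, 60)]
print(vals)   # decreasing toward 0, so the root test gives convergence
```

Because the roots tend to 0, and not merely to some value below 1, the series Σ 1/n! converges faster than every geometric series.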

Another useful test is the ratio test.

Theorem 5.2.2 (Ratio test). Let (x_n) be a real sequence.

• If limsup_{n→∞} |x_{n+1}/x_n| < 1 then Σ x_n converges.

• If x_{n+1} ≥ x_n > 0 for all n ≥ N_0 (a fixed natural number) then Σ x_n diverges.

Proof. Assume the limsup is α < 1. Then as before, choosing p ∈ (α, 1) we can find N such that if n ≥ N then |x_{n+1}| < p |x_n|. Iterating this from n = N we find

|x_{N+k}| < p^k |x_N| for all k ≥ 1 .

Therefore if we set y_k = x_{N+k} then |y_k| ≤ C p^k with C the non-negative constant |x_N|. This implies by the comparison test that Σ y_k converges. This is the tail of Σ x_n, so this series converges as well.

Suppose on the other hand that x_{n+1} ≥ x_n > 0 for all n ≥ N_0. Then by iteration,

x_{N_0 + k} ≥ x_{N_0} > 0 ,

and the terms do not even converge to 0. This implies that Σ x_n diverges.

The ratio test can be inconclusive, but in more ways than can the root test. First, if limsup_{n→∞} |x_{n+1}/x_n| > 1 it can still be that the series converges (try to think of an example!). Also if this limsup equals 1, we could have convergence or divergence. Note however that if

lim_{n→∞} |x_{n+1}/x_n| exists and is > 1

then we can apply the second criterion of the ratio test and conclude divergence of Σ x_n.

Applications.

1. The series Σ x^n n^n/n! converges if |x| < 1/C, where C = Σ_{n=0}^∞ 1/n!. To see this, set b_n = x^n n^n/n!:

   |b_{n+1}/b_n| = |x| (n + 1)^(n+1) / [n^n (n + 1)] = |x| (1 + 1/n)^n .

   But

   (1 + 1/n)^n = Σ_{j=0}^n C(n, j) n^(−j) = Σ_{j=0}^n n(n − 1) ⋯ (n − j + 1)/(j! n^j) ≤ Σ_{j=0}^∞ 1/j! = C .

   So limsup_{n→∞} |b_{n+1}/b_n| ≤ C |x| < 1.

2. Power series. Generally we can also test convergence of power series using the ratio test. Considering the series Σ a_n x^n, we compute

   limsup_{n→∞} |a_{n+1} x^(n+1) / (a_n x^n)| = |x| limsup_{n→∞} |a_{n+1}/a_n| = |x| α ,

   where α = limsup_{n→∞} |a_{n+1}/a_n|. So if |x| < 1/α the series converges, whereas if |x| ≥ 1/α we cannot tell. However, if β = lim_{n→∞} |a_{n+1}/a_n| exists then for |x| > 1/β we have divergence.


Remark (from class). The root and ratio tests can give different answers. Consider the sequence (a_n) given by

a_n = 1 if n is even, 2 if n is odd .

Then

limsup_{n→∞} |a_{n+1}/a_n| = 2 but limsup_{n→∞} |a_n|^(1/n) = 1 .

Therefore if we consider the radius of convergence of Σ a_n x^n, the root test gives 1. If we were to define the radius of convergence using the ratio test (which we should not!) then we would get 1/2, which is smaller, and is not accurate, since for x = 3/4, for instance, the series converges. Generally speaking we have

limsup_{n→∞} |a_n|^(1/n) ≤ limsup_{n→∞} |a_{n+1}/a_n| .

See Rudin, Theorem 3.37.
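The gap between the two limsups in this remark is easy to see numerically. An illustrative Python sketch (our own) for the sequence above:

```python
# Ratio vs root behavior for a_n = 1 (n even), 2 (n odd); illustration only.
a = [1 if n % 2 == 0 else 2 for n in range(1, 2001)]    # a_1, a_2, ..., a_2000
ratios = [a[i + 1] / a[i] for i in range(len(a) - 1)]   # values of |a_{n+1}/a_n|
roots = [a[i] ** (1 / (i + 1)) for i in range(len(a))]  # values of |a_n|^(1/n)
print(max(ratios), max(roots[1000:]))  # ratios hit 2 forever; roots approach 1
```

The ratios keep jumping between 1/2 and 2 no matter how far out we look, while every tail of the roots is squeezed into a shrinking band around 1 — a concrete picture of the inequality between the two limsups.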

5.3 Non non-negative series

We saw in the last couple of lectures that when dealing with series with non-negative terms, we compare to the geometric series. This gave rise to the comparison test, the ratio test and the root test. For other series we have basically one tool: summation by parts. It comes into the following theorem.

Theorem 5.3.1 (Dirichlet test). Let (a_n) and (b_n) be real sequences such that, setting A_n = Σ_{j=0}^n a_j, we have (A_n) bounded. If (b_n) is monotonic with b_n → 0 then Σ a_n b_n converges.

Proof. Let’s suppose that (b_n) is monotone decreasing; the other case is similar. The idea is to get a different representation for Σ a_n b_n by setting A_n = Σ_{j=0}^n a_j for n ≥ 0 and A_{−1} = 0. Now

Σ_{n=0}^N a_n b_n = Σ_{n=0}^N (A_n − A_{n−1}) b_n = Σ_{n=0}^N A_n b_n − Σ_{n=0}^N A_{n−1} b_n
                 = Σ_{n=0}^N A_n b_n − Σ_{n=0}^{N−1} A_n b_{n+1}
                 = Σ_{n=0}^{N−1} A_n (b_n − b_{n+1}) + A_N b_N .

Now since (A_n) is bounded and b_n → 0 we have A_n b_n → 0. We can show this as follows. Suppose that |A_n| ≤ M for all n (with M > 0) and let ε > 0. Choose N_0 such that n ≥ N_0 implies that |b_n| < ε/M. Then for n ≥ N_0, |A_n b_n| < Mε/M = ε.

Since A_N b_N → 0, the above representation gives that Σ a_n b_n converges if and only if Σ A_n (b_n − b_{n+1}) converges. But now we use the comparison test: |A_n (b_n − b_{n+1})| ≤ M |b_n − b_{n+1}| = M(b_n − b_{n+1}), where we have used monotonicity to get b_n − b_{n+1} ≥ 0. But

Σ_{n=0}^N M(b_n − b_{n+1}) = M Σ_{n=0}^N (b_n − b_{n+1}) = M(b_0 − b_{N+1}) → M b_0 ,

so Σ A_n (b_n − b_{n+1}) converges by comparison, completing the proof.

Note in the previous proof that we used a technique similar to integration by parts. Recall from calculus that the integral ∫_a^b u(x) v(x) dx can be written as

∫_a^b u(x) v(x) dx = U(b) v(b) − U(a) v(a) − ∫_a^b U(x) v′(x) dx ,

where U is an antiderivative of u. Here we are thinking of A_n as the “antiderivative” of a_n and b_n − b_{n+1} as the “derivative” of b_n. In the sum case above, we only have one boundary term, because the other corresponds to A_{−1} b_0 = 0.
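The summation-by-parts identity at the heart of the proof can be verified numerically for any pair of finite sequences. An illustrative Python sketch (the test sequences cos(n) and 1/(n + 1) are arbitrary choices of ours):

```python
# Checking the summation-by-parts identity from the proof of Theorem 5.3.1.
import math

N = 100
a = [math.cos(n) for n in range(N + 1)]      # arbitrary test sequence a_0..a_N
b = [1 / (n + 1) for n in range(N + 1)]      # decreasing b_n tending to 0
A = [sum(a[: n + 1]) for n in range(N + 1)]  # A_n = a_0 + ... + a_n
lhs = sum(a[n] * b[n] for n in range(N + 1))
rhs = sum(A[n] * (b[n] - b[n + 1]) for n in range(N)) + A[N] * b[N]
print(lhs, rhs)                              # the two sides agree
```

Note how the code mirrors the proof: the left side pairs a_n with b_n, while the right side pairs the “antiderivative” A_n with the “derivative” b_n − b_{n+1}, plus the single boundary term A_N b_N.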

Examples

1. Alternating series test. Let (a_n) be a monotone non-increasing sequence converging to 0. Then Σ (−1)^n a_n converges. This is obtained by applying the Dirichlet test, noting that the partial sums of Σ (−1)^n are bounded by 1. As an example, we have

   Σ (−1)^n/n converges although Σ 1/n does not .

2. For n ∈ N, let f(n) be the largest value of k such that 2^k ≤ n (this is the integer part of log_2 n). Then

   Σ (−1)^n/f(n) converges .

3. A series Σ a_n is said to converge absolutely if Σ |a_n| converges. If it does not converge absolutely but does converge then we say Σ a_n converges conditionally. It is a famous theorem of Riemann that given any L ∈ R and a conditionally convergent series Σ a_n, there is a rearrangement (b_n) of the terms of (a_n) such that Σ b_n = L. See the last section of Rudin, Chapter 3 for more details.
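Riemann’s theorem can be illustrated by the standard greedy procedure behind its proof: add positive terms until the partial sum exceeds the target L, then add negative terms until it drops below, and repeat. A Python sketch of this idea for the alternating harmonic series (the target L = 1 and the cutoffs are our own choices):

```python
# Greedy rearrangement of the conditionally convergent alternating harmonic
# series, steering partial sums toward a target L (sketch of Riemann's idea).
L = 1.0                                            # target value; our choice
pos = iter(1 / n for n in range(1, 10 ** 6, 2))    # positive terms 1, 1/3, 1/5, ...
neg = iter(-1 / n for n in range(2, 10 ** 6, 2))   # negative terms -1/2, -1/4, ...
s = 0.0
for _ in range(10 ** 5):
    s += next(pos) if s <= L else next(neg)        # overshoot, then undershoot
print(s)   # the rearranged partial sums hover around L
```

The procedure never runs out of terms because both Σ 1/(2n − 1) and Σ 1/(2n) diverge — the very fact that makes conditional convergence so fragile.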

5.4 Exercises

1. Determine if the following series converge.

   (a) Σ_{n=1}^∞ (n − 3)/(n^2 − 6n + 10)

   (b) Σ_{n=1}^∞ n!/(1 · 3 ⋯ (2n − 1))

2. Prove that the following series converges for all x ∈ R:

   Σ_{n=1}^∞ sin(nx)/√n .

   Hint. Use Theorem 3.42, after multiplying by sin(x/2). Use the following identity, which is valid for all a, b ∈ R:

   sin a sin b = (1/2)(cos(a − b) − cos(a + b)) .

   Although we have not defined sin x you can use the fact that |sin x| and |cos x| are both bounded by 1.

3. Because Σ 1/n diverges, any series Σ x_n with x_n ≥ 1/n for all n must diverge, by the comparison test. We might think then that if Σ x_n converges and x_n ≥ 0 for all n then x_n is smaller than 1/n, in the sense that

   lim_{n→∞} n x_n = 0 .

   (a) Show that this is false; that is, there exist convergent series Σ x_n with x_n ≥ 0 for all n such that {n x_n} does not converge to 0.

   (b) Show however that if {x_n} is monotone non-increasing and non-negative with Σ x_n convergent, then n x_n → 0.

       Hint. Use Theorems 3.23 and 3.27.

4. Here we give a different proof of the alternating series test. Let {x_n} be a real sequence that is monotonically non-increasing and x_n → 0.

   (a) For n ∈ N, let A_n = Σ_{i=1}^{2n} (−1)^i x_i and B_n = Σ_{i=1}^{2n−1} (−1)^i x_i. Show that {A_n} and {B_n} converge.

   (b) Prove that A_n − B_n → 0 and use this to show that Σ (−1)^n x_n converges.

5. Suppose that (a_n) and (b_n) are real sequences such that b_n > 0 for all n and Σ a_n converges. If

   lim_{n→∞} a_n/b_n = L ≠ 0

   then must Σ b_n converge? (Here L is a finite number.)


6. (a) For 0 ≤ k ≤ n, recall the definition of the binomial coefficient

       C(n, k) = n!/(k!(n − k)!) .

       For n ≥ 0, let a_n = C(2n, n). Find lim_{n→∞} a_{n+1}/a_n.

   (b) Find R, the radius of convergence of Σ_{n=0}^∞ C(2n, n) x^n. (Fun fact: this actually equals 1/√(1 − 4x) when it converges; it is the Taylor series of this function.)

   (c) Show that if b_1, b_2, . . . , b_m are non-negative real numbers then

       (1 + b_1)(1 + b_2) ⋯ (1 + b_m) ≥ 1 + b_1 + ⋯ + b_m .

       Write c_n = a_n/4^n and

       c_0/c_n = (c_{n−1}/c_n)(c_{n−2}/c_{n−1}) ⋯ (c_1/c_2)(c_0/c_1) .

       Use this inequality to show that C(2n, n)/4^n → 0.

   (d) Show that the sum Σ C(2n, n) x^n converges when x = −R. (Fun fact we will get to later: it converges to the value of 1/√(1 − 4x) at x = −R, which is 1/√2.)

   (e) What can you say about convergence when x = R?


6 Function limits and continuity

6.1 Function limits

So far we have only talked about limits for sequences. Now we step it up to functions. Pretty quickly, though, we will see that we can relate function limits to sequence limits. The point at which we consider the limit does not even need to be in the domain of the function f. It is important also to notice that x is not allowed to equal x_0 below.

Definition 6.1.1. Let (X, d_X) and (Y, d_Y) be metric spaces and E ⊂ X with x_0 a limit point of E. If f : E → Y then we write

lim_{x→x_0} f(x) = L

if for each ε > 0 there exists δ > 0 such that whenever x ∈ E and 0 < d_X(x, x_0) < δ, it follows that d_Y(f(x), L) < ε.

Why is it important that x_0 be allowed to be only a limit point of E (and therefore not necessarily in E)? Consider f : (0, ∞) → R defined by f(x) = (sin x)/x. Then f is not defined at 0 but we know from calculus that it has a limit of 1 as x → 0.

Here we are using ε as a measure of “closeness” just as before. We can imagine a dialogue similar to what occurred for sequences: you say that f(x) approaches L as x approaches x_0. I say, “Well, can you get f(x) within .005 of L as long as x is close to x_0?” You say yes and produce a value of δ = .01. You then qualify this by saying, “As long as x is within δ = .01 of x_0, then f(x) will be within .005 of L.” This goes on and on, and if each time I give you an ε > 0 you manage to produce a corresponding δ > 0, then we say the limit equals L.

As promised, there is an equivalent formulation of limits using sequences. Note that the statement below must hold for all sequences (x_n) in E with x_n → x_0 but x_n ≠ x_0 for all n.

Proposition 6.1.2. Let f : E → Y and x_0 a limit point of E. We have lim_{x→x_0} f(x) = L if and only if for each sequence (x_n) in E such that x_n → x_0 with x_n ≠ x_0 for all n, it follows that f(x_n) → L.

Proof. Suppose first that lim_{x→x_0} f(x) = L and let (x_n) be a sequence in E such that x_n → x_0 and x_n ≠ x_0 for all n. We must show that f(x_n) → L. So, let ε > 0 and choose δ > 0 such that whenever x ∈ E and 0 < d_X(x, x_0) < δ, we have d_Y(f(x), L) < ε. Now since x_n → x_0 we can pick N ∈ N such that if n ≥ N then d_X(x_n, x_0) < δ. For this N, if n ≥ N then

0 < d_X(x_n, x_0) < δ, so d_Y(f(x_n), L) < ε ,

and it follows that f(x_n) → L.

Suppose conversely that f(x_n) → L for all sequences (x_n) in E such that x_n → x_0 and x_n ≠ x_0 for all n. By way of contradiction, assume that lim_{x→x_0} f(x) = L does not hold. So there must be at least one ε > 0 such that for any δ > 0 we try to find, there is always an x_δ ∈ E with 0 < d_X(x_δ, x_0) < δ but d_Y(f(x_δ), L) ≥ ε. So create a sequence of these, using δ = 1/n. In other words, for each n ∈ N, pick x_n ∈ E \ {x_0} such that 0 < d_X(x_n, x_0) < 1/n but d_Y(f(x_n), L) ≥ ε. (This is possible in part because x_0 is a limit point of E.) Then clearly x_n → x_0 with x_n ≠ x_0 for all n but we cannot have f(x_n) → L. This is a contradiction.


One nice thing about the sequence formulation is that it allows us to immediately bring over theorems about convergence for sequences. For instance,

Proposition 6.1.3. Let E ⊂ R and x_0 a limit point of E. Let f, g : E → R (with the standard metric) and a ∈ R, and suppose that lim_{x→x_0} f(x) = L and lim_{x→x_0} g(x) = M exist. Then

1. lim_{x→x_0} a f(x) = aL.

2. lim_{x→x_0} (f(x) + g(x)) = L + M.

3. lim_{x→x_0} f(x) g(x) = LM.

4. If M ≠ 0 then

   lim_{x→x_0} f(x)/g(x) = L/M .

6.2 Continuity

We now give the definition of continuity. Note that x_0 must be an element of E, since f needs to be defined there.

Definition 6.2.1. If E ⊂ X and f : E → Y with x_0 ∈ E then we say f is continuous at x_0 if

lim_{x→x_0} f(x) = f(x_0) .

We say f is continuous on E if f is continuous at every x_0 ∈ E.

Here is the equivalent definition in terms of δ, ε. The function f is continuous at x_0 if for each ε > 0 there is δ > 0 such that if x ∈ E satisfies d_X(x, x_0) < δ then d_Y(f(x), f(x_0)) < ε. Note here that we are not restricting to 0 < d_X(x, x_0), since we trivially have d_Y(f(x_0), f(x_0)) < ε for all ε > 0. This caveat (or lack thereof) carries over to the corollary:

Corollary 6.2.2. The function f is continuous at x_0 ∈ E if and only if for each sequence (x_n) in E with x_n → x_0 we have f(x_n) → f(x_0).

Proof. This is just a consequence of the sequence theorem from the last section.

There is yet another equivalent definition in terms of only open sets. This one is valid for functions continuous on all of X (there is a more technical one for continuity at a point, but we will not get into that). To extend the theorem to functions that are continuous on subsets E of X, one would need to talk about sets that are open in E.

Theorem 6.2.3. If f : X → Y then f is continuous on X if and only if for each open set O ⊂ Y, the preimage

f^(−1)(O) = {x ∈ X : f(x) ∈ O}

is open in X.


Proof. Suppose that f is continuous on X and let O ⊂ Y be open. We want to show that f^(−1)(O) is open. So choose x_0 ∈ f^(−1)(O). Since f(x_0) ∈ O (by definition) and O is open, we can find ε > 0 such that B_ε(f(x_0)) ⊂ O. However f is continuous at x_0, so there exists a corresponding δ > 0 such that if x ∈ X with d_X(x, x_0) < δ then d_Y(f(x), f(x_0)) < ε. So if x ∈ B_δ(x_0) then f(x) ∈ B_ε(f(x_0)). As B_ε(f(x_0)) was chosen to be a subset of O, we find

if x ∈ B_δ(x_0) then f(x) ∈ O ,

or B_δ(x_0) ⊂ f^(−1)(O). This means x_0 is an interior point of f^(−1)(O) and this set is open.

Suppose now that for each open O ⊂ Y the set f^(−1)(O) is open in X. To show f is continuous on X we must show that f is continuous at each x_0 ∈ X. So let x_0 ∈ X and ε > 0. The set B_ε(f(x_0)) is open in Y, so f^(−1)(B_ε(f(x_0))) is open in X. Because x_0 is an element of this set (note that f(x_0) ∈ B_ε(f(x_0))) it must be an interior point, so there is a δ > 0 such that B_δ(x_0) ⊂ f^(−1)(B_ε(f(x_0))). Now if d_X(x, x_0) < δ then d_Y(f(x), f(x_0)) < ε, so f is continuous at x_0.

It is difficult to get intuition about this definition, but let us give an example to illustrate how it may work. Consider the function f : R → R given by

f(x) = 1 if x = 0, 0 if x ≠ 0 .

We know from calculus that f is not continuous because it is not continuous at 0:

lim_{x→0} f(x) = 0 ≠ 1 = f(0) .

To see this in terms of the other definition, look at the open set (1/2, 3/2). Then

f^(−1)((1/2, 3/2)) = {0} ,

which is not open. This only proves, however, that f is not continuous everywhere.

Corollary 6.2.4. f is continuous on X if and only if for each closed C ⊂ Y, the set f^(−1)(C) is closed in X.

Proof. f is continuous on X if and only if for each open O ⊂ Y, the set f^(−1)(O) is open in X. If C ⊂ Y is closed then C^c is open in Y. Therefore

f^(−1)(C) = (f^(−1)(C^c))^c is closed in X .

To check this equality, we have x ∈ f^(−1)(C) iff f(x) ∈ C iff f(x) ∉ C^c iff x ∈ (f^(−1)(C^c))^c.

The other direction is similar. If f^(−1)(C) is closed in X whenever C is closed in Y, let O be an open set in Y. Then f^(−1)(O^c) is closed in X, giving

f^(−1)(O) = (f^(−1)(O^c))^c open in X .


Examples.

1. The simplest. Take f : X → X as f(x) = x. Then for each open O ⊂ X, f^(−1)(O) = O is open in X. So f is continuous on X.

2. Let f : R → R be

   f(x) = 1 if x ∈ Q, 0 if x ∉ Q .

   This function is continuous nowhere. If x ∈ R then suppose first x is rational. Choose a sequence of irrationals (x_n) converging to x (this is possible by the fact that R \ Q is dense in R, from the homework). Then lim_{n→∞} f(x_n) = 0 ≠ 1 = f(x). A similar argument holds for irrational x and gives that f is continuous nowhere.

   Note that this conclusion cannot be obtained by showing that some open set O has f^(−1)(O) not open. (Take for instance O = (−1/2, 1/2).) That would prove only that f is not continuous everywhere.

3. The last function was discontinuous at the rationals and irrationals. This one is a nasty function that will be discontinuous only at the rationals. For any q ∈ Q write q ∼ (m, n) if m/n is the “lowest terms” representation of q; that is, if m, n are the unique numbers with m ∈ Z, n ∈ N and m, n have no common prime factors. Then define f : R → R by

   f(x) = 1/n if x ∈ Q and x ∼ (m, n) for some m ∈ Z, 0 if x ∉ Q .

   It is not hard to see that f is discontinuous at rationals. Indeed, if x is rational then f(x) > 0 but we can choose a sequence of irrationals (x_n) such that x_n → x, giving 0 = lim_{n→∞} f(x_n) ≠ f(x).

   On the other hand it is a bit more difficult to show that f is continuous at the irrationals. Let x ∈ R \ Q and let ε > 0. Choose N ∈ N such that 1/N < ε. Consider the rational numbers in the interval (x − 1, x + 1). There are only finitely many rationals in this interval that are represented as (m, 1) for some m. There are also only finitely many in this interval represented as (m, 2) for some m. Continuing, the set X_N of rational numbers in this interval that are represented as (m, n) for some n ≤ N is finite and is therefore a closed set. Since x ∈ X_N^c we can then find δ > 0 such that B_δ(x) ⊂ X_N^c; shrinking δ if necessary, we may also assume δ ≤ 1, so that B_δ(x) ⊂ (x − 1, x + 1).

   We claim that if |y − x| < δ then |f(x) − f(y)| < ε. To prove this, consider first y irrational. Then |f(x) − f(y)| = |0 − 0| = 0 < ε. Next, if y is rational in B_δ(x) then y ∈ X_N^c, and so the representation of y is (m, n) with n > N. It follows that

   |f(x) − f(y)| = |0 − f(y)| = f(y) = 1/n ≤ 1/N < ε .

4. The last function was discontinuous exactly at the rationals. We will see in the homework that there is no function that is discontinuous exactly at the irrationals.
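The function in example 3 is often called Thomae’s function. Its behavior near an irrational point can be illustrated numerically: along rationals approaching √2, the lowest-terms denominators blow up, so the function values shrink toward 0. A Python sketch (our own; Python’s Fraction type stores rationals in lowest terms automatically):

```python
# Example 3's function ("Thomae's function") along rationals approaching sqrt(2).
from fractions import Fraction
from math import isqrt

def f(q: Fraction) -> Fraction:
    # value 1/n where q = m/n in lowest terms; Fraction reduces automatically
    return Fraction(1, q.denominator)

vals = []
for k in (1, 3, 6):
    q = Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)  # decimal truncation of sqrt(2)
    vals.append(float(f(q)))
    print(q, vals[-1])   # f(q) shrinks as q approaches the irrational sqrt(2)
```

This is exactly the mechanism in the proof: near an irrational, any rational with a small denominator is excluded by the finite set X_N, so only large denominators (and hence small values of f) survive.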


We saw last time that f : R → R given by f(x) = x is continuous everywhere. We will use this along with the following proposition to show that polynomials are also continuous.

Proposition 6.2.5. Let X be a metric space and f, g : X → R be continuous at x_0 ∈ X. If a ∈ R then the following functions are continuous at x_0:

1. f + g, af, and fg,

2. f/g as long as g(x_0) ≠ 0.

Proof. These follow using the limit properties from before; for example,

lim_{x→x_0} (f + g)(x) = lim_{x→x_0} f(x) + lim_{x→x_0} g(x) = f(x_0) + g(x_0) = (f + g)(x_0) ,

giving that f + g is continuous at x_0. The others follow similarly.

The proposition implies:

• Any polynomial function is continuous on all of R. That is, if f(x) = a_n x^n + ··· + a_1 x + a_0 then f is continuous on R.

• Every rational function is continuous at each point for which the denominator is nonzero. That is, if f(x) = g(x)/h(x), where g and h are polynomial functions, then f is continuous at x_0 if and only if h(x_0) ≠ 0.

Another way to build continuous functions is through composition.

Theorem 6.2.6. Let X, Y, Z be metric spaces and f : X → Y , g : Y → Z functions. If f is continuous at x_0 ∈ X and g is continuous at f(x_0) ∈ Y then g ◦ f is continuous at x_0.

Proof. Let ε > 0. Since g is continuous at f(x_0), we can choose δ′ > 0 such that if d_Y(y, f(x_0)) < δ′ then d_Z(g(y), g(f(x_0))) < ε. Since f is continuous at x_0, we can choose δ > 0 such that if d_X(x, x_0) < δ then d_Y(f(x), f(x_0)) < δ′. Putting these together, if d_X(x, x_0) < δ then d_Y(f(x), f(x_0)) < δ′, giving d_Z(g(f(x)), g(f(x_0))) < ε. This means d_Z((g ◦ f)(x), (g ◦ f)(x_0)) < ε and g ◦ f is continuous at x_0.

6.3 Relations between continuity and compactness

Continuous functions and compact sets work well together. The ﬁrst basic theorem is:

Theorem 6.3.1. If f : X → Y is continuous and E ⊂ X is compact then the image f(E) = {f(x) : x ∈ E} is compact.

Proof. We need to show that any open cover of f(E) can be reduced to a finite subcover, so let C be an open cover of f(E). Define a collection C′ of sets in X by

C′ = {f^{-1}(O) : O ∈ C} .

Because f is continuous, each set in C′ is open. Furthermore, C′ covers E, as every point x ∈ E is mapped to an element of f(E), which is covered by some O ∈ C. This means that C′ is an open cover of E, and compactness of E allows us to reduce it to a finite subcover {f^{-1}(O_1), . . . , f^{-1}(O_n)}. We claim that {O_1, . . . , O_n} is a finite subcover of f(E). To show this, let y ∈ f(E), so that there exists some x ∈ E with f(x) = y. There exists k with 1 ≤ k ≤ n such that x ∈ f^{-1}(O_k) and therefore y = f(x) ∈ O_k.

This theorem has many consequences.

Corollary 6.3.2. Let f : X → Y be continuous. If E ⊂ X is compact then f is bounded on E. That is, there exist y ∈ Y and M > 0 such that d_Y(y, f(x)) ≤ M for all x ∈ E.

Proof. From the theorem, the set f(E) is compact and therefore bounded.

The next is for continuous functions to R.

Corollary 6.3.3 (Extreme value theorem). Let f : X → R be continuous and E ⊂ X compact. Then f takes a maximum on E; that is, there exists x_0 ∈ E such that

f(x_0) ≥ f(x) for all x ∈ E .

A similar statement holds for a minimum.

Proof. The set f(E) is closed and bounded, so it contains its supremum, y. Since y ∈ f(E) there exists x_0 ∈ E such that f(x_0) = y. Then f(x_0) ≥ f(x) for all x ∈ E.

Continuous functions on compact sets actually satisfy a property that is stronger than continuity. To explain this, consider the function f : (0, ∞) → R given by f(x) = 1/x. When we study continuity, what we are really interested in is how much a small change in x will change the value of f(x). (Recall continuity says that if we change x by at most δ then f(x) will change by at most ε.) Consider the effect of changing x by a fixed amount, say .1, for different values of x. If x is large, like 100, then changing x by .1 can change f so that it lies anywhere in the interval (1/100.1, 1/99.9). If x is small, like .15, then this same change in x changes f to lie in the interval (1/.25, 1/.05) = (4, 20). This is a much larger interval, meaning that f is more unstable to changes when x is small compared to when x is large.

This motivates the idea of uniform continuity. For a uniformly continuous function, the measure of stability described above is uniform on the whole set. That is, there is an upper bound to how unstable the function is to changes. This corresponds to a uniform δ > 0 over all x for a given ε > 0:

Definition 6.3.4. A function f : X → Y is uniformly continuous if given ε > 0 there exists δ > 0 such that if x, y ∈ X satisfy d_X(x, y) < δ then d_Y(f(x), f(y)) < ε.
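The instability discussion above can be checked numerically. This is a sketch, not part of the notes: for f(x) = 1/x on (0, ∞), the same input change Δ = .1 produces a tiny output change near x = 100 but a huge one near x = .15, so no single δ can serve every point.

```python
def osc(f, x, dx):
    """Largest output change |f(y) - f(x)| over y in {x - dx, x + dx}."""
    return max(abs(f(x - dx) - f(x)), abs(f(x + dx) - f(x)))

f = lambda x: 1.0 / x
big = osc(f, 100.0, 0.1)   # output change near x = 100: about 1e-5
small = osc(f, 0.15, 0.1)  # output change near x = 0.15: order 10
```

The ratio of the two oscillations exceeds 10^5, which is exactly the failure of a uniform δ on (0, ∞).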

Theorem 6.3.5. If f : X → Y is continuous and X is compact then f is uniformly continuous.

Proof. The idea of the proof is as follows. Since f is continuous at each x, given ε > 0 we can find a δ_x > 0 from the definition of continuity that “works” at x. These δ_x-balls cover X and by compactness we can find finitely many δ_{x_i}'s such that these balls still cover X. Taking the minimum of these numbers will give us the required (positive) δ.

Let ε > 0. For each x ∈ X, since f is continuous at x, we can find δ_x > 0 such that if d_X(x, x′) < δ_x then d_Y(f(x), f(x′)) < ε/2. The collection

{B_{δ_x/2}(x) : x ∈ X}

is an open cover for X, so since X is compact, we can find x_1, . . . , x_n ∈ X such that X ⊂ ∪_i B_{δ_{x_i}/2}(x_i). Let

δ = min{δ_{x_1}/2, . . . , δ_{x_n}/2} ;

we claim that if x, y ∈ X satisfy d_X(x, y) < δ then d_Y(f(x), f(y)) < ε. To prove this, pick such x and y. We can then find i such that d_X(x_i, x) < δ_{x_i}/2. By the triangle inequality we then have

d_X(x_i, y) ≤ d_X(x_i, x) + d_X(x, y) < δ_{x_i}/2 + δ ≤ δ_{x_i} .

This means by definition of δ_{x_i} that

d_Y(f(x), f(y)) ≤ d_Y(f(x), f(x_i)) + d_Y(f(y), f(x_i)) < ε/2 + ε/2 = ε .

Examples.

1. Not every continuous function on a non-compact set is uniformly continuous. If E is any non-closed subset of R then there exists a continuous function on E that is both unbounded and not uniformly continuous. Take x_0 to be any limit point of E that is not in E. Then f(x) = (x − x_0)^{-1} is continuous but unbounded. Further, f is not uniformly continuous because there is no δ > 0 such that for all x, y ∈ E with |x − y| < δ we have |f(x) − f(y)| < 1. If there were, we could just choose some y ∈ E with |y − x_0| < δ/2 and then deduce that all points z ∈ E within distance δ of y have f(z) ≤ f(y) + 1. But this is impossible, since f is unbounded on the part of E near x_0.

2. If E is an unbounded subset of R then there is an unbounded continuous function on E: just take f(x) = x.

3. The only polynomials that are uniformly continuous on all of R are those of degree at most 1. Indeed, take f(x) = a_n x^n + ··· + a_1 x + a_0 with a_n ≠ 0 and n ≥ 2, and assume that there exists δ > 0 such that if |x − y| < δ then |f(x) − f(y)| < 1. Then consider points of the form x, x + δ/2: you can check that

{|f(x) − f(x + δ/2)| : x ∈ R}

is unbounded, giving a contradiction. (The problem here is that for fixed δ, the quantity |f(x) − f(x + δ/2)| grows to infinity as x → ∞. This is not the case if n = 0 or 1.)

There are other ways that functions can fail to be uniformly continuous. We will see later,

however, that any diﬀerentiable function with bounded derivative is uniformly continuous.


6.4 Connectedness and the IVT

We would like to prove the intermediate value theorem from calculus and the simplest way to

do this is to see that it is a consequence of a certain property of intervals in R. Speciﬁcally,

an interval is connected. The deﬁnition of connectedness is somewhat strange so we will try

to motivate it. Instead of trying to envision what connectedness is, we will try to capture

what it is not. That is, we want to call a metric space disconnected if we can write it as a

union of two sets that do not intersect. There is a problem with this attempt at a deﬁnition,

as we can see by considering R. Certainly we can write it as (−∞, 1/2) ∪[1/2, ∞) and these

sets do not intersect, but we still want to say that R is connected. The issue in this example

is that the sets are not separated enough from each other. That is, one set contains limit

points of the other. This problem is actually resolved if we require that both sets are open.

(But you have to think about how this resolves the issue.)

Definition 6.4.1. A metric space X is disconnected if there exist non-empty open sets O_1 and O_2 in X such that X = O_1 ∪ O_2 but O_1 ∩ O_2 = ∅. If X is not disconnected we say it is connected.

Connectedness and continuity also go well with each other.

Theorem 6.4.2. Let X, Y be metric spaces and f : X →Y be continuous. If X is connected

then the image set f(X), viewed as a metric space itself, is connected.

Proof. As stated above, we view f(X) ⊂ Y as a metric space itself, using the metric it inherits from Y . To show that f(X) is a connected space we will assume it is disconnected and obtain a contradiction. So assume that we can write f(X) = O_1 ∪ O_2 with O_1 and O_2 nonempty, disjoint, and open (in the space f(X)). We will produce from this a disconnection of X and obtain a contradiction.

Now consider U_1 = f^{-1}(O_1) and U_2 = f^{-1}(O_2). These are open sets in X since f is continuous. Further, they do not intersect: if x is in their intersection, then f(x) ∈ O_1 ∩ O_2, which is empty. Last, they are nonempty because, for example, if y ∈ O_1 (which is nonempty by assumption) then because O_1 ⊂ f(X), there exists x ∈ X such that f(x) = y. This x is in f^{-1}(O_1). Also X = U_1 ∪ U_2, since every f(x) lies in O_1 or in O_2.

So we find that X is disconnected, a contradiction. This means f(X) must have been connected.

• Let X be a discrete metric space. If X consists of at least two points then X is disconnected. This is because we can let O_1 = {x} for some x ∈ X and O_2 = O_1^c. All subsets of X are open, so these are open, disjoint, nonempty sets whose union is X.

• Every interval in R is connected. You will prove this in exercise 7.

Theorem 6.4.3 (Intermediate value theorem). Let f : [a, b] → R for a < b be continuous. Suppose that for some L ∈ R,

f(a) < L < f(b) .

Then there exists c ∈ (a, b) such that f(c) = L.

Proof. Since f is continuous, the space f([a, b]) is connected. Since

O_1 := (−∞, L) ∩ f([a, b]) and O_2 := (L, ∞) ∩ f([a, b])

are both nonempty (because f(a) ∈ O_1 and f(b) ∈ O_2), open in f([a, b]), and disjoint, it cannot be that their union is equal to f([a, b]). Therefore L ∈ f([a, b]) and there exists c ∈ [a, b] with f(c) = L. Since f(a) < L < f(b), c cannot equal a or b, so c ∈ (a, b).
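The intermediate value theorem underlies the bisection method: repeatedly halving the interval while keeping f < L at the left endpoint and f > L at the right endpoint traps a point c with f(c) = L. A minimal sketch, not from the notes (the example function and tolerance are illustrative choices):

```python
def bisect(f, a, b, L, tol=1e-12):
    """Assumes f is continuous with f(a) < L < f(b); returns c with f(c) close to L."""
    assert f(a) < L < f(b)
    while b - a > tol:
        m = (a + b) / 2
        if f(m) < L:
            a = m  # keep the invariant f(a) < L
        else:
            b = m  # keep the invariant f(b) >= L
    return (a + b) / 2

# Example: f(x) = x**3 on [0, 2] with L = 5; the solution is 5 ** (1/3).
c = bisect(lambda x: x ** 3, 0.0, 2.0, 5.0)
```

Each pass halves the interval, so the loop terminates after about log2((b − a)/tol) steps; continuity is exactly what guarantees the trapped point satisfies f(c) = L.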

6.5 Discontinuities

Let us spend a couple of minutes on types of discontinuities for real functions. Let E ⊂ R, f : E → R and x_0 ∈ E. (Draw some pictures.)

• x_0 is a removable discontinuity of f if lim_{x→x_0} f(x) exists but is not equal to f(x_0).

• x_0 is a simple discontinuity of f if lim_{x→x_0^−} f(x) exists, as does lim_{x→x_0^+} f(x), but they are not equal. Here the first limit is a left limit; that is, we are considering f as being defined on the metric space E ∩ (−∞, x_0] and taking the limit in this space. The second is a right limit, and we consider the space as E ∩ [x_0, ∞). This corresponds to saying, for example, that

lim_{x→x_0^−} f(x) = L

if for each ε > 0 there exists δ > 0 such that if x_0 − δ < x < x_0 then |f(x) − L| < ε.

• x_0 can be a discontinuity that is not captured above. Consider f : R → R given by

f(x) = sin(1/x) if x ≠ 0, and f(0) = 0 .

Here there is not even a limit as x → 0. This is because we can find a sequence (x_n) converging to 0 such that (f(x_n)) does not have a limit. Take

x_n = 2/(nπ) .
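The failure of the limit can be seen directly; the following is a numerical sketch, not part of the original notes. Along x_n = 2/(nπ) the values f(x_n) = sin(nπ/2) cycle through 1, 0, −1, 0 forever, so (f(x_n)) has no limit even though x_n → 0.

```python
from math import sin, pi

def f(x):
    """The oscillating function from the notes: sin(1/x) away from 0."""
    return sin(1.0 / x) if x != 0 else 0.0

xs = [2.0 / (n * pi) for n in range(1, 13)]   # x_n -> 0 as n grows
vals = [round(f(x)) for x in xs]              # sin(n*pi/2), rounded past float error
```

The rounded values repeat the 4-cycle 1, 0, −1, 0, so no subsequence tail can settle on a single value.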

6.6 Exercises

1. Let f : [a, b] → R be continuous with f(x) > 0 for all x ∈ [a, b]. Show there exists

δ > 0 such that f(x) ≥ δ for all x ∈ [a, b].

2. Determine if the following functions are continuous at x = 0. Prove your answer. (You

may use standard facts about trigonometric functions although we have not introduced

them rigorously.)

(a)

f(x) = x cos(1/x) if x ≠ 0, and f(0) = 0 .

(b)

g(x) = sin(1/x) if x ≠ 0, and g(0) = 0 .

3. (a) Let f, g : R → R be continuous. Show that h : R → R is continuous, where h is given by

h(x) = max{f(x), g(x)} .

(b) Let C be a set of continuous functions from R to R. For each x, assume that {f(x) : f ∈ C} is bounded above and define F : R → R by

F(x) = sup{f(x) : f ∈ C} .

Must F be continuous?

4. In this problem we will show that there is no real-valued function that is continuous exactly at the rationals. Fix any f : R → R.

(a) Show that for each n ∈ N, the set A_n is open, where

A_n = { x : ∃δ > 0 such that |f(z) − f(y)| < 1/n for all y, z ∈ (x − δ, x + δ) } .

(b) Prove that the set of points at which f is continuous is equal to ∩_{n∈N} A_n.

(c) Prove that ∩_{n∈N} A_n cannot equal Q.

Hint. Argue by contradiction and enumerate the rationals as {q_1, q_2, . . .}. Define B_n = A_n \ {q_n} and obtain a contradiction using exercise 5 of Chapter 3.

5. Find metric spaces X, Y , a continuous function f : X → Y , and a Cauchy sequence {x_n} in X such that {f(x_n)} is not Cauchy in Y .

6. Read the last section of Chapter 4 in Rudin on limits at infinity. Prove that the function f : (0, ∞) → R given by f(x) = 1/x has lim_{x→∞} f(x) = 0.

7. Prove that any interval I ⊂ R is connected.

Hint. Consider I as a metric space with the standard metric from R. Suppose that I = O_1 ∪ O_2 where O_1 ∩ O_2 = ∅ and both O_i's are nonempty and open in I. Then there must be a point x_1 ∈ O_1 and a point x_2 ∈ O_2. Suppose that x_1 < x_2 and define

I_1 = {r ≥ x_1 : [x_1, r] ⊂ O_1} .

What can you say about sup I_1?

8. Let f : (a, b) → R be uniformly continuous. Prove that f has a unique continuous

extension to [a, b). That is, there is a unique g : [a, b) → R which is continuous and

agrees with f everywhere on (a, b). Show by example that it is not enough to assume

f is only continuous, or even both continuous and bounded.


9. Show that the function f given by f(x) = 1/x is uniformly continuous on [1, ∞).

10. Show that the function f given by f(x) = √x is uniformly continuous on [0, ∞).

Hint. Use the fact that for a, b ≥ 0 we have √a + √b ≥ √(a + b).

11. Show that the function f given by f(x) = sin(1/x) is not uniformly continuous on

(0, 1).

12. Suppose that f : [0, ∞) → R is continuous and has a finite limit lim_{x→∞} f(x). Show that f is uniformly continuous.

13. Give an example of functions f, g : [0, ∞) →R that are uniformly continuous but the

product fg is not.

14. Let f : R →R be continuous with f(f(x)) = x for all x. Show there exists c ∈ R such

that f(c) = c.

15. Let p be a polynomial with real coefficients and odd degree. That is,

p(x) = a_n x^n + ··· + a_1 x + a_0 with a_n ≠ 0 and n odd .

(a) Show there exists c such that p(c) = 0.

(b) Let L ∈ R. Show there exists c such that p(c) = L.

16. If E ⊂ R then a function f : E → R is called Lipschitz if there exists M > 0 such that

|f(x) − f(y)| ≤ M|x − y| for all x, y ∈ E .

The smallest number such that the above inequality holds for all x, y ∈ E is called the Lipschitz constant for f.

(a) Show that if f : E → R is Lipschitz then it is uniformly continuous. Does the converse hold?

(b) Show that the function f : R → R given by f(x) = √(x² + 4) is Lipschitz on R. What is the Lipschitz constant?

(c) Is f : [0, ∞) → R given by f(x) = √x Lipschitz?

17. Let I be a closed interval. Let f : I → I and assume that f is Lipschitz with Lipschitz constant A < 1.

(a) Prove that there is a unique y ∈ I with the following property. Choose x_1 ∈ I and define x_{n+1} = f(x_n) for all n ∈ N. Then x_n → y. This holds independently of the choice of x_1.

(b) Show by counterexample that for (a) to work, we need I to be closed.

(c) Choose a_1, a_2, . . . , a_k ∈ Q with a_i > 0 for all i and with a_1 a_k > 1. Starting from any x_1 > 0, define a sequence {x_n} by the continued fraction

x_n = 1/(a_1 + 1/(a_2 + 1/(··· + 1/(a_k + x_{n−1})))) .

Prove that {x_n} converges. Prove that its limit is the root of a quadratic polynomial with coefficients in Q. In older books this is stated: an infinite periodic continued fraction is a quadratic surd. “The devil is the eternal surd in the universal mathematic.” – C. S. Lewis, Perelandra.

18. Let f : I → R for some (open or closed) interval I ⊂ R. We say that f is convex if for all x, y ∈ I and λ ∈ [0, 1],

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) .

(a) Reformulate the above condition in terms of a relation between the graph of f and certain line segments.

(b) Suppose that f : R → R is convex and let x < z < y. Choose λ = (y − z)/(y − x) to show that

(f(z) − f(x))/(z − x) ≤ (f(y) − f(x))/(y − x) .

Interpret this inequality in terms of the graph of f. Argue similarly to show that

(f(y) − f(x))/(y − x) ≤ (f(y) − f(z))/(y − z) .

Combine these two to get

(f(z) − f(x))/(z − x) ≤ (f(y) − f(z))/(y − z)

and interpret this inequality in terms of the graph of f.

(c) Suppose that f : [a, b] → R is convex. Show that f is continuous on (a, b).

Hint. Let [c, d] be a subinterval of (a, b). Use the last inequality from (b) to show that f is Lipschitz on [c, d] with Lipschitz constant bounded above by

max{ |f(c) − f(a)|/|c − a| , |f(b) − f(d)|/|b − d| } .

19. Suppose that f : R → R is continuous and satisfies

f(x + y) = f(x) + f(y) for all x, y ∈ R .

(a) Show that there exists c ∈ R such that for all x ∈ Z, f(x) = cx.

(b) Show that there exists c ∈ R such that for all x ∈ Q, f(x) = cx.

(c) Show that there exists c ∈ R such that for all x ∈ R, f(x) = cx.
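As a numerical companion to exercise 17 above (a sketch under its hypotheses, not a solution): iterating a Lipschitz map with constant A < 1 on a closed interval converges to the same fixed point regardless of the starting value. The specific map below is an illustrative choice.

```python
def iterate(f, x, n):
    """Apply f to x a total of n times."""
    for _ in range(n):
        x = f(x)
    return x

# f(x) = x/2 + 1 maps [0, 4] into itself with Lipschitz constant A = 1/2;
# its unique fixed point is x = 2.
f = lambda x: x / 2 + 1
a = iterate(f, 0.0, 60)   # start at the left endpoint
b = iterate(f, 4.0, 60)   # start at the right endpoint
```

After n steps the distance to the fixed point shrinks by a factor A^n, so 60 iterations from either endpoint land within floating-point precision of 2.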


7 Derivatives

7.1 Introduction

Continuous functions are nicer than most functions. However we have seen that they can still

be rather weird (recall the function that equals 1/q at a rational expressed in lowest terms

as p/q). So we move on to study functions that are even nicer, and for this we henceforth

restrict to functions from R to R. We could start at the very bottom, ﬁrst studying constant

functions f(x) = c and then linear functions f(x) = ax+b, then quadratics, etc. But I trust

you learned about these functions earlier. Noting that constant functions are just special

cases of linear ones, we set out to study functions that are somehow close to linear functions.

The idea we will pursue is that even if a function f is wild, very close to a particular point x_0 it may be well represented by a linear function. For a good choice of a linear function L, it would make sense to hope that

lim_{x→x_0} (f(x) − L(x)) = 0 .

If f is already continuous then this is not much of a requirement: we just need L(x_0) = f(x_0). So this just means that L(x) can be written as L(x) = a(x − x_0) + f(x_0).

We will look for a stronger requirement on the speed at which this difference converges to zero. It should go to zero at least as fast as x − x_0 does (as x → x_0). In other words, we will require that

lim_{x→x_0} (f(x) − L(x))/(x − x_0) = 0 , or in shorthand, f(x) − L(x) = o(x − x_0) .

Plugging in our form of L, this means

lim_{x→x_0} [ (f(x) − f(x_0))/(x − x_0) − a ] = 0 ,

or

lim_{x→x_0} (f(x) − f(x_0))/(x − x_0) = a .

Rewriting this with the notation above, we get

f(x) = f(x_0) + a(x − x_0) + o(x − x_0) ,

or setting x = x_0 + h,

f(x_0 + h) = f(x_0) + ah + o(h)

as h → 0. Again, the symbol o(h) represents some term such that if we divide it by h and take h → 0, it goes to 0.

Definition 7.1.1. Let f : (a, b) → R. We say that f is differentiable at x_0 ∈ (a, b) if

lim_{x→x_0} (f(x) − f(x_0))/(x − x_0) exists .

In this case we write f′(x_0) for the limit.
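Definition 7.1.1 can be probed numerically; this is an illustration, not part of the notes. For f(x) = x³ at x_0 = 1 the difference quotients approach f′(1) = 3, and the error shrinks roughly linearly in x − x_0, in line with the o(x − x_0) discussion above.

```python
def diff_quotient(f, x0, h):
    """(f(x0 + h) - f(x0)) / h, the slope of a secant line near x0."""
    return (f(x0 + h) - f(x0)) / h

f = lambda x: x ** 3
quotients = [diff_quotient(f, 1.0, 10.0 ** -k) for k in range(1, 7)]
errors = [abs(q - 3.0) for q in quotients]
```

Expanding exactly, the quotient equals 3 + 3h + h², so the error is about 3h and decreases by a factor of 10 each time h does.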


7.2 Properties

Proposition 7.2.1. Let f : (a, b) → R be differentiable at x_0. Then f is continuous at x_0.

Proof.

lim_{x→x_0} f(x) = f(x_0) + lim_{x→x_0} [f(x) − f(x_0)] = f(x_0) + lim_{x→x_0} (f(x) − f(x_0))/(x − x_0) · lim_{x→x_0} (x − x_0) = f(x_0) .

The converse is not true. Consider the function f : R → R given by f(x) = |x|. Then

f(x) = max{x, 0} − min{x, 0} ,

so it is continuous. However, trying to compute the derivative at x = 0, we get

lim_{x→0} |x|/x ,

which does not exist (it has a right limit of 1 and left limit of −1).

We will now play the same game as we did for continuity, trying to find which functions are differentiable. Here are some examples.

1. f(x) = x:

lim_{x→x_0} (f(x) − f(x_0))/(x − x_0) = 1 .

So f′(x) exists for all x and equals 1.

2. f(x) = x^n for n ∈ N:

lim_{x→x_0} (x^n − x_0^n)/(x − x_0) = lim_{x→x_0} (x − x_0)(x^{n−1} + x^{n−2} x_0 + ··· + x x_0^{n−2} + x_0^{n−1})/(x − x_0) = lim_{x→x_0} [x^{n−1} + ··· + x_0^{n−1}] = n x_0^{n−1} .

So f′(x) exists for all x and equals n x^{n−1}.

Again we look at how to build differentiable functions from others.

Proposition 7.2.2. Let f, g : (a, b) → R be differentiable at x. Then the following functions are differentiable at x, with derivatives:

1. (f + g)′(x) = f′(x) + g′(x),

2. (fg)′(x) = f′(x)g(x) + f(x)g′(x),

3. (f/g)′(x) = (f′(x)g(x) − f(x)g′(x))/g²(x) if g(x) ≠ 0.


Proof. For the first we just use properties of limits:

lim_{y→x} ((f + g)(y) − (f + g)(x))/(y − x) = lim_{y→x} (f(y) − f(x))/(y − x) + lim_{y→x} (g(y) − g(x))/(y − x) = f′(x) + g′(x) .

For the second, we write

(fg)(y) − (fg)(x) = (f(y) − f(x))g(y) + f(x)(g(y) − g(x)) ,

divide by y − x and take a limit:

lim_{y→x} ((fg)(y) − (fg)(x))/(y − x) = lim_{y→x} (f(y) − f(x))/(y − x) · lim_{y→x} g(y) + f(x) · lim_{y→x} (g(y) − g(x))/(y − x) .

As g is differentiable at x, it is also continuous, so g(y) → g(x) as y → x. This gives the formula.

The last property can be derived in a similar fashion:

(f/g)(y) − (f/g)(x) = (1/(g(y)g(x))) [f(y)g(x) − f(x)g(y)] = (1/(g(y)g(x))) [g(x)(f(y) − f(x)) − f(x)(g(y) − g(x))] .

Dividing by y − x and taking the limit gives the result.

Again, from this proposition, we find that all polynomials are differentiable everywhere, as are rational functions wherever the denominator is nonzero. The next way to build differentiable functions is to compose:

Theorem 7.2.3 (Chain rule). Let f : (a, b) → (c, d) be differentiable at x_0 and g : (c, d) → R be differentiable at f(x_0). Then g ◦ f is differentiable at x_0 with derivative

(g ◦ f)′(x_0) = f′(x_0) g′(f(x_0)) .

Proof. We will want to use a division by f(y) − f(x_0) for y ≠ x_0, so we must first deal with the case that this could be 0. If there exists a sequence (x_n) in (a, b) with x_n → x_0 but x_n ≠ x_0 for all n, and with f(x_n) = f(x_0) for infinitely many n, we would have (passing to the subsequence along which f(x_n) = f(x_0))

f′(x_0) = lim_{y→x_0} (f(y) − f(x_0))/(y − x_0) = lim_{n→∞} (f(x_n) − f(x_0))/(x_n − x_0) = 0 ,

so the right side of the equation in the theorem would be 0. The left side would also be zero for a similar reason:

lim_{y→x_0} ((g ◦ f)(y) − (g ◦ f)(x_0))/(y − x_0) = lim_{n→∞} (g(f(x_n)) − g(f(x_0)))/(x_n − x_0) = 0 .

In the other case, every sequence (x_n) in (a, b) with x_n → x_0 and x_n ≠ x_0 has f(x_n) = f(x_0) for at most finitely many n. Then as f is continuous at x_0, we have f(x_n) → f(x_0) with f(x_n) ≠ f(x_0) for all large n, and so

lim_{y→x_0} ((g ◦ f)(y) − (g ◦ f)(x_0))/(y − x_0) = lim_{n→∞} (g(f(x_n)) − g(f(x_0)))/(x_n − x_0) = lim_{n→∞} (g(f(x_n)) − g(f(x_0)))/(f(x_n) − f(x_0)) · lim_{n→∞} (f(x_n) − f(x_0))/(x_n − x_0) = g′(f(x_0)) f′(x_0) .

Examples.

1. We know f(x) = |x| is continuous but not differentiable. To go one level deeper, consider

f(x) = x² if x ≥ 0, and f(x) = −x² if x < 0 .

The derivative at 0 is

lim_{h→0} f(0 + h)/h = 0 ,

and the derivative elsewhere is

f′(x) = 2x if x > 0, and f′(x) = −2x if x < 0 .

Note that f′ is continuous. Then we say f ∈ C¹ (or f is in class C¹). However the second derivative does not exist.

2. The function

f(x) = x³ if x ≥ 0, and f(x) = −x³ if x < 0

is in class C², as it has two continuous derivatives. But it is not three times differentiable.

3. Generally, the function

f(x) = x^n if x ≥ 0, and f(x) = −x^n if x < 0 , with n ≥ 1,

is in class C^{n−1}, meaning that it has n − 1 continuous derivatives. But it is not n times differentiable.
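Example 1 can be checked with one-sided difference quotients; this is a numerical sketch, not from the notes. For f(x) = x|x| we have f′(x) = 2|x|, and the difference quotients of f′ at 0 approach +2 from the right but −2 from the left, so f′′(0) does not exist.

```python
def fprime(x):
    """Derivative of f(x) = x|x| computed in the text: f'(x) = 2|x|."""
    return 2 * abs(x)

h = 1e-6
right = (fprime(0 + h) - fprime(0)) / h     # right difference quotient of f' at 0
left = (fprime(0 - h) - fprime(0)) / (-h)   # left difference quotient of f' at 0
```

The two one-sided slopes disagree (+2 versus −2), which is the numerical face of the missing second derivative.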


7.3 Mean value theorem

We begin by looking at local extrema.

Definition 7.3.1. For X a metric space, let f : X → R. We say that x_0 ∈ X is a local maximum for f if there exists r > 0 such that for all x ∈ B_r(x_0) we have f(x) ≤ f(x_0). Similarly, x_0 is a local minimum for f if there exists r > 0 such that for all x ∈ B_r(x_0) we have f(x) ≥ f(x_0).

In the case that X is R, if f is differentiable at a local extreme point, then the derivative must be zero.

Proposition 7.3.2. Let f : (a, b) → R and suppose that c ∈ (a, b) is a local extreme point for f. If f′(c) exists then f′(c) = 0.

Proof. Let c be a local max such that f′(c) exists. Then there exists r > 0 such that for all y with |y − c| < r, we have f(y) ≤ f(c). Therefore, looking only at right limits,

lim_{y→c⁺} (f(y) − f(c))/(y − c) ≤ 0 .

Looking only at left limits,

lim_{y→c⁻} (f(y) − f(c))/(y − c) ≥ 0 .

Since both one-sided limits equal f′(c), putting these together we find f′(c) = 0. The argument for a local min is similar.

Theorem 7.3.3 (Rolle's theorem). For a < b, let f : [a, b] → R be continuous such that f is differentiable on (a, b). If f(a) = f(b) then there exists c ∈ (a, b) such that f′(c) = 0.

Proof. If f is constant on the interval then clearly the statement holds. Otherwise for some d ∈ (a, b) we have f(d) > f(a) or f(d) < f(a). Let us consider the first case; the second is similar. By the extreme value theorem, f takes a maximum on [a, b] and since f(d) > f(a) this max cannot occur at a or b. So it occurs at some c ∈ (a, b). Then c is a local max as well, so we can apply the previous proposition to find f′(c) = 0.

An important corollary is the following.

Corollary 7.3.4 (Mean value theorem). For a < b let f : [a, b] → R be continuous such that f is differentiable on (a, b). There exists c ∈ (a, b) such that

f′(c) = (f(b) − f(a))/(b − a) .

Proof. Define L(x) to be the line that connects the points (a, f(a)) and (b, f(b)):

L(x) = ((f(b) − f(a))/(b − a)) (x − a) + f(a) .

Then the function g = f − L satisfies g(a) = g(b) = 0. It is also continuous on [a, b] and differentiable on (a, b). Therefore by Rolle's theorem, we can find c ∈ (a, b) such that g′(c) = 0. This gives

0 = g′(c) = f′(c) − L′(c) = f′(c) − (f(b) − f(a))/(b − a) ,

implying the corollary.

The mean value theorem has a lot of consequences. It is one of the central tools to analyze derivatives.

Corollary 7.3.5. Let f : (a, b) → R be differentiable.

1. If f′(x) ≥ 0 for all x ∈ (a, b) then f is non-decreasing.

2. If f′(x) ≤ 0 for all x ∈ (a, b) then f is non-increasing.

3. If f′(x) = 0 for all x ∈ (a, b) then f is constant.

Proof. Suppose first that f′(x) ≥ 0 for all x ∈ (a, b). To show f is non-decreasing, let c < d in (a, b). By the mean value theorem, there exists x_0 ∈ (c, d) such that

f′(x_0) = (f(d) − f(c))/(d − c) .

But this quantity is nonnegative, giving f(d) ≥ f(c). The second follows by considering −f instead of f. The third follows from the previous two.

7.4 L'Hopital's rule

For the proof of L'Hopital's rule, we need a generalized version of the mean value theorem.

Lemma 7.4.1 (Generalized MVT). If f, g : [a, b] → R are continuous and differentiable on (a, b) then there exists c ∈ (a, b) such that

(f(b) − f(a)) g′(c) = (g(b) − g(a)) f′(c) .

Proof. The proof is exactly the same as that of the MVT but using the function h : [a, b] → R given by

h(x) = (f(b) − f(a)) g(x) − (g(b) − g(a)) f(x) .

Indeed, h(a) = f(b)g(a) − g(b)f(a) = h(b), so applying Rolle's theorem, we find c ∈ (a, b) such that h′(c) = 0.

Theorem 7.4.2 (L'Hopital's rule). Suppose f, g : (a, b) → R are differentiable with g′(x) ≠ 0 for all x, where −∞ ≤ a < b < ∞. Suppose that

f′(x)/g′(x) → A as x → a .

If f(x) → 0 and g(x) → 0 as x → a, or if g(x) → +∞ as x → a, then

f(x)/g(x) → A as x → a .

Proof. We will suppose that A, a ≠ ±∞; otherwise the argument is similar. We consider two cases. First suppose that f(x) → 0 and g(x) → 0 as x → a. Then let ε > 0 and choose δ > 0 such that if x ∈ (a, a + δ) then

| f′(x)/g′(x) − A | < ε/2 .

We will now show that if x ∈ (a, a + δ) then also | f(x)/g(x) − A | < ε. Indeed, choose such an x and then pick any y ∈ (a, x). From the generalized MVT, there exists c ∈ (y, x) such that

(f(x) − f(y))/(g(x) − g(y)) = f′(c)/g′(c) .

Note that the denominator g(x) − g(y) is nonzero since g is injective (just use the MVT). But since c ∈ (a, a + δ), we have

| (f(x) − f(y))/(g(x) − g(y)) − A | < ε/2 .

Let y → a and we find the result.

In the second case, we suppose that g(x) → +∞ as x → a. Again for ε > 0 pick δ_1 > 0 such that if x ∈ (a, a + δ_1) then

| f′(x)/g′(x) − A | < ε/2 .

Fix x_0 = a + δ_1. By the generalized MVT, as before, for all x ∈ (a, x_0),

A − ε/2 < (f(x) − f(x_0))/(g(x) − g(x_0)) < A + ε/2 . (4)

Notice that since g(x) → ∞ as x → a,

lim_{x→a, x∈(a,x_0)} (g(x) − g(x_0))/g(x) = 1 .

Therefore using equation (4), there exists δ_2 < δ_1 such that if x ∈ (a, a + δ_2) then

A − 3ε/4 < (f(x) − f(x_0))/(g(x) − g(x_0)) · (g(x) − g(x_0))/g(x) < A + 3ε/4 . (5)

Also since g(x) → ∞ as x → a,

lim_{x→a, x∈(a,x_0)} f(x_0)/g(x) = 0 .

Therefore using (5) we can find δ_3 < δ_2 such that if x ∈ (a, a + δ_3) then

A − ε < (f(x) − f(x_0))/(g(x) − g(x_0)) · (g(x) − g(x_0))/g(x) + f(x_0)/g(x) < A + ε .

But this means

A − ε < f(x)/g(x) < A + ε for all x ∈ (a, a + δ_3) .

This proves that f(x)/g(x) → A as x → a.
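A numerical sanity check of L'Hopital's rule; this is an illustration, not part of the notes. For f(x) = cos(x) − 1 and g(x) = x², both vanishing as x → 0, the ratios f/g and f′/g′ approach the same limit A = −1/2.

```python
from math import cos, sin

f = lambda x: cos(x) - 1.0
g = lambda x: x * x
fp = lambda x: -sin(x)    # f'
gp = lambda x: 2.0 * x    # g'

xs = [0.1, 0.01, 0.001]
ratios = [f(x) / g(x) for x in xs]      # f/g along a sequence approaching 0
dratios = [fp(x) / gp(x) for x in xs]   # f'/g' along the same sequence
```

Both lists of ratios cluster near −0.5, and the two agree ever more closely as x shrinks, as the theorem predicts.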

7.5 Power series

We will derive some results about power series because they will help us on the problem set to define trigonometric functions. Let f(x) = Σ_{n=0}^∞ a_n x^n be a power series with radius of convergence R > 0. We wish to show that

• f is differentiable on (−R, R).

• The power series Σ_{n=1}^∞ n a_n x^{n−1} also has radius of convergence R.

• For all x ∈ (−R, R), f′(x) = Σ_{n=1}^∞ n a_n x^{n−1}.

Step 1. The power series Σ_{n=1}^∞ n a_n x^{n−1} also has radius of convergence R. To show this, we need a lemma.

Lemma 7.5.1. Suppose that (x_n) and (y_n) are non-negative real sequences such that x_n → x > 0. Then

limsup_{n→∞} x_n y_n = x limsup_{n→∞} y_n .

Proof. We will use the definition from the homework that limsup_{n→∞} b_n is the supremum of all subsequential limits of (b_n). Let S be the set of subsequential limits of (y_n) and T the corresponding set for (x_n y_n). We will prove the case that S and T are bounded above; the other case is left as an exercise.

We claim that

xS = T , where xS = {xs : s ∈ S} .

To prove this, let a ∈ xS. Then there exists a subsequence (y_{n_k}) such that y_{n_k} → a/x. Now x_{n_k} y_{n_k} → x · a/x = a, giving that a ∈ T. Conversely, let b ∈ T, so that there exists a subsequence (x_{n_k} y_{n_k}) such that x_{n_k} y_{n_k} → b. Then y_{n_k} = x_{n_k} y_{n_k} / x_{n_k} → b/x. This means that b = x · b/x ∈ xS.

To finish the proof we show that sup T = x sup S. First, if t ∈ T we have t/x ∈ S, so t/x ≤ sup S. Therefore t ≤ x sup S and sup T ≤ x sup S. Conversely, if s ∈ S then xs ∈ T, so xs ≤ sup T, giving s ≤ (1/x) sup T. This means sup S ≤ (1/x) sup T and therefore sup T ≥ x sup S. Combining the two inequalities gives sup T = x sup S.

To find the radius of convergence of Σ_{n=1}^∞ n a_n x^{n−1}, we use the root test:

limsup_{n→∞} (n |a_n|)^{1/n} = limsup_{n→∞} n^{1/n} |a_n|^{1/n} .

Since n^{1/n} → 1 we can use the previous lemma to get a limsup of 1/R, where R is the radius of convergence of Σ_{n=0}^∞ a_n x^n. This means the radius of convergence of the new series is also R.

Step 2. The function f given by f(x) = Σ_{n=0}^∞ a_n x^n is differentiable at x = 0.

To prove this, we take 0 < |x| < R/2 and compute

(f(x) − f(0))/(x − 0) = (Σ_{n=0}^∞ a_n x^n − a_0)/x = Σ_{n=1}^∞ a_n x^{n−1} .

Pulling off the first term,

| (f(x) − f(0))/(x − 0) − a_1 | = | Σ_{n=2}^∞ a_n x^{n−1} | = |x| | Σ_{n=2}^∞ a_n x^{n−2} | .

We can use the triangle inequality for the last sum to get

| (f(x) − f(0))/(x − 0) − a_1 | ≤ |x| Σ_{n=2}^∞ |a_n| |x|^{n−2} ≤ |x| Σ_{n=2}^∞ |a_n| (R/2)^{n−2} .

The last series converges, since R/2 lies inside the radius of convergence; setting C equal to it, we find

| (f(x) − f(0))/(x − 0) − a_1 | ≤ C|x| .

Now we can take the limit as x → 0 and find

lim_{x→0} | (f(x) − f(0))/(x − 0) − a_1 | = 0 , or lim_{x→0} (f(x) − f(0))/(x − 0) = a_1 .

This means f′(0) = a_1.
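Steps 1 and 2 can be illustrated with the exponential series Σ xⁿ/n!; this is a numerical sketch, not part of the notes. Differentiating that series term by term returns the same series, so a truncated version of it and its term-by-term derivative agree closely inside the radius of convergence (which is infinite here).

```python
from math import factorial

def poly_eval(coeffs, x):
    """Evaluate sum coeffs[n] * x**n (a truncated power series)."""
    return sum(c * x ** n for n, c in enumerate(coeffs))

N = 25
a = [1.0 / factorial(n) for n in range(N)]   # coefficients of e^x
da = [n * a[n] for n in range(1, N)]         # term-by-term derivative, as in Step 1

x = 0.5
val = poly_eval(a, x)    # truncated e^0.5
dval = poly_eval(da, x)  # truncated derivative series at 0.5
```

Since n·(1/n!) = 1/(n−1)!, the derivative coefficients reproduce the original ones shifted by one index, so both evaluations agree with e^0.5 up to a tiny truncation tail.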

Step 3. We will now prove that f is differentiable at all x_0 with |x_0| < R. So take such an x_0 and use the binomial theorem (writing C(n, j) for the binomial coefficient):

f(x) = Σ_{n=0}^∞ a_n (x − x_0 + x_0)^n = Σ_{n=0}^∞ a_n [ Σ_{j=0}^n C(n, j) x_0^{n−j} (x − x_0)^j ] = Σ_{n=0}^∞ Σ_{j=0}^∞ 1_{n≥j} a_n C(n, j) x_0^{n−j} (x − x_0)^j . (6)

We now state a lemma.

82

Lemma 7.5.2. Let $a_{m,n}$, $m, n \ge 0$, be a double sequence. If $\sum_{n=0}^\infty \left[ \sum_{m=0}^\infty |a_{m,n}| \right]$ converges, then
$$\sum_{n=0}^\infty \left[ \sum_{m=0}^\infty a_{m,n} \right] = \sum_{m=0}^\infty \left[ \sum_{n=0}^\infty a_{m,n} \right] .$$

Proof. Let $\epsilon > 0$ and write $S$ for the left side above and $T$ for the right side above. For $M, N \in \mathbb{N}$, define
$$S_{M,N} = \sum_{n=0}^N \sum_{m=0}^M a_{m,n} \quad \text{and} \quad T_{M,N} = \sum_{m=0}^M \sum_{n=0}^N a_{m,n} .$$
Clearly $S_{M,N} = T_{M,N}$ for all $M, N \in \mathbb{N}$. We claim that there exist $M_0, N_0$ such that if $M \ge M_0$ and $N \ge N_0$ then both $|S - S_{M,N}|$ and $|T - T_{M,N}|$ are less than $\epsilon/2$. We need only verify this for $S$ because the same argument works for $T$. Once we show that, we have
$$|S - T| \le |S - S_{M,N}| + |S_{M,N} - T_{M,N}| + |T_{M,N} - T| < \epsilon ,$$
and since $\epsilon$ is arbitrary, this means $S = T$.

To prove the claim, first use the fact that $\sum_{n=0}^\infty \left[ \sum_{m=0}^\infty |a_{m,n}| \right]$ converges to pick $N_0$ such that $\sum_{n=N_0+1}^\infty \left[ \sum_{m=0}^\infty |a_{m,n}| \right] < \epsilon/4$. Next, because each of the sums
$$\sum_{m=0}^\infty |a_{m,0}|, \ \sum_{m=0}^\infty |a_{m,1}|, \ \ldots, \ \sum_{m=0}^\infty |a_{m,N_0}|$$
converges, we can pick $M_0$ such that if $M \ge M_0$ then $\sum_{n=0}^{N_0} \sum_{m=M+1}^\infty |a_{m,n}| < \epsilon/4$. This gives, for $M \ge M_0$ and $N \ge N_0$,
$$|S - S_{M,N}| \le \sum_{n=0}^{N} \left[ \sum_{m=M+1}^\infty |a_{m,n}| \right] + \sum_{n=N+1}^\infty \left[ \sum_{m=0}^\infty |a_{m,n}| \right]$$
$$= \sum_{n=0}^{N_0} \left[ \sum_{m=M+1}^\infty |a_{m,n}| \right] + \sum_{n=N_0+1}^{N} \left[ \sum_{m=M+1}^\infty |a_{m,n}| \right] + \sum_{n=N+1}^\infty \left[ \sum_{m=0}^\infty |a_{m,n}| \right]$$
$$\le \sum_{n=0}^{N_0} \left[ \sum_{m=M_0+1}^\infty |a_{m,n}| \right] + \sum_{n=N_0+1}^\infty \left[ \sum_{m=0}^\infty |a_{m,n}| \right] < \epsilon/2 .$$

We now want to apply the lemma to the sum in (6). To do this, we must verify that
$$\sum_{n=0}^\infty \left[ \sum_{j=0}^\infty \mathbf{1}_{n \ge j} \, |a_n| \binom{n}{j} |x_0|^{n-j} |x - x_0|^j \right]$$
converges. But using the binomial theorem again, this sum equals
$$\sum_{n=0}^\infty |a_n| \left( |x_0| + |x - x_0| \right)^n ,$$
which converges as long as $|x_0| + |x - x_0| < R$. So pick such an $x$ and we can exchange the order of summation:
$$f(x) = \sum_{j=0}^\infty \left[ \sum_{n=j}^\infty a_n \binom{n}{j} x_0^{n-j} \right] (x - x_0)^j .$$
We can view this as a power series in $x - x_0$ by setting $g(x) = f(x + x_0)$ and seeing that for $|x| < R - |x_0|$,
$$g(x) = \sum_{j=0}^\infty b_j x^j , \quad \text{with } b_j = \sum_{n=j}^\infty a_n \binom{n}{j} x_0^{n-j} .$$
Taking the derivative of this at $x = 0$ gives, by the previous computation,
$$f'(x_0) = g'(0) = b_1 = \sum_{n=1}^\infty n a_n x_0^{n-1} .$$
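The conclusion $f'(x_0) = \sum_{n \ge 1} n a_n x_0^{n-1}$ can be checked numerically; the sketch below again uses the exponential coefficients $a_n = 1/n!$ as a stand-in (so $f' = f$), with an arbitrary truncation level and test point of my choosing.

```python
import math

def f(x, terms=60):
    return sum(x ** n / math.factorial(n) for n in range(terms))

def f_prime_series(x0, terms=60):
    # term-by-term differentiated series: sum of n * a_n * x0**(n-1)
    return sum(n * x0 ** (n - 1) / math.factorial(n) for n in range(1, terms))

x0, h = 0.7, 1e-6
finite_diff = (f(x0 + h) - f(x0 - h)) / (2 * h)  # centered difference quotient
series_value = f_prime_series(x0)                 # should be e**0.7
```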

7.6 Taylor’s theorem

Note that the theorem on power series actually gives that if $f(x) = \sum_{n=0}^\infty a_n x^n$ then $f$ has infinitely many derivatives. (Just apply the theorem over and over.) Then we ask: is it true that if a function has infinitely many derivatives then it is equal to some power series?

Definition 7.6.1. A function $f : (a, b) \to \mathbb{R}$ is called analytic if it equals some power series $\sum_{n=0}^\infty a_n x^n$.

The question now becomes: is every $f \in C^\infty$ actually analytic? To try to answer this question we look at the derivatives of a power series: if $f(x) = \sum_{n=0}^\infty a_n x^n$, then
$$f(0) = a_0, \quad f'(0) = a_1, \quad f''(0) = 2 a_2, \quad f'''(0) = 6 a_3, \ \ldots$$
So we can rewrite a power series as
$$f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} x^n .$$
The sum on the right is called the Taylor series for $f$.

To try to go the other way (to try to build a power series from a function), suppose for simplicity that $f : \mathbb{R} \to \mathbb{R}$ and $a < b$. If $f$ is differentiable on $(a, b)$ and continuous on $[a, b]$, the mean value theorem gives $c_1 \in (a, b)$ such that
$$f(b) = f(a) + f'(c_1)(b - a) .$$
We can then ask, if $f$ is twice differentiable, can we find $c_2 \in (a, b)$ such that
$$f(b) = f(a) + f'(a)(b - a) + \frac{f''(c_2)}{2} (b - a)^2 ,$$
or a $c_3 \in (a, b)$ such that
$$f(b) = f(a) + f'(a)(b - a) + \frac{f''(a)}{2} (b - a)^2 + \frac{f'''(c_3)}{6} (b - a)^3 ?$$
The answer is yes, and in fact we can keep going to any order we like. For its statement, derivatives at $a$ and $b$ are understood as right and left derivatives, respectively.

Theorem 7.6.2 (Taylor’s theorem). Suppose that $f : [a, b] \to \mathbb{R}$ has $n - 1$ continuous derivatives on $[a, b]$ and is $n$ times differentiable on $(a, b)$. There exists $c \in (a, b)$ such that
$$f(b) = \sum_{j=0}^{n-1} \frac{f^{(j)}(a)}{j!} (b - a)^j + \frac{f^{(n)}(c)}{n!} (b - a)^n .$$

Proof. See the proof in Rudin, Thm. 5.15. It is a repeated application of the mean value theorem.

We get from this a corollary:

Corollary 7.6.3. Suppose that $f : [a, b] \to \mathbb{R}$ has infinitely many derivatives; that is, $f \in C^\infty([a, b])$. Set
$$M_n = \sup_{c \in (a, b)} |f^{(n)}(c)| .$$
If $\frac{M_n}{n!} (b - a)^n \to 0$ then
$$f(b) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} (b - a)^n .$$
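For a concrete instance of the corollary (a numerical sketch with parameters of my choosing): for $f = \sin$ on $[0, 2]$ every derivative is bounded by $M_n \le 1$, so $M_n (b-a)^n / n! \to 0$, and the Taylor series at $0$ must converge to $\sin(2)$.

```python
import math

# Taylor series of sin at 0, evaluated at b = 2; since M_n <= 1 for sin,
# the remainder bound b**n / n! forces convergence.
def taylor_sin(b, n_terms):
    return sum((-1) ** k * b ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n_terms))

b = 2.0
remainder_bound = b ** 20 / math.factorial(20)  # remainder bound at order 20
approx = taylor_sin(b, 10)                      # partial sum through b**19/19!
```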

We can see that in this corollary it is necessary to have this bound on $M_n$. Take for example $f : [0, \infty) \to \mathbb{R}$ given by
$$f(x) = \begin{cases} e^{-1/x} & \text{if } x > 0 \\ 0 & \text{if } x = 0 \end{cases} .$$
In this case, you can check that $f^{(n)}(0) = 0$ for all $n$. However, if $f(x) = \sum_{n=0}^\infty a_n x^n$, this would imply that $a_n = 0$ for all $n$, giving $f(x) = 0$ for all $x$.

This means in particular that we cannot have the required control on $f^{(n)}(x)$ needed to apply the corollary. If you compute the $n$-th derivative, you can try to see why the corollary does not apply; that is, why $f$ is not analytic. For instance, we have
$$f'(x) = e^{-1/x} \cdot \frac{1}{x^2} , \quad f''(x) = e^{-1/x} \left( \frac{1}{x^4} - \frac{2}{x^3} \right) \quad \text{for } x > 0 ,$$
and the $n$-th derivative can be written as
$$f^{(n)}(x) = e^{-1/x} P(1/x) ,$$
where $P$ is a polynomial in $1/x$ of degree $2n$. For any given $r > 0$, you can show that
$$\sup_{x \in [0, r]} \frac{|f^{(n)}(x)|}{n!} r^n \to \infty ,$$
so that the corollary cannot apply.
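The flatness of this function at 0 can also be seen numerically (an illustration only; the sample point is arbitrary): $e^{-1/x}$ is strictly positive for $x > 0$, yet so small near 0 that the difference quotients at 0 vanish in the limit.

```python
import math

def f(x):
    return math.exp(-1.0 / x) if x > 0 else 0.0

x = 0.05
value = f(x)               # e**(-20): positive but about 2e-9
first_quotient = f(x) / x  # right difference quotient approximating f'(0)
```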

7.7 Exercises

1. Prove that for any $c \in \mathbb{R}$, the polynomial equation $x^3 - 3x + c = 0$ does not have two distinct roots in $[0, 1]$.

2. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is differentiable and there exists $C < 1$ such that $|f'(x)| \le C$ for all $x$.

(a) Show that there exists a unique fixed point; that is, an $x$ such that $f(x) = x$.

(b) Show that if $f(0) > 0$ then the fixed point is positive.

3. Let $f : \mathbb{R} \to \mathbb{R}$ be continuous. Suppose that for some $a < b$, both of the following two conditions hold:

• $f(a) = f(b) = 0$ and

• $f$ is differentiable at both $a$ and $b$ with $f'(a) f'(b) > 0$.

Show there exists $c \in (a, b)$ such that $f(c) = 0$.

4. Assume $f$ on $[a, b]$ is continuous, and that $f'$ exists and is everywhere continuous and positive on $(a, b)$. Let $[c, d]$ be the image of $f$. Prove that $f$ has an inverse function $f^{-1} : [c, d] \to [a, b]$ and that the derivative of $f^{-1}$ is continuous on $(c, d)$.

5. Let $f : (-a, a) \to \mathbb{R}$. Assume there is a $C \in \mathbb{R}$ such that for all $x \in (-a, a)$, we have $|f(x) - x| \le C x^2$. Does $f'(0)$ exist? If so, what is it?

6. Use the Mean Value Theorem to prove that for $x \ne 0$ (with $x > -1$),
$$1 + \frac{x}{2\sqrt{1 + x}} < \sqrt{1 + x} < 1 + \frac{x}{2} .$$

7. If $I$ is an open interval and $f : I \to \mathbb{R}$ is differentiable, show that $|f'(x)|$ is bounded on $I$ by a constant $M$ if and only if $f$ is Lipschitz on $I$ with Lipschitz constant bounded above by (this same) $M$.

8. Read example 5.6 in Rudin. Define $f : \mathbb{R} \to \mathbb{R}$ by
$$f(x) = \begin{cases} x^{200} \sin \frac{1}{x} & x \ne 0 \\ 0 & x = 0 \end{cases} .$$

(a) For which $n \in \mathbb{N}$ does $f^{(n)}(0)$, the $n$-th derivative of $f$ at 0, exist?

(b) For which $n \in \mathbb{N}$ does $\lim_{x \to 0^+} f^{(n)}(x)$ exist?

(c) For which $n \in \mathbb{N}$ is $f \in C^n(\mathbb{R})$?

9. Let $I \subset \mathbb{R}$ be an open interval. Assume $f : I \to \mathbb{R}$ is continuous on $I$ and is differentiable on $I$ except perhaps at $c \in I$. Suppose further that $\lim_{x \to c} f'(x)$ exists. Prove that $f$ is differentiable at $c$ and that $f'$ is continuous at $c$.

10. (Weierstrass M-test) Let $I$ be any interval. For each $n \in \mathbb{N}$, let $f_n : I \to \mathbb{R}$ be continuous and assume that there is a constant $M_n$ such that $|f_n(x)| \le M_n$ for all $x$. Assume further that $\sum M_n$ converges.

(a) Show that for each $x \in I$, the sum $\sum_n f_n(x)$ converges. Call this number $f(x)$. We say the series $\sum f_n$ converges pointwise to $f$.

(b) Show that $f : I \to \mathbb{R}$ given in the first part is continuous.

Hint. Given $\epsilon > 0$, find $N \in \mathbb{N}$ such that $\left| \sum_{n=1}^N f_n(x) - f(x) \right| < \epsilon/2$ for all $x \in I$. Then use the fact that $\sum_{n=1}^N f_n$ is continuous.

Remark. The condition above with $\epsilon/2$ is called uniform convergence. Precisely, we say a family $\{f_n\}$ of functions from $\mathbb{R}$ to $\mathbb{R}$ converges uniformly to $f$ if for each $\epsilon > 0$ there exists $N$ such that $n \ge N$ implies that $|f_n(x) - f(x)| < \epsilon$ for all $x$. This problem is a special case of a more general theorem: if a family $\{f_n\}$ of continuous functions converges uniformly then the limit $f$ is continuous. Try to think up an example where a family of continuous functions converges pointwise to $f$, but does not converge uniformly, and where $f$ is not continuous.

11. (From J. Feldman.) In this problem we will construct a function that is continuous everywhere but differentiable nowhere. Define $g : \mathbb{R} \to \mathbb{R}$ by first setting, for $x \in [0, 2]$,
$$g(x) = \begin{cases} x & x \in [0, 1] \\ 2 - x & x \in [1, 2] \end{cases} .$$
Then for $x \notin [0, 2]$, define $g(x)$ so that it is periodic of period 2; that is, set $g(x) = g(\hat{x})$ for the unique $\hat{x} \in [0, 2)$ such that $\hat{x} = x + 2m$ for some $m \in \mathbb{Z}$. (The graph of $g$ forms a sequence of identical triangles with the $x$-axis, each of height 1 and base 2. Clearly $g$ is continuous.) For each $n \in \mathbb{N}$, define $f_n : [0, 1] \to \mathbb{R}$ by $f_n(x) = \left( \frac{3}{4} \right)^n g(4^n x)$.

(a) Make a sketch of $f_1$ and $f_2$ on $[0, 1]$. (Optional: use a computer algebra package to graph $f_1$, $f_1 + f_2$, $f_1 + f_2 + f_3$, etc.)

(b) Prove that the formula $f(x) = \sum_{n=1}^\infty f_n(x)$ defines a continuous function on $[0, 1]$.

(c) Complete the following steps to show that $f$ is not differentiable at any $x$.

i. Let $x \in [0, 1]$ and for each $m \in \mathbb{N}$, define $h_m$ to be either number in the set
$$\left\{ x - \tfrac{1}{2} \cdot 4^{-m}, \ x + \tfrac{1}{2} \cdot 4^{-m} \right\}$$
such that there is no integer strictly between $4^m x$ and $4^m h_m$. Show that
$$\text{if } n > m \text{ then} \quad \frac{f_n(h_m) - f_n(x)}{h_m - x} = 0 .$$

ii. Show that
$$\text{if } n = m \text{ then} \quad \left| \frac{f_n(h_m) - f_n(x)}{h_m - x} \right| = 3^m .$$

iii. Show that
$$\text{if } n < m \text{ then} \quad \left| \frac{f_n(h_m) - f_n(x)}{h_m - x} \right| \le 3^n .$$

Putting these three cases together, show that
$$\left| \frac{f(h_m) - f(x)}{h_m - x} \right| \ge \frac{1}{2} \left( 3^m + 3 \right)$$
and deduce that $f$ is not differentiable at $x$.
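The blow-up of the difference quotients in part (c) can be observed numerically. The sketch below (my own illustration, not a solution) truncates $f$ at 20 terms, which is enough for steps $\frac{1}{2} \cdot 4^{-m}$ with $m \le 6$ since the terms with $n > m$ contribute exactly 0 to the difference, and picks the side of $h_m$ as in step i; the test point $x = 0.3$ is arbitrary.

```python
import math

def g(x):
    x = x % 2.0                  # extend g with period 2
    return x if x <= 1.0 else 2.0 - x

def f(x, terms=20):
    return sum((3.0 / 4.0) ** n * g(4.0 ** n * x) for n in range(1, terms + 1))

def no_integer_strictly_between(a, b):
    lo, hi = min(a, b), max(a, b)
    k = math.floor(lo) + 1       # smallest integer strictly greater than lo
    return not (k < hi)

x = 0.3
quotients = []
for m in range(1, 7):
    step = 0.5 * 4.0 ** (-m)
    h = x + step
    if not no_integer_strictly_between(4.0 ** m * x, 4.0 ** m * h):
        h = x - step             # the other side then works (step i)
    quotients.append(abs((f(h) - f(x)) / (h - x)))
# quotients[m-1] should be at least (3**m + 3) / 2, so they blow up.
```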

12. Define for $x \in \mathbb{R}$,
$$\sin x = \sum_{n=0}^\infty (-1)^n \frac{x^{2n+1}}{(2n+1)!} \quad \text{and} \quad \cos x = \sum_{n=0}^\infty (-1)^n \frac{x^{2n}}{(2n)!} .$$

(a) Show that for any $x$, both series converge absolutely and define continuous functions. Show that $\cos 0 = 1$ and $\sin 0 = 0$.

(b) Show that the derivative of $\sin x$ is $\cos x$ and the derivative of $\cos x$ is $-\sin x$.

(c) Show that for any $x$, $\sin^2 x + \cos^2 x = 1$.

Hint. Take the derivative of the left side.

(d) For a given $a \in \mathbb{R}$ find the Taylor series of both $f(x) = \sin(a + x)$ and $g(x) = \cos(a + x)$ centered at $x = 0$.

(e) Use the previous part to show the identities
$$\sin(x + y) = \sin x \cos y + \cos x \sin y \quad \text{and} \quad \cos(x + y) = \cos x \cos y - \sin x \sin y .$$

13. Define the set
$$S = \{ x > 0 : \cos x = 0 \} .$$

(a) Show that $S$ is nonempty.

Hint. Assume it is empty. Since $\cos 0 = 1$, show that then $\cos x$ would be positive for all $x > 0$ and therefore $\sin x$ would be strictly increasing. As $\sin x$ is bounded, it would have a limit as $x \to \infty$. Deduce then that $\cos x$ would also have a limit $L$. Show that $L = 2L^2 - 1$ and that we must have $L = 1$. Argue that this implies $\sin x$ is unbounded.

(b) Define
$$\pi = 2 \inf S .$$
Show that $\cos \frac{\pi}{2} = 0$ and $\sin \frac{\pi}{2} = 1$. Then prove that $\sin(x + 2\pi) = \sin x$ and $\cos(x + 2\pi) = \cos x$.

(c) Define $\tan x = \frac{\sin x}{\cos x}$ for all $x$ such that $\cos x \ne 0$. Show that $\tan \frac{\pi}{4} = 1$.

14. Please continue to use only the facts about trigonometry established in problems 12 and 13.

(a) Show that the derivative of $\tan x$ is $\sec^2 x$, where we define $\sec x = 1/\cos x$.

(b) From now on, restrict the domain of $\tan x$ to $(-\pi/2, \pi/2)$. Show that $\tan x$ is strictly increasing on this domain. Show that its image is $\mathbb{R}$. Therefore $\tan x$ has an inverse function $\arctan x$ mapping $\mathbb{R} \to (-\pi/2, \pi/2)$. By problem 4, $\arctan x$ is of class $C^1$, and in particular continuous.

(c) Show that $\sec^2(\arctan x) = 1 + x^2$ for all $x \in \mathbb{R}$. (It is not rigorous to draw a little right triangle with an angle $\theta = \arctan x$ in one corner. Problems 12–13 involve no notion of angle or two-dimensional geometry.)

(d) By the definition of inverse function, $\tan(\arctan x) = x$ for all $x \in \mathbb{R}$. Use the Chain Rule to show the derivative of $\arctan x$ is $\frac{1}{1 + x^2}$.

(e) In the geometric series $1 + x + x^2 + x^3 + \cdots$, substitute $-x^2$ for $x$. Show that this power series converges to $\frac{1}{1 + x^2}$ for $x \in (-1, 1)$. (Aside: is this uniform convergence?)

(f) Consider the power series
$$A(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots$$
Show that this defines an analytic function on $(-1, 1)$. Show that $A(x)$ and $\arctan x$ have the same derivative. Therefore $A(x) - \arctan x$ is a constant. Checking at $x = 0$ to see what this constant is, show that $A(x) = \arctan x$ on $(-1, 1)$.

(g) Show that $\arctan x$ is uniformly continuous on $\mathbb{R}$.

(h) Since $A(x)$ equals $\arctan x$ on $(-1, 1)$, it is uniformly continuous on that open interval. By the last problem set, it has a unique continuous extension to $[-1, 1]$. Conclude that
$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$$
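The series in part (h) converges quite slowly; the short sketch below (not required by the exercise) computes a partial sum and compares it with $\pi/4 \approx 0.7853981634$.

```python
# Partial sums of the Leibniz series 1 - 1/3 + 1/5 - 1/7 + ...
def leibniz_partial(n_terms):
    return sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

approx = leibniz_partial(100_000)   # error is roughly 1 / (4 * n_terms)
```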

15. Abel’s limit theorem. Suppose that $f : (-1, 1] \to \mathbb{R}$ is a function such that (a) $f$ is continuous at $x = 1$ and (b) for all $x \in (-1, 1)$, $f(x) = \sum_{n=0}^\infty a_n x^n$ for some power series that converges for all $x \in (-1, 1)$. If, in addition, $\sum a_n$ converges, prove that
$$\sum_{n=0}^\infty a_n = f(1) .$$
Hint. For $x \in (-1, 1)$ write $f_n(x) = \sum_{k=0}^n a_k x^k$ and $A_n = \sum_{k=0}^n a_k$. Show that
$$f_n(x) = (1 - x)(A_0 + \cdots + A_{n-1} x^{n-1}) + A_n x^n .$$
Let $n \to \infty$ to get a different representation for $f(x)$. Next denote $A = \sum_{k=0}^\infty a_k$ and write
$$f(x) - A = f(x) - (1 - x) \sum_{n=0}^\infty A x^n .$$
Use the representation of $f(x)$ above to bound this difference for $x$ near 1.

8 Integration

The standard motivation for integration is to find the area under the graph of a function. There are other very important reasons to study integration, and one is that integration is a smoothing operation: the (indefinite) integral of a function has more derivatives than the original function does. Other motivations can be seen in abstract measure theory and the application to, for instance, probability theory.

8.1 Definitions

We will start at the bottom and try to find the area under a graph. We will place boxes under the graph and sum the area in these boxes. The x-coordinates of the sides of these boxes form an (ordered) partition. Although we have used this word before, it will take a new meaning here.

Definition 8.1.1. A partition $T$ of the interval $[a, b]$ is a finite set $\{x_1, \ldots, x_n\}$ such that
$$a = x_1 < x_2 < \cdots < x_n = b .$$

Given a partition and a bounded function $f$ we can construct an upper sum and a lower sum. To do this, we consider a subinterval $[x_i, x_{i+1}]$ and let
$$m_i = \inf_{x \in [x_i, x_{i+1}]} f(x) \quad \text{and} \quad M_i = \sup_{x \in [x_i, x_{i+1}]} f(x) .$$
A box with base $[x_i, x_{i+1}]$ and height $M_i$ contains the entire area below $f$ in this interval, whereas the box with the same base but height $m_i$ is contained in this area. (Here we are thinking of $f \ge 0$, so these statements are slightly different otherwise.) Counting up the area of these boxes, we get the following definitions.

Definition 8.1.2. Given a partition $T = \{x_1 < \cdots < x_n\}$ of $[a, b]$ and a bounded function $f : [a, b] \to \mathbb{R}$ we define the upper and lower sums of $f$ relative to the partition $T$ as
$$U(f, T) = \sum_{i=1}^{n-1} M_i (x_{i+1} - x_i) \quad \text{and} \quad L(f, T) = \sum_{i=1}^{n-1} m_i (x_{i+1} - x_i) .$$

There is a useful monotonicity property of upper and lower sums. To state this, we use the following term. A partition $Q$ of $[a, b]$ is said to be a refinement of $T$ if $T \subset Q$. This means that we have just thrown extra points into $T$ to form $Q$.

Lemma 8.1.3. Let $f : [a, b] \to \mathbb{R}$ be bounded and $Q$ a refinement of $T$. Then
$$U(f, Q) \le U(f, T) \quad \text{and} \quad L(f, Q) \ge L(f, T) .$$

Proof. By iteration (or induction) it suffices to show the inequalities in the case that $Q$ has just one more point than $T$. So take $T = \{x_1 < \cdots < x_n\}$ and $Q = \{x_1 < \cdots < x_k < t < x_{k+1} < \cdots < x_n\}$. Since most intervals are unchanged,
$$U(f, T) - U(f, Q) = M_k (x_{k+1} - x_k) - \left( \sup_{y \in [x_k, t]} f(y) \right)(t - x_k) - \left( \sup_{z \in [t, x_{k+1}]} f(z) \right)(x_{k+1} - t)$$
$$\ge M_k (x_{k+1} - x_k) - M_k (t - x_k) - M_k (x_{k+1} - t) = 0 .$$
The argument for lower sums is similar.

The above lemma says that upper sums decrease and lower sums increase when we add more points into the partition. Since we are thinking of taking very fine partitions, we define the upper and lower integrals
$$\overline{\int_a^b} f(x)\, dx = \inf_T U(f, T) \quad \text{and} \quad \underline{\int_a^b} f(x)\, dx = \sup_T L(f, T)$$
for bounded $f : [a, b] \to \mathbb{R}$, where the infimum and supremum are over all partitions $T$ of $[a, b]$. Note that these are defined for all bounded $f$.

Definition 8.1.4. If $f : [a, b] \to \mathbb{R}$ then $f$ is integrable (written $f \in \mathcal{R}([a, b])$) if
$$\overline{\int_a^b} f(x)\, dx = \underline{\int_a^b} f(x)\, dx .$$
In this case we write $\int_a^b f(x)\, dx$ for the common value.

Note the following property of upper and lower sums and integrals.

• For any partition $T$ of $[a, b]$ and bounded function $f : [a, b] \to \mathbb{R}$,
$$L(f, T) \le \underline{\int_a^b} f(x)\, dx \le \overline{\int_a^b} f(x)\, dx \le U(f, T) .$$

Proof. The only inequality that is not obvious is the one between the integrals. To show this, we first let $\epsilon > 0$. By definition of the upper and lower integrals, there exist partitions $T$ and $Q$ of $[a, b]$ such that
$$L(f, T) > \underline{\int_a^b} f(x)\, dx - \epsilon/2 \quad \text{and} \quad U(f, Q) < \overline{\int_a^b} f(x)\, dx + \epsilon/2 .$$
Taking $T'$ to be the common refinement of $T$ and $Q$ (that is, their union), we can use the previous lemma to find
$$\underline{\int_a^b} f(x)\, dx < L(f, T) + \epsilon/2 \le L(f, T') + \epsilon/2 \le U(f, T') + \epsilon/2 \le U(f, Q) + \epsilon/2 < \overline{\int_a^b} f(x)\, dx + \epsilon .$$
Taking $\epsilon \to 0$ we are done.

There is an equivalent characterization of integrability. It is useful because the condition involves only one partition, whereas when dealing with both upper and lower integrals one would need to approximate using two partitions.

Theorem 8.1.5. Let $f : [a, b] \to \mathbb{R}$ be bounded. $f$ is integrable if and only if for each $\epsilon > 0$ there is a partition $T$ of $[a, b]$ such that $U(f, T) - L(f, T) < \epsilon$.

Proof. Suppose first that $f$ is integrable and let $\epsilon > 0$. Then the upper and lower integrals are equal. Choose $T_1$ such that $L(f, T_1) > \underline{\int_a^b} f(x)\, dx - \epsilon/2$ and $T_2$ such that $U(f, T_2) < \overline{\int_a^b} f(x)\, dx + \epsilon/2$. Taking $T$ to be the common refinement of $T_1$ and $T_2$ we find
$$L(f, T) \ge L(f, T_1) > \underline{\int_a^b} f(x)\, dx - \epsilon/2$$
and
$$U(f, T) \le U(f, T_2) < \overline{\int_a^b} f(x)\, dx + \epsilon/2 .$$
Combining these two gives $U(f, T) - L(f, T) < \epsilon$.

Conversely, suppose that for each $\epsilon > 0$ we can find a partition $T$ such that $U(f, T) - L(f, T) < \epsilon$. Then
$$\overline{\int_a^b} f(x)\, dx \le U(f, T) < L(f, T) + \epsilon \le \underline{\int_a^b} f(x)\, dx + \epsilon .$$
Since $\epsilon > 0$ is arbitrary, we find $\overline{\int_a^b} f(x)\, dx \le \underline{\int_a^b} f(x)\, dx$. The other inequality is obvious, so the upper and lower integrals are equal. In other words, $f \in \mathcal{R}$.

Using this we can show that all continuous functions are integrable.

Theorem 8.1.6. Let $f : [a, b] \to \mathbb{R}$ be continuous. Then $f$ is integrable.

Proof. Since $[a, b]$ is compact, $f$ is uniformly continuous. Then given $\epsilon > 0$ we can find $\delta > 0$ such that if $x, y \in [a, b]$ with $|x - y| < \delta$ then $|f(x) - f(y)| < \epsilon/(2(b - a))$. Now construct any partition $T$ of $[a, b]$ such that, writing $T = \{x_1 < x_2 < \cdots < x_n\}$, we have $|x_i - x_{i+1}| < \delta$ for all $i = 1, \ldots, n - 1$. Then in each subinterval $[x_i, x_{i+1}]$, we have
$$|f(x) - f(y)| < \epsilon/(2(b - a)) \quad \text{for all } x, y \in [x_i, x_{i+1}] .$$
This gives $M_i - m_i \le \epsilon/(2(b - a)) < \epsilon/(b - a)$. Therefore
$$U(f, T) - L(f, T) = \sum_{i=1}^{n-1} (M_i - m_i)(x_{i+1} - x_i) < \frac{\epsilon}{b - a} \sum_{i=1}^{n-1} (x_{i+1} - x_i) = \epsilon .$$
Using the last theorem, we are done.

So we know now that all continuous functions are integrable. There are some other questions we need to resolve.

1. Which other functions are integrable?

2. Which functions are not integrable?

3. How do we compute integrals?

Examples.

• Let $f$ be the indicator function of the rationals:
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases} .$$
We will now show that $f$ is not integrable on any $[a, b]$. Indeed, let $T$ be any partition of $[a, b]$, written as $\{x_1 < x_2 < \cdots < x_n\}$. Then for each subinterval $[x_i, x_{i+1}]$, we have
$$M_i = \sup_{x \in [x_i, x_{i+1}]} f(x) = 1 \quad \text{and} \quad m_i = 0 .$$
Therefore $U(f, T) - L(f, T) = \sum_{i=1}^{n-1} (M_i - m_i)(x_{i+1} - x_i) = b - a$. Choosing any $\epsilon > 0$ that is less than $b - a$, we see that there is no partition $T$ such that $U(f, T) - L(f, T) < \epsilon$. Therefore $f \notin \mathcal{R}$.

• Every monotone function is integrable. Indeed, take $f : [a, b] \to \mathbb{R}$ to be nondecreasing. If $T_n$ is the partition
$$T_n = \left\{ a < a + \frac{b - a}{n} < a + \frac{2(b - a)}{n} < \cdots < b \right\} ,$$
then we can compute
$$U(f, T_n) - L(f, T_n) = \sum_{i=1}^{n} (M_i - m_i) \frac{b - a}{n} = \frac{b - a}{n} \sum_{i=1}^{n} \left[ f\left( a + \frac{i(b - a)}{n} \right) - f\left( a + \frac{(i - 1)(b - a)}{n} \right) \right] = \frac{b - a}{n} (f(b) - f(a)) .$$
Then given $\epsilon > 0$ take $n$ such that $(b - a)(f(b) - f(a))/n < \epsilon$. This shows that $f \in \mathcal{R}$.

• All functions with countably many discontinuities are integrable. One example will be in the problem set. It is actually possible to show that some functions with uncountably many discontinuities are integrable, but we will not address this.

Let us prove a simple example, the function $f : [0, 1] \to \mathbb{R}$ given by
$$f(x) = \begin{cases} 0 & x \le 1/2 \\ 1 & x > 1/2 \end{cases} .$$
Given $\epsilon > 0$ we construct a partition containing a very small subinterval around the discontinuity. Let $T = \{0 < 1/2 - \epsilon/3 < 1/2 + \epsilon/3 < 1\}$. Then
$$U(f, T) - L(f, T) = \sum_{i=1}^{3} (M_i - m_i)(x_{i+1} - x_i) = 0 \cdot (1/2 - \epsilon/3) + 1 \cdot (2\epsilon/3) + 0 \cdot (1/2 - \epsilon/3) = 2\epsilon/3 < \epsilon .$$
In this example we did not need to care about subintervals away from the discontinuity because the function is constant there (and thus has $M_i = m_i$). In general we would have to construct a partition with somewhat more complicated parts there too (possibly using continuity).

Let us now give an example of computing an integral by hand. Consider $f : [0, 1] \to \mathbb{R}$ given by $f(x) = x^2$. Take a partition $T_n$ to be
$$T_n = \left\{ 0 < \frac{1}{n} < \frac{2}{n} < \cdots < \frac{n - 1}{n} < 1 \right\} .$$
The upper sum is
$$U(f, T_n) = \sum_{i=0}^{n-1} f\left( \frac{i + 1}{n} \right) \frac{1}{n} = \frac{1}{n} \sum_{i=0}^{n-1} \left( \frac{i + 1}{n} \right)^2 = \frac{1}{n^3} \sum_{i=1}^{n} i^2 = \frac{n(n + 1)(2n + 1)}{6 n^3} \to 1/3 .$$
Similarly, $L(f, T_n) \to 1/3$. This means that $\underline{\int_0^1} f(x)\, dx \ge 1/3$ and $\overline{\int_0^1} f(x)\, dx \le 1/3$, giving
$$\int_0^1 x^2\, dx = 1/3 .$$
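The computation above can be mirrored numerically (a sketch; the choice $n = 1000$ is arbitrary). Since $x^2$ is increasing on $[0, 1]$, the sup and inf on each subinterval sit at the endpoints, which makes the upper and lower sums easy to code.

```python
# Upper and lower sums of f(x) = x**2 on [0, 1] with the uniform partition T_n.
def upper_lower(n):
    upper = sum(((i + 1) / n) ** 2 for i in range(n)) / n  # sup at right endpoint
    lower = sum((i / n) ** 2 for i in range(n)) / n        # inf at left endpoint
    return upper, lower

u, l = upper_lower(1000)   # both within 1/n of 1/3; u - l is exactly 1/n
```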

8.2 Properties of integration

Here we state many properties of the integral. Because of the third item, we define
$$\int_b^a f(x)\, dx = - \int_a^b f(x)\, dx$$
and the third item remains valid for any $a$, $b$, $d$.

Proposition 8.2.1. Let $f, g : [a, b] \to \mathbb{R}$ be integrable and $c \in \mathbb{R}$.

1. The functions $f + g$ and $cf$ are integrable with
$$\int_a^b (f + g)(x)\, dx = \int_a^b f(x)\, dx + \int_a^b g(x)\, dx \quad \text{and} \quad \int_a^b (cf)(x)\, dx = c \int_a^b f(x)\, dx .$$

2. If $f(x) \le g(x)$ for all $x \in [a, b]$ then
$$\int_a^b f(x)\, dx \le \int_a^b g(x)\, dx .$$

3. If $d \in (a, b)$ then $f$ is integrable on $[a, d]$ and on $[d, b]$ with
$$\int_a^b f(x)\, dx = \int_a^d f(x)\, dx + \int_d^b f(x)\, dx .$$

Proof. Let us show item 1 first. For $\epsilon > 0$, take $T$ and $Q$ to be partitions such that
$$L(f, T) \le \int_a^b f(x)\, dx \le U(f, T) < L(f, T) + \epsilon/2$$
and
$$L(g, Q) \le \int_a^b g(x)\, dx \le U(g, Q) < L(g, Q) + \epsilon/2 .$$
Let $T'$ be their common refinement, so that
$$L(f, T') + L(g, T') \le \int_a^b f(x)\, dx + \int_a^b g(x)\, dx < L(f, T') + L(g, T') + \epsilon .$$
On the other hand you can check that
$$L(f, T') + L(g, T') \le L(f + g, T') \le U(f + g, T') \le U(f, T') + U(g, T') .$$
(Here we have used that for bounded functions $h_1$ and $h_2$ and any set $S \subset \mathbb{R}$, $\inf_{x \in S} (h_1(x) + h_2(x)) \ge \inf_{x \in S} h_1(x) + \inf_{x \in S} h_2(x)$, and the corresponding statement $\sup_{x \in S} (h_1(x) + h_2(x)) \le \sup_{x \in S} h_1(x) + \sup_{x \in S} h_2(x)$.) So we find both
$$U(f + g, T') - L(f + g, T') < \epsilon$$
and
$$\left| L(f + g, T') - \int_a^b f(x)\, dx - \int_a^b g(x)\, dx \right| < \epsilon .$$
The first statement implies that $f + g$ is integrable and
$$\left| L(f + g, T') - \int_a^b (f + g)(x)\, dx \right| < \epsilon .$$
Combining this with the second statement gives
$$\left| \int_a^b (f + g)(x)\, dx - \int_a^b f(x)\, dx - \int_a^b g(x)\, dx \right| < 2\epsilon .$$
Since $\epsilon$ is arbitrary this gives the result.

If $c \in \mathbb{R}$, suppose first that $c > 0$ (the case $c = 0$ is trivial). Then for any set $S \subset \mathbb{R}$ and bounded function $h : S \to \mathbb{R}$ we have
$$\sup_{x \in S} (ch)(x) = c \sup_{x \in S} h(x) \quad \text{and} \quad \inf_{x \in S} (ch)(x) = c \inf_{x \in S} h(x) .$$
Therefore for any partition $T$ of $[a, b]$,
$$U(cf, T) = c U(f, T) \quad \text{and} \quad L(cf, T) = c L(f, T) .$$
So given that $f$ is integrable and $\epsilon > 0$, we can choose a partition $T$ such that $U(f, T) - L(f, T) < \epsilon/c$. Then $U(cf, T) - L(cf, T) < \epsilon$, proving that $cf$ is integrable. Furthermore,
$$L(cf, T) = c L(f, T) \le c \int_a^b f(x)\, dx \le c U(f, T) = U(cf, T) \le L(cf, T) + \epsilon ,$$
giving $\left| c \int_a^b f(x)\, dx - L(cf, T) \right| < \epsilon$. However, we already know that
$$L(cf, T) \le \int_a^b (cf)(x)\, dx \le U(cf, T) < L(cf, T) + \epsilon ,$$
giving $\left| \int_a^b (cf)(x)\, dx - L(cf, T) \right| < \epsilon$. Combining these two and taking $\epsilon \to 0$ proves $\int_a^b (cf)(x)\, dx = c \int_a^b f(x)\, dx$.

If instead $c < 0$ then we first prove the case $c = -1$. Then we have for any partition $T$ of $[a, b]$ that $U(-f, T) = -L(f, T)$ and $L(-f, T) = -U(f, T)$. Thus if $U(f, T) - L(f, T) < \epsilon$ we also have $U(-f, T) - L(-f, T) < \epsilon$, proving that $-f$ is integrable. Further, as above,
$$L(-f, T) \le \int_a^b (-f)(x)\, dx < L(-f, T) + \epsilon$$
and
$$-U(f, T) \le - \int_a^b f(x)\, dx < -U(f, T) + \epsilon .$$
Since $L(-f, T) = -U(f, T)$, combining these and taking $\epsilon \to 0$ gives $\int_a^b (-f)(x)\, dx = - \int_a^b f(x)\, dx$. Last, for any $c < 0$ we note that if $f$ is integrable, so is $-f$, and since $-c > 0$, so is $(-c)(-f) = cf$. Further,
$$\int_a^b (cf)(x)\, dx = \int_a^b (-(-cf))(x)\, dx = - \int_a^b (-cf)(x)\, dx = -(-c) \int_a^b f(x)\, dx = c \int_a^b f(x)\, dx .$$

For the second item, we just use the fact that for every partition $T$ of $[a, b]$, $U(f, T) \le U(g, T)$ whenever $f(x) \le g(x)$ for all $x \in [a, b]$. So given $\epsilon > 0$, choose $T$ such that $U(g, T) < \int_a^b g(x)\, dx + \epsilon$. Now
$$\int_a^b f(x)\, dx \le U(f, T) \le U(g, T) < \int_a^b g(x)\, dx + \epsilon .$$
This is true for all $\epsilon > 0$, so we deduce that $\int_a^b f(x)\, dx \le \int_a^b g(x)\, dx$.

We move to the third item. Given $\epsilon > 0$, choose a partition $T$ of $[a, b]$ such that $U(f, T) - L(f, T) < \epsilon$. Now refine $T$ to a partition $Q$ by adding the point $d$. Call $T_1$ the partition of $[a, d]$ obtained from the points of $Q$ up to $d$, and $T_2$ the remaining points of $Q$ (including $d$) that form a partition of $[d, b]$. Then
$$U(f, T_1) - L(f, T_1) = \sum_{i : x_i < d} (M_i - m_i)(x_{i+1} - x_i) \le U(f, Q) - L(f, Q) \le U(f, T) - L(f, T) < \epsilon .$$
This means $f$ is integrable on $[a, d]$. Similarly it is integrable on $[d, b]$. Furthermore, we have
$$L(f, T_1) \le \int_a^d f(x)\, dx \le L(f, T_1) + \epsilon$$
and
$$L(f, T_2) \le \int_d^b f(x)\, dx \le L(f, T_2) + \epsilon .$$
Combining these with
$$L(f, T_1) + L(f, T_2) = L(f, Q) \le \int_a^b f(x)\, dx \le L(f, T_1) + L(f, T_2) + \epsilon ,$$
we find
$$\left| \int_a^b f(x)\, dx - \int_a^d f(x)\, dx - \int_d^b f(x)\, dx \right| < 3\epsilon .$$
Taking $\epsilon$ to zero gives the result.

Let us give one more important property of the integral.

Proposition 8.2.2 (Triangle inequality for integrals). Let $f : [a, b] \to \mathbb{R}$ be integrable. Then so is $|f|$ and
$$\left| \int_a^b f(x)\, dx \right| \le \int_a^b |f(x)|\, dx .$$

Proof. Let $\epsilon > 0$ and choose a partition $T$ of $[a, b]$ such that $U(f, T) - L(f, T) < \epsilon$. For the proof we use the fact (which you can check using the triangle inequality) that for any set $S \subset \mathbb{R}$ and bounded function $f : S \to \mathbb{R}$,
$$\sup_{x \in S} |f(x)| - \inf_{x \in S} |f(x)| \le \sup_{x \in S} f(x) - \inf_{x \in S} f(x) .$$
This implies that
$$U(|f|, T) - L(|f|, T) \le U(f, T) - L(f, T) < \epsilon ,$$
so $|f| \in \mathcal{R}$.

To prove the inequality in the proposition, note that $f(x) \le |f(x)|$ for all $x$, so $\int_a^b f(x)\, dx \le \int_a^b |f(x)|\, dx$. Similarly $-f(x) \le |f(x)|$, so $- \int_a^b f(x)\, dx = \int_a^b (-f(x))\, dx \le \int_a^b |f(x)|\, dx$. Combining these gives the inequality.

In fact this is an instance of a more general theorem, stated in Rudin. We will not prove it; the proof is similar to the above (but more complicated).

Theorem 8.2.3. Suppose that $f : [a, b] \to [c, d]$ is integrable and $\varphi : [c, d] \to \mathbb{R}$ is continuous. Then $\varphi \circ f$ is integrable.

Proof. See Rudin, Thm. 6.11.

From this theorem we find more integrable functions:

• If $f$ is integrable on $[a, b]$ then so is $f^2$. This follows by taking $\varphi(x) = x^2$ in the above theorem.

• If $f$ and $g$ are integrable on $[a, b]$ then so is $fg$. This follows by writing
$$fg = \frac{1}{4} \left[ (f + g)^2 - (f - g)^2 \right] .$$

8.3 Fundamental theorems

Of course we do not always have to compute integrals by hand. As we learn in calculus, we can compute an integral if we know the “antiderivative” of the function. Stated precisely,

Theorem 8.3.1 (Fundamental theorem of calculus part I). Let $f : [a, b] \to \mathbb{R}$ be integrable and $F : [a, b] \to \mathbb{R}$ a continuous function such that $F'(x) = f(x)$ for all $x \in (a, b)$. Then
$$F(b) - F(a) = \int_a^b f(x)\, dx .$$

Proof. Since $f$ is integrable, given $\epsilon > 0$ we can find a partition $T$ such that $U(f, T) - L(f, T) < \epsilon$. We will use the mean value theorem to relate values of $f$ in the subintervals to values of $F$. That is, writing $T = \{x_1 < \cdots < x_n\}$, we can find for each $i = 1, \ldots, n - 1$ a point $c_i \in (x_i, x_{i+1})$ such that
$$F(x_{i+1}) - F(x_i) = f(c_i)(x_{i+1} - x_i) .$$
Then we have
$$L(f, T) \le \sum_{i=1}^{n-1} f(c_i)(x_{i+1} - x_i) \le U(f, T) < L(f, T) + \epsilon .$$
Furthermore
$$L(f, T) \le \int_a^b f(x)\, dx \le U(f, T) < L(f, T) + \epsilon .$$
Using the equation derived by the mean value theorem above,
$$\sum_{i=1}^{n-1} f(c_i)(x_{i+1} - x_i) = \sum_{i=1}^{n-1} \left[ F(x_{i+1}) - F(x_i) \right] = F(b) - F(a) .$$
Combining with the above,
$$\left| \int_a^b f(x)\, dx - [F(b) - F(a)] \right| < \epsilon$$
and we are done.

As we learn in calculus, we are able now to say, for example, that
$$\int_a^b \cos x\, dx = \sin(b) - \sin(a) \quad \text{and} \quad \int_a^b x^n\, dx = \frac{1}{n + 1} \left[ b^{n+1} - a^{n+1} \right] .$$
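As a numerical sanity check of the first formula (an illustration; the interval $[0, 1]$ and the midpoint rule are my choices), a fine Riemann sum of $\cos$ should match $\sin(1) - \sin(0)$:

```python
import math

n = 100_000
a, b = 0.0, 1.0
dx = (b - a) / n
# midpoint Riemann sum of cos over [0, 1]
riemann = sum(math.cos(a + (i + 0.5) * dx) for i in range(n)) * dx
exact = math.sin(b) - math.sin(a)
```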

There is a second fundamental theorem of calculus. Whereas the first is about integrating a derivative, the second is about differentiating an integral. Both of them say that integration and differentiation are inverse operations. For example, in the first, when we start with $F$ and differentiate to get a function $f$, we integrate back to get $F$ (in a sense).

Theorem 8.3.2 (Fundamental theorem of calculus part II). Let $f : [a, b] \to \mathbb{R}$ be continuous. Define $F : [a, b] \to \mathbb{R}$ by
$$F(x) = \int_a^x f(t)\, dt .$$
Then $F$ is differentiable on $[a, b]$ with $F'(x) = f(x)$ for all $x$.

Proof. Let $x \in [a, b)$; the case of $x = b$ is similar and is calculated as a left derivative. For $h > 0$,
$$\frac{F(x + h) - F(x)}{h} = \frac{1}{h} \left[ \int_a^{x+h} f(t)\, dt - \int_a^x f(t)\, dt \right] = \frac{1}{h} \int_x^{x+h} f(t)\, dt .$$
Let $\epsilon > 0$. Since $f$ is continuous at $x$ we can find $\delta > 0$ such that if $|t - x| < \delta$ then $|f(t) - f(x)| < \epsilon$. This means that if $0 < h < \delta$ then
$$\left| \frac{F(x + h) - F(x)}{h} - f(x) \right| = \frac{1}{h} \left| \int_x^{x+h} f(t)\, dt - \int_x^{x+h} f(x)\, dt \right| = \frac{1}{h} \left| \int_x^{x+h} (f(t) - f(x))\, dt \right|$$
$$\le \frac{1}{h} \int_x^{x+h} |f(t) - f(x)|\, dt \le \frac{1}{h} \cdot \epsilon h = \epsilon .$$
In other words,
$$\lim_{h \to 0^+} \left| \frac{F(x + h) - F(x)}{h} - f(x) \right| = 0 .$$
A similar argument works for the left limit (in the case that $x \ne a$), using
$$\frac{F(x - h) - F(x)}{-h} = \frac{1}{h} \int_{x-h}^x f(t)\, dt ,$$
and completes the proof.

8.4 Change of variables, integration by parts

We will now prove the “u-substitution” rule for integrals. As you know from calculus, this is a valuable tool to solve for the value of many definite integrals. The proof is essentially a combination of the chain rule and the fundamental theorem of calculus. Note that in its statement, the range of $f$ is a closed interval. This follows from the fact that $f$ is continuous on a closed interval. Indeed, the image must be connected and compact, therefore a closed interval as well.

Theorem 8.4.1 (Substitution rule). Let $f : [a, b] \to \mathbb{R}$ be $C^1$ and write $[c, d]$ for the range of $f$. If $g : [c, d] \to \mathbb{R}$ is continuous then
$$\int_{f(a)}^{f(b)} g(t)\, dt = \int_a^b g(f(x)) f'(x)\, dx .$$

Proof. Define a function $F : [c, d] \to \mathbb{R}$ by
$$F(x) = \int_{f(a)}^x g(t)\, dt .$$
Then because $g$ is continuous, by the fundamental theorem of calculus II, $F$ is differentiable and $F'(x) = g(x)$ (giving actually $F \in C^1$). Furthermore, as $f$ is differentiable, the function $F \circ f : [a, b] \to \mathbb{R}$ is differentiable with $(F \circ f)'(x) = F'(f(x)) f'(x)$. Last, $F'$ is continuous and $f$ is integrable, so by Theorem 8.2.3, $F' \circ f$ is integrable. Since $f'$ is continuous, it is also integrable, so the product of $F' \circ f$ and $f'$ is integrable. By the fundamental theorem of calculus I,
$$F(f(b)) - F(f(a)) = \int_a^b F'(f(x)) f'(x)\, dx .$$
Plugging in,
$$\int_{f(a)}^{f(b)} g(t)\, dt = \int_a^b g(f(x)) f'(x)\, dx .$$

Just as the substitution rule is related to the chain rule, integration by parts is related
to the product rule.

Theorem 8.4.2 (Integration by parts). Let f, g : [a, b] → R be C¹. Then

∫_a^b f(x) g′(x) dx = f(b)g(b) − f(a)g(a) − ∫_a^b f′(x) g(x) dx .

Proof. This follows from the product rule, since both f′g and fg′ are integrable.
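The formula can likewise be sanity-checked numerically (a sketch with sample functions of our choosing): with f(x) = x and g(x) = eˣ on [0, 1], both sides equal e − (e − 1) = 1.

```python
import math

# Integration by parts with the sample choice f(x) = x, g(x) = e^x on [0, 1]:
# ∫_0^1 x e^x dx = 1·e - 0·1 - ∫_0^1 e^x dx = 1.

def integrate(fn, a, b, n=20000):
    # midpoint-rule approximation of the integral of fn over [a, b]
    h = (b - a) / n
    return sum(fn(a + (i + 0.5) * h) for i in range(n)) * h

lhs = integrate(lambda x: x * math.exp(x), 0.0, 1.0)
rhs = 1.0 * math.e - 0.0 * 1.0 - integrate(math.exp, 0.0, 1.0)
```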

8.5 Exercises

1. Let f : [0, 1] → R be continuous.

(a) Suppose that f(x) ≥ 0 for all x and that ∫_0^1 f(x) dx = 0. Show that f is identically
zero.

(b) Suppose that f is not necessarily non-negative but that ∫_a^b f(x) dx = 0 for all
a, b ∈ [0, 1] with a < b. Show that f is identically zero.

2. Let f : [0, 1] → R be continuous. Show that

lim_{n→∞} ∫_0^1 x^n f(x) dx = 0 .

Hint. For c near 1, consider [0, c] and [c, 1] separately.

3. Let f : [0, 1] → R be continuous. Prove that

lim_{n→∞} ( ∫_0^1 |f(x)|^n dx )^{1/n} = max_{x∈[0,1]} |f(x)| .

4. Define f : [0, 1] → R by

f(x) = 0 if x ∉ Q, and f(x) = 1/n if x = m/n ∈ Q, where m and n have no common divisor.

Use Theorem 6.6 in Rudin to prove that f is Riemann integrable.

5. Let f and g be continuous functions on [0, 1] with g(x) ≥ 0 for all x. Show there exists
c ∈ [0, 1] such that

∫_0^1 f(x) g(x) dx = f(c) ∫_0^1 g(x) dx .

6. (a) Show that the Euler-Mascheroni constant

γ = lim_{n→∞} ( ∑_{k=1}^n 1/k − log n )

exists.

Hint. Write the above quantity as

1/n + ∑_{k=1}^{n−1} 1/k − ∫_1^n dx/x = 1/n + ∑_{k=1}^{n−1} ∫_k^{k+1} (1/k − 1/x) dx .

Show the last sum converges.

(b) Use the last part to find the limit

lim_{n→∞} ( 1/n + ⋯ + 1/(2n) ) .

7. Let {f_n} be a sequence of continuous functions on [0, 1]. Suppose that {f_n} converges
uniformly to a function f. Recall from the last problem set that this means that for any
ε > 0 there exists N such that n ≥ N implies that |f_n(x) − f(x)| < ε for all x ∈ [0, 1].
Show that

lim_{n→∞} ∫_0^1 f_n(x) dx = ∫_0^1 f(x) dx .

Give an example to show that we cannot only assume f_n → f pointwise (meaning that
for each fixed x ∈ [0, 1], f_n(x) → f(x)).

Hint. Use the inequality | ∫_0^1 g(x) dx | ≤ ∫_0^1 |g(x)| dx, valid for any integrable g.

8. Suppose that {f_n} is a sequence of functions in C¹([0, 1]) and that the sequence {f_n′}
converges uniformly to some function g. Suppose there exists some c ∈ [0, 1] such that
the sequence {f_n(c)} converges. By the fundamental theorem of calculus, we can write
for x ∈ [0, 1]

f_n(x) = f_n(c) + ∫_c^x f_n′(t) dt .

(a) Show that {f_n} converges pointwise to some function f.

(b) Show that f is differentiable and f′(x) = g(x) for all x. (You will need to use
Theorem 7.12 in Rudin.)

Remark. The above result gives a method to prove the form of the derivative of a power
series. Suppose that f(x) = ∑_{n=0}^∞ a_n x^n has radius of convergence R > 0. Setting

f_n(x) = ∑_{j=0}^n a_j x^j and g(x) = ∑_{j=1}^∞ j a_j x^{j−1} ,

one can show using the Weierstrass M-test that for any r with 0 < r < R, f_n′ → g
uniformly on (−r, r). We can then conclude that f′(x) = g(x).

9. You can solve either this question or the next one. In this problem we will
show part of Stirling’s formula. It states that

lim_{n→∞} n! / (n^n e^{−n} √n) = √(2π) .

We will only show the limit exists.

(a) Show that

log(n^n / n!) = ∑_{k=1}^{n−1} [ ∫_k^{k+1} log(x/k) dx ] + n − 1 − log n .

Use a change of variable u = x/k and continue to show that this equals

∑_{k=1}^{n−1} [ k ∫_0^{1/k} (log(1 + u) − u) du + 1/(2k) ] + n − 1 − log n .

(b) Prove that n! / (n^n e^{−n} √n) converges if and only if

lim_{n→∞} ∑_{k=1}^{n−1} k ∫_0^{1/k} (log(1 + u) − u) du

exists.

(c) Show that for u ∈ [0, 1],

−u²/2 ≤ log(1 + u) − u ≤ 0

and deduce that the limit in part (b) exists.

Hint. Use Taylor’s theorem.

10. You can solve either this question or the previous one. In this question, you
will work out an alternate derivation of the existence of the limit in Stirling’s formula.

(a) Define a continuous function g such that g(n) = log n for n ∈ N and g(x) is linear
in each interval [n, n + 1]. Show that for n large enough,

log n + (x − n)/(n + 1) ≤ g(x) ≤ log x ≤ log n + (x − n)/n for x ∈ [n, n + 1] .

(b) Let S_n = ∫_1^n (log x − g(x)) dx. Use part (a) to show that (S_n) is Cauchy and thus
converges. Compute directly that

∫_1^n log x dx = n log n − n + 1 and ∫_1^n g(x) dx = log n! − (1/2) log n .

Conclude that the limit in Stirling’s formula exists.
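The limits in exercises 6 and 9 can be illustrated numerically (an illustration only, not a proof; the sample value n = 10⁶ and the use of `math.lgamma` for log n! are our choices): the partial sums ∑_{k≤n} 1/k − log n settle near the Euler-Mascheroni constant, and n!/(n^n e^{−n} √n) near √(2π).

```python
import math

# Numerical illustration of exercises 6 and 9/10.

def gamma_approx(n):
    # partial quantity sum_{k<=n} 1/k - log n, which tends to gamma
    return sum(1.0 / k for k in range(1, n + 1)) - math.log(n)

def stirling_ratio(n):
    # n! / (n^n e^{-n} sqrt(n)), computed via logarithms to avoid overflow;
    # lgamma(n + 1) = log(n!)
    log_ratio = math.lgamma(n + 1) - (n * math.log(n) - n + 0.5 * math.log(n))
    return math.exp(log_ratio)

g = gamma_approx(10**6)
s = stirling_ratio(10**6)
```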


A Real powers

The question is the following: we know what 2² or 2³ means, or even 2^{2/3}, the number whose
cube equals 2². But what does 2^{√2} mean? We will give the definition Rudin has in the
exercises of Chapter 1. We will only use the following facts for r, s > 0 and n, m ∈ Z:

• r^{n+m} = r^n r^m.

• (r^n)^m = r^{mn}.

• (rs)^n = r^n s^n.

• If r > 1 and m ≥ n then r^m ≥ r^n. If r < 1 and m ≥ n then r^m ≤ r^n.

• If s < r and n > 0 then s^n < r^n. If s < r and n < 0 then s^n > r^n.

A.1 Natural roots

We first define the n-th root of a real number, for n ∈ N.

Theorem A.1.1. For any r > 0 and n ∈ N there exists a unique positive real number y
such that y^n = r.

Proof. The proof is Theorem 1.21 in Rudin. The idea is to construct the set

S = {x > 0 : x^n ≤ r}

and to show that S is nonempty and bounded above, and thus has a supremum. Calling this
supremum y, he then shows y^n = r. The proof of this is somewhat involved and is similar to
our proof (from the first lecture) that {a ∈ Q : a² < 2} does not have a greatest element.

To show there is only one such y, we note that 0 < y₁ < y₂ implies that y₁^n < y₂^n, and so
if y₁ ≠ y₂ are positive then y₁^n ≠ y₂^n.
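The supremum construction in the proof can be mimicked numerically by bisection (a sketch; the iteration count and interval choice are ours): the n-th root of r is sup S with S = {x > 0 : x^n ≤ r}, and bisection repeatedly tests membership in S.

```python
# Sketch of the supremum construction behind Theorem A.1.1:
# the n-th root of r > 0 is sup{x > 0 : x^n <= r}.

def nth_root(r, n, iters=200):
    lo, hi = 0.0, max(1.0, r)   # hi^n >= r, so the supremum lies in [lo, hi]
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid ** n <= r:
            lo = mid            # mid belongs to S = {x > 0 : x^n <= r}
        else:
            hi = mid            # mid is an upper bound for S
    return lo

y = nth_root(2.0, 2)            # approximates the square root of 2
```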

This definition extends to integer roots.

Definition A.1.2. If r > 0 and n ∈ N we define r^{−1/n} as the unique positive real number y
such that y^n = 1/r.

A.2 Rational powers

The above definitions allow us to define rational powers.

Definition A.2.1 (Preliminary definition of rational powers). If r > 0 and m, n ∈ N we
define r^{m/n} to be the unique positive real number y such that y^n = r^m. Also, r^{−m/n} is defined
as (1/r)^{m/n}.

Because a rational number can have more than one representation m/n, we need to show
this is well defined.

Proposition A.2.2. If a positive a ∈ Q can be represented by m/n and p/q for m, n, p, q ∈ N
then for all r > 0,

r^{m/n} = r^{p/q} .

Proof. First note that (r^{m/n})^{nq} = ((r^{m/n})^n)^q = r^{mq} and (r^{p/q})^{nq} = ((r^{p/q})^q)^n = r^{pn}. However,
as m/n = p/q we have pn = mq, and so these numbers are equal. There is a unique nq-th
root of this number, so r^{m/n} = r^{p/q}.

Note that the above proof applies to negative rational powers: suppose that r > 0 and
a ∈ Q is negative such that a = −m/n = −p/q. Then

r^{−m/n} = (1/r)^{m/n} = (1/r)^{p/q} = r^{−p/q} .

Definition A.2.3 (Correct definition of rational powers). If r > 0 and a > 0 is rational
we define r^a = r^{m/n} for any m, n ∈ N such that a = m/n. If a < 0 is rational we define
r^a = (1/r)^{−a}.

Properties of rational powers. Let a, b ∈ Q and r, s > 0.

• If a = m/n for m ∈ Z and n ∈ N then r^a is the unique positive number such that
(r^a)^n = r^m.

Proof. For m ≥ 0 this is the definition. For m < 0, this is because (r^a)^n = ((1/r)^{−a})^n =
(1/r)^{−m} = r^m, and if s is any other positive number satisfying s^n = r^m then uniqueness
of n-th roots gives s = r^a.

• r^{a+b} = r^a r^b.

Proof. Choose m, p ∈ Z and n, q ∈ N such that a = m/n and b = p/q. Then a + b =
(mq + np)/(nq) and therefore r^{a+b} is the unique positive number such that (r^{a+b})^{nq} = r^{mq+np}.
But we can just compute

(r^a r^b)^{nq} = ((r^a)^n)^q ((r^b)^q)^n = r^{mq} r^{np} = r^{mq+np} .

And by uniqueness we get r^a r^b = r^{a+b}.

• (r^a)^b = r^{ab}.

Proof. Write a = m/n and b = p/q for m, p ∈ Z and n, q ∈ N. Then r^{ab} is the unique
positive number such that (r^{ab})^{nq} = r^{mp}. But

((r^a)^b)^{nq} = (((r^a)^b)^q)^n = ((r^a)^p)^n = ((r^a)^n)^p = (r^m)^p = r^{mp} ,

giving (r^a)^b = r^{ab}.

• (rs)^a = r^a s^a.

Proof. Again write a = m/n for m ∈ Z and n ∈ N. Then (rs)^a is the unique positive
number such that ((rs)^a)^n = (rs)^m. But

(r^a s^a)^n = (r^a)^n (s^a)^n = r^m s^m = (rs)^m .

• If r > 1 and a ≥ b then r^a ≥ r^b. If r < 1 and a ≥ b are rational then r^a ≤ r^b.

Proof. Suppose first that r > 1 and a > 0 with a = m/n for m, n ∈ N. Then if r^a < 1,
we find r^m = (r^a)^n < 1^n = 1, a contradiction, as r^m > 1. So r^a > 1. Next, if a ≥ b then
a − b ≥ 0, so r^{a−b} ≥ 1. This gives r^a = r^{a−b} r^b ≥ r^b.

If r < 1 then r^a (1/r)^a = 1^a = 1, so r^{−a} = (1/r)^a. Similarly r^{−b} = (1/r)^b. So since
1/r > 1 we get r^{−a} = (1/r)^a ≥ (1/r)^b = r^{−b}. Multiplying both sides by r^a r^b we get
r^a ≤ r^b.

• If s < r and a > 0 then s^a < r^a. If s < r and a < 0 then s^a > r^a.

Proof. Let a = m/n with m, n ∈ N. Then if s^a ≥ r^a we must have s^m = (s^a)^n ≥
(r^a)^n = r^m. But this is a contradiction since s^m < r^m. This proves the first statement.
For the second, 1/s > 1/r, so s^a = (1/s)^{−a} > (1/r)^{−a} = r^a.

A.3 Real powers

We define a real power as a supremum of rational powers.

Definition A.3.1. Given r > 1 and t ∈ R we set

r^t = sup{r^a : a ∈ Q and a ≤ t} .

If 0 < r < 1 then define r^t = (1/r)^{−t}.

Proposition A.3.2. If a ∈ Q then for r > 0, the definition above coincides with the rational
definition.

Proof. For this proof, we take r^a to be defined as in the rational powers section.

Suppose first that r > 1. Clearly r^a ∈ {r^b : b ∈ Q and b ≤ a}. So to show it is the
supremum we need only show it is an upper bound. This follows from the fact that b ≤ a
implies r^b ≤ r^a (proved above).

If 0 < r < 1 then r^a = (r^{−1})^{−a} = (1/r)^{−a}, so the definitions coincide here as well.
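Definition A.3.1 can be seen in action numerically (a sketch; the denominators and the use of `fractions.Fraction` are our choices): taking rational a ≤ √2 with growing denominators, the values 2^a increase toward 2^{√2}, mirroring the supremum.

```python
from fractions import Fraction

# Sketch of Definition A.3.1 for r = 2, t = sqrt(2): rational lower
# approximations a <= t give values 2^a increasing toward 2^sqrt(2).

t = 2 ** 0.5
approximations = []
for denom in (10, 100, 1000, 10000):
    a = Fraction(int(t * denom), denom)   # rational a with a <= t
    approximations.append(2.0 ** float(a))
```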

Properties of real powers. Let t, u ∈ R and r, s > 0.

• r^{t+u} = r^t r^u.

Proof. We will use the following statement, proved on the homework. If A and B are
nonempty subsets of [0, ∞) which are bounded above, define AB = {ab : a ∈ A, b ∈ B}. We have

sup(AB) = sup A · sup B . (7)

If either of the sets consists only of 0, then the supremum of that set is 0 and both
sides above are 0. Otherwise, both sets (and therefore also AB) contain positive
elements. For any element c ∈ AB we have c = ab for some a ∈ A, b ∈ B. Therefore
c = ab ≤ sup A · sup B, and so this is an upper bound for AB. As sup(AB) is the
least upper bound, we get sup(AB) ≤ sup A · sup B. Assuming now for a contradiction
that we have strict inequality, because sup A > 0 we also have sup(AB)/sup A < sup B.
Thus there exists b ∈ B such that sup(AB)/sup A < b. As b must be positive, we also
have sup(AB)/b < sup A, and there exists a ∈ A such that sup(AB)/b < a, giving
sup(AB) < ab. This is clearly a contradiction.

Now to prove the property, suppose first that r > 1. By the statement we just proved,
we need only show that

{r^b : b ∈ Q and b ≤ t + u} = AB ,

where A = {r^c : c ∈ Q and c ≤ t} and B = {r^d : d ∈ Q and d ≤ u}. (This applies because
these are sets of non-negative numbers.) This holds because each rational b ≤ t + u
can be written as a sum of two rationals c, d such that c ≤ t and d ≤ u.

For 0 < r < 1 we have r^{t+u} = (1/r)^{−(t+u)} = (1/r)^{(−t)+(−u)} = (1/r)^{−t} (1/r)^{−u} = r^t r^u.

• (rs)^t = r^t s^t.

Proof. We first note that

r^{−t} = 1/r^t . (8)

This is true because r^{−t} r^t = r^0 = 1, so r^{−t} = 1/r^t.

For the property, if r, s > 1 then we can just use equation (7), noting that {(rs)^a :
a ∈ Q and a ≤ t} = AB, where A = {r^a : a ∈ Q and a ≤ t} and B = {s^a : a ∈
Q and a ≤ t}. If 0 < r < 1 but s > 1 with rs > 1, we get (rs)^t / r^t = (rs)^t (1/r)^t.
We now use equation (7) again, noting that {s^a : a ∈ Q and a ≤ t} = AB, where
A = {(rs)^a : a ∈ Q and a ≤ t} and B = {(1/r)^a : a ∈ Q and a ≤ t}. This gives
(rs)^t / r^t = s^t. This same proof works if r > 1 but 0 < s < 1 with rs > 1. If 0 < r < 1
but s > 1 with rs < 1, we consider s^t / (rs)^t = s^t (1/(rs))^t and use the above argument.
This also works in the case r > 1 but 0 < s < 1 with rs < 1. Finally, if 0 < r < 1 and
0 < s < 1 then (rs)^t = (1/(rs))^{−t} = ((1/r)(1/s))^{−t} = (1/r)^{−t} (1/s)^{−t} = r^t s^t.

• (r^t)^u = r^{tu}.

Proof. We will first show the equality in the case r > 1 and t, u > 0. We begin with
the fact that (r^t)^u is an upper bound for {r^a : a ∈ Q and a ≤ tu}. So let a ≤ tu be
rational and assume further that a > 0. In this case we can write a = bc for b, c ∈ Q
with b ≤ t and c ≤ u. By properties of rational exponents, we have r^a = (r^b)^c. As r^b ≤ r^t
(by definition) we get from monotonicity that (r^b)^c ≤ (r^t)^c. But this is an element of
the set {(r^t)^d : d ∈ Q and d ≤ u}, so (r^t)^c ≤ (r^t)^u. Putting these together,

r^a = (r^b)^c ≤ (r^t)^c ≤ (r^t)^u .

This shows that (r^t)^u is an upper bound for {r^a : a ∈ Q and 0 < a ≤ tu}. For the case
that a < 0 we can use monotonicity to write r^a ≤ r^0 ≤ (r^t)^u. Putting this together
with the case a > 0 gives that (r^t)^u is an upper bound for {r^a : a ∈ Q and a ≤ tu},
and therefore r^{tu} ≤ (r^t)^u.

To prove that (r^t)^u ≤ r^{tu} we must show that r^{tu} is an upper bound for {(r^t)^a : a ∈
Q and a ≤ u}. For this we observe that r^t > 1. This holds because t > 0 and therefore
we can find some rational b with 0 < b < t. Thus r^t ≥ r^b > r^0 = 1. Now let a be
rational with 0 < a ≤ u; we claim that (r^t)^a ≤ r^{tu}. Proving this will suffice, since if
a < 0 then (r^t)^a < (r^t)^0 = 1 ≤ r^{tu}. To show the claim, note that if we show that

r^t ≤ (r^{tu})^{1/a}

we will be done. This is by properties of rational exponents: we would then have

(r^t)^a ≤ ((r^{tu})^{1/a})^a = r^{tu} .

So we are reduced to proving that

sup{r^b : b ∈ Q and b ≤ t} ≤ (r^{tu})^{1/a} ,

which follows if we show that for each b ∈ Q such that b ≤ t, we have r^b ≤ (r^{tu})^{1/a}.
Again, this is true if r^{ab} ≤ r^{tu}, because then r^b = (r^{ab})^{1/a} ≤ (r^{tu})^{1/a}. But b ≤ t and
a ≤ u, so r^{ab} ≤ r^{tu}. This completes the proof of (r^t)^u = r^{tu} in the case r > 1 and
t, u > 0.

In the case r > 1 but t > 0 and u < 0, we can use (8):

(r^t)^u = 1/(r^t)^{−u} = 1/r^{−tu} = r^{tu} .

If instead r > 1 but t < 0 and u > 0,

(r^t)^u = (1/r^{−t})^u = 1/(r^{−t})^u = 1/r^{−tu} = r^{tu} .

Here we have used that for s > 0 and x ∈ R, (1/s)^x = 1/s^x, which can be verified as
1 = (s(1/s))^x = s^x (1/s)^x. Last, if r > 1 but t < 0 and u < 0, we compute

(r^t)^u = ((1/r)^{−t})^u = 1/(r^{−t})^u = 1/r^{−tu} = r^{tu} ,

completing the proof in the case r > 1.

If 0 < r < 1 then

(r^t)^u = ((1/r)^{−t})^u = (1/r)^{−tu} = r^{tu} .

• If r > 1 and u ≤ t then r^u ≤ r^t. If 0 < r < 1 and u ≤ t then r^u ≥ r^t.

Proof. Assume r > 1. If u = 0 and t > 0 then we can find a rational b such that
0 < b ≤ t, giving r^t ≥ r^b > r^0 = 1. For general u ≤ t we note 1 ≤ r^{t−u}, so multiplying
both sides by the (positive) r^u we get the result.

If 0 < r < 1 then r^u = (1/r)^{−u} ≥ (1/r)^{−t} = r^t.

• If s < r and t > 0 then s^t < r^t. If s < r and t < 0 then s^t > r^t.

Proof. First consider the case that s = 1. Then r > 1, and for any t > 0 we can find a
rational b such that 0 < b < t. Therefore r^t ≥ r^b > r^0 = 1. For general s < r we write
r^t = s^t (r/s)^t > s^t. If t < 0 then s^t = (1/s)^{−t} > (1/r)^{−t} = r^t.

B Logarithm and exponential functions

B.1 Logarithm

We will use the integral definition of the natural logarithm. For x > 0 define

log x = ∫_1^x (1/t) dt .

This is defined because 1/t is continuous on (0, ∞).

Properties of logarithm.

• log 1 = 0.

• log is C^∞ on (0, ∞).

• log is strictly increasing and therefore injective.

Proof. The derivative is 1/x, which is positive.

• For x, y > 0, log(xy) = log x + log y. Therefore log(1/x) = −log x.

Proof. For a fixed y > 0 define f(x) = log(xy) − log y. We have

f′(x) = y/(xy) = 1/x = (d/dx) log x .

Therefore f(x) − log x has zero derivative and must be a constant. Taking x = 1, we get

f(1) − log 1 = log y − log y = 0 ,

so f(x) = log x. This completes the proof.

• The range of log is R.

Proof. We first claim that lim_{x→∞} log x = ∞. Because log is strictly increasing, it
suffices to show that the set {log x : x > 0} is unbounded above. Note that

log 2 = ∫_1^2 (1/t) dt ≥ ∫_1^2 (1/2) dt = 1/2 .

Therefore log(2^n) = n log 2 ≥ n/2. This proves the claim.

Because log is continuous and approaches infinity as x → ∞, the intermediate value
theorem, combined with the fact that log 1 = 0, implies that the range of log includes
[0, ∞). Using log(1/x) = −log x, we get all of R.
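The integral definition and the additivity property can be checked numerically (a sketch; the midpoint rule and the sample pair x = 2, y = 3 are our choices): approximate log x = ∫_1^x dt/t by a midpoint sum and verify log(xy) = log x + log y.

```python
import math

# Approximate log x = ∫_1^x dt/t by a midpoint sum and check
# log(xy) = log(x) + log(y) for the sample pair x = 2, y = 3.

def log_int(x, n=20000):
    h = (x - 1.0) / n
    return sum(1.0 / (1.0 + (i + 0.5) * h) for i in range(n)) * h

x, y = 2.0, 3.0
lhs = log_int(x * y)
rhs = log_int(x) + log_int(y)
```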

B.2 Exponential function

Because log is strictly increasing and differentiable, exercise 1, Chapter 7 implies that the
inverse function of log exists and is differentiable. We define the inverse to be the exponential
function:

for x ∈ R, e^x is the number such that log(e^x) = x .

Its derivative can be found using the chain rule:

x = log(e^x), so 1 = (1/e^x) (d/dx) e^x ,

or

(d/dx) e^x = e^x .

Properties of exponential.

• e^0 = 1.

• e^x is C^∞ on R.

• For x, y ∈ R, e^{x+y} = e^x e^y.

Proof. From properties of log,

log(e^{x+y}) = x + y = log(e^x) + log(e^y) = log(e^x e^y) .

Since log is injective, this shows e^{x+y} = e^x e^y.

• e^x > 0 for all x. Therefore the exponential function is strictly increasing.

Proof. Because e^x is the inverse function of log x, which is defined on (0, ∞), its range
is (0, ∞), giving e^x > 0.

• For any x,

e^x = ∑_{n=0}^∞ x^n / n! .

Proof. This follows from Taylor’s theorem. For any x, the n-th derivative of the
exponential function evaluated at x is simply e^x. Therefore, expanding at x = 0, for any
N ≥ 1,

e^x = ∑_{n=0}^{N−1} (f^{(n)}(0)/n!) x^n + (f^{(N)}(c_N)/N!) x^N = ∑_{n=0}^{N−1} x^n/n! + (e^{c_N}/N!) x^N ,

with c_N some number between 0 and x. This remainder term is bounded by

| (e^{c_N}/N!) x^N | ≤ e^{|x|} |x|^N / N! → 0 ,

because x^N/N! → 0 as N → ∞. This follows because the ratio test gives convergence
of ∑ x^n/n!, so the n-th term must go to 0. By the corollary to Taylor’s theorem, we
get e^x = ∑_{n=0}^∞ x^n/n!.

• Writing e = e^1, the exponential function is the x-th power of e (defined earlier in terms
of suprema).

Proof. For ease of reading, write exp(x) for the function we have defined here and e^x
for the x-th power of e, defined in terms of suprema. Then for x = m/n ∈ Q with
n ∈ N, we have

(exp(x))^n = exp(m/n)^n = exp(m) = exp(1)^m = e^m .

Because e^{m/n} was defined as the unique positive number y such that y^n = e^m, we have
exp(x) = e^x. Generally for x ∈ R we defined

e^x = sup{e^q : q ∈ Q and q < x} .

(This was the definition for exponents whose bases are ≥ 1, which is true in our case
because e^1 ≥ e^0 = 1.) Using equivalence over rationals,

e^x = sup{exp(q) : q ∈ Q and q < x} .

However exp is an increasing function, so writing S for the set whose supremum we take
above, exp(x) ≥ sup S = e^x. On the other hand, because exp is continuous at x, we can
pick any sequence q_n of rationals converging up to x, and we have exp(q_n) → exp(x).
This implies that exp(x) − r is not an upper bound for S for any r > 0, and therefore
exp(x) = sup S = e^x.
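The Taylor series can be tested directly (a sketch; the term count and sample inputs are our choices): partial sums of ∑ x^n/n! rapidly match `math.exp`.

```python
import math

# Partial sums of sum x^n / n! converge to e^x; compare with math.exp.

def exp_series(x, terms=30):
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)   # update x^n/n! -> x^{n+1}/(n+1)!
    return total

xs = (-2.0, 0.0, 1.0, 3.0)
vals = [exp_series(x) for x in xs]
refs = [math.exp(x) for x in xs]
```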

We now show that the exponential function can be obtained as the standard limit

e^x = lim_{n→∞} (1 + x/n)^n .

First we use the binomial formula:

(1 + x/n)^n = ∑_{j=0}^n C(n, j) (x/n)^j = ∑_{j=0}^n [n!/(j!(n − j)!)] (x/n)^j
  = ∑_{j=0}^n [n(n − 1)⋯(n − j + 1)/n^j] x^j/j! .

To show the limit, let ε > 0. By convergence of ∑_{j=0}^∞ |x|^j/j!, we may choose J such that

∑_{j=J+1}^∞ |x|^j/j! < ε/3 .

Because ∑_{j=0}^J [n(n − 1)⋯(n − j + 1)/n^j] x^j/j! is a finite sum and the j-th term approaches
x^j/j! as n → ∞, we can pick N such that if n ≥ N then

| ∑_{j=0}^J [n(n − 1)⋯(n − j + 1)/n^j] x^j/j! − ∑_{j=0}^J x^j/j! | < ε/3 .

Thus by the triangle inequality, we find for n ≥ N,

| (1 + x/n)^n − ∑_{j=0}^∞ x^j/j! |
  ≤ | ∑_{j=J+1}^∞ x^j/j! |
  + | ∑_{j=0}^J [n(n − 1)⋯(n − j + 1)/n^j] x^j/j! − ∑_{j=0}^J x^j/j! |
  + | ∑_{j=J+1}^n [n(n − 1)⋯(n − j + 1)/n^j] x^j/j! |
  < 2ε/3 + ∑_{j=J+1}^n |x|^j/j! < ε .
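The limit is easy to observe numerically (an illustration; the sample value x = 1.5 is ours): the gap between (1 + x/n)^n and e^x shrinks, roughly like 1/n.

```python
import math

# The gap between (1 + x/n)^n and e^x shrinks as n grows.

x = 1.5
gaps = [abs((1 + x / n) ** n - math.exp(x)) for n in (10, 100, 1000, 10000)]
```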

B.3 Sophomore’s dream

We end this appendix with a strange identity that is for some reason called the “Sophomore’s
dream.” It is

∫_0^1 x^{−x} dx = ∑_{n=1}^∞ n^{−n} .

To prove this, we need to define the function x^{−x}. It is given by x^{−x} = exp(−x log x). So
the identity reads

∫_0^1 e^{−x log x} dx = ∑_{n=1}^∞ n^{−n} .

For this integral to make sense, as the integrand is not defined at x = 0, we must use the
right limit. Note that by l’Hopital’s rule (which we didn’t cover, but it is in Rudin),

lim_{x→0⁺} (−x log x) = lim_{x→0⁺} (−log x)/(1/x) = lim_{x→0⁺} (−1/x)/(−1/x²) = 0 .

Using continuity of the exponential function, we find

lim_{x→0⁺} x^{−x} = 1 ,

so we can continuously extend x^{−x} to 0 by defining 0^0 = 1. Thus x^{−x} is then continuous on
[0, 1] and integrable.

To find the integral we will use the power series expansion of e^x:

e^x = ∑_{n=0}^∞ x^n/n! ,

which has radius of convergence R = ∞. Therefore by the remark after exercise 6, Chapter
8, for any M > 0, this series converges uniformly for x in [−M, M]. (The proof uses the
Weierstrass M-test.) Because the number |x log x| is bounded by e^{−1} on the interval [0, 1]
(do some calculus),

e^{−x log x} = ∑_{n=0}^∞ (−x log x)^n/n! converges uniformly on [0, 1] .

We now use exercise 5, Chapter 8, which says that if (f_n) is a sequence of continuous functions
that converges uniformly on [0, 1] to a function f, then ∫_0^1 f_n(x) dx → ∫_0^1 f(x) dx. Noting
that an infinite series of functions is just a limit of the sequence of partial sums (which
converges uniformly in our case), we get

∫_0^1 x^{−x} dx = ∑_{n=0}^∞ ∫_0^1 (−x log x)^n/n! dx = ∑_{n=0}^∞ (1/n!) ∫_0^1 (−x log x)^n dx .

Now we compute the integral ∫_0^1 (−x log x)^n dx using integration by parts. We take
u = (−log x)^n and dv = x^n dx to get du = (−1)^n n (log x)^{n−1}/x dx and v = x^{n+1}/(n + 1):

∫_0^1 (−x log x)^n dx = [ (−log x)^n x^{n+1}/(n + 1) ]_0^1 − (−1)^n (n/(n + 1)) ∫_0^1 x^n (log x)^{n−1} dx
  = (n/(n + 1)) ∫_0^1 x^n (−log x)^{n−1} dx .

Repeating this, we find

∫_0^1 (−x log x)^n dx = n!/(n + 1)^{n+1} .

So plugging back in, we find

∫_0^1 x^{−x} dx = ∑_{n=0}^∞ 1/(n + 1)^{n+1} = ∑_{n=1}^∞ n^{−n} .
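The identity can also be checked numerically (an illustration only; the quadrature resolution and series cutoff are our choices): a midpoint quadrature of ∫_0^1 x^{−x} dx and a partial sum of ∑ n^{−n} agree closely.

```python
import math

# Numerical check of the Sophomore's dream identity.

def integrand(x):
    return math.exp(-x * math.log(x)) if x > 0 else 1.0   # using 0^0 := 1

n = 200000
h = 1.0 / n
integral = sum(integrand((i + 0.5) * h) for i in range(n)) * h
series = sum(k ** (-k) for k in range(1, 20))   # tail beyond k = 19 is negligible
gap = abs(integral - series)
```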

C Dimension of the Cantor set

In this section we will discuss how to assign a dimension to the Cantor set. One way is
through the use of Hausdorff dimension. We will start with definitions and examples. This
treatment is based on notes of J. Shah from UChicago.

C.1 Definitions

For any set S ⊂ R write |S| for the diameter of S:

|S| = sup{|x − y| : x, y ∈ S} .

For example, we have |[0, 1]| = 1, |Q ∩ [0, 1]| = 1 and |(0, 1) ∪ (2, 3)| = 3.

Definition C.1.1. Let S ⊂ R. A countable collection {C_n} of subsets of R is called a
countable cover of S if

S ⊂ ∪_{n=1}^∞ C_n .

Note that the sets in a countable cover can be any sets whatsoever. For example, they
do not need to be open or closed.

Definition C.1.2. If {C_n} is a countable collection of sets in R and α > 0, the α-total
length of {C_n} is

∑_{n=1}^∞ |C_n|^α .

If α > 1 then raising to the power α has the effect of increasing the diameter (that is,
|C_n|^α > |C_n|) when |C_n| is large (bigger than 1) and decreasing it when |C_n| is small (less
than 1).

Example 1. Consider the interval [0, 1]. Let us build a very simple cover of this set by
fixing n and choosing our (finite) cover {C_1, …, C_n} with

C_i = [(i − 1)/n, i/n] .

For instance, for n = 4 we have

[0, 1/4], [1/4, 1/2], [1/2, 3/4] and [3/4, 1] .

Computing the α-total length of this cover:

∑_{i=1}^n | [(i − 1)/n, i/n] |^α = n/n^α .

The limit as n approaches ∞ is ∞ if α < 1, 1 if α = 1, and 0 if α > 1.

This result gives us some hint that the dimension of a set is related to the α-total length of
countable covers of the set. Specifically, we make the following definition:

Definition C.1.3. If S ⊂ R has |S| < ∞ and α > 0 we define the α-covered length of S as

H_α(S) = inf{ ∑_{n=1}^∞ |C_n|^α : {C_n} is a countable cover of S } .

The Hausdorff dimension is defined as

dim_H(S) = inf{α > 0 : H_α(S) = 0} .

It is an exercise to show that for all 0 < α < dim_H(S), we have H_α(S) > 0. Also, setting
0^0 = 1, we have H_0(S) > 0 for all S. Thus we could equally define the Hausdorff dimension as

sup{α ≥ 0 : H_α(S) > 0} .

Note that example 1 shows that dim_H([0, 1]) ≤ 1. To show the other inequality, we must
show that for all α < 1, H_α([0, 1]) > 0. To do this, let {C_n} be a countable cover of [0, 1].
We may replace the C_n’s by D_n = C_n ∩ [0, 1], since the D_n’s will still cover [0, 1] and will
have smaller α-total length. For α < 1 we then have

∑_{n=1}^∞ |D_n|^α ≥ ∑_{n=1}^∞ |D_n| ,

because |D_n| ≤ 1. Now it suffices to show:

Lemma C.1.4. If {D_n} is a countable cover of [0, 1] then

∑_{n=1}^∞ |D_n| ≥ 1 .

Proof. The proof is an exercise.

If the concept of Hausdorﬀ dimension is to agree with our current notion of dimension

it had better be that each subset of R has dimension no bigger than 1. This is indeed the

case; we can argue similarly to before. If S ⊂ R has [S[ < ∞ then we can ﬁnd M > 0 such

that S ⊂ [−M, M]. Now for each n deﬁne a cover ¦C

1

, . . . , C

n

¦ by

C

i

=

_

−M + 2M

i −1

n

, −M + 2M

i

n

_

.

As before, for α > 1, the α-total length of ¦C

1

, . . . , C

n

¦ is

n

_

2M

n

_

α

→0 as n →∞ .

Therefore H

α

(S) = 0 and dim

H

(S) ≤ 1.

Example 2. Take S to be any countable set with finite diameter (for instance the rationals
in [0, 1]). We claim that dim_H(S) = 0. To show this we must prove that for all α > 0,
H_α(S) = 0. Let ε > 0 and define a countable cover of S by first enumerating the elements
of S as {s_1, s_2, …} and, for n ∈ N, letting C_n be any interval containing s_n of length (ε/2^n)^{1/α}
(note that this is a positive number). Then the α-total length of the cover is

∑_{n=1}^∞ ε/2^n = ε ;

therefore H_α(S) ≤ ε. This is true for all ε > 0, so H_α(S) = 0.

C.2 The Cantor set

Let S be the Cantor set. To remind you, the construction is as follows. We start with
S_0 = [0, 1]. We remove the middle third of S_0 to get S_1 = [0, 1/3] ∪ [2/3, 1]. In general, at
the k-th step we have a set S_k which is a union of 2^k intervals of length 3^{−k}. We then remove
the middle third of each interval to get S_{k+1}. The definition of S is

S = ∩_{k=0}^∞ S_k .

Theorem C.2.1. The Hausdorff dimension of the Cantor set is

dim_H(S) = log 2/log 3 = log_3 2 .

Proof. Set α = log 2/log 3. We first prove that dim_H(S) ≤ α. For this we must show that
if β > α then H_β(S) = 0. Pick k ≥ 0 and let I_1, …, I_{2^k} be the intervals of length 3^{−k} that
comprise S_k, the set at the k-th level of the construction of the Cantor set. Since S ⊂ S_k,
this is a cover of S. We compute the β-total length of the cover. It is

∑_{j=1}^{2^k} |I_j|^β = ∑_{j=1}^{2^k} 3^{−βk} = e^{k[log 2 − β log 3]} ,

and this approaches zero as k → ∞, since β > α means log 2 − β log 3 < 0. Note that we
have used above that, for example, 2^k = e^{k log 2}. Therefore H_β(S) = 0 and dim_H(S) ≤ α.

For the other direction (to prove dim_H(S) ≥ α) we will show that H_α(S) > 0. Let {C_n}
be a countable cover of S. We will give a lower bound on the α-total length of {C_n}. As before,
we may assume that each C_n is actually a subset of [0, 1]. By compactness one can show the
following:

Lemma C.2.2. Given ε > 0 there exist finitely many open intervals D_1, …, D_m such that

∪_{n=1}^∞ C_n ⊂ ∪_{j=1}^m D_j and ∑_{j=1}^m |D_j|^α < ∑_{n=1}^∞ |C_n|^α + ε .

Proof. The proof is an exercise. The idea is to first replace the C_n’s by closed intervals and
then slightly widen them, while making them open. Then use compactness.

Now choose k such that

(1/3)^k ≤ min{|D_j| : j = 1, …, m} .

For l = 1, …, k let N_l be the number of sets D_j such that 3^{−l} ≤ |D_j| < 3^{−l+1}. Using
α = log 2/log 3 and the definition of k, we find

∑_{j=1}^m |D_j|^α ≥ ∑_{l=1}^k N_l 3^{−lα} = ∑_{l=1}^k N_l 2^{−l} , (9)

so we will give a lower bound for the right side. Suppose that D_j has 3^{−l} ≤ |D_j| < 3^{−l+1}.
Then D_j can intersect at most 2 of the intervals in S_l, the l-th step in the construction of
the Cantor set. Since each of these intervals produces 2^{k−l} subintervals at the k-th step of
the construction, we find that D_j contains at most 2 · 2^{k−l} subintervals at the k-th step of
the construction. But there are only 2^k subintervals at the k-th step, so we find

2^k ≤ ∑_{l=1}^k N_l · 2 · 2^{k−l} , or 1/2 ≤ ∑_{l=1}^k N_l 2^{−l} .

Combining this with (9),

∑_{j=1}^m |D_j|^α ≥ 1/2 .

Now using the previous lemma with ε = 1/4, we get ∑_{n=1}^∞ |C_n|^α > 1/4 and H_α(S) ≥ 1/4.
Thus dim_H(S) ≥ α.
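The first half of the proof can be illustrated numerically (a sketch; the offsets ±0.1 and the stage values are our choices): the β-total length of the stage-k cover, 2^k · 3^{−βk}, tends to 0 for β above log 2/log 3 and blows up for β below it.

```python
import math

# Beta-total length of the stage-k cover of the Cantor set by 2^k intervals
# of length 3^{-k}: it is 2^k * 3^{-beta*k} = e^{k(log 2 - beta log 3)}.

alpha = math.log(2) / math.log(3)

def total_length(beta, k):
    return (2 ** k) * (3.0 ** (-beta * k))

above = [total_length(alpha + 0.1, k) for k in (10, 20, 40)]  # shrinks to 0
below = [total_length(alpha - 0.1, k) for k in (10, 20, 40)]  # blows up
```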

C.3 Exercises

1. Prove Lemma C.1.4.

2. Prove Lemma C.2.2.

3. Prove that if S ⊂ R with |S| < ∞ has nonempty interior then dim_H(S) = 1.

4. What is the Hausdorff dimension of a modified Cantor set where we remove the middle
1/9-th of our intervals?

5. What is the Hausdorff dimension of the modified Cantor set from exercise 15, Chapter
3?
