Writing Proofs in Analysis
Jonathan M. Kane
Jonathan M. Kane
Department of Mathematics
University of Wisconsin - Madison
Madison, WI, USA
I wish to thank Natalya St. Clair for her excellent work creating the illustrations
appearing in this textbook. She took my crude sketches and vague ideas and turned
them into pleasing artwork and instructive diagrams. I also wish to thank Daniel M.
Kane, Alan Gluchoff, Thomas Drucker, and Walter Stromquist for their insightful
comments about the presentation, content, and correctness of the text.
Preface
After learning to solve many types of problems such as those found in the first
courses in Algebra, Geometry, Trigonometry, and Calculus, mathematics students
are usually exposed to a “transition” course where they are expected to write proofs
of various theorems. I taught such a course for a dozen years and was never satisfied
with the textbooks available for that course. Although such textbooks often teach
the fundamentals of logic (conditionals, biconditionals, negations, truth tables) and
give some common proof strategies such as mathematical induction, the textbooks
failed to teach what a student needs to be thinking about when trying to construct a
proof. Many of these books present a great number of well-written proofs and then
ask students to write proofs of similar statements in the hope that the students will
be able to mimic what they have seen. Some of these books are also designed to be
used as an introductory textbook in Analysis, Abstract Algebra, Topology, Number
Theory, or Discrete Mathematics, and, as such, they concentrate more on explaining
the fundamentals of those topic areas than on the fundamentals of writing good
proofs.
This Book Is Not Your Traditional Transition Textbook. The goal of this book
is to give the student precise training in the writing of proofs by explaining what
elements make up a correct proof, by teaching how to construct an acceptable proof,
by explaining what the student is supposed to be thinking about when trying to write
a proof, and by warning about pitfalls that result in incorrect proofs. In particular,
this book was written with the following directives:
• Unlike many transition books, which do not give enough instruction about how
to write proofs, this text precedes most of its proofs with detailed explanations
of the thought process one goes through when constructing the proof. A good
proof that incorporates the elements of that discussion then follows.
• For proofs that share the same general structure, such as the proof of
$\lim_{x \to a} f(x) = L$ for various functions, proof templates are provided
that give a generic approach to writing that type of proof.
Contents
Acknowledgments
Preface
List of Figures
List of Proof Templates
1 What Are Proofs, and Why Do We Write Them?
1.1 What Is a Proof?
1.2 Why We Write Proofs
2 The Basics of Proofs
2.1 The Language of Proofs
2.1.1 Conditional Statements
2.1.2 Negation of a Statement
2.1.3 Proofs of Conditional Statements
2.1.4 Exercises
2.2 Template for Proofs
2.2.1 Exercises
2.3 Proofs About Sets
2.3.1 Set Notation
2.3.2 Exercises
2.3.3 Proofs About Subsets
2.3.4 Exercises
2.3.5 Proofs About Set Equality
2.3.6 Exercises
2.4 Proofs About Even and Odd Integers
2.4.1 Definitions of Even and Odd Integers
2.4.2 Proofs About Even and Odd Integers
2.4.3 Exercises
2.5 Basic Facts About Real Numbers
2.5.1 Ordered Fields
2.5.2 The Completeness Axiom and the Real Numbers
2.5.3 Absolute Value, the Triangle Inequality, and Intervals
2.5.4 Exercises
2.6 Functions
2.6.1 Function, Domain, Codomain
2.6.2 Surjection
2.6.3 Injection
2.6.4 Composition
2.6.5 Exercises
3 Limits
3.1 The Definition of Limit
3.1.1 Exercises
3.2 Proving $\lim_{x \to a} f(x) = L$
3.2.1 Exercises
3.3 One-Sided Limits
3.3.1 Exercises
3.4 Limits at Infinity
3.4.1 Exercises
3.5 Limit of a Sequence
3.5.1 Definition of Sequence
3.5.2 Arithmetic with Sequences
3.5.3 Monotone Sequences
3.5.4 Subsequences
3.5.5 Limit of a Sequence
3.5.6 Limits of Monotone Sequences and Mathematical Induction
3.5.7 Cauchy Sequences
3.5.8 Exercises
3.6 Proving That a Limit Does Not Exist
3.6.1 Why a Limit Might Not Exist
3.6.2 Quantifiers and Negations
3.6.3 Proving No Limit Exists
3.6.4 Exercises
3.7 Accumulation Points
3.7.1 Exercises
3.8 Infinite Limits
3.8.1 Exercises
3.9 The Arithmetic of Limits
3.9.1 Limit of a Sum
3.9.2 Limit of a Product
3.9.3 Limit of a Quotient
3.9.4 Limit of Rational Functions
3.9.5 Other Types of Limits
3.9.6 Exercises
3.10 Other Limit Theorems
3.10.1 The Limit of a Positive Function
3.10.2 Uniqueness of Limits
3.10.3 The Squeezing Theorem
3.10.4 Limits of Subsequences
3.10.5 Exercises
List of Figures
Fig. 1.1 Dividing the disk with the chords from n points
Fig. 2.1 List of implications for $P \to Q$
Fig. 2.2 $(A \cup B)^c$ is equal to $A^c \cap B^c$
Fig. 2.3 Showing the least upper bound of S is $s = \sqrt{r}$
Fig. 2.4 Triangle inequality
Fig. 2.5 Composition $(f \circ g)(x) = z$
Fig. 3.1 $\lim_{x \to a} f(x) = L$
Fig. 3.2 $\lim_{x \to a} f(x) = L$
Fig. 3.3 $\lim_{x \to a} f(x) = L$
Fig. 3.4 Graph of $f(x)$
Fig. 3.5 Approaching a limit as $x \to \infty$
Fig. 3.6 Proving bounded monotone sequences converge
Fig. 3.7 f has no limit at $x = 2$
Fig. 3.8 Graph of $\sin \frac{1}{x}$
Fig. 3.9 Set with accumulation point a and isolated point b
Fig. 3.10 Sequences approaching the lim sup and lim inf
Fig. 4.1 Continuity of a function
Fig. 4.2 A function equal to $2x$ for rational x and $x + 1$ for irrational x
Fig. 4.3 $f(x) = \frac{1}{x}$ is not uniformly continuous
Fig. 4.4 Heine–Borel Theorem first proof
Fig. 4.5 Heine–Borel Theorem second proof
Fig. 4.6 y and z straddle one endpoint but remain in an interval of the open cover
Fig. 4.7 Proving that a continuous function on $[a, b]$ is bounded
Fig. 4.8 The maximum and minimum of a function $f(x)$ on an interval
Fig. 4.9 f passing through each y between $f(c)$ and $f(d)$
Fig. 4.10 A function with a jump discontinuity
Fig. 4.11 Graph of $\sin \frac{1}{x}$
Fig. 4.12 Graphs of $\operatorname{sgn}(x)$ and $\lfloor x \rfloor$
Fig. 4.13 Graphs of functions with discontinuities
List of Proof Templates
Proof
Proving $A \subseteq B$ for sets A and B
Proving $A = B$ for sets A and B
Proving a function f is surjective
Proving a function f is injective
Proving $\lim_{x \to a} f(x) = L$
Proving a result using mathematical induction
Proving $\lim_{x \to a} f(x)$ does not exist
Proving the function f is continuous at the point a
Proving the function f is uniformly continuous on the set A
Proving $\langle X, d \rangle$ is a metric space
Chapter 1
What Are Proofs, and Why Do We Write Them?
But more likely you are interested in investigating some properties of these
numbers having to do with their order or how they behave when operated on
by addition or multiplication. This, of course, would mean that you will need to
make clear statements about addition and multiplication operations and a less than
relationship, again, so that you do not run into problems later because you were
being ambiguous. So, you might write definitions of addition, multiplication, and
less than, and then make statements about how these operations behave such as a
Commutative Law of Addition ($m + n = n + m$), a Distributive Law of Multiplication
over Addition ($a(b + c) = ab + ac$), and an Order Property of Addition ($r < s$
implies $r + t < s + t$). These statements about how these defined quantities work
are called axioms, postulates, or principles. They are statements that you accept
as the guiding rules for how your mathematical objects behave and go beyond the
definitions to describe and make precise just what the definitions are talking about.
Once you have made definitions and laid out your axioms, you should have the
tools necessary to begin an investigation of other properties. Suppose that someone
looks at a few examples and notices that $1 + 9 = 10$ and 10 is 2 times another
number, 5. They then notice that $4 + 12 = 16$, $3 + 147 = 150$, and $1002 + 6 = 1008$,
and all of these results are also numbers equal to 2 times another number. This might
lead them to make the statement that “if you add two natural numbers together,
the result is always 2 times another number.” Such a statement would be called a
conjecture, a statement whose truth has not yet been determined. Of course, you
know that this statement is false and came about because the investigator had not
yet considered enough examples. Once they stumble upon $5 + 8 = 13$ and notice
that 13 cannot be represented as 2 times another number, they will know that the
statement does not hold true in every case.
Other conjectures such as “for every natural number a, the number $a^2 - 3a + 12$
is a multiple of 2” hold up to more scrutiny. At some point in your investigation you
might see a convincing argument that this conjecture is, in fact, a true statement.
Such a convincing argument is what is called a proof. Once it is known that a
statement has a proof, it is known as a theorem, lemma, corollary, proposition,
or law. So, a proof of a statement in mathematics is a convincing argument that
establishes the truth of that statement.
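The two conjectures above can be explored with a short script. This is only a sketch for experimenting, not a proof: the function names `first_counterexample` and `polynomial_conjecture_holds` and the search limits are assumptions chosen for illustration.

```python
def is_even(k: int) -> bool:
    """Return True when k is 2 times another integer."""
    return k % 2 == 0


def first_counterexample(limit: int):
    """Search pairs of natural numbers up to `limit` for a sum that is odd.

    Returns the first pair (m, n) whose sum is not even, or None if the
    search finds no such pair.
    """
    for m in range(1, limit + 1):
        for n in range(m, limit + 1):
            if not is_even(m + n):
                return (m, n)
    return None


def polynomial_conjecture_holds(limit: int) -> bool:
    """Check that a^2 - 3a + 12 is even for every a from 1 to `limit`."""
    return all(is_even(a * a - 3 * a + 12) for a in range(1, limit + 1))


# The first conjecture fails as soon as an odd sum appears.
print(first_counterexample(20))
# The second conjecture survives every case tried, which is evidence only.
print(polynomial_conjecture_holds(10_000))
```

A single counterexample settles the first conjecture, but no finite search can settle the second. One possible convincing argument writes $a^2 - 3a + 12 = a(a - 3) + 12$ and observes that a and $a - 3$ differ by an odd number, so one of them is even and the whole expression is a multiple of 2.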
Some statements are very easily proved, and certainly mathematicians often set
up axioms in order to make particular statements easy to prove. At first this may
appear to be cheating or, at best, unproductive and uninteresting because it seems
to defeat the purpose of establishing truth by dictating rules that make it trivial to
establish the truth. But this is certainly not the case. It is common for mathematicians
to have an intuitive idea about how a system should work before they feel that they
understand it enough to set down formal definitions and axioms. Perhaps you wanted
addition of all natural numbers a and b to satisfy $a + b = b + a$. Then it would make
sense to include this rule among your axioms. The axioms are written with the idea
of establishing enough structure so that the statements the mathematicians want to
hold true can easily be proved. The richness of mathematics is that after assuring
that the obvious can be proved from the axioms, there are many more results that
can be proved that are not immediately obvious from the definitions and axioms,
statements which might never have been apparent to those who set up the system in
the first place. For example, Fermat’s Last Theorem (there are no natural numbers a,
b, c, and n > 2 such that $a^n + b^n = c^n$) is a statement about natural numbers which
could only be conjectured after investigating a large number of examples, and stood
as a conjecture for hundreds of years before a proof was provided.
Occasionally, it is shown that a conjecture is independent of the axioms; that
is, neither the truth nor the falseness of the statement follows from the axioms. Two
famous examples are the statements about sets known as the Axiom of Choice and
the Continuum Hypothesis which have been shown to be independent of the original
axioms of Zermelo-Fraenkel Set Theory. The independence of such statements
suggests that the axiom system is not rich enough in structure to establish the truth
of these statements, and that if one chose to do so, those statements could be added
to the list of axioms for the system. The Axiom of Choice or something equivalent
to it, for example, is now usually listed along with the Zermelo-Fraenkel axioms.
One certainly hopes that it is not possible to prove two contradictory statements
about objects in a system. Such an occurrence would say that the axioms of the
system were inconsistent, and this would require the axioms to be changed. After
the original ground rules for Set Theory were established by Georg Cantor in the
1870s and 1880s, Bertrand Russell pointed out in 1901 a paradox (contradiction)
that is a consequence of those rules. Now commonly known as Russell’s paradox,
it stimulated a flurry of activity which resulted in the young field of Set Theory
being put on a firm foundation (we hope) with the creation and adoption of the
Zermelo–Fraenkel axioms.
The language of a proof can vary depending on who is writing the proof and
who is the intended reader. In other words, what makes a convincing argument may
well depend on who it is that needs to be convinced. For example, if two experts
in Functional Analysis are speaking to each other, one might prove a statement by
saying “Oh, that’s just a consequence of the Hahn-Banach Theorem.” That proof
might be sufficient since it completely describes the reasoning behind the statement
in question due to the shared knowledge of the two experts. On the other hand, if
one of these experts were speaking to a beginning mathematics graduate student,
the proof would need to include far more detail in order for it to be a convincing
argument. If the expert were speaking to a high school student, the proof might
need to be a complete book that both introduces the needed concepts and explains
many results needed to understand the proof.
It is important to understand that there is a difference between knowing why
a statement is true and knowing how to write a good proof of the statement. It is
quite possible to learn a great deal of mathematics, to be able to solve many types
of mathematical problems, and to understand why particular properties must hold
without being able to write coherent proofs of these properties. The situation is
analogous to that of a police detective who has gathered enough evidence to be
convinced which of the many suspects committed a particular crime. It is quite
another thing to have the criminal successfully prosecuted in a court of law, resulting
in the criminal’s conviction and eventual punishment for the crime. A student in Analysis needs to
learn many strategies that can be brought to bear when writing proofs. Some of these
strategies are methods or tricks that enter a student’s bag of tricks which can be
employed later when solving problems or writing proofs. A student of proof writing
needs to learn how to turn those strategies into coherent proofs that present
the ideas in a logical order, fill in all necessary details, and make clear
to the reader exactly why the chosen strategies justify the needed result.
This book talks about how you should go about writing proofs of the kinds
of statements typically found in the branch of Mathematics called Analysis. The
branches of mathematics are not precisely defined. After a new branch arises, some
mathematicians begin to combine ideas from older branches with ideas from the new
branch to form even newer areas of study. For example, there are branches called
Algebra, Geometry, and Topology. During the twentieth century mathematicians
began talking about Algebraic Topology, Algebraic Geometry, and Geometric
Topology. Very roughly speaking, then, some of the branches of mathematics are
• Set Theory: the study of sets, set operations, functions between sets, orderings
of sets, and sizes of sets
• Algebra: the study of sets upon which binary operations (such as addition
or multiplication) are defined; it includes Group Theory, Ring Theory, Field
Theory, and Linear Algebra
• Topology: the study of continuous functions and properties of sets that are
preserved by continuous functions
• Analysis: the study of sets for which there is a measure of distance allowing for
the definition of various limiting processes such as those found in the subjects
of Calculus, Differential Equations, Functional Analysis, Complex Variables,
Measure Theory, and many other areas.
1.2 Why We Write Proofs
There are many reasons why mathematicians put a lot of weight on the writing of
proofs. Here are some of the reasons.
Determining Truth. Research mathematicians use proofs to determine what mathematical
statements are true. Although many statements in mathematics are obviously
true, many remain unproved conjectures for long periods of time before being
proved. When a conjecture stands unproved for many years, there is time for more
mathematicians to learn about the statement, and the conjecture may attract a great
deal of attention. When the conjecture is first stated, some may find it interesting,
but finding a suitable proof may not appear to be a difficult problem until many
people have tried unsuccessfully to find a proof. As this interesting statement
remains a conjecture for a longer and longer period of time, the mathematical
community realizes that the problem of finding a proof is much more involved than
originally expected. This is exciting partly because a wider community of experts
begins to wonder whether the statement under consideration is true and because
it becomes clear that new techniques will be needed to find a suitable proof if,
in fact, the statement can be proved at all. The problem of determining whether
or not the mathematical statement is true takes on the same sort of interest that
some people would take in the success of their favorite sports teams, sitting and
waiting to see how they will fare in the upcoming contest. When a longstanding
conjecture is finally proved, the announcement of the accomplishment will often be
covered by the lay press, giving mathematics an uncharacteristic brief period of public
admiration. Perhaps you are familiar with some of these famous problems whose
resolution has eluded mathematicians for years (at least at the time of the writing of
this text in January 2016): the Riemann Hypothesis, the Goldbach Conjecture, the
Twin Prime Conjecture, the P versus NP Problem, and the Navier–Stokes Equations
Existence and Smoothness Problem. During the last 40 years resolutions have been
announced for several long-standing problems including the Four Color Theorem,
the Bieberbach Conjecture (now called de Branges’s Theorem), Fermat’s Last
Theorem, and the Poincaré Conjecture.
Why do mathematicians expend so much effort trying to prove statements, some
of which may seem obvious from the start? One reason is that mathematicians
are very skeptical of statements that appear obvious, and rightfully so. Mathematics
has a long history of statements that appear to be true but are
eventually shown to be false. Even very clear patterns can be deceptively
seductive. Take, for example, the following problem. Select a set of n points along
the circumference of a circle, draw the chord between each pair of points, and find
the maximum number of regions into which these segments can divide the disk.
Figure 1.1 shows the results for the first few values of n.
Although from considering $n = 1, 2, 3, 4, 5$ it appears that the chords can divide
the disk into $2^{n-1}$ regions, this fails to be true when $n = 6$. With a bit more thinking
it is not hard to see that $2^{n-1}$ could not be the correct answer. With n points there are
$\binom{n}{2}$ chords and at most $\binom{n}{4}$ intersections of two chords. This number of intersections
grows as a fourth-degree polynomial in n suggesting that the number of regions will
In fact, for many years it was thought that $\operatorname{li}(x) > \pi(x)$ for all $x > 0$ because this
holds for all small values of x which can be practically checked, for example, all x
between 0 and $10^{24}$. It has now been shown that $\operatorname{li}(x) - \pi(x)$ switches sign infinitely
often, although only for extremely large values of x.
It is apparent that sometimes seemingly very obvious patterns do not hold
in every case, so mathematicians rely on proofs to convince themselves that the
patterns do indeed hold in the general case.
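The chord-counting problem above can be tabulated to see exactly where the apparent pattern breaks. The sketch below assumes the standard closed form for the maximum number of regions, $1 + \binom{n}{2} + \binom{n}{4}$, which is a known result for this problem but is not derived in this text; the function name `max_regions` is chosen here only for illustration.

```python
from math import comb


def max_regions(n: int) -> int:
    """Maximum number of regions cut from a disk by the chords joining
    n points on its circumference, using the assumed closed form
    1 + C(n, 2) + C(n, 4)."""
    return 1 + comb(n, 2) + comb(n, 4)


# Compare the true maximum with the tempting guess 2^(n-1).
for n in range(1, 9):
    guess = 2 ** (n - 1)
    actual = max_regions(n)
    marker = "" if guess == actual else "  <- pattern fails"
    print(f"n={n}: regions={actual}, 2^(n-1)={guess}{marker}")
```

The two columns agree for n from 1 to 5 and part ways at n = 6, where the count is 31 rather than 32. The closed form is a polynomial of degree four in n, which is consistent with the observation that the number of intersections grows as a fourth-degree polynomial.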
Testing Axiom Systems. In the next chapter you will read about the writing of
proofs for some very elementary facts in mathematics, so elementary that you may
wonder why anyone would bother with these proofs. Clearly, it makes sense to
begin any training in the writing of proofs with some very simple results that are
easy to understand so that the student can feel confident about all the statements
being made in the proofs. But these proofs are not being presented just because they
are elementary. When one sets up a mathematical system by making definitions
and determining axioms, it is usually with a particular application or example in
mind. The desired result is that the new system will include the already partially
understood application so that any new discoveries will immediately tell something
new about the original application. Suppose someone sets up an axiom system for
the real numbers, for example, but is not able to prove that addition of real numbers
satisfies the commutative property. Since the commutative property is an important
aspect of addition of real numbers, it would appear that the new axiom system does
not have enough power to represent all that one would want to show about the real
numbers. Perhaps the axiom system will need to be expanded to include an axiom
about the commutativity of addition. Thus, if one cannot prove that the expected
simple properties hold, then something is missing from the axioms. So
mathematicians write proofs to confirm that their axiom systems are representative
of the applications they are trying to describe.
Exhibiting Beauty. There are no rules about what composers of music need to
write, but many composers try to write in standardly accepted formats such as
string quartets or symphonies because there are already organizations ready to
perform such works and groups of people happy to listen to such works. Scholars of
literature compare literary works by writing literary analysis, a form which holds
a lot of meaning for those who read and write in that field. Although painters
choose to make pictures of every sort of object or scene, real or imagined, most
painters eventually try their hand at painting some of the standard subjects (still life,
nudes, famous religious or historical depictions). Similarly, mathematicians write
proofs partly because that is what mathematicians enjoy doing. Although many
mathematicians make substantial contributions to the sciences, social sciences, and
arts through the application of their mathematical skills, others live in a world
of creating and discussing abstract concepts that have no immediate application
to real world problems, or at least no application apparent to the mathematicians
doing the research. To them, mathematics is studied as part of the humanities and
is appreciated for its beauty. And much of the beauty of mathematics lies in the
proofs of its theorems. One gets a great deal of pleasure reading a clever proof of
a complicated result when the proof can be stated in just a few lines, especially if
previous proofs of the same result were considerably longer and more difficult to
understand. Many mathematicians like reading articles and attending conferences
where they are exposed mainly to proofs of results, partly so that they can learn
about new results, but more importantly so they can appreciate the techniques
brought to bear to construct the proofs.
Testing Students One should not underestimate the need to educate future
mathematicians. A good way to test whether a student understands a particular result is to
ask the student to present a proof of the result. The presentation of a proof shows a
deep understanding of why the result is true and shows an ability to discuss many
details about the objects involved. At the graduate school level in mathematics, most
test problems require the student to produce a proof of a particular result.
The student who has completed a study of Calculus is likely to have mastered
basic skills in Algebra, Geometry, Trigonometry, and Elementary Functions. This
is a good point in one’s studies to begin writing proofs. It should not be assumed
that one can just begin writing proofs at this stage even if they have had years of
experience watching teachers and authors present proofs to them any more than
someone can be expected to sit down and begin playing the piano just because they
have watched many other people present concerts using the instrument. In this book
the reader will be taken through the construction of many proofs in a step-by-step
manner that presents the thought process used to write the proofs. Some incorrect
proofs are shown and explained so that the student can learn about common pitfalls
to avoid. Some students dread the transition to writing proofs because they feel
that they do not understand how to write proofs, and are leery of the day when
they will be expected to produce what they cannot now do. But the ability to write
good proofs is a skill no different from the ability to factor polynomials or integrate
rational functions. There is no expectation that the beginner can produce a good
proof, but every expectation that the beginner can learn.
Chapter 2
The Basics of Proofs
2.1 The Language of Proofs
Most theorems concern mathematical objects x that satisfy a set of properties P, that
is, $P(x)$ = “the properties P hold for object x.” The theorem may say that if $P(x)$ is
true, then some additional properties $Q(x)$ must also be true. Such statements are
called conditional statements and can be written $P(x) \to Q(x)$. In the context
of proving theorems, the “$P(x)$” portion of the statement is referred to as the
hypothesis of the statement, and the “$Q(x)$” portion of the statement is referred to as
the conclusion of the statement. The hypothesis of a conditional statement is often
called the antecedent while the conclusion of the conditional statement is often
called the consequent. For example, a well-known theorem is that all functions
differentiable at a point are also continuous at that point. There are many equivalent
ways to express this fact:
• All functions differentiable at a point are also continuous at that point.
• If the function f is differentiable at a point, then f is continuous at that point.
• The function f is differentiable at a point only if f is continuous at that point.
• If the function f is not continuous at a point, then f is not differentiable at that
point.
• There are no functions f such that f is both differentiable at a point and
discontinuous at that point.
• The function f is differentiable at a point implies that f is continuous at that point.
• The function f is differentiable at x $\to$ f is continuous at x.
All of these statements assert that if a function f satisfies the hypothesis that it has
a derivative at a point x, then f must also satisfy the conclusion that f is continuous
at x. Note that the truth of a conditional statement, $P(x) \to Q(x)$, implies nothing
about the truth of the statement $Q(x) \to P(x)$, which is known as the converse
of the conditional statement $P(x) \to Q(x)$. Indeed, the converse of this theorem
is the clearly false statement: “If the function f is continuous at a point, then f is
differentiable at that point.” (The absolute value function $|x|$ is continuous but not
differentiable at $x = 0$.) Certainly, there are functions f both continuous and
differentiable at a point, but knowing that a function is continuous at a point does
not allow one to conclude that it is differentiable at that point. The converse of a
conditional statement is not logically equivalent to the original statement, but since
the two statements are concerned with the same subject matter, mathematicians
are often interested in the converse of a given conditional. If someone succeeds
in proving a new theorem expressed as a conditional statement, you might wonder
whether the converse of the statement could also be true. Sometimes the truth of the
converse statement is a trivial matter because there are well-known examples
where the converse does not hold; that is, there are
known values of x where the converse statement “$Q(x) \to P(x)$” is false. Other
times, the converse statement is something that has been previously established.
But very often, the truth of the converse statement remains an open question, and
the proof of the original conditional statement may generate research interest in its
converse.
One of the equivalent forms of a conditional statement $P(x) \to Q(x)$ is the
statement “if $Q(x)$ is false, then $P(x)$ must be false.” This can be written as
“$\neg Q(x) \to \neg P(x)$” using the negation symbol $\neg$. This form of the statement
is called the contrapositive of the original conditional statement. For example, the
contrapositive of the statement discussed above is “If the function f is not continuous
at a point, then f is not differentiable at that point.” Although logically equivalent
to the original conditional statement, the contrapositive often gives you a different
way to think about the statement, and you will often see a proof which is a proof of
the contrapositive statement instead of a proof of the original conditional statement.
The negation of a statement is a statement with the opposite truth value of the
original statement, that is, a statement which is false exactly when the original
statement is true. For example, the negation of “n is an integer” is “n is not an
integer.” The negation of the statement $P(x)$ is “not $P(x)$” or simply “$\neg P(x)$.” The
conditional statement “$P(x) \to Q(x)$” says that every time $P(x)$ holds it must be the
case that $Q(x)$ also holds. The negation of this statement must, therefore, state that
for at least one value of x, $P(x)$ is true and $Q(x)$ is false, or “$P(x)$ and $\neg Q(x)$.”
A proof by contradiction is a proof that assumes that both $P(x)$ and $\neg Q(x)$ are
true, and derives a statement that must be false (known as a contradiction), showing
that it is impossible to have $P(x)$ being true at the same time that $Q(x)$ is false.
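The relationships among a conditional, its converse, its contrapositive, and its negation can be checked mechanically with a truth table. The following is a minimal sketch, assuming that for a fixed x the statements $P(x)$ and $Q(x)$ are treated simply as Boolean values p and q; the helper name `implies` is introduced here only for illustration.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Truth value of the conditional p -> q."""
    return (not p) or q


# All four combinations of truth values for (p, q).
rows = list(product([True, False], repeat=2))

# The contrapositive (not q) -> (not p) agrees with p -> q in every row.
contrapositive_equivalent = all(
    implies(p, q) == implies(not q, not p) for p, q in rows
)

# The converse q -> p does NOT agree with p -> q in every row.
converse_equivalent = all(implies(p, q) == implies(q, p) for p, q in rows)

# The negation of p -> q agrees with "p and not q" in every row.
negation_matches = all(
    (not implies(p, q)) == (p and not q) for p, q in rows
)

print(contrapositive_equivalent, converse_equivalent, negation_matches)
```

A table like this only confirms the logical form of the statements; it says nothing about whether a particular theorem is true.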
The well-known Pythagorean Theorem is a conditional statement: “If a right
triangle has legs with lengths a and b and a hypotenuse with length c, then
$a^2 + b^2 = c^2$.” The converse of the Pythagorean Theorem is also true: “If a triangle
has sides with lengths a, b, and c satisfying $a^2 + b^2 = c^2$, then the triangle is
a right triangle.” When a conditional statement $P(x) \to Q(x)$ and its converse
$Q(x) \to P(x)$ are both true, the two statements can be combined into one as
$P(x) \leftrightarrow Q(x)$. This can also be stated as “$P(x)$ if and only if $Q(x)$.” Such
statements are called biconditional statements. Thus, the Pythagorean Theorem and
its converse could be combined into the single biconditional statement: “A triangle
is a right triangle if and only if the triangle has side lengths a, b, and c satisfying
$a^2 + b^2 = c^2$.”
Conditional statements often make assertions about a very large number of objects
or even an infinite set of objects. Indeed, the statement about differentiable functions
being continuous refers to infinitely many functions, and the Pythagorean Theorem
refers to an infinite number of triangles. How, then, are you supposed to prove
these results since you clearly cannot consider every case individually? A general
approach to proving the conditional statement “$P(x) \to Q(x)$” is to select a generic
element x which could represent any object satisfying $P(x)$ and then to prove the
statement $Q(x)$. Since a generic object x satisfying $P(x)$ has been shown to satisfy
$Q(x)$, it follows that every object satisfying $P(x)$ must also satisfy $Q(x)$, and the
result has been proved. This will be the format of most of the proofs you will ever
write in analysis.
If the statement “$P(x) \to Q(x)$” is not true, it means that there is at least one
value of x that makes “$P(x) \to Q(x)$” a false statement, that is, a value of x for
which $P(x)$ is true and $Q(x)$ is false. Such an x is called a
counterexample to the statement, and exhibiting such a counterexample would be
a way to prove that “$P(x) \to Q(x)$” is false. A proof of “$P(x) \to Q(x)$” is essentially
an argument showing that no counterexamples exist.
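Searching for a counterexample is easy to illustrate in code. The sketch below is only an illustration: the function `find_counterexample` and the sample statements about primes are hypothetical choices, not taken from the text. It looks for a value where the hypothesis holds but the conclusion fails.

```python
from typing import Callable, Iterable, Optional


def find_counterexample(
    hypothesis: Callable[[int], bool],
    conclusion: Callable[[int], bool],
    candidates: Iterable[int],
) -> Optional[int]:
    """Return the first candidate where the hypothesis holds but the conclusion fails."""
    for x in candidates:
        if hypothesis(x) and not conclusion(x):
            return x
    return None


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


def is_odd(n: int) -> bool:
    return n % 2 == 1


# "If n is prime, then n is odd" is false: the search finds a counterexample.
witness = find_counterexample(is_prime, is_odd, range(1, 50))

# "If n is prime and n > 2, then n is odd" has no counterexample in this range.
none_found = find_counterexample(
    lambda n: is_prime(n) and n > 2, is_odd, range(1, 50)
)

print(witness, none_found)
```

Finding a counterexample settles that a statement is false. Failing to find one among finitely many candidates is only evidence; a proof is still needed to show that no counterexample exists.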
There are many phrases that occur so frequently when writing proofs that
mathematicians have developed a shorthand notation for these phrases. There is
little need to use these abbreviations within a textbook such as this or even in a
journal article, but the shorthand can be useful when writing out a proof by hand on
paper or a blackboard. Here is a list of some of the commonly used symbols.
2.1.4 Exercises
Perform the following steps for each of the conditional statements in Exercises 1–6.
A. Identify the hypothesis and the conclusion.
B. Write the converse of the statement.
C. Decide whether or not the converse of the statement is true.
D. Write the contrapositive of the statement.
E. Write the negation of the statement.
1. If $x = 1$ and $y = 1$, then $xy = 1$.
2. If x is an integer, then $2x + 1$ is also an integer.
3. $f(x)$ and $g(x)$ are both continuous at $x = 0$ only if $f(x) + g(x)$ is continuous
at $x = 0$.
4. $xy = 0$ if $x = 0$ or $y = 0$.
5. If $xy - 9y = 0$ and $y > 0$, then $x = 9$.
6. A rectangle has area xy if two adjacent sides of the rectangle have lengths x and y.
7. Write the following without using shorthand symbols.
(a) $\exists x \in \mathbb{R} \ni x + 4 = 2$.
(b) $\forall x \in \mathbb{R}\ \exists y \in \mathbb{R} \ni x + y = 10$.
2.2 Template for Proofs
Many proofs can be written by following a simple formula or template that suggests
guidelines to follow when writing the proof. Mathematicians reading a proof that
follows a traditional template will find the proof easier to follow because there will
be an expectation about what will be presented in the proof. For example, many
proofs will follow the general template given here.
• This means that $x + \frac{b}{2a}$ must be one of the two square roots of $\frac{b^2 - 4ac}{4a^2}$.
• So, $x + \frac{b}{2a} = \pm\sqrt{\frac{b^2 - 4ac}{4a^2}} = \frac{\pm\sqrt{b^2 - 4ac}}{2a}$, and $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.
• STATE THE CONCLUSION: Thus, the roots of the quadratic polynomial
$ax^2 + bx + c$ are given by $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.
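The earlier lines of this proof are not reproduced in this copy. As a sketch only, the following derivation is one reconstruction consistent with the steps that the surrounding discussion mentions (dividing by a, adding $\frac{b^2}{4a^2}$ to both sides, and then factoring). The exact wording, numbering, and line breaks of the original display are an assumption.

```latex
\begin{align*}
ax^2 + bx + c &= 0, \qquad a \neq 0 \\
x^2 + \frac{b}{a}x + \frac{c}{a} &= 0 \\
x^2 + \frac{b}{a}x + \frac{c}{a} + \frac{b^2}{4a^2} &= \frac{b^2}{4a^2} \\
x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} &= \frac{b^2}{4a^2} - \frac{c}{a} = \frac{b^2 - 4ac}{4a^2} \\
\left(x + \frac{b}{2a}\right)^2 &= \frac{b^2 - 4ac}{4a^2} \\
x + \frac{b}{2a} &= \pm\frac{\sqrt{b^2 - 4ac}}{2a} \\
x &= \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\end{align*}
```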
The proof template begins with the suggestion to “SET THE CONTEXT” which
represents statements designed to tell the reader what is being assumed in the proof.
This is usually a sentence or two telling the reader about the properties of the objects
that will be encountered in the proof. It may also introduce which variables will
appear in the proof and what kinds of objects they represent. So, in the given proof
of the Quadratic Formula, the first line tells that the variables a, b, and c are going
to represent known constants with a not being 0. Clearly, the fact that a is not 0
needs to be stipulated because if $a = 0$, the polynomial $ax^2 + bx + c$ would not
be quadratic and would not have the proposed roots. Generally, you are not looking
for a lengthy narrative here, and, in fact, brevity is a particularly cherished attribute
of a proof. Saying what needs to be said, but only what needs to be said is usually
best. Some authors who state a theorem and immediately follow the statement of the
theorem with its proof will forgo setting the context at the beginning of the proof
because the reader will have just seen the statement of the theorem and may not need
to see a repeat of the context for that proof. For example, in the example proof, the
statement of the theorem does introduce the constants a, b, and c and polynomial
$ax^2 + bx + c$, so some authors might just skip the first line of the proof. On the
other hand, if the first line of the proof instead introduced the constants r, s, and
t, the proof could have proceeded using these variables instead of a, b, and c. The
same result would have been proved. So the “SET THE CONTEXT” part of the proof
makes the proof independent of the statement of the theorem being proved. Thus, for
completeness, it is good to establish the habit of including the setting of the context
at the beginning of each proof, at least until the student’s experience in proof writing
has matured.
Your choices of variables used to represent particular objects in the proof are
not critically important to the structure or correctness of the proof, but there are
certain variables that mathematicians associate with various uses, and sticking to
these conventional choices simplifies the understanding of the proof because those
variable choices bring with them a history of context that the reader will recognize.
There are very few Algebra students who would recognize the Quadratic Formula
if you gave them $z = \frac{-s \pm \sqrt{s^2 - 4rt}}{2r}$. Proofs about limits usually refer to the
variables $\epsilon$ and $\delta$ which represent small positive real numbers used in specific ways
in the proof. Using these two variables in their traditional contexts makes the proofs
easier to understand because the reader will expect these variables to play specific
roles, just as they have in many other proofs the reader has seen. Seeing many
examples of proofs will familiarize the novice proof writer with these traditional
uses of variables.
Suppose that the statement being proved indicates that every object satisfying
the properties listed in the hypothesis of the theorem also satisfies some properties
listed in the conclusion of the theorem. One generally structures a proof of such
a statement by first selecting a generic object satisfying the properties listed in
the hypothesis. The “ASSERT THE HYPOTHESIS” part of the proof is where the
writer selects an arbitrary element satisfying the hypothesized properties. In the
Quadratic Formula proof, it was assumed that x satisfied the quadratic equation
$ax^2 + bx + c = 0$. Other examples would be statements such as
• Let n be any natural number bigger than 3.
• Let x be an element of set A.
• Let y be a root of the polynomial p.x/.
• Assume that the real valued function f has a zero at the point z.
• Suppose G and H are any two lines that intersect at a point P.
• Assume that the function $f(s) = \int_0^s g(x)\,dx$ is a differentiable function of s. In
addition assume that $0 \le f(s) \le 10$ for all $s \ge 0$.
It is possible that there are infinitely many objects which could play the role of
the generically chosen object. But if an argument proves the result is true for this
generic object, then the theorem will have been shown to hold for any object that
could have played the role of the generic object, and, therefore, the theorem will
have been proved for all objects satisfying the hypothesis. The Quadratic Formula
proof addresses the one generic polynomial $ax^2 + bx + c$ and in doing so derives
a formula that works for all quadratic polynomials including $5x^2 - 17x + 126$ and
$rx^2 + sx + t$. Often the reader of a proof will form a mental picture of the generic
object being chosen. For example, after reading “Let n be any natural number bigger
than 3,” the reader may think, “OK, how about $n = 7$?” As the proof progresses,
the reader may take each statement of the proof and verify that it is valid and makes
sense for their choice of $n = 7$. This helps the reader follow the logic of the proof
and verifies that they are understanding what the proof is saying.
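In the same spirit as the reader who tries $n = 7$, one can try the formula on a specific polynomial. The sketch below is an illustration only; the function names are hypothetical. It applies the Quadratic Formula to $5x^2 - 17x + 126$, using complex arithmetic because this polynomial has a negative discriminant, and checks that the results are roots up to rounding error.

```python
import cmath
from typing import Tuple


def quadratic_roots(a: complex, b: complex, c: complex) -> Tuple[complex, complex]:
    """Roots of a*x**2 + b*x + c from the Quadratic Formula; requires a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic polynomial")
    root_of_discriminant = cmath.sqrt(b * b - 4 * a * c)
    return (
        (-b + root_of_discriminant) / (2 * a),
        (-b - root_of_discriminant) / (2 * a),
    )


def evaluate(a: complex, b: complex, c: complex, x: complex) -> complex:
    """Value of the polynomial a*x**2 + b*x + c at x."""
    return a * x * x + b * x + c


# The specific polynomial mentioned in the text: 5x^2 - 17x + 126.
roots = quadratic_roots(5, -17, 126)

# Each residual should be zero up to floating-point rounding.
residuals = [abs(evaluate(5, -17, 126, r)) for r in roots]

print(roots, residuals)
```

Checking one polynomial does not prove the formula; the proof works because its argument applies to the generic polynomial $ax^2 + bx + c$.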
The proof will be completed when it is shown that the generically chosen
element satisfying the hypothesis of the theorem is, in fact, an element satisfying
the conclusion of the theorem as stated in the “STATE THE CONCLUSION” part
of the template. There will certainly need to be some statements placed between
the original assertion of the hypothesis and the end of the proof that justify the
conclusion of the theorem. Those statements make up the “LIST IMPLICATIONS”
part of the template. In almost all cases, most of the body of the proof belongs to
this list of statements. Each statement in the list should follow from definitions or
be simple implications following from previous statements in the proof. In a well-
written complete proof, the reader should easily see why each implication follows
logically from other statements made earlier in the proof (Fig. 2.1). If an implication
is not clear on its own, it will need some justification so the reader can follow the
logic. The justification may just be a reminder of a key point made earlier in the
proof (“as shown earlier, f is continuous at point a”) or a reminder of a well-known
definition or theorem (“Since all continuous functions on the interval $[0, 4]$ are
integrable there, it follows that $\int_0^4 f(x)\,dx$ exists.”) The given Quadratic Formula proof
contains six lines of implications. Each line follows easily from the line before using
standard rules of Algebra, and any student familiar with the algebraic manipulations
of equations will be able to understand these implications. In the fourth step of the
proof, the quantity $\frac{b^2}{4a^2}$ is added to both sides of an equation. Although this step
surely follows the rules of Algebra, it may not be clear to the reader of the proof
why the step is important. As it turns out, this “completing the square” operation
prepares for the factoring performed in the fifth step of the proof and is arguably the
most clever step of the proof. A proof will often require a clever step such as this.
The proof writer may have labored for years looking for the inspiration needed to
find such a step, but the proof itself need only make clear the justification for what
is being done and does not need to refer to the sweat that went into producing it.
Some implications will be easy for the reader to follow without having to justify
the step. Other statements may need some deeper explanation. Here is where the
proof writer will need to consider the expertise of the target audience for the proof
in order to decide how much detail to provide. How to make your proof “easy to
follow” is only clear when you know for whom it is meant to be easy. For example, it
made sense to follow the line $x^2 + \frac{b}{a}x + \frac{c}{a} = 0$ with the statement
$x^2 + \frac{b}{a}x + \frac{c}{a} + \frac{b^2}{4a^2} = \frac{b^2}{4a^2}$
because this just used the fact that you can add equal quantities to both sides of
an equation to get a new equation that is equivalent. On the other hand, suppose you
wish to combine a conditional statement on line 8 of a proof with the fact stated
on line 15 of the proof in order to show that the hypothesis of that conditional
is satisfied. This would allow the writer to state the conclusion of the conditional
statement to get line 16 of the proof, but the reader may have to be reminded about
which statements are being combined to get that conclusion.
Sometimes the writer of a long or complicated proof will need to make a new
definition or point out some new property that will be important later in the proof.
Depending on the complexity of the new idea, the proof writer may want to include
an example or two of objects satisfying the new definition or property. This will
serve to help the reader understand the new concept or to verify that the reader
is understanding the new concept. It is admirable to include such examples if the
complexity of the proof can be made clearer. But in most other contexts, the proof
should be kept short without the inclusion of unnecessary statements. If the intended
readers are able to easily construct these examples on their own, then the examples
should be left out of the proof.
The remainder of this chapter will discuss proofs that follow this general proof
template in contexts that the student should find easy to follow. It will also give
an opportunity to present some definitions and notation that will be used in later
chapters.
2.2.1 Exercises
1. If you were writing a proof of “All prime numbers greater than 2 are odd,” which
of the following would be appropriate ways to begin the proof? (There may be
more than one correct answer.)
(a) Let n be an odd prime number.
(b) Assume that all odd prime numbers are greater than 2.
(c) Let n be a prime number greater than 2.
(d) Assume that 2 is a prime number.
(e) Assume that n and k are integers with n > k > 2.
(f) The numbers 3, 5, 7, and 11 are prime numbers greater than 2.
(g) Let n be a number greater than 2 which is not prime.
2. If you were writing a proof of “The diagonals of a parallelogram bisect each
other,” which of the following would be appropriate ways to begin the proof?
(There may be more than one correct answer.)
(a) Let ABCD be a parallelogram.
(b) Let ABCD be a quadrilateral whose diagonals bisect each other.
(c) Let ABCD be a parallelogram whose diagonals bisect each other.
(d) All rectangles are parallelograms.
(e) Assume that the diagonals of a parallelogram bisect each other.
(f) Assume that if the diagonals of a quadrilateral bisect each other, then the
quadrilateral is a parallelogram.
3. If you were writing a proof of “Every cubic polynomial with real coefficients has
at least one real root,” which of the following would be appropriate ways to begin
the proof? (There may be more than one correct answer.)
(a) Assume that every cubic polynomial with real coefficients has at least one
real root.
(b) Assume that $p(x)$ is a polynomial with at least one real root.
(c) Assume that a, b, c, and d are real numbers with $a \neq 0$, and let
$p(x) = ax^3 + bx^2 + cx + d$.
(d) The polynomial $x^3 - 8$ has exactly one real root at $x = 2$.
(e) Let $p(x)$ be a cubic polynomial with real coefficients and $q(x)$ be a cubic
polynomial with complex coefficients.
(f) Let $p(x)$ be a cubic polynomial with real coefficients with real root r.
Write an appropriate first sentence that would begin proofs of each of the following
statements.
4. If m and n are relatively prime integers, then there exist integers x and y such that
$mx + ny = 1$.
5. The three angle bisectors of any triangle intersect at a common point.
6. If a and b are real numbers with $a \le b$, and f is a function continuous on the
closed interval $[a, b]$, then there is a real number M such that $|f(x)| \le M$ for all
$x \in [a, b]$.
7. If $\vec{u}$, $\vec{v}$, and $\vec{w}$ are 3-dimensional vectors, then
$(\vec{u} \times \vec{v}) \cdot \vec{w} = \vec{u} \cdot (\vec{v} \times \vec{w})$.
2.3 Proofs About Sets
Most courses in mathematics discuss sets: sets of numbers, sets of points, sets
of functions, sample spaces, and so forth. This should have given any Calculus
student an intuitive understanding of sets. Many theorems in mathematics are
statements about sets in disguise. For example, the statement that “If the function
f is differentiable at a point, then f is continuous at that point” is equivalent to the
statement “The set of functions differentiable at a point is a subset of the set of
functions continuous at that point.”
For the purposes of this text, it will be enough to define a set as a collection of
elements. That is, elements are those objects that belong to sets, and the notation
$x \in A$ says that x is an element of the set A, and $x \notin A$ says that x is not an element
of the set A. The set A is a subset of the set B, or A is contained in the set B, if each
element of A is also an element of B, in which case this fact is written as $A \subseteq B$. Two
sets, A and B, are equal if they have the same elements, that is, all the elements in the
set A are in the set B, and all the elements in the set B are in the set A. Notationally,
this says that $A = B$ if and only if both $A \subseteq B$ and $B \subseteq A$.
There are many ways to express the contents of a set. One is to list the elements
such as $A = \{a, b, c\}$ or $B = \{1, 3, 5, 7, \ldots\}$. Another way is to use set builder
notation which states that the set consists of all elements satisfying a given property
$P(x)$ and is written $\{x \mid P(x)\}$, or to emphasize that the elements of the set are also in
set A, it is often written $\{x \in A \mid P(x)\}$. Examples are $\{x \mid x > 0\}$, $\{y \mid y^2 + 3y^2 > 7\}$,
and $\{f \mid f \text{ is a function differentiable at } x = 3\}$. Note that a set is determined by the
elements that are in the set. Thus, $\{1, 2, 3\} = \{3, 2, 1\} = \{1, 2, 2, 3, 3, 3, 1, 2, 3\}$
because all three of these sets contain exactly the same three elements. In some
contexts, mathematicians will talk about a multiset which is an object similar to
a set but allows elements of the collection to appear with different multiplicities.
Thus, $\{1, 2, 3\}$ and $\{1, 2, 2, 3, 3, 3, 1, 2, 3\}$ would be different multisets even though,
in the notation of sets, they are the same set.
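The difference between sets and multisets can be seen directly in a programming language. The following is a small sketch, assuming Python's built-in `set` as a model of a finite set and `collections.Counter` as a stand-in for a multiset; neither choice comes from the text.

```python
from collections import Counter

# Sets ignore both order and repetition, so these three literals are equal.
same_as_sets = {1, 2, 3} == {3, 2, 1} == {1, 2, 2, 3, 3, 3, 1, 2, 3}

# A Counter records how many times each element appears, like a multiset.
m1 = Counter([1, 2, 3])
m2 = Counter([1, 2, 2, 3, 3, 3, 1, 2, 3])
same_as_multisets = m1 == m2

print(same_as_sets, same_as_multisets, dict(m2))
```

Here the three set literals compare equal, while the two counters do not, because the second one records 1 twice, 2 three times, and 3 four times.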
One special set is the empty set written as $\emptyset$ or $\{\}$ and is the set that has no
elements. In some contexts there is an understanding of a universal set, U, such
that all other sets under consideration are subsets of U. For example, the sets
$A = \{1, 2, 3\}$ and $B = \{2, 4, 6, 8, \ldots\}$ can be thought of as subsets of the universal
set $U = \{1, 2, 3, 4, \ldots\}$.
Take care not to confuse elements and subsets. Remember that sets are
collections of elements and sets are subsets of other sets. It is possible that a set contains
other sets as elements, but this would need to be explicitly clear from the definition
of that set. It is correct to write $3 \in \{1, 2, 3, 4, 5\}$, $\{1, 3, 5\} \subseteq \{1, 2, 3, 4, 5\}$,
and $\{1, 2\} \in \{1, \{1, 2\}, \{1, \{1, 2\}\}\}$, but it is incorrect to write $3 \subseteq \{1, 2, 3, 4, 5\}$,
$\{1, 2\} \in \{1, 2, 3, 4, 5\}$, or $\emptyset \in \{1, 2, 3, 4, 5\}$.
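The distinction between “is an element of” and “is a subset of” corresponds to two different operations in code. This sketch assumes Python sets as a model, with `frozenset` used for sets that appear as elements of other sets (a Python requirement, not a mathematical one); the variable names are illustrative.

```python
S = {1, 2, 3, 4, 5}

element_ok = 3 in S          # 3 is an element of S
subset_ok = {1, 3, 5} <= S   # {1, 3, 5} is a subset of S

# Python sets can only contain hashable items, so inner sets are frozensets.
inner = frozenset({1, 2})
nested = {1, inner, frozenset({1, inner})}
set_as_element = inner in nested   # {1, 2} is an element of the nested set

# {1, 2} is a subset of S but not an element of S.
pair_is_element = frozenset({1, 2}) in S
pair_is_subset = frozenset({1, 2}) <= S

# The empty set is a subset of S but not an element of S.
empty_is_element = frozenset() in S
empty_is_subset = frozenset() <= S

print(element_ok, subset_ok, set_as_element)
print(pair_is_element, pair_is_subset, empty_is_element, empty_is_subset)
```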
The student should be familiar with the following standard set operations. The
union of sets A and B is $A \cup B = \{x \mid x \in A \text{ or } x \in B\}$, and the intersection
of sets A and B is $A \cap B = \{x \mid x \in A \text{ and } x \in B\}$. When there is an
understood universal set, U, it makes sense to refer to the complement of a set
$A^c = \{x \in U \mid x \notin A\}$. It does not make sense to discuss the complement of a set if
there is no understood universal set. For example, is $\{1, 2, 3\}^c = \{4, 5, 6, 7, \ldots\}$,
or is it $\{\ldots, -3, -2, -1, 0, 4, 5, 6, 7, \ldots\}$? For that matter, is your right shoe an
element of $\{1, 2, 3\}^c$? The difference of two sets is $A \setminus B = \{x \in A \mid x \notin B\}$,
and, equivalently, if there is an understood universal set, $A \setminus B = A \cap B^c$. For
example, if $A = \{1, 2, 3, 4, 5\}$ and $B = \{2, 4, 6, 8\}$, then $A \cup B = \{1, 2, 3, 4, 5, 6, 8\}$,
$A \cap B = \{2, 4\}$, $A \setminus B = \{1, 3, 5\}$, and $B \setminus A = \{6, 8\}$.
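The example above can be reproduced with Python's set operators. This is a minimal sketch; the universal set `U` and the helper `complement` are assumptions made only so that a complement can be computed, since the text's example does not fix a universal set.

```python
A = {1, 2, 3, 4, 5}
B = {2, 4, 6, 8}
U = set(range(1, 11))  # assumed universal set, chosen only for this illustration

union = A | B          # A union B
intersection = A & B   # A intersect B
a_minus_b = A - B      # set difference A \ B
b_minus_a = B - A      # set difference B \ A


def complement(s, universe):
    """Complement of s relative to an explicitly given universal set."""
    return universe - s


# With a universal set in hand, A \ B can also be computed as A intersect B^c.
difference_via_complement = A & complement(B, U)

print(union, intersection, a_minus_b, b_minus_a)
```

Note that `complement` takes the universal set as an argument, which mirrors the point that a complement is meaningless until a universal set is understood.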
2.3.2 Exercises
There are many simple statements about sets which should be immediately obvious
to students reading this text, but learning to write proofs for these types of statements
will be instructive and useful in the proof writing discussed in the following
chapters. Here are some of those simple statements that apply to all sets A, B,
and C.
Some Statements About All Sets A, B, and C
• $A \subseteq A \cup B$.
• $A \cap B \subseteq A$.
• $A \setminus (B \cup C) \subseteq (A \cup B) \setminus C$.
• $(A \cup B) \cap C \subseteq A \cup (B \cap C)$.
• $A \cup B = B \cup A$, the Commutative Law of Union.
• $A \cap B = B \cap A$, the Commutative Law of Intersection.
• $(A \cup B) \cup C = A \cup (B \cup C)$, the Associative Law of Union.
• $(A \cap B) \cap C = A \cap (B \cap C)$, the Associative Law of Intersection.
• $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$, the Distributive Law of
Union Over Intersection.
• $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, the Distributive Law of
Intersection Over Union.
• $(A \cup B)^c = A^c \cap B^c$, DeMorgan’s Laws.
• $(A \cap B)^c = A^c \cup B^c$, DeMorgan’s Laws.
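Before proving statements like these, it can be reassuring to test them on small cases. The sketch below checks all twelve statements for every choice of A, B, and C among the subsets of a three-element universal set. The universal set, the helper names, and the brute-force approach are illustrative assumptions; a finite check of this kind is evidence, not a proof.

```python
from itertools import chain, combinations

U = frozenset({1, 2, 3})  # small assumed universal set, so complements make sense


def subsets(universe):
    """All subsets of the given finite set, as frozensets."""
    items = list(universe)
    return [
        frozenset(combo)
        for combo in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1)
        )
    ]


def comp(s):
    """Complement relative to the universal set U."""
    return U - s


def all_laws_hold(A, B, C):
    """Check the twelve listed statements for one choice of A, B, and C."""
    return all([
        A <= A | B,
        A & B <= A,
        A - (B | C) <= (A | B) - C,
        (A | B) & C <= A | (B & C),
        A | B == B | A,
        A & B == B & A,
        (A | B) | C == A | (B | C),
        (A & B) & C == A & (B & C),
        A | (B & C) == (A | B) & (A | C),
        A & (B | C) == (A & B) | (A & C),
        comp(A | B) == comp(A) & comp(B),
        comp(A & B) == comp(A) | comp(B),
    ])


power_set = subsets(U)
checked = all(
    all_laws_hold(A, B, C)
    for A in power_set
    for B in power_set
    for C in power_set
)

print(len(power_set), checked)
```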
The first four of these statements propose that one set is a subset of a second set.
From the definition of subset, for $A \subseteq B$ to be true, it is required that for every
$x \in A$, x must also be in B. There is a standard template for proofs of statements of
this form:
For example, how would one prove the statement “For all sets A and B,
$A \subseteq A \cup B$”? Because this proof is supposed to apply to any sets A and B regardless of
what properties they may possess, all that would be necessary for the “SET THE
CONTEXT” part of the proof is a statement introducing to the reader the fact that
the variables A and B will represent sets. Since $A \subseteq A \cup B$ exactly when every
element of A is also an element of $A \cup B$, the “ASSERT THE HYPOTHESIS” part
of the proof needs to select a generic element of the set A so that the proof can
conclude that the generic element is an element of set $A \cup B$. The first two lines of
the proof read:
Suppose that A and B are any two sets. Let $x \in A$.
The “LIST IMPLICATIONS” for this proof can be very short. It merely needs
to show that the definition of “set union” implies that x is in the union $A \cup B$. This
completes the proof.
PROOF: $A \subseteq A \cup B$.
• SET THE CONTEXT: Suppose that A and B are any two sets.
• ASSERT THE HYPOTHESIS: Let $x \in A$.
• LIST IMPLICATIONS: Since $x \in A$, it is true that $x \in A$ or $x \in B$.
• By the definition of set union, $x \in A \cup B$.
• STATE THE CONCLUSION: Therefore, by the definition of subset,
$A \subseteq A \cup B$.
Do the statements of this proof have to appear in exactly this order using exactly
these words? Of course not. There can be many variations in what makes up a good
proof. But it does not hurt to review why these statements make a good proof. The
first line, “Suppose that A and B are any two sets,” just makes it clear to the reader
that the variables A and B can be used to represent any two sets. Here is where the
reader of the proof may well mentally choose two sets so that when reading the
remainder of the proof, the reader can verify that the statements make sense when
applied to those two sets. The second line, “Let $x \in A$,” is required because by the
definition of “subset,” one must show that each element of A is also an element of
$A \cup B$, so selecting an arbitrary element of A is the natural way to do this. The next
line, “Since $x \in A$, it is true that $x \in A$ or $x \in B$,” is just a statement of logic that says
if statement p is true, then statement p or q is also true. Of course, this particular
p or q statement is exactly the definition of x being a member of $A \cup B$, which is
exactly what is needed to complete the proof.
Could one have interchanged the third and fourth lines of this proof? Well, yes;
the proof would be complete if that were done, but the fact that the definition of set
union is invoked right after its conditions are verified makes the statements of the
proof flow smoothly. The reader facing the definition of set union in line three might
wonder why that definition is being shown at that point. By placing that statement
as the fourth statement where the proof reader has just seen that $x \in A$ or $x \in B$,
the proof reader will immediately see that the definition of set union applies. Note
that each of the five statements in the proof has been placed on a separate line in the
display box. This has been done merely to facilitate the discussion about that proof.
In practice, there is no requirement that these statements appear on separate lines.
The second statement about all sets is $A \cap B \subseteq A$. This can be proved using the
same proof template as the first statement. Since this statement also applies to any
two sets A and B, the first line of this proof will be the same as the first line of the
previous proof. Because the assertion of the statement being proved is that $A \cap B$ is
a subset of another set, the “ASSERT THE HYPOTHESIS” line of the proof would
change to the assertion that x belongs to $A \cap B$. After reading this second line, what
does the proof reader know about x? Only that x belongs to the intersection of two
sets. Thus, the only direction that the proof can proceed is to invoke the definition
of set intersection to make the additional assertion that $x \in A$ and $x \in B$. This is a
statement of the form p and q, so logic allows the assertion that p is true, or, in this
case, that $x \in A$. This is the required “STATE THE CONCLUSION” statement, and
the complete proof would be
PROOF: $A \cap B \subseteq A$.
• SET THE CONTEXT: Suppose that $A$ and $B$ are any two sets.
• ASSERT THE HYPOTHESIS: Let $x \in A \cap B$.
• LIST IMPLICATIONS: By the definition of set intersection, $x \in A$ and
$x \in B$.
• Thus, $x \in A$.
• STATE THE CONCLUSION: Therefore, by the definition of subset,
$A \cap B \subseteq A$.
For a more substantial example, consider the third of the list of statements about
sets, $A \setminus (B \cup C) \subseteq (A \cup B) \setminus C$. A proof of this statement will need to refer to the
definition of set difference as well as the definitions of set union and subset. Since
the statement being proved involves three sets, the “SET THE CONTEXT” part of
the proof will need to refer to all three sets. The “ASSERT THE HYPOTHESIS”
statement will need to select an arbitrary element from $A \setminus (B \cup C)$. To emphasize
that the choice of which variable to use is arbitrary, this time use $y$ rather than $x$ to
represent the arbitrarily chosen element. Once it is known that $y \in A \setminus (B \cup C)$,
the only property of $y$ that can be used is the fact that $y$ is a member of a set
difference. Thus, this would be a good time to invoke the definition of set difference.
That assures that $y \in A$ and $y \notin (B \cup C)$. At that point one can use the definition of
set union to conclude that, since $y \notin (B \cup C)$, both $y \notin B$ and $y \notin C$. Now these facts
can be combined to get the “STATE THE CONCLUSION” statement required by
the proof template. The complete proof would be
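The displayed proof itself is missing at this point. The following LaTeX sketch is a reconstruction assembled only from the steps described in the preceding paragraph; the exact wording of each line is an assumption and should not be read as the author's own display.

```latex
\textbf{PROOF:} $A \setminus (B \cup C) \subseteq (A \cup B) \setminus C$.
\begin{itemize}
  \item SET THE CONTEXT: Suppose that $A$, $B$, and $C$ are any three sets.
  \item ASSERT THE HYPOTHESIS: Let $y \in A \setminus (B \cup C)$.
  \item LIST IMPLICATIONS: By the definition of set difference,
        $y \in A$ and $y \notin B \cup C$.
  \item By the definition of set union, $y \notin B$ and $y \notin C$.
  \item Since $y \in A$, the definition of set union gives $y \in A \cup B$.
  \item Since $y \in A \cup B$ and $y \notin C$, the definition of set difference
        gives $y \in (A \cup B) \setminus C$.
  \item STATE THE CONCLUSION: Therefore, by the definition of subset,
        $A \setminus (B \cup C) \subseteq (A \cup B) \setminus C$.
\end{itemize}
```

Note that the fact $y \notin B$ is derived but never used, which is consistent with the statement being a subset relation rather than an equality.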
2.3.4 Exercises
Let $A$ and $B$ be sets. From the definition of set equality it follows that one can prove
$A = B$ by proving the two separate facts $A \subseteq B$ and $B \subseteq A$. That suggests the
following proof template for proving that two sets are equal.
Is it correct to use the same variable x in both parts of the above proof template?
Yes, since the use of the variable $x$ is only important in the context of showing $A \subseteq B$
or $B \subseteq A$, there is little chance that the reader will be confused by these two uses of
the same variable. On the other hand, there would be nothing incorrect about using
the variable x to represent the element of set A in the first part of the proof and to
use the variable y to represent the element of set B in the second part of the proof.
Using the same variable has the advantage that it is used the same way in both parts
of the proof, that is, to represent an element of a set that is being shown to also be
an element of a second set.
Is it correct that the variables A and B are used to represent the sets in both parts
of the proof? Could, for example, the first part of the proof use sets A and B, and
the second part of the proof use sets C and D? Here the answer is that it is very
important to use the same variables in both parts of the proof. To show $A = B$ it
must be shown that $A \subseteq B$ and $B \subseteq A$ for the same pair of sets $A$ and $B$. Showing
$A \subseteq B$ and $C \subseteq D$ does not let one conclude that $A = B$. After introducing $A$ and $B$
in the “SET THE CONTEXT” part of the proof, it would be wrong to change the use
of these variables later in the proof or to change which variables were representing
the two sets.
Consider how to write proofs of three of the example statements:
• $A \cup B = B \cup A$, the Commutative Law of Union.
• $(A \cap B) \cap C = A \cap (B \cap C)$, the Associative Law of Intersection.
• $(A \cup B)^c = A^c \cap B^c$, DeMorgan’s Law.
The first proof follows easily from the fact that in logic the statements p or q and
q or p are equivalent. This leads to the proof
PROOF: $A \cup B = B \cup A$.
• SET THE CONTEXT: Suppose that $A$ and $B$ are any two sets.
PART 1 $A \cup B \subseteq B \cup A$
• ASSERT THE HYPOTHESIS: Let $x \in A \cup B$.
• LIST IMPLICATIONS: By the definition of set union, $x \in A$ or $x \in B$.
• Thus, $x \in B$ or $x \in A$.
• By the definition of set union $x \in B \cup A$.
• CONCLUDE PART 1: Hence, from the definition of subset, it follows that
$A \cup B \subseteq B \cup A$.
PART 2 $B \cup A \subseteq A \cup B$
• ASSERT THE HYPOTHESIS: Now suppose that $x \in B \cup A$.
• LIST IMPLICATIONS: By the definition of set union, $x \in B$ or $x \in A$.
• Thus, $x \in A$ or $x \in B$.
• By the definition of set union $x \in A \cup B$.
• CONCLUDE PART 2: Hence, from the definition of subset, it follows that
$B \cup A \subseteq A \cup B$.
Note that the “PART 1” and “PART 2” labels have been included in the above
display as guides to the student, but they are not required elements of the proof
itself. This proof can be shortened. Since the second part of the proof is identical to
the first part of the proof except that the roles of the sets A and B are interchanged,
one might save the reader from having to think through the details of the second half
of the proof which are identical to the details of the first half. The proof could be
written as
PROOF: $A \cup B = B \cup A$.
• Suppose that $A$ and $B$ are any two sets.
• Let $x \in A \cup B$.
• By the definition of set union, $x \in A$ or $x \in B$.
• Thus, $x \in B$ or $x \in A$.
• By the definition of set union $x \in B \cup A$.
• Hence, from the definition of subset, it follows that $A \cup B \subseteq B \cup A$.
• Similarly, one can conclude that $B \cup A \subseteq A \cup B$.
• Therefore, since $A \cup B$ and $B \cup A$ are subsets of each other, by the definition
of set equality $A \cup B = B \cup A$.
In fact, the first half of the proof is the second half of the proof. The first half of
the proof shows that $A \cup B \subseteq B \cup A$ for any two sets $A$ and $B$. In particular, that proof
applies when the roles of the two sets are interchanged; just let the variable $A$ in the
first part of the proof represent the set B from the second part of the proof, and let
the variable B in the first part of the proof represent the set A from the second part
of the proof.
The Associative Law of Intersection refers to three sets and requires repeated
use of the definition of “set intersection.” The definition is used to break down the
statement $x \in (A \cap B) \cap C$ into the three simple statements $x \in A$, $x \in B$, and $x \in C$
and then these facts are put back together to form the needed $x \in A \cap (B \cap C)$. Again,
the proof needs two parts. The result is
PROOF: $(A \cap B) \cap C = A \cap (B \cap C)$.
• Suppose that $A$, $B$, and $C$ are any three sets.
PART 1 $(A \cap B) \cap C \subseteq A \cap (B \cap C)$
• Let $x \in (A \cap B) \cap C$.
• By the definition of set intersection, $x \in (A \cap B)$ and $x \in C$.
• Also, by the definition of set intersection, $x \in A$ and $x \in B$.
• Thus, $x \in A$, $x \in B$, and $x \in C$.
• Since $x \in B$ and $x \in C$, by the definition of set intersection $x \in B \cap C$.
• Since $x \in A$ and $x \in B \cap C$, by the definition of set intersection
$x \in A \cap (B \cap C)$.
• Hence, from the definition of subset, it follows that
$(A \cap B) \cap C \subseteq A \cap (B \cap C)$.
PART 2 $A \cap (B \cap C) \subseteq (A \cap B) \cap C$
The two DeMorgan’s Laws are useful because they tell how to simplify the
complement of a set formed by a combination of unions and intersections of sets.
Proving these laws can follow the template for showing set equality. The proofs will
need to refer to the definitions of set union, set intersection, and set complement.
The order in which these definitions are invoked follows from what is known at that
point of the proof. For example, if you know that $x \in (A \cup B)^c$, then the only way
to make progress in the proof is to apply the definition of set complement because
the only attribute known about the set is that it is the complement of some other set.
[Fig. 2.2: Venn diagrams of sets $A$ and $B$ illustrating $(A \cup B)^c$ and $A^c \cap B^c$]
Yes, that other set is a union of two sets, but there is no way to use that information
at this point of the proof because complementation was performed after the union
was taken (Fig. 2.2).
PROOF: $(A \cup B)^c = A^c \cap B^c$.
• Suppose that $A$ and $B$ are any two sets.
PART 1 $(A \cup B)^c \subseteq A^c \cap B^c$
• Let $x \in (A \cup B)^c$.
• By the definition of set complement, $x \notin (A \cup B)$.
• If $x \in A$ or $x \in B$, then $x \in A \cup B$ which is false.
• Thus, $x \notin A$ and $x \notin B$, so by the definition of set complement, $x \in A^c$ and
$x \in B^c$.
• By the definition of set intersection $x \in A^c \cap B^c$.
• Hence, from the definition of subset, it follows that $(A \cup B)^c \subseteq A^c \cap B^c$.
PART 2 $A^c \cap B^c \subseteq (A \cup B)^c$
• Now, let $x \in A^c \cap B^c$.
• By the definition of set intersection, $x \in A^c$ and $x \in B^c$.
• Thus, by the definition of set complement, $x \notin A$ and $x \notin B$.
• If $x \in A \cup B$, then by the definition of union, it would follow that $x \in A$ or
$x \in B$ which is false.
• Thus, $x \notin A \cup B$, and, by the definition of set complement, $x \in (A \cup B)^c$.
• Hence, from the definition of subset, it follows that $A^c \cap B^c \subseteq (A \cup B)^c$.
• Therefore, because $A^c \cap B^c$ and $(A \cup B)^c$ are subsets of each other, by the
definition of set equality $(A \cup B)^c = A^c \cap B^c$.
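A proof covers all sets at once, but it can still be reassuring to test a statement on small cases before trying to prove it. The Python sketch below checks the three laws proved above for every choice of subsets of a three-element universe. The universe `U` and the helper names are illustrative assumptions, and passing this check is evidence only, not a proof.

```python
from itertools import combinations


def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for k in range(len(items) + 1)
            for c in combinations(items, k)]


U = frozenset({1, 2, 3})   # small illustrative universe (an assumption)
ALL = subsets(U)


def complement(s):
    """Complement of s relative to the universe U."""
    return U - s


def check_identities():
    """Test the three laws for every choice of A, B, C among the subsets of U."""
    for A in ALL:
        for B in ALL:
            # Commutative Law of Union
            if A | B != B | A:
                return False
            # DeMorgan's Law
            if complement(A | B) != complement(A) & complement(B):
                return False
            for C in ALL:
                # Associative Law of Intersection
                if (A & B) & C != A & (B & C):
                    return False
    return True


print(check_identities())
```

A failed check would produce a concrete counterexample, which is often the fastest way to discover that a conjectured set identity is false.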
2.3.6 Exercises
Given that $A$, $B$, and $C$ are sets, write proofs for each of the following statements.
1. $A \cap B = B \cap A$.
2. $A \cap (B \setminus A) = \emptyset$.
3. $(A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B)$.
4. $(A \cup B) \cup C = A \cup (B \cup C)$.
5. $(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$.
6. $(A \cap B) \cup C = (A \cup C) \cap (B \cup C)$.
7. $A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C)$.
2.4 Proofs About Even and Odd Integers
You are already very familiar with the natural numbers, $\mathbb{N} = \{1, 2, 3, 4, \ldots\}$,
which are sometimes called the counting numbers or whole numbers. By adding
zero and the negative natural numbers to this set, one obtains the integers,
$\mathbb{Z} = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$. The natural numbers are often referred to as the
positive integers. Much of a student’s first study of mathematics is concerned
with these two sets of numbers. By a very young age most people are already
familiar with even and odd integers and some of their properties. This section will
construct proofs of some of these properties both because the student will feel very
comfortable with the concepts and because it allows for the introduction of some
basics about how to write proofs.
Before proceeding with proofs, though, it is necessary that there is agreement on
the definitions of even and odd integers. Indeed, there are many possible definitions
of even integers:
$n \in \mathbb{Z}$ is an even integer if
• the decimal representation of $n$ has a ones digit equal to 0, 2,
4, 6, or 8.
• $n$ is either 0 or the prime factorization of $n$ contains a factor
of 2.
• there is an integer $k$ such that $n = 2k$.
• $i^n$ is a real number, where $i = \sqrt{-1}$.
• the number $(-1)^n$ is positive.
• $9^n \equiv 1 \pmod{10}$.
• $\sin\left(\frac{n\pi}{2}\right) = 0$.
• $\frac{n}{2} \in \mathbb{Z}$.
Which of these definitions should be used when writing proofs about even and
odd integers? Actually, since all the definitions are equivalent, one could adopt
any one of these definitions and then prove theorems that show that all the other
definitions are equivalent to the chosen definition. This is not an unusual situation
in mathematics, especially for a concept as elementary as even integers. But it turns
out that one of these definitions is particularly well suited for writing proofs, and
that is, $n \in \mathbb{Z}$ is even if there is a $k \in \mathbb{Z}$ such that $n = 2k$. This makes a useful
definition because it provides a fairly easy way to check whether a given integer is
even, and because knowing that a number $n$ is even immediately gives you a number
$k$ for which $n = 2k$, and that is a powerful tool for proving facts about even integers.
For this reason, this chosen definition is called the working definition, that is, it
is the definition easiest to apply in a wide variety of contexts. It is the definition
chosen from which all other properties of even numbers can be derived.
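The claim that the candidate definitions are equivalent can be spot-checked numerically. The Python sketch below encodes four of the listed definitions and confirms that they agree on a sample range of integers. The function names and the range are illustrative assumptions, and agreement on a finite range is not a proof of equivalence.

```python
from fractions import Fraction


def even_by_ones_digit(n):
    """The ones digit of the decimal representation is 0, 2, 4, 6, or 8."""
    return str(abs(n))[-1] in "02468"


def even_by_working_definition(n):
    """There is an integer k with n = 2k.

    Any such k satisfies |k| <= |n|, so this finite search is exhaustive.
    """
    return any(n == 2 * k for k in range(-abs(n), abs(n) + 1))


def even_by_sign(n):
    """The number (-1)**n is positive."""
    return (-1) ** n > 0


def even_by_half(n):
    """n/2 is an integer, tested with exact rational arithmetic."""
    return Fraction(n, 2).denominator == 1


DEFINITIONS = [even_by_ones_digit, even_by_working_definition,
               even_by_sign, even_by_half]


def definitions_agree(numbers):
    """True if all encoded definitions give the same answer for every number."""
    return all(len({d(n) for d in DEFINITIONS}) == 1 for n in numbers)


print(definitions_agree(range(-50, 51)))
```

Only the working definition hands back a witness $k$, which is the feature that makes it convenient in proofs.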
A similar discussion could take place about how to define odd integers. The
working definition is that $n \in \mathbb{Z}$ is odd if there is a $k \in \mathbb{Z}$ such that $n = 2k + 1$.
There is a long list of facts you could prove about even and odd numbers.
Facts About Even and Odd Integers
• Every integer is either even or odd.
• No integer is both even and odd.
• $n \in \mathbb{Z}$ is even if and only if $n + 1 \in \mathbb{Z}$ is odd.
• The sum of any two even integers is even.
• The sum of any two odd integers is even.
• The sum of an even integer and an odd integer
is odd.
• The product of two odd integers is odd.
• The product of two integers is odd only if both
of the factors are odd.
Together, the first two of these facts say that each integer is either even or odd
but not both. This says that the sets of even and odd integers form a partition of
$\mathbb{Z}$, that is, the sets are disjoint and the union of the sets is all of $\mathbb{Z}$. Some authors
require that all the sets of a partition be nonempty, as is the case with even and
odd integers. So why is it that every integer is either even or odd? This depends
on the Division Algorithm that states that if $m, n \in \mathbb{Z}$ with $n > 0$, then there are
unique $q, r \in \mathbb{Z}$ with $0 \le r < n$ such that $m = nq + r$. In this case $q$ is called
the quotient of the division, and $r$ is called the remainder of the division. Using
the Division Algorithm, any integer $m$ can be divided by 2 giving a quotient and
remainder where the remainder is either 0 or 1. If the remainder is 0, then $m = 2q$
for integer $q$ implying that $m$ is even, and if the remainder is 1, then $m = 2q + 1$ for
integer $q$ implying that $m$ is odd.
How can these ideas be used to write a good proof of Every integer is either even
or odd? First it is easier to reword the statement as If $m \in \mathbb{Z}$, then either $m$ is even
or m is odd. This is a conditional statement, so the natural way to begin a proof
is to assume that the hypothesis of the statement is satisfied, that is, that m is an
integer. Now apply the Division Algorithm to get the quotient q and remainder r
guaranteed by the algorithm. Finally, the value of r shows that m either satisfies the
definition of being an even integer or the definition of being an odd integer. The
result would be
PROOF: Every integer is either even or odd.
• Let $m$ be an integer.
• By the Division Algorithm there are integers $q$ and $r$ with $0 \le r < 2$ such
that $m = 2q + r$.
• If $r = 0$, then $m = 2q$ for integer $q$ which means that $m$ satisfies the
definition for being even.
• If $r = 1$, then $m = 2q + 1$ for integer $q$ which means that $m$ satisfies the
definition for being odd.
• Since $r$ must be either 0 or 1, it follows that every integer is either even
or odd.
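The proof above is constructive in the sense that the Division Algorithm actually produces the witness $q$. As a small illustration, Python's built-in `divmod` with divisor 2 returns a quotient and a remainder satisfying $m = 2q + r$ with $0 \le r < 2$, including for negative $m$. The function name `classify` is an illustrative assumption.

```python
def classify(m):
    """Return ("even", q) or ("odd", q) where q is the quotient from dividing m by 2."""
    q, r = divmod(m, 2)
    # The Division Algorithm guarantees both of these facts.
    assert m == 2 * q + r
    assert 0 <= r < 2
    if r == 0:
        return ("even", q)   # m = 2q
    return ("odd", q)        # m = 2q + 1


for m in (10, 7, 0, -7):
    print(m, classify(m))
```

For $m = -7$ the quotient is $-4$ and the remainder is 1, since $-7 = 2(-4) + 1$; the remainder is never negative.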
Next consider how to prove the statement The sum of any two odd integers is
even. The statement concerns the sum of any two odd integers, so the proof reader
would expect the proof to consider two arbitrarily chosen odd integers. Once two
odd integers are chosen, the definition of odd integer should be invoked because, at
that point, that is the only information that is known about the two integers. Finally,
a little algebra will help to show that the sum of these two odd integers satisfies the
definition of even integer. Here is an attempt to write such a proof that makes several
common proof writing errors.
PROOF ATTEMPT: The sum of any two odd integers is even.
• The two integers are odd, so each has the form $2k + 1$.
• The sum of these two integers is $(2k + 1) + (2k + 1) = 4k + 2$.
• $k$ could be even or odd.
• The number 2 is even since it is $2 \cdot 1$.
• $4k$ is even since it is $2 \cdot 2k$.
• The sum of two even numbers is even, so the sum of $4k$ and 2 is an even
number.
• Therefore, the sum of two odd integers is always even.
start with an odd integer, say $m$, and then represent it as $2k + 1$ rather than starting
with $2k + 1$. The subtle point is that one should start with an odd integer and use
its definition to move on to $2k + 1$ rather than starting with $2k + 1$ which jumps
the gun. The reader of the proof could wonder whether $2k + 1$ could represent
a generic odd integer. Well, it can, but this takes some thought which can be
avoided by starting with an odd integer $m$ and then using the definition of odd to
select the integer $k$ such that $m = 2k + 1$.
• The definition of “odd integer” does refer to “$2k + 1$,” but it is more precise. It
does not say “has the form.” It says that there is an integer $k$ such that the odd
number equals $2k + 1$.
• It is a major error to allow both odd integers to equal $2k + 1$ for the same number
$k$. The only way this can happen is for the two odd integers themselves to be
equal. Thus, this “proof” only applies to a small subset of cases where one adds
two identical odd integers together such as $3 + 3$ or $117 + 117$.
• The statement k could be even or odd is certainly correct, but it does not
contribute to the proof. It is a statement about items in the proof that is not part
of the proof. Occasionally, one will make a definition as part of a long proof,
and then give some examples to help the reader understand that definition. But
if a statement is not needed either as a critical step in a proof or an important
illustration to aid the understanding of the proof, then the statement should be
left out of the proof because it distracts from the proof and complicates it.
• The statement The sum of two even numbers is even is correct, but it has not
been proved yet, at least in this text, and is equivalent in difficulty to proving the
corresponding statement about the sum of odd integers. Thus, it is not appropriate
to use the result about sums of even integers to prove one about the sum of odd
integers.
Considering these ideas, one can construct a better proof.
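The improved proof is not displayed at this point. A LaTeX sketch consistent with the criticisms listed above might read as follows; the letters $j$ and $k$ for the two witnesses are illustrative choices, and the wording is a reconstruction rather than the author's own display.

```latex
\textbf{PROOF:} The sum of any two odd integers is even.
\begin{itemize}
  \item Let $m$ and $n$ be odd integers.
  \item By the definition of odd integer, there is an integer $j$ such that
        $m = 2j + 1$ and an integer $k$ such that $n = 2k + 1$.
  \item Then $m + n = (2j + 1) + (2k + 1) = 2j + 2k + 2 = 2(j + k + 1)$.
  \item Since $j + k + 1$ is an integer, $m + n$ satisfies the definition of
        even integer.
  \item Therefore, the sum of any two odd integers is even.
\end{itemize}
```

The sketch starts from the odd integers themselves, uses separate witnesses for each, and appeals to nothing beyond the working definitions.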
The form of this proof can be copied almost word for word to get a similar proof
of the statement The product of two odd integers is odd.
2.4.3 Exercises
2.5 Basic Facts About Real Numbers
Many of the theorems of Calculus involve properties of the real numbers. Some
of these properties are subtle, so it is essential to understand this important set of
numbers. Already introduced are the sets of natural numbers, $\mathbb{N}$, and the integers, $\mathbb{Z}$.
Also of importance is the set of rational numbers, $\mathbb{Q} = \{\frac{m}{n} \mid m, n \in \mathbb{Z}, n \neq 0\}$. This
definition comes with the understanding that the two rational numbers $\frac{m}{n}$ and $\frac{a}{b}$ are
equal whenever $mb = na$. Thus, there are always infinitely many representations for
each rational number. For all rational numbers $r \neq 0$, one can always find relatively
prime integers $m$ and $n$ with $n > 0$ such that $r = \frac{m}{n}$. Together with an agreement to
write the rational number 0 as $\frac{0}{1}$, each rational number has a unique lowest terms
representation.
The set of rational numbers is more than a set of fractions with integers for
numerators and denominators. It also comes with the two binary operations of
addition ($+$) and multiplication ($\cdot$) and with the order relation less than ($<$).
The binary operations satisfy conditions which make Q into a field. A field F
is a set with operations of addition and multiplication that satisfies the following
axioms.
Axioms for a Field F
A set $F$ together with the binary operations of addition ($+$) and multiplication
($\cdot$) form a field if $F$ contains the two elements 0 and 1 with $0 \neq 1$ such that for
every $r, s, t \in F$
• the Closure Properties: $r + s \in F$ and $r \cdot s \in F$, and
$r = s \rightarrow r + t = s + t$ and $r = s \rightarrow r \cdot t = s \cdot t$
• the Associative Properties: $(r + s) + t = r + (s + t)$ and $(r \cdot s) \cdot t = r \cdot (s \cdot t)$
• the Commutative Properties: $r + s = s + r$ and $r \cdot s = s \cdot r$
• the Identity Properties: $r + 0 = r$ and $r \cdot 1 = r$
• the Inverse Properties: there exists $-r \in F$ such that $r + (-r) = 0$, and
if $r \neq 0$, there exists $\frac{1}{r} \in F$ such that $r \cdot \frac{1}{r} = 1$
• the Distributive Law of Multiplication Over Addition: $r \cdot (s + t) = r \cdot s + r \cdot t$
Notice that the rational numbers do satisfy the eleven field axioms. One defines the
operation subtraction ($-$) by $r - s = r + (-s)$ and the operation division ($\div$) for
$s \neq 0$ by $r \div s = r \cdot \frac{1}{s} = \frac{r}{s}$. Moreover, the field $\mathbb{Q}$ together with the less than order
relation is an ordered field that obeys the following axioms.
Notice that the rational numbers do satisfy the four ordered field axioms. One
defines the other order relations of greater than ($>$), greater than or equal to
($\ge$), and less than or equal to ($\le$) in the obvious ways, that is, $r > s$ whenever
$s < r$, $r \ge s$ whenever either $r > s$ or $r = s$, and $r \le s$ whenever either $r < s$ or
$r = s$.
There are many other ordered fields, and it is constructive to consider how to
justify the fifteen ordered field axioms for a different ordered field. For example,
the set $T = \{r + s\sqrt{2} \mid r, s \in \mathbb{Q}\}$ is an ordered field using the usual addition and
multiplication operations. For two elements $a + b\sqrt{2}$ and $c + d\sqrt{2}$ in $T$, define
addition as $(a + b\sqrt{2}) + (c + d\sqrt{2}) = (a + c) + (b + d)\sqrt{2}$ and multiplication
as $(a + b\sqrt{2}) \cdot (c + d\sqrt{2}) = (ac + 2bd) + (ad + bc)\sqrt{2}$. To define the less than
relation you would want $(a + b\sqrt{2}) < (c + d\sqrt{2})$ whenever $a - c < (d - b)\sqrt{2}$,
which can be checked by squaring both $a - c$ and $(d - b)\sqrt{2}$, although you will
need to also consider the signs of $a - c$ and $d - b$. Thus, the definition becomes
$(a + b\sqrt{2}) < (c + d\sqrt{2})$ if one of the following holds:
• $a - c < 0$ and $0 < d - b$,
• $0 < a - c$, $0 < d - b$, and $(a - c)^2 < 2(d - b)^2$, or
• $a - c < 0$, $d - b < 0$, and $(a - c)^2 > 2(d - b)^2$.
It is fairly easy to check that $T$ is an ordered field. The only field axiom which
does not follow immediately from the properties of rational numbers is the inverse
axiom for multiplication. You should verify that for $a + b\sqrt{2} \neq 0$, its multiplicative
inverse is
$$\frac{a}{a^2 - 2b^2} + \frac{-b}{a^2 - 2b^2}\sqrt{2}$$
which is in $T$. The order axioms take more work to verify due to the complicated
definition of less than. For example, to verify the less than relation works correctly
with addition, one would begin with three elements of $T$, $a + b\sqrt{2}$, $c + d\sqrt{2}$, and
$e + f\sqrt{2}$ where it is given that $a + b\sqrt{2} < c + d\sqrt{2}$. One needs to compare
$(a + b\sqrt{2}) + (e + f\sqrt{2})$ with $(c + d\sqrt{2}) + (e + f\sqrt{2})$. To do this, one compares the
values of $(a + e) - (c + e) = a - c$ and $(d + f) - (b + f) = d - b$. But this reduces to
comparing $a - c$ and $d - b$ which are known to satisfy the correct conditions because
$a + b\sqrt{2} < c + d\sqrt{2}$ was given.
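The arithmetic of $T$ can be carried out exactly using rational coefficients. The Python sketch below is one possible encoding, not the author's: the class name `T`, its method names, and the floating-point helper `approx` are illustrative assumptions. The comparison follows the sign analysis described above, and as an assumption of this sketch it also treats the boundary cases in which $a - c$ or $d - b$ equals zero.

```python
from fractions import Fraction
from math import sqrt


class T:
    """An element a + b*sqrt(2) with rational coefficients a and b."""

    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return T(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2
        return T(self.a * other.a + 2 * self.b * other.b,
                 self.a * other.b + self.b * other.a)

    def __eq__(self, other):
        return self.a == other.a and self.b == other.b

    def inverse(self):
        # a^2 - 2b^2 is zero only when a = b = 0, because sqrt(2) is irrational.
        n = self.a ** 2 - 2 * self.b ** 2
        if n == 0:
            raise ZeroDivisionError("zero has no multiplicative inverse")
        return T(self.a / n, -self.b / n)

    def __lt__(self, other):
        u = self.a - other.a   # a - c
        v = other.b - self.b   # d - b
        # The goal is to decide whether u < v*sqrt(2) using only rationals.
        if u < 0 and v >= 0:
            return True
        if u >= 0 and v > 0:
            return u * u < 2 * v * v
        if u < 0 and v < 0:
            return u * u > 2 * v * v
        return False           # u >= 0 and v <= 0

    def approx(self):
        """Floating-point value, used only to sanity-check the exact order."""
        return float(self.a) + float(self.b) * sqrt(2)


x = T(1, 2)                              # 1 + 2*sqrt(2)
print(x * x.inverse() == T(1, 0))        # product with the inverse is 1
print(T(1, 1) < T(3))                    # 1 + sqrt(2) is less than 3
```

Comparing `__lt__` against `approx` on a grid of small coefficients is a quick way to catch a sign mistake in the case analysis.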
Every ordered field satisfies a long list of simple properties that you will
associate with facts learned in Arithmetic and Algebra. Here are some of those
properties.
The reader may wish to prove some of these properties by applying the axioms.
This book will not dwell on these proofs since the techniques used in proving them
are not essential for writing most proofs in Analysis. Two simple proofs are given
here as examples.
The next theorem essentially says that if $(-1) \cdot r$ has the same properties as $-r$, it
must equal $-r$.
Note that every ordered field $F$ will contain a copy of $\mathbb{Q}$. This follows since
$0, 1 \in F$, and if $n$ is a natural number in $F$, then $n + 1 \in F$. Thus, it follows by
mathematical induction that $n \in F$ for all $n \in \mathbb{N}$. Moreover, since $0 < 1$, it follows
for each $n \in \mathbb{N}$ that $n = n + 0 < n + 1$ showing that all natural numbers are
distinct elements of $F$. The existence of the negatives of all numbers in $F$ implies
that the integers form a subset of $F$, and the existence of reciprocals implies that all of
$\mathbb{Q}$ lies in $F$. There are fields which are not ordered fields, and some of them do not
contain copies of $\mathbb{Q}$. Indeed, there are finite fields as well as infinite fields that do
not contain $\mathbb{Q}$ or even $\mathbb{N}$.
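As a concrete instance of a finite field, consider the integers modulo 5 with addition and multiplication reduced modulo 5. This example is not taken from the text; it is offered as an illustrative assumption. The Python sketch below brute-force checks the identity, inverse, commutative, associative, and distributive axioms (closure is automatic because every result is reduced modulo 5), and then shows that adding 1 to itself five times returns 0, so the "natural numbers" inside this field are not distinct.

```python
P = 5                       # illustrative modulus; any prime would do
ELEMENTS = range(P)


def add(r, s):
    return (r + s) % P


def mul(r, s):
    return (r * s) % P


def is_field():
    """Brute-force check of the field axioms for the integers modulo P."""
    for r in ELEMENTS:
        # Identity properties
        if add(r, 0) != r or mul(r, 1) != r:
            return False
        # Inverse properties
        if not any(add(r, s) == 0 for s in ELEMENTS):
            return False
        if r != 0 and not any(mul(r, s) == 1 for s in ELEMENTS):
            return False
        for s in ELEMENTS:
            # Commutative properties
            if add(r, s) != add(s, r) or mul(r, s) != mul(s, r):
                return False
            for t in ELEMENTS:
                # Associative properties
                if add(add(r, s), t) != add(r, add(s, t)):
                    return False
                if mul(mul(r, s), t) != mul(r, mul(s, t)):
                    return False
                # Distributive law
                if mul(r, add(s, t)) != add(mul(r, s), mul(r, t)):
                    return False
    return True


def repeated_ones(n):
    """Return 1 + 1 + ... + 1 (n ones) computed in the integers modulo P."""
    total = 0
    for _ in range(n):
        total = add(total, 1)
    return total


print(is_field())
print(repeated_ones(5))
```

Because $1 + 1 + 1 + 1 + 1 = 0$ here, no order relation can satisfy $n < n + 1$ for every $n$, which matches the remark that such a field is not an ordered field.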
There are infinitely many ordered fields. The real numbers, R, is special because
it includes every number that is considered a possible “distance from zero,” either
positive, negative, or zero. An easy way to ensure that R contains every possible
distance is to require it to satisfy the Completeness Axiom. This axiom considers
nonempty subsets of an ordered field, $F$, (actually, any ordered set would do). A
subset $S \subseteq F$ is said to be bounded above if there is an $M \in F$ such that all $x \in S$
satisfy $x \le M$. In this case, $M$ is called an upper bound of $S$. Similarly, $S \subseteq F$ is
bounded below by lower bound $K \in F$ if all $x \in S$ satisfy $x \ge K$. If $S \subseteq F$ is both
bounded above and bounded below, then $S$ is said to be bounded. If $M$ is an upper
bound for a set $S$, and it is less than or equal to every upper bound of $S$, then $M$ is
the least upper bound of $S$. Similarly, if $K$ is a lower bound for a set $S$, and it is
greater than or equal to every lower bound of $S$, then $K$ is the greatest lower bound
of $S$. For example, if $S$ is the interval $[1, 5) = \{x \mid 1 \le x < 5\}$, then 10, 6, and $\sqrt{30}$
are all upper bounds of $S$, but 5 is the least upper bound of $S$. Also, $-2$, 0, and $\frac{1}{2}$
are all lower bounds of $S$, but 1 is the greatest lower bound of $S$. One often uses the
notation l.u.b.$(S)$ or $\sup(S)$ to represent the least upper bound or supremum of $S$
and g.l.b.$(S)$ or $\inf(S)$ to represent the greatest lower bound or infimum of $S$.
You may not have ever doubted that every nonnegative real number has a square
root, but this is a fact that can be proved using the axioms for the real numbers. It is
a nice application of both the Trichotomy Property and the Completeness Axiom.
Given a positive real number, $r$, the proof constructs the set $S = \{x \in \mathbb{R} \mid x^2 \le r\}$
and then uses the Completeness Axiom to exhibit a value, $s$, equal to the least upper
bound of $S$. Then it shows that $s^2$ cannot be greater than $r$ and cannot be less than $r$,
so by the Trichotomy Property, $s^2$ must equal $r$.
In particular, the proof first assumes that $s^2 > r$ and shows that there is a number
$y > 0$ such that the square of $s - y$ is also greater than $r$. This shows that $s - y$ is an
upper bound for $S$ which contradicts the fact that $s$ is the least upper bound of $S$. The
proof magically suggests that $y = \frac{s^2 - r}{4s}$ works. Where did this magical expression
for $y$ come from? It came from considering what property you would want such a $y$
to have. If you want $(s - y)^2 > r$, this suggests that you want $s^2 - 2sy + y^2 > r$. This
inequality is quadratic in $y$ and has an unnecessarily messy solution. But one of the
most important lessons about writing proofs in Analysis is that one can often be a
little sloppy when trying to show that an inequality holds. Here, for example, rather
than finding a $y$ such that $s^2 - 2sy + y^2 > r$, it would be sufficient to find a $y$ such
that $s^2 - 2sy > r$, because if $s^2 - 2sy > r$, then certainly the needed $s^2 - 2sy + y^2 > r$
also holds. The advantage of making this change is that the inequality $s^2 - 2sy > r$
is very easy to solve for $y$ yielding $y < \frac{s^2 - r}{2s}$. Thus, the value $y = \frac{s^2 - r}{4s}$ ought to work
fine, and, hence, the magic is demystified. Of course, there are many other possible
values of $y$ that would also have worked in this proof, but only one value for $y$ is
needed.
After showing that $s^2 > r$ cannot be true, the proof assumes that $s^2 < r$ and
shows that there is a number $y > 0$ such that the square of $s + y$ is less than $r$.
This shows that $s + y$ is in $S$ which contradicts the fact that $s$ is an upper bound of
$S$. Again, the proof just suggests setting $y = \frac{r - s^2}{4s}$. Can you figure out where this
expression for $y$ came from? Indeed, the calculation is similar to the one above. You
need $(s + y)^2 \le r$, so $s^2 + 2sy + y^2 \le r$. It is simpler if $y^2$ could be replaced by
$2sy$. If you assume $y \le 2s$, it allows you to conclude $y^2 \le 2sy$ so that $(s + y)^2 =
s^2 + 2sy + y^2 \le s^2 + 2sy + 2sy = s^2 + 4sy$. You then want a $y$ that satisfies
$s^2 + 4sy \le r$. Thus, the value $y = \frac{r - s^2}{4s}$ gives the needed value of $y$ (Fig. 2.3). Putting
these ideas together gives the following proof.
Fig. 2.3 Showing the least upper bound of $S$ is $s = \sqrt{r}$
• Thus, it must be true that $s^2 = r$ which proves that for every real number
$r \ge 0$ there is an $s \in \mathbb{R}$ with $s^2 = r$.
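The two choices of $y$ discussed before the proof can be checked with exact rational arithmetic. In the Python sketch below, the function names `shrink` and `grow` are illustrative assumptions. Taking the minimum with $2s$ in `grow` is also an assumption of this sketch: it enforces the condition $y \le 2s$ that the discussion requires, and it may differ from how the full proof handles that condition.

```python
from fractions import Fraction


def shrink(s, r):
    """For s > 0 with s*s > r, return y = (s^2 - r)/(4s), so that (s - y)^2 > r."""
    return (s * s - r) / (4 * s)


def grow(s, r):
    """For s > 0 with s*s < r, return a y > 0 with y <= 2s and (s + y)^2 <= r."""
    return min(2 * s, (r - s * s) / (4 * s))


r = Fraction(2)

s_big = Fraction(3, 2)            # (3/2)^2 = 9/4 > 2
y = shrink(s_big, r)
print(y, (s_big - y) ** 2 > r)    # s - y is still an upper bound of S

s_small = Fraction(1)             # 1^2 = 1 < 2
y = grow(s_small, r)
print(y, (s_small + y) ** 2 <= r) # s + y still belongs to S
```

In the first case $y = \frac{1}{24}$ and in the second $y = \frac{1}{4}$; in both, the "sloppy" simplified inequality was enough to produce a valid $y$.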
The concept that separates the area of Mathematics known as Analysis from other
branches such as Algebra, Topology, Set Theory, and Combinatorics is the idea
of distance. In the real numbers, one can measure distance by using the absolute
value function which is defined as
$$|x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0. \end{cases}$$
For a real number $x$, the
absolute value of $x$ can be thought of as the distance that $x$ is from the real number
0. Note that for all $x \in \mathbb{R}$ it holds that $-|x| \le x \le |x|$. If $k > 0$, then the set
$\{x \mid |x| < k\}$ is the same as the set $\{x \mid -k < x < k\}$. Similarly, the set $\{x \mid |x| > k\}$
is the same as the set $\{x \mid -k > x \text{ or } x > k\}$.
The distance between two real numbers $x$ and $y$ can be defined as $|x - y|$. Note
that this distance is positive unless $x = y$.
One property of the absolute value function used frequently in proofs in Analysis
is the triangle inequality which states that for all $x, y \in \mathbb{R}$, $|x + y| \le |x| + |y|$. The
name of this inequality comes from geometry where it is known that the sum of the
lengths of two sides of a triangle always exceeds the length of the third side of the
triangle (Fig. 2.4). One simple proof of the triangle inequality is
PROOF (Triangle Inequality): $|x + y| \le |x| + |y|$
• Let $x$ and $y$ be elements of $\mathbb{R}$.
• Then $-|x| \le x \le |x|$ and $-|y| \le y \le |y|$.
• Adding these inequalities yields $-(|x| + |y|) \le x + y \le (|x| + |y|)$.
• This last inequality is equivalent to $|x + y| \le |x| + |y|$.
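The last step of the proof uses the equivalence between $|z| \le k$ and $-k \le z \le k$. As a small numerical illustration, the Python sketch below checks both forms of the inequality on a grid of rational sample values. The function name and the sample grid are illustrative assumptions, and a finite check is no substitute for the proof.

```python
from fractions import Fraction


def triangle_holds(x, y):
    """Check the two-sided form and the absolute-value form of the inequality."""
    bound = abs(x) + abs(y)
    two_sided = -bound <= x + y <= bound
    return two_sided and abs(x + y) <= bound


samples = [Fraction(n, 4) for n in range(-12, 13)]
print(all(triangle_holds(x, y) for x in samples for y in samples))
```

Equality occurs exactly when $x$ and $y$ do not have opposite signs, for example $x = 2$ and $y = 3$.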
A subset S contained in R is called connected if it has the property that for any
two numbers in S, all the numbers between those two numbers are also in S. More
precisely, $S$ is connected if for all $x, y \in S$ with $x < y$, it follows that $z \in S$ for all $z$
with $x < z < y$. Informally, this means that there are no holes in the set $S$. Another
word for a connected set of real numbers is an interval. If a < b are real numbers,
all of the following sets are intervals.
Intervals of Real Numbers
$\emptyset$    empty set
$\{a\} = [a, a]$    single point
$\{x \mid a < x < b\} = (a, b)$    open bounded interval
$\{x \mid a \le x \le b\} = [a, b]$    closed bounded interval
$\{x \mid a \le x < b\} = [a, b)$    bounded interval open on the right
$\{x \mid a < x \le b\} = (a, b]$    bounded interval open on the left
$\{x \mid a < x\} = (a, \infty)$    open right infinite interval
$\{x \mid x < b\} = (-\infty, b)$    open left infinite interval
$\{x \mid a \le x\} = [a, \infty)$    closed right infinite interval
$\{x \mid x \le b\} = (-\infty, b]$    closed left infinite interval
$\mathbb{R}$    entire real line
2.5.4 Exercises
2.6 Functions
2.6.2 Surjection
The domain of the function $f$ is exactly the set of all $x$ that are first coordinates of
the ordered pairs in $f$, that is, the domain is $A = \{x \mid (x, y) \in f\}$. The range of $f$ is
defined as the image of $f$, that is, the range is $\{y \mid (x, y) \in f\}$. Clearly, the codomain
of $f$ can be any set that contains the range of $f$. This can lead to some confusion
since the codomain of $f$ is not precisely defined. It is simply a convenience. When
one defines a function $f : \mathbb{R} \to \mathbb{R}$, one means that $f$ is defined for every real number,
and that for any $x \in \mathbb{R}$, the value $f(x)$ also lies in $\mathbb{R}$. This is the case whether
$\mathbb{R}$ is the range of $f$ or the range of $f$ is actually some proper subset of $\mathbb{R}$. It could
be difficult and unnecessary to calculate exactly which subset of $\mathbb{R}$ is the range of
$f$, so it might be easier to just give the codomain as $\mathbb{R}$ and avoid the technicalities
of figuring out just what values of $\mathbb{R}$ are in the range of $f$. For example, the function
$f(x) = 3x^6 - 15x^4 + 12x^3 + 25x^2 - 32x + 14$ maps the real numbers into the real
numbers, but to find the range of $f$, you would need to find the minimum value
of $f$. This minimum exists, but it may not be possible to give its value explicitly.
Note that the crucial step in proving that a function is surjective is showing the
existence of an $x$ with $f(x) = y$ and verifying that the $x$ is in the domain $A$ of the
function. For example, the function $f(x) = 5x^2 + 1$ is a surjection from the negative
real numbers onto the interval $(1, \infty)$. To prove this you would need to show that for
each real number $y > 1$ there is a negative real number $x$ for which $f(x) = y$. But this
just involves a simple algebraic manipulation. That is, if you need $5x^2 + 1 = y$, then
you can solve to get $5x^2 = y - 1$ and $x^2 = \frac{y - 1}{5}$. Here one needs to be careful because
it is easy to continue by writing $x = \sqrt{\frac{y - 1}{5}}$ which always results in a positive value
for $x$. The proof needs to exhibit a negative value for $x$, so it is important to set
$x = -\sqrt{\frac{y - 1}{5}}$. There is no need for the proof to display the steps of solving the
equation for $x$. The goal is to produce a value of $x \in A$ such that $f(x) = y$; how you
arrived at that $x$ is not important. It may be interesting, but it is not an essential part
of the proof, and, therefore, it should not be part of the proof.
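The surjection example can be illustrated numerically: for each $y > 1$ the proof exhibits the negative value $x = -\sqrt{(y - 1)/5}$ and verifies that $f(x) = y$. The Python sketch below does the same for a few sample values. The function name `preimage` and the sample values are illustrative assumptions, and floating-point results are compared with a tolerance.

```python
from math import sqrt, isclose


def f(x):
    return 5 * x ** 2 + 1


def preimage(y):
    """Return the negative x with f(x) = y, valid for y > 1."""
    if y <= 1:
        raise ValueError("y must lie in the interval (1, infinity)")
    return -sqrt((y - 1) / 5)


for y in (1.5, 2, 6, 101):
    x = preimage(y)
    # x must lie in the domain (negative reals) and must map to y.
    print(y, x, x < 0 and isclose(f(x), y))
```

As in the written proof, the code only exhibits $x$ and checks it; the algebra that led to the formula plays no role in the verification.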
2.6.3 Injection
The definition of function requires that each value $x$ in the domain of $f$ is found in
exactly one ordered pair $(x, y) \in f$. The same does not have to hold for values in
the codomain, that is, one value $y$ in the codomain could appear in many ordered pairs
$(x, y) \in f$. For example, for the constant function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = 1$ for
all $x \in \mathbb{R}$, the value 1 appears as the second coordinate in all the ordered pairs of
the function. If a function has the property that no value of $y$ appears as the second
coordinate of more than one ordered pair in $f$, then $f$ is said to be injective or, less
formally, that $f$ is one-to-one. In this case the function $f$ is called an injection. In
such a case, one sees that $f(x_1) = f(x_2)$ only if $x_1 = x_2$. This gives a procedure for
proving that a function is injective.
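[Figure: composition diagram with sets $A$, $B$, and $C$, points $x$, $y$, and $z$, and the labels $f(x)$, $g(x)$, and $f \circ g$]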
2.6.4 Composition
2.6.5 Exercises
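[Figure: graphs of $y = f(x)$ near $x = a$, showing the band from $L - \varepsilon$ to $L + \varepsilon$ and the interval from $a - \delta$ to $a + \delta$]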
such that $|f(x) - L| < \varepsilon$ for all $x$ in that interval. That is, there is a value $\delta > 0$
such that every $x$ satisfying $|x - a| < \delta$ also satisfies $|f(x) - L| < \varepsilon$ as seen in
Fig. 3.3. Again, the choice of the Greek letter $\delta$ (delta) is completely arbitrary, but
the tradition of using $\delta$ in this context is universal.
Note that for the function whose graph appears in Fig. 3.3 the value $L$ is the limit
of the function as $x$ approaches $a$, and $L$ also happens to be the value of $f(x)$ at $x = a$.
You should recall that this sometimes happens (specifically when $f$ is continuous at
$x = a$), but that this is not a requirement. Indeed, one reason for discussing limits
in the first place is that there is a need to evaluate the behavior of a function
as $x$ approaches a value $a$ when the function fails to be defined at $x = a$. Thus, in
general, one does not want to require $|f(x) - L| < \varepsilon$ for all $x$ with $|x - a|$ less than
some positive value $\delta$ since this would require $|f(x) - L| < \varepsilon$ at $x = a$. Instead,
one excludes the need for the function to satisfy any conditions at all at $x = a$ by
saying that there is a positive $\delta$ such that $f$ is within the desired tolerance of $L$ for all
$x$ with $0 < |x - a| < \delta$. Clearly, the value of $\delta$ must be chosen to be positive since
no negative value would represent a distance, and $\delta = 0$ would not result in a region
around the number $a$ satisfying $|x - a| < \delta$.
Combining these ideas results in the following definition. Suppose that the
function $f$ is defined for all $x$ in an open interval containing $a \in \mathbb{R}$ except perhaps
at $x = a$. Then the limit of $f$ as $x$ approaches $a$ is $L$, $\lim_{x \to a} f(x) = L$, means that for
every $\varepsilon > 0$ there exists a $\delta > 0$ such that for every $x$ satisfying $0 < |x - a| < \delta$, it
follows that $|f(x) - L| < \varepsilon$. The power of this definition is the fact that the $\varepsilon$ and $\delta$
are arbitrary positive numbers. For example, what if you knew that for every $\varepsilon > 0$
there were a $\delta > 0$ such that whenever $0 < |x - a| < \delta$, then $|f(x) - L| < 2\varepsilon$?
Would this be sufficient for showing $\lim_{x \to a} f(x) = L$? The answer is yes, because the
$\varepsilon$ is arbitrary. Suppose that for any $\varepsilon > 0$ you can find a $\delta > 0$ that will ensure that
$|f(x) - L| < 2\varepsilon$. Then since $\frac{\varepsilon}{2}$ is also a positive number, you can find a $\delta' > 0$, likely
smaller than $\delta$, that will ensure that $|f(x) - L| < 2 \cdot \frac{\varepsilon}{2} = \varepsilon$. The point here is that
since $\varepsilon$ was arbitrary, you can replace it with any positive number, including $\frac{\varepsilon}{2}$.
3.1.1 Exercises
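1. For all $\varepsilon \ge 0$ there is a $\delta \ge 0$ such that if $0 < |x - a| < \delta$, then $|f(x) - L| < \varepsilon$.
2. For all $\varepsilon > 0$ there is a $\delta > 0$ such that if $0 < |x - a| < \frac{\delta}{4}$, then $|f(x) - L| < 7\varepsilon$.
3. For all $\varepsilon > 0.001$ there is a $\delta > 0.001$ such that if $0 < |x - a| < \delta$, then
$|f(x) - L| < \varepsilon$.
4. For all $\delta > 0$ there is an $\varepsilon > 0$ such that if $0 < |x - a| < \delta$, then $|f(x) - L| < \varepsilon$.
5. There exists a $\delta > 0$ such that for all $\varepsilon > 0$, if $0 < |x - a| < \delta$, then $|f(x) - L| < \varepsilon$.
6. For all $\delta > 0$ there is an $\varepsilon > 0$ such that if $0 < |x - a| < \varepsilon$, then $|f(x) - L| < \delta$.
7. For all $\varepsilon > 1$ there is a $\delta > 1$ such that if $1 < |x - a| + 1 < \delta$, then
$|f(x) - L| + 1 < \varepsilon$.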
The definition of limit provides a formula by which one can construct a proof that
a particular function $f$ has a limit of $L$ at the point $a$. The definition requires that
for every $\varepsilon > 0$ there is a $\delta > 0$ that satisfies certain properties. Thus, a proof of a
limit must show that for every $\varepsilon > 0$ you can exhibit a $\delta > 0$ which has the needed
property. As with other proofs that some property holds for all elements of a set, the
proof begins by selecting an arbitrary element of that set. In this case, one would
select an arbitrary $\varepsilon > 0$. The goal is to present a value for $\delta > 0$ such that every $x$
satisfying $0 < |x - a| < \delta$ also satisfies $|f(x) - L| < \varepsilon$. That suggests the following
proof template.
• SET THE CONTEXT: Make statements telling what is known about the
function $f$ and the numbers $a$ and $L$.
• SELECT AN ARBITRARY $\varepsilon$: Given $\varepsilon > 0$,
• PROPOSE A VALUE FOR $\delta$: let $\delta =$ ____. Here you would insert an
appropriate value for $\delta$.
• SELECT AN ARBITRARY $x$: Select $x$ such that $0 < |x - a| < \delta$.
• LIST IMPLICATIONS: Derive the result $|f(x) - L| < \varepsilon$.
• STATE THE CONCLUSION: Therefore, $\lim_{x \to a} f(x) = L$.
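PROOF: $\lim_{x \to 5} 2x - 3 = 7$
• Let $f(x) = 2x - 3$.
• Given $\varepsilon > 0$, let $\delta = \frac{\varepsilon}{2} > 0$.
• Select $x$ such that $0 < |x - 5| < \delta = \frac{\varepsilon}{2}$.
• Then $\delta > |x - 5|$ implies $\varepsilon > 2|x - 5| = |2x - 10| = |(2x - 3) - 7| = |f(x) - 7|$.
• Therefore, $\lim_{x \to 5} 2x - 3 = 7$.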
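The choice $\delta = \frac{\varepsilon}{2}$ in the proof above can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the function names `delta_for` and `delta_works` and the sample tolerances are invented here, and testing finitely many points does not replace the proof.

```python
def f(x):
    # The function from the proof: f(x) = 2x - 3.
    return 2 * x - 3


def delta_for(eps):
    # The proof's choice of delta for a given epsilon.
    return eps / 2


def delta_works(eps, samples=1000):
    """Test |f(x) - 7| < eps at sample points with 0 < |x - 5| < delta."""
    delta = delta_for(eps)
    for k in range(1, samples):
        offset = delta * k / samples  # strictly between 0 and delta
        for x in (5 - offset, 5 + offset):
            if not abs(f(x) - 7) < eps:
                return False
    return True


for eps in (10.0, 1.0, 0.01):
    assert delta_works(eps)
```

Both sides of $a = 5$ are sampled because the condition $0 < |x - 5| < \delta$ allows $x$ on either side.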
The only problem with the above proof is in its use of the variable $\delta$. In the second
line of the proof $\delta$ is set to 1, and in the fourth line it is set to $\frac{\varepsilon}{27}$. It does not make
sense to set the value of $\delta$ equal to both of these values because, except in the rare
case that $\varepsilon = 27$, the value of $\delta$ cannot be equal to both values at the same time.
The solution is to choose one value for $\delta$ that satisfies two separate conditions. For
example, you can first require that $\delta < 1$. Then a choice of $x$ with $0 < |x - 4| < \delta$
will guarantee that $|x + 4| < 9$. Then $\frac{\varepsilon}{3|x+4|} > \frac{\varepsilon}{3 \cdot 9} = \frac{\varepsilon}{27}$. This suggests that you
should select $\delta = \frac{\varepsilon}{27}$. But you also need $\delta \le 1$. What happens if someone suggests
that $\varepsilon$ be some rather large number such as $\varepsilon = 100$? Then $\delta = \frac{\varepsilon}{27}$ would not satisfy
$\delta < 1$. This is not a problem since one can always get away with selecting a positive
value for $\delta$ that is smaller than needed. Thus, you can select $\delta$ to be the lesser of 1
and $\frac{\varepsilon}{27}$. This choice is usually written as $\delta = \min\left(\frac{\varepsilon}{27}, 1\right)$. Now you can put this all
together to get a formal proof that is completely correct.
But this follows with some fairly straightforward algebra. Assuming that $x \ne -2$,
PROOF: $\lim_{x \to -2} \frac{x+2}{x^2+3x+2} = -1$
Clearly, at the point that you stipulate that $\delta$ should be less than $\frac{1}{2}$, you are making
a rather arbitrary decision. What would have happened if you had chosen some
other reasonable bound on the size of $\delta$? For example, what if instead you only
require $\delta < \frac{3}{4}$? This would also work, although that decision would affect the final
choice of $\delta$, for now $|x + 1|$ can get as small as $\frac{1}{4}$, and $\frac{|x+2|}{|x+1|}$ could be as large as
$4|x + 2|$. This suggests that you then select $\delta = \min\left(\frac{\varepsilon}{4}, \frac{3}{4}\right)$. This choice is no better
or worse than the $\delta$ chosen earlier. When one makes such arbitrary decisions, it
is good form to make a selection that does not lead to unnecessary arithmetic or
algebraic complications because one does not want to make the proof any harder to
read than necessary. Thus, it would be perfectly adequate but enormously awkward to
select the bound on $\delta$ to be $\frac{\sqrt{5}}{1+\sqrt{5}}$. As long as the bound is less than 1, it will do the
job of keeping $|x + 1|$ bounded away from zero, but $\frac{\sqrt{5}}{1+\sqrt{5}}$ would certainly not be an
optimal choice.
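To see the alternative bound in action, the following sketch spot-checks $\delta = \min\left(\frac{\varepsilon}{4}, \frac{3}{4}\right)$ for the limit $\lim_{x \to -2} \frac{x+2}{x^2+3x+2} = -1$ discussed above. The function names and sample tolerances are assumptions made for illustration, and sampling points is only a sanity check, not a proof.

```python
def f(x):
    # The function from the example; undefined at x = -2 and x = -1.
    return (x + 2) / (x**2 + 3 * x + 2)


def delta_for(eps):
    # The alternative choice discussed above, using the bound 3/4.
    return min(eps / 4, 0.75)


def delta_works(eps, samples=1000):
    """Test |f(x) - (-1)| < eps at sample points with 0 < |x + 2| < delta."""
    delta = delta_for(eps)
    for k in range(1, samples):
        offset = delta * k / samples  # strictly between 0 and delta
        for x in (-2 - offset, -2 + offset):
            if not abs(f(x) - (-1)) < eps:
                return False
    return True


for eps in (100.0, 1.0, 0.01):
    assert delta_works(eps)
```

Because $\delta \le \frac{3}{4}$, none of the sampled points reaches $x = -1$, where the function is undefined.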
3.2.1 Exercises
The one-sided limits $\lim_{x \to a^-} f(x) = L$ and $\lim_{x \to a^+} f(x) = L$ are very similar to
two-sided limits except that the value of $x$ is only allowed to approach the real number $a$
from one side. As a result, the definitions of these one-sided limits are very similar
to the definition of limit with minor alterations that force $x$ to stay on one side of
$a$. The definition of limit states that for a function $f$ defined in a neighborhood of $a$,
but not necessarily at $a$, the limit $\lim_{x \to a} f(x) = L$ means for every $\varepsilon > 0$ there exists a
$\delta > 0$ such that for every $x$ satisfying $0 < |x - a| < \delta$, it follows that $|f(x) - L| < \varepsilon$.
What is it about this definition that allows $x$ to approach $a$ from two sides? It is the
inequality $0 < |x - a| < \delta$ that allows $x$ to be either greater than or less than $a$ since
$|x - a|$ is positive in either case. By removing the absolute value function in this
inequality and writing instead $0 < x - a < \delta$, the choice of $x$ becomes restricted
to being a value greater than $a$, or writing instead $0 < a - x < \delta$, the choice of $x$
becomes restricted to being a value less than $a$. Thus, if $f$ is a function defined for
all $x$ in an open interval with right end at $a$, then the limit of $f$ at $a$ from the left is
$L$, $\lim_{x \to a^-} f(x) = L$, means that for every $\varepsilon > 0$ there is a $\delta > 0$ such that for every $x$
satisfying $0 < a - x < \delta$, it follows that $|f(x) - L| < \varepsilon$. Similarly, if $f$ is a function
defined for all $x$ in an open interval with left end at $a$, then the limit of $f$ at $a$ from
the right is $L$, $\lim_{x \to a^+} f(x) = L$, means that for every $\varepsilon > 0$ there is a $\delta > 0$ such that
for every $x$ satisfying $0 < x - a < \delta$, it follows that $|f(x) - L| < \varepsilon$.
One-sided limits are particularly useful in cases where the function $f$ behaves
differently on one side of $a$ than on the other side, such as the way $e^{1/x}$ behaves quite
differently as $x$ approaches 0 from the right, where $\frac{1}{x}$ is positive, from how $e^{1/x}$ behaves
as $x$ approaches 0 from the left, where $\frac{1}{x}$ is negative. Similarly, the derivative of
$f(x) = |x|$ has different limits as $x$ approaches 0 from the right and from the left.
There are also cases where a function is not even defined for $x$ on one side of $a$, such
as $f(x) = \sqrt{x}$, which is not defined for $x < 0$.
Proving the existence of one-sided limits is very similar to proving two-sided
limits except that care must be taken to ensure that the value of $x$ remains on one
side of $a$. Take, for example, the limit $\lim_{x \to 3^+} 2x^2 - 5x = 3$. Here $f(x) = 2x^2 - 5x$,
$a = 3$, and $L = 3$. As with a proof of other limits earlier in the chapter, the proof
needs to give a value for $\delta > 0$ which will ensure $\varepsilon > |f(x) - L| = |(2x^2 - 5x) - 3| =
|(2x + 1)(x - 3)|$. This will follow if $|x - 3| < \delta \le \frac{\varepsilon}{|2x+1|}$ for all suitable values of
$x$. What is needed is the largest possible value of $2x + 1$, but $2x + 1$ is not bounded
unless $x$ is restricted to be close to 3. Thus, stipulate that $\delta$ be less than 1 which will
ensure that $x - 3$ will be less than 1, $x$ will not exceed 4, and $2x + 1$ will not exceed 9.
Then $\delta$ can be chosen to be $\min\left(\frac{\varepsilon}{9}, 1\right)$, and the proof can be written as follows.
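The choice $\delta = \min\left(\frac{\varepsilon}{9}, 1\right)$ for this right-hand limit can be spot-checked with a short Python sketch. The function names and tolerances below are illustrative assumptions, and only points to the right of $a = 3$ are sampled because the limit is one-sided.

```python
def f(x):
    # The function from the example: f(x) = 2x^2 - 5x.
    return 2 * x**2 - 5 * x


def delta_for(eps):
    # The choice suggested above for the limit from the right at a = 3.
    return min(eps / 9, 1.0)


def delta_works(eps, samples=1000):
    """Test |f(x) - 3| < eps at sample points with 0 < x - 3 < delta."""
    delta = delta_for(eps)
    for k in range(1, samples):
        x = 3 + delta * k / samples  # stays strictly to the right of 3
        if not abs(f(x) - 3) < eps:
            return False
    return True


for eps in (50.0, 9.0, 0.01):
    assert delta_works(eps)
```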
Consider a function whose left limit differs from its right limit such as the
function
$f(x) = \begin{cases} 5 - 7x & \text{if } x < 1 \\ x & \text{if } x \ge 1 \end{cases}$. Then $\lim_{x \to 1^-} f(x) = -2$ while $\lim_{x \to 1^+} f(x) = 1$. Thus,
while proving $\lim_{x \to 1^-} f(x) = -2$, it is important to use the fact that $x < 1$ as part
of the proof since the required inequalities will not hold for $x > 1$ (Fig. 3.4). The
following shows one possible proof.
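PROOF: $\lim_{x \to 1^-} \begin{cases} 5 - 7x & \text{if } x < 1 \\ x & \text{if } x \ge 1 \end{cases} = -2$
• Let $f(x) = \begin{cases} 5 - 7x & \text{if } x < 1 \\ x & \text{if } x \ge 1 \end{cases}$.
• Given $\varepsilon > 0$, let $\delta = \frac{\varepsilon}{7}$.
• Select $x$ such that $0 < 1 - x < \delta = \frac{\varepsilon}{7}$. Then $x < 1$, so $f(x) = 5 - 7x$.
• It follows that $\varepsilon > 7(1 - x) = 5 - 7x - (-2) = |f(x) - (-2)|$.
• Therefore, $\lim_{x \to 1^-} f(x) = -2$.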
In the third line of the proof, $0 < 1 - x < \delta$ ensures that $x < 1$ which, in turn, is
needed to conclude that $f(x) = 5 - 7x$ and not $f(x) = x$. The fact that $x < 1$ is also
used in the fourth line of the proof to conclude that $5 - 7x - (-2) = |f(x) - (-2)|$
which follows because $5 - 7x - (-2)$ is positive for all $x < 1$.
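A small Python sketch can illustrate why the restriction $x < 1$ matters in this proof. The helper names and the sample tolerance are assumptions made for illustration; the check samples points to the left of 1 and then evaluates one point to the right of 1 for contrast.

```python
def f(x):
    # The piecewise function from the example.
    return 5 - 7 * x if x < 1 else x


def left_delta(eps):
    # The proof's choice of delta for the limit from the left at a = 1.
    return eps / 7


def left_side_ok(eps, samples=1000):
    """Test |f(x) - (-2)| < eps at sample points with 0 < 1 - x < delta."""
    delta = left_delta(eps)
    return all(
        abs(f(1 - delta * k / samples) - (-2)) < eps
        for k in range(1, samples)
    )


eps = 0.5
assert left_side_ok(eps)

# A point just to the right of 1 uses the other branch, f(x) = x,
# so f(x) is near 1 and is not within eps of -2.
assert not abs(f(1.001) - (-2)) < eps
```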
3.3.1 Exercises
The definitions given in the last two sections do not make sense when the real
number that $x$ approaches, $a$, is replaced by infinity. Infinity, of course, is not an
element of the real numbers, $\mathbb{R}$, but it does make sense to ask whether a function
approaches a limit when $x$ increases without bound, that is, as $x$ approaches infinity.
When one writes $\lim_{x \to \infty} f(x) = L$, one is thinking that $f(x)$ is getting close to the real
number $L$ as $x$ increases without bound. But it does not make sense to measure how
close $x$ is to infinity by choosing a $\delta > 0$ so that when $x$ is within $\delta$ of infinity, $f(x)$
is close to $L$. Since infinity is not a real number, one cannot measure the distance
from the real number $x$ to infinity, much less expect $x$ to get within $\delta$ of infinity. So
how does one quantify “getting closer to infinity”? The answer lies in the phrase
“increases without bound,” which suggests that for any bound, $N$, you could place
on the size of $x$, the value of $x$ can be made to be greater than that bound. Thus,
instead of selecting a $\delta > 0$ and requiring $0 < |x - a| < \delta$, one chooses a number
$N \in \mathbb{R}$ and requires $x > N$. This allows the following definition. Suppose that the
function $f$ is defined for all $x > K$ for some real number $K$. Then the limit of $f$ as
$x$ approaches infinity is $L$, $\lim_{x \to \infty} f(x) = L$, means that for every $\varepsilon > 0$ there exists
an $N \in \mathbb{R}$ such that for every $x > N$, it follows that $|f(x) - L| < \varepsilon$ (Fig. 3.5). Now
consider how one might write a proof of a limit at infinity. For example, consider
proving the limit $\lim_{x \to \infty} \frac{x}{x^2+6} = 0$. Here $f(x) = \frac{x}{x^2+6}$ and $L = 0$. As with other limit
proofs, the goal is to arrange that $|f(x) - L| < \varepsilon$ for an arbitrarily chosen $\varepsilon > 0$.
Again, you can work backwards. Since $|f(x) - L| = \left|\frac{x}{x^2+6}\right|$, as long as $x > 0$, it
would follow that $\left|\frac{x}{x^2+6}\right| < \frac{x}{x^2} = \frac{1}{x}$. Thus, there is an expression, $\frac{1}{x}$, which is larger
than $|f(x) - L|$ for all suitably large values of $x$. This will help because if you can
assure that $\frac{1}{x}$ is less than $\varepsilon$, it will follow that $|f(x) - L|$ is also less than $\varepsilon$. It would
not have been helpful to exhibit an expression that was always less than $|f(x) - L|$
because making that expression small would not imply that $|f(x) - L|$ is small. Now,
if $x > \frac{1}{\varepsilon}$, it follows that $\frac{1}{x} < \varepsilon$, suggesting that $\frac{1}{\varepsilon}$ is a suitable value for $N$.
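PROOF: $\lim_{x \to \infty} \frac{x}{x^2+6} = 0$
• Let $f(x) = \frac{x}{x^2+6}$.
• Given $\varepsilon > 0$, let $N = \frac{1}{\varepsilon}$.
• Select $x$ such that $x > N > 0$.
• Then $x > \frac{1}{\varepsilon}$ implies $\varepsilon > \frac{1}{x} = \frac{x}{x^2} > \frac{x}{x^2+6} = \left|\frac{x}{x^2+6} - 0\right| = |f(x) - 0|$.
• Therefore, $\lim_{x \to \infty} \frac{x}{x^2+6} = 0$.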
Note that it is important that the third step of the proof pointed out that $N$ is positive.
It is used in the fourth step when $\frac{1}{x}$ is calculated, and this would not have been
allowed if the value of $x$ could have been zero.
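The bound $N = \frac{1}{\varepsilon}$ from the proof of $\lim_{x \to \infty} \frac{x}{x^2+6} = 0$ can be spot-checked as follows. The function names and the particular offsets beyond $N$ are assumptions chosen for illustration; a finite sample of points is not a proof.

```python
def f(x):
    # The function from the proof: f(x) = x / (x^2 + 6).
    return x / (x**2 + 6)


def bound_for(eps):
    # The proof's choice N = 1 / epsilon, which is positive.
    return 1 / eps


def bound_works(eps, offsets=(0.001, 1.0, 10.0, 1e3, 1e6)):
    """Test |f(x) - 0| < eps at several sample points x > N."""
    n_bound = bound_for(eps)
    return all(abs(f(n_bound + s) - 0) < eps for s in offsets)


for eps in (10.0, 1.0, 0.01):
    assert bound_works(eps)
```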
For a second example, consider proving $\lim_{x \to \infty} \frac{2x+5}{x-7} = 2$. Again, you can work
backwards to get $\varepsilon > |f(x) - L| = \left|\frac{2x+5}{x-7} - 2\right| = \left|\frac{(2x+5) - 2(x-7)}{x-7}\right| = \left|\frac{19}{x-7}\right|$. From here
there are a number of ways to proceed. You can solve for $x$ in the previous inequality
to get $x > 7 + \frac{19}{\varepsilon}$ which gives a reasonable value for $N$. Another way would be to
say that if $x > 14$, then $x - 7 > x - \frac{x}{2} = \frac{x}{2}$. In this case the fraction $\frac{19}{x-7}$ is less than
$\frac{19}{x/2} = \frac{38}{x}$, and it becomes clear that $x > \frac{38}{\varepsilon}$ is sufficient.
This is an example demonstrating the enormous flexibility one sometimes has in
writing proofs in analysis where you often need to prove an inequality which can
be done in many ways. It is usually easier to prove an inequality involving a simple
fraction rather than a complicated fraction, so you can use the strategy of replacing
a fraction with a simpler fraction that is clearly larger, or in some cases, clearly
smaller. Keep in mind that a ratio of positive values gets larger if its numerator gets
larger or its denominator gets smaller.
A complete proof can be written as follows.
PROOF: $\lim_{x \to \infty} \frac{2x+5}{x-7} = 2$
As in the previous proof it is important that $x > 7$ is pointed out in the third step of
the proof because that fact is needed both to ensure that $f(x)$ is defined by assuring
$x - 7 \ne 0$ and that $x - 7$ is positive allowing the absolute value function to be
introduced in the fifth step of the proof.
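The two ways of choosing $N$ for $\lim_{x \to \infty} \frac{2x+5}{x-7} = 2$ can be compared numerically. In the sketch below, `bound_by_solving` uses $N = 7 + \frac{19}{\varepsilon}$, and `bound_by_estimate` combines the two requirements $x > 14$ and $x > \frac{38}{\varepsilon}$ into $N = \max\left(14, \frac{38}{\varepsilon}\right)$; that combination, the function names, and the sample offsets are assumptions made here for illustration.

```python
def f(x):
    # The function from the example: f(x) = (2x + 5) / (x - 7).
    return (2 * x + 5) / (x - 7)


def bound_by_solving(eps):
    # Solve 19 / (x - 7) < eps directly for x.
    return 7 + 19 / eps


def bound_by_estimate(eps):
    # Require x > 14 so that x - 7 > x / 2, then require 38 / x < eps.
    return max(14.0, 38 / eps)


def bound_works(bound, eps, offsets=(0.5, 1.0, 10.0, 1e3, 1e6)):
    """Test |f(x) - 2| < eps at several sample points x > bound."""
    return all(abs(f(bound + s) - 2) < eps for s in offsets)


for eps in (5.0, 1.0, 0.01):
    assert bound_works(bound_by_solving(eps), eps)
    assert bound_works(bound_by_estimate(eps), eps)
```

The second bound is larger than necessary for small $\varepsilon$, which is harmless: any value of $N$ beyond a working one also works.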
With a slight adjustment of the definition of $\lim_{x \to \infty} f(x) = L$, one gets a definition
of $\lim_{x \to -\infty} f(x) = L$. This time rather than choosing an $N$ and requiring $|f(x) - L| < \varepsilon$
for all $x > N$, one instead needs $f(x)$ to be within $\varepsilon$ of $L$ for those $x < N$. Thus,
$\lim_{x \to -\infty} f(x) = L$ means that for every $\varepsilon > 0$ there exists an $N \in \mathbb{R}$ such that for all
$x < N$ it follows that $|f(x) - L| < \varepsilon$.
To prove $\lim_{x \to -\infty} \frac{6x^2+5x}{2x^2-7} = 3$, one can identify an $N$ such that $x < N$ implies that
$\left|\frac{6x^2+5x}{2x^2-7} - 3\right| < \varepsilon$ by working backwards. That is, $\left|\frac{6x^2+5x}{2x^2-7} - 3\right| = \left|\frac{(6x^2+5x) - 3(2x^2-7)}{2x^2-7}\right| =
\left|\frac{5x+21}{2x^2-7}\right|$. It would be nice to simplify this rather messy expression, something you
can do as long as you do not introduce changes that prevent the final inequality from
holding. In this case, the $-7$ term in the denominator of $\frac{5x+21}{2x^2-7}$ is an inconvenience,
and it would be nice to remove it. Simply removing this negative term would make
the absolute value of the fraction smaller when what is needed is to make the fraction
larger. A strategy that does work is to take part of the $2x^2$ term, which grows very
large as $x$ goes to $-\infty$, and pair it with the $-7$ term. For example, $2x^2 - 7$ can
be written as $x^2 + (x^2 - 7)$. Because $x^2 - 7$ is a positive value for all $x < -\sqrt{7}$,
removing it from the denominator makes the absolute value of the fraction greater.
Also note that when $x < -\frac{21}{10}$, the numerator $|5x + 21| < 5|x|$, and this happens for
all $x < -\sqrt{7}$. It would then be sufficient for $\varepsilon > \frac{5}{|x|} = \frac{5|x|}{x^2} > \left|\frac{5x+21}{2x^2-7}\right|$ or that $x < -\frac{5}{\varepsilon}$
as long as $x < -\sqrt{7}$. A proof would be as follows.
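PROOF: $\lim_{x \to -\infty} \frac{6x^2+5x}{2x^2-7} = 3$
• Let $f(x) = \frac{6x^2+5x}{2x^2-7}$.
• Given $\varepsilon > 0$, let $N = \min\left(-\sqrt{7}, -\frac{5}{\varepsilon}\right)$.
• Select $x$ such that $x < N \le -\sqrt{7}$.
• Then $x < N \le -\frac{5}{\varepsilon}$ implies $\varepsilon > \left|\frac{5}{x}\right| = \left|\frac{5x}{x^2}\right| > \left|\frac{5x+21}{x^2+(x^2-7)}\right| =
\left|\frac{(6x^2+5x) - 3(2x^2-7)}{2x^2-7}\right| = \left|\frac{6x^2+5x}{2x^2-7} - 3\right| = |f(x) - 3|$.
• Therefore, $\lim_{x \to -\infty} \frac{6x^2+5x}{2x^2-7} = 3$.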
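As a numerical companion to the proof above, the following sketch checks the bound $N = \min\left(-\sqrt{7}, -\frac{5}{\varepsilon}\right)$ at several points $x < N$. The function names and sample offsets are illustrative assumptions, and the check is only a spot test.

```python
import math


def f(x):
    # The function from the proof: f(x) = (6x^2 + 5x) / (2x^2 - 7).
    return (6 * x**2 + 5 * x) / (2 * x**2 - 7)


def bound_for(eps):
    # The proof's choice: N is the lesser of -sqrt(7) and -5 / epsilon.
    return min(-math.sqrt(7), -5 / eps)


def bound_works(eps, offsets=(0.001, 1.0, 10.0, 1e3, 1e6)):
    """Test |f(x) - 3| < eps at several sample points x < N."""
    n_bound = bound_for(eps)
    return all(abs(f(n_bound - s) - 3) < eps for s in offsets)


for eps in (10.0, 2.0, 1.0, 0.01):
    assert bound_works(eps)
```

For large $\varepsilon$ the bound $-\sqrt{7}$ is the active one, and for small $\varepsilon$ the bound $-\frac{5}{\varepsilon}$ takes over, which is exactly the role of the minimum.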
3.4.1 Exercises
Find ways to justify each of the following inequalities that hold for large values
of $x$.
1. $\frac{3x-5}{2x^2} < \frac{2}{x}$
2. $\frac{4x+7}{2x^2-6} < \frac{5}{x}$
3. $\frac{5x^2+3x+1}{x^3-x^2-1} < \frac{10}{x}$
A sequence is just a function whose domain is the set of natural numbers, $\mathbb{N}$. In this
chapter the codomain of a sequence will be the real numbers, $\mathbb{R}$, but you can have
a sequence with any set serving as the codomain. Functions are usually referenced
using the notation $f(x)$. But for sequences it is traditional to place the argument of
a sequence in a subscript rather than within parentheses as in $a_1, a_2, a_3, \ldots$. The
entire sequence is notated with angle brackets as in $\langle a_n \rangle$. Note that this is not the
same as the set $\{a_1, a_2, a_3, \ldots\}$ which is just the collection of the values taken on by
the sequence, that is, the range of the function $a : \mathbb{N} \to \mathbb{R}$. For each $n \in \mathbb{N}$, $a_n$ is
called a term of the sequence, or specifically, the $n$th term of the sequence.
As with any real-valued function, you can add, subtract, multiply, and divide
sequences. The sum of sequences $\langle a_n \rangle$ and $\langle b_n \rangle$ is the sequence $\langle c_n \rangle$ where,
for each $n \in \mathbb{N}$, $c_n = a_n + b_n$. Similarly, one can define the difference of sequences
and product of sequences as $c_n = a_n - b_n$ and $c_n = a_n b_n$, respectively. If the
sequence $\langle b_n \rangle$ has no terms equal to zero, then the quotient of sequences $\langle a_n \rangle$
and $\langle b_n \rangle$ is the sequence $c_n = \frac{a_n}{b_n}$.
Other arithmetic operations can be similarly defined. If $f$ is any real-valued
function with a domain that includes the range of the sequence $\langle a_n \rangle$, then it makes
sense to define the sequence $c_n = f(a_n)$. For example, if $\langle a_n \rangle$ is the sequence
$1, 3, 5, 7, \ldots$, then the sequence $\langle \sqrt{a_n} \rangle$ is the sequence $\sqrt{1}, \sqrt{3}, \sqrt{5}, \sqrt{7}, \ldots$.
• $1, 2, 3, \ldots$
• $1, 1, 2, 3, 3, 4, 5, 5, \ldots$
• $\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \ldots$
• $1^3, 2^3, 3^3, 4^3, \ldots$
whereas the following sequences are monotone decreasing:
• $1, 0, -1, -2, -3, \ldots$
• $8, 4, 2, 1, \frac{1}{2}, \frac{1}{4}, \ldots$
• $0, 0, 0, -\frac{1}{2}, -\frac{1}{2}, -\frac{1}{2}, -1, -1, -1, -\frac{3}{2}, \ldots$
• $\frac{1}{4^4}, \frac{1}{5^5}, \frac{1}{6^6}, \ldots$
It is interesting to notice that every sequence of real numbers can be written as
a sum of a monotone increasing sequence and a monotone decreasing sequence. In
particular, if $\langle c_n \rangle$ is a sequence of real numbers, define an increasing sequence
$\langle a_n \rangle$ and a decreasing sequence $\langle b_n \rangle$ as follows. Let $a_1 = c_1$ and $b_1 = 0$. Then
for all $n \in \mathbb{N}$ if $c_n \le c_{n+1}$, define $a_{n+1} = c_{n+1} - b_n$ and $b_{n+1} = b_n$, and if
$c_n > c_{n+1}$, define $a_{n+1} = a_n$ and $b_{n+1} = c_{n+1} - a_n$. These definitions make it
clear that $c_n = a_n + b_n$ for each $n \in \mathbb{N}$. The sequence $\langle a_n \rangle$ is increasing because
$c_n \le c_{n+1}$ implies that $a_{n+1} - a_n = (c_{n+1} - b_n) - (c_n - b_n) = c_{n+1} - c_n \ge 0$, and
$c_n > c_{n+1}$ implies $a_n = a_{n+1}$. Similarly, $\langle b_n \rangle$ is decreasing because $c_n > c_{n+1}$
implies that $b_{n+1} - b_n = (c_{n+1} - a_n) - (c_n - a_n) = c_{n+1} - c_n < 0$, and $c_n \le c_{n+1}$
implies $b_n = b_{n+1}$. Thus, $1, -1, 2, -2, 3, -3, \ldots$ can be written as the sum of the
two sequences $1, 1, 4, 4, 9, 9, \ldots$ and $0, -2, -2, -6, -6, -12, \ldots$.
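The recursive construction just described translates directly into code. The following Python sketch applies it to a finite list of terms; the function name `monotone_decomposition` is an assumption made for illustration, and a finite list only stands in for the first few terms of an infinite sequence.

```python
def monotone_decomposition(c):
    """Split the terms c into an increasing list a and a decreasing list b
    with c[i] == a[i] + b[i], following the construction in the text."""
    if not c:
        return [], []
    a = [c[0]]  # a_1 = c_1
    b = [0]     # b_1 = 0
    for current, following in zip(c, c[1:]):
        if current <= following:
            # The sequence rose: a absorbs the rise, b stays put.
            a.append(following - b[-1])
            b.append(b[-1])
        else:
            # The sequence fell: a stays put, b absorbs the drop.
            a.append(a[-1])
            b.append(following - a[-1])
    return a, b


c = [1, -1, 2, -2, 3, -3]
a, b = monotone_decomposition(c)
assert a == [1, 1, 4, 4, 9, 9]
assert b == [0, -2, -2, -6, -6, -12]
assert all(x + y == z for x, y, z in zip(a, b, c))
```

The asserted lists are the two sequences given in the text for $1, -1, 2, -2, 3, -3, \ldots$.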
3.5.4 Subsequences
The definition of the limit of a sequence is similar to that of the limit of a function
as $x \to \infty$ except that the function is only defined on the natural numbers. Thus,
if $\langle a_n \rangle$ is a sequence of real numbers, then the limit of the sequence is $L$,
$\lim_{n \to \infty} a_n = L$, means that for all $\varepsilon > 0$ there is an $N$ such that for every natural
number $n > N$ it follows that $|a_n - L| < \varepsilon$. A sequence that has limit $L$ is said to
converge to $L$ and is said to be a convergent sequence. A sequence that does not
converge is said to diverge and is said to be a divergent sequence.
Except for slight notational changes, proving that a sequence has a particular
limit involves the same type of work as proving that a function has a particular limit
as its variable approaches infinity. For example, the sequence $a_n = \frac{4n^2+n+2}{2n^2-7}$ has
limit 2. To prove this, given an $\varepsilon > 0$, you would need to exhibit a number $N$ such
that $|a_n - 2| < \varepsilon$ for all $n > N$. As with writing proofs about functions, one can
work backwards from $|a_n - 2| = \left|\frac{4n^2+n+2}{2n^2-7} - 2\right| = \left|\frac{n+16}{2n^2-7}\right| \le \left|\frac{n+16n}{n^2+(n^2-7)}\right|$. If you
stipulate that $n \ge 3$, then $n^2 - 7 \ge 9 - 7 = 2 > 0$ allowing you to conclude that
$|a_n - 2| < \left|\frac{17n}{n^2}\right| = \frac{17}{n}$ which can easily be made less than $\varepsilon$ by requiring $n > \frac{17}{\varepsilon}$.
This is what is needed for the proof.
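PROOF: $\lim_{n \to \infty} \frac{4n^2+n+2}{2n^2-7} = 2$
• Let $a_n = \frac{4n^2+n+2}{2n^2-7}$.
• Given $\varepsilon > 0$, let $N = \max\left(3, \frac{17}{\varepsilon}\right)$.
• Select an $n > N$.
• Since $N \ge 3$, it follows that $n^2 > 9$.
• Also, $n > N$ gives $n > \frac{17}{\varepsilon}$. Thus, $\varepsilon > \frac{17}{n} = \left|\frac{17n}{n^2}\right| > \left|\frac{n+16n}{n^2+(n^2-7)}\right| > \left|\frac{n+16}{2n^2-7}\right| =
\left|\frac{4n^2+n+2}{2n^2-7} - 2\right| = |a_n - 2|$.
• Therefore, $\lim_{n \to \infty} \frac{4n^2+n+2}{2n^2-7} = 2$.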
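The bound $N = \max\left(3, \frac{17}{\varepsilon}\right)$ from the proof above can be spot-checked on a run of consecutive indices. The function names and the number of indices tested are assumptions made for illustration only.

```python
import math


def a(n):
    # The sequence from the proof: a_n = (4n^2 + n + 2) / (2n^2 - 7).
    return (4 * n**2 + n + 2) / (2 * n**2 - 7)


def bound_for(eps):
    # The proof's choice: N is the greater of 3 and 17 / epsilon.
    return max(3.0, 17 / eps)


def bound_works(eps, how_many=500):
    """Test |a_n - 2| < eps for the first `how_many` natural numbers n > N."""
    first_n = math.floor(bound_for(eps)) + 1  # smallest natural number > N
    return all(abs(a(n) - 2) < eps for n in range(first_n, first_n + how_many))


for eps in (10.0, 1.0, 0.1, 0.001):
    assert bound_works(eps)
```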
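[Figure: number line showing the terms $a_1, a_2, a_3, a_4, a_5, \ldots, a_N, a_n$ approaching $L$, with $L - \varepsilon$ marked]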
So how would you write the proof? Certainly the proof would begin with
selecting a generic sequence and making a statement about the properties the
sequence is assumed to have, that is, its being monotone increasing and bounded
above. Then, the proof would proceed to justify the existence of the least upper
bound for the set of terms of the sequence; that will give you the target value of L.
Then, as with most proofs about limits, it would select a value for $\varepsilon > 0$. Unlike
the limit proofs earlier in this chapter, one cannot immediately state a value for $N$.
The existence of $N$ must be proved as discussed in the previous paragraph. Finally,
the properties of the sequence can be brought together to show $|a_n - L| < \varepsilon$ for all
$n > N$. Here is one possible proof.
Note that the proof needs to refer to the sequence $\langle a_n \rangle$ as well as a particular
element of the sequence $a_n$. It could be confusing to the reader of the proof to use the
variable $n$ in both contexts here, especially since the sequence notation $\langle a_n \rangle$ is
used after the choice of a specific value of $n$ is made. That is the reason the proof
changed to using the variable $j$ to refer to a generic term index. Then, it could refer
to a specific term using index $n$ without confusing the two uses.
There is also a theorem stating that a monotone decreasing sequence that is
bounded below converges. The proof of this is left as an exercise.
As an illustration of the usefulness of the above result, consider a sequence
defined recursively by $a_1 = 2$, and for $n \ge 1$, $a_{n+1} = \sqrt{a_n + 12}$. That is,
$a_1 = 2$, $a_2 = \sqrt{a_1 + 12} = \sqrt{14}$, $a_3 = \sqrt{\sqrt{14} + 12}$, and so forth. One can
prove that this sequence converges by showing that the sequence is both monotone
increasing and bounded above. Indeed, both of these facts can be established by
mathematical induction. The reader is likely already familiar with proofs by
mathematical induction, but this is an appropriate opportunity to review the method
and its merits.
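The recursive sequence just described can be explored numerically before anything is proved about it. The sketch below computes the first ten terms and checks that they increase and stay below 4. The upper bound 4 is an assumption introduced here (it is the positive solution of $L = \sqrt{L + 12}$), not a value stated in the text, and the function name `terms` is likewise illustrative.

```python
import math


def terms(count):
    """Return the first `count` terms of a_1 = 2, a_{n+1} = sqrt(a_n + 12)."""
    seq = [2.0]
    while len(seq) < count:
        seq.append(math.sqrt(seq[-1] + 12))
    return seq


seq = terms(10)

# Each term is larger than the one before it.
increasing = all(x < y for x, y in zip(seq, seq[1:]))

# Assumed upper bound: 4 satisfies 4 = sqrt(4 + 12).
bounded = all(x < 4 for x in seq)

assert increasing and bounded
```

Only ten terms are used because the terms approach the bound quickly, and floating-point arithmetic cannot distinguish later terms from it. The numerical evidence suggests what to prove; the induction argument is what establishes it for every $n$.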
Suppose the variable $n$ represents any natural number, and there is a statement
$S(n)$ that includes this variable as part of the statement. For example, the statement
could be $\lim_{x \to a} x^n = a^n$. Mathematical induction is a proof technique that uses the
following proof template to show that $S(n)$ is true for all $n$ greater than or equal to
some base value $b \in \mathbb{N}$.
TEMPLATE for using mathematical induction to prove the statement
$S(n)$ is true for all natural numbers $n \ge b$.
• SET THE CONTEXT: The statement will be proved by mathematical
induction on $n$ for all $n \ge b$.
• PROVE $S(b)$: Prove that the statement is true when the variable $n$ is equal
to the base value, $b$.
• STATE THE INDUCTION HYPOTHESIS: Assume that $S(n)$ is true for
some natural number $n = k \ge b$.
• PERFORM THE INDUCTION STEP: Using the fact that $S(k)$ is true, prove
that $S(k + 1)$ is true.
• STATE THE CONCLUSION: Therefore, by mathematical induction, $S(n)$
is true for all natural numbers $n \ge b$.
A Cauchy sequence is a sequence whose terms get close together. As with the
definition of limit, the concept of “close” needs to be made precise. As with the
definition of limit, “close” means that given any tolerance $\varepsilon > 0$, one can go out far
enough in the sequence to ensure that all terms of the sequence beyond that point
are within $\varepsilon$ of each other. Thus, a sequence is Cauchy if for every $\varepsilon > 0$ there is an
$N$ such that if natural numbers $m$ and $n$ are both greater than $N$, then $|a_m - a_n| < \varepsilon$.
If a sequence of real numbers converges, then the sequence is Cauchy. The proof
of this fact uses a strategy employed repeatedly in Analysis, that is, if two quantities
are very close to the same value, then they must be very close to each other. This
standard technique for proving that two quantities are close to each other involves
the use of the triangle inequality. In particular, if $\lim_{j \to \infty} a_j = L$, then for every $\varepsilon > 0$
there is an $N$ such that if natural number $n > N$, then $|a_n - L| < \varepsilon$. Well then,
certainly if $m$ and $n$ are both natural numbers greater than $N$, then both $|a_m - L| < \varepsilon$
and $|a_n - L| < \varepsilon$. Adding these two inequalities together shows that $|a_m - L| +
|a_n - L| < \varepsilon + \varepsilon$. The triangle inequality states that for any real numbers $x$ and $y$,
$|x| + |y| \ge |x + y|$. Thus, $2\varepsilon > |a_m - L| + |a_n - L| = |a_m - L| + |L - a_n| \ge
|(a_m - L) + (L - a_n)| = |a_m - a_n|$. Of course, the definition of Cauchy sequence
requires you to show that $|a_m - a_n|$ is less than $\varepsilon$, not $2\varepsilon$. But you have an enormous
amount of flexibility when working with these types of inequalities, so you could
have asked instead for an $N$ such that for all natural numbers $n$ greater than $N$,
you have $|a_n - L|$ less than $\frac{\varepsilon}{2}$ rather than less than $\varepsilon$. Thus, the proof could be as
follows.
PROOF: Every convergent sequence is Cauchy.
• Let $\langle a_j \rangle$ be a sequence of real numbers with $\lim_{j \to \infty} a_j = L$.
• Let $\varepsilon > 0$ be given.
• From the definition of limit, there is a number $N$ such that for all natural
numbers $j > N$, it follows that $|a_j - L| < \frac{\varepsilon}{2}$.
• Then for all natural numbers $m$ and $n$ greater than $N$, $|a_m - L| < \frac{\varepsilon}{2}$ and
$|a_n - L| < \frac{\varepsilon}{2}$, so $\varepsilon = \frac{\varepsilon}{2} + \frac{\varepsilon}{2} > |a_m - L| + |a_n - L| = |a_m - L| + |L - a_n| \ge
|(a_m - L) + (L - a_n)| = |a_m - a_n|$.
• This shows that the convergent sequence $\langle a_j \rangle$ is Cauchy.
Note that the converse of this theorem also holds. That is, any sequence of
real numbers that is Cauchy is a convergent sequence. This result will be proved
in Sect. 3.7. An important and useful consequence of the above theorem is its
contrapositive: If a sequence is not Cauchy, then it does not converge. Often when
one wants to show that a sequence does not converge, one shows that there is some
$\varepsilon > 0$ such that for every $N$ there are natural numbers $m$ and $n$ greater than $N$ for
which $|a_m - a_n| \ge \varepsilon$.
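As a small illustration of the contrapositive strategy just described, consider the sequence $a_n = (-1)^n$ with $\varepsilon = 1$. The sketch below produces, for a given $N$, two natural numbers beyond $N$ whose terms differ by at least $\varepsilon$. The choice of sequence, the value of $\varepsilon$, and the helper name `witnesses` are all assumptions made for this example.

```python
import math

EPS = 1  # the fixed tolerance that the sequence fails to meet


def a(n):
    # Example sequence: a_n = (-1)^n.
    return (-1) ** n


def witnesses(big_n):
    """Return natural numbers m, n > big_n with |a_m - a_n| >= EPS."""
    m = max(1, math.floor(big_n) + 1)  # smallest natural number > big_n
    return m, m + 1  # consecutive indices, so the terms have opposite signs


for big_n in (0, 1, 10, 1000.5):
    m, n = witnesses(big_n)
    assert m > big_n and n > big_n
    assert abs(a(m) - a(n)) >= EPS
```

Since consecutive terms always differ by 2, no choice of $N$ can bring all later terms within 1 of each other, so this sequence is not Cauchy and therefore does not converge.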
Another important property of Cauchy sequences is that all Cauchy sequences are
bounded. If the sequence $\langle a_n \rangle$ is Cauchy, then there is a natural number $N$ such
that whenever $m, n \ge N$, the difference $|a_m - a_n| < 1$. The set $\{a_1, a_2, a_3, \ldots, a_N\}$
is a finite set, so it is bounded by some number, $K$. That is, $|a_n| \le K$ for all $n \le N$.
If $m > N$, then, since both $N$ and $m$ are greater than or equal to $N$, it follows that
$|a_m - a_N| < 1$ from which it follows that $|a_m| < |a_N| + 1 \le K + 1$. Then the
sequence $\langle a_n \rangle$ is necessarily bounded above by $K + 1$ and below by $-(K + 1)$, and
the sequence is bounded. A complete proof follows.
One consequence of the last two results is that since all convergent sequences are
Cauchy, all convergent sequences are bounded. The concept of a Cauchy sequence is
not only applied to sequences of numbers but also to much more general sequences
such as sequences of vectors, sequences of functions, and sequences of linear
operators. Of course, one would need a way to discuss distances between the terms
of a sequence in these other contexts, but when that makes sense, the concept of a
Cauchy sequence becomes important.
3.5.8 Exercises
1. Which of the following sequences are monotone? Which of them are bounded
above? Which of them are bounded below? Which of them are bounded?
(a) $a_n = (-1)^n$
(b) $a_n = \frac{n+1}{n}$
(c) an D 5n
.1/n
(d) an D 5n
D 1C.1/
n
(e) an nCn1
(f) an D 5 n.1/n
(g) an D 1 12 13 1n
2. Write proofs of each of the following limits.
(a) $\lim_{n \to \infty} \frac{6n}{3n+1} = 2$
(b) $\lim_{n \to \infty} \frac{4n-1}{n+6} = 4$
(c) $\lim_{n \to \infty} \frac{n^2+2n+1}{n^2-2n-5} = 1$
3. If $a_1 = 3$ and $a_n$ is defined recursively by $a_{n+1} = \sqrt{3a_n + 10}$, show that the
sequence $\langle a_n \rangle$ converges.
4. If $a_1 = 7$ and $a_n$ is defined recursively by $a_{n+1} = \sqrt{3a_n + 4}$, show that the
sequence $\langle a_n \rangle$ converges.
5. Prove that a monotone decreasing sequence that is bounded below converges.
6. Let $\langle a_n \rangle$ be any sequence. Prove that $\langle a_n \rangle$ has a monotone subsequence.
7. Prove that if $\langle a_n \rangle$ is a sequence such that $L = \lim_{n \to \infty} a_{2n} = \lim_{n \to \infty} a_{2n+1}$, then the
sequence converges to $L$.
8. Prove that if $\langle a_n \rangle$ is a sequence that converges to $L$, then the sequence
$a_1, a_1, a_2, a_2, a_3, a_3, \ldots$ also converges to $L$.
9. Prove that if $\langle a_n \rangle$ is a sequence that converges to $L$, then the sequence
$a_1, a_2, a_2, a_3, a_3, a_3, a_4, a_4, a_4, a_4, \ldots$ also converges to $L$.
$\lim_{x \to a} f(x) = L$ means that if $x$ is required to stay close to $a$, then $f(x)$ will stay close
to $L$. So what does it mean for $\lim_{x \to a} f(x)$ not to exist? Intuitively, it could mean that
in every neighborhood of $a$ there are values of $x$ for which $f(x)$ is close to one value
$L_1$ and other values of $x$ for which $f(x)$ is close to another value $L_2$. That is what
happens with the function $f(x) = \begin{cases} 4x - 5 & \text{if } x < 2 \\ 10 - 2x & \text{if } x \ge 2 \end{cases}$ as $x$ approaches 2. For some
values of $x$ near 2, $f(x)$ is close to 3, and for some values of $x$ near 2, $f(x)$ is close
to 6. Thus, the limit does not exist. Another well-known example is $f(x) = \sin\frac{1}{x}$
which oscillates wildly as $x$ approaches zero, and in every neighborhood of 0, the
function takes on all values in the interval $[-1, 1]$ infinitely often. Another way for
the limit not to exist is for the values of $f(x)$ to grow without bound and approach
infinity or negative infinity such as what happens to $f(x) = \frac{x+3}{(x-5)^2}$ as $x$ approaches 5.
One can write a proof showing that a particular function has no limit at $x = a$,
but before discussing how to do this, it is worth taking a close look at the definition
of limit.
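The behavior of $\sin\frac{1}{x}$ near zero can be made concrete with a short sketch. For a given $\delta > 0$, the code below produces two points in $(0, \delta)$ at which the function is (up to floating-point rounding) equal to $1$ and to $-1$. The function name `points_near_zero` and the sample values of $\delta$ are assumptions made for illustration.

```python
import math


def points_near_zero(delta):
    """Return x1, x2 in (0, delta) where sin(1/x1) is about 1 and
    sin(1/x2) is about -1 (exact in theory, rounded in floating point)."""
    k = math.ceil(1 / (2 * math.pi * delta))  # ensures 2*pi*k >= 1/delta
    x1 = 1 / (math.pi / 2 + 2 * math.pi * k)      # 1/x1 = pi/2 + 2*pi*k
    x2 = 1 / (3 * math.pi / 2 + 2 * math.pi * k)  # 1/x2 = 3*pi/2 + 2*pi*k
    return x1, x2


for delta in (1.0, 0.1, 1e-4):
    x1, x2 = points_near_zero(delta)
    assert 0 < x2 < x1 < delta
    assert math.isclose(math.sin(1 / x1), 1.0, abs_tol=1e-9)
    assert math.isclose(math.sin(1 / x2), -1.0, abs_tol=1e-9)
```

No single number $L$ can be within, say, $\frac{1}{2}$ of both $1$ and $-1$, which is the intuition behind the claim that this function has no limit at 0.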
To say that a function $f$ has a limit at $x = a$ is to say that there exists a real number
$L$ such that for all $\varepsilon > 0$ there is a $\delta > 0$ such that for every $x$, $0 < |x - a| < \delta$
implies $|f(x) - L| < \varepsilon$. This definition is actually a fairly complicated statement. At
the heart of it is the conditional statement “$0 < |x - a| < \delta$ implies $|f(x) - L| < \varepsilon$.”
But this is an open statement, that is, even though the function $f$ and the limit point
$a$ are supposedly known, the statement contains variables $x$, $L$, $\varepsilon$, and $\delta$, all of which
are unknown. Thus, this open statement does not have any truth value until these
four variables have been stipulated. They are stipulated with four phrases: “there is
a real number $L$,” “for all $\varepsilon > 0$,” “there is a $\delta > 0$,” and “for every $x$.” These four
phrases are called quantifications of the variables because they indicate for which
values of the variables the following statement must hold. Two of the phrases use
the existential quantifier “there exists.” It indicates that there is at least one value
of the variable that will make the following statement true. The other two phrases
use the universal quantifier “for all.” It indicates that every possible value of that
variable will make the following statement true. So
• The statement “there exists a real number $L$ such that for all $\epsilon > 0$ there is a $\delta > 0$
such that for every $x$, $0 < |x - a| < \delta$ implies $|f(x) - L| < \epsilon$” begins with the
existential quantifier “there exists a real number $L$,” and the entire statement is
true if, in fact, there is a value of the variable $L$ that makes the following statement
true, that is, “for all $\epsilon > 0$ there is a $\delta > 0$ such that for every $x$, $0 < |x - a| < \delta$
implies $|f(x) - L| < \epsilon$.”
• The statement “for all $\epsilon > 0$ there is a $\delta > 0$ such that for every $x$, $0 < |x - a| < \delta$
implies $|f(x) - L| < \epsilon$” begins with the universal quantifier “for all $\epsilon > 0$,” and
the entire statement is true if, in fact, every possible positive value of the variable
$\epsilon$ makes the following statement true, that is, “there is a $\delta > 0$ such that for every
$x$, $0 < |x - a| < \delta$ implies $|f(x) - L| < \epsilon$.”
• The statement “there is a $\delta > 0$ such that for every $x$, $0 < |x - a| < \delta$ implies
$|f(x) - L| < \epsilon$” begins with the existential quantifier “there is a $\delta > 0$,” and the
entire statement is true if, in fact, there is a positive value of the variable $\delta$ that
makes the following statement true, that is, “for every $x$, $0 < |x - a| < \delta$ implies
$|f(x) - L| < \epsilon$.”
• The statement “for every $x$, $0 < |x - a| < \delta$ implies $|f(x) - L| < \epsilon$” begins with
the universal quantifier “for every $x$,” and the entire statement is true if, in fact,
every possible value of the variable $x$ makes the following statement true, that is,
“$0 < |x - a| < \delta$ implies $|f(x) - L| < \epsilon$.”
A proof that no limit exists must prove the negation of the statement that says that
a limit does exist, so it is important that one can generate the negation of a statement
that contains quantifiers such as this one does. The logic of doing this is not hard
to follow. Suppose that $P(y)$ is a statement that depends on the value of a variable $y$.
Then the universally quantified statement “for every $y$, $P(y)$” says that $P(y)$ is true
for every possible value of $y$. The negation of “for every $y$, $P(y)$” must be that it
is false that every value of $y$ makes $P(y)$ true, so there must be at least one $y$ that
makes $P(y)$ a false statement. This means that the negation of “for every $y$, $P(y)$”
is the statement “there is a $y$ such that $\neg P(y)$.” To negate a universally quantified
statement, change the universal quantifier to an existential quantifier and negate the
statement that follows.
What if the original statement is an existentially quantified statement such as
“there is a $y$ such that $P(y)$”? This statement says that some value of $y$ makes
$P(y)$ true. The negation of this statement must be that no value of $y$ makes $P(y)$
true which is to say that every value of $y$ makes $P(y)$ a false statement. This means
that the negation of “there is a $y$ such that $P(y)$” is the statement “for all $y$, $\neg P(y)$.”
To negate an existentially quantified statement, change the existential quantifier to a
universal quantifier and negate the statement that follows.
The statement that $f$ has a limit at $x = a$ is a statement that has an existential
quantifier followed by a universal quantifier followed by an existential quantifier
followed by a universal quantifier followed by a conditional statement. To prove
that $f$ does not have a limit at $x = a$ requires a proof of the negation of that
statement. From the previous discussion it is now clear that to get the negation of
the statement that $f$ has a limit at $a$, you must flip the two existential quantifiers to
universal quantifiers, flip the two universal quantifiers to existential quantifiers, and
end with the negation of the conditional statement. The result is “for all real numbers
$L$ there is an $\epsilon > 0$ such that for all $\delta > 0$ there is an $x$ such that $0 < |x - a| < \delta$ and
$|f(x) - L| \ge \epsilon$.”
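Written in symbols, the two statements can be placed side by side so that each quantifier flip is visible. This is only a restatement of the two sentences in the text. The logical symbols $\exists$, $\forall$, $\implies$, and $\wedge$ are standard shorthand introduced here for the illustration, and the display assumes the `amsmath` package.

```latex
\begin{align*}
\text{$f$ has a limit at $a$:}\quad
  & \exists L \in \mathbb{R}\;\; \forall \epsilon > 0\;\; \exists \delta > 0\;\; \forall x\;
    \bigl( 0 < |x - a| < \delta \implies |f(x) - L| < \epsilon \bigr) \\
\text{$f$ has no limit at $a$:}\quad
  & \forall L \in \mathbb{R}\;\; \exists \epsilon > 0\;\; \forall \delta > 0\;\; \exists x\;
    \bigl( 0 < |x - a| < \delta \;\wedge\; |f(x) - L| \ge \epsilon \bigr)
\end{align*}
```

The final step uses the fact that the negation of a conditional “$P$ implies $Q$” is “$P$ and not $Q$,” which is why the implication becomes a conjunction and $<$ becomes $\ge$.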
Getting back to writing a proof that a limit does not exist, the proof would need to
show that for every real number $L$ there is an $\epsilon > 0$ such that for every $\delta > 0$ there
is an $x$ within $\delta$ of $a$ such that $|f(x) - L| \ge \epsilon$. This is often done by exhibiting an $x_1$
and an $x_2$ within $\delta$ of $a$ such that $f(x_1)$ and $f(x_2)$ are so far apart that they could not
both be within $\epsilon$ of any $L$. That suggests the following template for proving that a
particular limit does not exist.
TEMPLATE for proving $\lim_{x \to a} f(x)$ does not exist
• SET THE CONTEXT: Make statements about what is known about the
function $f$ and the number $a$.
• SELECT AN ARBITRARY LIMIT $L$: Given $L \in \mathbb{R}$,
• PROPOSE A VALUE FOR $\epsilon$: let $\epsilon = $ ____. Here you would insert a value for
$\epsilon$.
• SELECT AN ARBITRARY $\delta > 0$: Select $\delta > 0$.
• SELECT VALUES FOR $x_1$ AND $x_2$: Let $x_1 = $ ____ and $x_2 = $ ____. Note that
$0 < |x_1 - a| < \delta$, $0 < |x_2 - a| < \delta$, and $|f(x_1) - f(x_2)| \ge 2\epsilon$. You would have
selected appropriate $x_1$ and $x_2$ in such a way that $|f(x_1) - f(x_2)|$ is at least $2\epsilon$.
• LIST IMPLICATIONS: Assume that $|f(x_1) - L| < \epsilon$ and $|f(x_2) - L| < \epsilon$.
Then $2\epsilon = \epsilon + \epsilon > |f(x_1) - L| + |f(x_2) - L| = |f(x_1) - L| + |L - f(x_2)|
\ge |f(x_1) - L + L - f(x_2)| = |f(x_1) - f(x_2)|$.
• STATE THE CONTRADICTION: This shows that $2\epsilon > |f(x_1) - f(x_2)|$
which is a contradiction.
• STATE THE CONCLUSION: Thus, it cannot hold that both $|f(x_1) - L| < \epsilon$
and $|f(x_2) - L| < \epsilon$, and the limit does not exist.
For example, consider the limit of
$f(x) = \begin{cases} 4x - 5 & \text{if } x < 2 \\ 10 - 2x & \text{if } x \ge 2 \end{cases}$
as $x$ approaches 2. Here the limit from the left is 3, and the limit from the right is 6. Thus, no matter
how close $x$ is supposed to be to 2, there will be values $x_1$ and $x_2$ within that required
tolerance where $f(x_1)$ is close to 3 and $f(x_2)$ is close to 6. If $f(x_1)$ and $f(x_2)$ are both
supposed to be within $\epsilon$ of some limit $L$, then it will follow that $f(x_1)$ and $f(x_2)$ will
have to be within $2\epsilon$ of each other. Again, you employ the technique of showing that
two quantities close to the same value must be close to each other. In particular, if $x_1$
is chosen to be less than 2, $f(x_1)$ will be less than 3. If $x_2$ is chosen to be between 2
and $2\frac{1}{2}$, $f(x_2)$ will be greater than 5. In this case it would be impossible to have $f(x_1)$
and $f(x_2)$ within 2 of each other, and, therefore, it would be impossible to have them
both within $\epsilon = 1$ of some limit $L$. This suggests that you will get a contradiction if
you set $\epsilon = 1$. Indeed, if a $\delta > 0$ is chosen, you can let $x_1 = 2 - \frac{\delta}{2}$ (that is, less than 2
but within $\delta$ of 2), and let $x_2 = \min\left(2 + \frac{\delta}{2},\, 2 + \frac{1}{2}\right)$ (that is, greater than 2 but within
$\delta$ of 2 and not so large that $f(x)$ is less than 5). The point of all of this is that now,
no matter what value is chosen for $L$, $f(x_1)$ and $f(x_2)$ are more than 2 apart, so how
could they both be within 1 of $L$? Specifically, if $|f(x_1) - L| < 1$ and $|f(x_2) - L| < 1$,
it follows from the triangle inequality that $2 = 1 + 1 > |f(x_1) - L| + |f(x_2) - L| =
|f(x_1) - L| + |L - f(x_2)| \ge |f(x_1) - L + L - f(x_2)| = |f(x_1) - f(x_2)|$ showing
$2 > |f(x_1) - f(x_2)|$ which cannot hold. Here is the complete proof (Fig. 3.7).
PROOF: The function
$f(x) = \begin{cases} 4x - 5 & \text{if } x < 2 \\ 10 - 2x & \text{if } x \ge 2 \end{cases}$
has no limit as $x \to 2$.
• Let $f(x) = \begin{cases} 4x - 5 & \text{if } x < 2 \\ 10 - 2x & \text{if } x \ge 2 \end{cases}$.
• Given any value for $L$, let $\epsilon = 1$, and let $\delta > 0$ be given.
• Let $x_1 = 2 - \frac{\delta}{2}$ and $x_2 = \min\left(2 + \frac{\delta}{2},\, 2 + \frac{1}{4}\right)$.
• Note that $0 < |x_1 - 2| < \delta$ and $0 < |x_2 - 2| < \delta$.
• Since $x_1 < 2$, it follows that $f(x_1) < 3$. Since $x_2 > 2$ and $x_2 \le 2\frac{1}{4}$, it follows
that $f(x_2) > 5$. As a consequence $|f(x_1) - f(x_2)| = f(x_2) - f(x_1) > 5 - 3 = 2$.
• If $|f(x_1) - L| < \epsilon = 1$ and $|f(x_2) - L| < \epsilon = 1$, it would follow that
$2 = 1 + 1 > |f(x_1) - L| + |f(x_2) - L| = |f(x_1) - L| + |L - f(x_2)|
\ge |f(x_1) - L + L - f(x_2)| = |f(x_1) - f(x_2)| > 2$. This shows that $2 > 2$ which
is a contradiction.
• Thus, it cannot hold that both $|f(x_1) - L| < \epsilon$ and $|f(x_2) - L| < \epsilon$, and the
limit does not exist.
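The choice of $x_1$ and $x_2$ in this proof can be spot-checked for a handful of values of $\delta$. The Python sketch below uses the helper names `witnesses` and `check`, which are hypothetical names for this illustration. It confirms only the two facts the proof relies on for the sampled values; the proof itself covers every $\delta > 0$.

```python
def f(x):
    """The piecewise function from the proof."""
    return 4 * x - 5 if x < 2 else 10 - 2 * x

def witnesses(delta):
    """Return the points x1 and x2 that the proof selects for a given delta > 0."""
    x1 = 2 - delta / 2
    x2 = min(2 + delta / 2, 2 + 1 / 4)
    return x1, x2

def check(delta):
    """Both points lie within delta of 2 (but not at 2) and f separates them by more than 2."""
    x1, x2 = witnesses(delta)
    within = 0 < abs(x1 - 2) < delta and 0 < abs(x2 - 2) < delta
    far_apart = abs(f(x1) - f(x2)) > 2
    return within and far_apart

print(all(check(delta) for delta in (10, 1, 0.5, 0.01, 1e-6)))
```

Since the two function values are more than $2 = 2\epsilon$ apart, no single $L$ can be within $\epsilon = 1$ of both, which is the contradiction the proof reaches.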
It is even easier to show that the function $f(x) = \sin\frac{1}{x}$ has no limit as $x$
approaches 0. This is because for every $\delta > 0$ it is easy to find $x_1$ and $x_2$ between
0 and $\delta$ such that $f(x_1) = 1$ and $f(x_2) = -1$. This makes it impossible to find an
$L$ where $|f(x_1) - L| < 1$ and $|f(x_2) - L| < 1$. Thus, the proof follows the given
template for proving that a limit does not exist (Fig. 3.8).
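One concrete way to produce such points is to pick $x_1$ and $x_2$ so that $\frac{1}{x_1}$ and $\frac{1}{x_2}$ land on a peak and a trough of the sine function. The Python sketch below is one possible choice, not necessarily the one the text has in mind; the helper name `witnesses` and the formula for `k` are assumptions of this illustration, and floating-point arithmetic makes the values 1 and -1 only approximate.

```python
import math

def f(x):
    return math.sin(1 / x)

def witnesses(delta):
    """Return x1, x2 with 0 < x2 < x1 < delta, sin(1/x1) = 1 and sin(1/x2) = -1."""
    # Choose an integer k with 2*pi*k > 1/delta, so that both points are below delta.
    k = math.floor(1 / (2 * math.pi * delta)) + 1
    x1 = 1 / (math.pi / 2 + 2 * math.pi * k)       # 1/x1 = pi/2 + 2*pi*k
    x2 = 1 / (3 * math.pi / 2 + 2 * math.pi * k)   # 1/x2 = 3*pi/2 + 2*pi*k
    return x1, x2

for delta in (1, 0.1, 0.001):
    x1, x2 = witnesses(delta)
    print(delta, x1, x2, f(x1), f(x2))
```

With $\epsilon = 1$ these two points are exactly $2\epsilon$ apart in function value, so the template applies.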
Fig. 3.8 Graph of $\sin\frac{1}{x}$
an $x$ with $\frac{x+3}{(x-5)^2} > L + 1$. But with $x$ within 1 of 5, you could claim that
$\frac{x+3}{(x-5)^2} > \frac{1}{(x-5)^2} > \frac{1}{|x-5|}$, so by making $|x - 5| < \frac{1}{|L|+1}$
you will have the inequality that you need.
Note that the absolute value function was introduced in $|L| + 1$ to take care of
the embarrassing circumstance that $L$ is negative, and in particular, when $L = -1$.
The proof is as follows.
3.6.4 Exercises
3. There is an integer k such that f .x/ f .k/ for all x between k and k C 1.
4. For all x > 0 and all y > 0 there exists a z < 0 such that f .z/ xf .y/.
Prove that the following limits do not exist.
5. $f(x) = \frac{|x|}{x}$ as $x \to 0$
1
6. f .x/ D x sin x1 as x ! 1
5x if x < 3
7. f .x/ D as x ! 3
4x if x 3
8. f .x/ D x244 as x ! 2
The second strategy also begins with the interval $[a, b]$ that contains the infinite
bounded set, $A$. One can rename the end points of this interval to be $a_1 = a$ and
$b_1 = b$. Since $[a_1, b_1] \cap A = A$ is infinite, it follows that either
$\left[a_1, \frac{a_1 + b_1}{2}\right] \cap A$ or $\left[\frac{a_1 + b_1}{2}, b_1\right] \cap A$ is an infinite set.
If $\left[a_1, \frac{a_1 + b_1}{2}\right] \cap A$ is infinite, define $a_2 = a_1$ and
$b_2 = \frac{a_1 + b_1}{2}$. Otherwise, define $a_2 = \frac{a_1 + b_1}{2}$ and $b_2 = b_1$. In either case, $[a_2, b_2] \cap A$
is an infinite set. This procedure can be repeated so that for every $n \in \mathbb{N}$, one gets an
interval $[a_n, b_n]$ where $[a_n, b_n] \cap A$ is infinite, and each interval is half the length of
the previous interval. Also, the sequence of left endpoints, $\langle a_n \rangle$, is a monotone
increasing sequence bounded above by $b$, and the sequence of right endpoints,
$\langle b_n \rangle$, is a monotone decreasing sequence bounded below by $a$. Thus, both of these
sequences converge. In fact, both of these sequences must converge to the same
limit, $p$. This follows because the distances between the terms of the sequences,
$b_n - a_n$, keep getting smaller and converge to 0. Given an $\epsilon > 0$, it will follow that
there is an $n$ such that $a_n$ and $b_n$ are both within $\epsilon$ of $p$. Thus, $(p - \epsilon, p + \epsilon) \cap A$
contains $[a_n, b_n] \cap A$ which is infinite. Here is the complete proof.
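The halving procedure can be sketched in Python. A program cannot decide whether a set is infinite, so the sketch assumes a hand-written oracle `has_infinitely_many(lo, hi)` that answers that question for one specific set. The function names and the example set $A = \{(-1)^n(1 + \frac{1}{n}) : n \in \mathbb{N}\}$ are choices made for this illustration and do not come from the text.

```python
def bisect_to_accumulation_point(has_infinitely_many, a, b, steps=30):
    """Halve [a, b] repeatedly, always keeping a half that meets A infinitely often.

    has_infinitely_many(lo, hi) is a hand-written oracle (an assumption of this
    sketch) that says whether [lo, hi] contains infinitely many points of A.
    """
    for _ in range(steps):
        mid = (a + b) / 2
        if has_infinitely_many(a, mid):
            b = mid          # keep the left half
        else:
            a = mid          # otherwise the right half must be the infinite one
    return a, b

def oracle(lo, hi):
    """Oracle for A = {(-1)^n (1 + 1/n) : n in N}, a subset of [-2, 2].

    The even-indexed points decrease to 1 and the odd-indexed points increase
    to -1, so [lo, hi] meets A infinitely often exactly in these two cases.
    """
    return (lo <= 1 < hi) or (lo < -1 <= hi)

a_n, b_n = bisect_to_accumulation_point(oracle, -2.0, 2.0)
print(a_n, b_n)   # the interval closes in on the accumulation point -1
```

The set in this example has two accumulation points, $-1$ and $1$. Because the procedure prefers the left half whenever it qualifies, it homes in on $-1$; the argument in the text only needs to find one accumulation point.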
3.7 Accumulation Points 77
You now have the machinery necessary to prove the result mentioned in Sect. 3.6
that all Cauchy sequences converge. The difficulty in proving this result earlier was
that given a Cauchy sequence $\langle a_n \rangle$, it was not clear what real number would play
the role of the limit L of the sequence. Now, the Bolzano–Weierstrass Theorem can
provide an accumulation point to serve as this limit. There are two cases to consider.
If the set of values in the sequence, $\{a_n\}$, is a finite set, then for the sequence to be
Cauchy, the sequence will necessarily need to be constant from some point on, and,
therefore, the sequence will converge. If the set of values in the sequence is infinite,
then since all Cauchy sequences are bounded, the set of values in the sequence will
be bounded and will have to have an accumulation point. It is then straightforward
to show that the sequence converges to this accumulation point.
Up to this point, the discussion of the limit $\lim_{x \to a} f(x)$ took place only for those
functions defined for all $x \ne a$ in an open interval containing $a$. The definition of
limit can now be extended. It should not be required that the function $f$ be defined
for all $x$ in an open interval containing $a$ but that $f$ be defined at enough points so
that it makes sense to allow $x$ to approach $a$. In other words, $a$ only needs to be
an accumulation point of the domain of $f$. Note that if $a$ is not an accumulation
point of the domain of $f$, then there will be an open interval containing $a$ where $f$
is not defined (except perhaps at $a$ itself). Thus, no sense could be made out of $x$
approaches $a$. On the other hand, if $a$ is an accumulation point of the domain of $f$, it makes sense
to define the limit of $f$ at $a$ to be $L$ or $\lim_{x \to a} f(x) = L$ to mean that for all $\epsilon > 0$ there is
a $\delta > 0$ such that for all $x$ in the domain of $f$, $0 < |x - a| < \delta$ implies $|f(x) - L| < \epsilon$.
Similarly, to define $\lim_{x \to \infty} f(x) = L$ one does not need $f$ to be defined in an
entire interval stretching to positive infinity. It is sufficient that $f(x)$ is defined for
arbitrarily large values of $x$ so that $x$ can be allowed to approach infinity. One way
of saying this is that the domain of $f$ should be unbounded above. This is what
was done, for example, when defining the limit of a sequence which is the limit of
a function defined for the natural numbers only. Similarly, $\lim_{x \to -\infty} f(x) = L$ can be
defined for $f$ when the domain of $f$ is unbounded below.
3.8 Infinite Limits 79
3.7.1 Exercises
1. Write a definition for $\lim_{x \to a^+} f(x)$ where $a$ is an accumulation point of the domain
of $f$.
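Identify the accumulation points, if any, of the following sets.
2. $\left\{ \frac{n}{n+2} \;\middle|\; n \in \mathbb{N} \right\}$
3. $\left\{ x \in \mathbb{Q} \;\middle|\; x^2 < 2 \right\}$
4. $\left\{ \frac{1}{2}, 1, \frac{3}{2}, 2, \frac{5}{2}, \ldots \right\}$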
˚m ˇ 2
5. 2n ˇ m; n 2 N
n n C4 ˇ
o
6. 2n.1/ ˇn 2 N
3nC5
$M$, it would be sufficient to make some fraction smaller than $f(x)$ bigger than $M$.
For example, for all $x > -2$ the fraction $\frac{1}{(x-5)^2}$ is smaller than $f(x)$. Moreover, for
$x$ within 1 of 5, $\frac{1}{|x-5|}$ is smaller than $\frac{1}{(x-5)^2}$. Thus, it would be sufficient to make
$\frac{1}{|x-5|} > M$ which, under the condition of $M > 0$, happens when $|x - 5| < \frac{1}{M}$.
A proof would need to take care of the embarrassing case of $M \le 0$, perhaps by
making $\delta = \frac{1}{|M|+1}$ since $|M| + 1$ is always bigger than $M$ and is always positive.
Another way to handle this is to write a proof that assumes that $M$ is positive. In fact,
one could just stipulate that $M > 1$ by inserting the often used phrase without loss of
generality. This phrase means that even though a restriction is being placed on one
of the assumptions in the proof, if one can complete the proof using this restriction,
then it would be very easy to give a proof without the restriction. In this case, if it is
assumed that $M > 1$, one could just as easily have handled cases where $M \le 1$ by finding
a $\delta > 0$ that ensured $f(x) > 1 \ge M$, so being able to produce a proof that works for
$M > 1$ does provide a proof for $M \le 1$. The phrase without loss of generality is used so
frequently that many authors abbreviate it as WLOG. These ideas give the following
proof.
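The choice $\delta = \frac{1}{|M|+1}$ can be spot-checked numerically for $f(x) = \frac{x+3}{(x-5)^2}$. The Python sketch below samples points on both sides of 5 inside the $\delta$-window and confirms that $f$ exceeds $M$ there. The names `delta_for` and `exceeds_M_near_5` are hypothetical, and sampling finitely many points illustrates the claim without proving it.

```python
def f(x):
    return (x + 3) / (x - 5) ** 2

def delta_for(M):
    """The delta suggested in the discussion; it is positive and at most 1 for every real M."""
    return 1 / (abs(M) + 1)

def exceeds_M_near_5(M, samples=1000):
    """Check f(x) > M at sample points x with 0 < |x - 5| < delta."""
    delta = delta_for(M)
    for i in range(1, samples + 1):
        h = delta * i / (samples + 1)     # 0 < h < delta
        if not (f(5 + h) > M and f(5 - h) > M):
            return False
    return True

print(all(exceeds_M_near_5(M) for M in (-50, 0, 1, 100, 1e6)))
```

Using $|M| + 1$ rather than $M$ in the denominator is what lets one formula serve negative, zero, and positive values of $M$ alike.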
3.8.1 Exercises
The fact that the limits of some functions are easy to prove hides the fact that there
are some limits whose validity is considerably more difficult to prove. Fortunately,
the limits of most arithmetic combinations of functions work as expected due to the
behavior of the arithmetic operations of addition, subtraction, multiplication, and
division. In the words of the next chapter, these operations behave well because
they are themselves continuous functions of their arguments. That is, for example,
the function of two variables $f(x, y) = x + y$ is a continuous function of $x$ and $y$.
That continuity allows you to prove the following theorem.
THEOREM: Suppose that $f$ and $g$ are functions both defined on a set with
accumulation point $a$. Let $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = H$. Then
Consider how to prove each part of the above theorem. In each case you will need
to prove the validity of a limit, so the proof can follow the usual proof template for
establishing a limit. These proofs differ from limit proofs found earlier in the chapter
in that you know less about the functions whose limits you are trying to establish.
On the other hand, you do know that the limits of the functions f and g exist, and
that gives you a lot of tools with which to work.
So what needs to be done to prove that the limit of the sum of two functions is the
sum of their respective limits? As with all limit proofs, the proof will begin with a
statement about what is being assumed about two functions f and g. In this case,
that would essentially be a restatement of the hypothesis of the theorem that says
that the limits of $f$ and $g$ at $a$ are $L$ and $H$, respectively. The second step of the
proof would be to say Let $\epsilon > 0$ be given which sets the tolerance to be met by
the proof. You know that the end of the proof will need to show that the function
in question, $f(x) + g(x)$, needs to be within $\epsilon$ of the proposed limit, $L + H$. In
other words, you will need to establish $|(f(x) + g(x)) - (L + H)| < \epsilon$. Clearly,
this inequality will depend on properties of the functions $f$ and $g$. But you know
very little about these functions. Actually, knowing very little about the functions
makes your job easier. All you know about these functions is that $f$ has $L$ for a limit,
and $g$ has $H$ for a limit. This means that your proof can only use these two facts.
Because these two limits exist, you will be able to set up conditions that ensure that
$|f(x) - L|$ and $|g(x) - H|$ are small. How does this help? It helps because the triangle
inequality will allow you to show that the expression $|(f(x) + g(x)) - (L + H)|$ is no
bigger than the sum of the two small quantities $|f(x) - L|$ and $|g(x) - H|$. That is,
$|(f(x) + g(x)) - (L + H)| = |(f(x) - L) + (g(x) - H)| \le |f(x) - L| + |g(x) - H|$. For
example, if both $|f(x) - L|$ and $|g(x) - H|$ can be made less than $\frac{\epsilon}{2}$, then their sum
will be less than $\epsilon$, and the value of $|(f(x) + g(x)) - (L + H)|$ will, in turn, be less
than $\epsilon$, as desired. How can you arrange for $|f(x) - L|$ and $|g(x) - H|$ both to be less
than $\frac{\epsilon}{2}$? You are given that the limits of $f$ and $g$ are $L$ and $H$, respectively, so, by the
definition of limit, you can arrange for each of these quantities to be smaller than any
given positive value, such as $\frac{\epsilon}{2}$, with appropriate choices of $\delta > 0$. The only subtlety
here is that the value of $\delta > 0$ needed to assure that $|f(x) - L|$ is less than $\frac{\epsilon}{2}$ cannot
be assumed to be the same value as the $\delta > 0$ needed to assure that $|g(x) - H|$ is less
than $\frac{\epsilon}{2}$. Thus, two different values of $\delta$ should be chosen, and then the minimum of
those two will be small enough to guarantee both of the needed inequalities.
Thus, after the proof proposes a given $\epsilon > 0$, it can produce a $\delta_1 > 0$ small
enough so that if $x$ is in the domain of $f$ and $0 < |x - a| < \delta_1$, then $|f(x) - L|$ will
be less than $\frac{\epsilon}{2}$. The existence of this $\delta_1$ comes from the definition of $\lim_{x \to a} f(x) = L$.
Similarly, the proof can produce a $\delta_2 > 0$ coming from the definition of $\lim_{x \to a} g(x) = H$
such that if $x$ is in the domain of $g$ and $0 < |x - a| < \delta_2$, then $|g(x) - H|$ will be
less than $\frac{\epsilon}{2}$. The proof then easily follows as described above.
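The bookkeeping with $\delta_1$, $\delta_2$, and their minimum can be traced on a concrete pair of functions. In the Python sketch below, $f(x) = 2x$ and $g(x) = 3x + 1$ at $a = 1$ are examples invented for this illustration, and `delta_f` and `delta_g` are hypothetical helpers that return a $\delta$ meeting a requested tolerance for each function separately.

```python
# Illustrative functions (assumptions of this sketch, not taken from the text).
a, L, H = 1.0, 2.0, 4.0

def f(x):
    return 2 * x          # limit L = 2 as x -> 1

def g(x):
    return 3 * x + 1      # limit H = 4 as x -> 1

def delta_f(tol):
    # |f(x) - L| = 2|x - 1| < tol whenever |x - 1| < tol / 2
    return tol / 2

def delta_g(tol):
    # |g(x) - H| = 3|x - 1| < tol whenever |x - 1| < tol / 3
    return tol / 3

def delta_for_sum(eps):
    delta1 = delta_f(eps / 2)     # forces |f(x) - L| < eps / 2
    delta2 = delta_g(eps / 2)     # forces |g(x) - H| < eps / 2
    return min(delta1, delta2)    # small enough for both at once

def sum_within_tolerance(eps, samples=200):
    delta = delta_for_sum(eps)
    for i in range(1, samples + 1):
        h = delta * i / (samples + 1)          # 0 < h < delta
        for x in (a - h, a + h):
            if abs((f(x) + g(x)) - (L + H)) >= eps:
                return False
    return True

print(all(sum_within_tolerance(eps) for eps in (1.0, 0.1, 1e-3)))
```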
3.9 The Arithmetic of Limits 83
A proof that the limit of the difference $f(x) - g(x)$ equals the difference of the
individual limits, $L - H$, is very similar to the above proof and is left as an exercise.
Proving that the limit of the product $f(x)g(x)$ equals the product of the individual
limits, $LH$, uses the same techniques as the proof for the limit of a sum but has
an added complexity requiring the use of a commonly used trick. The proof of
$\lim_{x \to a} f(x)g(x) = LH$ follows the usual template for proving the existence of a limit.
Its goal is to establish the inequality $|f(x)g(x) - LH| < \epsilon$. Again, you can use the
definition of limit to make $|f(x) - L|$ and $|g(x) - H|$ as small as you need, but how
small these have to be to ensure that $|f(x)g(x) - LH|$ is less than $\epsilon$ is not immediately
obvious. The problem is that it is difficult to gauge how close $f(x)g(x)$ is to $LH$ when
you know that $f(x)$ is close to $L$, and $g(x)$ is close to $H$. The difficulty stems from
having to move from $f(x)g(x)$ to $LH$, where $f(x)$ changes to $L$ and $g(x)$ changes to $H$
at the same time. If only one of these two changes were made, then it might be easier
to make the needed estimate. That is, it would be easier to work with an expression
like $f(x)g(x) - f(x)H$ than with $f(x)g(x) - LH$.
Of course, $f(x)g(x) - LH$ is not the same as $f(x)g(x) - f(x)H$, so one cannot
just use $f(x)g(x) - f(x)H$ in place of $f(x)g(x) - LH$. Sometimes, though, it is
worth replacing one expression with another expression that is easier to handle,
and then adjusting the second expression to make it equivalent to the first. In this
case, the change can be accomplished by employing one of the oldest tricks used in
mathematical proofs, that of adding and subtracting the same quantity. In particular,
you can rewrite $|f(x)g(x) - LH|$ as $|f(x)g(x) - f(x)H + f(x)H - LH|$. The advantage
of doing this is that now you can see how the difference between $f(x)g(x)$ and $LH$
depends on the differences between $f(x)$ and $L$ and $g(x)$ and $H$. Indeed,
$|f(x)g(x) - LH| = |f(x)g(x) - f(x)H + f(x)H - LH| = |f(x)(g(x) - H) + H(f(x) - L)|
\le |f(x)|\,|g(x) - H| + |H|\,|f(x) - L|$. If each of the two terms, $|f(x)|\,|g(x) - H|$
and $|H|\,|f(x) - L|$, can be made smaller than $\frac{\epsilon}{2}$, then it will have been shown that
$|f(x)g(x) - LH|$ is less than $\epsilon$ as needed.
So how small does $|f(x) - L|$ need to be to ensure that $|H|\,|f(x) - L|$ is less
than $\frac{\epsilon}{2}$? Less than $\frac{\epsilon}{2|H|}$ appears to be small enough, although one needs to handle
the embarrassing situation where $H = 0$. You could handle $H = 0$ and $H \ne 0$ as
two separate cases, or you can take care of both cases at once by making $|f(x) - L|$
less than $\frac{\epsilon}{2(|H|+1)}$ since $|H| + 1$ is larger than $|H|$ and can never be 0. Thus, you can
select a $\delta_1 > 0$ so that if $0 < |x - a| < \delta_1$, then $|f(x) - L| < \frac{\epsilon}{2(|H|+1)}$.
How small does $|g(x) - H|$ need to be to ensure that $|f(x)|\,|g(x) - H|$ is less than
$\frac{\epsilon}{2}$? It would be nice to say that $|g(x) - H| < \frac{\epsilon}{2|f(x)|}$ suggesting that you set $\delta$ small
enough to ensure $|g(x) - H| < \frac{\epsilon}{2|f(x)|}$, but there is a problem here. The definition of
limit requires that the choice of $\delta$ come before the choice of $x$, so you cannot have
the value of $\delta$ depending on $x$. What is needed is an upper bound for $|f(x)|$ because,
if $|f(x)| \le M$, the value of $\delta$ can be found to ensure $|g(x) - H| < \frac{\epsilon}{2M}$ which will
always be small enough to guarantee $|f(x)|\,|g(x) - H| < \frac{\epsilon}{2}$. You can find such
an upper bound for $|f(x)|$ because the limit of $f(x)$ exists as $x$ approaches $a$, and so
$|f(x)|$ can be restricted to being not much larger than $|L|$. You could, for example,
find $\delta_2 > 0$ so that if $0 < |x - a| < \delta_2$, then $|f(x) - L| < 1$. This would ensure that
$f(x)$ is a distance of no more than 1 from $L$ so that $|f(x)| < |L| + 1$. Then you would
only need $|g(x) - H| < \frac{\epsilon}{2(|L|+1)}$ to get $|f(x)|\,|g(x) - H| < \frac{\epsilon}{2}$. This gives you all the
pieces necessary to complete the proof as follows.
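The three tolerances described above can be assembled into a single $\delta$ and tested on a concrete example. The Python sketch below again uses the invented functions $f(x) = 2x$ and $g(x) = 3x + 1$ at $a = 1$. The helpers `delta_f` and `delta_g` are hypothetical, and the numbering of the three $\delta$ values is chosen for this sketch only.

```python
# Illustrative functions (assumptions of this sketch, not taken from the text).
a, L, H = 1.0, 2.0, 4.0

def f(x):
    return 2 * x          # limit L = 2 as x -> 1

def g(x):
    return 3 * x + 1      # limit H = 4 as x -> 1

def delta_f(tol):
    return tol / 2        # |f(x) - L| = 2|x - 1| < tol when |x - 1| < tol / 2

def delta_g(tol):
    return tol / 3        # |g(x) - H| = 3|x - 1| < tol when |x - 1| < tol / 3

def delta_for_product(eps):
    delta1 = delta_f(eps / (2 * (abs(H) + 1)))   # |H| |f(x) - L| < eps / 2
    delta2 = delta_f(1.0)                        # |f(x) - L| < 1, so |f(x)| < |L| + 1
    delta3 = delta_g(eps / (2 * (abs(L) + 1)))   # |f(x)| |g(x) - H| < eps / 2
    return min(delta1, delta2, delta3)

def product_within_tolerance(eps, samples=200):
    delta = delta_for_product(eps)
    for i in range(1, samples + 1):
        h = delta * i / (samples + 1)            # 0 < h < delta
        for x in (a - h, a + h):
            if abs(f(x) * g(x) - L * H) >= eps:
                return False
    return True

print(all(product_within_tolerance(eps) for eps in (1.0, 0.1, 1e-3)))
```

Note that every $\delta$ is computed from $\epsilon$, $L$, and $H$ alone, never from $x$, which is exactly the ordering the definition of limit demands.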
Finally, the proof that the limit of a quotient is the quotient of the individual
limits is much like the proof about the product of limits, although the algebra
is more complicated. As in the preceding proof, you can start with the needed
inequality which, in this case, is $\left|\frac{f(x)}{g(x)} - \frac{L}{H}\right| < \epsilon$. Using the trick of adding and
subtracting the same quantity, the left side of the inequality can be written as
$\left|\frac{f(x)}{g(x)} - \frac{L}{H}\right| = \left|\frac{f(x)H - Lg(x)}{g(x)H}\right| = \left|\frac{f(x)H - LH + LH - Lg(x)}{g(x)H}\right|
= \left|\frac{(f(x) - L)H + L(H - g(x))}{g(x)H}\right| \le \frac{|f(x) - L|}{|g(x)|} + \left|\frac{L(g(x) - H)}{g(x)H}\right|$.
Again, the goal will be to make each of the terms $\frac{|f(x) - L|}{|g(x)|}$
and $\left|\frac{L(g(x) - H)}{g(x)H}\right|$ less than $\frac{\epsilon}{2}$ by selecting an appropriate sequence of $\delta$’s.
Both of these terms have a factor of $|g(x)|$ in the denominator. To make the
fractions small, you will need to know that $|g(x)|$ does not get too close to zero.
What you do know is that $\lim_{x \to a} g(x) = H$ is not zero because the hypothesis of the
theorem will make that assumption. How far away from zero can you require $|g(x)|$
to be? Certainly, this will depend on the value of $H$. If $H$ is close to zero, then $|g(x)|$
will be close to zero as $x$ approaches $a$. The best you can do is require that $g(x)$ be
so close to $H$ that it will keep a known distance from zero. For example, you could
require that $|g(x) - H|$ be less than $\frac{|H|}{2}$. That will ensure that $|g(x)|$ is at least $\frac{|H|}{2}$
which keeps it a known distance away from zero. So, select a $\delta_1 > 0$ such that if $x$
is in the domain of $g$ with $0 < |x - a| < \delta_1$, then $|g(x) - H| < \frac{|H|}{2}$, and $|g(x)|$ will
be greater than $\frac{|H|}{2}$.
Now for these values of $x$ you will have $\frac{|f(x) - L|}{|g(x)|} < |f(x) - L| \cdot \frac{2}{|H|}$. Thus, it would
be sufficient if $|f(x) - L|$ were to be less than $\frac{\epsilon |H|}{4}$ which will ensure that
$\frac{|f(x) - L|}{|g(x)|} < \frac{\epsilon}{2}$.
This can be done by choosing $\delta_2 > 0$ small enough so that $|f(x) - L|$ is less than $\frac{\epsilon |H|}{4}$.
To make the $\left|\frac{L(g(x) - H)}{g(x)H}\right|$ term less than $\frac{\epsilon}{2}$, you can select a $\delta_3 > 0$ so that if
$x$ is within $\delta_3$ of $a$ you will have $|g(x) - H|$ less than $\frac{\epsilon H^2}{4|L|}$ because that will give
$\left|\frac{L(g(x) - H)}{g(x)H}\right| < \frac{|L| \cdot \frac{\epsilon H^2}{4|L|}}{|g(x)|\,|H|} < \frac{\epsilon H^2 / 4}{H^2 / 2} = \frac{\epsilon}{2}$.
Well, OK, did you catch that the preceding does
not work if $L = 0$? To avoid this problem it would be better to make $|g(x) - H|$ less
than $\frac{\epsilon H^2}{4(|L|+1)}$. Putting all of these ideas together gives the following proof.
PROOF: Suppose that $f$ and $g$ are functions both defined on a set with
accumulation point $a$. If $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = H$ with $H \ne 0$, then
$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{L}{H}$.
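The three tolerances from the discussion of the quotient can be tried on a concrete example as well. The Python sketch below uses the invented functions $f(x) = 2x$ and $g(x) = 3x + 1$ at $a = 1$, so that $\frac{L}{H} = \frac{2}{4}$. The helpers `delta_f` and `delta_g` are hypothetical, and the check samples only finitely many points.

```python
# Illustrative functions (assumptions of this sketch, not taken from the text).
a, L, H = 1.0, 2.0, 4.0

def f(x):
    return 2 * x          # limit L = 2 as x -> 1

def g(x):
    return 3 * x + 1      # limit H = 4 as x -> 1, and H != 0

def delta_f(tol):
    return tol / 2        # |f(x) - L| = 2|x - 1| < tol when |x - 1| < tol / 2

def delta_g(tol):
    return tol / 3        # |g(x) - H| = 3|x - 1| < tol when |x - 1| < tol / 3

def delta_for_quotient(eps):
    delta1 = delta_g(abs(H) / 2)                        # keeps |g(x)| > |H| / 2
    delta2 = delta_f(eps * abs(H) / 4)                  # |f(x) - L| < eps |H| / 4
    delta3 = delta_g(eps * H ** 2 / (4 * (abs(L) + 1))) # |g(x) - H| < eps H^2 / (4(|L| + 1))
    return min(delta1, delta2, delta3)

def quotient_within_tolerance(eps, samples=200):
    delta = delta_for_quotient(eps)
    for i in range(1, samples + 1):
        h = delta * i / (samples + 1)                   # 0 < h < delta
        for x in (a - h, a + h):
            if abs(f(x) / g(x) - L / H) >= eps:
                return False
    return True

print(all(quotient_within_tolerance(eps) for eps in (1.0, 0.1, 1e-3)))
```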
As a demonstration of the power of these results about the arithmetic of limits, you
can now easily prove the following list of results which will allow you to easily
calculate limits of polynomials and rational functions of $x$.
• For any constant $c$ in the real numbers, $\lim_{x \to a} c = c$.
• $\lim_{x \to a} x = a$.
• For any $n$ in the natural numbers, $\lim_{x \to a} x^n = a^n$.
• For any polynomial $p(x)$, $\lim_{x \to a} p(x) = p(a)$.
• For any polynomials $p(x)$ and $q(x)$ with $q(a) \ne 0$, $\lim_{x \to a} \frac{p(x)}{q(x)} = \frac{p(a)}{q(a)}$.
The first two results are very easy to prove, and are left as exercises. The next
two results can be proved by using mathematical induction which is often the first
technique one considers using when trying to prove statements such as these that
depend on a natural number. Here, mathematical induction will be employed to
prove statements about the limits of polynomials, and the degree of the polynomial
provides a natural number to use as the induction variable.
To begin with, try using mathematical induction to prove that $\lim_{x \to a} x^n = a^n$ for
any natural number $n$. In this mathematical induction argument, the base case is
$\lim_{x \to a} x = a$, that is, when $n = b = 1$. The proofs of statements similar to this base
case were covered earlier. The induction step in the proof will need to show that if
$\lim_{x \to a} x^k = a^k$ for some natural number $k$, then $\lim_{x \to a} x^{k+1} = a^{k+1}$. But $x^{k+1}$ is just the
product $x^k \cdot x$, so this result follows immediately from the theorem about the limits
of products. That leads to the following proof that uses the template for proofs by
mathematical induction.
PROOF: $\lim_{x \to a} x^n = a^n$ for any natural number $n$.
• SET THE CONTEXT: The statement will be proved for all natural numbers
$n$ by mathematical induction on $n$.
• PROVE $S(b)$: When $n = 1$, the statement says that $\lim_{x \to a} x = a$ which has
already been established.
• STATE THE INDUCTION HYPOTHESIS: Assume that for some natural
number $k$, $\lim_{x \to a} x^k = a^k$.
• PERFORM THE INDUCTION STEP: Then since the limit of a product
of two functions is the product of the two individual limits, it follows that
$\lim_{x \to a} x^{k+1} = \lim_{x \to a} x^k \cdot x = \left(\lim_{x \to a} x^k\right)\left(\lim_{x \to a} x\right) = a^k \cdot a = a^{k+1}$. So the statement
is true for $n = k + 1$.
• STATE THE CONCLUSION: Therefore, by mathematical induction,
$\lim_{x \to a} x^n = a^n$ is true for all natural numbers $n$.
Mathematical induction can again be employed to prove that for every polynomial,
$p(x)$, $\lim_{x \to a} p(x) = p(a)$. As a reminder, a polynomial of degree $n$ is a function,
$p(x) = c_n x^n + c_{n-1} x^{n-1} + c_{n-2} x^{n-2} + \cdots + c_1 x + c_0$ where $c_0, c_1, c_2, \ldots, c_n$
are constants with $c_n \ne 0$. Previously it has been proved that $\lim_{x \to a} c_j = c_j$ and
$\lim_{x \to a} x^j = a^j$, from which one gets that the limit of a monomial is $\lim_{x \to a} c_j x^j = c_j a^j$.
A polynomial is just a sum of such monomials, so mathematical induction is a
convenient tool for showing that this sum of an arbitrary number of monomials
has the desired limit.
PROOF: For any constants $c_0, c_1, c_2, \ldots, c_n$ and $a \in \mathbb{R}$, the polynomial
$p(x) = c_n x^n + c_{n-1} x^{n-1} + c_{n-2} x^{n-2} + \cdots + c_1 x + c_0$ satisfies
$\lim_{x \to a} p(x) = p(a)$.
Recall that a rational function is just a ratio of polynomials, that is, if $p(x)$ and
$q(x)$ are polynomials, then $\frac{p(x)}{q(x)}$ is a rational function. It is only a simple step to get
the following theorem.
It is time to note that even though all of these limit theorems concerned limits as
x approaches a, most can be extended to cover limits as x approaches a from the
left, as x approaches a from the right, as x approaches infinity, and as x approaches
negative infinity. In particular, most of the theorems apply to the limits of sequences.
Many of these statements can be found in the exercises.
3.9.6 Exercises
This section discusses a few other useful results about limits. They provide an
interesting variety of proof strategies to consider.
What can you say about $\lim_{x \to a} f(x) = L$ if you know that $f(x) > 0$ for all $x$, or at
least for all $x$ in an open interval containing $a$? Assuming that this limit exists, it
is clear that the limit cannot be negative because, from the definition of limit, you
know that $|f(x) - L|$ can be made as small as you like which would not be possible
if $f(x)$ were always positive and $L$ were negative. But how would you prove this?
The key lies in the inequality $|f(x) - L| < \epsilon$ since, if $L$ were negative, you could
choose $\epsilon$ to be so small that the inequality could not hold. How small would $\epsilon$ need
to be? Well, the only thing you know about $f(x)$ is that it is positive, or, in other
words, cannot be smaller than 0. At the same time, $L$ is negative which means that
$f(x)$ and $L$ must be at least $|L|$ apart, noting that $|L| > 0$. So set $\epsilon = |L|$. Then
$|L| = \epsilon > |f(x) - L| = f(x) + |L|$ which implies $f(x) < 0$ which is a contradiction.
This leads to the following proof.
PROOF: Let $f$ be a function such that $f(x) > 0$ for all $x$ in the domain of
$f$. If $\lim_{x \to a} f(x) = L$, then $L \ge 0$.
• Suppose that $\lim_{x \to a} f(x) = L$ and that for all $x$, $f(x) > 0$.
• Assume that $L < 0$.
• By the definition of limit, there is a $\delta > 0$ such that for all $x$ in the domain
of $f$ satisfying $0 < |x - a| < \delta$, it follows that $|f(x) - L| < -L$.
• For these values of $x$ it must be that $-L > |f(x) - L| = f(x) - L$ implying
that $0 > f(x)$ which contradicts the fact that $f(x)$ is always positive.
• Therefore, it must hold that $L \ge 0$.
Similar statements can be made about the limits of functions $f$ satisfying $f(x) > b$
or $f(x) < b$ for all $x$ where $b$ is a constant real number. One can also extend this to
limits from the left, limits from the right, and limits to infinity and negative infinity.
Several of these possibilities have been left for the exercises.
There is nothing in the definition of $\lim_{x \to a} f(x) = L$ that a priori precludes $\lim_{x \to a} f(x) = M$
for some $M \ne L$. But, in fact, limits are unique, that is, the only way for the limit
to be $L$ and the limit to be $M$ is for $L$ and $M$ to be equal. Intuitively, this should make
sense. If the values of $f(x)$ are getting close to $L$, then they should not also be able
to get close to a value distinct from $L$. So how can you prove this using nothing but
the definition of limit as a tool?
The result can be proved by contradiction, that is, if you assume that the function
$f$ has two distinct limits, $L$ and $M$, as $x$ approaches $a$, then this leads to a statement
which must be false. Assuming that both limits exist, the definition of limit will
3.10 Other Limit Theorems 91
allow you to force both $|f(x) - L| < \epsilon$ and $|f(x) - M| < \epsilon$ for any positive $\epsilon$
that you choose. Why can’t this happen? Well, if it did, you could get $\epsilon + \epsilon >
|f(x) - L| + |f(x) - M| = |f(x) - L| + |M - f(x)| \ge |f(x) - L + M - f(x)| = |M - L|$.
If $M \ne L$, then $|M - L|$ is a positive number, so if $\epsilon$ is chosen less than or equal to
$\frac{|M - L|}{2}$, it will be impossible to have $|M - L| < 2\epsilon$ as guaranteed by the definition of
limit. That gives you the following proof.
The Squeezing Theorem, also known as the Sandwich Theorem or the Scrunch
Theorem, says that if the values of $f(x)$ are always between $g(x)$ and $h(x)$, then
if $g$ and $h$ both have the same limit, $L$, at $x = a$, then $f$ must also have limit $L$ at $a$.
The proof of this is not hard once you write down everything that you know about
the functions $f$, $g$, and $h$. So what do you know? You can assume that for every $x$ that
$g(x) \le f(x) \le h(x)$, and you can assume that $\lim_{x \to a} g(x) = \lim_{x \to a} h(x) = L$. This means
that for every $\epsilon > 0$ there is a $\delta_1 > 0$ such that when $x$ satisfies $0 < |x - a| < \delta_1$, then
$|g(x) - L| < \epsilon$. Similarly, for that same $\epsilon$, there is a $\delta_2 > 0$ such that when $x$ satisfies
$0 < |x - a| < \delta_2$, then $|h(x) - L| < \epsilon$. Thus, you can show for values of $x$ near $a$ that
$g(x) \le f(x) \le h(x)$, $-\epsilon < g(x) - L < \epsilon$, and $-\epsilon < h(x) - L < \epsilon$. Putting these three
sets of inequalities together shows that $-\epsilon < g(x) - L \le f(x) - L \le h(x) - L < \epsilon$
from which $|f(x) - L| < \epsilon$ follows. This gives the following proof.
If the sequence $\langle a_n \rangle$ converges to $L$, it means that the terms of the sequence are
getting close to $L$. This should mean that the terms of any subsequence should also
be getting close to $L$, and it is not hard to prove that every subsequence $\langle a_{n_j} \rangle$ of
$\langle a_n \rangle$ has the same limit.
Given the fact that $\lim_{n\to\infty} a_n = L$ and given a subsequence $\langle a_{n_j} \rangle$, how do you
use this to prove that the subsequence converges to $L$? What do you know about this
subsequence? Only that there is a strictly increasing sequence of natural numbers,
$\langle n_j \rangle$, that tells which terms of $\langle a_n \rangle$ are found in the subsequence. A nice property
of a strictly increasing sequence of natural numbers, $\langle n_j \rangle$, is that for any natural
number $j$, $n_j \ge j$. This can easily be proved by mathematical induction on $j$.
Certainly, $n_1 \ge 1$ since $n_1$ is a natural number, so the claim is true for $j = 1$. If
$n_k \ge k$ for some $k$, then because $\langle n_j \rangle$ is strictly increasing, $n_{k+1} \ge n_k + 1 \ge k + 1$
showing that if the claim is true for $k$, then it is true for $k + 1$. This proves the claim.
The definition of limit gives you that for any $\epsilon > 0$ there is an $N$ such that if
$j > N$, then $|a_j - L| < \epsilon$. But since $n_j \ge j$, it follows that for all $j > N$, $n_j$ is also
greater than $N$, so $|a_{n_j} - L| < \epsilon$ as needed.
• Let $\langle a_n \rangle$ be a sequence with $\lim_{n\to\infty} a_n = L$, and let $\langle a_{n_j} \rangle$ be any
subsequence.
• Let $\epsilon > 0$ be given.
• By the definition of limit, there is an $N$ such that for all $n > N$, $|a_n - L| < \epsilon$.
• By the definition of subsequence, $\langle n_j \rangle$ is a strictly increasing sequence of
natural numbers and, as such, satisfies $n_j \ge j$ for all natural numbers $j$.
• Thus, for all $j > N$, $n_j \ge j > N$ implies $|a_{n_j} - L| < \epsilon$.
• This proves that $\lim_{j\to\infty} a_{n_j} = L$.
Of course the converse of this theorem is trivially true. That is, if all subsequences
of a given sequence converge, then the original sequence converges. This is trivial
since the original sequence is one of its subsequences.
3.10.5 Exercises
3.11 Liminf and Limsup
Even when a limit does not exist, there is often something that can be said
about the values that the function approaches. Consider, for example, the sequence
$1, -1, 0, 1, -1, 0, 1, -1, 0, \ldots$ which just oscillates among the numbers $1$, $-1$,
and $0$. This sequence does not have a limit, but it has subsequences that do converge.
Some of its subsequences converge to $1$, some converge to $-1$, and some converge
to $0$.
Now consider the function $f(x) = \frac{2x^2 \sin x}{x^2 + 1}$. The function $\frac{2x^2}{x^2 + 1}$ has a limit of $2$
as $x$ goes to infinity, but $f(x)$ oscillates without approaching a limit. Some of its
values do approach $2$, but other values approach $-2$ and every value in between.
More precisely, for each $L \in [-2, 2]$, you can find sequences $\langle x_n \rangle$ where
$\lim_{n\to\infty} x_n = \infty$ and $\lim_{n\to\infty} f(x_n) = L$.
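One concrete way to produce such sequences can be sketched numerically. The sketch below is only an illustration, not part of the text's argument: it assumes the choice $x_n = \arcsin(L/2) + 2\pi n$, so that $\sin x_n = L/2$ for every $n$, and the helper name `sequence_for` is hypothetical.

```python
import math

def f(x):
    # f(x) = 2 x^2 sin(x) / (x^2 + 1), the function discussed above
    return 2 * x * x * math.sin(x) / (x * x + 1)

def sequence_for(target, n):
    # Hypothetical helper: x_n = arcsin(target / 2) + 2*pi*n, so sin(x_n) = target / 2
    # and x_n grows without bound as n increases.
    return math.asin(target / 2) + 2 * math.pi * n

for target in (-2.0, -0.5, 0.0, 1.3, 2.0):
    x_n = sequence_for(target, 10_000)
    # f(x_n) should be very close to the chosen target value
    print(target, f(x_n))
```

Since $\frac{2x_n^2}{x_n^2 + 1}$ tends to $2$ while $\sin x_n$ stays equal to $L/2$, the printed values should settle near each target in $[-2, 2]$.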
So suppose that the function $f$ is defined for positive real numbers. How might
$f(x)$ behave as $x$ goes to infinity? $f$ might diverge to infinity or minus infinity as
do $f(x) = x^2$ and $f(x) = \frac{1 - x^3}{x^2 + 4}$. It might have a finite limit as does $\frac{2x^2}{x^2 + 1}$. It might
oscillate among values within some bounded range such as $\frac{(3x + 100) \sin x}{x + 10}$. Finally, it
might oscillate and be unbounded like $x |\sin x|$.
Even when $f$ oscillates so that it does not have a finite or infinite limit, it
is helpful to quantify which values the function $f(x)$ approaches repeatedly as $x$
grows. This can be done by considering the range of $f(x)$ when $x$ is restricted to
an interval $(M, \infty)$, and then watching what happens to that range as $M$ gets large.
For example, consider the function $f(x) = \frac{(3x + 100) \sin x}{x + 10}$ whose graph is shown in
Fig. 3.10. The function $\frac{3x + 100}{x + 10} = 3 + \frac{70}{x + 10}$ is a decreasing function of $x$ for $x > 0$,
so on the interval $(M, \infty)$, the function $f$ oscillates in a range bounded between
$-\frac{3M + 100}{M + 10}$ and $\frac{3M + 100}{M + 10}$. What can be said about the sequence $\langle f(x_n) \rangle$ where $\langle x_n \rangle$ is
a sequence with $\lim_{n\to\infty} x_n = \infty$? The values $f(x_n)$ are between $-\frac{3x_n + 100}{x_n + 10}$ and $\frac{3x_n + 100}{x_n + 10}$,
so as $x_n$ gets large, $f(x_n)$ is forced to be inside or very near the interval $[-3, 3]$.
Clearly, for no sequence $\langle x_n \rangle$ can $f(x_n)$ approach a limit outside of the interval
$[-3, 3]$, but there are sequences for which $f(x_n)$ approaches $3$ and others for which
$f(x_n)$ approaches $-3$ as shown in the figure. Finding the greatest and least values
to which $f(x_n)$ could converge is the idea behind the limit superior and limit
inferior often referred to simply as the lim sup and lim inf, respectively. In the
example of $f(x) = \frac{(3x + 100) \sin x}{x + 10}$, the values of $-3$ and $3$ came from looking at the
greatest lower bound and least upper bound of the set $\{f(x) \mid x > M\}$ and then
letting $M$ go to infinity. In general, let $f$ be a function whose domain is unbounded
above. For each real number $M$ let $A_M$ be the range of $f$ for $x > M$, that is,
$A_M = \{f(x) \mid x \text{ is in the domain of } f \text{ with } x > M\}$. Then define $\limsup_{x\to\infty} f(x)$ to be
$\lim_{M\to\infty} \sup A_M$. Similarly, define $\liminf_{x\to\infty} f(x)$ to be $\lim_{M\to\infty} \inf A_M$. Some books use the
notation $\overline{\lim}$ and $\underline{\lim}$ for lim sup and lim inf, respectively.
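The behavior of $\sup A_M$ and $\inf A_M$ as $M$ grows can be estimated numerically for the example function. The following is a rough sketch rather than a proof: the helper name `sup_inf_tail` is hypothetical, and it samples only one period $(M, M + 2\pi]$, which is enough for this particular $f$ because the factor $\frac{3x + 100}{x + 10}$ is decreasing, so later periods cannot produce more extreme values.

```python
import math

def f(x):
    # f(x) = (3x + 100) sin(x) / (x + 10), the example from the text
    return (3 * x + 100) * math.sin(x) / (x + 10)

def sup_inf_tail(M, samples=20_000):
    # Approximate sup A_M and inf A_M by sampling one full period (M, M + 2*pi].
    values = [f(M + 2 * math.pi * k / samples) for k in range(1, samples + 1)]
    return max(values), min(values)

for M in (0, 10, 1_000, 1_000_000):
    sup_estimate, inf_estimate = sup_inf_tail(M)
    # The estimates should move toward 3 and -3 as M increases.
    print(M, round(sup_estimate, 4), round(inf_estimate, 4))
```

The estimates shrink toward $3$ and $-3$, matching the values of the lim sup and lim inf described above.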
such that $\lim_{n\to\infty} f(x_n) = \liminf_{x\to a} f(x)$. Similar definitions can be given for
$\liminf_{x\to a^+} f(x)$, $\limsup_{x\to a^+} f(x)$, $\liminf_{x\to a^-} f(x)$, $\limsup_{x\to a^-} f(x)$,
$\liminf_{x\to -\infty} f(x)$, and $\limsup_{x\to -\infty} f(x)$.
The most important theorem concerning lim inf and lim sup is that $\lim_{x\to a} f(x) = L$
if and only if $\liminf_{x\to a} f(x) = \limsup_{x\to a} f(x) = L$. Notice first that this is a biconditional
statement; that is, an “if and only if” statement. This requires that its proof have two
parts; one that assumes $\lim_{x\to a} f(x) = L$ and proves $\liminf_{x\to a} f(x) = \limsup_{x\to a} f(x) = L$
and another that assumes $\liminf_{x\to a} f(x) = \limsup_{x\to a} f(x) = L$ and proves $\lim_{x\to a} f(x) = L$.
So, given $\lim_{x\to a} f(x) = L$, how can you conclude that $\liminf_{x\to a} f(x) = \limsup_{x\to a} f(x) = L$?
What you know is that given $\epsilon > 0$, there is a $\delta > 0$ such that for all $x$ in the
domain of $f$ for which $0 < |x - a| < \delta$, you have $|f(x) - L| < \epsilon$. But this means that
for small $\delta > 0$, the supremum $\sup A_\delta$ and infimum $\inf A_\delta$ are both within $\epsilon$ of $L$ and,
therefore, the limits of $\sup A_\delta$ and $\inf A_\delta$ must both approach $L$ as $\delta$ decreases to 0.
Conversely, suppose that $\liminf_{x\to a} f(x) = \limsup_{x\to a} f(x) = L$. Note that for any $x \ne a$ in
the domain of $f$, it follows that $f(x) \in A_{2|x-a|}$. Thus, $\inf A_{2|x-a|} \le f(x) \le \sup A_{2|x-a|}$
which implies that $\lim_{x\to a} \inf A_{2|x-a|} \le \lim_{x\to a} f(x) \le \lim_{x\to a} \sup A_{2|x-a|}$ from which it
follows that $\lim_{x\to a} f(x) = L$.
PART I: the limit equals L implies lim inf and lim sup equal L
PART II: lim inf and lim sup equal L implies that the limit equals L
As discussed earlier, this theorem holds even when $L = \infty$ or $-\infty$. It also holds for
limits at $\pm\infty$ and for one-sided limits.
3.11.1 Exercises
3. Prove that if $a$ is any accumulation point of the domain of $f$, then
$\liminf_{x\to a} f(x) \le \limsup_{x\to a} f(x)$.
4. Prove that $\lim_{x\to a} f(x) = \infty$ if and only if $\liminf_{x\to a} f(x) = \infty$.
5. Suppose that $\liminf_{x\to a} f(x) = L$ and $\liminf_{x\to a} g(x) = M$. What can you say about
$\liminf_{x\to a} (f + g)(x)$?
6. Suppose that $f$ is a positive-valued function with $\limsup_{x\to a} f(x) = L > 0$. Prove
that $\liminf_{x\to a} \frac{1}{f(x)} = \frac{1}{L}$.
Chapter 4
Continuity
As with the definition of limit, most Calculus students will develop an intuitive feel
for what it means for a function to be continuous. This usually involves knowing
that a function is continuous on an interval if the graph of that function over that
interval can be drawn without lifting one’s pencil from the page. The important
property here is that as the pencil is tracing out the graph of the function, and the
pencil is approaching the point where $x = a$, the points on the graph are getting
close to their destination at the point $(a, f(a))$. In particular, it does not happen
that as the points on the graph are getting close to $(a, L)$ that the graph suddenly
jumps to a different point $(a, f(a))$ where $f(a) \ne L$, a situation where the pencil
would have to be lifted from the page to get from $(a, L)$ to $(a, f(a))$. This intuitive
understanding leads directly to the key property of $f$ being continuous at $a$ which is
that $\lim_{x\to a} f(x) = f(a)$.
How can one state a definition for continuity that embodies this intuitive feel for
the function having its own value as its limit? Clearly, the definition of a function f
being continuous at a point $x = a$ must be similar to a definition of the limit of $f$ as
$x$ approaches $a$. As a reminder, here is the definition of limit.
Suppose that the point $a$ is an accumulation point of the domain of the function $f$.
Then $\lim_{x\to a} f(x) = L$ means that for every $\epsilon > 0$ there exists a $\delta > 0$ such that for
every $x$ in the domain of $f$ satisfying $0 < |x - a| < \delta$, it follows that $|f(x) - L| < \epsilon$.
The definition of continuity of $f$ at point $a$ needs to include the fact that the
function is defined at the point $a$, so references to the limit $L$ in the definition of
limit can be replaced by references to $f(a)$. Thus, the definition of continuity will
contain the conclusion $|f(x) - f(a)| < \epsilon$. In the definition of limit, it was not required
that the function $f$ be defined at $x = a$, and if it were defined, $f(a)$ did not need to
be equal to the limit $L$. For this reason, the definition of limit took care to ensure
that even though $|f(x) - L| < \epsilon$ was required to hold for $x$ values near $x = a$, this
inequality did not need to hold at $x = a$. The definition of limit excluded $x = a$ by
only requiring the inequality to hold for those $x$ values satisfying $0 < |x - a| < \delta$
which excludes $x = a$. This restriction is not necessary in the definition of continuity
of a function at a point.
Suppose that the point $a$ is in the domain of the function $f$. Then $f$ is continuous at $a$
means that for every $\epsilon > 0$ there exists a $\delta > 0$ such that for every $x$ in the domain
of $f$ satisfying $|x - a| < \delta$, it follows that $|f(x) - f(a)| < \epsilon$.
Notice that the requirement that the point a be an accumulation point of the
domain of f has been dropped. As a result, if the function f is defined at an isolated
point a, then f is continuous at that point. A function that is not continuous at the
point a is discontinuous at the point a.
A function $f$ is continuous on a set $A$ if it is continuous at each point $a \in A$. The
function whose graph appears in Fig. 4.1 is discontinuous at $x = b$ because its limit
at $x = b$ does not exist. Similarly, it is discontinuous at $x = c$. It is discontinuous
at $x = d$ because it is not defined at that point even though the function has a limit
there. The function is continuous on the intervals $[a, b)$, $(b, c)$, and $(c, d)$, and at
the points $x = e$ and $x = f$. The function is not continuous on the intervals $[a, b]$
or $[c, d]$.
It is a direct consequence of the definition of continuity that if $f$ is continuous
at a point $a$, and if $a$ is an accumulation point of the domain of $f$, then the limit of
$f(x)$ at $a$ exists and is, in fact, $f(a)$. To prove this you would just need to show that
if $f$ satisfies the definition of continuity at $a$, then $f$ also satisfies the definition of
$\lim_{x\to a} f(x) = f(a)$. Writing down the definition of continuity gives you that for every
$\epsilon > 0$ there is a $\delta > 0$ such that $|x - a| < \delta$ implies $|f(x) - f(a)| < \epsilon$. But if this is
true, then certainly $0 < |x - a| < \delta$ implies $|f(x) - f(a)| < \epsilon$, so the definition of
limit is satisfied.
4.2 Proving the Continuity of a Function
The template for proofs of $\lim_{x\to a} f(x) = L$ followed directly from the definition of
limit. Similarly, a template for proofs of the continuity of a function $f$ at a point
$a$ will follow directly from the definition of continuity. Indeed, the definition of
continuity requires that for every $\epsilon > 0$ there exist a $\delta > 0$ which satisfies
a particular condition. This suggests that a proof of continuity should select an
arbitrary $\epsilon > 0$ and proceed to display a value of $\delta > 0$ that causes the needed
condition to be satisfied. This is similar to the procedure taken for a limit proof
except that the needed condition is slightly different. Thus, here is a template for
proofs about the continuity of a function at a point.
As a start, consider how to prove that the function defined for all real numbers $x$ as
$f(x) = 5x - 3$ is continuous at $x = 4$. The proof would begin with “Let $f(x) = 5x - 3$.
Given $\epsilon > 0, \ldots$.” The task is then to find a $\delta > 0$ so that $|f(x) - f(4)| < \epsilon$ for every
$x$ satisfying $|x - 4| < \delta$. Working backwards, to get $|f(x) - f(4)| < \epsilon$ one needs
$\epsilon > |(5x - 3) - (5 \cdot 4 - 3)| = 5|x - 4|$. Therefore, it seems clear that $|x - 4|$
needs to be less than $\frac{\epsilon}{5}$, so letting $\delta = \frac{\epsilon}{5}$ will work. Note that because $\epsilon > 0$, $\delta$ is
also greater than 0 as required by the definition of continuity. Putting this into the
template results in the following proof.
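The choice $\delta = \frac{\epsilon}{5}$ can be spot-checked numerically. This is only a sanity check on sample points, not a substitute for the proof, and the helper name `delta_works` is hypothetical.

```python
def f(x):
    # f(x) = 5x - 3, the function from the text
    return 5 * x - 3

def delta_works(eps, delta, samples=1000):
    # Sample points x with |x - 4| <= 0.999 * delta, all strictly inside (4 - delta, 4 + delta),
    # and check that |f(x) - f(4)| < eps at every one of them.
    xs = [4 + 0.999 * delta * (2 * k / samples - 1) for k in range(samples + 1)]
    return all(abs(f(x) - f(4)) < eps for x in xs)

for eps in (1.0, 0.1, 1e-3, 1e-6):
    # delta = eps / 5 should pass; the larger value eps / 4 should fail
    print(eps, delta_works(eps, eps / 5), delta_works(eps, eps / 4))
```

The second column shows that the bound is tight in the sense that a noticeably larger $\delta$, such as $\frac{\epsilon}{4}$, already lets some $x$ violate $|f(x) - f(4)| < \epsilon$.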
For a more challenging example, consider proving that the function $f(x) =
2x^3 - 4x + 1$ is continuous for all real numbers. This proof not only tackles a
more complicated function than the one in the previous example, it is supposed to
demonstrate the continuity of the function at the general real number $a$ rather than
at a specific value such as $a = 4$. This requires the proof to select an arbitrary $a$ and
prove the continuity of $f$ at the point $a$. By showing that the function is continuous
at any arbitrarily chosen $a$, it shows that the function is continuous at every point $a$.
Again, the proof will select an arbitrary $\epsilon > 0$ and needs to produce a $\delta > 0$ such
that $|f(x) - f(a)| < \epsilon$ for all $x$ satisfying $|x - a| < \delta$. The proof needs to select an
arbitrary $a$ and an arbitrary $\epsilon > 0$. Does it matter which it does first? In this case
where the choice of $a$ does not depend on which $\epsilon$ is chosen, and the choice of $\epsilon$
does not depend on which $a$ is chosen, the order is not critical. It makes sense to
select the $a$ first because you are then challenged to prove that $f$ is continuous at
$a$ for which you should choose an $\epsilon > 0$. But since both quantifiers are universal
quantifiers (for all $a \in \mathbb{R}$ and for all $\epsilon > 0$), the order does not matter. If it had been
a universal quantifier and an existential quantifier such as “for all $\epsilon > 0$ there exists
a $\delta > 0$,” then the order would matter a great deal.
Working backwards from $\epsilon > |f(x) - f(a)|$ you can see that you need
$\epsilon > |(2x^3 - 4x + 1) - (2a^3 - 4a + 1)| = |2(x^3 - a^3) - 4(x - a)| = |2(x - a)(x^2 + xa + a^2) - 4(x - a)| = |x - a| \, |2(x^2 + xa + a^2) - 4|$.
You should not be surprised and should, in fact, be quite pleased
to see that this last expression contains a factor of $|x - a|$ because this will facilitate
making the expression small when $|x - a|$ is made small. One only needs to control
the size of the other factor $|2(x^2 + xa + a^2) - 4|$. Of course, if $x$ is allowed to wander
too far from $a$, this other factor could get arbitrarily large, so care must be taken to
restrict how far $x$ gets from $a$. This can be done by requiring that $\delta$ not be larger than
some conveniently selected value such as 1. That means that $|x - a| < \delta \le 1$ would
imply, for example, that $|x| < |a| + 1$. Given this, there are many ways to find an
upper bound for the quantity $|2(x^2 + xa + a^2) - 4|$ where the upper bound does not
depend on $x$. For example,
$|2(x^2 + xa + a^2) - 4| \le 2x^2 + 2|x||a| + 2a^2 + 4 \le 2(|a| + 1)^2 + 2(|a| + 1)|a| + 2a^2 + 4$.
One can afford to be sloppy here and get a simpler looking upper bound by saying
$2(|a| + 1)^2 + 2(|a| + 1)|a| + 2a^2 + 4 \le 2(|a| + 1)^2 + 2(|a| + 1)(|a| + 1) + 2(|a| + 1)^2 + 4(|a| + 1)^2 = 10(|a| + 1)^2$.
All you need is an upper bound that depends only on $a$. This leads to the following proof.
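The bound $10(|a| + 1)^2$ suggests a value of $\delta$ that depends on both $a$ and $\epsilon$. The sketch below assumes the choice $\delta = \min\left(1, \frac{\epsilon}{10(|a| + 1)^2}\right)$, which follows from the two requirements derived above; the helper names `delta_for` and `check` are hypothetical, and the sampling is only a numerical sanity check.

```python
def f(x):
    # f(x) = 2x^3 - 4x + 1, the function from the text
    return 2 * x**3 - 4 * x + 1

def delta_for(a, eps):
    # delta <= 1 keeps |x| < |a| + 1; the second term uses the bound 10(|a| + 1)^2.
    return min(1.0, eps / (10 * (abs(a) + 1) ** 2))

def check(a, eps, samples=1000):
    delta = delta_for(a, eps)
    # Sample points strictly inside (a - delta, a + delta).
    xs = [a + 0.999 * delta * (2 * k / samples - 1) for k in range(samples + 1)]
    return all(abs(f(x) - f(a)) < eps for x in xs)

for a in (-50.0, -1.0, 0.0, 0.5, 7.0):
    for eps in (10.0, 1.0, 1e-3):
        print(a, eps, delta_for(a, eps), check(a, eps))
```

Notice that `delta_for` takes `a` as an argument: the value of $\delta$ is allowed to shrink as $|a|$ grows, which is exactly the dependence that continuity at a point permits.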
where $f(x) = x + 1$. Indeed, for most real numbers $a$, $\lim_{x\to a} f(x)$ does not exist. Only
at $x = 1$, where $2x$ and $x + 1$ coincide, does this limit exist, and, in fact, at that point
$f(x)$ is continuous (Fig. 4.2).
A proof that $f$ is continuous at $x = 1$ would be similar to the two preceding
proofs, but you need to be careful to handle $f(x)$ differently depending on whether
$x$ is rational or irrational. As in other continuity proofs, given an $\epsilon > 0$ you are
faced with producing a value for $\delta > 0$ which will ensure that $|f(x) - f(1)| < \epsilon$
whenever $|x - 1| < \delta$. If the function in the proof were equal to $x + 1$ for every
value of $x$, then the value $\delta = \epsilon$ would work because $|x - 1| < \epsilon$ shows that
$|f(x) - f(1)| = |(x + 1) - (1 + 1)| = |x - 1| < \epsilon$. If the function in the proof
were equal to $2x$ for every value of $x$, then the value $\delta = \frac{\epsilon}{2}$ would work because
$|x - 1| < \frac{\epsilon}{2}$ shows that $|f(x) - f(1)| = |(2x) - (2 \cdot 1)| = 2|x - 1| < \epsilon$. In this proof,
then, you can choose $\delta = \min(\epsilon, \frac{\epsilon}{2}) = \frac{\epsilon}{2}$. After selecting an $x$ with $|x - 1| < \delta$,
you merely consider two separate cases, one where $x$ is rational, and one where $x$ is
irrational. These ideas allow you to produce the following proof.
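PROOF: The function $f(x) = \begin{cases} 2x & \text{if } x \text{ is rational} \\ x + 1 & \text{if } x \text{ is irrational} \end{cases}$ is continuous at
$x = 1$.
• Let $f(x) = \begin{cases} 2x & \text{if } x \text{ is rational} \\ x + 1 & \text{if } x \text{ is irrational} \end{cases}$.
• Given $\epsilon > 0$,
• let $\delta = \frac{\epsilon}{2}$ which is greater than 0 since $\epsilon > 0$.
• Select $x$ such that $|x - 1| < \delta = \frac{\epsilon}{2}$.
• If $x$ is a rational number, then $|f(x) - f(1)| = |2x - 2| = 2|x - 1| < 2\delta = \epsilon$.
• If $x$ is an irrational number, then $|f(x) - f(1)| = |(x + 1) - 2| = |x - 1| <
\delta < \epsilon$.
• In either case, $|f(x) - f(1)| < \epsilon$.
• Therefore, the function $f$ is continuous at 1.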
4.2.1 Exercises
4.3 Uniform Continuity
the point $a$ where the continuity needs to be shown. These functions are special and
satisfy the following definition. A function $f$ is uniformly continuous on the set
$A$ if for every $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x) - f(y)| < \epsilon$ for every $x$ and $y$
in $A$ satisfying $|x - y| < \delta$. You should compare this definition to the definition of
continuity at a point. The difference centers on when the value of $\delta > 0$ needs to
be determined. For continuity at a single point, given $\epsilon > 0$, one must specify the
value of $\delta > 0$ after being given the value of $a$ but before being given a value for $x$.
Thus, the value of $\delta > 0$ can depend on the value of $a$ even though it cannot depend
on the value of $x$. On the other hand, for uniform continuity, given $\epsilon > 0$, one must
specify the value of $\delta > 0$ before learning the values of either $x$ or $y$, and, therefore,
its value cannot depend on either $x$ or $y$.
The definition of uniform continuity suggests a template for how to prove that a
given function $f$ is uniformly continuous on a set $A$. As in the proof for continuity
at a point, you would say that a value for $\epsilon > 0$ has been given. Then you would
present a value for $\delta > 0$. Once these two values have been specified, you would
need to show that any $x$ and $y$ in $A$ that satisfy $|x - y| < \delta$ also satisfy $|f(x) - f(y)| < \epsilon$.
This suggests the following.
Less clear is how to choose a value for $\delta > 0$ when proving $f(x) = \frac{1}{x^2 + 1}$
is uniformly continuous on the real numbers. To do this, you need to find a
way to show $|f(x) - f(y)| < \epsilon$. You would try to find an upper bound for
$|f(x) - f(y)| = \left| \frac{1}{x^2 + 1} - \frac{1}{y^2 + 1} \right| = \frac{|(y^2 + 1) - (x^2 + 1)|}{(x^2 + 1)(y^2 + 1)} = \frac{|x + y|}{(x^2 + 1)(y^2 + 1)} |x - y|$.
This expression
is complicated, so it is convenient to find ways to simplify it. The nice thing about
working with inequalities rather than equalities is that you are not prevented from
making changes that increase the value of your expression. That is, if you can
simplify an expression by substituting an expression that is a little larger, that might
not be a problem. The numerator in the previous expression is $|x + y|$ which does
not simplify algebraically, but it does suggest a possible application of the triangle
inequality, $|x + y| \le |x| + |y|$. Changing $|x + y|$ to $|x| + |y|$ allows the fraction to
be broken into two simpler fractions. It allows you to continue with
$|f(x) - f(y)| = \frac{|x + y|}{(x^2 + 1)(y^2 + 1)} |x - y| \le \left( \frac{|x|}{(x^2 + 1)(y^2 + 1)} + \frac{|y|}{(x^2 + 1)(y^2 + 1)} \right) |x - y| \le \left( \frac{|x|}{x^2 + 1} + \frac{|y|}{y^2 + 1} \right) |x - y|$.
When $|x| < 1$, you can conclude that $|x| < 1 \le x^2 + 1$. When $|x| \ge 1$, you can
conclude that $|x| \le x^2 < x^2 + 1$. In either case $\frac{|x|}{x^2 + 1} \le \frac{x^2 + 1}{x^2 + 1} = 1$. This lets you
state that
$|f(x) - f(y)| = \frac{|x + y|}{(x^2 + 1)(y^2 + 1)} |x - y| \le \left( \frac{|x|}{(x^2 + 1)(y^2 + 1)} + \frac{|y|}{(x^2 + 1)(y^2 + 1)} \right) |x - y| \le 2|x - y|$.
This suggests that $\delta = \frac{\epsilon}{2}$ will work in the proof.
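PROOF: The function $f(x) = \frac{1}{x^2 + 1}$ is uniformly continuous on the real
numbers.
• Let $f(x) = \frac{1}{x^2 + 1}$.
• Given $\epsilon > 0$,
• let $\delta = \frac{\epsilon}{2}$ which is greater than 0 since $\epsilon > 0$.
• Let $x$ and $y$ be real numbers such that $|x - y| < \delta = \frac{\epsilon}{2}$.
• Then $|f(x) - f(y)| = \left| \frac{1}{x^2 + 1} - \frac{1}{y^2 + 1} \right| = \frac{|(y^2 + 1) - (x^2 + 1)|}{(x^2 + 1)(y^2 + 1)} = \frac{|x + y|}{(x^2 + 1)(y^2 + 1)} |x - y| \le \left( \frac{|x|}{(x^2 + 1)(y^2 + 1)} + \frac{|y|}{(x^2 + 1)(y^2 + 1)} \right) |x - y| \le \left( \frac{|x|}{x^2 + 1} + \frac{|y|}{y^2 + 1} \right) |x - y|$.
• Note that if $|x| < 1$, then $|x| < x^2 + 1$, and if $|x| \ge 1$, then $|x| \le x^2 < x^2 + 1$.
• In either case, $|x| < x^2 + 1$, so $\frac{|x|}{x^2 + 1} < 1$, and similarly, $\frac{|y|}{y^2 + 1} < 1$.
• It follows that $|f(x) - f(y)| \le \left( \frac{|x|}{x^2 + 1} + \frac{|y|}{y^2 + 1} \right) |x - y| \le 2|x - y| < 2\delta = \epsilon$.
• Therefore, the function $f$ is uniformly continuous on the real numbers.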
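The key feature of this proof is that $\delta = \frac{\epsilon}{2}$ is fixed before $x$ and $y$ are chosen. A minimal numerical sketch of that idea follows; the helper name `check_uniform`, the sampling range, and the random seed are all illustrative assumptions, and random sampling can only support the claim, never prove it.

```python
import random

def f(x):
    # f(x) = 1 / (x^2 + 1), the function from the proof
    return 1 / (x * x + 1)

def check_uniform(eps, trials=20_000, span=1e3, seed=0):
    rng = random.Random(seed)
    delta = eps / 2  # chosen before any x or y is known
    for _ in range(trials):
        x = rng.uniform(-span, span)
        y = x + 0.999 * rng.uniform(-delta, delta)  # guarantees |x - y| < delta
        if abs(f(x) - f(y)) >= eps:
            return False
    return True

for eps in (1.0, 1e-2, 1e-6):
    print(eps, check_uniform(eps))
```

Unlike the earlier pointwise examples, the function that produces $\delta$ here takes no argument for the location of the point, which mirrors the difference between continuity at a point and uniform continuity.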
One of the most memorable theorems from Calculus is the Mean Value
Theorem which states that if the function $f$ is continuous on the interval $[a, b]$
and differentiable on the interval $(a, b)$, then there is a $c \in (a, b)$ such that
$f'(c) = \frac{f(b) - f(a)}{b - a}$. If the function $f$ has a bounded derivative on the interval
$[a, b]$, that is, if there is a positive real number $M$ such that $|f'(x)| \le M$ for all
values of $x \in [a, b]$, then one can easily see that $f$ is uniformly continuous on that
interval. Indeed, if $x$ and $y$ are in $[a, b]$, then there is a $c$ between $x$ and $y$ such that
$|f(x) - f(y)| = |f'(c)| \, |x - y| \le M |x - y|$. This implies that given $\epsilon > 0$, the value
$\delta = \frac{\epsilon}{M} > 0$ can be used in a proof that $f$ is uniformly continuous on $[a, b]$ for then
$|x - y| < \delta$ implies $|f(x) - f(y)| = |f'(c)| \, |x - y| \le M |x - y| < M\delta = \epsilon$. This
is summarized by saying that a function with a bounded derivative on an interval is
uniformly continuous there.
Whenever you learn of the truth of a conditional statement such as the one at the
end of the previous paragraph (bounded derivative implies uniform continuity), it is
natural to ask whether the converse of the statement is also true (uniform continuity
implies bounded derivative). The answer to this particular question is “no, not all
functions uniformly continuous on an interval have bounded derivatives there.” In
particular, the function $f(x) = |x|$ is an example of a function uniformly continuous
on the entire real line, yet it fails to be differentiable at $x = 0$. The function
$f(x) = \sqrt{x}$ is uniformly continuous for $x \ge 0$, but its derivative is unbounded near $x = 0$.
A more complex example is the function defined by $f(x) = x^2 \sin \frac{1}{x^2}$ when $x \ne 0$
and $f(0) = 0$. This function is uniformly continuous on the interval $[-10, 10]$
even though its derivative, which exists on the entire real line, is not bounded as $x$
approaches 0.
Because the function $f(x) = \sqrt{x}$ has an increasingly large rate of change as $x$
approaches 0, proving that the function is uniformly continuous for $x \ge 0$ provides
an interesting challenge. The proof will need to conclude that
$\epsilon > |f(x) - f(y)| = |\sqrt{x} - \sqrt{y}| = \frac{|\sqrt{x} - \sqrt{y}| (\sqrt{x} + \sqrt{y})}{\sqrt{x} + \sqrt{y}} = \frac{|x - y|}{\sqrt{x} + \sqrt{y}}$.
As expected, there is a factor of $|x - y|$ in
this expression, so that you can try to make the expression small by restricting the
size of $|x - y|$. This is easy if the denominator of the expression, $\sqrt{x} + \sqrt{y}$, does
not get too small. The problem is if $x$ and $y$ get close to 0, the denominator of the
expression will also get close to 0. At first this seems like a significant roadblock.
But this roadblock presents its own resolution for if $\sqrt{x} + \sqrt{y}$ is very small, it must
certainly be that $|\sqrt{x} - \sqrt{y}|$ is even smaller which is the conclusion that you want.
In other words, there are two cases: either $\sqrt{x} + \sqrt{y}$ is small which would imply that
$|f(x) - f(y)|$ is small, or $\sqrt{x} + \sqrt{y}$ is large which would imply that
$|f(x) - f(y)| = \frac{|x - y|}{\sqrt{x} + \sqrt{y}}$ is small. You only need to decide what to use as the dividing line between
“large” and “small.” A natural choice would be $\epsilon$ itself because $\sqrt{x} + \sqrt{y} < \epsilon$
implies $|\sqrt{x} - \sqrt{y}| < \epsilon$. If $\sqrt{x} + \sqrt{y} \ge \epsilon$, then $|f(x) - f(y)| = \frac{|x - y|}{\sqrt{x} + \sqrt{y}} \le \frac{|x - y|}{\epsilon}$
which suggests letting $\delta = \epsilon^2$ so that $|x - y| < \delta$ gives $|f(x) - f(y)| < \frac{\epsilon^2}{\epsilon} = \epsilon$. The
complete proof follows.
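PROOF: The function $f(x) = \sqrt{x}$ is uniformly continuous on the interval
$x \ge 0$.
• Let $f(x) = \sqrt{x}$.
• Given $\epsilon > 0$,
• let $\delta = \epsilon^2$ which is greater than 0 since $\epsilon \ne 0$.
• Let $x$ and $y$ be nonnegative real numbers such that $|x - y| < \delta$.
• In the case that $\sqrt{x} + \sqrt{y} < \epsilon$, it follows that $|f(x) - f(y)| = |\sqrt{x} - \sqrt{y}| \le
\sqrt{x} + \sqrt{y} < \epsilon$.
• In the case that $\sqrt{x} + \sqrt{y} \ge \epsilon$, it follows that $|f(x) - f(y)| = |\sqrt{x} - \sqrt{y}| =
\frac{|\sqrt{x} - \sqrt{y}| (\sqrt{x} + \sqrt{y})}{\sqrt{x} + \sqrt{y}} = \frac{|x - y|}{\sqrt{x} + \sqrt{y}} \le \frac{|x - y|}{\epsilon} < \frac{\delta}{\epsilon} = \frac{\epsilon^2}{\epsilon} = \epsilon$.
• In either case, $|x - y| < \delta$ implies that $|f(x) - f(y)| < \epsilon$, so the function $f$ is
uniformly continuous on the interval $x \ge 0$.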
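The choice $\delta = \epsilon^2$ can be spot-checked in the same spirit, with extra samples placed near 0 where the square root is steepest. The helper name `check_sqrt`, the sampling scheme, and the seed are illustrative assumptions, and the check is numerical evidence only.

```python
import math
import random

def check_sqrt(eps, trials=20_000, span=100.0, seed=0):
    rng = random.Random(seed)
    delta = eps ** 2  # the value suggested by the two-case argument
    for _ in range(trials):
        # Half of the samples are taken close to 0, where sqrt changes fastest.
        x = rng.uniform(0, span) if rng.random() < 0.5 else rng.uniform(0, 2 * delta)
        # Clamping at 0 keeps y nonnegative and never increases |x - y|.
        y = max(0.0, x + 0.999 * rng.uniform(-delta, delta))
        if abs(math.sqrt(x) - math.sqrt(y)) >= eps:
            return False
    return True

for eps in (1.0, 0.1, 1e-3):
    print(eps, check_sqrt(eps))
```

Both cases of the proof are exercised: samples near 0 correspond to $\sqrt{x} + \sqrt{y}$ being small, and samples far from 0 correspond to it being large.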
There is an important lesson to be learned from this example. When planning how
to write a proof, you can pursue one line of thinking which may solve the problem
in most but not all cases. Sometimes the special cases where the argument does not
work are enough to cause you to abandon your original line of reasoning altogether.
But often you can just break your argument into two or more cases and find other
techniques to handle the special cases where the original argument does not work.
4.3.1 Exercises
4.4 Compactness and the Heine–Borel Theorem
Let $a$ and $b$ be real numbers with $a < b$. It turns out that if a function $f$ is continuous
on the closed interval $[a, b]$, then $f$ is uniformly continuous on that interval. How
might you prove this result? As a first try, you might say that for each $\epsilon > 0$ and
for each $y \in [a, b]$ there is a $\delta > 0$ such that if $x \in [a, b]$ with $|x - y| < \delta$, then
$|f(x) - f(y)| < \epsilon$. Then, having produced a value for $\delta$ for each $y \in [a, b]$, you might
want to pick the smallest of all of those $\delta$’s and hope that this minimum $\delta$ would be
sufficiently small to work for every $y \in [a, b]$. Unfortunately, you started out with
an infinite collection of $\delta$’s, each greater than 0. Such an infinite set might not have
a minimum value. The set of such $\delta$’s is certainly nonempty and bounded below, so
the collection does have a greatest lower bound, but that greatest lower bound could
be 0, too small to use for the $\delta$ in the proof. A finite set of positive numbers always
has a minimum value that is positive, but an infinite set of positive numbers might
have a greatest lower bound of 0.
Suppose that $T$ is a collection of open intervals, and $A \subseteq \mathbb{R}$. If the set $A$ is
contained in the union of the open intervals in $T$, that is, if $A \subseteq \bigcup_{(s,t) \in T} (s, t)$, then
$T$ is called an open cover of $A$. A subset $T' \subseteq T$ which is also an open cover of
$A$ is called a subcover of $A$. In the above suggested proof that the continuity of $f$
on $[a, b]$ implies the uniform continuity of $f$ on $[a, b]$, the definition of continuity
at each point of $y \in [a, b]$ produced a collection of open intervals which form an
open cover $T$ of $[a, b]$. If that open cover had a finite subcover $T'$, then you would be
dealing with only a finite number of $\delta > 0$ values, and you could expect to produce
a smallest such $\delta > 0$. Whether such a finite subcover exists has nothing to do with
the continuous function $f$ that motivated this discussion. A closed bounded interval
$[a, b]$ in the real numbers is compact which means that every open cover of
$[a, b]$ contains a finite subcover. The fact that every closed bounded interval in the
real numbers is compact is known as the Heine–Borel Theorem, and it is central
to proving the above result about continuous functions on closed bounded intervals
being uniformly continuous there. In fact, the Heine–Borel Theorem is an important
tool for proving many results in analysis.
Suppose that for every rational number in $[0, 1]$ you represent the rational
number in lowest terms as $\frac{p}{q}$. Then for each of these rational numbers you
Presented next are two quite different proofs of the Heine–Borel Theorem. The
techniques used in both proofs are instructive, and it is interesting to see how a
single result can be proved using two completely different strategies. Given in each
case are real numbers $a < b$ and a set of open intervals $T$ that forms an open cover
of the closed bounded interval $[a, b]$. Both proofs seek to show that there must be
a finite subset of $T$ that covers $[a, b]$. The strategy in the first proof suggests that,
whether or not you can cover $[a, b]$ with a finite number of open intervals, you can
certainly cover some of the interval starting at $a$ and working at least part of the way
toward $b$. The proof proposes looking at the set
$S = \{x \in [a, b] \mid T \text{ has a finite subcover that covers the interval } [a, x]\}$.
The proof first shows that $S$ is not empty because it contains the point $a$. The set $S$
is bounded above by $b$, so $S$ has a least upper bound, $r$. This is not to say that $r \in S$,
but if $r$ is not in $S$, there must be values in $S$ that are arbitrarily close to $r$. Certainly
$r$ is in $[a, b]$, so there is an open interval from $T$ that covers $r$. Since there are values
of $S$ arbitrarily close to $r$, there are some inside this open interval containing $r$. This
open interval then extends the finite subcover to values greater than $r$. One can only
conclude that $r$ must be $b$, and, in fact, $b \in S$. Thus, $[a, b]$ has a finite subcover, and
the proof is complete (Fig. 4.4).
PROOF (Heine–Borel Theorem): Let $a < b$ be two real numbers, and let
$T$ be an open cover of $[a, b]$. Then $T$ contains a finite subcover of $[a, b]$.
• Let $a < b$ be two real numbers, and let $T$ be an open cover of $[a, b]$.
• Define set $S = \{x \in [a, b] \mid T \text{ has a finite subcover that covers the interval } [a, x]\}$.
• The set $T$ is an open cover of $[a, b]$, and $a \in [a, b]$, so $T$ must contain at
least one open interval, $(p, q)$, which contains the point $a$, that is, $p < a < q$.
Since the interval $[a, a]$ is covered by $(p, q) \in T$, the point $a \in S$, and $S$ is
not an empty set.
• The set $S$ is bounded above by $b$.
• Since $S$ is nonempty and bounded above, it has a least upper bound $r$.
• Since $r$ must be at least $a$ and cannot be greater than $b$, $r \in [a, b]$, so there
is an interval $(p, q)$ in $T$ which contains the point $r$, that is, $p < r < q$.
• Since $p < r$ and $r$ is the least upper bound of $S$, $p$ is not an upper bound of
$S$. Thus, there is a point $y \in S$ with $p < y$. This means that there is a finite
set of intervals in $T$ that covers $[a, y]$.
• Let $z = \min(\frac{r + q}{2}, b)$. Since $z \ge r$ and $z \in (p, q)$, adding the interval $(p, q)$
to the finite set of intervals of $T$ that covers $[a, y]$ produces a finite set of
intervals in $T$ that covers $[a, z]$, and $z \in S$.
• But $r$ is the least upper bound for $S$, implying that $z \le r$. Because $z =
\min(\frac{r + q}{2}, b)$ and $\frac{r + q}{2} > r$, it must be that $z = b$.
• Because $z \in S$, it follows that $b \in S$ which completes the proof of the
theorem.
[Fig. 4.4: the interval $[a, b]$ with the points $y$, $r$, and $z$ lying inside the open interval $(p, q)$]
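The left-to-right strategy of this first proof can be imitated by a small greedy sweep. The sketch below is only an analogy: it assumes the cover is given as a finite Python list of pairs `(p, q)`, whereas the theorem is about covers that may be infinite, and the function name `finite_subcover` and the example cover `T` are hypothetical.

```python
def finite_subcover(a, b, intervals):
    """Greedy sweep from a toward b over a finite list of open intervals (p, q)."""
    chosen = []
    point = a  # leftmost point of [a, b] not yet known to be covered
    while True:
        # Intervals in the collection that contain the current point.
        candidates = [(p, q) for (p, q) in intervals if p < point < q]
        if not candidates:
            raise ValueError(f"the collection does not cover the point {point}")
        best = max(candidates, key=lambda iv: iv[1])  # reaches farthest to the right
        chosen.append(best)
        if best[1] > b:
            return chosen  # everything up to and including b is now covered
        point = best[1]  # the right endpoint itself is not in the open interval

# A deliberately redundant example cover of [0, 1] (an assumption for illustration).
T = [(k / 10 - 0.15, k / 10 + 0.15) for k in range(11)] + [(-1, 0.05), (0.5, 2)]
print(finite_subcover(0.0, 1.0, T))
```

Each pass plays the role of the step in the proof where the interval $(p, q)$ containing $r$ pushes the covered region past $r$; the sweep stops only when the covered region reaches beyond $b$.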
So, if it is the case that $[a_0, m_0]$ cannot be covered by a finite number of intervals
in $T$, let $a_1 = a_0$ and $b_1 = m_0$. Otherwise, if $[m_0, b_0]$ cannot be covered by a finite
number of intervals in $T$, let $a_1 = m_0$ and $b_1 = b_0$. In either case, the new interval
$[a_1, b_1] \subseteq [a, b]$ cannot be covered by a finite collection of intervals in $T$.
Now the proof continues iteratively. If for some $j > 0$, there is an interval $[a_j, b_j]$
contained in $[a, b]$ which cannot be covered by any finite collection of intervals in
$T$, let $m_j = \frac{a_j + b_j}{2}$ be the midpoint of the interval. Either $[a_j, m_j]$ or $[m_j, b_j]$ cannot be
covered by a finite collection of intervals from $T$, so if $[a_j, m_j]$ cannot be covered by a
finite collection of intervals, let $a_{j+1} = a_j$ and $b_{j+1} = m_j$. Otherwise, let $a_{j+1} = m_j$
and $b_{j+1} = b_j$. In either case $[a_{j+1}, b_{j+1}]$ cannot be covered by a finite collection
of intervals from $T$. Notice that this process constructs a sequence of intervals
$[a_0, b_0], [a_1, b_1], [a_2, b_2], \ldots$ contained in $[a, b]$, none of which can be covered by
a finite collection of intervals in $T$. Also note that $a = a_0 \le a_1 \le a_2 \le \cdots$
while $b = b_0 \ge b_1 \ge b_2 \ge \cdots$, and for each $j$, the length of the $j$th interval
is $b_j - a_j = \frac{b - a}{2^j}$. Since each $a_j$ term is less than all of the $b_k$ terms, both of the
monotone sequences are bounded and, therefore, converge. Moreover, since for each
$k$, $\lim_{j\to\infty} b_j - \lim_{j\to\infty} a_j \le b_k - a_k = \frac{b - a}{2^k}$, it follows that $\lim_{j\to\infty} a_j = \lim_{j\to\infty} b_j = r \in [a, b]$.
Note that since the sequence of $a_j$’s increases to $r$, and the sequence of $b_j$’s decreases
to $r$, the limit $r \in [a_j, b_j]$ for each $j$. Because the limit, $r$, is in $[a, b]$, there is an open
interval $(p, q) \in T$ such that $r \in (p, q)$. The distance the limit $r$ is from the boundary
of the interval $(p, q)$ is $\epsilon = \min(r - p, q - r) > 0$. Since $\lim_{j\to\infty} \frac{b - a}{2^j} = 0$, you can
select a $j$ so that $\frac{b - a}{2^j} < \epsilon$. Then it follows that $p \le r - \epsilon < a_j \le r \le b_j < r + \epsilon \le q$,
and, so, $[a_j, b_j] \subset (p, q)$. But this shows that $[a_j, b_j]$ is covered by the single open
interval $(p, q) \in T$ contradicting the fact that $[a_j, b_j]$ could not be covered by a finite
collection of intervals in $T$. Thus, you must conclude that the assumption that $[a, b]$
cannot be covered by a finite number of intervals is false. A formal proof follows
(Fig. 4.5).
PROOF (Heine–Borel Theorem): Let $a < b$ be two real numbers, and let
T be an open cover of $[a, b]$. Then T contains a finite subcover of $[a, b]$.
• Let $a < b$ be two real numbers, and let T be an open cover of $[a, b]$.
• Assume that T contains no finite subcover of $[a, b]$.
• Let $a_0 = a$ and $b_0 = b$ so that the interval $[a_0, b_0] = [a, b]$, and note that no
finite collection of intervals in T will cover $[a_0, b_0]$.
• Define sequences $\langle a_j \rangle$ and $\langle b_j \rangle$ inductively. For $j \ge 0$, let $[a_j, b_j] \subseteq [a, b]$
be an interval which cannot be covered by a finite collection of open
intervals in T, and where $b_j - a_j = \frac{b-a}{2^j}$.
• Let $m_j = \frac{a_j + b_j}{2}$ be the midpoint of $[a_j, b_j]$.
• It must be the case that at least one of the intervals $[a_j, m_j]$ or $[m_j, b_j]$
cannot be covered by a finite number of intervals in T because, if both can
be covered by a finite number of intervals, putting those two collections
together would give a finite collection of intervals that covered the entire
interval $[a_j, b_j]$.
• If $[a_j, m_j]$ cannot be covered by a finite collection of intervals, let $a_{j+1} = a_j$
and $b_{j+1} = m_j$. Otherwise, let $a_{j+1} = m_j$ and $b_{j+1} = b_j$. In either case
$[a_{j+1}, b_{j+1}]$ cannot be covered by a finite collection of intervals from T, and
$b_{j+1} - a_{j+1} = \frac{1}{2} \cdot \frac{b-a}{2^j} = \frac{b-a}{2^{j+1}}$.
• Thus, there are monotone sequences $a = a_0 \le a_1 \le a_2 \le \cdots$ and
$b = b_0 \ge b_1 \ge b_2 \ge \cdots$, and for each j, the length of the $[a_j, b_j]$ interval is
$b_j - a_j = \frac{b-a}{2^j}$.
• Since each $a_j$ term is less than all of the $b_k$ terms, both of the monotone
sequences are bounded and, therefore, converge. The fact that
$\lim_{j \to \infty} a_j \le \lim_{j \to \infty} b_j \le \lim_{j \to \infty} \left(a_j + \frac{b-a}{2^j}\right)$
shows that $\lim_{j \to \infty} a_j = \lim_{j \to \infty} b_j = r \in [a, b]$.
• Because the limit, r, is in $[a, b]$, there is an open interval $(p, q) \in T$ such
that $r \in (p, q)$.
• The distance the limit r is from the boundary of the interval $(p, q)$ is
$\epsilon = \min(r - p, q - r) > 0$. Since $\lim_{j \to \infty} \frac{b-a}{2^j} = 0$, there is a j such that
$\frac{b-a}{2^j} < \epsilon$.
• It follows that $p \le r - \epsilon < a_j \le r \le b_j < r + \epsilon \le q$, and, so, $[a_j, b_j] \subset (p, q)$.
• But then $[a_j, b_j]$ is covered by the single open interval $(p, q) \in T$ contradicting
the fact that $[a_j, b_j]$ could not be covered by a finite collection of
intervals in T.
• Thus, the assumption that $[a, b]$ cannot be covered by a finite number of
intervals is false, and the theorem is proved.
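The halving step of the proof can be traced numerically. The following is only a sketch under a stated assumption: a program cannot test whether an interval "cannot be covered by a finite collection," so the hypothetical function `bisect_toward` uses a stand-in rule and keeps whichever half contains a chosen point r. It then reports the first j for which $[a_j, b_j]$ lies inside a given open interval $(p, q)$ around r. The sample values are illustrative and do not come from the text.

```python
from fractions import Fraction

def bisect_toward(a, b, r, p, q):
    """Halve [a, b] repeatedly, always keeping a half that contains r,
    until the current interval [a_j, b_j] lies inside the open interval (p, q).

    Assumes a <= r <= b and p < r < q. Returns (j, a_j, b_j).
    """
    a_j, b_j = Fraction(a), Fraction(b)
    j = 0
    while not (p < a_j and b_j < q):
        m_j = (a_j + b_j) / 2      # midpoint of the current interval
        if r <= m_j:
            b_j = m_j              # keep the left half [a_j, m_j]
        else:
            a_j = m_j              # keep the right half [m_j, b_j]
        j += 1
    return j, a_j, b_j

# Illustrative values: r = 1/3 inside (p, q) = (3/10, 2/5), starting from [0, 1].
j, a_j, b_j = bisect_toward(0, 1, Fraction(1, 3), Fraction(3, 10), Fraction(2, 5))
print(j, a_j, b_j, b_j - a_j)
```

Exact fractions are used so that the length $b_j - a_j = \frac{b-a}{2^j}$ can be checked without rounding error.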
The fact that the interval $[a, b]$ in the Heine–Borel Theorem is both closed and
bounded is crucial. The interval $[1, \infty)$ is covered by the collection of open intervals
$(j, j + 2)$ for $j = 0, 1, 2, 3, \ldots$, but no finite collection of these open intervals
can cover $[1, \infty)$. The interval $(0, 5)$ is covered by the collection $\left(\frac{1}{j}, 5\right)$ for
$j = 1, 2, 3, 4, \ldots$, but, again, no finite collection of these open intervals can cover $(0, 5)$.
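The second example can be checked mechanically. The sketch below assumes a finite subfamily is described by its list of indices j, and the hypothetical helper `uncovered_point` returns a point of $(0, 5)$ that the subfamily misses. The function names are illustrative only.

```python
from fractions import Fraction

def covers(j, x):
    """True when x lies in the open interval (1/j, 5)."""
    return Fraction(1, j) < x < 5

def uncovered_point(indices):
    """Given finitely many positive integers j, return a point of (0, 5)
    that lies in none of the open intervals (1/j, 5)."""
    largest = max(indices)
    # 1/largest <= 1/j for every chosen j, so no chosen interval contains it.
    return Fraction(1, largest)

chosen = [1, 2, 3, 10, 250]
x = uncovered_point(chosen)
print(x, any(covers(j, x) for j in chosen))
```

The missed point is picked up by the interval with the next index, which is why the full infinite collection still covers $(0, 5)$.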
With the Heine–Borel Theorem, it can now be shown that every continuous function
on a closed bounded interval is uniformly continuous on that interval. The idea is
simple enough: if f is continuous on the closed bounded interval $[a, b]$, then, given
$\epsilon > 0$, at each point $x \in [a, b]$ there is a $\delta > 0$ such that for any $y \in (x - \delta, x + \delta)$, it
follows that $|f(x) - f(y)| < \epsilon$. Thus, there is an open interval around each $x \in [a, b]$
that has the desired property, and the Heine–Borel Theorem shows that $[a, b]$ can be
covered by just a finite number of these open intervals. Since each of these finitely
many open intervals is associated with a positive $\delta$, you can select the smallest $\delta$ to
serve as the $\delta > 0$ needed in your proof of uniform continuity.
There are, though, a couple of subtleties that get in the way of this simple
argument. First of all, for any y in one of the open intervals $(x - \delta, x + \delta)$ you
can conclude that $|f(y) - f(x)| < \epsilon$, but the proof will require that $|f(y) - f(z)| < \epsilon$
for any y and z that are within the chosen $\delta$ of each other, not just for $z = x$, the
middle point of the interval. One can get around this problem by arranging that
$|f(y) - f(x)| < \frac{\epsilon}{2}$ for all $y \in (x - \delta, x + \delta)$. This is a common trick in analysis
proofs. The definition of continuity allows you to find a $\delta > 0$ that works for
any given $\epsilon > 0$, so why not for $\frac{\epsilon}{2}$ which is also greater than 0? Then for any
y and z in $(x - \delta, x + \delta)$, you can use the triangle inequality to conclude that
$|f(y) - f(z)| = \left|\bigl(f(y) - f(x)\bigr) - \bigl(f(z) - f(x)\bigr)\right| \le |f(y) - f(x)| + |f(z) - f(x)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$.
There is a second problem with this strategy. If you select y and z within $\delta$ of
each other, how do you know that they both lie within the same interval $(x - \delta, x + \delta)$?
The interval $[a, b]$ is covered by a finite number of such intervals, but just because
the two numbers y and z are close to each other does not mean that they will both
fall within the same interval in your finite collection of open intervals. There are a
couple of ways to get around this problem. One method is to consider the endpoints
of the intervals in your finite collection of open intervals. Since the number of open
intervals is finite, there are only finitely many endpoints to these intervals. You could
select the $\delta$ in the proof not to be the least of the $\delta$'s used for any of the intervals
but to be the least distance between any two distinct elements of the collection of
endpoints of these intervals. That ensures that if y and z are closer together than $\delta$,
there can be at most one endpoint between y and z. That will guarantee that y and
z will both be within one of the finitely many open intervals. This follows from the
fact that intervals in an open cover must overlap, so that each endpoint of one of
the open intervals must be a member of one of the other open intervals in the open
cover as seen in the following diagram (Fig. 4.6).
Fig. 4.6 y and z straddle one endpoint but remain in an interval of the open cover
A cleaner way to ensure that any y and z within $\delta$ of each other are in one of
the finite number of intervals in the open cover of $[a, b]$ is to be more clever about
choosing the original open intervals. Suppose that for all $y \in (x - \delta, x + \delta)$, it
follows that $|f(y) - f(x)| < \epsilon$. You can be very conservative and use the open
interval $\left(x - \frac{\delta}{2}, x + \frac{\delta}{2}\right)$ as the interval chosen to cover x in the open cover of $[a, b]$.
Then if y and z are closer together than $\frac{\delta}{2}$, and $y \in \left(x - \frac{\delta}{2}, x + \frac{\delta}{2}\right)$ for some x,
both y and z will be in $(x - \delta, x + \delta)$, and the result will follow. The following proof
uses the first strategy.
Note that the fact that $[a, b]$ is both closed and bounded is crucial. The function
$f(x) = x^2$ is continuous on the unbounded interval $[0, \infty)$, but f is not uniformly
continuous on this interval. Similarly, the function $f(x) = \frac{1}{x}$ is continuous on the
open interval $(0, 1)$, but f is not uniformly continuous on this interval.
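The half-width strategy described above can be tried on a concrete function. The sketch below is an illustration under assumptions, not the proof itself: it takes $f(x) = x^2$ on a closed bounded interval, uses a hand-derived pointwise $\delta$ (the helper `pointwise_delta` is hypothetical), builds a finite cover by half-width intervals marching left to right, and returns the smallest half-width as a single $\delta$ that works everywhere.

```python
import random

def f(x):
    return x * x

def pointwise_delta(x, tol):
    """A delta with |f(y) - f(x)| < tol whenever |y - x| < delta, for f(x) = x^2.
    Uses |y^2 - x^2| = |y - x| |y + x| < delta (2|x| + 1) when delta <= 1."""
    return min(1.0, tol / (2 * abs(x) + 1))

def uniform_delta(a, b, eps):
    """Cover [a, b] by intervals (x - d/2, x + d/2) with d = pointwise_delta(x, eps/2),
    and return the smallest half-width used in the cover."""
    x, half_widths = a, []
    while True:
        h = pointwise_delta(x, eps / 2) / 2
        half_widths.append(h)
        if x + h > b:
            break                  # the last interval reaches past b
        x = x + h                  # first point not yet covered is the next center
    return min(half_widths)

eps = 0.1
d = uniform_delta(0.0, 2.0, eps)

# Spot check: pairs y, z in [0, 2] at distance at most d.
random.seed(0)
worst = 0.0
for _ in range(10000):
    y = random.uniform(0.0, 2.0)
    z = min(2.0, max(0.0, y + random.uniform(-d, d)))
    worst = max(worst, abs(f(y) - f(z)))
print(d, worst)
```

On this particular example the pointwise $\delta$ happens to be bounded away from 0 by an explicit formula, so the loop clearly stops. In general, it is the Heine–Borel Theorem that guarantees a finite cover exists.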
4.4.4 Exercises
4.5 The Arithmetic of Continuous Functions
Chapter 3 discusses several theorems about how one can calculate limits when faced
with the addition, subtraction, multiplication, or division of functions whose limits
are known. As one might expect, since continuity and limits are closely related, the
proofs of the corresponding theorems about functions continuous at a point are, in
fact, very similar. Before starting, it is worth pointing out that if f and g are two
functions, then you can define the new functions $f + g$, $f - g$, $fg$, and $\frac{f}{g}$ at all
points in the intersection of the domain of f and the domain of g and, in the case
of $\frac{f}{g}$, only where g is not 0. Generally, one is interested in functions that have a
common domain, but sometimes this is not the case. Pathological examples do exist.
It could be, for example, that f is only defined for positive real numbers, and g is
only defined for negative real numbers as with $f(x) = \frac{1}{\sqrt{x}}$ and $g(x) = \frac{1}{\sqrt{-x}}$. Then
$f + g$ has an empty domain and is the empty function, one that contains no ordered
pairs, $(x, f(x))$. Oddly, the definition of continuity says that the empty function is
continuous because it satisfies the definition at each point of its empty domain.
Suppose that functions f and g have a common domain where the point a is an
accumulation point of that domain. Also suppose that $\lim_{x \to a} f(x) = L$ and
$\lim_{x \to a} g(x) = H$. Recall that when proving that the limit of $f + g$ is $L + H$, you are given an $\epsilon > 0$
and can use the definition of limit to conclude that there are $\delta_1 > 0$ and $\delta_2 > 0$
such that if x is in the common domain of f and g with $0 < |x - a| < \delta_1$, then
$|f(x) - L| < \frac{\epsilon}{2}$, and if $0 < |x - a| < \delta_2$, then $|g(x) - H| < \frac{\epsilon}{2}$. Then the triangle
inequality allows you to conclude that for all x with $0 < |x - a| < \min(\delta_1, \delta_2)$ that
$\left|\bigl(f(x) + g(x)\bigr) - (L + H)\right| = \left|\bigl(f(x) - L\bigr) + \bigl(g(x) - H\bigr)\right| \le |f(x) - L| + |g(x) - H| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$.
The same method works for the proof about continuity of $f + g$ at a
with minor changes made to match the template for writing proofs about continuity
of a function at a point. Of course, the same logic works for proving the continuity
of $f - g$, so the two results might as well be combined as follows.
PROOF: Suppose that f and g are functions with common domain
containing the point a. If both f and g are continuous at the point a, then
so are the functions $f + g$ and $f - g$.
• Let f and g be functions both defined on a set A containing the point a, and
assume that f and g are both continuous at a.
• Let $\epsilon > 0$ be given.
• By the definition of continuity, there is a $\delta_1 > 0$ such that if $x \in A$ and
$|x - a| < \delta_1$, then $|f(x) - f(a)| < \frac{\epsilon}{2}$.
• Similarly, there is a $\delta_2 > 0$ such that if $x \in A$ and $|x - a| < \delta_2$, then
$|g(x) - g(a)| < \frac{\epsilon}{2}$.
• Let $\delta = \min(\delta_1, \delta_2)$.
• Then if $x \in A$ with $|x - a| < \delta$,
• $\left|\bigl(f(x) \pm g(x)\bigr) - \bigl(f(a) \pm g(a)\bigr)\right| = \left|\bigl(f(x) - f(a)\bigr) \pm \bigl(g(x) - g(a)\bigr)\right|
\le |f(x) - f(a)| + |g(x) - g(a)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$.
• This shows that $f + g$ and $f - g$ are continuous at a.
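The choice $\delta = \min(\delta_1, \delta_2)$ can be tried on a concrete pair of functions. The sketch below assumes $f(x) = 3x$ and $g(x) = x^2$ at the point $a = 1$, with hand-derived $\delta$ rules for each. These functions and the names `delta_f`, `delta_g`, `delta_sum`, and `worst_error` are illustrative choices, not part of the text.

```python
def delta_f(eps):
    """f(x) = 3x at a = 1: |3x - 3| < eps whenever |x - 1| < eps / 3."""
    return eps / 3

def delta_g(eps):
    """g(x) = x^2 at a = 1: if |x - 1| < min(1, eps / 3) then |x + 1| < 3,
    so |x^2 - 1| = |x - 1| |x + 1| < eps."""
    return min(1.0, eps / 3)

def delta_sum(eps):
    """The delta from the proof: each function is given a tolerance of eps / 2."""
    return min(delta_f(eps / 2), delta_g(eps / 2))

def worst_error(eps, samples=1000):
    """Largest sampled value of |(f + g)(x) - (f + g)(1)| over |x - 1| < delta_sum(eps)."""
    d = delta_sum(eps)
    xs = [1 - d + 2 * d * (k + 0.5) / samples for k in range(samples)]
    return max(abs((3 * x + x * x) - 4) for x in xs)

for eps in (1.0, 0.1, 0.001):
    print(eps, delta_sum(eps), worst_error(eps))
```

Sampling cannot replace the proof, but it shows the sampled error staying below each $\epsilon$ tried.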
Now suppose that f and g are functions as discussed above with $\lim_{x \to a} f(x) = L$
and $\lim_{x \to a} g(x) = H$. Recall how you prove that $\lim_{x \to a} f(x)g(x) = LH$. Again, as with
the proof for the sum of the limits, given $\epsilon > 0$ you find $\delta > 0$ so that both $|f(x) - L|$
and $|g(x) - H|$ are small when $0 < |x - a| < \delta$. How small do these need to
be? The idea was to write $|f(x)g(x) - LH|$ as
$|f(x)g(x) - f(x)H + f(x)H - LH| \le |f(x)|\,|g(x) - H| + |H|\,|f(x) - L|$.
Thus, $\delta_1 > 0$ can be chosen to ensure that
$|f(x) - L|$ is less than 1, $\delta_2 > 0$ so that $|f(x) - L|$ is less than $\frac{\epsilon}{2(|H| + 1)}$, and $\delta_3$ so that
$|g(x) - H|$ is less than $\frac{\epsilon}{2(|L| + 1)}$. Then $\delta$ can be set to the least of $\delta_1$, $\delta_2$, and $\delta_3$. The
proof for continuity of $fg$ at the point a follows this same strategy.
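To see why these three bounds are enough, the estimate can be carried through to the end. The following is a sketch of that computation, assuming $0 < |x - a| < \delta$ with $\delta$ chosen as above. It uses the fact that $|f(x) - L| < 1$ gives $|f(x)| \le |f(x) - L| + |L| < |L| + 1$.

```latex
\begin{align*}
|f(x)g(x) - LH|
  &\le |f(x)|\,|g(x) - H| + |H|\,|f(x) - L| \\
  &< (|L| + 1)\cdot\frac{\epsilon}{2(|L| + 1)} + |H|\cdot\frac{\epsilon}{2(|H| + 1)} \\
  &< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon .
\end{align*}
```

The "+1" in each denominator avoids dividing by 0 when L or H is 0.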
Finally, suppose that f and g are functions as discussed above with $\lim_{x \to a} f(x) = L$
and $\lim_{x \to a} g(x) = H$ and $H \ne 0$. This time recall how you prove that
$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{L}{H}$.
The idea is the same as with the proof for products, but the algebra took
a few more steps.
$\left|\frac{f(x)}{g(x)} - \frac{L}{H}\right| = \left|\frac{f(x)H - Lg(x)}{g(x)H}\right| = \left|\frac{f(x)H - LH + LH - Lg(x)}{g(x)H}\right| = \left|\frac{\bigl(f(x) - L\bigr)H + L\bigl(H - g(x)\bigr)}{g(x)H}\right| \le \frac{|f(x) - L|}{|g(x)|} + \frac{|L|\,|g(x) - H|}{|g(x)|\,|H|}$.
Then, given an $\epsilon > 0$, you can
• Let f and g be functions both defined on a set A containing the point a, and
assume that f and g are both continuous at a with $g(a) \ne 0$.
• Let $\epsilon > 0$ be given.
• Note that $|g(a)| > 0$. By the definition of continuity, there is a $\delta_1 > 0$
such that if $x \in A$ and $|x - a| < \delta_1$, then $|g(x) - g(a)| < \frac{|g(a)|}{2}$. For these x it
follows that $|g(x)| + \frac{|g(a)|}{2} > |g(x)| + |g(x) - g(a)| = |g(x)| + |g(a) - g(x)| \ge
\left|g(x) + g(a) - g(x)\right| = |g(a)|$ which implies that $|g(x)| > |g(a)| - \frac{|g(a)|}{2} = \frac{|g(a)|}{2}$.
• By the definition of continuity, there is a $\delta_2 > 0$ such that if $x \in A$ and
$|x - a| < \delta_2$, then $|f(x) - f(a)| < \frac{\epsilon |g(a)|}{4}$.
• By the definition of continuity, there is a $\delta_3 > 0$ such that if $x \in A$ and
$|x - a| < \delta_3$, then $|g(x) - g(a)| < \frac{\epsilon |g(a)|^2}{4(|f(a)| + 1)}$.
• Let $\delta = \min(\delta_1, \delta_2, \delta_3)$.
• Then if $x \in A$ with $0 < |x - a| < \delta$,
• $\left|\frac{f(x)}{g(x)} - \frac{f(a)}{g(a)}\right| = \left|\frac{f(x)g(a) - f(a)g(x)}{g(x)g(a)}\right| = \left|\frac{f(x)g(a) - f(a)g(a) + f(a)g(a) - f(a)g(x)}{g(x)g(a)}\right| =
\left|\frac{\bigl(f(x) - f(a)\bigr)g(a) + f(a)\bigl(g(a) - g(x)\bigr)}{g(x)g(a)}\right| \le \frac{|f(x) - f(a)|}{|g(x)|} + \left|\frac{f(a)\bigl(g(x) - g(a)\bigr)}{g(x)g(a)}\right| <
\frac{\epsilon |g(a)|}{4} \cdot \frac{2}{|g(a)|} + \frac{\epsilon |g(a)|^2}{4(|f(a)| + 1)} \cdot \frac{2|f(a)|}{|g(a)|^2} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$.
• This shows that $\frac{f}{g}$ is continuous at a.
4.5.1 Exercises
1. Suppose that f and g are functions that are both uniformly continuous on a set
A. Find an example showing that their product need not be uniformly continuous
on A.
Write proofs for each of the following statements.
2. The function $f(x) = x^{5/2}$ is continuous for $x \ge 0$.
3. All polynomials are continuous on $\mathbb{R}$.
4. All rational functions are continuous on $\mathbb{R}$ except at points where their
denominators are 0.
5. If f and g are uniformly continuous on the set A, then $f + g$ and $f - g$ are also
uniformly continuous on A.
4.6 Composition, Absolute Value, Maximum, and Minimum
As an example of how useful this theorem is consider the function $|x|$. One can
prove that this function is continuous fairly easily by following the template for
proofs that a function f is continuous at a point a. Indeed, such a proof must end
with $\bigl||x| - |a|\bigr| < \epsilon$, but by considering all the possible cases for x and a being
negative or nonnegative, it can be seen that $\bigl||x| - |a|\bigr| \le |x - a|$, so if $|x - a|$ is
made less than $\epsilon$, then $\bigl||x| - |a|\bigr| < \epsilon$. This, in fact, shows that $|x|$ is uniformly
continuous.
As easy as this proof is, the continuity of jxj can more easily be proved as
follows.
PROOF: The function $|x|$ is continuous.
• Let $g(x) = x^2$, and $f(x) = \sqrt{x}$.
• Let a be any real number.
• Then g is continuous at a, and, since $g(a) = a^2 \ge 0$, f is continuous at $g(a)$.
• Because the function $|x| = \sqrt{x^2} = (f \circ g)(x)$, it follows that $|x|$ is continuous
at a.
In turn, this result can be used to show that if f and g are functions with domain A,
and f and g are both continuous at $a \in A$, then the functions $\min(f, g)$ and $\max(f, g)$
are both continuous at a. This is because the functions $\min(f, g)$ and $\max(f, g)$ can
be expressed in terms of absolute value.
PROOF: If f and g are functions with the same domain A, and both functions
are continuous at $a \in A$, then the function $\min(f, g)$ is continuous
at a.
• Let f and g be functions with the same domain A, and assume that both
functions are continuous at $a \in A$.
• Note that for any two real numbers y and z, if $y > z$, then $y + z - |y - z| =
y + z - (y - z) = 2z$, but if $y \le z$, then $y + z - |y - z| = y + z - (z - y) = 2y$.
In either case, $y + z - |y - z| = 2\min(y, z)$.
• Thus, for any x, $\min\bigl(f(x), g(x)\bigr) = \frac{f(x) + g(x) - |f(x) - g(x)|}{2}$.
• Since f and g are continuous at a, so is $f - g$.
• Since $f - g$ is continuous at a, so is $|f - g|$.
• It then follows that the combination $\min(f, g) = \frac{f + g - |f - g|}{2}$ is continuous
at a.
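The identity at the heart of this proof is easy to spot-check. The sketch below tests it on a small grid of integers. The companion formula for the maximum, with the sign of the absolute value flipped, is included as an assumption not spelled out in the text above, and the function names are illustrative.

```python
def min_via_abs(y, z):
    """min(y, z) written with an absolute value, as in the proof."""
    return (y + z - abs(y - z)) / 2

def max_via_abs(y, z):
    """Companion identity for the maximum (assumed analogue: flip the sign)."""
    return (y + z + abs(y - z)) / 2

pairs = [(y, z) for y in range(-5, 6) for z in range(-5, 6)]
print(all(min_via_abs(y, z) == min(y, z) for y, z in pairs))
print(all(max_via_abs(y, z) == max(y, z) for y, z in pairs))
```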
4.6.1 Exercises
4.7 Other Continuity Theorems
quantifier, and the second quantifier is a universal quantifier. Thus, the statement
“there is an M such that for all $x \in [a, b]$, $|f(x)| \le M$” has an existential quantifier
stating that there exists a number M satisfying a property. This is followed by a
universal quantifier stating that all x in the interval $[a, b]$ satisfy a property. Finally,
the property is given as “$|f(x)| \le M$.”
The rule of thumb for constructing the negation of statements with quantifiers
is to replace each existential quantifier with a corresponding universal quantifier,
replace each universal quantifier with a corresponding existential quantifier, and
replace the property with the negation of that property. In this example, the
existential quantifier “there exists a number M” would be replaced by the universal
quantifier “for all numbers M.” Then the universal quantifier “for all $x \in [a, b]$”
would be replaced by the existential quantifier “there exists an $x \in [a, b]$.” Finally,
the property “$|f(x)| \le M$” would be replaced by its negation “$|f(x)| > M$.” The
resulting negation is “for all numbers M there is an $x \in [a, b]$ such that $|f(x)| > M$.”
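The rule of thumb can be mirrored in code, where `any` plays the role of "there exists" and `all` plays the role of "for all." The sketch below is only a finite stand-in: it assumes a finite list of function values and a finite list of candidate bounds M, whereas the statement in the text ranges over all real M and all x in $[a, b]$. The function names are hypothetical.

```python
def is_bounded(values, candidates):
    """'There is an M such that for all x, |f(x)| <= M', over finite lists."""
    return any(all(abs(v) <= M for v in values) for M in candidates)

def negation(values, candidates):
    """'For all M there is an x such that |f(x)| > M', over the same lists."""
    return all(any(abs(v) > M for v in values) for M in candidates)

values = [-3, 1, 7]
print(is_bounded(values, range(10)), negation(values, range(10)))
print(is_bounded(values, range(5)), negation(values, range(5)))
```

In each case exactly one of the two statements holds, which is what it means for one to be the negation of the other.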
Your proof of the boundedness of f would begin by introducing f and the interval
$[a, b]$. Then it would assume the negation just discussed. The remainder of the proof
would be to derive a contradiction, and that would show that the assumption made
at the outset of the proof is false, so its negation, the statement you were trying to
prove, must be true.
Thus, the proof would begin with a statement about f being a continuous function
on the closed bounded interval $[a, b]$ which would be followed by the negation
of the statement you want to prove. So how do you use this negation to reach a
contradiction? Well, just see where this assumption leads you. If for each M you
can find an $x \in [a, b]$ where $|f(x)| > M$, it means that there is an $x_1$ such that
$|f(x_1)| > 1$. Similarly, there is an $x_2$ such that $|f(x_2)| > 2$. In this way, you can
assert that there is a sequence $x_1, x_2, x_3, \ldots$ such that for each $n \ge 1$, $|f(x_n)| > n$.
Note that this gives you an infinite sequence of values in the closed bounded interval
$[a, b]$. The Bolzano–Weierstrass Theorem states that every infinite bounded set has
an accumulation point. Does the sequence $x_1, x_2, x_3, \ldots$ produce such an infinite
bounded set? Well, it is certainly bounded because each $x_n$ is in the interval $[a, b]$.
Is it possible that the sequence does not give an infinite collection of points? For
that to happen, it would have to be the case that infinitely many of the values in
the sequence were equal to each other. Actually, just because you choose $x_1$ so that
$|f(x_1)| > 1$ does not preclude having $|f(x_1)| > 100$, so the value $x_1$ could appear
in the sequence many times. This is awkward. It would be easier if you chose a
sequence of distinct values. This is actually not hard to do. Rather than choosing $x_n$
so that $|f(x_n)| > n$, why not choose $x_1$ as above, and for each $n \ge 1$ choose $x_{n+1}$ so
that $|f(x_{n+1})| > |f(x_n)| + 1$? This would not only imply that for each n, $|f(x_n)| > n$
but also that $x_n$ could not equal any of the values that appear earlier in the sequence.
So what can you do with the infinite sequence of $x_n$ values with its guaranteed
accumulation point, y? First note that the accumulation point y is also in $[a, b]$
because all of the $x_n$ values satisfy both $a \le x_n$ and $x_n \le b$, so the accumulation
point y must also satisfy $a \le y \le b$. Otherwise, there would be an interval around y
that did not share any points with $[a, b]$, so it would not contain any of the $x_n$ values.
This means that f is defined and continuous at y. That implies that there is a $\delta > 0$
such that for all $x \in [a, b]$ satisfying $|x - y| < \delta$, it follows that $|f(x) - f(y)| < 1$.
But that means $|f(x)| < |f(y)| + 1$. But y is an accumulation point for the sequence
of $x_n$ values, so there are infinitely many of the $x_n$ within $\delta$ of y, and some of them
will necessarily have the property that $|f(x_n)| > |f(y)| + 1$. This gives the needed
contradiction (Fig. 4.7).
Using the fact that continuous functions on closed bounded intervals are bounded,
there is a nice trick to show that a function f continuous on the closed bounded
interval $[a, b]$ must achieve its extreme values, that is, its minimum and maximum.
The fact that the set of values that f takes on is a bounded set implies that the set
of values has a least upper bound, M. If f is never equal to M, then the function
$M - f(x)$ is positive for all $x \in [a, b]$ because M is an upper bound, and $f(x)$ is never
equal to M. This implies that the function $\frac{1}{M - f(x)}$ is also continuous on the interval
$[a, b]$. But then you can again apply the previous theorem to show that there is a
number K such that for all $x \in [a, b]$, $\frac{1}{M - f(x)} \le K$. Taking reciprocals one more time
shows $M - f(x) \ge \frac{1}{K}$ which implies that $f(x) \le M - \frac{1}{K}$. This shows that $M - \frac{1}{K} < M$
is an upper bound for f on $[a, b]$ when M was assumed to be the least upper bound.
This is a contradiction, and you must conclude that $f(x) = M$ for at least one value
of $x \in [a, b]$. The formal proof can be written as follows (Fig. 4.8).
Suppose the function f is defined on an interval containing c and d, and the graph of
f passes through the points $(c, f(c))$ and $(d, f(d))$. It might be that the graph of the
function passes through every value of y between $f(c)$ and $f(d)$ as it moves between
the points $(c, f(c))$ and $(d, f(d))$ as shown in the figure (Fig. 4.9). For example, the
function $f(x) = 2x^2 - 3$ is defined for all real numbers with $f(1) = -1$ and $f(2) = 5$.
For each y between $-1$ and 5, the value $x = \sqrt{\frac{y+3}{2}}$ lies between 1 and 2 and $f(x) = y$.
Formally, a function defined on an interval $[a, b]$ is said to have the intermediate
value property on that interval if for each choice of c and d with $a \le c \le d \le b$
and each y between $f(c)$ and $f(d)$, there is an $x \in [c, d]$ such that $f(x) = y$.
The Intermediate Value Theorem states that any function continuous on an interval
has the intermediate value property there. If you consider the intuitive notion of
continuity where you say that f is continuous on $[a, b]$ if you can draw the graph of
f without lifting your pencil from the paper, then this intermediate value property
becomes clear because in going from $f(c)$ to $f(d)$, your pencil will necessarily cross
over all the y values between $f(c)$ and $f(d)$.
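The worked example with $f(x) = 2x^2 - 3$ can be checked directly. The short sketch below evaluates the stated formula $x = \sqrt{\frac{y+3}{2}}$ for a few values of y between $-1$ and 5 and confirms that x lands in $[1, 2]$ with $f(x) = y$ up to rounding. The helper name `preimage` is illustrative.

```python
import math

def f(x):
    return 2 * x * x - 3

def preimage(y):
    """The x in [1, 2] with f(x) = y, valid for -1 <= y <= 5."""
    return math.sqrt((y + 3) / 2)

for y in (-1, 0, 2.5, 5):
    x = preimage(y)
    print(y, x, f(x))
```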
To prove the Intermediate Value Theorem you would begin by setting the context
by introducing a function f continuous on an interval $[a, b]$ and points c and d with
$a \le c \le d \le b$. Then you would select an arbitrary y between $f(c)$ and $f(d)$.
The proof would have to demonstrate the existence of an x between c and d with
$f(x) = y$. How is this to be done? As with many other proofs in Analysis, one shows
the existence of a real number by constructing a set for which that number is a least
upper bound. Consider, for example, the case where $f(c) < y < f(d)$. You could
construct the set $S = \{x \in [c, d] \mid f(x) \le y\}$. This set is not an empty set because
$c \in S$, and S is certainly bounded above by d. Thus, the Completeness Axiom says
that the set has a least upper bound, s. Now you can refer to the continuity of f
which will show that if $f(s) < y$, then there is a $\delta > 0$ such that $|x - s| < \delta$
implies that $f(x) < y$ showing that there are values greater than s for which $f(x) < y$
contradicting the fact that s is an upper bound of S. If $f(s) > y$, then there is a $\delta > 0$
such that $|x - s| < \delta$ implies that $f(x) > y$ showing that $s - \delta < s$ is an upper bound
for S contradicting the fact that s is the least upper bound of S. The only remaining
conclusion is that $f(s) = y$ which provides the example, $x = s$, needed to
prove the theorem.
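The proof above finds x as a least upper bound, which is not something a program can compute directly. A different but related technique, bisection, approximates a solution of $f(x) = y$ numerically. The sketch below is that swapped-in technique, not the argument from the text: it assumes f is continuous on $[c, d]$ with $f(c) < y < f(d)$, and the name `bisect_value` and the tolerance are illustrative.

```python
def bisect_value(f, c, d, y, tol=1e-12):
    """Approximate an x in [c, d] with f(x) = y.

    Assumes f is continuous on [c, d] and f(c) < y < f(d).
    The loop keeps f(lo) <= y < f(hi) while halving [lo, hi].
    """
    lo, hi = c, d
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) <= y:
            lo = mid       # mid is in the set S = {x : f(x) <= y}
        else:
            hi = mid       # mid is not in S
    return lo

# f(x) = 2x^2 - 3 on [1, 2] with y = 1, so the exact answer is sqrt(2).
x = bisect_value(lambda t: 2 * t * t - 3, 1.0, 2.0, 1.0)
print(x)
```

Bisection returns some solution, which need not be the least upper bound of S when f takes the value y more than once.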
Note that the above argument did not cover the general case where $f(c)$ and $f(d)$
can be in any order. The argument so far only covers the specific case where
$f(c) < f(d)$. So is there more proof to write? It is easy to see that the case $f(c) > f(d)$ can be
proved with an argument virtually identical to the one given above by changing the
sense of some of the inequalities. The case of $f(c) = f(d)$ is even easier because the
only possible y between $f(c)$ and $f(d)$ is $f(c)$, so the value $x = c$ gives the needed
$f(x) = y$. Thus, giving the argument for $f(c) < f(d)$ essentially covers all the
needed cases, and it would be very easy for the reader to add the needed arguments
to complete the proof for the missing cases. In this situation it is common for the
proof to cover only the specific condition $f(c) < f(d)$ and introduce it with the
phrase without loss of generality. In this case the phrase means that although the
following assumption looks like it only covers some of the necessary cases, in order
to make the argument completely general, the omitted cases are either very easy or
virtually identical to the case being considered. With this in mind, the following is
a proof of the Intermediate Value Theorem.
In the above proof the steps which begin “If $f(x) < y$” and “If $f(x) > y$” are written
in exactly the same style using almost identical words. If you were writing a short
story, you would avoid writing in this style because it might sound monotonous to
the reader. In creative writing, you would want to be more creative, and you would
reach for your thesaurus to find alternate words to enhance your writing. But in a
mathematical proof, using such parallel construction of sentences actually makes
the proof easier to read. A reader only needs to parse the first of the two steps in
order to have a good idea of what is going to be done in the second of the two steps.
This gives the reader a head start on processing the second step. What is passed off
as boring in creative writing can be applauded in the writing of proofs because of
the way it simplifies the understanding. In fact, one often begins the second of two
such steps with the word similarly to indicate that the argument to follow looks a lot
like the one just completed, again alerting the reader to the parallel construction.
The Intermediate Value Theorem says that functions continuous on an interval
have the intermediate value property there. But a function need not be continuous
for it to have the intermediate value property. Clearly, if a function has a jump
discontinuity at a point a, that is, if $\lim_{x \to a^-} f(x)$ and $\lim_{x \to a^+} f(x)$ both exist but are
different as shown in Fig. 4.10, then there could well be values of y that the function
misses as it passes from $(c, f(c))$ to $(d, f(d))$.
Fig. 4.11 Graph of $\sin\frac{1}{x}$
For a discontinuous function to have the intermediate value property, the function
must necessarily oscillate wildly (Fig. 4.11). A typical example is the function
$f(x) = \begin{cases} \sin\frac{1}{x} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$.
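The wild oscillation can be made concrete: every value y in $[-1, 1]$ is taken by this function at points arbitrarily close to 0. The sketch below uses the explicit points $x = \frac{1}{\arcsin y + 2\pi k}$ for positive integers k. The helper name `point_with_value` is illustrative, and the comparison is only up to floating-point rounding.

```python
import math

def f(x):
    return math.sin(1 / x) if x > 0 else 0.0

def point_with_value(y, k):
    """For -1 <= y <= 1 and an integer k >= 1, a positive x with sin(1/x) = y.
    Larger k gives a point closer to 0."""
    return 1 / (math.asin(y) + 2 * math.pi * k)

for k in (1, 10, 1000):
    x = point_with_value(0.3, k)
    print(k, x, f(x))
```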
4.7.4 Exercises
Write proofs for each of the following statements. Each statement can be proved
using one or more of the theorems in this section.
1. Let $A \subseteq \mathbb{R}$ be a bounded set, and let f be a function defined on A. If f is
unbounded on A, then for every $\epsilon > 0$, there exist a and b in $\mathbb{R}$ with $b - a < \epsilon$
such that f is unbounded on $A \cap (a, b)$.
2. If $a < b$ and f is a continuous function on $[a, b]$ with $f(a) = f(b)$, then there is a
$c \in (a, b)$ such that f obtains an extreme value (either a minimum or maximum)
at c.
3. Suppose that f is a continuous function defined on $\mathbb{R}$ such that
$\lim_{x \to \infty} f(x) = \lim_{x \to -\infty} f(x) = \infty$. Then f obtains its minimum value for some $x \in \mathbb{R}$.
4. If p is an odd degree polynomial with real coefficients, then p has at least one
real root.
5. Suppose that a plane contains a polygon G and a line L. Then there is a line
$L'$ in the plane parallel to L such that exactly half the area of G lies on each side
of $L'$.
6. There is a value of x between 0 and 1 such that $x^2$ equals $\sqrt{\frac{1}{1 + x^2}}$.
4.8 Discontinuity
In Calculus students learn about a great many continuous functions. These include
the elementary functions: polynomials, rational functions, algebraic functions,
exponential functions, logarithmic functions, and circular and hyperbolic trigonometric
functions and their inverses. How badly can a function be discontinuous? A
function can be discontinuous at a single point such as the signum or sign function
$\operatorname{sgn}(x) = \begin{cases} -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases}$
or at a sequence of points such as the floor or greatest
integer function $\lfloor x \rfloor = n$ if n is the integer satisfying $n \le x < n + 1$ (Fig. 4.12).
A function can be discontinuous at a sequence of points that converge such as with
$f(x) = \begin{cases} \frac{1}{n} & \text{if } \frac{1}{n+1} < x \le \frac{1}{n}, \text{ for positive integer } n \\ 0 & \text{otherwise} \end{cases}$.
This function is discontinuous
at each $x = \frac{1}{n}$ for positive integers n, but it is continuous everywhere else
including at $x = 0$ (Fig. 4.13). A function can be discontinuous at every x such as
with $f(x) = \begin{cases} 0 & \text{if } x \text{ is rational} \\ 1 & \text{if } x \text{ is irrational} \end{cases}$.
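The step function with jumps at each $\frac{1}{n}$ is simple to implement, which makes its jumps easy to inspect. The sketch below assumes rational inputs and uses exact fractions. The name `step` is illustrative, and the key observation is that for $0 < x \le 1$ the required n is $\lfloor 1/x \rfloor$.

```python
from fractions import Fraction
import math

def step(x):
    """f(x) = 1/n when 1/(n+1) < x <= 1/n for a positive integer n, else 0."""
    x = Fraction(x)
    if x <= 0 or x > 1:
        return Fraction(0)
    n = math.floor(1 / x)      # the unique integer n with n <= 1/x < n + 1
    return Fraction(1, n)

half = Fraction(1, 2)
tiny = Fraction(1, 1000)
# Values at x = 1/2 and just to either side of it show the jump there.
print(step(half - tiny), step(half), step(half + tiny))
```

Just to the right of $\frac{1}{2}$ the value is 1, while at $\frac{1}{2}$ and just to its left the value is $\frac{1}{2}$, so the function has a jump at that point.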
But one of the most surprising examples is the following often called Thomae's
function but also known as the popcorn function, the raindrop function,
or the modified Dirichlet function. It is defined on the interval $(0, 1)$ by
$f(x) = \begin{cases} \frac{1}{n} & \text{if } x \text{ is rational written in lowest terms as } \frac{m}{n} \\ 0 & \text{if } x \text{ is irrational} \end{cases}$.
Its graph is shown in
Fig. 4.14. It is not hard to see that this function is discontinuous at each rational
number $\frac{m}{n} \in (0, 1)$. Indeed if $\frac{m}{n}$ is in lowest terms, then $f\left(\frac{m}{n}\right) = \frac{1}{n}$. If $\epsilon$ is set
at $\frac{1}{2n}$, then for every $\delta > 0$ there will be irrational numbers $x \in (0, 1)$ satisfying
$\left|x - \frac{m}{n}\right| < \delta$ for which $\left|f(x) - f\left(\frac{m}{n}\right)\right| = \left|0 - \frac{1}{n}\right| > \epsilon$. On the other hand, at each
irrational number a in $(0, 1)$, the function is continuous. To see this, given an
$\epsilon > 0$, notice that there are only finitely many rational numbers $r \in (0, 1)$ such that
$f(r) \ge \epsilon$. If there are such rational numbers, there is one, $r_0$, closest to a, so choose
$\delta = |r_0 - a|$. If there are no such rational numbers, you can choose $\delta = 1$. In either
case, for all $x \in (0, 1)$ with $|x - a| < \delta$, it follows that $|f(x) - f(a)| < \epsilon$, showing
that f is continuous at a.
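The choice of $\delta$ at an irrational point can be carried out explicitly, because the rationals with $f(r) \ge \epsilon$ are exactly those with denominator at most $\frac{1}{\epsilon}$. The sketch below is an illustration under assumptions: the irrational a is represented by a floating-point approximation, `thomae` is evaluated only at rational inputs, and both function names are hypothetical.

```python
from fractions import Fraction
import math

def thomae(x):
    """Thomae's function at a rational x in (0, 1).
    Fraction stores x in lowest terms, so the value is 1 / denominator."""
    return Fraction(1, Fraction(x).denominator)

def delta_at(a, eps):
    """For an irrational a in (0, 1), given as a float approximation,
    return the delta described in the text."""
    max_n = math.floor(1 / eps)    # f(m/n) = 1/n >= eps exactly when n <= 1/eps
    bad = {Fraction(m, n) for n in range(2, max_n + 1) for m in range(1, n)}
    if not bad:
        return 1.0
    return min(abs(float(r) - a) for r in bad)

a = math.sqrt(2) / 2
eps = Fraction(1, 5)
d = delta_at(a, eps)
print(d)
```

With these sample values the closest "bad" rational is $\frac{2}{3}$, so $\delta$ is the distance from a to $\frac{2}{3}$.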
Chapter 5
Derivatives
Anybody who was even half paying attention in their first course in Calculus got
the strong impression that the differentiation of functions has an enormous number
of applications. Not only does it provide a great tool for understanding the behavior
of functions, but it also has applications to a very wide range of other fields, most
notably Physics, Engineering, Chemistry, Biology, and Economics. In particular,
being able to use the derivative to determine where a function is increasing and
decreasing in itself justifies this reputation. Merely knowing the average rate of
change of a function over an interval is valuable. But the limit concept allows you to
refine this idea to get the instantaneous rate of change of the function at a point. This
allows for more precise information about the function as well as providing what is
often a simpler expression than that of the average rate of change from which it
is derived. This chapter will discuss the theorems needed to calculate derivatives
efficiently as well as theorems highlighting some of the important properties and
applications of the derivative.
Let f be a function defined on an open interval containing the point a.
Then for values of x near but not equal to a one can calculate the slope
of the secant line passing through the two points on the graph of the
function $(a, f(a))$ and $(x, f(x))$. As shown in Fig. 5.1, the slope of this secant
line is given by the difference quotient $\frac{f(x) - f(a)}{x - a}$. If f is continuous, as x
approaches a, the point $(x, f(x))$ approaches the point $(a, f(a))$, and the secant
line may approach a tangent line, the line that passes through $(a, f(a))$
and most closely approximates the graph of the function near a (Fig. 5.2).
The derivative of f at a is the slope of this tangent line. More formally, if a is
an accumulation point of the domain of the function f, and f is defined at a, then
the derivative of f at a is $f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$. The derivative is said to exist if this
limit exists. When the limit exists, f is said to be differentiable at a. Equivalently,
the limit can be written $f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$.
The first important consequence of the definition of the derivative is that if a function
f has a derivative at a point, then f is also continuous at that point. As part of the
definition of derivative, f needs to be defined at the point a for it to have a derivative
at a. It remains to show that $\lim_{x\to a} f(x) = f(a)$ whenever the limit $\lim_{x\to a}\frac{f(x)-f(a)}{x-a}$ exists.
For this difference quotient to have a finite limit when the denominator is clearly
approaching 0, the numerator must also be approaching 0. This last statement is
intuitively true, so you would hope that it has an easy justification. Consider what
sort of algebraic operations you could apply to the difference quotient $\frac{f(x)-f(a)}{x-a}$ in
order to produce the numerator $f(x) - f(a)$. It should be clear that if the difference
quotient is multiplied by $x - a$, the product will be the desired difference $f(x) - f(a)$.
This suggests the method that works in the following simple proof.
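The multiplication idea just described can be written out as a short chain of limits. This is a sketch of the argument rather than the book's own proof text; it assumes only that $f'(a)$ exists and uses the theorem that the limit of a product is the product of the limits.

```latex
\begin{align*}
\lim_{x\to a}\bigl(f(x)-f(a)\bigr)
  &= \lim_{x\to a}\left[\frac{f(x)-f(a)}{x-a}\cdot(x-a)\right] \\
  &= \lim_{x\to a}\frac{f(x)-f(a)}{x-a}\cdot\lim_{x\to a}(x-a) \\
  &= f'(a)\cdot 0 = 0 .
\end{align*}
% Adding the constant f(a) to both sides gives \lim_{x\to a} f(x) = f(a),
% which is the statement that f is continuous at a.
```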
5.3 Calculating Derivatives
The proof that a function f has a particular derivative at a point a is just a proof about
the limit of a difference quotient, and as such, is no different than a proof of any other
limit. On the other hand, there are some similarities among the proofs of derivatives,
so it is worth working through a few examples. The key observation is that whenever
you need to calculate a derivative directly from the definition, you must calculate the
limit of a difference quotient which, by design, is a fraction whose numerator and
denominator are both approaching zero. In such a case, one would expect to be able
to perform some algebraic manipulation that would result in the $x - a$ expression in
the denominator canceling with an equivalent factor in the numerator. This allows
you to use other limit theorems to complete the evaluation.
For example, consider the function $f(x) = 3x^2 - 8x$. To calculate the derivative
of f at $a = 4$, one needs to evaluate the limit
\[
\begin{aligned}
\lim_{x\to 4}\frac{f(x)-f(4)}{x-4}
  &= \lim_{x\to 4}\frac{3x^2-8x-(3\cdot 4^2-8\cdot 4)}{x-4}
   = \lim_{x\to 4}\frac{3x^2-8x-16}{x-4} \\
  &= \lim_{x\to 4}\frac{(3x+4)(x-4)}{x-4}
   = \lim_{x\to 4}(3x+4) = 16 .
\end{aligned}
\]
Since each step of this derivation follows either from rules of algebra or from
the theorems about calculating the limits of various arithmetic combinations of
functions, the calculation given is a complete proof that the derivative of f at $x = 4$
is 16.
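A numerical sanity check can accompany a limit calculation like this one, although it is no substitute for the proof. The sketch below evaluates the difference quotient of $f(x) = 3x^2 - 8x$ at points approaching 4 from both sides. The helper name `difference_quotient` is introduced here only for illustration.

```python
def difference_quotient(f, a, x):
    """Slope of the secant line through (a, f(a)) and (x, f(x)); requires x != a."""
    return (f(x) - f(a)) / (x - a)


def f(x):
    return 3 * x**2 - 8 * x


# For x != 4 the quotient simplifies to 3x + 4, so the values should approach 16.
for h in (1e-1, 1e-3, 1e-5):
    left = difference_quotient(f, 4.0, 4.0 - h)
    right = difference_quotient(f, 4.0, 4.0 + h)
    print(f"h = {h:g}: left = {left:.6f}, right = {right:.6f}")
```

The printed values only suggest the limit; the algebraic cancellation of the factor $x - 4$ is what proves it.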
In a more general setting, consider proving that the derivative of $f(x) = 5x^4$ at
the point $x = a$ is $f'(a) = 20a^3$. Here you would calculate
\[
\begin{aligned}
\lim_{x\to a}\frac{f(x)-f(a)}{x-a}
  &= \lim_{x\to a}\frac{5x^4-5a^4}{x-a}
   = \lim_{x\to a}\frac{5(x-a)(x^3+x^2a+xa^2+a^3)}{x-a} \\
  &= \lim_{x\to a} 5(x^3+x^2a+xa^2+a^3)
   = 5(a^3+a^2a+aa^2+a^3) = 20a^3 .
\end{aligned}
\]
Again, finding a factor of $x - a$ in the numerator of the difference quotient is the key
to evaluating the needed limit.
One quickly learns in Calculus that although the derivative is defined as a limit
of a difference quotient, there is a small collection of algorithms that reduce the
finding of the derivative of any combination of elementary functions to a fairly
mechanical exercise. The algorithms show you how to take the derivatives of the
sum, difference, product, and quotient of two differentiable functions as well as a
constant multiple of a differentiable function, the inverse of a differentiable function,
and the composition of two differentiable functions. Those rules along with the
knowledge of how to differentiate the elementary functions $x^n$, $a^x$, $\log_a x$, $\sin x$,
and $\cos x$ give you all the tools necessary to differentiate virtually any function you
are likely to see in a lifetime of applications. This section and the next discuss
the proofs of the theorems that provide these needed algorithms.
The simplest of these results is the theorem that states that if f is a function
differentiable at a and c is any constant, then the function cf is also differentiable at
a with $(cf)'(a) = cf'(a)$. In the proof of this theorem, you would assume that $f'(a)$
exists. That provides for you the limit $\lim_{x\to a}\frac{f(x)-f(a)}{x-a} = f'(a)$. Since the limit needed
to show that $(cf)'(a) = cf'(a)$ is just a multiple of a known limit, the needed result
follows immediately from the fact that the limit of a constant times a function is the
constant times the limit of the function.
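Written out, the argument just outlined is a single line of limits. The display below is a sketch under the stated hypothesis that $f'(a)$ exists; the only tool it uses is the constant-multiple theorem for limits.

```latex
\[
(cf)'(a)
  = \lim_{x\to a}\frac{(cf)(x)-(cf)(a)}{x-a}
  = \lim_{x\to a} c\cdot\frac{f(x)-f(a)}{x-a}
  = c\lim_{x\to a}\frac{f(x)-f(a)}{x-a}
  = c\,f'(a).
\]
```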
5.4 The Arithmetic of Derivatives
of the theorem.
Why does the first step in this proof make the assumption that f and g are defined
on the same domain? This is to avoid the embarrassing situation that the intersection
of the domains of f and g isolates the point a. For example, if f is defined for all
$x \le 1$ and g is defined for all $x \ge 1$, it could be that both $f'(1)$ and $g'(1)$ are defined,
but the function $f + g$ is defined only at 1, so its derivative cannot be defined. Another
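• Suppose that f and g are functions defined on a common domain, and that f
and g are both differentiable at a with $g(x) \neq 0$.
• From the definition of derivative, $f'(a) = \lim_{x\to a}\frac{f(x)-f(a)}{x-a}$ and
$g'(a) = \lim_{x\to a}\frac{g(x)-g(a)}{x-a}$.
• Because g is differentiable at a, it is continuous at a. This implies that
$\lim_{x\to a} g(x) = g(a)$.
• Then
\[
\begin{aligned}
\left(\frac{f}{g}\right)'(a)
  &= \lim_{x\to a}\frac{\frac{f}{g}(x)-\frac{f}{g}(a)}{x-a}
   = \lim_{x\to a}\frac{\frac{f(x)}{g(x)}-\frac{f(a)}{g(a)}}{x-a} \\
  &= \lim_{x\to a}\frac{\frac{f(x)}{g(x)}-\frac{f(a)}{g(x)}+\frac{f(a)}{g(x)}-\frac{f(a)}{g(a)}}{x-a} \\
  &= \lim_{x\to a}\left(\frac{f(x)-f(a)}{x-a}\cdot\frac{1}{g(x)}
     + f(a)\cdot\frac{\frac{1}{g(x)}-\frac{1}{g(a)}}{x-a}\right) \\
  &= \lim_{x\to a}\left(\frac{f(x)-f(a)}{x-a}\cdot\frac{1}{g(x)}
     + \frac{f(a)}{g(x)g(a)}\cdot\frac{g(a)-g(x)}{x-a}\right) \\
  &= \lim_{x\to a}\frac{f(x)-f(a)}{x-a}\cdot\lim_{x\to a}\frac{1}{g(x)}
     + \lim_{x\to a}\frac{f(a)}{g(x)g(a)}\cdot\lim_{x\to a}\frac{g(a)-g(x)}{x-a} \\
  &= f'(a)\cdot\frac{1}{g(a)} - \frac{f(a)}{[g(a)]^2}\,g'(a)
   = \frac{f'(a)g(a)-f(a)g'(a)}{[g(a)]^2}.
\end{aligned}
\]
• Thus, $\left(\frac{f}{g}\right)'(a) = \frac{f'(a)g(a)-f(a)g'(a)}{[g(a)]^2}$.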
5.4.1 Exercises
This proof attempt does include the intuitive reasoning behind why the Chain Rule
works, but the proof is not correct. Can you spot the error? The problem is that even
though $g(x)$ is approaching $g(a)$ as x approaches a, there is no guarantee that $g(x)$
is different from $g(a)$. In fact, it is quite easy to construct functions $g(x)$ which are
differentiable at a for which $g(x)$ is equal to $g(a)$ for infinitely many values of x
as x approaches a. The simplest example is when g is a constant function. A more
complicated example is
\[
g(x) = \begin{cases} x^2\sin\frac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}
\]
which has a derivative of 0 at $x = 0$ and is equal to $g(0) = 0$ at $\frac{1}{n\pi}$ for all nonzero
integers n. Clearly, when $g(x) = g(a)$, one cannot both multiply and divide the difference
quotient by $g(x) - g(a)$ and expect to get anything except nonsense. This problem does not
present an enormous hurdle because, in the cases where $g(x) = g(a)$, the difference
quotient $\frac{f(g(x))-f(g(a))}{x-a}$ is itself equal to 0. A typical way around the problem is to
introduce the function
\[
h(y) = \begin{cases} \dfrac{f(y)-f(g(a))}{y-g(a)} & \text{if } y \neq g(a) \\[2ex] f'(g(a)) & \text{if } y = g(a) \end{cases}
\]
This function has the nice property that it is equal to the desired difference quotient
when $g(x)$ differs from $g(a)$, and it is continuous at $g(a)$. Introducing this function
into the proof gets around the technical difficulties of the previously attempted proof.
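One way to see how the auxiliary function h repairs the argument is to write the key identity explicitly. The display below is a sketch, assuming g is differentiable at a, f is differentiable at $g(a)$, and h is defined as above; it is not a reproduction of the book's proof.

```latex
% For every x \neq a in the domain, whether or not g(x) = g(a):
\[
\frac{f(g(x)) - f(g(a))}{x-a} \;=\; h\bigl(g(x)\bigr)\cdot\frac{g(x)-g(a)}{x-a}.
\]
% If g(x) = g(a), both sides are 0; otherwise the factor g(x) - g(a) cancels.
% Since g is continuous at a and h is continuous at g(a),
\[
(f\circ g)'(a)
  = \lim_{x\to a} h\bigl(g(x)\bigr)\cdot\lim_{x\to a}\frac{g(x)-g(a)}{x-a}
  = h\bigl(g(a)\bigr)\,g'(a)
  = f'\bigl(g(a)\bigr)\,g'(a).
\]
```

No division by $g(x) - g(a)$ appears outside the definition of h, which is why the troublesome points where $g(x) = g(a)$ cause no difficulty.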
limit of $\frac{g(x)-g(a)}{x-a}$.
In its application to the equation $(f \circ f^{-1})(x) = x$ one can rewrite
the difference quotient as $\frac{g(x)-g(a)}{x-a} = \frac{\frac{x-a}{x-a}}{h(g(x))}$. Now there is no a priori assumption
that the limit of $\frac{g(x)-g(a)}{x-a}$ exists; its limit is just the limit of the quotient $\frac{\frac{x-a}{x-a}}{h(g(x))}$, which
exists as the quotient of limits.
As an application of differentiating an inverse function, consider finding the
derivative of $\sqrt[n]{x}$ for integer values of $n \neq 0$. It is known that, for integer values of
n, the derivative of $f(x) = x^n$ is $f'(x) = nx^{n-1}$. For $n \neq 0$ and $x \ge 0$, the inverse of
the function f is $f^{-1}(x) = \sqrt[n]{x}$, so its derivative must be $\frac{1}{f'(f^{-1}(x))} = \frac{1}{n(\sqrt[n]{x})^{n-1}}$.
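The inverse-function formula for the derivative of the nth root can be compared against a numerical estimate. The following is a minimal sketch for positive n and $x > 0$; the function names and the central-difference step size are choices made here for illustration, not part of the text.

```python
def nth_root(x, n):
    """The n-th root of x for x > 0 and a positive integer n."""
    return x ** (1.0 / n)


def nth_root_derivative(x, n):
    """The formula 1 / (n * (nth root of x)^(n - 1)) from the inverse function rule."""
    return 1.0 / (n * nth_root(x, n) ** (n - 1))


def central_difference(func, x, h=1e-6):
    """Symmetric numerical estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


for n in (2, 3, 5):
    x = 2.0
    estimate = central_difference(lambda t: nth_root(t, n), x)
    print(n, estimate, nth_root_derivative(x, n))
```

Agreement to several decimal places is evidence that the formula was transcribed correctly, not a proof of it.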
Perhaps the most important property of the derivative is its ability to determine
where a function is increasing or decreasing. Let f be a function defined on an
interval I. If for all x and y in I, $x < y$ implies that $f(x) \le f(y)$, then f is said to be
increasing on I, and if $x < y$ implies that $f(x) < f(y)$, then f is said to be strictly
increasing on I. Similarly, if $x < y$ implies that $f(x) \ge f(y)$, then f is said to be
decreasing on I, and if $x < y$ implies that $f(x) > f(y)$, then f is said to be strictly
decreasing on I.
So what can be said if it is known that function f has a positive derivative at a?
What is known is that the difference quotient $\frac{f(x)-f(a)}{x-a}$ has a positive limit, so it is
positive when x is close to a. How close to a does x have to be? What the limit
definition gives you is that for any $\varepsilon > 0$, you can find a $\delta > 0$ so that, for x within
$\delta$ of a and not equal to a, the quotient $\frac{f(x)-f(a)}{x-a}$ is within $\varepsilon$ of its limit, $f'(a)$, which
is positive. So, if $\varepsilon > 0$ is chosen to be $f'(a)$, then the difference quotient, which has to
be within $f'(a)$ of $f'(a)$, will have to be positive.
Thus, for x within $\delta$ of a (and not equal to a), the difference quotient $\frac{f(x)-f(a)}{x-a}$ is
positive. Then if $x > a$, it follows that $f(x) > f(a)$, and if $x < a$, it follows that
$f(x) < f(a)$. Does this mean that f is increasing? The answer is no. There are
functions with a positive derivative at a which are not increasing over any open
interval containing a. An example of such a function is given in the last section of
this chapter. All one can say is the following.
Fig. 5.4 This graph of $f(x)$ on the interval $[a, h]$ shows relative maxima at b, d, and g, relative
minima at a, c, e, and h; an absolute maximum at b, and an absolute minimum at h. The derivative
$f'(x)$ does not exist at $x = d$
Any student of Calculus will see applications of this result where one is asked
to identify relative extrema for a particular function, and applications to what are
fondly called Max/Min problems where one is first asked to construct an appropriate
function to fit the application and then find a particular extremum of that function.
One defines a critical point of f to be a value a where either $f'(a) = 0$ or $f'(a)$ does
not exist. Not all of these points will end up being relative extrema, for some may just
be saddle points of f where $f'(a) = 0$, but f has no relative extremum at that point.
For example, the function $f(x) = x^3$ has a saddle point at $x = 0$ where $f'$ is 0, but f
is a strictly increasing function over the entire real line. A function is said to have an
absolute maximum (sometimes called a global maximum) at a if f is defined at
a, and for all other x in the domain of f, $f(x) \le f(a)$. The term absolute minimum
(sometimes called a global minimum) is defined in the analogous way with $f(x) \ge
f(a)$, and an absolute extremum (sometimes called a global extremum) is either an
absolute maximum or absolute minimum. The theorem about relative extrema shows
that if f is defined on any interval I, then the only places f can have relative extrema
or absolute extrema are critical points or at endpoints of I. You should be able to
identify example functions where each of these criteria gives extrema (Fig. 5.4).
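On a closed bounded interval, the observation that extrema can occur only at critical points or endpoints turns into a finite comparison. The sketch below applies it to the illustrative function $f(x) = x^3 - 3x$ on $[-2, 3]$, which is not one of the book's exercises; the critical points are assumed to have been found by hand, and the helper name is hypothetical.

```python
def extrema_on_closed_interval(f, left, right, critical_points):
    """Compare f at the endpoints and at the critical points inside (left, right)."""
    candidates = [left, right] + [c for c in critical_points if left < c < right]
    values = {x: f(x) for x in candidates}
    max_value = max(values.values())
    min_value = min(values.values())
    maxima = sorted(x for x, v in values.items() if v == max_value)
    minima = sorted(x for x, v in values.items() if v == min_value)
    return (max_value, maxima), (min_value, minima)


def f(x):
    return x**3 - 3 * x


# f'(x) = 3x^2 - 3 vanishes at x = -1 and x = 1 (computed by hand, f' exists everywhere).
(abs_max, at_max), (abs_min, at_min) = extrema_on_closed_interval(f, -2.0, 3.0, [-1.0, 1.0])
print("absolute maximum", abs_max, "at", at_max)
print("absolute minimum", abs_min, "at", at_min)
```

Here the absolute maximum occurs at an endpoint while the absolute minimum is shared by an endpoint and a critical point, so both criteria are needed.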
5.6.1 Exercises
Identify the relative extrema and absolute extrema of the given functions on the
given intervals.
1. f .x/ D x3 8x on the interval Œ3; 2
2. f .x/ D 3x C 3x on the interval Œ1; 1/
3. f .x/ D jx2 16j on the interval Œ5; 6
p 2
4. f .x/ D 3 2 3 x on the interval Œ2; 2
The Mean Value Theorem is one of the better known results about derivatives, and
for good reason. It is invoked frequently when one needs to estimate the maximum
possible change between the values of a function at two different points. This can
be a valuable tool when finding approximations to functions or when it is necessary
to know how much variation is exhibited by a particular function. The theorem
states that the average rate of change of a function between two points a and b,
given by $\frac{f(b)-f(a)}{b-a}$, is equal to the value of the derivative $f'(c)$ for some c between a
and b. This allows you to use information about the derivative to make statements
about the change $f(b) - f(a)$. The theorem is usually proved in two steps by first
proving Rolle’s Theorem, which is a simpler version of the Mean Value Theorem.
Rolle’s Theorem states that if $a < b$, and if f is a function continuous on the interval
$[a, b]$, differentiable on the interval $(a, b)$, and satisfying $f(a) = f(b)$, then there is a
$c \in (a, b)$ for which $f'(c) = 0$.
What tools do you have to prove this result? Your proof needs to conclude that
$f'(c) = 0$. Think through what you know about derivatives, and see if any of the
results conclude that the derivative is equal to 0. The only results that come to mind
are the result that the derivative of any constant function is 0, and the result that
if f reaches a relative extremum at a point where the function is differentiable,
then its derivative at that point must be 0. It is unlikely that the first of these two
results will be of much help except in the very special case where f is a constant
function. So how can you use the result about extreme values to show that there
is a place where the function has a derivative of 0? What you do know is that f is
continuous on a closed interval $[a, b]$, and the Extreme Value Theorem states that
such a function obtains its maximum and minimum values on this interval. You also
know that these extreme values can only occur at places where the derivative is 0,
where the derivative does not exist, or at the endpoints of the interval. OK, there are
no places on $(a, b)$ where the derivative does not exist, but could both the maximum
and minimum occur at endpoints of the interval? The hypothesis of Rolle’s Theorem
says that $f(a) = f(b)$, so the only way that the two endpoints can be both maximum
and minimum values of f on the interval is for f to be constant on the interval. In
the case of a constant function, the theorem is clearly true. In any other case, it
could be that $f(a)$ and $f(b)$ are maximum values for f or minimum values for f, but
they cannot be both. If f is not constant, its maximum and minimum values must
be different. That guarantees that f must have either an absolute maximum or an
absolute minimum (possibly both) between a and b. That gives the result (Fig. 5.5).
5.7 The Mean Value Theorem
Fig. 5.5 The proof of Rolle’s Theorem finds an extreme point c between a and b for which
$f'(c) = 0$
Rolle’s Theorem takes care of the case where $f(a) = f(b)$. To prove the Mean
Value Theorem in the more general case where $f(a)$ need not equal $f(b)$, you would
want to reduce this general case to the previously proved case where $f(a)$ and $f(b)$
are equal. An easy way to do this is to subtract a linear function from f to get a
new function h which does satisfy the hypothesis of Rolle’s Theorem. This linear
function can be any linear function that takes on a value at b which differs by
$f(b) - f(a)$ from the value it takes on at a. One such function is $[f(b) - f(a)]\frac{x-a}{b-a}$
because it takes on the value 0 at a and $f(b) - f(a)$ at b (Fig. 5.6).
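The reduction to Rolle's Theorem described above can be summarized in a few displayed lines. This is a sketch of how the argument runs with the linear function just named, under the hypotheses that f is continuous on $[a, b]$ and differentiable on $(a, b)$.

```latex
\begin{align*}
h(x) &= f(x) - \bigl[f(b)-f(a)\bigr]\frac{x-a}{b-a}, \\
h(a) &= f(a) - 0 = f(a), \qquad
h(b) = f(b) - \bigl[f(b)-f(a)\bigr] = f(a), \\
h'(x) &= f'(x) - \frac{f(b)-f(a)}{b-a} \quad \text{for } x \in (a,b).
\end{align*}
% h is continuous on [a,b], differentiable on (a,b), and h(a) = h(b),
% so Rolle's Theorem gives c in (a,b) with h'(c) = 0, that is,
\[
f'(c) = \frac{f(b)-f(a)}{b-a}.
\]
```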
The following are two instructive applications of the Mean Value Theorem. First,
if you know that a function f is differentiable on an interval, and its derivative is
nonnegative on that interval, then the function must be increasing on the interval. To
show that a function is increasing, you need to show that if x and y are in the interval
with $x < y$, then $f(x) \le f(y)$. This would follow from knowing that if $y - x > 0$,
then the difference quotient $\frac{f(y)-f(x)}{y-x} \ge 0$. What the Mean Value Theorem gives you
is that this difference quotient is equal to the derivative of f at some point c between
x and y. So, if you know that the derivative on the interval is always nonnegative,
then the difference quotient must be nonnegative as needed.
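To make the point c in the Mean Value Theorem concrete, one can locate it numerically for a specific function. The sketch below uses $f(x) = x^3$ on $[0, 2]$ and bisection on $f'(c)$ minus the secant slope. Bisection is an assumption of this example: it needs $f'$ to be continuous with a sign change, which the theorem itself does not require. All names are illustrative.

```python
def find_mvt_point(f_prime, slope, lo, hi, tol=1e-12):
    """Bisect for c in (lo, hi) with f_prime(c) == slope.

    Assumes f_prime is continuous and f_prime - slope changes sign on [lo, hi].
    """
    def g(c):
        return f_prime(c) - slope

    if g(lo) * g(hi) > 0:
        raise ValueError("f_prime - slope does not change sign on the interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def f(x):
    return x**3


def f_prime(x):
    return 3 * x**2


a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)      # average rate of change on [a, b]
c = find_mvt_point(f_prime, slope, a, b)
print(slope, c)                      # c satisfies 3c^2 = 4
```

Since $f'$ is nonnegative on the whole interval, the secant slope between any two points of it is nonnegative as well, which is the content of the argument above.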
Clearly, if $f'$ is strictly positive on an interval, then you can prove that f is strictly
increasing on the interval. This can be done by altering the above proof by changing
the greater than or equal signs to greater than signs where needed. Is the converse of
the above theorem true? Well, one cannot conclude that a function is differentiable
on an interval by just knowing that the function is increasing there. But what if
you are given a differentiable function that is increasing? What can you conclude
about the derivative? If a function is increasing, it does mean that every difference
quotient $\frac{f(y)-f(x)}{y-x}$ will be greater than or equal to 0, and, thus, the derivative which
is the limit of such difference quotients will have to be greater than or equal to 0. If
f is strictly increasing, can you conclude that its derivative is positive? In this case
you cannot. You can conclude that all difference quotients will be positive, but the
limit of positive difference quotients can be 0. For example, $f(x) = x^3$ is a function
differentiable on the entire real line, and it is strictly increasing, but its derivative is
0 at $x = 0$.
Another important consequence of the Mean Value Theorem is that if a function f
has a derivative equal to 0 at every point of an interval, then f is constant on that
interval. Again, this follows directly from what you can say about any difference
quotient.
How important is it that the set where $f'$ is 0 is an interval? The fact that the
set is an interval is crucial. For example, the function
\[
f(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x > 0 \end{cases}
\]
is not defined at 0. The derivative, $f'$, is equal to 0 at each point of the domain
of f, but clearly, f is not a constant function, although it is constant on each
interval contained in its domain. Looking back to the previous theorem, note that
the function $f(x) = -\frac{1}{x}$ has a strictly positive derivative at each point of its domain,
but, again, its domain does not include 0. This function is strictly increasing on
each interval contained in its domain, but it is not an increasing function because
$f(-1) > f(1)$.
5.7.1 Exercises
It seems like most students who take Calculus remember L’Hopital’s Rule. Even
those who do not remember what the rule states seem to remember its name.
Perhaps this is because it is so much fun to pronounce, but more students remember
L’Hopital’s Rule than some far more important results such as the Fundamental
Theorem of Calculus. L’Hopital’s Rule states that if f and g are differentiable
functions defined on an interval containing the point a with $\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0$,
then $\lim_{x\to a}\frac{f'(x)}{g'(x)} = L$ implies that $\lim_{x\to a}\frac{f(x)}{g(x)} = L$. This is very useful because the
theorem stating that the limit of a quotient is the quotient of the limits does not
apply in cases when the denominator has a limit of 0.
How would you prove L’Hopital’s Rule? You might try to prove it by using the
Mean Value Theorem because the quotient you are considering is
\[
\frac{f(x)}{g(x)} = \frac{f(x)-f(a)}{g(x)-g(a)} = \frac{\frac{f(x)-f(a)}{x-a}}{\frac{g(x)-g(a)}{x-a}}.
\]
5.8 L’Hopital’s Rule
This is not exactly correct because, as far as you know, $f(a)$ and $g(a)$ might not
even be defined, and if they are, they need not be equal to $\lim_{x\to a} f(x)$ and $\lim_{x\to a} g(x)$.
This is not a big stumbling block, because you can always redefine f and g at a to
be equal to 0 without changing the result of the theorem. You also would need to
know that $g(x) \neq g(a)$ for x near a so that the needed quotient can be calculated.
Once the quotient of f and g is rewritten as the quotient of the difference quotient
of f and the difference quotient of g, you can apply the Mean Value Theorem to
replace the difference quotients with derivatives, and then take the limit. It might
look something like the following.
• Because $c_f(x)$ and $c_g(x)$ are both between x and a, $\lim_{x\to a} c_f(x) = \lim_{x\to a} c_g(x) = a$.
• Then because $f(a) = g(a) = 0$,
\[
\lim_{x\to a}\frac{f(x)}{g(x)}
  = \lim_{x\to a}\frac{f(x)-f(a)}{g(x)-g(a)}
  = \lim_{x\to a}\frac{\frac{f(x)-f(a)}{x-a}}{\frac{g(x)-g(a)}{x-a}}
  = \lim_{x\to a}\frac{f'(c_f(x))}{g'(c_g(x))}
  = \lim_{x\to a}\frac{f'(x)}{g'(x)}.
\]
• This completes the proof.
There is a significant problem with this proof. The problem stems from the fact
that, although both functions $c_f(x)$ and $c_g(x)$ do approach a as x approaches a, the
two functions can approach a at different rates. Why is this a problem? Consider
calculating $\lim_{x\to 0}\frac{x^2}{x}$, a limit which is clearly equal to 0. But what if $c_f(x) = x$
and $c_g(x) = x^2$? Even though it is true that $c_f(x)$ and $c_g(x)$ both approach 0 as
x approaches 0, the limit $\lim_{x\to 0}\frac{c_f(x)^2}{c_g(x)} = 1$. Just knowing that $c_f(x)$ and $c_g(x)$ are
approaching a does not allow you to use both of these expressions in place of x
when taking the limit. What the proof attempt does show is that
$\lim_{x\to 0}\frac{f(x)}{g(x)} = \frac{\lim_{x\to 0} f'(x)}{\lim_{x\to 0} g'(x)}$,
a result that is not as useful as L’Hopital’s Rule.
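The effect of letting the two intermediate points approach the limit at different rates is easy to see numerically. The sketch below tabulates the two ratios from the example; `c_f` and `c_g` are the hypothetical choices $c_f(x) = x$ and $c_g(x) = x^2$ named in the text, not points produced by any actual application of the Mean Value Theorem.

```python
def c_f(x):
    return x        # hypothetical intermediate point used in the numerator


def c_g(x):
    return x**2     # hypothetical intermediate point used in the denominator


for x in (1e-1, 1e-3, 1e-5):
    true_ratio = x**2 / x                  # tends to 0 as x tends to 0
    mixed_ratio = c_f(x) ** 2 / c_g(x)     # stays equal to 1 for every x != 0
    print(x, true_ratio, mixed_ratio)
```

Both substitutions tend to 0, yet the substituted ratio never moves toward the true limit, which is the flaw in the proof attempt.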
A second less crucial problem with this proof attempt is that it defines $c_f(x)$ to
be the value of c such that $\frac{f(x)-f(a)}{x-a} = f'(c)$. But this condition may well be satisfied
by more than one value of c, so there is a problem with which of the possible values
of c is chosen. One can get around this difficulty, but that still does not address the
previously stated problem.
A common way to correct the problem in the proof attempt is to use a more
powerful version of the Mean Value Theorem known as the Extended Mean Value
Theorem or sometimes as the Cauchy Mean Value Theorem. It allows you to select
one value of c so that $\frac{f'(c)}{g'(c)} = \frac{f(x)-f(a)}{g(x)-g(a)}$, that is, it lets the ratio of
derivatives equal the ratio of the difference quotients at a single value of c rather
than selecting one value of c for the numerator and a possibly different value
of c for the denominator. One can prove the Extended Mean Value Theorem by
manipulating the desired relation $\frac{f'(c)}{g'(c)} = \frac{f(x)-f(a)}{g(x)-g(a)}$. This equation can be rewritten as
$f'(c)[g(x) - g(a)] = g'(c)[f(x) - f(a)]$ and then $f'(c)[g(x) - g(a)] - g'(c)[f(x) - f(a)]
= 0$. This may be confusing because there are three variables involved, x, a, and c,
but you can make better sense of it by thinking of x and a as being fixed. That
is, if you define the function $h(t) = f(t)[g(x) - g(a)] - g(t)[f(x) - f(a)]$, then
$h'(c) = f'(c)[g(x) - g(a)] - g'(c)[f(x) - f(a)]$ as needed. How do you know that
there is a c such that $h'(c) = 0$? That follows from Rolle’s Theorem because it is
easy to verify that $h(x) = h(a)$.
PROOF (Extended Mean Value Theorem): For $a < b$, let both f and g be
functions continuous on $[a, b]$ and differentiable on $(a, b)$. Then there is a
$c \in (a, b)$ such that $f'(c)[g(b) - g(a)] = g'(c)[f(b) - f(a)]$.
• Let $a < b$ and assume f and g are functions continuous on $[a, b]$ and
differentiable on $(a, b)$.
• For $x \in [a, b]$ define $h(x) = f(x)[g(b) - g(a)] - g(x)[f(b) - f(a)]$.
• Then h is also continuous on $[a, b]$ and differentiable on $(a, b)$.
• Note that $h(a) = f(a)[g(b) - g(a)] - g(a)[f(b) - f(a)] = f(a)g(b) - g(a)f(b)$,
and
$h(b) = f(b)[g(b) - g(a)] - g(b)[f(b) - f(a)] = f(a)g(b) - g(a)f(b) = h(a)$.
• Thus, h satisfies the hypothesis of Rolle’s Theorem on the interval $[a, b]$.
• It follows that there is a $c \in (a, b)$ such that $h'(c) = 0$, so
$f'(c)[g(b) - g(a)] - g'(c)[f(b) - f(a)] = 0$.
• This is equivalent to $f'(c)[g(b) - g(a)] = g'(c)[f(b) - f(a)]$ which is the
conclusion of the theorem.
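The single point c promised by the Extended Mean Value Theorem can be found numerically in a concrete case. The sketch below uses $f(x) = x^2$ and $g(x) = x^3$ on $[1, 2]$ and bisects on $h'$, where h is the auxiliary function from the proof. Bisection assumes $h'$ is continuous and changes sign on the interval, which holds for this example but is not part of the theorem; the function name is illustrative.

```python
def cauchy_mvt_point(f, g, f_prime, g_prime, a, b, tol=1e-12):
    """Find c in (a, b) with f'(c)[g(b) - g(a)] = g'(c)[f(b) - f(a)] by bisection.

    Assumes h'(t) = f'(t)[g(b) - g(a)] - g'(t)[f(b) - f(a)] is continuous
    and changes sign on [a, b].
    """
    df, dg = f(b) - f(a), g(b) - g(a)

    def h_prime(t):
        return f_prime(t) * dg - g_prime(t) * df

    lo, hi = a, b
    if h_prime(lo) * h_prime(hi) > 0:
        raise ValueError("h' does not change sign on the interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h_prime(lo) * h_prime(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


c = cauchy_mvt_point(
    lambda x: x**2, lambda x: x**3,      # f and g
    lambda x: 2 * x, lambda x: 3 * x**2,  # f' and g'
    1.0, 2.0,
)
print(c)   # here h'(t) = 14t - 9t^2, whose root in (1, 2) is 14/9
```

The same c serves both the numerator and the denominator, which is exactly what the failed proof attempt was missing.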
Now the Extended Mean Value Theorem can be used to give a correct proof of
L’Hopital’s Rule.
L’Hopital’s Rule also holds in cases where $\lim_{x\to a} g(x)$ is infinite rather than
zero.
There are several variations of L’Hopital’s Rule covering the cases of one sided
limits and limits at positive or negative infinity. These are covered in the following
exercises.
5.9 Intermediate Value Property and Limits of Derivatives
5.8.1 Exercises
Fig. 5.7 $x^2\sin\frac{1}{x^2}$ and its derivative
0 and x. Moreover, it obtains each of those values infinitely often. In fact, between
0 and x, the function $f'$ takes on every real number infinitely often (Fig. 5.7).
Note that the function $f(x) + x$ has a derivative of 1 at $x = 0$. This is an example
of a function with a positive derivative at 0 which is not an increasing function over
any open interval containing 0. This can easily be seen by the fact that in every open
interval containing 0 there are intervals where the derivative of $f(x) + x$ is negative.
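The claim that the derivative of $f(x) + x$ is negative arbitrarily close to 0 can be checked at explicit points. The sketch below assumes f is the function of Fig. 5.7, $f(x) = x^2\sin\frac{1}{x^2}$ for $x \neq 0$, and evaluates the derivative at $x_n = 1/\sqrt{2\pi n}$, where the cosine term is essentially 1. The function name is illustrative.

```python
import math


def derivative_of_f_plus_x(x):
    """Derivative of x^2 sin(1/x^2) + x at a point x != 0."""
    return 1 + 2 * x * math.sin(1 / x**2) - (2 / x) * math.cos(1 / x**2)


for n in (1, 10, 100, 1000):
    x_n = 1 / math.sqrt(2 * math.pi * n)   # 1/x_n^2 = 2*pi*n, so cos(1/x_n^2) is about 1
    print(n, x_n, derivative_of_f_plus_x(x_n))
```

At these points the derivative is roughly $1 - 2\sqrt{2\pi n}$, which is negative for every $n \ge 1$ even though $x_n$ tends to 0.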
So, how can you prove that if a function f has a derivative $f'$ on an interval I,
that $f'$ has the intermediate value property on I? The hypothesis suggests that you
start by taking a function f differentiable on an interval I and values $a, b \in I$. Then
you select a value K between $f'(a)$ and $f'(b)$. Without loss of generality, you can
assume that $a < b$ and $f'(a) < K < f'(b)$. The goal would be to show that there
is a c between a and b such that $f'(c) = K$. One simplification is to replace f with
the function $g(x) = f(x) - Kx$. This function is also differentiable on I, and if
$f'(c) = K$, then $g'(c) = 0$. Which theorems about derivatives allow you to conclude
that a derivative is 0 at some point in an interval? First there is a theorem that states
that if a differentiable function reaches an extreme value at a point in an interval,
then the point is either a critical point of the function or an endpoint of the interval.
A second theorem is Rolle’s Theorem which talks about a differentiable function
which takes on the same value at the endpoints a and b. Since you do not have any
information about the values of g at the endpoints of the interval, the theorem about
extreme values may be the more promising choice for this proof.
What is known about the function g? You know that g is differentiable at each
point of the interval from a to b. Additionally, $g'(a) = f'(a) - K < 0$ and $g'(b) =
f'(b) - K > 0$. Does this mean that the function g is decreasing at a and increasing
at b? Well, it would if you knew that $g'$ were continuous because then $g'$ would be
negative in an interval around a and positive in an interval around b. But, as you now
know, $g'$ need not be continuous. On the other hand, there is a theorem that says that
if $g'(a)$ is negative, then there is a $\delta > 0$ such that if x satisfies $a < x < a + \delta$,
then $g(x) < g(a)$. This does not show much, but you can use it to conclude that g
does not take on its minimum value on $[a, b]$ at a. A similar argument uses the fact
that $g'(b) > 0$ to show that g does not take on its minimum value on $[a, b]$ at b.
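The pieces collected above fit together in a short chain of implications. The following is a sketch of how the argument could be finished, using the Extreme Value Theorem and the theorem on relative extrema; it is a reconstruction, not the book's proof text.

```latex
% g is differentiable, hence continuous, on [a,b], so by the Extreme Value
% Theorem g attains its minimum on [a,b] at some point c.
% The two observations above rule out c = a and c = b, so c \in (a,b).
% g is differentiable at the interior minimum c, hence
\[
g'(c) = 0
\quad\Longrightarrow\quad
f'(c) - K = 0
\quad\Longrightarrow\quad
f'(c) = K .
\]
```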
There are simple examples of functions that have discontinuous derivatives that
do not have the intermediate value property; functions such as $f(x) = |x|$. This
function’s derivative is the constant 1 for all $x > 0$ and $-1$ for all $x < 0$. This
derivative is not continuous at $x = 0$ because it is not defined there. Clearly, $f'$ does
not have the intermediate value property on any interval containing both positive
and negative numbers, but then f does not satisfy the hypothesis of the previous
theorem on any such interval because $f'(0)$ is not defined. Functions that have
discontinuous derivatives that are defined at all points will have to exhibit wild
oscillations in the neighborhoods of those discontinuities similar to the example
\[
\begin{cases} x^2\sin\frac{1}{x^2} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}.
\]
Suppose f is a function whose derivative is defined at all points of an interval
except perhaps at some point c in the interval. What can be said if $\lim_{x\to c} f'(x)$ exists?
Such a derivative does not exhibit wild oscillations near c, and, in fact, provided f is
itself continuous at c, f must have a continuous derivative at c. The proof is a
consequence of the Mean Value Theorem.
6.1 Area
The first application one usually sees of the Riemann Integral is that of finding
the area of a region in the plane bounded by the graph of a function and the
lines $x = a$, $x = b$, and the x-axis. Thus, before discussing integration, it makes
sense to review what is meant by the area of a region in the plane. Clearly, the
measure of area should be a way to assign a size to a region in a way that is
compatible with the well-established rules from Geometry for assigning areas to
regions such as rectangles, triangles, and circles. But there is a need to go beyond
these simple regions so that area can be calculated for far more complicated regions.
For example, consider the region in the coordinate plane $\{(x, y) \mid 0 \le x \le 1,\ 0 \le
y \le 1,\ \text{at least one of } x \text{ or } y \text{ is rational}\}$. Regions such as these are not typically
considered in a Geometry course, but being able to calculate areas for such sets
is important in the more general discussion of integration. This chapter, therefore,
begins by considering two different measures of the sizes of sets which will aid the
understanding of integration.
What does the set $\{A, B, C, D, E\}$ have in common with the set $\{2, 4, 6, 8, 10\}$? One
thing they have in common is that the two sets have the same number of elements.
What does the set of positive integers have in common with the set of positive
multiples of 2? These sets are both infinite sets, and the second set is clearly a
proper subset of the first, but, here again, the two sets have the same number
of elements. To see this consider the function $f(n) = 2n$ which maps the set of
positive integers one-to-one and onto the set of positive multiples
of 2. This function provides a one-to-one matching of the elements of one set
to the elements of the other set. One says that two sets A and B have the same
cardinality if there is a bijection $f : A \to B$. The bijection demonstrates a one-to-one
correspondence between the elements of set A and the elements of set B, so
one concludes that A and B are the same size. Some sets are finite, meaning that the
set is either empty (has cardinality 0) or, for some positive integer n, is in one-to-one
correspondence with the set $\{1, 2, 3, \ldots, n\}$. A set is called denumerable if it
can be put in one-to-one correspondence with the set of positive integers. Thus, the
set of positive multiples of 2 is denumerable. So is the set of all integers since
the positive integers can be mapped onto the set of all integers using the bijection
\[
f(x) = \begin{cases} \dfrac{x}{2} & \text{if } x \text{ is even} \\[1.5ex] 1 - \dfrac{x+1}{2} & \text{if } x \text{ is odd} \end{cases}
\]
The verification that this map is a bijection is left
as an exercise. It shows that the integers and the positive integers have the same
cardinality. Sets that are either finite or denumerable are called countable because
they can be “counted out” by listing a first, second, third, and so forth. Thus, a good
way to think about a countable set is as a set whose elements can be written down in
a finite or infinite sequence $x_1, x_2, x_3, \ldots$ because this listing shows the one-to-one
correspondence between the set and the natural numbers or one of its finite subsets.
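The map from the positive integers onto all integers can be tried out directly. The sketch below sends even x to $x/2$ and odd x to $1 - (x+1)/2$, and then checks on a finite range that no value repeats and none is skipped. A finite check is only an illustration; it does not replace the verification left as an exercise. The function name is chosen here.

```python
def to_integer(x):
    """Map a positive integer x to an integer: evens go to x/2, odds to 1 - (x + 1)/2."""
    if x % 2 == 0:
        return x // 2
    return 1 - (x + 1) // 2


images = [to_integer(x) for x in range(1, 12)]
print(images)   # the listing alternates between nonpositive and positive integers

# On 1..2001 the images should be exactly the integers from -1000 to 1000, each once.
finite_images = [to_integer(x) for x in range(1, 2002)]
print(len(set(finite_images)) == len(finite_images))
print(set(finite_images) == set(range(-1000, 1001)))
```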
The union of two countable sets is also countable. This can be seen by representing
one set by the sequence $x_1, x_2, x_3, \ldots$ and the other by $y_1, y_2, y_3, \ldots$. Then
the elements of the union of the two sets can be written as $x_1, y_1, x_2, y_2, x_3, y_3, \ldots$.
If there are elements that belong to both sets, then one can just leave the second
copies of those elements out of the listing. Clearly, this can be extended to the
union of any finite collection of countable sets, so the union of a finite number
of countable sets is countable. What might seem surprising is that the union of a
countable number of countable sets is still countable. That is, if $A_1, A_2, A_3, \ldots$ is
a sequence of countable sets, then the union $\bigcup_{k=1}^{\infty} A_k$ is also countable. To see this,
suppose that the elements in each $A_k$ can be listed in a sequence $a_{k,1}, a_{k,2}, a_{k,3}, \ldots$.
One can now list all the elements of $\bigcup_{k=1}^{\infty} A_k$ by listing the $a_{k,j}$ elements in increasing
order of $k + j$ resulting in $a_{1,1}, a_{2,1}, a_{1,2}, a_{3,1}, a_{2,2}, a_{1,3}, a_{4,1}, a_{3,2}, a_{2,3}, a_{1,4}, a_{5,1}, \ldots$.
As above, duplicate elements occurring because they belong to more than one set
can be left out of this listing. Figure 6.1 shows the order that the elements enter
the list.
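The order in which the index pairs $(k, j)$ enter the list can be generated mechanically. The sketch below yields pairs in increasing order of $k + j$, with k running downward within each diagonal so that the output matches the listing in the text. The generator name is chosen for illustration.

```python
from itertools import islice


def diagonal_pairs():
    """Yield index pairs (k, j) with k, j >= 1 in increasing order of k + j."""
    total = 2
    while True:
        for k in range(total - 1, 0, -1):   # k runs down from total - 1 to 1
            yield (k, total - k)
        total += 1


print(list(islice(diagonal_pairs(), 10)))
```

Every pair $(k, j)$ appears after finitely many steps, on the diagonal with $k + j$ fixed, which is why the resulting listing reaches every element of the union.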
Note that this result can be used to show that the set of rational numbers is
countable. Indeed, the rational numbers can be written as the union $R_1 \cup R_2 \cup R_3 \cup \cdots$
where $R_k$ is the set of rational numbers that can be written as a fraction with an integer
in the numerator and the positive integer k in the denominator. For example,
$R_2 = \{\frac{0}{2}, \frac{1}{2}, -\frac{1}{2}, \frac{2}{2}, -\frac{2}{2}, \frac{3}{2}, -\frac{3}{2}, \ldots\}$. Thus, the set of rational numbers is a countable union
of countable sets, showing that it is countable. The cardinality of a denumerable set is
often written using the symbol $\aleph_0$ (read “Aleph naught” or “Aleph null”). The symbol
represents the size of the natural numbers and the size of any set that can be placed
in one-to-one correspondence with the natural numbers.
6.2 Cardinality of Sets
A set which is not a countable set is called uncountable. There is a standard
argument that shows that the set of real numbers in the interval $(0, 1)$ is not a
countable set. The method, known as a diagonalization argument, first assumes
that the real numbers between 0 and 1 can all be written down in a sequence
$x_1, x_2, x_3, x_4, \ldots$. Then one constructs a real number y between 0 and 1 where the
kth digit to the right of the decimal point in y is chosen as follows. If the kth digit
to the right of the decimal point of $x_k$ is 7, then let the kth digit to the right of the
decimal point in y be 3. Otherwise, if the kth digit to the right of the decimal point of
$x_k$ is not 7, then let the kth digit to the right of the decimal point in y be 7. Figure 6.2
illustrates the process of determining y.
[Fig. 6.2: diagonalization example. Recovered rows: x3 = 0.741189182544, x4 = 0.118883729001, x5 = 0.552777106423, x6 = 0.000002109373, x7 = 0.821749032855, and y = 0.737737777773; the rows for x1 and x2 are not recoverable]
The point of this construction is that the number $y$ is a real number in the interval
$(0, 1)$, but it cannot be one of the numbers in the sequence $x_1, x_2, x_3, x_4, \ldots$. This
is because for each $k$, $y$ cannot equal $x_k$ because $y$ and $x_k$ differ in their $k$th digits.
(Since every digit of $y$ is 3 or 7, $y$ has only one decimal expansion, so differing in a
digit really does mean differing as numbers.)
This is a contradiction to the assumption that the sequence contained all of the real
numbers in $(0, 1)$ and shows that it is impossible to list all the elements of $(0, 1)$ in
a sequence. Thus, this interval is an uncountable set. If there is a bijection from the
set $(0, 1)$ to a set $B$, then it follows that $B$ will also be uncountable. You may wonder
whether all uncountable sets have the same cardinality. They do not, but that fact
will not be needed for the proofs discussed in this book. Refer to a standard text in
Set Theory for a far more in-depth look at the cardinality of sets.
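The digit rule used in the diagonalization argument can be tried on a finite listing. The sketch below is illustrative only: the function name `diagonal_number` and the sample digit strings are made up for the example, and a finite list can only show the first few digits of $y$.

```python
def diagonal_number(rows):
    """Return the digits of y for the first len(rows) listed numbers.

    `rows[k]` is a string holding the decimal digits of x_{k+1} after the
    decimal point; it must contain at least k + 1 digits.
    """
    digits = []
    for k, row in enumerate(rows):
        # Rule from the text: a 7 on the diagonal becomes 3, anything else becomes 7.
        digits.append("3" if row[k] == "7" else "7")
    return "".join(digits)

# Hypothetical sample listing (not the one shown in Fig. 6.2).
sample = ["7123", "0456", "3371", "9998"]
y_digits = diagonal_number(sample)
```

By construction, the digit of `y_digits` in position `k` differs from the diagonal digit `sample[k][k]` for every row, which is exactly why $y$ cannot appear in the list.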
6.2.1 Exercises
6.3 Measure Zero
Cardinality is used to compare the sizes of sets by considering how many elements
the sets have. But two sets such as $[0, 3]$ and $[0, 6]$ can have the same cardinality
and yet be quite different in what we traditionally think of as size in the geometric
sense. So there is a need to develop a different way to compare the sizes of sets that
embodies the notion of the length of a set of real numbers and of the area of a set in
the plane. A general theory of measure is not a topic that can be covered in a book
at this level, but it is helpful to introduce how one determines which sets should be
assigned a length or an area equal to 0.
If measure is to mean anything useful, you would want each finite interval $[a, b]$
to have measure equal to its length, $b - a$. How about the measure of the open interval
$(a, b)$? Likely, you would say that its measure should also be $b - a$. This suggests
that the set of endpoints $\{a, b\}$ should be assigned a measure of 0. More generally,
a set $S \subseteq \mathbb{R}$ is said to have measure zero if for each $\epsilon > 0$ there is a sequence of
open intervals $(a_1, b_1), (a_2, b_2), (a_3, b_3), \ldots$ such that $S$ is contained in the union of
the intervals, $S \subseteq \bigcup_{j=1}^{\infty} (a_j, b_j)$, and the total length of the intervals is less than $\epsilon$, that
is, for every natural number $n$, $\sum_{j=1}^{n} (b_j - a_j) < \epsilon$. In other words, a set has measure
zero if you can cover it with a sequence of intervals whose total length is as small
as you want.
In particular, any finite set consisting of $n$ real numbers has measure zero because
for any $\epsilon > 0$, each point $x$ in the set can be covered by the interval $(x - \frac{\epsilon}{3n}, x + \frac{\epsilon}{3n})$,
and the total length of these intervals is $\frac{2\epsilon}{3}$. Similarly, any countable set of real
numbers $\{x_1, x_2, x_3, \ldots\}$ can be covered by intervals $(x_j - \frac{\epsilon}{3 \cdot 2^j}, x_j + \frac{\epsilon}{3 \cdot 2^j})$, and the total
length of these intervals is $\sum_{j=1}^{\infty} \frac{2\epsilon}{3 \cdot 2^j} = \frac{2\epsilon}{3}$. Thus, the set of rational numbers, which
is countable, has measure zero. A similar argument shows that if $A_1, A_2, A_3, \ldots$ is
a sequence of sets all of which have measure zero, then the union $\bigcup_{j=1}^{\infty} A_j$ also has
measure zero. Indeed, given $\epsilon > 0$, for each $j$ you can cover $A_j$ with a sequence
of open intervals whose total length is less than $\frac{\epsilon}{2^j}$. Then the sequences of open
intervals can be combined into one sequence of intervals which covers $\bigcup_{j=1}^{\infty} A_j$ and has
total length less than $\sum_{j=1}^{\infty} \frac{\epsilon}{2^j} = \epsilon$.
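The claim that the covering intervals for a countable set have total length $\frac{2\epsilon}{3}$ can be checked with exact arithmetic on partial sums. The following is a minimal sketch; the function name `cover_lengths` and the choice $\epsilon = \frac{1}{100}$ are assumptions made for illustration.

```python
from fractions import Fraction

def cover_lengths(epsilon, count):
    """Lengths of the first `count` intervals (x_j - eps/(3*2^j), x_j + eps/(3*2^j))."""
    return [2 * epsilon / (3 * 2**j) for j in range(1, count + 1)]

epsilon = Fraction(1, 100)
partial = sum(cover_lengths(epsilon, 20))
# The partial sums increase toward 2*epsilon/3, so they stay below epsilon.
```

Every partial sum equals $\frac{2\epsilon}{3}\left(1 - \frac{1}{2^n}\right)$, which is the inequality the definition of measure zero asks for.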
Since any countable set of real numbers has measure zero, if a set does not
have measure zero, it must be an uncountable set. A natural question is whether
an uncountable set of real numbers can have measure zero. The answer to this
question is yes. The most famous example of this is known as the Cantor set which
is constructed as follows. The construction begins with the closed unit interval
$C_0 = [0, 1]$. At the first stage, the open interval of length $\frac{1}{3}$ is removed from the
middle of this set leaving two intervals each with length $\frac{1}{3}$ so that $C_1 = [0, \frac{1}{3}] \cup [\frac{2}{3}, 1]$.
At the second stage, open intervals of length $\frac{1}{9}$ are removed from the middle of each
of the two remaining intervals leaving four intervals each with length $\frac{1}{9}$ so that
$C_2 = [0, \frac{1}{9}] \cup [\frac{2}{9}, \frac{3}{9}] \cup [\frac{6}{9}, \frac{7}{9}] \cup [\frac{8}{9}, \frac{9}{9}]$. This process is repeated so that at stage $n$,
open intervals of length $\frac{1}{3^n}$ are removed from each of $2^{n-1}$ closed intervals of length
$\frac{1}{3^{n-1}}$ leaving $2^n$ closed intervals each with length $\frac{1}{3^n}$ (Fig. 6.3). The Cantor set $C$
is then defined to be the intersection of all of these $C_n$ sets, that is, $C = \bigcap_{n=1}^{\infty} C_n$.
The Cantor set is sometimes called the Cantor middle thirds set, because, at each
stage, the middle thirds of the remaining intervals are removed. Other similar types
of Cantor-like sets can be constructed by removing other portions of each interval.

[Fig. 6.3: Stage 0 through Stage 5 of the construction of the Cantor set]
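The stages of the construction are easy to generate with exact fractions. The sketch below is one possible way to do it; the function names `next_stage` and `cantor_stage` are hypothetical and chosen only for this example.

```python
from fractions import Fraction

def next_stage(intervals):
    """Remove the open middle third of each closed interval (lo, hi)."""
    result = []
    for lo, hi in intervals:
        third = (hi - lo) / 3
        result.append((lo, lo + third))
        result.append((hi - third, hi))
    return result

def cantor_stage(n):
    """Return the closed intervals making up C_n as (lo, hi) pairs of Fractions."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = next_stage(intervals)
    return intervals

c2 = cantor_stage(2)
total_length = sum(hi - lo for lo, hi in cantor_stage(5))
```

Here `c2` holds the four intervals of $C_2$, and `total_length` is the combined length of the $2^5$ intervals of $C_5$, namely $\frac{2^5}{3^5}$.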
It is clear that the Cantor set has measure zero because it is contained in $C_n$ which
is made up of $2^n$ closed intervals each with length $\frac{1}{3^n}$. The total length of the closed
intervals in $C_n$ is $\frac{2^n}{3^n}$, a quantity that goes to 0 as $n$ gets large. $C_n$ can be covered by
a finite collection of open intervals whose total length is 10 percent larger than $\frac{2^n}{3^n}$
showing that the Cantor set can be covered by open intervals whose total length is
as small as you want. So how do you show that the Cantor set is uncountable? To
see this, consider writing each number in the unit interval $[0, 1]$ in base three. The
numbers in the interval $[0, \frac{1}{3}]$ are the numbers between 0 and 1 whose base-three
representation begins with 0.0, and the numbers in the interval $[\frac{2}{3}, 1]$ are the numbers
between 0 and 1 whose base-three representation begins with 0.2. The numbers in
the middle third of the interval that are removed at the first stage of the construction
process are the numbers between 0 and 1 whose base-three representation begins
with 0.1. Note that the numbers at the endpoints of the removed interval, $\frac{1}{3}$ and $\frac{2}{3}$, each
have two different representations. Indeed, in base three $\frac{1}{3} = 0.1 = 0.0222\ldots$ and
$\frac{2}{3} = 0.2 = 0.1222\ldots$. One could say that $C_1$ consists of all the numbers between 0
and 1 that can be represented in base three without a 1 in the first place to the right
of the decimal point, the one-third place. Similarly, $C_2$ consists of the numbers between 0
and 1 that have a base-three representation with no 1 in either of the first two places
to the right of the decimal point. The Cantor set $C$ is the set of numbers between 0
and 1 that have a base-three representation that contains no digit equal to 1. Then
consider the map that takes each element of the Cantor set and divides it by 2. This
is an injective map that maps the numbers in the Cantor set to the set of numbers in
the unit interval that have base-three representations that include only the digits of
0 and 1 because it takes numbers with representations that only included the digits
of 0 and 2 and divides each of the digits by 2. Now, the numbers between 0 and
1 with base-three representations that include only the digits of 0 and 1 are clearly
in one-to-one correspondence with base-two representations of numbers in between
0 and 1. But all the real numbers between 0 and 1 have base-two representations
containing only 0 and 1, so the numbers in the Cantor set are as numerous as the
real numbers between 0 and 1. Thus, the Cantor set must be uncountable since the
set of real numbers between 0 and 1 is an uncountable set.
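The digit-halving map can be illustrated on finite expansions. This is a sketch under stated assumptions: the helper names `value_in_base` and `cantor_to_binary` are invented for the example, and only finitely many digits are handled, so it shows the idea rather than the full correspondence.

```python
from fractions import Fraction

def value_in_base(digits, base):
    """Value of 0.d1 d2 d3 ... (finitely many digits) in the given base."""
    return sum(Fraction(d, base ** (i + 1)) for i, d in enumerate(digits))

def cantor_to_binary(ternary_digits):
    """Halve each digit (0 -> 0, 2 -> 1) and read the result in base two."""
    if any(d not in (0, 2) for d in ternary_digits):
        raise ValueError("a Cantor-set expansion uses only the digits 0 and 2")
    return value_in_base([d // 2 for d in ternary_digits], 2)

x = value_in_base([0, 2, 2], 3)       # the Cantor-set point 0.022 in base three
image = cantor_to_binary([0, 2, 2])   # 0.011 read in base two
```

Dividing `x` by 2 gives the base-three number 0.011, and reading those same digits in base two gives `image`, a number in the unit interval.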
The concept of measure zero can be extended to sets in the plane, although here,
rather than being interested in the length of a set, the interest is in the area of the set.
Thus, rather than trying to cover a set with intervals whose total length is small, in
the plane one would try to cover a set with a sequence of squares whose total area
is small. Just as on the real line, it was taken as given that the length of an interval
$[a, b]$ was $b - a$, in the plane it will be taken as given that the area of a square with
side length $x$ is $x^2$. Then, a region in the plane is said to have measure zero (or area
zero) if for each $\epsilon > 0$, the set is contained in the union of a sequence of squares
whose total area is less than $\epsilon$.
As it was with sets of real numbers, any countable set of points in the plane has
area zero because, for any $\epsilon > 0$, you can cover the sequence $x_1, x_2, x_3, \ldots$ with a
sequence of squares with total area less than $\epsilon$. Moreover, let $Y$ be a line segment
with length $y > 0$. Then $Y$ has area zero. How would you prove this? Certainly, this
line segment is contained in a square with side length $y$ which has area $y^2$, so the
square’s area could be rather large and, in particular, the area of the square is not
zero. Notice, though, that $Y$ can also be covered by two side-by-side squares each
with side length $\frac{y}{2}$ and each with area $\frac{y^2}{4}$ giving a total area of the two squares equal
to $\frac{y^2}{2}$. This is the key to covering $Y$ with squares with very small total area. If $Y$ is
covered by a sequence of $n$ adjacent squares each with side length $\frac{y}{n}$, then the total
area of the $n$ squares is $\frac{y^2}{n}$. Since $\frac{y^2}{n}$ can be made arbitrarily small by choosing $n$
large, it follows that $Y$ has measure zero (Fig. 6.4).
Fig. 6.4 Covering a line segment with smaller and smaller squares
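To see how large $n$ must be for a given $\epsilon$, the covering argument can be turned into a short calculation. The function names `squares_needed` and `total_area`, and the sample values $y = 3$ and $\epsilon = \frac{1}{10}$, are assumptions made for this sketch.

```python
import math
from fractions import Fraction

def squares_needed(y, epsilon):
    """Smallest n with y^2 / n < epsilon, for exact (Fraction or int) inputs."""
    return math.floor(Fraction(y) ** 2 / Fraction(epsilon)) + 1

def total_area(y, n):
    """Total area of n adjacent squares of side y / n covering the segment."""
    return n * (Fraction(y) / n) ** 2

n = squares_needed(3, Fraction(1, 10))
```

With these values, `total_area(3, n)` falls just below $\epsilon$, while one square fewer does not suffice.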
6.3.1 Exercises
1. Rather than constructing the Cantor set only on the interval $[0, 1]$, perform the
same construction on each interval $[n, n + 1]$ for every integer $n$. Show that the
resulting set has measure zero.
2. Beginning with the interval $[0, 1]$ construct a Cantor-like set, but instead of
removing intervals of length $\frac{1}{3}$ at stage 1, $\frac{1}{9}$ at stage 2, and so forth removing
intervals of length $\frac{1}{3^n}$ at stage $n$, you remove an interval of length $\frac{1}{4}$ at stage 1,
intervals of length $\frac{1}{16}$ at stage 2, and so forth removing intervals of length $\frac{1}{4^n}$ at
stage $n$. Show that the total length of the intervals remaining after stage $n$ does
not approach zero as $n$ approaches infinity.
3. Which of the following sets of real numbers have measure zero?
(a) the integers
(b) the irrational numbers
(c) $\{a + b\sqrt{2} + c\sqrt{3} \mid a, b, c \text{ are integers}\}$
(d) the Cantor-like set where instead of removing the middle $\frac{1}{3}$ of each
remaining interval at stage $n$, you remove the middle $\frac{1}{4}$ of each remaining
interval
4. Show that if the set $A$ has measure zero and $B \subseteq A$, then the set $B$ has measure
zero.
5. Show that a line in the plane has area zero.
6. Show that the set in the plane $\{(x, y) \mid x \text{ is rational}\}$ has area zero.
7. Suppose that the set $A \subseteq \mathbb{R}$ has measure zero. Show that the set $\{(x, y) \mid x \in A,\ y \in [0, 1]\}$
has area zero.
8. Suppose that the set $A \subseteq \mathbb{R}$ has measure zero. Show that the set $\{(x, y) \mid x \in A\}$
has area zero.
9. Show that $\{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1,\ \text{at least one of } x \text{ or } y \text{ is rational}\}$ has
area zero.
10. Show that the interval $[0, 1]$ does not have measure zero. (Hint: Use the Heine–
Borel Theorem to reduce any cover to a finite subcover.)
6.4 Areas in the Plane
When discussing area, it is not possible to avoid the limit concept, and this brings
a topic usually associated with Geometry into the field of Analysis. One could
even make a case for including much of Geometry as a subtopic of Analysis since
Geometry involves properties of distance, a distinguishing feature of Analysis.
What properties of area can be taken as given? One would hope that whatever
axioms are chosen, they would let you prove results about area that you know to be
true from Euclidean Geometry. The following axioms accomplish this.
Axioms 1, 2, and 3 should agree with what you know about area from Geometry,
and they can be used to prove some simple results. For example, since a $1 \times 1$ square
has area 1, Axiom 3 can be used to show that an $s \times s$ square has area $s^2$.
The result from the previous section that a line segment has area 0 is particularly
useful because of the way it can be used in conjunction with the Union area axiom.
In particular, suppose A and B are two squares or other polygons set side-by-side so
that they only share an edge. Because the shared edge is a line segment, it has 0 for
its area, and the Union area axiom shows that A [ B has an area equal to the sum of
the area of A and the area of B. By using mathematical induction, this result can be
extended to the union of many polygons that share borders. In particular, consider
finding the area of a rectangle with width $x$ and length $y$. If $\frac{x}{y}$ is a rational number
equal to $\frac{p}{q}$, where $p$ and $q$ are positive integers, then the $x \times y$ rectangle is the union
of $p \cdot q$ squares all with side length $\frac{x}{p}$. Indeed, the width of the rectangle which has
length $x$ is spanned by $p$ such squares, and the length of the rectangle which has
length $y$ is spanned by $q$ such squares showing that the entire rectangle can be tiled
by a $p \times q$ array of squares, each with area $\left(\frac{x}{p}\right)^2$. The Union axiom then shows that
the area of the $x \times y$ rectangle is $p \cdot q \cdot \left(\frac{x}{p}\right)^2 = x \cdot \frac{q}{p} x = x \cdot y$. It will require
the last of the area axioms to conclude that the area of any rectangle is equal to its
length times its width even when the length of the rectangle is an irrational multiple
of its width.
The last area axiom is essentially the Method of Exhaustion, used occasionally by
Euclid and much more extensively by Archimedes to calculate areas and volumes.
It is an example of a use of Calculus about 1800 years before the foundation of
Calculus was formally established by Newton and Leibniz. This axiom says that if
a region in the plane can be closely approximated by sets whose areas you know,
then you can figure out the area of the region. Take, for example, a rectangle B with
width $x > 0$ and length $y > 0$ where the ratio $\frac{y}{x} = \alpha$ is irrational. It is certainly
possible to find other rectangles close to the size of $B$ whose length to width ratios
are rational. To prove that $B$ has area $xy$, the axiom requires that for each $\epsilon > 0$ you
find a subset $A \subseteq B$ whose area is greater than $xy - \epsilon$ and a set $C$ containing $B$ whose
area is less than $xy + \epsilon$. Suppose you choose $A$ to be a rectangle with width $x$ and
length just a bit short of $y$, say $rx$, where $r$ is a rational number chosen to be less
than but suitably close to $\frac{y}{x}$. How close is suitably close? Well, you would need the
area of $A$, which is $x \cdot rx = rx^2$, to be within $\epsilon$ of $xy$, that is, $xy - rx^2 < \epsilon$. Solving
for $r$ shows that $r > \frac{y}{x} - \frac{\epsilon}{x^2}$. Is there such an $r$ which is rational and between $\frac{y}{x} - \frac{\epsilon}{x^2}$
and $\frac{y}{x}$? Of course there is. The rational numbers are dense in the real line; there are
rational numbers in every interval of positive length. Thus, you can select a rational
number $r$ between $\frac{y}{x} - \frac{\epsilon}{x^2}$ and $\frac{y}{x}$ and let $A$ be an $x \times rx$ rectangle. Then $A$ can be
placed inside of $B$, and the area of $A$ is within $\epsilon$ of $xy$. Similarly, you can choose a
rectangle $C$ with width $x$ and length $sx$, where $s$ is a rational number chosen to be
greater than but suitably close to $\frac{y}{x}$. You need the area of $C$ to be within $\epsilon$ of $xy$, so
choose $s$ so that $x \cdot sx - xy < \epsilon$. This happens if $\frac{y}{x} < s < \frac{y}{x} + \frac{\epsilon}{x^2}$. Since you have
found a rectangle $A$ contained inside $B$ and a rectangle $C$ containing $B$ with the areas
of $A$ and $C$ within $\epsilon$ of $xy$, the Exhaustion area axiom shows that $B$ has area $xy$.
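One concrete way to pick the rational numbers $r$ and $s$ is to use a common denominator $N$ large enough that $\frac{1}{N} < \frac{\epsilon}{x^2}$. The sketch below does this with floating-point values standing in for the irrational ratio; the function name `bracketing_ratios`, the denominator strategy, and the sample rectangle with $x = 1$ and $y = \sqrt{2}$ are all assumptions made for illustration and are not the text's prescription.

```python
import math
from fractions import Fraction

def bracketing_ratios(x, y, epsilon):
    """Return rationals r < y/x < s with r*x^2 > x*y - epsilon and s*x^2 < x*y + epsilon.

    Assumes y / x is irrational, so it is never exactly a multiple of 1/N.
    """
    alpha = y / x
    big_n = math.floor(x * x / epsilon) + 1     # guarantees 1/N < epsilon / x^2
    r = Fraction(math.floor(alpha * big_n), big_n)
    s = r + Fraction(1, big_n)
    return r, s

x, y, epsilon = 1.0, math.sqrt(2), 0.001
r, s = bracketing_ratios(x, y, epsilon)
area_inner = x * (float(r) * x)   # area of A, an x-by-rx rectangle
area_outer = x * (float(s) * x)   # area of C, an x-by-sx rectangle
```

Because $s - r = \frac{1}{N}$, both `area_inner` and `area_outer` land within $\epsilon$ of $xy$, which is what the Exhaustion axiom needs.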
The familiar formula for the area of a triangle given as one half the base times the
height can be derived geometrically, but to prove this formula using the area axioms
requires more work. To begin, consider a right triangle with legs with lengths x and
y. Place this triangle in a rectangle with side lengths x and y. For any natural number
$n$, the rectangle can be overlaid with an $n \times n$ grid of rectangles with side lengths
$\frac{x}{n}$ and $\frac{y}{n}$. The hypotenuse of the triangle is the diagonal of the $x \times y$ rectangle and
spans the diagonals of $n$ of the smaller rectangles as shown in Fig. 6.5 exhibiting the
case where $n = 8$.
Because there are $n$ grid rectangles along the hypotenuse of the triangle, it
must be that there are $\frac{n^2 - n}{2}$ grid rectangles inside the triangle with a total area
of $\frac{n^2 - n}{2} \cdot \frac{xy}{n^2} = \left(1 - \frac{1}{n}\right) \frac{xy}{2}$. Similarly, the triangle is enclosed inside the union of
$\frac{n^2 + n}{2}$ grid rectangles with a total area of $\left(1 + \frac{1}{n}\right) \frac{xy}{2}$. Clearly, $n$ can be chosen large
enough to make both the total area of grid rectangles inside the triangle and the total
area of grid rectangles enclosing the triangle within a particular $\epsilon > 0$ of $\frac{xy}{2}$. Thus,
the Exhaustion axiom shows that the area of the triangle is $\frac{xy}{2}$ as expected. Since any
triangle can be partitioned into two right triangles, the well-known area formula for
the area of a triangle follows. Since any polygon can be partitioned into triangles,
the usual formulas from Geometry for the areas of polygons can be derived in the
same way they would be in Geometry.
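The two counts of grid rectangles in the triangle argument can be confirmed by brute force. In the sketch below, the function names `grid_counts` and `bounding_areas`, the cell indexing, and the sample legs 3 and 4 are assumptions made for the example.

```python
from fractions import Fraction

def grid_counts(n):
    """Count cells of an n-by-n grid lying inside, and needed to enclose, the triangle.

    Cell (i, j) is the i-th column and j-th row, 0 <= i, j < n. In grid units the
    triangle is the part of the square on or below the diagonal v = u.
    """
    inside = sum(1 for i in range(n) for j in range(n) if j + 1 <= i)
    enclosing = sum(1 for i in range(n) for j in range(n) if j <= i)
    return inside, enclosing

def bounding_areas(x, y, n):
    """Total areas of the inside cells and of the enclosing cells."""
    inside, enclosing = grid_counts(n)
    cell = Fraction(x) * Fraction(y) / n**2
    return inside * cell, enclosing * cell

lower, upper = bounding_areas(3, 4, 8)
```

For $n = 8$ the counts agree with $\frac{n^2 - n}{2}$ and $\frac{n^2 + n}{2}$, and `lower` and `upper` bracket the expected area $\frac{xy}{2} = 6$.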
You may wonder whether these techniques can be used to find the area of any
region in the plane, or at least any bounded region in the plane. This is a really good
question with a very complicated answer. The Area Axioms listed in this section
are designed to give the reader a feel for proofs about areas that will be useful
in the upcoming discussion of proofs about Riemann integrals. The axiom list is
not complete enough to allow the calculation of the area of many of the sets that
one might encounter. The area of Analysis known as Measure Theory provides a
somewhat richer environment for this study, but the complexities of measure theory
go beyond the aim of this text. What can be said is that even with the use of measure
theory, there are sets in the plane complex enough that one cannot assign an area
measure to them.
6.4.1 Exercises
6.5 Definition of Riemann Integral
Fig. 6.6 Approximating the area under a curve with narrowing rectangles
If the function is in some sense well behaved, then as the widths of these rectangles
are chosen to be smaller and smaller, the total area of the rectangles will approach
the area of the region (Fig. 6.6). What is meant by well behaved will be a main focus
of the theorems presented in this chapter.
To make the definition of Riemann Integral precise, there needs to be a way to
talk about the placement of the vertical rectangles used to approximate the area
under a curve. This is done by designating the position of the vertical sides of the
rectangles with a collection of $x$ values in the interval $[a, b]$. One defines a partition
of the interval $[a, b]$ to be a finite sequence of $x$ values $\mathcal{P}: a = x_0 \le x_1 \le
x_2 \le \cdots \le x_n = b$ for some natural number $n$. These values of $x$ break the interval
$[a, b]$ into $n$ subintervals $[x_{j-1}, x_j]$. Note that the definition of partition does not say
anything about the lengths of the subintervals for the partition. Indeed, the $j$th
subinterval length $x_j - x_{j-1}$ could be 0 or could be as large as $b - a$.
In particular, there is no requirement that all the interval lengths be the same size.
Since the lengths of the subintervals $x_j - x_{j-1}$ are used frequently in the discussion
of Riemann Integrals, one often uses the shorthand notation $\Delta x_j = x_j - x_{j-1}$.
Given a partition, $\mathcal{P}: a = x_0 \le x_1 \le x_2 \le \cdots \le x_n = b$, one defines the
norm of the partition $\mathcal{P}$, $\|\mathcal{P}\|$, to be the maximum length of a subinterval of the
partition, that is, $\|\mathcal{P}\| = \max_{j \le n} \Delta x_j$. For example, if $[a, b] = [1, 4]$ has the partition
$1, 1\frac{1}{2}, 1\frac{3}{4}, 2\frac{1}{2}, 2\frac{2}{3}, 2\frac{2}{3}, 3\frac{1}{4}, 3\frac{1}{2}, 3\frac{5}{6}, 4$, then the norm of the partition is $\frac{3}{4} = 2\frac{1}{2} - 1\frac{3}{4}$, the
largest distance between any two of the adjacent points in the partition. As seen in
the previous section, one can get increasingly better approximations to the area of a
region by attempting to approximate the region by smaller and smaller polygons.
Thus, by requiring the norm of a partition to be smaller, the rectangles used to
approximate the area of a region bounded by a curve become smaller in width and
can give a better approximation.
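The norm of the example partition of $[1, 4]$ can be computed directly. This is a small sketch; the function name `partition_norm` is an assumption for the example, and the partition points are the ones listed in the text.

```python
from fractions import Fraction

def partition_norm(points):
    """Largest subinterval length of a partition given as a nondecreasing list."""
    if any(b < a for a, b in zip(points, points[1:])):
        raise ValueError("partition points must be nondecreasing")
    return max(b - a for a, b in zip(points, points[1:]))

F = Fraction
example = [F(1), F(3, 2), F(7, 4), F(5, 2), F(8, 3), F(8, 3),
           F(13, 4), F(7, 2), F(23, 6), F(4)]
norm = partition_norm(example)
```

The repeated point $2\frac{2}{3}$ gives a subinterval of length 0, which the definition of a partition allows.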
For the Riemann Integral, a partition will determine the widths of the rectangles
used to approximate the area of a region. What will be used as the lengths of
those rectangles? Suppose a rectangle rests on the $x$-axis between $x_{j-1}$ and $x_j$. If
the rectangle is going to fit inside the region bounded by the curve $y = f(x)$, then
the length of the rectangle (its height above the $x$-axis) cannot exceed $\inf_{x_{j-1} \le x \le x_j} f(x)$.
If the rectangle is going to enclose the part of the region between $x_{j-1}$ and $x_j$, then the
length of the rectangle must be at least $\sup_{x_{j-1} \le x \le x_j} f(x)$. The definition of the Riemann
Integral uses a value between these two possible extremes. It requires the choice of
a sequence of $x$ values $\xi_1, \xi_2, \xi_3, \ldots, \xi_n$ with $x_{j-1} \le \xi_j \le x_j$ for each $j$. Then the
rectangle on $[x_{j-1}, x_j]$ is given the length $f(\xi_j)$ so that it has area $f(\xi_j) \Delta x_j$. Clearly,
the choice of $\xi_j \in [x_{j-1}, x_j]$ results in the length of the rectangle being $f(\xi_j)$ which
is between the two extremes $\inf_{x_{j-1} \le x \le x_j} f(x)$ and $\sup_{x_{j-1} \le x \le x_j} f(x)$, so the rectangles that
result might neither be contained in the region bounded by the curve nor cover the
region. Instead, the lengths of the rectangles are allowed to be in between these two
extremes. The total area of all the rectangles is then given by the Riemann Sum
$\sum_{j=1}^{n} f(\xi_j) \Delta x_j$.
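A Riemann sum is simple to compute once the partition points and the chosen values in each subinterval are given. In the sketch below, the function name `riemann_sum`, the term "tags" for the chosen values, and the sample partition of $[1, 3]$ are assumptions made for illustration.

```python
def riemann_sum(f, points, tags):
    """Sum of f(tag_j) * (x_j - x_{j-1}) over the subintervals of the partition."""
    if len(tags) != len(points) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0
    for left, right, tag in zip(points, points[1:], tags):
        if not left <= tag <= right:
            raise ValueError("each tag must lie in its subinterval")
        total += f(tag) * (right - left)
    return total

points = [1, 1.5, 2, 2.5, 3]                                # a partition of [1, 3]
left_sum = riemann_sum(lambda x: x, points, points[:-1])    # tags at left endpoints
right_sum = riemann_sum(lambda x: x, points, points[1:])    # tags at right endpoints
```

Since $f(x) = x$ is increasing, the left and right endpoints give the smallest and largest heights on each subinterval, so every other choice of tags produces a sum between `left_sum` and `right_sum`.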
Now, given a function $f$ defined on the interval $[a, b]$, one can define the Riemann
Integral of $f$ on $[a, b]$ to be $I = \int_a^b f(x)\,dx$ if for every $\epsilon > 0$ there is a $\delta > 0$
such that for every partition $\mathcal{P}: a = x_0 \le x_1 \le x_2 \le \cdots \le x_n = b$ of $[a, b]$
with $\|\mathcal{P}\| < \delta$ and for every choice of $\xi_1, \xi_2, \xi_3, \ldots, \xi_n$ with $\xi_j \in [x_{j-1}, x_j]$, it
follows that $\left| \sum_{j=1}^{n} f(\xi_j) \Delta x_j - I \right| < \epsilon$. If $\int_a^b f(x)\,dx$ exists, then $f$ is said to be integrable
(or Riemann integrable) on the interval $[a, b]$. The function $f$ in the integral is
called the integrand. When the integrand $f$ is a nonnegative function, this definition
results in a value for $I$ that can be considered the area of the region bounded by the
$x$-axis, the lines $x = a$ and $x = b$, and the curve $y = f(x)$. When $f$ is allowed to take
on both positive and negative values, the value of $I$ can be thought of as the area of
the region lying above the $x$-axis minus the area of the region lying below the $x$-axis.
The power of the definition of Riemann Integral is that it need not be associated
with area at all. The student may well be familiar with other applications to the
determination of moments, work, force, speed, distances, interest rates, populations,
and many other examples. It is convenient to extend the definition of Riemann
Integral to $\int_a^b f(x)\,dx$ where $b < a$ with the convention $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$.
Note that the definition of Riemann Integral, similar to the definitions of limit and
derivative, states that the integral of $f$ between the numbers $a$ and $b$ is $I$ if for every
$\epsilon > 0$ there is a $\delta > 0$ such that a particular inequality holds. But unlike previous
kinds of limits, the inequality that must hold for Riemann sums is supposed to be
true for every choice of a partition $\mathcal{P}$ and every choice of $\xi_j$’s as long as $\|\mathcal{P}\| < \delta$.
Thus, it is not just that a region in the plane is being approximated by a sequence
of rectangles, but that the region must be closely approximated by every possible
sequence of rectangles that arise from the Riemann sum $\sum_{j=1}^{n} f(\xi_j) \Delta x_j$. Also worth
noting is that the Riemann Integral is not the only way to define integration. Most
of the other definitions give the same value as the Riemann Integral for functions
where the Riemann Integral exists, but some of the other definitions give values to
integrals in situations where the Riemann Integral does not exist. Some examples of
other integration definitions include the Riemann–Stieltjes Integral, the Lebesgue
Integral, the Darboux Integral, and the Daniell Integral.
There are some fairly easy to describe functions that do not have a Riemann
integral. One simple example is the function
$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational} \\ 0 & \text{if } x \text{ is irrational} \end{cases}$ whose
Riemann integral is not defined on any interval $[a, b]$ with $a < b$. To see why this is,
consider any Riemann sum $\sum_{j=1}^{n} f(\xi_j) \Delta x_j$. Because both the rational numbers and the
irrational numbers are dense in the real numbers, in any subinterval of the partition
which has positive length, there are values of $\xi_j$ in the subinterval where $f(\xi_j) = 0$,
and other values of $\xi_j$ in the subinterval where $f(\xi_j) = 1$. Thus, for any partition,
there are choices of the $\xi_j$’s that make the Riemann sum equal to $\sum_{j=1}^{n} 0 \cdot \Delta x_j = 0$ and
other choices that make the Riemann sum equal to $\sum_{j=1}^{n} 1 \cdot \Delta x_j = b - a > 0$. Thus,
no limit can exist.
6.5.1 Exercises
1. Let $f(x) = x$. Partition the interval $[1, 3]$ into $n$ subintervals with $1 = x_0$ and
$x_j = x_{j-1} + \frac{2}{n}$ for $j = 1, 2, 3, \ldots, n$.
(a) Find the minimum and maximum possible values for an associated Riemann
sum $\sum_{j=1}^{n} f(\xi_j) \Delta x_j$.
(b) Show that as $n$ gets large, the Riemann sum must approach 4.
2. Let $f(x) = x^2$. Partition the interval $[-1, 2]$ into $n$ subintervals with $-1 = x_0$ and
$x_j = x_{j-1} + \frac{3}{n}$ for $j = 1, 2, 3, \ldots, n$.
(a) Find the minimum and maximum possible values for an associated Riemann
sum $\sum_{j=1}^{n} f(\xi_j) \Delta x_j$.
(b) Show that as $n$ gets large, the Riemann sum must approach 3.
6.6 Properties of Integrals
There are many theorems about the properties satisfied by Riemann integrals. Some
of the proofs of these theorems merely rely on properties of summations since
the definition of the Riemann Integral is based on the Riemann sum, $\sum_{j=1}^{n} f(\xi_j) \Delta x_j$.
Consider the following results.
• If $a$, $b$, and $c$ are constants, then $\int_a^b c\,dx = c(b - a)$.
• If $f$ is an integrable function on the interval $[a, b]$ and $c$ is a constant, then
$\int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx$.
• If $f$ and $g$ are functions integrable on the interval $[a, b]$, then
$\int_a^b (f + g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.
• If $f$ and $g$ are functions integrable on the interval $[a, b]$, and $f(x) \le g(x)$ for all
$x \in [a, b]$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
To prove that $\int_a^b c\,dx = c(b - a)$, one needs to find a $\delta > 0$ so that if the norm of a
partition is less than $\delta$, then the Riemann sum $\sum_{j=1}^{n} f(\xi_j) \Delta x_j$ is within some $\epsilon > 0$ of
$c(b - a)$. But in this case $f(\xi_j)$ is always equal to the constant $c$, so the Riemann sum
is always equal to the desired integral, $c(b - a)$. This makes the proof particularly
easy.
Note that the first four steps of this proof merely set up the assumptions required
by the definition of the Riemann Integral. That is, one needs to have constants
$a$ and $b$ and function $f$ defined on the interval $[a, b]$. Then one needs to take an
arbitrary $\epsilon > 0$, find an appropriate $\delta > 0$, and consider an arbitrary Riemann
sum which satisfies the needed condition on the norm of the partition. Although
straightforward, these steps are necessary in order to show that the definition of
Riemann Integral is being satisfied.
PROOF: If $a$, $b$, and $c$ are constants, then $\int_a^b c\,dx = c(b - a)$.
Now consider the next theorem which states that $\int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx$.
In the proof of this result you will need to use the fact that $\int_a^b f(x)\,dx = I$ to
say something about the size of $\left| \sum_{j=1}^{n} c f(\xi_j) \Delta x_j - cI \right|$. But this expression equals
$|c| \left| \sum_{j=1}^{n} f(\xi_j) \Delta x_j - I \right|$ suggesting that if you can arrange for $\left| \sum_{j=1}^{n} f(\xi_j) \Delta x_j - I \right|$ to be
small, then you can arrange for the product $|c| \left| \sum_{j=1}^{n} f(\xi_j) \Delta x_j - I \right|$ to be small. You
will need the product to be less than some given $\epsilon > 0$, so it is tempting to require
$\left| \sum_{j=1}^{n} f(\xi_j) \Delta x_j - I \right| < \frac{\epsilon}{|c|}$. This is fine except for the embarrassing case where $c = 0$.
One could handle this problem by breaking the proof into two cases: $c = 0$ and
$c \ne 0$. Easier, though, is to simply ask for $\left| \sum_{j=1}^{n} f(\xi_j) \Delta x_j - I \right|$ to be less than $\frac{\epsilon}{|c| + 1}$.
The use of $|c| + 1$ in the denominator is just a trick that takes care of the case where
$|c|$ is large and the case where $|c|$ is 0 both at the same time. Of course, you can
arrange $\left| \sum_{j=1}^{n} f(\xi_j) \Delta x_j - I \right| < \frac{\epsilon}{|c| + 1}$ because that follows from $\int_a^b f(x)\,dx = I$.
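The effect of using $|c| + 1$ in the denominator can be checked for several constants at once. The sketch below is illustrative; the function name `inner_tolerance` and the sample values of $c$ and $\epsilon$ are assumptions for the example.

```python
from fractions import Fraction

def inner_tolerance(epsilon, c):
    """Tolerance to demand of |sum - I| so that |c| * |sum - I| < epsilon."""
    return Fraction(epsilon) / (abs(Fraction(c)) + 1)

epsilon = Fraction(1, 100)
checks = {c: abs(Fraction(c)) * inner_tolerance(epsilon, c) < epsilon
          for c in (0, 1, -5, Fraction(1, 3), 1000)}
```

Every entry of `checks` is true, including the one for $c = 0$, where dividing by $|c|$ alone would fail.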
The third theorem in this section can be summarized by saying that the integral
of a sum is the sum of the integrals. Its proof is reminiscent of the proof of
the theorem stating that the limit of a sum is the sum of the limits, and of the
theorem stating that the derivative of a sum is the sum of the derivatives. In this
case, you are given that $\int_a^b f(x)\,dx = I$ and $\int_a^b g(x)\,dx = J$ and are then faced with
the distance that the Riemann sum for $f + g$ is from the value of the integral
$I + J$ given by $\left| \sum_{j=1}^{n} (f + g)(\xi_j) \Delta x_j - (I + J) \right|$. This easily breaks into the two
differences $\left| \left( \sum_{j=1}^{n} f(\xi_j) \Delta x_j - I \right) + \left( \sum_{j=1}^{n} g(\xi_j) \Delta x_j - J \right) \right|$. The existence of the two
given integrals then lets you choose a value of $\delta > 0$ that will ensure that the
two parts to this sum are both small.
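PROOF: If $f$ and $g$ are integrable functions on the interval $[a, b]$, then
$\int_a^b (f + g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.
• Let interval $[a, b]$ be given, and let $f$ and $g$ be integrable functions on $[a, b]$
with $\int_a^b f(x)\,dx = I$ and $\int_a^b g(x)\,dx = J$.
• Let $\epsilon > 0$ be given.
• From the definition of Riemann Integral, there is a $\delta_1 > 0$ such that if
$\mathcal{P}: a = x_0 \le x_1 \le x_2 \le \cdots \le x_n = b$ is a partition with $\|\mathcal{P}\| < \delta_1$, then
for every choice of $\xi_j \in [x_{j-1}, x_j]$, $\left| \sum_{j=1}^{n} f(\xi_j) \Delta x_j - I \right| < \frac{\epsilon}{2}$.
• Similarly, there is a $\delta_2 > 0$ such that if $\mathcal{P}: a = x_0 \le x_1 \le x_2 \le \cdots \le
x_n = b$ is a partition with $\|\mathcal{P}\| < \delta_2$, then for every choice of $\xi_j \in [x_{j-1}, x_j]$,
$\left| \sum_{j=1}^{n} g(\xi_j) \Delta x_j - J \right| < \frac{\epsilon}{2}$.
• Let $\delta = \min(\delta_1, \delta_2)$.
• Let $\mathcal{P}: a = x_0 \le x_1 \le x_2 \le \cdots \le x_n = b$ be a partition of $[a, b]$ with
$\|\mathcal{P}\| < \delta$, and let $\xi_j$’s be chosen with $\xi_j \in [x_{j-1}, x_j]$.
• Then $\left| \sum_{j=1}^{n} (f + g)(\xi_j) \Delta x_j - (I + J) \right| =
\left| \left( \sum_{j=1}^{n} f(\xi_j) \Delta x_j - I \right) + \left( \sum_{j=1}^{n} g(\xi_j) \Delta x_j - J \right) \right|
\le \left| \sum_{j=1}^{n} f(\xi_j) \Delta x_j - I \right| + \left| \sum_{j=1}^{n} g(\xi_j) \Delta x_j - J \right| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$.
• Thus, $\int_a^b (f + g)(x)\,dx = I + J = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.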
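The algebraic step at the heart of this proof, that a Riemann sum for $f + g$ splits into the Riemann sums for $f$ and for $g$ on the same partition with the same chosen points, can be verified exactly on a small case. The functions, partition, and chosen points below are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def riemann_sum(f, points, tags):
    """Sum of f(tag_j) * (x_j - x_{j-1}) for a partition with chosen points `tags`."""
    return sum(f(t) * (b - a) for a, b, t in zip(points, points[1:], tags))

F = Fraction
points = [F(0), F(1, 4), F(1, 2), F(1)]     # hypothetical partition of [0, 1]
tags = [F(1, 8), F(1, 2), F(3, 4)]          # one chosen point in each subinterval

f = lambda x: x * x
g = lambda x: 3 * x + 1
sum_f = riemann_sum(f, points, tags)
sum_g = riemann_sum(g, points, tags)
sum_fg = riemann_sum(lambda x: f(x) + g(x), points, tags)
```

The equality `sum_fg == sum_f + sum_g` holds for any partition and any chosen points, which is why a single $\delta$ can control both parts of the sum.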
The final theorem in this section states that if $f(x) \le g(x)$ for all $x \in [a, b]$, then if
the functions are integrable, $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$. It is sufficient to prove this result
when $f$ is the identically 0 function, because if $h(x) \ge 0$ implies $\int_a^b h(x)\,dx \ge 0$, this
would imply that if $f(x) \le g(x)$, then $h(x) = g(x) - f(x) \ge 0$, so $\int_a^b h(x)\,dx \ge 0$.
From there $\int_a^b (g - f)(x)\,dx = \int_a^b g(x)\,dx - \int_a^b f(x)\,dx \ge 0$ and the needed result
follows. With the assumption that $h(x) \ge 0$ for all $x \in [a, b]$, it is not hard to
show that $\int_a^b h(x)\,dx \ge 0$, because the value of every associated Riemann sum must
be nonnegative. How do you turn this into a proof? Recall how the proof went
when showing that if $f(x) \ge 0$, then $\lim_{x \to a} f(x)$ cannot be negative. If you assume that
the limit $L$ is negative, then you can choose an $\epsilon = -\frac{L}{2}$. If $f$ is never negative, it
follows that $|f(x) - L|$ is always greater than $\epsilon$ giving a contradiction. A very similar
argument works here where $f$ is replaced by the Riemann sum.
PROOF: If $f$ and $g$ are functions integrable on the interval $[a, b]$, and
$f(x) \le g(x)$ for all $x \in [a, b]$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
• Let interval $[a, b]$ be given, and let $f$ and $g$ be integrable functions on $[a, b]$
with $f(x) \le g(x)$ for all $x \in [a, b]$.
• Define $h(x) = g(x) - f(x)$ which is greater than or equal to 0 for all $x \in [a, b]$.
Since $f$ and $g$ are integrable, so is $h$, and $\int_a^b h(x)\,dx = \int_a^b g(x)\,dx - \int_a^b f(x)\,dx$.
Thus, it suffices to prove that $\int_a^b h(x)\,dx \ge 0$.
• Assume instead that $\int_a^b h(x)\,dx = I < 0$, and let $\epsilon = -\frac{I}{2} > 0$.
• From the definition of Riemann Integral, there is a $\delta > 0$ such that if
$\mathcal{P}: a = x_0 \le x_1 \le x_2 \le \cdots \le x_n = b$ is a partition with $\|\mathcal{P}\| < \delta$, then for
every choice of $\xi_j \in [x_{j-1}, x_j]$, $\left| \sum_{j=1}^{n} h(\xi_j) \Delta x_j - I \right| < \epsilon$.
• But for every choice of $\xi_j$, $h(\xi_j) \ge 0$, so $\left| \sum_{j=1}^{n} h(\xi_j) \Delta x_j - I \right| =
\sum_{j=1}^{n} h(\xi_j) \Delta x_j - I \ge -I > \epsilon$.
• This contradicts the previous step, so the assumption that $I < 0$ is false, which
completes the proof.
6.6.1 Exercises
2. If $f$ is a function integrable on $[a, b]$ with $f(x) \le c$ for all $x \in [a, b]$, then
$\int_a^b f(x)\,dx \le c(b - a)$.
3. If $f$ is a function such that both $f$ and $|f|$ are integrable on $[a, b]$, then
$\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx$.
Knowing that integrable functions must be bounded is very helpful. If you can
claim that $|f(x)| \le M$ for some constant $M$, then you know that any one term of a
Riemann sum $\sum_{j=1}^n f(\xi_j)\Delta x_j$ can contribute at most $M\,\Delta x_j$ to the sum. By forcing
the norm of the partition, $\|P\|$, to be very small, you can control the maximum
size of $\Delta x_j$ and, thus, the maximum size of a term in the Riemann sum. This is the
key idea behind the proof of the next theorem which states that if $f$ is integrable
on $[a, b]$ and on $[b, c]$, then $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$. To prove this, it is
natural to consider finding a $\delta_1 > 0$ so that Riemann sums arising from partitions of
$[a, b]$ with norm less than $\delta_1$ are close to $I = \int_a^b f(x)\,dx$ and finding a $\delta_2 > 0$ so that
Riemann sums arising from partitions of $[b, c]$ with norm less than $\delta_2$ are close to
$J = \int_b^c f(x)\,dx$. You would consider allowing $\delta$ to equal the minimum of $\delta_1$ and $\delta_2$.
Then you could take a partition of $[a, c]$ with a norm less than $\delta$. Unfortunately, this
partition of $[a, c]$ does not separate into a partition of $[a, b]$ and a partition of $[b, c]$
because there is no guarantee that the given partition of $[a, c]$ includes the point $b$ as
one of the $x_j$ values in the partition. But if you change the Riemann sum by altering
the interval of the partition containing the point $b$ by adding $b$ as an extra point to
the partition, you are not making a large change in the total sum. More precisely,
suppose the partition is $P: a = x_0 \le x_1 \le x_2 \le \cdots \le x_n = c$ with the point $b$ in
the interval $[x_{k-1}, x_k]$. A resulting Riemann sum has the term $f(\xi_k)(x_k - x_{k-1})$. If this
term is replaced by two terms $f(b)(b - x_{k-1}) + f(b)(x_k - b)$, how much does this
change the Riemann sum? The change is exactly
$f(b)(b - x_{k-1}) + f(b)(x_k - b) - f(\xi_k)(x_k - x_{k-1}) = (f(b) - f(\xi_k))(x_k - x_{k-1})$.
Given that $f$ is integrable on $[a, b]$ and on $[b, c]$, you know that there is a bound $M$
such that $|f(x)| \le M$ for all $x \in [a, c]$.
An upper bound for the size of this change is, therefore, $2M(x_k - x_{k-1}) < 2M\delta$. This
says that by choosing $\delta$ small enough, you can control the amount of change made
in the Riemann sum by introducing $b$ as a point in the partition of $[a, c]$. If $\delta$ is also
chosen small enough so that the resulting Riemann sums for $\int_a^b f(x)\,dx$ and $\int_b^c f(x)\,dx$
are close to the corresponding integrals, then the total difference between the original
Riemann sum and the sum of the integrals $\int_a^b f(x)\,dx + \int_b^c f(x)\,dx$ is small enough.
This is the idea behind the following proof.
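Before reading the proof, it may help to see the estimate $2M\delta$ at work numerically. The following Python sketch is only an illustration under assumptions that are not in the text: the function $f(x) = x^2$, the interval $[0, 2]$ with $b = 1$, the bound $M = 4$, and the helper names `riemann_sum` and `insert_point` are all chosen here for the example.

```python
import random

def riemann_sum(f, points, tags):
    """Sum f(tag_j) * (x_j - x_{j-1}) over the partition given by points."""
    return sum(f(t) * (x1 - x0)
               for x0, x1, t in zip(points, points[1:], tags))

def insert_point(points, tags, b):
    """Split the interval containing b and tag both new pieces with b."""
    k = next(i for i in range(1, len(points))
             if points[i - 1] <= b <= points[i])
    new_points = points[:k] + [b] + points[k:]
    new_tags = tags[:k - 1] + [b, b] + tags[k:]
    return new_points, new_tags

random.seed(0)
f = lambda x: x * x              # assumed example; |f(x)| <= M = 4 on [0, 2]
a, b, c, M = 0.0, 1.0, 2.0, 4.0
n = 7                            # with 7 equal intervals, b = 1 is not a partition point
points = [a + (c - a) * j / n for j in range(n + 1)]
tags = [random.uniform(x0, x1) for x0, x1 in zip(points, points[1:])]
norm = max(x1 - x0 for x0, x1 in zip(points, points[1:]))

before = riemann_sum(f, points, tags)
after = riemann_sum(f, *insert_point(points, tags, b))
change = abs(after - before)     # equals |f(b) - f(xi_k)| * (x_k - x_{k-1})
print(change, 2 * M * norm)      # the change never exceeds 2 * M * norm
```

Only the one term belonging to the interval that contains $b$ is affected, which is why shrinking the norm of the partition shrinks the change.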
PROOF: If $f$ is a function integrable on the interval $[a, b]$ and on the
interval $[b, c]$, then $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$.
• Without loss of generality assume that $a < b < c$, and let $f$ be a function
integrable on the interval $[a, b]$ and on the interval $[b, c]$ with $I = \int_a^b f(x)\,dx$
and $J = \int_b^c f(x)\,dx$.
• Because $f$ is integrable on $[a, b]$, $|f|$ is bounded on that interval by some
value $M_1$. Because $f$ is integrable on $[b, c]$, $|f|$ is bounded on that interval
by some value $M_2$. It follows that $|f|$ is bounded on the interval $[a, c]$ by
$M = \max(M_1, M_2)$.
• Let $\epsilon > 0$ be given.
• From the definition of Riemann integration, there is a $\delta_1 > 0$ such that for
every partition $P$ of $[a, b]$ with $\|P\| < \delta_1$ and every choice of $\xi_j \in [x_{j-1}, x_j]$
on the intervals of the partition, the associated Riemann sum will be within
$\frac{\epsilon}{3}$ of the integral $I$.
• Similarly, there is a $\delta_2 > 0$ such that for every partition $P$ of $[b, c]$ with
$\|P\| < \delta_2$ and every choice of $\xi_j \in [x_{j-1}, x_j]$, the associated Riemann sum
will be within $\frac{\epsilon}{3}$ of the integral $J$.
6.7 Integrable Functions 181
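• Let $\delta = \min\left(\delta_1, \delta_2, \frac{\epsilon}{6M+1}\right)$.
• Let $P: a = x_0 \le x_1 \le x_2 \le \cdots \le x_n = c$ be a partition of $[a, c]$ with
$\|P\| < \delta$.
• Let $\xi_j$'s be chosen such that $\xi_j \in [x_{j-1}, x_j]$.
• Since $b \in [a, c]$, there is a $k$ such that $b \in [x_{k-1}, x_k]$.
• Then
$$\left|\sum_{j=1}^{n} f(\xi_j)\Delta x_j - (I + J)\right| = \left|\left(\sum_{j=1}^{k-1} f(\xi_j)\Delta x_j + f(b)(b - x_{k-1})\right) + \left(f(b)(x_k - b) + \sum_{j=k+1}^{n} f(\xi_j)\Delta x_j\right) + (f(\xi_k) - f(b))\Delta x_k - (I + J)\right|$$
$$\le \left|\sum_{j=1}^{k-1} f(\xi_j)\Delta x_j + f(b)(b - x_{k-1}) - I\right| + \left|f(b)(x_k - b) + \sum_{j=k+1}^{n} f(\xi_j)\Delta x_j - J\right| + |f(\xi_k) - f(b)|\,\Delta x_k.$$
• Since the partition $a = x_0 \le x_1 \le x_2 \le \cdots \le x_{k-1} \le b$
is a partition of $[a, b]$ with norm less than $\delta_1$, it follows that
$\left|\sum_{j=1}^{k-1} f(\xi_j)\Delta x_j + f(b)(b - x_{k-1}) - I\right| < \frac{\epsilon}{3}$.
• Similarly, since the partition $b \le x_k \le x_{k+1} \le \cdots \le x_n = c$
is a partition of $[b, c]$ with norm less than $\delta_2$, it follows that
$\left|f(b)(x_k - b) + \sum_{j=k+1}^{n} f(\xi_j)\Delta x_j - J\right| < \frac{\epsilon}{3}$.
• Also, $|f(\xi_k) - f(b)|\,\Delta x_k \le 2M \cdot \frac{\epsilon}{6M+1} < \frac{\epsilon}{3}$.
• Therefore, $\left|\sum_{j=1}^{n} f(\xi_j)\Delta x_j - (I + J)\right| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon$.
• This proves that $\int_a^c f(x)\,dx = I + J$ and completes the proof of the theorem.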
Note that you can easily show that this theorem also holds if a > b or b > c by
simply rearranging the order of the limits on one or more of the integrals.
The previous section discusses the theorem stating that if integrable functions
satisfy $f \le g$ on $[a, b]$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$. Can this statement be made
stronger? That is, if $f(x) \le g(x)$ for $x \in [a, b]$, with $f(x) < g(x)$ for some $x \in [a, b]$,
can you conclude that $\int_a^b f(x)\,dx < \int_a^b g(x)\,dx$? The answer is no. For example, if
$f$ and $g$ only differ for a finite number of $x$ values, then $f$ and $g$ will have identical
integrals. To prove this, start with two integrable functions, $f$ and $g$, that are identical
for all $x \in [a, b]$ except for some $t \in [a, b]$. How would you prove that $f$ and $g$ have
identical integrals? Again, you should consider the Riemann sums associated with $f$
and $g$, that is, consider a Riemann sum $\sum_{j=1}^n g(\xi_j)\Delta x_j$ for $g$ with a particular partition
and choice of $\xi_j$'s, and compare it to the corresponding sum $\sum_{j=1}^n f(\xi_j)\Delta x_j$ for $f$. If
$f(x) = g(x)$ at all points except $x = t$, how many of the corresponding terms in
these two Riemann sums could be different? Well, only those terms for which the
chosen $\xi_j = t$ and $\Delta x_j \ne 0$. This could happen at most twice (twice in the unusual
case of $t = x_j = \xi_j = \xi_{j+1}$). Thus, the Riemann sum for $g$ differs from the
Riemann sum for $f$ in at most two terms. By controlling the size of $\Delta x_j$ which
you can do by limiting the norm of the partition, you can control the contribution of
those at most two terms in the Riemann sum, thus ensuring that the sums for $f$ and
$g$ are close. That is the idea behind the following proof.
• Let $f$ and $g$ be functions integrable on the interval $[a, b]$, and suppose that
$f(x) = g(x)$ for all $x \in [a, b]$ except perhaps at $t \in [a, b]$.
• Let $\int_a^b f(x)\,dx = I$.
• Let $M = \max(|f(t)|, |g(t)|) + 1$.
• Let $\epsilon > 0$ be given.
• From the definition of Riemann Integration, there is a $\delta_1 > 0$ such that for
every partition $P: a = x_0 \le x_1 \le x_2 \le \cdots \le x_n = b$ with norm less than
$\delta_1$, and every choice of $\xi_j \in [x_{j-1}, x_j]$, the associated Riemann sum satisfies
$\left|\sum_{j=1}^n f(\xi_j)\Delta x_j - I\right| < \frac{\epsilon}{2}$.
• Let $\delta_2 = \frac{\epsilon}{8M}$, and set $\delta = \min(\delta_1, \delta_2)$.
• Select any partition $P: a = x_0 \le x_1 \le x_2 \le \cdots \le x_n = b$ with
norm less than $\delta$, and select any sequence of $\xi_j \in [x_{j-1}, x_j]$. Then the
associated Riemann sum for the function $g$ satisfies
$\left|\sum_{j=1}^n g(\xi_j)\Delta x_j - I\right| \le \left|\sum_{j=1}^n f(\xi_j)\Delta x_j - I\right| + |g(t) - f(t)|\,2\delta < \frac{\epsilon}{2} + 2M \cdot 2 \cdot \frac{\epsilon}{8M} = \epsilon$.
• Thus, $\int_a^b g(x)\,dx = I = \int_a^b f(x)\,dx$ which proves the theorem.
It is left as an exercise to extend this theorem to the case where f and g differ at a
finite number of points. In fact, this can be extended to f and g which differ on an
infinite sequence of points in Œa; b as long as the sequence has a limit.
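The bound $|g(t) - f(t)| \cdot 2\|P\|$ used in the proof can be watched shrinking in a short Python sketch. Everything specific here is an assumption made for illustration, not part of the text: $f(x) = x$ on $[0, 1]$, a function $g$ that differs from $f$ only at $t = 0.5$, and the helper names `riemann_sum` and `gap`. The tags are chosen to produce the unusual worst case in which $t$ is used in two adjacent intervals.

```python
def riemann_sum(func, points, tags):
    """Sum func(tag_j) * (x_j - x_{j-1}) over the partition given by points."""
    return sum(func(t) * (x1 - x0)
               for x0, x1, t in zip(points, points[1:], tags))

T = 0.5                          # the single point where f and g differ (assumed)

def f(x):
    return x

def g(x):
    return 10.0 if x == T else x

def gap(n):
    """|sum for g - sum for f| on n equal intervals of [0, 1], tags pulled toward T."""
    points = [j / n for j in range(n + 1)]
    # Clamp T into each interval, so both intervals touching T are tagged with T.
    tags = [min(max(T, x0), x1) for x0, x1 in zip(points, points[1:])]
    return abs(riemann_sum(g, points, tags) - riemann_sum(f, points, tags))

for n in (10, 100, 1000):
    norm = 1 / n
    print(n, gap(n), abs(g(T) - f(T)) * 2 * norm)   # gap versus the bound
```

For even $n$ the point $t$ is a partition point, so the two sums differ in exactly two terms and the gap matches the bound up to rounding.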
6.7.1 Exercises
6.8 Step Functions
Step functions play an important role in the theory of Riemann integration. A
step function $s$ on the interval $[a, b]$ is associated with a partition
$P: a = x_0 \le x_1 \le x_2 \le \cdots \le x_n = b$ of $[a, b]$ and has the property that $s$ is constant
on each interval of the partition, $(x_{j-1}, x_j)$. For example, the function
$$s(x) = \begin{cases} 3 & 0 \le x < 2 \\ 1 & x = 2 \\ 4 & 2 < x < 4 \\ 0 & x = 4 \\ 1 & 4 < x \le 5 \end{cases}$$
is a step function defined on the interval $[0, 5]$ (Fig. 6.7). It follows easily that a step
function on an interval $[a, b]$ is integrable there. Indeed, suppose that
$P: a = x_0 \le x_1 \le x_2 \le \cdots \le x_n = b$, and $s(x) = c_j$ for all $x$ satisfying $x_{j-1} < x < x_j$. Clearly,
the constant function $c_j$ is integrable on the interval $[x_{j-1}, x_j]$, and the function $s(x)$
differs from this constant function at at most the two endpoints, $x_{j-1}$ and $x_j$. Thus,
by the previous theorem, $\int_{x_{j-1}}^{x_j} s(x)\,dx = c_j \Delta x_j$. Then,
$\int_a^b s(x)\,dx = \sum_{j=1}^n \int_{x_{j-1}}^{x_j} s(x)\,dx = \sum_{j=1}^n c_j \Delta x_j$.
The importance of step functions comes from the fact that a function $f$ is
integrable on $[a, b]$ if and only if $f$ can be closely approximated by step functions.
Precisely, $f$ has a Riemann integral on the interval $[a, b]$ if and only if for every
$\epsilon > 0$, there exist step functions $u(x)$ and $v(x)$ on $[a, b]$ with the property that for
all $x \in [a, b]$, $v(x) \le f(x) \le u(x)$, and $\int_a^b u(x)\,dx - \int_a^b v(x)\,dx < \epsilon$. That is, $f$ has
an integral precisely when for every $\epsilon > 0$ there is a lower step function $v$ that is
always less than or equal to $f$ and an upper step function $u$ that is always greater
than or equal to $f$ with the property that the integrals of $v$ and $u$ are within $\epsilon$ of each
other. This squeezes $f$ between two step functions whose integrals are as close as
you want. This should remind you of the Exhaustion Area Axiom.
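To see the squeeze in a concrete case, the following Python sketch computes the integrals of a lower and an upper step function for an increasing function. The function $f(x) = x^2$ on $[0, 1]$, the use of equal intervals, and the name `step_bounds` are assumptions chosen for illustration. Because $f$ is increasing and continuous, its infimum and supremum on each interval are its values at the left and right endpoints.

```python
def step_bounds(f, a, b, n):
    """Integrals of the lower and upper step functions of an increasing f on n equal intervals."""
    width = (b - a) / n
    xs = [a + j * width for j in range(n + 1)]
    lower = sum(f(x0) * width for x0 in xs[:-1])   # inf on each interval is f(left endpoint)
    upper = sum(f(x1) * width for x1 in xs[1:])    # sup on each interval is f(right endpoint)
    return lower, upper

f = lambda x: x * x              # assumed example, increasing on [0, 1]
for n in (10, 100, 1000):
    lower, upper = step_bounds(f, 0.0, 1.0, n)
    print(n, lower, upper, upper - lower)          # the gap is (f(1) - f(0)) / n
```

The gap between the two integrals can be made smaller than any given $\epsilon$ by taking $n$ large, which is exactly the condition in the theorem.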
The statement of this theorem is a biconditional statement; that is, it is an “if and
only if” statement. This means that the proof will have two distinct parts. One proof
must show that if a function is integrable, then it can be approximated by very close
upper and lower step functions. The other proof must show that if a function can
be approximated by very close upper and lower step functions, then it is integrable.
Consider how you would approach the proofs of each of these statements.
For the first part of the proof, you would consider a function, $f$, integrable on an
interval $[a, b]$. Given an $\epsilon > 0$, somehow you need to show that there are upper and
lower step functions, $u$ and $v$, whose integrals are within $\epsilon$ of each other. Where do
you start? All you know about $f$ is that it has a Riemann integral on $[a, b]$, thus, all
you have to go on is the definition of Riemann integration which makes a statement
about the properties of Riemann sums. The key observation here is that a Riemann
sum $\sum_{j=1}^n f(\xi_j)\Delta x_j$ is equal to the integral of a step function defined to be equal to the
constant $f(\xi_j)$ on the interval $(x_{j-1}, x_j)$. Since the definition of the integral guarantees
that you can find Riemann sums that are very close to the value of the integral, this
suggests how you might choose step functions whose integrals are close to each
other. How can you assure that you choose a step function that is less than $f(x)$
for each $x \in [a, b]$? For each interval of the partition $(x_{j-1}, x_j)$ you could consider
selecting $\xi_j$ so that $f(\xi_j)$ is the minimum value of $f$ on that interval. Unfortunately,
$f$ might not achieve a minimum value on that interval. Certainly, if $f$ is continuous
on $[x_{j-1}, x_j]$, then it obtains its minimum on that interval, but there is nothing here
indicating that $f$ is continuous. On the other hand, you do know that, because $f$ is
integrable, it is bounded. Thus, there is a greatest lower bound $M_j = \inf_{x \in (x_{j-1}, x_j)} f(x)$.
There may not be any $x \in (x_{j-1}, x_j)$ with the property that $f(x) = M_j$, but you know
that there are values of $x$ in the interval such that $f(x)$ is as close as you like to $M_j$.
Getting specific, now, your goal is to find upper and lower step functions whose
integrals are within some given $\epsilon > 0$ of each other. It makes sense, therefore, to
find upper and lower step functions whose integrals are both within $\frac{\epsilon}{2}$ of the value of
the integral of $f$ because then the two step functions will be within $\epsilon$ of each other.
From the definition of Riemann integral, you can find a partition of $[a, b]$ such that
all associated Riemann sums are within $\frac{\epsilon}{4}$ of the integral of $f$. Then you can define
a lower step function, $v(x)$, that is equal to the infimum of $f$ on each interval of the
chosen partition. On each interval of the partition you can find $\xi_j$ values so that $f(\xi_j)$
is within $\frac{\epsilon}{4(b-a)}$ of $v(\xi_j)$. Then the integral of the lower step function will be within
$\frac{\epsilon}{4(b-a)}(b - a) = \frac{\epsilon}{4}$ of a Riemann sum for $f$ which in turn is within $\frac{\epsilon}{4}$ of the integral
of $f$. This produces a lower step function with the properties you want. A similar
construction will produce an upper step function whose integral is also within $\frac{\epsilon}{2}$ of
the integral of $f$, and that will complete the first part of the proof (Fig. 6.8).
For the second part of the proof, you consider a function, $f$, such that for each
$\epsilon > 0$ you can find a lower step function, $v(x)$, and an upper step function, $u(x)$,
whose integrals are within $\epsilon$ of each other. You must then show that $f$ has an integral.
The first task is to figure out what value $I$ will serve as the integral of $f$. Your proof
will need to show that Riemann sums for $f$ approach this value of $I$, so you first
need a target $I$ for that purpose. To do this, consider the collection of all possible
lower step functions, $v(x)$. That is, let
$L = \{v \mid v \text{ is a step function with } v(x) \le f(x) \text{ for all } x \in [a, b]\}$.
Each $v \in L$ has an integral, and each integral should be less
than or equal to the needed value of $I$. How about taking the least upper bound of
all of those integrals? Does the least upper bound exist? It does if the collection
of integrals of elements of $L$ is bounded above. To get that, all you need is one
upper step function $u$. For each $v \in L$ and for each $x \in [a, b]$, you know that
$v(x) \le f(x) \le u(x)$. This ensures that for each $v \in L$, $\int_a^b v(x)\,dx \le \int_a^b u(x)\,dx$ showing
that the set of integrals of elements in $L$ is bounded above. That allows you to set
$I = \sup_{v \in L} \int_a^b v(x)\,dx$. This makes sense because $I$ would then be greater than or equal
to the integral of any lower step function. It would also have to be less than or equal
to the integral of any upper step function. Since the assumption is that the integrals
of lower step functions and upper step functions can be found arbitrarily close to
each other, and each integral of an upper step function must be greater than or equal
to any integral of a lower step function, you would expect that the least upper bound
of the lower step function integrals would be equal to the greatest lower bound of
the upper step function integrals, and this value is what you will choose for $I$.
After determining $I$, your proof can proceed naturally. You need to show that by
restricting the norm of a partition of $[a, b]$, you can force an associated Riemann
sum for $f$ to be close to $I$. What you have at your disposal is the ability to find
upper and lower step functions whose integrals are close to each other. A helpful
observation is that if you have a lower step function $v$ and an upper step function $u$,
then for any partition and choice of $\xi_j$ in the intervals of the partition, you know that
$\sum_{j=1}^n v(\xi_j)\Delta x_j \le \sum_{j=1}^n f(\xi_j)\Delta x_j \le \sum_{j=1}^n u(\xi_j)\Delta x_j$. So you can choose upper and lower step
functions, $u$ and $v$ whose integrals are each within, say $\frac{\epsilon}{2}$, of $I$. Then you can choose
a norm of a partition so that any Riemann sum for $v$ is within $\frac{\epsilon}{2}$ of the integral of
$v$, and any Riemann sum for $u$ is within $\frac{\epsilon}{2}$ of the integral of $u$. That will force the
corresponding Riemann sum for $f$ to be within $\epsilon$ of $I$ completing the proof.
PROOF: The function $f$ is integrable on the interval $[a, b]$ if and only if for
every $\epsilon > 0$ there are step functions, $u$ and $v$, such that for each $x \in [a, b]$,
$v(x) \le f(x) \le u(x)$ and $\int_a^b u(x)\,dx - \int_a^b v(x)\,dx < \epsilon$.
• Assume that $f$ is an integrable function with $\int_a^b f(x)\,dx = I$.
• Let $\epsilon > 0$ be given.
• By the definition of Riemann integration, there is a $\delta > 0$ such that for any
partition of $[a, b]$ with norm less than $\delta$ and any choice of $\xi_j$'s in the intervals
of the partition, the associated Riemann sum $\sum_{j=1}^n f(\xi_j)\Delta x_j$ is within $\frac{\epsilon}{4}$ of $I$.
• Let $P: a = x_0 < x_1 < x_2 < \cdots < x_n = b$ be such a partition.
• Note that since $f$ is integrable, $f$ is a bounded function on the interval $[a, b]$.
• Because $f$ is bounded, for each $j = 1, 2, 3, \ldots, n$, the value of
$\inf_{x_{j-1} < x < x_j} f(x)$ exists. Therefore, there exists $\xi_j \in (x_{j-1}, x_j)$ such that
$f(\xi_j) < \inf_{x_{j-1} < x < x_j} f(x) + \frac{\epsilon}{4(b-a)}$.
• For each $j$, define $v(x_j) = f(x_j)$ and for $x \in (x_{j-1}, x_j)$, define
$v(x) = \inf_{x_{j-1} < x < x_j} f(x) \ge f(\xi_j) - \frac{\epsilon}{4(b-a)}$.
• Then $v$ is a step function with the property that $v(x) \le f(x)$ for all $x \in [a, b]$, and
$\int_a^b v(x)\,dx \ge \sum_{j=1}^n \left(f(\xi_j) - \frac{\epsilon}{4(b-a)}\right)\Delta x_j = \sum_{j=1}^n f(\xi_j)\Delta x_j - \frac{\epsilon}{4}$. Since
the Riemann sum was chosen to be within $\frac{\epsilon}{4}$ of $I$, the integral of $v$ is greater
than $I - \frac{\epsilon}{2}$.
• Similarly, one can define an upper step function $u$ in the same way that
$v$ was defined except that, in this case, the $\xi_j$ values are chosen to satisfy
$f(\xi_j) > \sup_{x_{j-1} < x < x_j} f(x) - \frac{\epsilon}{4(b-a)}$, and for $x \in (x_{j-1}, x_j)$ the function $u(x)$ is
defined to be $\sup_{x_{j-1} < x < x_j} f(x) \le f(\xi_j) + \frac{\epsilon}{4(b-a)}$.
• Then $u$ is a step function with the property that $f(x) \le u(x)$ for all $x \in [a, b]$, and
$\int_a^b u(x)\,dx \le \sum_{j=1}^n \left(f(\xi_j) + \frac{\epsilon}{4(b-a)}\right)\Delta x_j = \sum_{j=1}^n f(\xi_j)\Delta x_j + \frac{\epsilon}{4}$. Since
the Riemann sum was chosen to be within $\frac{\epsilon}{4}$ of $I$, the integral of $u$ is less
than $I + \frac{\epsilon}{2}$.
• It follows that $u$ and $v$ are upper and lower step functions for $f$ and have the
property that $\int_a^b u(x)\,dx - \int_a^b v(x)\,dx < \left(I + \frac{\epsilon}{2}\right) - \left(I - \frac{\epsilon}{2}\right) = \epsilon$.
• This completes PART I of the proof.
PART II: Close upper and lower step functions implies integrability
• Assume that for every $\epsilon > 0$ there exist step functions $u$ and $v$ satisfying
$v(x) \le f(x) \le u(x)$ for every $x \in [a, b]$, and $\int_a^b u(x)\,dx - \int_a^b v(x)\,dx < \epsilon$.
• Let $u^*$ be any upper step function for $f$. Every lower step function, $v$,
satisfies $v(x) \le f(x) \le u^*(x)$ for every $x \in [a, b]$, implying that
$\int_a^b v(x)\,dx \le \int_a^b u^*(x)\,dx$ and that the set
$\left\{\int_a^b v(x)\,dx \mid v \text{ is a lower step function of } f\right\}$ is
bounded above by $\int_a^b u^*(x)\,dx$.
• The set of integrals of lower step functions of $f$ is nonempty and bounded
above, so let $I = \sup\left\{\int_a^b v(x)\,dx \mid v \text{ is a lower step function of } f\right\}$.
• Let $\epsilon > 0$ be given.
• By assumption there are step functions $u$ and $v$ satisfying $v(x) \le f(x) \le u(x)$
for every $x \in [a, b]$, and $\int_a^b u(x)\,dx - \int_a^b v(x)\,dx < \frac{\epsilon}{2}$.
• Since the integral of any upper step function is an upper bound for the set
of all integrals of lower step functions, it follows that
$I \le \int_a^b u(x)\,dx < \int_a^b v(x)\,dx + \frac{\epsilon}{2} \le I + \frac{\epsilon}{2}$.
• Also $\int_a^b v(x)\,dx > \int_a^b u(x)\,dx - \frac{\epsilon}{2} \ge I - \frac{\epsilon}{2}$.
• Since $u$ is integrable, there is a $\delta_1 > 0$ such that for every partition of
$[a, b]$ with norm less than $\delta_1$ and every choice of $\xi_j$'s in the intervals of
the partition, the Riemann sum $\sum_{j=1}^n u(\xi_j)\Delta x_j$ is within $\frac{\epsilon}{2}$ of $\int_a^b u(x)\,dx$.
• Similarly, there is a $\delta_2 > 0$ such that for every partition of $[a, b]$ with norm
less than $\delta_2$ and every choice of $\xi_j$'s in the intervals of the partition, the
Riemann sum $\sum_{j=1}^n v(\xi_j)\Delta x_j$ is within $\frac{\epsilon}{2}$ of $\int_a^b v(x)\,dx$.
• Let $\delta = \min(\delta_1, \delta_2)$, and let $P: a = x_0 \le x_1 \le x_2 \le \cdots \le x_n = b$ be a
partition of $[a, b]$ with $\|P\| < \delta$.
• For each $j$, let $\xi_j$ be chosen in the interval $[x_{j-1}, x_j]$.
• Then it follows that
$I - \epsilon = I - \frac{\epsilon}{2} - \frac{\epsilon}{2} < \int_a^b v(x)\,dx - \frac{\epsilon}{2} < \sum_{j=1}^n v(\xi_j)\Delta x_j \le \sum_{j=1}^n f(\xi_j)\Delta x_j \le \sum_{j=1}^n u(\xi_j)\Delta x_j < \int_a^b u(x)\,dx + \frac{\epsilon}{2} < I + \frac{\epsilon}{2} + \frac{\epsilon}{2} = I + \epsilon$.
• Thus, $\left|\sum_{j=1}^n f(\xi_j)\Delta x_j - I\right| < \epsilon$ which shows that $\int_a^b f(x)\,dx = I$ and completes the
proof of PART II.
6.8.1 Exercises
6.9 Integrals of Continuous Functions
The previous theorem about step functions gives a straightforward way to prove
that all continuous functions are integrable. Such a proof would take an arbitrary
function $f$ that is continuous on the interval $[a, b]$ for some $a < b$ and an arbitrary
$\epsilon > 0$, and show that $f$ has upper and lower step functions whose integrals are
within $\epsilon$ of each other. What is it about such a continuous function, $f$, that allows
the construction of these upper and lower step functions? The important result about
continuous functions that comes into play here is that if $f$ is continuous on $[a, b]$, then
it is uniformly continuous there. This has the consequence that there is a $\delta > 0$ such
that if $x$ and $y$ are in $[a, b]$ with $|x - y| < \delta$, then $|f(x) - f(y)| < \frac{\epsilon}{2(b-a)}$. This means
that if $x_{j-1} < x_j$ with $x_j - x_{j-1} < \delta$, then
$\sup_{x \in (x_{j-1}, x_j)} f(x) - \inf_{x \in (x_{j-1}, x_j)} f(x) \le \frac{\epsilon}{2(b-a)}$.
Defining upper and lower step functions to be equal to this supremum and infimum,
respectively, on $(x_{j-1}, x_j)$ gives the step functions with the needed property.
One nice thing about knowing that a function is integrable on an interval $[a, b]$
is that rather than having to consider all partitions of $[a, b]$, you can determine the
value of the function's integral by using any collection of partitions of $[a, b]$ whose
norms approach zero. Thus, if you know that $f$ is integrable on $[a, b]$, then for every
natural number $n$ you could calculate $I(n) = \sum_{j=1}^n f\left(a + (b - a)\frac{j}{n}\right)\frac{b-a}{n}$ which is the
Riemann sum for $f$ based on the very specific partition where $x_j = a + (b - a)\frac{j}{n}$ and
with $\xi_j = x_j$. This is not the more general Riemann sum required by the definition
of the integral, but if you already know that the integral exists, then it must be equal
to $\lim_{n \to \infty} I(n)$.
As an example, consider the function $f(x) = x$ which is continuous on the
interval $[0, 4]$, so you know that it is integrable there. You can then consider
$$I(n) = \sum_{j=1}^n f\left(a + (b - a)\frac{j}{n}\right)\frac{b-a}{n} = \sum_{j=1}^n \left(4\,\frac{j}{n}\right)\frac{4}{n} = \frac{16}{n^2}\sum_{j=1}^n j = \frac{16}{n^2} \cdot \frac{n(n+1)}{2}.$$
Then $\lim_{n \to \infty} I(n)$ is easily seen to be 8 which is $\int_0^4 x\,dx$. On the other hand, if you try
this with the function $f(x) = \begin{cases} 0 & x \text{ is rational} \\ 1 & x \text{ is irrational} \end{cases}$ on the interval $[0, 1]$, you obtain
$I(n) = \sum_{j=1}^n f\left(\frac{j}{n}\right)\frac{1}{n} = 0$, because every $\frac{j}{n}$ is rational. So $\lim_{n \to \infty} I(n) = 0$ which is not the integral of $f$. That
integral does not exist.
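The closed form for $I(n)$ in the first example can be confirmed with exact rational arithmetic. This Python sketch is an illustration only; the function name `I` mirrors the notation above, and the use of the standard `fractions` module is a choice made here so that no rounding enters the comparison.

```python
from fractions import Fraction

def I(f, a, b, n):
    """Riemann sum on n equal intervals of [a, b] with right-endpoint tags."""
    width = Fraction(b - a, n)
    return sum(f(a + j * width) * width for j in range(1, n + 1))

for n in (1, 10, 100, 1000):
    value = I(lambda x: x, 0, 4, n)
    assert value == Fraction(8 * (n + 1), n)   # (16 / n^2) * n(n+1)/2, simplified
    print(n, float(value))                     # the values decrease toward 8
```

Since $I(n) = 8 + \frac{8}{n}$, the sums approach 8 from above as $n$ grows.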
Now that it has been established that continuous functions are integrable, it is
appropriate to investigate the properties of the integrals of continuous functions.
The first of these properties is known as the Mean Value Theorem for Integration. It
states that the integral of a continuous function, $f$, on an interval, $[a, b]$, is given by
the length of the interval, $b - a$, times one of the values $f$ achieves on the interval.
That is, there exists a $c \in [a, b]$ such that $\int_a^b f(x)\,dx = f(c) \cdot (b - a)$. This result has a
nice visual interpretation showing that the area under a continuous curve is equal to
the area of a rectangle with length $b - a$ and width $f(c)$ for some $c \in [a, b]$ as shown
in Fig. 6.9. Another way to think about this is that there is a $c \in [a, b]$ such that $f(c)$
is the mean value of $f$ which could be defined as $\frac{1}{b-a}\int_a^b f(x)\,dx$.
The proof of this theorem follows easily from three earlier results: (1) the
Intermediate Value Theorem, (2) a continuous function on a closed interval takes
on its extreme values, and (3) if one integrable function is greater than or equal to a
second integrable function, then the integral of the first is greater than or equal to the
integral of the second. The proof starts with a function $f$ continuous on an interval
$[a, b]$. That function achieves its minimum value $K$ and its maximum value $M$ on the
interval. Thus, for all $x \in [a, b]$, it follows that $K \le f(x) \le M$ from which it follows
that $(b - a)K \le \int_a^b f(x)\,dx \le (b - a)M$. Then, by the Intermediate Value Theorem,
on the interval $[a, b]$ the function $f$ achieves every value between $K$ and $M$ including
$\frac{1}{b-a}\int_a^b f(x)\,dx$.
• Because $f(s) = K \le \frac{1}{b-a}\int_a^b f(x)\,dx \le M = f(t)$, the Intermediate Value
Theorem says that there is a $c$ between $s$ and $t$ such that $f(c) = \frac{1}{b-a}\int_a^b f(x)\,dx$.
• Thus, c 2 Œa; b satisfies the needed requirement and completes the proof.
It can be very exciting to take a first course in Calculus. After learning what a
limit is, you learn about two very different-looking limit processes: the derivative
and the integral. Both differentiation and integration have important applications
which justify the amount of attention they receive. But then comes the seemingly
amazing revelation that these two processes, although they are defined in extremely
different ways, are, in fact, very closely related in that they are essentially inverse
operations of each other. This fact is the point of the Fundamental Theorem of
Calculus, often presented as the pinnacle of the first course in Calculus.
The Fundamental Theorem of Calculus starts with a function $f$ integrable
on $[a, b]$. The result of the theorem is generally stated in two parts. The first part
defines a new function $F(x) = \int_a^x f(t)\,dt$ and states that if $f$ is continuous at some
point $c \in (a, b)$, then $F'(c) = f(c)$. The second part states that if $f$ is continuous
on $[a, b]$, and if $F$ is any function satisfying $F'(x) = f(x)$ for all $x \in [a, b]$, then
$\int_a^b f(x)\,dx = F(b) - F(a)$. It is fairly straightforward to prove the second part using
the first part.
To prove the first part of the theorem, you would assume that a function $f$ is
integrable on an interval $[a, b]$ and that $f$ is continuous at $c \in (a, b)$. To find
the derivative of $F(x) = \int_a^x f(t)\,dt$ at $c$, you would just apply the definition of
the derivative. That is, you would start with the difference quotient
$\frac{F(x) - F(c)}{x - c} = \frac{1}{x - c}\left(\int_a^x f(t)\,dt - \int_a^c f(t)\,dt\right)$. This simplifies to $\frac{1}{x - c}\int_c^x f(t)\,dt$. Now if you knew that $f$
were continuous between $c$ and $x$, you could apply the just completed Mean Value
Theorem for Integration to conclude that this difference quotient is equal to $f(y)$ for
some $y$ between $c$ and $x$. Then by forcing $x$ to be close to $c$, you could force $f(y)$
to be close to $f(c)$ to complete the proof. But you do not know that $f$ is continuous
between $c$ and $x$; only that $f$ is continuous at $c$. Still this is enough. You can use
the continuity of $f$ at $c$ to say that for a given $\epsilon > 0$ there is a $\delta > 0$ that ensures
that if $t$ satisfies $|t - c| < \delta$, then $|f(t) - f(c)| < \epsilon$. This shows that for $x$ within
$\delta$ of $c$, $\frac{1}{x - c}\int_c^x (f(c) - \epsilon)\,dt \le \frac{1}{x - c}\int_c^x f(t)\,dt \le \frac{1}{x - c}\int_c^x (f(c) + \epsilon)\,dt$ which simplifies to
$f(c) - \epsilon \le \frac{1}{x - c}\int_c^x f(t)\,dt \le f(c) + \epsilon$, and the result follows.
c
The second part of the Fundamental Theorem of Calculus now follows easily.
Indeed, if $f$ is continuous on $[a, b]$, then the function $F(x) = \int_a^x f(t)\,dt$ is an
antiderivative of $f$, that is, a function whose derivative is $f$. If $G(x)$ is any other
antiderivative of $f$, then $G'(x) = F'(x)$ on $[a, b]$. It follows from the Mean Value
Theorem (for derivatives) that $G$ and $F$ differ by a constant because $G - F$ has a
derivative that is identically 0. Thus, $F(x) - F(a) = G(x) - G(a)$ for all $x \in [a, b]$
showing that $\int_a^b f(t)\,dt = G(b) - G(a)$ for any antiderivative $G$.
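The first part of the theorem needs continuity only at the point $c$, and a numerical sketch can show the difference quotient settling toward $f(c)$. The following Python code is an illustration under assumptions made here, not part of the text: the function $f$ (which has a jump at $t = 1$ but is continuous at $c = 2$), the midpoint Riemann sum, and the helper names `integral` and `difference_quotient`.

```python
def f(t):
    # Integrable on [0, 3], with a jump at t = 1 and continuous everywhere else.
    return 0.0 if t < 1 else t

def integral(func, lo, hi, n=10_000):
    """Midpoint Riemann sum for the integral of func from lo to hi (hi < lo is allowed)."""
    width = (hi - lo) / n
    return sum(func(lo + (j + 0.5) * width) for j in range(n)) * width

def difference_quotient(c, h):
    """(F(c + h) - F(c)) / h, written as (1/h) times the integral of f from c to c + h."""
    return integral(f, c, c + h) / h

c = 2.0                           # f is continuous at c
for h in (0.5, 0.1, 0.01, -0.01):
    print(h, difference_quotient(c, h), f(c))
```

On these intervals $f(t) = t$, so each quotient equals $c + \frac{h}{2}$ and differs from $f(c)$ by $\frac{|h|}{2}$, which shrinks as $h \to 0$ from either side.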
6.9.1 Exercises
1. If $F(x) = \int_{x^2}^{x^3} \frac{t}{1 + t^2}\,dt$, find $F'(x)$.
2. Suppose $f$ has a jump discontinuity at $c \in [a, b]$ (that is, $\lim_{x \to c^-} f(x)$ and $\lim_{x \to c^+} f(x)$
both exist and are unequal). If $f$ is integrable on $[a, b]$, what is the behavior of
$F(x) = \int_a^x f(t)\,dt$ at $c$?
3. Suppose $f$ is integrable on $[a, b]$. If the derivative of $F(x) = \int_a^x f(t)\,dt$ exists at
$c \in [a, b]$, what can you say about $f$ at $c$?
to limit the size of the intervals where $u(x) - v(x)$ is large. Suppose, for example,
that near points where $f$ is continuous, you could limit $u(x) - v(x)$ to be less than
$\frac{\epsilon}{2(b-a)}$. Then the total contribution to the integral of $u - v$ over those sections of the
step functions would be at most $\frac{\epsilon}{2(b-a)}(b - a) = \frac{\epsilon}{2}$. The function $f$ is bounded, so
there is an $M$ such that $|f(x)| < M$ for all $x \in [a, b]$. It is possible, therefore, to
define upper and lower step functions that differ by at most $2M$ at points of $D_f$. If
you can limit the regions where $u(x) - v(x)$ is large to intervals whose total length
is at most $\frac{\epsilon}{4M}$, then the total contribution to the integral of $u - v$ over those sections
of the step functions would be at most $2M \cdot \frac{\epsilon}{4M} = \frac{\epsilon}{2}$. Accomplishing both of these
goals would then show that the integral of $u - v$ is less than $\frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$. Can this be
accomplished? By the definition of continuity, for each point $x$ where $f$ is continuous
there is a $\delta > 0$ such that if $y$ is in $[a, b]$ with $|y - x| < \delta$, then $|f(y) - f(x)| < \frac{\epsilon}{4(b-a)}$.
That would ensure that for any two values $y_1$ and $y_2$ in the interval $(x - \delta, x + \delta)$, the
difference $|f(y_1) - f(y_2)| \le |f(y_1) - f(x)| + |f(x) - f(y_2)| < \frac{\epsilon}{4(b-a)} + \frac{\epsilon}{4(b-a)} = \frac{\epsilon}{2(b-a)}$.
By the definition of measure zero, the set of discontinuities of $f$ can be covered by a
collection of open intervals whose total length is less than the needed $\frac{\epsilon}{4M}$. Thus, each
point of $[a, b]$ can be covered by one of the open intervals covering $D_f$ or by one of
these $(x - \delta, x + \delta)$ intervals constructed at each point of continuity. The Heine–
Borel Theorem then lets you reduce this covering of $[a, b]$ with open intervals to a
finite subcovering, and from that subcovering, the appropriate upper and lower step
functions can be constructed. That completes the strategy for the first part of the
proof.
Assume, conversely, that the function $f$ defined on the interval $[a, b]$ is integrable.
You already know that this implies that $|f|$ is bounded by some constant $M$, so all
you need to prove is that the set of discontinuities of $f$, $D_f$, has measure zero. This
can be done with a proof by contradiction. That is, by assuming that $D_f$ does not
have measure zero, you can show that for any upper and lower step functions, $u$ and
$v$, the integral of $u - v$ is bounded away from 0. To do this it is helpful to consider
how much $f$ can vary near a particular value $x$. For a point $x \in [a, b]$ and a $\delta > 0$,
you would like to know how much $f$ can change over the interval $(x - \delta, x + \delta)$. So
define $W_\delta(x) = \sup f(y) - \inf f(y)$ where the supremum and infimum are calculated
for $y$ varying over the interval $(x - \delta, x + \delta) \cap [a, b]$. Note that if $f$ had upper and
lower step functions that were both constant on the interval $(x - \delta, x + \delta)$, then
the two step functions would have to differ by at least $W_\delta(x)$ on that interval. Now
define the variation of a function $f$ at a point $x$ to be $W(x) = \lim_{\delta \to 0^+} W_\delta(x)$. Since
$0 \le W_\delta(x) \le 2M$ is nonincreasing as $\delta \to 0^+$, the limit $W(x)$ always exists and is
equal to $\inf W_\delta(x)$. The following lemma gives an important property of $W$.
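To get a feel for $W_\delta(x)$, the following Python sketch estimates it for a function with a single jump. The function `s`, the interval $[0, 4]$, and the helper name `W` are assumptions made here for illustration, and the supremum and infimum are only estimated from finitely many sample points, so in general the result is a lower estimate of $W_\delta(x)$ rather than its exact value.

```python
def s(y):
    # Assumed example: constant on either side of a jump at y = 2.
    if y < 2:
        return 3
    if y == 2:
        return 1
    return 4

def W(f, x, delta, a, b, samples=1001):
    """Estimate sup f - inf f on (x - delta, x + delta) intersected with [a, b]."""
    lo, hi = max(a, x - delta), min(b, x + delta)
    # Sample strictly inside (lo, hi), and always include the point x itself.
    ys = [lo + (hi - lo) * (k + 0.5) / samples for k in range(samples)] + [x]
    values = [f(y) for y in ys]
    return max(values) - min(values)

for delta in (0.5, 0.1, 0.001):
    print(delta, W(s, 1.0, delta, 0.0, 4.0), W(s, 2.0, delta, 0.0, 4.0))
```

At the point of continuity $x = 1$ the estimate is 0 for every $\delta$ shown, while at the jump $x = 2$ it stays at 3 however small $\delta$ becomes.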
6.10 Characterization of Integrable Functions 197
PROOF: Let $f$ be any bounded function defined on the interval $[a, b]$.
Then for any $x \in [a, b]$, the variation of $f$ at $x$ is 0 if and only if $f$ is
continuous at $x$.
• Let $f$ be a bounded function defined on the interval $[a, b]$.
• Assume that there is a real number $M$ such that $|f(x)| < M$ for all $x \in [a, b]$,
and assume the set $D_f$, the set of $x \in [a, b]$ such that $f$ is discontinuous at $x$,
has measure zero.
• Let $\epsilon > 0$ be given.
• By the definition of measure zero, there is a sequence of open intervals
$I_1, I_2, I_3, \ldots$ with total length less than $\frac{\epsilon}{4M}$ such that $D_f$ is contained in the
union of those intervals.
• By the definition of continuity, for each $x \in [a, b]$ where $f$ is continuous,
there is a $\delta_x > 0$ such that $|f(y) - f(x)| < \frac{\epsilon}{4(b-a)}$ for all $y \in [a, b]$ with
$|y - x| < \delta_x$. Let $J_x$ be the interval $(x - \delta_x, x + \delta_x)$.
• Since each $x \in [a, b]$ is either a point of continuity of $f$ or a member of
$D_f$, each $x \in [a, b]$ is either a member of one of the intervals $I_j$ that covers
$D_f$ or in the interval $J_x$. Thus, the collection of open intervals consisting of
$I_1, I_2, I_3, \ldots$ together with the $J_x$ intervals forms an open covering of $[a, b]$.
• By the Heine–Borel Theorem, there exists a finite collection of these open
intervals that covers $[a, b]$. Let $E = \{x_1, x_2, x_3, \ldots, x_n\}$ be the set of distinct
endpoints for the intervals in this finite cover of $[a, b]$ where $x_1 < x_2 < x_3 < \cdots < x_n$.
• Define step functions $u(x)$ and $v(x)$ as follows. If $x = x_j$ for one of the
endpoints $x_j \in E$, then define $u(x) = v(x) = f(x)$.
• For each $j$ the open interval $(x_{j-1}, x_j)$ must be a subset of one of the finite
number of intervals that cover $[a, b]$. If $(x_{j-1}, x_j)$ is contained in one of the
$I_k$ intervals that covers $D_f$, define $u(x) = M$ and $v(x) = -M$ for each
$x \in (x_{j-1}, x_j)$. Since $|f|$ is bounded by $M$, $v(x) \le f(x) \le u(x)$ for each
$x \in (x_{j-1}, x_j)$.
• Otherwise, $(x_{j-1}, x_j)$ is contained in one of the $J_x$ intervals. In this case,
define $u(y) = f(x) + \frac{\epsilon}{4(b-a)}$ and $v(y) = f(x) - \frac{\epsilon}{4(b-a)}$ for each $y \in (x_{j-1}, x_j)$.
Since $|f(y) - f(x)| < \frac{\epsilon}{4(b-a)}$ for all $y \in J_x$, it follows that $v(y) < f(y) < u(y)$
for each $y \in (x_{j-1}, x_j)$.
• It follows that $v$ is a lower step function of $f$, and $u$ is an upper step function
of $f$.
• $\int_a^b (u(x) - v(x))\,dx = \sum_{j=2}^n \int_{x_{j-1}}^{x_j} (u(x) - v(x))\,dx$.
• Over the intervals that were subsets of the $I_j$ intervals, $u(x) - v(x) = 2M$.
The total length of such intervals cannot exceed $\frac{\epsilon}{4M}$. As a result, the integral
of $u(x) - v(x)$ over these intervals cannot exceed $2M \cdot \frac{\epsilon}{4M} = \frac{\epsilon}{2}$.
• Over the intervals that were subsets of the $J_x$ intervals, $u(x) - v(x) \le \frac{\epsilon}{2(b-a)}$.
As a result, the integral of $u(x) - v(x)$ over these intervals cannot exceed
$\int_a^b \frac{\epsilon}{2(b-a)}\,dx = \frac{\epsilon}{2}$.
• Thus, $f$ has upper and lower step functions, $u$ and $v$, with the property that
$\int_a^b (u(x) - v(x))\,dx < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$.
• Therefore, $f$ is Riemann integrable on $[a, b]$.
• $D_f^n$ cannot be covered by open intervals whose total length is less than $\epsilon$.
Thus, it follows that the total length of the intervals $I_j$ that contain points of
$D_f$ must be at least $\epsilon$.
• It follows that $\int_a^b \big(u(x) - v(x)\big)\,dx \ge \epsilon \cdot \frac{1}{n}$.
• Thus, $f$ cannot have upper and lower step functions whose integrals differ
by less than $\frac{\epsilon}{n}$. This implies that $f$ is not integrable, which is a contradiction.
Therefore, the assumption that $D_f$ does not have measure zero is false,
which completes the proof.
6.10.1 Exercises
The axioms for the real numbers define addition as a binary operation and establish
the rules for adding two real numbers together. One can use mathematical induction
to extend axioms and theorems about addition to get theorems about the addition
of any finite number of terms. But there is nothing in the axioms that suggests how
to add an infinite number of terms together or what such a sum would mean. You
need to make a separate definition in order to make sense out of adding infinitely
many terms together. An infinite series $a_1 + a_2 + a_3 + \cdots = \sum_{n=1}^{\infty} a_n$ has a sequence
of terms $a_1, a_2, a_3, \ldots$ which are written with plus signs or minus signs between
the terms of the sequence. In this chapter, most series will begin with a first term
$a_1$, although there is no problem with beginning the series at other subscript values
such as the commonly seen $\sum_{n=0}^{\infty} a_n$. Also in this chapter the terms of the series will
be real numbers, although it is possible to extend the definition to series of other
kinds of terms such as complex numbers or matrices. This explains what an infinite
series looks like, but it does not prescribe any meaning to the symbols.
In Abstract Algebra one can study formal power series, a study that looks at one
type of infinite series and considers how to manipulate the series without regard
to whether these series can be assigned any meaningful numerical values. But in
Analysis, one is interested in the cases where it makes sense to assign a numerical
value to the series. The difference in the two studies is in the interpretation of a
series like $1 - 2 + 3 - 4 + \cdots$. If you ask what happens if you multiply this series
by 2, a purely algebraic answer would be that you just use the Distributive Law and
multiply each term of the series by 2 to get $2 - 4 + 6 - 8 + \cdots$. But an analytical
answer to the question is that it makes little sense to assign a numerical value to the
series, so multiplying the series by 2 cannot yield a meaningful result.
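Given a series $\sum_{n=1}^{\infty} a_n$, its partial sums are the finite sums

$$s_1 = a_1$$
$$s_2 = a_1 + a_2$$
$$s_3 = a_1 + a_2 + a_3$$
$$\vdots$$
$$s_k = a_1 + a_2 + a_3 + \cdots + a_k.$$
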
Since each of these sums is just the sum of a finite number of terms, they are easily
defined. The series is said to converge to the real number $L$ if the sequence of partial
sums converges to $L$, that is, if $\lim_{k\to\infty} s_k = L$. In this case one writes $\sum_{n=1}^{\infty} a_n = L$ and
says that the series has limit $L$ or even that the series has value $L$. If the sequence
of partial sums does not converge, then the series is said to diverge. If the partial
sums tend to infinity or negative infinity, the series is said to diverge to
infinity or negative infinity, respectively. In that case one could write $\sum_{n=1}^{\infty} a_n = \infty$ or
$\sum_{n=1}^{\infty} a_n = -\infty$.
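As a small numerical illustration of this definition, the following Python sketch lists the first few partial sums of a series. The helper name `partial_sums` and the sample series with terms $a_n = 1/2^n$ are choices made here for illustration only; they do not come from the text.

```python
from itertools import accumulate


def partial_sums(term, k):
    """Return [s_1, ..., s_k], where s_j = term(1) + ... + term(j)."""
    return list(accumulate(term(n) for n in range(1, k + 1)))


# Illustrative series (an assumption of this sketch): a_n = 1 / 2**n.
sums = partial_sums(lambda n: 1 / 2**n, 20)
print(sums[:4])   # the first four partial sums
print(sums[-1])   # s_20, already very close to the limit 1
```

Watching a finite list of partial sums can suggest a limit, but it cannot prove convergence; that still requires an argument about the whole sequence.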
The definition of convergence suggests that for each series $\sum_{n=1}^{\infty} a_n$ one should
derive a simple expression for its partial sums $s_k = \sum_{n=1}^{k} a_n$ and then calculate the
limit of the partial sums $\lim_{k\to\infty} s_k$. Unfortunately, there are relatively few series that
admit simple closed-form expressions for their partial sums, and this technique for
finding the value of a series has limited use. Still, it is important to know about
some of the cases when this technique does work. Perhaps the best known examples
of series whose partial sums can be explicitly calculated are the geometric series.
These are the series whose sequence of terms can be written in the form $a_n = ar^{n-1}$,
where $a$ and $r$ are given real numbers. Then the first term of the series is $a$ and the
common ratio of adjacent terms is $\frac{a_n}{a_{n-1}} = \frac{ar^{n-1}}{ar^{n-2}} = r$, at least in the interesting cases
when $ar \ne 0$. When $r$ is not equal to 1, there is a simple algebraic trick that gives
the expression for the partial sums.
$$s_k = \sum_{n=1}^{k} ar^{n-1}$$
$$r s_k = \sum_{n=1}^{k} ar^{n}$$
7.1 Convergence of Infinite Series 203
$$s_k - r s_k = \sum_{n=1}^{k} ar^{n-1} - \sum_{n=1}^{k} ar^{n} = a - ar^{k}$$
$$s_k (1 - r) = a(1 - r^{k})$$
$$s_k = a\,\frac{1 - r^{k}}{1 - r}.$$
(Of course, there is an even simpler trick for the case when $r = 1$.) Thus, except
in the trivial case where $a = 0$, the sequence of partial sums diverges if $|r| \ge 1$. On
the other hand, when $|r| < 1$, $\lim_{k\to\infty} r^k = 0$ so $\lim_{k\to\infty} s_k = \frac{a}{1-r}$ which can easily be
remembered as “the first term divided by 1 minus the common ratio.” The geometric
series is particularly important because one can often compare other series to a
geometric series to determine if the other series converges. It also gives a nice
example showing that series that make a lot of sense when they converge can lead
you to very strange and very incorrect conclusions when they do not converge. In
particular, $\sum_{n=1}^{\infty} r^n = \frac{r}{1-r}$ whenever $|r| < 1$. But when you take a limit as $r$ approaches
$-1$, you get $\lim_{r\to -1} \sum_{n=1}^{\infty} r^n = \lim_{r\to -1} \frac{r}{1-r} = -\frac{1}{2}$ which is not the same as the nonsensical
series $\sum_{n=1}^{\infty} \lim_{r\to -1} r^n = (-1) + 1 + (-1) + 1 + (-1) + 1 + \cdots$.
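The closed form for the geometric partial sums can be checked against direct addition. The sketch below is only an illustration; the function names and the sample values $a = 3$, $r = 1/2$ are assumptions made here, not part of the text.

```python
def geometric_partial_sum(a, r, k):
    """Closed form s_k = a(1 - r^k)/(1 - r), valid for r != 1."""
    return a * (1 - r**k) / (1 - r)


def direct_partial_sum(a, r, k):
    """Add the first k terms a, ar, ..., ar^(k-1) one at a time."""
    return sum(a * r**(n - 1) for n in range(1, k + 1))


a, r = 3.0, 0.5   # sample values chosen for this sketch
for k in (1, 5, 20):
    print(k, direct_partial_sum(a, r, k), geometric_partial_sum(a, r, k))
print("limit a/(1-r) =", a / (1 - r))
```

For $|r| < 1$ the two computations agree up to rounding, and both approach "the first term divided by 1 minus the common ratio."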
Another class of series whose partial sums can be calculated are the telescoping
series. This is a class of series where each term $a_n$ can be written as a difference of
two terms $a_n = b_n - b_{n+1}$. Then
$s_k = (b_1 - b_2) + (b_2 - b_3) + (b_3 - b_4) + \cdots + (b_k - b_{k+1})
= b_1 + (-b_2 + b_2) + (-b_3 + b_3) + (-b_4 + b_4) + \cdots + (-b_k + b_k) - b_{k+1}
= b_1 - b_{k+1}$.
Hence, if $\lim_{k\to\infty} b_{k+1}$ exists, the series converges. The best known example
of this type is the series
$\sum_{n=1}^{\infty} \frac{1}{n^2+n} = \sum_{n=1}^{\infty} \left(\frac{1}{n} - \frac{1}{n+1}\right) = 1 - \lim_{n\to\infty} \frac{1}{n+1} = 1$.
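The telescoping identity $s_k = b_1 - b_{k+1}$ can be confirmed with exact rational arithmetic for the series $\sum \frac{1}{n^2+n}$. This is a sketch for illustration; the function name is hypothetical, and a check of finitely many $k$ is of course not a proof.

```python
from fractions import Fraction


def telescoping_partial_sum(k):
    """Exact value of s_k = sum_{n=1}^{k} 1/(n^2 + n)."""
    return sum(Fraction(1, n * n + n) for n in range(1, k + 1))


for k in (1, 2, 10, 100):
    s_k = telescoping_partial_sum(k)
    # The telescoping argument predicts s_k = b_1 - b_{k+1} = 1 - 1/(k+1).
    assert s_k == 1 - Fraction(1, k + 1)
    print(k, s_k)
```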
Fortunately, even though it is often difficult to determine the exact values for
the partial sums of a series, one can very often determine whether or not the series
converges and sometimes the value to which it converges even without knowing an
explicit formula for its partial sums. There are many tools that can be used to do this.
These tools consist of a large collection of convergence tests which can be applied
to determine if a particular series converges. Calculus students often get a great deal
of practice selecting appropriate convergence tests for series. This chapter will be
more interested in proving the theorems that provide these tests.
The simplest and possibly most important convergence test is the Limit of Terms
Test which says that a series can converge only if its sequence of terms has a limit
of 0. That is, if $\sum_{n=1}^{\infty} a_n$ converges, then $\lim_{n\to\infty} a_n = 0$. This is a direct consequence of
the fact that if $\sum_{n=1}^{\infty} a_n$ converges, then its sequence of partial sums $\langle s_k \rangle$ converges.
204 7 Infinite Series
The point is, if $\lim_{k\to\infty} s_k$ exists, then $\langle s_k \rangle$ is a Cauchy sequence whose terms must get
close to each other, and so $s_k - s_{k-1} = a_k$ must approach 0.
PROOF (Limit of Terms Test): The series $\sum_{n=1}^{\infty} a_n$ converges only if
$\lim_{n\to\infty} a_n = 0$.
• Assume that the series $\sum_{n=1}^{\infty} a_n$ converges to the limit $L$.
• Then the sequence of partial sums $s_k = \sum_{n=1}^{k} a_n$ converges to $L$.
• This implies $\lim_{n\to\infty} a_n = \lim_{n\to\infty} (s_n - s_{n-1}) = \lim_{n\to\infty} s_n - \lim_{n\to\infty} s_{n-1} = L - L = 0$
which completes the proof.
The convergence of one series can often be inferred from the convergence of a
similar series. For example, inserting extra terms equal to 0 into a series does not
affect whether the series converges, nor can inserting extra 0 terms affect the value
to which the series converges. This is because the insertion of terms equal to 0 into
a series does not change the sequence of partial sums for that series except to allow
some of the partial sums to be repeated, and that does not change the limit of the
sequence of partial sums.
Another useful observation is that if two series differ in only a finite number of
terms, then either both series converge or both series diverge. Suppose, for example,
that $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are two series such that for some positive integer $N$, the terms
$a_n = b_n$ for all $n > N$. Why would the convergence of one of the series imply
the convergence of the other? It must depend on the convergence of their partial
sums, so let $s_k = \sum_{n=1}^{k} a_n$ and $t_k = \sum_{n=1}^{k} b_n$ be the sequences of partial sums for
the two series. The agreement of $a_n$ and $b_n$ for all $n > N$ shows that for $k > N$,
$t_k = t_N + \sum_{n=N+1}^{k} b_n = t_N + \sum_{n=N+1}^{k} a_n = t_N + s_k - s_N$. Thus, $\lim_{k\to\infty} s_k$ exists if and only
if $\lim_{k\to\infty} (s_k + t_N - s_N) = \lim_{k\to\infty} t_k$ exists.
7.1.1 Exercises
Find limits for the following series or show that the limit does not exist.
1. $\sum_{n=1}^{\infty} \frac{5}{3^n}$
2. $\sum_{n=1}^{\infty} \frac{4}{2^{2n+1}}$
7.2 Absolute and Conditional Convergence 205
P
1 p
3. 23 n
nD1
4. $\sum_{n=1}^{\infty} \frac{7}{n^2+5n}$
5. $\sum_{n=1}^{\infty} \frac{1}{n^2+9n+14}$
1 1 1 1 1 1 1 1
6. C 2 C 3 3 4 C 4 C
2 3 22 3 2 3 2 3
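7. $\frac{1}{2} + 0 + \frac{1}{4} + 0 + 0 + \frac{1}{8} + 0 + 0 + 0 + \frac{1}{16} + 0 + 0 + 0 + 0 + \cdots$
8. $11 + 3 + \frac{5}{9} + \frac{1}{1\cdot 2} + \frac{1}{2\cdot 3} + \frac{1}{3\cdot 4} + \frac{1}{4\cdot 5} + \frac{1}{5\cdot 6} + \cdots$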
PROOF: If the series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent, then it converges.
• Assume that the series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent which means that the
series $\sum_{n=1}^{\infty} |a_n|$ converges.
• For each $k > 0$ let $S_k = \sum_{n=1}^{k} |a_n|$ and $s_k = \sum_{n=1}^{k} a_n$.
• Then, since the series is absolutely convergent, the sequence $\langle S_k \rangle$ converges
implying that $\langle S_k \rangle$ is a Cauchy sequence.
• Given $\epsilon > 0$ there is an $N$ such that for all $m$ and $k$ greater than
$N$, $|S_m - S_k| < \epsilon$.
• Let $m > k > N$. Then $|s_m - s_k| = \left|\sum_{n=k+1}^{m} a_n\right| \le \sum_{n=k+1}^{m} |a_n| = |S_m - S_k| < \epsilon$.
• Thus, the sequence $\langle s_k \rangle$ is a Cauchy sequence and is, therefore, a
convergent sequence.
• This shows that $\sum_{n=1}^{\infty} a_n$ is convergent which proves the theorem.
If the series $\sum_{n=1}^{\infty} a_n$ converges, but it is not absolutely convergent, then it is called
conditionally convergent. An absolutely convergent series converges because its
terms get small fast enough that its partial sums must rapidly get close to each
other and to a limit. A conditionally convergent series converges because its negative
terms balance the growth of its positive terms. For example, the series
$1 - 1 + \frac{1}{2} - \frac{1}{2} + \frac{1}{3} - \frac{1}{3} + \frac{1}{4} - \frac{1}{4} + \cdots$ clearly converges to 0 due to this type of cancelation.
Thus, every series can be categorized as either absolutely convergent, conditionally
convergent, or divergent.
Because the definition of the convergence of a series involves the limit of partial
sums, many results that are true for finite sums are easily proved for infinite sums.
For example, if $\sum_{n=1}^{\infty} a_n$ converges, and $c$ is any constant, then $\sum_{n=1}^{\infty} c a_n = c \sum_{n=1}^{\infty} a_n$. To
prove this you would have to consider the partial sums $\sum_{n=1}^{k} c a_n$. But the Distributive
Law works for finite sums, so $\sum_{n=1}^{k} c a_n = c \sum_{n=1}^{k} a_n$, and the limit of this is the needed
$c \sum_{n=1}^{\infty} a_n$.
7.3 The Arithmetic of Series 207
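PROOF: If the series $\sum_{n=1}^{\infty} a_n$ converges, and $c$ is any real number, then
$\sum_{n=1}^{\infty} c a_n = c \sum_{n=1}^{\infty} a_n$.
• Assume that $\sum_{n=1}^{\infty} a_n$ converges, and that $c$ is a real number.
• Then $\sum_{n=1}^{\infty} c a_n = \lim_{k\to\infty} \sum_{n=1}^{k} c a_n = \lim_{k\to\infty} c \sum_{n=1}^{k} a_n = c \lim_{k\to\infty} \sum_{n=1}^{k} a_n = c \sum_{n=1}^{\infty} a_n$
proving the result.
Another easy result is that if $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are both convergent series, then
$\sum_{n=1}^{\infty} (a_n + b_n) = \sum_{n=1}^{\infty} a_n + \sum_{n=1}^{\infty} b_n$. Again, this is easy because the result follows
immediately from properties of finite sums.
PROOF: If the series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ both converge, then
$\sum_{n=1}^{\infty} (a_n + b_n) = \sum_{n=1}^{\infty} a_n + \sum_{n=1}^{\infty} b_n$.
• Assume that the series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ both converge.
• Then $\sum_{n=1}^{\infty} (a_n + b_n) = \lim_{k\to\infty} \sum_{n=1}^{k} (a_n + b_n) = \lim_{k\to\infty} \left(\sum_{n=1}^{k} a_n + \sum_{n=1}^{k} b_n\right) =
\lim_{k\to\infty} \sum_{n=1}^{k} a_n + \lim_{k\to\infty} \sum_{n=1}^{k} b_n = \sum_{n=1}^{\infty} a_n + \sum_{n=1}^{\infty} b_n$ proving the result.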
With these theorems you can often start with a series whose value you know
and derive the values of other series. For example, what is the value of the
series $1 + \frac{1}{2} - \frac{1}{4} + \frac{1}{8} + \frac{1}{16} - \frac{1}{32} + \frac{1}{64} + \frac{1}{128} - \frac{1}{256} + \cdots$? This series looks
something like the geometric series with first term 1 and common ratio $\frac{1}{2}$ which
is $1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$. That series has limit $\frac{1}{1 - \frac{1}{2}} = 2$. But the new series
is clearly not a geometric series because the terms are not all the same sign,
which would be the case for a geometric series with a positive common ratio,
nor are the terms alternating in sign, which would be the case for a geometric
series with a negative common ratio. The new series can be written, though, as
$\left(1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots\right) - \left(0 + 0 + \frac{2}{4} + 0 + 0 + \frac{2}{32} + 0 + 0 + \frac{2}{256} + \cdots\right)$. This is
the difference of two series: the geometric series with first term 1 and common ratio
$\frac{1}{2}$, and a series whose value is the same as a geometric series with first term $\frac{2}{4} = \frac{1}{2}$
and common ratio $\frac{1}{8}$. Thus, the new series has value
$\frac{1}{1 - \frac{1}{2}} - \frac{\frac{1}{2}}{1 - \frac{1}{8}} = 2 - \frac{4}{7} = \frac{10}{7}$.
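The value $\frac{10}{7}$ can be sanity-checked by adding terms of the series exactly. In this sketch the function `term` is a hypothetical helper that encodes the sign pattern (every third term is negative); it is introduced here only for illustration.

```python
from fractions import Fraction


def term(n):
    """n-th term (n >= 1) of 1 + 1/2 - 1/4 + 1/8 + 1/16 - 1/32 + ...

    Every third term carries a minus sign.
    """
    sign = -1 if n % 3 == 0 else 1
    return Fraction(sign, 2 ** (n - 1))


partial = sum(term(n) for n in range(1, 61))
print(float(partial), float(Fraction(10, 7)))
```

After 60 terms the exact partial sum differs from $\frac{10}{7}$ by far less than floating-point precision can display, which is consistent with the value derived from the two geometric series.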
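$$
\begin{aligned}
&1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + \frac{1}{11} - \frac{1}{6} + \cdots \\
&\quad= 1 + 0 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + 0 + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + 0 + \frac{1}{11} - \frac{1}{6} + \cdots \\
&\quad= \left(1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots\right)
 + \left(0 + \frac{1}{2} + 0 - \frac{1}{4} + 0 + \frac{1}{6} + 0 - \frac{1}{8} + 0 + \cdots\right) \\
&\quad= \left(1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots\right)
 + \frac{1}{2}\left(1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots\right) \\
&\quad= \ln 2 + \frac{1}{2} \ln 2 = \frac{3}{2} \ln 2.
\end{aligned}
$$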
It is not unusual that rearranging the order of terms in a series results in the series
converging to a different quantity. This is, in fact, a characteristic of all conditionally
convergent series as will be shown later in this chapter.
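The effect of rearranging can be observed numerically. The sketch below compares partial sums of the alternating harmonic series with partial sums of the rearrangement that takes two positive terms followed by one negative term. The function names and the choice of 100,000 terms or blocks are assumptions of this illustration.

```python
import math


def rearranged_partial_sum(blocks):
    """Sum `blocks` blocks of the rearranged alternating harmonic series.

    Each block holds two positive odd-denominator terms followed by one
    negative even-denominator term: 1/(4j-3) + 1/(4j-1) - 1/(2j).
    """
    return sum(1 / (4 * j - 3) + 1 / (4 * j - 1) - 1 / (2 * j)
               for j in range(1, blocks + 1))


def alternating_harmonic_partial_sum(k):
    """Sum of the first k terms of 1 - 1/2 + 1/3 - 1/4 + ..."""
    return sum((-1) ** (n + 1) / n for n in range(1, k + 1))


print(alternating_harmonic_partial_sum(100_000), math.log(2))
print(rearranged_partial_sum(100_000), 1.5 * math.log(2))
```

Both series use exactly the same terms, yet the printed partial sums settle near $\ln 2$ and $\frac{3}{2}\ln 2$, respectively.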
7.3.1 Exercises
1. Prove that if the series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ both converge, and $c$ and $d$ are real
numbers, then
$\sum_{n=1}^{\infty} (c a_n + d b_n) = c \sum_{n=1}^{\infty} a_n + d \sum_{n=1}^{\infty} b_n$.
2. Prove that if the series $\sum_{n=1}^{\infty} a_n$ converges, then its sequence of tails $t_n = \sum_{m=n}^{\infty} a_m$
converges to 0.
7.4 Tests for Absolute Convergence 209
There are many tests for the convergence of series. Presented here are four very
useful tests that apply to series whose terms are all positive real numbers. Of course,
since the convergence of $\sum_{n=1}^{\infty} |a_n|$ implies the convergence of $\sum_{n=1}^{\infty} a_n$, these tests can
be thought of as tests for the absolute convergence of series.
After the Limit of Terms Test, the Comparison Test is likely the most important
convergence test because it is used to prove most of the other convergence tests. It
states that if the terms of one series are less than or equal to the corresponding terms
of a second series, then the convergence of the second series implies the convergence
of the first series. Specifically, suppose there are two series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$, and for
each $n$, the terms satisfy $0 \le a_n \le b_n$. Then if $\sum_{n=1}^{\infty} b_n$ converges, it follows that $\sum_{n=1}^{\infty} a_n$
must converge. The contrapositive of this statement is then also true and states that
if $\sum_{n=1}^{\infty} a_n$ diverges, then $\sum_{n=1}^{\infty} b_n$ must also diverge.
Consider how you would prove that this test is valid. The proof would assume that
$0 \le a_n \le b_n$ for each $n$, and assume that $\sum_{n=1}^{\infty} b_n$ converges. Then it must show that
$\sum_{n=1}^{\infty} a_n$ converges. One shows that a series converges by showing that its sequence
of partial sums converges. You do know that the sequence of partial sums for $\sum_{n=1}^{\infty} b_n$
converges, so how can you use that to make a conclusion about the partial sums of
$\sum_{n=1}^{\infty} a_n$? One idea is to use the technique from the proof that absolutely convergent
series are convergent; that is, a series converges if and only if its sequence of partial
sums is Cauchy. If the partial sums of $\sum_{n=1}^{\infty} b_n$ form a Cauchy sequence, then $\sum_{n=k}^{m} b_n$
gets small whenever $k \le m$ are large. Now, the given fact that $a_n \le b_n$ lets you
conclude that $\sum_{n=k}^{m} a_n \le \sum_{n=k}^{m} b_n$ which implies that the partial sums of $\sum_{n=1}^{\infty} a_n$ are
Cauchy. Thus, $\sum_{n=1}^{\infty} a_n$ must converge.
nD1
P P
PROOF (Comparison Test): Suppose that an and bn are series
nD1 nD1
with nonnegative terms and N is a real number such that for every
P
1
integer n > N, the terms satisfy 0 an bn . Then if bn converges, so
nD1
P
1
does an .
nD1
P P
• Assume that an and bn are series with nonnegative terms.
nD1 nD1
• Assume that there is an N such that for every n > N, the terms of the series
satisfy 0 an bn .
P1
• Assume that the series bn converges.
nD1
P
k
• This means that the sequence of partial sums bn converges and is,
nD1
therefore, a Cauchy sequence.
• Thus,ˇ given > 0 ˇthere is an M N such that if M < k m, then
ˇPm Pk ˇ Pm
> ˇˇ bn bn ˇˇ D bn .
nD1 nD1 nDkC1
Pm P
m
• But then whenever M < k m, it follows that > bn an D
nDkC1 nDkC1
P
m P
k
an an .
nD1 nD1
P
1
• This implies that the sequence of partial sums of an is Cauchy.
nD1
P
1
• Therefore, the sequence of partial sums of an converges, so the series
nD1
converges.
• This proves that the Comparison Test is valid.
The Comparison Test can be used in many cases when you are faced with a series
which is similar to a series that you know converges. For example, you already know
that the series $\sum_{n=1}^{\infty} \frac{1}{n^2+n}$ converges because it forms a telescoping series. Can this fact
be used to show that the series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges? Well, the Comparison Test cannot
be applied directly because for each $n$ you have $\frac{1}{n^2+n} < \frac{1}{n^2}$ which is not what you
need. You need to find a convergent series whose terms are greater than or equal to
$\frac{1}{n^2}$ or a divergent series whose terms are less than or equal to them. You have neither.
On the other hand for each positive integer $n$, it is true that $\frac{1}{n^2} = \frac{2}{n^2+n^2} \le \frac{2}{n^2+n}$.
The series $\sum_{n=1}^{\infty} \frac{2}{n^2+n}$ is twice a convergent series, so it is also convergent. Thus, the
Comparison Test shows that $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges.
In this way the Comparison Test can be used to simplify the task of testing the
convergence of many complicated looking series. As another example, consider the
series $\sum_{n=1}^{\infty} \frac{2n+7}{n^3-5n+1}$. Note that the first two terms of this series are negative. Because
the convergence of a series does not depend on the value of any finite set of its
terms, it is sufficient to test the series by considering the terms where $n \ge 3$. In
the terms $\frac{2n+7}{n^3-5n+1}$ the degree of the polynomial in the denominator is 3 while the
degree of the polynomial in the numerator is 1. This suggests that the terms could be
compared to the terms $\frac{1}{n^2}$ of a known convergent series. The strategy is to compare
$\frac{2n+7}{n^3-5n+1}$ to a fraction that is greater but looks more like $\frac{1}{n^2}$. If the series with greater
fractions converges, the Comparison Test shows that the original series converges.
This can be done by attempting to eliminate lower degree terms of the numerator
and denominator polynomials, thus ending up with a simpler fraction greater than
the original. Clearly, when considering the numerator $2n + 7$, the constant term,
7, will be dwarfed by the size of the linear term $2n$ suggesting that you replace
$2n + 7$ by the larger quantity $2n + 7n = 9n$. This replacement will result in a larger
fraction, but it should not affect whether or not the series converges. Similarly, it
would be good to replace the denominator $n^3 - 5n + 1$ with a smaller polynomial of
the same degree which will result in obtaining a fraction larger than $\frac{2n+7}{n^3-5n+1}$. One
can drop the constant term altogether, but one cannot drop the $-5n$ term without
making the denominator polynomial larger. This can be handled by writing $n^3$ as
$\frac{1}{2} n^3 + \frac{1}{2} n^3$. For large enough values of $n$, the value of $\frac{1}{2} n^3$ will exceed $5n$ making
$\frac{1}{2} n^3 - 5n$ a positive quantity which could be removed from the polynomial to make
the polynomial smaller. Indeed, you need $\frac{1}{2} n^3 - 5n \ge 0$ implying $n^2 \ge 10$. Thus,
if $n \ge 4$, you can conclude that $n^3 - 5n + 1 > \frac{1}{2} n^3$. This shows that for $n \ge 4$, the
fraction $\frac{2n+7}{n^3-5n+1} < \frac{9n}{\frac{1}{2} n^3} = 18 \cdot \frac{1}{n^2}$. Since the series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges, so does $\sum_{n=1}^{\infty} \frac{18}{n^2}$.
Thus, by the Comparison Test, the series $\sum_{n=1}^{\infty} \frac{2n+7}{n^3-5n+1}$ converges.
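The bound used in this comparison can be spot-checked numerically. The following sketch, with hypothetical helper names `original_term` and `bound_term`, verifies the inequality $0 < \frac{2n+7}{n^3-5n+1} < \frac{18}{n^2}$ for a finite range of $n \ge 4$. A finite check is not a proof; it only guards against algebra slips.

```python
def original_term(n):
    """Term (2n + 7) / (n^3 - 5n + 1) of the series being tested."""
    return (2 * n + 7) / (n**3 - 5 * n + 1)


def bound_term(n):
    """Term 18 / n^2 of the convergent comparison series."""
    return 18 / n**2


# Spot-check the inequality derived above for n = 4, ..., 10000.
assert all(0 < original_term(n) < bound_term(n) for n in range(4, 10_001))

# The first two terms are negative, as noted in the text.
print([original_term(n) for n in (1, 2, 3)])
```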
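As a final example of using the Comparison Test, consider the series
$\frac{1}{2} + \frac{1}{4} + \frac{1}{4} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \cdots$. For each
$k \ge 0$ this series has $2^k$ terms equal to $\frac{1}{2^{k+1}}$, and these $2^k$ terms add to $\frac{1}{2}$. Thus, the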
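PROOF (Ratio Test): Suppose that $\sum_{n=1}^{\infty} a_n$ is a series of positive terms such
that $\lim_{n\to\infty} \frac{a_{n+1}}{a_n} = L$. Then if $L < 1$, the series converges, if $L > 1$, the series
diverges, and if $L = 1$ the test fails.
• Assume that $\sum_{n=1}^{\infty} a_n$ is a series of positive terms such that $\lim_{n\to\infty} \frac{a_{n+1}}{a_n} = L$.
CASE 1: $L < 1$
• If $L < 1$, there is an integer $N$ such that $\frac{a_{n+1}}{a_n} < \frac{L+1}{2}$ for all $n \ge N$.
• Let $a = a_N$, and $r = \frac{L+1}{2} < 1$.
• Assume that for some $k \ge 0$, $a_{N+k} \le ar^k$. Then $\frac{a_{N+k+1}}{a_{N+k}} < r$, so
$a_{N+k+1} < a_{N+k}\, r \le ar^k\, r = ar^{k+1}$.
• Since $a_N = ar^0$, it follows by induction that $a_{N+k} \le ar^k$ for all $k \ge 0$, so the
series converges by comparison with the convergent geometric series $\sum_{k=0}^{\infty} ar^k$.
CASE 2: $L > 1$
• If $L > 1$, there is an integer $N$ such that $\frac{a_{n+1}}{a_n} > 1$ for all $n \ge N$.
• Then for all $n \ge N$, $a_{n+1} > a_n > 0$, so the sequence of terms increases from
$a_N$ and cannot have a limit of 0.
• Therefore, the series diverges by the Limit of Terms Test.
CASE 3: $L = 1$
• Note that the constant series $\sum_{n=1}^{\infty} 1$ diverges, and $\lim_{n\to\infty} \frac{a_{n+1}}{a_n} = \lim_{n\to\infty} \frac{1}{1} = 1$.
• The series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges, and $\lim_{n\to\infty} \frac{a_{n+1}}{a_n} = \lim_{n\to\infty} \frac{n^2}{(n+1)^2} = 1$.
• Therefore, no conclusion can be drawn when $L = 1$, and the Ratio Test
fails.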
The Ratio Test is not helpful for series where the nth term is a rational function of $n$
because the limit of $\frac{a_{n+1}}{a_n}$ will always be 1, and the test is inconclusive. The Ratio Test
is particularly useful for series whose nth terms involve powers or factorials. For
example, when you apply the Ratio Test to the series $\sum_{n=1}^{\infty} \frac{5^n}{n!}$, you get
$\lim_{n\to\infty} \frac{5^{n+1}/(n+1)!}{5^n/n!} = \lim_{n\to\infty} \frac{5}{n+1} = 0$, which is less than 1, so the series converges.
Note that rather than requiring $\frac{a_{n+1}}{a_n}$ to have a limit, it is enough to assume
that $\limsup_{n\to\infty} \frac{a_{n+1}}{a_n} < 1$ to assure that the series converges and $\liminf_{n\to\infty} \frac{a_{n+1}}{a_n} > 1$ to
assure that the series diverges. The proofs of these facts are left as exercises, but
they are important refinements of the Ratio Test since the lim inf and lim sup always
exist even if the limit does not. For example, consider the series
$1 + \frac{2}{3} + \frac{1}{3} + \frac{2}{3^2} + \frac{1}{3^2} + \frac{2}{3^3} + \frac{1}{3^3} + \cdots$. For this series, the ratio $\frac{a_{n+1}}{a_n}$ oscillates between $\frac{2}{3}$ and $\frac{1}{2}$, so the
limit of the ratio does not exist. But the lim sup of the ratio is $\frac{2}{3} < 1$ implying that
the series converges.
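The oscillation of the ratios in this example can be seen by computing them exactly. In the sketch below, `term` is a hypothetical helper encoding the pattern $1, \frac{2}{3}, \frac{1}{3}, \frac{2}{3^2}, \frac{1}{3^2}, \ldots$; the indexing convention is an assumption made for this illustration.

```python
from fractions import Fraction


def term(n):
    """n-th term (n >= 1) of 1 + 2/3 + 1/3 + 2/9 + 1/9 + 2/27 + 1/27 + ..."""
    if n == 1:
        return Fraction(1)
    j = n // 2                      # n = 2j or n = 2j + 1
    numerator = 2 if n % 2 == 0 else 1
    return Fraction(numerator, 3**j)


ratios = [term(n + 1) / term(n) for n in range(1, 9)]
print(ratios)
```

The ratios alternate between $\frac{2}{3}$ and $\frac{1}{2}$, so no limit exists, while the largest value that keeps recurring, $\frac{2}{3}$, is below 1.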
The Ratio Test will play a major role in the discussion of power series in the next
chapter.
The Root Test is similar to the Ratio Test and can often be used for the same
series for which the Ratio Test can be used. This is because, like the Ratio Test,
it compares a series to a geometric series. For some series where the general term an
involves the nth powers of expressions, the Root Test can be easier to apply than the
Ratio Test. To test a series with positive terms $a_n$ with the Root Test, you calculate
the limit $\lim_{n\to\infty} \sqrt[n]{a_n} = L$. Then, as with the Ratio Test, if $L < 1$, the series converges,
if $L > 1$, the series diverges, and if $L = 1$, the test fails.
Proving that the Root Test is valid is very straightforward. Given that
$\lim_{n\to\infty} \sqrt[n]{a_n} = L < 1$, there is an integer $N$ such that for all $n \ge N$ the root $\sqrt[n]{a_n}$ is
less than $\frac{L+1}{2} < 1$. Then, for $n \ge N$, the terms $a_n$ are less than $\left(\frac{L+1}{2}\right)^n$, the terms
of a convergent geometric series. Thus, the series converges by the Comparison
Test.
PROOF (Root Test): Suppose that $\sum_{n=1}^{\infty} a_n$ is a series of positive terms such
that $\lim_{n\to\infty} \sqrt[n]{a_n} = L$. Then if $L < 1$, the series converges, if $L > 1$, the series
diverges, and if $L = 1$ the test fails.
• Assume that $\sum_{n=1}^{\infty} a_n$ is a series of positive terms such that $\lim_{n\to\infty} \sqrt[n]{a_n} = L$.
CASE 1: $L < 1$
• If $L < 1$, there is an integer $N$ such that $\sqrt[n]{a_n} < \frac{L+1}{2}$ for all $n \ge N$.
• Then, for $n \ge N$, each term $a_n < \left(\frac{L+1}{2}\right)^n$, the corresponding term of a
geometric series with common ratio $\frac{L+1}{2} < 1$.
• Therefore, since the geometric series converges, the Comparison Test shows
that $\sum_{n=1}^{\infty} a_n$ converges.
CASE 2: $L > 1$
• If $L > 1$, there is an integer $N$ such that $\sqrt[n]{a_n} > \frac{L+1}{2} > 1$ for all $n \ge N$.
• Then, for all $n \ge N$, $a_n > \left(\frac{L+1}{2}\right)^n$ which diverges to infinity.
• Therefore, the series diverges by the Limit of Terms Test.
CASE 3: $L = 1$
• Note that the constant series $\sum_{n=1}^{\infty} 1$ diverges, and $\lim_{n\to\infty} \sqrt[n]{a_n} = \lim_{n\to\infty} 1 = 1$.
• The series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges, and $\lim_{n\to\infty} \sqrt[n]{a_n} = \lim_{n\to\infty} \frac{1}{\sqrt[n]{n^2}}$.
• Since the natural logarithm function is continuous at 1, it follows that
$\ln \lim_{n\to\infty} \sqrt[n]{a_n} = \lim_{n\to\infty} \ln \sqrt[n]{a_n} = \lim_{n\to\infty} \left(-\frac{2 \ln n}{n}\right)$. Then by L'Hôpital's Rule,
this limit is $\lim_{n\to\infty} \left(-\frac{2/n}{1}\right) = 0$, from which it follows that $\lim_{n\to\infty} \sqrt[n]{a_n} = 1$.
• Therefore, no conclusion can be drawn when $L = 1$, and the Root Test fails.
As with the Ratio Test, it is sufficient to know that $\limsup_{n\to\infty} \sqrt[n]{a_n} < 1$ to conclude
that the series converges, and that $\liminf_{n\to\infty} \sqrt[n]{a_n} > 1$ to conclude that the series
diverges. For example, the series $\frac{1}{2} + \frac{1}{3} + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{2^3} + \frac{1}{3^3} + \cdots$ has general term
$a_{2n} = \frac{1}{3^n}$ and $a_{2n-1} = \frac{1}{2^n}$. Thus, $\lim_{n\to\infty} \sqrt[n]{a_n}$ does not exist, but $\limsup_{n\to\infty} \sqrt[n]{a_n} = \frac{1}{\sqrt{2}}$
which is less than 1, so the series converges.
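The two subsequential limits of $\sqrt[n]{a_n}$ in this example can be estimated numerically. The sketch below uses hypothetical helpers `term` and `nth_root` and a few arbitrarily chosen large indices; it is meant only to make the lim sup claim concrete.

```python
import math


def term(n):
    """n-th term of 1/2 + 1/3 + 1/2^2 + 1/3^2 + ...:
    a_{2m-1} = 1/2^m and a_{2m} = 1/3^m."""
    m = (n + 1) // 2
    base = 2 if n % 2 == 1 else 3
    return 1 / base**m


def nth_root(n):
    """Return the n-th root of the n-th term."""
    return term(n) ** (1 / n)


odd_roots = [nth_root(n) for n in (101, 1001)]
even_roots = [nth_root(n) for n in (100, 1000)]
print(odd_roots, 1 / math.sqrt(2))
print(even_roots, 1 / math.sqrt(3))
```

The roots along odd indices approach $\frac{1}{\sqrt{2}}$ and those along even indices approach $\frac{1}{\sqrt{3}}$, so the larger of the two, $\frac{1}{\sqrt{2}}$, is the lim sup.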
The definition of the Riemann integral considers the integrals of functions over
closed bounded intervals, $[a, b]$. This definition can be extended to integrals on
an infinite interval. An improper Riemann integral of the first kind defines
integrals over intervals where one or both of the endpoints of the interval are infinite.
One defines $\int_a^{\infty} f(x)\,dx$ as $\lim_{b\to\infty} \int_a^b f(x)\,dx$. Similarly, $\int_{-\infty}^{b} f(x)\,dx = \lim_{a\to -\infty} \int_a^b f(x)\,dx$
and $\int_{-\infty}^{\infty} f(x)\,dx = \lim_{a\to -\infty} \lim_{b\to\infty} \int_a^b f(x)\,dx$. For example,
$\int_1^{\infty} \frac{1}{x^2}\,dx = \lim_{b\to\infty} \int_1^b \frac{1}{x^2}\,dx = \lim_{b\to\infty} \left(-\frac{1}{x}\right)\Big|_1^b = 1$.
After seeing a definition of the improper Riemann integral of the first kind, the
reader may be curious whether there is also an improper Riemann integral of
the second kind. Although this text will not need to deal with improper Riemann
integrals of the second kind, the definition is given here for completeness. Recall
that Riemann integrals over an interval $[a, b]$ exist only if the integrand is bounded.
So, an improper integral of the second kind is an integral where the integrand is
unbounded in every neighborhood of a point $c \in [a, b]$. In this case, the Riemann
integral on $[a, b]$ can be calculated on a region that excludes $c$ and then the limit can
be taken as the region expands toward $c$. For example, one would define $\int_0^4 \frac{1}{\sqrt{x}}\,dx$ as
$\lim_{a\to 0^+} \int_a^4 \frac{1}{\sqrt{x}}\,dx = \lim_{a\to 0^+} 2\sqrt{x}\,\Big|_a^4 = 4$.
The Integral Test for the convergence of a series of positive terms involves
the comparison of an infinite series with an improper Riemann integral. It applies
to series whose terms are equal to a monotonically decreasing function $f$ defined
on an interval $[a, \infty)$ such that for all $n \ge a$, the nth term of the series $a_n$ is
equal to the function at the point $n$, that is, $a_n = f(n)$. The following figure
makes this comparison clear. Let $k$ be an integer greater than or equal to $a$. If $f$
is a monotonically decreasing function, then whenever $n \ge x > k$, the function
$f(x) \ge f(n) = a_n$ showing that $\int_{n-1}^{n} f(x)\,dx \ge \int_{n-1}^{n} f(n)\,dx = f(n) = a_n$. Thus, by the
Comparison Test, the series $\sum_{n=1}^{\infty} a_n$ converges if the series $\sum_{n=k}^{\infty} \int_n^{n+1} f(x)\,dx$ converges.
Then, because $f$ is a positive function, $\sum_{n=k}^{\infty} \int_n^{n+1} f(x)\,dx = \int_k^{\infty} f(x)\,dx$. Alternatively, if
$f$ is a monotonically decreasing function, then whenever $x \ge n > k$, the function
$f(x) \le f(n) = a_n$ showing that $\int_n^{n+1} f(x)\,dx \le \int_n^{n+1} f(n)\,dx = f(n) = a_n$. Thus,
again by the Comparison Test, the improper integral $\int_k^{\infty} f(x)\,dx = \sum_{n=k}^{\infty} \int_n^{n+1} f(x)\,dx$
converges if the series $\sum_{n=k}^{\infty} f(n)$ converges. Therefore, the series $\sum_{n=1}^{\infty} a_n$ converges
if and only if the improper integral $\int_k^{\infty} f(x)\,dx$ converges. Moreover, for any integer
$k \ge a$, $\int_{k+1}^{\infty} f(x)\,dx \le \sum_{n=k+1}^{\infty} a_n \le \int_k^{\infty} f(x)\,dx$ giving a fairly narrow range for the value
of the infinite series and a good way to obtain an approximation to the value of
the series. This is helpful because it is often easier to evaluate the integral than the
corresponding infinite series (Fig. 7.1).
Fig. 7.1 Comparing the series with the integral in the Integral Test
As an example, consider the collection of p-series which are the series $\sum_{n=1}^{\infty} \frac{1}{n^p}$
where $p$ is some constant greater than 0. For which $p$ does the p-series converge?
You have already seen that it converges when $p = 2$ and diverges when $p = 1$,
the harmonic series. All the p-series can be handled at once using the Integral Test.
Indeed, since the function $f(x) = \frac{1}{x^p}$ is monotonically decreasing in $x$ for each $p > 0$,
the p-series converges exactly when the integral $\int_1^{\infty} \frac{1}{x^p}\,dx$ converges. But the integral
is easy to calculate. When $p \ne 1$, $\int_1^{\infty} \frac{1}{x^p}\,dx = \frac{1}{1-p} \cdot \frac{1}{x^{p-1}} \Big|_1^{\infty}$. This is infinite when $p < 1$
but converges to $\frac{1}{p-1}$ when $p > 1$. When $p = 1$, the integral is $\ln x \,\big|_1^{\infty}$ which is infinite.
Thus, by the Integral Test, the integral and the series converge exactly when $p > 1$.
Consider the p-series when $p = 2$. The value of this series can be estimated
using the integral estimate associated with the Integral Test. The estimate would be
$\int_1^{\infty} \frac{1}{x^2}\,dx > \sum_{n=2}^{\infty} \frac{1}{n^2} > \int_2^{\infty} \frac{1}{x^2}\,dx$ or $1 + 1 > a_1 + \sum_{n=2}^{\infty} \frac{1}{n^2} > 1 + \frac{1}{2}$, so $\sum_{n=1}^{\infty} \frac{1}{n^2}$ is between
1.5 and 2. This is not very precise, but one can apply this technique a few terms
farther down the series to get $\int_{10}^{\infty} \frac{1}{x^2}\,dx > \sum_{n=11}^{\infty} \frac{1}{n^2} > \int_{11}^{\infty} \frac{1}{x^2}\,dx$ which shows that $\sum_{n=1}^{\infty} \frac{1}{n^2}$
is between 1.6406 and 1.6498. In fact, the limit of the series is $\frac{\pi^2}{6} \approx 1.6449$.
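The integral estimate for $\sum \frac{1}{n^2}$ is easy to reproduce. The sketch below adds the first $k$ terms and then brackets the tail between $\frac{1}{k+1}$ and $\frac{1}{k}$, the values of the two tail integrals of $\frac{1}{x^2}$. The function name `zeta2_bounds` is a label chosen here for illustration.

```python
import math


def zeta2_bounds(k):
    """Bracket sum_{n>=1} 1/n^2 using the first k terms and the Integral Test.

    With f(x) = 1/x^2, the tail integral from c to infinity equals 1/c, so
    1/(k+1) <= sum_{n>k} 1/n^2 <= 1/k.
    """
    head = sum(1 / n**2 for n in range(1, k + 1))
    return head + 1 / (k + 1), head + 1 / k


for k in (1, 10, 100):
    low, high = zeta2_bounds(k)
    print(k, low, high)
print("pi^2/6 =", math.pi**2 / 6)
```

The width of the bracket is $\frac{1}{k} - \frac{1}{k+1} = \frac{1}{k(k+1)}$, so adding more terms before applying the estimate narrows the range quickly.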
7.4.5 Exercises
1. Suppose that $\sum_{n=1}^{\infty} b_n$ is a convergent series, $\sum_{n=1}^{\infty} a_n$ is a series, and there are
constants $N$ and $K$ such that $0 \le a_n \le K b_n$ for all $n > N$. Prove that $\sum_{n=1}^{\infty} a_n$
converges.
2. Suppose that $\sum_{n=1}^{\infty} b_n$ is a convergent series with positive terms, $\sum_{n=1}^{\infty} a_n$ is a series
with positive terms, and $\lim_{n\to\infty} \frac{a_n}{b_n} = L$ for some real number $L$. Prove that $\sum_{n=1}^{\infty} a_n$
converges. This is sometimes called the Limit Comparison Test.
3. Assume that $\sum_{n=1}^{\infty} a_n$ is a series of positive terms that satisfies $\limsup_{n\to\infty} \frac{a_{n+1}}{a_n} = L < 1$.
Prove that $\sum_{n=1}^{\infty} a_n$ converges.
4. Assume that $\sum_{n=1}^{\infty} a_n$ is a series of positive terms that satisfies $\liminf_{n\to\infty} \frac{a_{n+1}}{a_n} = L > 1$.
Prove that $\sum_{n=1}^{\infty} a_n$ diverges.
5. For the series $\sum_{n=1}^{\infty} a_n = \frac{1}{2^1} + \frac{2}{2^1} + \frac{1}{2^2} + \frac{2}{2^2} + \frac{1}{2^3} + \frac{2}{2^3} + \frac{1}{2^4} + \frac{2}{2^4} + \cdots$ calculate
$\limsup_{n\to\infty} \frac{a_{n+1}}{a_n}$ and $\liminf_{n\to\infty} \frac{a_{n+1}}{a_n}$. What can you conclude about the convergence of
the series?
6. Assume that $\sum_{n=1}^{\infty} a_n$ is a series of positive terms that satisfies $\limsup_{n\to\infty} \sqrt[n]{a_n} = L < 1$.
Prove that $\sum_{n=1}^{\infty} a_n$ converges.
7.5 Alternating Series Test 219
7. Assume that $\sum_{n=1}^{\infty} a_n$ is a series of positive terms that satisfies $\limsup_{n\to\infty} \sqrt[n]{a_n} = L > 1$.
Prove that $\sum_{n=1}^{\infty} a_n$ diverges.
8. For the series $\sum_{n=1}^{\infty} a_n = \frac{1}{2^1} + \frac{1}{3^2} + \frac{1}{2^3} + \frac{1}{3^4} + \frac{1}{2^5} + \frac{1}{3^6} + \frac{1}{2^7} + \frac{1}{3^8} + \cdots$ calculate
$\limsup_{n\to\infty} \sqrt[n]{a_n}$ and $\liminf_{n\to\infty} \sqrt[n]{a_n}$. What can you conclude about the convergence of
the series?
9. Use the integral estimate from the Integral Test to estimate the size of the series
$\sum_{n=1}^{\infty} \frac{1}{n^3}$.
10. Determine which of the following series are absolutely convergent by applying
an appropriate convergence test.
(a) $\sum_{n=1}^{\infty} \frac{7n^6 - 18n^4 + 12n^2 - 183}{n^{10} - 5n^5 - 19}$
(b) $\sum_{n=1}^{\infty} \frac{n^2 + 5}{n^3 - 5}$
(c) $\sum_{n=1}^{\infty} \frac{3^n}{2^n + 5^n}$
P1
n
(d) 3
nD1
(e) $\sum_{n=1}^{\infty} \frac{n!}{(2n)!}$
(f) $\sum_{n=1}^{\infty} \frac{5^n}{n!}$
(g) $\sum_{n=1}^{\infty} \frac{n^n}{(n!)^2}$
So what can you do with a series which is not absolutely convergent? There are
fewer tools to handle conditionally convergent series. One tool that does help is
the Alternating Series Test which considers series whose terms alternate in sign.
Specifically, if the absolute values of the terms of the series are monotonically
decreasing to 0, and the signs of the terms alternate, then the series converges. For
example, the series seen earlier $1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots$ satisfies these conditions.
The series formed by the absolute values of these terms $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \cdots$
is the harmonic series which does not converge, so the given series is not absolutely
convergent. Seeing how the partial sums of this series behave will give you an idea
how to prove that the Alternating Series Test is valid. In particular, the first few
partial sums of this series are
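$$s_1 = 1 = 1$$
$$s_2 = 1 - \frac{1}{2} = \frac{1}{2}$$
$$s_3 = 1 - \frac{1}{2} + \frac{1}{3} = \frac{5}{6}$$
$$s_4 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} = \frac{7}{12}$$
$$s_5 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} = \frac{47}{60}.$$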
The progression can be seen graphically in Fig. 7.2.
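These partial sums, and the way the odd and even ones close in on $\ln 2$ from opposite sides, can be reproduced with exact fractions. The sketch below is illustrative only; the function name and the choice of ten terms are assumptions made here.

```python
from fractions import Fraction
import math


def alternating_harmonic_partial_sums(k):
    """Return the exact partial sums s_1, ..., s_k of 1 - 1/2 + 1/3 - ..."""
    sums, total = [], Fraction(0)
    for n in range(1, k + 1):
        total += Fraction((-1) ** (n + 1), n)
        sums.append(total)
    return sums


sums = alternating_harmonic_partial_sums(10)
print(sums[:5])
odd = sums[0::2]    # s_1, s_3, s_5, ...
even = sums[1::2]   # s_2, s_4, s_6, ...
assert all(x > y for x, y in zip(odd, odd[1:]))     # odd sums decrease
assert all(x < y for x, y in zip(even, even[1:]))   # even sums increase
assert max(even) < math.log(2) < min(odd)           # ln 2 lies between them
```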
Notice that the partial sums of an odd number of terms are all greater than the
limit, $\ln 2$, while the partial sums of an even number of terms are all less than the
limit. Also, if $n$ is odd, then $s_{n+2} = s_n - \frac{1}{n+1} + \frac{1}{n+2} = s_n - \frac{1}{(n+1)(n+2)} < s_n$,
showing that the partial sums of an odd number of terms form a decreasing
sequence. Similarly, if $n$ is even, then $s_{n+2} = s_n + \frac{1}{n+1} - \frac{1}{n+2} = s_n + \frac{1}{(n+1)(n+2)} > s_n$,
showing that the partial sums of an even number of terms form an increasing
sequence. Because the terms of the series, $\frac{(-1)^n}{n+1}$, approach 0, the odd partial sums
and the even partial sums approach each other. They both form bounded monotonic
sequences, which both converge to the common limit. This behavior is typical of all
series satisfying the hypothesis of the Alternating Series Test.
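The behavior of these partial sums can be checked by direct computation. The following is a minimal sketch, not part of the original text; the function name `alternating_harmonic_partial_sums` is a hypothetical helper chosen for illustration. It computes the partial sums exactly and tests that the odd numbered sums decrease, the even numbered sums increase, and $\ln 2$ lies between them.

```python
from fractions import Fraction
import math

def alternating_harmonic_partial_sums(count):
    """Return [s_1, ..., s_count] for 1 - 1/2 + 1/3 - ... as exact fractions."""
    sums, total = [], Fraction(0)
    for n in range(1, count + 1):
        total += Fraction((-1) ** (n + 1), n)
        sums.append(total)
    return sums

sums = alternating_harmonic_partial_sums(12)
odd = sums[0::2]    # s_1, s_3, s_5, ...
even = sums[1::2]   # s_2, s_4, s_6, ...

print([str(s) for s in sums[:5]])
print(all(a > b for a, b in zip(odd, odd[1:])))             # odd sums decrease
print(all(a < b for a, b in zip(even, even[1:])))           # even sums increase
print(all(e < math.log(2) < o for e, o in zip(even, odd)))  # ln 2 is trapped between
```

Exact fractions are used so that the first five values can be compared directly with $s_1$ through $s_5$ listed above.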
PROOF (Alternating Series Test): Suppose $\sum_{n=1}^{\infty} a_n$ is a series such that
$\lim_{n\to\infty} a_n = 0$, and for each $n \ge 1$, $a_n$ and $a_{n+1}$ have opposite signs, and
$|a_n| \ge |a_{n+1}|$. Then the series converges.
• Assume that $\sum_{n=1}^{\infty} a_n$ is a series such that $\lim_{n\to\infty} a_n = 0$, and for each $n \ge 1$, $a_n$
and $a_{n+1}$ have opposite signs, and $|a_n| \ge |a_{n+1}|$.
• Without loss of generality, assume that $a_1 > 0$.
• Let the series have partial sums $s_k = \sum_{n=1}^{k} a_n$.
• Note that if $n \ge 1$ is odd, then $a_{n+1}$ is negative and $a_{n+2}$ is positive with
$|a_{n+1}| \ge |a_{n+2}|$ implying that $s_{n+2} = s_n + (a_{n+1} + a_{n+2}) \le s_n$.
• Similarly, if $n \ge 1$ is even, then $a_{n+1}$ is positive and $a_{n+2}$ is negative with
$|a_{n+1}| \ge |a_{n+2}|$ implying that $s_{n+2} = s_n + (a_{n+1} + a_{n+2}) \ge s_n$.
• Thus, the subsequence of odd numbered partial sums forms a monotonically
decreasing sequence while the subsequence of even numbered partial sums
forms a monotonically increasing sequence.
• Because the subsequence of even numbered partial sums is increasing, when
$n$ is an odd positive integer it follows that $s_n > s_{n+1} \ge s_2$ showing the
subsequence of odd numbered partial sums is bounded below by $s_2$ implying
that that sequence converges to a limit $L_1$.
• Similarly, the subsequence of even numbered partial sums is an increasing
sequence that is bounded above by $s_1$ implying that that sequence converges
to a limit $L_2$.
• Then $L_1 - L_2 = \lim_{n\to\infty} (s_{2n+1} - s_{2n}) = \lim_{n\to\infty} a_{2n+1} = 0$ showing that $L_1 = L_2$
and that the odd numbered partial sums and the even numbered partial sums
both converge to the same limit.
• Therefore, the sequence of partial sums converges and the series converges.
This proof not only says that the given alternating series converges; it gives a
way to estimate the limit of the series. For any series that satisfies the hypothesis
of the theorem, any two adjacent partial sums, $s_n$ and $s_{n+1}$, are on opposite sides of
the limit $L$ of the series. Thus, the distance that $s_n$ is from the limit of the series is
less than the distance $s_n$ is from $s_{n+1}$, and that distance is just $|a_{n+1}|$. Therefore, it
is easy to remember that for these series, the distance that a partial sum is from the
limit of the series is no more than the first term that is not part of the sum, $|a_{n+1}|$.
Note that the Alternating Series Test for convergence and this limit estimate apply to
series without regard to whether the series is absolutely convergent or conditionally
convergent.
For example, the number $\frac{1}{e} = \frac{1}{0!} - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots$. This is an absolutely
convergent series as seen by the Ratio Test. But it is also a series whose terms alternate
in sign, and the absolute values of the terms decrease monotonically to 0. Thus,
the partial sum of the series $\frac{1}{0!} - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!}$ is already within $\frac{1}{100}$ of $\frac{1}{e}$ because
the first neglected term is $\frac{1}{5!} = \frac{1}{120}$. This technique gives an easy proof that the
number $e$ is irrational. It goes like this: If $e$ were rational, then it could be expressed
as $\frac{p}{q}$, where $p$ and $q$ are positive integers. Then $\frac{1}{e} = \frac{q}{p} = \frac{1}{0!} - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots$.
Multiplying both sides of this equation by $p!$ yields
$$q\,(p-1)! = p! - p! + \frac{p!}{2!} - \frac{p!}{3!} + \cdots \pm \left(1 - \frac{1}{p+1} + \frac{1}{(p+1)(p+2)} - \cdots\right).$$
Thus, the integer $q\,(p-1)!$ would be an integer
plus (or minus) $\frac{1}{p+1} - \frac{1}{(p+1)(p+2)} + \cdots$. But this infinite series is an alternating series
where the absolute values of the terms decrease to 0, so its value is between $\frac{1}{p+1}$ and
$\frac{1}{p+1} - \frac{1}{(p+1)(p+2)}$. Both of those values lie strictly between 0 and 1, so there would
have to be an integer strictly between 0 and 1,
something clearly not possible. This is a contradiction, so the assumption that $e$ is
rational must be false.
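The error estimate for $\frac{1}{e}$ is easy to confirm numerically. The sketch below is an illustration only; `inverse_e_partial_sum` is a hypothetical helper name, and the comparison value comes from Python's `math.exp`. It computes the five-term partial sum exactly and compares its distance from $\frac{1}{e}$ with the first neglected term.

```python
from fractions import Fraction
from math import exp, factorial

def inverse_e_partial_sum(last):
    """Sum of (-1)^n / n! for n = 0..last, as an exact fraction."""
    return sum(Fraction((-1) ** n, factorial(n)) for n in range(last + 1))

s4 = inverse_e_partial_sum(4)                 # 1/0! - 1/1! + 1/2! - 1/3! + 1/4!
first_neglected = Fraction(1, factorial(5))   # 1/5! = 1/120
error = abs(float(s4) - exp(-1))              # actual distance from 1/e

print(s4, first_neglected, error)
```

The partial sum is $\frac{3}{8}$, and its distance from $\frac{1}{e}$ is smaller than $\frac{1}{120}$, as the estimate predicts.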
7.5.1 Exercises
7.6 The Smallest Divergent Series
Recall that the p-series $\sum_{n=1}^{\infty} \frac{1}{n^p}$ converges when $p > 1$ and diverges otherwise. This
raises a natural question about whether there is, in some sense, a largest series
that converges, or, perhaps, a smallest series that diverges. If there were, that might
provide a good series to use in the Comparison Test because all series smaller would
converge, and all series larger would diverge. This turns out not to be the case. For
every series of positive terms, $\sum_{n=1}^{\infty} a_n$, that diverges, there is a sequence of positive
numbers $\langle b_n \rangle$ that converges to 0 such that the series $\sum_{n=1}^{\infty} a_n b_n$ also diverges. In
fact, one can take $b_n = \frac{1}{s_n}$ where $s_n$ is the nth partial sum of $\sum_{n=1}^{\infty} a_n$. Clearly, if the
series diverges, then $s_n$ goes to infinity, so $b_n = \frac{1}{s_n}$ goes to 0.
To prove this result you would begin with a divergent series with positive
terms, $\sum_{n=1}^{\infty} a_n$. Because the series is divergent, you know that the sequence of partial
sums must diverge to infinity. The strategy is to show that the partial sums of the new
series $\sum_{n=1}^{\infty} \frac{a_n}{s_n}$ are not Cauchy. In particular, for every integer $m$, there is an integer $k$
such that $\sum_{n=m+1}^{k} \frac{a_n}{s_n} > \frac12$ showing that the mth and kth partial sums differ by at least
$\frac12$. Suppose you are given a positive integer $m$. Since the original series diverges,
there is a positive integer $k$ such that $s_k > 2s_m$. Then the difference between the kth
and the mth partial sums of the new series is
$$\sum_{n=m+1}^{k} \frac{a_n}{s_n} \ge \sum_{n=m+1}^{k} \frac{a_n}{s_k} = \frac{\sum_{n=m+1}^{k} a_n}{s_k} = \frac{s_k - s_m}{s_k} > 1 - \frac12 = \frac12.$$
PROOF: Let $\sum_{n=1}^{\infty} a_n$ be a divergent series with positive terms and partial
sums $s_k = \sum_{n=1}^{k} a_n$. Then the series $\sum_{n=1}^{\infty} \frac{a_n}{s_n}$ also diverges.
• Assume that $\sum_{n=1}^{\infty} a_n$ is a divergent series with positive terms and partial sums
$s_k = \sum_{n=1}^{k} a_n$.
• Let $m$ be any positive integer.
• Since the partial sums $s_n$ diverge to infinity, there is a positive integer $k$ such
that $s_k > 2s_m$.
• Then the difference between the mth and kth partial sums of the series $\sum_{n=1}^{\infty} \frac{a_n}{s_n}$
is $\sum_{n=m+1}^{k} \frac{a_n}{s_n} \ge \sum_{n=m+1}^{k} \frac{a_n}{s_k} = \frac{\sum_{n=m+1}^{k} a_n}{s_k} = \frac{s_k - s_m}{s_k} > 1 - \frac12 = \frac12$.
• This shows that the sequence of partial sums of the series $\sum_{n=1}^{\infty} \frac{a_n}{s_n}$ is not a
Cauchy sequence, so it cannot converge.
• Therefore, the series $\sum_{n=1}^{\infty} \frac{a_n}{s_n}$ diverges.
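The key step of this proof can be watched in action. The sketch below is an illustration under an assumed choice of series, the harmonic series $a_n = \frac{1}{n}$, and `block_beyond` is a hypothetical helper name. For a given $m$ it finds the least $k$ with $s_k > 2s_m$ and adds up the terms $\frac{a_n}{s_n}$ for $m < n \le k$; the proof says this block always exceeds $\frac12$.

```python
def block_beyond(m, a):
    """For the positive series with terms a(n), find the least k with s_k > 2*s_m
    and return (k, sum of a(n)/s_n for n = m+1..k)."""
    s = sum(a(n) for n in range(1, m + 1))   # s_m
    target = 2 * s
    k, block = m, 0.0
    while s <= target:
        k += 1
        s += a(k)           # s is now s_k
        block += a(k) / s   # add the term a_k / s_k
    return k, block

def harmonic(n):
    return 1.0 / n

results = [block_beyond(m, harmonic) for m in (1, 2, 3)]
print(results)
```

Because the harmonic partial sums grow so slowly, $k$ grows quickly with $m$, which gives a feeling for how slowly the series $\sum \frac{a_n}{s_n}$ diverges.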
It is interesting to note that even though for a positive termed divergent series
$\sum_{n=1}^{\infty} a_n$, the series $\sum_{n=1}^{\infty} \frac{a_n}{s_n}$ also diverges, for every positive termed series $\sum_{n=1}^{\infty} a_n$, the series
$\sum_{n=1}^{\infty} \frac{a_n}{s_n^2}$ always converges. To see this, note that for $n > 1$ the term
$$\frac{a_n}{s_n^2} = \frac{s_n - s_{n-1}}{s_n^2} < \frac{s_n - s_{n-1}}{s_n s_{n-1}} = \frac{1}{s_{n-1}} - \frac{1}{s_n}.$$
Thus, the terms of the series $\sum_{n=1}^{\infty} \frac{a_n}{s_n^2}$ are less than the terms of a
convergent telescoping series $\sum_{n=2}^{\infty} \left(\frac{1}{s_{n-1}} - \frac{1}{s_n}\right) = \frac{1}{s_1} - \lim_{n\to\infty} \frac{1}{s_n}$. Whether the original
series diverges so that $s_n$ goes to infinity, or it converges to a finite value $L$ so that $s_n$
goes to $L$, the telescoping series converges.
7.7 Rearrangement of Terms
The series $1 - 1 + 1 - 1 + 1 - 1 + \cdots$ does not converge. Yet, if you insert parentheses to
group some of the terms together, it can result in a convergent series such as
$(1 - 1) + (1 - 1) + (1 - 1) + \cdots$, which converges to 0, or $1 + (-1 + 1) + (-1 + 1) + (-1 + 1) + \cdots$,
which converges to 1. So, inserting parentheses can turn a divergent series into a
convergent series. Equivalently, removing parentheses from a convergent series can
turn it into a divergent series. What if the series $\sum_{n=1}^{\infty} a_n$ converges? Can inserting
parentheses change whether or not it converges or change the limit to which the
series converges? The answer to this is no. The point is, if $\sum_{n=1}^{\infty} a_n$ converges, it means
that its sequence of partial sums converges. By inserting parentheses into the series,
you are just removing some of the terms in the sequence of partial sums. You end up
with a new series whose sequence of partial sums is a subsequence of the sequence
of partial sums of $\sum_{n=1}^{\infty} a_n$, and any such subsequence will converge to the same limit
as the original series.
Slightly more can be said. Suppose $\sum_{n=1}^{\infty} a_n$ is a series whose terms approach 0. If
parentheses are inserted in such a way that the number of terms contained within
each set of parentheses is bounded, then the insertion of parentheses cannot affect
whether the series converges or the limit to which the series converges. To see this
assume that each set of parentheses encloses at most $K$ terms. If $\sum_{n=1}^{\infty} a_n$ converges
to $L$, then, as suggested above, no insertion of parentheses can affect the limit of
the series. So suppose that the series $\sum_{n=1}^{\infty} a_n$ diverges, and that its partial sums are
$s_k = \sum_{n=1}^{k} a_n$. Because $\lim_{n\to\infty} a_n = 0$, for each $\epsilon > 0$, there is an $N$ such that for all
$n > N$, the size of the terms $|a_n|$ must be less than $\frac{\epsilon}{K}$. Suppose that for some $m > N$
one term of the series with parentheses added is $(a_{m+1} + a_{m+2} + a_{m+3} + \cdots + a_{m+k})$.
Then $s_m$ and $s_{m+k}$ are both partial sums for the series with parentheses added. For
any $j = 1, 2, 3, \ldots, k$, $|a_{m+1} + a_{m+2} + a_{m+3} + \cdots + a_{m+j}| < \frac{\epsilon}{K}\, j \le \epsilon$, showing that for
any of those $j$, $|s_{m+j} - s_m| < \epsilon$. The sequence of partial sums for the original series
does not converge either because the sequence is unbounded or because its lim sup
and lim inf are distinct values. Because every partial sum of the original series beyond
$s_N$ remains within $\epsilon$ of a partial sum of the series
with parentheses added, the sequence of partial sums of the series with parentheses
added must also either be unbounded or have
distinct lim sup and lim inf values. Thus, the series with parentheses added cannot
converge.
This observation can be very helpful. Consider again the series
$1 + \frac13 - \frac12 + \frac15 + \frac17 - \frac14 + \frac19 + \frac{1}{11} - \frac16 + \cdots$. This series is not absolutely convergent, and it does not
satisfy the hypothesis of the Alternating Series Test. Yet, if parentheses are inserted
to group each set of three terms: $\left(1 + \frac13 - \frac12\right) + \left(\frac15 + \frac17 - \frac14\right) + \left(\frac19 + \frac{1}{11} - \frac16\right) + \cdots$,
one gets a general term equal to $\frac{1}{2n-3} + \frac{1}{2n-1} - \frac{1}{n} = \frac{4n-3}{n(2n-1)(2n-3)}$. The series with
terms $\frac{4n-3}{n(2n-1)(2n-3)}$ converges absolutely as can be seen by comparing it to the p-series
with $p = 2$ since, for $n \ge 3$, one has $\frac{4n-3}{n(2n-1)(2n-3)} < \frac{4n}{n(2n-n)(2n-n)} = \frac{4}{n^2}$.
So, the series with parentheses added converges, and since each set of parentheses
contains a maximum of three terms, and the terms of the original series approach 0,
this means that the original series converges.
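The grouping argument can be checked numerically. In the sketch below, which is an illustration and not part of the original text, the groups are indexed by $j = 1, 2, 3, \ldots$ (an indexing assumed here for convenience), so that the jth group is $\frac{1}{4j-3} + \frac{1}{4j-1} - \frac{1}{2j}$; this matches the general term above evaluated at the even value $n = 2j$. The comparison value $\frac32 \ln 2$ is the classical limit of this rearrangement of the alternating harmonic series and is not stated in the text.

```python
import math

def grouped_term(j):
    """j-th parenthesized group: 1/(4j-3) + 1/(4j-1) - 1/(2j), for j = 1, 2, 3, ..."""
    return 1 / (4 * j - 3) + 1 / (4 * j - 1) - 1 / (2 * j)

def closed_form(n):
    """The general term (4n-3) / (n(2n-1)(2n-3)) from the text."""
    return (4 * n - 3) / (n * (2 * n - 1) * (2 * n - 3))

total = sum(grouped_term(j) for j in range(1, 200001))
print(total, 1.5 * math.log(2))
```

The two printed values agree to several decimal places, consistent with the conclusion that the grouped series, and therefore the original series, converges.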
Of course, if the number of terms enclosed by sets of parentheses is not bounded,
one cannot draw the same type of conclusions. The series
$(1) + \left(\frac12 + \frac12 - \frac12 - \frac12\right) + \left(\frac13 + \frac13 + \frac13 - \frac13 - \frac13 - \frac13\right) + \left(\frac14 + \frac14 + \frac14 + \frac14 - \frac14 - \frac14 - \frac14 - \frac14\right) + \cdots$ converges, but
if parentheses are removed, the series diverges even though its terms do approach 0.
The partial sums oscillate between 1 and 2.
Suppose you are given a target limit $L$. You have isolated the positive terms of
the series, the $b_n$ terms, and the negative terms of the series, the $c_n$ terms, so you can
play a cute game by taking a few $b_n$ terms such that the sum of those terms exceeds
$L$, and then subtract off a few $c_n$ terms until the sum decreases below $L$. You can then
add on more $b_n$ terms to make the sum again exceed $L$, and subtract off a few $c_n$ terms
until the sum decreases below $L$. Thus, by alternating between adding on $b_n$ terms
and subtracting off $c_n$ terms, you can arrange for the resulting series to have limit
$L$. More precisely, construct a new series inductively as follows: select $u_1$ so that
$\sum_{n=1}^{u_1} b_n > L$. This is always possible because the series with $b_n$ terms diverges to
infinity. Then select $v_1$ to be the least positive integer such that $\sum_{n=1}^{u_1} b_n - \sum_{n=1}^{v_1} c_n < L$.
Then select $u_2$ to be the least positive integer such that $\sum_{n=1}^{u_2} b_n - \sum_{n=1}^{v_1} c_n > L$, and
$v_2$ to be the least positive integer such that $\sum_{n=1}^{u_2} b_n - \sum_{n=1}^{v_2} c_n < L$. For $k \ge 2$, having
selected $u_k$ and $v_k$, select $u_{k+1}$ and $v_{k+1}$ to be the least positive integers such that
$\sum_{n=1}^{u_{k+1}} b_n - \sum_{n=1}^{v_k} c_n > L$, and $\sum_{n=1}^{u_{k+1}} b_n - \sum_{n=1}^{v_{k+1}} c_n < L$. It is then the case that the series
$b_1 + b_2 + b_3 + \cdots + b_{u_1} - c_1 - c_2 - c_3 - \cdots - c_{v_1} + b_{u_1+1} + b_{u_1+2} + b_{u_1+3} + \cdots + b_{u_2} - c_{v_1+1} - c_{v_1+2} - c_{v_1+3} - \cdots - c_{v_2} + \cdots$
is a rearrangement of the terms
of the original series with some extra 0 terms added. Since the terms of the series
approach 0, the partial sums of the series approach $L$. This provides the desired
rearrangement (Fig. 7.3).
PROOF: Let $\sum_{n=1}^{\infty} a_n$ be a conditionally convergent series, and let $L$ be any
real number. Then there is a rearrangement of the terms of the series
which converges to $L$.
• Let $\sum_{n=1}^{\infty} a_n$ be a conditionally convergent series, and let $L$ be any real number.
• For each $n$, define $b_n = a_n$ if $a_n \ge 0$, and $b_n = 0$ if $a_n < 0$, and define
$c_n = b_n - a_n$.
• Thus, for each $n$, $b_n \ge 0$, $c_n \ge 0$, and $a_n = b_n - c_n$.
• Because $\sum_{n=1}^{\infty} a_n$ is conditionally convergent, $\sum_{n=1}^{\infty} |a_n| = \sum_{n=1}^{\infty} b_n + \sum_{n=1}^{\infty} c_n$
diverges. Thus, at least one of $\sum_{n=1}^{\infty} b_n$ and $\sum_{n=1}^{\infty} c_n$ must diverge to infinity,
and because $\sum_{n=1}^{\infty} (b_n - c_n) = \sum_{n=1}^{\infty} a_n$ converges, both series must diverge to
infinity.
• The Limit of Terms Test shows that $\lim_{n\to\infty} a_n = 0$ and, thus, $\lim_{n\to\infty} b_n = \lim_{n\to\infty} c_n = 0$.
• Because $\sum_{n=1}^{\infty} b_n$ is unbounded, there is a least positive integer, $u_1$, such that
$\sum_{n=1}^{u_1} b_n > L$.
• Because $\sum_{n=1}^{\infty} c_n$ is unbounded, there is a least positive integer, $v_1$, such that
$\sum_{n=1}^{u_1} b_n - \sum_{n=1}^{v_1} c_n < L$.
• Having selected $u_k$ and $v_k$ for some $k \ge 1$, let $u_{k+1} > u_k$ be the least
positive integer such that $\sum_{n=1}^{u_{k+1}} b_n - \sum_{n=1}^{v_k} c_n > L$. Then let $v_{k+1} > v_k$ be the
least positive integer such that $\sum_{n=1}^{u_{k+1}} b_n - \sum_{n=1}^{v_{k+1}} c_n < L$. Thus, by mathematical
induction, the sequences $\langle u_k \rangle$ and $\langle v_k \rangle$ can be constructed so that for
each $k$, $\sum_{n=1}^{u_k} b_n - \sum_{n=1}^{v_k} c_n < L$ and $\sum_{n=1}^{u_{k+1}} b_n - \sum_{n=1}^{v_k} c_n > L$.
• Let the terms of the new series $\sum_{n=1}^{\infty} d_n$ be given by the terms $b_1, b_2, b_3, \ldots, b_{u_1}$
followed by the terms $-c_1, -c_2, -c_3, \ldots, -c_{v_1}$,
followed by the terms
$b_{u_1+1}, b_{u_1+2}, b_{u_1+3}, \ldots, b_{u_2}$ followed by the terms $-c_{v_1+1}, -c_{v_1+2},
-c_{v_1+3}, \ldots, -c_{v_2}$, and so forth, alternating between the sequence of $b_n$
terms for $u_k < n \le u_{k+1}$ and the sequence of $-c_n$ terms for $v_k < n \le v_{k+1}$.
• The resulting series $\sum_{n=1}^{\infty} d_n$ is a rearrangement of the terms in the series
$\sum_{n=1}^{\infty} a_n$ with some terms equal to 0 inserted.
• Given $\epsilon > 0$ there is an $N_1 \ge u_1$ such that if $n > N_1$, then $b_n < \epsilon$, and there
is an $N_2$ such that if $n > N_2$, then $c_n < \epsilon$.
• Then there is a $k_1$ such that $u_{k_1} > N_1$ and a $k_2$ such that $v_{k_2} > N_2$.
• Let $k = \max(k_1, k_2)$, and let $N = u_k + v_k$.
• Then for all $m > N$, either there is an $r$ such that $d_m = b_p$ for some $p$ with
$N_1 < u_r < p \le u_{r+1}$ or there is an $s$ such that $d_m = -c_q$ for some $q$ with $N_2 <
v_s < q \le v_{s+1}$. Thus, $\left|\sum_{n=1}^{m} d_n - L\right|$ is bounded by either $\max\left(c_{v_r}, b_{u_{r+1}}\right) < \epsilon$
or by $\max\left(b_{u_s}, c_{v_s}\right) < \epsilon$.
• This shows that $\sum_{n=1}^{\infty} d_n$ converges to $L$ implying that a rearrangement of the
series $\sum_{n=1}^{\infty} a_n$ converges to $L$ as claimed.
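The construction in this proof is effectively an algorithm. The following sketch illustrates it for one assumed example, the alternating harmonic series $1 - \frac12 + \frac13 - \frac14 + \cdots$; the function name `rearrange_to_target` is hypothetical, and, unlike the proof, the sketch lists the positive and negative terms separately rather than padding with 0 terms. It adds positive terms until the running sum exceeds the target, then negative terms until the sum drops below the target, and repeats.

```python
def rearrange_to_target(target, steps):
    """Greedy rearrangement of 1 - 1/2 + 1/3 - 1/4 + ... aimed at `target`.

    Positive terms are 1/1, 1/3, 1/5, ...; negative terms are -1/2, -1/4, ...
    Returns the final running sum and the list of terms in their new order.
    """
    next_odd, next_even = 1, 2
    total, terms = 0.0, []
    adding_positive = True
    for _ in range(steps):
        if adding_positive:
            term = 1.0 / next_odd
            next_odd += 2
        else:
            term = -1.0 / next_even
            next_even += 2
        total += term
        terms.append(term)
        # Switch direction as soon as the running sum crosses the target.
        if adding_positive and total > target:
            adding_positive = False
        elif not adding_positive and total < target:
            adding_positive = True
    return total, terms

total, terms = rearrange_to_target(1.0, 100000)
print(total, terms[:8])
```

Each term of the original series is used exactly once and in its original relative order within the positive and negative groups, so the output is a rearrangement. The running sum stays within the size of the most recently used terms of the target, which is why it settles toward the target as the terms shrink.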
This theorem takes care of the case of conditionally convergent series, but what
happens when terms of an absolutely convergent series are rearranged? The answer
is that nothing happens; that is, every rearrangement of an absolutely convergent
series converges to the same limit. Suppose, for example, the series $\sum_{n=1}^{\infty} a_n$ is
absolutely convergent with rearrangement $\sum_{n=1}^{\infty} b_n$. Because $\sum_{n=1}^{\infty} |a_n|$ converges, given
$\epsilon > 0$ there is an integer $N$ such that for all $k \ge N$, $\sum_{n=1}^{k} |a_n|$ is within $\epsilon$ of its
limit. Alternatively, $\sum_{n=N+1}^{\infty} |a_n| < \epsilon$. Because $\sum_{n=1}^{\infty} b_n$ is a rearrangement of $\sum_{n=1}^{\infty} a_n$,
there is an integer $K$ such that all the terms $a_1, a_2, a_3, \ldots, a_N$ are among the terms
$b_1, b_2, b_3, \ldots, b_K$. So, if $k \ge K$, by how much can $\sum_{n=1}^{k} a_n$ and $\sum_{n=1}^{k} b_n$ differ? Both
sums contain the terms $a_1, a_2, a_3, \ldots, a_N$, so the two sums differ only by a finite
number of the terms $a_{N+1}, a_{N+2}, a_{N+3}, \ldots$ whose absolute values add to at most $\sum_{n=N+1}^{\infty} |a_n| < \epsilon$.
This shows that the series and its rearrangement have partial sums within $\epsilon$ of each
other and completes the argument.
PROOF: Let $\sum_{n=1}^{\infty} a_n$ be a series that converges absolutely to $L$. Then every
rearrangement of the series also converges to $L$.
• Let $\sum_{n=1}^{\infty} a_n$ be a series that converges absolutely to $L$.
• Let $\sum_{n=1}^{\infty} b_n$ be any rearrangement of the series $\sum_{n=1}^{\infty} a_n$.
• Since $\sum_{n=1}^{\infty} a_n$ converges absolutely, given $\epsilon > 0$ there is an integer $N$ such
that if $k \ge N$, $\sum_{n=1}^{k} |a_n|$ is within $\epsilon$ of its limit, $\sum_{n=1}^{\infty} |a_n|$. This means that
$\sum_{n=N+1}^{\infty} |a_n| < \epsilon$.
• Since $\sum_{n=1}^{\infty} b_n$ is a rearrangement of $\sum_{n=1}^{\infty} a_n$, there is an integer $K$ such that all
of the terms $a_1, a_2, a_3, \ldots, a_N$ are among the terms $b_1, b_2, b_3, \ldots, b_K$.
• For $k \ge K$, the difference between the kth partial sums of the two series
is $\sum_{n=1}^{k} a_n - \sum_{n=1}^{k} b_n$. This difference is a sum of the terms in $\sum_{n=1}^{k} a_n$ that are
not in $\sum_{n=1}^{k} b_n$ minus the sum of the terms in $\sum_{n=1}^{k} b_n$ that are not in $\sum_{n=1}^{k} a_n$.
Neither sum contains any of the terms $a_1, a_2, a_3, \ldots, a_N$,
nor are there any
terms that appear in both sums. It follows that the difference of partial sums
equals a sum minus another sum where each sum contains distinct terms
from $a_{N+1}, a_{N+2}, a_{N+3}, \ldots$. Thus, $\left|\sum_{n=1}^{k} a_n - \sum_{n=1}^{k} b_n\right|$ is bounded above by
$\sum_{n=N+1}^{\infty} |a_n| < \epsilon$.
• Thus, given $\epsilon > 0$, there is a $K$ such that for all $k \ge K$, the kth partial sum of
$\sum_{n=1}^{\infty} a_n$ and the kth partial sum of $\sum_{n=1}^{\infty} b_n$ are within $\epsilon$ of each other.
• Because $\sum_{n=1}^{\infty} a_n$ converges to $L$, given $\epsilon > 0$, there is an $N_1$ such that if
$k \ge N_1$, $\left|\sum_{n=1}^{k} a_n - L\right| < \frac{\epsilon}{2}$.
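• Also, there is an $N_2$ such that if $k \ge N_2$, $\left|\sum_{n=1}^{k} a_n - \sum_{n=1}^{k} b_n\right| < \frac{\epsilon}{2}$.
• Then for all $k \ge \max(N_1, N_2)$,
$\left|\sum_{n=1}^{k} b_n - L\right| \le \left|\sum_{n=1}^{k} b_n - \sum_{n=1}^{k} a_n\right| + \left|\sum_{n=1}^{k} a_n - L\right| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$.
• Thus, the series $\sum_{n=1}^{\infty} a_n$ and its rearrangement $\sum_{n=1}^{\infty} b_n$ must both converge to $L$.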
7.7.3 Exercises
1. In which of the following series can the parentheses be removed without affecting
the convergence of the series?
1 1
(a) 1 12 C 12 13 13 C C 13 C 13 14 14 14
1 1 1 1
2
1 1 1 1
3
C C C C
4 4 4 4 5 5 5 5
(b) 12 14 C 16 18 C 10 1 1
12 C
1 1 1 1 1 1
(c) 2 2 C 2 2 C 2 2 C
1 1
(d) 12 13 C 14 C 15 16 17 C 1
9 C 10 1
C 11 121
131
C
1 1 1 1 1 1
8
C 16 C 17 18 19 C
14 15
(e) .1/C 1 12 C 1 12 14 C 1 12 14 18 C 1 12 14 18 16 1
C
2. Write a proof to show that if $\sum_{n=1}^{\infty} a_n$ is a conditionally convergent series, then
there is a rearrangement of the terms of the series that diverges to infinity and a
rearrangement that diverges to negative infinity.
3. Write a proof to show that if $a_1$, $a_2$, and $a_3$ are real numbers, the series
$\frac{a_1}{1} + \frac{a_2}{2} + \frac{a_3}{3} + \frac{a_1}{4} + \frac{a_2}{5} + \frac{a_3}{6} + \cdots$ converges if and only if $a_1 + a_2 + a_3 = 0$.
4. Write a proof to show that if $\sum_{n=1}^{\infty} a_n$ is an absolutely convergent series, and $\sum_{n=1}^{\infty} b_n$
is a convergent series, then $\sum_{n=1}^{\infty} a_n b_n$ converges.
5. Give an example of convergent series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ where $\sum_{n=1}^{\infty} a_n b_n$ diverges.
6. Using the method described in this section find the first 20 terms of the
rearrangement of the series $1 - 1 + \frac12 - \frac12 + \frac13 - \frac13 + \frac14 - \frac14 + \cdots$ that converges
to 1.
7.8 Cauchy Products
Earlier it was shown that if $\sum_{n=1}^{\infty} a_n$ converges to $L$ and $\sum_{n=1}^{\infty} b_n$ converges to $M$, then
the sum of the series, $\sum_{n=1}^{\infty} (a_n + b_n)$, converges to $L + M$. What can be said about
the product of the series $\left(\sum_{n=1}^{\infty} a_n\right)\left(\sum_{n=1}^{\infty} b_n\right)$? First of all, can this product even be
written as an infinite series? One could, of course, write $\sum_{n=1}^{\infty} \sum_{p=1}^{\infty} a_n b_p$, and some
sense can be made out of this expression. The notation suggests that for each $n$,
one would calculate a limit of $\sum_{p=1}^{\infty} a_n b_p$, and then one would consider the series of
those limits. This raises interesting questions about whether that limit, if it should
exist, has anything to do with the similar looking $\sum_{p=1}^{\infty} \sum_{n=1}^{\infty} a_n b_p$. In fact, as seen in
the exercises, there are examples where interchanging the order of summation in a
double summation can result in a different limit.
A simpler approach is to group the terms of the product $\left(\sum_{n=1}^{\infty} a_n\right)\left(\sum_{n=1}^{\infty} b_n\right)$ in a
way that might allow you to calculate the sum. One strategy is to group the terms
$a_n b_p$ where $n + p$ is a given constant. For example, when the constant is 2, there
is only one term $a_1 b_1$. When the constant is 3, there are two terms $a_1 b_2 + a_2 b_1$.
In general, the grouping of the terms whose subscripts add to $n$ is $\sum_{p=1}^{n-1} a_p b_{n-p}$. This
gives what is known as the Cauchy product of the two series $\sum_{n=2}^{\infty} \left(\sum_{p=1}^{n-1} a_p b_{n-p}\right)$.
Note that this definition is symmetric in $a$ and $b$, so the Cauchy product of $\sum_{n=1}^{\infty} a_n$
and $\sum_{n=1}^{\infty} b_n$ is the same as the Cauchy product of $\sum_{n=1}^{\infty} b_n$ and $\sum_{n=1}^{\infty} a_n$.
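For example, what is the Cauchy product for the square of the geometric series
$\sum_{n=1}^{\infty} \frac{1}{2^n}$? Here you have two identical series where $a_n = b_n = \frac{1}{2^n}$, so
$\sum_{p=1}^{n-1} a_p b_{n-p} = \sum_{p=1}^{n-1} \frac{1}{2^p} \cdot \frac{1}{2^{n-p}} = \sum_{p=1}^{n-1} \frac{1}{2^n} = \frac{n-1}{2^n}$. Thus, the Cauchy product is $\sum_{n=2}^{\infty} \frac{n-1}{2^n}$. The Ratio Test
shows that this series converges to some value $S$. So,
$$\begin{aligned}
S &= \sum_{n=2}^{\infty} \frac{n-1}{2^n} \\
2S &= \sum_{n=2}^{\infty} \frac{n-1}{2^{n-1}} = \sum_{n=1}^{\infty} \frac{n}{2^n} \\
2S - S &= \frac12 + \sum_{n=2}^{\infty} \frac{1}{2^n} = 1 \\
S &= 1
\end{aligned}$$
This Cauchy product converges to 1 which is the expected limit since $\sum_{n=1}^{\infty} \frac{1}{2^n} = 1$.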
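The grouping that defines the Cauchy product translates directly into a short computation. The sketch below is an illustration only; `cauchy_product_terms` is a hypothetical helper that takes the two series as functions of a 1-based index. It computes the terms exactly for the geometric series above and sums the first few.

```python
from fractions import Fraction

def cauchy_product_terms(a, b, last):
    """Terms c_n = sum of a(p) * b(n-p) for p = 1..n-1, listed for n = 2..last."""
    return [sum(a(p) * b(n - p) for p in range(1, n)) for n in range(2, last + 1)]

def half_power(n):
    return Fraction(1, 2 ** n)   # a_n = b_n = 1/2^n

terms = cauchy_product_terms(half_power, half_power, 30)
partial = sum(terms)
print([str(t) for t in terms[:4]], float(partial))
```

The computed terms match $\frac{n-1}{2^n}$, and the partial sum through $n = 30$ is already very close to 1.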
But Cauchy products do not always behave so nicely. For example, find the Cauchy
product of the two series $\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}}$ and $\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n+4}}$. The Alternating Series Test shows
that both of these series converge, but the Integral Test shows that neither converges
absolutely. The nth term of the Cauchy product of these two series is
$$\sum_{p=1}^{n-1} \frac{(-1)^p}{\sqrt{p}} \cdot \frac{(-1)^{n-p}}{\sqrt{n-p+4}} = (-1)^n \sum_{p=1}^{n-1} \frac{1}{\sqrt{p(n-p+4)}}.$$
For even values of $n$, this is a sum of $n - 1$
positive terms of the form $\frac{1}{\sqrt{p(n-p+4)}}$. Since the product $p(n - p + 4)$ is maximum
when $p = \frac{n+4}{2}$, each term of the sum is greater than or equal to $\frac{1}{\sqrt{\frac{n+4}{2} \cdot \frac{n+4}{2}}} = \frac{2}{n+4}$.
This means that the sum is greater than or equal to $\frac{2(n-1)}{n+4}$ which approaches 2 as
$n$ gets large. Thus, the terms of the Cauchy product do not approach 0 as $n$ goes
to infinity, and the Limit of Terms Test shows that the Cauchy product does not
converge.
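A quick computation makes the failure concrete. The sketch below is illustrative only, and `cauchy_term` is a hypothetical helper name. It evaluates the nth term of this Cauchy product for a few even values of $n$ and compares it with the lower bound $\frac{2(n-1)}{n+4}$, which follows from $p(n-p+4) \le \left(\frac{n+4}{2}\right)^2$.

```python
import math

def cauchy_term(n):
    """n-th term of the Cauchy product of sum (-1)^n/sqrt(n) and sum (-1)^n/sqrt(n+4)."""
    return sum(
        (-1) ** p / math.sqrt(p) * (-1) ** (n - p) / math.sqrt(n - p + 4)
        for p in range(1, n)
    )

for n in (10, 100, 1000):
    # term for even n, followed by the lower bound 2(n-1)/(n+4)
    print(n, cauchy_term(n), 2 * (n - 1) / (n + 4))
```

The terms for even $n$ stay well away from 0, so the Limit of Terms Test applies.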
This last example shows what can go wrong with the Cauchy product of two
conditionally convergent series, but the results are better when at least one of
the series is absolutely convergent. For example, if both series are absolutely
convergent, then the Cauchy product is absolutely convergent to the product of the
series. To see why this is, just consider the difference between a partial sum of
the Cauchy product of the two series and the product of two partial sums of the
individual series. That is, let $k_1$ and $k_2$ be positive integers, and find the difference
between the $(k_1 + k_2)$th partial sum of the Cauchy product, $\sum_{n=2}^{k_1+k_2} \left(\sum_{p=1}^{n-1} a_p b_{n-p}\right)$,
and the product of the $k_1$th and $k_2$th partial sums, respectively, of the two series,
$\sum_{m=1}^{k_1} a_m \sum_{n=1}^{k_2} b_n$. These are both just finite sums where the Cauchy product partial
sum includes all the terms $a_m b_n$ where the subscripts $m + n$ add to
something less than or equal to $k_1 + k_2$ and the other sum includes all the terms
$a_m b_n$ where $m \le k_1$ and $n \le k_2$. Thus, the difference is the sum of the remaining
terms $\sum_{m=k_1+1}^{k_1+k_2-1} a_m \sum_{n=1}^{k_1+k_2-m} b_n + \sum_{n=k_2+1}^{k_1+k_2-1} b_n \sum_{m=1}^{k_1+k_2-n} a_m$. So by choosing $k_1$ and
$k_2$ large, you can ensure that both $\sum_{m=k_1+1}^{k_1+k_2-1} a_m$ and $\sum_{n=k_2+1}^{k_1+k_2-1} b_n$ are small showing the
necessary convergence.
Another way to think about this theorem is that the Cauchy product
$\sum_{n=2}^{\infty} \left(\sum_{p=1}^{n-1} a_p b_{n-p}\right)$ and the product of the two series $\sum_{m=1}^{\infty} a_m \sum_{n=1}^{\infty} b_n$ are rearrangements
of each other. Thus, if either converges absolutely, both converge absolutely to the
same limit. Of course, to make this rigorous, one would need to find at least one
rearrangement of the terms into a form $c_1 + c_2 + c_3 + \cdots$ and then show that that
series converges absolutely.
If one series converges absolutely and the other only converges conditionally,
then the Cauchy product of the two series still converges to the product of the two
series, but absolute convergence is not guaranteed. The proof is similar to the proof
of the previous theorem in that it carefully considers the difference between the
partial sum of the Cauchy product and product of the two series. This difference can
be broken into three differences each of which can be bounded. Specifically, assume
that $\sum_{m=1}^{\infty} a_m$ is absolutely convergent and $\sum_{n=1}^{\infty} b_n$ is convergent. Then consider the
difference between the Nth partial sum of the Cauchy product and the product of the
series. That difference is
$$\begin{aligned}
\sum_{n=2}^{N} \sum_{p=1}^{n-1} a_p b_{n-p} - \sum_{m=1}^{\infty} a_m \sum_{n=1}^{\infty} b_n
&= \sum_{m=1}^{N-1} a_m \sum_{n=1}^{N-m} b_n - \sum_{m=1}^{\infty} a_m \sum_{n=1}^{\infty} b_n \\
&= \sum_{m=1}^{N-1} a_m \left(\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n\right) + \left(\sum_{m=1}^{N-1} a_m - \sum_{m=1}^{\infty} a_m\right) \sum_{n=1}^{\infty} b_n.
\end{aligned}$$
In the second term
of this sum, the factor $\sum_{n=1}^{\infty} b_n$ is fixed, and the factor $\sum_{m=1}^{N-1} a_m - \sum_{m=1}^{\infty} a_m$ can be made
as small as necessary by choosing $N$ large. The first term of the sum is a little trickier
to handle. In the terms $a_m \left(\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n\right)$ the $a_m$ factor can be made small by
making $m$ large, and the $\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n$ factor can be made small by making $N - m$
large, or by keeping $m$ small. Both of these can be done, but not at the same time.
The technique one would use here would be to break the sum from $m = 1$ to $m =
N - 1$ at some intermediate value $K < N - 1$ writing
$$\sum_{m=1}^{N-1} a_m \left(\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n\right) = \sum_{m=1}^{K} a_m \left(\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n\right) + \sum_{m=K+1}^{N-1} a_m \left(\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n\right).$$
You can now choose $K$
so that for $m > K$ the value of $a_m$ is small, and when $m \le K$, the $N - m$ is large so
that $\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n$ will be small. This gives the following proof.
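PROOF: If $\sum_{m=1}^{\infty} a_m$ is an absolutely convergent series and $\sum_{n=1}^{\infty} b_n$ is a
convergent series, then the Cauchy product of the two series converges
to the product of the two series.
• Let $\sum_{m=1}^{\infty} a_m$ be an absolutely convergent series, and $\sum_{n=1}^{\infty} b_n$ be a convergent
series.
• Then for integers $N$ and $K$ with $1 < K < N - 1$, the difference between the
Nth partial sum of the Cauchy product of the two series and the product of
the two series is
$$\begin{aligned}
\sum_{n=2}^{N} \sum_{p=1}^{n-1} a_p b_{n-p} - \sum_{m=1}^{\infty} a_m \sum_{n=1}^{\infty} b_n
&= \sum_{m=1}^{N-1} a_m \sum_{n=1}^{N-m} b_n - \sum_{m=1}^{\infty} a_m \sum_{n=1}^{\infty} b_n \\
&= \sum_{m=1}^{N-1} a_m \left(\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n\right) + \left(\sum_{m=1}^{N-1} a_m - \sum_{m=1}^{\infty} a_m\right) \sum_{n=1}^{\infty} b_n \\
&= \sum_{m=1}^{K} a_m \left(\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n\right) + \sum_{m=K+1}^{N-1} a_m \left(\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n\right) \\
&\quad + \left(\sum_{m=1}^{N-1} a_m - \sum_{m=1}^{\infty} a_m\right) \sum_{n=1}^{\infty} b_n.
\end{aligned}$$
• Because $\sum_{n=1}^{T} b_n$ converges as $T$ goes to infinity, it remains bounded. Thus,
there is a number $M$ such that for all $T$, $\left|\sum_{n=1}^{T} b_n - \sum_{n=1}^{\infty} b_n\right| < M$.
• Let $\epsilon > 0$ be given.
• Because $\sum_{m=1}^{\infty} a_m$ converges absolutely, there is an integer $K$ such that
$\sum_{m=K+1}^{\infty} |a_m| < \frac{\epsilon}{3M}$.
• Because $\sum_{n=1}^{N} b_n$ converges to $\sum_{n=1}^{\infty} b_n$, there is a positive integer $N_1$ such that
for all $N \ge N_1$, $\left|\sum_{n=1}^{N} b_n - \sum_{n=1}^{\infty} b_n\right| < \frac{\epsilon}{3\left(1 + \sum_{m=1}^{\infty} |a_m|\right)}$.
• Because $\sum_{m=1}^{N-1} a_m$ converges to $\sum_{m=1}^{\infty} a_m$, there is a positive integer $N_2$ such that
for all $N \ge N_2$, $\left|\sum_{m=1}^{N} a_m - \sum_{m=1}^{\infty} a_m\right| < \frac{\epsilon}{3\left(1 + \left|\sum_{n=1}^{\infty} b_n\right|\right)}$.
• Let $N \ge \max(N_1 + K, N_2 + 1)$.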
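• Then
$$\begin{aligned}
\left|\sum_{n=2}^{N} \sum_{p=1}^{n-1} a_p b_{n-p} - \sum_{m=1}^{\infty} a_m \sum_{n=1}^{\infty} b_n\right|
&= \left|\sum_{m=1}^{N-1} a_m \sum_{n=1}^{N-m} b_n - \sum_{m=1}^{\infty} a_m \sum_{n=1}^{\infty} b_n\right| \\
&= \left|\sum_{m=1}^{K} a_m \left(\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n\right) + \sum_{m=K+1}^{N-1} a_m \left(\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n\right) + \left(\sum_{m=1}^{N-1} a_m - \sum_{m=1}^{\infty} a_m\right) \sum_{n=1}^{\infty} b_n\right| \\
&\le \sum_{m=1}^{K} |a_m| \left|\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n\right| + \sum_{m=K+1}^{N-1} |a_m| \left|\sum_{n=1}^{N-m} b_n - \sum_{n=1}^{\infty} b_n\right| + \left|\sum_{m=1}^{N-1} a_m - \sum_{m=1}^{\infty} a_m\right| \left|\sum_{n=1}^{\infty} b_n\right| \\
&< \sum_{m=1}^{K} |a_m| \cdot \frac{\epsilon}{3\left(1 + \sum_{m=1}^{\infty} |a_m|\right)} + \frac{\epsilon}{3M} \cdot M + \frac{\epsilon}{3\left(1 + \left|\sum_{n=1}^{\infty} b_n\right|\right)} \left|\sum_{n=1}^{\infty} b_n\right| \\
&< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.
\end{aligned}$$
• Therefore, the Cauchy product $\sum_{n=2}^{\infty} \sum_{p=1}^{n-1} a_p b_{n-p}$ converges to the product of
the series $\sum_{m=1}^{\infty} a_m \sum_{n=1}^{\infty} b_n$.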
Cauchy products play a particularly useful role in the study of power series, a
topic covered in the next chapter.
7.8.1 Exercises
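1. Let $a_{m,n}$ be the nth number in the mth row of the following table where $m$ and $n$
both range from 1 to infinity.
$$\begin{array}{rrrrrrrrrrrrrrrrc}
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\tfrac12 & \tfrac12 & -\tfrac12 & -\tfrac12 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\tfrac14 & \tfrac14 & \tfrac14 & \tfrac14 & -\tfrac14 & -\tfrac14 & -\tfrac14 & -\tfrac14 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\tfrac18 & \tfrac18 & \tfrac18 & \tfrac18 & \tfrac18 & \tfrac18 & \tfrac18 & \tfrac18 & -\tfrac18 & -\tfrac18 & -\tfrac18 & -\tfrac18 & -\tfrac18 & -\tfrac18 & -\tfrac18 & -\tfrac18 & \cdots \\
\vdots & & & & & & & & & & & & & & & &
\end{array}$$
Show that $\sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{m,n}$ is not equal to $\sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{m,n}$.
2. Show that the Cauchy product for the square of the conditionally convergent
series $\sum_{n=1}^{\infty} \frac{(-1)^n}{n}$ converges.
3. Show that the Cauchy product for the square of the series $\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}}$ diverges.
4. Suppose you have two series whose indices begin with 0 rather than 1 as in
$\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$. Show that the Cauchy product of these two series is then
$\sum_{n=0}^{\infty} \sum_{p=0}^{n} a_p b_{n-p}$.
5. In the next chapter it will be shown that for all real values of $x$, the exponential
function has the series representation $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$. Use the Cauchy product of
series to show that $e^a e^b = e^{a+b}$.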
Chapter 8
Sequences of Functions
Chapter 3 introduces the idea of a sequence of real numbers $\langle a_n \rangle$ and discusses
theorems related to the limit $\lim_{n\to\infty} a_n$, limit superior $\limsup_{n\to\infty} a_n$, limit inferior
$\liminf_{n\to\infty} a_n$, and subsequences $\langle a_{n_j} \rangle$ of such a sequence. If instead of requiring
the terms of the sequence $a_n$ to be constants, the $a_n$ were allowed to depend on
the value of a variable as in $f_n(x)$, then the sequence is a sequence of functions.
Thus, for each value of $x$, if all the functions $f_n(x)$ are defined at $x$, then there is
a sequence of real numbers, $\langle f_n(x) \rangle$. This sequence changes as $x$ changes, and,
indeed, there is a different sequence of real numbers for each choice of $x$. The limit
of the sequence, if it exists, could be different for each $x$, and, therefore, the limit
would also be a function, $f(x)$. The first question that arises is, what is meant by
the convergence of such a sequence? In fact, there are many different definitions
for the convergence of a sequence of functions, each with its own applications and
properties. The next question is, what can one say about the properties of the limit
of the sequence? For example, under what conditions can you know that the limit
function is continuous, differentiable, or integrable? In particular, if the sequence
of integrable functions $\langle f_n(x) \rangle$ converges to an integrable function $f(x)$, when can
you conclude that $\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$?
The simplest form of convergence of a sequence of functions is to say that the
sequence of functions $\langle f_n(x) \rangle$ converges pointwise to the function $f(x)$ on a set
$A$ if for each $x \in A$, $\lim_{n\to\infty} f_n(x) = f(x)$. This type of convergence is referred
to as pointwise convergence. For example, the sequence of functions $f_n(x) = \frac{x}{n}$
converges pointwise to the function $f(x) = 0$ on the entire real line because for each
$x \in \mathbb{R}$, $\lim_{n\to\infty} \frac{x}{n} = 0$. A more interesting example is the sequence $f_n(x) = x^n$ which
converges pointwise on the interval $(-1, 1]$. When $|x| < 1$, the powers $x^n$ get small
as $n$ gets large so $\lim_{n\to\infty} x^n = 0$. But when $x = 1$, the powers $x^n = 1$, so the limit
of the sequence is 1. Thus, the limit function is
$$f(x) = \begin{cases} 0 & \text{if } -1 < x < 1 \\ 1 & \text{if } x = 1. \end{cases}$$
Note
that this is a sequence of continuous functions that converges to a function that is not
continuous. The sequence does not converge at $x = -1$ because the terms oscillate
between $-1$ and 1 (Fig. 8.1).
Continuity is not the only property not preserved by functions converging
pointwise. The terms of the sequence $f_n(x) = |x|^{\frac{n+1}{n}}$ are differentiable functions
for all real numbers, but the limit of the sequence is the function $f(x) = |x|$
which is not differentiable at $x = 0$ (Fig. 8.2). The terms of the sequence
$$f_n(x) = \begin{cases} n^2 x & \text{if } 0 \le x \le \frac{1}{n} \\ n^2 \left(\frac{2}{n} - x\right) & \text{if } \frac{1}{n} < x < \frac{2}{n} \\ 0 & \text{if } \frac{2}{n} \le x \le 2 \end{cases}$$
all have integral $\int_0^2 f_n(x)\,dx = 1$, yet the sequence
converges pointwise on the interval $[0, 2]$ to the function $f(x) = 0$ which has integral
equal to 0 (Fig. 8.3).
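Both claims about this last sequence can be checked numerically. The sketch below is an illustration, with hypothetical helper names `tent` and `integral_0_to_2` and a midpoint-rule approximation chosen here for simplicity. It approximates the integral of $f_n$ over $[0, 2]$ and evaluates $f_n$ at a fixed point for increasing $n$.

```python
def tent(n, x):
    """f_n(x): rises linearly to height n at x = 1/n, then falls to 0 at x = 2/n."""
    if 0 <= x <= 1 / n:
        return n * n * x
    if 1 / n < x < 2 / n:
        return n * n * (2 / n - x)
    return 0.0

def integral_0_to_2(n, pieces=200_000):
    """Midpoint-rule approximation of the integral of f_n over [0, 2]."""
    h = 2 / pieces
    return h * sum(tent(n, (i + 0.5) * h) for i in range(pieces))

for n in (1, 5, 50):
    # integral stays near 1, while the value at the fixed point x = 0.3 drops to 0
    print(n, integral_0_to_2(n), tent(n, 0.3))
```

The integrals all stay at 1 because each graph is a triangle with base $\frac{2}{n}$ and height $n$, while at any fixed $x > 0$ the values are 0 as soon as $\frac{2}{n} \le x$.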
8.1.1 Exercises
Determine the pointwise limits of the following sequences of functions. For which
sequences is the limit continuous? For which sequences is the limit of the integrals
of the terms equal to the integral of the limit?
1. $f_n(x) = \sqrt[n]{x}$ for $x \in [0, 16]$.
2. Let $r_1, r_2, r_3, \ldots$ be a sequence consisting of all the rational numbers in the
interval $[0, 1]$. Let $f_n(x) = \begin{cases} 1 & \text{if } x = r_k \text{ for some } k \le n \\ 0 & \text{otherwise} \end{cases}$ for $x \in [0, 1]$.
3. fn .x/ D (nx for x 2 .0; 1/.
n
)
n if 2nC4 n
< x < 2nC4
nC4
4. fn .x/ D for x 2 Œ0; 1.
0 otherwise
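5. $f_n(x) = \begin{cases} (2 + (-1)^n)\, n^2 x & \text{if } 0 < x < \frac{1}{2n} \\ (2 + (-1)^n)\, n^2 \left(\frac{1}{n} - x\right) & \text{if } \frac{1}{2n} \le x < \frac{1}{n} \\ 0 & \text{otherwise} \end{cases}$ for $x \in [0, 1]$.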
8.2 Uniform Convergence
The sequence $\langle f_n \rangle$ converges pointwise to $f$ on the set $A$ if given $\epsilon > 0$, for each $x \in A$ there is an integer $N$ such that $|f_n(x) - f(x)| < \epsilon$ for all $n \ge N$. So, for each $x$ there is an integer $N$ that ensures the inequality. The value of $N$ can depend on the choice of $x$. If this dependence is dropped, and you are able to specify a value of $N$ that does not depend on the choice of $x$, then the speed of convergence becomes similar for all $x \in A$; that is, the rate of convergence is uniform for all $x \in A$. The sequence $\langle f_n \rangle$ converges uniformly to $f$ on the set $A$ if, given $\epsilon > 0$, there is an integer $N$ such that for each $x \in A$, $|f_n(x) - f(x)| < \epsilon$ for all $n \ge N$. The difference between a sequence of functions converging uniformly and converging pointwise is that with uniform convergence there can be no points of the set $A$ where convergence “lags behind.” For any region of width $\epsilon > 0$ around the limit function, all of the terms suitably far down the sequence enter that region. Compare, for example, the uniformly convergent sequence depicted in Fig. 8.4 with the pointwise convergent sequences depicted in Figs. 8.1 and 8.3. In Fig. 8.4 the functions of the sequence get close to the limit function for all the values of $x$, whereas for each function in Figs. 8.1 and 8.3 there is an $x$ for which the function is far from its limit. Clearly, if a sequence of functions converges uniformly, then it also converges pointwise. Thus, to converge uniformly is a stronger condition than to converge pointwise.
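The difference between the two definitions can be made concrete with $f_n(x) = x^n$. The sketch below is an illustration only and assumes sampled points stand in for the whole interval; the names `sup_error` and `lagging_point` are hypothetical. On $[-a, a]$ with $a = 0.9$ the largest error shrinks to 0, while on $[0, 1)$ the point $x = (1/2)^{1/n}$ always has $x^n = \frac{1}{2}$, so no single $N$ works for every $x$.

```python
def sup_error(n, points):
    """Largest value of |x**n - 0| over the sample points."""
    return max(abs(x) ** n for x in points)


def lagging_point(n):
    """A point of [0, 1) where x**n is still 1/2, however large n is."""
    return 0.5 ** (1.0 / n)


a = 0.9
closed_grid = [-a + 2 * a * i / 1000 for i in range(1001)]  # samples of [-a, a]

for n in (10, 100, 1000):
    x = lagging_point(n)
    # First number tends to 0 (uniform on [-a, a]); last number stays at 1/2.
    print(n, sup_error(n, closed_grid), x, x ** n)
```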
As seen in the previous section, the terms of the sequence $\langle f_n \rangle$ can have many properties that are not automatically inherited by the limit of the sequence, $f$, when the convergence is pointwise. Under uniform convergence, more of the properties of the terms of the sequence are retained by the limit. This is because under uniform convergence there are no points $x \in A$ for which the values of $\langle f_n(x) \rangle$ lag behind as $n$ gets large. For all values of $x \in A$, the sequence of $\langle f_n(x) \rangle$ values get close to the corresponding $f(x)$ at a rate at least as fast as some fixed rate.
For example, if $\langle f_n \rangle$ is a sequence of functions continuous on the set $A$ which converges uniformly to $f$ on $A$, then the limit is guaranteed to be continuous. Actually, a stronger statement can be made. If all the terms $f_n$ are continuous at some point $a \in A$, then the limit, $f$, will also be continuous at $a$. To prove that $f$ is continuous at $a$, you will need to show that for each $\epsilon > 0$ there is a $\delta > 0$ such that if $x$ is in $A$ with $|x - a| < \delta$, then $|f(x) - f(a)| < \epsilon$. How can you arrange for $f(x)$ to be close to $f(a)$? What you know is that the functions $f_n$ get uniformly close to $f$, and that the $f_n$ functions are continuous at $a$. Since, for any particular $n$, the term $f_n$ is continuous at $a$, you can arrange for $f_n(x)$ to be close to $f_n(a)$. The uniform convergence allows you to choose an integer $n$ so that for every $x \in A$, $f_n(x)$ is close to $f(x)$. That is, $|f(x) - f(a)| = |f(x) - f_n(x) + f_n(x) - f_n(a) + f_n(a) - f(a)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(a)| + |f_n(a) - f(a)|$. Each of these three terms can be made small, say less than $\frac{\epsilon}{3}$, so that the sum is less than $\epsilon$. The key point here is that only one value of $n$ needs to be chosen so that $|f(x) - f_n(x)|$ can be made less than $\frac{\epsilon}{3}$ no matter which $x$ is chosen (Fig. 8.5).
PROOF: If the sequence $\langle f_n \rangle$ converges uniformly to the limit $f$ on the set $A$ and if for each $n$, $f_n$ is continuous at $a \in A$, then $f$ is continuous at $a \in A$. In particular, if each $f_n$ is continuous on $A$, then $f$ is continuous on $A$.
• Let $\langle f_n \rangle$ be a sequence of functions that converges uniformly to the function $f$ on a set $A$.
• Assume that each $f_n$ is continuous at the point $a \in A$.
• Let $\epsilon > 0$ be given.
• Because the sequence converges uniformly, there is an integer $N$ such that $|f_n(x) - f(x)| < \frac{\epsilon}{3}$ for all $x \in A$ and all $n \ge N$.
• Because $f_N$ is continuous at $a$, there is a $\delta > 0$ such that $|f_N(x) - f_N(a)| < \frac{\epsilon}{3}$ for all $x \in A$ satisfying $|x - a| < \delta$.
• Then, for all $x \in A$ satisfying $|x - a| < \delta$, it follows that $|f(x) - f(a)| = |f(x) - f_N(x) + f_N(x) - f_N(a) + f_N(a) - f(a)| \le |f(x) - f_N(x)| + |f_N(x) - f_N(a)| + |f_N(a) - f(a)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon$.
• Therefore, the function $f$ is continuous at $a \in A$.
• Moreover, if each function $f_n$ is continuous at each $a \in A$, then $f$ is continuous at each $x \in A$, so $f$ is continuous on $A$.
It is worth considering where this proof breaks down if all you assume is that the sequence $\langle f_n \rangle$ converges pointwise to $f$. The problem comes in the fact that although $|f_N(a) - f(a)|$ and $|f_N(x) - f_N(a)|$ can be made smaller than $\frac{\epsilon}{3}$, there could be values of $x$ very close to $a$ for which $|f_N(x) - f(x)|$ is no longer small. Thus, the needed inequality $|f(x) - f_N(x)| + |f_N(x) - f_N(a)| + |f_N(a) - f(a)| < \epsilon$ might not hold. Also consider the function $f$ defined on the interval $[0, 2]$ by $f(x) = x$ if $x \ne 1$ and $f(1) = 3$. If for each positive integer $n$ you let $f_n(x) = f(x) + \frac{1}{n}$, then it is clear that the sequence $\langle f_n \rangle$ converges uniformly to $f$. At the points where each $f_n$ is continuous, that is, for $x \ne 1$, the limit function $f$ is also continuous.
Suppose that $\langle f_n \rangle$ is a sequence of functions Riemann integrable on an interval $[a, b]$ and that this sequence converges to a limit $f$. Examples in the last section show that if the convergence is pointwise, the limit $\lim_{n\to\infty} \int_a^b f_n(x)\,dx$ does not necessarily equal $\int_a^b f(x)\,dx$. Moreover, the function $f$ does not even need to be Riemann integrable, and the limit of the integrals of the $f_n$ might not exist. On the other hand, if the convergence is uniform, then the limit function $f$ will be Riemann integrable and the limit of the integrals of the $f_n$ will equal the integral of $f$. Showing that the uniform limit of Riemann integrable functions is Riemann integrable is not difficult and is based on the characterization of Riemann integrable functions given by Lebesgue’s Theorem. Recall that a function is Riemann integrable on an interval if and only if it is bounded and the set of points where the function is discontinuous has measure zero. If each term of the sequence, $f_n$, has these properties, then the limit function, $f$, must also have them. By the definition of uniform convergence, there is an integer $N$ such that $|f_N(x) - f(x)| < 1$ for all $x \in [a, b]$. So, if the function $f_N$ is bounded by some constant $M$, then the function $f$ must be bounded by $M + 1$ since for all $x \in [a, b]$ it follows that $-M - 1 \le f_N(x) - 1 < f(x) < f_N(x) + 1 \le M + 1$. As for points of discontinuity of $f$, for each positive integer $n$, let $D_n$ be the set of points in $[a, b]$ where the function $f_n$ is discontinuous. Because each $f_n$ is Riemann integrable, each $D_n$ has measure zero. But then $D = \bigcup_{n=1}^{\infty} D_n$ is a countable union of sets of measure zero, so it also has measure zero. The sequence $\langle f_n \rangle$ converges uniformly on the set $A = [a, b] \setminus D$, and each term of the sequence is continuous at each point of $A$, so, by the preceding theorem, the limit $f$ must be continuous on $A$. Thus, the set of discontinuities of $f$ must be contained in $D$, so the set of discontinuities of $f$ has measure zero. Therefore, $f$ is Riemann integrable.
So why does it follow that $\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$? From the definition of uniform convergence, for every $\epsilon > 0$ there is an integer $N$ such that $n \ge N$ implies that $|f_n(x) - f(x)| < \epsilon$ for every $x \in [a, b]$. This means that for every $n \ge N$ and every $x \in [a, b]$ it follows that $f(x) - \epsilon < f_n(x) < f(x) + \epsilon$, so

$$\int_a^b f(x)\,dx - \epsilon(b - a) = \int_a^b \left(f(x) - \epsilon\right)dx \le \int_a^b f_n(x)\,dx \le \int_a^b \left(f(x) + \epsilon\right)dx = \int_a^b f(x)\,dx + \epsilon(b - a).$$

Thus, by selecting a more appropriate value for $\epsilon$, this shows that $\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$.
PROOF: Assume that $\langle f_n \rangle$ is a sequence of functions that are Riemann integrable on the interval $[a, b]$. If the sequence converges uniformly to $f$, then $f$ is also Riemann integrable on $[a, b]$, and $\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$.
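The bound $\left|\int_a^b f_n - \int_a^b f\right| \le \epsilon(b - a)$ can be observed on a concrete sequence. The example below is an assumption chosen for illustration and does not appear in the text: $f_n(x) = \sqrt{x^2 + \frac{1}{n}}$ converges uniformly to $|x|$ on $[-1, 1]$, with largest error $\frac{1}{\sqrt{n}}$ attained at $x = 0$. The integrals are approximated by a midpoint Riemann sum.

```python
import math


def f_n(n, x):
    """Illustrative sequence (not from the text) converging uniformly to |x|."""
    return math.sqrt(x * x + 1.0 / n)


def midpoint_integral(g, a, b, steps=20_000):
    """Approximate the integral of g over [a, b] with a midpoint Riemann sum."""
    width = (b - a) / steps
    return width * sum(g(a + (i + 0.5) * width) for i in range(steps))


a, b = -1.0, 1.0
limit_integral = 1.0  # the integral of |x| over [-1, 1]

for n in (10, 1000, 100_000):
    eps = 1.0 / math.sqrt(n)  # sup of |f_n(x) - |x|| on [-1, 1]
    gap = abs(midpoint_integral(lambda x: f_n(n, x), a, b) - limit_integral)
    # The gap between the integrals never exceeds eps * (b - a).
    print(n, gap, eps * (b - a))
```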
8.2.1 Exercises
1. Show that $\lim_{n\to\infty} x^n$ converges uniformly to 0 on any interval $[-a, a]$ where $0 < a < 1$.
2. Show that $\lim_{n\to\infty} \frac{1}{nx}$ converges uniformly to 0 on any interval $[a, \infty)$ for $a > 0$ but not on the interval $(0, \infty)$.
3. Another way to show that the uniform limit of Riemann integrable functions is Riemann integrable is to show that the limit function has upper and lower step functions, $u$ and $v$, such that the integrals of $u$ and $v$ are within $\epsilon > 0$ of each other. Write a proof that uses this strategy.
4. Suppose that the sequence of functions $\langle f_n \rangle$ converges uniformly to the function $f$ and that $g$ is a uniformly continuous function defined on the range of $f$ and the ranges of each of the $f_n$ functions. Prove that the functions $g(f_n(x))$ converge uniformly to $g(f(x))$.
8.3 Monotone Convergence
Combining these you can show that for any $y$ in some interval around $x$, the value of $f_N(y)$ is close to the value of $f(y)$. The crucial observation here is that once you know that $f_N(y)$ and $f(y)$ are close, the monotonicity of the convergence gives you that $f_n(y)$ is between $f_N(y)$ and $f(y)$ for all $n \ge N$, and, thus, $f_n(y)$ will be close to $f(y)$ for all $n \ge N$. Note, though, that the value of $N$ can vary with the value of $x$. Well, this means that for each $x \in [a, b]$ there is an interval around $x$ where $f_n(y)$ is close to $f(y)$ for all $y$ in the interval and all $n \ge N$. Now you can use the compactness of the interval $[a, b]$, that is, you can use the Heine–Borel Theorem to show that there is a finite collection of these $x$ values, say $x_1, x_2, x_3, \ldots, x_k$, such that the entire interval $[a, b]$ is covered by these intervals you constructed around each of the $x_j$'s. Each of the $x_j$'s was associated with an $N_j$, and now one can select the maximum of these $N_j$ values to get a single function $f_N$ which is uniformly close to $f$. Again, because the convergence is monotone, once you know that $f_N$ is close to $f$, you know that $f_n$ is close to $f$ for all $n \ge N$. This will complete the proof.
PROOF: Assume that $\langle f_n \rangle$ is a sequence of functions continuous on the interval $[a, b]$ that converges monotonically to the function $f$ that is also continuous on $[a, b]$. Then the sequence converges uniformly to $f$ on $[a, b]$.
• Assume that $\langle f_n \rangle$ is a sequence of functions continuous on the interval $[a, b]$ that converges monotonically to the function $f$ that is also continuous on $[a, b]$.
• Let $\epsilon > 0$ be given.
• Let $x \in [a, b]$.
• Because the function $f$ is continuous at $x$, there is a $\delta_1 > 0$ such that if $y \in [a, b]$ with $|y - x| < \delta_1$, then $|f(y) - f(x)| < \frac{\epsilon}{3}$.
• Because $\lim_{n\to\infty} f_n(x) = f(x)$, there is an integer $N_x$ such that if $n \ge N_x$, then $|f_n(x) - f(x)| < \frac{\epsilon}{3}$.
• Because $f_{N_x}$ is continuous at $x$, there is a $\delta_2 > 0$ such that if $y \in [a, b]$ with $|y - x| < \delta_2$, then $|f_{N_x}(y) - f_{N_x}(x)| < \frac{\epsilon}{3}$.
• Let $\delta_x = \min(\delta_1, \delta_2)$.
• Then, if $y \in [a, b]$ with $|y - x| < \delta_x$, it follows that $|f_{N_x}(y) - f(y)| = |f_{N_x}(y) - f_{N_x}(x) + f_{N_x}(x) - f(x) + f(x) - f(y)| \le |f_{N_x}(y) - f_{N_x}(x)| + |f_{N_x}(x) - f(x)| + |f(x) - f(y)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon$.
• The interval $[a, b]$ is covered by the collection of open intervals $(x - \delta_x, x + \delta_x)$ for $x \in [a, b]$.
• By the Heine–Borel Theorem, there is a finite collection of these $x$ values, $x_1, x_2, x_3, \ldots, x_k$, such that the intervals $(x_j - \delta_{x_j}, x_j + \delta_{x_j})$ for $j = 1, 2, 3, \ldots, k$ cover the interval $[a, b]$.
• Let $N = \max(N_{x_1}, N_{x_2}, N_{x_3}, \ldots, N_{x_k})$.
• Let $y \in [a, b]$.
• There is a value of $j$ between 1 and $k$ such that $y \in (x_j - \delta_{x_j}, x_j + \delta_{x_j})$.
• Because the sequence $\langle f_n \rangle$ converges monotonically to $f$, for all $n \ge N \ge N_{x_j}$, $f_n(y)$ is between $f_{N_{x_j}}(y)$ and $f(y)$, and $|f_{N_{x_j}}(y) - f(y)| < \epsilon$, so $|f_n(y) - f(y)| < \epsilon$.
• This shows that the sequence $\langle f_n \rangle$ converges uniformly to $f$.
The sequence of functions $f_n(x) = 2x + \frac{\sqrt{x}}{n}$ converges monotonically to $f(x) = 2x$ on the interval $[0, 4]$. Since each $f_n$ and $f$ is continuous on $[0, 4]$, you can conclude that the convergence of the sequence is uniform. On the other hand, the sequence of continuous functions $f_n(x) = x^n$ converges monotonically on the interval $[0, 1]$ to a function discontinuous at 1. Clearly, then, the sequence does not converge uniformly.
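The first example can be checked directly: the largest gap between $f_n(x) = 2x + \frac{\sqrt{x}}{n}$ and $f(x) = 2x$ on $[0, 4]$ occurs at $x = 4$ and equals $\frac{2}{n}$, which does not depend on $x$. The sketch below samples the interval on a grid; the grid and the function names are illustrative assumptions.

```python
import math


def f_n(n, x):
    """The n-th term 2x + sqrt(x)/n from the example."""
    return 2 * x + math.sqrt(x) / n


def f(x):
    """The limit function 2x."""
    return 2 * x


grid = [4 * i / 1000 for i in range(1001)]  # sample points of [0, 4]

for n in (1, 10, 100):
    sup_gap = max(abs(f_n(n, x) - f(x)) for x in grid)
    # The sampled largest gap matches 2/n, so one N works for every x at once.
    print(n, sup_gap, 2 / n)
```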
Another important theorem about monotone convergence is that if $\langle f_n \rangle$ is a sequence of functions Riemann integrable on the interval $[a, b]$ that converge monotonically to the Riemann integrable function $f$, then $\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$. The result is called the Monotone Convergence Theorem for Riemann Integrals. It is generally not proved in a book of this type because it is an easy consequence of the Monotone Convergence Theorem of Lebesgue which is covered in any beginning course in measure theory, but that study requires the development of Lebesgue measure, a topic which is beyond the scope of this book.
It does need to be pointed out that even if all the terms of a sequence are Riemann integrable functions, and the sequence converges monotonically to a function $f$, it may be that the limit, $f$, is not itself Riemann integrable. For example, let $r_1, r_2, r_3, \ldots$ be a sequence consisting of all the rational numbers in the interval $[0, 1]$. Let $f_n(x)$ be the function equal to 1 for $x = r_1, r_2, r_3, \ldots, r_n$ and equal to 0 elsewhere. Then each $f_n$ has finitely many points of discontinuity so each $f_n$ has a Riemann integral on $[0, 1]$ equal to 0. Yet the sequence $\langle f_n \rangle$ converges monotonically to the function $f$ equal to 1 for rational values of $x$ and 0 for irrational values of $x$, so $f$ is discontinuous everywhere, and, as a result, it is not Riemann integrable.
So, suppose that $\langle f_n \rangle$ is a sequence of functions Riemann integrable on the interval $[a, b]$ that converge monotonically to a limit function $f$ that is also Riemann integrable on $[a, b]$. Without loss of generality one can assume that the sequence is monotonically decreasing to $f$ because if the sequence were increasing, the same argument could just be applied to the sequence $\langle -f_n \rangle$. Also, it can be assumed that the function $f$ is identically 0 on $[a, b]$ because if that is not the case, the argument could be applied to the sequence $\langle f_n - f \rangle$ which does decrease monotonically to 0, and $\lim_{n\to\infty} \int_a^b [f_n(x) - f(x)]\,dx = 0$ is equivalent to $\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$.¹
A proof of the Monotone Convergence Theorem for Riemann Integrals would start with an $\epsilon > 0$, and the goal of the proof would be to show that there is an integer $N$ such that for all $n \ge N$, it follows that $\int_a^b f_n(x)\,dx < \epsilon$. The proof presented here is based on the fact that for any Riemann integrable function, $f_n$, you can find upper and lower step functions, $u_n(x)$ and $v_n(x)$, satisfying $v_n(x) \le f_n(x) \le u_n(x)$ for every $x \in [a, b]$ so that $\int_a^b (u_n(x) - v_n(x))\,dx$ is as small as you like. Suppose you select $u_n$ and $v_n$ so that $\int_a^b (u_n(x) - v_n(x))\,dx < \frac{\epsilon}{2^n}$. That is, find upper and lower step functions for each $f_n$ such that they give increasingly better and better
¹ This proof is based on ideas from the article Monotone Convergence Theorem for the Riemann Integral by Brian S. Thomson from the American Mathematical Monthly, June–July 2010.
be dropped from the subcover. Because the subcover contains only a finite number of open intervals, all of these superfluous intervals can be dropped from the subcover. Consider the intervals associated with each of the $y_j$ values. For simplicity, let the interval associated with $y_j$ be renamed $(a_j, b_j)$. Note that the endpoints $a$ and $b$ will be among the $y_j$ values because for every $n$ each of these endpoints was covered by only one possible open interval. At this point the left endpoint associated with $a$ can be set to $a$ and the right endpoint of the interval associated with $b$ can be set to $b$. It is important to note that if $n(y_i) = n(y_j)$ for some distinct $i$ and $j$, then the intervals associated with $y_i$ and $y_j$ do not overlap. This is because the intervals associated with $y_i$ and $y_j$ are distinct intervals from $(a - 1, x_1), (x_1, x_2), (x_2, x_3), \ldots, (x_{k-1}, b + 1)$. Let $N$ be the maximum of the finitely many $n(y_j)$ values for $j = 1, 2, 3, \ldots, m$. Because no value of $y \in [a, b]$ appears in more than two of the open intervals associated with the $y_j$, it can be concluded that
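$$\begin{aligned}
\int_a^b f_N(x)\,dx &\le \sum_{j=1}^{m} \int_{a_j}^{b_j} f_N(x)\,dx \le \sum_{j=1}^{m} \int_{a_j}^{b_j} f_{n(y_j)}(x)\,dx \\
&= \sum_{j=1}^{m} \left[ \int_{a_j}^{b_j} \left( f_{n(y_j)}(x) - f_{n(y_j)}(y_j) \right) dx + f_{n(y_j)}(y_j)(b_j - a_j) \right] \\
&\le \sum_{j=1}^{m} \left[ \int_{a_j}^{b_j} \left( u_{n(y_j)}(x) - v_{n(y_j)}(y_j) \right) dx + \epsilon(b_j - a_j) \right] \\
&\le \sum_{p=1}^{N} \int_a^b \left( u_p(x) - v_p(x) \right) dx + 2\epsilon(b - a) \le \sum_{p=1}^{N} \frac{\epsilon}{2^p} + 2\epsilon(b - a) < \epsilon(2b - 2a + 1).
\end{aligned}$$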
There were two places in the above argument where quantities were forced to be less than the given value $\epsilon$. It can now be seen that those quantities should have been made smaller than $\frac{\epsilon}{2b - 2a + 1}$ so that the final inequality would show $\int_a^b f_N(x)\,dx < \epsilon$ as needed. It is also worth noting that there were two places in the argument that use the fact that the sequence $\langle f_n \rangle$ converges monotonically. The first was to conclude that when, for a particular $y \in [a, b]$, the value of $f_n(y)$ is small, then the values of $f_m(y)$ are also small for all $m \ge n$. The second important use of monotonicity takes the final result that $\int_a^b f_N(x)\,dx < \epsilon$ and concludes that $\int_a^b f_m(x)\,dx < \epsilon$ for all $m \ge N$.
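• Let $N = \max\left( n(y_1), n(y_2), n(y_3), \ldots, n(y_m) \right)$.
• Then

$$\begin{aligned}
\int_a^b f_N(x)\,dx &\le \sum_{j=1}^{m} \int_{a_j}^{b_j} f_N(x)\,dx \le \sum_{j=1}^{m} \int_{a_j}^{b_j} f_{n(y_j)}(x)\,dx \\
&= \sum_{j=1}^{m} \left[ \int_{a_j}^{b_j} \left( f_{n(y_j)}(x) - f_{n(y_j)}(y_j) \right) dx + f_{n(y_j)}(y_j)(b_j - a_j) \right] \\
&\le \sum_{j=1}^{m} \left[ \int_{a_j}^{b_j} \left( u_{n(y_j)}(x) - v_{n(y_j)}(y_j) \right) dx + \frac{\epsilon}{2b - 2a + 1}(b_j - a_j) \right] \\
&\le \sum_{p=1}^{N} \int_a^b \left( u_p(x) - v_p(x) \right) dx + \frac{2\epsilon(b - a)}{2b - 2a + 1} \\
&\le \sum_{p=1}^{N} \frac{\epsilon}{(2b - 2a + 1)\,2^p} + \frac{2\epsilon(b - a)}{2b - 2a + 1} < \frac{\epsilon(2b - 2a + 1)}{2b - 2a + 1} = \epsilon.
\end{aligned}$$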
Pointwise convergence and uniform convergence are not the only methods of convergence of sequences of functions. Another method suggested by the above theorem is called convergence in mean or convergence in $L^1$. A sequence of Riemann integrable functions $\langle f_n \rangle$ is said to converge in mean to the Riemann integrable function $f$ on the interval $[a, b]$ if $\lim_{n\to\infty} \int_a^b |f_n(x) - f(x)|\,dx = 0$. For example, consider the following sequence of functions defined on the interval $[0, 1]$. Define $f(x; a, b)$ to be the function that is 1 for $x$ in the interval $[a, b]$ and 0 for all other $x$. Then for positive integer $n$ and for integer $k$ with $2^{n-1} \le k < 2^n$, let $f_k(x) = f\left(x; \frac{k - 2^{n-1}}{2^{n-1}}, \frac{k + 1 - 2^{n-1}}{2^{n-1}}\right)$. The integral of $f\left(x; \frac{k - 2^{n-1}}{2^{n-1}}, \frac{k + 1 - 2^{n-1}}{2^{n-1}}\right)$ from 0 to 1 is $\frac{1}{2^{n-1}}$, so the integrals of $f_k(x)$ approach 0 as $k$ gets large. Thus, $f_k$ converges in mean to the zero function. Yet this sequence of functions does not converge pointwise for any single value of $x$.
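The behavior of this sequence is easy to see by computing a few terms. The sketch below follows the definition of $f_k$ given above; the helper names (`indicator`, `block_of`, `integral_of_f_k`) are made up for the illustration, and `int.bit_length` is used to recover the $n$ with $2^{n-1} \le k < 2^n$.

```python
def indicator(x, a, b):
    """f(x; a, b): 1 for x in [a, b], 0 otherwise."""
    return 1 if a <= x <= b else 0


def block_of(k):
    """The integer n with 2**(n-1) <= k < 2**n."""
    return k.bit_length()


def f_k(k, x):
    """The k-th term: an indicator of an interval of width 1/2**(n-1)."""
    width = 2 ** (block_of(k) - 1)
    return indicator(x, (k - width) / width, (k + 1 - width) / width)


def integral_of_f_k(k):
    """Integral of f_k over [0, 1], which is the width of its interval."""
    return 1 / 2 ** (block_of(k) - 1)


x = 0.3
values = [f_k(k, x) for k in range(1, 2 ** 10)]
# The integrals shrink to 0, yet the values at x keep returning to 1.
print(integral_of_f_k(2 ** 9), values.count(1), values.count(0))
```

Within each block $2^{n-1} \le k < 2^n$ the intervals sweep across all of $[0, 1]$, so every $x$ is hit at least once per block. That is why the values $f_k(x)$ contain infinitely many 1s and infinitely many 0s.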
by the Comparison Test for if $x \ge 0$, then $\frac{1}{n^2 + x} \le \frac{1}{n^2}$, and if $x < 0$, then for any $n$ with $n^2 > 2|x|$ it follows that $\frac{1}{n^2 + x} = \frac{2}{2n^2 - 2|x|} \le \frac{2}{n^2}$. Then $\sum_{n=1}^{\infty} \frac{1}{n^2 + x}$ converges since $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges.
Another example is $f(x) = \sum_{n=1}^{\infty} x^n$ which is just a geometric series which converges to $f(x) = \frac{x}{1 - x}$ for all $x$ satisfying $|x| < 1$. Note here that the function $f(x) = \frac{x}{1 - x}$ is defined for all $x \ne 1$, but the infinite series is only defined for $|x| < 1$. This is an example of a power series dealt with in considerably more detail in the next section.
The results concerning the convergence of sequences of functions discussed earlier in this chapter apply to the study of infinite series of functions because an infinite series is just defined to be the sequence of its partial sums. Still other questions arise such as, can one find the derivative or the integral of an infinite series by simply differentiating or integrating the terms of the series and then finding the limit of the resulting partial sums? The answer to this question is that sometimes one gets a correct answer by differentiating or integrating a series term by term, but other times this process results in nonsense. For example, consider again the function $f(x) = \sum_{n=1}^{\infty} \frac{1}{n^2 + x}$. Here, the statement that $\int f(x)\,dx = \int \sum_{n=1}^{\infty} \frac{1}{n^2 + x}\,dx = \sum_{n=1}^{\infty} \int \frac{1}{n^2 + x}\,dx$ is not valid since $\sum_{n=1}^{\infty} \int \frac{1}{n^2 + x}\,dx = \sum_{n=1}^{\infty} \ln(n^2 + x) + C$ which does not converge for any value of $x$. Alternatively, for this particular series it is valid to use the definite integral from 0 to $y$ and write $\int_0^y f(x)\,dx = \int_0^y \sum_{n=1}^{\infty} \frac{1}{n^2 + x}\,dx = \sum_{n=1}^{\infty} \int_0^y \frac{1}{n^2 + x}\,dx = \sum_{n=1}^{\infty} \ln \frac{n^2 + y}{n^2}$ which does converge for each $y > -1$. The integral and derivative of the series $f(x) = \sum_{n=1}^{\infty} x^n$ make perfectly good sense in the range $|x| < 1$.
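The contrast between the two ways of integrating $\sum \frac{1}{n^2 + x}$ term by term shows up clearly in the partial sums. The following sketch is illustrative only (the function names are hypothetical): the partial sums of $\sum \ln \frac{n^2 + y}{n^2}$ settle down, while those of $\sum \ln(n^2 + x)$ grow without bound.

```python
import math


def definite_partial(y, terms):
    """Partial sum of sum ln((n^2 + y) / n^2), from integrating each term over [0, y]."""
    return sum(math.log((n * n + y) / (n * n)) for n in range(1, terms + 1))


def indefinite_partial(x, terms):
    """Partial sum of sum ln(n^2 + x), from the term-by-term antiderivatives."""
    return sum(math.log(n * n + x) for n in range(1, terms + 1))


for terms in (10, 100, 1000):
    # First column stabilizes; second column keeps growing.
    print(terms, definite_partial(1.0, terms), indefinite_partial(1.0, terms))
```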
One simple observation about series $\sum_{n=1}^{\infty} a_n(x)$ is that if there is a convergent series of positive numbers $\sum_{n=1}^{\infty} M_n$ such that for each $n$, the term $a_n$ is bounded by $M_n$ for all $x$ in some set $A$, then the series $\sum_{n=1}^{\infty} a_n(x)$ converges uniformly on $A$. This is known as the Weierstrass M-Test. Consider how the proof of this result would go. First, of course, you would assume that you had a series of functions, $\sum_{n=1}^{\infty} a_n(x)$, and a convergent series of positive numbers, $\sum_{n=1}^{\infty} M_n$, such that for each positive integer $n$, $|a_n(x)| \le M_n$ for every $x \in A$. You should note that for each $x \in A$, the series $\sum_{n=1}^{\infty} |a_n(x)|$ converges by the Comparison Test and, thus, $\sum_{n=1}^{\infty} a_n(x)$ converges pointwise. You are to prove that the series of functions converges uniformly, so you would need to take an $\epsilon > 0$ and show that there is an integer $N$ such that whenever $m \ge N$ and $x \in A$, the partial sum $\sum_{n=1}^{m} a_n(x)$ is within $\epsilon$ of the limit $\sum_{n=1}^{\infty} a_n(x)$. The difference between the $m$th partial sum of $\sum_{n=1}^{\infty} a_n(x)$ and its limit is $\left| \sum_{n=m+1}^{\infty} a_n(x) \right| \le \sum_{n=m+1}^{\infty} |a_n(x)| \le \sum_{n=m+1}^{\infty} M_n$ which can be made less than $\epsilon$ by selecting $m$ large. The value of $m$ does not depend on $x$ showing that the convergence is uniform. This gives the following proof.
PROOF (Weierstrass M-Test): Let $\sum_{n=1}^{\infty} a_n(x)$ be a series of functions defined on the set $A$, and let $\sum_{n=1}^{\infty} M_n$ be a convergent series of positive numbers. If for each $n$ and each $x \in A$ it holds that $|a_n(x)| \le M_n$, then $\sum_{n=1}^{\infty} a_n(x)$ converges uniformly on $A$.
• Let $\sum_{n=1}^{\infty} a_n(x)$ be a series of functions defined on the set $A$, and let $\sum_{n=1}^{\infty} M_n$ be a convergent series of positive numbers.
• Assume that for each $n$ and each $x \in A$ it holds that $|a_n(x)| \le M_n$.
• Then for each $x \in A$ it follows from the Comparison Test that $\sum_{n=1}^{\infty} a_n(x)$ converges absolutely and, thus, the series converges.
• Let $\epsilon > 0$ be given.
• Because $\sum_{n=1}^{\infty} M_n$ converges, there is an integer $N$ such that $\sum_{n=m}^{\infty} M_n < \epsilon$ for all $m \ge N$.
• But, then, for each $x \in A$ and each $m \ge N$, the difference between the $m$th partial sum of $\sum_{n=1}^{\infty} a_n(x)$ and its limit is $\left| \sum_{n=m+1}^{\infty} a_n(x) \right| \le \sum_{n=m+1}^{\infty} |a_n(x)| \le \sum_{n=m+1}^{\infty} M_n < \epsilon$.
• Thus, $\sum_{n=1}^{\infty} a_n(x)$ converges uniformly on $A$.
For example, the series $\sum_{n=1}^{\infty} \frac{1}{n^2 + x}$ converges uniformly on the interval $[0, \infty)$ because for all $x \in [0, \infty)$, $\frac{1}{n^2 + x} \le \frac{1}{n^2}$, and the series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges. Since all the partial sums of the series are continuous functions, it follows from this uniform convergence that the limit function is continuous on $[0, \infty)$. Similarly, the series $\sum_{n=1}^{\infty} \frac{\sin(n^2 x)}{n^2}$ converges uniformly on the entire real line because $\left| \frac{\sin(n^2 x)}{n^2} \right| \le \frac{1}{n^2}$ for every positive integer $n$. Again, you can conclude that the limit function is continuous because all the partial sums are continuous functions. Notice, though, that if you differentiate each term of this series, you get $\sum_{n=1}^{\infty} \cos(n^2 x)$ which does not converge for any value of $x$ because the terms do not approach 0.
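The uniformity promised by the M-Test can be observed for $\sum \frac{\sin(n^2 x)}{n^2}$ with $M_n = \frac{1}{n^2}$. The sketch below assumes the elementary estimate $\frac{1}{n^2} < \frac{1}{n-1} - \frac{1}{n}$, which telescopes to give $\sum_{n > m} \frac{1}{n^2} < \frac{1}{m}$; the sample points and names are arbitrary choices for illustration.

```python
import math


def partial_sum(x, m):
    """m-th partial sum of the series sum sin(n^2 x) / n^2."""
    return sum(math.sin(n * n * x) / (n * n) for n in range(1, m + 1))


def tail_bound(m):
    """Upper bound for sum_{n > m} 1/n^2, from 1/n^2 < 1/(n-1) - 1/n."""
    return 1.0 / m


sample_points = [-10 + 0.5 * i for i in range(41)]  # a few x values in [-10, 10]
m, far = 50, 2000
worst_gap = max(abs(partial_sum(x, far) - partial_sum(x, m)) for x in sample_points)
# One bound, independent of x, controls the gap at every sample point.
print(worst_gap, tail_bound(m))
```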
8.5 Power Series
Power series form a class of infinite series of functions that stands out because of the particularly nice properties they satisfy, the ease with which they can be produced, the many well-known elementary functions they represent ($\sqrt{x}$, $e^x$, $\sin x$, $\cos x$, $\ln x$), and the enormous number of applications they have. A power series is a series of the form $\sum_{n=0}^{\infty} a_n (x - c)^n$, where the real number $a_n$ is the $n$th coefficient and $c$ is the center of the power series. This book will consider such series where the variable, coefficients, and center are real numbers, although most of what is said here holds when these quantities are allowed to be complex numbers. In fact, such series play a central role in Complex Analysis.
The first important result about power series is that they converge in an interval $(c - R, c + R)$ where $c$ is the center of the power series and $R$, called the radius of convergence, is a nonnegative real number or possibly even infinity. In fact, if the power series $\sum_{n=0}^{\infty} a_n (x - c)^n$ converges for a particular real number $y$, then it converges absolutely for any $x$ satisfying $|x - c| < |y - c|$, that is, for any $x$ closer to $c$ than $y$. The proof is based on the Weierstrass M-Test where the power series at the point $x$ is compared to a convergent geometric series with common ratio $\left| \frac{x - c}{y - c} \right|$ which is less than 1.
PROOF: If the power series $\sum_{n=0}^{\infty} a_n (x - c)^n$ converges when $x = y$, then the series converges absolutely for all $x$ satisfying $|x - c| < |y - c|$.
• Let $\sum_{n=0}^{\infty} a_n (x - c)^n$ be a power series that converges at $x = y$.
• Since $\sum_{n=0}^{\infty} a_n (y - c)^n$ converges, its terms must approach 0 by the Limit of Terms Test.
• Thus, the terms must be bounded, and there exists a real number $M$ such that $|a_n (y - c)^n| \le M$ for every nonnegative integer $n$.
• Let $x$ be any real number satisfying $|x - c| < |y - c|$.
• Then $|a_n (x - c)^n| = |a_n (y - c)^n| \left| \frac{x - c}{y - c} \right|^n \le M \left| \frac{x - c}{y - c} \right|^n$.
• The series $\sum_{n=0}^{\infty} M \left| \frac{x - c}{y - c} \right|^n$ is a convergent geometric series with common ratio $\left| \frac{x - c}{y - c} \right| < 1$.
• Thus, $\sum_{n=0}^{\infty} a_n (x - c)^n$ converges absolutely by the Weierstrass M-Test.
It follows immediately from the previous theorem that the radius of convergence for a power series is $R = \sup\{ |y - c| \mid \text{the series converges at } y \}$, and that the power series converges absolutely for all $x \in (c - R, c + R)$. This does not say anything about how the power series behaves at the end points $c - R$ and $c + R$. There are examples of power series that converge at both endpoints, that converge at one of the two endpoints, or converge at neither endpoint. It also follows from the above proof, that if the power series converges absolutely at $y$, then it converges uniformly for all $x$ satisfying $|x - c| \le |y - c|$. In particular, since all the partial sums of the series are continuous functions, if the power series has radius of convergence $R > 0$ and $\delta$ is any positive number less than $R$, then the series converges absolutely at $x = c + R - \delta$, so the series converges absolutely and uniformly on $[c - R + \delta, c + R - \delta]$. As a result, the function $f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n$ is continuous on $[c - R + \delta, c + R - \delta]$ for all small $\delta > 0$, so it is continuous on the open interval $(c - R, c + R)$. If the series converges absolutely for $x = c + R$, then $f(x)$ is continuous on the closed interval $[c - R, c + R]$. What if the series $\sum_{n=0}^{\infty} a_n (x - c)^n$ converges conditionally at $x = c + R$ or $x = c - R$? Does this mean that the function is continuous at that endpoint? The answer is yes, but this takes some proof and is known as Abel’s Theorem.
PROOF (Abel’s Theorem): Suppose the power series $\sum_{n=0}^{\infty} a_n (x - c)^n$ has positive radius of convergence $R < \infty$, and that the series converges at one of the endpoints $c - R$ or $c + R$. Then the series is continuous on an interval from $c - R$ to $c + R$ containing that endpoint.
• Let $\sum_{n=0}^{\infty} a_n (x - c)^n$ be a power series with positive radius of convergence $R < \infty$.
• Assume that the series converges at one of the endpoints of the interval of convergence, $c - R$ or $c + R$.
• Without loss of generality $c = 0$ and $R = 1$ because the argument can be applied to the series where $x$ is replaced by $Rx + c$. Thus, assume that the series is $\sum_{n=0}^{\infty} a_n x^n$ with radius of convergence 1.
• Also, it can be assumed that the series converges at 1, because if it converges at $-1$, the argument can be applied to the series $\sum_{n=0}^{\infty} a_n (-1)^n x^n$ which converges at 1.
• Finally, by subtracting a constant from the constant term of the series, $a_0$, it can be assumed that $\sum_{n=0}^{\infty} a_n = 0$.
• Let this series have partial sums $s_k = \sum_{n=0}^{k} a_n$.
• Because $\lim_{k\to\infty} s_k = 0$, the $s_k$ are bounded, and, in particular, $\sum_{n=1}^{\infty} s_n x^n$ converges for all $x$ with $|x| < 1$.
• Then $\sum_{n=0}^{\infty} a_n x^n = a_0 + \sum_{n=1}^{\infty} (s_n - s_{n-1}) x^n = s_0 + \sum_{n=1}^{\infty} s_n x^n - \sum_{n=1}^{\infty} s_{n-1} x^n = s_0 + \sum_{n=1}^{\infty} s_n x^n (1 - x) - s_0 x = (1 - x) \sum_{n=0}^{\infty} s_n x^n$.
• Let $\epsilon > 0$ be given.
• Because $\lim_{n\to\infty} s_n = 0$, there is an integer $N$ such that for all $n \ge N$, $|s_n| < \frac{\epsilon}{2}$.
• Then, for $0 < x < 1$,

$$\begin{aligned}
\left| \sum_{n=0}^{\infty} a_n x^n \right| &= \left| (1 - x) \sum_{n=0}^{\infty} s_n x^n \right| \le \left| (1 - x) \sum_{n=0}^{N} s_n x^n \right| + \left| (1 - x) \sum_{n=N+1}^{\infty} s_n x^n \right| \\
&\le \left| (1 - x) \sum_{n=0}^{N} s_n x^n \right| + \frac{\epsilon}{2} (1 - x) \sum_{n=N+1}^{\infty} x^n = \left| (1 - x) \sum_{n=0}^{N} s_n x^n \right| + \frac{\epsilon}{2} (1 - x) \frac{x^{N+1}}{1 - x} \\
&= \left| (1 - x) \sum_{n=0}^{N} s_n x^n \right| + \frac{\epsilon}{2} x^{N+1}.
\end{aligned}$$

• Because the limit of this quantity as $x$ approaches 1 from the left is $\frac{\epsilon}{2}$, there exists $\delta > 0$ such that for all $x$ between $1 - \delta$ and 1, this expression is less than $\epsilon$.
• This shows that $\lim_{x\to 1^-} \sum_{n=0}^{\infty} a_n x^n = 0$ which completes the proof.
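Abel's Theorem can be watched in action on the standard series $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{x^n}{n}$, which has radius of convergence 1 and converges conditionally at $x = 1$. This choice of series is an assumption made for illustration, and the truncation lengths are arbitrary. The sketch compares values of the series for $x$ approaching 1 from the left with its sum at the endpoint.

```python
import math


def series_value(x, terms):
    """Partial sum of sum_{n>=1} (-1)^(n-1) x^n / n."""
    return sum((-1) ** (n - 1) * x ** n / n for n in range(1, terms + 1))


endpoint_value = series_value(1.0, 200_000)  # slowly convergent alternating series

for x in (0.9, 0.99, 0.999):
    # The values inside the interval move toward the value at the endpoint.
    print(x, series_value(x, 20_000), endpoint_value)
```

The endpoint sum agrees with $\ln 2$ to several decimal places, and the interior values agree with $\ln(1 + x)$, which is continuous up to $x = 1$.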
• $\sum_{n=0}^{\infty} n^n x^n$
The center is $c = 0$, and the radius of convergence is $R = \lim_{n\to\infty} \frac{1}{\sqrt[n]{n^n}} = \lim_{n\to\infty} \frac{n^n}{(n+1)^{n+1}} = 0$. This series only converges at its center.
• $\frac{1}{2^1} x + \frac{2^2}{3^2} x^2 + \frac{1}{2^3} x^3 + \frac{2^4}{3^4} x^4 + \frac{1}{2^5} x^5 + \frac{2^6}{3^6} x^6 + \cdots$
The center is $c = 0$. This is an example of a series where $\lim_{n\to\infty} \frac{1}{\sqrt[n]{a_n}}$ does not exist, but $\liminf_{n\to\infty} \frac{1}{\sqrt[n]{a_n}} = \frac{3}{2} = R$. Note that $\liminf_{n\to\infty} \frac{a_n}{a_{n+1}} = 0$ and $\limsup_{n\to\infty} \frac{a_n}{a_{n+1}} = \infty$, neither of which sheds any light on the value of $R$. This series diverges at both endpoints by the Limit of Terms Test.
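The claims about the second series can be checked numerically. The sketch below assumes the coefficients are $a_n = \frac{1}{2^n}$ for odd $n$ and $a_n = \frac{2^n}{3^n}$ for even $n$, as read from the displayed terms; the function names are made up for the illustration.

```python
def coefficient(n):
    """a_n = 1/2^n for odd n and (2/3)^n for even n."""
    return 1 / 2 ** n if n % 2 == 1 else (2 / 3) ** n


def root_estimate(n):
    """1 / (n-th root of a_n): 2 for odd n and 3/2 for even n."""
    return 1 / coefficient(n) ** (1 / n)


def ratio_estimate(n):
    """a_n / a_(n+1): tends to 0 along odd n and grows without bound along even n."""
    return coefficient(n) / coefficient(n + 1)


for n in (10, 11, 100, 101):
    print(n, root_estimate(n), ratio_estimate(n))
```

The root estimates alternate between $\frac{3}{2}$ and 2, so their lim inf is $\frac{3}{2}$, while the ratios swing between values near 0 and very large values.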
8.5.3 Differentiability

$$\left| \frac{f(x + h) - f(x)}{h} - g(x) \right| = \left| \frac{\sum_{n=0}^{\infty} a_n (x + h)^n - \sum_{n=0}^{\infty} a_n x^n}{h} - \sum_{n=1}^{\infty} n a_n x^{n-1} \right| = \left| \frac{\sum_{n=0}^{\infty} a_n \sum_{p=0}^{n} \binom{n}{p} h^p x^{n-p} - \sum_{n=0}^{\infty} a_n x^n - h \sum_{n=1}^{\infty} n a_n x^{n-1}}{h} \right|.$$

A careful accounting of the terms in the numerator shows that all the terms of $\sum_{n=0}^{\infty} a_n x^n$ and all of the terms of $h \sum_{n=1}^{\infty} n a_n x^{n-1}$ cancel leaving

$$\left| \frac{\sum_{n=2}^{\infty} a_n \sum_{p=2}^{n} \binom{n}{p} h^p x^{n-p}}{h} \right| = |h| \left| \sum_{n=2}^{\infty} a_n \sum_{p=2}^{n} \binom{n}{p} h^{p-2} x^{n-p} \right|.$$

The factor $|h|$ clearly goes to 0 as $h$ goes to 0, but there is a question about what happens to the other factor. This infinite sum will not be a problem if it remains bounded as $h$ gets small. Here is where you can use the fact that power series with radius of convergence $R$ converge absolutely at points less than a distance $R$ from the center of the series. Assume that $|h|$ is smaller than some fixed value $s > 0$. Then the second factor can be estimated as follows.

$$\begin{aligned}
\left| \sum_{n=2}^{\infty} a_n \sum_{p=2}^{n} \binom{n}{p} h^{p-2} x^{n-p} \right| &\le \sum_{n=2}^{\infty} |a_n| \sum_{p=2}^{n} \binom{n}{p} |h|^{p-2} |x|^{n-p} \le \sum_{n=2}^{\infty} |a_n| \sum_{p=2}^{n} \binom{n}{p} s^{p-2} |x|^{n-p} \\
&\le \sum_{n=2}^{\infty} \frac{|a_n|}{s^2} \sum_{p=0}^{n} \binom{n}{p} s^p |x|^{n-p} = \frac{1}{s^2} \sum_{n=2}^{\infty} |a_n| (|x| + s)^n.
\end{aligned}$$

This last expression converges as long as $|x| + s$ is a point where the power series for $f$ converges absolutely. But if $x$ were chosen so that $|x| < R$, then for any positive $s$ with $s < R - |x|$, this will happen. Because you are free to choose any $s > 0$, you can choose one less than $R - |x|$ which will ensure that $\left| \frac{f(x + h) - f(x)}{h} - g(x) \right|$ is small whenever $0 < |h| < s$, so the proof can be completed.
• Let $\sum_{n=0}^{\infty} a_n (x - c)^n$ be a power series with positive radius of convergence $R \le \infty$.
• The power series for $f$ and its derivative depend on $x - c$ and not on $c$, so there is no loss of generality to assume that $c = 0$.
• Note that $\limsup_{n\to\infty} \sqrt[n]{n |a_n|} = \lim_{n\to\infty} \sqrt[n]{n} \cdot \limsup_{n\to\infty} \sqrt[n]{|a_n|}$, so the two series $\sum_{n=0}^{\infty} a_n (x - c)^n$ and $\sum_{n=1}^{\infty} n a_n (x - c)^{n-1}$ have the same radius of convergence, so each is absolutely convergent for all $x$ with $|x| < R$.
• Let $x$ be chosen with $|x| < R$.
• Let $\epsilon > 0$ be given.
• If $R < \infty$, let $s = \frac{R - |x|}{2}$, and if $R = \infty$, let $s = 1$.
• Because $|x| + s < R$, the series $\sum_{n=0}^{\infty} a_n (|x| + s)^n$ converges absolutely.
• Let $\delta = \min\left( s, \frac{\epsilon s^2}{1 + \sum_{n=2}^{\infty} |a_n| (|x| + s)^n} \right) > 0$.
• Let $h$ be chosen with $0 < |h| < \delta$.
• Then

$$\begin{aligned}
\left| \frac{f(x + h) - f(x)}{h} - \sum_{n=1}^{\infty} n a_n x^{n-1} \right| &= \left| \frac{\sum_{n=0}^{\infty} a_n (x + h)^n - \sum_{n=0}^{\infty} a_n x^n}{h} - \sum_{n=1}^{\infty} n a_n x^{n-1} \right| \\
&= \left| \frac{\sum_{n=0}^{\infty} a_n \sum_{p=0}^{n} \binom{n}{p} h^p x^{n-p} - \sum_{n=0}^{\infty} a_n x^n - h \sum_{n=1}^{\infty} n a_n x^{n-1}}{h} \right| = \left| \frac{\sum_{n=2}^{\infty} a_n \sum_{p=2}^{n} \binom{n}{p} h^p x^{n-p}}{h} \right| \\
&= |h| \left| \sum_{n=2}^{\infty} a_n \sum_{p=2}^{n} \binom{n}{p} h^{p-2} x^{n-p} \right| \le |h| \sum_{n=2}^{\infty} |a_n| \sum_{p=2}^{n} \binom{n}{p} |h|^{p-2} |x|^{n-p} \\
&\le |h| \sum_{n=2}^{\infty} |a_n| \sum_{p=2}^{n} \binom{n}{p} s^{p-2} |x|^{n-p} \le |h| \sum_{n=2}^{\infty} \frac{|a_n|}{s^2} \sum_{p=0}^{n} \binom{n}{p} s^p |x|^{n-p} \\
&= |h| \frac{1}{s^2} \sum_{n=2}^{\infty} |a_n| (|x| + s)^n < \epsilon.
\end{aligned}$$

Thus, the derivative of $f$ at $x$ is $\sum_{n=1}^{\infty} n a_n x^{n-1}$.
An immediate consequence of this theorem is that not only can you obtain the
first derivative of $f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n$ by differentiating term by term, but you
can also get all the higher derivatives of $f$ by repeating the process. This follows
by induction because, if the $m$th derivative of $f$ is equal to the series formed by the
$m$th derivatives of the terms of the series for $f$, and if that series has the same radius
of convergence as the series for $f$, then the theorem says that the $(m+1)$st derivative of
$f$ can be obtained by differentiating the terms of the series for the $m$th derivative of
$f$, and the radius of convergence of that series will remain the same. Moreover, one
can find an antiderivative for $f$ by integrating each term of the series for $f$. That is, if
$f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n$ for all $x$ with $|x - c| < R$, then the series
$\sum_{n=0}^{\infty} \frac{a_n}{n+1} (x - c)^{n+1}$ will
have the same radius of convergence as the series for $f$, and the theorem says that
the derivative of the new series is equal to $f$. It is important to note that if a function
is analytic by virtue of having a power series representation in an open interval of
radius $R$ around $c$, then that function is infinitely differentiable in that interval.
These results make it very simple to derive new series from previously known
series. For example, you already know that $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$ for all $x$ with $|x| < 1$.
From this one can get
• by substituting $-x$ for $x$ in the series for $\frac{1}{1-x}$,
$\frac{1}{1+x} = \sum_{n=0}^{\infty} (-1)^n x^n$.
• by substituting $x^2$ for $x$ in the series for $\frac{1}{1+x}$,
$\frac{1}{1+x^2} = \sum_{n=0}^{\infty} (-1)^n x^{2n}$.
• by differentiating the series for $\frac{1}{1-x}$,
$\frac{1}{(1-x)^2} = \sum_{n=1}^{\infty} n x^{n-1}$.
• by integrating the series for $\frac{1}{1+x}$ and noting that $\ln 1 = 0$,
$\ln(1+x) = \sum_{n=0}^{\infty} (-1)^n \frac{x^{n+1}}{n+1} = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{x^n}{n}$. In particular, by Abel’s Theorem,
$\ln 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$.
• by integrating the series for $\frac{1}{1+x^2}$ and noting that $\tan^{-1} 0 = 0$,
$\tan^{-1} x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{2n+1}$. In particular, by Abel’s Theorem,
$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$.
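The derived series can be sanity-checked numerically. The following is only a sketch, not part of the text's argument: it compares partial sums with the closed forms at the sample point x = 0.5, and the helper name partial_sum, the sample point, and the number of terms N are all choices made here for illustration.

```python
import math

def partial_sum(term, start, count):
    """Sum term(n) for n = start, ..., start + count - 1."""
    return sum(term(n) for n in range(start, start + count))

x = 0.5   # illustrative sample point; any x with |x| < 1 could be used
N = 60    # number of terms kept in each partial sum

geometric = partial_sum(lambda n: x**n, 0, N)
alternating = partial_sum(lambda n: (-1)**n * x**n, 0, N)
squared = partial_sum(lambda n: (-1)**n * x**(2 * n), 0, N)
derivative = partial_sum(lambda n: n * x**(n - 1), 1, N)
log_series = partial_sum(lambda n: (-1)**(n - 1) * x**n / n, 1, N)
arctan_series = partial_sum(lambda n: (-1)**n * x**(2 * n + 1) / (2 * n + 1), 0, N)

# Each difference should be close to zero (up to rounding and truncation).
print(abs(geometric - 1 / (1 - x)))
print(abs(alternating - 1 / (1 + x)))
print(abs(squared - 1 / (1 + x**2)))
print(abs(derivative - 1 / (1 - x)**2))
print(abs(log_series - math.log(1 + x)))
print(abs(arctan_series - math.atan(x)))
```

A numerical agreement of this kind is evidence, not proof; the theorems above are what justify the term-by-term operations.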
If $f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n$ for all $x$ with $|x - c| < R$, then $f(c) = a_0$, the
constant term of the series for $f$. Finding the $m$th derivative of the series for $f$ and
evaluating it at the center of the series, $c$, gives that $f^{(m)}(c) = m!\, a_m$. So, for all
integers $m \ge 0$, $a_m = \frac{f^{(m)}(c)}{m!}$. This gives a straightforward way to generate the
power series representing any analytic function. Moreover, even if $f$ is not infinitely
differentiable, if it is $m$ times differentiable, one can generate the $m$th degree Taylor
polynomial for $f$ centered at $c$ given by $g(x) = \sum_{n=0}^{m} \frac{f^{(n)}(c)}{n!} (x - c)^n$. Then $g$ is an $m$th
degree polynomial that is equal to $f$ at $c$, and all of its derivatives up to order $m$ agree
with the corresponding derivatives of $f$ at $c$. In particular, the first degree Taylor
polynomial is just the familiar linear approximation to $f$ given by the line tangent
to the graph of $f$ at $c$. If $f$ is $m$-times differentiable at $c$, one can generate the $m$th
degree Taylor polynomial, $g(x)$, for $f$ centered at $c$, but this does not say whether
the value of $g(x)$ is even remotely related to the value of $f(x)$ when $x$ is different
from $c$. This issue is what is addressed by Taylor’s Theorem which states that
$f(x) = g(x) + R_m(x)$ for some remainder function $R_m(x)$. Depending on various
characteristics of $f$, one can show that $R_m(x)$ is suitably small so that $g(x)$ is a good
approximation for $f(x)$.
There are many forms of Taylor’s Theorem that express the remainder term,
$R_m(x)$, in different ways. The one discussed here is sometimes called Lagrange’s
form. It says that if $f$ is $m + 1$ times differentiable on the interval between $c$ and
$x$, then the difference between $f(x)$ and the $m$th degree Taylor polynomial for $f$
centered at $c$ can be expressed in terms of $f^{(m+1)}(\xi)$ for some $\xi$ strictly between $c$
and $x$. Its proof follows easily from the following generalization of Rolle’s Theorem.
This Higher Order Rolle’s Theorem can now be used to prove Taylor’s Theorem.
If the function $f$ is $m + 1$ times differentiable between $c$ and $x$, then $f$ has an
$m$th degree Taylor polynomial $g(y) = \sum_{n=0}^{m} \frac{f^{(n)}(c)}{n!} (y - c)^n$. Notice that the difference
$f(y) - g(y)$ has the property that this function and its first $m$ derivatives are all equal
to 0 at $c$. The remainder term $R_m(x)$ will include a factor of $f^{(m+1)}$ evaluated at
some $\xi$ between $c$ and $x$, and that value of $\xi$ will come from an application of Rolle’s
Theorem. Of course, to apply Rolle’s Theorem, $f(y) - g(y)$ would need to be 0 at $y = x$.
One needs to subtract a term from $f(y) - g(y)$ which will not affect the function and its
first $m$ derivatives at $c$ but will make the function equal to 0 at $x$. The term that accomplishes
this is $(f(x) - g(x)) \frac{(y - c)^{m+1}}{(x - c)^{m+1}}$ since this term equals $f(x) - g(x)$ at $x$, and it and its
first $m$ derivatives are equal to 0 at $c$. But now the Higher Order Rolle’s Theorem can be applied to the
function $h(y) = f(y) - g(y) - (f(x) - g(x)) \frac{(y - c)^{m+1}}{(x - c)^{m+1}}$ to find a value of $\xi$ between $c$ and $x$
such that $h^{(m+1)}(\xi) = 0$, or $0 = f^{(m+1)}(\xi) - g^{(m+1)}(\xi) - (f(x) - g(x)) \frac{(m+1)!}{(x - c)^{m+1}}$. Noting
that the $(m+1)$st derivative of $g$ is identically 0, because $g$ is a polynomial of degree at most $m$,
gives $f(x) = g(x) + f^{(m+1)}(\xi) \frac{(x - c)^{m+1}}{(m+1)!}$
as desired.
PROOF (Taylor’s Theorem): Let $f$ be an $m + 1$ times differentiable
function on the open interval from $c$ to $x$ with $c \ne x$, and let $f$ be
continuous on the closed interval from $c$ to $x$. Then there is a $\xi$ between
$c$ and $x$ such that $f(x) = \sum_{n=0}^{m} \frac{f^{(n)}(c)}{n!} (x - c)^n + f^{(m+1)}(\xi) \frac{(x - c)^{m+1}}{(m+1)!}$.
For example, the cosine function is analytic, and its power series which converges
for all real numbers is $\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} - \cdots$. So, how accurate
of an approximation is $1 - \frac{x^2}{2!} + \frac{x^4}{4!}$ at $x = 2$? It is clear that the given Taylor
polynomial includes the terms for $n = 0$, 1, 2, 3, and 4, but it is beneficial to note
that it also includes the term for $n = 5$ which is 0. Therefore, Taylor’s Theorem
says that the remainder at $x = 2$ is $f^{(6)}(\xi) \frac{(2 - 0)^6}{6!} = -\cos(\xi) \frac{2^6}{720}$. Since the cosine
function is bounded by 1, the error introduced by using the Taylor polynomial as an
approximation to $\cos 2$ is at most $\frac{2^6}{720} \approx 0.09$. In fact, at $x = 2$ the polynomial is $-\frac{1}{3}$
while $\cos 2 \approx -0.416146$ with a difference of $0.08281$.
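The arithmetic in this example can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name taylor_cos is a label chosen here, and floating point values are approximations of the exact quantities.

```python
import math

def taylor_cos(x, degree):
    """Taylor polynomial of cos centered at 0, keeping terms up to x**degree."""
    return sum((-1)**k * x**(2 * k) / math.factorial(2 * k)
               for k in range(degree // 2 + 1))

x = 2.0
approx = taylor_cos(x, 5)             # 1 - x^2/2! + x^4/4!; the x^5 term is 0
bound = x**6 / math.factorial(6)      # Lagrange bound, using |cos| <= 1
error = abs(math.cos(x) - approx)     # actual error of the approximation

print(approx, bound, error)
```

The actual error falls below the Lagrange bound, as Taylor's Theorem guarantees, and the bound is not far from the true error in this case.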
Given two analytic functions each represented by power series with common center
$c$ and positive radii of convergence, it is straightforward to find the power series
representing the sum, difference, product, and quotients of these series. Suppose
two functions have power series $f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n$ and $g(x) = \sum_{n=0}^{\infty} b_n (x - c)^n$
which both converge when $|x - c| < R$ for some $R > 0$. Then theorems about
the sum and difference of series of real numbers ensure that the sum and difference,
$(f + g)(x) = \sum_{n=0}^{\infty} (a_n + b_n)(x - c)^n$ and $(f - g)(x) = \sum_{n=0}^{\infty} (a_n - b_n)(x - c)^n$, both converge
when $|x - c| < R$. Of course, it is possible that the new series converges in an even
larger interval. For example, the series $1 + x + x^2 + x^3 + \cdots$ and $2 - x - x^2 - x^3 - \cdots$
both have radius of convergence equal to 1, but the sum of the two series is the
constant function 3, and its power series converges for all $x$.
The product of two power series can be found by using the Cauchy product of
the two series. If $f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n$ has radius of convergence $R_1 > 0$ and
$g(x) = \sum_{n=0}^{\infty} b_n (x - c)^n$ has radius of convergence $R_2 > 0$, then both series converge
absolutely when $|x - c| < \min(R_1, R_2)$ implying that their Cauchy product,
$$(fg)(x) = \sum_{n=0}^{\infty} \sum_{p=0}^{n} \left( a_p (x - c)^p \right) \left( b_{n-p} (x - c)^{n-p} \right) = \sum_{n=0}^{\infty} \left( \sum_{p=0}^{n} a_p b_{n-p} \right) (x - c)^n ,$$
converges for $|x - c| < \min(R_1, R_2)$. Again, the radius of convergence can be larger, as it is with
the product of $\frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots$ and $1 - x$ which converges for all $x$.
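The Cauchy product is easy to compute on truncated coefficient lists. The sketch below is illustrative only: the name cauchy_product and the truncation length N are choices made here, and exact rational arithmetic from Python's fractions module is used so that no rounding occurs.

```python
from fractions import Fraction

def cauchy_product(a, b):
    """Coefficients c_n = sum_{p=0}^{n} a_p * b_{n-p} for n < min(len(a), len(b))."""
    size = min(len(a), len(b))
    return [sum(a[p] * b[n - p] for p in range(n + 1)) for n in range(size)]

N = 8                                                # truncation length (arbitrary)
geometric = [Fraction(1)] * N                        # 1 + x + x^2 + ... (truncated)
one_minus_x = [Fraction(1), Fraction(-1)] + [Fraction(0)] * (N - 2)
product = cauchy_product(geometric, one_minus_x)

print(product)
```

Every coefficient after the constant term cancels, which matches the observation that the product of these two series is the constant function 1.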
If $f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n$ has radius of convergence $R_1 > 0$ and $g(x) = \sum_{n=0}^{\infty} b_n (x - c)^n$
has radius of convergence $R_2 > 0$, and $g(c)$ is not zero, then one can find the power
series for the quotient $h(x) = \frac{f(x)}{g(x)}$ centered at $c$ by working backwards from the
Cauchy product of $h$ and $g$. That is, if you assume that $h(x) = \sum_{n=0}^{\infty} q_n (x - c)^n$, then
$f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n = h(x) g(x) = \sum_{n=0}^{\infty} \left( \sum_{p=0}^{n} b_p q_{n-p} \right) (x - c)^n$. Because of the
assumption that $g(c) \ne 0$, it follows that $b_0 \ne 0$. Then equating like terms in the
product gives the sequence of equations
$$\begin{aligned}
a_0 &= b_0 q_0 \\
a_1 &= b_0 q_1 + b_1 q_0 \\
a_2 &= b_0 q_2 + b_1 q_1 + b_2 q_0 \\
a_3 &= b_0 q_3 + b_1 q_2 + b_2 q_1 + b_3 q_0
\end{aligned}$$
and so forth. The first equation can be solved to give $q_0$. Then the second equation
can be solved to give $q_1$, and so forth. The fact that $g(c) \ne 0$ says that the coefficient
$b_0 \ne 0$ which allows the equation for $a_m$ to be solved for $q_m$ for each $m \ge 0$. Often
this results in a recursive formula for $q_n$. For example, it is known that
$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots$, so you can find the series centered at 0 for the quotient
$\frac{\ln(1+x)}{1+x} = \sum_{n=0}^{\infty} q_n x^n$ by writing $\ln(1 + x) = (1 + x) \sum_{n=0}^{\infty} q_n x^n$ giving $0 = 1 \cdot q_0$ so $q_0 = 0$. Then
for each $n > 0$, $\frac{(-1)^{n-1}}{n} = q_n + q_{n-1}$, so $q_1 = 1$, $q_2 = -\frac{3}{2}$, $q_3 = \frac{11}{6}$, and so forth
with $q_n = \frac{(-1)^{n-1}}{n} - q_{n-1}$.
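The back-substitution described above can be carried out mechanically. What follows is a minimal sketch under the assumption that both series are given as truncated coefficient lists; the name quotient_coefficients is hypothetical, and exact fractions are used so the recursion can be followed by hand.

```python
from fractions import Fraction

def quotient_coefficients(a, b):
    """Solve a_n = sum_{p=0}^{n} b_p * q_{n-p} for q_0, q_1, ... (needs b[0] != 0)."""
    if b[0] == 0:
        raise ValueError("b[0] must be nonzero, that is, g(c) != 0")
    q = []
    for n in range(min(len(a), len(b))):
        # Terms of the n-th equation that involve already-known q values.
        known = sum(b[p] * q[n - p] for p in range(1, n + 1))
        q.append((a[n] - known) / b[0])
    return q

N = 5  # number of coefficients to compute (arbitrary)
# Coefficients of ln(1 + x): 0, 1, -1/2, 1/3, -1/4, ...
log_coeffs = [Fraction(0)] + [Fraction((-1) ** (n - 1), n) for n in range(1, N)]
# Coefficients of 1 + x, padded with zeros.
one_plus_x = [Fraction(1), Fraction(1)] + [Fraction(0)] * (N - 2)

q = quotient_coefficients(log_coeffs, one_plus_x)
print(q)
```

Because each $q_n$ depends only on earlier values, the computation never needs to solve more than one linear equation at a time, which is exactly why $b_0 \ne 0$ is the only condition required.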
8.5.6 Exercises
where $\frac{d}{dx} \sum_{n=0}^{\infty} a_n (x - c)^n$ is the same as $\sum_{n=0}^{\infty} \frac{d}{dx} a_n (x - c)^n$. Abel’s Theorem discusses
whether it is valid to write $\lim_{x \to R} \sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} \lim_{x \to R} a_n x^n$. Thus, the fundamental
question of Analysis asks “when can you interchange the order of two limiting
processes?” It is instructive to watch for other occurrences of this question as your
study of Analysis continues.
Chapter 9
Topology of the Real Line
In the field of Analysis the concepts of the limit and the continuity of a function f
at a point $x = a$ are defined in terms of open intervals. For example, the condition
$|f(x) - L| < \varepsilon$ says that $f(x)$ is in an open interval centered at $L$, and the condition
$|x - a| < \delta$ says that $x$ is in an open interval centered at $a$. These intervals are
specified in terms of the distance between $x$ and $y$ given by $|x - y|$. Topology is
a branch of Mathematics where these concepts are extended to spaces where one
can discuss intervals without having to rely on a distance formula. As a result the
concepts of limit and continuity can be extended to such spaces, and it can be shown
that many of the properties associated with continuous functions defined on the
real line are shared by continuous functions defined on these more general spaces.
Although the theorems discussed in this chapter are presented in the context of sets
on the real line, virtually all of the theorems are true in the more general context of
any topological space. Many of the techniques used to prove these theorems are the
same techniques one would use for a general topological space, and, therefore, this
chapter can be thought of as an introduction to the field of Topology even though
general topological spaces are not discussed here.
A good way to begin is by taking a set $S \subseteq \mathbb{R}$ and identifying the points $s \in S$
that are not only inside of $S$ but are, in a sense, completely surrounded by points in
$S$. The point $s$ is said to be in $\operatorname{int}(S)$, called the interior of $S$, if there is an $\varepsilon > 0$ such
that all $x$ within $\varepsilon$ of $s$ are in $S$, that is, $|x - s| < \varepsilon$ implies $x \in S$. You can think of
the interior of $S$ as those points which are a positive distance from the complement
of $S$, $S^c = \mathbb{R} \setminus S$. For example, if $S$ is the closed interval $[0, 4]$, then the open interval
$(0, 4)$ is the interior of $S$. This is because if $s \in (0, 4)$ and $\varepsilon = \min(s, 4 - s)$, then
all $x$ satisfying $|x - s| < \varepsilon$ are elements of $S$. The two endpoints of the interval
$[0, 4]$, 0 and 4, do not have this property. No open interval containing either 0 or
4 is completely contained inside of $S$. Clearly, if $x > 4$ or $x < 0$, then $x \notin S$, so
$x \notin \operatorname{int}(S)$. Thus, $\operatorname{int}(S) = (0, 4)$. The interior of the set $\mathbb{Q}$ of rational numbers is
the empty set because all nonempty open intervals contain irrational numbers, so
no nonempty open interval is contained in $\mathbb{Q}$. One sometimes says that $\mathbb{Q}$ has no
interior even though it does have an interior; it is just that its interior is the empty
set.
Then $\operatorname{ext}(S)$, called the exterior of $S$, is just defined to be the interior of $S^c$, that
is, $s \in \operatorname{ext}(S)$ if there is an $\varepsilon > 0$ such that all $x$ satisfying $|x - s| < \varepsilon$ are in
$S^c = \mathbb{R} \setminus S$. The exterior of $S$ is the set of points that are completely surrounded by
points outside of $S$. You can think of the exterior of $S$ as the collection of points
bounded away from $S$, that is, the points that are a positive distance from $S$. The
exterior of the set $[0, 4]$ is the union of two open intervals, $(-\infty, 0) \cup (4, \infty)$. The
exterior of the set $\mathbb{Q}$ is the empty set.
If a point $s$ is in neither $\operatorname{int}(S)$ nor $\operatorname{ext}(S)$, then it must be that no open interval
containing $s$ is completely inside of $S$ and no open interval containing $s$ is completely
outside of $S$. Thus, for every $\varepsilon > 0$, the interval $(s - \varepsilon, s + \varepsilon)$ contains at least one
element of $S$ and at least one element of $S^c$. Such points are said to be in $\partial S$, called
the boundary of $S$ (Fig. 9.1). Note that the symbol used for boundary is $\partial$ which
is the same symbol used for partial derivatives in Calculus. There are connections
between derivatives and boundaries that justify the use of the same symbol for both
concepts. The boundary of $[0, 4]$ is the set $\{0, 4\}$. The boundary of $\mathbb{Q}$ is the entire
real line, $\mathbb{R}$.
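For sets that are finite unions of intervals, the three definitions can be turned into a small classifier. The sketch below is an illustration only and makes a simplifying assumption that is not in the text: each interval is given as a hypothetical tuple (a, b, left_closed, right_closed) with a < b, so sets such as the rationals are out of its reach.

```python
def classify(x, intervals):
    """Classify x as 'interior', 'exterior', or 'boundary' relative to S.

    S is a finite union of intervals, each given as
    (a, b, left_closed, right_closed) with a < b.
    """
    in_s = any((a < x < b) or (x == a and lc) or (x == b and rc)
               for a, b, lc, rc in intervals)
    # For a finite union of intervals, the points sufficiently close to x on
    # its left (or right) are either all in S or all outside of S.
    left_in_s = any(a < x <= b for a, b, _, _ in intervals)
    right_in_s = any(a <= x < b for a, b, _, _ in intervals)
    if in_s and left_in_s and right_in_s:
        return "interior"   # some (x - eps, x + eps) lies inside S
    if not in_s and not left_in_s and not right_in_s:
        return "exterior"   # some (x - eps, x + eps) lies inside the complement
    return "boundary"       # every interval around x meets both S and its complement

S = [(0, 4, True, True)]    # the closed interval [0, 4] from the text
for point in (-1, 0, 2, 4, 5):
    print(point, classify(point, S))
```

Each point receives exactly one of the three labels, which reflects the fact that the interior, exterior, and boundary split the real line into three disjoint pieces.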
It is important to note that for any set $S \subseteq \mathbb{R}$, the three sets $\operatorname{int}(S)$, $\operatorname{ext}(S)$, and $\partial S$
partition $\mathbb{R}$, that is, each real number $x$ is in exactly one of these three sets. A proof
of this fact must show two things about a set $S$: that $\mathbb{R} = \operatorname{int}(S) \cup \operatorname{ext}(S) \cup \partial S$, and
that no point $x$ belongs to more than one of these sets. To show that $\mathbb{R}$ is a union of
the three sets, you would take an arbitrary $x \in \mathbb{R}$ and show that it is in at least one of
these sets. One way to show that a point must be one of three things is to assume that
it is not one of the first two, and then prove that it must be the third. In this case, you
can assume that a point $x \in \mathbb{R}$ is not in $\operatorname{int}(S)$ or in $\operatorname{ext}(S)$. If $x$ is not in $\operatorname{int}(S)$, then
9.1 Interior, Exterior, and Boundary 271
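PROOF: For every set $S \subseteq \mathbb{R}$, $\mathbb{R} = \operatorname{int}(S) \cup \operatorname{ext}(S) \cup \partial S$ and the three sets
$\operatorname{int}(S)$, $\operatorname{ext}(S)$, and $\partial S$ are mutually disjoint.
• Let $S \subseteq \mathbb{R}$.
• Assume that $x$ is a real number that is not a member of $\operatorname{int}(S)$ or $\operatorname{ext}(S)$.
• Then, because $x \notin \operatorname{int}(S)$, for every $\varepsilon > 0$, the open interval $(x - \varepsilon, x + \varepsilon)$ is
not contained in $S$, so it contains points of $S^c$.
• And because $x \notin \operatorname{ext}(S)$, for every $\varepsilon > 0$, the open interval $(x - \varepsilon, x + \varepsilon)$ is
not contained in $S^c$, so it contains points of $S$.
• It follows that for every $\varepsilon > 0$, the open interval $(x - \varepsilon, x + \varepsilon)$ contains
points in $S$ and points in $S^c$.
• Thus, by the definition of boundary, $x \in \partial S$, and this shows that $x$ must be
in at least one of the three sets, $\operatorname{int}(S)$, $\operatorname{ext}(S)$, or $\partial S$.
• If $x \in \operatorname{int}(S)$, then there is an $\varepsilon > 0$ such that the open interval
$(x - \varepsilon, x + \varepsilon) \subseteq S$.
• But then $x \in S$, so $x \notin \operatorname{ext}(S)$, and $(x - \varepsilon, x + \varepsilon) \subseteq S$ shows that $x \notin \partial S$.
• Similarly, if $x \in \operatorname{ext}(S)$, then it cannot be in $\operatorname{int}(S)$ or $\partial S$.
• Thus, no $x \in \mathbb{R}$ is a member of more than one of the three sets which
completes the proof.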
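There are many results that follow directly from the definitions of interior,
exterior, and boundary. For example, if $S$ and $T$ are any subsets of $\mathbb{R}$, then
• $\operatorname{int}(\operatorname{int}(S)) = \operatorname{int}(S)$.
• $\operatorname{int}(\operatorname{ext}(S)) = \operatorname{ext}(S)$.
• $\operatorname{int}(S) \subseteq \operatorname{ext}(\operatorname{ext}(S))$.
• $\operatorname{ext}(S) \subseteq \operatorname{ext}(\operatorname{int}(S))$.
• $\partial(\partial(S)) \subseteq \partial S$.
• $\partial(\operatorname{int}(S)) \subseteq \partial S$.
• $\partial(\operatorname{ext}(S)) \subseteq \partial S$.
• $\partial(S) = \partial(S^c)$.
• $\operatorname{int}(S) \cup \operatorname{int}(T) \subseteq \operatorname{int}(S \cup T)$.
• $\operatorname{ext}(S \cup T) \subseteq \operatorname{ext}(S) \cap \operatorname{ext}(T)$.
• $\operatorname{int}(S \cap T) = \operatorname{int}(S) \cap \operatorname{int}(T)$.
• $\partial(S \cup T) \subseteq \partial S \cup \partial T$.
• if $S \subseteq T$, then $\operatorname{int}(S) \subseteq \operatorname{int}(T)$.
• if $S \subseteq T$, then $\operatorname{ext}(T) \subseteq \operatorname{ext}(S)$.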
272 9 Topology of the Real Line
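Each of these results is a statement about either two sets being equal to each other
or one set being a subset of another. Thus, one would prove these results using the
techniques discussed in Chap. 2 for proving subset and set equality statements. For
example, how would you write a proof that for any set $S$, $\operatorname{int}(\operatorname{int}(S)) = \operatorname{int}(S)$?
This would be a proof that two sets are equal, so the proof would consist of two
parts: showing $\operatorname{int}(\operatorname{int}(S)) \subseteq \operatorname{int}(S)$ and showing $\operatorname{int}(S) \subseteq \operatorname{int}(\operatorname{int}(S))$. The fact that
$\operatorname{int}(\operatorname{int}(S)) \subseteq \operatorname{int}(S)$ is just a consequence of the definition of interior. For any set
$T$, $\operatorname{int}(T) \subseteq T$, so certainly $\operatorname{int}(\operatorname{int}(S)) \subseteq \operatorname{int}(S)$. Showing that $\operatorname{int}(S) \subseteq \operatorname{int}(\operatorname{int}(S))$
is showing that one set is a subset of another. So, you would let $x$ be an element of
$\operatorname{int}(S)$, and then show that $x$ is also an element of $\operatorname{int}(\operatorname{int}(S))$. By the definition of
interior, there is an $\varepsilon > 0$ such that the open interval $(x - \varepsilon, x + \varepsilon) \subseteq S$. Thus, you
need to show that $(x - \varepsilon, x + \varepsilon)$ is contained in $\operatorname{int}(S)$. That is, each $y \in (x - \varepsilon, x + \varepsilon)$
must be in the interior of $S$. But it is easy to find an open interval centered at $y$ that
is contained in $(x - \varepsilon, x + \varepsilon)$. Just let $\delta = \min(y - (x - \varepsilon), x + \varepsilon - y) > 0$ because
then $(y - \delta, y + \delta) \subseteq (x - \varepsilon, x + \varepsilon)$. This shows each point of $(x - \varepsilon, x + \varepsilon)$ is in
$\operatorname{int}(S)$ which completes the proof.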
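PROOF: For every set $S \subseteq \mathbb{R}$, $\operatorname{int}(\operatorname{int}(S)) = \operatorname{int}(S)$.
• Let $S \subseteq \mathbb{R}$.
• For any set $T$, $\operatorname{int}(T) \subseteq T$, so $\operatorname{int}(\operatorname{int}(S)) \subseteq \operatorname{int}(S)$.
• So let $x \in \operatorname{int}(S)$.
• By the definition of interior, there is an $\varepsilon > 0$ such that the open interval
$(x - \varepsilon, x + \varepsilon)$ is contained in $S$.
• Let $y \in (x - \varepsilon, x + \varepsilon)$, and let $\delta = \min(y - (x - \varepsilon), x + \varepsilon - y) > 0$.
• Then $(y - \delta, y + \delta) \subseteq (x - \varepsilon, x + \varepsilon) \subseteq S$.
• This shows that $(x - \varepsilon, x + \varepsilon) \subseteq \operatorname{int}(S)$ implying that $x$ is in $\operatorname{int}(\operatorname{int}(S))$.
• This proves that $\operatorname{int}(S) \subseteq \operatorname{int}(\operatorname{int}(S))$ and completes the proof of the
theorem.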
For a more difficult challenge, consider writing a proof that for any set $S$, $\partial(\partial S) \subseteq \partial S$
which, in words, says that the boundary of the boundary of a set is contained in the
boundary of the set. For example, let $S$ be the set of rational numbers in the interval
$[0, 4]$. You should prove to yourself that the boundary of this set is the entire interval
$[0, 4]$. The boundary of that interval is just $\{0, 4\}$ which indeed is contained in
$\partial S = [0, 4]$. To show that $\partial(\partial S)$ is a subset of $\partial S$, you would take an arbitrary
point $x \in \partial(\partial S)$ and show that it is in $\partial S$. So what do you know if $x \in \partial(\partial S)$? The
only tool you have at your disposal here is the definition of the boundary of a set,
so you would proceed to use that definition. It says that for every $\varepsilon > 0$ the open
interval $(x - \varepsilon, x + \varepsilon)$ contains elements of $\partial S$ and elements of the complement
of $\partial S$. You want to show that $x$ is in $\partial S$, so you would need to show that the open
interval $(x - \varepsilon, x + \varepsilon)$ contains elements of $S$ and elements of $S^c$. Well, what is the
consequence of saying that the open interval $(x - \varepsilon, x + \varepsilon)$ contains elements of $\partial S$? It
must mean that there is a $y \in (x - \varepsilon, x + \varepsilon)$ such that $y \in \partial S$. What does it mean for $y$
to be in $\partial S$? It means that for every $\delta > 0$, the interval $(y - \delta, y + \delta)$ contains elements
of $S$ and elements of $S^c$. But this is sufficient if $(y - \delta, y + \delta) \subseteq (x - \varepsilon, x + \varepsilon)$ because
that would put elements of both $S$ and $S^c$ in $(x - \varepsilon, x + \varepsilon)$. This can be arranged by
selecting $\delta$ small enough (Fig. 9.2).
As a third example, consider proving that for any two sets $S$ and $T$,
$\operatorname{int}(S) \cup \operatorname{int}(T) \subseteq \operatorname{int}(S \cup T)$. Again, this is proving that one set is a subset of a
second set, so your proof would start by selecting an arbitrary element of the first
set and then proceed to show that that element belongs to the second set. Here the
first set is $\operatorname{int}(S) \cup \operatorname{int}(T)$. If you select an $x$ from this set, all you know about $x$ is that
it is in the union of the two sets $\operatorname{int}(S)$ and $\operatorname{int}(T)$. So, the only tool you can use is
the definition of union to say that $x$ must be either a member of $\operatorname{int}(S)$ or a member
of $\operatorname{int}(T)$. In the case that $x \in \operatorname{int}(S)$, you can then apply the definition of interior to
say that there is an $\varepsilon > 0$ such that the interval $(x - \varepsilon, x + \varepsilon) \subseteq S$. But this is all you
need since $S \subseteq S \cup T$ showing $(x - \varepsilon, x + \varepsilon) \subseteq S \cup T$ proving that $x \in \operatorname{int}(S \cup T)$.
The case where $x \in \operatorname{int}(T)$ is analogous, completing the proof.
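PROOF: For any sets of real numbers $S$ and $T$, $\operatorname{int}(S) \cup \operatorname{int}(T) \subseteq \operatorname{int}(S \cup T)$.
• Let $S$ and $T$ be sets of real numbers.
• Let $x \in \operatorname{int}(S) \cup \operatorname{int}(T)$.
• Then by the definition of the union of two sets, either $x \in \operatorname{int}(S)$ or $x \in
\operatorname{int}(T)$.
• Without loss of generality, assume that $x \in \operatorname{int}(S)$.
• Then there is an $\varepsilon > 0$ such that the open interval $(x - \varepsilon, x + \varepsilon) \subseteq S$.
• But since $S \subseteq S \cup T$, it follows that $(x - \varepsilon, x + \varepsilon) \subseteq S \cup T$ showing that
$x \in \operatorname{int}(S \cup T)$, completing the proof.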
Can it be that $\operatorname{int}(S) \cup \operatorname{int}(T)$ is not equal to $\operatorname{int}(S \cup T)$? The answer is yes. See if
you can think of an example.
9.1.1 Exercises
For each of the following sets, find the interior, exterior, and boundary of the set.
1. $[0, 3) \cup (3, 6]$
2. $\bigcup_{n=1}^{\infty} \left( \frac{1}{2n}, \frac{1}{2n-1} \right)$.
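3. $[0, 4] \cap \mathbb{Q}$
Write proofs for each of the following statements. For exercises involving the subset
relation rather than the equality relation, give examples showing that the subset
relation in the statement cannot be replaced by an equality.
4. If $S \subseteq T$, then $\operatorname{int}(S) \subseteq \operatorname{int}(T)$.
5. If $S \subseteq T$, then $\operatorname{ext}(T) \subseteq \operatorname{ext}(S)$.
6. $\operatorname{int}(\operatorname{ext}(S)) = \operatorname{ext}(S)$.
7. $\operatorname{ext}(S) \subseteq \operatorname{ext}(\operatorname{int}(S))$.
8. $\partial(\operatorname{int}(S)) \subseteq \partial S$.
9. $\partial(S) = \partial(S^c)$.
10. $\partial(\operatorname{ext}(S)) \subseteq \partial S$.
11. $\operatorname{int}(S \cap T) = \operatorname{int}(S) \cap \operatorname{int}(T)$.
12. $\operatorname{ext}(S \cup T) = \operatorname{ext}(S) \cap \operatorname{ext}(T)$.
13. $\partial(S \cup T) \subseteq \partial S \cup \partial T$.
14. $\operatorname{int}(S) \subseteq \operatorname{ext}(\operatorname{ext}(S))$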
A set $S$ of real numbers is called open if for every $s \in S$ there is an $\varepsilon > 0$ such that
the open interval $(s - \varepsilon, s + \varepsilon) \subseteq S$. A set $S$ of real numbers is called closed if $\partial S \subseteq S$.
The intervals that are called open intervals are, in fact, open sets. In particular,
$(1, 7)$, $(2, \infty)$, and $\emptyset$ are all open sets as well as $(-3, 3) \cup (5, 9) \cup (10, 41)$ and
$\bigcup_{n=1}^{\infty} (2n, 2n + 1)$. The intervals that are called closed intervals are, in fact, closed sets.
In particular, $[-5, 3]$, $[4, \infty)$, and $\emptyset$ are all closed sets.
There are actually many equivalent ways to define open and closed sets, so
one usually begins this discussion by proving that all the different definitions are
equivalent. In particular, if $S \subseteq \mathbb{R}$, then the following are equivalent:
1. $S$ is an open set.
2. $S = \operatorname{int}(S)$.
3. $S \cap \partial S = \emptyset$.
4. $S^c$ is a closed set.
Many theorems in mathematics are statements of the form $p \Leftrightarrow q$, and the proof of
these statements is often broken into two steps: $p \Rightarrow q$ and $q \Rightarrow p$. Theorems of
that type state that two conditions are equivalent. But it is not uncommon to have
9.2 Open and Closed Sets 275
a theorem that states that several statements are equivalent, that is, $p_1 \Leftrightarrow p_2 \Leftrightarrow
p_3 \Leftrightarrow \cdots \Leftrightarrow p_k$. One way to prove theorems of this form is to show in a sequence
of steps that $p_1 \Rightarrow p_2$, $p_2 \Rightarrow p_3$, $p_3 \Rightarrow p_4$, . . . , $p_{k-1} \Rightarrow p_k$, and then $p_k \Rightarrow p_1$. This
is the technique you can use to prove the list of statements about open sets. You
would begin by assuming condition 1, that a set $S$ is open and then prove condition
2, that $S = \operatorname{int}(S)$. This can be done by noting that for any set, elements of the set
are either in the interior of the set or on the boundary of the set. But if the set $S$
is open, it means that for each $x \in S$ there is an $\varepsilon > 0$ such that the open interval
$(x - \varepsilon, x + \varepsilon) \subseteq S$. Thus, $(x - \varepsilon, x + \varepsilon)$ contains no elements of $S^c$ showing that $x$
cannot be in $\partial S$, so it must be that $x \in \operatorname{int}(S)$ which proves that $S = \operatorname{int}(S)$.
Now, assuming condition 2 that $S = \operatorname{int}(S)$ it follows immediately that $S \cap \partial S =
\operatorname{int}(S) \cap \partial S = \emptyset$, which is condition 3. If you assume condition 3 that $S \cap \partial S = \emptyset$,
how can you conclude that $S^c$ is closed? Well, if $S$ contains no elements of $\partial S$, it
must be that all the elements of $\partial S$ (if there are any) must belong to $S^c$. But as seen
in the exercises of the previous section, the boundary of $S$ and the boundary of $S^c$
are always the same. This follows from the fact that the definition of boundary is
symmetric in its references to $S$ and $S^c$. Therefore, $S^c$ contains its boundary proving
that $S^c$ is a closed set, which is condition 4.
Finally, assuming condition 4 that $S^c$ is a closed set, you know that $S^c$ contains
its boundary, so $S^c$ contains the boundary of $S$. You must show that for each $x \in S$,
there is an open interval centered at $x$ such that the entire interval is contained in $S$.
But if for every $\varepsilon > 0$ the open interval $(x - \varepsilon, x + \varepsilon)$ contains elements in $S^c$, then
$x$ would be in the boundary of $S$ which is false. Thus, there is an $\varepsilon > 0$ such that
the open interval $(x - \varepsilon, x + \varepsilon)$ is contained in $S$. This proves that $S$ is an open set,
which is condition 1 (Fig. 9.3).
Condition 2 $\Rightarrow$ Condition 3
• Assume that $S = \operatorname{int}(S)$.
• Then $S \cap \partial S = \operatorname{int}(S) \cap \partial S = \emptyset$ because the interior and the boundary of
any set are disjoint.
• Thus, $S \cap \partial S = \emptyset$, which is condition 3.
Condition 3 $\Rightarrow$ Condition 4
• Assume that $S \cap \partial S = \emptyset$.
• Then $\partial S$ must be contained in $S^c$.
• Because $\partial S = \partial(S^c)$, it follows that $\partial(S^c) \subseteq S^c$ implying that $S^c$ is a closed
set, which is condition 4.
Condition 4 $\Rightarrow$ Condition 1
• Assume that $S^c$ is a closed set, which means that $S^c$ contains $\partial(S^c)$.
• Let $x \in S$.
• If for every $\varepsilon > 0$, the open interval $(x - \varepsilon, x + \varepsilon)$ contains elements of $S^c$,
then $x$ would be an element of $\partial S = \partial(S^c)$.
• But all elements of $\partial(S^c)$ are contained in $S^c$, so there must be an $\varepsilon > 0$
such that the interval $(x - \varepsilon, x + \varepsilon)$ contains no elements of $S^c$ implying that
$(x - \varepsilon, x + \varepsilon) \subseteq S$.
• This shows that $S$ is an open set, which is condition 1.
Condition 4 $\Rightarrow$ Condition 1
• Assume that $S^c$ is an open set.
• Then $S^c \cap \partial(S^c) = \emptyset$ implying that $\partial(S^c) \subseteq S$, so $\partial S \subseteq S$.
• Thus, $S$ contains $\partial S$, so $S$ is a closed set, which is condition 1.
Of course, a set need not be either open or closed as is seen by the interval $(0, 5]$
which contains one but not both of its boundary points (Fig. 9.4), so it is neither
open (because it contains a boundary point) nor closed (because it does not contain
all of its boundary points).
9.2.1 Exercises
Determine which of the following sets of real numbers are open and which are
closed.
1. $(-2, 2) \cup (2, 6) \cup (6, 10)$
2. R
3. the irrational numbers
4. the real numbers that are not integers
Write proofs for each of the following statements.
5. If $S$ is an open set, and $T$ is a closed set, then $S \setminus T$ is an open set.
6. If $S$ is an open set, and $T$ is a closed set, then $T \setminus S$ is a closed set.
7. $\emptyset$ is both an open set and a closed set.
8. If $x$ is an accumulation point of a set $S$, and $x \notin S$, then $x \in \partial S$.
9. If $x \in \partial S$ and $x \notin S$, then $x$ is an accumulation point of $S$.
10. Let $\langle a_n \rangle$ be any sequence. Let $A$ be the set of all values $y$ such that there exists
a subsequence of $\langle a_n \rangle$ that converges to $y$. Then $A$ is a closed set.
Perhaps the most important properties open sets have are that the union of any
collection of open sets is itself an open set and that the intersection of a finite
collection of open sets is itself an open set. In fact, these two properties of open sets
are the defining conditions required to hold in the more general setting of topological
spaces (Fig. 9.5).
In the context of the real numbers, it is not hard to show that the union of any
collection of open sets is itself an open set. But before this proof can be started,
there needs to be a convenient way to discuss an arbitrary collection of open sets.
9.3 Unions and Intersections 279
PROOF: Assume that for each $i$ in the index set $I$, $A_i$ is an open set. Then
$\bigcup_{i \in I} A_i$ is an open set.
Now consider proving the result that the intersection of a finite collection of
open sets is itself an open set. This time there is no need to consider an arbitrarily
large collection of open sets; you can just use the finite collection of open sets
$A_1, A_2, A_3, \ldots, A_k$. Again you would take an arbitrary $x \in A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_k$.
You know from the definition of intersection that for each $j = 1, 2, 3, \ldots, k$, this
element $x$ must be in $A_j$. And you know that since $A_j$ is an open set, there must be
an $\varepsilon_j > 0$ such that the interval $(x - \varepsilon_j, x + \varepsilon_j) \subseteq A_j$. Now you have a collection
of $k$ open intervals each centered at $x$. By selecting $\varepsilon = \min(\varepsilon_1, \varepsilon_2, \varepsilon_3, \ldots, \varepsilon_k)$, you
will have the least of these $\varepsilon_j$ values which is a positive number. This is crucial. The
fact that you have a finite collection of open sets ensures that you can find a finite
number of open intervals centered at $x$ and can find the shortest of these intervals.
If the collection of open sets were infinite, there would be no guarantee that there
would be a minimum $\varepsilon_j$. The fact that there is a minimum value that is greater than
0 allows you to claim that the interval $(x - \varepsilon, x + \varepsilon)$ is contained in each of the $A_j$
sets, and thus, $(x - \varepsilon, x + \varepsilon)$ is contained in the intersection of the $A_j$’s.
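The role of finiteness can be seen in a small computation. The sketch below uses a hypothetical family chosen for illustration, $A_j = (-1/j, 1/j)$ with $x = 0$, for which $\varepsilon_j = 1/j$ works; the function name radius_for_intersection is likewise an assumption made here.

```python
from fractions import Fraction

def radius_for_intersection(radii):
    """Given radii eps_j > 0 with (x - eps_j, x + eps_j) inside A_j,
    return one radius that works for every A_j at once."""
    return min(radii)

# Hypothetical example: A_j = (-1/j, 1/j) and x = 0, so eps_j = 1/j works.
for k in (1, 10, 1000):
    radii = [Fraction(1, j) for j in range(1, k + 1)]
    eps = radius_for_intersection(radii)
    print(k, eps)   # positive for each finite k, but shrinking as k grows
```

For every finite $k$ the minimum is positive, so the argument goes through. If all of the sets $A_j$ are used at once, the radii have no positive lower bound, and indeed the intersection of the whole family is $\{0\}$, which is not an open set.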
There are analogous results about the union and intersections of closed sets. In
particular, the intersection of an arbitrary collection of closed sets is itself a closed
set, and the union of a finite number of closed sets is itself a closed set. One can
prove these results by relying on the definition of a closed set, but it is much easier
to use the results from the previous section that show that a set is a closed set if and
only if its complement is an open set. For example, to show that the union of
a finite number of closed sets is closed, let $A_1, A_2, A_3, \ldots, A_k$ be a finite collection
of closed sets. Then for each $j$, $A_j^c$ is the complement of a closed set, so it is open.
By the previous theorem, the intersection of a finite number of open sets is an open
set, so $A_1^c \cap A_2^c \cap A_3^c \cap \cdots \cap A_k^c$ is an open set. But DeMorgan’s Law says that
$A_1^c \cap A_2^c \cap A_3^c \cap \cdots \cap A_k^c = (A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_k)^c$ which is an open set, so its
complement, $A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_k$ is a closed set as desired.
Although not needed in this textbook about writing proofs in Analysis, for
completeness, it makes sense at this point to introduce the definition of a topological
space to be a set $S$ together with a collection $\mathcal{T}$ of subsets of $S$ satisfying the
conditions
• Both $\emptyset$ and $S$ are in $\mathcal{T}$.
• The union of any collection of sets in $\mathcal{T}$ is also a set in $\mathcal{T}$.
• The intersection of any finite collection of sets in $\mathcal{T}$ is also a set in $\mathcal{T}$.
If these conditions are satisfied, then the set $\mathcal{T}$ is said to be a topology for the
topological space $S$. From the definitions and theorems presented so far in this
chapter it follows that the real numbers $\mathbb{R}$ along with its collection of open sets
forms a topological space. The advantage of introducing the more general concept
of a topological space is that many theorems about the real numbers extend to
all topological spaces, so once you justify the fact that you are dealing with a
topological space, you then know many theorems about your new space.
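For a finite set, the three conditions can be checked by brute force. The following sketch is an illustration under that finiteness assumption only; the function name is_topology and the two sample collections are hypothetical and are not taken from the text.

```python
from itertools import combinations

def is_topology(points, collection):
    """Check the three conditions for a collection of subsets of a finite set.

    For a finite collection, closure under pairwise unions and intersections
    implies closure under all unions and all finite intersections.
    """
    space = frozenset(points)
    opens = {frozenset(s) for s in collection}
    if frozenset() not in opens or space not in opens:
        return False                      # the empty set and the whole set are required
    if any(not s <= space for s in opens):
        return False                      # every member must be a subset of the space
    return all((a | b) in opens and (a & b) in opens
               for a, b in combinations(opens, 2))

X = {1, 2, 3}
good = [set(), {1}, {1, 2}, {1, 2, 3}]
bad = [set(), {1}, {2}, {1, 2, 3}]        # the union {1} | {2} = {1, 2} is missing

print(is_topology(X, good))
print(is_topology(X, bad))
```

The second collection fails only because one union is absent, which shows that the conditions constrain the collection as a whole and not just its individual members.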
As an example of another topological space consider the set of integers, $\mathbb{Z}$, along
with the collection $\mathcal{T}$ of subsets of $\mathbb{Z}$ consisting of the empty set, $\emptyset$, and the sets
$A \subseteq \mathbb{Z}$ with the property that $A^c = \mathbb{Z} \setminus A$ is a finite set. It is easy to see that both
$\emptyset$ and $\mathbb{Z}$ are elements of $\mathcal{T}$. To show that $\mathcal{T}$ is closed under unions, suppose you
have a collection of sets in $\mathcal{T}$. There are two cases to consider: (1) all the sets in the
collection are the empty set, and (2) at least one of the sets in the collection is not
empty. In the first case, the union of all the sets in the collection is the empty set
which is in $\mathcal{T}$. In the second case, if the collection includes a nonempty set $A$, then the union
of the sets in the collection contains $A$, and because the complement of the union
lies inside the complement of $A$ which is finite, the union will have to have a finite
complement and be a set in $\mathcal{T}$. To show that $\mathcal{T}$ is closed under finite intersections,
suppose you have a finite collection of sets in $\mathcal{T}$. Again, there are two cases to
consider: (1) at least one set in the collection is the empty set, and (2) none of
the sets in the collection is the empty set. In the first case, the intersection of the
collection of sets is the empty set which is in $\mathcal{T}$. In the second case, the complement
of the intersection of the finite collection of sets is the union of the complements of
the sets. If all the complements are finite, then the union of the finite number of
complements is also finite, so the intersection is in $\mathcal{T}$. This verifies that $\mathcal{T}$ is a
topology for $\mathbb{Z}$. This is known as the finite complement topology for $\mathbb{Z}$. It is clearly
not the usual topology associated with the integers which is just the usual topology
of $\mathbb{R}$ restricted to $\mathbb{Z}$. Generally, a set can have many different topologies, each giving
rise to a different topological space. Most of these topologies are uninteresting and
have few if any applications.
9.3.1 Exercises
Sometimes rather than focusing your attention on the entire real line, you are
interested in the open sets within a particular subset of the real numbers. For
example, if the real-valued function $f$ has domain $A = [-4, 4]$, you might be
interested in the open sets contained in $A$. Moreover, you might want to consider
some new sets to be open which were not considered to be open sets in $\mathbb{R}$. For
example, within $A$ the interval $[-4, 0)$ should be considered open in the topological
space consisting just of the set $A$. This is because, within $A$, each point of $[-4, 0)$
is an interior point. The only controversial point here is $-4$, but it makes sense to
claim that $-4$ is in the interior of $[-4, 0)$ if your entire universe of interest is $A$. Certainly,
all the points of $A$ that are within a distance of $\frac{1}{2}$ of $-4$ are elements of $[-4, 0)$.
Generalizing this idea leads to the definition of the inherited topology in the set
$A \subseteq \mathbb{R}$. In the inherited topology, a set $B \subseteq A$ is said to be open in $A$ if $B$ is the
intersection of $A$ with some set that is open in $\mathbb{R}$. For example, if $A = [-4, 4]$
as above, then the set $[-4, 0)$ is open in $A$ because $[-4, 0) = (-5, 0) \cap A$,
and $(-5, 0)$ is an open set in $\mathbb{R}$. With this same reasoning, within $A$ the set
$[-4, -3] \cup [-2, 0) \cup \{1, 2\} \cup [3, 4)$ has interior $[-4, -3) \cup (-2, 0) \cup (3, 4)$ and
boundary $\{-3, -2, 0, 1, 2, 3, 4\}$.
Similarly, a set $B \subseteq A$ is said to be closed in $A$ if $B$ is the intersection of $A$ with
some set that is closed in $\mathbb{R}$. Note that all of the properties proved earlier in this
chapter pertaining to open or closed sets in $\mathbb{R}$ hold equally well for sets open or
closed in $A$. In particular, the union of any collection of sets open in $A$ is itself a set
that is open in $A$.
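The role of the "universe" in deciding whether $-4$ is an interior point can be illustrated numerically. The sketch below is only a sampling check under stated assumptions (the helper names and the sample count are hypothetical choices); sampling gives evidence, not a proof.

```python
def in_A(x: float) -> bool:
    """Membership in A = [-4, 4]."""
    return -4 <= x <= 4


def in_B(x: float) -> bool:
    """Membership in B = [-4, 0)."""
    return -4 <= x < 0


def ball_stays_in_B(a: float, delta: float, universe, samples: int = 1000) -> bool:
    """Sample the open interval (a - delta, a + delta) and report whether every
    sampled point that lies in the universe also lies in B."""
    for k in range(1, samples):
        x = a - delta + 2 * delta * k / samples
        if universe(x) and not in_B(x):
            return False
    return True


print(ball_stays_in_B(-4, 0.5, in_A))             # True: the universe is A
print(ball_stays_in_B(-4, 0.5, lambda x: True))   # False: the universe is all of R
```

With $A$ as the universe, the points to the left of $-4$ are simply not considered, which is why $[-4, 0)$ can be open in $A$ without being open in $\mathbb{R}$.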
The motivation for developing the properties of open and closed sets and for
defining topological spaces is that one can now generalize the idea of a continuous
function. One defines $\mathcal{P}(X)$, the power set of a set $X$, to be the collection of all
subsets of the set $X$. For example, if $X$ is a finite set with $n$ elements, then $\mathcal{P}(X)$
contains the $2^n$ subsets of $X$. If $f: A \to B$ is a function which maps elements of the
set $A$ to elements of the set $B$, the function $f$ can be extended to $f: \mathcal{P}(A) \to \mathcal{P}(B)$
which maps subsets of the set $A$ to subsets of the set $B$. If $C \subseteq A$, then define $f(C)$
to be the set $\{y \in B \mid y = f(a) \text{ for some } a \in C\}$. Then $f(C)$ is called the image of
$C$ under $f$. The notation $f(C)$ could be confusing because $f$ was originally defined
for elements of $A$, not subsets of $A$. The application of $f$ to subsets of $A$ is really
defining a new function $f: \mathcal{P}(A) \to \mathcal{P}(B)$ whose domain is the power set of $A$ and
codomain is the power set of $B$. The confusion arises because the same name, $f$, is
given to both functions. The confusion is cleared up by recognizing the distinction
that if the argument of $f$ is an element $a \in A$, then $f(a)$ refers to an element of the
codomain, $B$, while if the argument of $f$ is a subset $C \subseteq A$, then $f(C)$ is a subset of
$B$, $f(C) \subseteq B$.
For example, the function $f(x) = x^2$ is defined to be a function with domain $\mathbb{R}$
and codomain $\mathbb{R}$. It is then easily understood that $f(3) = 9$ and $f(2) = 4$. But
taking $C$ to be the interval $(-3, 2)$, the expression $f(C)$ now refers to the function
$f: \mathcal{P}(\mathbb{R}) \to \mathcal{P}(\mathbb{R})$, and $f(C)$ is the set of all elements of $\mathbb{R}$ that are images under $f$
of elements of $C$. That is, $f(C) = [0, 9)$.
If the function $f: A \to B$ is not a bijection mapping $A$ one-to-one and onto $B$,
then it is not possible to define the inverse function $f^{-1}: B \to A$. One problem is
that if $f$ is not surjective (mapping $A$ onto $B$), there might be an element $b \in B$ for
which there is no corresponding element $a$ satisfying $f(a) = b$, so $f^{-1}(b)$ cannot be
defined. Another problem is that if $f$ is not injective (mapping $A$ one-to-one to $B$),
then there will be an element $b \in B$ such that $f(x) = b$ is satisfied by more than one
value of $x$, so $f^{-1}(b)$ would not be unique. On the other hand, if $D \subseteq B$, it is always
possible to define the function $f^{-1}: \mathcal{P}(B) \to \mathcal{P}(A)$ mapping the power set of $B$ to
the power set of $A$. Indeed, one defines $f^{-1}(D) = \{x \in A \mid f(x) \in D\}$. In this case
$f^{-1}(D)$ is called the preimage of $D$ under $f$. For example, returning to $f(x) = x^2$,
it follows that $f^{-1}([4, 9)) = (-3, -2] \cup [2, 3)$ and $f^{-1}((-1, 16)) = (-4, 4)$.
Note that when the continuous function $f(x) = x^2$ was applied to an open set as in
$f((-3, 2)) = [0, 9)$, the image did not end up being an open set. But when $f^{-1}$ was
applied to an open set as in $f^{-1}((-1, 16)) = (-4, 4)$, the preimage was also an open
set. This is an important distinction. A continuous function need not map open
sets to open sets; functions that do map all open sets to open sets are called open
functions. But all continuous functions have the property that their inverses $f^{-1}$,
acting on sets, always map open sets to open sets. Conversely, a function whose inverse
always maps open sets to open sets must be a continuous function. Of course, these
statements require proof, but the proofs follow directly from the definition of continuity
and the definition of open set.
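For finite sets the two set-level maps can be written out directly. The following is a minimal sketch, assuming a finite stand-in for the domain (the names `image`, `preimage`, and `square` are illustrative and not from the text); it mirrors the definitions of $f(C)$ and $f^{-1}(D)$ given above.

```python
def image(f, C):
    """f(C) = {f(a) : a in C} for a finite set C."""
    return {f(a) for a in C}


def preimage(f, D, A):
    """f^{-1}(D) = {x in A : f(x) in D} for a finite domain A."""
    return {x for x in A if f(x) in D}


A = set(range(-5, 6))        # finite stand-in for the domain: -5, ..., 5


def square(x):
    return x * x


print(sorted(image(square, {-2, -1, 0, 1})))   # [0, 1, 4]
print(sorted(preimage(square, {4, 9}, A)))     # [-3, -2, 2, 3]
print(sorted(preimage(square, {2, 3}, A)))     # []
```

The last call shows that a preimage is defined even when no element maps into $D$, which is why $f^{-1}$ on sets exists whether or not $f$ is a bijection.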
Assume, for example, that $f: A \to B$ is a continuous function and $D \subseteq B$ is an
open set in $B$. You are challenged to show that $f^{-1}(D)$ is an open set in $A$. To show
that $f^{-1}(D)$ is open, you would need to show for every $a \in f^{-1}(D)$ there is a $\delta > 0$
such that $(a - \delta, a + \delta) \cap A \subseteq f^{-1}(D)$. From the definition of $f^{-1}(D)$, you know
that if $a \in f^{-1}(D)$, then $f(a) \in D$. Because $D$ is open, there is an $\epsilon > 0$ such that
$(f(a) - \epsilon, f(a) + \epsilon) \cap B \subseteq D$. This means that if $y \in B$ with $|y - f(a)| < \epsilon$, then
$y \in D$. But now, by the definition of continuity, there is a $\delta > 0$ such that if $x \in A$
with $|x - a| < \delta$, then $|f(x) - f(a)| < \epsilon$, implying that $f(x)$ is in $(f(a) - \epsilon, f(a) + \epsilon) \cap B$,
and thus, $f(x)$ is in $D$. This shows that $x \in f^{-1}(D)$, proving that
$(a - \delta, a + \delta) \cap A \subseteq f^{-1}(D)$, so $f^{-1}(D)$ is open.
Conversely, suppose that $f$ has the property that $f^{-1}(D)$ is an open set in $A$
whenever $D$ is an open set in $B$. Then let $a \in A$. This time you are challenged
to show that for every $\epsilon > 0$, there is a $\delta > 0$ such that if $x \in A$ with $|x - a| < \delta$,
then $|f(x) - f(a)| < \epsilon$. But the set $D$ of all $y \in B$ satisfying $|y - f(a)| < \epsilon$ is an
open set in $B$, implying that $f^{-1}(D)$ is an open set in $A$ containing the point $a$. This
means that there is a $\delta > 0$ such that $(a - \delta, a + \delta) \cap A$ is contained in $f^{-1}(D)$.
In other words, if $x \in A$ with $|x - a| < \delta$, then $x$ is in $f^{-1}(D)$, so $f(x)$ is in $D$ and
$|f(x) - f(a)| < \epsilon$, completing the proof that $f$ is continuous (Fig. 9.6).
[Fig. 9.6: diagram of $f$ carrying $a \in C = f^{-1}(D) \subseteq A$ to $b = f(a) \in D \subseteq B$]
9.4.1 Exercises
9.5 Closure
short proof of this fact that relies only on the definitions of accumulation point,
closed set, and boundary. Such a proof would start with the assumption that $a$ is
an accumulation point of the closed set $S$. One way to continue is to construct a
proof by contradiction, that is, to assume that $a \notin S$ and hope that this will lead to a
contradiction. Interestingly, you can proceed in more than one way. You could use
the fact that $S$ is a closed set which implies that, since $a \notin S$, then $a \in \operatorname{ext}(S)$.
This means that there is an $\epsilon > 0$ such that the open interval $(a - \epsilon, a + \epsilon)$
is contained in $S^c$. But the definition of accumulation point says that every open
interval containing $a$ also contains points of $S$, so this contradicts the fact that
$a$ is an accumulation point of $S$. Alternatively, you could use the fact that $a$ is
an accumulation point of $S$. This means that for every $\epsilon > 0$, the open interval
$(a - \epsilon, a + \epsilon)$ contains points in $S$. All of these open intervals also contain $a \notin S$,
implying that each of these open intervals contains points in $S$ and points in $S^c$. Thus,
$a$ satisfies the definition of being an element of $\partial S$. From the definition of closed set,
$\partial S \subseteq S$. Thus, $a \in S$, contradicting the assumption that $a \notin S$.
PROOF: If $S$ is a closed set, then $S$ contains all of its accumulation points.
• Let $S$ be a closed set, and let $a$ be an accumulation point of $S$.
• Assume that $a \notin S$.
• From the definition of accumulation point, for every $\epsilon > 0$ it follows that
the open interval $(a - \epsilon, a + \epsilon)$ contains elements in $S$.
• Because $a \notin S$, it follows that for every $\epsilon > 0$ the open interval $(a - \epsilon, a + \epsilon)$
contains elements of $S$ and elements of $S^c$, so $a \in \partial S$.
• From the definition of closed set, $\partial S \subseteq S$, so $a \in S$ which contradicts the
assumption that $a \notin S$.
• Thus, every accumulation point of $S$ must be contained in $S$.
The collection of all the accumulation points of $S$ is called the derived set of $S$
which is written $S'$. The previous theorem shows that if $S$ is closed, then $S' \subseteq S$.
The converse is also true, that is, if $S' \subseteq S$, then $S$ must be closed. This follows from
the fact that if $a$ is in the boundary of $S$ but $a$ is not an element of $S$, then $a$ must be
an accumulation point of $S$. This should make sense to you. A boundary point is a
point close both to $S$ and to $S^c$. An accumulation point is close to $S$, and if it is not
in $S$, it is close to $S^c$.
PROOF: If set $S$ contains all of its accumulation points, then $S$ is a closed
set.
• Let $S$ be a set that contains all of its accumulation points.
• Assume that $a \in \partial S \setminus S$.
• Because $a \in \partial S$, for every $\epsilon > 0$, the open interval $(a - \epsilon, a + \epsilon)$ contains
elements of $S$ and elements of $S^c$.
• Thus, because $a$ itself is not a member of $S$, $(a - \epsilon, a + \epsilon)$ contains an element
of $S$ not equal to $a$.
• It follows that $a \in S' \subseteq S$ which contradicts the assumption that $a \notin S$.
• Therefore, $\partial S \subseteq S$ which proves that $S$ is a closed set.
You can conclude from this result that for any set $S$, if $a \in \partial S \cap S^c$, it is an
accumulation point of $S$, and, by symmetry, if $a \in \partial S \cap S$, then it is an accumulation
point of $S^c$. The set $S$ is closed if it contains its boundary, $\partial S$. But for any set $S$, the
elements of $\partial S$ that are not in $S$ are accumulation points of $S$, so $S$ is closed if and
only if it contains all of its accumulation points. It is important to recognize, though,
that the derived set $S'$ need not be contained in $\partial S$ since points in the interior of $S$ are
accumulation points of $S$, and $\partial S$ need not be contained in $S'$ since isolated points
of $S$ are in the boundary of $S$ without being accumulation points of $S$. On the other
hand, $S \cup \partial S = S \cup S'$.
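The identity at the end of this paragraph can be spelled out as two inclusions. The following short derivation is a sketch that uses only the two facts established in the proofs above: a boundary point not in $S$ is an accumulation point of $S$, and an accumulation point not in $S$ is a boundary point of $S$.

```latex
\begin{align*}
S \cup \partial S &= S \cup (\partial S \setminus S) \subseteq S \cup S'
  && \text{because } \partial S \setminus S \subseteq S', \\
S \cup S' &= S \cup (S' \setminus S) \subseteq S \cup \partial S
  && \text{because } S' \setminus S \subseteq \partial S.
\end{align*}
```

The two inclusions together give $S \cup \partial S = S \cup S'$.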
For any set $S$, define the closure of $S$ or $\operatorname{cl}(S)$ to be $S \cup S' = S \cup \partial S$. Some books
use the notation $\overline{S}$ or $S^{-}$ for the closure of $S$. Intuitively, the closure of a set $S$ takes
the elements of the boundary of $S$ and adds them to the set so that you now have $S$
along with its boundary (Fig. 9.7). The closure also has the following properties.
• For any set $S$, the closure $\operatorname{cl}(S)$ is a closed set.
• The set $S$ is closed if and only if $S = \operatorname{cl}(S)$.
• $\operatorname{cl}(S)$ is the intersection of every closed set that contains $S$.
• $\operatorname{cl}(S)$ is the smallest closed set that contains $S$.
All of these results have short proofs. For example, to get the first result, recall that
if $x$ is in the boundary of the union of two sets, $S \cup T$, then $x$ is either in the boundary
of $S$ or the boundary of $T$. Thus, if $x \in \partial \operatorname{cl}(S)$, it means that $x \in \partial(S \cup \partial S)$ and,
therefore, $x \in \partial S$ or $x \in \partial(\partial S)$. It was shown in the first section of this chapter that
$\partial(\partial S) \subseteq \partial S$, implying that $x \in \partial S$ and proving that $x$ is in $\operatorname{cl}(S)$. Thus, $\operatorname{cl}(S)$ contains its
boundary, so it is closed.
For the second result, note that if $S$ is closed, it contains its boundary so
$\operatorname{cl}(S) = S \cup \partial S = S$. Conversely, if $S = \operatorname{cl}(S)$, then $S$ is closed because $\operatorname{cl}(S)$ is always a
closed set.
The third and fourth results follow quickly after noticing that any closed set
containing $S$ must also contain the boundary of $S$.
Fig. 9.7 The closure of a set
9.5.1 Exercises
9.6 Compactness
The topics of open cover, finite subcover, compactness, and the Heine–Borel
Theorem were introduced in Chap. 4 because of their usefulness in proving that a
function continuous on a closed bounded interval is uniformly continuous on that
interval. Compactness also played an important role in showing that a continuous
function on a closed bounded interval is bounded, a continuous function on a
closed bounded interval attains its extreme values (maximum and minimum), and a
continuous function on a closed bounded interval has a Riemann integral. Recall
that an open cover of a set $S$ was defined to be a collection of open intervals $T$
where for each $x \in S$ there is an open interval $(p, q) \in T$ such that $x \in (p, q)$.
After the introduction of the topological ideas in this chapter, that definition can be
generalized to allow $T$ to be a collection of open sets rather than just open intervals,
that is, a collection of open sets, $T$, is called an open cover of $S$ if for each $x \in S$
there is an open set $U \in T$ such that $x \in U$. Moreover, the Heine–Borel Theorem
can now be extended in two ways: the concept of an open cover by intervals can
be generalized to an open cover by open sets, and the concept of closed bounded
interval can be generalized to closed bounded set.
So, this shows that all closed bounded sets of real numbers are compact. The
converse is also true, that is, all compact subsets of real numbers are both closed
and bounded. These two results together, then, completely characterize the compact
sets of real numbers.
PROOF: A subset of $\mathbb{R}$ is compact if and only if it is closed and bounded.
• The Heine–Borel Theorem shows that closed bounded sets of real numbers
are compact.
• Conversely, assume that $S$ is a compact subset of $\mathbb{R}$.
• The collection of open intervals $(-j, j)$ where $j$ ranges over the natural
numbers is a collection of open sets that covers all of $\mathbb{R}$, so it certainly
covers $S$.
• Because $S$ is compact, $S$ can be covered by a finite collection of the $(-j, j)$
intervals.
• It follows that there exists a natural number $k$ such that $S \subseteq (-k, k)$, and $S$
is a bounded set.
• Suppose that there is a real number $x$ in the boundary of $S$ that is not an
element of $S$.
• For each $\epsilon > 0$, let $U_\epsilon = (-\infty, x - \epsilon) \cup (x + \epsilon, \infty)$ which is an open set.
• The collection of all such $U_\epsilon$ covers all of $\mathbb{R} \setminus \{x\}$, and since $x$ is not an
element of $S$, the collection is an open cover of $S$.
• Because $S$ is compact, it is covered by a finite collection of the $U_\epsilon$ sets.
• It follows that there is a $\delta > 0$ such that $S \subseteq U_\delta$.
• Then the open interval $(x - \delta, x + \delta)$ contains no elements of $S$, which
contradicts the assumption that $x$ is in the boundary of $S$.
• Thus, $S$ contains its boundary, so $S$ is a closed set.
Continuous functions need not map bounded sets onto bounded sets as is seen by
$f(x) = \frac{1}{x}$ which maps the bounded interval $(0, 1)$ continuously onto the interval
$(1, \infty)$ which is not bounded. Continuous functions need not map closed sets
onto closed sets as seen by $f(x) = -\frac{1}{x}$ which maps the closed interval $[1, \infty)$
onto $[-1, 0)$ which is not closed. But continuous functions always map compact
sets onto compact sets. This is a result that is true in any topological space, so
its proof need not use any more than the properties of open sets, compact sets,
and continuous functions. To write the proof you would start by assuming that the
function $f: A \to B$ is continuous on $A$, and that $C \subseteq A$ is a compact set. You must
then show that the image of $C$, $f(C) \subseteq B$, is compact. How would you show this
set is compact? The definition of compact set suggests that you would take an open
cover of the set and proceed to show that that cover has a finite subcover. So let $I$ be
an index set and assume that $\{U_i \mid i \in I\}$ is an open cover of $f(C)$. Somehow you
must show that this cover has a finite subcover. All you know is that $f$ is a continuous
function and that the set $C$ is compact. Since $C$ is compact, you know that open
covers of $C$ have finite subcovers, but you have an open cover of $f(C)$, not an open
cover of $C$. You need to use the fact that $f$ is a continuous function which means that
for each $i \in I$, the preimage of the open set $U_i$, $f^{-1}(U_i)$, is an open set in $A$. Does
the collection of $f^{-1}(U_i)$ sets form a cover of $C$? Follow what happens: if $x \in C$,
then $f(x) \in f(C)$. Thus, there is at least one $i \in I$ such that $f(x) \in U_i$. Therefore,
$x \in f^{-1}(U_i)$. So, indeed, the collection of $f^{-1}(U_i)$ sets forms an open cover of $C$.
Hence, there is a finite subcover of $C$ given (by renaming subscripts) as
$f^{-1}(U_1), f^{-1}(U_2), f^{-1}(U_3), \ldots, f^{-1}(U_k)$, for some natural number $k$. For each $x \in C$, there is
a $j$ between 1 and $k$ such that $x \in f^{-1}(U_j)$, so $f(x) \in f(f^{-1}(U_j)) \subseteq U_j$. Because each
element of $f(C)$ is the image of at least one $x \in C$, and each $x \in C$ is an element
of at least one of the finite number of $f^{-1}(U_j)$, it follows that the finite collection of
open sets, $U_1, U_2, U_3, \ldots, U_k$, covers $f(C)$, proving that $f(C)$ is compact.
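PROOF: If $f: A \to B$ is continuous on $A$, and if $C \subseteq A$ is a compact set,
then $f(C)$ is a compact set in $B$.
• Assume that $f: A \to B$ is continuous on $A$, and $C \subseteq A$ is a compact set.
• Let $I$ be an index set, and $\{U_i \mid i \in I\}$ be a collection of open sets that cover
$f(C)$.
• For each $x \in C$ there is an $i \in I$ such that $f(x) \in U_i$.
• Since $f$ is continuous, and, for each $i \in I$, $U_i$ is an open set in $B$, $f^{-1}(U_i)$ is
an open set in $A$.
• Thus, $\{f^{-1}(U_i) \mid i \in I\}$ is an open cover of $C$.
• Because $C$ is compact, this cover has a finite subcover which, by renaming
subscripts, can be written $f^{-1}(U_1), f^{-1}(U_2), \ldots, f^{-1}(U_k)$.
• For each $x \in C$ there is a $j$ between 1 and $k$ with $x \in f^{-1}(U_j)$, so $f(x) \in U_j$.
• Thus, $U_1, U_2, \ldots, U_k$ is a finite subcover of $f(C)$, so $f(C)$ is compact.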
Notice that it is an immediate consequence of this theorem that a real-valued
continuous function on a closed bounded interval on the real line is bounded and attains
its maximum and minimum values. This is because every closed bounded interval on
the real line is a compact set, so its image under a continuous function is compact
which means the image is closed and bounded. The image being bounded is just
another way of saying that the function is bounded. The image being closed shows
that the image contains its boundary, which includes the supremum and infimum of the
image, so these are the maximum and minimum values of the function.
The Heine–Borel Theorem can be extended to $n$-dimensional Euclidean
space $\mathbb{R}^n$. That is, the compact sets in $\mathbb{R}^n$ are the sets that are both closed and
bounded. One can use mathematical induction to show that a rectangular box that
is a cross product of $n$ closed intervals is compact, and then, that can be extended to
any closed bounded set.
9.6.1 Exercises
1. Find an example of a function $f$ and a set $C$ such that $f^{-1}(f(C))$ is not equal to $C$.
2. Find an example of a continuous function $f$ and a set $D$ such that $f(f^{-1}(D))$ is
not equal to $D$.
3. Find an example of a continuous function $f: A \to B$ and a compact set $D \subseteq B$
such that $f^{-1}(D)$ is not compact.
4. Suppose that the continuous function $f$ has domain $[0, 10]$ and codomain $(-4, 4)$.
Show that the function is not surjective.
9.7 Connectedness
The intervals on the real line were discussed in Chap. 2. A set of real numbers is an
interval if whenever $x$ and $y$ are elements of the interval, then all the real numbers
between $x$ and $y$ are also elements of the interval. The intervals are the connected
sets on the real line, but the concept of connectedness can be extended to any
topological space. In a general topological space, two nonempty sets $A$ and $B$ are
disconnected if there are disjoint open sets $U$ and $V$ with $A \subseteq U$ and $B \subseteq V$. For
example, the sets $[0, 1]$ and $(4, 5)$ are disconnected because $[0, 1] \subseteq (-1, 2)$ and
$(4, 5) \subseteq (4, 5)$ where $(-1, 2)$ and $(4, 5)$ are disjoint open sets (Fig. 9.8). The sets
$[0, 3]$ and $(3, 5)$ are disjoint nonempty sets, but they are not disconnected because
any open set that contains $[0, 3]$ will necessarily share points with any open set
containing $(3, 5)$. In particular, an open set containing $[0, 3]$ contains the element 3 and
therefore an open interval around 3, and that interval includes points of $(3, 5)$. A set
is called connected if it is not the union of two disconnected nonempty sets. Even
though the connected sets of real numbers are just the intervals, the concept of
connectedness gets far more interesting in more general topological spaces.
[Fig. 9.8: number line from 0 to 5 showing the disconnected sets $[0, 1]$ and $(4, 5)$]
If $f: A \to B$ is continuous, then it always maps connected sets to connected
sets, that is, if $C \subseteq A$ is a connected set, then so is $f(C)$. This is easy to see since,
if $f(C)$ is disconnected, then there are two disjoint open sets $U$ and $V$ in $B$ and two
nonempty sets $S$ and $T$ in $B$ such that $f(C) = S \cup T$ and $S \subseteq U$ and $T \subseteq V$. But then
$C \subseteq f^{-1}(U) \cup f^{-1}(V)$ where $f^{-1}(U)$ and $f^{-1}(V)$ are disjoint open sets in $A$. Because
$S$ and $T$ are nonempty, $C \cap f^{-1}(U)$ and $C \cap f^{-1}(V)$ are nonempty, implying that $C$
is a disconnected set. Thus, if $C$ is connected, $f(C)$ must also be connected.
When this theorem is applied to functions from the real numbers to the real
numbers, the result is the Intermediate Value Theorem which states that if $f$ is a
continuous real-valued function on the interval $[a, b]$, then for every $c$ between $f(a)$ and $f(b)$
there is an $x \in [a, b]$ such that $f(x) = c$. This is because $f$ must map the connected
set $[a, b]$ into a connected set which must include all the elements $c$ between $f(a)$
and $f(b)$. Note that $f^{-1}$ need not bring connected sets to connected sets. For example,
if $f(x) = x^2$, then $f^{-1}((1, 4)) = (-2, -1) \cup (1, 2)$, which is not an interval.
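The Intermediate Value Theorem only asserts that a suitable $x$ exists. As a hedged illustration, the bisection sketch below approximates such an $x$ numerically; the function name, the tolerance, and the sample function are hypothetical choices, and the code assumes $f$ is continuous on $[a, b]$ with $a < b$ and $c$ between $f(a)$ and $f(b)$.

```python
def solve_by_bisection(f, a, b, c, tol=1e-10):
    """Approximate an x in [a, b] with f(x) = c, assuming f is continuous on
    [a, b] and c lies between f(a) and f(b)."""
    lo, hi = a, b
    g_lo, g_hi = f(lo) - c, f(hi) - c
    if g_lo == 0:
        return lo
    if g_hi == 0:
        return hi
    if g_lo * g_hi > 0:
        raise ValueError("c is not between f(a) and f(b)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = f(mid) - c
        if g_mid == 0:
            return mid
        # Keep the half of the interval on which f - c changes sign.
        if g_lo * g_mid < 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return (lo + hi) / 2


def cubic(t):
    return t ** 3 - t


x = solve_by_bisection(cubic, 0.0, 2.0, 3.0)   # cubic(0) = 0 and cubic(2) = 6
print(x, cubic(x))
```

Each halving step keeps a connected subinterval whose image still contains $c$, which is the numerical counterpart of the connectedness argument.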
In $n$-dimensional Euclidean space the concept of connectedness gets considerably
richer as the connected sets are not merely the cross products of intervals
(Fig. 9.9). In $\mathbb{R}^2$ one introduces what it means for a set to be path-connected which
makes precise the intuitive notion that a set is connected if you can draw a path
between any two of its points where the path stays inside the set. On the real line,
this just means that for any two points in the set, the interval between the two points
stays in the set. But in $\mathbb{R}^2$ where paths need not be straight lines, the examples are
far more varied. In fact, in $\mathbb{R}^2$ there are examples of connected sets that are not
path-connected, a phenomenon that cannot occur on the real line. A famous example is
the set consisting of the graph of the equation $y = \sin\frac{1}{x}$ along with the $y$-axis.
This is a connected set because any open set that contains the $y$-axis must intersect
parts of the graph of $y = \sin\frac{1}{x}$ both to the left and to the right of the $y$-axis. On
the other hand, this set is not path-connected because there is no way to construct a
path that stays inside the set and connects the points $\left(-\frac{1}{\pi}, 0\right)$ and $\left(\frac{1}{\pi}, 0\right)$ (Fig. 9.10).
Fig. 9.9 The set C is a connected set. The set N is not a connected set
Fig. 9.10 Graph of $\sin\frac{1}{x}$ with the $y$-axis
9.7.1 Exercises
This book has discussed at length how one writes proofs about the limits and
continuity of functions whose domains and ranges are subsets of the real numbers,
$\mathbb{R}$. Although the real numbers form a far simpler set to study than many other
naturally arising sets in Analysis, the techniques learned while dealing with
real-valued functions of a real variable can be applied almost exactly to prove similar
theorems about functions defined on other domains with other types of ranges. It is
instructive to take note of the properties of the real numbers that play important
roles in these proofs. In particular, most of the proofs about limits and continuity
involve measuring the distance between two real numbers $x$ and $y$. This is done
by calculating the absolute value of the difference between the numbers, $|x - y|$.
This distance measure has important properties that allow the proofs about limits
and continuity to proceed. Among the useful properties of this distance measure
is that if $|x - y| < \epsilon$ for every $\epsilon > 0$, then it follows that $x = y$, and if
$|x - y| > 0$, then $x$ is surely different from $y$. Another property used repeatedly in
these proofs is the triangle inequality. For example, if $f$ and $g$ are two functions,
and $x$ and $y$ are both elements in the domains of these functions, then knowing
that $|f(x) - f(y)| < \frac{\epsilon}{2}$ and $|g(x) - g(y)| < \frac{\epsilon}{2}$ allows the proofs to conclude that
$\left|\bigl(f(x) + g(x)\bigr) - \bigl(f(y) + g(y)\bigr)\right| = \left|\bigl(f(x) - f(y)\bigr) + \bigl(g(x) - g(y)\bigr)\right| \le |f(x) - f(y)| + |g(x) - g(y)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$.
The fact that the triangle inequality holds true for this
chosen measure of distance is crucial in the argument.
The conclusion is, then, that if there were a set, $X$, and a distance measure that
assigned to each $x$ and $y$ in $X$ a real number, $d(x, y)$, that had many of the same
properties that the $|x - y|$ distance measure does in the real numbers, then it might
be possible to prove limit and continuity theorems for functions defined on $X$ by
just adopting the same proof techniques used for the theorems about functions of
real numbers. With this in mind a nonempty set $X$ together with distance function
$d$ is defined to be a metric space if, for all $x$, $y$, and $z$ in $X$, this distance function
satisfies the following properties:
• $d(x, y) \in \mathbb{R}$ with $d(x, y) \ge 0$ (the distance function is a nonnegative real number).
• $d(x, y) = 0$ if and only if $x = y$ (the distance function separates points).
• $d(x, y) = d(y, x)$ (the distance function is symmetric).
• $d(x, y) + d(y, z) \ge d(x, z)$ (the distance function satisfies the triangle inequality).
The distance function defines a metric for the metric space, and the metric
space is designated as $\langle X, d \rangle$ (Fig. 10.1). This definition is a generalization of the
distance function defined on the real numbers, $d(x, y) = |x - y|$. Clearly, for all real
numbers $x$, $y$, and $z$,
• $d(x, y) = |x - y| \ge 0$
• $0 = d(x, y) = |x - y|$ if and only if $x = y$
• $d(x, y) = |x - y| = |y - x| = d(y, x)$
• $d(x, y) + d(y, z) = |x - y| + |y - z| \ge |(x - y) + (y - z)| = |x - z| = d(x, z)$
showing that $\langle \mathbb{R}, d \rangle$ where $d(x, y) = |x - y|$ is a metric space. In general, it is a
fairly straightforward process to construct a proof that $\langle X, d \rangle$ is a metric space.
Most proofs would follow the same template: verify each of the four properties in turn,
as was just done for $\langle \mathbb{R}, d \rangle$.
[Fig. 10.1: points $y$ and $z$ with the distance $d(y, z)$ marked]
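Separately from any written proof, the four properties can be spot-checked numerically on a finite sample of points. The sketch below is an illustration under stated assumptions (the function `check_metric`, the sample, and the tolerance are hypothetical choices); a passing check is not a proof, but a failing check does exhibit a counterexample.

```python
from itertools import product


def check_metric(d, points, tol=1e-12):
    """Return the names of the metric properties that fail on the sample."""
    failures = set()
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:
            failures.add("nonnegative")
        if (d(x, y) == 0) != (x == y):
            failures.add("separates points")
        if d(x, y) != d(y, x):
            failures.add("symmetric")
    for x, y, z in product(points, repeat=3):
        if d(x, y) + d(y, z) < d(x, z) - tol:
            failures.add("triangle inequality")
    return failures


sample = [-2.5, -1.0, 0.0, 0.5, 3.0]
print(check_metric(lambda x, y: abs(x - y), sample))     # set()
print(check_metric(lambda x, y: (x - y) ** 2, sample))   # {'triangle inequality'}
```

The squared difference fails because, for example, the squared distances from $-1$ to $0$ and from $0$ to $0.5$ add to $1.25$, which is less than the squared distance $2.25$ from $-1$ to $0.5$.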
Given a metric space $\langle X, d \rangle$, an element $a \in X$, and a positive real number $r$,
define the neighborhood of $a$ with radius $r$ to be $N(a, r) = \{x \in X \mid d(a, x) < r\}$.
Sometimes, as in the definition of a limit at point $a$, one needs to exclude the point $a$
from the neighborhood of $a$. In this case, one can define the deleted neighborhood
of $a$ with radius $r$ to be $N^{\circ}(a, r) = \{x \in X \mid 0 < d(a, x) < r\}$. These neighborhoods
play a central role in defining limits and continuity of functions defined on $X$ and
in establishing a topology for the space $X$. It is not uncommon for there to be
several different distance functions defined on a particular set $X$ that make $X$ into a
metric space. Each new distance function results in different shaped neighborhoods.
Some give rise to the same topology of $X$ while others may result in quite different
topologies. Many examples of these different distance functions will be explored in
the sections that follow.
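The two neighborhood definitions translate directly into set-builder style code once $X$ is replaced by a finite sample. This is a minimal sketch; the grid of sample points and the function names are assumptions made for illustration.

```python
def neighborhood(d, points, a, r):
    """N(a, r) restricted to a finite sample of X."""
    return [x for x in points if d(a, x) < r]


def deleted_neighborhood(d, points, a, r):
    """The deleted neighborhood of a: N(a, r) without the center a."""
    return [x for x in points if 0 < d(a, x) < r]


grid = [k / 4 for k in range(-8, 9)]   # -2.0, -1.75, ..., 2.0


def dist(x, y):
    return abs(x - y)


print(neighborhood(dist, grid, 0.0, 0.6))           # [-0.5, -0.25, 0.0, 0.25, 0.5]
print(deleted_neighborhood(dist, grid, 0.0, 0.6))   # [-0.5, -0.25, 0.25, 0.5]
```

Passing a different distance function as `d` changes which sample points fall inside the neighborhood, which is the sense in which each metric produces its own neighborhood shapes.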
10.2 Inequalities
The student familiar with vectors and the dot product of two vectors will find
this inequality easy to remember. If $|a| = \sqrt{a_1^2 + a_2^2 + a_3^2 + \cdots + a_n^2}$ refers to
the magnitude of vector $a$ and the dot product
$a \cdot b = (a_1, a_2, a_3, \ldots, a_n) \cdot (b_1, b_2, b_3, \ldots, b_n) = a_1 b_1 + a_2 b_2 + a_3 b_3 + \cdots + a_n b_n$,
then $a \cdot b = |a|\,|b| \cos\theta$ where $\theta$ is the angle between the two vectors. Then the
Cauchy–Schwarz Inequality is just the statement that $|a|\,|b| \ge |a \cdot b|$ which follows
because $|\cos\theta| \le 1$.
To prove the Cauchy–Schwarz Inequality note that for given $a, b \in \mathbb{R}^n$ and every
real number $x$ the quantity $\sum_{j=1}^{n} (a_j + x b_j)^2$ is a sum of squares of real numbers, so
it must be nonnegative. By expanding the squares one gets
$\sum_{j=1}^{n} a_j^2 + 2x \sum_{j=1}^{n} a_j b_j + x^2 \sum_{j=1}^{n} b_j^2 \ge 0$.
Thus, this is a quadratic polynomial in $x$ that is nonnegative for every
real number $x$. (If every $b_j$ is 0, both sides of the inequality are 0 and there is nothing
to prove, so assume that $\sum_{j=1}^{n} b_j^2 > 0$.) Any quadratic polynomial $Ax^2 + Bx + C$ with
$A > 0$ is nonnegative for every $x$ if and only if its discriminant $B^2 - 4AC$ is not positive.
But the discriminant of the previous polynomial is
$4\left[\left(\sum_{j=1}^{n} a_j b_j\right)^2 - \left(\sum_{j=1}^{n} a_j^2\right)\left(\sum_{j=1}^{n} b_j^2\right)\right]$.
The statement that this discriminant is less than or equal to 0 is exactly the statement of the
Cauchy–Schwarz Inequality. An even stronger statement can now be made. Equality
occurs in the Cauchy–Schwarz Inequality if and only if the given discriminant is 0
so that the underlying quadratic polynomial has exactly one root, meaning that the
sum $\sum_{j=1}^{n} (a_j + x b_j)^2$ is 0 for exactly one value of $x$. This happens if and only if $a$ is
a multiple of $b$. Thus, the Cauchy–Schwarz Inequality always holds, and equality
holds exactly when one of the points $(a_1, a_2, a_3, \ldots, a_n)$ and $(b_1, b_2, b_3, \ldots, b_n)$ is a
scalar multiple of the other.
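The discriminant argument can be checked numerically. The sketch below is an illustration only; the random vectors, the seed, and the rounding tolerance are arbitrary choices. It computes $B^2 - 4AC$ for the quadratic $\sum_{j} (a_j + x b_j)^2$ and confirms that it is never positive, with the value 0 in a scalar-multiple case.

```python
import random


def discriminant(a, b):
    """B^2 - 4AC for sum((a_j + x*b_j)^2) viewed as A*x^2 + B*x + C."""
    A = sum(bj * bj for bj in b)
    B = 2 * sum(aj * bj for aj, bj in zip(a, b))
    C = sum(aj * aj for aj in a)
    return B * B - 4 * A * C


random.seed(0)
for _ in range(1000):
    u = [random.uniform(-5, 5) for _ in range(4)]
    v = [random.uniform(-5, 5) for _ in range(4)]
    assert discriminant(u, v) <= 1e-9   # Cauchy-Schwarz, up to rounding

# Equality case: a is a scalar multiple of b, so the discriminant is 0.
b = [1, -2, 3, 4]
a = [3 * bj for bj in b]
print(discriminant(a, b))   # 0
```

In the equality case the quadratic has its single root at $x = -3$, where every term $a_j + x b_j$ vanishes.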
doubling it and adding $a_1^2 + a_2^2 + a_3^2 + \cdots + a_n^2 + b_1^2 + b_2^2 + b_3^2 + \cdots + b_n^2$ to
both sides yields
$$(a_1 + b_1)^2 + (a_2 + b_2)^2 + (a_3 + b_3)^2 + \cdots + (a_n + b_n)^2 \le (a_1^2 + a_2^2 + a_3^2 + \cdots + a_n^2) + 2\sqrt{a_1^2 + a_2^2 + a_3^2 + \cdots + a_n^2}\,\sqrt{b_1^2 + b_2^2 + b_3^2 + \cdots + b_n^2} + (b_1^2 + b_2^2 + b_3^2 + \cdots + b_n^2)$$
which is a special case of the Minkowski Inequality. Because the right side equals
$(|a| + |b|)^2$, taking square roots restates the inequality as $|a + b| \le |a| + |b|$. Again,
equality occurs only when one of the points is a scalar multiple of
the other.
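As with the Cauchy–Schwarz Inequality, the restated form $|a + b| \le |a| + |b|$ lends itself to a quick numerical check. This is a sketch with arbitrary sample vectors and a small rounding tolerance, not part of the proof.

```python
import math
import random


def norm(v):
    """Euclidean magnitude |v|."""
    return math.sqrt(sum(t * t for t in v))


random.seed(1)
for _ in range(1000):
    u = [random.uniform(-5, 5) for _ in range(3)]
    v = [random.uniform(-5, 5) for _ in range(3)]
    w = [s + t for s, t in zip(u, v)]
    assert norm(w) <= norm(u) + norm(v) + 1e-12

# Equality case: b is twice a.
a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]
total = [s + t for s, t in zip(a, b)]
print(norm(total), norm(a) + norm(b))   # 9.0 9.0
```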
10.2.3 Exercises
2. Show that the Minkowski Inequality extends to infinite series. That is, if $\sum_{n=1}^{\infty} a_n^2$
and $\sum_{n=1}^{\infty} b_n^2$ are both convergent series, then
$\sqrt{\sum_{n=1}^{\infty} (a_n + b_n)^2} \le \sqrt{\sum_{n=1}^{\infty} a_n^2} + \sqrt{\sum_{n=1}^{\infty} b_n^2}$.
3. Show that for any real numbers $a_1, a_2, a_3, \ldots, a_n$ and positive real numbers
$b_1, b_2, b_3, \ldots, b_n$, the following inequality holds:
$\frac{a_1^2}{b_1} + \frac{a_2^2}{b_2} + \frac{a_3^2}{b_3} + \cdots + \frac{a_n^2}{b_n} \ge \frac{(a_1 + a_2 + a_3 + \cdots + a_n)^2}{b_1 + b_2 + b_3 + \cdots + b_n}$.
This can be shown by mathematical induction on $n$,
but can also be shown using the Cauchy–Schwarz Inequality.
For any natural number $n$ one can define $n$-dimensional Euclidean space, $\mathbb{R}^n$,
with $\mathbb{R}^1$ being the real numbers, $\mathbb{R}^2$ being the Euclidean plane, $\mathbb{R}^3$ being
3-dimensional Euclidean space, and so forth. Elements of $\mathbb{R}^n$ can be represented
as ordered $n$-tuples of real numbers, $(x_1, x_2, x_3, \ldots, x_n)$. You should be familiar
with the Euclidean distance between two points in $n$-dimensional Euclidean space,
$x = (x_1, x_2, x_3, \ldots, x_n)$ and $y = (y_1, y_2, y_3, \ldots, y_n)$, given by the generalization of
the Pythagorean Theorem as
$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2 + \cdots + (x_n - y_n)^2}.$$
Together these facts show that $\mathbb{R}^n$ with Euclidean distance is a metric space
(Fig. 10.2).
[Fig. 10.2: Euclidean distance in the plane between $(x_1, y_1)$ and $(x_2, y_2)$, $d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$, with vertical leg $|y_2 - y_1|$]
The Euclidean distance, sometimes called the Euclidean metric, may be the most
commonly seen distance function used for Euclidean space, but there are many other
distance functions which can make $\mathbb{R}^n$ into a metric space. One example is
$d(a, b) = |a_1 - b_1| + |a_2 - b_2| + |a_3 - b_3| + \cdots + |a_n - b_n|$. This is sometimes called the taxicab
metric because the distance $d(a, b)$ is the distance you would travel between the two
points $a$ and $b$ if you could only travel in directions parallel to one of the coordinate
axes as a taxicab would do on a rectangular grid of streets. Proving that this distance
function makes $\mathbb{R}^n$ into a metric space is quite easy.
Still another distance function that can be used for Euclidean space is called
the supremum metric given by $d(a, b) = \max(|a_1 - b_1|, |a_2 - b_2|, |a_3 - b_3|, \ldots, |a_n - b_n|)$.
It is instructive to compare the shapes of the neighborhoods that you get
using the Euclidean metric, the taxicab metric, and the supremum metric as shown
in Fig. 10.3. Since the Euclidean distance is the familiar distance from Euclidean
Geometry, it is easy to see that if $a \in \mathbb{R}^n$ and $r > 0$, then $N(a, r)$ is an open ball with
center $a$ and radius $r$. On the other hand, using the taxicab metric, $N(a, r)$ is a union
of $2^n$ $n$-dimensional triangular pyramids. That is, when $n = 2$, $N(a, r)$ is a diamond
made up of four isosceles right triangles, and when $n = 3$, $N(a, r)$ is a union of
8 tetrahedra, one in each octant, forming a regular octahedron. For the supremum
metric, $N(a, r)$ is an $n$-dimensional cube. Note that in the Euclidean metric, if the
coordinate axes are rotated (performing an orthogonal change of coordinates), there
is no change in the neighborhood whereas with the other two metrics, rotating the
axes changes the orientation of the neighborhoods. It turns out that all three of these
metrics give rise to the same topology on $\mathbb{R}^n$ because each metric gives the same
open sets even though the open neighborhoods are different in shape. But the three
metrics have different algebraic properties, and sometimes it is easier to prove a
particular theorem using one of these metrics rather than the others.
Fig. 10.3 $N(0, 1)$ in the Euclidean, taxicab, and supremum metrics in 2 and 3 dimensions
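The three metrics are easy to compare side by side on a concrete pair of points. The sketch below is illustrative; the function names and the sample points are chosen here. It also checks the comparison $d_{\sup} \le d_{\text{Euclid}} \le d_{\text{taxi}} \le n \, d_{\sup}$, a standard fact that is not stated in the text but is one way to see why the neighborhoods of the three metrics nest inside one another and so produce the same open sets.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def taxicab(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def supremum(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


a = (1.0, 2.0, 3.0)
b = (4.0, 6.0, 3.0)   # coordinate differences are 3, 4, and 0
print(euclidean(a, b), taxicab(a, b), supremum(a, b))   # 5.0 7.0 4.0

n = len(a)
assert supremum(a, b) <= euclidean(a, b) <= taxicab(a, b) <= n * supremum(a, b)
```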
Distance measures in metric spaces need not be complicated. For any set $X$ you
can define $d(x, x) = 0$ for all $x \in X$ and $d(x, y) = 1$ for all $x$ and $y$ in $X$ with $x \neq y$. It
is very easy to see that $d(x, y)$ is nonnegative, symmetric, and equal to 0 if and only
if $x = y$. Also, for any $x, y, z \in X$, if $d(x, z) = 1$, then $x \neq z$, so at least one of $d(x, y)$
and $d(y, z)$ must be 1, which implies the triangle inequality $d(x, y) + d(y, z) \geq d(x, z)$.
Thus, any set $X$ is a metric space with this metric, sometimes called the discrete
metric, and $\langle X, d \rangle$ is called a discrete metric space. Note that for this metric,
each neighborhood $N(a, r)$ is either all of $X$ or just the single point $\{a\}$, depending
on whether or not $r$ is greater than 1.
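For a small finite set the claims about the discrete metric can be checked exhaustively. This is a minimal sketch; the four-element set `X` and the helper `neighborhood` are made up for the illustration.

```python
from itertools import product

def discrete(x, y):
    """The discrete metric: 0 for equal points, 1 for distinct points."""
    return 0 if x == y else 1

def neighborhood(points, a, r, d=discrete):
    """N(a, r): the points whose distance from a is strictly less than r."""
    return {x for x in points if d(a, x) < r}

X = {"p", "q", "s", "t"}  # a hypothetical four-point space

# Check the triangle inequality for every ordered triple of points.
triangle_ok = all(
    discrete(x, y) + discrete(y, z) >= discrete(x, z)
    for x, y, z in product(X, repeat=3)
)
print(triangle_ok)
print(neighborhood(X, "p", 1))    # radius at most 1: only the center
print(neighborhood(X, "p", 1.5))  # radius greater than 1: the whole space
```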
Next, consider a space that looks much different than Euclidean space. Let $C[0, 1]$
be the set of all real-valued functions continuous on the interval $[0, 1]$. Certainly,
this set contains all the polynomials with real coefficients, but it also includes the
rational functions that are defined on $[0, 1]$, exponential functions, many elementary
functions, and a much larger class of functions continuous but not differentiable on
$[0, 1]$. This set is truly very large as compared, say, to the set of real numbers. There
are many ways you might try to measure the distance between two functions in this
set. For example, you could evaluate the functions at one or more points and measure
how much the functions differ at those points. That is, if $f$ and $g$ are in $C[0, 1]$, you
could define $d(f, g) = |f(0) - g(0)| + \left|f\left(\tfrac{1}{2}\right) - g\left(\tfrac{1}{2}\right)\right| + |f(1) - g(1)|$. The only
problem with this definition is that there are continuous functions $f$ and $g$ which are
equal at 0, $\tfrac{1}{2}$, and 1 but not equal at other points such as $f(x) = x(2x - 1)(x - 1)$
and $g(x) = 2x(2x - 1)(x - 1)$. Because the given distance function gives a distance
of 0 between two unequal functions, it cannot serve as a metric for the space of
continuous functions on $[0, 1]$ (Fig. 10.4).
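The failure of the three-point distance can be confirmed with exact rational arithmetic. In this sketch the name `three_point_distance` is introduced only for the illustration; `f` and `g` are the two polynomials named above.

```python
from fractions import Fraction

def f(x):
    return x * (2 * x - 1) * (x - 1)

def g(x):
    return 2 * x * (2 * x - 1) * (x - 1)

def three_point_distance(u, v):
    """|u(0) - v(0)| + |u(1/2) - v(1/2)| + |u(1) - v(1)|."""
    points = (Fraction(0), Fraction(1, 2), Fraction(1))
    return sum(abs(u(p) - v(p)) for p in points)

print(three_point_distance(f, g))            # the proposed distance between f and g
print(f(Fraction(1, 4)), g(Fraction(1, 4)))  # yet the functions differ at x = 1/4
```

The proposed distance between $f$ and $g$ is 0 even though $f\left(\tfrac{1}{4}\right) = \tfrac{3}{32}$ and $g\left(\tfrac{1}{4}\right) = \tfrac{3}{16}$.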
As a result, a distance function that makes $C[0, 1]$ into a metric space really
needs to take into account the values of the functions at all the points (or at least
a dense set of points) in $[0, 1]$. One distance measure that does this is called
the supremum metric or sup metric for short. It is defined for all $f$ and $g$ in
$C[0, 1]$ as $d(f, g) = \sup_{x \in [0,1]} |f(x) - g(x)|$. It is clear that if $f \neq g$, then there are
values of $x \in [0, 1]$ where $f(x) \neq g(x)$, so $d(f, g)$ will be positive, yet when
$f \equiv g$, then $d(f, g) = 0$ as needed. It is necessary to check that this distance
function has a valid definition, that is, for every $f$ and $g$ in $C[0, 1]$ the distance
function gives a nonnegative real number. But if $f$ and $g$ are continuous functions
on $[0, 1]$, then so is $|f(x) - g(x)|$. Since all functions continuous on $[0, 1]$ are
bounded and $|f(x) - g(x)|$ is a continuous function, the needed supremum is defined.
The triangle inequality follows from the fact that the triangle inequality works for
real numbers. Since for any three continuous functions $f$, $g$, and $h$ and for each
$x \in [0, 1]$ it is true that $|f(x) - g(x)| + |g(x) - h(x)| \geq |f(x) - h(x)|$, it follows
that $\sup_{x \in [0,1]} \big(|f(x) - g(x)| + |g(x) - h(x)|\big) \geq \sup_{x \in [0,1]} |f(x) - h(x)|$. Then, the inequality
$\sup(A + B) \leq \sup A + \sup B$ shows that
$$\sup_{x \in [0,1]} |f(x) - g(x)| + \sup_{x \in [0,1]} |g(x) - h(x)| \geq \sup_{x \in [0,1]} \big(|f(x) - g(x)| + |g(x) - h(x)|\big) \geq \sup_{x \in [0,1]} |f(x) - h(x)|,$$
and $d(f, g) + d(g, h) \geq d(f, h)$.
PROOF: The set $C[0, 1]$ with distance function $d(f, g) = \sup_{x \in [0,1]} |f(x) - g(x)|$
is a metric space.
• SET THE CONTEXT: Let $C[0, 1]$ be the set of real-valued functions
continuous on the interval $[0, 1]$.
• METRIC DEFINITION: For any $f$ and $g$ in $C[0, 1]$, the function
$|f(x) - g(x)|$ is also in $C[0, 1]$. Define $d(f, g) = \sup_{x \in [0,1]} |f(x) - g(x)|$ which is
the supremum of a nonnegative continuous function, so it is a nonnegative
real number.
• SEPARATION OF POINTS: For $f, g \in C[0, 1]$ if $f \neq g$, then for some
$x \in [0, 1]$, $|f(x) - g(x)|$ must be positive implying that $d(f, g) > 0$.
• ZERO DISTANCE: For all $x \in [0, 1]$ and $f \in C[0, 1]$, $|f(x) - f(x)| = 0$, so
$\sup_{x \in [0,1]} |f(x) - f(x)| = 0$ and $d(f, f) = 0$.
• SYMMETRY: Since for all $x \in [0, 1]$ and all $f, g \in C[0, 1]$, $|f(x) - g(x)| =
|g(x) - f(x)|$, it follows that $d(f, g) = d(g, f)$.
• TRIANGLE INEQUALITY: Since for all $x \in [0, 1]$ and all $f, g, h \in C[0, 1]$,
it holds that $|f(x) - g(x)| + |g(x) - h(x)| \geq |f(x) - h(x)|$, it follows
that $\sup_{x \in [0,1]} |f(x) - g(x)| + \sup_{x \in [0,1]} |g(x) - h(x)| \geq \sup_{x \in [0,1]} \big(|f(x) - g(x)| +
|g(x) - h(x)|\big) \geq \sup_{x \in [0,1]} |f(x) - h(x)|$, and $d(f, g) + d(g, h) \geq d(f, h)$.
• This shows that $C[0, 1]$ with the supremum distance function is a metric
space.
The supremum metric provides only one of many possible distance functions
for the space $C[0, 1]$. Another example is called the $L^1$ metric and is defined by
$d(f, g) = \int_0^1 |f(x) - g(x)|\,dx$. Since all functions continuous on a closed interval are
integrable there, this distance function is defined. Moreover, since $|f(x) - g(x)| \geq 0$
for all $x \in [0, 1]$, its integral is also nonnegative. If $f \neq g$, then there is an $a \in [0, 1]$
where $f(a) \neq g(a)$. Because $|f(x) - g(x)|$ is continuous and positive at $x = a$, there
is a $\delta > 0$ such that for all $x \in [0, 1]$ with $|x - a| < \delta$, $|f(x) - g(x)| > \tfrac{1}{2}|f(a) - g(a)|$.
This implies that $d(f, g) = \int_0^1 |f(x) - g(x)|\,dx \geq \int_{a-\delta}^{a+\delta} |f(x) - g(x)|\,dx > 0$. Of course,
a rigorous proof will take care that the limits of integration in the previous sentence
are chosen in a way that the integral is guaranteed to be defined, that is, so that they
lie in $[0, 1]$. The symmetry
of $d$ follows from its definition. For all $f, g, h \in C[0, 1]$ and each $x \in [0, 1]$,
the triangle inequality gives $|f(x) - g(x)| + |g(x) - h(x)| \geq |f(x) - h(x)|$. Thus,
$$\int_0^1 |f(x) - g(x)|\,dx + \int_0^1 |g(x) - h(x)|\,dx = \int_0^1 \big(|f(x) - g(x)| + |g(x) - h(x)|\big)\,dx \geq \int_0^1 |f(x) - h(x)|\,dx,$$
so $d(f, g) + d(g, h) \geq d(f, h)$, and the needed triangle inequality
holds.
PROOF: The set $C[0, 1]$ with distance function $d(f, g) = \int_0^1 |f(x) - g(x)|\,dx$
is a metric space.
• SET THE CONTEXT: Let $C[0, 1]$ be the set of real-valued functions
continuous on the interval $[0, 1]$.
• METRIC DEFINITION: Define $d(f, g) = \int_0^1 |f(x) - g(x)|\,dx$ which is the
integral of a nonnegative continuous function, so it is a nonnegative real
number.
• SEPARATION OF POINTS: For $f, g \in C[0, 1]$ if $f \neq g$, then for some
$a \in [0, 1]$, $|f(a) - g(a)|$ must be positive.
• Because $|f(x) - g(x)|$ is a continuous function, there is a $\delta > 0$ such that
$|f(x) - g(x)| > \tfrac{1}{2}|f(a) - g(a)|$ for all $x \in [0, 1]$ satisfying $|x - a| < \delta$.
• In particular, there are $\alpha$ and $\beta$ in $[0, 1]$ with $\alpha < \beta$ such that $|f(x) - g(x)| >
\tfrac{1}{2}|f(a) - g(a)|$ for all $x$ satisfying $\alpha < x < \beta$.
• Then $d(f, g) = \int_0^1 |f(x) - g(x)|\,dx \geq \int_\alpha^\beta |f(x) - g(x)|\,dx >
\tfrac{1}{2}|f(a) - g(a)|(\beta - \alpha) > 0$, so $d(f, g) > 0$ whenever $f \neq g$.
• ZERO DISTANCE: For all $x \in [0, 1]$ and $f \in C[0, 1]$, $|f(x) - f(x)| = 0$, so
$\int_0^1 |f(x) - f(x)|\,dx = \int_0^1 0\,dx = 0$ and $d(f, f) = 0$.
• SYMMETRY: Since for all $x \in [0, 1]$ and all $f, g \in C[0, 1]$, $|f(x) - g(x)| =
|g(x) - f(x)|$, it follows that $d(f, g) = d(g, f)$.
• TRIANGLE INEQUALITY: Since for all $x \in [0, 1]$ and all $f, g, h \in C[0, 1]$,
it holds that $|f(x) - g(x)| + |g(x) - h(x)| \geq |f(x) - h(x)|$, it follows that
$\int_0^1 |f(x) - g(x)|\,dx + \int_0^1 |g(x) - h(x)|\,dx = \int_0^1 \big(|f(x) - g(x)| + |g(x) - h(x)|\big)\,dx
\geq \int_0^1 |f(x) - h(x)|\,dx$, and $d(f, g) + d(g, h) \geq d(f, h)$.
• This shows that $C[0, 1]$ with the $L^1$ distance function is a metric space.
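Both distance functions on $C[0, 1]$ can be approximated numerically. The sketch below is not part of the proofs above. It samples a uniform grid for the supremum and uses the midpoint rule for the integral, so its results are approximations, and the names `sup_distance`, `l1_distance`, `identity`, and `square` are chosen only for this illustration.

```python
def sup_distance(f, g, samples=1000):
    """Approximate sup |f - g| on [0, 1] by sampling a uniform grid."""
    return max(abs(f(i / samples) - g(i / samples)) for i in range(samples + 1))

def l1_distance(f, g, samples=1000):
    """Approximate the integral of |f - g| over [0, 1] with the midpoint rule."""
    h = 1.0 / samples
    return h * sum(abs(f((i + 0.5) * h) - g((i + 0.5) * h)) for i in range(samples))

def identity(x):
    return x

def square(x):
    return x * x

print(sup_distance(identity, square))  # close to 1/4, attained at x = 1/2
print(l1_distance(identity, square))   # close to 1/6, the integral of x - x^2
```

For $f(x) = x$ and $g(x) = x^2$ the exact values are $\tfrac{1}{4}$ and $\tfrac{1}{6}$, so the two metrics already assign different distances to the same pair of functions.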
It is important to note that the supremum metric and the $L^1$ metric are
distinctly different. In particular, consider the sequence of functions
$$f_n(x) = \begin{cases} 0 & \text{if } 0 \leq x \leq \frac{1}{n+1} \\ n(n+1)x - n & \text{if } \frac{1}{n+1} < x \leq \frac{1}{n} \\ 1 & \text{if } \frac{1}{n} < x \leq 1 \end{cases}$$
for all natural numbers $n$. In the $L^1$ metric,
these functions converge to the function which is identically 1 on $[0, 1]$. On the
other hand, this sequence is not even a Cauchy sequence in the supremum metric
since $d(f_n, f_m) = 1$ for all $n \neq m$. All metrics for $C[0, 1]$ need to measure the
distance between two continuous functions. The supremum metric measures the
maximum distance between two functions whereas the $L^1$ metric measures a mean
distance between two functions.
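The two claims about the sequence $f_n$ can be checked with exact arithmetic. This is a sketch under stated assumptions: the closed form used in `l1_distance_to_one` is worked out here (the flat part of $1 - f_n$ contributes $\tfrac{1}{n+1}$ and the ramp contributes a triangle of base $\tfrac{1}{n} - \tfrac{1}{n+1}$ and height 1) and is not taken from the text, and the function names are hypothetical.

```python
from fractions import Fraction

def f(n, x):
    """The piecewise linear function f_n evaluated at x in [0, 1]."""
    low, high = Fraction(1, n + 1), Fraction(1, n)
    if x <= low:
        return Fraction(0)
    if x <= high:
        return n * (n + 1) * x - n
    return Fraction(1)

def l1_distance_to_one(n):
    """Exact integral of 1 - f_n over [0, 1]: a rectangle plus a triangle."""
    return Fraction(1, n + 1) + (Fraction(1, n) - Fraction(1, n + 1)) / 2

def gap_at_witness_point(n, m):
    """|f_n - f_m| at x = 1/(k + 1), where k is the smaller of n and m."""
    x = Fraction(1, min(n, m) + 1)
    return abs(f(n, x) - f(m, x))

for n in (1, 10, 100, 1000):
    print(n, l1_distance_to_one(n))  # shrinks toward 0 as n grows
print(gap_at_witness_point(2, 5))    # the functions are a full unit apart at x = 1/3
```

Because every $f_n$ takes values in $[0, 1]$, the supremum distance between two of them is at most 1, so finding a point where two of them differ by 1 shows that their supremum distance equals 1.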
10.3.1 Exercises
8. If $\langle X, d_X \rangle$ and $\langle Y, d_Y \rangle$ are both metric spaces, then $X \times Y = \{(x, y) \mid x \in
X \text{ and } y \in Y\}$ with distance function $d\big((x_1, y_1), (x_2, y_2)\big) = d_X(x_1, x_2) +
d_Y(y_1, y_2)$ is a metric space.
9. $C[0, 1]$ with distance function $d(f, g) = \sqrt{\int_0^1 \big(f(x) - g(x)\big)^2\,dx}$ is a metric space.
This distance function is called the $L^2$ metric.
10. The $L^1$ metric $d(f, g) = \int_0^1 |f(x) - g(x)|\,dx$ is not a metric for the space of all
Riemann integrable functions defined on the interval $[0, 1]$.
Recall that in the real numbers, $\mathbb{R}$, the interior of a set $S$ is defined to be the set of
points $x \in S$ such that there is an $\epsilon > 0$ for which the entire interval $(x - \epsilon, x + \epsilon)$
is contained in $S$. The exterior of a set is defined to be the set of points $x \notin S$ such
that there is an $\epsilon > 0$ for which the entire interval $(x - \epsilon, x + \epsilon)$ is contained in the
complement of $S$. The boundary of a set $S$ is defined to be the set of points neither in
the interior nor the exterior of the set, or the points $x$ such that for all $\epsilon > 0$ the set
$(x - \epsilon, x + \epsilon)$ contains elements of $S$ and elements of $S^c$. All three of these definitions
generalize in a natural way to all metric spaces. Indeed, one just has to replace the
role of the open interval $(x - \epsilon, x + \epsilon)$ with the neighborhood $N(x, \epsilon)$. That is, if
$\langle X, d \rangle$ is a metric space, and $S \subseteq X$, the interior of $S$, $\operatorname{int}(S)$, is the set of $x \in S$
such that there is an $\epsilon > 0$ for which $N(x, \epsilon) \subseteq S$, the exterior of $S$, $\operatorname{ext}(S)$, is the
set of $x \in S^c$ such that there is an $\epsilon > 0$ for which $N(x, \epsilon) \subseteq S^c$, and the boundary
of $S$, $\partial S$, is the set of $x \in X$ such that for every $\epsilon > 0$, the set $N(x, \epsilon)$ contains points
in $S$ and points in $S^c$.
The definitions of interior, exterior, and boundary, in turn, allow one to define
open and closed sets, accumulation point, derived set, and closure in ways analogous
to how they are defined for the set of real numbers. That is, if $S$ is a subset of a
metric space $X$, $S$ is an open set if $S = \operatorname{int}(S)$, $S$ is a closed set if $\partial S \subseteq S$, $S$ has
accumulation point $a$ if, for every $\epsilon > 0$, $N^{\circ}(a, \epsilon) \cap S \neq \emptyset$, the derived set of $S$, $S'$,
is the set of accumulation points of $S$, and the closure of $S$, $\operatorname{cl}(S)$, is $S \cup S'$. It is worth
noting that for every $x \in X$ and every $\epsilon > 0$, $N(x, \epsilon)$ is an open set. This is easy
to show by thinking about how you prove that an open interval in the real numbers
is an open set. In the real numbers, if $a < b$, then $(a, b)$ is open because if $y \in (a, b)$,
the interval $(y - \delta, y + \delta) \subseteq (a, b)$ when $\delta = \min(y - a, b - y)$. Similarly, then, in
metric space $\langle X, d \rangle$, if $a \in X$ and $\epsilon > 0$ are given, let $y \in N(a, \epsilon)$. It follows from
the definition of $N(a, \epsilon)$ that $\delta = \epsilon - d(a, y) > 0$. Then, if $x \in N(y, \delta)$, $d(x, y) <
\delta = \epsilon - d(a, y)$, so by the triangle inequality $d(a, x) \leq d(a, y) + d(y, x) < \epsilon$, and
$x \in N(a, \epsilon)$. Thus, you can conclude that $N(y, \delta) \subseteq N(a, \epsilon)$ when $\delta = \epsilon - d(a, y)$
which proves that $N(a, \epsilon)$ is open.
Many of the theorems pertaining to the topology of the real numbers proved in the
preceding chapter can now be reproved in the context of metric spaces by merely
replacing references to open intervals $(x - \epsilon, x + \epsilon)$ with the new neighborhood
notation, $N(x, \epsilon)$. For example, consider the proof that the union of open sets is also
an open set (Fig. 10.5).
PROOF: In metric space $\langle X, d \rangle$ assume that for each $i$ in the index set $I$,
$A_i$ is an open set. Then $\bigcup_{i \in I} A_i$ is an open set.
• In metric space $\langle X, d \rangle$ assume that for each $i$ in the index set $I$, $A_i$ is an
open set.
• Let $x \in \bigcup_{i \in I} A_i$.
• By the definition of set union, there is a $j \in I$ such that $x$ is an element of
the open set $A_j$.
• By the definition of open set, there is an $\epsilon > 0$ such that $N(x, \epsilon) \subseteq A_j$.
• But by the definition of set union, $A_j \subseteq \bigcup_{i \in I} A_i$ showing that $N(x, \epsilon) \subseteq \bigcup_{i \in I} A_i$,
which proves the theorem.
Several other examples are left for the exercises. Note that any metric space $\langle X, d \rangle$
with the given definition of open set is a topological space as defined in Chap. 9.
10.4.1 Exercises
4. A subset $S$ of a metric space $\langle X, d \rangle$ is open if and only if its complement
is closed.
5. In any metric space, the intersection of a finite number of open sets is an open set.
6. The subset $S$ of a metric space is closed if and only if $S = \operatorname{cl}(S)$.
7. For any subset $S$ of a metric space, its derived set, $S'$, is a closed set.
8. A set $U$ is an open set in $\mathbb{R}^n$ using the taxicab metric if and only if $U$ is open in
$\mathbb{R}^n$ using the Euclidean metric.
and $|x^2 + xy - 2|$ small. In fact, if these three quantities were each less than $\tfrac{\epsilon}{2}$,
then $\sqrt{(x - y - 0)^2 + (x + y - 2)^2 + (x^2 + xy - 2)^2}$ would be less than $\sqrt{\tfrac{3\epsilon^2}{4}} < \epsilon$.
So how can you arrange for each of $|x - y|$, $|x + y - 2|$, and $|x^2 + xy - 2|$ to
be less than $\tfrac{\epsilon}{2}$ when you know that $\sqrt{(x - 1)^2 + (y - 1)^2} < \delta$? One thing that
$\sqrt{(x - 1)^2 + (y - 1)^2} < \delta$ tells you is that each of $|x - 1|$ and $|y - 1|$ must be
less than $\delta$ because, if either of them exceeded $\delta$, the square root of the sum of
their squares would also exceed $\delta$. So look at each inequality separately. To make
$|x - y| < \tfrac{\epsilon}{2}$, it would be enough to have $|x - 1|$ and $|y - 1|$ both less than $\tfrac{\epsilon}{4}$ because
$|x - y| = |(x - 1) - (y - 1)| \leq |x - 1| + |y - 1|$. Similarly, if $|x - 1|$ and $|y - 1|$ were both
less than $\tfrac{\epsilon}{4}$, then $|x + y - 2| = |(x - 1) + (y - 1)| \leq |x - 1| + |y - 1|$ would be less than
$\tfrac{\epsilon}{2}$. As for $|x^2 + xy - 2|$ you again want to write the expression using terms that include
factors of $x - 1$ or $y - 1$ so you can make those terms small. One way to do this would
be to write $|x^2 + xy - 2| = |(x^2 - 1) + (xy - 1)| = |(x - 1)(x + 1) + (x - 1)y + (y - 1)|$. As
with other limits of quadratic expressions with which you have dealt, it is convenient
to limit the size of $\delta$ so that $x$ and $y$ cannot grow too large. So, if $\delta$ is less than
1, both $|x|$ and $|y|$ will be bounded by 2, and $|x + 1|$ and $|y + 1|$ will each be
bounded by 3. Thus, it would be good enough to know that $|x - 1|$ and $|y - 1|$ do
not exceed $\tfrac{\epsilon}{12}$ because then, $|x^2 + xy - 2| = |(x - 1)(x + 1) + (x - 1)y + (y - 1)|
\leq |x - 1| \cdot |x + 1| + |x - 1| \cdot |y| + |y - 1| \leq \tfrac{\epsilon}{12} \cdot 3 + \tfrac{\epsilon}{12} \cdot 2 + \tfrac{\epsilon}{12} = \tfrac{\epsilon}{2}$. This results in
the following proof.
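The scratch work above can be spot-checked numerically. The sketch below rests on assumptions: it takes the function under discussion to be $F(x, y) = (x - y,\ x + y,\ x^2 + xy)$ with limit $(0, 2, 2)$ at $(1, 1)$, as the three quantities being estimated suggest, and it uses $\delta = \min\left(1, \tfrac{\epsilon}{12}\right)$, a choice read off from the estimates rather than quoted from the text. Random sampling can only fail to find a counterexample; it does not replace the proof.

```python
import math
import random

def F(x, y):
    """Assumed map from R^2 to R^3 whose limit at (1, 1) is being studied."""
    return (x - y, x + y, x * x + x * y)

LIMIT = (0.0, 2.0, 2.0)  # F(1, 1)

def delta_for(eps):
    """A delta suggested by the estimates: at most 1 and at most eps / 12."""
    return min(1.0, eps / 12.0)

def worst_error(eps, samples=10000, seed=0):
    """Largest distance from F(x, y) to LIMIT over sampled points within delta of (1, 1)."""
    rng = random.Random(seed)
    delta = delta_for(eps)
    worst = 0.0
    for _ in range(samples):
        # Pick a point strictly inside the open disk of radius delta about (1, 1).
        radius = 0.999999 * delta * math.sqrt(rng.random())
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x = 1.0 + radius * math.cos(angle)
        y = 1.0 + radius * math.sin(angle)
        worst = max(worst, math.dist(F(x, y), LIMIT))
    return worst

for eps in (2.0, 0.5, 0.01):
    print(eps, worst_error(eps))  # each reported error should stay below eps
```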
10.5.1 Exercises
PROOF: Let $\langle A, d_A \rangle$ and $\langle B, d_B \rangle$ be metric spaces, and let $f : A \to B$ be
a function from $A$ to $B$. Then $f$ is continuous on $A$ if and only if for every
open set $D \subseteq B$, its preimage under $f$, $f^{-1}(D)$, is an open set in $A$.
• Let $\langle A, d_A \rangle$ and $\langle B, d_B \rangle$ be metric spaces, and let $f : A \to B$ be a function
from $A$ to $B$.
Continuity implies that the preimages of open sets are open
• Assume that $f : A \to B$ is a continuous function.
• Let $D$ be an open set in $B$, and let $a \in f^{-1}(D)$.
• Because $D$ is open in $B$, there is an $\epsilon > 0$ such that $N(f(a), \epsilon) \subseteq D$.
• Thus, if $y \in B$ with $d_B\big(f(a), y\big) < \epsilon$, then $y \in D$.
• Because $f$ is a continuous function, there is a $\delta > 0$ such that for all $x \in A$
with $d_A(a, x) < \delta$ it follows that $d_B\big(f(a), f(x)\big) < \epsilon$.
• Thus, if $x \in N(a, \delta)$, then $f(x) \in N\big(f(a), \epsilon\big)$, so $f(x) \in D$ and $x \in f^{-1}(D)$.
• Therefore, $f^{-1}(D)$ is an open set in $A$ proving that the preimage under $f$ of
any open set is open.
Another theorem about continuous functions from the real numbers to the real
numbers is that the composition of two continuous functions is a continuous
function. This theorem generalizes to metric spaces, but due to the previous result,
it now has a very simple proof which is not only valid for metric spaces but is also valid
for continuous functions between topological spaces.
PROOF: Suppose $\langle X, d_X \rangle$, $\langle Y, d_Y \rangle$, and $\langle Z, d_Z \rangle$ are metric spaces and
functions $f : X \to Y$ and $g : Y \to Z$ are both continuous functions. Then
$g \circ f : X \to Z$ is a continuous function.
• Let $\langle X, d_X \rangle$, $\langle Y, d_Y \rangle$, and $\langle Z, d_Z \rangle$ be metric spaces, and let functions
$f : X \to Y$ and $g : Y \to Z$ both be continuous functions.
• Let $U \subseteq Z$ be an open set.
• Then, because $g$ is a continuous function, the set $g^{-1}(U)$ is an open set in $Y$.
• Then, because $f$ is a continuous function, the set $f^{-1}\big(g^{-1}(U)\big)$ is an open set
in $X$.
• If $x \in f^{-1}\big(g^{-1}(U)\big)$, then $f(x) \in g^{-1}(U)$ and $g\big(f(x)\big) \in U$ implying that
$x \in (g \circ f)^{-1}(U)$.
• Conversely, if $x \in (g \circ f)^{-1}(U)$, then $(g \circ f)(x) = g\big(f(x)\big) \in U$ implying
that $x \in f^{-1}\big(g^{-1}(U)\big)$.
• Thus, $f^{-1}\big(g^{-1}(U)\big) = (g \circ f)^{-1}(U)$ is an open set, so $g \circ f$ is a continuous
function.
For some natural number $n$, consider functions that map a metric space $\langle X, d \rangle$
into the metric space $\mathbb{R}^n$ using the Euclidean metric. In particular, if $f : X \to \mathbb{R}^n$,
then for each $x \in X$, $f(x) \in \mathbb{R}^n$. As a point in $\mathbb{R}^n$, $f(x)$ has $n$ coordinates that
each depend on the value of $x$, that is, $f(x) = \big(f_1(x), f_2(x), f_3(x), \ldots, f_n(x)\big)$. It is
important that $f$ is a continuous function if and only if $f_j$ is continuous for each
$j = 1, 2, 3, \ldots, n$. What will be the format of a proof of this result? First of all,
since the statement of the result is the biconditional “$f$ is continuous if and only if
$f_j$ is continuous for each $j$,” the proof will have two parts. If it is assumed that $f$ is
a continuous function, then you must show that $f_j$ is continuous for each $j$. But this
will not be hard since the Euclidean distance between two points $f(x)$ and $f(a)$ will
be greater than or equal to the distance between their $j$th coordinates, $f_j(x)$ and $f_j(a)$.
Thus, a $\delta > 0$ that ensures that $f(x)$ is within $\epsilon > 0$ of $f(a)$ will also ensure that
$f_j(x)$ is within $\epsilon$ of $f_j(a)$. Conversely, if $f_j$ is continuous for each $j$, then a $\delta_j > 0$
can be found that will ensure that $f_j(x)$ is very close to $f_j(a)$. But by making each
$|f_j(x) - f_j(a)|$ small, say smaller than $\tfrac{\epsilon}{n}$, the Euclidean distance from $f(x)$ to $f(a)$ will
be less than $\epsilon$. This will be what is needed to show that $f$ is continuous.
Recall that $\langle X, d \rangle$ is a discrete metric space when $d(x, y) = 1$ whenever $x \neq y$.
In this case $N(x, 1)$ is the singleton $\{x\}$ for every $x \in X$, so each singleton is an open
set. Since arbitrary unions of open sets are open sets, all subsets of $X$ are open. As
a consequence, all subsets of $X$ are also closed. Suppose that $\langle Y, d' \rangle$ is any metric
space, and the function $f$ maps $X$ into $Y$. Then $f$ is automatically continuous. Indeed,
for any open set $U \subseteq Y$, the set $f^{-1}(U) \subseteq X$ is open in $X$ because all subsets of $X$
are open.
10.6.1 Exercises
5. Suppose $\langle X, d_1 \rangle$ and $\langle X, d_2 \rangle$ are both metric spaces, and that there are positive
real numbers $c$ and $C$ such that for all $x$ and $y$ in $X$, the distance functions satisfy
$c\,d_1(x, y) \leq d_2(x, y) \leq C\,d_1(x, y)$. Then a function is continuous on $X$ with metric
$d_1$ if and only if it is continuous on $X$ with metric $d_2$.
6. The three metrics for Euclidean $n$-space: the Euclidean metric, the taxicab metric,
and the supremum metric are related in pairs as described in the previous
problem.
7. If $\langle X, d \rangle$ is a metric space and $D \subseteq X$, then if $f : D \to \mathbb{R}$ is uniformly
continuous on $D$, there is a continuous function $g : \operatorname{cl}(D) \to \mathbb{R}$ such that
$g(x) = f(x)$ for all $x \in D$.
10.7 Homeomorphism
The interval $[0, 1]$ and the interval $[5, 30]$ certainly are different in length and
have different arithmetic properties such as $\tfrac{0}{1} \neq \tfrac{5}{30}$. On the other hand, the two
intervals have identical topological properties in the sense that there is a one-to-one
correspondence between the points of these two intervals such that a set is open in
the first interval if and only if the corresponding set is open in the second interval.
It can easily be seen that the function $f(x) = 25x + 5$ is a bijection from $[0, 1]$ to
$[5, 30]$ and a set $U$ is open in $[0, 1]$ if and only if $f(U)$ is open in $[5, 30]$. This follows
from the fact that both $f$ and its inverse function $f^{-1}(x) = \tfrac{x - 5}{25}$ are continuous
and the fact that inverse images of open sets under continuous maps are open.
Two topological spaces $X$ and $Y$ are called homeomorphic if there is a continuous
bijection $f : X \to Y$ whose inverse $f^{-1} : Y \to X$ is also continuous. In such a case,
the bijection $f$ is called a homeomorphism. If you know that two topological spaces
are homeomorphic, then you know that all of the topological properties of the two
spaces are the same. In particular, if $X$ and $Y$ are homeomorphic spaces, then if you
can prove that $X$ has a particular property, it may follow immediately that $Y$ also has
that property. Since metric spaces are topological spaces, one has that two metric
spaces $\langle X, d_X \rangle$ and $\langle Y, d_Y \rangle$ are homeomorphic if there is a continuous bijection
from $X$ to $Y$ whose inverse is continuous. Thus, $[0, 1]$ and $[5, 30]$ are homeomorphic
with homeomorphism $f(x) = 25x + 5$.
Note that if $f : X \to Y$ is a continuous bijection, then the function $f^{-1} : Y \to X$
will be defined, but it need not be continuous. For example, the function $f(\theta) =
(\cos\theta, \sin\theta)$ is a continuous bijection that maps the interval $[0, 2\pi)$ on the real line
onto the circle of radius 1 centered at the origin in $\mathbb{R}^2$. The inverse is not a continuous
function because it maps every neighborhood of the point $(1, 0) \in \mathbb{R}^2$ to points that
are close to $2\pi$ and other points that are close to 0. This argument does not prove
that the interval is not homeomorphic to the circle; just that this particular function
is not a homeomorphism. A proof that the two spaces are not homeomorphic will
be discussed in the next section.
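The jump in the inverse map can be seen numerically. In this sketch the names `to_circle` and `from_circle` are invented for the illustration, and the inverse is computed with `math.atan2` reduced modulo $2\pi$ so that it takes values in $[0, 2\pi)$.

```python
import math

def to_circle(theta):
    """The map f(theta) = (cos theta, sin theta) from [0, 2*pi) to the unit circle."""
    return (math.cos(theta), math.sin(theta))

def from_circle(point):
    """The inverse map: the angle in [0, 2*pi) of a point on the unit circle."""
    x, y = point
    return math.atan2(y, x) % (2.0 * math.pi)

just_above = to_circle(0.001)                  # slightly counterclockwise of (1, 0)
just_below = to_circle(2.0 * math.pi - 0.001)  # slightly clockwise of (1, 0)

print(math.dist(just_above, just_below))                    # the points are close together
print(from_circle(just_above), from_circle(just_below))     # their preimages are far apart
```

The two points on the circle are about 0.002 apart, yet their preimages differ by nearly $2\pi$, which is the behavior that prevents the inverse from being continuous at $(1, 0)$.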
10.7.1 Exercises
10.8.1 Exercises
Since metric spaces are topological spaces, the concept of compactness can be
applied to metric spaces. That is, $\langle X, d \rangle$ is said to be a compact metric space if
every cover of $X$ by open sets contains a finite subcover. It follows easily that if $X$ is
compact, then every closed subset of $X$ is also compact. The proof of this is left as an
exercise.
Any set $S$ in a metric space $\langle X, d \rangle$ is said to be bounded if there is a positive
real number $r$ and a point $a \in X$ such that $S \subseteq N(a, r)$. Equivalently, if $S$ is not an
empty set, one can define the diameter of set $S$ to be $\sup_{x, y \in S} d(x, y)$. Then $S$ is bounded
if its diameter is finite. As it is in the real numbers, any compact set $S$ in a metric
space $\langle X, d \rangle$ is both closed and bounded. To prove that a compact set $S$ is closed,
suppose instead that $a$ is an accumulation point of $S$ that is not contained in $S$. For each
real number $r > 0$ let $U_r = \{x \mid d(x, a) > r\}$. Then $S$ is contained in the union of the
open sets $U_r$, yet, because $a$ is an accumulation point of $S$, no finite collection of
the $U_r$ sets will cover $S$. Thus, if $S$ is compact, it must contain all of its accumulation
points, so $S$ is closed. To prove that a compact set $S$ is bounded, let $a$ be any point in
$X$. Then $S$ is certainly contained in the union of the sets $N(a, r)$ where $r$ ranges over
the positive real numbers. Thus, if $S$ is compact, it is contained in the union of a finite number
of the $N(a, r)$. If $t$ is the largest of the $r$ values associated with the finite number of
neighborhoods, then $S$ is contained in $N(a, t)$ showing that $S$ is bounded (Fig. 10.8).
$S$ is closed
• Assume that $a \notin S$ is an accumulation point of $S$.
• Because $a \notin S$, the set $S$ is contained in the union of the open sets $U_r =
\{x \mid d(x, a) > r\}$ where $r$ ranges over the positive real numbers.
• Since $S$ is compact and contained in the union of the open sets $\{U_r\}$, it must
be contained in the union of a finite subcollection of $\{U_r\}$.
• Let $t$ be the least of the $r$ values associated with this finite subcollection of
$U_r$ sets.
• Then $S \subseteq U_t$.
• But $a$ is an accumulation point of $S$, so $N(a, t) \cap S$ is not empty, so $S$ cannot
be contained in $U_t$.
• This is a contradiction, so there must not be any accumulation points of $S$
contained in the complement of $S$ implying that $S$ is a closed set.
$S$ is bounded
• Let $a$ be any point in $X$.
• $S$ is contained in the union of the open neighborhoods $N(a, r)$ where $r$
ranges over the positive real numbers.
• Since $S$ is compact and contained in the union of the $N(a, r)$ sets, it must be
contained in the union of a finite subcollection of the $N(a, r)$ sets.
• Let $t$ be the greatest of the $r$ values associated with this finite subcollection
of $N(a, r)$ sets.
• Then $S \subseteq N(a, t)$ implying that $S$ is a bounded set.
• Thus, any compact set $S$ in a metric space must be both closed and bounded.
The Heine–Borel Theorem says that for the real numbers, the converse of this
theorem is also true; that is, every closed and bounded set of real numbers is
compact. The fact that it is also true for $\mathbb{R}^n$ using the Euclidean metric is left
as an exercise. But there are metric spaces in which some closed and bounded
sets are not compact. Consider, for example, the metric space $C[0, 1]$ using
the supremum metric and the sequence of functions in $C[0, 1]$ given by
$$f_n(x) = \begin{cases} 0 & \text{if } x \leq \frac{1}{n+1} \\ n(n+1)x - n & \text{if } \frac{1}{n+1} < x < \frac{1}{n} \\ 1 & \text{if } \frac{1}{n} \leq x \end{cases}$$
for natural numbers $n$. Let $S = \{f_n \mid n > 0\}$.
Then if $m$ and $n$ are distinct natural numbers, the distance between $f_m$ and $f_n$ is 1,
so the set $S$ can have no accumulation point. Thus, $S$ is a closed set. Each $f_n \in S$ is
a distance 1 from all the other functions in $S$, so $S$ has diameter 1, and, thus, $S$ is a
bounded set. Note that for each natural number $n$ the open neighborhood $N(f_n, \tfrac{1}{2})$
Fig. 10.9 A continuous real-valued function on a compact set in $\mathbb{R}^2$ obtains its maximum and its
minimum
You might reflect for a minute on the fact that the preceding two proofs are
considerably simpler than the proofs of the analogous theorems in Chap. 4 about continuous
real-valued functions defined on closed bounded intervals. Of course, at that time,
you had not been introduced to open and closed sets, did not have at your
disposal the theory of compact sets, had not proved the Heine–Borel Theorem
that established that closed bounded sets are compact, and had not proved that
the continuous images of compact sets are compact. It is actually comforting to
realize that the effort to learn about all of these newer concepts does give you many
powerful tools that simplify such proofs.
Earlier in the chapter in the section on homeomorphisms there was an example
of a continuous bijection between two metric spaces that failed to be a homeomorphism
because the function did not have a continuous inverse. When compact
metric spaces are involved, this does not happen, that is, any continuous bijection
of a compact metric space to another (necessarily compact) metric space has a
continuous inverse, and thus, is a homeomorphism. To prove this you would want
to assume that $f : X \to Y$ was a continuous bijection from compact metric space
$\langle X, d_X \rangle$ to metric space $\langle Y, d_Y \rangle$. Because $f$ is a bijection, it does have an inverse
function $f^{-1} : Y \to X$. How would you prove that such a function is continuous?
One way would be to show that the inverse of $f^{-1}$, that is $f$, maps open sets to
open sets. But note that $f$ maps closed sets to closed sets because every closed
subset of the compact metric space, $X$, is compact, so its image under the continuous
function $f$ must be compact and thus closed. So if $f$ maps closed sets onto closed
sets, then since $f$ is a bijection, it must map the complement of a closed set onto the
complement of a closed set. This means $f$ maps open sets onto open sets (Fig. 10.10).
10.9.1 Exercises
5. Suppose set $S$ in metric space $\langle X, d \rangle$ has diameter $r$. Show that $\operatorname{cl}(S)$ also has
diameter $r$.
The Completeness Axiom discussed in Chap. 2 says that every nonempty subset of
the real numbers that is bounded above has a least upper bound. As seen in Chap. 3,
this has many important consequences such as
• The Intermediate Value Theorem
• Every bounded monotone sequence of real numbers converges.
• Every Cauchy sequence of real numbers converges.
The concept of completeness can be generalized to metric spaces. $\langle X, d \rangle$ is
a complete metric space if every sequence in $X$ that is Cauchy is a convergent
sequence. This is not a concept that can apply to topological spaces in general
because the concept of a Cauchy sequence only makes sense when there is some
sort of distance measure that can be used to determine if the terms of the sequence
are getting close to each other.
Any compact metric space is a complete metric space. To prove this you would
assume that $\langle X, d \rangle$ is a compact metric space and that $\langle a_n \rangle$ is a Cauchy sequence
in $X$. Your goal is to prove that the sequence has a limit in $X$. First you would need to
identify a point as the limit and then prove that that point is the limit of the sequence.
The property of compact metric spaces that works here is the property that if the
intersection of a collection of closed sets is the empty set, then a finite subcollection
of those closed sets also has empty intersection. Can you use the property of Cauchy
sequences to find a sequence of closed sets decreasing in size whose intersection
is not empty? Here is one technique that works. Because the sequence $\langle a_n \rangle$ is
Cauchy, there is a natural number $k_1$ such that for all $m$ and $n$ greater than or equal
to $k_1$, the distance $d(a_m, a_n) < 1$. That means that for all $n \geq k_1$, $a_n \in N(a_{k_1}, 1)$.
Of course, $N(a_{k_1}, 1)$ need not be a closed set, but its closure $N_1 = \operatorname{cl}\big(N(a_{k_1}, 1)\big)$ is. Then
$N_1$ is a closed set in $X$ that contains the entire sequence $\langle a_n \rangle$ from the $k_1$th term
on. Similarly, for each natural number $p$, find a natural number $k_p$ such that for all
$m$ and $n$ greater than or equal to $k_p$, the distance $d(a_m, a_n) < \tfrac{1}{p}$. Then let $N_p =
\operatorname{cl}\big(N(a_{k_p}, \tfrac{1}{p})\big)$, which is a closed set such that for all $n \geq k_p$, $a_n \in N_p$. Now it is easy
to verify that the intersection of any finite number of the $N_p$ sets contains infinitely
many terms of the sequence, so the intersection of any finite collection of these
closed sets is not empty. Thus, since $X$ is a compact metric space, the intersection of
all the $N_p$ sets is not empty. But since the diameter of the set $N_p$ is at most $\tfrac{2}{p}$ which
goes to 0 as $p$ gets large, there can be no more than one point in the intersection of
all of the $N_p$ sets. Call this intersection point $L$. By selecting a $p$ large enough so that
$\tfrac{2}{p}$ is smaller than a given $\epsilon > 0$, you can assure that for all $n > k_p$ the term $a_n$ is in
$N_p$ and, thus, within $\epsilon$ of $L$. This shows that the sequence converges to $L$.
In Chap. 4 there are two proofs for the Heine–Borel Theorem, which states that
closed bounded sets of real numbers are compact. The second of these proofs begins
with a closed interval which is assumed to have an open covering. If the interval
cannot be covered by a finite subcover, then at least one of the two halves of the
interval cannot be covered by a finite subcover. That half interval can, in turn, be
broken into two halves, and at least one of those intervals does not have a finite
subcover. This process can be continued, producing a collection of ever smaller intervals
that converges to an individual point. The properties of the real numbers that make this
argument work are that every interval can be written as the union of two intervals
each with half the length of the original, and that the real numbers are
complete, which guarantees that there will be a point in the intersection of these
nested intervals that decrease to 0 in length.
This suggests a way to prove an analogous result for other complete metric spaces
that have the property that bounded closed sets can be written as the union of a
finite number of closed sets each with a diameter that is at most half that of the
original. By following the argument from the Heine–Borel Theorem, it is possible
to prove that the compact sets in such a metric space are exactly the closed and
bounded sets.
PROOF: Let $\langle X, d\rangle$ be a complete metric space with the property that
every closed bounded subset of $X$ can be written as the union of a finite
number of closed sets each with a diameter that is at most half the
diameter of the original closed bounded subset. Then a subset of $X$ is
compact if and only if it is both closed and bounded.
• Let $\langle X, d\rangle$ be a complete metric space, and assume that every closed and
bounded subset of $X$ can be written as the union of a finite number of closed
sets each with a diameter that is at most half the diameter of the original
closed bounded subset.
• It has already been shown that every compact set in $X$ is both closed and
bounded.
• Assume that $S_0 \subseteq X$ is both closed and bounded with diameter $r$.
• Let $T$ be a collection of open sets that covers $S_0$, and assume that no finite
collection of open sets in $T$ covers $S_0$.
• By assumption $S_0$ can be written as a union of a finite number of closed
subsets each with diameter at most $\frac{r}{2}$.
• At least one of these finitely many subsets cannot be covered by finitely
many open sets in $T$. Call one of those sets $S_1$.
• Clearly, $S_1$ is not the empty set, or it could be covered by a single open set
in $T$.
• Assume, inductively, that for some $k > 0$ the set $S_k \subseteq S_{k-1}$ has been chosen
with diameter at most $\frac{r}{2^k}$ such that $S_k$ cannot be covered by a finite number
of open sets in $T$.
• $S_k$ can be written as the union of a finite number of closed subsets each with
diameter at most $\frac{r}{2^{k+1}}$.
• Because $S_k$ has no finite subcover in $T$, at least one of these finitely many
subsets of $S_k$ has no finite subcover in $T$. Call one of these sets $S_{k+1}$.
• Thus, by mathematical induction, there is a nested sequence of sets
$S_0 \supseteq S_1 \supseteq S_2 \supseteq \cdots$ where each $S_k$ cannot be covered by a finite number of open
sets in $T$ and each $S_k$ has diameter at most $\frac{r}{2^k}$.
• Because none of the $S_k$ are the empty set and the sets are nested, the
intersection of any finite collection of $S_k$ sets is nonempty.
• From each $S_k$ set select one element and call it $a_k$.
• Because the $S_k$ sets are nested, for each natural number $k$ and each $n \ge k$,
$a_n \in S_k$.
• Let $\epsilon > 0$ be given.
• Select $k$ such that $\frac{r}{2^k} < \epsilon$.
• Then for all $m, n \ge k$, $a_m$ and $a_n$ are in $S_k$, so $d(a_m, a_n) \le \frac{r}{2^k} < \epsilon$.
• This shows that the sequence $\langle a_n\rangle$ is Cauchy, and because $\langle X, d\rangle$ is a
complete metric space, the sequence has a limit $L$.
• Because $S_k$ is closed for each natural number $k$, and $a_n \in S_k$ for each $n \ge k$,
it follows that the limit of the sequence, $L$, is also an element of $S_k$.
326 10 Metric Spaces
For example, suppose that $S$ is a closed bounded set in the metric space $\mathbb{R}^2$ using
the Euclidean metric. Let $S$ have diameter $r > 0$. One could be very careful and
show that $S$ can be enclosed in a square with side length $r$, or one could be sloppy
and more easily show that $S$ can be enclosed in a square with side length $2r$. In
the latter case, just select any point $s \in S$ and draw two parallel lines a distance $2r$
apart where $s$ is halfway between the two lines. Since the diameter of $S$ is $r$, $S$ will
be bounded by these two lines. Then draw two lines a distance $2r$ apart where $s$ is
halfway between the two lines and the two lines are perpendicular to the original
two lines. The four lines now determine a square of side length $2r$ that contains
$S$. Drawing a $6 \times 6$ grid in this square partitions the set $S$ into at most 36 sections
each with diameter at most $\frac{2r}{6}\sqrt{2} < \frac{3r}{6} = \frac{r}{2}$. Thus, $\mathbb{R}^2$ satisfies the condition of
the theorem, so the compact sets in $\mathbb{R}^2$ are exactly those that are both closed and
bounded (Fig. 10.11).
Earlier in this chapter you saw the sequence of functions from $C[0,1]$ given by
$$f_n(x) = \begin{cases} 0 & \text{if } x \le \frac{1}{n+1} \\ n(n+1)x - n & \text{if } \frac{1}{n+1} < x < \frac{1}{n} \\ 1 & \text{if } \frac{1}{n} \le x \end{cases}$$
This is an example of a closed bounded
set with diameter 1, but it is not possible to cover this set with a finite number of sets
with diameter $\frac{1}{2}$ since each such set could contain at most one term of the sequence.
As seen before, $C[0,1]$ is a metric space which contains closed bounded sets that are
not compact. It is left as an exercise to show that $C[0,1]$ with the supremum metric
is complete.
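The claim that no set of diameter one half can hold two terms of this sequence can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `f` and `sup_distance` are hypothetical, and the supremum is only approximated on a finite grid that is assumed to be fine enough once the breakpoints of both functions are added to it.

```python
from fractions import Fraction

def f(n, x):
    """Piecewise-linear f_n from the text: 0 up to 1/(n+1), 1 from 1/n onward."""
    if x <= Fraction(1, n + 1):
        return Fraction(0)
    if x < Fraction(1, n):
        return n * (n + 1) * x - n
    return Fraction(1)

def sup_distance(n, m, steps=1000):
    """Approximate sup |f_n - f_m| on [0, 1] using a grid plus the breakpoints."""
    grid = {Fraction(i, steps) for i in range(steps + 1)}
    grid |= {Fraction(1, k) for k in (n, n + 1, m, m + 1)}
    return max(abs(f(n, x) - f(m, x)) for x in grid)

# Distances between every pair of distinct terms among f_1, ..., f_6.
distances = {(n, m): sup_distance(n, m)
             for n in range(1, 6) for m in range(n + 1, 7)}
```

Every pair sampled here is a distance 1 apart, which is consistent with the statement that the set has diameter 1 and that two distinct terms never fit inside a set of diameter $\frac{1}{2}$.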
10.10.1 Exercises
10.11 Contraction Mappings
A very powerful tool for showing that a particular equation has a solution comes
from the properties of complete metric spaces. Let $\langle X, d\rangle$ be a complete metric
space, and suppose that there is a function $f : X \to X$. The function $f$ is called a
contraction mapping if there is a positive real number $r < 1$ such that for every
$x, y \in X$, the distance from $f(x)$ to $f(y)$ is smaller by at least a factor of $r$ than
the distance from $x$ to $y$. That is, $d(f(x), f(y)) \le r\,d(x, y)$. In other words, the
mapping contracts the space $X$ by making all points move closer to each other by
a factor of at least $r$. Note that all contraction mappings are continuous functions,
in fact, uniformly continuous functions, because given any $\epsilon > 0$, you can choose
$\delta = \epsilon$ and have that $d(f(x), f(y)) \le r\,d(x, y) < r\epsilon < \epsilon$ for all $x, y \in X$ with
$d(x, y) < \delta$. The important theorem about contraction mappings states that every
contraction mapping on a complete metric space has a unique fixed point, that is,
exactly one point $L$ such that $f(L) = L$, so $f$ maps $L$ to itself (Fig. 10.12).
To prove this theorem about contraction mappings, you first have to identify the
point $L$ that is a fixed point, and then you must show that it is unique. Finding
$L$ is easy. Just start at any point $x_0 \in X$ and follow its orbit, that is, follow the
sequence $x_0, x_1 = f(x_0), x_2 = f(x_1), \ldots$. By the property of contraction mappings,
the distance between the terms of this sequence keeps shrinking by a factor of at
least $r$. It follows that for any natural number $k$, the distance between $x_k$ and $x_{k+1}$ is
at most $r^k d(x_0, x_1)$, and, thus, the distances between successive terms of the orbit
decrease at least as fast as the terms of a convergent geometric series. This is sufficient to prove
that the orbit is a Cauchy sequence. Since $X$ is complete, the sequence must converge
to some point $L \in X$. Note that if you begin the sequence at some other point $y_0 \in X$,
its orbit also converges, and since $d(x_0, y_0), d(x_1, y_1), d(x_2, y_2), \ldots$ must converge to
0, it follows that the orbit of $y_0$ must also converge to $L$. In particular, the orbit of
$L$ must converge to $L$. The fact that $f(L) = L$ follows from the triangle inequality
which shows that $d(f(L), L) \le d(f(L), f(x_n)) + d(f(x_n), L)$ for any natural
number $n$, and, in particular, for $n$ large enough that $d(f(x_n), L)$ and
$d(f(x_n), f(L)) \le r\,d(x_n, L)$ are small. The uniqueness of the fixed point comes from
the simple fact that if $L$ and $M$ are both fixed points, then
$d(L, M) = d(f(L), f(M)) \le r\,d(L, M)$ which can only
happen if $d(L, M) = 0$ and $L = M$.
by requiring $m$ and $n$ to be large which shows that the sequence $\langle x_n\rangle$ is a
Cauchy sequence.
• Because $\langle X, d\rangle$ is a complete metric space, the Cauchy sequence $\langle x_n\rangle$
has a limit $L \in X$.
• Since $\lim_{n \to \infty} x_n = L$, given any $\epsilon > 0$, there is a natural number $N$ such that
$d(x_n, L) < \frac{\epsilon}{2}$ for all $n \ge N$.
• Then $d(f(L), L) \le d(f(L), f(x_N)) + d(f(x_N), L) \le r\,d(L, x_N) + d(x_{N+1}, L) < (r + 1)\frac{\epsilon}{2} < \epsilon$.
• Therefore, $f(L)$ must equal $L$, and $L$ is a fixed point of the contraction
mapping.
• If $M \in X$ with $f(M) = M$, then $d(L, M) = d(f(L), f(M)) \le r\,d(L, M)$
which implies $d(L, M) = 0$ and, thus, $L = M$.
• Therefore, $L$ is the unique fixed point of $f$, completing the proof of the theorem.
The power of contraction mappings is that one can find the fixed point of
$f : X \to X$ by selecting any $x \in X$ and following its orbit which will converge
to the fixed point at least as fast as a geometric series. The reader may well be
familiar, for example, with a method of calculating the square root of a positive
real number $a$ by starting with a positive number $x$ and iterating the function
$f(x) = \frac{x^2 + a}{2x}$, a formula which comes from applying Newton's Method to find
a root of the function $x^2 - a$. The fixed point of $f$ is an $x$ satisfying $x = \frac{x^2 + a}{2x}$
which is satisfied by $x = \sqrt{a}$. If $a = 2$, for example, and you begin with
$x = 100$ as a first guess for $\sqrt{2}$, then you generate the orbit of 100 to be
100, 50.01, 25.024996, 12.55245805, 6.355894695, 3.335281609, 1.967465562,
1.49200089, 1.416241332, 1.414215014, 1.414213562, ... with each successive
term being about half as far from $\sqrt{2}$ as the previous term. Why does this work?
First note that $f$ maps the interval $[1, \infty)$ to itself. On that interval the derivative
of $f$ is $f'(x) = \frac{x^2 - 2}{2x^2}$ which has its maximum absolute value at $x = 1$ where $f'(1) = -\frac{1}{2}$.
Thus, by the Mean Value Theorem, when $x > y \ge 1$, there is a $c > 1$ such that
$\left|\frac{f(x) - f(y)}{x - y}\right| = |f'(c)|$, so $|f(x) - f(y)| < \frac{1}{2}|x - y|$ showing that $f$ is a contraction
mapping on $[1, \infty)$.
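The orbit quoted above is easy to reproduce. Here is a minimal sketch in Python, where the function name `newton_sqrt_orbit` and its parameters are illustrative choices rather than anything defined in the text, and ordinary floating point arithmetic stands in for exact real numbers.

```python
def newton_sqrt_orbit(a, x0, steps):
    """Return [x0, f(x0), f(f(x0)), ...] for f(x) = (x^2 + a) / (2x)."""
    orbit = [x0]
    for _ in range(steps):
        x = orbit[-1]
        orbit.append((x * x + a) / (2 * x))
    return orbit

# Orbit of 100 for a = 2, matching the first guess used in the text.
orbit = newton_sqrt_orbit(2.0, 100.0, 10)
for term in orbit:
    print(f"{term:.9f}")
```

Printing the terms lets you compare them against the list in the text and watch the distance to $\sqrt{2}$ shrink from one term to the next.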
Another familiar example, known to any student who has played around
with a calculator which can calculate the cosine function at the touch of a
button, is that the orbit of 0 under the cosine function (using radian measure)
is 0, 1, 0.540302306, 0.857553216, 0.65428979, 0.793480359, 0.701368774,
0.763959683, 0.722102425, 0.750417762, 0.731404042, ..., 0.739085132, ....
Again, this is because the function $\cos x$ is a contraction mapping on the interval
$[\frac{1}{2}, 1]$ because the derivative of $\cos x$ is $-\sin x$, and the Mean Value Theorem
guarantees that there is a $c \in [\frac{1}{2}, 1]$ such that $|\cos x - \cos y| = |x - y| \sin c <
|x - y| \sin 1$ where $\sin 1 < 1$.
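The calculator experiment can be imitated in a few lines. This is a sketch under the assumption that 100 iterations are more than enough for double precision; the helper `orbit` is a hypothetical name for a generic "apply the map repeatedly" routine.

```python
import math

def orbit(f, x0, steps):
    """Return the list x0, f(x0), f(f(x0)), ... with steps applications of f."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

cos_orbit = orbit(math.cos, 0.0, 100)
fixed_point = cos_orbit[-1]

# The contraction constant sin(1) bounds how fast successive gaps shrink
# once the orbit has entered the interval [1/2, 1].
ratio_bound = math.sin(1.0)
print(fixed_point, ratio_bound)
```

Because each step shrinks distances by a factor smaller than $\sin 1$, the later terms of the orbit agree with the fixed point to as many digits as the floating point format can carry.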
The preceding examples suggest why contraction mappings are an important tool in
Numerical Analysis, a field that seeks out efficient algorithms for solving numerical
problems, often using computers. These examples find fixed points of functions
from R to R. But contraction mappings can be defined on other more complex
complete metric spaces. For example, the space of continuous functions on a closed
interval is a complete metric space. In particular, it can be shown that there is a
contraction mapping on this space of continuous functions whose fixed point is a
solution to a particular differential equation.
A differential equation of the first order is an equation relating the variables
$x$, $y$, and $y'$. The equation is called first order because it only involves first order
derivatives. A solution to such an equation is a function $y(x)$ that satisfies this
equation. For example, the equation $y' = 9x^2 y$ has solutions $y(x) = Ce^{3x^3}$ where $C$ is
any real number. By differentiating $y(x) = Ce^{3x^3}$, you can verify that it does satisfy
the differential equation. It is typical that the solution of a first order equation will
contain an arbitrary constant such as the $C$ that appears in this solution. This comes
from the fact that knowing a function's derivative only determines the function up to
an additive constant of integration. For this reason, a first order differential equation
is often stated by giving an initial condition $y(a) = b$ for some constants $a$ and $b$,
because then, the arbitrary constant in the solution can be determined. For example,
for the equation given above, if it were required that $y(1) = 5$, then the solution
would be $y(x) = \frac{5}{e^3} e^{3x^3}$. A differential equation along with an initial condition is
called an initial value problem.
In a first course in Differential Equations you learn a large number of techniques
that can be used to solve various differential equations. For example, the above
equation $y' = 9x^2 y$ can be solved by first dividing both sides of the equation
by $y$ to get $\frac{y'}{y} = 9x^2$ and then integrating both sides to find $\ln|y| = 3x^3 + K$.
Exponentiating both sides of this equation and letting $C$ equal either $e^K$ or $-e^K$ gives
the previously stated solution. It comes as a surprise to the student of Differential
Equations that even though the first course in the field contains many techniques
for solving differential equations, subsequent courses contain very little about how
to solve equations and concentrate instead on finding numerical approximations to
solutions and on theorems telling when solutions are expected to exist. One very
powerful theorem which shows there exist unique solutions to a fairly large class of
initial value problems is the Picard Existence Theorem which applies to equations
of the form $y' = F(x, y)$ with initial condition $y(a) = b$. The theorem requires that
there exists a compact set $R \subset \mathbb{R}^2$ surrounding the initial point $(a, b)$ such that $F$ is
continuous on that compact set $R$, and there is a constant $M$ such that for any points
$(x, y_1)$ and $(x, y_2)$ in $R$, the function $F$ satisfies $|F(x, y_1) - F(x, y_2)| \le M|y_1 - y_2|$.
This condition on the second variable of $F$ is known as a Lipschitz condition. The
theorem concludes that there is a $\delta > 0$ and a function $y(x)$ defined on the interval
$[a - \delta, a + \delta]$ such that $y(x)$ is the unique solution to the initial value problem
$y' = F(x, y)$ and $y(a) = b$.
The theorem can be proved by noticing that $y$ is a solution to the given initial
value problem if and only if $y$ satisfies the equation $y(x) = b + \int_a^x F(t, y(t))\,dt$.
But the function $G(y) = b + \int_a^x F(t, y(t))\,dt$ turns out to be a contraction mapping
on a space of continuous functions on an interval of the form $[a - \delta, a + \delta]$, and
the Contraction Mapping Theorem guarantees a unique fixed point to this equation
$y = G(y)$ which solves the initial value problem.
• Let the point $(a, b)$ be contained in the interior of the compact set $R \subset \mathbb{R}^2$.
• Let $F$ be a function continuous on $R$, and assume that there is a constant $M$
such that whenever $(x, y_1)$ and $(x, y_2)$ are in $R$, then $|F(x, y_1) - F(x, y_2)| \le
M|y_1 - y_2|$.
• Because $F$ is continuous on the compact set $R$, it is bounded. Thus, there is
a constant $K$ such that $|F(x, y)| \le K$ for all points $(x, y) \in R$.
• Because $(a, b)$ is in the interior of $R$, there is a $\delta > 0$ such that $M\delta < 1$ and
all the points $(x, y)$ satisfying $|x - a| \le \delta$ and $|y - b| \le K\delta$ lie within the
compact set $R$.
• Let $C$ be the metric space of all functions $y(x)$ continuous on the interval
$[a - \delta, a + \delta]$ satisfying $|y(x) - b| \le K\delta$ using the supremum metric.
• Note that $C$ is a complete metric space because it is just a closed subset of
the complete metric space of all the continuous functions on $[a - \delta, a + \delta]$.
• Define the mapping $G$ on $C$ by $G(y) = b + \int_a^x F(t, y(t))\,dt$.
• For each $y \in C$, $G(y)$ is an integral of the continuous function $F(t, y(t))$, so
$G(y)$ is a continuous function of $x$.
• If $y \in C$, then for any $x \in [a - \delta, a + \delta]$, it follows that
$|G(y)(x) - b| = \left|\int_a^x F(t, y(t))\,dt\right| \le \left|\int_a^x |F(t, y(t))|\,dt\right| \le \left|\int_a^x K\,dt\right| = K|x - a| \le K\delta$
showing that for all $y \in C$, $G(y)$ is also in $C$.
• Moreover, if $y_1$ and $y_2$ are both in $C$, then for all $x$ with $|x - a| \le \delta$, it follows
that $|G(y_1)(x) - G(y_2)(x)| = \left|\int_a^x F(t, y_1(t))\,dt - \int_a^x F(t, y_2(t))\,dt\right|
\le \left|\int_a^x |F(t, y_1(t)) - F(t, y_2(t))|\,dt\right| \le \left|\int_a^x M|y_1(t) - y_2(t)|\,dt\right|
\le M\delta \sup_{|x - a| \le \delta} |y_1(x) - y_2(x)|$. Because $M\delta < 1$, this shows that $G$ is a
contraction mapping on $C$.
• Thus, by the Contraction Mapping Theorem, there is a unique $y \in C$ such
that $y = G(y) = b + \int_a^x F(t, y(t))\,dt$.
• Clearly, $G(y)$ is equal to $b$ at $x = a$, and the Fundamental Theorem of
Calculus implies that $y' = F(x, y(x))$.
• Because $y$ satisfies the given initial value problem if and only if it satisfies
$y = G(y)$, this completes the proof.
Of course, this theorem only guarantees that there is a function $y(x)$ satisfying
the differential equation in a small neighborhood $[a - \delta, a + \delta]$. But if the points
$(a - \delta, y(a - \delta))$ and $(a + \delta, y(a + \delta))$ are in the interior of the compact set $R$, the
theorem can be applied again to extend the function. As a result, a function $y(x)$ can
be constructed whose graph extends toward the boundary of $R$.
Consider the problem of finding a solution to the first order differential equation
$y' = xy^2 + \sqrt{y}$ with the initial condition $y(0) = 2$. Common techniques for solving
differential equations do not yield a solution to this equation. Yet Picard's Existence
Theorem shows that the equation has a unique solution. In particular, let $R$ be the
rectangle of points satisfying $|x| \le 2$ and $|y - 2| \le 1$. On that rectangle the function
$xy^2 + \sqrt{y}$ is bounded by $2 \cdot 3^2 + \sqrt{3} < 20 = K$. Thus, you can choose $\delta = \frac{1}{20}$ so that
$\delta \le 2$ and $K\delta = 1 \le 1$. Then $|F(x, y_1) - F(x, y_2)| = |x(y_1^2 - y_2^2) + (\sqrt{y_1} - \sqrt{y_2})| =
\left|x(y_1 + y_2) + \frac{1}{\sqrt{y_1} + \sqrt{y_2}}\right| |y_1 - y_2| \le (2 \cdot 6 + \frac{1}{2})|y_1 - y_2| < 13|y_1 - y_2|$, so you can
choose $M = 13$. Then $M\delta = \frac{13}{20} < 1$. The theorem then shows that there is a unique
solution to the equation on the interval $[-\frac{1}{20}, \frac{1}{20}]$.
It is instructive to apply the method of Picard's Theorem in a situation with a
known outcome. For example, the initial value problem $y' = y$ with initial condition
$y(0) = 2$ has the solution $y(x) = 2e^x$. In this case the contraction mapping is
$G(y) = 2 + \int_0^x y(t)\,dt$. Suppose you begin with the function $y_0(x) = 2$ and iteratively
apply the contraction mapping to see the orbit of $y_0$.
$$y_1(x) = G(y_0) = 2 + \int_0^x 2\,dt = 2 + 2x$$
$$y_2(x) = G(y_1) = 2 + \int_0^x (2 + 2t)\,dt = 2 + 2x + 2\frac{x^2}{2!}$$
$$y_3(x) = G(y_2) = 2 + \int_0^x \left(2 + 2t + 2\frac{t^2}{2!}\right) dt = 2 + 2x + 2\frac{x^2}{2!} + 2\frac{x^3}{3!}$$
$$y_4(x) = G(y_3) = 2 + \int_0^x \left(2 + 2t + 2\frac{t^2}{2!} + 2\frac{t^3}{3!}\right) dt = 2 + 2x + 2\frac{x^2}{2!} + 2\frac{x^3}{3!} + 2\frac{x^4}{4!}.$$
This shows that the orbit of $y_0$ is just the partial sums of the power series for $2e^x$
centered at 0, a satisfying result even if it is not surprising.
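The same orbit can be generated mechanically by storing each polynomial as a list of coefficients. The sketch below is an illustration only: the names `picard_step` and `picard_orbit` are hypothetical, and it handles just the special case $y' = y$, $y(0) = b$, where the integrand is the current polynomial itself.

```python
from fractions import Fraction
from math import factorial

def picard_step(coeffs, b):
    """Apply G(y)(x) = b + integral from 0 to x of y(t) dt to a polynomial.

    coeffs[i] is the coefficient of x**i. Integrating moves each coefficient
    up one degree and divides it by the new degree.
    """
    integrated = [c / (i + 1) for i, c in enumerate(coeffs)]
    return [Fraction(b)] + integrated

def picard_orbit(b, steps):
    """Return the iterates y_0, y_1, ..., y_steps starting from y_0(x) = b."""
    y = [Fraction(b)]
    orbit = [y]
    for _ in range(steps):
        y = picard_step(y, b)
        orbit.append(y)
    return orbit

orbit = picard_orbit(2, 4)
y4 = orbit[4]

# Coefficients of the degree 4 partial sum of the power series for 2e^x.
expected = [Fraction(2, factorial(i)) for i in range(5)]
print(y4 == expected)
```

Exact rational arithmetic is used so that the coefficients can be compared directly with $\frac{2}{i!}$ instead of relying on rounded decimals.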
10.11.3 Fractals
a minimum value on the compact set $A$, which is $\min_{x \in A} d(x, y)$. Then, if $A$ and $B$ are
both elements of $H$, define $h^*(A, B) = \max_{y \in B} \min_{x \in A} d(x, y)$ which is the largest distance
any point of $B$ is from the set $A$. Again, since the Euclidean distance function is
continuous, and $B$ is compact, this $\min_{x \in A} d(x, y)$ attains a maximum value on $B$, and
there are points $x \in A$ and $y \in B$ such that $d(x, y) = h^*(A, B)$. Because $h^*(A, B)$
need not be the same as $h^*(B, A)$, define $h(A, B) = \max\left(h^*(A, B), h^*(B, A)\right)$. This
distance function is known as the Hausdorff metric.
For example, let $A$ be the closed disk consisting of all the points in $\mathbb{R}^2$ within
1 of the origin, and let $B$ be the closed disk consisting of all the points in $\mathbb{R}^2$
within 2 of the origin. Clearly, $A \subset B$. As a result $h^*(B, A) = 0$. But $h^*(A, B) =
d\left((2, 0), (1, 0)\right) = 1$, so $h(A, B) = \max\left(h^*(A, B), h^*(B, A)\right) = \max(1, 0) = 1$.
Intuitively, there are points in $B$ that are a distance 1 from the set $A$, and this is
the largest distance from points in $B$ to the set $A$. As a second example, let $A$ be
the square with vertices at $(1, 1)$, $(-1, 1)$, $(-1, -1)$, and $(1, -1)$, and let $B$ be the
line segment from $(-1, 2)$ to $(1, 2)$. Since each point in $B$ is a distance 1 from
$A$, $h^*(A, B) = 1$. Since the largest distance from a point in $A$ to the set $B$ is
$d\left((0, -1), (0, 2)\right) = 3$, $h^*(B, A) = 3$. Thus, $h(A, B) = 3$.
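For finite sets of points, which are compact, the definition can be evaluated directly. The following sketch uses hypothetical names `h_star` and `hausdorff`, and the two small point sets are invented for illustration rather than taken from the examples above.

```python
import math

def h_star(A, B):
    """One-sided distance: the largest distance from a point of B to the set A."""
    return max(min(math.dist(a, b) for a in A) for b in B)

def hausdorff(A, B):
    """Hausdorff distance: the larger of the two one-sided distances."""
    return max(h_star(A, B), h_star(B, A))

# Illustrative finite sets: A is contained in B, and B has one far-away point.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]

print(h_star(A, B))     # how far B sticks out from A
print(h_star(B, A))     # zero, because every point of A already lies in B
print(hausdorff(A, B))
```

As in the disk example, the one-sided distance vanishes in the direction where one set contains the other, so the larger one-sided distance alone determines $h(A, B)$.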
What does it take to show that $\langle H, h\rangle$ is a metric space? Because $H$ is clearly a
nonempty set, and $h(A, B)$ is defined for all $A$ and $B$ in $H$, one merely has to verify
the conditions for $h$ to be a metric. The fact that $h$ is always a nonnegative real
number follows immediately from its definition, as does the fact that $h(A, B) =
h(B, A)$. Is it true for all $A \in H$ that $h(A, A) = 0$? Yes, this follows from the fact
that if $y \in A$, then $\min_{x \in A} d(x, y) = 0$, so $h(A, A) = h^*(A, A) = \max_{y \in A} \min_{x \in A} d(x, y) = 0$.
If $A \ne B$, then there is either a point $x \in A \setminus B$ or an $x \in B \setminus A$. If a set is compact,
then any point a distance 0 from the set must be in the boundary of the set and is,
therefore, an element of the set. From this it follows that either $\min_{y \in A} d(x, y) > 0$ or
$\min_{y \in B} d(x, y) > 0$, so either $h^*(A, B) > 0$ or $h^*(B, A) > 0$, and $h(A, B) > 0$.
Showing that $h$ satisfies the triangle inequality is a little trickier and relies on a
careful look at the definitions of $h$ and $h^*$. Let $A$, $B$, and $C$ be elements of $H$. To
show that $h(A, C) \le h(A, B) + h(B, C)$, begin by showing that for every $a \in A$,
$\min_{c \in C} d(a, c) \le h(A, B) + h(B, C)$. So if $a \in A$, then it is true for all $b \in B$
that $\min_{c \in C} d(a, c) \le \min_{c \in C} \left(d(a, b) + d(b, c)\right) = d(a, b) + \min_{c \in C} d(b, c)$. In particular,
this is true for the $b^* \in B$ which minimizes $d(a, b)$, so
$\min_{c \in C} d(a, c) \le d(a, b^*) + \min_{c \in C} d(b^*, c) \le \min_{b \in B} d(a, b) + \max_{b \in B} \min_{c \in C} d(b, c)
\le h^*(B, A) + \max_{b \in B} \min_{c \in C} d(b, c) \le h(A, B) + h^*(C, B) \le h(A, B) + h(B, C)$.
Since all the distances $\min_{c \in C} d(a, c)$ are
bounded by $h(A, B) + h(B, C)$, their maximum also has the same bound, so $h^*(C, A) \le
h(A, B) + h(B, C)$. The same argument shows that $h^*(A, C) \le h(A, B) + h(B, C)$, so
$h(A, C) \le h(A, B) + h(B, C)$ which is the desired triangle inequality.
Moreover, $\langle H, h\rangle$ is a complete metric space. A natural way to prove this
completeness is to take a Cauchy sequence of sets in $H$ and show that this sequence
converges in the $h$ metric to some set $L \in H$. So let $\langle A_n\rangle$ be a Cauchy sequence in
$H$. The strategy for this proof is to construct another sequence of sets, $\langle T_n\rangle$, so that
each $T_n$ is close to one of the terms of the $\langle A_n\rangle$ sequence. Then it is shown that
the $\langle T_n\rangle$ sequence converges to a set $L$, and since the terms of the $\langle A_n\rangle$ sequence
are close to the terms of the $\langle T_n\rangle$ sequence, the $\langle A_n\rangle$ sequence will also converge
to $L$.
Because $\langle A_n\rangle$ is a Cauchy sequence, for each natural number $n$, there is an $N_n$
such that for every $k$ and $m$ greater than or equal to $N_n$, $h(A_k, A_m) < \frac{1}{2^n}$. Then define
$T_n = \{x \in \mathbb{R}^2 \mid \min_{a \in A_{N_n}} d(a, x) \le \frac{1}{2^n}\}$. $T_n$ can be thought of as a cloud or halo around
the set $A_{N_n}$. In particular, $T_n$ contains $A_{N_n}$ and all of the points within $\frac{1}{2^n}$ of $A_{N_n}$. Since
$h(A_m, A_{N_n}) < \frac{1}{2^n}$ for all $m \ge N_n$, it follows that $A_m \subseteq T_n$ for all $m \ge N_n$. It is also
clear that since $T_n$ contains $A_{N_{n+1}}$, that $T_n$ contains $T_{n+1}$, and $\langle T_n\rangle$ is a sequence of
nested sets. It is also easy to show that $T_n$ is nonempty, bounded, and closed so that
$T_n \in H$.
Now you can define $L = \bigcap_{n=1}^{\infty} T_n$. This set is not empty because if you intersect a
nested sequence of nonempty compact sets, you get a nonempty set. $L$ is bounded
because it is contained in the bounded set $T_1$, and it is closed because it is the
intersection of closed sets. Thus, $L \in H$. To show that $\lim_{n \to \infty} T_n = L$, you would
want to estimate $h(T_n, L)$. Since $L \subseteq T_n$, it follows that $h^*(T_n, L) = 0$. To
estimate $h^*(L, T_n)$, for each $x_n \in T_n$, the proof will construct a sequence of points
$x_n, x_{n+1}, x_{n+2}, \ldots$ where, for each $m \ge n$, $x_m \in T_m$ and $d(x_m, x_{m+1}) \le \frac{1}{2^{m-2}}$. This is
enough to show that the sequence $x_n, x_{n+1}, x_{n+2}, \ldots$ is Cauchy, so it will converge
to a point $x$ which must end up being a member of the set $L$. From this, it will follow
that $d(x_n, x) < \frac{1}{2^{n-3}}$, so $h^*(L, T_n) = \max_{b \in T_n} \min_{a \in L} d(a, b) < \frac{1}{2^{n-3}}$. Thus, $h(T_n, L) < \frac{1}{2^{n-3}}$.
But this is enough to show that $L = \lim_{n \to \infty} T_n$. It is then a simple argument to show
that $\lim_{n \to \infty} A_n = L$ which proves that all Cauchy sequences in $H$ converge, so $H$ is
complete.
• Let $\mathbb{R}^2$ be the metric space with Euclidean distance function $d(x, y)$.
• Let $H$ be the space of all nonempty compact subsets of $\mathbb{R}^2$ with the metric
$h(A, B) = \max\left(h^*(A, B), h^*(B, A)\right)$ where $h^*(A, B) = \max_{b \in B} \min_{a \in A} d(a, b)$.
• Let $A_1, A_2, A_3, \ldots$ be a Cauchy sequence in $H$.
• Because the sequence is Cauchy, for every natural number $n$ there is an $N_n$
such that for every $k$ and $m$ greater than or equal to $N_n$, $h(A_k, A_m) < \frac{1}{2^n}$.
• Define $T_n = \{x \in \mathbb{R}^2 \mid \min_{a \in A_{N_n}} d(a, x) \le \frac{1}{2^n}\}$.
• $T_n$ is nonempty because it contains the nonempty set $A_{N_n}$.
Knowing that $\langle H, h\rangle$ is a complete metric space shows that every contraction
mapping on $H$ has a unique fixed point. Some fractals can be generated as the result
of being a fixed point of such a contraction mapping $f : H \to H$. In the words of
the study of fractals, these fixed points are called attractors. What it means is that
if you begin with any nonempty compact set $A$, the orbit of $A$ under the contraction
mapping, $A, f(A), f(f(A)), f(f(f(A))), \ldots$, will converge to this attractor.
Where can you find a contraction mapping $f : H \to H$ with an interesting fractal
as its attractor? The first thing to notice is that if $f : \mathbb{R}^2 \to \mathbb{R}^2$ is a contraction
mapping on $\mathbb{R}^2$, then the usual extension of $f$ to subsets of $\mathbb{R}^2$ given by $f(A) =
\{y \in \mathbb{R}^2 \mid y = f(x) \text{ for some } x \in A\}$ gives a function $f : H \to H$ that is a
contraction mapping on $H$. To see this, let $d$ be the usual Euclidean distance metric
on $\mathbb{R}^2$, let $k$ be a positive real number less than 1, and let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be a
contraction mapping satisfying $d(f(x), f(y)) \le k\,d(x, y)$ for all $x$ and $y$ in $\mathbb{R}^2$.
Then, if $A$ and $B$ are in $H$, it follows that $h^*(f(A), f(B)) = \max_{b \in f(B)} \min_{a \in f(A)} d(a, b) =
\max_{b \in B} \min_{a \in A} d(f(a), f(b)) \le \max_{b \in B} \min_{a \in A} k\,d(a, b) = k\,h^*(A, B)$, and it follows that
$h(f(A), f(B)) \le k\,h(A, B)$.
Next, suppose that $f_1, f_2, f_3, \ldots, f_s$ is a finite set of contraction mappings on $\mathbb{R}^2$
(therefore on $H$) with associated contraction constants $k_1, k_2, k_3, \ldots, k_s$, respectively.
Then the mapping $F : H \to H$ given by $F(A) = \bigcup_{j=1}^{s} f_j(A)$ is also a
contraction mapping on $H$ with contraction constant $k = \max(k_1, k_2, k_3, \ldots, k_s)$.
This follows from the easily established fact that if $A$, $B$, $C$, and $D$ are in $H$, then
$h(A \cup B, C \cup D) \le h(A, C) + h(B, D)$.
So what functions should one choose for $f_1, f_2, f_3, \ldots, f_s$? Any contraction
mappings of the plane will give rise to some attractor. Linear transformations are
easy to construct and easy to understand, but they are limited because they map
points near the origin to points near the origin, so the origin is always the fixed
point of such transformations. Almost as easy are the affine transformations
which are functions that perform a linear transformation followed by a simple
translation. In other words, for every affine function on $\mathbb{R}^2$ there are six constants,
$a_1, a_2, b_1, b_2, c_1,$ and $c_2$, such that the affine function maps a point $(x, y) \in \mathbb{R}^2$ to
$(a_1 x + b_1 y + c_1, a_2 x + b_2 y + c_2)$. The Sierpinski triangle can be generated using
such a collection of transformations where the constants are given by
Function   a1     a2     b1     b2     c1     c2
f1         0.5    0      0      0.5    0      0
f2         0.5    0      0      0.5    0.5    0
f3         0.5    0      0      0.5    0.25   $0.25\sqrt{3}$

Function   a1     a2     b1     b2     c1     c2
f1         0      0      0      0.16   0      0
f2         0.85   0.04   0.04   0.85   0      1.6
f3         0.2    0.26   0.23   0.22   0      1.6
f4         0.15   0.28   0.26   0.24   0      0.44
Figure 10.15 shows what happens when this transformation is applied to a simple
equilateral triangle in the plane as shown by the first picture in the figure. Applying
the function once results in the second picture showing that the triangle is mapped
into four separate skewed copies by the four pieces of the transformation. The third
picture shows what happens to this after a second iteration of the transformation.
By the time the transformation has been iterated three times, the image set already
appears to be a recognizable object quite different from the original triangle. The
figure also shows what the set looks like after 5, 10, 15, and 20 iterations. After 20
iterations, the image appears to be quite close to the attractor for this transformation
and does look a great deal like a fern.
Fig. 10.15 Generation of a fractal fern by applying a contraction mapping to a triangle. Shown is
the original triangle and its image after 1, 2, 3, 5, 10, 15, and 20 iterations of the transformation
There are many fractal generating programs available either as web applications
or as stand-alone programs. The interested reader may well find it useful to
investigate the convergence properties of the transformations discussed here and
other similar transformations obtainable by either small or perhaps large changes in
the given affine maps.
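One way to begin such an investigation is to apply the union of several affine maps to a finite set of points over and over. The sketch below is only an illustration: it uses the three Sierpinski triangle maps listed earlier, the names `SIERPINSKI`, `apply_affine`, `union_map`, and `iterate` are hypothetical, and a single starting point stands in for a general compact set.

```python
import math

# Each map sends (x, y) to (a1*x + b1*y + c1, a2*x + b2*y + c2).
SIERPINSKI = [
    # (a1, a2, b1, b2, c1, c2)
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.25 * math.sqrt(3)),
]

def apply_affine(coeffs, point):
    """Apply one affine map, given as six constants, to a point of the plane."""
    a1, a2, b1, b2, c1, c2 = coeffs
    x, y = point
    return (a1 * x + b1 * y + c1, a2 * x + b2 * y + c2)

def union_map(maps, points):
    """F(A): the union of the images of the point set under every map."""
    return {apply_affine(m, p) for m in maps for p in points}

def iterate(maps, points, steps):
    """Apply the union map the given number of times, starting from points."""
    current = set(points)
    for _ in range(steps):
        current = union_map(maps, current)
    return current

# Six iterations starting from the single point at the origin.
image = iterate(SIERPINSKI, {(0.0, 0.0)}, 6)
print(len(image))
```

Plotting the points in `image` with any graphing tool, and then changing the constants slightly, gives a quick way to see how the attractor responds to changes in the affine maps.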
10.11.4 Exercises