Honors Calculus

Pete L. Clark

c Pete L. Clark, 2014.
Thanks to Ron Freiwald, Eilidh Geddes, Melody Hine, Nita Jain, Andrew Kane,
Bryan Oakley, Elliot Outland, Didier Piau, Jim Propp, Troy Woo, and Sharon
Yao.

Contents

Foreword 7
Spivak and Me 7
What is Honors Calculus? 10
Some Features of the Text 10

Chapter 1. Introduction and Foundations 13
1. Introduction 13
2. Some Properties of Numbers 17

Chapter 2. Mathematical Induction 25
1. Introduction 25
2. The First Induction Proofs 27
3. Closed Form Identities 29
4. Inequalities 31
5. Extending Binary Properties to n-ary Properties 32
6. The Principle of Strong/Complete Induction 34
7. Solving Homogeneous Linear Recurrences 35
8. The Well-Ordering Principle 39
9. The Fundamental Theorem of Arithmetic 40

Chapter 3. Polynomial and Rational Functions 43
1. Polynomial Functions 43
2. Rational Functions 49

Chapter 4. Continuity and Limits 53
1. Remarks on the Early History of the Calculus 53
2. Derivatives Without a Careful Definition of Limits 54
3. Limits in Terms of Continuity 56
4. Continuity Done Right 58
5. Limits Done Right 62

Chapter 5. Differentiation 71
1. Differentiability Versus Continuity 71
2. Differentiation Rules 72
3. Optimization 77
4. The Mean Value Theorem 81
5. Monotone Functions 85
6. Inverse Functions I: Theory 92
7. Inverse Functions II: Examples and Applications 97
8. Some Complements 104
3

4 CONTENTS

Chapter 6. Completeness 105
1. Dedekind Completeness 105
2. Intervals and the Intermediate Value Theorem 113
3. The Monotone Jump Theorem 116
4. Real Induction 117
5. The Extreme Value Theorem 119
6. The Heine-Borel Theorem 119
7. Uniform Continuity 120
8. The Bolzano-Weierstrass Theorem For Subsets 122
9. Tarski’s Fixed Point Theorem 123

Chapter 7. Differential Miscellany 125
1. L’Hôpital’s Rule 125
2. Newton’s Method 128
3. Convex Functions 136

Chapter 8. Integration 149
1. The Fundamental Theorem of Calculus 149
2. Building the Definite Integral 152
3. Further Results on Integration 161
4. Riemann Sums, Dicing, and the Riemann Integral 170
5. Lesbesgue’s Theorem 176
6. Improper Integrals 179
7. Some Complements 182

Chapter 9. Integral Miscellany 185
1. The Mean Value Therem for Integrals 185
2. Some Antidifferentiation Techniques 185
3. Approximate Integration 187
4. Integral Inequalities 194
5. The Riemann-Lebesgue Lemma 196

Chapter 10. Infinite Sequences 199
1. Summation by Parts 200
2. Easy Facts 201
3. Characterizing Continuity 203
4. Monotone Sequences 203
5. Subsequences 206
6. The Bolzano-Weierstrass Theorem For Sequences 208
7. Partial Limits; Limits Superior and Inferior 211
8. Cauchy Sequences 214
9. Geometric Sequences and Series 216
10. Contraction Mappings Revisited 217
11. Extending Continuous Functions 225

Chapter 11. Infinite Series 229
1. Introduction 229
2. Basic Operations on Series 233
3. Series With Non-Negative Terms I: Comparison 236
4. Series With Non-Negative Terms II: Condensation and Integration 241

CONTENTS 5

5. Series With Non-Negative Terms III: Ratios and Roots 247
6. Absolute Convergence 250
7. Non-Absolute Convergence 254
8. Power Series I: Power Series as Series 260
Chapter 12. Taylor Taylor Taylor Taylor 265
1. Taylor Polynomials 265
2. Taylor’s Theorem Without Remainder 266
3. Taylor’s Theorem With Remainder 269
4. Taylor Series 271
5. Hermite Interpolation 277
Chapter 13. Sequences and Series of Functions 281
1. Pointwise Convergence 281
2. Uniform Convergence 283
3. Power Series II: Power Series as (Wonderful) Functions 288
Chapter 14. Serial Miscellany 291
π2
P∞ 1
1. n=1 n2 = 6 291
2. Rearrangements and Unordered Summation 293
3. Abel’s Theorem 302
4. The Peano-Borel Theorem 306
5. The Weierstrass Approximation Theorem 310
6. A Continuous, Nowhere Differentiable Function 313
7. The Gamma Function 315
Chapter 15. Several Real Variables and Complex Numbers 321
1. A Crash Course in the Honors Calculus of Several Variables 321
2. Complex Numbers and Complex Series 323
3. Elementary Functions Over the Complex Numbers 325
4. The Fundamental Theorem of Algebra 327
Chapter 16. Foundations Revisited 333
1. Ordered Fields 334
2. The Sequential Completion 342
Bibliography 351

Foreword

Spivak and Me
The document you are currently reading began its life as the lecture notes for a year
long undergraduate course in honors calculus. Specifically, during the 2011-2012
academic year I taught Math 2400(H) and Math 2410(H), Calculus With Theory,
at the University of Georgia. This is a course for unusually talented and motivated
(mostly first year) undergraduate students. It has been offered for many years at
the University of Georgia, and so far as I know the course text has always been
Michael Spivak’s celebrated Calculus [S]. The Spivak text’s take on calculus is
sufficiently theoretical that, although it is much beloved by students and practi-
tioners of mathematics, it is seldomed used nowadays as a course text. In fact, the
UGA Math 2400/2410 course is traditionally something of an interpolation between
standard freshman calculus and the fully theoretical approach of Spivak. My own
take on the course was different: I treated it as being a sequel to, rather than an
enriched revision of, freshman calculus.

I began the course with a substantial familiarity with Spivak’s text. The sum-
mer after my junior year of high school I interviewed at the University of Chicago
and visited Paul Sally, then (and still now, as I write this in early 2013, though
he is 80 years old) Director of Undergraduate Mathematics at the University of
Chicago. After hearing that I had taken AP calculus and recently completed a
summer course in multivariable calculus at Johns Hopkins, Sally led me to a sup-
ply closet, rummaged through it, and came out with a beat up old copy of Spivak’s
text. “This is how we do calculus around here,” he said, presenting it to me. During
my senior year at high school I took more advanced math courses at LaSalle Uni-
versity (which turned out, almost magically, to be located directly adjacent to my
high school) but read through Spivak’s text. And I must have learned something
from it, because by the time I went on to college – of course at the University of
Chicago – I placed not into their Spivak calculus course, but the following course,
“Honors Analysis in Rn ”. This course has the reputation of sharing the honor with
Harvard’s Math 55 of being the hardest undergraduate math course that American
universities have to offer. I can’t speak to that, but it was certainly the hardest
math course I ever took. There were three ten week quarters. The first quarter was
primarily taught out of Rudin’s classic text [R], with an emphasis on metric spaces.
The second quarter treated Lebesgue integration and some Fourier analysis, and
the third quarter treated analysis on manifolds and Stokes’s theorem.
However, that was not the end of my exposure to Spivak’s text. In my second
year of college I was a grader for the first quarter of Spivak calculus, at the (even
then) amazingly low rate of $20 per student for the entire 10 week quarter. Though

7

His words are (very) well chosen. motivated students with little prior background and exposure to university level mathematics (in our day. I found that there were many truly dif- ficult problems in Spivak’s text. but I could imagine a student underwhelmed by an ordinary freshman calculus class for which Spivak’s text would be a similarly mighty gift). Chapter 3 is on Functions. but few. including a non-negligible percentage that I still did not know how to solve. The text is broken up into five “parts” of which the third is Derivatives and Integrals. When given the chance to introduce a subtlety or ancillary topic. and Chapter 4 is on Graphs. and I had many further interactions with Spivak’s text. (By the end of my undergraduate career there were a small number of double-starred problems of which I well knew to steer clear. and I now think its best use is in fact the way I first encountered it myself: as a source of self study for bright. (ii) The text itself is lively and idiosyncratic. one long paragraph. I have had many years to reflect on Spivak’s text. the text itself is rather spare. and eventually circle a single sentence which carried the entire content that the writer was trying to express. meaning that I would field questions from any undergraduate math course that a student was having trouble with. but I was a “drop in tutor” for my last three years of college. It was also my first experience with reading proofs written by bright but very inexperienced authors: often I would stare at an entire page of text. but increasingly I realized it was not the ideal accompaniment for my course.but not in a completely straightforward way. and any such text can a fortiori also be used in a course. After teaching the 2400 course for a while. More recently I have begun doing so myself. Being “good for self study” is a high compliment to pay a text. For my course.) Here is what I remembered about Spivak’s text in fall of 2011: (i) It is an amazing trove of problems. which are essentially the ordered field axioms. although not called such.8 FOREWORD the material in Spivak’s text was at least a level below the trial by fire that I had successfully endured in the previous year. In particular limits are not touched until Chapter 5. all instances of third person pronouns referring to mathematicians had been changed from “he” to “she”. Initially I found this amusing but silly.. I realized that my much more vivid memories of the problems than the text itself had some basis in fact: although Spivak writes with a lively and distinct voice and has many humorous turns of phrase. this probably means high school students. For one thing. When one takes into account the ample margins (sometimes used for figures.1 (iii) The organization is somewhat eccentric. Grading for this course solidified my knowledge of this elementary but important material. I only graded for one quarter after that. The text begins with a chapter “basic properties of numbers”. Spivak almost inevitably defers it to the problems. although the 1In between the edition of the book that Sally had given me and the following edition. These chapters are essentially con- tentless. . but most often a white expanse) the chapters themselves are very short and core-minded.. I lost no esteem for Spivak’s text. some of which are truly difficult.

I do not view this borrowing as being in any way detrimental. and much of my teaching time was spent “reconstructing” it. There were many beautiful. and I spent the majority of them helping the students make their way through the problems in Spivak’s text. [H]. and I certainly do not attempt to suppress my reliance on other texts. constructing and critiquing proofs. students benefit from this exposure. by the way. It is also heavily indebted to Rudin’s classic text [R]. and I tried to take advantage of this by insisting that students present their questions and arguments to each other as well as to me. [A]. probably more so. understanding. For me. SPIVAK AND ME 9 students who stuck with it were very motivated and hard-working. I think experienced mathematicians can forget the extent to which excellent. I am more thinking of the term in the sense that chefs use it. The mathematics exposed here is hundreds of years old and has been treated excellently in many famous texts. As explained above. My experience rather convinced me that “more is more”: students wanted to see more arguments in complete detail. the instructor is not leaving less for the students to do. and not merely a gloss of [S]. it is – or at least. [L]. Of course they also benefit immensely. and this did indeed come to pass: I ended up having office hours four days a week. by working things out for themselves. and also simply more proofs over- all. This brings me to the current text. but only different things for them to do. I had been informed in advance by my colleague Ted Shifrin that signing on to teach this course should entail committing to about twice as many office hours as is typical for an under- graduate course. even brilliant. Aside from the (substantial) content conveyed in the course. The value of this is certainly not in question. e. and there is no competition here: the amount of “things to do” in honors calculus is infinite – and the repository of problems in Spivak’s text is nearly so – so by working out more for herself. I mean it to be – a new honors calculus text. An undergraduate mathematics text which strove to be unlike its predecessors would 2My conception of the meaning of deconstruction in the sense of academic humanities is painfully vague. . multi-part problems introducing extra material that I wanted my students to experience: and eventually I figured out that the best way to do this was to incorporate the problems into my lectures as theorems and proofs. Indeed: • The text is indebted to [S]. However. In this regard being in a classroom setting with other motivated students is certainly invaluable. a major goal is to give the students facility with reading. it is heavily indebted to [S]. But students also learn about how to reason and express themselves mathemat- ically by being repeatedly exposed to careful. complete proofs. the lightness of touch of Spivak’s approach was ultimately something I appreciated aesthetically but could see was causing my students difficulty at vari- ous key points. Even the best students – who. I thought were awfully good – could not solve some of the problems unassisted.g. but not uniquely so. most (or all) of them needed a lot of help from me in all aspects of the course. and it borrows at key points from several other sources. [Go]. All in all I ended up viewing Spivak’s book as being something of a “deconstruc- tion”2 of the material.

10 FOREWORD almost certainly do so to the detriment of its audience. most undergraduate texts with “calculus” in the title offer very little innovation indeed. This theoretical perspective is of course a huge change from the practical. It is distressing to me how many calculus books are written which look like nearly exact replicas of some platonic (but far from perfect) calculus text written circa 1920. and its distinctiveness – some might say eccentricity – has not gone unnoticed by instructors. those in the honors program) are signed up for Math 2400 without any real idea of what they are getting into.) Instead we make -δ arguments which amount to this. The attrition at the beginning of the course is therefore significant. we never use these concepts in the main text. Some Features of the Text • Our approach is heavily theoretical but not heavily abstract. This text is also eccentric. in fact probably even more so than Spivak’s text. What is Honors Calculus? It was audacious of Spivak to call his text simply “calculus. We are not interested in calculation for its own sake. It is certainly not as appealing to as broad an audience as freshman calculus. real induction. This text does contain some novelties. of course. (More precisely.5 introduces metric spaces and discussed compactness therein. you need not like this one. which to my knowledge has not appeared before in texts. if not wholly new texts.10. Apparently the issue of treating transcendental functions earlier rather than later was enough to warrant many new editions. up to and including a proof technique. Spivak’s text is a great innovation. What is honors calculus? By “honors calculus” I mean a certain take on undergraduate real analysis. Having said this. Let us freely admit that this concreteness makes certain arguments longer . The idea of presenting a fully theoretical take on real analysis to a young un- dergraduate audience is hardly new. problem-oriented approach of freshman calculus (with which we assume a prior familiarity). it happens that many freshman students (especially. and in its own way: if you like Spivak’s text. making use of concepts from set theory and topology. and we do not give any halfway serious applications to any subject outside mathe- matics. which inevitably turn on the completeness of R.g. § 10. and the students that remain are not necessarily the most talented or experienced but rather those that remain interested in a fully theoretical approach to calculus. especially rigorous definitions of the various limiting processes and proofs of the “hard theorems” of calculus.” since it is light years away from any other text of that name currently in print. But most contemporary texts on real analysis take a more sophisticated approach. The jumping off point for an honors calculus course is a theoretical understanding of the results seen in freshman calculus. I have chosen to call this text: honors calculus. In this text we rarely speak explicitly about the connected- ness and compactness of intervals on the real line. At the University of Georgia. but only in some digressions which briefly and optionally present some advanced material: e. This text assumes that interest on the part of the reader.

we prove the Interval Theorems using -δ arguments and later come back to give much quicker proofs using sequences and the Bolzano-Weierstrass Theorem. including uniform convergence. One may well say that this is evidence that sequences should be treated 3In fact. I got “permission” to do so while teaching the analysis class at McGill. but I have come to believe that it is much less use- ful – and moreover. For instance. When preparing to teach the course I was quite surprised at how pedestrian the syllabus seemed: an entire year of real analysis was restricted to the one-dimensional case. I came up with this approach in late 2004 while preparing for the McGill analysis course. Just a few days later I noticed that Lang had done (essentially) the same thing in his text [L]. Without conscious thought I had assumed otherwise. I was encouraged that this idea had been used by others. The topics of the course were: infinite series. In particular. much less educational – in basic real analysis than in most other branches of pure mathematics. Later I realized that Gordon is in real life an integration theorist! More significantly. Exception: We begin our treatment of the Riemann integral with an axiomatic ap- proach: that is. Most texts at this level cover only one of these in detail. for instance. in this text we usually opt for the former. but here we cover both: first Darboux. and Gaston Dar- boux’s later simplification using upper and lower sums and integrals. • We often spend time giving multiple proofs and approaches to basic results. SOME FEATURES OF THE TEXT 11 and harder (or phrased more positively: one merit of abstraction is to make certain arguments shorter and easier ). This approach to the Riemann Integral resonates deeply with me. I didn’t find out until just after the course ended that the students did not know about countable and uncountable sets. A turning point in my feelings about this was a second semester undergraduate real analysis course I taught at McGill Uni- versity in 2005. Given the choice between pounding out an -δ argument and developing a more abstract or softer technique that leads to an easier proof. and portions of these notes appear here as parts of Chapters 11 through 14. and sequences and series of functions. that’s not strictly true: sometimes we do both. . then Riemann.3 I was strictly forbidden from talking about metric spaces. Well. since this was done in the official course text of R. we list some reasonable properties (axioms) that an area functional should satisfy. and I endorse it still. as it addresses concerns that bubbled up over several years of my teaching this material in the context of freshman calculus. and before we do the hard work of constructing such a functional we explore the consequences of these axioms. I am a fan of abstraction. less than two pages into our discussion of integration we state and prove (given the axioms!) the Fun- damental Theorem of Calculus. Gordon [Go]. there are two distinct approaches to the Riemann integral: Riemann’s original approach using tagged partitions and Riemann sums. the Riemann integral. and there was no topology or set theory whatsoever. By the end of the course I had acquired more respect for the deep content inherent in these basic topics and the efficacy of treating them using no more than -δ arguments and the completeness axiom. This was also the first course I taught in which I typed up lecture notes.

in which honors calculus essentially begins with a thorough grappling with the -δ definition of limit. most notably [R]. and many texts do take this approach. Thus we do not put it on the stage in Act I.12 FOREWORD at the beginning of the course rather than towards the end. However I endorse Spivak’s ordering of the material. . Although there are easier – and especially. softer – approaches to most of the key theorems. I feel that manipulation of inequalities that characterized hard analysis is a vitally important skill for students to learn and this is their best chance to learn it. Scene I. when people are still settling into their seats. • We view the completeness axiom as the star of the show.

b] such that f (x) = m. Introduction 1. in homework and on exams. (Intermediate Value Theorem) Let f : [a. b]. b] such that f (x) = M . the student. b] → R be a continuous function defined on a closed. (Extreme Value Theorem) Let f : [a. m ≤ f (x) ≤ M . (Uniform Continuity and Integrability) Let f : [a. bonuded interval. here are three of the fundamental results of calculus. e. bounded interval. Theorem 1.2. b] → R be a continuous function defined on a closed.3. Except for the part about uniform continuity. c) There exists at least one x ∈ [a. Then f is bounded and assumes its maximum and minimum values: there are numbers m ≤ M such that a) For all x ∈ [a.g. CHAPTER 1 Introduction and Foundations 1. these theorems are familiar results from freshman calculus. because they all concern an arbitrary continuous function defined on a closed. The goal of this course is to cover the material of single variable calculus in a mathematically rigorous way. the 1The definition of this is somewhat technical and will be given only later on in the course.1 Rb b) f is integrable: a f exists and is finite. Theorem 1. Suppose that f (a) < 0 and f (b) > 0. b] → R be a continuous function defined on a closed. As examples. The latter phrase is important: in most calculus classes the emphasis is on techniques and applications. b) There exists at least one x ∈ [a. The Goal: Calculus Made Rigorous. at the undergraduate level and beyond. Please don’t worry about it for now. Their proofs. bounded interval. but they will also be presented by you. at least – the three Interval Theorems. 13 . Most freshman cal- culus texts like to give at least some proofs. Then: a) f is uniformly continuous. are not. Theorem 1. so it is often the case that these three theorems are used to prove even more famous theorems in the course. while theoretical explana- tions may be given by the instructor – e.1. however.1. they are called – by me. This course is very different: not only will theorems and proofs be presented in class by me. Then there exists c with a < c < b such that f (c) = 0.g. bounded interval. it is usual to give some discussion of the meaning of a continuous function – the student tests her understanding of the theory mostly or entirely through her ability to apply it to solve problems. This course offers a strong foundation for a student’s future study of mathematics.

. However. 1. 2. . N = {0} ∪ Z+ = {0. there is a natural number 5 − 3. . Thus one of the necessary tasks of the present course is to give a more penetrating account of the real numbers than you have seen before. and in fact most analysts I know use N to denote the positive integers. −17 is 0 − 17. INTRODUCTION AND FOUNDATIONS Mean Value Theorem and the Fundamental Theorem of Calculus. b ∈ Z. a Q={ | a.2. This is formed out of the natural numbers N by formally allowing all subtractions: for instance. . any nonzero rational number has a unique expression b in .14 1. . Here is a list of the ones which will be most important to us: (1) Z+ ⊂ N ⊂ Z ⊂ Q ⊂ R ⊂ C. . So we just need to agree that b = d a iff ad = bc.} is the set of integers.a. 3. the same cannot be said of multiplicative inverses. Alternately. but no natural number 3 − 5. .} is the set of natural numbers. 2. Recall that the operation of subtraction is nothing else than the inverse opera- tion of addition: in other words. However the operation of division is not everywhere defined on Z: thre is an integer 6/3. namely 1. . Why then are the three interval theorems not proved in freshman calculus? Because their proofs depend upon fundamental properties of the real numbers that are not discussed in such courses. and now we have an additive identity as well as a multiplicative identity. Thus in Z every element n has an additive inverse −n. . Let me remind you what these various numbers are. 1. This is formed out of the integers Z by formally allowing all divisions by nonzero integers. to say a − b = c is to say that a = b + c. −2. . . Z+ = {1. 0. On them we have defined the operations of addition + and multiplication ·.k. Clearly Z+ and N are very similar: they differ only as to whether 0 is included or not. Note that one subtlety here is that the same rational number has many different expressions as the quotient of two inte- gers: for instance 46 = 32 = 3n + a c 2n for any n ∈ Z . . 1. . Recall that the operation of division is nothing else than the inverse oper- ation of multiplication: to say a/b = c is to say that a = b·c. . .} is the set of positive integers (a. n. There is no additive identity. . Numbers of Various Kinds. “counting numbers”). but so be it. I am probably showing my stripes as an algebraically minded mathematician by making the distinction. However the operation of subtraction is not everywhere defined on N: for instance. 3. −1. but no integer 6/4. . Z = {. 2. There are various kinds of “numbers”. Again we have operations of addition and multi- plication. n. . . . b is the set of rational numbers. b 6= 0}. Moreover there is an identity element for the multiplication. In analysis – the subject we are beginning the study of here! – the distinc- tion between Z+ and N is not very important.

2 Thus in Q we have an additive identity 0. and every nonzero element has a multiplicative inverse. we are reminding the reader of some of her mathematical past. Back to R: let us nail down the fact that there are real numbers which are not rational.. This shows that the integer a2 is even. we get 2b2 = a2 = (2A)2 = 4A2 . this requires proof! Let me reiterate that we are not giving proofs here or even careful definitions.g.e. What then are the real numbers R? The geometric answer is that the real numbers correspond to “points on the number line”. But this number x reeks of contrivance: it seems to have been constructed only to make trouble.. every element has an additive inverse. b2 Clearing denominators. one needs them in order to solve polynomial equations – but in this course they will play at most a peripheral role. An answer that one learns in high school is that every real number has an infinite decimal expansion. It happens that for any integer a. Finally. √ we suppose 2 is rational: then √ there are integers a. but it is not a cure-all: in order to pursue the implications of this definition – and even to really understand it – one needs tools that we will develop later in the course. there is really nothing to do but square both sides to get a2 2= . Here goes: seeking a contradiction. x = 0. b with b > 0 and 2 = ab . rather. Therefore what we are trying √ to prove must be true. we can come back to it later. e. The square root of 2 is not a rational number. and then exhibit a decimal expansion which is not eventually periodic.e. Since the defining property of 2 is that its square is 2. 2Like most of the statements we have made recently. Proof. . The ancient Pythagoreans discovered a much more “natural” irrational real number. We assume that what we are trying to prove is false. we get 2b2 = a2 . i. but this does not make clear why there are such points other than the rational numbers. Theorem 1. if a2 is even. They are extremely important in mathematics generally – e. . then so is a: let us assume this for now. not necessarily terminating or repeating. Substituting. 1. and from that we reason until we reach an absurd conclusion. the complex numbers C are expressions of the form a + bi where a and b are real numbers and i2 = −1. Thus we may write a = 2A with A ∈ Z. INTRODUCTION 15 lowest terms. The proof is the most famous (and surely one of the first) instances of a certain important kind of argument.g.16116111611116111116 . with a and b not both divisible by any integer n > 1. One way to see this is as follows: show that the decimal expansion of every rational number is eventually periodic. . we have a multiplicative identity 1. divisible by 2. i. In fact this is perfectly correct: it gives a complete characterization of the real numbers. namely a proof by contradiction.4. where the number of 1’s after each 6 increases by 1 each time. and conversely any integer followed by an infinite deci- mal expansion determines a real number.

But the only integer which is arbitrarily divisible by 2 is 0. b]Q = {x ∈ Q | a ≤ x ≤ b}. Note that we do not need to define f (x) at x = ± 2. Moreover. let [a. say.  1. so our conclusion is a = b = 0. for instance. Either way we get a contradiction. However most of the big theorems – especially. f certainly does not attain a maximum value. why do we want to use R to do calculus? Is there something stopping us from doing calculus over. because by the result of the previous section these are not rational numbers. we found that both a and b were divisible by 2. b ∈ Q. Again. so 3We will be much more precise about this sort of thing later on. it can be shown (and will be – later) that any function on a closed.16 1. For a. so we could have assumed this about ab at the outset and. deriva- tives and so forth for functions f : Q → Q exactly as is done for real functions. B = 2B2 . so as above b = 2B for some B ∈ Z. so 2 must not be a rational number. Example: Consider the function f : [0. Alternately – and perhaps more simply – each rational number may be written in lowest terms. in par- ticular. 2]Q such that f (c) = 0. . we get 4B 2 = (2B)2 = b2 = 2A2 . and so forth. Thus we are back where we started: assuming that 2b2 = a2 . This is suspect in the extreme.3 In particular. and there is no c ∈ [0. and we now have our choice of killing blow. 2]Q → Q given by f (x) = −1 if x2 < √2 and f (x) = 1 if x2 > 2. 2]Q → Q given by f (x) = x21−2 . f (2) = 1 > 0. Q? The answer to the second question is no: we can define limits. INTRODUCTION AND FOUNDATIONS or b2 = 2A2 . But f (0) = −1 < 0. The most routine results carry over with no change: it is still true. Thus the Extreme Value Theorem fails over Q. this √ function is well-defined at all points of [0. 2]Q because 2 is not a rational number. Thus b2 is divisible by 2. whereas we assumed b > 0: contradiction. However it is √ not bounded above: by taking ra- tional numbers which are arbitrarily close to 2. that sums and products of continuous functions are continuous. Then f is continuous – in fact it is differentiable and has identically zero derivative. Substituting. Why do we not do calculus on Q? To paraphrase the title question. the Interval Theorems – become false over Q. continuity. One ending is to observe that everything we have said above applies to A and B: thus we must also have A = 2A1 . x2 − 2 becomes arbitrarily small and thus f (x) becomes arbitarily large. √ that a and b are not both divisible by 2. Example: Consider the function: f : [0. or 2B 2 = A2 . This is just an overview. It is also a continuous function.3. We can continue in this way factoring out as many powers of 2 from a and b as we wish. Thus the Intermediate Value Theorem fails over Q. bounded interval which is either uniformly continuous or integrable is bounded.

and a structure which satisfies them is called a field.. x + y = y + x. including inequalities. z (x · y) · z = x · (y · z). At this point we will take it for granted that in our number system we have operations of addition. one needs to identify a starting point. x · 1 = x. z. (P3) (Identity for +): For all numbers x. Some Properties of Numbers 2. z. Axioms for a Field. there exists y with x + y = 0. For this: if z = x + iy is a nonzero complex number – i. in Euclidean geometry one lays down a set of axioms and reasons only from them. Virtually all mathematical theorems are of the form A =⇒ B. y. 0 and 1. assuming A. x · (y + z) = (x · y) + (x · z). This property. Specifically. Example: Let F2 = {0. is called completeness. 2. y. B must follow. We define . there exists a number y with xy = 1. (We take this as “known” information. y. (P8) (Inverses for ·) For all numbers x 6= 0. zw = 1. so we will introduce them gradually. 2] \ { 2} → R√ is not improperly Riemann integrable: it builds up infinite area as we approach 2. For instance. The point of these examples is in order to succeed in getting calculus off the ground. (P2) (Associativity of +): For all numbers x.) Example: The complex numbers C satisfy all of the above field axioms. Although it is not important for us now. we will give axioms that we want a number system to satisfy. (P6) (Associativity of ·): For all numbers x. and will play a major role in this course. x · y = y · x. we need to make use of some fundamental property of the real numbers not pos- sessed by (for intance) the rational numbers. (P4) (Inverses for +): For all numbers x.1. x + 0 = x. If you have had second semester √ freshman calculus. the above axioms (P0) through (P9) are called the field axioms. What we give here is essentially a codification of high school algebra. Example: Both Q and R satisfy all of the above field axioms. That is. which can be expressed in various forms. (x + y) + z = x + (y + z). (P9) (Distributivity of · over +): For all numbers x. SOME PROPERTIES OF NUMBERS 17 the above function f is neither uniformly continuous nor integrable. multiplication and an inequality relation <. (P5) (Commutativity of ·): For all numbers x. (P7) (Identity for ·): For all numbers x. 2. y. y.e. the real numbers x and y are not both zero – then if w = xx−iy 2 +y 2 . (P1) (Commutativity of +): For all numbers x. 1} be a set consisting of two elements. In order to do mathematics in a rigorous way. you should think about why the analogous function f : [0. The axioms needed for calculus are a lot to swallow in one dose. We require the following properties: (P0) 0 6= 1. The only one which is not straightforward is the existence of multiplicative inverses. and that there are distin- guished numbers called 0 and 1.

it turns out to be simpler to talk about the properties of positive numbers. Thus z = 0. 1 + 1 = 0. We have x · 0 = x · (0 + 0) = (x · 0) + (x · 0). d) The proofs of these are the same as the proofs of parts a) and b) but with all instances of + replaced by · and all instances of 0 replaced by 1. and consider 0+z. then we say that x is positive if x > 0. (−1) · x + x = ((−1) · x) + (1 · x) = (−1 + 1) · x = 0 · x = 0. Then F2 satisfies all of the above field axioms. INTRODUCTION AND FOUNDATIONS 0 + 0 = 0. b) Every number x has a unique additive inverse. Moreover.6. 0 + 1 = 1 + 0 = 1. 0+z = z. 0 + z = 0.5. 0 · 0 = 0 · 1 = 1 · 0 = 0. It is sometimes called the binary field. c). In every system satisfying the field axioms. By Proposition 1. Then we can define x < y if y − x ∈ P. Since 0 is an additive identity. b) Suppose y and z are both additive inverses to x: x + y = x + z = 0. Proof. 0 = x−1 · xy = (x−1 · x)y = 1 · y = y. if xy = 0 then x = 0 or y = 0. a) Note that 0 is an additive identity by (P3). In every system satisfying the field axioms.5.  Proposition 1.2. so y = z.7. Proof. suppose that we have identified a subset P of numbers as positive. if x 6= 0 and y 6= 0 then xy 6= 0. namely 1. for any number x. Now we want our set of positive numbers . Adding y to both sides gives y = 0 + y = (x + y) + y = (y + x) + y = y + (x + y) = y + (x + z) = (y + x) + z = (x + y) + z = 0 + z = z. for every number x we have x · 0 = 0. Axioms for an ordered field. In every system satisfying the field axioms: a) The only additive identity is 0. Proof. The remaining properties of numbers concern the inequality relation <.  Proposition 1. −1·−1 is the additive inverse of −1. Suppose that z is another additive identity. 1 · 1 = 1. If we are given the inequality relation <. Since z is an additive identity. thus knowing < we know which numbers are posi- tive.8 is: in any system satisfying the field axioms. Proposition 1. b) Every nonzero number has a unique multiplicative inverse. Subtracting (x · 0) from both sides gives 0 = x · 0. Since x 6= 0 it has a multiplicative inverse x−1 and then by Proposition 1. c) The only multiplicative identity is 1.8. then the additive inverse of x is (−1) · x. If −1 denotes the additive inverse of 1. Conversely. In every system satisfying the field axioms. contradicting the assumption that y 6= 0. 2.18 1.6. (−1)2 = 1. Proof. Instead of describing the relation < directly.  Note that a logically equivalent formulation of Proposition 1.  Proposition 1. suppose that xy = 0. Seeking a contradiction.

−x is positive. c) If x is positive. exactly one of the following holds: x = 0.9a). x is zero. so we need to rule out the possibility that it’s negative. For any x.) b) The trichotomy property may be restated as: for any number x. Proof. Proof. we just state it for future reference. (We say “x is negative”. 2. a) By definition. but in F2 1 + 1 = 0. If x > 0. then xy is positive. z in an ordered field: a) x < 0 ⇐⇒ 0 < −x. hence positive. x is negative. c) Suppose x is positive. But by (P12) the product of two positive numbers is positive. If x < 0. SOME PROPERTIES OF NUMBERS 19 to satisfy the following properties. either x > 0 or −x > 0. a contradiction. Proposition 1. (P12) (Closure under ·) For all positive numbers x. then by part a) −1 x would be positive −1 and thus by (P12) x · x = −1 would be positive. Also as above we can show that there is no such relation. If x is negative. x + y ∈ P. a) By (P0). x1 is positive. hence positive. y. Certainly x1 is not zero. y. b) For every nonzero x. then xy is negative.9. not on the face of it because we have not been given an inequality relation < satisfying (P10) through (P12). c) It follows that for all x. Proposition 1. So there is nothing to show here. also 1 + 1 > 0. then −x > 0 and then x2 = (−1)2 x2 = (−x) · (−x) is the product of two positive numbers. x1 is negative. Also 0 < −x means −x − 0 = −x is positive. either 1 is positive and −1 is negative. For in the complex numbers we have an element i with i2 = −1. d) If x is positive and y is negative. x2 > 0.10. we have not been given an inequality relation. (P10) (Trichotomy) For all numbers x. but are they an ordered field? Well. So it must be that 1 is positive and −1 is negative. contradiction. or −1 is positive and 1 is negative. In fact we will now show that there is no such relation. Thus by trichotomy. But the ordered field axioms imply both that −1 is negative and that any square is non-negative. b) No further argument for this is needed.  Example: The binary numbers F2 satisfy the field axioms (P0) through (P9). If x is negative then −x is positive so by what we just showed −x1 = −1x is positive. x is positive. . since 1 > 0. Example: The complex numbers C satisfy the field axioms (P0) through (P9). x2 ≥ 0. But if it were. (P11) (Closure under +) For all positive numbers x. y. so if −1 is positive and 1 is negative then 1 = (−1)2 is positive. part c) follows immediately from part b). in any ordered field. In every ordered field: a) 1 > 0 and −1 < 0. x < 0 means 0 − x = −x is positive. Indeed. contradicting Proposition 1. but are they an ordered field? As above. 1 6= 0. e) If x and y are both negative. then x2 = x · x is the product of two positive numbers. In fancy language. b) Since x is nonzero. A number system satisfying (P1) through (P12) is called an ordered field. exactly one of the following holds: x is positive. x · y ∈ P. F2 is a field which cannot be endowed with the structure of an ordered field. c) Since 02 = 0.

20 1. 4. b + a is positive. Since c is positive. the ordered field axioms (P0) through (P12) are just a list of some of the useful properties of Q and R. so xy 6= 0. + with n0 ∈ Z and ai ∈ {0. x1 is negative. . d) We have a1 − 1b = (b − a) · ab 1 . . 2. there is a positive integer n such that x ≤ n. 6.11. so by (P12) so is their product. and we need to rule out the possibility that xy is negative. b − a is positive and d − c is positive. and thus x is less than or equal to the integer n0 + 1. . b − a is positive. INTRODUCTION AND FOUNDATIONS and thus x1 = −( −1 x ) is negative. 5. 9}. by part c). 1. They are not a “complete set of axioms” for either Q or R – in other words. by part b) it is enough to rule out the possibility that xy is positive. there are other properties these fields have that cannot be logically deduced from these axioms alone. 3. c) Left to the reader. Then xy 6= 0. so bc > ac. a + c < b + d. b. The hypotheses imply that b − a and ab1 are both positive. Proof. In fact this is already clear because Q and R each satisfy all the ordered field axioms but are essentially different structures: in Q the element 2 = 1 + 1 is not the square of another element. but in R it is. since x is positive.3. Suppose it is. e) If a > 0 and b > 0. d) If 0 < a < b. c. For all a. Here we want to give some further “familiar” prop- erties that do not hold for all ordered fields but both of which hold for R and one of which holds for Q. (We are still far away from the fundamental completeness axiom for R which is necessary to prove the Interval Theorems.  2. then 0 < 1b < a1 . and thus y = xy · x would be positive: contradiction. an . e) Note that b2 − a2 = (b + a)(b − a). . and then by (P11) (b − a) + (d − c) = (b + d) − (a + c) is positive. 7. d in an ordered field: a) If a < b and c < d. Suppose it is. But actually this need not be 0 true! Can you think of an example? Beware: decimal expansions can be tricky. 8. b) If a < b and c > 0. . a) Since a < b and c < d. In particular x and y are not zero. so b + d > a + c. d) Suppose x is positive and y is negative. b) Since a < b. 1 1 x is positive. Then.4 4When I first typed this I wrote that x is less than n + 1. Then −xy is positive. and therefore b − a is positive iff b2 − a2 is positive.) The first axiom is called the Archimedean property: it says that for any positive number x. Since a and b are both positive.a1 a2 . so by part d) −y = −xy · x1 is negative and thus y is positive: contradiction. Some further properties of Q and R. As we have mentioned before. c) If a < b and c < 0. then ac < bc. by (P12) bc − ac = (b − c)a is positive. then ac > bc. This clearly holds for R according to our description of real numbers as integers followed by infinite decimal expansions: a positive real number x is of the form x = n0 . To show that xy is negative.  Proposition 1. e) Suppose x and y are both negative. then a < b ⇐⇒ a2 < b2 .

b ∈ Z+ . then there is no real number y with y n = x.) that the curious student may be well wonder: are there in fact systems of numbers satisfying the ordered field axioms but not the Archimedean property?!? The an- swer is yes. Some Inequalities. Thus |x| ≥ 0 always and x = ±|x|. if and only if there exists a real number y with y 2 = x.12. 2. 2. x ≤ y ⇐⇒ (y − x) = z 2 for some real number z. since it holds in any ordered field. and then ab ≤ a.e. such a theory is in many ways more faithful to the calculus of Newton and Leibniz than the theory which we are presenting here.) As a supplement tto part b). then there is exactly one other real number with nth power equal to x: −y.) We can prove part c) now. we define the absolute value of x to be x if x ≥ 0 and −x if x < 0.12 important enough to be recorded separately. y ∈ R. But we will not see such things in this course! The next property does provide a basic difference between Q and R. Corollary 1.13 leads to a basic strategy for proving inequalities in the real numbers: for x. c) If n is even and x is negative. there is a unique real number y such that y n = x. √ a) If n is odd.. note that if n is even and y is a positive real number such that y n = x. . Note that Corollary 1. (You might try to prove this as an exercise. contradicting Proposition 1. In fact we will not use such things in the proof of any theorem. then also every positive rational number is less than or equal to some integer. Proposition 1. but only as examples. x ≤ |x|. it is denoted by |x|. b) If n is even and x is√positive. For any number x in an ordered field.12 rely on the Intermediate Value Theorem so are not accessible to us at this time. That is. Q also satisfies the Archimedean property. Theorem 1. i. (Thus we must guard aginst using the existence of nth roots of real numbers in any of the theorems that lead up to the Intermediate Value Theorem. Corollary 1.13 does not hold in the number system Q. In the next section we will see some instances of this strategy in action. SOME PROPERTIES OF NUMBERS 21 Since every positive real number is less than or equal to some integer.4. directly: any positive rational number may be written in the form ab with a.. (Or. a positive real number.) This Archimedean property is so natural and familiar (not to mention useful. so if for some negative x we have y n = x then x = y 2k = (y k )2 ..14.9c). there is a unique positive real number y such that y n = x. Indeed. Here is a special case of Theorem 1.13. We write y = n x. there are plenty of them. if n is even then n = 2k for an integer k. We write y = n x. For an element x of an ordered field. The first two parts of Theorem 1. and ev- ery positive rational number is. A real number x is non-negative if and only if it is a square. Let x be a real number and n ∈ Z+ . and it is in fact possible to construct a theory of calculus based upon them (in fact. which is a 19th century innovation). since 2 = 1 + 1 is positive but is not the square of any rational number. in particular.

||x| − |y|| ≤ |x − y|.14. so x < |x|. y ≥ 0. so subtracting the left hand side from the right. y < 0. now (|x+y|)2 = (x+y)2 = x2 +2xy +y 2 . so |x + y| = −(x + y) = −x − y = |x| + |y|. But (x2 + |2xy| + y 2 − (x2 + 2xy + y 2 ) = |2xy| − 2xy ≥ 0 by Proposition 1. If x < 0 then x < 0 < −x = |x|. 5Such symmetry arguments can often by used to reduce the number of cases considered. Proof. Then x + y < 0.22 1. y be any numbers.  Exercise: Let x. the inequality will hold iff it holds after squaring both sides (Proposition 1. Then |x + y| = x + y ≤ |x| + |y|. INTRODUCTION AND FOUNDATIONS Proof. so we must consider consider further cases. since both quantities |x + y| and |x| + |y| are non-negative. (Reverse Triangle Inequality) For all numbers x. (Triangle Inequality) For all numbers x. y ≥ 0. Then |x + y| = x + y = |x| + |y|. |x + y| ≤ |x| + |y|.15. b) Deduce from part a) that ||x| − |y|| ≤ |x − y|. whereas (|x|+|y|)2 = |x|2 +2|x||y|+|y|2 = x2 + |2xy| + y 2 . it is certainly not very much fun. A similar argument can be used to establish the following variant. it is sufficient to prove the inequality after squaring both sides: (||x| − |y||)2 = (|x| − |y|)2 = |x|2 − 2|x||y| + |y|2 = x2 − |2xy| + y 2 ≤ x2 − 2xy + y 2 = (x − y)2 = (|x − y|)2 . y < 0. In fact. Case 2: x. a) Show that |x| − |y| ≤ |x − y| by writing x = (x − y) + y and applying the usual triangle inequality. Then |x + y| = −x − y ≤ | − x| + | − y| = |x| + |y|.11e). So this gives a second proof of the Triangle Inequality. Case 1: x. Now unfortunately we do not know whether |x + y| is non-negative or negative. If x ≥ 0 then x = |x|. if we interchange x and y we do not change what we are trying to show – we may reduce to Case 3 by interchanging x and y. Case 3: x ≥ 0. First. However. we can guarantee it is the same: since the desired inequality is symmetric in x and y – meaning. So it is enough to show (|x + y|)2 ≤ (|x| + |y|)2 . Since |x| is defined to be x if x ≥ 0 and −x if x < 0. Spivak gives an alternate proof of the Triangle Inequality which is more interesting and thematic. y.  Theorem 1. it is equivalent to show that 0 ≤ (x2 + |2xy| + y 2 ) − (x2 + 2xy + y 2 ). Case 3a: x + y ≥ 0. . Proposition 1. it is natural to break the proof into cases. The argument is exactly the same as that in Case 3. Again. y. Case 4: x < 0.16.5  The preceding argument is definitely the sort that one should be prepared to make when dealing with expressions involving absolute values. since both quantities are non-negative. Proof. Case 3b: x + y < 0.

Proof. establishing part a).18. Expanding out the right and left hand sides. a · a < a · b. (Arithmetic-Geometric Mean Inequality. equality holds iff x1 y2 = x2 y1 . But indeed since a < b. i=1 i6=j n X X LHS = x2i yi2 + 2 xi yi xj yj . .  Theorem 1. xn . . then we must have x2 = 0 and then we may take λ = xy11 . Similarly. yn we have (x1 y1 + . Third inequality: Since a+b a+b 2 and b are both positive. + xn yn )2 ≤ (x21 + . n = 2) For all numbers 0 < a < b. if y1 6= 0 and y2 = 0. i6=j i<j i<j  Theorem 1. + x2n )(y12 + . . it is equiv- alent to 4ab < a2 + 2ab + ab2 . . But a2 − 2ab + b2 = (a − b)2 . . so since a 6= b. or to a2 − 2b + b2 > 0. Moreover. SOME PROPERTIES OF NUMBERS 23 Theorem 1. it is equivalent to 2 < b and thus to a + b < 2b. 2. . (a − b)2 > 0. If y1 = 0 and y2 6= 0 then we must have x1 = 0 and then we may take λ = xy22 . Now we need only rewrite it in the form (x21 + x22 )(y12 + y22 ) − (x1 y1 + x2 y2 )2 = (x1 y2 − x2 y1 )2 ≥ 0. y1 . one can prove the two squares identity: (x21 + x22 )(y12 + y22 ) = (x1 y2 − x2 y1 )2 + (x1 y1 + x2 y2 )2 . Proof. By pure brute force. we have  2 2 a+b a < ab < < b2 . . n = 2) a) For all numbers x1 . . . i=1 i<j so X X X RHS − LHS = x2i yj2 − 2 xi yj xj yi = (xi yj − xj yi )2 ≥ 0. If in this equality y1 and y2 are both nonzero. . + yn2 ). y2 . we get Xn X RHS = x2i yi2 + x2i yj2 . (2) (x1 y1 + x2 y2 )2 ≤ (x21 + x22 )(y12 + y22 ). b) Moreover equality holds in (2) iff y1 = y2 = 0 or there exists a number λ such that x1 = λy1 and x2 = λy2 .19. (Cauchy-Bunyakovsky-Schwarz Inequality. . . . y1 .17. Second inequality: Expanding out the square and clearing denominators. if y1 = y2 = 0 then the equality x1 y2 = x2 y1 also holds. . Finally.19. First inequality: Since a > 0 and 0 < a < b. we may divide by them to get xy11 = xy22 = λ. . (Cauchy-Bunyakovsky-Schwarz Inequality) For any n ∈ Z+ and numbers x1 .  Later we will use the theory of convexity to prove a signficant generalization of Theorem 1. 2 Proof. the Weighted Arithmetic-Geometric Mean Inequality. a + b < b + b = 2b. x2 .

.

and still less would it allow you to reason abstractly about the cat concept or “prove theorems about cats. but one does not start by saying what sort of object the Euclidean plane “really is”. but if you have never seen a cat before this definition would not allow you to determine with certainty whether any particular animal you encountered was a cat. In other words. Now apply (ii) with n = 3: since 3 ∈ S. in the same way that many dictionary definitions are actually descriptions: “cat: A small carnivorous mammal domesticated since early times as a catcher of rats and mice and as a pet and existing in several distinctive breeds and varieties. as alluded to earlier in the course) needs to incorporate. starting from 1 and adding 1 sufficiently many times. Now ap- ply (ii) with n = 1: since 1 ∈ S. Introduction Principle of Mathematical Induction for sets Let S be a subset of the positive integers. Euclid himself gave such “definitions” as: “A point is that which has position but not dimensions.” “A line is breadth without depth. This is not a proof.e. we deduce 1 + 1 = 2 ∈ S..” This helps you if you are already familiar with the animal but not the word. i. we know that 1 ∈ S. And so forth. the principle of mathematical induction. we deduce 2 + 1 = 3 ∈ S. In Euclidean geometry one studies points. we deduce 3 + 1 = 4 ∈ S. Now it is a fundamental part of the structure of the positive integers that every positive integer can be reached in this way. CHAPTER 2 Mathematical Induction 1. Now apply (ii) with n = 2: since 2 ∈ S. Suppose that: (i) 1 ∈ S.” In the 19th century it was recognized that these are descriptions rather than definitions. Alternately. but it is somewhat interesting. It is not a key point. (No good proof uses “and so forth” to gloss over a key point!) But the idea is as follows: we can keep iterating the above argument as many times as we want. planes and so forth. and (ii) ∀ n ∈ Z+ . Then S = Z+ . n ∈ S =⇒ n + 1 ∈ S. so let us be a bit more specific. The intuitive justification is as follows: by (i). deducing at each stage that since S contains the natural number which is one greater than the last natural number we showed that it contained.”) Rather 25 . any rigorous definition of the natural numbers (for instance in terms of sets. either implicitly or (more often) explicitly. the principle of mathematical induction is a key ingredient in any ax- iomatic characterization of the natural numbers. lines. (At least this is how Euclidean geometry has been approached for more than a hundred years.

. “plane” and so forth are taken as undefined terms. S(S(0)). S(0). We also have a number S(S(0)). For instance.. subject to the above axioms. since in [0. ∞). . In slightly modernized form. ∞) of all non-negative real numbers. In particular X itself is infinite. The first four are: (P1) Zero is a number. S(S(S(0)). They are related by certain axioms: abstract properties they must satisfy. This system satisfies (P1) through (P4) but has much more in it than just the natural numbers we want. Notice that the example we cooked up above fails (P5). Is 0 = S(0)? No. i. and S : X → X is a function. In 1889. we have a number (i. then Y = X. . so we must be missing an axiom! Indeed. suppose that the “numbers” consisted of the set [0. we define 0 to be the real number of that name. much simpler) system of axioms for the natural num- bers.e. number and successor.e. which is not equal to 0 by (P4) and it is also not equal to S(0). 0. the Peano axioms. the last axiom is: (P5) If Y is a subset of the set X of numbers such that 0 ∈ Y and such that x ∈ Y implies S(x) ∈ Y . in fact. because then S(0) = S(S(0)) would be the successor of the distinct numbers 0 and S(0). There are five axioms that they must satisfy.26 2. which is also a number. (P4) Zero is not the successor of any number. that is prohibited by (P4). . From this we can deduce quite a bit.. (P2) Every number has a successor. The crux of the matter is this: is there any element of X which is not a member of the sequence (3). and we define the successor of x to be x + 1. MATHEMATICAL INDUCTION “point”. the Italian mathematician and proto-logician Gisueppe Peano came up with a similar (and. Using set-theoretic language we can clarify what is going on here as follows: the structures we are considering are triples (X. ∞) the subset of natural numbers contains zero and contains the successor of each of its elements but is a proper subset of [0. we can produce an infinite sequence of distinct elements of X: (3) 0. First. “line”. Continuing in this way. S). 0 is an element of X. an element of X) called S(0). this goes as follows: The undefined terms are zero. is not obtained by starting at 0 and applying the successor function finitely many times? The axioms so far do not allow us to answer this question. . (P3) No two distinct numbers have the same successor. where X is a set. contradicting (P3). Thus it was Peano’s contribution to realize that mathematical induction is an ax- iom for the natural numbers in much the same way that the parallel postulate is an axiom for Euclidean geometry.

Then IH n(n + 1) 1 + . and in the above principle we replace (i) by the assumption that P (0) is true and keep the assumption (ii). . . . Not having an formal understanding of the relationship between mathematical induction and the structure of the natural numbers was not much of a hindrance to mathematicians of the time. P (n) is true =⇒ P (n + 1) is true. Base case (n = 1): Indeed 1 = 1(1+1) 2 . . 2. . so still less should it stop us from learning to use induction as a proof technique. + n(n+1) Induction step: Let n ∈ Z and suppose that 1 + . + n = 2 . Show that P (n) is true for all integers n ≥ N0 .1 Exercise 1: Suppose that N0 is a fixed integer. and (ii) For all n ∈ Z+ . This is more in accordance with the discussion of the Peano axioms above. which in the scope of mathematical history is quite recent. Principle of mathematical induction for predicates Let P (x) be a sentence whose domain is the positive integers. Variant 1: Suppose instead that P (x) is a sentence whose domain is the natu- ral numbers. THE FIRST INDUCTION PROOFS 27 On the other hand. . (Hint: define a new predicate Q(n) with domain Z+ by making a “change of variables” in P . What we presented above is a standard modern modification which is slightly cleaner to work with. in 1665.. P (n) is true =⇒ P (n + 1) is true. + n = 2 .1. We go by induction on n. with zero included. Suppose that: (i) P (1) is true. + n) + n + 1 = +n+1 2 1In fact Peano’s original axiomatization did not include zero. Then P (n) is true for all positive integers n. According to the (highly recommended!) Wikipedia article on mathematical induction. which we shall dis- cuss later – and the technique seems to have become commonplace by the end of the 18th century. it is telling that this work of Peano is little more than one hundred years old. Proof.) 2. Suppose that: (i) P (N0 ) is true. During the next hundred years various equivalent versions were used by different mathematicians – notably the methods of infinite descent and minimal counterexample. Then of course the conclusion is that P (n) is true for all natural numbers n. There are many things that one can prove by induction. Traces of what we now recognize as induction can be found from the mathematics of antiquity (including Euclid’s Elements!) on forward. and (ii) For all n ≥ N0 . . + n + n + 1 = (1 + . i. The First Induction Proofs 2. the first mathemati- cian to formulate it explicitly was Blaise Pascal. 1 + . n(n+1) Proposition 2. The Pedagogically First Induction Proof.1. . Let P (x) be a sentence whose domain contains the set of all integers n ≥ N0 .e. but the first thing that everyone proves by induction is invariably the following result. For all n ∈ Z+ . .

n}. . 2}. . .e. Since p1 · · · pn ≥ p1 ≥ 2. . that there are at least n prime numbers p1 < .2. to show that the number of 2-element subsets of an n element set is also equal to 1 + 2 + . . and assume that P (n) holds. . + n − 1 = . However thought and ideas are good things when you have them! In many cases an inductive proof of a result is a sort of “first assault” which raises the challenge of a more insightful. This comes out immediately if we list the 2-element subsets of {1.. Thus the number of 2-element subsets of {1. + n − 1. say q. We will prove the latter by induction on n. . In particular it is divisible by at least one prime number. . Proof. . The subsets with least element 2 are {2. The subset with least element n − 1 is {n − 1. i. 2 We recognize the quantity on the right-hand side as the binomial coefficient n2 :  it counts the number of 2-element subsets of an n element set. . n ≥ 2 : 1 + . n} in a systematic way: we may write each such subset as {i. (Euclid) There are infinitely many prime numbers. . i. Base Case (n = 1): We need to show that there is at least one prime number.e. which can be proved in many ways. there is at least one prime number different from the numbers we have already found.1 above. This is certainly the case for Proposition 2. . . For now we assume this (familiar) fact. 3}. Theorem 2. n}. noninductive proof.28 2. .2 But I claim that Nn is not divisible by pi for any 1 ≤ i ≤ n. + 1 = 1 + 2 + .  Induction is such a powerful tool that once one learns how to use it one can prove many nontrivial facts with essentially no thought or ideas required. consider the quantity Nn = p1 · · · pn + 1. < pn . Then: The subsets with least element 1 are {1. . {1. Then kpi = p1 · · · pn + 1 = 2Later in these notes we will prove the stronger fact that any integer greater than one may be expressed as a product of primes.e. . For n ∈ Z+ . 4}. 2. a total of n − 1. 2 is a prime number. + n − 1. Induction Step: Let n ∈ Z+ . n}. let P (n) be the assertion that there are at least n prime numbers. . . We need to show that P (n + 1) holds. if Nn = api for some a ∈ Z. . This gives a combinatorial proof of Proposition 2. The (Historically) First(?) Induction Proof. Nn ≥ 3.1.. Then there are infinitely many primes if and only if P (n) holds for all positive integers n. as is the case in the above proof... For instance. then let b = p1 ···p pi n ∈ Z. . To establish this. . {2. . . n} is on the one hand n2 and  on the other hand (n − 1) + (n − 2) + . Indeed. it is equivalent to show: (n − 1)n (4) ∀n ∈ Z. This raises the prospect of a combinatorial proof. a total of n − 2. {2. i. .2. j} with 1 ≤ i ≤ n − 1 and i < j ≤ n. MATHEMATICAL INDUCTION n2 + n 2n + 2 n2 + 2n + 3 (n + 1)(n + 2) (n + 1)((n + 1) + 1) = + = = = . . . 2 2 2 2 2 Here the letters “IH” signify that the induction hypothesis was used. {1. . 2. . . 3}. a total of 1. Here is one non-inductive proof: replacing n by n − 1.

But this argument is of mostly historical and philosophical interest. so we take p3 = 7. . CLOSED FORM IDENTITIES 29 bpi + 1. 17. In this way we generate an infinite sequence of prime numbers – so the proof is unassailably constructive. For definiteness let us take p5 to be the smallest prime factor of N4 . Starting with any given prime number – say p1 = 2 – and following his procedure. Perhaps Mullin’s most interesting question about this sequence is: does every prime num- ber appear in it eventually? This is an absolutely open question. so p5 = 13. The next few terms are 53. . one generates an infinite sequence of primes. Then N2 = 2 · 3 + 1 = 7 is again prime. but in retrospect his proof is clearly an inductive argument: what he does is to explain.) What is strange is that in our day Euclid’s proof is generally not seen as a proof by induction. 5. . a contradiction. Then N3 = 2 · 3 · 7 + 1 = 43 is also prime. . after Albert A. At the moment the smallest prime which is not known to appear in the Euclid-Mullin sequence is 31. q. pn of distinct primes. so (k − b)pi = 1 and thus pi = ±1. 3. . 887. So if we take q to be. To quote wikipedia: “It is widely considered to be one of the more. for instance. 109. . the number of points of contact between them is n − 1. very roughly. 37. (Maybe. Proposition 20). The statement in question is. 71. then there are at least n + 1 prime numbers: p1 . Rather. It is often called the Euclid-Mullin sequence. 3. 2801. 30693651606209. . challenging and enigmatic of Plato’s dialogues. Euclid did not explicitly use induction. how given any finite list p1 . But this time something more interesting happens: N4 = 2 · 3 · 7 · 43 + 1 = 13 · 139 is not prime. lines 149a7-c3. and as of 2010 only the first 47 terms are known. he must have regarded it as obvious that there is at least one prime number. the smallest prime divisor of Nn . one can produce a new prime which is not on the list. since to find pn+1 we need to factor Nn = p1 · · · pn + 1. pn .”) There is not much mathematics here! Nevertheless. it is often construed as a proof by contradiction – which it isn’t! Rather. 7127.  The proof that there are infinitely many prime numbers first appeared in Euclid’s Elements (Book IX. if not the most. it is in fact quite difficult to compute the terms of this sequence. as above. For instance. 1313797957. for a thorough discussion of induction in the Parmenides the reader may consult [Ac00] and the references cited therein. 6221671. 23003. (Euclid does not verify the base case. 1741. .1 is a prototype for a certain kind of induction proof (the easiest kind!) in which P (n) is some algebraic identity: say . Thus one can see that it is rather far from just giving us all of the prime numbers in increasing order! In fact. Mullin who asked questions about it in 1963 [Mu63]. a quantity which rapidly increases with n. this sequence of prime numbers is itself rather interesting. so we take p2 = 3. 5471. 38709183810571. 11. 52662739. Remark: Some scholars have suggested that what is essentially an argument by mathematical induction appears in the later middle Platonic dialogue Parmenides. . . Closed Form Identities The inductive proof of Proposition 2. N1 = 2 + 1 = 3 is prime. . so we take p4 = 43. 23. . that if n objects are placed adjacent to another in a linear fashion. Euclid’s argument is perfectly constructive. 139. By the way.

. + 2 + 1. Adding 2(n + 1) − 1 = 2n + 1 to both sides. In this case to make the induction proof work you need only (i) establish the base case and (ii) verify the equality of successive differences LHS(n + 1) − LHS(n) = RHS(n + 1) − RHS(n). Mathematical induction can be viewed as a particular incarnation of a much more general proof technique: try to solve your problem by reducing it to a previously . + n2 = 6 . 2 This is no doubt preferable to induction. . . . + n2 = 6 . + (n − 1) + n. 1 + 3 + . so the induction step is complete. Proposition 2. Gauss and his classmates were asked to add up the numbers from 1 to 100. Of course the sum is unchanged if we we write the terms in descending order: Sn = n + (n − 1) + . . we also get 2n +9n 6+13n+7 . . . + (2n − 1) = n2 . + n2 + (n + 1)2 = + (n + 1)2 = 6 2n3 + 3n2 + n + 6 + 6n2 + 12n + 1 2n3 + 9n2 + 13n + 7 = . We give two more familiar examples of this. expanding out (n+1)((n+1)+1)(2(n+1)+1) 6 . . when available. Then IH n(n + 1)(2n + 1) 1 + . we get (1 + 3 + . n(n+1)(2n+1) Induction step: Let n ∈ Z+ and suppose that 12 + . . For all n ∈ Z+ . MATHEMATICAL INDUCTION LHS(n) = RHS(n). We will show P (n) holds for all n ∈ Z+ by induction on n. .  Often a non-inductive proof. . Proof. By induction on n. + n. . Proof. Base case (n = 1): indeed 1 = 12 . Gauss reasoned essentially as follows: put Sn = 1 + . As a child of no more than 10 or so.30 2. Induction step: Let n be an arbitrary positive integer and assume P (n): (5) 1 + 3 + . . + (2n − 1) = n2 . + (2n − 1) + 2(n + 1) − 1 = n2 + 2(n + 1) − 1 = n2 + 2n + 1 = (n + 1)2 . This is precisely P (n + 1). . Again returning to our archetypical example: 1 + . it is time to tell the story of little Gauss. . Adding the two equations gives 2Sn = (n + 1) + (n + 1) + . + (2n − 1) = n2 ”. . + (n + 1) = n(n + 1). Base case: n = 1. . so long as one is clever enough to see it. .3. Let P (n) be the statement “1 + 3 + . .  n(n+1)(2n+1) Proposition 2.4. . Most of the students did this by a laborious calculation and got incorrect answers in the end. offers more insight. 12 + 22 + . so n(n + 1) Sn = . For all n ∈ Z+ . 6 6 3 2 On the other hand. . .

+(2n−1) = (2i−1) = 2 i− 1=2 −n = n2 +n−n = n2 . 2n ≥ n3 . Now n3 − 3n2 − 3n = n(n2 − 3n − 3). There exists N0 ∈ Z+ such that for all n ≥ N0 . A more straightforward application of this philosophy allows us to deduce Proposition 2. (By considering limits as n → ∞. Induction step: Assume that for some natural number n ≥ 2. so n2 − 3n − 3 > 0 if .) Now. We want to show that P (n) holds for all natural numbers n by induction.6. we will replace 10 with some larger number. proof analysis For n ∈ N. Here is a formal writeup: Proof. Proof analysis A little experimentation shows that there are several small values of n such that 2n < n3 : for instance 29 = 512 < 93 = 729. 4.  Proposition 2.5. Our task is then to show that 2n3 ≥ (n + 1)3 for all n ≥ 10. it seems to be the case that we can take N0 = 10: let’s try. Base case: n = 2: 22 = 4 > 2. this is in turn equivalent to n3 − 3n2 − 3n > 0. We just need to restructure the argument a bit: we verify the statement separately for n = 0 and then use n = 1 as the base case of our induction argument. Induction step: Suppose that for some n ≥ 10 we have 2n ≥ n3 . INEQUALITIES 31 solved problem. It’s not guaranteed to work for n ≥ 10. it suffices to verify the statement for all natural numbers n ≥ 2. . so this is equivalent √ to n2 − 3n − 3 ≥ 0. Then 2n+1 = 2 · 2n > 2 · n. 2n > n. 2 3± 21 The roots of the polynomial x − 3x − 3 are x = 2 . On the other hand. For all n ∈ N . We would now like to say that 2n ≥ n + 1.1: n n n   X X X n(n + 1) 1+3+. Well. Then 2n+1 = 2 · 2n ≥ 2n3 . let P (n) be the statement “2n > n”. Induction step: let n be any natural number and assume P (n): 2n > n. Since 20 = 1 > 0 and 21 = 2 > 1. But in fact this is true if and only if n ≥ 1. i=1 i=1 i=1 2 4. Base case: n = 0: 20 = 1 > 0. don’t panic. Base case: n = 10: 210 = 1024 > 1000 = 103 . it is certainly the case that the left hand side exceeds the right hand side for all sufficiently large n. if not. Then 2n+1 = 2 · 2n > 2n > n + 1. Since everything in sight is a whole number. We go by induction on n. . 2n > n. 2n3 − (n + 1)3 = 2n3 − n3 − 3n2 − 3n − 1 = n3 − 3n2 − 3n − 1 ≥ 0 ⇐⇒ n3 − 3n2 − 3n ≥ 1. Inequalities Proposition 2.3 from Proposition 2.

. in fact. MATHEMATICAL INDUCTION √ √ n > 4 = 3+2 25 > 3+2 21 . 1 1 Proposition 2. π6 . + 2 + ≤2− + ... this argument shows that the infinite series converges. 3. so it will suffice to show that for all n ∈ Z+ . we can in fact deduce that 2n < n3 for all 4 ≤ n ≤ 8. + n2 ≤ 2 − n1 . the desired inequality holds if n ≥ 10. Remark: More precisely. it follows that n=1 n12 ≤ 2. all of them have the same color. all of the horses in S have the same color! Induction step: We suppose that for some positive integer n. but again we need to put things together in proper order.. 9. Extending Binary Properties to n-ary Properties Example: All horses have the same color. Base case (n = 1): 1 ≤ 2 − 11 . 1) then becomes false for a little while longer. 4. 1 1 Induction step: Assume that for some n ∈ Z+ . 8. 4 n (n + 1)2 n (n + 1)2 1 We want the left hand side to be less than 2 − n+1 . P (n) holds.32 2. It is interesting that the desired inequality is true for a little while (i. The exact value of the sum 2 is. Then 1 1 1 1 1 1+ + . 6. 2n ≥ n3 for all natural numbers n except n = 2. it suffices to show 1 1 1 + < . and then becomes true for all n ≥ 10. in any set of n horses. the desired inequality is equivalent to n2 + 2n = n(n + 2) < (n + 1)2 = n2 + 2n + 1. 1 + 4 + . For all n ∈ Z+ . a task which we leave to the reader. In particular. then this equality remains true for all larger natural numbers n. Thus from the fact that 29 < 93 . 7. In particular. at n = 0. proof analysis By induction on n. so it will suffice to show 1 1 1 2− + 2 <2− . 1 + 4 + .. n (n + 1) n+1 Equivalently. Note that it follows from our analysis that if for any N ≥ 4 we have 2N ≥ N 3 .7. Proposed proof: There are only finitely many horses in the world. n + 1 (n + 1)2 n But we have 1 1 n+1+1 n+2 + = = . which is true! Thus we have all the ingredients of an induction proof. .e. Base case: In any set S of one horse. so by induction we have shown that 2n ≥ n3 for all n ≥ 10. A proof of this requires techniques from advanced calculus. 5.. so by clearing denominators.. n + 1 (n + 1)2 (n + 1)2 (n + 1)2 Everything in sight is positive. We leave it to to the student to convert the above analysis into a formal proof. where P (n) is the statement that in any set of n horses. P∞ Remark: Taking limits as n → ∞. + n2 ≤ 2 − n1 .

Proposition 2. . which for specificity we label H1 . . There is a moral here: one should pay especially close attention to the smallest values of n to make sure that the argument has no gaps. then p | ai for some 1 ≤ i ≤ n. erroneously. every horse in S has the same color as H1 : in particular Hn has the same color as H1 . an+1 ∈ Z. Here are some examples of this. Let p be a prime. The following is a fundamental fact of number theory. Similarly. Hn . . p n is irrational. and occurs in a perhaps unexpected place. . . Now we can split this into two sets of n horses: S = {H1 . except the induction step is not valid when n = 1: in this case S = {H1 } and T = {H2 } and these two sets are disjoint: they have no horses in common. then by our inductive hypothesis. It follows by induction that for all n ∈ Z+ . p | ai for some 1 ≤ i ≤ n. For instance. all have the same color. if only we could establish the argument for n = 2. This is trivial for n = 1. an . that S and T must have more than one element. Hn . Let a1 . an ∈ Z+ . but uses once again the n = 2 case. If p | ab.9. On the other hand. p | a or p | b. any two have the same color. there is a certain type of induction proof for which the n = 2 case is the most important (often it is also the base case. EXTENDING BINARY PROPERTIES TO n-ARY PROPERTIES 33 all have the same color. . Hn } and T = {H2 . so by Euclid’s Lemma. and that a prime p divides a1 · · · an an+1 . . Hn+1 all have the same color as H1 . . . By induction. Then p | (a1 · · · an )an+1 . But this means that H2 . However it is subtle. . If p | a1 · · · an . and let a. and let a. . In fact the argument is completely correct. Hn+1 . . so we are also done. then they all have the same color. We have been misled by the “dot dot dot” notation which suggests. If the former. n ∈ Z+ . . Proposition 2. . Induction Step: We assume that for a given n ∈ Z+ and a1 . Consider now a set of n + 1 horses. 1 Exercise 5: Use Corollary 2. in any set of n horses. Let’s assume it for now. and there is. Proof. if a prime p divides the product a1 · · · an . . . Then we can swiftly deduce the following useful generalization.10 to show that for any prime p. . We show it for all n ≥ 2 by induction. . the result can be fixed as follows: if in a finite set of horses. If the latter. called Euclid’s Lemma.8. . Let p be a prime. . . Base case: n = 2: This is precisely Euclid’s Lemma. b ∈ Z+ .  Corollary 2. . . Let p be a prime number. . then the proof goes through just fine. n ∈ Z+ and a1 . p | a1 · · · an or p | an+1 . . 5. an ∈ Z+ . and the induction step is easy to show.10. Hn . In fact. but not always). . Then p | an =⇒ p | a. . H2 . by induction!). Hn+1 }. . . Later in this chapter we will give a proof (yes. Proof analysis: Naturally one suspects that there is a mistake somewhere. we’re done. every horse in T has the same color as Hn : in particular Hn+1 has the same color as Hn . then it divides at least one ai .

The Principle of Strong/Complete Induction Problem: A sequence is defined recursively by a1 = 1. which is what we wanted to show. a4 = 3a3 − 2a2 = 3 · 4 − 2 · 2 = 8. P (n) are all true. Find a general formula for an and prove it by induction. then we’d be in good shape: an+1 = 3 · 2n−1 − 2 · 2n−2 = 3 · 2n−1 − 2n−1 = 2 · 2n−1 = 2n = 2(n+1)−1 .34 2. Then P (n) is true for all n ∈ Z+ . but rather allows us to assume the truth of the statement for all positive integer values less than n + 1 in order to prove it for n + 1. if P (1). Indeed. It is easy to see that PS/CI implies the usual principle of mathematical induc- tion. Thus. in a nutshell. A ∧ B means “A and B”. Evidently this goes a bit beyond the type of induction we have seen so far: in addition to assuming the truth of a statement P (n) and using it to prove P (n + 1). . By the induction hypothesis we can replace an with 2n−1 . now what?? A little bit of thought indicates that we think an−1 = 2n−2 . If for some reason it were logically permissible to make that substitution. a5 = 3a4 − 2a3 = 3 · 8 − 2 · 4 = 16. . The evident guess is therefore an = 2n−1 . we also want to assume the truth of P (n − 1). There is a version of induction that allows this. a2 = 2 and an = 3an−1 − 2an−2 . . Proof analysis: Unless we know something better. strong/complete induction allows us to assume not only the truth of our statement for a single value of n in order to prove it for the next value n + 1. and more: Principle of Strong/Complete Induction: Let P (n) be a sentence with domain the positive integers. a6 = 3a5 − 2a4 = 3 · 16 − 2 · 8 = 32. we may as well examine the first few terms of the sequence and hope that a pattern jumps out at us. let’s try: assume that an = 2n−1 . P (n − 1). Then an+1 = 3an − 2an−1 . . The logical form of this is simply3 (A =⇒ C) =⇒ (A ∧ B =⇒ C). We have a3 = 3a2 − 2a1 = 3 · 2 − 2 · 1 = 4. MATHEMATICAL INDUCTION 6. Suppose: (i) P (1) is true. . 3The symbol ∧ denotes logical conjunction: in other words. then P (n + 1) is true. and (ii) For all n ∈ Z+ . getting an+1 = 3 · 2n−1 − 2an−1 . Now a key point: it is not possible to prove this formula using the version of mathematical induction we currently have.

namely Q(n) = P (1) ∧ P (2) ∧ . Here is an application which makes full use of the “strength” of PS/CI. if one can deduce statement C from statement A. Proof. then one can also deduce statement C from A together with some additional hypothesis or hy- potheses B. By definition. . The trick is to introduce a new predicate Q(n). Case 2: n is not prime. with 1 < a. and as above. Q(n) =⇒ Q(n + 1). . Base case: n = 2. In general. . . Proposition 2. . We wish to show that P (n) holds for all n ∈ Z+ . one can use our previous PMI to prove PS/CI. (ii) above tells us that Q(n) =⇒ P (n + 1). b < n. State this explicitly. Let n > 1 be an integer. As above. . Q(n) holds for all n. this is induction’s Kryptonite: it has no help to offer in guessing the answer. Less obviously. ∧ P (n) and also P (n + 1). Let P (n) be a sentence with domain the positive integers and satisfying (i) and (ii) above. hence certainly P (n) holds for all n. such that n = ab. we can take A to be P (n). . By trial and error we guessed that an = 2n−1 . ∧ P (n) ∧ P (n + 1) = Q(n + 1). so we’re good. . . and for all n ≥ 1. Suppose we know PMI and wish to prove PS/CI. Then there exist prime numbers p1 . ∧ P (n). But this was very lucky (or worse: the example was constructed so as to be easy to solve). it might not be so obvious what the answer is. then we know P (1) ∧ . Indeed 2 is prime. Case 1: n is prime. there is a variant of strong/complete induc- tion where instead of starting at 1 we start at any integer N0 . b. 7. Specifically. Exercise 6: As for ordinary induction. a2 = 2. But now our induction hypothesis applies to both a and b: we can write a = p1 · · · pk and b = q1 · · · ql . ∧ P (n − 1). we’re good. . SOLVING HOMOGENEOUS LINEAR RECURRENCES 35 In other words. But then n = ab = p1 · · · pk q1 · · · ql is an expression of n as a product of prime numbers: done!  This is a good example of the use of induction (of one kind or another) to give a very clean proof of a result whose truth was not really in doubt but for which a more straightforward proof is wordier and messier. . 7. where the pi ’s and qj ’s are all prime numbers. using only ordinary induction. this means that there exist integers a. The proof is not hard but slightly tricky. We go by strong induction on n. Solving Homogeneous Linear Recurrences Recall our motivating problem for PS/CI: we were given a sequence defined by a1 = 1. So by PMI. Induction step: Let n > 2 be any integer and assume that the statement is true for all integers 2 ≤ k < n. pk (for some k ≥ 1) such that n = p1 · · · pk . C to be P (n + 1) and B to be P (1) ∧ P (2) ∧ . . We wish to show that it is true for n. So Q(1) holds and for all n. Notice that Q(1) = P (1). But if we know Q(n) = P (1) ∧ . and this was easily confirmed using PS/CI.11. . an = 3an−1 − 2an−2 .

b. F201 = 1. + ba + b. Fn+2 = Fn+1 + Fn .g. x2 = ax1 + b = a(ac + b) + b = ca2 + ba + b. . i=1 a−1 a−1 In particular the sequence xn grows exponentially in n. Again we list some values: F3 = 2. F5 = 5. xn+1 = axn + b.36 2. x4 = 782. F12 = 144. x3 = ax2 + b = a(ca2 + ba + b) + b = ca3 + ba2 + ba + b. . Indeed. F6 = 8. x5 = 3907. F8 = 21. . F200 Cognoscenti may recognize this as the decimal expansion of the golden ratio √ 1+ 5 ϕ= . F11 = 89.618033988749894848204586834 . So suppose we have a. c ∈ R. F201 = 453973694165307953197296969697410619233826. MATHEMATICAL INDUCTION Example: Suppose a sequence is defined by x0 = 2. so the desired expression is correct for all n. e. ∀n ≥ 1. x2 = 32.+ba+b)+b) = can+1 +ban +· · ·+ba2 +ba+b. . it’s not evident. F200 = 280571172992510140037611932413038677189525. ∀n ∈ N. 2 . . The Fn ’s are the famous Fibonacci numbers. F14 = 377. This is a case where more generality brings clarity: it is often easier to detect a pattern involving algebraic expressions than a pattern involving integers. and we define a sequence recursively by x0 = c. F9 = 34. F7 = 13. F4 = 3. Taking ratios of successive values suggests that the base of the exponential lies between 1 and 2. x4 = ax3 + b = a(ca3 + ba2 + ba + b) + b = ca4 + ba3 + ba2 + ba + b. F13 = 233. xn = can + ban−1 + . F15 = 377. x3 = 157. These computations suggest Fn grows exponentially. Aha: it seems that we have for all n ≥ 1. . What’s the pattern? At least to me. we can simplify it: n  n+1 (ab + ac − c)an − b  n X n a −1 xn = ca + b ai = ca + b = . Assuming it is true for n. . we calculate IH xn+1 = axn +b = a(can +ban−1 +. Now let’s try again: x1 = ax0 + b = ac + b. F10 = 55. xn = 5xn−1 − 3 for all n ≥ 1. Let us try our hand on a sequence defined by a two-term recurrence: F1 = F2 = 1. Here the first few terms of the sequence are x1 = 7. Now we have something that induction can help us with: it is true for n = 1.

in which case we have 2 only a single root r = 2b . . Namely. So let’s guess an exponential solution xn = Crn and plug this into the recurrence. It turns out that 1 −1 C1 = √ . So we recover the golden ratio ϕ = 1+2 5 – a good sign! – as well as √ 1− 5 = 1 − ϕ = −. Can it be defined in a sensible way? Yes! Writing the basic recurrence in the form Fn+1 = Fn + Fn−1 and solving for Fn−1 gives: Fn−1 = Fn+1 − Fn . it’s easier to evaluate at n = 1 and n = 0: then we have F0 = C1 ϕ0 + C2 (1 − ϕ)0 = C1 + C2 . 5 5 interlude: This is easily said and indeed involves only high school algebra. and the case c < −b4 in which case the roots are complex numbers.618033988749894848204586834 . But we can do something slicker. C2 = √ . 7. . SOLVING HOMOGENEOUS LINEAR RECURRENCES 37 However. Therefore we propose xn = C1 r1n + C2 r2n as the general solution to the two-term homoge- neous linear recurrence (6) and the two initial conditions x1 = A1 . c we consider an recurrence of the form (6) x1 = A1 . we get Crn+2 = xn+2 = b(Crn+1 ) + c(Crn ). But for this to work we need to know F0 . we get 1 = F1 = C1 ϕ + C2 (1 − ϕ). Multiplying the first equation by ϕ and subtracting it from the second equation will give us a linear equation to solve for C2 . 2 2 Some cases to be concerned about are the case c = −b4 . for real numbers b. In all the cases we’ve looked at the solution was (roughly) exponential. let’s consider a more general problem and make a vaguer guess. and then we plug the solution into either of the equations and solve for C1 . xn+2 = bxn+1 + cxn . x2 = A2 . 2 So we have two different bases – what do we do with that? A little thought shows that if r1n and r2n are both solutions to the recurrence xn+2 = bxn+1 cxn (with any initial conditions). then so is C1 r1n + C2 r2n for any constants C1 and C2 . Trying this for the Fibonacci sequence. ∀n ≥ 1.√ But for the moment let’s look at the Fibonacci √ case: b = c = 1. which simplifies to r2 − br − cr = 0. Then r = 1±2 5 . . 1 = F2 = C1 (ϕ)2 + C2 (1 − ϕ)2 . Instead of determining the constants by evaluating Fn at n = 1 and n = 2. which we have not defined. x2 = A2 provide just enough information to solve for C1 and C2 . Evidently the solutions to this are √ b± b2 + 4c r= . .

We go by strong/complete induction on n. 5 √ 1+ 5 where ϕ = 2 . In particular. whereas plugging in n = 1 gives 1 = C1 (ϕ) + C2 (1 − ϕ) = C1 (ϕ) − C1 (1 − ϕ) = (2ϕ − 1)C1 . Then x3 = 2x2 − x1 = 2 · 2 − 1 = 3.12. xn+2 = 2xn+1 − xn . but we have already checked these: we used them to determine the constants C1 and C2 . it is not quite true that any solution to (6) must have exponential growth. using the identities ϕ2 = ϕ + 1. 2ϕ − 1 2 1+ 5 −1 5 5 2 Now we are ready to prove the following result. we compute 1 Fn+2 = Fn+1 + Fn = √ (ϕn+1 + ϕn − (1 − ϕ)n+1 − (1 − ϕ)n )) 5 1 = √ (ϕn (ϕ + 1) − (1 − ϕ)n (1 − ϕ + 1) = 5 1 √ (ϕn (ϕ2 ) − (−ϕ)−n ((−ϕ)−1 + 1) 5 1 1 1 = √ (ϕn+2 −(−ϕ)−n (−ϕ)−2 ) = √ (ϕn+2 −(−ϕ)−(n+2) ) = √ (ϕn+2 −(1−ϕ)n+2 ). 1 − ϕ−1 = ϕ−2 = (−ϕ)−2 . consider the recurrence x1 = 1. (1 − ϕ) = −ϕ−1 . Theorem 2. x2 = 2. MATHEMATICAL INDUCTION This allows us to define Fn for all integers n. Proof.38 2. For instance. the nth Fibonacci number is 1 Fn = √ (ϕn − (1 − ϕ)n ) . ∀n ≥ 1. we have F0 = F2 − F1 = 1 − 1 = 0. So now assume that n ≥ 3 and that the formula is correct for all positive integers smaller than n + 2. Thus we get 0 = C1 + C2 . x4 = 2x3 − x2 = 2 · 3 − 2 = 4. Then. By the way. 5 5 5  Exercise 7: Find all n ∈ Z such that Fn < 0. . C2 = √ . x5 = 2 · 4 − 3 = 5. The base cases are n = 1 and n = 2. 1 1 1 −1 C1 = =  √  = √ . (Binet’s Formula) For any n ∈ Z.

Equivalently. If not. We wish to show that S = Z+ . it is certainly the least element of S. if P (n) holds for all n and S ⊂ Z is nonempty.e. if 1 ∈ S. Conversely. 8. Indeed. The characteristic polynomial is r2 − 2r + 1 = (r − 1)2 : it has repeated roots. For this. We claim WOP is logically equivalent to the principle of mathematical induc- tion (PMI) and thus also to the principle of strong/complete induction (PS/CI). xn is a constant sequence). observe that WOP holds iff P (n) holds for all n ∈ Z+ . (Well-Ordering Principle) Let S be a nonempty subset of Z+ . see [DC]. Now we can prove that P (n) holds for all n by complete induction: first. If n + 1 is the least element of S. If not. Then S has a least element. But if for every n the answer to the question “Is n an element of S?” is negative. Now assume P (k) for all 1 ≤ k ≤ n. It turns out that in general. i.. These considerations will be eerily familiar to the reader who has studied differen- tial equations. and then we can apply P (n) to see that S has a least element. so clearly there are nonconstant solutions as well.. The Well-Ordering Principle There is yet another form of mathematical induction that can be used to give what is.e. then the two basic solutions are xn = rn and also xn = nrn . 1 ≤ k ≤ S. let S ⊂ Z and suppose that 1 ∈ S. arguably. and that for all n. If it isn’t. but it is not hard to check that this gives a solution to a recurrence of the form xn+2 = 2r0 xn+1 −r02 xn . Theorem 2. Namely. assuming it to be true for all positive integers smaller than n + 2. Intutitively. if n ∈ S then n + 1 ∈ S. then S is empty! The well-ordering principle (henceforth WOP) is often useful in its contraposi- tive form: if a subset S ⊂ Z+ does not have a least element. so P (1) is certainly true. This occurs iff x2 = x1 . . And then we continue in this way: if we eventually get a “yes” answer then we have found our least element. 8. Since we have assumed P (k) is true. an even more elegant proof of Proposition 2. then it contains some positive integer n. It is unfortunately harder to guess this in advance. where P (n) is the following statement: P (n): If S ⊂ Z+ and n ∈ S. and suppose that n + 1 ∈ S. the statement is true by the following reasoning: first we ask: is 1 ∈ S? If so. s ≤ t. For a more systematic exposition on “discrete analogues” of calculus concepts (with applications to the determination of power sums as in §3). it is certainly the least element of S. then it means that there exists k ∈ S. First we will assume PS/CI and show that WOP follows. we wish to show that T = ∅. THE WELL-ORDERING PRINCIPLE 39 It certainly looks as though xn = n for all n. then indeed 1 is the least element of S.11. putting T = Z+ \ S. there exists s ∈ S such that for all t ∈ S. then S = ∅. let us assume WOP and prove PMI. we easily check xn+2 = 2xn+1 − xn = 2(n + 1) − n = 2n + 2 − n = n + 2. we ask: is 2 ∈ S? If so.13. then we’re done. One solution is C1 1n = C1 (i. therefore there exists a least element of S. then S has a least element. Indeed. if the characteristic polynomial is (x − r)2 .

but now our inductive assumption implies that n + 1 = n ∈ S. This kind of argument is often called proof by minimum counterexample. . so that we can write n = m + 1 for some m ∈ Z+ . ≤ pk .. Then p | a or p | b.11 – i.11 using WOP. ≤ ql . 1 ∈ S. . there- fore S is empty: every integer greater than 1 is a product of primes. MATHEMATICAL INDUCTION then by WOP T has a least element. In that case. Further. n 6∈ S.14. Theorem 2. since a and b are integers greater than 1 which are smaller than the least element of S. Now n is certainly not prime. . Every prime factorization can be put in standard form by ordering the primes from least to greatest. (Fundamental Theorem of Arithmetic) The factorization of any integer n > 1 into primes is unique.15. they must each have prime factorizations. But now. A prime factorization n = p1 · · · pk is in standard form if p1 ≤ . (Euclid’s Lemma) Let p be a prime number and a.14 and 15. say n. By assumption. since otherwise our uniqueness statement would have to include the proviso “up to the order of the factors. say n. Then k = l and pi = qi for all 1 ≤ i ≤ k. We wish to show that every integer n > 1 can be factored into primes.1. by WOP it has a least element. . So we must have n = ab with 1 < a. Seeking a contradiction. Theorem 2. contradicting our assumption. Reasoning this out gives an immediate contradiction: first. . b < n. . But then (stop me if you’ve heard this one before) n = ab = p1 · · · pk q1 · · · ql itself can be expressed as a product of primes. since n is the least element of T we must have n − 1 = m ∈ S. Euclid’s Lemma and the Fundamental Theorem of Arithmetic.13 are equivalent: each can be easily deduced from the other. Dealing with standard form factorizations is a convenient bookkeeping device. b be positive Suppose that p | ab. Let us give another proof of Proposition 2. b = q1 · · · ql . up to the order of the factors: suppose n = p1 · · · pk = q1 · · · ql .11 are very close: the difference between a proof by PS/CI and a proof by WOP is more a matter of taste than technique. contradiction. So now we have shown that PMI ⇐⇒ PS/CI =⇒ WOP =⇒ PMI. These two proofs of Proposition 2. .40 2. so we must have n > 1. The following are the two most important theorems in beginning number theory. Let S be the set of integers n > 1 which cannot be factored into primes. since otherwise it can be factored into primes.” Given Proposition 2. with p1 ≤ . are two factorizations of n into primes. the existence of prime factorizations – Theorems 2. ≤ pk and q1 ≤ . 9. say a = p1 · · · pk . we assume S is nonempty. The Fundamental Theorem of Arithmetic 9.e.

Therefore we can cancel them from the expression. This also implies that p1 = qj is less than or equal to all the primes appearing on the right hand side. say p. so j = 1. so (7) n = p1 · · · pk = q1 · · · ql . Here is a proof of Euclid’s Lemma using WOP.2. Rogers [Ro63]. . In the first case p divides a. b ∈ Z+ with p | ab but p . we suppose there is at least one prime such that Eu- clid’s Lemma does not hold for that prime. Now consider the following equation: ab = (a − p)b + pb. THE FUNDAMENTAL THEOREM OF ARITHMETIC 41 EL implies FTA: Assume Euclid’s Lemma. Since the prime factorization is unique. meaning that the primes on the left hand side of (8) are equal. p1 = qj = q1 and pi = qi for 2 ≤ i ≤ j.a. we must have that p1 | qj for some 1 ≤ j ≤ l. following K. so there are a.b. 9. Our proof will be by minimal counterexample: suppose that there are some integers greater than one which factor into primes in more than one way. so it must have a unique factorization into primes. The traditional route to FTA is via Euclid’s Lemma. But since qj is also prime. this means that p1 = qj . where each of the primes is written in nonincreasing order. and let n be the least such integer. let p be a prime. b > 1 and therefore have unique prime factorizations a = p1 · · · pr . this implies Propo- sition 2. By WOP there is a least such prime. our assumption that p divides ab means ab = kp for some k ∈ Z+ and thus ab = p1 · · · pr q1 · · · qs = kp. And we will.9. As seen above. This means that in (7) the two factorizations are the same after all! FTA implies EL: Assume every integer greater than one factors uniquely into a product of primes. p1 But pn1 is less than the least integer which has two different factorizations into primes. so by Proposition 2. Evidently p1 | n = q1 · · · ql . b = q1 · · · qs . we must have at least one pi or at least one qj equal to p. The right hand side of this equation shows that p must appear in the prime factor- ization of ab. and the traditional route to Euclid’s Lemma (Euclid’s route in Elements) is via a series of intermediate steps including the Euclidean algorithm and finding the set of all integer solutions to equations of the form ax+by = 1. Rogers’ Inductive Proof of Euclid’s Lemma. then the other is just p and the conclusion is clear. By WOP there is a least a ∈ Z+ such that there is at least one b ∈ Z+ with p | ab and p . to the primes on the right hand side of (8). and let a and b be positive integers such that p | ab. p . Thus k = l. so we may assume a. in the second case p divides b. Seeking a contradiction.9: if a prime divides any finite product of integers it must divide one of the factors. But we can bypass all these intermediate steps and give direct inductive proofs of both EL and FTA. If either a or b is 1.b.a and p . 9. getting n (8) = p2 · · · pk = q1 · · · qj−1 qj+1 · · · ql . in order. This takes some time to develop – perhaps a week in an elementary number theory course.

contradiction. Therefore k a a a p q = q b. We claim that the standard form factorization of a positive integer is unique. not necessarily distinct from each other. so p | q b. then after relabelling the qj ’s we could assume p1 = q1 and then divide through by p1 = q1 to get a smaller positive integer pn1 . Specifically. qj are already primes which are different from p1 . the prime factorization of pn1 must be unique: i. at least one of the prime factors must be p1 . and since q1 is also prime this implies p1 = q1 . So Euclid’s Lemma holds for all primes p. so q < p. r − 1 = s − 1 and pi = qi for all 2 ≤ i ≤ r. a > 1 – if p | 1 · b. we must have that p | a − p or p | b. However. so by minimality of n.e. the prime factorization of m must be unique. p1 6= qj for any j. .b. Since p | ab. when we factor m = (q1 − p1 )(q2 · · · qs ) into primes.. Evidently 0 < m < n. q | p or q | k. then p | b – so by Proposition 2. contradiction! Case 2: We have a = p. so we must have p | a − p. Without loss of generality. Then x | z. then the set of positive integers which have at least two different standard form factorizations is nonempty. MATHEMATICAL INDUCTION which shows that p | ab ⇐⇒ p | (a − p)b.11 a is divisible by some prime q. so Euclid’s Lemma holds for q. if we had such an equality. Indeed. so it must be that p | aq (so in particular p | a) or p | b. and q | a < p. and now q | a =⇒ q | ab = pk. Assume not. On the other hand. Then. By the assumed minimality of n. m = p1 (p2 · · · pr − q2 · · · qs ) shows that p1 | m. so by Euclid’s Lemma for q. so has a least element. Then multiplying by p1 = q1 we see that we didn’t have two different factorizations after all. In particular p1 6= q1 .42 2. assume p1 < q1 . The Lindemann-Zermelo Inductive Proof of FTA. The first case  is impossible   since p is prime and 1 < q < p. if we subtract p1 q2 · · · qs from both sides of (9). Contradiction! 9. But this implies p1 | q1 . so we must have q | k. . However. Therefore. . Case 3: We have a < p. b) Explain why Theorem 2. following Lindemann [Li33] and Zermelo [Ze34]. 9. Then. b.16 using the Fundamental Theorem of Arithmetic.16. (Generalized Euclid’s Lemma) Let a. . say n. Here is a proof of FTA using WOP.16 is indeed a generalization of Euclid’s Lemma. A Generalized Euclid’s Lemma.3. and we can use these to get a contradiction. Contradiction. where: (9) n = p1 · · · pr = q1 · · · qs .4. but then p | (a − p) + p = a. c ∈ Z+ . Theorem 2. By assumption p . (10) gives two different factorizations of m. But 1 < q < a and a is the least positive integer for which Euclid’s Lemma fails for p and a. There are three cases: Case 1: a − p is a positive integer. Therefore q is a prime which is smaller than the least prime for which Euclid’s Lemma fails. Suppose that x | yz and x and y are relatively prime. Exercise: a) Prove Theorem 2. But then p | a. Here the pi ’s and qj ’s are prime numbers. we get (10) m := n − p1 q2 · · · qs = p1 (p2 · · · pr − q2 · · · qs ) = (q1 − p1 )(q2 · · · qs ). But q2 . we may write pk = ab for some k ∈ Z+ . . so the only way we could get a p1 factor is if p1 | (q1 − p1 ). since 0 < a − p < a and a was by assumption the least positive integer such that Euclid’s Lemma fails for the prime p.

subtraction. The graph of such a function is the horizontal line y = a. In other words. Example: Let n ∈ Z+ . 43 . an ∈ R. The identity function I : R → R by I(x) = x.e. . The general name for a function f : R → R which is built up out of the identity function and the constant functions by finitely many additions and multiplications is a polynomial. Pn Let us define a polynomial expression as an expression of the form i=0 ai xi . the two simplest kinds of functions are the following: Constant functions: for each a ∈ R there is a function Ca : R → R such that for all x ∈ R. In other words. . for any function f : R → R we have I ◦ f = f ◦ I = f. CHAPTER 3 Polynomial and Rational Functions 1. Thus. Such functions are called constant. . we also want to take – at least until we prove it doesn’t make a dif- ference – a more algebraic approach to polynomials. division and composition of functions. b ∈ R. every polynomial function is of the form (11) f : x 7→ an xn + . Example: Let m. However. The function mn : x 7→ xn is built up out of the iden- tity function by repreated multiplication: mn = I · I · · · I (n I’s altogether). Recall that the identity function is so-called because it is an identity element for the operation of function composition: that is. . Then L is built up out of constant functions and the identity function by addition and multiplication: L = Cm · I + Cb .. Polynomial Functions Using the basic operations of addition. the output of the function does not depend on the input: whatever we put in. . . multiplication. The graph of the identity function is the straight line y = x. there exists some n ∈ N such that ai = 0 for all i > n. and consider the function L : R → R by x 7→ mx + b. to give a polynomial expression we need to give for each natural number i a constant ai . + a1 x + a0 for some constants a0 . we can combine very simple functions to build large and interesting (and useful!) classes of functions. For us. the same value a will come out. while requiring that all but finitely many of these constants are equal to zero: i. Ca (x) = a.

e. If I ask you what the coeffi- cient of arcsin x is in the expression π.) But still f and g are given by different “expressions”: if I ask you what the coefficient of arcsin x is in the expression f . This is the poly- nomial whose ith coefficient ai is equal to zero for all i ≥ 0. + a1 x + a0 . Thus in (11) the degree of f is n if and only if an 6= 0: otherwise the degree is smaller than n. g be nonzero polynomial expressions. The degree of q is 2 unless a = 0. then deg(f + g) ≤ max(deg f.. . We will follow this convention here but try not to make too big a deal of it. Similarly the polynomials of degree at most two are the quadratic expressions q(x) = ax2 + bc + c. The degree of L(x) is one if m 6= 0 – i.e. am 6= 0 and g(x) = bn xn + . To give some rough idea of what I mean here. it is extremely convenient to regard deg 0 as negative. Proof. you will have no idea what I’m talking about. such that deg 0 is smaller than the degree of any nonzero polynomial. . you will immediately tell me it is 2. This means that for any d ∈ N “the set of polynomials of degree at most d” includes the zero polynomial. so arcsin x + arccos x = θ + ϕ = π2 .) The polynomials of degree at most one are the linear expressions L = mx + b. . the largest natural number i such that the coefficient ai of xi is nonzero. i.. bn 6= 0 . . Now it turns out for all x ∈ [−1. a) If f + g 6= 0. deg g). b) We have deg(f g) = deg f + deg g. (The angle θ whose sine is x is complementary to the angle ϕ whose cosine is x. Theorem 3. which is a natural number. a) Suppose that f (x) = am xm + . Let f. + b1 x + b0 . if the line is not horizontal – and 0 if m = 0 and b 6= 0. The corresponding functions are linear functions: their graphs are straight lines. Let us give some examples to solidify this important concept: The polynomials of degree at most 0 are the expressions f = a0 . But it is at least conceivable that two different-looking polynomial expressions give rise to the same function. y = 0. consider the two expressions f = 2 arcsin x + 2 arccos x and g = π. We often denote the degree of the polynomial expression f by deg f . POLYNOMIAL AND RATIONAL FUNCTIONS Pn Then every polynomial expression f = i=0 ai xi determines a polynomial func- tion x 7→ f (x). The corresponding functions are all constant functions: their graphs are horizontal lines. Although the zero polynomial expression does not have any natural number as a degree.1.44 3. 1] (the common domain of the arcsin and arccos func- tions) we have f (x) = π. (The graph of the zero polynomial is still a horizontal line. One special polynomial expression is the zero polynomial. so it is useful to include the zero polynomial as having “degree at most zero”. Every nonzero polynomial expression has a degree.

n). the highest order term will be am xm . Take q1 (x) = α βx deg a(x)−deg b(x) . since the polynomial g only smaller powers of x. Proof. . unless bm = −am . In particular the degree of f + g is m = max(m. in this case it will be strictly smaller than m. and since am . b) If f and g are as above. Case 3: Suppose m = n. Case 2: m < n. Step 1: We prove the existence of q(x) and b(x). = a(x) − (q1 (x) + q2 (x) + . define P m X Xn (f ◦ g)(x) = ai ( bj xj )n . bn 6= 0. Take q(x) = 0: deg r(x) = deg a(x) < deg b(x). If deg r1 (x) < deg b(x) then we’re done. Continue in this way generating polynomial expressions qi (x) and ri (x) until we reach a k with deg rk (x) < deg b(x).. . On the other hand. when we add f and g. i=0 j=0 Show that deg(f ◦ g) = (deg f )(deg g).2. am bn 6= 0. deg g = n. so in r1 (x) = a(x) − q1 (x)b(x) the leading terms cancel and deg r1 (x) ≤ deg(a(x))−1. Thus the degree of f + g is at most m. . Then: rk (x) = rk−1 (x) − qk (x)b(x) = rk−2 (x) − qk−1 (x)b(x) − qk b(x) = rk−3 (x) − qk−2 (x)b(x) − qk−1 (x)b(x) − qk b(x) . take q(x) = q1 (x) and r(x) = r1 (x). POLYNOMIAL FUNCTIONS 45 so that deg g = m. 1. so the degree of f + g is n = max(m. Then (f + g)(x) = (am + bm )xm + . + (a1 + b1 )x + (a0 + b0 ). then the leading term of f · g will be am bn xm+n . It will be exactly m unless am + bm = 0. if deg r1 (x) ≥ deg b(x) we apply the process again with r1 (x) in place of a(x): let α1 be the leading coefficient of r1 (x) and take q2 (x) = αβ1 xdeg r1 (x)−deg b(x) .  Pm n Exercise: For polynomial expressions f = i=0 ai xi and g = j=0 bj xj . (Polynomial Division With Remainder) Let a(x) be a polyno- mial expression and b(x) be a nonzero polynomial expression. Case 2: deg a(x) ≥ degb(x). Similarly. the highest order term will be an xn . Then when we add f and g.. Case 1: deg a(x) < deg b(x). . First note that we really get to choose only q(x). The point of this is that q1 (x)b(x) has degee deg a(x) − deg b(x) + deg b(x) = deg a(x) and leading coefficient α β · β = α. n). Thus deg f g = m + n. for then r(x) = a(x) − q(x)b(x). Also let us denote the leading coefficient of a(x) by α and the leading coefficient of b(x) by β. . There are unique polynomial expressions q(x) and r(x) such that (i) a(x) = q(x)b(x) + r(x) and (ii) deg r(x) < deg b(x). so that in r2 (x) = r1 (x) − q2 (x)b(x) the leading terms cancel to give deg r2 (x) ≤ deg(r1 (x)) − 1 ≤ deg(a(x)) − 2. i.e. Case 1: m > n. The following is the most important algebraic property of polynomials. qk (x))b(x). Theorem 3. .

46 3. POLYNOMIAL AND RATIONAL FUNCTIONS

Thus we may take q(x) = q1 (x) +. . . +qk (x), so that r(x) = a(x)−q(x)b(x) = rk (x)
and deg r(x) = deg rk (x) < deg b(x).
Step 2: We prove the uniqueness of q(x) (and thus of r(x) = a(x) − q(x)b(x)).
Suppose Q(x) is another polynomial such that R(x) = a(x) − Q(x)b(x) has degree
less than the degree of b(x). Then
a(x) = q(x)b(x) + r(x) = Q(x)b(x) + R(x),
so
(q(x) − Q(x))b(x) = R(x) − r(x).
Since r(x) and R(x) both have degree less than deg b(x), so does r(x) − R(x), so
deg b(x) > deg(R(x)−r(x)) = deg((q(x)−Q(x))b(x)) = deg(q(x)−Q(x))+deg b(x).
Thus deg(q(x) − Q(x)) < 0, and the only polynomial with negative degree is the
zero polynomial, i.e., q(x) = Q(x) and thus r(x) = R(x).1 
Exercise: Convince yourself that the proof of Step 1 is really a careful, abstract
description of the standard high school procedure for long division of polynomials.

Theorem 3.2 has many important and useful consequences; here are some of them.
Theorem 3.3. (Root-Factor Theorem) Let f (x) be a polynomial expression and
c a real number. The following are equivalent:
(i) f (c) = 0. (“c is a root of f .”)
(ii) There is some polynomial expression q such that as polynomial expressions,
f (x) = (x − c)q(x). (“x − c is a factor of f .”)
Proof. We apply the Division Theorem with a(x) = f (x) and b(x) = x − c,
getting polynomials q(x) and r(x) such that
f (x) = (x − c)q(x) + r(x)
and r(x) is either the zero polynomial or has deg r < deg x − c = 1. In other
words, r(x) is in all cases a constant polynomial (perhaps constantly zero), and its
constant value can be determined by plugging in x = c:
f (c) = (c − c)q(c) + r(c) = r(c).
The converse is easier: if f (x) = (x − c)q(x), then f (c) = (c − c)q(c) = 0. 
Corollary 3.4. Let f be a nonzero polynomial of degree n. Then the corre-
sponding polynomial function f has at most n real roots: i.e., there are at most n
real numbers a such that f (a) = 0.
Proof. By induction on n.
Base case (n = 0): If deg f (x) = 0, then f is a nonzero constant, so has no roots.
Induction Step: Let n ∈ N, suppose that every polynomial of degree n has at
most n real roots, and let f (x) be a polynomial of degree n + 1. If f (x) has no
real root, great. Otherwise, there exists a ∈ R such that f (a) = 0, and by the
Root-Factor Theorem we may write f (x) = (x − a)g(x). Moreover by Theorem
3.1, we have n + 1 = deg f = deg(x − a)g(x) = deg(x − a) + deg g = 1 + deg g, so
deg g = n. Therefore our induction hypothesis applies and g(x) has m distinct real
1If you don’t like the convention that the zero polynomial has negative degree, then you
can phrase the argument as follows: if q(x) − Q(x) were a nonzero polynomial, these degree
considerations would give the absurd conclusion deg(q(x) − Q(x)) < 0, so q(x) − Q(x) = 0.

1. POLYNOMIAL FUNCTIONS 47

roots a1 , . . . , am for some 0 ≤ m ≤ n. Then f has either m + 1 real roots – if a is
distinct from all the roots ai of g – or m real roots – if a = ai for some i, so it has
at most m + 1 ≤ n + 1 real roots. 
Pn
Lemma 3.5. P Let f = i=0 ai xi be a polynomial expression. Suppose that the
n i
function f (x) = i=0 ai x is the zero function: f (x) = 0 for all x ∈ R. Then
ai = 0 for all i, i.e., f is the zero polynomial expression.
Proof. Suppose that f is not the zero polynomial, i.e., ai 6= 0 for some i.
Then it has a degree n ∈ N, so by Corollary 3.4 there are at most n real numbers c
such that f (c) = 0. But this is absurd: f (x) is the zero function, so for all (infinitely
many!) real numbers c we have f (c) = 0. 

Theorem 3.6. (Uniqueness Theorem For Polynomials) Let
f = an xn + . . . + a1 x + a0 ,

g = bn x n + . . . + b1 x + b0
be two polynomial expressions. The following are equivalent:
(i) f and g are equal as polynomial expressions: for all 0 ≤ i ≤ n, ai = bi .
(ii) f and g are equal as polynomial functions: for all c ∈ R, f (c) = g(c).
(iii) There are c1 < c2 < . . . < cn+1 such that f (ci ) = g(ci ) for 1 ≤ i ≤ n + 1.
Proof. (i) =⇒ (ii): This is clear, since if ai = bi for all i then f and g are
being given by the same expression, so they must give the same function.
(ii) =⇒ (iii): This is also immediate: since f (c) = g(c) for all real numbers c, we
may take for instance c1 = 1, c2 = 2, . . . , cn+1 = n + 1.
(iii) =⇒ (i): Consider the polynomial expression
h = f − g = (an − bn )xn + . . . + (a1 − b1 )x + (a0 − b0 ).
Then h(c1 ) = f (c1 ) − g(c1 ) = 0, . . . , h(cn+1 ) = f (cn+1 ) − g(cn+1 ) = 0. So h is a
polynomial of degree at most n which has (at least) n + 1 distinct real roots. By
Corollary 3.4, h must be the zero polynomial expression: that is, for all 0 ≤ i ≤ n,
ai − bi = 0. Equivalently, ai = bi for all 0 ≤ i ≤ n, so f and g are equal as
polynomial expressions. 

In particular, Theorem 3.6 says that if two polynomials f (x) and g(x) look differ-
ent – i.e., they are not coefficient-by-coefficient the same expression – then they are
actually different functions, i.e., there is some c ∈ R such that f (c) 6= g(c).

Finally, we want to prove an arithmetic result about polynomials, the Rational
Roots Theorem. For this we need another number-theoretic preliminary. Two
positive integers a and b are coprime (or relatively prime) if they are not both
divisible by any integer d > 1 (equivalently, they have no common prime factor).
Theorem 3.7. (Rational Roots Theorem) Let a0 , . . . , an be integers, with a0 , an 6=
0. Consider the polynomial
P (x) = an xn + . . . + a1 x + a0 .
Suppose that cb is a rational number, written in lowest terms, which is a root of P :
P ( cb ) = 0. Then a0 is divisible by b and an is divisible by c.

48 3. POLYNOMIAL AND RATIONAL FUNCTIONS

Proof. We know
bn
 
b b
0=P = an n + . . . + a1 + a0 .
c c c
Multiplying through by cn clears denominators, giving
an bn + an−1 bn−1 c + . . . + a1 bcn−1 + a0 cn = 0.
Rewriting this equation as
an bn = c(−an−1 bn−1 − . . . − a0 cn−1 )
shows that an bn is divisible by c. But since b and c have no prime factors in
common and bn has the same distinct prime factors as does b, bn and c have no
prime factors in common and are thus coprime. So Theorem 2.16 applies to show
that an is divisible by c. Similarly, rewriting the equation as
a0 cn = b(−an bn−1 − an−1 bn−2 c − . . . − a1 cn−1 )
shows that a0 cn is divisible by b. As above, since b and c are coprime, so are b and
cn , so by Theorem 2.16 a0 is divisible by b. 

In high school algebra the Rational Roots Theorem is often used to generate a finite
list of possible rational roots of a polynomial with integer coefficients. This is nice,
but there are more impressive applications. For instance, taking an = 1 and noting
that 1 is divisible by c iff c = ±1 we get the following result.
Corollary 3.8. Let a0 , . . . , an−1 ∈ Z, and consider the polynomial
P (x) = xn + an−1 xn−1 + . . . + a1 x + a0 .
Suppose c is a rational number such that P (c) = 0. Then c is an integer.
So what? Let p be any prime number, let n ≥ 2, and consider the polynomial
P (x) = xn − p.
By Corollary 3.8, if c ∈ Q is such that P (c) = 0, then c ∈ Z. But if c ∈ Z is such
that P (c) = 0, then cn = p. But this means that the prime number p is divisible
by the integer c, so c = ±1 or c = ±p. But (±1)n = ±1 and (±p)n = ±pn , so

cn = p is impossible. So there is no c ∈ Q such that cn = p: that is, n p is irrational.

This is a doubly infinite generalization of our first irrationality proof, that 2
is irrational, but the argument is, if anything, shorter and easier to understand.
(However, we did use – and prove, by induction – the Fundamental Theorem of
Arithmetic, a tool which was not available to us at the very beginning of the
course.) Moral: polynomials can be useful in surprising ways!
Proposition 3.9. For every polynomial P of positive degree there are irre-
ducible polynomials p1 , . . . , pk such that
P = p1 · · · pk .
Exercise: Prove it. (Suggestion: adapt the proof that every integer n > 1 can be
factored into a product of primes.)

2. RATIONAL FUNCTIONS 49

2. Rational Functions
A rational function is a function which is a quotient of two polynomial functions:
P (x)
f (x) = Q(x) , with Q(x) not the zero polynomial. To be sure, Q is allowed to have
roots; it is just not allowed to be zero at all points of R.
P (x)
The “natural domain” of Q(x) is the set of real numbers for which Q(x) 6= 0,
i.e., all but a finite (possibly empty) set of points.
2.1. The Partial Fractions Decomposition.

The goal of this section is to derive the Partial Fractions Decomposition (PFD)
for a proper rational function. The PFD is a purely algebraic fact which is useful
in certain analytic contexts: especially, it is the key technique used to find explicit
antiderivatives of rational functions.
Theorem 3.10. Let P1 , P2 , P be polynomials, with P1 monic irreducible and
P2 not divisible by P1 . Then there are polynomials A, B such that
AP1 + BP2 = P.
Proof. Let S be the set of nonzero polynomials of the form aP1 + bP2 for
some polynomials a, b. Since P1 = 1 · P1 + 0 · P2 , P2 = 0 · P1 + 1 · P2 are in S, S =
6 ∅.
By the Well Ordering Principle there is m ∈ S of minimal degree: write
aP1 + bP2 = m.
If a polynomial lies in S then so does every nonzero constant multiple of it, so we
may assume m is monic. We claim that for any M = AP1 + BP2 ∈ S, we must
have m | M . To see this, apply Polynomial Division to M and m, getting
M = Qm + R, deg R < deg m.
Since also
R = M − Qm = (A − Qa)P1 + (B − Qb)P2 ,
so if R 6= 0 we’d have R ∈ S of smaller degree than m, contradiction. Thus R = 0
and m | M . Since P1 ∈ S, m | P1 . Because P1 is monic irreducible, m = P1 or
m = 1. Since P2 ∈ S, also m | P2 , and so by hypothesis m 6= P1 . Thus m = 1 and
aP1 + bP2 = 1.
Multiplying through by P we get
(aP )P1 + (bP )P2 = P,
so we may take A = aP , B = bP . 
Corollary 3.11. (Euclid’s Lemma For Polynomials) Let p be an irreducible
polynomia, and suppose p | P1 P2 . Then p | P1 or p | P2 .
Proof. As is usual in trying to prove statements of the form “A implies (B or
C)”, we assume that B is false and show that C is true. Here this means assuming
p - P1 . By Theorem 3.10, there are polynomials A and B such that
Ap + BP1 = 1.
Multiplying through by P2 gives
ApP2 + BP1 P2 = P2 .

50 3. POLYNOMIAL AND RATIONAL FUNCTIONS

Since p | ApP2 and p | BP1 P2 , p | (Ap2 + BP1 P2 ) = P2 . 

Exercise: Show that every monic polynomial of positive degree factors uniquely into
a product of monic irreducible polynomials.
Corollary 3.12. Let n ∈ Z+ , and let p, P, Q0 be polynomials, such that p is
irreducible and that p does not divide Q0 .
a) There are polynomials B, A such that we have a rational function identity
P (x) B(x) A(x)
(12) = + .
p(x)n Q0 (x) p(x)n p(x)n−1 Q0 (x)
b) If deg P < deg(pn Q0 ), we may choose A and B such that
deg A < deg(pn−1 Q0 ), deg B < deg pn .
Proof. a) Apply Theorem 3.10a) with P1 = p, P2 = Q0 : we get
(13) P = A1 p + BQ0 .
for some polynomials A1 , B. Thus
P A1 p + BQ0 B A1
= = n + n−1 .
pn Q0 n
p Q0 p p Q0
b) By Polynomial Division (Theorem 3.2), there are C, r such that B = Cp + r and
deg r < deg p. Substituting in (13) and putting A = A1 + CQ0 we get
P = A1 p + BQ0 = A1 p + (Cp + r)Q0 = (A1 + CQ0 )p + rq0 = Ap + rQ0 .
We have
deg(rQ0 ) = deg r + deg Q0 < deg p + deg Q0 ≤ deg pn + deg Q0 = deg(pn Q0 ).
Since also deg P < deg(pn Q0 ) and Ap = P − rQ0 , it follows that
deg A + deg p = deg(Ap) < deg(pn Q0 ) = deg Q0 + n deg p,
so
deg A ≤ deg Q0 + (n − 1) deg p = deg(pn−1 Q0 ). 

The identity of Corollary 3.12 can be applied repeatedly: suppose we start with a
P
proper rational function Q , with Q a monic polynomial (as is no loss of generality;
we can absorb the leading coefficient into P ). Then Q is a polynomial of positive
degree, so we may factor it as
Q = pa1 1 · · · par r ,
where p1 , . . . , pr are distinct monic irreducible polynomials. Let us put Q0 =
pa2 2 · · · par r , so Q = pa1 1 Q0 . Applying Corollary 3.12, we may write
P Ba Aa−1
= a11 + n−1
Q p1 p1 Q0
with deg Ba1 < deg p1 and deg Aa1 < deg(pn−1
1 Q0 ). Thus Corollary 3.12 applies to
Aa1
a1 −1 , as well, giving us overall
p1 Q0

P Ba Ba −1 Aa1 −1
= a11 + a11−1 + n−2 .
Q p1 p1 p1 Q0

2. RATIONAL FUNCTIONS 51

And we may continue in this way until we get
P Ba B1 A0
= a11 + . . . + + ,
Q p1 p1 Q0
with each deg B· < deg P1 and deg A0 < deg Q0 . Recalling that Q0 = pa2 2 · · · par r ,
we may put Q0 = pa2 2 Q1 and continue on. Finally we get an identity of the form
P B1,1 B1,a1 Br,1 Br,ar
(14) a1 ar = a1 + . . . + + . . . + ar + . . . + ,
p1 · · · pr p1 p1 pr pr
with deg Bi,j < deg pi for all i and j. From an algebraic perspective, (14) is the
Partial Fractions Decomposition. In fact, though it is not the point for us,
everything that we have done works for polynomials with coefficients in any field K
– e.g. K = Q, K = C, K = Z/pZ. Conversely, when we work with polynomials over
the real numbers, the expression simplifies, because the classification of irreducible
polynomials over the real numbers is rather simple. Indeed:
Lemma 3.13. The irreducible (real!) polynomials are precisely:
(i) The linear polynomials `(x) = a(x − c) for a, c ∈ R, a 6= 0; and
(ii) The quadratic polynomisls Q(x) = ax2 + bx + c for a, b, c ∈ R, b2 − 4ac < 0.
Every linear polynomial is irreducible. Further, from the Rational Roots Theorem,
for a polynomial of degree greater than 1 to be irreducible, it cannot have any real
roots. For quadratic (and, for the record, cubic) polynomials this is also sufficient,
since a nontrivial factorization of a degree 2 or 3 polynomial necessarily has a linear
factor and thus a real root. It is a a consequence of the Quadratic Formula that a
real quadratic polynomial Q(x) = ax2 + bx + c has no real roots iff its discriminant
b2 − 4ac is negative. Thus half of Lemma 3.13 – the half that says that a linear
polynomials and quadratic polynomial without real roots are irreducible – is easy.
The other half of Lemma 3.13 – the half that says that there are no other
irreducible polynomials – is far from easy: in fact it lies well beyond our current
means. In several hundred pages’ time we will return to prove this result as a
consequence of the Fundamental Theorem of Algebra: see Theorem 15.17.
Combining (14) and Lemma 3.13 we get the desired result.
P (x)
Theorem 3.14. (Real Partial Fractions Decomposition) Let Q(x) be a proper
rational function, with
Q(x) = (x − c1 )m1 · · · (x − ck )mk q1 (x)n1 · · · ql (x)nl ;
here c1 , . . . , ck ∈ R, with a 6= 0, c1 , . . . , ck distinct; and q1 , . . . , ql are distinct monic
irreducible quadratics. There are real numbers A·,· , B·,· , C·,· such that we have an
identity of rational functions
(15)
m1 mk n1 nl
P (x) X A1,i X Ak,i X B1,j x + C1,j X Bl,j x + Cl,j
= i
+. . .+ i
+ j
+. . .+ .
Q(x) i=1
(x − c 1 ) i=1
(x − c k ) j=1
q 1 (x) j=1
ql (x)j

Exercise: Show that the constants in (15) are unique.

CHAPTER 4

Continuity and Limits

1. Remarks on the Early History of the Calculus
We have seen that in order to define the derivative f 0 of a function f : R → R we
need to understand the notion of a limit of a function at a point. It turns out that
giving a mathematically rigorous and workable definition of a limit is hard – really
hard. Let us begin with a quick historical survey.

It is generally agreed that calculus was invented (discovered?) independently by
Isaac Newton and Gottfried Wilhelm von Leibniz, roughly in the 1670’s. Leibniz
was the first to publish on calculus, in 1685. However Newton probably could have
published his work on calculus before Leibniz, but held it back for various reasons.1
To say that “calculus was discovered by Newton and Leibniz” is an oversim-
plification. Computations of areas and volumes which we can now recognize as
using calculus concepts go back to ancient Egypt, if not earlier. The Greek math-
ematicians Eudoxus (408-355 BCE) and Archimedes (287-212 BCE) developed
the method of exhaustion, a limiting process which anticipates integral calculus.
Also Chinese and Indian mathematicians made significant achievements. Even in
“modern” Europe, Newton and Leibniz were not functioning in a complete intel-
lectual vacuum. They were responding to and continuing earlier work by Pierre
de Fermat (on tangent lines) and John Wallis, Isaac Barrow and James Gre-
gory. This should not be surprising: all scientific work builds on work of others.
But the accomplishments of Newton and Leibniz were so significant that after their
efforts calculus existed as a systematic body of work, whereas before them it did not.

How did Newton and Leibniz construe the fundamental concept, namely that of
a limit? Both of their efforts were far from satisfactory, indeed far from making
sense. Newton’s limiting concept was based on a notion of fluxions, which is so
obscure as not to be worth our time to describe it. Leibniz, a philosopher and writer
as well as a mathematician, addressed the matter more thoroughly and came up
with the notion of an infinitesimal quantity, a number which is not zero but
“vanishingly small”, i.e., smaller than any “ordinary” positive number.

The concept of infinitesimals has been taken up by mathematicians since Leib-
niz, and eventually with complete mathematical success...but not until the 1960s!2
For instance one has a definition of an infinitesimal element x of an ordered field
K, namely an element x which is positive but smaller than n1 for all n ∈ Z+ . It is

1I highly recommend James Gleick’s biography of Newton. If I wanted to distill hundreds of
pages of information about his personality into one word, the word I would choose is...weirdo.
2See http://en.wikipedia.org/wiki/Nonstandard analysis for an overview of this story.

53

But if you wanted to make trouble and ask why infinitesimals could not be manip- ulated in other ways which swiftly led to contradictions. or more evidently deduced. and Inferences of the modern Analysis are more distinctly conceived. since we already saw that the slope of any secant line to a linear function y = mx + b is just m. So at best Lebiniz was advocating a limiting process based on a different mathematical model of the real numbers than the “standard” modern one. it is now not at all difficult to compute the derivative of a linear func- tion. So now we need to evaluate the limiting slope of the secant lines. subtitled “A DISCOURSE Addressed to an Infidel MATHEMATICIAN. CONTINUITY AND LIMITS easy to see that an ordered field admits infinitesimal elements iff it does not satisfy the Archimedean axiom. and then finally by Weierstrass around 1850.54 4. f (x1 )). Berkeley described fluxions as the ghosts of departed quantities. among other things. WHEREIN It is examined whether the Object. The calculus of Newton and Leibniz had a famous early critic. I haven’t read Berkeley’s text.assuming an innocuous fact about limits. Lebiniz’s writing on infinitesimals seems like equivocation: at different stages of a calculation the same quantity is at one point “vanishingly small” and at another point not. but from what I am told it displays a remarkable amount of mathematical sophistication and most of its criticisms are essentially valid! So if the mid 17th century is the birth of systematic calculus it is not the birth of a satisfactory treatment of the limiting concept.. Bishop George Berkeley. so that the secant line must be y = mx + b. correct!) answer.2: Let f (x) = mx + b. secant line between the two points (x1 .” Fa- mously. But there is a unique line passing through a given point with a given slope.. f (x1 )) and (x2 . f (x2 )) is the line y = f (x). 2. Then f has the following property: for any x1 6= x2 . it was all too easy to do so. as does the linear function f . Indeed. h→0 h h→0 h→0 h h→0 The above computation is no surprise. But surely if the slope of every secant line . Derivatives Without a Careful Definition of Limits Example 2. And at worst. When did this come? More than 150 years later! The modern definition of limits via inequalities was given by Bolzano in 1817 (but not widely read). than Religious Mysteries and Points of Faith. some goodwill: if you used them as Newton and Leibniz did in their calculations then at the end you would get a sensible (in fact. In 1734 he published The Analyst. The calculus of both fluxions and infnitesimals required. whereas the real numbers R do satisfy the Archimedean axiom. in a somewhat imprecise form by Cauchy in his influential 1821 text. Principles. Then f (x + h) − f (x) mh f 0 (x) = lim = lim m(x + h) + b − (mx + b)h = lim = lim m. the slope of the secant line is f (x2 ) − f (x1 ) mx2 + b − (mx1 + b) m(x2 − x1 ) = = = m.1: Let f (x) = mx + b be a linear function. Using this. x2 − x1 x2 − x1 x2 − x1 Thus the secant line has slope m and passes through the point (x1 . Example 2.

The limit of a constant function f (x) = C as x approaches a is C. Therefore 2x + h is infinitesimally close to 2x. But these are just words. 1 At this point we have seen many examples of a very pleasant algebraic phenomenon.3: Let f (x) = x2 .5: For n ∈ Z+ . and so in the limit the value is 2x. because when this happens we get f (x + h) − f (x) hg(h) f 0 (x) = lim = lim = lim g(h) = g(0). DERIVATIVES WITHOUT A CAREFUL DEFINITION OF LIMITS 55 is m. Then f (x + h) − f (x) (x + h)3 − x3 x3 + 3x2 h + 3xh2 + h3 − x3 f 0 (x) = lim = lim = lim h→0 h h→0 h h→0 h h(3x2 + 3xh + h2 ) = lim = lim 3x2 + 3xh + h2 . and then we plugged in h = 0. Then Pn n n−i i 0 f (x + h) − f (x) (x + h)n − xn x h − xn f (x) = lim = lim = lim i=0 i h→0 h h→0 h h→0 h Pn n n−i i−i   n   h i=1 i x h n n−1 X n = lim = lim x +h xn−i hi−2 h→0 h h→0 1 i=2 i   n n−1 = x = nxn−1 .1. Fact 4. h→0 h h→0 h h→0 Now Leibniz would argue as follows: in computing the limit. for y = f (x) a polynomial function. h→0 h h→0 Again we have simplified to the point where we may meaningfully set h = 0. This is exactly what we need in order to compute the derivative. and thus f 0 (x) = m (constant function). we want to take h infinitesimally small. where g(h) is another polynomial in h. Example 2. If you wanted to give a freshman calculus student practical instructions on how to compute derivatives of reasonably simple functions directly from the definition. when we compute the difference quotient f (x+h)−f h (x) we find that the numerator. Namely. let f (x) = xn . Example 2. Let us record the fact about limits we used.4: f (x) = x3 . getting f 0 (x) = 3x2 . Thus f 0 (x) = 2x. 2. the desired limiting slope is also m. I think you couldn’t do much better than this! Example 2. A simpler and equally accurate description of what we have done is as follows: we simplified the difference quotient f (x+h)−f h (x) until we got an expression in which it made good sense to plug in h = 0. say G(h) = f (x + h) − f (x). always has h as a factor: thus we can write it as G(h) = hg(h). h→0 h h→0 h h→0 . Then f (x + h) − f (x) (x + h)2 − x2 f 0 (x) = lim = lim h→0 h h→0 h x2 + 2xh + h2 − x2 h(2x + h) = lim = lim = lim 2x + h.

+ a1 x + a0 . In particular the derivative of a degree n polynomial is a polynomial of degree n − 1 (and the derivative of a constant polynomial is the zero polynomial). If limx→a f (x) = L. . .3. since constant functions are polynomials. Now what about sums? Let f and g be two differentiable functions. then limx→ (f + g)(x) = L + M. Let us again record our assumption about limits. . Fact 4. . + fn0 . . we get (f + g)0 = f 0 + g 0 .4. . Then the derivative of f + g is f (x + h) + g(x + h) − f (x) − g(x) f (x + h) − f (x) g(x + h) − g(x) lim = lim + . . For any polynomial function g. A . Fact 4.56 4.2. Perhaps we can establish some techniques to streamline the process? For instance. . then (f1 + . evidently. fn0 . The second. fn are functions with derivatives f10 . where c is some real number? Let’s see:   cf (x + h) − cf (x) f (x + h) − f (x) lim = lim c . h→0 h h→0 h h If we assume the limit of a sum is the sum of limits. . Putting these facts together. provided we assume the following fact. let’s record what we’ve used. 3.) Differentiating polynomial functions directly from the definition is. . we get an expression for the derivative of any polyno- mial function: if f (x) = an xn + . some- what tedious. h→0 h h→0 h If we assume limx→a cf (x) = c limx→a f (x). lim g(h) = g(a). + a1 . The first is. then f 0 (x) = nan xn−1 + . we can! By the Root-Factor Theorem. . . If limx→a f (x) = L and limx→a g(x) = M . this tells us that the derivative of the general monomial cxn is cnxn−1 .6: Show by mathematical induction that if f1 . Fact 4. is that of continuity at a point. More gen- erally. . just as important. Again. This tells for instance that the derivative of 17x10 is 17(10x9 ) = 170x9 . then limx→a cf (x) = cL. Exercise 2. we may factor out h = h−0 from the polynomial G(h) iff G(0) = 0. + fn )0 = f10 + . In freshman calculus it is traditional to define continuity in terms of limits. the notion of the limit of a function at a point. of course. CONTINUITY AND LIMITS Can we guarantee that this factorization f (x + h) − f (x) = G(h) = hg(h) always takes place? Yes. But G(0) = f (x+0)−f (x) = f (x)−f (x) = 0. . suppose we know the derivative of some function f : what can we say about the derivative of cf (x). Thus some simple polynomial algebra ensures that we will be able to compute the derivative of any polynomial function. Limits in Terms of Continuity We have been dancing around two fundmaental issues in our provisional treatment of derivatives. . then we can complete this computation: the derivative of cf (x) is cf 0 (x). . h→a (This generalizes our first fact above.

then g ◦ f is continuous at x = c.) Fact 4. x 6= c. c + δ) except possibly at c. Note though that this function is not defined at h = 0: the denominator is equal to zero. This brings us to the following definition. f (c) = L. In fact what we are trying to when differentiating is to find the most rea- sonable extension of the right hand side to a function which is defined at 0. From this relatively small number of facts many other facts follow. Thus the limit L of a function as x → c is a value that if we “plug the hole” in the graph of the function y = f (x) by setting f (c) = L. For instance. Here it is crucial to remark that c need not be in the domain of f . a) Every constant function is continuous at every c ∈ R. every rational function fg is continuous at every c in its domain. it seems to be of some value to define limits in terms of continuity. c + δ) is contained in D.5. For any x ∈ R. at all points c such that g(c) 6= 0. Similarly.δ about c: that is. . is continuous at c. Then limx→c f (x) = L if the function f with domain (c − δ. b) The identity function I(x) = x is continuous at every c ∈ R. one can define limits in terms of it. Let f be a real-valued function defined on some subset D of R such that (c − δ. but let’s save this for later. 3. e) If f is continuous at x = c and g is continuous at x = f (c). h→0 h Here x is fixed and we are thinking of the difference quotient as a function of h. c) If f and g are continuous at x = c. We now wish to define the limit of a function f at a point c ∈ R. there is some δ > 0 such that all points in (c − δ. we get a graph which is 3The concept of continuity also makes sense for functions with domain a proper subset of R. Let L be a real number. d) If f is continuous at x = c and f (c) 6= 0. Rather what we need is that f is defined on some deleted interval Ic. c) ∪ (c. Let f : R → R be a function. since polynomials are built up out of the identity function and the constant func- tions by repeated addition and multiplication.e. f is defined. then f + g and f · g are continuous at x = c. Since I think most people have at least some vague notion of what a continuous function is – very roughly it is that the graph y = f (x) is a nice. We say f is simply continuous if it is continuous at x for every x ∈ R. c + δ) and defined as f (x) = f (x). it follows that all polynomials are continuous at every c ∈ R. To see that this a necessary business.3u Here are some basic and useful properties of continuous functions. unbroken curve – and I know all too well that many students have zero in- tuition for limits.. i. then f1 is continuous at x = c. consider the basic limit defining the derivative: f (x + h) − f (x) f 0 (x) = lim . LIMITS IN TERMS OF CONTINUITY 57 true fact which is not often mentioned is that this works just as well the other way around: treating the concept of a continuous function as known. f may or may not be continu- ous at x. (Of course we cannot prove them until we give a formal definition of continuous functionsu.

. y) is x > y then the first statement is true – for every real number. i.g. whenever we speak of δ > 0 we always take it as implicit that δ ≤ ∆.e. it is g(0). Statements of the form “For all x. Morover we say that f is continuous if it is continuous at c for all c in its domain D. Continuity Done Right 4. The -δ definition of continuity has three alternating quantifiers: for all. CONTINUITY AND LIMITS continuous – just think nicely behaved. then for all. c + ∆) and that for all  > 0. The bit about the domain D is traditionally left a bit more implicit. In general.1. there exists y such that P (x. (For instance. to some new function g(h) which is continuous at zero. if x and y are real numbers and P (x. We start with the difference quotient f (x+h)−f h (x) which is defined for all sufficiently small h but not for h = 0. In general. the inequality |x−c| < δ means c−δ < x < c+δ.) Then the limiting value is obtained simply by plugging in h = 0. for all h 6= 0. Note that an immediate consequence of the definition is that if f (x) is itself con- tinuous at x = c. then there exists. |f (x) − f (c)| < .e.. P (x. The condition is equivalent to requiring that there be some ∆ > 0 such that f is defined on (c − ∆. Then we manipulate / simplify the difference quotient until we recognize it as being equal. when f (x) = x2 . Thus the statement is saying something about . y)” are a bit more complex. it determines a hor- izontal strip centered at the horizontal line y = f (c) of width 2. to fully comphrehend the meaning of statements this logically complex takes serious mental energy. mathematical statements become more complex and harder to parse the more alternating quantifiers they have: i. Similarly. P (x)” have a simple logical structure (if P (x) is itself something reasonably simple. the untrained mind must stop to remind itself that they are not logically equivalent: e. so determines a vertical strip centered at the vertical line x = c of width 2δ. for now – at x = c. 4. Thus the limit of a continuous function at a point is simply the value of the function at that point. Let us first talk about the geometric meaning of the statement: the inequality |f (x) − f (c)| <  means f (c) −  < f (x) < f (c) + : that is. of course). y)” and “There exists x such that for all y. Let D ⊂ R and f : D → R be a function. c + δ) ⊂ D and for all x with |x − c| < δ. statements of the form “There ex- ists x such that P (x)” or “For all x. there exists a greatest real number – and the second statement is false – there is no real number which is greater than every eal number. but since we are trying to be precise we may as well be completely precise. then it is already defined at c and limx→c f (x) = f (c). This is very important! In fact. For c ∈ R we say f is continuous at c if for all  > 0 there exists δ > 0 such that (c − δ.58 4. we can now give a better account of what we have been doing when we compute f 0 (x). The formal defininition of continuity. that new function was g(h) = 2x + h.

a) For any  > 0. Moreover. b) If f and g are both continuous at c then f + g is continuous at c. Lemma 4. leaving the general case as an extra credit problem. .. Example 4. There exists δ > 0 such that |x − c| < δ implies |f (x) − f (c)| < |f (c)| 2 . By the Reverse Triangle Inequality. 4. there exists δ > 0 such that |x−c| < δ implies |f (x)| ≥ α|f (c)|. |f (x)| − |f (c)| ≤ |f (x) − f (c)| < . The second player chooses a δ > 0. 1). We may think of it as a game: the first player chooses any  > 0 and thereby lays down a horizontal strip bounded above and below by the lines f (c) +  and f (c) − .3: Linear functions f (x) = mx + b are continuous. it is possible for her to win no matter which  > 0 the first player names. CONTINUITY DONE RIGHT 59 approximating both the y-values and the x-values of the function. there exists δ > 0 such that |x − c| < δ implies |f (x) − f (c)| < . 2 or |f (c)| |f (c)| |f (x)| > |f (c)| − = . the graph of the function is trapped between the two vertical lines f (c) ± .6. then Af is continuous at c. so |f (x)| ≤ |f (c)| + .. Then for any α ∈ (0.4: f (x) = x2 is continuous. Example 4... Example 4. Example 4. b) Suppose f (c) 6= 0. Now let us talk about the logical meaning of the statement and the sequence of quantifiers. there exists δ > 0 such that |x − c| < δ implies |f (x)| ≤ |f (c)| + .1: Constant functins f (x) = C are continuous. (Upper and Lower Bounds for Continuous Functions) Let f be continuous at x = c. a) If f is continuous at c and A ∈ R.. the second player wins if by restricting to x values lying between the vertical lines c−δ and c+δ. 2 2  Theorem 4. otherwise the first player wins. b) We will prove the result for α = 12 . Let f and g be functions and c ∈ R..7. 4. . a) For any  > 0. .2: The identity function I(x) = x is continuous. . Proof.. Again the Reverse Triangle Inequality implies |f (c)| |f (c)| − |f (x)| ≤ |f (x) − f (c)| < . Now the assertion that f is continuous at x = c is equivalent to the fact that the second player has a winning strategy: in other words. Basic properties of continuous functions. .2..

|A| b) Fix  > 0. A quantity which we can make as small as we like times a constant can still be made as small as we like! Now formally: we may assume A 6= 0 for othewise Af is the constantly zero function. δ2 ). so can be made arbitarily small. Moreover in the term |f (x)||g(x) − g(c)| we can make |g(x) − g(c)| arbitarily small by taking x sufficiently close to c. The situation here is somewhat perplexing: clearly we need to use the continuity of f and g at c. and then. Moreover. and to do this it stands to reason that we should be estimating |f (x)g(x)−f (c)g(c)| in terms of |f (x)−f (c)| and |g(x)−g(c)|. we can make each of |f (x) − f (c)| and |g(x) − g(c)| as small as we like by taking x sufficiently close to c. So we force them to appear by adding and subtracting f (x)g(c): |f (x)g(x) − f (c)g(c)| = |f (x)g(x) − f (x)g(c) + f (x)g(c) − f (c)g(c)| ≤ |f (x)||g(x) − g(c)| + |g(c)||f (x) − f (c)|. For any  > 0. The sum of two quantities which can each be made as small as we like can be made as small as we like! Now formally: choose δ1 > 0 such that |x − c| < δ1 implies |f (x) − f (c)| < 2 . e) If f is continuous at c and g is continuous at f (c) then g ◦ f is continuous at c. This is much better: |g(c)||f (x) − f (c)| is a constant times something which can be made arbitrarily small. Then |x − c| < δ implies |x − c| < δ1 and |x − c| < δ2 . We must show that there exists δ > 0 such that |x − c| < δ implies |f (x) + g(x) − (f (c) + g(c))| < . Choose δ2 > 0 such that |x − c| < δ2 implies |g(x) − g(c)| < 2 . we can make |f (x) − f (c)| less than any positive number we choose. Now |f (x)+g(x)−(f (c)+g(c))| = |(f (x)−f (c))+(g(x)−g(c))| ≤ |f (x)−f (c)|+|g(x)−g(c)|. We must show that there exists δ > 0 such that |x − c| < δ implies |Af (x) − Af (c)| < .) Then |x − c| < δ implies  |Af (x) − Af (c)| = |A||f (x) − f (c)| < |A| · = . This is good: since f and g are both continuous at c. by continuity of f . since f is continuous at x = c there exists δ > 0 such that |x − c| < δ implies |f (x) − f (c)| <  |A| .60 4. 2 2 c) Fix  > 0. We must show that there exists δ > 0 such that |x − c| < δ implies |f (x)g(x) − f (c)g(c)| < . d) If f and g are both continuous at c and g(c) 6= 0. CONTINUITY AND LIMITS c) If f and g are both continuous at c then f g is continuous at c. For each part we work out the idea of the proof first and then translate it into a formal -δ argument. where  is a previously given positive number. Let δ = min(δ1 . and something which is bounded times something which can be made arbitarily small can be made arbitrarily small! Now formally: . a) Fix  > 0. |f (x)| gets arbitrarily close to |f (c)|. so   |f (x) + g(x) − (f (c) + g(c))| ≤ |f (x) − f (c)| + |g(x) − g(c)| < + = . but |Af (x) − Af (c)| = |A||f (x) − f (c)|. but unfortunately we do not yet see these latter two expressions. which we have already proved is continuous. precisely because f is continuous at c we may make the quantity |f (x) − f (c)— as small as we like by taking x sufficiently close to c. (Note what is being done here: by continuity. Proof. then fg is continuous at c. It is convenient for us to make it smaller than  |A| . So |f (x)| is nonconstant but bounded.

there exists δ1 > 0 such that |x − c| < δ1 implies |f (x)| ≤ |f (c)| + 1. since f is continuous at c. Proof. |x − c| < δ implies |g(c)|2     1 1 1 2 | − |= |g(x) − g(c)| <  = . for |x − c| < δ then |x − c| is less than δ1 . there exists δ > 0 such that |x−c| < δ implies |f (x)−f (c)| < γ.8. Since rational functions are built out of constant functions and the identity by repeated addition. if |x−c| < δ. δ3 . Taking δ = min(δ1 . 4. CONTINUITY DONE RIGHT 61 Using Lemma 4. we can make the numerator |g(x) − g(c)| as small as we like by taking x sufficiently close to c. But indeed. (Here we are assuming that g(c) 6= 0. Since g(y) is continuous at y = f (c). δ2 . There exists δ2 > 0 such that |x − c| < δ2 implies  |g(x) − g(c)| < 2(|f (c)|+1) . multiplication and division. g(x) g(c) Now 1 1 |g(c) − g(x)| |g(x) − g(c)| | − |= = . there exists δ3 > 0 such that |x − c| < δ3 implies  |f (x) − f (c)| < 2|g(c)| .  . 2(|f (c)| + 1) 2|g(c)| 2 2 d) Since fg = f · g1 . δ2 .) Taking δ = min δ1 . We must show that there exists δ > 0 such that |x − c| < δ implies 1 1 | − | < .6b) with α = 12 : there exists δ1 > 0 such that |x − c| < δ1 implies |g(x)| ≥ |g(c)| 2 and thus also 1 2 ≤ . Thus again we have a quantity which we can make arbitarily small times a bounded quantity. so it can be made arbitrarily small! Now formally: We apply Lemma 4. This will make the entire fraction as small as we like provided the denominator is not also getting arbitrarily small as x approaches c. Thus. since g is continuous at c and g(c) 6= 0. |f (x) − f (c)| = |y − f (c)| < γ and hence |g(f (x)) − g(f (c))| = |g(y) − g(f (c))| < . Finally. g(x) g(c) |g(x)||g(c)| |g(c)|2 2 e) Fix  > 0. δ2 and δ3 so |f (x)g(x) − f (c)g(c)| ≤ |f (x)||g(x) − g(c)| + |g(c)||f (x) − f (c)|     < (|f (c)| + 1) · + |g(c)| = + = .  Corollary 4. g(x) g(c) |g(x)||g(c)| |g(x)||g(c)| Since g is continuous at x = c. If g(c) = 0 then we simply don’t have the second term in our expression and the argument is similar but easier.7. there exists γ > 0 such that |y − f (c)| < γ implies |g(y) − g(f (c))| < . in light of part c) it will suffice to show that if g is continuous at c and g(c) 6= 0 then g1 is continuous at c. Fix  > 0. Moreover. this follows immediately from Theorem 4. |g(x)||g(c)| |g(c)|2   |g(c)|2 Also there exists δ2 > 0 such that |x − c| < δ2 implies |g(x) − g(c)| < 2 . the denominator is approaching |g(c)|2 6= 0. All rational functions are continuous.6a) and taking  = 1.

we will use the assumed continuity of the sine function to differentiate it). so we are poorly placed to rigorously prove their continuity. Limits Done Right 5. Proof.62 4. c + δ).δ – i. CONTINUITY AND LIMITS Other elementary functions: unfortunately if we try to go beyond rational func- tions to other elementary functions familiar from precalculus. Remark 4. sin x and cos x. it is no loss of generality to suppose that L < M (otherwise switch L and M ) and we do so. if L and M are two numbers such that limx→c f (x) = L and limx→c f (x) = M . c) together with the points (c. all the elementary func- tions above are indeed continuous at√every point of their domain (with the small proviso that for power functions like x we will need to give a separate definition of continuity at an endpoint of an interval. take even the relatively innocuous f (x) = x. perhaps the following is the most basic and important. Now we take  = M 2−L in the definition of limit: since limx→c f (x) = L. we will proceed for now on the assumption that all the above elementary functions are continuous. for all x with 0 < |x − c| < δ – f is defined at x and |f (x) − L| < . In order to formally define limits.. or more colloquially it contains all points “sufficiently close to c but not equal to c”. it is convenient to have the notion of a deleted interval Ic.much later. e. The limit at a point is unique (if it exists at all): that is. Thus Ic consists of (c − δ. but this uses the special property of R that every non-negative number has a square root: we haven’t proved this yet! If α > 0is irrational we have not given any definition of the power function xα . For real numbers c and L and a function f : D ⊂ R → R.g. And in fact we will prove this later in the course. log x. 5.1. satisfactory definitions √ of these functions! For instance. Theorem 4. Now comes the definition. Thus it will be clear that we are not arguing circularly when we finally prove the continuity of these functions. Similarly we do not yet have rigorous definitions of ax for a > 1.9.. namely a set of real numbers of the form 0 < |x − c| < δ for some δ > 0. Among all the many problems of limits. coming up soon). then L = M. ∞).5: This assumption can be justified! That is.δ about a point c.e. we suppose L 6= M . we say limx→c f (x) = L if for every  > 0 there exists δ > 0 such that for all x in the deleted interval Ic.. We want this function to have domain [0. Seeking a contradiction. we run into the issue that we have not yet given complete.6: We will not use the continuity of the elementary functions as an assumption in any of our main results (but only in results and examples explicitly involving elementary functions. However (following Spivak) in order so as not to drastically limit the supply of functions to appear in our examples and exercises. The Formal Definition of a Limit. We hasten to make two remarks: Remark 4. there exists δ1 > 0 such .

Let f and g be two functions defined in a deleted interval Ic. and let c ∈ R. It is possible to prove any/all of these facts in one of two ways: (i) by rephrasing the definition of limit in terms of continuity and appealing to Theorem 4. a) For any constant A. b) limx→c f (x) = L iff f is defined in some deleted interval Ic. for 0 < |x − c| < δ we get both inequalities: M −L |f (x) − L| < 2 M −L |f (x) − M | < . Basic Properties of Limits. 2 However these inequalities are contradictory! Before we go further we urge the reader to draw a picture to see that the vertical strips defined by the two in- equalities above are disjoint: they have no points in common. then limx→c fg(x) (x) L =M .  We have now given a formal definition of continuity at a point and also a formal definition of limits at a point. On the other hand. δ2 . Taking δ = min(δ1 . f (x) > M − M 2−L = M 2+L . limx→c Af (x) = AL. . x 6= c. and similarly. Therefore L = M . Let us now check this formally: since |f (x) − L| < M 2−L .7 to the current context. All the pedagogical advantage here comes from working through this yourself rather than reading my proof. Now what about limits of composite functions? The natural analogue of Theo- rem 4. as usual. We leave the proof to the reader. a) f is continuous at x = c if and only if f is defined at c and limx→c f (x) = f (c).δ of a point c. f (c) = L makes f continuous at x = c. Theorem 4. Proof. let alone a deleted interval around c of such values of x.2. since |f (x) − M | < M 2−L . since limx→c f (x) = M . Clearly there is not a single value of x such that f (x) is at the same time greater than and less than M +L 2 .10. d) If M 6= 0. We should therefore check the compatibility of our definitions. defining f on (c − δ.  5. Theorem 4. then limx→c g(f (x)) = M . b) We have limx→c f (x) + g(x) = L + M . Most of the basic properties of continuity discussed above have analogues for limits. there exists δ2 > 0 such that 0 < |x − c| < δ2 implies |f (x) − M | < M 2−L . then.7 above. so we are now in an “overdeter- mined” situation. 5. so we have reached a contradiction.7e) above would be the following: If limx→c f (x) = L and limx→L g(x) = M . Let f : D ⊂ R → R.11. c + δ) by f (x) = f (x). or (ii) adapting the proofs of Theorem 4. f (x) < L + M 2−L = M 2+L . LIMITS DONE RIGHT 63 that 0 < |x − c| < δ1 implies |f (x) − L| < M 2−L . We state the facts in the following result. Previously though we argued that each of limits and continuity can be defined in terms of the other. c) We have limx→c f (x)g(x) = LM . We suppose that limx→c f (x) = L and limx→c g(x) = M . so I leave it to you.δ around x and.

. Since limy→f (c) g(y) = g(L). (i) =⇒ (ii): We will argue by contradiction: suppose that neither a) nor b) holds..64 4. Now fix  > 0. Indeed the proof of the following result is almost identical to that of Theorem 4. for every δ > 0 there exists x with 0 < |x − c| < δ such that f (x) = L. So it suffices to assume that b) holds: there exists some ∆ > 0 such that 0 < |x − c| < ∆ implies f (x) 6= L.12. Theorem 4. Then limx→c g(f (x)) = g(L). g is not continuous at L. there exists γ > 0 such that 0 < |y − L| < γ implies |g(y) − M | < . then lim g(f (x)) = g( lim f (x)). Indeed. it shows that the continuity of the “inside function” f was not really used. For such x we have g(f (x)) = g(L). g(f (x)) = g(0) = 0. Similarly (and familiarly). This result can be repaired by requiring the continuity of the “outside” function g(x). Since limx→c f (x) = L. It happens that one can say exactly when the above statement about limits of composite funtions holds. g is continuous at L – is precisely Theorem 4. I don’t plan on mentioning this in class and you needn’t keep it in mind or even read it. x→c x→c In other words.7e) above in which f is also continuous at c: in fact. but for all x ∈ R. Proof. .e. The following are equivalent: (i) limx→c g(f (x)) = M . since limx→c f (x) = L. one can “pull a limit through a continuous function”. f (x) 6= L. (Marjanović [MK09]) Suppose limx→c f (x) = L and limx→L g(x) = M . i. there exists δ > 0 such that 0 < |x − c| < δ implies |y − L| = |f (x) − L| < γ and thus |g(f (x)) − g(L)| = |g(y) − g(L)| < . Fix  > 0. (ii) At least one of the following holds: a) g is continuous at L. we have |f (x)−L| < γ. But since a) does not hold. so limx→c g(f (x)) 6= M . but I recently learned that this question has a rather simple answer so I might as well record it here so I don’t forget it myself.12 as: if g is continuous and limx→c f (x) exists.13. Taking  = |g(L) − M | this shows that there is no δ > 0 such that 0 < |x − c| < δ implies |g(f (x)) − M | < . (ii) =⇒ (i). Then limx→0 f (x) = 0 and limx→0 g(x) = 1. there exists δ1 > 0 such that 0 < |x − c| < δ1 implies |f (x)−L| < γ. Let g(x) be equal to 1 for x 6= 0 and g(0) = 0. The case in which a) holds – i. since limx→L g(x) = M .  One may rephrase Theorem 4. b) There exists ∆ > 0 such that for all 0 < |x − c| < ∆. Proof. so limx→0 g(f (x)) = 0. Let f. Here is the point: for 0 < |x−c| < δ1 . M 6= g(L).1: Let f (x) = 0 (constant function). In this form the result is actually a standard one in freshman calculus. Theorem 4.12. Thus g(f (x)) = g(L) 6= M . g be functions with limx→c f (x) = L and g continuous at L. Take c = 0. there exists γ > 0 such that |y − f (c)| < γ implies |g(y) − g(L)| < . we will show that limx→c g(f (x)) 6= M . CONTINUITY AND LIMITS Unfortunately the above statement is not always true! Example 5.e. since b) dooes not hold.

. . In turn this circular sector is contained in a second right triangle T2 . 0). f (x) and M (x) be defined on some deleted interval I ◦ = (c − ∆. so L −  < f (x) < L + . There exists δ1 > 0 such that 0 < |x − c| < δ1 implies |m(x) − L| <  and δ2 > 0 such that 0 < |x − c| < δ2 implies |M (x) − L| < . sin x). Then 0 < |x − c| < δ implies f (x) ≤ M (x) < L +  and f (x) ≥ m(x) > L − .  Example 5. (1. .) Consider the unit circle and the point on it Px = (cos x. δ2 ). m(x) ≤ f (x) ≤ M (x)..5: limx→0 sinx x = 1. the area of T1 is less than or equal to the area of the circular sector. 0). Proof. 5. This right triangle is contained in the circular sector determined by all points on or inside the unit circle with angle between 0 and x. (Nevertheless.. 1 The area of T1 is (one half the base times the height) 2 cos x sin x.4: The function fα defined above is differentiable at x = 0 iff α > 1. . and this is exactly what the additional hypothesis b) allows us to rule out: if we take δ = min(δ1 . . with vertices (0. . Example 5. Now let us write down the inequalities ex- pressing the fact that since T1 is contained in the circular sector which is contained in T2 .3.. we will as- sume that the trigonometric functions are continuous and use geometric reasoning. including properties of angles and arclength. c + ∆) − {c} about x = c. So our only concern is that pehaps f (x) = L for some c with 0 < |x − c| < δ1 .13 involving making an inverse change of variables to evaluate a limit. The area of . tan x). 0). 0). By our assumption about the continuity of sign and our results on continuity of rational functions and compositions of continuous functions. then we may conclude that |g(f (x)) − M | < . define fα : R → R by fα (x) = x α sin( x1 ). Px . Let δ = min(δ1 . Solution: This will be a “17th century solution” to the problem: i.  Remark 5. (1. x 6= 0 and fα (0) = 0. Then limx→c f (x) = L. which is less than or equal to the area of T2 . We claim that fα is continuous at x = 0 iff α > 0. (cos x. this solution seems a lot better than not giving any proof at all until much later in the course. 5. Example 5.3: For α ≥ 0. There is a right triangle T1 with vertices (0. Theorem 4. We suppose that: (i) For all x ∈ I ◦ . LIMITS DONE RIGHT 65 If in addition 0 < |f (x) − L| < γ.e. and (ii) limx→c m(x) = limx→c M (x) = L. or equivalently |f (x) − L| < . Perhaps we may revisit this point towards the end of the course when we talk about inverse functions. The Squeeze Theorem and the Switching Theorem.14. Fix  > 0. ∆) then 0 < |x − c| < δ implies 0 < |f (x) − L| < γ and thus |g(f (x)) − M | < . (Squeeze Theorem) Let m(x).2: The nice expository article [MK09] gives some applications of the implication (ii)b) =⇒ (i) of Theorem 4. fα is continuous at all x 6= 0.

h→0 h h→0 h Example 5.7: If f (x) = sin x. y. x→0 x x→0 x Before doing the next two examples we remind the reader of the composite an- gle formulas from trigonometry: for any real numbers x. for x 6= 0. we have 1 lim = lim cos x = 1.8: If f (x) = cos x. x→0 cos x x→0 Therefore the Squeeze Theorem implies sin x (16) lim = 1. cos(x + y) = cos x cos y − sin x sin y. 2 2 2 cos x or equivalently. or 2π ·π = 2. In summary: 1 − cos x cos x − 1 (17) lim = lim = 0. Indeed f (x + h) − f (x) sin(x + h) − sin x sin x cos h + cos x sin h − sin x f 0 (x) = lim = lim = lim h→0 h h→0 h h→0 h     1 − cos h sin h = − sin x lim + cos x lim = (− sin x) · 0 + (cos x) · 1 = cos x. Example 5. x→0 x x→0 1 + cos x 2 Of course we also have limx→0 cos xx−1 = − limx→0 1−cos x x = −0 = 0.6: We will evaluate limx→0 1−cos x x = 0. Indeed f (x + h) − f (x) cos(x + h) − cos x f 0 (x) = lim = lim = h→0 h h→0 h cos x cos h − sin x sin h − cos x lim h→0 h . cos x x Now we may apply the Squeeze Theorem: since cos x is continuous at 0 and takes the value 1 6= 0 there. then we claim f 0 (x) = − sin x. Here goes: cos2 x − 1 − sin2 x   1 − cos x 1 − cos x 1 + cos x lim = lim = lim = lim x→0 x x→0 x 1 + cos x x→0 x(1 + cos x) x→0 x(1 + cos x)    sin x − sin x −0 = lim lim =1·( ) = 0.66 4. CONTINUITY AND LIMITS x x x the circular sector is 2π times the area of the unit circle. The idea is to use trigonometric identities to reduce this limit to an expression involving the limit (16). then we claim f 0 (x) = cos x. The area 1 1 sin x of T2 is 2 tan x = 2 cos x . x→0 x Example 5. sin x cos x Taking reciprocals this inequality is equivalent to 1 sin x ≤ ≤ cos x. x 1 cos x ≤ ≤ . sin(x + y) = sin x cos y + cos x sin y. This gives us the inequalities 1 1 1 sin x cos x sin x ≤ x ≤ .

11: The “Switching Theorem” is not a standard result. g1 . either f (x) = g1 (x) or f (x) = g2 (x). Fix  > 0. Proof. on an interval of the form [0. We have said earlier that we are assuming for now that this function is continuous on its entire domain. h→0 h h→0 h Theorem 4. for instance. I do plan to use it later on to give a proof of the Chain Rule which is (I think) simpler than the one Spivak gives. Variations on the Limit Concept. but we need to be sure that we can adapt our -δ formalism to the variant notions of limit that apply in calculus. consider the function f (x) = x. 1]. But that statement glossed over a technicality which we now address. There is nothing really novel going on here. inspired by the homework problem asking for a function which is continuous at a single point. ∞). f (0) = 0 = limx→0 f (x). p A similar phenomenon arises when we consider the function f (x) = x(1 − x). We may apply the Switching Theorem to show that f is continuous at 0. Let x be such that 0 < |x − c| < δ. δ) for δ > 0: i. this contains all points sufficiently close to 0 but greater than or equal to zero. and limits at infinity. We suppose that: (i) limx→c g1 (x) = limx→c g2 (x) = L. Instead it is defined. Although I do not claim it is in the same league as the venerated Squeeze Theorem..∆ of x = c. √ By way of introducing the first variant. |f (x) − L| < !  Example 5. That is to say. In- deed. and let δ = min(δ1 . . but rather at functions which are continuous at every point of their domain.15. the domain of f is [0. LIMITS DONE RIGHT 67     1 − cos h sin h = − cos x lim −sin x lim = (− cos x)·0+(− sin x)·1 = − sin x. δ2 ). Either way. let δ2 > 0 be such that 0 < |x − c| < δ2 implies |g2 (x) − L| < . Then limx→c f (x) = L. Finally we consider three variations on the limit concept: one-sided limits. The function is defined at 1 but not in any open interval containing 1: only intervals of the form (1 − δ. (Switching Theorem) Consider three functions f. in which case |f (x) − L| = |g2 (x) − L| < . Then limx→0 g1 (x) = limx→0 g2 (x) = 0 and for all x. infi- nite limits. or f (x) = g2 (x). put g1 (x) = x and g2 (x) = −x. Remark 5. 1].10: The previous example shows that a function may be very strangely behaved and still be continuous at a point. (ii) For all x with 0 < |x − c| < ∆.e. Remark 5. Then either f (x) = g1 (x). It is thus worth emphasizing that we are not really interested in functions which are continuous at certain points.9: Let f (x) be defined as x for rational x and −x for irrational x. I came up with it myself. in which case |f (x) − L| = |g1 (x) − L| < . Such functions have many pleasant properties that we will prove later on in the course. 5. Namely. 5. Let δ1 > 0 be such that 0 < |x−c| < δ1 implies |g1 (x)−L| < .4. So by the Switching Theorem. g2 defined on some deleted interval Ic. which has natural domain [0. f (x) = g1 (x) or f (x) = g2 (x). So f cannot be continuous at 0 ac- cording to the definition that we gave because it is not defined on any open interval containing zero.

68 4. |f (x) − L| < . the following are equivalent: (i) f is left continuous at c and right continuous at c. We say that a is a left endpoint of I if a ∈ I and there is no x ∈ I with x < a. c] =⇒ |f (x) − f (c)| < . c + δ) =⇒ |f (x) − f (c)| < . |f (x) − L| < . Proposition 4. Finally we make the following definition: let I ⊂ R be any interval and let f : I → R be a function. For a function f : D ⊂ R → R and c ∈ D. • If I has a left endpoint a. and • If I had a right endpoint b. We say limx→c− f (x) = L – and read this as the limit as x approaches c from the left of f (x) is L – if for all  > 0 there exists δ > 0 such that for all x with c < x < c + δ. Then f is continu- ous at c for all c ∈ R \ Z. c] and x ∈ (c − δ. the following are equivalent: . whereas for any c ∈ Z. f is right continuous but not left continuous at c. we say that a function f : D ⊂ R is left continuous at x = c if for every  > 0 there exists δ > 0 such that f is defined on (c − δ. We leave the proof to the reader.16.17. Example: Let f (x) = bxc be the greatest integer function. then f is left continuous at b. We say limx→c+ f (x) = L – and read this as the limit as x approaches c from the right of f (x) is L – if for all  > 0. This example suggests the following simple result. As above. Similarly. c + δ) and x ∈ [c. (ii) f is continuous at c. and it is sometimes helpful to do so. all four possibilities of having / not having left / right endpoints are of course possible. An interval I has at most one endpoint and at most one endpoint. For a function f : D ⊂ R → R and c ∈ D. it is necessary to require left/right continuity when discussing behav- ior at right/left endpoints of an interval. there exists δ > 0 such that for all x with c − δ < x < c. √ Example: f (x) = x is right continuous at x = 0. CONTINUITY AND LIMITS This brings us to our definition. In a similar way we can define one-sided limits at a point c. then f is right continuous at a. Now we say that f : I →→ R is continuous if: • f is continuous at c for each interior point c ∈ I. Proposition 4. similarly we say b is a right endpoint of I if b ∈ I and there is no x ∈ I with x > b. Let us say c ∈ I is an interior point if it is not a left endpoint or a right endpoint. We say that a function f : D ⊂ R is right continuous at x = c if for every  > 0 there exists δ > 0 such that f is defined on [c. On the other hand one may still discuss left/right continuity at interior points of an interval.

e. whereas just the opposite is happening: the closer x is to 0. we pick one horizontal line which is arbitrarily large and require that on some small deleted interval the graph of y = f (x) always lie above that line. It is no loss of generality to assume M > 0 (why?).e. so f is not continuous at n. This ter- minology comes from our earlier observation that if we (re)define f at c to be the limiting value L then f becomes continuous at c. In freshman calculus we would say limx→0 f (x) = ∞. Example: Let f (x) = bxc be the greatest integer function. 5. We need to find δ such that 0 < |x| < δ implies x12 > M .δ on which f is bounded. we say that f has a jump discontinuity at c. the larger f (x) becomes.. if it did. Fix M ∈ R. but sometimes useful.. but instead of pick- ing two horizontal lines arbitrarily close to y = L. Similarly: We say limx→c f (x) = −∞ if for all m ∈ R. If for a function f and c ∈ R. if c is not in the domain of f ) we say that f has a removable discontinuity at c. if limx→c f (x) = L exists – but still f (x) is not continuous c (this can happen if either f (c) 6= L or. then there would be some deleted interval I0. this is the case whenever both one-sided limits exist at c but f is not continuous at c. this is similar to the -δ definition of limit. so we may take δ = √1M . but in order to know what we mean when we say this we want to give an -δ style definition of this. more plausibly. there exists δ > 0 such that 0 < |x − c| < δ =⇒ f (x) > M . . One sometimes calls a discontinu- ity which is either removable or a jump discontinuity a simple discontinuity: i. Infinite limits: Consider limx→0 x12 . And we still want to say that. there exists δ > 0 such that 0 < |x − c| < δ implies f (x) < m. (ii) limx→c f (x) exists. and let n ∈ Z. Example: Let us indeed prove that limx→0 x12 = ∞. Here it is: We say limx→c f (x) = ∞ if for all M ∈ R. the left and right hand limits at c both exist but are unequal. Again we leave the proof to the reader. Geometrically. LIMITS DONE RIGHT 69 (i) The left hand and right hand limits at c exist and are equal. Then x12 > M ⇐⇒ |x| < √1M . If the left and right hand limts at c both exist and are equal – i. Then limx→n− f (x) = n − 1 and limx→n+ f (x) = n. There is some terminology here – not essential. This limit does not exist: indeed.

.

x→a  The converse of Theorem 5. x→a x−a x→a Thus   0 = lim (f (x) − f (a)) = lim f (x) − f (a). More- over. Differentiability Versus Continuity Recall that a function f : D ⊂ R → R is differentiable at a ∈ D if f (a + h) − f (a) lim h→0 h exists. and let a ∈ D. We have f (x) − f (a) lim f (x) − f (a) = lim · (x − a) x→a x→a x−a   f (x) − f (a)   = lim lim x − a = f 0 (a) · 0 = 0. If f is differentiable at a. One way of introducing points of non-differentiability while preserving continuity is to take the absolute value of a differentiable function. the tangent line to y = f (x) at f (a) exists if f is differentiable at a and is the unique line passing through the point (a. x→a x→a so lim f (x) = f (a). and when this limit exists it is called the derivative f 0 (a) of f at a. 71 . f (a)) with slope f 0 (a).1 is far from being true: a function f which is continu- ous at a need not be differentiable at a. Note that an equivalent definition of the derivative at a is f (x) − f (a) lim . x→a x−a One can see this by going to the -δ definition of a limit and making the “substitu- tion” h = x − a: then 0 < |h| < δ ⇐⇒ 0 < |x − a| < δ. Let f : D ⊂ R → R be a function. In fact the situation is worse: a function f : R → R can be continuous every- where yet still fail to be differentiable at many points.1. Proof. Theorem 5. CHAPTER 5 Differentiation 1. An easy example is f (x) = |x| at a = 0. then f is continuous at a.

simple as they are.4.1.72 5. Product Rule(s). (Sum Rule) Let f and g be functions which are both differen- tiable at a ∈ R. Then the product f g is also differentiable at a and (f g)0 (a) = f 0 (a)g(a) + f (a)g 0 (a). have the following important consequence. Corollary 5. b) The following are equivalent: (i) f is differentiable at a. Again. Exercise: Prove Theorem 8. (Product Rule) Let f and g be functions which are both differ- entiable at a ∈ R. 2. Then the sum f + g is also differentiable at a and (f + g)0 (a) = f 0 (a) + g 0 (a). Theorem 5. (ii) |f | is differentiable at a.2. Proof. Let f : D ⊂ R → R be continuous at a ∈ D. Proof. Theorem 5. and either f (a) 6= 0 or f (a) = f 0 (a) = 0. Proof. Then the function Cf is also differentiable at a and (Cf )0 (a) = Cf 0 (a). no biggie: (f + g)(a + h) − (f + g)(a) f (a + h) − f (a) g(a + h) − g(a) (f +g)0 (a) = lim = lim + h→0 h h→0 h h f (a + h) − f (a) g(a + h) − g(a) = lim + lim = f 0 (a) + g 0 (a). (Linearity of the Derivative) For any differentiable functions f and g and any constants C1 . C2 .2. 2. DIFFERENTIATION Theorem 5.31. (Constant Rule) Let f be differentiable at a ∈ R and C ∈ R.6. h→0 h h→0 h  These results. a) Then |f | is continuous at a.3.5. h→0 h h→0 h  Theorem 5. h→0 h h→0 h→0 h . Differentiation Rules 2. There is nothing to it:   (Cf )(a + h) − (Cf )(a) f (a + h) − f (a) (Cf )0 (a) = lim =C lim = Cf 0 (a). Linearity of the Derivative. we have (C1 f + C2 g)0 = C1 f 0 + C2 g 0 . f (a + h)g(a + h) − f (a)g(a) (f g)0 (a) = lim h→0 h f (a + h)g(a + h) − f (a)g(a + h) + (f (a)g(a + h) − f (a)g(a)) = lim h→0 h      f (a + h) − f (a) g(a + h) − g(a) = lim lim g(a + h) + f (a) lim .

. what chance do the rest of us have? It is natural. to the extent that we try to hide them from others and perhaps even from ourselves. I don’t really want to know: it’s too good a story! It is fundamentally honest about the way mathematics is done and. and if the great ge- nius Gottfried von Leibniz was not above them. DIFFERENTIATION RULES 73 Since g is differentiable at a. cubic meters. but I am always stopped by the fact that if it turns out to be apocryphal. The last expression above is therefore equal to f 0 (a)g(a) + f (a)g 0 (a).  Dimensional analysis and the product rule: Leibniz kept a diary in which he recorded his scientific discoveries. We can give physically meaningful dimensions (the more common. like Leibniz. Many veteran instructors of higher math- ematics come to lament the fact that for the sake of efficiency and “coverage of the syllabus” we often give the student the correct answer within minutes or seconds of asking the question (or worse. Every once in a while I think about trying to verify it. but certainly not (f g)0 = f 0 g 0 . I confess that I have taken this story from a third-party account. area). But this entire entry of the diary is crossed out. es- pecially. What can we learn from “Leibniz’s” guess that (f g)0 = f 0 g 0 ? Simply to write down the correct product rule is not enough: I want to try to persuade you that “Leibniz’s guess” was not only incorrect but truly silly: that in some sense the product rule could have turned out to be any number of things. which Leibniz says he has known “for some time”. do not present the question at all!) and thus deprive them of the opportunity to make a silly guess and learn from their mistake. For a simple example. to go forward we make guesses. but less precise.e. it happens to be the correct formula for . But r3 is a product of three lengths – i. they play an important role in teaching and learning. suppose you walk into an empty room and find on the blackboard a drawing of a cylinder with its radius labelled r and its height labelled h. to want to hide our silly guesses. You can see right away that something has gone wrong: since r and h are both lengths – say in meters – rh denotes a product of lengths – say. g is continuous at a and thus limh→0 g(a + h) = limx→ ag(x) = g(a). Below it are some formulas. One day he got to the product rule and wrote a formula equivalent to (f g)0 = f 0 g 0 . In fact.. and then our for- mulas must manipulate these quantities in certain ways or they are meaningless. square meters. However. For this I want to follow an approach I learned in high school chemistry: di- mensional analysis. Often we later realize that our guesses were not only wrong but silly. including his development of differential calculus. term is “units”) to all our quantities. But the process of “silly guesswork” is vital to mathematics. the way new mathematics is created.e. and this one jumps out at you: 2π(rh + r3 ). 2. Three days later there is a new entry with a correct statement of the product rule and the derivation. How on earth will we get anything physically meaningful by adding square meters to cubic meters? It is much more likely that the writer meant 2π(rh + r2 ). since that is a meaningful quantity of di- mension length squared (i. That is.

in two steps: f 0 (x) = (x sin xex )0 = ((x sin x)ex )0 = (x sin x)0 ex + (x sin x)(ex )0 = (x0 sin x + x(sin x)0 )ex + x sin xex = sin x + x cos xex + x sin xex . Using the induction hypothesis this last expression becomes (f10 (a)f2 (a) · · · fn (a) + . . .g. . Induction Step: Let n ≥ 2 be an integer. Now let f1 . the product f = f1 f2 f3 is also differentiable at a and f 0 (a) = f10 (a)f2 (a)f3 (a) + f1 (a)f20 (a)f3 (a) + f1 (a)f2 (a)f30 (a). The following result rides this train of thought to its final destination. f3 which are all differentiable at x. Now suppose both f (t) and g(t) are lengths. and suppose that the product of any n functions which are each differentiable at a ∈ R is differentiable at a and that the derivative is given by (18). say measured in seconds. Then f = f1 · · · fn is also differentiable at a. That’s not only wrong. fn+1 be functions. (f g)0 is meters squared per second. . so f 0 (t) · g 0 (t) has units meters squared per second squared. Something similar can be applied to the “formula” (f g)0 = f 0 g 0 . f (x) = x sin xex . + f1 (a) · · · fn−1 (a)fn0 (a)) fn+1 (a)+f1 (a) · · · fn (a)fn+1 0 (a) . fn . We can – of course? – still use the product rule. it’s a priori meaningless. Proof. each differentiable at a. . e. By contrast the correct formula (f g)0 = f 0 g + f g 0 makes good dimensional sense: as above.6). which is good because above we proved it to be correct. say in meters. fn be n functions which are all differentiable at a. Thus the formula at least makes sense. f2 . Theorem 5. + f1 (a) · · · fn−1 (a)fn0 (a). the units of (f g)0 are meters squared per second.7. but we omit it for now. Then by the usual product rule (f1 · · · fn fn+1 )0 (a) = ((f1 · · · fn )fn+1 )0 (a) = (f1 · · · fn )0 (a)fn+1 (a)+f1 (a) · · · fn (a)fn+1 0 (a). . Then f 0 (t) and g 0 (t) both have units meters per second. Base Case (n = 2): This is precisely the “ordinary” Product Rule (Theorem 5. . and let f1 . so we can add them to get a meaningful number of meters squared per second. and (18) (f1 · · · fn )0 (a) = f10 (a)f2 (a) · · · fn (a) + .74 5. Taking these ideas more seriously suggests that we should look for a proof of the product rule which explicitly takes into account that both sides are rates of change of areas. . By induction on n. so the same method shows that for any three functions f1 . . sin x and ex until the last step. Let f = f (t) and view t as being time. (Generalized Product Rule) Let n ≥ 2 be an integer. . its dimension is the dimension of f divided by time. Suppose we want to find the derivative of a function which is a product of not two but three functions whose derivatives we already know. so is f 0 g and so is f g 0 . This is indeed possible. DIFFERENTIATION the surface area of the cylinder with the top and bottom faces included. On the other hand. Since f 0 (t) is a limit of quo- tients f (t+h)−f h (t) . Note that we didn’t use the fact that our three differentiable functions were x. Thus the “formula” (f g)0 = f 0 g 0 is asserting that some number of meters squared per second is equal to some number of meters squared per second squared. . . although dimensional considerations won’t tell you that.

and let f. a + δ) about a on which g is nonzero. g g(a)2 Proof. so applying the Generalized Power rule we get (xn )0 = (x)0 x · · · x + .8. Step 0: First observe that since g is continuous and g(a) 6= 0. + x · · · x(x)0 . there is some interval I = (a − δ. g(a) g(a) g(a)2  ..9. Theorem 5. g is continuous at a. (Generalized Leibniz Rule) Let n ∈ Z+ . Indeed. the special case of the Quo- tient Rule in which f (x) = 1 (constant function).e. h→0 hg(a)g(a + h) h→0 h h→0 g(a)g(a + h) g(a)2 We have again used the fact that if g is differentiable at a. Then f g is is n times differentiable at a and n   X n (k) (n−k) (f g)(n) = f g . Then fg is differentiable at a and  0 f g(a)f 0 (a) − f (a)g 0 (a) (a) = . Then  0 1 1 1 g(a+h) − g(a) (a) = lim g h→0 h −g 0 (a)    g(a) − g(a + h) g(a + h) − g(a) 1 = lim = − lim lim = . DIFFERENTIATION RULES 75 = f10 (a)f2 (a) · · · fn (a)fn+1 (a) + . with g(a) 6= 0. i. so each term evaluates to xn−1 . so (xn )0 = nxn−1 . we have  0  0  0 f 1 0 1 1 (a) = f · (a) = f (a) + f (a) (a) g g g(a) g f 0 (a) g 0 (a) g(a)f 0 (a) − g 0 (a)f (a) = − f (a) 2 = . taking f1 = · · · = fn = x. . and on this interval f g is defined. .  Example: We may use the Generalized Product Rule to give a less computationally intensive derivation of the power rule (xn )0 = nxn−1 for n ∈ Z+ . Moreover we have n terms in all. Step 1: We first establish the Reciprocal Rule. No need to dirty our hands with binomial coefficients! Theorem 5. .8. Indeed. g be functions which are each n times differentiable at a ∈ R. Thus it makes sense to consider the difference quotient f (a + h)/g(a + h) − f (a)/g(a) h for h sufficiently close to zero. Here in each term we have x0 = 1 multiplied by n − 1 factors of x. + f1 (a) · · · fn (a)fn+1 0 (a). . Step 2: We now derive the full Quotient Rule by combining the Product Rule and the Reciprocal Rule. 2. we have f (x) = xn = f1 · · · fn . k k=0 Exercise: Prove Theorem 5. (Quotient Rule) Let f and g be functions which are both differ- entiable at a ∈ R.

Then limx→a f (x) = L. However. the difference quotient g(f (x))−g(fx−a (a)) is close to (or equal to!) zero. since we are assuming f is differentiable at a. f (x)→f (a) f (x) − f (a) x→a x−a The replacement of “limx→a . | g(f (x))−g(f x−a (a)) | < . how do we know that we are not dividing by zero?? The answer is that we cannot rule this out: it is possible for f (x) to take the value f (a) on arbitarily small deleted intervals around a: again. there is x with 0 < |x − a| < δ such that f (x) − f (a) = 0.76 5. DIFFERENTIATION Lemma 5. by limf (x)→f (a) . In this case. there exists at least one x with 0 < |x − a| < δ such that f (x) = L.10. this is exactly what happens for the function fα (x) of the above example near a = 0. so the difference quotient is zero at these points.)  Theorem 5. Let f : D ⊂ R → R. Thus the limit tends to zero no matter which alternative obtains. The second is f (x) − f (a) 6= 0. .10 that if f (x) − f (a) lim x→a x−a exists at all. So f 0 (a) = 0. But we are assuming the above limit exists. Suppose: (i) limx→a f (x) exists. here we are trying to show (g ◦ f )0 (a) = 0. and therefore.11. if we fix  > 0. in which case g(f (x)) − g(f (a)) f (x) − f (a) g(f (x)) − g(f (a)) = · f (x) − f (a) x−a holds. Proof. it is tempting to argue as follows:     g(f (x)) − g(f (a)) g(f (x)) − g(f (a)) f (x) − f (a) (g ◦ f )0 (a) = lim = lim · x→a x−a x→a f (x) − f (a) x−a    g(f (x)) − g(f (a)) f (x) − f (a) = lim lim x→a f (x) − f (a) x→a x−a    g(f (x)) − g(f (a)) f (x) − f (a) = lim lim = g 0 (f (a))f 0 (a). On the other hand. (Chain Rule) Let f and g be functions. . . since the Chain Rule reads (g ◦ f )0 (a) = g 0 (f (a))f 0 (a). and derive a contradiction by taking  to be small enough compared to |M − L|. (Suggestion: suppose limx→a f (x) = M 6= L. in which case also g(f (x)) − g(f (a)) = g(f (a)) − g(f (a)) = 0. and let a ∈ R be such that f is differentiable at a and g is differentiable at f (a). Left as an exercise. The above argument is valid unless for all δ > 0. when f (x) − f (a) = 0.” in the first factor above is justified by the fact that f is continuous at a. Then the composite function g ◦ f is differentiable at a and (g ◦ f )0 (a) = g 0 (f (a))f 0 (a). then the first step of the argument shows that there is δ > 0 such that for all x with 0 < |x − a| < δ such that f (x) − f (a) 6= 0. it must be equal to 0. . Somewhat more formally. this argument has a gap: when we multiply and divide by f (x) − f (a). For x ∈ R we have two possibilities: the first is f (x) − f (a) = 0. . So whichever holds. and the above argument shows this expression tends to g 0 (f (a))f 0 (a) = 0 as x → a. Proof. Motivated by Leibniz notation. and (ii) There exists a number L ∈ R such that for all δ > 0. I maintain that the above gap can be mended so as to give a complete proof. it follows from Lemma 5.

so that g(f (x)) − g(f (a)) lim = 0 = g 0 (f (a))f 0 (a). 3. and • for all x with a < x < a + δ. it is always good to find one’s feet by considering some examples and nonexamples of that definition. Exercise: Let f : I → R. we require the existence of a δ > 0 such that: • for all x with a − δ < x < a. Functions increasing or decreasing at a point. Optimization 3. f (x) < f (a) and for all x sufficiently close to a and to the right of a. . Intervals and interior points. f (x) ≤ f (a). . let f (x) = xn . We say that f is increasing at a if for all x sufficiently close to a and to the left of a. then for all a ∈ R. Exercise: Give the definition of “f is decreasing at a”. f (x) > f (a). OPTIMIZATION 77 then | g(f (x))−g(f x−a (a)) | = 0. b) Show that f is weakly increasing at a iff −f is weakly decreasing at a.2. At this point I wish to digress to formally define the notion of an interval on the real line and and interior point of the interval. and • for all x with a < x < a + δ. so it is certainly less than ! Therefore. We say f is decreasing at a if there exists δ > 0 such that: • for all x with a − δ < x < a.1 1We do not stop to prove these assertions as it would be inefficient to do so: soon enough we will develop the right tools to prove stronger assertions. and let a be an interior point of D. a) Show that f is increasing at a iff −f is decreasing at a.1. all in all we have 0 < |x − a| < δ =⇒ | g(f (x))−g(f x−a (a)) | < . x→a x−a  3. f (x) is increasing at a. 3. and • for all x with a < x < a + δ. if a > 0 then f (x) is increasing at a. f (x) > 0 = f (0). f (x) ≥ f (a). f (x) > f (a). f (x) < f (a). If x is even. and let a be an interior point of I. . Example: Let f (x) = mx + b be the general linear function. and f is weakly decreasing at a iff m ≤ 0. f is weakly increasing at a iff m ≥ 0. f (x) > f (a). More formally phrased. Let f : D → R be a function. f (x) < f (a). Then for any a ∈ R: f is increasing at a iff m > 0. Example: Let n be a positive integer. then if a < 0. Note that when n is even f is neither increasing at 0 nor decreasing at 0 because for every nonzero x. f is decreasing at a iff m < 0. f (x) is decreasing at a. Then: If x is odd. But when given a new definition. . We say f is weakly increasing at a if there exists δ > 0 such that: • for all x with a − δ < x < a.

take  = f 0 (a): there exists δ > 0 such that for all x with 0 < |x − a| < δ. f (x)−f x−a (a) > 0. a minimum value m. Recall that f : D → R is bounded above if there is B ∈ R such that for all x ∈ D. one of the two would be larger than the other! However a function need not have any maximum value: for instance f : (0. then f is increasing at a. (ii) Use the fact that f is decreasing at a iff −f is increasing at a and f 0 (a) < 0 ⇐⇒ (−f )0 (a) > 0 to apply the result of part a). f (x) − f (a) < 0. Proof. A function is bounded below if there is b ∈ R such that for all x ∈ D. f (x) ≥ m. Similarly. Also. The function f : R \ {0} → R by f (x) = x1 has no minimum value: limx→0− f (x) = −∞.  3. the following are equivalent: (i) f assumes a maximum value M .78 5. a) We use the -δ interpretation of differentiability at a to our ad- vantage. ∞) → R by f (x) = x1 has no maximum value: limx→0+ f (x) = ∞. or equivalently f (x) − f (a) 0< < 2f 0 (a). i. f (x)−f (a) > 0. If f (x) = x2 . Exercise: For a function f : D → R. a) If f 0 (a) > 0. we say m ∈ R is the minimum value of f on D if (mV1) There exists x ∈ D such that f (x) = m. x−a In particular.2 c) If f (x) = x3 . We say M ∈ R is the maximum value of f on D if (MV1) There exists x ∈ D such that f (x) = M . b) This is similar enough to part a) to be best left to the reader as an exercise. . Let f : D → R. and M = m. then f is decreasing at a. | f (x)−f x−a (a) − f 0 (a)| < f 0 (a). (ii) f is a constant function. and if x < a. A function is bounded if it is both bounded above and bounded 2Two ways to go: (i) Revisit the above proof flipping inequalities as appropriate. f (x) < f (a). and (MV2) For all x ∈ D. Theorem 5. f (x) ≤ B. then no conclusion can be drawn: f may be increasing at a. i.3.12.e. so: if x > a. DIFFERENTIATION If one looks back at the previous examples and keeps in mind that we are sup- posed to be studying derivatives (!). Namely. f (x) ≤ M . one is swiftly led to the following fact.. Suppose f is differentiable at a ∈ I ◦ . c) If f 0 (a) = 0.e. b) If f 0 (a) < 0. f (x) > f (a). or neither.. then f 0 (0) = 0 but f is decreasing at 0. a function can have at most one minimum value but need not have any. de- creasing at a. then f 0 (0) = 0 but f is increasing at 0. then f 0 (0) = 0 but f is neither increasing nor decreasing at 0. Let f : I → R. f (x) ≥ b. for all x with 0 < |x−a| < δ. If f (x) = −x3 . and (mV2) For all x ∈ D. Extreme Values. It is clear that a function can have at most one maximum value: if it had more than one.

0 ≤ x < 1. bounded interval and is bounded above (by 2) and bounded below (by 0) but does not have a maximum or minimum value. hence is bounded above and below. We say f assumes its maximum value at a if f (a) is the maximum value of f on D. Example: Let f : R → R by f (x) = x3 + 5. f (x) ≥ f (a).. 2] → R be defined as follows: f (x) = x + 1. Example: Let f : [0. (Extreme Value Theorem) Let f : [a. and 1 is the maximum value of the sine function. b) Give an example of a bounded function f : R → R which has neither a maximum nor a minimum value. |f (x)| ≤ max(|m|. f (x) ≤ f (a). if M is the maximum value of f and m is the minimum value. the graph of f is “trapped between” the horizontal lines y = B and y = −B. f (x) = x − 1. . OPTIMIZATION 79 below: equivalently. then for all x ∈ B. Indeed. it is bounded above. |f (x)| ≤ B: i. a) Show: if f has a maximum value. 3. or in other words.e. Thus there may be more than one x-value at which a function attains its maximum value. Exercise: a) If a function has both a maximum and minimum value on D. Of course this example of a function defined on a closed bounded interval without a maximum or minimum value feels rather contrived: in particular it is not continuous at x = 1. following a discussion of completeness. and in fact is a useful tool for establishing the existence of maximia / minima in other situations as well. This brings us to one of the most important theorems in the course. 1 < x ≤ 2. The proof will be given in the next chapter. f (x) = 1. Unfor- tunately. |M |). Example: The function f (x) = sin x assumes its maximum value at x = π2 . Then f does not assume a maxi- mum or minimum value. limx→∞ f (x) = ∞ and limx→−∞ f (x) = −∞.13. it is bounded below. b) Show: if f has a minimum value. the sine function is periodic and takes value 1 precisely at x = π2 + 2πn for n ∈ Z. or in other words. Then f has a maximum and minimum value. we say f assumes its minimum value at a if f (a) is the minimum value of f on D. b] → R be continuous. Similarly f attains its minimum value at x = 3π 3π 2 – f ( 2 ) = −1 and f takes 3π no smaller values – and also at x = 2 + 2πn for n ∈ Z. Note however that π2 is not the only x-value at which f assumes its maximum value: indeed. Simlarly. be- cause sin π2 = 1. Theorem 5. Ubiquitously in (pure and applied) mathematics we wish to optimize functions: that is. Then f is defined on a closed. a general function f : D → R need not have a maximum or minimum value! But the Extreme Value Theorem gives rather mild hypotheses on which these values are guaranteed to exist. x = 1. there exists B ≥ 0 such that for all x ∈ D. Exercise: Let f : D → R be a function. as we have seen above. for all x ∈ D. for all x ∈ D. then it is bounded on D: indeed. find their maximum and or minimum values on a certain domain.

f (x) > f (a). Indeed. Theorem 5. if f 0 (a) < 0. Local Extrema and a Procedure for Optimization. and for which x value(s) does it occur? Stay tuned:we are about to develop tools to answer this question! 3. f (x) > 1 > 0 ≥ m.  Exercise: State and prove a version of Theorem 5.15. f (x) ≥ 1. DIFFERENTIATION Example: Let f : R → R be defined by f (x) = x2 (x − 1)(x − 2). (Optimization Theorem) Let f : I → R be continuous. and for x slightly larger than a.15 for the maximum value. However. More formally: there exists δ > 0 such that for all x ∈ D. This is an immediate consequence of Theorem 5. 1) and (2.12. −∆) and (∆. (ii) a stationary point: f 0 (a) = 0. Thus for x slightly smaller than a. f (x) < f (a). We argue for this as follows: given that f tends to ∞ with |x|. ∆] must in fact be the minimum value for f on all of R. we say that f has a local minimum at a if the vaalue of f at a is greater than or equal to its values at all sufficiently close points x. |x − a| < δ =⇒ f (x) ≤ f (a). if f 0 (a) 6= 0 then either f 0 (a) > 0 or f 0 (a) < 0. and let a ∈ D. ∞) and negative on (1.80 5. f is increasing at a. Then a is either: (i) an endpoint of I. We can be at least a little more explicit: a sign analysis of f shows that f is positive on (−∞.12. then f 0 (a) = 0. Let f : D → R be a function. So f does not have a local extremum at a. either a local minimum or a local maximum – at x = a. say m.4. In fact since f (0) = 0.14.. More formally: there exists δ > 0 such that for all x ∈ D. Proof. which will be strictly negative. since at the other values – namely. then by Theorem 5. so in particular m < 1. and for x slightly larger than a. and let a be an interior point of a. Similarly. . then by Theorem 5. 2]. On the other hand. ∆] we have a continuous function on a closed bounded interval.14. so the minimum value of f will be its minimum value on [1.  Theorem 5. we see that m < 0. ∞). Suppose that f attains a minimum or maximum value at x = a. there must exist ∆ > 0 such that for all x with |x| > ∆. |x − a| < δ =⇒ f (x) ≥ f (a). f is decreasing at a. If f is differentiable at a and has a local extremum – i.e. Proof. f (x) < f (a). 2). or (iii) a point of nondifferentiability. Let f : D ⊂ R. f (x) > f (a). we claim that f does have a minimum value. Note that f does not have a maximum value: indeed limx→∞ f (x) = limx→−∞ = ∞. 0 If f (a) > 0. This means that the minimum value m for f on [−∆. on (−∞. So f does not have a local extremum at a. if we restrict f to [−∆. We now describe a type of “local behavior near a” of a very different sort from being increasing or decreasing at a. Thus for x slightly smaller than a. Similarly. We say that f has a local maximum at a if the value of f at a is greater than or equal to its values at all sufficiently close points x. But ex- actly what is this minimum value m. so by the Extreme Value Theorem it must have a minimum value.

. b] → R be continuous on [a.619 . . x1 ≈ 0. . where f 0 (x) = 0. or. . (There may not be any critical points.619 . f (x1 ) = 0. . 4]. . . .. and the smallest is the minimum value. . then x3 (x − 3) ≥ 43 · 1 = 64 ≥ 1. Example: Let f (x) = x2 (x − 1)(x − 2) = x4 − 3x3 + 2x2 . the only critical points will be the stationary points.. . . Our goal in this section is to prove the following important result. f (4) = 96. . f (b). but that only makes things easier. cn . Clearly there are always exactly two endpoints.15 together under the term crit- ical point (but there is nothing very deep going on here: it’s just terminology).1. In favorable circustances there will be only finitely many critical points. . x2 = 1. so −3x3 > 0 and thus f (x) ≥ x4 + 2x2 = x2 (x2 + 2) ≥ 1 · 3 = 3.15 out by finding the maximum and minimum values of f (x) = x4 − 3x3 + 2x2 on [−4. Above we argued that there is a ∆ such that |x| > ∆ =⇒ |f (x)| ≥ 1: let’s find such a ∆ explicitly. . Theorem 5. . . . 4). Also we always test the endpoints: f (−4) = 480. On the other hand.6094 . then x < 0. f (c1 ). √ 9± 17 The roots are x = 8 . 4.604 . Now let us try the procedure of Theorem 5.2017 .) Suppose further that we can explicitly compute all the values f (a). So we compute the derivative: f 0 (x) = 4x3 − 9x2 + 4x = x(4x2 − 9x + 4). Thus we may take ∆ = 4. THE MEAN VALUE THEOREM 81 Often one lumps cases (ii) and (iii) of Theorem 5. . We intend nothing fancy here: f (x) = x4 − 3x2 + 2x2 ≥ x4 − 3x3 = x3 (x − 3). f (cn ). So if x ≥ 4. f (x2 ) = −0. and in very favorable circumstances they can be found exactly: suppose they are c1 .16. Since f is differentiable everywhere on (−4. approximately. Statement of the Mean Value Theorem. 4. . . The Mean Value Theorem 4. . So the maximum value is 480 and the minimum value is −. . b] and differentiable on (a. if x < −1. b−a . Then we win: the largest of these values is the maximum value. (Mean Value Theorem) Let f : [a. b). Then there is at least one c with a < c < b and f (b) − f (a) f 0 (c) = . .

Proof of the Mean Value Theorem. By Theorem 5. he announced to all that he was adding a possible answer to the Mean Value Theorem question. so that the Mean Value Theorem says that there is at least one instant at which the instantaneous velocity is equal to the average velocity. as it has a very simple geometric interpretation: under the hypotheses of the theorem. DIFFERENTIATION Remark:3 I still remember the calculus test I took in high school in which I was asked to state the Mean Value Theorem. . It was a multiple choice question. f has a maximum M and a minimum m. and so as not to give special treatment to any one student. (Rolle’s Theorem) Let f : [a. and I didn’t see the choice I wanted. Then the maximum value does not occur 3Please excuse this personal anecdote. They were not so thrilled with me or the teacher. and then had time to come back to this question. But many other students figured that if I had successfully lobbied for an additional answer then “my” answer was probably correct. which was as above except with the subtly stronger assumption that fR0 (a) and fL0 (b) exist: i. they may mathematically deduce that at some point your instantaneous velocity was over 70 mph.13. “Okay. although no one saw you violating the speed limit. We will deduce the Mean Value Theorem from the (as yet unproven) Extreme Value Theorem. you can add that as an answer if you want”. Thus. However. there exists at least one interior point c of the interval such that the tangent line at c is parallel to the secant line joining the endpoints of the interval. (iii) f (a) = f (b). (ii) f is differentiable on (a. my final answer was correct. One day you receive timestamped photos of yourself at two checkpoints. So I marked my added answer. whereas the derivative f 0 (c) is the in- stantaneous velocity at time c.. He thought for a moment and said. but in my opinion he behaved admirably: talk about a “teachable moment”! One should certainly draw a picture to go with the Mean Value Theorem. Example: Suppose that cameras are set up at checkpoints along an interstate highway in Georgia. then the expression f (b)−f b−a (a) is nothing less than the average velocity between time a and time b. After more thought I decided that one-sided differentiability at the endpoints was not required. So in the end I selected this pre-existing choice and submitted my exam.82 5. Theorem 5. As you can see. So I went up to the teacher’s desk to ask about this.17. Then there exists c with a < c < b and f 0 (c) = 0. it is convenient to first establish a special case. Case 1: Suppose M > f (a) = f (b). b] → R. f is one-sided differentiable at both endpoints. And one should also interpret it physically: if y = f (x) gives the position of a particle at a given time x.94 miles per hour. You are issued a ticket for violating the speeed limit of 70 miles per hour.2. The two checkpoints are 90 miles apart and the second photo is taken 73 minutes after the first photo.e. did the rest of the exam. We suppose: (i) f is continuous on [a. Proof. b]. b). so they changed their answer from the correct answer to my added answer. Guilt by the Mean Value Theorem! 4. Your average velocity was (90 miles) / (73 minutes) · (60 minutes) / (hour) ≈ 73.

b). b) with f 0 (c) = 0. hence they are equal. Proof of the Mean Value Theorem: Let f : [a. Then the minimum value does not occur at either endpoint. Cauchy.L. b] → R be contin- uous and differentiable on (a. Since f is differentiable on (a. b] and differentiable on (a.3. there exists c ∈ (a. The above objection is just a technicality. By Rolle’s Theorem. (Cauchy Mean Value Theorem) Let f. there is c ∈ (a. g : [a. b). . it may no longer be the graph of a function. so the “official” proof that follows is (slightly) different. there exists c ∈ (a. Case 3: The remaining case is m = f (a) = M .. Then there exists c ∈ (a. so Rolle’s Theorem does not apply. We present here a generalization of the Mean Value Theorem due to A. it must therefore occur at a stationary point: i. since L is a linear function with slope f (b)−f b−a (a) . both sides of (19) are zero. In fact. There is a unique linear function L(x) such that L(a) = f (a) and L(b) = f (b): indeed. Then g is defined and continuous on [a. f (b)) becomes horizontal and then apply Rolle’s Theorem.  To deduce the Mean Value Theorem from Rolle’s Theorem. Case 1: Suppose g(a) = g(b). THE MEAN VALUE THEOREM 83 at either endpoint. Applying Rolle’s Theorem to g. b). it has its place: later we will use it to prove L’Hôpital’s Rule. b) such that g 0 (c) = 0. b]. Proof. b). f (a)) to (b. it must therefore occur at a stationary point: there exists c ∈ (a. f (a)) and (b.e.18. differentiable on (a. Here goes: define g(x) = f (x) − L(x). b] → R be continuous on [a. The Cauchy Mean Value Theorem. The possible flaw here is that if we start a subset in the plane which is the graph of a function and rotate it too much. On the other hand. b) such that g 0 (c) = 0. L is nothing else than the secant line to f between (a. Theorem 5. Nevertheless formalizing this argument is more of a digression than we want to make. b) with f 0 (c) = 0. and then the conclusion that we get is easily seen to be the one we want about f . and g(a) = f (a) − L(a) = f (a) − f (a) = 0 = f (b) − f (b) = f (b) − L(b) = g(b). b−a and thus f (b) − f (a) f 0 (c) = . Here’s the trick: by subtracting L(x) from f (x) we reduce ourselves to a situation where we may apply Rolle’s Theorem. b). we compute f (b) − f (a) 0 = g 0 (c) = f 0 (c) − L0 (c) = f 0 (c) − . b). b) such that (19) (f (b) − f (a))g 0 (c) = (g(b) − g(a))f 0 (c). b−a 4. it is tempting to tilt our head until the secant line from (a. Although perhaps not as fundamental and physically appealing as the Mean Value Theorem. Case 2: Suppose m < f (a) = f (b). For this c. 4. Then f is constant and f 0 (c) = 0 at every point c ∈ (a. f (b)). Since f is differentiable on (a. it suggests that more is true: there should be some version of the Mean Value Theorem which applies to curves in the plane which are not necessarily graphs of functions.

which we think of as being a vector in the plane with tail at the point (x(t). in which case we get a vertical tangent line.g. 3π 2 . (f (b) − f (a))g 0 (c) = (g(b) − g(a))f 0 (c). e. When x0 (t) = y 0 (t) = 0 we get the zero vector. y(t)). 0) (as you would certainly expect from contemplation of the unit circle). b]. then h(a) = h(b). This point traces out a path which winds once around the unit circle counterclockwise. differentiable on (a. We say that v is continuous at t if both x and y are continuous at t. 2π. DIFFERENTIATION Case 2: Suppose g(a) 6= g(b). For any nonsingular point (x(t). g(b) − g(a) or equivalently. Further.e. y 0 (t)). and define   f (b) − f (a) h(x) = f (x) − g(x).  Exercise: Which choice of g recovers the “ordinary” Mean Value Theorem? The Cauchy Mean Value Theorem can be used to vindicate the “head-tilting argu- ment” contemplated before the formal proof of the Mean Value Theorem. (− sin t)2 + (cos t)2 = 1. y 0 (t)) is horizontal and the other is vertical. but it is interesting enough to be worth sketching here. When t = 0. 0) to v(t)) is x(t) = cos t and the slope of 4Don’t be so impressed: we wanted a constant C such that if h(x) = f (x) − Cg(x). which we don’t want: let’s call this a singular point. one of (x(t). π. y(t)) with slope xy 0 (t) (t) . y(t)). where t 7→ x(t) and y 7→ y(t) are each functions from I to R. This holds at all other points as well. . y(t)) we can define the tangent line to be the unique line passing through the points (x(t). by noting y(t) sin t that the slope of the radial line from (0. b) with   f (b) − f (a) 0 = h0 (c) = f 0 (c) − g 0 (c). y(t)) and (x(t) + x0 (t). We have v 0 (t) = (− sin t. To do so takes us a bit outside our wheelhouse (i.) Example: Let v : [0. At the points t = 0. 2π] → R2 by v(t) = (cos t. so in particular the tangent line at v(t) is perpendicular to v(t). when both x and y are differentiable at t we decree that v is differentiable at t and set v 0 (t) = (x0 (t). g(b) − g(a) g(b) − g(a) so h(a) = h(b). y(t) + y 0 (t)). the above definition is phrased so as to make sense also if x0 (t) = 0 and y 0 (t) 6= 0. a seriously theoretical approach to single variable calculus). so we set f (a) − Cg(a) = f (b) − Cg(b) and solved for C. x0 (t) = 0 and the tangent line is vertical at the points (±1.84 5. First recall (if possible) the notion of a parameterized curve: this is given by a function f with domain an interval I in R and codomain the plane R2 : we may write v : t 7→ (x(t). we can define the tangent line in a more familiar way. π2 . sin t). cos t): since for all t. (When x0 (t) 6= 0. y(t)) and (x0 (t). g(b) − g(a) g(b) − g(a) f (b)(g(b) − g(a)) − g(b)(f (b) − f (a)) f (a)g(b) − g(a)f (b) h(b) = = . and f (a)(g(b) − g(a)) − g(a)(f (b) − f (a)) f (a)g(b) − g(a)f (b) h(a) = = .. π. b).4 By Rolle’s Theorem there exists c ∈ (a. as the unique line passing through 0 (x(t). g(b) − g(a) Then h is continuous on [a. 2π. in particular v 0 (t) is never zero: there are no singular points.

Foremost of all it will be used in the proof of the Fundamental Theorem of Calculus.. b) such that (x(b) − x(a))y 0 (c) = (y(b) − y(a))x0 (c). for all t ∈ I. The Mean Value Theorem has several important consequences. 5. 5.e. f (b)). By our nonsingularity assumption y 0 (c) 6= 0.  Conversely. and indeed it is common to also refer to Theorem 5. Case 2: Suppose x0 (c) = 0. t 7→ (x(t). hence these two lines are parallel. so 0 = (y(b) − y(a))x0 (c) = (x(b) − x(a))y 0 (c) implies x(a) = x(b). x(b)) to (y(a). The Monotone Function Theorems. (Parameteric Mean Value Theorem) Let v : I → R2 . Then we must have x(b) − x(a) 6= 0: if not. i. x0 (c) and y 0 (c) are not both 0. f (a)) to (b.1.19. y(b)). since x0 (c) 6= 0. and then we have x(a) = x(b) and y(a) = y(b). y(b)) is vertical. Case 1: Suppose x0 (c) 6= 0. Then there is c ∈ (a. Proof. We apply the Cauchy Mean Value Theorem with f = x and g = y: there is c ∈ (a. contrary to our hypothesis. By assumption. . Monotone Functions A function f : I → R is monotone if it is weakly increasing or weakly decreasing. 5. x0 (c) x(b) − x(a) which says that the tangent line to v at c has the same slope as the secant line between (x(a). Let a < b ∈ I.e. and suppose that either x(a) 6= x(b) or y(a) 6= y(b). but that’s for later. 0 = (x(b) − x(a))y 0 (c) = (y(b) − y(a))x0 (c) and thus. so the secant line from (x(a). y(b) − y(a) 6= 0. y 0 (c)) at c is parallel to the secant line from (a. y(a)) and (x(b). Here is a version of the Mean Value Theorem for nonsingular parameterized curves. Thus we get a derivation using calculus of a familiar (or so I hope!) fact from elementary geometry.19 as the Cauchy Mean Value Theorem. be a nonsingular parametrized curve in the plane. b) such that the tangent vector (x0 (c). the tangent line to v at c is vertical. it is possible (indeed. At the moment we can use it to establish a criterion for a function f to be monotone on an interval in terms of sign condition on f 0 .. y(t)). MONOTONE FUNCTIONS 85 0 the tangent line to v(t) is xy 0 (t) − cos t (t) = sin t and observing that these two slopes are opposite reciprocals. hence the two lines are parallel. x0 (t) and y 0 (t) both exist and are not both 0. So we may divide to get y 0 (c) y(b) − y(a) = . Theorem 5. Thus really we have one theorem with two moderately different phrasings. i. similar) to deduce the Cauchy Mean Value The- orem from the Parametric Mean Value Theorem: try it.

Proof. and consider the set of all functions F : I → R such that F 0 = f . Then there exists a constant C ∈ R such that f = g + C. Show that there is no function F : R → R such that F 0 = f . is . b) Suppose f 0 (x) ≥ 0 for all x ∈ I. not every function f : R → R has an antiderivative.21. x2 inI with x1 < x2 . Let h = f − g. f (x1 ) ≤ f (x2 ). a) Suppose f 0 (x) > 0 for all x ∈ I.20. Let f : I → R be a function. If we had argued directly from the Mean Value Theorem the proof would have been shorter: try it! Corollary 5. i. c).22 can be viewed as a uniqueness theorem for differential equa- tions. i.22 asserts that if there is a function F such that F 0 = f . Then h0 = (f − g)0 = f 0 − g 0 ≡ 0. and more specifically that the general such function is of the form F + C. g : I → R are both differentiable and such that f 0 = g 0 (equality as functions. Exercise: Let f : R → R by f (x) = 0 for x ≤ 0 and f (x) = 1 for x > 0. Since f 0 (x) ≤ 0 for all x ∈ I.21 from Theorem 5. (First Monotone Function Theorem) Let I be an open interval.  Remark: Corollary 5. f (x1 ) < f (x2 ). Then f is constant. Since f 0 (x) ≥ 0 for all x ∈ I. x2 ∈ I with x1 < x2 such that f (x1 ) > f (x2 ). Then f is weakly increasing on I: for all x1 . then there is a one-parameter family of such functions. f (x1 ) ≤ f (x2 ) and f (x1 ) ≥ f (x2 ) and thus f (x1 ) = f (x2 ): f is constant.. we deduced Corollary 5. x2 ∈ I with x1 < x2 . f (x1 ) ≥ f (x2 ). Apply the Mean Value Theorem to f on [x1 . f 0 (x) = g 0 (x) for all x ∈ I). One may either proceed exactly as in parts a) and b).22.e.. x2 ∈ I with x1 < x2 .e.20.21.d) We leave these proofs to the reader. must there exist F : I → R such that F 0 = f ? In general the answer is no. Suppose f. h ≡ C and thus f = g + h = g + C. x2 ]: there exists x1 < c < x2 such that f 0 (c) = f (xx22)−f −x1 (x1 ) < 0. f (x1 ) > f (x2 ). (Zero Velocity Theorem) Let f : I → R be a differentiable function with identically zero derivative. a) We go by contraposition: suppose that f is not increasing: then there exist x1 . Proof. Apply the Mean Value Theorem to f on [x1 . or reduce to them by multiplying f by −1. the existence question lies deeper: namely. x2 ∈ I with x1 < x2 . Then f is decreasing on I: for all x1 . Proof. c) Suppose f 0 (x) < 0 for all x ∈ I. f (x) = g(x) + C. In other words.  Corollary 5. Then Corollary 5..e. f is weakly increasing on I: x1 < x2 =⇒ f (x1 ) ≤ f (x2 ). But a function which is weakly increasing and weakly decreasing satsifies: for all x1 < x2 . d) Suppose f 0 (x) ≤ 0 for all x ∈ I. b) Again. and let f : I → R be a function which is differentiable on I.86 5. Then f is increasing on I: for all x1 . so by Corollary 5. On the other hand. for all x ∈ I. x2 ]: there exists x1 < c < x2 such that f 0 (c) = f (xx22)−f −x1 (x1 ) ≤ 0. DIFFERENTIATION Theorem 5.  Remark: Above. we argue by contraposition: suppose that f is not weakly increasing: then there exist x1 . Then f is weakly decreasing on I: for all x1 . given f : I → R. f is weakly decreasing on I: x1 < x2 =⇒ f (x1 ) ≥ f (x2 ). i. x2 ∈ I with x1 < x2 such that f (x1 ) ≥ f (x2 ).

Indeed. Now suppose that f is increasing on [a. We will show that f is weakly increasing on [a. We will show that f is increasing on [a. Proof. Let f : I → R be a function whose nth derivative f (n) is identically zero. b). b) Suppose f is monotone. b). Step 2: Suppose that f is continuous at a and increasing on (a. MONOTONE FUNCTIONS 87 the derivative of some other function. b].. a) If f is weakly increasing on (a. This is just a technical convenience: for continu- ous functions. b). there is a ∈ I ◦ with f 0 (a) < 0. leaving the other case to the reader as a routine exercise. b) such that f (a) > f (x0 ). Now take  = f (a) − f (x0 ). b ∈ I ◦ with a < b such that f 0 (x) = 0 for all x ∈ [a. namely strict inequality. b) and also increasing on (a. Let f : [a. (iii) There exist a. To see that this cannot happen. By taking a < x < x0 . b ∈ I ◦ with a < b such that the restriction of f to [a. x0 ): since f is increasing on (a. Then f is decreasing at a. Step 1: Suppose that f is continuous at a and weakly increasing on (a. a) The following are equivalent: (i) f is monotone. (ii) There exist a.  Exercise: Show that we may replace each instance of “increasing” in Theorem 5..25. b). So. b). If not. since f is (right-)continuous at a. b). Note first that Step 1 applies to show that f (a) ≤ f (x) for all x ∈ (a. contradicting the fact that f is weakly increasing on [a. (ii) Either we have f 0 (x) ≥ 0 for all x ∈ I ◦ or f 0 (x) ≤ 0 for all x ∈ I ◦ . Proof.24 with “decreasing” (and still get a true statement!). so there . b) If f is incereasing on (a.. a) (i) =⇒ (ii): Suppose f is weakly increasing on I. this contradicts the assumption that f is weakly increasing on (a.)  The setting of the Increasing Function Theorem is that of a differentiable function defined on an open interval I.23. seeking a contradiction. it is increasing on [a. |f (x) − f (a)| < f (a) − f (x0 ). b). Then f is a polynomial function of degree at most n − 1. it is weakly increasing on [a. we suppose that f (a) = f (x0 ) for some x0 ∈ (a. b): then f (a) < f (c) < f (b). b). Proof.) Corollary 5. choose any c ∈ (a. The following are equivalent: (i) f is not increasing or decreasing. (Second Monotone Function Theorem) Let f : I → R be a function which is continuous on I and differentiable on the interior I ◦ of I (i. Theorem 5. (Much more subtly. b). the increasing / decreasing / weakly increasing / weakly decreasing behavior on the interior of I implies the same behavior at an endpoint of I. But now take x1 ∈ (a. b). The only thing that could go wrong is f (a) ≥ f (b). Exercise. Step 3: In a similar way one can handle the right endpoint b. b]. but we want slightly more than this. at every point of I except possibly at any endpoints I may have). b]. assume not: then there exists x0 ∈ (a. b] is constant. We claim f 0 (x) ≥ 0 for all x ∈ I ◦ . b]. there are also some discontinuous functions which have antiderivatives. b) we have f (x1 ) < f (x0 ) = f (a).e. It turns out that every continuous function has an antiderivative: this will be proved later. Throughout the proof we restrict our attention to increasing / weakly increasing functions. Theorem 5. It remains to show that f is increasing on [a. b]. which implies f (x) > f (x0 ). 5. b] → R be continuous at x = a and x = b. there exists δ > 0 such that for all a ≤ x < a + δ. (Hint: use induction.24.

24. a + δ). and thus Theorem 5. Then: a) If there exists δ > 0 such that f 0 (x) is negative on (a − δ. Example: A typical application of Theorem 5. Moreover.88 5.3. . a + δ). We suppose: (i) f is twice differentiable at a. f 0 is identically zero on [a.24 to quickly derive another staple of freshman calculus. b]. a) and is negative on (a. a + δ). The First Derivative Test. xn . and f 0 (x) > 0 except at a finite set of points x1 . f 0 (x) = 3x2 which is strictly positive at all x 6= 0 and 0 at x = 0. for all c ∈ [a. a + δ]. (First Derivative Test) Let I be an interval. (ii) =⇒ (i): Immediate from the Increasing Function Theorem and Theorem 5.26. b]. b]. a) By the First Monotone Function Theorem. a) and positive on the open interval (a. We suppose that f is continuous on I and differentiable on I \ {a}. 5. a + δ]. it is continuous at a − δ. Indeed. a] and increasing on [a. Then f has a strict local maximum at a. b ∈ I ◦ with a < b such that f (a) = f (b).24 applies to show that f is decreasing on [a − δ. f has a strict local maximum at a. . b) If there exists δ > 0 such that f 0 (x) is positive on (a − δ. Let f : I → R. whereas if f 00 (a) < 0. b) As usual this may be proved either by revisiting the above argument or deduced directly from the result of part a) by multiplying f by −1. . . Then. a) or in (a. Thus for instance our version of the test applies to f (x) = |x| to show that it has a strict local minimum at x = 0.24 f is still not increasing on I ◦ . Theorem 5. Proof. so there exist a. a) and is positive on (a. b]. a and a + δ. and let f : I → R. This gives the desired result. a) and increasing on (a. b) (i) =⇒ (ii): Suppose f is weakly increasing on I but not increasing on I. a an interior point of I. (Second Derivative Test) Let a be an interior point of an in- terval I. b] we have f (a) ≤ f (c) ≤ f (b) = f (a). Then if f 00 (a) > 0. DIFFERENTIATION exists b > a with f (b) < f (a). then by the Zero Velocity Theorem f is constant on [a. since f 0 is negative on the open interval (a−δ. contradicting the fact that f is weakly decreasing. The Second Derivative Test. since f is weakly increasing. and f : I → R a function. Then f has a strict local minimum at a. since f is differentiable on its entire domain. We can use Theorem 5.26 is to show that the function f : R → R by f (x) = x3 is increasing on all of R. 5. b]. Then f is increasing on I. a+δ) f is decreasing on (a − δ.28.  The next result follows immediately. (iii) =⇒ (i): If f 0 is identically zero on some subinterval [a. Theorem 5. since it implies that f (a) is strictly smaller than f (x) for any x ∈ [a − δ. hence is not increasing.2. and (ii) f 0 (a) = 0.  Remark: This version of the First Derivative Test is a little stronger than the familiar one from freshman calculus in that we have not assumed that f 0 (a) = 0 nor even that f is differentiable at a. Suppose f 0 (x) ≥ 0 for all x ∈ I. . Corollary 5. so f is constant on [a. (ii) =⇒ (iii): If f is constant on [a. By Theorem 5.27. f has a strict local minimum at a.

One precise formulation of of this is that f has only finitely many roots on any bounded subset of its domain. Then x−a > 0 and x − a < 0. But this is a con- venient assumption and in a given situation it is usually easy to see whether it holds. This procedure comes with two implicit assumptions. a + δ) and then apply the First Derivative Test. To see this. Our strategy will be to show that for sufficiently small δ > 0. but certainly not for all functions. so 0 f (x) > 0. Let us make them explicit. consider f 0 (x) − f 0 (a) f 0 (x) f 00 (a) = lim = lim . Of course this does not hold for all functions: it fails very badly. a + δ). finding exact values of roots may be difficult or impossible even for polynomial functions. a+δ) (otherwise it would not be meaningful to talk about the derivative of f 0 at a). for the function f which takes the value . so that there exists δ > 0 0 such that for all x ∈ (a − δ. This holds for all the elementary functions we know and love. a + δ). fx−a(x) is positive. Let us agree that a sign analysis of a function g : I → R is the determina- tion of regions on which g is positive. for instance. MONOTONE FUNCTIONS 89 Proof. or none of the above. Then fx−a (x) > 0 and x − a > 0. The basic strategy is to determine first the set of roots of g. a) ∪ (a. As usual it suffices to handle the case f 00 (a) > a. decreasing at a. a). the features of interest include number and approximate locations of the roots of f . regions on which f is positive or negative. For these considerations one wishes to do a sign analysis on both f and its derivative f 0 . no conclusion can be drawn about the local behavior of f at a: it may have a local minimum at a. Notice that the hypothesis that f is twice differentiable at a implies that f is differentiable on some interval (a−δ. As discussed before. even all differentiable functions: we have seen that things like x2 sin( x1 ) are not so well-behaved. be increasing at a. negative and zero. The second assumption is more subtle: it is that if a function f takes a posi- tive value at some point a and a negative value at some other point b then it must take the value zero somewhere in between. if any. Sign analysis and graphing. regions on which f is increasing or decreasing. So f has a strict local minimum at a by the First Derivative Test.4. 5.  Remark: When f 0 (a) = f 00 (a) = 0. suppose x ∈ (a. but often it is feasible to determine at least the number of roots and their approximate location (certainly this is possible for all polynomial functions. x→a x−a x→a x − a We are assuming that this limit exists and is positive. f 0 (x) is negative for x ∈ (a − δ. a) and positive for x ∈ (a. The first is that the roots of f are sparse enough to separate the domain I into “regions”. and local extrema. 0 On the other hand. so f 0 (x) < 0. a local maximum at a. 5. When one is graphing a function f . The next step is to test a point in each region between consecutive roots to determine the sign. And this gives us exactly f 0 (x) what we want: suppose x ∈ (a − δ. although this requires justification that we do not give here).

talk about completeness of the real numbers. (Intermediate Value Theorem) Let f : [a. Must its derivative f 0 : I → R be continuous? Let us first pause to appreciate the subtlety of the question: we are not asking whether f differentiable implies f continuous: we well know this is the case.e. with f (a) < L < f (b) or f (b) < L < f (a) – there exists some c ∈ (a. just as we used but did not yet prove the Extreme Value Theorem.30. Such innocuous statements as every non-negative real number having a square root contain an im- plicit appeal to IVT. So this is how sign analysis works for a function f when f is continuous – a very mild assumption. The point of this example is to drive home the point that the Intermediate Value Theorem is the second of our three “hard theorems” in the sense that we have no chance to prove it without using special properties of the real numbers beyond the ordered field axioms. at least if f 0 has only finitely many roots on any bounded interval. Let f (x) be −1 for 0 ≤ x < 2 and f (x) = 1 for 2 < x ≤ 1. but we will use it. But as above we also want to do a sign analysis of the derivative f 0 : how may we justify this? Well. Let f : I → R be a continuous function. and on each of them f has constant sign: it is either always positive or always negative. Then f has the intermediate value property. Here are two important theorems.. and suppose that there are only finitely many roots. then by IVT it too has the intermediate value property and thus. 2]Q → Q failing √ the intermediate value property. Further. and prove the three hard theorems. And indeed we will not prove IVT right now. . Example of a continuous function f :√[0. . i. DIFFERENTIATION 1 at every rational number and −1 at every irrational number. Then I \ {x1 . Rather we are asking whether the new function f 0 can exist at every point of I but fail to itself be a continuous function. b] → R be continu- ous. . Question 5. In fact the answer is yes. . xn ∈ I such that f (xi ) = 0 for all i and f (x) 6= 0 for all other x ∈ I. b ∈ I with a < b and all L in between f (a) and f (b) – i. Let us formalize the desired property and then say which functions satisfy it. . This brings up the following basic question. . Thus a function has the intermediate value property when it does not “skip” values. Theorem 5. Let f : I → R be a differentiable function. . here is one very reasonable justification: if the derivative f 0 of f is itself continuous. b) with f (c) = L. (However we are now not so far away from the point at which we will “switch back”.29.. . there are x1 . .) The Intermediate Value Theorem (IVT) is ubiquitously useful.90 5. A function f : I → R has the intermediate value property if for all a. each asserting that a broad class of functions has the intermediate value property. IVT justifies the following observation.e. sign analysis is justified. xn } is a finite union of intervals.

so the minimum cannot occur at a. which implies f 0 (a) < 0 and f 0 (b) > 0. x]: there exists cx ∈ (a. there exists c ∈ (a. b) such that f 0 (c) = L. terminology we will understand better when we discuss the Intermediate Value Theorem (for arbitrary continuous functions). . g 0 (a) = f 0 (a) − L < 0 and g 0 (b) = f 0 (b) − L > 0. and (iii) limx→a f 0 (x) = L exists. 5. x) such that f (x) − f (a) = f 0 (cx ). Step 2: We now reduce the general case to the special case of Step 1 by defining g(x) = f (x) − Lx. Then for every L ∈ R with f 0 (a) < L < f 0 (b).. b] so assumes its minimum value at some point c ∈ [a. Suppose: (i) f is continuous on I. . Proof. and let f : I → R. Step 1: First we handle the special case L = 0. . i. Now f is a differentiable – hence continuous – function defined on the closed interval [a. A Theorem of Spivak. Then f is differentiable at x. Then g is still differentiable. b ∈ I with a < b and f 0 (a) < f 0 (b). so the minimum cannot occur at b. a + δ). Theorem 5. Proof. (Darboux) Let f : I → R be a differentiable function. Suppose that we have a.) 5. b) such that f 0 (c) = L. I claim that f is differentiable on all of R but that the derivative is discontinuous at x = 0. (Recall that a function g has a simple discontinuity at a if limx→a− g(x) and limx→a+ g(x) both exist but either they are unequal to each other or they are unequal to g(a). a + δ) ⊂ I. it must be a stationary point: f 0 (c) = 0. and we may apply the Mean Value Theorem to f on [a.31. Choose δ > 0 such that (a − δ. f is increasing at b. b]. b) such that 0 = g 0 (c) = f 0 (c) − L. thus takes smaller values slightly to the right of a. In other words. Theorem 5. Let a be an interior point of I. and suppose f : I → R is a differentiable function. f is decreasing at a. so by Step 1. Exercise: Let a be an interior point of an interval I.32. Let x ∈ (a. Darboux’s Theorem also often called the Intermediate Value The- orem For Derivatives.e. there exists c ∈ (a. Show that the function f 0 cannot have a simple dis- continuity at x = a. and in fact that limx→0 f 0 (x) does not exist. (ii) f is differentiable on I \ {a}. thus takes smaller values slightly to the left of b. Similarly. The following theorem is taken directly from Spivak’s book (Theorem 7 of Chapter 11): it does not seem to be nearly as well known as Darboux’s Theorem (and in fact I think I encountered it for the first time in Spivak’s book).5.  Remark: Of course there is a corresponding version of the theorem when f (b) < L < f (a). there exists c ∈ (a. If c is an interior point. x−a . then as we have seen. But the hypotheses guarantee this: since f 0 (a) < 0. Then f is differentiable at a and f 0 (a) = L. at every point of I except possibly at a. since f 0 (b) > 0. MONOTONE FUNCTIONS 91 Example: Let f (x) = x2 sin( x1 ).

a function which is both injective and surjective. we denote it by f −1 . this definition breaks down and thus we are unable to define f −1 . there exists at least one x ∈ X such that y = f (x). we define f −1 (y) = xy . (ii) f admits an inverse function.  Since the inverse function to f is always unique provided it exists. (Uniqueness of Inverse Functions) Let f : X → Y be a function. Suppose that g1 . x2 ∈ X. that for all y ∈ Y .. In other words. a) we get fL0 (a) = lim f 0 (x) = L. (Caution: this has nothing to do with f1 . x] gets arbitrarily close to x. g2 : Y → X are both inverses of f . distinct x-values get mapped to distinct y-values. Nevertheless we ask the reader to bear with us as we give a slightly tedious formal proof of this. (Existence of Inverse Functions) For f : X → Y . as x → a every point in the interval [a. Proposition 5. (f ◦ g)(y) = f (g(y)) = y. Putting these two concepts together we get the important notion of a bijective function f : X → Y . . Theorem 5.33. the graph of f satisfies the horizontal line test. Review of inverse functions. x1 6= x2 =⇒ f (x1 ) 6= f (x2 ). Re- call that f : X → Y is injective if for all x1 . we have g1 (y) = (g2 ◦ f )(g1 (y)) = g2 (f (g1 (y))) = g1 (y). i. Let’s unpack this notation: it means the following: first. Thus sin−1 (x) 6= csc(x) = sin1 x . (And in yet other words. and second. Inverse Functions I: Theory 6. Other- wise put.1.) Also f : X → Y is surjec- tive if for all y ∈ Y . that for all x ∈ X. (g ◦ f )(x) = g(f (x)) = x. TFAE: (i) f is bijective. the unique element of X such that f (xy ) = y. x→ 0 so f is differentiable at a and f (a) = L. Recall that an inverse function is a function g : Y → X such that g ◦ f = 1X : X → X.34. Then g1 = g2 . x→a x−a x→a x→a By a similar argument involving x ∈ (a − δ. For all y ∈ Y .) We now turn to giving conditions for the existence of the inverse function. f ◦ g = 1Y : Y → Y. Let X and Y be sets. DIFFERENTIATION Now.e. and let f : X → Y be a function between them. for all y ∈ Y there exists exactly one x ∈ X such that y = f (x). so limx→a cx = x and thus f (x) − f (a) fR0 (a) = lim+ = lim+ f 0 (cx ) = lim+ f 0 (x) = L. Proof.92 5. It may well be intuitively clear that bijectivity is exactly the condition needed to guarantee existence of the inverse function: if f is bijective. And if f is not bijective.  6.

5 We now introduce the dirty trick of codomain restriction. i. the following are equivalent: (i) The codomain-restricted function f : X → f (X) has an inverse function. b]) = [m. 5This is sometimes called the range of f . b]. 6. (Imagine that you manage the up-and-coming band Yellow Pigs. for each y ∈ X there exists exactly one element of X – say xy – such that f (xy ) = y. namely x = f −1 (y). say at xm and a maximum value M . b]) of f is a subset of [m. Then f (g(y)) = f (xy ) = y. then as above. let y ∈ Y . Then f (f −1 (y)) = y.  Exercise: Let I be a nonempty interval which is not of the form [a. and although x2 : R → R is not surjective. so there is x ∈ X with f (x) = y. (ii) =⇒ (i): Suppose that f −1 exists. we define the image of f to be {y ∈ Y | ∃x ∈ X | y = f (x)}. Proof. We may therefore define a function g : Y → X by g(y) = xy . we still get a well-defined function f : X → f (X). (ii) The original function f is injective. and thus g(f (x)) = x. Then f has a minimum value m. Thus the image f ([a.) Example: Let f : R → R by f (x) = x2 . Then the image f (I) of f is also an interval. M ]. x2 : R → [0. M ). and the next morning you write a press release saying that Yellow Pigs played to a “packed house”. let x1 . Let J be any nonempty interval. let xy be the unique element of X such that f (xy ) = y. M ].2.35. but sometimes not. is closed and bounded. Let f : X → Y be any function. To see that f is surjective. Then f (R) = [0. say at xM . The image of f is often denoted f (X). After everyone sits down you remove all the empty chairs. and let f : I → R be a continuous function. Moreover. You get them a gig one night in an enormous room filled with folding chairs. Next we want to return to earth by considering functions f : I → R and their inverses. ∞).  For any function f : X → Y . This is essentially the same dirty trick as codomain restriction. x2 ∈ X be such that f (x1 ) = f (x2 ). The Interval Image Theorem. b] → R is continuous. The general case will be discussed later. Then if we replace the codomain Y by the image f (X). Since a codomain-restricted function is always surjective. Let us verify that g is in fact the inverse function of f .36. b]. ∞) certainly is. Because f is injective. (Interval Image Theorem) Let I ⊂ R be an interval. It is safer to call it the image! .. Applying f −1 on the left gives x1 = f −1 (f (x1 )) = f −1 (f (x2 )) = x2 . it has an inverse iff it is injective iff the original functionb is injective. So f ([a. if L ∈ (m. For now we will give the proof when I = [a. and this new function is tautologically surjective. For any y ∈ Y . concentrating on the case in which f is continuous. then by the Intermediate Value Theorem there exists c in between xm and xM such that f (c) = L.e. 6. To see that f is injective. Thus: Corollary 5. the only element x0 ∈ X such that f (x0 ) = f (x) is x0 = x. Show: there is a continuous function f : I → R with f (I) = J. So f is injective. consider g(f (x)). INVERSE FUNCTIONS I: THEORY 93 Proof. (i) =⇒ (ii): If f is bijective. For any x ∈ X. Suppose f : [a. Theorem 5. For a function f : X → Y .

As usual. f (b)] or f is decreasing and J = [f (b). hence invertible. and thus there is an inverse function f −1 : J → I. We will suppose that f is injective and not monotone and show that it cannot be continuous. which suffices. Proof. d) The function f −1 : J → I is also continuous. b]. f itself would not be a function!).  Lemma 5. If f were continuous then by the Intermediate Value Theorem there would be d ∈ (a. DIFFERENTIATION 6. it is necessarily injective (if it weren’t.94 5. Proposition 5. a) If f is increasing.. (Λ-V Lemma) Let f : I → R. b) and e ∈ (b.39. Let J = f (I) be the image of f . But if this holds we apply the increasing function f to get y2 = f (f −1 (y2 )) < f (f −1 (y1 )) = y1 .4. b) and e ∈ (b. the de- creasing case being so similar as to make a good exercise for the reader. contradicting the injectivity of f . . We may apply Lemma 5. c) such that f (d) = f (e) = L. a) f : I → J is a bijection. Proof. The following are equivalent: (i) f is not monotone: i. (Continuous Inverse Function Theorem) Let f : I → R be injective and continuous. Then there exists L ∈ R such that f (a) < L < f (b) > L > f (c). it is monotone. then f −1 : f (I) → I is decreasing. Then there exists L ∈ R such that f (a) > L > f (b) < L < f (c). Next suppose f has a V configuration: there exist a < b < c ∈ I such that f (a) > f (b) < f (c). (ii) At least one of the following holds: (a) f is not injective. a contradiction.38. Let f : I → f (I) be a strictly monotone function. Every strictly monotone function is injective. (c) f admits a V -configuration: there exist a < b < c ∈ I with f (a) > f (b) < f (c).3. then either f is increasing and J = [f (a).  6.38. Theorem 5.38 to conclude that f has either a Λ configuration or a V configuration.37. then f −1 : f (I) → I is increasing. so we cannot have f −1 (y1 ) = f −1 (y2 ). Suppose first f has a Λ configuration: there exist a < b < c ∈ I with f (a) < f (b) > f (c). b) If f is decreasing. Since f −1 is an inverse function. Recall f : I → R is strictly monotone if it is either increasing or decreasing. there exist y1 < y2 ∈ f (I) such that f −1 (y1 ) is not less than f −1 (y2 ). If f were continuous then by the Intermediate Value Theorem there would be d ∈ (a. c) such that f (d) = f (e) = L. contradicting injectivity.40. If f : I → R is continuous and injective. Seeking a contradiction we suppose that f −1 is not increasing: that is. f (a)]. b) J is an interval in R. c) If I = [a. Inverses of Continuous Functions. (b) f admits a Λ-configuration: there exist a < b < c ∈ I with f (a) < f (b) > f (c). f is neither increasing nor decreasing. Thus in this sense we may speak of the inverse of any strictly monotone function. Monotone Functions and Invertibility. f : I → f (I) is bijective. Therefore our dirty trick of codomain restriction works to show that if f : I → R is strictly monotone. and thus the possibility we need to rule out is f −1 (y2 ) < f −1 (y1 ). Exercise: Prove Lemma 5. we will content ourselves with the increasing case.e. Theorem 5.

Then: f (a − ) ≤ f (a) − δ. a characterization of the possible discontinuities of a monotone function. a) = (b. that f −1 : J → I is continuous. In this section our goal is to determine conditions under which the inverse f −1 of a differentiable funtion is differentiable. b). the slope of the tangent line of the inverse function at (b. We want to find δ > 0 such that if f (a) − δ < y < f (a) + δ. f (a) + δ ≤ f (a + ). then reflecting it across a line should not change this. . We must show that limy→b f −1 (y) = f −1 (b). and thus if f (a) − δ < y < f (a) + δ we have f (a − ) ≤ f (a) − δ < y < f (a) + δ ≤ f (a + ).e. Thus it should be the case that if f is differentiable. Morever. Geometrically speaking y = f (x) is differentiable at x iff its graph has a well-defined. or. By part c) and Proposition 5.37. put more geometrically. or f −1 (b) −  < f −1 (y) < f −1 (b) + . nonvertical tangent line at the point (x. then since a vertical line has “infinite slope” it does not have a finite-valued derivative. so is postponed to the next chapter. 12. Fix  > 0. 1 the inverse function of the straight line y = mx + b is the straight line y = m (x − b) 1 – i. and if a curve has a well-defined tangent line. [S. The new result is part d). f (a)) such that f 0 (a) = 0. we get f −1 (f (a − )) < f −1 (y) < f −1 (f (a + )). Otherwise. After re- flecting on my dissatisfaction with it. or f and f −1 are both decreasing. Let b ∈ J. So we need to worry about the possibility that reflection through y = x carries a nonvertical tangent line to a vertical tangent line.3] Parts a) through c) simply recap previous results. When does this happen? Well.  Remark: To be honest. We may write b = f (a) for a unique a ∈ I. Well. and if so to find a formula for (f −1 )0 . so that is our answer: at any point (a. Thm.5. Inverses of Differentiable Functions. Notice the occurrence of “nonvertical” above: if a curve has a vertical tangent line. b) = (a. then a −  < f −1 (y) < a + . f −1 (b)) because it will have a vertical tangent. Take δ = min(f (a + ) − f (a). The proof uses the completeness of the real numbers. almost.. we restrict ourselves to the first case. but which depends on the Monotone Jump Theorem. As usual. then the inverse function will fail to be differentiable at the point (b. Since f −1 is increasing. f (a) − f (a − )). it takes a horizontal line y = c to a vertical line x = c. f (x)). reflecting across y = x takes a line of slope m to a line of slope m . 6. 6. I don’t find the above proof very enlightening. a) is precisely the reciprocal of the slope of the tangent line to y = f (x) at (a. either f and f −1 are both increasing. Let’s first think about the problem geometrically. INVERSE FUNCTIONS I: THEORY 95 Proof. so is f −1 . I came up with an alternate proof that I find conceptually simpler. by reflecting the graph of y = f (x) across the line y = x. The graph of the inverse func- tion y = f −1 (x) is obtained from the graph of y = f (x) by interchanging x and y.

We need only implicitly differentiate the equation f −1 (f (x)) = x. It turns out to be quite straightforward to adapt this geometric argument to derive the desired formula for (f −1 )0 (b). Theorem 5. f (a) f (f (b)) Proof. Sup- pose that the inverse function f −1 : J → I is differentiable at b ∈ J. f −1 (b + h) = a + kh . We will do this first. but we are ready for it and we will do it. h→0 h h→0 h Since J = f (I). enough complaining: here goes. or 1 (f −1 )0 (f (x)) = . f 0 (f −1 (b)) 6= 0. and for this we cannot directly use Proposition 5. Let a = f −1 (b). getting (20) (f −1 )0 (f (x))f 0 (x) = 1. so the geometry tells us. Let f : I → J be a bijective differentiable function. h→0 f (a + kh ) − b h→0 f (a + kh ) − f (a) h→0 kh 6Unlike Spivak. . since by (20) we have (f −1 )0 (b)f 0 (f −1 (b)) = 1.41 but end up deriving it again. Let b be an interior point of J and put a = f −1 (b). 12. every b + h ∈ J is of the form b + h = f (a + kh ) for a unique kh ∈ I. (Differentiable Inverse Function Theorem) Let f : I → J be continuous and bijective.41. Then we need to come back and verify that indeed f −1 is differentiable at b if f 0 (f −1 (b)) exists and is nonzero: this turns out to be a bit stickier. Suppose that f is differentiable at a and f 0 (a) 6= 0. f 0 (x) To apply this to get the derivative at b ∈ J. Proposition 5. f (a) f (f (b)) Moreover. DIFFERENTIATION Well. we just need to think a little about our variables.96 5. [S.5] We have f −1 (b + h) − f −1 (b) f −1 (b + h) − a (f −1 )0 (b) = lim = lim . let’s make this 6 substitution as well as h = f (a + kh ) − f (a) in the limit we are trying to evaluate: a + kh − a kh 1 (f −1 )0 (b) = lim = lim = lim f (a+kh )−f (a) . Then (f −1 )0 (b) = f 0 (f −1 1 (b)) . so f (a) = b. Well. Evaluating the last equation at x = a gives 1 1 (f −1 )0 (b) = 0 = 0 −1 . we will include the subscript k to remind ourselves that this k is defined in h terms of h: to my taste this reminder is worth a little notational complication. In particular. if f −1 is differentiable at b then f 0 (f −1 (b)) 6= 0.42. Since b + h = f (a + kh ). Thm. Then f −1 is differentiable at b. with the familiar formula 1 1 (f −1 )0 (b) = 0 = 0 −1 .  As mentioned above. Proof. under the as- sumption that f is differentiable. unfortunately we need to work a little harder to show the differentiability of f −1 .

−x lies in D. this is a relatively simple case: if D ⊂ R. If we want to get an inverse function. ∞).40 – we have lim kh = lim f −1 (b + h) − a = f −1 (b + 0) − a = f −1 (b) − a = a − a = 0. If we want the restricted domain to be as large as possible. But kh = f −1 (b + h) − a. Luckily for us. −x for all x > 0. Thus f is decreasing on (−∞. if possible we would like it to be an interval. The reader should not now be surprised to hear that we give separate consideration to the cases of odd n and even n. Then f 0 (x) = (2k)x2k−1 is positive when x > 0 and negative when x < 0. n In this section we illustrate the preceding concepts by defining and differentiat- 1 ing the nth root function x n . we should choose the domain to include 0 and exactly one of x. we have limx→±∞ f (x) = ±∞. then we may simply replace the “limh→0 ” with “limkh →0 ” and we’ll be done. h→0 h→0 So as h → 0. . x . Therefore there is an inverse function f −1 : R → R which is everywhere continuous and differentiable at every x ∈ R except x = 0. There are still lots of ways to do this. but if we can show that limh→0 kh = 0. In particular it is not injective on its domain. Inverse Functions II: Examples and Applications 1 7.1. which can be done in exactly one way so as to result in a surjective function. at most one of x. 0]. domain restric- tion brings with it many choices. so by Theorem 5. so – since f −1 is continuous by Theorem 5.25 f : R → R is increasing. Then f 0 (x) = (2k +1)x2k = (2k +1)(xk )2 is non-negative for all x ∈ R and not identically zero on any subinterval [a. we need to engage in domain restriction. Moreover. then the restriction of f to D will be injective if and only if for each x ∈ R. The only issue is the pesky kh . A little thought shows that there are two restricted domains which meet all these requirements: we may take D = [0. kh → 0 and thus 1 1 1 (f −1 )0 (b) = = = . ∞) or D = (−∞. limkh →0 f (a+kkhh)−f (a) f 0 (a) f 0 (f −1 (b))  7. by the Intermediate Value Theorem the image of f is all of R. Case 1: n = 2k +1 is odd. Moreover. let n > 1 be an integer. so let’s try to impose another desirable property of the domain of a function: namely. Either way. x 7→ xn . 0] and increasing on [0. and consider f : R → R. 7. It 1 is typical to call this function x n . INVERSE FUNCTIONS II: EXAMPLES AND APPLICATIONS 97 We are getting close: the limit now looks like the reciprocal of the derivative of f at a. Unlike codomain restriction. Case 2: n = 2k is even. Since f is continuous. f is everywhere differentiable and has a horizontal tangent only at x = 0. b] with a < b.

∞) we have 0 = L(1) = L(x · x1 ) = L(x) + L( x1 ). d) We have L((0. This is perhaps more intuitive but is slightly tedious to make completely rigorous. b) For any x ∈ (0.43. As advertised. it is continuous. L(x) and E(x). ∞). ∞) be regarded as fixed. b) For x ∈ (0. L(xy) = L(x) + L(y) + Cy . (Alternately. More precisely. and the result follows immediately from part c) and the Intermediate Value Theorem. For all x. ∞). xy x xy x By the zero velocity theorem. this shows that L takes arbitaririly large values. this implies limx→∞ L(x) = ∞. this implies limx→0+ L(x) = −∞. If we plug in x = 1 we get L(y) = 0 + L(y) + Cy . ∞) → R is increasing – hence injective and has image (−∞. (Such a number exists because L : (0. L( 12 ) = −L(2) = −C < 0. the function f (x) is a constant (depending. c) We have limx→∞ L(x) = ∞. ∞)) = R. a priori on y). Since L(1) = 0. recall that when they exist antiderivatives are unique up to the addition of a constant. we will soon be able to prove that every continuous function has an antiderivative. By the Archimedean property of R. so C > 0. and since it is increasing. so we may uniquely specify L(x) by requiring L(1) = 0. Proposition 5. a) For all x ∈ (0. ∞). Thus for all x ∈ (0. Thus in fact for any real number α there is a unique positive real number β such that L(β) = α. so bor- rowing on this result we define L : (0. ∞). c) Since L0 (x) = x1 > 0 for all x ∈ (0. ∞). y → ∞+. ∞) and n ∈ Z+ . ∞). Proof. DIFFERENTIATION 7. We have 1 1 y 1 f 0 (x) = L0 (xy)(xy)0 − L0 (x) = ·y− = − = 0. combined with the fact that L is increasing. Again. Then by part a). y ∈ (0. L(2n ) = nL(2) = nC. limx→0+ L(x) = −∞. we have L( x1 ) = −L(x).98 5. ∞) → R to be such that L0 (x) = l(x). Let y ∈ (0. and thus Cy = 0. L(x) > 0. we have L(xn ) = nL(x).) . L is increasing on (0.2. To evaluate limx→0+ L(x) we may proceed similarly: by part b). and consider the function f (x) = L(xy) − L(x) − L(y). a) An easy induction argument using L(x2 ) = L(x) + L(x) = 2L(x). To be specific. Consider the function l : (0. say Cy . Proof.44. we have (21) L(xy) = L(x) + L(y). so L(xy) = L(x) + L(y). so L takes arbitrarily small values.) d) Since L is differentiable. take C = L(2). so L( 21n ) = −nL(2) = −Cn.  Corollary 5.  Definition: We define e to be the unique positive real number such that L(e) = 1. for any x > 0. we may evaluate limx→0+ L(x) by making the change of variable y = x1 and noting that as x → 0+ . ∞) → R given by l(x) = x1 .

Corollary 5. ∞) such that L(x) = 0 is x = 1. convert multiplication into addition. so E(x + y) = E(x)E(y).45. Put f (x) = E(x) . and take the value 1 at x = 0. To showcase the range of techniques available. E(x)E(y) The unique x ∈ (0. we have E(1) = e. we have L(y1 y2 ) = L(y1 ) + L(y2 ). INVERSE FUNCTIONS II: EXAMPLES AND APPLICATIONS 99 Since L(x) is everywhere differentiable with nonzero derivative x1 . Then Ey (x)E 0 (x) − Ey0 (x)E(x) E(x + y)E(x) − E 0 (x + y)(x + y)0 E(x) f 0 (x) = = E(x)2 E(x)2 E(x + y)E(x) − E(x + y) · 1 · E(x) = = 0. E(0) = 1. f (x) = E(x + y)/E(x) = Cy . we get E 0 (x) = E(x). 7. 0 Let’s compute E : differentiating L(E(x)) = x gives E 0 (x) 1 = L0 (E(x))E 0 (x) = . Second proof: We have   E(x + y) L = L(E(x + y)) − L(E(x)) − L(E(y)) = x + y − x − y = 0. Plugging in x = 0 gives E(y) = E(0)C(y) = 1 · C(y) = C(y). we give three different proofs. For all x. are equal to their own derivatives. E(x)E(y) or E(x + y) = E(x)E(y). Third proof: For any y1 . Now the previous disucssion must suggest to any graduate of freshman calculus that E(x) = ex : both functions defined and positive for all real numbers. E(x) In other words. so that x1 = L(y1 ).  Note also that since E and L are inverse functions and L(e) = 1. y ∈ R we have E(x + y) = E(x)E(y). so we must have E(x + y) = 1. there is Cy ∈ R such that for all x ∈ R. Proof. the differentiable inverse function theorem applies: L has a differentiable inverse function E : R → (0. ∞). x2 = L(y2 ) and thus E(x1 )E(x2 ) = y1 y2 = E(L(y1 y2 )) = E(L(y1 ) + L(y2 )) = E(x1 + x2 ). How many such functions could there be? . Put y1 = E(x1 ) and y2 = E(x2 ). Ey (x) First proof: For y ∈ R. E(x)2 By the Zero Velocity Theorem. or E(x + y) = E(x)Cy . let Ey (x) = E(x + y). y2 > 0.

loga (xy ) = y loga x. But. and indeed I don’t see how it can be given without using key concepts and theorems of calculus. b) We have  0 0 L(x) 1 (loga (x)) = = . For x ∈ R. g) For all x > 0 and y ∈ R. c ∈ R and a > 1. the actual definition of ax for irrational x is not given. f (x) Proof. L(a) L(a)x . If a 6= 1. if a = e this agrees with our previous definition: ex = E(xL(e))) = E(x). ax := E(L(a)x). Then a is increasing with image (0.  In other words.100 5. Fix a ∈ (0. How should we define ax ? In the following slightly strange way: for any x ∈ R. Then for all x ∈ R. a) We have (ax )0 = E(L(a)x)0 = E 0 (L(a)x)(L(a)x)0 = E(L(a)x) · L(a) = L(a)ax . we define L(x) loga (x) = . d) For all x.. E(x)2 E(x)2 f By the Zero Velocity Theorem g = E is constant: f (x) = CE(x) for all x. ∞). There is a constant C such that f (x) = CE(x) for all x ∈ R. y ∈ R. Indeed. Define a function g : R → R by g(x) = E(x) . then we must have ex = E(x) for all x. Proof. let us develop the theory of exponentials and logarithms to arbitrary bases. with the functions E(x) and L(x) in hand. the definition is motivated by the following desirable law of exponents: (ab )c = abc . loga (xy) = loga x + loga y. loga x is increasing with image (−∞. E(x)f 0 (x) − E 0 (x)f (x) E(x)f (x) − E(x)f (x) g 0 (x) = = = 0. Let a > 0 be a real number.47.46. we would have ax = E(x log a) = ex log a = (elog a )x = ax . we define ax := E(L(a)x)..we want to prove them! Proposition 5. and ax and loga x are inverse functions. DIFFERENTIATION Proposition 5. assuming this holds unrestrictedly for b. ax+y = ax ay . Let us make two comments: first. if there really is a function f (x) = ex out there with f 0 (x) = ex and f (0) = 1. ∞). x c) Suppose a > 1. L(a) a) The function ax is differentiable and (ax )0 = L(a)ax . y > 0. f ) For all x. e) For all x > 0 and y ∈ R. ∞). Second. Let f : R → R be a differentiable function such that f 0 (x) = f (x) for all x ∈ R. The point of this logical maneuver is that although in precalculus mathematics one learns to manipulate and graph exponential functions. (ax )y = axy . b) The function loga x is differentiable and (loga x)0 = L(a)x 1 . But here is the point: we do not wish to assume that the laws of exponents work for all real numbers as they do for positive integers.

a degree one polynomial. Let f (x) = ex . ax and loga x are both increasing functions. Thus check that they are inverses of each other. ∞). dxn Proof. Exercise: Prove the change of base formula: for all a. INVERSE FUNCTIONS II: EXAMPLES AND APPLICATIONS 101 c) Since their derivatives are always positive. where P1 (x) = 2x. ∞) and loga x : (0.48. L(a) L(a) L(a)  Having established all this. e) We have (ax )y = E(L(ax )y) = E(L(E(L(a)x))y) = E(L(a)xy) = axy . L(a) L(a) L(a) d) We have ax+y = E(L(a)(x + y)) = E(L(a)x + L(a)y) = E(L(a)x)E(L(a)y) = ax ay . we now feel free to write ex for E(x) and log x for L(x). Show that ax is decreasing with image (0. dx dx dx . L(a) L(a) L(a) L(a) g) We have L(xy ) L(E(L(x)y)) L(x)y loga xy = = = = y loga x. c 6= 1. f) We have L(xy) L(x) + L(y) L(x) L(y) loga (xy) = = = + = loga x + loga y. So dx n+1 e = d dn x2 IH d 2 2 2 2 n e = Pn (x)ex = Pn0 (x)ex + 2xPn (x)ex = (Pn0 (x) + 2xPn (x)) ex . ∞). ∞) → (0. logc a 2 Proposition 5. Inductive step: Assume that for some positive integer n there exists Pn (x) of degree dn x2 2 dn+1 x2 n such that dx ne = Pn (x)ex . b. Then for all n ∈ Z+ there exists a poly- nomial Pn (x). Moreover. By induction on n. loga x is decreasing with image (0. Now L(ax ) L(E(L(a)x)) L(a)x loga (ax ) = = = = x. x→∞ x→∞ L(x) ∞ lim loga (x) = lim = = ∞. and ax and loga x are inverse functions. since a > 1. it suffices to show that either one of the two compositions is the identity function. Exercise: Suppose 0 < a < 1. L(a) > 0 and thus lim ax = lim E(L(a)x) = E(∞) = ∞. ∞) → (−∞. such that dn 2 f (x) = Pn (x)ex . x→∞ x→∞ L(a) L(a) Thus ax : (−∞. logc b loga b = . ∞) are bijective and thus have inverse functions. 7. Base case (n = 1): d x2 2 2 dx e = 2xex = P1 (x)ex . of degree n. c > 0 with a.

but a little trigonometry will clean it up. 2 ). Right away we encounter a problem similar to the case of xn for even n: the trigonometric functions are periodic.102 5. 0 We claim that f is increasing on I. dx cos(arcsin x) This looks like a mess. by the Pythagorean Theorem. we get cos(arcsin x) arcsin0 (x) = 1. DIFFERENTIATION Now. In particular. and a little thought shows that the maximal possible length of an interval on which the sine function is injective is π. arcsin x is increasing. 1] → [ . Moreover since sin x has a nonzero derivative on ( −π π 2 . This still gives us choices to make. Thus cos θ = 1 − x2 . The inverse function here is of- ten called arcsin x (“arcsine of x”) in an attempt to distinguish it from sin1 x = csc x. The most standard choice – but to be sure. hence certainly not injective on their entire domain. arcsin x is differentiable there. 2 ]. we need to restrict the domain to some subset S on which f is injective. dx 1 − x2 Now consider f (x) = cos x. ]. so finally d 1 arcsin x = √ . since Pn (x) has degree n. and so forth. tan- gent. Reflecting a bit on the graph of f (x) = cos x one sees that a reasonable choice for the restricted domain is [0. Pn+1 (x) := Pn0 (x) + 2xPn (x) has degree n + 1. We have −π π arcsin : [−1. deg(g)). note that f (x) = cos x is indeed positive on ( −π π −π π 2 . We have f ([ 2 . one that is not the only possible one nor is mathematically consecrated in any particular way – is to take I = [ −π π 2 . If f and g are two polynomials such that the degree of f is different from the degree of g. then to get the ratio of the opposite side to the hypotenuse to be x we may take the length of the opposite side to be x and the length of the hypotenuse to be √ 1.3. 2 ]) = [−1. To get an inverse function. attained by any interval at which the function either increases from −1 to 1 or decreases from 1 to −1.  7. This is as good a name as any: let’s go with it. or d 1 arcsin x = . We now wish to consider inverses of the trigonometric functions: sine. it is not injective on any interval containing 0 in its interior. 1 − x2 . If we draw a right triangle with angle θ = arcsin x. 1]. completing the proof of the induction step. Once again we are forced into the art of domain restriction (as opposed to the science of codomain restriction). Since f is even. then deg(f + g) = max(deg(f ). Pn0 (x) has degree n − 1 and 2xPn (x) has degree n + 1. As usual we like intervals. To check this. Consider first f (x) = sin x. 2 ). cosine. π]: since f 0 (x) = − sin x is . Some inverse trigonometric functions. in which case the length √ of the adjacent side is. The key is to realize that cos arcsin x means “the cosine of the angle whose sine is x” and that there must be a simpler description of this. Differentiating sin(arcsin x)) = x. 2 2 As the inverse of an increasing function.

simply evaluate at x = 0: π π C = arccos 0 + arcsin 0 = + 0 = . π] and hence injective there. sin x Finally. this may be simplified. limx→± π2 tan x = ±∞. π) and 0 and 0 and π. 1 − x2 Remark: It is hard not to notice that the derivatives of the arcsine and the arccosine are simply negatives of each other. so is arccos x. as with the sine function. The tangent function is peri- odic with period π and also odd. ± 3π2 . Morever. arccos x is differentiable on (−1. Its image is f ([0. 2 ) and thus is injective there. so all real numbers except ± π2 . π2 ]. To determine C. . consider f (x) = tan x = cos x . 2 )) = R. 1) and has vertical tan- gent lines at x = −1 and x = 1.. Moreover. or −1 arccos0 x = . 7. 2 2 Since the tangent function is differentiable with everywhere positive derivative. In particular it is increasing but bounded: limx→±∞ arctan x = ± π2 . If ϕ = arccos √x. so for all x ∈ [0. so is arccos x. 2 2 and thus for all x ∈ [0. INVERSE FUNCTIONS II: EXAMPLES AND APPLICATIONS 103 negative on (0. Since cos x is differentiable and has zero derivative only at 0 and π. 1] → [0. Since f 0 (x) = sec2 x > 0. We find a formula for the derivative of arccos just as for arcsin: differentiating cos arccos x = x gives − sin(arccos x) arccos0 x = 1. and thus −1 arccos0 x = √ . π])) = [−1. 2 So the angle θ whose sine is x is complementary to the angle ϕ whose cosine is x. The domain is all real numbers for which cos x 6= 0. By the Zero Velocity Theorem. . we should restrict this domain to the largest interval about 0 on which f is defined and injective. we conclude arccos x + arcsin x = C for some constant C. since cos x is decreasing. 1]. then x = cos ϕ. π2 ] we have π arccos x + arcsin x = . f (x) is decreasing on [0. Therefore we have an inverse function   −π π arctan : R → . . sin arccos x Again. so if we are on the unit circle then the y-coordinate is sin ϕ = 1 − x2 . . which suggests that. f is increasing on ( −π π 2 . In other words the arctangent has horizontal asymptotes at y = ± π2 . so by the Intermediate Value Theorem f (( −π π 2 . Therefore we have an inverse function arccos : [−1. π]. . Since cos x is continuous. the same is true for arctan x. arccos0 x + arcsin0 x = 0.

DIFFERENTIATION 8.J. . Theorem 5. If the right-hand derivative f+ (x) exists and is 0 for all x ∈ (a. A short paper of W. Knight improves upon the Zero Velocity Theorem. b). b] → 0 R be continuous. (Right-Handed Zero Velocity Theorem [Kn80]) Let f : [a. deduces a right-handed Mean Value Inequality.49. Some Complements The Mean Value Theorem and its role in “freshman calculus” has been a popular topic of research and debate over the years. and then Theorem 5. then f is constant.49.104 5. The proof is modelled upon the usual one: one starts with a right-handed Rolle’s Theorem.

M is a least upper bound for S if M is an upper bound for S and for any upper bound M 0 for S. The following is a useful alternate characterization of sup S: the supremum of 105 . consider the following further property: (P14): Least Upper Bound Axiom (LUB): Let S be a nonempty subset of F which is bounded above. Recall that we began the course by considering the real numbers as a set endowed with two binary operations + and · together with a relation <. Among structures F satisfying the ordered field axioms. So there must be some further axiom. x ≤ M . M ≤ M 0 . of R which is needed to prove the three Interval Theorems. but it is so important that we had better make sure. However we did not claim that (P0) through (P12) was a complete list of axioms for R. the ordered field axioms.) By a least upper bound for a subset S of F . CHAPTER 6 Completeness 1.1. m ≤ x. This means exactly what it sounds like. a subset S of R is bounded below if there exists m ∈ F such that for all x ∈ S. we mean an upper bound M which is less than any other upper bound: thus. my friends: the time has come to tell what makes calculus work. we saw that this could not be the case: for instance the rational numbers Q also satisfy the ordered field axioms but – as we have taken great pains to point out – most of the “big theorems” of calculus are meaningful but false when regarded as results applied to the system of rational numbers. among others. namely the supremum of S. Here it is. Recall a subset S of F is bounded above if there exists M ∈ R such that for all x ∈ S. Then S admits a least upper bound. Gather round. We also introduce the notation lub S = sup S for the supremum of a subset S of an ordered field (when it exists). or property. Introducing (LUB) and (GLB). Dedekind Completeness 1. (For future reference. On the contrary. and satisfying a longish list of familiar axioms (P0) through (P12). There is a widely used synonym for “the least upper bound of S”. We then showed that using these axioms we could deduce many other familiar proper- ties of numbers and prove many other identities and inequalities.

Part b) of Theorem 6.1 really means the following: if F is any Dedekind complete ordered field then there is a bijection f : F → R which preserves the addition. but I won’t trouble you with them. Taking the risk of introducing even more terminology. I have my reasons for preferring the lengthier terminology.1 involves constructing the real numbers in a mathe- matically rigorous way. The definition of the least upper bound of a subset S makes sense for any set X endowed with an order relation <. or infimum. be- cause one of them will be larger than the other! Rather what is in question is the existence of least upper bounds.. or an infimum of S. Spivak does give a construction of R and a proof of Theorem 6.e. y ∈ F . +. COMPLETENESS S is an upper bound M for S with the property that for any M 0 < M . if we treat this material at all it will be at the very end of the course. Notice that the uniqueness of the supremum sup S is clear: we cannot have two different least upper bounds for a subset. m = inf S iff m is a lower bound for S and for any m0 > m there exists x ∈ S with x < m0 . 1It is perhaps more common to say “complete” instead of “Dedekind complete”. b) Conversely. this sense is similar to the one in which every serious student of computer science should build at least one working computer from scratch: in practice. Theorem 6. After discussing least upper bounds. and although in some sense every serious student of mathematics should see a construction of R at some point of her career. <) is Dedekind complete1 if it satisfies the least upper bound axiom. it is only natural to consider the “dual” con- cept of greatest lower bounds. multiplication and order structures in the following sense: for all x. then f (x) < f (y). Now here is the key fact lying at the foundations of calculus and real analysis. Then S admits a greatest lower bound. then a greatest lower bound for S. ·. One may take part b) to mean that there is essentially only one Dedekind complete ordered field: R. • f (xy) = f (x)f (y).106 6. a) The ordered field R is Dedekind complete. This is something of a production. one can probably get away with relying on the fact that many other people have performed this task in the past.1. Equivalently. any Dedekind complete ordered field is isomorphic to R. . M 0 is not an upper bound for S: explicitly. This concept of “isomorphism of structures” comes from a more advanced course – abstract algebra – so it is probably best to let it go for now. And indeed. for all M 0 < M . m ≤ x for all x ∈ S – and is such that if m0 is any lower bound for S then m0 ≤ m. The proof of Theorem 6. Again. we say that an ordered field (F. Now consider: (P140 ): Greatest Lower Bound Axiom (GLB): Let S be a nonempty subset of F which is bounded below. • f (x + y) = f (x) + f (y). this means exactly what it sounds like but it is so important that we spell it out explicitly: if S is a subset of an ordered field F . and (LUB) is an assertion about this. there exists x ∈ S with M 0 < x. and • If x < y.1 in the “Epilogue” of his text. is an element m ∈ F which is a lower bound for S – i.

But. even though we have not as yet addressed the issue of whether R satisfies (GLB). what in precalculus mathematics one cavalierly writes as M = 2. DEDEKIND COMPLETENESS 107 Example 1. there is no rational number x with x2 = 2..) (LUB) =⇒ (GLB): Let S ⊂ F be nonempty and bounded below by m. a) Then F satisfies (LUB) iff it satisfies (GLB).. In general. Of course in the previous inequality we could do better: for instance. if m = inf SF exists. b) In particular R satisfies both (LUB) and (GLB) and is (up to isomorphism) the only ordered field with this property. so in particular SR has a supremum. according to Theorem 6. you are asked to do things the other way around. which must be 2. 1.e.. in the following way: I will use the first argument to show that (LUB) =⇒ (GLB) and the second argument to show that (GLB) =⇒ (LUB). inf SF exists iff there is an element y < 0 in F with y 2 = 2. then x2 > 94 > 2. Both of these arguments is very nice in its own way..which is of course a theorem that we have √ exalted but not yet proved.. An interesting feature of this example is that we can see that inf SR exists. so if F = Q then our set SQ has neither an infimum nor a supremum. we may consider the subset SF = {x ∈ F | x2 < 2}. Let F be an ordered field. so also −3 3 2 is a lower bound for SF and 2 is an upper bound for SF . This turns out to be a very general phenomenon. we certainly believe that there is a real number whose square is 2. But here’s√ the point: how do we know that our ordered field F contains such an element 2? The answer of course is that depending on F such an element may or may not exist. a) I know two ways of showing that (LUB) ⇐⇒ (GLB). we necessarily must also have a negative element y with y 2 = 2: namely. Whether such best possible inequalities exist depends on the ordered field F .1: In any ordered field F . and I don’t want to have to choose between them. Proof. the existence of a real square root of every non-negative real number is a consequence of the Intermediate Value Theorem.e. Thus Q does not satisfy (LUB) or (GLB).. So I will show you both.. or what we usually write as − 2.. then it must be a positive element of F with M 2 = 2:√or in other words. Indeed the bounded set SF will have an infimum and a supremum if and only if there are best possible inequalities x ∈ S =⇒ m ≤ x ≤ M . then√it must be a negative element of F with m2 = 2. it is clear that if M = sup SF exists. i. Consider −S = {−x | x ∈ S}. i. . if |x| > 23 . y = −x. for which no improvement on either m or M is possible. As we saw at the beginning of the course. Indeed. similarly.2. then |x| ≤ 2. (In Exercise 1.2 below. But okay: if in F we have a positive element x with x2 = 2. Then SF is nonempty and bounded: indeed 0 ∈ SF and if x ∈ SF .1 every nonempty bounded above subset √ of R has a supremum. On the other hand. Theorem 6. A more fundamental answer is that we believe that 2 exists in R because of the Dedekind completeness of R.why do we believe this? As we have seen. Of course we could do better still.

Henceforth. Suppose that sup L(S) exists. A Dedekind complete ordered field is Archimedean. or more symbolically: inf S = − sup −S. COMPLETENESS Then −S is nonempty and bounded above by −m.1b) R is the only ordered field satisfying (LUB).108 6. if F were Dedekind complete then sup Z+ would exist.2: a) Fill in the details of the proof of Theorem 6. . Then the subset Z+ of F is bounded above by x. Theorem 6. and thus by part a) it satisfies (GLB).3. It seems likely that by now we have used this type of argument more than any other single trick or technique.. so certainly it is the only ordered field satifying (LUB) and (GLB). Also U(S) is bounded below: indeed any s ∈ S (there is at least one such s. When this has come up we have usually used the phrase “and similarly one can show. e) Use part d) to give a second proof that (LUB) =⇒ (GLB). By Theorem 6. (GLB) =⇒ (LUB): Let S be nonempty and bounded above by M .”! In view of Theorem 6. when we wish to multiply by −1 to convert ≤ to ≥. and let S be a subset of F . So. b) Let F be an ordered field. we will say by reflection. We claim that in fact inf U(S) is a least upper bound for S.}. We claim that in fact − sup(−S) is a greatest lower bound for S.. and we shall do so.2a).” Perhaps this stalwart ally deserves better. Show that inf S exists and inf S = sup L(S). max to min.2 it is reasonable to use the term Dedekind complete- ness to refer to either or both of (LUB). Exercise 1.2 below..2 asks you to check this. Define L(S) = {x ∈ F | x is a lower bound for S. or more succinctly. it has a least upper bound sup(−S). You are asked to check this in Exercise 1. But we claim that in no ordered field F does Z+ have a supremum. We will prove the contrapositive: let F be a non-Archimedean ordered field: thus there exists x ∈ F such that n ≤ x for all n ∈ Z+ . sup S = inf U(S). By (LUB).}. This seems more appealing and also more specific than “similarly.1a). Indeed.  Exercise 1. sup to inf and so forth. By (GLB) U(S) has a greatest lower bound inf U(S). so in particular it is nonempty and bounded above. Once again. Show that sup −S exists and sup −S = − inf S. (GLB). Consider U(S) = {x ∈ F | x is an upper bound for S. Suppose that inf S exists. The technique which was used to prove (LUB) =⇒ (GLB) is very familiar: we multiply everything in sight by −1. since S 6= ∅!) is a lower bound for U(S). R satisfies (LUB). Proof. b) By Theorem 6. d) Let F be an ordered field. c) Use part b) to give a second proof that (GLB) =⇒ (LUB).. and let S be a subset of F . Then U(S) is nonempty: indeed M ∈ U(S).

suppose m ∈ R is a lower bound for S such that for all  > 0. the state- ment should be understood as including the assertion that sup S exists. Then sup(X + Y ) = sup X + sup Y.2 Proposition 6.d) These follow from parts a) and b) by reflection. Show that a ≤ b. a) Let x ∈ X. a ≤ b + .e. Since this folds for all  > 0. Then for every  > 0. Calisthenics With Sup and Inf. suppose M ∈ R is an upper bound for S such that for all  > 0. Suppose that for all  > 0. Since sup S is the least upper bound of S and sup S − < sup S. there exists x ∈ S such that inf S ≤ x < inf S + . or equivalently. Let X. Let S be a nonempty subset of R. It follows that for all n ∈ Z+ . By Proposition 6. sup Y − 2 < y. a) Suppose X and Y are bounded above. b) This follows from part a) by reflection.e.  2Notice that a similar convention governs the use of lim x→c f (x). Y be nonempty subsets of R.3: Let a. so this is nothing new.2. a) Suppose S is bounded above. there exists x ∈ S with M −  < x ≤ M . Proof. Proposition 6. Now fix  > 0. i. c).4. so x + y ≤ sup X+sup Y . there exists x ∈ S such that sup S −  < x ≤ sup S. and define X + Y = {x + y | x ∈ X. there exists x ∈ S with m ≤ x ≤ m + . n + 1 ≤ M . d) Conversely. M is an upper bound for S and nothing smaller than M is an upper bound for S. by Exercise 1. that S is nonempty and bounded below. It follows that sup S −  < min(y. for all n ∈ Z+ . But then it is equally true that for all n ∈ Z+ .13]. so sup X + sup Y ≤ x + y + . n ≤ M −1.. b) Conversely. so we may take x = min(y. DEDEKIND COMPLETENESS 109 suppose that M = sup Z+ . so M −1 is a smaller upper bound for Z+ than sup Z+ : contradiction!  1. sup S) ≤ sup S.  Exercise 1. i.3.4 there are x ∈ X and y ∈ Y with sup X − 2 < x. a) Fix  > 0. b ∈ R. The material and presentation of this section is partly based on [A. c) Suppose S is bounded below. Then M = sup S. that S is nonempty and bounded above. b) Suppose X and Y are bounded below. Similarly for inf S: when it appears in the conclusion of a result then an implicit part of the conclusion is the assertion that inf S exists. Then x ≤ sup X and x ≤ sup Y ..3 sup X + sup Y ≤ sup(X + Y ). Then for every  > 0. §1. y ∈ Y . b) By hypothesis.5. sup S). Then m = inf S. Proof. y ∈ Y }. and thus sup(X+Y ) ≤ sup X+sup Y . convention: Whenever sup S appears in the conclusion of a result. n ≤ M . there exists y ∈ S with sup S −  < y. . Then inf(X + Y ) = inf X + inf Y. 1. so indeed M = sup S.

we are suggesting the following definition: • If S ⊂ R is unbounded above. Because N contains arbitrar- ily large elements of R. Let X. Y be subsets of R. As exciting and useful as this whole business with sup and inf is.  Proposition 6. We write X ≤ Y if for all x ∈ X and all y ∈ Y . and for inf S to be defined. The Extended Real Numbers. In other words. (In a similar way we define X < Y. Put sup X − inf Y = .4 there are x ∈ X. Surely we also want to make the following definition (“by reflection”!): • If S ⊂ R is unbounded below. we suppose that inf Y < sup X. X > Y . a contradiction. These definitions come with a warning: ±∞ are not real numbers! They are just symbols suggestively standing for a certain type of behavior of a subset of R. Exercise 1.7. there is one slightly annoying point: sup S and inf S are not defined for every subset of R. for sup S to be defined. Proof.6. it is not completely unreasonable to say that its elements approach infinity and thus to set sup N = +∞. COMPLETENESS Let X. S must be nonempty and bounded above. simpler) way as when we write limx→c f (x) = ±∞ and mean that the function has a certain type of behavior near the point c. Then: a) If Y is bounded above. x ≤ y. . Since X ≤ Y this gives sup X −  < x ≤ y < inf Y +  and thus sup X − inf Y < 2 = sup X − inf Y. in fact. then inf Y ≤ inf X. Let X. y ∈ Y with sup X −  < x and y < inf Y + . It is not bounded above. It involves bending the rules a bit. Y be nonempty subsets of R with X ⊆ Y .3. 1.5: Prove Proposition 6. Y be nonempty subsets of R with X ≤ Y . 2 By Proposition 6. Give necessary and sufficient conditions for X ≤ Y and Y ≤ X both to hold. Consider the subset N of R. so it does not have a least upper bound in R. X = Y is necessary but not sufficient!) Proposition 6. then we will say inf S = −∞. Then sup X ≤ inf Y. X ≥ Y . b) If Y is bounded below. Is there some way around this? There is. then we will say sup S = +∞. then sup X ≤ sup Y . S must be nonempty and bounded below. (Hint: in the case in which X and Y are both nonempty.) Exercise 1. Y be subsets of R. but in a very natural and useful way.7. Rather.4: Let X. in a similar (but. Seeking a contradiction.110 6.

as an ordered set. ∞ −∞ None of these definitions are really surprising.g. ±∞ .3 In a similar way. 1 1 = = 0. However. they correspond to facts you have learned about manipulating infinite limits. These are good reasons. again we might think in terms of associated limits. g(x) = . −∞ < x < ∞. are they? If you think about it. In fact. but lim f (x) + g(x) = lim 2011 = 2011. we define the extended real numbers [−∞. x→c x→c So ∞ − ∞ cannot have a universal definition independent of the chosen func- tions. (−∞) · (−∞) = ∞. b]: namely we have a largest and smallest element. ±∞ Why not? Well. if limx→c f (x) = ∞ and limx→c g(x) = 17. e. g(x) = (x−c) 2 . we may extend the ≤ relation to the extended real numbers in the obvious way: ∀x ∈ R. ∀x ∈ (−∞. x · ∞ = −∞. ∞). ∞ · ∞ = ∞. x · ∞ = ∞. Conversely much of the point of the extended real numbers is to give the real numbers. As a simple example. then limx→c f (x) + g(x) = ∞. ∀x ∈ (0. 1. Again. −∞ + x = −∞. certain other operations with the extended real numbers are not defined. However. consider 2011 ∞ something like f (x) = (x − c)2 . unless you know what specific functions f and g are. try constructing another example. This extension is primarily order-theoretic: that is. And similarly for ∞ . x + ∞ = ∞. The extended real numbers [−∞. 0 · ∞. In particular we do not define ∞ − ∞. for similar reasons.or wait until next semester and ask me again. x · (−∞) = ∞. it is useful to define some of these operations: ∀x ∈ R. limx→c g(x) = −∞. ∞ · (−∞) = −∞. we cannot even define the operations of + and · unrestrictedly on them. ∞] are not a field. bounded interval [a. ∞] = R ∪ {±∞} to be the real numbers together with these two formal symbols −∞ and ∞. suppose 1 −1 f (x) = + 2011. DEDEKIND COMPLETENESS 111 To give a name to what we have done. the pleasant properties of a closed. x · (−∞) = −∞. then limx→c f (x)g(x) depends on how fast f approaches zero compared to how fast g approaches infinity. However.. The above are indeterminate forms: if I tell you that limx→c f (x) = ∞ and limx→c g(x) = −∞. 0).. when evaluating limits 0 · ∞ is an indeterminate form: if limx→c f (x) = 0 and limx→c g(x) = ∞. (x − c)2 (x − c)2 Then limx→c f (x) = ∞. then what can you tell me about limx→c f (x) + g(x)? Answer: nothing. . there are also more purely algebraic reasons: there is no way to define the above expressions in such a way to make the field 3In the unlikely event you think that perhaps ∞−∞ = 2011 always.

. In particular though. then the sequence sup X n is not eventually constant.. COMPLETENESS axioms work out. X = X N = X N +1 = . then for X which is unbounded above. x + 1 > x.7: it says that. . i. . If therefore we were allowed to substract ∞ from ∞ we would deduce a = ∞ − ∞. . because it follows easily from the field axioms that for any x ∈ F . then sup X ≤ sup Y. all of the “calisthenics” of the previous sec- tion have nice analogues for unbounded sets. Xn = {x ∈ X | x ≥ n}. and thus ∞ − ∞ could be any real number: that’s not a well-defined operation. In speaking about elements like x we sometimes call them infinitely large. inf Y ≤ inf X. if X is bounded above. i.7. for which there exist elements x such that x > n for all n ∈ Z+ . For instance. ⊂ X and also ∞ [ X= X n. a similar discussion holds for inf X. but in increasingly generous ways. The idea here is that in defining X n we are cutting it off at n in order to force it to be bounded above. in fact it takes increasingly large values. and in particular limn→∞ sup X n = sup X. we get that for every nonempty subset X of R. . Then some N ∈ Z+ is an upper bound for X. . and thus lim sup X n = ∞. that if X ⊂ Y ⊂ R. We leave it to the reader to investigate this phenomenon on her own.e. This definition could have motivated our definition of sup and inf for unbounded sets. under conditions ensuring that the sets are nonempty and bounded above / below. so the sequence sup X n is eventually constant. n→∞ Thus if we take as our definition for sup X. sup X n ≤ . we get sup X = limn→∞ sup X n = ∞. n=0 in other words. Suppose moreover that X is bounded above. every element of X is a subset of X n for some n (this is precisely the Archimedean property). One of the merits of this extended definition of sup S and inf S is that it works nicely with calculations: in particular. limn→∞ sup X n . sup X 0 ≤ sup X 1 ≤ sup X 2 ≤ . By reflection. On the other hand.. . . Remark: Sometimes above we have alluded to the existence of ordered fields F which do not satisfy the Archimedean axiom.112 6. . as follows: for n ∈ Z and X ⊂ R. let a ∈ R. no ordered field F can have a largest element x. We have X0 ⊂ X1 ⊂ . Then a + ∞ = ∞. The moral: although we call ±∞ “extended real numbers”. put X n = {x ∈ X | x ≤ n}.. one should not think of them as being elements of a number system at all. let’s look back at Proposition 6. Applying Proposition 6.e. but rather limiting cases of such things. . Indeed. This is a totally different use of “infinity” than the ex- tended real numbers above.

Namely. (−∞. by reflection. There is exactly one extended real number which is less than or equal to every integer: −∞. at some upper bound of S – and keep pushing to the left until you can’t push any farther without passing by some element of S. Pn has a single element. so start anywhere and push to the left: you can keep pushing as far as you want. (−∞.e. so to speak. we must have sup ∅ ≤ sup{n} = n and inf ∅ ≥ inf{n} = n. because you will never hit an element of the set. b). I claim we can use these sets Pn along with Proposition 6. Therefore the inexorable conclusion is sup ∅ = −∞. if we didn’t have so many different kinds of intervals to write down and check. b). ∞).7 to see what inf ∅ and sup ∅ should be. Are there any nonempty convex sets other than intervals? (Just to be sure. in class I had a lot of success with the “pushing” conception of suprema and infima. the entire interval [x. We say that a subset S of R is convex if for all x < y ∈ S. This is immediate – or it would be. inf ∅ = ∞. because ∅ ⊂ {n}.e. (−∞. then for all n ∈ Z. Example 2..7.7. the integer n. if your set S is bounded above. every real number is an upper bound for ∅. to define these quantities in such a way as to obey Proposition 6..1. y] lies in S. one last piece of business to attend to: we said we wanted sup S and inf S to be defined for all subsets of R: what if S = ∅? There is an answer for this as well. a convex set is one that whenever two points are in it. Namely. [a. however. For any x ∈ R. b]. we . 2. [a. ∞). there is nothing to check! Example 2. b). One needs to see that the definition applies to invervals of all of the following forms: (a. Similarly for infima. Similarly. Certainly then inf Pn = sup Pn = n. the singleton set {x} is convex. Thus you can push all the way to −∞. but many people find it confusing and counterintuitive at first. then you start out to the right of every element of your set – i. all in between points are also in it. there is exactly one extended real number which is greater than or equal to every integer: ∞. Convex subsets of R. 2. ∞). All these verifications are trivial appeals to things like the transitivity of ≤ and ≥. What happens if you try this with ∅? Well. so let me approach it again using Proposition 6. [a. Other reasonable thought leads to this conclusion: for instance. b]. In both cases the definition applies vacuously: until we have two distinct points of S. INTERVALS AND THE INTERMEDIATE VALUE THEOREM 113 There is.1: The empty set ∅ is convex. Intervals and the Intermediate Value Theorem 2. For each n ∈ Z. consider the set Pn = {n}: i. In other words.2: We claim any interval is convex. So what? Well. b]. (a. (a.

namely (− 2. But since z ∈ D and b = sup D. and since D is unbounded above there exists b ∈ D with z < b. Similarly. Most contemporary analysts prefer a rival construction of R due to Cauchy using Cauchy sequences. we include a. I agree that Cauchy’s construction is simpler. Thus I = D = I = R. ∞). with certain modifications. Step 2: We claim that any subset D which contains I and is contained in I is an interval. Remark: Perhaps the above example seems legalistic. since z < b = sup D.  4However. there exists a ∈ D with a < z. ∞) be the infimum of D. b). i. e. there exists c ∈ D with c < z. and let z ∈ I. Since D is unbounded below. . When one looks carefully at the definitions it is no trouble to check that working solely in the rational numbers S is a convex set but is not an interval. 2). then there exists d ∈ D with z < d.e. ∞] be the supremum of D. b]. and let b ∈ (−∞. there exists a ∈ D with a < z. Dedekind’s construction works in the context of a general linearly ordered set to construct an associated Dedekind-complete ordered set.114 6. Case 1: Suppose I = (a. both are important in later mathemat- ics: Cauchy’s construction works in the context of a general metric space (and. but once again ± 2 6∈ Q. if b is finite. we want z ≤ b. We must have inf D = a ≤ z ≤ b = sup D. COMPLETENESS count {x} = [x. Throughout these notes the reader may notice minor linguistic contortions to ensure that this issue never arises. Case 4: Suppose I = (−∞.8. Let z ∈ R. this is immediate. Since D is convex. we do not wish to say whether the empty set is an interval.. Case 3: Suppose I = (a. z ∈ D. This√ has √ a familiar theme: replacing √ Q by R we would get an interval. if a is finite. but as above an interval can have any one of nine basic shapes. It may be quite tedious to argue that one of nine things must occur! So we just need to set things up a bit carefully: here goes: let a ∈ [−∞. We are given a nonempty convex subset D of R and we want to show it is an interval. let z ∈ D. Every nonempty convex subset D of R is an interval. S = {x ∈ Q | x2 < 2}. It really isn’t: one may surmise that contemplation of such examples led Dedekind to his construction of the real numbers via Dedekind cuts. since z < sup D. we include b. Let x ∈ I. z ∈ D. b ∈ R. However. Then. b ∈ R – in this case D could also be [a. b) with a. b). We wish to show that z ∈ I = (−∞. Since D is convex. in other words.g. there exists b ∈ D such that z < b. Since D is unbounded below. because if we work over Q with all of the corresponding definitions then there are nonempty convex sets which are not intervals. Moreover. Since D is convex. ∞) = R. This is similar to Case 2 and is left to the reader. Thus I ⊂ D ⊂ I. Next. and both are intervals. or maybe even a little silly. b) or (a. This construction may be discussed at the end of this course. Theorem 6. b]. and the only case in which there is any subset D strictly in between them is I = (a. Let z ∈ (a. Step 1: We claim that I ⊂ D ⊂ I. z ∈ D. in a general uniform space) to construct an associ- ated complete space. x] as an interval. Let I = (a. b).4) A little thought suggests that the answer should be no. Case 2: Suppose I = (−∞. since z > a = inf D. Now suppose z ∈ D. Indeed I and I are both intervals. b) with a. Proof. and let I be the closure of I. But more thought shows that if so we had better use the Dedekind com- pleteness of R.

Step 1: We make the following claim: if f : [a. proof of claim: Let S = {x ∈ [a.  2. then for all z with x < z < y. b) such that f (c) = 0. e) Deduce the converse to Theorem 6. b). Show that for all nonempty subsets S of F . Moreover S is bounded above by b. Since a ∈ S. contradicting the Intermediate Value Property. we define the associated downset S↓ = {x ∈ F | x ≤ s for some s ∈ S}. By Theorem 8. b]) is an interval. d) In any ordered field F . then f satisfies the Intermediate Value Property and thus f (I) is an interval. b] ⊂ I. the downset S↓ is convex. and any L in between f (a) and f (b). Theorem 6. b] → R is continuous. Proof. b] ⊂ D. (iii) For any interval I ⊂ D. (ii) =⇒ (iii): Suppose that f satisfies IVP. b] is an interval. the IVP is closely related to the notion of a convex subset. Freiwald): Let F be an ordered field. f ([a. in which case any element of (c − δ. if f (c) > 0. (i) =⇒ (ii): For all [a. then – as we have seen several times – there exists δ > 0 such that f (x) < 0 for all x ∈ (c − δ. and thus there are elements of S larger than c. so by assumption f ([a. Proof. 2. By the process of elimination we must have f (c) = 0! Step 2: We will show that f satisfies the Intermediate Value Property: for all [a. As you may well have noticed. hence a convex subset. b) such that . we say a subset S of F is convex if x. then F is Dedekind complete.2. The following result clarifies this connection. Assume not: then there exists a < b ∈ I and some L in between f (a) and f (b) such that L 6= f (c) for any c ∈ I. then there exists δ > 0 such that f (x) > 0 for all x ∈ (c − δ. and let I ⊂ D be an interval. Theorem 6. (iii) =⇒ (i): This is immediate: [a. (Strong Intermediate Value Theorem) If f : I → R is contin- uous. c + δ). b]. (ii) f satisfies the Intermediate Value Property. The (Strong) Intermediate Value Theorem. Therefore S has a least upper bound c = sup S. We want to show that f (I) is an interval. z ∈ S. a) Show that S 6= ∅ ⇐⇒ S↓ 6= ∅.1 it suffices to show that f (I) is convex.10.9. f ([a. c) gives a smaller upper bound for S than c. Similarly. f (I) is an interval. if f (c) < 0. y ∈ S with x < y.8: if F is an ordered field in which every nonempty convex subset is an interval. b] ⊂ D. we must find c ∈ (a. b) Show that S is bounded above ⇐⇒ S↓ is bounded above. the following are equivalent: (i) For all [a. then there exists c ∈ (a. Recall that a function f : D → R satisfies the Intermediate Value Property (IVP) if for all [a. For a subset S of F . b] ⊂ D. and if so then we have sup S = sup S↓ . c) Show that S has a supremum in F ⇐⇒ S↓ has a supremum in F . f (a) < 0 and f (b) > 0. b]) is a convex subset of R. for all L in between f (a) and f (b) is of the form f (c) for some c ∈ (a. b]) is a convex subset containing f (a) and f (b). Indeed. For f : D ⊂ R → R. In particular L 6= f (c) for any c ∈ [a. It is easy to see that we must have f (c) = 0. INTERVALS AND THE INTERMEDIATE VALUE THEOREM 115 Exercise (R. contradicting c = sup S. c + δ). b] | f (x) < 0}. hence it contains all numbers in between f (a) and f (b). S is nonempty.

The Monotone Jump Theorem Theorem 6. Then limx→b− f (x) exists and is at most f (c). (Monotone Jump) Let f : I → R be weakly increasing. We define f : F → F by: • f (x) = −1. Proof. Thus (22) l ≤ f (c) ≤ r. moreover g(a) = f (a) − L < 0 and g(b) = f (b) − L > 0. f (c) is an upper bound for L. We will prove the contrapositive: suppose F is not Dedekind complete. f (c) ≤ f (x). R = {f (x) | x ∈ I. The Intermediate Value Theorem Implies Dedekind Complete- ness. not just an interval of the form [a. such that f (c) = L. Therefore we may define l = sup L. and lim f (x) ≤ f (c) ≤ lim f (x).12. x ∈ U(S).  Remark: Theorem 6. f (x) ≤ f (c). For all c < x. a) Step 0: As usual. but it never takes the value zero. If f (a) = f (b) there is nothing to show.  Exercise 2.9. x→c− x→c+ b) Suppose I has a left endpoint a. a point of discontinuity would occur only at the least upper bound of S. • f (x) = 1. Then limx→a+ f (x) exists and is at least f (a). x > c}.11 is continuous at every element of F . f (I) is an interval.e. which is assumed not to exist. so f (c) is a lower bound for R. L is bounded above by f (c) and U is bounded below by f (c). Then limx→c− f (x) and limx→c+ f (x) exist. so is g.11. b]. x < c}. we may f is weakly increasing. Then f is continuous on F – indeed. so f (c) ≤ r. This version of IVT applies to continuous functions with domain any interval. . so f does not satisfy IVP. Theorem 6.. Now consider the function g(x) = f (x)−L. which cannot be an upper bound for S because then it would be the maximum element of S – and the value 1 at any upper bound for S (we have assumed that S is bounded above so such elements exist). so l ≤ f (c). i. Then F is Dedekind complete. Let F be an ordered field such that every continuous function f : F → F satisfies the Intermediate Value Property. Step 1: For all x < c. Proof. COMPLETENESS f (c) = L.116 6. 2. and includes a result that we previously called the Interval Image Theorem. Since f is continuous. Therefore by Step 1 there exists c ∈ (a. b) such that 0 = g(c) = f (c) − L. and let S ⊂ F be nonempty and bounded above but without a least upper bound in F . We define L = {f (x) | x ∈ I. Let U(S) be the set of upper bounds of S. so it is enough to treat the case f (a) < L < f (b). Moreover f takes the value −1 – at any element s ∈ S.10 is in fact a mild improvement of the Intermediate Value Theorem we stated earlier in these notes. If f (a) > f (b). r = inf R.3: Show in detail that the function f : F → F constructed in the proof of Theorem 6. a) Let c be an interior point of I. then we may replace f by −f (this is still a continuous function). 3. if x ∈ / U(S). Since f is weakly increasing. Step 3: By Step 2 and Theorem 6. c) Suppose I has a right endpoint b.3.

so has a (finite!) greatest lower bound inf S 0 . (i) =⇒ (ii): If f is not continuous on all of I. from which the full theorem easily follows. Recall that the key special case of IVT. c < d. Case 3: a < inf S 0 ∈ S 0 . y 0 ] ⊂ S. Then S = [a. choose any b ∈ I. (RI3) For all x ∈ R. Otherwise. we get an especially snappy proof of the Continuous Inverse Function Theorem. (ii) =⇒ (i): This follows immediately from Theorem 6. x) ∈ S. In more detail: suppose f is discontinuous at c. if x 6= b there exists y > x such that [x.  With Theorems 6. It is bounded below by a. then by the Monotone Jump Theorem f (I) fails to be convex. by (RI2) there exists y > inf S 0 such that [inf S 0 . is this: if f : [a. b] \ S is nonempty. For a monotone function f : I → R. f −1 is continuous! 4. b < c. Then f (I) contains f (c) < f (d) but not the in-between point limx→c+ f (x). let  > 0. Then f (I) contains f (b) < f (c) but not the in-between point limx→c− f (x). so by Theorem 6. Since f is monotone. f (I) = J is an interval. 4.13. there exists y > a such that [a. Step 3: We claim limx→c+ f (x) = r: this is shown as above and is left to the reader. a ∈ S. Similar arguments hold if c is the left or right endpoint of I: these are left to the reader. Then by (RI1). Since f is weakly increasing. l −  is not an upper bound for L: there exists x0 < c such that f (x0 ) > l − . (Principle of Real Induction) Let a < b be real numbers. As usual. Moreover f −1 (J) = I is an interval. REAL INDUCTION 117 Step 2: We claim limx→c− f (x) = l. Seeking a contradiction we suppose not: S 0 = [a. Step 4: Substituting the results of Steps 2 and 3 into (22) gives the desired result. Proof. and suppose: (RI1) a ∈ S. if [a.13. Thus we may take δ = c − x0 . then S = [a. Real Induction Theorem 6. so by (RI3) inf S 0 ∈ S: contradiction!  Example 4. choose any d ∈ I. Proof. and thus y is a greater lower bound for S 0 then a = inf S 0 : contradiction. Moreover f : I → J is a bijection. Case 2: a < inf S 0 ∈ S. b].1: Let us reprove the Intermediate Value Theorem. Since l is the least upper bound of L and l −  < l. Thus in all cases f (I) is not convex hence is not an interval. let S ⊂ [a. so by (RI2). inf S 0 ) ⊂ S. (RI2) for all x ∈ S. In the latter case. By Theorem 6. then x ∈ S.14. so is f −1 . Then [a.  Theorem 6. b) and c): The arguments at an endpoint are routine modifications of those of part a) above and are left to the reader as an opportunity to check her understanding. it is no loss of generality to assume f is weakly increasing. b] → . b]. with inverse function f −1 : J → I. (ii) f is continuous. y] ⊂ S. Let f : I → R be continuous and injective.10. In the former case. for all x0 < x < c we have l −  < f (x0 ) ≤ f (x) ≤ l < l + . To see this. However: Case 1: inf S 0 = a.10. b]. If inf S 0 = b. y] ⊂ S.10 and 6.13 in hand. If c is an interior point then either limx→c− f (x) < f (c) or f (c) < limx→c+ f (x). contradicting the definition of inf S 0 . TFAE: (i) f (I) is an interval.

e.14. x) ⊂ S. b] | f (x) ≥ 0}. x]. then the element b ∈ is a lower bound for T . x + δ]. Now suppose that S 6= [a. y) such that z ∈ / S. which suffices. taking any y ∈ (t. f (a) > 0. so applying real induction shows that one of (RI1).. b]. then by continuity there would exist δ > 0 such that f is negative on [x−δ. So (RI2) must fail: there exists y ∈ (a. Nevertheless we claim that S satisfies (RI1) and (RI3).. b]. Step 2: Since F satisfies the Principle of Real Induction. (ii) F satisfies the Principle of Real Induction: for all a < b ∈ F . so a ∈ S. (RI2) and (RI3) must fail. IVT is equivalent to: let f : [a. b] by real induction. We have a ∈ S – so (RI1) holds – and if a continuous function is non-negative on [a.. We know that S is proper in [a. i. We claim that f (x) > 0. We prove this by Real Induction. i. so by Step 1 S does not satisfy (RI2): there exists x ∈ S. b]. b].15.. This implies f (y) = 0. We will show that T has an infimum. b] → R be continuous and nowhere zero. then there exists c ∈ (a. Indeed. a contradiction. x]: contradiction! The following result shows that Real Induction does not only uses the Dedekind completeness of R but actually carries the full force of it. For this. x). b]. b] satisfying (RI1) through (RI3) above must be all of [a. then it is also non-negative at c: (RI3). (RI2) Let x ∈ S. x) is a lower bound for T . x) ⊂ S. let S be the set of lower bounds m of T with a ≤ m. redux: In class I handled the proof of IVT by Real Induction differ- ently. we get that y is not a lower bound for T either. b] and [a. Let S = {x ∈ [a. c). as follows. f is both positive and negative at each point of [x−δ. b] | f (x) > 0}. (RI3): Suppose x ∈ (a. (RI1): Since a is a lower bound of T with a ≤ a. by Step 1 S = [a. Step 1: Observe that b ∈ S ⇐⇒ b = inf T . b]. b] be such that [a.118 6. In general the infimum could be smaller. we have a ∈ S. In an ordered field F . We prove this by real induction. If f (a) > 0. there exists t ∈ T such that t < x. Proof. x + δ] ⊂ S.. the only other possibility is f (x) < 0. In other words x is a lower bound for T and no element larger than x is a lower bound for T . Then S ⊂ [a. x < b. so f (x) > 0. b] iff S satisfies (RI2).e. the following are equivalent: (i) F is Dedekind complete: every nonempty bounded above subset has a supremum. Theorem 6. so every y ∈ [a.  . y + ). COMPLETENESS R is continuous. f (a) < 0 and f (b) > 0. x). Then f (b) > 0 iff b ∈ S. a subset S ⊂ [a. (RI1) By hypothesis. We will show S = [a. then f (b) > 0. Let b be any element of T . since f (x) 6= 0. This strategy follows [Ka07]. so our strategy is not exactly to use real induction to prove S = [a. and in a way which I think gives a better first example of the method (most Real Induction proofs are not by contradiction). b] such that f (y) ≥ 0 but there is no  > 0 such that f is non-negative on [y. b) with f (c) = 0.1. f is positive on [a. If S = [a. (i) =⇒ (ii): This is simply a restatement of Theorem 6. (RI3) Let x ∈ (a. there exists z ∈ (x. z is not a lower bound for T . Namely. and thus [x. (ii) =⇒ (i): Let T ⊂ F be nonempty and bounded below by a ∈ F .e. i. Example 4. Let S = {x ∈ [a. so it must be the infimum of T . Since f is continuous at x. Then x is a lower bound for T : if not. but if so. there exists δ > 0 such that f is positive on [x. x < b such that for any y > x.so x = inf T .

1 These are really different statements: for instance. f (x) = x−2 is bounded on [0. this simply means that every element x ∈ S is also an element of Xi for at least one i. x − δ]. b] gives rise to an unbounded continuous function gM : [a. b] | f : [a. so is bounded near x: for instance. x] → R is bounded}. Then the covering {Ui }i∈I has a finite . gm is bounded. Having said this.” 5. Suppose that there does not exist x ∈ [a. xM ∈ [a. b] such that f (xm ) = m. x]. |f (y)| ≤ |f (x)| + 1.17. (RI1): Evidently a ∈ S. given any ordered set (F. x]. Similarly. assuming that f (x) < M for all x ∈ 1 [a. (RI2): Suppose x ∈ S. b] 1 and the function gm : [a. But. Now beware: this does not say that f is bounded on [a. <) – i.16. x 7→ M −f (x) . it becomes clear that we can proceed almost exactly as we did above: since f is continuous at x. So indeed there must exist xm ∈ [a. THE HEINE-BOREL THEOREM 119 Remark: Like Dedekind completeness. By part a) we have −∞ < m ≤ M < ∞. I proved that Theorem 6.e. so that f is bounded on [a. there exists 0 < δ < x − a such that f is bounded on [x − δ. In fact. b] → R. and let {XiS }i∈I be a family of subsets of R.. f is bounded on [a. We say that the family {Xi } covers S if S ⊂ i∈I Xi : in words. f (xM ) = M . b) Let m = inf f ([a. 2). But since a < x − δ < x we know also that f is bounded on [a. x) ⊂ S. 6. x): rather it says that for all a ≤ y < x. but this is absurd: by definition of the infimum. Theorem 6. b]). x + δ] and thus on [a. x] and also on [x. as usual. y] for all y < 2 but it is not bounded on [0. “Real Induction” depends only on the or- dering relation < and not on the field operations + and ·. By the result of part a). x]. b] with f (xM ) = M . The Heine-Borel Theorem Let S ⊂ R. x + δ]. contradicting part a).  6. a) Let S = {x ∈ [a. f (x) − m takes values less than n1 for any n ∈ Z+ and thus gm takes values greater than n for any n ∈ Z+ and is accordingly unbounded. So f is bounded on [a. (Extreme Value Theorem) Let f : [a. y]. So there exists xM ∈ [a. i. bounded interval by open intervals Ui . (Heine-Borel) Let {Ui }i∈I be any covering of the closed. there exists δ > 0 such that for all y ∈ [x − δ. b] such that f (xm ) = m.e. x + δ]. (RI3): Suppose that x ∈ (a. that the infimum and supremum are actually attained as values of f . In [Cl11]. the key feature of this counterexample is a lack of continuity: this f is not continuous at 2. Then: a) f is bounded. We want to show that there exist xm . we need not have operations + or · at all – it makes sense to speak of Dedekind completeness and also of whether an analogue of Real Induction holds. b]) and M = sup f ([a.. b] with f (x) = m: then f (x) > m for all x ∈ [a. b] → R be continuous. Proof. b] and [a. b) f attains a minimum and maximum value. b] → R by gm (x) = f (x)−m is defined and continuous. so f is bounded on [a. But then f is continuous at x.15 holds in this general context: an ordered set F is Dedekind complete iff the only it satisfies a “Principle of Ordered Induction. The Extreme Value Theorem Theorem 6.

Uniform Continuity 7. b] | U ∩ [a.17 to the open covering {Ix }. switching the order of quantifiers in general makes a big diffference in the meaning and truth of mathe- matical statements. A subset of R is called open if it is a union of open intervals. x) ⊂ S. COMPLETENESS subcovering: there is a finite subset J ⊂ I such that every x ∈ [a. . Although we used this δ to show that f is continuous at some arbitrary point c ∈ R. x] has a finite subcovering}. These two definitions are eerily (and let’s admit it: confusingly. By definition of open subset. Un covers [a. .  Exercise: The formulation of the Heine-Borel Theorem given above is superficially weaker than the standard one. b] lies in Uj for some j ∈ J. so {Ui }i∈J ∪ Uix covers [a. there exists a δ > 0 such that for all x1 . In the definition of uniform continuity. Let’s look at some simple examples. uniformly – across all values of x1 ∈ I. x2 ∈ I. . Proof. if |x1 − x2 | < δ then |f (x1 ) − f (x2 )| < . (RI3): if [a. continuity at every point of I – let us rephrase the standard -δ definition of continuity using the notation above. Ux must contain some open interval Ix containing x. Then f is uniformly continu- ous on I if for every  > 0. her choice of δ must work simultaneously – or. . there is an open subset Ux containing x. Thus f is uniformly continuous on R. The Definition. if |x1 − x2 | < δ then |f (x1 ) − f (x2 )| < . . The only difference is in the ordering of the quantifiers: in the definition of continuity. x − δ]. evidently the choice of δ does not depend on the point c: it works uniformly across all values of c. m 6= 0. player two only gets to hear the value of : thus. at first) similar: they use all the same words and symbols. Since x − δ ∈ S. x + δ] for some δ > 0. Namely: A function f : I → R is continuous on I if for every  > 0 and every x1 ∈ I.1: Let f : R → R by f (x) = mx + b. Of course. b]. In order to show what the difference is between uniform continuty on I and “mere” continuity on I – i. let S = {x ∈ [a. For an open covering U = {Ui }i∈I of [a.120 6. x] ∈ Uix . b]. and this is no exception. (Suggestion: for each x ∈ [a. In the usual statement of the Heine-Borel Theorem the Ui ’s are allowed to be arbitrary open subsets of R. Example 6. in the lingo of this subject. (RI1) is clear. We claim that f is uniformly continuous on R. then some Ui contains [x. In fact the argument that we gave for continuity  long ago shows this.. Apply Theorem 6. b] by Real Induction. That’s the only difference. We prove S = [a. (RI2): If U1 .17. x]. and let S δ > 0 be such that [x − δ. there exists δ > 0 such that for all x2 ∈ I. Key Examples. noindent Let I be an interval and f : I → R. because for every  > 0 we took δ = |m| .1. Show that this apparently more general version can be deduced from Theorem 6. there is a finite J ⊂ I with i∈J Ui ⊃ [a.e. x]. player two gets to hear the value of  and also the value of x1 before choosing her value of δ. let ix ∈ I be such that x ∈ Uix .) 7.

For instance. • f is (. x and x + 2δ are less than δ 2 apart. For .  So by taking δ = min(1. we factored x2 − c into (x − c)(x + c) and saw that to get some control on the other factor x + c we needed to restrict x to some bounded interval around c. δ)-UC on I if for all x1 . But this is not possible: take any δ > 0.18. This is a sort of halfway unpacking of the definition of uniform continuity. δ > 0. and let f : [a. (Covering Lemma) Let a < b < c < d be real numbers. 2|c| + 1 But the above choice of δ depends on c. let us say that f is (. |x1 − x2 | < δ =⇒ |f (x1 ) − f (x2 )| < . Let’s see it in action. Now we need only use the fact that we are assuming |c| ≤ M to remove the dependence of δ on c: since |c| ≤ M  we have 2|c|+1 ) ≥ 2M1+1 . Remark: In fact a polynomial function f : R → R is uniformly continuous on R if and only if it has degree at most one. this expression can be made arbitrarily large. In particular. M ]. we may as well assume I = [−M. δ)-UC on I. there exists δ > 0 such that f is (. M ] implies uniform continuity on I. On this interval |x + c| ≤ |x| + |c| ≤ |c| + 1 + |c| ≤ 2|c| + 1. by restricting f (x) = x2 to any such interval. Lemma 6. x2 ∈ I. Let f : I → R. there would have to be some δ > 0 such that for all x1 . so for  > 0 we may take δ = min(1. bounded interval I? For instance. x2 ∈ R with |x1 − x2 | < δ. It turns out that one can always recover uniform continuity from continuity by restricting to a closed bounded interval: this is the last of our Interval Theorems. it is uniformly continuous. δ1 . δ1 )-UC on [a. because any I is contained in such an interval. c + 1]. 7. If it were uniformly continuous. The following small technical argument will be applied twice in the proof of the Uniform Continuity Theorem. But wait! What if the domain is a closed. δ2 > 0. |x21 − x22 | < .2.2: Let f : R → R by f (x) = x2 . if x = 1δ . take  = 1. 7. say [c − 1. so advance treatment of this argument should make the proof of the Uniform Continuity Theorem more palatable. UNIFORM CONTINUITY 121 Example 6. and |x2 − (x + 2δ )2 | = |xδ + δ4 |. The reasoning is similar to the above. Suppose that for real numbers 1 . and uniform conti- nuity on [−M. More precisely. 2M1+1 ). f : I → R is uniformly continuous iff for all  > 0. Then for any x ∈ R. d] → R. But if I get to choose x after you choose δ. The Uniform Continuity Theorem. So f is not uniformly continuous on R. So that’s sad: uniform continuity is apparently quite rare. This time I claim that our usual proof did not show uniform continuity. Indeed. M ]. So it doesn’t show that f is uniformly continuous on R. then it is strictly greater than 1. To show that f is continuous at c. c] and . This shows 2 that f (x) = x is uniformly continuous on [−M. In fact the function f (x) = x2 is not uniformly continuous on R. 2|c|+1 ) we found that if |x − c| < δ then  |f (x) − f (c)| = |x − c||x + c| ≤ · (2|c| + 1) = .

x]. x2 ∈ [b. x + δ2 ]. min(δ1 . The Bolzano-Weierstrass Theorem For Subsets Let S ⊂ R. min(δ1 . Suppose x1 < x2 ∈ I are such that |x1 − x2 | < δ. so |f (x1 ) − f (x2 )| < . x + 1). Example: If S = R. if x1 < x2 ≤ c. δ2 )-UC on [a. δ2 )-UC on [x − δ2 . We apply the Covering Lemma to f with a < x − δ2 < x < x + δ2 to conclude that f is (. x]. We say that x ∈ R is a limit point of S if for every δ > 0. Example: If S ⊂ T and x is a limit point of S. For S ⊂ R and x ∈ R.  8. Moreover. b] → R be continu- ous. We apply the Covering Lemma to f with a < x−δ1 < x− δ21 < x to conclude that f is (. To show that f is uniformly continuous on [a. min(δ. x + δ2 ]. δ2 ))-UC on [a. Thus we must have either that b ≤ x1 < x2 or x1 < x2 ≤ c. More generally. d]. (ii) x is a limit point of S. for any x ∈ R. take I = (x − 1. x+δ2 ]. x+δ2 ]. |f (c1 ) − f (c2 )| = |f (c1 ) − f (x) + f (x) − f (c2 )| ≤ |f (c1 ) − f (x)| + |f (c2 ) − f (x)| < . b] such that there exists δ > 0 such that f is (. If b ≤ x1 < x2 .122 6. δ2 . b]. there exists δ2 > 0 such that for all c ∈ [x. x− δ21 ]. δ2 . so |f (x1 ) − f (x2 )| < . We will show this by Real Induction. Similarly. Then f is uniformly continuous on [a. (Uniform Continuity Theorem) Let f : [a. then x1 . so there exists δ1 > 0 such that f is (. COMPLETENESS • f is (. |f (c)−f (x)| < 2 .. It follows that [x. d] and |x1 − x2 | < δ ≤ δ2 . (RI1): Trivially a ∈ S(): f is (. since f is continuous at x. Since x − δ21 < x. x− δ21 −(x−δ1 ))) = (. x) ⊂ S(). min( δ21 . δ1 )-UC on [a. by hypothesis there exists δ2 such that f is (. . c] and |x1 − x2 | < δ ≤ δ1 . if S ⊂ R is dense – i. Proposition 6. Proof. x].e. if every nonempty open interval I contains an element of S – then every point of R is a limit point of S. x is a limit point of T . there exists δ1 > 0 such that f is (. δ2 ))-UC on [a. let S() be the set of x ∈ [a. c2 ∈ [x−δ2 .20. δ2 . then every x ∈ R is a limit point. In particular this holds when S = Q and when S = R \ Q. then x1 . Why 2 ? Because then for all c1 . Thus x ∈ S(). b]. the following are equivalent: (i) Every open interval I containing x also contains infinitely many points of S. Example: The subset Z has no limit points: indeed. x + δ2 ] ⊂ S(). δ)-UC on [a. x2 − x1 > c − b ≥ δ. f is (. x − (x − δ2 ))) = (. a] for all δ > 0! (RI2): Suppose x ∈ S().19. For  > 0. δ)-UC on [a. there exists s ∈ S with 0 < |s − x| < δ. since f is continuous at x. δ1 )-UC on [x − δ1 . Proof. In other words. Then I is bounded so contains only finitely many integers. As above. Then f is (. it suffices to show that S() = [a. x is a limit point of S if every open interval I containing x also contains an element s of S which is not equal to x. x2 ∈ [a. x]. min(δ1 . b]. Note that these examples show that a limit point x of S may or may not be an element of S: both cases can occur. Equivalently. (RI3): Suppose [a. Example: No finite subset S of R has a limit point. δ2 )-UC on [b. c − b))-UC on [a.  Theorem 6. b] for all  > 0. Then it cannot be the case that both x1 < b and c < x2 : if so.

x + δ] is infinite for all δ > 0. 9. b] be a weakly increasing function. it has a limit point. x) ⊂ S. b] | f (x) ≥ x} is a fixed point of f . b]. S ∩ [−M. and then x is a limit point for A and S = [a. and let S be the set of x in [a. f (a) ≥ a. However Tarski’s theorem has a quite different flavor from the fixed point theorems to come in that the function f is not assumed to be continuous! Theorem 6. If not. there is x ≤ z ≤ y with f (z) < z ≤ y. Let A ⊂ [a. y = f (x) < x. which we will do by Real Induction.  Remark: Later we will give a “sequential version” of Bolzano-Weierstrass and see that it is equivalent to Theorem 6. If for some δ > 0.22 to calculus. Then f has a fixed point: there is c ∈ [a. x) ⊂ S. (RI2) Let x ∈ [a. Then S has no limit points.  Exercise: a) Give another (more standard) proof of Theorem 6. x] is infinite. x] is finite but A ∩ [a. Fixed point theorems are ubiquitously useful throughout mathematics. contradicting the fact that b is the largest element of [a. so x ∈ S. so y ∈ S and thus f (y) > y = f (x). We claim [x. b] with f (c) = c. (RI1) is clear. y] ⊂ S. y = f (x) ≤ f (z) and we get y ≤ f (z) < z ≤ y. (Tarski Fixed Point Theorem) Let f : [a. then: either A ∩ [a. so x ∈ S.22 by showing that sup{x ∈ [a. Tarski’s Fixed Point Theorem Mainly as a further showpiece for Real Induction. We will show S = [a. (Bolzano-Weierstrass) Every infinite subset A of [a. y] is infinite for some y < x. Seeking a contradiction we suppose f has no fixed point. . b] | f (x) > x}. 9. Proof. b] by Real Induction. b]. (RI3) If [a. It is a fixed point theorem: that is. But then b ∈ S. b) Which argument do you prefer? Exercise: Find an application of Theorem 6. It suffices to show S = [a. A ∩ [a. let S be a subset such that for all M > 0. If not. a contradiction. b]. x + δ] is finite. b]! (RI1) Since a = min[a. but since y < x. and further fixed point theorems for functions defined on subintervals of R will be given when we study infinite sequences later on.22. x] is infinite. x + δ] ⊂ S. I am indebted to J. or A ∩ [a. b]. b] as above. Propp for alerting me to this application of Real Induction. then it has a limit point and hence so does A ∩ [a. TARSKI’S FIXED POINT THEOREM 123 Example: More generally. y] is finite for all y < x and A ∩ [a. since there is no fixed point. so f (b) > b. Tarski [Ta55]. x] and x ∈ S. b]: thus S = [a. (RI3) Let x ∈ (a. then [x. Theorem 6. f (a) > a. x] is infinite. but since x ≤ z. Otherwise A ∩ [a. (RI2) Suppose x ∈ [a. or A ∩ [a. x] is finite. b) ∩ S. b] has a limit point. so x is a limit point of A ∩ [a. a theorem which gives conditions on a function f : X → X for there to be a point c ∈ X with f (c) = c. b] such that if A ∩ [a. f (y) ≤ f (x). b] → [a.21 above. so y > x by definition of S. Put y = f (x). M ] is finite. we give here a special case of a theorem of the great logician A. x] ⊂ S. b) and suppose [a. Proof.21. If A ∩ [a. b] and suppose [a. Let S = {x ∈ [a. We claim x ∈ S.

.

b) → R be differentiable. let α be any real number which is greater than A. we find: there is c ∈ (c1 . Thm. b) such that for all x > c. or (ii) limx→b− g(x) = ±∞.13]. 125 . b) The analogous statements with limx→b− replaced everywhere by limx→a+ hold. Let c < x < y < b. Let −∞ ≤ a < b ≤ ∞. Step 2: Suppose A > −∞. g(x) g(x) and then a little algebra yields f (x) g(y) f (y) (24) <β−β + . b) such that for all x > c. L’Hôpital’s Rule We have come to the calculus topic most hated by calculus instructors: L’Hôpital’s ∞ Rule. fg(x) (x) < α. 5. Then limx→b− fg(x) (x) = A. and let β be such that A < β < α.1. Fix y in (23) and choose c1 ∈ (y. as is the proof. Then by letting x approach b in (23) we get fg(y) (y) ≤ β < α for all c < y < b. b) such that for all x > c. Multiplying (23) by g(x)−g(y) g(x) gives   f (x) − f (y) g(x) − g(y) <β . CHAPTER 7 Differential Miscellany 1. y) such that f (x) − f (y) f 0 (t) (23) = 0 <β g(x) − g(y) g (t) Suppose first that (i) holds. By Cauchy’s Mean Value Theorem. Proof. a) Step 1: Suppose that A < ∞. fg(x) (x) < α. The following succinct formulation is taken from [R. 0 0 First. b) such that c1 < x < b implies g(x) > g(y) and g(x) > 0. Let f. g(x) g(x) g(x) Letting x approach b. since fg0 (x) (x) → A < ∞. b) such that for all x > c. which is what we wanted to show. This result gives a criterion for evaluating indeterminate forms 00 or ∞ . g : (a. Next suppose that (ii) holds. Theorem 7. a) We suppose that f 0 (x) lim− 0 = A ∈ [−∞. there is t ∈ (x. ∞] x→b g (x) and also that either (i) limx→b− f (x) = limx→b− g(x) = 0. Then arguing in a very similar manner as in Step 1 we may show that for any α < A there exists c ∈ (a. there is c ∈ (a. We will show that there exists c ∈ (a. fg0 (x) (x) < β.

3. (Racetrack Principle) Let f. Because f 00 = (f 0 )0 is non-negative on [a. Thus many calcu- lus students switch to applying L’Hôpital’s Rule instead of evaluating derivatives from the definition.  Proposition 7. ∞). i. For instance. b) If f 0 (x) > g 0 (x) for all x > a. and thus being continuous. There are any number of ways.1  Remark: Perhaps you were expecting the additional hypothesis limx→b− f (x) = ±∞ in condition (ii). [R] states and proves the result with lim x→a+ instead of limx→b− .126 7. Since limx→∞ g(x) = 0 f 0 (x) 1 and limx→∞ g 0 (x) = limx→∞ ex = 0. ∞). b) This is quite straightforward and left to the reader. This can lead to painfully circular reasoning. to showing that limx→0 sinx x = 1! 2) Many limits which can be evaluated using L’Hôpital’s Rule can also be eval- uated in many other ways. Let g be the tangent line to f at x = a. a) Then h0 (x) ≥ 0 for all x ≥ a. Then: a) We have f (x) − f (a) ≥ g(x) − g(a) for all x ≥ a. Then n+1 LH (n+1)xn n limx→∞ x ex = ∞∞ = limx→∞ ex = (n + 1) limx→∞ xex = (n + 1) · 0 = 0. But it seems to be very risky to present the result to freshman calculus students in this form! n Example 7. For intance. weakly increasing on [a. ∞) → R be two differentiable functions such that f 0 (x) ≥ g 0 (x) for all x ≥ a. Proof. let us count the ways! 1) Every derivative f 0 (x) = limh→0 f (x+h)−f h (x) is of the form 00 . Induction Step: let n ∈ Z+ and suppose limx→∞ xex = 0. try to evaluate the limit of Example 7. . b) This is the same as part a) with all instances of ≥ replaced by >: details may be safely left to the reader. Put h = f − g : [a. condition (ii) of L’Hôpital’s Rule applies so n limx→∞ exx = 0. and often just by thinking a bit about how functions actually behave. limx→∞ xex = 0. what is limx→0 sinx x ? Well. Then limx→∞ f (x) = ∞. ∞): for all x ≥ a. Putting together these two estimates shows limx→b− g(x) = A.. and thus f (x) − f (a) ≥ g(x) − g(a). this is not necessary. I recast it this way since a natural class of examples concerns limx→∞ . We show this by induction on n. What’s wrong with this? Well.1: We claim that for all n ∈ Z+ . Why do calculus instructors not like L’Hôpital’s Rule? Oh. DIFFERENTIAL MISCELLANY f (x) f (x) g(x)> α. f 0 is weakly increasing 1In fact. As the proof shows. ∞) to R. how do we know that (sin x)0 = cos x? Thinking back. viewed as a function from [a.1 above without using L’Hôpital. For instance: Lemma 7. Proof.e. g : [a. then f (x) − f (a) > g(x) − g(a) for all x > a. First we do n = 1: limx→∞ exx = 0. both numerator and denominator approach 0 and 0 limx→0 (sinx0x) = limx→0 cos1 x = cos 0 = 1. ∞) → R be twice differentiable such that: (i) f 0 (a) > 0 and (ii) f 00 (x) ≥ 0 for all x ≥ a.2. so h is weakly increasing on (a. we reduced this to computing the derivative of sin x at x = 0. f (x) − g(x) = h(x) ≥ h(a) = f (a) − g(a). Let f : [a. ∞) → R.

then ex log A = lim log = lim x − n log x. ∞). One can therefore establish limx→∞ xen = ∞ by showing that fn0 (x). x (iii) Take logarithms: if A = limx→∞ xen . namely n=0 xn! . f 0 (x)       1 1 1 lim = lim 2x sin − cos = − lim cos . and thus for all x ≥ a. 1. (ii) Since fn0 (x) > 0 for all x > n. ∞] which satisfies A = eA is A = ∞. and limx→0 fg(x) (x) = limx→0 x sin( x1 ) = 0. f is eventually increasing and thus tends either to a positive limit A or to +∞. f (x) − f (a) ≥ g(x) − g(a). g : R → R by f (x) = x2 sin( x1 ) and f (0) = 0 and g(x) = x. But as x → ∞. A classic rookie mistake is to forget to verify condition (i) or (ii): of course in general f (x) f 0 (x) lim 6= lim 0 . From this the desired limit follows almost immediately! 3) The statement of L’Hôpital’s Rule is complicated and easy for even relatively proficient practitioners of the mathematical craft to misremember or misapply. lim f (x) ≥ lim g(x) = ∞. so by Proposition 7. fn00 (x) are both positive for sufficiently large x. x + 1 → ∞. Example 7. Then f and g are both differentiable (for f this involves going back to the limit definition of the derivative at x = 0 – we have seen this example before). x→∞ xn x→∞ Now if l(x) = x − n log x. L’HÔPITAL’S RULE 127 on [a. However. x→0 g 0 (x) x→0 x x x→0 x . For instance. both are positive for large enough x. we leave it to the reader and try something slightly different instead. x→∞ x→∞  x x (i) Let fn (x) = xen . Since g is a linear function with positive slope. The analysis for fn00 is a bit messier. even 0 under conditions (i) and (ii). l00 (x) = xn2 . limx→a fg(x) (x) = A need not imply that limx→a fg0 (x)(x) exists. Applying the Racetrack Principle.3. f 0 (x) ≥ f 0 (a) = g 0 (x). log A = ∞ and thus A = ∞. But there are subtler pitfalls as well. x→a g(x) x→a g (x) try a random example. then l0 (x) = 1 − nx . or f (x) ≥ g(x) + f (a) − g(a) = g(x). so ex+1 ex A = lim n = e lim = eA. we get that for all x ≥ a.2: Let f. x→∞ (x + 1) x→∞ (x + 1)n The only A ∈ (0. so you cannot use L’Hopital’s Rule to show that a limit does not exist. It is easy to see that fn0 (x) > 0 for all x > n. (iv) When we learn about Taylor series we will have access to a superior expression P∞ n for ex .

but that we wish to approximate to any prescribed degree of accuracy – e. i. Newton’s Method is an important procedure for approximating roots of differen- tiable functions. And. but suppose we restrict our- selves to some small interval in which we believe there is a unique root c. Well. That is. What we do is consider the tangent line ln (x) to y = f (x) at x = xn . every once in a while one really does need L’Hôpital’s Rule! We will encounter such a situation in our study of Taylor polynomials. Newton’s Method 2. and so forth. So let’s do it: 0 = ln (xn+1 ) = f (xn ) + f 0 (xn )(xn+1 − xn ). we may need to know its value to 100 decimal places. Now we take xn+1 to be x-intercept of the line ln (x).. The equation of this line is y − f (xn ) = f 0 (xn )(x − xn ). resulting in an infinite sequence of approximate roots {xn }∞ n=1 . then either it coincides with the x-axis (in which case xn is already a root of f and no amelioration is needed) or it is parallel .. we start with a number x1 which is an “approximate root” of f . then it won’t work at all. which is (in general) closer to the true root c than x1 is. Introducing Newton’s Method. the unique number such that ln (xn+1 ) = 0. as well it should be: if the tangent line at xn is horizontal. Nevertheless. f (x1 ) is rather close to 0 (this is not a precise mathematical statement.g. in general there are many. i. so −f (xn ) xn+1 − xn = f 0 (xn ) or f (xn ) (25) xn+1 = xn − . the better the method will work. This amelioration procedure is very geometric: let n ∈ Z+ .. but the closer x1 is to the true root c.e. If it’s too far away. and start with the approximate root xn . Namely. performing it a third time we get x4 . Say we do not need to know c exactly.128 7. f 0 (xn ) Note that our expression for xn+1 is undefined if f 0 (xn ) = 0. Exercise: Use L’Hôpital’s Rule to give a short proof of Theorem 5. 2.e. you know.32. The first key idea is that Newton’s method is one of sucessive approxima- tions. DIFFERENTIAL MISCELLANY and this limit does not exist. everyone else is doing it. suppose that y = f (x) is a differentiable function and that we know – perhaps by the methods of calculus! – that there is a real number c such that f (c) = 0. so y = ln (x) = f (xn ) + f 0 (xn )(x − xn ).) Then we perform some ameloriation procedure resulting in a second approximate root x2 .. so you should at least know how to do it. And then we continue: performing our amelioration procedure again we get x3 .1.

The formula (26) for successive approximations to 2 was known to the an- cient Babylonians. if we want to approximate the positive root of x2 − 2 = 0. . it is more than plausible that we should start with a positive number. using Newton’s method as above with f (x) = x2 − a leads to the recursion   1 a xn+1 = xn + .41666 . thousands of years before tangent lines. Thus x5 is accurate to 11 decimal places and x6 is accurate to 23 decimal places. 2 1 2   1 3 4 17 x3 = + = = 1. So in practice Newton’s method seems to work very well indeed. and Isaac Newton. NEWTON’S METHOD 129 to the x-axis. We compute the amelioration formula in this case: x2 − 2 2x2 − (x2n − 2) x2 + 2   1 2 (26) xn+1 = xn − n = n = n = xn + . . calculus of any kind. . A Babylonian Algorithm. 2 2 3 12   1 17 24 577 x4 = + = = 1. this means that xn is “too far away” from the true root c of f . .41421356237468991 . Consider f (x) = x2 −2. . 2. so I don’t really know. 2 12 17 408   1 577 816 x5 = + = 665857/470832 = 1. it means √ that the approximations get close to the true root very fast – it we wanted 2 to 100 decimal places we would only need to√compute x9 . If this holds true. . . √ We can use Newton’s method to approximate 2. 2.414215686 . Then   1 2 3 x2 = 1+ = = 1. then it tells me2 that √ 2 = 1. 2 408 577   1 470832 1331714 x6 = + = 886731088897/627013566048 = 1. Looking more carefully. . .41421356237309504880 . 2xn 2xn 2xn 2 xn 2 In other words. I don’t have access to the source code the software I used. in which case the method breaks down: in a sense we will soon make precise. for any a > 0.414213562373095048801688724 .5. So let’s try x1 = 1. . to get from xn to xn+1 we take the average of xn and xn . . In this case we are allowed to take any x1 6= 0. it seems that each iteration of the “amelioration process” xn 7→ xn+1 roughly doubles the number of decimal places of accuracy. 2].2. . straight- forward calculus tells us that there is a unique positive number c such that f (c) = 2 and that for instance c ∈ [1. 2 665857 470832 √ If I now ask my laptop computer to directly compute 2. . 2 xn 2Of course one should not neglect to wonder how my computer is doing this computation. . but it is plausible that it is in fact using some form Newton’s method. Remark: Similarly. .

we say the infinite sequence {xn } converges to L – and write xn → L – if for all  > 0 there exists N ∈ Z+ such that for all n ≥ N . then this is probably the method you should use. or – slightly more formally. Questioning Newton’s Method. If L ∈ R. Then f (xn ) → f (L). Proof. |xn − L| < . .) 2. . and let f : R → R be a continuous function. Introducing Infinite Sequences. . . Moreover. So it is a very close cousin of the types of limit operations we have already studied. Namely. c + δ) about the true root c such that starting Newton’s method with any x1 ∈ (c − δ. of course. given some x1 ∈ (c − δ.130 7.5 is a close cousin of the fact that compositions are con- tinuous functions are continuous. there exists a positive integer N such that for all n ≥ N . there exists δ > 0 such that |x − L| < δ =⇒ |f (x) − f (L)| < . Of course we haven’t proven anything yet. . you must tell them that the answer is yes! 2.  Remark: a) Proposition 7. which is a shame because the questions its treats are fundamental and closely related to pure mathematics. . Let f : I → R be differentiable.4. This is precisely the definition of limx→∞ f (x) = L except that our function f is no longer defined for all (or all sufficiently large) real numbers but only at positive integers. Since f is continuous at L.5. c + δ) guarantees that the sequence of ap- proximations {xn } gets arbitrarily close to c? b) Assuming the answer to part a) is yes.4. but one of a kind which is slightly different and in fact simpler than the limit of a real-valued function at a point. √ tions to a. a real infinite sequence xn is simply an ordered list of real numbers x1 .3. . Here are two natural questions: Question 7. a) Is there some subinterval (c − δ. say f (n) = xn . If you ever find yourself on a desert island and needing to compute a to many decimal placees as part of your engineering research to build a raft that will carry you back to civilization. We will give some answers to these questions.5 has a very . Fix  > 0. x2 . xn . Proposition 7. the business of the xn ’s getting arbitrarily close to c should be construed in terms of a limiting process.g. Most theoretical mathematicians (e. so |f (xn ) − f (L)| < . b) At the moment we are just getting a taste of infinite sequences. And now if anyone asks you whether “honors calculus” contains any practically useful information. This shows that f (xn ) → L. Here is one very convenient property of limits of sequences. DIFFERENTIAL MISCELLANY application √ of which with x1 = 1 leads to fantastically good numerical approxima. |xn − L| < δ. and in particular the proof is almost the same. First. (As well as being useful in applications. Suppose that xn → L. Putting these together: if n ≥ N then |xn − L| < δ. Let {xn }∞n=1 be a sequence of real numbers. Later in the course we will study them more seriously and show that Proposition 7. me) know little about it. since xn → L. is given by a function from the positive integers Z+ to R.. and let c ∈ I be such that f (c) = 0. c + δ) can we give a quantitative estimate on how close xn is to c as a function of n? Questions like these are explored in a branch of mathematics called numerical analysis.

c + δ]. a real number α for which such an inequality holds will be called a contraction constant for f . Thus Newton’s method starts with x1 ∈ R and produces an infinite sequence {xn } of “successive aproximations”. xn+1 = f (xn ) = (f ◦ · · · ◦ f )(x1 ). and this suggests (cor- rectly!) that sequences can be a powerful tool in studying functions f : R → R. . Conversely. c + δ]. i. we have also f (xn ) → f (L).  Suppose f : I → R is a contraction with constant α. |f (x) − f (y)| ≤ α|x − y|. y ∈ I. for which choices of x1 – this sequence converges to the true root c. 2. . NEWTON’S METHOD 131 important converse: if f : R → R is a function such that whenever xn → L. so f : [c − δ. . when. y) such that f (x) − f (y) = f 0 (c).e. x−y so |f (x) − f (y)| = |f 0 (c)||x − y| ≤ α|x − y|. Exercise: a) Which functions f : I → R have contraction constant 0? b) Show that every contraction f : I → R is continuous. then |f (x) − c| = |f (x) − f (c)| ≤ α|x − c| < |x − c| ≤ δ.  Lemma 7. c + δ] ⊂ I. Let α < 1 be a contraction constant for f . Then |x1 −x2 | = |f (x1 )−f (x2 )| ≤ α|x1 −x2 |. and consider the function f : R → R. . |f 0 (x)| ≤ α. A contractive function f : I → R has at most one fixed point. f (x) = αx.5. . and suppose there is α < 1 such that for all x ∈ I. Step 1: What are the fixed points? We set c = f (c) = αc. This is a key point. and let c ∈ I be a fixed point of f . c + δ] → [c − δ. because given any f : I → I and any x1 ∈ [c − δ. there is c ∈ (x. We call this the sequence of iterates of x1 under f. Then c = 0 is a fixed point no matter what α is. dividing through by |x1 − x2 | gives 1 ≤ α: contradiction. if x ∈ [c − δ. 2. Proposition 7. . Let us study the dynamics of iteration of this very simple function. Recall that a fixed point of f : I → R is a point c ∈ I with f (c) = c. Contractions and Fixed Points. . and our first question is whether – or more precisely. Proof.7. Let f : I → R be differentiable. A function f : I → R is contractive (or is a contraction) if there is α < 1 such that for all x. x2 = f (x1 ). we may define a sequence x1 . then f is continuous. if c 6= 0 then we divide through to get . . Example: Let α ∈ R. Proof. x3 = f (x2 ) = f (f (x1 )). Thus preservation of limits of sequences is actually a characteristic property of continuous functions. Then α is a contraction constant for f .. Suppose there are x1 6= x2 in I with f (x1 ) = x1 and f (x2 ) = x2 .6. Let x < y ∈ I. By the Mean Value Theorem. Then for any δ > 0.

then for all x1 6= 0. Since f is continuous and xn → L.9. (Contraction Lemma) For c ∈ R and δ > 0. c.e.8. . Suppose that f : I → R is a contraction with constant α and that f (c) = c. Lemma 7.  Lemma 7. Suppose f : I → I is continuous. a) For all x ∈ I. c + δ]. . Here are three observations about the above example: 1) If x1 is a fixed point of f . In particular it does not converge. the sequence of iterates will be constant: c.g. A sequence can have at most one limit.e. f is the identity function. and the sequence of iterates {xn } of x1 under f converges to L ∈ I. xn = x1 . f : I → I. Proof. by Proposition 7. DIFFERENTIAL MISCELLANY α = 1. we have |xn+1 − c| ≤ αn |x1 − c|.. whereas if x1 > 0 and α < −1. Case 4: If 1 < |α|. so f (L) = L. for all n ≥ 2. define a sequence {xn }∞ n=1 by xn+1 = f (xn ) for all n ≥ 1. But f (xn ) = xn+1 . |xn | = |αn−1 x1 | → ∞: the sequence of iterates grows without bound. . then the terms xn alternate in sign while increasing in absolute value. The first observation clearly holds for the iterates of any function f : I → I. then for all n. so of course it will converge to c. so if α 6= 1 then 0 is the only fixed point. One can say more about the signs if desired: e. First observe that for any fixed point c. so the sequence is constant. Here. i. put I = [c − δ. when x1 > 0 and α > 1. Finally. In particular xn → c. We repeat the statement here because it is a key point: in order to be able to iterate x1 under f we need f : I → I.. i. so they do not approach either ∞ or −∞. with eventual value the unique fixed point: 0. Then L is a fixed point of f . Then for all n ∈ Z+ . c . no matter what the initial point x1 is. Case 2: If α = 1. Case 1: If α = 0. b) For any x1 ∈ I. But note that |f (x) − f (y)| = |α||x − y|. it converges to a fixed point of f . then every xn is positive and the sequence of iterates approaches ∞. so f is contractive iff |α| < 1. x3 = αx2 = α(αx1 ) = α2 x1 . Case 3: If 0 < |α| < 1. in fact with exponential speed. the sequence of iterates converges to the unique fixed point c = 0. 2) Whenever the sequence of iterates converges. f (x) ∈ I. So really we are interested in the case when x1 is not a fixed point of f .132 7. then no matter what x1 is. then xn = αn−1 x1 → 0. Thus the sequence is eventually constant. if α = 1 then every c ∈ R is a fixed point.5 we have f (xn ) → f (L). Step 2: Let us try to figure out the limiting behavior of the sequences of iterates. and if xn → L then certainly xn+1 → L as well. 3) The sequence of iterates converges for all initial points x1 ∈ R iff |α| < 1. then the equence of iterates is constant. xn = 0. What about the last two observations? In fact they also hold quite generally. x1 ∈ I. a) This was established above. and this constant value is a fixed point of f . In fact we can – and this is an unusual feature arising because of the very simple f we choose – give an explicit formula for the general term xn in the sequence of iterates. . Starting at x1 we have x2 = αx. Proof. c. and so on: in general we have xn = αn−1 x1 .

what does the material we have just developed have to do with this prob- lem? At first sight it does not seem relevant.. In fact. Since T 0 is continuous at c.. so there exists δ > 0 such that f is negative on [c − δ. c+δ] and we probably cannot iterate f on this interval: f (c) = 0 and 0 need not lie in [c − δ. (f 0 (x))2 (f 0 (x))2 This is all kosher.  2. for any positive α we want! Thus not only can we make T contractive. and we have been talking about contractions and f need not be a contraction. T 0 is continuous. the second derivative exists and is continuous – and let c be a point of the interior of I such that f (c) = 0 and f 0 (c) 6= 0 – a simple root. because we have been talking about fixed points and are now interested in roots. . c + δ]. For this we look at the derivative: f 0 (x)f 0 (x) − f (x)f 00 (x) f (x)f 00 (x) T 0 (x) = 1 − = . c is the unique fixed point of T on this interval. observe that a point x is a root of f if and only if it is a fixed point of T . Convergence of Newton’s Method. we can make it contractive with any contractive constant α ∈ (0. So. Rather it is the amelioration function f (x) T (x) = x − .. But now a miracle occurs: T 0 (c) = 0. 2. .6. What we want to show is that – possibly after shrinking δ – for any choice of initial ap- proximation x1 ∈ [c−δ. First. at least in some smaller interval around c. δ]. c) and positive on (c. since we have assumed f 0 is nonzero on [c − δ. 1) we want! Thus we get the following result. NEWTON’S METHOD 133 b) We compute |xn+1 − c| = |f (xn ) − f (c)| ≤ α|xn − c| = α|f (xn−1 ) − f (c)| ≤ α2 |xn−1 − c| = α3 |xn−2 − c| = . this means that by making δ smaller we may assume that |T 0 (x)| ≤ α for all x ∈ [c − δ. c + δ]. . What gives? The answer is that all of our preparations apply not to f but to some auxiliary function defined in terms of f . c + δ].. since f is C 2 . In particular c is the unique root of f on [c − δ. f 0 (xn ) Thus the sequence {xn }∞ n=1 is generated by repeatedly applying a certain func- tion. To figure out what this function is. f 0 (x) Now we have to check that our setup does apply to T . consider the recursive definition of the Newton’s method sequence: f (xn ) xn+1 = xn − . Believe it or not.it’s just not the function f . quite nicely. Since by our assumption c is the unique root of f in [c − δ.. the Newton’s method sequence converges rapidly to c. The next order of business is to show that T is contractive. = αn |x1 − c|. c + δ] and also that f is C 2 .e. f is increasing through c. Note that since f 0 (c) > 0. we are now very close to a convergence theorem for the sequence of iterates generated by Newton’s method. c + δ]. c+δ]. we have been talking about iterating functions on [c−δ. Let f : I → R be a C 2 function – i.

Miller) First. and indeed it does so exponentially fast. if you look back at exactly what we proved. a) (W. precisely |[a. Then there exists δ > 0 such that for all x1 ∈ [c − δ. In fact it is possi- ble to give a short.10. b ∈ R. for a. In 1 fact. For this. b} be the open line segment from a to b. elementary proof of the quadratic convergence which not only avoids Taylor’s theorem but also avoids our analysis of the contractive properties of the amelioration map T . by Lemma 7. it seems to itself suggest that more is true: namely. B > 0 such that for all x ∈ [c − δ. c + δ]. c + δ]. we see that what we have proved is not as good as what we observed: in our exam- ple the number of decimal places of accuracy doubled with each iteration. A b) If f 00 is continuous on I and f 0 (c) 6= 0. c + δ]. then there are indeed δ. Proof.. c + δ] back into [c − δ. as we get farther out into the sequence of iterates. f 00 exists and is continuous. Proof. |xn+1 − c| ≤ αn |x1 − c|.e. Theorem 7. and therefore. and let c ∈ I ◦ be a root of f . c + δ] the sequence of iterates xn+1 = T (xn ) is well-defined. i. Quadratic Convergence of Newton’s Method. By Lemma 7. (Quadratic Convergence of Newton’s Method) Let f : I → R be twice differentiable. Fix α ∈ (0. c + δ]. we denote by |[a. c + δ]. is an exponential function of n.e. b] if a ≤ b and |[a.9a). so for any x1 ∈ [c − δ. Fix α ∈ (0. B > 0 as in the statement of part a). c + δ]. if we take α = 10 and δ ≤ 1. That’s a big difference! And in fact.7. Although Theorem 7. for all n ∈ Z+ . b]| the set of all real numbers on the closed line segment from a to b. it is far from the last word on Newton’s method.9b).e. let {xn } be the Newton’s Method sequence with initial value x0 . so that (27) holds for all x0 ∈ [c − δ. |f 0 (x)| ≥ A and |f 00 (x)| ≤ B. b] = [b. an important result which we will unfortunately not treat until much later on. T maps [c − δ. For any x0 ∈ [c − δ. 1). since T 0 (c) = 0. I believe it should be possible to use the idea expressed at the end of the last paragraph to give a proof of the quadratic convergence of Newton’s Method. Here it is. but the base of the exponential gets better and better as we get closer to the point.10 with the empirically observed convergence rate of f (x) = x2 − 2.. i. Similarly. A. Let c ∈ R be such that f (c) = 0 and f 0 (c) 6= 0. b)| = |[a. and note that it is precisely the Newton’s method sequence with initial approximation x1 . whereas we proved that the number of decimal places of accuracy must grow at least linearly with n. So xn → c. B (27) |xn+1 − c| ≤ |xn − c|2 . (Convergence of Newton’s Method) Let f : I → R be a C 2 function. an essential tool is Taylor’s Theorem With Remainder. a] if a > b. let |(a. This is faster than exponential decay with any fixed base.134 7. 1). a) Suppose there are real numbers δ. if we compare the proven convergence rate of Theo- rem 7.10 is a very satisfying result.. b]| \ {a. A. |xn+1 − c| ≤ αn |x1 − c|. the above estimate ensures that xn approximates c to within n decimal places of accuracy. . we get exponential convergence.11. x1 = 1. there exists δ > 0 such that α is a contraction constant for T 0 on [c − δ. DIFFERENTIAL MISCELLANY Theorem 7. i. As above. In fact.  2. the Newton’s method sequence xn+1 = xn − ff0(x n) (xn ) is well-defined and converges rapdily to c in the following sense: for all n ∈ Z+ . b]| = [a. Then for all n ∈ N.

yn ))| such that f 0 (xn ) − f 0 (yn ) = f 00 (zn ). c)| such that f (xn ) f (xn ) − f (c) = = f 0 (yn ). c]|: there is yn ∈ |(xn . xn − yn The rest of the proof is a clever calculation: for n ∈ N. f 0 (xn ) Apply the Mean Value Theorem to f on |[xn . 2. c+δ]. yn ]|: there is zn ∈ |(xn . NEWTON’S METHOD 135 Let x0 ∈ [c−δ. xn − c xn − c Apply the Mean Value Theorem to f 0 on |[xn . It will be useful to rewrite the defining recursion as −f (xn ) ∀n ∈ N. and let {xn } be the Newton’s Method sequence of iterates. xn+1 − xn = . we have .

.

.

−f (xn ) f (xn ) .

.

|xn+1 − c| = |(xn+1 − xn ) + (xn − c)| = .

0 .

+ 0 f (xn ) f (yn ) .

.

f (xn )(f 0 (xn ) − f 0 (yn )) .

.

f 0 (xn )(xn − c)(f 0 (xn ) − f 0 (yn )) .

.

.

.

.

= .

.

.

=.

.

f 0 (xn )f 0 (yn ) .

.

f 0 (xn )f 0 (yn ) .

.

0 .

f (xn ) − f 0 (yn ) .

.

.

00 .

.

f (zn ) .

=.

.

|xn − c| = .

f 0 (yn ) .

|xn − yn ||xn − c| .

.

.

.

f 0 (yn ) .

00 .

.

f (zn ) .

≤.

0

|xn − c|2 ≤ B |xn − c|2 .
f (yn )

A
In the second to the last inequality above, we used the fact that since yn lies between
xn and c, |xn − yn | ≤ |xn − c|.
b) This is left as an exercise for the reader. 

Exercise: Prove Theorem 7.11b).

Let c be a real number, C a positive real number, and let {xn }∞
n=0 be a sequence
of real numbers. We say that the sequence {xn } quadratically converges to c if
(QC1) xn → c, and
(QC2) For all n ∈ N, |xn+1 − c| ≤ C|xn − c|2 .

Exercise:
a) Show that a sequence to satsify (QC2) but not (QC1).
b) Suppose that a sequence {xn }∞ 1
n=0 satisfies |x0 − c| < min(1, C ). Show that {xn }
quadratically converges to c.
c) Deduce that under the hypotheses of Theorem 7.11, there is δ > 0 such that for
all x0 ∈ [c − δ, c + δ], the Newton’s Method sequence quadratically converges to c.

Exercise: Let {xn } be a sequence which is quadrtically convergent to c ∈ R. View-
ing xn as an approximation to c, one often says that “the number of decimal places
of accuracy roughly doubles with each iteration”.
a) Formulate this as a statement about the sequence dn = log10 (|xn − c|).
b) Prove the precise statement you formulated in part a).

136 7. DIFFERENTIAL MISCELLANY

2.8. An example of nonconvergence of Newton’s Method.

Consider the cubic function f : R → R by f (x) = x3 − 2x + 2, so
x3n − 2xn + 2
xn+1 = xn − .
3x2n − 2
Take x1 = 0. Then x2 = 1 and x3 = 0, so the sequence of iterates will then
alternate between 0 and 1: one calls this type of dynamical behavior a 2-cycle.
(The unique real root is at approximately −1.769, so we are plenty far away from it.)

This example is the tip of a substantial iceberg: complex dynamics. If you
consider cubic polynomials as functions from the complex numbers C to C, then
you are well on your way to generating those striking, trippy pictures of fractal
sets that have been appearing on tee-shirts for the last twenty years. I recommend
[Wa95] as an especially gentle, but insightful, introduction.

3. Convex Functions
3.1. Convex subsets of Euclidean n-space.

Let n be a positive integer and let Rn be Euclidean n-space, i.e., the set of
all ordered n-tuples (x1 , . . . , xn ) of real numbers. E.g. for n = 2 we get the plane
R2 , for n = 3 one gets “three space” R3 , and for higher n the visual/spatial in-
tuition is less immediate but there is no mathematical obstruction. For the most
part Rn is the subject of the sequel to this course – multivariable mathematics
– but let’s digress a little bit to talk about certain subsets of Rn . In fact for our
applications we need only n = 1 and n = 2.

Given two points P = (x1 , . . . , xn ), Q = (y1 , . . . , yn ∈ Rn we can add them:
P + Q = (x1 + y1 , . . . , xn + yn ),
a generalization of “vector addition” that you may be familiar with from physics.
Also given any real number λ, we can scale any point P by λ, namely
λP = λ(x1 , . . . , xn ) := (λx1 , . . . , λxn ).
In other words, we simply multiply every coordinate of P by λ.

Let P 6= Q be points in Rn . There is a unique line passing through P and Q.
We can express this line parameterically as follows: for every λ ∈ R, let
Rλ = (1 − λ)P + λQ = ((1 − λ)x1 + λy1 , . . . , (1 − λ)xn + λyn ).
In particular R0 = P and R1 = Q. Thus the line segment P Q is given as the set
{Rλ | 0 ≤ λ ≤ 1}.
Now here is the basic definition: a subset Ω ⊂ Rn is convex if for all P, Q ∈ Ω, the
line segment P Q is contained in Ω. In other words, if our physical universe is the
subset Ω and we stand at any two points of Ω, then we can see each other : we have
an unobstructed line of view that does not at any point leave Ω. There are many
convex subsets in the plane: e.g. interiors of disks, ellipses, regular polygons, the
portion of the plane lying on one side of any straight line, and so forth.

3. CONVEX FUNCTIONS 137

Ω1 , . . . , Ωn be convex subsets of Rn .
Exercise: Let T
n
a) Show that i=1 Ωi – the Snset of all points lying in every Ωi – is convex.
b) Show by example that i=1 Ωi need not be convex.

When n = 1, convex subsets are quite constrained. Recall we have proven:
Theorem 7.12. For a subset Ω ⊂ R, the following are equivalent:
(i) Ω is an interval.
(ii) Ω is convex.
3.2. Goals.

In freshman calculus one learns, when graphing a function f , to identify subinter-
vals on which the graph of f is “concave up” and intervals on which it is “concave
down”. Indeed one learns that the former occurs when f 00 (x) > 0 and the latter
occurs when f 00 (x) < 0. But, really, what does this mean?
First, where freshman calculus textbooks say concave up the rest of the math-
ematical world says convex ; and where freshman calculus textbooks say concave
down the rest of the mathematical world says concave. Moreover, the rest of the
mathematical world doesn’t speak explicitly of concave functions very much be-
cause it knows that f is concave exactly when −f is convex.
Second of all, really, what’s going on here? Are we saying that our definition
of convexity is that f 00 > 0? If so, exactly why do we care when f 00 > 0 and when
f 00 < 0: why not look at the third, fourth or seventeenth derivatives? The answer is
that we have not a formal definition but an intuitive conception of convexity, which
a good calculus text will at least try to nurture: for instance I was taught that a
function is convex (or rather “concave up”) when its graph holds water and that
it is concave (“concave down”) when its graph spills water. This is obviously not
a mathematical definition, but it may succeed in conveying some intuition. In less
poetic terms, the graph of a convex function has a certain characteristic shape that
the eye can see: it looks, in fact, qualitatively like an upward opening parabola or
some portion thereof. Similarly, the eye can spot concavity as regions where the
graph looks, qualitatively, like a piece of a downward opening parabola. And this
explains why one talks about convexity in freshman calculus: it is a qualitative,
visual feature of the graph of f that you want to take into account. If you are
graphing f and you draw something concave when the graph is actually convex,
the graph will “look wrong” and you are liable to draw false conclusions about the
behavior of the function.
So, at a minimum, our task at making good mathematical sense of this portion
of freshman calculus, comes down to the following:

Step 1: Give a precise definition of convexity: no pitchers of water allowed!

Step 2: Use our definition to prove a theorem relating convexity of f to the second
derivative f 00 , when f 00 exists.

In fact this is an oversimplification of what we will actually do. When we try
to nail down a mathematical definition of a convex function, we succeed all too
well: there are five different definitions, each having some intuitive geometric ap-
peal and each having its technical uses. But we want to be talking about one class

138 7. DIFFERENTIAL MISCELLANY

of functions, not four different classes, so we will need to show that all five of our
definitions are equivalent, i.e., that any function f : I → R which satisfies any one
of these definitions in fact satisfies all four. This will take some time.

3.3. Epigraphs.

For a function f : I → R, we define its epigraph to be the set of points (x, y) ∈ I×R
which lie on or above the graph of the function. In fewer words,
Epi(f ) = {(x, y) ∈ I × R | y ≥ f (x)}.
A function f : I → R is convex if its epigraph Epi(f ) is a convex subset of R2 .

Example: Any linear function f (x) = mx + b is convex.

Example: The function f (x) = |x| is convex.

Example: Suppose f (x) = ax2 + bx + c. Then Epi(f ) is just the set of points
of R2 lying on or above a parabola. From this picture it is certainly intuitively
clear that Epi(f ) is convex iff a > 0, i.e., iff the parabola is “opening upward”. But
proving from scratch that Epi(f ) is a convex subset is not so much fun.

3.4. Secant-graph, three-secant and two-secant inequalities.

A function f : I → R satisfies the secant-graph inequality if for all a < b ∈ I
and all λ ∈ [0, 1], we have
(28) f ((1 − λ)a + λb) ≤ (1 − λ)f (a) + λf (b).
As λ ranges from 0 to 1, the expression (1 − λ)a + λb is a parameterization of the
line segment from a to b. Similarly, (1 − λ)f (a) + λf (b) is a parameterization of
the line segment from f (a) to f (b), and thus
λ 7→ ((1 − λ)a + λb, (1 − λ)f (a) + λf (b))
parameterizes the segment of the secant line on the graph of y = f (x) from
(a, f (a)) to (b, f (b)). Thus the secant-graph inequality is asserting that the graph
of the function lies on or below the graph of any of its secant line segments.

A function f : I → R satisfies the three-secant inequality if for all a < x < b,
f (x) − f (a) f (b) − f (a) f (b) − f (x)
(29) ≤ ≤ .
x−a b−a b−x
A function f : I → R satisfies the two-secant inequality if for all a < x < b,

f (x) − f (a) f (b) − f (a)
(30) ≤ .
x−a b−a
Proposition 7.13. For a function f : I → R, the following are equivalent:
(i) f satisfies the three-secant inequality.
(ii) f satisfies the two-secant inequality.
(iii) f satisfies the secant-graph inequality.
(iv) f is convex, i.e., Epi(f ) is a convex subset of R2 .

we have (1 − λ1 )f (x1 ) + λ1 f (x2 ) ≤ (1 − λ1 )y1 + λ1 y2 and thus (1 − λ1 )f (x1 ) + λ1 f (x2 ) < f ((1 − λ1 )x1 + λ1 x2 ).b (x) from (a. b−a f (a)−f (b) and the inequality f (x) ≤ f (b) + b−a (b − x) is easily seen to be equivalent to f (b) − f (a) f (b) − f (x) ≤ . y1 ). Thus the two-secant inequality is equivalent to the secant-graph inequality. Then f (b) − f (a) f (d) − f (c) (31) ≤ . Since Epi(f ) is convex the line segment joining (x. 3. violating the secant-graph inequality. b. we suppose there is λ1 ∈ (0.b (x) = f (b) + (b − x). (iv) =⇒ (iii): Let x < y ∈ I.b b() = f (b). 1) such that (1 − λ1 )P1 + λ1 P2 ∈ / Epi(f ): that is. b−a say. so is every point y in between y1 and y2 . But since f (x1 ) ≤ y1 and f (x2 ) ≤ y2 . (Generalized Two Secant Inequality) Let f : I → R be a convex function. note that we also have f (a) − f (b) La. they are elements of the epigraph Epi(f ). c. x−a b−a To get the other half of the three-secant inequality. f (a)) and (b. (iii) =⇒ (i): As above. (1 − λ1 )y1 + λ1 y2 < f ((1 − λ1 )x1 + λ1 x2 ). f (b)) has f (b)−f (a) equation y = f (a) + b−a (x − a). So we may assume x1 6= x2 and then that x1 < x2 (otherwise interchange P1 and P2 ). and let a. d ∈ I with a < b ≤ c < d.b (x) is a linear function with La. y2 ) ∈ Epi(f ). Since (x. the secant graph inequality implies f (x) − f (a) f (b) − f (a) ≤ . Now La. and to say that it lies inside the epigraph is to say that the secant line always lies on or above the graph of f . f (y)) lie on the graph of f . (i) =⇒ (ii): This is immediate. P2 = (x2 . b−a d−c . f (a)) to (b. Seeking a contradiction. b−a b−x (iii) =⇒ (iv): Let P1 = (x1 . f (b)). We will show (i) =⇒ (ii) ⇐⇒ (iii) =⇒ (i) and (iii) ⇐⇒ (iv). We want to show Epi(f ) contains the line segment joining P1 and P2 .b (x). CONVEX FUNCTIONS 139 Proof.14. snce the secant line La.b (a) = f (a) and La. f (y)) lies inside Epi(f ). This is clear if x1 = x2 = x – in this case the line segment is vertical. hence it is the secant line between (a. But this line segment is nothing else than the secant line between the two points. f (x)) and (y. since since y1 and y2 are both greater than or equal to f (x). (ii) ⇐⇒ (iii): The two-secant inequality f (x) − f (a) f (b) − f (a) ≤ x−a b−a is equivalent to   f (b) − f (a) f (x) ≤ f (a) + (x − a) = La.  Corollary 7. f (x)) and (y.

If b = c then (31) is the Two Secant Inequality. b]: there exists a constant C such that for all x. Applying the Two Secant Inequality to the points a. b. d gives f (c) − f (b) f (d) − f (c) ≤ . c−b d−c Combining these two inequalities gives the desired result.15. Choose u. |f (x) − f (y)| ≤ C|x − y|. Applying the Generalized Two Secant Inequality twice we get f (v) − f (u) f (y) − f (x) f (z) − f (w) ≤ ≤ . v−u y−x z−w Thus . w. y ∈ [a. b] be any subinterval of I ◦ . Theorem 7.  Exercise (Interlaced Secant Inequality): Let f : I → R be a convex function. and let a < b < c < d ∈ I. v. Suppose f : I → R is convex. z ∈ I ◦ with u < v < a and b < w < z. Continuity properties of convex functions. so we may suppose b < c. b]. Then f is a Lipschitz function on [a.140 7. c gives f (b) − f (a) f (c) − f (b) ≤ b−a c−b and applying it to the points b. Proof. c−a d−b 3. c.5. and let [a. Show that we have f (c) − f (a) f (d) − f (b) ≤ . DIFFERENTIAL MISCELLANY Proof.

.

.

.

|f (x) − f (y)| .

f (v) − f (u) .

.

f (z) − f (w) .

≤ max .

.

.

. .

.

|x − y| v−u ..

.

z−w .

so .

.

.

.

.

f (v) − f (u) .

.

f (z) − f (w) .

L = max .

.

.

. .

.

v−u .

.

z−w .

. b]. b). Then f is continuous. For M ∈ R.  Corollary 7. ∞]. Exercise: Prove Proposition 7.17.16. and let I be an interval of the form (a. define a function f˜ : {a}∪I → R by f˜(a) = M . Let I be any open interval. b] or (a. It turns out that – at least according to our definition – a convex function need not be continuous on an endpoint of an interval on which it is defined. ∞). (a. b) Suppose that L = limx→a+ f (x) < ∞. Let a ∈ R. is a Lipschitz constant for f on [a. Let f : I → R be convex. The follow- ing result analyzes endpoint behavior of convex functions. Exercise: Prove Corollary 7.16. a) limx→a+ f (x) ∈ (−∞. and let f : I → R be a convex function. Proposition 7.17. Then f˜ is convex iff M ≥ L and continuous at a iff M = L. f˜(x) = f (x) for x ∈ I.

x→a x→b (ii) =⇒ (i): Let g(x) = f (x)−fx−a (a) . Proof. For a differentiable function f : I → R. (Kane3 Criterion) For twice differentiable f : I → R. Proof. CONVEX FUNCTIONS 141 3. f is weakly increasing iff f 00 = (f 0 )0 ≥ 0. Since f 0 is increasing. we get f 0 (a) = lim+ s(x) ≤ s(b) = S(a) ≤ lim− S(x) = f 0 (b). Theorem 7. Since f is differentiable. we define f (x) − f (a) s(x) = . Thus g 0 (x) ≥ 0 for all x ∈ (a. so indeed g is increasing on (a. a derivative cannot have jump discontinuities. TFAE: (i) f is convex.  Corollary 7.18. Differentiable convex functions. Proof. (i) =⇒ (ii): For x ∈ (a. f (x)−f x−a (a) = f 0 (y) for some a < y < x. By Darboux’s Theorem. 3. A differentiable convex function is C 1 . By Theorem 7.18. Taking limits. b]. TFAE: (i) f is convex. b) we define f (b) − f (x) S(x) = . b−x Since f is convex.19.  Corollary 7. (ii) f 0 is weakly increasing.20. For both directions of the proof it is convenient to consider “fixed” a < b ∈ I and “variable” a < x < b. b]: this gibes the two-secant inequality and thus the convexity of f . x−a and for x ∈ [a. 0  3Andrew Kane was a student in the 2011-2012 course who suggested this criterion upon being prompted in class. Moreover. By Theorem 7. b]. so is g and (x − a)f 0 (x) − (f (x) − f (a)) g 0 (x) = . . (x − a)2 By the Mean Value Theorem. We’re done. (ii) f 00 ≥ 0.6. f (x) − f (a) = f 0 (y) ≤ f 0 (x). the three-secant inequality for a < x < b holds: s(x) ≤ s(b) = S(a) ≤ S(x). f 0 is weakly increasing so has only jump disconti- nuities.17. b]. x−a equivalently (x − a)f 0 (x) − (f (x) − f (a)) ≥ 0. We will show that g is increasing on (a. f is convex iff f 0 is weakly increasing.

Proof. Then the secant line between (c. Then h(x) = f (x) − `(x) is a degree two polynomial with h(c) = 0. The following result secures the importance of convex functions in applied mathe- matics: for these functions. f (c)). We claim that for all c ∈ R the tangent line `c (x) to the parabola f (x) = Ax2 + Bx + C is the unique line pass- ing through (c. Therefore j has a root at some d 6= c and that point d. c) and (c. We claim that the tangent line `c (x) to f (x) = Ax2 +Bx+C is a supporting line iff A > 0. To see this. and thus f (x) − `c (x) = g(x) 6= 0 for x 6= c. Let f : I → R be a differentiable convex function. 3.e. But limx→±∞ f (x)−`c (x) = ∞ if A > 0 and −∞ if A < 0. C ∈ R with A 6= 0. so by the Mean Value Theorem there is z ∈ (c. Indeed. contradicting f 0 being weakly increasing. so there is w ∈ (b. f 0 ≡ 0 on the subinterval |[c. On the other hand. h(x) = A(x − c)j(x) with j(x) a linear function with j(c) 6= 0. b) If e 6= c is such that f 0 (e) = 0. An extremal property of convex functions. It follows that if A > 0. So f (c) ≤ f (x) for all x ≥ c. c) Exhibit a convex function f : R → R which does not assume a global minimum. and thus f is constant on |[c. By Theorem 7. a critical point must be a global minimum. b) → R be a convex function such that limx→a+ f (x) = limx→b− f (x) = ∞. Supporting lines and differentiability. f (b)) and (c. h0 (c) 6= 0. let A. Moreover. Similarly. `(x) = f (x). and thus h has a simple root at c. f (c)) and (d. suppose b < c is such that f (b) < f (c).21. Notice that ` = 0 is the tangent line to y = f (x) at c = 0. Example: More generally.18. So f (x) ≥ f (c) for all x ≤ c. That is. Then g is a quadratic polynomial with leading coeffi- cient A and g(c) = g 0 (c) = 0. so must have constant sign. c) such that f 0 (w) > 0. d) such that f 0 (z) < 0. b) replaced by R = (−∞. e]|. Let f : I → R be a function. b) State and prove an analogue of part a) with (a. f attains a strict global minimum at c: for all x 6= c. ∞) the continuous function f (x)−`c (x) is nonzero. a) Suppose d > c is such that f (d) < f (c). A supporting line for f at c ∈ I is a linear function ` : R → R such that f (c) = `(c) and f (x) ≥ `(x) for all x ∈ I. on both the intervals (−∞. e]| from c to e. Example: Consider the function f : R → R given by f (x) = x2 . i.  Exercise: a) Let f : (a. f (c)) has positive slope. Then: a) f attains its global minimum at c.8. f attains a global minimum at c. since ` is not the tangent line at c. But then c ≤ z and 0 = f 0 (c) > f 0 (z). contradicting f 0 being weakly incereasing.142 7. f (c)) such that f (x) 6= `c (x) for all x 6= c. f (x) > f (c). Let c ∈ I be a stationary point: f 0 (c) = 0. B. Observe that the horizontal line ` = 0 is a supporting line at c = 0: indeed f (0) = 0 and f (x) ≥ 0 for all x. Show that f assumes a global minimum.7. Then the secant line between (b. f 0 is weakly increasing. consider the function g(x) = f (x) − `c (x).. so g(c) = A(x − c)2 . f (x) − `c (x) > 0 for all x 6= c . f (d)) has negative slope. let ` be any other line passing through (c. b) Unless f is constant on some nontrivial subinterval of I. DIFFERENTIAL MISCELLANY 3. ∞). But then b ≤ c and f 0 (b) > 0 = f 0 (c). then since f 0 is weakly increasing. Theorem 7.

For a function f : I → R. and for every c < 0. Since for all x ∈ I. then f is convex. (ii) f admits a supporting line at each c ∈ I. Thus `(x) = mx is a supporting line for f at c = 0. f is the supremum of a family of convex functions. For all λ1 .22. TFAE: (i) f is convex. −λ2 α ∈ I. f (x) = supi∈I fi (x). f (x) ≥ `c (x) for all c and f (c) = `c (c). so we may assume that c = 0 and f (0) = 0. Since Epi(f ) is a convex subset of R2 . For every c > 0. More precisely. f (x) − `c (x) < 0 for all x – so `c is not a supporting line. (i) =⇒ (ii): Neither property (i) or (ii) is disturbed by translating the coordinate axes. Let a < b ∈ I and λ ∈ (0. f is convex. Example: Consider the function f : R → R given by f (x) = |x|. the line y = −x is a supporting line. but it is not the only one: indeed y = mx is a supporting line at c = 0 iff −1 ≤ m ≤ 1. f (tα) ≥ mt for all t ∈ R such that tα ∈ I. y = 0 is a supporting line. hence convex by Lemma 7. by the secant-graph inequality we have   λ1 λ2 0 = (λ1 + λ2 )f (−λ2 α) + (λ1 α) ≤ λ1 f (−λ2 α) + λ2 f (λ1 α). Proof. Note that the smallest slope of a supporting line is the left-hand derivative at zero: 0 f (0 + h) − f (0) −h f− (0) = lim− = lim− = −1. Note that since f 00 (x) = 2A. λ2 λ1 Equivalently. Since the linear functions `c are certainly convex. Let α ∈ I \ {0}. For c = 0. i∈i i∈I  Theorem 7. Then f ((1 − λ)a + λb) = sup fi ((1 − λ)a + λb) i∈I ≤ (1 − λ) sup f (a) + λ sup f (b) = (1 − λ)f (a) + λf (b). 1). f : I → R is a function such that for all x ∈ I. h→0 h h→0 h and the largest slope of a supporting line is the right-hand derivative at zero: 0 f (0 + h) − f (0) h f+ (0) = lim = lim = −1. CONVEX FUNCTIONS 143 – so `c is a supporting line – and also that if A < 0.23. 3. Convex functions are closed under suprema. h→0+ h h→0 h + Lemma 7. the line y = x is a sup- porting line. Let I be an open interval. (ii) =⇒ (i): For each c ∈ I. f is convex iff A > 0. if {fi : I → R}i∈I is a family of convex functions. λ1 + λ2 λ1 + λ2 or −f (−λ2 α) f (λ1 α) ≤ .21  . Proof. and in both cases these supporting lines are unique. λ2 λ1 −f (−λ2 α) f (λ1 α) It follows that supλ2 λ2 ≤ inf λ1 λ1 . λ2 > 0 such that λ1 α. so there is m ∈ R with −f (−λ2 α) f (λ1 α) ≤m≤ . let `c : I → R be a supporting line for f at c. we have f (x) = supc∈I `c (x).

`(x) > f (x) and thus ` is not a supporting line for f at c. we recall the notion of one-sided differentiability: if f is a function defined (at least) on some interval [c − δ. We follow [Go. 0 f (x) ≥ f (c) + f+ (c)(x − c). so `(x) = f (c) + m(x − c) is a supporting line for f at c. and A ≤ B. From the three-secant inequality we immediately deduce all of the following: ϕ is weakly increasing on (a. c]. Let I be an interval. we say f is left differentiable at c if limx→c− f (x)−f x−c (c) exists. Proof. f− (c) ≤ f+ (c). `(x) > f (x) and thus ` is not a supporting line for f at c. f (c)) is a supporting line for f iff its slope m satisfies 0 0 f− (c) ≤ m ≤ f+ (c). b). put `(x) = f (c) + m(x − c). 0 0 f (v) − f (x1 ) f (v) − f (x2 ) 0 0 f− (x1 ) ≤ f+ (x1 ) ≤ ≤ ≤ f− (x2 ) ≤ f+ (x2 ). d) A line ` passing through (c. and if so. we have 0 f (x) ≥ f (c) + f− (c)(x − c) ≥ f (c) + m(x − c). p. Exercise: Assume the hypotheses of Theorem 7. if x ≤ c. 153] and [Gr. pp. Show TFAE: . if x ≥ c. b).24. 0 and if so. a). DIFFERENTIAL MISCELLANY Before stating the next result. we denote this limit by f+ (c). the left derivative of f at c. 0 0 c) f− . 0 0 b) For all c ∈ (a. 8-10]. the right derivative of f at c. b). and for m ∈ R. Show that there is δ > 0 such that for all x ∈ (c. Thus 0 0 f− (c) = lim− ϕ(x) = sup A ≤ inf B = lim+ ϕ(x) = f+ (c).  Exercise: We supppose the hypotheses of Theorem 7. 0 f (x) ≥ f (c) + f− (c)(x − c). put A = ϕ((a.) Theorem 7. 0 a) Suppose that m < f− (c). That these are the only possible slopes of supporting lines for f at c is left as an exercise for the reader.24. c + δ). x→c x→c c) Let x1 .24. Show that there is δ > 0 such that for all x ∈ (c − δ. Similarly. if f is defined (at least) on sme interval [c. c + δ). (As usual 0 0 for limits. v − x1 v − x2 d) The proof of parts a) and b) shows that for x ∈ I. f is differentiable at c iff f− (c) and f+ (c) both exist and are equal. we say f is right differentiable at c if limx→c+ f (x)−fx−c (c) exists. f+ : I → R are both weakly increasing functions.144 7. b) \ {c} → R by f (x) − f (c) ϕ(x) = . b) Define ϕ : (a. and choose v with x1 < v < x2 . 0 0 Thus if f− (c) ≤ m ≤ f+ (c). 0 b) Suppose that m > f+ (c). we denote this limit by 0 f− (c). c)). b) with x1 < x2 . ϕ is weakly increasing on (c. and f : I → R be convex. c). if x ≤ c. c). 0 0 a) f is both left-differentiable and right-differentiable at c: f− (c) and f+ (c) exist. x2 ∈ (a. 0 f (x) ≥ f (c) + f+ (c)(x − c) ≥ m(xc ) if x ≥ c. c ∈ I ◦ . x−c Further. B = ϕ(c. Then by part b) and the three-secant inequality.

. . = λn = 0 and we have. . . . xn+1 ∈ I and λ1 .+λn+1 xn+1 = (1−λn+1 ) x1 + . and consider x1 . .25. If λn+1 = 0 we are reduced to the case of n variables which holds by induction. (iv) f has a unique supporting line at c. . the base case n = 1 being trivial. . . 1−λn+1 are non-negative numbers that sum to 1. 0 (ii) f+ is continuous at c. . + λn f (xn ) + λn+1 f (xn+1 ). . + λn = 1. . (iii) f is differentiable at c. by induction the n variable case of Jensen’s Inequality can be applied to give that the above expression is less than or equal to   λ1 λn (1 − λn+1 ) f (x1 ) + . . . .9. . . . . 1). Remark: One can go further: a convex function is twice differentiable at “most” points of its domain. (Jensen’s Inequality) Let f : I → R be continuous and convex. we have f (λ1 x1 + . 1 − λn+1 1 − λn+1 so that λ1 λn f (λ1 x1 + . lies considerably deeper than the one of the previous remark. Now for the big trick: we write   λ1 λn λ1 x1 +. Since the union of two countable sets is countable. . . 3. 1 − λn+1 1 − λn+1 λ1 λn Since 1−λ n+1 . . . . λn ∈ [0. . . it follows that a convex function is differentiable except possibly at a countable set of points. Proof. + f (xn ) + λn+1 f (xn+1 ) 1 − λn+1 1 − λn+1 = λ1 f (x1 ) + . . + xn + λn+1 f (xn+1 ). . equality. . xn ∈ I and any λ1 . . + λn f (xn ). 1] with λ1 + . For those who are familiar with the notions of countable and uncountable sets. . λn+1 ∈ [0. + λn+1 = 1. + xn +λn+1 xn+1 . . and the tangent line is a supporting line at c. it can be shown that such functions are continuous at “most” points of their domain.  . . For any x1 . .+ xn )+λn+1 xn+1 ) 1 − λn+1 1 − λn+1   λ1 λn ≤ (1 − λn+1 )f x1 + . . . trivially. + λn xn ) = f ((1−λn+1 )( x1 +. . . Jensen’s Inequality. We go by induction on n. CONVEX FUNCTIONS 145 0 (i) f− is continuous at c. So suppose Jensen’s Inequality holds for some n ∈ Z+ . . + λn xn ) ≤ λ1 f (x1 ) + . . . 1) and thus also that 1 − λn+1 ∈ (0. which is itself due to Lebesgue. So we may assume λn+1 ∈ (0. 3. . we may be more precise: the set of discontinuities of a monotone function must be countable. The sense of “most” here is different (and weaker): the set of points at which f fails to be twice differentiable has measure zero in the sense of Lebesgue. 1] with λ1 + . Theorem 7. . This result. . Remark: Because a weakly increasing function can only have jump discontinu- ities. Similarly if λn+1 = 1 then λ1 = .

. .+λn yn ≤ λ1 ey1 +. . Now apply the Weighted Arithmetic-Geometric Mean Inequality with n = 2. . . . ∞) with f 00 (x) > 0 there.+λn eyn = λ1 x1 +. p p (|x1 | + . But just as soon as you take graduate level real analysis. + |yn |q ) q . so we may assume that neither of these is the case. + λn = 1. so by the Kane Criterion f is convex on R. λ1 = p1 . ∞). 1] be such that λ1 + . . . When either x = 0 or y = 0 the left hand side is zero and the right hand side is non-negative. λ2 = 1q . (Weighted Arithmetic Geometric Mean Inequality) Let x1 . + λn xn . . Then: (32) xλ1 1 · · · xλnn ≤ λ1 x1 + . . y ∈ [0. . . Example: The function f : R → R given by x 7→ ex has f 00 (x) = ex > 0 for all x. Then xp yq (33) xy ≤ + .27. y1 .26. so the inequality holds and we may thus assume x. x1 = xp . + |xn yn | ≤ (|x1 |p + . xn > 0. ∞) satisfy p + q = 1. = yn = 0. By plugging these convex functions into Jensen’s Inequality and massaging what we get a bit. λn ∈ [0. . = λn = n. + |xn | ) p (|y1 | + . We get 1 1 xp yq xy = (xp ) p (y q ) q = xλ1 1 xλ2 2 ≤ λ1 x1 + λ2 x2 = + . . ∞).28.. . For 1 ≤ i ≤ n. . put yi = log xi . + xn (x1 · · · xn ) n ≤ . . . . p q Proof. DIFFERENTIAL MISCELLANY 3.4 Theorem 7. . x2 = y q . . . apply Young’s Inequality with |xi | |yi | x= 1 . ∞) → R by x 7→ xp is twice dif- ferentiable on (0. (Young’s Inequality) 1 1 Let x. . .+λn xn . . y > 0. the function f : [0. so f is convex on [0. . Some applications of Jensen’s Inequality. . + |xn |p ) (|y1 |q + . .. . Example: For any p > 1. .10. . . . q ∈ (1. . ∞) and λ1 . 1 Taking λ1 = . xn ∈ [0. . . xn . . .146 7. ∞) satisfy p + q = 1. p q  Theorem 7. yn ∈ R and let p. . we get the arithmetic geometric mean inequality: 1 x1 + . We may assume x1 . = xn = 0 or y1 = . Then 1 1 (34) |x1 y1 | + . q ∈ (1.  Theorem 7. . . . the result is clear if either x1 = . Then λ1 ···xλ n xλ1 1 · · · xλnn = elog(x1 n ) = eλ1 y1 +. p Proof. ∞) and let p. (Hölder’s Inequality) 1 1 Let x1 . we will quickly deduce some very important inequalities. . As above. . For 1 ≤ i ≤ n. . . n Proof.y = 1 . + |yn |q ) q q 4Unfortunately we will not see here why these inequalities are important: you’ll have to trust me on that for now. . you’ll see these inequalities again. so by the Kane Criterion f is convex on (0. Also f is continuous at 0.

. . . . .  Corollary 7. .+|xn ||xn +yn |p−1 +|y1 ||x1 +y1 |p−1 +. . . . . the inequality reads |x1 + y1 | + . . . |yn |p ) p Proof.+|yn ||xn +yn |p−1 ≤ 1 1 1 1 (|x1 |p +. . + |xn | + |yn | and this holds just by applying the triangle inequality: for all 1 ≤ i ≤ n. + |yn | ) p q p q  Theorem 7.+|xn +yn |p ) q +(|y1 |p +. . . We have |x1 + y1 |p + . . . When p = 1. . . + |xn + yn | ≤ |x1 | + |y1 | + . and note that then (p − 1)q = p. . . p p q q (|x1 | + . getting Pn i=1 |xi yi | 1 1 1 1 ≤ + = 1. .+|xn +yn |p ) q  1 1  1 = (|x1 |p + . . + |xn | ) (|y1 | + . . . |yn |p ) p (|x1 + y1 |p + . 3. + |xn + yn |p ) q . (Minkowski’s Inequality) For x1 .+|yn |p ) p (|x1 +y1 |p +. . . . . . . . . . xn .+|xn |p ) p (|x1 +y1 |p +. . . + (xn + yn )2 ≤ x21 + . + |xn + yn |p ) q and using 1 − 1q = p1 . yn ∈ R and p ≥ 1: 1 1 1 (35) (|x1 + y1 |p + . . . we get the desired result. 1 Dividing both sides by (|x1 + y1 |p + . . xn . . . .29. + yn2 . . y1 . + |xn |p ) p + (|y1 |p + . CONVEX FUNCTIONS 147 and sum the resulting inequalities from i = 1 to n. . . + |xn + yn |p HI ≤ |x1 ||x1 +y1 |p−1 +. + |xn |p ) p + (|y1 |p + . + x2n + y12 + . yn ∈ R p q q (x1 + y1 )2 + . . So we may assume p > 1. . (Triangle Inequality in Rn ) For all x1 . . . y1 .30. . + |xn + yn |p ) p ≤ (|x1 |p + . . . . . . Let q be such that p1 + 1q = 1. |xi + yi | ≤ |xi | + |yi . . . . .

.

b]. One feature that we haven’t explicitly addressed yet is this: for 149 . Rc Exercise 1. let’s take a Rb more axiomatic approach: given that we want a f to represent the area under y = f (x). because we do not have a formal definition of the area of a subset of the plane! In high school geometry one learns only about areas of very simple figures: polygons. As you probably know. Rb Rc Rc (I3) If a ≤ c ≤ b. But wait! Before plunging into the details of this limiting process. Rb we wish to associate a number a f . Well. Even more. almost. Dealing head-on with the task of assigning an area to every subset of R2 is quite difficult: it is one of the important topics of graduate level real analysis: measure theory.1: Show (I1) implies: for any f : [a. what properties should it satisfy? Here are some reasonable ones. we turn to the third main theme of calculus: integration. reasonably careful definitions appear in the textbook somewhere. b] into subintervals and take the sum of the areas of cer- tain rectangles which approximate the function f at various points of the interval (Riemann sums). It turns out that these three axioms already imply many of the other properties we want an integral to have. c f = 0. As usual in freshman calculus. the definite integral. Rb the general idea is to construe a f as the result of some kind of limiting process. Rb (I1) If f = C is a constant function. bounded on the left by x = a and bounded on the right by x = b. bounded below by y = 0. circles and so forth. there is essentially only one way to define Rb a f so as to satisfy (I1) through (I3). CHAPTER 8 Integration 1. Rb our intuition is that a f should represent the area under the curve y = f (x). but with so little context and development that (almost) no actual freshman calculus student can really appreciate them. The basic idea is this: for a function f : [a. b] → R and any c ∈ [a. The Fundamental Theorem of Calculus Having “finished” with continuity and differentiation. Rb So we need to back up a bit and give a definition of a f . wherein we divide [a. When f is non-negative. Unfortunately this is not yet a formal definition. then a f = a f + b f . b] → R. Ra b Rb (I2) If f1 (x) ≤ f2 (x) for all x ∈ [a. b]. or more precisely the area of the region bounded above by y = f (x). then a f1 ≤ a f2 . then C = C(b − a).

and one sees easily that the theorem holds in this case. As to exactly what this set R[a. b]. b) If f is continuous at c ∈ [a. b] we have f1 (x) ≤ f2 (x) for all x ∈ [a.  So we may assume M > 0. b]. b] → R}. That is. By (I0). then f ∈ R[a. Thus the class C[a. and there are some functions for which this area con- cept seems meaningful but for which the area is infinite. then a f = a f + c f . with domain some set of functions {f : [a. b] be any integrable function. If this business of “integrable functions” seems abstruse. b]. Now we have the following extremely important result. b]. For x ∈ [a. b] → R is con- tained in the class B[a. let’s assume the following: (I0) For all real numbers a < b: a) Every continuous f : [a. b] is bounded. b] ⊆ R[a. it turns out that we have some leeway. By the Extreme Value Theorem. b] of all bounded functions f : [a. Theorem 8. b]. say R[a. and axiom (I0) requires that the set of integrable functions lies somewhere in between: C[a. c] and Rb Rc Rb f ∈ R[c. b] of all continuous functions f : [a. If M = 0 then f is the constant function 0. b] → R. b] and a C = C(b − a). a function F : [a. Proof. then a f1 ≤ a f2 . b] → R lies in R[a. a a c . Then: a) The function F : [a. b] of integrable functions should be. Let’s recast the other three axioms in terms of our set R[a. If these equivalent conditions hold. INTEGRATION Rb which functions f : [a. every continuous function f : [a. then on the first pass just imagine that R[a. by (I3) Z x Z c Z x (36) F(x) − F(c) = f− f= f. Then f ∈ R[a. and F 0 (c) = f (c). c) If f is continuous and F is any antiderivative of f – i. b] of integrable functions: Rb (I1) If f = C is constant. (I3) Let f : [a. b]. we wish to associate a real number a f .. b] ⊆ B[a. and let c ∈ (a. So it turns out to be useful to think of integration itself as a real-valued function. b] → R is bounded. b] → R do we expect a f to be defined? For all functions?? A little thought shows this not to be plausible: there are some functions so patho- logical that there is no reason to believe that “the area under the curve y = f (x)” has any meaning whatsoever. b]. b].e. then F is differentiable at c. b). define F(x) = a f . b] is precisely the set of all continuous functions f : [a. b] → R is continuous at every c ∈ [a. b) Every function f ∈ R[a.150 8. b] → R Rb such that F 0 (x) = f (x) for all x ∈ [a. b] iff f ∈ R[a. b]. (Fundamental Theorem of Calculus) Rx Let f ∈ R[a. f2 ∈ R[a. b]. b]. of integrable functions f : [a. for each a ≤ b we wish to have a set. For all  > 0. but to get a theory which is useful and not too complicated. Indeed. b] → R and for each Rb f ∈ R[a.1. then a f = F (b) − F (a). and then it follows from (I1) that F is also the constant function zero. there exists M ∈ R such that |f (x)| ≤ M for all x ∈ [a. b] → R. b]. Rb Rb (I2) If for f1 . we may take δ = M .

x c) By part b). These same considerations answer the conundrum of why the celebrated Theo- rem 8. so by (I2) and then (I2) we have Z B Z B −M (B − A) = (−M ) ≤ f ≤ M (B − A). THE FUNDAMENTAL THEOREM OF CALCULUS 151 Moreover. there exists δ > 0 such that |x − c| < δ =⇒ |f (x) − f (c)| < . Therefore Rx Rx Rx f (c) −  c c f f (c) +  f (c) −  = ≤ ≤ c = f (c) + . We have just found an antiderivative F. let a ≤ A ≤ B ≤ b. B = x. Then still −M ≤ f (x) ≤ M for all x ∈ [A. But we have shown that if antiderivatives exist at all they are unique up to an additive constant. A A and thus Z B (37) | f | ≤ M (B − A). so if F is any other antiderivative of f we must have F (x) = F(x) + C for some constant C. and again. we get Z x    |F(x) − F(c)| = | f | ≤ M |x − c| < M = . Soon after I gave my 2005 lectures I found that a very similar “axiomatic” treatment of the integral was given by the eminent mathematician .1 has such a short and simple proof. or equivalently f (c) −  < f (x) < f (c) + . a a a  Remark: Although we introduced the integral “axiomatically”. for all  > 0. x−c This shows that F 0 (c) exists and is equalR to f (c). A  Now suppose |x − c| < δ = M. x−c x−c x−c and thus Rx f | c − f (c)| ≤ . c M b) Suppose f is continuous at c. x−cx→c Since f is continuous at c. F(x) = a f is an antiderivative of f . there is at most one such function. The current presentation is an adaptation of my lecture notes from this older course. 1. B].1 The theorem assumes that we already 1This is not just florid language. Using (36) and then (37) with A = c. b] → R satisfying the (rea- Rb sonable!) axioms (I1) through (I3) is to take a f to be an antidervative F of f with F (a) = 0. as long as we are only trying to integrate continuous functions we had no choice: the only way to Rb assign a value a f to each continuous function f : [a. I taught second semester calculus four times as a graduate student and really did become puzzled at how easy it was to prove the Fundamental Theorem of Calculus so soon after integration is discussed. We wish to compute F(x) − F(c) F 0 (x) = lim . if f is continuous. and then Z b Z a Z b F (b) − F (a) = (F(b) + C) − (F(a) + C) = F(b) − F(a) = f− f= f. I worked out the answer while teaching an undergraduate real analysis course at McGill University in 2005.

.1b) to get that a f is an antiderivative of f . but we have not yet constructed this integral! In other words. b] → R such that for every closed subinterval [c. It is also what freshman calculus students seem to think is taught in Rb freshman calculus.2 But I do not know any way to prove that an arbitrary continuous function has Rb an antiderivative except to give a constructive definition of a f as a limit of sums Rx and then appeal to Theorem 8. Thus Theorem 8. 2This is not what the books actually say. we have settled the problem of uniqueness of the definite integral but (thus far) assumed a solution to the much harder problem of existence of the definite integral. Now we begin the proof of the hard fact lurking underneath the Fundamental The- orem of Calculus: that we may define for every continuous function f : [a. And again. Building the Definite Integral 2. Upper and Lower Sums. INTEGRATION Rb have an integral. . f : [c. this existence problem is equivalent to an existence problem that we mentioned before. d] ⊂ [a. and then we want to consider the lower sum and upper sum associated to f on each subinterval. Of course this holds for all continuous functions. Okay. Then the lower sum should be less than or equal to the “true area under the curve” which should be less than or equal to the upper sum.. let’s do it! Serge Lang in [L]. b].152 8. This is the approach that Newton himself took.. let us only consider functions f : [a. So the presentation that I give here is not being given by me for the first time and was not originated by me. although he didn’t prove that every continuous function was a derivative but rather merely assumed it. d] → R has a maximum and minimum value.1 is “easy” because it diverts the hard work elsewhere: we need to give a constructive definition of the definite integral via a (new) kind of limiting pro- cess and then show “from scratch” that applied to every continuous f : [a. namely that the definition of a f is F (b) − F (a). Thus: if we could prove by some other means that every continuous function f is the derivative of some other function F . namely that every continuous function has an antiderivative. b] → R Rb this limiting process converges and results in a well-defined number a f . The basic idea is familiar from freshman calculus: we wish to subdivide our in- terval [a. 2. We have shown that there is at most one such integral on the con- tinuous functions. and by dividing [a.1. i. b] → R Rb a number a f so as to satisfy (I1) through (I3) above.but nevertheless the material is rarely presented this way. so we should define a f via some limiting process involving lower sums and upper sums. b] into a bunch of closed subintervals meeting only at the endpoints. b] → R) 7→ a f for every continuous function f . but what they actually say they don’t say loudly enough in order for the point to really stick. we will make a simplifying assumption on our class of integrable functions: namely.e. so it will be a good start. b] into more and smaller subintervals we should get better and better Rb approximations to the “true area under the curve”. an assignment (f : [a. For now. then by the above we may simply define Rb a f = F (b) − F (a).

This definition. To see this. although correct.) Here are some examples. b] → R and the partition P = {x0 . f is integrable. (It turn outs that there is always at least one I lying between every lower sum and every upper sum. with a f = b − a. 2. We say the function f : [a. P) = C(b − a). any continuous function!). b] and use them to divide [a. b] we have U (f. xi+1 ] and let Mi (f ) denote the maximum value of f on the subinterval [xi . but this is as yet far from clear. . b] → R be any function admitting a minimum and maximum value on every closed subinterval of [a. . Let f : [a. xi+1 ]: by definition of mi . then for every partition P on [a. xn } as n−1 X L(f. P) = b − a. For- mally. BUILDING THE DEFINITE INTEGRAL 153 Step 1: We need the notion of a partition of an interval [a. P).g. or equivalently the area under the constant function y = mi (f ) on [xi . xi+1 ]. For 0 ≤ i ≤ n − 1. at which f (c) = 0. b] we have L(f. this is the largest constant function whose graph lies on or below the graph of f at every point of [xi . b] f has 1 as its maximum value. and the other (distinct) points are absolutely arbitrary but written in increasing order. P) = U (f. On the other hand. ≤ an−1 ≤ an = b. Example 2. Indeed this is because on every subinterval of [a. Thus the unique I in question is celarly C(b − a): constant functions are integrable. b] we have L(f. b]: we choose finitely many “same points” in [a. a partitition P is given by a positive integer n and real numbers a = a0 ≤ a1 ≤ . That is. Then we define the lower sum associated to f : [a.1: If f (x) ≡ C is a constant function. . b] → R is integrable if there is a unique I ∈ R such that for every partition P of [a. the quantity xi+1 − xi is simply the length of the subinterval [xi . Therefore the quantity mi (f )(xi+1 − xi ) is simply the area of the rectangle with height mi (f ) and width xi+1 − xi . we require the “first sample point” a0 to be the left endpoint of the interval. P) ≤ I ≤ U (f. b] (e. is not ideally formulated: it underplays the most important part – the uniqueness of I – while making it annoying to show the exis- tence of I. So consider the constant function mi (f ) on the interval [xi . b] except for one interior point c. b] into subintervals. i=0 These sums have a simple and important geometric interpretation: for any 0 ≤ i ≤ n − 1. for . . xi+1 ]. first observe that for any partition P of [a. P) = mi (f )(xi+1 − xi ) i=0 and also the upper sum associated to f and P as n−1 X U (f. the “last sample point” an to be the right endpoint of the interval. xi+1 ]. P) = Mi (f )(xi+1 − xi ). .2: Suppose f (x) is constantly equal to 1 on the interval [a. Example 2. let mi (f ) denote the minimum value of f on the subinterval [xi . . We claim that despite having a dis- Rb continuity at c. xi+1 ].

this shows that f is not integrable: rather than the lower sums and the upper sums becoming arbitrarily close together. P) = 0 and U (f. 1] and show that L(f. 1] into n equally spaced subintervals – let us call this partition Pn – then since f is increasing its minimum on each subinterval occurs at the left endpoint and its maximum on each subinterval occurs at the right endpoint. who came up with an elegant way to see whether there is an unbridgeable gap between the lower and upper sums. But constancy of upper sums only occurs in trivial ex- amples.3: Show that starting with the constant function C on [a. For this very simple function f (x) = x it is possible to grind this out directly. For instance.e. which as we will see is much more pleasant than our current setup. 2. i=0 n n n i=1 n 2 2n Since limx→∞ x−1 2x = limx→∞ x+1 1 2x = 2 . b] we have L(f. P) and U (f. . P) ≤ 12 ≤ U (f. When this happens. P) is b − a. but it’s quite a bit of work. P) are con- stant. one can show f is integrable by finding a sequence of partitions for which the lower sums approach this common value U (f. And that’s just to find the area of a right triangle! Example 2. i=0 n n n i=0 n 2 2n and n−1 X  n   i+1 1 1 X 1 n(n + 1) 1 U (f. In the next section we do things Darboux’s way. Exercise 2. It did so to the late 19th century mathematician Jean-Gaston Darboux. 1]. P) = 1·(b−a) = b−a. If we partition [0. whereas on every other subinterval the minimum is again 1.. 1]. Because every subinterval contains both ratio- nal and irrational numbers. b] → R which is 1 at every irrational point and 0 at every rational point. Assuming of course that a < b. the upper and lower sums can both be 1 made arbitrarily close to 2 by taking n to be sufficiently large. Pn ) = · = 2 i= 2 · =1− . Darboux Integrability. The previous examples have the property that the upper sums U (f. P) = (b − a)(1 − ). Darboux’s definition crucially and cleverly relies on the existence of suprema and infima for subsets of R. Unfortunately we have not yet shown that f is integrable according to our definition: to do this we would have to consider an arbitrary partition P of [0. there is an unbridgeable gap between them. so a f = (b − a). Pn ) = · = 2 i= 2 =1+ . c is not one of the points of the partition).4: Consider the function f : [a. its integral must be 12 . This shows that the unique number between Rb every L(f.154 8. P). so L(f. The previous example perhaps suggests a solution to the problem. Thus if f (x) = x is integrable on [0. suppose we want to show that f (x) = x is integrable on [0.2. b] and changing Rb its value at finitely many points yields an integrable function f with a f = C(b−a). P) which must then be the integral. we may choose a partition in which c occurs in exactly one subinterval (i. for every partition P of [a. Then the lower sum on that subinterval is 0. INTEGRATION any sufficiently small  > 0. Thus n−1 Xi 1 n−1   1 X 1 (n − 1)n 1 L(f.

. Conversely. then by parts a) and b) f is bounded above and below on [a. We say that P2 refines P1 if P2 contains every point of P1 : i. Observe though that the lower sum could take the value −∞ and the upper sum could take the value ∞. P) = Mi (f )(xi+1 − xi ) ∈ (−∞. P) = ∞. L(f. (iii) f is not bounded above on [a. Proof. < xn−1 < xn = b} be a partition of [a. we define the lower and upper sums associated to P: n−1 X L(f. P2 ) ≤ U (f. b] such that L(f. let mi (f ) be the infimum of f on [xi . if f is bounded on [a. b] → R be any function. b]. which forces L(f. P) > −∞. Proposition 8. P) > −∞ and U (f. ∞]. P) = −∞. 2. b] → R we have L(f.  Let P1 and P2 be two partitions of [a. b]. . . b]. P). i=0 For any f and P we have (38) L(f. P) = −∞. b]. b]. b]. c) If for all partitions P. a) The following are equivalent: (i) For all partitions P of [a. b]. BUILDING THE DEFINITE INTEGRAL 155 Let f : [a. So there is at least one i such that mi (f ) = −∞. (iii) =⇒ (i): Suppose f is not bounded below on [a. b) This is similar enough to part a) to be left to the reader. ∞). P) < ∞. c) The following are equivalent: (i) For all partitions P of [a. b] (i. b]. The following result clarifies when this is the case. P) < ∞. Then for all partitions P = {a = x0 < . xi+1 ] and Mi (f ) be the supreumum of f on [xi . P1 ) ≤ L(f. P) = mi (f )(xi+1 − xi ) ∈ [−∞. so is bounded on [a. For a partition P = {a = x0 < x1 < . b]. we have mi (f ) ≥ m > −∞. P2 refines P1 ). b]. ∞) and Mi (f ) ∈ (−∞. P) < ∞. (iii) f is not bounded below on [a. so by parts a) and b). xi+1 ]. P2 ) ≤ U (f. Thus we have mi (f ) ∈ [−∞. Lemma 8. . i=0 n−1 X U (f.e. < xn−1 < xn = b} and 0 ≤ i ≤ n − 1.. P1 ). P) > −∞ and U (f. (Refinement Lemma) Let P1 ⊂ P2 be partitions of [a. P) ≤ U (f. ∞]. contradicting our assumption. Let f : [a. P) = −∞. b) The following are equivalent: (i) For all partitions P of [a. so L(f. b]. L(f. for all partitions P we have L(f. L(f. (ii) f is bounded on [a. then minn−1 i=0 mi (f ) is a finite lower bound for f on [a. . a) (i) =⇒ (ii) is immediate. P) = ∞. .e. b]. if P1 ⊂ P2 . (ii) =⇒ (iii): We prove the contrapositive: suppose that there is m ∈ R such that m ≤ f (x) for all x ∈ [a. (ii) There exists a partition P of [a. b] → R be any function. . Then for any bounded function f : [a. U (f. b]. If mi (f ) > −∞ for all 0 ≤ i ≤ n − 1. (ii) There exists a partition P of [a. and let P = {a = x0 < . As above. P) > −∞ and U (f. < xn−1 < xn = b} and all 0 ≤ i ≤ n − 1. b] then it is bounded above and below on [a.3. b] such that U (f. b]. .2.

M2 . P2 ). Then L(f. so m ≤ m1 . P2 ). but the most economical choice is simply the union of the two: put P = P1 ∪ P2 . d]. Now. Then m = inf m1 . M2 ≤ M. Let m and M be the infiumum and supremum of f on [c. P) as Rb P ranges over all partitions of [a. For any function f : [a. INTEGRATION Proof. The idea here is simple but important: we choose a common refine- ment of P1 and P2 . Any two partitions have infinitely many common refinements. P1 ) ≤ U (f. a a Proof. any lower sum associated to any partition is less than or equal to the upper sum associated to any other partition. P1 ) ≤ inf U (f. we say that f is Darboux Rb Rb Rb integrable if a f = a f ∈ R. a P1 P2 a  . e]. i. b] and the upper integral a f as the infimum of U (f. P2 ). Then by Lemma 8. Let f : [a.4. P2 ) = f. P2 ). M1 ≤ M. That is.156 8. b] → R. Then m(d − c) = m(e − c) + m(d − e) ≤ m1 (e − c) + m2 (e − c) and M (d − c) = M (e − c) + M (d − e) ≥ M1 (e − c) + M2 (e − c).e. and let P1 . b] → R be a function.4. Futher. P) as P ranges over all partitions of [a. P) ≤ U (f. and m2 and M2 be the infimum and supremum of f on [e.  Now we come to the crux of Darboux’s theory of integrability: for every function f : [a. P) ≤ U (f. let m1 and M1 be the infimum and supremum of f on [c. So it is enough to address the case of a single interval. Y ⊂ R are such that x ≤ y for all x ∈ X and all y ∈ Y . we get L(f.  Lemma 8. b] → R we define the lower integral intba f as the supremum of L(f. Lemma 8. for any partitions P1 and P2 we have L(f. Finally.5. we have Z b Z b f≤ f. b]. P2 be partitions of [a. a partition which refines (contains) both P1 and P2 . d] of P1 and insert a new point e to subdivide it into two subintervals. and we denote this common value by a f . Proof. P1 ) ≤ L(f. P2 ) ≤ L(f. then sup X ≤ inf Y . m2 and M = sup M1 . d].3 we have L(f. If P2 refines P1 . This shows that subdivision causes the lower sum to stay the same or increase and the upper sum to stay the same or decrease.. Since this holds each time we add a point to get from P1 to P2 . by Lemma 8. Recall that if X. Therefore Z b Z b f = sup L(f. m ≤ m2 . P1 ) ≤ U (f. P2 is obtained from P1 by a finite number of instances of the following: choose a subinterval [c. P2 ) ≤ U (f. b]. P1 ) ≤ L(f.

P) − L(f. Adding these two inequalities gives a Z b Z b f− f ≤ U (f. P) ≤ I ≤ U (f. Similarly. for every  > 0. c) This follows immediately from parts a) and b). P) < . a) If f is unbounded below. BUILDING THE DEFINITE INTEGRAL 157 Proposition 8. Theorem 8. we have L(f. P). b] → R. P) < . b) This is very similar to part a) and is left to the reader. P) < − f. there exists Rb P2 with U (f. then for all P. P) < . P) ≤ U (f. Let f : [a. P) − L(f. P) > −∞. b] → R be any function. b] → R with f= a f = ∞? a Finally. P). Rb a) We have a f = −∞ iff f is unbounded below. if f is Darboux integrable. a 2 and thus Z b  (40) −L(f. b]. since a f = a f = inf P U (f. a f ∈ R iff f is bounded. P1 ) > f− . Rb Proof. Rb Rb Exercise 2. P) − L(f. Rb Rb Proof. 2 a Adding (39) and (40) gives U (f. In particular. Let P = P1 ∪ P2 . But by definition.7. (iii) There is exactly one real number I such that for all partitions P of [a. we have a f ≤ U (f. there exists a partition P such that Rb Rb U (f. P2 ) < a + 2 . P) and ≥ L(f. P1 ) > a − 2 . (ii) =⇒ (i): By assumption. Rb Rb c) Therefore a f. L(f. P). (Darboux’s Integrability Criterion) For a bounded function f : [a. b] such that U (f.5: Is there a function f : [a. f = supP L(f. P) − L(f. 2. then it is bounded. P) ∈ R and thus supP L(f.  In particular the class of Darboux integrable functions satisfies axiom (IOb). there exists a Rb Rb Rb P1 with L(f. a If f is bounded below. Since P is a common refinement of P1 and P2 we have Z b  (39) U (f.6. the following are equivalent: (i) f is Darboux integrable. P) Rb a and thus also − f ≤ −L(f. a a . P) = supP −∞ = −∞. P). Since a f = f = supP L(f. Rb b) We have a f = ∞ iff f is unbounded above. P) ≥ L(f. P2 ) < f+ a 2 and also Z b  L(f. here is the result we really want. (ii) For all  > 0 there exists a partition P of [a. (i) =⇒ (ii): Fix  > 0. P) < .

Verification of the Axioms. Theorem 8. b] → R Rb the number a f satisfies axioms (I0) through (I3) above. Then (41) n−1     n−1 X b−a b−a X U (f. b]) 7→ a f satisfies the axioms (I0) through (I3) introduced in §1. Let n ∈ Z+ be such that b−a n < δ. suppose I < a f= f . Pn ) = (Mi (f ) − mi (f )) ≤ Mi (f ) − mi (f ). so for all  > 0. c) Thus the Fundamental Theorem of Calculus holdsR for the Darboux integral. P) ≤ f< f ≤ U (f. b] → R be continuous. Suppose that f is not Darboux inte- grable. this shores up the foundations of the Fundamental Theorem of Calculus and completes the proof that every continuous f : [a. P). a a Rb Rb Rb (i) =⇒ (iii): Suppose f is Darboux integrable. The key is that f is uniformly continuous. We now tie together the work of the previous two sections by showing that the assignment Rb f ∈ R([a. In x particular. In summary.3. On the other hand. Pn ) − L(f. for every continuous function f . This shows that a f is the unique real number which lies in between every lower sum and every upper sum. then I is greater than the infimum of the upper sums.5 we have f ≤ a f . i=0 n n i=0 . In particular. b] into n subintervals of equal length b−an .158 8. so there exists a partition P with I < L(f. P). Then for any partition P we have Z b Z b L(f. and let Pn be the partition of [a. so there exists a partition Rb P with U (f. we wish to prove the following result. a f ] lies between every upper sum and every lower sum. b]. INTEGRATION Rb Rb Since this holds for all  > 0. if I > a f = a f . Let R([a. a a a Rb Rb Moreover.  |x1 − x2 | < δ =⇒ |f (x1 ) − f (x2 )| < b−a . a) Let f : [a. Then I is less than the supremum of the lower a Rb Rb sums. F (x) = a f is an antiderivative of f . we have a f ≤ f . Then for a all partitions P we have Z b Z b Z b L(f. there is δ > 0 such that for all x1 . b) The operation which assigns to every Darboux integrable function f : [a. a  2. so a = f = a f ∈ R. b]) denote the set of Darboux integrable functions on [a. so f = a f ∈ R and thus f is Darboux integrable. P) ≤ f= f≤ f ≤ U (f. (Main Theorem on Integration) a) Every continuous function f : [a. Proof. P) < I. (iii) =⇒ (i): We prove the contrapositive. b] → R is Darboux integrable. P). b] → R has an antiderivative. by Lemma a Rb Rb Rb Rb 8. x2 ∈ [a. a a Rb Rb and thus every I ∈ [ f.8. b]. Similarly.

U (f. P) − L(f. b]. By the Refinement Lemma. P) ≤ L(f. Then L(f. b] → R. b] we have L(f1 . P1 ) − L(f. P2 ) ≤ f+ f ≤ U (f. Pc = P ∪ {c}. n i=0 n i=0 b−a b) (I0): By part a). BUILDING THE DEFINITE INTEGRAL 159 Now for all 0 ≤ i < n − 1. Pc ) = L(f. P1 )+L(f. P1 )) = (U (f. 2 Then P = P1 ∪ P2 is a partition of [a. P1 ) + U (f. . P2 ) < . Pc ) = L(f. Pc ) < . P2 ) a c = U (f. di ∈ [xi . Let Pc = P ∪ {c}. P). c] and P2 = Pc ∩ [c. P2 = Pc ∩ [c. b]. every Darboux integrable function on [a. P) − L(f. P1 ) + L(f. P2 ). P1 ))+(U (f. and therefore (U (f. for all  > 0. 2 2 As for the value of the integral: fix  > 0. xi+1 ]. c] → R and f : [c. P2 ) < 2 and a partition P2 of [c. Pc ) ≤ U (f. Let P be any partition of [a. P2 ) − L(f. P) ≤ L(f2 . P1 ) + U (f. so  (42) |Mi (f ) − mi (f )| = |f (di ) − f (ci )| < . and thus Z b Z b f1 = sup L(f1 . suppose f : [a. mi (f ) = f (ci ) and Mi (f ) = f (di ) for some ci . P1 )) + (U (f. b] → R are both Darboux integrable and such that f1 (x) ≤ f2 (x) for all x ∈ [a. and U (f. then for every partition P of [a. Pc ) − L(f. Thus |ci − di | ≤ xi+1 − xi = b−an < δ. P2 ) − L(f. Let P1 = Pc ∩ [a. P1 = Pc ∩ [a. P2 ))−(L(f. there exists a partition P of [a. we showed that the constant function C is integrable on [a. b] → R are Darboux integrable. every continuous function f : [a. b−a Combining (41) and (42) gives   n−1   n−1 b−a X b−a X  U (f. f2 : [a. P2 ))   = (U (f. P1 ) + U (f. let  > 0. P1 )−L(f. b). P1 ) + L(f. b] is bounded. c]. Pc ) = U (f. P2 ) − (L(f.6. and let c ∈ (a. (I1): In Example 2. b]. P) ≤ sup L(f2 . 2. Pc ) ≤ U (f. Pn ) − L(f. Conversely. P) < . P). P1 ) + L(f. b] → R is Darboux integrable: thus. P) − L(f. (I2): If f1 . By Darboux’s criterion. P2 )−L(f. a P P a (I3): Let f : [a. P) = f2 .1. b] → R is Darboux integrable. P1 )+U (f. b] such that  U (f. b] with U (f. Pc ) ≤ U (f. so U (f. c] such that  U (f. b]. b] → R are Darboux integrable. P) < . so by Darboux’s criterion f : [a. P2 )) = U (f. Then Z c Z b L(f. Pc ) ≤ U (f. Pc ) − L(f. b] with Rb a C = C(b − a). L(f. b]. By Proposition 8. P2 ). c] → R and f : [c. P) = U (f. there is a partition P1 of [a. P). Pn ) ≤ (Mi (f ) − mi (f )) ≤ = . P2 )) < + = . Suppose first that f : [a. P1 ) − L(f. P) ≤ L(f.

P) − L(f.7 a f + c f = a f . Since x ∈ S(). b] by Real Induction. Then U (f.  2. Pa ) = f (a) · 0 = 0. b] with U (f. [x. Then f is Darboux integrable. Since x−δ < x. b] → R be a continuous function on a closed bounded interval. x + δ]) < . namely P 0 = {x. The argument for this is the same as for (RI2) except we use the interval [x − δ. (RI3) Suppose that for x ∈ (a.9. [x. Px−δ ) + U (f. P 0 ) = (x+δ−x)(max(f. x] with U (f. x} and let Px = Px−δ ∪ P 0 . x])−min(f. In this section we will give a proof of the Darboux integrability of an arbitrary continuous function f : [a. x + δ] ⊂ S(). Px ) + U (f. x]) − min(f. x + δ]. we can make the difference between the maximum value and the minimum value of f as small as we want by taking a sufficiently small interval around x: i. [x−δ. it suffices to show that for all  > 0. [x. Px ) + U (f. By Darboux’s Criterion. Pa ) = 0 < . x])) < (x − δ − a) + δ = (x − a). (RI2) Suppose that for x ∈ [a. P 0 ) < (x − a) + δ = (x + δ − a). Px ) − L(f.e. (RI1) The only partition of [a. b]. there exists a partion P of [a. b] we have [a. [x − δ. We must show that there is δ > 0 such that [a. x + δ]) − min(f. P) for every c Rc Rb Rb partition P of [a. Fix  > 0. P 0 ) − L(f. Thus if we put Px+δ = Px +P 0 and use the fact that upper / lower sums add when split into subintervals. INTEGRATION Rc Rb Thus a f+ f is a real number lying in between L(f. Px ) − L(f. Px+δ ) < (x + δ − a)). Px−δ ) = (x−δ −a). P 0 ) = U (f. Px+δ ) − L(f. P 0 ) − (L(f. b] such that U (f. [x−δ.  . P 0 ) − L(f. x−δ] such that U (f. which first proves the integrability using uniform continuity as we did above and then later goes back to give a direct proof. our strategy will be to show S() = [a. x] instead of [x. Let f : [a. Then U (f. P) < (b − a). Px−δ ) = L(f. so U (f. b) we have [a. Px−δ )) + δ(max(f. Px ) − L(f. x+δ])−min(f. x + δ] such that U (f. x) ⊂ S(). b] such that U (f. Px ) < . we have U (f. so by Theorem 8. there is a partition P of [a. and by the above observation it is enough to find δ > 0 such that x + δ ∈ S(): we must find a partition Px+δ of [a. P) and U (f. P 0 )) = (U (f. Since f is continuous at x. a] is Pa = {a}. Px ) − L(f. and for this partition we have U (f. An Inductive Proof of the Integrability of Continuous Func- tions. We want to show b ∈ S(). Px+δ ) − L(f. x−δ ∈ S() and thus there exists a partition Px−δ of [a. Now take the smallest partition of [x. We should say that we got the idea for doing this from Spivak’s text. Let P 0 = {x − δ. there is a partition Px of [a. P) − L(f. Px ) < (x − a). Px ) − L(f. c) This is immediate from Theorem 8. x + δ].160 8. b] → R which avoids the rather technical Uniform Con- tinuity Theorem. P 0 ) − L(f. Px+δ ) = U (f. and let S() be the set of x ∈ [a.1 (the Fundamental Theorem of Calculus!). Px ) = U (f. b] such that there exists a partition Px of [a. P) < . Indeed: since f is continuous at x. Pa ) = L(f. Theorem 8. x]) < . x + δ}. Px−δ ) + L(f. there exists δ > 0 such that max(f. It is convenient to prove the following slightly different (but logically equivalent!) statement: for every  > 0. x+δ])) < δ. Px−δ ) − L(f. Proof. We must show that x ∈ S(). [x. Pa ) − L(f.. there is δ > 0 such that max(f. [x − δ.4. x] ⊂ S().

[c−δ. then also y ∈ S(). Spivak [S. [c − δ. c + δ]) − inf(f. c) = limδ→0+ ω(f. Let f : D ⊂ R → R. if c is the left endpoint we put ω(f. we are considering the oscillation of f on smaller and smaller intervals centered around c and taking the limit as δ approaches zero.J.9: he establishes equality of the upper and lower integrals by differentiation. c) = limδ→0+ ω(f. |f (x)−f (c)| < . we can still define the oscillation ω(f. What’s the point? This: Proposition 8. c + δ]).. FURTHER RESULTS ON INTEGRATION 161 Exercise 2. there exists δ > 0 such that for all x with |x − c| ≤ δ. [c − δ. I) and sup(f. c). J) ≤ ω(f. c + δ]) = sup(f. ∞] and is simply equal to the infimum. The oscillation. |f (x) − f (c)| < . I). c + δ]) is an increasing function of δ. [c − δ. J) ≥ inf(f. i. So f is continuous at c. If J ⊂ I ⊂ D. I) = sup(f. (i) =⇒ (ii): If ω(f. so ω(f. c + δ]) = 0. and c an interior point of I. The following are equivalent: (i) ω(f. Proposition 8. c) = lim+ ω(f. This sort of proof goes back at least to M. inf(f. pp. J) ≤ sup(f. so the limit as δ approaches zero from the right exists as an element of [0. We define the oscillation of f on I as ω(f. Suppose now that c is a point in the interior of the domain D of f . We define the oscillation of f at c to be ω(f.g. c + δ]) < . and thus (43) ω(f. I) is in general an extended real number. (ii) f is continuous at c.10. [c−δ. I). . I). just by taking suitable half-intervals: e. and let I be an interval contained in the domain D of f . [c − δ.  Remark: If f : I → R and c is an endpoint of I. it is an honest real number iff f is bounded on I (which will almost always be the case for us). then for all  > 0. 3. c + δ]).10 remains true even if c is an endpoint of the interval I. δ→0 In other words. Further Results on Integration 3. then for all  > 0 there exists δ > 0 such that for all x with |x−c| ≤ δ. [c − δ. [c. and then sup(f. c) = 0. Note that ω(f. c) = 0. c+δ]) ≤ 2. (ii) =⇒ (i): This is almost exactly the same. 3. c+δ]) ≥ f (c)−. 292-293] gives a different uniform continuity-free proof of Theo- rem 8. [c − δ.e. f : I → R be a function. With this definition and our usual standard con- ventions about continuity at an endpoint of an interval. then inf(f. Proof. Norris [No52]. If f is continuous at c. Let I be an interval. I) − inf(f. Because of (43) the function δ 7→ ω(f. c+δ]) ≤ f (c)+. there exists δ > 0 such that ω(f.6: Show that if x ∈ S() and a ≤ y ≤ x.1. [c−δ. so ω(f.

The second statement follows easily: Z b Z b Z c | f− f| = | f | ≤ 2M (c − a). So I wrote up a “di- rect” proof of this and it was long and messy. < xn−1 < xn = b} be a partition of [a. Since the infimum of f on any subinterval of [a. b] → R with only finitely many discontinuities is Darboux integrable.11.162 8. 3. so f is Darboux integrable on [a. f |[c.b] : [c. i=0 Thus this notation is just a way of abbreviating the quantities “upper sum minus lower sum” which will appear ubiquitously in the near future. we can choose c ∈ (a. b] → R is Darboux integrable. xi+1 ])(xi+1 − xi ) = U (f. We can restate Dar- boux’s Criterion especially cleanly with this new notation: a function f : [a. P ∩ [a + δ. b] → R and let P = {a = x0 < x1 < . Let f : [a. It turns out that it is no trouble to prove a stronger version of this. b] → R be bounded.  3For once we do not introduce a name but only a piece of notation. P) = ω(f. prove it for every function with n + 1 discontinutities (inductive step). b]) as small as we like with a suitable choice of P. One then has to prove the result for a function with a single discontinuity (base case). this implies that f : [a. and as we know. b] → R is Darboux integrable.b] has exactly n discontinuities. INTEGRATION Now let f : [a. P ∩ [a + δ. P). since it is the difference between the upper and the lower sum. b]. Afterwards I realized that a better argument is by induction on the number of discontinuities. P) = ∆(f. a c a and the last quantity goes to zero as c → a+ . b]. . At this point. For such partitions. I want to discuss the result that a bounded function f : [a. P) the discrepancy of f and P. b) such that f |[a. which we can make as small as we wish by taking δ small enough. P ∩ [a. c→a c a Proof. Theorem 8. a + δ]) + ∆(f. P) < . The restricted functions are Darboux integrable by the base case and the induction hypothesis.c] has exactly one discontinuity and f |[c. we may make ∆(f. b] → R is integrable iff for all  > 0. [xi . having chosen δ. Then f is Darboux integrable and Z b Z b lim+ f= f. Fix δ > 0 and consider partitions P of [a. b). . b] → R has n + 1 points of discontinuity. b] with ∆(f. ∆(f. but this is not especially apt. b] with x1 = a + δ. We define3 n−1 X ∆(f. Suppose that for all c ∈ (a. a + δ]) ≤ 2M δ. and assuming the result for every function with n discontinuities. there exists a partition P of [a. Let M > 0 be such that |f (x)| ≤ M for all x ∈ [a. P) − L(f. b]. [a. since f is assumed to be Darboux integrable on [a + δ. Similarly. But in fact it is simplest not to call it anything but ∆(f. In an earlier course on this subject I called this quantity “the oscillation of f on P”. Discontinuities of Darboux Integrable Functions. ∆(f. Here the inductive step is especially easy: if f : [a. b]. b] is at least −M and the supremum is at most M . P)! . Thus we can make the oscillation at most  for any  > 0. So really it is enough to treat the case of a bounded function with a single discontinuity. Better perhaps would be to call ∆(f. b]).2.

+ f (xn )) . Let f : [a. Pn ) arbitrarily small. xi+1 ] is attained at the right endpoint xi+1 . + f (xn−1 )) . b] into n equal parts. P) < . they are dense: any nontrivial subinterval contains at least one R 1point of discontinuity. .11 (and its reflected version) to prove Corollary 8. For such functions a miracle occurs: for every partition P = {a = x0 < x1 < . A monotone function f : [a. + f (xn )(xn − xn−1 ). To construct such things becomes much easier with some knowledge of infinite sequences and series. Things simplify further if we simply take Pn to be the uniform partition of [a. n Thus taking n sufficiently large we may take U (f. the infimum of f on [xi . Therefore L(f. n so   b−a U (f. . < xn−1 < xn = b} of [a. and thus if f integrable. Then f is Darboux integrable. . P) = 0. P) = 0. It remains to 0 show that for each  > 0 we may find a partition P of [0. This result is also telling us that under certain situations we need not bother to consider “improper integrals”: the improper integral will exist iff the conventional Darboux integral exists. . FURTHER RESULTS ON INTEGRATION 163 Of course there is an analogous result with the roles of a and b reversed. Theorem 8. Exercise 3. and f of any irrational number is zero. Theorem 8. except to make the following advertisement: for any real numbers a < b and any injective function s : Z+ → [a. so we defer the point until then. Pn ) = (f (b) − f (a)) . b]. . Thus not only does f have infinitely many points of discontinuity in [0. n b−a U (f. xi+1 ] is attained at the left endpoint xi and the supremum of f on [xi . By reflection it suffices to deal with the case of a weakly increasing case.12. P) = f (x1 )(x1 − x0 ) + f (x2 )(x2 − x1 ) + . xi+1 ] is zero. xi+1 ] contains an irrational number.  Theorem 8. its integral is zero. Pn ) − L(f. Pn ) = (f (x0 ) + . f ( pq ) = 1q . U (f. so we defer this discus- sion until later. . 1] such that U (f. the infimum of f on [xi . + f (xn−1 )(xn − xn−1 ). b] and for all 0 ≤ i ≤ n − 1. .2 (Thomae’s Function): Let f : [0.1: Use Theorem 8. First observe that since every subinterval [xi . Proof. there exists an increasing function f : [a. Then f is continuous at 0 and at all irrational numbers but discontinuous at every rational number. b]. . Then b−a L(f. b] → R is Darboux integrable.13. . b] → R be a bounded function which is continuous except at a finite set of points in its domain. . b] → R which is discontinuous at precisely the points s(n) ∈ [a. Pn ) − L(f. It follows then that R1 = supP L(f. 1] → R be defined by f (0) = 0.13 goes beyond Theorem 8. 3. We claim that nevertheless f is Darboux integrable and 0 f = 0.12: there are increasing functions which are discontinuous at infinitely many points. so f is Darboux integrable by Darboux’s criterion. Example 3. Pn ) = (f (x1 ) + . P) = f (x0 )(x1 − x0 ) + f (x1 )(x2 − x1 ) + . 1].12. . This will make a lot more sense in the context of a discus- sion on improper integrals. so for any partition P the lower sum is L(f.

observe that for any fixed . b] such that n−1 X b−a (44) ∆(f. Proof. [xi . bn ] such that for all n. since there are N bad subintervals over all.g. there are only finitely many nonzero ratio- nal numbers pq in [0. Pn ) = ω(f. Let S be the set of x ∈ [a. • an ≤ an+1 . ω(f. b1 ] we have ω(f.e. xi+1 ]) ≥ n1 for all i and thus 1 1 1 xn − x0 b−a ∆(f. xi+1 ] ⊂ [a1 . and so forth (and in fact there are less than this because e. 1]. Choose a partition P such that each of these points x lies in the interior of a subinterval of length at most N . there exists z ∈ (x. b] such that f is continuous at x. xi+1 ] of [a.. this shows that f is Darboux integrable. . and the remaining part of the sum is at most  times the length of [a. Since of course lim→0 2 = 0. at most 2 with denominator 2. and instead of considering f as defined on [a. Pn ) ≥ (x1 − x0 ) + (x2 − x1 ) + . y) such that f is continuous at z. • bn+1 ≤ bn . Then S is dense in [a. for all a ≤ x < y ≤ b. b1 ] such that for at least one subinterval [xi .. we now consider it as defined on the subinterval [a1 . for all n ∈ Z+ there is a partition Pn = {a = x0 < x1 < . b2 ] ⊂ [a1 . b1 = xi+1 . for instance. What about the other direction: is there. b]. i. the term of the upper sum corresponding to each of these N “bad” subintervals is  at most 1 · 2N . no: Theorem 8. but we will stick with the simpler notation) and b2 = xi+1 and have defined a sub-subinterval [a2 . Let f : [a. in our terminology the “denominator” of 24 is actually 2. if not. i=0 n 1 Now (44) implies that for at least one 0 ≤ i ≤ n − 1 we have ω(f. b].14. Since the maximum value of f on [0. Suppose then that there are N points x in [0. b1 ]. . 1] with q ≥ : indeed there is at most 1 such with denominator 1. inf n bn ] will do. xi+1 ]) < 1 2 . We then define a1 = xi . b] = [0. b1 ]. a Darboux integrable function which is dis- continuous at every point? In fact.e. . we know it is also Darboux integrable on [a1 . b]: i. Thus U (f. there is at least one c such that an ≤ c ≤ bn for all n: indeed supn an ≤ inf n bn . P) ≥  +  = 2. b1 ] ⊂ [ab ] on which . Step 1: We show that there is at least one c ∈ [a. b] such that f is continuous at c. INTEGRATION To see this. b] → R be Darboux integrable. We will use this analysis to choose a nested sequence of subintervals. First we take n = 1 and see that there is some closed subinterval [xi . We then put a2 = xi (this is not necessarily the same number that we were calling xi in the previous step. so any c ∈ [supn an . xi+1 ]) < n: for. 1] such that f (x) ≥ . xi+1 ]) < 1. at most . xi+1 ])(xi+1 − xi ) < . We will construct such a c using the Nested Intervals Theorem: recall that if we have a sequence of closed subintervals [an . [xi . [xi . + (xn − xn−1 ) = = .164 8. Since f is Darboux integrable. [xi . . so the above argument still works: there exists a partition P2 of [a1 . this part of the sum is at most N · N = . 1] is 1. Since f is Darboux integrable on [a. b] on which ω(f. since 2 1 4 = 2 in lowest terms). • an ≤ bn . n n n n n contradiction. [xi . All of our results so far have been in the direction of exhibiting examples of Darboux integrable functions with increasingly large sets of discontinuities. < xn−1 < xn = b} of [a.

since f : [a. Proof. there are discontinu- ous Darboux integrable functions. it is continuous. c) ≤ ω(f. continuing in this way we construct a nested sequence [an . Then we have mi (f 0 )(xi+1 − xi ) ≤ f 0 (ti )(xi+1 − xi ) ≤ Mi (f 0 )(xi+1 − xi ) and thus mi (f 0 )(xi+1 − xi ) ≤ f (xi+1 ) − f (xi ) ≤ Mi (f 0 )(xi+1 − xi ). Theorem 8. as we have seen. ω(f. B] such that f is continuous at c. To check differentiability at 0. [an . b2 ]) < n1 . < xN in [a. and thus it follows that f is continuous at infinitely many points of [a. Then fa. b].15 we are assuming only that f 0 is Darboux integrable. n i. B ∈ R with a ≤ x1 < A < B < x2 ≤ b. whereas in Theorem 8.1c)? Only in a rather subtle way: in order to apply Theorem 8. Choose any A. b] → R is Darboux integrable. Step 2: To show f has infinitely many points of continuity. ω(f.  3. [a2 . FURTHER RESULTS ON INTEGRATION 165 ω(f.  How is Theorem 8. bn ] of closed subintervals such that for all n ∈ Z+ .3: Let a. . [an . Applying Step 1. the base case N = 1 being Step 1 above. and limx→0 xa = 0.b be given by x 7→ xa sin( x1b ). It follows that for all n ∈ Z+ 1 ω(f. Then a f 0 = f (b) − f (a). we use the definition: f (h) − f (0) ha sin( h1b ) 1 f 0 (0) = lim = lim = lim ha−1 sin( b ). we get c ∈ [A. b] → R be differentiable and suppose f 0 is Darboux Rb integrable. c) = 0 and thus f is continuous at c by Proposition 8. Summing these inequalities from i = 0 to n − 1 gives L(f 0 . Darboux integrable derivatives? The possible discontinuities of a monotone function are incompatible with the possible discontinuities of a derivative: if f 0 is monotone. the sine of anything is bounded. the restriction of f to [A. bn ])) < . P). This completes the induction step. Every contin- uous function is Darboux integrable but. we conclude f (b) − f (a) = a f 0 .. we return to an old friend. x 6= 0 and 0 7→ 0. So suppose we have already shown f is continuous at x1 < x2 < . b]. it’s enough to show that for all N ∈ Z+ f is continuous at at least N distinct points. Let f : [a.b is infinitely differentiable except possibly at zero. P) ≤ f (b) − f (a) ≤ U (f 0 .10. A supplement to the Fundamental Theorem of Calculus. so the product approaches zero. b ∈ (0. Rb Since for the integrable function f .e. bn ]) < n1 . .1c) to f 0 .15 different from Theorem 8. xi+1 ] such that f (xi+1 ) − f (xi ) = f 0 (ti )(xi+1 − xi ). In fact.3. So we must look elsewhere for examples. B] is Darboux integrable on [A. Once again. bn ] for all n ∈ Z+ . B]. Let P be a partition of [a. and we can do this by induction. Now. and by construction c is different from all the continuity points we have already found. Now apply the Nested Intervals Theorem: there exists c ∈ R such that c ∈ [an .15. we need f 0 to be continuous. h→0 h h→0 h h→0 h . By the Mean Value Theorem there is ti ∈ [xi . What about discontinuous. b]. Example 3. It is con- tinuous at 0. 3. ∞) and let fa. a f is the unique number lying in between all Rb lower and upper sums.

As for the second term. b]. Thus we want an integration theory in which every derivative f 0 is integrable. As 0 for continuity of fa. By showing any interest in these results we are exploring the extent that the class of Darboux integrable functions goes beyond the class of all continuous functions on [a. 3.b are not Lebesgue integrable either. b] → R be Darboux integrable functions.15 applies to give Z x 0 fb+1.b (0) = fb+1. and all of the operations we are discussing here will take continuous functions to continuous functions. It is natural to want a fundamental theorem of calculus in which no hypothesis is needed on f 0 .4. Rb Rb a) For any constant C. b) The function f + g is Darboux integrable. mf +g be the infima of f . There is a more recent theory which allows every derivative to be integrable (and satisfy the fundamental theorem of calculus): it is called the Kurzweil-Henstock integral. 0 0 Next. Thus if a > 1 then fa.b does not exist. we compute the derivative for nonzero x and consider the limit as x → 0: 0 1 1 fa. This calculation shows that fa. In graduate real analysis.b (0) = fa. b] be a subinterval. and moreover Z b Z b Z b f +g = f+ g. if a < b + 1..166 8. provides the first evidence that the Darboux integral may not be the last word on integration theory.b (x). a a a Proof. This example. then 0 0 limx→0 fa.b (0) = 0. Theorem 8.b is unbounded near 0. The details may be safely left to the reader. but fa. x x The first term approaches 0 for a > 1. a) The idea here is simply that C may be factored out of the lower sums and the upper sums. mg . we know that every continuous function is integrable.b (x) − fa. the Lebesgue integral.b is bounded on any closed. As we have just seen. New Integrable Functions From Old. g . Therefore Theorem 8. and let mf ..b = fa. one meets a more powerful and general integral. Indeed. delicate though it is. Cf is Darboux integrable and a Cf = C a f . g : [a. INTEGRATION 0 This limit exists and is 0 iff a − 1 > 0 iff a > 1.16.b is continuous at 0 iff a > b + 1. But there is a third case: if a = b + 1. hence is not Darboux integrable on any interval containing zero. bounded interval. elementary operations on integrable functions yields integrable functions. then fa. so in this case we can apply the first version of the Fundamental Theorem of Calculus to conclude for instance that Z x 0 fa. In this section we show that performing many familiar.but not this one! In fact 0 for b > a + 1 the derivatives fa. in order for the 0 limit to exist we need a > b + 1. the Darboux integral is not such a theory. Let f. b) Let I ⊂ [a. x].b at zero.b (x) 0 for all b > 0.b = fb+1.b (x) − fb+1. say [0.b (x) = axa−1 sin( b ) − bxa−b−1 cos( b ). which remedies many of the defects of the Darboux integral.

xi+1 ])(xi+1 − xi ) > η. consider f (x) = x and g(x) = −x on [−1. P) + U (g. P) + ∆(g. Taking P = P1 ∪ P2 . so   ∆(f + g. We have set things up such  that for all i ∈ S1 . Similarly. both inequalities hold. P). b] such that ∆(f. P) ≤ ∆(f. P) ≤ ∆(f. . xi+1 ])(xi+1 − xi ) ≤ η. P) + ∆(g. P1 ) < 2 and P2 such that ∆(g. for  > 0. P) ≤ L(f + g. Let f : [a. P). P) < + = . 1. d] → R be continuous. subtracting the smallest quantity from the largest gives 0 ≤ ∆(f + g. < xn−1 < xn = b} of [a. d]. [xi . Mg and Mf +g the suprema of f . Combining these inequalities gives (45) L(f. Fix  > 0. i∈S1 i=0 . xi+1 ]) ≤ b−a+2M . mg = −1 and mf +g = 0. P) + L(g. n − 1} into two subsets: let S1 be the set of such i such that ω(f. denoting by Mf . and the inequalities (45) imply Rb Rb Rb a f + g = a f + a g. . Since S1 ⊂ {0. we may assume  η< . 3. so mf + mg < mf +g . b − a + 2M Since f is Darboux integrable. However there is a true inequality here: we always have mf + mg ≤ mf +g . and let S2 be the complementary set of all i such that ω(f.  since f and g are Darboux integrable. By the Uniform Continuity Theorem. 2 2 By Darboux’s Criterion. P). P) < η 2 . there is P1 such that ∆(f. Things would certainly be easy for us if we had mf + mg = mf +g . by the Extreme Value Theorem it is bounded: there exists M such that |g(x)| ≤ M for all x ∈ [c. FURTHER RESULTS ON INTEGRATION 167 and f + g. there exists η > 0 such that  |x − y| ≤ η =⇒ |g(x) − g(y)| ≤ b−a+2M . [xi . b] → R is Darboux integrable. Shrinking η if necessary. Then the composite function g ◦ f : [a. n − 1}. . we have X n−1 X (xi+1 − xi ) ≤ (xi+1 − xi ) = b − a. . P) ≤ U (f. . f +g is Darboux integrable.  Theorem 8. there exists a partition P = {a = x0 < x1 < . . 1]. we have Mf + Mg ≥ Mf +g and this implies that for every partition P of [a. . but observe that this need not be the case: e. P) + L(g. Since g is continuous on [c. . d] be Darboux integrable and g : [c.17.g. P). b] → [c. respectively on I. b] we have U (f + g. P) ≤ U (f. P) + U (g. . d]. Then mf = −1. P) ≤ L(f + g. Applying this on every subinterval of a partition P gives us L(f. g and f + g on some subinterval I. P2 ) < 2 . Proof. We divide the index set {0. Moreover. [xi . P) ≤ U (f + g. . ω(g ◦ f.

there is a smallest Lipschitz constant.19. and this shows that 1 is a Lipschitz constant for f . b]. xi+1 ])(xi+1 − xi ) + ω(f. y ∈ R. η i=0 η η Using this estimate.b] |f 0 (x)| is a Lipschitz constant for f . d] be Darboux integrable. there is z ∈ (x. Let f : [a. Thus we get X X ∆(g ◦ f.19. [xi .18. By the Mean Value Theorem. but this is not good enough: using it would give us a second term of 2M (b − a). INTEGRATION On the other hand. not something that we can make arbitrarily small. [xi . b − a + 2M b − a + 2M b − a + 2M  The proof of Theorem 8.. P) = ω(f. d] be bounded and g : [c. xi+1 ])(xi+1 . Then g ◦ f : [a. Proposition 8. the reverse triangle inequality reads: for all x. i. [xi .17 becomes much easier if we assume that g is not merely continuous but a Lipschitz function. P) < η 2 = η.e. |f (x) − f (y)| ≤ C|x − y|. and let g : [c.168 8. Example: f : R → R by x 7→ |x| is a Lipschitz function. ||x| − |y|| ≤ |x − y|. d] → R be Lipschitz with contant C. Exercise: Prove Lemma 8. b] → R be a C 1 -function. Then ω(g ◦ f. I) ≤ Cω(f. Indeed. xi ) = ∆(f. Recall that f : I → R is Lipschitz if there exists C ≥ 0 such that for all x. d] → R a Lipshitz function with Lipschitz constant C. b] → [c. Let f : [a. Exercise: a) For which functions may we take C = 0 as a Lipshitz constant? b) Let I be an interval. Let x < y ∈ [a. xi+1 ])(xi+1 − xi ) i∈S1 i∈S2 X X  X 1 (xi+1 − xi ) + (2M ) (xi+1 − xi ) ≤ (b − a) + 2M (xi+1 − xi ). b] → R is Darboux integrable. I). since −M ≤ f (x) ≤ M for all x ∈ [a.  Lemma 8. Proof. the oscillation of f on any subinterval of [a. Then M = maxx∈[a. b − a + 2M i∈S1 i∈S2 i∈S2 P (Note that reasoning as above also gives i∈S2 xi+1 − xi ≤ (b − a). Let f : I → [c. so |f (x)−f (y)| ≤ |f 0 (z)||x−y| ≤ M |x−y|. b] is at most 2M . y) such that f (x)−f (y) = f 0 (z)(x−y).) Here is a better estimate: X 1 X 1 X (xi+1 − xi ) = η(xi+1 − xi ) < ω(f.20. P) ≤ (b − a) + 2M η < + = . we get  (b − a) 2M  ∆(g ◦ f. xi+1 ])(xi+1 − xi ) η η i∈S2 i∈S2 i∈S2 n−1 1X 1 1 ω(f. [xi . . Show that for every Lipshitz function f : I → R. y ∈ I. Theorem 8. b].

4 Now. so there exists M with f1 ([a. Since f1 and f2 are Darboux integrable. M ] and thus Lipschitz there. a a  Corollary 8. and we have the integral triangle inequality Z b Z b | f| ≤ |f |. Thus we see that the composition of Darboux in- tegrable functions need not by Darboux integrable without some further hypothesis.20 g ◦ f = |f | is Darboux integrable on [a. b] → R is Darboux integrable. b] → R is Darboux integrable. [xi . [xi . i=0 C  Corollary 8. g is bounded and discontinuous only at 0 so is Darboux integrable. so is not Darboux integrable. i=0 C Pn−1 Then by Lemma 8.  Rb Rb Rb Warning: It is usually not the case that a f1 f2 = a f1 a f2 ! Example 3. both f1 + f2 and f1 − f2 are Darboux integrable. by (I2) we have Z b Z b Z b − |f | ≤ f≤ |f |. since −|f | ≤ f ≤ |f |. by Theorem 8. FURTHER RESULTS ON INTEGRATION 169 Proof. by Theorem 8. The function g(x) = x2 is C 1 on the closed. Example 3. P) = ω(g ◦ f. Then the product f1 f2 : [a. b] → R be Darboux integrable. 2] → R takes every rational number to 0 and every irrational number to 1. and then Theorem 8. Thus Theorem 8. b] → R be Darboux integrable. f2 : [a. bounded interval [−M. xi+1 ])(xi+1 − xi ) i=0 n−1 ! X  ≤C ω(f. Let f : [a. xi+1 ])(xi+1 − xi ) < C = . f2 ([a. Let f1 . Moreover.22. 1] to 1. and let g : [0. Proof. there is a partition P of [a. Fix  > 0. Then |f | : [a.16 appplies again to show that f1 f2 is Darboux integrable. b] with n−1 X  ∆(f. P) = ω(f.19 we have ∆(g ◦ f. Since f is integrable. they are both bounded. then we saw that if g .15. Then f is Darboux integrable. b]. M ]. but g ◦ f : [1.6: Let f : [1. a a a so Z b Z b | f| ≤ |f |.20 applies to show that (f1 + f2 )2 and (f1 − f2 )2 are Darboux integrable.7: Above we showed that if g is continuous and f is Darboux inte- grable then the composition g ◦ f is Darboux integrable. 1] be the function which takes the value 1q at every rational number pq and 0 at every irrational number. 1] → R be the function which takes 0 to 0 and every x ∈ (0. Since g(x) = |x| is a Lipshitz function. 2] → [0. a a Proof.21. [xi . xi+1 ])(xi+1 − xi ) < . b]) ⊂ [−M. b]). It is really just a dirty trick: we have the identity (f1 + f2 )2 − (f1 − f2 )2 f1 f2 = . 3.

we see that at least if f is continuous. the easiest counterexample I know is contained in a paper of Jitan Lu [Lu99]. But the Riemann sums are little more than a filigree to the main “dicing” property of the Darboux integral alluded to in the last paragraph: the Riemann sums will always lie in between the lower and upper .e. But this is not very explicit: how do we go about finding such a P ? In the (few!) examples in which we showed integrability from scratch. we will also introduce Rb a more general approximating sum to a f than just the upper and lower sums. must g ◦ f be Darboux integrable? The answer is again no. 4. The additional flexibility of Riemann sums is of great importance in the field of numerical integration (i. Thus what we really have is a rival construction of the Darboux integral. On the other. So what about the other way around: suppose f is continuous and g is Darboux integrable. This suggests a slightly different perspective: we view “Riemann integrability” as an additional property that we want to show that every Darboux integrable function possesses. although the definitions look different. P ) − L(f. Riemann Sums.170 8. Riemann gave an apparently different construction Rb of an integral a f which also satisfies axioms (I0) through (I3). In fact we will show something more general than this: in order to achieve ∆(f. a function f : [a. INTEGRATION and f are both merely Darboux integrable. b] → R is Riemann integrable iff it Rb Rb is Darboux integrable and then R a f = a f . P ) < . we saw that we could always take a uniform partition Pn : in particular it was enough to chop the interval [a. looking back at our first proof of the integrability of continuous functions. It turns out however to be relatively clear that a Riemann integrable function is necessarily Darboux integrable. namely we will define and consider Riemann sums. The key claim that we wish to establish in this section is that for any integrable function we will have ∆(f. we do not need P to have equally spaced subintervals but only to have all subintervals of length no larger than some fixed. Riemann’s (earlier) work. At the moment the theory tells us that if f is Darboux integrable. the Riemann integral R a f of Rb a continuous function agrees with our previously constructed Darboux integral a f .. P) and L(f. Pn ) <  for sufficiently large n. the branch of numerical analysis where we quantitatively study the error between a numerical approximation to an integral and its true value). it obviates the need for things like R a f . In fact. such uniform partitions always suffice. further insight on the relationship of Rb the upper and lower sums U (f. b] → R and a partition P of [a. and the Riemann Integral We now turn to the task of reconciling G. g ◦ f need not be Darboux integrable. b] into a sufficiently large number of pieces of equal size. P) < . sufficiently small constant δ. then for every  > 0 there exists some partition P of [a. Darboux’s take on the integral with B. and it pays some modest theoretical dividends as well. it highlights what is gained by this construction: namely. Dicing. P) to the integral a f . which is in some respects more complicated but also possesses certain advantages. By virtue of the Rb uniqueness of the integral of a continuous function. But this leaves open the question of how the class of “Riemann integrable func- tions” compares with the class of “Darboux integrable functions”. This seems like a clean way to go: one Rb the one hand. In fact. b] such that U (f. Given a function f : [a. b].

P. b]. b]. (i) =⇒ (ii): If f is Darboux integrable. show: a) If f is unbounded above. τ ) − I| < . Theorem 8. b) If f is unbounded below. For a function f : [a. b] → R be any function. τ τ Exercise 4. [xi . . in Rb whatever sense. there exists a real number I and a partition P of [a. then inf τ R(f. and moreover by integrablity Z b L(f. < xn−1 < xn = b}. τ ) = L(f. Let f : [a. P. then the same has to be true for the Riemann sums: they will be carried along for the ride. and let P be a partition of [a. P ) < . Instead of forming the rectangle with height the infimum (or supremum) of f on [xi . then supτ R(f. P) = ∞. Rb If the equivalent conditions hold. Conversely.1: Show that (47) holds even if f is unbounded. we associate the Riemann sum n−1 X R(f. For any refinement P of P we have ∆(f. P). τ ) of [a. Given a partition P = {a = x0 < x1 < . inf R(f. and it follows that the upper and lower sums associated to f and P are the supremum and infimum of the possible Riemann sums R(f. P. b] → R.23. P) ≤ f ≤ U (f. if f is bounded then for all  > 0 we can find x∗i . 4. τ ) = U (f. . τ ): (47) sup R(f. More precisely. Riemann sums. P ) = U (f. the following are equivalent: (i) f is Darboux integrable. AND THE RIEMANN INTEGRAL 171 sums. x∗n−1 }. (ii) For all  > 0. xi+1 ] such that sup(f. P. and the choice of a point x∗i ∈ [xi . b] → R and any tagged partition (P. DICING. then by Darboux’s Criterion there is a partition P such that ∆(f. P. τ ) is called a tagged partition. i=0 Let us compare the Riemann sums R(f. P ) − L(f. P) ≤ ∆(f. In this way we get a Riemann sum i=0 f (x∗i )(xi+1 − xi ) associated to the function f . P). P. P. Just because every value of a function f on a (sub)interval I lies between its infimum and its supremum. . [xi . and given any function f : [a. From inequalities (46) and (47) the following result follows almost immediately. x∗∗ i ∈ [xi . then I = a f . P ). xi+1 ] and take f (x∗i ) as the height of the rectan- Pn−1 gle. P) = −∞. A pair (P. the partition P. we have that for any tagging τ of P. P. P) ≤ R(f. τ ) = f (x∗i )(xi+1 − xi ). τ ) ≤ U (f. a . a choice of x∗i ∈ [xi . xi+1 ]) ≥ f (x∗∗ i ) − . say τ = {x∗0 . so if we can prove that the upper and lower sums are good approximations. 4. xi+1 ]) ≤ f (x∗i ) +  and inf(f. RIEMANN SUMS. . xi+1 ] for 0 ≤ i ≤ n − 1 is called a tagging of P and gets a notation of its own. xi+1 ] for all 0 ≤ i ≤ n − 1. . . τ ) = L(f. b] such that for any refinement P of P and any tagging τ of P we have |R(f. to the integral a f . τ ) to the upper and lower sums. P). P).1. xi+1 ]. (46) L(f. Proof. τ ) = U (f. we choose any point x∗i ∈ [xi .

25. P. P ) − L(f. τ ) and a lie in an interval of length less than . One can think of a kitchen assistant dicing vegetables – making a lot of knife cuts doesn’t ensure a good dicing job: you might have some tiny pieces but also some large pieces. b]. then |U (f. P ) − I| <  and thus U (f. (Levermore) Let  > 0. P) − f = (U (f. |R(f. For a partition P = {a = x0 < x1 < . . i. b] and let P2 ⊃ P1 be a refinement of P1 . b].. P ) − I| < . P ) = supτ R(f. a 2 a 2 Suppose |f (x)| ≤ M for all x ∈ [a. P)). Proof. P) − U (f. P ) < 2. Moreover. the only number I such that for all  > 0. τ ) − a f |. |L(f. P ) = inf τ R(f. P). P . Then |P2 | ≤ |P1 |. Rb Thus both R(f. Let P1 be a partition of [a. a a . f is Darboux integrable. P ) − f< . 0 ≤ U (f. τ ) − I| <  for all τ . b] → R be bounded. a a Proof. there exists δ > 0 such that for all partitions P of [a.172 8. P ) − I| <  is a f . P 0 ) − L(f.e. is less than . There exists a partition P of [a. Since U (f. τ ) “converge” to a f . Z b Z b (48) f − L(f. Now Z b Z b (49) 0≤ f − L(f. the proof involves little more than pushing around already established facts. P . We claim (48) holds for any partition P with |P| < δ. P. a Rb priori stronger. sense in which the Riemann sums converge to a f . P ) − I| <  and |L(f. Riemann considered a different. Indeed. Let N be the number of subintervals of P . τ ) ≤ U (f. b] with |P| < δ. Rather a proper dicing will ensure the mesh is sufficiently small.  Theorem 8. a a Z b Z b 0 (50) 0 ≤ U (f. For all  > 0. and put P 0 = P ∪ P .  Lemma 8. INTEGRATION For any tagging τ of P we have also L(f. (Dicing Lemma) Let f : [a. In the next section we discuss this and show that it holds for every Darboux integrable function. < xn−1 < xn = b} of [a. P ) < . (ii) =⇒ (i): Fix  > 0. P) <  and U (f. P ) − f ) + (U (f. there Rb is a partition P with |U (f. The mesh of a partition is a better measure of its “size” than the number of points it contains. Proving this is merely a matter of absorbing the definitions. let P be any partition with |P| < δ. P. 4. . P. Since  was arbitrary.23 gives a sense in which a function is Darboux integrable iff the Rie- Rb mann sums R(f.24. and it follows Rb that their distance from each other. P 0 )). b] such that Z b Z b   0≤ f − L(f. Choose δ > 0 such that 2M N δ < 2 . P . P 0 )) + (L(f. so we leave it to the reader as a good opportunity to stop and think. However. the largest length of a subinterval in P. Lemma 8. P) − f < . P) ≤ R(f.2. Dicing. P) = ( f − L(f. τ ). if |R(f. its mesh |P| is maxi (xi+1 − xi ). τ ) and L(f.

b] → R. DICING. then for every sequence (Pn . τn ) of tagged partitions of [a. Using the Refinement Lemma (Lemma 8. a) For a function f : [a. xi+1 ]) ≤ 2M (xi+1 − xi ) < 2M δ. P) < 2M N δ < . . For all other indices the terms are zero. Pi0 := P ∩ [xi . Pi0 ) − inf(f. Pi0 ) − inf(f. τn ) is convergent. R(f. there exists δ > 0 such that for all partitions P of [a. By the Refinement Lemma. 2 So the last terms on the right hand sides of (49) and (50) are each less than 2 . τn ) with |Pn | → 0. Pi0 )) . 2 0  0 ≤ U (f. τ ) − I| < . i=0 Because P 0 has at most N − 1 elements which are not contained in P. then necessarily I = a f . P) = (L(f. As for the other two: since P 0 is a refinement of P = {a = x0 < . xi+1 ]) − U (f. P 0 ) − L(f. n−1 X 0 ≤ L(f. we get  0 ≤ L(f.25). there are at most N − 1 indices i such that (xi . [xi . then a f = a f . a) (i) =⇒ (ii): if f is Darboux integrable. [xi . 4. τn ) of tagged partitions with Rb |Pn | → 0. b] of mesh at most δ and all taggings τ of P. Further. if (ii) holds then for any sequence of tagged partitions (Pn . P 0 ) ≤ f − L(f. xi+1 ] is a partition of [xi . 0 ≤ sup(f. Pn . Rb b) If condition (ii) holds for some real number I. P 0 ) − L(f. [xi . i=0 n−1 X 0 ≤ U (f. (ii) There exists a number I such that for all  > 0. P ) < 2M N δ < . b] such that |Pn | → 0. |R(f. P) − U (f. Theorem 8. AND THE RIEMANN INTEGRAL 173 We will establish the claim by showing that the two terms on the right hand side of (49) are each less than 2 and. xi+1 ])) . < . f< a a 2 This gives two of the four inequalities. < xN −1 < xN = b}. . xi+1 ) contains at least one point of Pi0 .3). c) If condition (iii) holds. . and property (ii) follows immediately from the Dicing Lemma (Lemma 8. P 0 ) − f ≤ U (f. . the following are equivalent: (i) f is Darboux integrable. RIEMANN SUMS. P. Pi0 ) ≤ 2M (xi+1 − xi ) < 2M δ. we have R(f. xi+1 ]. Because there are at most N − 1 nonzero terms. (ii) =⇒ (iii): Indeed. similarly. τn ) → I. Pn . for 0 ≤ i ≤ N − 1. we have Z b Z b  0≤ f − L(f.26. (iii) For every sequence (Pn . [xi . that the two terms on the right hand side of (50) are each less than 2 . Rb Rb Proof. τn ) → a f . P) − U (f. Pn . each nonzero term in either sum satisfies 0 ≤ L(f. the sequence of Riemann sums R(f. P ) < a a 2 and Z b Z b  0 ≤ U (f. P ) − . P 0 ) = (sup(f. xi+1 ]) − U (f. .  We can now deduce the main result of this section.

P) = inf τ L(f. Pn ) ≥ R(f. By definition. τ ). Pn . As mentioned above. P. tn ) < L(f. the sequence {R(f. a function f : [a.  4. Then for any partition P of [a. . Pn . the number I satisfying (ii) was unique. τ ) and L(f. for instance by a squeezing argument. Pn . lim U (f. Pn . INTEGRATION (iii) =⇒ (i): We will show the contrapositive: if f is not Darboux integrable. The Riemann Integral. P) = supτ R(f. Pn . P2n+1 . then there is a sequence (Pn . b) This follows from Theorem 8. It really is a bit messier though: on the one hand. a a For n ∈ Z . let Pn be the partition into n subintervals each of length b−a + n . and the associated integrals are the same. τn ) with |Pn | → 0 such that Z b lim R(f. n→∞ a Rb Rb Since a f 6= a f . Some contemporary texts take this approach as well. Z b Z b −∞ < f< f < ∞. for all n ∈ Z+ there is one tagging tn of Pn with L(f. n→∞ a Now let τn be tn if n is odd and Tn is n is even. τ2n+1 ) = f.. Pn . n→∞ a n→∞ a and it follows. Our condition (ii) is more stringent. the business about the taggings creates another level of notation and another (minor. τn ) is not convergent. Since U (f. P2n . τn ) of tagged partitions with |Pn | → 0 such that the sequence of Riemann sums R(f. there exists a tagging τ such that |R(f. Pn ) = f. τn )}∞ n=1 does not converge.e. Pn . By the Dicing Lemma. but nevertheless present) thing to worry about. that Z b lim R(f. Then we get a sequence of tagged partitions (Pn .23: therein. so there can be at most one I satisfying it. n→∞ a Z b lim R(f. Z b Z b lim L(f. Case 1: Suppose f is unbounded. Thus we can build a sequence of tagged partitions (Pn .174 8. τ )| > M . P. b] → R satisfying condition (ii) of Theorem 8. n→∞ a Z b lim R(f. what we have shown is that a function is Riemann integrable iff it is Darboux integrable. i. Tn ) = f. Pn ) − n1 . τn )| → ∞. Case 2: Suppose f is bounded but not Darboux integrable.3. Pn ) ≤ R(f. tn ) = f. and the number I associated to f is called the Riemann integral of f . τ2n ) = f. Pn ) + n1 and another tagging Tn of Pn with U (f. b] and any M > 0. Riemann set up his integration theory using the Riemann inte- gral. P. Pn ) = f. In this language.26 is Riemann integrable. τn ) with |Pn | → 0 and |R(f. c) This is almost immediate from the equivalence (ii) ⇐⇒ (iii) and part b): we leave the details to the reader. Tn ) > U (f.

Remark: By distinguishing between “Darboux integrable functions” and “Riemann integrable functions”. xi+1 ]: there is x∗i ∈ (xi . Solution: First observe that as a consequence of Theorem 8. However. but the argument was slightly complicated by the fact that we had only inequalities L(f. W. I encourage you to work out the details.3: Let f : [a. P). Using this equality and Theorem 8. P) ≥ U (f + g. Example 4. and thus the limit is surely f (b) − f (a). Rudin gives the Darboux version of “the Riemann integral”. P. there is some tagging such that Rb the corresponding Riemann sum for a f 0 is exactly f (b) − f (a)! Since the integral of an integrable function can be evaluated as the limit of any sequence of Riemann Rb sums over tagged partitions of mesh approaching zero. Apply the Mean Value Theorem to f on [xi . On the other hand. P. the more stringent notion of convergence in the definition of the Riemann integral can be hard to work with: directly showing that the com- position of a continuous function with a Riemann integrable function is Riemann integrable seems troublesome. but I confess I find it to be a more interesting argument. b] → R are both Darboux integrable. + (f (xn ) − f (xn−1 )) = f (b) − f (a). DICING. no matter what partition of [a. . we are exhibiting a fastidiousness which is largely absent in the mathematical literature. but then gives an exercise involving recognizing a certain limit as the limit of a sequence of Riemann sums and equating it with the integral of a certain function: he’s cheating! Let us illustrate with an example. P) + L(g. ..4: Compute limn→∞ k=1 k2 +n2 . this ambiguity leads to things which are not com- pletely kosher: in the renowned text [R]. < xn−1 < xn = b} of [a. Choose a partition P = {a = x0 < x1 < .. b] → R be differentiable such that f 0 is Darboux inte- grable. the Riemann sum is truly additive: R(f + g.and we did. The corresponding Riemann sum is R(f. Example 4. for any tagging τ of P. P. b]. It is more common to refer to the Riemann inte- gral to mean either the integral defined using either upper and lower sums and upper and loewr integrals or using convergence of Riemann sums as the mesh of a partition tends to zero. b] we choose. τ ) = (f (x1 ) − f (x0 )) + . we find that a f 0 is the limit of a sequence each of whose terms has value exactly f (b) − f (a). P.26. Now {x∗i }n−1 i=0 gives a tagging of P. Thus. However. AND THE RIEMANN INTEGRAL 175 But more significantly.23 leads to a more graceful proof that f + g is Rb Rb Rg integrable and a f + g = a f + a . Pn n Example 4. . 4. there are one or two instances where Riemann sums are more convenient to work with than upper and lower sums. P) ≤ L(f + g. P).2: Suppose f. . τ ). P) + U (g. τ ) = R(f. We wanted to show that f + g is also Darboux integrable. g : [a. for any Darboux . xi+1 ) with f (xi+1 ) − f (xi ) = f 0 (x∗i )(xi+1 − xi ). U (f. This is not really so different from the proof of the supple- ment to the Fundamental Theorem of Calculus we gave using upper and lower sums (and certainly. τ ) + R(g. no shorter). RIEMANN SUMS.

Here we give an unusually elementary proof of Lebesgue’s Theorem following lecture notes of A.k k S∞ Further. we have done our homework: by establishing Theorem 8. and (ii) ∞ S ⊂ i=1 In .176 8. We define a subset S ⊂ R to have measure zero if for all  > 0.k ) < 2k+1 . we have n Z 1 1X k lim f( ) = f.. `((a. n→∞ n n 0 k=1 Now observe that our limit can be recognized as a special case of this: n n n n X n 1 X n2 1X 1 1X k lim = lim = lim = lim f ( ). Then if we collect together all the open intervals into one big set {In. Lesbesgue’s Theorem 5. However. then so does their union S = n=1 Sn .e.k } is a covering of S = n=1 Sn by open intervals..e. i. it admits a covering by open intervals {In. Thus the limit is Z 1 dx π 2 = arctan 1 − arctan 0 = . There is one nagging technical problem: the covering is not given by a sequence – i. b)) = `((a. b)) = `([a. especially the Lebesgue integral.26 we have earned the right to use “Darboux integral” and “Riemann integral” interchangeably. For an interval I. not indexed by the positive integers – but rather by a double sequence – i. If each Sn has measure zero. as mentioned before. n n 2k+1 n. Proof. In fact however we will generally simply drop both names and simply speak of “in- tegrable functions” and “the integral”. ∞)) = `([a. Schep. Proposition 8. b] → R due to H. {In.k ) < = .. we denote by `(I) its length: for all −∞ < a ≤ b < ∞. Let {Sn }∞ n=1 be a sequence S∞ of subsets of R. b]) = b − a. Lebesgue. 0 x +1 2 Anyway.27. `((a. this is completely safe. you should be aware that in more advanced mathematical analysis one studies other kinds of integrals. indexed by ordered .R. Fix  > 0 and n ∈ Z+P . b]) = `((−∞. we see that X XX X  In. INTEGRATION integrable function f : [0. n→∞ k 2 + n2 k=1 n→∞ n k 2 + n2 k=1 n→∞ n ( nk )2 + 1 n→∞ n k=1 n k=1 1 where f (x) = x2 +1 . there is a sequence PN {In }Sof open intervals in R such that (i) for all N ≥ 1. Since Sn has measure zero. Lebesgue’s Theorem is a powerful. 5. every point of S lies in at least one of the intervals I. ∞)) = `((−∞.e.k = `(In. b)) = `((−∞. n=1 `(In ) ≤ .k and use the fact that sums of series with positive terms do not depends on the ordering of the terms.k }n. For us.k }∞ k=1 with  k `(In. 1] → R. ∞)) = ∞. Statement of Lebesgue’s Theorem. In this section we give a characterization of the Riemann-Darboux integrable func- tions f : [a. definitive result: many of our previous results on Riemann-Darboux integrable functions are immediate corollaries. b]) = `([a.1.

Exercise: Show that if in the definitions of measure zero and content zero we replace “open interval” by “interval”. (Suggestion: induct on N . LESBESGUE’S THEOREM 177 pairs (n. Use Theorem 8.  Theorem 8. Thus a function is Riemann-Darboux integrable on [a. k) of positive integers. there exists N ∈ Z+ and PN SN open intervals I1 . a1. e. . g : [a. a3. with a f = 0. a3. a1. . . b] at which f is discontinuous. The following are equivalent: (i) f is Riemann-Darboux integrable on [a.3 . In fact one goes further. b] | f (x) > 0} has measure zero. Let i be the set of indices such that SC ∩ [xi . 1] does not have content zero. b] → R be Riemann-Darboux integrable functions.g. a1. the following definition ought to be viewed as merely a technical tool used in the proof of Lebesgue’s Theorem. a1. simply by reindexing the terms of a double sequence an. b]. and choose a partition P of [a. we do not change the classes of sets of measure zero or content zero. i ∈ I ⇐⇒ . b] | f (x) ≥ C} has content zero.4 .2 .2 . Thus.3 . the set SC = {x ∈ [a.29. In contrast. an outer measure) to any subset S ⊂ R as the ∞ ∞ infimum of all quantitiesL∞ n=1 `(In ) as {In }n=1 ranges over all sequences of open intervals with S ⊂ n=1 In . 1] has measure zero. 5. . This is the beginning of a branch of real analysis called measure theory. (Suggestion: also use Proposition 8. the number of intervals. .1 .1 . a2. b] → R.28. b) The set S = {x ∈ [a. a2. b] → R be non-negative and Riemann-Darboux inte- Rb grable. as follows: a1.k via a single sequence bn . Then: a) For all C > 0. (Lebesgue) For a function f : [a. .28 to give a quick proof that f + g and f · g are both Riemann-Darboux integrable.27. a4. let D(f ) be the set of x ∈ [a. P)) <  · C. b] iff it is bounded (which we already know is necessary) and its set of discontinuities is “small” in the sense of having measure zero. xi+1 ] is nonempty. Exercise: Let f. IN such that i=1 `(Ii ) ≤  and S ⊂ i=1 Ii .) 5. a2.) Lemma 8. b] such that U(f. Preliminaries on Content Zero. Proof. A subset S ⊂ R has content zero if for every  > 0. by assigning a measure (to be entirely technicallyPprecise.1 . (ii) f is bounded and D(f ) has measure zero. The notion of sets of measure zero is a vitally important one in the further study of functions of a real variable. b) Show that Q ∩ [0.5 . Exercise: a) Show that Q ∩ [0. . Fix  > 0.2. But this is easily solved.2 .1 . Let f : [a.

. and only the extremely interested student need proceed. P) = Mi `([xi . for 0 ≤ i ≤ n − 1. we have ϕk (x) ≤ ϕ(x) ≤ f (x) ≤ Φ(x) ≤ Φk (x). Thus SC has content zero. 5. b] with Pk ⊂ Pk+1 such that |Pk | → 0 (e. Φ : [a. so are Riemann-Darboux integrable by X. [xi . moreover Z b Z n ϕ = L(f. a a Further. b). Φ = U (f. Φ are bounded with only finitely many discontinuities. the sequence φk (x) is increasing and bounded above. a a Step 1: Suppose that f : [a. xi+1 ]). P). [xi . Now apply Proposition 8. For all x ∈ [a. for all x ∈ [a. By X. i. Mi = sup(f. Show that E has content zero iff E has measure zero. hence convergent. i=0 i=0 Let ϕ. there is y ∈ E with |x − y| ≤ δ. say to ϕ(x). The proof of Lebesgue’s Theorem uses Heine-Borel (Theorem 6.178 8. Let {Pk } be a sequence of partitions of [a. The functions ϕ. Let ϕk and Φk be the lower and upper step functions for Pk .X. so it suffices to show that the set D(f ) of discontinuities of f has measure zero. As usual. xi+1 ]) < . b). so that for all k and all x ∈ [a. and let E be the set of x ∈ R such that for all δ > 0. hence convergent. b]. P). and similarly the sequence Φk (x) is decreasing and bounded below. xi+1 ]. Pk ) → f. which we will not discuss until Chapter 9. xi+1 ]). a a Z b Z b Φk = U(f. Pk ) → f. P) = mi `([xi . U (f. All in all this section should probably be omitted on a first reading.27. < xn−1 < xn = b} be a partition of [a. xi+1 ] and i∈I `([xi .17) and also the fact that a bounded monotone sequence is convergent (Theorem 10.  Exercise: Let E be a subset of R. xi+1 ). Each S n1 has content zero. b) and Z b Z b ϕk = L(f. so that n−1 X n−1 X L(f. S∞ b) We have S = n=1 S n1 . . P) ≥ Mi (xi+1 − xi ) ≥ C `([xi . since Pk ⊂ Pk+1 for all k. i∈I i∈I S P Thus SC ⊂ i∈I [xi . say to Φ(x). xi+1 ]). INTEGRATION Mi = sup(f. f is bounded.X. let Pk be the uniform subdivision of [a. hence certainly has measure zero. It follows that X X  · C > U(f. xi+1 ]).e. [xi . Step 0: Let P = {a = x0 < x1 < . b] into 2k subintervals). . b] → R be the lower and upper step functions. ϕ takes the value mi on [xi . put mi = inf(f. b] → R is Riemann-Darboux integrable.3. Proof of Lebesgue’s Theorem.11)..g. xi+1 ]) ≥ C. xi+1 ) and Φ takes the value Mi on [xi .

since x0 ∈ / Pk . . b] \ E. Fix  > 0. |f (y) − f (z)| ≤ 2(b−a) .29 to Φ − ϕ. b] \ E. b]. we get that ϕ = Φ except on a set of measure zero. b]. Since ϕ(x) = Φ(x). Let P = {a = t0 < t1 < . . Let I be the set of i such that (ti . b] whose points are the endpoints of I1 . . ti+1 ]) 2(b − a) i∈I i∈I /     < 2M + (b − a) = . ti+1 ]) · 2M + `([ti . say {Ik }N k=1 ∪ {Jxi }. the subinterval (tj . We claim that f is continuous at every point of [a. < tN = b} be the partition of [a. b] | ϕ(x) 6= Φ(x)} Pk . k Since E is the union of a set of measure zero with a the union of a sequence of finite sets. b].b]\E of [a. there is k ∈ Z+ such that Φk (x0 ) − ϕk (x0 ) < . ti+1 ]) sup(f (x) − f (y) : x. Applying the Heine-Borel Theorem (Theorem 6. there is a sequence {In }∞ n=1 of open intervals covering E such that  n=1 `(In ) < 4M . Basic definitions and first examples. . a a a a a a a Rb These inequalities show that ϕ and Φ are Riemann-Darboux integrable and a ϕ = Rb Rb a Φ = a f . ti+1 ]) i=0 X X  ≤ `([ti . . proof of claim Fix x0 ∈ [a. For x ∈ [a. y ∈ [ti . tj+1 ) is contained in some Ik or some Jxi . 6. Since E has measure P∞ zero. f is Riemann-Darboux integrable on [a. z ∈ Jx . Further. or 0 ≤ j ≤ N − 1. . which will be enough to complete this direction of the proof. it has measure zero. IMPROPER INTEGRALS 179 and thus Z b Z b Z b Z b Z b Z b Z b ϕk ≤ ϕ≤ ϕ≤ f≤ Φ≤ Φ≤ Φk . x0 + δ). Step 2: Suppose now that f is bounded on [a. there is a finite subcovering. 4M 2(b − a) Since  > 0 was arbitrary. P) = `([ti . Let M be such that |f (x)| ≤ M for all x ∈ [a. Then N X −1 U(f.17) to the open covering {Ik } ∪ {Jx }x∈[a. b] and continuous on [a. b]. b]. Let [ E = {x ∈ [a. b] \ E for a subset E of measure zero. Improper Integrals 6.1. ti+1 ) is contained in Ik for some k. b] \ E. IN and the endpoints of the intervals Jxi which lie in [a. there is δ > 0 such that Φk − ϕk is constant on the interval (x0 − δ. . P) − L(f. We must show that f is Riemann-Darboux integrable on [a. by continuity of f there is an open interval  Jx containing x such that for all y. 6. so f is continuous at x0 . Applying Lemma 8. and for x in this interval we have − < ϕk (x0 ) − Φk (x0 ) ≤ f (x) − f (x0 ) ≤ Φk (x0 ) − Φk (x0 ) < .

180 8. INTEGRATION

6.2. Non-negative functions.

Things become much simpler if we restrict to functions f : [a, ∞) → [0, ∞): in
words, we assume that f is defined and non-negative for all sufficiently large x.
As usual we Rsuppose that f is integrable on [a, b] for all bR ≥ a, so we may de-
x ∞
fine F (x) = a f for x ≥ a. Then the improper integral a f is convergent iff
limx→∞ F (x) exists. But here is the point: since Rf is non-negative,R x F is weakly
x Rx
increasing: indeed, for x1 ≤ x2 , F (x2 ) − F (x1 ) = a 2 f − a 1 f = x12 ≥ 0. Now
for any weakly increasing function F : [a, ∞) → R we have
lim F (x) = sup(f, [a, ∞)).
x→∞
In other words, the limit exists as a real number iff F is bounded; otherwise, the
limit is ∞: there is no oscillation! We deduce:

R ∞∞) → [0, ∞) be integrable
Proposition 8.30. Let f : [a, R ∞ on every finite interval
[a, N ] with N ≥ a. Then either a f is convergent or a f = ∞.
InRview of Proposition 8.30, we may write the two alternatives as

• a f < ∞ (convergent)
R∞
• a f = ∞ (divergent).
R∞ 2
Example: Suppose we wish to compute −∞ e−x . Well, we are out of luck: this
integral cannot be destroyed – ahem, I mean computed – by any craft that we
here possess.4 The problem is that we do not know any useful expression for the
2
antiderivative of e−x (and in fact it can be shown that this antiderivative is not an
“elementary function”). But because we are integrating a non-negative function,
we know that the integral is either convergent or infinite. Can we at least decide
which of these alternatives is the case?
Yes we can. First, since we are integrating an even function, we have
Z ∞ Z ∞
2 2
e−x = 2 e−x .
−∞ 0
−x2 1
Now the function e = ex2 is approaching 0 very rapidly; in fact a function like
−x 1
e = ex exhibits exponential decay, and our function is even smaller than that,
R∞ 2
at least for sufficiently large x. So it seems like a good guess that 0 e−x < ∞.
Can we formalize this reasoning?

Yes we can. First, for all x ≥ 1, x ≤ x2 , and since (ex )0 = ex > 0, ex is in-
2 2
creasing, so for all x ≥ 1 ex ≤ e−x , and finally, for all x ≥ 1, e−x ≤ e−x . By the
familiar (I2) property of integrals, this gives that for all N ≥ 1,
Z N Z N
2
e−x ≤ e−x ,
1 1
and taking limits as N → ∞ we get
Z ∞ Z ∞
2
e−x ≤ e−x .
1 1

4Elrond: “The Ring cannot be destroyed, Gimli, son of Glóin, by any craft that we here
possess. The Ring was made in the fires of Mount Doom. Only there can it be unmade. It must
be taken deep into Mordor and cast back into the fiery chasm from whence it came.”

6. IMPROPER INTEGRALS 181

This integral is much less scary, as we know an antiderivative for e−x : −e−x . Thus
Z ∞ Z ∞
−x2 1
e ≤ e−x = −e−x |∞1 = −(e
−∞
− e−1 ) = .
1 1 e
R ∞ −x2 R ∞ −x2
Note that we replaced 0 e with 1 e : does that make a difference? Well,
R1 2
yes, the difference between the two quantities is precisely 0 e−x , but this is a
“proper integral”, hence finite, so removing it changes the value of the integral –
which we don’t know anyway! – but not whether it converges. However we can be
2
slightly more quantitative: for all x ∈ R, x2 ≥ 0 so e−x ≤ 1, and thus
Z 1 Z 1
2
e−x ≤ 1 = 1,
0 0
and putting it all together,
Z ∞ Z ∞ Z 1 Z ∞   
−x2 −x2 −x2 −x2 1
e =2 e =2 e + e ≤2 1+ = 2.735 . . . .
−∞ 0 0 1 e
The exact value of the integral is known – we just don’t possess the craft to find it:
Z ∞
2 √
e−x = π = 1.772 . . . .
−∞

This is indeed smaller than our estimate, but it is a bit disappointinglyR ∞ far 2off.
Later in the course we will develop methods suitable for approximating 0 e−x to
any desired finite degree of
√ accuracy, and we will be able to check for instance that
this integral agrees with π to, say, 20 decimal places.

The following simple theorem formalizes the argument we used above.
Theorem 8.31. (Comparison Test For Improper Integrals) Let f, g : [a, ∞) →
[0, ∞)R be integrable onR[a, N ] for all N ≥ a. Suppose f (x) ≤ g(x) for all x ≥ a.
∞ ∞
a) If a g < ∞, then a f < ∞.
R∞ R∞
b) If a f = ∞, then a g = ∞.
Proof. By property (I2) of integrals, for all N ≥ a since f (x) ≤ g(x) on [a, N ]
RN RN
we have a f ≤ a g. Taking limits of both sides as N → ∞ gives
Z ∞ Z ∞
(51) f≤ g;
a a

∞ ∞ R ∞ or ∞. FromR ∞
here each Rside is either a Rnon-negative real number (51) both parts
follow: if a g < ∞ then a f < ∞, whereas if a f = ∞ then a g = ∞.5 

Theorem 8.32. (Limit Comparison Test For Improper Integrals)
Let f, g : [a, ∞) → [0, ∞) be integrable on [a, N ] for all N ≥ a. Consider condition
(S): there exists b ≥Ra and M > 0 suchR ∞that f (x) ≤ M g(x) for all x ≥ b.

a) If (S) holds and a g < ∞, then a f < ∞.
R∞ R∞
b) If (S) holds and a f = ∞, then a g = ∞.
c) Suppose there exists b ≥ a such that g(x) > 0 for all x ≥ b and that limx→∞ fg(x)
(x)
=
L < ∞. Then (S) holds.

5In fact parts a) and b) are contrapositives of each other, hence logically equivalent.

182 8. INTEGRATION

f (x)
d) Suppose there exists b ≥ a such that g(x) > 0 for all x ≥ b and that limx→∞ g(x) =
L with 0 < L < ∞. Then
Z ∞ Z ∞
f < ∞ ⇐⇒ g < ∞.
a a

Proof. For any b ≥ a, since f and g are integrable on [a, b], we have
Z ∞ Z ∞ Z ∞ Z ∞
f < ∞ ⇐⇒ f < ∞, g < ∞ ⇐⇒ g < ∞.
a b a b
R∞ R∞ R∞
a) If f (x) ≤ M g(x) for all x ≥ b, then b f ≤ b M g = M b g. Thus if
R∞ R∞ R∞ R∞
a
g < ∞, b g < ∞, so b f < ∞ and thus finally a f < ∞.
b) Note that this is precisely the contrapositive of part a)! Or to put it in a
slightly different way: suppose (S) holds. Seeking a contradiction, we also suppose
R∞ Rf
a
g < ∞. Then by part a), a f < ∞, contradiction.
c) Since limx→∞ fg(x)
(x)
= L < ∞, there is b ≥ a such that for all x ≥ b, fg(x)
(x)
≤ L + 1.
Thus for all x ≥ b we have f (x) ≤ (L + 1)g(x), so (S) holds.
d) Suppose limx→∞ fg(x) (x)
= L ∈ (0, ∞). By part c), (S) holds, so by part a), if
R∞ R∞
a
g < ∞, then a f < ∞. Moreover, since L 6= 0, limx→∞ fg(x) 1
(x) = RL ∈ (0, ∞). So
R∞ ∞
part c) applies with the roles of f and g reversed: if a f < ∞ then a g < ∞. 

Although from a strictly logical perspective part d) of Theorem 8.32 is the weakest,
it is the most useful in practice.

7. Some Complements
Riemann was the first to give a complete, careful treatment of integration as a pro-
cess that applies to a class of functions containing continuous functions and yields
the Fundamental Theorem of Calculus: he was certainly not the last.

In more advanced analysis courses one studies the Lebesgue integral. This is
a generalization of the Riemann integral that, extremely roughly, is modelled on
the idea of approximations in small ranges of y-values rather than small ranges of
x-values. Suppose for instance that f : [0, 1] → R has a finite range y1 , . . . , yN .
Rb
Then the idea is that a f is determined by the size of each of the sets f −1 (yi ) on
which f takes the value yi . For instance this is the perspective of the expected value
in probability theory: given a function defined on a probability space, its expected
value is equal to the sum of each of the values at each possible outcome multipled
by the probability that that outcome occurs. Notice that when f is a step function,
the sets f −1 (yi ) are just intervals, and the “size” of an interval (a, b), [a, b), (a, b] or
[a, b] is just its length b − a. However, a general function f : [0, 1] → R with finite
image need not have the sets f −1 (yi ) be intervals, so the first and hardest step in
Lebesgue’s theory is a rigorous definition of the measure of a much more general
class of subsets of [a, b]. This has grown into a branch of mathematics in its own
right, measure theory, which is probably at least as important as the integra-
tion theory it yields. However, Lebesgue’s integral has many technical advantages
over the Riemann integral: the supply of Lebesgue integrable functions is strictly
larger, and in particular is much more stable under limiting operations. Later we

7. SOME COMPLEMENTS 183

will study sequences of functions fn : [a, b] → R and see that a crucial issue is the
permissibility of the interchange of limit and integral: i.e., is
Z b Z n
lim fn = lim fn ?
n→∞ a a n→∞

For the Riemann integral it is difficult to prove general theorems asserting that the
left hand side exists, a phenomenon which evidently makes the identity hard to
prove! Probably the real showpiece of Lebesgue’s theory is that useful versions of
such convergence theorems become available without too much pain.

In classical probability theory one encounters both “discrete” and “continuous”
probability distributions. The idea of a continuous probability distribution on [a, b]
can be modelled by a non-negative integrable function f : [a, b] → R such that
Rb
a
f = 1. This is studied briefly in the next chapter. Lebesgue’s theory allows for
distributions like the Dirac delta function, a unit mass concentrated at a single
point. It turns out that one can model such discrete distributions using a simpler
theory than Lebesgue’s theory due to the Dutch mathematician T.J. Stieltjes (born
in 1856; Riemann was born in 1826, Lebesgue in 1875). This is a relatively mild
souping up of Riemann’s theory which views the dx as referring to the function
f (x) = x and replaces it by dg for a more general function g : [a, b] → R. Some
undergraduate texts develop this “Riemann-Stieltjes integral”, most notably [R].
But this integral adds one extra layer of technical complication to what is already
the most technical part of the course, so I feel like in a first course like this one
should stick to the Riemann integral.

In the twentieth century various generalizations of the Riemann integral were de-
veloped which are much more closely related to the Riemann integral than the
Lebesgue integral – avoiding any need to develop measure theory – but are yet
more powerful than the Lebesgue integral: i.e., with a larger class of integral func-
tions, good convergence theorems, and a simple general form of the Fundamental
Theorem of calculus. This integral is now called the Kurzweil-Henstock integral
although it was also developed by Denjoy, Perron and others. This theory however
is not so widely known, whereas every serious student of mathematics studies the
Lebesgue integral. As alluded to above, the reason is probably that the measure
theory, while a lot to swallow the first time around, is in fact a highly important
topic (in particular making a profound connection between analysis and geometry)
in its own right. It is also extremely general, making sense in contexts far beyond
intervals on the real line, whereas the Kurzweil-Henstock integral is more limited
in scope.

Aside from developing his new and more sophisticated theory of integration, Lebesgue
did much insightful work in the more classical context, as for instance his remark-
able characterization of Riemann integrable functions presented above. I want to
mention a further achievement of his which has been almost forgotten until re-
cently: from the perspective of differential calculus, a natural – but very difficult –
question is Which functions are derivatives? We now know that in particular
every continuous function f : [a, b] → R is the derivative of some other function
F , but in order to construct the antiderivative we had to develop the theory of

184 8. INTEGRATION

the Riemann integral. Is it possible to construct an antiderivative of an arbitrary
Rb
continuous function f in some other way than as a f ? (I don’t mean is there some
Rb
other function which is antiderivative: as we know, a f + C is the most general
antiderivative of f . But for instance we certainly don’t need an integration theory
to know that polynomials have antiderivatives: we can just write them down. The
question is whether one can find an antiderivative without expressing it in terms of
an integration process.)

Lebesgue gave a considerably more elementary construction of antiderivatives of
every continuous function. The basic idea is shockingly simple: on the one hand it
is easy to write down an antiderivative of a piecewise linear function: it is simply
an appropriate piecewise quadratic function. On the other hand every continu-
ous function can be well approximated by piecewise linear functions, so that one
can write down the antiderivative of any continuous function as a suitable limit of
piecewise quadratic functions. A nice modern exposition is given in [Be13].

CHAPTER 9

Integral Miscellany

1. The Mean Value Therem for Integrals
Theorem 9.1. Let f, g : [a, b] → R. Suppose that f is continuous and g is
integrable and non-negative. Then there is c ∈ [a, b] such that
Z b Z b
(52) f g = f (c) g.
a a

Proof. By the Extreme Value Theorem, f assumes a minimum value m and
Rb
a maximum value M on [a, b]. Put I = a g; since g is non-negative, so is I. We
have
Z b Z b Z b
(53) mI = mg ≤ fg ≤ M g = M I.
a a a
Case 1: Suppose I = 0. Then by (53), for all c ∈ [a, b] we have
Z b Z b
f g = 0 = f (c) g.
a a
Case 2: Suppose I 6= 0. Then dividing (53) through by I gives
Rb
fg
m≤ a ≤ M.
I
Rb
fg
Thus aI lies between the minimum and maximum values of the continuous func-
tion f : [a, b]R → R, so by the Intermediate Value Theorem there is c ∈ [a, b] such
b
fg Rb
that f (c) = Rab g . Multiplying through by a g, we get the desired result. 
a

Exercise 9.1.1: Show that in the setting of Theorem 9.1, we may take c ∈ (a, b).

Exercise 9.1.2: Show by example that the conclusion of Theorem 9.1 becomes false
– even when g ≡ 1 – if the hypothesis on continuity of f is dropped.

Exercise 9.1.3: Suppose that f : [a, b] → R is differentiable and f 0 is continu-
ous on [a, b]. Deduce the Mean Value Theorem for f from the Mean Value Theoem
for Integrals and the Fundamental Theorem of Calculus.1

2. Some Antidifferentiation Techniques
2.1. Change of Variables.

1Since the full Mean Value Theorem does not require continuity of f 0 , this is not so exciting.
But we want to keep track of the logical relations among these important theorems.

185

186 9. INTEGRAL MISCELLANY

2.2. Integration By Parts.

Recall the product rule: if f, g : I → R are differentiable, then so is f g and
(f g)0 = f 0 g + f g 0 .
Equivalently,
f g 0 = (f g)0 − f 0 g.
We have already (!) essentially derived Integration by Parts.
Theorem 9.2. (Integration by Parts) Let f, g : I → R be continuously differ-
0 0
entiable functions:
R 0 i.e., f and R 0 g are defined and continuous on I.
a) We have f g = f g − f g, in the sense that subtracting from f g any anti-
derivative of f 0 g gives an antiderivative of f g 0 .
b) If [a, b] ⊂ I then
Z b Z b
f g 0 = f (b)g(b) − f (a)g(a) − f 0 g.
a a

Exercise 9.2.1: a) Prove Theorem 9.2. (Yes, the point is that it’s easy.)
b) Can the hypothesis on continuous differentiability of f and g be weakened?

Exercise 9.2.2: Use Theorem 9.2a) to find antiderivatives for the following func-
tions.
a) xn ex for 1 ≤ n ≤ 6.
b) log x.
c) arctan x.
d) x cos x.
e) ex sin x.
f) sin6 x.
g) sec3 x, given that log(sec x + tan x) is antiderivative of sec x.

Exercise 9.2.3: a) Show that for each n ∈ Z+ , there is a unique monic (= leading
d
coefficient 1) polynomial Pn (x) such that dx (Pn (x)ex ) = xn ex .
b) Can you observe/prove anything about the other coefficients of Pn (x)? (If you
did part a) of the preceding exercise, you should be able to find patterns in at least
three of the non-leading coefficients.)

Exercise 9.2.4: Use Theorem 9.2a) to derive the following reduction formulas:
here m, n ∈ Z+ and a, b, c ∈ R.
a)
n−1
Z Z
1
cosn x = cosn−1 x sin x + cosn−2 x.
n n
b)
2n − 3
Z Z
1 x 1
2 2 n
= 2 2 2 n−1
+ 2
.
(x + a ) 2a (n − 1)(x + a ) 2a (n − 1) (x + a2 )n−1
2

c)
−1 m−1
Z Z
m n
sin ax cos ax = sinm−1 ax cosn+1 ax + sinm−2 ax cosn ax
a(m + n) m+n

Now to make progress in R ∞ n+1 −x evaluating 0 x e dx. a(m + n) m+n In this text we are not much interested in antidifferentiation techniques per se – we regard this as being somewhat forward-looking. Pn−1 xi ∗ = a + i b−a n and L n (f ) = ∗ b−a i=0 (xi ) n .3. R∞ Proposition 9. Then f . For all n ∈ N. Approximate Integration Despite the emphasis on integration (more precisely.2 Theorem 9.4. Base case (n = 0): −x −x ∞ 0 e = −e |0 = −e−∞ − (−e0 ) = −0 − (−1) = 1 = 0! R∞ Induction step: let n ∈ N and assume 0 xn e−x dx = n!. taking u = xn + 1. b]. For n ∈ Z+ . 3. In practice it Rb would often be sufficient to approximate a f rather than know it exactly. In fact this result.2.3. R ∞ Proof. let Ln (f ) be the left endpoint Riemann sum ob- tained by dividing  [a. i. since computer algebra packages are by now better at this than any human ever could be – so most of our interest in integration by parts comes from part b) of Theorem 9. b] → R be dif- ferentiable with bounded derivative: there is M ≥ 0 such that |f 0 (x)| ≤ M for all x ∈ [a. antidifferentiation!) techniques in a typical freshman calculus class.e. simple though it is. one that can be written in (finitely many) terms of the elementary functions one learns about in precalculus mathematics. (Endpoint Approximation Theorem) Let f : [a. b] into n equally spaced subintervals: thus for 0 ≤ i ≤ n − 1. Thus one wants methods for evaluating definite integrals other than the Fundamental Theorem of Calculus. it is a dirty secret of the trade that in practice many functions you wish to integrate do not have an “elementary” antiderivative.. v = e−x . Then du = (n + 1)xn dx. and Z ∞ Z ∞ n+1 −x n −x oo x e dx = (n + 1)x e |0 − (−e−x (n + 1)xn )dx 0 0 Z ∞ IH = (0 − 0) + (n + 1) xn e−x dx = (n + 1)n! = (n + 1)! 0  2. APPROXIMATE INTEGRATION 187 n−1 Z 1 = sinm+1 ax cosn−1 ax + sinm ax cosn−2 ax. 3. we integrate by parts. By induction on n. is a remarkably powerful tool throughout honors calculus and analysis. Integration of Rational Functions. dv = e−x dx. 0 xn e−x dx = n!.

Z .

 .

b .

(b − a)2 M 1  f − Ln (f ).

≤ .

.

2 n .

.

a .

Note that L1 (f ) = (b − a)f (a). By the Racetrack Principle. for all x ∈ [a. Step 1: We establish the result for n = 1. Proof. b] we have −M (x − a) + f (a) ≤ f (x) ≤ M (x − a) + f (a) 2It would be reasonable to argue that if one can approximate the real number R b f to any a degree of accuracy. then in some sense one does know it exactly. .

INTEGRAL MISCELLANY and thus Z b Z b Z b (−M (x − a) + f (a))) ≤ f≤ (M (x − a) + f (a)).188 9. 2 a 2 which is equivalent to . a a a Thus Z b −M 2 M (b − a) + (b − a)f (a) ≤ f≤ (b − a)2 + (b − a)f (a).

Z .

.

b .

M f − L1 (f ).

≤ (b − a)2 . .

.

2 .

.

a .

Then . Step 2: Let n ∈ Z+ .

Z .

X Z x∗i+1 X .

.

Z x∗i+1 ! .

.

b n−1 n−1 ∗ b−a b−a .

f − Ln (f ).

= | f − f (xi )( ) |≤ f − f (xi ∗ )( )| .

.

n n .

.

.

a .

xi ∗ .

xi ∗ i=0 i=0 Step 1 applies to each term in the latter sum to give .

Z .

n−1 .

b .

X M b−a  (b − a)2 M 1  2 f − L (f ) ≤ ( ) = . .

.

n 2 n 2 n .

.

.

a .

xi+1 ] via the line which matches f at the two endpoints of the interval. Notice that this is the area of the trapezoid whose upper edge is the line segment which joins (xi . .1: Show that Theorem 9.+f (a+(n−1)∆)+ f (b). 2 2 On each subinterval [xi . f (xi )) Rb to (xi+1 . f (xi+1 ). at least for Rb nice f and large n. That’s cute. Exercise 9. b] → R is increasing. In view of the preceding exercise there is certainly no reason to prefer left end- point sums over right endpoint sums. n=0  Exercise 9. if you stare at pictures of left and right endpoint sums for a while. xi+1 ] the  average of the left and right endpoint sums is f (xi )(xi+1 −xi )+f (xi+1 )(xi+1 −xi ) f (xi )+f (xi+1 ) 2 = 2 (xi+1 − xi ). Show that Z b 1 0≤ f − Ln (f ) ≤ (f (b) − f (a))(b − a) . a n b) Derive a similar result to part a) for the right endpoint sum Rn (f ). . Thinking of the trapezoidal rule this way suggests that Tn (f ) should. a better approximation to a f than either the left or right .3. but a more useful way of thinking about Tn (f ) is that it is obtained by linearly interpolating f on each subinterval [xi .4 holds verbatim for with left endpoint sums replaced by right endpoint sums Rn (f ). c) Derive similar results to parts a) and b) if f is decreasing. we get (54) 1 1 Tn (f ) = Ln (f )+Rn (f ) = f (a)+f (a+∆)+f (a+2∆) .3. For this reason the approximation Tn (f ) to a f is often called the trapezoidal rule. In fact. eventually it will occur to you to take the average of the two of them: with ∆ = b−a n .2: a) Suppose f : [a.

Another motivation is that if f is increasing. Then for all n ∈ Z+ we have . taking f to be linear with slope M (or −M ) was the worst case scenario: if we approximate a linear function `(x) with slope ±M on the interval [a.4 showed that for each fixed M . b] → R be twice differentiable with bounded second derivative: there is M ≥ 0 such that |f 00 (x)| ≤ M for all x ∈ [a. b]. b] by the horizontal line given by the left endpoint (say). 3. (Trapezoidal Approximation Theorem) Let f : [a. For instance. so averaging the two of them gives something which is closer to the true value. then we may as well assume that Rb `(a) = 0 and then we are approximating ` by the zero function. The following result confirms our intuitions. if f is linear then on each subinterval we are approx- Rb imating f by itself and thus Tn (f ) = a f .5. hence M 2 2 (b − a) . our proof of Theorem 9. Theorem 9. then Ln (f ) is a lower estimate and Rn (f ) is an upper estimate. APPROXIMATE INTEGRATION 189 endpoint sums. This was certainly not the case for the endpoint rules: in fact. whereas a ` is the area of the triangle with base (b − a) and height M (b − a).

Z .

 .

b .

(b − a)3 M  1 f − Tn (f ).

≤ . .

.

12 n2 .

.

a .

12 12 . . . [a. b]). Also put n . b]) B = sup(f 00 . [a. . . integrating and applying the Racetrack Principle gives A 2 B t ≤ ϕ0i (t) ≤ t2 . let xi = a + i∆. n − 1. using the Fundamental Theorem of Calculus. Z xi+1 t ϕi : [0. b−a Proof. 2 2 0 Since ϕi (0) = 0. Then for all 0 ≤ i ≤ n − 1 and t ∈ [0. i=0 a We have ϕi (0) = 0 and. ∆] → R. Put ∆ = and for i = 0. 2 2 2 2 so ϕ0i (0) = 0 and −1 0 1 1 1 ϕ00i = f (xi + t) + f 0 (xi + t) + tf 00 (xi + t) = tf 00 (xi + t). 4 4 doing it again (since ϕi (0) = 0) and plugging in t = ∆ gives A 3 B 3 ∆ ≤ ϕi (∆) ≤ ∆ . 2 2 2 2 Put A = inf(f 00 . 1 t 1 t ϕ0i = (f (xi ) + f (xi + t))+ f 0 (xi +t)−f (xi +t) = (f (xi )−f (xi +t))+ f 0 (xi +t). ϕi (t) = (f (xi ) + f (xi + t)) − f. ∆] we have A B t ≤ ϕ00i (t) ≤ t. 2 xi As motivation for this definition we observe that n−1 X Z b (55) ϕi (∆) = Tn (f ) − f.

INTEGRAL MISCELLANY Summing these inequalities from i = 0 to n − 1 and using (55) we get Z b A 3 B 3 ∆ n ≤ Tn (f ) − f≤ ∆ n. 12 a 12 Substituting in ∆ = b−a n gives Z b A(b − a)3 1 B(b − a)3     ≤ Tn (f ) − f≤ .190 9. 12 n2 a 12 Since |A|. we get the desired result: . |B| ≤ M .

Z .

 .

b .

(b − a)3 M  1 f − Tn (f ).

. ≤ .

.

12 n2 .

.

a .

3. let’s consider the Riemann sums in which each sample point x∗i is the midpoint of the subinterval [xi . then Mn (f ) ≤ a f . Theorem 9.5: a) Show: if f is linear.6.3. suppose  f is3 continuous on [a. the midpoint rule is exact: a f = Mn (f ).  00 Exercise 9. the trapezoidal rule is exact: a f = Tn (f ) for all n ∈ Z+ . the trapezoidal rule is never exact: a f 6= Tn (f ). Then for all n ∈ Z+ we have . i=0 2 Rb Exercise 9. b] into n subintervals of equal width ∆ = b−a n as usual. Rb Exercise 9. This suggests a differ- ent kind of approximation scheme: rather than averaging endpoint Riemann sums. This is an instance of a heuristic in statistical reasoning: extremal sample points are not as reliable as interior points. then Tn (f ) ≥ a f.3: In the setting of Theorem 9.4: Show: if f : [a.5. b] → R be twice differentiable with bounded second derivative: there is M ≥ 0 such that |f 00 (x)| ≤ M for all x ∈ [a. Looking back at the formula (54) for the trapezoidal rule we notice that we have given equal weights to all the interior sample points but only half as much weight to the two endpoints. Rb b) Show: if f : [a. b]. b]. Rb c) Show: if f = x2 + bx + c. (Midpoint Approximation Theorem) Let f : [a. (b−a) f 00 (c) Rb  1 a) Show: there is c ∈ [a. this leads us to the midpoint rule n−1 X 1 Mn (f ) = ∆ f (a + (i + )∆). b] → R is convex. The following result concerning the midpoint rule is somewhat surprising: it gives a sense in which the midpoint rule is twice as good as the trapezoidal rule. b] → R is convex. Rb b) Show: if f is linear. xi+1 ]. b] such that Tn (f ) − a f = 12 n2 .3. Dividing the interval [a.

Z .

 .

b .

(b − a)3 M  1 f − Mn (f ).

≤ . .

.

24 n2 .

.

a .

. n − 1. ∆/2] → R. . xi −t . b−a Proof. Put ∆ = n . . for i = 0. Also put Z xi +t Ψi : [0. . Ψi (t) = f − 2tf (xi ). let xi = a + (i + 21 )∆.

we get the desired result: . Integrate and apply the Racetrack Principle twice as in the proof of Theorem 9. 24 24 Summing (57) from i = 0 to n − 1. using (56) and substituting ∆ = b−a n gives b (b − a)3 A 1 (b − a)3 B 1   Z   2 ≤ f − Mn (f ) ≤ . So Ψ0i (0) = 0. [a.t ∈ (xi − t. Applying the Mean Value Theorem to f 0 on [xi − t. 3. ∆ 2 ] we have 2tA ≤ Ψ00i (t) ≤ 2tB. 2t 2t or Ψ00i (t) = 2tf 00 (xi. i=0 a We have Ψi (0) = 0. Put A = inf(f 00 . and plug in t = ∆ 2 to get A 3 B 3 (57) ∆ ≤ Ψi (∆/2) ≤ ∆ .t ) = = i .t ).5. 24 n a 24 n2 Since |A|. [a. b]). Ψ0i (t) = f (xi + t) − f (xi − t)(−1) − 2f (xi ) = (f (xi + t) + f (xi − t)) − 2f (xi ). Ψ00 (t) = f 0 (xi + t) − f 0 (xi − t). xi + t) such that f 0 (xi + t) − f 0 (xi − t) Ψ00 (t) f 00 (xi. b]) B = sup(f 00 . APPROXIMATE INTEGRATION 191 As motivation for this definition we observe that n−1 X Z b (56) Ψ(∆/2) = f − Mn (f ). xi + t] we get a point xi. |B| ≤ M . and using the Fundamental Theorem of Calculus. Then for all 0 ≤ i ≤ n − 1 and t ∈ [0.

Z .

 .

b .

(b − a)3 M  1 f − Mn (f ).

. ≤ .

.

24 n2 .

.

a .

 For integrable f : [a. b] → R and n ∈ Z+ . b ∈ R such that a+b = 1. Since our previous results suggest that the midpoint rule is “twice as good” as the trapezoidal rule. aMn (f )+bTn (f ) → a f. we define Simpson’s Rule 2 1 S2n (f ) = Mn (f ) + Tn (f ). As with the other approximation rules. 3 3 Thus S2n (f ) is a weighted average of the midpoint rule and the trapezoidal rule. Rb Exercise: a) Show that for any a. it . Let us better justify Simpson’s Rule. Rb b) Deduce that S2n (f ) → a f . it makes some vague sense to weight in this way.

7. . . We have 2 1 2 a+b 1 f (a) + f (b) S2 (f ) =M2 (f ) + T2 (f ) = ((b − a)f ( )) + ( (b − a)) 3 3 3 2 3 2   b−a a+b = f (a) + 4f ( ) + f (b) . INTEGRAL MISCELLANY is really a composite rule. .6: Suppose f is a cubic polynomial. . b] into two subintervals. shows that it is even a little better than we might have expected. Exercise 9.3. . . xn = b. 6 2 Since we are dividing [a. Simpson’s Rule Rb is exact: a f = S2n (f ) for all n ∈ Z+ . . Simpson’s rule is an approximation by quadratic functions. it suf- Rb fices to show a f = S2 (f ). Show that Simpson’s Rule is Rb exact: S2n (f ) = a f . So let’s first concentrate on S2 (f ). By splitting up [a. . + f (b) 3 ∆ = (f (x0 ) + 4f (x1 ) + 2f (x2 ) + . i. Proof. 3 + b−a For n ∈ Z . xn−1 a + (2n − 1)∆. calculation: Z b A B (Ax2 + Bx + C) = (b3 − a3 ) + (b2 − a2 ) + C(b − a) a 3 2   A 2 B = (b − a) (a + ab + b2 ) + (a + b) + C) . we have ∆ S2 (f ) = (f (x0 ) + 4f (x1 ) + f (x2 )). we get ∆ S2n (f ) = (f (x0 ) + 4f (x1 ) + f (x2 ) + f (x2 ) + 4f (x3 ) + f (x4 ) + f (x4 ) + . Let ∆ = b−a 2n . The following exercise (which again. x1 = a + ∆. 3 2 whereas     b−a a+b f (a) + 4f + f (b) 6 2  2   ! b−a 2 a+b a+b 2 = Aa + Ba + C + 4(A +B + C) + (Ab + Bb + C) 6 2 2 b−a A(a2 + (a2 + 2ab + b2 ) + b2 ) + B(a + 2a + 2b + b) + 6C  = 6   A 2 B = (b − a) (a + ab + b2 ) + (a + b) + C . x2 = a + 2∆ = b. 3 2  Thus whereas the endpoint rule is an approximation by constant functions and the trapezoidal rule is an approximation by linear functions. b] into n pairs of subintervals. 3 Lemma 9. x1 = a + ∆.e. so with x0 = a. the width of each is ∆ = b−a 2 . if unenlightening. For any quadratic function f (x) = Ax2 + Bx + C. . x0 = a. it is based on iterating one simple approximation on many equally spaced subintervals. if unenlightening.. calculation). putting ∆ = 2n .192 9. This is done by a direct. can be solved by a direct. + 4f (xn−1 ) + f (xn ). We may therefore expect it to be more accu- rate than either the Trapezoidal or Midpoint Rules.

APPROXIMATE INTEGRATION 193 Since Simpson’s Rule is exact for polynomials of degree at most 3 and a func- tion f is a polynomial of degree at most three if and only if f (4) ≡ 0. b].8. (Simpson Approximation Theorem) Let f : [a. b] → R be four times differentiable with bounded fourth derivative: there is M ≥ 0 such that |f (4) (x)| ≤ M for all x ∈ [a. Then for all even n ∈ Z+ we have . The following result confirms this. it stands to reason that Simpson’s Rule will be a better or worse approximation according to the magnitude of f (4) on [a. Theorem 9. 3. b].

Z .

 .

b .

(b − a)4 M  1 f − Sn (f ).

. ≤ .

.

180 n4 .

.

a .

∆] → R. 1 1 Φ0i (t) = t(−f 0 (xi −t)+f 0 (xi +t))+ (f (xi −t)+4f (xi )+f (xi +t))−f (xi +t)−f (xi −t)) 3 3 1 2 = t(f 0 (xi + t) − f 0 (xi − t)) − (f (xi − t) − 2f (xi ) + f (xi + t)). B = sup(f (4) . let xi = a + (2i + 1)∆ and put Z xi +t 1 Φi : [0. 2 2t Let A = inf(f (4) . b]). For 0 ≤ i ≤ 2 − 1. n Proof. Applying the Mean Value Theorem to f 000 on [xi − t. [a. there is ξ ∈ (xi − t. ∆] we have 2A 2 2B 2 t ≤ Φ000 i (t) ≤ t . 3 3 Integrate and apply the Racetrack Principle three times and plug in t = ∆ to get A 3 B 3 (59) ∆ ≤ Φi (∆) ≤ ∆ . so for all i and all t ∈ [0. 3 3 So Φ00i (0) = 0 and 1 000 1 1 1 Φ000 i (t) = t(f (xi +t)−f 000 (xi −t))+ (f 00 (xi +t)+f 00 (xi −t))− f 00 (xi +t)− f 00 (xi −t) 3 3 3 3 1 = t(f 000 (xi + t) − f 000 (xi − t)). 3 xi −t As motivation for this definition we observe that n−1 X Z b (58) Φi (t) = Sn (f ) − f. xi + t]. i=0 a We have Φi (0) = 0 and using the Fundamental Theorem of Calculus. 90 90 . xi + t) such that 3 00 f 000 (xi + t) − f 000 (xi − t) Φi (t)/t2 = = f (4) (η). 3 So Φ000 i (0) = 0. Φi (t) = t(f (xi − t) + 4(xi ) + f (xi + t)) − f. b]). [a. 3 3 So Φ0i (0) = 0 and 1 00 1 2 2 Φ00i (t) = t(f (xi −t)+f 00 (xi +t))+ (f 0 (xi +t)−f 0 (xi −t))+ f 0 (xi −t)− f 0 (xi +t) 3 3 3 3 1 1 = t(f 00 (xi + t) + f 00 (xi − t)) − (f 0 (xi + t) − f 0 (xi − t)).

f (b)−f (a) and let S(x) = f (a) + b−a (x − a) be the secant line. 4. developed in jointly with his younger collaborator Roger Cotes in the early years of the 18th century.3 using (58) and substituting ∆ = b−a n gives Z b A(b − a)5 1 B(b − a)5     ≤ S n (f ) − f ≤ . Then   Rb a+b f f (a) + f (b) f ≤ a ≤ .8 are taken from [BS. Integral Inequalities Theorem 9.9. They are admirably down-to-earth. For now we just mention the sobering fact that all these rules – and higher-degree analogues of them – were already known to Newton.3. Appendix D]. This will be taken up in more detail in Chapter 12. b]. using only the Mean Value Theorem and the Funda- mental Theorem of Calculus. Midpoint and Simpson rules. (The Hermite-Hadamard Inequality) Let f : [a. a 12 n2 b) Derive similar “error equalities” for the Endpoint. INTEGRAL MISCELLANY Summing (59) from i = 0 to n2 − 1. 2 b−a 2 Proof. The proofs of Theorems 9. b] → R be convex and continuous. Integrating these inequalities and dividing by b − a we get Rb Rb Rb a s a f S ≤ ≤ a . Let s(x) = f ( a+b a+b 2 ) +m(x − 2 ) be a supporting line for f at x = a+b 2 .5 to show that there is η ∈ [a. 9. but I hope the reader can see that they have something to do with polynomial in- terpolation.7: a) In the setting of the Trapezoidal Rule. s(x) ≤ f (x) ≤ S(x).5. 180 n4 a 180  Exercise 9. Adapt the proof of Theorem 9.6 and 9. . suppose moreover that f 00 is continuous on [a. so (60) ∀x ∈ [a. Our motivations for considering the various “rules” were also a little light. b].194 9. b−a b−a b−a Now we have Rb R b a+b f ( 2 ) + m(x − a+b Z b a s(x) 2 ) a+b m a+b = a = f( )+ (x − ) b−a b−a 2 b−a a 2  2 a2  a+b m b a+b a+b m a+b = f( )+ − − (b − a) = f ( )+ ·0= . However it must be admitted that they are rather mysterious. b] such that Z b (b − a)3 f 00 (η) 1   Tn (f ) − f= . 2 b−a 2 2 2 2 b−a 2 3Note that there are n terms here – half as many as in the previous results. This gives rise 2 1 to an extra factor of 2 .

b] → [c. a  Theorem 9. and let P : [a. ϕ : [c. Proof. b] and a P = 1. b] → [0. ∞) be a probability density function. and let f : [a. Let [a. b] → [c. For an integrable function f : [a.10. q ∈ R>0 be such that 1 p + 1q = 1. b−a b−a 2 Rb Rb s S Substituting these evaluations of and b−a a a b−a into (60) gives Rb a+b f f (a) + f (b) f( )≤ a ≤ . b] → [0. b] → R be Riemann integrable functions. Then .9 is not necessary: the inequality holds for any convex f : [a. ∞) be a probability density: Rb i. Then E(f ) ∈ [c. d] and thus ϕ(E(f )) is defined. g : [a. P is integrable on [a.11 (Hölder’s Integral Inequality).. a a a Rb so a f (x)P (x)dx = E(f ) ∈ [c. a Theorem 9. Let f. Now put x0 = E(f ) and let s(x) = mx + B be a supporting line for the convex function ϕ at x0 . d] and s(x0 ) = ϕ(x0 ).e. INTEGRAL INEQUALITIES 195 Rb   Rb f (b)−f (a) S(x)dx a (f (a) + b−a (x − a))dx f (a) + f (b) a = = . d]. 4. 2 b−a 2  Exercise: Show that the hypothesis of continuity in Theorem 9. b] → R. b] → R. b] be a closed interval. we define the expected value Z b E(f ) = f (x)P (x)dx. Since f : [a. d] and ϕ(E(f )) ≤ E(ϕ(f )). so s(x) ≤ ϕ(x) for all x ∈ [c. Let p. d] → R be convex. Now Z b Z b E(ϕ(f )) = ϕ(f (x))P (x)dx ≥ (mf (x) + B)P (x)dx a a Z b =m f (x)P (x)dx + B = mE(f ) + B = mx0 + b = ϕ(x0 ) = ϕ(E(f )). d] be integrable. we have Z b Z b Z b c= cP (x)dx ≤ f (x)P (x) ≤ dP (x) = d. (Jensen’s Integral Inequality) Let P : [a.

Z .

! p1 Z ! q1 .

b .

Z b b (61) f g.

. ≤ |f |p |g|q .

.

.

.

a .

29. |f g| ≤  p Mg and thus the sum of the integrals of |f g| over the . b] such that |f |p >  has content zero: thus. the set of x ∈ [a. for every δ > 0 there is a finite collection of subintervals of [a. such that on the complement of those subintervals |f |p ≤ . By Lemma 8. On 1 this complement. Let Mf and Mg be upper bound for f and g on [a. b]. of total length at most δ. a a Rb Proof. Step 1: Suppose a |f |p = 0. b].

11 is important enough to be restated as a result in its own right. so altogether we get Z b Z b 1 | f g| ≤ |f g| ≤ (b − a) p Mg + δMf Mg . b] → R be Riemann integrable functions. If the length of the interval a − b were a precise multiple of the period 2π λ then we get several “complete sine waves” and the positive and negative area cancel out exactly. a a a 5. then a C cos(λx) oscillates increasingly rapidly as λ increases. δ are arbitrary. As the reader surely expects. Henceforth we may suppose Z b Z b If = |f |p > 0. Rb Here is the idea of the proof: if f (x) ≡ C is a constant function. we deduce | a f g| = 0. since a non-negative continuous function f with Rb a f = 0 must be identically zero. Let f. Proof.12 (Cauchy-Schwarz Integral Inequality). Then Z b !2 Z b Z b fg ≤ f2 g2 . The sum of the Riemann integrals over the subintervals of total length δ of |f g| is at most δMf Mg .  Remark: Admittedly Step 1 of the above proof is rather technical. making the integral exactly zero. INTEGRAL MISCELLANY 1 complement is at most (b − a) p Mg . . the student may wish to restrict to continuous f and g. Ig = |g|q > 0. the arguments for parts a) and b) are virtually identical. a a a p q p q 1 1 R  p1 R  q1 b b Multiplying through by Ifp Igq = a |f |p a |g|q gives the desired result. b] → R be Riemann integrable. p q ! Z b Z b Z b | ˜|p f |g̃|q 1 1 | f˜g̃| ≤ |f˜||g̃| ≤ + = + = 1. The Riemann-Lebesgue Lemma Theorem 9. As λ increases the remaining “incomplete sine wave” is a bounded function living on a smaller and smaller subinterval. (Riemann-Lebesgue Lemma) Let f : [a. A similar Rb argument works if a |g|q = 0.196 9. g : [a. so its contribution to the integral goes to zero. The stu- dent who continues on will learn the corresponding result for the Lebesgue integral. so we will establish part a) and leave part b) as an exercise. To an extent this points to the finickiness of the class of Riemann integrable functions. Corollary 9.13. The case p = q = 2 of Theorem 9. Then: Rb a) limλ→∞ a f (x) cos(λx)dx = 0. Alternately. a a 1 1 Step 2: Put f˜ = f /If and g̃ = g/Ig . in which the proof of this step is immediate. Thus both sides of (61) are zero in this case and the inequality holds. a a Rb Since all the other quantities are fixed and . Rb b) limλ→∞ a f (x) sin(λx)dx = 0. Then by Young’s Inequality.

< xn−1 < xn = b} of [a. Okay. a 2 Now . . there is a partition P = {a = x0 < x1 < . [xi . b] such that Z b  0 ≤ L(f. xi+1 ]) on Rb the subinterval [xi . By Darboux’s Criterion. P) ≤ f< . xi+1 ) of [a. P). THE RIEMANN-LEBESGUE LEMMA 197 A similar argument holds for step functions. And now the key: the integrable func- tions are precisely those which can be well approximated by step functions in the sense that the integral of the difference can be made arbitrarily small. which become constant functions when split up into appropriate subintervals. so Z b  0≤ (f − g) ≤ . b]. 5. . so g ≤ f and a g = L(f. a 2 Let g be the step function which is constantly equal to mi = inf(f. let’s see the details: fix  > 0.

Z .

Z .

Z .

.

b .

b .

b .

f (x) cos(λx)dx.

≤ |f (x) − g(x)|| cos(λx)|dx + .

g(x) cos(λx)dx.

.

.

.

.

.

.

a .

a .

a .

.

.

.

.

n−1 Z n−1  .

.

X xi+1 .

 .

.

X mi .

(62) ≤ +.

mi cos(λx)dx.

≤ + .

(sin(λxi+1 ) − sin(λxi )).

. .

.

2 .

i=0 xi .

2 .

i=0 λ .

Using this. Here we have a lot of expressions of the form | sin(A) − sin(B)| for which an obvious upper bound is 2.  . 2 λ But this inequality holds for any λ > 0. so taking λ sufficiently large we can make Rb the last term at most 2 and thus | a f (x) cos(λx)dx| < . the last expression in (62) is at most Pn−1  2 i=0 |mi | + .

.

In place of x• .. ∞) → R as follows: for x ∈ [n. ∞) → R interpolates f if f (n) = xn for all n ∈ Z+ . x3 . then if we form the sequence xn = f (n). then it follows that limn→∞ xn = L. by a slight abuse of notation. Then f (x) = x interpolates the sequence. ∞) → R as x approaches infinity. In fact. Define f : [1.e. If xn = f (x) for a function f which is continuous – or better. . log n log x Example: Supose xn = n . we say a sequence an diverges to infinity – and write limn→∞ an = ∞ or an → ∞ – if for all M ∈ R there exists N ∈ Z+ such that n ≥ N =⇒ an > M . The notion of an infinite sequence in a general set X really is natural and important throughout mathematics: for instance. . Note that the function is not required to be injective: i. in which we fix some element x ∈ X and take xn = x for all n. a simple but important ex- ample of a sequence is a constant sequence. 1} then we are considering infi- nite sequences of binary numbers. we are getting an ordered infinite list of elements of X: x1 . Finally. xn . If limx→∞ f (x) = L.. differentiable – then the methods of calculus can often be brought to bear to analyze the limiting behavior of xn . Given a sequence {xn }. An infinite sequence in X is given by a function x• : Z+ → X. But here we will focus on real infinite sequences xbullet : Z+ → R. CHAPTER 10 Infinite Sequences Let X be a set. . xn . we have |an − L| < . In fact. We say that an infinite sequence an converges to a real number L if for all  > 0 there exists a positive integer N such that for all n ≥ N . Further. x2 . n + 1). Exercise: Let {an }∞ n=1 be a real sequence. . we say that a function f : [1. Less formally but entirely equivalently. it is more than reminiscent: there is a direct con- nection. we will write {xn }∞ n=1 or even. . . among other places. x→∞ x ∞ x→∞ 1 ∞ It follows that xn → 0. This concept is strongly reminiscent of that of the limit of a function f : [1. we define divergence to negative infinity: I leave it to you to write out the definition. we may have xi = xj for i 6= j. a concept which comes up naturally in computer science and probability theory. f (x) = (n + 1 − x)an + (x − n)an+1 . and 1 log x ∞ LH 1 lim = = lim x = = 0. 199 . . A sequence is said to be convergent if it converges to some L ∈ R and other- wise divergent. if X = {0.

+ (an − an−1 )bn ) n−1 X = an bn+1 − am bn − (ak+1 − ak )bk+1 k=m n X = an bn+1 − am bn + (an+1 − an )bn+1 − (ak+1 − ak )bk+1 k=m n X = an+1 bn+1 − am bm − (ak+1 − ak )bk+1 . 1.. Then for all m ≤ n we have Xn n X ak (bk+1 − bk ) = (an+1 bn+1 − am bm ) + (ak+1 − ak )bk+1 .200 10. INFINITE SEQUENCES a) Take. n X ak (bk+1 − bk ) = am bm+1 + . k=m k=m Proof. let n X 1 1 1 an = = 1 + + . + 2. i=1 i 2 n and let n X 1 1 1 bn = 2 = 1 + 2 + .1. k=m  . . and sketch the graph of f . b) Show that the sequence limx→∞ f (x) exists iff the sequence {an } is convergent. and if so the limits are the same. The given f is piecewise linear but gener- ally not differentiable at integer values. + . The previous exercise shows that in principle every infinite sequence {an } can be interpolated by a continuous function. . But in practice this is only useful if the interpolating function f is simple enough to have a known limiting behavior at infinity... . . In fact we will be most interested in sequences of finite sums. + an bn ) k=m = an bn+1 − am bm − ((am+1 − am )bm+1 + . an = n2 . .. + an bn+1 − (am bm + . Example (Fibonacci Sequence) Example (Newton’s method sequences). (Summation by Parts) Let {an } and {bn } be two sequences. Many sequences which come up in practice cannot be interpolated in a useful way. with only a little more trouble we could “round off the corners” and find a differentiable function f which interpolates {an }. for example. Summation by Parts Lemma 10. i=1 i 2 n What is the limiting behavior of an and bn ? In fact it turns out that an → ∞ and 2 bn → π6 : whatever is happening here is rather clearly beyond the tools we have at the moment! So we will need to develop new tools. For instance. However. .

(Abel’s Lemma) Let {an }∞ n=1 be a real sequence with the following property: there is an M > 0 such that |a1 + . and for N ∈ Z . . Proof. there are several alternate formulas and special cases. . n=1 n=1 Proposition 10. The point is that Lemma 10. 2. though in some sense trivial.. . . after taking limits. for all N ∈ Z+ . there are several identities involving expressions in which we have discretely differentiated {an } and discretely integrated {bn }.1 could hardly be more elementary or straightforward: lit- erally all we are doing is regrouping some finite sums. {cn } be real infinite sequences. 2. one of which is given in the following exercise. recalled above. Theorem 10.2. infinite series. ∞ ∞ + Exercise: Pn Let {a Pnn}n=1 and {bn }n=1 be sequences. . a a Just as integration by parts is a fundamental tool in the study of integrals – and. Using (63). Easy Facts The following result collects some easy facts about convergence of infinite sequences. for summation by parts.1 is a discrete analogue of integration by parts: Z b Z b f g 0 = f (b)g(b) − f (a)g(a) − f 0 g. put An = i=1 an . Show that N X N X −1 (63) an bn = An (bn − bn+1 ) + AN bN . N Then. summation by parts would seem (to me at least) very mysterious. ≥ 0. A0 = B0 = 0. Bn = i=1 bn . Let {bn }∞ n=1Pbe a real sequence such that b1 ≥ b2 ≥ . you’ll probably gain some appreciation that these formulas. we have N X N X −1 | ab bn | = | An (bn − bn+1 ) + AN bN | n=1 n=1 N X −1 ≤ (bn − bn+1 )|AN | + bN |AN | = b1 |AN | ≤ b1 M. For now though you might try to prove it without using the summation by parts variant (63): by doing so. can be used in distinctly nontrivial ways.. a) If an = C for all n – a constant sequence – then an → C. improper integrals – summation by parts is a fundamental tool in the study of finite sums – and. | n=1 an bn | ≤ b1 M . after taking limits.but only much later on. L2 ∈ R we have an → L1 . or vice versa – rather than any one specific formula. Nevertheless the human mind strives for understanding and rebels against its absence: without any further explanation. ≥ bn . EASY FACTS 201 The proof of Lemma 10. Let {an }. + aN | ≤ M for all positive integers N . P So really one should remem- ber the idea – starting with a sum of the form n an bn . b) The limit of a convergent sequence is unique: if for L1 . .3. Whereas for integration by parts there really is just the one identity. n=1  We will put Abel’s Lemma to use. {bn }.

(iv) If M 6= 0. there exists N ∈ Z+ such that for all n ≥ N . (iii) an bn → LM . then so is the third. abnn → M L . a) Which of the following subsets of R are discrete? (i) A finite set. g) Suppose there exists N ∈ Z+ such that bn = an for all n ≥ N . b] for all n and an → L. it is clear that an eventually constant sequence is convergent. e) If a ≤ b are such that an ∈ [a. We say that a sequence {an } is eventually constant if there is C ∈ R and N ∈ Z+ such that an = C for all n ≥ N . and thus the limiting behavior of such functions is very limited. x + ) is x. say . and the ones that may not be are routine. an → L ⇐⇒ bn → L. (ii) The integers Z. A function from Z+ to Z+ is “fully discrete”.  Proposition 10. If not. But be honest with yourself: for each part of Theorem 10. Proposition 10. Can → CL.202 10. But since an → L. f ) (Three Sequence Principle) Suppose cn = an + bn . It is easy to see that if such a C exists then it is unique. {bn }. As above.4. Let {an } be an infinite sequence with values in the integers Z. (iv) The set { n1 | n ∈ Z+ } of reciprocals of positive integers. But the interval (L − .3 holds verbatim for functions of a continuous variable approaching infinity. d) If an ≤ bn for all n. but really this is almost obvous in any event. we have |an − L| < 1. L + ) contains no integers: contradiction. Then it is not possible for exactly two of the three sequences {an }. and since an and L are both integers this implies an = L. |an − L| < . Of course an eventually constant sequence converges to its eventual value – e. then L ∈ [a. (ii) an + bn → L + M . In fact. Then an is convergent iff it is eventually constant. {cn } to be convergent: if any two are convergent. please take some time right now to write out a careful proof. Proof. ∞]. INFINITE SEQUENCES and an → L2 .g. Hence one method of proof would be to establish these for functions – or maintain that we have known these facts for a long time – and then apply the Sequence Interpolation Theorem. there is  > 0 such that the only element of S which lies in (x − . Then for any L ∈ [−∞. b]. then L ≤ M . by applying parts a) and g) of Theorem 10. Now take  = 1 in the definition of convergence: there is N ∈ Z+ such that for n ≥ N . the distance from L to the nearest integer is a positive number. (iii) The rational numbers Q. Conversely.4 goes a long way towards explaining why we have described a func- tion from Z+ to R as semi -discrete. every part of Theorem 10. Thus the sequence is eventually constant with eventual value L. Exercise: A subset S ⊂ R is discrete if for all x ∈ S. . and we call such a C the eventual value of the sequence. c) If an → L and bn → M then: (i) For all C ∈ R.3. Most of these facts are qiute familiar.3 for which you have an iota of doubt as to how to prove. an → L and bn → M . suppose an → L ∈ R. then L1 = L2 . First we claim that L ∈ Z.

 Spivak gives a slightly more general result. In particular. Let {an }∞ n=1 be a real sequence such that for all n ∈ Z+ . the proof of which (essentially the same as the one given above) we leave as an exercise. fix  > 0. and let f : I \ {c} → R be a function such that limx→c f (x) = L. since an → c. aN0 +1 . 4.5. Characterizing Continuity Theorem 10. we have a function from {n ∈ Z | n ≥ N0 } to R. c an interior point of I. 3. instead of a function from Z+ to R. 4. Monotone Sequences One commonality between sequences a : Z+ → R and non-discrete functions f : I → R is that – since indeed both the domain and codomain are subsets of R – we have a natural order structure ≤ and it makes sense to consider functions which systematically preserve – or systematically reverse! – this order structure. there is aδ with |aδ − c| < δ and |f (aδ ) − f (c)| ≥ . there is δ > 0 such that |x − c| < δ =⇒ |f (x) − f (c)| < . they look like aN0 . Namely. It is certainly no problem to entertain sequences starting at any integer N0 : informally. Then lim f (an ) = L. and let f : I → R be a function. . show that the following are equivalent: (i) S is discrete. Let I be an interval. This mild generalization changes nothing we have said so far or will say later. n→∞ Exercise: Prove Theorem 10. . (ii) Every convergent sequence {an } with an ∈ S for all n is eventually constant. (ii) =⇒ (i): We prove the contrapositive: that is. there exists N ∈ Z+ such that n ≥ N =⇒ |an − c| < δ. Formally. If f is not continuous at c then there exists  > 0 such that for all δ > 0. let c be an interior point of I. we will suppose that f is not continuous at c and find a sequence an → c but such that f (an ) does not converge to f (c). Since f is continuous at c. n ∈ Z+ . (i) =⇒ (ii): This argument is very similar to the (easy!) proof that a composition of continuous functions is continuous. m ≤ n =⇒ am ≤ an . (ii) For every infinite sequence an → c. Moreover. It is often convenient to consider sequences whose initial term is something other than 1. Theorem 10. Proof. an ∈ I \ {c} and limn→∞ an = c. S b) For a subset S ⊂ R. A sequence {an } is weakly increasing if for all m. for n ∈ Z+ we may take δ = n1 and choose an with |an − c| < n1 and |f (an ) − f (c)| ≥ . The following are equivalent: (i) f is continuous at c. . Thus the following definitions will be quite familiar. So if n ≥ N we have |an − c| < δ and thus |f (an ) − f (c)| < . . we have f (an ) → f (c). MONOTONE SEQUENCES 203 (v) The set { n1 | n ∈ Z+ } {0}.6. and then indeed we have an → c but f (an ) does not approach f (c).6. . We leave it to the reader to make her own peace with this. Let I be an interval.

Then: a) {an } is decreasing iff {−an } is increasing. unrelated meanings in mathematics so this is really not optimal. Exercise: Prove it.8. if a• : Z+ → R is weakly increasing. (ii) {an } is constant. weakly decreasing. A sequence {an } is weakly decreasing if for all m. m < n =⇒ am > an .7. Exercise: For a real sequence {an }. b) Formulate (and. INFINITE SEQUENCES A sequence {an } is increasing if for all m. If you are ever in doubt whether someone means m ≤ n =⇒ am ≤ an or m < n =⇒ am < an . many use the term “strictly increasing” for se- quences with m < n =⇒ am < an . i. we need to counsel the reader that the terminology here is not agreed upon: for instance. a) Show that {an } is increasing iff an < an+1 for all n ∈ Z+ . n ∈ Z+ . of course). it makes sense to consider its image. n ∈ Z+ . To speak of this it is convenient to make another definition: since a sequence an is really a function Z+ → R.204 10. m < n =⇒ am < an . this implies a much simpler limiting behavior. A sequence {an } is decreasing if for all m. Lemma 10. (Reflection Principle) Let {an } be a real sequence. Lemma 10. there is really only one way for limn→∞ an to fail to exist. show that the following are equivalent: (i) There is n ∈ Z+ such that either an < an+1 > an+2 (Λ-configuration) or an > an+1 < an+2 (V-configuration). the set of all real numbers of the form an for some n ∈ Z+ . prove) analogues of part a) with increasing replaced by each of weakly increasing. n ∈ Z+ . there is not really a standard name for this: I vaguely recall it being called the “trace” of the sequence. Just as for functions of a continuous variable.8 implies that whatever we can establish for increasing / weakly increasing sequences will carry over imme- diately to decreasing / weakly decreasing sequences (and conversely. . if you feel it is worth your time. decreasing. For a real sequence {an }. b) {an } is weakly increasing iff {−an } is weakly increasing. (ii) The sequence {an } is not monotone. A sequence is monotone if it is weakly increasing or weakly decreasing.. Remark: As before. We avoid the problem by giving the concept a very clunky name: we define the term set of {an } to be A = {an ∈ R | n ∈ Z+ }. but the term “trace” has many other. Especially. What a monotone sequence lacks is oscillation. there is a simple remedy: ask! Exercise: Let {an } be a sequence. Exercise: Prove it.e. Lemma 10. and as for functions of a continuous variable. m ≤ n =⇒ am ≥ an . Strangely. the following are equivalent: (i) {an } is both weakly increasing and weakly decreasing.

L < aN ≤ an . in terms of the image. we get a much stronger result. We get: Theorem 10. Then if {an } is bounded below. and this implies that L −  ≥ an for all n. Proposition 10. suppose not: then there is L0 such that for all n ∈ Z+ . if there exists m ∈ R such that m ≤ an for all n ∈ Z+ . so that L −  is a smaller upper bound for A than L. Similarly. if there exists M ∈ R such that an ≤ M for all n ∈ Z+ . But since the sequence is weakly increasing. 4. we say a• is bounded below if its term set is bounded below: that is. then because an is increasing. By combining this with the least upper bound axiom. For no n do we have |an − L| < . this implies that for all n ≥ N .. Next suppose that there are infinitely many terms an with L − an ≥ . contradicting our asumption that an → L. We need to show that for all but finitely many n ∈ Z+ we have − < L − an < . (Monotone Sequence Theorem) a) Every bounded monotone real sequence is convergent. Finally. Then if {an } is bounded above. and it is unbounded below it diverges to −∞.  Remark: In the previous result we have not used the completeness property of R. Second we claim L is the least upper bound. Since L is the least upper bound of A. and a sequence is unbounded if it is not bounded. Thus if we take  = aN − L. Otherewise we say the sequence is unbounded below. Let {an }∞ n=1 be a weakly increasing sequence. so L − an ≥ 0 > −. then L is the least upper bound of the term set A = {an | n ∈ Z+ }. Then an → L. Indeed.11. Indeed. a) If the sequence converges to L ∈ R. More precisely: b) Let {an } be weakly increasing.9. in particular L ≥ an for all n ∈ Z+ . b) Let  > 0. it holds for all terms of the sequence. a sequenc is bounded if it is both bounded above and bounded below. then an → L.um.e. contradicting our assumption that an → L. . This is so important as to be worth spelling out very carefully. Let L ∈ (−∞. in the following sense. Otherwise we say the sequence is unbounded above. then for no n ≥ N do we have |an − L| < . and thus it holds for sequences with values in the rationals Q (and where by converges we mean converges to a rational number !) or really in any ordered field. contradiction. in proving the Monotone Sequence Theorem we did not just invoke the completeness of the real field: we used its full force. it converges to its least upper bound. suppose not: then there is N ∈ Z+ with L < aN . MONOTONE SEQUENCES 205 We are therefore able to speak of bounded sequences just as for bounded func- tions: i. and if it is unbounded above it diverges to ∞. if the term set A has an upper bound L < ∞. I mean the term set. A sequence a• : Z+ → R is bounded above if its term set is bounded above: that is. Proof. ∞] be the least upper bound of the term set of A. an ≤ L0 < L. In fact. it converges to its greatest lower bound. b) Conversely. or L ≥ an + .10. Let  = L − L0 . But if this inequality holds for ifninitely many terms of the sequence. Let {an }∞ n=1 be a weakly increasing real sequence. Theorem 10. c) Let {an } be weakly decreasing... a) First we claim L = limn→∞ an is an upper bound for the term set A.

. and by the reflected version of Proposition 10. .g a1 . . {Mn } is decreasing and bounded below by any element of S. for sufficiently large n we have m + n1 < M ≤ Mn ≤ yn + n1 and thus m < yn . say to L ∈ F . Example: Let an = (−1)n . . −1. Then F is Dedekind complete: every nonempty bounded above subset has a least upper bound. is – again informally. 1. A subsequence of {an }. (There is not meant to be a clearly evident pattern as to which terms of the se- quence we are choosing: the point is that there does not need to be. . contradiction. . If for all x ∈ S. . . But this is absurd: if n ≤ L for all n ∈ Z+ then n + 1 ≤ L for all n ∈ Z+ and thus n ≤ L − 1 for all n ∈ Z+ . yn + n ). . then we may put yn = z1 . a3 . . Suppose that it were convergent. so L − 1 is a smaller upper bound for Z+ . −1. a precise definition will be coming soon – obtained by selecting some infinitely many terms of the sequence. . Proof of claim: Indeed. By selecting all the odd-numbered terms we get one subsequence: (65) −1. say to M . for a little while. Therefore the process musts terminate and we may take yn = zk for sufficiently large k. 1. . . .9. INFINITE SEQUENCES Theorem 10. e. Otherwise. there exists z3 ∈ S with z3 > z2 + n1 . claim For all n ∈ Z+ . . informally. Step 2: Let S ⊂ R be nonempty and bounded above by M0 .  5. There are other choices – many other choices. proof of claim: Suppose not: then the sequence xn = n is increasing and bounded above. . −1. 1 Mn = min(Mn−1 . let us look at some key examples. . by selecting all the even-numberd terms we get another subsequence: (66) 1.9. Proof. then we may put yn = z2 . . Now we define a sequence of upper bounds {Mn }∞ + n=1 of S as follows: for all n ∈ Z . If for all x ∈ S.) To learn more about subsequences and what they are good for. x ≤ yn + n1 . we get a sequence with zk ≥ z1 + k−1n . If this process continues infinitely. a2 . But by Step 1. there exists yn ∈ S such that for any x ∈ S. Let F be an ordered field in which every bounded monotone sequence converges. a real sequence can be ob- tained as a subsequence of {an } iff it takes values in {±1}. x ≤ z2 + n1 . an orderest list of real numbers: a1 . 1. Subsequences A real infinite sequence {an } is. 1. zk > M . M is the infimum of the set {Mn }. 1. In fact. so the sequence is (64) −1. Otherwise there exists z2 ∈ S with z2 > z1 + n1 . . L is the least upper bound of Z+ . first choose any element z1 of S. a100 . Similarly. a703 . a6 .12. . an . Moreover M must be the supremum of S. F is Archimedean. . for any m < M . . so by hypothesis it converges. By Proposition 10. since again by the Archimedean nature of the order. −1. x ≤ z1 + n1 . so that for sufficiently large k.206 10. Step 1: We claim that F is Archimedean.

. 5. L ∈ [−∞. (70) 1. Here are some subsequences we can get: (68) 1. To obtain a subsequence we choose an increasing sequence n• : Z+ → Z+ . . 16. . < nk of positive integers and use this to tell us which terms of the sequence to take. the subsequence (66) converges to its constant value 1. 5. . . . . getting an1 . an3 . 2. 4. Let’s recap: we began with a sequence (66) which did not converge due to oscil- lation. an2 . It is an increasing sequence which is not unbounded above. at least – in a convergent subsequence. hence convergent to it constant value −1. Similarly. k 7→ nk and form the composite function a• ◦ n• : Z+ → R. And so forth. SUBSEQUENCES 207 But note that something very interesting happened in the passage from our original sequence to each of the first two subsequences. In fact. Show that for all k ∈ Z+ . 8. Let’s formalize these observations about what passing to subsequences does for us. due to oscillation. Let {an } be a real sequence. . Indeed. . . Then for any subsequence {ank }∞ k=1 . . (69). . (70). the subsequence (65) is constant. Exercise: Let n• : Z+ → Z+ be increasing. Given a real sequence {an }. we choose an increasing list n1 < n2 < . . 9. (71) 1. . (69) 2. by choosing appropriate subsequences we were able to remove the oscillation. 7. we practically saw it in the example above. 4. . (71). 8. so the sequence is (67) 1. 3. . 4.) Example: Let an = n. 3. . . and thus it diverges to infinity. and a little thought suggests that every subse- quence will have this property. nk ≥ k. we view it as a function a• : Z→ R. Note moreover that our sequence (67) fails to converge.13. . 6. . For that matter. 2. However. n 7→ an . passing to a subsequence can cure divergence due to oscillation. but not divergence to infinity. 4. The original sequence (64) is not convergent. However. resulting – in this case. the subsequences of (67) are precisely the increasing se- quences with values in the positive integers. and suppose that an → L. . Example (subsequences of a convergent sequence): We are now well-prepared for the formal definition. Proposition 10. k 7→ ank . . so do the subsequences (68). we have ank → L. ∞]. (There are also lots of subsequences which are “inappropriate” for this purpose. . . Thus. Less formally. ank . . but not due to oscillation. .

. gener- ality and usefulness. hence we have found a monotone subsequence. (iv) At least one subsequence of {an } is convergent. Similarly. Case 2: Suppose L = ∞. Since an → L. Case 3: We leave the case of L = −∞ to the reader as a (not very challenging) exercise in modifying the argument of Case 2. ank1 ≤ ank2 . b) If {an } is increasing. . INFINITE SEQUENCES Proof. Lemma 10.14. for each  > 0 there exists N = N () ∈ Z+ such that for all n ≥ N . So suppose on the contrary that there are only finitely many peaks. . More precisely: a) If {an } is weakly increasing. Case 1: Suppose that L ∈ R. 6. Then any sequence of peaks forms a strictly decreasing subsequence. . it has stayed with me through my entire adult mathematical career. (ii) {an } is bounded. ank .X. Let us say that m ∈ N is a peak of the sequence {an } if for all n > m. It seems that this argument goes back to a short note of Newman and Parsons [NP88].c). Since an → ∞. is constant.208 10. an > M .1. The Bolzano-Weierstrass Theorem For Sequences 6. then by definition of a subsequence nk1 ≤ nk2 . an < am . Proof. For a montone sequence {an }. Proof. then every subsequence ank is weakly increasing.d). since n2 is not a peak. then every subsequence ank is constant. for each M ∈ R there exists N = N (M ) ∈ Z+ such that for all n ≥ N . I learned of the following result from Mr. then every subsequence ank is increasing. Continuing in this way we construct an infinite (not necessarily strictly) increasing subsequence an1 .15. then every subsequence ank is decreasing. (iii) Every subsequence of {an } is convergent. Since n1 = N is not a peak. If k ≥ N . by Exercise X. in my first quarter of college at the University of Chicago. then nk ≥ k ≥ N and thus ank > M .  Proposition 10. for all k ≥ N we have nk ≥ N and thus |ank − L| < . Evangelos Kobotis in late 1994.16. the following are equivalent: (i) {an } is convergent. . then every subsequence ank is weakly decreasing. d) If {an } is decreasing. . b). Every infinite sequence has a monotone subsequence. we cannot resist mentioning the following finite analogue of the Rising Sun Lemma. Remark: Although it is nothing to do with our development of infinite sequences. Done!  Exercise: Show that every infinite sequence admits a subsequence which is (i) in- creasing. there exists n2 > n1 with an2 ≥ an1 . and then by definition of weakly increasing. . a) If k1 ≤ k2 . an2 . there exists n3 > n2 with an3 ≥ an2 . |an − L| < . or (iii) constant. (ii) decreasing. e) If {an }. Then.  Corollary 10. Every subsequence of a monotone sequence is monotone. The Rising Sun Lemma. and let N ∈ N be such that there are no peaks n ≥ N . Suppose first that there are infinitely many peaks. . c) If {an } is weakly decreasing.e) These may safely be left to the reader. Because of its elegance..

By Corollary 10. . . Proof. sr −1.15. We will prove part a) and leave the task of adapting the argument to prove part b) to the reader.  Exercise: Show that for an ordered field F .17. We define a function F : {1. . . . which of course is also bounded: |ank | ≤ M for all k ∈ Z+ . where ij is the length of the longest weakly increasing subsequence beginning with xj and dj is the length of the longest weakly decreasing subsequence beginning with xj .  1Recall that we have already shown that this is equivalent to Dedekind completeness. b) Consider the length (r − 1)(s − 1) sequence r. 2r. . . . . 6. r − 1} × {1. . . (i) There is a weakly increasing subsequence of length r. a) Consider a finite real sequence x1 . k + 1). . . Theorem 10. Let {xn } be a real sequence which is unbounded above. let nk+1 be the least positive integer such that xnk+1 > max(xnk . (r−1)(s−1)+1} → {1. b) A real sequence which is unbounded below admits a subsequence diverging to −∞. b) The number (r − 1)(s − 1) + 1 is best possible. 2r −1. . (s−1)r +1. or (ii) There is a weakly decreasing subsequence of length s. Supplements to Bolzano-Weierstrass. And so forth: having defined nk . s ∈ Z+ . . 2). By the Rising Sun Lemma. (Bolzano-Weierstrass) Every bounded real sequence admits a convergent subsequence. a contradiction.18. . (Erdős-Szekeres [ES35]) Let r. . . .19. . dj ). Let n2 be the least positive integer such that xn2 > max(xn1 . . r −1.3. . [Er94]) Seeking a contradiction. 1. s − 1} by F (j) = (ij . sr. . in the sense that there are real sequences of length (r − 1)(s − 1) without either a weakly increasing subsequence of length r or a weakly decreasing subsequence of length s. . Then for every M ∈ R. 2r +1. if xj ≥ xk . Let {an } be a sequence with |an | ≤ M for all n ∈ Z+ . a) ([Se59]. the subsequence ank is convergent. then adding xj to the beginning of the longest weakly decreas- ing sequence starting with xk gives a longer weakly decreasing sequence starting at xj . Since the domain of F has (r −1)(s−1)+1 elements and the codomain has (r −1)(s−1) elements. . there exists at least one n such that xn ≥ M . r +1.1 (ii) Every bounded sequence admits a convergent subsequence. similarly. . the following are equivalent: (i) Every bounded monotone sequence converges. . 3r −1. THE BOLZANO-WEIERSTRASS THEOREM FOR SEQUENCES 209 Theorem 10. . . .  6. a) A real sequence which is unbounded above admits a sub- sequence diverging to ∞. Bolzano-Weierstrass for Sequences. x(r−1)(s−1)+1 of length (r − 1)(s − 1) + 1. 3r. 6. there must be j < k such that F (j) = F (k): thus ij = ik and dj = dk . . . Proof. there is a monotone subsequence ank . Let n1 be the least positive integer such that xn1 > 1. . Theorem 10. we suppose every weakly increasing subsequence has length at most r − 1 and every weakly decreasing subse- quence has length at most s−1. . But if xj ≤ xk then adding xj to the beginning of the longest weakly increasing sequence starting with xk gives a longer weakly increasing sequence starting at xj . a contradiction. . Then limk→∞ xnk = ∞. . Proof.2. .

seemingly unrelated technology – in this case. (Extreme Value Theorem Again) Let f : [a. the theory of infinite sequences – one is richly rewarded with the ability to come up with shorter. We proved that the converse holds for bounded sets: Theorem 10.22. M < ∞. Since xnk − ynk → 0. But this is impossible: by definition of M . 6. after passing to a subsequence we may assume xnk is convergent. By reflection. we can make |f (xnk )−f (ynk )| ≤ |f (xnk )−L−(f (ynk ))−L)| ≤ |f (xnk ) − L| + |f (ynk ) − L| arbitrarily small by taking n sufficiently large. there is x ∈ R with 0 < |a−x| < . but the original argument was sufficiently short and sweet that we don’t mind repeating it here: let M be the supremum of the set of f (x) for x ∈ [a. b] has a limit point. b]. b]. We will now give two quite different proofs of two of the three Interval Theorems. Proof. say to L. Then f is uniformly continuous. INFINITE SEQUENCES 6. for sufficiently large n we have |f (xnk ) − f (ynk )| < : contradiction!  Remark: Although I am very fond of the Real Induction proof of the Uniform Con- tinuity Theorem. so by Bolzano-Weierstrass there is a subsequence xnk such that xnk → L ∈ R. Suppose not: then there exists  > 0 such that for all n ∈ Z+ we have xn . b] so is bounded. Seeking a contradiction we suppose f is unbounded : then.210 10. bounded interval [a. we have for all k that |f (xnk )| > k.20. yn ∈ [a. b] with |f (xn )| > n. so the function g : [a. f attains its minimum as well. on the one hand. there exists xn ∈ [a. b] → R by g(x) = f (x)−M is continuous on [a. f (x) takes values arbitrarily close to M . Then f is bounded and attains its minimum and maximum values. By Bolzano-Weierstrass. so by what we just proved. Proof. so the sequence f (xnk ) is divergent: contradiction! We do not have a new argument for the attainment of the minimum and max- imum values. On the other hand. since f is continuous we have by Theorem 10. b] → R be con- tinuous. for each n ∈ Z+ . hence bounded. b] with |xn − yn | < n1 but |f (xn ) − f (yn )| ≥ . It was an easy observation that any finite subset of R has no limit points. and since the sequences f (xnk ) and f (ynk ) have the same limit. (Bolzano-Weierstrass for Subsets) An infinite subset X of a closed. b] → R be continuous. I must admit this argument seems shorter and easier. by the Three Sequence Principle ynk is also convergent to L. Bolzano-Weierstrass for Subsets Revisited. . The sequence {xn } takes val- ues in [a. simpler (but more “high tech”) proofs. (Uniform Continuity Theorem Again) Let f : [a. Now. Very often in mathematics if one takes the time to develop additional. In particular.  Theorem 10. so f (x) − M takes values arbitrarily close to zero and thus |g(x)| take arbitrarily large values.5 that f (xnk ) → f (L) ∈ R. If M is not attained then the function f (x)−M 1 is continuous and nowhere zero. But then f (xnk ) → f (L) and f (ynk ) → f (L). Recall that earlier we proved a different result which we also called the Bolzano- Weierstrass Theorem: it concerned the notion of a limit point of a subset X ⊂ R: a point a ∈ R is a limit point for X if for all  > 0.5. Applications of Bolzano-Weierstrass.4.21. Theorem 10.

Partial Limits.e. |xnk − L| < k1 . c) an → ∞ iff ∞ is the only partial limit.19b) −∞ is a partial limit. First observe that by Theorem 10.19. Indeed: Assume the Bolzano-Weierstrass Theorem for sequences. and let {xn } be a bounded sequence: thus there are a. |xnk − L| < : since the terms are distinct. a) {an } has at least one partial limit L ∈ [−∞. Partial Limits. then by Bolzano-Weierstrass there is a finite partial limit L. {an } must be bounded above and below. then by Theorem 10. a) If {an } is bounded. b]. for any  > 0. b]. and let X be an in- finite subset of [a.23. 7. Otherwise X is infinite and we may apply Bolzano-Weierstrass for subsets to get a limit point L of X. b) The sequence {an } is convergent iff it has exactly one partial limit L and L is finite. Proof. in particular.) Theorem 10. 7. We wish to show that an → L. Since xn ∈ X ⊂ [a. PARTIAL LIMITS. there is K ∈ Z+ such that for all k ≥ K. In fact they are really equivalent results. then by Theorem 10. at most one of them is equal to L and thus the interval (L − .24. we have a ≤ xn ≤ b for all n. If {an } is unbounded above.1. and thus xnk → L. we say that an extended real number L ∈ [−∞. ∞ is a partial limit. We claim that L is a limit point for X: indeed. Suppose that L is a partial limit of some subsequence of {an }. This implies: there is n1 ∈ Z+ such that |xn1 − L| < 1. Let {an } be a real sequence. So choose M ∈ R such . For a real sequence {an }. L 6= ±∞. If X is finite. Let {an } be a real sequence. we can choose a sequence {xn }∞ n=1 of distinct elements of X. b ∈ R with a ≤ xn ≤ b for all n ∈ Z+ . there is n2 ∈ Z+ such that n2 > n1 and |xn2 − L| < 21 : continuing in this way we build a subsequence {xnk } such that for all k ∈ Z+ . It {an } is unbounded below.. Every sequence is either bounded. Since X is infinite. Assume the Bolzano-Weierstrass Theorem for subsets. so there is always at least one partial limit. having chosen such an n1 . b) Suppose that L ∈ R is the unique partial limit of {an }. unbounded above or unbounded below (and the last two are not mutually exclusive).23. so by Bolzano-Weierstrass for sequences there is a subsequence xnk converging to some L ∈ [a. L + ) contains infinitely many elements of X.19a). ∞] is a partial limit of {an } if there exists a subsequence ank such that ank → L. (Hint: this comes down to the fact that a subse- quence of a subsequence is itself a subsequence. which means roughly that is much easier to deduce each from the other than it is to prove either one. for otherwise it would have an infinite partial limit. b]. Then L is also a partial limit of {an }. Exercise: Prove Lemma 10. {xn } is bounded. i. d) an → −∞ iff −∞ is the only partial limit. Let X = {xn | n ∈ Z+ } be the term set of the sequence. Lemma 10. ∞]. Limits Superior and Inferior 7. then the sequence has a con- stant (hence convergent) subsequence. LIMITS SUPERIOR AND INFERIOR 211 Our task now is to explain why we have two different results called the Bolzano- Weierstrass Theorem.

M ]. (ii) {xn } is bounded. . d) Left to the reader to prove. since every term of ank` is bounded above by L − . L − ]. Show that {xn } is convergent.2. c) Suppose an → ∞.23. Suppose that there exists N ∈ Z+ such that for all n ≥ N . L − ] or for all n ∈ S we have an ∈ [L + .24b) with the Bolzano-Weierstrass Theorem. Suppose that: (i) Any two convergent subsequences converge to the same limit. The reader who understands the argument will have no trouble adapting to the latter case.25. n2 . The Limit Supremum and Limit Infimum. and let a ≤ b be extended real numbers. there must in fact be in infinite subset S ⊂ N such that either for all n ∈ S we have an ∈ [−M. Then also every subsequence diverges to +∞. . so L = +∞ is a partial limit. we get a subsubsequence (!) ank` which converges to some L0 . so it is not the case that +∞ is the only partial limit of {an }. a ≤ xn ≤ b. Theorem 10.) Exercise: Let {xn } be a real sequence. Writing the elements of S in increasing order as n1 . Now suppose that an does not converge to L: then there exists  > 0 such that it is not the case that there exists N ∈ N such that |an − L| <  for all n ≥ N . Case 1: The sequence is unbounded above. But then L0 6= L so the sequence has a second partial limit L0 : contradiction. Moreover. Proof. For a real sequence {an }. L is also a partial limit of the original sequence. Let us treat the former case. by adapting the argument of part c) or otherwise. Moreover. We note right away that a subsubsequence of an is also a subsequence of an : we still have an infinite subset of N whose elements are being taken in increasing order. . since |an − L| ≥  means either −M ≤ an ≤ L −  or L +  ≤ an ≤ M . its limit L0 must satisfy L0 ≤ L − . Applying Bolzano-Weierstrass to this subsequence. (Suggestion: Combine Theorem 10. We will prove the converse via its contrapositive (the inverse): suppose that an does not diverge to ∞. Then +∞ is a partial limit. 7. Then there exists M ∈ R and infinitely many n ∈ Z+ such that an ≤ M . we have shown that there exists a subsequence {ank } all of whose terms lie in the closed interval [−M. L is a partial limit of the se- quence and is thus the largest partial limit. . and from this it follows that there is a subsequence {ank } which is bounded above by M . .  Exercise: Let {xn } be a real sequence. What this means is that there are infinitely many values of n such that |an −L| ≥ . Show that {an } has a partial limit L with a ≤ L ≤ b. hence by part a) it has some partial limit L < ∞. let L be the set of all partial limits of {an }. so +∞ is a partial limit and there are no other partial limits. By Lemma 10. nk . For any real sequence {an } . This subsequence does not have +∞ as a partial limit.212 10. INFINITE SEQUENCES that |an | ≤ M for all n. We define the limit supremum L of a real sequence to be the least upper bound of the set of all partial limits of the sequence.

PARTIAL LIMITS. Here is a very useful characterization of the limit supremum of a sequence {an } it is the unique extended real number L such that for any M > L. so M ≥ L. there exists a subsequence ank such that ank → L. Then it has a finite partial L (it may or may not also have −∞ as a partial limit). For each k ∈ Z+ . It follows that L0 ≥ L.  Similarly we define the limit infimum L of a real sequence to be the infimum of the set of all partial limits. If a sequence diverges . Then −∞ is the only partial limit and thus L = −∞ is the largest partial limit. namely its limit L. As usual. It follows that L ≤ L0 . there exists nk such that ank > L − k1 . the proof of Theorem 10. ∞).26. by the definition of L = sup L the subsequence cannot have any partial limit which is strictly greater than L: therefore by the process of elimination we must have ank → L. and such that for any m < L. For any real sequence an . By Theorem 10. 7. To see this. LIMITS SUPERIOR AND INFERIOR 213 Case 2: The sequence diverges to −∞. Proposition 10. its supremum either stays the same or decreases. define bn = supk≥n ak . i. L − k1 < L.e.21. Case 3: The sequence is bounded above and does not diverge to −∞. we have (72) L = lim sup ak n→∞ k≥n and (73) L = lim inf ak . It follows from these inequalities that the subsequence ank cannot have any partial limit which is less than L. moreover.  The merit of these considerations is the following: if a sequence converges. we have a number to describe its limiting behavior. so L ∈ (−∞. so there exists a subsequence converging to some L0 > L − k1 . and thus L = L0 . Our first order of business is to show that limn→∞ supk≥n ak exists as an ex- tended real number. we will prove the statements involving the limit supremum and leave the analogous case of the limit infimum to the reader. bn → L0 ∈ [−∞. Exercise: a) Prove the above characterization of the limit supremum. {n ∈ Z+ | an ≥ M } is finite. {n ∈ Z+ | an ≥ m} is infinite. In particular. Proof. n→∞ k≥n Because of these identities it is traditional to write lim sup an in place of L and lim inf an in place of L. By reflection. First suppose M > L0 . ∞]. On the other hand. We need to find a subsequence converging to L.25 shows that L is a partial limit of the sequence. Then there exists n ∈ Z+ such that supk≥n ak < M . Now we will show that L = L0 using the characterization of the limit supre- mum stated above. Thus there are only finitely many terms of the sequence which are at least M . when we pass from a set of extended real numbers to a subset. b) State and prove an analogous characterization of the limit infimum.. The key observation is that {bn } is decreasing. suppose m < L0 . Then there are infinitely many n ∈ Z+ such that m < an and hence m ≤ L. Indeed.

There is something slightly curious about this definition: do you see it? It is this: we are defining the notion “convergent” in terms of the notion “convergent to L”. a function is not “uniformly continuous to L at c”. this is the way we tend to think of things in practice: if one is given an infinite sequence. awkward. Exercise: Prove Corollary 10. first we want to know whether it is convergent or divergent.g. ∞). i. However. definition involved a fixed Rb real number I. Since derivatives are defined in terms of limits (and not in terms of continuity!) they have the same problem: to show that f is differentiable at c we have to show that the limiting slope of the secant line is some specific real number. One might think that things should go the other way around: shouldn’t the concept of convergene should be somehow logically prior than “convergence to L”? Anyway. Let us look back at our definition of a convergent sequence: a real sequence {an } is convergent if there is some L ∈ R such that {an } converges to L. This is sort of in between in that the defi- nition includes a particular limiting value but we don’t have to “find” it: we have to show that an underestimate and an overestimate are actually the same. 8. and our definition of integrability was that these two quantities are finite and equal. Corollary 10. our definition of “the limit of f (x) as x → c” also has this feature: what we actually defined was limx→c f (x) = L. INFINITE SEQUENCES to ±∞. Then we gave a modified definition in terms of two quantities a f Rb and a f which exist for any function.2 The concept of uniform continuity really does not mention a specific limiting value. For many purposes – e. For integrals.1. Since L exists for any sequence.27. this is very powerful and useful. such that −∞ ≤ L ≤ L ≤ +∞. the plot thickens.214 10. which remedies the issue completely: this gives us a procedure to show that a function is integrable without requiring any knowledge on 2I think this partially explains why the concept of continuity is simpler than that of a limit.e. . to every sequence we have now associated two numbers: the limit infimum L and the limit supremum L. Our first. Moreover we have Darboux’s criterion. for making upper estimates – we can use the limit supremum L in the same way that we would use the limit L if the sequence were convergent (or divergent to ±∞).. Motivating Cauchy sequences. and then if we know it is convergent we can ask the more refined question: “To which real number L does our convergent sequence converge?” This is a good opportunity to think back on our other notions of convergence. For continuity the issue is not pressing because we know that we want the limit to be f (c).27. again we have an “extended real number” that we can use to describe its limiting behavior. Cauchy Sequences 8. If you think about it. A real sequence {an } is convergent iff L = L ∈ (−∞. But a sequence can be more complicated than this: it may be highly oscillatory and therefore its limiting behavior may be hard to describe. Similarly for L.

and the main point of our discussion of Riemann sums was to make the convergence more explicit.  Proposition 10. Suppose an → L. 8. Let {an } be a Cauchy sequence and suppose there is a subsequence ank converging to L ∈ F . but of course without having to “find” in any sense the value of the integral. Here are some elementary properties of Cauchy sequences. n ≥ N1 we have |an − am | = |am − an | < 2 .3 Lemma 10. Then there exists N ∈ Z+ such that for all n ≥ N . Further. Cauchy sequences. so   |an − L| = |(an − anN ) − (anN − L)| ≤ |an − anN | + |anN − L| < + = . Then nN ≥ N and N ≥ N2 . And that’s what made Darboux’s criterion so useful: we used it to show that every continuous function and every monotone function is integrable. A sequence {an }∞ n=1 in is Cauchy if for all  > 0. Thus for all m. Choose N1 ∈ Z+ such that for all m. There exists N ∈ Z+ such that for all m. say. n ≥ N . Proof.30. n ≥ N . Proof. A Cauchy sequence which admits a convergent subse- quence is itself convergent. choose N2 ∈ Z+ such that for all k ≥ N2 we have |ank − L| < 2 . A Cauchy sequence is bounded. Fix any  > 0. Therefore.2. This is exactly what Cauchy sequences are for. Moreover put M2 = max1≤n≤N |an | and M = max(M1 . so |an | ≤ |aN | + 1 = M1 . . Let {an } be a Cauchy sequence. Left to the reader as an exercise. taking m = N we get that for all n ≥ N . CAUCHY SEQUENCES 215 what number the integral should be.) Upshot: it would be nice to have some way of expressing/proving that a sequence is convergent which doesn’t have the limit of the sequence built into it. But I myself noticed this only quite recently – more than 15 years after I first learned about monotone sequences and Cauchy sequences.  Proposition 10. |an − L| < 2 . 8. A convergent sequence is Cauchy. n ≥ N we have   |an − am | = |(an − L) − (am − L)| ≤ |an − L| + |am − L| < + = . Then for all n ∈ Z+ . A subsequence of a Cauchy sequence is Cauchy.31. 2 2  3You might notice that they are reminiscent of the elementary properties of monotone se- quences we proved above. |am − an | < . there exists N ∈ Z + such that for all m. and put N = max(N1 . We claim that an converges to L. N2 ). (This inexplicitness is not entirely a good thing. Proof. |an − aN | < 1. |an | ≤ M .29. 2 2  Proposition 10.28. |am − an | < 1. I wish I had a deeper understanding of the underlying source – if any! – of these commonalities. M2 ). Proof.

+ x0 rn . L = limn→∞ xn+1 = limn→∞ rxn = rL. . (Cauchy Criterion) Any real Cauchy sequence is convergent. c). L) for any L. The solutions to L = rL for r ∈ R. if for all n ∈ Z+ xn 6= 0 we have xxn+1 n = r. then the sequence is x0 . d) If r = −1. so {|xn |} is decreasing and bounded below by 0.216 10. L ∈ [−∞. . and by the above analysis we must have L = ∞. the next result does crucially use the Dedekind completeness of the real numbers. (Geometric Sequences) Let {xn } be a geometric sequence with geometric ratio r and x0 6= 0. (r. and by the above analysis L = 0. (In fact there are non-Archimedean ordered fields in which the only Cauchy sequences are the eventually constant se- quences!) But we had better not get into such matters here. Proof. e) If |r| > 1. and thus the results hold in any ordered field. then |xn+1 | = |r||xn | > |xn |. By the Monotone Sequence Lemma. Theorem 10. We claim that – quite luckily! – we are able to obtain a simple closed-form expres- sion for Sn . non- negative number L. (r. −x0 .33. We may dispose of the case r = 1: then Sn = (n + 1). x0 . |xn | → L ∈ (|x0 |. −∞) for negative r. by Proposition 10.. xn → 0. {an } is bounded. Now: b) If |r| < 1. there are non-Archimedean – and thus necessarily not Dedekind complete – ordered fields in which every Cauchy sequence converges. . ∞] are: (1. INFINITE SEQUENCES The above proofs did not use completeness.30. . Since |xn | → 0. .) 9. However. When r 6= 1.  For x0 . By Bolzano-Weierstrass there exists a convergent subequence. Geometric Sequences and Series A geometric sequence is a sequence {xn }∞ n=0 of real numbers such that there is a fixed real number r with xn+1 = rxn . ∞]. b) If |r| < 1. e) If |r| > 1. in the form of the Bolzano-Weierstrass Theorem. See [Ma56]. then xn → x0 . By Proposition 10. Let {an } be a real Cauchy sequence. . a) A simple induction argument which is left to the reader. ∞) for positive r and (r. ∞]. Since xn+1 = rxn . we use a very standard trick : multiplying (75) by r gives (75) rSn = x0 r + . Theorem 10. so the sequence {|xn |} is increasing and bounded below by |x0 |. then |xn | → ∞. d) These are immediate and are just recorded for easy reference. + x0 rn + x0 rn+1 . r ∈ R. this implies that {an } is convergent. By the Montone Sequence Lemma. Proof.32. Suppose xn → L ∈ [−∞. then xn → 0.31. . 0) for any r. . c) If r = 1. which diverges. −x0 . then |xn+1 | = |r||xn | < |xn |. Exercise: Is there a “Cauchy criterion” for a function to be differentiable at a point? (Hint: yes. we define the finite geometric series (74) Sn = x0 + x0 r + . Finally. a) We have xn = x0 rn .  It can be further shown that an Archimedean ordered field F in which every Cauchy sequence is convergent must be Dedekind complete. In contrast. We call r the geometric ratio since. |xn | converges to a finite.

to what.. Now. A function f : I → R is a short map if 1 is a Lipschitz constant for f . r ∈ R.33 to prove Theorem 10. Since r 6= 1. then {Sn } diverges. Exercise: Let f : I → I be differentiable. if there is a constant 0 < C < 1 such that: (77) ∀x. y ∈ I. so it is reasonable to take another look at this important circle of ideas. Contraction Mappings Revisited The material of this section owes a debt to a paper of K. If so. 10. So a Lipschitz function is uniformly continuous. Exercise: Use Theorem 10. or a contraction mapping.34. this yields n X 1 − rn+1 (76) Sn = x0 r k = . and if so. + x0 r n . ∀x. Preliminaries. then limn→∞ Sn = 1−r . fixed points and sequences of iterates in the context of Newton’s method. we have more tools at our disposal. let n X Sn = x0 r n = x0 + x0 r + . . . we may – independently of x – choose δ < C . b) If |r| ≥ 1. CONTRACTION MAPPINGS REVISITED 217 and subtracting (75) from (74) gives (1 − r)Sn = x0 (1 − rn+1 . Thus contractive implies weakly contractive implies short map. y ∈ I. 10. Let’s take it again from the top: A function f : I → R is contractive.e. Recall that f : I → R is Lipschitz if (77) holds for some C > 0. |f (x) − f (y)| ≤ |x − y|. and then |x − y| ≤ δ =⇒ |f (x) − f (y)| ≤ Cδ < . k=0 1 a) If |r| < 1.1. |f (x) − f (y)| ≤ C|x − y|. Recall that we last studied contractive mappings. 1−r k=0 Having this closed form enables us to determine easily whether the sequence {Sn } converges.34. 10. Conrad [CdC]. (The Geometric Series) For x0 . A contraction mapping is a Lipchitz mapping with Lipschitz constant C < 1. for any  > 0. towards the end of a long chap- ter on infinite sequences. |f (x) − f (y)| < |x − y|. A function f : I → R is weakly contractive if: ∀x 6= y ∈ I. i. . Theorem 10.

a “map” or “mapping”). Proof. (Recall that Newton’s method in- volved consideration of sequences of iterates of the function T (x) = x − ff0(x) (x) . . and let f : X → Y be a function (or. Formally.  Exercise:4 Consider the function f : (0.. An element L ∈ X is a fixed point for f if f (L) = L. William Faulkner} would be an exercise in futility. how can we explicitly find them. . an element of both X and Y . we can take the output and plug it back in as the input: x ∈ X 7→ f (x) 7→ f (f (x)). especially intervals. L2 ∈ I such that f (L1 ) = L1 and f (L2 ) = L2 .) And now. x2 = f (x2 ). Lady Gaga. the sequence {f (xn )} = {xn+1 } is just a reindexed version of the sequence {xn }. then C is a Lipschitz constant for f . be the sequence of iterates of x0 under f .2. Theorem 10. so it must have the same limit: f (L) = L. . Here is what we want to know.g. Our first result is a very modest generalization of Lemma 7. f is contractive. in particular. Question 10. contradiction.  Exercise: Show that a short map f : I → I can have more than one fixed point.35. Further. a) Does f have at least one fixed point? b) Does f have exactly one fixed point? c) If f has fixed points. Since f is continuous and xn → L. we often restrict attention to mappings f : X → X. Perhaps the sets X and Y have no elements in common: then of course no function between them can have a fixed point: e. Let f : I → I be a function. at least approximately? 10. 1] given by f (x) = x2 . . Some Easy Results. 4This exercise and the next are easy bordering on insulting.37. 1) with |f 0 (x)| ≤ C for all x ∈ I. To alleviate this problem. 1] → (0. Suppose that xn converges to some L ∈ I. of the real numbers. INFINITE SEQUENCES a) Show that if |f 0 (x)| ≤ C for all x ∈ I. back to earth: the methods of calculus allow us to analyze functions defined on subsets. And we can keep going. starting from any initial value x0 ∈ X. b) Deduce: if there is C ∈ (0. looking for fixed points for mappings from the set X of real numbers to the set Y = {Barack Obama. synonymously. Suppose there are distinct L1 . Then |f (L1 ) − f (L2 )| = |L1 − L2 |. Lemma 10. Proof. A fixed point L of f is. A weakly contractive function has at most one fixed point. such a mapping can be iterated: that is. But since f (xn ) = xn+1 . So suppose we have an interval I ⊂ R and a function f : I → I. a) Show that f is contractive. Let X and Y be sets. f (xn ) → f (L).36. a short map).218 10. xn+1 = f (xn ). but to properly appreciate later results it will be useful to have these examples on record. we get a sequence of iterates {xn }∞ n=0 where for all n ∈ N. x1 = f (x1 ). c) State and prove analogous sufficient conditions in terms of bounds on f 0 for f to be weakly contractive (resp. Let f : I → I be a continuous function.7. Let x0 ∈ I and let x0 . The study of fixed points of mappings is highly important throughout mathematics. Then L is a fixed point of f .

When f : I → I has a fixed point L. then the sequence of iterates of L under f is constantly L and thus converges to L. and we get sucked into it. c) If f is continuous.37. a) Show that f is weakly contractive. 1]. . hence continuous. this is a contradiction. 1] → [0. so the sequence of iterates is constantly L. f (f (x0 )). different sequences of iterates would have at least two different limits. We say L ∈ I is an attracting point for f if for every x0 ∈ I. x = 0 a) Show that 0 is an attractive point of f but not a fixed point. b) Show that f has no fixed point on R. we would like to be able to find L as the limit of a sequence of iterates. the sequence of iterates of x0 under f must converge to both L1 and L2 . converges to L. a) Suppose L1 6= L2 are attractive points. contradicting attractivity. the sequence of iterates x0 . . Proof.38? Remark: Note that. then f (x0 ) = L. So if we had at least two fixed points. Exercise: Consider the function f : [0. . But this observation is not very useful in find- ing L! The next. although our terminology suppresses this. In .38. b) f has at most one fixed point. a) The attracting point L is unique. b) Show that f has no fixed point on R. Lemma 10. Exercise: Consider the function f : R → R given by f (x) = x + 1. 10. And we can: if x0 = L. observation is that a convegent sequence of iterates of a continuous function yields a fixed point. So the fixed point of an attractive map is easy to find: we just start any- where. iterate. Since a sequence can have at most one limit. f (x0 ). To state this more precisely we introduce some further terminology. √ Exercise: Consider the function f : R → R given by f (x) = x2 + 1. Suppose f : I → I is an attractive function. slightly less trivial. 0<x≤1 f (x) = 1.  Thus attractive maps are precisely the maps with a unique fixed point which can be found as the limit of a sequence of iterates starting from any point x0 in the domain. CONTRACTION MAPPINGS REVISITED 219 b) Show that f has no fixed point on (0. c) Show that f is not contractive. b) Why does this not contradict Lemma 10. We say f is attractive if it has an attracting point. a) Show that f is a short map. b) If L is a fixed point for f . 1] such that x  2. c) This follows immediately from the first two parts and from Lemma 10. Starting at any initial point x0 ∈ L. whether L is an attracting fixed point for f depends very much on the interval of definition I. then the unique attracting point is the unique fixed point.

 Exercise: Complete the proof of Theorem 10. C n → 0. b) Explicitly. That’s a fine result in and of itself. + C n |x1 − x0 | 1 − Ck     |x1 − x0 | = |x1 − x0 |C n 1 + C + . . which is the limit of every sequence of iterates: i. Step 0: By Theorem 10. d) Exhibit f : [−1. and let f : I → I be contractive with constant C ∈ (0. namely the explicit estimate (78). and for all x0 ∈ I. So {xn } is Cauchy and hence. f 0 (0) = 1. Step 1: Let x0 ∈ I. c) Exhibit f : [−1.3. a) Then f is attractive: there is a unique fixed point L. L is an attracting point for f . and let L ∈ I ◦ be a fixed point of f .39 by establishing (78). the term “attracting point” is often used for a weaker. . (Suggestion: adapt the argument of Step 1 of the proof. INFINITE SEQUENCES the literature. f has at most one fixed point. Theorem 10. then L is not a locally attracting point for f . 10. so we may choose N such that for all n ≥ N and all k ∈ N. . . “local prop- erty” which is studied in the following exercise. b) Show that if |f 0 (L)| > 1. let n ≥ N and let k ≥ 0. By Step 0. Now let f : I → I be C 1 . a) Show that if |f 0 (L)| < 1. .e. let N be a large positive integer to be chosen (rather sooner than) later.32. 1] → R such that L = 0 is a fixed point.) Thus the Contraction Mapping Theorem asserts in particular that a contractive mapping on a closed interval is attractive.37.36. Then |xn+k − xn | ≤ |xn+k − xn+k−1 | + |xn+k−1 − xn+k−2 | + . (Contraction Mapping Theorem) Let I ⊂ R be a closed in- terval. the sequence {xn } of iterates of f under x0 converges to L.  < 1−C 1−C Since |C|< 1. Step 2: By Step 1 and Lemma 10. ..  |x1 −x0 | 1−C C n < . Now that’s a theorem! . f 0 (0) = 1 and 0 is not a locally attracting point for f . A point L ∈ I is a locally attracting point if there exists δ > 0 such that f maps [L − δ. + C k−1 = |x1 − x0 |C n C n. and is so explicit that it immediately gives an efficient algorithm for approximating L to any desired accuracy. fix  > 0. But we have more. convergent. Step 3: We leave the derivation of (78) as an exercise for the reader. then L is a locally attracting point for f .220 10. f has at most one fixed point. and 0 is a locally attracting point for f . 1] → R such that L = 0 is a fixed point. L + δ] has L as an attracting point.   |x0 − f (x0 )| (78) |xn − L| ≤ C n. Exercise: Let f : I → I be a function. It guarantees that starting at any point x0 ∈ I the sequence of iterates converges to the attracting point L exponentially fast. 1). This completes the proof of part a). L + δ] into itself and the restriction of f to [L − δ. for each x0 ∈ I the sequence of iterates of x0 under f converges to a fixed point. The Contraction Mapping Theorem. for all x0 ∈ I and all n ∈ N.39. by Theorem 10. + |xn+1 − xn | ≤ C n+k−1 |x1 − x0 | + C n+k−2 |x1 − x0 | + . So there must be a unique fixed point L ∈ I. 1−C Proof.

1] f (x) = 1. we say inf I is an attracting point for f if for all x0 ∈ I. b) Explain how to use (say) a handheld calculator to approximate x to as many decimal places of accuracy as your calculator carries. b) If I = [a. Theorem 10. Proof. ◦ f be the nth iterate of f . c) Give an example of a contraction mapping on [a.5 b) Use part a) to extend the Contraction Mapping Principle to contractions f : [a. (ii) sup I is an attracting point for f . (iii) inf I is an attracting point for f . Suppose that for some N ∈ Z+ . above. with the proviso that the unique fixed point may be b. Recall that L ∈ I is attracting for f if for all x0 ∈ I. A fixed point of f is precisely a root of g. b) with fixed point b. CONTRACTION MAPPINGS REVISITED 221 Exercise: a) Show that there is a unique real number x such that cos x = x. You may wish to assume this part for now and come back to it later. a) Show that any fixed point of f is a fixed point of f ◦N . a) Define g : I → R by g(x) = f (x) − x. Let I be an interval. . and deduce that f (L) = L: thus f has a unique fixed point in I. the sequence of iterates of x0 under f converges to L. . Exercise: Let f : I → I be a function. b) be a contractive map with constant C ∈ (0. Show that f is not a contraction but f ◦ f is a contraction.4. b). 10. 2]. b) → [a. Let us say sup I is an attracting point for f if for all x0 ∈ I. x ∈ (1. e) Consider the function f : [0. d) Show that in fact L is an attracting point for f . and that by defining f at b to be this limit. c) Show that f (L) is also a fixed point for f ◦N . b) → [a. the extended function f : [a. the sequence of iterates of x0 under f approaches sup I (sup I = ∞ iff I is unbounded above). b] then f has a fixed point. which treats similar but more general problems. For n ∈ Z+ . and deduce that f has at most one fixed point in I. Similarly. b) Show that f ◦N has a unique fixed point L ∈ I. a) At least one of the following holds: (i) f has a fixed point. The result on extension will be much easier after you have read the next section. b] → R remains contractive with constant C. 0 ≤ x ∈ [0. so to prove part a) we may assume that g has no roots on I and show 5This is the hardest part of the problem.40. . 1] → [0. 10. Let I ⊂ R be an interval. and let f : I → I be continuous. and let f : I → I be continuous. 1] by  0. f ◦N is contractive. we showed that if L is attracting for f then L is the unique fixed point of f . a) Show that limx→b− f (x) exists (as a real number!). let f ◦n = f ◦ . the sequence of iterates of x0 under f approaches inf I (inf I = −∞ iff I is unbounded below). 1). Further Attraction Theorems. Exercise: Let f : [a. d) State and prove a version of the Contraction Mapping Principle valid for an arbitrary interval.

|xnk − L| = dnk → d.  Theorem 10. d = 0.6 itself called either the Contraction Mapping Principle or Banach’s Fixed Point Theo- rem. by Bolzano- Weierstrass. This sequence cannot converge to any element L of I. The case I = [a.39 is a special case of a celebrated theorem of Banach.39. Now there is a general mathematical structure called a metric space: this is a set X equipped with a metric function d : X × X → R. and our task is to show that L is attracting for f .41 is due to A. By weak contractivity. d(x. contradiction. so by Lemma 10. so {xn } is bounded. for any x0 ∈ I. sup I]. Observe that the desired conclusion that xn → L is equivalent to d = 0.222 10. INFINITE SEQUENCES that either (ii) or (iii) holds. But both are elements of I. The flavor of the generalization can perhaps be appreciated by looking back at what tools we used to prove Theorem 10. Step 0: If f has no fixed point in I.41 draws ideas from Beardon’s proof and also from K. Then f has an attracting point in [inf I.37 either a or b is a fixed point. b] is an instance of a general result of M. then by Theorem 10. So we may assume dn > 0 for all n ∈ N. b] and let x0 ∈ I. the sequence of iterates is increasing. Theorem 10. 10.  Remark: The case I = R of Theorem 10. Beardon [Be06]. say xnk → y. for by Lemma 10. The general strategy of the proof was to show that any sequence of iterates of x0 under f was a Cauchy sequence. y) measures the distance between x and y. 1] defined by f (x) = 1+x is weakly contractive but not contractive. Edelstein [Ed62] which is described in the next section. We will show (I) =⇒ (ii). When X = R. Suppose f (x) > x for all x ∈ I. y ∈ X. the very similar proof that (II) =⇒ (iii) is left to the reader. L + d0 ]. we used nothing other than the definition of a contractive mapping and (repeatedly) the triangle inequality. so the sequence of iterates converges to L. Our proof of Theorem 10. To show this. If for some N we have xN = L. If the continuous function g has no roots then either (I) f (x) > x for all x ∈ I or (II) f (x) < x for all x ∈ I. Step 1: Let x0 ∈ I. then by part a) the sequence of iterates of x0 under f would approach either sup I = b or inf I = a. then xn = L for all n ≥ N . L would then be a fixed point of f .41. Proof. Let I ⊂ R be an interval. If there were no fixed point. b) Let I = [a. Step 2: For all n ∈ N. put dn = |xn − L|. Then. The intuition is that for x. Fixed Point Theorems in Metric Spaces. there is a convergent subsequence. Since 0 is a lower bound. Conrad’s treatment of Edelstein’s Theorem in [CdC]. So xn must approach sup I. and then by weak contractivity {dn } is decreasing. As k → ∞ we have: |xnk − L| → |y − L|.40 either sup I or inf I is an attracting point for f . there is d ≥ 0 such that dn → d. |f (xnk ) − L| → |f (y) − L|. Thus we may assume that f has a fixed point L ∈ I. and for n ∈ N. 1 Exercise: Show that the function f : [0.37. and let f : I → I be weakly contractive. this 6Stefan Banach (1892-1945) .5. 1] → [0. so |y − L| = d = |f (y) − L| = |f (y) − f (L)|. |f (xnk ) − L| = |xnk +1 − L| = dnk +1 → d. xn ∈ [L − d0 .

one can directly carry over the definition of a Cauchy sequence: try it! Similarly.39. f (y)) ≤ Cd(x. y) : X × X → R as before: d(x. For instance. This is not really new or surprising: for instance. there is C ∈ (0. y) = d(y. ∞).e. wherever in any of our definitions we see |x − y|. i. d(xn .e. x0 ) < δ. y) = |x − y|. It turns out that these three simple properties enable us to carry over a large portion of the theory of real in- finite sequences to study sequences with values in a metric space X. a fixed point of f .) One important class of examples of metric spaces are obtained simply by taking a subset X ⊂ R and definining d(x. 10. there is N ∈ Z+ such that for all n ≥ N . dY (f (x). y ∈ X. it must satisfy three very familiar properties: (M1) (Definiteness) For all x. dX ) and (Y. (Banach Fixed Point Theorem [Ba22]) Let X be a complete metric space. A function f is continuous if it is contin- uous at every point of X. d(x. In order for a function d to be a metric. For the last century. y) = |x − y|. y). if (X. Now we can state the theorem. Thus d(x. y) = |x − y| is certainly a metric on R. we replace it with d(x. y. Banach’s Fixed Point Theorem has been one of the most important and useful results in mathematical analysis: it gives a very general con- dition for the existence of fixed points. with equality iff x = y. functions x• : Z+ → X. L) < . b) For every x0 ∈ X. z) ≤ d(x. d(x. and a remarkable number of “existence theorems” can be reduced to the existence of a fixed point of some function on some metric space. y ∈ X. Theorem 10. L). y). The point here is that for some subspaces of X. Similarly. consider the sequence xn = n1 as a sequence in (0.. z ∈ X. y ∈ X. z). d(xn . y) + d(y. (The reader should recognize this as a rather straight- forward adaptation of our cherished -δ definition of continuity to this new context.. Essentially. rather it converges to 0. there exists δ > 0 such that for all x0 ∈ X with dX (x. i. d(f (x). dY ) are metric spaces. (M2) (Symmetry) For all x. d(x. 1) such that for all x. Similarly.42 is exactly the same as that of Theorem 10. (M3) (Triangle Inequality) For all x. and let f : X → X be a contraction mapping: that is. The proof of Theorem 10. y) ≥ 0. f (x0 )) < . Then for all n ∈ N. Then: a) There is a unique L ∈ X with f (L) = L.42. For instance. x). in a general metric space X a Cauchy sequence need not converge to an element of X: we say that a metric space X is complete if every Cauchy sequence in X is convergent in X. L) ≤ C n d(x0 . but it does not converge to an element of (0. ∞): it is a Cauchy sequence there. a Cauchy sequence with values in X need not converge to an element of X. we say that a sequence {xn } in a metric space X converges to an element L ∈ X if for all  > 0. define the sequence of iterates of x0 under f by xn+1 = f (xn ) for n ∈ N. if you continue on in your study of mathematics . then a function f : X → Y is continuous at x ∈ X if for all  > 0. CONTRACTION MAPPINGS REVISITED 223 function is nothing else than d(x.

it is the second proof that we wish to carry over here. then by definition of compactness it admits a subsequence xnk converging to some L ∈ X. Theorem 10. Step 1: We claim f has a fixed point. The now standard proof of this seminal result uses Banach’s Fixed Point Theorem!7 Let X be a metric space. But if f (L) 6= L. and let f : X → X be a weakly contractive mapping. We follow [CdC]. f (x)). Then the statement “Every sequence with values in X admits a convergent subsequence” is certainly meaningful. contradiction. so is g and thus the Extreme Value Theorem applies and in particular g attains a minimum value: there is L ∈ X such that for all y ∈ X. i. .e. and then we prove – exactly as we did before – that if a subsequence of a Cauchy sequence converges to L then the Cauchy sequence itself must converge to L. (Edelstein [Ed62]) Let X be a compact metric space. In fact every compact metric space is complete. So L is a fixed point for f .  7In fact the title of [Ba22] indicates that applications to integral equations are being explicitly considered.41 carries over directly to show that L is an attracting point for f . and we ask the interested reader to check that it does carry over with no new difficulties. Since f is continuous. Since by definition this latter property holds in a compact metric space. We say that a metric space is compact if every sequence with values in X admits a convergent subsequence. Step 2: The argument of Step 2 of the proof of Theorem 10.. Step 0: The Extreme Value Theorem has the following generalization to compact metric spaces: if X is a compact metric space.224 10. b]: one using Real Induction and one using the fact that every sequence in [a. then any continuous function f : X → R is bounded and attains its maximum and minimum values. What Edel- stein actually showed was the following result. Recall that we gave two proofs of the Extreme Value Theorem for X = [a. the sequence of iterates of x0 under f converges to f . Edelstein. but – as is already the case with intervals on the real line! – whether it is true or false certainly de- pends on X. then by weak contractivity we have d(f (L). if {xn } is a Cauchy sequence in a compact metric space. f (y)). g(f (L)) < g(L). b) For all x0 ∈ X. b] admits a convergent subsequence. Then f is attractive: a) There is a unique fixed point L of f . Here we need a new argument: the one we gave for X = [a. which is not available in our present context. f (L)). f (L)) ≤ d(y.43. and the proof again requires no ideas other than the ones we have already developed: indeed. Above we showed that a weakly contractive map on a closed. f (f (L))) < d(L. INFINITE SEQUENCES you will surely learn about systems of differential equations. So here goes: let g : X → R by g(x) = d(x. d(L. and the most impor- tant result in this area is that – with suitable hypotheses and precisions. b] used the Intermediate Value Theorem. Proof. An “integral equation” is very similar in spirit to a differential equation: it is an equating relating an unknown function to its integral(s). of course – every system of differential equations has a unique solution. bounded (and thus compact) interval was attractive and attributed this to M.

Proposition 10. |f (s) − f (s0 )| = |f˜(s) − f˜(s0 )| < . so it converges (uniquely!) to 0 and thus f (x) = 0. (Uniqueness of Extension) Let S be a dense subset of an interval I. f : S → R is continuous if it is continuous at s for every s ∈ S. so there is δ > 0 such that for all x ∈ I with |x − s0 | < δ. . and fix  > 0.e. Since S is dense in I. f˜2 : I → R are two continuous functions such that f˜1 (x) = f˜2 (x) = f (x) for all x ∈ S. In particular then. Thus we get a sequence {xn }∞ n=1 taking values in S such that xn → x. Here we wish to consider the following problem: suppose we are given a dense subset S of I and a function f : S → R. 0. for all s ∈ S with |s − s0 | < δ. By Step 1. Proof.8 A subset S of I is dense in I if for all x ∈ I and all  > 0. then f˜1 (x) = f˜2 (x) for all x ∈ I. Let f˜ : I → R be any continuous extension of f . if f˜1 . there is δ > 0 such that for all s ∈ S with |s − s0 | < δ. i. we need the definition of a continuous function f : S → R for an arbitrary subset S of R. . There is nothing novel here: f is continuous at s0 ∈ S if for all  > 0. But {f (xn )} = 0. Proof. 11.44. and it is helpful to nail that down first. Here’s what we really mean: Question 10. f (xn ) → f (x). for all x ∈ I. f˜2 : I → R are two continuous extensions of f . . for any x ∈ I and n ∈ Z+ . as is the subset I ∩ (R \ Q) of irrational numbers in I. Step 1: First suppose f ≡ 0 on S. and put g̃ = f˜1 − f˜2 : I → R. suppose f˜1 . we assume that all of our intervals contain more than one point.  To go further. First. Then since xn → x. f˜ is continuous at s0 .46. and let f : S → R be a function. EXTENDING CONTINUOUS FUNCTIONS 225 11. Step 2: In general. 0. Let S be a dense subset of the interval I. Under which circumstances does f ex- tend to a function f˜ : I → R? Stated this way the answer is easy: always. for every interval I the subset I ∩ Q of rational numbers in I is dense in I. If f admits a continuous extension f˜ to I. there is xn ∈ S with |xn − x| < n1 . |f˜(x) − f˜(s0 )| < . and suppose we are given a function f : S → R. Since S ⊂ I. 0 = g̃(x) = f˜1 (x) − f˜2 (x). For instance.. such that f˜(x) = f (x) for all x ∈ S? Here are some preliminary observations. There is at most one continuus extension of f to I: that is. g̃(x) = f˜1 (x) − f˜2 (x) = 0. Under what circumstances is there a continuous function f˜ : I → R extending f .45. so f˜1 (x) = f˜2 (x) for all x ∈ I. |f (s) − f (s0 )| < . the uniqueness problem is easier to solve than the existence problem. x + ). there exists at least one element of s with s ∈ (x − . Then g̃ is continuous and for all x ∈ S. . Extending Continuous Functions Let I ⊂ R be an interval. Proposition 10. we may set f˜(x) = 0 for all x ∈ I \ S. So we were not careful enough.  8To evade trivialities. For example. then f is continuous on S.. Let s0 ∈ S.

and 2 |f (x1 )−f (x2 )| ≤ |f (x1 )−f (s1 )|+|f (s1 )−f (s2 )|+|f (s2 )−f (x2 )| ≤ +|f (s1 )−f (s2 )|. 3 But since f : S → R is uniformly continuous. When I is closed and bounded. Fix  > 0. Then for m. and we must show that L = M . |an − x| < 2δ . By (i). there is N ∈ Z+ such that for n ≥ N . And we will. Fix  > 0. Since f is uniformly continuous on S. Theorem 10. For such a δ we get 2  |f (x1 ) − f (x2 )| < + = . so |an − bn | < δ and hence |f (an ) − f (bn )| < .47 gives necessary and sufficient conditions for a function f : S → R to admit a uniformly continuous extension to I. |bn − x| < 2δ . f (x) = 0 for x < 2. we may take δ to be sufficiently small so that |s1 − s2 | < δ =⇒ |f (s1 ) − f (s2 )| < 3 . So let’s do it: (i) By Theorem 10. By (ii). (iii) The (now well-defined) extension function f˜ is continuous on I. |s2 − x2 | < . Since an → x. and let f : S → R be a function. It follows that |L − M | ≤ . |f (s2 ) − f (x2 )| < . there are x1 . INFINITE SEQUENCES √ √ √ Example: I = R. By uniform continuity there exists δ > 0 such that |s1 − s2 | < δ =⇒ |f (s1 ) − f (s2 )| < . (ii) For any other sequence {bn } in S with bn → x. 3 3 contradiction. Here is a key observation: our counterexample function f is continuous on S but not uniformly continuous there: there is no δ > 0 such that for all x1 . it suffices to show that {f (an )} is Cauchy. (Extension Theorem) Let S be a dense subset of an interval I. so |f (am ) − f (an )| < . say. Moreover for all sufficiently large n we have |an − x| < 2δ . This suggests that we define f˜(x) = limn→∞ f (an ). 3 3 Then |s1 − s2 | ≤ |s1 − x1 | + |x1 − x2 | + |x2 − s2 | < δ. for each x ∈ I there is a sequence {an } in S with an → x.  Theorem 10. f (x) = 1 for x > 2. since S is dense in I. (iii) Let x ∈ I. S = R \ { 2}. being uniformly contin- uous is much stronger than being continuous. there exists δ > 0 such that |s1 − s2 | < δ =⇒ |f (s1 ) − f (s2 )| < . If f is uniformly continuous on S. there are s1 .47. s2 ∈ S such that δ  |s1 − x1 |. Since  was arbitrary. but there are several things we need to check: (i) The sequence f (an ) is convergent. Proof. L = M . limn→∞ f (an ) = limn→∞ f (bn ).226 10. for an interval which is not closed and bounded. As above. n ≥ N . However. |am − an | < δ. this solves our extension problem because uniform continuity is equivalent to continuity. we know that f (an ) → L and f (bn ) → M . |x1 − x2 | < δ =⇒ |f (x1 ) − f (x2 )| < 1. the only polynomial . (ii) Suppose that an → x and bn → x. x2 ∈ S. For instance. and suppose that f˜ is not uniformly continuous on I: then there exists some  > 0 such that for all δ > 0. Then f is continuous on S but admits no continuous extension to I. x2 ∈ I with |x1 − x2 | < 3δ but |f (x1 ) − f (x2 )| ≥ . |f (s1 ) − f (x1 )|. then it extends uniquely to a uniformly continuous function f˜ : I → R.32.

M ]. Moreover. we don’t have to extend f to I “all at once”. EXTENDING CONTINUOUS FUNCTIONS 227 functions which are uniformly continuous on all of R are the linear polynomials. Proof. since continuity at a point depends only on the behavior of the function in small intervals around the point. (i) =⇒ (ii): Applying the Extension Theorem to f on S ∩ [−M.  . Corollary 10. Let S be a dense subset of R. Since the extensions are unique. way to “soup up” the Extension Theo- rem: if I is not closed and bounded. bounded subintervals of I. (ii) There is a unique extension of f to a continuous function f˜ : R → R. but sneaky. the restriction of f to S ∩ [−M. we may choose any M with M ≥ |x| and define f˜(x) = f˜M (x): this does not depend on which M we choose. However there is an easy. for any x ∈ R. it is immediate that any function constructed from an expanding family of continuous functions in this way is continuous on all of R.48. M ] → R. The following are equivalent: (i) For all M > 0. we can do it via an increasing sequence of closed. we get a unique continuous extension f˜M : [−M. M ] is uniformly continuous. and let f : S → R. 11.

.

Nevertheless there is something interesting here: we have divided the total time of the trip into infinitely many parts. but also a suspicion of. ca. Can the arrow possibly hit the target? Call this event E. getting infinitely many events E1 . That infinitely many things can happen before some predetermined thing Zeno regarded as absurd. E2 . CHAPTER 11 Infinite Series 1. Indeed. infinite processes for well over two thousand years. We may continue in this way. . 490 BC . E2 takes place after 4 seconds.1. Humankind has had a fascination with. . in his view. to divide a whole into infinitely many parts. 2One has to wonder whether he got out much. and so forth: En takes place after 2n seconds. the first kind of infinite process that received detailed infomation was the idea of adding together infinitely many quan- titties. . + n + . The idea that any sort of infinite process can lead to a finite answer has been deeply unsettling to philosophers going back at least to Zeno. Before E can take place. we know exactly when these events Ei take place: E1 takes place after 1 1 1 2 seconds. We will get the flavor of his ideas by considering just one paradox. Historically. . .ca. to put a slightly different emphasis on the same idea. Introduction 1. illusory). Suppose that an arrow is fired at a target one stadium away. he ended up at the lively con- clusion that many everyday phenomena are in fact absurd (so. all of which must happen before the event E.2 Nowadays we have the mathematical tools to retain Zeno’s basic insight (that a single interval of finite length can be divided into infinitely many subintervals) without regarding it as distressing or paradoxical. . or. 430 BC. Similarly he deduced that all motion is impossible. 229 . . 2 4 2 1Zeno of Elea. and the conclusion seems to be that 1 1 1 (79) + + .1 who believed that a convergent infinite process was absurd. Since he had a sufficiently penetrating eye to see convergent infinite processes all around him. But before it does that it must arrive halfway to the halfway point: call this event E2 . Zeno’s ar- row paradox. Zeno Comes Alive: a historico-philosophical introduction. assuming the arrow takes one second to hit its target and (rather unrealistically) travels at uniform velocity. = 1. and he concluded that the ar- row never hits its target. the arrow must arrive at the halfway point to its target: call this event E1 .

230 11. we define Sn = a1 + . n=1 an = ∞ means: for every M ∈ R there exists N ∈ Z+ such that for all n ≥ N . . in which an = 2n for all n. . is a sequence of real numbers and S is a real number. + 1 = n. when necessary we call {an } the sequence P∞ of terms. . . Precisely. we form from our sequence {an } an auxiliary sequence {Sn } whose terms represent adding up the first n numbers.. Surely the ag- gregate of these events takes forever. .. . + 1 + . + 2n + . . INFINITE SERIES So now we have not a problem not in the philosophical sense but in the mathematical one: what meaning can be given to the left hand side of (79)? Certainly we ought to proceed with some caution in our desire to add infinitely many things together and get a finite number: the expression 1 + 1 + . Finally. . Precisely. + an + . for n ∈ Z+ . We see then that we dearly need a mathematical definition of an infinite series of numbers and also of its sum. .. . . + 1 + . so 1 1 (80) Sn = + .. + an . if a1 .. a2 . So here it is. Instead. The associated sequence {Sn } is said to be the sequence of partial sums of the sequence {an }. If the sequence of partial sums {Sn } converges to some number S we say thePinfinite ∞ series is convergent. multiplying (80) by 12 and subtracting it from (80) all but the first and last terms cancel. . P∞ Thus the trick of defining the infinite sum n=1 an is to do everything in terms of the associated sequence P∞ of partial sums Sn = a1 + . Let us revisit the examples above using the formal definition of convergence. Then Sn = a1 + . . . we need to give a precise meaning to the equation a1 + . . + an + .. .. . 2 2 There is a standard trick for evaluating such finite sums. We do not try to add everything together all at once. = S. In particular P∞ by n=1 an = ∞ we mean the sequence of partial sums diverges to ∞. . in which an = 1 for all n. . + an . . if the sequence {Sn } diverges then the infinite series n=1 an is divergent. a1 + . . 1 1 1 1 Example 2: Consider 2 + 4 + . + n. . Namely. each lasting one second. . So to spell out the first definition completely. + an ≥ M . . we say that the infinite series a1 + . and by n=1 an = −∞ we mean the sequence P∞of partial sums diverges to −∞. = n=1 an converges to S – or has sum S – if limn→∞ Sn = S in the familiar sense of limits of seqeunces. . .. . and we get . + an = 1 + . n→∞ n=1 Thus this infinite series indeed diverges to infinity. represents an infinite sequence of events... Example 1: Consider the infinite series 1 + 1 + . and we conclude ∞ X 1 = lim n = ∞.

Now suppose (81) holds for some n ∈ Z+ . then by n=N an we mean the sequence of partial sums aN . Telescoping Series. If this is the case. S1 = 2 1 1 2 S2 = S1 + a2 = + = . aN + aN +1 + aN +2 . We have 1 . even if we don’t really understand why the identity should hold! Indeed. n→∞ n + 1 n=1 How to prove it? First Proof: As ever. 4 20 5 1 n It certainly seems as though we have Sn = 1 − n+1 = n+1 for all n ∈ Z+ . both sides equal 2. INTRODUCTION 231 1 1 1 1 Sn = Sn − Sn = − n+1 . . . 2n It follows that ∞ X 1 1 n = lim (1 − n ) = 1. k +k n+1 k=1 1 When n = 1. n=1 2 n→∞ 2 So Zeno was right! Remark: It is not necessary for the sequence of terms {an } of an infinite series to start with a1 .2. n X 1 n (81) Sn = 2 = . In our applications it will be almost as common Pto consider series ∞ starting with a0 . 3 12 4 3 1 4 S4 = S3 + a4 = + = . then we have ∞ X n an = lim = 1.. 2 2 2 2 and thus 1 Sn = 1 − . (n + 1)(n + 2) (n + 1)(n + 2) n+2 This approach will work whenever we have some reason to look for and successfully guess a simple closed form identity for Sn . But in fact. Then 1 IH n 1 n 1 Sn+1 = Sn + = + 2 = + (n + 1)2 + (n + 1) n + 1 n + 3n + 2 n + 1 (n + 1)(n + 2) (n + 2)n + 1 (n + 1)2 n+1 = = = . as we will see in the coming . if N is any integer. we don’t even have to fully wake up to give an induction proof: we wish to show that for all positive integers n. More generally. . aN + aN +1 . 1. P∞ 1 Example: Consider the series n=1 n2 +n . induction is a powerful tool to prove that an identity holds for all positive integers. 2 6 3 2 1 3 S3 = S2 + a3 = + = . 1.

we know there are constants A and B such that 1 A B = + . . ∀n ≥ 2. Sn − Sn−1 = an . In this case. an = bn − bn−1 . n(n + 1) n n+1 This makes the behavior of the partial sums much more clear! Indeed we have 1 S1 = 1 − . then there is a unique sequence {an }∞ n=1 such that Sn is the sequence of partial sums Sn = a1 + . Thus once again n=1 n21+n = limn→∞ 1 − n+1 1 = 1. In fact induction is not needed: we have that 1 1 1 1 1 1 Sn = a1 + . . We need some insight into why the series n=1 n2 +n happens to work out so nicely. Does this remind us of anything? I certainly hope so: this is a partial fractions decomposition. . But looking at these formulas shows something curious: every infinite series is telescoping: we need only take bn = Sn for all n! Another. Finite sums which cancel in this way are often called telescoping sums. less confusing. + ( − )=1− . . + an . 2 2 3 3 1 1 1 1 S3 = S2 + a3 = (1 − ) + ( − ) = 1 − . if we stare at the induction proof long enough we will eventually notice how convenient it was that the denominator of (n+1)21+(n+1) factors into (n + 1)(n + 2). + an = b1 + (b2 − b1 ) + . + an = (1 − ) + ( − ) + . . In general an infinite ∞ sum n=1 an is telescoping when we can find an auxiliary sequence {bn }∞ n=1 such that a1 = b1 and for all n ≥ 2.232 11. Well. I be- lieve P after those old-timey collapsible nautical telescopes. . Indeed. . in practice it is exceedingly rare that we are able to express the partial sums Sn in a simple closed form. . we may look at the factorization n21+n = (n+1)(n) 1 . + (bn − bn−1 ) = bn . Equivalently. In practice all this seems to amount to the following: if you can find a simple closed form expression for the nth partial sum Sn (in which case you are very . 2 1 1 1 1 S2 = S1 + a2 = (1 − ) + ( − ) = 1 − . way to say this is that if we start with any infinite sequence {Sn }∞ n=1 . INFINITE SERIES sections. 3 3 4 4 1 and so on. This much simplifies the inductive proof that Sn = 1 − n+1 . the key equations here are simply S1 = a1 . n(n + 1) n n+1 I leave it to you to confirm – in whatever manner seems best to you – that we have 1 1 1 = − . for then for all n ≥ 1 we have Sn = a1 + a2 + . . Trying to do this for each given series would turn out P∞to be1 a discouraging waste of time. which tells us how to define the an ’s in terms of the Sn ’s. . 2 2 3 n n+1 n+1 the point being that every term P∞except the first and last is cancelled out by some other term.

e. Rather.. . there are certain operations that one can perform on an infinite series that will preserve the convergence / divergence of the series – i. what is its sum? It may seem that this is only “one and a half questions” because if the series diverges we cannot ask about its sum (other than to ask whether it diverges to ±∞ or “due to oscillation”). show that for all n ≥ 2.. nothing to do with infinite series. In any case.1b) – i. We define the nth harmonic number Hn = k=1 k1 = 1 1 1 1 + 2 + . ∞ interested in determining the convergence / divergence of a given series n=1 an rather than the sum. Pn Exercise: Let n ∈ Z+ . or in more sophisticated lan- guage we may ask for an asymptotic estimate for the sequence of partial sums PN n=1 an as a function of N as N → ∞. Use the method of telescoping sums to give an exact formula Exercise: for n=1 n(n+k) in terms of the harmonic number Hk of the previous exercise. However. Show that for all n ≥ 2. both because geometric series are ubiquitous and turn out to play a distinguished role in the theory in many ways. so far as I know. + n .1b) and the game becomes simply to decide whether a given series is convergent or not.. In these notes we try to give a little more attention to the second question in some of the optional sections. but also because other examples of series in which we can answer Question 11. 2. 2.e. when applied to a convergent series yields another convergent series and when applied to a divergent series yields another divergent series – but will in general 3This is a number theory exercise which has. for the P moment. We should keep this success story in mind. .e. we know that the series con- cr N verges iff |r| < 1 and in that case its sum is 1−r . BASIC OPERATIONS ON SERIES 233 lucky). then in order to prove it you do not need to do anything so fancy as mathe- matical induction (or fancier!). determine the sum of a convergent series – are much harder to come by. This is the discrete analogue of the fact that if you want to show that f dx = F – i. (Suggestion: more specifically. later on we will revisit this missing “half a question”: if a series diveges we may ask how rapidly it diverges. Namely. .1. For an infinite series n=1 an : a) Is the series convergent or divergent? b) If the series is convergent. it will suffice to just compute that S1 = a1 and for all n ≥ 2. Hn ∈ Q \ Z. Note that we have just seen an instanceP in which we asked and answered both ∞ of these questions: for a geometric series n=N crn .. when written as a fraction ab in lowest terms. there is a certain philosophy at work when one is. you already have a function F which you believe is an antiderivative of f – then you need not use any integration techniques whatsoever but may simply check that F 0 = f . then the denominator b is divisible by 2. in a standard course on infinite series one all but forgets about Question 11. Basic Operations on Series P∞ Given an infinite series n=1 an there are two basic questions to ask: P∞ Question 11. But I am a number theorist.)3 + P∞ Let1 k ∈ Z . Sn − RSn−1 = an . Frankly..

then so also is n=1 αan . INFINITE SERIES change the sum. . n b n . a) If α = 0. on occasion).234 11. + an . N ! X ak + Tn = a1 + . then the series n=1 an + bn is P also convergent.3. + aN +n . In other words. If any two of the three series n an . then so does the third. .   |Cn − (A + B)| = |An + Bn − A − B| ≤ |An − A| + |Bn − B| ≤ + = . consider the series n=N +1 an = aN +1 + aN +2 + . hence limn→∞ Tn = n=N +1 an exists. . As the reader has probably already seen for herself. Proposition 11. Let n=1 an . Then the first series converges iff the second series converges. we often employ a simplified notation n an when discussing series only up to convergence. with sum αA. and let α be any real P∞number. for any integer N > 1. removal or altering of any finite number of terms in an infinite series does not affect the convergence or divergence of the series (though of course it may change the sum of a convergent series). The addition. Then An → A and Bn → B. if we are so inclined (and we will be.2. Then. without affecting the convergence. . We record this as follows. Proof. reading someone else’s formal proof of this result can be more tedious than enlightening. n cn converge. P∞ P∞ a) If n=1 an = A and n=1 bn = B are both convergent. or for that matter change finitely many terms of the series. we could add finitely many terms to the series. . n cn be Pthree infinite P series P with c n = an + bn for all n. . then P∞so does limn→∞ k=1 ak + Tn = n k=1 ak + limn→∞ Tn . Show n an converges iff n αan converges. Then for all n ∈ Z+ . ∞ P∞ b) If n=1 an = A is convergent. n=1 bn be two infinite series. n bn . . + an and Tn = aN +1 + aN +2 + . + bn and Cn = a1 + b1 + . show that nPαan = 0. Here is one (among many) ways to show this formally: write Sn = a1 + . P∞ P∞ Proposition 11. suppose we start P∞with a se- ries n=1 an . |An − A| < 2 and |Bn − B| < 2 . . so we leave it to the reader to construct a proof that she finds satisfactory. there is N ∈ Z+ such that for all n ≥ N . . . Conversely if n=1 an exists. so for any  > 0. we may add or remove any finite number of terms from an infinite series without P∞ affecting its convergence. P∞ Because the convergence or divergence of a series n=1 an is not affected by chang- ing P the lower limit 1 to any other integer. so does limn→∞ SN +n = limn→∞ Sn = P∞ P∞ PN Pn=1 an . with sum A + B. + aN + aN +1 + . k=1 P∞ Thus if limn→∞ Tn = n=N +1 an exists. . + an + bn . It follows that for all n ≥ N . . a) Let An = a1 + . Similarly. P P P Exercise: Prove the Three Series Principle: let n an . 2 2 b) Left to the reader. . Bn = b1 + . P b) Suppose α 6= 0. + aN +n = SN +n . First.  P Exercise: Let n an bePan infinite series and α ∈ R. . . . ..

n=N PN +k In other words the supremum of the absolute values of the finite tails | n=N an | is at most . It is important to compare these two results: the Nth Term Test gives a very weak necessary condition for the convergence of the series. Let S = n=1 an . PN +k Let us call P∞ a sum of the form n=N = aN + aN +1 + . Proposition P∞ 11. then an → 0. we recover the Nth Term Test for convergence (Theorem 11.4 then n an diverges. Note that taking k = 0 in the Cauchy criterion. 2. or a diverges. Later we will see many examples. (Cauchy criterion for convergence of series) An infinite series n=1 an converges if and only if: for every  > 0. 4This means: a → L 6= 0. As a matter of notation.2. | n=N an | < . . n ≥ N we have |xn − xm | < . The Nth Term Test. 2. when put under duress (e. In order to turn this into a necessary and sufficient condition we must require not only that an → 0 but also an + an+1 → 0 and indeed that an + . Show P∞ P (n) that if deg P ≥ deg Q. n→∞ n→∞ n→∞ n→∞  P We will often apply theP contrapositive form of Theorem 11.1.+an } ∞ of an infinite series n=1 an . n n . Applying the Cauchy Pcondition to the sequence of partial sums {Sn = a1 +. Recall that we proved that a sequence {xn } of real numbers is convergent if and only if it is Cauchy: that is. while taking an exam) many students can will themselves into believing that the converse might be true. . so we may choose N ∈ Z+ such that for all n ≥ N . Don’t do it! P (x) Exercise: Let Q(x) be a rational function. BASIC OPERATIONS ON SERIES 235 2.5. The polynomial Q(x) has only finitely many roots. Q(x) 6= 0. we get the following result. This gives a nice way of thinking about the Cauchy criterion. an = Sn − Sn−1 .4 is not valid! It may well be the case that an → 0 but n an diverges.4. If n an converges.4). . The Cauchy Criterion. for all  > 0. .4: if for a series n an we have an 9 0. there exists N ∈ Z+ such that for all m. (Nth Term Test) Let n an be an infinite series. Then for all n ≥ 2. P P Theorem 11. + aN +k a finite tail of the series n=1 an . . then n=N Q(n) is divergent. let us abbreviate this by ∞ X | an | ≤ . Therefore lim an = lim Sn − Sn−1 = lim Sn − lim Sn−1 = S − S = 0.g. Warning: The P converse of Theorem 11. if for a fixed N ∈ Z+ and all k ∈ N PN +k we have | n=N an | ≤ . P∞ Proof. + an+k → 0 for a k which is allowed to be (in a certain precise sense) arbitrarily large. . there exists N0 ∈ Z+ such PN +k that for all N ≥ N0 and all k ∈ N. Still.

Again. Its sequence of partial sums is       1 1 1 1 1 Tn = 1 · + · + . we can make what remains arbitrarily small. Therefore our ∞ given series has partial sums bounded above by 1. Thus the sequence of partial sums is unbounded.6. By our work with geometric series we know that Sn ≤ 1 for all n and thus also TnP ≤ 1 for all n.. In general.. An infinite series n=1 an converges P∞ if and only if: for all  > 0. + n.. so n=1 n21n ≤ 1. n21n ≤ 21n . . a closed form expression for Tn = √ 1 + . an infinite series converges iff by removing sufficiently many of the initial terms. i. + 1 = n. . Sn+1 − Sn = an+1 ≥ 0. But we don’t need it: P certainly Tn ≥ ∞ √ 1 + . Let n an be an infinite series with an ≥ 0 for all n. a1 + . INFINITE SERIES P∞ Proposition 11. This simplifies matters considerably and places an array of powerful tests at our disposal.1. 3. + n is not easy to come by. we can compare it with the geometric series n=1 21n : 1 1 1 Sn = + + .2. . if and only if there exists M ∈ R such that for all n. so the sequence of partial sums {Sn } is increasing. . Because of this. so it is not possible for us to compute limn→∞ Tn directly. Moroever if the series converges. But if we just want P to decide ∞ whether the series converges. In particular. Summing these inequalities from k = 1 to n gives Tn ≤ Sn for all n. when dealing P with series with non-negative terms P we may express convergence by writing n an < ∞ and divergence by writing n an = ∞. + an ≤ M .. Applying the Monotone Sequence Lemma we get: P Proposition 11. Series With Non-Negative Terms I: Comparison 3. + · . we have that for all n ∈ Z+ . P∞ Example: Consider the series n=1 n21n . 3. 2 2 4 n 2n Unfortunately we do not (yet!) know a closed form expression for Tn . In other (less precise) words. so n=1 n = ∞. | n=N an | < . and so forth. 2 4 2 Since n1 ≤ 1 for all n ∈ Z+ . If the partial sums are unbounded. .. Then the series converges if and only if the partial sums are bounded above. the series diverges to ∞.7. The Comparison Test. P∞ √ Example: √ conside the series n=1 n. its sum is precisely the least upper bound of the sequence of partial sums. .236 11. Starting P∞ in this section we get down to business by restricting our attention to series n=1 an with an ≥ 0 for all n ∈ Z+ . the series converges. there exists N0 ∈ Z+ such that for all N ≥ N0 . .e. Why? Assume an ≥ 0 for all n ∈ Z+ and consider the sequence of partial sums: S1 = a1 ≤ a1 + a2 = S2 ≤ a1 + a2 + a3 = S3 . The sum is the supremum.

n=1 bn be two series with non-negative terms. and if n an = ∞ then n bn = ∞.. but what to compare it to? Well. . as a moment’s thought will show) 67 24 = 2. Happily. we still retain a quantitative estimate on the sum: it is at most (in fact strictly less than. one easily 1 establishes by induction that n! < 21n if and only if n ≥ 4. The Delayed Comparison Test. Example: Consider the series ∞ X 1 1 1 1 1 =1+1+ + + + . n=1 n=1 P P P P In particular: if n bn < ∞ then n an < ∞. + an .  3. (Comparison Test) Let n=1 an . There is really nothing new to say here.. . · n We would like to show that the series converges by comparison. but at least for all sufficiently large n.7182818284590452353602874714 . Putting an = n!1 and 1 bn = 2n we cannot apply the Comparison Test because we have an ≥ bn for all n ≥ 4 rather than for all n ≥ 0. we don’t know much.. 3. resulting in minor variants of the Comparison Test with a much wider range of applicability.. which also happens to be a bit less than 67 24 . there is always the geometric series! Observe that the sequence n! grows faster than any geometric rn in the sense that limn→∞ rn!n = ∞. .3. For instance. it follows that for any 0 < r < 1 we will have n! < r1n – not necessarily + for all n ∈ Z . . and suppose that an ≤ bn for all n ∈ Z+ ... (Perhaps this reminds you of e = 2.79166666 . n n n=1 n=1 The assertions about convergence and divergence follow immediately. Since ak ≤ bk for all k we have Sn ≤ Tn for all n and thus ∞ X ∞ X an = sup Sn ≤ sup Tn = bn . it can be weakened in several ways. given a series n an we need to find some other series to compare it to. .. . n=0 n! n=0 n! n=4 n! n=4 2 3 8 24 So the series converges. . Then ∞ X ∞ X an ≤ bn . (At the moment. It has two weaknesses: first. SERIES WITH NON-NEGATIVE TERMS I: COMPARISON 237 P∞ P∞ Theorem 11.8. It should! More on this later. Proof. . + + .. but that will soon change. Taking 1 reciprocals. + bn . but just to be sure: write Sn = a1 + . But this objection is more worthy of a bureaucrat than a mathematician: certainly the idea of the Comparison Test is applicable here: ∞ 3 ∞ ∞ X 1 X 1 X 1 X 1 8 1 67 = + ≤ 8/3 + n = + = < ∞. the requirement that an ≤ bn for all n ∈ Z+ is rather incon- veniently strong.) Second... Thus the test will be more or less effective according to the size of our repertoire of known convergent/divergent series. The Comparison Test P is beautifully simple when it works. n=0 n! 2 2 · 3 2 · 3 · 4 2 · 3 · . More than that. Tn = b1 + ..) We record the technique of the preceding example as a theorem. .

More P precisely. (Delayed Comparison Test) Let n=1 . an n2 1 lim = lim 2 = lim 1 = 1.10 n Pto thisPwe get that if n an converges. then n bn converges.11c). n→∞ bn n→∞ n + n n→∞ 1 + n . 0 ≤ an ≤ M bn . P P a) If 0 < L < ∞.10 to the second inequality. Then if n bn converges. and applying Theorem 11. ∞]. Suppose that for all sufficiently large n both an and bn are positive and limn→∞ abnn = L ∈ [0. The first inequality is equivalent toP0 < bn ≤ L2 an for all P P≥ N .4.10. So the two series a n n . In fact it is enough to show this P since for p > 2 we have for all n ∈ Z+ that for p = 2. an ≤ bn . For p = 2. Exercise: Prove Theorem 11. then there exists N ∈ Z+ such that 0 < L2 bnP ≤ an ≤ (2L)bn . P∞ Example: We will show that for all p ≥ 2. n21+n is close to n12 . INFINITE SERIES P∞ P∞ Theorem 11. In all cases we deduce the result from the Limit Comparison Test. 3. P Corollary P 11. The Limit Comparison Test. n bn converge or diverge together.10 to this we get that if n converges. For large n.. P n < n and thus np < n2 so n np ≤ n n12 . Thus the Delayed Comparison Test assures us that we do not need an ≤ bn for all n but only for all sufficiently large n. c) If L = 0 and n bn converges.238 11. then n an converges. we get that if n bn converges. Applying Theorem 11. a) If 0 < L < ∞. n=1 P Pn=1 n=1 P P In particular: if n bn < ∞ then n an < ∞. then n an converges. thenP n bn converges. Suppose that there exists N ∈ Z+ such that for all n > N . an ≥ bn ≥ 0. putting an = n21+n and bn = n12 we have an ∼ bn . A different issue occurs when we wish to apply the Comparison Test and the inequalities do not go our way. and if n an = ∞ then n bn = ∞. then b n n converges. the p-series n=1 n1p converges. Then ∞ ∞ N ! X X X an ≤ an − bn + bn . c) This case is left to the reader as an exercise.10. Proof. Sup- + there exists N ∈ ZP pose thatP and M ∈ R≥0 such that for all n ≥ N . then there exists N ∈ Z+ such Pthat for all n ≥ N P. i. n=1 bn be two series with non-negative terms. P b) If L = ∞ andP n an converges. Exercise: Prove Theorem 11. (Calculus Student’s Limit Comparison Test) Let n an and n bn be two series. n=1 n2 + n n=1 n n + 1 and in particular that n n21+n converges.. we happen to know that 2 p 1 1 1 ∞ ∞   X 1 X 1 1 = − = 1.9. P P Theorem 11.  Exercise: Prove Corollary 11.11. Applying Theorem 11. either both convege Por both diverge).e.9.e. n bn two series. the series n an and n bn converge or diverge together (i. b) If L = ∞. (Limit Comparison Test) Let n an . n an converges.

{bn }. we find that n n21+n and n n12 converge or diverge P P together. Then ∞ ∞ X X 1 an = bn = 1 = 2. whereas ∞ ∞ X X 1 1 4 an bn = n = 1 = . P (x) Exercise: Let be a rational function such that the degree of the denominator Q(x) P∞ P (n) minus the degree of the numerator is at least 2. multiply them? In order to forestall possible confusion.g. . + an b0 + . . 3. Show n=N Q(n) converges. .) = a0 b0 + a0 b1 + a1 b0 + a0 b2 + a1 b1 + a2 b0 + . if all the terms are positive. Since the former series converges. (a0 + a1 + a2 )(b0 + b1 + b2 ) = a0 b0 + a1 b1 + a2 b2 + a0 b1 + a1 b0 + a0 b2 + a1 b1 + a2 b0 . . P In fact this is not a very useful candidate for the product. . SERIES WITH NON-NEGATIVE TERMS I: COMPARISON 239 Applying Theorem 11. + bn + . + a0 bn + a1 bn−1 + . n=0 n=0 4 1 − 4 3 4 Unfortunately 3 6= 4. . in some sense. But for instance. P∞ P∞ Let n=0 an and n=0 bn be infinite series. take {an } = {bn } = 21n .)(b0 + b1 + .5 Let us try again at formally multiplying out a product of infinite series: (a0 + a1 + . n=0 n=0 n=0 In other words. The product is different and more complicated – and indeed. we want our “product series” to converge to AB. . If n an = A and n bn = B. . . Exercise: P∞ Determine whether each of the following series converges or diverges: a) n=1 sin n12 . Can we. 3. we deduce that n n12 converges.11. we form a new sequence of terms {an bn } and then thePassociated series. given two sequences of terms {an }. P∞ b) n=1 cos n12 . . . . What went wrong? Plenty! We have ignored the laws of algebra for finite sums: e. . . strictly larger – than just a0 b0 + a1 b1 + a2 b2 . n=0 n=0 1− 2 so AB = 4. + an + . let us point out that many students are tempted to consider the following “product” operation on series: ∞ ∞ ∞ ! ! ?? X X X an · bn = an bn . . Cauchy products I: non-negative terms. 5To the readers who did not forget about the cross-terms: my apologies. We have forgotten about the cross-terms which show up when we multiply one expression involving several terms by another expression involving several terms.5. even P though the Comparison Test does not directly apply. . But it is a common enough misconception that it had to be addressed.

the box product... .. +aN b0 + aN b1 + . ∞ Let P∞ n=0 an = A and n=0 b n = B. the terms of the N th partial sum of the Cauchy product can naturally be arranged in a triangle: CN = a0 b0 +a0 b1 + a1 b0 + a0 b2 + a1 b1 + a2 b0 +a0 b3 + a1 b2 + a2 b1 + a3 b0 . + an bn k=0 P∞ P∞ and then we define the Cauchy product of n=0 an and n=0 bn to be the series ∞ ∞ n ! X X X cn = ak bn−k . the question is to what extent one can neglect the terms in the upper half of the square – i. 0≤i. LetP{an }∞ ∞ n=0 . n=0 n=0 k=0 Theorem P∞ 11. {bn }n=0 be two series Pnwith non-negative terms. + aN b0 . . N →∞ N →∞ So the box product clearly converges to the product of the sums of the two series. This suggests that we compare the Cauchy product to the box product. viz: N = a0 b0 + a0 b1 + . + a1 bN . . .j≤N Thus by the usual product rule for sequences.12. as follows: for N ∈ N. + bN ) = AN BN . Thus while N is a sum of (N + 1)2 terms. . . . we have lim N = lim AN BN = AB. Proof. We define another sequence. X N = ai bj = (a0 + . CN is a sum of 1 + 2 + . . + a0 bN +a1 b0 + a1 b1 + . + N + 1 = (N +1)(N +2) 2 terms: those lying on our below the diagonal of the square.e. In order to shoehorn the right hand side into a single infinite series. Thus in considerations involving the Cauchy product. those with ai bj with i + j > N – as N gets large. . Putting c n = k=0 ak bn−k we have that n=0 c n = AB. The entries of the box product can be arranged to form a square. P the Cauchy product series converges iff the two “factor series” n an and n bn both converge. . . . we need to either (i) choose some particular ordering of the terms ak b` or (ii) collect some terms together into an nth term. . . + aN bN . For the moment we choose the latter: we define for any n ∈ N Xn cn = ak bn−k = a0 bn + a1 bn−1 + . . P In particular. . INFINITE SERIES The notation is getting complicated. + aN )(b0 + .240 11. . . +a0 bN + a1 bN −1 + a2 bN −2 + . On the other hand. .

the key observation is that if we make the sides of the triangle twice as long.878. i. SERIES WITH NON-NEGATIVE TERMS II: CONDENSATION AND INTEGRATION 241 Here.30 . Since S1050 ≈ 7. Perhaps the partial sums never exceed 6? (If so. . N →∞ N →∞ N →∞ N →∞ Having shown both that C ≤ AB and C ≥ AB. S103 ≈ 7. For the converse.4854. n=0 n=0 n=0  4. Does it converge? None of the tests we have developed so far are up to the job: especially.878. so the above guess is incorrect. In this section we place ourselves under more restrictive hypotheses: that for all n ∈ N. every term of N is of the form ai bj with 0 ≤ i. Series With Non-Negative Terms II: Condensation and Integration P We have recently been studying criteria for convergence of an infinite series n an which are valid under the assumption that an ≥ 0 for all n. since all the ai ’s and bj ’s are non-negative and N contains all the terms of CN and others as well. Next we compute S150 ≈ 5. This points to Sn ≈ log n. that the sequence of terms is non-negative and decreasing. Is this close to a familiar real number? Not really.187. If this is so. we conclude ∞ ∞ ! ∞ ! X X X C= an = AB = an bn . P∞ 1 Consider n=1 n . Thus C = limN →∞ CN ≤ AB. We ask the reader to hold this thought until we discuss the integral test a bit late on. j ≤ N .090.e. = log 10. the harmonic series. S100 is approximately 5. Let us take a computational approach by looking at various partial sums. an → 0 so the Nth Term Test is inconclusive.) Let’s try a significantly larger partial sums: S1000 ≈ 7. Now there is a pattern for the perceptive eye to see: the difference S10k+1 − S10k appears to be approaching 2. we are getting the idea that whatever the series is doing. I hope you notice that the relation between n1 and log n is one of a function and its antiderivative. 4. then since log n → ∞ the series would diverge. thus i + j ≤ 2N so ai bj appears as a term in C2N . . S104 ≈ 9. .591 and S200 ≈ 5. . it will contain the box: that is. so let’s instead start stepping up the partial sums multiplicatively: S100 ≈ 5. since the terms are assumed decreasing we have 0 = aN = aN +1 = .584.788.485. . it’s doing it rather slowly. we certainly have CN ≤ N = AN BN ≤ AB. an+1 ≥ an ≥ 0. aN = 0 for some N and. the series would converge. PN −1 and our infinite series reduces to the finite series n=1 an : this converges! 4. S105 ≈ 12.1. Remark: It’s no loss of generality to assume an > 0 for all n. It follows that C2N ≥ N and thus C = lim CN = lim C2N ≥ lim N = lim AN BN = AB.. If not. The Harmonic Series.

. The apparently ad hoc argument used to prove the divergence of the harmonic series can be adapted to give the following useful test. . we would only decrease the sum of the series. P∞ Theorem 11.L. For a rational function Q(x) .e. . Cauchy...2. due to A. = 2n a2n n=0 = (a1 + a2 ) + (a2 + a4 + a4 + a4 ) + (a4 + a8 + a8 + a8 + a8 + a8 + a8 + a8 ) + (a8 + . b) Thus the series n an converges iff the condensed series n 2n a2n converges. . . . The power of 21 which begins each block is larger than every term in the preceding block. . . We deduce the following result. we give the following brilliant and elementary argument due to Cauchy. But this latter sum is much easier to deal with: ∞       X 1 1 1 1 1 1 1 1 1 1 1 ≥ + + + + + + + .242 11. (Condensation Test) Let n=1 an be an infinite series such that an ≥ an+1 P∞≥ 0 for all P∞ n ∈ N. P Proof. . The Condensation Test. Left to the reader. so that every term in the sum is well defined. We have X∞ an = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + . 1 2 3 4 5 6 7 i. Consider the terms arranged as follows:       1 1 1 1 1 1 1 + + + + + + + . we group the terms in blocks of length 2k . so if we replaced every term in the current block the the first term in the next block. Then: P ∞ a) We have n=1 P an ≤ n=0 2n a2n ≤ 2 n=1 an . the series n=N Q(n) con- verges iff deg Q − deg P ≥ 2. n=1 6Take N larger than any of the roots of Q(x). 1 P Exercise: Determine the convergence of n 1 . = ∞. = + + + .14. .) X∞ =2 an . INFINITE SERIES For now.13.) ≤ (a1 + a1 ) + (a2 + a2 + a3 + a3 ) + (a4 + a4 + a5 + a5 + a6 + a6 + a7 + a7 ) + (a8 + . . n=1 ∞ X ≤ a1 + a2 + a2 + a4 + a4 + a4 + a4 + 8a8 + . Show that P∞ P (n) 6 n=N Q(n) diverges. . . P (x) P∞ P (n) Proposition 11... n=1 n 2 4 4 8 8 8 8 2 2 2 P∞ Therefore the harmonic series n=1 diverges.  4. Proof. n1+ n P (x) Exercise: Let Q(x) be a rational function with deg Q − deg P = 1.

643957981030164240100762569 . First. if p = 0 then our series is simply n n10 = n 1 = ∞. taking p = 2 we get ∞ X 1 1 2 ≤ = 2. For p ∈ R. so that the hypotheses P n nof−p Cauchy’s P nConden- −p −np sation P 1−p nTest apply. . namely ∞ ∞ ∞ X 1 X n n −p X 1 p ≤ 2 (2 ) = (21−p )n = . Step 1: The sequence an = n1p has positive terms. P Example (p-series continued): Let P∞p > 1. By applying part b) of Cauchy’s Con- densation Test we showed that n=1 n1p < ∞.15. and of couse without the hypothesis that the terms are decreasing. P P So the p-series “obviously diverges” when p ≤ 0. under the given hypotheses. This is a very curious phenomenon. isn’t the condensed series n 2n a2n more complicated than the original series n an ? In fact the opposite is often the case: passing from the given series to the condensed series preserves the convergence or divergence but tends to exchange subtly convergent/divergent series for more obviously (or better: more rapidly) converging/diverging series.  The Cauchy Condensation Test is. On the other hand. n=1 n2 7Or sometimes: hyperharmonic series. Step 2: Henceforth we assume P p > 0. . and part b) follows immediately. n=1 n n=0 n=0 1 − 21−p For instance. We get that n n converges iff n 2 (2 ) = n2 2 = n (2 ) converges. So we had better treat the cases p ≤ 0 sepa- rately. nothing like this could hold. it may be less clear thatP the Condensation Test is of any practical use: afterPall. so long as the sequence remains decreasing. then limn→∞ n1p = limn→∞ n|p| = ∞. an a priori interesting result: it says that. What about part a)? It gives an explicit upper bound on the sum of the series. so it converges iff |r| < 1 iff 2p−1 > 1 iff p > 1. P∞ 1 Example: Fix a real number p and consider the p-series7 n=1 np . if p < 0. SERIES WITH NON-NEGATIVE TERMS II: CONDENSATION AND INTEGRATION 243 This establishes part a). Second. I think. 4. Theorem 11. The terms are decreasing iff the sequence np is increasing iff p > 0. Thus we have proved the following important result. in order to determine whether a series converges we need to know only a very sparse set of the terms of the series – whatever is happening in between a2n and a2n+1 is immaterial. Our task is to find all values of p for which the series converges. so the p-series diverges by the nth term test. the p-series n n1p converges iff p > 1. . n=1 n 1 − 21−2 Using a computer algebra package I get 1024 X 1 1≤ = 1. But the latter series is a geometric series with geometric ratio r = 21−p . .

4. n=2 1 n=1 P R∞ Thus the series n an converges iff the improper integral 1 f (x)dx converges. n=2N +1 n=0 b) Apply part a) to show that for any p > 1. X 1 1 p ≤ Np . The Integral Test. q. a) Show that n n(log1 n)q converges iff q > 1. Exercise: Let Pp. Then X∞ Z ∞ ∞ X an ≤ f (x)dx ≤ an . INFINITE SERIES P∞ So it seems like n=1 n12 ≈ 1. N n 2 (1 − 21−p ) n=2 +1 P∞ 1 1 Example: n=2 n log n . and showing that the difference between convergence and divergence can be arbitarily subtle. But log n grows more slowly than n for any  > 0.64. Exercise: Let N be a non-negative integer. We get that the convergence of the series is equivalent to the convergence of X 2n 1 X1 n n = = ∞.16. r such that n np (log n)q1(log log n)r converges.) The following P∞ exercise gives a technique for using the Condensation Test to es- timate n=1 n1p to arbitrary accuracy. c) Find all values of p. whereas the Condensation Test tells us that it is at most 2. 1 P b) Show that n np (log n)q converges iff p > 1 or (p = 1 and q > 1). (Integral Test) Let f : [1. q. r be positive real numbers.244 11. 1 P Exercise: Determine whether the series n log(n!) converges. an = n log n is positive and decreasing (since its reciprocal is positive and increasing) so the Condensation Test applies. . a) Show that under the hypotheses of the Condensation Test we have ∞ X ∞ X an ≤ 2n a2n+N . (Note that since the terms are positive. n nn  converges. simply adding up any finite number of terms gives a lower bound.X can be continued indefinitely. P The pattern of Exercise X. giving series which con- verge or diverge excruciatingly slowly. and for n ∈ Z+ put an = f (n). indeed slowly enough so that replacing n with log n converts a convergent series to a divergent one. Theorem 11. This is rather subtle: we know that for any  > 0. ∞) → R be a positive decreasing function.3. n 2 log 2 log 2 n n P 1 so the series diverges. since it is a p-series with p = 1 + .

Indeed the series converges iff the improper integral 1 xp is finite. it is perhaps not so surprising that many texts content themselves to give one or the other. The Condensation Test and the Integral Test have a similar range of applicabil- ity: in most “textbook” examples. . the p-series diverges iff p > 1. Z ∞ dx = log x|∞x=1 = ∞. From the picture one sees immediately that – since f is decreasing – the lower sum is PN +1 PN n=2 an and the upper sum is n=1 an . so will the other. then we have Z ∞ dx x1−p x=∞ = | . In elementary analysis courses one often develops sequences and series before the study of functions of a real variable. under a mild additional hypothesis the Integral Test can be used to give asymptotic expansions for divergent series. 1 x p 1 − p x=1 The upper limit is 0 if p − 1 < 0 ⇐⇒ p > 1 and is ∞ if p < 1. the Condensation Test is more appealing (to me). the result follows. 1698-1746 9Why this should be so is not clear to me: the observation is purely empirical. N }. SERIES WITH NON-NEGATIVE TERMS II: CONDENSATION AND INTEGRATION 245 Proof.9 Example: P 1 Let us use the Integral Test to determine the set of p > 0 such that R ∞ dx n n p converges. In calculus texts. once again. 2. Exercise: Verify that all of the above examples involving the Condensation Test can also be done using the Integral Test. ∞) into subintervals RN [n. . so that N X +1 Z N N X an ≤ f (x)dx ≤ an . Namely. n + 1] for all n ∈ N and for any N ∈ N we compare the integral 1 f (x)dx with the upper and lower Riemann sums associated to the partition {1. it usually holds that the Condensation Test can be successfully applied to determine convergence / di- vergence of a series if and only if the Integral Test can be successfully applied. which is log- ical because a formal treatment of the Riemann integral is necessarily somewhat involved and technical. Among series which arise naturally in undergraduate analysis. From an aesthetic standpoint. . Thus many of these texts give the Condensation Test. Our treatment of the 8Colin Maclaurin. one almost always finds the Integral Test. 4. where one test succeeeds. which is logical since often integration and then improper integation are covered earlier in the same course in which one studies infinite series. we divide the interval [1. Given the similar applicability of the Condensation and Integral Tests.  Remark: The Integral Test is due to Maclaurin8 [Ma42] and later in more modern form to Cauchy [Ca89]. Finally. If p 6= 1. 1 x So. n=2 1 n=1 Taking limits as N → ∞. This is a rare opportunity in analysis in which a picture supplies a perfectly rigorous proof. . . On the other hand.

g : [a. Let f : [1. n→∞ f (n) so by the Squeeze Principle we have Z n+1 (82) f (x)dx ∼ f (n).246 11. {bn } with bn 6= 0 for all sufficiently large n. P Proof. Then n bn = ∞ and n=1 an ∼ n=1 bn . where C. Now fix  > 0 and choose K ∈ Z+ such that for all n ≥ K we have an ≤ (1 + )bn . Sup- pose the series n f (n) diverges and that as x → ∞. Show that f ∼ g iff f and g have the same degree and the same leading coefficient. Similarly. f (x) ∼ f (x + 1). Exercise: Let f and g be nonzero polynomial functions. Let {an } and {bn } be two sequences of positive real numbers P P PN PN with an ∼ bn and n an = ∞. Then. Then XN Z N f (n) ∼ f (x)dx. INFINITE SERIES next two results owes a debt to [CdD]. we have Proof. ∞) → R and that g(x) 6= 0 for all sufficiently large x. we find that Pn=1 N is at most 1 + 2 for all sufficiently n=1 bn P P large N . or R n+1 f (x)dx f (n + 1) 1≤ n ≤ . N →∞ n=1 bn  Theorem P 11. for n ≤ x ≤ n + 1. n . Let us first establish some notation: Suppose that f. Then for N ≥ K. Lemma 11. n=1 1 R n+1 Case 1: Suppose f is increasing. It follows that n=1 an PN an lim Pn=1 N = 1. By the Limit Comparison Test.K + (1 + ) bn . Dividing both sides by n=1 bn and using PN PN an limN →∞ n=1 bn = ∞. n=1 n=1 n=1 n=1 PN say. we write an ∼ bn if limn→∞ abnn = 1. we also have PN n=1 bn that PN is at most 1 + 2 for all suffiicently large N . f (n) ≤ n f (x)dx ≤ f (n + 1).18. given sequences {an }. N X K−1 X N X K−1 X N X an = an + an ≤ an + (1 + )bn n=1 n=1 n=K n=1 n=K K1 K−1 ! N N X X X X = an − (1 + )bn + (1 + )bn = C.K does not depend on N . We write f ∼ g and say f is asymptotic to g if limx→∞ fg(x) (x) = 1. ∞) → (0.17. f (n) f (n) By assumption we have f (n + 1) lim = 1. n an = ∞. Because our hypotheses are symmetric in n an and n bn . ∞) be continuous and monotone.

if for sufficiently large n an is bounded below by a positive constant M times rn forP r ≥ 1. d) The hypothesis of part b) holds if ρ = limn→∞ aan+1 n exists and is greater than 1. Conversely. Then for n ≤ x ≤ n + 1. then the series converges by comparison to the convergent geometric series n M rn . n or R n+1 f (n + 1) f (x)dx ≤ n ≤ 1. Then P the series n an diverges. we have Z n+1 f (n + 1) ≤ f (x)dx ≤ f (n). a) Suppose there exists N ∈ Z+ and 0 < r < 1 such that for all n ≥ N . we conclude Z N +1 N Z k+1 X N X f (x)dx = f (x)dx ∼ f (n). An easy induction argument shows that for all k ∈ N. aan+1 n ≥ r. 1 1 n=1 Case 2: Suppose f is decreasing. we have R N +1 1 f (x)dx ∞ ∗ f (N + 1) lim RN = = lim = 1. a) Our assumption is that for all n ≥ N . N →∞ f (x)dx ∞ N →∞ f (N ) 1 where in the starred equality we have applied L’Hopital’s Rule and then the Fun- damental Theorem of Calculus. the remainder of the proof proceeds as in the previous case. by our assumption that f (n) ∼ f (n + 1) and the Squeeze Principle we get (82). aan+1 n ≤ r. The Ratio Test.19.  5. . b) Suppose there exists N ∈ Z+ and r ≥ 1 such that for all n ≥ N . aan+1 n ≤ r < 1. 5. (Ratio Test) Let n an be a series with an > 0 for all n. We conclude Z N Z N +1 N X f (x)dx ∼ f (x)dx ∼ f (n). f (n) f (n) Once again. Proof. 1 k=1 k n=1 Further.17 with an = f (n) and bn = n f (x)dx. aN so aN +k ≤ aN rk . P Then the series n an converges. P Theorem 11. then the series diverges by comparison to the divergent geometric series n M rn . aN +k ≤ rk . c) The hypothesis of part a) holds if ρ = limn→∞ aan+1 n exists and is less than 1. Series With Non-Negative Terms III: Ratios and Roots P We continue our analysis of series n an with an ≥ 0 for all n.1. Then an+2 an+2 an+1 2 an = an+1 an ≤ r . SERIES WITH NON-NEGATIVE TERMS III: RATIOS AND ROOTS 247 R n+1 Applying Lemma 11. In this section we introduce two important tests based on a very simple – yet powerful – idea: if for sufficiently large n an is bounded above by a non-negative constant M times rn for 0 ≤ rP < 1. 5.

P∞ n Example: Let x > 0. it follows that for all k ∈ N. 1 a) Suppose therePexists N ∈ Z+ and 0 < r < 1 such that for all n ≥ N . the root test will also succeed. 1 P that for infinitely many positive integers n we have an ≥ 1. In this section we give a variant P of the Ratio Test. aN +k ≥ rk . (Recall we showed this earlier when x = 1. aan+1 n ≥ r ≥ 1. k=N k=0 k=0 P so the series n an converges by comparison. Ratios versus Roots. Suppose now that P 1 n an is a series with non-negative terms such that an ≤ r for some r < 1. our assumption is that for all n ≥ N .3. Exercise: Prove Theorem 11.2. Rais- n ing both sides to the nth power gives an ≤ rn . and once again we find that the series converges by comparison to a geometric series. ann ≤ r. 5. whenever the ratio test succeeds in determining the con- vergence or divergence of a series.19. b) Similarly.248 11. Then the b) Suppose n series n an diverges.) We consider the quantity xn+1 an+1 (n+1)! x = xn = . Then the series n an converges. INFINITE SERIES Summing these inequalities gives X∞ X∞ ∞ X ak = aN +k ≤ aN rk < ∞. so the series diverges by the Nth Term Test. 1 d) The hypothesis of part b) holds if ρ = limn→∞ ann exists and is greater than 1. We will show that the series n=0 xn! converges. The Root Test. Instead of focusing on the property that the geometric series n rn has constant ratios of consecutive terms. It is a fact – a piece of calculus folklore – that the Root Test is stronger than the Ratio Test.20. In order to explain this result we need to make use of the limit infimum and limit . P Theorem 11. we observe that the nth root of the nth term is equal to r.20. It follows that an 9 0. That is. 5. (Root Test) Let n an be a series with an ≥ 0 for all n. We leave the proofs of parts c) and d) as exercises. an n! n+1 an+1 It follows that limn→∞ an = 0. As above.  Exercise: Prove parts c) and d) of Theorem 11. Thus the series converges for any x > 0. aN so aN +k ≥ aN rk ≥ aN > 0. 1 c) The hypothesis of part a) holds if ρ = limn→∞ ann exists and is less than 1.

aan+1 n ≤ r. Put n an+1 an+1 ρ = lim inf . First we recast the ratio and root tests in those terms. we have an+1 1 1 an+1 ρ = lim inf ≤ θ = lim inf ann ≤ θ lim sup ann ≤ ρ = lim sup . P 1 c) Give anPexample of a non-negative series n an with θ = lim supn→∞ ann = 1 such that n an = ∞. a) Show that (i) =⇒ (ii) =⇒ (iii) and that neither implication can be reversed. the series P n an converges.  10This is not a typo: we really mean the limsup both times. Step 3: We must show that ρ ≤ θ. Proof. unlike in the previous exercise. n→∞ an n→∞ n→∞ n→∞ an Exercise: Let A and B be real numbers with the following property: for any real number r. SERIES WITH NON-NEGATIVE TERMS III: RATIOS AND ROOTS 249 supremum. n→∞ an n→∞ an P a) Show that if ρ < 1. b) Show that if θ > 1. r Now 1  a  n+k 1 1 n θ = lim sup ann = lim sup an+k k ≤ lim sup r n = r. hence θ ≤ θ. so that for all sufficiently large n. and we leave it as an exercise. This is very similar to the argument of Step 2. (iii) lim supn→∞ xn ≥ 1. For any series n an with positive terms. As in the proof of the Ratio Test. For this. Put 1 θ = lim sup ann . we have an+k < rk an for all k ∈ N.20. Show that B ≤ A. n→∞ P a) Show that if θ < 1. (ii) For infinitely many n. we conclude θ ≤ ρ. 5. P Proposition 11. b) Show that if ρ > 1 the series n an diverges. suppose r > ρ. Step 2: We show that θ ≤ ρ. lim inf xn ≤ lim sup xn . xn ≥ 1. the series n an converges. the series n an diverges. if A < r then B ≤ r. We may rewrite this as a  n an+k < rn+k n . n→∞ k→∞ k→∞ r By the preceding exercise.10 P Exercise: Consider the following conditions on a real sequence {xn }∞ n=1 : (i) lim supn→∞ xn > 1. b) Explain why the result of part b) of the previous Exercise is weaker than part b) of Theorem 11. ρ = lim sup . P Exercise: Let an be a series with positive terms. P Exercise: Let n an be a series with non-negative terms. . Step 1: For any sequence {xn }.21. r or 1  a  n+k1 n+k n an+k <r n .

22. ∞]. 1 c) limn→∞ (n!) n . Warning: The sense in which the Root Test is stronger than the Ratio Test is a theoretical one. This is not so bad if we keep our head – e. ρ < 1. P a) Show that ρ = 18 and ρ = 2. α b) For α ∈ R. If instead we tried the Root Test we would have to evaluate 1 n 1 limn→∞ ( n! ) . INFINITE SERIES Exercise: Give the details of Step 3 in the proof of Proposition 11. so the series converges. Indeed an+1 1/(n + 1)! n! 1 lim = lim = lim = lim = 0.  Exercise: Use Corollary 11. even though in theory when- ever the Ratio Test works the Root Test must also work. We turn now to the serious study of series with both positive and negative terms. so it is 0. n→∞ an n→∞ 1/n! n→∞ (n + 1)n! n→∞ n + 1 Thus the Ratio Test limit exists (no need for liminfs or limsups) and is equal to 0. . P∞ 1 Example: Consider again the series n=0 n! . But this is elaborate compared to the Ratio Test computation. Assume 1 an+1 that limn→∞ an → L ∈ [0. it may well be the case that the Ratio Test is easier to apply than the Root Test. Now suppose that the Ratio Test succeeds in showing that the series is divergent: that is ρ > 1. so the Root Test also shows that the series is convegent. b) Show that θ = θ = 12 .21. Proposition 11. so the Root Test shows that the series converges. we have θ ≤ ρ ≤ 1. limn→∞ n n .250 11.22 to evaluate the following limits: 1 a) limn→∞ n n . Then θ ≥ θ ≥ ρ > 1. Thus the root test limit is at most R1 for any positive R. Corollary 11. Absolute Convergence 6. Let {an }∞ n=1 be a sequence of positive real numbers. Then also limn→∞ ann = L. the hypothesis gives that for the infinite series n an we have ρ = L. n Exercise: Consider the series n 2−n+(−1) . so by Proposition 11.g. In the presence of factorials one should always attempt the Ratio Test first.1. so the Ratio Test fails. Then by Proposition 11. Turning these ideas around. Indeed. 6. P Proof. so the Root Test also shows that the series is divergent. Exercise: Construct further examples of series for which the Ratio Test fails but the Root Test succeeds to show either convergence or divergence.21 can be put to the following sneaky use. P Now let n an be a series which the Ratio Test succeeds in showing is conver- gent: that is. which was immediate. Introduction to absolute convergence. For a given relatively benign series. n! > Rn and thus ( n! ) ≤ ( R1n ) n = R1 .21 we must also have θ = L.21. one can show that for 1 n1 1 any fixed R > 0 and sufficiently large n.

P First Proof : Consider P the three series a n n . But we claim that this implies that n an + |an | converges as well. n n | and |a n an + |aP n |. (In the next section we get really serious by studying series in the absence of absolute conver- gence. But the following result already clarifies matters a great deal. 6. The main idea is that when we try to extend our rich array of convergence tests for non-negative series to series with both positive and negative terms. show that | n=0 an | ≤ n=0 |an |. ABSOLUTE CONVERGENCE 251 It turns out that under one relatively mild additional hypothesis. . such that nonabsolute converge and conditional convergence may indeed differ. By the Three Series Principle. The reasoning for this – which we admit will seem abstruse at best to our target audience – is that in functional analysis one studies convergence and absolute convergence of series in a more general context. n  P∞ P∞ P∞ Exercise: If n=0 an is absolutely convergent. Proposition 11. a sufficient (and often necessary) hypothesis is that the series be not just convergent but absolutely convergent. n=N n=N n=N P and an converges by the Cauchy criterion. n=N |an | < .PIn particular the series P a n n + |a n | has non-negative terms and n an +P|an | ≤ n 2|an | < ∞. Second Proof : The above argument P is clever – maybe too clever! Let’s try something more fundamental: since P∞ n n |a | converges. In this section we study this wonderful hypothesis: absolute convergence. We will later on give a separate definition for “conditionally convergent” and then it will be a theorem that a real series is conditionally convergent if and only if it is nonabsolutely convergent. virtually all of our work on series with non-negative terms can be usefully applied in this case. 11We warn the reader that the more standard terminology is conditionally convergent. so to decide whether P it is convergent we may use allP the tools of the last three sections. Proof. N X +k N X +k ∞ X | an | ≤ |an | ≤ |an | ≤ . consider the expression an + |an |: it is equal to 2an =P 2|an | when an is non-negative and 0 when Pan is negative. Note that A n |an | is a series with non-negative terms. A series n an which converges but for which n |an | diverges is said to be nonabsolutely convergent. Therefore for all k ≥ 0. So a n n + |an | converges. ∞ P∞ P∞ {an }n=1 of rational numbers such that Exercise: Find a sequence n=1 |an | is a rational number but n=1 an is an irrational number. n an converges. We shall give two proofs of this P important P result. although it is not obvious. This is indeed the case. Every absolutely convergent real series is convergent.11 The Pterminology absolutely convergent suggests that the convergence P of the se- ries n |an | is somehow “better” than the convergence of the series n an . This will lead to surprisingly delicate and intricate considerations. Indeed.) P P P real series n an is absolutely convergent if n |an | converges.23. for every  > 0 there exists N ∈ Z+ such that for all n ≥ N . Our hypothesis is that n |an | converges.

P under the hypotheses of parts b) and d) we have |an | 9 0. n→∞ an an+1 ρ = lim sup | |. P Theorem 11. Theorem 11. As an example of how Theorem 11. contradicting (ii). and P∞ P∞ (ii) n=1 an diverges and n=1 bn converges. the series n an is divergent. they do so by showing that bn 9 0. then the series n an is absolutely convergent. hence also an 9 0 Thus so n an diverges by the Nth Term Test (Theorem 11. a) Assume an 6= 0 for all n. |an | n ≤ r. this implies that n an is convergent. Parts a) and Pc) are immediate: applying Theorem 11. then applying the P limit comparison test to the abso- ∞ ∞ lute series n=1 |an | and n=1 |bn |. If there Pexists 0 ≤ r < 1 such that for all sufficiently large n. b) Assume an 6= 0 for allP n. n→∞ an an+1 ρ = lim inf | |. P∞ an −bn Then: n=1 |bn | = ∞ and limn→∞ bn = 0. Suppose: (i) limn→∞ abnn = 1. (Ratio & Root Tests for Absolute Convergence) Let n an be a real series.25.23. INFINITE SERIES Exercise: State and prove a Comparison Test and Limit Comparison Test for ab- solute convergence. (We will study this subtlety later on in detail. ∞). 1 d) If there are infinitely many n for which |an | n ≥ 1. Proof. we record the following result.252 11. because in general just because n |a n | = ∞ does not imply that n an diverges.24. then still P∞ n=1 bn is not absolutely convergent. then the series diverges.4). | aan+1 n | ≤ r.P∞ If n=1 |bn | < P∞. If there exists r > 1 such that for all sufficiently large n. P There is something to say in parts P b) and d).19 (resp. | aan+1 n | ≥ r.23 may be combined with the previous tests to give tests for absolute convergence. for a real series n an define the following quantities: an+1 ρ = lim | | when it exists.20) we find that P n |an | is convergent – and the point is that by Theorem 11. the series P n an is absolutely convergent.  P In particular. Theo- rem 11. 1 c) If there exists r < 1 such that for all sufficiently large n. P∞ Proof.) But recall that whenever P the Ratio or Root tests establish the divergence of a non-negative series n bn . n→∞ . Moreover an − bn an lim = lim − 1 = 0. n→∞ an 1 θ = lim |an | n when it exists. we get that n=1 an is absolutely convergent and thus convergent. (Ash [As12]) Let {an }∞ ∞ n=1 and {bn }n=1 be real sequences + such that for all n ∈ Z we have bn 6= 0. n→∞ bn n→∞ bn  Exercise: Show that if we weaken (i) to limn→∞ | abnn | = L ∈ (0.

Cauchy products II: Mertens’s Theorem. + |aN b1 | + . n=0 n=0  P By the Cauchy criterion. 0≤i. for sufficiently large N |AN BN − AB| < 3 . and let cn = k=0 ak bn−k .  6.27. P∞ We now wish to show that limN →∞ CN = n=0 cn = AB. . 6. P Therefore n |cn | converges by comparison. B = |bn |. . + aN )(b0 + . since AN BN → AB.12 P∞ Theorem 11. P∞(Mertens’ Theorem) Let n=0 an = A be an absolutely con- vegent Pseries and n=0 bn = B be a convergent series. of course. 12Franz Carl Joseph Mertens.26 may seem rather long. . Put ∞ X ∞ X A= |an |. We have proved this result already when an . it is in fact a rather straightforward argument: one shows that the difference between the “box prod- uct” and the partial sums of the Cauchy product becomes negligible as N tends to infinity. 1840-1927 . bn ≥ 0 for all n. While the proof of Theorem 11. so the Cauchy product series n cn converges.26. . We wish. n=0 n=0 k=0 n=0 k=0 P∞ Pn the last inequality following from the fact P∞that n=0 Pk=0 |ak ||bn−k | is the Cauchy ∞ product of Pthe two non-negative series n=0 |an | and n=0 |bn |. Then the Cauchy product ∞ series n=0 cn converges to AB. . + |aN bN | ∞ !  ∞ !  X X X X ≤ |AN BN − AB| + |an |  |bn | + |bn |  |an | . n=0 n≥N n=0 n≥N Fix  > 0. Let Pn=0 an = A and n=0 bn = B be two absolutely P∞ con- n vergent series. As far as the convergence of the Cauchy product. + bN ) = AN BN . ABSOLUTE CONVERGENCE 253 1 θ = lim sup |an | n . Then the Cauchy product series n=0 cn is absolutely convergent. Recall the notation X N = ai bj = (a0 + . a theorem of F.2. with sum AB. . to reduce to that case. this is completely straightforward: we have ∞ X ∞ X X n ∞ X X n |cn | = | ak bn−k | ≤ |ak ||bn−k | < ∞. P∞ P∞ Theorem 11. . In less space but with a bit more finesse. Mertens [Me72].j≤N We have |CN − AB| ≤ |N − AB| + |N − CN | = |AN BN − AB| + |a1 bN | + |a2 bN −1 | + |a2 bN | + . Proof. one can prove the following stronger result. for sufficiently large N we have n≥N |bn | < 3A and  P n≥N |an | < 3B and thus |CN − AB| < . n→∞ and then all previous material on Ratio and Root Tests applies to all real series. hence it converges. .

Since BN → B.e. . . CN = cn n=0 n=0 n=0 and also (for the first time) βn = Bn − B.13 A series which is nonabsolutely convergent is a more delicate creature than any we have studied thus far. 3. . which we give in the next section. Since our goal is to show that CN → AB P∞ and we know that AN B → AB. Put M = max |βn |. In fact the typical undergraduate student of calculus / analysis learns exactly one such test. + aN β0 = AN B + γN . . INFINITE SERIES Proof. Thm. BN = bn . say. i. + aN b0 ) = a0 BN + a1 BN −1 + . . . A test which can show that a series is convergent but nonabsolutely convergent is necessarily subtler than those of the previous section. thus if it is convergent but not absolutely convergent. M n≥N −N0 |an | ≤ /2. . . Non-Absolute Convergence P We say that Pa real series n an is nonabsolutely convergent if the series con- verges but n |an | diverges. 2 2 2 2 n≥N −N0 So γN → 0. .. put α = n=0 |an |. . (Rudin [R.  7. . + aN (B + β0 ) = AN B + a0 βN + a1 βN −1 + . + (a0 bN + . and thus for any  > 0 we may choose  N0 ∈ N such that for all n ≥ N0 we have |βn | ≤ 2α . . for all sufficiently large N . + aN B0 = a0 (B + βN ) + a1 (B + βN −1 ) + . it describes how the series converges.50]): define (as usual) N X N X N X AN = an .254 11. . Then |γN | ≤ |β0 aN + . . + βN a0 |    X    ≤ |β0 aN + . Now. . it suffices to show that γN → 0. + βN0 aN −N0 | + ≤M |an | + ≤ + = . In fact our terminology here is not completely standard. 0≤n≤N0 P By the Cauchy criterion. βN → 0. . so it must modify “convergent”. . . + βN0 aN −N0 | + |βN0 +1 aN −N0 −1 + . + an β0 . Then for all N ∈ N. CN = a0 b0 + (a0 b1 + a1 b0 ) + . 13One therefore has to distinguish between the phrases “not absolutely convergent” and “nonabsolutely convergent”: the former allows the possibility that the series is divergent. We defend ourselves grammatically: “nonabsolutely” is an adverb. where γN = a0 βN + a1 βN −1 + . whereas the latter does not. .

. However. and the inequalities S2 ≤ S4 ≤ . The Alternating Series Test. ≤ S2n ≤ S2n+1 ≤ S2n−1 ≤ .. . say Seven . Thus we have split up our sequence of partial sums into two complementary subsequences and found that each of these series converges. Now the sequence {Sn } converges iff Sodd = Seven . . . Consider the alternating harmonic series ∞ X (−1)n+1 1 1 = 1 − + − . subtracting a2 and then adding a3 leaves us no larger than where we started. . + (a2n−1 | − |a2n ) + a2n−1 ≥ 0. some computations with partial sums suggests that the alternating har- monic series is convergent. S2n+1 = (a1 − a2 ) + (a3 − a4 ) + . (AST3) The sequence of absolute values of the terms is weakly decreasing: a1 ≥ a2 ≥ . so the alternating harmonic series is not absolutely convergent. . Thus the sequence of odd-numbered partial sums {S2n−1 } is decreasing. of course. . so it converges to its least upper bound. ≥ an ≥ . . (Whether it converges to log 2 is a different matter. ≤ S3 ≤ S1 . . and this is no accident: since a2 ≥ a3 . Moreover. n=1 n 2 3 Upon taking the absolute value of every term we get the usual harmonic series. . which diverges. We draw the reader’s attention to three properties of this series: (AST1) The terms alternate in sign.) It will be convenient to write an = n1 . with sum log 2. say Sodd . . we can find a pattern that allows us to show that the series does indeed converge. the sequence {S2n+1 } converges to its greatest lower bound. By looking more carefully at the partial sums. (AST2) The nth term approaches 0. Here it is: consider the process of passing from the first partial sum S1 = 1 to S3 = 1− 21 + 13 = 5 6 . just the opposite sort of thing happens for the even- numbered partial sums: S2n+2 = S2n + a2n+1 − a2n+2 ≥ S2n and S2n+2 = a1 − (a2 − a3 ) − (a4 − a5 ) − . But indeed this argument is valid in passing from any S2n−1 to S2n+1 : S2n+1 = S2n−1 − (a2n − a2n+1 ) ≤ S2n−1 . By the Monotone Sequence Lemma. NON-ABSOLUTE CONVERGENCE 255 7. so that the alternating harmonic series n+1 is n (−1) P n+1 . − (a2n − a2n+1 |) − a2n+2 ≤ a1 . Therefore all the odd-numbered terms are bounded below by 0. 7. Therfore the sequence of even-numbered partial sums {S2n } is increasing and bounded above by a1 .1.. .. We have S3 ≤ 1. . On the other hand. which we will revisit much later on. These are the clues from which we will make our case for convergence.

given any  > 0. put ∞ X N X n+1 (83) EN = |( (−1) an ) − ( (−1)n+1 an )|. (ii) nonabsolutely convergent if 0 < p ≤ 1. it is often possible to show that EN → 0 without coming up with an explicit expression for N in terms of . for any n ∈ Z+ we have Sodd − Seven ≤ S2n+1 − S2n = a2n+1 . INFINITE SERIES show that Seven ≤ Sodd . P∞ (−1)n+1 Exercise: Let p ∈ R: Show that the alternating p-series n=1 np is: (i) divergent if p ≤ 0. Give necessary and sufficient condi- n P (x) P tions for n (−1) Q(x) to be nonabsolutely convergent.. b) For N ∈ Z+ . we may define EN as in (83) above: N X EN = |S − an |. Moreover. P∞ Thus the error in cutting off the infinite sum n=1 (−1)n+1 |an | after N terms is in absolute value at most the absolute value of the next term: aN +1 . it follows that |S − S2n | = S − S2n ≤ S2n+1 − S2n = a2n+1 and similarly |S − S2n+1 | = S2n+1 − S ≤ S2n+1 − S2n+2 = a2n+2 . Theorem 11.28b): we have . Since a2n+1 → 0. Therefore the preceding arguments have in fact proved the following more general result. we conclude Sodd = Seven = S. Let {an }∞ n=1 be a sequence of non-negative real numbers which is weakly decreasing and such that limn→∞ P an = 0. i. due originally due to Leibniz. Of course in all this we never used that an = n1 but only that we had a series satisfying (AST1) (i. an alternating series).e. and (iii) absolutely convergent if p > 1. Each of these statements is (in the jargon of mathematicians working in this area) soft: we assert that a quantity approaches 0 and N → ∞.256 11. (AST2) and (AST3). we have EN <  for all suffuciently large N . Then: a) The associated alternating series n (−1)n+1 an converges. n=1 n=1 Then we have the error estimate EN ≤ aN +1 . Further. since for all n we have S2n ≤ S2n+2 ≤ S ≤ S2n+1 . the series converges.e.28. to say that the error goes to 0 is a rephrasing of the fact that the partial sums of the series converge to S. P∞ For any convergent series n=1 an = S.. But as we have by now seen many times. and conversely: in other words. so that in theory. P (x) Exercise: Let Q(x) be a rational function. n=1 Then because the series converges to S. limN →∞ EN = 0. But this stronger statement is exactly what we have given in Theorem 11.

. in order to compute the true sum of the al- ternating harmonic series to six decimal place accuracy using this method. or equivalently such that (N +1)! > 106 . For instance. as long as we can similarly make explicit how large N has to be in order for aN to be less than a given  > 0. Thus we need to choose N such that aN +1 = (N +1)! < 10−6 . In this case we 1 were guaranteed to have an error at most 1001 and we see that the true error is about half of that. we would need to add up the first million terms: that’s a lot of calculation.3678791887125220458553791887. n=1 n Again. Thus the actual error in cutting off the sum after 1000 terms is E1000 = 0. Note well that although the error estimate of Theorem 11. Here. but also more useful to have. n=0 n! n=0 n! 10! Using a software package.6926474305598203096672310589 . it is not especially efficient for computations. We may take N = 1000. By The- orem 11.6931471805599453094172321214. P∞ n+1 Example: We compute n=1 (−1)n to three decimal place accuracy.0004997500001249997500010625033. which my software package tells me is14 log 2 = 0. 800.28b). 880 and 10! = 3. Therefore ∞ 9 X (−1)n X (−1)n 1 | − |< < 10−6 . it is enough to find N ∈ Z+ such that aN +1 = N1+1 < 10001 . It is important to remember that this and other error estimates only give upper bounds on the error: the true error could well be much smaller.28b) is very easy to apply. (Thus please be assured that this is not the way that a calculator or computer would compute log 2. . later we will show that the exact value of the sum is log 2. so that we may take N = 9 (but not N = 8). 628. Therefore ∞ 1000 X (−1)n+1 X (−1)n+1 1 | − |≤ . 7. . you should be wondering how it is computing this! More on this later. 1 A little calculation shows 9! = 362. we find 9 X (−1)n = 0.) P∞ n Example: We compute n=0 (−1) n! to six decimal place accuracy. if an tends to zero rather slowly (as in this example). Thus the estimate for the error is reasonably accurate. n=0 n! 14Yes. we find that 1000 X (−1)n+1 = 0. NON-ABSOLUTE CONVERGENCE 257 given an explicit upper bound on EN as a function of N . This type of statement is called a hard statement or an explicit error estimate: such statements tend to be more difficult to come by than soft statements. we get a completely explicit error estimate and can use this to actually compute the sum S to arbitrary accuracy. . n=1 n n=1 n 1001 Using a software package.

(Dirichlet’s Test) Let n=1 an . If we take bn = (−1)n+1 . P Therefore n an bn converges by the Cauchy criterion.258 11. . without saying what we were doing. we find that the series n+1 P P n an bn = n (−1) an converges. Then the series n bn an converges. + bn ≤ M for all n ∈ Z+ . + bn is bounded.3678794411714423215955237701 e so the true error is E9 = 0. . choose N > 1 such that aN 2M . n X n X n X n−1 X | ak bk | = | ak (Bk − Bk−1 )| = | ak Bk − ak+1 Bk | k=m k=m k=m k=m−1 n−1 X =| (ak − ak+1 )Bk + an Bn − am Bm−1 | k=m n−1 ! X ≤ |ak − ak+1 ||Bk | + |an ||Bn | + |am ||Bm−1 | k=m n−1 n−1 ! X X ≤ M( |ak − ak+1 |) + |an | + |am | = M (ak − ak+1 ) + an + am k=m k=m = M (am − an + an + am ) = 2M am ≤ 2M aN < .15 P∞ P∞ Theorem 11. What lies beyond the Alternating Series Test? We present one more result. Proof. . + bn are bounded. 1805-1859 . Dirichlet’s Test.0000002524589202757401445814516374. Let M ∈ R be such that Bn = b1 + .2.  In the preceding proof. then B2n+1 = 1 for all n and B2n = 0 for all n. Then n=1 an bn is con- vergent. we used the technique of summation by parts. INFINITE SERIES In this case the exact value of the series is 1 == 0. n=1 bn be two infinite se- ries. We have recovered the Alternating Series Test! In fact Dirichlet’s Test yields the following Almost Alternating Series Test: let {an } be a sequence decreasing to 0. For   > 0. Suppose that: (i) The partial sums Bn = b1 + . and for all n let bn ∈ {±1} be a “sign se- quence” which is almost alternating in the sense P that the sequence of partial sums Bn = b1 + . Then for n > m ≥ N . Applying Dirichlet’s Test with a sequence an which decreases to 0 and with this sequence {bn }. which this time is only very slightly less than the guaranteed 1 = 0. so {bn } has bounded partial sums. an elegant (and useful) test due originally to Dirichlet. . 15Johann Peter Gustav Lejeune Dirichlet. 10! 7.0000002755731922398589065255731922. . . P∞ (ii) The sequence an is decreasing with limn→∞ an = 0.29.

Take n=0 an = n=0 bn = n=0 √ n+1 . √j+1 ≥ √n+1 . Dirichlet himself applied his test to establish the convergence of a certain class of series of a mixed algebraic and number-theoretic nature. Let us give an example – due to Cauchy! – of a Cauchy product of two nonabsolutely P∞ P∞ P∞ (−1)n convergent series which fails to converge. Decomposition into positive and negative parts. we define its positive part r+ = max(r.4. 7. and thus each term in cn has absolute 1 2 1 value at least ( √n+1 ) = n+1 . + bn is unbounded. Cauchy Products III: A Divergent Cauchy Product. . P∞ sin n Exercise: Show that n=1 n is nonabsolutely convergent. so the series diverges. cos n b) Apply Dirichlet’s P∞ cos nTest to show that the series n=1 n converges. 0) . 7. . For more information on this work. + cos n Pis∞ bounded. we find |cn | ≥ 1 for all n. j ≤ n. NON-ABSOLUTE CONVERGENCE 259 Exercise: Show that Dirichlet’s generalization of the Alternating Series Test is “as strong as possible” in the following sense: if {bn } is a sequence of elements. we will be able to give a “trigonometry-free” proof of the pre- ceding two exercises. . c) Show that n=1 n is not absolutely convergent. all of the same size. For a real number r. Remark: Once we know about series of complex numbers and Euler’s formula eix = cos x + i sin x. . Exercise: a) Use the trigonometric identity sin(n + 12 ) − sin(n − 21 ) cos n = 2 sin( 21 ) to show that the sequence Bn = cos 1 + . The analytic proper- ties of these series were used to prove his celebrated theorem on prime numbers in arithmetic progressions. then there is a sequence an decreasing to zero such that n an bn diverges. so cn is equal to (−1)n times a sum of positive 1 1 1 terms. i+j=n i+1 j+1 Now (−1)i (−1)j = (−1)i+j = (−1)n . each ±1. To give a sense of how influential this work has become. the reader may consult (for instance) [DS]. in modern terminology Dirichlet studied the analytic properties of Dirichlet series associated to nontrivial Dirichlet characters.3. Since we are summing from i = 0 to n there are n +P1 terms. such that the sequence of partial sums Bn = b1P + . 7. Since i. √i+1 . The nth term in the Cauchy product is X 1 1 cn = (−1)i (−1)j √ √ . Thus the general term of n cn does not converge to 0.

and so on – of the function f (x) = n=0 an xn defined by a power series. Power Series I: Power Series as Series 8. Show: a) r = r+ − r− . P∞ However. contradiction. if we had an = 1 for all n we would 1 get the geometric series n=0 xn which converges iff x ∈ (−1. One of the major themes of Chapter three will be to try to view power series as “infinite polynomials”: in particular.e. for instance.30. n an is absolutely convergent. Exercise: Let r be a real number. If a series n is nonabsolutely convergent. P For any real series n an we have a decomposition X X X an = a+ n − a− n. for which values of ∞ x ∈ R does the power series n=0 an xn converge? .P∞ Thus. Thus: P Proposition 11. Question 11. P − − Case 2: Both n a+ | = n a+ P P P n and n an diverge. then both its positive and negative parts diverge. By the Three Series Principle there are two cases: P + P − P P + − Case 1: Both P n an and n an converge. n n n P − Let us call n a+ P at least if all three series converge. For a Psequence {an }∞ n=0 of real numbers. Thus the basic question about power series that we will answer in this section is the following. Hence n |an P n + an diverges: indeed. − b) Show that if n a+ n diverges and n an converges then n an = ∞. if it converged. 0). P Exercise: Let P n an be a real series. 8. P − then adding and subtracting n an we would get that 2 n a+ P n and 2 a n n converge. P∞ Let {an }∞n=0 be a sequence of real numbers. P n and P n an the positive part and negative part of n an . what is its domain? The natural domain of a power series is the set of all values of x for which the series converges. If a series nPis absolutely convergent. 1) and has sum 1−x . Pn k The nth partial sum of a power series is k=0 ak x . Let us now suppose that n an converges.. integrability. P − a) Show that if Pn a+ P n converges andP n an diverges then Pn an = −∞. a polynomial in x. Convergence of Power Series. Then a series of the form n=0 an x n is called a power series. we will regard x as a variable and be interested in the propetiesP∞– continuity. differentiability.1. both its positive and negative parts converge. INFINITE SERIES and its negative part r− = − min(r. Hence n |an | = n (an + an ) con- verges: i. if we want to regard the series n=0 an xn as a function of x.260 11. b) |r| = r+ + r− .31.

iff x ∈ (−R. We apply the Ratio Test: nRn |x|n+1 n+1 1 |x| lim n+1 n = |x| lim · = . when x = ±R. So it is indeed possible for a power series P to converge only at x = 0. Therefore the Ratio Test shows that (as we already knew!) the series converges absolutely at x = 0 and diverges at every nonzero x. i.. so it converges iff |ρ| = | R | < 1. 8. Namely. the same as in Example 4 but with x replaced by −x throughout. n→∞ (n + 1) R |x|n n→∞ n R R So once again the series converges absolutely when |x| < R. R).. This is a geometric series x x with geometric ratio ρ = R . the We must look separately at the series is the harmonic series n n1 . P∞ 1 n Example 3: Fix R ∈ (0. P case |x| = R – i. P∞ 1 n Example 4: Fix R ∈ (0. We apply the Ratio Test: (n + 1)!xn+1 lim = lim (n + 1)|x|. POWER SERIES I: POWER SERIES AS SERIES 261 There is one value of x for which the answer is trivial. diverges when |x| P> R. We apply the Ratio Test: 2 n2 R n |x|n+1  n+1 1 |x| lim 2 n+1 = |x| lim · = . iff x ∈ (−R. n=0 So every power series converges at least at x = 0. ∞). R).e.e.e. n→∞ (n + 1)R |x| n→∞ n R R Therefore the series converges absolutely when |x| < R and diverges when |x| > R. whereas plugging in x = −R gives n (−1) P n2 : since . ∞). P∞ (−1)n n Example 5: Fix R ∈ (0. consider n=1 nRn x . if we plug in x = 0 to our general power series. i. hence divergent. There is nothing interesting going on here. When x = R. hence (nonabsolutely) convergent. This time plugging in x = R gives n n12 n which is a convergent p-series. This is disappointing if we are interested in f (x) = n an xn as a function of x. the series n is the alternating harmonic series n (−1) P n . So the power series converges for x ∈ [−R. We apply the Ratio Test: xn+1 n! |x| lim | || | = lim = 0. we get ∞ X an 0n = a0 + a1 · 0 + a2 · 02 = a0 . n→∞ (n + 1)! xn n→∞ n + 1 So the power series converges for all x ∈ R and defines a function f : R → R. consider n=1 nRn x .. P∞ Example 1: Consider the power series n=0 n!xn . Thus the series converges iff −x ∈ [−R. n→∞ n!xn n→∞ The last limit is 0 if x = 0 and otherwise is +∞. ∞). R]. We may rewrite this se- P∞ 1 n ries as n=1 nR n (−x) . since in this case it is just the function from {0} to R which sends 0 to a0 . i. consider n=1 n2 Rn x .. xn P∞ Example 2: Consider n=0 n! . consider n=0 Rn x . But when x = −R. R).e. and we must look separately at x = ±R. ∞). P∞ 1 n Example 6: Fix R ∈ (0.

• the entire real line R = (−∞. If n an A n converges. i.. we may assume |an An | ≤ 1 for all n. n n A n A  P∞ Theorem 11. R] • for any R ∈ (0. a half-open interval [−R. Although this is usually the case in simple examples of interest. the alternating p-series with p = 2 is abso- lutely convergent. So what more is there to say or do? The issue here is that we have assumed that limn→∞ aan+1 n exists. n→∞ |an xn | n→∞ an So if ρ = limn→∞ aan+1 n . . b) If R = 0. the power series converges for all x ∈ R. This we need to take a different approach in the general case. we find |an+1 xn+1 | an+1 lim = |x| lim . 01 = ∞.e. INFINITE SERIES the p-series with p = 2 is convergent.32. ∞). iff |x| < ρ1 – and diverges when |x|ρ > 1 – i. then n x converges absolutely for all x ∈ (−A. then the power series converges only at x = 0. Thus the convergence set of a power series can take any of the following forms: • the single point {0} = [0. it is certainly does not happen in general (we ask the reader to revisit §X. ∞ 1 = 0. R) and definitely diverges for all x outside of [−R. That is. ∞).e.33. ∞). In each case the set of values is an interval containing 0 and with a certain radius. R) or (−R. show n an B n is absolutely con- P Proof. we have already said what is needed: we use the Ratio Test to see that the convergence set isPan interval around 0 of a certain radius R. 0]. c) If R = ∞. n an xn diverges.X for examples of this). R]. n n P P Lemma 11. Therefore the series converges (absolutely. a) There exists R ∈ [0. • for any R ∈ (0. an open interval (−R. Since 0 < B A < 1. iff |x| > ρ1 . R). Let n=0 an xn be a power series.. an extended real number R ∈ [0. This goal can be approached at various degrees of sophistication. Pn an xn converges absolutely and (ii) For all x with |x| > R. • for any R ∈ (0. i.262 11. in fact) on [−R. the Ratio Test tells us that the series converges when |x|ρ < 1 – i. an An → 0: by P omitting finitely many terms. with suitable conventions in the extreme cases. for then so is n an (−B)n . ∞). A). Let 0 < BP< A. P Let A > 0 and let n an x be a power series. R]. Since n an An converges. ∞) such that the series definitely converges for all x ∈ (−R. It is enough to vergent. Namely.. R]. At the calcu- lus level. taking a general power series n an xn and applying the Ratio Test. Our goal is to show that this is the case for any power series.. a closed interval [−R.e.  n  n X X B X B |an B n | = |an An | ≤ < ∞. the radius of convergence R is precisely the reciprocal of the Ratio Test limit ρ.e. ∞] such P that: (i) For all x with |x| < R.

R] or [−R. Thus RPsatisfies property (i).  Exercise: Prove parts) b). If |x| < R. the series converges absolutely for |x| < R and diverges for |x| > R. We have lim supn→∞ |an xn | n = |x| lim supn→∞ |an | n = |x|θ. 1 1 Proof. so in particular it converges absolutely at y. R). POWER SERIES I: POWER SERIES AS SERIES 263 d) If 0 < R < ∞. A). Show that the “formal identity” ∞ ! ∞ ! ∞ X X X n n an x bn x = cn xn n=0 n=0 n=0 is valid for all x ∈ (−R. We leave the proof of parts b) through d) to the reader as a straightforward exercise.) The drawback of Theorem 11. [−R.16 who published it in 1888 [Ha88] and included it in his 1892 PhD thesis. Similarly. It turns out that the result was established by our most usual suspect: it was first proven by Cauchy in 1821 [Ca21] but apparently had been nearly forgotten. (Cauchy-Hadamard) Let n an xn be a power series. Let R = min(Ra . A). c) and d) of Theorem 11. choose A such that |x| < A < R and then A0 such that 1 1 θ= < A0 < . then there exists A with |y| < A < R such that n an An converges. (Suggestion: use past results on Cauchy products. we need to appeal instead to the Root Test and make use of the limit supremum. so the series converges abso- lutely by the Root Test. a) Let R be the least upper bound of the set of x ≥ 0 such that n a n n x converges.32 the power series converges P absolutely on (−A. R). If y is such that |y| < R. In order to achieve this in general. A R 16Jacques Salomon Hadamard (1865-1963) . P∞ P∞ Exercise: Let n=0 an xn and n=0 bn xn be two power seriesPwith positive radii n of convergence Ra and Rb . (−R. |an xn | n ≤ A0 A < 1.S.34. as is the case when the ratio test limit ρ = limn→∞ |a|an+1 n| | exists. Then there exists A with R < A < |y| such that the power series converges on (−A. choose A such that R < |x| < A and then A0 such that 1 1 < A0 < = θ.33.33 is that it does not give an explicit description of the radius of convergence R in terms of the coefficients of the power series. Hadamard. so by Lemma 11. Similarly. n→∞ 1 Then the radius of convergence of the power series is R = θ : that is. if |x| > R. suppose there exists y with |y| > R such that n an y n converges. and put P 1 θ = lim sup |an | n . R). the convergence set of the power series is either (−R. Rb ). contradicting the definition of R. The following elegant result is generally attributed to J. This seems remarkably late in the day for a result which is so closely linked to (Cauchy’s) Root Test. Put cn = k=0 ak bn−k . 8. R A 1 Then for all sufficiently large n. R]. Put R = θ1 . P Proof. Theorem 11.

Since limn→∞ (n+1) nk = limn→∞ n+1n = 1. we recommend simply assuming that the Ratio Test limit ρ = limn→∞ | aan+1 n | exists and proving Theorem 11. INFINITE SERIES 1 Then there are infinitely many non-negative integers n such that |an xn | n ≥ A0 A > 1.  Remark: For the reader who is less than comfortable with limits infimum and supre- mum. Exercise: Prove Corollary 11.36.35. b) Give an example in which J 6= I. lim (nk )1/n = lim nk/n = 1. so the series n an x diverges: indeed an xn 9 0. This will be good enough for most of the power series encountered in practice. by Corollary 11.264 11. Let {an }∞ n=0 be a sequence P∞ of real numbers. for any k ∈ Z. k k Proof. a) Show that J ⊂ I. Theorem 11.35. n→∞ n→∞ n→∞ n→∞ The result now follows from the Cauchy-Hadamard Formula. n→∞ n→∞ (Alternately.36 under that additional assumption using the Ratio Test.22.) Therefore 1  1  1  1 lim sup(nk |an |) n = lim (nk ) n lim sup |an | n = lim sup |an | n . Exercise: Let n an xn be a power P P series with interval of convergence I. R = 1. the radius of convergence of n nk an xn is also R. b) If an 9 0. n P  Here is a useful criterion for the radius of convergence of a power series to be 1. then R ≤ 1. a) If {an } is bounded. Then. then R ≥ 1. take logarithms and apply L’Hôpital’s Rule. Corollary 11. Let n an xn be a power series P P with radius of convergence R. . c) Thus if {an } is bounded but not convergent to zero. Let J be the interval of convergence of n nan xn−1 . and let R be the radius of convergence of the power series n=0 an xn .

and consider a degree d polynomial expression P (t) = a0 + a1 t + . To every polynomialPexpression we as- N sociated a polynomial function P (x) from R to R. Namely we want to consider polynomial c-expansions and their associated func- PN tions. then an = 0 for all n. If P is the zero polynomial or is a constantPpolynomial. almost trivial – but nevertheless important result. Then we may write P (t) = aN (t − c)N + Q(t). CHAPTER 12 Taylor Taylor Taylor Taylor 1. Now we wish to establish the following simple – indeed. . Theorem 12. given by x 7→ n=0 an xn . and let c ∈ R. . PN P (x) = n=0 bn (x − c)n . A polynomial expression is something of the form P (t) = PN n n=0 an t with N ∈ N and a0 . 265 . then an = bn for P∞0 ≤ n n≤ N . A polynomial c-expansion induces a function in an evident PN manner: we map x to n=0 an (x − c)n . where. . . P (n) (c) b) Explicitly. A polynomial c-expansion is a formal expression n=0 an (t − c)n with N ∈ N and c.) And after that – thankfully! – we have spoken only of polynomials. Perhaps this is most cleanly handled by induction on the degree of P . a) We first discuss the existence of c-expansions. We wish – briefly! – to revisit this formalism in a slightly generalized context. PN a) There is a unique polynomial c-expansion n=0 bn (t−c)n such that for all x ∈ R. n=0 Then P (t) = b0 + b1 (t − c) + . Proof. (This came down to showing that that if a polynomial expression all n=0 an t defines the zero function. Suppose now that N > 0 and we have established that every polynomial function of degree less than N has a polynomial c-expansion. . Then we showed that the process is reversible in the following sense: if two expressions PN n PN n n=0 an t and n=0 bn t define the same polynomial function. an ∈ R.1. . Taylor Polynomials Recall that early on we made a distinction between polynomial expressions and polynomial functions. + bN −1 (t − c)N −1 + aN (t − c)N . . Let P be a polynomial of degree at most N . then there is nothing to show: a constant a is certainly of the form n bn (x − c)n : take b0 = a and bn = 0 for all n > 0. + aN −1 tN −1 + aN tN . since the leading term of aN (t − c)N is aN tn . Q(t) is a polynomial of degree less than N and thus by induction can be written as N X −1 Q(t) = bn (t − c)n . we have bn = n! for all 0 ≤ n ≤ N . . aN ∈ R.

+ bN (x − c)N . Let I be an interval and c an interior point of I. . so that n=0 (an − bn )(t − c) = PN −1 n P N −1 n n=0 (an − bn )(t − c) = n=0 dn t . . . i.  Corollary 12. by what we showed about polynomial functions we have dn = 0 PN n for all n. Differentiating (84) again and evaluating 00 at x = c gives P 00 (c) = 2b2 . Q is the zero polynomial and thus P = TN . + N bN (x − c)N −1 .+aN tN = a0 +a1 ((t−c)+c)+a2 ((t−c)+c)2 +. Applying Theorem 0 to TN (x) = k=0 k! (x − c)k gives k! = (k) TN (c) (k) (k) k! . and let f : I → R be a function which is N times differentiable at c.1 we get Q(x) = k=0 k! (x − c)k = PN k k=0 0 · (x − c) = 0. . .266 12. PN n n=0 bn (t − c) which induce thePN same function. As for the uniqueness: let P (x) be TN (c) any polynomial of degree at most N such that P (k) (c) = f (k) (c) for 0 ≤ k ≤ N . we say two functions f. TAYLOR TAYLOR TAYLOR TAYLOR Alternately and perhaps more directly. Let N X f (k) (c) TN (x) = (x − c)k . In particular 0 = dN = aN − bN . PN As for the uniqueness. b) Consider the identity N X (84) P (x) = bn (x − c)n = b0 + b1 (x − c) + b2 (x − c)2 + .. Reasoning as above we find that dN −1 = aN −1 − bN −1 = 0. g : I → R agree to order n at c if f (x) − g(x) lim = 0. and let Q = TN − P . . with highest PN degree term dN = aN − bN .2. we may write n=0 (an − bn )(t − c)n as n=0 dn tn . so bk = k! . TN (x) is the degree N Taylor polynomial of f at c.hence = f (c) for all 0 ≤ k ≤ N . x→c (x − c)n . By an argument almost identical PN to that just given. n=0 Evaluating (84) at x = c gives P (c) = b0 . suppose that we have two c-expansions n=0 an (t − c)n . Since the corresponding function x 7→ n=0 dn xn is identically zero. and so forth: continuing in this way we find that an = bn for all 0 6= n ≤ N .e. and evaluating at x = c gives P 0 (c) = b1 . Continuing in this manner one finds P (k) (c) that for 0 ≤ k ≤ N . Then Q is a polynomial of degree at most N such that Q(k) (c) = 0 PN Q(k) (c) for 0 ≤ k ≤ N . and thus the c-expansion is unique. k! k=0 Then TN (x) is the unique polynomial function of degree at most N such that (k) TN (c) = f (k) (c) for all 0 ≤ k ≤ N .  By definition.+aN ((t−c)+c)N and use the binomial theorem to expand each ((t − c) + c)n in powers of t − c. so b2 = P 2(c) . P (k) (c) = k!bk . . simply write a0 +a1 t+a2 t2 +. applying Theorem 12. PN f (k) (c) f (k) (c) Proof. Differentiating (84) gives P 0 (x) = b1 + 2b2 (x − c) + 3b3 (x − c)2 + . Taylor’s Theorem Without Remainder For n ∈ N and c ∈ I ◦ . . 2.

f (n) (c) = g (n) (c). then f and g agree to order m at c. The following are equivalent: (i) We have f (c) = g(c). x→c n!(x − c) n! x→c x−c provided the latter limit exists. is equally clear. . only now have the n − 1 applications of L’Hôpital’s Rule been unconditionally justified). in particular the limit exists. x→c x−c x→c x−c x−c Thus assuming f (c) = g(c). 1685 . x→c (x − c)0 x→c The converse. Then (i) holds iff h(c) = h0 (c) = . . 2. although special cases were known to earlier mathematicians. h(x) 0 (i) =⇒ (ii): L = limx→c (x−c) n is of the form 0 . f and g agree to order 1 at c if and only f 0 (c) = g 0 (c). Proof. we may assume n ≥ 2.1 probably correctly.e. (ii) f and g agree to order n at c. Set h(x) = f (x) − g(x). that if f (c) = g(c) then limx→c f (x) − g(x) = 0. Let c be an interior point of I. so L’Hôpital’s Rule gives h0 (x) L = lim .. The following result gives the expected generalization of these two examples. and then we find f (x) − g(x) f (x) − f (c) g(x) − g(c) lim = lim − = f 0 (c) − g 0 (c).3 (only) that was proved by Brook Taylor. So we may work with h instead of f and g. Indeed. . by the exercise above both hypotheses imply f (c) = g(c). we have f (x) − g(x) 0 = lim = lim f (x) − g(x) = f (c) − g(c). Theorem 12. it is h(n) (c) = 0 (and. . this latter limit is still of the form 00 . .3. By Corollary 1Brook Taylor. Example 0: We claim that two continuous functions f and g agree to order 0 at c if and only if f (c) = g(c). Indeed. g : I → R be two n times differen- tiable functions. suppose that f and g agree to order 0 at c. It is generally attributed to Taylor. TAYLOR’S THEOREM WITHOUT REMAINDER 267 Exercise: Show: if m ≤ n and f and g agree to order n at c. Example 1: We claim that two differentiable functions f and g agree to order 1 at c if and only if f (c) = g(c) and f 0 (c) = g 0 (c). Since f and g are continuous. Thus (ii) holds. By our assumptions. We do so iff n > 2. Note that Taylor’s Theorem often refers to a later result (Theorem 12. . we apply L’Hôpital’s Rule n − 1 times. f 0 (c) = g 0 (c).4) that we call “Taylor’s Theorem With Remainder. Since we dealt with n = 0 and n = 1 above. so we may apply L’Hôpital’s Rule again. getting h(n−1) (x) h(n−1) (x) − h(n−1) (c)   1 L = lim = lim . (ii) =⇒ (i): Let Tn (x) be the degree N Taylor polynomial to h at c. so we may assume that. But the expression in parentheses is nothing else than the derivative of the function h(n−1) (x) at x = c – i. = (n) h(x) h (c) = 0 and (ii) holds iff limx→c (x−c) n = 0.1731 . so L = 0. x→c n(x − c)n−1 provided the latter limit exists.” even though it is Theorem 12. (Taylor) Let n ∈ N and f. In general.

Exercise: Let f be a function which is n times differentiable at x = c. which is slightly more than we want (or need) to assume. so the numerator must also approach zero (otherwise the limit would be infinite): evaluating the numerator at c gives h0 (c) = 0. h(x) and Tn (x) agree to order n at x = c: h(x) − Tn (x) lim = 0. But to assume this last limit exists and is equal to h(n) (0) is to assume that nth derivative of h is continuous at zero. so we must have Tn (c) = h(c) = 0.2. we have the limit of a quotient of continuous functions which we know exists such that the denominator approaches 0. f and Tn agree to order n at c. . . then limx→c (x−c)n would not exist. x→c (x − c)n x→c (x − c)n−1 As above. so by the just proved implication (i) =⇒ (ii).3 in the form: f and g agree to order n at c iff f − g vanishes to order n at c. x→c n! assuming this limit exists. + h (c) n! (x − c)n−1 lim = lim = 0.268 12. a function f : I → R vanishes to order n at c if limx→c (x−c) n = 0.3 in our new terminology. x→c (x − c) n! (n) so h (c) = 0. Note that this concept came up prominently in the proof of Theorem 12. Show that f − Tn vanishes to order n at x = c. And so forth: continuing in this way we find that the existence of the limit in (85) implies that h(c) = h0 (c) = .  Remark: Above we avoided a subtle pitfall: we applied L’Hôpital’s Rule n − 1 times h(x) 0 to limx→c (x−c) n .) Exercise: . = h(n−1) (c) = 0. by assumption h(x) agrees to order n with the zero function: h(x) lim = 0. x→c (x − c)n x→c (x − c)n Tn (x) Clearly limx→c Tn (x) = Tn (c). + n! (x − c)n lim = lim = 0. . (This is just driving home a key point of the proof of Theorem 12. so (85) simplifies to h(n) (c) n! (x − c)n h(n) (c) 0 = lim n = . x→c (x − c)n Moreover. x→c (x − c)n Subtracting these limits gives (85) h00 (c) h(n) (c) Tn (x) h(c) + h0 (c)(x − c) + 2 (x − c)2 + . and let Tn be its degree n Taylor polynomial at x = c. but the final limit we got was still of the form 0 – so why not apply L’Hôpital one more time? The answer is if we do we get that h(n) (x) L = lim . f (x) For n ∈ N. . Therefore h00 (c) (n) Tn (x) h0 (c) + 2 (x − c) + . . so if Tn (c) 6= 0. TAYLOR TAYLOR TAYLOR TAYLOR 12. .

Then: a) There is z ∈ |[c.b vanishes to order a − 1 at 0 but does not vanish to order a at 0. we put ||f || = supx∈[a. in which case fa. Taylor’s Theorem With Remainder To state the following theorem. let n X f (k) (c)(x − c)k Tn (x) = k! k=0 be the degree n Taylor polynomial of f at c. it will be convenient to make a convention: real numbers a. b ∈ Z+ . 3. (n + 1)! b) There exists z ∈ |[c. c) Show that fa. a1 = f 0 (c) and thus the expression of a function differentiable at c as the sum of a linear function and a function vanishing to first order at c is unique. 3. (ii) We may write f (x) = a0 + a1 (x − c) + g(x) for a function g(x) vanishing to order 1 at c. (Taylor’s Theorem With Remainder) Let n ∈ N. Moreover. fn. (n + 1)! c) If f (n+1) is integrable on |[c.n vanishes to order n at x = 0 but is not twice differentiable at x = 0. b] if a ≤ b and the interval [b. b) Show that if the equivalent conditions of part a) are satisfied. and let Rn (x) = f (x) − Tn (x) be the remainder.b (0) = 0. x]|. . TAYLOR’S THEOREM WITH REMAINDER 269 a) Show that for a function f : I → R. x]| such that f (n+1) (z) (87) Rn (x) = (x − c)n+1 . then we must have a0 = f (c). So |[a.4.b] |f (x)|. and consider the following function fa. b]| → R. let I be an interval. d) Deduce that for any n ≥ 2.b is twice differentiable at x = 0 iff a ≥ b + 3. Theorem 12. b]| we will mean the interval [a. then Z x n+1 f (t)(x − t)n dt Rn (x) = . For a function f : |[a.b is differentiable at x = 0 iff a ≥ 2. and let f : I → R be defined and n + 1 times differentiable. 0 b) Show that fa.b : R → R: x sin x1b  a  if x 6= 0 fa. b. Exercise:2 Let a. a] if b < a.b (x) = 0 if x = 0 a) Show that fa. the following are equivalent: (i) f is differentiable at c. Let c ∈ I ◦ and x ∈ I. x]| such that f (n+1) (z) (86) Rn (x) = (x − z)n (x − c). b]| is the set of real numbers lying between a and b. by |[a. c n! 2Thanks to Didier Piau for a correction that led to this exercise.

x]|.4]. and these can used to our advantage in a rather beautiful way. . 3Thanks to Nick Fink for pointing out the hypothesis of continuity seems to be needed here. a) First note that. the proof is left to the reader. But in fact the assumed differentiability properties of f easily translate into differentiability properties of S(t). x]|. We closely follow [S. x]| → R by S(t) = Rn. x)| such that (n+1) −f (z) Rn (x) S(x) − S(c) S 0 (z) n! (x − z)n f (n+1) (z) = = = = . At first thought (my first thought. we may consider the Taylor polynomial Tn. In fact this opens the door to a key idea: for any t ∈ I ◦ . at least) consideration of S(t) just seems to make things that much more complicated.4 (Taylor’s Theorem With Remainder) implies Theorem 12.t (x) = f (x) − Tn. k! k=0 To emphasize the dependence on t. this gives f (n+1) (z) Rn (x) = S(c) − S(x) = (x − z)n (x − c). x]: there is z ∈ |(c.t to f centered at t and thus the remainder function n X f (k) (t) (88) Rn. Thm. (x − c)n+1 g(x) − g(c) g 0 (z) −(n + 1)(x − z)n (n + 1)! so f (n+1) (z) Rn (x) = (x − c)n+1 . then Z x Z x (n+1) f (t)(x − t)n dt Rn (x) = −(S(x) − S(c)) = − S 0 (t) = . S is differentiable on |[c.c (x) = Rn (x). since f is n + 1 times differentiable on I. TAYLOR TAYLOR TAYLOR TAYLOR d) We have ||f (n+1) || |Rn (x)| = |f (x) − Tn (x)| ≤ |x − c|n+1 . the expression Rn (x) = f (x) − Tn (x) certainly depends upon the “expansion point” c ∈ I ◦ : thus it would be more accurate to write it as Rn. x]|: there is z ∈ |(c. (n + 1)! Proof.c (x). Differentiating both sides of (88) with respect to t gives n  (k+1) f (k) (t) f (n+1) (t)  0 0 X f (t) k k−1 S (t) = −f (t)− (x − t) + (x − t) =− (x−t)n .x (x) = 0 and S(c) = Rn. notwithstanding the notation chosen. 20. k! (k − 1)! n! k=1 We apply the Mean Value Theorem to S on |[c. x]|. we define a new function S : |[c. (n + 1)! c) If f (n+1) is integrable on |[c. x)| such that S(x) − S(c) −f (n+1) (z) = S 0 (z) = (x − z)n .t (x) = f (x) − (x − t)k .3 (Taylor’s Theorem) under the additional hypothesis that f (n+1) exists and is continuous3 on the interval |[c. n! b) Apply the Cauchy Mean Value Theorem to S(t) and g(t) = (x − t)n+1 on |[c.270 12.t (x).  Exercise: Show that Theorem 12. Indeed. x−c n! Noting that S(x) = Rn. c c n! d) This follows almost immediately from part b).

∞] so that f converges (at least) on (−R. Taylor Series 4. Question 12. do we have T (x) = f (x)? Notice that Question 12. in theory the value of R is given 1 by Hadamard’s Formula R1 = lim supn→∞ |an | n .5a) is simply asking for which values of x ∈ R a power se- ries is convergent. Show that f is infinitely differentiable and in fact f (n) (0) = 0 for all n ∈ N. Namely. A)? The answer is no: consider the function f (x) of Exercise X. Just as with power series. whereas Colin Maclaurin’s Theory of fluxions was not published until 1742 and makes explicit attribution is made to Taylor’s work. f (x) is infin- itely differentiable and has f (n) (0) = 0 for all n ∈ N. But I know no good reason for this – Taylor series were in- troduced by Taylor in 1721. n=0 n! Thus T (x) = limn→∞ Tn (x). It is traditional to call Taylor series centered around c = 0 Maclaurin series.X. so all of our prior work on power series applies. a question to which we worked out a very satisfactory answer in §X.5. so its Taylor series is 4Special cases of the Taylor series concept were well known to Newton and Gregory in the 17th century and to the Indian mathematician Madhava of Sangamagrama in the 14th century.5b): must f (x) = T (x) for all x ∈ (−A. In particular T (x) is a power series. R). . if necessary. n=0 n! indeed. ∞] centered at 0. the set of values x on which a power series converges is an interval of radius R ∈ [0. −1 Exercise: Define a function f : R → R by f (x) = e x2 for x 6= 0 and f (0) = 0. to get from this to the general case one merely has to make the change of variables x 7→ x − c. When dealing with Taylor series there are two main issues. where Tn is the degree n Taylor polynomial at c. More precisely.4 Us- ing separate names for Taylor series centered at 0 and Taylor series centered at c often suggests – misleadingly! – to students that there is some conceptual difference between the two cases. Fix a number A. We define the Taylor series of f at c to be ∞ X f (n) (c)(x − c)n T (x) = . Let f : I → R be an infinitely differentiable func- tion. We may then move on to Question 12.1. and let c ∈ I. 4. So we will not use the term “Maclaurin series” here.X. it is no real loss of generality to assume that c = 0. and in practice we expect to be able to apply the Ratio Test (or. Let f : I → R be an infinitely differentiable function and (n) P∞ (0)xn T (x) = n=0 f n! be its Taylor series. The Taylor Series. T (x) converges. TAYLOR SERIES 271 4. the Root Test) to compute R. a) For which values of x does T (x) converge? b) If for x ∈ I. in which case our series takes the simpler form ∞ X f (n) (0)xn T (x) = . 0 < A ≤ R such that (−A. Henceforth we assume that R ∈ (0. If R = 0 then T (x) only converges at x = 0 and we have T (0) = f (0): this is a trivial case. A) ⊂ I.

A). Indeed. as we have already seen: just apply the Ratio Test. We conclude that f (n) (0) = 1 for P∞ n all n and thus the Taylor series is n=0 xn! . In fact a . There are plenty of other examples. the function f (x) = ex is equal to its Taylor series expansion at x = 0: ∞ x X xn e = . as claimed. TAYLOR TAYLOR TAYLOR TAYLOR 0xn P∞ P∞ T (x) = n=0 n! = n=0 0 = 0. According to Taylor’s Theorem with Remainder. so in either way ||f (n+1) || ≤ e|x| . (n + 1)! where ||f (n+1) || is the supremum of the the absolute value of the (n+1)st derivative on the interval |[0. if we wish to try to show that a T (x) = f (x) on some interval (−A. one could interpret this to mean that we are not really interested in “most” infinitely differentiable functions: the special functions one meets in calculus..e.4 gives us ||f (n+1) || Rn (x) ≤ |x − c|n+1 . it converges for all x ∈ R to the zero func- −1 tion. we have f (x) = T (x) ⇐⇒ f (x) = lim Tn (x) n→∞ ⇐⇒ lim |f (x) − Tn (x)| = 0 ⇐⇒ lim Rn (x) = 0. Of course f (0) = 0. so Rn (x) → 0. engineering and analytic number theory are almost invariably equal to their Taylor series expansions. physics. Therefore f (x) 6= T (x) in any open interval around x = 0. Indeed. at least in some small interval around any given point x = c in the domain. hence every derivative of ex is just ex again. we use Taylor’s Theorem with Remainder to show that Rn (x) → 0 for each fixed x ∈ R. 4. advanced calculus. Theorem 12. n=0 n! First we compute the Taylor series expansion: f (0) (0) = f (0) = e0 = 1. and f 0 (x) = ex . n→∞ n→∞ So it comes down to being able to give upper bounds on Rn (x) which tend to zero as n → ∞. However. Finally. That’s the bad news. since Rn (x) = |f (x) − Tn (x)|. i. in a sense that we will not try to make precise here. x]|. we have a tool for this: Taylor’s Theorem With Remainder.2. But – lucky us – in this case f (n+1) (x) = ex for all n and the maximum value of ex on this interval is ex if x ≥ 0 and 1 otherwise. “most” infinitely differentiable functions f : R → R are not equal to their Taylor series expansions in any open interval about any point. So  n+1  x Rn (x) ≤ e|x| . Indeed. this will hold whenever we can show that the norm of the nth derivative ||f (n) || does not grow too rapidly. (n + 1)! And now we win: the factor inside the parentheses approaches zero with n and is being multiplied by a quantity which is independent of n. f (x) = e x2 6= 0.272 12. Easy Examples. In any case. Next note that this power series converges for all real x. Example: We claim that for all x ∈ R. but for x 6= 0.

and thus our work on the general properties of uniform convergence of power series (in particular the M -test) is not needed here: everything comes from Taylor’s Theorem With Remainder. we define f (x) = (1 + x)α . Suppose that for all A ∈ [0. Exercise: Prove Theorem 12. A]. 1). Then limx→−1+ f (x) = ∞. We give a case study: the binomial series. 4. For n ∈ Z+ . b) For all x ∈ R we have f (x) = T (x): that is.6. ∞) there exists a number MA such that for all x ∈ [−A. then f is defined and infinitely differentiable on (−1. using Theorem 12. Exercise: Suppose f : R → R is a smooth function with periodic derivatives: there exists some k ∈ Z+ such that f = f (k) . |f (n) (x)| ≤ MA .6. it is for α = 32 and it is not for α = 32 ). A little thought shows that the work we did for f (x) = ex carries over verbatim under somewhat more general hypotheses. For x ∈ (−1. Then f is just a polynomial. Case 3: Suppose α < 0. n=1 n! .6 and therefore is equal to its Taylor series expansion at x = c.4 to show Rn (x) → 0 may require nonroutine work. f (n) (x) = (α)(α − 1) · · · (α − (n − 1))(1 + x)α−n . Let f (x) : R → R be a smooth function.3. Example continued: we use Taylor’s Theorem With Remainder to compute e = e1 accurate to 10 decimal places. f is equal to its Taylor series expansion at 0. Let α ∈ R. 4. ∞) and on no larger interval than this. Depending on the value of α. Theorem 12. say on [−A. Case 1: Suppose α ∈ N. Even for familiar. A] and all n ∈ N. in particular f is defined and infinitely differentiable for all real numbers. The upshot of this discussion is that if α is not a positive integer. but in any case f is only hαi times differentiable at x = −1. elementary functions. so the Taylor series to f at c = 0 is ∞ X (α)(α − 1) · · · (α − (n − 1)) n T (x) = 1 + x . Case 2: Suppose α is positive but not an integer. (n) P∞ (0)xn a) The Taylor series T (x) = n=0 f n! converges absolutely for all x ∈ R. Show that f satisfies the hypothesis of Theorem 12. The Binomial Series. TAYLOR SERIES 273 moment’s thought shows that Rn (x) → 0 uniformly on any bounded interval.g. f may or may not be defined for x < −1 (e. so f (n) (0) = (α)(α − 1) · · · (α − (n − 1)). Of course we have f (0) (0) = f (0) = 1.

x) = x =1+ x . n=0 n The binomial series is as old as calculus itself. and this ought not to be surprising because for α ∈ N. show       α α−1 α−1 (89) = + . we rename the Taylor series to f (x) as the binomial series ∞   X α n B(α.7. put   α = 1. 0). n ∈ Z+ . and consider the binomial series ∞   ∞   X α n X α n B(α. 0   + α (α)(α − 1) · · · (α − (n − 1)) ∀n ∈ Z . (1 + x) = x . b) For all α > 0. = . 1) are divergent. Proof. n=0 n n=1 n a) For all such α. c) If α ∈ (−1. Theorem 12. the series B(α. x) converges. x) = x . Second. d) If α ≤ −1. Let α ∈ R \ N. B(α. x) = 1. we would like to show – if possible! – that for all x ∈ I. 1) and B(α. for each fixed α we wish to find the interval I on which the series B(α. −1) are absolutely convergent. n=0 n So let’s extend our definition of binomial coefficients: for any α ∈ R. n n! Exercise: For any α ∈ R.274 12. a) We apply the Ratio Test: .5 It remains one of the most important and useful of all power series. the radius of convergence of B(α. For us. 1) is nonabsolutely convergent. having been studied by Newton in the 17th century. expanding out T (x) simply gives the binomial theorem: α   α X α n ∀α ∈ N. then B(α. −1) and B(α. x) = (1 + x)α . our order of business is the usual one when given a Taylor series: first. TAYLOR TAYLOR TAYLOR TAYLOR If α ∈ N. we recognize the nth Taylor series coefficient as the binomial coefficient α n . n n−1 n Finally. the series B(α.

α .

.

.

.

.

.

α − n.

ρ = lim .

n+1 = lim .

= 1. .

.

.

α  .

n→∞ .

n + 1 .

.

n→∞ .

b) Step 0: Let a > 1 be a real number and m ∈ Z+ . we get  m a−1 a < . a a+m 5In fact it is older. Taking reciprocals. see [Co49]. Then we have  m  m a 1 m m a+m = 1+ ≥1+ >1+ = > 0. For an account of the early history of the binomial series. a−1 a−1 a−1 a a where in the first inequality we have just taken the first two terms of the usual (finite!) binomial expansion. . n so the radius of convergence is ρ1 = 1.

1) is absolutely convergent. x). if S(α. 1) are absolutely convergent for all α ∈ (0. it also shows that B(α. 1). Choose an integer m ≥ 2 such that m < β. we find ∞   ∞     X α n X α−1 α−1 S(α. c) If α ∈ (−1. so does S(α. so 1 that β ∈ (0. n+1 n n+1 P∞ α this shows simultaneously that the sequence of terms of B(α. we get m 2m (n − 1)m am−1 n < ··· 2m − 1 3m − 1 nm − 1 m 2m (n − 1)m m − 1 1 1 = ··· ··· ≤ m−1 2m − 1 (n − 1)m − 1 nm − 1 an n 1 α 1  It follows that an < 1 . 4.  . if S(α − 1. −1) and B(α. Then   α α(1 − α) · · · (n − 1 − α) 1 1 1 | |= < 1(1 − ) · · · (n − 1 − ) n n! m m n! m − 1 2m − 1 (n − 1)m − 1 1 1 = ··· = an . x). TAYLOR SERIES 275 1 Step 1: Suppose α ∈ (0. 1) converges. Choose an integer m ≥ 2 such that m < α. If α ≤ −1. x) for all n ∈ N. By the N th term test. 1) is | α n |. 0) and n ∈ N. x) (absolutely) converges. and hence nm   α n−β lim | | = lim bn · lim = 0 · 1 = 0. −1) and S(α. then |α − n| ≥ n + 1 and thus     α α α−n | / |=| | ≥ 1. x) (absolutely) converges. 1). −1) is absolutely convergent. n+1 n n+1 and thus α  n 6→ 0. d) The absolute value of the nth term of both B(α. 0). then     α α α−n / = ∈ (−1. they are thus absolutely convergent for all non-integers α > 0. n=1 n n=1 n − 1 n Thus for any fixed x. so nm n m X α X 1 | |≤ 1 < ∞. so | n | < 1+ 1 . Further. n n n n 1+ m This shows that B(α. 1) diverge. n (n − 1)! n n Arguing as in Step 1 of part b) shows that bn < 11 . −1) and S(α. Then   α (1 − β)(2 − β) · · · (n − 1 − β) n − β n−β | |= = bn . n→∞ n n→∞ n→∞ n Therefore the Alternating Series Test applies to show that S(α. x) = 1 + x = 1+ + xn = (1 + x)S(α − 1. 1) = n=0 n is decreasing in absolute value and alternating in sign. By an evident induction argument. m 2m (n − 1)m n n say. since | α α  n  n (−1) | = | n |. so does S(α + n. Since S(α. write α = β − 1. S(α. 1). Step 2: Using the identity (89). Using Step 0.

f (−1) = B(α.8. As usual. we have t ∈ (0. f (1) = B(α. 1).276 12. and consider its Taylor series at zero. But |cn (−1)| = |cn (1)|. S(α. We have taken Step 1 of part b) from [Ho66]. so also cn (−1) → 0 and thus Rn−1 (−1) → 0. a) By Theorem 12. Remark: There is an extension of the Ratio Test due to J. Remark: As the reader has surely noted. 1). so by the nth term test cn (xt) → 0 as n → ∞ and thus Rn−1 (x) → 0. f (x) = B(α. the convergence of the binomial series S(α. Z x n Z x f (t)(x − t)n−1 dt 1 Rn−1 (x) = = α(α−1) · · · (α−n+1)(1+t)α−n (x−t)n−1 dt. −1) diverges. so B(α. 1) such that α(α − 1) · · · (α − n + 1) Rn−1 (x) = (1 + θx)α−n (x − θx)n−1 (x − 0). x).L. x) = x . b) If α > −1. α−1 > −1. the binomial series ∞   X α n B(α. Moreover. c) If α > 0. Let α ∈ R \ N. x) at x = ±1 is a rather delicate and tricky enterprise. It follows that n=1 cn (xt) converges. c) IfPα > 0. 1 + θx n−1 Then ∞ X (1 + s)α−1 = cn (s) n=1 and Rn−1 (x) = cn (xt)αx(1 + θx)α−1 .  . [La] Let Tn−1 (x) be the (n − 1)st Taylor polynomial for f at 0. put Rn−1 (x) = f (x)−Tn−1 (x). there is θ ∈ (0. n=0 n a) For all x ∈ (−1. so |xt| < 1. TAYLOR TAYLOR TAYLOR TAYLOR Exercise*: Show that for α ∈ (−1. 0). Raabe which sim- plifies much the of the above analysis.7b). the binomial series B(α. then by Theorem 12. cn (s) = s . 1). −1) is convergent. 0 (n − 1)! (n − 1)! 0 By the Mean Value Theorem for Integrals. including the preceding exercise. In fact most texts at this level – even [S] – do not treat it.4b). ∞ so n=1 cn (1) converges and thus cn (1) → 0. (n − 1)! Put   1−θ α − 1 n−1 t= . 1). b) The above argument works verbatim if x = 1 and α > −1. let f (x) = (1 + x)α . x) = lim Tn−1 (x) n→∞ is the Taylor series expansion of f at zero. Theorem 12. P∞ Since x ∈ (−1. −1). Proof.

Then there is a unique polynomial P of degree at most n such that P (xi ) = yi for 0 ≤ i ≤ n.9. f2 .+Ak (xk −x0 ) · · · (xk −xk−1 ). we can give a more explicit formula for the interpolating polynomial of Theorem 12.10. . . Uniqueness: Suppose that f and g are two such polynomials.) The present result is an instance of this: (90) gives the general form of the solution. we must have h ≡ 0 and thus f = g. “three points determine a parabola”: given real numbers x1 < x2 < x3 and real numbers f1 . . < xn be real numbers. Indeed. < xn be real numbers. then algebraically solve for these parameters. . If we plug in x = x0 we find y0 = P (x0 ) = A0 . . there is a unique linear function f : R → R such that f (x1 ) = f1 and f (x2 ) = f2 . Plugging in x = x1 we find y1 = P (x1 ) = A0 + A1 (x1 − x0 ). . . and let y0 . x1 − x0 x1 − x0 Inductively. getting yk = P (xk ) = A0 +A1 (xk −x0 )+A2 (xk −x0 )(xk −x1 )+. + An (x − x0 ) · · · (x − xn−1 ). and let y0 . i6=j . . which we may solve uniquely for Ak .9. . Ak−1 . . let x0 < . . . . Proof. . . . while leaving certain parameters undetermined. but in our study of quadrature we will want to have it. For 0 ≤ j ≤ n. Then the polynomial h = f − g has degree at most n and vanishes at the n + 1 points x0 . The former is often easier. . HERMITE INTERPOLATION 277 5. (This is sometimes called the method of undetermined coefficients. f2 . we plug in x = xk .. . following Lagrange. having solved uniquely for A0 . Theorem 12. . . there is a unique quadratic polynomial P (x) = ax2 + bx + c such that P (xi ) = fi for i = 1. (Why didn’t we start with this? Because often it is useful to make a distinction between proving the existence of something and giving an explicit construction of it. f3 . define Y x − xn x − x0 x − xj−1 x − xj+1 x − xn `j (x) = = ··· ··· . yn be real numbers. . let x0 < . In a similar way. there are real numbers A0 . . . In fact we will not need the explicit formula for most of our work here. An such that (90) P (x) = A0 + A1 (x − x0 ) + A2 (x − x0 )(x − x1 ) + . Existence: Often the way to find something is to postulate (i. Since a nonzero polynomial cannot have more roots than its degree. (Lagrange Interpolation Formula) Let n ∈ Z+ . One version of this is: given real numbers x1 < x2 and real numbers f1 . . xn . whereas the latter is often more useful. 2. 3. The following is a generalization of this. (Polynomial Interpolation) Let n ∈ Z+ .) Theorem 12. which we can uniquely solve for A1 : explicitly y1 − A0 y1 − y0 A1 = = . . .  In fact. 5. .e. guess) a certain form that the solution should take. Hermite Interpolation It is a well-known fact that “two points determine a line”. . xj − xn xj − x0 xj − xj−1 xj − xj+1 xj − xn 0≤i≤n. yn be real numbers.

. (Generalized Polynomial Interpolation) Let r. + mr . The equation (91) follows.9 by allowing some of the xi ’s to coincide. Therefore P (x) is a polynomial of degree at most n. Here is a more general interpolation theorem. . if a polynomial f can be written as (x − c)m g(x) with g(c) 6= 0. But there is a way of salvaging the situation. j ≤ n. in which case they are simply redundant. Proof. . . . One of the merits of the formula is that they do not depend on the values y0 . up to mr instances of xr . Each `j (x) is a product of n + 1 − 1 = n linear polynomials. then f 0 = m(x − c)m−1 g(x) + (x − c)m g 0 (x) = (x − c)m−1 (mg(x) + (x − c)g 0 (x)) . . . ex- cept instead of taking powers of x − x0 for a fixed center x0 we are taking different points xi . . then the polynomial f 0 has a root of multiplicity m − 1 at c. . ≤ xn ? At first glance no: if for instance x1 = x2 then the conditions f (x1 ) = f1 and f (x2 ) = f2 are inconsistent unless f1 = f2 . m2 instances of x2 . i. might it be possible to generalize Theorem 12. Theorem 12. + mr entries altogether. m1 . . and for each 1 ≤ i ≤ r and 1 ≤ j ≤ mr . . . also immediately. Let’s put n + 1 = m1 + . xn – which in the lingo of this field are often called the interpolation nodes. . Then there is a unique polynomial P of degree at most n such that for all 1 ≤ i ≤ r and 0 ≤ j ≤ mi − 1. . . . j=0 Proof. let fij be a real number. The key observation is that multiple roots of a polynomial function show up as roots of the derivative. xr be real numbers. . so is a polynomial of degree n. To establish (91) the key observation is that for all 1 ≤ i. Now look back at (90): it is reminiscent of the degree n Taylor polynomial..11. . This shows that if a polynomial f has a root of multiplicity m ≥ 1 at a point c. More precisely. Thus our list of numbers contains m1 instances of x1 . . .  . and put n + 1 = m1 + . . . Let x1 < .e. TAYLOR TAYLOR TAYLOR TAYLOR Then an explicit formula for the unique polynomial P (x) of degree at most n such that P (xi ) = yi for all 0 ≤ i ≤ n of Theorem 12.. Observe that when we plug in x = c the expression mg(c)+(c−c)g 0 (c) = mg(c) 6= 0. f 0 (x2 ) = f2 . . . if you like. `j (xi ) is equal to 1 if i = j and 0 if i 6= j. + mr .but in fact it is immediate. mr ∈ Z+ . f (x2 ) = f2 ” when x1 = x2 then is to take the second condition as a condition on the derivative of f at x2 : i. < xn with x0 ≤ x1 ≤ . < xr . A reasonable way of repairing “f (x1 ) = f1 . .278 12.e. . the function y = f (x) that we are interpolating – but only on the values x0 . P (j) (xi ) = fij . . . replacing the condition x0 < .  The polynomials `j (x) are called Lagrange basis polynomials. . .. thus m1 + .. Just brainstorming then. and each occurs with a certain multiplicity mi . We ask the reader to just stop and think about this: the formula looks a little complicated at first. so it is natural to worry that it may not be so easy to see this. But it is probably clearer to switch to a different notation: suppose that we have r distinct real numbers x1 < . yn – or.9 is n X (91) P (x) = yj `j (x).

fn . P (xi ) = f (xi ): here we are using the above slightly shady convention that when the xi ’s occur with multiplicity greater than 1. counted with multiplicity.12. The set S is linearly independent: indeed. and let f : I → R be an n times differentiable function. the conditions P (xi ) = f (xi ) are actually conditions on the derivatives of P and f at xi . and let f : I → R be (n + 1) times differentiable. HERMITE INTERPOLATION 279 Let us now try to switch back to the old notation: we give ourselves n + 1 real numbers “with multiplicity”: x0 ≤ x1 ≤ . so S must be a basis for Pn : in paticular S spans Pn . . We write the interpolation problem as above as f (xi ) = fi . Then there is ζ ∈ I with f (n) (ζ) = 0. and assume that f has at least n+1 roots on I. . + An (x − x0 ) · · · (x − xn−1 ). We begin with one preliminary result. But Pn has dimension n. so a nontrivial linear indeendence relationship would allow us to write a nonzero polynomial as a linear combination of polynomials of smaller degree. ≤ xn ∈ I. xn – such that (x − x0 ) · · · (x − xn ) (n+1) (92) R(x) = f (x) − P (x) = f (ζ). . a and n + 1 real numbers f0 . . . since we have already shown the existence of an interpolating poly- nomial f of degree at most n. ≤ xn . there is ζ ∈ I – in fact.13. In this case we claim that the interpolation polynomial can be taken in the same form as above: namely. . . . Theorem 12. Let us define the remainder function: for x ∈ I. .11 as generalizing Theorem 12. (x − x0 ) · · · (x − xn−1 )} spans the R-vector space Pn of all polynomials of degree at most n. . Having billed Theorem 12. . = xn = c. let us now call atten- tion to the other extreme: suppose x0 = . This brings up a key idea in general: let I be an interval containing the points x0 ≤ . R(x) = f (x) − P (x). we will now give an expression for R which generalizes one form of Taylor’s Theorem With Remainder. . Exercise: Prove Theorem 12. . Namely. but with the understanding that when a root is repeated more than once. #S = n + 1. for all x ∈ I. say. . lying in any closed interval containing x.12. (n + 1)! . . Further. (Hermite With Remainder) Let x0 ≤ . the polynomials have distinct degrees. We define the Hermite inter- polation polynomial P (x) to be the unique polynomial of degree at most n such that for all 0 ≤ i ≤ n. Following [CJ]. Let P be the Hermite Interpolation Polynomial for f . it suffices to show that the set of polynomials S = {1. . . . . . Then the interpolating polynomial P is precisely the degree n Taylor polynomial at c to any n times dif- ferentiable function f with f (j) (c) = fj for 0 ≤ j ≤ n.9. Then. . which is absurd. . (Generalized Rolle’s Theorem) Let f : I → R be n times dif- ferentiable. . the further conditions are conditions on the derivatives of f . 5. At the moment we will prove this by linear algebraic considerations (which is cheat- ing: we are not supposed to be assuming any knowledge of linear algebra in this text!). there are unique real numbers A0 . An such that f = f (x) = A0 + A1 (x − x0 ) + A2 (x − x0 )(x − x1 ) + . ≤ xn . . x − x0 . Theorem 12. . . x0 . (x − x0 )(x − x1 ).

+ Am (x − x0 ) · · · (x − xm−1 ). There is a unique value of c such that K(x) = 0: namely. (x − x0 ) · · · (x − xn ) The function K : I → R thus vanishes at least n + 2 times on I with multiplicity. . . n! c) Show that there is a sequence {ζk } taking values in I such that f (n) (ζk ) An = lim . If x = xi for some i. Use part c) to recover the formula or the nth Taylor series coefficient. . . Let Pn (x) = A0 + A1 (x − x0 ) + . (n + 1)! It follows that (x − x0 ) · · · (x − xn ) (n+1) R(x) = f (ζ). (n + 1)!  Remark: Restricting to Taylor polynomials. R(x) c= . and let m ≤ n. k→∞ n! d) Suppose that f has a continuous (n + 1)st derivative. We may thus assume x 6= xi for any i. ≤ xk is Pm (x) = A0 + A1 (x − x0 ) + . Show that the Hermite interpolation polynomial for f with respect to the approximation points x0 ≤ . Let c ∈ R. then both sides of (92) are 0. b) Suppose xn−1 6= xn . Exercise: a) Let x0 ≤ . Show that there is ζ ∈ [x0 . + An (x − x0 ) · · · (x − xn−1 ) be the Hermite interpolation polynomial for a function f . . . and consider K(x) = R(x) − c(x − x0 ) · · · (x − xn ).280 12. . . so equality holds. xn ] such that f (n) (ζ) An = . . so f (n+1) (ζ) c= . our earlier argument for the existence of the interpolating polynomial is certainly easier: recall this consisted of simply writing down the answer and checking that it was correct. ≤ xn . so by the Generalized Rolle’s Theorem there is ζ ∈ I such that 0 = K (n+1) (ζ) = f (n+1) (ζ) − P (n+1) (ζ) − (n + 1)!c = f (n+1) (ζ) − (n + 1)!c. TAYLOR TAYLOR TAYLOR TAYLOR Proof. However this proof of part b) of Taylor’s Theorem with Remainder seems easier.

This leads to the following definition: if {fn }∞ n=1 is a sequence of real functions defined on some interval I and f : I → R is another function.Pas in the case of just one series. P∞ For us the following example is all-important: let f (x) = n=0 an xn be a power series with radius of convergence Pn R > k0.. P∞ Remark: There is similarly a notion of an infinite series of functions n=0 fn and of pointwise convergence of this series to some limit function f . . . Michael1 1. f1 . fn (x) → f (x). A sequence of real functions is a sequence f0 . to say that x lies Pin the open interval (−R. Then we get a sequence of functions f0 . . R) → R. but let us restrict its domain to (−R. Pointwise Convergence All we have to do now is take these lies and make them true somehow. with each fn a function from I to R. so each fn is a polynomial of degree at most n. . . CHAPTER 13 Sequences and Series of Functions 1.1. 1George Michael. fn . we just define Sn = f0 + . we say fn converges to f pointwise if for all x ∈ I. . Pointwise convergence: cautionary tales. fn . . Put fn = k=0 ak x . it seems like a reasonable strategy to define some sense in which the sequence {fn } converges to the function f . . (We also say f is the pointwise limit of the sequence {fn }. f1 . R) of convergence is to say n that the sequence fn (x) = k=0 ak xk converges to f (x). So f may be viewed as a function f : (−R. . . Indeed. . Indeed. . and its derivatives and antiderivatives can be computed term-by-term. + fn and say that n fn converges pointwise to f if the sequence Sn converges pointwise to f . . since in fact there is an evident sense in which the sequence of functions fn converges to f .. R). Let I be an interval in the real numbers. As above. – G. our stated goal is to show that the function f has many desirable prop- erties: it is continuous and indeed infinitely differentiable. . . therefore fn makes sense as a function from R to R. in such a way that this converges process preserves the favorable properties of the fn ’s. Since the functions fn have all these properties (and more – each fn is a polynomial). . The previous description perhaps sounds overly complicated and mysterious. 1963– 281 .) In particular the sequence of partial sums of a power series converges pointwise to the power series on the interval I of convergence.

so really as nice as a function can be). Exactly what they meant by this – and indeed. Let f0 be the constant function 1. 1] as follows. Let f3 be the function which is equal to f2 except f (1/3) = f (2/3) = 0. b]. Baire. 1] → R that has infinitely many discontinuities! (On the other hand. so fn (0) → 0. so that fn (x) → 0 for 0 < x < 1 and fn (1) → 1.282 13. the function f is not Riemann integrable: for any partition of [0. For any 0 < x ≤ 1. 1] → R. This is a theorem of R. It follows that the sequence of functions {fn } has a pointwise limit f : [0. To get from fn to fn+1 we change the value of fn at the finitely many rational numbers na in [0. Unfortunately the limit function is discontinuous at x = 1. b) contains both rational and irrational numbers. or we’ll get distracted from our stated goal of establishing the wonderful properties of power series. 1] its upper sum is 1 and its lower sum is 0. And second. the function which is 0 for 0 ≤ x < 1 and 1 at x = 1. Let f1 be the function which is constantly 1 except f (0) = f (1) = 0. whether even they knew exactly what they meant (presumably some did better than others) is a matter of serious debate among historians of mathematics. It is possible to construct a sequence {fn }∞ n=1 of polynomial functions converg- ing pointwise to a function f : [0.2 The problem is that statements like this unfortunately need not be true! Example 1: Define fn = xn : [0. To a modern eye.) One can also find assertions in the math papers of old that if fn converges to Rb Rb f pointwise on an interval [a. then the pointwise limit f must be Riemann integrable. that if f is Riemann integrable. The functions fn converges pointwise to a function f which is 1 on every irrational point of [0. Remark: Example 1 was chosen for its simplicity. Thus mathematicians spoke of functions fn “approaching” or “getting infinitely close to” a fixed function f . 1] and 0 on every rational point of [0. there are in fact two things to establish here: first that if each fn is Riemann integrable. SEQUENCES AND SERIES OF FUNCTIONS The great mathematicians of the 17th. But we had better not talk about this. its integral is the limit of the sequence of integrals of the fn ’s. Clearly fn (0) = 0n = 0. then a fn dx → a f dx. 1] → R. especially power series and Taylor series) and often did not hesitate to assert that the pointwise limit of a sequence of functions having a certain nice property itself had that nice property. 18th and early 19th centuries encountered many sequences and series of functions (again. And so forth. so it is Riemann integrable. 2This is an exaggeration. not to exhibit maximum pathol- ogy. The precise definition of convergence of real sequences did not come until the work of Weierstrass in the latter half of the 19th century. Since every open interval (a. it turns out that it is not possible for a pointwise limit of continuous functions to be discontinuous at every point. Let f2 be the function which is equal to f1 except f (1/2) = 0. 1]. In fact both of these are false! Example 2: Define a sequence {fn }∞ n=0 with common domain [0. Thus each fn is equal to 1 except at a finite set of points: in particular it is bounded with only finitely many discontinuities. Therefore the pointwise limit of a sequence of continuous functions need not be continuous. despite the fact that each of the functions fn are continuous (and are polynomials. the sequence fn (x) = xn is a geometric sequence with geometric ratio x. . 1] from 1 to 0. Thus a pointwise limit of Riemann integrable functions need not be Riemann integrable.

interchange of limit operations is justified. integration or differentiation. n1 ] the function forms a “spike”: 1 1 f ( 2n ) = 2n and the graph of f from (0. and its integral is the area of a triangle with 1 base n1 and height 2n: 0 fn dx = 1. Uniform Convergence Most of the above pathologies vanish if we consider a stronger notion of convergence. So Z 1 Z 1 lim fn = 1 6= 0 = lim fn . 2. On the other hand. and fn (x) = 0 for x ≥ n1 . n→∞ 0 0 n→∞ Example 4: Let g : R → R be a bounded differentiable function such that limn→∞ g(n) does not exist. As we can see: in general it does matter! This is not to say that the interchange of limit operations is something to be systematically avoided. 0 fn0 (x) = ng n(nx) = g 0 (nx). Then for all x ∈ R. so lim fn0 (1) = lim g 0 (n) does not exist. UNIFORM CONVERGENCE 283 Example 3: We define a sequence of functions fn : [0. there exists n ∈ Z+ such that for all n ≥ N . On the contrary. Let M be such that |g(x)| ≤ M for all x ∈ R. we are given  > 0 and must find an N ∈ Z+ which . |fn (x)| ≤ M n . We say fn converges uni- u formly to f and write fn → f if for all  > 0. for any fixed nonzero x. so fn conveges pointwise to the function f (x) ≡ 0 and thus f 0 (x) ≡ 0. 2n) is a straight line. |fn (x) − f (x)| < . it is an essential part of the subject. define g(nx) fn (x) = n . 0) to ( 2n . and we are wondering whether it changes things to perform the limiting process on each fn individually and then take the limit versus taking the limit first and then perform the limiting process on f . n→∞ n→∞ Thus lim fn0 (1) 6= ( lim fn )0 (1). differentiability. under some specific additional hypotheses. 2..e. we have some other limiting process corresponding to the condition of continuity. the N is allowed to depend both on  and the point x ∈ I. On the other hand this sequence converges pointwise to the zero function f . n→∞ n→∞ A common theme in all these examples is the interchange of limit operations: that is.) For n ∈ Z . But we need to develop theorems to this effect: i. as is the 1 1 graph of f from ( 2n . Let {fn } be a sequence of functions with domain I. How does this definition differ from that of pointwise convergence? Let’s com- pare: fn → f pointwise if for all x ∈ I and all  > 0. The only difference is in the order of the quantifiers: in pointwise convergence we are first given  and x and then must find an N ∈ Z+ : that is. and in “natural circumstances” the interchange of limit operations is probably valid. |fn (x) − f (x)|. (For instance. On the interval [0. hence RiemannR integable. 1] → R as follows: fn (0) = 0. we may take g(x) = sin( πx + 2 ). In particular fn is piecewise linear hence con- tinuous. integrability. In the defini- tion of uniform convergence. In particular f 0 (1) = 0. there exists N ∈ Z+ such that for all n ≥ N and all x ∈ I. 2n) to ( n . 0).

Suppose that for all n ∈ Z+ . and in particular if fn converges to f uniformly. limx→c fn (x) = Ln . Step 2: We show that limx→c f (x) = L (so in particular the limit exists!). Then f is continuous on I. we find that by taking x ∈ (c − δ. Then the sequence {Ln } is convergent. limx→c f (x) exists and we have equality: lim Ln = lim lim fn (x) = lim f (x) = lim lim fn (x).284 13. |fm (x) − fn (x)| < . 2.  Corollary 13. The following result is the most basic one fitting under the general heading “uniform convegence justifies the exchange of limiting operations. Since we don’t yet have a real number to show that it converges to. Moreover. we get |f (x) − L| <  and thus limx→c f (x) = L. Lemma 13. the first and last term will each be less than 3 for sufficiently large n. Actually the argument for this is very similar to that of Step 1: |f (x) − L| ≤ |f (x) − fn (x)| + |fn (x) − Ln | + |Ln − L|.2. Let {fn } be a sequence of functions with common domain I. n ≥ N and all x ∈ I we have |fm (x) − fn (x)| < 3 . Since fn (x) → Ln .” Theorem 13. n→∞ n→∞ x→c x→c x→c n→∞ Proof. the middle term will be less than 3 for x sufficiently close to c. n ≥ N and all x ∈ I. for any  > 0 there exists N ∈ Z+ such that for all m. The following are equivalent: (i) For all  > 0. Exercise: Prove Lemma 13. . Uniform Convergence and Inherited Properties. n ≥ N . By the Cauchy criterion for uniform convergence.1.1. x 6= c and m. u (iii) fn → f . Thus uniform convergence is a stronger condition than pointwise convergence. |fn+k (x) − fn (x)| < .1. there is N ∈ Z+ such that for all n ≥ N . say to the real number L. then certainly fn converges to f pointwise. it is natural to try to use the Cauchy criterion.3. all k ≥ 0 and all x ∈ I. Suppose u moreover that fn → f . Let fn be a sequence of continuous functions with common u domain I and suppose that fn → f on I. Combining these three estimates. the fact that fm (x) → Lm and fn (x) → Ln give us bounds on the first and last terms: there exists δ > 0 such that if 0 < |x − c| < δ then |Ln − fn (x)| < 3 and |Lm − fm (x)| < 3 . we have    |Lm − Ln | ≤ + + = . Step 1: We show that the sequence {Ln } is convergent. and let c be a point of I. Since Ln → L and fn (x) → f (x). c + δ). Now comes the trick: for all x ∈ I we have |Lm − Ln | ≤ |Lm − fm (x)| + |fm (x) − fn (x)| + |fn (x) − Ln |. 3 3 3 So the sequence {Ln } is Cauchy and hence convergent. hence to try to bound |Lm − Ln |. (Cauchy Criterion For Uniform Convergnece) Let fn : I → R be a sequence of functions. SEQUENCES AND SERIES OF FUNCTIONS works simultaneously (or “uniformly”) for all x ∈ I. (ii) For all  > 0. Overall we find that by taking x sufficiently close to (but not equal to) c. there is N ∈ Z+ such that for all m.

2 breaks down when applied to this non-uniformly convergent sequence. b) and fn (b) → f (b). P)| ≤ (b − a) +  + (b − a) = (2(b − a) + 1). So for any partition P = {a = x0 < x1 < .4. Let {fn } be a sequence of (Riemann-Darboux) integrable func- u tions with common domain [a. b]. By uniform convergence. d])| ≤ . Let x ∈ I. Theorem 13. since f → f . n−1 X |U (fn . P)|+|U (fn . P)−L(f. P)| ≤ (b − a). P)−L(f. Then f is integrable and Z b Z b Z b lim fn = lim fn = f. i=0 and similarly. a) Show directly from the definition that the convergence of fn to f is not uniform. d] ⊂ [a. we get    |f (x) − f (c)| < + + = . Fix  > 0. by Darboux’s Criterion there is a partition P of [a. |fn (x) − f (x)| < . there is N ∈ Z+ such that for all n ≥ N and all x ∈ [a. b].3 is easier than Theorem 13. P) − L(f. we include a separate proof. b] such that U (fN . [c. Further. Thus |U (f. there exists n ∈ Z+ such that |f (x) − fn (x)| < 3 for all x ∈ I: in particular |fn (c) − f (c)| = |f (c) − fn (c)| < 3 . since fn (x) is continuous. [xn . b] → R be a sequence of functions. [c. Exercise: Let fn : [a. Consolidating these estimates.2. there exists δ > 0 such that for all x with |x − c| < δ we have |fn (x) − fn (c)| < 3 . it follows that for any subinterval [c. d]) − inf(f. b]. 2. thus we need to show that for any  > 0 there exists δ > 0 such that for all x with |x − c| < δ we have |f (x) − f (c)| < .  3 3 3 Exercise: Consider again fn (x) = xn on the interval [0. b) Try to pinpoint exactly where the proof of Theorem 13. n→∞ a a n→∞ a u Proof. Step 1: We prove the integrability of f . | inf(fn . < xn−1 < xn = b} of [a. P)| ≤ | sup(fn . The idea – again! – is to trade this one quantity for three quantities that we have an immediate handle on by writing |f (x) − f (c)| ≤ |f (x) − fn (x)| + |fn (x) − fn (c)| + |fn (c) − f (c)|. UNIFORM CONVERGENCE 285 Since Corollary 13. . P) − L(fN . P)−U (fn . xn+1 ]) − sup(f. d])| ≤ . b]. 1]. [c. 1) and 1 at x = 1. P)|+|L(fn . We need to show that limx→c f (x) = f (c). P) < . P)| ≤ |U (f. | sup(fn . [xn . Proof. We saw in Example 1 above that fn converges pointwise to the discontinuous function f which is 0 on [0. b] and n ≥ N . . Since fN is integrable. Show TFAE: u (i) fn → f on [a. P) − U (f. d]) − sup(f. |L(fn . P)−L(fn . xn+1 ]|(xi+1 − xi )| i=0 n−1 X ≤ (xi+1 − xi ) = (b − a). [c. . u (ii) fn → f on [a. Suppose that fn → f .

certainly fn0 → g on [a.5. limn→∞ fn0 (1) does not exist.3 g is continuous. We will give a relatively simple result sufficient for our coming applications. Corollary 13. But as we saw above.6. Let {fn }∞ n=1 be a sequence of functions on [a. g : [a. b]. For each n. Then n=0 Fn → F . b].  Exercise: It follows from Theorem 13. or in other words ( lim fn )0 = lim fn0 . Now applying Theorem 13. For this we should reconsider Example 4: let g : R → R be a bounded differentiable function such that limn→∞ g(n) does not exist. n=0 n=0 . b] to some function f . then Z b Z b Z b Z b | f− g| = | f − g| ≤ |f − g| ≤ (b − a). b] → R be the 0 unique function with Fn = fn and Fn (a) = 0. n→∞ n→∞ u u Proof.4 and the Fundamental Theorem of Calculus we have Z x Z x Z x 0 g= lim fn = lim fn0 = lim fn (x) − fn (a) = f (x) − f (a).286 13. Darboux’s Criterion shows f is integrable on [a. We suppose: (i) Each fn is continuously differentiable on [a. Exercise: Prove Corollary 13. b] such that n=0 fn → f . so fn → 0. Our next order of business is to discuss differentiation of sequences of functions. we just need suitable hypothe- ses. Then f is differentiable and f 0 = g. and similarly let F : [a. Well. b]. Verify this directly. let Fn : [a. b]. (ii) The functions fn converge pointwise on [a. Then for all x ∈ R.4 that the sequences in Examples 2 and 3 above are not uniformly convergent. b] → R are integrable and |f (x) − g(x)| ≤  for all x ∈ [a. b] to some function g. Theorem 13. b].  P∞ Corollary 13. 0 Then f is differentiable and f = g: ∞ X ∞ X (93) ( fn )0 = fn0 . b]. we get g = (f (x) − f (a))0 = f 0 . Let x ∈ [a. and (iii) The functions fn0 converge uniformly on [a. Since 0 each fn is continuous. Let {fn } be a sequence of continuous functions defined on P∞ u the interval [a. What we want is true in practice. Since fn0 → g on [a. x]. and let fn (x) = g(nx) n . Let n=0 fn (x) be a series of functions converging pointwise P∞ u to f (x). Thus we have shown the following somewhat distressing fact: uniform conver- gence of fn to f does not imply that fn0 converges. a a n→∞ n→∞ a n→∞ Differentiating and applying the Fundamental Theorem of Calculus. SEQUENCES AND SERIES OF FUNCTIONS Since  > 0 was arbitrary.5. Step 2: If f. don’t panic. Suppose that each fn0 is continuously differentiable and n=0 fn0 (x) → g. |fn (x)| ≤ n . Let M be M u such that |g(x)| ≤ M for all R. by Corollary 13. b] → R be the P∞ u unique function with F 0 = f and F (a) = 0. The details are left to you.7. a a a a u Rb Rb From this simple observation and Step 1 the fact that fn → f implies a fn → a f is almost immediate.

8. If there is g : [a. Since fn → f . b]. 2 2 By the Cauchy criterion. Theorem 13. n ≥ N implies |fm (x0 ) − 0 fn (x0 )| 2 and |fm (t) − fn0 (t)| < 2(b−a)  for all t ∈ [a. pp. b] and all m. b] to some function f . The Weierstrass M-test.6 (or more precisely. t ∈ [a. Thus Corollary 13. and choose N ∈ Z+ such that m. b] such that {fn (x0 )} is convergent for some x0 ∈ [a. so once again by the Cauchy criterion ϕn converges uniformly for u u all t 6= x.   |fm (x) − fn (x)| = |g(x)| ≤ |g(x) − g(x0 )| + |g(x0 )| < + = . Although Theorem 13. we say that the se- ries can be differentiated termwise or term-by-term. b]. b] and define fn (t) − fn (x) ϕn (t) = t−x and f (t) − f (x) ϕ(t) = . Proof.7 gives a condition under which a series of functions can be differentiated termwise. fn is uniformly convergent on [a.2 on the interchange of limit operations: f 0 (x) = lim ϕ(t) = lim lim ϕn (t) = lim lim ϕn (t) = lim fn0 (x). t−x so that for all n ∈ Z+ .152-153] Step 1: Fix  > 0. of its derivatives) has many pleasant consequences. b]. The latter inequality is telling us that the derivative of g := fm − fn is small on the entire interval [a. 2(b − a) 2 It follows that for all x ∈ [a. n ≥ N . 2.     (94) |g(x) − g(t)| = |x − t||g 0 (c)| ≤ |x − t| ≤ . [R. Applying the Mean Value Theorem to g. we get ϕn → ϕ for all t 6= x. limx→t ϕn (t) = fn0 (x).7) will be sufficient for our needs. Finally we apply Theorem 13. t→x t→x n→∞ n→∞ t→x n→∞  2. n ≥ N . Corollary 13. we cannot help but record the following stronger version. 0 P P P 0 When for a series n fn it holds that ( n fn ) = n fn .2. Now by (94) we have  |ϕm (t) − ϕn (t)| ≤ 2(b − a) for all m. b] → R such that u u fn0 → g on [a. Let {fn } be differentiable functions on the interval [a. The next order of business is to give a useful general criterion for a sequence of functions to be uniformly convergent. We have just seen that uniform convergence of a sequence of functions (and possibly. b] and f 0 = g. Step 2: Now fix x ∈ [a. we get a c ∈ (a.7. b]. b] → R such that fn → f on [a. b]. then there is f : [a. b) such that for all x. UNIFORM CONVERGENCE 287 Exercise: Prove Corollary 13. .

288 13. P . (Weierstrass M-Test) Let {fn }∞ n=0 be a sequence of functions defined on an interval I. x∈I In (more) words. Then n=0 fn is uniformly convergent. For x ∈ I.PLet {Mn }∞n=0 be a non-negative sequence such that ||fn || ≤ ∞ P∞ Mn for all n and M = n=0 Mn < ∞. Since n Mn < ∞. Let SN (x) = n=0 fn (x). n>N Mn < . we define ||f || = sup |f (x)|. Theorem 13. ∞] such that |f (x)| ≤ M for all x ∈ I. SEQUENCES AND SERIES OF FUNCTIONS For a function f : I → R. PN P Proof. for each  > 0 there is N0 ∈ Z+ such that for all N ≥ N0 .9. N ≥ N0 and k ∈ N. ||f || is the least M ∈ [0.

.

.

NX +k .

X X |SN +k (x) − SN (x)| = .

fn (x).

. ≤ |fn (x)| ≤ Mn < .

.

.

.

f is in fact infinitely differentiable. (Wonderful Properties of Power Series) Let P n=0 an xn be a ∞ power series with radius of convergence R > 0. Pn Thus by the Weierstrass M -test f is the uniform limit of the sequence Sn (x) = k=0 ak xk . d) For all n ∈ N. in order to show that f = n an xn = n fn is differentiable and the derivative may be compuited termwise. c) We have just seen that for a power series f convergent on (−R. By X. Since any x ∈ (−R.10. n=1 0 c) Since the power series f has the same radius of convergence R > 0 as f . R) lies in [−A.7 applies to show f 0 (x) = n=0 nan xn−1 . because power series converge absolutely on the interior of their interval of convergence. so f defines a function from [−A. hence by the result of part a) it is uniformly P∞ convergent on [−A.2 f is continuous on [−A. P A]. f is continuous P on (−R. b) f is differentiable. we compute that n fn0 = n (an xn ) = n nan−1 xn−1 . hence continuous and infinitely differentiable.7. b) According to Corollary 13. Then: a) f is continuous. R). A] to R.X. it is enough to check that (i) each fn is continuously differentiable and (ii) n fn0 is uniformly convergent. since fn = an xn – of course Pmonomial P functions are P continuously differentiable. its derivative may be computed termwise: X∞ f 0 (x) = nan xn−1 . and thus n n P n ||a n x || = n |an |A < ∞. Therefore Corollary 13. this power series also has radius of convergence R. Consider f (x) = n=0 an xn as a function f : (−R. Indeed. A] for some 0 < A < R. P But (i) is trivial. P R). But each Sn is a polynomial function. f (n) (0) = (n!)an . So by Theorem 13. R) → R. So we may continue in . As for (ii). Proof. A]. Power Series II: Power Series as (Wonderful) Functions P∞ Theorem 13. Morever. We claim that the series n an xn converges to f uniformly on [−A. A]. n=N +1 n>N n>N Therefore the series is uniformly convergent by the Cauchy criterion. its derivative f 0 is also given by a power series convergent on (−R.  3. as a function on [−A. A]. we have ||an xn || = |an |An . a) Let P 0 < A < R. R).

Ra ). Suppose that for some δ with 0 < δ ≤ Ra we have f (x) = g(x) for all x ∈ (−δ. Suppose that f (xn ) = g(xn ) for all n ∈ Z+ . often too much to hope for.  Exercise: Prove Theorem 13. P∞ Exercise: Let f (x) = n=0 an xn be a power series with an ≥ 0 for all n. Let {xn }∞ n=1 be a sequence of elements of (−A. (ii) We have an = 0 for all sufficiently large n.10d). (ii) The power series converges uniformly on [0. A) \ {0} such that limn→∞ xn = 0. The upshot of Corollary 13. Exercise: Let n an xn be a power series with infinite radius of convergence. 1). Show that the following P are equivalent: (i) f (1) = n an converges. hence P P f : R → R.11. since in general n=0 an = n=0 bn does not imply an = bn for all n. We leave this as an exercise to the reader. Then an = bn for all n. d) The formula f (n) (0) = (n!)an is simply what one obtains by repeated termwise differentiation. This is not obvious. so that f defines a function on (−1. 1]. Suppose that the radius of convergence is 1. then ∞ an n+1 F (x) = n=0 n+1 x is an anti-derivative of f . an xn with positive radius of conver- P The fact that for any power series f (x) = n (n) f (0) gence we have an = n!yields the following important result. The following exercise drives home that uniform convergence of a sequence or series of functions on all of R is a very strong condition. δ). (iii) f is bounded on [0. P∞ Exercise:PShow that if f (x) = n=0 an xn has radius of convergence R > 0. POWER SERIES II: POWER SERIES AS (WONDERFUL) FUNCTIONS 289 this way: by induction. Exercise: Suppose f (x) = n an xn and g(x) = n bn xn are two power series each P P converging on some open interval (−A. . Another way of saying this is that the only power series a function can be equal to on a small interval around zero is its Taylor series. (Uniqueness Theorem) Let f (x) = n an xn and g(x) = P P Corollary n n bn x be two power series with radii of convergence Ra and Rb with 0 < Ra ≤ Rb . 13. A). Show that the following are equivalent: defining a function (i) The series n an xn is uniformly convergent on R. derivatives of all orders exist. so that both f and g are infinitely differentiable functions on (−Ra . 3.11 is that the only way that two power series can be equal as functions – even in some very small interval around zero P∞ – is if allPof∞their coefficients are equal. Show that an = bn for all n. 1).

.

3. In particular if one learns about Fourier series or complex analysis then very natural proofs can be given. Bn > 0 for all n ≥ 1. Precisely. no tools beyond standard freshman calculus).1 have been given. and this certainly suffices. Among these we give here a particularly nice argument due to D. we have (96) An = (2n − 1)nBn−1 − 2n2 Bn . Since then several branches of mathematical analysis have been founded which give systematic tools for finding sums of this and similar series. but both of these topics are beyond the scope of an honors calculus course. Proof. Recall that the p-series n=1 np converges iff p > 1. Bn = x2 cos2n xdx. 0 2n − 1 2n b) For all positive integers n. in the intervening centuries literally hundreds of proofs of Theorem 14. Matsuoka [Ma61]. For all positive integers N . some of which use only tools we have developed (or indeed. we have: Z π 2 An An−1 (95) sin2 x cos2(n−1) xdx = = . a) Integrating by parts gives Z π2 An = cos x cos2n−1 xdx 0 291 . a) For all positive integers n. It is another matter entirely to determine the sum of the series exactly. (Euler) n=1 n12 = π6 . For n ∈ Z+ . On the other hand. In this section we devote our attention to the p = 2 case. Theorem 14. N π2 X 1 π2 0≤ − ≤ . n=1 n2 = 6 P∞ 1 Let p ∈ R. In fact this argument barely uses notions from infinite series! Rather. P∞ 2 Theorem 14.2. 0 0 Exercise: Show that An . 6 n=1 n2 4(N + 1) The proof will exploit a family of trigonometric integrals. 2 PN it gives an upper bound on π6 − n=1 n12 in terms of N which approaches 0 as N → ∞. Lemma 14. CHAPTER 14 Serial Miscellany π2 P∞ 1 1. we will show the following result. we define Z π2 Z π2 An = cos2n xdx. Daners [Da12] following Y. Euler’s original argument is brilliant but not fully rigorous by modern standards.1.

Thus Z π 2 An sin2 x cos2(n−1) xdx = = An−1 − An . 0 2 0 24 we have 2B0 π2 = . N X 1 π2 2BN 2 = − . n=1 n 6 AN Equivalently N π2 X 1 2BN (97) − 2 = > 0. 2n 2n 2n − 1 2n 2n − 1 2n − 1 b) Integrating by parts twice gives1 Z π2 Z π 2 2n An = 1 · cos xdx = 2n x sin x cos2n−1 xdx 0 0 Z π 2   =n x2 cos x cos2n−1 x − (2n − 1) sin2 x cos2(n−1) x dx 0 Z π 2 = −nBn + n(2n − 1) x2 (1 − cos2 x) cos2(n−1) xdx 0 = (2n − 1)nBn−1 − 2n2 Bn . we get 1 (2n − 1)Bn−1 2Bn 2Bn−1 2Bn 2 = − = − . 0 . 0 2n − 1 and     An−1 1 An 1 (2n − 1)An + An An = An + = = . B0 = x2 dx = .  Dividing (96) by n2 An and using (95). n nAn An An−1 An Thus for all n ∈ Z+ we have N N   X 1 X 2Bn−1 2Bn 2B0 2BN 2 = − = − . n=1 n n=1 An−1 An A0 AN Since π π π3 Z Z 2 π 2 A0 = dx = . 6 n=1 n AN π 1This time we leave it to the reader to check that the boundary terms uv | 2 evaluate to 0. SERIAL MISCELLANY Z π π 2 = (sin x)(cos2n−1 x) |02 − (sin x)((2n − 1) cos2(n−1) x)(− sin x)dx 0 Z π Z π 2 2 = (2n − 1) sin2 x cos2(n−1) xdx = (2n − 1) (1 − cos2 x) cos2(n−1) xdx 0 0 = (2n − 1)(An−1 − An ).292 14. A0 6 and thus for all N ∈ Z+ .

... + n + .. . AN 2 0 AN 2 2(N + 1) 4(N + 1) which proves Theorem 14. P∞ Namely. + n + . . the corresponding change in the sequence An of partial sums is not simply a reordering.. 2.. In this section we systematically investigate the validity of the “commutative law” for infinite sums. 2 Exercise: Prove Lemma 14. L + ). For instance. there are only finitely many terms of the sequence lying outside the interval (L − .2. REARRANGEMENTS AND UNORDERED SUMMATION 293 Lemma 14. 4 2 8 2 Then the first partial sum of the new series is 14 ..1. 2. the formal notion of rearrangement of a series n=0 an begins with a . Namely.3a) with N = n − 1 we get N Z π2 π2 X 1 2BN 2 0< − 2 = = x2 cos2N xdx 6 n=1 n A N A N 0 Z π 2  π 2 2 2  π 2 AN π2 ≤ sin2 x cos2N xdx = = .. the definition we gave for convergence of an infinite series a1 + a2 + . + an + . Thus there is some evidence to fuel suspicion that reordering the terms of an infi- nite series may not be so innocuous an operation as for that of an infinite seuqence. For all x ∈ [0. we have:   2 x ≤ sin x π and thus  π 2 x2 ≤ sin2 x. in terms of the limit of the sequence of partial sums An = a1 + . . π2 ]. . if we reorder 1 1 1 1 + + + . with a precision that might otherwise look merely pedantic. . The Prospect of Rearrangement. .4 and Lemma 14. + an makes at least apparent use of the ordering of the terms of the series. However.4. 2 4 8 2 as 1 1 1 1 + + + .4. as one can see by looking at very simple examples. Rearrangements and Unordered Summation 2. a description which makes clear that convergence to L will not be affected by any reordering of theP terms of the ∞ sequence. Note that this is somewhat surprising even from the perspective of infinite sequences: the statement an → L can be expressed as: for all  > 0. (Hint: use convexity!) Using Lemma 14. whereas every nonzero partial sum of the original series is at least 12 . All of this discussion is mainly justification for our setting up the “rearrangement problem” carefully. if we reorder the terms {an } of an infinite series n=1 an .

N } .. for all sufficiently large) N . so for all S. nk } be any finite subset of the natural numbers. n=0 S S n=0 The case of absolutely convergent series follows rather easily from this. . . and the case of nonabsolute convergence is where all the real richness and subtlety lies. let N be P PN the largest element of S. P∞ The point here is that the description n=0 an = supS AS is manifestly unchanged by rearranging the terms of the series by any permutation σ: taking S 7→ σ(S) gives a bijection on the set of all finite subsets of N. Proof.6 is that we have expressed the sum of a series with non- negative terms in a way which is manifestly independent of the ordering of the terms:2 for any bijection σ of N. N } for some N ∈ N. + ank . P∞ Question 14.N } : in other words. P Lemma 14. Let n an be a real series with an ≥ 0 for all n P and sum A ∈ [0.294 14. ∞ Pk is simply the supremum of the set An = n=0 ak of finite sums. Then: P ∞ a) Does the rearranged series n=0 aσ(n) converge? b) If it does converge. . the case of absolute convergence is not much harder than that. The Rearrangement Theorems of Weierstrass and Riemann. Let n=0 an = S is a convergent infinite series. . ... On the other hand. . Since an ≥ 0 for all n. Let A be the supremum of the finite sums AS .5. AN = a0 + . for any nonempty finite subset S of N.  The point of Lemma 14. Since AN = A{0. ∞].. We define the rearrange- σ of N. . . For N ∈ N. Then A is the supremum of the set of all finite sums AS = n∈S an as S ranges over all nonempty finite subsets of N. P∞ Indeed. suppose that an ≥ 0 for all n.. 2. SERIAL MISCELLANY permuation P∞ P∞σ : N → N. the sequence AN is increasing.. so A = limN →∞ AN = supN AN . does it converge to S? As usual. AS ≤ AN for some (indeed. . . . . + aN = A{0. N } so AS = n∈S an ≤ n=0 aN = AN ≤ A. the special case in which all terms are non-negative is easiest. and thus ∞ X ∞ X an = sup AS = sup Aσ(S) = aσ(n) .. each partial sum AN arises as AS for a suitable finite subset S. nk } ranges over all finite subsets 2This is a small preview of “unordered summation”. This shows that if we define A0 = sup AS S as S ranges over all finite subsets of N. let PN AN = n=0 an .. . In this case the sum A = n=0 an ∈ [0. . we have A = supN AN ≤ supS AS = A. the subject of the following section. The most basic questions on rearrangements of series are as follows. .2.6. On the other hand. Then S ⊂ {0. . Therefore A ≤ A0 and thus A = A0 . and put AS = an1 + . . so A = sup AS ≤ A. and let σ be a permutation of N. . i. then A0 ≤ A. let S = {n1 . Now every finite subset S ⊂ N is contained in {0.e. More gener- ally. . . . for all N ∈ N. a bijective function ment of n=0 an by σ to be the series n=0 aσ(n) . as S = {n1 . Thus A = A..

. there exists a permutation σ absolutely convergent of N such that n=0 aσ(n) = B. M X M X ∞ X ∞ X | aσ(n) − A| = | aσ(n) − an | ≤ |an | < .  Exercise: a) Give a proof of Step 1 of Theorem 14. . . . .e. aN0 −1 appear in both PM P∞ n=0 aσ(n) and n=0 an and thus get cancelled. 2.7. rather. . ≥ pn ≥ . For this it is convenient to imagine that the sequence {an } has been sifted into a disjoint union of two subsequences. So we may specify a rearangement as follows: we specify a choice of a certain number of positive terms – taken in decreasing order – and then a choice of a certain number of negative terms – taken in order of decreasing absolute value . P∞ Theorem 14. . nn } comprise the terms of the series.30 which tells us that since the convergence is nonabsolute. n1 ≤ n2 ≤ . . For any B ∈ [−∞. one consisting of the positive terms and one consisting of the negative terms (we may assume without loss of generality that there an 6= 0 for all n).7 that bypasses Lemma 14. Thus we have two sequences p1 ≥ p2 ≥ .6. we may even imagine both of these subseqeunce ordered so that they are decreasing in absolute value. so does σ(S) = {σ(n1 ). . P P + b) Use the decomposition P − of n an into its series of positive parts n an and negative parts n an to give a second proof of Step 2 of Theorem 14. ∞]. P∞ P∞ Proof. Let M0 ∈ N be sufficiently large so that the terms aσ(0) . Then for every permutation σ of N. . so we may choose M such that |an | ≤ M for all n. . ≤ nn ≤ . aN0 −1 (and possibly others). ∞]. REARRANGEMENTS AND UNORDERED SUMMATION 295 of N. Put A = n=0 an . aσ(M0 ) include all the terms a0 . . It follows that ∞ X ∞ X an = aσ(n) ∈ [0.8. n=N |aσ(n) | < . σ(nk )}. The key point Phere is Proposition P 11. rearrangement of a series with non-negative terms does not disturb the conver- gence/divergence or the sum. we are going to describe σ by a certain process. . (Weierstrass) Let n=0 an be an absolutely convergent P∞ series with sum A. ≥ 0. If we like. . .7. . . . ≤ 0 so that together {pn . . n nn = −∞. . n=0 n=0 n=0 n=N0 Indeed: by our choice of M we know that the terms a0 . . we have an → 0 and thus that {an } is bounded.. argue that for each  > 0 and all sufficiently large N . (Suggestion: P∞by reasoning as in Step 2. Then for all M ≥ M0 . . but by applying the triangle inequality and summing the absolute values we get an upper bound by assuming no further cancellation. We are not going to give an explicit “formula” for σ. P Step 1: Since n an is convergent. Fix  > 0 and let N0 ∈ N be such that n=N0 |an | < . (Riemann Rearrangement Theorem) Let n=0 an be a non- P∞ series. . P∞ Theorem 14. the rearranged series n=0 aσ(n) converges to A. n pn = ∞. This shows P∞ PM n=0 aσ(n) = limM →∞ n=0 aσ(n) = A. some further terms may or may not be cancelled. Proof. n=0 n=0 i. .

and so on. and so forth. Then we take one more negative term n2 . and note that the partial sum is still at least 2. . at least k negative terms.8 holds under somewhat milder hypotheses. . SERIAL MISCELLANY – and then a certain number of positive terms. As long as we include a finite. a little thought shows that the absolute value of the difference between the partial sums of the series and B approaches zero. Then we take at least one more positive term pN1 +1 and possibly further terms until we arrive at a partial sum which is at least M + 2. ∞]. .296 14. Step 3 (diverging to −∞): P An easy adaptation of the argument of Step 2 leads to ∞ a permutation σ such that n=0 aσ(n) = −∞. . With these issues in mind. + nN2 is less than B. And we continue in this manner: after the kth step we have used at least k positive terms. we can make the series diverge to ±∞ or converge to any given real number! Thus nonabsolute convergence is necessarily of a more delicate and less satisfactory P nature than absolute convergence. even if 0 > B. b) Does the conclusion of part a) hold without the assumption that the sequnece of terms is bounded? Theorem 14. then in the end we will have included every term pn and nn eventually. nN2 . Exercise: Let n an be a real series such that n a+ P P n = ∞. Then we repeat the process. a) Suppose that the sequence P {a n } is bounded. and a series to be conditionally convergent if it is convergent but not unconditionally convergent. Since |n1 | ≤ M .8 holds: for any A ∈ [−∞. p2 . . + pN1 + n1 is still at least 1. We first take positive terms p1 . Step 4 (converging to B ∈ R): if anything. . . (Main Rearrangement Theorem) A convergent real series is unconditionally convergent if and only if it is absolutely convergent. Show that the conclusion of Theorem P∞ 14. Because the general term an → 0. taking enough positive terms to get a sum strictly larger than B then enough negative terms to get a sum strictly less than B. there exists a permutation σ of N such that n=0 aσ(n) = A.) Then we take negative terms n1 . positive number of terms at each step.9. . + pN1 is greater than B. pN1 . Then much of our last two theorems may be summarized as follows. Show that there exists a permuta- tion σ of N such that n aσ(n) = ∞. . Theorem 14. hence we will get a rearrangement.8 exposes the dark side of nonabsolutely convergent series: just by changing the order of the terms. . . (To be sure. . then we take the first negative term n1 . and all the partial sums from that point on will be at least k. + pN1 + n1 + . Step 2 (diverging to ∞): to get a rearrangement diverging to ∞. . n an = ∞ and n an = −∞. the partial sum p1 + . . . we take at least one positive term. . in order until we arrive at a partial sum which is at least M + 1. we proceed as follows: we take positive terms p1 . the argument is simpler in this case. We leave this case to the reader. stopping when the partial sum p1 + . . stopping when the partial sum p1 + . . Therefore every term gets included eventually and the sequence of partial sums diverges to +∞.  The conclusion of Theorem 14. Because both the positive and negative parts diverge. . we define a series n an to be unconditionally convergent if it is con- vergent and every rearrangement converges to the same sum. . P P + P − Exercise: Let n an be a real series such that an → 0. this construction can be completed.

we say that the unordered sum s∈S as converges to A if: for all 3Both of these are well beyond the scope of these notes. Namely. with the full understanding of the implications of our defini- ∞ ion of n=0 an as the limit of a sequence of partial sums. .3.. versus P • n an converges to A but some rearrangement n aσ(n) does not. Show P∞ that there exists a permutation σ of N such that the set of partial limits of n=0 aσ(n) is the closed interval [a. . ArmedP now. REARRANGEMENTS AND UNORDERED SUMMATION 297 Many texts do not use the term “nonabsolutely convergent” and instead define a se- ries to be conditionally convergent if it is convergent but not absolutely convergent. we define aT = s∈T as = as1 + . it seems correct to make a distinction between the following two a priori different phenomena: P P • Pn an converges but n |an | does not. Aside from the fact that this terminology can be confusing to students to whom this rather intricate story of rearrangements has not been told. i. . However the notion of an infinite series n an . A]. P We wish to define s∈S as . The point here is that we recover the usual definition of a sequence by taking S = N (or S = Z+ ) but whereas N and Z+ come equipped with a natural ordering. the “unordered sum” of the numbers as as s P of S.. these two phenomena P are equivalent for real series. As we have seen. and let −∞ ≤ a ≤ A ≤ ∞. . Unordered summation. and define an S-indexed sequence of real numbers to be a function a• : S → R. let S be a nonempty set. you are certainly not expected to know what I am talking about here. for instance3 for series with values in an infinite- dimensional Banach space or with values in a p-adic field. sN } ranges over all elements of S. one in which the ordering of the terms is a priori immaterial? The answer to this question is yes and is given by the theory of unordered summation. whereas in the latter case one can show that every convergent series is unconditionally convergent whereas there exist nonabsolutely convergent series. i. it seems reasonable to ask: is there an alternative definition for the sum of an infinite series. . P∞ Exercise: Let n=0 an be any nonabsolutely convergent real series. Here it is: for every finite subset T = {s1 . (We also define a∅ = 0. . . In the former case it is a celebrated theorem of Dvoretzky-Rogers [DR50] that there exists a series which is unconditionally convergent but not absolutely convergent. the “naked set” S does not.) Finally. 2.e. as we are. It is very surprising that the ordering of the terms of a nonabsolutely convergent series affects both its convergence and its sum – it seems fair to say that this phe- nomenon was undreamt of by the founders of the theory of infinite series. 2.e. To be sure to get a definition of the sum of a series which does not depend on the ordering of the terms. absolute and unconditional convergence makes sense in other contexts. it is helpful to work in a context in which no ordering is present. for A ∈ R. + aPsN .

298 14. there is a unique function a• : ∅ → R.) Notation: because we are often going to be considering various finite subsets T of a set S. It follows that the real sequence {aTn } is Cauchy. 4And it’s not totally insane to do so: these arise in functional analysis and number theory. T 0 of S containing Tn . Theorem 14. (iv) There exists M ∈ R such that for all T ⊂f S. for each n ∈ Z+ . |aT − aT 0 | < 2 . there exists a finite subset T0 ⊂ S such that for allPfinite subsets T0 ⊂ T ⊂ S P have |aT − A| < . One could also consider unordered summation of S-indexed sequences with values in an arbtirary normed abelian group (G.4 A key feature of R is the positive-negative part decomposition.10. |M | ≥ A implies M ≥ A or M ≤ −A. with sum aS = s∈S as . Confusing Remark: We say we are doing “unordered summation”. X X | as − as | < . In other words. T 0 of S containing T . P P Exercise: Give reasonable definitions for s∈S as = ∞ and s∈S as = −∞. (iii) For all  > 0. so when we need to make the distinction we will say unordered summable. (ii) For all  > 0. |aT | ≤ M . Proof. We claim that a• is summable to A: indeed. . or equivalently the fact that for M ∈ R. but we delay such arguments for as long as possible. and consider the following assertions: (i) The S-indexed sequence a• is summable. SERIAL MISCELLANY  > 0. At a certain point in the theory considerations like this must be used in or- der to prove the desired results. we have |aT | = | s∈T as | < . the P “empty func- tion”. say to A. (Cauchy Criteria for Unordered Summation) Let a• : S → R be an S-index sequence.PShow that every S-indexed sequence a• : S → R is summable. The following result holds without using the positive-negative decomposition. (ii) =⇒ (i): We may choose. we denote the fact that A is a finite subset of B by A ⊂f B. (When S = Z≥N we already have a notion of summability. hence convergent. If there exists A ∈ R such that s∈S as = A. (i) =⇒ (ii) is immediate from the definition. Convince yourself that the most reasonable value to assign s∈∅ as is 0. we say that we s∈S as is convergent or that the S-indexed sequence a• is summable. choose n > 2 . but our se- quences take values in R. we allow ourselves the following time-saving notation: for two sets A and B. a finite subset Tn of S such that Tn ⊂ Tn+1 for all n and such that for all finite subsets T. there are exactly two ways for a real number to be large in absolute value: it can be very positive or very negative. for  > 0. | |). Exercise: If S = ∅. there exists a finite subset T ⊂ S such that for all finite subsets T. in which the absolute value is derived from the order structure. Exercise: Suppose S is finite. s∈T s∈T 0 Pthere exists T ⊂f S such that: for all T ⊂f S with T ∩ T = ∅.

the finite sums are uniformly bounded. (iii) =⇒ (iv): Using (iii).11. 2 From this and the triangle inequality it follows that if T and T 0 are two finite subsets containing T .10 there is  > 0 with the following property: for any T ⊂f S there is T 0 ⊂f S with T ∩ T 0 = ∅ and |aT 0 | ≥ . Now suppose a• is not summable. we can also find a T 00 disjoint from T ∪ T 0 with |aT 00 | ≥ . and put T 0 = T ∪ T0 . |aT | ≤ M . Then. Theorem 14.. choose T1 ⊂f S such that for all T 0 ⊂f S with T1 ∩ T 0 = ∅.10 above but not condition (iii): given any finite subset T ⊂ Z+ . 2. for any finite subset T containing Tn we have   |aT − A| ≤ |aT − aTn | + |aTn − A| <+ = .  Confusing Example: Let G = Zp with its standard norm. 2 2 (ii) =⇒ (iii): Fix  > 0.  |aT 0 − aT | = |aT 0 \T | < . and choose T0 ⊂f as in the statement of (ii). disjoint from T . It follows that aTn = |aTn+ | − |aTn− | hence  ≤ |aTn | ≤ |aTn+ | + |aTn− |. s∈T \T1 s∈T ∩T1 s∈T1 P so we may take M = 1 + s∈T1 |as |. But now decompose Tn = Tn+ ∪ Tn −. Therefore a• satisfies condition (iv) of Theorem 14. we may take T 0 = {n}. Define an : Z+ → G by an = 1 for all n. |aT 0 | ≤ 1. if we can find such a T 0 . Proof. such that |aT 0 | = 1: indeed. we have for any T ⊂f S |aT | = |#T | ≤ 1. REARRANGEMENTS AND UNORDERED SUMMATION 299 Then. there exists M ∈ R such that for all T ⊂f S. and so forth: there will be a sequence {Tn }∞n=1 of pairwise disjoint finite subsets of S such that for all n. so by Theorem 14. where n is larger than any element of T and prime to p. Although we have no reasonable expectation that the reader will be able to make any sense of the previous example. Because of the non-Archimedean nature of the norm. Now let T ⊂f S with T ∩ T0 = ∅. we offer it as motivation for delaying the proof of the implication (iv) =⇒ (i) above. In Theorem 14. Of course. where Tn+ consists of the elements s such that as ≥ 0 and Tn− consists of the elements s such that as < 0. s∈T s∈T 0 s∈T0 (iii) =⇒ (ii): Fix  > 0. |aT − aT 0 | < . . Then T 0 is a finite subset of S containing T0 and we may apply (ii): X X X | as | = | as − as | < . so X X X |aT | ≤ | as | + | as | ≤ 1 + |as |. and let T ⊂f S be such that for all finite subsets T of S with T ∩ T = ∅. there exists a finite subset T 0 . |aTn | ≥ . which uses the positive-negative decomposition in R in an essential way.e. for any finite subset T 0 of S containing T . Then for any T ⊂f S.10 above we showed that if a• is summable. write T = (T \ T1 ) ∪ (T ∩ T1 ). |aT | < 2 . An S-indexed sequence a• : S → R is summable iff the finite sums ae uniformly bounded: i.

14. . . for every M > 0. P (ii) =⇒ (i): We will prove the contrapositive: suppose the unordered sum n∈N an is divergent. there exists T such that for all finite subsets T of S disjoint from T . then by the Pigeonhole Principle there must exist 1 ≤ i1 < . If we now consider T10 . as the proof of that result shows. we have ||a|T | < .11 a• is not summable. Now for any permutation σ of N. T2n−1 0 . SERIAL MISCELLANY from which it follows that max |aTn+ . aT ≥ M . there exists a finite subset T ⊂ S such that A −  < aT ≤ A. . (i) =⇒ (ii): Fix  > 0. Let |a• | be the S-indexed sequence s 7→ |as |. Moreover. so the rearranged series n=0 aσ(n) also converges to A. with sum A. It follows that the infinite for all P ∞ series n=0 an converges P to A in the usual sense.12. with sum A. Then X as = sup aT . Then we have a disjoint union and no cancellation.300 14. We leave it to you to check that we can therefore build a rearrangement of the series with unbounded partial sums. By definition of the supremum. Thus the partial sums of a• are not uniformly bounded. we have A − aT ≤ aT 0 ≤ A. Then there exists T ⊂f S such that for every finite subset T of N containing T weP have |aT − A| < . But as in the proof of Theorem 14. the unordered sum n∈Z+ aσ(n) is manifestly the same as the unordered sum P P∞ n∈Z+ an . P∞ (ii) The series n=0 an is unconditionally convergent. Suppose |a• | is summable. there exists T ⊂f S such that |a|T ≥ 2M .  Proposition 14. Put N = maxn∈T n. T ⊂f S s∈S Proof. and thus X X |aT | = | as | ≤ | |as || = ||a|T | < . Then a• is summable iff |a• | is summable. there exists a subset TM ⊂f S such that for every finite subset T ⊃ TM .  Theorem 14. there must exist a subset T 0 ⊂ T such that (i) aT 0 consists entirely of non-negative terms or entirely of negative terms and (ii) |aT 0 | ≥ M . Proof. But the assumption A = ∞ implies there exists T ⊂f S such that aT ≥ M . for any  > 0. so we may define for all n a subset Tn0 ⊂ Tn such that |aTn0 | ≥ 2 and the sum aTn0 consists either entirely of non-negative elements of entirely of negative elements. there exists T ⊂ S with |aT = s∈T as | ≥ M . .13. s∈T s∈T Suppose |a• | is not summable. . and by Theorem 14. . and then non-negativity gives aT 0 ≥ aT ≥ M for all finite subsets T 0 ⊃ T . Let A = supT ⊂f S aT .12. so a• → A. Let 0 Tn = j=1 Tij . < in ≤ 2n − 1 such that all the terms Sn in each Ti0 are non-negative or all the terms in each Ti0 are negative. we can choose T to be disjoint from any given finite subset.. {0. . Proof. for any finite subset T 0 ⊃ T . (Absolute Nature of Unordered Summation) Let S be any set and a• : S → R be an S-indexed sequence. We must show that for any M ∈ R. Then n n ≥ N . . Then by Proposition 14.  . P Then by Theorem 14. For (i) The unordered sum n∈Z+ an is convergent. .11. Indeed. Next suppose A = ∞. n} ⊃ T so | k=0 ak − A| < . Let a• : S → R be an S-indexed sequence with as ≥ 0 for all s ∈ S. We first suppose that A < ∞. . so |Tn | ≥ n 2 : the finite sums aT are not uniformly bounded.11 for every M ≥ 0. TFAE: Theorem 14.  P a• : N → R an ordinary sequence and A ∈ R. aTn − | ≥ 2 . Then for any  > 0.

14 without appealing to the fact that |x| ≥ M implies x ≥ M or x ≤ −M ? For instance.13. then n |an | < ∞. Exercise*: Can one prove Theorem 14.12 and 14. so by Proposition 14. On the other hand.n and we are done: there is no need to set up separate the- ories of convergence. when we apply the very general definition of unordered summa- bility to the classical case of S = N. a more complicated result. the results on unordered summability become very helpful. then we can get an easier P proof of (ii) =⇒ (i): if the series n an is unconditionally convergent. if the unordered sequence is summable. Thus the discrepancy between two chosen bijections corresponds precisely to a rearrangement of the se- ries. order-dependent notion of convergence is buying us: namely. b• : S2 → R. It may perhaps be disappointing that such an elegant theory did not gain us anything new. the theory of conditionally convergent series. b2 : S → N. in fact in great multitude: if S is any countably infinite set. Exercise: Let S1 and S2 be two sets. Recall that our first proof of this de- pended on the Riemann Rearrangement Theorem.n∈N We may treat the first case as the unordered sum associated to the Z-indexed se- quence n 7→ an and the second as the unordered sum associated to the N×N-indexed sequence (m. By Theorem 14. hence by Theorem 14. then the choice of bijection b is immaterial.13 we get a second proof of the portion of the Main Rearrangement Theorem that says that a real series is unconditionally convergent iff it is absolutely convergent. where Cauchy products are easy to deal with). and let a• : S1 → R. b2 ◦ b−1 1 : N → N is a permutation of N. Or. For instance. m.n . often in nature one encounters biseries ∞ X an n=−∞ and double series X am. we recover precisely the theory of absolute (= unconditional) convergence.10 the unordered sequence |a• | is summa- ble.14. The theory of products of infinite series comes out especially cleanly in this un- ordered setting (which is not surprising.14 holds for S-indexed sequences with values in any Banach space? Any complete normed abelian group? Comparing Theorems 14. 2. REARRANGEMENTS AND UNORDERED SUMMATION 301 Exercise: Fill in the missing details of (ii) =⇒ (i) in the proof of Theorem 14. We . In both cases such bijections exist.12 the unordered sequence a• is summable. as we are getting an unconditionally convergent series. we may shoehorn these more ambitiously indexed series into conventional N-indexed series: this involves choosing a bijection b from Z (respectively N × N) to N. since it corresponds to the case of absolute convergence. However when we try to generalize the notion of an infinite series in various ways. if we prefer. if we allow ourselves to use the previously derived result that unconditional convergence and absolute convergence P coincide. then for any two bijections b1 . does Theorem 14. n) 7→ am. To sum up (!). This gives us a clearer perspective on exactly what the usual.

302 14.s2 ) = as1 bs2 . there is N ∈ Z such that | n=N an | < . SERIAL MISCELLANY assume the following nontriviality condition: there exists s1 ∈ S1 and s2 ∈ S2 such that as1 6= 0 and as2 6= 0. 1). a) ([C. s∈S1 ×S2 s1 ∈S1 s2 ∈S2 c) When S1 = S2 = N. Fix  > 0. because n an converges. 3. a) Show that a• and b• are both summable iff (a. By our work on power series – specifically P∞ Lemma 11. so by Theorem 13. Since we have convergence P at x = 1. Show that if a• is summable. 1].5 and let a• : S → R be an S-indexed sequence.1). there is no need to attempt this exercise if you do not already know and care about uncountable sts. Statement and Proof. 1). N X +k | an xn | ≤ xN M ≤ xN  < . Now we apply Abel’s Lemma (Proposition 10. show ! ! X X X (a. of polynomials) on [0.  P∞ Remark: Usually “Abel’s Theorem” means part b) of the above result: n=0 an = limx→1− an xn . . . 47]) Since n=0 an 1 = n=0 an . We define (a. n=0 an xn is a uniform limit of continuous functions (indeed. Exercise: Let S be an uncountable set. Abel’s Theorem 3. P∞ n P∞ Proof. withP“bn sequence” xN .3.