
Boot Camp: Real Analysis Lecture Notes

Lectures by Itay Neeman


Notes by Alexander Wertheim
August 23, 2016

Introduction
Lecture notes from the real analysis class of Summer 2015 Boot Camp, delivered by
Professor Itay Neeman. Any errors are my fault, not Professor Neeman’s. Corrections are
welcome; please send them to [firstinitial][lastname]@math.ucla.edu.

Contents
1 Week 1
1.1 Lecture 1 - Construction of the Real Line
1.2 Lecture 2 - Uniqueness of R and Basic General Topology
1.3 Lecture 3 - More on Compactness and the Baire Category Theorem
1.4 Lecture 4 - Completeness and Sequential Compactness

2 Week 2
2.1 Lecture 5 - Convergence of Sums and Some Exam Problems
2.2 Lecture 6 - Some More Exam Problems and Continuity
2.3 Lecture 7 - Path-Connectedness, Lipschitz Functions and Contractions, and Fixed Point Theorems
2.4 Lecture 8 - Uniformity, Normed Spaces and Sequences of Functions

3 Week 3
3.1 Lecture 9 - Arzela-Ascoli, Differentiation and Associated Rules
3.2 Lecture 10 - Applications of Differentiation: Mean Value Theorem, Rolle’s Theorem, L’Hopital’s Rule and Lagrange Interpolation
3.3 Lecture 11 - The Riemann Integral (I)
3.4 Lecture 12 - The Riemann Integral (II)

4 Week 4
4.1 Lecture 13 - Limits of Integrals, Mean Value Theorem for Integrals, and Integral Inequalities
4.2 Lecture 14 - Power Series (I), Taylor Series, and Abel’s Lemma/Theorem
4.3 Lecture 15 - Stone-Weierstrass and Taylor Series Error Approximation
4.4 Lecture 16 - Power Series (II), Fubini’s Theorem, and exp(x)

5 Week 5
5.1 Lecture 17 - Some Special Functions and Differentiation in Several Variables
5.2 Lecture 18 - Inverse Function Theorem, Implicit Function Theorem and Lagrange Multipliers
5.3 Lecture 19 - Multivariable Integration and Vector Calculus

1 Week 1
As per the syllabus, Week 1 topics include: cardinality, the real line, completeness, topology,
connectedness, compactness, metric spaces, sequences, and convergence.

1.1 Lecture 1 - Construction of the Real Line


Today’s main goal will be the construction of the real numbers. We will take the construction of N, Z, and Q for granted.

Let’s start with a fact. The rationals form a dense linear order with no endpoints.
Unpacked, this means:

(i) Dense: For all x and y with x < y, there exists z such that x < z < y

(ii) Linear: For all x and y, either x < y or x = y or y < x

(iii) No endpoints: For all x, there exists y such that y < x; for all x, there exists y such
that y > x

It turns out that every countable dense linear order with no endpoints is isomorphic to
(Q; <). We will come back to this result after a brief discussion of cardinality.

Cardinality:

Definition 1.1.1. Two sets A and B are equinumerous (written A ≅ B) if there is a bijection
f : A → B.

Note that ≅ is an equivalence relation:
(i) ≅ is reflexive (take the identity)
(ii) ≅ is symmetric (if f : A → B is a bijection, then f^{-1} : B → A is also a bijection)
(iii) ≅ is transitive (if f : A → B and g : B → C are bijections, then g ∘ f : A → C is a bijection)

Definition 1.1.2. A set x is finite if there exists n ∈ N such that x ≅ {0, 1, . . . , n − 1}.
Definition 1.1.3. A set x is infinite if x is not finite.

We write A ≼ B if there exists an injection f : A → B.

Theorem 1.1.4 (Cantor-Schroeder-Bernstein). If A ≼ B and B ≼ A, then A ≅ B.
The proof of CSB is beyond the scope of this lecture, so we omit it here. Using CSB, we
can prove several useful facts.

Corollary 1.1.5 (Pigeonhole Principle). For all n, m ∈ N, if n < m, then {0, 1, . . . , m − 1} ⋠ {0, 1, . . . , n − 1}; that is, there is no injection from {0, 1, . . . , m − 1} into {0, 1, . . . , n − 1}.

Proof. By the uniqueness of the cardinality of finite sets, we have {0, 1, . . . , n − 1} ≇ {0, 1, . . . , m − 1}, so by (the contrapositive of) CSB, we must have {0, 1, . . . , n − 1} ⋠ {0, 1, . . . , m − 1} or {0, 1, . . . , m − 1} ⋠ {0, 1, . . . , n − 1}. It must be the latter, since inclusion is clearly an injection from {0, 1, . . . , n − 1} to {0, 1, . . . , m − 1}.
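For small n and m, the Pigeonhole Principle can be checked by brute force. The sketch below is only an illustration; the helper names `injections` and `count_injections` are assumptions, not part of the lecture. It enumerates every map from {0, . . . , m − 1} to {0, . . . , n − 1} and keeps the injective ones.

```python
from itertools import product


def injections(m, n):
    """Yield each injective map {0..m-1} -> {0..n-1} as a tuple of images."""
    for images in product(range(n), repeat=m):
        # A map is injective exactly when its m images are pairwise distinct.
        if len(set(images)) == m:
            yield images


def count_injections(m, n):
    """Count the injections from an m-element set into an n-element set."""
    return sum(1 for _ in injections(m, n))
```

For example, `count_injections(3, 2)` is 0, while `count_injections(2, 3)` is positive because inclusion is one such injection.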
Note that one doesn’t really need the strength of CSB to prove the Pigeonhole Principle;
a direct argument can be made. We have the following (nearly) immediate corollary.
Corollary 1.1.6. For finite A, B, if A ⊊ B, then A ≇ B.
Example 1.1.7. Note that the above corollary fails for infinite sets. Indeed, N and N \ {0} are equinumerous via the map n ↦ n + 1.
Now we will talk a bit about countable sets.
Definition 1.1.8. A set A is countable if A is equinumerous with N, i.e. A ≅ N.
Example 1.1.9. Here are some familiar faces which are countable:
(i) N is countable; just take the identity map.
(ii) Z is also countable. One can biject N with Z as follows:
0, 1, 2, 3, 4, . . .
0, 1, −1, 2, −2, . . .

(iii) More generally, if A and B are countable sets, then A ∪ B is also countable.
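The bijection between N and Z listed in (ii) can be written down explicitly. The following is a minimal sketch; the function names `nat_to_int` and `int_to_nat` are chosen here for illustration.

```python
def nat_to_int(n):
    """Send 0, 1, 2, 3, 4, ... to 0, 1, -1, 2, -2, ..."""
    # Odd n go to the positive integers, even n to zero and the negatives.
    return (n + 1) // 2 if n % 2 == 1 else -(n // 2)


def int_to_nat(z):
    """Inverse of nat_to_int."""
    return 2 * z - 1 if z > 0 else -2 * z
```

Checking that the two functions undo each other on an initial segment is a quick sanity test that the listing really is a bijection.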
Claim 1.1.10. N × N is countable.
Proof. Proof 1: There is an obvious injection from N to N × N given by n ↦ (0, n). On the other hand, (n, m) ↦ 2^n 3^m is an injection from N × N to N by the fundamental theorem of arithmetic, so by CSB, N ≅ N × N.
Proof 2: Picture the elements of N × N as a square lattice, e.g. by identifying N × N with the corresponding set of points in the Cartesian plane. Starting at the first element of this square (the origin, as it were), and for the k-th element on the bottom row, count up k elements and over k − 1 elements to the left. That is, we define a bijection f : N → N × N (with indices starting at 1) so that f(k^2 + 1), . . . , f((k + 1)^2) lists the elements of the (k + 1) × (k + 1) square minus the elements of the k × k square contained inside it.
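Both proofs can be made concrete. The sketch below is illustrative only and makes two assumptions: indices start at 0 rather than 1, and each shell is traversed up the right-hand column first and then leftwards along the top row. The names `prime_power_code`, `shell_unpair` and `shell_pair` are hypothetical.

```python
from math import isqrt


def prime_power_code(n, m):
    """Proof 1: the injection (n, m) -> 2^n * 3^m from N x N into N."""
    return 2 ** n * 3 ** m


def shell_unpair(k):
    """Proof 2: the k-th lattice point when N x N is listed shell by shell.

    Shell s consists of the 2s + 1 points (x, y) with max(x, y) == s and
    occupies the indices s*s, ..., (s + 1)**2 - 1.
    """
    s = isqrt(k)
    r = k - s * s
    # First climb the column x = s, then walk left along the row y = s.
    return (s, r) if r <= s else (2 * s - r, s)


def shell_pair(x, y):
    """Inverse of shell_unpair."""
    s = max(x, y)
    return s * s + y if x == s else s * s + 2 * s - x
```

The first four values of `shell_unpair` are (0, 0), (1, 0), (1, 1), (0, 1): the origin followed by the three points that complete the 2 × 2 square.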
Corollary 1.1.11. If A and B are countable, then so is A × B.
Proof. Fix bijections f_A : N → A and f_B : N → B. Fix a bijection n ↦ (h_1(n), h_2(n)) from N to N × N. Then n ↦ (f_A(h_1(n)), f_B(h_2(n))) is a bijection from N to A × B.
Corollary 1.1.12. Q is countable.
Proof. There is an injection from N to Q given by inclusion. Also, there is an injection from
Q to Z × Z by mapping each element p/q ∈ Q (in lowest terms) to (p, q) ∈ Z × Z. By the
previous corollary, since Z is countable, Z × Z is countable, so there is an injection from
Z × Z to N, whence composing injections, we obtain an injection from Q to N. Applying
CSB, we’re done.

Now we may return to our claim stated earlier.
Theorem 1.1.13. Every countable dense linear order with no endpoints is isomorphic to
(Q; <).
Proof. Fix a countable dense linear order (L; <_L) with no endpoints. Let f : N → L, g : N → Q be bijections; this is possible since L and Q are both countable. Our strategy will be as follows: by induction on n, we will construct sequences a_n ∈ L, b_n ∈ Q with the following properties.
(1) a_n ≤_L a_m if and only if b_n ≤ b_m for all n, m ∈ N (This guarantees injectivity, since b_n ≤ b_m and b_m ≤ b_n implies a_n ≤_L a_m and a_m ≤_L a_n. The map is similarly well-defined.)
(2) {a_0, a_1, . . .} = L
(3) {b_0, b_1, . . .} = Q
The map a_n ↦ b_n will then be our isomorphism. One can see that (1) is very nearly all we need. It guarantees that our map between L and Q is injective, well-defined, and respects the order on each set. Condition (2) guarantees that our map is defined on all of L, and condition (3) guarantees that our map is surjective.
To define a_n, b_n we will work inductively. Suppose (inductively) that a_0, . . . , a_{n−1}, b_0, . . . , b_{n−1} have been defined, and satisfy (1).
Case 1: If n is even, set a_n = f(n/2) (this covers (2)!). Then, pick b ∈ Q such that for all m < n, a_m ≤_L a_n if and only if b_m ≤ b, and a_n ≤_L a_m if and only if b ≤ b_m, and set b_n = b. One can do this because Q is a dense linear order and has no endpoints. We are picking b so that we respect the order of elements chosen so far, i.e. to preserve (1).
Case 2: If n is odd, set b_n = g((n − 1)/2) (this covers (3)!). Pick a_n ∈ L to preserve (1); this is again possible since L is a dense linear order and has no endpoints.
Note then that {a_0, a_2, a_4, . . .} = L and {b_1, b_3, b_5, . . .} = Q, so conditions (2) and (3) are met.
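A finite run of the back-and-forth construction can be sketched in code. This is only an illustration under simplifying assumptions that are not in the lecture: L is taken to be the dyadic rationals (so that midpoints and shifts by 1 stay inside L), the enumerations f and g are replaced by finite lists, and an element that has already been placed is skipped rather than placed twice. The names `place_like` and `back_and_forth` are hypothetical.

```python
from fractions import Fraction as F


def place_like(x, xs, ys):
    """Return a value that sits among ys exactly as x sits among xs.

    xs and ys have equal length, xs[i] < xs[j] iff ys[i] < ys[j],
    and x does not occur in xs.
    """
    below = [y for a, y in zip(xs, ys) if a < x]
    above = [y for a, y in zip(xs, ys) if a > x]
    if below and above:
        return (max(below) + min(above)) / 2  # density
    if below:
        return max(below) + 1  # no right endpoint
    if above:
        return min(above) - 1  # no left endpoint
    return F(0)  # nothing has been placed yet


def back_and_forth(l_enum, q_enum):
    """Pair up elements so that the pairing preserves order (property (1))."""
    a, b = [], []
    for x, y in zip(l_enum, q_enum):
        if x not in a:  # "forth": make sure x is in the domain
            b.append(place_like(x, a, b))
            a.append(x)
        if y not in b:  # "back": make sure y is in the range
            a.append(place_like(y, b, a))
            b.append(y)
    return list(zip(a, b))
```

For instance, one might call `back_and_forth` with a list of dyadic rationals and a list of arbitrary rationals of the same length; every listed element then appears in the resulting pairing, mirroring conditions (2) and (3) on an initial segment.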

Construction of the real line (R):

We start with a familiar fact. The motivation for why we would like to do calculus on R
and not Q is that Q has natural ’gaps’ which R (as we will see) does not.
Proposition 1.1.14. There is no q ∈ Q such that q^2 = 2.
While Q has gaps, we may often approximate real numbers to arbitrary accuracy.
Proposition 1.1.15. For every ε ∈ Q with ε > 0, there exists q ∈ Q such that q^2 < 2 < (q + ε)^2.
Proof. Suppose that there exists ε ∈ Q with ε > 0 for which the claim fails for every q ∈ Q. Taking q = 0, we find ε^2 ≤ 2, and by the above proposition, since ε ∈ Q, the inequality is strict. Further, if (nε)^2 < 2 for some n ≥ 1, then taking q = nε, we find (q + ε)^2 ≤ 2, i.e. ((n + 1)ε)^2 ≤ 2. Again, since (n + 1)ε is rational, we must have strict inequality, i.e. ((n + 1)ε)^2 < 2. Hence, by induction, (nε)^2 < 2 for all n ≥ 1. This is impossible: for any n greater than the rational number 2/ε, we have nε > 2, and so (nε)^2 > 4 > 2.
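The proof suggests a direct search: step through the multiples nε until the next step would overshoot 2. A minimal sketch with exact rational arithmetic follows; the function name `bracket_sqrt2` is an assumption made for illustration.

```python
from fractions import Fraction


def bracket_sqrt2(eps):
    """Return q = n * eps with q^2 < 2 < (q + eps)^2, for rational eps > 0."""
    q = Fraction(0)
    # Invariant: q^2 < 2. Stop as soon as the next multiple of eps overshoots.
    while (q + eps) ** 2 < 2:
        q += eps
    return q
```

Since no rational squares to 2, the loop exits with (q + eps)^2 strictly greater than 2, which is exactly the conclusion of Proposition 1.1.15.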

Now we will construct the real numbers, using equivalence classes of bounded, strictly increasing sequences of rational numbers, which we will naturally identify with their supremums.
Definition 1.1.16. A sequence (a_n)_{n=0}^∞ is strictly increasing if for all n, m ∈ N, n < m ⟹ a_n < a_m. We say (a_n)_{n=0}^∞ is bounded (in Q) if there exists c ∈ Q such that for all n ∈ N, a_n < c.
Let E be the set of all strictly increasing bounded sequences of rationals. For (a_n), (b_n) ∈ E, set (a_n) ≈ (b_n) if and only if ∀n ∈ N, ∃m ∈ N such that b_m > a_n and ∀n ∈ N, ∃m ∈ N such that a_m > b_n. In colloquial terms, the sequences (a_n) and (b_n) are interleaved.
Proposition 1.1.17. ≈ is an equivalence relation on E.
Proof. It is straightforward to verify the three necessary conditions.
(1) (a_n) ≈ (a_n), since (a_n) is strictly increasing

(2) ≈ is symmetric by the requirement for equivalence

(3) ≈ is also transitive by a (layered) application of the requirement for equivalence

For (a_n) ∈ E, let [a_n] be the equivalence class of (a_n), which formally is the set {(b_n) | (b_n) ≈ (a_n)}. By moving to equivalence classes, we have [a_n] = [b_n] if and only if (a_n) ≈ (b_n), i.e. we translate equivalence to equality. Let E* be the set of equivalence classes. Define < on E* by setting [a_n] < [b_n] if ∃k ∈ N such that ∀n ∈ N, a_n < b_k. Informally, [a_n] < [b_n] if some term of (b_n) bounds all the terms of (a_n).
There are two things to check here, namely that < is well-defined, and that < is a linear order on E*.
Well-defined: Suppose (a_n) ≈ (a'_n), (b_n) ≈ (b'_n), and there exists k ∈ N such that ∀n ∈ N, a_n < b_k. Take l such that b_k < b'_l. For each n ∈ N, there is m ∈ N such that a'_n < a_m. So a'_n < a_m < b_k < b'_l for all n, and hence < is well-defined.
Linear order on E*: This is precisely what we have rigged in our definition of the equivalence relation on E. That is, if [a_n] ≮ [b_n] and [b_n] ≮ [a_n], then (a_n) and (b_n) are interleaved, so (a_n) ≈ (b_n), i.e. [a_n] = [b_n].
Now, there is the matter of identifying the rationals in E*. The map p ↦ [(p − 1/n)_{n=1}^∞] embeds Q into E* (that is, it is an order-preserving injection). From now on, we will identify [(p − 1/n)_{n=1}^∞] with p for p ∈ Q. Replacing E* with an isomorphic copy, we have Q ⊆ E*; call this isomorphic copy R, the real line.


Proposition 1.1.18. Q is dense in R. This means ∀x, y ∈ R such that x < y, there exists
z ∈ Q such that x < z < y.
Proof. Say x = [a_n], y = [b_n]. Since x < y, there exists k ∈ N such that a_n < b_k for all n ∈ N. Take z = b_{k+1} ∈ Q, which we identify with [(b_{k+1} − 1/n)_{n=1}^∞]. Then z < y, since b_{k+1} < b_{k+2}, and every term of the sequence representing z is bounded by b_{k+1}, hence is strictly bounded by b_{k+2}. We also have z > x, since (by the archimedean property of the rationals) there exists n ∈ N such that b_k < b_{k+1} − 1/n.

Corollary 1.1.19. R is a dense linear ordering.

Proof. This is clear, since Q ⊆ R!

Proposition 1.1.20. R has no endpoints.

1.2 Lecture 2 - Uniqueness of R and Basic General Topology


Today, we will talk about the properties which characterize R, as well as some general
topology.

Definition 1.2.1. An order (L; <) is Dedekind complete if

(i) Every A ⊆ L which is bounded above has a supremum (i.e., a <-least upper bound)

(ii) Every A ⊆ L which is bounded below has an infimum (i.e., a <-greatest lower bound)

Proposition 1.2.2. R is Dedekind complete.

Proof. Let A ⊆ R be nonempty and bounded above. Let f : N → Q be onto, i.e. enumerate the rationals. Put A_n = {f(i) | i ≤ n, ∃x ∈ A such that f(i) ≤ x}, and let (a_n) be the sequence defined by a_n = max A_n (this is defined once A_n is nonempty, which holds for all sufficiently large n). Clearly, a_n ≤ a_{n+1}, since A_n ⊆ A_{n+1}. Colloquially, we are building (a_n) to be a nondecreasing sequence of elements of Q, each of which is at most some element of A, with the goal of showing that [a_n] is the sup of A. We break into two cases:
Case 1: Suppose (a_n)_{n=0}^∞ is eventually constant. Say that a_n = p ∈ Q for all n ≥ k for some k ∈ N. One can check that p is a sup for A.
Case 2: Otherwise, we can thin (a_n)_{n=0}^∞ to a subsequence (a_{n_k})_{k=1}^∞ which is strictly increasing. One can check that [a_{n_k}] is a sup for A.

Example 1.2.3. Q is not Dedekind complete. Let A = {q ∈ Q | q^2 < 2}. Then A has no least upper bound in Q. Let z = sup A, taken in R; we will see that z^2 = 2.

Definition 1.2.4. A linear order (L; <) is separable if there is a countable D ⊆ L which is
dense in L.

Let’s recap. So far, we know that (R; <) is:

(1) a dense linear order with no endpoints

(2) Dedekind complete

(3) separable

Proposition 1.2.5. (1)+(2)+(3) characterizes (R; <) uniquely up to isomorphism.

Proof. Let (L; <) satisfy (1)+(2)+(3). Using (3), pick a countable dense subset D of L. D naturally inherits the linear order from L, and cannot have any endpoints. Indeed, if D has a (say) right endpoint α, then since L has no endpoints, there are β_1, β_2 ∈ L such that α < β_1 < β_2; but then there is no point of D between β_1 and β_2, contradicting the density of D in L. Hence, D is a countable dense linear order with no endpoints, so (D; <) is isomorphic to (Q; <). Let f : D → Q witness this. We will extend f to an isomorphism from L to R.
For every x ∈ L, let Ω_x = {u | u ∈ D, u ≤ x} ⊆ L. Since L has no endpoints and D is dense in L, Ω_x is nonempty and there exists d ∈ D such that d > x. Since f is order preserving, f(d) > f(u) for every u ∈ Ω_x, so f(d) is an upper bound for f(Ω_x) in R. Thus, we may put f(x) = sup_R(f(Ω_x)) by the Dedekind completeness of R. (For x ∈ D this agrees with the original value of f(x), since x is then the greatest element of Ω_x.)
Note that if x < y ∈ L, then there exist z_1, z_2 ∈ D such that x < z_1 < z_2 < y by the density of D in L. Since f is order-preserving on the elements of D, f(z_2) > f(z_1) > f(u) for every u ∈ Ω_x, so f(z_2) > f(z_1) ≥ f(x). However, since z_2 ∈ Ω_y, f(z_1) < f(z_2) ≤ f(y), so f(x) < f(y). This establishes that f is order-preserving and injective on L.
Fix z ∈ R and let A = {p ∈ Q | p ≤ z}. Let x be the sup in L of B = f^{-1}(A) ⊆ D; x exists since L is Dedekind complete. Note B ⊆ Ω_x, since B ⊆ D and b ≤ x for each b ∈ B, so z = sup_R(f(B)) ≤ sup_R(f(Ω_x)) = f(x). Suppose z < f(x); then there is an element u ∈ Ω_x such that z < f(u) ≤ f(x). Take q ∈ Q such that z < q < f(u); then f^{-1}(q) < u ≤ x, since f^{-1} is order preserving on Q, and f^{-1}(q) > b for each b ∈ B. So f^{-1}(q) is an upper bound for B, but f^{-1}(q) < x, a contradiction. So f(x) = z, and hence f is surjective.

Topological Spaces
Definition 1.2.6. A topological space is a pair (X, T) where X is a set, T is a collection of subsets of X, and T satisfies:
(1) ∅ ∈ T, X ∈ T
(2) V_1, . . . , V_n ∈ T implies ⋂_{i=1}^n V_i ∈ T
(3) V_i ∈ T for i ∈ I implies ⋃_{i∈I} V_i ∈ T

T is called the topology of the space, and the elements of T are called the open sets. We will refer to X as the space when T is clear.
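On a finite set, the three axioms can be checked mechanically. The sketch below is an illustration only; the function name `is_topology` is an assumption. It relies on the fact that for a finite family of sets, closure under pairwise unions and intersections already gives closure under all unions and finite intersections of nonempty subfamilies.

```python
from itertools import combinations


def is_topology(X, T):
    """Check axioms (1)-(3) for a collection T of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(V) for V in T}
    # Axiom (1): the empty set and the whole space are open.
    if frozenset() not in T or X not in T:
        return False
    # Every member of T must be a subset of X.
    if any(not V <= X for V in T):
        return False
    # Axioms (2) and (3), reduced to pairs because T is finite.
    return all((U & V) in T and (U | V) in T for U, V in combinations(T, 2))
```

For example, on X = {0, 1, 2} the chain ∅ ⊆ {0} ⊆ {0, 1} ⊆ X passes the check, while the collection ∅, {0}, {1}, X fails because {0} ∪ {1} is missing.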
Definition 1.2.7. We say T is generated from U ⊆ T if T consists of arbitrary unions of sets from U (note then that any finite intersection of sets from U must itself be a union of sets from U). U is called a basis for T. Elements of U are basic open sets.
Example 1.2.8. If (L; <) is a linear order with no endpoints, the open intervals generate a
topology, called the order topology.
Definition 1.2.9. V is an (open) neighborhood of x if V is open and x ∈ V . V is a basic
open neighborhood if in addition V is a basic open set.
A basis for the neighborhoods of x is any U consisting of neighborhoods of x such that every
neighborhood of x contains some V ∈ U.
Proposition 1.2.10. A ⊆ X is open if and only if for all x ∈ A, there exists an open
neighborhood V of x such that V ⊆ A.
Proof. (⟹) If A is open, then A itself is an open neighborhood of each x ∈ A, and A ⊆ A.
(⟸) For each x ∈ A, there is a neighborhood V of x such that V ⊆ A. Then A is the union of all such neighborhoods, and is therefore open.

Definition 1.2.11. The interior of E ⊆ X is the union of all open subsets of X contained
in E. It is the largest open subset of X contained in E.
The exterior of E ⊆ X is the union of all open subsets of X which have empty intersection
with E. It is the largest open subset of X contained in X \ E, hence is the interior of X \ E.
The boundary of E ⊆ X is the set of all points of X which are not in Int E or Ext E.
Definition 1.2.12. A ⊆ X is closed if X \ A is open. Note that arbitrary intersections of
closed sets are closed, as are finite unions.
Definition 1.2.13. The closure of A in X, denoted Ā, is the intersection of all closed sets in X containing A. It is the smallest closed set of X containing A.
Definition 1.2.14. D ⊆ X is dense in X if every nonempty open subset V of X contains
a point of D.
Definition 1.2.15. Let X̂ ⊆ X, and let T be a topology on X. Then the relative or induced topology T̂ on X̂ is defined to be T̂ = {V ∩ X̂ | V ∈ T}.

Definition 1.2.16. An open cover of X is a collection {V_i}_{i∈I} of open subsets of X such that ⋃_{i∈I} V_i = X. A subcover of X is a subcollection {V_i}_{i∈J}, J ⊆ I, such that ⋃_{i∈J} V_i = X.
Definition 1.2.17. X is compact if every open cover has a finite subcover. Y ⊆ X is compact if Y is compact in the relative topology. Equivalently, whenever {V_i}_{i∈I} are open in X such that ⋃_{i∈I} V_i ⊇ Y, then there exists J ⊆ I finite such that ⋃_{i∈J} V_i ⊇ Y.
Proposition 1.2.18. If N ⊆ X is compact, and V is open, N \ V is compact.
Proof. Let U be an open cover of N \ V. Since V is open, U ∪ {V} is an open cover of N. Since N is compact, there is a finite subcover U' ⊆ U ∪ {V} of N. If V ∈ U', replace U' by U' \ {V}, which yields a finite subcollection of U covering N \ V.
Definition 1.2.19. (X, T) is locally compact if for every x ∈ X, there is a compact N
containing a neighborhood of x.
Definition 1.2.20. (X, T) is connected if it cannot be partitioned into two nonempty open sets, i.e. there are no nonempty, open, disjoint A, B such that A ∪ B = X.
Proposition 1.2.21. R (with the order topology) is connected.
Proof. Suppose R = A ∪ B for some nonempty, open, disjoint subsets A, B. Let a ∈ A, b ∈ B; WLOG, a < b. Put E = {x ∈ A | x < b}; since a ∈ E, E is nonempty and bounded above by b. Since R is Dedekind complete, z = sup(E) exists, and z ≤ b. Since R = A ∪ B, we must have z ∈ A or z ∈ B. We break into cases:
Case 1: Suppose z ∈ A. Then z ≠ b, so z < b. Since A is open, there is an open interval (x, y) such that z ∈ (x, y) ⊆ A. Since R is dense, we can find ẑ with z < ẑ < min{b, y}. Then ẑ ∈ (x, y), so ẑ ∈ A, and ẑ < b, so ẑ ∈ E, contradicting that z is an upper bound for E.
Case 2: Suppose z ∈ B. Since B is open, there is an open interval (x, y) such that z ∈ (x, y) ⊆ B. Then x < z, so x is not an upper bound for E (z is the least upper bound for E). Thus, we can find ẑ ∈ E such that ẑ > x. Since z is an upper bound for E, ẑ ≤ z, so ẑ ∈ (x, y) ⊆ B. This is a contradiction, since ẑ ∈ A.

Proposition 1.2.22. R is locally compact.
Proof. We show that for all a < b ∈ R, the closed interval [a, b] = {x | a ≤ x ≤ b} is compact; since every point of R lies in the interior of such an interval, this suffices. Let {V_i}_{i∈I} be an open cover of [a, b]. Suppose for contradiction that there is no finite subcover. The strategy we will pursue is as follows: we will find the greatest point x of [a, b] such that [a, x] has a finite subcover. Of course, this x must lie in some open set, which allows us to push the finite open cover to cover [a, x + ε], contradicting the maximality of x.
Let A = {x ∈ [a, b] | there is finite J ⊆ I such that ⋃_{i∈J} V_i ⊇ [a, x]}. A is nonempty and bounded above: every element of A is at most b, and a ∈ A since we can just take J = {i} for any V_i containing a (such a V_i must exist since {V_i}_{i∈I} is an open cover of [a, b]). If b ∈ A, we are done. Suppose not, and let c = sup(A). Then a ≤ c ≤ b. Hence, there exists k ∈ I such that c ∈ V_k. V_k is open, so there is an open interval (u, w) such that c ∈ (u, w) ⊆ V_k. Since c = sup(A), we can find x ∈ A such that u < x ≤ c (otherwise, u would be a strictly smaller upper bound for A, contradicting the minimality of c). Since x ∈ A, we can find finite J ⊆ I such that ⋃_{i∈J} V_i ⊇ [a, x]. Then ⋃_{i∈J∪{k}} V_i ⊇ [a, z] for every z ∈ [c, w). If c = b, this gives b ∈ A, contrary to assumption; if c < b, taking z ∈ (c, w) with z ≤ b gives z ∈ A with z > c, contradicting the fact that c is an upper bound for A.

Definition 1.2.23. (X, T) is Hausdorff if for all x ≠ y ∈ X, there exist neighborhoods V_x, V_y ∈ T of x and y such that V_x ∩ V_y = ∅.
Proposition 1.2.24. R is Hausdorff.
Proof. This essentially boils down to density. Let x, y ∈ R be distinct points, and x < y
WLOG. Then there exists z ∈ R such that x < z < y, so for ε > 0, (x − ε, z) and (z, y + ε)
are neighborhoods of x and y respectively with empty intersection.
Proposition 1.2.25. Let (X, T) be Hausdorff and locally compact. Then for every open set
U and for every x ∈ U , there exists a compact Nx ⊆ U containing a neighborhood of x.
Proof. Fix N compact containing a neighborhood U^x of x; this is possible by local compactness. If N ⊆ U, we’re done. If not, we can use the fact that X is Hausdorff to selectively peel away parts of N not contained in U while retaining a neighborhood of x. For every y ∈ N \ U, fix a neighborhood V_y of y, and let U^x_y be a neighborhood of x such that U^x_y ∩ V_y = ∅. Note ⋃_{y∈N\U} V_y ⊇ N \ U, i.e. {V_y}_{y∈N\U} form an open cover of N \ U; but N \ U is compact, since N is compact and U is open. Thus, there exists a finite subcover {V_{y_1}, . . . , V_{y_k}} such that ⋃_{i=1,...,k} V_{y_i} ⊇ N \ U. Let N̂ = N \ ⋃_{i=1,...,k} V_{y_i}. Then N̂ is compact, and N̂ ⊆ U. Further, N̂ ⊇ ⋂_{i=1,...,k} (U^x_{y_i} ∩ N), since each U^x_{y_i} has no points in common with V_{y_i}. Finally, since U^x ⊆ N, ⋂_{i=1,...,k} (U^x_{y_i} ∩ U^x) is a neighborhood of x contained in N̂.

1.3 Lecture 3 - More on Compactness and the Baire Category Theorem

Today, we will give a few more results on compactness, and will introduce the Baire Category
Theorem.
Proposition 1.3.1. Let (X, T) be a compact space. Let {C_i}_{i∈N} be a collection of closed subsets of X. Suppose for all n ∈ N, ⋂_{i≤n} C_i ≠ ∅. Then ⋂_{i∈N} C_i ≠ ∅.

Proof. As we will see in this proof, compactness is often the natural bridge between finite facts and infinite claims, and vice versa. Suppose ⋂_{i∈N} C_i = ∅. Then for every x ∈ X, there is some i_x ∈ N such that x ∉ C_{i_x}. Then since C_{i_x} is closed, there is an open neighborhood U_x of x contained in X \ C_{i_x}, i.e. U_x ∩ C_{i_x} = ∅. Note {U_x}_{x∈X} is an open cover of X. Since X is compact, we have k ∈ N and x_1, . . . , x_k ∈ X such that U_{x_1}, . . . , U_{x_k} cover the space. Since C_{i_{x_l}} ∩ U_{x_l} = ∅ for each l ∈ {1, . . . , k}, every y ∈ X is outside at least one of C_{i_{x_1}}, . . . , C_{i_{x_k}}, so C_{i_{x_1}} ∩ · · · ∩ C_{i_{x_k}} = ∅. Then for any n ≥ max{i_{x_1}, . . . , i_{x_k}}, ⋂_{i≤n} C_i = ∅, a contradiction.

Proposition 1.3.2. Compact sets in Hausdorff spaces are closed.

Proof. Let N be a compact subset of a Hausdorff space X. It suffices to show that each x ∈ X \ N has a neighborhood completely contained in X \ N. For each y ∈ N, let V_y, U_y be neighborhoods of y and x respectively such that V_y ∩ U_y = ∅. Then {V_y}_{y∈N} is an open cover of N, so there exists a finite subcover {V_{y_1}, . . . , V_{y_k}}. Since U_{y_i} ∩ V_{y_i} = ∅ for each i = 1, . . . , k, U_{y_1} ∩ · · · ∩ U_{y_k} is a neighborhood of x which is disjoint from N.
We now have the tools to present a proof of the Baire Category Theorem. We will see another equivalent formulation of the BCT today, as well as a distinct formulation in Lecture 4.

Theorem 1.3.3 (Baire Category Theorem). Let (X, T) be Hausdorff and locally compact (e.g., R). Let {D_n}_{n∈N} be dense open subsets of X. Then ⋂_{n∈N} D_n is dense (and nonempty).

Proof. Let U be a nonempty open subset of X. We will find a point in U ∩ (⋂_{n∈N} D_n). This will establish that ⋂_{n∈N} D_n is nonempty and dense, as we will have shown each nonempty open subset of X contains a point of ⋂_{n∈N} D_n. The basic idea is that we will construct a sequence of shrinking neighborhoods, making sure we pull in a point from each D_k at every step. We will then leverage local compactness and Hausdorff-ness as needed to ensure that our sets shrink to at least one point, using the previous two propositions.
Set U_0 = U. Now by induction on k ≥ 1, we construct the following:

(i) Take x_k ∈ U_{k−1} ∩ D_{k−1}; this is possible since D_{k−1} is dense, so its intersection with each nonempty open subset is nonempty

(ii) Find N_k compact containing a neighborhood of x_k such that N_k ⊆ U_{k−1}; this is possible since X is locally compact and Hausdorff, using Proposition 1.2.25, proved at the end of Lecture 2.

(iii) Let U_k be a neighborhood of x_k such that U_k ⊆ N_k (via (ii) above) and U_k ⊆ D_{k−1}; this is possible since D_{k−1} is open, and we can take a neighborhood of x_k in N_k and intersect it with D_{k−1} (the result is nonempty, since (i) guarantees x_k ∈ D_{k−1})

Then we have constructed a chain U_0 ⊇ N_1 ⊇ U_1 ⊇ N_2 ⊇ U_2 ⊇ · · ·. Each N_k is compact, hence closed by the previous proposition, and for every k ≥ 1, we have ⋂_{1≤i≤k} N_i = N_k ≠ ∅. Thus, by our earlier proposition (applied to the closed subsets N_k of the compact space N_1), ⋂_{k≥1} N_k ≠ ∅. Take any y ∈ ⋂_{k≥1} N_k; then for every i ∈ N, y ∈ N_{i+2} ⊆ U_{i+1} ⊆ D_i, and y ∈ N_1 ⊆ U_0 = U. Thus, y ∈ ⋂_{n∈N} D_n, and y ∈ U, so y ∈ U ∩ (⋂_{n∈N} D_n).

Definition 1.3.4. A set C is nowhere dense if its closure C̄ has empty interior.

Proposition 1.3.5. C is closed nowhere dense if and only if X \ C is open dense.

Proof. (⟹) If C is closed nowhere dense, then X \ C is open, and C̄ = C. Further, let U be a nonempty open subset of X. Since C̄ has empty interior, U is not contained in C̄, so U has at least one point u ∈ X \ C̄ = X \ C.
(⟸) If X \ C is open dense, then C is closed, whence C̄ = C. Further, since X \ C is dense, every nonempty open subset U of X contains a point u ∈ X \ C = X \ C̄. Thus, C̄ has empty interior, so C is nowhere dense.

Theorem 1.3.6 (Equivalent formulation of the Baire Category Theorem). Let (X, T) be a locally compact Hausdorff space. Let {C_n}_{n∈N} be closed nowhere dense subsets of X. Then ⋃_{n∈N} C_n ≠ X.

Proposition 1.3.7. Let (X, T) be Hausdorff. Then for every x ∈ X, X \ {x} is open.

Proof. Since X is Hausdorff, for each y ≠ x ∈ X, there is a neighborhood V_y of y such that x ∉ V_y. Then X \ {x} = ⋃_{y≠x} V_y, whence X \ {x} is open.

Definition 1.3.8. x ∈ X is isolated if {x} is open.

Proposition 1.3.9. If x is not isolated, then X \ {x} is dense.

Proof. Every nonempty open set clearly contains a point of X \ {x}, except for possibly {x}.
Since x is not isolated, {x} is not open, so we’re done.

Note: R has no isolated points.

Corollary 1.3.10. Let X be Hausdorff and locally compact with no isolated points. Then
X is not countable.

Proof. Suppose X were countable, and enumerate X via the sequence (q_n)_{n∈N}. For each n ∈ N, put D_n = X \ {q_n}. Then D_n is open dense by the previous two propositions, so by the Baire Category Theorem, ⋂_{n∈N} D_n ≠ ∅. But ⋂_{n∈N} D_n = ⋂_{n∈N} (X \ {q_n}) = X \ ⋃_{n∈N} {q_n} = ∅, a contradiction.

Corollary 1.3.11. R is not countable.

Proof. R is Hausdorff and locally compact with no isolated points.

Proposition 1.3.12 (F12.5). Let (X, T) be Hausdorff and locally compact. E ⊆ X is G_δ if it can be written as ⋂_{n∈N} G_n with G_n open for each n ∈ N. Prove that Q is not G_δ in R.

Proof. This is a standard application of the Baire Category Theorem. Let (q_n)_{n∈N} enumerate Q. Note that D_n = R \ {q_n} is open dense, since R is Hausdorff and has no isolated points. Suppose Q = ⋂_{n∈N} G_n, where G_n is open for each n ∈ N. Note G_n is also dense for each n ∈ N, since G_n ⊇ Q. Thus, by the Baire Category Theorem applied to the countable family consisting of all the G_n and all the D_n, (⋂_{n∈N} G_n) ∩ (⋂_{n∈N} D_n) ≠ ∅. But this is a contradiction, as ⋂_{n∈N} G_n = Q and ⋂_{n∈N} D_n = R \ Q.

Metric Spaces
Definition 1.3.13. A metric space is a pair (X, d) where d : (X × X) → [0, ∞) satisfies:
(1) For all x ∈ X, d(x, x) = 0

(2) For all x ≠ y ∈ X, d(x, y) ≠ 0

(3) For all x, y ∈ X, d(x, y) = d(y, x)

(4) (Triangle inequality): For all x, y, z ∈ X, d(x, z) ≤ d(x, y) + d(y, z)


Intuitively, d(x, y) can be thought of as the distance between x and y.
Proposition 1.3.14. Let (X, d) be a metric space. Then the sets

B(z, r) = {x | d(z, x) < r}, for z ∈ X, r > 0

generate a topology on X called the metric topology; B(z, r) are the open balls of radius
r centered at z.
Proof. Let T be the collection of all unions of open balls. To show T is a topology, it suffices to check that the intersection of any two open balls is a union of open balls. For this, it is enough to show that for all z_1, z_2 ∈ X and r_1, r_2 ∈ (0, ∞), for all x ∈ B(z_1, r_1) ∩ B(z_2, r_2), there exists s > 0 such that B(x, s) ⊆ B(z_1, r_1) ∩ B(z_2, r_2).
This essentially boils down to the triangle inequality. Let d_1 = d(z_1, x) < r_1 and d_2 = d(z_2, x) < r_2. Let s > 0 be small enough that d_1 + s < r_1 and d_2 + s < r_2; this is possible because R is dense.
Fix y ∈ B(x, s). Then d(z_i, y) ≤ d(z_i, x) + d(x, y) < d_i + s < r_i for i = 1, 2. Hence, B(x, s) ⊆ B(z_1, r_1) ∩ B(z_2, r_2).
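The choice of s in the proof can be made explicit. The sketch below is an illustration, assuming the Euclidean metric on the plane as the concrete metric space and choosing s as half of the smaller remaining slack; the names `dist` and `inner_radius` are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean metric on R^2 (an assumed concrete example of d)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def inner_radius(x, z1, r1, z2, r2):
    """Return s > 0 with d(z_i, x) + s < r_i for i = 1, 2.

    By the triangle inequality, B(x, s) then lies inside both balls.
    """
    s = min(r1 - dist(z1, x), r2 - dist(z2, x)) / 2
    if s <= 0:
        raise ValueError("x must lie in both open balls")
    return s
```

For the unit balls centered at (0, 0) and (1, 0) and the point x = (0.5, 0), this gives s = 0.25.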
Example 1.3.15. The usual metric on R: d(x, y) = |x − y|. Then the metric topology
on R is the usual order topology, generated by open intervals.
Example 1.3.16. The discrete metric on any set X:

d(x, y) = 0 if x = y, and d(x, y) = 1 if x ≠ y.

Every subset of X is open with respect to this metric.
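On a finite set of sample points, the four metric axioms of Definition 1.3.13 can be verified exhaustively. The following sketch is illustrative; `discrete` and `is_metric` are names chosen here, and passing the check on a sample is evidence, not a proof, for an infinite space.

```python
from itertools import product


def discrete(x, y):
    """The discrete metric: 0 on the diagonal, 1 elsewhere."""
    return 0 if x == y else 1


def is_metric(points, d):
    """Check axioms (1)-(4) of a metric on a finite collection of points."""
    pts = list(points)
    pairs = list(product(pts, repeat=2))
    return (
        all(d(x, x) == 0 for x in pts)                      # (1)
        and all(d(x, y) > 0 for x, y in pairs if x != y)    # (2)
        and all(d(x, y) == d(y, x) for x, y in pairs)       # (3)
        and all(d(x, z) <= d(x, y) + d(y, z)                # (4)
                for x, y, z in product(pts, repeat=3))
    )
```

The same checker rejects the squared distance (x − y)^2 on the integers, which violates the triangle inequality (take 0, 1, 2).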


Definition 1.3.17. A metric space is compact if it is compact with the metric topology.
Equivalently, any covering of X with open balls has a finite subcover.
Y ⊆ X is compact if (Y, d|Y ×Y ) is compact. Equivalently, any covering of Y with open
balls has a finite subcover.
Definition 1.3.18. A sequence (x_n)_{n=1}^∞ is Cauchy if for every ε > 0, there exists N ∈ N such that for all k, l ≥ N, d(x_k, x_l) < ε. Intuitively, the points of a Cauchy sequence cluster arbitrarily closely together if you go far out enough in the sequence.

Definition 1.3.19. A sequence (x_n)_{n=1}^∞ converges to z if for every ε > 0, there exists N ∈ N such that for all k ≥ N, d(x_k, z) < ε.
Proposition 1.3.20. A sequence (x_n)_{n=1}^∞ in any metric space (X, d) can converge to at most one z ∈ X.
Proof. The underlying idea here is that every metric space is Hausdorff, so we can find disjoint neighborhoods of the two limits, and points of the sequence can’t be in both at once. Suppose otherwise, i.e. (x_n)_{n=1}^∞ converges to both z_1, z_2 ∈ X, with z_1 ≠ z_2. Then d(z_1, z_2) > 0; let ε = (1/2)d(z_1, z_2).
By the definition of convergence, for this ε there exist N_1, N_2 ∈ N such that for all k ≥ N_i, d(x_k, z_i) < ε for i = 1, 2. But then taking k ≥ max{N_1, N_2}, we get d(x_k, z_1) < ε and d(x_k, z_2) < ε. But then d(z_1, z_2) ≤ d(x_k, z_1) + d(x_k, z_2) < 2ε = d(z_1, z_2), a contradiction.
Definition 1.3.21. If (x_n)_{n=1}^∞ converges, then the unique z it converges to is called the limit of (x_n)_{n=1}^∞, denoted lim_{n→∞} x_n.

Proposition 1.3.22. If (x_n)_{n=1}^∞ converges, then it is Cauchy.

Proof. A simple application of the triangle inequality: let z be the limit, let ε > 0, and take N large enough so that for all m ≥ N, d(x_m, z) < ε/2. Then note that for all m, n ≥ N, d(x_m, x_n) ≤ d(x_m, z) + d(z, x_n) < ε/2 + ε/2 = ε.
Proposition 1.3.23. If (x_n)_{n=1}^∞ converges to z, then so does every subsequence of (x_n)_{n=1}^∞.

Proof. Let (x_n)_{n=1}^∞ converge to z, and let (x_{n_k})_{k=1}^∞ be a subsequence of (x_n)_{n=1}^∞. Let ε > 0 be given. Choose N ∈ N such that for all m ≥ N, d(x_m, z) < ε. Let k be the smallest element of N such that n_k ≥ N. Then for all j ≥ k, n_j ≥ n_k ≥ N, so d(x_{n_j}, z) < ε.
Definition 1.3.24. A metric space X is complete if every Cauchy sequence of points in X
has a limit in X.
Definition 1.3.25. Let (xn)_{n=1}^∞ be a sequence, bounded above and below. Define the lim
inf of (xn)_{n=1}^∞ by
\[ \liminf_{n \to \infty} x_n = \sup_{n \in \mathbb{N}} \inf_{l \ge n} x_l \]

and similarly, the lim sup of (xn)_{n=1}^∞ by
\[ \limsup_{n \to \infty} x_n = \inf_{n \in \mathbb{N}} \sup_{l \ge n} x_l \]

For sequences of real numbers, these values live in R by Dedekind completeness.
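As a numerical illustration of the definition (not part of the lecture), the tail infima and tail suprema can be computed for a finite truncation of a sequence. The sample sequence x_n = (−1)^n (1 + 1/n) and the helper name `tail_inf_sup` are assumptions made for this sketch; a finite truncation can only suggest the values −1 and 1, since the true lim inf and lim sup depend on the whole infinite tail.

```python
def tail_inf_sup(xs):
    """Return (infs, sups) with infs[n] = min(xs[n:]) and sups[n] = max(xs[n:])."""
    infs, sups = [], []
    lo, hi = float("inf"), float("-inf")
    # Walk backwards so each tail extremum is built from the previous one.
    for x in reversed(xs):
        lo, hi = min(lo, x), max(hi, x)
        infs.append(lo)
        sups.append(hi)
    infs.reverse()
    sups.reverse()
    return infs, sups

# Hypothetical sample sequence: x_n = (-1)^n * (1 + 1/n) for n = 1..2000.
xs = [(-1) ** n * (1 + 1 / n) for n in range(1, 2001)]
infs, sups = tail_inf_sup(xs)

# Tail infima increase and tail suprema decrease, as in the chain of inequalities
# used in the completeness proof; midway through, they are already close to -1 and 1.
print(infs[1000], sups[1000])
```

The last few entries of `infs` and `sups` are artifacts of the truncation (the final tail has a single element), so only entries far from the end are meaningful.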


Theorem 1.3.26. R (with the Euclidean metric) is complete.
Proof. We will use the Dedekind completeness of R. Let (xn)_{n=1}^∞ be a sequence, bounded
above and below. Note that
\[ \inf_{l \ge 1} x_l \le \inf_{l \ge 2} x_l \le \cdots \le \sup_{l \ge 2} x_l \le \sup_{l \ge 1} x_l \]

This is because when we remove elements from our sequence, the inf can only go up, whereas
the sup can only decrease, but the sups will never fall below the infs. In particular, this shows
\[ \sup_{n \in \mathbb{N}} \inf_{l \ge n} x_l \le \inf_{n \in \mathbb{N}} \sup_{l \ge n} x_l \]

Claim 1.3.27. For every ε > 0, there exists N ∈ N such that ∀k > N , xk < lim sup xn + ε.
Proof. Let N be large enough that
\[ \sup_{l \ge N} x_l - \limsup x_n < \varepsilon \]
or
\[ \sup_{l \ge N} x_l < \limsup x_n + \varepsilon \]

This is possible because (sup_{l≥n} xl)_{n=1}^∞ is a monotone decreasing sequence whose greatest
lower bound (i.e., limit by the monotone convergence theorem) is lim sup xn. Then for all k ≥
N, xk ≤ sup_{l≥N} xl < lim sup xn + ε.
Claim 1.3.28. For every ε > 0, there exists N ∈ N such that ∀k > N, xk > lim inf xn − ε.

Assume (xn ) is Cauchy. We want to show that (xn ) has a limit in R.


Claim 1.3.29. If lim inf xn = lim sup xn = l for some l ∈ R, then xn → l.
Proof. Let ε > 0 be given. By Claim 1.3.27, there exists N ∈ N such that for all k > N,
xk − l < ε
Additionally, by Claim 1.3.28, there exists N′ ∈ N such that for all k > N′,
l − xk < ε
Thus, for all k > max{N, N′},
|xk − l| < ε

Thus, it is sufficient to show lim inf xl = lim sup xl.


Claim 1.3.30. For every ε > 0 and N ∈ N, there exists k > N such that xk > lim sup xl − ε.
Proof. Let ε > 0, N ∈ N be given. Suppose there were no k > N such that xk >
sup_{l>N} xl − ε, i.e. xk ≤ sup_{l>N} xl − ε for all k > N. Then sup_{l>N} xl − ε is an upper bound for
(xl)_{l>N} which is smaller than sup_{l>N} xl, a contradiction. Hence, there exists k > N such
that xk > sup_{l>N} xl − ε. Since sup_{l>N} xl ≥ lim sup xl, we get xk > lim sup xl − ε.
Claim 1.3.31. For every ε > 0 and N ∈ N, there exists k > N such that xk < lim inf xl + ε.

Suppose for contradiction that lim inf xl ≠ lim sup xl. Let
\[ \varepsilon = \frac{\limsup x_l - \liminf x_l}{3} \]
Since (xn) is Cauchy, there exists N ∈ N such that for all k1, k2 > N,
|xk1 − xk2| < ε
By the claims above, there exist k1, k2 > N such that xk1 > lim sup xl − ε and xk2 <
lim inf xl + ε. But then xk1 − xk2 > lim sup xl − lim inf xl − 2ε = ε, a contradiction.

1.4 Lecture 4 - Completeness and Sequential Compactness
Today, we will talk more about completeness, as well as its connection with compactness.

Definition 1.4.1. A limit point of A (in a metric space (X, d)) is a point z such that for
all ε > 0, A ∩ B(z, ε) ≠ ∅. Equivalently, z is a limit point of A if z is the limit of a sequence
of values in A.

Proposition 1.4.2. Let (X, d) be a metric space. Then A ⊆ X is closed if and only if A
contains each of its limit points.

Proof. ( =⇒ ) Suppose A is closed, and let z be a limit point of A. If z ∉ A, then since
X \ A is open in the metric topology, there exists ε > 0 such that B(z, ε) ⊆ X \ A. But then
A ∩ B(z, ε) = ∅, a contradiction since z is a limit point.
( ⇐= ) Suppose A contains all its limit points. Pick z ∈ X \ A. Then z is not a limit point
of A, so there exists ε > 0 such that B(z, ε) ∩ A = ∅, i.e. B(z, ε) ⊆ X \ A. Hence, X \ A is
open, so A is closed.

Definition 1.4.3. Let (X, d) be a metric space. A subset A ⊆ X is complete if and only
if (A, d|A×A ) is complete.

Proposition 1.4.4. If (X, d) is complete, then A ⊆ X is complete if and only if A is closed.

Proof. ( =⇒ ) Suppose A is complete. Let z be a limit point of A. Then there is a sequence
(xn)_{n=1}^∞ of points in A such that xn → z; but (xn) is convergent, hence Cauchy in (X, d). Since
(xn) is a sequence of points in A, it is also Cauchy in (A, d|A×A ). Since A is complete, (xn)
must have a limit in (A, d|A×A ); call it ẑ ∈ A. Then xn → ẑ also in (X, d), so ẑ = z, whence
z ∈ A. (Note that we have not used the completeness of X here).
( ⇐= ) Suppose A is closed. Let (xn) be Cauchy in (A, d|A×A ). Then (xn) ⊆ A and (xn)
is Cauchy in (X, d). Since X is complete, there exists z ∈ X such that xn → z. Since A is
closed and z is a limit point of A, z ∈ A.

Proposition 1.4.5. Let (X, d) be a metric space. The closed balls B̄(z, r) = {x | d(z, x) ≤ r} are
closed.

Proposition 1.4.6. Let (X, d) be a metric space. If U is open, then for each z ∈ U, there exists
ε > 0 such that B̄(z, ε) ⊆ U.

Proof. Let ε be sufficiently small such that B(z, 2ε) ⊆ U. Then B̄(z, ε) ⊆ B(z, 2ε) ⊆ U.

Proposition 1.4.7. Let (X, d) be complete. Let {Bn}_{n∈N} be open balls of radius rn with
lim_{n→∞} rn = 0 and B̄_{n+1} ⊆ Bn. Then ⋂_{n∈N} Bn is nonempty.

Proof. Say Bn = B(xn, rn). Then for all N ∈ N, for all k, l ≥ N, we have xk, xl ∈ BN, since
the balls are nested, which gives d(xk, xl) < 2rN. Since rn → 0, this implies
that (xn) is Cauchy, since we can take N large enough that 2rN is arbitrarily small, which
bounds d(xl, xk) for all k, l ≥ N. By the completeness of (X, d), (xn) has a limit; call it z.
For every N, z = lim_{n≥N+1} xn. Hence z is a limit point of B_{N+1} ⊆ B̄_{N+1}. Since B̄_{N+1} is
closed, z ∈ B̄_{N+1}, so z ∈ BN, since B̄_{N+1} ⊆ BN.

Theorem 1.4.8 (Baire Category Theorem, Version 2). Let X be a complete metric space.
Let {Dn}_{n∈N} be open dense subsets of X. Then ⋂_{n∈N} Dn is dense.
Proof. Though BCT v2 is actually not equivalent to BCT v1, the main technique in this
proof will closely resemble that of the proof of BCT v1. In particular, we will inductively
construct a shrinking sequence of balls which capture points from each dense subset. The previous
proposition will guarantee that our eventual intersection is nonempty, which is dependent on the
assumption of completeness.
Let U0 be a nonempty open set. Inductively, we construct (for n ≥ 1):
(i) Pick xn ∈ U_{n−1} ∩ D_{n−1}; this is possible since D_{n−1} is dense
(ii) Pick rn > 0 sufficiently small such that B̄(xn, rn) ⊆ U_{n−1} ∩ D_{n−1}; this is possible by
our earlier proposition, since U_{n−1} ∩ D_{n−1} is open
(iii) Reduce rn as needed so that rn < 1/n; this ensures rn → 0
(iv) Set Un = B(xn, rn)
Set Bn = B(xn, rn). Then we get the following chain of inclusions
U0 ⊇ B̄1 ⊇ B1 ⊇ B̄2 ⊇ · · ·

where B̄n ⊆ D_{n−1} for each n ≥ 1. Now by the previous proposition, there exists
z ∈ ⋂_{n≥1} Bn. Since Bn ⊆ B̄n ⊆ D_{n−1}, z ∈ Dn for each n ∈ N; similarly, since z ∈ B̄1,
z ∈ U0. Thus, z ∈ U0 ∩ (⋂_{n∈N} Dn). Since U0 was an arbitrary nonempty open set, ⋂_{n∈N} Dn is dense.
Corollary 1.4.9 (S08.6). A complete metric space with no isolated points cannot be count-
able.
Proof. See previous proof.
Definition 1.4.10. A metric space (X, d) is sequentially compact if every sequence has
a convergent subsequence.
Theorem 1.4.11. (X, d) is sequentially compact if and only if X is compact.
Proof. The “only if” direction here is quite tricky, so we will start with the “if”.
( ⇐= ) Suppose (X, d) is compact. Let (xn)_{n=1}^∞ be given. Suppose for contradiction that no
z ∈ X is a limit of a subsequence of (xn). Fix z ∈ X; if for every r > 0 and N ∈ N, there
exists k > N such that xk ∈ B(z, r), then we can inductively construct a subsequence
of (xn) which converges to z. Hence, for each z ∈ X, there exist rz > 0, Nz ∈ N such
that for all n > Nz, xn ∉ B(z, rz). The set {B(z, rz) | z ∈ X} is an open cover of X. By
compactness, there are finitely many z1, . . . , zk such that B(z1, rz1) ∪ · · · ∪ B(zk, rzk) = X. But for
any n > max{Nz1, . . . , Nzk}, we have xn ∉ B(z1, rz1) ∪ · · · ∪ B(zk, rzk) = X, a contradiction.
( =⇒ ) Let {Vi}_{i∈I} be an open cover of (X, d). Suppose {Vi}_{i∈I} has no finite subcover. We
will aim to construct a sequence (xn) with no convergent subsequence. The intuitive idea is
that we can construct a sequence (xn) whose terms are spaced sufficiently far
from one another, so that (xn) cannot have a convergent subsequence.
We work inductively as follows. Let x0 be some point in X. Suppose we have constructed
xn. Then:

(i) There exists in ∈ I such that xn ∈ Vin ; this is possible since {Vi }i∈I is an open cover
for X

(ii) Fix rn > 0 such that B(xn , rn ) ⊆ Vin ; this is possible since Vin is open

(iii) Pick in, rn so that rn ≥ 1/(2L) for the smallest possible L ∈ N; in other words, we want
to pick in and rn so that we can put the largest open ball possible around xn in Vin

(iv) Finally, pick x_{n+1} in X \ (Vi0 ∪ · · · ∪ Vin); this is possible since {Vi}_{i∈I} has no finite subcover

Claim 1.4.12. (xn ) constructed above has no convergent subsequence.


Proof. Suppose for contradiction that (xn) has a subsequence converging to some z ∈ X; in particular,
for every ε > 0 and N ∈ N, there exists n > N such that xn ∈ B(z, ε).
Fix i such that z ∈ Vi. Fix L ∈ N such that B(z, 1/L) ⊆ Vi. Fix n ∈ N such that
xn ∈ B(z, 1/(2L)); we can do so by the above. Note then that B(xn, 1/(2L)) ⊆ B(z, 1/L) ⊆ Vi,
where the first inclusion follows from the triangle inequality. At stage n in the induction
above, we could have picked in = i and rn = 1/(2L). Hence, by construction, the rn we did
pick must be ≥ 1/(2L), whence B(xn, 1/(2L)) ⊆ Vin.
Hence, for all k > n, xk ∉ B(xn, 1/(2L)), since by construction we have xk ∉ Vin for each
k > n. From this, we get that for all k > n,

d(z, xk) ≥ d(xk, xn) − d(z, xn) ≥ 1/(2L) − d(z, xn) = ε > 0

where we set ε = 1/(2L) − d(z, xn). Thus, xk ∉ B(z, ε) for every k > n, so no subsequence of (xn)
converges to z, a contradiction.
This completes the proof.

Theorem 1.4.13 (Heine-Borel). In R, a set A is compact if and only if it is closed (complete)
and bounded (both above and below).

Proof. ( =⇒ ) We proved earlier that every compact subset of R (a Hausdorff space) is
closed. If A were unbounded in either direction, then we would have a monotone sequence of
points in A which has no convergent subsequence, contradicting sequential compactness.
( ⇐= ) Suppose A is bounded and closed. We prove A is sequentially compact. Our method
will be a “lion in the desert” style proof. To hunt a lion in the desert, divide the desert into
two halves; the lion must be in one of them, so follow him there. Repeat until you’ve caught
the lion.
Let (xn ) be a sequence of points in A. It is sufficient to find a Cauchy subsequence by the
completeness of R. Let (a, b) ⊆ R such that A ⊆ (a, b) (possible since A is bounded above
and below). Working inductively, we define open intervals Bk such that (∗) for infinitely
many n, xn ∈ Bk (i.e. Bk contains infinitely many terms from our sequence):

(i) Set B0 = (a, b).

(ii) Having defined Bk , say Bk = (ak , bk ), let c be the midpoint of (ak , bk ). Then either xn
has infinitely many elements in (ak , c), or in (c, bk ) (if xn has infinitely many elements
equal to c, then that constant subsequence is clearly convergent).

(iii) If xn has infinitely many elements in (ak , c), take Bk+1 = (ak , c). Otherwise, set Bk+1 =
(c, bk ).

Using (∗), find a subsequence (xnk ) such that xnk ∈ Bk for each k ∈ N. Since the lengths of
Bk shrink to 0, (xnk ) is Cauchy.
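
The halving step of this “lion in the desert” argument can be mimicked on a finite sample. The sketch below is only a finite stand-in for the proof: “contains infinitely many terms” is replaced by “contains at least as many of the remaining terms as the other half”, and the sample sequence sin(n) and the name `bisect_subsequence` are assumptions made for illustration.

```python
import math

def bisect_subsequence(xs, a, b, steps):
    """Halve [a, b] repeatedly, keeping the half that holds more of the remaining
    terms, and pick one strictly later index from the kept half at every step."""
    candidates = list(range(len(xs)))
    picked, last = [], -1
    for _ in range(steps):
        c = (a + b) / 2
        left = [i for i in candidates if xs[i] <= c]
        right = [i for i in candidates if xs[i] > c]
        if len(left) >= len(right):
            candidates, b = left, c
        else:
            candidates, a = right, c
        later = [i for i in candidates if i > last]
        if not later:
            break  # a finite sample can run out of terms; an infinite sequence cannot
        last = later[0]
        picked.append(last)
    return picked

xs = [math.sin(n) for n in range(10000)]  # hypothetical bounded sample sequence
picked = bisect_subsequence(xs, -1.0, 1.0, 8)
print([round(xs[i], 4) for i in picked])
```

Consecutive picked terms lie in a common interval whose length halves at each step, which is the finite shadow of the Cauchy property used in the proof.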

Definition 1.4.14. (X, d) is totally bounded if for every ε > 0, X can be covered with
finitely many balls of radii less than ε.

A proof similar to the one above gives the “if” direction of the following:

Theorem 1.4.15 (S09.4(e), S13.3). A metric space (X, d) is compact if and only if X is
complete and totally bounded.

2 Week 2
As per the syllabus, Week 2 topics include: convergence of sums, rearrangements and ab-
solute convergence, continuity in topological and metric spaces, path connectedness, inter-
mediate value theorem, contraction maps and the fixed point theorem, uniform continuity,
uniform convergence, and the Arzela-Ascoli theorem.

2.1 Lecture 5 - Convergence of Sums and Some Exam Problems


Today, we will discuss convergence of sums, and will complete some exam problems concern-
ing the evaluation and convergence of sums.

Definition 2.1.1 (Convergence of sums in R). For a sequence (ai) ⊆ R, Σ_{i=1}^∞ ai converges
to s ∈ R if the sequence sn = Σ_{i=1}^n ai converges to s. Σ_{i=1}^∞ ai converges absolutely if
Σ_{i=1}^∞ |ai| converges.

Define (sn) by sn = Σ_{i=1}^n ai. Then to check that Σ_{i=1}^∞ ai converges, it is sufficient (in fact,
necessary) to show that (sn) is Cauchy, i.e. for every ε > 0, there exists N ∈ N such that
for all k, l > N,
\[ |s_k - s_l| < \varepsilon, \quad \text{i.e. if } k \le l, \quad \left| \sum_{i=k+1}^{l} a_i \right| < \varepsilon \]

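Example 2.1.2. For any (bn) with 0 ≤ bn ≤ M,
\[ \sum_{n=1}^{\infty} \frac{b_n}{10^n} \]
converges.

Proof. It is sufficient to show that the tail sums are arbitrarily small, i.e. that for every ε > 0
and N sufficiently large,
\[ \sum_{n=N}^{\infty} \frac{b_n}{10^n} < \varepsilon \]
We compare
\[ \sum_{n=N}^{\infty} \frac{b_n}{10^n} \le M \cdot \sum_{n=N}^{\infty} \frac{1}{10^n} = M \cdot \frac{1}{10^N} \cdot \frac{1}{1 - \frac{1}{10}} = \frac{1}{10^{N-1}} \cdot \frac{M}{9} \]

Since we can always take N sufficiently large so that 9 · 10^{N−1} · ε > M, we are done.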

Corollary 2.1.3 (S04.1). P (N) = {b | b ⊆ N} injects into R.

Proof. For b ⊆ N, set
\[ f(b) = \sum_{n=1}^{\infty} \frac{b_n}{10^n} \]
where
\[ b_n = \begin{cases} 1 & \text{if } n \in b \\ 0 & \text{if } n \notin b \end{cases} \]
Note that f is well defined, since
\[ \sum_{n=1}^{\infty} \frac{b_n}{10^n} \]
always converges by the previous example. Moreover, if a ≠ b, then taking the least N
which belongs to one of a, b but not the other (assume WLOG N ∈ a, N ∉ b), we have
\[ f(a) = \sum_{n=1}^{N-1} \frac{a_n}{10^n} + \frac{1}{10^N} + \sum_{n=N+1}^{\infty} \frac{a_n}{10^n} \]
\[ f(b) = \sum_{n=1}^{N-1} \frac{b_n}{10^n} + \frac{0}{10^N} + \sum_{n=N+1}^{\infty} \frac{b_n}{10^n} \]
Since no integer less than N is in one of a, b but not the other by the definition of N, we
have
\[ \sum_{n=1}^{N-1} \frac{a_n}{10^n} = \sum_{n=1}^{N-1} \frac{b_n}{10^n} \]
Note
\[ \sum_{n=N+1}^{\infty} \frac{b_n}{10^n} \le \frac{1}{10^N} \cdot \frac{1}{9} < \frac{1}{10^N} + \sum_{n=N+1}^{\infty} \frac{a_n}{10^n} \]

So f(b) < f(a), and f is injective.
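
For finite subsets the map f can be evaluated exactly, which gives a quick sanity check of injectivity on a small family of sets. This is a sketch only: it assumes b is a finite set of positive integers, so the series reduces to a finite sum, and it uses exact rational arithmetic from the standard library.

```python
from fractions import Fraction
from itertools import combinations

def f(b):
    """Exact value of sum_{n in b} 1/10^n for a finite set b of positive integers."""
    return sum((Fraction(1, 10 ** n) for n in b), Fraction(0))

# All 64 subsets of {1, ..., 6}; distinct subsets should give distinct values.
subsets = [frozenset(c) for r in range(7) for c in combinations(range(1, 7), r)]
values = {f(s) for s in subsets}
print(len(subsets), len(values))

# The least element where two sets differ decides the order, as in the proof:
# {1} contains 1 and {2, 3, 4} does not, so f({2, 3, 4}) < f({1}).
print(f({2, 3, 4}) < f({1}))
```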

Corollary 2.1.4. R ≅ P (N).

Proof. First, note P (Q) ≅ P (N), since we can map every subset A of Q to a unique
subset of N by applying a bijection from Q to N to each element of A. Then define f : R →
P (Q) by
f (x) = {q ∈ Q | q < x}
Note f is injective since Q is dense in R, so if x < y ∈ R, then there is z ∈ Q such that
x < z < y, and z ∈ f (y) but z ∉ f (x). Since P (N) injects into R by Corollary 2.1.3, CSB
(Cantor-Schröder-Bernstein) gives R ≅ P (N).
Theorem 2.1.5 (Cantor). P (N) is not countable.

Proof. Suppose for contradiction that there exists f : N → P (N) which is bijective. Let
b = {n ∈ N | n ∉ f (n)}. Then b is not in the range of f, as for every n ∈ N, n ∈ b if and
only if n ∉ f (n), so b ≠ f (n) for each n ∈ N. Thus, f is not onto, a contradiction.
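
The diagonal set can be computed explicitly for any finite table of sets, which shows the mechanism of the proof on a toy case. The table below and the name `diagonal` are hypothetical, chosen only for illustration.

```python
def diagonal(f, domain):
    """Return the set {n in domain : n not in f(n)}."""
    return frozenset(n for n in domain if n not in f(n))

# Hypothetical assignment n -> f(n) on the domain {0, 1, 2, 3}.
table = {
    0: frozenset({0, 1}),
    1: frozenset(),
    2: frozenset({0, 2}),
    3: frozenset({1, 2}),
}
b = diagonal(table.get, table.keys())

# b disagrees with f(n) about the membership of n, for every n in the domain.
print(sorted(b), all(b != table[n] for n in table))
```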

Proposition 2.1.6. R ≅ R × R

Proof. It suffices to show P (N) ≅ P (N) × P (N), as R ≅ P (N). Let A1, A2 ⊆ N be infinite
subsets of N with A1 ∪ A2 = N, A1 ∩ A2 = ∅.
Let gi : Ai → N be bijective for i = 1, 2. Define a function from P (N) to P (N) × P (N) by

b ↦ (g1(b ∩ A1), g2(b ∩ A2))

This is a bijection, since g1 and g2 are bijections.
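
One concrete choice, assumed here purely for illustration, is A1 = the even numbers with g1(n) = n/2 and A2 = the odd numbers with g2(n) = (n − 1)/2, taking N to include 0. The sketch below implements the map and its inverse on finite sets.

```python
def split(b):
    """Send b to (g1(b ∩ evens), g2(b ∩ odds)) with g1(n) = n/2 and g2(n) = (n-1)/2."""
    evens = frozenset(n // 2 for n in b if n % 2 == 0)
    odds = frozenset((n - 1) // 2 for n in b if n % 2 == 1)
    return evens, odds

def merge(pair):
    """Inverse map: interleave the two sets back into a single subset of N."""
    c1, c2 = pair
    return frozenset({2 * m for m in c1} | {2 * m + 1 for m in c2})

b = frozenset({0, 1, 4, 7, 10})
print(split(b), merge(split(b)) == b)
```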



Proposition 2.1.7 (S09.1). Let a0 = 0, a_{n+1} = √(6 + an). Show (an) converges, and find its
limit.
Solution. We proceed directly, without use of continuity. By induction, we show that an ≤ 3
for all n ∈ N:
Note a0 = 0 ≤ 3. Suppose an ≤ 3. Then

a_{n+1}^2 = 6 + an ≤ 6 + 3 = 9

so a_{n+1} ≤ 3. To prove a_{n+1}^2 ≥ an^2, note

a_{n+1}^2 = 6 + an ≥ an · 2 + an = 3an ≥ an^2

So (an) is increasing and bounded above by 3, so (an) has a limit ≤ 3.


Claim 2.1.8. For every ε > 0, (an) is not bounded by 3 − ε. This will show the limit is 3.
Proof. Restrict to ε < 4. We show that if 3 − ε is a bound, then so is 3 − 2ε. It is enough
to show that a_{n+1} < 3 − ε =⇒ an < 3 − 2ε. Say a_{n+1} < 3 − ε. Then

√(6 + an) < 3 − ε

=⇒ 6 + an < 9 − 6ε + ε^2
So an < 3 − 6ε + ε^2 < 3 − 2ε, where the last inequality is obtained because ε^2 < 4ε, since
ε < 4 by assumption.
Using repeated application of this to keep doubling ε, we get an ≤ 3 − 4 = −1. This is a
contradiction, since a0 > 3 − 4.
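
A short numerical check of the recursion a_{n+1} = √(6 + a_n) with a_0 = 0 (a floating-point sketch, not a proof) shows the same behaviour the solution establishes: the terms increase, never exceed 3, and approach 3.

```python
import math

def iterate(n):
    """Return [a_0, ..., a_n] for a_0 = 0 and a_{k+1} = sqrt(6 + a_k)."""
    a = 0.0
    seq = [a]
    for _ in range(n):
        a = math.sqrt(6 + a)
        seq.append(a)
    return seq

seq = iterate(30)
print(seq[:5], seq[-1])
```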
Proposition 2.1.9. Let (an), (bn), (cn) be sequences. If an < bn < cn for each n ∈ N, and
(an) and (cn) converge to the same limit l, then (bn) converges to l.
Proof. Let ε > 0 be given. Pick N large enough such that |an − l| < ε and |cn − l| < ε for
all n > N. Then for such n,

an − l < bn − l < cn − l < ε

Further,
ε > l − an > l − bn > l − cn
Hence, |bn − l| < ε.
Proposition 2.1.10. If lim_{n→∞} dn = l1 and lim_{n→∞} un = l2 and lim_{n→∞} (dn − un) = 0, then
l1 = l2.
Proof. Suppose not. Let ε > 0 be such that |l1 − l2| > ε. Take N large enough such that
\[ |d_N - l_1| < \frac{\varepsilon}{3} \quad \text{and} \quad |u_N - l_2| < \frac{\varepsilon}{3} \quad \text{and} \quad |d_N - u_N| < \frac{\varepsilon}{3} \]
Then
\[ |l_1 - l_2| = |l_1 - d_N + d_N - u_N + u_N - l_2| \le |l_1 - d_N| + |d_N - u_N| + |u_N - l_2| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon \]
a contradiction.

Proposition 2.1.11. Σ_{n=0}^∞ (−1)^n/(n + 1) converges.

Proof. Set
\[ s_n = \sum_{i=0}^{n} \frac{(-1)^i}{i+1} \]
The basic idea here is that the even-indexed and odd-indexed partial sums move in opposite
directions; that is, sn > sn+2 for each n even, and sn < sn+2 for n odd. We will build a
decreasing sequence out of elements of (sn) and an increasing sequence out of elements of (sn),
and wedge the terms of (sn) between these two sequences.
Let (un) be the sequence s0, s0, s2, s2, s4, s4, . . ., i.e. un = s_{2⌊n/2⌋}. Note that (un) is decreasing.
Let (dn) be the sequence s1, s1, s3, s3, s5, s5, . . ., i.e. dn = s_{2⌊n/2⌋+1}. Note that (dn) is
increasing.
Then for all n, dn ≤ sn ≤ un. Note (un) and (dn) are bounded below by 0 and bounded
above by 1 respectively, so both sequences converge. Finally, note
\[ |u_n - d_n| \le \frac{1}{n+1} \]
so by Proposition 2.1.10, (un) and (dn) converge to the same limit. Thus, by Proposition 2.1.9,
(sn) converges.
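
The wedging argument can be checked with exact arithmetic on the first few partial sums. In this sketch the upper sequence repeats the even-indexed partial sums and the lower sequence repeats the odd-indexed ones, written with indices 2⌊n/2⌋ and 2⌊n/2⌋ + 1; the variable names `s`, `u`, `d` follow the proof, and the cutoff of 40 terms is arbitrary.

```python
from fractions import Fraction

def partial_sums(count):
    """Return [s_0, ..., s_{count-1}] with s_n = sum_{i=0}^{n} (-1)^i / (i + 1)."""
    sums, total = [], Fraction(0)
    for i in range(count):
        total += Fraction((-1) ** i, i + 1)
        sums.append(total)
    return sums

s = partial_sums(40)
u = [s[2 * (n // 2)] for n in range(39)]      # s0, s0, s2, s2, ...
d = [s[2 * (n // 2) + 1] for n in range(39)]  # s1, s1, s3, s3, ...

# d_n <= s_n <= u_n, and the gap u_n - d_n is at most 1/(n+1).
print(all(d[n] <= s[n] <= u[n] for n in range(39)))
print(float(d[38]), float(u[38]))
```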

Proposition 2.1.12 (S05.5). Let
\[ S_N = \frac{1}{N} \sum_{n=1}^{N} a_n \]

(a) Suppose lim_{n→∞} an = A. Show lim_{N→∞} SN = A.

(b) Is the converse true? Prove your assertions.

Proof. (a) Let ε > 0 be given. Since (an) converges to A, there exists M0 large enough such
that |an − A| < ε/2 for all n > M0. Let M > M0 be large enough so that
\[ \frac{|(a_1 - A) + (a_2 - A) + \cdots + (a_{M_0} - A)|}{M} < \frac{\varepsilon}{2} \]
Then for N > M
\[ |S_N - A| = \left| \frac{a_1 + \cdots + a_N}{N} - A \right| \]
\[ \le \frac{|(a_1 - A) + \cdots + (a_{M_0} - A)|}{N} + \frac{|(a_{M_0+1} - A) + \cdots + (a_N - A)|}{N} \]
\[ < \frac{\varepsilon}{2} + \frac{|a_{M_0+1} - A| + \cdots + |a_N - A|}{N} \]
\[ < \frac{\varepsilon}{2} + \frac{(\varepsilon/2) \cdot (N - M_0)}{N} \le \varepsilon \]

(b) No. Consider an = (−1)^n. Then Σ_{n=1}^N an is −1 or 0 depending on whether N is odd or even,
so SN → 0. But (an) does not itself converge.
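
The counterexample in (b) is easy to tabulate. The sketch below computes the averages S_N exactly for a_n = (−1)^n; the helper name `cesaro_means` is an assumption made for this illustration.

```python
from fractions import Fraction

def cesaro_means(a, count):
    """Return [S_1, ..., S_count] where S_N = (a(1) + ... + a(N)) / N."""
    total, means = Fraction(0), []
    for n in range(1, count + 1):
        total += a(n)
        means.append(total / n)
    return means

means = cesaro_means(lambda n: (-1) ** n, 1000)

# S_N is -1/N for odd N and 0 for even N, so S_N -> 0 although a_n oscillates.
print(means[:6], means[999])
```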

Proposition 2.1.13 (S05.2). Let X be the space of sequences (σn)_{n=1}^∞ with σn ∈ {0, 1}.
Define a metric on X by
\[ d((\sigma_n), (\tau_n)) = \sum_{n=1}^{\infty} \frac{1}{2^n} |\sigma_n - \tau_n| \]
Prove (directly) that every infinite A ⊆ X has an accumulation point, i.e. X is sequentially
compact.
Proof. This will be another “lion in the desert” proof. The essential idea is that we can
make the distance between two sequences very small if they agree on a large number of first
digits. Let A ⊆ X be infinite. We will define ak ∈ {0, 1} for k ≥ 1 so that
(∗)k : There are infinitely many (σn) ∈ A extending (a1, . . . , a_{k−1}) (i.e., the first k − 1 digits of
(σn) are a1, . . . , a_{k−1}).
holds for each k. Note that (∗)1 holds because A is infinite, so there must either be an
infinite number of sequences starting with 0 or an infinite number of sequences starting with
1. Let a1 be 0 or 1 depending on which is the case.
Now suppose (∗)k holds. Note that
{(σn) extending (a1, . . . , a_{k−1})} = {(σn) extending (a1, . . . , a_{k−1}, 0)} ∪ {(σn) extending (a1, . . . , a_{k−1}, 1)}
= B0 ∪ B1
Since (∗)k holds, A must have infinite intersection with at least one of the sets in the union.
If A ∩ B0 is infinite, put ak = 0. Otherwise, set ak = 1. Then (∗)_{k+1} holds. It remains to show
that (ak)_{k=1}^∞ is an accumulation point of A.
Let ε > 0 be given. Let N be large enough so that 1/2^N < ε. Using (∗)_{N+1}, find (σn) ∈ A
which extends (a1, . . . , aN). Then
\[ d((\sigma_n), (a_n)) = \sum_{n=1}^{\infty} \frac{1}{2^n} |\sigma_n - a_n| = 0 + \sum_{n=N+1}^{\infty} \frac{1}{2^n} |\sigma_n - a_n| \le \sum_{n=N+1}^{\infty} \frac{1}{2^n} = \frac{1}{2^N} < \varepsilon \]

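Example 2.1.14. There is a related space Y = {(σn)_{n=1}^∞ | σn ∈ N} where the metric on Y is
defined by
\[ d((\sigma_n), (\tau_n)) = \begin{cases} 0 & \text{if } (\sigma_n) = (\tau_n) \\ \frac{1}{2^N} & \text{where } N \text{ is the least number such that } \sigma_N \ne \tau_N \end{cases} \]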

One can check that d is a metric, and in fact, (Y, d) is complete. But Y is not sequentially
compact. For example, the sequence of elements

(0, 0, 0, . . .)

(1, 0, 0, . . .)
(2, 0, 0, . . .)
..
.
has no convergent subsequence, since the distance between any two distinct elements of the sequence
is fixed at 1/2. In fact, (Y, d) is not even locally compact. In any open neighborhood of the
space, you can find a set of sequences which agree on a large number of digits, then each
differ in the N + 1 place, whence there is no convergent subsequence.

2.2 Lecture 6 - Some More Exam Problems and Continuity


Today, we will do some more exam problems. We will also introduce continuity.

Proposition 2.2.1 (F07.8). Suppose (an) is a sequence such that an > 0 for all n ∈ N, and
\[ \sum_{n=1}^{\infty} a_n = \infty \]
Does
\[ \sum_{n=1}^{\infty} \frac{a_n}{1 + a_n} = \infty? \]

Proof. Yes. Since
\[ \sum_{n=1}^{\infty} a_n = \infty, \]
the sequences
\[ \left( \sum_{n=1}^{k} a_n \right)_{k=1}^{\infty} \quad \text{and} \quad \left( \sum_{n=M}^{k} a_n \right)_{k=M}^{\infty} \]
are both unbounded for each M ∈ N.
Case 1: Suppose there exists M ∈ N such that for each n ≥ M, an ≤ 1. Then for all
n ≥ M,
\[ \frac{a_n}{1 + a_n} \ge \frac{a_n}{2} \]
Then
\[ \sum_{n=M}^{k} \frac{a_n}{1 + a_n} \ge \frac{1}{2} \sum_{n=M}^{k} a_n \]
for each k ≥ M; since the RHS is not bounded, the LHS is not bounded either. Hence,
certainly
\[ \sum_{n=1}^{k} \frac{a_n}{1 + a_n} \]
is not bounded.
Case 2: Suppose there are infinitely many n ∈ N such that an > 1. Then for each such n,
2an > an + 1, so
\[ \frac{a_n}{1 + a_n} > \frac{1}{2} \]
Thus,
\[ \left( \frac{a_n}{1 + a_n} \right)_{n=1}^{\infty} \]
is a sequence of positive real numbers with infinitely many greater than 1/2, so
\[ \sum_{n=1}^{\infty} \frac{a_n}{1 + a_n} = \infty \]

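Proposition 2.2.2 (F13.1). Suppose (an) is a sequence such that an > 0 for all n ∈ N. Let
\[ P_n = \prod_{j=1}^{n} (1 + a_j) \]
Prove
\[ \lim_{n \to \infty} P_n < \infty \iff \sum_{n=1}^{\infty} a_n < \infty \]

Proof. ( =⇒ ) It suffices to show that the partial sums are bounded, since each of the terms is
positive. Note that
\[ P_n = \prod_{j=1}^{n} (1 + a_j) = (1 + a_1) \cdots (1 + a_n) \]
\[ = 1 + (a_1 + \cdots + a_n) + (a_1 a_2 + a_1 a_3 + \cdots + a_{n-1} a_n) + \cdots + (a_1 a_2 \cdots a_n) \]
\[ \ge a_1 + a_2 + \cdots + a_n \]

So for each n ∈ N, since (Pn) is increasing,
\[ \sum_{j=1}^{n} a_j \le P_n \le \lim_{m \to \infty} P_m < \infty \]

( ⇐= ) Once again, it suffices to show that (Pn) is bounded. Assume
\[ \sum_{n=1}^{\infty} a_n < \infty \]
Then there exists N ∈ N such that
\[ \sum_{n=N}^{\infty} a_n < \frac{1}{2} \]
For k ≥ N,
\[ \prod_{j=N}^{k} (1 + a_j) = 1 + \sum_{j=N}^{k} a_j + \sum_{N \le j_1 < j_2 \le k} a_{j_1} a_{j_2} + \cdots + (a_N \cdots a_k) \]
\[ \le 1 + \left( \sum_{j=N}^{k} a_j \right) + \left( \sum_{j=N}^{k} a_j \right)^2 + \cdots + \left( \sum_{j=N}^{k} a_j \right)^{k-N+1} \]
\[ \le 1 + \frac{1}{2} + \left( \frac{1}{2} \right)^2 + \cdots + \left( \frac{1}{2} \right)^{k-N+1} \]
\[ \le 2 \]

Thus, for every k ≥ N,
\[ \prod_{j=1}^{k} (1 + a_j) = \prod_{j=1}^{N-1} (1 + a_j) \cdot \prod_{j=N}^{k} (1 + a_j) \le 2 \cdot \prod_{j=1}^{N-1} (1 + a_j) \]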

Proposition 2.2.3 (S10.11). Suppose Σ_{n=1}^∞ an converges absolutely. Show that every
rearrangement of Σ_{n=1}^∞ an converges to the same limit. (A rearrangement of Σ_{n=1}^∞ an is
Σ_{n=1}^∞ aσ(n) where σ : N → N is a bijection.)

Proof. Let a = Σ_{n=1}^∞ an. Let ε > 0 be given. We must show that there exists N ∈ N such
that for all k > N,
\[ \left| \sum_{n=1}^{k} a_{\sigma(n)} - a \right| < \varepsilon \]

Let N1 be large enough that for all k ≥ N1,
\[ \left| \sum_{n=1}^{k} a_n - a \right| < \frac{\varepsilon}{2} \quad \text{and} \quad \sum_{n=N_1}^{k} |a_n| < \frac{\varepsilon}{2} \]
where the first inequality is obtained using the convergence of Σ_{n=1}^∞ an to a, and the second
since Σ_{n=1}^∞ an converges absolutely. Let N > N1 be large enough that

{σ(1), . . . , σ(N)} ⊇ {1, . . . , N1}

which is possible as σ is a bijection. Fix k > N, and put

A = {σ(1), . . . , σ(k)} \ {1, 2, . . . , N1}

Then A is a finite set of indices greater than N1, so
\[ \left| \sum_{n=1}^{k} a_{\sigma(n)} - a \right| = \left| \sum_{n=1}^{N_1} a_n - a + \sum_{i \in A} a_i \right| \le \left| \sum_{n=1}^{N_1} a_n - a \right| + \sum_{i \in A} |a_i| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \]
Proposition 2.2.4 (F08.5). Suppose Σ_{n=1}^∞ an converges, but not absolutely. Then for every
a ∈ R, there is a rearrangement (aσ(n))_{n=1}^∞ such that Σ_{n=1}^∞ aσ(n) = a.

Proof. The key property here is that the sum of all positive (resp. negative) terms tends to
positive (resp. negative) infinity, but the terms themselves converge to 0. Let

X = {n | an ≥ 0}

Y = {n | an < 0}

Claim 2.2.5. (1) Σ_{n∈X} an = ∞
(2) Σ_{n∈Y} an = −∞

Proof. Since Σ_{n=1}^∞ an converges, each of (1), (2) implies the other, so it is enough to prove
that one of them holds (if one of the sums is unbounded but the other is finite, then
the sum over all n diverges). Suppose for contradiction that both (1) and
(2) fail. Then there exists M such that for all k ∈ N,
\[ \sum_{n \le k,\ n \in X} a_n < M \]
\[ \sum_{n \le k,\ n \in Y} a_n > -M \]

Then for all k ∈ N,
\[ \sum_{n=1}^{k} |a_n| = \sum_{n \le k,\ n \in X} a_n - \sum_{n \le k,\ n \in Y} a_n < 2M \]

i.e. the sum converges absolutely, a contradiction.


Note that an → 0 since Σ_{n=1}^∞ an converges. We will define sequences (un) and (vn)
inductively. Take σ to be the rearrangement such that σ(1), . . . , σ(k1) are the elements of X
which are ≤ u1; σ(k1 + 1), . . . , σ(j1) are the elements of Y which are ≤ v1; σ(j1 + 1), . . . , σ(k2)
are the elements of X which are in (u1, u2]; σ(k2 + 1), . . . , σ(j2) are the elements of Y which
are in (v1, v2]; and so on. This defines a bijection from N to N because X and Y partition
N, and σ is injective by construction; as we will see, σ will cover every element of X and Y,
so σ is also surjective. Let u1, v1 be sufficiently large that
\[ i > u_1,\ i \in X \implies a_i < \frac{1}{2} \]
\[ i > v_1,\ i \in Y \implies a_i > -\frac{1}{2} \]
This is possible since (an) converges to 0. This defines σ up to j1, so it also determines
Σ_{n=1}^{j1} aσ(n). Say WLOG that
\[ \sum_{n=1}^{j_1} a_{\sigma(n)} < a \]

Then set u2, picked so that adding elements of X in (u1, u2] puts the sum between a and
a + (1/2); we can do this since ai < 1/2 for each i > u1 with i ∈ X, and the sum over X
diverges. Similarly, set v2, picked so that
adding elements of Y in (v1, v2] puts the sum between a − (1/2) and a. Keep repeating this
process until uk, vk are sufficiently large that
\[ i > u_k,\ i \in X \implies a_i < \frac{1}{4} \]
\[ i > v_k,\ i \in Y \implies a_i > -\frac{1}{4} \]
Once again, this is possible since (an) converges to 0. Repeat the above step with 1/4 instead
of 1/2, then with 1/8, and so on; then it’s clear that the partial sums of the rearrangement are
Cauchy with limit a.
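
The rearrangement idea can be tried numerically. The sketch below does not implement the block construction with (un) and (vn) from the proof; it uses a simpler greedy variant of the same idea (add unused positive terms while the running sum is at most the target, otherwise add unused negative terms). The series Σ (−1)^{n+1}/n and the target value 0.3 are assumptions chosen for illustration.

```python
def rearranged_partial_sum(target, steps):
    """Greedily rearrange sum_{n>=1} (-1)^(n+1)/n to approach `target`.

    Returns the partial sum after `steps` terms and the order of indices used.
    """
    pos, neg = 1, 2  # next unused odd index (positive term) and even index (negative term)
    total, order = 0.0, []
    for _ in range(steps):
        if total <= target:
            total += 1.0 / pos
            order.append(pos)
            pos += 2
        else:
            total -= 1.0 / neg
            order.append(neg)
            neg += 2
    return total, order

total, order = rearranged_partial_sum(0.3, 20000)
print(total, order[:12])
```

Each index is used at most once, and since both the positive and the negative parts diverge, the greedy rule keeps switching sides, so every index is eventually used; the overshoot at each switch is bounded by the size of the last term added, which tends to 0.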

Continuity

Definition 2.2.6. Let X, Y be topological spaces. Then f : X → Y is continuous at
x0 ∈ X if for every open neighborhood V of f (x0), f⁻¹(V) contains an open neighborhood
of x0.
If the topologies are generated by some basic open sets, then f is continuous at x0 if and
only if for every basic open neighborhood V of f (x0), there exists a basic open neighborhood
U of x0 such that U ⊆ f⁻¹(V), i.e. f (U) ⊆ V.
In particular, in metric spaces (where the topology is generated by open balls), f is con-
tinuous at x0 if and only if for every ε > 0 (standing for the basic open ball B(f (x0), ε)),
there exists δ > 0 (standing for the basic open ball B(x0, δ)) such that x ∈ B(x0, δ) implies
f (x) ∈ B(f (x0), ε), i.e. dX(x, x0) < δ =⇒ dY(f (x), f (x0)) < ε.

Proposition 2.2.7. Let X and Y be metric spaces. Then f : X → Y is continuous at x if
and only if whenever (xn) converges to x, (f (xn)) converges to f (x).

Proof. ( =⇒ ) Suppose f is continuous at x, and (xn) converges to x. Let δ > 0 be given.
We must show that there exists N ∈ N such that n > N implies d(f (xn), f (x)) < δ. Since
f is continuous at x, there exists ε > 0 such that

d(y, x) < ε =⇒ d(f (y), f (x)) < δ

Since xn → x, there exists N ∈ N such that for all n > N, d(xn, x) < ε, whence
d(f (xn), f (x)) < δ.
( ⇐= ) Fix f, x, and let δ > 0 be given. We must find ε > 0 such that d(x, y) < ε im-
plies d(f (x), f (y)) < δ. Suppose no such ε exists. In particular, for each n, ε = 1/n does
not satisfy the desired condition. Then there exists y ∈ X such that d(x, y) < 1/n but
d(f (x), f (y)) ≥ δ. Let xn be some such y for each n ∈ N. Then (xn) converges to x, but
(f (xn)) does not converge to f (x), a contradiction.
Definition 2.2.8. Let X, Y be topological spaces. Then f : X → Y is continuous if and
only if it is continuous at all x ∈ X.
Proposition 2.2.9. f : X → Y is continuous if and only if for every open V ⊆ Y , f −1 (V )
is open in X.
Proof. ( =⇒ ) Let f be continuous, V ⊆ Y be open, and put U = f −1 (V ). Fix x ∈ U . Then
by continuity of f at x, since V is an open neighborhood of f (x), U must contain an open
neighborhood of x.
( ⇐= ) Fix f, x, and let V ⊆ Y be an open neighborhood of f (x). Since V is open, f −1 (V )
is an open neighborhood of x, whence f is continuous at x.
Corollary 2.2.10. The composition of continuous functions is continuous.
Note: +, −, · are continuous by their definitions on elements of R.
Theorem 2.2.11 (Intermediate Value Theorem). Let f : X → R be continuous. Suppose X
is connected. Then if f takes values y0 , y1 (with y0 < y1 , say) then f takes every value in
(y0 , y1 ). Precisely, for every y ∈ (y0 , y1 ), there exists x ∈ X such that f (x) = y.
Proof. Suppose y ∈ (y0, y1) and for all x ∈ X, f (x) ≠ y. Put
A = {x ∈ X | f (x) < y}; B = {x ∈ X | f (x) > y}
It is clear that A ∩ B = ∅ and X = A ∪ B by assumption. Both A and B are open in
X by continuity, since A = f −1 ((−∞, y)) and B = f −1 ((y, ∞)). Additionally, both A and
B are nonempty, since there exists x ∈ X such that f (x) = y0 < y and z ∈ X such that
f (z) = y1 > y. This is a contradiction, since X is connected.
Corollary 2.2.12 (W06.6). Let f : [a, b] → R be continuous. Then f takes every value
between f (a) and f (b).
Proof. Since we showed earlier that [a, b] is connected, this is immediate by the Intermediate
Value Theorem.
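
For functions on an interval, the Intermediate Value Theorem has a constructive counterpart: bisection. The sketch below is an illustration under the stated assumptions (f continuous on [a, b] with f(a) ≤ y ≤ f(b)); the function name `solve`, the tolerance, and the sample function t³ + t are choices made here and do not come from the lecture.

```python
def solve(f, a, b, y, tol=1e-12):
    """Approximate x in [a, b] with f(x) = y, assuming f is continuous
    on [a, b] and f(a) <= y <= f(b)."""
    lo, hi = a, b
    # Invariant: f(lo) <= y, and either hi == b or f(hi) > y.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) <= y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Sample use: f(t) = t^3 + t takes the values 0 and 10 at the endpoints of [0, 2],
# so it must take the value 5 somewhere in between.
x = solve(lambda t: t ** 3 + t, 0.0, 2.0, 5.0)
print(x, x ** 3 + x)
```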

2.3 Lecture 7 - Path-Connectedness, Lipschitz Functions and Contractions,
and Fixed Point Theorems
Today, we will discuss the idea of path-connectedness, and show that there are sets in R2
which are connected but not path connected. We will also introduce classes of continuous
functions which have stronger conditions on how much the function can grow between points
which are close together. Finally, we will prove some valuable fixed point theorems for these
classes of functions.

Definition 2.3.1. A metric space (X, d) is path-connected if for all x0 , x1 ∈ X, there
exists a continuous f : [0, 1] → X such that f (0) = x0 , f (1) = x1 .

Proposition 2.3.2. If (X, d) is path-connected, then it is connected.

Proof. Suppose A, B are open, nonempty, disjoint subsets such that A ∪ B = X. Pick
x0 ∈ A, x1 ∈ B. Since (X, d) is path-connected, there exists a continuous f : [0, 1] → X such
that f (0) = x0, f (1) = x1. Then f⁻¹(A), f⁻¹(B) decompose [0, 1] into disjoint nonempty
open sets, which is a contradiction as [0, 1] is connected.

Proposition 2.3.3. If f : [a, b] → X is continuous, then im(f ) = f ([a, b]) is connected.

Proof. Suppose A, B are open, nonempty, disjoint subsets of im(f ) such that A ∪ B = im(f ).
Proceed as in the proposition above.

Proposition 2.3.4 (S11.11(b)). Give an example of a set in R² which is connected
but not path-connected.

Proof. The classic example of a connected subset of R² which is not path-connected is called
the Topologist’s Sine Curve. Let

Z = {(x, y) ∈ R² | x = 0, y = 0 or x ≠ 0, y = sin(1/x)}

The idea here is that while the graph cannot be broken up into disjoint open pieces, the
curve starts to wiggle so violently near the origin that no continuous curve from [0, 1] can
move “quickly” enough to pass through all the points.
Claim 2.3.5. Z is connected.
Proof. Suppose for contradiction that Z = A ∪ B, where A, B are open, disjoint, and
nonempty. The topology on Z is inherited from R², so we have A∗, B∗ open in R² such
that A = A∗ ∩ Z, B = B∗ ∩ Z.
One of A, B must contain the origin. WLOG, say (0, 0) ∈ A. Then (0, 0) ∈ A∗, so there
exists ε > 0 such that B((0, 0), ε) ⊆ A∗. So A has points in both Z− and Z+, where

Z− = {(x, y) ∈ Z | x < 0}

and
Z+ = {(x, y) ∈ Z | x > 0}
B has points in at least one of Z−, Z+, since (0, 0) ∉ B and B is nonempty. Suppose WLOG
that B ∩ Z− ≠ ∅. Thus, A and B both have points in Z−. Then A ∩ Z− and B ∩ Z−
are nonempty, disjoint, open subsets of Z− whose union is Z−, so Z− is not connected. But by the previous
proposition, Z− must be connected, as it is the image of a continuous function on an interval
(namely, (−∞, 0)).
Claim 2.3.6. Z is not path connected.

Proof. Let (x0 , sin(1/x0 )) = (x0 , y0 ) and x1 = 0, y1 = 0 be in Z with x0 < 0. Suppose
for contradiction that there exists continuous f : [0, 1] → Z such that f (0) = (x0 , y0 ) and
f (1) = (x1 , y1 ). Note that for any α ∈ [0, 1] we may write f (α) as (fX (α), fY (α)) with fX , fY
continuous by composing f with the continuous projections to the x and y-axis respectively.
Let (un ) be an increasing sequence converging to 0 such that fX (0) = x0 < un < 0 = fX (1)
and
\[
\sin\left(\frac{1}{u_n}\right) = \begin{cases} 1 & \text{if } n \text{ is even} \\ -1 & \text{if } n \text{ is odd} \end{cases}
\]
By the Intermediate Value Theorem, there exists α0 ∈ [0, 1] such that fX (α0 ) = u0 . Then
by the Intermediate Value Theorem applied inductively on [αn , 1], we have αn < αn+1 < 1 such that
fX (αn+1 ) = un+1 . Since (αn ) is an increasing sequence which is bounded above, it must
converge to some limit, say α ≤ 1.
Since fX is continuous, fX (αn ) must converge to fX (α). Since fX (αn ) = un and un converges
to 0, we must have fX (α) = 0. Since (fX (α), fY (α)) ∈ Z, we must have fY (α) = 0. By
the continuity of fY , this means that (fY (αn )) converges to 0. But (fX (αn ), fY (αn )) =
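(un , sin(1/un )) ∈ Z, so
\[
f_Y(\alpha_n) = \sin\left(\frac{1}{u_n}\right) = \begin{cases} 1 & \text{if } n \text{ is even} \\ -1 & \text{if } n \text{ is odd} \end{cases}
\]
whence (fY (αn )) does not converge, a contradiction.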


This completes the proof.
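
The proof of Claim 2.3.6 only needs some increasing sequence (un ) of negative numbers tending to 0 along which sin(1/un ) alternates between 1 and −1. As a rough illustration, the sketch below checks one explicit candidate numerically. The formula u_n = −1/((n + 3/2)π) and the names u and sine_values are assumptions made here for illustration; they do not appear in the notes, and any sequence with the same properties would do.

```python
import math

def u(n):
    # Hypothetical explicit choice (not from the notes): u_n = -1 / ((n + 3/2) * pi).
    return -1.0 / ((n + 1.5) * math.pi)

def sine_values(num_terms):
    # Values sin(1/u_n) for n = 0, ..., num_terms - 1.
    return [math.sin(1.0 / u(n)) for n in range(num_terms)]

if __name__ == "__main__":
    for n, value in enumerate(sine_values(8)):
        # Each row shows n, the (negative) point u_n, and sin(1/u_n) rounded.
        print(n, u(n), round(value, 9))
```

With this choice, 1/u_n = −(n + 3/2)π, so sin(1/u_n) is 1 for even n and −1 for odd n, which is the alternation the proof exploits. To satisfy x0 < un one would discard finitely many initial terms.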

Theorem 2.3.7 (Extreme Value Theorem). Continuous functions f : X → R on compact
spaces X take minimum and maximum values.

Proof. Suppose f does not attain a maximum on X. Then there exists a sequence (yn )n∈N ⊆
im(f ) such that (∗) for all x ∈ X, there exists n ∈ N such that f (x) < yn . Set An =
f −1 ((−∞, yn )); note each An is open by the continuity of f . By (∗), {An }n∈N is an open
cover for X. By compactness, there exists k ∈ N such that
\[
X \subseteq \bigcup_{n \le k} A_n = f^{-1}\big((-\infty, \max\{y_0, \ldots, y_k\})\big)
\]
But this is a contradiction, since max{y0 , . . . , yk } = yi for some i ∈ {0, . . . , k}, so yi ∈ im(f ), i.e. yi = f (x) for some x ∈ X, and yet f (x) < yi for every x ∈ X.
The proof is similar for minimum values.
Here is an alternative proof. Note that the continuous image of a compact set is compact
(we will prove this later). Hence, f (X) ⊆ R is compact, whence it is closed and bounded by
Heine-Borel. Since f (X) is bounded, it has a least upper bound α. Note α is a limit point
of f (X) (every open ball about α must contain a point of f (X), or else α would not be the
least upper bound for f (X)), whence α ∈ f (X) since f (X) is closed. Hence, there exists
x ∈ X such that f (x) = α. The proof that f takes minimum values is nearly identical.

Definition 2.3.8. Let (X, TX ), (Y, TY ) be topological spaces. The product topology on
X × Y is the topology generated by U × V, U ∈ TX , V ∈ TY .

Proposition 2.3.9. Let (X, d) be a metric space. Then d : X × X → R is continuous.

Proof. Fix (x, y) ∈ X^2 . Let α = d(x, y), and let ε > 0 be given. We must find some open
neighborhood U × V of (x, y) in X^2 such that for all (x̂, ŷ) ∈ U × V , d(x̂, ŷ) ∈ B(α, ε). Take
U = B(x, ε/2), V = B(y, ε/2). Then for all (x̂, ŷ) ∈ U × V , we have

d(x̂, ŷ) ≤ d(x̂, x) + d(x, y) + d(y, ŷ) =⇒ d(x̂, ŷ) − α < ε

and
d(x, y) ≤ d(x, x̂) + d(x̂, ŷ) + d(ŷ, y) =⇒ α − d(x̂, ŷ) < ε
so
|d(x̂, ŷ) − α| < ε
i.e., d(x̂, ŷ) ∈ B(α, ε).

Fixed Point Theorems

Definition 2.3.10. Let X, Y be metric spaces. Then f : X → Y is Lipschitz with constant
L if for all x1 , x2 ∈ X,

dY (f (x1 ), f (x2 )) ≤ L · dX (x1 , x2 )

Proposition 2.3.11. All Lipschitz functions are continuous.

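Proof. Suppose f is Lipschitz with constant L; we may assume L > 0, since enlarging L preserves the Lipschitz condition. Given ε > 0, take δ = ε/L. Then for any x, y ∈ X, d(x, y) < δ implies d(f (x), f (y)) ≤ L · d(x, y) < ε. (Note that
this also shows that all Lipschitz functions are uniformly continuous; see Lecture 8).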

Definition 2.3.12. f : X → Y is a contraction if it is Lipschitz with constant L < 1.

Theorem 2.3.13 (W08.1(b) - Banach Fixed Point Theorem). Let f : X → X be a contraction,
where X is a complete metric space. Then f has a fixed point, i.e. there is x ∈ X such
that f (x) = x. Moreover, the fixed point is unique.

Proof. Uniqueness: Suppose f (x) = x, f (y) = y. Since f is a contraction, d(f (x), f (y)) ≤
L · d(x, y) for some L < 1. But f (x) = x, f (y) = y, so d(f (x), f (y)) = d(x, y). Hence d(x, y) ≤ L · d(x, y) with L < 1, so it must be
that d(x, y) = 0, whence x = y.
Existence: There’s not much we can do here, so we start by picking an arbitrary point in
X and applying f . Then we hope for the best.
Let x0 ∈ X. Set xn+1 = f (xn ). For every n ∈ N,

(∗) d(xn+2 , xn+1 ) = d(f (xn+1 ), f (xn )) ≤ L · d(xn+1 , xn )

Using (∗) by induction, we obtain

d(xn+1 , xn ) ≤ L^n · d(x1 , x0 )

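Additionally, for every k > n + 1, we have
\[
\begin{aligned}
d(x_{n+1}, x_k) &\le d(x_{n+1}, x_{n+2}) + d(x_{n+2}, x_{n+3}) + \cdots + d(x_{k-1}, x_k) \\
&\le L \cdot d(x_n, x_{n+1}) + L^2 \cdot d(x_n, x_{n+1}) + \cdots + L^{k-(n+1)} \cdot d(x_n, x_{n+1}) \\
&\le d(x_n, x_{n+1}) \cdot \sum_{i=1}^{\infty} L^i \\
&= d(x_n, x_{n+1}) \cdot \frac{L}{1-L} \\
&\le d(x_0, x_1) \cdot \frac{L^{n+1}}{1-L}
\end{aligned}
\]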
In particular, since L^{n+1} → 0 as n → ∞, we see (xn ) is Cauchy. By completeness, xn → x for some x ∈ X. Then
xn+1 → x also, so f (xn ) → x. Finally, by continuity, we have
\[
x = \lim_{n\to\infty} f(x_n) = f\left(\lim_{n\to\infty} x_n\right) = f(x)
\]
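
The existence argument is constructive: iterate f from any starting point. The following is a minimal sketch of that iteration, not part of the original notes. The choice f = cos on [0, 1] is an assumption made for illustration: cos maps [0, 1] into itself and, since |sin x| ≤ sin(1) < 1 there, it is a contraction with constant L = sin(1) (by the mean value theorem from Lecture 10). The helper name fixed_point_iterates is hypothetical.

```python
import math

def fixed_point_iterates(f, x0, num_steps):
    # Return [x_0, x_1, ..., x_num_steps] where x_{n+1} = f(x_n).
    xs = [x0]
    for _ in range(num_steps):
        xs.append(f(xs[-1]))
    return xs

# Illustrative assumption: f = cos on the complete metric space [0, 1],
# a contraction there with constant L = sin(1) < 1.
L = math.sin(1.0)
xs = fixed_point_iterates(math.cos, 1.0, 100)
x_star = xs[-1]

if __name__ == "__main__":
    print("approximate fixed point:", x_star)
    print("residual |cos(x) - x|:", abs(math.cos(x_star) - x_star))
```

The successive differences |x_{n+1} − x_n| in this run should obey the geometric bound L^n · |x_1 − x_0| derived in the proof, which is what forces the sequence to be Cauchy.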

Theorem 2.3.14 (W08.1(a) - Brouwer Fixed Point Theorem). Let g : [a, b] → [a, b] be
continuous. Prove g has a fixed point.

Proof. Let f (x) = g(x) − x; note f is continuous. We have g(a) ≥ a, since g(a) ∈ [a, b], so
f (a) ≥ 0. Similarly, f (b) ≤ 0. Hence, by the Intermediate Value Theorem, there exists some
x ∈ [a, b] such that f (x) = 0, i.e. g(x) − x = 0, so g(x) = x.

Proposition 2.3.15 (F11.1). Let f : X → X. Suppose for all x ≠ y, d(f (x), f (y)) < d(x, y).
Suppose X is compact. Prove f has a unique fixed point.

Proof. Uniqueness: If f (x) = x, f (y) = y for x ≠ y, then d(x, y) = d(f (x), f (y)) < d(x, y),
a contradiction.
Existence: Let g(x) = d(x, f (x)). Note f is continuous, since it is Lipschitz with constant 1. Since the component functions x ↦ x and x ↦
f (x) are continuous and d is continuous, g : X → [0, ∞) is continuous. Since X is compact, g
attains a minimum value. Let x ∈ X be such that g(x) is the minimum value of g on X. Suppose
g(x) = d(x, f (x)) = α > 0. Then x ≠ f (x), so by our assumption on f , g(f (x)) = d(f (x), f (f (x))) < d(x, f (x)) = α,
contradicting the minimality of α. Hence g(x) = d(x, f (x)) = 0, so
x = f (x).

2.4 Lecture 8 - Uniformity, Normed Spaces and Sequences of Functions
Today, we will discuss the idea of uniform continuity. We will also introduce pointwise and
uniform convergence of sequences of functions.

Uniformity

Definition 2.4.1. Let X, Y be metric spaces. Recall f : X → Y is continuous if for every
x ∈ X and ε > 0, there exists δ > 0 such that for all y ∈ X, d(x, y) < δ =⇒ d(f (x), f (y)) < ε.
We say f : X → Y is uniformly continuous if for every ε > 0, there exists δ > 0 such that
for all x, y ∈ X, d(x, y) < δ =⇒ d(f (x), f (y)) < ε.

Proposition 2.4.2. If a function is uniformly continuous, it is continuous.

Proposition 2.4.3. Lipschitz functions are uniformly continuous.

Proof. Suppose f is Lipschitz with constant L. Then for all ε > 0, d(x, y) < ε/L =⇒
d(f (x), f (y)) < ε.

Theorem 2.4.4. Any continuous function f on a compact space X is uniformly continuous.

Proof. Fix f and ε > 0. We must find a δ > 0 such that for all x, y ∈ X, d(x, y) < δ =⇒
d(f (x), f (y)) < ε. By continuity, for every z ∈ X, there exists δz > 0 such that d(x, z) <
δz =⇒ d(f (x), f (z)) < ε/2. Then for all x, y ∈ B(z, δz ),
\[
(*) \quad d(f(x), f(y)) \le d(f(x), f(z)) + d(f(z), f(y)) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon
\]
Note that {B(z, δz /2) | z ∈ X} is an open cover for X. By compactness, there are z1 , . . . , zk
such that B = {B(z1 , δz1 /2), . . . , B(zk , δzk /2)} covers X. Set δ = min{δz1 /2, . . . , δzk /2}. Suppose
d(x, y) < δ. Note x ∈ B(zi , δzi /2) for some i ∈ {1, . . . , k}, since B is an open cover for
X. Since d(x, y) < δ ≤ δzi /2, by the triangle inequality we have y ∈ B(zi , δzi ). By (∗),
d(f (x), f (y)) < ε.

Proposition 2.4.5 (S04.2). f (x) = √x is uniformly continuous on [0, ∞).

Proof. We break the proof into two steps, which we glue together in the third step.
Step 1: Observe √x is uniformly continuous on [0, 3] by the previous theorem, since √x is
continuous on [0, 3] (this is justified in the remark following the proof) and [0, 3] is compact.
Step 2: We show f is Lipschitz on [1, ∞), hence uniformly continuous. Let x ≥ y ≥ 1. Note
\[
\sqrt{x} = \sqrt{y} + (\sqrt{x} - \sqrt{y})
\]
so, squaring both sides,
\[
x = y + 2\sqrt{y}(\sqrt{x} - \sqrt{y}) + (\sqrt{x} - \sqrt{y})^2
\]
\[
\implies x - y = (\sqrt{x} - \sqrt{y})^2 + 2\sqrt{y}(\sqrt{x} - \sqrt{y}) \ge 2\sqrt{y}(\sqrt{x} - \sqrt{y}) \ge 2(\sqrt{x} - \sqrt{y})
\]
Thus, on [1, ∞),
\[
\sqrt{x} - \sqrt{y} \le \frac{1}{2}(x - y)
\]
Step 3: We now need to merge these two observations to get uniform continuity on the
full space [0, ∞). Fix ε > 0. By steps 1, 2, there exist δ1 , δ2 > 0 such that x, y ∈ [0, 3]
and |x − y| < δ1 implies |f (x) − f (y)| < ε, and x, y ∈ [1, ∞) and |x − y| < δ2 implies
|f (x) − f (y)| < ε. Take δ = min(δ1 , δ2 , 1). If |x − y| < δ, then either x, y are both in [0, 3]
or x, y are both in [1, ∞), and in each case |f (x) − f (y)| < ε.



Of course, you should now be asking: why is √x continuous in the first place? In fact,
a much more general class of functions is continuous, of which √x is a particular example.

Proposition 2.4.6. The forward image of a compact set under a continuous function is compact.

Proof. Let f : X → Y be continuous, and let K ⊆ X be compact. Let {Vi }i∈I be an
open cover of f (K). Then {f −1 (Vi )}i∈I is an open cover of K by the continuity of f . By
compactness of K, there is a finite subcover {f −1 (Vi1 ), . . . , f −1 (Vin )}. Then Vi1 , . . . , Vin cover
f (K).

Proposition 2.4.7. If f : X → Y is a continuous bijection, Y is Hausdorff, and X is
compact, then f −1 : Y → X is also continuous. In particular, this shows √x is continuous
on [0, M ] for each M (apply the proposition to the bijection x ↦ x^2 from [0, √M ] to [0, M ]), hence on [0, ∞).

Proof. Fix U ⊆ X open. We must show f (U ) is open. Let K = X \ U . Then K is compact,
since K is a closed subset of the compact space X. Note f (U ) = Y \ f (K) since f is a bijection. By the
previous proposition, f (K) is compact, hence closed, since Y is Hausdorff. So f (U ) is
open.

Limits of functions

Definition 2.4.8. Let X, Y be metric spaces, E ⊆ X, and f : E → Y . Let a ∈ X (with
possibly a ∉ E) with points of E arbitrarily close to a. Then we say limx→a,x∈E f (x) exists
and is equal to y ∈ Y if for every ε > 0, there exists δ > 0 such that d(x, a) < δ, x ∈ E, x ≠ a
implies d(f (x), y) < ε.
Note: some people allow x = a in the limit. We will also use this with a = ±∞.

Example 2.4.9. Let f (x) = 1/x from (0, ∞) to (0, ∞). Then limx→∞ f (x) = 0.

Convergence of sequences of functions

Definition 2.4.10. Let fn : X → Y , n ∈ N be a sequence of functions. Let f : X → Y . We
say (fn ) converges to f pointwise on X if for all x ∈ X, limn→∞ fn (x) exists and equals
f (x).

Example 2.4.11. Take fn : [0, 1] → [0, 1] given by fn (x) = x^n . Then fn (x) converges
pointwise to
\[
f(x) = \begin{cases} 0 & \text{if } x \in [0, 1) \\ 1 & \text{if } x = 1 \end{cases}
\]
This is an example of a pointwise limit of continuous functions which is not continuous.
Another way to put it is:
\[
\lim_{n\to\infty} \lim_{x\to a} f_n(x) \quad \text{need not equal} \quad \lim_{x\to a} \lim_{n\to\infty} f_n(x)
\]
even if both limits exist (e.g., in the example above, take a = 1).

Definition 2.4.12. Let (fn ) be a sequence of functions from X to Y and let f : X → Y .
Then (fn ) converges to f uniformly on X if for every ε > 0, there exists N ∈ N such that
for all x ∈ X and all n ≥ N , d(fn (x), f (x)) < ε.

The key difference between uniform convergence and pointwise convergence is that in
uniform convergence, N does not depend on the choice of x. This is similar to the case of
uniform continuity, which differs from continuity in that δ is independent of the choice of x.
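
To see the distinction numerically, the sketch below revisits fn (x) = x^n on [0, 1] from Example 2.4.11. It is only an illustration under assumptions made here: the witness points x = 1 − 1/n and the function names are not from the notes. At any fixed x < 1 the values x^n tend to 0, yet at the moving point 1 − 1/n the gap |fn (x) − f (x)| stays at least 1/4, so no single N can work for every x.

```python
def f_n(n, x):
    # The n-th function of the example: f_n(x) = x**n on [0, 1].
    return x ** n

def f_limit(x):
    # The pointwise limit: 0 on [0, 1) and 1 at x = 1.
    return 1.0 if x == 1.0 else 0.0

def gap_at_witness(n):
    # |f_n(x) - f(x)| at the (assumed) witness point x = 1 - 1/n, for n >= 2.
    x = 1.0 - 1.0 / n
    return abs(f_n(n, x) - f_limit(x))

# Pointwise convergence at the fixed point x = 0.9.
pointwise = [f_n(n, 0.9) for n in (1, 10, 100, 1000)]
# Failure of uniform convergence: the gap at 1 - 1/n does not shrink to 0.
gaps = [gap_at_witness(n) for n in (2, 10, 100, 1000)]

if __name__ == "__main__":
    print("values at x = 0.9:", pointwise)
    print("gaps at x = 1 - 1/n:", gaps)
```

The gaps (1 − 1/n)^n increase toward 1/e rather than decreasing to 0, which is consistent with the limit function being discontinuous at 1.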

Proposition 2.4.13. If (fn ) converges to f uniformly, (fn ) converges to f pointwise.

Proposition 2.4.14. Suppose (fn ) converges to f uniformly. Suppose fn is continuous at
x0 ∈ X for each n ∈ N. Then so is f .

Proof. Let ε > 0 be given. We need to find δ > 0 such that d(x0 , y) < δ =⇒ d(f (x0 ), f (y)) < ε.
Since (fn ) converges to f uniformly, there exists N ∈ N such that for all n ≥ N and all y ∈
X, d(fn (y), f (y)) < ε/3. By the continuity of fN at x0 , there exists δ > 0 such that d(x0 , y) <
δ =⇒ d(fN (x0 ), fN (y)) < ε/3. Then if d(x0 , y) < δ, we have
\[
d(f(x_0), f(y)) \le d(f(x_0), f_N(x_0)) + d(f_N(x_0), f_N(y)) + d(f_N(y), f(y)) < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon
\]

Corollary 2.4.15. If (fn ) converges uniformly to f on X, and fn is continuous on X for
each n ∈ N, then so is f . If (fn ) converges uniformly to f on X, and limx→x0 limn→∞ fn (x)
exists, then so does limn→∞ limx→x0 fn (x), and the two limits are equal.

Normed Spaces

Definition 2.4.16. Let V be a vector space. A norm on V is a function || · || : V → [0, ∞)
such that

(1) If x ≠ 0 then ||x|| ≠ 0

(2) ||αx|| = |α|||x|| (in particular, ||~0|| = 0)

(3) (triangle inequality) ||x + y|| 6 ||x|| + ||y||

We can then define d(x, y) = ||x − y||. This will be a metric on V .

Definition 2.4.17. For any set X, let

B(X) = {f | f is a bounded function (both above and below) from X → R}

Then ||f || := supx∈X |f (x)| is a norm on B(X) called the sup norm.

Note then that for (fn ) ⊆ B(X) and f ∈ B(X), (fn ) converges to f uniformly on X if
and only if (fn ) converges to f in the metric on B(X). We can also consider convergence
of sums of functions. For a sequence of functions (fn ) from X to R, for
each x ∈ X we take
\[
\sum_{n=1}^{\infty} f_n(x)
\]
to be the limit as k → ∞ of
\[
s_k(x) = \sum_{n=1}^{k} f_n(x)
\]

Theorem 2.4.18. If (fn ) consists of bounded, continuous functions on X and
\[
\sum_{n=1}^{\infty} \|f_n\|_{\sup}
\]
converges, then \(\sum_{n=1}^{\infty} f_n\) converges uniformly to a continuous function f .

Proof. (Exercise). Look at tail sums, and use uniform convergence.
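
The hint about tail sums can be checked numerically. The sketch below is an illustration under assumptions made here, not part of the notes: it takes fn (x) = sin(nx)/n^2 on [0, 2π], for which the sup norm of fn is at most 1/n^2, and compares the observed distance between two partial sums on a grid with the corresponding tail of the series of norms. The function names and the grid are hypothetical choices.

```python
import math

def partial_sum(k, x):
    # s_k(x) = sum_{n=1}^{k} sin(n x) / n^2  (illustrative choice of f_n).
    return sum(math.sin(n * x) / n ** 2 for n in range(1, k + 1))

def tail_bound(k, m):
    # sum_{n=k+1}^{m} 1/n^2, an upper bound for sum_{n=k+1}^{m} ||f_n||_sup.
    return sum(1.0 / n ** 2 for n in range(k + 1, m + 1))

# Sample points in [0, 2*pi]; a grid only approximates the true supremum.
grid = [2 * math.pi * i / 200 for i in range(201)]
k, m = 20, 200
observed = max(abs(partial_sum(m, x) - partial_sum(k, x)) for x in grid)
bound = tail_bound(k, m)

if __name__ == "__main__":
    print("max |s_m - s_k| on grid:", observed)
    print("tail of the norm series:", bound)
```

Since the bound does not depend on x and tends to 0 as k grows, the partial sums are uniformly Cauchy, which is the mechanism behind the theorem.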

Getting a convergent subsequence

Definition 2.4.19. A sequence of functions \((f_n)_{n=1}^{\infty}\) is (pointwise) equicontinuous at x0 if
for every ε > 0, there exists δ > 0 such that for all n ∈ N, d(x0 , y) < δ implies d(fn (x0 ), fn (y)) <
ε. Note δ may depend on x0 , but not on n!

Definition 2.4.20. A sequence of functions \((f_n)_{n=1}^{\infty}\) is pointwise bounded if for every
x0 ∈ X, there exists M such that for all n ∈ N, |fn (x0 )| < M . (At each point, the bound
might be different.)

Theorem 2.4.21 (Arzela-Ascoli). Let X be a separable metric space. Suppose fn : X → R,
and

(i) (fn ) is pointwise bounded

(ii) (fn ) is pointwise equicontinuous

Then (fn ) has a subsequence which converges pointwise to a continuous function. Moreover,
the convergence is uniform on compact subsets of X. Note: If X is a compact metric
space, then X is automatically separable.

3 Week 3
As per the syllabus, Week 3 topics include: definition of derivative, derivative for inverse
function, local maxima and minima, Rolle’s theorem, mean value theorem, Rolle’s the-
orem for higher order derivatives and applications to error bounding for approximations
by Lagrange interpolations, monotonicity, L’Hopital’s rule, uniform convergence limits of
derivatives (in homework), upper and lower Riemann integrals, upper and lower Riemann
sums, definition of the Riemann integral, integrability of bounded continuous functions on
bounded intervals, basic properties of the Riemann integral, integrability of mins, maxes,
sums, and products, Riemann-Stieltjes integral, the fundamental theorems of calculus, integration
by parts, change of variables in integration, improper integrals, integrals of uniform
convergence limits (in homework), Cauchy-Schwarz inequality.

3.1 Lecture 9 - Arzela-Ascoli, Differentiation and Associated Rules


Today, we will discuss the Arzela-Ascoli theorem. We will also introduce differentiation,
and discuss some important rules which allow us to compute derivatives of certain classes of
functions.
Recall, from last time:

Theorem 3.1.1 (Arzela-Ascoli). Let X be a separable metric space. Suppose fn : X → R,
and

(i) (fn ) is pointwise bounded

(ii) (fn ) is pointwise equicontinuous

Then (fn ) has a subsequence which converges pointwise to a continuous function. Moreover,
the convergence is uniform on compact subsets of X. Note: If X is a compact metric
space, then X is automatically separable.

Proof. Let D ⊆ X be countable and dense (such a D exists since X is separable). We will
proceed in four parts.

(1) Get a subsequence which converges pointwise on D (using pointwise boundedness)

(2) Show it converges pointwise everywhere on X (using equicontinuity)

(3) Check the limit is continuous (using equicontinuity)

(4) Show uniform convergence on compact sets

Part 1: We will show (fn ) has a subsequence which converges on D. Let (zl )l∈N enumerate
D, starting from z0 (this is possible since D is countable). The idea here is that we will find a subsequence of
(fn ) which converges at zl for each l ∈ N. We will then patch these subsequences together
in a way that one particular subsequence of (fn ) converges at zl for every l ∈ N.
Let A0 = N. By induction, we define infinite sets Al+1 ⊆ Al such that (fn (zl ))n∈Al+1 converges. To do
so, suppose Al has been defined. The sequence (fn (zl ))n∈Al is a bounded sequence in R since
(fn ) is pointwise bounded, so it has a convergent subsequence. Take Al+1 ⊆ Al to be such
that (fn (zl ))n∈Al+1 is a convergent subsequence of (fn (zl ))n∈Al . In this way, (fn (z0 ))n∈A1
is a convergent subsequence of (fn (z0 ))n∈A0 , (fn (z1 ))n∈A2 is a convergent subsequence of
(fn (z1 ))n∈A1 , and so on, with A0 ⊇ A1 ⊇ A2 ⊇ · · · .
If we naively take the intersection of the Ai over all i ∈ N, we might end up with an empty set,
so we have to be careful. We want to use the nestedness of the Ai to
capture a subsequence which has all of the tail ends. Precisely, for k ∈ N, let nk be the least
natural number such that nk > k, nk ∈ Ak , and (for k ≥ 1) nk > nk−1 ; such an nk exists since Ak is infinite. Then (nk ) is strictly increasing, and for every l, nk ∈ Al for all k ≥ l, since
Ak ⊆ Al for each k ≥ l, and nk ∈ Ak for each k ∈ N. Hence, (nk ) is eventually contained in
Al for each l ∈ N.
Hence, for every l ∈ N, a tail end of (fnk (zl ))k∈N is contained in (fn (zl ))n∈Al+1 . Since
(fn (zl ))n∈Al+1 converges, so does (fnk (zl ))k∈N .
Part 2: Now we want to show (fnk (x))k∈N converges for all x ∈ X. The main tool we’ll use
here is equicontinuity; this allows us to leverage the continuity of fn on all of X for each
x ∈ X independently of our index n. We make this precise below.
Fix x ∈ X, ε > 0. It suffices to find N ∈ N such that for all k1 , k2 ≥ N , \(|f_{n_{k_1}}(x) - f_{n_{k_2}}(x)| < \varepsilon\),
for then the sequence is Cauchy, hence converges. By equicontinuity of (fn ), there exists δ > 0
such that d(z, x) < δ implies for all n ∈ N, |fn (z) − fn (x)| < ε/3. By the density of D, there
exists some z ∈ D such that d(z, x) < δ. By part (1), (fnk (z)) converges since z ∈ D, so
it is Cauchy, whence there exists N ∈ N such that for all k1 , k2 ≥ N , \(|f_{n_{k_1}}(z) - f_{n_{k_2}}(z)| < \varepsilon/3\).
Then for all k1 , k2 ≥ N ,
\[
|f_{n_{k_1}}(x) - f_{n_{k_2}}(x)| \le |f_{n_{k_1}}(x) - f_{n_{k_1}}(z)| + |f_{n_{k_1}}(z) - f_{n_{k_2}}(z)| + |f_{n_{k_2}}(z) - f_{n_{k_2}}(x)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon
\]
where we have bounded the first term using equicontinuity, the second term using part (1),
and the third term using equicontinuity again.
Part 3: For x ∈ X, let f (x) = limk→∞ fnk (x); this limit exists, as established by part (2).
We now show f is continuous. Fix ε > 0 and x ∈ X. Let δ > 0 be such that d(x, z) < δ
implies |fnk (x) − fnk (z)| < ε/3 for all k ∈ N; such a δ exists and is independent of nk by
equicontinuity.
Our strategy is now essentially the same as that of part (2). We use the convergence of our
subsequence on all of X from part (2) together with equicontinuity to establish the continuity
of f . Fix z ∈ X with d(x, z) < δ. Since fnk (x) → f (x) and fnk (z) → f (z), for k sufficiently large, we have
\[
|f_{n_k}(x) - f(x)| < \frac{\varepsilon}{3} \quad \text{and} \quad |f_{n_k}(z) - f(z)| < \frac{\varepsilon}{3}
\]
Then
\[
|f(x) - f(z)| \le |f(x) - f_{n_k}(x)| + |f_{n_k}(x) - f_{n_k}(z)| + |f_{n_k}(z) - f(z)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon
\]
where we have bounded the first term using convergence from part (2), the second term
using equicontinuity, and the third term using convergence again.
Part 4: Let K be a compact subset of X. To prove that (fnk ) converges to f uniformly on
K, we will first establish uniform convergence on small neighborhoods. Using compactness,
we will patch this convergence together on an open cover, then trim down to a finite set of
neighborhoods, i.e. a finite subcover. We make this precise below.
We want to show that for every ε > 0, there exists N ∈ N such that for all j ≥ N ,
\[
|f_{n_j}(x) - f(x)| < \varepsilon
\]
for all x ∈ K. Here, N is independent of x. Let ε > 0 be given. Let x ∈ K. Since (fnk ) is
equicontinuous, there exists \(\delta_x^1 > 0\) depending on ε and x such that for all \(y \in B(x, \delta_x^1)\),
\[
|f_{n_k}(x) - f_{n_k}(y)| < \frac{\varepsilon}{3}
\]
for all k ∈ N. Since f is continuous on all of X, there similarly exists \(\delta_x^2 > 0\) such that for
all \(y \in B(x, \delta_x^2)\),
\[
|f(x) - f(y)| < \frac{\varepsilon}{3}
\]
Finally, since (fnk ) converges to f pointwise, there exists Nx ∈ N such that for all j ≥ Nx ,
we have
\[
|f_{n_j}(x) - f(x)| < \frac{\varepsilon}{3}
\]
Put \(\delta_x = \min\{\delta_x^1, \delta_x^2\}\). Putting these conditions together, for all j ≥ Nx
\[
|f_{n_j}(y) - f(y)| \le |f_{n_j}(y) - f_{n_j}(x)| + |f_{n_j}(x) - f(x)| + |f(x) - f(y)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon
\]
for all y ∈ B(x, δx ). Cover K with the open balls B(x, δx ) for x ∈ K. Then there exists
a finite subcover B(x1 , δx1 ), . . . , B(xl , δxl ). Put N = max{Nx1 , . . . , Nxl }. Then as shown
above, for all j ≥ N , we have
\[
|f_{n_j}(x) - f(x)| < \varepsilon
\]
for each x ∈ K, completing the proof.

Derivatives
We will work in R throughout our discussion of derivatives, though there are select proposi-
tions we will prove that hold for general metric spaces.
Definition 3.1.2. Let X ⊆ R, f : X → R, and x0 ∈ X be a limit point of X. We say f is
differentiable at x0 if
\[
\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}
\]
exists. The derivative of f at x0 , denoted f ′(x0 ), is the value of the limit above. If the
limit does not exist, f is not differentiable at x0 . Finally, f is differentiable on X if f
is differentiable at every point x0 ∈ X.
Example 3.1.3. We compute some fundamental examples of derivatives below.
(1) Let f (x) = c for some c ∈ R. Then
\[
\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = \lim_{x \to x_0} \frac{c - c}{x - x_0} = \lim_{x \to x_0} 0 = 0
\]
So f is differentiable on R, and f ′(x) = 0 for all x ∈ R.

(2) Let f (x) = x on R. Then
\[
\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = \lim_{x \to x_0} \frac{x - x_0}{x - x_0} = \lim_{x \to x_0} 1 = 1
\]
So f is differentiable on R, and f ′(x) = 1 for all x ∈ R.

(3) Let f (x) = x^n for some n ≥ 2. Then
\[
\begin{aligned}
\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} &= \lim_{x \to x_0} \frac{x^n - x_0^n}{x - x_0} \\
&= \lim_{x \to x_0} \left( x^{n-1} + x_0 x^{n-2} + \cdots + x_0^{n-1} \right) \\
&= x_0^{n-1} + x_0 x_0^{n-2} + \cdots + x_0^{n-1} \\
&= n x_0^{n-1}
\end{aligned}
\]
where the third step is true by the continuity of multiplication and addition on R (hence,
polynomials are continuous functions). So f is differentiable on R, and f ′(x) = nx^{n−1}
for all x ∈ R.

(4) Let f (x) = 1/x for x ∈ R \ {0}. Then
\[
\begin{aligned}
\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} &= \lim_{x \to x_0} \frac{\frac{1}{x} - \frac{1}{x_0}}{x - x_0} \\
&= \lim_{x \to x_0} \frac{\frac{x_0 - x}{x_0 x}}{x - x_0} \\
&= \lim_{x \to x_0} \frac{-1}{x_0 x} \\
&= \frac{-1}{x_0^2}
\end{aligned}
\]
where the last step is justified by the continuity of multiplication and division on
R \ {0}. So f is differentiable on R \ {0}, and f ′(x) = −1/x^2 for all x ∈ R \ {0}.
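
As a sanity check on examples (3) and (4), one can evaluate the difference quotient at points x close to x0 and compare it with the claimed derivative. The sketch below is an illustration only; the function name difference_quotient, the step sizes, and the test point x0 = 2 are assumptions made here, and floating-point evaluation merely approximates the limit.

```python
def difference_quotient(f, x0, h):
    # (f(x) - f(x0)) / (x - x0) evaluated at x = x0 + h, with h != 0.
    return (f(x0 + h) - f(x0)) / h

def power(n):
    # The function x -> x**n from example (3).
    return lambda x: x ** n

def reciprocal(x):
    # The function x -> 1/x from example (4), defined for x != 0.
    return 1.0 / x

x0 = 2.0
n = 5
# Claimed derivatives at x0: n * x0**(n-1) and -1 / x0**2.
claimed_power = n * x0 ** (n - 1)
claimed_reciprocal = -1.0 / x0 ** 2

if __name__ == "__main__":
    for h in (1e-2, 1e-4, 1e-6):
        print(h,
              difference_quotient(power(n), x0, h) - claimed_power,
              difference_quotient(reciprocal, x0, h) - claimed_reciprocal)
```

The printed discrepancies should shrink roughly in proportion to h, in line with the limits computed above.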
Proposition 3.1.4. If f is differentiable at x0 , then f is continuous at x0 .
Proof. Fix ε > 0. We will use a lot less than full differentiability here. Since
\[
\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = f'(x_0)
\]
there exists δ > 0 such that
\[
0 < |x - x_0| < \delta \implies \left| \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0) \right| < 1
\]
and this implies
\[
\left| \frac{f(x) - f(x_0)}{x - x_0} \right| < |f'(x_0)| + 1
\]
so
\[
|f(x) - f(x_0)| < |x - x_0| \, (|f'(x_0)| + 1)
\]
Hence, it follows that
\[
|x - x_0| < \min\left\{ \delta, \frac{\varepsilon}{|f'(x_0)| + 1} \right\} \implies |f(x) - f(x_0)| < \varepsilon
\]

Proposition 3.1.5 (Chain Rule). Let f : X → Y ⊆ R, and g : Y → R be continuous. Let
x0 ∈ X, y0 = f (x0 ). Suppose f is differentiable at x0 and g is differentiable at y0 . Then
(g ◦ f ) is differentiable at x0 , and (g ◦ f )′(x0 ) = g′(y0 )f ′(x0 ) = g′(f (x0 ))f ′(x0 ).

Proof. This proof comes from Rudin’s “Principles of Mathematical Analysis”, Chapter 5,
page 105. By the definition of the derivative,
\[
f(x) - f(x_0) = (x - x_0)[f'(x_0) + \varepsilon(x)]
\]
\[
g(y) - g(y_0) = (y - y_0)[g'(y_0) + \delta(y)]
\]
where x ∈ X, y ∈ Y , and ε(x) → 0 as x → x0 , δ(y) → 0 as y → y0 . Let y = f (x). Then by
the second expression, we get
\[
(g \circ f)(x) - (g \circ f)(x_0) = g(f(x)) - g(f(x_0)) = [f(x) - f(x_0)] \cdot [g'(y_0) + \delta(y)]
\]
By the first expression, this implies
\[
(g \circ f)(x) - (g \circ f)(x_0) = (x - x_0) \cdot [f'(x_0) + \varepsilon(x)] \cdot [g'(y_0) + \delta(y)]
\]
Then if x ≠ x0 , we find
\[
\frac{(g \circ f)(x) - (g \circ f)(x_0)}{x - x_0} = [f'(x_0) + \varepsilon(x)] \cdot [g'(y_0) + \delta(y)]
\]
As x → x0 , we have y = f (x) → f (x0 ) = y0 by the continuity of f at x0 , so that ε(x) → 0, δ(y) → 0, and the
RHS tends to
\[
g'(y_0) f'(x_0)
\]
whence the conclusion follows.
Corollary 3.1.6. Let f (x) = 1/x^n on R \ {0}. Then f is differentiable on R \ {0} and
f ′(x) = −n/x^{n+1} .

Proof. Let g(y) = 1/y on R \ {0}. Take h(x) = x^n on R. Then f = g ◦ h on R \ {0}, so by the chain rule,
f ′(x0 ) = g′(y0 ) · h′(x0 ), where y0 = h(x0 ) = x0^n , so
\[
f'(x_0) = \frac{-1}{y_0^2} \cdot (n x_0^{n-1}) = \frac{-1}{x_0^{2n}} \cdot (n x_0^{n-1}) = \frac{-n}{x_0^{n+1}}
\]
So for f (x) = x^{−n} , f ′(x) = −n · x^{−n−1} .

Proposition 3.1.7. Suppose f : X → Y, g : Y → X with X, Y ⊆ R and (f ◦ g)(y) = y for
all y ∈ Y , and (g ◦ f )(x) = x for all x ∈ X. Let x0 ∈ X, y0 ∈ Y such that y0 = f (x0 ) and
x0 = g(y0 ), and suppose f ′(x0 ), g′(y0 ) exist. Then
\[
g'(y_0) = \frac{1}{f'(x_0)}
\]

Proof. The proof is immediate from the chain rule and the fact that the derivative of the identity
is 1: differentiating (g ◦ f )(x) = x at x0 gives g′(y0 )f ′(x0 ) = 1.
We will slightly weaken the assumptions of the previous proposition to obtain an identical
result.
Proposition 3.1.8. Suppose f : X → Y is bijective for X, Y ⊆ R and g = f −1 . Let
x0 ∈ X, y0 ∈ Y such that y0 = f (x0 ) and x0 = g(y0 ). Suppose
(i) f is differentiable at x0
(ii) f ′(x0 ) ≠ 0
(iii) g is continuous on Y
Then g is differentiable at y0 , and
\[
g'(y_0) = \frac{1}{f'(x_0)}
\]

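Proof. Fix ε > 0. We must find δ > 0 such that
\[
0 < |y - y_0| < \delta \implies \left| \frac{g(y) - g(y_0)}{y - y_0} - \frac{1}{f'(x_0)} \right| < \varepsilon
\]
By the continuity of u ↦ 1/u on R \ {0}, and since f ′(x0 ) ≠ 0, there exists ε1 > 0 such that
\[
(*) \quad |u - f'(x_0)| < \varepsilon_1 \implies \left| \frac{1}{u} - \frac{1}{f'(x_0)} \right| < \varepsilon
\]
Since f is differentiable at x0 , there exists ε2 > 0 such that
\[
0 < |x - x_0| < \varepsilon_2 \implies \left| \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0) \right| < \varepsilon_1
\]
By the continuity of g, there exists δ > 0 such that
\[
|y - y_0| < \delta \implies |g(y) - g(y_0)| < \varepsilon_2
\]
Let 0 < |y − y0 | < δ. Setting x = g(y) and using g(y0 ) = x0 (note x ≠ x0 since g is injective), we get
\[
\left| \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0) \right| = \left| \frac{y - y_0}{g(y) - g(y_0)} - f'(x_0) \right| < \varepsilon_1
\]
so by (∗), we have
\[
\left| \frac{g(y) - g(y_0)}{y - y_0} - \frac{1}{f'(x_0)} \right| < \varepsilon
\]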


Corollary 3.1.9. Let \(g(x) = \sqrt[n]{x} = x^{1/n}\). Then g is differentiable on (0, ∞), and g′(x) =
(1/n) · x^{1/n−1} .
Proof. Note g is the inverse of f (x) = x^n on (0, ∞). By an earlier proposition (2.4.7) in Lecture
8, g is continuous on (0, ∞). Let x0 ∈ (0, ∞) and put y0 = f (x0 ) = x0^n , so that
x0 = g(y0 ) = y0^{1/n} . Since f ′(x0 ) = n x0^{n−1} ≠ 0, the previous proposition gives
\[
g'(y_0) = \frac{1}{n x_0^{n-1}} = \frac{1}{n y_0^{(n-1)/n}} = \frac{1}{n} \cdot y_0^{1/n - 1}
\]

Corollary 3.1.10. For any rational α, the function f (x) = x^α is differentiable on (0, ∞),
and f ′(x) = αx^{α−1} .
Proof. This is an easy consequence of the chain rule on the maps x ↦ x^n , x ↦ x^{1/k} , together
with the previous propositions computing their derivatives.
We conclude with a few more basic differentiation rules.
Proposition 3.1.11. Let f, g : X → R be differentiable functions on some X ⊆ R. Then
(1) (f + g)′ = f ′ + g′
(2) (f g)′ = f ′g + g′f (product rule)

3.2 Lecture 10 - Applications of Differentiation: Mean Value Theorem, Rolle’s Theorem, L’Hopital’s Rule and Lagrange Interpolation
Today, we will highlight some useful applications of derivatives. Chief among them are
Rolle’s Theorem and the Mean Value Theorem (of which Rolle’s Theorem is a special case).
We will also discuss how derivatives can be used to analyze polynomial approximation via
Lagrange Interpolation. Finally, we will prove several incarnations of L’Hopital’s rule, which
is a useful tool for computing certain kinds of limits.
Proposition 3.2.1 (Newton’s approximation). f is differentiable at x0 with derivative L if
and only if for every ε > 0, there exists δ > 0 such that for all x,
(∗) |x − x0 | < δ =⇒ |f (x) − l(x)| ≤ ε|x − x0 |
where l(x) is the line through (x0 , f (x0 )) with slope L, i.e. l(x) = f (x0 ) + L(x − x0 ).
Informally, this proposition says f has derivative L at x0 if and only if the line with slope L
through (x0 , f (x0 )) tracks the curve very closely around x0 .
Proof. Note that for x ≠ x0 ,
\[
\frac{f(x) - l(x)}{x - x_0} = \frac{f(x) - f(x_0)}{x - x_0} - L
\]
Hence, the equivalence between the prescribed condition and differentiability at x0 is clear.

Definition 3.2.2. Let f : X → R. We say x0 is a local maximum of f if there exists
δ > 0 such that for all x ∈ X, |x − x0 | < δ implies f (x) 6 f (x0 ). Local minima are defined
similarly.
Theorem 3.2.3. Let f : X → R. Let x0 ∈ X. Suppose x0 is a limit point of both (−∞, x0 )∩
X and (x0 , ∞)∩X. Suppose f is differentiable at x0 , and x0 is a local maximum or minimum
of f . Then f ′(x0 ) = 0.
Proof. The idea here is that if x0 is a local max (resp. min) of f : X → R, and there are
elements of X to the right of x0 arbitrarily close to x0 , and similarly from the left, and (∗)
in the proposition above holds, then it must be that L = 0.
Suppose x0 is a local maximum and L > 0. Then using (∗) with ε = L/2, we get that for x close enough to x0 with
x > x0 ,
\[
l(x) - \frac{L}{2}(x - x_0) \le f(x) \le l(x) + \frac{L}{2}(x - x_0)
\]
Since l(x) = f (x0 ) + (x − x0 )L, this implies
\[
f(x_0) < f(x_0) + \frac{L}{2}(x - x_0) \le f(x)
\]
which is a contradiction, since x0 is a local maximum. If L < 0, we get a similar contradiction
using points x such that x < x0 . The case of a local minimum is analogous.
Theorem 3.2.4 (Rolle’s Theorem). Let f : [a, b] → R be continuous, and suppose f is
differentiable on (a, b), where a < b. Suppose f (a) = f (b). Then there exists w ∈ (a, b) such
that f ′(w) = 0.
Proof. If for all x ∈ (a, b), f (x) = f (a) = f (b), then f is constant on [a, b], so f ′(x) = 0 for
all x ∈ (a, b), and the theorem holds.
Suppose there exists c ∈ (a, b) such that f (c) ≠ f (a). Assume f (c) > f (a). Since f is
continuous on [a, b], it attains some maximal value, call it M , and let w ∈ [a, b] be such that
f (w) = M . Then M ≥ f (c) > f (a) = f (b), so w ∈ (a, b). Then w is a local maximum of
f . By the previous theorem f ′(w) = 0. The case where f (c) < f (a) is dealt with similarly,
the key being that continuous functions attain maximum and minimum values on compact
sets.
Theorem 3.2.5 (Mean Value Theorem). Let a < b, and let f : [a, b] → R be continuous,
and differentiable on (a, b). Then there exists x ∈ (a, b) such that
\[
f'(x) = \frac{f(b) - f(a)}{b - a}
\]
Proof. The geometric picture you should have in your head is this. For a function which is
continuous on a closed interval and differentiable on the open interval, between any two
points in the interval, there exists a third point in between the two chosen such that the
tangent line to the curve at that point is parallel to the secant line through the first two
points.
Let l(x) be the line between (a, f (a)) and (b, f (b)). Precisely,
\[
l(x) = f(a) + \frac{(f(b) - f(a))(x - a)}{b - a}
\]
Note
\[
l'(x) = \frac{f(b) - f(a)}{b - a}
\]
Put g(x) = f (x) − l(x). Then
\[
g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}
\]
and g(a) = g(b) = 0. By Rolle’s theorem, there exists x ∈ (a, b) such that g′(x) = 0. Then
\[
f'(x) = \frac{f(b) - f(a)}{b - a}
\]
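
The theorem only asserts that such a point exists, but in simple cases it can be located numerically. The sketch below is an illustration under assumptions made here, not part of the notes: it takes f (x) = x^3 on [0, 2], whose secant slope is 4, and finds the point where f ′(x) = 3x^2 equals that slope by bisection. Bisection relies on f ′ being continuous with a sign change of f ′ minus the slope across the interval, which holds in this example but is not guaranteed in general. The function name mvt_point is hypothetical.

```python
import math

def mvt_point(fprime, slope, lo, hi, tol=1e-12):
    # Bisection for fprime(x) = slope on [lo, hi]. Assumes fprime is continuous
    # and fprime - slope has opposite signs at lo and hi (true in the example).
    g_lo = fprime(lo) - slope
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = fprime(mid) - slope
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # root lies in the right half
        else:
            hi = mid                # root lies in the left half
    return (lo + hi) / 2

a, b = 0.0, 2.0
f = lambda x: x ** 3
fprime = lambda x: 3 * x ** 2
secant_slope = (f(b) - f(a)) / (b - a)   # equals 4 for this example
x_mvt = mvt_point(fprime, secant_slope, a, b)

if __name__ == "__main__":
    print("secant slope:", secant_slope)
    print("point with matching tangent slope:", x_mvt)
```

For this example the point can also be found by hand: 3x^2 = 4 gives x = 2/√3, which lies in (0, 2) as the theorem predicts.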

Example 3.2.6. For every rational α ∈ (0, 1) and every x, y ≥ 1, |y^α − x^α | ≤ |y − x|. Hence,
x ↦ x^α is Lipschitz on [1, ∞). The proof uses the Mean Value Theorem, as one might
imagine.
Fix x, y ∈ [1, ∞), and assume WLOG x < y. Let f be the function t ↦ t^α . From a previous
proposition, we know f ′(t) = αt^{α−1} . By the mean value theorem applied to [x, y], there
exists z ∈ (x, y) such that
\[
\frac{f(y) - f(x)}{y - x} = f'(z) = \alpha z^{\alpha - 1} \le z^{\alpha - 1} = \frac{1}{z^{1 - \alpha}} \le 1
\]
since z > 1. Hence
\[
0 \le f(y) - f(x) = y^{\alpha} - x^{\alpha} \le y - x
\]
Theorem 3.2.7 (Higher Order Rolle’s Theorem). Let f : [a, b] → R be continuous. Suppose
f is n-times differentiable on (a, b), i.e. f ′, f ′′, f (3) , . . . , f (n) all exist. Suppose a = a0 < a1 <
· · · < an = b are such that for all i ∈ {0, . . . , n}, f (ai ) = 0. Then there exists x ∈ (a, b) such
that f (n) (x) = 0.
Proof. A small note in the statement above: while we take f (ai ) = 0 for all i, this is more
a matter of convenience. This theorem is true as long as f (ai ) = c ∈ R for each i, but we
can always translate this case into the above form by replacing f with f − c. We proceed by
induction on n. Note the case n = 1 is Rolle’s Theorem, so our claim is true for n = 1.
Suppose the claim holds for some n ≥ 1, and suppose f is (n + 1)-times differentiable on (a, b) and a = a0 < · · · < an+1 = b are such that for all
i ∈ {0, . . . , n + 1}, f (ai ) = 0. By Rolle’s theorem, for each i ∈ {0, . . . , n}, we can find ci ∈ (ai , ai+1 ) such
that f ′(ci ) = 0. Now we have c0 < · · · < cn such that f ′(cj ) = 0 for each j ∈ {0, . . . , n}, so using
the inductive hypothesis on f ′ restricted to [c0 , cn ] (since f ′ is n-times differentiable), we get x ∈ (c0 , cn ) such
that (f ′)(n) (x) = 0, i.e. f (n+1) (x) = 0.
Example 3.2.8 (Lagrange Interpolation). We can use the higher order Rolle’s theorem to
bound error in approximation by polynomials. We will start by approximating a function
by a line. Suppose f : [a, b] → R is continuous, and twice differentiable on (a, b). Suppose
|f (2) (x)| is bounded on (a, b) by M . We approximate f by the line
\[
p_1(x) = f(a) + (x - a) \cdot \frac{f(b) - f(a)}{b - a}
\]

47
(2)
Let g(x) = f (x) − p1 (x). This is the approximation error. Note that p1 (x) = 0 for
all x ∈ [a, b], so g (2) (x) = f 2 (x). Take c ∈ (a, b). Approximate g by a second degree-2
polynomial p2 that hits g at a, b and c:
g(c)
p2 (x) = (x − a)(x − b)
(c − a)(c − b)

Let h(x) = g(x) − p2 (x). Then

2g(c) 2g(c)
h(2) (x) = g 2 (x) − = f 2 (x) −
(c − a)(c − b) (c − a)(c − b)

By the higher order Rolle’s theorem for h, and since h(a) = h(c) = h(b) = 0, we have some
z ∈ (a, b) such that h2 (z) = 0. Then

2g(c)
f (2) (z) =
(c − a)(c − b)
So
1 1
|g(c)| 6 |f (2) (z)||c − a||c − b| 6 M (b − a)2
2 2
Since the choice of c was arbitrary, the error g(x) is bounded on (a, b) by (M (b − a)2 )/2.
We can continue this same process with higher degree polynomial approximations. Set
$q_2 = p_1 + p_2$. Then $q_2$ is a degree-2 polynomial with $q_2(a) = f(a)$, $q_2(b) = f(b)$, $q_2(c) = f(c)$.
By a similar argument to the above, assuming f is 3-times differentiable and M bounds $|f^{(3)}|$,
using the n = 3 case of higher order Rolle’s, we get
\[ |(f - q_2)(x)| \le \frac{M (b - a)^3}{3 \cdot 2} \]
More generally, if $q_n$ is a polynomial of degree n hitting f at $a = a_0 < \cdots < a_n = b$, and f
is (n + 1)-times differentiable and $f^{(n+1)}$ is bounded, then
\[ |(f - q_n)(x)| \le \frac{M (b - a)^{n+1}}{(n+1)!} \]
where $|f^{(n+1)}| \le M$; this agrees with the cases n = 1 and n = 2 above. The polynomials above are the Lagrange interpolations of f.
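As a rough numerical sanity check of the n = 1 case, the sketch below samples the error of the chord $p_1$ and compares it with $M(b - a)^2/2$. The choice f = sin on [0, 1] (so that M = 1 works) and the helper names `chord` and `max_chord_error` are illustrative assumptions, not part of the lecture.

```python
import math

def chord(f, a, b):
    """Return the line p1 through (a, f(a)) and (b, f(b))."""
    slope = (f(b) - f(a)) / (b - a)
    return lambda x: f(a) + (x - a) * slope

def max_chord_error(f, a, b, samples=1000):
    """Largest sampled value of |f(x) - p1(x)| on [a, b]."""
    p1 = chord(f, a, b)
    xs = [a + (b - a) * k / samples for k in range(samples + 1)]
    return max(abs(f(x) - p1(x)) for x in xs)

# Assumed example: f = sin on [0, 1], where |f''| = |sin| <= 1, so M = 1.
a, b, M = 0.0, 1.0, 1.0
error = max_chord_error(math.sin, a, b)
bound = M * (b - a) ** 2 / 2
print(error, bound)
```

The sampled error only approximates the true supremum, so this illustrates the bound rather than proving it.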

Derivatives of Monotone Functions

Definition 3.2.9. A function f : X → R (with X ⊆ R) is monotone increasing (nondecreasing)
if x < y implies f(x) ≤ f(y). We say f is strictly monotone increasing if x < y
implies f(x) < f(y). Monotone decreasing and strictly monotone decreasing are defined
similarly.

Proposition 3.2.10. If f is monotone increasing on X and differentiable at x0 ∈ X, then
$f'(x_0) \ge 0$. Similarly, f monotone decreasing implies $f'(x_0) \le 0$.

Proof. Since f is non-decreasing, for all x ≠ x0 in X,
\[ \frac{f(x) - f(x_0)}{x - x_0} \ge 0 \]
so
\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \]
must be ≥ 0 if it exists.
Proposition 3.2.11. If f : [a, b] → R is continuous and differentiable on (a, b), and for all x ∈
(a, b), $f'(x) \ge 0$, then f is monotone increasing on [a, b]. Similarly, if for all x ∈ (a, b),
$f'(x) \le 0$, then f is monotone decreasing on [a, b]. This is a kind of “converse” to the above
proposition.
Proof. As one might expect, we proceed using the Mean Value Theorem. Fix x < y in [a, b].
By the Mean Value Theorem on [x, y], there exists w ∈ (x, y) such that
\[ \frac{f(y) - f(x)}{y - x} = f'(w) \]
Since $f'(w) \ge 0$ by hypothesis, we get
\[ f(y) - f(x) = f'(w)(y - x) \ge 0 \]
so f(y) ≥ f(x).

L’Hopital’s Rule
We conclude today by presenting L’Hopital’s rule in several forms, differing mainly in the
hypotheses assumed.
Proposition 3.2.12 (L’Hopital’s Rule). Let X ⊆ R, f, g : X → R, and let x0 be a limit
point of X. Suppose f(x0) = g(x0) = 0. Suppose f, g are both differentiable at x0. Suppose
$g'(x_0) > 0$. Suppose there is a neighborhood (x0 − δ, x0 + δ) of x0 where g(x) is never 0 except
at x0. Then
\[ \lim_{x \to x_0,\, x \in X} \frac{f(x)}{g(x)} = \frac{f'(x_0)}{g'(x_0)} \]
Proof. Note for x ≠ x0 in this neighborhood, we have
\[ \frac{f(x)}{g(x)} = \frac{f(x) - f(x_0)}{g(x) - g(x_0)} = \frac{f(x) - f(x_0)}{x - x_0} \cdot \frac{x - x_0}{g(x) - g(x_0)} \]
As x → x0,
\[ \frac{f(x) - f(x_0)}{x - x_0} \to f'(x_0) \]
and
\[ \frac{x - x_0}{g(x) - g(x_0)} \to \frac{1}{g'(x_0)} \]
so
\[ \frac{f(x)}{g(x)} \to \frac{f'(x_0)}{g'(x_0)} \]

Proposition 3.2.13. Let X ⊆ R, f, g : X → R, and let x0 be a limit point of X. Suppose
f(x0) = g(x0) = 0. Suppose f, g are both differentiable at x0. Suppose $g'(x_0) > 0$. Suppose
there is a neighborhood (x0 − δ, x0 + δ) of x0 where $g'(x)$ is never 0. Then
\[ \lim_{x \to x_0,\, x \in X} \frac{f(x)}{g(x)} = \frac{f'(x_0)}{g'(x_0)} \]
Proof. We show there exists δ > 0 such that g(x) ≠ 0 whenever |x − x0| < δ and x ≠ x0, reducing
our claim to the previous proposition.
Let δ > 0 be small enough such that |x − x0| < δ implies $g'(x) \ne 0$. Suppose for contradiction that
there is x ∈ B(x0, δ) \ {x0} such that g(x) = 0. By Rolle’s Theorem on [x, x0] (or [x0, x] if
x0 < x), there exists w strictly between x and x0 such that $g'(w) = 0$, a contradiction.
Proposition 3.2.14. Suppose a < b and f, g : [a, b] → R are continuous, and differentiable on (a, b]. Suppose f(a) =
g(a) = 0, and $g'$ is nonzero on (a, b]. If
\[ \lim_{x \to a} \frac{f'(x)}{g'(x)} \]
exists and equals L, then g(x) ≠ 0 for all x ∈ (a, b] and
\[ \lim_{x \to a} \frac{f(x)}{g(x)} = L \]
Proof. First, by Rolle’s Theorem as above, g is nonzero on (a, b]. For each z ∈ (a, b),
let $h_z(x) = f(x)g(z) - g(x)f(z)$. Note $h_z$ is continuous on [a, z], $h_z(a) = h_z(z) = 0$,
and $h_z'(x) = f'(x)g(z) - g'(x)f(z)$. By Rolle’s theorem, there exists w ∈ (a, z) such that
$h_z'(w) = 0$, so $f'(w)g(z) - g'(w)f(z) = 0$, i.e.
\[ \frac{f'(w)}{g'(w)} = \frac{f(z)}{g(z)} \]
Now by the assumption that
\[ \lim_{x \to a} \frac{f'(x)}{g'(x)} = L \]
for every ε > 0, there exists δ > 0 such that for w ∈ (a, b],
\[ |w - a| < \delta \implies \left| \frac{f'(w)}{g'(w)} - L \right| < \varepsilon \]
Now for any z ∈ (a, a + δ), by the above, we have w ∈ (a, z) ⊆ (a, a + δ) such that
\[ \frac{f(z)}{g(z)} = \frac{f'(w)}{g'(w)} \]
so
\[ \left| \frac{f(z)}{g(z)} - L \right| < \varepsilon \]
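To see the last proposition at work numerically, here is a small sketch with the assumed example f(x) = 1 − cos x and g(x) = x² on [0, 1] (so a = 0). Here $f'(x)/g'(x) = \sin x / (2x)$ tends to L = 1/2, and the quotient f/g should approach the same value. The function names are illustrative only.

```python
import math

def f(x):
    return 1.0 - math.cos(x)

def g(x):
    return x * x

def df(x):
    return math.sin(x)      # f'

def dg(x):
    return 2.0 * x          # g', nonzero on (0, 1]

L = 0.5
points = (1e-1, 1e-2, 1e-3)
ratios = [f(x) / g(x) for x in points]
derivative_ratios = [df(x) / dg(x) for x in points]
for x, r, dr in zip(points, ratios, derivative_ratios):
    print(x, r, dr)
```

Both columns drift toward 0.5 as x shrinks; much smaller x would eventually run into floating-point cancellation in 1 − cos x, which is a limitation of the sketch and not of the proposition.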

3.3 Lecture 11 - The Riemann Integral (I)
Today, we will introduce Riemann integration as a method for computing areas under curves.
We will develop the Riemann integral, define what it means to be “Riemann integrable”,
and will prove that several classes of functions are Riemann integrable.
Definition 3.3.1. An interval I is any of [a, b], (a, b], [a, b), (a, b) for a ≤ b (allowing for
a = −∞, b = ∞). Note that under this definition, ∅ and single points are considered
intervals.
Definition 3.3.2. An interval is bounded if a ≠ −∞ and b ≠ ∞. We define the length
of the interval to be b − a, denoted |I|.
Definition 3.3.3. A partition of an interval I is a (finite) set P of pairwise disjoint intervals
such that $\bigcup_{J \in P} J = I$.
Proposition 3.3.4. If P is a (finite) partition of a bounded interval I, then $|I| = \sum_{J \in P} |J|$.
Proof. Let P be a partition of a bounded interval I whose endpoints are a, b ∈ R with a ≤ b.
Let $P = \{J_1, \dots, J_n\}$, and let $J_i$ have endpoints $x_i \le y_i$ for each i. Since P is finite and every
pair of elements of P are disjoint, we can “rearrange” the intervals so that the endpoints are
ordered, i.e. $x_1 \le y_1 \le x_2 \le y_2 \le \cdots \le x_n \le y_n$. Since I is the union of $J_1, \dots, J_n$, it follows
that $x_1 = a$ and $y_n = b$, where $J_1$ and $J_n$ are closed on the left and right
respectively whenever I is. Further, we must have $y_i = x_{i+1}$ for each i; if not, then there exists z ∈ [a, b]
such that $y_i < z < x_{i+1}$, contradicting the fact that I is the union of elements of P. Finally,
we have
\[ \sum_{J \in P} |J| = \sum_{i=1}^{n} (y_i - x_i) = \sum_{i=1}^{n-1} (x_{i+1} - x_i) + (y_n - x_n) = y_n - x_1 = b - a = |I| \]

Definition 3.3.5. A function f : I → R is piecewise constant if there is a partition P of
I such that for all J ∈ P, f is constant on J. We say f is piecewise constant with respect
to P.
Definition 3.3.6. A partition P' is finer than a partition P (of the same interval I) if for
every J' ∈ P', there exists J ∈ P such that J' ⊆ J. We say P' refines P.
Proposition 3.3.7. If f is piecewise constant with respect to P and P' refines P, then f is
piecewise constant with respect to P'.
Proposition 3.3.8. If f, g are both piecewise constant on I, then so are f + g, f − g, f · g, f/g
(if g is nonzero everywhere on I).
Proof. The key is to find a partition P of I such that f and g are both piecewise constant
with respect to P. Say f is piecewise constant with respect to $P_1$, and g is piecewise constant with respect
to $P_2$. Take $P = \{J \cap K \mid J \in P_1, K \in P_2\}$; then one can easily see that P is a partition of I
and is finer than both $P_1$ and $P_2$. It is straightforward to check that f + g, f − g, f · g and f/g are
also piecewise constant with respect to P.

Definition 3.3.9. For f piecewise constant on a bounded interval I with respect to a
partition P of I, define
\[ \text{p.c.}\int_{[P]} f = \sum_{J \in P} c_J \cdot |J| \]
where $f(j) = c_J$ for all j ∈ J, i.e. $c_J$ is the constant value of f on J.

Proposition 3.3.10. Let I be a bounded interval and let f : I → R be piecewise constant. Then the value of $\text{p.c.}\int_{[P]} f$
is the same for every partition P of I with respect to which f is piecewise constant (it depends only
on f, I).

Proof. Let P' be a partition of I which refines P. Since P' is finer than P, for each J ∈ P there exists a
subset $\Omega_J$ of P' which partitions J. Since each $J' \in \Omega_J$ is contained in J, the constant
value of f on J' is the same as the constant value of f on J. Further, by an earlier
proposition, we have $|J| = \sum_{J' \in \Omega_J} |J'|$. Hence
\[ \text{p.c.}\int_{[P]} f = \sum_{J \in P} c_J \cdot |J| = \sum_{J \in P} \sum_{J' \in \Omega_J} c_J \cdot |J'| = \sum_{J' \in P'} c_{J'} \cdot |J'| = \text{p.c.}\int_{[P']} f \]
Then given partitions $P_1, P_2$ of I with respect to which f is piecewise constant, we can pick a partition P' which refines both $P_1, P_2$, whence
the above remark completes the proof.
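The independence from the partition can be checked on a tiny example. The sketch below represents a partition as a list of `(left, right, value)` triples (ignoring whether endpoints are open or closed, which does not affect lengths) and uses exact rational arithmetic. The specific function and partitions are made up for illustration.

```python
from fractions import Fraction as Fr

def pc_integral(pieces):
    """Sum of c_J * |J| over pieces given as (left, right, c_J)."""
    return sum(c * (right - left) for left, right, c in pieces)

# Assumed example: f = 2 on [0, 1) and f = 5 on [1, 3], written with respect to P ...
P = [(Fr(0), Fr(1), Fr(2)), (Fr(1), Fr(3), Fr(5))]
# ... and with respect to a refinement P' that splits each piece further.
P_refined = [
    (Fr(0), Fr(1, 2), Fr(2)), (Fr(1, 2), Fr(1), Fr(2)),
    (Fr(1), Fr(2), Fr(5)), (Fr(2), Fr(5, 2), Fr(5)), (Fr(5, 2), Fr(3), Fr(5)),
]
print(pc_integral(P), pc_integral(P_refined))
```

Both sums come out to 2·1 + 5·2 = 12, as the proposition predicts.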

Definition 3.3.11. Let I be a bounded interval, and f : I → R be piecewise constant. Then
\[ \text{p.c.}\int_I f = \text{p.c.}\int_{[P]} f \]
for some (equivalently, by the last proposition, all) partition(s) P of I such that f is piecewise
constant with respect to P.

Definition 3.3.12. We say $\overline{f}$ majorizes f on I if for all x ∈ I, $\overline{f}(x) \ge f(x)$. Similarly, $\underline{f}$
minorizes f on I if for all x ∈ I, $\underline{f}(x) \le f(x)$.

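Definition 3.3.13. Let I be a bounded interval, and f : I → R be bounded. Then the
upper Riemann integral of f on I is defined as
\[ \overline{\int}_I f = \inf\left\{ \text{p.c.}\int_I \overline{f} \;\middle|\; \overline{f} \text{ majorizes } f \text{ and is piecewise constant on } I \right\} \]
Analogously, the lower Riemann integral of f on I is defined as
\[ \underline{\int}_I f = \sup\left\{ \text{p.c.}\int_I \underline{f} \;\middle|\; \underline{f} \text{ minorizes } f \text{ and is piecewise constant on } I \right\} \]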

Proposition 3.3.14. Let f be bounded on a bounded interval I.
Then
\[ \underline{\int}_I f \le \overline{\int}_I f \]

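Proof. Let $\overline{f}$ be piecewise constant with respect to a partition $P_1$ of I and majorize f.
Similarly, let $\underline{f}$ be piecewise constant with respect to a partition $P_2$ of I and minorize f.
Fix a partition P which refines both $P_1$ and $P_2$, so that $\overline{f}$ and $\underline{f}$ are piecewise constant with
respect to P. Then it is clear $\overline{f}(x) \ge f(x) \ge \underline{f}(x)$ for each x ∈ I, so
\[ \text{p.c.}\int_{[P]} \underline{f} = \sum_{J \in P} (\underline{f})_J \cdot |J| \le \sum_{J \in P} (\overline{f})_J \cdot |J| = \text{p.c.}\int_{[P]} \overline{f} \]
where it is understood that $(\overline{f})_J$ is the constant value of $\overline{f}$ on J, and analogously
$(\underline{f})_J$ is the constant value of $\underline{f}$ on J. Since this holds
for all $\overline{f}$, $\underline{f}$ as defined above, taking the supremum over all such $\underline{f}$ and then the infimum over all such $\overline{f}$ gives
\[ \underline{\int}_I f \le \overline{\int}_I f \]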

Definition 3.3.15. We say f is Riemann integrable on I if
\[ \underline{\int}_I f = \overline{\int}_I f \]
Then $\int_I f$, the Riemann integral of f on I, is
\[ \int_I f = \underline{\int}_I f = \overline{\int}_I f \]

An alternate definition of the Riemann integral

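Note that for any partition P of I, the function
\[ g_P(y) = \sup_{x \in J} f(x) \quad \text{for } y \in J,\ J \in P \]
is piecewise constant with respect to P, and it majorizes f. Moreover, for any $\overline{f}$ majorizing
f which is piecewise constant with respect to P, we have $\overline{f}(y) \ge g_P(y)$. By definition, we
have
\[ \text{p.c.}\int_{[P]} g_P = \sum_{J \in P} |J| \cdot \sup_{x \in J} f(x) \]
We refer to the sum on the RHS as the upper Riemann sum of f with respect to P, denoted U(f, P).
Since $g_P$ majorizes f for each partition P, and is piecewise constant with respect to P, we
have
\[ \overline{\int}_I f = \inf\left\{ \text{p.c.}\int_I \overline{f} \;\middle|\; \overline{f} \text{ majorizes } f \text{ and is piecewise constant on } I \right\} \le U(f, P) \]
for each partition P, whence we have
\[ \overline{\int}_I f \le \inf\{ U(f, P) \mid P \text{ a partition of } I \} \]
Conversely, since $\overline{f}(y) \ge g_P(y)$ whenever $\overline{f}$ majorizes f and is piecewise constant with respect to a partition P, we have
\[ \inf\{ U(f, P) \mid P \text{ a partition of } I \} \le U(f, P) \le \text{p.c.}\int_{[P]} \overline{f} \]
for each $\overline{f}$ which majorizes f and is piecewise constant on I, so
\[ \inf\{ U(f, P) \mid P \text{ a partition of } I \} \le \overline{\int}_I f \]
so
\[ \overline{\int}_I f = \inf\{ U(f, P) \mid P \text{ a partition of } I \} \]
Similarly, define
\[ L(f, P) = \sum_{J \in P} |J| \cdot \inf_{x \in J} f(x) \]
This is the lower Riemann sum. We then have similarly
\[ \underline{\int}_I f = \sup\left\{ \text{p.c.}\int_I \underline{f} \;\middle|\; \underline{f} \text{ minorizes } f \text{ and is p.c. on } I \right\} = \sup\{ L(f, P) \mid P \text{ a partition of } I \} \]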
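The upper and lower Riemann sums are easy to compute when f is continuous and monotone increasing, since then the sup and inf on each piece are the values at its right and left endpoints. The sketch below assumes exactly that situation, with the illustrative choice f(x) = x² on [0, 1] and uniform partitions; the function name `upper_lower_sums` is not from the lecture.

```python
def upper_lower_sums(f, a, b, n):
    """U(f, P) and L(f, P) for a continuous increasing f on the uniform n-piece partition of [a, b]."""
    width = (b - a) / n
    upper = sum(width * f(a + (k + 1) * width) for k in range(n))  # sup at right endpoints
    lower = sum(width * f(a + k * width) for k in range(n))        # inf at left endpoints
    return upper, lower

def square(x):
    return x * x

for n in (10, 100, 1000):
    U, L = upper_lower_sums(square, 0.0, 1.0, n)
    print(n, L, U)
```

For this f the gap U − L equals (f(1) − f(0))/n, so the two sums squeeze the value 1/3 as the partition gets finer.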
I [P ]

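Proposition 3.3.16. If f is piecewise constant on a bounded interval I, then f is Riemann integrable and
\[ \int_I f = \text{p.c.}\int_I f \]
Proof. Since $f \in \{ \underline{f} \mid \underline{f} \text{ minorizes } f \text{ and is p.c. on } I \}$, we have
\[ \text{p.c.}\int_I f \le \underline{\int}_I f \]
Similarly, since $f \in \{ \overline{f} \mid \overline{f} \text{ majorizes } f \text{ and is p.c. on } I \}$, we have
\[ \text{p.c.}\int_I f \ge \overline{\int}_I f \]
Combining these with the previous proposition (the lower integral is at most the upper integral), we have
\[ \text{p.c.}\int_I f \le \underline{\int}_I f \le \overline{\int}_I f \le \text{p.c.}\int_I f \]
so all of these are equal. Hence f is Riemann integrable and
\[ \text{p.c.}\int_I f = \int_I f \]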

Theorem 3.3.17. Let I be a bounded interval. Let f : I → R be uniformly continuous.
Then f is Riemann integrable on I.
Proof. The theorem holds trivially if |I| = 0. Suppose |I| > 0. Fix ε > 0. We will show that
\[ \overline{\int}_I f \le \underline{\int}_I f + \varepsilon \]
We do this for every ε > 0, so by a previous proposition (the lower integral is at most the upper integral),
\[ \overline{\int}_I f = \underline{\int}_I f \]
Since f is uniformly continuous, there exists δ > 0 such that |x − y| < δ ⟹ |f(x) − f(y)| <
ε/|I|. Take any partition P such that every interval J in P has length less than δ. Then for
each J ∈ P, for any x, y ∈ J
\[ f(x) < f(y) + \frac{\varepsilon}{|I|} \]
So for every y ∈ J,
\[ \sup_{x \in J} f(x) \le f(y) + \frac{\varepsilon}{|I|} \]
So
\[ \sup_{x \in J} f(x) \le \inf_{y \in J} f(y) + \frac{\varepsilon}{|I|} \]
and thus
\[ |J| \cdot \sup_{x \in J} f(x) \le |J| \cdot \inf_{x \in J} f(x) + \frac{|J| \varepsilon}{|I|} \]
Summing over all J ∈ P, we obtain
\[ \sum_{J \in P} |J| \cdot \sup_{x \in J} f(x) \le \sum_{J \in P} |J| \cdot \inf_{x \in J} f(x) + \sum_{J \in P} \frac{|J| \varepsilon}{|I|} \]
i.e.
\[ U(f, P) \le L(f, P) + \varepsilon \]
Thus, letting Γ be the set of all partitions of I,
\[ \inf_{Q \in \Gamma} U(f, Q) \le U(f, P) \le L(f, P) + \varepsilon \le \sup_{Q \in \Gamma} L(f, Q) + \varepsilon \]
i.e.
\[ \overline{\int}_I f \le \underline{\int}_I f + \varepsilon \]
as desired.

Corollary 3.3.18 (S07.9). Let f : [a, b] → R be continuous ([a, b] bounded). Then f is
Riemann integrable on [a, b].

Proof. This is immediate, since [a, b] is compact, so f is in fact uniformly continuous on
[a, b], hence Riemann integrable by the last theorem.

Proposition 3.3.19. If f : I → R is bounded by M on a bounded interval I, then
\[ -M \cdot |I| \le \underline{\int}_I f \le \overline{\int}_I f \le M \cdot |I| \]
Proof. This is clear, since the constant function M majorizes (resp. −M minorizes) f.

Theorem 3.3.20. Let f : I → R be continuous. Suppose I is a bounded interval and f is
bounded. Then f is Riemann integrable on I.

Proof. Let ε > 0. We will find a partition P of I such that
\[ U(f, P) \le L(f, P) + \varepsilon \]
whence f is Riemann integrable on I, as shown in the previous theorem. Let a, b be the left
and right endpoints respectively of I (the case a = b is trivial). Let M bound |f| on I. Let δ > 0 be small enough so
that 2δ < b − a, and 2M · δ < ε/3.
Since f is continuous on the compact interval [a + δ, b − δ], it is uniformly continuous there, so by the previous theorem f is Riemann
integrable on [a + δ, b − δ]. Hence, there exists a partition $P_0$ of [a + δ, b − δ] such that
\[ U(f, P_0) \le L(f, P_0) + \frac{\varepsilon}{3} \]
Let P be the partition of I obtained by adding to $P_0$ the intervals (a, a + δ) and (b − δ, b)
(where we change the outer parentheses to brackets to match whether I is open or closed on
either end). Note that
\[ \delta \cdot \inf_{x \in (a, a+\delta)} f(x) \ge -M\delta \quad \text{and} \quad \delta \cdot \inf_{x \in (b-\delta, b)} f(x) \ge -M\delta \]
and
\[ \delta \cdot \sup_{x \in (a, a+\delta)} f(x) \le M\delta \quad \text{and} \quad \delta \cdot \sup_{x \in (b-\delta, b)} f(x) \le M\delta \]
Then
\[ \begin{aligned}
U(f, P) &= \delta \cdot \sup_{x \in (a, a+\delta)} f(x) + U(f, P_0) + \delta \cdot \sup_{x \in (b-\delta, b)} f(x) \\
&\le 2M \cdot \delta + U(f, P_0) \\
&\le L(f, P_0) + \frac{\varepsilon}{3} + 2M \cdot \delta \\
&= L(f, P_0) - 2M \cdot \delta + \frac{\varepsilon}{3} + 4M \cdot \delta \\
&\le \delta \cdot \inf_{x \in (a, a+\delta)} f(x) + L(f, P_0) + \delta \cdot \inf_{x \in (b-\delta, b)} f(x) + \frac{\varepsilon}{3} + 4M \cdot \delta \\
&= L(f, P) + \frac{\varepsilon}{3} + 4M \cdot \delta \\
&< L(f, P) + \frac{\varepsilon}{3} + 2 \cdot \frac{\varepsilon}{3} \\
&= L(f, P) + \varepsilon
\end{aligned} \]

Definition 3.3.21. We say f is piecewise continuous on I if there exists a partition P
of I such that f is continuous on each J ∈ P.
Proposition 3.3.22. If f is bounded and piecewise continuous on a bounded interval I, then
it is Riemann integrable on I.
Proof. Let ε > 0 be given. Let P be a partition of I such that f is continuous on each
element of P. Let n be the number of elements of P. By the previous theorem, f is Riemann integrable on each element of P, so
for each J ∈ P there exists a partition $P_J$ of J such that $U(f, P_J) \le L(f, P_J) + \varepsilon/n$. Let P' be
the refinement of P given by $\bigcup_{J \in P} P_J$. Then
\[ U(f, P') - L(f, P') = \sum_{J \in P} \left( U(f, P_J) - L(f, P_J) \right) \le n \cdot \frac{\varepsilon}{n} = \varepsilon \]

Proposition 3.3.23. If f : [a, b] → R is monotone, then f is Riemann integrable on [a, b].


Proof. Homework (S13.1, F12.2).
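Example 3.3.24. We now exhibit the standard example of a function f : [0, 1] → R which
is not Riemann integrable. Let
\[ f(x) = \begin{cases} 0 & \text{if } x \in [0, 1] \cap \mathbb{Q} \\ 1 & \text{if } x \in [0, 1] \setminus \mathbb{Q} \end{cases} \]
Then for every interval J ⊆ [0, 1] of nonzero length, J contains both rational and irrational points, so
\[ \sup_{x \in J} f(x) = 1, \quad \inf_{x \in J} f(x) = 0 \]
So for every partition P of I = [0, 1], L(f, P) = 0 and U(f, P) = 1, so
\[ \underline{\int}_I f = 0 \ne 1 = \overline{\int}_I f \]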

Theorem 3.3.25. Let I be a bounded interval, and let f : I → R, g : I → R be Riemann
integrable on I. Then

(1) f + g is Riemann integrable on I, and
\[ \int_I (f + g) = \int_I f + \int_I g \]

(2) For any real c, cf is Riemann integrable on I, and
\[ \int_I cf = c \int_I f \]

(3) If I = J ∪ K for disjoint intervals J, K, then f is Riemann integrable on each of J, K,
and
\[ \int_I f = \int_J f + \int_K f \]

(4) If f(x) ≥ 0 for all x ∈ I, then
\[ \int_I f \ge 0 \]
Using (1) and (2), if f(x) ≥ g(x) for all x ∈ I, then
\[ \int_I f \ge \int_I g \]

(5) If f(x) = c for all x ∈ I, then
\[ \int_I f = c|I| \]

(6) The functions min(f, g) : x ↦ min(f(x), g(x)) and max(f, g) : x ↦ max(f(x), g(x)) are
both Riemann integrable on I

3.4 Lecture 12 - The Riemann Integral (II)


Today, we will prove several theorems which will help us calculate integrals. In particular, we
will prove the 1st and 2nd fundamental theorems of calculus; we will introduce the Riemann-
Stieltjes integral; and prove a kind of product rule (integration by parts) and chain rule
(change of variables) for integrals. We will begin by proving (6) from the theorem introduced
at the end of Lecture 11.
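Proof. We will prove the result for the max function; the proof for the min function is similar.
We start with a claim.
Claim 3.4.1. Let $\overline{a}, \underline{a}, \overline{b}, \underline{b} \in \mathbb{R}$ be such that $\overline{a} \ge \underline{a}$, $\overline{b} \ge \underline{b}$. Then $\max\{\overline{a}, \overline{b}\} - \max\{\underline{a}, \underline{b}\} \le (\overline{a} - \underline{a}) + (\overline{b} - \underline{b})$.

Proof. Note
\[ \begin{aligned}
\max(\overline{a}, \overline{b}) - \max(\underline{a}, \underline{b}) &= \min\left( \max(\overline{a}, \overline{b}) - \underline{a},\ \max(\overline{a}, \overline{b}) - \underline{b} \right) \\
&\le \begin{cases} \overline{a} - \underline{a} & \text{if } \max(\overline{a}, \overline{b}) = \overline{a} \\ \overline{b} - \underline{b} & \text{if } \max(\overline{a}, \overline{b}) = \overline{b} \end{cases} \\
&\le (\overline{a} - \underline{a}) + (\overline{b} - \underline{b})
\end{aligned} \]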

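Fix ε > 0. Since f, g are Riemann integrable on I, there exist partitions $P_1, P_2$ of I such
that
\[ U(f, P_1) - L(f, P_1) \le \frac{\varepsilon}{2}, \qquad U(g, P_2) - L(g, P_2) \le \frac{\varepsilon}{2} \]
Let P refine $P_1, P_2$. Note that for any refinement K' of a partition K, we have U(f, K') ≤
U(f, K) and L(f, K') ≥ L(f, K). Hence,
\[ U(f, P) - L(f, P) \le \frac{\varepsilon}{2}, \qquad U(g, P) - L(g, P) \le \frac{\varepsilon}{2} \]
Note for any J ∈ P
\[ \max\left( \sup_{x \in J} f(x), \sup_{x \in J} g(x) \right) \ge \max(f(x), g(x)) \]
for each x ∈ J, so
\[ \max\left( \sup_{x \in J} f(x), \sup_{x \in J} g(x) \right) \ge \sup_{x \in J} \max(f(x), g(x)) \]
Similarly,
\[ \max\left( \inf_{x \in J} f(x), \inf_{x \in J} g(x) \right) \le \max(f(x), g(x)) \]
for each x ∈ J, so
\[ \max\left( \inf_{x \in J} f(x), \inf_{x \in J} g(x) \right) \le \inf_{x \in J} \max(f(x), g(x)) \]
Thus, by the claim above (applied on each J to the suprema and infima of f and g), we have
\[ \begin{aligned}
U(\max(f, g), P) - L(\max(f, g), P) &= \sum_{J \in P} |J| \cdot \left[ \sup_{x \in J} \max(f, g)(x) - \inf_{x \in J} \max(f, g)(x) \right] \\
&\le \sum_{J \in P} |J| \cdot \left[ \max\left( \sup_{x \in J} f(x), \sup_{x \in J} g(x) \right) - \max\left( \inf_{x \in J} f(x), \inf_{x \in J} g(x) \right) \right] \\
&\le \sum_{J \in P} |J| \cdot \left( \left[ \sup_{x \in J} f(x) - \inf_{x \in J} f(x) \right] + \left[ \sup_{x \in J} g(x) - \inf_{x \in J} g(x) \right] \right) \\
&= (U(f, P) - L(f, P)) + (U(g, P) - L(g, P)) \\
&\le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon
\end{aligned} \]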

Corollary 3.4.2. If f : I → R is Riemann integrable, then so are $f_+ = \max(f, 0)$ and
$f_- = \min(f, 0)$ and $|f| = f_+ - f_-$.

Proof. By the last proposition (applied with g = 0), $f_+$ and $f_-$ are Riemann integrable on I. Since $|f| = f_+ - f_-$,
it must also be Riemann integrable on I.

Proposition 3.4.3. If f : I → R, let J, K ⊆ I be intervals such that J ∩ K = ∅, J ∪ K = I.
If f is Riemann integrable on each of J, K, then f is Riemann integrable on I.

Theorem 3.4.4. Let I be a bounded interval, and let f, g : I → R be Riemann integrable on
I. Then f · g is Riemann integrable on I.

Proof. Fix ε > 0. We will find a partition P of I such that U(fg, P) − L(fg, P) ≤ ε. Note f
and g are bounded, and $f = f_+ + f_-$ and $g = g_+ + g_-$. Hence
\[ f \cdot g = f_+ g_+ + f_- g_+ + f_+ g_- + f_- g_- = f_+ g_+ - (-f_-) g_+ - f_+ (-g_-) + (-f_-)(-g_-) \]
and the functions $f_+$, $-f_-$, $g_+$, $-g_-$ are all nonnegative on I. Hence, it suffices to prove the theorem for functions
which take nonnegative values only, whence we obtain that $f_+ g_+$, $(-f_-)(g_+)$, $f_+(-g_-)$, $(-f_-)(-g_-)$
are all Riemann integrable, hence so is f · g by closure of Riemann integrable functions
under addition and subtraction.
Assume f, g ≥ 0 on I. Note f, g are bounded; let M > 0 bound both of |f|, |g|. Fix
a partition P of I such that U(f, P) − L(f, P) ≤ ε/(2M) and U(g, P) − L(g, P) ≤ ε/(2M). Then
\[ \begin{aligned}
U(fg, P) - L(fg, P) &= \sum_{J \in P} |J| \cdot \left[ \sup_{x \in J} f(x)g(x) - \inf_{x \in J} f(x)g(x) \right] \\
&\le \sum_{J \in P} |J| \cdot \left[ \sup_{x \in J} f(x) \sup_{x \in J} g(x) - \inf_{x \in J} f(x) \inf_{x \in J} g(x) \right] \\
&= \sum_{J \in P} |J| \cdot \left[ \sup_{x \in J} f(x) \sup_{x \in J} g(x) - \inf_{x \in J} f(x) \sup_{x \in J} g(x) + \inf_{x \in J} f(x) \sup_{x \in J} g(x) - \inf_{x \in J} f(x) \inf_{x \in J} g(x) \right] \\
&\le \sum_{J \in P} |J| \cdot \left[ \sup_{x \in J} f(x) - \inf_{x \in J} f(x) \right] \cdot M + \sum_{J \in P} |J| \cdot \left[ \sup_{x \in J} g(x) - \inf_{x \in J} g(x) \right] \cdot M \\
&= M \left[ (U(f, P) - L(f, P)) + (U(g, P) - L(g, P)) \right] \\
&\le M \left[ \frac{\varepsilon}{2M} + \frac{\varepsilon}{2M} \right] = \varepsilon
\end{aligned} \]

The Riemann-Stieltjes Integral


Let I be a bounded interval, f : I → R and α : I → R monotone increasing. The idea of the
Riemann-Stieltjes integral is to replace |J| by α(right edge of J) − α(left edge of J).

Definition 3.4.5. The α-length of an interval J ⊆ I, denoted α[J], is α(b) − α(a), where
J is any of [a, b], (a, b), [a, b), (a, b]. If J is empty, we take α[J] = 0. If α = id, then α[J] is
just the traditional length |J|.

Definition 3.4.6. The Riemann-Stieltjes integral of f on I (and upper/lower sums, and
piecewise constant integrals) are defined as before using α[J] instead of |J| throughout. We
denote the Riemann-Stieltjes integral of f on [a, b] using α by
\[ \int_a^b f \, d\alpha \]

Calculating Integrals

Theorem 3.4.7 (Fundamental Theorem of Calculus I). Let a < b, I = [a, b], and let f : [a, b] → R be Riemann
integrable. Let F : I → R be the function $F(x) = \int_{[a,x]} f$. Then F is continuous on I, and for
every x0 ∈ [a, b], if f is continuous at x0, then F is differentiable at x0 and $F'(x_0) = f(x_0)$.
Proof. Let M be a bound for |f| on I. Then for every x < y in [a, b], we have:
\[ |F(y) - F(x)| = \left| \int_{[a,y]} f - \int_{[a,x]} f \right| = \left| \int_{[x,y]} f \right| \le \int_{[x,y]} |f| \le \int_{[x,y]} M = M(y - x) \]
So F is Lipschitz continuous on I. Suppose f is continuous at x0. Then for every ε > 0,
there exists δ > 0 such that |y − x0| < δ implies |f(y) − f(x0)| < ε. For x > x0,
\[ F(x) - F(x_0) = \int_{[x_0, x]} f \]
Suppose x − x0 < δ. Then for all y ∈ [x0, x],
\[ f(x_0) - \varepsilon < f(y) < f(x_0) + \varepsilon \]
Using a previous proposition gives
\[ (x - x_0)(f(x_0) - \varepsilon) \le \int_{[x_0, x]} f \le (x - x_0)(f(x_0) + \varepsilon) \]
Similarly, if x < x0 and x > x0 − δ, then $F(x) - F(x_0) = -\int_{[x, x_0]} f$ and
\[ (x_0 - x)(f(x_0) - \varepsilon) \le \int_{[x, x_0]} f \le (x_0 - x)(f(x_0) + \varepsilon) \]
In either case, dividing by x − x0 shows that 0 < |x − x0| < δ implies
\[ f(x_0) - \varepsilon \le \frac{F(x) - F(x_0)}{x - x_0} \le f(x_0) + \varepsilon \]
i.e.
\[ 0 < |x - x_0| < \delta \implies \left| \frac{F(x) - F(x_0)}{x - x_0} - f(x_0) \right| \le \varepsilon \]
Since ε > 0 was arbitrary, $F'(x_0) = f(x_0)$.
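A quick numerical illustration of the theorem: the sketch below approximates $F(x) = \int_{[0,x]} \cos$ by a midpoint rule (an approximation chosen for convenience, not the construction used in the proof) and compares difference quotients of F at x0 = 1 with f(x0) = cos(1). The function names and the choice f = cos are assumptions made for the example.

```python
import math

def integral(f, a, b, n=20000):
    """Midpoint-rule approximation of the Riemann integral of f over [a, b]."""
    width = (b - a) / n
    return sum(f(a + (k + 0.5) * width) for k in range(n)) * width

def F(x):
    """Approximate F(x), the integral of cos over [0, x]."""
    return integral(math.cos, 0.0, x)

x0 = 1.0
for h in (1e-1, 1e-2, 1e-3):
    quotient = (F(x0 + h) - F(x0)) / h
    print(h, quotient, math.cos(x0))
```

The quotients approach cos(1) as h shrinks, at a rate limited by both h and the accuracy of the midpoint rule.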

Definition 3.4.8. We say F : I → R is an antiderivative of f : I → R (I bounded) if F is
differentiable and for all x ∈ I, F 0 (x) = f (x).

Theorem 3.4.9 (Fundamental Theorem of Calculus II). Let a < b, f : [a, b] → R be Riemann
integrable. If F is (any) antiderivative of f, then
\[ \int_{[a,b]} f = F(b) - F(a) \]
Proof. Let P be a partition of [a, b]. Let J ∈ P (with |J| > 0), say the left and right
endpoints of J are y and z. Then by the Mean Value Theorem, since $F' = f$ by assumption,
F(z) − F(y) = (z − y)f(w) = |J|f(w) for some w ∈ (y, z) ⊆ J.
So
\[ |J| \inf_{x \in J} f(x) \le F(z) - F(y) \le |J| \sup_{x \in J} f(x) \]
(and this holds trivially when |J| = 0). For each J ∈ P, denote the left and right endpoint of J by $y_J$ and $z_J$ respectively. Then
summing over all J ∈ P, we find
\[ L(f, P) \le \sum_{J \in P} \left( F(z_J) - F(y_J) \right) \le U(f, P) \]
The middle sum is telescoping, so only the first and last terms of the sum survive, and we
get
\[ L(f, P) \le F(b) - F(a) \le U(f, P) \]
Since f is Riemann integrable, we can get L(f, P) and U(f, P) arbitrarily close to $\int_{[a,b]} f$.
By the last inequality, then
\[ \int_{[a,b]} f = F(b) - F(a) \]

Proposition 3.4.10. If F, G are both antiderivatives of a function f, then there exists a
constant c ∈ R such that for all x, F(x) = G(x) + c.

Proof. If f is Riemann integrable, we can use the Fundamental Theorem of Calculus II to
get
\[ F(x) = F(a) + \int_{[a,x]} f \quad \text{and} \quad G(x) = G(a) + \int_{[a,x]} f \]
So F(x) − G(x) = F(a) − G(a) is constant.

For the general case, we use the Mean Value Theorem. Let H = F − G. Then $H' = F' - G'$,
so for all x ∈ I, $H'(x) = 0$. Take y < z ∈ I. We show H(y) = H(z).
By the Mean Value Theorem, there exists w ∈ (y, z) such that
\[ \frac{H(z) - H(y)}{z - y} = H'(w) = 0 \]
so H(y) = H(z).

Hereafter, we denote $\int_{[a,b]} f$ by $\int_a^b f$.

Theorem 3.4.11 (Integration by parts). Let F : [a, b] → R, G : [a, b] → R be differentiable
on [a, b]. Suppose $F'$, $G'$ are Riemann integrable on [a, b]. Then $FG'$, $F'G$ are Riemann
integrable, and
\[ \int_a^b F G' = F(b)G(b) - F(a)G(a) - \int_a^b F' G \]
Proof. F is differentiable, hence continuous, so F is Riemann integrable on [a, b]; the same
holds for G. Since $F'$, $G'$ are Riemann integrable on [a, b], $F'G$ and $FG'$ are also Riemann
integrable on [a, b]. By the 2nd fundamental theorem of calculus and the product rule for
differentiation,
\[ F(b)G(b) - F(a)G(a) = (FG)(b) - (FG)(a) = \int_a^b (FG)' = \int_a^b (F'G + G'F) = \int_a^b F'G + \int_a^b G'F \]
and rearranging gives the claim.

Theorem 3.4.12. Let α : [a, b] → R be monotone increasing. Suppose α is differentiable
on [a, b] and $\alpha'$ is Riemann integrable on [a, b]. Let f be Riemann-Stieltjes integrable with
respect to α on [a, b]. Then $f\alpha'$ is Riemann integrable on [a, b], and
\[ \int_a^b f \alpha' = \int_a^b f \, d\alpha \]
Proof. Consider first the case that f is piecewise constant on [a, b] with respect to P. Then
for each J ∈ P, by the 2nd fundamental theorem of calculus,
\[ \int_J f \alpha' = \int_J c_J \alpha' = c_J \int_J \alpha' = c_J (\alpha(z) - \alpha(y)) = c_J \cdot \alpha[J] \]
where $c_J$ is the constant value of f on J, and y, z are the endpoints of J. Hence
\[ \int_a^b f \alpha' = \sum_{J \in P} \int_J f \alpha' = \sum_{J \in P} c_J \, \alpha[J] = \int_a^b f \, d\alpha \]
For the general case, approximate f by piecewise constant functions. See Corollary 11.10.3
from Tao I.
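As a hedged numerical illustration, the sketch below compares a tagged Riemann sum of $f\alpha'$ with a tagged Riemann-Stieltjes sum of f against α. Tagged (midpoint) sums are used here for convenience in place of the upper and lower sums of the definitions, and the example f(x) = x, α(x) = x² on [0, 1] is an assumption; the common value should be $\int_0^1 2x^2 \, dx = 2/3$.

```python
def stieltjes_sum(f, alpha, a, b, n):
    """Sum of f(midpoint of J) * alpha[J] over the uniform n-piece partition of [a, b]."""
    width = (b - a) / n
    total = 0.0
    for k in range(n):
        left, right = a + k * width, a + (k + 1) * width
        total += f((left + right) / 2) * (alpha(right) - alpha(left))
    return total

def riemann_sum(g, a, b, n):
    """Midpoint Riemann sum of g over the same partition."""
    width = (b - a) / n
    return sum(g(a + (k + 0.5) * width) for k in range(n)) * width

def f(x):
    return x

def alpha(x):
    return x * x        # monotone increasing on [0, 1]

def d_alpha(x):
    return 2.0 * x      # alpha'

lhs = riemann_sum(lambda x: f(x) * d_alpha(x), 0.0, 1.0, 1000)
rhs = stieltjes_sum(f, alpha, 0.0, 1.0, 1000)
print(lhs, rhs)
```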
Proposition 3.4.13. Let Φ : [a, b] → [Φ(a), Φ(b)] be continuous and monotone increasing.
Let f : [Φ(a), Φ(b)] → R be Riemann integrable. Then f ◦ Φ : [a, b] → R is Riemann-Stieltjes
integrable with respect to Φ, and
\[ \int_a^b (f \circ \Phi) \, d\Phi = \int_{\Phi(a)}^{\Phi(b)} f \]
Proof. We will show this for the case of piecewise constant functions f. The general case
follows by a standard argument (see Tao I, 11.10.6). Say f is piecewise constant with respect
to P. Note if J ⊆ [Φ(a), Φ(b)] is an interval, then the endpoints of J are taken as values by Φ
by the Intermediate Value Theorem. So there is an interval $\hat{J} \subseteq [a, b]$ such that $\Phi(\hat{J}) = J$ (one can take $\hat{J} = \Phi^{-1}(J)$).
Let $\hat{P}$ be the set of these $\hat{J}$. Then $\hat{P}$ is a partition of [a, b], so
\[ \int_{\Phi(a)}^{\Phi(b)} f = \sum_{J \in P} c_J \cdot |J| = \sum_{J \in P} c_{\hat{J}} \cdot \Phi[\hat{J}] = \int_a^b (f \circ \Phi) \, d\Phi \]
where $c_{\hat{J}} = c_J$, the constant value of f ◦ Φ on $\hat{J}$.

Theorem 3.4.14 (Change of variables). Let Φ : [a, b] → [Φ(a), Φ(b)] be differentiable, monotone
increasing such that $\Phi'$ is Riemann integrable. Let f : [Φ(a), Φ(b)] → R be Riemann
integrable. Then $(f \circ \Phi) \cdot \Phi'$ is Riemann integrable on [a, b], and
\[ \int_a^b (f \circ \Phi) \Phi' = \int_{\Phi(a)}^{\Phi(b)} f \]

Proof. Immediate from the previous two results.
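The change of variables formula can be sanity-checked numerically. The sketch below uses the assumed example Φ(x) = x² on [0, 1] and f = cos, for which both sides should equal sin(1) − sin(0); the midpoint rule is only an approximation device and the helper name is illustrative.

```python
import math

def midpoint_integral(g, a, b, n=10000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    width = (b - a) / n
    return sum(g(a + (k + 0.5) * width) for k in range(n)) * width

def phi(x):
    return x * x        # differentiable and monotone increasing on [0, 1]

def d_phi(x):
    return 2.0 * x      # phi'

a, b = 0.0, 1.0
lhs = midpoint_integral(lambda x: math.cos(phi(x)) * d_phi(x), a, b)
rhs = midpoint_integral(math.cos, phi(a), phi(b))
print(lhs, rhs, math.sin(1.0))
```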

4 Week 4
As per the syllabus, Week 4 topics include: Young’s, Hölder’s, and Minkowski’s inequalities,
formal power series, radius of convergence, real analytic functions, absolute and uniform
convergence on closed subintervals, derivatives and integrals of power series, Taylor’s formula,
Abel’s lemma, Abel’s theorem for uniform convergence and continuity, Stone-Weierstrass
theorem, Cauchy mean value theorem, Taylor theorem with remainder in Lagrange, Cauchy,
and integral forms, Newton’s methods for finding roots of a single function, error bounds in
numerical integration and differentiation (homework).

4.1 Lecture 13 - Limits of Integrals, Mean Value Theorem for Integrals, and Integral Inequalities
Today, we will discuss how limits of integrals behave with respect to certain classes of func-
tions. We will prove an analogue of the Mean Value Theorem for integrals. Finally, we
prove some useful integral inequalities - namely, Cauchy-Schwarz and Young’s, Hölder’s, and
Minkowski’s inequalities.

A few words on limits

Definition 4.1.1. If f : [a, b) → R is not bounded, we can still try to make sense of $\int_a^b f$
as $\lim_{x \to b} \int_a^x f$. If the limit exists, take $\int_a^b f$ to be this limit. We can similarly define an
interpretation of $\int_a^b f$ on the left side of the interval, as well as for integrals of the form
$\int_a^\infty f$, $\int_{-\infty}^b f$.

Theorem 4.1.2 (F04.3). If $f_n$ : [a, b] → R are Riemann integrable and $(f_n)$ converges uniformly
to f : [a, b] → R, then f is Riemann integrable, and
\[ \int_a^b f = \lim_{n \to \infty} \int_a^b f_n \]
Equivalently, under the above assumption,
\[ \int_a^b \lim_{n \to \infty} f_n = \lim_{n \to \infty} \int_a^b f_n \]
Proof. Homework F04.3 (and a few more). For a counterexample when uniform convergence
is not assumed, see S08.2 (among others).
This gives the following special case for sums:
Theorem 4.1.3. If $f_n$ : [a, b] → R are Riemann integrable and $\sum_{n=1}^{\infty} f_n$ converges uniformly
on [a, b], then
\[ \int_a^b \sum_{n=1}^{\infty} f_n = \sum_{n=1}^{\infty} \int_a^b f_n \]

A useful proposition to know in the context of the theorem above is the Weierstrass
M -test:
Theorem 4.1.4 (Weierstrass M-test). If
\[ \sum_{n=1}^{\infty} \|f_n\|_\infty < \infty \]
then $\sum_{n=1}^{\infty} f_n$ converges uniformly. This is sometimes stated alternatively as follows: suppose
\[ |f_n(x)| \le M_n \]
for each x ∈ [a, b]. Then $\sum_{n=1}^{\infty} f_n$ converges uniformly if $\sum_{n=1}^{\infty} M_n$ converges.
Proof. See Rudin’s “Principles of Mathematical Analysis”, 7.10.
Proposition 4.1.5. Let f : I → R be continuous and Riemann integrable. Suppose |I| > 0,
and f(x) ≥ 0 for all x ∈ I. Then
\[ \int_I f = 0 \]
if and only if f(x) = 0 for all x ∈ I.
Proof. ( ⇐= ) This is clear.
( =⇒ ) Suppose for contradiction that f(x0) > 0 for some x0 ∈ I. For the sake of convenience,
suppose x0 is not an endpoint; if x0 were an endpoint, we will soon see that we can pick
a nearby point $x_0' \ne x_0$ for which $f(x_0') > 0$ using continuity anyway. Say f(x0) = u. By
continuity, there exists δ > 0 such that |x − x0| < δ implies |f(x) − f(x0)| < u/2, and hence
f(x) > u/2. Take δ small enough that B(x0, δ) ⊆ I. Let J, K be the parts of I to the left
of x0 − δ and to the right of x0 + δ respectively. Then
\[ \int_I f(x) \, dx = \int_J f(x) \, dx + \int_{x_0 - \delta}^{x_0 + \delta} f(x) \, dx + \int_K f(x) \, dx \ge 0 + 2\delta \cdot \frac{u}{2} + 0 > 0 \]
a contradiction.
Corollary 4.1.6. Let f, g : I → R be continuous and Riemann integrable. Suppose |I| > 0,
and f(x) ≤ g(x) for all x ∈ I. Then if
\[ \int_I f = \int_I g \]
then for all x ∈ I, f(x) = g(x).

Proof. Use the proposition above on g − f.
Theorem 4.1.7 (Mean Value Theorem I for integrals). Let f : [a, b] → R be continuous,
where a < b. Then there exists c ∈ [a, b] such that
\[ \frac{1}{b - a} \int_a^b f = f(c) \]

In fact, we will prove something stronger.

Theorem 4.1.8 (Mean Value Theorem II for integrals). Let f : [a, b] → R be continuous,
where a < b. Let ϕ : [a, b] → R be Riemann integrable on [a, b], and suppose ϕ(x) ≥ 0 for all
x ∈ [a, b]. Then there exists c ∈ [a, b] such that
\[ \int_a^b f \cdot \varphi = f(c) \cdot \int_a^b \varphi \]
Note that Mean Value Theorem I follows from the special case ϕ = 1.

Proof. First, since f is continuous and [a, b] is compact, note that f achieves its minimum and
maximum, say at points $x_{\min}, x_{\max} \in [a, b]$ respectively. Then in particular, |f| is bounded
by some M ∈ R, so if
\[ \int_a^b \varphi = 0 \]
then
\[ \left| \int_a^b f \varphi \right| \le \int_a^b M \varphi = 0 \]
so for every x ∈ [a, b],
\[ \int_a^b f \varphi = 0 = f(x) \int_a^b \varphi \]
Now assume
\[ \int_a^b \varphi \ne 0 \]
Replace ϕ by
\[ \varphi \cdot \left( \int_a^b \varphi \right)^{-1} \]
so that we may assume
\[ \int_a^b \varphi = 1 \]
This normalization does not change the nature of the proof, but makes some later details
simpler. Since ϕ ≥ 0, we then note
\[ \int_a^b f(x_{\min}) \varphi \le \int_a^b f \varphi \le \int_a^b f(x_{\max}) \varphi \]
i.e.
\[ f(x_{\min}) \int_a^b \varphi \le \int_a^b f \varphi \le f(x_{\max}) \int_a^b \varphi \]
hence
\[ f(x_{\min}) \le \int_a^b f \varphi \le f(x_{\max}) \]
By the Intermediate Value Theorem, since $f(x_{\min}), f(x_{\max}) \in f([a, b])$, there exists c ∈ [a, b]
(in fact, c lies between $x_{\min}$ and $x_{\max}$) such that
\[ f(c) = \int_a^b f \varphi \]
Then
\[ f(c) \int_a^b \varphi = \int_a^b f \varphi \]
Some Important Inequalities

Theorem 4.1.9 (Cauchy-Schwarz Inequality). Let f, g : I → R be continuous and Riemann
integrable, and let |I| > 0. Then
\[ \int_I fg \le \sqrt{\int_I f^2 \int_I g^2} \]
with equality if and only if f and g are linearly dependent with a nonnegative constant, meaning there exists c ≥ 0 such
that f(x) = cg(x) for all x ∈ I (or g(x) = cf(x) for all x ∈ I).

Proof. If
\[ \int_I f^2 = 0 \]
then for all x ∈ I, f(x) = 0 by the previous proposition, and the theorem is clear. One
proceeds similarly if
\[ \int_I g^2 = 0 \]
Suppose
\[ \int_I f^2 \ne 0, \quad \int_I g^2 \ne 0 \]
Note that for all u, v ∈ R, $(u - v)^2 \ge 0$, i.e. $u^2 - 2uv + v^2 \ge 0$, so $2uv \le u^2 + v^2$ with equality
if and only if u = v. We apply this with
\[ u = \frac{f(x)}{\sqrt{\int_I f^2}}, \quad v = \frac{g(x)}{\sqrt{\int_I g^2}} \]
These can be thought of as the values at x of “normalizations” of f and g. Then for all x ∈ I,
\[ \frac{2 f(x) g(x)}{\sqrt{\int_I f^2 \int_I g^2}} \le \frac{f(x)^2}{\int_I f^2} + \frac{g(x)^2}{\int_I g^2} \]
with equality if and only if
\[ \frac{f(x)}{\sqrt{\int_I f^2}} = \frac{g(x)}{\sqrt{\int_I g^2}} \]
Now integrate both sides to obtain
\[ \frac{\int_I 2 f(x) g(x)}{\sqrt{\int_I f^2 \int_I g^2}} \le \frac{\int_I f(x)^2}{\int_I f^2} + \frac{\int_I g(x)^2}{\int_I g^2} = 2 \]
with equality if and only if (by the previous corollary, using continuity)
\[ \frac{f(x)}{\sqrt{\int_I f^2}} = \frac{g(x)}{\sqrt{\int_I g^2}} \]
for all x ∈ I. Hence
\[ \int_I fg \le \sqrt{\int_I f^2 \int_I g^2} \]
with equality if and only if f = c · g, where
\[ c = \sqrt{\frac{\int_I f^2}{\int_I g^2}} \]
In this case, f and g are clearly linearly dependent. Conversely, if f = c · g for some c ≥ 0, then both sides of the inequality equal $c \int_I g^2$, so equality holds.

Proposition 4.1.10 (F10.11). Find the function g(x) which minimizes
\[ \int_0^1 |f'(x)|^2 \, dx \]
amongst smooth f : [0, 1] → R with f(0) = 0, f(1) = 1. Is the optimal g unique?


Solution. The idea is to use Cauchy-Schwarz on the functions f', 1. We obtain
\[ \int_0^1 f'(x) \cdot 1 \, dx \le \sqrt{ \int_0^1 |f'(x)|^2 \int_0^1 1 } \]
with equality if and only if f'(x) = c for some constant c ∈ R. By the 2nd fundamental
theorem of calculus,
\[ \int_0^1 f'(x) \cdot 1 \, dx = f(1) - f(0) = 1 - 0 = 1 \]
so
\[ (\ast) \qquad \int_0^1 |f'(x)|^2 \ge 1 \]
with equality if and only if f' = c for a constant c (assuming f(1) = 1, f(0) = 0). Hence, any
f for which equality holds in (∗) will minimize the expression in question. Note the function
g(x) = x achieves equality. Also, g is the only function satisfying f(0) = 0, f(1) = 1 with f'
constant, so g is the unique minimizing function.
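As a numerical sanity check of (∗), the following sketch compares the integral of |f'|² over [0, 1] for g(x) = x against two other smooth competitors with the same boundary values. The helper name `energy`, the midpoint rule, and the competitor functions are illustrative choices and are not part of the original problem.

```python
import math

def energy(df, n=100_000):
    """Midpoint-rule approximation of the integral of df(x)**2 over [0, 1]."""
    h = 1.0 / n
    return sum(df((i + 0.5) * h) ** 2 for i in range(n)) * h

# Derivatives of three smooth functions with f(0) = 0 and f(1) = 1.
candidates = {
    "x": lambda x: 1.0,                                              # f(x) = x
    "x^2": lambda x: 2.0 * x,                                        # f(x) = x^2
    "sin(pi x / 2)": lambda x: (math.pi / 2) * math.cos(math.pi * x / 2),
}

energies = {name: energy(df) for name, df in candidates.items()}

for name, value in energies.items():
    print(f"f(x) = {name:14s} integral of |f'|^2 ~ {value:.6f}")
```

Under these assumptions the value for f(x) = x should be 1 up to rounding, while the other two candidates give 4/3 and π²/8, both larger than 1.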

Theorem 4.1.11 (Young’s Inequality). Let φ : [0, ∞) → [0, ∞) be continuous and strictly
monotone increasing, with φ(0) = 0. (Then note we also have $\varphi^{-1} : [0, \infty) \to [0, \infty)$ which
is continuous, strictly monotone increasing, and $\varphi^{-1}(0) = 0$.) Then for every a, b > 0,
\[ ab \le \int_0^a \varphi(x) \, dx + \int_0^b \varphi^{-1}(x) \, dx \]
with equality if and only if b = φ(a).

Proof. Divide into three cases: b < φ(a), b = φ(a), b > φ(a), and draw the picture in each
case.

Corollary 4.1.12. If p, q > 1, and 1/p + 1/q = 1, then for all a, b > 0,
\[ ab \le \frac{a^p}{p} + \frac{b^q}{q} \]
with equality if and only if $a^p = b^q$.

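Proof. We use Young’s inequality. Let $\varphi(x) = x^{p-1}$; note
\[ \frac{1}{p} + \frac{1}{q} = 1 \implies 1 - \frac{1}{p} = \frac{1}{q} \]
so
\[ p - 1 = \frac{p}{q} \]
so
\[ \frac{1}{p-1} = \frac{q}{p} \]
Similarly,
\[ \frac{1}{q-1} = \frac{p}{q} \]
so
\[ \frac{1}{p-1} = \frac{q}{p} = \frac{1}{p/q} = \frac{1}{1/(q-1)} = q - 1 \]
So $\varphi^{-1}(y) = y^{1/(p-1)} = y^{q-1}$. By Young’s inequality,
\[ ab \le \int_0^a x^{p-1} \, dx + \int_0^b y^{q-1} \, dy = \left. \frac{x^p}{p} \right|_0^a + \left. \frac{y^q}{q} \right|_0^b = \frac{a^p}{p} + \frac{b^q}{q} \]
with equality if and only if
\[ b = a^{p-1} = a^{p/q} \]
which is true if and only if
\[ b^q = a^p \]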
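The corollary can be spot-checked numerically. The sketch below samples random a, b and p, computes the conjugate exponent q, and records the smallest value of $a^p/p + b^q/q - ab$; it also evaluates the equality case $b = a^{p-1}$. The function name `young_gap`, the sampling ranges, and the seed are assumptions made for illustration.

```python
import random

def young_gap(a, b, p):
    """Return a**p/p + b**q/q - a*b, where q is the conjugate exponent of p."""
    q = p / (p - 1.0)  # equivalent to 1/p + 1/q = 1
    return a ** p / p + b ** q / q - a * b

random.seed(0)
samples = [
    (random.uniform(0.0, 5.0), random.uniform(0.0, 5.0), random.uniform(1.1, 6.0))
    for _ in range(10_000)
]
min_gap = min(young_gap(a, b, p) for a, b, p in samples)

# Equality case: b = a**(p - 1), equivalently a**p == b**q.
a, p = 1.7, 3.0
equality_gap = young_gap(a, a ** (p - 1.0), p)

print("smallest sampled gap:", min_gap)
print("gap in the equality case:", equality_gap)
```

The sampled gap should be non-negative up to floating-point rounding, and the equality-case gap should vanish up to rounding.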

Note that the case of p = q = 2 gives
\[ ab \le \frac{a^2}{2} + \frac{b^2}{2} \]
which is exactly what we used to get Cauchy-Schwarz. We can use a similar argument to
the proof of Cauchy-Schwarz using the more general inequality proved above to show the
following inequality.

Theorem 4.1.13 (Hölder’s Inequality). Let f, g : I → R be continuous and Riemann
integrable, and let |I| > 0. Let p, q > 1 be such that 1/p + 1/q = 1. Then
\[ \int_I |f g| \le \left( \int_I |f|^p \right)^{1/p} \left( \int_I |g|^q \right)^{1/q} \]
with equality if and only if $|f|^p$ and $|g|^q$ are linearly dependent.

Proof. Put
\[ a = \frac{|f(x)|}{\left( \int_I |f|^p \right)^{1/p}}, \qquad b = \frac{|g(x)|}{\left( \int_I |g|^q \right)^{1/q}} \]
and run the same proof as before using the corollary above instead.

Theorem 4.1.14 (Minkowski’s Inequality). Let f, g : I → R be continuous and Riemann
integrable, and let |I| > 0. Let p > 1. Then
\[ \left( \int_I |f + g|^p \right)^{1/p} \le \left( \int_I |f|^p \right)^{1/p} + \left( \int_I |g|^p \right)^{1/p} \]
with equality if and only if f and g are linearly dependent by a constant ≥ 0.

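Proof. Let q be such that
\[ \frac{1}{q} + \frac{1}{p} = 1 \]
i.e.
\[ q - 1 = \frac{1}{p-1} \]
so
\[ q = 1 + \frac{1}{p-1} \]
We also then have
\[ q(p-1) = \left( 1 + \frac{1}{p-1} \right)(p-1) = p - 1 + 1 = p \]
First,
\[ \int_I |f + g|^p = \int_I |f + g| \cdot |f + g|^{p-1} \le \int_I (|f| + |g|) \cdot |f + g|^{p-1} \]
by the triangle inequality, with equality if and only if f(x) and g(x) have the same sign for
each x ∈ I. Then by Hölder’s Inequality, we obtain
\[
\begin{aligned}
\int_I |f| \cdot |f + g|^{p-1} + \int_I |g| \cdot |f + g|^{p-1}
&\le \left( \int_I |f|^p \right)^{1/p} \left( \int_I |f + g|^{q(p-1)} \right)^{1/q} + \left( \int_I |g|^p \right)^{1/p} \left( \int_I |f + g|^{q(p-1)} \right)^{1/q} \\
&= \left( \left( \int_I |f|^p \right)^{1/p} + \left( \int_I |g|^p \right)^{1/p} \right) \left( \int_I |f + g|^p \right)^{1/q}
\end{aligned}
\]
If $\int_I |f + g|^p = 0$ there is nothing to prove. Otherwise, dividing by
\[ \left( \int_I |f + g|^p \right)^{1/q} \]
we get
\[ \left( \int_I |f + g|^p \right)^{1 - 1/q} \le \left( \int_I |f|^p \right)^{1/p} + \left( \int_I |g|^p \right)^{1/p} \]
i.e., since 1 − 1/q = 1/p,
\[ \left( \int_I |f + g|^p \right)^{1/p} \le \left( \int_I |f|^p \right)^{1/p} + \left( \int_I |g|^p \right)^{1/p} \]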

For equality, through use of Hölder’s Inequality, we need $|f|^p$ and $|f + g|^{q(p-1)} = |f + g|^p$ to be
linearly dependent, and $|g|^p$ and $|f + g|^p$ to be linearly dependent, so in particular, we need |f|, |g| to be
linearly dependent. Also, through use of the triangle inequality, we need f(x) and g(x) to
have the same sign for all x ∈ I. Combining these conditions, we need f and g to be linearly
dependent, and by a constant c ≥ 0. This is a necessary condition for equality, and it is
straightforward to check it is also sufficient.

4.2 Lecture 14 - Power Series (I), Taylor Series, and Abel’s Lemma/Theorem
Today, we will introduce formal power series. We will discuss how the radius of convergence
affects the convergence of power series, including two powerful tools, namely Abel’s Lemma
and Abel’s Theorem. We will also briefly introduce Taylor series.

Definition 4.2.1. A formal power series centered at a ∈ R is any series of the form
\[ \sum_{n=0}^{\infty} c_n (x - a)^n \]
where $c_n \in \mathbb{R}$ is called the nth coefficient of the series. The radius of convergence of the
series is defined to be
\[ R = \frac{1}{\limsup_{n \to \infty} |c_n|^{1/n}} \]
We allow R = +∞ if $\limsup_{n \to \infty} |c_n|^{1/n} = 0$ and R = 0 if $\limsup_{n \to \infty} |c_n|^{1/n} = \infty$.

Theorem 4.2.2. (a) If |x − a| > R, then
\[ \sum_{n=0}^{\infty} c_n (x - a)^n \]
diverges.

(b) (S06.2) If |x − a| < R, then
\[ \sum_{n=0}^{\infty} c_n (x - a)^n \]
converges absolutely.

Proof. (a) It is enough to show that $|c_n (x - a)^n|$ does not converge to 0; to do this, it is
enough to find infinitely many n where $|c_n (x - a)^n| > 1$. By the definition of lim sup, for
each ε > 0, there are infinitely many n such that
\[ |c_n|^{1/n} > \limsup_{n \to \infty} |c_n|^{1/n} - \varepsilon \]
So for each ε > 0, we have infinitely many n such that
\[ |c_n|^{1/n} > \frac{1}{R} - \varepsilon \]
Since |x − a| > R by assumption, put
\[ \varepsilon = \left( \frac{|x - a|}{R} - 1 \right) \cdot \frac{1}{|x - a|} > 0 \]
Then
\[ |c_n|^{1/n} |x - a| > \left( \frac{1}{R} - \varepsilon \right) |x - a| = \frac{|x - a|}{R} - \frac{|x - a|}{R} + 1 = 1 \]
so $|c_n (x - a)^n| > 1$ for infinitely many n. In particular, $(c_n (x - a)^n)_{n \in \mathbb{N}}$ does not converge
to 0.

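(b) The case x = a is trivial, so assume x ≠ a. Again using the definition of R and lim sup, for every ε > 0,
\[ |c_n|^{1/n} < \frac{1}{R} + \varepsilon \]
for all but finitely many n (say for all n ≥ k ∈ N). Since |x − a| < R by assumption, put
\[ \varepsilon = \left( 1 - \frac{|x - a|}{R} \right) \cdot \frac{1}{2|x - a|} > 0 \]
Then for all but finitely many n
\[ |c_n|^{1/n} |x - a| < \frac{|x - a|}{R} + \left( 1 - \frac{|x - a|}{R} \right) \cdot \frac{1}{2} = \frac{R + |x - a|}{2R} < \frac{2R}{2R} = 1 \]
whence
\[ |c_n (x - a)^n| < L^n \]
for all n ≥ k, where $L = (R + |x - a|)/(2R) < 1$. Therefore
\[ \sum_{n=0}^{\infty} |c_n (x - a)^n| = \sum_{n=0}^{k-1} |c_n (x - a)^n| + \sum_{n=k}^{\infty} |c_n (x - a)^n| \le \sum_{n=0}^{k-1} |c_n (x - a)^n| + \sum_{n=k}^{\infty} L^n < \infty \]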
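To get a feel for the definition of R, the sketch below evaluates $|c_n|^{1/n}$ at one large index for three coefficient sequences whose lim sup is known in closed form. Evaluating at a single large n is only a heuristic stand-in for the lim sup, and the names `root_test_estimate`, `geometric`, `inverse_factorial` and `linear` are hypothetical helpers introduced here.

```python
import math

def root_test_estimate(log_abs_c, n):
    """Return |c_n|**(1/n), computed from log|c_n| to avoid overflow or underflow."""
    return math.exp(log_abs_c(n) / n)

# c_n = 2**n: lim sup |c_n|**(1/n) = 2, so R = 1/2.
geometric = lambda n: n * math.log(2.0)
# c_n = 1/n!: lim sup |c_n|**(1/n) = 0, so R = +infinity.
inverse_factorial = lambda n: -math.lgamma(n + 1)
# c_n = n: lim sup |c_n|**(1/n) = 1, so R = 1.
linear = lambda n: math.log(n)

n = 10_000
R_geometric = 1.0 / root_test_estimate(geometric, n)
R_linear = 1.0 / root_test_estimate(linear, n)
root_factorial = root_test_estimate(inverse_factorial, n)

print("c_n = 2^n : estimated R ~", R_geometric)
print("c_n = n   : estimated R ~", R_linear)
print("c_n = 1/n!: |c_n|^(1/n) ~", root_factorial, "(tends to 0, so R is infinite)")
```

The logarithmic form is used because $2^n$ and $1/n!$ leave the range of double precision long before n = 10000.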

Theorem 4.2.3. Assume R > 0. Let f : (a − R, a + R) → R be given by
\[ f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n \]
(Note f(x) is well-defined on the specified domain by the previous theorem.)

(a) For any r < R, the series
\[ \sum_{n=0}^{\infty} c_n (x - a)^n \]
converges uniformly to f on [a − r, a + r]. In particular, f is continuous on [a − r, a + r],
hence on (a − R, a + R).

(b) (Term by term differentiation of power series) f is differentiable on (a − R, a + R). For
every r < R, the series
\[ \sum_{n=1}^{\infty} n c_n (x - a)^{n-1} \]
converges uniformly to f' on [a − r, a + r].

(c) (Term by term integration of power series) For any closed [y, z] ⊆ (a − R, a + R),
\[ \int_y^z f = \sum_{n=0}^{\infty} \left. \frac{c_n (x - a)^{n+1}}{n+1} \right|_y^z \]

Proof. The proofs here are relatively straightforward with the help of the previous theorem,
with the exception of (b) which is a little tricky.

(a) The key here is to use the Weierstrass M-test with the help of the bound from the
previous proof. That is, in our proof of (b) in the last theorem, used with x = a + r, we
found L < 1 and k ∈ N such that for all n ≥ k,
\[ |c_n r^n| < L^n \]
Then certainly, since |x − a| ≤ r for each x ∈ [a − r, a + r], we have
\[ |c_n (x - a)^n| \le |c_n r^n| < L^n \]
for all n ≥ k. Thus, it follows that
\[
\begin{aligned}
\sum_{n=0}^{\infty} \sup_{x \in [a-r, a+r]} |c_n (x - a)^n|
&= \sum_{n=0}^{k-1} \sup_{x \in [a-r, a+r]} |c_n (x - a)^n| + \sum_{n=k}^{\infty} \sup_{x \in [a-r, a+r]} |c_n (x - a)^n| \\
&\le \sum_{n=0}^{k-1} \sup_{x \in [a-r, a+r]} |c_n (x - a)^n| + \sum_{n=k}^{\infty} L^n \\
&< \infty
\end{aligned}
\]
Hence, by the Weierstrass M-test,
\[ \sum_{n=0}^{m} c_n (x - a)^n \]
converges uniformly to f on [a − r, a + r] as m → ∞.

(b) By Tao II 3.7.2, it is sufficient to show that
\[ \left( \left( \sum_{n=0}^{l} c_n (x - a)^n \right)' \right)_{l \in \mathbb{N}} = \left( \sum_{n=1}^{l} n c_n (x - a)^{n-1} \right)_{l \in \mathbb{N}} \]
converges uniformly on [a − r, a + r] (plus convergence at a point, but this is given by
part (a)). Thus, we have to show
\[ \sum_{n=1}^{\infty} n c_n (x - a)^{n-1} \]
converges uniformly on [a − r, a + r]. For this it is enough to show that the radius of
convergence of
\[ \sum_{n=1}^{\infty} n c_n (x - a)^{n-1} \]
is > r by part (a) of this theorem. For this, it is sufficient to find one point x outside
[a − r, a + r] at which
\[ \sum_{n=1}^{\infty} n c_n (x - a)^{n-1} \]
converges; this is applying part (a) of the previous theorem (the series must diverge
at EVERY point outside the radius of convergence). Pick some x, w ∈ R such that
r < |x − a| < |w − a| < R. Since |w − a| < R, by part (b) of the previous theorem, the series
for f(w) converges absolutely; in particular, $|c_n| |w - a|^n$ is bounded by M for some M ∈ R. Then we compute
\[ \sum_{n=1}^{\infty} n |c_n (x - a)^{n-1}| = \sum_{n=1}^{\infty} n |c_n| |w - a|^{n-1} \frac{|x - a|^{n-1}}{|w - a|^{n-1}} \le \frac{M}{|w - a|} \cdot \sum_{n=1}^{\infty} n \frac{|x - a|^{n-1}}{|w - a|^{n-1}} \]
Since |x − a|/|w − a| < 1, it is not hard to see (for instance by the ratio test) that
\[ \sum_{n=1}^{\infty} n \frac{|x - a|^{n-1}}{|w - a|^{n-1}} \]
converges, whence we’re done.

(c) This is similar to (b), but easier, leveraging the information given in part (a). Since
\[ f_m = \sum_{n=0}^{m} c_n (x - a)^n \]
converges uniformly to f on [a − r, a + r] for any r < R by part (a), it follows that on any
subset [y, z] ⊆ (a − R, a + R),
\[ \int_y^z f_m \to \int_y^z f \]
i.e.
\[ \lim_{m \to \infty} \int_y^z \sum_{n=0}^{m} c_n (x - a)^n = \lim_{m \to \infty} \sum_{n=0}^{m} \int_y^z c_n (x - a)^n = \int_y^z \lim_{m \to \infty} \sum_{n=0}^{m} c_n (x - a)^n \]
so the desired conclusion follows.

Definition 4.2.4. f : E → R is real analytic at a ∈ Int(E) if on some neighborhood
(a − r, a + r) ⊆ E, f is equal to a power series centered at a with radius of convergence > r. We say f is
real analytic on an open set E if f is real analytic at each a ∈ E.

Proposition 4.2.5. If f is real analytic on E, then f is smooth (k-times continuously
differentiable for all k ∈ N) and for each k, $f^{(k)}$ is real analytic on E.

Proof. We proved both the base case and the induction case in part (b) of the previous
theorem.

Corollary 4.2.6 (Taylor’s formula). Let f : E → R be real analytic at a ∈ Int(E). Say
\[ f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n \]
on some (a − r, a + r). Then for all k ∈ N, $f^{(k)}(a) = k! \cdot c_k$. In particular,
\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n \]

Proof. Since f is real analytic at a, we can apply part (b) of the previous theorem to
differentiate f(x) term by term. Then a simple induction argument shows that the constant
term of the power series expansion for $f^{(k)}(x)$ at a is $k! \cdot c_k$. Plugging in at a makes all the
higher order terms vanish, so we obtain $f^{(k)}(a) = k! \cdot c_k$. Isolating $c_k$ gives $c_k = f^{(k)}(a)/k!$,
yielding the familiar Taylor expansion on (a − r, a + r).

Corollary 4.2.7. If f is representable by (equal to) two power series with coefficients
(cn ), (dn ), then cn = dn for each n.

Proof. By the above corollary, Taylor series forces the value of cn , dn .


Next, we want to consider the behavior of power series at the endpoints of the interval of
convergence. The series may converge or diverge there, but we show that if the series converges
at x = a − R (resp. x = a + R), then the series converges uniformly on [a − R, a] (resp. [a, a + R]).

Lemma 4.2.8 (Abel’s Lemma (F12.1)). Let $(b_n)$ be a (non-strictly) decreasing sequence of
non-negative reals. Let $(a_n)$ be such that $\left( \sum_{n=1}^{m} a_n \right)_m$ is bounded on both sides, say by ±A.
Then for all n > m,
\[ \left| \sum_{j=m+1}^{n} a_j b_j \right| \le 2 A b_{m+1} \]

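Proof. The key here is to think of this as a special kind of “integration by parts” for sums.
Let
\[ s_m = \sum_{n=1}^{m} a_n \]
We know $|s_m|$ is bounded by A.

Claim 4.2.9. This is the parallel for sums of integration by parts.
\[ \sum_{j=m+1}^{n} s_j (b_{j+1} - b_j) + \sum_{j=m+1}^{n} a_j b_j = s_n b_{n+1} - s_m b_{m+1} \]

Proof. Note that $s_j - a_j = s_{j-1}$, so
\[
\begin{aligned}
\sum_{j=m+1}^{n} s_j (b_{j+1} - b_j) + \sum_{j=m+1}^{n} a_j b_j
&= \sum_{j=m+1}^{n} s_j b_{j+1} - \sum_{j=m+1}^{n} s_j b_j + \sum_{j=m+1}^{n} a_j b_j \\
&= \sum_{j=m+1}^{n} s_j b_{j+1} - \sum_{j=m+1}^{n} (s_j - a_j) b_j \\
&= \sum_{j=m+1}^{n} s_j b_{j+1} - \sum_{j=m+1}^{n} s_{j-1} b_j \\
&= \sum_{j=m+1}^{n} s_j b_{j+1} - \sum_{j=m}^{n-1} s_j b_{j+1} \\
&= s_n b_{n+1} - s_m b_{m+1}
\end{aligned}
\]

Now to prove Abel’s lemma, note (by the claim above and since $(b_j)$ is decreasing),
\[
\begin{aligned}
\left| \sum_{j=m+1}^{n} a_j b_j \right|
&\le |s_n b_{n+1} - s_m b_{m+1}| + \sum_{j=m+1}^{n} |s_j (b_{j+1} - b_j)| \\
&= |s_n b_{n+1} - s_m b_{m+1}| + \sum_{j=m+1}^{n} |s_j| (b_j - b_{j+1}) \\
&\le A b_{n+1} + A b_{m+1} + \sum_{j=m+1}^{n} A (b_j - b_{j+1}) \\
&= A b_{n+1} + A b_{m+1} + (A b_{m+1} - A b_{n+1}) \\
&= 2 A b_{m+1}
\end{aligned}
\]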
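The summation-by-parts identity, in the form $\sum s_j (b_{j+1} - b_j) + \sum a_j b_j = s_n b_{n+1} - s_m b_{m+1}$, and the resulting bound $2 A b_{m+1}$ can both be checked numerically on random data. The sketch below is only such a check; the random sequence, the choice $b_j = 1/j$, and the helper name `check` are assumptions made for illustration.

```python
import random

random.seed(1)
N = 100
# a[1..N]: arbitrary signed terms; index 0 is unused padding.
a = [0.0] + [random.choice([-1.0, 1.0]) * random.random() for _ in range(N)]
# b[1..N+1]: a decreasing sequence of non-negative reals.
b = [0.0] + [1.0 / j for j in range(1, N + 2)]

# Partial sums s[m] = a[1] + ... + a[m], with s[0] = 0.
s = [0.0] * (N + 1)
for j in range(1, N + 1):
    s[j] = s[j - 1] + a[j]
A = max(abs(x) for x in s)  # a bound on all partial sums

def check(m, n):
    """Return (identity residual, |sum a_j b_j|, 2*A*b_{m+1}) for 0 <= m < n <= N."""
    parts = sum(s[j] * (b[j + 1] - b[j]) for j in range(m + 1, n + 1))
    tail = sum(a[j] * b[j] for j in range(m + 1, n + 1))
    residual = parts + tail - (s[n] * b[n + 1] - s[m] * b[m + 1])
    return residual, abs(tail), 2.0 * A * b[m + 1]

results = [check(m, n) for m in range(0, N) for n in range(m + 1, N + 1)]
print("largest identity residual:", max(abs(r) for r, _, _ in results))
print("bound holds for all m < n:", all(t <= bound + 1e-12 for _, t, bound in results))
```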

Theorem 4.2.10. If
\[ \sum_{n=0}^{\infty} c_n (x - a)^n \]
converges at x = a + R (for R > 0), then the series
\[ f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n \]
converges uniformly on [a, a + R]. Similarly at a − R.

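Proof. For simplicity, suppose that a = 0. Fix ε > 0. By convergence of $\sum c_n x^n$ at x = R,
there exists N such that for n > m > N,
\[ \left| \sum_{j=m+1}^{n} c_j R^j \right| < \frac{\varepsilon}{3} \]
Now consider
\[ \sum_{j=m+1}^{n} c_j x^j = \sum_{j=m+1}^{n} c_j R^j \left( \frac{x}{R} \right)^j \]
for 0 ≤ x < R. By Abel’s lemma, used with $a_j = c_j R^j$, $b_j = (x/R)^j$, we obtain
\[ \left| \sum_{j=m+1}^{n} c_j R^j \left( \frac{x}{R} \right)^j \right| \le 2 \cdot \frac{\varepsilon}{3} \cdot \left( \frac{x}{R} \right)^{m+1} < \varepsilon \]
since x < R. Since N does not depend on x, the partial sums are uniformly Cauchy on [0, R].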

Corollary 4.2.11 (Abel’s Theorem). If
\[ \sum_{n=0}^{\infty} c_n (x - a)^n \]
converges at x = a + R (for R > 0), then the function
\[ f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n \]
is continuous from the left at a + R, i.e.
\[ \lim_{x \to (a+R)^-} \sum_{n=0}^{\infty} c_n (x - a)^n = \sum_{n=0}^{\infty} c_n R^n \]
Similarly at a − R.

Proof. A uniform limit of continuous functions is continuous, so this is immediate by the
previous theorem.

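Example 4.2.12. We compute the Taylor series for $f(x) = \sqrt{1 + x}$ around a = 0.
\[
\begin{aligned}
f(x) &= \sqrt{1 + x} &&\implies f(0) = 1 \\
f'(x) &= \frac{1}{2} (1 + x)^{-1/2} &&\implies f'(0) = \frac{1}{2} \\
f''(x) &= \frac{-1}{4} (1 + x)^{-3/2} &&\implies f''(0) = \frac{-1}{4} \\
f'''(x) &= \frac{3}{8} (1 + x)^{-5/2} &&\implies f'''(0) = \frac{3}{8} \\
&\ \ \vdots \\
f^{(k)}(x) &= \frac{(-1)^{k+1} \cdot (2k - 2)!}{(k - 1)! \, 2^{2k-1}} (1 + x)^{(-2k+1)/2} &&\implies f^{(k)}(0) = \frac{(-1)^{k+1} \cdot (2k - 2)!}{(k - 1)! \, 2^{2k-1}}
\end{aligned}
\]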

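So the Taylor series for f is
\[ 1 + \sum_{k=1}^{\infty} \frac{(-1)^{k+1} \cdot (2k - 2)!}{(k - 1)! \, 2^{2k-1} \, k!} x^k = \sum_{k=0}^{\infty} \frac{(-1)^{k+1} \cdot (2k)!}{(k!)^2 \, 4^k \, (2k - 1)} x^k \]
Call
\[ c_k = \frac{(-1)^{k+1} \cdot (2k)!}{(k!)^2 \, 4^k \, (2k - 1)} \]
(This formula for $c_k$ works also at k = 0, 1.) We take for granted for now that $\sum_k c_k x^k$
converges to $\sqrt{1 + x}$ on (−1, 0]. At x = −1, we have
\[ \sum_{k=0}^{\infty} \frac{(-1)^{k+1} \cdot (2k)!}{(k!)^2 \, 4^k \, (2k - 1)} \cdot (-1)^k = \frac{(-1) \cdot 0!}{(0!)^2 \, 4^0 \, (-1)} + \sum_{k=1}^{\infty} \left( - \frac{(2k)!}{(k!)^2 \, 4^k \, (2k - 1)} \right) = 1 - \sum_{k=1}^{\infty} \frac{(2k)!}{(k!)^2 \, 4^k \, (2k - 1)} \]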

This is the limit of a decreasing sequence; in fact the same is true for any x ∈ (−1, 0]. Let
\[ s_n(x) = \sum_{j=0}^{n} c_j x^j \]
Then on (−1, 0], $s_n(x)$ is decreasing in n and converges to $\sqrt{1 + x} \ge 0$. By continuity of $s_n$, at x = −1 we have $s_n(-1) \ge 0$, so
$(s_n(-1))_{n \in \mathbb{N}}$ is decreasing and bounded below by 0, whence it converges. Thus, our power
series converges at x = −1; by the theorem, it converges uniformly on [−1, 0].
Let’s recap. We saw that the polynomials $s_n$ converge uniformly to $\sqrt{1 + x}$ on [−1, 0]. In
the next lecture, we will prove some more general theorems about functions which can be
uniformly approximated by polynomials. We also found
\[ \lim_{n \to \infty} s_n(-1) = 1 - \sum_{k=1}^{\infty} \frac{(2k)!}{(k!)^2 \, 4^k \, (2k - 1)} = \sqrt{1 + (-1)} = 0 \]
so
\[ \sum_{k=1}^{\infty} \frac{(2k)!}{(k!)^2 \, 4^k \, (2k - 1)} = 1 \]
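The last identity converges slowly, which can be seen by computing partial sums exactly. The sketch below uses rational arithmetic and the term ratio $(2k - 1)/(2k + 2)$, which follows from the formula for the terms; the function name `partial_sum` and the sample sizes are illustrative assumptions. It also compares the gap $1 - \sum_{k \le n}$ against $\binom{2n}{n}/4^n$, a closed form that can be verified by induction from the same term ratio.

```python
from fractions import Fraction
from math import comb

def partial_sum(n):
    """Exact value of sum_{k=1}^{n} (2k)! / ((k!)^2 * 4^k * (2k - 1))."""
    total = Fraction(0)
    term = Fraction(1, 2)  # the k = 1 term
    for k in range(1, n + 1):
        total += term
        term *= Fraction(2 * k - 1, 2 * k + 2)  # ratio of term k+1 to term k
    return total

gaps = {n: float(1 - partial_sum(n)) for n in (10, 100, 1000)}
closed_form_matches = all(
    1 - partial_sum(n) == Fraction(comb(2 * n, n), 4 ** n) for n in (1, 2, 5, 10)
)

for n, gap in gaps.items():
    print(f"n = {n:5d}: 1 - partial sum ~ {gap:.6f}")
print("gap equals C(2n, n) / 4^n on the tested n:", closed_form_matches)
```

The gap decays roughly like $1/\sqrt{\pi n}$, so many terms are needed for even a few correct digits.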

4.3 Lecture 15 - Stone-Weierstrass and Taylor Series Error Approximation
Today, we will discuss the approximation of certain classes of functions uniformly by poly-
nomials; we will use this to prove the Stone-Weierstrass theorem. We will also give various
forms of the error bounds for Taylor expansions.
Last time, we showed there is a sequence of polynomials which converges uniformly to
$\sqrt{1 + x}$ on [−1, 0]. We still need to show that the Taylor expansion of $\sqrt{1 + x}$ at 0 converges
to $\sqrt{1 + x}$ on (−1, 0]. We will prove this later.
Proposition 4.3.1. Let X, Y and Z be general metric spaces. Suppose gn : X → Y converge
uniformly to g, and fn : Y → Z converge uniformly to f , and f is uniformly continuous.
Then (fn ◦ gn ) converges uniformly to f ◦ g.

Proof. Fix ε > 0. By the uniform continuity of f, there exists δ > 0 such that $d(y_1, y_2) < \delta \implies d(f(y_1), f(y_2)) < \varepsilon/2$.
Using uniform convergence of $f_n$, $g_n$, there exists N ∈ N such
that for all n > N,
\[ d(f_n(y), f(y)) < \frac{\varepsilon}{2} \quad \text{and} \quad d(g_n(x), g(x)) < \delta \]
for all y ∈ Y and x ∈ X. Then for all n > N, and for all x ∈ X,
\[ d(f(g(x)), f_n(g_n(x))) \le d(f(g(x)), f(g_n(x))) + d(f(g_n(x)), f_n(g_n(x))) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \]
since $d(g(x), g_n(x)) < \delta$ for all x ∈ X and $f_n \to f$ uniformly, in particular at $y = g_n(x)$.

Proposition 4.3.2. Each of the following is a uniform limit of polynomials on the indicated
domain:

(1) x ↦ |x| on [−1, 1]

(2) x, y ↦ |x − y|/2 on [−1, 1]^2

(3) x, y ↦ min(x, y) on [−1, 1]^2

(4) x, y ↦ max(x, y) on [−1, 1]^2

(5) x_1, . . . , x_k ↦ min(x_1, . . . , x_k) on [−1, 1]^k

(6) x_1, . . . , x_k ↦ max(x_1, . . . , x_k) on [−1, 1]^k

Proof. (1) Let h be x ↦ |x|, and let g : [−1, 1] → [−1, 0] be given by $x \mapsto x^2 - 1$, and
f : [−1, 0] → [0, 1] be given by $y \mapsto \sqrt{y + 1}$. Then one can see h = f ◦ g. We showed
in the last lecture that f is a uniform limit of polynomials; call them $(f_n)$. Also, f is
uniformly continuous, since f is continuous on the compact interval [−1, 0]. Since g is itself a polynomial,
it is trivially the uniform limit of the polynomials $(g_n)$ where $g_n = g$ for each n ∈ N.
By the previous proposition, $(f_n \circ g_n)$ converge uniformly to f ◦ g = h, and $f_n \circ g_n$ is a
composition of polynomials for each n ∈ N, hence a polynomial.

(2) Note x, y ↦ |x − y|/2 is the composition of x, y ↦ x − y, z ↦ |z|, w ↦ w/2. Since
z ↦ |z| is a uniform limit of polynomials, a near identical argument to that given in (1)
shows x, y ↦ |x − y|/2 is a uniform limit of polynomials.

(3) Note
\[ \min(x, y) = \frac{x + y}{2} - \frac{|x - y|}{2} \]
The first term is a polynomial in x and y, and the second term is a uniform limit of
polynomials by (2), so a similar argument to (1) and (2) does the job.

(4) Similarly, note
\[ \max(x, y) = \frac{x + y}{2} + \frac{|x - y|}{2} \]

(5) This can be done using a simple induction argument, using the composition
\[ \min(x_1, \dots, x_k, x_{k+1}) = \min(\min(x_1, \dots, x_k), x_{k+1}) \]

(6) This can similarly be done using a simple induction argument, using the composition
\[ \max(x_1, \dots, x_k, x_{k+1}) = \max(\max(x_1, \dots, x_k), x_{k+1}) \]
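The construction in (1) is explicit enough to run. The sketch below builds the degree-n Taylor polynomial $s_n$ of $\sqrt{1 + y}$ at y = 0, evaluates $s_n(x^2 - 1)$ as an approximation to |x|, and measures the largest error on a grid in [−1, 1]. It relies on the convergence of the Taylor series on [−1, 0], which the notes take for granted at this point; the function names and the grid size are assumptions for illustration.

```python
def sqrt_taylor_coeffs(n):
    """Coefficients c_0..c_n of the Taylor series of sqrt(1 + y) at y = 0."""
    coeffs = [1.0]
    if n >= 1:
        coeffs.append(0.5)
    for k in range(1, n):
        # c_{k+1} = -c_k * (2k - 1) / (2k + 2), from the closed form for c_k
        coeffs.append(-coeffs[k] * (2 * k - 1) / (2 * k + 2))
    return coeffs

def abs_poly(x, coeffs):
    """Evaluate s_n(x**2 - 1) by Horner's rule; this is a polynomial in x."""
    y = x * x - 1.0
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * y + c
    return acc

def max_error(n, grid=2001):
    """Largest value of |s_n(x^2 - 1) - |x|| over an evenly spaced grid in [-1, 1]."""
    coeffs = sqrt_taylor_coeffs(n)
    xs = [-1.0 + 2.0 * i / (grid - 1) for i in range(grid)]
    return max(abs(abs_poly(x, coeffs) - abs(x)) for x in xs)

errors = {n: max_error(n) for n in (10, 50, 200)}
for n, err in errors.items():
    print(f"degree {2 * n:3d} polynomial in x: max error ~ {err:.5f}")
```

The worst error sits at x = 0, where |x| has its corner, and it shrinks only slowly as n grows.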

Proposition 4.3.3. The previous proposition holds true if you replace [−1, 1] in the domains
by [−M, M] for any M > 0.
Proof. This is clear by composing the functions above with the linear polynomial which
scales the interval [−M, M] to [−1, 1].
Theorem 4.3.4 (Stone-Weierstrass). Let X be a compact metric space. Let A ⊆ C(X)
(where C(X) is the set of continuous functions from X into R) satisfy:
(1) A is closed under sums, products, and products with scalars. Precisely:
g ∈ A, c ∈ R =⇒ c · g ∈ A
f, g ∈ A =⇒ f + g ∈ A
f, g ∈ A =⇒ f g ∈ A
I.e., A is an algebra
(2) The constant function x ↦ 1 belongs to A (i.e. A is unital)
(3) For every x_1 ≠ x_2 ∈ X, there exists f ∈ A such that f(x_1) ≠ f(x_2) (A separates
points in X)
Then every f ∈ C(X) is a uniform limit of functions in A.
Example 4.3.5. If X = [a, b] ⊆ R, and A is the set of all polynomials, then A satisfies the
criteria in Stone-Weierstrass, so every continuous function f : [a, b] → R is a uniform limit
of polynomials.
Proof. As one might imagine, there is some heavy lifting to be done here. The proof is
essentially broken up into three steps. The first is to find functions in our algebra which
match f on at least two points for any two given points. Next, we use the continuity of our
approximating functions to keep these functions close to f on small neighborhoods about
one of the two points, then shrink down to a finite set of neighborhoods using compactness.
We then take the minimum of the finitely many functions we recover, and we can approximate
the min function on a finite set of variables uniformly using polynomials (using the previous
proposition!), bounding the difference between f and our approximating functions from
above. The third and final crucial step is to repeat this process using the max function to
bound the difference between f and our approximating functions from below. This is all made
precise in the following claims.
Fix f ∈ C(X) and ε > 0. We need g ∈ A such that for all x ∈ X,
|g(x) − f(x)| < ε

Claim 4.3.6. For every s, t ∈ X, there exists $f_{s,t} \in A$ such that $f_{s,t}(s) = f(s)$ and $f_{s,t}(t) = f(t)$.
Proof. If s = t, then we can use the constant function x ↦ f(s). Suppose s ≠ t. By (3),
there exists h ∈ A such that h(s) ≠ h(t). Since A is closed under multiplication by a scalar,
we can modify h such that we have h(t) − h(s) = f(t) − f(s) by multiplying h by
\[ \frac{f(s) - f(t)}{h(s) - h(t)} \]
where the denominator is nonzero by hypothesis. Then adding the constant function x ↦
f(s) − h(s), our function still lives in A by (1) and (2), and the modified h satisfies
\[ h(s) = f(s) \quad \text{and} \quad h(t) = f(t) \]

Claim 4.3.7. For every s ∈ X, there is $\hat{h}_s \in A$ such that
(1) $|\hat{h}_s(s) - f(s)| < \varepsilon/2$
(2) For all x ∈ X, $\hat{h}_s(x) < f(x) + \varepsilon/2$
Proof. The first condition might lead you to think we are taking a step backwards here; after
all, we previously found elements of A which agreed exactly with f at s. However, the first
condition comes at the cost of the second condition, which is clearly much stronger than
exact agreement at two points; our loss of exactness at s is the price for condition (2).
Fix s ∈ X. For each t ∈ X, by the previous claim there exists $f_{s,t} \in A$ such that
\[ f_{s,t}(s) = f(s) \quad \text{and} \quad f_{s,t}(t) = f(t) \]
By the continuity of $f - f_{s,t}$, there exists an open neighborhood $U_t$ of t such that for all
$x \in U_t$,
\[ |f_{s,t}(x) - f(x)| < \frac{\varepsilon}{4} \]
In particular, for all $x \in U_t$, $f_{s,t}(x) < f(x) + \varepsilon/4$. It is clear that the set of all such $U_t$ for all
t ∈ X forms an open cover of X, so by compactness, there are finitely many $t_1, \dots, t_k$ such
that $U_{t_1} \cup \dots \cup U_{t_k} = X$. Let
\[ h(x) = \min(f_{s,t_1}, \dots, f_{s,t_k})(x) \]
Note that h(s) = f(s), and for all x ∈ X, h(x) < f(x) + ε/4. To see this, note that for each
x ∈ X, there is an i ∈ {1, . . . , k} such that $x \in U_{t_i}$, whence
\[ h(x) = \min(f_{s,t_1}, \dots, f_{s,t_k})(x) \le f_{s,t_i}(x) < f(x) + \frac{\varepsilon}{4} \]
Let M > 0 bound $f_{s,t_1}, \dots, f_{s,t_k}$ on X, which is possible since each of $f_{s,t_1}, \dots, f_{s,t_k}$ is
continuous and X is compact. Let $p : [-M, M]^k \to [-M, M]$ be a polynomial such that for
all $(y_1, \dots, y_k) \in [-M, M]^k$,
\[ |p(y_1, \dots, y_k) - \min(y_1, \dots, y_k)| < \frac{\varepsilon}{4} \]
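This is possible by our earlier result on uniform approximation of the min in k variables
by polynomials. Now let $\hat{h}_s(x) = p(f_{s,t_1}(x), \dots, f_{s,t_k}(x))$. Then $\hat{h}_s \in A$, since A is closed under
“polynomial” operations and is unital, and for all x ∈ X,
\[ |\hat{h}_s(x) - h(x)| < \frac{\varepsilon}{4} \]
So, since h(s) = f(s),
\[ |\hat{h}_s(s) - f(s)| = |\hat{h}_s(s) - h(s)| < \frac{\varepsilon}{4} < \frac{\varepsilon}{2} \]
and for all x ∈ X,
\[ \hat{h}_s(x) < h(x) + \frac{\varepsilon}{4} < f(x) + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = f(x) + \frac{\varepsilon}{2} \]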

Claim 4.3.8. There is ĝ ∈ A such that for all x ∈ X,

f (x) − ε < ĝ(x) < f (x) + ε

This finishes the proof.


Proof. We almost had exactly what we wanted with the last claim, but the trouble is that we
could only get an upper bound on our approximating function. Now, we rig things similarly
to get a lower bound on the approximation error while not losing our upper bound.
For all s ∈ X, $|\hat{h}_s(s) - f(s)| < \varepsilon/2$ by the last claim. By the continuity of $\hat{h}_s - f$, we have
an open neighborhood $V_s$ of s such that for all $x \in V_s$,
\[ |\hat{h}_s(x) - f(x)| < \frac{3\varepsilon}{4} \]
In particular, for all $x \in V_s$,
\[ f(x) - \frac{3\varepsilon}{4} < \hat{h}_s(x) \]
Once again, the set of all neighborhoods $V_s$ for each s ∈ X forms an open cover of X, so by
the compactness of X, there are finitely many $s_1, \dots, s_l$ such that $V_{s_1} \cup \dots \cup V_{s_l} = X$. Let
$g = \max(\hat{h}_{s_1}, \dots, \hat{h}_{s_l})$. Then for all x ∈ X,
\[ g(x) < f(x) + \frac{\varepsilon}{2} \]
since $\hat{h}_s(x) < f(x) + \varepsilon/2$ for each x ∈ X, s ∈ X by the previous claim, and
\[ f(x) - \frac{3\varepsilon}{4} < g(x) \]
To see this, note that for each x ∈ X, there is an i ∈ {1, . . . , l} such that $x \in V_{s_i}$, whence
\[ g(x) = \max(\hat{h}_{s_1}, \dots, \hat{h}_{s_l})(x) \ge \hat{h}_{s_i}(x) > f(x) - \frac{3\varepsilon}{4} \]
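Now, as in the previous case, we can find a polynomial q on $[-M', M']^l$ for M' sufficiently
large such that for all $(y_1, \dots, y_l) \in [-M', M']^l$,
\[ |q(y_1, \dots, y_l) - \max(y_1, \dots, y_l)| < \frac{\varepsilon}{4} \]
Set $\hat{g} = q(\hat{h}_{s_1}, \dots, \hat{h}_{s_l})$. Then $\hat{g} \in A$, and for all x ∈ X,
\[ \hat{g}(x) < g(x) + \frac{\varepsilon}{4} < f(x) + \frac{\varepsilon}{2} + \frac{\varepsilon}{4} < f(x) + \varepsilon \]
and
\[ \hat{g}(x) > g(x) - \frac{\varepsilon}{4} > f(x) - \frac{3\varepsilon}{4} - \frac{\varepsilon}{4} = f(x) - \varepsilon \]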

These three claims together complete the proof. Note that we can replace the assumption
that A is unital with the weaker assumption that for all x ∈ X, there exists h ∈ A such that
h(x) 6= 0. One can then still prove the first claim with this weaker assumption. We also
used that A is unital to show that A is closed under compositions with polynomials p. But
this was only needed if p has a nonzero constant term; one can avoid this by showing that
the min and max functions can in fact be uniformly approximated by polynomials with a
zero constant term.

Some Error Bounds

Theorem 4.3.9 (Generalization of Mean Value Theorem). Let f, g : [a, b] → R be continuous
on [a, b] and differentiable on (a, b). Then there exists c ∈ (a, b) such that
\[ f'(c)(g(b) - g(a)) = g'(c)(f(b) - f(a)) \]
Moreover, if g'(x) is never 0 on (a, b), then
\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)} \]
Proof. Let
\[ h(x) = (f(x) - f(a))(g(b) - g(a)) - (g(x) - g(a))(f(b) - f(a)) \]
Then h is continuous on [a, b], differentiable on (a, b), and h(a) = 0 = h(b), so by Rolle’s
theorem, there exists c ∈ (a, b) such that h'(c) = 0, i.e.
\[ h'(c) = f'(c)(g(b) - g(a)) - g'(c)(f(b) - f(a)) = 0 \]
proving our first claim. Now if g'(x) ≠ 0 for all x ∈ (a, b), then g'(c) ≠ 0, and g(b) − g(a) ≠ 0
by Rolle’s Theorem, so
\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)} \]
As an aside, note that under the right hypotheses, L’Hôpital’s rule is an easy consequence of
this theorem. See the third incarnation from Lecture 10 for the analogy.

Remainder Calculation for Taylor Series
Let f be n-times continuously differentiable on [a, x], and (n + 1)-times differentiable on
(a, x). Consider the function
\[ F(u) = f(u) + f'(u)(x - u) + \frac{f^{(2)}(u)}{2!}(x - u)^2 + \dots + \frac{f^{(n)}(u)}{n!}(x - u)^n \]
as a function of u on the interval [a, x]. F is continuous on [a, x] and differentiable
on (a, x). Let g : [a, x] → R be continuous on [a, x] and differentiable on (a, x), with g' never 0 on (a, x). By the
generalization of the Mean Value Theorem, there exists c ∈ (a, x) such that
\[ \frac{F'(c)}{g'(c)} = \frac{F(x) - F(a)}{g(x) - g(a)} \]
Note that differentiating F as a function of u gives:
\[
\begin{aligned}
F'(u) ={}& f'(u) + \left[ (-1) f'(u) + f^{(2)}(u)(x - u) \right] + \left[ \frac{f^{(2)}(u)}{2!}(-1)(2)(x - u) + \frac{f^{(3)}(u)}{2!}(x - u)^2 \right] \\
& + \dots + \left[ \frac{f^{(n)}(u)}{n!}(-1)(n)(x - u)^{n-1} + \frac{f^{(n+1)}(u)}{n!}(x - u)^n \right] \\
={}& \frac{f^{(n+1)}(u)}{n!}(x - u)^n
\end{aligned}
\]
Then note that F(a) is the Taylor expansion of f around a to the nth term, F(x) = f(x), and
\[ F(x) - F(a) = f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k \]
is the remainder of the nth term Taylor expansion, denoted $R_n(x)$. By the generalized
Mean Value Theorem, there then exists c ∈ (a, x) such that
\[ \frac{\frac{f^{(n+1)}(c)}{n!}(x - c)^n}{g'(c)} = \frac{f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k}{g(x) - g(a)} \]

Theorem 4.3.10 (Taylor Expansion with Lagrange Remainder). Let f be n-times continuously
differentiable on [a, x] and (n + 1)-times differentiable on (a, x). Then
\[ f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + R_n(x) \]
where
\[ R_n(x) = \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1} \]
for some c ∈ (a, x).

Proof. Use the above computation with $g(u) = (x - u)^{n+1}$. Then
\[ g'(u) = (-1)(n + 1)(x - u)^n \]
and
\[ g(x) - g(a) = 0 - (x - a)^{n+1} \]
so there exists some c ∈ (a, x) such that
\[ R_n(x) = \frac{\frac{f^{(n+1)}(c)}{n!}(x - c)^n}{(-1)(n + 1)(x - c)^n} \cdot \left( 0 - (x - a)^{n+1} \right) = \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1} \]

Theorem 4.3.11 (Taylor Expansion with Cauchy Remainder). Let f be n-times continuously
differentiable on [a, x] and (n + 1)-times differentiable on (a, x). Then
\[ f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + R_n(x) \]
where
\[ R_n(x) = \frac{f^{(n+1)}(c)}{n!}(x - c)^n (x - a) \]
for some c ∈ (a, x).

Proof. Use the above computation with g(u) = x − u (or g(u) = u). Then g'(u) = −1 and
g(x) − g(a) = −(x − a), which gives the stated form.
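The Lagrange form gives a computable error bound whenever $|f^{(n+1)}|$ can be bounded. The sketch below checks this for f = sin around a = 0, where every derivative is bounded by 1, so $|R_n(x)| \le |x|^{n+1}/(n+1)!$. The choice of sin, the sample point x = 2, and the helper names are assumptions made for illustration.

```python
import math

# Derivatives of sin at 0 cycle through 0, 1, 0, -1.
SIN_DERIVS_AT_ZERO = [0.0, 1.0, 0.0, -1.0]

def taylor_sin(x, n):
    """Degree-n Taylor polynomial of sin around a = 0, evaluated at x."""
    return sum(SIN_DERIVS_AT_ZERO[k % 4] * x ** k / math.factorial(k) for k in range(n + 1))

def lagrange_bound(x, n):
    """Bound on |R_n(x)| from the Lagrange remainder, using |sin^(n+1)(c)| <= 1."""
    return abs(x) ** (n + 1) / math.factorial(n + 1)

x = 2.0
rows = [(n, abs(math.sin(x) - taylor_sin(x, n)), lagrange_bound(x, n)) for n in range(1, 16)]

for n, err, bound in rows:
    print(f"n = {n:2d}: actual error {err:.3e} <= bound {bound:.3e}")
```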

4.4 Lecture 16 - Power Series (II), Fubini’s Theorem, and exp(x)


Today, we will discuss yet another remainder term for Taylor expansions, the integral form
of the Taylor remainder. We will also prove Fubini’s theorem, and use this to justify the
multiplication of power series. We will use these tools to introduce the exponential function.

Theorem 4.4.1 (Taylor Expansion with Integral Remainder). Let f be (n + 1)-times
differentiable on [a, x] with $f^{(n+1)}$ Riemann integrable on [a, x]. Then
\[ f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + R_n(x) \]
where
\[ R_n(x) = \int_a^x \frac{f^{(n+1)}(t)}{n!}(x - t)^n \, dt \]
Proof. We proceed by induction on n. For the case n = 0, we need to show
\[ f(x) = f(a) + \int_a^x f'(t) \, dt \]
which is true by the 1st fundamental theorem of calculus. Assume the theorem is true for
n ≥ 0. Let f be (n + 2)-times differentiable, with $f^{(n+2)}$ Riemann integrable. Then in
particular, $f^{(n+1)}$ is continuous, hence Riemann integrable. By the theorem for n then:
\[ f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + \int_a^x \frac{f^{(n+1)}(t)}{n!}(x - t)^n \, dt \]
Note (when differentiating with respect to t)
\[ (x - t)^n = \left( -(x - t)^{n+1} \cdot \frac{1}{n + 1} \right)' \]
Hence, using integration by parts,
\[
\begin{aligned}
\int_a^x \frac{f^{(n+1)}(t)}{n!}(x - t)^n \, dt
&= \left. \frac{f^{(n+1)}(t)}{n!} \cdot \frac{-(x - t)^{n+1}}{n + 1} \right|_a^x + \int_a^x \frac{f^{(n+2)}(t)}{n!} \cdot \frac{(x - t)^{n+1}}{n + 1} \, dt \\
&= \frac{f^{(n+1)}(a)}{n!} \cdot \frac{(x - a)^{n+1}}{n + 1} + \int_a^x \frac{f^{(n+2)}(t)}{(n + 1)!} \cdot (x - t)^{n+1} \, dt
\end{aligned}
\]
So
\[
\begin{aligned}
f(x) &= \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + \frac{f^{(n+1)}(a)(x - a)^{n+1}}{(n + 1)!} + \int_a^x \frac{f^{(n+2)}(t)}{(n + 1)!} \cdot (x - t)^{n+1} \, dt \\
&= \sum_{k=0}^{n+1} \frac{f^{(k)}(a)}{k!}(x - a)^k + \int_a^x \frac{f^{(n+2)}(t)}{(n + 1)!} \cdot (x - t)^{n+1} \, dt \\
&= \sum_{k=0}^{n+1} \frac{f^{(k)}(a)}{k!}(x - a)^k + R_{n+1}(x)
\end{aligned}
\]
which completes the induction.


One can use the integral form of the Taylor remainder to show that for every x ∈ (−1, 0],
the Taylor series for $\sqrt{1 + x}$ around a = 0 converges to $\sqrt{1 + x}$ at x. Note that doing so
completes the proof of Stone-Weierstrass. Here are some orthogonal approaches to
Stone-Weierstrass (on [a, b] ⊆ R):

(i) Direct proof with explicit polynomials (this is the “Weierstrass” contribution of Stone-
Weierstrass)
(ii) First approximate the δ-“function” using polynomials; then given any f , its convo-
lutions with these polynomials uniformly approach f . See section 3.8 of Tao II for
details.

Proposition 4.4.2 (S07.7). Let f : R → R be twice continuously differentiable. Suppose f''
is uniformly bounded and f has a simple root at x∗, i.e. f(x∗) = 0 but f'(x∗) ≠ 0. Let $x_0 \in \mathbb{R}$,
and set $x_{n+1} = F(x_n)$, where
\[ F(x) = x - \frac{f(x)}{f'(x)} \]
Show that if $x_0$ is close enough to x∗, then there is a constant C such that for all n ≥ 1
\[ |x_n - x^*| \le C |x_{n-1} - x^*|^2 \]

Solution. Let M bound |f''|. Since f'(x∗) ≠ 0, we can fix δ > 0 and L > 0 using the continuity
of f' such that
\[ |x - x^*| < \delta \implies |f'(x)| > L \]
Now for x ∈ B(x∗, δ) we have (since f(x∗) = 0)
\[ |F(x) - x^*| = \left| x - \frac{f(x)}{f'(x)} - x^* \right| = \left| (x - x^*) - \frac{f(x) - f(x^*)}{f'(x)} \right| \]
By the Taylor series expansion of f around x, we have
\[ f(x^*) = f(x) + f'(x)(x^* - x) + R_2(x^*) \]
So
\[ f(x^*) - f(x) = f'(x)(x^* - x) + R_2(x^*) \]
Plug this in above to get
\[ |F(x) - x^*| = \left| (x - x^*) + \frac{f'(x)(x^* - x) + R_2(x^*)}{f'(x)} \right| \]
Using the Lagrange term for the remainder,
\[ R_2(x^*) = \frac{f''(c)(x^* - x)^2}{2} \]
for some c between x, x∗. Hence
\[ |F(x) - x^*| = \left| (x - x^*) + (x^* - x) + \frac{f''(c)(x^* - x)^2}{2 f'(x)} \right| \le \frac{M |x - x^*|^2}{2L} \]
Taking C = M/(2L) completes the proof, with one subtlety; we need to make sure F(x) ∈
(x∗ − δ, x∗ + δ) in order for the argument to work for all $x_n$. To do so, take x such that
|x − x∗| < min(δ, 2L/M), so that C|x − x∗| < 1 and hence |F(x) − x∗| < |x − x∗|.
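The quadratic error estimate can be observed directly on a concrete example. The sketch below runs the iteration $x_{n+1} = x_n - f(x_n)/f'(x_n)$ for $f(x) = x^2 - 2$, whose second derivative is constant, and prints the ratios $|x_n - x^*| / |x_{n-1} - x^*|^2$. The test function, the starting point 1.5, and the helper name `newton_errors` are assumptions chosen for illustration.

```python
def newton_errors(f, df, x0, root, steps):
    """Return |x_n - root| for n = 0..steps under x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    errors = [abs(x0 - root)]
    x = x0
    for _ in range(steps):
        x = x - f(x) / df(x)
        errors.append(abs(x - root))
    return errors

f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x
root = 2.0 ** 0.5

errors = newton_errors(f, df, x0=1.5, root=root, steps=4)
# Ratios |x_n - x*| / |x_{n-1} - x*|^2, kept only while the error is above rounding level.
ratios = [e1 / (e0 * e0) for e0, e1 in zip(errors, errors[1:]) if e1 > 1e-13]

for n, r in enumerate(ratios, start=1):
    print(f"n = {n}: error ratio ~ {r:.5f}")
```

For this f the ratio equals $1/(2 x_{n-1})$ exactly, so it approaches $1/(2\sqrt{2}) \approx 0.354$, consistent with C = M/(2L) for M = 2 and L close to $2\sqrt{2}$.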

Proposition 4.4.3 (W06.3). Let f : [a, b] → R be twice continuously differentiable. Find a
good error bound for the trapezoid approximation of $\int_a^b f(x) \, dx$, where the trapezoid approximation
for n = 1 is:
\[ (b - a) \cdot \frac{f(b) + f(a)}{2} \]
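Solution. The trapezoid approximation is given by
\[ \int_a^b l(x) \, dx \]
where
\[ l(x) = f(a) + (f(b) - f(a)) \cdot \frac{x - a}{b - a} \]
So the error is given by
\[ \left| \int_a^b f(x) - l(x) \, dx \right| \]
In Lecture 10, we got the following error bound on |f(x) − l(x)| using the higher order
Rolle’s theorem:
\[ |f(x) - l(x)| = \frac{(x - a)(b - x)}{2} \cdot |f''(c)| \]
for some c ∈ (a, b) depending on x. Since f'' is continuous on [a, b], let M be a bound for |f''|. Then the
error is bounded as follows:
\[ \left| \int_a^b f(x) - l(x) \, dx \right| \le \int_a^b \frac{(x - a)(b - x)}{2} \cdot M \, dx \le \int_a^b \frac{|b - a| |b - a|}{2} M \, dx = \frac{M (b - a)^3}{2} \]
Evaluating the middle integral exactly, using $\int_a^b (x - a)(b - x) \, dx = (b - a)^3/6$, gives the sharper bound $M (b - a)^3 / 12$.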
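A quick numerical comparison shows how the actual error relates to the bound $M (b - a)^3 / 2$ and to the sharper value $M (b - a)^3 / 12$ that results from integrating $(x - a)(b - x)/2$ exactly. The sketch uses f = sin on [0, 1] with the known antiderivative −cos; the test function, the value M = 1, and the helper name `trapezoid_error` are assumptions for illustration.

```python
import math

def trapezoid_error(f, F, a, b):
    """Exact integral (via an antiderivative F) minus the one-panel trapezoid value."""
    exact = F(b) - F(a)
    approx = (b - a) * (f(a) + f(b)) / 2.0
    return exact - approx

# Test function f = sin on [0, 1]: antiderivative -cos, and |f''| = |sin| <= M = 1.
a, b, M = 0.0, 1.0, 1.0
err = abs(trapezoid_error(math.sin, lambda x: -math.cos(x), a, b))
crude_bound = M * (b - a) ** 3 / 2.0    # bound obtained by estimating (x - a)(b - x) <= (b - a)^2
sharp_bound = M * (b - a) ** 3 / 12.0   # from integrating (x - a)(b - x) / 2 exactly

print(f"actual error  ~ {err:.6f}")
print(f"sharper bound = {sharp_bound:.6f}")
print(f"crude bound   = {crude_bound:.6f}")
```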

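Theorem 4.4.4 (Fubini’s Theorem for Sequences). If
\[ \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{n,m} \]
converges absolutely, i.e.
\[ \sum_{n=1}^{\infty} \left( \sum_{m=1}^{\infty} |a_{n,m}| \right) < \infty \]
then so does
\[ \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{n,m} \]
and
\[ \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{n,m} = \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{n,m} \]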

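Proof. Let
$$s_n = \sum_{m=1}^{\infty} |a_{n,m}|.$$
By assumption,
$$\sum_{n=1}^{\infty} s_n < \infty.$$
Fix ε > 0. There exists N ∈ N such that for all k > N,
$$\sum_{n=N+1}^{k} s_n < \frac{\varepsilon}{2}.$$
Also, for each n,
$$\sum_{m=1}^{\infty} |a_{n,m}| < \infty.$$
Hence, we can find M > N such that for all n ≤ N and for all l > M,
$$\sum_{m=M+1}^{l} |a_{n,m}| < \frac{\varepsilon}{2N}.$$
Then for all l, k > M,
$$\begin{aligned}
\left| \sum_{n=1}^{k} \sum_{m=1}^{l} a_{n,m} - \sum_{n=1}^{M} \sum_{m=1}^{M} a_{n,m} \right|
&= \left| \sum_{n=M+1}^{k} \sum_{m=1}^{l} a_{n,m} + \sum_{n=1}^{M} \sum_{m=M+1}^{l} a_{n,m} \right| \\
&\le \sum_{n=M+1}^{k} \sum_{m=1}^{l} |a_{n,m}| + \sum_{n=N+1}^{M} \sum_{m=M+1}^{l} |a_{n,m}| + \sum_{n=1}^{N} \sum_{m=M+1}^{l} |a_{n,m}| \\
&\le \sum_{n=N+1}^{k} s_n + \sum_{n=1}^{N} \sum_{m=M+1}^{l} |a_{n,m}| \\
&< \frac{\varepsilon}{2} + N \cdot \frac{\varepsilon}{2N} = \varepsilon.
\end{aligned}$$
A similar calculation gives the same bound with the order of summation reversed. Hence,
for large enough M and l, k > M, we have that
$$\sum_{n=1}^{k} \sum_{m=1}^{l} a_{n,m}, \qquad \sum_{m=1}^{l} \sum_{n=1}^{k} a_{n,m}$$
are both within ε of the square sum
$$\sum_{n=1}^{M} \sum_{m=1}^{M} a_{n,m}.$$
So
$$\sum_{n=1}^{k} \sum_{m=1}^{l} a_{n,m}, \qquad \sum_{m=1}^{l} \sum_{n=1}^{k} a_{n,m}$$
both converge to
$$\lim_{M \to \infty} \sum_{n=1}^{M} \sum_{m=1}^{M} a_{n,m}.$$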
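The absolute convergence hypothesis cannot be dropped. The sketch below uses a standard counterexample (not from the lecture): the array with 1 on the diagonal, −1 immediately to the right of it, and 0 elsewhere, for which the sum of absolute values diverges. The function names `a`, `row_sum` and `col_sum` are illustrative. Every row and column has at most two nonzero entries, so the inner sums are computed exactly.

```python
def a(n, m):
    """Counterexample array: 1 on the diagonal, -1 just right of it, 0 elsewhere."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0

def row_sum(n):
    # Row n is nonzero only at m = n and m = n + 1, so this finite sum is the full row sum.
    return sum(a(n, m) for m in range(1, n + 2))

def col_sum(m):
    # Column m is nonzero only at n = m - 1 and n = m, so this finite sum is the full column sum.
    return sum(a(n, m) for n in range(1, m + 1))

N = 50
rows_first = sum(row_sum(n) for n in range(1, N + 1))   # sum over n of (sum over m)
cols_first = sum(col_sum(m) for m in range(1, N + 1))   # sum over m of (sum over n)
print(rows_first, cols_first)                           # prints: 0 1
```

Every row sums to 0, while the first column sums to 1 and all other columns sum to 0, so the two iterated sums are 0 and 1 for every N.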

Multiplication of Power Series

Definition 4.4.5. Let (cn), (dn) be two sequences of real numbers. We define the sequence
(en) by
$$e_n = \sum_{j=0}^{n} c_j d_{n-j}.$$
The sequence (en) is called the convolution of (cn), (dn).

Theorem 4.4.6. Let f, g : (a − r, a + r) → R be real analytic at a, with radius of convergence
≥ r, given by power series
$$\sum_{n=0}^{\infty} c_n (x - a)^n$$
and
$$\sum_{n=0}^{\infty} d_n (x - a)^n$$
respectively. Then f g is real analytic at a with radius of convergence ≥ r, with coefficients
given by the convolution of (cn) and (dn).

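Proof. With the convention dm = 0 for all m < 0, note
$$\begin{aligned}
\left( \sum_{n=0}^{\infty} c_n (x-a)^n \right) \left( \sum_{m=0}^{\infty} d_m (x-a)^m \right)
&= \sum_{n=0}^{\infty} \left( c_n (x-a)^n \sum_{m=0}^{\infty} d_m (x-a)^m \right) \\
&= \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} c_n (x-a)^n d_m (x-a)^m \\
&= \sum_{n=0}^{\infty} \sum_{k=0}^{\infty} c_n (x-a)^n d_{k-n} (x-a)^{k-n} \\
&= \sum_{n=0}^{\infty} \sum_{k=0}^{\infty} c_n d_{k-n} (x-a)^k \\
&= \sum_{k=0}^{\infty} \sum_{n=0}^{\infty} c_n d_{k-n} (x-a)^k \\
&= \sum_{k=0}^{\infty} \sum_{n=0}^{k} c_n d_{k-n} (x-a)^k \qquad \text{since } d_m = 0 \text{ for all } m < 0 \\
&= \sum_{k=0}^{\infty} (x-a)^k \sum_{n=0}^{k} c_n d_{k-n} \\
&= \sum_{k=0}^{\infty} (x-a)^k e_k,
\end{aligned}$$
where interchanging the order of the sums on line 5 is justified by Fubini’s theorem, which
applies because both power series converge absolutely for |x − a| < r.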
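The coefficient formula can be checked on truncated coefficient lists. This is a minimal sketch in which the helper name `convolve` and the two test series are illustrative choices. Exact rational arithmetic is used so that the comparison is an equality rather than an approximation.

```python
from fractions import Fraction
from math import factorial

def convolve(c, d):
    """Return e with e[n] = sum_{j=0}^{n} c[j] * d[n-j], for n < min(len(c), len(d))."""
    size = min(len(c), len(d))
    return [sum(c[j] * d[n - j] for j in range(n + 1)) for n in range(size)]

# The geometric series 1/(1-x) has coefficients 1, 1, 1, ...;
# its square 1/(1-x)^2 has coefficients 1, 2, 3, ...
ones = [1] * 6
squared = convolve(ones, ones)
print(squared)                                   # [1, 2, 3, 4, 5, 6]

# Convolving the coefficients 1/k! with themselves gives 2^n / n!.
exp_coeffs = [Fraction(1, factorial(k)) for k in range(6)]
doubled = convolve(exp_coeffs, exp_coeffs)
print(doubled == [Fraction(2**n, factorial(n)) for n in range(6)])   # True
```

Only the first min(len(c), len(d)) coefficients of the product are determined by the truncated inputs, which is why the output is cut to that length.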

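Corollary 4.4.7. More generally, if Σ cn x^n and Σ dn y^n converge absolutely at x, y ∈ R, then
$$\sum_{n=0}^{\infty} \left( \sum_{j=0}^{n} c_j d_{n-j}\, x^j y^{n-j} \right)$$
converges, to
$$\left( \sum_{n=0}^{\infty} c_n x^n \right) \left( \sum_{n=0}^{\infty} d_n y^n \right).$$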

Definition 4.4.8. Define exp : R → R by
$$\exp(x) = \sum_{k=0}^{\infty} \frac{x^k}{k!}.$$

Theorem 4.4.9. (1) exp(x) converges absolutely for all x ∈ R
(2) exp is differentiable, and exp′(x) = exp(x)
(3) exp is continuous, hence Riemann integrable, and
$$\int_a^b \exp(x)\,dx = \exp(b) - \exp(a)$$
(4) exp(x + y) = exp(x) · exp(y)
(5) exp(0) = 1, exp(−x) = 1/ exp(x)
(6) exp is strictly increasing
Proof. (1) Note
$$\limsup_{n \to \infty} \left( \frac{1}{n!} \right)^{1/n} = 0,$$
so the radius of convergence is infinite.
(2) By our earlier theorem, exp(x) is differentiable, and its derivative is given by term by
term differentiation of the series. One can easily show by induction that the coefficients
of exp′(x) are exactly those of exp(x), whence the two series are equal.
(3) Since exp is differentiable, it is clearly continuous. One can show that
$$\int_a^b \exp(x)\,dx = \exp(b) - \exp(a)$$
either by using term by term integration (which is possible by a previous theorem), or
by using the fundamental theorem of calculus and applying (2).
(4) By Corollary 4.4.7 and the binomial theorem,
$$\begin{aligned}
\exp(x) \exp(y) &= \left( \sum_{k=0}^{\infty} \frac{x^k}{k!} \right) \left( \sum_{k=0}^{\infty} \frac{y^k}{k!} \right) \\
&= \sum_{n=0}^{\infty} \sum_{j=0}^{n} \frac{x^j}{j!} \cdot \frac{y^{n-j}}{(n-j)!} \\
&= \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{j=0}^{n} \frac{n!}{j!\,(n-j)!}\, x^j y^{n-j} \\
&= \sum_{n=0}^{\infty} \frac{1}{n!} (x + y)^n \\
&= \exp(x + y).
\end{aligned}$$
(5) Using (4) with x = y = 0 gives

exp(0 + 0) = exp(0) = exp(0) · exp(0)

so exp(0) = 1 (exp(0) ≠ 0, since the series at 0 has constant term 1). Then for any x ∈ R,

exp(−x + x) = exp(0) = exp(x) · exp(−x) = 1

so
$$\exp(-x) = \frac{1}{\exp(x)}.$$
(6) The series shows exp(x) ≥ 1 for x ≥ 0, and then (5) gives exp(−x) = 1/ exp(x) > 0, so
exp is strictly positive. Hence (2) shows that exp′(x) is strictly positive, and exp is
strictly increasing.
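Properties (4) and (5) are easy to test numerically from the defining series. The following is a minimal sketch: the helper name `exp_series`, the truncation at 60 terms, and the sample points are illustrative assumptions, and the comparisons hold only up to floating point error.

```python
import math

def exp_series(x, terms=60):
    """Partial sum of sum_{k>=0} x^k / k!, built term by term to avoid large factorials."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= x / (k + 1)      # turns x^k / k! into x^(k+1) / (k+1)!
    return total

x, y = 1.3, -0.7
print(exp_series(x + y), exp_series(x) * exp_series(y))   # property (4)
print(exp_series(0.0), exp_series(-x) * exp_series(x))    # property (5)
print(exp_series(1.0), math.e)                            # exp(1) against the library value of e
```

Each printed pair should agree to roughly machine precision for moderate |x|. For large |x| more terms would be needed.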

Definition 4.4.10. Define
$$e = \exp(1) = \sum_{n=0}^{\infty} \frac{1}{n!}.$$

Proposition 4.4.11. For every rational q, exp(q) = e^q.

Proof. An induction argument using (4) above shows the claim is true for q ∈ N, and (5)
establishes the claim for q ∈ Z. Then using (4) once more establishes the claim for all
rationals: for integers p and n ≥ 1, exp(p/n)^n = exp(p) = e^p, and exp(p/n) > 0.

Definition 4.4.12. For every a > 0, define the function x ↦ a^x to be the unique continuous
extension of the function q ↦ a^q for q ∈ Q.

Corollary 4.4.13. For all x ∈ R, exp(x) = e^x.

Proof. This is clear from Proposition 4.4.11 and Definition 4.4.12, since exp is continuous.

5 Week 5
As per the syllabus, Week 5 topics include: Fubini theorem for sequences, multiplication
of power series, the exponential and logarithm, sine and cosine, uniform approximation of
periodic functions by trigonometric polynomials, multi-variable differentiation, the chain
rule, partial derivatives, directional derivatives, differentiability of functions with continuous
partial derivatives, inverse function theorem, implicit function theorem, Lagrange multipliers,
integrals in several variables, change of variables, differentiation under the integral sign,
integration over product of spaces and double integrals, Clairaut’s theorem on equality of
mixed partial derivatives, local minima, maxima, and saddle points in two variables, Taylor’s
formula with remainder for functions of several variables, connection to Newton’s method in
several variables, line integrals, Green’s theorem, divergence theorem, Stokes’ theorem in R³.

5.1 Lecture 17 - Some Special Functions and Differentiation in Several Variables
Today, we will introduce the natural logarithm and the trigonometric functions sin and cos.
We will prove several key results about these functions, and will introduce basic Fourier
analysis. We will also discuss differentiation in several variables, and will prove the chain
rule. We will also introduce directional derivatives.
Definition 5.1.1. ln : (0, ∞) → R is the inverse of exp. Note that ln exists and is continuous
since exp is strictly monotone increasing, continuous, and onto (0, ∞).
Proposition 5.1.2. ln′(y) = 1/y.
Proof. By our earlier theorem on derivatives of inverses (see Lecture 9, Prop 3.1.8), ln is
differentiable, and
$$\ln'(y) = \frac{1}{\exp'(x)}$$
where y = exp(x), so
$$\ln'(y) = \frac{1}{\exp(x)} = \frac{1}{y}.$$

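Proposition 5.1.3. The series
$$-\sum_{n=1}^{\infty} \frac{x^n}{n}$$
converges to ln(1 − x) on (−1, 1).
Proof. Note that the series expansion
$$\sum_{n=0}^{\infty} t^n$$
converges absolutely to
$$f(t) = \frac{1}{1 - t}$$
for all |t| < 1. By our earlier theorem on integration of power series, we have
$$\int_0^x \frac{1}{1 - t}\,dt = \sum_{k=0}^{\infty} \int_0^x t^k\,dt = \sum_{k=0}^{\infty} \frac{x^{k+1}}{k+1} = \sum_{n=1}^{\infty} \frac{x^n}{n}$$
for all x ∈ (−1, 1). Additionally, by the fundamental theorem of calculus,
$$\int_0^x \frac{1}{1 - t}\,dt = \Big[ -\ln(1 - t) \Big]_0^x = -\ln(1 - x).$$
Comparing the two expressions gives the claim.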
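The expansion can be compared against a library logarithm at a few points of (−1, 1). This is a minimal sketch: the helper name `log_series`, the truncation at 200 terms, and the sample points are illustrative assumptions.

```python
import math

def log_series(x, terms):
    """Partial sum -sum_{n=1}^{terms} x^n / n, which approximates ln(1 - x) for |x| < 1."""
    return -sum(x**n / n for n in range(1, terms + 1))

for x in (0.5, -0.5, 0.9):
    print(x, log_series(x, 200), math.log(1 - x))
```

Convergence slows as |x| approaches 1, so points near the endpoints need many more terms for the same accuracy.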

Corollary 5.1.4. ln(1 − x) is analytic on (−1, 1).


Example 5.1.5. Here is a nice application of Abel’s theorem. In Lecture 5, we showed that
$$\sum_{n=1}^{\infty} \frac{x^n}{n}$$
converges at x = −1. By Abel’s theorem, it converges to − ln(1 − x) at x = −1, so
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{n} = -\ln(2).$$
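The endpoint value can be watched numerically. Here is a minimal sketch in which the helper name `alternating_partial_sum` and the values of N are illustrative. The last column uses the standard alternating series estimate, a fact not proved in this section, that the error after N terms is at most the first omitted term 1/(N + 1).

```python
import math

def alternating_partial_sum(N):
    """Partial sum sum_{n=1}^{N} (-1)^n / n."""
    return sum((-1) ** n / n for n in range(1, N + 1))

for N in (10, 100, 1000):
    s = alternating_partial_sum(N)
    # columns: N, partial sum, distance to -ln(2), alternating-series error bound
    print(N, s, abs(s + math.log(2)), 1 / (N + 1))
```

The convergence is slow (the error decays only like 1/N), in contrast with the geometric decay at interior points of (−1, 1).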

Proposition 5.1.6. Let f : (a − r, a + r) → (b − s, b + s) and g : (b − s, b + s) → R be analytic
at a, b with radii of convergence r, s > 0 respectively. Then g ◦ f is analytic at a, with radius
of convergence r.
Proof. Here’s a sketch. Plug the power series of f into the power series of g, and use
multiplication of power series (i.e., convolution and Fubini) together with absolute convergence.
Example 5.1.7 (The Trigonometric Functions).
Definition 5.1.8. For α ∈ R, define sin α, cos α as follows. Trace the path of length α on the
unit circle counterclockwise, starting at (1, 0). Let (x, y) be the endpoint of the path. Then
sin α = y, cos α = x. α is also the angle in the resulting right triangle with horizontal side
x, vertical side y, and hypotenuse 1.
Proposition 5.1.9. sin, cos are periodic, with period 2π.
Proposition 5.1.10. In the triangle with horizontal side x, vertical side y, and hypotenuse
r, where the angle between the horizontal side and the hypotenuse is α, x = r cos α, y = r sin α.
Proposition 5.1.11. sin and cos are continuous. In fact, they are Lipschitz continuous with
constant 1 on sufficiently small neighborhoods.
Proof. We give the proof for sin; the proof for cos is very similar. Let h be a small perturbation
of the angle α. We want to put bounds on the quantity sin(α + h) − sin(α). Let u be the
length of the chord connecting the points on the unit circle determined by the endpoints of
the path of length α and the path of length α + h. Clearly, u < h, since h is the length of
the arc connecting the two points, which must be greater than the length of the straight line
connecting the two points, i.e. u. If θ is the angle between u and the vertical line extending
from the endpoint of the path of length α + h, then it is clear that
$$\sin(\alpha + h) - \sin(\alpha) = u \cos\theta \le u < h = 1 \cdot h.$$

Proposition 5.1.12. sin and cos are differentiable, and sin′ α = cos α, cos′ α = − sin α.

Proof. We give the proof only for sin. Let u, h, α, and θ be as in the previous proof.
Then note
$$\alpha + \frac{\pi}{2} - \theta + \frac{h}{2} = \frac{\pi}{2}.$$
So
$$\frac{\sin(\alpha + h) - \sin(\alpha)}{h} = \frac{u \cos\theta}{h} = \frac{u}{h} \cos\left( \alpha + \frac{h}{2} \right).$$
As h → 0, u/h → 1, and by the continuity of cos, cos(α + h/2) → cos(α), so
$$\frac{\sin(\alpha + h) - \sin(\alpha)}{h} \to \cos(\alpha).$$
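The identity for the difference quotient can be checked numerically. This is a minimal sketch under one added assumption: the chord subtending an arc of length h on the unit circle has length u = 2 sin(h/2), a standard fact that is not derived in the proof above. The angle α = 0.8 and the values of h are illustrative.

```python
import math

alpha = 0.8  # illustrative angle
rows = []
for h in (0.5, 0.1, 0.001):
    u = 2 * math.sin(h / 2)   # chord length for an arc of length h on the unit circle
    quotient = (math.sin(alpha + h) - math.sin(alpha)) / h
    predicted = (u / h) * math.cos(alpha + h / 2)
    rows.append((h, u / h, quotient, predicted))
    print(h, u / h, quotient, predicted)
```

The last two columns should agree up to rounding, and the ratio u/h should approach 1 as h shrinks, so the quotient approaches cos(α).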

Corollary 5.1.13. sin, cos are smooth.

Proof. This is an easy induction using the above proposition.

Proposition 5.1.14. The Taylor expansions of sin and cos around 0 converge to sin and cos,
respectively, on (−∞, ∞).

Proof. We use the Lagrange remainder. As cos and sin are smooth, the remainder for the
n-term expansion is
$$|R_n(x)| = \left| \frac{f^{(n+1)}(c)}{(n+1)!}\, x^{n+1} \right|$$
for some c between 0 and x. Then as f^(n+1) = ± sin or ± cos, we have
$$|R_n(x)| \le \frac{1}{(n+1)!}\, |x|^{n+1},$$
which converges to 0 as n → ∞ for each fixed x.

Corollary 5.1.15.
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$$
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots$$
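The remainder estimate can be compared with the actual truncation error. Here is a minimal sketch for sin: the helper name `sin_taylor`, the point x = 2, and the degrees tested are illustrative choices, and n denotes the degree of the Taylor polynomial.

```python
import math

def sin_taylor(x, n):
    """Taylor polynomial of sin around 0, including all terms of degree <= n."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range((n + 1) // 2))

x = 2.0
for n in (3, 7, 11):
    error = abs(math.sin(x) - sin_taylor(x, n))
    bound = abs(x) ** (n + 1) / math.factorial(n + 1)   # Lagrange bound |x|^(n+1)/(n+1)!
    print(n, error, bound)
```

For each n the error should fall below the bound, and both shrink rapidly because the factorial eventually dominates the power of |x|.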
Definition 5.1.16. The trigonometric polynomials are the functions obtained as linear
combinations of sin(nx), cos(nx) for n = 0, 1, 2, . . .. Note all trigonometric polynomials
are periodic with period 2π. We can then view them as functions on the interval [0, 2π]
where we identify 0 and 2π, or equivalently, as functions on the unit circle (denoted R/2πZ).

Proposition 5.1.17. The trigonometric polynomials form an algebra on R/2πZ.

Proof. By definition, the trigonometric polynomials are closed under addition and scalar
multiplication, so we only need to show closure under products. It is sufficient to show that
for all m, n ∈ N,

sin(nx) cos(mx), sin(nx) sin(mx), cos(nx) cos(mx)

are trigonometric polynomials. Using the following trigonometric identities,
$$\sin A \sin B = \frac{1}{2} \left[ \cos(A - B) - \cos(A + B) \right]$$
$$\cos A \cos B = \frac{1}{2} \left[ \cos(A - B) + \cos(A + B) \right]$$
$$\sin A \cos B = \frac{1}{2} \left[ \sin(A - B) + \sin(A + B) \right]$$
we note
$$\sin(nx) \sin(mx) = \frac{1}{2} \left[ \cos((n - m)x) - \cos((n + m)x) \right]$$
$$\cos(nx) \cos(mx) = \frac{1}{2} \left[ \cos((n - m)x) + \cos((n + m)x) \right]$$
$$\sin(nx) \cos(mx) = \frac{1}{2} \left[ \sin((n - m)x) + \sin((n + m)x) \right]$$
which completes the proof.
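The three product formulas can be spot-checked numerically. This is a minimal sketch: the helper name `check_products`, the frequencies n = 3, m = 5, and the sample point x = 0.7 are illustrative, and a numerical check at one point is of course not a proof.

```python
import math

def check_products(n, m, x):
    """Return (product, sum form) pairs for sin*sin, cos*cos and sin*cos at the point x."""
    ss = (math.sin(n * x) * math.sin(m * x),
          0.5 * (math.cos((n - m) * x) - math.cos((n + m) * x)))
    cc = (math.cos(n * x) * math.cos(m * x),
          0.5 * (math.cos((n - m) * x) + math.cos((n + m) * x)))
    sc = (math.sin(n * x) * math.cos(m * x),
          0.5 * (math.sin((n - m) * x) + math.sin((n + m) * x)))
    return ss, cc, sc

for product, sum_form in check_products(3, 5, 0.7):
    print(product, sum_form)
```

When n < m the frequency n − m is negative. This is harmless, since cos is even and sin is odd, so the right-hand sides are still linear combinations of sin(kx), cos(kx) with k ≥ 0.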

Proposition 5.1.18. The algebra of trigonometric polynomials is unital and separates points.

Proof. The constant function 1 is cos(0x), hence is a trigonometric polynomial. One can
check that for any x ≠ y ∈ [0, 2π), either sin x ≠ sin y or cos x ≠ cos y.

Corollary 5.1.19. Any continuous function on R/2πZ (or equivalently, any continuous
function on R with period 2π) is the uniform limit of trigonometric polynomials.

Proof. Immediate by Stone-Weierstrass.


Note that Tao’s proof of Stone-Weierstrass also gives formulas for the coefficients of the
approximating trigonometric polynomials, using convolutions. The coefficients obtained are
the start of Fourier analysis.

Multivariable Calculus (Differentiation and Integration)

5.2 Lecture 18 - Inverse Function Theorem, Implicit Function Theorem and Lagrange Multipliers
Today, we will prove two essential results of multivariable calculus, namely the Inverse Func-
tion Theorem and the Implicit Function Theorem. We will then use the Implicit Function
Theorem to rigorously prove the Lagrange multipliers method for finding extrema constrained
to a particular surface.

5.3 Lecture 19 - Multivariable Integration and Vector Calculus
Today, we will introduce Riemann integration in several variables, and we will prove Fubini’s
theorem, which allows you to interchange the order of integration under the appropriate
circumstances. We will also prove the standard results of vector calculus, namely Green’s
Theorem, Stokes’ Theorem, and the Divergence Theorem.
