Foundations of Mathematics

Notes on the Foundations
of Mathematics and Analysis

Eduardo Dueñez Lucio Tavernini
July 28, 2020
Contents
0 Background
0.1 Introduction
0.2 Proof and Paradox
0.3 Essential Linguistic Concepts
0.4 A Dog’s World
0.5 Peano’s Postulates, Induction
0.6 Formal Grammars
0.7 Warning About the Notation Used
1 The Propositional Calculus
1.1 Introduction
1.2 The Language of the Propositional Calculus
1.3 Substitution
1.4 Interpretations
1.5 Tautologies and Contradictions
1.6 Examples
1.7 Some Useful Tautologies
1.8 Proofs
1.9 Appendix A: More Examples
1.10 Appendix B: The Greek Alphabet
2 The Predicate Calculus
2.1 Introduction
2.2 Quantification of Predicates
2.3 Formulas of the Predicate Calculus
2.4 Interpretations Revisited
3 Sets
3.1 Introduction
3.2 Axioms of Extension and Separation
3.3 Intersections and Differences
3.4 Unions
3.5 Powers, Products and Ordered Pairs
3.6 Appendix A: The Axiom Schema of Separation
4 The Natural Numbers
4.1 Introduction
4.2 The Set of Natural Numbers
4.3 The Peano Postulates
4.4 Trichotomy
5 Functions and Relations
5.1 Functions, Forward and Inverse Images
5.2 Compositions and Identities
5.3 Injections, Surjections, Bijections and Inverses
5.4 Properties of Functions
5.5 Indexed Sets
5.6 Equinumerous Sets
5.7 Relations
5.8 The Recursion Theorem
5.9 Cantor and Schröder-Bernstein Theorems
6 Numbers and Arithmetic
6.1 The Arithmetic of the Natural Numbers
6.2 Order
6.3 The Integers
6.4 The Rationals
6.5 Algebraic Structures
7 The Axiom of Choice
7.1 Introduction
7.2 Products and the Axiom of Choice
7.3 One-Sided Inverses
7.4 Countable and Uncountable Sets
8 The Real Numbers
8.1 Introduction
8.2 The Reals
8.3 Addition
8.4 Multiplication
8.5 The Real Field, Density
8.6 Monotonic Functions. No-Gaps Condition.
8.7 Powers, Roots, Exponentials and Logarithms
8.8 Sequences and Numerals
9 Important Subsets of the Reals
9.1 Preliminaries
9.2 Countable and Uncountable Sets
9.3 Open and Closed Sets
9.4 Properties of Open and Closed Sets
9.5 Convexity and Connectedness
9.6 Compactness
10 Sequences and Series
10.1 The Extended Reals
10.2 The Topology of the Extended Reals
10.3 Sequences and their Limits
10.4 Subsequences and Subsequential Limits
10.5 Cauchy Sequences
10.6 Some Convergence Results
10.7 Contractive Sequences
10.8 The Binomial Theorem
10.9 Monotonicity
10.10 Series
10.11 Euler’s Number e
11 Continuity
11.1 Limits of Functions
11.2 Continuous Functions
11.3 Density and compactness in spaces of functions
12 Answers to the Exercises
12.1 Note 1
12.2 Note 2
12.3 Note 3
12.4 Note 4
12.5 Note 5
12.6 Note 6
12.7 Note 8
12.8 Note 9
12.9 Note 10
12.10 Note 11
Foundations Note 0
Background
0.1 Introduction
There are those (Platonists) who believe that mathematics has an external reality
of its own and that mathematical objects really exist, somewhere out there, inde-
pendently of what we know about them, or what we call them.∗ With this point of
view, mathematics is a science. All mathematical theorems already exist, waiting
to be discovered.
There are others (formalists) who believe that mathematics has no external
reality and that mathematical objects only exist insofar as we have the ability to
describe their existence, to invent them, like characters in a novel. With this point
of view, mathematics is a game, the rules of which we make up as we go along,
according to our pleasure. In other words, mathematics is a manner of linguistic
expression and nothing more.
∗
“What’s in a name? That which we call a rose / By any other word would smell as sweet”
(Juliet, Act 2, Scene 2) refers to an external reality.
There are several other flavors of beliefs concerning what mathematics is and
the sort of reality in which it exists. In the end, it doesn’t matter who believes
what: Most mathematics courses a student encounters are pretty much immune
from such personal beliefs.
We would like to start developing mathematics using zero mathematical prereq-
uisites. Unfortunately, this does not appear to be possible. As harsh as this may
sound, we must understand certain mathematical concepts before we can present
those very same mathematical concepts. It is not so difficult to accept that math-
ematics is circular once we accept that we cannot bootstrap mathematics from
nothing. Our point of view is that we need a primitive understanding of recursive
structures: patterns we weave with language, which the reader must grasp with
ease if we are to proceed.
In Note 1 and Note 2 we develop the “background language,” where notions such
as set, set membership, function, finite and infinite are used without explanation to
develop the rules of reasoning we use in everyday mathematics. The background
language is the “bootstrapping” language: the most primitive language we use to
get us going. You may question the wisdom of our choice, but you can hardly
question that we need a starting place.†
Then, with only two undefined primitive notions (set and set membership) we
begin the construction of a mathematical universe where quite a bit of mathe-
matics exists (so that all mathematical objects are objects in this universe): the
ZF universe, named after Ernst Zermelo and Abraham Fraenkel, the two major
contributors to the ideas used in its construction.
We do not claim that the ZF universe is the only, or best, place to do math-
ematics. In fact, the ZF universe does not suffice for all of mathematics. But, it
works quite well for enough mathematics to keep us going for some time. The ZF
universe is a good beginner’s universe, a nice place in which to start developing
some understanding of mathematical objects: Here, every mathematical object is
a set. There are only sets, and nothing but sets. The “unsets” cannot be spoken
of (because they don’t exist). But, we’re getting ahead of ourselves. Mathematics,
like many other enterprises, relies on the use of language to talk about truth and
falsehood. Therefore, we begin by reviewing how we use language.
†
There is a branch of mathematics called logic, but it is not the place where beginners can
start comfortably.
0.2 Proof and Paradox

In mathematics we must do a great deal of logical reasoning. We are not going
to argue about what “logical reasoning” means. We want to use everyday logic,
common-sense logic.
Given certain statements, by the use of reason and known facts we should be
able to determine the truth or falsehood of those statements. We begin our practice
with statements made in plain everyday language.
0.2.1 Example. Sancho Panza, Governor of Barataria: Sancho Panza, while
governor of the island Barataria,‡ is asked for an opinion on the following difficult
and important case. To cross a certain bridge, next to which there is a gallows,
one must state one’s destination and purpose (adónde y a qué va). According to
local law, a person making a false statement must be punished by hanging until
dead on the gallows by the bridge, while a person making a true statement must
be allowed to cross freely. No exceptions can be made (sin remisión alguna).
The problem arises when a man states that he wants to cross so he may be
hanged on the gallows by the bridge. The judges realize that they have a problem:
If they hang the man, then his statement is true, so he must be allowed to cross
freely. If they don’t hang him, then his statement is false, so he must be hanged,
as required by law.
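The judges’ dilemma can be checked by brute force. The short Python sketch below is our own illustration, not part of the original example: it assumes only that the man’s statement is true exactly when he is hanged, tries both verdicts, and keeps those that agree with the law.

```python
# Sketch: the bridge law of Example 0.2.1 as two Boolean rules.
# Assumption (ours): the man's statement "I cross in order to be hanged"
# is true exactly when he is in fact hanged.


def statement_is_true(hanged: bool) -> bool:
    """Truth value of the man's stated purpose, given the verdict."""
    return hanged


def law_demands_hanging(statement_true: bool) -> bool:
    """Local law: a false statement is punished by hanging, a true one is not."""
    return not statement_true


def lawful_verdicts() -> list[bool]:
    """All verdicts (True = hang, False = let cross) that agree with the law."""
    return [
        hanged
        for hanged in (True, False)
        if law_demands_hanging(statement_is_true(hanged)) == hanged
    ]


if __name__ == "__main__":
    for hanged in (True, False):
        truth = statement_is_true(hanged)
        print(f"hang={hanged}: statement true={truth}, "
              f"law demands hanging={law_demands_hanging(truth)}")
    print("lawful verdicts:", lawful_verdicts())
```

Neither verdict survives: hanging the man makes his statement true, which forbids hanging, and sparing him makes it false, which demands hanging. The list of lawful verdicts comes out empty, which is exactly the judges’ problem.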
Since the man is lying and is also telling the truth, Sancho at first proposes that
the portion§ which is lying be hanged, while the portion which is telling the truth
be allowed to cross, thus satisfying the letter of the law (desta manera se cumplirá
al pie de la letra la condición del pasaje). The emissary sent by the judges points
out that this would result in the man’s death, violating the spirit of the law (no se
consigue cosa alguna de lo que la ley pide) for the other portion. Sancho recognizes
the problem and suggests the man be allowed to cross, adding that the law should
be merciful when there is doubt.
Sancho’s solution makes sense, but it does not solve the logical dilemma: it is
practical rather than logical. How do we find a practical solution if we insist that
a statement must be either true or false and cannot be anything else?
We call a statement a paradox or antinomy if a logically correct argument starting
from it leads to self-contradiction. Of course, we are not being very precise here:
What is a statement? What is a logically correct argument? Does self-contradiction
mean that if a statement is true then it must be false, and if false then it must
be true? What is the meaning of truth and falsehood? Let’s take these things to
mean what they usually mean in everyday language.
‡
Miguel de Cervantes Saavedra, Don Quijote de la Mancha (II, Chapter LI), Editorial Juventud,
Barcelona, 1968, p. 908.
§
Of the man himself!
We have used, and will continue to use, a condition that is essential to everyday
mathematical reasoning, the law of the excluded middle: either a statement is true
or its denial (also called negation) is true.
We have just seen an example of a paradox. (This is by no means the shortest
known paradox, but it is one of the most colorful.) It is important to recognize
one when we see it. It is equally important not to see one when there is none, as
we show in the next example.
0.2.2 Example. Paul’s Epistle to Titus: We present what is sometimes incorrectly
called the “Epimenides paradox.” The Berry¶ word-for-word translation of
Paul’s Epistle to Titus, I, 12, reads
Said one of themselves of their own a prophet, Cretans always [are]
liars, evil wild beasts, gluttons lazy.
(The “are” is not explicit in the original Greek, but is implied.) The corresponding
King James Version reads
One of themselves, even a prophet of their own, said, The Cretians [sic]
are alway [sic] liars, evil beasts, slow bellies.
Evidently, early Christian tradition identifies the prophet as the philosopher
Epimenides of Crete (sixth century b.c.e.). “Said one of themselves of their own”
means that it was a Cretan who made the statement. We may as well call this
Cretan Epimenides. The statement also says “Cretans are always liars.” For our
argument we take just these two clauses.
Consider now the following line of reasoning.
If what Epimenides said is true, then his statement must be false, be-
cause Epimenides is a Cretan. On the other hand, if what he said is
false, then Cretans aren’t liars, so that his statement must be true. The
conclusion is that if Epimenides’s statement is true, then it is false, and
that if it is false, then it is true.
The flaw should be obvious: The negation of “Cretans are always liars” is “there
is some Cretan who is not always a liar.” The above reasoning contains a serious
blunder in taking the negation of a sentence. Let’s try again with a different line
of reasoning:
If Epimenides’ statement is true, then what he said is false, because
Epimenides is a Cretan, so there must be some Cretan who is not always
a liar. If what he said is false, the statement being false is the same as
saying that there is some Cretan who is not always a liar. Therefore
Epimenides’ statement that Cretans are always liars (whether true or
false) implies that there is some Cretan who is not always a liar. That
is, the statement is false.
¶
George Ricker Berry, The Interlinear Literal Translation of the Greek New Testament, Zondervan,
Grand Rapids, Michigan, 1958, p. 555.
The “Epimenides paradox” is no paradox after all. The conclusion that the
statement is false applies to “Cretans are always liars” in the context of the state-
ment “A Cretan said: ‘Cretans are always liars.’ ” In other words, we have not
shown that Paul’s Epistle to Titus contains a false statement. We have shown that
the alleged paradox does not exist.
0.2.3 Example. Proof by Contradiction: The argument we used in the previous
example is called proof by contradiction, also known by its Latin name, reductio
ad absurdum, for “reduction to the absurd,” or “reduction to the impossible.” This
is a logical argument where we assume the truth of a statement to arrive at a false
result (an absurdity). We then conclude that the original statement we assumed
to be true must, in fact, be false. This line of reasoning is based on the law of the
excluded middle, whereby a statement must be either true or false.
In the notes that follow, we shall discuss proof by contradiction in great detail
and we shall use it frequently in many of the proofs. Below, we give a more practical
application to illustrate its use in everyday mathematics.
Recall that the integer m is even if m = 2k for some integer k and is odd if
m = 2k + 1 for some integer k.‖ We show that if $m^2$ is even then m is also even.
To obtain a contradiction, suppose that $m^2$ is even and that m is odd.
Since m is odd, we have m = 2k + 1, which gives
$$m^2 = (2k+1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1,$$
which is odd, contradicting the fact that $m^2$ is even. We conclude that if $m^2$ is
even then so is m.
‖
For now, we assume without proof the well-known theorem that (according to the above
definitions) every integer is either even or odd, but never both.
Now we prove that $\sqrt{2}$ is irrational. Recall that a real number x is called
rational if it can be written as the ratio m/n of two integers m and n with n ≠ 0.
Otherwise, x is called irrational. To obtain a contradiction, suppose that $\sqrt{2}$
is rational, so that we can write
$$\sqrt{2} = \frac{m}{n}. \tag{1}$$
Further, we can assume that m and n have no common factor other than 1 and −1,
that is, that the fraction in (1) is in reduced form.
By squaring (1) we obtain $2 = m^2/n^2$, which gives
$$m^2 = 2n^2. \tag{2}$$
That is, $m^2$ is even. By the result just proved, m is also even. In other words, m = 2k
for some integer k. Therefore, $m^2 = 4k^2$. We substitute this last equality in (2) to
obtain $4k^2 = 2n^2$. This gives $2k^2 = n^2$, showing that $n^2$ is also even. Hence, n is
even also.
We have reached the conclusion that m and n are both even, contradicting our
original assumption that the fraction in (1) is in reduced form.
Therefore, we have shown that it is impossible to write $\sqrt{2}$ in the form (1).
Equivalently, we have shown that $\sqrt{2}$, not being rational, must be irrational.
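Neither claim can be proved by computation, since each concerns infinitely many integers, but a quick search can at least fail to find a counterexample. The Python sketch below (the bounds are chosen arbitrarily by us) checks the parity claim for small integers and searches for reduced fractions m/n with m² = 2n², that is, for solutions of equation (2).

```python
from math import gcd

# Finite sanity checks (not proofs) for Example 0.2.3.


def is_even(m: int) -> bool:
    return m % 2 == 0          # Python's % gives 0 or 1 for negative m too


def square_parity_holds(bound: int) -> bool:
    """Check: for every integer m with |m| <= bound, if m*m is even then m is even."""
    return all(is_even(m) for m in range(-bound, bound + 1) if is_even(m * m))


def reduced_solutions(max_n: int) -> list[tuple[int, int]]:
    """Reduced fractions m/n with 1 <= n <= max_n and m*m == 2*n*n."""
    hits = []
    for n in range(1, max_n + 1):
        # Since 1 < sqrt(2) < 2, a positive candidate m must satisfy n < m < 2n.
        for m in range(n + 1, 2 * n):
            if gcd(m, n) == 1 and m * m == 2 * n * n:
                hits.append((m, n))
    return hits


if __name__ == "__main__":
    print(square_parity_holds(1000))   # True
    print(reduced_solutions(300))      # []
```

Finding nothing is consistent with the proof above, but it is the proof, not the search, that settles the question for all integers.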
If it is not clear why (or how) proofs by contradiction work, be patient. We
shall examine the ideas involved in great detail in the next note. Many will be
quite surprised at the simplicity of the ideas involved in this line of reasoning.
If you feel frustrated at this point, rest assured that proofs by contradiction
should become quite clear in Note 1.
0.3 Essential Linguistic Concepts

Mathematical structure is an extension of linguistic structure. (We need language
to communicate mathematics.) Because we would like to begin with zero math-
ematical prerequisites, we must face the issue of how we should go about getting
started. (As it turns out, we cannot even present linguistic structure with zero
mathematical prerequisites: Linguists use mathematical notions to describe the
structure of languages.)
Any discussion, if it is to begin without presupposing that all participants
have prior expertise in some specialized area of knowledge, must begin with some
common, plain, everyday language, the background language: the informal language
we use here to talk about mathematics. If you and I had no common language,
where would we begin? Where could we begin?
Because language (any language) cannot define itself, we would need an
external reference of some sort. Lévi-Strauss∗∗ illustrates the confusion that an
unfamiliar set of external references can cause, as reported by Smith Bowen:††
∗∗
Claude Lévi-Strauss, The Savage Mind, The University of Chicago Press, Chicago, Illinois,
1966, p. 6.
††
Elenore Smith Bowen, Return to Laughter, Anchor Books, New York, NY, 1964, p. 16.
E. Smith Bowen scarcely exaggerates in the amusing description she
gives of her confusion when, on her arrival in an African tribe, she
wanted to begin by learning the language. Her informants found it
quite natural, at an elementary stage of their instruction, to collect
a large number of botanical specimens, the names of which they told
her as they showed them to her. She was unable to identify them,
not because of their exotic nature but because she had never taken an
interest in the riches and diversities of the plant world. The natives on
the other hand took such an interest for granted.
These people are farmers: to them plants are as important
and familiar as people. I’d never been on a farm and am not
even sure which are begonias, dahlias, or petunias. Plants,
like algebra, have a habit of looking alike and being different,
or looking different and being alike; consequently mathematics
and botany confuse me. For the first time in my life
I found myself in a community where ten-year-old children
weren’t my mathematical superiors. I also found myself in a
place where every plant, wild or cultivated, had a name and
a use, and where every man, woman and child knew literally
hundreds of plants ... (my instructor) could not realize that it
was not the words but the plants which baffled me. (Smith
Bowen, p. 19).
In the same way that language cannot define itself but must rely on external ref-
erences, mathematics cannot define itself but must rely on some degree of linguistic
structure being already present. What linguistic structure must be within the
linguistic competence of the reader to understand the beginning of the development
of mathematics? The structures we have in mind, whether linguistic or mathematical,
are called inductive structures. In the next section we begin to answer this question
with the simplest meaningful example of such a structure.
0.4 A Dog’s World

How can we define a language? We show how to accomplish this for a very simple
language, one that is a very, very small subset of English, to keep things simple
and manageable.
To simplify things, we ignore capitalization and punctuation marks. Is the text
bo has fleas
an English sentence? Of course it is. Is the text
the dog who bit bo has fleas
an English sentence also? Again, yes. How about the text given below?
the dog who bit the dog who bit bo has fleas
Again, we are compelled to say yes. You can see where all of this leads: If I ask
“Can I prefix the above sentence with another the dog who bit and still have an
English sentence?” you’d be hard-pressed to say no; on what grounds can you deny
me this request? On what grounds can you object to my repeating this operation
any number of times? Where can you draw the line? Is two times permissible, but
not three? Is 346 times permissible, but not 347?
Out of sheer frustration, you might agree that the text
the dog who bit · · · the dog who bit bo has fleas
is always an English sentence, where “· · · ” means “any number of times,” this
applying to the number of times the phrase the dog who bit is repeated.
The use of “· · · ” is objectionable: We don’t know what it means. We need
something precise, something definite, something explicit, something explicative:
We want to know precisely what “· · · ” means. Let’s answer this question. Let’s
define a language called DogLan (short for Dog Language). It’s a very simple
language defined by the rules:
(DL1) bo has fleas is a sentence of DogLan.
(DL2) If we prefix any sentence of DogLan with the text the dog who bit we
obtain a sentence of DogLan.
(DL3) The only sentences of DogLan are those which can be obtained from
finitely many applications of rules (DL1) and (DL2) above.
The phrase “finitely many” in (DL3) means that the process of applying rules
must eventually terminate: We do not allow this process (even if just mental) to
be repeated forever.
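Rule (DL3) is what makes membership decidable by a simple procedure: strip off copies of the dog who bit until nothing more can be stripped, then check whether what remains is bo has fleas. The Python sketch below is one way to write this (our own illustration; the function names are hypothetical, and words are assumed to be separated by single spaces, as in the examples).

```python
# Sketch of a membership test for DogLan, following rules (DL1)-(DL3).
# Assumption (ours): words are separated by single spaces, as in the examples.
BASE = "bo has fleas"            # the sentence given by rule (DL1)
PREFIX = "the dog who bit "      # the text prefixed by rule (DL2), plus a space


def is_doglan(text: str) -> bool:
    """True when text is obtained from BASE by finitely many applications of (DL2)."""
    while text.startswith(PREFIX):
        text = text[len(PREFIX):]  # undo one application of (DL2); text gets shorter
    return text == BASE


def doglan_sentence(times: int) -> str:
    """Apply (DL2) to the (DL1) sentence the given number of times."""
    return PREFIX * times + BASE


if __name__ == "__main__":
    print(is_doglan("the dog who bit the dog who bit bo has fleas"))  # True
    print(is_doglan("the dog who bit"))                               # False
    print(doglan_sentence(2))
```

Because each pass through the loop shortens the text, the procedure always stops, mirroring the requirement in (DL3) that only finitely many rule applications are allowed.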
Is this so hard to understand? If you say yes, it might just mean that you find
these properties of English (and of many other languages) uninteresting, perhaps
even dull, finding yourself much as Smith Bowen found herself in Africa.
There are plenty more constructions of this kind: i dream, i dream that i dream,
i dream that i dream that i dream, and so on, is another.
You may reasonably ask: “Is this for real? What’s the point of all this?” The
point is that these are the things that we must look at. We’re using certain pecu-
liarities of language to achieve a very specific goal: The discovery—or invention, it
doesn’t matter which—of nothing less than mathematics. We are using mathematical
thought at its most fundamental, irreducible, primitive level: What we have
done cannot be explained in simpler terms, and at this level we cannot separate
“mathematical” thought from “linguistic” thought.
Is the text
the dog who bit the dog who bit the dog who bit bo has fleas
a sentence of DogLan? To show that it is, consider the following argument. The
text
bo has fleas
is a sentence of DogLan by rule (DL1). We apply rule (DL2) to the above to obtain
the text
the dog who bit bo has fleas
which is a sentence of DogLan. We apply rule (DL2) to the above to obtain the
text
the dog who bit the dog who bit bo has fleas
which is a sentence of DogLan. Again, we apply rule (DL2) to the above to obtain
the dog who bit the dog who bit the dog who bit bo has fleas
which is a sentence of DogLan. Of course, this whole thing is highly unsatisfactory:
How can we compare the above sentence with the sentence given originally to see
if they’re identical?
We need an explicit method for comparing whether any two sentences of DogLan
are identical. We propose the following. To avoid writing so much, let us use the
abbreviations:
D is an abbreviation for the dog who bit.

B is an abbreviation for bo has fleas.
Further, let us say that if S is any sentence of DogLan, then DS represents the
sentence S prefixed with the dog who bit. We use the equality symbol “=”
between any two sentences of DogLan to mean that the two sentences are identical.
To deny that the two sentences are identical we use the symbol “≠” instead. For
example:
bo has fleas ≠ the dog who bit bo has fleas
the dog who bit bo has fleas = the dog who bit bo has fleas
In abbreviated form, we have:
B ≠ DB
DB = DB
(In writing an expression such as DB, we are implicitly defining an operation,
which in this case is simply juxtaposition. In fact, this operation was already used
to state property (DL2) of DogLan.)
How do we know when two sentences of DogLan are identical? Consider the
following additional rules:
(DL4) The sentence bo has fleas cannot be obtained by prefixing any sentence
of DogLan with the phrase the dog who bit. In abbreviated form: If
S is any sentence of DogLan, then DS ≠ B.
(DL5) Let $S_{\text{left}}$ and $S_{\text{right}}$ denote any two sentences of DogLan. If
$DS_{\text{left}} = DS_{\text{right}}$, then $S_{\text{left}} = S_{\text{right}}$.
(Note that the converse of property (DL5) is assumed implicitly in order to make
the operation of juxtaposition well defined.)
For example, we see that DDDB ≠ DDB by arguing that
DB ≠ B by (DL4),
DDB ≠ DB by (DL5), since DDB = DB would give DB = B,
DDDB ≠ DDB by (DL5), since DDDB = DDB would give DDB = DB.
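The argument just given is mechanical enough to be written as a loop: peel one D off each side, as (DL5) allows, until one side is B, at which point (DL4), or the identity of B with itself, decides the question. The sketch below is our own Python rendering of that argument on the abbreviated forms; the function names are hypothetical.

```python
# Sketch: deciding equality of abbreviated DogLan sentences (strings of D's
# followed by one B) using only the reasoning of (DL4) and (DL5).


def is_abbreviated(s: str) -> bool:
    """True for strings of the form D...DB (zero or more D's, then B)."""
    return s.endswith("B") and set(s[:-1]) <= {"D"}


def same_sentence(left: str, right: str) -> bool:
    assert is_abbreviated(left) and is_abbreviated(right)
    while True:
        if left == "B" and right == "B":
            return True
        if left == "B" or right == "B":
            return False             # (DL4): DS is never B
        # Both sentences have the form DS. By (DL5) and its converse,
        # DS_left = DS_right exactly when S_left = S_right.
        left, right = left[1:], right[1:]


if __name__ == "__main__":
    print(same_sentence("DDDB", "DDB"))   # False
    print(same_sentence("DDB", "DDB"))    # True
```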
The concepts we have used in the above discussion are fundamental to the most
basic form of mathematical reasoning and cannot be put in more elementary form.
In fact, rules (DL1) through (DL5) are nothing less than a rephrasing of Peano’s
celebrated postulates (axioms) for the natural numbers, introduced in 1891. We
discuss some details below.
0.5 Peano’s Postulates, Induction

Can we describe the natural numbers 0, 1, 2, . . . using anything that is more ele-
mentary than our primitive understanding of these numbers? Can we explain what
these numbers are in terms of more primitive notions?
The notion of counting on one’s fingers can be extended and put into writing
by using unary notation, whereby we place a mark (say |) to denote 1, then add
another (say ||) to denote 2, then add another (say |||) to denote 3 and so on.
We prefer to start with 0, then write 0 | or 0′ (for 1), then 0 || or 0′′ (for 2), then
0 ||| or 0′′′ (for 3) and so on. The objects we generate in this manner represent the
collection N of natural numbers, N = {0, 1, 2, . . .}.
The notion of successor is very important. Every natural number n has exactly
one successor n′, which is also a natural number. Of course, n′ = n + 1 in our
everyday language. The first successor is
0′ = 1.
The next is
0′′ = (0′)′ = 1′ = 2.
Then comes
0′′′ = (0′′)′ = 2′ = 3,
and so on. Note that 0 is not a successor.
0.5.1 Definition. The Postulates: The natural numbers N satisfy the five ax-
ioms (N 1) through (N 5), given below, known as Peano’s postulates.∗
(N 1) 0 belongs to N.
(N 2) If n belongs to N, then its successor n′ also belongs to N.
(N 3) The only elements of N are those which can be obtained from finitely
many applications of rules (N 1) and (N 2) above.
(N 4) 0 is not the successor of any natural number.
(N 5) If n and m are any two natural numbers such that n′ = m′ , then n = m.

Note that the converse to (N 5), that is, if n = m then n′ = m′, is true because
each natural number has exactly one successor.
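To see (N 1) through (N 5) at work, here is one possible encoding of natural numbers as nested values in Python (a sketch of our own; the class and function names are hypothetical). Frozen dataclasses compare by structure, so two values are equal exactly when they are built by the same sequence of successor steps; this makes (N 4) and (N 5) hold automatically, and every value is built by finitely many steps, as (N 3) requires.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

# Sketch: Peano-style natural numbers as nested values (our own encoding).


@dataclass(frozen=True)
class Zero:
    """(N 1): 0 is a natural number."""


@dataclass(frozen=True)
class Succ:
    """(N 2): the successor n' of a natural number n."""
    pred: Nat


Nat = Union[Zero, Succ]


def from_int(k: int) -> Nat:
    """Apply the successor k times to Zero."""
    n: Nat = Zero()
    for _ in range(k):
        n = Succ(n)
    return n


def to_int(n: Nat) -> int:
    """Count how many successor steps lead from 0 to n."""
    count = 0
    while isinstance(n, Succ):
        n = n.pred
        count += 1
    return count


if __name__ == "__main__":
    three = from_int(3)
    print(to_int(three))                 # 3
    print(Succ(from_int(2)) == three)    # True: 2' = 3
    print(Zero() == Succ(Zero()))        # False: (N 4)
```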
∗
Giuseppe Peano, “Sul concetto di numero,” Rivista di Matematica 1 (1891), 87-102, 256-267.
0.5.2 Remarks. The similarities between N and DogLan are very obvious. In
fact, DogLan is N in disguise. Note the correspondence:
0 ↔ B
1 = 0′ ↔ DB
2 = 0′′ ↔ DDB
3 = 0′′′ ↔ DDDB
4 = 0′′′′ ↔ DDDDB
⋮        ⋮
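The correspondence can be written as a pair of mutually inverse functions. In the Python sketch below (function names ours), the successor n′ = n + 1 on one side matches prefixing D on the other, which is rule (DL2) in abbreviated form.

```python
# Sketch: the correspondence of Remark 0.5.2 between natural numbers and
# abbreviated DogLan sentences. Taking a successor matches prefixing "D".


def to_doglan(n: int) -> str:
    """n  ->  D...DB with n copies of D (0 -> B, 1 -> DB, ...)."""
    if n < 0:
        raise ValueError("natural numbers are not negative")
    return "D" * n + "B"


def from_doglan(sentence: str) -> int:
    """Inverse map: count the D's in a sentence of the form D...DB."""
    if not (sentence.endswith("B") and set(sentence[:-1]) <= {"D"}):
        raise ValueError(f"not an abbreviated DogLan sentence: {sentence!r}")
    return len(sentence) - 1


if __name__ == "__main__":
    print([to_doglan(n) for n in range(5)])   # ['B', 'DB', 'DDB', 'DDDB', 'DDDDB']
    print(from_doglan("DDDB"))                # 3
```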
From N 1–N 3 we obtain the induction principle, which is of tremendous impor-
tance in the development of mathematics.
0.5.3 Theorem. Induction Principle: Suppose that S(n) is a statement about
the natural number n such that:
(IP 1) S(0) is true.
(IP 2) For every natural number n, if S(n) is true then S(n′) is true.
Then, S(n) is true for all natural numbers n.
Proof. The natural numbers are generated thanks to (N 1)–(N 3). Suppose that
S(n) is a statement about the natural number n such that (IP 1) and (IP 2) hold.
Then, given any natural number m, how can we be sure that S(m) is true? If
m = 0, then S(m) is true by (IP 1). Otherwise, m is a successor, which by (N 3) is
obtained from 0 by applying the successor operation finitely many times. Starting
from (IP 1) and applying (IP 2) once for each of the successor operations required
to generate m, we conclude that S(m) must be true.
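For readers who know a proof assistant, the theorem can be stated and checked in Lean 4 (a sketch using only the core library; the theorem and hypothesis names are ours). Lean’s built-in `Nat` is an inductive type generated by `zero` and `succ`, much as (N 1) through (N 3) generate N, and the `induction` tactic is justified by that very fact, so the sketch restates the principle rather than deriving it from anything more primitive.

```lean
-- Theorem 0.5.3 in Lean 4 (core library only; names are ours).
theorem induction_principle (S : Nat → Prop)
    (ip1 : S 0)                           -- (IP 1)
    (ip2 : ∀ n : Nat, S n → S (n + 1)) :  -- (IP 2), with n' written n + 1
    ∀ n : Nat, S n := by
  intro n
  induction n with
  | zero => exact ip1
  | succ k ih => exact ip2 k ih
```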
0.5.4 Remark. Note that N 3 is required in the above proof because we must be
sure that all natural numbers other than 0 are successors and that the only natural
number that is not a successor is 0. In fact, N 3 and the induction principle are
equivalent. In many versions of Peano’s axioms the induction principle is used in
place of N 3.
0.5.5 Example. Let r be a real number with r ̸= 1. Use induction to prove the
correctness of the formula
∑
n
1 − rn+1
(3) rk = .
k=0
1−r
Answer. For n = 0 we have

∑
0
1−r
rk = 1 = ,
k=0
1−r
which is correct.
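Suppose that the formula is correct for n. Then,
$$
\begin{aligned}
\sum_{k=0}^{n+1} r^k &= \sum_{k=0}^{n} r^k + r^{n+1}
  = \frac{1 - r^{n+1}}{1 - r} + r^{n+1} \\
&= \frac{1 - r^{n+1} + r^{n+1}(1 - r)}{1 - r}
  = \frac{1 - r^{n+2}}{1 - r}.
\end{aligned}
$$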
The above is the formula (3) with n replaced by n′ = n + 1. In other words,
the assumption that (3) holds for n leads to the conclusion that it also holds for
n′ = n + 1. We have satisfied the hypotheses of the induction principle. Therefore,
the formula (3) holds for all n.
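Checking (3) numerically is no substitute for the induction proof, but it is a quick way to catch an algebra slip. The minimal sketch below is in Python (our choice; the notes use no programming language) and compares both sides of (3) with exact rational arithmetic for a few sample values of r. The function names are ours, chosen for illustration.

```python
from fractions import Fraction

def geometric_sum_direct(r, n):
    """Left side of (3): r**0 + r**1 + ... + r**n, summed term by term."""
    return sum(r**k for k in range(n + 1))

def geometric_sum_closed(r, n):
    """Right side of (3): (1 - r**(n+1)) / (1 - r); only defined for r != 1."""
    if r == 1:
        raise ValueError("formula (3) requires r != 1")
    return (1 - r**(n + 1)) / (1 - r)

def check_formula(values_of_r, max_n):
    """Compare both sides of (3) exactly for each r given and n = 0, ..., max_n."""
    return all(geometric_sum_direct(r, n) == geometric_sum_closed(r, n)
               for r in values_of_r for n in range(max_n + 1))

print(check_formula([Fraction(2), Fraction(-3), Fraction(1, 2), Fraction(-2, 5)], 20))
```

Using `Fraction` rather than floating point makes the comparison exact, so a single mismatch would point to a genuine error rather than to rounding.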
0.6 Formal Grammars

In this section we show how mathematical ideas can be used in linguistics to study
the structure of language. The mathematical-linguistic device to be described is
due to Chomsky† and is presented here in specialized form for simplicity.
0.6.1 Definition. Grammars: A grammar G = (N, T, P) consists of three objects:
• a finite collection N of nonterminal symbols with a distinguished element σ
(the sentence symbol);
• a finite collection T of terminal symbols; and
• a finite collection P of rewriting rules or productions;
together with the requirement that N and T have no element in common.
0.6.2 Definition. Rewriting Rules: A rewriting rule (or production) is a string
of the form
α → β,
where α is a nonterminal symbol and β is a juxtaposition of nonterminal and
terminal symbols consisting of at least one symbol. We call α the left side and β
the right side of the production α → β.
†
Noam Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
0.6.3 Example. For DogLan the nonterminal symbols can be chosen as σ and δ;
the terminal symbols are B (for “Bo has fleas”) and D (for “the dog who bit”);
the rewriting rules are
(R1) σ → B,
(R2) σ → δσ,
(R3) δ → D.
0.6.4 Definition. Languages: The language of G (the language defined by G) is
precisely the collection of all strings ρ of terminal symbols that can be obtained by
the application of finitely many productions starting with σ:
σ → · · · → ρ.
0.6.5 Remark. The idea behind the use of the rewriting rules is to start with one
of the rewriting rules whose left side is σ. Then we pick a nonterminal symbol in
the resulting string and replace it by the right side of a rewriting rule whose left
side is that symbol. We keep rewriting in this way until we obtain a string
consisting exclusively of terminal symbols. Such a sequence of rewritings
σ → ··· → ρ
is called a derivation‡ in G.
For example, the following is a derivation in DogLan.
σ → δσ by (R2)
→ δδσ by (R2)
→ δδδσ by (R2)
→ δδδB by (R1)
→ δδDB by (R3)
→ δDDB by (R3)
→ DDDB by (R3)
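The derivation above can be replayed mechanically. The Python sketch below encodes rules (R1)–(R3) as string rewrites and, as an assumption of ours, always rewrites the rightmost occurrence of the rule's left side; that choice happens to reproduce the steps shown above, but in general a derivation may rewrite any nonterminal symbol. The names `RULES`, `apply_rightmost` and `derive` are hypothetical.

```python
# Productions of DogLan from Example 0.6.3, keyed by the labels used above.
RULES = {"R1": ("σ", "B"), "R2": ("σ", "δσ"), "R3": ("δ", "D")}

def apply_rightmost(string, label):
    """Replace the rightmost occurrence of the rule's left side by its right side."""
    left, right = RULES[label]
    i = string.rfind(left)
    if i < 0:
        raise ValueError(f"rule {label} does not apply to {string!r}")
    return string[:i] + right + string[i + len(left):]

def derive(labels, start="σ"):
    """Apply the named rules in order and return every intermediate string."""
    steps = [start]
    for label in labels:
        steps.append(apply_rightmost(steps[-1], label))
    return steps

steps = derive(["R2", "R2", "R2", "R1", "R3", "R3", "R3"])
print(" → ".join(steps))
# σ → δσ → δδσ → δδδσ → δδδB → δδDB → δDDB → DDDB
```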
0.6.6 Remark. An expression is any string of terminal symbols. A sentence is an
expression that can be derived from the sentence symbol σ. If we derive a string of
terminal symbols from σ, we know that the string belongs to the language. (The
derived string is a sentence.) Can we also show that a given expression cannot be
derived? The answer is affirmative for the type of grammars we are using here.
The basic idea is that it is possible to derive all sentences of a given length in
finitely many steps. Therefore, given an expression, we generate all sentences that
have the same length as the given expression. If we do not find the given string
listed among the sentences generated, the expression is not a sentence.
‡
Note that a derivation is yet another example of an inductive structure.
0.6.7 Example. Prove that DBD is not a sentence of DogLan.

Proof. Let |α| denote the length of any string α. We note that every production
α → β of DogLan is such that either
• |α| < |β|, in which case the string to which the production is applied becomes
longer, or
• |α| = |β| = 1, in which case β is a terminal symbol, which no production can
rewrite any further.
In other words, a production that leaves the length of the string unchanged turns
a nonterminal symbol into a terminal one, so only finitely many such productions
can be applied before the string must grow longer or the derivation must end.
This means that we can list all derivations σ → · · · → ρ, where |ρ| = 3. If DBD
is not listed, then DBD is not in the language.
There is only one derivation that ends with a string of length 1:
σ → B.
There are two derivations that end with a string of length 2:
σ → δσ → Dσ → DB,
σ → δσ → δB → DB.
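There are 8 derivations that end with a string of length 3:
σ → δσ → δδσ → δδB → δDB → DDB,
σ → δσ → δδσ → δδB → DδB → DDB,
σ → δσ → δδσ → δDσ → δDB → DDB,
σ → δσ → δδσ → δDσ → DDσ → DDB,
σ → δσ → δδσ → Dδσ → DδB → DDB,
σ → δσ → δδσ → Dδσ → DDσ → DDB,
σ → δσ → Dσ → Dδσ → DδB → DDB,
σ → δσ → Dσ → Dδσ → DDσ → DDB.
All of them end with the string DDB.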
The string DBD is not listed. Therefore, it is not in the language.
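The decision procedure of Remark 0.6.6 can be sketched in Python (our choice of language). Because no production of DogLan shortens a string, any intermediate string longer than the target length can be discarded, so the search below always terminates. It lists every derivation from σ ending in a terminal string of a given length, rewriting any nonterminal at each step; `derivations`, `derivations_of_length` and `sentences_of_length` are hypothetical helper names.

```python
# DogLan productions (R1)–(R3) of Example 0.6.3.
PRODUCTIONS = [("σ", "B"), ("σ", "δσ"), ("δ", "D")]
NONTERMINALS = {"σ", "δ"}

def derivations(max_len):
    """Yield every derivation (a tuple of strings) that starts at σ and ends in a
    string of terminal symbols of length at most max_len."""
    def extend(path):
        current = path[-1]
        if not any(symbol in NONTERMINALS for symbol in current):
            yield path                      # only terminal symbols left: done
            return
        for i, symbol in enumerate(current):
            for left, right in PRODUCTIONS:
                if symbol == left:
                    nxt = current[:i] + right + current[i + 1:]
                    # No production shortens a string, so longer ones can be dropped.
                    if len(nxt) <= max_len:
                        yield from extend(path + (nxt,))
    yield from extend(("σ",))

def derivations_of_length(n):
    """All derivations whose final terminal string has exactly n symbols."""
    return [d for d in derivations(n) if len(d[-1]) == n]

def sentences_of_length(n):
    """The sentences of DogLan with exactly n symbols."""
    return {d[-1] for d in derivations_of_length(n)}

for n in (1, 2, 3):
    print(n, len(derivations_of_length(n)), sorted(sentences_of_length(n)))
print("DBD" in sentences_of_length(3))
```

Running it should print one derivation of length 1, two of length 2 and eight of length 3, with DDB as the only sentence of length 3, and `False` on the last line.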
0.6.8 Remark. In Note 1 we will discuss a certain language L (the language of
the propositional calculus). It turns out that L can be obtained from a grammar
G in which each of the collections N, T, P is infinite. This small variation of
Definition 0.6.1 should, however, not prevent the reader from applying the present
discussion to the language L. Indeed, it is an excellent exercise to describe a
grammar G = (N, T, P) from which L can be obtained. The reader can carry out
this exercise after reading Section 1.2 in Note 1. We will, however, not pursue this
particular construction of the language L in these notes.
0.7 Warning About the Notation Used

In this note we used mathematical symbols such as
=
→
↔
···
without having given these symbols a precise mathematical meaning. The reader
is presumed to possess an intuitive grasp of the intended purpose and meaning of
these symbols (from examining the context in which they are used).
It is important to understand that such sloppiness occurs only in Note 0. In
fact, this note is meant to serve two simple purposes. The first is to introduce
the problem we face in just getting started. The second is to make the reader
aware that our language should be quite precise.
The word “proof” has been thrown about several times. Can the meaning of
the notion of proof be explained? At this level, a proof is a sequence of statements
that, starting with some stated assumptions, leads to a conclusion through the
use of common-sense reasoning. In mathematics, a formal proof is defined as a
syntactical (grammatical) notion that is independent of semantics (meaning). In
these notes we shall not use formal proofs. Rather, we shall rely on the semantical
notion of logical consequence, also called tautological implication.
How should you study this note? There are no exercises and you will not be
tested on its contents. Think of this note as an eye opener, if you have never seen
this material before.
Foundations Note 1
The Propositional Calculus

July 28, 2020
1.1 Introduction
In Notes 1 and 2 we develop the basic rules of everyday mathematical reasoning. In
this note, we start with the so-called propositional calculus (also called sentential
calculus). Italicized terms in this informal introduction will later be defined
formally; however, their daily usage gives a rough first approximation to the precise
meaning that they have in mathematics.
A proposition is a statement (the assertion of a declarative sentence within an
interpretation) that has either the truth value truth, denoted by T , or falsehood,
denoted by F. In other words, the interpretation of a sentence is a way of assigning
to that sentence the value T or F. For example, before we can turn the sentence
“Frank has been in Paris” into a statement, we must know who Frank is and which
Paris we are talking about. Statements, not sentences, are true or false. A sentence
cannot be turned into a statement without an interpretation.
Compound sentences are made up of smaller sentences joined together by terms
such as “or,” “and,” “just in case” and locutions such as “if … then ….” For example,
the compound sentence
Frank has been in Paris or Mary has been in Rome
can be decomposed as
$$\underbrace{\text{Frank has been in Paris}}_{p}\ \text{or}\ \underbrace{\text{Mary has been in Rome}}_{q}, \qquad (1.1)$$
where or is an example of what we call a sentential connective (also called a
Boolean, or logical, operator). If we let p stand for “Frank has been in Paris”
and let q stand for “Mary has been in Rome,” p and q are examples of propo-
sitional (or Boolean) variables. Thus, (1.1) can be represented symbolically by
(1.2) p ∨ q,
where the symbol ∨ stands for the sentential connective or.
Given an interpretation (Frank and Mary who? Paris, Texas and Rome, New
York?) the assertion of the above declarative sentence makes a statement that can
be true or false, depending on how we interpret the connective or. Below, we define
several connectives. We use “formula” as another name for “sentence.”
1.2 The Language of the Propositional Calculus

In elementary algebra we have a sense that while
(x + y)/z = r · x − w
is a well-formed formula, the expression
x + y) + /z = r · x−))w
is not. The reason is that there are rules we must follow to write well-formed
formulas. Below, we define the language L. This is the language we use here for
the propositional calculus. The language is defined by giving precise rules for the
formation of syntactically correct formulas.
1.2.1 Definition. The Language L: We start with the basic building blocks of
the language. The formal symbols of L are the symbols listed in quotation marks:
• the connectives ‘¬’, ‘∧’, ‘∨’, ‘→’, ‘↔’;
• left parenthesis ‘(’;
• right parenthesis ‘)’;
• an infinite collection of symbols ‘p’, ‘q’, ‘r’, …, ‘p1’, ‘p2’, ‘p3’, …, called
variables.
We ignore spaces between formal symbols and we use the operation of concate-
nation by which we take a string (sequence, juxtaposition) of formal symbols and
juxtapose it with another string of formal symbols.
An expression of L is any sequence of formal symbols obtainable from the formal
symbols with finitely many applications of the concatenation operation. Thus,
(p ∧ q)))((∧∧
and
((p) ∧ q) ∨ (r ∧ s)
are examples of expressions of L. Note: To improve readability, we may use paren-
theses of sizes and shapes that differ from the official ‘(’ and ‘)’, since this practice
introduces changes that are purely cosmetic in nature.
Not all expressions of L make sense. Those that do, called sentences (or formu-
las), can only be assembled following the rules shown below. These are the rules
that separate what is grammatically correct from what is not.
Accordingly, a sentence, or formula, of L is not just any expression of L, but
one that is defined by the following rules.
(SL1) Any variable is a formula.
(SL2) If φ1 and φ2 are formulas, then
(φ1),
¬(φ1),
(φ1) ∧ (φ2),
(φ1) ∨ (φ2),
(φ1) → (φ2),
(φ1) ↔ (φ2)
are formulas also.
(SL3) Formulas can only be constructed by finitely many applications of rules
(SL1) and (SL2).
1.2.2 Remark. Observe that a formula is nothing but a certain kind of expression,
hence it is just a sequence of formal symbols. The only distinction between an
arbitrary expression and a valid formula is the collection of syntactic rules (SL1)–
(SL3) that an expression must satisfy in order to be a formula. The point of this
remark is that no meaning whatsoever has been assigned to formulas. Meaning
can only be assigned in the context of an interpretation as explained in Section 1.4
below.
1.2.3 Example. We show that
(1.3) ((¬(p)) ∨ (q)) ∧ ((p) ∨ (r))
is a formula of L.
Thanks to (SL1) the following are formulas of L:
(1.4) p, q, r.
Thanks to (SL2) and (1.4), the following are formulas of L:
(1.5) ¬(p), (p) ∨ (r).
Thanks to (SL2), (1.4) and (1.5), the following is a formula of L:
(1.6) (¬(p)) ∨ (q).
Finally, using (SL2), (1.5) and (1.6) we have that
((¬(p)) ∨ (q)) ∧ ((p) ∨ (r))
is a formula∗ of L.
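Rules (SL1)–(SL3) are precise enough to be checked by a program. The Python sketch below (our choice of language) decides whether an expression is a formula in the strict, fully parenthesized syntax, before any reading conventions are adopted. Two assumptions are ours: a variable is taken to be a lowercase ASCII letter optionally followed by digits, and spaces are ignored as in Definition 1.2.1. Since a string beginning with ‘(’ may be either (φ1) or (φ1) ∘ (φ2), the helper `_formula` returns every position at which a formula starting at position i could end. All function names are hypothetical.

```python
CONNECTIVES = "∧∨→↔"

def _parenthesized(s, i):
    """All j such that s[i:j] has the form (φ) for some formula φ."""
    if i < len(s) and s[i] == "(":
        return {j + 1 for j in _formula(s, i + 1) if j < len(s) and s[j] == ")"}
    return set()

def _formula(s, i):
    """All j such that s[i:j] is a formula of L according to (SL1) and (SL2)."""
    ends = set()
    if i >= len(s):
        return ends
    if "a" <= s[i] <= "z":                      # (SL1): a variable such as p or p12
        j = i + 1
        while j < len(s) and s[j].isdigit():
            j += 1
        ends.add(j)
    elif s[i] == "¬":                           # (SL2): ¬(φ1)
        ends |= _parenthesized(s, i + 1)
    elif s[i] == "(":
        for j in _parenthesized(s, i):          # (SL2): (φ1)
            ends.add(j)
            if j < len(s) and s[j] in CONNECTIVES:
                ends |= _parenthesized(s, j + 1)   # (SL2): (φ1) ∘ (φ2)
    return ends

def is_formula(expression):
    """Decide whether an expression of L is a formula; spaces are ignored."""
    s = "".join(expression.split())
    return len(s) in _formula(s, 0)

print(is_formula("((¬(p)) ∨ (q)) ∧ ((p) ∨ (r))"))   # formula (1.3)
print(is_formula("((p) ∧ q) ∨ (r ∧ s)"))            # an expression that is not a formula
```

Note that, under these strict rules, ¬p without parentheses is rejected; it becomes acceptable only through the reading conventions introduced in Section 1.4.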
1.2.4 Definition. The Names of the Connectives: The following are the
names of the connectives:
• the denial or negation of p, denoted by ¬p (read: “not p”);
• the conjunction of p and q, denoted by p ∧ q (read: “p and q”);
• the disjunction of p and q, denoted by p ∨ q (read: “p or q”);
• the conditional (or implication), denoted by p → q (read: “p implies q”);
• the biconditional (or equivalence, or bi-implication) between p and q, denoted
by p ↔ q (read: “p if and only if q”).
∗
We have ignored an important issue: that of unique readability, namely, that every
formula of L can be read in one, and only one, way. Everyday English lacks this
property. For example, “she fed her dog biscuits” can be read in at least two
different ways. This is a syntactically ambiguous sentence because we have two
choices: we can take her dog as the indirect object and biscuits as the direct
object, or we can take her as the indirect object and dog biscuits as the direct
object. In other words, we can’t tell whether we should read the given sentence as
“she fed biscuits to her dog,” or whether we should read it as “she fed dog biscuits
to her.”
1.3 Substitution
1.3.1 Definition. If p1 , . . . , pn are propositional variables, a formula in p1 , . . . , pn
is a formula where only the propositional variables p1 , . . . , pn appear. For example
((p) ∧ (q)) → ((r) ∨ (s)) is a formula in p, q, r and s.
1.3.2 Theorem. The Rule of Substitution: If in a given formula φ every
occurrence of a propositional variable p is replaced by the formula (ψ), where ψ is
any (fixed) formula, then the resulting expression is a valid formula. The resulting
formula is denoted φ(p ← (ψ)) (“φ with (ψ) substituted for p”).
Proof. If p does not appear in φ then the substitution does nothing to φ, which
remains a valid formula.
Otherwise, we conclude that φ(p ← (ψ)) is a valid formula by reasoning as fol-
lows. Let the formula φ involve the variable p together with others: p1 , p2 , . . . , etc.
(It makes no difference whether some of these variables appear in the formula ψ.)
Starting from the variables p, p1, p2, …, successive applications of the rules
(SL1) and (SL2) generate a sequence of intermediate formulas ending in φ. (Rule
(SL3) ensures this.) To find a sequence of intermediate formulas leading to the
final formula φ(p ← (ψ)), first obtain ψ and then (ψ), which is a formula by (SL2);
then apply the same intermediate steps that lead to φ, but whenever p is called
for, insert (ψ) instead. Each of the steps remains valid because every application
of (SL2) applies to arbitrary formulas φ1 (and φ2, as required), and it is
irrelevant whether φ1, say, is merely a variable (such as p) or some other formula
(such as (ψ)). Summing up, the formula φ(p ← (ψ)) is valid because we have
shown that it can be obtained by finitely many applications of the rules (SL1) and
(SL2).
1.3.3 Example. If the formula φ is (p) → (q) and the substitution is
q ← ((r) → (s)),
then φ(q ← ((r) → (s))) is the formula
(p) → ((r) → (s)).

1.3.4 Remark. The rule of substitution makes it unnecessary to make a
typographic distinction between variables (which are denoted by Latin letters: p, q, r, . . . )
and formulas (which we have been denoting by Greek letters: φ, ψ, . . . ). In what
follows we will be sloppy about this distinction and will generally prefer to use
Latin letters to refer to both variables and formulas, with the understanding that
the rule of substitution allows for variables to be replaced by formulas.
1.4 Interpretations
Formulas must be interpreted before they become (true or false) statements. To do
this, each variable in the formula must be given an interpretation before the entire
formula can be given an interpretation.
1.4.1 Definitions. Interpretation of Formulas: To interpret formulas we define
how sentential connectives are interpreted.
• A truth value is either TRUE (truth) or FALSE (falsehood). We will often
abbreviate TRUE to T and FALSE to F.†
• An interpretation of a variable attaches a truth value to it. (For example, a
variable such as p must be interpreted as either T or F.)
• If φ is a formula from the language L, then an interpretation for the variables
of φ assigns one of the truth values T or F to each variable appearing in φ.
• Given an interpretation for the variables of a formula φ, there follows an
interpretation of the formula φ as T or F. The precise manner of going from
the interpretation for the variables of φ to the interpretation of φ is explained
below.
Assume that every variable appearing in a formula has an interpretation. We
show how to obtain the interpretation of the entire formula by following the
applications of the rules (SL2) that build it, starting from the interpretation of
previously constructed
formulas. Let φ denote a formula having a particular interpretation (either “T ” or
“F ”). Then the interpretation of ¬(φ) is defined by the table below.
φ ¬(φ)
F T
T F
†
Considerations of the absolute meaning of the concepts of truth and falsehood are left to
religion and philosophy. As far as we are concerned, T and F are simply two distinct symbols.
Let φ and ψ denote formulas having one of the four possible simultaneous inter-
pretations: F and F , F and T , T and F , or T and T . Then the formulas (φ) ∧ (ψ),
(φ) ∨ (ψ), (φ) → (ψ) and (φ) ↔ (ψ) are interpreted according to the table below.
φ ψ (φ) ∧ (ψ) (φ) ∨ (ψ) (φ) → (ψ) (φ) ↔ (ψ)
F F F F T T
F T F T T F
T F F T F F
T T T T T T
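The table above can be regenerated from truth functions. In the Python sketch below (our choice of language), Python's `False` and `True` stand for F and T; the names `BINARY`, `letter` and `connective_table` are ours. Listing the interpretations with `itertools.product([False, True], repeat=2)` yields the rows in the same order as the table: F F, F T, T F, T T.

```python
from itertools import product

# Truth functions of the binary connectives, one per column of the table above.
BINARY = {
    "∧": lambda a, b: a and b,
    "∨": lambda a, b: a or b,
    "→": lambda a, b: (not a) or b,
    "↔": lambda a, b: a == b,
}

def letter(value):
    """Print a truth value as T or F."""
    return "T" if value else "F"

def connective_table():
    """Rows (φ, ψ, φ∧ψ, φ∨ψ, φ→ψ, φ↔ψ) in the order F F, F T, T F, T T."""
    return [
        [letter(phi), letter(psi)] + [letter(f(phi, psi)) for f in BINARY.values()]
        for phi, psi in product([False, True], repeat=2)
    ]

for row in connective_table():
    print(" ".join(row))
```

Writing → as `(not a) or b` builds in the convention that a conditional with a false antecedent is true.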
1.4.2 Definitions. Idiomatic Uses of the Conditional and Biconditional:

Other idiomatic uses of the conditional p → q include:
q is a necessary condition for p,
p is a sufficient condition for q,
whenever p then q,
when p then q,
q whenever p,
q in case p,
if p then q,
p only if q,
q when p,
p only when q,
q if p.
Other idiomatic usages of the biconditional p ↔ q include:
p is a necessary and sufficient condition for q,
q is necessary and sufficient for p,
p when and only when q,
p is equivalent to q,
p just in case q,
p just when q,
p just if q,
p iff q.
Note: The word iff is frequently used as an abbreviation for the phrase if and only
if.
1.4.3 Example. Note that p → q is true when p is false, independently of whether
q is true or false. Thus, the assertion “if 1+1 = 3 then the Moon is made of cheese”
is true. Anything follows from a false statement. There is nothing wrong here. This
is the way things are supposed to work.
1.4.4 Remarks. In the language of elementary algebra, −a + b · c is read neither
−(a+(b·c)) nor (−(a+b))·c, but (−a)+(b·c). When we use the fully parenthesized
expression (−a) + (b · c) we need not remember the order in which the operations
are to be carried out. To avoid using a lot of parentheses, we need similar rules for
the connectives ¬, ∧, ∨, → and ↔.
1.4.5 Definition. Reading Conventions: The connectives ¬, ∧, ∨, → and ↔
are evaluated in the order just listed: first ¬, then ∧, then ∨, then →, then ↔. For
example, ¬p ∧ ¬q ∨ r is read [(¬(p)) ∧ (¬(q))] ∨ (r). When one of the connectives
follows itself, we evaluate it from right to left. For example,
¬p ∧ ¬¬q ∧ ¬¬r
is read
(¬(p)) ∧ [(¬(¬(q))) ∧ (¬(¬(r)))].
Next in the evaluation order are the connectives → and ↔. That is,
¬p ∧ q ∨ r → s ∧ t
is read
[((¬(p)) ∧ (q)) ∨ (r)] → [(s) ∧ (t)].
The formula p → q → r is read p → (q → r). The formula p → q ↔ r is read
(p → q) ↔ r. The formula p ↔ q ↔ r is read p ↔ (q ↔ r).
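The reading conventions amount to a precedence table plus a right-to-left grouping rule, which a small recursive-descent parser can apply. The Python sketch below (our choice of language; every name in it is hypothetical) restores the omitted parentheses and prints the strict form of L. It uses round parentheses only, since the square brackets above are cosmetic stand-ins for them. As an assumption of ours, a variable is a lowercase ASCII letter optionally followed by digits.

```python
import re

# Binary connectives from loosest to tightest binding (Definition 1.4.5);
# ¬ binds tighter than all of them.
LEVELS = ["↔", "→", "∨", "∧"]

def tokenize(text):
    compact = "".join(text.split())
    tokens = re.findall(r"[a-z]\d*|[¬∧∨→↔()]", compact)
    if "".join(tokens) != compact:
        raise ValueError(f"unrecognized symbol in {text!r}")
    return tokens

def parse_level(tokens, level):
    """Parse a formula whose main connective is LEVELS[level] or binds tighter.
    Returns (fully parenthesized formula, remaining tokens)."""
    if level == len(LEVELS):
        return parse_unary(tokens)
    left, rest = parse_level(tokens, level + 1)
    if rest and rest[0] == LEVELS[level]:
        # Recursing at the same level groups repeated connectives from the right.
        right, rest = parse_level(rest[1:], level)
        return f"({left}) {LEVELS[level]} ({right})", rest
    return left, rest

def parse_unary(tokens):
    if not tokens:
        raise ValueError("formula ends too early")
    head, rest = tokens[0], tokens[1:]
    if head == "¬":
        inner, rest = parse_unary(rest)
        return f"¬({inner})", rest
    if head == "(":
        inner, rest = parse_level(rest, 0)
        if not rest or rest[0] != ")":
            raise ValueError("missing right parenthesis")
        return inner, rest[1:]
    if head[0].isalpha():
        return head, rest                      # a variable
    raise ValueError(f"unexpected symbol {head!r}")

def read(text):
    """Rewrite a formula written with the reading conventions in the strict syntax of L."""
    formula, rest = parse_level(tokenize(text), 0)
    if rest:
        raise ValueError(f"unexpected symbols {rest}")
    return formula

print(read("¬p ∧ q ∨ r → s ∧ t"))   # (((¬(p)) ∧ (q)) ∨ (r)) → ((s) ∧ (t))
print(read("p → q → r"))            # (p) → ((q) → (r))
```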
1.5 Tautologies and Contradictions

Consider the formulas p ∨ ¬p and p ∧ ¬p. There are two possible ways to interpret
each, as shown in the table below.
p ¬p p ∨ ¬p p ∧ ¬p
F T T F
T F T F
We note that every interpretation of the formula p ∨ ¬p is true, while every inter-
pretation of the formula p ∧ ¬p is false. Evidently, there are formulas to which we
can attach truth values independently of interpretation. These are precisely the
formulas we shall call tautologies and contradictions, as defined below.
1.5.1 Definition. Tautologies and Contradictions: A tautology is a formula
φ whose every interpretation is true. A contradiction is a formula φ whose
every interpretation is false. Note that φ is a tautology if and only if ¬(φ) is a
contradiction.
We write |= φ to indicate that φ is a tautology, and write ⊭ φ to indicate that
φ is not a tautology.
Note that ⊭ φ is not the same as |= ¬(φ). The latter says that ¬(φ) is a
tautology, i.e.: that φ is a contradiction. For instance, a single variable p is
neither a tautology nor a contradiction, so ⊭ p holds while |= ¬(p) does not.
If |= (φ) → (ψ) we say that φ tautologically implies ψ and write φ ⇒ ψ. We
write φ ⇏ ψ for ⊭ (φ) → (ψ).
If |= (φ) ↔ (ψ) we say that φ and ψ are tautologically equivalent and write
φ ⇔ ψ. We write φ ⇎ ψ for ⊭ (φ) ↔ (ψ). (This says that φ and ψ are not
tautologically equivalent.)
By abuse of language, we may say that φ implies ψ, instead of saying that
φ tautologically implies ψ; and we may say that φ is equivalent to ψ, instead of
saying that φ is tautologically equivalent to ψ.
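Since a formula in n variables has only 2ⁿ interpretations, being a tautology can be decided by checking them all. The Python sketch below (our choice of language) represents a formula by its truth function, a Python function of n Boolean arguments, which is a simplification of ours rather than the notes' string-based language L. The helper names are hypothetical.

```python
from itertools import product

def interpretations(n):
    """All 2**n interpretations of n variables, listed with F before T."""
    return product([False, True], repeat=n)

def is_tautology(formula, n):
    """|= formula: the formula is true under every interpretation."""
    return all(formula(*values) for values in interpretations(n))

def is_contradiction(formula, n):
    """The formula is false under every interpretation."""
    return not any(formula(*values) for values in interpretations(n))

def implies(a, b):
    """Truth function of the conditional →."""
    return (not a) or b

def tautologically_implies(phi, psi, n):
    """φ ⇒ ψ, i.e. |= (φ) → (ψ)."""
    return is_tautology(lambda *v: implies(phi(*v), psi(*v)), n)

print(is_tautology(lambda p: p or not p, 1))        # p ∨ ¬p: True
print(is_contradiction(lambda p: p and not p, 1))   # p ∧ ¬p: True
# Both ⊭ p and ⊭ ¬(p): failing to be a tautology is not the same as being a contradiction.
print(is_tautology(lambda p: p, 1), is_tautology(lambda p: not p, 1))   # False False
print(tautologically_implies(lambda p, q: p and q, lambda p, q: p, 2))  # True
```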
1.5.2 Theorem. The Rule of Substitution for Tautologies: If in any
tautology we replace every occurrence of the same propositional variable with some
formula enclosed in parentheses, the resulting formula is a tautology.
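Proof. Let φ be any tautology in p1, …, pn, and let φ′ be the formula obtained by
replacing every occurrence of one of these variables, say pi, with a formula (ψ).
Take any interpretation of the variables of φ′. It gives (ψ) some truth value, and
φ′ then receives the same truth value that φ receives when pi is assigned the value
of (ψ) and every other variable of φ keeps its assigned value. Since φ is true
under every interpretation, φ′ is true as well; that is, φ′ is a tautology.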
1.5.3 Example. Of course, we can apply the above theorem any number of times,
since after every substitution we still have a tautology. For example, the formula
s → (t → s)
is a tautology. If we make the substitutions
(p ∧ q → r) for s
(q → r) for t
we obtain the formula
(p ∧ q → r) → ((q → r) → (p ∧ q → r)),
which is a tautology also.
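The substitution in this example can be checked by brute force. In the Python sketch below (our choice of language; the function names are hypothetical), the substituted formula is evaluated by feeding the truth values of p ∧ q → r and q → r into the original formula s → (t → s). That mirrors the argument in the proof of Theorem 1.5.2: the new formula is true wherever the old one is.

```python
from itertools import product

def implies(a, b):
    """Truth function of the conditional →."""
    return (not a) or b

def is_tautology(formula, n):
    return all(formula(*values) for values in product([False, True], repeat=n))

def original(s, t):
    """s → (t → s)"""
    return implies(s, implies(t, s))

def substituted(p, q, r):
    """The same formula with (p ∧ q → r) for s and (q → r) for t."""
    return original(implies(p and q, r), implies(q, r))

print(is_tautology(original, 2), is_tautology(substituted, 3))   # True True
```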

We show that
⊭ [(p → q) → r] ↔ [(p → q) ∧ (q → r)].
Consider the table below. It suffices to find a single row where
(p → q) → r and (p → q) ∧ (q → r)
have different truth values.

p q r p→q (p → q) → r q→r (p → q) ∧ (q → r)
F F F T F T T
F F T T T T T
F T F T F F F
F T T T T T T
T F F F T T F
T F T F T T F
T T F T F F F
T T T T T T T
In the first row, (p → q) → r and (p → q)∧(q → r) have different truth values under
the same interpretation. Since a tautology is a formula whose every interpretation
is true, the formula
[(p → q) → r] ↔ [(p → q) ∧ (q → r)]
is not a tautology because we have found an interpretation that is false.
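A program can search for such rows directly. The Python sketch below (our choice of language; `failing_rows` is a hypothetical name) lists every interpretation, in the order of the table above, under which the two sides disagree, that is, every interpretation making the biconditional false.

```python
from itertools import product

def implies(a, b):
    """Truth function of the conditional →."""
    return (not a) or b

def failing_rows():
    """Interpretations (p, q, r) under which the biconditional is false."""
    rows = []
    for p, q, r in product([False, True], repeat=3):
        left = implies(implies(p, q), r)             # (p → q) → r
        right = implies(p, q) and implies(q, r)      # (p → q) ∧ (q → r)
        if left != right:
            rows.append((p, q, r))
    return rows

print(failing_rows())
# [(False, False, False), (True, False, False), (True, False, True)]
```

The first row found is F F F, the one singled out in the text; any one of the three suffices.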

1.5.5 Remark. To show that [(p → q) → r] ↔ [(p → q) ∧ (q → r)] is not a
tautology we need only show the first row of the above table.
We show that
|= (m → f) ∧ (f → ℓ) → (m → ℓ).
This follows from the table below.

m f ℓ m→f f →ℓ m→ℓ (m → f ) ∧ (f → ℓ) → (m → ℓ)
F F F T T T T
F F T T T T T
F T F T F T T
F T T T T T T
T F F F T F T
T F T F T T T
T T F T F F T
T T T T T T T
1.5.7 Remarks. An assertion that is a tautology carries no information. For
example, let m stand for “Mary is happy,” f for “Frank is happy,” and ℓ for
“Liz is happy.” The formula
(m → f ) ∧ (f → ℓ) → (m → ℓ)
is a tautology. It says:
If Frank is happy when Mary is happy, and Liz is happy when Frank is
happy, then Liz is happy when Mary is happy,
which gives no information because it is true in any interpretation.
In spite of this apparent uselessness, it is precisely this property of tautologies
that makes them the basis of the arguments used in mathematical proofs.
1.6 Examples
1.6.1 Definition. Exclusive Or: The interpretation of “p or q, but not both” or
“either p or q” is given by the following table.
p q either p or q ¬(p ↔ q) ¬p ∧ q p ∧ ¬q (¬p ∧ q) ∨ (p ∧ ¬q)
F F F F F F F
F T T T T F T
T F T T F T T
T T F F F F F
The table also shows that “either p or q” (“p exclusive or q”) is tautologically
equivalent to
¬(p ↔ q)
and is also tautologically equivalent to
(p ∧ ¬q) ∨ (¬p ∧ q).
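For truth values, “either p or q” is simply inequality, which gives a quick way to confirm both equivalences row by row. The Python sketch below is ours (language and names alike) and is meant only as an illustration of the table.

```python
from itertools import product

def exclusive_or_table():
    """Rows (p, q, either p or q) in the order of the table above."""
    return [(p, q, p != q) for p, q in product([False, True], repeat=2)]

def equivalences_hold():
    """Check 'either p or q' against ¬(p ↔ q) and (p ∧ ¬q) ∨ (¬p ∧ q) row by row."""
    return all(
        either == (not (p == q)) and either == ((p and not q) or (not p and q))
        for p, q, either in exclusive_or_table()
    )

print(equivalences_hold())   # True
```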
1.6.2 Theorem. Conditionals and Biconditionals: The formula
(p ↔ q) ↔ (p → q) ∧ (q → p)
is a tautology.
Proof. The assertion follows from the table shown below because the corresponding
truth values of the last two columns are identical.
p q p→q q→p (p → q) ∧ (q → p) p↔q
F F T T T T
F T T F F F
T F F T F F
T T T T T T
1.6.3 Theorem. Commutativity: The connectives ∧, ∨ and ↔ commute, i.e.:
(1.7a) |= (p ∧ q) ↔ (q ∧ p),
(1.7b) |= (p ∨ q) ↔ (q ∨ p),
(1.7c) |= (p ↔ q) ↔ (q ↔ p).
Proof. The above tautologies follow from the tables below.
p q p∧q q∧p
F F F F
F T F F
T F F F
T T T T

p q p∨q q∨p
F F F F
F T T T
T F T T
T T T T

p q p↔q q↔p
F F T T
F T F F
T F F F
T T T T
1.6.4 Remark. Note that → does not commute:
⊭ (p → q) ↔ (q → p),
as can be seen from the table below.

p q p→q q→p
F T T F
1.6.5 Theorem. Associativity: The connectives ∧, ∨ and ↔ are associative,
i.e.:
(1.8a) |= [(p ∧ q) ∧ r] ↔ [p ∧ (q ∧ r)],
(1.8b) |= [(p ∨ q) ∨ r] ↔ [p ∨ (q ∨ r)],
(1.8c) |= [(p ↔ q) ↔ r] ↔ [p ↔ (q ↔ r)].

Proof. The above tautologies follow from the tables below.
p q r p∧q q∧r (p ∧ q) ∧ r p ∧ (q ∧ r)
F F F F F F F
F F T F F F F
F T F F F F F
F T T F T F F
T F F F F F F
T F T F F F F
T T F T F F F
T T T T T T T
p q r p∨q q∨r (p ∨ q) ∨ r p ∨ (q ∨ r)
F F F F F F F
F F T F T T T
F T F T T T T
F T T T T T T
T F F T F T T
T F T T T T T
T T F T T T T
T T T T T T T
p q r p↔q q↔r (p ↔ q) ↔ r p ↔ (q ↔ r)
F F F T T F F
F F T T F T T
F T F F F T T
F T T F T F F
T F F F T T T
T F T F F F F
T T F T F F F
T T T T T T T
1.6.6 Remark. The connective → is not associative, i.e.:
⊭ [(p → q) → r] ↔ [p → (q → r)],
as can be seen from the table below.

p q r p→q q→r (p → q) → r p → (q → r)
F F F T T F T
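Theorems 1.6.3 and 1.6.5 and Remarks 1.6.4 and 1.6.6 can all be confirmed by checking each binary connective's truth function on every interpretation. The Python sketch below (our choice of language; the names are hypothetical) prints, for each connective, whether it is commutative and whether it is associative.

```python
from itertools import product

CONNECTIVES = {
    "∧": lambda a, b: a and b,
    "∨": lambda a, b: a or b,
    "↔": lambda a, b: a == b,
    "→": lambda a, b: (not a) or b,
}

def is_commutative(op):
    """op(p, q) and op(q, p) agree under every interpretation."""
    return all(op(p, q) == op(q, p) for p, q in product([False, True], repeat=2))

def is_associative(op):
    """op(op(p, q), r) and op(p, op(q, r)) agree under every interpretation."""
    return all(op(op(p, q), r) == op(p, op(q, r))
               for p, q, r in product([False, True], repeat=3))

for name, op in CONNECTIVES.items():
    print(name, is_commutative(op), is_associative(op))
# ∧ True True
# ∨ True True
# ↔ True True
# → False False
```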
1.6.7 Theorem. Distributivity: The connective ∧ distributes over ∨ and vice
versa, i.e.:
(1.9a) |= [p ∧ (q ∨ r)] ↔ [(p ∧ q) ∨ (p ∧ r)],
(1.9b) |= [p ∨ (q ∧ r)] ↔ [(p ∨ q) ∧ (p ∨ r)].

Proof. The above tautologies follow from the tables below.
p q r q∨r p ∧ (q ∨ r) p∧q p∧r (p ∧ q) ∨ (p ∧ r)
F F F F F F F F
F F T T F F F F
F T F T F F F F
F T T T F F F F
T F F F F F F F
T F T T T F T T
T T F T T T F T
T T T T T T T T
p q r q∧r p ∨ (q ∧ r) p∨q p∨r (p ∨ q) ∧ (p ∨ r)
F F F F F F F F
F F T F F F T F
F T F F F T F F
F T T T T T T T
T F F F T T T T
T F T F T T T T
T T F F T T T T
T T T T T T T T
1.6.8 Theorem. Equivalence for Implication and Disjunction: We have

(1.10) |= [p → q] ↔ [¬p ∨ q].
Proof. The assertion follows from the table shown below (more specifically, from
the fact that the truth values in the last two columns are always equal).
p q ¬p p→q ¬p ∨ q
F F T T T
F T T T T
T F F F F
T T F T T
1.6.9 Theorem. De Morgan’s Laws: We have
(1.11a) |= [¬(p ∧ q)] ↔ [¬p ∨ ¬q],
(1.11b) |= [¬(p ∨ q)] ↔ [¬p ∧ ¬q].
Proof. The assertion follows from the tables below.
p q ¬p ¬q p∧q ¬(p ∧ q) ¬p ∨ ¬q
F F T T F T T
F T T F F T T
T F F T F T T
T T F F T F F
p q ¬p ¬q p∨q ¬(p ∨ q) ¬p ∧ ¬q
F F T T F T T
F T T F T F F
T F F T T F F
T T F F T F F
1.6.10 Remarks. Note that the assertions
|= [¬(p ∧ q)] ↔ [¬p ∨ ¬q]
and
|= [¬(p ∨ q)] ↔ [¬p ∧ ¬q]
can be written
¬(p ∧ q) ⇔ ¬p ∨ ¬q
and
¬(p ∨ q) ⇔ ¬p ∧ ¬q,
respectively. (The same holds for the other tautologies.)
1.7 Some Useful Tautologies

Below is a list of some of the most common tautologies, particularly tautological
implications and equivalences. The list is by no means exhaustive.
Tautologies
Law of the excluded middle p ∨ ¬p
Law of contradiction ¬(p ∧ ¬p)
Tautological Implications
Law of adjunction p → p
Law of simplification p ∧ q → q
Law of addition p → p ∨ q
Law of detachment (modus ponens) p ∧ (p → q) → q
Modus tollendo tollens ¬q ∧ (p → q) → ¬p
Modus tollendo ponens ¬p ∧ (p ∨ q) → q
Law of hypothetical syllogism (p → q) ∧ (q → r) → (p → r)
Law of absurdity (reductio ad absurdum) (p → q ∧ ¬q) → ¬p
Law of exportation (p ∧ q → r) → (p → (q → r))
Law of importation (p → (q → r)) → (p ∧ q → r)
Tautological Equivalences
Law of double negation p ↔ ¬¬p
Commutative laws p ∨ q ↔ q ∨ p
p ∧ q ↔ q ∧ p
(p ↔ q) ↔ (q ↔ p)
Law of contraposition (p → q) ↔ (¬q → ¬p)
De Morgan’s laws ¬(p ∧ q) ↔ ¬p ∨ ¬q
¬(p ∨ q) ↔ ¬p ∧ ¬q
Law of equivalence for
implication and disjunction (p → q) ↔ ¬p ∨ q
Law of negation for implication ¬(p → q) ↔ p ∧ ¬q
Laws for biconditionals (p ↔ q) ↔ (p → q) ∧ (q → p)
(p ↔ q) ↔ p ∧ q ∨ ¬p ∧ ¬q
Laws for identities p ↔ p ∨ (q ∧ ¬q)
p ↔ p ∧ (q ∨ ¬q)
Law of simultaneous conclusions (p → (q ∧ r)) ↔ (p → q) ∧ (p → r)
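Every entry of this list can be verified by brute force, which also serves as a guard against transcription errors. In the Python sketch below (our choice of language), each law is encoded by hand as a truth function of its variables, exactly as printed in the list; the encodings and helper names are ours, so a mistake in an encoding would show up as a spurious failure rather than as an error in the list.

```python
from itertools import product

def imp(a, b):
    """Truth function of →."""
    return (not a) or b

def iff(a, b):
    """Truth function of ↔."""
    return a == b

# (name, number of variables, truth function of the formula listed above)
LAWS = [
    ("excluded middle", 1, lambda p: p or not p),
    ("contradiction", 1, lambda p: not (p and not p)),
    ("adjunction", 1, lambda p: imp(p, p)),
    ("simplification", 2, lambda p, q: imp(p and q, q)),
    ("addition", 2, lambda p, q: imp(p, p or q)),
    ("detachment", 2, lambda p, q: imp(p and imp(p, q), q)),
    ("modus tollendo tollens", 2, lambda p, q: imp(not q and imp(p, q), not p)),
    ("modus tollendo ponens", 2, lambda p, q: imp(not p and (p or q), q)),
    ("hypothetical syllogism", 3,
     lambda p, q, r: imp(imp(p, q) and imp(q, r), imp(p, r))),
    ("absurdity", 2, lambda p, q: imp(imp(p, q and not q), not p)),
    ("exportation", 3, lambda p, q, r: imp(imp(p and q, r), imp(p, imp(q, r)))),
    ("importation", 3, lambda p, q, r: imp(imp(p, imp(q, r)), imp(p and q, r))),
    ("double negation", 1, lambda p: iff(p, not (not p))),
    ("commutativity of ∨", 2, lambda p, q: iff(p or q, q or p)),
    ("commutativity of ∧", 2, lambda p, q: iff(p and q, q and p)),
    ("commutativity of ↔", 2, lambda p, q: iff(iff(p, q), iff(q, p))),
    ("contraposition", 2, lambda p, q: iff(imp(p, q), imp(not q, not p))),
    ("De Morgan for ∧", 2, lambda p, q: iff(not (p and q), (not p) or (not q))),
    ("De Morgan for ∨", 2, lambda p, q: iff(not (p or q), (not p) and (not q))),
    ("implication and disjunction", 2, lambda p, q: iff(imp(p, q), (not p) or q)),
    ("negation for implication", 2, lambda p, q: iff(not imp(p, q), p and not q)),
    ("biconditional as two conditionals", 2,
     lambda p, q: iff(iff(p, q), imp(p, q) and imp(q, p))),
    ("biconditional as agreement", 2,
     lambda p, q: iff(iff(p, q), (p and q) or (not p and not q))),
    ("identity with ∨", 2, lambda p, q: iff(p, p or (q and not q))),
    ("identity with ∧", 2, lambda p, q: iff(p, p and (q or not q))),
    ("simultaneous conclusions", 3,
     lambda p, q, r: iff(imp(p, q and r), imp(p, q) and imp(p, r))),
]

def is_tautology(formula, n):
    return all(formula(*values) for values in product([False, True], repeat=n))

failures = [name for name, n, formula in LAWS if not is_tautology(formula, n)]
print(failures)   # []
```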
Note that the Law of negation for implication can be interpreted as follows: to
show that the conditional p → q is false, it is necessary and sufficient to show
that p is true and q is false.
1.7.1 Exercise. For each of the formulas given below, if the formula is a tautology,
give a truth table showing that it is; otherwise, show that the formula is not a
tautology by producing a counterexample (that is, a truth assignment making the
interpretation of the formula false).
1. p ↔ ¬¬p
2. p ↔ p ∨ (q ∧ ¬q)
3. p ↔ p ∧ (q ∨ ¬q)
4. ¬(p ∧ ¬p)
5. p → (p → p)
6. p → p ∧ q
7. p → p ∨ q
8. p → (q → p)
9. p ∧ q → p
10. (p → q) → (¬q → p)
11. (p ↔ q) ↔ p ∧ q ∨ ¬p ∧ ¬q
12. ¬p ∧ (p ∨ q) → q
13. p ∧ (p → q) → q
14. ¬(p ∨ q) ↔ ¬p ∨ ¬q
15. ¬(p → q) ↔ p ∧ ¬q
16. ¬p ∧ (p ↔ q) → ¬q
17. ¬q ∧ (p → q) → p
18. ¬(p → ¬q) → ¬p ∨ q
19. (p ∧ q → r) ↔ (p → (q → r))
20. (p → q) ∧ (q → r) → (p → r)
21. (p → q) ∨ (q → r) → (p → r)
22. (p → q) ∧ (p → r) → (p → q ∧ r)
1.8 Proofs
The use of tautological implications and equivalences forms the basis for our
mathematical arguments used in the construction of proofs. A few of these are
outlined below. They are all based on the fact that if φ tautologically implies ψ,
we can prove φ in order to conclude ψ. The techniques illustrated in the examples
given below provide the basis for many of the “tricks” we shall use in the rest of
these notes. This section will serve as a reference for much of what follows.
1.8.1 Definition. Premises and Consequences: Let φ and φ1 , . . . , φn be for-
mulas. We say that φ is a consequence of (or that φ logically follows from) the
formulas φ1 , . . . , φn , just in case
(1.12) |= [(φ1 ) ∧ · · · ∧ (φn )] → (φ).
We call the formulas φ1, …, φn the premises (or hypotheses) of the consequence
and we call the formula φ the sequent (or conclusion) of the consequence. We use
the notation
(1.13) φ1, …, φn |= φ
(read: “φ is a consequence of φ1, …, φn” or “φ follows from φ1, …, φn”) to
indicate that [(φ1) ∧ · · · ∧ (φn)] ⇒ φ. The latter notation (1.13) is more suggestive
than (1.12): one begins with the propositions φ1 , . . . , φn and concludes φ. More
formally, one says that φ can be proved under the hypotheses φ1 , . . . , φn .‡ Note
that this is not an absolute statement about the truth value of φ. In particular, φ
might not be a tautology by itself—however, any truth assignment that interprets
φ1 , . . . , φn as true must also interpret φ as true.
The negation of the situation above is denoted
φ1, …, φn ⊭ φ
(read: “φ is not a consequence of φ1, …, φn” or “φ does not follow from φ1, …, φn”).
Also
φ =||= ψ
means that both
φ |= ψ and ψ |= φ.
This notation is often useful in proofs of equivalences (“if and only if” state-
ments). Finally, one sometimes writes ψ =| φ instead of φ |= ψ. In any event, we
shall mainly use the notation φ ⇒ ψ for φ |= ψ and φ ⇔ ψ for φ =||= ψ.
‡
The concept of proof is the same in other contexts, such as that of predicate calculus, which
is the subject of Note 2.
In analogy with the language of elementary algebra, where x = y = z means
that x = y and y = z, here φ1 ⇒ φ2 ⇒ φ3 means that φ1 ⇒ φ2 and φ2 ⇒ φ3. (By
the law of hypothetical syllogism, we may then conclude that φ1 ⇒ φ3.)
Similarly, φ1 ⇔ φ2 ⇔ φ3 means that φ1 ⇔ φ2 and φ2 ⇔ φ3 , from which we
may conclude that φ1 ⇔ φ3 .
The situation with longer chains of tautological implications (⇒) and tautolog-
ical equivalences (⇔) is handled analogously. Be careful with a chain such as
φ1 ⇔ · · · ⇔ φn ⇒ φn+1 ⇔ · · · ⇔ φm .
Here, we can conclude that φ1 ⇒ φm, but we cannot conclude that φ1 ⇔ φm.
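The definition of φ1, …, φn |= φ translates directly into a search for an interpretation that makes every premise true and the conclusion false. The Python sketch below (our choice of language; `entails` and the sample formulas are hypothetical) checks modus ponens and a short chain φ1 ⇔ φ2 ⇒ φ3. The chain yields φ1 ⇒ φ3 but not φ1 ⇔ φ3.

```python
from itertools import product

def imp(a, b):
    """Truth function of →."""
    return (not a) or b

def entails(premises, conclusion, n):
    """premises |= conclusion: every interpretation of the n variables that makes
    all premises true also makes the conclusion true."""
    for values in product([False, True], repeat=n):
        if all(h(*values) for h in premises) and not conclusion(*values):
            return False        # an interpretation refuting the consequence
    return True

# Modus ponens: p, p → q |= q
print(entails([lambda p, q: p, lambda p, q: imp(p, q)], lambda p, q: q, 2))   # True

# A chain φ1 ⇔ φ2 ⇒ φ3 with φ1 = p ∧ q, φ2 = q ∧ p and φ3 = p.
phi1 = lambda p, q: p and q
phi2 = lambda p, q: q and p
phi3 = lambda p, q: p
print(entails([phi1], phi2, 2), entails([phi2], phi1, 2))   # True True
print(entails([phi2], phi3, 2), entails([phi3], phi2, 2))   # True False
print(entails([phi1], phi3, 2), entails([phi3], phi1, 2))   # True False
```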
1.8.2 Definition. The Contrapositive of an Implication: The contraposition
(or contrapositive) of the implication p → q is the implication ¬q → ¬p. To prove
(1.14) |= (p → q) ↔ (¬q → ¬p)
we use the table shown below.
p q p→q ¬q → ¬p
F F T T
F T T T
T F F F
T T T T
An argument by contraposition is one where we prove ¬q → ¬p to conclude
that p → q.
1.8.3 Note. The reason proofs by contraposition are valid is that, by equation (1.14), both (p → q) → (¬q → ¬p) and (¬q → ¬p) → (p → q) are tautologies, hence
¬q → ¬p =||= p → q.
Said otherwise, from the truth of ¬q → ¬p there follows the truth of p → q, and vice versa.
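As a cross-check of (1.14), the short Python sketch below recomputes the table above by brute force. The helper `implies` is our own name for the truth function of →, and the extra column for the converse q → p is an addition made here: it shows that, unlike the contrapositive, the converse is not equivalent to p → q.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth function of the connective ->."""
    return (not a) or b


ROWS = list(product([False, True], repeat=2))  # (p, q) in the table's order

print("p q  p->q  ~q->~p  q->p")
for p, q in ROWS:
    cells = (p, q, implies(p, q), implies(not q, not p), implies(q, p))
    print("  ".join("T" if c else "F" for c in cells))

# (1.14): p -> q and its contrapositive agree on every row ...
contrapositive_ok = all(implies(p, q) == implies(not q, not p) for p, q in ROWS)
# ... whereas the converse q -> p differs from p -> q on some row.
converse_ok = all(implies(p, q) == implies(q, p) for p, q in ROWS)
print(contrapositive_ok, converse_ok)  # True False
```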
1.8.4 Example. Proof by Contraposition: Consider the natural numbers§
N = {0, 1, 2, . . .}.
We wish to prove the statement
If n² is odd then n is odd,
for every such number n. Let p stand for “n² is odd” and let q stand for “n is odd.” To prove that p → q, we prove that ¬q → ¬p; that is,¶
¬(n is odd) → ¬(n² is odd).
The above is equivalent to
n is even → n² is even.
Indeed, if n is even, then n = 2m for some natural number m. This gives n² = 4m² = 2(2m²), showing that n² is even.
§ We have yet to invent the natural numbers. What we use here is the unofficial, intuitive version you already know about.
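A finite computation cannot replace the proof above, but it can make the statement concrete. This Python sketch, with an arbitrary bound `LIMIT` chosen only for illustration, checks the statement for every n below the bound, and checks the identity n² = 2(2m²) used in the proof for every even n = 2m below it.

```python
LIMIT = 10_000  # arbitrary bound for this illustration


def is_odd(k: int) -> bool:
    return k % 2 == 1


# p -> q: "if n^2 is odd then n is odd", checked for 0 <= n < LIMIT.
statement_ok = all(is_odd(n) for n in range(LIMIT) if is_odd(n * n))

# Contrapositive, via the identity used in the proof: n = 2m gives n^2 = 2(2m^2).
identity_ok = all(n * n == 2 * (2 * (n // 2) ** 2) for n in range(0, LIMIT, 2))

print(statement_ok, identity_ok)  # True True
```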
1.8.5 Definition. Proof by Contradiction: We explain the argument called proof by contradiction. It is based on the tautology
(1)    |= (p → q ∧ ¬q) → ¬p
(the truth table shown below verifies this), called the law of absurdity. Note: It does not matter what we use for q, since q ∧ ¬q is a contradiction. Further, q ∧ ¬q can be replaced by q ↔ ¬q, or any other contradiction.‖
p  q  |  ¬p  |  q ∧ ¬q  |  p → q ∧ ¬q  |  (p → q ∧ ¬q) → ¬p
F  F  |  T   |    F     |      T       |          T
F  T  |  T   |    F     |      T       |          T
T  F  |  F   |    F     |      F       |          T
T  T  |  F   |    F     |      F       |          T
Replacing p by ¬p in (1) and using ¬¬p ⇔ p, we may rewrite (1) as
(2)    |= (¬p → q ∧ ¬q) → p.
We use (1) to prove ¬p by showing that p leads to a contradiction, and (2) to prove that p is true by showing that ¬p leads to a contradiction.
In (2) we can replace p with p → q and q with r, and then use ¬(p → q) ⇔ p ∧ ¬q, to obtain the tautology
(3)    |= (p ∧ ¬q → r ∧ ¬r) → (p → q),
yet another form of the law of absurdity. We can prove p → q by asserting p while denying q to obtain a contradiction. In other words, p → q can only be false if we can have p true and q false; i.e.: p ∧ ¬q.
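The forms of the law of absurdity can be confirmed by exhausting all truth assignments. In this Python sketch, `implies` and `is_tautology` are hypothetical helper names, each lambda transcribes one of (1), (2), (3) with p → q ∧ ¬q read as p → (q ∧ ¬q), and the last one uses the contradiction q ↔ ¬q in place of q ∧ ¬q.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth function of the connective ->."""
    return (not a) or b


def is_tautology(f, arity: int) -> bool:
    """True iff f is true under all 2**arity truth assignments."""
    return all(f(*values) for values in product([False, True], repeat=arity))


law1 = lambda p, q: implies(implies(p, q and not q), not p)          # (1)
law2 = lambda p, q: implies(implies(not p, q and not q), p)          # (2)
law3 = lambda p, q, r: implies(implies(p and not q, r and not r),
                               implies(p, q))                         # (3)
law1_iff = lambda p, q: implies(implies(p, q == (not q)), not p)     # q <-> ~q

print(is_tautology(law1, 2), is_tautology(law2, 2),
      is_tautology(law3, 3), is_tautology(law1_iff, 2))  # True True True True
```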
¶
If n is a natural number then the negation of the statement “n is odd” is the statement “n is
even”.
‖
We cannot write F (for falsehood), because truth and falsehood belong to interpretations, not to the language L.
1.8.6 Example. A Proof by Contradiction: We illustrate the technique by proving that √2 is irrational.∗∗ Recall that a real number x is called rational if it can be written as the ratio m/n of two integers with n ≠ 0. Otherwise, x is called irrational.
The proof by contradiction goes as follows. (i) We suppose that √2 is rational. (ii) We obtain a contradiction. (iii) We conclude that √2 cannot be rational.
Accordingly, suppose that √2 is rational, so that we can write
(1)    √2 = m/n.
Further, we can assume that m and n have no common factors, i.e.: we assume that the fraction in (1) is in reduced form.
By squaring both sides of (1) we obtain the equality 2 = m²/n², which gives
(2)    m² = 2n².
Clearly, (2) shows that m² is even. This means that m is also even (if m were odd, say m = 2j + 1, then m² = 2(2j² + 2j) + 1 would be odd). In other words, m = 2k for some integer k. Therefore, after squaring both sides, we obtain m² = 4k². We substitute this last equality in (2) to obtain 4k² = 2n². This gives 2k² = n², showing that n² is also even. Hence, n is even also.
We have reached the conclusion that m and n are both even, contradicting our original assumption that the fraction in (1) is in reduced form (because we could divide numerator and denominator by 2).
Therefore, we have shown that it is impossible to write √2 in the form (1). Equivalently, we have shown that √2, not being rational, must be irrational.
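No computer search can prove irrationality, since it only inspects finitely many fractions, but a quick search agrees with the argument. The Python sketch below, with a hypothetical helper `square_root_two_witnesses` and an arbitrary bound, looks for integers m, n ≥ 1 with m² = 2n², and also spot-checks the parity step that m² even forces m even.

```python
from math import isqrt


def square_root_two_witnesses(limit: int):
    """All pairs (m, n) with 1 <= n <= limit and m*m == 2*n*n."""
    pairs = []
    for n in range(1, limit + 1):
        m = isqrt(2 * n * n)  # floor of the square root of 2n^2
        if m * m == 2 * n * n:
            pairs.append((m, n))
    return pairs


print(square_root_two_witnesses(100_000))  # []

# The parity step: whenever m^2 is even, m is even (spot-checked below 10_000).
print(all(m % 2 == 0 for m in range(10_000) if (m * m) % 2 == 0))  # True
```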
1.9 Appendix A: More Examples

1.9.1 Remark. Life would be quite easy if we could use the propositional calculus for all of our mathematical reasoning. Regrettably, this is not possible. In Note 2, we shall introduce the predicate calculus to remedy this.
What this means is that truth tables will no longer suffice and that we shall need to be more clever in showing why a statement is true (or false). To illustrate how we can accomplish this in some simple cases, we shall rework two problems already solved using truth tables.
While the construction of a truth table is carried out by a simple mechanical
procedure, a proof without a truth table is much more of a puzzle: How do we know
what trick will take us from one step to the next in such a way that we reach the
end we seek? At first glance it may appear that memorization provides the only
way out. In reality, mathematical problem-solving skills provide what is needed.
∗∗
At this point, the real numbers have no official status. We use the intuitive version of the
reals you already know about.
The examples that follow were already worked out using truth tables. Why rework them without truth tables? The point is to expose the line of reasoning that must be used when truth tables do not apply.
1.9.2 Example. We prove the tautology
|= (p ↔ q) ↔ p ∧ q ∨ ¬p ∧ ¬q
of Exercise 1.7.1.11 without using a truth table. We have
(p ↔ q) ⇔ (p → q) ∧ (q → p)    (Theorem 1.6.2),
⇔ (¬p ∨ q) ∧ (¬q ∨ p)    (Theorem 1.6.8),
⇔ ¬p ∧ (¬q ∨ p) ∨ q ∧ (¬q ∨ p)    (distributivity),
⇔ [(¬p ∧ ¬q) ∨ (¬p ∧ p)] ∨ [(q ∧ ¬q) ∨ (q ∧ p)]    (distributivity),
⇔ (¬p ∧ ¬q) ∨ [(q ∧ ¬q) ∨ (q ∧ p)]    (Law for Identity: Exercise 1.7.1.2),
⇔ (¬p ∧ ¬q) ∨ [(q ∧ p) ∨ (q ∧ ¬q)]    (commutativity),
⇔ (¬p ∧ ¬q) ∨ (q ∧ p)    (Law for Identity),
⇔ (p ∧ q) ∨ (¬p ∧ ¬q)    (commutativity).
Here ¬p ∧ p and q ∧ ¬q are contradictions, and each use of the Law for Identity drops a contradiction from a disjunction.
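The truth table for this example was computed earlier; as an independent sanity check, the following Python sketch verifies that every pair of consecutive lines in the derivation is a tautological equivalence. Each lambda is our own transcription of one line, with Python's `and`, `or`, `not` standing for ∧, ∨, ¬ and `==` on booleans standing for ↔.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


# One truth function of (p, q) per line of the derivation in Example 1.9.2.
steps = [
    lambda p, q: p == q,                                              # p <-> q
    lambda p, q: implies(p, q) and implies(q, p),
    lambda p, q: ((not p) or q) and ((not q) or p),
    lambda p, q: (not p and ((not q) or p)) or (q and ((not q) or p)),
    lambda p, q: ((not p and not q) or (not p and p))
                 or ((q and not q) or (q and p)),
    lambda p, q: (not p and not q) or ((q and not q) or (q and p)),
    lambda p, q: (not p and not q) or ((q and p) or (q and not q)),
    lambda p, q: (not p and not q) or (q and p),
    lambda p, q: (p and q) or (not p and not q),
]

rows = list(product([False, True], repeat=2))
chain_ok = all(all(f(*r) == g(*r) for r in rows)
               for f, g in zip(steps, steps[1:]))
print(chain_ok)  # True
```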
1.9.3 Example. We prove the tautology
|= (p ∧ q → r) ↔ (p → (q → r))
of Exercise 1.7.1.19 without using a truth table. We have
(p ∧ q → r) ⇔ ¬(p ∧ q) ∨ r (Theorem 1.6.8),
⇔ ¬p ∨ ¬q ∨ r (Theorem 1.6.9),
⇔ ¬p ∨ (¬q ∨ r) (associativity),
⇔ ¬p ∨ (q → r) (Theorem 1.6.8),
⇔ (p → (q → r)) (Theorem 1.6.8).
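The same kind of check applies to the derivation of Example 1.9.3. In this minimal Python sketch each lambda transcribes one line of the chain, and `implies` is a helper introduced here for →.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


# One truth function of (p, q, r) per line of the derivation in Example 1.9.3.
steps = [
    lambda p, q, r: implies(p and q, r),
    lambda p, q, r: (not (p and q)) or r,
    lambda p, q, r: (not p) or (not q) or r,
    lambda p, q, r: (not p) or ((not q) or r),
    lambda p, q, r: (not p) or implies(q, r),
    lambda p, q, r: implies(p, implies(q, r)),
]

rows = list(product([False, True], repeat=3))
chain_ok = all(all(f(*x) == g(*x) for x in rows)
               for f, g in zip(steps, steps[1:]))
print(chain_ok)  # True
```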
1.9.4 Example. A proof of the existence of infinitely many primes by contradiction: This proof was first published by Euclid in his Elements (III century B.C.), probably the most influential work in the history of mathematics. Recall that a natural number n > 1 is prime if its only (positive integer) factors are 1 and n itself. Otherwise, n is called composite.†† A well-known proposition states that every natural number n > 1 is divisible by some prime number.‡‡
††
The number 1 is neither prime nor composite; it is called unity.
‡‡
The Fundamental Theorem of Arithmetic states that every natural n > 1 is a product of one
or more factors that are prime numbers (not necessarily distinct); moreover, the prime factors
of n are uniquely determined, up to reordering.
To start the proof by contradiction, assume that only finitely many prime numbers exist. Let us say that there are only N distinct prime numbers. List all of these distinct prime numbers:
p1, p2, p3, . . . , pN.
Let
a = 1 + p1 · p2 · · · pN.
At this point we recall the assumption made at the beginning of the proof: p1, p2, . . . , pN are all the prime numbers. Dividing a by any pi leaves remainder 1, so no pi divides a. Therefore, no prime number divides a. On the other hand, since a > 1, there must exist some prime number p that divides a. This is a contradiction.
The assumption that only finitely many prime numbers exist leads to a contradiction. This proves that infinitely many prime numbers exist.
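The construction of a can be tried on a concrete list. Suppose, hypothetically, that 2, 3, 5, 7, 11, 13 were all the primes. The Python sketch below computes a, confirms that each listed prime leaves remainder 1, and finds the smallest prime factor of a by trial division (`smallest_prime_factor` is a helper written for this illustration). Note that a itself need not be prime: here a = 30031 = 59 · 509, and the prime 59 is simply missing from the list.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing n (n > 1), found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


primes = [2, 3, 5, 7, 11, 13]        # hypothetically "all" the primes
a = 1 + prod(primes)
print(a, [a % p for p in primes])    # 30031 [1, 1, 1, 1, 1, 1]
p = smallest_prime_factor(a)
print(p, p in primes)                # 59 False
```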
1.10 Appendix B: The Greek Alphabet

A α alpha
B β beta
Γ γ gamma
∆ δ delta
E ϵ ε epsilon
Z ζ zeta
H η eta
Θ θ ϑ theta
I ι iota
K κ kappa
Λ λ lambda
M µ mu
N ν nu
O o omicron
Ξ ξ xi
Π π ϖ pi
P ρ ϱ rho
Σ σ ς sigma
T τ tau
Υ υ upsilon
Φ φ ϕ phi
X χ chi
Ψ ψ psi
Ω ω omega
Foundations Note 2
The Predicate Calculus

July 28, 2020
Contents
2.1 Introduction
2.2 Quantification of Predicates
2.3 Formulas of the Predicate Calculus
2.4 Interpretations Revisited
2.1 Introduction
The notions developed in the propositional calculus do not enable us to penetrate the language of everyday reasoning beyond the primitive notion of irreducible sentence (propositional variable, sentence letter). Here we develop a finer and more sophisticated structure that leads to a more powerful tool for describing mathematical reasoning. We begin by introducing the notion of a predicate below.
In everyday language, a predicate is the phrase in a sentence or clause which
expresses what is said of the subject. In the formula (sentence) “x is positive,” the
predicate is “is positive.” If we let A denote this predicate symbolically, and let
A(x) denote the predicate A applied to x, the formula can be written A(x).
The above predicate is a one-place predicate. (The predicate applies to one
variable.) More generally, we may have a two-place predicate (e.g.: x is taller than
y), a three-place predicate (e.g.: x and y are the parents of z), and so on. We do
not allow zero-place predicates.
We also have constants: In “x is greater than 5” and in “Charlie is taller than Frank” the constants are 5, Charlie, and Frank. In addition, we need to talk about functions:∗ If we want to say that
f(x) > 4 → x² > 4,
for integer x, the function f here is defined by the rule which assigns to every x the integer x². The domain of each function must be adequate for function evaluation to make sense in a given interpretation, where “interpretation” shall be defined below. The value of each function must also be in the domain of the interpretation. A function of n variables is called an n-place function. We do not allow zero-place functions.
We point out that we use the notion function as an undefined primitive notion
in the background language. Later, functions will be defined as special kinds of
sets.
In the examples that follow, N = {0, 1, 2, . . .} denotes the natural numbers, Z
the integers, Q the rationals and R the reals. These are the “unofficial” versions
of these numbers, as we haven’t invented them yet. They serve the purpose of
illustrating our development with examples to show, more or less, in which direction
our constructions will take us.
2.2 Quantification of Predicates

In interpreting the formula “x is taller than Charlie” we must ask† what values can
be substituted for x. Two possibilities come immediately to mind: We can consider
“for every x, x is taller than Charlie” or we can consider “there exists some x such
that x is taller than Charlie.” The technical name for “there exists some x such
that” and “for every x” is “quantifier.”
∗ Remember that Note 1 and Note 2 develop the background language. Here “function” is an intuitive notion lacking a precise definition.
† Of course, we must also have an interpretation for “taller than” and we must know who Charlie is. At the moment we are only concerned with presenting the existential and universal quantifiers.
2.2.1 Definition (Existential and Universal Quantifiers). We introduce the notion of quantification for a predicate. Consider the n-place predicate
A(x1, . . . , xn),
where x1, . . . , xn are variables. We write

(∀x)A(x1 , . . . , xn )
to say “for all x, A(x1 , . . . , xn ).” The text ∀x is called a universal quantifier with
attached variable letter x and is read “for all x.”
In the formula
(∃x)A(x1 , . . . , xn ),
the text ∃x is called an existential quantifier with attached variable letter x and is
read “for some x,” or “there exists an x such that.”
The variable x need not be one of the xi ’s.
2.2.2 Examples. The formula
(∀x)(x ≥ 0)
is false when x is allowed to take on integer values. The formula
(∃x)(x > 0)
is true if x is allowed to range over the integers. Quantifying a predicate does not necessarily yield a true or false statement, even if the predicate is a one-place predicate. For instance,
(∀y)(x > 0)
is neither true nor false: the quantifier binds y, which does not occur in x > 0, so x remains free and the truth value depends on the value of x.
2.2.3 Example (The Unique Existential Quantifier). Sometimes we say “there
exists a unique x such that... .” For this purpose we can use the unique existential
quantifier symbol ‘∃!’.
For example, to say “for every x there exists a unique y such that x + y = 0,”
we write
(∀x)(∃!y)(x + y = 0).
Note that the above can be replaced with
(∀x)(∃y)(∀z)[(x + y = 0) ∧ (x + z = 0 → y = z)].
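The rewriting of (∃!y) can be tested on a small finite domain. This Python sketch assumes a three-element domain and lets R(x, y) range over all 512 binary relations on it, in place of x + y = 0; it checks that “exactly one y satisfies R(x, y)” agrees with the form (∃y)(∀z)[R(x, y) ∧ (R(x, z) → y = z)]. Over a finite domain, ∀ and ∃ become Python's `all` and `any`. The final lines, using addition of integers mod 3, are likewise an illustrative assumption.

```python
from itertools import product

D = range(3)                           # small finite domain (an assumption)
PAIRS = [(x, y) for x in D for y in D]


def exists_unique(x, rel) -> bool:
    """(∃!y) R(x, y): exactly one y in D with (x, y) in rel."""
    return sum(1 for y in D if (x, y) in rel) == 1


def rewritten(x, rel) -> bool:
    """(∃y)(∀z)[R(x, y) ∧ (R(x, z) → y = z)]."""
    return any(all((x, y) in rel and ((x, z) not in rel or y == z) for z in D)
               for y in D)


agree = True
for bits in product([False, True], repeat=len(PAIRS)):   # all 2**9 relations
    rel = {pair for pair, keep in zip(PAIRS, bits) if keep}
    agree = agree and all(exists_unique(x, rel) == rewritten(x, rel) for x in D)
print(agree)  # True

# Integers mod 3, with R(x, y) meaning x + y = 0 (mod 3):
additive_inverse = {(x, y) for x in D for y in D if (x + y) % 3 == 0}
print(all(exists_unique(x, additive_inverse) for x in D))  # True
```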
2.2.4 Example (Using Only One Quantifier). The formula
¬(∃x)(p(x)) ↔ (∀x)(¬p(x))
is a tautology. (If there is no x for which p(x) is true then p(x) must be false for
all x. If p(x) is false for all x then there is no x for which p(x) is true.) Negating
the above and replacing p(x) with ¬p(x) we obtain the tautology
¬(∀x)(p(x)) ↔ (∃x)(¬p(x)).
Therefore, only one of the two quantifiers ∀ and ∃ is necessary, for we can replace
(∃x)(p(x)) with ¬(∀x)(¬p(x)),
and we can replace
(∀x)(p(x)) with ¬(∃x)(¬p(x)).
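On a finite domain the quantifiers become Python's `all` and `any`, and the two duality laws can be checked for every one-place predicate. The sketch below assumes an arbitrary four-element domain and represents each predicate by the set of elements where it holds; the helper name `all_predicates` is our own.

```python
from itertools import combinations

D = [0, 1, 2, 3]  # arbitrary finite domain for the illustration


def all_predicates(domain):
    """Every one-place predicate on the domain, given by the set where it is true."""
    for k in range(len(domain) + 1):
        for extension in combinations(domain, k):
            yield set(extension)


duality_ok = True
for ext in all_predicates(D):
    some = any(x in ext for x in D)    # (∃x) p(x)
    every = all(x in ext for x in D)   # (∀x) p(x)
    duality_ok = (duality_ok
                  and (not some) == all(x not in ext for x in D)    # ¬∃ ↔ ∀¬
                  and (not every) == any(x not in ext for x in D))  # ¬∀ ↔ ∃¬
print(duality_ok)  # True
```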
2.2.5 Remarks (Notational Conventions). Recall the conventions adopted for
using the symbols ⇒ and ⇔ in Note 1: If φ, φ1 , . . . , φn , ψ and ξ are formulas of L,
the text “φ ⇔ ψ ⇔ ξ” is shorthand for “φ ⇔ ψ and ψ ⇔ ξ.”‡ Similarly,
• “φ ⇒ ψ ⇒ ξ” is shorthand for “φ ⇒ ψ and ψ ⇒ ξ,”
• “φ ⇒ ψ ⇔ ξ” is shorthand for “φ ⇒ ψ and ψ ⇔ ξ,”
• “φ ⇔ ψ ⇒ ξ” is shorthand for “φ ⇔ ψ and ψ ⇒ ξ.”

More generally, “φ1 ⇔ φ2 ⇔ · · · ⇔ φn ” is shorthand for “φ1 ⇔ φ2 and φ2 ⇔ φ3
and . . . and φn−1 ⇔ φn .” Cases where some instances of ⇔ are replaced with ⇒
are analogous to the above.
2.3 Formulas of the Predicate Calculus

In the study of the propositional calculus we used a single language, the language
L defined in Note 1. Here, we allow arbitrarily many languages to illustrate the
richness of the structure and to create a number of examples. Beginning with the
next note, we shall define an extremely simple language to talk about sets.
2.3.1 Definition (Formulas). We can quantify predicates already quantified. We now explain these ideas and define some “formal” objects.
A formula is the formal object we define below. The formal symbols are the sentential connectives, parentheses, and the symbols we list here in quotation marks:
(PL1) comma ‘,’;
(PL2) equality symbol ‘=’ (not all languages may have ‘=’);
(PL3) existential quantifier symbol ‘∃’;
(PL4) universal quantifier symbol ‘∀’;
(PL5) a collection of variable letters x, x1, x2, . . . ;
(PL6) a collection of constants a, a1, a2, . . . ;
(PL7) a collection of predicate letters A, A1, A2, . . . ;
(PL8) a collection of function letters f, f1, f2, . . . .
‡ Further recall that φ ⇒ ψ is shorthand for |= (φ) → (ψ), while φ ⇔ ψ is shorthand for |= (φ) ↔ (ψ).
A language need not have equality, or constants, or functions to be interesting. As a matter of fact, for our development of set theory in the next note, we use a language with none of these and with a single predicate.
In a language with equality, there is a special two-place predicate symbol E; E(x, y) is interpreted to mean equality of x and y (Section 2.4). It is less awkward to write x = y than the formal E(x, y). We also write x ≠ y for ¬E(x, y).
Moreover, we suppose that to each predicate letter and to each function letter there corresponds a positive integer n (the arity) that designates the predicate letter or the function letter as an n-place predicate letter or an n-place function letter, respectively.
A formal string is any juxtaposition of formal symbols obtainable from the formal symbols by finitely many applications of the concatenation operation. We must weed out the ungrammatical strings by defining a syntax. First, we define a term by the rules given below. Here f stands for any function letter, and t1, . . . , tn denote any terms. (We ignore spaces.)
(PL9) A variable letter is a term.
(PL10) A constant is a term.
(PL11n) If f is any n-place function letter and t1, . . . , tn are terms, then f(t1, . . . , tn) is a term.
(PL12) The only terms are those which can be constructed by finitely many applications of rules (PL9), (PL10), and (PL11n) given above.
Note that our definition of term does not make explicit allowance for the use of
operators such as + in “x+y.” To do this, define the two-place function letter f that
is interpreted as f (x, y) = x + y, where + is given meaning in the interpretation.
Then we have
f (x, y) for x + y,
f (f (x, y), z) for (x + y) + z,
and so on. Also note that rule (PL11n) depends on the value of n, n = 1, 2, . . . . The text
“t1, . . . , tn”
is a construct that means
“ ‘t1’, if n = 1,” or “ ‘t1, t2’, if n = 2,” or “ ‘t1, t2, t3’, if n = 3,” and so on.
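The inductive definition of terms translates directly into a recursive data type. The following Python sketch is one possible encoding, not part of the formal development: the classes `Var`, `Const`, `Apply` and the functions `show` and `evaluate` are hypothetical names, and arities are not enforced. It builds the term f(f(x, y), z) and evaluates it with f interpreted as addition, as in the discussion above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


@dataclass(frozen=True)
class Var:                      # rule (PL9)
    name: str


@dataclass(frozen=True)
class Const:                    # rule (PL10)
    name: str


@dataclass(frozen=True)
class Apply:                    # rule (PL11n): an n-place function letter applied to n terms
    func: str
    args: Tuple["Term", ...]


Term = Union[Var, Const, Apply]


def show(t: Term) -> str:
    """Write the term in the prefix notation f(t1, ..., tn) of the text."""
    if isinstance(t, (Var, Const)):
        return t.name
    return f"{t.func}({', '.join(show(a) for a in t.args)})"


def evaluate(t: Term, env: Dict[str, int], consts: Dict[str, int],
             funcs: Dict[str, Callable[..., int]]) -> int:
    """Value of a term once variables, constants and function letters have meanings."""
    if isinstance(t, Var):
        return env[t.name]
    if isinstance(t, Const):
        return consts[t.name]
    return funcs[t.func](*(evaluate(a, env, consts, funcs) for a in t.args))


x, y, z = Var("x"), Var("y"), Var("z")
t = Apply("f", (Apply("f", (x, y)), z))      # stands for (x + y) + z
print(show(t))                               # f(f(x, y), z)
print(evaluate(t, {"x": 1, "y": 2, "z": 3}, {}, {"f": lambda a, b: a + b}))  # 6
```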
We introduce more symbols: v, v1, v2, . . . for any variable letter; P for any n-place predicate letter; φ, φ1, φ2, . . . for any formula.
A formula is defined by the following rules.
(PL13) If t1 and t2 are terms, then t1 = t2 is a formula (in a language with equality).
(PL14n) If P is an n-place predicate letter and t1, . . . , tn are terms then P(t1, . . . , tn) is a formula. (Here we have n = 1, 2, . . . .)
(PL15) If φ1 and φ2 are formulas, then
¬(φ1),
(φ1) ∧ (φ2),
(φ1) ∨ (φ2),
(φ1) → (φ2),
(φ1) ↔ (φ2)
are formulas also.
(PL16) If φ is a formula and v is a variable, then (∀v)(φ) and (∃v)(φ) are formulas.
(PL17) The only formulas are those which can be obtained by finitely many applications of rules (PL13), (PL14n), (PL15), and (PL16) given above.
Formulas free of quantifiers, that is, formulas that can be formed without using rule (PL16), are called irreducible predicate formulas. Any irreducible formula can be obtained from a suitable formula of the propositional calculus upon replacing its propositional variables by formulas of the forms t1 = t2 and P(t1, . . . , tn) given by rules (PL13) and (PL14n). See Remark 2.3.8.
2.3.2 Example. We use the same rules for eliminating parentheses we used in the previous note. For example, the following is a formula:
(∀x)(∃y)(A(x, y, z)) → (∃x)(B(y) ∧ C(z, f(x, y))),
where A is a three-place predicate letter, B is a one-place predicate letter, C is a two-place predicate letter, and f is a two-place function letter.
2.3.3 Definition (The Scope of a Quantifier). The scope of a quantifier is the quantifier together with the smallest formula immediately following the quantifier.
2.3.4 Examples. In the formula
(∀x)(∃y)(y > x ∧ y − 1 < x)
the scope of ∃y is (∃y)(y > x ∧ y − 1 < x), while the scope of ∀x is the entire formula. In the formula
(∃x)[(∀y)(y > 0 ∨ x + z = 5) ∧ (∀z)(x + y > z)]
the scope of ∀y is (∀y)(y > 0 ∨ x + z = 5), the scope of ∀z is (∀z)(x + y > z), and the scope of ∃x is the entire formula.
2.3.5 Definition (Bound and Free Occurrences). An occurrence of a variable in a formula is said to be bound if it lies within the scope of a quantifier in that variable; otherwise the occurrence of the variable is said to be free.
2.3.6 Example. In the formula
(∀x)(x > y)
the variable x is bound and the variable y is free. In the formula
(∀x)(x < 0) ∧ (∀y)(y > x)
the first occurrence of x is bound and the second is free (so x is both free and
bound in the formula), while the (only) occurrence of y is bound.
2.3.7 Definition (Bound and Free Variables). A variable is called a free variable
in a formula if at least one occurrence of the variable is free. A variable is called a
bound variable in a formula if at least one occurrence of the variable is bound.
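Free and bound occurrences can be computed by walking through a formula while remembering which variables the enclosing quantifiers bind. In this minimal Python sketch formulas are nested tuples such as `("forall", "x", body)`, and atomic formulas are `("atom", predicate, args)`, where a string argument is a variable and an integer is a constant; this encoding and the function names are assumptions made for the illustration. Only occurrences inside atomic formulas are counted, not the variable written next to a quantifier.

```python
def occurrences(phi, bound=frozenset()):
    """Yield (variable, is_bound) for every variable occurrence in an atomic subformula."""
    kind = phi[0]
    if kind == "atom":                       # ("atom", predicate, args)
        for arg in phi[2]:
            if isinstance(arg, str):         # strings are variables, ints are constants
                yield arg, arg in bound
    elif kind in ("forall", "exists"):       # (kind, variable, body)
        yield from occurrences(phi[2], bound | {phi[1]})
    else:                                    # ("not", φ) or (connective, φ1, φ2)
        for sub in phi[1:]:
            yield from occurrences(sub, bound)


def free_variables(phi):
    return {v for v, is_bound in occurrences(phi) if not is_bound}


def bound_variables(phi):
    return {v for v, is_bound in occurrences(phi) if is_bound}


# (∀x)(x > y): x is bound and y is free.
phi1 = ("forall", "x", ("atom", ">", ("x", "y")))
# (∀x)(x < 0) ∧ (∀y)(y > x): x is both bound and free, y is bound.
phi2 = ("and",
        ("forall", "x", ("atom", "<", ("x", 0))),
        ("forall", "y", ("atom", ">", ("y", "x"))))
print(free_variables(phi1), bound_variables(phi1))                  # {'y'} {'x'}
print(sorted(free_variables(phi2)), sorted(bound_variables(phi2)))  # ['x'] ['x', 'y']
```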
2.3.8 Remark. In the propositional calculus we can substitute in a formula φ every occurrence of a variable p by some formula ψ within parentheses to obtain the formula φ(p ← (ψ)). In the predicate calculus it is also possible to make substitutions (as we shall see below); however, the presence of quantifiers (hence the distinction between free and bound variables) makes it necessary to place some restrictions on the allowable substitutions. Those restrictions are necessary to ensure that the process of substitution behaves in a sane way under interpretations (Section 2.4).
We will only deal with substitutions from a very naive point of view. We work in a fixed predicate language. The basic rules are the following:
• If φ is a formula in the language L of the propositional calculus involving the variables p1, p2, . . . , pn, and if ψ1, ψ2, . . . , ψn are formulas in the predicate language, then
(2.1)    φ(p1 ← (ψ1), p2 ← (ψ2), . . . , pn ← (ψn))
denotes the predicate formula obtained upon replacing each occurrence of p1 in φ by (ψ1), of p2 by (ψ2), . . . , of pn by (ψn). (That the string so obtained is a predicate formula follows from rule (PL15).)
• If ξ is a (predicate) formula, x is a variable and ψ is any term, then
(2.2)    ξ(x ← (ψ))
denotes the formula obtained from ξ after replacing every free occurrence of x by (ψ). Note: The bound occurrences of x are not replaced by (ψ); they remain x’s.
In addition to the rules above, there is an important sense of allowability of substitutions. In a nutshell, the substitution (2.2) is allowable iff performing it does not cause any occurrence of a variable in ψ to become bound in ξ(x ← (ψ)). We will provide examples to illustrate this point in Section 2.4 after we introduce the concept of interpretation. Non-allowable substitutions cannot be interpreted and must be forbidden.
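The substitution ξ(x ← (ψ)) and the allowability condition can be sketched in code. Assumptions of this Python sketch: formulas are nested tuples, with atomic formulas `("atom", predicate, args)` and quantified formulas `(kind, variable, body)`; a term is a variable (string), a constant (integer), or a tuple `(f, t1, ..., tn)`; the names `substitute` and `allowable` are hypothetical. The example ξ = (∃y)(y > x) shows why capture must be forbidden: over N, ξ is true for every value of x, but substituting y for x would produce (∃y)(y > y), which is false.

```python
def term_vars(t):
    """Variables of a term: str = variable, int = constant, tuple = (f, t1, ..., tn)."""
    if isinstance(t, str):
        return {t}
    if isinstance(t, int):
        return set()
    return set().union(*(term_vars(a) for a in t[1:]))


def substitute_term(t, x, psi):
    """Replace the variable x by the term psi inside the term t."""
    if isinstance(t, str):
        return psi if t == x else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute_term(a, x, psi) for a in t[1:])
    return t


def substitute(phi, x, psi):
    """xi(x <- (psi)): replace only the free occurrences of x."""
    kind = phi[0]
    if kind == "atom":
        return ("atom", phi[1], tuple(substitute_term(t, x, psi) for t in phi[2]))
    if kind in ("forall", "exists"):
        if phi[1] == x:                      # x is bound from here on
            return phi
        return (kind, phi[1], substitute(phi[2], x, psi))
    return (kind,) + tuple(substitute(sub, x, psi) for sub in phi[1:])


def allowable(phi, x, psi, bound=frozenset()):
    """False iff a free occurrence of x lies in the scope of a quantifier on a variable of psi."""
    kind = phi[0]
    if kind == "atom":
        x_free_here = x not in bound and any(x in term_vars(t) for t in phi[2])
        return not (x_free_here and term_vars(psi) & bound)
    if kind in ("forall", "exists"):
        return allowable(phi[2], x, psi, bound | {phi[1]})
    return all(allowable(sub, x, psi, bound) for sub in phi[1:])


xi = ("exists", "y", ("atom", ">", ("y", "x")))   # (∃y)(y > x)
print(allowable(xi, "x", "z"), substitute(xi, "x", "z"))
print(allowable(xi, "x", "y"))                    # False: y would be captured
```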
2.4 Interpretations Revisited

The basic idea behind the notion of interpretation is similar to that of the previous note. An interpretation ultimately permits finding a truth value for (some) formulas.§ However, we need to interpret more than just irreducible formulas (which in predicate languages play the role that propositional variables played in propositional logic). We need to interpret quantified formulas as well. For this purpose it is necessary at this point to appeal to the intuitive notions of set, relation, and function.
2.4.1 Definition (Interpretation). An interpretation of a predicate language L is a nonempty set D, called the domain of the interpretation, together with a collection of rules:
• for assigning to every constant a member of the set D;
• for assigning to every n-place function letter an n-place function (taking each n-tuple of elements in D to a unique member of D);
• for assigning to every n-place predicate letter an n-place function with Truth and Falsehood as the only possible function values.
Once the above rules are provided, it becomes possible to interpret arbitrary terms. The passage from terms to formulas is as follows:
• If a, b are terms then the equality formula E(a, b) (or a = b, as we prefer) has an interpretation of Truth exactly when a and b are interpreted as the same member of D.
• If P is an n-place predicate letter and t1, . . . , tn are terms, then P(t1, . . . , tn) is interpreted as the value that the function assigned to P takes at the interpretations of t1, . . . , tn.
• Quantifiers are interpreted by replacing variables by elements of the set D:
– (∀x)(φ) has an interpretation of Truth if and only if φ(x ← a) has an interpretation of Truth for all a in D;
– (∃x)(φ) has an interpretation of Truth if and only if there exists a in D such that φ(x ← a) has an interpretation of Truth.
• Sentential connectives are interpreted in a truth-functional manner (i.e.: in the manner prescribed by the truth tables from Note 1, Section 1.4).
§ These are formulas without free variables, to be called “propositions.”
2.4.2 Definition (Proposition). A proposition is a formula without free variables.
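When the domain D is finite, Definition 2.4.1 can be executed directly. The following Python sketch is a minimal evaluator under simplifying assumptions: formulas are nested tuples, variables are strings, every other term is a constant that names itself, there are no function letters, and the names `value` and `holds` are hypothetical. Quantifiers are evaluated as in the definition, by letting the variable run through D.

```python
def value(t, env):
    """A string is a variable looked up in env; anything else is a constant naming itself."""
    return env[t] if isinstance(t, str) else t


def holds(phi, D, preds, env=None):
    """Truth value of phi in the interpretation with domain D and predicates preds."""
    env = {} if env is None else env
    kind = phi[0]
    if kind == "eq":                                   # ("eq", t1, t2)
        return value(phi[1], env) == value(phi[2], env)
    if kind == "atom":                                 # ("atom", P, (t1, ..., tn))
        return preds[phi[1]](*(value(t, env) for t in phi[2]))
    if kind == "not":
        return not holds(phi[1], D, preds, env)
    if kind in ("and", "or", "implies", "iff"):
        a, b = holds(phi[1], D, preds, env), holds(phi[2], D, preds, env)
        return {"and": a and b, "or": a or b,
                "implies": (not a) or b, "iff": a == b}[kind]
    if kind == "forall":                               # true for every d in D
        return all(holds(phi[2], D, preds, {**env, phi[1]: d}) for d in D)
    if kind == "exists":                               # true for some d in D
        return any(holds(phi[2], D, preds, {**env, phi[1]: d}) for d in D)
    raise ValueError(f"unknown formula kind: {kind!r}")


D = range(5)                                  # the domain {0, 1, 2, 3, 4}
preds = {"<": lambda a, b: a < b}             # the usual order
no_largest = ("forall", "x", ("exists", "y", ("atom", "<", ("x", "y"))))
has_least = ("exists", "x", ("forall", "y",
             ("or", ("eq", "x", "y"), ("atom", "<", ("x", "y")))))
print(holds(no_largest, D, preds), holds(has_least, D, preds))  # False True
```

The first proposition fails because 4 has no larger element in D; the second holds with x = 0.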
2.4.3 Remarks. Existential quantifiers complicate things in connection with tautologies. For example, the tautology called the law of hypothetical syllogism,
(p → q) ∧ (q → r) → (p → r),
remains a tautology when we change the above to
(∃x)[(p(x) → q(x)) ∧ (q(x) → r(x)) → (p(x) → r(x))],
since (p(x) → q(x)) ∧ (q(x) → r(x)) → (p(x) → r(x)) is true for every x, and the domain of any interpretation is nonempty.
On the other hand, the formula
(∃x)(p(x) → q(x)) ∧ (∃x)(q(x) → r(x)) → (∃x)(p(x) → r(x))
is not a tautology. To see this, take the domain of interpretation to be N ordered in the usual way,¶ interpret p(x), q(x), r(x) as x ≥ 0, x < 9, x < 0, respectively, and consider the formula
(2.3)    (∃x)(x ≥ 0 → x < 9) ∧ (∃x)(x < 9 → x < 0) → (∃x)(x ≥ 0 → x < 0).
The implication x ≥ 0 → x < 9 is true with x = 8 (since the antecedent and the consequent are both true). The implication x < 9 → x < 0 is true with x = 10 (since the antecedent and the consequent are both false). On the other hand, the implication x ≥ 0 → x < 0 is always false in N, since the antecedent is always true and the consequent is always false. Therefore, the antecedent of (2.3), i.e.:
(∃x)(x ≥ 0 → x < 9) ∧ (∃x)(x < 9 → x < 0)    (witnesses x = 8 and x = 10, respectively)
is true, while its consequent (∃x)(x ≥ 0 → x < 0) is false. This shows that the formula above is not a tautology: it has an interpretation, namely (2.3), that is false.
¶ This means that our language has a constant for each natural number and two binary (2-ary) predicate letters E, L, where E is interpreted as “is equal to” and L as “is less than.” The sentence x < y is really the sentence L(x, y), while the sentence w ≥ z is an abbreviation for E(w, z) ∨ L(z, w).
On the other hand, the formula
(∀x)(p(x) → q(x)) ∧ (∀x)(q(x) → r(x)) → (∀x)(p(x) → r(x))
is a tautology.
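Formula (2.3) can be evaluated mechanically. Since N is infinite, this Python sketch uses the finite sample 0, 1, . . . , 19 as an assumption; that is harmless here, because the witnesses 8 and 10 lie in the sample and x ≥ 0 → x < 0 is false for every natural number, not just for those in the sample.

```python
N_SAMPLE = range(20)  # finite stand-in for N (justified in the paragraph above)


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


first = any(implies(x >= 0, x < 9) for x in N_SAMPLE)     # witness x = 8
second = any(implies(x < 9, x < 0) for x in N_SAMPLE)     # witness x = 10
consequent = any(implies(x >= 0, x < 0) for x in N_SAMPLE)
print(first, second, consequent)               # True True False
print(implies(first and second, consequent))   # False: (2.3) is false here
print(implies(8 >= 0, 8 < 9), implies(10 < 9, 10 < 0))  # True True
```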
2.4.4 Example. Some Quantified Tautologies.
(1) (∃v)(φ) ↔ ¬(∀v)(¬(φ))
(2) (∀v)(φ) ↔ ¬(∃v)(¬(φ))
(3) (∃v1 )(∃v2 )(φ) ↔ (∃v2 )(∃v1 )(φ)
(4) (∀v1 )(∀v2 )(φ) ↔ (∀v2 )(∀v1 )(φ)
(5) (∀v)(φ1 ∧ φ2 ) ↔ (∀v)(φ1 ) ∧ (∀v)(φ2 )
(6) (∃v)(φ1 ∨ φ2) ↔ (∃v)(φ1) ∨ (∃v)(φ2)
(7) (∀v)(φ1 → φ2 ) → ((∃v)(φ1 ) → (∃v)(φ2 ))
(8) (∃v)(φ1 ∧ φ2 ) → (∃v)(φ1 ) ∧ (∃v)(φ2 )
(9) (∀v)(φ1 ) ∨ (∀v)(φ2 ) → (∀v)(φ1 ∨ φ2 )
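Items (8) and (9) are one-way implications only. The Python sketch below, assuming the two-element domain {0, 1} and letting φ1, φ2 range over all one-place predicates on it (each given by the set where it holds), confirms both implications and finds failures of their converses, for instance with φ1 true only at 0 and φ2 true only at 1. The helper names are our own.

```python
from itertools import product

D = [0, 1]  # a two-element domain already separates the two directions


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def predicates(domain):
    """All one-place predicates on the domain, each as the set where it is true."""
    for bits in product([False, True], repeat=len(domain)):
        yield {d for d, keep in zip(domain, bits) if keep}


results = {"(8)": True, "converse of (8)": True, "(9)": True, "converse of (9)": True}
for A, B in product(list(predicates(D)), repeat=2):
    ex_and = any(d in A and d in B for d in D)                   # (∃v)(φ1 ∧ φ2)
    and_ex = any(d in A for d in D) and any(d in B for d in D)   # (∃v)φ1 ∧ (∃v)φ2
    or_all = all(d in A for d in D) or all(d in B for d in D)    # (∀v)φ1 ∨ (∀v)φ2
    all_or = all(d in A or d in B for d in D)                    # (∀v)(φ1 ∨ φ2)
    results["(8)"] &= implies(ex_and, and_ex)
    results["converse of (8)"] &= implies(and_ex, ex_and)
    results["(9)"] &= implies(or_all, all_or)
    results["converse of (9)"] &= implies(all_or, or_all)
print(results)
# {'(8)': True, 'converse of (8)': False, '(9)': True, 'converse of (9)': False}
```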
2.4.5 Remarks. Inconsistent Hypotheses. It is extremely important to realize
that we need consistent hypotheses to obtain valid results. If we deduce q using an
argument based on the tautology
(2.4) p1 ∧ . . . ∧ pn ⇒ q,
we can deduce anything if the premises are inconsistent. To illustrate this point,
suppose that we use the premises p and ¬p. We show below that
p, ¬p |= q
for any q whatsoever. To see this, we can use the following line of reasoning. Since p, ¬p |= p ∧ ¬p and |= p → p ∨ q for any q whatsoever, we have
p, ¬p |= p ∧ ¬p |= (p ∨ q) ∧ ¬p |= (p ∧ ¬p) ∨ (q ∧ ¬p) |= q ∧ ¬p |= q,
where the step to q ∧ ¬p discards the contradiction p ∧ ¬p.
The above holds for arbitrary q. Therefore, it also holds for ¬q. No amount of
“logically correct” reasoning does any good if the premises are inconsistent: We
conclude that q and ¬q are both true. Since this cannot be, we cannot tolerate
inconsistent premises if we want a valid conclusion.
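The vacuous character of p, ¬p |= q is easy to see by brute force: no truth assignment makes both premises true, so the defining condition of |= is never violated. In this Python sketch, `entails` is a hypothetical helper that takes formulas as functions of a tuple of truth values.

```python
from itertools import product


def entails(hypotheses, conclusion, n_vars: int) -> bool:
    """Formulas are functions of a tuple of n_vars truth values."""
    for v in product([False, True], repeat=n_vars):
        if all(h(v) for h in hypotheses) and not conclusion(v):
            return False
    return True


p = lambda v: v[0]
not_p = lambda v: not v[0]
q = lambda v: v[1]
not_q = lambda v: not v[1]

print(entails([p, not_p], q, 2), entails([p, not_p], not_q, 2))  # True True
# The premises are never true together, so the condition is vacuous:
print(any(p(v) and not_p(v) for v in product([False, True], repeat=2)))  # False
```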
2.4.6 Remarks. Unlisted Hypotheses. Thanks to the laws of exportation and importation, we can rewrite
(2.5)    p1, . . . , pi−1, pi, pi+1, . . . , pn |= q    (pi is listed)
as
(2.6)    p1, . . . , pi−1, pi+1, . . . , pn |= pi → q    (pi is missing)
which says that if q follows from the hypotheses p1 , . . . , pn then pi → q follows from
the remaining hypotheses p1 , . . . , pi−1 , pi+1 , . . . , pn obtained by removing pi from the
original collection of hypotheses. Frequently, mathematical arguments are carried
out without listing all the hypotheses used, assuming that the reader is aware that
certain hypotheses are in place all the time. When this is the case, it is not unusual
to write pi |= q (or pi ⇒ q) instead of writing (2.6). Here, p1 , . . . , pi−1 , pi+1 , . . . , pn
are the hypotheses assumed to be in place by default.
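The equivalence of (2.5) and (2.6) can be spot-checked for propositional formulas in two letters. This Python sketch is an illustration under assumptions: a single hypothesis h stands in for the remaining hypotheses (finitely many hypotheses can be combined into their conjunction), formulas are represented by their truth tables, and `entails` and `arrow` are hypothetical helpers. It checks all 16³ choices of h, pi and q.

```python
from itertools import product

ROWS = list(product([False, True], repeat=2))  # assignments to two letters a, b


def entails(hyps, concl) -> bool:
    """hyps |= concl, with each formula given by its truth table (a dict over ROWS)."""
    return all(concl[r] for r in ROWS if all(h[r] for h in hyps))


def arrow(a, b):
    """Truth table of a -> b."""
    return {r: (not a[r]) or b[r] for r in ROWS}


# All 16 truth functions of two letters, each as a truth table.
TABLES = [dict(zip(ROWS, bits)) for bits in product([False, True], repeat=len(ROWS))]

exportation_ok = all(entails([h, pi], q) == entails([h], arrow(pi, q))
                     for h, pi, q in product(TABLES, repeat=3))
print(exportation_ok)  # True
```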
2.4.7 Exercise. Let N, Z, Q and R denote, respectively, the natural numbers, the integers, the rationals and the reals. For each of these as the domain of interpretation, explain whether the interpretation of each of the sentences given below is true or false.
1. (∃x)(∀y)(x = −y)
2. (∀y)(∃x)(x = −y)
3. (∀y)(∃x)(2x = y)
4. (∀y)(∃x)(x² = y)
5. (∀y)(∃x)(y² = x)
6. (∃x)(x² = x)
7. (∀x)(∀y)(∃z)(z = x + y)
8. (∃z)(∀y)(∃x)(z = x + y)
9. (∀x)(∃y)(x > 0 → x² = y)
10. (∀x)(∃y)(x > 0 → y² = x)
11. (∀x)(∀y)(x = y² → x ≥ 0)
12. (∀x)(∀y)(y = x² → x ≥ 0)
13. (∀x)(∀y)(x² = y² → x = y)
2.4.8 Exercise. Consider the following statements:
I. Some Mems are not Ens.
II. No Ens are Vees.
Consider a predicate language L with 1-place predicate symbols M, N, V (as well as any necessary logical connectives and quantifiers). The atomic formulas M(x), N(x), V(x) are interpreted as “x is a Mem”, “x is an En”, “x is a Vee”, respectively.
1. Write a formula φ in the language L whose interpretation is statement I.
2. Write a formula ψ in the language L whose interpretation is statement II.
3. Regard statements I. and II. as axioms (that is, assume that the interpretation
of both formulas φ and ψ is true). For each of the statements below, either
provide a proof using the axioms and tautological reasoning, or else provide
a counterexample showing the statement need not follow from the axioms.
(a) There are some Mems.
(b) Everybody fails to be an En, or fails to be a Vee.
(c) Some Mems are not Vees.
(d) Some Vees are not Mems.
(e) No Mem is a Vee.
(f) Some Mems are Vees.
Foundations Note 3
Sets

July 28, 2020
Contents
3.1 Introduction
3.2 Axioms of Extension and Separation
3.3 Intersections and Differences
3.4 Unions
3.5 Powers, Products and Ordered Pairs
3.6 Appendix A: The Axiom Schema of Separation
3.1 Introduction
We cannot bootstrap mathematics from nothing: we cannot define everything in terms of things previously defined, because we need a beginning. We need starting points. Thus, an undefined primitive notion is a notion that is not defined in terms of more elementary notions. The notion of set is such a notion.
As we shall show in Section 3.2.13 below, paradoxes creep in far too easily in
the development unless we specify quite carefully how sets may be constructed.
For this we need axioms. An axiom is simply a proposition we take to be true
without proof. The language in which axioms must be written is the very small
language of the predicate calculus that we describe below. We must choose our
axioms carefully and our language accurately.
We develop the rudiments of Zermelo-Fränkel set theory (ZF for short). Note that this brand of set theory comes in many flavors, and other sources are likely to contain an axiomatization somewhat different from the one given here. In ZF there are only sets and there is nothing else.
In developing a theory where only sets exist, we lose the intuitive notion of sets of “individuals,” i.e.: a fleet of ships does not exist in the theory (there are no ships, only sets). In other words, we can only talk about sets whose members are also sets. Not only is this not devastating, it is actually quite irrelevant for our purposes. The first batch of sets that we shall manufacture to get the process started are the “official” natural numbers (defined as sets).
Another undefined primitive notion is that of belonging to (being a member of, being an element of) a set. If A is a set which has a as a member we say that a belongs to A, or that a is in A, or that A contains a. We use the two-place predicate ∈ (a variant of the Greek letter epsilon, which is usually written ϵ or ε) and write x ∈ A (read: “x is an element of A,” or “x belongs to A,” or “x is in A,” or “A contains x”) to denote that x is a member of A.
We write x ∉ A for ¬(x ∈ A) (read: “x is not an element of A,” or “x does not belong to A,” or “x is not in A,” or “A does not contain x”).
As it would be rather pointless to develop a theory that applies to nothing, we
may as well suppose from the very beginning that a set exists. But this assumption
is only temporary. Eventually, we will introduce an axiom to give us the existence
of a very special set (an inductive set), a set that will enable us to deduce the
existence of a set to replace our intuitive N (the natural numbers) with a formal
one: a set whose existence follows from the axioms presented. Everything else will
be built from there.
In summary, our language LZF of ZF set theory has infinitely many variable
letters, interpreted as names of sets (undefined primitive objects); no constant
letters; no function letters; and a single two-place predicate letter, interpreted as
the set membership relation, a primitive (undefined) predicate. Everything else
will be defined. Below, we begin by defining equality of sets.
3.1.1 Definition (Range of Quantification). When the quantification shown in a
formula is of the form ∀x or ∃y, as in
(∀x)(∃y)(φ(x, y)),
the domain of interpretation (also called range of quantification) for the variables
x and y is taken to be unrestricted. That is, x and y can be any sets whatsoever.
When the domain of a quantified variable is restricted to be a specific set, we may
indicate that set explicitly, as in
(∀x ∈ X)(∃y ∈ Y )(φ(x, y)),
where x is restricted to be in X and y is restricted to be in Y .
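For finite sets, the restricted quantifiers (∀x ∈ X) and (∃y ∈ Y) correspond to Python's `all` and `any` taken over X and Y. The sketch below also spells out the usual reading of restricted quantifiers, (∀x)(x ∈ X → φ) and (∃y)(y ∈ Y ∧ φ), over a finite stand-in universe of integers. The universe, the sets and the formula φ are all hypothetical choices for illustration, since in ZF the members would themselves be sets.

```python
def forall_in(X, pred) -> bool:
    """(∀x ∈ X) pred(x) for a finite X."""
    return all(pred(x) for x in X)


def exists_in(Y, pred) -> bool:
    """(∃y ∈ Y) pred(y) for a finite Y."""
    return any(pred(y) for y in Y)


universe = range(10)                  # finite stand-in for "all sets"
X, Y = {1, 2, 3}, {2, 4, 6}


def phi(x, y) -> bool:                # hypothetical formula φ(x, y): y = 2x
    return y == 2 * x


# Restricted form: (∀x ∈ X)(∃y ∈ Y) φ(x, y)
restricted = forall_in(X, lambda x: exists_in(Y, lambda y: phi(x, y)))
# Unrestricted reading: (∀x)(x ∈ X → (∃y)(y ∈ Y ∧ φ(x, y)))
unrestricted = all((x not in X) or any(y in Y and phi(x, y) for y in universe)
                   for x in universe)
print(restricted, unrestricted)  # True True
```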

3.2 Axioms of Extension and Separation

3.2.1 Axiom of Extension. Equality of sets is denoted by the usual equality
symbol = and is defined as follows. Two sets are equal if and only if they have
precisely the same members:
(∀A)(∀B)[A = B ↔ (∀x)(x ∈ A ↔ x ∈ B)].
3.2.2 Proposition (Properties of Set Equality).
(3.1) (∀A)(A = A),
(3.2) (∀A)(∀B)(A = B → B = A),
(3.3) (∀A)(∀B)(∀C)(A = B ∧ B = C → A = C).

(The above three properties are called reflexivity, symmetry and transitivity, respectively.)
Proof. Property (3.1) is immediate: We have
A = A ↔ (∀x)(x ∈ A ↔ x ∈ A),
which is a tautology. Property (3.2) follows from the commutativity of ↔:
A = B ⇒ (∀x)(x ∈ A ↔ x ∈ B)
⇒ (∀x)(x ∈ B ↔ x ∈ A)
⇒ B = A.
Finally, (3.3) follows thanks to the tautologies
(∀x)(p) ∧ (∀x)(q) ⇔ (∀x)(p ∧ q) and [(r ↔ s) ∧ (s ↔ t)] ⇒ (r ↔ t).
We have
A = B ∧ B = C ⇒ (∀x)(x ∈ A ↔ x ∈ B) ∧ (∀x)(x ∈ B ↔ x ∈ C)
⇒ (∀x)[(x ∈ A ↔ x ∈ B) ∧ (x ∈ B ↔ x ∈ C)]
⇒ (∀x)(x ∈ A ↔ x ∈ C)
⇒ A = C.
The second step applies the first tautology, with p the formula x ∈ A ↔ x ∈ B and q the formula x ∈ B ↔ x ∈ C. The third step applies the second tautology, with r, s and t the formulas x ∈ A, x ∈ B and x ∈ C, respectively.
3.2.3 Remark (Distinct Sets). We write A ≠ B for ¬(A = B). Thanks to the
tautology ¬(p → q) ⇔ p ∧ ¬q, together with De Morgan’s laws and the rules for negating quantifiers, we have
¬(A = B) ⇔ ¬(∀x)(x ∈ A ↔ x ∈ B)
⇔ ¬(∀x)[(x ∈ A → x ∈ B) ∧ (x ∈ B → x ∈ A)]
⇔ (∃x)¬[(x ∈ A → x ∈ B) ∧ (x ∈ B → x ∈ A)]
⇔ (∃x)[¬(x ∈ A → x ∈ B) ∨ ¬(x ∈ B → x ∈ A)]
⇔ (∃x)[(x ∈ A ∧ x ∉ B) ∨ (x ∈ B ∧ x ∉ A)]
⇔ (∃x)(x ∈ A ∧ x ∉ B) ∨ (∃x)(x ∈ B ∧ x ∉ A).
In words: the sets A and B are distinct if and only if there is something in A that
is not in B or there is something in B that is not in A.
3.2.4 Remarks. If a set has precisely the members a₁, . . . , aₙ, we denote it by
{a₁, . . . , aₙ}. The set {a} which has a as the only member is called the singleton of
a. More generally, a one-member set may be called a singleton, without specifying
the member.
Sets have no ordering. An object either belongs to a set, or does not belong
to that set. It can neither belong twice nor can it belong before or after some
other object in the same set. (The order in which the elements of a set are listed
is immaterial.) For example, assuming for now that the objects listed are sets, we
have
{1, 1, 1} = {1, 1} = {1}
≠ {1, 2} = {2, 1}
≠ {1, {1, 2}} = {1, {2, 1}}
≠ {2, {2, 1}} = {2, {1, 2}} = {{1, 2}, 2} = {{2, 1}, 2}.
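As an informal illustration outside the formal theory, finite sets can be modeled by Python’s built-in frozenset. Equality of frozensets ignores order and repetition, just as the axiom of extension requires. The sketch assumes, as the text does, that the integers 1 and 2 stand in for sets.

```python
# Informal model: two frozensets are equal exactly when they have the same
# members, mirroring the axiom of extension.
one_a = frozenset([1, 1, 1])
one_b = frozenset([1])
pair_12 = frozenset([1, 2])
pair_21 = frozenset([2, 1])

# frozensets are hashable, so they can themselves be members of frozensets.
nested_a = frozenset([1, frozenset([1, 2])])   # {1, {1, 2}}
nested_b = frozenset([1, frozenset([2, 1])])   # {1, {2, 1}}
nested_c = frozenset([frozenset([1, 2]), 2])   # {{1, 2}, 2}

print(one_a == one_b)        # True: repetition is immaterial
print(pair_12 == pair_21)    # True: order is immaterial
print(one_b == pair_12)      # False
print(nested_a == nested_b)  # True
print(nested_a == nested_c)  # False
```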
While an axiom is a proposition we take to be true (a universally applicable
premise), an axiom schema is a recipe for creating axioms. The axiom schema given
below lets us obtain new sets from old ones.
3.2.5 Axiom Schema of Separation. Let φ(x) be a formula where the variable
x is free and where the variable B is not free. Then, for every such φ(x) and every
set A there is a set B consisting of precisely those elements x of A for which φ(x)
is true:
(∀A)(∃B)(∀x)[x ∈ B ↔ x ∈ A ∧ φ(x)].
The set B defined above is denoted by
B = {x ∈ A | φ(x)}
(read: “B is the set of all x in A such that φ(x)”). Note: We must have a set A
before we can define the set B.
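In the informal frozenset model, separation corresponds to filtering an existing finite set by a condition. The sketch below is a hypothetical helper `separate`, written for illustration only. Each Python predicate plays the role of one formula φ(x), so each choice of predicate corresponds to one instance of the schema.

```python
def separate(A, phi):
    """Return {x ∈ A | phi(x)}: the members of A that satisfy phi.

    Like the axiom schema of separation, this can only carve a new set
    out of an already given set A, never out of "everything".
    """
    return frozenset(x for x in A if phi(x))

A = frozenset([0, 1, 2, 3, 4, 5])
evens = separate(A, lambda x: x % 2 == 0)
print(sorted(evens))  # [0, 2, 4]

# A condition that no x satisfies yields the same (empty) set from any A.
empty_from_A = separate(A, lambda x: x != x)
empty_from_B = separate(frozenset(["a", "b"]), lambda x: x != x)
print(empty_from_A == empty_from_B == frozenset())  # True
```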
Since we have an axiom for each formula φ(x), we have infinitely many axioms,
hence the name axiom schema. (The axiom schema of separation shall also be
called, colloquially and informally, the axiom of subsets.) Note that we cannot
write something like ∀φ, since a formula is a metalinguistic notion, outside of the
language we have chosen to develop set theory. We can quantify over sets, and
only sets, the only objects in our theory.
3.2.6 Definition (The Empty Set). If any set A exists, so must the set ∅ (read:
“empty set”) defined by
(3.4) ∅ = {x ∈ A | x ≠ x}.
Thanks to the axiom schema of separation, (3.4) defines a set. To show that ∅ has
no elements, whatever the choice of A used in (3.4), note that if x ∉ A then x
cannot be in ∅. If x ∈ A then x cannot be in ∅ either, since x = x always
(2.2.1). In other words: (∀x)(x ∉ ∅); there is no x which belongs to ∅.
Could there be another set, different from ∅, that has no elements? Suppose
that E is any set with no elements. Then x ∈ ∅ ⇒ x ∈ E, since x ∈ ∅ is always
false. Similarly, x ∈ E ⇒ x ∈ ∅, since x ∈ E is always false. We have shown that
x ∈ ∅ ⇔ x ∈ E.
Thanks to the axiom of extension, ∅ = E. This shows that any two empty sets are
identical: There is only one empty set.
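The uniqueness argument can be checked in a small, informal way with frozensets. Applying the condition x ≠ x of (3.4) to several arbitrarily chosen starting sets always yields the same empty set. The helper `separate` and the sample sets are assumptions made for this illustration only.

```python
def separate(A, phi):
    # {x ∈ A | phi(x)}, as licensed by the axiom schema of separation
    return frozenset(x for x in A if phi(x))

never = lambda x: x != x  # the condition used in (3.4); false for every x here

# Starting sets chosen arbitrarily for illustration.
starting_sets = [
    frozenset([1, 2, 3]),
    frozenset(["a"]),
    frozenset([frozenset(), frozenset([frozenset()])]),
]
empties = [separate(A, never) for A in starting_sets]

# Every choice of A yields a set with no members, and all of them are equal.
print(all(len(E) == 0 for E in empties))      # True
print(all(E == empties[0] for E in empties))  # True
```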
3.2.7 Definition (Subsets). If A and B are sets and every element of B is also an
element of A, we say that B is a subset of A and write B ⊂ A, or A ⊃ B (read:
“A is a superset of B”). The defining formula is
(∀A)(∀B)[B ⊂ A ↔ (∀x)(x ∈ B → x ∈ A)].
We also call the phrase “B ⊂ A” an inclusion and say that A includes B or that
B is included in A. If B is a subset of A, but B ≠ A, we say that B is a proper
subset of A.
To deny B ⊂ A or A ⊃ B, we write B ⊄ A (read: “B is not a subset of A”) or
A ⊅ B (read: “A is not a superset of B”), respectively. We have
¬(B ⊂ A) ⇔ ¬(∀x)(x ∈ B → x ∈ A)
⇔ (∃x)[¬(x ∈ B → x ∈ A)]
⇔ (∃x)(x ∈ B ∧ x ∉ A).
In words: B is not a subset of A if and only if there is something in B that is not
in A.
3.2.8 Example. Given the set A = {a, b, c}, its subsets are ∅, {a}, {b}, {c}, {a, b},
{a, c}, {b, c}, {a, b, c}. Except for the last, they all are proper subsets of A. Do not
confuse the notion of set membership with the notion of subsets: in this example
a ∈ A, but a ⊄ A. For B = {a, {a}}, we have {a} ∈ B and {a} ⊂ B.
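Subsets of a small finite set can be listed mechanically. The sketch below uses `itertools.combinations` and the informal frozenset model, where Python’s `<=` plays the role of the text’s ⊂ (equality allowed) and `in` plays the role of ∈. The helper name `subsets` is hypothetical.

```python
from itertools import chain, combinations

def subsets(A):
    """All subsets of the finite set A, each returned as a frozenset."""
    items = list(A)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

A = frozenset(["a", "b", "c"])
subs = subsets(A)
proper = [S for S in subs if S != A]
print(len(subs), len(proper))  # 8 7

# Membership versus inclusion, with B = {a, {a}}:
a = "a"
B = frozenset([a, frozenset([a])])
print(frozenset([a]) in B)  # True: {a} ∈ B
print(frozenset([a]) <= B)  # True: {a} ⊂ B
```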
3.2.9 Theorem (Singletons). The singleton of every member of a nonempty set
is a set.
Proof. If A is not empty, for every member a ∈ A, we have
{a} = {x ∈ A | x = a}.
3.2.10 Remark (Russell’s Paradox). The argument used in the proof of the theorem
below is known as Russell’s paradox. Of course, in the present situation it
does not lead to a paradox; but it can if the axioms are not specified properly.
3.2.11 Theorem (No Set Contains Everything). For every set A there is a set B
such that B ∉ A.
Proof. Let A be any set. Use the axiom schema of separation, with φ(x) the
formula x ∉ x, to define the set
(3.5) B = {x ∈ A | x ∉ x}.
If B ∉ A, there is nothing to prove and we are finished.
We show that B ∈ A leads to a contradiction (and is, therefore, impossible).
Accordingly, suppose that B ∈ A. Either B ∈ B or B ∉ B.
If B ∈ B, then, thanks to (3.5), φ(B) is true; that is, B ∉ B. We obtain
(3.6) B ∈ A → (B ∈ B → B ∉ B).
If B ∉ B, then φ(B) is true. Since also B ∈ A, (3.5) gives B ∈ B. We obtain
(3.7) B ∈ A → (B ∉ B → B ∈ B).
From (3.6) and (3.7) we obtain
B ∈ A → (B ∈ B ↔ B ∉ B),
whose consequent is a contradiction. From the above and the tautology
[p → (q ↔ ¬q)] → ¬p
we conclude that B ∉ A.
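The construction in (3.5) can be tried informally on small frozensets. In this model no frozenset is a member of itself, so B = {x ∈ A | x ∉ x} turns out to be all of A, and B ∉ A as the theorem predicts. The helper `russell_set` and the sample sets are assumptions made for illustration. The check covers only these examples and proves nothing in general.

```python
def russell_set(A):
    """B = {x ∈ A | x ∉ x}, built by separation as in (3.5)."""
    return frozenset(x for x in A if not (isinstance(x, frozenset) and x in x))

# A few small sets whose members are themselves sets built from the empty set.
e = frozenset()
one = frozenset([e])
two = frozenset([e, one])
examples = [e, one, two, frozenset([one, two])]

for A in examples:
    B = russell_set(A)
    # Theorem 3.2.11 predicts B ∉ A for every A.
    print(B in A)  # False each time
```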
3.2.12 Corollary (Nonexistence of a Universal Set). There is no set such that all
sets belong to it.
Proof. If A is any such set, we apply the theorem to A to obtain a set B ∉ A,
showing that not all sets belong to A.
3.2.13 Remarks (Origin of Russell’s Paradox). We illustrate the origin of Russell’s
paradox. Suppose for a moment that we replace the axiom schema of separation
with the axiom schema of abstraction: For every φ(x), where x is free, there is a
set X such that, for every x,
x ∈ X ↔ φ(x).
Define the set A using x ∉ x for φ(x). Then, we obtain the set A such that
x ∈ A ↔ x ∉ x.
If A ∈ A, we conclude that A ∉ A. If A ∉ A, we conclude that A ∈ A. Since
the axiom asserts that the set A exists, we cannot deny its existence. Hence, the
axiom leads to a contradiction: a statement we have agreed to accept as true
implies A ∈ A ↔ A ∉ A. We have a paradox.
This is the situation in which an axiomatization proposed by Frege∗ in 1893
was refuted by Russell.† In 1908, Zermelo proposed that the axiom schema of
abstraction be replaced with the axiom schema of separation.‡ Actually, Russell’s
paradox is predated by the Burali-Forti paradox.§
It is also very important to note that if, in the axiom schema of separation, we
allowed the variable B to be free in φ(x), we could write
B = {x ∈ A | x ∉ B}.
If A is the empty set, then B is also the empty set. If A is not empty, we have
a ∈ A for some a. Then,
a ∈ B → a ∉ B
and
a ∉ B → a ∈ B,
whence
a ∈ B ↔ a ∉ B,
a contradiction. Hence, the existence of the set B, which this relaxed version of the
axiom schema would guarantee, leads to a contradiction. We cannot escape contradiction
by insisting that B is always empty, for if B is empty and A is not, there must be
some a ∈ A with a ∉ B, whence a ∈ B, making B nonempty.
∗ Gottlob Frege, Grundgesetze der Arithmetik, 1893.
† See Patrick Suppes, Axiomatic Set Theory, Dover, New York, 1972, pp. 5–12.
‡ Ernst Zermelo, “Untersuchungen über die Grundlagen der Mengenlehre,” Math. Annalen,
1908, Vol. 65, pp. 261–281.
§ Cesare Burali-Forti, “Una questione sui numeri transfiniti,” Rendiconti del Circolo Mat. di
Palermo, 1897, Vol. 11, pp. 154–164.
3.3 Intersections and Differences

3.3.1 Definition (The Intersection of Two Sets). For any two sets A and B, their
intersection A ∩ B (read: “A intersection B”) is the set containing those elements
that A and B have in common (see Figure 3.1):
A ∩ B = {x ∈ A | x ∈ B},
or
A ∩ B = {x ∈ B | x ∈ A}.
In other words:
x ∈ A ∩ B ⇔ x ∈ A ∧ x ∈ B.
Figure 3.1 (Venn diagram of A and B): The shaded portion represents A ∩ B.
3.3.2 Definition (Disjoint Sets). Two sets are said to be disjoint just in case they
have no element in common, i.e.: A and B are disjoint iff A ∩ B = ∅.
3.3.3 Definition (Set Difference). The set difference A \ B (read: “A minus B”)
is the set whose members are all the elements of A that are not members of B.
That is,
A \ B = {x ∈ A | x ∉ B}.
In other words: x ∈ A \ B ⇔ x ∈ A ∧ x ∉ B. (See Figure 3.2.)
Figure 3.2 (Venn diagram of A and B): The shaded portion represents A \ B.
3.3.4 Definition (Arbitrary Intersections). Let S be a nonempty collection of sets
and let A denote any set in S. Define the intersection
∩S = {x ∈ A | (∀y ∈ S)(x ∈ y)}.
That the above definition does not depend on the choice of A is clear upon examining
the condition for membership in ∩S (since A is an element of S and, therefore,
is one of the y’s):
x ∈ ∩S ⇔ (∀y ∈ S)(x ∈ y).
Alternate notations are:
∩_{A∈S} A and ∩{A | A ∈ S}.
3.3.5 Examples.
1. Let A = {0, 1, 3, 4, 5} and B = {2, 3, 5, 6}. Then, A ∩ B = {3, 5}.
2. Let A = {a, b, c, d} and B = {a, c, e}. Then, A \ B = {b, d}.
3. For every set A we have that A \ A = ∅ and A \ ∅ = A.
4. Let R denote the reals and Q denote the rationals. Then R \ Q denotes the
irrationals.
5. Let A = {1, 2, 3}, B = {0, 1, 2}, C = {2, 3, 4} and S = {A, B, C}. Then
∩S = {2}.
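The examples above can be reproduced with Python’s frozenset operators, where `&` is ∩ and `-` is \. This is an informal sketch only. The helper `big_intersection` is a hypothetical rendering of Definition 3.3.4 that picks the first member of S as its A and then shows the result does not change when the order of S changes.

```python
def big_intersection(S):
    """∩S for a nonempty finite list S of sets (Definition 3.3.4)."""
    S = list(S)
    if not S:
        raise ValueError("∩S is only defined for nonempty S")
    A = S[0]  # any member of S would do; the result is the same
    return frozenset(x for x in A if all(x in y for y in S))

# Examples 1-3
A = frozenset({0, 1, 3, 4, 5})
B = frozenset({2, 3, 5, 6})
print(sorted(A & B))  # [3, 5]

P = frozenset("abcd")
Q = frozenset("ace")
print(sorted(P - Q))  # ['b', 'd']
print(A - A == frozenset(), A - frozenset() == A)  # True True

# Example 5
A5, B5, C5 = frozenset({1, 2, 3}), frozenset({0, 1, 2}), frozenset({2, 3, 4})
print(sorted(big_intersection([A5, B5, C5])))  # [2]
print(big_intersection([C5, A5, B5]) == big_intersection([A5, B5, C5]))  # True
```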
3.4 Unions
According to the axiom schema of separation, we must specify a set C to define
the intersection
A ∩ B = {x ∈ C | x ∈ A ∧ x ∈ B}.
We can take C to be A or B. It makes no difference. Unfortunately, we have no
mechanism for defining a set C such that x ∈ C ⇔ x ∈ A ∨ x ∈ B, since neither
A nor B will do: if we choose C = A, the set we define is simply A, and if we
choose C = B we get B. Since A and B are arbitrary sets, we need an axiom to
guarantee the existence of such a C for every choice of A and B.
Instead of beginning by defining the union of just two sets, for technical reasons,
we start by defining the union of arbitrarily many sets below.
3.4.1 Axiom of Unions. Given any collection S of sets, there is a set which
contains all the elements of the members of S. That is, given any set S, there is a
set U such that
A ∈ S ∧ x ∈ A ⇒ x ∈ U.
Of course, U may contain much more. Put more precisely, we have:
(∀S)(∃U)(∀A)(∀x)(A ∈ S ∧ x ∈ A → x ∈ U).
3.4.2 Definition (Unions). Suppose we have a collection S of sets. Let U denote
the set given by the axiom of unions. Define the union of the sets in S by
∪S = {x ∈ U | x ∈ A for some A ∈ S};
or, less informally:
∪S = {x ∈ U | (∃A ∈ S)(x ∈ A)}.
3.4.3 Remark. Alternate notations for ∪S are:
∪_{A∈S} A and ∪{A | A ∈ S}.
If the collection S consists of precisely two sets, S = {A, B}, we write
A ∪ B or ∪_{X∈{A,B}} X for ∪{A, B}.
From the above, given any two sets A and B, can we form their union A ∪ B?
Of course we can, if A and B are both members of some set C. However, we do not
know whether such a set C exists for every pair of sets A and B. Therefore, we
cannot answer affirmatively unless we introduce another axiom.
3.4.4 Axiom of Pairing. Any two sets are both members of some set:
(∀A)(∀B)(∃C)(A ∈ C ∧ B ∈ C).
3.4.5 Definition (Unordered Pairs). Thanks to the axiom of pairing, any two sets
A and B belong to a third set C. But, C may contain more. The axiom schema
of separation gives us the set containing precisely A and B:
{A, B} = {x ∈ C | x = A ∨ x = B}.
3.4.6 Remark. Thanks to the axiom of pairing, for any two sets A and B the
unordered pair {A, B} is a set. The axiom of unions gives us a set U which contains
the elements of A and B. The axiom schema of separation gives us the set
A ∪ B = {x ∈ U | x ∈ A ∨ x ∈ B}.
(See Figure 3.3.)
Figure 3.3 (Venn diagram of A and B): The shaded portion represents A ∪ B.
3.4.7 Example. If A = {a, b, c} and B = {b, c, d, e}, then A ∪ B = {a, b, c, d, e}
and A ∩ B = {b, c}. Clearly, it is always the case that A and B are both subsets of
A ∪ B, and that A ∩ B is a subset of both A and B.
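In the informal frozenset model, pairing and the union of a collection can be sketched as follows. The helpers `pair` and `big_union` are hypothetical names. Note that the finite model quietly supplies the ambient set U that, in the formal theory, only the axiom of unions guarantees.

```python
def big_union(S):
    """∪S for a finite collection S of sets (Definition 3.4.2)."""
    # Collect every member of every member of S.
    return frozenset(x for A in S for x in A)

def pair(A, B):
    """The unordered pair {A, B} (Definition 3.4.5)."""
    return frozenset([A, B])

A = frozenset("abc")
B = frozenset("bcde")
union = big_union(pair(A, B))
print(sorted(union))                 # ['a', 'b', 'c', 'd', 'e']
print(sorted(A & B))                 # ['b', 'c']
print(A <= union and B <= union)     # True
print(pair(A, A) == frozenset([A]))  # True: {A, A} = {A}
```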
3.4.8 Proposition (Singletons). The singleton of a set is a set.
Proof. When A = B, the unordered pair {A, B} is {A, A} = {A}. This says that
if A is a set, so is the singleton {A} of that set.
3.4.9 Proposition (There Is No Biggest Set). Every set is a proper subset of some
other set.
Proof. Given the set A there is a set B ∉ A (Theorem 3.2.11). The singleton {B}
of B is a set. The union A ∪ {B} of the sets A and {B} is a set. Then A is a
proper subset of A ∪ {B} because B ∉ A.
3.4.10 Proposition (De Morgan’s Laws). For any three sets A, B and C we have
(3.8) A \ (B ∪ C) = (A \ B) ∩ (A \ C),
(3.9) A \ (B ∩ C) = (A \ B) ∪ (A \ C).
Proof. We prove (3.8): We have, for all x,
x ∈ A \ (B ∪ C) ⇔ x ∈ A ∧ x ∉ (B ∪ C)
⇔ x ∈ A ∧ ¬(x ∈ B ∨ x ∈ C)
⇔ x ∈ A ∧ (x ∉ B ∧ x ∉ C)
⇔ (x ∈ A ∧ x ∉ B) ∧ (x ∈ A ∧ x ∉ C)
⇔ x ∈ A \ B ∧ x ∈ A \ C
⇔ x ∈ (A \ B) ∩ (A \ C).
We prove (3.9): We have, for all x,
x ∈ A \ (B ∩ C) ⇔ x ∈ A ∧ x ∉ (B ∩ C)
⇔ x ∈ A ∧ ¬(x ∈ B ∧ x ∈ C)
⇔ x ∈ A ∧ (x ∉ B ∨ x ∉ C)
⇔ (x ∈ A ∧ x ∉ B) ∨ (x ∈ A ∧ x ∉ C)
⇔ x ∈ A \ B ∨ x ∈ A \ C
⇔ x ∈ (A \ B) ∪ (A \ C).
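As a sanity check, and not a substitute for the proof above, both laws can be verified by brute force over every triple of subsets of a small universe. The universe {0, 1, 2, 3} is an arbitrary choice for illustration, and `subsets` is a hypothetical helper.

```python
from itertools import chain, combinations

def subsets(U):
    """All subsets of the finite set U, as frozensets."""
    items = list(U)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

U = frozenset(range(4))  # a small universe, chosen for illustration
all_sets = subsets(U)    # its 16 subsets

count = 0
for A in all_sets:
    for B in all_sets:
        for C in all_sets:
            assert A - (B | C) == (A - B) & (A - C)  # (3.8)
            assert A - (B & C) == (A - B) | (A - C)  # (3.9)
            count += 1
print(count)  # 4096 triples checked
```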
3.5 Powers, Products and Ordered Pairs

Coordinates are fundamental in mathematics. For example, the familiar xy plane
consists of all ordered pairs of real numbers. While the construction of the real
numbers is still far away, we lay the foundations required to construct sets of
ordered pairs.
3.5.1 Axiom of Powers. All subsets of a given set are elements of some other
set. That is, given any set A there is a set P such that if B is a subset of A then
B ∈ P:
(∀A)(∃P)(∀B)(B ⊂ A → B ∈ P).
3.5.2 Definition (The Power Set of a Set). With every set A we associate the set
P(A), the set of all subsets of A, called the power set of A:
P(A) = {x ∈ P | x ⊂ A},
where the set P is the set given by the axiom of powers.

3.5.3 Example. We have P(∅) = {∅} and
P({1, 2, 3}) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.
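For finite sets, the power set can be computed directly. The hypothetical `power_set` below returns a frozenset of frozensets. This is the finite model’s counterpart of separating {x ∈ P | x ⊂ A}, although here no ambient set P from the axiom of powers is needed.

```python
from itertools import chain, combinations

def power_set(A):
    """P(A) for a finite set A, as a frozenset of frozensets."""
    items = list(A)
    return frozenset(frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))

empty = frozenset()
print(power_set(empty) == frozenset([empty]))  # True: P(∅) = {∅}
P3 = power_set(frozenset({1, 2, 3}))
print(len(P3))                  # 8
print(frozenset({1, 3}) in P3)  # True
```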
3.5.4 Remark. When we list two objects, a and b, in that order, we speak of the
ordered pair “first a, then b”. We denote this ordered pair by (a, b). Instead of
introducing a new undefined primitive notion, we shall define (a, b) as a set. To do
this, we need to clear up some technicalities first.
3.5.5 Proposition (Background for Ordered Pairs). For every pair of sets A and
B and for every x ∈ A and every y ∈ B
{{x}, {x, y}} ∈ P(P(A ∪ B)).
Proof. For any two sets A and B we have:
x ∈ A ⇒ {x} ⊂ A ⇒ {x} ⊂ A ∪ B ⇒ {x} ∈ P(A ∪ B),
and
x ∈ A ∧ y ∈ B ⇒ {x} ⊂ A ∪ B ∧ {y} ⊂ A ∪ B
⇒ {x, y} ⊂ A ∪ B
⇒ {x, y} ∈ P(A ∪ B).
Since {x} and {x, y} are both members of P(A ∪ B), we have
{{x}, {x, y}} ⊂ P(A ∪ B),
which implies
{{x}, {x, y}} ∈ P(P(A ∪ B)).
3.5.6 Definition (Ordered Pairs). The ordered pair (x, y) is defined by
(x, y) = {{x}, {x, y}}.
3.5.7 Proposition (Equality of Ordered Pairs). Two ordered pairs (x, y) and
(x′, y′) are equal if and only if x = x′ and y = y′.
Proof. We need to prove that for all x, y, x′ and y′ we have
(1) (x, y) = (x′, y′) ⇔ x = x′ ∧ y = y′.
Clearly, x = x′ ∧ y = y′ ⇒ (x, y) = (x′, y′). This proves (1) from right to left.
For the proof of (1) from left to right, we need to establish
(x, y) = (x′, y′) ⇒ x = x′ ∧ y = y′.
Suppose that (x, y) = (x′, y′). We have two cases: x = y and x ≠ y.
If x = y then (x, y) = {{x}}. To have {{x}} = (x′, y′), the ordered pair (x′, y′)
must be the singleton of {x} also, which means that x′ = y′ = x.
If x ≠ y then (x, y) is a set with two elements. So, (x′, y′) must also be
a set with two elements, i.e.: x′ ≠ y′. For x ≠ y and x′ ≠ y′, the sets {x} and {x′}
are the only one-element members of {{x}, {x, y}} and {{x′}, {x′, y′}}, respectively.
Hence the equality {{x}, {x, y}} = {{x′}, {x′, y′}} implies {x} = {x′} (implying x = x′)
and implies {x, y} = {x′, y′} (implying y ∈ {x′, y′}). Now, y ∈ {x′, y′} implies y = x′ or y = y′.
That y = x′ is impossible, since x′ = x ≠ y. The only possibility left is that
y = y′.
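The defining property of ordered pairs can be checked exhaustively on a small range of stand-in values, using the Kuratowski definition {{x}, {x, y}} built from frozensets. The function name `kpair` and the range of values are illustrative assumptions. The check is a sanity test of Proposition 3.5.7 on these values, not a proof.

```python
def kpair(x, y):
    """Kuratowski ordered pair (x, y) = {{x}, {x, y}} (Definition 3.5.6)."""
    return frozenset([frozenset([x]), frozenset([x, y])])

values = range(4)  # small stand-ins for sets
ok = all(
    (kpair(x, y) == kpair(u, v)) == (x == u and y == v)
    for x in values for y in values for u in values for v in values
)
print(ok)  # True
print(kpair(1, 1) == frozenset([frozenset([1])]))  # True: (x, x) = {{x}}
print(kpair(1, 2) == kpair(2, 1))  # False: order now matters
```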
3.5.8 Definition (Cartesian Products). Given two sets A and B, for x ∈ A and
y ∈ B, the set {{x}, {x, y}} belongs to P(P(A ∪ B)). We define the Cartesian (or
cross) product A × B (read: “A cross B”) of A and B by
A × B = {(x, y) ∈ P(P(A ∪ B)) | x ∈ A ∧ y ∈ B}.
3.5.9 Example. We prove that (A × B) \ (C × D) = [(A \ C) × B] ∪ [A × (B \ D)].
Both sides consist of ordered pairs, so it suffices to show that for all ordered pairs (x, y) we have:
(x, y) ∈ (A × B) \ (C × D) ⇔ (x, y) ∈ A × B ∧ (x, y) ∉ C × D
⇔ x ∈ A ∧ y ∈ B ∧ ¬(x ∈ C ∧ y ∈ D)
⇔ x ∈ A ∧ y ∈ B ∧ (x ∉ C ∨ y ∉ D)
⇔ (x ∈ A ∧ y ∈ B ∧ x ∉ C) ∨ (x ∈ A ∧ y ∈ B ∧ y ∉ D)
⇔ (x ∈ A \ C ∧ y ∈ B) ∨ (x ∈ A ∧ y ∈ B \ D)
⇔ (x, y) ∈ (A \ C) × B ∨ (x, y) ∈ A × (B \ D)
⇔ (x, y) ∈ [(A \ C) × B] ∪ [A × (B \ D)].
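The identity can be checked on a concrete instance. This informal sketch uses Python tuples from `itertools.product` in place of Kuratowski pairs. That substitution is an assumption for convenience, justified only in that tuples obey the same equality property as Proposition 3.5.7. The sample sets are arbitrary.

```python
from itertools import product

def cross(A, B):
    """A × B, with Python tuples standing in for ordered pairs."""
    return frozenset(product(A, B))

A = frozenset({1, 2, 3}); B = frozenset({"p", "q"})
C = frozenset({2, 3, 4}); D = frozenset({"q", "r"})

lhs = cross(A, B) - cross(C, D)
rhs = cross(A - C, B) | cross(A, B - D)
print(lhs == rhs)   # True
print(sorted(lhs))  # [(1, 'p'), (1, 'q'), (2, 'p'), (3, 'p')]
```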
3.5.10 Exercise. Commutativity: Prove the following for any two sets A and B.
(a) A ∪ B = B ∪ A,
(b) A ∩ B = B ∩ A.
3.5.11 Exercise. Associativity: Prove the following for any three sets A, B,
and C.
(a) (A ∪ B) ∪ C = A ∪ (B ∪ C),
(b) (A ∩ B) ∩ C = A ∩ (B ∩ C).
3.5.12 Exercise. Distributivity: Prove the following for any three sets A, B,
and C.
(a) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C),
(b) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
3.5.13 Exercise. Prove the following for any two sets A and B.
(a) A = B ⇔ A ⊂ B ∧ B ⊂ A,
(b) A ⊂ B ⇔ A ∪ B = B,
(c) A ⊂ B ⇔ A ∩ B = A,
(d) A ⊂ B ⇔ A \ B = ∅.
3.5.14 Exercise. Prove the following for any four sets A, B, C and D.
(a) A \ (A \ B) = A ∩ B,
(b) A ⊂ B ∧ B ⊂ C ⇒ A ⊂ C,
(c) A ∩ (B \ C) = (A ∩ B) \ (A ∩ C),
(d) (A ∪ B) × C = (A × C) ∪ (B × C),
(e) (A ∩ B) × (C ∩ D) = (A × C) ∩ (B × D),
(f) (A \ B) × C = (A × C) \ (B × C).
3.5.15 Exercise. Symmetric Difference: Define the binary operation ∆ (sym-
metric difference) between the sets A and B by
A∆B = (A \ B) ∪ (B \ A).
(See Figure 3.4.) Prove that this operation is commutative, associative, A∆∅ = A
and A∆A = ∅.
Note the alternate definition
A∆B = (A ∪ B) \ (A ∩ B).
Prove that
(A ∪ B) \ (A ∩ B) = (A \ B) ∪ (B \ A)
for all sets A and B.
Figure 3.4 (Venn diagram of A and B): The shaded portion represents A∆B.
3.5.16 Exercise. Every product A×B is a set of ordered pairs. Prove the following
converse: for every set S of ordered pairs, there are sets A and B such that S ⊂
A × B.
3.5.17 Exercise. Does the trick used to define ordered pairs extend to define or-
dered triples? Define the triple (x, y, z) = {{x}, {x, y}, {x, y, z}}. If the statement
(x, y, z) = (x′, y′, z′) → x = x′ ∧ y = y′ ∧ z = z′
is true for all x, y, z, x′, y′, z′, give a proof. Otherwise give a counterexample.
3.6 Appendix A: The Axiom Schema of Separation
The axiom schema of separation is quite frequently stated as follows.
(S1) Let ψ(x, y₁, . . . , yₖ) be a formula only in the variables x, y₁, . . . , yₖ, where x
is free. (The variables y₁, . . . , yₖ may or may not be free.) If Y is not one of
the y₁, . . . , yₖ then
(∀y₁) · · · (∀yₖ)(∀X)(∃Y)(∀x)[x ∈ Y ↔ x ∈ X ∧ ψ(x, y₁, . . . , yₖ)].
What is the difference between the formulation given here and that given in
3.2.5? We repeat the latter below.
(S2) Let φ(x) be a formula where the variable x is free and where the variable Y
is not free. Then,
(∀X)(∃Y)(∀x)[x ∈ Y ↔ x ∈ X ∧ φ(x)].

As an example, examine the definition of set intersection: C = A ∩ B, where
C = {x ∈ A | x ∈ B}. Then, with ψ(x, B) ↔ x ∈ B, (S1) gives
(3.10) (∀B)(∀A)(∃C)(∀x)(x ∈ C ↔ x ∈ A ∧ x ∈ B),
while, with φ(x) ↔ x ∈ B, (S2) gives
(3.11) (∀A)(∃C)(∀x)(x ∈ C ↔ x ∈ A ∧ x ∈ B).
Now, (3.10) plainly states that for every pair of sets A and B there is a set
C = A ∩ B. We need only one formula, ψ(x, B) ⇔ x ∈ B. Only one axiom is
needed to obtain all intersections A ∩ B. On the other hand, (3.11) says that for
every A there is a C = A ∩ B, which depends on the formula φ(x) ⇔ x ∈ B. We
can get any A ∩ B we want, but we must change the formula and, therefore, rely on
a different axiom for each choice of B. We no longer obtain all intersections A ∩ B
with a single axiom. In the end, both (S1) and (S2) serve the same purpose.
Note that we cannot modify (S2) to read
(3.12) (∀φ)(∀X)(∃Y)(∀x)[x ∈ Y ↔ x ∈ X ∧ φ(x)].
We have agreed to quantify only over sets, not formulas. We are using what is
called a first-order predicate language.
We close by pointing out that the axiom schema of separation follows from the
axiom schema of replacement, which shall be presented later.
Foundations Note 4
The Natural Numbers

July 28, 2020
Contents
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
4.2 The Set of Natural Numbers . . . . . . . . . . . . . . . . . . 4-2
4.3 The Peano Postulates . . . . . . . . . . . . . . . . . . . . . . 4-3
4.4 Trichotomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8
4.1 Introduction
Assuming, for the time being, that a set exists and, recalling that
• if a set exists so does the empty set ∅ and
• if a is a set so are {a} and a ∪ {a},
we can define the first four natural numbers as:
0 = ∅,
1 = 0 ∪ {0} = {0},
2 = 1 ∪ {1} = {0} ∪ {{0}} = {0, {0}} = {0, 1},
3 = 2 ∪ {2} = {0, 1} ∪ {{0, 1}} = {0, 1, {0, 1}} = {0, 1, 2}.
Expressed just in terms of the empty set, the above are:
0 = ∅,
1 = {∅},
2 = {∅, {∅}},
3 = {∅, {∅}, {∅, {∅}}}.
Note 4: The Natural Numbers – July 28, 2020 4-2
The numbers just defined are all distinct from each other. Other consequences
of the above definitions are 0 ∈ 1 ∈ 2 ∈ 3 and 0 ⊂ 1 ⊂ 2 ⊂ 3.
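The first few natural numbers can be built informally as nested frozensets using the successor x ∪ {x}. The sketch below checks the equalities and chains just stated. The helper name `successor` is an assumption for illustration. The loop bound 3 simply reproduces 0 through 3.

```python
def successor(x):
    """x+ = x ∪ {x}."""
    return x | frozenset([x])

zero = frozenset()
nats = [zero]
for _ in range(3):
    nats.append(successor(nats[-1]))
zero, one, two, three = nats

print(two == frozenset([zero, one]))                # True: 2 = {0, 1}
print(three == frozenset([zero, one, two]))         # True: 3 = {0, 1, 2}
print(len(set(nats)) == 4)                          # True: all distinct
print(zero in one and one in two and two in three)  # True: 0 ∈ 1 ∈ 2 ∈ 3
print(zero <= one <= two <= three)                  # True: 0 ⊂ 1 ⊂ 2 ⊂ 3
print([len(n) for n in nats])                       # [0, 1, 2, 3]
```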
We have defined just the first four natural numbers. To define them all and
package them in a set that can replace our intuitive notion N we need a new axiom.
Below, we show how to accomplish this.
4.2 The Set of Natural Numbers

Let us abstract the properties we require to define the natural numbers as a set.
4.2.1 Definition. Inductive Sets: A set A is called inductive if
(4.1) (∅ ∈ A) ∧ (∀x)(x ∈ A → x ∪ {x} ∈ A).
Define the successor of any set x to be the set x+ = x ∪ {x}. Then, an inductive
set is a set that contains the empty set and the successor of each of its members.
4.2.2 Remark. At this point we need an axiom to ensure that an inductive set
exists.
4.2.3 Axiom of Infinity. There exists an inductive set, i.e.:
(∃A)[(∅ ∈ A) ∧ (∀x)(x ∈ A → x ∪ {x} ∈ A)].
4.2.4 Remarks. Note that the above axiom can also be written
(∃A)[{x ∈ A | x ≠ x} ∈ A ∧ (∀x)(x ∈ A → x ∪ {x} ∈ A)].
The existence of ∅ = {x ∈ A | x ̸= x} is part of the axiom.

The axiom of infinity asserts that an inductive set exists. This axiom does not
say how many inductive sets there are and does not say what they may look like.
We want the smallest (in an appropriate sense) inductive set. This is accomplished
via the construction shown below.
4.2.5 Definition. The Set of Natural Numbers: The set of natural numbers ω
is defined as the smallest (by inclusion) inductive set. Let ϕ(B) denote (4.1) with
A replaced by B. Let A denote the inductive set whose existence is guaranteed by
the axiom of infinity. Define
ω = {x ∈ A | (∀B)(ϕ(B) → x ∈ B)}.
4.2.6 Proposition. The set ω of natural numbers is the smallest inductive set.
Proof. By the axiom of extension, the set ω is defined uniquely; moreover, it does
not depend on the choice of A, since A is itself inductive, so every x that belongs
to all inductive sets already belongs to A. We show that ω is
inductive. We have ∅ ∈ ω because ∅ belongs to every inductive set. If x ∈ ω, then
x must belong to every inductive set B. Therefore, x ∪ {x} also belongs to every
inductive set B, and, therefore, to ω. This shows that ω is inductive. We now show
the minimality of ω. Note that for every x, x belongs to ω if and only if x belongs
to every inductive set. Therefore, ω is a subset of every inductive set.
4.3 The Peano Postulates

In our setting the Peano postulates of Note 0 for the natural numbers ω can be
proved, i.e.: they are not taken as axioms. As a matter of fact, several of the
postulates were used in the definition of ω. The postulates are stated as (P 1)
through (P 5) below. The proof of (P 5) requires a simple technical result.
The set ω of natural numbers is inductive. That is, we have
(P 1) 0 ∈ ω
and
(P 2) (∀n) (n ∈ ω → n+ ∈ ω).
4.3.1 Theorem. The set ω of natural numbers has the properties
(P 3) (∀A)[A ⊂ ω ∧ 0 ∈ A ∧ (∀n)(n ∈ A → n+ ∈ A) → A = ω],
i.e.: every inductive subset of ω must be ω; and
(P 4) (∀n) (n ∈ ω → n+ ≠ 0),
i.e.: 0 is not the successor of any natural number.

Proof. To prove (P 3), let A be any set such that
0 ∈ A ∧ (∀n)(n ∈ A → n+ ∈ A).
Then, A is an inductive set. Therefore, by the minimality of ω, we have ω ⊂ A. If
A ⊂ ω also, it follows that A = ω.
To prove (P 4), we note that n+ always contains n (since n+ = n ∪ {n}) and
that 0 is empty. Therefore, no n+ can be 0.
4.3.2 Remarks. Postulate (P 3) is called the induction principle. We show below
how to use it to construct an inductive proof, a proof in which we show that a set
A ⊂ ω satisfies (i) 0 ∈ A and (ii) n ∈ A → n+ ∈ A, and then conclude that
A = ω.
The induction principle is sometimes formulated in a slightly generalized form
called the principle of strong induction:
The set ω of natural numbers has the property
(P 3′) (∀A)[A ⊂ ω ∧ (∀n)(n ∈ ω ∧ n ⊂ A → n ∈ A) → A = ω].
Any member m of a natural number n is called a predecessor of n (said differently,
each natural number is the set of all its predecessors). A proof by strong induction
concludes that a set A of natural numbers is equal to ω by showing that, whenever
every predecessor of a natural number n is in A (this is the statement n ⊂ A), n
itself is in A. Note that, since 0 is a natural number having no predecessors, the
∀n statement, with n = 0, simply states that 0 ∈ A unconditionally; therefore, just
like a regular inductive proof, a proof by strong induction should start by showing
that 0 ∈ A.
What can we say about a natural number m that is a member of some successor
n+ = n ∪ {n} of a natural number n? Of course, for any pair of sets m and n,
(4.2) m ∈ n ∪ {n} ⇔ m ∈ n ∨ m ∈ {n}
⇔ m ∈ n ∨ m = n.
Can m ∈ n and m = n both be true? If so, then we have n ∈ n. If m and n are
natural numbers, this cannot be, as we show below.∗
4.3.3 Proposition. The natural numbers have the following properties.
(I) Every element of any natural number is a natural number:
(4.3) (∀n)(n ∈ ω → n ⊂ ω).
(II) No natural number is a subset of any of its elements; i.e.:
(4.4) (∀n ∈ ω)(∀m ∈ ω)(m ∈ n → n ⊄ m),
or, equivalently, by contraposition,
(∀n ∈ ω)(∀m ∈ ω)(n ⊂ m → m ∉ n).
(III) If m and n are natural numbers such that m ∈ n+, then either m ∈ n or
m = n (exclusive or).
∗ We shall leave unanswered whether it is possible for a set to be a member of itself. This will
not be true for any set we encounter in these notes.
Proof. First we prove (I) by induction. Consider the set A of all natural numbers
having the property (4.3):
A = {n ∈ ω | n ⊂ ω}.
To prove (I), it suffices to show that A = ω. We need to prove that A fulfills the
criteria of (P 3):
(i) We show that 0 ∈ A (the base step of the inductive proof). This is obvious,
since 0 = ∅ ⊂ ω.
(ii) We show that n ∈ A ⇒ n+ ∈ A (the inductive step of the proof). So we start
by assuming that n ∈ A (the inductive hypothesis), and seek to conclude that
n+ ∈ A. Since n ∈ ω implies n+ ∈ ω by (P 2), we need only show that n+ ⊂ ω,
i.e., that m ∈ n+ ⇒ m ∈ ω. So let m ∈ n+.
Since n+ = n ∪ {n}, we have m ∈ n or m = n. In case m ∈ n, the inductive
hypothesis n ∈ A means simply that n ⊂ ω, which implies that m ∈ ω. In
case m = n, the inductive hypothesis n ∈ A immediately gives n ∈ ω since
A ⊂ ω.
Now we prove (II): Let A be the set of all natural numbers defined by
(4.5) A = {n ∈ ω | (∀m)(m ∈ n → n ⊄ m)}
= {n ∈ ω | (∀m)(n ⊂ m → m ∉ n)}.
We need to show that A = ω. The proof is inductive.
(i) We prove that 0 ∈ A: Since 0 has no element, m ∈ 0 is always false. Hence,
the implication m ∈ 0 ⇒ 0 ⊄ m is always true. Therefore, 0 ∈ A.
(ii) We prove that n ∈ A ⇒ n+ ∈ A: Suppose that n ∈ A.
(a) Since n ⊂ n, we have n ∉ n by (4.5).
(b) Since n ∈ n ∪ {n} and n ∉ n, it follows that n+ = n ∪ {n} is not a
subset of n.
(c) If n+ ⊂ m for some m, then n ⊂ m also and, since n ∈ A, we have
that m ∉ n, by (4.5).
Thus, n+ is not a subset of n and n+ is not a subset of any element of n. Since
every element of n+ is either n or an element of n, this means
(∀m)(m ∈ n+ → n+ ⊄ m).
Since also n+ ∈ ω by (P 2), we conclude that n+ ∈ A.
We have shown that 0 ∈ A and
(∀n)(n ∈ A → n+ ∈ A).
Therefore, thanks to (P 3), A = ω.

Finally, we prove (III): If m and n are natural numbers such that m ∈ n+,
then, by (4.2), m ∈ n or m = n. If m ∈ n and m = n, then n ∈ n. This contradicts
(4.4), which, with m = n, gives
n ∈ n → n ⊄ n,
while n ⊂ n always holds.
Therefore, m ∈ n ∧ m = n is impossible. This leaves us with
m ∈ n ∪ {n} ⇔ (m ∈ n ∨ m = n) ∧ ¬(m ∈ n ∧ m = n)
⇔ ¬(m ∈ n ↔ m = n),
i.e.: if m ∈ n+, then either m ∈ n or m = n (exclusive or).

4.3.4 Definition. Transitive Sets: A set A is called transitive if
(∀a)(a ∈ A → a ⊂ A),
i.e.: a set is transitive if it includes everything it contains.

4.3.5 Examples.
1. The set A = {∅, {∅}} is transitive. Its elements are ∅ and {∅}. Its subsets
are ∅, {∅}, {{∅}} and {∅, {∅}}. Every element of A is also a subset of A.
2. The set A = {∅, {{∅}}} is not transitive. Its elements are ∅ and {{∅}}.
Its subsets are ∅, {∅}, {{{∅}}} and {∅, {{∅}}}. We have {{∅}} ∈ A, but
{{∅}} ⊄ A.
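Transitivity is easy to test for small sets modeled as frozensets, assuming every member is itself a frozenset, as in pure set theory. The sketch checks both examples above, and then the first few natural numbers built with x ∪ {x}, anticipating Proposition 4.3.6. The helper `is_transitive` is a hypothetical name.

```python
def is_transitive(A):
    """True when every member of A is also a subset of A (members assumed to be frozensets)."""
    return all(a <= A for a in A)

e = frozenset()
A1 = frozenset([e, frozenset([e])])               # {∅, {∅}}
A2 = frozenset([e, frozenset([frozenset([e])])])  # {∅, {{∅}}}
print(is_transitive(A1))  # True
print(is_transitive(A2))  # False: {{∅}} ∈ A2 but {∅} ∉ A2

# The natural numbers 0 through 5, built with the successor x ∪ {x}.
n = e
for _ in range(6):
    assert is_transitive(n)
    n = n | frozenset([n])
```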
4.3.6 Proposition. Every natural number is transitive, i.e.:
(∀n ∈ ω)(∀m ∈ ω)(m ∈ n → m ⊂ n).
Proof. Define A to be the set of all transitive natural numbers, i.e.:
A = {n ∈ ω | (∀m) (m ∈ n → m ⊂ n)}.
We will show that A = ω. The proof is by induction:

(i) We prove that 0 ∈ A: Since 0 has no element, m ∈ 0 is false for all m.
Therefore, the implication m ∈ 0 ⇒ m ⊂ 0 is true for all m. Hence, 0 ∈ A.
(ii) We prove that n ∈ A ⇒ n+ ∈ A: Suppose that n ∈ A, i.e.: suppose that
n is transitive. For every m ∈ n+ we have either m ∈ n or m = n (by
Proposition 4.3.3). We have two cases to consider, (a) m ∈ n and (b) m = n:
(a) If m ∈ n then m ⊂ n (because n is transitive), therefore m ⊂ n+
(because n ⊂ n ∪ {n} = n+).
(b) If m = n, then m ⊂ n+ (because n+ = n ∪ {n}).
It follows that every element of n+ is a subset of n+; since also n+ ∈ ω by (P 2),
we have n+ ∈ A.
We have shown that 0 ∈ A and (∀n)(n ∈ A ⇒ n+ ∈ A). Therefore, thanks to (P 3),
A = ω.
4.3.7 Theorem. If two natural numbers are distinct then so are their successors,
i.e.: the set ω of natural numbers has the property
(∀n ∈ ω)(∀m ∈ ω)(n ≠ m → n+ ≠ m+),
or, by contraposition,
(P 5) (∀n ∈ ω)(∀m ∈ ω)(n+ = m+ → n = m).
Proof. Suppose that m and n are in ω and that n+ = m+.
Since n ∈ n+ and n+ = m+, we have that n ∈ m+. Hence, since m+ = m ∪ {m},
either n ∈ m or n = m (exclusive or, by Proposition 4.3.3 (III)). We have shown that
(4.6) ¬(n ∈ m ↔ n = m).
Since m ∈ m+ and m+ = n+, we have that m ∈ n+. Hence, since n+ = n ∪ {n},
either m ∈ n or m = n. We have shown that
(4.7) ¬(m ∈ n ↔ m = n).
If m = n, we are finished. If m ≠ n, then, since (4.6) and (4.7) are both true, we have
m ∈ n and n ∈ m. Therefore, since n and m are transitive by Proposition 4.3.6,
we have m ⊂ n and n ⊂ m, that is, m = n, contradicting m ≠ n. Hence m = n.
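For the first few natural numbers built as frozensets, the properties proved so far can be spot-checked. These are (P 4), (P 5), and the exclusive-or statement (III) of Proposition 4.3.3. The bound N = 8 is an arbitrary illustrative choice, and a finite check like this proves nothing about all of ω.

```python
def successor(x):
    """x+ = x ∪ {x}."""
    return x | frozenset([x])

N = 8  # how many natural numbers to build; an arbitrary small bound
nats = [frozenset()]
while len(nats) < N:
    nats.append(successor(nats[-1]))

succs = [successor(n) for n in nats]
# (P 4): 0 = ∅ is not a successor.
assert all(s != frozenset() for s in succs)
# (P 5): distinct numbers have distinct successors.
assert len(set(succs)) == len(nats)
# Proposition 4.3.3 (III): m ∈ n+ means m ∈ n or m = n, never both.
for n in nats:
    for m in successor(n):
        assert (m in n) != (m == n)
print("checked", len(nats), "natural numbers")
```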
4.3.8 Exercise. Prove that every natural number other than zero is the successor
of a natural number.
4.3.9 Example. We end this section with an informal example of a proof by strong
induction. We have not yet introduced either the arithmetic operations or the order
of ω (a task we will postpone until Note 6), so we are forced to provide an informal
example that relies on our physical intuition and some (yet unproven) properties
of natural numbers.
Suppose that n is a natural number and we are in possession of a rectangular
chocolate bar of size n + 1 by 1. (We take the length of the bar to be n + 1 so that
even when n = 0 we have a 1-by-1 bar rather than no chocolate at all.) Suppose
this bar is meant to be cut into 1-by-1 pieces (n+1 of them). It is very easy to show,
using ordinary induction, that n cuts will be necessary if, at each successive step,
we cut off (say) the rightmost 1-by-1 square, repeating n times over. It should also
be intuitively clear that, regardless of the position of the successive cuts, exactly
n cuts will be necessary. The latter statement is most easily proved by strong
induction.
Claim. It takes n cuts to break an (n + 1)-by-1 chocolate bar into 1-by-1 pieces,
regardless of the order in which the cuts are performed.
Proof. Let A be the set of natural numbers n for which the statement above is
true. By strong induction on n, we will show that A = ω.
The truth of the statement for n = 0 is obvious, since 0 + 1 = 1 and a 1-by-1
chocolate bar is already cut into 1-by-1 pieces, so zero cuts are needed.
Assume now that n is a nonzero natural number such that n ⊂ A; in other
words, every predecessor m of n is a member of A (this is the strong inductive
hypothesis). Perform a first cut on the bar in any allowable place; this splits the bar
into two smaller bars of sizes a-by-1 and b-by-1. Neither a nor b is zero; therefore
(by Exercise 4.3.8), we have a = k+ = k + 1, b = m+ = m + 1 for some natural
numbers k, m. Moreover, the fact that no chocolate was either lost or added in the
process of cutting means that a + b = n + 1.
Since neither a nor b is zero, it follows from a + b = n+ that both a and b are
predecessors of n+ (though we haven't proved this). Hence, k and m are predecessors
of n (this is proved in Proposition 4.4.7 below).
By the strong inductive hypothesis, both k and m are members of A; therefore,
the a-by-1 bar requires k cuts to be split into 1-by-1 pieces and the b-by-1 bar
requires m.
We conclude that k + m cuts will always be required to cut the two smaller
bars into 1-by-1 pieces. Adding the first cut that split the (n + 1)-by-1 bar into
two, we see that the total number of cuts needed is equal to 1 + k + m =
a + b − 1 = n; hence the statement is true for n, so n ∈ A, and the strong induction
is complete.
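The claim can also be tested empirically with a short simulation: cut a bar of n + 1 unit squares at a randomly chosen interior position, recurse on both pieces, and count the cuts. The recursion mirrors the strong-induction step (a bar of length a + b splits into bars of lengths a and b, both shorter). This is an illustrative Python sketch using the standard `random` module, not a proof; the function name `count_cuts` is hypothetical.

```python
import random


def count_cuts(length: int, rng: random.Random) -> int:
    """Cut a length-by-1 bar into 1-by-1 pieces, choosing each cut position
    at random, and return the total number of cuts performed."""
    if length == 1:
        return 0  # already a single 1-by-1 piece: no cut needed
    a = rng.randint(1, length - 1)  # first cut leaves pieces of sizes a and b
    b = length - a
    return 1 + count_cuts(a, rng) + count_cuts(b, rng)


rng = random.Random(2020)
for n in range(10):
    results = {count_cuts(n + 1, rng) for _ in range(50)}
    print(n, results)  # every trial for a given n yields the same count, n
```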
4.4 Trichotomy
If m and n are natural numbers, m ∈ n does not formalize the intuitive notion that
“m is less than n” unless exactly one of m ∈ n, m = n and n ∈ m is true. We show
that this is indeed the case.
4.4.1 Proposition. For any two natural numbers m and n at most one of the
statements m ∈ n, m = n and n ∈ m can be true.
Proof. Recall that
(4.8) no natural number is a subset of any of its elements.
If m = n and n ∈ m then m ∈ m and m ⊂ m, contradicting (4.8). If m = n and
m ∈ n then, again, m ∈ m and m ⊂ m, contradicting (4.8).
Recall that every natural number is a transitive set (it includes everything it
contains). Therefore, m ∈ n implies m ⊂ n and n ∈ m implies n ⊂ m. Hence,
m ∈ n and n ∈ m together imply m = n. But m = n together with m ∈ n (or
n ∈ m) once again contradicts (4.8).
4.4.2 Definition. Two natural numbers m and n are called comparable if m ∈ n,
or m = n, or n ∈ m. For each natural number n define
Cn = {m ∈ ω | m ∈ n ∨ m = n ∨ n ∈ m}
(the set of all natural numbers comparable with n) and define
N = {n ∈ ω | Cn = ω}
(the set of all natural numbers comparable with every natural number).
4.4.3 Remarks. Note that n ∈ N is equivalent to Cn = ω. From the definition of
Cn it is clear that
(4.9) n ∈ Cn
and
(4.10) m ∈ n → m ∈ Cn .
If m is comparable with n then n is comparable with m and vice versa. Therefore
(4.11) m ∈ Cn ↔ n ∈ Cm .
4.4.4 Proposition. 0 ∈ N, i.e.: C0 = ω.
Proof. The proof is inductive. 0 ∈ C0 trivially. We prove that m ∈ C0 ⇒ m+ ∈ C0
for all natural numbers m to conclude that C0 = ω. Suppose that m ∈ C0. There
are three cases to consider: m ∈ 0, m = 0 and 0 ∈ m.
(a) The case m ∈ 0 is impossible for any m.
(b) If m = 0 then 0 ∈ m+, i.e.: 0 ∈ 0+. Thus, 0 ∈ Cm+ by (4.10). Hence, m+ ∈ C0
by (4.11).
(c) If 0 ∈ m then 0 ∈ m+, since m ⊂ m ∪ {m} = m+. Thus, 0 ∈ Cm+ by (4.10).
Hence, m+ ∈ C0 by (4.11).
In all possible cases m+ ∈ C0. We have shown that C0 = ω.
4.4.5 Proposition. Suppose that Cn = ω for some natural number n. Then, for
all natural numbers m
m ∈ n+ → m+ ∈ Cn+ .
Proof. If m ∈ n+, since n+ = n ∪ {n}, we have two cases to consider: m = n and
m ∈ n.
(a) If m = n then m+ = n+. Hence, m+ ∈ Cn+.
(b) Consider the case m ∈ n. Since Cn = ω, we have m+ ∈ Cn. Therefore, we
have three possible cases: m+ ∈ n, m+ = n and n ∈ m+.
(i) If m+ ∈ n then m+ ∈ n+, since n ⊂ n+. Hence, m+ ∈ Cn+.
(ii) If m+ = n then m+ ∈ n+, since n ∈ n+. Hence, m+ ∈ Cn+.
(iii) If n ∈ m+ then either n ∈ m or n = m, since m+ = m ∪ {m}. Since m is a
transitive set (it includes everything it contains), n ∈ m implies n ⊂ m.
Hence, n ⊂ m in either case. Because we also have m ∈ n (hypothesis),
n is a subset of one of its elements, contradicting (4.8). Therefore, n ∈ m+
is impossible.
In all possible cases m+ ∈ Cn+.

4.4.6 Proposition. Trichotomy Law. Any two natural numbers are comparable,
i.e.: N = ω. In other words, given any two natural numbers m and n, exactly one
of the following holds:
m ∈ n, m = n, n ∈ m.
Proof. The proof is inductive. Thanks to Proposition 4.4.4, 0 ∈ N. We prove that
n ∈ N ⇒ n+ ∈ N, i.e.: that
(∀n)(Cn = ω → Cn+ = ω),
to conclude that N = ω.
Suppose that n ∈ N for some natural number n, i.e.: suppose that Cn = ω.
The proof that Cn+ = ω is inductive. We prove that 0 ∈ Cn+ . Then we prove that
m ∈ Cn+ → m+ ∈ Cn+
for all natural numbers m to conclude that Cn+ = ω.

(a) We prove that 0 ∈ Cn+ . Since C0 = ω we have n+ ∈ C0 . Hence, 0 ∈ Cn+ .
(b) We prove that m ∈ Cn+ → m+ ∈ Cn+ . Suppose that m ∈ Cn+ . There are
three cases to consider: m ∈ n+ , m = n+ and n+ ∈ m.
(i) If m ∈ n+ then m+ ∈ Cn+, thanks to Proposition 4.4.5.
(ii) If n+ = m then n+ ∈ m+, since m+ = m ∪ {m}. Thus, n+ ∈ Cm+.
Hence, m+ ∈ Cn+ .
(iii) If n+ ∈ m then n+ ∈ m+ , since m+ = m ∪ {m}. Thus, n+ ∈ Cm+ .
Hence, m+ ∈ Cn+ .
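In all cases in (b), m+ ∈ Cn+.
By induction, we conclude that Cn+ = ω, i.e.: n+ ∈ N. Since 0 ∈ N and
n ∈ N ⇒ n+ ∈ N, induction gives N = ω. That exactly one (and not merely at least
one) of m ∈ n, m = n, n ∈ m holds then follows from Proposition 4.4.1.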
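As a finite illustration of the Trichotomy Law, the Python sketch below counts, for every pair of small von Neumann ordinals, how many of m ∈ n, m = n, n ∈ m hold. It also anticipates Remarks 4.4.8 by checking that the i-th ordinal belongs to the j-th exactly when i < j. The helper names are hypothetical, and the check covers only finitely many pairs.

```python
def succ(n: frozenset) -> frozenset:
    """Successor n+ = n ∪ {n}."""
    return n | frozenset([n])


def naturals(count: int) -> list:
    """Return the von Neumann ordinals 0, 1, ..., count - 1."""
    result, n = [], frozenset()
    for _ in range(count):
        result.append(n)
        n = succ(n)
    return result


def how_many_hold(m: frozenset, n: frozenset) -> int:
    """Count how many of the statements m ∈ n, m = n, n ∈ m are true."""
    return int(m in n) + int(m == n) + int(n in m)


nums = naturals(10)
# Trichotomy: exactly one of the three statements holds for every pair.
print(all(how_many_hold(m, n) == 1 for m in nums for n in nums))  # True
# The i-th ordinal is a member of the j-th exactly when i < j.
print(all((nums[i] in nums[j]) == (i < j) for i in range(10) for j in range(10)))  # True
```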

4.4.7 Proposition.
(4.12) (∀m ∈ ω)(∀n ∈ ω)(m ∈ n ↔ m+ ∈ n+ ).
Proof. Recall that a natural number is a transitive set (it includes everything it
contains). In other words, for n and ℓ in ω we have
(4.13) n ∈ ℓ ⇒ n ⊂ ℓ.
Therefore for m, n and ℓ in ω we have, thanks to Exercise 4.4.9
(4.14) m ∈ n ∧ n ∈ ℓ ⇒ m ∈ ℓ.
We prove that
(4.15) m+ ∈ n+ ⇒ m ∈ n.
Suppose that m+ ∈ n+. Since n+ = n ∪ {n}, m+ ∈ n+ means m+ ∈ n ∪ {n}, which
implies m+ ∈ n or m+ = n. Since m ∈ m+ always, we have m ∈ m+ ∧ m+ ∈ n or
m ∈ m+ ∧ m+ = n. Hence, m ∈ n in either case, thanks to (4.14) in the first of
the two cases.
Now, we prove that
(4.16) m ∈ n → m+ ∈ n+ .
We proceed by induction on n. For n = 0 the hypothesis m ∈ 0 is false, so (4.16)
holds vacuously. Suppose that (4.16) holds for n and every m (the inductive
hypothesis), and suppose that m ∈ n+. Then m ∈ n or m = n.
• If m ∈ n then, by the inductive hypothesis,
m+ ∈ n+ ⊂ (n+)+.
Hence, m+ ∈ (n+)+.
• If m = n then
m+ = n+ ∈ (n+)+.
Hence, m+ ∈ (n+)+.
In either case we obtain
m ∈ n+ ⇒ m+ ∈ (n+)+,
which is (4.16) with n replaced by n+. We have established (4.16).
Now, (4.15) and (4.16) establish (4.12).
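Equivalence (4.12), together with property (4.14) used in its proof, can likewise be spot-checked on the first few von Neumann ordinals. As before, this Python sketch (with illustrative helper names) only tests finitely many cases.

```python
def succ(n: frozenset) -> frozenset:
    """Successor n+ = n ∪ {n}."""
    return n | frozenset([n])


def naturals(count: int) -> list:
    """Return the von Neumann ordinals 0, 1, ..., count - 1."""
    result, n = [], frozenset()
    for _ in range(count):
        result.append(n)
        n = succ(n)
    return result


nums = naturals(9)
# (4.12): m ∈ n holds exactly when m+ ∈ n+.
print(all((m in n) == (succ(m) in succ(n)) for m in nums for n in nums))  # True
# (4.14): membership is transitive on these natural numbers.
print(all(m in k for m in nums for n in nums for k in nums if m in n and n in k))  # True
```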
4.4.8 Remarks. The last two propositions establish two crucial properties of the
natural numbers. In Note 6 we formalize the intuitive notion m < n by the property
m ∈ n. Then, trichotomy states that for any two natural numbers m and n exactly
one of the following holds: m < n, m = n, n < m.
The sum m + n of two natural numbers shall be defined by repeated applications
of the successor operation. We begin by defining m + 0 = m. Then, for 0 < n, the
sum m + n will be defined by formalizing the intuitive idea
m + n = (. . . ((m + 1) + 1) . . .) + 1   (with 1 added n times).
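The "n times" picture is commonly made precise by the recursion m + 0 = m and m + k+ = (m + k)+. The Python sketch below follows that recursion on von Neumann ordinals; it anticipates, and is not, the official definition given in Note 6. As an assumption of the sketch, the predecessor k of a nonzero n = k+ is computed as ∪n, which is the content of Exercise 4.4.12 (the natural number k being transitive). All helper names are hypothetical.

```python
def succ(n: frozenset) -> frozenset:
    """Successor n+ = n ∪ {n}."""
    return n | frozenset([n])


def pred(n: frozenset) -> frozenset:
    """For a nonzero n = k+, recover k as the union ∪n (cf. Exercise 4.4.12)."""
    return frozenset().union(*n)


def add(m: frozenset, n: frozenset) -> frozenset:
    """m + 0 = m and m + k+ = (m + k)+, i.e. apply succ to m exactly n times."""
    if not n:  # n is 0, the empty set
        return m
    return succ(add(m, pred(n)))


def ordinal(k: int) -> frozenset:
    """Build the von Neumann ordinal with k elements."""
    n = frozenset()
    for _ in range(k):
        n = succ(n)
    return n


print(add(ordinal(3), ordinal(4)) == ordinal(7))  # True
```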
4.4.9 Exercise. Prove that if A is a transitive set, then
(4.17) (∀a)(∀x)(x ∈ a ∧ a ∈ A → x ∈ A).
4.4.10 Exercise. Prove that if A is a set such that (4.17) holds, then A is transitive.
4.4.11 Exercise. Prove that if A is a transitive set, then ∪A ⊂ A.
4.4.12 Exercise. Prove that if a is a transitive set, then ∪(a+ ) = a.
4.4.13 Exercise. Prove that if A is a set such that ∪A ⊂ A, then A is transitive.
4.4.14 Exercise. Prove that if a is a transitive set, then a+ is a transitive set.
4.4.15 Exercise. Prove that ω is transitive.
4.4.16 Exercise. Prove that if A is a transitive set, then P(A) is a transitive set.
4.4.17 Exercise. Prove that if A is a transitive set, then ∪A is a transitive set.
4.4.18 Exercise. Prove that if every member of a set A is a transitive set, then
∩A is a transitive set.
4.4.19 Exercise. Prove that if every member of a set A is a transitive set, then
∪A is a transitive set.
4.4.20 Exercise. Prove Postulate (P 3′ ) (the principle of strong induction).
Suggestion: Show that if A satisfies the premises of (P 3′ ), then the set
B = {n ∈ ω | n ⊂ A}
is inductive. Conclude that A = ω.
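Before proving them, several of these exercises can be explored on small hereditarily finite sets. The Python sketch below, whose helper names are illustrative, checks on a handful of sample sets that transitivity is equivalent to ∪A ⊂ A (Exercises 4.4.11 and 4.4.13) and that the power set of a transitive set is transitive (Exercise 4.4.16). Such finite checks illustrate the claims but prove nothing.

```python
from itertools import chain, combinations


def is_transitive(s: frozenset) -> bool:
    """Every element of s is a subset of s."""
    return all(x <= s for x in s)


def big_union(s: frozenset) -> frozenset:
    """∪s, the set of all elements of elements of s."""
    return frozenset().union(*s)


def power_set(s: frozenset) -> frozenset:
    """P(s), the set of all subsets of s."""
    items = list(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return frozenset(frozenset(c) for c in subsets)


zero = frozenset()
one = frozenset([zero])
two = frozenset([zero, one])
samples = [zero, one, two, frozenset([one]), frozenset([zero, frozenset([one])]), power_set(two)]

for a in samples:
    # Exercises 4.4.11 and 4.4.13: a is transitive iff ∪a ⊂ a.
    assert is_transitive(a) == (big_union(a) <= a)
    # Exercise 4.4.16: if a is transitive, so is P(a).
    if is_transitive(a):
        assert is_transitive(power_set(a))
print("all checks passed")
```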

Foundations Note 5
Functions and Relations

July 28, 2020
Contents
5.1 Functions, Forward and Inverse Images
5.2 Compositions and Identities
5.3 Injections, Surjections, Bijections and Inverses
5.4 Properties of Functions
5.5 Indexed Sets
5.6 Equinumerous Sets
5.7 Relations
5.8 The Recursion Theorem
5.9 Cantor and Schröder-Bernstein Theorems
5.1 Functions, Forward and Inverse Images

Intuitively, a function f from the set A to the set B is a rule which assigns to
every element a of A exactly one element f(a) of B. Instead of introducing “rule”
as another undefined primitive notion, we shall define functions as special kinds of
sets.
5.1.1 Remarks. So far, the only numbers at our disposal are the natural numbers
ω developed in Note 4. We adopt the more familiar name N for ω to evoke the
arithmetic structure afforded by the familiar operations of addition and
multiplication. These operations will be introduced and studied in Note 6. Nevertheless,
N, the integers Z, the rationals Q and the reals R with the usual arithmetic op-
erations will be used in many of the examples given to show the meaning of what
we are doing. Recall that we start the natural numbers with 0, not 1 (as some
authors do). The examples are not part of the official development of ideas, but
are intended to be useful in understanding where these ideas will eventually lead.
5.1.2 Definition. Functions. A function f from (or on) A to (or into) B is a
subset of A×B such that for each a ∈ A there is exactly one ordered pair (a, b) ∈ f .
We usually write∗ f : A → B or A −f→ B to say that f is a function from A to
B. The set A is called the domain of f and B is called the codomain of f. When
the name of the function is not mentioned, we simply write A → B.
If (a, b) ∈ f , we write f (a) for b and we say that f maps a to f (a). If the value
f (a) is given by some expression E(a) (a formula where the variable a is free), we
may write
f : A → B : a ↦ E(a)
(read: “a maps to E(a)”). The text f (a) = E(a) is called the rule (for computing
the value) of f .
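For finite sets, Definition 5.1.2 can be transcribed almost literally. In the Python sketch below (an illustration under our own representation choice, with hypothetical function names), a function is a set of ordered pairs (tuples), and a rule a ↦ E(a) is turned into such a set by a comprehension.

```python
def is_function(f: set, A: set, B: set) -> bool:
    """Definition 5.1.2: f ⊂ A × B and each a ∈ A occurs in exactly one pair of f."""
    if not all(a in A and b in B for (a, b) in f):
        return False  # f is not a subset of A × B
    return all(sum(1 for (x, _) in f if x == a) == 1 for a in A)


def value_at(f: set, a):
    """Return f(a), the unique b with (a, b) ∈ f."""
    (b,) = [y for (x, y) in f if x == a]
    return b


A = {0, 1, 2}
B = {0, 1, 4}
square = {(a, a * a) for a in A}  # the rule a ↦ a², as a set of pairs
print(is_function(square, A, B))                              # True
print(value_at(square, 2))                                    # 4
print(is_function({(0, 0), (0, 1), (1, 1), (2, 4)}, A, B))    # False: 0 has two values
print(is_function({(0, 0), (1, 1)}, A, B))                    # False: 2 has no value
```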
5.1.3 Examples. Below are two simple examples showing how functions may be
defined.
(1) The text
f : R → R : x ↦ x²
denotes the function that maps every real number x to the real number x².
The rule is given by f(x) = x². Here, E(x) is x². The function f is the set
f = {(x, y) ∈ R × R | y = x²}.
(2) Define f : R → R by the rule
f(x) = 1 if x is rational,
f(x) = 0 if x is irrational.
We have defined the function
f = (Q × {1}) ∪ ((R \ Q) × {0}).
5.1.4 Exercise. Consider the following statement.
A subset f of A × B is a function if
(∀a ∈ A)(∃b ∈ B)(∀b′ ∈ B)[(a, b) ∈ f ∧ (a, b′) ∈ f → b = b′].
Is the above statement true? Explain.
∗ Do not confuse the function symbol → with the sentential connective →. (They look identical.)
We clearly have a notational clash. Another clash appears in “1/n → 0 as n → ∞.”

(∀a ∈ A)(∃b ∈ B)[(a, b) ∈ f ].

(∀a ∈ A)(∃b ∈ B)(∀b′ ∈ B)[(a, b) ∈ f ∧ (a, b′) ∉ f → b ≠ b′].

5.1.7 Definition. For any two sets A and B we define another set, B^A (read: “B
to the A”), as follows. If f : A → B then f ⊂ A × B. Therefore, f ∈ P(A × B).
Thanks to the axiom schema of separation, we define the set
B^A = {f ∈ P(A × B) | f is a function from A to B}.
In plainer language, given any two sets A and B, the collection of all functions
from A to B is a set. This set is denoted by B^A.
5.1.8 Remarks. A function is a set. Therefore, equality of functions is simply
equality of sets. That is, the functions f and g are equal if and only if their domains
are equal and f(x) = g(x) for all x in the domain. (A function, being a set of ordered
pairs, does not record its codomain.)
Note that if A is empty so is A × B. This situation gives rise to the empty
function (which is just the empty set ∅). That there is only one empty function is
a consequence of the axiom of extension.
Do not confuse f with f(a). If f : A → B then f ∈ B^A, while f(a) ∈ B.
5.1.9 Examples.
(1) If A is nonempty then ∅^A is empty because there is no function with nonempty
domain and empty codomain. (Since every f : A → ∅ must assign some
f(a) ∈ ∅ to every a ∈ A and A ≠ ∅, no such f exists.)
(2) If B is a set then B^∅ = {∅} because there is only one function ∅ : ∅ → B
with empty domain.
(3) The set R^N is the collection of all “sequences” f(0), f(1), f(2), . . . of real
numbers.
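For finite A and B, the elements of B^A can be enumerated by choosing a value in B for each element of A. The Python sketch below (the name `all_functions` is hypothetical, and the elements of A and B are assumed sortable) uses `itertools.product` from the standard library and reproduces Examples 5.1.9 (1) and (2) in the degenerate cases.

```python
from itertools import product


def all_functions(A, B) -> list:
    """List the elements of B^A, each a frozenset of ordered pairs (a, f(a))."""
    domain = sorted(A)
    # One function for each way of choosing a value in B for every a in A.
    return [frozenset(zip(domain, values))
            for values in product(sorted(B), repeat=len(domain))]


print(len(all_functions({0, 1}, {0, 1, 2})))  # 9
print(all_functions(set(), {0, 1}))           # [frozenset()]: B^∅ = {∅}
print(all_functions({0, 1}, set()))           # []: ∅^A is empty for nonempty A
```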
5.1.10 Exercise. Prove that B^A = ∅ iff B = ∅ and A ≠ ∅.
5.1.11 Exercise. Prove that if B ⊂ C then B^A ⊂ C^A.
5.1.12 Exercise. How many elements are there in 2^3?
5.1.13 Definition. Forward and Inverse Image Functions. For every function
f : A → B there are two functions, the function
f∗ : P(A) → P(B),
the forward image function of f ; and the function
f ∗ : P(B) → P(A),
the inverse image function of f . These two functions are defined as follows. Given
any A0 ⊂ A, f∗ (A0 ) is the collection of all f (a) for a ∈ A0 . Given any B0 ⊂ B,
f ∗ (B0 ) is the collection of all a ∈ A such that f (a) ∈ B0 . The function value f∗ (A)
is called the range (or image) of f and is denoted by ran f.
5.1.14 Example. Suppose
f : {0, 1, 2} → {0, 1}
is given by
f (0) = f (1) = f (2) = 0.
Then,
(1) ran f = f∗({0, 1, 2}) = {0},
(2) f∗ ({0}) = f∗ ({1}) = f∗ ({2}) = {0},
(3) f∗ ({0, 1}) = f∗ ({0, 2}) = f∗ ({1, 2}) = {0},
(4) f ∗ ({0, 1}) = f ∗ ({0}) = {0, 1, 2},
(5) f ∗ ({1}) = ∅.
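The computations of Example 5.1.14 can be reproduced mechanically. In this Python sketch (our own representation, with hypothetical names), a function with finite domain is stored as a dict mapping each a to f(a), so the domain is the set of keys; the codomain is not stored and has to be supplied separately when it matters.

```python
def forward_image(f: dict, subset) -> set:
    """Forward image f∗(A0): the set of all f(a) with a ∈ A0."""
    return {f[a] for a in subset}


def inverse_image(f: dict, subset) -> set:
    """Inverse image f ∗(B0): the set of all a in the domain with f(a) ∈ B0."""
    return {a for a, b in f.items() if b in subset}


f = {0: 0, 1: 0, 2: 0}  # the function of Example 5.1.14
print(forward_image(f, {0, 1, 2}))  # {0}, the range of f
print(forward_image(f, {1, 2}))     # {0}
print(inverse_image(f, {0, 1}))     # {0, 1, 2}
print(inverse_image(f, {1}))        # set()
```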
5.2 Compositions and Identities

5.2.1 Definition. Compositions. Given functions f : A → B and g : B → C,
the function g ◦ f : A → C (read “g composed with f ”), called the composition of
g and f , is defined by
(5.1) (g ◦ f )(a) = g(f (a))
for all a ∈ A. This situation can be visualized as shown in Figure 5.1.
(Diagram: A −f→ B −g→ C, together with g ◦ f : A → C.)
Figure 5.1: Function Composition.
The composition of f : A → B, g : B → C and h : C → D is shown in
Figure 5.2, which illustrates one of two ways in which we can define the composition
of three functions. Here, we have chosen to associate the binary operation of
function composition from left to right by choosing the path h ◦ (g ◦ f ).
(Diagram: A −f→ B −g→ C −h→ D, together with g ◦ f : A → C and h ◦ (g ◦ f) : A → D.)
Figure 5.2: The Composition of Three Functions.
Thanks to the definition of function composition given by (5.1), we have

(h ◦ (g ◦ f ))(a) = h((g ◦ f )(a))
= h(g(f (a)))
= (h ◦ g)(f (a))
= ((h ◦ g) ◦ f )(a)
for all a ∈ A. In other words, we always have h ◦ (g ◦ f ) = (h ◦ g) ◦ f, showing that
function composition is associative.
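The associativity computation can be mirrored on small finite functions. This Python sketch (dict representation assumed, names hypothetical) composes three concrete functions both ways and compares the results; it checks one instance rather than proving the law.

```python
def compose(g: dict, f: dict) -> dict:
    """g ◦ f, the function a ↦ g(f(a)) on the domain of f."""
    return {a: g[f[a]] for a in f}


f = {0: "x", 1: "y", 2: "x"}  # f : A → B
g = {"x": 10, "y": 20}        # g : B → C
h = {10: True, 20: False}     # h : C → D
left = compose(h, compose(g, f))   # h ◦ (g ◦ f)
right = compose(compose(h, g), f)  # (h ◦ g) ◦ f
print(left == right, left)  # True {0: True, 1: False, 2: True}
```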
Associativity is shown in Figure 5.3, which is an example of a commutative
diagram, a diagram where we obtain the same result independently of the path we
choose in arriving from an initial place to a terminal place in the diagram.
(Diagram: A −f→ B −g→ C −h→ D, together with g ◦ f : A → C and h ◦ g : B → D; the two paths h ◦ (g ◦ f) and (h ◦ g) ◦ f from A to D give the same function.)
Figure 5.3: A Commutative Diagram for the Composition of Three Functions.
5.2.2 Definition. Identities: With every set X we associate the identity function
of X, defined by
idX : X → X : x ↦ x.
In other words,
(5.2) idX = {(x, x) ∈ X × X | x ∈ X}.
Note that for any f : A → B we have
f ◦ idA = f, idB ◦ f = f, and idB ◦ f ◦ idA = f,
as illustrated in Figure 5.4.
(Diagram: f : A → B, with idA : A → A and idB : B → B.)
Figure 5.4: Identity Functions of A and B.

5.2.3 Proposition. Uniqueness of Identities. Thanks to the axiom of extension,
the identity function of a set is unique. That is, given any set X, (5.2) defines
idX uniquely. In other words, any function f : X → X such that f(x) = x for all
x ∈ X must be f = {(x, x) ∈ X × X | x ∈ X} = idX.
We want to clarify the matter further: There is exactly one function idA : A → A
such that
(5.3) f ◦ idA = f for all functions f : A → B
and
(5.4) idA ◦ g = g for all functions g : B → A.
Here, A is fixed and B is an arbitrary set.

Proof. Let i : A → A be any function such that
(5.5) f ◦ i = f for all functions f : A → B
and
(5.6) i ◦ g = g for all functions g : B → A.
Since (5.3), (5.4), (5.5) and (5.6) hold for every set B, they hold in particular when
B = A. Then, thanks to (5.3) with f = i we have
(5.7) i = i ◦ idA.
Thanks to (5.6) with g = idA we have
(5.8) i ◦ idA = idA.
From (5.7) and (5.8) we obtain i = idA.
5.3 Injections, Surjections, Bijections and Inverses

5.3.1 Definition. Injections, Surjections, Bijections. The function
f : A → B
is called
• a surjection (or is said to be surjective) just in case for every b ∈ B there is
some a ∈ A such that b = f(a) (i.e.: f∗(A) = B);
• an injection (or is said to be injective) just in case
(∀a ∈ A)(∀a′ ∈ A)(f(a) = f(a′) → a = a′),
or, by contraposition from the above line, just in case
(∀a ∈ A)(∀a′ ∈ A)(a ≠ a′ → f(a) ≠ f(a′));
• a bijection (or is said to be bijective) just in case it is both injective and
surjective.
5.3.2 Examples. For real numbers a and b, let [a, b] denote the set (interval) of
all real numbers x such that a ≤ x ≤ b.
• The function f : [−1, 1] → [−1, 1] : x ↦ x² is neither injective nor surjective:
f(−1) = f(1) and there is no x such that f(x) = −1.
• The function g : [0, 1] → [−1, 1] : x ↦ x² is injective but not surjective:
there is no x such that g(x) = −1.
• The function h : [−1, 1] → [0, 1] : x ↦ x² is surjective but not injective:
h(−1) = h(1).
• The function i : [0, 1] → [0, 1] : x ↦ x² is both injective and surjective: i is
a bijection.
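For functions with finite domain, both properties reduce to simple counts. The Python sketch below (dict representation and names are our own assumptions) replaces the intervals of Examples 5.3.2 by the finite sets {−1, 0, 1} and {0, 1}; with that substitution the four functions keep the same injectivity and surjectivity profile.

```python
def is_injective(f: dict) -> bool:
    """Distinct arguments have distinct values."""
    return len(set(f.values())) == len(f)


def is_surjective(f: dict, codomain) -> bool:
    """Every element of the codomain is a value of f (values assumed to lie in it)."""
    return set(f.values()) == set(codomain)


def square_on(domain) -> dict:
    """The rule x ↦ x² on a finite domain."""
    return {x: x * x for x in domain}


# Finite analogues: {-1, 0, 1} stands in for [-1, 1] and {0, 1} for [0, 1].
cases = {
    "f": (square_on({-1, 0, 1}), {-1, 0, 1}),
    "g": (square_on({0, 1}), {-1, 0, 1}),
    "h": (square_on({-1, 0, 1}), {0, 1}),
    "i": (square_on({0, 1}), {0, 1}),
}
for name, (fn, codomain) in cases.items():
    print(name, is_injective(fn), is_surjective(fn, codomain))
# f False False / g True False / h False True / i True True
```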
5.3.3 Remarks. Injectivity and surjectivity can be visualized by means of simple
cartoons. For example, Figure 5.5 illustrates a function that is surjective, but not
injective, since we have f(a) = f(a′) = b with a ≠ a′, while every b in B is of the
form b = f(x) for some x in A.
(Cartoon: f maps two distinct points a and a′ of A to the same point b of B.)
Figure 5.5: A Surjective Noninjective Function.
On the other hand, Figure 5.6 illustrates a function that is injective, but not
surjective, since no two distinct x’s in A are mapped to the same point in B and
since there is no x in A such that f (x) = b′′ .
(Cartoon: f maps distinct points a and a′ of A to distinct points of B; the point b′′ of B is not the image of any point of A.)
Figure 5.6: An Injective Nonsurjective Function.
5.3.4 Remark. From the usual definition of a function inverse, it should be intuitively
clear that the invertible functions are precisely the bijections, as we shall show below.
5.3.5 Exercise. Let Q denote the rationals and let Q+ denote the nonnegative
rationals. Explain whether each of the following functions is injective or surjective.
1. f : Q → Q : x ↦ x².
2. g : Q+ → Q : x ↦ x².
3. h : Q → Q+ : x ↦ x².
4. i : Q+ → Q+ : x ↦ x².
5.3.6 Definition. Inverses. The function f : A → B is said to be invertible, or
is said to have an inverse f⁻¹, if there exists a function f⁻¹ : B → A such that
f ◦ f⁻¹ = idB and f⁻¹ ◦ f = idA,
i.e.: such that f(f⁻¹(b)) = b for all b ∈ B and f⁻¹(f(a)) = a for all a ∈ A. This
situation can be visualized as shown in Figure 5.7.
(Diagram: f : A → B and f⁻¹ : B → A, with idA : A → A and idB : B → B.)
Figure 5.7: A Function Inverse.
5.3.7 Theorem. Existence of Inverses. Every bijection has a bijective inverse.
Proof. Let f : A → B be a given bijection. Define g : B → A as follows. For every
b ∈ B there is some a_b ∈ A such that f(a_b) = b (by surjectivity). This a_b is unique
(by injectivity). Define g(b) = a_b. This can be done for every b ∈ B, meaning that
the domain of g is the whole of B. Additionally, g must be injective (if g(b) = g(b′) = a,
then b = f(a) = b′, because f is a function) and must also be surjective (for every
a ∈ A, the unique preimage of f(a) is a, so g(f(a)) = a). The function g is so defined
that g(f(a)) = a for all a ∈ A and f(g(b)) = b for all b ∈ B.
5.3.8 Exercise. Prove that every invertible function is bijective.
5.3.9 Remark. To put it in a different way, f : A → B is invertible iff the set
{(b, a) ∈ B × A | (a, b) ∈ f}
is a function from B to A, in which case the above set is f⁻¹.
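Remark 5.3.9 suggests a direct way to compute an inverse for finite functions: swap every pair and check that the result is a function on the codomain. The Python sketch below does this under two stated assumptions (the dict representation, and that every value of f lies in the supplied codomain); the name `inverse` is hypothetical.

```python
def inverse(f: dict, codomain) -> dict:
    """Return f⁻¹ as a dict, built from the swapped set {(b, a) | (a, b) ∈ f}.

    Assumes every value of f lies in `codomain`. Raises ValueError unless the
    swapped set is a function on the codomain, i.e. unless each b has exactly
    one preimage (which is the case exactly when f is a bijection).
    """
    swapped = {(b, a) for a, b in f.items()}
    for b in codomain:
        preimages = [a for (y, a) in swapped if y == b]
        if len(preimages) != 1:
            raise ValueError(f"{b!r} has {len(preimages)} preimages")
    return dict(swapped)


f = {0: "a", 1: "b", 2: "c"}
f_inv = inverse(f, {"a", "b", "c"})
print(f_inv == {"a": 0, "b": 1, "c": 2})     # True
print(all(f_inv[f[x]] == x for x in f))      # True: f⁻¹ ◦ f = idA
print(all(f[f_inv[y]] == y for y in f_inv))  # True: f ◦ f⁻¹ = idB
```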

5.3.10 Theorem. Uniqueness of Inverses: The inverse of an invertible function
is unique.
Proof. Consider the bijection f : A → B with inverse f⁻¹ : B → A. Let g : B → A
be any function such that
f ◦ g = idB and g ◦ f = idA.
We have
g = idA ◦ g
= (f⁻¹ ◦ f) ◦ g
= f⁻¹ ◦ (f ◦ g)
= f⁻¹ ◦ idB
= f⁻¹.
5.3.11 Remark. Note that (f⁻¹)⁻¹ = f, since f satisfies all the requirements for
being the inverse of f⁻¹.
5.3.12 Exercise. Determine whether the composition g ◦ f is injective, surjective,
both, or neither, according to whether each of f and g is injective or surjective.
5.3.13 Exercise. Prove that the composition g ◦ f has an inverse if each of the
functions g and f does, in which case (g ◦ f)⁻¹ = f⁻¹ ◦ g⁻¹.
5.3.14 Exercise. Construct functions f and g, neither bijective, such that g ◦ f
is bijective.
5.3.15 Definition. If f : A → B is a bijection, we sometimes say that f is a
bijection between A and B. We may also write f : A ∼ B, A ∼^f B, or even A ∼ B
when we do not want to mention the name of the bijection.
5.3.16 Definition. Restrictions. Given the function f : A → B and a subset C
of A, the restriction of f to C is the function f |C : C → B defined by f |C(c) = f (c).
(We may also write (f |C) for f |C.)
5.3.17 Example. The inverse trigonometric function sin⁻¹ is not the inverse of the
sine function (which is not invertible). Rather, it is the inverse of the restriction
sin |[−π/2, π/2] when we take the codomain to be [−1, 1]. (We have a bijection: the
restriction is injective because it is strictly increasing, and it is surjective onto [−1, 1]
because it is continuous and takes the values −1 and 1 at the endpoints.)
5.3.18 Remarks. Some authors call surjections “onto” functions (as opposed to “into”
functions, which are not necessarily surjective), and call injections “one-one” or
“one-to-one” functions. A bijection would then be called a one-to-one onto function.
Map, mapping, operator, transformation, embedding, immersion and submersion are
but a few of the many names that are frequently used for certain kinds of functions.
Sometimes you may find the word “function” used not in the general sense adopted
here, but to mean a function of a certain type. Some use the symbol f to
denote both the function f and its forward image function f∗, context indicating
which of the two is meant. Some use the symbol f⁻¹ to denote the inverse image
function f ∗ and also the inverse function f⁻¹ (when it exists). Note that if
f : A → B is a bijection and b ∈ B then f⁻¹(b) ∈ A, while f ∗(b) is usually undefined.†
However, we do have that f ∗({b}) = {f⁻¹(b)}.
5.4 Properties of Functions

Inverse images have rather nice properties, as can be seen from the three results
given below.
5.4.1 Proposition. If f : A → B then
f ∗ (B1 ∪ B2 ) = f ∗ (B1 ) ∪ f ∗ (B2 )
for any two subsets B1 and B2 of B.

† f ∗(b) is defined not when b ∈ B, but when b ⊂ B. It is certainly possible to have b ⊂ B
for some b ∈ B. In this case both sets f ∗ (b) and f ∗ ({b}) are defined; however, they are disjoint
subsets of A (hence distinct unless they are both empty); provided, of course, that b ̸= {b}. The
existence of a set X such that X = {X} can neither be proved nor disproved using the axioms of
ZF studied in these notes. For any specific set X defined using the axioms of ZF (in particular,
for all sets X defined in these notes) we have X ̸= {X}.
Proof. The chain of biconditionals
a ∈ f ∗ (B1 ∪ B2 ) ⇔ f (a) ∈ B1 ∪ B2
⇔ f (a) ∈ B1 ∨ f (a) ∈ B2
⇔ a ∈ f ∗ (B1 ) ∨ a ∈ f ∗ (B2 )
⇔ a ∈ f ∗ (B1 ) ∪ f ∗ (B2 )
establishes the proposition.

f ∗ (B1 ∩ B2 ) = f ∗ (B1 ) ∩ f ∗ (B2 )

a ∈ f ∗ (B1 ∩ B2 ) ⇔ f (a) ∈ B1 ∩ B2
⇔ f (a) ∈ B1 ∧ f (a) ∈ B2
⇔ a ∈ f ∗ (B1 ) ∧ a ∈ f ∗ (B2 )
⇔ a ∈ f ∗ (B1 ) ∩ f ∗ (B2 )

f ∗ (B1 \ B2 ) = f ∗ (B1 ) \ f ∗ (B2 )

a ∈ f ∗ (B1 \ B2 ) ⇔ f (a) ∈ B1 \ B2
⇔ f(a) ∈ B1 ∧ f(a) ∉ B2
⇔ a ∈ f ∗(B1) ∧ a ∉ f ∗(B2)
⇔ a ∈ f ∗ (B1 ) \ f ∗ (B2 )
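The three identities for inverse images can be tested on many randomly generated finite functions. This Python sketch (a randomized sanity check under our own dict representation, with hypothetical names; not a proof) reports whether any counterexample was found.

```python
import random


def inverse_image(f: dict, subset) -> set:
    """Inverse image f ∗(B0): all a in the domain of f with f(a) ∈ B0."""
    return {a for a, b in f.items() if b in subset}


def random_subset(universe, rng: random.Random) -> set:
    """Pick each element of `universe` independently with probability 1/2."""
    return {x for x in universe if rng.random() < 0.5}


def check_inverse_image_laws(trials: int = 500, seed: int = 0) -> bool:
    """Compare both sides of the union, intersection and difference laws
    on random functions f : {0, ..., 5} → {0, ..., 3}."""
    rng = random.Random(seed)
    A, B = range(6), range(4)
    for _ in range(trials):
        f = {a: rng.choice(B) for a in A}
        B1, B2 = random_subset(B, rng), random_subset(B, rng)
        if inverse_image(f, B1 | B2) != inverse_image(f, B1) | inverse_image(f, B2):
            return False
        if inverse_image(f, B1 & B2) != inverse_image(f, B1) & inverse_image(f, B2):
            return False
        if inverse_image(f, B1 - B2) != inverse_image(f, B1) - inverse_image(f, B2):
            return False
    return True


print(check_inverse_image_laws())  # True
```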

5.4.4 Remark. Forward images do not behave so nicely; but we have the following
result.
f∗ (A1 ∪ A2 ) = f∗ (A1 ) ∪ f∗ (A2 )
for any two subsets A1 and A2 of A.

Proof. For all b in B and any two subsets A1 and A2 of A we have
b ∈ f∗(A1 ∪ A2) ⇔ (∃a ∈ A1 ∪ A2)(f(a) = b)
⇔ (∃a1 ∈ A1)(f(a1) = b) ∨ (∃a2 ∈ A2)(f(a2) = b)
⇔ b ∈ f∗ (A1 ) ∨ b ∈ f∗ (A2 )
⇔ b ∈ f∗ (A1 ) ∪ f∗ (A2 ).
5.4.6 Examples. The analogs of Propositions 5.4.2 and 5.4.3 do not hold for forward
images. For example, let f : A → B. The equality
f∗(A1 ∩ A2) = f∗(A1) ∩ f∗(A2),
where A1 and A2 are subsets of A, does not necessarily hold. To see why the last
proof does not work when we replace ∪ with ∩, note that when we do so we have
(5.9) b ∈ f∗(A1 ∩ A2) ⇔ (∃a ∈ A1 ∩ A2)(f(a) = b)
⇎ (∃a1 ∈ A1)(f(a1) = b) ∧ (∃a2 ∈ A2)(f(a2) = b)
⇔ b ∈ f∗(A1) ∧ b ∈ f∗(A2)
⇔ b ∈ f∗(A1) ∩ f∗(A2).
The second biconditional above fails because, given a1 ∈ A1 and a2 ∈ A2 with
f(a1) = f(a2) = b, we cannot guarantee that a1 ∈ A1 ∩ A2 or a2 ∈ A1 ∩ A2. However,
the inclusion f∗(A1 ∩ A2) ⊂ f∗(A1) ∩ f∗(A2) always holds, and equality holds if f is
injective. (See the exercises below.)
To discover a counterexample, consider Figure 5.8 as a guide.
(Cartoon: disjoint subsets A1 and A2 of A contain points a and a′, respectively, and f maps both a and a′ to the same point b of B.)
Figure 5.8: A Cartoon of f∗(A1 ∩ A2) = ∅ ≠ {b} = f∗(A1) ∩ f∗(A2).
In Figure 5.8 we have a ≠ a′, A1 = {a}, A2 = {a′}, f(a) = f(a′) = b. Thus, we obtain
f∗ (A1 ∩ A2 ) = f∗ ({a} ∩ {a′ }) = f∗ (∅) = ∅,
while
f∗ (A1 ) ∩ f∗ (A2 ) = f∗ ({a}) ∩ f∗ ({a′ }) = {b}.
To construct an actual counterexample, define A = {0, 1}, B = {0}, A1 = {0},
A2 = {1}, f (0) = f (1) = 0. Then,
f∗ ({0} ∩ {1}) = f∗ (∅) = ∅
while
f∗ ({0}) ∩ f∗ ({1}) = {0} ∩ {0} = {0}.
We also have
f∗ ({0} \ {1}) = f∗ ({0}) = {0},
while
f∗ ({0}) \ f∗ ({1}) = {0} \ {0} = ∅,
showing that the equality f∗(A1 \ A2) = f∗(A1) \ f∗(A2) does not hold for all
subsets A1 and A2 of A.
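The counterexample above is small enough to run directly. This Python sketch (dict representation assumed, helper name hypothetical) prints the four sets compared in the text.

```python
def forward_image(f: dict, subset) -> set:
    """Forward image f∗(A0) = {f(a) | a ∈ A0}."""
    return {f[a] for a in subset}


f = {0: 0, 1: 0}  # A = {0, 1}, B = {0}, f(0) = f(1) = 0
A1, A2 = {0}, {1}
print(forward_image(f, A1 & A2))                    # set()
print(forward_image(f, A1) & forward_image(f, A2))  # {0}
print(forward_image(f, A1 - A2))                    # {0}
print(forward_image(f, A1) - forward_image(f, A2))  # set()
```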
5.4.7 Exercise. Prove that if f : A → B then
f∗ (A1 ∩ A2 ) ⊂ f∗ (A1 ) ∩ f∗ (A2 )
for every two subsets A1 and A2 of A. Additionally, if f is injective then
f∗ (A1 ∩ A2 ) = f∗ (A1 ) ∩ f∗ (A2 )
for every two subsets A1 and A2 of A.
5.4.8 Exercise. Prove that if f : A → B is injective then
f∗ (A \ A1 ) ⊂ B \ f∗ (A1 )
for every subset A1 of A.
5.4.9 Exercise. Prove that if f : A → B is surjective then
B \ f∗ (A1 ) ⊂ f∗ (A \ A1 )
for every subset A1 of A.
Note: From the last two exercises, we obtain that if f : A → B is bijective
then
f∗(A \ A1) = B \ f∗(A1)
for every subset A1 of A.
5.5 Indexed Sets

Below, we extend some results of the previous section to arbitrary intersections
and unions.
5.5.1 Definition. We introduce indexed sets: If I is a set, A is a collection of sets
and h : I → A is a function from I into A, we call the range of h,
S = {h(i) ∈ A | i ∈ I} = ran h,
an indexed set and call I the indexing set of S. The function h itself is called the
indexing function (or simply the indexing) of S by I. Usually, we write Ai for h(i)
and write S = {Ai | i ∈ I}.
An indexing can be thought of as a way to attach “labels” (elements i ∈ I) to
elements of the collection S. Note that the indexing need not be injective, so we
may have Ai = Aj for distinct indexes i, j ∈ I. The indexing set I is arbitrary
(subject to the condition that h indexes the entire collection S): a collection S can
be indexed by different indexing sets I, J; moreover, the same indexing set I can
be used to index S in different ways (through a different indexing function).
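The union of all the sets Ai for i ∈ I is given by
$$\bigcup_{i\in I} A_i = \Big\{a \in \bigcup\mathcal{A} \;\Big|\; a \in A_i \text{ for some } i \in I\Big\} = \bigcup S,$$
while their intersection is defined by
$$\bigcap_{i\in I} A_i = \Big\{a \in \bigcup\mathcal{A} \;\Big|\; a \in A_i \text{ for all } i \in I\Big\} = \bigcap S.$$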
5.5.2 Example. Let I = {5, 6, . . .}, A = P(N) and define Ai = {0, 1, . . . , i} for all i ∈ I. Then
$$\bigcup_{i\in I} A_i = \{0, 1, 2, \ldots\} = \mathbb{N} \quad\text{and}\quad \bigcap_{i\in I} A_i = \{0, 1, 2, 3, 4, 5\}.$$
5.5.3 Remark. Every collection A of sets is an indexed set: Take I = A and h = idA. Then
A = {a ∈ A | a ∈ A} = {idA(a) ∈ A | a ∈ A} = ran idA.
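5.5.4 Proposition. If {Bi | i ∈ I} is an indexed set of subsets of B and f : A → B then
$$f^*\Big(\bigcup_{i\in I} B_i\Big) = \bigcup_{i\in I} f^*(B_i).$$
Proof. For every a ∈ A,
$$\begin{aligned}
a \in f^*\Big(\bigcup_{i\in I} B_i\Big) &\iff f(a) \in \bigcup_{i\in I} B_i\\
&\iff f(a) \in B_i \text{ for some } i \in I\\
&\iff a \in f^*(B_i) \text{ for some } i \in I\\
&\iff a \in \bigcup_{i\in I} f^*(B_i).
\end{aligned}$$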

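5.5.5 Proposition. If {Bi | i ∈ I} is an indexed set of subsets of B and f : A → B then
$$f^*\Big(\bigcap_{i\in I} B_i\Big) = \bigcap_{i\in I} f^*(B_i).$$
Proof. For every a ∈ A,
$$\begin{aligned}
a \in f^*\Big(\bigcap_{i\in I} B_i\Big) &\iff f(a) \in \bigcap_{i\in I} B_i\\
&\iff f(a) \in B_i \text{ for all } i \in I\\
&\iff a \in f^*(B_i) \text{ for all } i \in I\\
&\iff a \in \bigcap_{i\in I} f^*(B_i).
\end{aligned}$$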

5.5.6 Proposition. If {Ai | i ∈ I} is an indexed set of subsets of A and f : A → B then
$$f_*\Big(\bigcup_{i\in I} A_i\Big) = \bigcup_{i\in I} f_*(A_i).$$
Proof. First, we show that $f_*\big(\bigcup_{i\in I} A_i\big) \subset \bigcup_{i\in I} f_*(A_i)$. If $b \in f_*\big(\bigcup_{i\in I} A_i\big)$ then b = f(a) for some a in at least one of the sets Ai, i ∈ I. Therefore b is a member of at least one of the sets f∗(Ai), i ∈ I. That is, $b \in \bigcup_{i\in I} f_*(A_i)$.
Next, we show that $\bigcup_{i\in I} f_*(A_i) \subset f_*\big(\bigcup_{i\in I} A_i\big)$. If $b \in \bigcup_{i\in I} f_*(A_i)$ then b = f(a), where a ∈ Ai for some i ∈ I. That is, $a \in \bigcup_{i\in I} A_i$. This implies $b = f(a) \in f_*\big(\bigcup_{i\in I} A_i\big)$.
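Propositions 5.5.4 to 5.5.6 can be spot-checked on small finite data. In the Python sketch below, an indexing function i ↦ Bi is a `dict` from I to sets. The map f(a) = a mod 3, the sets A and B, and the two families are hypothetical choices made only for illustration. A single example is of course no proof. For contrast, recall from above that the direct image f∗ does not commute with intersections in general.

```python
def image(f, subset):
    """Direct image f_*(subset) = {f(a) | a in subset}."""
    return {f[a] for a in subset}


def preimage(f, subset):
    """Inverse image f^*(subset) = {a | f(a) in subset}."""
    return {a for a in f if f[a] in subset}


def big_union(sets):
    """Union of a family of sets (empty family -> empty set)."""
    return set().union(*sets)


def big_intersection(sets, universe):
    """Intersection of a family of subsets of `universe` (empty family -> universe)."""
    result = set(universe)
    for s in sets:
        result &= s
    return result


# A hypothetical f : A -> B with A = {0, ..., 5}, B = {0, 1, 2} and f(a) = a mod 3.
A, B = set(range(6)), {0, 1, 2}
f = {a: a % 3 for a in A}

# Indexing functions i |-> B_i and i |-> A_i on the indexing set I = {0, 1, 2}.
B_fam = {0: {0, 1}, 1: {0, 2}, 2: {0, 1, 2}}
A_fam = {0: {0, 1}, 1: {1, 4}, 2: {5}}

# Proposition 5.5.4: f^*(union of B_i) = union of f^*(B_i).
assert preimage(f, big_union(B_fam.values())) == big_union(
    preimage(f, s) for s in B_fam.values())
# Proposition 5.5.5: f^*(intersection of B_i) = intersection of f^*(B_i).
assert preimage(f, big_intersection(B_fam.values(), B)) == big_intersection(
    (preimage(f, s) for s in B_fam.values()), A)
# Proposition 5.5.6: f_*(union of A_i) = union of f_*(A_i).
assert image(f, big_union(A_fam.values())) == big_union(
    image(f, s) for s in A_fam.values())
```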
5.6 Equinumerous Sets

We would like to believe that a child displaying four fingers in response to the
question “How many pieces of candy do you have?” has established a one-to-one
correspondence between the fingers displayed and the pieces of candy: Each finger
displayed corresponds to one, and only one, piece of candy. Thus the child has
(in some restricted sense, perhaps) a notion of two collections being equinumerous,
the collection of fingers displayed and the collection of pieces of candy being talked
about.
There is no reason why the notion of two collections being equinumerous should
not be extended to arbitrary sets. This notion is then used to classify sets as being
either finite or infinite.
5.6.1 Definition. We say that two sets A and B are equinumerous (or equivalent,
or isomorphic) just in case there is a bijection between them: In symbols we write
A ∼ B. To say that A and B are not equinumerous we write A ≁ B. A bijection
is also called an isomorphism (of sets) or an equivalence.
5.6.2 Example. If two sets are isomorphic we may consider them to be the same
set for a number of practical purposes. (Mathematics is full of “equalities” that are
really isomorphisms.) For example, many use P(A) and 2^A interchangeably, since for any set A we have P(A) ∼ 2^A. (Recall that 2 = {0, 1} and that 2^A is the set of all functions f : A → 2, i.e.: all functions f : A → {0, 1}.)
A proof (definitely not the shortest) goes as follows. Define the function
φ : P(A) → 2^A : B ↦ fB,
where for every subset B of A
$$f_B(a) = \begin{cases} 1, & \text{if } a \in B,\\ 0, & \text{if } a \notin B.\end{cases}$$
Define the function
ψ : 2^A → P(A) : f ↦ Bf,
where for every f : A → {0, 1}
Bf = {a ∈ A | f(a) = 1}.
We show that these functions are inverses of each other. Note that
$$\begin{aligned} B_{f_B} &= \{a \in A \mid f_B(a) = 1\}\\ &= \{a \in A \mid a \in B\}\\ &= B \end{aligned}$$
for all B ∈ P(A). Further, for all f ∈ 2^A
$$f_{B_f}(a) = \begin{cases} 1, & \text{if } a \in B_f,\\ 0, & \text{if } a \notin B_f \end{cases} \;=\; \begin{cases} 1, & \text{if } f(a) = 1,\\ 0, & \text{if } f(a) = 0 \end{cases} \;=\; f(a)$$
for all a ∈ A.
Therefore,
$$\psi(\varphi(B)) = \psi(f_B) = B_{f_B} = B$$
for all B ∈ P(A) and
$$\varphi(\psi(f)) = \varphi(B_f) = f_{B_f} = f$$
for all f ∈ 2^A, showing that φ is a bijection between P(A) and 2^A. The function fB is called the characteristic function of the set B and is usually denoted by χB.
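For a small finite set, the maps φ and ψ can be written out and checked to be mutually inverse. This is a minimal Python sketch. It assumes subsets are represented as `frozenset`s and elements of 2^A as `dict`s from A to {0, 1}. The three-element set A and the helper `powerset` are illustrative choices, not part of the notes.

```python
from itertools import combinations, product


def powerset(A):
    """All subsets of the finite set A, as frozensets."""
    elems = sorted(A)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]


def phi(B, A):
    """phi(B) = f_B, the characteristic function of B, as a dict A -> {0, 1}."""
    return {a: (1 if a in B else 0) for a in A}


def psi(f):
    """psi(f) = B_f = {a in A | f(a) = 1}."""
    return frozenset(a for a, value in f.items() if value == 1)


A = {"x", "y", "z"}
elems = sorted(A)
# 2^A: every function A -> {0, 1}, one for each tuple of bits.
two_to_A = [dict(zip(elems, bits)) for bits in product((0, 1), repeat=len(elems))]

# psi(phi(B)) = B for every subset B, and phi(psi(f)) = f for every f in 2^A.
assert all(psi(phi(B, A)) == B for B in powerset(A))
assert all(phi(psi(f), A) == f for f in two_to_A)
print(len(powerset(A)), len(two_to_A))  # 8 8
```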
5.6.3 Definition. Finite and Infinite Sets. We say that a set is finite just in
case it is equivalent to some natural number. Otherwise, the set is called infinite.
5.6.4 Remark. We know that if n is a natural number, then m ∈ n → m ⊂ n
for every set m. (Every natural number is a transitive set: a set that includes
everything it contains.) While m ∈ n implies that m is a natural number, A ⊂ n does not imply that A is a natural number: for example, {1} ⊂ 2, yet {1} is not a natural number. So the implication cannot simply be turned around. In some sense, however, it can, as shown below.
5.6.5 Proposition. Every proper subset of a natural number n is equivalent to
some element of n.
Proof. The proof is inductive. The proposition is true for n = 0 (since 0 has no
proper subsets).
Suppose the proposition is true for some n. Let A be a proper subset of n+ .
Recall that n+ = n ∪ {n}. There are three cases to consider.
Case 1 If A is a proper subset of n then, thanks to the inductive hypothesis, A
is equivalent to some element m of n. Since n+ = n ∪ {n}, we also have m ∈ n+ .
Case 2 If A = n then the proposition holds trivially, since A ∼ n and n ∈ n+.
Case 3 If A is not a proper subset of n and A ≠ n, then A ⊄ n, so some a ∈ A does not belong to n. Since A ⊂ n ∪ {n}, this a must be n; hence n ∈ A. If we also had n ⊂ A, then n ∪ {n} ⊂ A, contradicting the hypothesis that A is a proper subset of n ∪ {n}. Therefore, there is a natural number k in n but not in A.
Define f : A → n by
$$f(i) = \begin{cases} i, & \text{if } i \neq n,\\ k, & \text{if } i = n.\end{cases}$$
This function is injective (since k ∉ A, the value k is taken only at i = n) and is therefore a bijection between A and f∗(A).
If f∗(A) = n then f is a bijection between A and n, and n ∈ n+. If f∗(A) ≠ n then f∗(A) is a proper subset of n. By the induction hypothesis, f∗(A) is equivalent to some element m of n. Therefore, A ∼ f∗(A) ∼ m. Since m ∈ n implies m ∈ n+, we are finished.
5.6.6 Proposition. No natural number n is equivalent to a proper subset of itself.
Proof. The proof is inductive. The proposition is true for n = 0, since 0 has no
proper subsets.
Suppose that the proposition is true for some n. Let A ⊂ n+ and let f : n+ → A be a bijection. We must show that A = n+.
Either n ∈ A or n ∉ A. Assume first n ∈ A. Composing f with the bijection of A that interchanges n and f(n) (and fixes every other element), we may assume that f(n) = n. Then f restricts to a bijection n+ \ {n} ∼ A \ {n}, that is, n ∼ A \ {n}. However, A \ {n} ⊂ n+ \ {n} = n, so by the inductive hypothesis A \ {n} = n, and therefore A = n ∪ {n} = n+.
Assume now n ∉ A. Then A ⊂ n. The restriction f|n is a bijection between n and a subset B of A. Since B ⊂ A and A ⊂ n, we have B ⊂ n. By the inductive hypothesis applied to f|n we have B = n. Since n ∉ n and B = (f|n)∗(n), the injectivity of f implies f(n) ∉ B. On the other hand f(n) ∈ A, contradicting A ⊂ n = B.
Therefore, the case n ∉ A cannot occur, and A = n+.
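A brute-force search can confirm Proposition 5.6.6 for the first few natural numbers. This is only a finite sanity check, not a proof: the induction above is what covers every n. The sketch assumes that Python integers 0, ..., n−1 stand in for the elements of the natural number n. It also uses the fact that a bijection onto a subset is in particular an injection into it.

```python
from itertools import combinations, product


def has_injection(n, target):
    """Is there an injective function {0, ..., n-1} -> target? (brute force)"""
    return any(len(set(values)) == n
               for values in product(sorted(target), repeat=n))


def equivalent_to_no_proper_subset(n):
    """True if {0, ..., n-1} admits no bijection onto a proper subset of itself."""
    for size in range(n):                      # proper subsets have fewer than n elements
        for subset in combinations(range(n), size):
            if has_injection(n, subset):       # a bijection would in particular be injective
                return False
    return True


print([equivalent_to_no_proper_subset(n) for n in range(6)])
# [True, True, True, True, True, True]
```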
5.6.7 Proposition. No finite set is equivalent to a proper subset of itself.
Proof. The proposition is true for the empty set ∅, since ∅ has no proper subsets.
Let A be a nonempty finite set and B a proper subset of A. We suppose there is
a bijection f : A ∼ B to obtain a contradiction.
Since A is finite, there is a bijection g : A ∼ n for some n ∈ N. Then, g∗ (B) is
a proper subset of n.
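Define the injection i : B → A : b ↦ b. We have
$$n \xrightarrow{\;g^{-1}\;} A \xrightarrow{\;f\;} B \xrightarrow{\;i\;} A \xrightarrow{\;g\;} n,$$
where the arrows labeled g⁻¹, f and g are bijections.
The function φ = g ◦ i ◦ f ◦ g⁻¹ is injective, since it is a composition of injections.
We have
$$\begin{aligned}
\varphi_*(n) &= (g \circ i \circ f \circ g^{-1})_*(n)\\
&= (g_* \circ i_* \circ f_* \circ (g^{-1})_*)(n)\\
&= (g_* \circ i_* \circ f_*)(A)\\
&= (g_* \circ i_*)(B)\\
&= g_*(B),
\end{aligned}$$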
showing that φ∗ (n), the range of φ, is a proper subset of n. Since every injection
ψ : X → Y always defines a bijection between X and ψ∗ (X), φ defines a bijection
between n and a proper subset (g∗ (B)) of n. We have a contradiction, since no
natural number n is equivalent to a proper subset of n.
5.6.8 Corollary. The set N of natural numbers is infinite.
Proof. The function f : N → N \ {0} : n ↦ n+ is a bijection between N and a
proper subset (N \ {0}) of N. Therefore, N cannot be finite.
5.6.9 Remark. Hilbert’s Hotel. Since infinite sets are not part of our usual perception of physical reality, we may obtain counterintuitive results when we try to use them in connection with tangible, everyday objects.
As an example, consider the celebrated Hilbert’s Hotel, which is always full, yet always has room for another guest without the need to share a room.
Hilbert’s Hotel has infinitely many rooms, numbered 0, 1, 2 and so on. The hotel is always full. When a new guest arrives, the person in room n is asked to move to room n+, leaving room 0 available for the new guest.
The room shift is defined by f : N → N \ {0} : n ↦ n+.
5.6.10 Proposition. A set can be equivalent to at most one natural number.
Proof. If A is a set, m and n are natural numbers, A ∼ m and A ∼ n, then m ∼ n. If m ≠ n then either m ∈ n or n ∈ m.
If m ∈ n then m ⊂ n (since every natural number is a transitive set). Hence, m is a proper subset of n (since m ≠ n). Therefore n is equivalent to a proper subset of itself (namely m), contradicting Proposition 5.6.6.
Since the roles of m and n can be interchanged, n ∈ m also leads to a contradiction. Therefore, m = n.
5.6.11 Exercise. Prove that if B ∩ C = ∅ then $A^{B\cup C} \sim A^B \times A^C$. Hint: Note that B ∩ C = ∅ guarantees that if $f \in A^B$ and $g \in A^C$ then the union
f ∪ g = {(x, f(x)) | x ∈ B} ∪ {(x, g(x)) | x ∈ C}
is a function. In fact, h = f ∪ g : B ∪ C → A is given by
$$h(x) = \begin{cases} f(x), & x \in B,\\ g(x), & x \in C.\end{cases}$$
In other words, we have f = h|B and g = h|C.
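The bijection suggested by the hint is h ↦ (h|B, h|C), and its inverse is (f, g) ↦ f ∪ g. Both directions can be tested on small sets. In this Python sketch, functions are `dict`s. The helpers `all_functions`, `restrict`, `split` and `glue`, and the particular sets A, B, C, are hypothetical names and choices made for illustration. Merging two dicts with disjoint key sets is exactly the union f ∪ g of their graphs.

```python
from itertools import product


def all_functions(domain, codomain):
    """The set codomain^domain: every function domain -> codomain, as a dict."""
    dom = sorted(domain)
    return [dict(zip(dom, values))
            for values in product(sorted(codomain), repeat=len(dom))]


def restrict(h, S):
    """The restriction h|S."""
    return {x: h[x] for x in S}


A = {0, 1, 2}
B = {"b1", "b2"}
C = {"c1"}          # disjoint from B, as the exercise requires


def split(h):
    """h |-> (h|B, h|C), a map from A^(B ∪ C) to A^B × A^C."""
    return restrict(h, B), restrict(h, C)


def glue(f, g):
    """(f, g) |-> f ∪ g, which is a function because B ∩ C = ∅."""
    return {**f, **g}


functions_on_union = all_functions(B | C, A)
assert all(glue(*split(h)) == h for h in functions_on_union)
assert all(split(glue(f, g)) == (f, g)
           for f in all_functions(B, A) for g in all_functions(C, A))
print(len(functions_on_union),
      len(all_functions(B, A)) * len(all_functions(C, A)))  # 27 27
```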

5.6.12 Exercise. Prove that $(A \times B)^C \sim A^C \times B^C$.
5.6.13 Exercise. Prove that $(A^B)^C \sim A^{B \times C}$.
5.7 Relations
When we think about formalizing the notion that “a is the mother of b” we can
think of a collection of pairs (a, b), where a belongs to the set of all women and b
belongs to the set of all human beings. (We are talking about humans.) Not all
women are mothers and a mother may have more than one child. This point serves
to illustrate why relations are more general than functions.
What we call a relation below should more properly be called a binary relation.
(There are binary, ternary, quaternary relations and so on. An example of a ternary
relation is “a and b are the parents of c.”) Since we shall only use binary relations,
we shall call them simply “relations.”
5.7.1 Definition. Relations: If A and B are sets, any subset R of A × B is called
a relation from A to B. When A = B we say that R is a relation in A. Instead
of writing (a, b) ∈ R we may write a R b. The domain of R, written dom R, is the
subset of A given by
dom R = {a ∈ A | a R b for some b ∈ B}.
The range of R, written ran R, is the subset of B given by
ran R = {b ∈ B | a R b for some a ∈ A}.
5.7.2 Examples.
(1) The empty set ∅ is the relation with domain ∅ and range ∅.
(2) The relation R from a set A to its power set P(A) defined by a R B ⇔ a ∈ B has domain A and range P(A) \ {∅}. (Every element a of A belongs to some nonempty subset B of A, for instance {a}; every nonempty subset of A contains some element of A, while ∅ contains none.)
(3) Let A = {0, 1, 2, 3}, R = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}. Then, R is the usual “less than” relation in A. For example, (1, 3) ∈ R because 1 < 3 and (3, 2) ∉ R because 3 ≮ 2. The domain of R is {0, 1, 2} (no b ∈ A satisfies 3 < b) and the range is {1, 2, 3}.
(4) The relation R in a set A defined by a R b ⇔ a = b has domain and range equal to A.
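Treating a relation literally as a set of ordered pairs makes dom R and ran R one-line computations. In this Python sketch, pairs are tuples, and Python's built-in `<` on integers stands in for the “less than” relation of Example (3). The helper names `dom` and `ran` simply mirror the notation of Definition 5.7.1. Running it shows that the domain of the relation in (3) is {0, 1, 2}, since no b ∈ A satisfies 3 < b.

```python
from itertools import combinations


def dom(R):
    """dom R = {a | a R b for some b}."""
    return {a for (a, b) in R}


def ran(R):
    """ran R = {b | a R b for some a}."""
    return {b for (a, b) in R}


# Example (3): the "less than" relation in A = {0, 1, 2, 3}.
A = {0, 1, 2, 3}
R = {(a, b) for a in A for b in A if a < b}
print(sorted(dom(R)), sorted(ran(R)))  # [0, 1, 2] [1, 2, 3]

# Example (2): a R B iff a ∈ B, from A to its power set P(A).
subsets = [frozenset(c) for r in range(len(A) + 1)
           for c in combinations(sorted(A), r)]
S = {(a, B) for a in A for B in subsets if a in B}
print(dom(S) == A, ran(S) == set(subsets) - {frozenset()})  # True True
```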
5.7.3 Definition. Reflexivity, Symmetry and Transitivity: A relation R in A is called
• reflexive if a R a for all a ∈ A,
• symmetric if a R b → b R a for all a, b ∈ A,
• transitive if a R b ∧ b R c → a R c for all a, b, c ∈ A.
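On a finite set, each of the three properties is a finite check over pairs or composable pairs. The Python sketch below tests them on three relations in {0, ..., 4}. Python's built-in comparisons on integers stand in for the orderings that the notes define only later. The function names are ours.

```python
from itertools import product


def is_reflexive(R, A):
    """a R a for all a in A."""
    return all((a, a) in R for a in A)


def is_symmetric(R):
    """a R b implies b R a."""
    return all((b, a) in R for (a, b) in R)


def is_transitive(R):
    """a R b and b R c imply a R c."""
    return all((a, c) in R for (a, b) in R for (b2, c) in R if b == b2)


A = range(5)
less = {(a, b) for a, b in product(A, repeat=2) if a < b}
less_eq = {(a, b) for a, b in product(A, repeat=2) if a <= b}
equal = {(a, a) for a in A}

for name, R in [("<", less), ("<=", less_eq), ("=", equal)]:
    print(name, is_reflexive(R, A), is_symmetric(R), is_transitive(R))
# < False False True
# <= True False True
# = True True True
```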

5.7.4 Examples.
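(1) A relation R where a R b means that a is a sibling of b is symmetric, but not reflexive. Strictly speaking, it is not transitive either, since a R b and b R a would give a R a. It is transitive only in the restricted sense that a R b, b R c and a ≠ c imply a R c.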
(2) A relation R where a R b means that a is a half-sibling of b is symmetric, but
not reflexive or transitive. Note: It may happen that if a is a half-sibling of b
and b is a half-sibling of c that a and c are also half-siblings, or even siblings.
Nevertheless, neither is guaranteed.
(3) A relation R where a R b means that a is at least as tall as b is reflexive and
transitive, but not symmetric.
(4) A relation R where a R b means that a is taller than b is transitive, but not
reflexive or symmetric.
(5) The relation of equality on a set is reflexive, symmetric and transitive.
(6) The relation R = A × B is the largest relation from A to B.
(7) The empty set ∅ is the smallest relation from any set to any other set.
(8) Let L denote the collection of all lines in the plane and let ℓ R ℓ′ denote that
ℓ is parallel to ℓ′ . If we agree that “parallel” does not rule out the possibility
that the two lines are coincident (that the two lines are the same line), then
R is reflexive, symmetric and transitive.
5.7.5 Example. Ordering The Natural Numbers The relations < and ≤ in
N are defined by
m < n ↔ m ∈ n,
m ≤ n ↔ m < n ∨ m = n.
We may write n > m for m < n and n ≥ m for m ≤ n.

Let m and n be natural numbers. Recall that a natural number is a transitive set (it includes everything it contains), i.e.: m ∈ n → m ⊂ n. Recall also that a natural number cannot be a subset of one of its elements. Hence, if m ⊂ n then n ∉ m (otherwise m would be a subset of one of its members, namely n). Therefore, by trichotomy (either m ∈ n, or m = n, or n ∈ m),
m ⊂ n ∧ m ≠ n ↔ m ∈ n.
To summarize,
m ∈ n ↔ m is a proper subset of n.
It is now a simple matter to verify that < is transitive, but neither reflexive nor symmetric, while ≤ is reflexive and transitive, but not symmetric.
5.7.6 Remark. In the above example, the relations < and ≤ defined in N are
examples of orderings of N. These relations are discussed in some detail in Note 6.
5.7.7 Definition. Equivalence Relations: A relation that is reflexive, symmet-
ric and transitive is called an equivalence relation.
5.7.8 Definition. Equivalence Classes: Let R be an equivalence relation on a
set A. For every a ∈ A, define the equivalence class [a]R (read: “the equivalence
class of a under R” or “the equivalence class with representative a”) by
[a]R = {b ∈ A | b R a}.
Each equivalence class [a]R is a subset of A, i.e.: [a]R is an element of P(A).

Define A/R (read: “A modulo R” or “A mod R”), a subset of P(A), by
A/R = {[a]R ∈ P(A) | a ∈ A}.
Thus, A/R is the set whose members are all equivalence classes [a]R corresponding
to every a ∈ A.
5.7.9 Proposition. Any representative of an equivalence class uniquely identifies
the equivalence class. That is, if R is an equivalence relation in A then aR b implies
[a]R = [b]R for all a, b ∈ A.
Proof. Suppose R is an equivalence relation in A; a, b ∈ A; a R b. Then, for every
c ∈ A,
c ∈ [a]R ⇒ aRc
⇒ aRb∧aRc
⇒ bRa∧aRc
⇒ bRc
⇒ c ∈ [b]R
and
c ∈ [b]R ⇒ bRc
⇒ aRb∧bRc
⇒ aRc
⇒ c ∈ [a]R ,
showing that [a]R = [b]R .

5.7.10 Proposition. Distinct equivalence classes are disjoint. That is, if R is an equivalence relation in A then [a]R ≠ [b]R ⇒ [a]R ∩ [b]R = ∅ for all a, b ∈ A.
Proof. Let R be an equivalence relation in A; a, b ∈ A; [a]R ≠ [b]R. Suppose that c ∈ [a]R and c ∈ [b]R. Then c R a and c R b. Thanks to symmetry, we have a R c and c R b. Thanks to transitivity, we have a R b, which implies [a]R = [b]R by Proposition 5.7.9, a contradiction. Hence, [a]R and [b]R are disjoint.
5.7.11 Remarks. Equivalence relations are related to the notion of a partition of a set. In fact, every equivalence relation in a nonempty set defines a partition on that set and vice versa, as we show below.
5.7.12 Definition. Partitions. A partition of a set A is a collection P of disjoint
nonempty subsets of A whose union is A. In other words, given a set A and a
collection P of nonempty subsets of A, P is a partition of A just in case
(∀S ∈ P)(∀S′ ∈ P)(S ≠ S′ → S ∩ S′ = ∅)
and ∪P = A.
5.7.13 Theorem. An equivalence relation R in a nonempty set A defines a par-
tition A/R of A.
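Proof. Suppose R is an equivalence relation in A ≠ ∅. Then A/R ≠ ∅. Each class [a]R is nonempty, since a ∈ [a]R by reflexivity; for the same reason every a ∈ A lies in some class, so
$$\bigcup (A/R) = \bigcup_{a\in A} [a]_R = A.$$
Since distinct elements of A/R are also disjoint (Proposition 5.7.10), A/R is a partition of A.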

5.7.14 Theorem. Every partition of a set defines an equivalence relation on that
set.
Proof. Let P be a partition of A. Then every a ∈ A belongs to exactly one Sa ∈ P: to at least one because ∪P = A, and to at most one because the elements of P are disjoint. Define the relation R in A by a R b ↔ b ∈ Sa. Clearly, a R a for every a ∈ A. Further, if a R b then a and b both belong to Sa, so Sb = Sa and hence b R a. Finally, if a R b and b R c, then Sa = Sb = Sc, so c ∈ Sa, showing that a R c.
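Theorems 5.7.13 and 5.7.14 describe two constructions: from a relation to its classes, and from a partition back to a relation. Both can be computed on a finite set. The Python sketch below uses the parity relation of Example 5.7.17, restricted (our assumption) to {0, ..., 9}. It relies on Python's `%`, although the notes have not yet defined arithmetic. Classes are `frozenset`s so that A/R can itself be a set. The helper names are hypothetical.

```python
def classes(R, A):
    """A/R: the set of equivalence classes [a]_R = {b in A | b R a}."""
    return {frozenset(b for b in A if (b, a) in R) for a in A}


def is_partition(P, A):
    """Nonempty, pairwise disjoint blocks whose union is A."""
    blocks = list(P)
    nonempty = all(blocks)
    disjoint = all(S.isdisjoint(T)
                   for i, S in enumerate(blocks) for T in blocks[i + 1:])
    return nonempty and disjoint and set().union(*blocks) == set(A)


def relation_from_partition(P, A):
    """a R b iff b lies in the block S_a that contains a."""
    block_of = {a: S for S in P for a in S}
    return {(a, b) for a in A for b in A if b in block_of[a]}


A = range(10)
same_parity = {(m, n) for m in A for n in A if m % 2 == n % 2}

quotient = classes(same_parity, A)
print(sorted(sorted(block) for block in quotient))
# [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
assert is_partition(quotient, A)                             # Theorem 5.7.13
assert relation_from_partition(quotient, A) == same_parity   # Theorem 5.7.14
```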
5.7.15 Remarks. If you are even thinking of turning the notion “A is equinu-
merous with B” into an equivalence relation, stop! We cannot take the Cartesian
product of the “set of all sets” with itself, because the “set of all sets” does not
exist.
Equivalence relations will be put to use in the construction of the integers and the rationals.
5.7.16 Example. Even though, as pointed out above, set isomorphism is not an equivalence relation among all sets, it is one among all subsets of a given set. That is, for any set A, define the equivalence relation ≐ in P(A) by
a ≐ b ↔ a ∼ b
for all a and b in P(A) (for all subsets a and b of A). Recall that a ∼ b means that there is a bijection between a and b.
The equivalence classes of this relation, the elements of P(A)/≐, are the collections of mutually equinumerous subsets of A. In other words, the equivalence class [a]≐ is the set of all subsets of A that are equinumerous with a. We can define the cardinal number of a to be [a]≐. For example, if the natural number n is a subset of A then [n]≐ is the collection of all subsets of A with precisely n elements. Unfortunately, we cannot define the cardinal number of just any set in this fashion.
5.7.17 Example. Suppose that odd and even have the usual meaning in N. (Recall that we have yet to define arithmetic in N.) Define the equivalence relation ≐ in N by
m ≐ n ↔ m and n are both even or both odd
for all m and n in N.
There are precisely two equivalence classes under this relation. We have
$$[n]_{\doteq} = \begin{cases} \text{all even natural numbers} & \text{if } n \text{ is even},\\ \text{all odd natural numbers} & \text{if } n \text{ is odd}.\end{cases}$$
For example:
[3]≐ = [7]≐ = [1]≐ = [17]≐ = {n ∈ N | n is odd}
and
[8]≐ = [12]≐ = [100]≐ = [0]≐ = {n ∈ N | n is even}.
In other words, the set N/≐ has precisely two elements:
$$\mathbb{N}/{\doteq} = \big\{\{n \in \mathbb{N} \mid n \text{ is even}\},\ \{n \in \mathbb{N} \mid n \text{ is odd}\}\big\} = \big\{[0]_{\doteq},\ [1]_{\doteq}\big\}.$$
5.7.18 Remarks. From Relations to Functions: A function is a special kind
of relation. Recall that a relation R from A to B is any subset of the Cartesian
product A × B. We say that R is univalent at a ∈ A if
(∀b ∈ B)(∀b′ ∈ B)[(a, b) ∈ R ∧ (a, b′ ) ∈ R → b = b′ ],
i.e.: given a ∈ A, there is at most one ordered pair (a, b) ∈ R. The relation R is a
function from A to B precisely when
• the domain of R is A and
• R is univalent at every a ∈ A.
If both of the above conditions hold, for every a ∈ A there is exactly one b ∈ B
such that a R b, in which case we write b = R(a).
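The two conditions (dom R = A, and univalence at every a ∈ A) together say that each a has exactly one partner b. That is a direct count on a finite relation. This Python sketch uses hypothetical sets A, B and relations R1, R2, R3 chosen only to exhibit each way of failing. The helper names are ours.

```python
def is_function(R, A, B):
    """R ⊂ A × B is a function from A to B iff dom R = A and R is univalent everywhere."""
    if not all(a in A and b in B for (a, b) in R):
        return False  # R is not a relation from A to B at all
    # Exactly one b per a: at least one (dom R = A) and at most one (univalence).
    return all(len({b for (x, b) in R if x == a}) == 1 for a in A)


def value_at(R, a):
    """R(a): the unique b with a R b (assumes R is a function)."""
    (b,) = {b for (x, b) in R if x == a}
    return b


A, B = {0, 1, 2}, {"p", "q"}
R1 = {(0, "p"), (1, "q"), (2, "p")}
R2 = R1 | {(0, "q")}              # not univalent at 0
R3 = {(0, "p"), (1, "q")}         # domain is {0, 1}, not A
print(is_function(R1, A, B), is_function(R2, A, B), is_function(R3, A, B))
# True False False
print(value_at(R1, 1))  # q
```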
5.8 The Recursion Theorem

The induction principle is the tool we use for proving things by induction. For
defining things by induction we need a new tool: the recursion theorem.
What should an inductive definition look like? Consider the following example.
Fix m ∈ N. We want to define the function sm : N → N : n ↦ m + n. Note that “+” has no meaning yet; we want to define addition in N. The idea is simple: Define the successor function σ : N → N : n ↦ n+. To compute m + n, apply the successor function n times to m. What’s wrong with the following attempt to define the desired function sm?
For m ∈ N define inductively the function sm : N → N by
sm (0) = m
and
sm (k + ) = σ(sm (k))
for each k ∈ N. For example,
sm (3) = sm (2)+ = (sm (1)+ )+ = ((sm (0)+ )+ )+ = ((m+ )+ )+ .
The unofficial “m + 3” then becomes official as sm (3).
What is wrong with the above definition? What is very wrong is that we do not
know that sm exists. (What does “define inductively” mean?) The reasoning is
correct, provided we can be assured of the existence of the function sm . This is
precisely what the recursion theorem does: It ensures the existence of the function
sm in this case and of something analogous to it in more general situations as well.
For fixed m ∈ N, what are the required ingredients? We are given the set N, an element m ∈ N, and a function σ : N → N. We need the existence of a unique function sm : N → N such that
• sm(0) = m and
• sm(n+) = σ(sm(n))
for all n ∈ N.
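In a programming language, the iteration that the recursion theorem justifies is simply taken for granted. So the following Python sketch illustrates what sm computes; it does not replace the theorem. It assumes that Python integers stand in for N and that `k + 1` stands in for the successor. The names `recursively_defined` and `successor` are ours. For any starting value a and map f, the returned phi satisfies phi(0) = a and phi(n+) = f(phi(n)).

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def recursively_defined(a: T, f: Callable[[T], T]) -> Callable[[int], T]:
    """Return phi with phi(0) = a and phi(n+) = f(phi(n)), computed by iterating f n times."""
    def phi(n: int) -> T:
        value = a
        for _ in range(n):
            value = f(value)
        return value
    return phi


def successor(k: int) -> int:
    """Stand-in for sigma : n |-> n+ on Python integers."""
    return k + 1


s_7 = recursively_defined(7, successor)   # plays the role of s_m for m = 7
print([s_7(n) for n in range(4)])         # [7, 8, 9, 10]
```

Any set A, element a and map f : A → A can be plugged in. For instance, `recursively_defined("a", lambda s: s + "b")(3)` gives `"abbb"`.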
5.8.1 Theorem. Recursion Theorem. Given any set A, any element a ∈ A,
and any function f : A → A, there exists a unique function φ : N → A such that
• φ(0) = a and
• φ(n+ ) = f (φ(n))
for all n ∈ N.
Proof. If such a function φ exists, where do we look for it? The function φ must
be a set of ordered pairs (n, b) ∈ N × A with certain properties. The requirement
φ(0) = a is another way of writing (0, a) ∈ φ. The requirement
(5.10) φ(n+ ) = f (φ(n))
says that if φ(n) = b then φ(n+ ) = f (b), i.e.: that
(5.11) (n, b) ∈ φ ⇒ (n+ , f (b)) ∈ φ.
In other words, (5.10) implies (5.11). On the other hand, (5.11) is another way of writing
φ(n) = b ⇒ φ(n+) = f(b) ⇒ φ(n+) = f(φ(n)).
Hence, (5.11) implies (5.10). Therefore (5.10) and (5.11) are equivalent.
A function is a special kind of relation. So, φ must be a special kind of relation
R from N to A. By what has been said, to be a candidate, any such relation R
must have the property
(5.12) (0, a) ∈ R
and
(5.13) (n, b) ∈ R ⇒ (n+ , f (b)) ∈ R
for all n ∈ N and b ∈ A. Let us call any such relation R recursive (with respect to a and f).
Among all relations from N to A, i.e.: among all elements of the power set
P(N × A), we are only interested in those that are recursive. Define
Φ = {R ∈ P(N × A) | R is recursive}
Since N × A ∈ Φ, the set Φ is nonempty.

We want to select the smallest member φ of Φ. To this end, define the set
$$\varphi = \bigcap \Phi = \bigcap_{R\in\Phi} R. \tag{5.14}$$
Note that the above is equivalent to
$$\varphi = \{(n, b) \in \mathbb{N} \times A \mid (\forall R \in \Phi)\,[(n, b) \in R]\}.$$
We prove that φ ∈ Φ. Indeed, (0, a) ∈ R for every R ∈ Φ. Therefore, (0, a) also belongs to the intersection φ. If (n, b) ∈ φ then (n, b) ∈ R for every R ∈ Φ. Since (5.13) holds for every R ∈ Φ, (n+, f(b)) ∈ R for every R ∈ Φ. Therefore, (n+, f(b)) ∈ φ also. Hence, φ satisfies (5.12) and (5.13). Therefore φ ∈ Φ.
Note that (5.14) defines φ uniquely (thanks to the axiom schema of separation).
Further, φ is minimal, i.e.:
ψ ∈ Φ ⇒ φ ⊂ ψ.
Therefore, Φ contains no proper subset of φ.
We prove that φ is a function with domain N. Define $\mathcal{N} \subset \mathbb{N}$ by
$$\mathcal{N} = \{n \in \mathbb{N} \mid n \text{ is in the domain of } \varphi \text{ and } \varphi \text{ is univalent at } n\}.$$
We prove that $\mathcal{N} = \mathbb{N}$ inductively. First, we show that $0 \in \mathcal{N}$. Then, we show that $(\forall n \in \mathbb{N})(n \in \mathcal{N} \to n^+ \in \mathcal{N})$.
We prove that $0 \in \mathcal{N}$. We have already shown that 0 is in the domain of φ. Now suppose that $0 \notin \mathcal{N}$. Then φ must fail to be univalent at 0, so there is a c ∈ A, c ≠ a, such that (0, c) ∈ φ. Define ψ = φ \ {(0, c)}. Note that ψ is a proper subset of φ and that ψ contains (0, a) (since (0, a) ∈ φ and (0, a) ≠ (0, c)). Since n+ ≠ 0 for all n ∈ N, (n+, f(x)) cannot be (0, c), so
(n, x) ∈ ψ ⇒ (n+, f(x)) ∈ ψ
for every n ∈ N and x ∈ A. Therefore, ψ ∈ Φ and ψ is a proper subset of φ, contradicting the minimality of φ. We conclude that $0 \in \mathcal{N}$.
Suppose that $m \in \mathcal{N}$. Then, there exists a unique b ∈ A such that (m, b) ∈ φ. Further, property (5.13) implies that (m+, f(b)) ∈ φ, so φ is defined at m+. To prove that $m^+ \in \mathcal{N}$, suppose the contrary. Then, for some c ∈ A, c ≠ f(b), we have (m+, c) ∈ φ. Define ψ = φ \ {(m+, c)}. Note that ψ is a proper subset of φ. Since m+ ≠ 0, ψ still contains (0, a), so ψ satisfies property (5.12). We claim that ψ also satisfies property (5.13). Let (n, x) ∈ ψ. If n+ ≠ m+, then (n+, f(x)) ∈ φ differs from (m+, c), so (n+, f(x)) ∈ ψ. If n+ = m+, then n = m (the successor function is injective), and x = b because b is the unique element with (m, b) ∈ φ; since f(b) ≠ c, we get (m+, f(x)) = (m+, f(b)) ∈ ψ. Therefore, ψ ∈ Φ and ψ is a proper subset of φ, contradicting the minimality of φ. We conclude that $m^+ \in \mathcal{N}$.
Hence, thanks to the induction principle, $\mathcal{N} = \mathbb{N}$. Therefore, φ is a function φ : N → A with the stated properties. Further, φ is unique: any function ψ : N → A with the stated properties is a recursive relation, so φ ⊂ ψ, and two functions with domain N, one of which includes the other, are equal.
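The construction of φ as the intersection of all recursive relations can be carried out literally in a finite caricature. We truncate N at a cutoff (our assumption), so that P(N × A) becomes small enough to enumerate. Only pairs below the cutoff must satisfy (5.13). In this Python sketch, the set A, the element a, the map f and the cutoff `LAST` are hypothetical choices. The intersection turns out to be the graph of a function, as the proof predicts.

```python
from itertools import combinations

A = (0, 1, 2)
a = 0


def f(x):
    """An arbitrary self-map of A, chosen only for illustration."""
    return (x + 1) % 3


LAST = 3  # truncate N to {0, 1, 2, 3}: a finite stand-in for N
pairs = [(n, b) for n in range(LAST + 1) for b in A]


def is_recursive(R):
    """(5.12), and (5.13) required only for n below the cutoff LAST."""
    return (0, a) in R and all((n + 1, f(b)) in R for (n, b) in R if n < LAST)


# Phi: every recursive relation among the 2**12 subsets of {0, ..., LAST} × A.
Phi = [R for size in range(len(pairs) + 1)
       for R in map(set, combinations(pairs, size)) if is_recursive(R)]

phi = set.intersection(*Phi)   # (5.14): the intersection of all members of Phi
print(sorted(phi))             # [(0, 0), (1, 1), (2, 2), (3, 0)]
print(phi in Phi)              # True: phi is itself recursive, hence the smallest member
```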
Certain slight generalizations of the Recursion Theorem are valid and useful.
The first generalization allows f to depend on n.
5.8.2 Theorem. Given any set A, any element a ∈ A and, for each n ∈ N, a
function fn : A → A, there is a unique function φ : N → A such that
• φ(0) = a, and
• φ(n+ ) = fn (φ(n)) for all n ∈ N.
Proof. The proof, a variation of that of Theorem 5.8.1, is left as an exercise.
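Allowing f to depend on n is what makes definitions such as the factorial possible. The Python sketch below passes the index explicitly: f(n, x) stands for fn(x). It uses Python's built-in multiplication, which the notes have not yet defined; that is an assumption made for illustration only. The function name is ours.

```python
def recursively_defined_indexed(a, f):
    """phi(0) = a and phi(n+) = f(n, phi(n)); here f(n, x) stands for f_n(x)."""
    def phi(n):
        value = a
        for k in range(n):
            value = f(k, value)
        return value
    return phi


# a = 1 and f_n(x) = (n + 1) * x give the factorial: phi(n+) = n+ * phi(n).
factorial = recursively_defined_indexed(1, lambda n, x: (n + 1) * x)
print([factorial(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
```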

The above suggests another mild generalization of the recursion theorem, namely
allowing the functions fn to have a different domain and codomain.
5.8.3 Theorem. Let a family $\mathcal{A} = \{A_n \mid n \in \mathbb{N}\}$ indexed by the set N of natural numbers be given. Fix any element a0 ∈ A0 and, for each n ∈ N, let a function $f_n : A_n \to A_{n^+}$ be given. Then there exists a unique function $\varphi : \mathbb{N} \to \bigcup_{n\in\mathbb{N}} A_n$ such that
1. φ(0) = a0,
2. φ(n) ∈ An for all n ∈ N, and
3. φ(n+) = fn(φ(n)) for all n ∈ N.
Proof. The proof is yet another slight variation of that of Theorem 5.8.1. It is left
as an exercise.
5.8.4 Remark. In the language of Note 7 (Definition 7.2.7), property 2. above
says that φ is a choice function for the indexed family A = {An }.
Finally, we have a strictly finite, “bounded” version of Theorem 5.8.3.
5.8.5 Theorem (Bounded Recursion Theorem). Fix a natural number n. Let a family $\mathcal{A} = \{A_m \mid m \in n^+\}$ indexed by the set n+ be given. Let an element a0 ∈ A0 and, for each m ∈ n, a function $f_m : A_m \to A_{m^+}$ be given. Then there exists a unique function $\varphi : n^+ \to \bigcup_{m\in n^+} A_m$ such that
$$\varphi(0) = a_0, \tag{5.15}$$
$$\varphi(m) \in A_m \quad \text{for all } m \in n^+, \tag{5.16}$$
$$\varphi(m^+) = f_m(\varphi(m)) \quad \text{for all } m \in n. \tag{5.17}$$
Proof. The proof is left as an exercise.
5.8.6 Exercise. Show that Theorem 5.8.5 implies Theorem 5.8.3.
5.8.7 Remark. Other generalizations of the Recursion Theorem are possible. In
Note 6 we will state a version of the rather more sophisticated theorem of transfinite
recursion, which allows one to define recursively functions whose domain is an arbitrary well-ordered set.
5.9 Cantor and Schröder-Bernstein Theorems

5.9.1 Definition. Given any two sets A and B we say that A ⪯ B (read: “A is
dominated by B” or “B dominates A”) just in case there is an injection from A to
B. Additionally, we say that A ≺ B (read: “A is strictly dominated by B” or “B strictly dominates A”) just in case A ⪯ B and A ≁ B. In other words: A ≺ B means that there is an injection from A to B and that there is no bijection between A and B.
5.9.2 Remarks. The following example is at the intuitive level, since we do not yet know what 2n means for a natural number n. We can remove infinitely many elements from an infinite set A to obtain a set B and still have A ∼ B.
For example, the bijection between the natural numbers N and the even natural
numbers Ne given by n ↦ 2n shows that there are just as many natural numbers
as there are even natural numbers. In other words, the set of natural numbers is
not any “bigger” (in the sense of strict set dominance) than the set of even natural
numbers. To visualize this situation, think of the following infinite table:
N      Ne
0  ↦  0
1  ↦  2
2  ↦  4
3  ↦  6
⋮       ⋮
Strict set dominance (≺) gives us a notion of a set being bigger than another
set: A ≺ B means that B is bigger than A. Under this notion, is there a biggest
set? The answer is negative, as shown below.
5.9.3 Theorem. (Cantor) For every set A we have A ≺ P(A).
Proof. Recall that for every a ∈ A, {a} is a subset of A. Therefore, {a} is a member
of P(A). Define
f : A → P(A) : a ↦ {a}.
This function is injective, since
f (a) = f (a′ ) ⇔ {a} = {a′ } ⇔ a = a′ .
This shows that A ⪯ P(A).

To finish the proof, we show that A ≁ P(A). Suppose the contrary. Then, there is a bijection g : A → P(A). Define the subset A0 of A by
A0 = {a ∈ A | a ∉ g(a)}.
Since g is surjective, A0 = g(a0) for some a0 in A. Hence, for any a ∈ A we have
a ∈ A0 ⇔ a ∈ g(a0);
in particular,
a0 ∈ A0 ⇔ a0 ∈ g(a0).
On the other hand, from the definition of A0 we have
a0 ∈ A0 ⇔ a0 ∉ g(a0).
We have shown that
a0 ∈ g(a0) ⇔ a0 ∈ A0 ⇔ a0 ∉ g(a0),
which is a contradiction. Hence, no such bijection g exists.
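Cantor's argument is constructive. For any g : A → P(A), the set A0 = {a ∈ A | a ∉ g(a)} is a concrete subset that g misses. For a small finite A (three elements, our choice), every one of the 8³ = 512 functions g can be checked exhaustively. The Python sketch below does this, with subsets as `frozenset`s and g as a `dict`. It confirms only this finite instance; the proof above covers every set.

```python
from itertools import combinations, product

A = (0, 1, 2)
subsets = [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]


def diagonal_set(g):
    """A0 = {a in A | a not in g(a)}, for g given as a dict A -> P(A)."""
    return frozenset(a for a in A if a not in g[a])


# Check all 8 ** 3 = 512 functions g : A -> P(A): none has A0 in its range.
count = 0
for images in product(subsets, repeat=len(A)):
    g = dict(zip(A, images))
    assert diagonal_set(g) not in g.values()
    count += 1
print(count)  # 512
```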

5.9.4 Theorem. (Schröder-Bernstein) For any two sets A and B, if A ⪯ B
and B ⪯ A, then A ∼ B.
Proof. (Kleene,‡ 1952) To say that A ⪯ B means that there is an injection f : A → B. This injection defines a bijection A ∼ f∗(A) between A and the subset f∗(A) of B. (Every injection is a bijection between its domain and its range.)
‡
This is the “equivalence theorem” in Stephen Kleene’s Introduction to Metamathematics,
Van Nostrand, Princeton, 1952, pp. 11–12. Note that Kleene’s proof does not use the recursion
theorem because it is given in the background language, where an inductive definition (i.e.: the
recursion theorem) is taken for granted.
Likewise, to say that B ⪯ A means that there is an injection g : B → A. This injection defines a bijection B ∼ g∗(B) between B and the subset g∗(B) of A.
To prove the theorem, we must find a bijection h : A ∼ B. This can be done easily if we can find subsets M ⊂ A and N ⊂ B such that
f∗(M) = N and g∗(B \ N) = A \ M,
for then we can define
$$h(a) = \begin{cases} f(a) & \text{if } a \in M,\\ g^{-1}(a) & \text{if } a \in A \setminus M.\end{cases}$$
(Here g⁻¹(a) makes sense because A \ M = g∗(B \ N) ⊂ g∗(B) and g is injective.) The function h is injective: it is injective on M and on A \ M because f and g⁻¹ are, and it maps M into N and A \ M into B \ N, which are disjoint. Moreover, we have that
h∗(A) = h∗(M ∪ (A \ M))
= h∗(M) ∪ h∗(A \ M)
= N ∪ (B \ N)
= B,
showing that h is also surjective.

The proof thus hinges on finding the sets M and N mentioned above. This is something we can do with mirrors, as shown in Figure 5.9.
[Figure 5.9: Kleene’s Proof of the Schröder-Bernstein Theorem. The sets M0, M1, M2, . . . in A are carried by f onto N1, N2, N3, . . . in B, and g carries each N_{k+1} back onto M_{k+1}.]
To build the reflections shown in Figure 5.9, we use the recursion theorem and proceed as
follows. Let M0 = A \ g∗ (B). Then f defines a bijection
M0 ∼ N1 ,
where N1 = f∗ (M0 ). Now, g defines a bijection
N1 ∼ M1 ,
where M1 = g∗ (N1 ). Again, f defines a bijection
M1 ∼ N2 ,
where N2 = f∗(M1). By proceeding in this fashion, we obtain an infinite sequence of bijections
M0 ∼ N1 ∼ M1 ∼ N2 ∼ M2 ∼ · · · .
Define the sets
M = M0 ∪ M1 ∪ M2 ∪ · · · ,
N = N1 ∪ N2 ∪ N3 ∪ · · · .
We verify that the two sets M and N have the desired properties. We have:
f∗ (M ) = f∗ (M0 ∪ M1 ∪ M2 ∪ · · · )
= f∗ (M0 ) ∪ f∗ (M1 ) ∪ f∗ (M2 ) ∪ · · ·
= N1 ∪ N2 ∪ N3 ∪ · · ·
= N.
We also have:
g∗ (B \ N ) = g∗ (B) \ g∗ (N )
= g∗ (B) \ g∗ (N1 ∪ N2 ∪ N3 ∪ · · · )
= g∗(B) \ (g∗(N1) ∪ g∗(N2) ∪ g∗(N3) ∪ · · · )
= (A \ M0 ) \ (M1 ∪ M2 ∪ M3 ∪ · · · )
= A \ M.
The proof is concluded.
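Membership in M can be decided by tracing an element backwards through g⁻¹ and f⁻¹. If the trace stops at a point of A outside g∗(B), the element lies in some Mk. If it stops at a point of B outside f∗(A), the element lies in A \ M. The Python sketch below applies this to a hypothetical example: A = B = N with f = g = n ↦ 2n + 1. Both maps are injective but not surjective. For this choice every backward trace terminates, because the inverses strictly decrease. For arbitrary injections a trace can be infinite (such elements lie outside M), so this loop is not a general algorithm.

```python
def f(a):            # injection A -> B (here A = B = N), chosen for illustration
    return 2 * a + 1


def g(b):            # injection B -> A
    return 2 * b + 1


def f_inv(b):        # inverse of f on its range (odd numbers); None off the range
    return (b - 1) // 2 if b % 2 == 1 else None


def g_inv(a):        # inverse of g on its range; None off the range
    return (a - 1) // 2 if a % 2 == 1 else None


def in_M(a):
    """Is a in M = M0 ∪ M1 ∪ ...?  Trace a back through g^{-1} and f^{-1}."""
    while True:
        b = g_inv(a)
        if b is None:        # reached A minus g_*(B) = M0: the original a is in M
            return True
        a = f_inv(b)
        if a is None:        # reached B minus f_*(A): the original a is not in M
            return False


def h(a):
    """The bijection of the proof: f on M, g^{-1} on A minus M."""
    return f(a) if in_M(a) else g_inv(a)


print([h(a) for a in range(8)])  # [1, 0, 5, 7, 9, 2, 13, 3]
```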

5.9.5 Example. Is there a biggest set (in the sense of strict set dominance)? To
obtain a contradiction, suppose that a biggest set exists. That is, suppose that
there is a set B such that A ⪯ B for every set A.
Thanks to the axiom of powers, the power set of any set is a set. Hence, P(B)
is a set. Cantor’s theorem gives B ≺ P(B). On the other hand, P(B) ⪯ B by
the maximality of B. We have a contradiction. Why? Because B ≺ P(B) implies
B ⪯ P(B), which together with P(B) ⪯ B and the Schröder-Bernstein theorem
gives B ∼ P(B), contradicting B ≺ P(B), which says that B ̸∼ P(B).
5.9.6 Exercise. Instead of using Russell’s paradox argument, use Cantor’s theorem
and the Schröder-Bernstein theorem to prove that there is no set of all sets.
Foundations Note 6
Numbers and Arithmetic

July 28, 2020
Contents
6.1 The Arithmetic of the Natural Numbers
6.2 Order
6.3 The Integers
6.4 The Rationals
6.5 Algebraic Structures
6.1 The Arithmetic of the Natural Numbers

In this section we give the natural numbers N constructed in Note 4 the usual
arithmetic operations: addition, multiplication and exponentiation.
We begin by defining what we mean by a binary operation. A binary operation
∗ on a set A is a function
∗ : A × A → A.
Usually, we write a ∗ b instead of writing ∗(a, b).
The operation ∗ on the set A is
• called associative if
(a ∗ b) ∗ c = a ∗ (b ∗ c)
for every a, b and c in A, in which case we can write a∗b∗c without ambiguity
of meaning;
• called commutative if a ∗ b = b ∗ a for every a and b in A;
• said to have a unit, or identity, if there is some element u of A such that
a ∗ u = u ∗ a = a for every a ∈ A.
Note: No binary operation can have more than one unit. Indeed, if u and e are units
for the binary operation ∗ on A, then u = u ∗ e (since a = a ∗ e for all a ∈ A) and
u ∗ e = e (since u ∗ a = a for all a ∈ A); hence, u = u ∗ e = e. Note that this argument
does not assume that ∗ is commutative: the definition of a unit u itself requires
a ∗ u = u ∗ a = a for all a ∈ A.
6.1.1 Definition. Addition: Recall that
σ : N → N : k ↦ k+
is the successor function. As shown in Note 5, for every m ∈ N the recursion
theorem asserts the existence of a unique function sm : N → N such that
(1) sm (0) = m
and such that
(2) sm (k + ) = σ(sm (k)) = sm (k)+
for all k ∈ N. Define the binary operation + (addition) on N by
m + n = sm (n)
for all m and n in N. We also call m + n the sum of the numbers m and n.
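The recursive definition of addition can be mirrored in a short program. The Python sketch below is only an illustration under stated assumptions: Python's built-in non-negative integers stand in for the set-theoretic natural numbers, and the helper names `succ`, `recursion` and `add` are hypothetical. The function `recursion` plays the role of the recursion theorem by applying a step function k times, and `add(m, n)` evaluates sm(n).

```python
from typing import Callable


def succ(k: int) -> int:
    """Stand-in for the successor function sigma: k -> k+."""
    return k + 1


def recursion(base: int, step: Callable[[int], int]) -> Callable[[int], int]:
    """Return the function f with f(0) = base and f(k+) = step(f(k)).

    f(k) is computed by applying `step` to `base` exactly k times,
    which is the function whose existence the recursion theorem asserts.
    """
    def f(k: int) -> int:
        value = base
        for _ in range(k):
            value = step(value)
        return value
    return f


def add(m: int, n: int) -> int:
    """m + n := s_m(n), where s_m(0) = m and s_m(k+) = s_m(k)+."""
    s_m = recursion(m, succ)
    return s_m(n)


print(add(3, 4))  # 7
```

The value printed agrees with the hand computation of Example 6.1.3.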
6.1.2 Remark. The definition of addition above implies that
m+ = m + 1
for all m ∈ N. Eventually we aim, for the most part, to stop using successors
in favor of addition. This will become especially important once the set-theoretic
definition of “number” becomes blurred by the passage from the natural numbers
N to the integers Z, then to the rationals Q and eventually to the reals R (and
beyond…). Each of these number systems is an extension of the prior one, but does
not really contain it as a subset. In particular, the “same” natural number n is a
different set when regarded as an element of each of the above number systems.
6.1 Exercise. 1. Use the recursion theorem to prove that, for each n ∈ N, there
exists a unique function σn : N → N such that σ0 = idN and σn+ = σ ◦ σn
(where σ is the successor function).
2. Prove that σn (m) = sm (n) for all m, n ∈ N (in the notation of Defini-
tion 6.1.1). In other words, the functions σn may be used to provide an
alternative definition of the operation of addition of natural numbers.
6.2 Exercise. 1. Use the recursion theorem to prove that, for each n ∈ N, there
exists a unique function ςn : N → N such that ς0 = idN and ςn+ = ςn ◦ σ.
2. Prove or disprove: ςn (m) = m + n for all m, n ∈ N.

6.1.3 Example. We have
3 + 4 = s3 (4)
= s3 (3)+ by (2)
= (s3 (2)+ )+ by (2)
= ((s3 (1)+ )+ )+ by (2)
= (((s3 (0)+ )+ )+ )+ by (2)
= (((3+ )+ )+ )+ by (1)
= ((4+ )+ )+
= (5+ )+
= 6+
= 7.
(This example, like many more to follow, is strictly “unofficial,” since we have yet to invent
the decimal representation, or any positional representation, for numbers.)
In other words, thanks to the recursion theorem, the intuitive construction
m + n = (· · · ((m + 1) + 1) · · · ) + 1 (n additions of 1)
can be carried out in a way that makes sense within the axioms of ZF.
6.1.4 Proposition. Addition on N satisfies the following properties:
(1) (m + n)+ = m + n+ ,
(2) m + 0 = m = 0 + m,
(3) m+ + n = (m + n)+ ,
(4) (ℓ + m) + n = ℓ + (m + n),
(5) m + n = n + m;
for all ℓ, m and n in N. In particular, addition is associative and commutative
with unit 0.
Proof. We prove (1): Thanks to the definition of addition, we have
(m + n)+ = sm (n)+ = sm (n+ ) = m + n+ .
We prove (2): By definition, m + 0 = sm (0) = m. The equality 0 + m = m is
proved by induction on m. The equality holds for m = 0, since 0 + 0 = s0 (0) = 0.
Suppose it also holds for m. Then we have
0 + m+ = (0 + m)+ by (1)
= m+ by the inductive hypothesis.
It follows that 0 + m = m holds for all m.

We prove (3): The proof is by induction on n. Thanks to (2), for n = 0 we
have
m+ + 0 = m+ = (m + 0)+ .
Suppose (3) holds for n. Then,
m+ + n+ = (m+ + n)+ by (1)
= ((m + n)+ )+ by the inductive hypothesis
= (m + n+ )+ by (1),
which is (3) with n replaced by n+ . This shows that (3) holds for all n.
We prove (4): The proof is by induction on n. For n = 0, we have
(ℓ + m) + 0 = ℓ + m = ℓ + (m + 0),
so that (4) is true for n = 0. If (4) is true for n, then
(ℓ + m) + n+ = ((ℓ + m) + n)+ by (1)
= (ℓ + (m + n))+ by the inductive hypothesis
= ℓ + (m + n)+ by (1)
= ℓ + (m + n+ ) by (1),
which is (4) with n replaced by n+ . This shows that (4) holds for all n.
We prove (5): The proof is by induction on n. For n = 0, thanks to (2), we
have m + 0 = 0 + m. Suppose (5) is true for n. Then,
m + n+ = (m + n)+ by (1)
= (n + m)+ by the inductive hypothesis
= n+ + m by (3),
which is (5) with n replaced by n+ . This shows that (5) holds for all n, and the proof
is complete.
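For readers who want to see the inductions above checked mechanically, here is a Lean 4 sketch (core Lean, no libraries). It is an illustration under an explicit assumption, not the construction of these notes: it replaces the set-theoretic naturals by an inductive type `MyNat` (a hypothetical name), defines addition by recursion on the second argument as in Definition 6.1.1, and follows the proofs of parts (1), (2), (3) and (5) of Proposition 6.1.4. The theorem names are likewise hypothetical.

```lean
-- Naturals as an inductive type (a stand-in for the von Neumann naturals).
inductive MyNat where
  | zero : MyNat
  | succ : MyNat → MyNat

namespace MyNat

-- Definition 6.1.1: m + 0 = m and m + k⁺ = (m + k)⁺.
def add (m : MyNat) : MyNat → MyNat
  | .zero => m
  | .succ k => .succ (add m k)

-- Part (2), first half: m + 0 = m holds by definition.
theorem add_base (m : MyNat) : add m .zero = m := rfl

-- Part (1): m + n⁺ = (m + n)⁺ holds by definition.
theorem add_step (m n : MyNat) : add m (.succ n) = .succ (add m n) := rfl

-- Part (2), second half: 0 + m = m, by induction on m.
theorem zero_add_eq : ∀ m : MyNat, add .zero m = m
  | .zero => rfl
  | .succ m => by rw [add_step, zero_add_eq m]

-- Part (3): m⁺ + n = (m + n)⁺, by induction on n.
theorem succ_add_eq (m : MyNat) : ∀ n : MyNat, add (.succ m) n = .succ (add m n)
  | .zero => rfl
  | .succ n => by rw [add_step, succ_add_eq m n, add_step]

-- Part (5): m + n = n + m, by induction on n.
theorem add_comm_eq (m : MyNat) : ∀ n : MyNat, add m n = add n m
  | .zero => by rw [add_base, zero_add_eq]
  | .succ n => by rw [add_step, add_comm_eq m n, succ_add_eq]

end MyNat
```

Each recursive call to a theorem inside its own proof corresponds to an appeal to the inductive hypothesis in the written proof.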
6.3 Exercise. This exercise may be regarded as a continuation of Exercise 6.1.
1. Use the recursion theorem to prove that, for each n ∈ N, there exists a unique
function ςn : N → N such that ς0 = idN and ςn+ = ςn ◦ σ.
2. Prove or disprove: ςn (m) = m + n for all m, n ∈ N.

6.1.5 Proposition. The biconditional
m ∈ n ↔ m + ℓ ∈ n + ℓ
holds for all m, n and ℓ in N.
Proof. Fix m and n in N. We prove the implication
(1) m ∈ n → m + ℓ ∈ n + ℓ
for all ℓ ∈ N. The proof is by induction on ℓ. For ℓ = 0 the implication (1) follows
from Proposition 6.1.4(2), since m + 0 = m and n + 0 = n. Suppose that (1) is true
for ℓ. Then
m + ℓ ∈ n + ℓ ⇒ (m + ℓ)+ ∈ (n + ℓ)+ by Proposition 4.7 in Note 4
⇒ m + ℓ+ ∈ n + ℓ+ by Proposition 6.1.4(1),
which, together with the inductive hypothesis (1), yields (1) with ℓ replaced by ℓ+ .
We have proved (1).
We prove the implication
(2) m + ℓ ∈ n + ℓ → m ∈ n.
Recall that in Note 4 we proved that there are three mutually exclusive possibilities
for the two natural numbers m and n (trichotomy):
m ∈ n, m = n, n ∈ m.
• If m + ℓ ∈ n + ℓ, we cannot have m = n, for this would imply that m + ℓ ∈ m + ℓ,
which is impossible, since no natural number is a member of itself.
• If m + ℓ ∈ n + ℓ, we cannot have n ∈ m, for then, thanks to (1) (with m
and n reversed), we would have n + ℓ ∈ m + ℓ, which, thanks to trichotomy,
contradicts the hypothesis m + ℓ ∈ n + ℓ.
• The only remaining possibility is m ∈ n.
The proof of (2) is concluded.
Now, (1) and (2) establish the proposition.
6.1.6 Proposition. Cancellation Law for Addition: The implication
m + ℓ = n + ℓ → m = n
holds for all m, n and ℓ in N.
Proof. Suppose that
(1) m + ℓ = n + ℓ.
We use trichotomy. If m ∈ n then, by Proposition 6.1.5, we have m + ℓ ∈ n + ℓ,
which contradicts (1), again by trichotomy. If n ∈ m then, by Proposition 6.1.5, we
have n + ℓ ∈ m + ℓ, which also contradicts (1), again by trichotomy. The only
possibility left is m = n.
6.1.7 Definition. Multiplication: For each m in N define
gm : N → N : n ↦ n + m.
For every m ∈ N the recursion theorem asserts the existence of a unique function
pm : N → N such that
(1) pm (0) = 0
and such that
(2) pm (n+ ) = gm (pm (n)) = pm (n) + m
for all n ∈ N.
Define the binary operation · on N by
m · n = pm (n)
for all m and n in N. We may write mn in place of m · n.
6.1.8 Example. We have
3·4 = p3 (4)
= p3 (3) + 3 by (2)
= p3 (2) + 3 + 3 by (2)
= p3 (1) + 3 + 3 + 3 by (2)
= p3 (0) + 3 + 3 + 3 + 3 by (2)
= 0+3+3+3+3 by (1)
= 12.
Once again, we have used the recursion theorem to formalize the intuitive con-
struction
m · n = m + m + · · · + m (n summands)
within the axioms of ZF.
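Definition 6.1.7 can be traced in the same way. The Python sketch below again assumes built-in non-negative integers as stand-ins for natural numbers; the names `recursion`, `add` and `mul` are hypothetical helpers. Addition is obtained by iterating the successor, and multiplication by iterating gm starting from 0.

```python
from typing import Callable


def recursion(base: int, step: Callable[[int], int]) -> Callable[[int], int]:
    """Return f with f(0) = base and f(n+) = step(f(n)), by iterating step."""
    def f(n: int) -> int:
        value = base
        for _ in range(n):
            value = step(value)
        return value
    return f


def add(m: int, n: int) -> int:
    """m + n = s_m(n): apply the successor k -> k + 1 to m, n times."""
    return recursion(m, lambda k: k + 1)(n)


def mul(m: int, n: int) -> int:
    """m · n = p_m(n), where p_m(0) = 0 and p_m(n+) = g_m(p_m(n)) = p_m(n) + m."""
    def g_m(k: int) -> int:
        return add(k, m)
    return recursion(0, g_m)(n)


print(mul(3, 4))  # 12
```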

6.1.9 Proposition. For all m and n in N we have
(1) m · 0 = 0,
(2) m · n+ = (m · n) + m,
(3) 0 · m = 0,
(4) m+ · n = (m · n) + n.
Proof. We prove (1) and (2): Thanks to the definition of multiplication, (1)
follows from Definition 6.1.7(1), while (2) follows from Definition 6.1.7(2), since
m · n+ = pm (n+ ) = pm (n) + m = (m · n) + m.
We prove (3): The proof is by induction on m. For m = 0, we have, thanks
to Definition 6.1.7(1), 0 · 0 = p0 (0) = 0. Suppose that (3) holds for m. We have
0 · m+ = (0 · m) + 0 by (2)
= 0+0 by the inductive hypothesis
= 0,
which is (3) with m replaced by m+ . This establishes (3).

We prove (4): Fix m and proceed by induction on n. We have
m+ · 0 = 0 by (1)
= 0 + 0 by 6.1.4(2)
= (m · 0) + 0, by (1)
which is (4) with n = 0. Suppose that (4) holds for n. We have
m+ · n+ = (m+ · n) + m+ by (2)
= ((m · n) + n) + m+ by the inductive hypothesis
= (m · n) + (n + m+ ) by 6.1.4(4)
= (m · n) + (n + m)+ by 6.1.4(1)
= (m · n) + (m + n)+ by 6.1.4(5)
= (m · n) + (m + n+ ) by 6.1.4(1)
= ((m · n) + m) + n+ by 6.1.4(4)
= (m · n+ ) + n+ , by (2)
which is (4) with n replaced by n+ . We have established (4).

6.1.10 Proposition. Multiplication on N satisfies the following properties
(1) m · 1 = m,
(2) m · (n + ℓ) = (m · n) + (m · ℓ),
(3) m · (n · ℓ) = (m · n) · ℓ,
(4) m · n = n · m;
for all ℓ, m and n in N. In particular, multiplication is associative and commutative
with unit 1. Moreover, multiplication distributes over addition, as shown in (2).
Proof. We prove (1): From the definition of multiplication, we have
m · 1 = pm (1) = pm (0+ ) = pm (0) + m = 0 + m = m.
We prove (2): Fix m and n and proceed by induction on ℓ. For ℓ = 0 we have
m · (n + 0) = m · n by 6.1.4(2)
= (m · n) + 0 by 6.1.4(2)
= (m · n) + (m · 0). by 6.1.9(1)
This establishes (2) for ℓ = 0. Suppose that (2) holds for ℓ. We have
m · (n + ℓ+ ) = m · (n + ℓ)+ by 6.1.4(1)
= (m · (n + ℓ)) + m by 6.1.9(2)
= ((m · n) + (m · ℓ)) + m by the inductive hypothesis
= (m · n) + ((m · ℓ) + m) by 6.1.4(4)
= (m · n) + (m · ℓ+ ), by 6.1.9(2)
which is (2) with ℓ replaced by ℓ+ . We have established (2).

We prove (3): We fix m and n and proceed by induction on ℓ. For ℓ = 0, we
have, thanks to 6.1.9(1),
m · (n · 0) = m · 0 = 0 = (m · n) · 0,
showing that (3) holds for ℓ = 0. Suppose that (3) holds for ℓ. We have
m · (n · ℓ+ ) = m · ((n · ℓ) + n) by 6.1.9(2)
= (m · (n · ℓ)) + (m · n) by (2)
= ((m · n) · ℓ) + (m · n) by the inductive hypothesis
= (m · n) · ℓ+ , by 6.1.9(2)
which is (3) with ℓ replaced by ℓ+ . We have established (3).

We prove (4): Fix m and proceed by induction on n. For n = 0 we have
m · 0 = 0 = 0 · m, thanks to 6.1.9(1) and 6.1.9(3). This establishes (4) for n = 0.
Suppose that (4) holds for n. Then, we have
m · n+ = (m · n) + m by 6.1.9(2)
= (n · m) + m by the inductive hypothesis
= n+ · m, by 6.1.9(4)
which is (4) with n replaced by n+ . This establishes (4) for all n ∈ N.

6.1.11 Proposition. For all ℓ ≠ 0, m and n in N we have the equivalence
m ∈ n ↔ m · ℓ ∈ n · ℓ.
Proof. Fix m and n in N. First we prove
(1) (∀ℓ)(ℓ ≠ 0 → (m ∈ n → m · ℓ ∈ n · ℓ)).
Since ℓ ≠ 0 if and only if ℓ is a successor, i.e., ℓ = k+ for some k, (1) can be
rewritten
(2) (∀k)(m ∈ n → m · k+ ∈ n · k+ ).
Suppose that m ∈ n. We prove that m · k+ ∈ n · k+ for all k by induction on k.
For k = 0, we have m · 0+ = m · 1 = m and n · 0+ = n · 1 = n. Therefore, the
membership m · 0+ ∈ n · 0+ holds.
Suppose that m · k+ ∈ n · k+ holds for k. We have
m · (k+ )+ = (m · k+ ) + m by 6.1.9(2)
∈ (n · k+ ) + m by the inductive hypothesis and 6.1.5
∈ (n · k+ ) + n by 6.1.5 and commutativity, since m ∈ n
= n · (k+ )+ . by 6.1.9(2)
Since natural numbers are transitive sets, this chain yields m · (k+ )+ ∈ n · (k+ )+ .
We have established (2) and, therefore, (1).

Now we prove
(3) (∀k)(m · k + ∈ n · k + → m ∈ n).
There are three mutually exclusive possibilities for the two natural numbers m and
n (trichotomy):
m = n, n ∈ m, m ∈ n.
• If m · k+ ∈ n · k+ , we cannot have m = n, for this would imply that m · k+ ∈
m · k+ , which is impossible, since no natural number is a member of itself.
• If m · k+ ∈ n · k+ , we cannot have n ∈ m, for then, thanks to (1) (with m
and n reversed), we would have n · k+ ∈ m · k+ , which, thanks to trichotomy,
contradicts the hypothesis m · k+ ∈ n · k+ .
• The only remaining possibility is m ∈ n.
The proof of (3) is concluded.
Now, (1) and (3) establish the proposition.
6.1.12 Proposition. Cancellation Law for Multiplication: For all ℓ ≠ 0, m
and n in N we have
m · ℓ = n · ℓ → m = n.
Proof. Suppose that
(1) m · ℓ = n · ℓ.
We use trichotomy. If m ∈ n, then, by Proposition 6.1.11, m · ℓ ∈ n · ℓ, contradicting
(1), again, by trichotomy. If n ∈ m, then, by Proposition 6.1.11, n · ℓ ∈ m · ℓ,
contradicting (1), again, by trichotomy. The only possibility left is m = n.
6.1.13 Definition. Powers: For each m in N define hm : N → N : n ↦ n · m.
For every m ∈ N the recursion theorem asserts the existence of a unique function
em : N → N such that
(1) em (0) = 1
and such that
(2) em (n+ ) = hm (em (n)) = em (n) · m
for all n ∈ N.
Define the binary operation ↑ (exponentiation) on N by
m ↑ n = em (n)
for all m and n in N. We usually write m^n (with n as a superscript) in place of m ↑ n.
6.1.14 Example. We have
3^4 = e3 (4)
= e3 (3) · 3 by (2)
= e3 (2) · 3 · 3 by (2)
= e3 (1) · 3 · 3 · 3 by (2)
= e3 (0) · 3 · 3 · 3 · 3 by (2)
= 1·3·3·3·3 by (1)
= 81.
Once again, we have used the recursion theorem to formalize the intuitive con-
struction
m^n = m · m · · · m (n factors)
within the axioms of ZF.
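Exponentiation is one more application of the same pattern. In the Python sketch below, built-in non-negative integers again stand in for natural numbers, Python's + is used for addition inside `mul` to keep the block short, and the names `recursion`, `mul` and `power` are hypothetical. The final loop spot-checks Propositions 6.1.15 and 6.1.16 on a few small values; it is a sanity check, not a proof.

```python
from typing import Callable


def recursion(base: int, step: Callable[[int], int]) -> Callable[[int], int]:
    """Return f with f(0) = base and f(n+) = step(f(n)), by iterating step."""
    def f(n: int) -> int:
        value = base
        for _ in range(n):
            value = step(value)
        return value
    return f


def mul(m: int, n: int) -> int:
    """m · n = p_m(n), with p_m(0) = 0 and p_m(n+) = p_m(n) + m."""
    return recursion(0, lambda k: k + m)(n)


def power(m: int, n: int) -> int:
    """m ↑ n = e_m(n), where e_m(0) = 1 and e_m(n+) = h_m(e_m(n)) = e_m(n) · m."""
    return recursion(1, lambda k: mul(k, m))(n)


print(power(3, 4))  # 81

# Spot-check Propositions 6.1.15 and 6.1.16 on small values (not a proof).
for m in range(4):
    for n in range(4):
        for ell in range(4):
            assert power(m, n + ell) == mul(power(m, n), power(m, ell))
            assert power(power(m, n), ell) == power(m, mul(n, ell))
```

Note that `power(0, 0)` returns 1, in agreement with em (0) = 1 for every m, including m = 0.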

6.1.15 Proposition. For all ℓ, m and n in N we have m^(n+ℓ) = m^n · m^ℓ.
Proof. The proof is by induction on ℓ. From the definition of exponentiation, we
have m^0 = em (0) = 1. For ℓ = 0 we have
m^(n+0) = m^n = m^n · 1 = m^n · m^0.
This establishes the proposition for ℓ = 0.
Suppose that the proposition holds for ℓ. We have
m^(n+ℓ+) = m^((n+ℓ)+) by 6.1.4(1)
= m^(n+ℓ) · m since em ((n + ℓ)+ ) = em (n + ℓ) · m
= (m^n · m^ℓ) · m by the inductive hypothesis
= m^n · m^(ℓ+) by associativity, since em (ℓ+ ) = em (ℓ) · m,
which is the proposition with ℓ replaced by ℓ+ .
The inductive proof is now complete.
6.1.16 Proposition. For all ℓ, m and n in N we have (m^n)^ℓ = m^(n·ℓ).
Proof. The proof is by induction on ℓ. For ℓ = 0 we have
(m^n)^0 = 1 = m^0 = m^(n·0),
which establishes the proposition for ℓ = 0.
Suppose that the proposition holds for ℓ. We have
(m^n)^(ℓ+) = (m^n)^ℓ · m^n by 6.1.13(2)
= m^(n·ℓ) · m^n by the inductive hypothesis
= m^((n·ℓ)+n) by Proposition 6.1.15
= m^(n·ℓ+) by 6.1.9(2),
which is the proposition with ℓ replaced by ℓ+ .
The inductive proof is now complete.
6.1.17 Definition. Precedence: From first to last, the order of evaluation (prece-
dence) of the arithmetic operators is exponentiation, multiplication, addition. For
example, m + n · ℓ ↑ k is read as m + (n · (ℓ ↑ k)) and is normally written m + n · ℓ^k.
A compound exponent is read as a single unit: m^(n·ℓ), with the whole product n · ℓ
as exponent, stands for m ↑ (n · ℓ), not for m ↑ n · ℓ, which by the rule above is
(m ↑ n) · ℓ = m^n · ℓ.
Since multiplication has precedence over addition, we can omit the parentheses
in (m · n) + (m′ · n′ ) to write m · n + m′ · n′ instead.
While addition and multiplication are associative, exponentiation is not. When
several instances of the exponentiation operator follow each other, the evaluation
proceeds from right to left. For example, m ↑ n ↑ ℓ is read m ↑ (n ↑ ℓ), which is
usually written as the tower m^n^ℓ. In other words, m^n^ℓ is read as m^(n^ℓ) and not
as (m^n)^ℓ. For the latter, of course, we have (m^n)^ℓ = m^(n·ℓ) by Proposition 6.1.16.
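As it happens, Python's arithmetic operators follow the same conventions: `**` (exponentiation) binds more tightly than `*` and `+`, and consecutive `**` operators group from right to left. The check below uses arbitrary sample values and is only an illustration of these precedence rules, not part of the formal development.

```python
m, n, ell, k = 2, 3, 2, 2

# m + n · ℓ^k is read as m + (n · (ℓ^k)).
assert m + n * ell ** k == m + (n * (ell ** k)) == 14

# Exponentiation groups right to left: m ↑ n ↑ ℓ means m ↑ (n ↑ ℓ).
assert m ** n ** ell == m ** (n ** ell) == 512
assert (m ** n) ** ell == 64  # a different number, so ↑ is not associative

# Proposition 6.1.16: (m^n)^ℓ = m^(n·ℓ).
assert (m ** n) ** ell == m ** (n * ell)

print(m ** n ** ell, (m ** n) ** ell)  # 512 64
```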
6.4 Exercise. Show that addition does not distribute over multiplication.
6.5 Exercise. Show that exponentiation is neither commutative nor associative.
6.6 Exercise. Show that exponentiation does not distribute over addition or mul-
tiplication.
6.7 Exercise. Show that neither addition nor multiplication distributes over ex-
ponentiation.
6.2 Order
In the definitions given below we use the neutral symbols ⊏ and ⊑ as generic names
for relations that behave like the familiar < and ≤, respectively. For the sake of
associating pronounceable names with our symbols, ⊏ and ⊑ shall be pronounced
“less than” and “less than or equal to,” respectively.
6.2.1 Definition.
(1) A relation ⊑ in a set A is called antisymmetric if for all x and y in A
x ⊑ y ∧ y ⊑ x ⇒ x = y.
(2) A relation ⊏ in a set A is called asymmetric if for all x and y in A
x ⊏ y ⇒ ¬y ⊏ x.
(3) A relation that is reflexive, antisymmetric and transitive is called a partial
ordering.
(4) A relation that is asymmetric and transitive is called a strict ordering.
(5) Let ⊑ be a partial ordering of A. Two elements x and y of A are comparable
in the given ordering if x ⊑ y or y ⊑ x. Otherwise (if neither x ⊑ y nor
y ⊑ x), we say that x and y are incomparable.
(6) Let ⊏ be a strict ordering of A. Two elements x and y of A are comparable
in the given ordering if x ⊏ y, or x = y, or y ⊏ x. Otherwise (if x ≠ y and
neither x ⊏ y nor y ⊏ x), we say that x and y are incomparable.
(7) An ordering (partial or strict) of a set A is called linear, or total, if every two
distinct elements of A are comparable in the ordering.
6.2.2 Example. Fix a set A and consider its power set P(A). Then, set inclusion
⊂ defines a partial ordering in P(A). The ordering is total if and only if A is empty
or is a singleton. For example, with P({0, 1}) = {∅, {0}, {1}, {0, 1}}, we have that
{0} ⊄ {1} and {1} ⊄ {0}. (Be careful: we have 0 ⊂ 1, 0 ∈ 1 and, since {0} = 1, also
{0} ∈ {1}; but {0} ⊄ {1}.)
6.8 Exercise. Let (A, ⊏A ) and (B, ⊏B ) be strictly linearly ordered sets. The
lexicographic ordering of A × B is the order ≺ defined by
(a1 , b1 ) ≺ (a2 , b2 ) ⇔ (a1 ⊏A a2 ) ∨ (a1 = a2 ∧ b1 ⊏B b2 ).
Prove that the lexicographic ordering is a strict linear ordering of A × B. Try to
generalize to Cartesian products with more factors.
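A brute-force check can make the definition of ≺ concrete before one attempts the proof. The sketch below is not a proof: it assumes small ranges of Python integers, with their usual <, as the ordered sets (A, ⊏A) and (B, ⊏B), and `lex_less` is a hypothetical helper name. Incidentally, Python compares tuples lexicographically, which the last check confirms for these sets.

```python
from itertools import product


def lex_less(p, q):
    """(a1, b1) ≺ (a2, b2) iff a1 < a2, or a1 == a2 and b1 < b2."""
    (a1, b1), (a2, b2) = p, q
    return a1 < a2 or (a1 == a2 and b1 < b2)


A, B = range(3), range(4)
pairs = list(product(A, B))

asymmetric = all(not (lex_less(p, q) and lex_less(q, p)) for p in pairs for q in pairs)
transitive = all(
    lex_less(p, r)
    for p in pairs for q in pairs for r in pairs
    if lex_less(p, q) and lex_less(q, r)
)
total = all(p == q or lex_less(p, q) or lex_less(q, p) for p in pairs for q in pairs)
matches_tuples = all(lex_less(p, q) == (p < q) for p in pairs for q in pairs)

print(asymmetric, transitive, total, matches_tuples)  # True True True True
```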
6.2.3 Proposition. Let A be a set. There exists a natural bijective correspondence
between partial orderings of A and strict orderings of A. Namely, a partial ordering
⊑ and a strict ordering ⊏ correspond to one another if and only if, for all x, y ∈ A,
(6.1) x ≠ y ⇒ (x ⊑ y ↔ x ⊏ y).
In other words, ⊑ and ⊏ correspond to each other if and only if they agree (as
relations) when restricted to pairs (x, y) ∈ A × A such that x ̸= y.
Proof. First, fix a partial ordering ⊑ of A, and let ⊏ be any strict ordering of A
for which property (6.1) holds. For all elements x, y of A we have x ⊏ y ⇔ x ⊑ y
if x ≠ y, and ¬ x ⊏ y if x = y (by the asymmetry of ⊏). Therefore, ⊏ must be the
relation defined by
(6.2) x ⊏ y ⇔ x ⊑ y ∧ x ≠ y
for all x, y ∈ A. Now we prove that the relation ⊏ uniquely defined by (6.2) is a
strict ordering of A.
Thanks to (6.2) and to the antisymmetry of ⊑, we have
x ⊏ y ∧ y ⊏ x ⇒ (x ⊑ y ∧ x ≠ y) ∧ (y ⊑ x ∧ y ≠ x)
⇒ (x ⊑ y ∧ y ⊑ x) ∧ x ≠ y
⇒ x = y ∧ x ≠ y,
a contradiction. Thus, we have
¬(x ⊏ y ∧ y ⊏ x),
which is equivalent to
¬ x ⊏ y ∨ ¬ y ⊏ x,
which is equivalent to
x ⊏ y ⇒ ¬y ⊏ x
which establishes that ⊏ is asymmetric. The rest of the proof that ⊏ is a strict
ordering is left as an exercise.
Conversely, fix a strict ordering ⊏ of A, and let ⊑ be any partial ordering of A
for which (6.1) holds. For all elements x, y of A we have x ⊑ y ⇔ x ⊏ y if x ≠ y,
and x ⊑ y if x = y (by the reflexivity of ⊑). Therefore, ⊑ must be the relation
defined by
(6.3) x ⊑ y ⇔ x ⊏ y ∨ x = y.
Now we prove that the relation ⊑ uniquely defined by (6.3) is a partial ordering
of A.
Thanks to (6.3) and to the asymmetry of ⊏, we have
x ⊑ y ∧ y ⊑ x ⇒ (x ⊏ y ∨ x = y) ∧ (y ⊏ x ∨ y = x)
⇒ (x ⊏ y ∧ (y ⊏ x ∨ y = x)) ∨ (x = y ∧ (y ⊏ x ∨ y = x))
⇒ (x ⊏ y ∧ y ⊏ x) ∨ (x ⊏ y ∧ y = x) ∨ (x = y ∧ y ⊏ x) ∨ (x = y ∧ y = x)
⇒ (x ⊏ y ∧ y = x) ∨ (x = y ∧ y ⊏ x) ∨ x = y
⇒ ((x ⊏ y ∨ y ⊏ x) ∧ x = y) ∨ x = y
⇒ x = y,
where the disjunct x ⊏ y ∧ y ⊏ x is dropped in the fourth step because it contradicts
the asymmetry of ⊏, and the last step uses (q ∧ p) ∨ p ⇒ p. This establishes the
antisymmetry of ⊑. The rest of the proof
that ⊑ is a partial ordering is left as an exercise.
The proof is complete since we have shown that given one of the orderings, the
other one exists and is uniquely characterized by the property (6.1).
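The two constructions in the proof, (6.2) and (6.3), are easy to carry out on a finite set. The Python sketch below represents a relation as a set of ordered pairs and uses divisibility on {1, ..., 6} as a sample partial ordering (an arbitrary choice for illustration); the function names are hypothetical. For this one example it checks that (6.2) yields an asymmetric and transitive relation, that the two relations agree on pairs with x ≠ y as required by (6.1), and that (6.3) recovers the original ordering.

```python
from itertools import product

A = range(1, 7)
# Sample partial ordering of A: x ⊑ y iff x divides y.
leq = {(x, y) for x, y in product(A, A) if y % x == 0}


def strict_from_partial(leq: set) -> set:
    """(6.2): x ⊏ y iff x ⊑ y and x != y."""
    return {(x, y) for (x, y) in leq if x != y}


def partial_from_strict(lt: set, A) -> set:
    """(6.3): x ⊑ y iff x ⊏ y or x == y."""
    return lt | {(x, x) for x in A}


lt = strict_from_partial(leq)
asymmetric = all((y, x) not in lt for (x, y) in lt)
transitive = all((x, z) in lt for (x, y) in lt for (w, z) in lt if y == w)
agrees_off_diagonal = all(
    ((x, y) in leq) == ((x, y) in lt) for x, y in product(A, A) if x != y
)
round_trip = partial_from_strict(lt, A) == leq

print(asymmetric, transitive, agrees_off_diagonal, round_trip)  # True True True True
```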
6.2.4 Definition. Suppose ⊑ is a partial ordering of A. Any subset B ⊂ A is
called a chain in A if every two elements of B are comparable.
6.2.5 Definition. If ⊑ is a partial ordering of A, we say that A has a
(1) least element a if a ⊑ x for every x ∈ A;
(2) minimal element a if, for all x in A, x ⊑ a ⇒ x = a;
(3) greatest element a if x ⊑ a for every x ∈ A;
(4) maximal element a if, for all x in A, a ⊑ x ⇒ x = a.
It is clear that any least element of a partially ordered set A is minimal, and
any greatest element is maximal. The converses of these statements are not true in
general. In fact, A may have minimal elements but no least element, and maximal
elements but no greatest element. See Examples 6.2.6 below.
If A has a least element this element is unique, since for any two such elements
a and a′ we have a ⊑ a′ and a′ ⊑ a, which implies a = a′ by antisymmetry. (A
similar argument yields uniqueness of greatest elements.) Moreover, if A has a
least element a then a is the unique minimal element of A, because if a′ is minimal
we have a ⊑ a′ since a is least, hence a = a′ since a′ is minimal. (Similarly, a
greatest element is the unique maximal element.) Finally, we remark that if A is a
chain (i. e., A is linearly ordered) then a minimal element is necessarily least, and
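a maximal element is greatest: If a is a minimal element then every element x ∈ A
is comparable to a, hence either x ⊑ a or a ⊑ x; however, if x ≠ a, the former relation
would contradict the minimality of a. Thus a ⊑ x for every x ∈ A, so a is least. (The
argument for maximal elements is similar.)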
6.2.6 Examples. The set N ordered by set inclusion has least and minimal element
0, but no greatest or maximal elements. (We have not officially shown that set
inclusion is actually an ordering of N; this is done in Propositions 6.2.10 and 6.2.12
below.)
Given a nonempty set A, define
B = {x ∈ P(A) | x ≠ ∅},
the set of all nonempty subsets of A. Then, B is a partially ordered set under set
inclusion (⊂). Every singleton in B is a minimal element, yet B does not have a
least element unless A is a singleton.
For example, take A = {1, {1, 2}}. Then, we have
B = {{1}, {{1, 2}}, {1, {1, 2}}}.
The two singletons {1} and {{1, 2}} in B are both minimal elements of B. Yet, B
has no least element. The element {1, {1, 2}} of B is both maximal and greatest.
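The example can also be checked mechanically. The Python sketch below models sets as `frozenset`s, uses the subset test `<=` for the inclusion ⊂ of these notes, and lets the Python integer 1 play the role of the element 1; the helper names are hypothetical. It lists the minimal elements of B and searches for a least and a greatest element.

```python
from itertools import combinations


def nonempty_subsets(A):
    """All nonempty subsets of the finite set A, as frozensets."""
    items = list(A)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def minimal_elements(B):
    """Elements a of B such that x ⊂ a implies x = a for every x in B."""
    return [a for a in B if all(x == a for x in B if x <= a)]


def least_element(B):
    """The element a of B with a ⊂ x for every x in B, or None if none exists."""
    return next((a for a in B if all(a <= x for x in B)), None)


def greatest_element(B):
    """The element a of B with x ⊂ a for every x in B, or None if none exists."""
    return next((a for a in B if all(x <= a for x in B)), None)


A = {1, frozenset({1, 2})}  # the set A = {1, {1, 2}} of the example
B = nonempty_subsets(A)
print(len(minimal_elements(B)), least_element(B), greatest_element(B) == frozenset(A))
# 2 None True
```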
6.2.7 Definition. If ⊑ is a partial ordering of A and B ⊂ A, we say that a ∈ A is
(1) a lower bound of B if a ⊑ x for every x ∈ B;
(2) a greatest lower bound (or infimum) of B if a is a lower bound of B and x ⊑ a
for every lower bound x of B;
(3) an upper bound of B if x ⊑ a for every x ∈ B;
(4) a least upper bound (or supremum) of B if a is an upper bound of B and a ⊑ x
for every upper bound x of B.
Note that neither lower nor upper bounds of B need be elements of B. A set that
has an upper (lower) bound is said to be bounded above (below). The least upper
bound and the greatest lower bound of B, when they exist, are denoted by sup B
and inf B, respectively.
If ⊏ is a strict ordering of A then (A, ⊏) is called
(5) complete if every nonempty subset of A that is bounded above has a least
upper bound;
(6) dense if for every x, y ∈ A with x ⊏ y there is a z ∈ A such that x ⊏ z ⊏ y.

6.2.8 Definition. Well Ordering: A partially ordered set A is called well ordered
(and the ordering is called a well ordering) if every nonempty subset of A has a
least element.
6.9 Exercise. Prove that a well-ordered set is linearly ordered.
6.2.9 Definition. The Standard Ordering of N: Define the relations < and
≤ in N by
m < n ↔ m ∈ n
and
m ≤ n ↔ m ∈ n ∨ m = n.
Note: We also write n > m for m < n and n ≥ m for m ≤ n.
If m < n we say that m is less than n, or that n is greater than m.
6.2.10 Proposition. The standard ordering of N by < (resp., by ≤) is a strict
ordering (resp., a partial ordering).
Proof. By Proposition 6.2.3 it suffices to show that < is a strict ordering.
By Trichotomy (Proposition 4.4.6), it follows that < is asymmetric: For any
two natural numbers m, n, m ∈ n implies n ∉ m (as well as m ≠ n).
To prove transitivity of the ordering < let k, m, n be naturals with k < m and
m < n, that is, k ∈ m and m ∈ n. By Proposition 4.3.6, n is a transitive set, hence
m ⊂ n. Since k ∈ m, we conclude k ∈ n, so k < n. (The transitive property of
natural numbers implies the transitivity of the order in N.)
6.2.11 Proposition. N is well ordered.
Proof. By Proposition 6.2.10, N is partially ordered by its standard order. In order
to conclude that every nonempty subset of N has a least element, it will suffice to
prove that every subset of N without a least element is empty. Suppose, then, that
A is a subset of N without a least element. Define
S̄ = {n ∈ N | (∀m)(m < n → m ∉ A)}.
We show that S̄ is inductive. Clearly, 0 ∈ S̄ vacuously, since m < 0 is false for
all m ∈ N.
Suppose that n ∈ S̄, and let m < n+ , that is, m ∈ n ∪ {n}. We have two possibilities:
• Suppose that m = n. Since n ∈ S̄, no element of A is less than n; by trichotomy,
n ≤ k for every k ∈ A. Hence, if n belonged to A, it would be the least element
of A, contradicting the hypothesis that A has no least element. Therefore
m = n ∉ A.
• Suppose that m < n. Then m ∉ A, since n ∈ S̄ by the inductive hypothesis.
In either of the above cases, we obtain m ∉ A, showing that n+ ∈ S̄.
We have shown that S̄ is inductive. This means that S̄ = N. Hence the statement
(∀m)(m < n → m ∉ A)
is true for all n. Given any m ∈ N, applying it with n = m+ (note that m < m+ )
yields m ∉ A. Therefore A is the empty set, as was to be shown.
6.2.12 Proposition. The natural numbers are linearly ordered by their standard
order.
Proof. Immediate consequence of Proposition 6.2.11 and Exercise 6.9.
6.2.13 Proposition. For all natural numbers m and n,
m ≤ m + n.
Proof. We prove the proposition by showing that, for each fixed natural number
m, the set
Sm = {n ∈ N | m ≤ m + n}
is inductive, hence contains every natural number n. Clearly 0 ∈ Sm since m =
m + 0 (hence m ≤ m + 0). For the inductive step, assume n ∈ Sm , that is,
m ≤ m + n. If m = m + n then m ∈ m+ = (m + n)+ = m + n+ , hence m < m + n+
(so m ≤ m + n+ ). If m < m + n then, on the one hand, m ∈ m+ , so m < m+ ; on
the other hand, m+ ∈ (m + n)+ = m + n+ (by Proposition 4.4.7), so m+ < m + n+ .
By transitivity, m < m + n+ (hence m ≤ m + n+ ). In either case we conclude that
n+ ∈ Sm . By induction, Sm = N, concluding the proof.
6.10 Exercise. Carry out a variation of the proof of Proposition 6.2.13 showing
instead that, for any natural number n, the set
Sn = {m ∈ N | m ≤ m + n}
is inductive.
6.2.14 Proposition. For natural numbers m and n,
m ≤ n ⇔ (∃k ∈ N)(n = m + k).
Proof. For any natural number m, define
Sm = {n ∈ N | m ≤ n ↔ (∃k ∈ N)(n = m + k)}.
The proof will be finished once we show that Sm is inductive, hence equal to N.
First we show 0 ∈ Sm . If m = 0 then both 0 ≤ 0 and (∃k)(0 = 0 + k) are true
(the latter because 0 = 0 + 0). If m ≠ 0 then m > 0 (by trichotomy, since m ≮ 0),
so m ≰ 0, and on the other hand 0 ≠ m + k for any k ∈ N (because 0 < m ≤ m + k,
by Proposition 6.2.13). In either case we conclude 0 ∈ Sm .
For the inductive step, assume that n ∈ Sm . We must show that n+ ∈ Sm . We
do this by considering the cases m ≤ n and m > n separately.
If m ≤ n then n ∈ Sm implies that n = m + k for some k ∈ N. We also have
m < n+ (using n < n+ and transitivity), so m ≤ n+ . It remains to observe that
n+ = (m + k)+ = m + k + . We conclude that n+ ∈ Sm .
If n < m then, as shown in the proof of Proposition 4.4.5, we have either n+ < m
or n+ = m. In the first case we have m ≰ n+ and we must show n+ ≠ m + k for
all k ∈ N, which follows from Proposition 6.2.13: n+ < m ≤ m + k. In the second
case we have both m ≤ n+ and n+ = m + 0. The conclusion n+ ∈ Sm follows in
either case.
6.2.15 Definition. For any natural numbers m ≤ n we call the natural number k
such that n = m + k the difference or subtraction of n and m, and write k = n − m.
Note that such a natural number k exists by Proposition 6.2.14, and is unique by
the Cancellation Law for Addition (Proposition 6.1.6).
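Computationally, Proposition 6.2.14 suggests finding n − m by counting how many successor steps lead from m to n. The sketch below uses a hypothetical helper `difference`, again with Python integers standing in for natural numbers; it refuses the case m > n, in which the difference is not defined in N.

```python
def difference(n: int, m: int) -> int:
    """Return the unique k with n = m + k, assuming m <= n (Definition 6.2.15).

    Starting from m, apply the successor until n is reached, counting the steps.
    """
    if m > n:
        raise ValueError("n - m is defined in N only when m <= n")
    k, current = 0, m
    while current != n:
        current += 1  # one application of the successor
        k += 1
    return k


print(difference(7, 3))  # 4
```

Uniqueness of the result is exactly the Cancellation Law for Addition: if m + k = m + k′ then k = k′.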
6.11 Exercise. Carry out a variation of the proof of Proposition 6.2.14 showing
instead that, for any natural number n, the set
Sn = {m ∈ N | m ≤ n ↔ (∃k ∈ N)(n = m + k)}
is inductive.
6.12 Exercise. Let m and n be natural numbers such that m · n = 0. Prove that
m = 0 or n = 0.
6.13 Exercise. A natural number n is
• even if n = 2 · m for some natural number m and
• odd if n = 2 · m + 1 for some natural number m.
Prove that every natural number is either even or odd (exclusive or). Moreover,
show that m is unique.
6.2.16 Proposition. The following properties hold for all natural numbers a, b, c.
1. a + c < b + c ⇔ a < b.
2. If c > 0 then a · c < b · c ⇔ a < b.
Proof. Part 1. is merely a restatement of Proposition 6.1.5.
The special case a = 0 of part 2. follows immediately from Exercise 6.12 and
its obvious converse.
The general case is obtained from the special case as follows. First we show that
a < b ⇒ a · c < b · c (if c > 0). Indeed, if a < b then, by Proposition 6.2.14,
b = a + b′ for some b′ ∈ N with b′ > 0 (since b ≠ a), hence
0 = 0 · c < b′ · c (by the special case already proved). Adding a · c to both sides of the
inequality preserves it, by part 1., giving a · c = a · c + 0 < a · c + b′ · c = (a + b′ ) · c = b · c.
Now we show a · c < b · c ⇒ a < b (if c > 0) by proving its contrapositive
a ≥ b ⇒ a · c ≥ b · c. Indeed, a = b ⇒ a · c = b · c, whereas a > b ⇒ a · c > b · c (if
c > 0) by the above paragraph. Hence a ≥ b ⇒ a · c ≥ b · c is proved for all a, b, c
with c > 0, and we are finished.
6.14 Exercise (The Division Algorithm in N). For any natural numbers a, b
with b > 0 there exist natural numbers q, r satisfying
a=b·q+r and r < b.
Moreover, such numbers q, r are unique.
Suggestion: Consider the set R = {n ∈ N | (∃m ∈ N)(a = b · m + n)}. Show
that R has a least element r such that r < b; moreover a = b · q + r for some q ∈ N.
Uniqueness may be proved using properties of inequalities to derive a contradiction
from a hypothetical equality bq + r = bq′ + r′ with (q, r) ≠ (q′, r′). (Note
that this exercise generalizes Exercise 6.13.)
6.2.17 Definition. For any natural numbers a, b with b ̸= 0, the natural number
q of Exercise 6.14 is called the (incomplete) quotient and r the remainder of the
(Euclidean) division of a by b.
In case r = 0, that is to say, if a = b · q for some q ∈ N, we say that b divides a,
or that a is a multiple of b, or that a is divisible by b. This relation is denoted by
b | a.
(It is sometimes convenient to allow b = 0 in the latter definition. Namely, 0 | a
just in case a = 0 · q = 0.)
Finally, if b | a (and b ≠ 0), the incomplete quotient q is actually a complete
quotient; it is denoted by a ÷ b or a/b (or written as a fraction with numerator a and
denominator b), and called the division of a by b.
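One way to compute the quotient and remainder of Exercise 6.14 is repeated subtraction of b while keeping the equation a = b · q + r invariant. The Python sketch below (the name `divide` is hypothetical) does exactly that; it illustrates the statement rather than replacing its proof, and for natural numbers it agrees with Python's built-in `divmod`.

```python
def divide(a: int, b: int) -> tuple[int, int]:
    """Return (q, r) with a == b * q + r and r < b, for naturals a and b > 0."""
    if b <= 0:
        raise ValueError("the divisor b must be positive")
    q, r = 0, a
    while r >= b:  # invariant: a == b * q + r
        r -= b     # r - b is defined in N because b <= r
        q += 1
    return q, r


print(divide(17, 5))  # (3, 2)
print(divide(12, 4))  # (3, 0), so 4 | 12 and 12 ÷ 4 = 3
```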
Rather than proceeding to carry out a study of the formal properties of the sub-
traction of natural numbers defined above, we switch now to the proper setting in
which the operation is unconditionally defined (i.e., without the restriction m ≤ n
above). This setting is that of the integers. (Furthermore, extending the operation
of division a/b to the case b ∤ a will require introducing the rational numbers.)
6.3 The Integers

We use a pair of natural numbers to name an integer. For example, think of (2, 0)
as an example (a name) of “integer 2,” while (0, 2) is a name of “integer −2.” If we
think of (m, n) as representing the integer with “positive part” m and “negative
part” n, then (m + ℓ, n + ℓ) is another name for the same integer, for any natural
number ℓ. In fact, every integer has infinitely many names under this system. So,
we define an integer as an equivalence class of pairs of natural numbers under a
suitable equivalence relation.
6.3.1 Definition. Define the relation ≃ on N × N by
(m, n) ≃ (m′ , n′ ) ↔ m + n′ = m′ + n
for all ordered pairs (m, n) and (m′ , n′ ) in N × N. Clearly, the domain and range
of ≃ are N × N.
6.3.2 Proposition. The relation ≃ is an equivalence relation on N × N.
Proof. We show that ≃ is an equivalence relation:
• Reflexivity: For all m and n in N we have (m, n) ≃ (m, n), since m + n = m + n.
• Symmetry: Suppose that (m, n) ≃ (m′ , n′ ). This gives m + n′ = m′ + n.
Hence, m′ + n = m + n′ , which gives (m′ , n′ ) ≃ (m, n).
• Transitivity: Suppose that (m, n) ≃ (m′ , n′ ) and (m′ , n′ ) ≃ (m′′ , n′′ ). We
need to show that (m, n) ≃ (m′′ , n′′ ). To accomplish this, we note that we
are given
(1) m + n′ = m′ + n
and
(2) m′ + n′′ = m′′ + n′ .
Does it follow that (m, n) ≃ (m′′ , n′′ )? For this to hold we need to establish
that (1) and (2) imply the equality
(3) m + n′′ = m′′ + n.
Does (3) follow from (1) and (2)? We have
(m + n′′ ) + n′ = m + (n′′ + n′ )
= m + (n′ + n′′ )
= (m + n′ ) + n′′
= (n + m′ ) + n′′
= n + (m′ + n′′ )
= n + (m′′ + n′ )
= (n + m′′ ) + n′
= (m′′ + n) + n′
Then, (3) follows from the above by the Cancellation Law for Addition (Proposition 6.1.6).
6.3.3 Definition. An integer is an element of the quotient set Z = (N × N)/ ≃.

6.3.4 Remarks. The natural numbers are not a subset of the integers. Rather,
we have the isomorphism
Φ : N → {[(m, 0)]≃ ∈ Z | m ∈ N}
given by
Φ(m) = [(m, 0)]≃ ,
which suggests that we identify each natural number m with the integer [(m, 0)]≃ .
This we do and, by abuse of language, write N ⊂ Z. Strictly speaking, N ⊄ Z and
N × {0} ⊄ Z. Note that (m, 0) ∉ Z, but [(m, 0)]≃ ∈ Z. The integer [(m, 0)]≃ shall
be denoted by mZ .
It is not difficult to show that Φ is the only isomorphism from N into a bona fide
subset (subring) of Z, so the identification of N with a subset of Z is unambiguous.
6.3.5 Definition. Addition: Addition in Z is denoted by +Z and is defined by
[(m, n)]≃ +Z [(m′ , n′ )]≃ = [(m + m′ , n + n′ )]≃ .
Of course, we must show that the definition makes sense, that is, that the
result we obtain is independent of the choice of representative for each equivalence
class. This is stated below.
6.3.6 Proposition. For all m, n, m′ , n′ , a, b, a′ , b′ ∈ N, if
(m, n) ≃ (m′ , n′ ) and (a, b) ≃ (a′ , b′ )
then (m + a, n + b) ≃ (m′ + a′ , n′ + b′ ).
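Proposition 6.3.6 can be spot-checked (not proved) by brute force on a finite range. The Python sketch below is our own illustration: pairs of Python ints stand in for elements of N × N, and the names `equiv_N`, `add_Z`, `check_add_well_defined` and the bound 5 are hypothetical choices, not notation from these notes.

```python
from itertools import product


def equiv_N(p, q):
    """(m, n) ≃ (m', n') iff m + n' = m' + n."""
    (m, n), (m2, n2) = p, q
    return m + n2 == m2 + n


def add_Z(p, q):
    """Addition of representatives: (m, n) + (a, b) = (m + a, n + b)."""
    (m, n), (a, b) = p, q
    return (m + a, n + b)


def check_add_well_defined(bound):
    """Check Proposition 6.3.6 for all pairs with entries in range(bound)."""
    pairs = list(product(range(bound), repeat=2))
    # All couples ((m, n), (m', n')) of equivalent representatives.
    equiv_pairs = [(p, p2) for p in pairs for p2 in pairs if equiv_N(p, p2)]
    return all(equiv_N(add_Z(p, q), add_Z(p2, q2))
               for p, p2 in equiv_pairs
               for q, q2 in equiv_pairs)


print(check_add_well_defined(5))   # True
```

A finite check of this kind can only expose a wrong formula; the proposition itself still needs the algebraic proof.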
6.3.7 Definition. Additive Inverses: For each integer [(m, n)]≃ define
−[(m, n)]≃ = [(n, m)]≃ .
Then, we have
[(m, n)]≃ +Z (−[(m, n)]≃) = [(m, n)]≃ +Z [(n, m)]≃
= [(m + n, n + m)]≃
= [(m + n, m + n)]≃
= [(0, 0)]≃ = 0Z.
The integer −[(m, n)]≃ is called the additive inverse of [(m, n)]≃. Of course, we need to show that this definition does not depend on the choice of representative, and that each integer has a unique additive inverse. Note that −0Z = 0Z. We write −nZ for the integer [(0, n)]≃.
Every integer x = [(m, n)]≃ is either of the form x = kZ for some k ∈ N, if m ≥ n (namely, the k with n + k = m), or of the form x = −kZ for some k ∈ N, if m < n (namely, the k with m + k = n).
6.3.8 Proposition. The operation +Z has the following properties.
(1) The operation is associative: (x +Z y) +Z z = x +Z (y +Z z).
(2) The operation is commutative: x +Z y = y +Z x.
(3) The operation has the identity 0Z : x +Z 0Z = x.
(4) Additive inverses exist: For any x ∈ Z there is a y ∈ Z such that
x +Z y = 0Z .

6.3.9 Definition. Subtraction: We denote by −x the additive inverse of the integer
x. We define the binary operator subtraction, denoted by −, by x − y = x +Z (−y).
We also define the unary operator − : Z → Z : x ↦ −x. The name “−” is ambiguous, as usual.
6.3.10 Definition. Multiplication: Multiplication ·Z in Z is defined by
[(m, n)]≃ ·Z [(m′, n′)]≃ = [(mm′ + nn′, mn′ + nm′)]≃.
(Here, the product · in N is not written out explicitly; in other words, we write mn instead of m · n.)
6.3.11 Proposition. For all m, n, m′, n′, a, b, a′, b′ ∈ N, if
(m, n) ≃ (m′, n′) and (a, b) ≃ (a′, b′)
then (ma + nb, mb + na) ≃ (m′a′ + n′b′, m′b′ + n′a′).
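The same brute-force idea applies to Proposition 6.3.11. In this sketch (helper names hypothetical, pairs of Python ints standing for elements of N × N) we also compare against the Python int m − n, which serves only as an informal reference model for the class [(m, n)]≃; the notes themselves do not use it.

```python
from itertools import product


def equiv_N(p, q):
    """(m, n) ≃ (m', n') iff m + n' = m' + n."""
    (m, n), (m2, n2) = p, q
    return m + n2 == m2 + n


def mul_Z(p, q):
    """Multiplication of representatives, as in Definition 6.3.10."""
    (m, n), (a, b) = p, q
    return (m * a + n * b, m * b + n * a)


def check_mul_well_defined(bound):
    """Check Proposition 6.3.11 for all pairs with entries in range(bound)."""
    pairs = list(product(range(bound), repeat=2))
    equiv_pairs = [(p, p2) for p in pairs for p2 in pairs if equiv_N(p, p2)]
    return all(equiv_N(mul_Z(p, q), mul_Z(p2, q2))
               for p, p2 in equiv_pairs
               for q, q2 in equiv_pairs)


def to_int(p):
    """Informal reference model: the class [(m, n)] corresponds to m - n."""
    m, n = p
    return m - n


print(check_mul_well_defined(5))       # True
print(to_int(mul_Z((2, 5), (4, 1))))   # -9, i.e. (-3) * 3
```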

6.3.12 Proposition. The operation ·Z has the following properties.
(1) The operation is associative: (x ·Z y) ·Z z = x ·Z (y ·Z z).
(2) The operation is commutative: x ·Z y = y ·Z x.
(3) The operation distributes over +Z : x ·Z (y +Z z) = (x ·Z y) +Z (x ·Z z).
(4) The operation has the identity 1Z, with 1Z ≠ 0Z: x ·Z 1Z = x.
(5) There are no zero divisors: x ·Z y = 0Z → x = 0Z ∨ y = 0Z .
(6) (−1Z ) ·Z x = −x.

6.3.13 Definition. Ordering: Define the relation <Z in Z by
[(m, n)]≃ <Z [(m′, n′)]≃ ↔ m + n′ < m′ + n.
The following proposition states that the definition is independent of the choice of representatives.
6.3.14 Proposition. For all m, n, m′, n′, a, b, a′, b′ ∈ N, if
(m, n) ≃ (m′, n′) and (a, b) ≃ (a′, b′)
then m + b < a + n ↔ m′ + b′ < a′ + n′.
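Proposition 6.3.14 can likewise be spot-checked by brute force. In this Python sketch pairs of Python ints stand in for elements of N × N; the helper names and the bound are our own hypothetical choices.

```python
from itertools import product


def equiv_N(p, q):
    """(m, n) ≃ (m', n') iff m + n' = m' + n."""
    (m, n), (m2, n2) = p, q
    return m + n2 == m2 + n


def less_Z(p, q):
    """[(m, n)] <Z [(a, b)] iff m + b < a + n (Definition 6.3.13)."""
    (m, n), (a, b) = p, q
    return m + b < a + n


def check_less_well_defined(bound):
    """Check Proposition 6.3.14 for all pairs with entries in range(bound)."""
    pairs = list(product(range(bound), repeat=2))
    equiv_pairs = [(p, p2) for p in pairs for p2 in pairs if equiv_N(p, p2)]
    return all(less_Z(p, q) == less_Z(p2, q2)
               for p, p2 in equiv_pairs
               for q, q2 in equiv_pairs)


print(check_less_well_defined(5))   # True
print(less_Z((0, 3), (1, 1)))       # True: the class of (0, 3) lies below that of (1, 1)
```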


6.3.15 Proposition. The relation <Z is a linear ordering of Z.
6.3.16 Proposition. The following hold for all integers x, y and z.
(1) x <Z y ↔ x +Z z <Z y +Z z.
(2) If 0Z <Z z then x <Z y ↔ x ·Z z <Z y ·Z z.
(3) x +Z z = y +Z z → x = y.
(4) If 0Z <Z z then x ·Z z = y ·Z z → x = y.

6.3.17 Proposition. The function Φ : N → Z : m ↦ [(m, 0)]≃
(1) is injective
and for all m, n ∈ N we have
(2) Φ(m + n) = Φ(m) +Z Φ(n),
(3) Φ(mn) = Φ(m) ·Z Φ(n),
(4) m < n ↔ Φ(m) <Z Φ(n),
(5) Φ(m) − Φ(n) = [(m, n)]≃.

6.4 The Rationals

We write <, +, ·, 0, 1, −1, . . . instead of <Z , +Z , ·Z , 0Z , 1Z , −1Z , . . .. In other words,
by default, our numbers are the integers Z and we think of N as a subset of Z. We
write N ⊂ Z even though N is only isomorphic to a subset of Z. As usual, we may
omit the dot when denoting multiplication.
6.4.1 Definition. Define the relation ≅ in Z × (Z \ {0}) by
(m, n) ≅ (m′, n′) ↔ m · n′ = m′ · n
for all ordered pairs (m, n) and (m′, n′) in Z × (Z \ {0}). Clearly, the domain and range of ≅ are both Z × (Z \ {0}).
6.4.2 Proposition. The relation ≅ is an equivalence relation on Z × (Z \ {0}).
Proof. We show that ≅ is an equivalence relation:
• Reflexivity: For all m in Z and n in Z \ {0} we have (m, n) ≅ (m, n), since m · n = m · n.
• Symmetry: Suppose that (m, n) ≅ (m′, n′). This gives m · n′ = m′ · n. Hence, m′ · n = m · n′, which gives (m′, n′) ≅ (m, n).
• Transitivity: Suppose that (m, n) ≅ (m′, n′) and (m′, n′) ≅ (m′′, n′′). We need to show that (m, n) ≅ (m′′, n′′). To accomplish this, we note that we are given
(1) m · n′ = m′ · n
and
(2) m′ · n′′ = m′′ · n′.
Does it follow that (m, n) ≅ (m′′, n′′)? For this to hold we need to establish the equality
(3) m · n′′ = m′′ · n.
Does (3) follow from (1) and (2)? It does: Thanks to (1), (2) and the properties of multiplication we have
(m · n′′) · m′ = m · (n′′ · m′)
= m · (m′ · n′′)
= m · (m′′ · n′)
= m · (n′ · m′′)
= (m · n′) · m′′
= (m′ · n) · m′′
= (m′′ · n) · m′.
If m′ ≠ 0, then (3) follows from the above and the cancellation law for multiplication by a nonzero integer. If m′ = 0, then (1) gives m · n′ = 0 and (2) gives m′′ · n′ = 0; since n′ ≠ 0 and Z has no zero divisors, m = m′′ = 0, and (3) holds trivially. We have shown that ≅ is an equivalence relation.
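The following Python sketch (our own illustration; the names and the search box are hypothetical) tests reflexivity, symmetry and transitivity of ≅ by brute force on a small box of pairs, including pairs whose first component is 0. It also shows why the second components must be nonzero: if pairs (m, 0) were admitted, transitivity would fail, since (1, 0) ≅ (0, 0) and (0, 0) ≅ (0, 1), yet (1, 0) and (0, 1) are not related.

```python
from itertools import product


def equiv_Q(p, q):
    """(m, n) ≅ (m', n') iff m * n' = m' * n."""
    (m, n), (m2, n2) = p, q
    return m * n2 == m2 * n


def is_equivalence(elems, rel):
    """Brute-force test of reflexivity, symmetry and transitivity on a finite set."""
    reflexive = all(rel(x, x) for x in elems)
    symmetric = all(rel(y, x) for x in elems for y in elems if rel(x, y))
    transitive = all(rel(x, z)
                     for x in elems for y in elems if rel(x, y)
                     for z in elems if rel(y, z))
    return reflexive and symmetric and transitive


r = range(-3, 4)
allowed = [(m, n) for m, n in product(r, r) if n != 0]   # a box in Z x (Z \ {0})
with_zero = list(product(r, r))                           # zero second components admitted

print(is_equivalence(allowed, equiv_Q))     # True
print(is_equivalence(with_zero, equiv_Q))   # False
```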
6.4.3 Definition. A rational number is an element of the quotient set
Q = [Z × (Z \ {0})]/≅.
Two very special rationals are
0Q = [(0, 1)]≅ and 1Q = [(1, 1)]≅.
6.4.4 Remarks. The integers are not a subset of the rationals. Rather, we have the isomorphism Φ : Z → {[(m, 1)]≅ ∈ Q | m ∈ Z} given by Φ(m) = [(m, 1)]≅, which suggests that we identify each integer m with the rational [(m, 1)]≅. This we do and, by abuse of language, write Z ⊂ Q. Strictly speaking, Z ⊄ Q and Z × {1} ⊄ Q. Note that (m, 1) ∉ Q, but [(m, 1)]≅ ∈ Q. The rational [(m, 1)]≅ shall be denoted by mQ.
In the context of our intuitive notion of the real numbers, we have
$$\frac{m}{n} = \frac{m'}{n'} \iff m \cdot n' = m' \cdot n.$$
6.4.5 Definition. Define +Q by
[(m, n)]≅ +Q [(m′, n′)]≅ = [(mn′ + m′n, nn′)]≅.
The above definition is motivated by the unofficial computation
$$\frac{m}{n} + \frac{m'}{n'} = \frac{mn' + m'n}{nn'}.$$
6.4.6 Proposition. For all m, n, m′, n′, a, b, a′, b′ ∈ Z, if
(m, n) ≅ (m′, n′) and (a, b) ≅ (a′, b′)
then (mb + an, nb) ≅ (m′b′ + a′n′, n′b′).
6.4.7 Proposition. The operation +Q has the following properties.
(1) The operation is associative: (x +Q y) +Q z = x +Q (y +Q z).
(2) The operation is commutative: x +Q y = y +Q x.
(3) The operation has the identity 0Q : x +Q 0Q = x.
(4) Additive inverses exist: For any x ∈ Q there is a y ∈ Q such that
x +Q y = 0Q .

6.4.8 Definition. Define ·Q by
[(m, n)]≅ ·Q [(m′, n′)]≅ = [(mm′, nn′)]≅.
The above definition is motivated by the unofficial computation
$$\frac{m}{n} \cdot \frac{m'}{n'} = \frac{mm'}{nn'}.$$
6.4.9 Proposition. For all m, n, m′, n′, a, b, a′, b′ ∈ Z, if
(m, n) ≅ (m′, n′) and (a, b) ≅ (a′, b′)
then (ma, nb) ≅ (m′a′, n′b′).
6.4.10 Proposition. The operation ·Q has the following properties.
(1) The operation is associative: (x ·Q y) ·Q z = x ·Q (y ·Q z).
(2) The operation is commutative: x ·Q y = y ·Q x.
(3) The operation distributes over +Q : x ·Q (y +Q z) = (x ·Q y) +Q (x ·Q z).
(4) The operation has the identity 1Q, with 1Q ≠ 0Q: x ·Q 1Q = x.
(5) There are multiplicative inverses: For every x ≠ 0Q there is a y ≠ 0Q such that x ·Q y = 1Q.
(6) There are no zero divisors: x ·Q y = 0Q ⇒ x = 0Q ∨ y = 0Q.
(7) (−1Q) ·Q x = −x.

6.4.11 Remarks. If x has a multiplicative inverse, the inverse is unique and shall be denoted by x⁻¹. In particular, for m ≠ 0,
([(m, n)]≅)⁻¹ = [(n, m)]≅.
We can define division (÷) by any nonzero rational. The unofficial computation
$$\frac{m}{n} \div \frac{m'}{n'} = \frac{m}{n} \cdot \frac{n'}{m'} = \frac{mn'}{nm'}$$
gives us the definition (for m′ ≠ 0)
[(m, n)]≅ ÷ [(m′, n′)]≅ = [(m, n)]≅ ·Q [(n′, m′)]≅ = [(mn′, nm′)]≅.
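Division as defined above can be sketched as follows. The helper name `div_Q` is hypothetical, pairs of Python ints stand for representatives, and the standard `fractions.Fraction` type is used only to display the result as an ordinary fraction.

```python
from fractions import Fraction


def div_Q(p, q):
    """[(m, n)] ÷ [(m', n')] = [(m n', n m')], defined only when m' != 0."""
    (m, n), (m2, n2) = p, q
    if m2 == 0:
        raise ZeroDivisionError("division by the rational zero")
    return (m * n2, n * m2)


result = div_Q((3, 4), (-1, 2))
print(result)              # (6, -4)
print(Fraction(*result))   # -3/2
```

The guard mirrors the hypothesis m′ ≠ 0: without it the function would return a pair with second component 0, which is not a representative of any rational.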
6.4.12 Definition. Define the relation <Q in Q by
[(m, n)]≅ <Q [(m′, n′)]≅ ↔ mn′ < nm′,
where the representatives are chosen with n > 0 and n′ > 0. This restriction poses no difficulty: thanks to the equality [(a, b)]≅ = [(−a, −b)]≅, every rational x can be written x = [(a, b)]≅ with b > 0. The above definition is motivated by the unofficial relation
$$\frac{m}{n} < \frac{m'}{n'} \iff mn' < nm' \qquad (n, n' > 0).$$
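The normalization step (replacing (a, b) by (−a, −b) when b < 0) is essential. The sketch below, with hypothetical helper names, compares the definition of <Q against the standard `fractions.Fraction` ordering on a small box of representatives.

```python
from fractions import Fraction
from itertools import product


def normalize(p):
    """Replace (a, b) by the equivalent pair (-a, -b) when b < 0, so that b > 0."""
    a, b = p
    return (-a, -b) if b < 0 else (a, b)


def less_Q(p, q):
    """[(m, n)] <Q [(m', n')] iff m n' < n m', with n, n' > 0 (Definition 6.4.12)."""
    (m, n), (m2, n2) = normalize(p), normalize(q)
    return m * n2 < n * m2


r = range(-3, 4)
elems = [(m, n) for m, n in product(r, r) if n != 0]
print(all(less_Q(p, q) == (Fraction(*p) < Fraction(*q))
          for p in elems for q in elems))   # True
print(less_Q((1, -2), (1, 3)))              # True: -1/2 < 1/3
```

Without `normalize`, the comparison of (1, −2) with (1, 3) would test 1 · 3 < (−2) · 1 and wrongly report that −1/2 is not less than 1/3.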
6.4.13 Proposition. The relation <Q has the following properties.
(1) <Q is a linear ordering of Q.
For all rationals x, y and z the following hold.
(2) x <Q y ↔ x +Q z <Q y +Q z.
(3) If 0Q <Q z then x <Q y ↔ x ·Q z <Q y ·Q z.
(4) x +Q z = y +Q z → x = y.
(5) If z ≠ 0Q then x ·Q z = y ·Q z → x = y.
6.4.14 Proposition. The function Φ : Z → Q : m ↦ [(m, 1)]≅
(1) is injective
and for all m, n ∈ Z we have
(2) Φ(m + n) = Φ(m) +Q Φ(n),
(3) Φ(mn) = Φ(m) ·Q Φ(n),
(4) m < n ↔ Φ(m) <Q Φ(n),
(5) Φ(0) = 0Q,
(6) Φ(1) = 1Q,
(7) Φ(m) ÷ Φ(n) = [(m, n)]≅ whenever n ≠ 0.

6.5 Algebraic Structures

6.5.1 Definitions. We begin with some basic definitions.
(1) Semigroup A set A with an associative binary operation ∗ is called a semigroup. In other words, (A, ∗) is called a semigroup if the operation ∗ : A × A → A is associative: (x ∗ y) ∗ z = x ∗ (y ∗ z) for all x, y, z ∈ A.
(2) Monoid A set A with an associative binary operation ∗ and an identity is called a monoid (or semigroup with identity). In other words, (A, ∗, ι∗) is called a monoid if
(i) (A, ∗) is a semigroup and
(ii) the element ι∗ ∈ A is such that x ∗ ι∗ = ι∗ ∗ x = x for all x ∈ A.
(3) Group A monoid where inverses exist is called a group. In other words, (A, ∗, ι∗) is called a group if
(i) (A, ∗, ι∗) is a monoid and
(ii) for every x ∈ A there is a y ∈ A such that x ∗ y = y ∗ x = ι∗.
(4) Abelian Group An Abelian group is a group where the operation commutes. In other words, (A, ∗, ι∗) is an Abelian group if
(i) (A, ∗, ι∗) is a group and
(ii) x ∗ y = y ∗ x for all x, y ∈ A.
6.5.2 Remarks. • A tacit convention throughout mathematics dictates that any binary operation denoted by the conventional symbol ‘+’ should be both associative and commutative. Moreover, in such a case the identity for the operation + is denoted by the conventional symbol ‘0’ and read “zero”. We will always adhere to these conventions.
• Identities and inverses are unique (when they exist).

6.5.3 Examples. • (N, +, 0), (N, ·, 1), (Z, +, 0) and (Z, ·, 1) are monoids. Of
these, only (Z, +, 0) is a group, and it is Abelian (0 does not have a multi-
plicative inverse; 1 has no additive inverse in N).
• Let X be a set. Then $X^X$ is a monoid under composition ◦ with identity idX. (This follows from the associativity of composition and the properties f ◦ idX = f = idX ◦ f; see Definitions 5.2.1 and 5.2.2.)
• The set of bijections
$$S(X) = \{f \in X^X \mid f \text{ is bijective}\}$$
is a group under composition ◦ with identity idX. Note that S(X) is a “submonoid” of $X^X$ in the obvious sense. In general, any monoid A contains a group $A^\times$ (with the same identity and operation) consisting exactly of the invertible elements of A. In the case of the monoid $X^X$, recall that invertible functions are the same as bijective functions, by Theorem 5.3.7 and Exercise 5.3.8, so $(X^X)^\times = S(X)$.
In the special case when X is a natural number n, one usually writes $S_n$ for S(n). This is the symmetric group on n symbols (or letters) (why “symbols” or “letters” rather than “numbers” or “sets”, we know not.)
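For small n the group axioms for $S_n$ can be verified mechanically. In the sketch below (ours, with hypothetical helper names) a bijection of n = {0, …, n − 1} is stored as a Python tuple listing its values, so that composition is just tuple indexing; `itertools.permutations` enumerates all such tuples.

```python
from itertools import permutations


def compose(f, g):
    """(f ∘ g)(i) = f(g(i)) for permutations of {0, ..., n-1} stored as tuples."""
    return tuple(f[g[i]] for i in range(len(g)))


def inverse(f):
    """The inverse permutation g, characterized by g[f[i]] = i for all i."""
    g = [0] * len(f)
    for i, fi in enumerate(f):
        g[fi] = i
    return tuple(g)


def symmetric_group(n):
    """All bijections n -> n, i.e. the elements of S_n."""
    return list(permutations(range(n)))


S3 = symmetric_group(3)
identity = tuple(range(3))
closed = all(compose(f, g) in S3 for f in S3 for g in S3)
associative = all(compose(compose(f, g), h) == compose(f, compose(g, h))
                  for f in S3 for g in S3 for h in S3)
has_inverses = all(compose(f, inverse(f)) == identity == compose(inverse(f), f)
                   for f in S3)
print(len(S3), closed, associative, has_inverses)   # 6 True True True
```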
6.15 Exercise. Prove that the group S(X) is Abelian if and only if X contains
at most two distinct elements.
6.5.4 Definition. (Pi Notation) Let (A, ∗, ι∗) be a monoid. For any natural number n and any family {ai | i ∈ n} (equivalently, {ai | i ∈ N ∧ i < n}) of elements of A, indexed by n (this family is called an n-tuple of elements of A), there exists a unique element in A, denoted
$$\prod_{i\in n} a_i \quad\text{or}\quad \prod_{0\le i<n} a_i$$
and called the ∗-product of the n-tuple, such that the following properties are satisfied:
• The product of a 0-tuple (the empty tuple, an empty subset of A) is the identity element of A:
$$\prod_{i\in 0} a_i = \iota_*.$$
• For every n ∈ N and every n⁺-tuple $\{a_i\}_{i\in n^+}$:
$$\prod_{i\in n^+} a_i = \Bigl(\prod_{i\in n} a_i\Bigr) * a_n,$$
where $\prod_{i\in n} a_i$ denotes the product of the n-tuple obtained from the n⁺-tuple $\{a_i\}_{i\in n^+}$ by removing its last element $a_n$.
We now prove the existence and uniqueness of the product operators $\prod_{i\in n}$ for all n ∈ N.
Formally, the set of all n-tuples is the set $A^n$ of all functions σ : n → A. The product operator on n-tuples should be a function $\prod_{i\in n}$ from $A^n$ to A.† In other words, $\prod_{i\in n} \in A^{A^n}$ (by convention, $A^{A^n}$ means $A^{(A^n)}$ and not $(A^A)^n$). It remains to construct a function π with domain N such that $\pi(n) = \prod_{i\in n} \in A^{A^n}$.
n
We seek to construct π by applying the Recursion Theorem 5.8.3. We start by

taking the family of sets {AA | n ∈ N}. Let π0 ∈ AA be the function that takes
n 0
the only element of A0 (namely the “0-tuple” ∅) to the element ι∗ of A, that is
π0 (∅) = ι∗ .
n+
Now we define fn : AA → AA as follows. For σ ∈ AA , let fn (σ) be the
n n
n+
function τ ∈ AA such that, for any {ai }i∈n+ :
τ ({ai }i∈n+ ) = σ({ai }i∈n ) ∗ an .
In words, fn transforms an operation σ on n-tuples into an operation τ on n⁺-tuples in two steps: (1) apply σ to the n-tuple obtained by removing the last element of the n⁺-tuple; (2) ∗-multiply the outcome of (1) by the last, hitherto unused element of the n⁺-tuple.
We apply Theorem 5.8.3 to conclude that there exists a unique function $\pi : \mathbb{N} \to \bigcup_{n\in\mathbb{N}} A^{A^n}$ such that:
• π(0) = π0, and
• π(n⁺) = fn(π(n)).
We now write $\prod_{i\in n}$ for π(n). Albeit more cryptic, the base and recursive conditions imposed on π are equivalent to the two properties demanded of the product operator in Definition 6.5.4. The existence and uniqueness of the product of an n-tuple thus follow from those stated in Theorem 5.8.3.
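The recursive characterization of $\prod_{i\in n}$ translates directly into a recursive function. The Python sketch below is an illustration under our own conventions (a tuple is a Python sequence, and the monoid is given by a function `op` and an element `identity`; the name `monoid_product` is hypothetical); it is not a rendering of the set-theoretic construction through π.

```python
from functools import reduce
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def monoid_product(op: Callable[[T, T], T], identity: T, a: Sequence[T]) -> T:
    """∗-product of the tuple a, following Definition 6.5.4: the empty tuple
    gives the identity, and a tuple of length n + 1 gives
    (product of its first n entries) ∗ (its last entry)."""
    if len(a) == 0:
        return identity
    return op(monoid_product(op, identity, a[:-1]), a[-1])


# Strings under concatenation form a non-commutative monoid with identity "".
print(monoid_product(lambda x, y: x + y, "", ["a", "b", "c"]))   # abc
print(monoid_product(lambda x, y: x * y, 1, [2, 3, 4]))          # 24
print(monoid_product(lambda x, y: x * y, 1, []))                 # 1
# functools.reduce with an initializer computes the same left-to-right product.
print(reduce(lambda x, y: x * y, [2, 3, 4], 1))                  # 24
```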
6.5.5 Remarks.
• Our construction of the product operators $\prod_{i\in n}$ above, simultaneously for all n-tuples, is a bit abstract. See Exercises 6.16 and 6.17 below (and perhaps their solutions) for a more direct approach.
† From this point of view, for any n-tuple $a = \{a_i\}_{i\in n}$, strictly speaking we should write its product as $\prod_{i\in n}(a)$ instead of $\prod_{i\in n} a_i$; however, we will use the latter notation exclusively, since it is standard.
• If A is merely a semigroup it is still possible to define the product of nonempty n-tuples (i.e., excluding the case n = 0) simply by replacing the first requirement in Definition 6.5.4 by
$$\prod_{i\in 1} a_i = a_0,$$
and indicating n ≠ 0 in the second requirement.

• Actually, Pi Notation for nonempty tuples can be defined whenever ∗ is any binary operation on the set A, exactly as in the case of a semigroup. However, we warn the reader that Pi Notation as defined above is not commonly used to define products of n-tuples with respect to a non-associative operation ∗.
• Obvious variations of the notation (and its definition) are often used. For instance, the notation $\prod_{0\le i<n}$ is far more common than $\prod_{i\in n}$. On the other hand, $\prod_{0\le i\le n}$ as well as the “top-bottom adorned” notation $\prod_{i=0}^{n}$ are both equivalent to $\prod_{i\in n^+}$ (note it is $n^+$ and not n).
The meaning and formal construction of more general such product operations as
$$\prod_{i=m}^{n} a_i$$
for any natural numbers (or integers) m ≤ n and set $\{a_i\}_{i\in I}$ indexed by an interval I = {i ∈ Z | m ≤ i ≤ n} should also be clear.
• When the operation ∗ is commutative, it is sensible to define the product
$$\prod_{i\in I} a_i$$
of any family $\{a_i\}_{i\in I}$ indexed by a finite set I (not necessarily linearly ordered). Any such finite set is in bijection with a natural number n through some function f : n → I, so $\{a_{f(k)}\}_{k\in n}$ is an n-tuple whose product is well defined. It is routine, though tedious, to prove a generalized commutative law stating that, under the hypothesis that ∗ is commutative, the n-tuples corresponding to different bijections f all have the same product, which is then defined to be the product of the I-tuple {ai}.
6.5.6 Example. To demonstrate the expressive power of the Pi Notation, note that in the monoid (N, ·, 1) consisting of the natural numbers under multiplication, we could have defined powers by
$$m^n = \prod_{i\in n} m$$
(i.e., $m^n$ is the product of the constant n-tuple $\{a_i\}_{i\in n}$ such that $a_i = m$ for all i). See Example 6.5.13 (and also Exercise 6.20) for a definition of products in a formally identical manner. See also Exercise 6.18 below for an adaptation of the above definition to any monoid.
6.16 Exercise. Provide a direct proof of the uniqueness of the product operators $\prod_{i\in n} : A^n \to A$ by induction (rather than by appealing to the uniqueness stated in the recursion theorem).
6.17 Exercise. Use the Bounded Recursion Theorem 5.8.5 to provide a definition of $\prod_{i\in n} a_i$.
6.18 Exercise. In a monoid (A, ∗, ι∗), for any element a ∈ A and any natural number n ∈ N define the n-th ∗-power $a^{*n}$ (or simply $a^n$ if the operation is clear from context) by
$$a^n = \prod_{i\in n} a,$$
that is, $a^n$ is the ∗-product of the constant n-tuple {ai | i ∈ n} with ai = a for all i.
1. Show that the definition above is equivalent to a recursive one, namely:
• $a^0 = \iota_*$, and
• $a^{m^+} = a^m * a$ for all m ∈ N.
2. Prove that the Laws of Exponents
(a) $a^{m+n} = a^m * a^n$, and
(b) $a^{m\cdot n} = (a^m)^n$
hold for all m, n ∈ N.
6.19 Exercise. Let {ai | i ≤ n} be an n⁺-tuple in a monoid (A, ∗, ι). Prove that
$$\prod_{i\le n} a_i = a_0 * \prod_{i<n} a_{i+1}.$$
(Observe that, for n = 2, the statement asserts the associativity (a0 ∗ a1) ∗ a2 = a0 ∗ (a1 ∗ a2) of ∗.)
6.5.7 Definition. Let f : X → X. For any natural number n we define the n-fold composition $f^{(n)} : X \to X$ as the n-th power $f^{\circ n}$ of f in the monoid $X^X$ (in the notation of Exercise 6.18). This is equivalent to a recursive definition:
• $f^{(0)} = \mathrm{id}_X$, and
• $f^{(n+1)} = f \circ f^{(n)}$.
(By Exercise 6.18.2(a), it is immaterial whether we write $f^{(n)} \circ f$ or $f \circ f^{(n)}$. For psychological reasons we prefer the latter: see Definition 6.5.10 below.)
6.5.8 Examples. As a rather trivial example, every natural number n is obtained from 0 by n applications of the successor function σ:
$$n = \sigma^{(n)}(0).$$
Addition of natural numbers may be defined using repeated applications of the successor function σ, namely $\sigma^{(n)} : m \mapsto m + n$. Perhaps more eloquently,
$$m + n = \sigma^{(n)}(m).$$
(See Exercise 6.1, where the notation σn was used for $\sigma^{(n)}$.) The above is the law of exponents $\sigma^{(m+n)} = \sigma^{(n)} \circ \sigma^{(m)}$ (applied to 0).
Similarly, the product m · n is n-fold repeated addition of m, starting from 0:
$$m \cdot n = \bigl(\sigma^{(m)}\bigr)^{(n)}(0),$$
which is the law of exponents $\sigma^{(m\cdot n)} = \bigl(\sigma^{(m)}\bigr)^{(n)}$ (applied to 0).
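The recursive definition of $f^{(n)}$ and the formulas of Examples 6.5.8 can be mirrored directly. In this sketch the helper names `iterate` and `successor` are hypothetical, and nonnegative Python ints stand in for natural numbers.

```python
def iterate(f, n):
    """n-fold composition f^(n), via f^(0) = id and f^(n+1) = f ∘ f^(n)."""
    if n == 0:
        return lambda x: x
    previous = iterate(f, n - 1)
    return lambda x: f(previous(x))


def successor(k):
    return k + 1


print(iterate(successor, 7)(0))               # 7   (n = σ^(n)(0))
print(iterate(successor, 5)(3))               # 8   (m + n = σ^(n)(m))
print(iterate(iterate(successor, 4), 5)(0))   # 20  (m · n = (σ^(m))^(n)(0))
```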
6.5.9 Remark. Let an n-tuple of functions {fi : X → X | i ∈ n} be given. Using Pi Notation in the monoid ($X^X$, ◦, idX) we obtain a definition for the n-fold composition $\prod_{i\in n} f_i$ as a function X → X. One can, however, compose functions whose domain and codomain are different sets. This motivates the following definition.
6.5.10 Definition. Let an N-tuple of sets {Ai | i ∈ N} be given. For any n-tuple of functions $f = \{f_i : A_i \to A_{i^+} \mid i \in n\}$ one defines the composition of {fi}, denoted $\prod^{\circ}_{i\in n} f_i$ (or, informally, $f_{n-1} \circ \cdots \circ f_1 \circ f_0$), recursively as follows:
• $\prod^{\circ}_{i\in 0} f_i = \mathrm{id}_{A_0}$ (the composition of an empty family is the identity).
• $\prod^{\circ}_{i\in n^+} f_i = f_n \circ \prod^{\circ}_{i\in n} f_i$. Note the order!
(Informally, $f_n \circ f_{n-1} \circ \cdots \circ f_0 = f_n \circ (f_{n-1} \circ \cdots \circ f_0)$.)
We often will drop the adornment ◦ from $\prod^{\circ}$, hoping it will not lead to confusion.
Now we show that the composition operators above exist and are well defined. The details are very similar to those for Pi Notation.
Let $F_n$ be the set of all such n-tuples {fi}, where for each i ∈ n we have $f_i : A_i \to A_{i^+}$.‡ Let $B_n = A_n^{A_0}$ be the set of all functions A0 → An. Let $C_n = B_n^{F_n}$ be the set of all functions transforming n-tuples {fi} to functions g : A0 → An. For each n, we seek to define a certain element $\pi(n) = \prod^{\circ}_{i\in n} \in C_n$. Define π0 ∈ C0 by π0(∅) = idA0. Now define $\varphi_n : C_n \to C_{n^+}$ as follows: for any function σ : Fn → Bn, let $\varphi_n(\sigma)$ be the function $\tau : F_{n^+} \to B_{n^+}$ such that, for any n⁺-tuple f:
$$\tau(f) = f_n \circ \sigma(f|_n).$$
Then the requirements in the definition of $\prod^{\circ}_{i\in n}$ read as follows:
• π(0) = π0, and
• π(n + 1) = φn(π(n)).
By the Recursion Theorem 5.8.3, the existence and uniqueness of the composition of a family is proved.
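Definition 6.5.10 can be mirrored in Python, whose functions need not share a domain and codomain. The sketch below is ours (the name `compose_family` is hypothetical); since Python is dynamically typed, the sets Ai are only informal, here int, str, int and int.

```python
def compose_family(fs):
    """Composition f_{n-1} ∘ ... ∘ f_0 of a list fs = [f_0, ..., f_{n-1}],
    built as in Definition 6.5.10: the empty family gives the identity, and
    appending f_n to the list composes it on the *left*."""
    if not fs:
        return lambda x: x
    previous = compose_family(fs[:-1])
    last = fs[-1]
    return lambda x: last(previous(x))


# f_0 = str : int -> str, f_1 = len : str -> int, f_2 = squaring : int -> int.
pipeline = compose_family([str, len, lambda k: k * k])
print(pipeline(12345))   # 25, since 12345 -> "12345" -> 5 -> 25
```

Listing the functions in the order f0, f1, … (the order in which they are applied) while composing on the left is exactly the “Note the order!” of the definition.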
6.5.11 Definition. (Sigma Notation) Let (A, +, 0A) be an additive monoid (that is, a monoid whose operation, denoted by the symbol ‘+’, is commutative). Then, by convention, the binary operation is called the addition of A and the identity element 0A is called the zero of A. In this situation, the product of an n-tuple $\{a_i\}_{i\in n}$ of elements of A with respect to the additive operation + is called the sum of the n-tuple, and Pi notation is replaced by Sigma notation (∏ becomes ∑), which nonetheless has the same formal properties:
• The sum of a 0-tuple is the zero element of A:
$$\sum_{i\in 0} a_i = 0_A.$$
• For every n ∈ N and every n⁺-tuple $\{a_i\}_{i\in n^+}$:
$$\sum_{i\in n^+} a_i = \Bigl(\sum_{i\in n} a_i\Bigr) + a_n,$$
where $\sum_{i\in n} a_i$ denotes the sum of the n-tuple obtained from the n⁺-tuple $\{a_i\}_{i\in n^+}$ by removing its last element $a_n$.
‡ $F_n = \prod_{i\in n} (A_{i^+})^{A_i}$ is a Cartesian product in the sense of Definition 7.2.2.
6.5.12 Remarks.
• Suitably reinterpreted, our above remarks 6.5.5 on Pi Notation apply also to Sigma Notation. Note in particular the possibility (which we will not pursue) of defining the sum of any finite family of elements of an additive monoid.
• A ring has operations both of addition and multiplication (see Definitions 6.5.14, 6.5.18 and 6.5.20). By convention, in any ring, Sigma notation always refers to a sum computed using the operation of addition, and Pi notation to a product computed using the operation of multiplication.
6.5.13 Example. In the monoid (N, +, 0) consisting of the natural numbers under addition, we could have defined
$$m \cdot n = \sum_{i\in n} m$$
(i.e., m · n is the sum of the constant n-tuple $\{a_i\}_{i\in n}$ such that $a_i = m$ for all i). See Exercise 6.20 below for an adaptation of this definition to any additive monoid.
6.20 Exercise. In an additive monoid (A, ⊕, 0A), for any element a ∈ A and any natural number n ∈ N define the n-th multiple n • a by
$$n \bullet a = \sum_{i\in n} a,$$
that is, n • a is the ⊕-sum of the constant n-tuple {ai | i ∈ n} with ai = a for all i. (We are denoting addition in A by ⊕ to preclude confusion below with the addition m + n of natural numbers.)
1. Show that the definition above is equivalent to a recursive one, namely:
(a) 0 • a = 0A (do not confuse 0 ∈ N with 0A ∈ A), and
(b) m⁺ • a = (m • a) ⊕ a for all m ∈ N.
2. For all m, n ∈ N, prove the equalities
(a) (m + n) • a = (m • a) ⊕ (n • a) (the mixed distributivity of • over addition§), and
(b) (m · n) • a = m • (n • a) (the mixed associativity of • and ·).
§ The distributivity is mixed because the addition + in N is likely different from ⊕ in A.
6.21 Exercise. Let {ai | i ∈ n} and {bi | i ∈ n} be n-tuples in an additive monoid (A, +, 0) (whose operation of addition is commutative). Prove the identity
$$\sum_{i<n} (a_i + b_i) = \sum_{i<n} a_i + \sum_{i<n} b_i.$$
6.22 Exercise. Let {aij | i ∈ m ∧ j ∈ n} be a family of elements of an additive monoid A, indexed by m × n, for some m, n ∈ N. (Note: To lighten the notation, for any (i, j) ∈ m × n we write aij (or sometimes ai,j) for a(i,j). In this notation, ij does not stand for the product i · j.)
Prove the identity
$$\sum_{i<m} \Bigl(\sum_{j<n} a_{ij}\Bigr) = \sum_{j<n} \Bigl(\sum_{i<m} a_{ij}\Bigr).$$
(Note that the special case n = 2 above is just Exercise 6.21.)

6.5.14 Definition. (Commutative) Ring with Identity: If A is a set over which the binary operations + and · are defined, and 0 and 1 are elements of A with 0 ≠ 1, then (A, +, ·, 0, 1) is called a ring with identity if
(1) (A, +, 0) is an Abelian group;
(2) the operation · is associative and distributes over + on both sides:
x · (y + z) = (x · y) + (x · z) and (y + z) · x = (y · x) + (z · x)
for all x, y, z ∈ A;
(3) 1 is an identity for ·, i.e.: (A, ·, 1) is a monoid.
If (2) is replaced by
(2’) the operation · is associative, commutative, and distributes over +:
x · (y + z) = (x · y) + (x · z)
for all x, y, z ∈ A,
then A is called a commutative ring with identity.
6.5.15 Remarks. • Note that “commutative” in the definition above refers to
the operation of multiplication of the ring; the operation of addition in any
ring is commutative. In fact, by convention throughout mathematics, any
operation conventionally called “addition” and denoted by the symbol ‘+’
must be both associative and commutative.
• Some authors call a ring with identity as defined above simply a ring, yet oth-
ers define a ring as a structure which does not (necessarily) have an identity.
For us, every ring shall have an identity, though for emphasis and clarity we
will usually stick to the clumsy nomenclature “ring with identity”.
6.5.16 Definition. Let (A, +, ·, 0A , 1A ) be a ring. There exists a unique function
ι : N → A such that ι(0) = 0A and ι(n + 1) = ι(n) + 1A . The function ι is uniquely
extended to a function Z → A by setting ι(−n) = −ι(n) for all n ∈ N. The image
ι∗ (Z) is called the prime ring of A. One can always identify an integer n ∈ Z with
its image ι(n) = ñ ∈ A.
6.5.17 Remark. The prime ring P of A is actually a ring in its own right (a
subring of A), though P need not be a subring isomorphic to Z (for one thing: ι
need not be injective).
6.5.18 Definition. Integral Domain: A commutative ring with identity (A, +, ·, 0, 1)
is called an integral domain if it has no zero divisors, i.e.:
(∀x ∈ A)(∀y ∈ A)(x ≠ 0 ∧ y ≠ 0 ⇒ x · y ≠ 0).
(A divisor of zero in a ring A is an element x ≠ 0 in A such that x · y = 0 for some y ≠ 0 in A.)
6.5.19 Remarks. (Z, +, ·, 0, 1) and (Q, +, ·, 0, 1) are both integral domains. The
latter has more structure, because all nonzero elements are invertible. The struc-
ture, called a field, is introduced below.
The rational numbers Q, the real numbers R and the complex numbers C,
(but not the natural numbers N or the integers Z) with the usual operations of
addition and multiplication, provide examples of a mathematical structure called
a field. The real numbers R shall be introduced in Note 8. The prime ring of each
of these rings is the ring Z itself (when we regard Z ⊂ Q ⊂ R ⊂ C).
6.23 Exercise. Let {ai | i < n} be an n-tuple in a ring with identity (R, +, ·, 0, 1), and let c ∈ R. Prove the identity
$$c \cdot \sum_{i<n} a_i = \sum_{i<n} (c \cdot a_i).$$
(Note that the special case n = 2 is the distributivity property c · (a0 + a1) = (c · a0) + (c · a1).)
6.24 Exercise. Let $\{a_i\}_{i<m}$ and $\{b_j\}_{j<n}$ be tuples in a commutative ring with identity (R, +, ·, 0, 1). Prove the identity
$$\Bigl(\sum_{i<m} a_i\Bigr) \cdot \Bigl(\sum_{j<n} b_j\Bigr) = \sum_{i<m} \sum_{j<n} (a_i \cdot b_j) \qquad (6.4)$$
(which can be seen as a generalization of the distributivity of multiplication over addition: the product of two sums “expands” to a sum of products).
Try to generalize the above identity to the case of a product of three or more (or, as a challenge, to n such) sums. As a concrete challenge, try to find an identity expressing
$$\prod_{j<n} \Bigl(\sum_{i<m} a_{ij}\Bigr)$$
as a sum.
6.5.20 Definition. Fields: A commutative ring with identity (A, +, ·, 0, 1) is called a field if every nonzero element has a multiplicative inverse:
(∀x ∈ A)[x ≠ 0 ⇒ (∃y ∈ A)(x · y = 1)].
6.5.21 Remark. A field is also an integral domain, since the existence of multi-
plicative inverses implies that there are no zero divisors.
6.5.22 Definition. Self-Contained Definition of Field: A self-contained definition of field goes as follows: A field is a set A together with two binary operations, addition (+) and multiplication (·), with the following properties. (The multiplication symbol is not explicitly written out.)
(1) Addition: To every pair x and y in A there corresponds an element x + y of A such that
(i) addition is associative: (x + y) + z = x + (y + z);
(ii) addition is commutative: x + y = y + x;
(iii) there exists an element (additive identity) 0 of A such that x + 0 = x for every x in A;
(iv) for every x in A there is an element (additive inverse) −x of A such that x + (−x) = 0.
(2) Multiplication: To every x and y in A there corresponds an element xy of A such that
(i) multiplication is associative: (xy)z = x(yz);
(ii) multiplication is commutative: xy = yx;
(iii) there exists an element (multiplicative identity) 1 of A, with 1 ≠ 0, such that 1x = x for all x in A;
(iv) every x in A other than 0 has a multiplicative inverse y in A such that xy = 1. (Note: The multiplicative inverse is necessarily unique and denoted x⁻¹.)
(3) Multiplication distributes over addition: x(y + z) = xy + xz. (Note: Multiplication takes precedence over addition. Thus, xy + zw is read as (xy) + (zw).)
6.5.23 Notation. We write x − y for x + (−y), $\frac{1}{x}$ or 1/x for x⁻¹, $\frac{x}{y}$ or x/y for xy⁻¹, x² for xx, x³ for xxx and so on. Note that a field must have at least two elements (0 and 1).
6.5.24 Proposition. The addition axioms 6.5.22(1) imply the following.
(1) x + y = x + z → y = z.
(2) x + y = x → y = 0.
(3) x + y = 0 → y = −x.
(4) −(−x) = x.
Proof. We prove (1): If x + y = x + z then
y = 0 + y
= (−x + x) + y
= −x + (x + y)
= −x + (x + z)
= (−x + x) + z
= 0 + z
= z.
We prove (2): Take z = 0 in (1).
We prove (3): Take z = −x in (1).
We prove (4): In (3) replace x with −x and use the equality −x + x = 0.
6.5.25 Proposition. The multiplication axioms 6.5.22(2) imply the following.
(1) x ≠ 0 ∧ xy = xz → y = z.
(2) x ≠ 0 ∧ xy = x → y = 1.
(3) x ≠ 0 ∧ xy = 1 → y = x⁻¹.
(4) x ≠ 0 → (x⁻¹)⁻¹ = x.
Proof. We prove (1): If xy = xz and x ≠ 0 then
y = 1y = (x⁻¹x)y = x⁻¹(xy) = x⁻¹(xz) = (x⁻¹x)z = 1z = z.
We prove (2): Take z = 1 in (1).
We prove (3): Take z = x⁻¹ in (1).
We prove (4): In (3) replace x with x⁻¹ and use the equality x⁻¹x = 1.
6.5.26 Proposition. The field axioms 6.5.22 imply the following.
(1) 0x = 0.
(2) x ≠ 0 ∧ y ≠ 0 → xy ≠ 0.
(3) (−x)y = −(xy) = x(−y).
(4) (−x)(−y) = xy.
Proof. We prove (1): From 0x + 0x = (0 + 0)x = 0x and 6.5.24(2) we obtain 0x = 0.
We prove (2): The proof is by contradiction. Suppose that x ≠ 0, y ≠ 0 and xy = 0. Then, thanks to (1) we have
1 = 1 · 1 = (y⁻¹y)(x⁻¹x) = y⁻¹x⁻¹xy = y⁻¹x⁻¹0 = 0,
a contradiction.
We prove (3): Apply 6.5.24(3) to
(−x)y + xy = (−x + x)y = 0y = 0
to obtain (−x)y = −(xy), which is the first equality in (3). Apply 6.5.24(3) to
x(−y) + xy = x(−y + y) = x0 = 0
to obtain x(−y) = −(xy), which is the second equality in (3).
We prove (4): Thanks to (3) and 6.5.24(4) we have
(−x)(−y) = −[x(−y)] = −[−(xy)] = xy.
6.5.27 Remark. Q is an example of an ordered field, which is defined below.

6.5.28 Definition. Ordered Fields: An ordered field (A, +, ·, 0, 1, <) is a field (A, +, ·, 0, 1) together with a strict linear ordering < of A such that for all x, y, z ∈ A we have
(1) x < y ↔ x + z < y + z; and
(2) 0 < x ∧ 0 < y → 0 < xy.
An ordered field (A, +, ·, 0, 1, <) is called
(3) dense if (A, <) is dense;
(4) complete if (A, <) is complete.
We say that x is positive if 0 < x and that x is negative if x < 0.

6.5.29 Proposition. In an ordered field we have:
(1) x > 0 ↔ −x < 0.
(2) x > 0 ∧ y < z → xy < xz.
(3) x < 0 ∧ y < z → xy > xz.
(4) x ≠ 0 → x² > 0.
(5) 1 > 0.
(6) 0 < x < y → 0 < y⁻¹ < x⁻¹.
Proof. We prove (1): For x > 0 we have 0 = −x + x > −x + 0 = −x. Hence, 0 > −x. For x < 0 we have 0 = −x + x < −x + 0 = −x. Hence, 0 < −x. Since < is linear, these two facts yield the equivalence.
We prove (2): If z > y then z − y > y − y = 0, giving x(z − y) > 0 for x > 0. Hence
xz = x(z − y) + xy > 0 + xy = xy.
We prove (3): Thanks to (1), (2) and 6.5.26(3) we have
−[x(z − y)] = (−x)(z − y) > 0.
Hence, x(z − y) < 0 and xz < xy.
We prove (4): Thanks to (2), we have x > 0 → x² > 0. If x < 0 then −x > 0. Hence, (−x)² > 0. Then, since (−x)² = x² by 6.5.26(4), we have x² > 0.
We prove (5): Use x = 1 in (4).
We prove (6): If y > 0 and z ≤ 0 then yz ≤ 0. Since yy⁻¹ = 1 > 0, we have y⁻¹ > 0. Using the same reasoning with y replaced by x we obtain x⁻¹ > 0. Finally, we multiply both sides of x < y by the positive value x⁻¹y⁻¹ > 0 to obtain y⁻¹ < x⁻¹.
6.5.30 Proposition. The rationals are dense.
Proof. Given any two rationals x and y with x < y, we have
$$x = \frac{x + x}{2} < \frac{x + y}{2} < \frac{y + y}{2} = y.$$
Therefore, by choosing z = (x + y)/2 we have x < z < y with z ∈ Q.
6.5.31 Remark. The rationals are not complete. In the sense of being dense, Q
has no gaps. Nevertheless, Q has holes in the sense that it fails to be complete: not
every subset of Q that is bounded above has a least upper bound (and not every
subset of Q that is bounded below has a greatest lower bound). We now proceed
to show this.
Define A to be the set of all positive rationals x such that x² < 2 and define B to be the set of all positive rationals x such that 2 < x². A, B are both nonempty since 1 ∈ A and 2 ∈ B. Whenever 0 < x < y we have 0 < x² < y² (since x² = xx < xy < yy = y²), so x ∈ A ∧ y ∈ B ⇒ x < y. In particular, every element of B is an upper bound for A, and every element of A is a lower bound for B.
We claim that A has no largest element and B has no smallest element. That
is, for every x ∈ A there is a y ∈ A such that x < y and for every x ∈ B there is a
y ∈ B such that y < x.
Let x > 0 be given. Define
$$y = x - \frac{x^2 - 2}{x + 2} = \frac{2x + 2}{x + 2}. \tag{1}$$
The above gives
$$y^2 - 2 = \frac{(2x + 2)^2}{(x + 2)^2} - 2 = \frac{2(x^2 - 2)}{(x + 2)^2}. \tag{2}$$
If x ∈ A we have $x^2 - 2 < 0$. Then (1) gives y > x > 0 and (2) gives $y^2 - 2 < 0$, i.e.: y ∈ A. If x ∈ B we have $x^2 - 2 > 0$. Then (1) gives 0 < y < x and (2) gives $y^2 - 2 > 0$, i.e.: y ∈ B.
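Iterating map (1) makes the claim concrete: starting in A the values climb, starting in B they descend, and neither sequence ever lands on a rational whose square is 2. A sketch, assuming `fractions.Fraction` models Q; `step`, `in_A` and `in_B` are illustrative names.

```python
from fractions import Fraction

def step(x: Fraction) -> Fraction:
    """Map (1): y = x - (x^2 - 2)/(x + 2) = (2x + 2)/(x + 2), for x > 0."""
    return (2 * x + 2) / (x + 2)

def in_A(x: Fraction) -> bool:
    return x > 0 and x * x < 2

def in_B(x: Fraction) -> bool:
    return x > 0 and x * x > 2

a = Fraction(1)                 # 1 ∈ A
for _ in range(5):
    y = step(a)
    assert in_A(y) and y > a    # A has no largest element
    a = y

b = Fraction(2)                 # 2 ∈ B
for _ in range(5):
    y = step(b)
    assert in_B(y) and y < b    # B has no smallest element
    b = y

print(a, b)                     # 140/99 99/70
```

The same function also lets one spot-check identity (2) on sample inputs.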
Suppose that A had a supremum sup A ∈ Q. Certainly sup A > 0; moreover, sup A ∉ A since A has no largest element. We also have sup A ∉ B since B, all of whose elements are upper bounds for A, itself has no smallest element: if sup A were in B, a smaller element of B would be a smaller upper bound for A. Hence $(\sup A)^2 = 2$ by trichotomy. Yet the square of no rational number is equal to 2,¶ a contradiction. Similarly, inf B does not exist in Q.
The failure of the rationals to be complete is the primary justification for con-
structing the real numbers.
¶ The unofficial proof of this fact, given in Note 1 long before we had formally defined Q, is now official.
6.5.32 Definition. The Absolute Value Function: If (A, +, · , 0, 1, <) is any ordered field, we define the absolute value function | · | : A → A by
$$|a| = \begin{cases} a, & \text{if } 0 \le a, \\ -a, & \text{if } a < 0. \end{cases}$$
6.5.33 Definition. Archimedean Field: An ordered field (A, +, · , 0, 1, <), where N ⊂ A, is an Archimedean field if for every a ∈ A there is a k ∈ N such that |a| < k.
6.5.34 Proposition. The field Q of rational numbers is Archimedean.
Proof. Let a ∈ Q be given. If |a| = 0, choose k = 1 to obtain |a| < k. Otherwise, |a| = m/n for unique positive integers m and n with no common divisor other than 1. Hence, since n ≥ 1,
$$|a| = \frac{m}{n} \le \frac{m}{1} = m < m + 1.$$
Choose k = m + 1.
6.5.35 Corollary. For any two rational numbers a and b with 0 < a < b there is
a positive integer k such that b < ka.
Proof. Given a and b with 0 < a < b, we have 0 < b/a. Apply the proposition to b/a to obtain 0 < b/a < k for some positive integer k. Multiplying by a > 0 gives b < ka.
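Both proofs are constructive: k is read off from the numerator of |a| written in lowest terms. A sketch, assuming Python's `Fraction` models Q (it always stores a rational in lowest terms with a positive denominator); the function names are hypothetical.

```python
from fractions import Fraction

def archimedean_bound(a: Fraction) -> int:
    """Return a natural number k with |a| < k, as in Proposition 6.5.34."""
    if a == 0:
        return 1
    m = abs(a).numerator          # |a| = m/n in lowest terms, with n >= 1
    return m + 1                  # |a| = m/n <= m/1 = m < m + 1

def multiple_exceeding(a: Fraction, b: Fraction) -> int:
    """For 0 < a < b, return a positive integer k with b < k*a (Corollary 6.5.35)."""
    if not 0 < a < b:
        raise ValueError("expected 0 < a < b")
    k = archimedean_bound(b / a)  # b/a < k
    assert b < k * a              # multiply b/a < k by a > 0
    return k
```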
Foundations Note 7
The Axiom of Choice

July 28, 2020
Contents
7.1 Introduction
7.2 Products and the Axiom of Choice
7.3 One-Sided Inverses
7.4 Countable and Uncountable Sets
7.1 Introduction
In this note we introduce a new axiom, the axiom of choice (abbreviated AC). It is required to justify a line of reasoning that, while superficially appearing extremely obvious and intuitive, is equivalent to statements that cannot be proved from the remaining axioms of ZF. Some consider AC an axiom separate from the axioms of ZF and use ZF + AC (or ZFC) to refer to the axiom system that results when the axiom of choice is included with the other axioms of ZF.
The axiom of choice sneaks up, somewhat unexpectedly, when we follow what
appears to be a rather obvious line of reasoning. To illustrate this, consider the
following question: Suppose that we are given a surjection f : A → B. Does it
follow that there exists an injection g : B → A? Recall that it is not always possible
to define an injection from any set B to any set A. For example, if B = {0, 1} and
A = {0}, no injection from B to A exists.
The answer to the question is affirmative. We can discover the argument needed
for a proof by reasoning as follows. Suppose that f : A → B is surjective. Define
g : B → A as follows. Since f is surjective, for every b ∈ B we have that b = f(a) for some a ∈ A. Define g(b) = a for any such a. It is not important which a ∈ A we choose, as long as f(a) = b for the chosen b ∈ B.
So, we have a function g : B → A. Is it injective? For g to be injective, no two
distinct b’s in B can be mapped to the same a in A. In other words, if g were not
injective, there would be b and b′ in B such that g(b) = g(b′) with b ≠ b′. This
implies, thanks to the definition of g, that there is an a ∈ A such that f(a) = b
and also f(a) = b′. This is impossible, since, thanks to b ≠ b′, we would have
f(a) = b ≠ b′ = f(a),
which is absurd.
Is there something wrong with the above line of reasoning? No, there is not.
But we have introduced a line of reasoning not used before. We assumed that, for every member b of the arbitrary set B, we could choose an element of the arbitrary nonempty set $f^*(\{b\})$, making arbitrarily many (possibly infinitely many) choices at once. Is this assumption incorrect? No, it is not. But, strange, counterintuitive and surprising as it might seem, this assumption is equivalent to using a new axiom.
7.2 Products and the Axiom of Choice

We begin by reviewing the Cartesian product of two sets defined in Note 4. Recall that if a is a set, so is the singleton {a}; and that if a and b are sets, the unordered pair {a, b} is also a set. Therefore if a and b are sets, then {{a}, {a, b}} is also a set, which we defined as the ordered pair (a, b).
Further, for any two sets A and B, for each a ∈ A and b ∈ B, the ordered pair (a, b) belongs to P(P(A ∪ B)): both {a} and {a, b} are subsets of A ∪ B, so (a, b) = {{a}, {a, b}} is a subset of P(A ∪ B). We defined the Cartesian (or cross) product A × B (read: “A cross B”) of A and B by
$$A \times B = \{(a, b) \in \mathcal{P}(\mathcal{P}(A \cup B)) \mid a \in A \wedge b \in B\}. \tag{CP 1}$$
Thanks to the axiom schema of separation, A × B is a set.
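For tiny finite sets, (CP 1) can be carried out literally: build P(P(A ∪ B)) and separate out the Kuratowski pairs. A brute-force sketch, assuming Python `frozenset`s model sets (hashable, so they can be elements of other sets); `pair`, `powerset` and `cross_cp1` are hypothetical names. The search space has 2^(2^|A ∪ B|) members, so this only illustrates why separation applies.

```python
from itertools import combinations

def pair(a, b):
    """Kuratowski ordered pair (a, b) = {{a}, {a, b}}, built from frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def powerset(s):
    """P(s) for a finite set s, as a frozenset of frozensets."""
    items = list(s)
    return frozenset(frozenset(c) for r in range(len(items) + 1)
                     for c in combinations(items, r))

def cross_cp1(A, B):
    """A x B by (CP 1): separate, from P(P(A ∪ B)), the pairs (a, b) with a ∈ A, b ∈ B."""
    wanted = {pair(a, b) for a in A for b in B}
    return {p for p in powerset(powerset(set(A) | set(B))) if p in wanted}
```

Note that pair(1, 1) collapses to {{1}}, yet pairs still satisfy (a, b) = (c, d) exactly when a = c and b = d.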

Our goal is to define arbitrary Cartesian products. In fact, we wish to define not only products of the form $A_0 \times A_1 \times \cdots \times A_n$, or of the form $A_0 \times A_1 \times A_2 \times \cdots$, but also the Cartesian product of an indexed set $\{A_i \mid i \in I\}$ for an arbitrary indexing set I. To accomplish this, we begin by exploring some “natural” isomorphisms. Then, we will present an alternative definition of A × B that differs from the previous definition in a nonessential way.
7.2.1 Definition. General Cartesian Products: Does $A^2 = A \times A$? It does not, but we show that there is an isomorphism $A^2 \sim A \times A$. The argument goes as follows.
Recall that 2 = {0, 1} (by definition) and that $A^2$ is the set of all functions f : {0, 1} → A (also by definition). Define $\varphi : A^2 \to A \times A$ as follows. Given any function f : {0, 1} → A (any element of $A^2$), define
$$\varphi(f) = (f(0), f(1)) \in A \times A.$$
The function φ is bijective. In fact, its inverse $\varphi^{-1} : A \times A \to A^2$ maps (a, b) ∈ A × A to the function f : {0, 1} → A defined by f(0) = a, f(1) = b.
In other words, for every function $f \in A^2$ there is exactly one ordered pair (a, b) ∈ A × A such that (a, b) = (f(0), f(1)). Further, for every (a, b) ∈ A × A there is exactly one function $g_{a,b} \in A^2$ such that
$$g_{a,b}(0) = a \quad\text{and}\quad g_{a,b}(1) = b.$$
The sets $A^2$ and A × A are essentially interchangeable. (They are isomorphic.)
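A finite check of the bijection φ and its inverse. In this sketch, functions {0, 1} → A are modeled as Python dicts and ordered pairs as tuples; both are modeling assumptions, not the set-theoretic constructions themselves, and the names are illustrative.

```python
def phi(f: dict) -> tuple:
    """φ : A^2 → A × A, f ↦ (f(0), f(1)); a function on {0, 1} is stored as a dict."""
    return (f[0], f[1])

def phi_inv(p: tuple) -> dict:
    """φ⁻¹ : A × A → A^2, (a, b) ↦ g_{a,b} with g(0) = a and g(1) = b."""
    a, b = p
    return {0: a, 1: b}

A = {"x", "y", "z"}
A2 = [{0: u, 1: v} for u in A for v in A]   # every function {0, 1} → A
pairs = {(u, v) for u in A for v in A}     # A × A, with tuples as ordered pairs
```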
Is there analogous reasoning we can use if we replace the product A × A with the product A × B?
The isomorphism we are looking for is
$$\{f \in (A \cup B)^2 \mid f(0) \in A \wedge f(1) \in B\} \sim A \times B.$$
To show that such an isomorphism exists, define
$$\varphi : \{f \in (A \cup B)^2 \mid f(0) \in A \wedge f(1) \in B\} \to A \times B$$
as follows. The function
f : {0, 1} → A ∪ B, with f(0) ∈ A and f(1) ∈ B,
is mapped to the ordered pair
φ(f) = (f(0), f(1)) ∈ A × B.
The inverse
$$\varphi^{-1} : A \times B \to \{f \in (A \cup B)^2 \mid f(0) \in A \wedge f(1) \in B\}$$
sends every ordered pair (a, b) ∈ A × B to the function f : {0, 1} → A ∪ B defined by f(0) = a and f(1) = b.
We now do something not usually done in mathematics: we redefine A × B as
$$A \times B = \{f \in (A \cup B)^2 \mid f(0) \in A \wedge f(1) \in B\}. \tag{CP 2}$$
If we agree that this redefinition presents a conflict only over form and not over content, it is now clear how to keep going. The obvious thing to do is to define
$$A \times B \times C = \{f \in (A \cup B \cup C)^3 \mid f(0) \in A \wedge f(1) \in B \wedge f(2) \in C\},$$
then
$$A \times B \times C \times D = \{f \in (A \cup B \cup C \cup D)^4 \mid f(0) \in A \wedge f(1) \in B \wedge f(2) \in C \wedge f(3) \in D\},$$
and so on.
There is no reason why the above unions cannot be replaced by an arbitrary
union. All of this motivates the definition given below.
7.2.2 Definition. Cartesian Products: The Cartesian product of an indexed set {Ai | i ∈ I} is
$$\prod_{i \in I} A_i = \Big\{f \in \Big(\bigcup_{i \in I} A_i\Big)^I \;\Big|\; f(i) \in A_i \text{ for all } i \in I\Big\}. \tag{CP 3}$$
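For a finite index set with finite factors, (CP 3) can be computed directly; each element of the product is a function from I into the union, modeled here as a Python dict. This is only a finite illustration (an assumption), and `cartesian_product` is a hypothetical helper. It says nothing about infinite I, which is where the axiom of choice will enter.

```python
from itertools import product

def cartesian_product(family: dict) -> list:
    """(CP 3) for a finite indexed family {A_i | i ∈ I}, given as a dict {i: A_i}.

    Each element of the product is a function f : I → ∪A_i with f(i) ∈ A_i,
    stored as a dict.
    """
    index = list(family)
    return [dict(zip(index, values))
            for values in product(*(family[i] for i in index))]

family = {"p": {0, 1}, "q": {"a"}, "r": {True, False}}
elements = cartesian_product(family)   # 2 * 1 * 2 = 4 functions
```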
7.2.3 Remark. We cannot do away with (CP 1) and replace it with (CP 2). Definition (CP 2) uses the notion of function, which was defined using (CP 1). In other words, a function f : A → B is a special subset of the Cartesian product A × B, which was defined using (CP 1). If we replaced (CP 1) with (CP 2) we would have a hopelessly circular definition.
The line of thought must be made clear: We use (CP 1) to define the cross
product of two sets, which we use to define functions, which we use to define arbi-
trary cross products via (CP 3). When in (CP 3) we have two sets, this definition
reduces to (CP 2), which differs from (CP 1) only in form, but not in substance
(thanks to the isomorphism we described above).
7.2.4 Remark. When the indexing set I in (CP 3) is empty, the corresponding product is {∅}: the only function with domain ∅ is the empty function, and the condition f(i) ∈ Ai holds vacuously. The product is empty when Ai is empty for some i. If I ≠ ∅ and Ai ≠ ∅ for all i ∈ I, does it follow that $\prod_{i \in I} A_i \neq \emptyset$? (Recall that the indexing set I is arbitrary.) The following axiom, the axiom of choice,∗ guarantees that it does.
7.2.5. Axiom of Choice AC1: The nonempty Cartesian product of nonempty sets is nonempty. In other words, for every indexed set {Ai | i ∈ I}, if I ≠ ∅ and Ai ≠ ∅ for all i ∈ I then $\prod_{i \in I} A_i \neq \emptyset$.
An equivalent way to state the axiom is the following. For every indexed set {Ai | i ∈ I}, if I ≠ ∅ and Ai ≠ ∅ for all i ∈ I then there exists an indexed set {ai | i ∈ I} such that ai ∈ Ai for all i ∈ I.
∗ The axiom of choice, formulated (not quite in the form given presently) in 1904 by Ernst Zermelo, was not shown to be independent of the other axioms of set theory until 1963. This feat was accomplished by Paul Cohen, while proving something even more amazing: the independence of the continuum hypothesis. (See Cohen, P. J. “The Independence of the Continuum Hypothesis,” Proc. Nat. Acad. Sci. USA, 50, 1143-1148, 1963; and Cohen, P. J. “The Independence of the Continuum Hypothesis. II,” Proc. Nat. Acad. Sci. USA, 51, 105-110, 1964.)
7.2.6 Remark. The name of the above axiom may seem strange (why “axiom of choice” and not “axiom of products”?).
Let A be a nonempty collection of nonempty sets. (Think of A as an indexed set.) The axiom of choice says that the product
$$\prod_{a \in A} a = \Big\{f \in \Big(\bigcup_{a \in A} a\Big)^A \;\Big|\; f(a) \in a \text{ for all } a \in A\Big\}$$
is nonempty. In other words, there exists a function
$$f : A \to \bigcup_{a \in A} a \quad\text{such that } f(a) \in a \text{ for all } a \in A.$$
Below we elaborate on the nature of such functions.

7.2.7 Definition. Choice Function: A choice function for a collection A of nonempty sets is any function $f : A \to \bigcup A = \bigcup_{a \in A} a$ such that
(∀a ∈ A)[f(a) ∈ a].
In other words, a choice function “chooses” an element f(a) from every member a of A.
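When every member of A is a finite set of integers, an explicit rule such as a ↦ min a already gives a choice function, so no axiom is needed; the axiom matters when no such rule is available. A sketch of that special case only, assuming frozensets model the members; `min_choice` is a hypothetical name.

```python
def min_choice(collection):
    """A choice function for a collection of nonempty finite sets of integers.

    The explicit rule a ↦ min(a) chooses an element of each member, so in this
    special case no appeal to the axiom of choice is needed.
    """
    if any(len(a) == 0 for a in collection):
        raise ValueError("every member must be nonempty")
    return {a: min(a) for a in collection}

family = [frozenset({3, 1, 4}), frozenset({1, 5}), frozenset({9, 2, 6})]
f = min_choice(family)   # f(a) ∈ a for every member a
```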
7.2.8 Remark. From the above discussion, we see that AC1 above implies AC2
below.
7.2.9. Axiom of Choice AC2 : Every nonempty collection of nonempty sets has
a choice function.
7.2.10 Remark. When A = P(X) \ {∅}, the set of all nonempty subsets of some set X, we have $\bigcup_{a \in A} a = \bigcup_{x \in \mathcal{P}(X) \setminus \{\emptyset\}} x = X$ and AC2 says that there exists a function
f : P(X) \ {∅} → X such that f(x) ∈ x.
The function “chooses” an element f(x) from each nonempty subset x of X. We have proved that AC2 implies AC3 given below.
7.2.11. Axiom of Choice AC3 : For every nonempty set A there is a function
f : P(A) \ {∅} → A such that f (A0 ) ∈ A0 for every nonempty subset A0 of A.
7.2.12 Remark. It turns out that not only AC1 ⇒ AC2 ⇒ AC3 , but that the
three are equivalent. This point shall be discussed later. Warning: Some call f
in AC3 a choice function for A, instead of properly calling it a choice function for
P(A) \ {∅}.
7.3 One-Sided Inverses

We apply the axiom of choice below to complete our discussion of function inverses.
One-sided inverses of a function f : A → B are defined by following the diagram
shown below.
[Diagram: f : A → B, with fL, fR : B → A drawn back from B to A, and the identity maps idA and idB.]
Figure 1: One-Sided Inverses: fL ◦ f = idA and f ◦ fR = idB.

7.3.1 Definition. Left Inverses: A left inverse of f : A → B is a function
fL : B → A such that fL ◦ f = idA .
7.3.2 Definition. Right Inverse: A right inverse of f : A → B is a function
fR : B → A such that f ◦ fR = idB .
7.3.3 Theorem. Existence of Left Inverses: If A is nonempty then every injection f : A → B has a surjective left inverse.
Proof. Let the injection f : A → B be given. We define a surjection fL from B to A, with the desired property, as follows. If $b \in f_*(A)$ then define fL(b) = a, where f(a) = b. (This a is unique for the given b because f is injective.) If $b \notin f_*(A)$ then define fL(b) = a0, where a0 is any fixed element of A. Then fL(f(a)) = a for every a ∈ A, so fL ◦ f = idA, and fL is surjective since every a ∈ A equals fL(f(a)). If f is also surjective, then $f_*(A) = B$, so the second case never arises and fL is uniquely determined.
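The two cases in the proof translate directly into code for finite sets. A sketch, assuming finite sets with functions modeled as Python dicts; `left_inverse` is a hypothetical name. Only the single element a0 has to be picked, which is why no choice principle is involved here.

```python
def left_inverse(f: dict, B: set, a0):
    """Build f_L : B → A for an injection f : A → B, following Theorem 7.3.3.

    f is a dict from A to B, and a0 is any fixed element of A (A nonempty).
    Points of the image f_*(A) go back to their unique preimage; every other
    point of B is sent to a0.
    """
    back = {b: a for a, b in f.items()}
    if len(back) != len(f):
        raise ValueError("f is not injective")
    if not set(back) <= B:
        raise ValueError("f does not map into B")
    return {b: back.get(b, a0) for b in B}

f = {1: "p", 2: "q"}                          # an injection {1, 2} → {"p", "q", "r"}
fL = left_inverse(f, {"p", "q", "r"}, a0=1)
```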
7.3.4 Theorem. Existence of Right Inverses: If A is nonempty then every
surjection f : A → B has an injective right inverse.
Proof. Given the surjection f : A → B, we define an injection from B to A, with
the desired property, as follows.
Thanks to the axiom of choice (AC3 ), there is a function
φ : P(A) \ {∅} → A
such that φ(A0 ) ∈ A0 for every nonempty subset A0 of A.

Because f is surjective, we have $f^*(\{b\}) \neq \emptyset$ for every b ∈ B. Since also $f^*(\{b\}) \subset A$, we have $\varphi(f^*(\{b\})) = a$ for some a ∈ A. Define fR(b) = a. Then f(fR(b)) = b for every b ∈ B, because $a \in f^*(\{b\})$; that is, f ◦ fR = idB.
Note that for b ≠ b′ we have $f^*(\{b\}) \cap f^*(\{b'\}) = \emptyset$ (otherwise f is not a function). Thus, fR is injective. Note that there is only one such fR if and only if f is also injective, since injectivity means that each $f^*(\{b\})$ is a singleton.
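For finite sets one can follow the proof without invoking AC: compute each fiber f*({b}) and pick an element by an explicit rule. A sketch under those assumptions (a finite A of mutually comparable values, functions as Python dicts); in the general theorem, the rule `min` is replaced by the choice function φ supplied by AC3.

```python
def right_inverse(f: dict, B: set):
    """Build f_R : B → A for a surjection f : A → B, following Theorem 7.3.4.

    Each fiber f^*({b}) = {a | f(a) = b} is nonempty, and we take its least
    element. The rule min(...) is an explicit choice function that exists here
    only because A is a finite set of comparable values.
    """
    fibers = {b: [a for a in f if f[a] == b] for b in B}
    if any(not fiber for fiber in fibers.values()):
        raise ValueError("f is not surjective onto B")
    return {b: min(fiber) for b, fiber in fibers.items()}

f = {n: n % 3 for n in range(9)}   # a surjection {0, ..., 8} → {0, 1, 2}
fR = right_inverse(f, {0, 1, 2})
```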
7.3.5 Remarks. What is wrong with the “proof” given below?
Given the surjection f : A → B, we define an injection from B to A,

with the desired property, as follows.
Because f is surjective, we have $f^*(\{b\}) \neq \emptyset$ for every b ∈ B. Define
fR(b) = a by choosing any $a \in f^*(\{b\})$.
There is nothing wrong if we understand that for every b ∈ B we choose some $a \in f^*(\{b\})$, making arbitrarily many (possibly infinitely many) choices, and that, therefore, we are using a line of reasoning that is equivalent to asserting the axiom of choice.
The idea of taking for granted the action of choosing arbitrarily often may
appear to be a harmless part of our everyday mathematical reasoning. In fact,
without using the axiom of choice the amount of mathematics that can be devel-
oped is limited. As we shall see, while the existence of choice functions may be
“intuitively obvious” it is equivalent to reformulations that are not.
Another reformulation of the axiom of choice is given below. (Its equivalence
to the other three forms given above shall be discussed later.)
7.3.6. Axiom of Choice AC4 : Every nonempty relation includes a function
with the same domain.
7.3.7 Remark. We give another proof of the existence of right inverses. Given
a surjection f : A → B with A ̸= ∅, define the set
f° = {(b, a) ∈ B × A | (a, b) ∈ f}.
Then f° is a relation from B to A such that
dom f° = B,
ran f° ⊂ A.
Thanks to the axiom of choice (AC4), there is a function g ⊂ f° with
dom g = dom f° = B.
Any such g is a right inverse of f: for every b ∈ B we have (b, g(b)) ∈ f°, that is, f(g(b)) = b.

7.3.8 Theorem. A function has an inverse if and only if the function has both
left and right inverses, in which case all inverses are identical.
Proof. Let f : A → B be given. Clearly, an inverse of f is both a left inverse and
a right inverse. On the other hand, if fL is a left inverse and fR is a right inverse,
then we have
fL = fL ◦ idB
= fL ◦ (f ◦ fR )
= (fL ◦ f ) ◦ fR
= idA ◦ fR
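= fR.
Hence the common function g = fL = fR satisfies g ◦ f = idA and f ◦ g = idB, so it is an inverse of f. Moreover, every inverse of f is both a left and a right inverse, so by the computation above any two inverses of f coincide.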
7.3.9 Example. The function f : [−1, 1] → [−1, 1] : $x \mapsto x^2$ has neither a left nor a right inverse.
We show that f has no left inverse. Suppose the contrary. Then, there exists a function fL : [−1, 1] → [−1, 1] such that (fL ◦ f)(x) = x for all x in [−1, 1]. Therefore, with x = −1, we have
(fL ◦ f)(−1) = fL(f(−1)) = fL(1) = −1.
With x = 1, we have
(fL ◦ f)(1) = fL(f(1)) = fL(1) = 1.
We have a contradiction.
Nor does f have a right inverse: since $x^2 \ge 0$ for all x, the value −1 is not in the range of f, whereas f ◦ fR = id[−1,1] would require f(fR(−1)) = −1.
7.4 Countable and Uncountable Sets

Recall that a set A is finite if A ∼ n for some n ∈ N; otherwise, A is infinite.
(Note: The empty set is finite.)
7.4.1 Definition. Countable and Uncountable Sets: We say that a set
• A is countable (or countably infinite) if A ∼ N;
• A is uncountable if A is neither finite nor countable, i.e.: A is infinite, but not countably infinite;
• A is at most countable if A is finite or countable.

7.4.2 Proposition. A subset of a finite set is finite.
Proof. If A ∼ n and B ⊂ A then B is equivalent to an element m of n (by
Proposition 5.6.5), and m ∈ N.
7.4.3 Proposition. The union of two finite sets is finite.
Proof. Let A, B be finite sets. Then A ∪ B is the union of the disjoint sets A and B \ A, and the latter is finite, being a subset of the finite set B, by Proposition 7.4.2.
Hence we may assume at the outset, without loss of generality, that A and B are disjoint, with A ∼ m and B ∼ n. Under this assumption, we claim that A ∪ B ∼ m + n.
Note first that k ∈ m + n and k ≥ m imply k = m + l for a unique l ∈ n.
Given bijections f : m → A and g : n → B we define h : m + n → A ∪ B by:
$$h(k) = \begin{cases} f(k) & \text{if } k < m; \\ g(l) & \text{if } m \le k = m + l. \end{cases}$$
The verification that h : m + n → A ∪ B is a bijection is left to the reader.

7.4.4 Proposition. The union of a finite set and a countable set is countable.
Proof. Let A be finite and B countable. Without loss of generality we may, and do, assume A and B to be disjoint, since A ∪ B is the union of the countable set B with the set A \ B, which is disjoint from B and finite, being a subset of the finite set A. We have bijections f : n → A for some n ∈ N and g : N → B. Define h : N → A ∪ B by
$$h(k) = \begin{cases} f(k) & \text{if } k < n; \\ g(k - n) & \text{if } k \ge n. \end{cases}$$
The simple verification that h is a bijection is left to the reader.
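The map h lists the n elements of A first and then follows g. A sketch with Python generators, where an enumeration N → A ∪ B is modeled as an infinite iterator (an assumption of the illustration); `enumerate_union` and `evens` are hypothetical names.

```python
from itertools import count, islice

def enumerate_union(finite_A: list, g):
    """Yield h(0), h(1), ... where h(k) = f(k) for k < n and h(k) = g(k - n) for k >= n.

    finite_A lists the n elements of A (so k ↦ finite_A[k] plays the role of the
    bijection f : n → A); g : N → B is a bijection given as a Python function.
    A and B are assumed disjoint.
    """
    n = len(finite_A)
    for k in count():
        yield finite_A[k] if k < n else g(k - n)

def evens(j: int) -> int:
    return 2 * j   # a bijection N → {0, 2, 4, ...}

first_six = list(islice(enumerate_union([-1, -3], evens), 6))
```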
7.4.5 Proposition. The union of two countable sets is countable.
Proof. Let A and B be countable sets. Then A ∪ B is the union of the countable set B with the set A \ B, which is at most countable, being a subset of A. If A \ B is finite the assertion follows from Proposition 7.4.4. Otherwise A \ B is countable and disjoint from B. The result therefore follows provided we can prove it under the additional assumption that A and B are disjoint, which we do below.
Given bijections f : N → A and g : N → B, define h : N → A ∪ B by
$$h(k) = \begin{cases} f(l) & \text{if } k = 2l \text{ for some } l \in \mathbb{N}; \\ g(l) & \text{if } k = 2l + 1 \text{ for some } l \in \mathbb{N}. \end{cases}$$
By the Division Algorithm, Proposition 6.14, h is uniquely defined on all of N.
The verification that h is a bijection is left to the reader.
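The interleaving h can be sketched directly, with `divmod` playing the role of the Division Algorithm. Bijections are modeled as Python functions (an assumption), and the sets A = {0, 2, 4, ...} and B = {−1, −2, ...} are chosen only for illustration.

```python
def interleave(f, g):
    """Return h with h(2l) = f(l) and h(2l + 1) = g(l).

    f : N → A and g : N → B are bijections onto disjoint sets, given as Python
    functions; h is then a bijection N → A ∪ B.
    """
    def h(k: int):
        l, r = divmod(k, 2)   # Division Algorithm: k = 2l + r with r in {0, 1}
        return f(l) if r == 0 else g(l)
    return h

h = interleave(lambda l: 2 * l, lambda l: -(l + 1))   # A = {0, 2, 4, ...}, B = {-1, -2, ...}
values = [h(k) for k in range(6)]
```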
7.4.6 Proposition. The union of two sets, each at most countable, is at most
countable.
Proof. This follows from Propositions 7.4.3, 7.4.4 and 7.4.5.

7.4.7 Remark. Below we show another application of the recursion theorem combined with the axiom of choice: We prove that every infinite set includes a countably infinite subset. Before giving a proof, let us examine the following line of reasoning.
Let A be an infinite set. Since A is nonempty it contains some a0. Define A0 = {a0} and B1 = A \ A0. Since A is infinite and A0 is finite, B1 is nonempty. So it must contain some a1. Clearly, a1 ≠ a0, since a1 ∈ B1 = A \ A0. Define A1 = A0 ∪ {a1} = {a0, a1}.
Suppose that we have obtained a collection An = {a0, a1, . . . , an} of elements of A such that ai ≠ aj for i ≠ j. Define Bn+ = A \ An. Since A is infinite and An is finite, Bn+ is nonempty. So, it must contain some element an+ which is not an element of An. Define
An+ = An ∪ {an+} = {a0, a1, . . . , an, an+}.
In this manner, we define inductively a set {a0, a1, a2, . . .} ⊂ A, a countably infinite subset of A.
What is wrong with the above argument? The induction principle says that if
S is a set such that 0 ∈ S and such that n ∈ S ⇒ n+ ∈ S for every n ∈ N, then
S = N. Exactly how do we define such a set S? The answer is given below.
7.4.8 Proposition. Every infinite set includes a countably infinite subset.
Proof. Let A be an infinite set and let f : P(A) \ {∅} → A be a choice function for A (in the sense of AC3).
Let P denote the collection of all finite subsets of A. Since A is infinite, it follows that if B ∈ P then A \ B is a nonempty subset of A and that, therefore, A \ B is an element of P(A) \ {∅}.
Define
g : P → P : B ↦ B ∪ {f(A \ B)}.
Thanks to the recursion theorem, there is a function φ : N → P such that φ(0) = ∅ and such that
φ(n+) = g(φ(n)) = φ(n) ∪ {f(A \ φ(n))}
for every n ∈ N.
Define h : N → A : n ↦ f(A \ φ(n)). We prove that h is injective to establish that N is equivalent to a subset of A.
We temporarily interrupt the proof. To illustrate how the ideas work, note that the role of the function g is temporary. Once it is used to define φ, there is no further need to look at it again. So, we examine the role played by the three remaining functions, f, φ, and h. Note that
φ(1) = φ(0) ∪ {f (A \ φ(0))}
= ∅ ∪ {f (A \ ∅)}
= {f (A)}
= {a0 },
where a0 = f (A) is the element of A that f chooses. Next, note that
φ(2) = φ(1) ∪ {f (A \ φ(1))}
= {a0 } ∪ {f (A \ {a0 })}
= {a0 } ∪ {a1 }
= {a0 , a1 },
where a1 = f (A \ {a0 }) is the element of A \ {a0 } that f chooses. Clearly, a1 ̸= a0 .
Next, note that
φ(3) = φ(2) ∪ {f (A \ φ(2))}
= {a0 , a1 } ∪ {f (A \ {a0 , a1 })}
= {a0 , a1 } ∪ {a2 }
= {a0 , a1 , a2 },
where a2 = f(A \ {a0, a1}) is the element of A \ {a0, a1} that f chooses. Clearly,
a0, a1 and a2 are distinct elements of A. Now we look at h. We have
h(0) = f (A \ φ(0))
= f (A \ ∅)
= a0 ;
h(1) = f (A \ φ(1))
= f (A \ {a0 })
= a1 ;
h(2) = f (A \ φ(2))
= f (A \ {a0 , a1 })
= a2 .
We resume the proof. Since h(n) = f(A \ φ(n)) ∈ A \ φ(n), we have
(1) (∀n ∈ N)(h(n) ∉ φ(n)).
Since h(n) = f(A \ φ(n)) ∈ φ(n) ∪ {f(A \ φ(n))} = φ(n+), we have
(2) (∀n ∈ N)(h(n) ∈ φ(n+)).
Since φ(m) ⊂ φ(m) ∪ {f(A \ φ(m))} = φ(m+) for every m ∈ N, induction on m gives
(3) (∀n ∈ N)(∀m ∈ N)(n ⊂ m ⇒ φ(n) ⊂ φ(m)).
If n ∈ m then n+ ⊂ m, so (2) and (3) give h(n) ∈ φ(n+) ⊂ φ(m), while h(m) ∉ φ(m) by (1). Hence
(4) (∀n ∈ N)(∀m ∈ N)(n ∈ m ⇒ h(n) ≠ h(m)).
For any two natural numbers m and n with m ≠ n either m ∈ n or n ∈ m. Then, if n ∈ m we have, thanks to (4), that h(n) ≠ h(m). The same conclusion holds if m ∈ n. Therefore, h is injective.
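The recursion defining φ and h can be run for finitely many steps. The sketch below assumes A is an infinite set of natural numbers given by a membership test, so that "the least element of A \ B" is an explicit choice rule; for a general infinite set no such rule is available, which is exactly why the proof uses a choice function. The names `choose` and `build` are hypothetical.

```python
from itertools import count

def choose(in_A, B: frozenset) -> int:
    # Plays the role of f(A \ B) for an infinite set A ⊂ N given by the
    # membership test in_A: return the least element of A that is not in B.
    # The loop terminates because A is infinite and B is finite.
    for m in count():
        if in_A(m) and m not in B:
            return m

def build(in_A, steps: int):
    # phi[0] = ∅, phi[n + 1] = phi[n] ∪ {f(A \ phi[n])}, h[n] = f(A \ phi[n]).
    phi = [frozenset()]
    h = []
    for n in range(steps):
        chosen = choose(in_A, phi[n])
        h.append(chosen)
        phi.append(phi[n] | {chosen})
    return phi, h

phi, h = build(lambda m: m % 3 == 0, 5)   # A = the multiples of 3
```

The checks below mirror facts (1) and (2) and the injectivity of h on the computed prefix.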
Foundations Note 8
The Real Numbers

July 28, 2020
Contents
8.1 Introduction
8.2 The Reals
8.3 Addition
8.4 Multiplication
8.5 The Real Field, Density
8.6 Monotonic Functions. No-Gaps Condition.
8.7 Powers, Roots, Exponentials and Logarithms
8.8 Sequences and Numerals
8.1 Introduction
Note: In this note the symbol ⇒ is used to denote the sentential connective →
and also to denote a tautological implication (logical consequence). The symbol
⇔ will be used analogously (for esthetic symmetry). The purpose is to release →
from performing triple duty. Thus, → will be used to denote limits (e.g.: 1/x → 0
as x → +∞) and as a function symbol (e.g.: f : A → B).
Recall (see Note 6) that an ordered field (A, +, · , 0, 1, <) is a field (A, +, · , 0, 1)
together with a strict ordering∗ < of A such that for all a, b, c ∈ A we have
∗
Per Note 6, an ordering is a relation that is asymmetric and transitive. Do not confuse
“asymmetric” with “antisymmetric.” To say that < is asymmetric means that a < b ⇒ ¬(b < a).
• a < b ⇔ a + c < b + c; and
• a < b ⇔ a · c < b · c, whenever 0 < c.
Further recall (see Note 6) that an ordered field (A, +, ·, 0, 1, <)
• is called dense if for all a and b in A such that a < b there is a c ∈ A such
that a < c < b; and
• is called complete if every nonempty subset of A that is bounded above has a least upper bound in A. (Alternatively, if every nonempty subset of A that is bounded below has a greatest lower bound in A.)
In Note 6 we proved that the rationals Q are dense, but not complete. That Q is dense is easy to see: Given a, b ∈ Q with a < b, we have
$$a = \frac{a+a}{2} < \frac{a+b}{2} < \frac{b+b}{2} = b.$$
Therefore, by choosing c = (a + b)/2, we have a < c < b with c ∈ Q. To show that Q fails to be complete is a little more subtle, but quite elementary: We define A to be the set of all positive rationals a such that $a^2 < 2$ and define B to be the set of all positive rationals b such that $2 < b^2$. As shown in Remark 6.5.31, neither sup A nor inf B exists in Q.
That Q fails to be complete means that not every subset of Q that is bounded
above has a least upper bound and that not every subset of Q that is bounded
below has a greatest lower bound. The reals are constructed to be complete.
8.2 The Reals

We write <, +, ·, 0, 1, −1, . . . instead of <Q , +Q , ·Q , 0Q , 1Q , −1Q , . . .. In other words,
by default, our numbers are the rationals Q and we think of Z as a subset of Q.
We write Z ⊂ Q even though Z is only isomorphic to a subset of Q. Likewise, we
write N ⊂ Z. As noted in Remarks 6.3.4, the identification of Z (and, likewise,
of N) with a subset of Q is unambiguous. As usual, we may omit the dot denoting
multiplication.
8.2.1 Definition. Initial and Final Segments: Let ⊏ be a linear ordering of
a set A and let B ⊂ A.
B is an initial segment of A if
x ∈ B ∧ y ⊏ x ⇒ y ∈ B
for all x, y ∈ A. In other words, B ⊂ A is an initial segment of A if whenever x is in B so is everything in A that is to the left of (less than) x.
B is a final segment of A if
x ∈ B ∧ x ⊏ y ⇒ y ∈ B
for all x, y ∈ A.
Note that some of the consequences of B being, say, an initial segment, are
x ∈ B ∧ y ∉ B ⇒ x ⊏ y   and   x ∉ B ∧ x ⊏ y ⇒ y ∉ B.
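8.2.2 Definitions. Let (A, ⊏) be a linearly ordered set. For x ∈ A define
$$\begin{aligned} A_{<x} &= \{y \in A \mid y \sqsubset x\}, \\ A_{\le x} &= \{y \in A \mid y \sqsubseteq x\}, \\ A_{>x} &= \{y \in A \mid x \sqsubset y\}, \\ A_{\ge x} &= \{y \in A \mid x \sqsubseteq y\}. \end{aligned} \tag{8.1}$$
Clearly, $A_{<x}$ and $A_{\le x}$ are initial segments of A whereas $A_{>x}$ and $A_{\ge x}$ are final segments.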
8.2.3 Remarks. We constructed the integers Z from the natural numbers N by
using two natural numbers to name an integer. Then, we constructed the rationals
Q from the integers Z by using two integers to name a rational number. The
construction of the reals from the rationals is more complicated: infinitely many
rationals are needed to name a real.
8.2.4 Definition. Dedekind Cuts: A Dedekind cut is a nonempty proper
initial segment of Q without a largest element. That is, a Dedekind cut is a subset
x of Q such that
(1) x ≠ ∅ and x ≠ Q (x is a nonempty proper subset of Q);
(2) x has no largest element, i.e.: for every a ∈ x there is a b ∈ x such that a < b;
(3) x is an initial segment.
8.2.5 Definition. The Real Numbers: The set R of real numbers is defined
as
R = {x ∈ P(Q) | x is a Dedekind cut}.
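Computationally, a Dedekind cut can be modeled by its membership test, a predicate on rationals. This is only a sketch: it assumes Python predicates over `Fraction`, and conditions (1) to (3) cannot be verified mechanically for an arbitrary predicate, only spot-checked. It shows the cut naming a rational q and the cut naming √2, with explicit witnesses for "no largest element" (the √2 witness reuses map (1) of Remark 6.5.31). All names are hypothetical.

```python
from fractions import Fraction
from typing import Callable

Cut = Callable[[Fraction], bool]   # a cut modeled by its membership test (assumption)

def rational_cut(q: Fraction) -> Cut:
    """The cut {a ∈ Q | a < q} naming the rational q."""
    return lambda a: a < q

def sqrt2_cut(a: Fraction) -> bool:
    """The cut {a ∈ Q | a < 0 or a^2 < 2}; it names a real with no rational value."""
    return a < 0 or a * a < 2

def above_in_sqrt2(a: Fraction) -> Fraction:
    """Given a in sqrt2_cut, return a larger member of the same cut."""
    if a <= 0:
        return Fraction(1)
    return (2 * a + 2) / (a + 2)   # map (1) of Remark 6.5.31

def above_in_rational(q: Fraction, a: Fraction) -> Fraction:
    """Given a < q, return a larger member of rational_cut(q)."""
    return (a + q) / 2
```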
8.2.6 Definition (Ordering of R). Define the ordering <R on R by
<R = {(x, y) ∈ R × R | x ⊂ y ∧ x ≠ y}.
That is, for all x and y in R we have
x <R y ⇔ x ⊂ y ∧ x ≠ y.
We write x ≤R y if x = y or x <R y, i.e.:
x ≤R y ⇔ x ⊂ y.
We also write x >R y for y <R x and x ≥R y for y ≤R x.

8.2.7 Proposition (Trichotomy for the Real Numbers). The relation <R is a linear ordering on R.
Proof. The transitivity of <R follows immediately from the transitivity of set inclusion.
Let x and y be real numbers. We must show that exactly one of the alternatives
x <R y, x = y, y <R x,
holds. Clearly, at most one of them holds. We finish the proof by showing that at least one of them holds. To do this, suppose that x <R y and x = y are both false. That is, suppose that x ⊄ y. We must show that y <R x. Since x ⊄ y, there is some rational r in x \ y. Choose any s ∈ y. If r ≤ s, then, since y is an initial segment, we have r ∈ y, contradicting r ∈ x \ y. Therefore, we have s < r. Then, since r ∈ x and x is an initial segment, we have s ∈ x.
We have shown that s ∈ y ⇒ s ∈ x. Since s is arbitrary, we have y ⊂ x; since x ≠ y, this means y <R x.
8.2.8 Proposition. Least Upper Bound Property: Every nonempty subset
of R that is bounded above has a unique least upper bound.
Proof. Suppose that A is a nonempty subset of R that is bounded above. We prove
that ∪A is a least upper bound. By trichotomy, it will be unique.
First, we prove that ∪A is a Dedekind cut.
We prove that ∪A ≠ ∅. Since A ≠ ∅, we have a ∈ A for some a ∈ R. Note that a ≠ ∅ because the empty set is not a Dedekind cut. Since ∅ ≠ a ⊂ ∪A, we have ∪A ≠ ∅.
We prove that ∪A ≠ Q. Let b∗ be an upper bound of A. Then b∗ ≠ Q, since b∗ is a Dedekind cut. Since x ∈ A ⇒ x ⊂ b∗ we have ∪A ⊂ b∗. Therefore ∪A ≠ Q.
We prove that ∪A has no largest element. For every x ∈ ∪A we have x ∈ a for
some a ∈ A. Since a is a Dedekind cut, it has no largest element. Hence, x < y
for some y ∈ a. This means that y ∈ ∪A, showing that for all x ∈ ∪A there is a
y ∈ ∪A with x < y.
We prove that ∪A is an initial segment. Choose any x ∈ ∪A. Then, x ∈ a for some a ∈ A. If y < x then y ∈ a also, because a is an initial segment. Hence, y ∈ ∪A, showing that ∪A is also an initial segment.
We have shown that ∪A is a Dedekind cut. We show that it is also the least upper bound of A. Clearly, ∪A is an upper bound of A, i.e.: a ∈ A ⇒ a ≤R ∪A. Consider any b <R ∪A. This means that there is an x ∈ ∪A with x ∉ b. Since x ∈ ∪A, we have x ∈ a for some a ∈ A. Then a ⊄ b, so a ≤R b fails and, by trichotomy, b <R a. Hence b cannot be an upper bound for A.
We have shown that ∪A = sup A.
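For a finite collection of cuts naming rationals, the union from the proof is the cut of the largest rational, as one expects. A spot check on sample rationals, modeling a cut by its membership predicate on `Fraction` values (an assumption of the sketch; `rational_cut` and `union_cut` are hypothetical helpers).

```python
from fractions import Fraction

def rational_cut(q):
    """The cut {a ∈ Q | a < q}, as a membership predicate."""
    return lambda a: a < q

def union_cut(cuts):
    """Membership test for ∪A when A is a finite collection of cuts."""
    return lambda a: any(x(a) for x in cuts)

qs = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)]
sup = union_cut([rational_cut(q) for q in qs])
top = rational_cut(max(qs))
samples = [Fraction(n, 12) for n in range(-12, 13)]
```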
8.2.9 Proposition. Every nonempty subset of R that is bounded below has a
unique greatest lower bound.
Proof. See the Exercise below.
8.1 Exercise. Prove Proposition 8.2.9.
Henceforth we will denote the order relation <R of R simply by <.
8.2.10 Definition. Intervals: The following subsets of the real line R are called intervals. For real a and b define
(a, b) = {x ∈ R | a < x < b},
[a, b) = {x ∈ R | a ≤ x < b},
(a, b] = {x ∈ R | a < x ≤ b},
[a, b] = {x ∈ R | a ≤ x ≤ b},
(a, +∞) = R>a = {x ∈ R | a < x},
(−∞, b) = R<b = {x ∈ R | x < b}.
If a > b we have [a, b] = ∅; on the other hand, [a, a] = {a}, the set with the single point a.
The notation (a, b) is ambiguous: it can denote an open interval or an ordered pair.
Context tells which.
8.2.11 Proposition. Intervals of R can be characterized as the subsets I ⊂ R
having the following intermediacy property:
(∀x, y, z ∈ R)(x, z ∈ I ∧ x < y < z ⇒ y ∈ I).
Proof. See the Exercise below.


8.2.12 Corollary. An initial (or final) segment of an interval I ⊂ R is an interval.
Proof. It is obvious from the definitions that an initial or final segment satisfies
the intermediacy property, hence is an interval by Proposition 8.2.11.
8.3 Addition
We begin by defining the additive identity of R.
8.3.1 Definition. Real Zero: Define 0R to be the set of all negative rationals:
0R = {a ∈ Q | a < 0}.
8.3.2 Definition. Real Addition: Addition +R of two reals is defined by
x +R y = {a + b ∈ Q | a ∈ x ∧ b ∈ y}.
8.3.3 Proposition. x +R y is a Dedekind cut for all x, y ∈ R.
Proof. First, we prove that x +R y ≠ ∅ and x +R y ≠ Q. Since x and y are cuts, neither is empty. Therefore, x +R y ≠ ∅. Further, x ≠ Q and y ≠ Q, so we may take a′ ∉ x and b′ ∉ y. For every a ∈ x and b ∈ y we have a < a′ and b < b′ (since x and y are initial segments), hence a + b < a′ + b′. Thus a′ + b′ is not of the form a + b with a ∈ x and b ∈ y, giving a′ + b′ ∉ x +R y. This shows that x +R y ≠ Q.
Next, we prove that x +R y has no largest element. Choose c ∈ x +R y. Then
c = a + b with a ∈ x and b ∈ y. Take r ∈ x with r > a. Then c < r + b and
r + b ∈ x +R y. This shows that x +R y has no largest element.
We prove that x +R y is an initial segment. Choose c ∈ x +R y. Then c = a + b
for some a ∈ x and b ∈ y. If d < c then d − b < a, giving d − b ∈ x and
d = (d − b) + b ∈ x +R y. This shows that x +R y is an initial segment.
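Definition 8.3.2 behaves as expected on cuts that name rationals: writing q* for {a ∈ Q | a < q} (notation introduced only for this sketch), one has q* +R r* = (q + r)*. The nontrivial inclusion amounts to splitting any c < q + r as a sum a + b with a < q and b < r. The computation below assumes `Fraction` models Q; `split_sum` is a hypothetical name.

```python
from fractions import Fraction

def split_sum(c: Fraction, q: Fraction, r: Fraction):
    """Given c < q + r, return (a, b) with a < q, b < r and a + b = c.

    This shows (q + r)* ⊂ q* +_R r*, where q* = {a ∈ Q | a < q}. The reverse
    inclusion holds because a < q and b < r imply a + b < q + r.
    """
    e = q + r - c
    if e <= 0:
        raise ValueError("expected c < q + r")
    return q - e / 2, r - e / 2
```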
8.3.4 Definition. The Negative of a Real Number: For every real x we want to define the real −x in such a way that x +R (−x) = 0R.
Guided by Figure 1, we may be tempted to define −x so that
a ∈ (−x) ⇔ −a ∉ x
for all a ∈ Q, i.e.: −x = {a ∈ Q | −a ∉ x}.
[Diagram: the rational line Q with the points a, 0 and −a marked; braces indicate the cut x and the set −x.]
Figure 1: A Visualization of the Negative of a Cut.

Unfortunately, −x defined in this way may not be a cut because it may contain a largest element. For example, we would have
$$\begin{aligned} -0_R &= \{a \in \mathbb{Q} \mid -a \notin 0_R\} \\ &= \{a \in \mathbb{Q} \mid -a \ge 0\} \\ &= \{a \in \mathbb{Q} \mid a \le 0\}, \end{aligned}$$
which is not a cut, since 0 ∈ (−0R) is its largest element. Therefore we modify things a little and define
$$-x = \{a \in \mathbb{Q} \mid (\exists b > a)(-b \notin x)\},$$
which does the trick quite nicely, as we prove in Proposition 8.3.6 below.
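The difference between the two candidate definitions shows up already on cuts naming rationals. With cuts modeled as predicates on `Fraction` (an assumption of this sketch; the helper names are hypothetical), the naive set for 0R contains its largest element 0, while for x = {a ∈ Q | a < q} the corrected definition puts every a < −q into −x: the midpoint of a and −q is a witness b > a with −b ∉ x.

```python
from fractions import Fraction

def naive_negative(x):
    """First attempt: {a ∈ Q | -a ∉ x}, with the cut x given as a predicate."""
    return lambda a: not x(-a)

def negative_witness(q: Fraction, a: Fraction) -> Fraction:
    """For x = {a | a < q} and a < -q, return b > a with -b ∉ x, so a ∈ -x."""
    if not a < -q:
        raise ValueError("expected a < -q")
    return (a - q) / 2   # the midpoint of a and -q
```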
8.3.5 Proposition. For any Dedekind cut x ∈ R
a ∈ x ∧ b ∈ (−x) ⇒ a < −b;
a ∈ x ∧ a ≥ −b ⇒ b ∉ (−x);
b ∈ (−x) ∧ a ≥ −b ⇒ a ∉ x.
Before starting the proof of this easy proposition, we remark that it almost characterizes (−x). In fact, the set of b ∈ Q for which −b > a for all a ∈ x is exactly our early (not quite correct) definition {b ∈ Q | −b ∉ x} of (−x). The proposition is useful because it is often all that is needed to prove one half of an identity involving (−x). The proof will not use the fact that (−x) is a Dedekind cut; this is established in Proposition 8.3.6 below.
Proof. It suffices to prove the first statement since it is tautologically equivalent to
the second and third (in view of ¬(a < −b) ⇔ a ≥ −b, by trichotomy).
Let a ∈ x and b ∈ (−x). By definition, there exists c > b with −c ∉ x. Since
c > b we have −c < −b; since x is an initial segment and −c ∉ x, we have −b ∉ x,
and therefore −b > a (again because x is an initial segment and a ∈ x).
8.3.6 Proposition. If x ∈ R then −x ∈ R.
Proof. We prove that if x is a cut then −x is also a cut.
• We prove that −x ≠ ∅. There is some rational b that is not in x (because
x ≠ Q). Define a = −b − 1. Then a ∈ (−x), since −b > a and −(−b) = b ∉ x.
• We prove that −x ≠ Q. Consider any a ∈ x. Since a ∈ x and a ≥ a = −(−a),
we have −a ∉ (−x), by Proposition 8.3.5. Hence, −x ≠ Q.
• We prove that −x is an initial segment. Suppose that a′ < a ∈ (−x). Then
there is some b > a with −b ∉ x. For any such b we also have b > a′ with
−b ∉ x. Therefore, a′ ∈ (−x).
• We prove that −x has no largest element. Consider any a ∈ (−x). There
is some b > a with −b ∉ x. Because the rationals are dense, there is some
rational a′ such that b > a′ > a. Then a′ ∈ (−x) (with the same witness b) and
a′ > a, as was to be shown.
8.3.7 Proposition. For every positive rational c and every real x there is some
a ∈ x such that a + c ∉ x.
Proof. Let c and x be fixed and given as in the statement.
Fix a rational number b ∈ x. Let
S = {n ∈ N | b + nc ∉ x}.
First, we show that S is a nonempty, proper subset of N.
Since a cut is a proper subset of Q, there exists d ∉ x. Then b < d (since x is an
initial segment). Since c is positive, the Archimedean property of Q implies that
b + mc > d for some positive integer m. Since d ∉ x and x is an initial segment
we must have b + mc ∉ x, hence m ∈ S and S is nonempty. Since b ∈ x we have
0 ∉ S, so S is a proper subset of N.
Since N is well-ordered there exists a least element n0 in S; moreover n0 ≥ 1 and
n0 − 1 ∈ N \ S. Letting a = b + (n0 − 1)c, we have a ∈ x and a + c = b + n0c ∉ x.
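The proof of Proposition 8.3.7 is constructive whenever membership in x can be tested. A minimal Python sketch, assuming (hypothetically) that the cut is given as a membership predicate on rationals; the sample cut {a ∈ Q | a < 0 ∨ a² < 2} and the helper names are chosen for illustration only. The loop searches for the least element n0 of S, which exists by the argument above.

```python
from fractions import Fraction as F

def sqrt2_cut(a):
    """Illustrative cut {a in Q | a < 0 or a*a < 2}."""
    return a < 0 or a * a < 2

def step_across(x, b, c):
    """Given a cut predicate x, a rational b with x(b), and a rational c > 0,
    return a with x(a) and not x(a + c), as in Proposition 8.3.7."""
    if not (x(b) and c > 0):
        raise ValueError("need b in x and c > 0")
    n = 0
    # Find the least n with b + n*c outside x; S is nonempty by the
    # Archimedean property, so the loop terminates for a proper cut.
    while x(b + n * c):
        n += 1
    # n is the least element n0 of S, and n0 >= 1 because b lies in x.
    return b + (n - 1) * c

a = step_across(sqrt2_cut, F(0), F(1, 100))
print(a, a + F(1, 100))   # 141/100 71/50
```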
8.3.8 Corollary. For every rational c > 0 and any real x there exist a ∈ x and
b ∉ x with b − a < c.
(Intuitively speaking, there can be no gap of any positive length between a cut
x and its complement Q \ x.)
Proof. Let c and x be as in the statement. By Proposition 8.3.7 there exists a′ ∈ x
such that a′ + c ∉ x. Let a ∈ x such that a > a′ (x has no maximum) and let
b = a′ + c. Then b ∉ x and b − a < b − a′ = c.
An alternative (and perhaps simpler) proof is as follows. Choose d ∈ Q such that
0 < d < c and apply Proposition 8.3.7 with d replacing c. We obtain a ∈ x and
b ∉ x such that b − a = d < c.
8.3.9 Proposition. Real addition +R satisfies the axioms of addition in a field.
(1) Real addition is associative: (x +R y) +R z = x +R (y +R z) for all x, y, z ∈ R.
(2) Real addition is commutative: x +R y = y +R x for all x, y ∈ R.

(3) There exists an additive identity 0R : x +R 0R = x for all x in R.

(4) Every real number has an additive inverse: for all x in R there exists y ∈ R
with x +R y = 0R. In fact y = −x is one such additive inverse.
(It can be shown that 0R is the only additive identity and that −x is the only
additive inverse of x. See Exercise 8.3 below.)
Proof. We prove (1): Each sum (x +R y) +R z is the set of rational sums
(a + b) + c ∈ Q, where a ∈ x, b ∈ y, and c ∈ z. Thanks to the associativity of
rational addition, i.e.: (a + b) + c = a + (b + c), x +R (y +R z) yields the same set.
We prove (2): Each sum x +R y is the set of rational sums a + b ∈ Q,
where a ∈ x and b ∈ y. Thanks to the commutativity of rational addition, i.e.:
a + b = b + a, y +R x yields the same set.
We prove (3): Choose any x ∈ R. If a ∈ x and b ∈ 0R then a + b < a (since
b < 0). Since x is an initial segment, this gives a + b ∈ x, i.e.: x +R 0R ⊂ x. Now,
for every a ∈ x there is some b ∈ x such that a < b (x has no largest element). We
have a − b ∈ 0R, hence, a = b + (a − b) ∈ x +R 0R,
giving x ⊂ x +R 0R. We have shown that x = x +R 0R.
We prove (4): We prove that −x is the additive inverse of x. The definition
of +R gives
(1) x +R (−x) = {a + b | a ∈ x ∧ (∃c > b)(−c ∉ x)}.
• We prove that x +R (−x) ⊂ 0R. For all b ∈ (−x) we have b < c for some c
such that −c ∉ x. Since x is an initial segment, we have a < −c for all a ∈ x;
thus a + b < a + c < 0. This shows that a + b ∈ 0R, proving x +R (−x) ⊂ 0R.
• We prove that 0R ⊂ x +R (−x). Choose any c ∈ 0R. Then c < 0 and −c > 0.
Thanks to Corollary 8.3.8 there exist a ∈ x and d ∉ x such that d − a < −c.
Let b = c − a. Then b = c − a < −d with −(−d) = d ∉ x, so b ∈ (−x) by
definition. Moreover, c = a + b ∈ x +R (−x).
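The proof of 0R ⊂ x +R (−x) is constructive as well. Under the same illustrative assumptions (a cut modelled as a predicate on rationals, hypothetical helper names, sample cut {a ∈ Q | a < 0 ∨ a² < 2}), this Python sketch returns, for a rational c < 0, numbers a ∈ x and d ∉ x together with b = c − a, following the alternative proof of Corollary 8.3.8 with d − a = −c/2 < −c.

```python
from fractions import Fraction as F

def sqrt2_cut(a):
    """Illustrative cut {a in Q | a < 0 or a*a < 2}."""
    return a < 0 or a * a < 2

def straddle(x, start, step):
    """Return (a, d) with x(a), not x(d) and d - a == step (Proposition 8.3.7).
    Assumes x(start) holds and step > 0."""
    n = 0
    while x(start + n * step):
        n += 1
    a = start + (n - 1) * step
    return a, a + step

def split_negative(x, start, c):
    """For rational c < 0, return (a, b, d) with a in x, d not in x, b = c - a
    and b < -d, so that b lies in -x (witness -d) and a + b == c."""
    if not c < 0:
        raise ValueError("c must be negative")
    a, d = straddle(x, start, -c / 2)   # d - a = -c/2 < -c, as in Corollary 8.3.8
    b = c - a                           # d - a < -c is equivalent to b < -d
    return a, b, d
```

For example, `split_negative(sqrt2_cut, F(0), F(-1, 10))` returns (7/5, −3/2, 29/20): here 7/5 ∈ x, 29/20 ∉ x and −3/2 < −29/20, so −3/2 ∈ (−x) and 7/5 + (−3/2) = −1/10.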
8.3 Exercise. Show that 0R is the unique additive identity of R, and that −x is
the only additive inverse of x.
8.3.10 Proposition. Cancellation Law The Cancellation Law
x +R y = x +R z ⇒ y = z,
holds for all x, y and z in R.
Proof. Let x +R y = x +R z. Add (−x) to both sides and use Proposition 8.3.9
(associativity, commutativity, (3) and (4)):
(−x) +R (x +R y) = (−x) +R (x +R z)
((−x) +R x) +R y = ((−x) +R x) +R z
0R +R y = 0R +R z
y = z.
8.3.11 Proposition. Addition Preserves Order For all x, y and z in R we
have
x <R y ⇔ x +R z <R y +R z.
Proof. We prove the biconditional in each direction.
• We prove that
(1) x <R y ⇒ x +R z <R y +R z
for all x, y and z in R. We begin by proving
(2) x ≤R y ⇒ x +R z ≤R y +R z.
Indeed, x ≤R y says that x ⊂ y. This, together with the definition of +R ,
x +R z = {a + b ∈ Q | a ∈ x ∧ b ∈ z},
gives
x +R z = {a + b ∈ Q | a ∈ x ∧ b ∈ z} ⊂ {a + b ∈ Q | a ∈ y ∧ b ∈ z} = y +R z,
which says that x +R z ≤R y +R z, which establishes (2). From Proposition 8.3.10
(in contrapositive form), we obtain
(3) x ≠ y ⇒ x +R z ≠ y +R z.
Now, (1) follows from (2) and (3).

• We prove that x +R z <R y +R z ⇒ x <R y for all x, y and z in R. Suppose
that x +R z <R y +R z. Then, (x +R z ⊂ y +R z) ∧ (x +R z ≠ y +R z). We
cannot have x = y, since this would imply x +R z = y +R z. Therefore, by
trichotomy, we only have two possibilities: either x <R y or y <R x.
If y <R x, then y ⊂ x, and we have
y +R z = {a + b ∈ Q | a ∈ y ∧ b ∈ z} ⊂ {a + b ∈ Q | a ∈ x ∧ b ∈ z} = x +R z,
which gives y +R z ≤R x +R z. Since x ≠ y, (3) gives y +R z ≠ x +R z, so
y +R z <R x +R z, contradicting x +R z <R y +R z (by trichotomy).
The only possibility left is x <R y.
8.3.12 Definition (Subtraction of Real Numbers). For x, y ∈ R define
x −R y = x +R (−y).
Subtraction is merely shorthand for adding the additive inverse.

8.4 Exercise. For all x ∈ R show
0R <R x ⇔ −x <R 0R.
8.5 Exercise. For x, y ∈ R show
x <R y ⇔ −y <R −x ⇔ 0R <R y −R x.
8.3.13 Definition. Real Absolute Value For x ∈ R we define its absolute
value |x| by |x| = x ∪ (−x). Since x and −x are both initial segments of Q, one of
them contains the other, so |x| is simply the larger of the two; in particular
x ⊂ |x| and −x ⊂ |x|. That is to say: x ≤R |x| and −x ≤R |x|. Once we have
finished making the reals into a field, this definition will match the definition
given in Note 6.
8.3.14 Proposition. For all x ∈ R we have 0R ≤R |x|.
Proof. If 0R ≤R x then 0R ⊂ x ⊂ x ∪ −x = |x|. Hence, 0R ≤R |x|.
If x <R 0R then x ⊂ 0R . Hence, 0R ⊂ −x ⊂ x ∪ −x = |x|, which gives
0R ≤R |x|.
8.6 Exercise. Let A be a nonempty subset of R. Define
−A = {−x | x ∈ A} .
Prove the following:
1. A is bounded above if and only if −A is bounded below, in which case
inf(−A) = − sup A.
2. A is bounded below if and only if −A is bounded above, in which case
sup(−A) = − inf A.
8.7 Exercise. Let A be a nonempty subset of R and let c ∈ R. Define
A + c = {x +R c | x ∈ A}.
Prove the following:
1. A is bounded above if and only if A + c is bounded above, in which case
sup(A + c) = c +R sup A.
2. A is bounded below if and only if A + c is bounded below, in which case
inf(A + c) = c +R inf A.
8.8 Exercise. Let A, B be nonempty subsets of R, both bounded above. Define
A + B = {z ∈ R | z = x +R y for some x ∈ A, y ∈ B}.
Prove that A + B is nonempty and bounded above, and sup(A + B) = sup A +R
sup B.
We remark that the same relationship holds for the infima, as long as the
assumption on upper- is replaced by lower-boundedness.
8.4 Multiplication
We begin by defining the multiplicative identity of R.
8.4.1 Definition. Real One Define 1R = {a ∈ Q | a < 1}.
8.4.2 Definition. Real Multiplication It may be tempting to define multipli-
cation ·R by
x ·R y = {ab | a ∈ x ∧ b ∈ y}.
Unfortunately, the above does not work: Since x and y are both initial segments,
they both contain negative rationals of arbitrarily large magnitude. This results in
x ·R y containing arbitrarily large rationals. Therefore we proceed as follows.
We define 0R ·R x = x ·R 0R = 0R . For 0R <R x and 0R <R y, we define
x ·R y = Q≤0 ∪ {ab ∈ Q | a ∈ x ∧ 0 < a ∧ b ∈ y ∧ 0 < b}.
More generally, we define
x ·R y = (−x) ·R (−y)   if x <R 0R ∧ y <R 0R,
x ·R y = −[(−x) ·R y]   if x <R 0R ∧ 0R <R y,
x ·R y = −[x ·R (−y)]   if 0R <R x ∧ y <R 0R.
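Both points above can be checked on rational examples. The Python sketch below uses hypothetical helper names and works only with cuts of the form {a ∈ Q | a < p} for rational p > 0 (an assumption made for illustration). The first function exhibits elements of 1R whose product exceeds any given bound N. The second exhibits, for positive rationals p, q and 0 < c < pq, rationals 0 < a < p and 0 < b < q with ab = c; since 0 < a < p and 0 < b < q force ab < pq, the positive-case product of these two cuts is {c ∈ Q | c < pq}.

```python
from fractions import Fraction as F

def naive_product_escapes(N):
    """a = b = -(N + 1) lie in 1_R = {a < 1}, yet a*b > N:
    the naive set {ab | a in x, b in y} is unbounded above."""
    a = b = F(-(N + 1))
    return a, b, a * b

def positive_product_witness(p, q, c):
    """For rationals p, q > 0 and 0 < c < p*q, return (a, b) with
    0 < a < p, 0 < b < q and a*b == c."""
    if not (p > 0 and q > 0 and 0 < c < p * q):
        raise ValueError("need p, q > 0 and 0 < c < p*q")
    t = (1 + c / (p * q)) / 2   # any t with c/(pq) < t < 1 would do
    a = p * t                   # a < p because t < 1
    b = c / a                   # b < q because t > c/(pq)
    return a, b
```

Inputs should be `Fraction`s, so that `c / (p * q)` stays exact.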
8.4.3 Proposition. For all x, y ∈ R we have x ·R y ∈ R.
8.4.4 Definition. The Multiplicative Inverse of a Real Number For every
real x ≠ 0R we want to define its multiplicative inverse x⁻¹ in such a way that
x ·R x⁻¹ = 1R.
A combination of the ideas involved in defining the negative −x and the product
x ·R y suggests the following definition: For x >R 0R, let
x⁻¹ = Q≤0 ∪ {a ∈ Q>0 | (∃b > a)(b⁻¹ ∉ x)}.
For x <R 0R, define x⁻¹ = −((−x)⁻¹). (Naturally, 0R⁻¹ is undefined.)
8.4.5 Proposition. If x ∈ R and x ≠ 0R then x⁻¹ ∈ R. Moreover, 0R <R x ⇒ 0R <R x⁻¹,
and 0R >R x ⇒ 0R >R x⁻¹.
Proof. The proof is left as an exercise (see below).
8.4.6 Proposition. The operation ·R satisfies the axioms of multiplication in a
field.
(1) Real multiplication is associative: (x·R y)·R z = x·R (y ·R z) for all x, y, z ∈ R.
(2) Real multiplication is commutative: x ·R y = y ·R x for all x, y ∈ R.
(3) Real multiplication distributes over +R: x ·R (y +R z) = (x ·R y) +R (x ·R z)
for all x, y, z ∈ R.
(4) There is a multiplicative identity 1R, with 1R ≠ 0R: x ·R 1R = x for all
x ∈ R.
(5) There are multiplicative inverses: For every x ≠ 0R there exists y ≠ 0R such
that x ·R y = 1R. (In fact, y is x⁻¹ as defined above.)
(6) There are no zero divisors: x ·R y = 0R ⇒ x = 0R ∨ y = 0R for all x, y ∈ R.
(7) (−1R ) ·R x = −x for all x ∈ R.

8.11 Exercise. Prove the following consequences of Proposition 8.4.6:
1. 0R ·R x = 0R for all x ∈ R.
2. (−x) ·R y = −(x ·R y) for all x, y ∈ R.
8.4.7 Definition (Division of Real Numbers). For real numbers x, y with
y ≠ 0R define the division (or quotient) of x by y by
x/y = x ·R y⁻¹.
Division is just an alias for multiplication by the reciprocal.
8.4.8 Proposition. For all x, y in R we have
(1) Cancellation Law for multiplication: for all z ≠ 0R, x ·R z = y ·R z ⇒ x = y;
(2) Monotonicity of multiplication by positive numbers: for all z >R 0R,
x <R y ⇒ x ·R z <R y ·R z.
Proof. (1) Multiply both sides of x ·R z = y ·R z by z⁻¹.
(2) The particular case w >R 0R ∧ z >R 0R ⇒ w ·R z >R 0R follows immediately
from the definitions of multiplication and of the ordering. Apply the above
with w = y +R (−x) >R 0R to obtain y ·R z +R (−x) ·R z = (y +R (−x)) ·R z >R 0R.
Adding x ·R z to both sides (Proposition 8.3.11) and using Exercise 8.11:
y ·R z = y ·R z +R 0R
= y ·R z +R (−x) ·R z +R x ·R z
>R 0R +R x ·R z
= x ·R z.
8.12 Exercise. Let A be a nonempty subset of (0, ∞). Define
A⁻¹ = {x⁻¹ | x ∈ A}.
Prove the following:
1. A⁻¹ is bounded above if and only if A is bounded below by a positive number,
in which case sup(A⁻¹) = (inf A)⁻¹.
2. A⁻¹ is bounded below by a positive number if and only if A is bounded above,
in which case inf(A⁻¹) = (sup A)⁻¹.
Try to generalize the problem to subsets of (−∞, 0).

8.13 Exercise. Let A be a nonempty subset of R. For any real number c define
cA = {c ·R x | x ∈ A}.
Prove the following:
1. If c > 0 then cA is bounded above if and only if A is bounded above, in which
case sup(cA) = c ·R sup A.
2. If c > 0 then cA is bounded below if and only if A is bounded below, in which
case inf(cA) = c ·R inf A.
3. If c < 0 then cA is bounded above if and only if A is bounded below, in which
case sup(cA) = c ·R inf A.
4. If c < 0 then cA is bounded below if and only if A is bounded above, in which
case inf(cA) = c ·R sup A.
8.14 Exercise. Let A, B be nonempty subsets of R+ , both bounded above. Define
AB = {z ∈ R | z = x ·R y for some x ∈ A, y ∈ B}.
Prove that AB is nonempty and bounded above, and that sup(AB) = sup A ·R sup B.

8.5 The Real Field, Density

We have proved that the reals form a complete ordered field. Next, we have to
justify the customary notion that the rationals are a subset of the reals. They
are a proper subset (irrational numbers exist) and also dense (there are rational
numbers arbitrarily close to any given real number).
8.5.1 Proposition. Define Φ : Q → R by Φ(a) = Q<a = {b ∈ Q | b < a}.
Then, Φ is injective and for all a, b ∈ Q we have
(1) Φ(a + b) = Φ(a) +R Φ(b),
(2) Φ(ab) = Φ(a) ·R Φ(b),
(3) Φ(0) = 0R ,
(4) Φ(1) = 1R ,
(5) a < b ⇔ Φ(a) <R Φ(b).
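Item (1) can be checked constructively on the level of rationals. A minimal Python sketch (hypothetical helper names; `Fraction` stands for Q): for rationals a, b and any rational c < a + b, it produces a′ < a and b′ < b with a′ + b′ = c, so Φ(a + b) ⊂ Φ(a) +R Φ(b); the reverse inclusion holds because a′ < a and b′ < b force a′ + b′ < a + b.

```python
from fractions import Fraction as F

def phi(a):
    """Phi(a) = {b in Q | b < a}, modelled as a membership predicate."""
    return lambda b: b < a

def sum_witness(a, b, c):
    """For rationals with c < a + b, return (a1, b1) with a1 in Phi(a),
    b1 in Phi(b) and a1 + b1 == c, showing c is in Phi(a) +_R Phi(b)."""
    if not c < a + b:
        raise ValueError("need c < a + b")
    h = (a + b - c) / 2     # split the positive slack evenly between a and b
    return a - h, b - h
```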
8.5.2 Definitions. For every a ∈ Q define aR = Φ(a) and
Q∗ = {aR ∈ R | a ∈ Q} = range of Φ.
Then (Q∗, +R, ·R, 0R, 1R, <R) is an Archimedean field with Q∗ ⊂ R. Clearly,
Q ≠ Q∗, but Q is isomorphic to the bona fide subset (subfield) Q∗ of R. We
identify Q with Q∗ to simply write Q ⊂ R, with the understanding that it is not
Q, but the subset Q∗ of R, that we call “the rationals.” This means that we do
not distinguish between, for example, 3 ∈ Q and 3R = {a ∈ Q | a < 3} ∈ R.
We shall write <, +, ·, 0, 1, −1, . . . instead of <R , +R , ·R , 0R , 1R , −1R , . . .. In
other words, by default, our numbers are the reals R and we think of Q as a subset
of R. We now have
N ⊂ Z ⊂ Q ⊂ R.
The irrationals are the elements of R that are not rational, i.e.: the elements
of R \ Q. (There is no standard notation for the set R \ Q of irrational numbers;
some authors use I.)
The set of positive natural numbers is denoted by N+ . Thus,
N+ = {n ∈ N | 0 < n} = {n ∈ Z | 0 < n}.
In Note 6 we proved that < orders N well. This means that every nonempty subset
of N has a least element, i.e.: for every nonempty subset S of N there is an n0 ∈ S
such that n < n0 ⇒ n ∉ S.
8.5.3 Remark. Henceforth we will never again use the set-theoretical construction
of R as a set of Dedekind cuts. All we shall ever use in subsequent proofs is the
fact that R is a complete ordered field possessing an isomorphically embedded copy
Q∗ of Q. Analysis, in the classical sense, is the study of the reals as a structure
possessing the above properties, rather than the study of any specific set possessing
that structure. It is, of course, far from obvious that such an ordered complete field
should exist at all, and it is precisely to show the existence of such a structure
that we have provided a particular example constructed via Dedekind cuts.
The reader is henceforth urged to rely only on the structural rather than
set-theoretical properties of R in subsequent proofs. Of course any new results proved
in this fashion may also be used subsequently. As a first example, the Archimedean
property of R is proved in Corollary 8.5.9 below, using only the ordering and
completeness of the field R.
8.5.4 Proposition. The set R \ Q of irrational numbers is nonempty. In fact,
there exists a unique real number z such that z > 0 and z² = 2, and we have
z ∈ R \ Q. (The number z is conventionally denoted √2.)
Proof. Strictly speaking, we already know that Q must be a proper subset of R
since R is complete but Q is not (Remark 6.5.31). Since this fact is so important,
we presently choose to reproduce the argument in a slightly different form, working
directly in R.
Let
A = {x ∈ (0, +∞) | x² < 2}.
Then A is nonempty (since 1 ∈ A) and bounded above by 2 because 2² > 2 and
the function x ↦ x² is strictly increasing† in (0, +∞). Let z = sup A. We claim
that z² = 2. Certainly z is positive. Define
(8.2) y = z − (z² − 2)/(z + 2) = (2z + 2)/(z + 2).
The above gives
(8.3) y² − 2 = (2z + 2)²/(z + 2)² − 2 = 2(z² − 2)/(z + 2)².
If z² < 2 then (8.2) gives y > z > 0 and (8.3) gives y² < 2, so y ∈ A,
contradicting the upper bound property of z. If z² > 2 then (8.2) gives 0 < y < z
and (8.3) gives y² > 2, showing (by strict monotonicity) that y is an upper bound
for A and contradicting the minimality of z.
† See Definition 8.6.1 below. The strict monotonicity is shown as follows: 0 < x < x′ ⇒ x² = xx < xx′ < x′x′ = (x′)².

Thus z² = 2. Recalling that the latter equation has no solution z in Q
(Remark 6.5.31) we conclude that z is irrational. Conventionally one writes √2 for z
and the equation z² = 2 as (√2)² = 2: z is the (positive) square root of 2. Note
that the strict monotonicity of x ↦ x² on [0, +∞) implies that √2 is uniquely
defined.
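The auxiliary number y of (8.2) is rational whenever z is, so the identities (8.2) and (8.3) can be checked with exact rational arithmetic. A minimal Python sketch (the function name and sample values are ours, chosen for illustration); iterating it from z = 1 gives 1 ↦ 4/3 ↦ 7/5 ↦ 24/17, rational approximations of √2 from below.

```python
from fractions import Fraction as F

def next_candidate(z):
    """The rational y = z - (z*z - 2)/(z + 2) = (2z + 2)/(z + 2) of (8.2)."""
    return (2 * z + 2) / (z + 2)

for z in [F(1), F(4, 3), F(3, 2), F(2)]:
    y = next_candidate(z)
    # Identity (8.3): y^2 - 2 = 2(z^2 - 2)/(z + 2)^2, so y^2 - 2 and z^2 - 2 share a sign.
    assert y * y - 2 == 2 * (z * z - 2) / (z + 2) ** 2
    # z^2 < 2 gives z < y, and z^2 > 2 gives y < z: the two cases of the proof.
    assert (z * z < 2) == (z < y)
```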
8.15 Exercise. Prove that the set of irrational numbers is closed under neither
addition nor multiplication. (Hence R \ Q is not a ring.)
8.5.5 Proposition. For all real nonnegative x there exists a positive integer n
such that n − 1 ≤ x < n. In particular, N+ is unbounded above (and so is N).
Proof. First we prove that N+ is unbounded above. Assume, for the sake of
seeking a contradiction, that N+ is bounded above. Then z = sup N+ exists by
the completeness property of R, Proposition 8.2.8 (which is the key ingredient of
the present proof). Since z is least, there is n ∈ N+ such that z − 1 < n. Then
z < n + 1, and n + 1 ∈ N+, contradicting that z is an upper bound for N+. The
contradiction shows that N+ is unbounded above (and so is N ⊃ N+).
Now fix x nonnegative. Since N+ is unbounded above, there exist positive
integers n > x. Since N is well-ordered, there exists a least n ∈ N
satisfying x < n. Let n0 be such least n. We have n0 ≠ 0 since x ≥ 0, so
n0 − 1 ∈ N and, by minimality of n0, n0 − 1 ≤ x. Then n0 − 1 ≤ x < n0.
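For rational x the second half of the proof translates directly into a search. A minimal Python sketch (the function name is hypothetical, and rationals stand in for reals, since a program cannot test x < n for an arbitrary real):

```python
from fractions import Fraction as F

def bracket_index(x):
    """For a rational x >= 0, return the least natural n with x < n, found by
    checking n = 0, 1, 2, ... in order (mirroring the well-ordering of N).
    Such n exists because N+ is unbounded above; it satisfies n - 1 <= x < n."""
    if x < 0:
        raise ValueError("x must be nonnegative")
    n = 0
    while not x < n:
        n += 1
    return n
```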
8.5.6 Corollary. For all real numbers x there exists a unique integer n such
that n ≤ x < n + 1. In particular, Z is unbounded above and below.
Proof. If x ≥ 0 there exists m ∈ Z+ with m − 1 ≤ x < m, so let n = m − 1. If
x ∈ Z− let n = x. If x < 0 and x is not an integer then m − 1 ≤ −x < m, hence
−m < x ≤ −m + 1 for some integer m. Moreover x ≠ −m + 1 since x ∉ Z, so let
n = −m.
To show n is unique let n′ be another such. If n ≠ n′ then, without loss of
generality, we may assume n < n′, so x < n + 1 ≤ n′ ≤ x, a contradiction. Hence
n = n′.
8.5.7 Notation. For each x ∈ R, the unique integer n such that n ≤ x < n + 1
is denoted by ⌊x⌋ and called the largest or greatest integer in x or sometimes the
integer part of x.
The real number x − ⌊x⌋ ∈ [0, 1) is called the fractional part of x and denoted
*x+ in these Notes. Observe that the identity x = ⌊x⌋ + *x+ uniquely determines
the numbers ⌊x⌋ and *x+ subject to the conditions ⌊x⌋ ∈ Z and *x+ ∈ [0, 1).
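For rational inputs the case analysis in the proof of Corollary 8.5.6 is an algorithm. The Python sketch below (hypothetical helper names; rationals stand in for reals) computes ⌊x⌋ and the fractional part x − ⌊x⌋ exactly that way; Python's `math.floor`, which accepts `Fraction` arguments, can serve as an independent cross-check.

```python
import math
from fractions import Fraction as F

def least_above(x):
    """Least natural m with x < m, for rational x >= 0 (Proposition 8.5.5)."""
    m = 0
    while not x < m:
        m += 1
    return m

def integer_part(x):
    """The integer n with n <= x < n + 1, following the cases of Corollary 8.5.6."""
    if x >= 0:
        return least_above(x) - 1        # m - 1 <= x < m, take n = m - 1
    if x.denominator == 1:
        return int(x)                    # x is itself a negative integer
    m = least_above(-x)                  # m - 1 <= -x < m, so -m < x < -m + 1
    return -m

def fractional_part(x):
    """x - integer_part(x), which lies in [0, 1)."""
    return x - integer_part(x)
```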
8.5.8 Corollary. inf{1/n | n ∈ N+ } = 0.
Proof. Define S = {1/n | n ∈ N+}. Then, S is nonempty and is bounded below
by 0. Hence, i = inf S ≥ 0. We claim that i > 0 is impossible. If i were positive,
then 1/i < n for some n ∈ N+ , by Proposition 8.5.5, whence i > 1/n would follow,
contradicting the fact that i is a lower bound for S. We conclude that i = 0.
8.5.9 Corollary (Archimedean Property of the Reals). The field R of real
numbers is Archimedean: For every x, y ∈ R with x > 0 there is some positive
integer n such that y < nx.
Proof. Let x, y ∈ R with x > 0. Since N+ is unbounded above there exists a
positive integer n such that x⁻¹y < n. Multiplying by x > 0 preserves the
inequality (Proposition 8.4.8), and we obtain y < nx.
8.5.10 Theorem. (R, +R , ·R , 0R , 1R , <R ) is an ordered complete Archimedean
field.
Proof. This sums up the statements of 8.2.7, 8.2.8, 8.3.9, 8.4.6, 8.4.8 and 8.5.9.
8.5.11 Remarks. We leave it as an exercise to prove that the notion of Archimedean
field used above is equivalent to the notion used in Note 6.
Next, we show that Q is also dense in a more general sense.
8.5.12 Proposition (Q is dense in R). For every two reals x and y with x < y
there is a rational a such that x < a < y.
Proof. Let reals x and y with x < y be given. We consider the following cases.
(i) If 0 < x < y then y − x > 0 and the Archimedean property gives the existence
of a positive integer n such that 1 < n(y − x). Hence,
(1) nx + 1 < ny.
Since nx > 0, there is a positive integer m such that
(2) m − 1 ≤ nx < m.
From (1) and (2) we obtain
(3) m ≤ nx + 1 < ny.
From (2) and (3) we obtain nx < m < ny. The desired rational is a = m/n.
(ii) If x ≤ 0 and y > 0, take a = 1/n for sufficiently large n. Then, x < a < y.
(iii) If x < y < 0 then −x > −y > 0. Thanks to (i), with x replaced by −y and
y replaced by −x, we obtain a rational a with −y < a < −x, i.e.: such that
x < −a < y.
(iv) If x < 0 and y ≥ 0, take a = −1/n for sufficiently large n. Then, x < a < y.
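For rational endpoints the proof above is an algorithm. A minimal Python sketch, assuming `Fraction` inputs stand in for the reals x < y (the function name is hypothetical); the choices of a "sufficiently large n" are made explicit with `math.floor`, i.e. with Corollary 8.5.6.

```python
import math
from fractions import Fraction as F

def rational_between(x, y):
    """Return a rational a with x < a < y, following the cases of Proposition 8.5.12.
    Rationals (Fractions) stand in for the reals x < y."""
    if not x < y:
        raise ValueError("need x < y")
    if 0 < x:
        # Case (i): n with 1 < n*(y - x), then m with m - 1 <= n*x < m.
        n = math.floor(1 / (y - x)) + 1
        m = math.floor(n * x) + 1
        return F(m, n)                  # n*x < m < n*y
    if x <= 0 < y:
        # Case (ii): a = 1/n with 1/n < y.
        return F(1, math.floor(1 / y) + 1)
    if y < 0:
        # Case (iii): apply case (i) to -y < -x and negate the result.
        return -rational_between(-y, -x)
    # Remaining part of case (iv), namely y = 0: a = -1/n with x < -1/n.
    return F(-1, math.floor(-1 / x) + 1)
```

For example, for x = 1/3 and y = 1/2, case (i) gives n = 7 and m = 3, hence a = 3/7.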
8.5.13 Corollary (The irrationals are dense in R). For every two real
numbers a and b with a < b there is an irrational x such that a < x < b.
Proof. Let reals a and b with a < b be given. By Proposition 8.5.12, there exists
c ∈ Q with
a/√2 < c < b/√2.
Hence, a < c√2 < b.
We remark that we can always choose c ≠ 0 without loss of generality. For, if
a < 0 < b, we can take c = 1/n or c = −1/n for sufficiently large n ∈ N+ (otherwise
0 does not lie in the interval (a/√2, b/√2), and c ≠ 0 automatically).
Assuming thus c ≠ 0 we take x = c√2, noting that x must be irrational, since
if x were rational so would be x/c = √2, a contradiction.
8.16 Exercise. For any integers n, k with n ≥ 2, prove that ⁿ√k is either an
integer m (this means that k = mⁿ is a perfect n-th power) or else ⁿ√k is an
irrational number. Your proof may rely on the Fundamental Theorem of Arithmetic
(existence and uniqueness of factorization of positive integers into primes) as well as
the existence and uniqueness of n-th roots (a fact proved below in Section 8.7).
8.17 Exercise. Call two nonzero real numbers x, y commensurable if y/x is ratio-
nal; otherwise x, y are incommensurable.
1. Show that the relation of commensurability is an equivalence relation on the
set of nonzero real numbers. The equivalence class of a nonzero real is called
its commensurability class. (For completeness, we may call {0R} the
commensurability class of 0R.)
2. Show that the commensurability class of any nonzero real is dense in R.
8.18 Exercise. For every integer n ≥ 2, n given nonzero real numbers x1, x2, . . . , xn
are called pairwise incommensurable if, for all i, j ≤ n with i ≠ j, xi and xj are
incommensurable.
Prove that n pairwise incommensurable numbers exist for each n ≥ 2. (Note
that the special case n = 2 is equivalent to the existence of an irrational number.)
8.19 Exercise. Let any two real numbers a, b be given with a < b.
For each n ≥ 2, prove that n pairwise incommensurable numbers exist
in (a, b).
8.20 Exercise. For any n ∈ N call n real numbers u1 , u2 , . . . , un additively inde-
pendent if, for all integers c1 , c2 , . . . , cn ,
c1 u1 + c2 u2 + · · · + cn un = 0 ⇒ c1 = c2 = · · · = cn = 0.
(For n = 0 the equation above reads 0 = 0, since an empty sum is 0 by definition,
and the conclusion holds vacuously; hence ∅ is an additively independent set of
numbers.)
An arbitrary subset S ⊂ R is called additively independent if every n (distinct)
elements of S are additively independent.
Show the following:
1. Any subset of a set of additively independent numbers is additively indepen-
dent.
2. For n = 1, x1 is additively independent if and only if x1 is nonzero.
3. For n = 2, x1, x2 are additively independent if and only if they are nonzero
incommensurable numbers.
4. For any n ≥ 2, if x1, x2, . . . , xn are additively independent, then they are
pairwise incommensurable.
5. When n = 3, find a counterexample to the converse of the statement above.

8.21 Exercise. For any n ∈ N call n nonzero real numbers u1, u2, . . . , un
multiplicatively independent if, for all integers e1, e2, . . . , en,
u1^e1 · u2^e2 · . . . · un^en = 1 ⇒ e1 = e2 = · · · = en = 0.
(For n = 0 the equation above reads 1 = 1, since an empty product is 1 by
definition, and the conclusion holds vacuously; hence ∅ is a multiplicatively
independent set of numbers.)
An arbitrary subset S ⊂ R \ {0} is called multiplicatively independent if every
n (distinct) elements of S are multiplicatively independent.
Assuming the results from Section 8.7 (specifically, the Existence and Uniqueness
of Roots and the algebraic laws of powers, logarithms and exponents), prove
the following:
1. If S is an additively independent set, then T = {2^x | x ∈ S} is a
multiplicatively independent set.
2. If T is a multiplicatively independent subset of R+, then S = {log₂ t | t ∈ T}
is an additively independent subset of R.
(Note that there is nothing special about the base 2 above. Any other positive base
b ≠ 1 would work just fine.)
8.5.14 Remark. The reader may ask whether additively/multiplicatively inde-
pendent sets exist of large size (hitherto we only know this up to size n = 2, being
a restatement of the existence of irrational numbers). These exist, not only for any
finite size n, but even infinite. The integral powers of any transcendental number
(such as π or Euler’s number e) provide an explicit example of an additively in-
dependent countable subset of R. Much more is true: there exist subsets S of R
that are both uncountable and algebraically independent, in the sense that only
trivial algebraic relations (i.e., only trivial equations involving addition, subtrac-
tion, multiplication, division, hence integral powers) hold between elements of S.
Such S may even be taken to be a subset of an arbitrarily small nonempty open
interval (a, b). Any such set S is both additively and multiplicatively independent,
and much more. However, it is not possible to describe any single such set S
explicitly; nonconstructive proofs of the existence of S rely on indirect arguments.
(A relatively simple counting argument works, based on the fact that there are only
countably many algebraic relations between any countable set of numbers whereas
R is uncountable. See Proposition 9.2.16.)
8.6 Monotonic Functions. No-Gaps Condition.

8.6.1 Definition. Monotonic Function. Let f : I → R be a function defined
on a subset I ⊂ R. We call f
• (monotonically) increasing if x ≤ y ⇒ f (x) ≤ f (y)
• strictly (monotonically) increasing if x < y ⇒ f (x) < f (y)
• (monotonically) decreasing if x ≤ y ⇒ f (x) ≥ f (y)
• strictly (monotonically) decreasing if x < y ⇒ f (x) > f (y)
for all x, y ∈ I. (The word “monotonically” is optional in the nomenclature above
and only used for emphasis.)
In any of the cases above, f is called monotonic. In either of the strict cases
above, f is called strictly monotonic.
8.6.2 Proposition. Let f, g be monotonic functions such that the composition
g ◦ f is defined. Then g ◦ f is monotonic. If both f, g are strictly monotonic, so is
g ◦ f . Specifically,
• If f, g are both (strictly) increasing or both (strictly) decreasing, then g ◦ f
is (strictly) increasing.
• If one of f, g is (strictly) increasing and the other is (strictly) decreasing,
then g ◦ f is (strictly) decreasing.
Proof. We provide the details only in the case when f is strictly increasing and g
strictly decreasing. The remaining cases are all cosmetic variations of this one.
Let x, y ∈ Dom(g ◦ f), x < y. Then f(x) < f(y) by the strict monotonicity of
f, and
(g ◦ f)(x) = g(f(x))   (by definition of composition)
> g(f(y))   (since f(x) < f(y) and g is strictly decreasing)
= (g ◦ f)(y).
This shows that g ◦ f is strictly decreasing.

8.6.3 Proposition. A strictly monotonic function is injective. In particular, a
strictly monotonic surjective function f : I → J ⊂ R is bijective.
(Note that no assumption is made about J other than it is the range of f . In
particular, J need not be an interval. However, see Proposition 8.6.13 below.)
Proof. The second assertion follows immediately from the first, which we now
prove.
First, assume f is strictly increasing. For any two real numbers x ̸= y in the
domain of f we have either x < y or y < x; by renaming them we may assume the
first inequality holds. By the strict monotonicity then f (x) < f (y), so f (x) ̸= f (y).
Therefore f is injective.
The case of f strictly decreasing is deduced from the above applied to the
function g defined by g(x) = −f (x) and two obvious facts:
• g is (strictly) increasing if and only if f is (strictly) decreasing, and
• g is injective if and only if f is injective.
8.6.4 Proposition. An injective monotonic function is strictly monotonic. In
particular, an invertible monotonic function is strictly monotonic.
Proof. Let f be increasing and injective. Then x < y implies f(x) ≤ f(y); how-
ever, by injectivity f(x) ≠ f(y). Therefore f(x) < f(y). Since invertible implies
injective (and surjective), the second claim is obvious. The proof for f decreasing
is essentially identical and we omit it (or just apply the present proof to −f ).
8.6.5 Definition. Let S ⊂ R. For any c ∈ R we define the initial segment
S<c = {x ∈ S | x < c} and the final segment S>c = {x ∈ S | x > c}.
(Recall Equation (8.1).)
8.6.6 Proposition. Let f : S → R be a monotonic function defined on a dense
subset S of a nonempty open interval I ⊂ R. For each c ∈ I,
• If f is increasing:
sup_{x∈S<c} f(x) ≤ inf_{x∈S>c} f(x).
• If f is decreasing:
sup_{x∈S>c} f(x) ≤ inf_{x∈S<c} f(x).
• If f is increasing and f(c) is defined:
sup_{x∈S<c} f(x) ≤ f(c) ≤ inf_{x∈S>c} f(x).
• If f is decreasing and f(c) is defined:
sup_{x∈S>c} f(x) ≤ f(c) ≤ inf_{x∈S<c} f(x).
We stress that the conclusion holds for all c ∈ I.

Proof. Since c ∈ I, I is an open interval, and S is dense in I, both sets S<c and
S>c are nonempty.
For all x ∈ S<c and y ∈ S>c , if f is increasing (resp., for all y ∈ S<c and
x ∈ S>c , if f is decreasing) we have f (x) ≤ f (y). The existence of the relevant
suprema/infima follows by completeness in all cases; moreover, the first two state-
ments are obvious. If additionally f (c) is defined, then f (x) ≤ f (c) ≤ f (y). The
last two assertions thus follow as well.
8.6.7 Proposition. Let I ⊂ R be a nonempty open interval, S be a dense subset
of I, and f : S → R be a monotonic function on S. For each dense subset T ⊂ S
and each c ∈ I,
• sup_{x∈T<c} f(x) = sup_{x∈S<c} f(x) and inf_{x∈T>c} f(x) = inf_{x∈S>c} f(x), if f is
increasing, and
• sup_{x∈T>c} f(x) = sup_{x∈S>c} f(x) and inf_{x∈T<c} f(x) = inf_{x∈S<c} f(x), if f is
decreasing.
Note that the conclusions hold for all c ∈ I, not merely for all c ∈ S.
Proof. Since T is dense in S, it is also dense in I, so T>c and T<c are also nonempty.
Let f be increasing (resp., decreasing). Since T ⊂ S, we have T>c ⊂ S>c and T<c ⊂ S<c,
hence sup_{x∈T<c} f(x) ≤ sup_{x∈S<c} f(x) and inf_{x∈T>c} f(x) ≥ inf_{x∈S>c} f(x)
(resp., sup_{x∈T>c} f(x) ≤ sup_{x∈S>c} f(x) and inf_{x∈T<c} f(x) ≥ inf_{x∈S<c} f(x)).
The reverse inequalities follow from the monotonicity of f,
noting that for all x, y ∈ S such that x < c < y, the density of T implies that there
exist x′, y′ ∈ T with x < x′ < c < y′ < y.
8.6.8 Definition. No-Gaps Condition. Let I ⊂ R be a nonempty open inter-
val, S a dense subset of I, and f : S → R a monotonic function on S. We say that
f satisfies the No-Gaps condition on I if, for every c ∈ I:
• If f is increasing:
sup_{x∈S<c} f(x) = inf_{x∈S>c} f(x).
• If f is decreasing:
sup_{x∈S>c} f(x) = inf_{x∈S<c} f(x).
(Note that, in view of Proposition 8.6.6, the equalities above can both be relaxed
to inequalities ≥.)
8.6.9 Remarks. • By Proposition 8.6.6, if f is defined at c then the common
value of the supremum and infimum in Definition 8.6.8 is f (c).
• By Proposition 8.6.7, the suprema/infima in Definition 8.6.8 can be taken
over segments of S or of any dense subset T ⊂ S without changing their
value.
• The denomination “No-Gaps” is ad hoc terminology to be used in these Notes.
The No-Gaps condition for monotonic functions whose domain is an open
interval is equivalent to continuity, which is a more general concept applicable
to functions not necessarily monotonic and the main topic of Note 11. The
No-Gaps condition can be seen as a prelude to continuity.
8.6.10 Proposition. Let $p, q$ be real numbers. Let $f : \mathbb R \to \mathbb R$ be defined by $f(t) = pt + q$ (this is called a linear function or, more properly, an affine function (or affine mapping) from $\mathbb R$ to $\mathbb R$). Then $f$ is monotonic (increasing if $p \ge 0$, decreasing if $p \le 0$) and satisfies the No-Gaps condition on $\mathbb R$. For $p \neq 0$, the monotonicity is strict and $f$ is bijective.
Proof. The statement is obvious if $p = 0$, for then $f$ is the constant function $f(t) = q$. For $p \neq 0$ it is very easy to check that $f^{-1}(t) = p^{-1}t - p^{-1}q$ (incidentally, also a linear function), so $f$ is bijective.
Henceforth we will assume that $p > 0$, leaving $p < 0$ to the reader.
$f$ is strictly increasing, since $x < y$ implies $0 < p(y - x) = (py + q) - (px + q) = f(y) - f(x)$.
Let $c \in \mathbb R$ and $\epsilon > 0$. Choose $x < c < y$ such that $y - x < \epsilon/p$. Then $f(y) - f(x) = (py + q) - (px + q) = p(y - x) < \epsilon$. Since $\inf_{t>c} f(t) \le f(y)$ and $\sup_{t<c} f(t) \ge f(x)$, it follows that $\inf_{t>c} f(t) - \sup_{t<c} f(t) < \epsilon$. Since this is true for all $\epsilon > 0$, we conclude that $\inf_{t>c} f(t) - \sup_{t<c} f(t) \le 0$, which (by the note following Definition 8.6.8) proves that $f$ satisfies the No-Gaps condition.
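The proof only requires $x < c < y$ with $y - x < \epsilon/p$. The Python sketch below makes one concrete choice using exact rationals. The helper name and the symmetric choice $\delta = \epsilon/(4p)$ are assumptions of this illustration; any pair meeting the constraint would do.

```python
from fractions import Fraction

def affine_no_gaps_witness(p, q, c, eps):
    """Return rationals x < c < y with f(y) - f(x) < eps for f(t) = p*t + q, p > 0.

    Any x < c < y with y - x < eps/p works (Proposition 8.6.10); this sketch
    picks x = c - delta, y = c + delta with delta = eps / (4p).
    """
    p, q, c, eps = map(Fraction, (p, q, c, eps))
    if p <= 0 or eps <= 0:
        raise ValueError("this sketch assumes p > 0 and eps > 0")
    delta = eps / (4 * p)            # so y - x = eps / (2p) < eps / p
    return c - delta, c + delta

x, y = affine_no_gaps_witness(3, -1, Fraction(1, 2), Fraction(1, 100))
```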
8.6.11 Proposition. Let $f$ be a monotonic function defined on a dense subset $S$ of a nonempty open interval $I$. Assume that $f$ has an inverse $f^{-1}$ defined on a dense subset $T$ of a nonempty open interval $J$ (that is, $f$ maps $S$ bijectively onto $T$). Then $f$ satisfies the No-Gaps condition on $I$.
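Proof. Let $c \in I$. We assume that $f$ is increasing, leaving the trivial modifications needed to handle the decreasing case to the reader. Let $T_+ = f_*(S_{>c})$ and $T_- = f_*(S_{<c})$. By bijectivity and monotonicity it follows immediately that (i) $x \in T_-$ and $y \in T_+$ imply $x < y$, and (ii) $T_- \cup T_+$ is equal to $T$ (if $c \notin \operatorname{dom}(f)$) or to $T \setminus \{f(c)\}$ (if $c \in \operatorname{dom}(f)$).
Let $\alpha = \sup T_-$, $\beta = \inf T_+$ and $T^* = T_- \cup T_+$. Both $T_-$ and $T_+$ are nonempty (because $S$ is dense in the open interval $I$), so by (i) the numbers $\alpha$ and $\beta$ are real and lie between points of $T_-$ and points of $T_+$. Since $J$ is an interval, $\alpha, \beta \in J$. From (i), $\alpha \le \beta$. From (ii), $T^*$ is dense in $J$ (since $T$ is, and removing a single point does not affect density). The interval $(\alpha, \beta) \subset J$ contains no points of $T^*$, so by the density of $T^*$ it must be empty. Therefore $\alpha = \beta$, that is, $\sup_{x\in S_{<c}} f(x) = \inf_{x\in S_{>c}} f(x)$, concluding the proof.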
8.6.12 Proposition (Intermediate Value Theorem: Monotone Case). Let
f be a monotonic function defined on a nonempty open interval I. Assume that f
satisfies the No-Gaps condition.
Let $\alpha = \inf_{x\in I} f(x)$ (or $\alpha = -\infty$ if $f$ is not bounded below on $I$) and $\beta = \sup_{x\in I} f(x)$ (or $\beta = +\infty$ if $f$ is not bounded above on $I$). Then the range of $f$ is an interval with endpoints $\alpha, \beta$.
(Note that α, β themselves may or may not be in the range.)
Proof. We will present the details of the proof only in the case when f is increasing,
the decreasing case being analogous (or obtained upon replacing f by −f ). Let
J = Im(f ).
By definition, $\alpha \le f(x)$ (if $\alpha \neq -\infty$) and $f(x) \le \beta$ (if $\beta \neq \infty$) for all $x \in I$. If both $\alpha, \beta$ are real, we must have $\alpha \le \beta$. The result is trivial if $\alpha = \beta$, so henceforth we will assume that either $\alpha < \beta$, or $\alpha = -\infty$, or $\beta = \infty$. We will show below that $J \supset (\alpha, \beta)$, whence the conclusion that $J$ is an interval with endpoints $\alpha, \beta$ will follow immediately (note that $\alpha, \beta$ themselves may or may not be in $J$, but that does not change the conclusion that the latter set is an interval).
Let w ∈ (α, β) and
S = {x ∈ I | f (x) ≤ w},
T = {x ∈ I | f (x) > w}.
The monotonicity of $f$ implies that $S$ is an initial segment and $T$ a final segment of $I$, and also that elements of $S$ are lower bounds for $T$, and elements of $T$ are upper bounds for $S$. $S$ is nonempty, because $w > \alpha$ implies that $w$ is not a lower bound for $f$ on $I$ (and no $w$ is a lower bound if $\alpha = -\infty$). Similarly, $T \neq \emptyset$. Let $c = \sup S$ and $d = \inf T$. Clearly $c \le d$ and $\sup f_*(S) \le w \le \inf f_*(T)$. By Proposition 8.2.12, $S$ is an interval ending at $c$ and $T$ an interval starting at $d$ (note that $c$ may or may not belong to $S$, and $d$ may or may not belong to $T$). However, $S \cup T = I$ is clear from the definitions, so we must have $c = d$ (otherwise the points strictly between $c$ and $d$ would belong to $I$ but to neither $S$ nor $T$). Now, in the notation of the proof of Proposition 8.6.6, $S$ contains the initial segment $I_{<c}$ and $T$ the final segment $I_{>c}$ of $I$, so $\sup f_*(I_{<c}) \le \sup f_*(S)$ and $\inf f_*(T) \le \inf f_*(I_{>c})$. By Proposition 8.6.6 and the present hypothesis we get $\sup f_*(I_{<c}) = f(c) = \inf f_*(I_{>c})$.
Putting everything together we get:
$$w \le \inf f_*(T) \le \inf f_*(I_{>c}) = f(c) = \sup f_*(I_{<c}) \le \sup f_*(S) \le w.$$
Therefore all of the above are equalities and w = f (c) showing that w ∈ J. Since
w ∈ (α, β) was arbitrary, this shows that (α, β) ⊂ J and concludes the proof.
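In the proof, the preimage of $w$ is found as $c = \sup S$, where $S = \{x \mid f(x) \le w\}$. Bisection approximates this supremum by keeping one endpoint in $S$ and the other in $T$. The Python sketch below assumes floating-point arithmetic, a bracketing interval with $f(\mathrm{lo}) \le w < f(\mathrm{hi})$, and the hypothetical name `solve_increasing`. It gives a numerical approximation, not an exact root.

```python
def solve_increasing(f, w, lo, hi, iterations=60):
    """Approximate c with f(c) = w for an increasing f with no gaps on (lo, hi).

    Assumes f(lo) <= w < f(hi). As in the proof, c = sup S with
    S = {x : f(x) <= w}; the loop keeps lo in S and hi in T = {x : f(x) > w}.
    """
    if not (f(lo) <= w < f(hi)):
        raise ValueError("need f(lo) <= w < f(hi)")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) <= w:
            lo = mid        # mid belongs to S
        else:
            hi = mid        # mid belongs to T
    return lo               # lo <= c <= hi, with hi - lo very small

c = solve_increasing(lambda x: x**3 + x, 10, 0.0, 3.0)   # exact solution is 2
```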
8.6.13 Proposition. Let f be a strictly monotonic function defined on a nonempty
open interval I. Assume that f satisfies the No-Gaps condition. Then:
• The range of $f$ is an open interval $J$ (with endpoints given in the statement of Proposition 8.6.12);
• $f$ has a strictly monotonic inverse $f^{-1} : J \to I$ (increasing or decreasing in agreement with $f$); and
• $f^{-1}$ satisfies the No-Gaps condition.

Proof. By Proposition 8.6.12 the range of $f$ is an interval $J$. If $J$ has one or two real endpoints we need to show they do not belong to $J$. We write the proof for strictly increasing $f$ only.
Say $J$ has a left endpoint $\alpha = \inf_{x\in I} f(x)$. If $\alpha \in J$ then $\alpha = f(a)$ for some $a \in I$. Since $I$ is open, there exists $b \in I$ with $b < a$. By strict monotonicity, $f(b) < f(a)$. Since $\alpha$ is the left endpoint of $J$, $f(b) \ge \alpha = f(a)$. Contradiction. The proof that $J$ does not contain its right endpoint is completely analogous. Therefore $J$ is an open interval.
By Proposition 8.6.3, $f$ is both injective and surjective from $I$ to $J$, hence has an inverse $f^{-1} : J \to I$.
Let $u, v \in J$ with $u < v$. Then $u = f(x)$ and $v = f(y)$ for some $x, y \in I$ (namely, for $x = f^{-1}(u)$ and $y = f^{-1}(v)$). If $y \le x$ then the monotonicity of $f$ implies that $v = f(y) \le f(x) = u$, contradicting $u < v$. Hence $f^{-1}(u) = x < y = f^{-1}(v)$ and $f^{-1}$ is strictly increasing.
Finally, let $\gamma \in J$ and $c = f^{-1}(\gamma)$. Since both $f$ and $f^{-1}$ are strictly increasing bijections, $I_{<c} = (f^{-1})_*(J_{<\gamma})$ and $I_{>c} = (f^{-1})_*(J_{>\gamma})$. Since $I$ is open, $\inf I_{>c} = c = \sup I_{<c}$. Therefore
$$\inf_{y>\gamma} f^{-1}(y) = \inf (f^{-1})_*(J_{>\gamma}) = \inf I_{>c} = c = \sup I_{<c} = \sup (f^{-1})_*(J_{<\gamma}) = \sup_{y<\gamma} f^{-1}(y),$$
so $f^{-1}$ satisfies the No-Gaps condition.
8.7 Powers, Roots, Exponentials and Logarithms

8.7.1 Proposition. Existence and Uniqueness of Roots. For each positive integer $n$, the function $f : \mathbb R^+ \to \mathbb R^+$ defined by $f(x) = x^n$ has an inverse $f^{-1} : \mathbb R^+ \to \mathbb R^+$.
The function $f^{-1}$ is strictly increasing and satisfies the No-Gaps condition.
8.7.2 Remark. For any $y \in \mathbb R^+$ the unique solution to $x^n = y$, namely $x = f^{-1}(y)$, is denoted by $x = y^{1/n}$ or $x = \sqrt[n]{y}$.
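Because $x \mapsto x^n$ is strictly increasing, $y^{1/n}$ is also the supremum of $\{x > 0 \mid x^n \le y\}$. The Python sketch below traps that supremum between two rationals using only exact rational arithmetic, in the spirit of the constructions in these Notes. The function name and the number of bisection steps are illustrative assumptions.

```python
from fractions import Fraction

def nth_root_bounds(y, n, steps=40):
    """Return rationals (lo, hi) with lo**n <= y < hi**n, bracketing y**(1/n).

    Bisection halves the bracket `steps` times. Only exact rational
    arithmetic is used, so the bracket property holds exactly.
    """
    y = Fraction(y)
    if y <= 0 or n < 1:
        raise ValueError("need y > 0 and n >= 1")
    lo = Fraction(0)                 # 0**n = 0 <= y
    hi = max(Fraction(1), y) + 1     # hi > max(1, y), hence hi**n >= hi > y
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid ** n <= y:
            lo = mid
        else:
            hi = mid
    return lo, hi

lo2, hi2 = nth_root_bounds(2, 2)     # brackets the square root of 2
lo3, hi3 = nth_root_bounds(27, 3)    # brackets the cube root of 27, i.e. 3
```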
The proof of Proposition 8.7.1 requires several preliminary steps.
8.7.3 Proposition. For any real $b > 1$ and any $n \in \mathbb N^+$:
$$b^n > 1.$$
Proof. This is an easy induction on $n$, the $n = 1$ case being the tautology $b > 1 \Rightarrow b > 1$. Assume $b^n > 1$ for some $n \in \mathbb N^+$. Multiplying both sides by the positive real number $b$ we obtain $b^{n+1} > b$. By the hypothesis $b > 1$ and transitivity we obtain $b^{n+1} > 1$.
8.7.4 Proposition. The function $f : \mathbb R^+ \to \mathbb R^+$ defined by $f(x) = x^{-1}$ is strictly decreasing.
Proof. Multiply both sides of $x < y$ by $y^{-1}x^{-1}$ to get $y^{-1} < x^{-1}$.
8.7.5 Proposition. For any positive (resp., negative) integer $n$ the $n$-th power function $f : \mathbb R^+ \to \mathbb R^+$ defined by $f(x) = x^n$ is strictly increasing (resp., strictly decreasing).
Proof. Let $0 < x < y$. Then $x^{-1}y > 1$. If $n > 0$, from Proposition 8.7.3, $(x^{-1}y)^n > 1$. From the ordinary laws of integer exponents valid in any Abelian group (such as the Abelian multiplicative group $\mathbb R^+$) we have $(x^{-1}y)^n = (x^{-1})^n y^n = (x^n)^{-1}y^n$; hence $(x^n)^{-1}y^n > 1$. Multiplying both sides of this inequality by the positive number $x^n$ we obtain $x^n < y^n$ as desired.
For $n < 0$ we have $x^n = (x^{-n})^{-1}$ with $-n > 0$, so the result follows from Proposition 8.6.2 applied to the strictly increasing function $x \mapsto x^{-n}$ (by the case just proved) and the function $g : x \mapsto x^{-1}$, which is strictly decreasing by Proposition 8.7.4.
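8.7.6 Proposition. For any real $b > 1$ (respectively, $0 < b < 1$), the function $\mathbb Z \to \mathbb R^+$, $n \mapsto b^n$, is strictly increasing (respectively, strictly decreasing).
Proof. Let $m < n$ be integers, and first assume $b > 1$. By Proposition 8.7.3 applied with exponent $n - m \in \mathbb N^+$, $b^{n-m} > 1$. Multiplying both sides by the positive number $b^m$ and using the ordinary laws of exponents we get $b^n > b^m$ as desired.
Now assume $0 < b < 1$. Then $b^{m-n} = (b^{-1})^{n-m} > 1$ by Proposition 8.7.3 applied to $b^{-1} > 1$. Multiplying by the positive number $b^n$ we obtain $b^m > b^n$ as desired.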
8.7.7 Proposition. Let $a, b$ be commuting elements in a ring (that is, $ab = ba$). For any $n \in \mathbb N^+$:
$$b^n - a^n = (b - a)(b^{n-1} + b^{n-2}a + \cdots + a^{n-1}).$$
In Sigma notation the above reads
$$b^n - a^n = (b - a)\sum_{k=1}^{n} b^{n-k}a^{k-1}.$$
(Note that $\sum_{k=1}^{1} b^{1-k}a^{k-1} = 1$, the unity element of the ring.)
Proof. Easy exercise on induction and the laws of exponents safely left to the
reader.
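The commutativity hypothesis is essential. The Python sketch below checks the identity in the ring of $2\times 2$ integer matrices. It holds for a commuting (diagonal) pair, and fails already for $n = 2$ for a non-commuting pair, because $(b - a)(b + a) = b^2 + ba - ab - a^2$. The helper names are illustrative and the matrices are plain nested tuples (an assumption of this sketch).

```python
def matmul(A, B):
    """Product of 2x2 integer matrices given as nested tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def matpow(A, n):
    """A**n for n >= 0, starting from the identity (the unity of the ring)."""
    R = ((1, 0), (0, 1))
    for _ in range(n):
        R = matmul(R, A)
    return R

def sub(A, B):
    return tuple(tuple(A[i][j] - B[i][j] for j in range(2)) for i in range(2))

def add(A, B):
    return tuple(tuple(A[i][j] + B[i][j] for j in range(2)) for i in range(2))

def factored_form(b, a, n):
    """(b - a) * sum_{k=1}^{n} b^(n-k) a^(k-1), computed in the matrix ring."""
    total = ((0, 0), (0, 0))
    for k in range(1, n + 1):
        total = add(total, matmul(matpow(b, n - k), matpow(a, k - 1)))
    return matmul(sub(b, a), total)

D1, D2 = ((2, 0), (0, 3)), ((5, 0), (0, 7))   # diagonal matrices commute
A, B = ((1, 1), (0, 1)), ((1, 0), (1, 1))     # AB != BA
```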
Proof. We are ready to prove Proposition 8.7.1. We show that the function $f : \mathbb R^+ \to \mathbb R^+$, $x \mapsto x^n$, is strictly increasing, surjective, and satisfies the No-Gaps condition. By Proposition 8.6.13 and the definition of inverse function, it will follow that $x^n = f(x) = y$ has the unique solution $x = f^{-1}(y)$.
$f$ is strictly increasing by Proposition 8.7.5.
We claim that $\alpha = \inf_{x>0} f(x) = 0$. Clearly $\alpha \ge 0$. For any $\epsilon > 0$ take $0 < x < \min\{1, \epsilon\}$. By strict monotonicity of $f$ we have $f(x) < f(1) = 1$, and by Proposition 8.7.6 (applied to the base $x < 1$) we have $\alpha \le f(x) = x^n \le x < \epsilon$. Since this holds for all $\epsilon > 0$ we have $\alpha = 0$.
Now we claim that $f$ is unbounded above. For any $y > 0$ take $x > \max\{1, y\}$. By strict monotonicity of $f$ we have $f(x) > f(1) = 1$, and by Proposition 8.7.6 (applied to the base $x > 1$) we have $f(x) = x^n \ge x > y$. Since this holds for all $y > 0$, $f$ is unbounded above, so in the notation of Proposition 8.6.12, $\beta = +\infty$. By the same proposition (which applies once the No-Gaps condition is established below), $\operatorname{Ran}(f)$ is the open interval $(0, \infty)$ (the endpoint $0$ is not attained, since $x^n > 0$), so $f$ is surjective.
Finally, we show that $f$ satisfies the No-Gaps condition.
Let $c > 0$ and let $\epsilon > 0$ be arbitrary. Choose $x, y$ such that $0 < x < c < y < c + 1$ and
$$y - x < \frac{\epsilon}{n(c+1)^{n-1}}.$$
Then
$$\begin{aligned}
0 < y^n - x^n &= (y - x)\sum_{k=1}^{n} y^{n-k}x^{k-1} && \text{by Proposition 8.7.7,}\\
&\le (y - x)\sum_{k=1}^{n} y^{n-k}y^{k-1} && \text{since } y > x,\\
&= (y - x)\sum_{k=1}^{n} y^{n-1} && \text{laws of exponents,}\\
&\le (y - x)\cdot n(c+1)^{n-1} && \text{since } y < c + 1 \text{, by Proposition 8.7.5,}\\
&< \epsilon && \text{by the assumed upper bound on } y - x.
\end{aligned}$$
Then $\inf_{(c,\infty)} f - \sup_{(0,c)} f \le f(y) - f(x) = y^n - x^n < \epsilon$. Since $\epsilon > 0$ was arbitrary, we conclude that $\inf_{(c,\infty)} f \le \sup_{(0,c)} f$. This concludes the proof that $f$ satisfies the No-Gaps condition.
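The chain of inequalities can be checked on concrete exact rationals. The Python sketch below picks $x, y$ satisfying the constraints of the proof. The symmetric choice of $x, y$ and the name `power_no_gaps_witness` are assumptions of this illustration, and $c > 0$ is assumed. The test then confirms $y^n - x^n < \epsilon$ for a few values of $c$ and $n$.

```python
from fractions import Fraction

def power_no_gaps_witness(c, n, eps):
    """Pick rationals 0 < x < c < y < c + 1 with y - x < eps / (n (c+1)^(n-1)).

    Assumes c > 0 and eps > 0. Any choice meeting these constraints works by
    the proof above; this sketch uses x = c - delta, y = c + delta.
    """
    c, eps = Fraction(c), Fraction(eps)
    bound = eps / (n * (c + 1) ** (n - 1))
    delta = min(bound / 4, c / 2, Fraction(1, 2))   # keeps 0 < x and y < c + 1
    return c - delta, c + delta                     # y - x = 2*delta <= bound/2
```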
8.7.8 Corollary. For positive real numbers $a$ and $b$ and positive integer $n$, $(ab)^{1/n} = a^{1/n}b^{1/n}$.
Proof. (The reader is invited to provide a formal proof by induction based on the following informal argument.)
Define $c = a^{1/n}$ and $d = b^{1/n}$. We have, by commutativity,
$$ab = c^n d^n = \underbrace{c\cdots c}_{n}\,\underbrace{d\cdots d}_{n} = \underbrace{(cd)\cdots(cd)}_{n} = (cd)^n.$$
Since the $n$th root is unique,
$$(ab)^{1/n} = cd = a^{1/n}b^{1/n}.$$
8.7.9 Corollary. For positive real $a$ and positive integers $m$ and $n$, we have $(a^m)^{1/n} = (a^{1/n})^m$.
Proof. (The reader is invited to provide a formal proof by induction based on the following informal argument.)
Thanks to Corollary 8.7.8, we have
$$(a^m)^{1/n} = (\underbrace{a\cdots a}_{m})^{1/n} = \underbrace{a^{1/n}\cdots a^{1/n}}_{m} = (a^{1/n})^m.$$
8.7.10 Remark. For positive real $b$ and rational $r = m/n > 0$ it is tempting to define $b^r = (b^m)^{1/n}$. Before we do so we must verify that for $m/n = p/q$ we have $(b^m)^{1/n} = (b^p)^{1/q}$, i.e., that the definition of $b^r$ is independent of the representative we use for $r$. This we do below.
8.7.11 Corollary. If $b$ is a positive real; $m, n, p, q$ are positive integers; and $m/n = p/q$; then
$$(b^m)^{1/n} = (b^p)^{1/q}.$$
Proof. If $m/n = p/q$ then $mq = pn$. Hence, $b^{mq} = b^{pn}$, giving
$$(b^m)^q = (b^p)^n. \tag{8.4}$$
The functions defined by $a \mapsto a^k$ and $a \mapsto a^{1/k}$ are inverses of each other, i.e., for real positive $a$ and positive integer $k$
$$(a^k)^{1/k} = (a^{1/k})^k = a. \tag{8.5}$$
From (8.4) and (8.5) we obtain
$$\big(((b^m)^{1/n})^n\big)^q = (b^m)^q = (b^p)^n = \big(((b^p)^{1/q})^q\big)^n.$$
Hence,
$$((b^m)^{1/n})^{nq} = ((b^p)^{1/q})^{qn}.$$
Therefore, by the uniqueness of $nq$-th roots, the above yields $(b^m)^{1/n} = (b^p)^{1/q}$.
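Corollary 8.7.11 can be seen exactly on perfect powers. With $b = 64$ and the representatives $2/3 = 4/6 = 6/9$, the roots are integers, so no rounding is involved. The Python sketch below uses an integer binary search, which relies on the strict monotonicity of $t \mapsto t^n$. The helper names are hypothetical.

```python
def integer_root(a, n):
    """Return the integer r >= 0 with r**n == a, or None if there is none.

    Binary search over integers; correct because t -> t**n is strictly
    increasing on the nonnegative integers (n >= 1, a >= 1 assumed).
    """
    lo, hi = 0, max(1, a)
    while lo <= hi:
        mid = (lo + hi) // 2
        p = mid ** n
        if p == a:
            return mid
        if p < a:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

def rational_power_exact(b, m, n):
    """(b**m)**(1/n) for positive integers b, m, n, when it is an integer."""
    return integer_root(b ** m, n)

# 2/3 = 4/6 = 6/9: the value of 64**(m/n) does not depend on the representative.
values = [rational_power_exact(64, m, n) for m, n in ((2, 3), (4, 6), (6, 9))]
```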
8.7.12 Definition. In view of the above result, for any $x > 0$ and for $m, n > 0$ we define $x^{m/n} = (x^m)^{1/n}$, noting that this definition does not depend on the choice of $m$ and $n$, but only on the value $r = m/n$. This defines $x^r$ for all rationals $r > 0$. For rationals $r < 0$, we define $x^r = (x^{-r})^{-1} = 1/x^{-r}$. Finally, we define $x^0 = 1$ for all $x > 0$.
8.7.13 Proposition. For each rational number $r > 0$ (resp., $r < 0$) the $r$th-power function $f : \mathbb R^+ \to \mathbb R^+$, $x \mapsto x^r$, is strictly increasing (resp., strictly decreasing).
Proof. Write $r = m/n$ for some nonzero integers $m, n$ with $n > 0$. Then, by definition, $f = h \circ g$ where $g(x) = x^m$ and $h(y) = y^{1/n}$. By Proposition 8.7.1, $h$ is strictly increasing, whereas by Proposition 8.7.5, $g$ is strictly increasing if $m > 0$ (resp., strictly decreasing if $m < 0$). The assertion follows from Proposition 8.6.2.
8.7.14 Proposition. For positive real $a, b$ and rational $r$ and $s$, $a^r b^r = (ab)^r$, $b^{r+s} = b^r b^s$ and $b^{rs} = (b^r)^s$.
Proof. Write $r = m/n$ and $s = p/q$ with integers $m, n, p, q$ and $n, q$ positive. We have
$$\begin{aligned}
(b^{r+s})^{nq} &= (b^{m/n + p/q})^{nq}\\
&= (b^{(mq+np)/(nq)})^{nq}\\
&= b^{mq+np}\\
&= b^{mq}b^{np}\\
&= (b^{m/n})^{nq}(b^{p/q})^{nq}\\
&= (b^{m/n}b^{p/q})^{nq}\\
&= (b^r b^s)^{nq}.
\end{aligned}$$
From the above and the uniqueness of roots, we obtain $b^{r+s} = b^r b^s$.
Similarly,
$$\begin{aligned}
(b^{rs})^{nq} &= (b^{(m/n)\cdot(p/q)})^{nq}\\
&= (b^{(mp)/(nq)})^{nq}\\
&= b^{mp}\\
&= (b^m)^p\\
&= ((b^{m/n})^n)^p\\
&= (b^{m/n})^{pn}\\
&= ((b^{m/n})^p)^n\\
&= (((b^{m/n})^{p/q})^q)^n\\
&= ((b^{m/n})^{p/q})^{nq}\\
&= ((b^r)^s)^{nq}.
\end{aligned}$$
From the uniqueness of roots, we obtain $b^{rs} = (b^r)^s$.
Finally,
$$\begin{aligned}
(a^r b^r)^n &= (a^r)^n (b^r)^n\\
&= a^{nr}b^{nr}\\
&= a^m b^m\\
&= (ab)^m\\
&= ((ab)^{m/n})^n\\
&= ((ab)^r)^n.
\end{aligned}$$
From the uniqueness of roots, we obtain $a^r b^r = (ab)^r$.

We wish to extend the definition of the exponential functions $x \mapsto b^x$ to arbitrary real exponents $x$. This is accomplished by proving monotonicity and the No-Gaps condition together with a general result extending functions satisfying the No-Gaps condition from a dense subset of an interval to the whole interval. First we prove strict monotonicity.‡
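8.7.15 Proposition. Let $b > 1$ (resp., $0 < b < 1$). Then the function $f : \mathbb Q \to \mathbb R^+$ defined by $f(x) = b^x$ is strictly increasing (resp., strictly decreasing).
Proof. Assume first that $b > 1$. Let $x, y \in \mathbb Q$, $x < y$. Write $x = m/n$, $y = p/q$ for integers $m, n, p, q$ with $n, q > 0$. Then $mq < np$. By Proposition 8.7.6, $b^{mq} < b^{np}$. By the strict monotonicity of $t \mapsto t^{1/(nq)}$ (Proposition 8.7.1) and the laws of exponents (Proposition 8.7.14):
$$b^x = b^{m/n} = b^{mq/(nq)} = (b^{mq})^{1/(nq)} < (b^{np})^{1/(nq)} = b^{np/(nq)} = b^{p/q} = b^y.$$
The result for $0 < b < 1$ follows similarly, or can be derived from the above using the identity $b^x = (b^{-1})^{-x}$.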
As a first step towards proving that f satisfies the No-Gaps condition, we prove
a very useful inequality.
8.7.16 Proposition. Bernoulli’s Inequality: If $t \ge -1$ and $n \in \mathbb N$ then
$$(1 + t)^n \ge 1 + nt. \tag{8.6}$$
(For $t = -1$ and $n = 0$ we interpret $0^0$ as $1$.)
Equality holds if and only if $t = 0$ or $n = 0, 1$.
Proof. We prove the inequality by induction on $n$, for any fixed $t \ge -1$. For $n = 0$ we have $(1 + t)^0 = 1 = 1 + 0t$. Suppose that (8.6) holds for some $n \in \mathbb N$. Since $1 + t \ge 0$, we have
$$\begin{aligned}
(1 + t)^{n+1} &= (1 + t)^n(1 + t)\\
&\ge (1 + nt)(1 + t)\\
&= 1 + (n + 1)t + nt^2\\
&\ge 1 + (n + 1)t,
\end{aligned}$$
which is (8.6) with $n$ replaced by $n + 1$. If both $n \ge 1$ and $t \neq 0$, the last inequality is strict. Hence, we have shown that (8.6) holds for all $n \in \mathbb N$, with strict inequality if $t \neq 0$ and $n > 1$. Since equality clearly holds when $t = 0$ or $n = 0, 1$, the proof is complete.
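Both the inequality and its case of equality can be spot-checked exactly on a grid of rational $t \ge -1$. Python's `Fraction(0) ** 0` evaluates to $1$, which matches the convention $0^0 = 1$ used above. The function name is illustrative, and a finite check is of course no substitute for the induction.

```python
from fractions import Fraction

def bernoulli_holds(t, n):
    """Return (inequality holds, equality holds) for (1 + t)**n vs 1 + n*t."""
    t = Fraction(t)
    lhs, rhs = (1 + t) ** n, 1 + n * t
    return lhs >= rhs, lhs == rhs

# Grid: t = -1, -3/4, ..., 3 and n = 0, ..., 7.
results = {(Fraction(k, 4), n): bernoulli_holds(Fraction(k, 4), n)
           for k in range(-4, 13) for n in range(8)}
```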
8.7.17 Proposition. Let $b$ be a positive real number, $b \neq 1$. Let $f : \mathbb Q \to \mathbb R^+$ be defined by $f(x) = b^x$. Then $f$ satisfies the No-Gaps condition on $\mathbb R$.
‡ Note that Proposition 8.7.15 generalizes Proposition 8.7.6.
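Proof. It follows from the identity $b^x = (b^{-1})^{-x}$ that the statement for $0 < b < 1$ is equivalent to the statement for $b > 1$, so we assume henceforth that the latter holds.
From Bernoulli’s Inequality (8.6) with $t = (b - 1)/n$, $\big(1 + \frac{b-1}{n}\big)^n \ge b$, so by the strict monotonicity of $n$-th roots:
$$1 + \frac{b-1}{n} \ge b^{1/n}. \tag{8.7}$$
Let $c \in \mathbb R$ be arbitrary, and fix a rational number $r > c$ (so that only rational exponents are involved below). For any $\epsilon > 0$, choose $n \in \mathbb N^+$ such that $n > \epsilon^{-1}b^{r+1}$. Now choose rational numbers $x, y$ such that $x < c < y < r$ and $y - x < 1/n$. Then $b^{y-x} < b^{1/n}$ and $b^x < b^r$ by strict monotonicity (Proposition 8.7.15). Now,
$$\begin{aligned}
b^y - b^x &= b^x(b^{y-x} - 1)\\
&< b^r(b^{1/n} - 1)\\
&\le b^r\Big(\frac{b-1}{n}\Big) && \text{by (8.7)}\\
&< \frac{b^{r+1}}{n}\\
&< \epsilon.
\end{aligned}$$
Since $\epsilon > 0$ was arbitrary, this shows that $\inf_{y>c} b^y - \sup_{x<c} b^x \le 0$, so $f$ satisfies the No-Gaps condition.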
We are ready to extend the definition of bx to irrational exponents x via the
following general result about extension of monotonic functions satisfying the No-
Gaps condition.
8.7.18 Proposition. Let $I \subset \mathbb R$ be a nonempty open interval, $S$ a dense subset of $I$, and $f : S \to \mathbb R$ a monotonic function satisfying the No-Gaps condition (on $I$). Then the function $\bar f : I \to \mathbb R$ defined by (with $x$ ranging over $S$)
$$\bar f(c) = \begin{cases} \sup_{x<c} f(x) = \inf_{x>c} f(x) & \text{if } f \text{ is increasing},\\ \inf_{x<c} f(x) = \sup_{x>c} f(x) & \text{if } f \text{ is decreasing} \end{cases}$$
extends $f$ to a monotonic function satisfying the No-Gaps condition. Moreover, $\bar f$ is increasing (resp., decreasing) if $f$ is (and strictly so if $f$ is).
8.7.19 Remark. In Proposition 8.7.18 above, it is essential that $I$ be an open interval. The function $x \mapsto 1/x$ is monotonically decreasing on $S = (0, +\infty)$, but does not extend to $I = [0, +\infty)$ (since $\sup_{x>0} 1/x = +\infty$), although $S$ is dense in $I$.
Proof. We will assume that $f$ is increasing, leaving the formal modifications needed to treat the case of decreasing $f$ to the reader.
The function $\bar f$ is well defined and extends $f$ by Proposition 8.6.6 and Definition 8.6.8.
For any $x, z \in I$ with $x < z$ the density of $S$ allows choosing $y \in S$ such that $x < y < z$. Then monotonicity of $f$ implies that $\bar f(x) = \inf f_*(S_{>x}) \le f(y) \le \sup f_*(S_{<z}) = \bar f(z)$ (with inequalities strict if $f$ is strictly monotonic), so $\bar f$ is monotonically increasing (and strictly so if $f$ is).
To show that $\bar f$ satisfies the No-Gaps condition, it suffices to observe that, for each $c \in I$, since $\bar f$ extends $f$ and $S_{<c} \subset I_{<c}$, we have $\sup f_*(S_{<c}) \le \sup \bar f_*(I_{<c})$; similarly $\inf f_*(S_{>c}) \ge \inf \bar f_*(I_{>c})$. By the No-Gaps condition for $f$ it follows that $\sup \bar f_*(I_{<c}) \ge \inf \bar f_*(I_{>c})$, proving the No-Gaps condition for $\bar f$.
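For $b > 1$, Proposition 8.7.18 defines $b^x$ at a real $x$ as $\sup\{b^q \mid q \in \mathbb Q,\ q < x\}$, and by Proposition 8.6.7 the dyadic rationals suffice. The Python sketch below makes several simplifying assumptions, all approximations. It uses only the dyadic rationals $k/2^{\text{level}}$ at one fixed level, takes the largest such $q$ below $x$ (where the increasing map $q \mapsto b^q$ attains the finite supremum), and evaluates $b^q$ in floating point. It illustrates the definition numerically and is not an exact construction. The function name is hypothetical.

```python
import math
from fractions import Fraction

def exp_via_sup(b, x, level=30):
    """Approximate b**x (b > 1, real x) by sup{b**q : q = k / 2**level < x}.

    On this finite set of dyadic rationals the sup of the increasing map
    q -> b**q is attained at the largest q below x; b**q is then evaluated
    in floating point.
    """
    if b <= 1:
        raise ValueError("this sketch assumes b > 1")
    scale = 2 ** level
    k = math.ceil(x * scale) - 1      # largest integer k with k / scale < x
    q = Fraction(k, scale)
    return b ** float(q)

approx = exp_via_sup(2, math.sqrt(2))   # compare with 2 ** sqrt(2)
```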
8.7.20 Definition. Combining Propositions 8.7.15, 8.7.17, 8.7.18, and 8.6.13 we immediately obtain, for each fixed positive real number $b \neq 1$ (called the base), functions $\exp_b : \mathbb R \to \mathbb R^+$ (the exponential function base $b$) and $\log_b : \mathbb R^+ \to \mathbb R$ (the logarithm function base $b$) satisfying the following properties:
• $\exp_b$ and $\log_b$ are mutually inverse, strictly monotonic functions (increasing, for $b > 1$; decreasing, for $0 < b < 1$) satisfying the No-Gaps condition, and
• $\exp_b(x) = b^x$ for all $x \in \mathbb Q$.
Henceforth we use the notation $b^x$ as an alternative to $\exp_b(x)$. (By Proposition 8.7.18, $\exp_b(x) = b^x$ for all $x \in \mathbb Q$, so no confusion should arise.)
We also define $1^x = 1$ for all $x \in \mathbb R$.
Note that, since $\log_b$ is inverse to $\exp_b$, the only solution $x$ to the equation $y = b^x$ is $x = \log_b(y)$, and conversely.
8.7.21 Proposition (Laws of real exponents). For positive real $a, b$ and all real $x, y$ we have
$b^{x+y} = b^x b^y$,
$(b^x)^y = b^{xy}$, and
$(ab)^x = a^x b^x$.
Proof. All of the above statements are obvious if $a = 1$, $b = 1$, $x = 0$, or $y = 0$, so we assume henceforth $a, b \neq 1$ and $x, y \neq 0$.
We prove the first identity above for $b > 1$, the case $0 < b < 1$ following similarly.
For now, let $S = \mathbb Q$ and let $f(q) = b^q$ for $q \in S$. Let also $x, y \in \mathbb R$ be arbitrary. By definition of the exponential function and monotonicity (Proposition 8.6.7), $b^x = \sup f_*(S_{x-})$ and $b^y = \sup f_*(S_{y-})$. By Proposition 8.7.14, for all $u \in S_{x-}$, $v \in S_{y-}$ we have $u + v \in S_{(x+y)-}$ and $b^u b^v = b^{u+v}$. By Exercise 8.8, the supremum of all such $u + v$ is $x + y$. By Exercise 8.14, the supremum of all $b^u b^v$ is $b^x b^y$. Therefore, $b^x b^y = b^{x+y}$.
(This is one of the few instances we will meet in which an identity is proved in one step; however, note that we needed two auxiliary propositions: one for sums, another one for products. With the help of just one of those propositions, we would have proved an inequality in one direction, and the reverse direction could have been obtained using infima instead of suprema. We invite the reader to ponder that, in the absence of the auxiliary propositions of Exercises 8.8 and 8.14, no inequality could be proved.)
Now we prove the second identity assuming $b > 1$ and $x, y > 0$. Let $S = \mathbb Q^+$. Let $x, y \in \mathbb R^+$, $u \in S_{x-}$ and $v \in S_{y-}$. By the laws of exponents (Proposition 8.7.14), $b^{uv} = (b^u)^v$. We clearly have $\sup_{v\in S_{y-}}\sup_{u\in S_{x-}}(uv) = \sup_{v\in S_{y-}}(xv) = xy$. Since the set of such products $uv$ (resp., products $xv$) is clearly dense in $(0, xv)$ (resp., dense in $(0, xy)$), the monotonicity of $\exp_b$ (Proposition 8.6.7) implies that
$$\sup_{v\in S_{y-}}\Big(\sup_{u\in S_{x-}} b^{uv}\Big) = \sup_{v\in S_{y-}} b^{xv} = b^{xy}.$$
On the other hand, by the definition of the exponential function $\exp_b$ we have $\sup_{u\in S_{x-}} b^u = b^x$, so by the monotonicity of the $v$th-power function (Proposition 8.7.13), we have $\sup_{u\in S_{x-}}(b^u)^v \le (b^x)^v$, by Proposition 8.6.6. By the same Proposition and the obvious observation $b^x > b^u > 1$, the monotonicity of $\exp_{b^x}$ implies that $\sup_{v\in S_{y-}}(b^x)^v = (b^x)^y$. Altogether we obtain
$$\sup_{v\in S_{y-}}\sup_{u\in S_{x-}}(b^u)^v \le \sup_{v\in S_{y-}}(b^x)^v = (b^x)^y.$$
Combining the results of the two paragraphs above, we obtain $b^{xy} \le (b^x)^y$. The reverse inequality follows by working with infima instead. This completes the proof of the second identity in the case $b > 1$ and $x, y > 0$. The case $0 < b < 1$ is completely analogous.
At this point we note that the special case $y = -1$ of the second identity, namely the identity $b^{-x} = (b^x)^{-1}$, actually follows from the first, already proved identity by putting $y = -x$. We also have the special case $(b^{-1})^y = b^{-y}$, which easily follows by taking suprema (if $b < 1$) or infima (if $b > 1$) over all rationals $v < y$ in the identity $(b^{-1})^v = b^{-v}$. The second identity now follows in full generality (for $x, y \neq 0$) by writing $x = \pm s$ and $y = \pm t$ with $s, t > 0$ and combining the two special cases above with the identity (already proved) applied to $s, t$.
We now prove the third identity assuming $a, b > 1$, for every $x \in \mathbb R$. Recall that $(ab)^u = a^u b^u$ for every rational $u$. Since $ab > 1$, by definition of $\exp_{ab}$ we have $\sup_{u\in\mathbb Q_{x-}}(ab)^u = (ab)^x$. From the definition of supremum we also have $a^u \le a^x$ and $b^u \le b^x$ for all $u \in \mathbb Q_{x-}$, so $a^u b^u \le a^x b^x$ and $\sup_{u\in\mathbb Q_{x-}}(a^u b^u) \le a^x b^x$. We conclude that $(ab)^x \le a^x b^x$, and the reverse inequality follows by working with infima taken over $\mathbb Q_{x+}$, so the third identity is established for $a, b > 1$. The cases when $a < 1$ or $b < 1$ are obtained by first applying the identity with $a^{-1}$ replacing $a$ or $b^{-1}$ replacing $b$ (respectively, which we can do by the case already proved) and subsequently using the special cases $x = -1$, $y = -1$ of the second identity.
8.7.22 Proposition (Laws of logarithms). For any fixed positive real base $b \neq 1$, all $a \in \mathbb R$ and all $x, y > 0$ we have
$\log_b(xy) = \log_b(x) + \log_b(y)$, and
$\log_b(x^a) = a\log_b(x)$.
Proof. By the laws of real exponents and the mutually inverse relation between $\exp_b$ and $\log_b$,
$$\exp_b(\log_b(x) + \log_b(y)) = \exp_b(\log_b x)\exp_b(\log_b y) = xy.$$
Applying $\log_b$ to both sides above we obtain the first identity claimed in the statement.
Similarly,
$$\exp_b(a\log_b x) = b^{a\log_b x} = (b^{\log_b x})^a = x^a.$$
The second identity now follows by applying $\log_b$ to both sides.
8.7.23 Definition. Fix a real number $\kappa \neq 0$. Define the $\kappa$th-power function $\mathrm{pow}_\kappa : \mathbb R^+ \to \mathbb R^+$ by:
$$\mathrm{pow}_\kappa(x) = \exp_x(\kappa) = x^\kappa.$$
(Recall that $1^\kappa = 1$ for all $\kappa \in \mathbb R$.) Then:
8.7.24 Proposition. • $\mathrm{pow}_\kappa$ is a strictly monotonic function (increasing for $\kappa > 0$, decreasing for $\kappa < 0$);
• $\mathrm{pow}_\kappa$ is bijective, with inverse $\mathrm{pow}_{1/\kappa}$; and
• $\mathrm{pow}_\kappa$ satisfies the No-Gaps condition.
Of course the usual notation for $\mathrm{pow}_\kappa(x)$ is simply $x^\kappa$.
Proof. First we prove that $\mathrm{pow}_\kappa$ is strictly monotonic. We assume $\kappa > 0$, leaving $\kappa < 0$ to the reader. Let $0 < x < y$ and $z = y/x > 1$. By the (strictly increasing) monotonicity of the $v$th-power map for rational $v > 0$ (Proposition 8.7.13) we have $z^v > 1^v = 1$ for every rational $v$ with $0 < v < \kappa$, so $z^\kappa \ge z^v > 1$ follows from the monotonicity of $\exp_z$. Multiplying both sides of the inequality by the positive number $x^\kappa$ and using the laws of exponents (Proposition 8.7.21) we get $x^\kappa < y^\kappa$, proving that $\mathrm{pow}_\kappa$ is strictly increasing for $\kappa > 0$.
By the laws of exponents, $(x^{1/\kappa})^\kappa = x = (x^\kappa)^{1/\kappa}$, so $\mathrm{pow}_\kappa$ is bijective, with inverse $\mathrm{pow}_{1/\kappa}$.
Finally, $\mathrm{pow}_\kappa$ satisfies the No-Gaps condition by Proposition 8.6.11.
8.8 Sequences and Numerals

8.8.1 Definition. Sequences. A sequence is simply an element of a (finite or infinite) Cartesian product. Below are the details.
For any set $S$ and $k \in \mathbb N$ a function $f : k \to S$ is called a finite sequence in $S$. If we define $a_n = f(n)$ for every§ $n \in k = \{0, 1, \dots, k-1\}$, we denote the finite sequence by $\{a_n\}_{n=0}^{k-1}$, or $\{a_0, a_1, \dots, a_{k-1}\}$. The values $a_0, a_1, \dots, a_{k-1}$, which need not be distinct, are called the terms of the finite sequence.
For any set $S$ a function $f : \mathbb N \to S$ is called an infinite sequence in $S$. In other words, $S^{\mathbb N}$ is the collection of all infinite sequences in $S$. If we define $a_n = f(n)$, we denote the infinite sequence by $\{a_n\}_{n=0}^{\infty}$, or $\{a_n\}$, or $\{a_0, a_1, a_2, \dots\}$. The values $a_0, a_1, a_2, \dots$, which need not be distinct, are called the terms of the infinite sequence.
The term sequence is used to refer to a finite or infinite sequence. The notation is ambiguous: $\{a_n\}_{n=0}^{k-1}$, $\{a_0, a_1, \dots, a_{k-1}\}$, $\{a_n\}_{n=0}^{\infty}$, $\{a_n\}$, $\{a_0, a_1, a_2, \dots\}$ can denote a sequence or the range of a sequence. Context tells which is meant. This notation is used for historical reasons. Some use the notation $(a_n)_{n=0}^{k-1}$, $(a_0, a_1, \dots, a_{k-1})$, $(a_n)_{n=0}^{\infty}$, $(a_n)$, $(a_0, a_1, a_2, \dots)$ instead.
8.8.2 Proposition. Division Algorithm. Given integers $m$ and $n$, with $n \ge 1$, there exist unique integers $q$ (the quotient) and $r$ (the remainder), with $0 \le r < n$, such that $m = nq + r$.
Proof. Define the set $S = \{m - kn \in \mathbb N \mid k \in \mathbb Z\}$. Since $n \ge 1$, it follows that $|m|\,n \ge |m|$. Hence,
$$m - (-|m|)n = m + |m|\,n \ge m + |m| \ge 0.$$
Therefore, by choosing $k = -|m|$ we obtain $m - kn \ge 0$, i.e., $m - kn \in S$, showing that $S$ is not empty.
Thus, $S$ is a nonempty subset of $\mathbb N$ (all the members of $S$ are nonnegative). Thanks to the well ordering principle, $S$ has a least element $r$. Thanks to the definition of $S$, we have $r = m - qn$ for some $q \in \mathbb Z$.
We show that $r < n$. Suppose the contrary. Then $r \ge n$ gives
$$m - (q + 1)n = (m - qn) - n = r - n \ge 0,$$
which says that $m - (q + 1)n \in S$. Hence $m - (q + 1)n = r - n < r$, contradicting the minimality of $r$ in $S$. We conclude that $0 \le r < n$.
§
Recall that zero in N is simply the empty set and k = (k − 1) ∪ {k − 1} for k ∈ N, k ̸= 0.
Hence, we have 1 = ∅ ∪ {∅} = {∅} = {0}, 2 = 1 ∪ {1} = {0} ∪ {{0}} = {0, {0}} = {0, 1},
3 = 2 ∪ {2} = {0, 1} ∪ {{0, 1}} = {0, 1, {0, 1}} = {0, 1, 2} and so on.
We prove the uniqueness of $q$ and $r$. Suppose that
$$m = qn + r \quad\text{and}\quad m = q'n + r',$$
with $0 \le r < n$ and $0 \le r' < n$. Then, we have $r - r' = n(q' - q)$. Hence, recalling that $n \ge 1$,
$$|r - r'| = n\,|q - q'|. \tag{1}$$
We add the inequalities $-n < -r \le 0$ and $0 \le r' < n$ to obtain $-n < r' - r < n$, which implies
$$|r' - r| < n. \tag{2}$$
Thanks to (1) and (2) we have
$$n\,|q - q'| = |r - r'| < n,$$
which implies $|q - q'| < 1$. Hence, since $|q - q'| \in \mathbb N$, we have $q = q'$. Then, with $q = q'$ in (1) we obtain $r = r'$.
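The existence part of the proof is constructive. It starts from $k = -|m|$, which gives an element of $S$, and the least element of $S$ is the remainder. The Python sketch below follows that recipe literally with a linear search, so it is only suitable for small inputs. The name `divide` is illustrative. The test compares it with Python's built-in `divmod`, which for $n \ge 1$ also returns a remainder in $[0, n)$.

```python
def divide(m, n):
    """Quotient and remainder of m by n >= 1, following the proof of 8.8.2.

    S = {m - k*n >= 0 : k integer}. Starting from k = -|m| (an element of S,
    as in the proof), increase k while m - (k+1)*n is still nonnegative;
    the final m - k*n is the least element r of S.
    """
    if n < 1:
        raise ValueError("need n >= 1")
    k = -abs(m)                    # m - k*n = m + |m| n >= 0
    while m - (k + 1) * n >= 0:
        k += 1                     # each step lowers m - k*n by n
    return k, m - k * n            # (q, r) with 0 <= r < n
```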
8.8.3 Definition (Base Expansions for Integers). Let $b$ be an integer greater than 1 (the base). Let $m \in \mathbb Z$. First consider the case $m > 0$. By repeated applications of the division algorithm we have
$$\begin{aligned}
m &= m_1 b + n_0,\\
m_1 &= m_2 b + n_1,\\
&\;\;\vdots\\
m_k &= m_{k+1} b + n_k,
\end{aligned}$$
where $m_{k+1} = 0$ is the first vanishing quotient (so $m_j = 0$ for all $j > k$); $0 \le n_0 < b, \dots, 0 \le n_{k-1} < b$; and $0 < n_k < b$. This process yields
$$m = n_k b^k + \cdots + n_1 b + n_0.$$
The sequence $\{n_k, \dots, n_1, n_0\}$ is called the base-$b$ representation of $m$, or base-$b$ numeral representing $m$, and we usually write
$$+(\tilde n_k \dots \tilde n_1 \tilde n_0)_b \quad\text{to denote (or name) the number}\quad m = \sum_{j=0}^{k} n_j b^j \tag{8.8}$$
(the leading lexical symbol ‘+’ and the parentheses are optional in practice; the suffix $b$ may also be omitted if it is ten or otherwise clear from context), where $\tilde n_i$ stands for a lexical symbol (called a numeral in this context) representing $n_i$, and where the juxtaposition in (8.8) is purely lexical and does not indicate product. For example, $127_{10}$ does not mean $1 \cdot 2 \cdot 7$, but names the number $1 \cdot 10^2 + 2 \cdot 10 + 7$, so that $127_{10}$, $177_8$, $1111111_2$ and $7F_{16}$ all name the same number, since
$$\begin{aligned}
1\cdot 10^2 + 2\cdot 10 + 7 &= 1\cdot 8^2 + 7\cdot 8 + 7\\
&= 1\cdot 2^6 + 1\cdot 2^5 + 1\cdot 2^4 + 1\cdot 2^3 + 1\cdot 2^2 + 1\cdot 2 + 1\\
&= 7\cdot 16 + 15.
\end{aligned}$$
(The numerals A, B, C, …, F, … denote ten, eleven, twelve, …, fifteen, ….) The numbers $n_0, n_1, \dots, n_k$ are called the base-$b$ digits of $m$. Note that any base-$b$ digit is an integer $n$ such that $0 \le n < b$. The subscript indicating the base is frequently omitted when $b$ is ten. Note that $k$ is characterized by the inequalities $b^k \le m < b^{k+1}$. By the strict monotonicity of $\log_b$ and Notation 8.5.7, it follows that $k = \lfloor \log_b m \rfloor$.
The integer $m = 0$ is represented by the single numeral $\tilde 0 = +(\tilde 0)_b$. We may let $k = 0$ in this case.
For $m < 0$, with the representation of $|m|$ given in (8.8), the representation of $m$ is obtained by replacing the symbol ‘+’ by ‘−’.
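The repeated division of Definition 8.8.3 translates directly into a loop. The Python sketch below reproduces the example $127_{10} = 177_8 = 1111111_2 = 7F_{16}$. The helper names and the numeral string `"0123456789ABCDEF"` (covering bases up to sixteen) are assumptions of this illustration. Only $m > 0$ is handled; the sign rule for $m < 0$ is left out.

```python
NUMERALS = "0123456789ABCDEF"   # numerals for digits zero..fifteen (bases up to 16)

def base_digits(m, b):
    """Digits [n_k, ..., n_1, n_0] of m > 0 in base b > 1, by repeated division.

    Mirrors Definition 8.8.3: m = m_1 b + n_0, m_1 = m_2 b + n_1, ...,
    stopping when the quotient becomes 0.
    """
    if m <= 0 or b <= 1:
        raise ValueError("this sketch handles m > 0 and b > 1 only")
    digits = []
    while m > 0:
        m, r = divmod(m, b)      # old m = (new m) * b + r, with 0 <= r < b
        digits.append(r)
    return digits[::-1]          # most significant digit first

def numeral(m, b):
    """Lexical base-b numeral for m > 0 (b <= 16 in this sketch)."""
    return "".join(NUMERALS[d] for d in base_digits(m, b))
```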
8.8.4 Definition. Let $b$ be a base (an integer greater than 1). Let $x$ be a nonnegative real. Let $m = \lfloor x \rfloor$ be the largest $m \in \mathbb N$ such that $m \le x$. For this $m$, define $k$ and $n_k, n_{k-1}, \dots, n_0$ as in Definition 8.8.3 above, and
$$\begin{cases}
n_{-1} = \text{largest } n \in \mathbb N \text{ such that } m + \dfrac{n}{b} \le x,\\[1ex]
n_{-2} = \text{largest } n \in \mathbb N \text{ such that } m + \dfrac{n_{-1}}{b} + \dfrac{n}{b^2} \le x,\\
\quad\vdots\\
n_{-j} = \text{largest } n \in \mathbb N \text{ such that } m + \dfrac{n_{-1}}{b} + \cdots + \dfrac{n_{-j+1}}{b^{j-1}} + \dfrac{n}{b^j} \le x,\\
\quad\vdots
\end{cases}\tag{8.9}$$
Note that each $n_{-j}$ exists since the set of $n \in \mathbb N$ whose largest element is defined to be $n_{-j}$ always contains $n = 0$ but no element $n \ge b$. Therefore $n_{-j}$ is a base-$b$ digit for $j = 1, 2, \dots$.
We note that for arbitrary $x > 0$, the first (that is, largest) $j$ such that $n_j \neq 0$ is still characterized by the inequalities $b^j \le x < b^{j+1}$, hence its value is $\lfloor \log_b x \rfloor$.
Let $S$ denote the collection of all such sums:
$$S = \Big\{ s_j = m + \frac{n_{-1}}{b} + \cdots + \frac{n_{-j}}{b^j} \in \mathbb R \;\Big|\; j \in \mathbb N \wedge j > 0 \wedge \text{(8.9) holds} \Big\}.$$
We have $s_j \le x < s_j + b^{-j}$, so clearly $x = \sup S$. It follows that $x$ both determines and is in turn uniquely determined by the sequence of its base-$b$ digits
$$\{n_k, \dots, n_1, n_0, n_{-1}, n_{-2}, \dots\}.$$
We usually write that sequence of digits as the base-$b$ expansion of $x$
$$+(\tilde n_k \dots \tilde n_1 \tilde n_0 \,.\, \tilde n_{-1} \tilde n_{-2} \dots)_b, \tag{8.10}$$
which is a lexical string of numerals conventionally denoting $x$ (where the leading lexical symbol ‘+’ and the parentheses are optional in practice).¶ The case $x < 0$ is handled by replacing ‘+’ by ‘−’ in the base-$b$ expansion of $|x|$.
We call any lexical string $E$ as in (8.10) (possibly prefixed by a minus symbol instead) a (lax) base-$b$ expansion.
By convention we shall call a base-$b$ expansion nonpositive if it is prefixed by the minus symbol; otherwise it is nonnegative. The positive expansions are the nonnegative ones except $+(\tilde 0.\tilde 0\tilde 0\tilde 0\dots)_b$. The negative expansions are the nonpositive ones except $-(\tilde 0.\tilde 0\tilde 0\tilde 0\dots)_b$.
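The digit-selection rule (8.9) is greedy. At step $j$ the largest admissible digit is $\lfloor (x - s_{j-1})\, b^j \rfloor$, where $s_{j-1}$ is the current partial sum. The Python sketch below applies this rule with exact rational arithmetic for rational $x \ge 0$. The function name and the restriction to rational inputs are assumptions of this illustration. As Remark 8.8.5 states, it produces the strict expansion, for example $1.000\ldots$ rather than $0.999\ldots$.

```python
import math
from fractions import Fraction

def expansion_digits(x, b, count):
    """Return (m, [n_-1, ..., n_-count]) for rational x >= 0, digits chosen by (8.9)."""
    x = Fraction(x)
    if x < 0 or b <= 1:
        raise ValueError("need x >= 0 and b > 1")
    m = math.floor(x)              # largest integer m <= x
    s = Fraction(m)                # running partial sum
    frac_digits = []
    for j in range(1, count + 1):
        # largest n with s + n / b**j <= x; it always lies in {0, ..., b - 1}
        n = math.floor((x - s) * b ** j)
        frac_digits.append(n)
        s += Fraction(n, b ** j)
    return m, frac_digits
```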
Given a base-$b$ expansion $E$, let
$$s_j = \sum_{i=j}^{k} n_i b^i \quad\text{for integers } j \le k, \tag{8.11}$$
and let $S = \{s_j \mid j \in \mathbb Z_{\le k}\}$. Obviously $s_j$ increases monotonically as $j$ decreases. The base-$b$ expansion is said to denote (or represent, or name, or have the value of) the real number
$$x = \begin{cases} \sup S & \text{if the expansion } E \text{ is nonnegative;}\\ -\sup S & \text{if the expansion } E \text{ is nonpositive.} \end{cases}$$
While we have hitherto tacitly assumed $k \ge 0$, we may as well take $k$ as the largest $j$ such that $n_j \neq 0$ (perhaps $k < 0$). For the base-$b$ representations $\pm(\tilde 0.\tilde 0\tilde 0\tilde 0\dots)_b$ we may regard $k$ as being minus infinity and $s_j$ as the empty sum with value 0 for all $j$.
A priori we do not know whether every base-$b$ expansion denotes a real number $x$. On the other hand, we have shown that every real number is denoted by some base-$b$ expansion, but we do not know if that expansion is unique. In fact, $\pm(\tilde 0.\tilde 0\tilde 0\tilde 0\dots)_b$ are two (lexically) different representations of the real number 0, so expansions are not always unique.
¶ In Note 10 we will study infinitely long sums (series) and we shall be able to interpret $(\tilde n_k \dots \tilde n_1 \tilde n_0 \,.\, \tilde n_{-1}\tilde n_{-2}\dots)_b$ as $\sum_{j=-\infty}^{k} n_j b^j$. For the time being the reader may just take $\sup S$ as the definition of the series above.
We call a base-$b$ expansion strict if it is not $-(\tilde 0.\tilde 0\tilde 0\tilde 0\dots)_b$ and its digits $\{n_j \mid j \in \mathbb Z_{\le k}\}$ have the following property:
$$(\forall j \in \mathbb Z)(\exists i < j)(n_i \neq b - 1).$$
In other words, an expansion is strict if its digits are not all equal to $b - 1$ from some point on. For instance, in base ten,
$$+(0.999\dots)_{10} \quad\text{and}\quad +(1.000\dots)_{10}$$
both name the number 1. The second expansion is strict, but the first one is not.
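The example $1.000\ldots$ versus $0.999\ldots$ generalizes to any expansion with finitely many nonzero digits. Decrease the rightmost nonzero digit by one and replace every digit after it by $b - 1$. The Python sketch below performs that rewriting on digit lists. It assumes all digits beyond those given are $0$, and it shows only `tail` extra positions of the infinite run of $(b-1)$'s. Its name and parameters are hypothetical.

```python
def lax_twin(int_digits, frac_digits, b, tail=6):
    """Non-strict twin of a terminating strict base-b expansion.

    int_digits / frac_digits are the digits before / after the point; all
    later digits are assumed to be 0. The rightmost nonzero digit is
    decreased by one and every later digit becomes b - 1 (only `tail`
    extra positions of the infinite tail are written out).
    """
    digits = list(int_digits) + list(frac_digits)
    nonzero = [i for i, d in enumerate(digits) if d != 0]
    if not nonzero:
        raise ValueError("the zero expansion is not handled by this sketch")
    i = nonzero[-1]
    twin = digits[:i] + [digits[i] - 1] + [b - 1] * (len(digits) - i - 1 + tail)
    split = len(int_digits)
    return twin[:split], twin[split:]

one_twin = lax_twin([1], [0, 0, 0], 10, tail=3)   # 1.000... -> 0.999...
```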
8.8.5 Remark. It is easy to see that digits chosen according to (8.9) give a strict
expansion of the nonnegative real number x.
8.8.6 Theorem (Existence and uniqueness of base-$b$ expansions). Fix an integer base $b > 1$. The relation “$E$ denotes $x$” is a bijection between strict base-$b$ expansions $E$ and real numbers $x$.
A real number $x$ is denoted by exactly two different base-$b$ expansions if and only if an integer $l$ exists such that $b^l x$ is an integer. Otherwise $x$ is denoted by exactly one base-$b$ expansion, necessarily strict.
Proof. We already proved the existence of a base-$b$ expansion with digits chosen according to (8.9).
We show that any base-$b$ representation denotes a certain real number $x$. It suffices to check that the sequence $\{s_j\}$ with $s_j$ given by (8.11) is bounded above. On the one hand, we clearly have
$$s_j \le \sum_{i=j}^{k} (b - 1)b^i = b^{k+1} - b^j,$$
so the sequence $\{s_j\}$ is bounded above by $b^{k+1}$, hence its supremum $x$ exists. In fact, since the sequence $\{s_j\}$ is monotonically increasing as $j$ decreases, and also $s_i \le s_j + \sum_{l=i}^{j-1}(b - 1)b^l = s_j + b^j - b^i$ for $i < j$, it follows that $s_j + b^j$ is an upper bound for $\{s_i\}$, hence
$$s_j \le x \le s_j + b^j \tag{8.12}$$
for all integers $j$. Equality holds on the left (resp., on the right) if and only if $n_i = 0$ (resp., $n_i = b - 1$) for all $i < j$.
We claim that if two base-$b$ representations
$$E : +(\tilde n_k \ldots \tilde n_1 \tilde n_0 . \tilde n_{-1} \tilde n_{-2} \ldots)_b \quad\text{and}\quad E' : +(\tilde n'_k \ldots \tilde n'_1 \tilde n'_0 . \tilde n'_{-1} \tilde n'_{-2} \ldots)_b,$$
denoting $x$ and $y$ respectively, agree down to but excluding the $j$-th position, that is, $n_i = n'_i$ for all $i > j$, and (for the sake of definiteness) $n_j > n'_j$, then the inequalities
$$(n_j - n'_j - 1)b^j \le x - y \le (n_j - n'_j + 1)b^j \tag{8.13}$$
hold, with equality on the left (resp., on the right) if and only if $n_i = 0$ and $n'_i = b - 1$ (resp., $n_i = b - 1$ and $n'_i = 0$) for all $i < j$.
In fact, for the partial sums $s_j = \sum_{i=j}^{k} n_i b^i$ of $E$ and $s'_j = \sum_{i=j}^{k} n'_i b^i$ of $E'$, we have $s_j - s'_j = (n_j - n'_j)b^j$ and, by (8.12),
$$\begin{aligned} x - y &= (x - s_j) + (s_j - s'_j) - (y - s'_j)\\ &\ge 0 + (n_j - n'_j)b^j - b^j\\ &= (n_j - n'_j - 1)b^j, \end{aligned}$$
with equality if and only if $y - s'_j = b^j$ and $x = s_j$, that is, $n_i = 0$ and $n'_i = b - 1$ for all $i < j$. This proves the left inequality in (8.13) and justifies the case of equality; the analysis of the right inequality is similar and omitted.
Assuming $x = y$ and $E$ different from $E'$, agreeing with each other down to but excluding the $j$-th position as above, it follows from (8.13) that $n_j - n'_j - 1 \le 0$; since also $n'_j < n_j$ by assumption, we conclude $n_j = n'_j + 1$. Therefore the left inequality in (8.13) is an equality, and we conclude that $E$ is a strict base-$b$ representation and $E'$ a lax but not strict representation, both of $x$; moreover, $E'$ is obtained by decreasing the rightmost nonzero numeral of $E$ by one and replacing all subsequent zero numerals of $E$ by the numeral denoting $b - 1$.
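The passage from a strict terminating expansion $E$ to its non-strict twin $E'$ can be sketched in Python. The sketch assumes a representation of our own choosing, not the text's: a terminating nonnegative expansion is a list `digits` whose entry `digits[t]` is the numeral in position `k - t`, with a nonzero last entry. The infinite tail of numerals $b - 1$ in $E'$ is left implicit, and `lax_value` adds it back using the identity $(b-1)\sum_{i<m} b^i = b^m$, where $m$ is the last explicit position. Exact arithmetic uses `fractions.Fraction`.

```python
from fractions import Fraction


def value(digits, k, b):
    """Exact value of the finite expansion whose numeral in position k - t is digits[t]."""
    return sum(Fraction(d) * Fraction(b) ** (k - t) for t, d in enumerate(digits))


def nonstrict_twin(digits, k, b):
    """Finite prefix of the lax twin E' of a strict terminating expansion E.

    The rightmost nonzero numeral is decreased by one; every later position of E'
    holds b - 1 forever (that infinite tail is implicit and not returned).
    """
    if not digits or digits[-1] == 0:
        raise ValueError("expected a terminating expansion ending in a nonzero digit")
    prefix = list(digits)
    prefix[-1] -= 1
    return prefix, k


def lax_value(prefix, k, b):
    """Value of prefix followed by the numeral b - 1 in every lower position."""
    m = k - len(prefix) + 1                    # last explicit position
    return value(prefix, k, b) + Fraction(b) ** m   # tail sum equals b**m


# 1/4 = +(0.25)_10 = +(0.24999...)_10
twin = nonstrict_twin([0, 2, 5], 0, 10)
```

For example, `twin` is `([0, 2, 4], 0)`, and `lax_value` of it in base 10 returns `Fraction(1, 4)`, matching the value of the strict expansion.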
Clearly, positive expansions denote positive numbers, and negative expansions
denote negative numbers. As shown above, every positive number is denoted by a
strict positive expansion E, but some are also denoted by a non-strict one E′ . This
also obviously applies to negative numbers and their expansions.
The number zero has two expansions differing only by their syntactic sign, as noted in Definition 8.8.4. Note that $b^l \cdot 0 = 0$ is an integer for every integer $l$. Every nonzero real $x$ having two distinct expansions will have the same sign as either of them. Denote by $l$ the last nonzero position of the strict expansion $E$ of $x$. Then $b^{-l}x \in \mathbb{Z}$.
Conversely, if a real $x$ satisfies $b^l x \in \mathbb{Z}$, we have either $x = 0$ (which has both a strict and a non-strict expansion), or else $x \neq 0$. In the latter case $n_i = 0$ for all digits with $i < -l$ in the strict expansion $E$ of $x$, but $n_j \neq 0$ for some $j \ge -l$. Let $j$ be the least such, so $n_j$ is the rightmost nonzero digit of $E$. Then $x$ also has the non-strict expansion $E'$ obtained from $E$ as above. The proof is complete.
8.8.7 Corollary. Every real number possessing a non-strict base-$b$ expansion to some base $b$ is rational.
Proof. By Theorem 8.8.6, any such number is of the form $b^{-l}n$ for some integers $l$ and $n$, hence is rational.
8.22 Exercise. Prove that there exist countably many numbers in [0, 1] having a
nonstrict binary (base-2) expansion.
(A significant strengthening of this Exercise is given in Corollary 9.2.12.)
Foundations Note 9
Important Subsets of the Reals

July 28, 2020
Contents
9.1 Preliminaries
9.2 Countable and Uncountable Sets
9.3 Open and Closed Sets
9.4 Properties of Open and Closed Sets
9.5 Convexity and Connectedness
9.6 Compactness
9.1 Preliminaries
9.1.1 Proposition. For all real x and y we have:
(1) |xy| = |x||y|;
(2) $|x|^2 = x^2$;
(3) |x| ≤ y ⇔ −y ≤ x ≤ y
(note that both statements are necessarily false when y < 0);
(4) |x| ≤ |y| ⇔ −|y| ≤ x ≤ |y|;
(5) −|x| ≤ x ≤ |x|.

9.1.2 Remark. The triangle inequality is the most important and basic inequal-
ity in analysis. Below we present it in two forms.
9.1.3 Proposition (Triangle Inequality). The inequality |x + y| ≤ |x| + |y|
holds for all real x and y.
Proof. By Proposition 9.1.1 we have
$$\begin{aligned} |x + y|^2 &= (x + y)^2\\ &= x^2 + 2xy + y^2\\ &\le x^2 + 2|xy| + y^2\\ &= |x|^2 + 2|x||y| + |y|^2\\ &= (|x| + |y|)^2. \end{aligned}$$
The conclusion now follows upon taking square roots (the function $t \mapsto \sqrt{t}$ is strictly monotonically increasing on $[0, +\infty)$).
9.1.4 Definition. Let I ⊂ R be an interval and let f : I → [0, +∞). We say that
f is subadditive if, whenever x, y and x + y are in I, we have
f (x + y) ≤ f (x) + f (y).
If, in addition, I = [−a, a] is an interval symmetric about the origin, and f is an
even function (that is, f (−x) = f (x) for all x ∈ I), we say that the function f is
symmetric subadditive (or even subadditive).
In light of Definition 9.1.4, the triangle inequality is the statement that the
absolute value is subadditive. Obviously, it is also symmetric (even).
9.1.5 Proposition. 1. A function f defined on an interval I is subadditive if
and only if f satisfies
f (y − x) ≥ f (y) − f (x) whenever x, y and y − x are in I.
2. If f is subadditive and symmetric, then

f (y − x) ≥ |f (y) − f (x)| whenever x, y and y − x are in I.
(One is tempted to say that a subadditive function is also “supsubtractive”.)
Proof. 1. First assume $f$ is subadditive. Then
$$f(y) = f(x + (y - x)) \le f(x) + f(y - x) \implies f(y - x) \ge f(y) - f(x).$$
Conversely, if the last inequality above holds whenever $x, y, y - x \in I$, then, applying it with $y$ in place of $x$ and $x + y$ in place of $y$ (for $x, y, x + y \in I$), we get
$$f(x) = f((x + y) - y) \ge f(x + y) - f(y) \implies f(x + y) \le f(x) + f(y).$$
2. Now assume that $f$ is subadditive and even (and $I$ is symmetric). We have $x - y \in I$ if and only if $y - x \in I$. Thus, we may apply the inequality of part 1. to $y$ and $x$ (in this order) to obtain
$$f(x - y) \ge f(x) - f(y).$$
Since $f(x - y) = f(-(y - x)) = f(y - x)$ (by the symmetry of $f$), we obtain
$$\begin{aligned} f(y - x) &\ge \max\{f(y) - f(x),\ f(x) - f(y)\}\\ &= \max\{(f(y) - f(x)),\ -(f(y) - f(x))\}\\ &= |f(y) - f(x)|. \end{aligned}$$

9.1.6 Corollary. The inequality |x| − |y| ≤ |x − y| holds for all real x and y.
Proof. This follows from 2. of the Proposition above. (The function x 7→ |x| is
obviously symmetric, and also subadditive, by the triangle inequality.)
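To make Definition 9.1.4 and Proposition 9.1.5 concrete, the following Python sketch spot-checks the inequalities of this section at random points, for the absolute value and for $f(t) = \sqrt{|t|}$ on all of R. The choice of $f$ is ours, not the text's: it is even, and it is subadditive because $\sqrt{|x+y|} \le \sqrt{|x| + |y|} \le \sqrt{|x|} + \sqrt{|y|}$. Random sampling with a small floating-point tolerance `tol` is evidence, not a proof.

```python
import math
import random


def f(t: float) -> float:
    """f(t) = sqrt(|t|): an even function on R, subadditive as noted above."""
    return math.sqrt(abs(t))


def check(samples: int = 10_000, seed: int = 0, tol: float = 1e-12) -> bool:
    """Spot-check the inequalities of 9.1.3 through 9.1.6 at random points."""
    rng = random.Random(seed)
    for _ in range(samples):
        x, y = rng.uniform(-100, 100), rng.uniform(-100, 100)
        assert abs(x + y) <= abs(x) + abs(y) + tol      # triangle inequality (9.1.3)
        assert f(x + y) <= f(x) + f(y) + tol            # f is subadditive (9.1.4)
        assert f(y - x) >= abs(f(y) - f(x)) - tol       # Proposition 9.1.5, part 2
        assert abs(x) - abs(y) <= abs(x - y) + tol      # Corollary 9.1.6
    return True
```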
9.2 Countable and Uncountable Sets

9.2.1 Remarks. Recall that a set A is finite if A ∼ n for some n ∈ N (there is a bijection between A and n; intuitively, A has exactly n elements for some n ∈ N); otherwise, A is infinite. (Note: The empty set is finite: ∅ ∼ 0. The bijection is the empty function ∅.) Further recall (Note 7) that we say that a set
• A is countable (or countably infinite) if A ∼ N (i.e., if there exists a bijection f : A → N);
• A is uncountable if A is neither finite nor countable, i.e.: A is infinite, but not countably infinite;
• A is at most countable if A is finite or countable;
• A ⊂ R is bounded above if it has an upper bound: there exists a real number u such that x ≤ u for all x ∈ A;
• A ⊂ R is bounded below if it has a lower bound: there exists a number l such that l ≤ x for all x ∈ A;
• A ⊂ R is bounded if it is bounded above and is also bounded below.
Further, every nonempty subset A of R that has an upper bound has a least upper
bound sup A and every nonempty subset A of R that has a lower bound has a
greatest lower bound inf A.
9.2.2 Remarks. Recall (Note 8) that for any set A
• a finite sequence in A is an element of $A^n$ for some n ∈ N (that is, a function f : n → A); we write $\{a_k\}_{k=0}^{n-1}$, or $\{a_k\}$, or $\{a_0, \ldots, a_{n-1}\}$, or $a_0, \ldots, a_{n-1}$ for such a sequence;
• an infinite sequence in A is an element of $A^{\mathbb{N}}$ (that is, a function f : N → A) and we write $\{a_k\}_{k=0}^{\infty}$, or $\{a_k\}$, or $\{a_0, a_1, \ldots\}$, or $a_0, a_1, \ldots$ for such a sequence.
The notation {a0 , . . . , an−1 } is ambiguous: it can denote the set whose elements
are listed or the sequence whose range consists of those values, in the order given.
Context tells which one is meant.
9.2.3 Remarks. If A is countable then there is a bijection f : N → A. By writing $a_n$ in place of f(n), we can list the elements of A:
(1) $a_0, a_1, a_2, \ldots$.
This listing contains no duplicates since f is injective ($m \neq n$ implies $a_m \neq a_n$).
Conversely, if the elements of A can be listed as shown in (1), without duplicate entries, the assignment $a_n \mapsto n$ defines the bijection $f^{-1} : A \to \mathbb{N}$.
In other words, to say that A is countable is to say that the elements of A can be listed as shown in (1). A simple example of a countable set other than N is Z:
Z = {0, −1, 1, −2, 2, . . .}.
The bijection
$$f : \mathbb{N} \to \mathbb{Z} : n \mapsto \begin{cases} n/2, & \text{if } n \text{ is even},\\ -(n+1)/2, & \text{if } n \text{ is odd}, \end{cases}$$
gives an explicit correspondence between N and Z.
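The bijection f : N → Z above and its inverse can be written directly in Python; the function names are ours. Python's `//` rounds toward minus infinity, but here it is only applied to even numerators, so each division is exact.

```python
def nat_to_int(n: int) -> int:
    """f(n) = n/2 for even n and -(n+1)/2 for odd n."""
    if n < 0:
        raise ValueError("n must be a natural number")
    return n // 2 if n % 2 == 0 else -(n + 1) // 2


def int_to_nat(z: int) -> int:
    """Inverse of nat_to_int: 0, -1, 1, -2, 2, ... go to 0, 1, 2, 3, 4, ..."""
    return 2 * z if z >= 0 else -2 * z - 1
```

For instance, `[nat_to_int(n) for n in range(5)]` is `[0, -1, 1, -2, 2]`, reproducing the listing of Z given above.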

9.2.4 Remarks. Recall (Note 7) that every infinite set includes a countable subset. (The proof makes use of the axiom of choice.) Countable sets are the smallest infinite sets in the sense that if A is infinite then A includes a countable subset B, that is, there is an injection from N into A whose image is a countable subset B ⊂ A (i.e., N ⪯ A). The construction of such a countable subset B requires the axiom of choice in general. However, if A is countable, the axiom of choice is not needed to prove that every infinite subset B ⊂ A is countable, as we show below.
9.2.5 Proposition. Every infinite subset of a countable set is countable.
Proof. Suppose A is countable and B is an infinite subset of A. Since A is countable, its elements can be listed: $a_0, a_1, a_2, \ldots$. Define the sequence $\{n_k\}_{k=0}^{\infty}$ as follows.
The definition of $\{n_k\}_{k=0}^{\infty}$ is recursive: Define $n_0$ to be the smallest n such that $a_n \in B$. Having already defined
$n_0, n_1, \ldots, n_k$,
define $n_{k+1}$ to be the smallest n such that $n > n_k$ and $a_n \in B$. (Note: The existence of a smallest element follows from the well ordering principle: Every nonempty subset of N has a smallest element. At least one such n must exist since B is infinite.) It is readily shown that $\{n_k\}$ is a strictly increasing sequence (that is, $n_0 < n_1 < n_2 < \cdots$).
The function
$$f : \mathbb{N} \to B : k \mapsto a_{n_k}$$
is the desired bijection between N and B. Note that we have an explicit way of defining each element $a_{n_k}$ of B, without resorting to the axiom of choice.
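The recursive choice of $n_0 < n_1 < \cdots$ amounts to scanning the listing of A in order and keeping the entries that lie in B. A minimal Python sketch, assuming (hypothetically) that the listing is given as a function `listing(n)` returning $a_n$ and that B is given by a membership test `in_b`:

```python
import math
from itertools import count, islice


def enumerate_subset(listing, in_b):
    """Yield a_{n_0}, a_{n_1}, a_{n_2}, ... as in the proof of Proposition 9.2.5.

    Scanning n = 0, 1, 2, ... in increasing order, each hit is the least n
    greater than the previous hit with a_n in B, i.e. exactly n_{k+1}.
    B is assumed infinite; if it were finite, the generator would eventually
    loop forever looking for the next element.
    """
    for n in count():
        a_n = listing(n)
        if in_b(a_n):
            yield a_n


def is_square(m: int) -> bool:
    """Membership test for the sample subset B of perfect squares."""
    return math.isqrt(m) ** 2 == m


# Sample choice: A = N listed as 0, 1, 2, ..., and B = the perfect squares.
squares = list(islice(enumerate_subset(lambda n: n, is_square), 5))
```

Here `squares` is `[0, 1, 4, 9, 16]`. No choices are made: each element is determined by the order of the listing, which mirrors the remark that the axiom of choice is not needed.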
9.2.6 Theorem. For every infinite set A the following three conditions are
equivalent.
(1) A is countable.
(2) There exists an injection f : A → N.
(3) There exists a surjection g : N → A.

Proof. We use the results presented in Note 7 about one-sided inverses: Every injection has a (necessarily surjective) left inverse and every surjection has a (necessarily injective) right inverse.
We prove that (1) implies (2): Suppose that A is countable. Then there exists a bijection h : N → A. Thus, $h^{-1} : A \to \mathbb{N}$ is a bijection, hence an injection. Take $f = h^{-1}$.
We prove that (2) implies (3): Suppose that (2) holds. Then f has a left inverse $f_L : \mathbb{N} \to A$. Left inverses are surjective. Take $g = f_L$.
We prove that (3) implies (1): Suppose that (3) holds. Then g has a right inverse h : A → N. Since right inverses are injective, h is a bijection h : A → h∗(A) onto its image h∗(A). Being equinumerous with A, the set h∗(A) is infinite; thanks to Proposition 9.2.5, h∗(A) is countable, being an infinite subset of the countable set N. Since h∗(A) and A are equinumerous, A is countable also.
(We remark that the right inverse h may be constructed without the Axiom of Choice thanks to the well-ordering of N: For each a ∈ A let h(a) be the minimum element of the nonempty subset g∗({a}) ⊂ N.)
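The choice-free right inverse h(a) = min g∗({a}) is just a search for the first preimage. A minimal Python sketch, with `halve` (n ↦ ⌊n/2⌋) as a hypothetical sample surjection N → N that is not injective:

```python
from itertools import count


def min_preimage(g, a):
    """h(a) = least n in N with g(n) == a.

    For a surjection g : N -> A the search always terminates and g(h(a)) == a,
    so h is a right inverse of g; distinct a give distinct h(a), so h is injective.
    If a is not in the image of g, the loop never ends.
    """
    for n in count():
        if g(n) == a:
            return n


def halve(n: int) -> int:
    """Sample surjection N -> N hitting every value twice."""
    return n // 2
```

For example, `[min_preimage(halve, a) for a in range(5)]` is `[0, 2, 4, 6, 8]`: among the two preimages 2a and 2a + 1 of each a, the well-ordering picks 2a.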
9.2.7 Theorem. A countable union of countable sets is a countable set: If $A_0, A_1, A_2, \ldots$ are countable sets, then the union
$$A = \bigcup_{n=0}^{\infty} A_n$$
is countable.
Proof. We need to define a bijection N ∼ A. Since each $A_n$ is countable, its elements can be listed: $a_{n,0}, a_{n,1}, a_{n,2}, \ldots$. For each m ∈ N the number of elements $a_{n,k}$ such that $n + k = m$ is finite. Take any listing of all $a_{n,k}$ with $n + k = 0$, then of all $a_{n,k}$ with $n + k = 1$, then of all $a_{n,k}$ with $n + k = 2$, and so on:
$$\underbrace{a_{0,0}}_{0},\ \underbrace{a_{0,1}, a_{1,0}}_{1},\ \underbrace{a_{0,2}, a_{1,1}, a_{2,0}}_{2},\ \underbrace{a_{0,3}, a_{1,2}, a_{2,1}, a_{3,0}}_{3},\ \ldots,$$
for example (the label under each group is the common value of $n + k$).
The list contains all the elements of A, but, if any two sets $A_n$ and $A_k$ have an element in common, that element is listed more than once. Therefore, the list can be reduced, if necessary, to contain no duplicate entries, while containing all of the elements of A. Suppose this has been done and that the above list contains no duplicates. The list must be infinite, since $A_0$ is infinite (and a set cannot contain duplicates). The list without duplicates is of the form
$b_0, b_1, b_2, \ldots$.
This defines a bijection $\mathbb{N} \sim A : i \mapsto b_i$.
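The diagonal listing and the removal of duplicates in the proof can be sketched as a Python generator. The accessor `a(n, k)`, returning $a_{n,k}$, is a hypothetical interface; elements are assumed hashable so that already-listed ones can be remembered in a set. The sample family $A_n = \{(n+1)k \mid k \in \mathbb{N}\}$ (multiples of $n + 1$) is our own choice.

```python
from itertools import count, islice


def enumerate_union(a):
    """List the union of the sets A_0, A_1, ... without duplicates.

    Visits all a(n, k) with n + k = 0, then with n + k = 1, and so on; within
    the diagonal n + k = m, n runs 0, 1, ..., m (the order used in the proof).
    An element already listed is skipped, so its first occurrence is kept.
    """
    seen = set()
    for m in count():
        for n in range(m + 1):
            x = a(n, m - n)
            if x not in seen:
                seen.add(x)
                yield x


# Sample family: A_n = multiples of n + 1, listed as a(n, k) = (n + 1) * k.
first_six = list(islice(enumerate_union(lambda n, k: (n + 1) * k), 6))
```

Here `first_six` is `[0, 1, 2, 3, 4, 6]`: the value 5 first appears on a later diagonal, as $a_{0,5}$.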
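9.2.8 Remark. Some find the visual device presented below helpful. Accordingly, we can start the above proof as follows.
Since each $A_n$ is countable, its elements can be written
$a_{n,0}, a_{n,1}, a_{n,2}, \ldots$.
Therefore, we can arrange all the elements of all the sets in the entire collection in the array
$$\begin{array}{ccccc} a_{0,0} & a_{0,1} & a_{0,2} & a_{0,3} & \cdots\\ a_{1,0} & a_{1,1} & a_{1,2} & a_{1,3} & \cdots\\ a_{2,0} & a_{2,1} & a_{2,2} & a_{2,3} & \cdots\\ a_{3,0} & a_{3,1} & a_{3,2} & a_{3,3} & \cdots\\ \vdots & \vdots & \vdots & \vdots & \ddots \end{array}$$
By following the arrows shown in the array below, we can list all the elements in the array. (We can single-thread the two-dimensional array by starting with $a_{0,0}$ and following the arrows as shown.)
$$\begin{array}{cccccccc} a_{0,0} & \to & a_{0,1} & & a_{0,2} & \to & a_{0,3} & \cdots\\ & \swarrow & & \nearrow & & \swarrow & & \\ a_{1,0} & & a_{1,1} & & a_{1,2} & & a_{1,3} & \cdots\\ \downarrow & \nearrow & & \swarrow & & & & \\ a_{2,0} & & a_{2,1} & & a_{2,2} & & a_{2,3} & \cdots\\ a_{3,0} & & a_{3,1} & & a_{3,2} & & a_{3,3} & \cdots\\ \vdots & & \vdots & & \vdots & & \vdots & \ddots \end{array}$$
The list is given by
$$a_{0,0}, a_{0,1}, a_{1,0}, a_{2,0}, a_{1,1}, a_{0,2}, a_{0,3}, a_{1,2}, a_{2,1}, a_{3,0}, \ldots.$$
The rest of the proof proceeds as above.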
9.1 Exercise. Prove that the enumeration of a countable family of countable sets in the proof of Theorem 9.2.7 is given by the function
$$f(m, n) = m + \frac{(m + n)(m + n + 1)}{2}$$
from N × N to N. Prove that f is a bijection. Can you visualize f geometrically? Can you find other explicit bijections N × N ∼ N?
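Before proving the exercise, one may check the formula numerically. The Python sketch below implements f as `pair` and a candidate inverse `unpair` (both names ours): `unpair` first finds the diagonal $w = m + n$ as the largest $w$ with $w(w+1)/2 \le z$, computed exactly with `math.isqrt`, and then reads off $m$ as the offset along that diagonal. Checking finitely many values illustrates, but does not prove, that f is a bijection.

```python
import math


def pair(m: int, n: int) -> int:
    """f(m, n) = m + (m + n)(m + n + 1)/2, as in Exercise 9.1."""
    w = m + n
    return m + w * (w + 1) // 2


def unpair(z: int) -> tuple[int, int]:
    """Candidate inverse of pair: locate the diagonal w = m + n, then m."""
    w = (math.isqrt(8 * z + 1) - 1) // 2   # largest w with w(w + 1)/2 <= z
    m = z - w * (w + 1) // 2               # offset along the diagonal, 0 <= m <= w
    return m, w - m
```

For example, `pair(0, 1)` is 1 and `pair(1, 0)` is 2, matching the positions of $a_{0,1}$ and $a_{1,0}$ in the listing of the proof of Theorem 9.2.7.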
9.2.9 Corollary. Suppose we are given a collection {Sα | α ∈ A} of sets. If the indexing set A is at most countable and if for every α ∈ A the set Sα is at most countable, then the union
$$S = \bigcup_{\alpha \in A} S_\alpha \tag{1}$$
is at most countable.
Proof. Theorem 9.2.7 states that given a collection {Sα | α ∈ N}, where each Sα is countable, the union
$$S = \bigcup_{\alpha=0}^{\infty} S_\alpha = \bigcup_{\alpha \in \mathbb{N}} S_\alpha \tag{2}$$
is countable. If we replace “Sα is countable” with “Sα is at most countable,” we can conclude that S is at most countable. (There are two relevant cases: If S is finite, there is nothing to prove. If S is infinite, the diagonal listing in the proof of Theorem 9.2.7, which simply skips the missing entries $a_{\alpha,k}$ of the finite sets $S_\alpha$, still contains every element of S; after removal of duplicates it is an infinite list without repetitions, so S is countably infinite.) The same holds if we replace N with any set that is at most countable.
Thus, (1) follows from (2) and, with the provisions made, S is at most countable.
9.2.10 Corollary. Let A be a countable set. For n = 1, 2, 3, . . . let $A^n$ denote the set of all n-tuples $(x_1, x_2, \ldots, x_n)$, where $x_k \in A$. Then, each $A^n$ is countable. In other words, every finite Cartesian power of a countable set is countable. (The same argument applies to finite Cartesian products of possibly different countable sets.)
Proof. The proof is by induction. For n = 1, the set $A^1$ is the collection of all singletons (x) with x ∈ A. Therefore, $A^1 \sim A$. Suppose that $A^n$ is countable for some n. Every element of $A^{n+1}$ is of the form $(x_1, \ldots, x_n, x_{n+1})$, where $(x_1, \ldots, x_n) \in A^n$ and $x_{n+1} \in A$. For fixed $(x_1, \ldots, x_n)$, the set of all $(x_1, \ldots, x_n, x_{n+1})$ is equivalent to A. Therefore,
$$A^{n+1} = \bigcup_{(x_1, \ldots, x_n) \in A^n} \{(x_1, \ldots, x_n, x_{n+1}) \mid x_{n+1} \in A\}$$
is a countable union of countable sets. Thanks to the previous corollary, $A^{n+1}$ is at most countable; since it is infinite (it includes a copy of A), it is countable.
9.2.11 Proposition. The set Q of rational numbers is countable.
Proof. Clearly, Q is infinite, since N ⊂ Q, and we proved that N is infinite.
Let Q+ denote the positive rationals, Q− denote the negative rationals and
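write Q = Q− ∪ {0} ∪ Q+. The array
$$\begin{array}{cccccccc} 1/1 & & 1/2 & \to & 1/3 & & 1/4 & \cdots\\ \downarrow & \nearrow & & \swarrow & & \nearrow & & \\ 2/1 & & 2/2 & & 2/3 & & 2/4 & \cdots\\ & \swarrow & & \nearrow & & & & \\ 3/1 & & 3/2 & & 3/3 & & 3/4 & \cdots\\ \downarrow & \nearrow & & & & & & \\ 4/1 & & 4/2 & & 4/3 & & 4/4 & \cdots\\ \vdots & & \vdots & & \vdots & & \vdots & \ddots \end{array}$$
lists the elements of Q+ with lots of duplicates.
We follow the argument in the proof of Theorem 9.2.7 to show that Q+ is countable. Q− is also countable, since Q− ∼ Q+ (via x ↦ −x). Q is a finite union of at most countable sets and is, therefore, at most countable; since it is infinite, it is countable.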
9.2.12 Corollary. There are countably many real numbers possessing a non-strict
base-b expansion to some base b.
Proof. The set S of such real numbers satisfies N ⊂ S ⊂ Q, by Theorem 8.8.6 and
Corollary 8.8.7. Hence S is infinite. Since Q is countable by Proposition 9.2.11, it
follows that S is countable by Proposition 9.2.5.
9.2.13 Remarks. If one knows the Fundamental Theorem of Arithmetic (the prime factorization theorem), we can argue as follows. Let N+ denote the positive integers. Every element of Q+ can be written uniquely as m/n, where m and n are both in N+ and have no common divisor greater than 1. Then,
$$f : \mathbb{Q}^+ \to \mathbb{N} : m/n \mapsto 2^m 3^n$$
is an injection; its image is an infinite subset of N, so Q+ is countable thanks to Proposition 9.2.5.
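The injection $m/n \mapsto 2^m 3^n$ is easy to try out with Python's `fractions.Fraction`, which always stores a rational in lowest terms; the function name is ours. By unique factorization, equal codes force equal exponents of 2 and 3, hence equal reduced fractions.

```python
from fractions import Fraction


def encode_positive_rational(q: Fraction) -> int:
    """Map a positive rational m/n in lowest terms to 2**m * 3**n."""
    if q <= 0:
        raise ValueError("q must be positive")
    # Fraction normalizes numerator/denominator, so 2/4 and 1/2 get the same code.
    return 2 ** q.numerator * 3 ** q.denominator
```

For example, `encode_positive_rational(Fraction(2, 4))` and `encode_positive_rational(Fraction(1, 2))` both return $2^1 3^2 = 18$.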

Alternatively, we can argue as follows. $(\mathbb{N}^+)^2 = \mathbb{N}^+ \times \mathbb{N}^+$ is countable, i.e.: there is a bijection $f : \mathbb{N}^+ \to (\mathbb{N}^+)^2$. (Exercise 9.1 provides an equivalence $\mathbb{N} \sim \mathbb{N}^2$, from which an equivalence $\mathbb{N}^+ \sim (\mathbb{N}^+)^2$ is easily obtained.) The function
$$g : (\mathbb{N}^+)^2 \to \mathbb{Q}^+ : (m, n) \mapsto m/n$$
is surjective. The composition $g \circ f : \mathbb{N}^+ \to \mathbb{Q}^+$ is then surjective, and composing it with the bijection $\mathbb{N} \to \mathbb{N}^+ : k \mapsto k + 1$ yields a surjection $\mathbb{N} \to \mathbb{Q}^+$. (A bijection composed with a surjection yields a surjection.) Hence, since Q+ is infinite, it is countable by Theorem 9.2.6.
The rest of the proof (the passage from Q+ to Q) follows as that of Proposition 9.2.11 above.
9.2.14 Proposition. The interval [0, 1] is uncountable.
Proof. The proof is by contradiction. Assume [0, 1] is not uncountable. Since [0, 1] is certainly not finite, it must be countable, and so must [0, 1) (an infinite subset of a countable set, by Proposition 9.2.5). Let
$$S = \{x_j \mid j \in \mathbb{N}^+\}$$
be an enumeration of the elements of [0, 1). For $j \in \mathbb{N}^+$, let
$$E^{(j)} : +(\tilde 0 . \tilde n^{(j)}_{-1} \tilde n^{(j)}_{-2} \ldots)_{10}$$
be the unique strict decimal expansion of $x_j$. Define a decimal expansion $E : +(\tilde 0 . \tilde n_{-1} \tilde n_{-2} \ldots)_{10}$ by
$$n_{-j} = \begin{cases} 0 & \text{if } n^{(j)}_{-j} \neq 0,\\ 1 & \text{if } n^{(j)}_{-j} = 0, \end{cases} \qquad \text{for } j = 1, 2, \ldots.$$
Then, for each $j \in \mathbb{N}^+$, E is a different expansion from $E^{(j)}$ since $n_{-j} \neq n^{(j)}_{-j}$. Note that E is strict since it contains no numerals $\tilde 9$; hence, by the uniqueness of strict expansions (Theorem 8.8.6), E denotes a real number $x \in [0, 1)$ different from $x_j$ for $j = 1, 2, \ldots$. This contradicts the assumption that S is an enumeration of all the elements of [0, 1).
9.2.15 Remark. Proposition 9.2.14 is due to Cantor. The elegant indirect argument used in the proof is known as “Cantor’s diagonal argument” because, if the digits of the expansions $E^{(1)}, E^{(2)}, \ldots$ are written in a square array,
$$\begin{array}{lllllll} E^{(1)} : & 0 & . & n^{(1)}_{-1} & n^{(1)}_{-2} & n^{(1)}_{-3} & \ldots\\ E^{(2)} : & 0 & . & n^{(2)}_{-1} & n^{(2)}_{-2} & n^{(2)}_{-3} & \ldots\\ E^{(3)} : & 0 & . & n^{(3)}_{-1} & n^{(3)}_{-2} & n^{(3)}_{-3} & \ldots\\ \vdots & & & \vdots & \vdots & \vdots & \ddots \end{array}$$
then E is obtained by traversing (and successively choosing different digits from those found on) the diagonal.
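The diagonal rule can be run on finitely many truncated expansions to see how E escapes each row. In the Python sketch below, each $E^{(j)}$ is represented (our convention) by a string of its first decimal digits after the point, and the sample rows are arbitrary. A finite truncation only illustrates the construction; the proof needs all infinitely many rows and digits.

```python
def diagonal_digits(expansions):
    """First N digits of E from the first N digits of E(1), ..., E(N).

    Digit j of E is 0 if digit j of E(j) is nonzero, and 1 if it is 0,
    following the definition of n_{-j} in the proof of Proposition 9.2.14.
    Each string must have at least as many digits as there are rows.
    """
    return "".join("0" if e[j] != "0" else "1" for j, e in enumerate(expansions))


rows = ["3141592", "2718281", "0000000", "1414213", "5000000"]
diag = diagonal_digits(rows)
```

Here `diag` is `"00101"`: it differs from row j in digit j, and it never uses the numeral 9, which is why the resulting expansion is strict.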
9.2.16 Corollary. R is uncountable.
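Proof. R contains the uncountable set [0, 1] as a subset. If R were countable, then [0, 1], being an infinite subset of R, would be countable by Proposition 9.2.5, contradicting Proposition 9.2.14.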
9.2 Exercise. Prove that there are uncountably many lax binary (base-2) expan-
sions of numbers in [0, 1].
9.2.17 Remark. Cantor’s diagonal argument provided in the proof of Proposi-
tion 9.2.14 works just fine in any base b ≥ 3 but not directly in base 2.
Exercises 9.2 and 8.22 above, combined with the results of Note 7.4, provide an
alternate proof of the same proposition using base-2 expansions.
9.2.18 Definition (Cantor Set). The Cantor set is the set of all x ∈ [0, 1] having a ternary (base-3) lax expansion of the form
$$E : +(\tilde 0 . \tilde n_{-1} \tilde n_{-2} \ldots)_3$$
where $n_j \neq 1$ for all $j < 0$ (that is, $n_j$ is 0 or 2 but not 1). We call any such expansion E a zero/two ternary expansion.
Figure 9.1: Successive approximations to the Cantor set.
(Note that we allow lax expansions. In fact, by Theorem 8.8.6, the Cantor set contains every number $3^{-l}$ with $l \ge 0$: such a number has the non-strict ternary expansion with numeral $\tilde 0$ in positions $-1, \ldots, -l$ (none if $l = 0$) and $\tilde 2$ in all later positions, which contains no numeral $\tilde 1$. Therefore the Cantor set is infinite.)
9.3 Exercise. Let C be the Cantor set.
1. Use Exercise 9.2 to prove that C is uncountable.
2. For any ϵ > 0, prove that there exists a finite family of intervals $\mathcal{I} = \{[a_i, b_i] \mid i = 1, 2, \ldots, m\}$ (for some $m \in \mathbb{N}^+$) such that
$$\sum_{i=1}^{m} (b_i - a_i) < \epsilon \quad\text{and}\quad C \subset \bigcup \mathcal{I} = \bigcup_{i=1}^{m} [a_i, b_i].$$
Intuitively speaking, the Cantor set has null total length (“zero measure”), yet it is still uncountable!
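A natural candidate family for part 2 is the collection of $2^n$ closed intervals left after n steps of removing open middle thirds; we assume this is the picture in Figure 9.1, which is not reproduced here. The Python sketch below builds these intervals exactly with `fractions.Fraction` and picks the least n with total length $(2/3)^n < \epsilon$. It does not prove that the intervals cover C; that is what the exercise asks.

```python
from fractions import Fraction


def cantor_stage(n: int):
    """Closed intervals (a, b) remaining after n middle-third removals from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt.append((a, a + third))    # keep the left closed third
            nxt.append((b - third, b))    # keep the right closed third
        intervals = nxt
    return intervals


def total_length(intervals) -> Fraction:
    return sum((b - a for a, b in intervals), Fraction(0))


def cover_for(eps: Fraction):
    """Least n with (2/3)**n < eps, together with the stage-n intervals."""
    n = 0
    while Fraction(2, 3) ** n >= eps:
        n += 1
    return n, cantor_stage(n)


n, family = cover_for(Fraction(1, 10))
```

With ϵ = 1/10 this gives n = 6: 64 intervals of total length 64/729. The point 1/4 = (0.0202…)₃ of the Cantor set lies in one of them, as it must.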
9.4 Exercise. The Cantor function is the function f : [0, 1] → [0, 1] defined as
follows.
First, define a transformation $d \mapsto \delta$ of ternary numerals {0, 1, 2} to binary numerals {0, 1} by
$$0 \mapsto 0, \qquad 1 \mapsto 1, \qquad 2 \mapsto 1.$$
Second, for x ∈ [0, 1], write x in base 3 as $x = (0.d_1 d_2 \ldots)_3$ (lax expansions are allowed).
If none of the digits $d_j$ of x is 1 (i.e., if x is in the Cantor set C), define
$$f(x) = (0.\delta_1 \delta_2 \ldots)_2$$
by applying the transformation $d \mapsto \delta$ to each ternary digit $d_j$ of x.
Otherwise, there exists a least $n \in \mathbb{N}^+$ such that $d_n = 1$. In this case, define
$$f(x) = (0.\delta_1 \delta_2 \ldots \delta_n)_2.$$
Succinctly: Apply the substitution $d \mapsto \delta$ to the digits of the ternary expansion of x, up to and including, but not beyond, the position (if any) where the first digit 1 is found. Interpret the result as a binary number.
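The recipe can be sketched in Python for rational inputs, generating ternary digits exactly with `fractions.Fraction`. The sketch always uses the strict expansion (digits obtained by repeated multiplication by 3 and taking the integer part) and treats x = 1 separately. It therefore relies on part 1 below (that both expansions give the same f(x)), which it does not prove. When no digit 1 appears among the first `max_digits` digits, the truncated result is within $2^{-\text{max\_digits}}$ of f(x).

```python
from fractions import Fraction


def cantor_function(x: Fraction, max_digits: int = 60) -> Fraction:
    """Cantor function via the digit recipe, using the strict ternary expansion of x."""
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    if x == 1:
        return Fraction(1)                 # 1 = (0.222...)_3 goes to (0.111...)_2 = 1
    result, weight = Fraction(0), Fraction(1, 2)
    for _ in range(max_digits):
        x *= 3
        d = x.numerator // x.denominator   # next ternary digit: 0, 1 or 2
        x -= d
        if d > 0:
            result += weight               # delta = 1 when d is 1 or 2
        if d == 1:
            return result                  # stop at the first digit 1: exact value
        weight /= 2
    return result                          # truncated binary expansion
```

For example, 1/3 and 2/3 (the ends of the first removed interval) and 1/2 (inside it) all give 1/2, while 1/4 = (0.0202…)₃ gives approximately (0.0101…)₂ = 1/3.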
1. Prove that f is well defined, that is, if x has both a strict and a non-strict
expansion, then both recipes for f (x) agree.
2. Prove that f satisfies the following properties:
• f(0) = 0 and f(1) = 1;
• f(1 − x) = 1 − f(x) for all x ∈ [0, 1];
• f(x/3) = f(x)/2 for all x ∈ [0, 1];
• f is monotonically increasing.
3. Prove that the properties in 2. uniquely characterize the Cantor function.
4. Define functions A : [0, 1] → [0, 1] and B : [0, 1] → [0, 1] by
$$A(z) = \inf\{x \in [0, 1] \mid f(x) = z\} = \inf(f^*(\{z\})), \qquad B(z) = \sup\{x \in [0, 1] \mid f(x) = z\} = \sup(f^*(\{z\})).$$
Prove the following characterization of the functions A, B:
• A(1) = B(1) = 1.
• If $z = (0.\delta_1 \delta_2 \ldots \delta_{n-1} 1)_2$ has a binary expansion terminating at the n-th digit $\delta_n = 1$, then
$$A(z) = (0.d_1 d_2 \ldots d_{n-1} 1)_3, \qquad B(z) = (0.d_1 d_2 \ldots d_{n-1} 2)_3$$
with $d_j = 2\delta_j$ for $j = 1, 2, \ldots, n - 1$.
• If $z = (0.\delta_1 \delta_2 \ldots)_2$ has a proper, nonterminating binary expansion, then
$$A(z) = B(z) = (0.d_1 d_2 \ldots)_3$$
with $d_j = 2\delta_j$ for $j = 1, 2, \ldots$.
5. Prove that A, B take values in the Cantor set C and are right inverses to f:
$$f(A(z)) = z \quad\text{and}\quad f(B(z)) = z \quad\text{for all } z \in [0, 1].$$
More specifically, show that $f^*(\{z\})$ is the closed interval [A(z), B(z)]. Note that this interval has nonzero length only if z has a terminating binary expansion.
The graph of y = f (x) is known as the “devil’s staircase”. The staircase
climbs up as one moves from left to right. The intervals [A(z), B(z)] at
height z (having nonzero length) are the “steps” of the staircase.
6. Prove that the devil’s staircase has countably infinitely many steps (of nonzero length), and that it “steps up” exactly at irrational points x ∈ C (of which there are uncountably many).
7. Prove that the sum (series) of the (positive) lengths of the steps of the devil’s
staircase is equal to 1. Hence, the steps essentially “cover the entire length” of
the interval [0, 1], so to speak. (Note: this is only a heuristic interpretation.)
8. Roughly sketch the devil’s staircase (f is so complicated that any drawing is necessarily only an approximation to its true graph).
9.2.19 Remark. We have the chain N ∼ Z ∼ Q ⪯ R. We shall show that
Q ≺ R below.
9.2.20 Theorem. $\mathbb{R} \sim 2^{\mathbb{N}}$.
Proof. We establish a bijection R ∼ I, where I = {x ∈ R | −1 < x < 1}. This is easy to do; take
$$f : I \to \mathbb{R} : x \mapsto \frac{x}{1 - x^2}.$$
Its inverse is $f^{-1} : y \mapsto 2y/\bigl(1 + \sqrt{1 + 4y^2}\bigr)$.
Now, we define an injection $g : I \to 2^{\mathbb{N}}$ as follows. As shown in Note 8, every real x has a base-2 expansion. Thus, every x ∈ I has a unique strict binary expansion of the form
$$\pm 0.b_{-1}(x)\, b_{-2}(x) \cdots,$$
with $b_k(x) = b_k(-x) \in \{0, 1\}$. Define the element $g(x) \in 2^{\mathbb{N}} = \{0, 1\}^{\mathbb{N}}$ (that is, g(x) is a countable sequence of binary digits 0, 1) by
$$g(x) = \begin{cases} \{0, b_{-1}(x), b_{-2}(x), \ldots\} & \text{if } x < 0,\\ \{1, b_{-1}(x), b_{-2}(x), \ldots\} & \text{if } x \ge 0. \end{cases}$$
g is an injection by the uniqueness of strict binary expansions (Theorem 8.8.6), and so is $g \circ f^{-1} : \mathbb{R} \to 2^{\mathbb{N}}$. Therefore $\mathbb{R} \preceq 2^{\mathbb{N}}$.
Now we show $2^{\mathbb{N}} \preceq \mathbb{R}$. Fix any base b > 2. Given any sequence
$$a = \{a_0, a_1, a_2, \ldots\}$$
in $2^{\mathbb{N}} = \{0, 1\}^{\mathbb{N}}$ (that is, $a_j \in \{0, 1\}$ for all j ∈ N) let X(a) be the real number with base-b expansion
$$+(a_0 . a_1 a_2 \ldots)_b.$$
Since b > 2, we have b − 1 > 1, and since $a_j \le 1$ for all j ∈ N, no digit b − 1 appears in the expansion above, which is therefore strict. By the uniqueness of (strict) base-b expansions, $a \mapsto X(a)$ is an injection $2^{\mathbb{N}} \to \mathbb{R}$, so $2^{\mathbb{N}} \preceq \mathbb{R}$.
From $\mathbb{R} \preceq 2^{\mathbb{N}}$ and $2^{\mathbb{N}} \preceq \mathbb{R}$, thanks to the Schröder-Bernstein theorem we have $\mathbb{R} \sim 2^{\mathbb{N}}$.
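Two ingredients of the proof can be checked numerically in Python: that the stated formula for $f^{-1}$ really inverts $f(x) = x/(1 - x^2)$, and that distinct 0/1 sequences give distinct values $X(a)$ in base 3. The sketch uses floating point for f (so round trips hold only up to rounding) and exact `fractions.Fraction` arithmetic for finite prefixes of a, with b = 3 as one admissible choice of base. A finite check illustrates the injectivity claim; it does not replace the uniqueness theorem.

```python
import math
from fractions import Fraction


def f(x: float) -> float:
    """Bijection (-1, 1) -> R from the proof: x / (1 - x**2)."""
    return x / (1 - x * x)


def f_inv(y: float) -> float:
    """Stated inverse: 2y / (1 + sqrt(1 + 4y**2)), which lies in (-1, 1)."""
    return 2 * y / (1 + math.sqrt(1 + 4 * y * y))


def X(a, b: int = 3) -> Fraction:
    """Exact value of +(a0 . a1 a2 ... a_{N-1})_b for a finite 0/1 prefix a."""
    return sum((Fraction(d, b ** j) for j, d in enumerate(a)), Fraction(0))
```

For instance, all 256 prefixes of length 8 give 256 different values of `X`, in line with the strictness argument (no digit b − 1 = 2 ever appears).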
9.2.21 Corollary. N ≺ R.
Proof. By Cantor’s theorem we have $\mathbb{N} \prec 2^{\mathbb{N}}$, and by Theorem 9.2.20 we have $2^{\mathbb{N}} \sim \mathbb{R}$. Hence $\mathbb{N} \prec 2^{\mathbb{N}} \sim \mathbb{R}$, that is, $\mathbb{N} \prec \mathbb{R}$.
9.2.22 Remark (The Continuum Hypothesis). Any set (such as $2^{\mathbb{N}}$) equivalent to R is said to have the cardinality of the continuum. The continuum hypothesis, abbreviated CH, asserts that there is no set whose cardinality is strictly between the cardinalities of N (countable cardinality) and R (continuum cardinality), that is, CH is the assertion that there does not exist a set S such that
$$\mathbb{N} \prec S \prec 2^{\mathbb{N}}.$$
Some sixty years after CH was formulated, Kurt Gödel (1940) showed that assuming CH introduces no inconsistency into ZFC, and in 1963 Paul Cohen showed that assuming the denial of CH introduces none either. That is, if ZFC is consistent, it remains consistent whether CH is assumed to be true, or CH is assumed to be false. One may say that CH is a statement independent from the ZFC axioms: neither CH nor the negation of CH is a theorem of ZFC set theory. The continuum hypothesis is probably the most famous example of an undecidable statement (or an incompleteness) in ZFC set theory.
9.3 Open and Closed Sets

9.3.1 Definition (Distance Between Two Points). The distance d(x, y) between the reals x and y is the nonnegative real number |y − x|. The notion of neighborhood defined below is based on this notion of distance. Note that for all real x and y we have
x = y ⇔ |y − x| = 0.
9.3.2 Definition (Neighborhood). For ϵ > 0, the ϵ-neighborhood $N_\epsilon(x)$ of x is the set
$$N_\epsilon(x) = \{\, y \in \mathbb{R} \mid |y - x| < \epsilon \,\}.$$
In other words, $N_\epsilon(x)$ consists precisely of all those points in R whose distance from x is less than ϵ (the ϵ-neighbors of x).
More generally, a neighborhood of x is any set N such that $N \supset N_\epsilon(x)$ for some ϵ > 0. We sometimes write N(x) to denote a neighborhood of x. (Note that if y ∈ N(x) and y ≠ x then N(x) need not be a neighborhood of y; see the Exercise below. For instance, [x − 1, x + 1] is a neighborhood of x but not of x + 1.)
9.3.3 Proposition. Fix reals x and y. If
(1) $y \in N_\epsilon(x)$ for all ϵ > 0
then x = y. In other words: If |y − x| < ϵ for all ϵ > 0 then x = y.
Proof. We prove the contrapositive. If x ≠ y, let ϵ = |y − x| > 0. Then |y − x| ≮ ϵ, so $y \notin N_\epsilon(x)$.
9.3.4 Proposition. The topology of R possesses the Hausdorff separation property, namely:
For any x, y ∈ R, if x ≠ y, there exist respective disjoint neighborhoods N(x) and N(y).
Proof. Let ϵ = |y − x|/2 and $N(x) = N_\epsilon(x) = (x - \epsilon, x + \epsilon)$, $N(y) = N_\epsilon(y) = (y - \epsilon, y + \epsilon)$. By the triangle inequality (in the form of Corollary 9.1.6),
$$\begin{aligned} z \in N_\epsilon(x) &\implies |z - x| < \epsilon\\ &\implies |z - y| = |(y - x) - (z - x)| \ge |y - x| - |z - x| > 2\epsilon - \epsilon = \epsilon\\ &\implies z \notin N_\epsilon(y), \end{aligned}$$
so the neighborhoods $N_\epsilon(x)$ and $N_\epsilon(y)$ are disjoint.
9.3.5 Definition (Interior, Limit and Isolated Points). Let S be a subset of R and let x be a point in R. Then
(1) x is called an interior point of S if S is a neighborhood of x, i.e., $N_\epsilon(x) \subset S$ for some ϵ > 0 (note that, if x ∈ R is an interior point of S, then necessarily x ∈ S);
(2) x is called a limit (or cluster) point of S if every neighborhood of x contains a point in S other than x. In other words, x is a limit point of S if and only if for every neighborhood N(x) of x we have
(∃y)(y ≠ x ∧ y ∈ S ∩ N(x)).
Note that a limit point of S need not itself be in S.
(3) If x ∈ S is not a limit point of S then it is called an isolated point of S. This is equivalent to $S \cap (N_\epsilon(x) \setminus \{x\}) = \emptyset$ for some ϵ > 0. Note that an interior point is not isolated, hence an isolated point is never an interior point.
9.3.6 Definition. A subset S of R is called

(1) open, if every point in S is an interior point of S;
(2) closed, if every limit point of S belongs to S;
(3) perfect, if S is closed and every point of S is a limit point of S;
(4) dense in a subset T of R, if every point of T is in S or is a limit point of S.
9.3.7 Examples.
1. The empty set ∅ is open. It is also closed. The same is true of (−∞, +∞).
2. All closed intervals [a, b] as well as any infinite closed intervals [a, +∞) and
(−∞, b] are closed subsets of R. Any open interval (a, b) as well as any infinite
open intervals (−∞, b) and (a, +∞) are open subsets of R. (These statements
are true not only for a 0 whereas a ≤ x − ϵ and x + ϵ ≤ b, so
Nϵ (x) = (x − ϵ, x + ϵ) ⊂ (a, b).
3. For a < b, every point of the interval [a, b) is a limit point of [a, b). The point b is also a limit point of [a, b) although b ∉ [a, b).
4. Define S = {1, 1/2, 1/3, . . .}. S has only one limit point, namely 0, although 0 ∉ S. In fact, for any ϵ > 0 there always exists $n \in \mathbb{N}^+$ with 1/n < ϵ, thus $1/n \in S \cap N_\epsilon(0)$. Since any neighborhood N(0) contains such an $N_\epsilon(0)$, it follows that 0 is a limit point of S.
On the other hand, every point of S is an isolated point. Indeed, for n > 1, the distance between 1/n and 1/(n − 1) is 1/[n(n − 1)], while the distance between 1/n and 1/(n + 1) is 1/[n(n + 1)]. Take
$$\epsilon = \frac{1}{n(n+1)}. \tag{i}$$
Then, we have
$$\frac{1}{n} + \epsilon = \frac{1}{n} + \frac{1}{n(n+1)} < \frac{1}{n} + \frac{1}{n(n-1)} = \frac{n}{n(n-1)} = \frac{1}{n-1}.$$
We also have
$$\frac{1}{n} - \epsilon = \frac{1}{n} - \frac{1}{n(n+1)} = \frac{n}{n(n+1)} = \frac{1}{n+1}.$$
We have shown that
$$\frac{1}{n+1} = \frac{1}{n} - \epsilon < \frac{1}{n} < \frac{1}{n} + \epsilon < \frac{1}{n-1},$$
i.e., that $N_\epsilon(1/n) \cap (S \setminus \{1/n\}) = \emptyset$ for ϵ given by (i) and n > 1, thus 1/n is an isolated point of S. For n = 1, choose ϵ = 1/2 to conclude that $N_{1/2}(1) \cap (S \setminus \{1\}) = \emptyset$, showing 1 is an isolated point of S. Every point of S is isolated, and S has empty interior (has no interior points).
It remains to show that nonzero points not in S are not limit points of S. If x < 0 then take ϵ = −x > 0: $N_\epsilon(x)$ is disjoint from S, showing x is not a limit point of S. If x > 1 take ϵ = x − 1 to show x is not a limit point. Otherwise $x \in (1/(n+1), 1/n)$ for a unique $n \in \mathbb{N}^+$: take $\epsilon = \min(x - 1/(n+1),\ 1/n - x)$. It is easily shown that $N_\epsilon(x)$ is disjoint from S, so x is not a limit point of S.
5. Define
$$S = \{1, -1, 1/2, -1/2, 1/3, -1/3, \ldots\} = \{1, 1/2, 1/3, \ldots\} \cup \{-1, -1/2, -1/3, \ldots\}.$$
S has only one limit point: 0. Every point of S is an isolated point.
6. Every point in R is a limit point of the rationals Q. Every point in Q is a limit point of Q. (See Note 8.)
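The radii used in Example 4 can be checked with exact rational arithmetic. In the Python sketch below (function names ours), `isolated_among_first` confirms that no other point 1/m with m up to a chosen bound lies in the open neighborhood $N_\epsilon(1/n)$, and `witness_near_zero` produces an n with 1/n < ϵ, the witness used to show that 0 is a limit point. Only finitely many points are checked, so this complements rather than replaces the inequalities above.

```python
import math
from fractions import Fraction


def isolating_eps(n: int) -> Fraction:
    """Radius from Example 4: 1/(n(n+1)) for n > 1, and 1/2 for n = 1."""
    return Fraction(1, 2) if n == 1 else Fraction(1, n * (n + 1))


def isolated_among_first(n: int, terms: int) -> bool:
    """True if no 1/m with m != n and 1 <= m <= terms lies in N_eps(1/n)."""
    center, eps = Fraction(1, n), isolating_eps(n)
    return all(abs(Fraction(1, m) - center) >= eps
               for m in range(1, terms + 1) if m != n)


def witness_near_zero(eps: Fraction) -> int:
    """Some n in N+ with 1/n < eps, so that 1/n lies in S and in N_eps(0)."""
    return math.floor(1 / eps) + 1
```

Note that the neighbor 1/(n + 1) sits exactly at distance ϵ from 1/n, which is why the check uses `>=`: the neighborhood $N_\epsilon(1/n)$ is open.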
9.3.8 Examples. Some sets are open, some are closed, some are both and some
are neither.
1. Open intervals are open sets.
2. The empty set is open vacuously (there is no x ∈ ∅, so we can find a neighborhood N(x) ⊂ ∅ for every x ∈ ∅). Note that ∅ is also closed, since every limit point of ∅ (there are none) belongs to ∅.
3. Clearly, (−∞, +∞) is open, since it contains $N_1(x) = (x - 1, x + 1)$ for every x ∈ R. The interval (−∞, +∞) is also closed, since every limit point belongs to (−∞, +∞).
4. The interval [a, b] is closed. Moreover, every point of the interval is a limit
point. Every point in (a, b) is an interior point of [a, b].
5. The intervals (a, b) and (a, b] are not closed: a is a limit point that does not
belong to the interval. The interval (a, b] is neither open nor closed.
9.3.9 Proposition. The following intervals are open:
∅, (a, b), (a, +∞), (−∞, b), (−∞, +∞).
The following intervals are closed:
∅, [a, b], (−∞, b], [a, +∞), (−∞, +∞).

9.3.10 Remarks. Note that R = (−∞, +∞) and ∅ are both open and closed.
Intervals of the form [a, b), (a, b] with a < b are sometimes called semiopen, but we will rarely use this nomenclature since there is nothing particularly special about semiopen intervals (they are neither open nor closed).
On the other hand, intervals of the form (−∞, b] and [a, +∞) are closed sets, as
we invite the reader to verify. Although some authors call these intervals semiopen,
we will refrain from this practice.
9.3.11 Proposition. If x is a limit point of the set S then every neighborhood of x contains infinitely many points of S.
Proof. Suppose that x is a limit point of S and that some neighborhood N of x contains only finitely many points of S. Choose δ > 0 with Nδ(x) ⊂ N. Since x is a limit point of S, the set N ∩ (S \ {x}) is nonempty; since N ∩ S is finite, N ∩ (S \ {x}) = {x1, . . . , xn} for some n ≥ 1. Define
ϵ = min{δ, |x − x1|, . . . , |x − xn|}.
Then ϵ > 0, and Nϵ(x) = (x − ϵ, x + ϵ) ⊂ N is a neighborhood of x that contains no points of S other than (possibly) x. Hence x cannot be a limit point of S. We have a contradiction. Therefore, if x is a limit point of S then no neighborhood of x can contain only finitely many points of S; that is, every neighborhood of x contains infinitely many points of S.
9.4 Properties of Open and Closed Sets

9.4.1 Proposition. An arbitrary union of open sets is open.
Proof. Let {Sα | α ∈ A} be an arbitrary collection of open sets. Define S = ∪α∈A Sα. Let x ∈ S. Then x ∈ Sβ for at least one β ∈ A. Since Sβ is open, there is some open interval I such that x ∈ I ⊂ Sβ, whence
$$x \in I \subset S_\beta \subset \bigcup_{\alpha\in A} S_\alpha = S,$$
showing that S is open.

9.4.2 Proposition. Every open set is a union of open intervals.
Proof. Let S be an open set. For each x ∈ S there is an open interval Ix such that x ∈ Ix ⊂ S. Consider the union ∪y∈S Iy. Every x ∈ S belongs to this union, since
$$x \in S \;\Rightarrow\; x \in I_x \;\Rightarrow\; x \in \bigcup_{y\in S} I_y,$$
showing that S ⊂ ∪y∈S Iy. On the other hand, x ∈ ∪y∈S Iy implies x ∈ Iy ⊂ S for some y ∈ S. Therefore, ∪y∈S Iy ⊂ S. We conclude that S = ∪y∈S Iy.
9.4.3 Proposition. Every open set is a countable union of open intervals.
Proof. Let S be an open set. For each x ∈ S there is an open interval Ix such that x ∈ Ix ⊂ S. Say that Ix = (ax, bx). Then ax < x < bx. By the density of Q in R, there are rationals âx and b̂x such that ax < âx < x < b̂x < bx. Therefore, for open S, we have x ∈ S ⇒ x ∈ (âx, b̂x) ⊂ S. The same reasoning used in the proof of Proposition 9.4.2 gives S = ∪x∈S (âx, b̂x). Although the index x ranges over all of S, there are only countably many intervals with rational endpoints, so at most countably many distinct intervals occur in this union; hence it is a countable union.
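The density step in the proof of Proposition 9.4.3 can be made concrete. Below is a minimal Python sketch, assuming the endpoints are given as exact rationals or floats (converted exactly by `fractions.Fraction`); `rational_between` and `rational_subinterval` are illustrative names, not part of these notes. It picks a denominator q with 1/q smaller than the gap, then the first multiple of 1/q past the left endpoint.

```python
from fractions import Fraction
import math


def rational_between(lo, hi):
    """Return a rational r with lo < r < hi (requires lo < hi)."""
    lo, hi = Fraction(lo), Fraction(hi)    # floats are converted exactly
    if not lo < hi:
        raise ValueError("need lo < hi")
    q = math.floor(1 / (hi - lo)) + 1      # q > 1/(hi - lo), so 1/q < hi - lo
    p = math.floor(lo * q) + 1             # lo < p/q <= lo + 1/q < hi
    return Fraction(p, q)


def rational_subinterval(a, x, b):
    """Given a < x < b, return rationals (a_hat, b_hat) with a < a_hat < x < b_hat < b."""
    return rational_between(a, x), rational_between(x, b)


a_hat, b_hat = rational_subinterval(1.41, 2 ** 0.5, 1.42)
print(a_hat < 2 ** 0.5 < b_hat)            # prints True
```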
9.4.4 Proposition. A set is open if and only if its complement is closed.
Proof. Let S be an open subset of R, and let x be any limit point of R \ S. Every neighborhood of x contains some point of R \ S (other than x), so no neighborhood of x is included in S. Therefore, x is not an interior point of S. Because S is open, every point of S is an interior point of S; so x cannot belong to S, that is, x ∈ R \ S. We have shown that every limit point of R \ S belongs to R \ S. That is, R \ S is closed.
Let S be a subset of R such that its complement R \ S is closed. Every x ∈ S satisfies x ∉ R \ S; therefore x is not a limit point of R \ S (every limit point of this set belongs to the set, because it is closed). Thus, there is a neighborhood Ix of x containing no point of R \ S other than possibly x, and since x ∉ R \ S, we get Ix ∩ (R \ S) = ∅. That is:
$$\varnothing = I_x \cap (\mathbb{R}\setminus S) = (I_x \cap \mathbb{R}) \setminus (I_x \cap S) = I_x \setminus (I_x \cap S) \;\Rightarrow\; I_x \cap S = I_x \;\Rightarrow\; I_x \subset S,$$
showing that for every x ∈ S there is a neighborhood Ix ⊂ S of x. That is, S is open.
9.4.5 Corollary. A set is closed if and only if its complement is open.
Proof. We proved
(1) S is open ⇔ R \ S is closed.
Given a set T, define S = R \ T, so that R \ S = R \ (R \ T). Thanks to (1), we have
R \ T is open ⇔ R \ (R \ T) is closed,
giving the desired result, since R \ (R \ T) = T.
9.5 Exercise. Show that R and ∅ are the only subsets of the reals that are simul-
taneously open and closed.
9.4.6 Example. The complement of the interval [a, b] is (−∞, a) ∪ (b, +∞), the union of two open sets, which is an open set (Proposition 9.4.1). Therefore, [a, b] is a closed set (Corollary 9.4.5).
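9.4.7 Proposition. De Morgan’s Laws The following generalize the De Morgan laws of Note 3. Let {Sα | α ∈ A} be an arbitrary nonempty collection of sets and let T be any set. Then,
$$\text{(1)}\quad T \setminus \Big(\bigcup_{\alpha\in A} S_\alpha\Big) = \bigcap_{\alpha\in A} (T \setminus S_\alpha),$$
$$\text{(2)}\quad T \setminus \Big(\bigcap_{\alpha\in A} S_\alpha\Big) = \bigcup_{\alpha\in A} (T \setminus S_\alpha).$$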
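Proof. We prove (1): We have
$$\begin{aligned}
x \in T \setminus \Big(\bigcup_{\alpha\in A} S_\alpha\Big) &\iff x \in T \wedge x \notin \bigcup_{\alpha\in A} S_\alpha \\
&\iff (\forall \alpha \in A)\,(x \in T \wedge x \notin S_\alpha) \\
&\iff (\forall \alpha \in A)\,(x \in T \setminus S_\alpha) \\
&\iff x \in \bigcap_{\alpha\in A} (T \setminus S_\alpha).
\end{aligned}$$
We prove (2): We have
$$\begin{aligned}
x \in T \setminus \Big(\bigcap_{\alpha\in A} S_\alpha\Big) &\iff x \in T \wedge x \notin \bigcap_{\alpha\in A} S_\alpha \\
&\iff (\exists \alpha \in A)\,(x \in T \wedge x \notin S_\alpha) \\
&\iff (\exists \alpha \in A)\,(x \in T \setminus S_\alpha) \\
&\iff x \in \bigcup_{\alpha\in A} (T \setminus S_\alpha).
\end{aligned}$$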
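For finite families of finite sets, both identities of Proposition 9.4.7 can be spot-checked with Python's built-in sets. This sketch assumes a nonempty family (the intersection of an empty family is not defined here), and `de_morgan_holds` is a hypothetical helper; such a check illustrates the laws but proves nothing about arbitrary collections.

```python
from functools import reduce


def de_morgan_holds(T, family):
    """Check both identities of Proposition 9.4.7 for a finite, nonempty family
    of finite sets (a sanity check, not a proof)."""
    T = set(T)
    family = [set(S) for S in family]
    if not family:
        raise ValueError("the family must be nonempty")
    union = set().union(*family)
    intersection = reduce(set.intersection, family)
    complements = [T - S for S in family]          # the sets T \ S_alpha
    law1 = (T - union) == reduce(set.intersection, complements)
    law2 = (T - intersection) == set().union(*complements)
    return law1 and law2


print(de_morgan_holds(range(10), [{1, 2, 3}, {2, 3, 4}, {3, 5, 11}]))   # prints True
```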
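9.4.8 Proposition. An arbitrary intersection of closed sets is closed.
Proof. Let {Cα | α ∈ A} be a collection of closed sets, and for each α ∈ A let Sα = R \ Cα, which is open by Corollary 9.4.5; thus Cα = R \ Sα. Thanks to De Morgan’s laws, we have
$$\text{(1)}\quad \mathbb{R} \setminus \Big(\bigcup_{\alpha\in A} S_\alpha\Big) = \bigcap_{\alpha\in A} (\mathbb{R} \setminus S_\alpha) = \bigcap_{\alpha\in A} C_\alpha.$$
Since each Sα is open, so is ∪α∈A Sα (Proposition 9.4.1), so the left side of (1) is closed by Proposition 9.4.4. We conclude that the intersection ∩α∈A Cα on the right side of (1) is closed.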
9.6 Exercise. Prove that the Cantor set is closed.
9.4.9 Proposition. A finite intersection of neighborhoods of a point x is a neighborhood of x.
Proof. Let S = S1 ∩ · · · ∩ Sn, where each Si is a neighborhood of x. Since each Si is a neighborhood of x, for every i there is an open interval (ai, bi) ⊂ Si containing x. Thus, ai < x < bi for i = 1, . . . , n. Define a = max{a1, a2, . . . , an} and b = min{b1, b2, . . . , bn}. Since this maximum and minimum are attained by some ai and some bj, we still have a < x < b. Then (a, b) is a neighborhood of x contained in (a1, b1) ∩ · · · ∩ (an, bn) ⊂ S1 ∩ · · · ∩ Sn = S.
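The choice of (a, b) in the proof of Proposition 9.4.9 amounts to a maximum and a minimum over finitely many endpoints. A minimal Python sketch, assuming each neighborhood is already given as an open interval (ai, bi) containing x; `common_neighborhood` is an illustrative name.

```python
from fractions import Fraction


def common_neighborhood(x, intervals):
    """For finitely many open intervals (a_i, b_i) containing x, return (a, b)
    with a = max a_i and b = min b_i, as in the proof of Proposition 9.4.9."""
    if not intervals:
        raise ValueError("need at least one interval")
    if not all(lo < x < hi for lo, hi in intervals):
        raise ValueError("every interval must contain x")
    a = max(lo for lo, _ in intervals)     # attained by some a_i, so a < x
    b = min(hi for _, hi in intervals)     # attained by some b_i, so x < b
    return a, b


first_ten = [(-1 - Fraction(1, n), 1 + Fraction(1, n)) for n in range(1, 11)]
print(common_neighborhood(1, first_ten))
# prints (Fraction(-11, 10), Fraction(11, 10))
```

Applied with x = 1 to the first N of the intervals (−1 − 1/n, 1 + 1/n) from Remark 9.4.11, it returns (−1 − 1/N, 1 + 1/N): each finite stage yields a neighborhood of 1, but the right endpoints 1 + 1/N decrease to 1, so no single open interval around 1 lies in every member of the infinite family.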
9.4.10 Proposition. A finite intersection of open sets is open.
Proof. Let S = S1 ∩ · · · ∩ Sn, where each Si is an open set, and let x ∈ S. Then x belongs to every Si. Since each Si is open, it is a neighborhood of x; hence S is a neighborhood of x (by Proposition 9.4.9). As x ∈ S was arbitrary, S is open.
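9.4.11 Remark. The argument used to choose (a, b) in the proof of Proposition 9.4.9 fails when the collection of sets is infinite: the supremum of infinitely many left endpoints ai (or the infimum of infinitely many right endpoints bi) may equal x itself. In fact, an infinite intersection of open sets need not be open. Consider
$$\bigcap_{n=1}^{\infty} \Big(-1 - \frac{1}{n},\; 1 + \frac{1}{n}\Big) = [-1, 1],$$
which is not open.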
9.4.12 Proposition. A finite union of closed sets is closed.

Proof. Let S = S1 ∪ · · · ∪ Sn, where each Si is a closed set. Thanks to De Morgan’s laws, we have
$$\mathbb{R} \setminus S = \mathbb{R} \setminus \Big(\bigcup_{i=1}^{n} S_i\Big) = \bigcap_{i=1}^{n} (\mathbb{R} \setminus S_i).$$
Each set R \ Si is open, since each Si is closed (Corollary 9.4.5). Since a finite intersection of open sets is open (Proposition 9.4.10), R \ S must be open. So, S must be closed.
9.4.13 Remark. To see that an infinite union of closed sets need not be closed, consider
$$\bigcup_{n=1}^{\infty} \Big[-1 + \frac{1}{n},\; 1 - \frac{1}{n}\Big] = (-1, 1).$$
For another example, consider S = ∪n∈N+ {1/n}: each singleton {1/n} is a closed set, but S is not closed, since its limit point 0 does not belong to S.
9.4.14 Definition. Closure and Interior of a Set The closure S− of a subset S of R is defined by S− = S ∪ S∗, where S∗ denotes the set of all limit points of S. The interior S◦ of S is the set of all interior points of S. Note that S◦ ⊂ S ⊂ S−.
9.4.15 Examples. For a < b, the closure of the open interval (a, b) is the closed interval [a, b], and the interior of the closed interval [a, b] is the open interval (a, b). The interior of Q is empty and the closure of Q is R (by density).
We do not necessarily have (S−)◦ = S. For example, let S = (−1, 0) ∪ (0, 1). Then S− = [−1, 1] and (S−)◦ = (−1, 1) ≠ S. An even simpler example: Let S = {0}. Then S− = {0} and (S−)◦ = {0}◦ = ∅ ≠ S.
We do not necessarily have (S◦)− = S. For example, let S = Z. Then S◦ = ∅ and (S◦)− = ∅ ≠ S. For a simpler example: S = {0} gives S◦ = ∅ and (S◦)− = ∅− = ∅ ≠ S.
9.4.16 Proposition. For all subsets S of R the following statements hold:
(1) S− is the set of points all of whose neighborhoods intersect S.
(2) S− is closed.
(3) S = S− if and only if S is closed.
(4) S− is the smallest closed subset of R which includes S, i.e.: if T is a closed subset of R then S ⊂ T ⇒ S− ⊂ T.
Proof. We prove (1). If x ∈ S− then either x ∈ S, in which case every neighborhood of x contains x and hence intersects S, or else x ∈ S∗, in which case every neighborhood of x intersects S at an element y ∈ S \ {x}. Conversely, if every neighborhood of x intersects S then either x ∈ S ⊂ S− or, if x ∉ S, then every neighborhood of x contains an element y ∈ S, and necessarily y ≠ x (since x ∉ S), hence x ∈ S∗ ⊂ S−.
We prove (2). Let x ∈ R \ S− be arbitrary. By (1), there is a neighborhood N(x) of x disjoint from S. Then N(x) includes an open neighborhood Ix of x disjoint from S. Since Ix is both disjoint from S and open (hence a neighborhood of all of its points), it must also be disjoint from S∗, since any neighborhood of a point in S∗ intersects S. Hence Ix ⊂ R \ (S ∪ S∗) = R \ S−. We have shown that R \ S− includes a neighborhood of every point it contains, that is, R \ S− is open. Hence, its complement S− = R \ (R \ S−) is closed, by Proposition 9.4.4.
We prove (3). If S = S− then S is closed thanks to (2). Conversely, if S is closed then S∗ ⊂ S, and therefore S− = S ∪ S∗ = S.
We prove (4). Suppose S ⊂ T and T is closed. Then S∗ ⊂ T∗ and T∗ ⊂ T, so S∗ ⊂ T. Hence, S− = S ∪ S∗ ⊂ T.
9.4.17 Proposition. Suppose S is a nonempty subset of R. If S is bounded above then sup S ∈ S−. If S is bounded below then inf S ∈ S−.
Proof. We only prove the first of the statements, the second being formally identical.
Let S be nonempty and bounded above, and let s = sup S. If (a, b) is any open interval containing s, then a < s, so a is not an upper bound of S, while s is; hence there exists an element x ∈ S ∩ (a, s] ⊂ (a, b). Since any neighborhood of s contains such an interval (a, b), every neighborhood of s intersects S, and we conclude s ∈ S− by Proposition 9.4.16(1).
9.4.18 Definition. Nested Sets A sequence $\{S_n\}_{n=0}^{\infty}$ of sets is simply an element of $\mathcal{P}(\mathbb{R})^{\mathbb{N}}$, that is, a function f : N → P(R) that assigns to every n ∈ N a subset Sn = f(n) of R. The sequence $\{S_n\}_{n=0}^{\infty}$ is called nested if
S0 ⊃ S1 ⊃ S2 ⊃ · · · .
9.4.19 Theorem. Nested Intervals Property If $\{I_n\}_{n=0}^{\infty}$ is a nested sequence of nonempty closed bounded intervals then there is an x such that x ∈ In for all n ∈ N. In other words, given a nested sequence of nonempty closed bounded intervals there is a point belonging to all the intervals in the sequence.
Proof. Let $\{I_n\}_{n=0}^{\infty}$ be a nested sequence of nonempty closed bounded intervals with In = [an, bn] and an ≤ bn.
We show that, for each n ∈ N, the number bn is an upper bound of {ak | k ∈ N}. Choose n and k in N. We prove that ak ≤ bn by considering the two possible cases:
(i) Suppose that n ≤ k. Then, thanks to the inclusion Ik ⊂ In, we obtain the inequality
ak ≤ bk ≤ bn.
(ii) Suppose that n > k. Then, thanks to the inclusion In ⊂ Ik, we obtain the inequality
ak ≤ an ≤ bn.
Thanks to (i) and (ii) we have ak ≤ bn for all n and k in N. In particular, we have an ≤ b0 for all n ∈ N. Thus, the set {an | n ∈ N} is nonempty and bounded above. Define
s = sup{ak | k ∈ N}.
Since each bn is an upper bound of this set and s is its least upper bound, we have s ≤ bn for all n ∈ N. Hence, since an ≤ s ≤ bn, we have s ∈ In for all n ∈ N.
9.4.20 Corollary. If $\{[a_n, b_n]\}_{n=0}^{\infty}$ is a nested sequence of nonempty closed bounded intervals such that
inf{bn − an | n ∈ N} = 0
then there is a unique x such that x ∈ [an, bn] for all n ∈ N.
Proof. The existence of x follows from the Nested Intervals Property. To prove the uniqueness, let x, y be two numbers in [an, bn] for all n ∈ N. Without loss of generality x ≤ y; thus 0 ≤ y − x ≤ bn − an for all n ∈ N, so y − x is a lower bound for the differences bn − an. Hence 0 ≤ y − x ≤ inf{bn − an | n ∈ N} = 0, and the conclusion x = y follows.
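A standard instance of Corollary 9.4.20 is bisection for √2. The Python sketch below is an illustration, not taken from these notes (the name `sqrt2_nested_intervals` is hypothetical): it builds nested intervals [an, bn] with rational endpoints, an² < 2 < bn² and $b_n - a_n = 2^{-n}$. The unique point common to all of them is √2, which is not rational, so the corollary genuinely relies on the completeness of R.

```python
from fractions import Fraction


def sqrt2_nested_intervals(steps):
    """Nested intervals [a_n, b_n] with a_n^2 < 2 < b_n^2 and b_n - a_n = 2**-n."""
    a, b = Fraction(1), Fraction(2)          # 1 < 2 < 4, and b_0 - a_0 = 1
    intervals = [(a, b)]
    for _ in range(steps):
        c = (a + b) / 2
        if c * c < 2:                        # c * c == 2 is impossible for rational c
            a = c
        else:
            b = c
        intervals.append((a, b))
    return intervals


a10, b10 = sqrt2_nested_intervals(10)[-1]
print(float(a10), float(b10))                # prints 1.4140625 1.4150390625
```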
9.4.21 Corollary. Bolzano-Weierstrass Every bounded infinite subset of the reals has a limit point.
Proof. Let S be an infinite bounded subset of R. Since S is bounded, we have S ⊂ [a0, b0] for some choice of a0 and b0. Let c0 = (a0 + b0)/2. Since S is infinite, at least one of the intervals [a0, c0] and [c0, b0] contains infinitely many elements of S. Define this interval to be [a1, b1]. Note that $b_1 - a_1 = 2^{-1}(b_0 - a_0)$. We proceed recursively. Suppose that we have a sequence of nested intervals
[a0, b0] ⊃ [a1, b1] ⊃ · · · ⊃ [an, bn]
such that $b_n - a_n = 2^{-n}(b_0 - a_0)$ and such that [an, bn] contains infinitely many elements of S. Let cn = (an + bn)/2. Since [an, bn] contains infinitely many elements of S, at least one of the intervals [an, cn] and [cn, bn] contains infinitely many elements of S. Define this interval to be [an+1, bn+1]. Note that
$$b_{n+1} - a_{n+1} = 2^{-(n+1)}(b_0 - a_0).$$
In this manner we define a sequence $\{[a_n, b_n]\}_{n=0}^{\infty}$ of nested intervals such that each interval contains infinitely many points of S and such that
inf{bn − an ∈ R | n ∈ N} = 0.
Thanks to Theorem 9.4.19, there exists $x \in \bigcap_{n=0}^{\infty} [a_n, b_n]$ (x is not necessarily in S, though it is necessarily unique, by Corollary 9.4.20). Moreover, any neighborhood N of x contains the intervals [an, bn] for all sufficiently large n (this holds as long as N ⊃ Nϵ(x) and n is large enough that $2^{-n}(b_0 - a_0) < \epsilon$), hence N contains infinitely many points of S, and in particular a point of S other than x. This shows that x is a limit point of S.
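The bisection in the proof of Corollary 9.4.21 is not an algorithm in general: deciding whether an interval contains infinitely many points of S cannot usually be done by inspection. The Python sketch below assumes such a test is supplied as an oracle. For the illustration we use S = {1/n | n ∈ N+}, for which [lo, hi] contains infinitely many points of S exactly when lo ≤ 0 < hi (a fact checked by hand, not computed). Both function names are hypothetical.

```python
from fractions import Fraction


def reciprocals_oracle(lo, hi):
    """Hand-derived test for S = {1/n : n >= 1}: [lo, hi] contains infinitely
    many points of S exactly when lo <= 0 < hi."""
    return lo <= 0 < hi


def bisect_toward_limit_point(a, b, has_infinitely_many, steps):
    """Halve [a, b] `steps` times, always keeping a half that contains
    infinitely many points of S (as reported by the oracle)."""
    a, b = Fraction(a), Fraction(b)
    if not has_infinitely_many(a, b):
        raise ValueError("[a, b] must contain infinitely many points of S")
    for _ in range(steps):
        c = (a + b) / 2
        if has_infinitely_many(a, c):
            b = c                 # keep the left half [a, c]
        else:
            a = c                 # then the right half [c, b] must qualify
    return a, b


print(bisect_toward_limit_point(-1, 1, reciprocals_oracle, 10))
# prints (Fraction(0, 1), Fraction(1, 512))
```

The intervals close in on 0, the only limit point of this S, while every one of them still contains infinitely many points 1/n.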
9.5 Convexity and Connectedness

9.5.1 Definition (Convexity). A subset S of R is called convex if for every two
points x and y in S we have [x, y] ⊂ S.
9.5.2 Proposition. All convex subsets of R are intervals.
Proof. This is a mere restatement of the Intermediacy Property of intervals, Propo-
sition 8.2.11.
9.5.3 Definition. A subset S ⊂ R is called disconnected if there exist open subsets U, V ⊂ R such that
(S ⊂ U ∪ V) ∧ (S ∩ U ≠ ∅) ∧ (S ∩ V ≠ ∅) ∧ (U ∩ V = ∅).
In other words, S is disconnected if it is the union of two nonempty pieces, S ∩ U and S ∩ V, that can be covered by disjoint open sets. We can say that the open sets U, V witness the disconnectedness of S.
9.5.4 Definition. A subset S ⊂ R is called connected if it is not disconnected.
Equivalently, for all open sets U, V ⊂ R we have
(S ⊂ U ∪ V ) ∧ (U ∩ V = ∅) ⇒ (S ⊂ U ) ∨ (S ⊂ V ).
Of course, the above definition simply states that any disjoint open sets U, V fail
to witness the disconnectedness of S.
9.5.5 Theorem. A subset of R is connected if and only if it is an interval.
Proof. By Proposition 9.5.2 (and since every interval is obviously convex), it suffices to show that a subset S ⊂ R is connected if and only if it is convex.
The stated equivalence is obvious if S has zero or one elements, so we assume S has at least two distinct elements henceforth.
Let S be connected. Let x, y be arbitrary elements of S such that x < y. Let z ∈ [x, y]. If z = x or z = y then z ∈ S by hypothesis. If z ∈ (x, y) then, since S is connected, the disjoint open sets (−∞, z) and (z, +∞) do not witness the disconnectedness of S. Since x ∈ S ∩ (−∞, z) and y ∈ S ∩ (z, +∞), we must have S ⊄ (−∞, z) ∪ (z, +∞), hence z ∈ S. This proves that [x, y] = {x, y} ∪ (x, y) ⊂ S, so S is convex.
Conversely, let S be convex. We prove that S is connected. Let U, V be disjoint open sets covering S. Let x, y be any two different elements of S. Each belongs to exactly one of the sets U, V so, without loss of generality (relabeling the points and then, if necessary, the sets), we may assume x < y and x ∈ U. By convexity we have [x, y] ⊂ S ⊂ U ∪ V.
Let
W = {t ∈ [x, y] | [x, t] ⊂ U} and z = sup W.
(The supremum exists, since x ∈ W and y is an upper bound of W.) Clearly W ⊂ U. Observe also that W is an initial segment of [x, y], since
(x ≤ s < t) ∧ (t ∈ W) ⇒ [x, s] ⊂ [x, t] ⊂ U ⇒ s ∈ W.
Therefore [x, z) ⊂ W ⊂ [x, z], and certainly [x, z) ⊂ U.
R \ V is a closed set (by Proposition 9.4.4) and W ⊂ U ⊂ R \ V (since U ∩ V = ∅). By Proposition 9.4.17 we have z ∈ W−, and by Proposition 9.4.16(4) we have W− ⊂ R \ V; hence z ∈ R \ V, so z ∉ V. Since z ∈ [x, y] ⊂ S ⊂ U ∪ V, we have z ∈ U. Therefore, [x, z] = [x, z) ∪ {z} ⊂ U, so z ∈ W and W = [x, z].
We claim that z = y. Since U is open and z ∈ U, we have (z, v) ⊂ U for some v > z. For all u ∈ (z, v), we have [x, u] = [x, z] ∪ (z, u] ⊂ U. Since z is the least upper bound of W, we must have u ∉ W, so u ∉ [x, y]; as u > z ≥ x, this means u > y. Since u > y for all u ∈ (z, v), we have y ≤ z. Certainly z ≤ y, since y is an upper bound for W, hence z = y. Thus [x, y] = W ⊂ U.
We have shown that for any two elements x < y of S, the interval [x, y] is included in whichever of the sets U, V contains x. Now fix some x0 ∈ S, say x0 ∈ U. For any other y ∈ S, the closed interval with endpoints x0 and y lies entirely in U or entirely in V; since it contains x0 ∈ U and U ∩ V = ∅, it lies in U, so y ∈ U. Hence S ⊂ U, concluding the proof that S is connected.
9.6 Compactness
9.6.1 Definitions. Let S be a subset of R. An open cover of S is a collection
C = {Uα | α ∈ A}
of open subsets of R, indexed by an arbitrary set A, such that
$$S \subset \bigcup_{\alpha\in A} U_\alpha.$$
If C = {Uα | α ∈ A} is an open cover of S, B ⊂ A and C′ = {Uα | α ∈ B} is also an open cover of S, we say that C′ is a subcover of C. If B is finite, C′ is called a finite subcover.
The set S is called compact if every open cover of S contains a finite subcover. That is, if {Uα | α ∈ A} is an open cover of S then there are α1, . . . , αn ∈ A such that
S ⊂ Uα1 ∪ · · · ∪ Uαn.
9.6.2 Examples. The following are examples of open covers of R:
C1 = {(−∞, +∞)},
C2 = {(n − 1, n + 1) | n ∈ Z},
C3 = {(−n, n) | n ∈ N},
C4 = {(−r, r) | r ∈ Q ∧ r > 0},
C5 = {(r − 1, r + 1) | r ∈ Q},
C6 = {(−x, x) | x ∈ R ∧ x > 0},
C7 = {(x − 1, x + 1) | x ∈ R}.
9.6.3 Proposition. Every finite subset of R is compact.

Proof. Suppose that S is a finite subset of R. Then,
S = {x0, x1, . . . , xn}.
Let C = {Tα | α ∈ A} be an open cover of S. Each xj belongs to some Tαj ∈ C. Therefore,
S ⊂ Tα0 ∪ Tα1 ∪ · · · ∪ Tαn
because xj ∈ Tαj for j = 0, . . . , n. Thus, {Tα0, Tα1, . . . , Tαn} is a finite subcover of C.
9.6.4 Remark. Note that in the above proposition the argument given is valid
for any open cover. That is, to prove compactness we must show that any open
cover whatsoever contains a finite subcover. To show that a set is not compact we
need only find one open cover that does not have a finite subcover.
9.6.5 Example. The interval [0, +∞) is not compact. Define Tn = (−1, n) for n = 1, 2, . . .. Then, we have
$$[0, +\infty) \subset \bigcup_{n=1}^{\infty} T_n,$$
so that C = {Tn ∈ P(R) | n ∈ N+} is an open cover of [0, +∞). Let
C′ = {Tα0, Tα1, . . . , Tαk}
be any finite subcollection of C. Each Tαj is of the form Tαj = (−1, mj). Therefore,
Tα0 ∪ Tα1 ∪ · · · ∪ Tαk = (−1, m),
where m = max{m0, m1, . . . , mk}. Since m ∈ [0, +∞) but m ∉ (−1, m), no such finite union can include [0, +∞), so no finite subcover exists.
9.6.6 Example. The interval (0, 1) is not compact. Define Tn = (1/n, 1) for n = 1, 2, . . .. Then, we have
$$(0, 1) \subset \bigcup_{n=1}^{\infty} T_n,$$
so that C = {Tn ∈ P(R) | n ∈ N+} is an open cover of (0, 1). Let
C′ = {Tα0, Tα1, . . . , Tαk}
be any finite subcollection of C. Each Tαj is of the form Tαj = (1/mj, 1). Therefore,
Tα0 ∪ Tα1 ∪ · · · ∪ Tαk = (1/m, 1),
where m = max{m0, m1, . . . , mk}. Since 1/(m + 1) ∈ (0, 1) but 1/(m + 1) ∉ (1/m, 1), no such finite union can include (0, 1), so no finite subcover exists.
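As Remark 9.6.4 notes, one bad cover suffices to rule out compactness. For the covers of Examples 9.6.5 and 9.6.6, the following Python sketch takes any finite list of indices n and returns a point of the set that the corresponding Tn leave uncovered (for [0, +∞) it uses the largest index m; for (0, 1) it uses 1/(m + 1)). The function names are illustrative.

```python
from fractions import Fraction


def in_finite_union(x, intervals):
    """True if x lies in the union of the open intervals (lo, hi) given."""
    return any(lo < x < hi for lo, hi in intervals)


def missed_point_half_line(indices):
    """Example 9.6.5, T_n = (-1, n): a point of [0, +oo) outside the T_n, n in indices."""
    m = max(indices)                     # the union of the chosen T_n is (-1, m)
    assert not in_finite_union(m, [(-1, n) for n in indices])
    return m


def missed_point_unit_interval(indices):
    """Example 9.6.6, T_n = (1/n, 1): a point of (0, 1) outside the T_n, n in indices."""
    m = max(indices)                     # the union of the chosen T_n is (1/m, 1)
    x = Fraction(1, m + 1)               # 0 < x < 1/m: x lies in (0, 1) but not in (1/m, 1)
    assert not in_finite_union(x, [(Fraction(1, n), 1) for n in indices])
    return x


print(missed_point_half_line([3, 7, 2]), missed_point_unit_interval([3, 7, 2]))
# prints 7 1/8
```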
9.6.7 Remark. In the above examples, [0, +∞) is closed but not bounded, while
(0, 1) is bounded but not closed. Neither is compact. Are closed and bounded sets
compact? The answer is affirmative, as we show below.
9.7 Exercise. Let K be a compact subset of R.
1. Prove that, for each ϵ > 0, there exists a finite subset S of K such that
$$K \subset \bigcup_{x\in S} N_\epsilon(x).$$
(This property is usually stated by saying that compact sets are totally bounded.)
2. Prove that there exists a set L ⊂ K such that
• L is at most countable, and
• L is dense in K (i.e., K ⊂ L−).
9.6.8 Proposition. Every compact subset of R is bounded.

Proof. Let K be a compact subset of R. With Tn = (−n, n) for n = 1, 2, . . ., C = {Tn | n ∈ N+} is an open cover of K. Since K is compact, there is a finite subcover
C′ = {Tα1, Tα2, . . . , Tαm},
such that
K ⊂ Tα1 ∪ Tα2 ∪ · · · ∪ Tαm ⊂ (−k, k),
where k = max{α1, α2, . . . , αm}. Hence, K is bounded, since every element x of K satisfies the inequality −k < x < k.
9.6.9 Proposition. Every compact subset of R is closed.
Proof. Let K be a compact subset of R. We prove that it is closed by showing
that its complement R \ K is open.
Choose any x ∈ R \ K. For n = 1, 2, . . . define
Tn = (−∞, x − 1/n) ∪ (x + 1/n, +∞).
Tn is open because it is the union of open intervals. We have
$$\mathbb{R} \setminus \{x\} = \bigcup_{n=1}^{\infty} T_n.$$
Since x ∉ K, the above implies
$$K \subset \bigcup_{n=1}^{\infty} T_n.$$
The compactness of K implies the existence of a finite subcover {Tn1, Tn2, . . . , Tnm} of K. Without loss of generality n1 < n2 < · · · < nm. Since T1 ⊂ T2 ⊂ · · ·, we obtain
$$K \subset \bigcup_{k=1}^{m} T_{n_k} = T_{n_m} = (-\infty, x - 1/n_m) \cup (x + 1/n_m, +\infty).$$
It follows that K ∩ [x − 1/nm, x + 1/nm] = ∅. Hence, R \ K ⊃ [x − 1/nm, x + 1/nm], and the latter is a neighborhood of x (as it contains N1/nm(x)).
We have shown that every x ∈ R \ K has a neighborhood included in R \ K, i.e.: R \ K is open. Hence, its complement K = R \ (R \ K) is closed.
9.6.10 Definition. Let C = {Cα | α ∈ A} be a collection of closed subsets of R indexed by an arbitrary set A. We say that C has the finite intersection property (FIP) if the intersection of any finite subcollection of C is nonempty, that is, for any finite subset B ⊂ A,
$$\bigcap_{\alpha\in B} C_\alpha \neq \varnothing.$$
9.6.11 Proposition (FIP Characterization of Compactness). A subset K ⊂ R is compact if and only if K is closed and each FIP collection of closed subsets of K has nonempty intersection; in other words, a closed set K is compact if and only if, for all collections {Cα | α ∈ A} of closed subsets Cα ⊂ K:
$$\underbrace{\Big(\forall \text{ finite } B \subset A\Big)\ \bigcap_{\alpha\in B} C_\alpha \neq \varnothing}_{\text{FIP property}} \;\Longrightarrow\; \bigcap_{\alpha\in A} C_\alpha \neq \varnothing.$$
(Note: We are requiring K to be closed at the outset. At the price of introducing the definition of sets Cα ⊂ K that are closed in K, which we presently decline to do, the a priori requirement that K be closed could be removed. Since, a posteriori, compact sets are closed, e.g. by Proposition 9.6.9 above, this is merely a cosmetic loss of generality.)
Proof. The equivalence is essentially an application of the generalized De Morgan laws and of the complementary relation between open and closed sets. We prove the first implication, leaving the second as an exercise for the reader.
Let K be compact, and let {Cα | α ∈ A} be a FIP collection of closed subsets of K. Consider the family {Uα | α ∈ A} of open subsets Uα = R \ Cα of R. By the generalized De Morgan laws (Proposition 9.4.7) and the FIP, for all finite subsets B ⊂ A:
$$\mathbb{R} \setminus \bigcup_{\alpha\in B} U_\alpha = \bigcap_{\alpha\in B} C_\alpha$$
is a nonempty subset of K. Therefore, no finite subcollection of {Uα} covers K. Since K is compact, {Uα} cannot be an open cover of K (otherwise it would have a finite subcover), hence $K \not\subset \bigcup_{\alpha\in A} U_\alpha$. Let I be the nonempty set $K \setminus \bigcup_{\alpha\in A} U_\alpha$. Note that K \ Uα = K ∩ Cα = Cα (since Cα ⊂ K). By generalized De Morgan:
$$\varnothing \neq I = K \setminus \bigcup_{\alpha\in A} U_\alpha = \bigcap_{\alpha\in A} (K \setminus U_\alpha) = \bigcap_{\alpha\in A} C_\alpha.$$
9.8 Exercise. Complete the proof of Proposition 9.6.11 by showing that a closed
set all of whose FIP collections of closed subsets have nonempty intersection is
compact.
9.9 Exercise. Let K ⊂ R be compact. Let C0 ⊃ C1 ⊃ · · · be a nested sequence of nonempty closed subsets of K. Show that
$$\bigcap_{n=0}^{\infty} C_n \neq \varnothing.$$
(This strengthening of the Nested Intervals Property holds in any compact set. In light of the Heine-Borel characterization (below) of compact subsets of R as sets that are both closed and bounded, this exercise explains why nonempty closed bounded intervals are needed in formulating the Nested Intervals Property.)
9.6.12 Theorem. Heine-Borel A subset of R is compact if and only if it is
closed and bounded.
Proof. In the previous two propositions we have proved that every compact set is
closed and bounded. Below we prove that every closed and bounded set is compact.
Let S be a closed and bounded subset of R and let C = {Tα | α ∈ A} be an open cover of S. To prove that C contains a finite subcover, we proceed by contradiction: We suppose that C does not contain a finite subcover. That is, we suppose that
(1) There is no finite subset {Tα1, Tα2, . . . , Tαn} of C such that S ⊂ Tα1 ∪ Tα2 ∪ · · · ∪ Tαn.
Since S is bounded, S ⊂ [a0, b0] for some a0 and b0. Let c0 = (a0 + b0)/2. At least one of the subsets S ∩ [a0, c0] and S ∩ [c0, b0] of S does not have a finite subcover (that is, is not included in the union of finitely many members of C). For if both had finite subcovers, so would
S = (S ∩ [a0, c0]) ∪ (S ∩ [c0, b0])
(by taking the union of the two finite subcovers), violating (1).
If S ∩ [a0, c0] fails to have a finite subcover then define [a1, b1] = [a0, c0]; else define [a1, b1] = [c0, b0]. Thus, S ∩ [a1, b1] fails to have a finite subcover.
We proceed recursively. Suppose that we have nested intervals
[a0, b0] ⊃ [a1, b1] ⊃ · · · ⊃ [an, bn],
with $b_n - a_n = 2^{-n}(b_0 - a_0)$, such that S ∩ [an, bn] does not have a finite subcover. Define cn = (an + bn)/2. Then, at least one of the subsets S ∩ [an, cn] and S ∩ [cn, bn] of S does not have a finite subcover. (The reasoning is analogous to that given for [a0, b0].)
If S ∩ [an, cn] fails to have a finite subcover then define [an+1, bn+1] = [an, cn]; else define [an+1, bn+1] = [cn, bn]. Thus, S ∩ [an+1, bn+1] fails to have a finite subcover. Further, we have $b_{n+1} - a_{n+1} = 2^{-(n+1)}(b_0 - a_0)$ and
[a0, b0] ⊃ [a1, b1] ⊃ · · · ⊃ [an, bn] ⊃ [an+1, bn+1].
We have defined an infinite sequence $\{[a_n, b_n]\}_{n=0}^{\infty}$ of nested intervals such that
(2) There is no n ∈ N and there is no finite subset {Tα1, Tα2, . . . , Tαk} of C such that S ∩ [an, bn] ⊂ Tα1 ∪ Tα2 ∪ · · · ∪ Tαk,
and such that
inf{bn − an | n ∈ N} = 0.
Thanks to Corollary 9.4.20, there is a point x that is contained in all of the intervals. Each interval [an, bn] must contain infinitely many points of S; otherwise S ∩ [an, bn] would be finite, hence compact by Proposition 9.6.3, and so would have a finite subcover, contradicting (2).
The point x defined above is a limit point of S: any neighborhood Nδ(x) includes [an, bn] as soon as $2^{-n}(b_0 - a_0) < \delta$, and hence contains infinitely many points of S. Since S is closed, x ∈ S. Since C covers S, we have x ∈ T for some T ∈ C. Since T is open, there is some ϵ > 0 such that
(3) (x − ϵ, x + ϵ) ⊂ T.
Now, $b_n - a_n = 2^{-n}(b_0 - a_0)$ with n sufficiently large, say n > n0 for some n0, implies bn − an < ϵ. Hence, since x ∈ [an, bn] for all n, we have
(4) n > n0 ⇒ [an, bn] ⊂ (x − ϵ, x + ϵ).
Now, (3) and (4) together imply
(5) (∃n0 ∈ N)(∃T ∈ C)(∀n ∈ N)(n > n0 ⇒ S ∩ [an, bn] ⊂ T).
We have shown that (1) implies (2). Additionally we have shown that (1) implies (5). Since (5) implies that (2) is false (the single set T covers S ∩ [an, bn]), we have shown that (1) implies a contradiction. Hence, (1) is false, i.e.: C has a finite subcover.
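The proof of the Heine-Borel Theorem argues by contradiction, but in a special case its bisection can be run forward to produce a finite subcover. The Python sketch below assumes (this is not part of the theorem) that the cover is given by a function δ > 0, each point c of [a, b] being covered by the open interval (c − δ(c), c + δ(c)). A piece [l, r] is kept when it fits inside the interval attached to its midpoint and is halved otherwise. If δ is bounded below by a positive number on [a, b], for instance if δ is continuous, the halving stops; `max_depth` guards against gauges for which it would not. The name `finite_subcover` is hypothetical.

```python
from fractions import Fraction


def finite_subcover(a, b, delta, max_depth=64):
    """Return pieces (l, r, c) tiling [a, b], each with [l, r] inside the open
    interval (c - delta(c), c + delta(c)); the centers c index a finite subcover."""
    pieces = []
    stack = [(Fraction(a), Fraction(b), 0)]
    while stack:
        l, r, depth = stack.pop()
        c = (l + r) / 2
        if delta(c) > (r - l) / 2:           # [l, r] fits in (c - delta(c), c + delta(c))
            pieces.append((l, r, c))
        elif depth >= max_depth:
            raise RuntimeError("gave up: delta is too small near " + str(c))
        else:
            stack.append((c, r, depth + 1))  # right half, handled after the left half
            stack.append((l, c, depth + 1))
    return pieces                            # listed from left to right


print(len(finite_subcover(0, 1, lambda x: (x + 1) / 4)))   # prints 2
```

With δ(x) = (x + 1)/4 on [0, 1], the intervals centered at 1/4 and 3/4 already cover [0, 1], so two members of the (uncountable) cover suffice.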
9.6.13 Theorem. Any subset S of R that has one of the three following prop-
erties has the other two.
(1) S is closed and bounded.
(2) S is compact.
(3) Every infinite subset of S has a limit point in S.

Proof. We have already proved that (1) ⇔ (2) (Heine-Borel) and that (1) ⇒ (3) (Bolzano-Weierstrass, which yields a limit point in R, together with the closedness of S, which places it in S). The proof is concluded by showing that (3) ⇒ (1), since we then have
(3) ⇔ (1) ⇔ (2).
Suppose that (3) holds for a subset S of R. If S is not bounded, then for each n ∈ N there is a point xn ∈ S such that |xn| > n. The set of all such points xn is infinite and has no limit point in R (any bounded interval contains only finitely many of them) and, therefore, has none in S. We have shown that (3) implies that S is bounded.
To show that S is closed, let x be a limit point of S; we show that x ∈ S. For each n ∈ N, there is a point xn ∈ S such that
(∗) |x − xn| < 1/(n + 1) and x ≠ xn.
Let A denote the set of all such points xn. These need not be all distinct; however, for any ϵ > 0 an infinity of distinct such points lies in Nϵ(x). To see this, note first that xm ∈ Nϵ(x) provided 1/(m + 1) < ϵ, so A ∩ Nϵ(x) ≠ ∅. Suppose now that Nϵ(x) contained only finitely many points of A, say
A ∩ Nϵ(x) = {xn0, . . . , xnk}.
Let a denote the smallest element of
B = {|x − xn0|, . . . , |x − xnk|}.
Then a > 0, since |x − xj| > 0 for all j ∈ N. Choose ñ so large that 0 < 1/(ñ + 1) < min{a, ϵ}. Every element of A lies at distance at least min{a, ϵ} from x (those outside Nϵ(x) at distance at least ϵ, those inside at distance at least a); in particular |x − xñ| > 1/(ñ + 1), contradicting (∗). Hence, the point x is a limit point of A.
We show that x is the only limit point of A. To see this, suppose that y is another limit point of A. Let ϵ > 0 be arbitrary. Let n be such that 1/(n + 1) < ϵ/2. Since y is a limit point of A, there exist infinitely many j ∈ N such that |xj − y| < ϵ/2 (Proposition 9.3.11); in particular there exists one such j0 with j0 > n. Therefore
$$\begin{aligned}
|y - x| &= |(y - x_{j_0}) + (x_{j_0} - x)| \\
&\le |y - x_{j_0}| + |x_{j_0} - x| \\
&< \epsilon/2 + 1/(j_0 + 1) \\
&< \epsilon/2 + 1/(n + 1) \\
&< \epsilon/2 + \epsilon/2 = \epsilon.
\end{aligned}$$
Since ϵ is arbitrary we must have y = x. By (3), the infinite set A = {xj | j ∈ N} ⊂ S has a limit point in S; since x is its only limit point, x ∈ S. Thus every limit point of S belongs to S, so S is closed.
We have shown that (3) implies that S is closed and bounded, i.e.: (3) ⇒ (1).
Foundations Note 10
Sequences and Series

July 28, 2020
Contents
10.1 The Extended Reals . . . . . . . . . . . . . . . . . . . . . . . 10-1
10.2 The Topology of the Extended Reals . . . . . . . . . . . . . 10-5
10.3 Sequences and their Limits . . . . . . . . . . . . . . . . . . . 10-10
10.4 Subsequences and Subsequential Limits . . . . . . . . . . . . 10-15
10.5 Cauchy Sequences . . . . . . . . . . . . . . . . . . . . . . . . 10-20
10.6 Some Convergence Results . . . . . . . . . . . . . . . . . . . 10-22
10.7 Contractive Sequences . . . . . . . . . . . . . . . . . . . . . . 10-25
10.8 The Binomial Theorem . . . . . . . . . . . . . . . . . . . . . 10-26
10.9 Monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-34
10.10 Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-43
10.11 Euler’s Number e . . . . . . . . . . . . . . . . . . . . . . . . 10-59
10.1 The Extended Reals

10.1.1 Note. In this note the symbol ⇒ is used to denote the sentential connective
→ and also to denote a tautological implication (logical consequence). The symbol
Note 10: Sequences and Series – July 28, 2020 10-2
10.1.2 Definition. We generalize the definition of Dedekind cut introduced in Note 8. A lax Dedekind cut is an initial segment of Q without a largest element.
A lax Dedekind cut which is a proper and nonempty initial segment of Q will
be henceforth called strict (i.e., a Dedekind cut per Definition 8.2.4); otherwise it
is non-strict. The only non-strict cuts are the empty set ∅ and Q itself.
10.1.3 Definition. The affinely extended real line E is the set of lax Dedekind
cuts. The empty cut ∅ will henceforth be denoted by the conventional symbol −∞
(negative infinity) and the improper cut Q by +∞ (positive infinity, occasionally
denoted simply ∞); thus
E = R ∪ {+∞} ∪ {−∞}.
The following definitions extend those of the order and operations of R to E:
(1) For every x ∈ R define −∞ < x < +∞.
(2) For x ∈ R define
(i) x + (+∞) = (+∞) + x = +∞, and x + (−∞) = (−∞) + x = −∞,
(ii) x − (+∞) = −∞, and x − (−∞) = +∞,
(iii) x/(±∞) = 0,
(iv) (+∞) + (+∞) = (+∞) − (−∞) = +∞, and (−∞) + (−∞) = (−∞) − (+∞) = −∞,
(v) (+∞) · (+∞) = (−∞) · (−∞) = +∞, and (+∞) · (−∞) = (−∞) · (+∞) = −∞.
(3) For x ∈ R, x > 0, define
(i) x · (+∞) = (+∞) · x = +∞, and x · (−∞) = (−∞) · x = −∞,
(ii) (+∞)/x = +∞, and (−∞)/x = −∞.
(4) For x ∈ R, x < 0, define
(i) x · (+∞) = (+∞) · x = −∞, and x · (−∞) = (−∞) · x = +∞,
(ii) (+∞)/x = −∞, and (−∞)/x = +∞.
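The rules (1)–(4) can be encoded on top of IEEE 754 floating point, which Python uses for `float` and which already provides `math.inf`. IEEE arithmetic returns `nan` for forms such as `math.inf - math.inf` or `0 * math.inf`; the Python sketch below instead raises an error for the forms that E leaves undefined (see Remarks 10.1.5). The names `e_add`, `e_sub`, `e_mul`, `e_div` and `UndefinedInE` are hypothetical, and division by a real zero is rejected as in R.

```python
import math


class UndefinedInE(ArithmeticError):
    """Raised for the forms that E leaves undefined."""


def e_add(x, y):
    # (+inf) + (-inf) is undefined in E, whereas IEEE 754 would return nan
    if math.isinf(x) and math.isinf(y) and x != y:
        raise UndefinedInE(f"{x} + {y}")
    return x + y


def e_sub(x, y):
    return e_add(x, -y)                  # x - y is read as x + (-y)


def e_mul(x, y):
    # 0 * (+-inf) is undefined in E (measure theory would set it to 0)
    if (x == 0 and math.isinf(y)) or (y == 0 and math.isinf(x)):
        raise UndefinedInE(f"{x} * {y}")
    return x * y                         # signs follow rules (2)(v), (3)(i), (4)(i)


def e_div(x, y):
    if math.isinf(x) and math.isinf(y):
        raise UndefinedInE(f"{x} / {y}")
    if y == 0:
        raise UndefinedInE(f"{x} / 0")
    return x / y                         # x / (+-inf) gives (+-)0.0, i.e. rule (2)(iii)


print(e_mul(-2, math.inf))               # prints -inf
```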
The elements of R are the finite elements of E, while the elements of E \ R are
the infinite elements of E, i.e.: +∞ and −∞.
Define −∞ ≤ x ≤ +∞ for all x ∈ E. Then E is a totally ordered set.
Let S be a nonempty subset of E. Then S is bounded above by +∞. If S is bounded above by a real number x, then sup S exists; it is a real number unless S = {−∞}, in which case sup S = −∞. Otherwise the least upper bound of S is
sup S = +∞.
Similarly, if S is a nonempty subset of E then S is bounded below by −∞. If S is bounded below by a real number x, then inf S exists; it is a real number unless S = {+∞}, in which case inf S = +∞. Otherwise the greatest lower bound of S is
inf S = −∞.
We also define∗
sup ∅ = −∞ and inf ∅ = +∞.
In other words, every subset of E has a least upper bound and a greatest lower bound in E.
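For a finite subset of E, the supremum and infimum are simply the largest and smallest elements, and the conventions sup ∅ = −∞ and inf ∅ = +∞ match the `default` argument of Python's built-in `max` and `min`. A minimal sketch, restricted to finite sets (an infinite set such as N, with sup N = +∞, cannot be handled this way); `e_sup` and `e_inf` are illustrative names.

```python
import math


def e_sup(S):
    """Least upper bound in E of a finite subset S (floats, with +-math.inf for +-oo)."""
    return max(S, default=-math.inf)     # sup of the empty set is -oo, the least element of E


def e_inf(S):
    """Greatest lower bound in E of a finite subset S."""
    return min(S, default=math.inf)      # inf of the empty set is +oo, the greatest element of E


print(e_sup([]), e_inf([]), e_sup([1, 2.5, -math.inf]))
# prints -inf inf 2.5
```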
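10.1.4 Definition. We extend the definition of powers, exponentials and logarithms as follows:
• For κ > 0: $(+\infty)^{\kappa} = +\infty$;
• For κ < 0: $(+\infty)^{\kappa} = 0$;
• For b > 1: $\exp_b(+\infty) = b^{+\infty} = +\infty$ and $\exp_b(-\infty) = b^{-\infty} = 0$;
• For 0 < b < 1: $\exp_b(+\infty) = b^{+\infty} = 0$ and $\exp_b(-\infty) = b^{-\infty} = +\infty$;
• For b > 1: $\log_b(0) = -\infty$ and $\log_b(+\infty) = +\infty$;
• For 0 < b < 1: $\log_b(0) = +\infty$ and $\log_b(+\infty) = -\infty$.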

10.1.5 Remarks. Note that such expressions as
$$\frac{\pm\infty}{0}, \qquad \frac{\pm\infty}{\pm\infty}, \qquad +\infty + (-\infty), \qquad -\infty + (+\infty), \qquad +\infty - (+\infty), \qquad -\infty - (-\infty), \qquad 0 \cdot (\pm\infty)$$
are undefined (and no sensible meaning may be assigned to any of them).† In particular, E is not a ring. Moreover, despite visual appearances, note that −∞ and +∞ are not additive inverses of each other since (+∞) + (−∞) is undefined (hence not equal to zero).
∗ Strictly speaking these are not ad hoc definitions. For instance, sup ∅ = −∞ follows from the fact that E is the set of upper bounds of ∅, and the least element of E is −∞.
† Warning: Usually, in the context of measure theory, 0 · (±∞) is defined to be zero.
It is worth remarking that a few of the ad hoc definitions in (1)–(4) above do follow from the definitions of the order and operations of real numbers (i.e., strict Dedekind cuts) as given in Note 8, when lax cuts are allowed.
10.1 Exercise. Which of the definitions in (1)–(4) of Definition 10.1.3 follow from
those of strict Dedekind cuts simply by allowing lax Dedekind cuts? Which ones
fail? Which of the “forbidden” expressions in Remarks 10.1.5 would have a meaning
in such sense? (The reader is invited to ask herself why such expressions are best
left undefined.)
10.2 Exercise. Show that E is an ordered set isomorphic to [−1, 1] via the function f : [−1, 1] → E defined by
$$f(x) = \begin{cases} \dfrac{x}{1 - x^2} & \text{if } x \in (-1, 1); \\[4pt] -\infty & \text{if } x = -1; \\[4pt] +\infty & \text{if } x = +1. \end{cases}$$
In other words,
• f is bijective, and
• for all x, y ∈ [−1, 1]: x<y ⇔ f (x) < f (y).
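A numerical spot check is no substitute for the proof requested in Exercise 10.2, but it can make the claim concrete. The Python sketch below models E by floats with ±math.inf; the names f and f_inv are ours.

The sketch evaluates f on a grid and checks strict monotonicity. It then inverts f by solving y x² + x − y = 0 for the root in [−1, 1]. That root is written in the cancellation-free form x = 2y/(1 + √(1 + 4y²)), which equals (−1 + √(1 + 4y²))/(2y) for y ≠ 0 and also covers y = 0.

```python
import math

def f(x):
    """The map [-1, 1] -> E of Exercise 10.2 (floats with ±inf model E)."""
    if x == -1:
        return -math.inf
    if x == 1:
        return math.inf
    if not -1 < x < 1:
        raise ValueError("x must lie in [-1, 1]")
    return x / (1 - x * x)

def f_inv(y):
    """Inverse of f: the root of y*x**2 + x - y = 0 lying in [-1, 1]."""
    if y == math.inf:
        return 1.0
    if y == -math.inf:
        return -1.0
    # Stable form of (-1 + sqrt(1 + 4y^2)) / (2y); also valid at y = 0.
    return 2 * y / (1 + math.sqrt(1 + 4 * y * y))

grid = [-1 + k / 50 for k in range(101)]            # 101 points from -1 to 1
values = [f(x) for x in grid]
print(all(a < b for a, b in zip(values, values[1:])))  # strictly increasing?
print(max(abs(f_inv(f(x)) - x) for x in grid))         # round-trip error
```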

10.1.6 Remark. To preclude confusion, a subset S ⊂ E will be called bounded if
and only if S is a subset of R that is bounded in R. Bounds equal to ±∞ do not
imply boundedness according to this definition.
10.1.7 Proposition. Let S ⊂ E be any subset. Then S has a least upper bound
sup S and a greatest lower bound inf S in E.
Proof. We prove sup S exists. If S is empty then sup S = −∞ (since −∞ is
an upper bound for S vacuously, and it is the least such). The same holds if
S = {−∞}. If +∞ ∈ S then sup S = +∞. In any other case S contains some
real elements but not +∞. If S is bounded above (by a real number) then sup S
exists in R by completeness of R; otherwise, if S is unbounded above we have
sup S = +∞. The proof for inf S is formally identical and is omitted.
10.1.8 Definition. An interval in E is either an interval I ⊂ R or one of the
following sets:
• [−∞, x] = {y ∈ E | −∞ ≤ y ≤ x},
• [−∞, x) = {y ∈ E | −∞ ≤ y < x},
• [x, +∞] = {y ∈ E | x ≤ y ≤ +∞},
• (x, +∞] = {y ∈ E | x < y ≤ +∞},
for some x ∈ E.
10.2 The Topology of the Extended Reals

10.2.1 Definition (Topology of the Extended Reals). A neighborhood of a real number x in E is any subset N(x) ⊂ E including a real neighborhood of x (that is, N(x) ⊃ Nϵ(x) = (x − ϵ, x + ϵ) for some ϵ ∈ (0, ∞)).
A neighborhood of +∞ is any subset N (+∞) ⊂ E such that (x, +∞] ⊂ N (+∞)
for some x < +∞. It is sometimes convenient to define Nϵ (+∞) = (1/ϵ, +∞] for
ϵ > 0 (the ϵ-neighborhood of +∞). Then a neighborhood of +∞ is any set including
some ϵ-neighborhood of +∞.
A neighborhood of −∞ is any subset N (−∞) ⊂ E such that [−∞, x) ⊂ N (−∞)
for some x > −∞. It is sometimes convenient to define Nϵ (−∞) = [−∞, −1/ϵ)
for ϵ > 0 (the ϵ-neighborhood of −∞). Then a neighborhood of −∞ is any set
including some ϵ-neighborhood of −∞.
An interior point of a subset S ⊂ E is a point x ∈ E such that S is a neigh-
borhood of x. (Note that we must have x ∈ S in this case, since S includes a
neighborhood of x, which includes x.)
A subset S ⊂ E is open if every point in S is an interior point of S, that is,
if S is a neighborhood of each of its points. Arbitrary unions, as well as finite intersections, of open sets are open. It follows that arbitrary unions and finite intersections of neighborhoods of a fixed x ∈ E are neighborhoods of x.
The interior of a subset S ⊂ E is the set, denoted S ◦ , of its interior points.
Clearly S ◦ ⊂ S, with S = S◦ if and only if S is open.
A limit point of a set S ⊂ E is any x ∈ E such that, for any neighborhood N(x) of x we have (S ∩ N(x)) \ {x} ≠ ∅ (any neighborhood of x contains some point
of S other than x). Note that x need not be in S. The set of limit points of S is
denoted S∗ .
A subset S ⊂ E is closed if it contains all its limit points in E. Equivalently, a subset S ⊂ E is closed if the complement E \ S is open. Finite unions and arbitrary intersections of closed sets are closed.
The closure of a subset S ⊂ E is the union S− of S with the set S∗ of limit points of S. Clearly S− ⊃ S, with S = S− if and only if S is closed (equivalently, if
S ⊃ S∗ ). The closure of S may be characterized as the set of elements all of whose
neighborhoods intersect S.
A subset S of E is convex if, for all x, y ∈ S, x < y ⇒ [x, y] ⊂ S.
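The three kinds of ϵ-neighborhood in Definition 10.2.1 can be folded into one membership test:
- N_ϵ(x) = (x − ϵ, x + ϵ) for real x;
- N_ϵ(+∞) = (1/ϵ, +∞];
- N_ϵ(−∞) = [−∞, −1/ϵ).

In the minimal Python sketch below, extended reals are modeled as floats with ±math.inf, and the function name in_eps_neighborhood is hypothetical.

```python
import math

def in_eps_neighborhood(y, x, eps):
    """Is y in the eps-neighborhood N_eps(x)? x, y are extended reals, eps > 0."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    if x == math.inf:
        return y > 1 / eps        # N_eps(+inf) = (1/eps, +inf]
    if x == -math.inf:
        return y < -1 / eps       # N_eps(-inf) = [-inf, -1/eps)
    return abs(y - x) < eps       # N_eps(x) = (x - eps, x + eps); excludes y = ±inf

print(in_eps_neighborhood(10.5, math.inf, 0.1))   # True: 10.5 > 1/0.1
print(in_eps_neighborhood(math.inf, 3.0, 0.5))    # False: +inf is not near 3
```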
10.3 Exercise. Prove that the two definitions of a closed subset of E in Defini-
tion 10.2.1 are equivalent.
10.2.2 Remarks. • Note that all intervals [−∞, x) are open according to the
definition above. So are all (x, +∞], and so is [−∞, +∞]. An open subset of R is certainly open in E, hence the interior S◦ of a subset of R is the same
even if one regards S as a subset of E. However, it is important to stress that
if S ⊂ R is closed in R, it need not be closed as a subset of E. For instance
R itself is closed in R, but not closed in E since it is missing its limit points
±∞. In particular, when S is a subset of R there is ambiguity as to the
meaning of S − , the closure of S. We will try to provide enough context to
preclude confusion. By and large we shall henceforth work with the topology
of E rather than R because the compactness of E (see below) has many useful
consequences. Many statements are valid in E without exception, whereas
additional hypotheses need to be added for their truth in R. We already saw
the first important example of this phenomenon in Proposition 10.1.7.
• Since E is order-isomorphic to [−1, 1] (or to any other closed interval [a, b] of R with a < b, for that matter), say via the function f : [−1, 1] → E of Exercise 10.2, it follows that E is also topologically isomorphic (“homeomorphic”)
to [−1, 1] in the sense that f and f −1 each map closed sets into closed sets.
Equivalently, f maps relatively open subsets of [−1, 1] into open subsets of E
and f −1 maps open subsets of E into relatively open subsets of [−1, 1] in the
sense of Definition ?? below, which the reader may want to read at this point
(though certainly f −1 does not map every open subset of E to an open set.
E.g., (f −1 )∗ (E) = [−1, 1] is not open; it is merely relatively open in [−1, 1]).
10.2.3 Proposition. • A subset S of E is convex if and only if S is an interval.
• A subset S of E is connected if and only if S is an interval.

Proof. Left to the reader. See Exercise below.
10.2.4 Proposition. Q is dense in E.
Proof. Since Q is dense in R (Proposition 8.5.12), it remains only to show that
all intervals (a, +∞) (for a < +∞) and (−∞, b) (for b > −∞) contain a rational
point. This, however, is an immediate consequence of the fact that Z (hence Q) is
both unbounded above and unbounded below.
10.2.5 Proposition. The topology of E possesses the Hausdorff separation prop-
erty:
For any x, y ∈ E, if x ≠ y, there exist respective disjoint neighborhoods N(x)
and N (y). (Equivalently, there exists a neighborhood N (x) of x whose complement
E \ N (x) is a neighborhood of y.)
Proof. If x, y ∈ R the neighborhoods Nϵ(x) and Nϵ(y) with ϵ = |y − x|/2 are disjoint as shown in the proof of Proposition 9.3.4.
If x ∈ R and y = +∞ (or vice versa), let N (x) = (x − 1, x + 1) and N (y) =
(x + 1, +∞]. The case y = −∞ is analogous. In case x = −∞ and y = +∞, take
N (x) = [−∞, 0) and N (y) = (0, +∞].
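In a slight variant, the case analysis in the proof above can be condensed into the choice of a single real number c with x < c < y. Then [−∞, c) and (c, +∞] are disjoint open neighborhoods of x and y.

A minimal Python sketch follows. It assumes x < y (otherwise swap the two points) and models E by floats with ±math.inf. The name separating_point is hypothetical. The sketch is exact only up to floating-point rounding, which can fail for huge or extremely close arguments.

```python
import math

def separating_point(x, y):
    """Return a real c with x < c < y, for extended reals x < y.

    [-inf, c) and (c, +inf] are then disjoint open neighborhoods of x and y.
    """
    if not x < y:
        raise ValueError("need x < y")
    if x == -math.inf and y == math.inf:
        return 0.0              # as in the proof: [-inf, 0) and (0, +inf]
    if y == math.inf:           # x real
        return x + 1
    if x == -math.inf:          # y real
        return y - 1
    return (x + y) / 2          # both real: the midpoint, as with eps = |y - x|/2

for x, y in [(-math.inf, math.inf), (3.0, math.inf), (-math.inf, -2.0), (1.0, 2.0)]:
    print(x, y, separating_point(x, y))
```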
10.2.6 Proposition. Let N ⊂ E be a neighborhood of a point x ∈ E. Let
x1, x2, . . . , xn be finitely many points different from x. Then x has a neighborhood N′ ⊂ N such that xi ∉ N′ for i = 1, 2, . . . , n.
Proof. We prove the statement by induction on n. For n = 0 just take N ′ = N .
For the inductive step, let {x1 , x2 , . . . , xn , xn+1 } be given, all different from x.
By the inductive hypothesis there exists N1 ⊂ N such that N1 is a neighborhood of
x not containing xj for j = 1, 2, . . . , n. By the Hausdorff separation property there
exists a neighborhood N2 of x not containing xn+1 . Let N ′ = N1 ∩ N2 . Then N ′
is a neighborhood of x included in N not containing xj for j = 1, 2, . . . , n + 1.
10.2.7 Proposition. If x is a limit point (in E) of the set S ⊂ E, then every
neighborhood of x contains infinitely many points of S.
(This is a slight generalization of Proposition 9.3.11.)
Proof. Given any neighborhood N of x and any finite sequence {x1 , x2 , . . . , xn } of
points in S \ {x} there exists a neighborhood N ′ ⊂ N of x containing no xj for
j = 1, 2, . . . , n, by Proposition 10.2.6. Since x is a limit point of S, N′ contains some point y ∈ S \ {x}, necessarily different from x1, . . . , xn. We conclude that no neighborhood N of x contains only finitely many elements of S \ {x}. Hence every neighborhood of x contains infinitely
many elements of S \ {x} (and a fortiori infinitely many elements of S).
10.2.8 Proposition. Let S ⊂ E be nonempty. Then sup S and inf S are in the
closure S − of S in E. In fact, sup S = sup S − = max S − and inf S = inf S − =
min S − .
(Note that this generalizes Proposition 9.4.17.)
Proof. We prove the statement for u = sup S, the case of inf S being formally
identical. Since S ⊂ E, it has both a supremum and an infimum in E.
First we note that (u, +∞] is an open set (empty, if u = +∞) disjoint from S, since u is an upper bound of S by assumption. Being an open set disjoint from S, (u, +∞] contains no point of S−; hence u is also an upper bound for S−. It remains only to prove u ∈ S−, for then u = max S−.
If u = −∞ then necessarily S = {−∞} = S − since S is nonempty and bounded
above by −∞. The statement follows in this case.
Henceforth assume u > −∞. Let N (u) be any neighborhood of u. Then
N(u) ⊃ (v, u] for some v < u. Since u is the least upper bound of S, v is not an upper bound of S, so there exists s ∈ S ∩ (v, u] ⊂ N(u). We have shown that any neighborhood N(u) intersects S, so u ∈ S−.
10.2.9 Theorem (Heine-Borel). Any closed subset of the extended reals is com-
pact. (In particular, E itself is compact.)
(Note that this is a slight strengthening of Theorem 9.6.12 (Heine-Borel in R),
since a subset S ⊂ R is closed in E if and only if it is bounded and closed in R.)
Proof. Let S ⊂ E be closed. Let {Ui }i∈I be any open cover of S. If +∞ ∈ S then
one of the sets Ui , say A = Ua , is an open neighborhood of +∞, otherwise put
A = ∅. If −∞ ∈ S then some B = Ub is an open neighborhood of −∞, otherwise
put B = ∅.
We claim that S̃ = S \ (A ∪ B) is a closed and bounded subset of R. First, S̃ ⊂ R, since ±∞ ∉ S̃ (if +∞ ∈ S then +∞ ∈ A, and similarly for −∞). Next, E \ A and E \ B are closed since A, B are open. Hence S̃ = S ∩ (E \ A) ∩ (E \ B) is closed. Now, we show S̃ is bounded above. This is obvious if S itself is bounded above; otherwise, if S is unbounded above, then +∞ = sup S ∈ S− = S, so A is a neighborhood of +∞ and E \ A is bounded above, hence so is S̃. Similarly, S̃ is bounded below.
By the Heine-Borel Theorem 9.6.12, a finite subfamily {Ui}i∈J, indexed by a finite subset J of I, covers S̃. Then {A, B} ∪ {Ui}i∈J is a cover, still finite, of S.
10.2.10 Proposition. Any FIP family of closed subsets of E has nonempty inter-
section. (Per Definition 9.6.10, a family of closed sets has the Finite Intersection
Property (FIP) if and only if all of its finite subfamilies have nonempty intersec-
tion.)
Proof. This is merely a reformulation of the Heine-Borel Theorem 10.2.9. In view of that theorem, the statement is precisely Proposition 9.6.11, whose proof, as the reader is invited to verify, depends only on the abstract definition of compactness and on no particulars of the topology (say, of R or E).
10.2.11 Definition. Let x ∈ E. A basis of neighborhoods of x is a set B whose
members are neighborhoods of x and such that the following cofinality property is
satisfied:
For every neighborhood N (x) (not necessarily in the class B) there exists N ∈ B
such that N ⊂ N (x).
A shrinking sequence of neighborhoods of x is a sequence {Nn } of neighborhoods
of x which “shrinks to x” in the following sense:
(i) Nn ⊃ Nn+1 for all n ∈ N, and
(ii) for any neighborhood N (x) there exists m such that Nm ⊂ N (x) (hence
Nn ⊂ N (x) for all n > m).
A shrinking sequence of neighborhoods of x, when regarded as a set of neighborhoods of x, is obviously a basis of neighborhoods of x.
Note that a basis of neighborhoods B of x (resp., a shrinking sequence of
neighborhoods Nn of x) uniquely determines x, by the Hausdorff separation prop-
erty 10.2.5.
10.2.12 Proposition. For any x ∈ E there exists a shrinking sequence of open
convex neighborhoods of x.
Proof. For x ∈ R we may take the sequence B = {N1/n (x) | n ∈ N+ } (readers who
insist on indexing by all n ∈ N may take N0 = E, or they may prefer the sequence
{N1/(n+1) (x)} instead). For x = +∞ take B = {(n, +∞] | n ∈ N}. For x = −∞
take B = {[−∞, −n) | n ∈ N}.
10.2.13 Proposition. Let x ∈ E. The following are bases of neighborhoods of x:
• The set of all neighborhoods of x.
• The set of all open neighborhoods of x.
• The set of all open convex neighborhoods of x.
• The set of all closed neighborhoods of x.
• The set of all closed convex neighborhoods of x.
Proof. Left to the reader.

10.2.14 Proposition. Let x ∈ E. Any neighborhood N (x) of x contains an open
convex subneighborhood N of x (that is, an open interval N containing x) such
that N − ⊂ N (x). (Note that N − is thus a closed neighborhood of x.)
Proof. Let N (x) be any neighborhood of x. If x ∈ R, then N (x) contains an
open neighborhood Nϵ (x) for some ϵ > 0. Let N = (x − ϵ/2, x + ϵ/2). Then
N − = [x − ϵ/2, x + ϵ/2] ⊂ Nϵ (x) ⊂ N (x).
If x = +∞ then N (x) contains the open neighborhood (T, +∞] for some T <
+∞, hence also the open convex neighborhood N = (T + 1, +∞] with N − =
[T + 1, +∞] ⊂ (T, +∞] ⊂ N (x). The case of x = −∞ is formally identical.
10.3 Sequences and their Limits

10.3.1 Definitions. Unless otherwise specified, by a sequence in S we mean a
countable sequence in S, i.e.: a function f : N → S, where N denotes the natural
numbers. Instead of writing f (0), f (1), f (2), . . ., we write a0 , a1 , a2 , . . .. Without
warning, we may replace N with {1, 2, 3, . . .}. The sequence is often denoted by $\{a_n\}_{n=0}^{\infty}$, or {an}, or {a0, a1, a2, . . .}. The values a0, a1, a2, . . . , which need not
be distinct, are called the terms of the sequence. The range of {an } is the range
of the function f , that is, the set f∗ (N) = {a0 , a1 , a2 , . . . } (the ordering of whose
elements is ignored).
When S is not specified, by a sequence we mean a real sequence, i.e.: a sequence
in R.
If S ⊂ R is bounded above and it so happens that sup S ∈ S, we define
max S = sup S,
i.e.: max S denotes the largest element of S. Similarly, if S ⊂ R is bounded below and it so happens that inf S ∈ S, we define
min S = inf S,
i.e.: min S denotes the smallest element of S.

The notation {a0 , a1 , a2 , . . . } is ambiguous: it can denote a set or a sequence.
Context tells what it means.
10.3.2 Examples. If S ⊂ R is finite and nonempty, then max S and min S are always defined. This need not be so when S is infinite. For example, max(0, 1] = 1 because sup(0, 1] = 1 ∈ (0, 1], but min(0, 1] does not exist, since inf(0, 1] = 0 ∉ (0, 1]. Similarly, min[0, 1) = 0 because inf[0, 1) = 0 ∈ [0, 1), but max[0, 1) does not exist, since sup[0, 1) = 1 ∉ [0, 1).
We also have that max[0, 1] = 1 and min[0, 1] = 0, while neither max(0, 1) nor
min(0, 1) is defined.
10.3.3 Definition (Tails of a sequence). Let {ak }k≥0 be a real sequence. For
each n ∈ N the n-th tail of {ak }k≥0 is the sequence {ak }k≥n = {an+k }k≥0 (namely
the sequence {ak} with the first n terms omitted). Clearly the n-th tail of {ak} is
a subsequence of {ak } (the k-th term of the n-th tail is bk = an+k ). Note that
{ak } = {a0+k } is a tail of itself (the “zeroth” tail).
10.3.4 Notation. Let {P (n)}n∈N be a sequence of sentences in the language of
the real numbers. We say that P holds eventually, or ultimately, or almost always,
or that P (n) holds for all sufficiently large n, if the set {n ∈ N | P (n)} contains a
final segment of N, that is, if there exists N ∈ N such that n > N ⇒ P (n).
10.3.5 Definition. Convergence and Divergence A (real) sequence {an} is said to be convergent or is said to converge if there exists a real number L such that, for every neighborhood N(L) of L, an is eventually in N(L) (in other words, we have an ∈ N(L) for all sufficiently large n). More explicitly, for all ϵ > 0, there
exists some natural number N such that
n>N ⇒ |an − L| < ϵ.
If the above is true, we say that the sequence converges to L and we call L the
limit of the sequence. If a sequence does not converge, it is said to be divergent, or
to diverge.
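To say that {an} converges to L we may also write
$$\lim_{n\to\infty} a_n = L, \qquad\text{or:}\qquad a_n \to L \text{ as } n \to \infty.$$
A formal phrasing of the definition of convergence to L ∈ R is the following:
$$\lim_{n\to\infty} a_n = L \iff (\forall \epsilon > 0)(\exists N \in \mathbb{N})(\forall n \in \mathbb{N})(n > N \Rightarrow |a_n - L| < \epsilon).$$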
We say that {an} diverges to +∞ if, for every neighborhood N(+∞), an is eventually in N(+∞) (i.e., N(+∞) contains an for all n sufficiently large). Equivalently, for all T < +∞ there exists N such that n > N ⇒ an > T. In this case we write
$$\lim_{n\to\infty} a_n = +\infty, \qquad\text{or:}\qquad a_n \to +\infty \text{ as } n \to \infty.$$
We say that {an} diverges to −∞ if, for every neighborhood N(−∞), an is eventually in N(−∞) (i.e., N(−∞) contains an for all n sufficiently large). Equivalently, for all T > −∞ there exists N such that n > N ⇒ an < T. In this case we write
$$\lim_{n\to\infty} a_n = -\infty, \qquad\text{or:}\qquad a_n \to -\infty \text{ as } n \to \infty.$$
In general, for a number L ∈ E we have limn→∞ an = L if and only if, for every
neighborhood N (L) we have an ∈ N (L) eventually.
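The ∀ϵ ∃N ∀n structure of Definition 10.3.5 becomes concrete once N is given as an explicit function of ϵ. For the sequence an = 1/(n + 1) (indexed from n = 0) and L = 0, the choice N(ϵ) = ⌈1/ϵ⌉ works: n > N gives n + 1 > 1/ϵ, hence |an − 0| < ϵ.

The Python sketch below assumes the hypothetical names a and threshold. Its loop is only a finite spot check of that argument, not a proof of the statement for all n > N.

```python
import math

def a(n):
    return 1 / (n + 1)            # 1, 1/2, 1/3, ... (indexed from n = 0)

def threshold(eps):
    """An N that works for a_n -> 0: n > N implies n + 1 > 1/eps, so a_n < eps."""
    return math.ceil(1 / eps)

for eps in (0.5, 0.1, 0.001):
    N = threshold(eps)
    # Finite spot check of n > N => |a_n - 0| < eps; the argument above covers all n.
    ok = all(abs(a(n) - 0) < eps for n in range(N + 1, N + 10_000))
    print(eps, N, ok)
```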
10.3.6 Remarks. • It follows immediately from the definition of basis of neigh-
borhoods of L that no generality is lost in Definition 10.3.5 by considering
only neighborhoods N (L) belonging to any fixed basis B of neighborhoods
of L (or to any fixed shrinking sequence of neighborhoods of L for that mat-
ter). For instance, it is always enough to take N (L) to be an ϵ-neighborhood
of L for arbitrary ϵ > 0, or perhaps a (1/n)-neighborhood for arbitrary
n ∈ N.
• The definition above is the first important one in mathematics entailing three
quantifiers (a so-called “∀3 formula”); therein resides its complexity and the
need for refined skill in its use. The definitions of the limit of a function
and of continuity, to be introduced in Note 11, are essentially identical and
among the most important in analysis.
• A constant sequence {an } with an = L for all n has the obvious limit L.
• It is entirely possible for a sequence to be divergent, but to diverge neither to +∞ nor to −∞. Consider the sequence {(−1)^n}. It is clearly divergent, yet it is also clear that its even terms form a (constant) sequence with limit +1 while its odd terms form a (constant) sequence with limit −1. We elaborate on this observation by introducing the concepts of subsequences and subsequential limits in Section 10.4 below.
• In Definition 10.3.5 above note that, while any neighborhood N(L) does contain all terms an for n sufficiently large, hence for infinitely many n ∈ N, it is misleading to state that N(L) contains infinitely many terms of the sequence, since these terms may not be distinct. In fact, N(L) may
contain only finitely many distinct numbers an : consider the extreme example
of the constant sequence {an } with an = L for all n ∈ N. Then N (L) contains
exactly one term of the sequence (namely L), yet an is in N (L) for all natural
numbers n. Still, it is quite common to abuse language and quantify over
elements an in the range of the sequence rather than over the natural numbers
n (in the domain of the sequence). For instance “infinitely many terms of
{an } satisfy […]” is commonly interpreted to mean “for infinitely many natural
numbers n, an satisfies […]”.
10.5 Exercise. Let {an} be a real sequence. Prove:
$$\lim_{n\to\infty} a_n = 0 \iff \lim_{n\to\infty} |a_n| = 0.$$
10.6 Exercise. Let {Nn} be a shrinking sequence of neighborhoods of L. Let {an} be a sequence such that an ∈ Nn for all n ∈ N. Prove that an → L as n → ∞.
10.3.7 Proposition. If a tail of a sequence has a limit L ∈ E, then all tails have
the same limit L (a sequence and all its tails have the same limit, if any).
Proof. Left to the reader.
10.3.8 Definition. Any property of a sequence {an} that is completely determined by any and all tails of {an} is called final. Vaguely speaking, final properties are
those independent of the terms in any finite initial subsequence {an | n ≤ N } of
{an }.
10.3.9 Example. The property “an → L as n → ∞” is the first and most im-
portant example of a final property (by Proposition 10.3.7 above). Below we will
study other final properties such as {an } being bounded, having a given superior
or inferior limit, having a given subsequential limit, etc.
10.3.10 Remark. In studying final properties, we may relax the definition of sequence slightly by allowing sequences finitely many of whose terms may be undefined. For instance, the sequence $a_n = \big((n-5)(n-10)\big)^{-1}$ is defined for n ∈ N except n = 5 and n = 10. In studying final properties of this {an} we may ignore all n ≤ 10, if so wished, perhaps by passing (say) to the tail {an+11}.
10.3.11 Proposition. Let L ∈ E and let B be any basis of neighborhoods of L.
A real sequence {an } has limit L ∈ E if and only if, for every N ∈ B, an is
eventually in N .
Proof. The direct implication is obvious. The converse implication follows im-
mediately from the definition of basis of neighborhoods: For any N (L) there is
N ∈ B with N ⊂ N (L). Hence a sequence that is eventually in N is eventually in
N (L).
10.3.12 Definition. Bounded and Unbounded Sequences A sequence {an }
is said to be bounded if there is a real number B such that |an | ≤ B for all n. If
no such number B exists, the sequence is said to be unbounded. In other words, a
sequence is bounded if and only if its range is a bounded subset of E.
10.3.13 Example. The sequence {0, 1, 2, . . .} is not convergent. Its limit is +∞.
The sequence {1, 1/2, 1/3, . . .} converges to 0. The first sequence is unbounded,
the second is bounded.
10.3.14 Proposition. Uniqueness of Limits The limit of a sequence is unique
(if it exists at all, either as a real or extended real value).
Proof. Let L1 , L2 be two limits of a sequence {an }. Let N (L1 ) and N (L2 ) be any
neighborhoods. Since an ∈ N (L1 ) ∩ N (L2 ) for all sufficiently large n, N (L1 ) and
N (L2 ) are not disjoint. Since this is true for all such N (L1 ) and N (L2 ), we must
have L1 = L2 by the Hausdorff property, Proposition 10.2.5.
10.3.15 Remark. Proposition 10.3.14 justifies the use of the notation limn→∞ an to denote the (unique) limit L (if it exists). In other words, the relation “an → L as n → ∞” between sequences {an} ∈ E^N and extended reals L ∈ E defines L as a function of {an}. To be more precise, by restricting the relation to the set A ⊂ E^N of sequences that have some limit in E, hence a unique limit, we get a (well defined) function
$$A \to E, \qquad \{a_n\} \mapsto L = \lim_{n\to\infty} a_n.$$
10.3.16 Proposition (Closure is Set of Limits of Sequences). Let S be a subset of E. Then S−, the closure of S in E, is exactly the set of limits of sequences in S.
Proof. If L ∈ E is the limit of a sequence {an } ⊂ S then every neighborhood
N (L) intersects S nontrivially since N (L) contains an for all sufficiently large n.
Hence L ∈ S − .
Conversely, assume L ∈ S−. Take a shrinking sequence {Nn} of neighborhoods of L. Since L ∈ S−, every neighborhood of L intersects S, so each neighborhood Nn contains some point an ∈ S. For each n ∈ N the sequence {ak} so constructed is eventually in Nn (namely, ak ∈ Nk ⊂ Nn for all k > n). Since {Nn} shrinks to L we have L = limn→∞ an.
10.3.17 Corollary. If {an } is a convergent sequence in a closed subset S ⊂ R
then limn→∞ an ∈ S.
Proof. By Proposition 10.3.16, L = limn→∞ an ∈ S − where S − is the closure of S
in E. Since S is closed in R we obviously have S− ⊂ S ∪ {±∞} (S already contains all of its real limit points). Since {an} is convergent, we have L ≠ ±∞, hence L ∈ S− \ {±∞} = S.
10.3.18 Remark. Not every bounded sequence converges. For example, the
sequence {(−1)n } is bounded, but does not converge. However, every convergent
sequence is bounded, as shown below.
10.3.19 Proposition. Every convergent sequence is bounded.
Proof. Suppose the sequence {an } is convergent. Then, there is some N and some
L ∈ R such that
n > N ⇒ |an − L| < 1.
Thanks to the above and the triangle inequality,
n > N ⇒ |an | = |(an − L) + L| ≤ |an − L| + |L| < 1 + |L|.
In other words, |an | < 1 + |L| for all sufficiently large n.

What about the rest of the terms? These do not present a problem: Define
B = max{|a0 |, . . . , |aN |}.

We then have
n ≤ N ⇒ |an| ≤ B.
Therefore, |an | ≤ max{1 + |L|, B} for all n, as required.
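The bound constructed in the proof above can be computed directly once an N with |an − L| < 1 for n > N is known. The minimal Python sketch below uses the hypothetical example an = 2 + (−1)^n · 5/(n + 1), which converges to L = 2. Since |an − 2| = 5/(n + 1) < 1 exactly when n > 4, the choice N = 4 works. The sketch checks the resulting bound only on finitely many terms.

```python
def a(n):
    # Hypothetical example: a_n = 2 + (-1)^n * 5/(n + 1), converging to L = 2.
    return 2 + (-1) ** n * 5 / (n + 1)

def bound_from_proof(a, L, N):
    """max{1 + |L|, B} with B = max{|a_0|, ..., |a_N|}, as in the proof above.

    Valid only if |a_n - L| < 1 for every n > N, which must be shown separately.
    """
    B = max(abs(a(n)) for n in range(N + 1))
    return max(1 + abs(L), B)

bound = bound_from_proof(a, 2, 4)
print(bound)   # 7.0, attained by a_0 = 2 + 5
```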
10.3.20 Corollary. Every unbounded sequence diverges.
Proof. Let {an } be a sequence. Proposition 10.3.19 above shows that
{an } converges ⇒ {an } is bounded.
Therefore, by contraposition,
{an } is unbounded ⇒ ¬({an } converges).
10.4 Subsequences and Subsequential Limits

10.4.1 Remark. While a bounded sequence is not necessarily convergent, we
have the next best thing to it: every bounded sequence has a convergent subse-
quence, as we show below. (See the Bolzano-Weierstrass Theorem 10.4.11.)
10.4.2 Definition. Subsequences An increasing sequence of natural numbers
is a sequence of natural numbers {n0 , n1 , n2 , . . .} such that n0 < n1 < n2 < · · ·
(note the inequalities are all strict). Given a sequence {an }, any sequence of the
form $\{a_{n_k}\}_{k=0}^{\infty} = \{a_{n_0}, a_{n_1}, a_{n_2}, \dots\}$, where {nk}k∈N is any increasing sequence of natural numbers, is called a subsequence of {an}n∈N. Trivially, every sequence is a subsequence of itself (by taking nk = k for all k ∈ N). A subsequential limit of {an} is the (extended real) limit of any subsequence of {an}.
The set of all extended real subsequential limits of {an} will be denoted by SeqLim{an} or by SeqLimn→∞ an, that is
$$L \in \operatorname{SeqLim}\{a_n\} \iff \lim_{k\to\infty} a_{n_k} = L \text{ for some subsequence } \{a_{n_k}\} \text{ of } \{a_n\}.$$
10.4.3 Remark. If {nk} is an increasing sequence in N then it easily follows by induction that nk ≥ n0 + k for all k ∈ N. This has an immediate but very useful
consequence: For any N there exists K such that k > K ⇒ nk > N .
10.4.4 Example. A sequence need not be convergent nor have a limit in E in order to have a subsequential limit. For example
$$\{a_n\}_{n=0}^{\infty} = \{1, 2, 1/2, 1, 3, 1/3, 1, 4, 1/4, \dots\}$$
is an unbounded sequence with subsequential limits 0 and 1 as well as +∞. Two convergent subsequences are
$$\{b_k\}_{k=0}^{\infty} = \{a_{3k}\}_{k=0}^{\infty} = \{a_0, a_3, a_6, \dots\} = \{1, 1, 1, \dots\} \quad\text{with limit } 1$$
and
$$\{c_k\}_{k=0}^{\infty} = \{a_{3k+2}\}_{k=0}^{\infty} = \{a_2, a_5, a_8, \dots\} = \{1/2, 1/3, 1/4, \dots\} \quad\text{with limit } 0.$$
The divergent subsequence
$$\{d_k\}_{k=0}^{\infty} = \{a_{3k+1}\}_{k=0}^{\infty} = \{a_1, a_4, a_7, \dots\} = \{2, 3, 4, \dots\}$$
has limit +∞.
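The sequence of Example 10.4.4 has a closed form by residue of n modulo 3: writing n = 3k + r, we get a_{3k} = 1, a_{3k+1} = k + 2 and a_{3k+2} = 1/(k + 2).

The Python sketch below encodes this with exact fractions and extracts the three subsequences bk, ck, dk. The closed form is our reading of the listed terms, and the function name a is ours.

```python
from fractions import Fraction

def a(n):
    """a_n for the sequence 1, 2, 1/2, 1, 3, 1/3, 1, 4, 1/4, ... of Example 10.4.4."""
    k, r = divmod(n, 3)          # n = 3k + r with r in {0, 1, 2}
    if r == 0:
        return Fraction(1)       # b_k = a_{3k} = 1
    if r == 1:
        return Fraction(k + 2)   # d_k = a_{3k+1} = k + 2
    return Fraction(1, k + 2)    # c_k = a_{3k+2} = 1/(k + 2)

print([str(a(n)) for n in range(9)])
# ['1', '2', '1/2', '1', '3', '1/3', '1', '4', '1/4']

b = [a(3 * k) for k in range(5)]        # constant 1
c = [a(3 * k + 2) for k in range(5)]    # 1/2, 1/3, ... tending to 0
d = [a(3 * k + 1) for k in range(5)]    # 2, 3, ... tending to +inf
```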
10.4.5 Proposition. If limn→∞ an = L then limk→∞ ank = L for all subsequences
{ank } of {an }.
Proof. Let limn→∞ an = L and {ank } be any subsequence of {an }. Let N be any
neighborhood of L. There exists M ∈ N such that n > M ⇒ an ∈ N . Since
nk ≥ k for all k ∈ N, we have k > M ⇒ nk ≥ k > M ⇒ ank ∈ N. This proves
that limk→∞ ank = L.
The converse implication is obvious since {an } is a subsequence of itself.
10.4.6 Proposition. Let S ⊂ E. Then the closure S − of S in E is exactly the set
of subsequential limits of sequences in S.
Proof. If {an } ⊂ S then every subsequence {ank } is also in S. If such subsequence
has a limit L, then L is in S − by Proposition 10.3.16.
Conversely, if L ∈ S − , then (by Proposition 10.3.16) L is the limit of a sequence
in S, hence a fortiori also a subsequential limit thereof.
10.4.7 Proposition. An extended real number L is a subsequential limit of a
sequence {an } if and only if it is a subsequential limit of any and all of the tails
of {an } (a sequence and all its tails have the same subsequential limits). In other
words, the property “L is a subsequential limit of {an }” is final.
Proof. Purely cosmetic variation of the proof of 10.3.7 safely left to the reader.
10.4.8 Notation. Let {P (n)}n∈N be a sequence of sentences in the language of
the real numbers. We say that P holds frequently or infinitely often, or P (n) holds
for arbitrarily large n, if the set {n ∈ N | P (n)} is infinite, hence unbounded: for
any M ∈ N there exists n ∈ N such that n > M and P (n) holds.
A very common application of the above nomenclature is the following: If {an }
is a sequence and S ⊂ E we say that {an } is frequently (or infinitely often) in S if
{n ∈ N | an ∈ S} is infinite (i.e., unbounded): for all M ∈ N there exists n ∈ N
such that n > M and an ∈ S (that is, an ∈ S for arbitrarily large n).
10.9 Exercise. Let S be a set. Show that a sequence is frequently in S if and only
if it has a subsequence in S.
10.4.9 Proposition. A point L ∈ E is a subsequential limit of {an } if and only
if, for every neighborhood N (L), {an } is frequently in N (L).
Proof. If L = limk→∞ ank and N(L) is any neighborhood of L then there exists K such that ank ∈ N(L) for all k > K. The tail $\{a_{n_{k+K}} \mid k \in \mathbb{N}^+\}$ is a subsequence of {an} in N(L). Hence {an} is in N(L) infinitely often (this is the backward (easy) implication of Exercise 10.9).
Conversely, let L ∈ E such that every N (L) contains terms an for arbitrarily
large n. Let {Nn } be any shrinking sequence of neighborhoods of L. (For instance,
Nn = N1/n (L).)
Define an increasing sequence {nk } ⊂ N recursively as follows:
• Take n0 to be any integer n such that an ∈ N0 .
• For any k ∈ N, given nk , let nk+1 be any integer n such that n > nk and
an ∈ Nk+1 .
The hypothesis “an ∈ Nk for arbitrarily large n” ensures that n0, n1, n2, . . . may be successively chosen.
Since the sequence {Nk } shrinks to L, it is a basis of neighborhoods of L. Since
ank ∈ Nk for all k ∈ N, we have L = limk→∞ ank , by Exercise 10.6.
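The recursive choice of n0 < n1 < n2 < · · · in the proof above can be sketched in Python. The sketch makes the greedy choice, always taking the smallest admissible index, which is one valid instance of "any integer n" in the text.

The names extract_subsequence and in_nbhd are hypothetical. The search is capped so that the sketch always terminates, whereas the hypothesis of Proposition 10.4.9 is what guarantees success in the mathematical argument.

The example takes an = (−1)^n + 1/(n + 1) and L = 1, with the shrinking neighborhoods Nk = (1 − 1/(k + 1), 1 + 1/(k + 1)).

```python
def extract_subsequence(a, in_nbhd, count, search_limit=10**6):
    """Greedily pick indices n_0 < n_1 < ... < n_{count-1} with a(n_k) in N_k.

    in_nbhd(k, x) tests x ∈ N_k; the search is capped at search_limit.
    """
    indices = []
    n = 0                                # smallest candidate for the next index
    for k in range(count):
        while not in_nbhd(k, a(n)):
            n += 1
            if n > search_limit:
                raise RuntimeError("no suitable index found below search_limit")
        indices.append(n)
        n += 1                           # enforce n_{k+1} > n_k
    return indices

def a(n):
    return (-1) ** n + 1 / (n + 1)      # even terms tend to 1, odd terms to -1

def in_nbhd(k, x):
    return abs(x - 1) < 1 / (k + 1)     # N_k shrinks to L = 1

print(extract_subsequence(a, in_nbhd, 5))   # [2, 4, 6, 8, 10]
```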
10.4.10 Proposition. Let {an } be a sequence. The set SeqLim{an } ⊂ E of
subsequential limits of {an } is closed.
Proof. Let A = SeqLim{an }. Let L ∈ A− . We will prove L ∈ A, showing that A
is closed.
Let N be any open neighborhood of L. Since L ∈ A−, there exists l ∈ A ∩ N. Since N is open and l ∈ A, N is a neighborhood of l, so {an} is frequently in N, by Proposition 10.4.9 applied to the subsequential limit l. Since N is an arbitrary open neighborhood of L, and every neighborhood of L includes an open one, we conclude that L itself is a subsequential limit of {an} by Proposition 10.4.9.
10.4.11 Theorem (Bolzano-Weierstrass). Every sequence in E has at least
one subsequential limit in E. (I.e., the set SeqLim{an } of subsequential limits of
a sequence {an } is not empty.) In particular, a bounded sequence in R has a
convergent subsequence.
(This version of the Bolzano-Weierstrass theorem amounts to a slight generaliza-
tion of Theorem 9.4.21. The reader is invited to provide a proof of the present result
emulating that one: See Exercise 10.10 below. We also remark that, in the termi-
nology introduced in Exercise 10.11 below, this theorem states that (i) bounded
subsets of R, and also (ii) arbitrary subsets of the extended reals E are relatively
sequentially compact.)
Proof. Note that the second statement follows from the first since obviously a
bounded sequence {an } can only have bounded (hence real) subsequential limits,
hence {an } must have a convergent subsequence provided it has any subsequential
limits at all.
In what follows, let {an} be a sequence in E. For each l ∈ N let {an+l} be the l-th tail of {an}. Let Bl be the closure of the range of {an+l} (a nonempty, closed subset of E). The range of {an+l+1} is a subset of the range of {an+l}, hence Bl ⊃ Bl+1: in other words, B0 ⊃ B1 ⊃ . . . is a nested sequence of nonempty closed subsets of E. In particular, B0 is compact (by Heine-Borel, Theorem 10.2.9), and the sequence {Bl} consists of closed subsets of a compact set. Since the sets are nested, any finite subfamily Bl1, . . . , Blm has intersection equal to Bl with l = max{l1, . . . , lm}, which is nonempty; so {Bl} has the FIP. Let $I = \bigcap_{l\in\mathbb{N}} B_l$. By the FIP characterization of compactness we conclude that I ≠ ∅.
Let L ∈ I. By Proposition 10.4.9, the proof will be complete once we prove that {an} is frequently in any neighborhood N of L. Given such N and an arbitrary natural number M we have L ∈ BM+1, so N meets the range of {an+M+1}, by definition of closure. Hence, there exists m ∈ N with am+M+1 ∈ N. Letting n = m + M + 1 we have n > M and an ∈ N. We have shown that {an} is frequently (infinitely often) in an arbitrary neighborhood N of L, so L is a subsequential limit of {an}.
10.4.12 Remark. The proof above shows $\bigcap_{l} B_l \subset \operatorname{SeqLim}\{a_n\}$. Actually, equality holds. See Theorem 10.9.17.
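The nested sets Bl of the proof of Theorem 10.4.11 can be glimpsed numerically. By Proposition 10.2.8, max Bl = sup Bl is the supremum of the l-th tail.

Take the hypothetical example an = (−1)^n (1 + 1/(n + 1)), whose subsequential limits are ±1. Its even-indexed terms decrease, so each tail supremum is attained at the first even index ≥ l. It can therefore be computed exactly from a finite prefix.

The Python sketch below uses exact fractions. In this example the values max Bl decrease towards 1, which lies in every Bl. The names a, PREFIX and tail_sup are ours.

```python
from fractions import Fraction

def a(n):
    # a_n = (-1)^n (1 + 1/(n+1)): subsequential limits +1 (even n) and -1 (odd n)
    return (-1) ** n * (1 + Fraction(1, n + 1))

PREFIX = 200   # only finitely many terms are available to a computer

def tail_sup(l):
    """max B_l = sup of the l-th tail, computed over a_l, ..., a_{PREFIX-1}."""
    return max(a(n) for n in range(l, PREFIX))

sups = [tail_sup(l) for l in range(8)]
print([str(s) for s in sups])
# ['2', '4/3', '4/3', '6/5', '6/5', '8/7', '8/7', '10/9']
```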
10.10 Exercise. Provide an alternate proof of Theorem 10.4.11 along the lines of
that given for the Bolzano Weierstrass theorem for bounded infinite subsets of R,
Theorem 9.4.21. Alternatively, use 9.4.21 to prove 10.4.11.
10.11 Exercise. A subset $S \subset \mathbb{R}$ is called relatively sequentially compact if any sequence $\{a_n\} \subset S$ has a convergent subsequence. A subset $K \subset \mathbb{R}$ is called sequentially compact if any sequence $\{a_n\} \subset K$ has a convergent subsequence whose (unique) limit is in $K$.
1. Prove that a subset $S \subset \mathbb{R}$ is relatively sequentially compact if and only if its closure $S^-$ (in $\mathbb{R}$) is sequentially compact.
2. Prove that a subset of $\mathbb{R}$ is relatively sequentially compact if and only if it is bounded, and sequentially compact if and only if it is closed and bounded (i.e., for subsets of $\mathbb{R}$ sequential compactness is the same property as compactness).
10.12 Exercise. A subset $S \subset E$ is called relatively sequentially compact if any sequence $\{a_n\} \subset S$ has a subsequential limit $L \in E$. A subset $K \subset E$ is called sequentially compact if any sequence $\{a_n\} \subset K$ has a subsequential limit $L \in K$.
1. Prove that every subset $S \subset E$ is relatively sequentially compact.
2. Prove that a subset $S \subset E$ is sequentially compact if and only if it is closed.
In particular, for a subset $S \subset E$, sequential compactness is the same as compactness, which in $E$ is merely the property of being closed in $E$.
(Note that an unbounded subset $S$ of $\mathbb{R}$ has compact closure in $E$, while the closure of $S$ in $\mathbb{R}$ is certainly not compact.)
10.13 Exercise. Let S ⊂ E be closed (hence compact). If a sequence is frequently
in S show that it has a subsequential limit in S.
10.4.13 Proposition. A sequence $\{a_n\}$ has a unique subsequential limit $L \in E$ if and only if $\lim_{n\to\infty} a_n = L$.
Proof. Let $\{a_n\}$ have $L \in E$ as its only subsequential limit. Let $N$ be any open neighborhood of $L$. Then $K = E \setminus N$ is a closed, hence compact, subset of $E$. The sequence $\{a_n\}$ cannot be frequently in $K$: otherwise it would have a subsequence $\{a_{n_k}\}$ in $K$, and hence a subsequential limit in $K$ by Exercise 10.12, contradicting the fact that its unique subsequential limit $L$ is not in $K$. Therefore $\{a_n\}$ is eventually in $E \setminus K = N$. Since this holds for any open neighborhood $N$ of $L$, we have shown $\lim_{n\to\infty} a_n = L$.
The converse implication follows from Proposition 10.4.5.
10.4.14 Corollary. A sequence has no extended real limit if and only if it has at
least two different subsequential limits in E.
Proof. Any sequence $\{a_n\}$ has at least one subsequential limit in $E$, by Bolzano-Weierstrass’s theorem 10.4.11. If $\{a_n\}$ has no extended real limit, it must therefore have at least two subsequential limits, by Proposition 10.4.13. Conversely, by the same proposition, if $\{a_n\}$ has at least two subsequential limits, then it has no limit.
10.5 Cauchy Sequences

10.5.1 Definition. The sequence $\{a_n\}$ is called a Cauchy sequence, or is said to be Cauchy, if, given any $\epsilon > 0$, there is some $N$ such that
$$n, m > N \implies |a_n - a_m| < \epsilon.$$
The phrase “there is some N such that for m, n > N ” can be referred to as “for all
sufficiently large m and n.” Be careful, though: m and n must be able to become
arbitrarily large independently of each other. To see what this means, consider the
following example.
10.5.2 Example. Consider the sequence $\{a_n\}_{n=1}^\infty = \{\log_b n\}_{n=1}^\infty$, for any fixed $b > 1$ (say, $b = 2$ for definiteness). For any $m \le n$ we have
$$|a_n - a_m| = |\log_b n - \log_b m| = \log_b\Big(\frac{n}{m}\Big) = \log_b\Big(1 + \frac{n-m}{m}\Big).$$
Consider what happens when $m, n$ are further constrained to be at most a bounded distance apart, say $0 \le n - m \le K$ for some fixed $K > 0$. For $b > 1$, the function $\log_b$ is strictly increasing and moreover $\inf_{x>1}\log_b(x) = \log_b 1 = 0$ (see Note 8). Hence, for any $\epsilon > 0$ there exists $X > 0$ such that $0 \le \log_b(1 + x) < \epsilon$ for $0 \le x < X$ (in fact, we may take $X = \exp_b(\epsilon) - 1 = b^\epsilon - 1$). Let $N = K/X$. As long as $m > N$ we will have $0 \le \log_b(n/m) = \log_b\big(1 + (n-m)/m\big) < \epsilon$, since $0 \le (n-m)/m \le K/m < K/N = X$. Thus,
$$0 \le n - m \le K \ \wedge\ m > N \implies |a_n - a_m| < \epsilon.$$
In particular, given $N$ and any $\epsilon > 0$ there always exist distinct terms $a_m, a_n$ with $m, n > N$ such that $|a_n - a_m| < \epsilon$. However, this is not enough to conclude that the sequence $\{a_n\}$ is Cauchy; indeed it is not. In fact, in the case at hand the Cauchy property fails not just barely, but in a spectacular fashion. For it to fail, it would suffice to find a single positive $\epsilon$ (presumably small) for which the property is not met; in this case, however, the property fails for every $\epsilon > 0$ whatsoever, no matter how large! Indeed, given any $\epsilon > 0$ (say $\epsilon = 1$, or $\epsilon = 10^{10^{100}}$ for that matter) and any $N$, choose $m > \max\{N, 1\}$ and any $n > b^\epsilon m$. Then $n > m > N$ and $|a_n - a_m| > \log_b b^\epsilon = \epsilon$. In other words, there exist arbitrarily large $m, n$ such that $a_m, a_n$ are arbitrarily far apart, so our earlier calculation that $a_m, a_n$ may be close together with $m, n$ large was rather misleading, and did not at all prove the Cauchy property.
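As a numerical sanity check, here is a minimal Python sketch of Example 10.5.2 with $b = 2$. The function name `a` and the particular values of $K$, $m$ and $\epsilon$ are our own illustrative choices. It shows terms a fixed distance $K$ apart getting closer as $m$ grows, while choosing $n > b^\epsilon m$ always yields $a_n - a_m > \epsilon$.

```python
import math

B = 2.0  # the base b > 1 ("b = 2 for definiteness", as in the example)

def a(n: int) -> float:
    """The n-th term a_n = log_b(n), for n >= 1."""
    return math.log(n, B)

# Terms at most K apart get close once m is large (here K = 5).
K = 5
close_gaps = [a(m + K) - a(m) for m in (10, 1_000, 1_000_000)]

# ...but for any eps and any m, taking n > b**eps * m forces a_n - a_m > eps.
eps = 3.0
far_gaps = []
for m in (10, 1_000, 1_000_000):
    n = math.floor(B**eps * m) + 1  # smallest integer n > b^eps * m
    far_gaps.append(a(n) - a(m))

print(close_gaps)  # shrinking towards 0
print(far_gaps)    # every entry exceeds eps
```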
10.5.3 Proposition. Every convergent sequence is Cauchy.
Proof. Suppose that $\{a_n\}$ is a convergent sequence. Then there is some $L$ such that, given any $\epsilon > 0$, there is an $N$ such that
$$n > N \implies |a_n - L| < \frac{\epsilon}{2}.$$
Therefore,
$$m, n > N \implies |a_m - a_n| = |(a_m - L) + (L - a_n)| \le |a_m - L| + |L - a_n| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$
showing that the sequence is Cauchy.
10.5.4 Proposition. Every Cauchy sequence is bounded.
Proof. Suppose that $\{a_n\}$ is a Cauchy sequence. Then there is some $N$ such that
$$m, n > N \implies |a_m - a_n| < 1.$$
Therefore,
$$n > N \implies |a_n| = |(a_n - a_{N+1}) + a_{N+1}| \le |a_n - a_{N+1}| + |a_{N+1}| < 1 + |a_{N+1}|,$$
showing that the tail $\{a_n\}_{n>N}$ is bounded. Define $B = \max\{|a_0|, |a_1|, \ldots, |a_N|\}$. Then $|a_n| \le B + |a_{N+1}| + 1$ for all $n$, showing that the sequence is bounded.
10.5.5 Theorem (Cauchy Principle). A sequence is Cauchy if and only if it is
convergent.
Proof. Let {an } be a sequence. Thanks to Proposition 10.5.3, the convergence of
this sequence implies that the sequence is Cauchy.
Now we must prove that if $\{a_n\}$ is Cauchy, then it is convergent. By Bolzano-Weierstrass’s theorem 10.4.11, $\{a_n\}$ has at least one subsequential limit. Since $\{a_n\}$ is bounded by Proposition 10.5.4, all subsequential limits of $\{a_n\}$ are real. Let $L_1, L_2 \in \operatorname{SeqLim}\{a_n\}$ and let $\epsilon > 0$ be given. By the Cauchy hypothesis, there exists $M \in \mathbb{N}$ such that $m, n > M \implies |a_n - a_m| < \epsilon/3$. Since $L_1, L_2$ are subsequential limits of $\{a_n\}$, there exist $m, n > M$ such that $|a_m - L_1| < \epsilon/3$ and $|a_n - L_2| < \epsilon/3$ ($\{a_n\}$ is frequently in any neighborhood of $L_1$ and of $L_2$, by Proposition 10.4.9). Hence
$$|L_2 - L_1| = |L_2 - a_n + a_n - a_m + a_m - L_1| \le |L_2 - a_n| + |a_n - a_m| + |a_m - L_1| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.$$
Since $\epsilon > 0$ is arbitrary, we conclude $L_1 = L_2$, hence $\{a_n\}$ has a unique subsequential limit $L_1 \in \mathbb{R}$. By Proposition 10.4.13, $\{a_n\}$ converges (to $L_1$).
10.5.6 Remark. A Cauchy sequence in $\mathbb{R}$ is necessarily convergent, whereas a Cauchy sequence in $\mathbb{Q}$ need not converge to a point of $\mathbb{Q}$. (See Exercise 10.14 below.) Let us call a set $S$ Cauchy-complete if every Cauchy sequence in $S$ converges to a point in $S$. (This notion makes sense in the much more general context of metric spaces, which need not be subsets of $\mathbb{R}$.) Then, in a sense, the completeness of $\mathbb{R}$ is equivalent to its Cauchy completeness together with the Archimedean property. To be more precise, $\mathbb{R}$ may be characterized, up to isomorphism, as the only ordered field which is both Archimedean and Cauchy-complete.
10.14 Exercise. Construct a sequence $\{a_n\}$ recursively as follows. Fix $a_0 \in \mathbb{Q}^+$ and, recursively, let $a_{n+1} = (a_n + 2/a_n)/2$. Show that $\{a_n\}$ is Cauchy and has the limit $\sqrt{2}$ (that is, the limit $L$ is positive and satisfies $L^2 = 2$). Recall that no rational number $L$ has that property; hence $\mathbb{Q}$ is Cauchy-incomplete.
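The following Python sketch (an illustration only, not a solution of the exercise) runs the recursion of Exercise 10.14, which is Newton’s method for $x^2 = 2$, in exact rational arithmetic using the standard `fractions` module, with the arbitrary choice $a_0 = 1$. Every term stays in $\mathbb{Q}$, while $a_n^2 - 2$ becomes tiny; the function name `newton_sqrt2` is ours.

```python
from fractions import Fraction

def newton_sqrt2(a0: Fraction, steps: int) -> list:
    """Return [a_0, ..., a_steps] for a_{n+1} = (a_n + 2/a_n)/2, with a0 > 0."""
    terms = [a0]
    for _ in range(steps):
        a = terms[-1]
        terms.append((a + 2 / a) / 2)  # only rational operations: stays in Q
    return terms

terms = newton_sqrt2(Fraction(1), 5)
for n, t in enumerate(terms):
    # each a_n is rational, yet a_n^2 - 2 shrinks rapidly toward 0
    print(n, t, float(t * t - 2))
```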
10.6 Some Convergence Results

10.6.1 Theorem. Suppose $\{a_n\}$ and $\{b_n\}$ are real sequences such that
$$\lim_{n\to\infty} a_n = L_1 \in E \quad\text{and}\quad \lim_{n\to\infty} b_n = L_2 \in E.$$
Then
(1) $\lim_{n\to\infty}(c + a_n) = c + L_1$ for any $c \in \mathbb{R}$;
(2) $\lim_{n\to\infty} c a_n = c L_1$ for any $c \in \mathbb{R}$ (here we interpret $0(\pm\infty)$ as $0$);
(3) $\lim_{n\to\infty}(a_n + b_n) = L_1 + L_2$ if $L_1 + L_2$ is defined (i.e., it is not the sum of $+\infty$ and $-\infty$);
(4) $\lim_{n\to\infty}(a_n - L_1)(b_n - L_2) = 0$ if $L_1, L_2 \in \mathbb{R}$;
(5) $\lim_{n\to\infty} a_n b_n = L_1 L_2$ if $L_1 L_2$ is defined (i.e., it is not the product of $0$ with $\pm\infty$);
(6) $\lim_{n\to\infty} a_n/b_n = L_1/L_2$ provided $L_1/L_2$ is defined (i.e., at least one of $L_1, L_2$ is not $\pm\infty$, and $L_2 \neq 0$);
(6a) $\lim_{n\to\infty} a_n/b_n = +\infty$ if $L_1 > L_2 = 0$ and $b_n > 0$ for all $n$, or else $L_1 < L_2 = 0$ and $b_n < 0$ for all $n$;
(6b) $\lim_{n\to\infty} a_n/b_n = -\infty$ if $L_1 > L_2 = 0$ and $b_n < 0$ for all $n$, or else $L_1 < L_2 = 0$ and $b_n > 0$ for all $n$;
(7) $\lim_{n\to\infty} |a_n| = |L_1|$;
(8) $L_1 \le L_2$, provided $a_n \le b_n$ for all $n \in \mathbb{N}$.
Proof. The proofs below cover all cases involving only real numbers and their arithmetic. The reader is asked to provide the modifications needed to deal with extended reals.
We prove (1). Given $\epsilon > 0$ there is some $N$ such that
$$n > N \implies |a_n - L_1| < \epsilon.$$
Therefore, we have
$$n > N \implies |(c + a_n) - (c + L_1)| = |a_n - L_1| < \epsilon.$$
We prove (2). If $c = 0$, there is nothing to prove. Otherwise, given $\epsilon > 0$ there is some $N$ such that
$$n > N \implies |a_n - L_1| < \frac{\epsilon}{|c|}.$$
Therefore, we have
$$n > N \implies |c a_n - c L_1| = |c|\,|a_n - L_1| < |c|\,\frac{\epsilon}{|c|} = \epsilon.$$
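We prove (3). Given $\epsilon > 0$ there are $N_1$ and $N_2$ such that
$$n > N_1 \implies |a_n - L_1| < \frac{\epsilon}{2} \quad\text{and}\quad n > N_2 \implies |b_n - L_2| < \frac{\epsilon}{2}.$$
For $N = \max\{N_1, N_2\}$ we have
$$n > N \implies |(a_n + b_n) - (L_1 + L_2)| \le |a_n - L_1| + |b_n - L_2| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$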
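We prove (4). Given $\epsilon > 0$ there are $N_1$ and $N_2$ such that
$$n > N_1 \implies |a_n - L_1| < \sqrt{\epsilon} \quad\text{and}\quad n > N_2 \implies |b_n - L_2| < \sqrt{\epsilon}.$$
Therefore, with $N = \max\{N_1, N_2\}$ we have
$$n > N \implies |(a_n - L_1)(b_n - L_2)| < \sqrt{\epsilon}\,\sqrt{\epsilon} = \epsilon.$$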
We prove (5). Thanks to (1), $\{a_n - L_1\}$ and $\{b_n - L_2\}$ converge to $0$. Thanks to (2), $\{L_2(a_n - L_1)\}$ and $\{L_1(b_n - L_2)\}$ also converge to $0$. Consider the identity
$$a_n b_n - L_1 L_2 = (a_n - L_1)(b_n - L_2) + L_1(b_n - L_2) + L_2(a_n - L_1). \tag{9}$$
Apply (4) to the sequence $\{(a_n - L_1)(b_n - L_2)\}$ to obtain
$$\lim_{n\to\infty}(a_n - L_1)(b_n - L_2) = 0.$$
Apply (3) twice to (9) to obtain
$$\begin{aligned}\lim_{n\to\infty}(a_n b_n - L_1 L_2) &= \lim_{n\to\infty}\big[(a_n - L_1)(b_n - L_2) + L_1(b_n - L_2)\big] + \lim_{n\to\infty} L_2(a_n - L_1)\\ &= \lim_{n\to\infty}(a_n - L_1)(b_n - L_2) + \lim_{n\to\infty} L_1(b_n - L_2) + \lim_{n\to\infty} L_2(a_n - L_1)\\ &= 0.\end{aligned}$$
Hence $\{a_n b_n - L_1 L_2\}$ converges to $0$, and by (1) (with $c = L_1 L_2$) we conclude that $\{a_n b_n\}$ converges to $L_1 L_2$.

We prove (6). Because $b_n$ appears in the denominator, we need to bound it away from zero. Since $\{b_n\}$ converges to $L_2$ and $L_2 \neq 0$, there is some $N_1$ such that
$$n > N_1 \implies |b_n - L_2| < \tfrac{1}{2}|L_2|.$$
Therefore,
$$n > N_1 \implies |L_2| = |(L_2 - b_n) + b_n| \le |b_n - L_2| + |b_n| < \tfrac{1}{2}|L_2| + |b_n|,$$
whence
$$n > N_1 \implies |b_n| > \tfrac{1}{2}|L_2|.$$
In particular $b_n \neq 0$, so $a_n/b_n$ is defined, for $n > N_1$. Thanks to (2) and (3) the sequence $\{L_2 a_n - L_1 b_n\}$ converges to $L_2 L_1 - L_1 L_2 = 0$. Therefore, given any $\epsilon > 0$, there is some $N_2$ such that
$$n > N_2 \implies |L_2 a_n - L_1 b_n| < \frac{\epsilon L_2^2}{2}.$$
For $N = \max\{N_1, N_2\}$ we have
$$n > N \implies \left|\frac{a_n}{b_n} - \frac{L_1}{L_2}\right| = \left|\frac{L_2 a_n - L_1 b_n}{b_n L_2}\right| = \frac{|L_2 a_n - L_1 b_n|}{|b_n|\,|L_2|} < \frac{2\,|L_2 a_n - L_1 b_n|}{L_2^2} < \epsilon.$$
We prove (7). Given any $\epsilon > 0$ there is some $N$ such that
$$n > N \implies |a_n - L_1| < \epsilon.$$
Therefore,
$$n > N \implies \big||a_n| - |L_1|\big| \le |a_n - L_1| < \epsilon$$
by the (reverse) triangle inequality.

We prove (8). We prove the contrapositive: if $L_1 > L_2$, then for all $N$ there exists $n > N$ such that $a_n > b_n$. In fact, we prove the stronger statement that $a_n > b_n$ for all sufficiently large $n$.
Let $L_1 > L_2$ and $\epsilon = (L_1 - L_2)/2 > 0$. There exist $N_1, N_2$ such that $n > N_1 \implies |a_n - L_1| < \epsilon$ and $n > N_2 \implies |b_n - L_2| < \epsilon$. Let $N = \max\{N_1, N_2\}$. Then
$$n > N \implies b_n < L_2 + \epsilon = \frac{L_1 + L_2}{2} = L_1 - \epsilon < a_n.$$
10.7 Contractive Sequences

10.7.1 Remark. Below we introduce a class of sequences that plays an impor-
tant role in many iterative methods (computer techniques used to solve nonlinear
problems).
10.7.2 Definition. The real sequence $\{a_n\}$ is said to be contractive if there is a constant $\gamma$, $0 \le \gamma < 1$, such that the inequality
$$|a_{n+2} - a_{n+1}| \le \gamma\,|a_{n+1} - a_n| \tag{10.1}$$
holds for all $n \in \mathbb{N}$.
10.7.3 Theorem (The Discrete Contraction Principle). Every contractive
sequence is convergent.
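Proof. Suppose that $\{a_n\}$ is a contractive sequence. If $a_0 = a_1$ then $\{a_n\}$ is obviously constant and thus convergent. We assume $a_1 \neq a_0$ in what follows. We have
$$\begin{aligned} |a_2 - a_1| &\le \gamma|a_1 - a_0|\\ |a_3 - a_2| &\le \gamma|a_2 - a_1| \le \gamma^2|a_1 - a_0|\\ |a_4 - a_3| &\le \gamma|a_3 - a_2| \le \gamma^3|a_1 - a_0|\\ &\;\;\vdots\\ |a_{n+1} - a_n| &\le \gamma|a_n - a_{n-1}| \le \gamma^n|a_1 - a_0|. \end{aligned}$$
We show that $\{a_n\}$ is Cauchy. For nonnegative $j$ we have
$$\begin{aligned} |a_{n+j} - a_n| &= |(a_{n+j} - a_{n+j-1}) + (a_{n+j-1} - a_{n+j-2}) + \cdots + (a_{n+1} - a_n)|\\ &\le |a_{n+j} - a_{n+j-1}| + |a_{n+j-1} - a_{n+j-2}| + \cdots + |a_{n+1} - a_n|\\ &\le |a_1 - a_0|\,(\gamma^{n+j-1} + \gamma^{n+j-2} + \cdots + \gamma^n)\\ &= |a_1 - a_0|\,\frac{\gamma^n - \gamma^{n+j}}{1 - \gamma}\\ &\le \frac{\gamma^n}{1 - \gamma}\,|a_1 - a_0|. \end{aligned}$$
The calculation above shows that $m, n > N \implies |a_n - a_m| \le \gamma^N|a_1 - a_0|/(1 - \gamma)$. Note that this last upper bound depends only on $N$ and tends to $0$ as $N \to \infty$. Therefore the sequence $\{a_n\}$ is Cauchy, hence convergent by the Cauchy Principle (Theorem 10.5.5). (To be explicit, if $\gamma > 0$, given $\epsilon > 0$ choose any $N > \log_\gamma\big((1-\gamma)\epsilon/|a_1 - a_0|\big)$; then $\gamma^N < (1-\gamma)\epsilon/|a_1 - a_0|$, so $m, n > N \implies |a_n - a_m| < \epsilon$. If $\gamma = 0$, the sequence is constant from $a_1$ onward.)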
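To see the proof’s explicit choice of $N$ at work, here is a Python sketch using a contractive sequence that does not appear in the text: $a_{n+1} = \cos a_n$ with $a_0 = 1$. On $[0, 1]$ the mean value theorem gives $|\cos x - \cos y| \le \sin(1)\,|x - y|$, and $\cos$ maps $[0, 1]$ into itself, so we may take $\gamma = \sin 1 \approx 0.84$. The helper `iterate` is hypothetical, and floating-point arithmetic only approximates the real sequence.

```python
import math

def iterate(f, a0, count):
    """Return the first `count` terms of the sequence a_{n+1} = f(a_n)."""
    terms = [a0]
    while len(terms) < count:
        terms.append(f(terms[-1]))
    return terms

# Assumed example: cos maps [0, 1] into [cos 1, 1], and there
# |cos x - cos y| <= sin(1) |x - y|, so gamma = sin(1) < 1 works in (10.1).
gamma = math.sin(1.0)
a = iterate(math.cos, 1.0, 200)

# Check (10.1) on early terms, where the differences dwarf rounding error.
ratios = [abs(a[n + 2] - a[n + 1]) / abs(a[n + 1] - a[n]) for n in range(40)]

# The N from the proof: any integer N > log_gamma((1 - gamma) eps / |a_1 - a_0|).
eps = 1e-6
N = math.floor(math.log((1 - gamma) * eps / abs(a[1] - a[0]), gamma)) + 1

# Largest gap |a_m - a_n| among the computed terms with m, n > N.
spread = max(abs(a[m] - a[n]) for m in range(N + 1, 200) for n in range(N + 1, 200))

print(max(ratios) <= gamma, N, spread < eps, a[-1])
```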
10.15 Exercise. If in Definition 10.7.2 we replace (10.1) with
$$|a_{n+2} - a_{n+1}| < |a_{n+1} - a_n| \tag{10.2}$$
for $n \in \mathbb{N}$, does the conclusion of Theorem 10.7.3 hold?
10.16 Exercise. Compute the number
$$c = \sqrt{2 + \sqrt{2 + \sqrt{2 + \cdots}}}.$$
10.7.4 Theorem (Squeeze Principle). If $\{a_n\}$ and $\{c_n\}$ are real sequences with a common limit $L \in E$ and $\{b_n\}$ is a real sequence such that
$$a_n \le b_n \le c_n \quad\text{for all } n \in \mathbb{N},$$
then $\{b_n\}$ converges to $L$ also.
(Note: By the finality of the property “$b_n \to L$ as $n \to \infty$”, it is always enough to assume that the inequalities hold for all sufficiently large $n$.)
Proof. It suffices to show, given an arbitrary convex neighborhood $\mathcal{N}$ of $L$, that $b_n$ is eventually in $\mathcal{N}$. Since $a_n \to L$, $a_n$ is eventually in $\mathcal{N}$, say for all $n > N_1$. Similarly, $c_n$ is eventually in $\mathcal{N}$, say for all $n > N_2$. Let $N = \max\{N_1, N_2\}$. By the assumed inequalities and the convexity of $\mathcal{N}$ we have $n > N \implies b_n \in [a_n, c_n] \subset \mathcal{N}$. We conclude that $b_n \to L$ as $n \to \infty$.
10.8 The Binomial Theorem

We state a common generalization of the Recursion Theorem 5.8.3 and of the Bounded Recursion Theorem 5.8.5.
10.8.1 Theorem (Transfinite Recursion). Let $(I, <)$ be a nonempty well-ordered set. Let $\{A_i \mid i \in I\}$ be a family of sets. For any subset $J \subset I$ let $P_J = \prod_{j\in J} A_j$. For each $i \in I$ let $I(i) = I_{<i}$ be the initial segment $\{j \in I \mid j < i\}$. Conversely, for each proper initial segment $J \subset I$ let $i(J) = \min(I \setminus J)$, and let also a function $f_J : P_J \to A_{i(J)}$ be given. Then there exists a unique function $\varphi \in P_I$ such that
$$\varphi(i) = f_{I(i)}\big(\varphi|_{I(i)}\big) \tag{10.3}$$
for all $i \in I$, where $\varphi|_{I(i)}$ denotes the restriction of $\varphi$ to the subset $I(i)$ of $I$.
We will not present a proof of the above result. We content ourselves with pointing out that the standard proof relies heavily on Zorn’s Lemma, itself a nontrivial consequence of the Axiom of Choice. However, from an intuitive point of view the Transfinite Recursion Theorem simply says that if successive values of a function are specified solely in terms of prior ones, then the overall function may be defined by a recursive process.
10.8.2 Proposition. There exists a unique function φ : N × N → N such that
(10.4) φ(m, 0) = φ(0, n) = 1 for all m, n ∈ N
and
(10.5) φ(m + 1, n + 1) = φ(m, n + 1) + φ(m + 1, n) for all m, n ∈ N.
Proof. Let $\mathbb{N} \times \mathbb{N}$ have the lexicographic ordering
$$(i_1, j_1) \prec (i_2, j_2) \iff i_1 < i_2 \lor (i_1 = i_2 \land j_1 < j_2).$$
This ordering is strict and linear (see Exercise 6.8), and in fact a well-ordering. For each $(i, j) \in I = \mathbb{N} \times \mathbb{N}$ let $I(i, j)$ be the initial segment $\{(a, b) \in I \mid (a, b) \prec (i, j)\}$ and define $f_{ij} : \mathbb{N}^{I(i,j)} \to \mathbb{N}$ by
$$f_{ij}\big(\{n_{ab}\}_{(a,b)\in I(i,j)}\big) = \begin{cases} 1 & \text{if } i = 0 \text{ or } j = 0,\\ n_{(i-1,j)} + n_{(i,j-1)} & \text{if } i > 0 \text{ and } j > 0.\end{cases}$$
The existence and uniqueness of $\varphi$ now follows immediately from the Transfinite Recursion Theorem 10.8.1. (Note that $i, j > 0$ if and only if $i = m + 1$, $j = n + 1$ for some natural numbers $m, n$; thus $m = i - 1$ and $n = j - 1$.)
10.8.3 Remark. We hope that the reader finds the existence of the function φ in
Proposition 10.8.2 rather intuitive. We do not wish to suggest that the full power
of the Transfinite Recursion Theorem is needed to construct the recursively defined
function φ. Far from it. It is perfectly possible, and the reader is invited to try to
find a proof of the existence of φ using the Countable Recursion Theorem 5.8.2.
We use transfinite recursion because it is the natural tool to dispatch the issue at
hand quickly before proceeding to the goal of defining binomial coefficients.
10.8.4 Definition. For natural numbers $n, k$ with $0 \le k \le n$ we define the $(n, k)$-th binomial coefficient
$$\binom{n}{k} = \varphi(n - k, k) \tag{10.6}$$
with $\varphi(a, b)$ defined recursively per Proposition 10.8.2.
More explicitly, the binomial coefficients are defined recursively by
$$\binom{n}{0} = \binom{n}{n} = 1 \quad\text{for all } n \in \mathbb{N}, \tag{10.7}$$
and
$$\binom{n+1}{k+1} = \binom{n}{k} + \binom{n}{k+1} \quad\text{for all } n, k \in \mathbb{N},\ 0 \le k < n. \tag{10.8}$$
10.8.5 Remark. It is often convenient to extend the definition of binomial coefficients to all integers $k \in \mathbb{Z}$ (but still assuming $n \in \mathbb{N}$) by declaring
$$\binom{n}{k} = 0 \quad\text{if } k < 0 \text{ or } k > n. \tag{10.9}$$
With this convention, which we will adopt henceforth, the recurrence equation (10.8) holds for all $n \in \mathbb{N}$ and all $k \in \mathbb{Z}$, as is easily seen.
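The recursion of Proposition 10.8.2 and Definition 10.8.4, together with the zero convention (10.9), can be transcribed directly into Python. This is a sketch: memoization via `functools.lru_cache` merely avoids recomputing values, and the names `phi` and `binom` are ours.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def phi(m: int, n: int) -> int:
    """phi of Proposition 10.8.2: phi(m, 0) = phi(0, n) = 1 and
    phi(m + 1, n + 1) = phi(m, n + 1) + phi(m + 1, n)."""
    if m == 0 or n == 0:
        return 1                                  # (10.4)
    return phi(m - 1, n) + phi(m, n - 1)          # (10.5), indices shifted by one

def binom(n: int, k: int) -> int:
    """Binomial coefficient (10.6), extended by zero outside 0 <= k <= n as in (10.9)."""
    if k < 0 or k > n:
        return 0
    return phi(n - k, k)

for n in range(6):
    print([binom(n, k) for k in range(n + 1)])   # rows of Pascal's Triangle
```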
10.8.6 Remark. Often, binomial coefficients are visualized in Pascal’s Triangle,
namely the infinite array of numbers
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
⋮
The numbers on the far left and far right are all equal to 1. Positions to the left and right of the triangle are regarded as filled with zeros. Any number inside the triangle is the sum of those immediately above-left and above-right of it. The number in the $(n+1)$-st row and $(k+1)$-st column of Pascal’s Triangle is $\binom{n}{k}$.
The symmetry $\binom{n}{k} = \binom{n}{n-k}$ of the entries in the triangle is apparent (see the exercise below). It is also obvious that $\binom{n}{1} = \binom{n}{n-1} = n$.
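10.17 Exercise. Prove the following identities involving binomial coefficients:
$$\binom{n}{k} = \binom{n}{n-k} \quad\text{for all } n \in \mathbb{N} \text{ and } k \in \mathbb{Z}. \tag{10.10}$$
$$\sum_{j=k}^{n}\binom{j}{k} = \binom{n+1}{k+1} \quad\text{for all } 0 \le k \le n. \tag{10.11}$$
$$\sum_{j=0}^{m}\binom{m}{j}\binom{n}{j+k} = \binom{n+m}{k+m} \quad\text{for all } m, n \in \mathbb{N} \text{ and } k \in \mathbb{Z}. \tag{10.12}$$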
10.8.7 Theorem (Binomial Theorem). Let $(A, +, \cdot, 0_A, 1_A)$ be a commutative ring with identity (the reader may want to assume that $A$ is $\mathbb{Z}$, $\mathbb{Q}$ or $\mathbb{R}$). Identify the binomial coefficients $\binom{n}{k}$ with elements of (the prime ring of) $A$ (this amounts to the usual identification of integers with elements of, say, $\mathbb{R}$). For any $x, y \in A$ and any natural number $n \in \mathbb{N}$ we have
$$\begin{aligned} (x + y)^n &= \binom{n}{0}x^n + \binom{n}{1}x^{n-1}y + \cdots + \binom{n}{k}x^{n-k}y^k + \cdots + \binom{n}{n-1}xy^{n-1} + \binom{n}{n}y^n\\ &= 1x^n + nx^{n-1}y + \cdots + nxy^{n-1} + 1y^n\\ &= \sum_{k=0}^{n}\binom{n}{k}x^{n-k}y^k. \end{aligned} \tag{10.13}$$
(Note that $0_A^0$ must be interpreted as $1_A$.)

Proof. The proof is by induction on $n$. The case $n = 0$ is the trivial identity
$$(x + y)^0 = 1_A = 1_A 1_A 1_A = \binom{0}{0}x^0y^0.$$
Assume Equation (10.13) holds for a fixed $n \in \mathbb{N}$ and all $x, y \in A$. Then
$$\begin{aligned}
(x+y)^{n+1} &= (x+y)(x+y)^n = (x+y)\sum_{k=0}^{n}\binom{n}{k}x^{n-k}y^k\\
&= x\sum_{k=0}^{n}\binom{n}{k}x^{n-k}y^k + y\sum_{k=0}^{n}\binom{n}{k}x^{n-k}y^k && \text{distributivity}\\
&= \sum_{k=0}^{n}\binom{n}{k}x\,x^{n-k}y^k + \sum_{k=0}^{n}\binom{n}{k}x^{n-k}y^k\,y && \text{multiplicative homogeneity of } \textstyle\sum \text{ (Exercise 6.23)}\\
&= \sum_{k=0}^{n}\binom{n}{k}x^{n-k+1}y^k + \sum_{k=0}^{n}\binom{n}{k}x^{n-k}y^{k+1}\\
&= \sum_{k=0}^{n}\binom{n}{k}x^{n+1-k}y^k + \sum_{\ell=1}^{n+1}\binom{n}{\ell-1}x^{n+1-\ell}y^\ell && \text{letting } \ell = k + 1 \text{ in the second sum}\\
&= \sum_{k=0}^{n+1}\binom{n}{k}x^{n+1-k}y^k + \sum_{\ell=0}^{n+1}\binom{n}{\ell-1}x^{n+1-\ell}y^\ell && \text{by the convention } \tbinom{n}{-1} = \tbinom{n}{n+1} = 0\\
&= \sum_{k=0}^{n+1}\left[\binom{n}{k} + \binom{n}{k-1}\right]x^{n+1-k}y^k && \text{renaming the dummy } \ell \text{ to } k \text{ and combining both sums}\\
&= \sum_{k=0}^{n+1}\binom{n+1}{k}x^{n+1-k}y^k && \text{by the recurrence (10.8) (see Remark 10.8.5)},
\end{aligned}$$
and the inductive step of the proof is complete.

10.8.8 Examples.
$$\begin{aligned}(x + y)^2 &= x^2 + 2xy + y^2\\ (x + y)^3 &= x^3 + 3x^2y + 3xy^2 + y^3\\ (x + y)^4 &= x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4\\ (x + y)^5 &= x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5.\end{aligned}$$
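A quick Python check of (10.13) in the ring $\mathbb{Z}$, and in the commutative ring $\mathbb{Z}/12\mathbb{Z}$ of integers modulo 12 (an arbitrary choice), with coefficients generated row by row from the recurrence (10.8). This is only a spot check on finitely many values, and the helper names are illustrative, not from the text.

```python
def pascal_row(n):
    """Row n of Pascal's Triangle, built from the recurrence (10.8)."""
    row = [1]
    for _ in range(n):
        # inner entries: sum of the two entries above; the ends stay 1
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row

def rhs(x, y, n, modulus=None):
    """Right-hand side of (10.13): sum over k of binom(n, k) x^(n-k) y^k,
    computed in Z, or in Z/modulus Z when a modulus is given."""
    if modulus is None:
        return sum(c * x ** (n - k) * y ** k for k, c in enumerate(pascal_row(n)))
    return sum(c * pow(x, n - k, modulus) * pow(y, k, modulus)
               for k, c in enumerate(pascal_row(n))) % modulus

print(pascal_row(5))                        # coefficients of (x + y)^5
print((3 + 5) ** 7, rhs(3, 5, 7))           # both equal 8^7 in Z
print(pow(3 + 5, 7, 12), rhs(3, 5, 7, 12))  # both equal 8 in Z/12Z
```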
In the remainder of this section we work in the field R of real numbers. (The
same definitions and statements hold in Q or C, and in fact in any field whose
prime ring is isomorphic to Z, that is, any field of characteristic zero.)
10.8.9 Definition (Factorial). For $n \in \mathbb{N}$ define the factorial function $n!$ (read: $n$-factorial) recursively as follows. Define $0! = 1$ and $(n + 1)! = (n + 1)(n!)$. Thus,
$$n! = \prod_{i=1}^{n} i = 1 \cdot 2 \cdot \ldots \cdot n. \tag{10.14}$$
(Recall that an empty product is $1$ by definition, hence $0! = 1$ as the empty product over $\{i \in \mathbb{N} \mid 1 \le i \le 0\}$.)
10.8.10 Example. We have 4! = 4·(3!) = 4·3·(2!) = 4·3·2·(1!) = 4·3·2·1·(0!) =
4 · 3 · 2 · 1 · 1 = 4 · 3 · 2 · 1 = 24.
10.8.11 Proposition. For all natural numbers $n, k \in \mathbb{N}$,
$$\binom{n}{k} = \prod_{j=1}^{k}\frac{n - j + 1}{j} = \frac{1}{k!}\prod_{j=1}^{k}(n - j + 1) = \frac{n(n-1)\cdot\ldots\cdot(n-k+1)}{1\cdot 2\cdot\ldots\cdot k}. \tag{10.15}$$
Proof. First we remark that none of the denominators $k!$ is ever zero. Moreover, in case $k > n$, all products in (10.15) are zero since their $(n+1)$-st factor is $n - (n + 1) + 1 = 0$. Since $\binom{n}{k} = 0$ in that case, the identity is true.
It remains to show that, for $0 \le k \le n$, the product formulas on the right of (10.15) satisfy the defining properties (10.7) and (10.8). They are obviously all equivalent, so we prove this only for the expressions $C(n, k) = \prod_{j=1}^{k}(n - j + 1)/j$.
If $k = 0$,
$$C(n, 0) = \prod_{j=1}^{0}\frac{n - j + 1}{j} = 1$$
(as an empty product).
If $k = n$,
$$C(n, n) = \prod_{j=1}^{n}\frac{n - j + 1}{j} = \frac{\prod_{j=1}^{n}(n - j + 1)}{\prod_{j=1}^{n} j} = \frac{\prod_{\ell=1}^{n}\ell}{\prod_{j=1}^{n} j} = \frac{n!}{n!} = 1,$$
using the change of variable $\ell = n - j + 1$.
For $0 \le k < n$,
$$\begin{aligned}
C(n,k) + C(n,k+1) &= \prod_{j=1}^{k}\frac{n-j+1}{j} + \prod_{j=1}^{k+1}\frac{n-j+1}{j}
= \prod_{j=1}^{k}\frac{n-j+1}{j} + \frac{n-k}{k+1}\cdot\prod_{j=1}^{k}\frac{n-j+1}{j}\\
&= \Big(1 + \frac{n-k}{k+1}\Big)\prod_{j=1}^{k}\frac{n-j+1}{j}
= \frac{n+1}{k+1}\cdot\frac{\prod_{j=1}^{k}(n-j+1)}{\prod_{j=1}^{k} j}
= \frac{(n+1)\prod_{\ell=1}^{k}(n-\ell+1)}{(k+1)\prod_{j=1}^{k} j}\\
&= \frac{\prod_{\ell=0}^{k}(n-\ell+1)}{\prod_{j=1}^{k+1} j}
= \frac{\prod_{j=1}^{k+1}(n+1-j+1)}{\prod_{j=1}^{k+1} j} \qquad \text{letting } \ell = j - 1\\
&= \prod_{j=1}^{k+1}\frac{n+1-j+1}{j} = C(n+1, k+1).
\end{aligned}$$
We conclude the proof by noting that $C(n, k)$ must be equal to $\binom{n}{k}$ for all $0 \le k \le n$, since both quantities satisfy (10.7) and the recurrence (10.8), which determine them uniquely (Proposition 10.8.2). The proof of (10.15) is complete.
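10.8.12 Corollary. For all natural numbers $n, k$ with $0 \le k \le n$:
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}. \tag{10.16}$$
Proof. From (10.15),
$$\begin{aligned}
\binom{n}{k} &= \frac{1}{k!}\prod_{j=1}^{k}(n - j + 1)
= \frac{\prod_{\ell=n-k+1}^{n}\ell}{k!} && \text{setting } \ell = n - j + 1\\
&= \frac{\prod_{\ell=1}^{n-k}\ell}{\prod_{\ell=1}^{n-k}\ell}\cdot\frac{\prod_{\ell=n-k+1}^{n}\ell}{k!} && \text{multiplying by } 1\\
&= \frac{\prod_{\ell=1}^{n-k}\ell\cdot\prod_{\ell=n-k+1}^{n}\ell}{k!\,\prod_{\ell=1}^{n-k}\ell}
= \frac{\prod_{\ell=1}^{n}\ell}{k!\,(n-k)!}
= \frac{n!}{k!\,(n-k)!}.
\end{aligned}$$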
It is worth noting that, while formula (10.15) is valid for all k ∈ N, for-
mula (10.16) is only valid for k ≤ n. Moreover, the attentive reader will readily
observe that the right-hand side of (10.15) is perfectly well defined for any number
n (say, rational, or even real or complex!). This is no coincidence, as shall be
explained in Note 10 when binomial series are introduced.
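Since the right-hand side of (10.15) makes sense for any rational $n$, it can be evaluated exactly with the standard `fractions` module. The Python sketch below (the name `gen_binom` is ours) does this for $n = 5$, $n = -1$ and $n = 1/2$; for natural $n$ and $k > n$ it returns $0$, in agreement with (10.9).

```python
from fractions import Fraction

def gen_binom(n, k):
    """Product formula (10.15): prod_{j=1}^{k} (n - j + 1) / j, for an integer
    k >= 0 and any rational n (given as an int or a Fraction)."""
    result = Fraction(1)                      # empty product when k = 0
    for j in range(1, k + 1):
        result *= Fraction(n - j + 1) / j
    return result

print([str(gen_binom(5, k)) for k in range(8)])               # natural n: vanishes for k > 5
print([str(gen_binom(-1, k)) for k in range(6)])              # n = -1: the signs alternate
print([str(gen_binom(Fraction(1, 2), k)) for k in range(4)])  # n = 1/2
```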
10.18 Exercise. As a corollary of the Binomial Theorem, prove Bernoulli’s inequality
$$(1 + t)^n \ge 1 + nt$$
for all $n \in \mathbb{N}$ and all $t \ge 0$.
(Note: Bernoulli’s inequality actually holds for all $t \ge -1$.)
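10.8.13 Examples.
• The coefficient of $x^2$ in the expansion of $(1 + x)^{100}$ is $\binom{100}{2} = \frac{100\cdot 99}{1\cdot 2} = \frac{9900}{2} = 4950$. That of $x^5$ in $(1 + x)^{10}$ is $\binom{10}{5} = \frac{10\cdot 9\cdot 8\cdot 7\cdot 6}{1\cdot 2\cdot 3\cdot 4\cdot 5} = 252$.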
• The following is a non-example, but incredibly suggestive nonetheless. What if we formally put $n = -1$ in the binomial theorem? On the one hand, formally at least, Proposition 10.8.11 gives
$$\binom{-1}{k} = \prod_{j=1}^{k}\frac{-1 - j + 1}{j} = \prod_{j=1}^{k}(-1) = (-1)^k.$$
On the other hand,
$$\begin{aligned}
&\binom{-1}{0}x^{-1} + \binom{-1}{1}x^{-2}y + \binom{-1}{2}x^{-3}y^2 + \cdots + \binom{-1}{k}x^{-1-k}y^k + \cdots\\
&\quad= x^{-1} - x^{-2}y + x^{-3}y^2 - \cdots + (-1)^k x^{-1-k}y^k + \cdots\\
&\quad= \frac{1}{x}\Big(1 - \frac{y}{x} + \frac{y^2}{x^2} - \cdots + (-1)^k\frac{y^k}{x^k} + \cdots\Big).
\end{aligned}$$
At least if $|y| < |x|$, we recognize between the parentheses above the geometric series with ratio $-\frac{y}{x}$ and sum $\big(1 + \frac{y}{x}\big)^{-1}$. So, in fact,
$$\frac{1}{x}\Big(1 - \frac{y}{x} + \frac{y^2}{x^2} - \cdots + (-1)^k\frac{y^k}{x^k} + \cdots\Big) = \frac{1}{x}\cdot\frac{1}{1 + \frac{y}{x}} = \frac{1}{x + y} = (x + y)^{-1},$$
at least when $|y| < |x|$. Hence the binomial theorem, hitherto a purely algebraic statement, secretly contains information about infinite series. We shall revisit this puzzle in Note 10. We emphasize that Proposition 10.8.11 alone (and certainly not the less general, if catchy, factorial formula (10.16)) allows extending the meaning of binomial coefficients $\binom{n}{k}$ to values $n \notin \mathbb{N}$.
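The following Python sketch (exact arithmetic via `fractions`; the names are ours) compares partial sums of the formal $n = -1$ expansion above with $(x + y)^{-1}$ for the arbitrary sample values $x = 3$, $y = 1$. Summing the finite geometric series shows that the error after $N$ terms is exactly $(y/x)^N/(x + y)$ in absolute value; with $x = 1$, $y = 2$ (so $|y| > |x|$) the partial sums do not settle down.

```python
from fractions import Fraction

def partial_sum(x, y, terms):
    """Sum of binom(-1, k) x^(-1-k) y^k = (-1)^k y^k / x^(k+1) for k < terms."""
    x, y = Fraction(x), Fraction(y)
    return sum(((-1) ** k * y ** k / x ** (k + 1) for k in range(terms)), Fraction(0))

x, y = 3, 1                        # |y| < |x|
target = Fraction(1, x + y)        # (x + y)^(-1) = 1/4
for n_terms in (1, 5, 20):
    error = abs(partial_sum(x, y, n_terms) - target)
    print(n_terms, float(error))   # equals (1/3)^n_terms / 4

# With |y| > |x| the partial sums oscillate with growing size.
print([int(partial_sum(1, 2, t)) for t in range(1, 6)])
```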
10.9 Monotonicity
10.9.1 Definition. A real (or extended real) sequence {an } is said to be
(1) monotonically increasing if an ≤ an+1 for n = 0, 1, 2, . . .;
(2) monotonically decreasing if an ≥ an+1 for n = 0, 1, 2, . . ..
A monotonic sequence is one that is either monotonically increasing or monotoni-
cally decreasing.
10.9.2 Remark. It is a trivial matter to prove by induction that, for a monoton-
ically increasing (resp., decreasing) sequence {an } and all natural numbers m ≤ n
we have am ≤ an (resp., am ≥ an .)
10.9.3 Theorem (Monotone Sequence Theorem). A monotonic sequence
converges if and only if it is bounded.
Proof. Part 1. Suppose that $\{a_n\}$ is a monotonically increasing sequence. If $\{a_n\}$ converges, it is bounded (it is Cauchy by Proposition 10.5.3, hence bounded by Proposition 10.5.4). Now we must prove the other direction. Suppose that the sequence $\{a_n\}$ is bounded. Let $R$ denote the range of $\{a_n\}$, and let $B$ denote the least upper bound of $R$. Therefore,
(1) $a_n \le B$ for $n = 0, 1, 2, \ldots$
For every $\epsilon > 0$ there is some $N$ such that
(2) $B - \epsilon < a_N \le B$,
otherwise $B$ could not be the least upper bound of $R$. Since the sequence is monotonically increasing,
(3) $n > N \implies a_N \le a_n$.
Therefore, thanks to (1), (2) and (3), we have
$$n > N \implies B - \epsilon < a_N \le a_n \le B \implies -\epsilon < a_n - B \le 0 \implies \epsilon > B - a_n \ge 0 \implies |a_n - B| < \epsilon,$$
showing that $a_n \to B$ as $n \to \infty$.
Part 2. If {an } is a monotonically decreasing sequence, the sequence {−an } is
monotonically increasing. Clearly, {an } is bounded if and only if {−an } is bounded.
Thanks to the first part of the proof, the conclusion of the theorem applies to the
sequence {−an }. We use Theorem 10.6.1(2) with c = −1 to conclude that the
convergence of {an } follows from the convergence of {−an }.
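As an illustration of Theorem 10.9.3 (the example is a standard one, not taken from the text), the sequence $a_n = (1 + 1/n)^n$, $n \ge 1$, is monotonically increasing and bounded above by $3$, hence convergent; its limit is Euler’s number $e$. The Python sketch below only checks these facts numerically for finitely many $n$, evaluating $a_n$ as $\exp(n\log(1 + 1/n))$ with `math.log1p` to limit rounding error.

```python
import math

def a(n: int) -> float:
    """a_n = (1 + 1/n)^n, computed as exp(n * log(1 + 1/n))."""
    return math.exp(n * math.log1p(1.0 / n))

terms = [a(n) for n in range(1, 2001)]
increasing = all(s < t for s, t in zip(terms, terms[1:]))
bounded = all(t < 3 for t in terms)

# As in step (2) of the proof: the first index N with B - eps < a_N,
# using the known value B = e of the least upper bound.
eps = 1e-3
N = next(i for i, t in enumerate(terms, start=1) if math.e - eps < t)
print(increasing, bounded, N, math.e - terms[-1])
```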
10.19 Exercise. Write a direct proof of Part 2 of the proof of Theorem 10.9.3
above. That is, modify Part 1 of the proof to apply directly to monotonically
decreasing sequences.
10.9.4 Proposition. Every monotonic sequence $\{a_n\}$ has a limit $\lim_{n\to\infty} a_n \in E$.
Proof. By Theorem 10.9.3, we need only consider unbounded monotonic sequences. Say $\{a_n\}$ is monotonically increasing and unbounded. It is necessarily bounded below by $a_0$, so it must be unbounded above. Given any real $T$ there exists $N$ such that $a_N > T$. Since $\{a_n\}$ is increasing, we have $n > N \implies a_n \ge a_N > T$. Hence $a_n \to +\infty$ as $n \to \infty$. The case of $\{a_n\}$ monotonically decreasing is formally identical.
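10.9.5 Theorem (Limits of Powers). For any real number $\kappa \neq 0$ and any sequence $\{a_n\}$ in $(0, +\infty)$ with $\lim_{n\to\infty} a_n = L \in [0, +\infty]$:
$$\lim_{n\to\infty}(a_n)^\kappa = L^\kappa,$$
provided $L^\kappa$ is defined. (Refer to Definition 10.1.4 and the exercise below for improper cases.)
Proof. The proof below assumes $L \in (0, +\infty)$. The modifications needed when $L = 0$ or $L = +\infty$ are left to the reader (see Exercise 10.20 below).
First, assume $\kappa > 0$. Let $\epsilon > 0$ be given. Let $\epsilon_o = \min\{\epsilon, \frac{1}{2}L^\kappa\}$. Since $L > 0$ we have $\epsilon_o > 0$. Then $0 < \frac{1}{2}L^\kappa \le L^\kappa - \epsilon_o < L^\kappa < L^\kappa + \epsilon_o$. Let $\delta = \min\{(L^\kappa + \epsilon_o)^{1/\kappa} - L,\ L - (L^\kappa - \epsilon_o)^{1/\kappa}\}$. Note that $\delta > 0$ since $\kappa > 0$ implies that $\mathrm{pow}_{1/\kappa} : x \mapsto x^{1/\kappa}$ is strictly increasing (Proposition 8.7.24).
By the strict monotonicity of $\mathrm{pow}_\kappa : x \mapsto x^\kappa$, we have
$$\begin{aligned} |x - L| < \delta &\implies L - \delta < x < L + \delta\\ &\implies (L^\kappa - \epsilon_o)^{1/\kappa} < x < (L^\kappa + \epsilon_o)^{1/\kappa}\\ &\implies L^\kappa - \epsilon_o < x^\kappa < L^\kappa + \epsilon_o\\ &\implies |x^\kappa - L^\kappa| < \epsilon_o \le \epsilon. \end{aligned}$$
Therefore, given $\epsilon > 0$, it suffices to choose $N$ such that $n > N \implies |a_n - L| < \delta$ in order to conclude that $(a_n)^\kappa \to L^\kappa$ as $n \to \infty$.
For $\kappa < 0$ we have
$$\begin{aligned} \lim_{n\to\infty}(a_n)^\kappa &= \lim_{n\to\infty}\frac{1}{(a_n)^{-\kappa}}\\ &= \frac{1}{\lim_{n\to\infty}(a_n)^{-\kappa}} && \text{by Theorem 10.6.1(6)}\\ &= \frac{1}{L^{-\kappa}} && \text{since } -\kappa > 0 \text{ (above case)}\\ &= L^\kappa. \end{aligned}$$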
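10.20 Exercise. Let $\{a_n\}$ be a real sequence (not necessarily positive) with a limit $L = \lim_{n\to\infty} a_n \in E$, and let $\kappa \in \mathbb{R}$. Show that the equality
$$\lim_{n\to\infty}(a_n)^\kappa = L^\kappa$$
holds in any of the following cases:
1. $\kappa = 0$ (interpret $x^0$ as $1$ for all $x \in E$).
2. $\kappa \in \mathbb{Z}^+$ (if $L = +\infty$ interpret $L^\kappa = +\infty$; if $L = -\infty$ interpret $L^\kappa$ as $+\infty$ if $\kappa$ is even, $-\infty$ if $\kappa$ is odd).
3. $\kappa \in \mathbb{Z}^-$ and $L \neq 0$ (if $L = \pm\infty$ interpret $L^\kappa = 0$).
4. $\{a_n\}$ in $[0, +\infty)$, $L = 0$ and $\kappa > 0$.
5. $\{a_n\}$ in $[0, +\infty)$, $L = 0$ and $\kappa < 0$ (interpret $0^\kappa$ as $+\infty$).
6. $\{a_n\}$ in $[0, +\infty)$, $L = +\infty$ and $\kappa > 0$ (interpret $(+\infty)^\kappa$ as $+\infty$).
7. $\{a_n\}$ in $[0, +\infty)$, $L = +\infty$ and $\kappa < 0$ (interpret $(+\infty)^\kappa$ as $0$).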
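10.9.6 Theorem (Limits of Exponentials). For any real positive $b \neq 1$ and any sequence $\{a_n\}$ with $\lim_{n\to\infty} a_n = L \in E$:
$$\lim_{n\to\infty} b^{a_n} = b^L.$$
(Refer to Definition 10.1.4 for the meaning of $b^L$ in the cases $L = \pm\infty$.)
Proof. The scheme of the proof is identical to that of Theorem 10.9.5 above. We provide the details in case $L \in (-\infty, +\infty)$. The cases $L = \pm\infty$ are left to the reader (Exercise 10.21).
First assume $b > 1$. Let $\epsilon > 0$ be given. Let $\epsilon_o = \min\{\epsilon, \frac{1}{2}b^L\}$. Clearly, $\epsilon_o > 0$. Then $0 < \frac{1}{2}b^L \le b^L - \epsilon_o < b^L < b^L + \epsilon_o$. Let $\delta = \min\{\log_b(b^L + \epsilon_o) - L,\ L - \log_b(b^L - \epsilon_o)\}$. Note that $\delta > 0$ since $b > 1$ implies that $\log_b$ is strictly increasing (Definition 8.7.20).
By the strict monotonicity of $\exp_b$, we have
$$\begin{aligned} |x - L| < \delta &\implies L - \delta < x < L + \delta\\ &\implies \log_b(b^L - \epsilon_o) < x < \log_b(b^L + \epsilon_o)\\ &\implies b^L - \epsilon_o < b^x < b^L + \epsilon_o\\ &\implies |b^x - b^L| < \epsilon_o \le \epsilon. \end{aligned}$$
Therefore, given $\epsilon > 0$, it suffices to choose $N$ such that $n > N \implies |a_n - L| < \delta$ in order to conclude that $b^{a_n} \to b^L$ as $n \to \infty$.
If $0 < b < 1$, then $b^{-1} > 1$ and $-a_n \to -L$ by Theorem 10.6.1(2), so
$$\lim_{n\to\infty} b^{a_n} = \lim_{n\to\infty}(b^{-1})^{-a_n} = (b^{-1})^{-L} = b^L,$$
by the case $b^{-1} > 1$ just proved.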
10.9.7 Theorem (Limits of Logarithms). For any real positive $b \neq 1$ and any sequence $\{a_n\}$ in $[0, +\infty]$ with $\lim_{n\to\infty} a_n = L \in [0, +\infty]$:
$$\lim_{n\to\infty}\log_b(a_n) = \log_b L.$$
(Refer to Definition 10.1.4 for the definition of $\log_b L$ when $L = 0$ or $L = +\infty$.)
Proof. The proof is a formal analogue of that of Theorem 10.9.6 with $\exp_b$ and $\log_b$ swapping their respective roles. The details are left to the reader.
10.21 Exercise. Fill in the missing details in the proofs of Theorems 10.9.6
and 10.9.7.
10.22 Exercise. Strengthen Theorems 10.9.6 and 10.9.7 by extending them to cover the cases $\lim_{n\to\infty} a_n = +\infty$ as well as $\{a_n\}$ in $[0, +\infty)$ with $\lim_{n\to\infty} a_n = 0$ in 10.9.7, and $\lim_{n\to\infty} a_n = -\infty$ in 10.9.6.
10.9.8 Theorem (Some Special Sequences).
(1) If $p > 0$ then $\lim_{n\to\infty}\frac{1}{n^p} = 0$.
(2) If $p > 0$ then $\lim_{n\to\infty}\sqrt[n]{p} = 1$.
(3) $\lim_{n\to\infty}\sqrt[n]{n} = 1$.
(4) If $p > 0$ and $r$ is any real number then $\lim_{n\to\infty}\frac{n^r}{(1+p)^n} = 0$.
(5) If $|x| < 1$ then $\lim_{n\to\infty} x^n = 0$.
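Proof. We prove (1). Apply Theorem 10.9.5 to the sequence $a_n = n \to L = +\infty$ with $\kappa = -p < 0$ to obtain $1/n^p = n^{-p} \to (+\infty)^{-p} = 0$.
We prove (2). If $p = 1$ the claim is obvious. For $p \neq 1$, apply Theorem 10.9.6 with $b = p$ and $a_n = 1/n \to L = 0$ to conclude $\sqrt[n]{p} = p^{1/n} \to p^0 = 1$ as $n \to \infty$.
We prove (3). Define $a_n = n^{1/n} - 1$. We have $a_n \ge 0$. Thanks to the binomial theorem we have, for $n \ge 2$,
$$n = (1 + a_n)^n \ge \binom{n}{2}a_n^2 = \frac{1}{2}n(n-1)a_n^2.$$
Hence, for $n \ge 2$ we have
$$0 \le a_n \le \sqrt{\frac{2}{n-1}}.$$
Clearly $2/(n-1) \to 0$ as $n \to \infty$, hence $\sqrt{2/(n-1)} = (2/(n-1))^{1/2} \to 0$ also, by Exercise 10.20, part 4. The squeeze principle now gives $\lim_{n\to\infty}(n^{1/n} - 1) = \lim_{n\to\infty} a_n = 0$, and hence $\lim_{n\to\infty} n^{1/n} = 1$ by Theorem 10.6.1(1).
We prove (4). Let $k$ be any fixed positive integer such that $k > r$ (for instance, $k = 1 + \max\{0, \lfloor r\rfloor\}$). For $n > 2k$ every factor $n - j + 1$ with $1 \le j \le k$ exceeds $n/2$, so
$$(1 + p)^n > \binom{n}{k}p^k = \frac{n(n-1)\cdots(n-k+1)}{k!}\,p^k > \frac{n^k p^k}{2^k\,k!}.$$
Therefore, for $n > 2k$ we have
$$0 < \frac{n^r}{(1+p)^n} < \frac{2^k\,k!}{p^k}\,n^{r-k}.$$
Since $r - k < 0$, we have $\lim_{n\to\infty} n^{r-k} = 0$ thanks to (1), and the squeeze principle (together with Theorem 10.6.1(2)) gives the claim.
We prove (5).‡ If $x = 0$ the claim is obvious. Otherwise let $y = |x| \in (0, 1)$ and $p = (1 - y)/y > 0$, so that $y = 1/(1 + p)$. With $r = 0$ we have
$$\frac{n^r}{(1+p)^n} = \frac{1}{(1+p)^n} = \Big[\frac{1}{1+p}\Big]^n = y^n = |x^n|.$$
By (4), $|x^n| \to 0$, and since $|x^n - 0| = |x^n|$ this means $x^n \to 0$.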
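A brief Python sketch (floating-point, so only suggestive) of parts (3) and (4): it compares $a_n = n^{1/n} - 1$ with the bound $\sqrt{2/(n-1)}$ from the proof of (3), and evaluates $n^r/(1+p)^n$ for the arbitrary choice $r = 10$, $p = 0.1$ through logarithms to avoid overflow. The latter sequence grows enormously before it starts to decay, a reminder that (4) describes only the eventual behavior. The function names are ours.

```python
import math

def nth_root_gap(n: int) -> float:
    """a_n = n^(1/n) - 1, as in the proof of (3)."""
    return n ** (1.0 / n) - 1.0

def ratio(n: int, r: float, p: float) -> float:
    """n^r / (1 + p)^n, evaluated as exp(r log n - n log(1 + p)) to avoid overflow."""
    return math.exp(r * math.log(n) - n * math.log1p(p))

for n in (2, 10, 100, 10_000):
    # a_n next to the squeeze bound sqrt(2/(n-1)) from the proof of (3)
    print(n, nth_root_gap(n), math.sqrt(2 / (n - 1)))

for n in (10, 100, 500, 1000):
    # r = 10, p = 0.1: huge values before the eventual decay to 0
    print(n, ratio(n, 10.0, 0.1))
```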
10.9.9 Definition. Let $\{a_k\}_{k\ge 0}$ be a real sequence. Let
$$u_n = \sup_{k\ge n} a_k = \sup_{k\ge 0} a_{n+k} \in E, \qquad l_n = \inf_{k\ge n} a_k = \inf_{k\ge 0} a_{n+k} \in E$$
be the sequences of suprema and infima (in the extended reals) of the tails of $\{a_k\}$. Then $\{u_n\}$ is monotonically decreasing and $\{l_n\}$ is monotonically increasing, so both have a limit in $E$ (Proposition 10.9.4). We define
$$\limsup_{n\to\infty} a_n = \lim_{n\to\infty} u_n = \lim_{n\to\infty}\sup_{k\ge n} a_k = \lim_{n\to\infty}\sup_{k\ge 0} a_{n+k} \in E \tag{10.17}$$
(the superior limit or “lim sup” of $\{a_n\}$), and
$$\liminf_{n\to\infty} a_n = \lim_{n\to\infty} l_n = \lim_{n\to\infty}\inf_{k\ge n} a_k = \lim_{n\to\infty}\inf_{k\ge 0} a_{n+k} \in E \tag{10.18}$$
(the inferior limit or “lim inf” of $\{a_n\}$).
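For a concrete feel for Definition 10.9.9, here is a Python sketch for the sequence $a_n = (-1)^n\big(1 + \frac{1}{n+1}\big)$, which is our own example. In general a supremum over an infinite tail cannot be found by inspecting finitely many terms; this sequence is chosen so that it can, because its even-indexed terms decrease and its odd-indexed terms increase, so $u_n$ and $l_n$ are attained among the first two terms of each tail. The computed $u_n$ decrease toward $\limsup a_n = 1$ and the $l_n$ increase toward $\liminf a_n = -1$.

```python
def a(n: int) -> float:
    """Illustrative sequence a_n = (-1)^n (1 + 1/(n + 1))."""
    return (-1) ** n * (1 + 1 / (n + 1))

def tail_sup_inf(n: int, window: int = 2):
    """u_n = sup_{k>=n} a_k and l_n = inf_{k>=n} a_k for THIS sequence only:
    both extremes of the tail occur among a_n, ..., a_{n+window-1} when window >= 2."""
    tail = [a(k) for k in range(n, n + window)]
    return max(tail), min(tail)

pairs = [tail_sup_inf(n) for n in range(2000)]
u = [s for s, _ in pairs]   # decreasing, toward lim sup = 1
l = [i for _, i in pairs]   # increasing, toward lim inf = -1
print(u[:4], l[:4], u[-1], l[-1])
```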

10.9.10 Remarks. • The properties “$\{a_n\}$ has superior (resp. inferior) limit $L$ as $n \to \infty$” are final: they depend only on (any) tail of $\{a_n\}$. The reader is encouraged to verify this fact.
• The notations
$$\varlimsup_{n\to\infty} a_n \quad\text{and}\quad \varliminf_{n\to\infty} a_n$$
are common alternatives to denote the superior and inferior limits of the sequence $\{a_n\}$.
10.9.11 Examples.
(1) For any sequence $\{a_n\}$ we have $\lim_{n\to\infty} a_n = L$ if and only if
$$\liminf_{n\to\infty} a_n = \limsup_{n\to\infty} a_n = L.$$
‡ Except for the trivial case $x = 0$, this fact can also be proved directly from Theorem 10.9.6 applied to the sequence $a_n = n \to +\infty$ and $b = |x| < 1$, which gives $\lim_{n\to\infty}|x^n| = |x|^{+\infty} = 0$.
(2) For the sequence $a_n = (-1)^n$ we have $\limsup_{n\to\infty}(-1)^n = +1 = \lim_{n\to\infty} a_{2n}$ and $\liminf_{n\to\infty}(-1)^n = -1 = \lim_{n\to\infty} a_{2n+1}$. In fact, these are the only subsequential limits of $\{a_n\}$.
(3) The sequence $a_n = n^{(-1)^n}$, that is, the sequence $0, 1, 2, 3^{-1}, 4, 5^{-1}, \ldots$, has superior limit $+\infty = \limsup_{n\to\infty} a_n = \lim_{n\to\infty} a_{2n}$ and inferior limit $0 = \liminf_{n\to\infty} a_n = \lim_{n\to\infty} a_{2n+1}$, and these are exactly its subsequential limits.
(4) The sequence $a_n = n\sqrt{2} - \lfloor n\sqrt{2}\rfloor$ ($a_n$ is the fractional part of $n\sqrt{2}$) has superior limit $1$ and inferior limit $0$. It can be shown that every number in $[0, 1]$ is a subsequential limit of $\{a_n\}$.
The reader may ask at this point whether the superior (resp., inferior) limits
are always equal to the largest (resp., smallest) subsequential limits of a sequence.
The answer is affirmative and given in Theorem 10.9.17 below.
10.9.12 Proposition. Let $\{a_n\}$ and $\{b_n\}$ be real sequences and $c \in \mathbb R$. Then the
following relations hold (using the arithmetic of the extended reals):
1. $\limsup_{n\to\infty} (c + a_n) = c + \limsup_{n\to\infty} a_n$ and
$\liminf_{n\to\infty} (c + a_n) = c + \liminf_{n\to\infty} a_n$.
2. For $c > 0$: $\limsup_{n\to\infty} (c a_n) = c \limsup_{n\to\infty} a_n$ and
$\liminf_{n\to\infty} (c a_n) = c \liminf_{n\to\infty} a_n$.
3. For $c < 0$: $\limsup_{n\to\infty} (c a_n) = c \liminf_{n\to\infty} a_n$ and
$\liminf_{n\to\infty} (c a_n) = c \limsup_{n\to\infty} a_n$.
4. $\limsup_{n\to\infty} (a_n + b_n) \le \limsup_{n\to\infty} a_n + \limsup_{n\to\infty} b_n$ and
$\liminf_{n\to\infty} (a_n + b_n) \ge \liminf_{n\to\infty} a_n + \liminf_{n\to\infty} b_n$,
as long as the sums on the right are defined in $E$.
Proof. We only prove the first part of 4., leaving everything else to the reader
(actually, the second part of 4. follows from the first together with the case $c = -1$
of 3.). Set $u_n = \sup_{k\ge n} a_k$, $v_n = \sup_{k\ge n} b_k$ and $w_n = \sup_{k\ge n} (a_k + b_k)$. For $k \ge n$
we have $a_k \le u_n$ and $b_k \le v_n$, hence $a_k + b_k \le u_n + v_n$. Thus $u_n + v_n$ is
an upper bound for the tail $\{a_k + b_k\}_{k\ge n}$, whence $w_n \le u_n + v_n$ follows. Then
$$\limsup_{n\to\infty} (a_n + b_n) = \lim_{n\to\infty} w_n \le \lim_{n\to\infty} (u_n + v_n) = \lim_{n\to\infty} u_n + \lim_{n\to\infty} v_n = \limsup_{n\to\infty} a_n + \limsup_{n\to\infty} b_n$$
(as long as the last sum is defined).
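The inequality in part 4 can be strict. In the sketch below, $a_n = (-1)^n$ and $b_n = (-1)^{n+1}$. Then $a_n + b_n = 0$ for all $n$, so $\limsup(a_n + b_n) = 0$, while $\limsup a_n + \limsup b_n = 2$. The helper `tail_sup` is a hypothetical stand-in for $\sup_{k\ge n}$ that takes the maximum of a truncated tail.

```python
def tail_sup(a, n):
    """Truncated stand-in for sup_{k>=n} a_k: the maximum of a[n:]."""
    return max(a[n:])

N = 1000
a = [(-1) ** n for n in range(N)]
b = [(-1) ** (n + 1) for n in range(N)]
s = [x + y for x, y in zip(a, b)]  # a_n + b_n, identically zero

n = 500
lhs = tail_sup(s, n)                   # approximates lim sup (a_n + b_n) = 0
rhs = tail_sup(a, n) + tail_sup(b, n)  # approximates 1 + 1 = 2
print(lhs, rhs)
```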
10.9.13 Proposition. If $\lim_{n\to\infty} a_n$ exists in $E$, then
$$\liminf_{n\to\infty} a_n = \lim_{n\to\infty} a_n = \limsup_{n\to\infty} a_n. \tag{10.19}$$
Proof. We prove only the right equality. Set $u_n = \sup_{k\ge n} a_k$. Let $L = \lim_{n\to\infty} a_n$
and $U = \lim_{n\to\infty} u_n = \limsup_{n\to\infty} a_n$.
By Proposition 10.2.14, any neighborhood $N(L)$ of $L$ contains a closed sub-neighborhood $N$. Since $a_n \to L$ as $n \to \infty$, the tails $\{a_k\}_{k\ge n}$ are in $N$ for $n$
sufficiently large. Since $N$ is closed in $E$, $N$ contains the closure of those tails.
The supremum of a nonempty subset of $E$ lies in its closure, so $u_n \in N$ for all $n$ large. Since $N$ is closed, $U = \lim_{n\to\infty} u_n \in N \subset N(L)$.
Since $U$ is in every neighborhood of $L$ we have $U = L$.
10.9.14 Proposition. Let $\{a_n\}$ and $\{b_n\}$ be sequences such that $a_n \le b_n$ for all
sufficiently large $n$. Then
$$\limsup_{n\to\infty} a_n \le \limsup_{n\to\infty} b_n, \qquad \liminf_{n\to\infty} a_n \le \liminf_{n\to\infty} b_n.$$
Proof. Let $u_n = \sup_{k\ge n} a_k$ and $v_n = \sup_{k\ge n} b_k$. Since $a_n \le b_n$ for all sufficiently
large $n$, we have $u_n \le v_n$ for all sufficiently large $n$. Therefore $\limsup_{n\to\infty} a_n =
\lim_{n\to\infty} u_n \le \lim_{n\to\infty} v_n = \limsup_{n\to\infty} b_n$. The proof of the second inequality is
formally identical.
10.9.15 Proposition. Let $C_0 \supset C_1 \supset \dots$ be a nested sequence of nonempty,
closed subsets of $E$. Then
$$C = \bigcap_{n\in\mathbb N} C_n$$
is a nonempty closed subset of $E$, and
$$\sup C = \lim_{n\to\infty} \sup C_n = \inf\{\sup C_n \mid n \in \mathbb N\}$$
and
$$\inf C = \lim_{n\to\infty} \inf C_n = \sup\{\inf C_n \mid n \in \mathbb N\}.$$
Proof. First note that $\{C_n\}$, being a nested sequence of nonempty closed subsets
of the compact set $E$, has the finite intersection property (FIP). Hence $C$ is nonempty, by the FIP
characterization of compactness. Moreover $C$ is closed, being an intersection of
closed sets.
We only prove the equalities for $\sup C$; the case of $\inf C$ follows in a formally
identical way. Let $s = \inf\{\sup C_n\}$. Since $\{C_n\}$ is nested, the sequence $\{\sup C_n\}$ is
monotonically decreasing and $s = \lim_{n\to\infty} \sup C_n$.
We have $C \subset C_n$, hence $\sup C \le \sup C_n$ for all $n \in \mathbb N$. Thus, $\sup C \le s$.
Now, let $t > \sup C$ be arbitrary. Let $D_n = C_n \cap [t, +\infty]$. Then $\{D_n\}$ is a nested
sequence of closed sets. Note that $\bigcap_n D_n = C \cap [t, +\infty] = \emptyset$ since $C \subset [-\infty, t)$.
By (the contrapositive of) the FIP characterization of compactness and the nesting
property, $D_m = \emptyset$ for some $m \in \mathbb N$. This means that $C_m \subset [-\infty, t)$, so $\sup C_m \le t$. Hence
$s = \inf\{\sup C_n \mid n \in \mathbb N\} \le \sup C_m \le t$. For arbitrary $t > \sup C$ we have shown that $s \le t$, so we
may conclude that $s \le \sup C$. The proof that $s = \sup C$ is complete.
10.9.16 Remark. The hypothesis that the sets in the nested sequence $\{C_n\}$ in
Proposition 10.9.15 be closed is essential, as is the role played by the compactness
of $E$ in the proof. If the assumption that all $C_n$ be closed is removed, even adding
the assumption $C \ne \emptyset$, only the inequalities $\sup C \le \lim_{n\to\infty} \sup C_n$ and $\inf C \ge
\lim_{n\to\infty} \inf C_n$ remain true. Consider $C_n = \{0\} \cup \bigcup_{k > n} \{(-1)^k (1 + 1/k)\}$. Then
$C = \{0\}$ so $\sup C = \inf C = 0$, but $\lim_{n\to\infty} \sup C_n = +1$ and $\lim_{n\to\infty} \inf C_n = -1$.
10.9.17 Theorem. Let $\{a_n\}$ be a real (or extended real) sequence.
For each $m \in \mathbb N$ define
$$A_m = \{a_{m+n} \mid n \in \mathbb N\} \subset E \quad\text{and}\quad C_m = (A_m)^-$$
($C_m$ is the closure in $E$ of the $m$-th tail of $\{a_n\}$). Let
$$C = \bigcap_{m\in\mathbb N} C_m.$$
Then $C = \mathrm{SeqLim}\{a_n\}$ ($C$ is the set of subsequential limits of $\{a_n\}$). Moreover
$$\limsup_{n\to\infty} a_n = \sup C = \max C \quad\text{and}\quad \liminf_{n\to\infty} a_n = \inf C = \min C.$$
Proof. By Propositions 10.2.8 and 10.9.15, $C$ is a closed nonempty set. Since a set and its closure in $E$ have the same supremum,
$$\max C = \sup C = \lim_{n\to\infty} \sup C_n = \lim_{n\to\infty} \sup A_n = \limsup_{n\to\infty} a_n.$$
Similarly, $\min C = \inf C = \liminf_{n\to\infty} a_n$.
Let $S = \mathrm{SeqLim}\{a_n\}$ be the set of subsequential limits of $\{a_n\}$. It remains to
prove $S = C$.
The inclusion $S \subset C$ is proved as follows. Let $L$ be a subsequential limit, say
$a_{n_k} \to L$ as $k \to \infty$. For all $m \in \mathbb N$, the $m$-th tail $\{a_{n_{k+m}} \mid k \in \mathbb N\}$ of $\{a_{n_k}\}$ is a
sequence in $A_m$ (because $n_{k+m} \ge k + m \ge m$). Therefore $L \in A_m^- = C_m$. Since this holds for all $m \in \mathbb N$ we have
$L \in \bigcap_m C_m = C$.
Conversely, let $L \in C$, hence $L \in C_m$ for all $m \in \mathbb N$. We will construct a
subsequence $\{a_{n_k}\}$ with limit $L$. First we remark, since $L \in C_m = A_m^-$, that for any
neighborhood $N(L)$ and any $m \in \mathbb N$, $A_m$ intersects $N(L)$. Therefore $a_n \in A_m \cap N(L)$
for some $n \ge m$.
Let $\{N_k\}$ be a shrinking sequence of neighborhoods of $L$. Successively construct
an increasing sequence $n_0 < n_1 < \dots$ as follows:
• Let $n_0$ be any $n$ such that $a_n \in A_0 \cap N_0$.
• Having chosen $n_k$, let $n_{k+1}$ be any $n$ such that $n \ge 1 + n_k$ and $a_n \in A_{1+n_k} \cap N_{k+1}$.
Clearly $\{n_k\}$ is increasing. Moreover, since $\{N_k\}$ is nested, we have
$$l > k \Rightarrow a_{n_l} \in N_l \subset N_k.$$
Therefore, $\{a_{n_k}\}$ is eventually in any given $N_k$. Since $\{N_k\}$ shrinks to $L$, we have
$a_{n_k} \to L$ as $k \to \infty$. We conclude that $L \in S$, hence $C \subset S$, finishing the proof
that $C = S$. In particular, $S$ is closed and nonempty, so $\sup S = \max S$ and
$\inf S = \min S$.
10.23 Exercise. Let {an } be any enumeration of the rationals, that is, an = f (n)
where f : N → Q is a bijection. Prove that the set of subsequential limits of {an }
is equal to [−∞, +∞].
10.10 Series
10.10.1 Definition. Given a real sequence $\{a_n\}$, the sequence of terms, we form
another sequence, the sequence of partial sums $\{s_k\}$, defined by
$$s_k = \sum_{n=0}^{k} a_n, \qquad k = 0, 1, 2, \dots.$$
If $\{s_k\}$ converges to $L$, we write
$$\sum_{n=0}^{\infty} a_n = L,$$
and say that the infinite series $\sum_{n=0}^\infty a_n$ converges to $L$, or that the sequence $\{a_n\}$
is summable to $L$ (or has the sum $L$). Otherwise, we say that the infinite series
$\sum_{n=0}^\infty a_n$ diverges. We also write $\sum_{n=0}^\infty a_n = +\infty$ or $\sum_{n=0}^\infty a_n = -\infty$, depending
upon whether $\{s_k\}$ diverges to $+\infty$ or to $-\infty$, respectively. (Of course, a series
may diverge without diverging to $\pm\infty$.)
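Partial sums are running totals, so they can be generated with Python's `itertools.accumulate`. The sketch below applies this to the series $\sum_n (-1)^n$, whose partial sums oscillate $1, 0, 1, 0, \dots$. That series diverges, but not to $\pm\infty$. The helper name `partial_sums` is our own.

```python
from itertools import accumulate

def partial_sums(terms):
    """Return [s_0, s_1, ...] with s_k = terms[0] + ... + terms[k]."""
    return list(accumulate(terms))

print(partial_sums([(-1) ** n for n in range(8)]))
```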
10.10.2 Theorem (Cauchy Principle for Series). The series $\sum_{n=0}^\infty a_n$ converges
if and only if for every $\epsilon > 0$ there is some $N$ such that
$$n \ge m > N \Rightarrow \Big|\sum_{k=m+1}^{n} a_k\Big| < \epsilon. \tag{10.20}$$
Proof. Apply the Cauchy criterion to the sequence of partial sums. For every $\epsilon > 0$
there is some $N$ such that
$$m, n > N \Rightarrow \Big|\sum_{k=0}^{n} a_k - \sum_{k=0}^{m} a_k\Big| < \epsilon.$$
Therefore, with $m \le n$ we obtain (10.20).

10.10.3 Corollary. If the series $\sum_{n=0}^\infty a_n$ converges then $a_n \to 0$ as $n \to \infty$ (that
is, $\lim_{n\to\infty} a_n$ exists and equals 0).
(Caveat: The converse of this statement is false. The harmonic
series $\sum_n 1/n$ diverges, as shown below, yet $\lim_{n\to\infty} 1/n = 0$.)
Proof. Take $m = n - 1 > N$ in (10.20) to obtain: $n > N + 1 \Rightarrow |a_n| < \epsilon$.
10.10.4 Proposition (“$n$th-Term” Test for Divergence). If $\lim_{n\to\infty} a_n \ne 0$ or
$\lim_{n\to\infty} a_n$ does not exist, then $\sum_n a_n$ diverges.
Proof. This is the contrapositive of Corollary 10.10.3.
10.10.5 Theorem. Suppose $\{a_n\}$ and $\{b_n\}$ are real sequences such that
$$\sum_{n=0}^\infty a_n = L_1 \in E \quad\text{and}\quad \sum_{n=0}^\infty b_n = L_2 \in E.$$
Then
(1) $\sum_{n=0}^\infty c a_n = c L_1$ for any $c \in \mathbb R$
(here we interpret $0(\pm\infty)$ as 0);
(2) $\sum_{n=0}^\infty (a_n + b_n) = L_1 + L_2$ if $L_1 + L_2$ is defined
(i.e., it is not the sum of $+\infty$ and $-\infty$);
(3) $L_1 \le L_2$ if $a_n \le b_n$ for all $n$.
Proof. Let $\{\sigma_n\}$ (resp., $\{\varsigma_n\}$) be the sequence of partial sums of $\{a_n\}$ (resp., of
$\{b_n\}$).
• We prove (1). The sequence $\{s_n\}$ of partial sums of $\{c a_n\}$ satisfies $s_n = c\sigma_n$.
The claim follows from Theorem 10.6.1(2).
• We prove (2). The sequence $\{s_n\}$ of partial sums of $\{a_n + b_n\}$ satisfies
$s_n = \sigma_n + \varsigma_n$. The claim follows from Theorem 10.6.1(3).
• We prove (3). We have $\sigma_n \le \varsigma_n$ for all $n \in \mathbb N$. The claim follows from
Theorem 10.6.1(8).
10.10.6 Theorem (Monotonicity). A series of nonnegative terms converges
if and only if its sequence of partial sums is bounded (above).
Proof. The sequence of partial sums of a series of nonnegative terms is a monotonically increasing sequence. Therefore, it converges if and only if it is bounded, by
Theorem 10.9.3.
10.10.7 Definition (Absolutely Convergent Series. Absolute Summability). A series $\sum_{n=0}^\infty a_n$ is called absolutely convergent, and the sequence $\{a_n\}$
absolutely summable, if the series
$$\sum_{n=0}^\infty |a_n|$$
converges.
(For a series of nonnegative terms $a_n \ge 0$, absolute summability is exactly the
same thing as summability.)
10.10.8 Theorem (Absolute Convergence Test). An absolutely convergent
series is convergent.
Proof. Let $\{a_n\}$ be absolutely summable. Let $\sigma_n = \sum_{k=0}^n |a_k|$ and $s_n = \sum_{k=0}^n a_k$
be the sequences of partial sums of $\sum |a_k|$ and $\sum a_k$. For $m \le n$:
$$|s_n - s_m| = \Big|\sum_{k=m+1}^{n} a_k\Big| \le \sum_{k=m+1}^{n} |a_k| = \sigma_n - \sigma_m,$$
by the triangle inequality.
Since $\sum |a_k|$ converges, the sequence of its partial sums $\{\sigma_n\}$ converges, so it is a
Cauchy sequence. By the inequality above, $\{s_n\}$ is also a Cauchy sequence, hence
convergent. Therefore the series $\sum a_k$ converges.
10.10.9 Theorem (Rearrangement Theorem). Let $\{a_k\}$ be a real sequence.
Let $i \mapsto k_i$ be a bijection $\mathbb N \to \mathbb N$ (a permutation of $\mathbb N$). The sequence $\{a_{k_i}\}$ is
called a rearrangement of $\{a_k\}$.
If the series $\sum_k a_k$ is absolutely convergent, then the rearranged series
$\sum_i a_{k_i}$ is also absolutely convergent, and
$$\sum_{i=0}^\infty a_{k_i} = \sum_{k=0}^\infty a_k.$$
Proof. Let $\{s_k\}$, $\{\sigma_k\}$, $\{t_i\}$ and $\{\tau_i\}$ be the sequences of partial sums of $\{a_k\}$,
$\{|a_k|\}$, $\{a_{k_i}\}$ and $\{|a_{k_i}|\}$.
First we prove that $\{a_{k_i}\}$ is absolutely summable. Let $A = \sum_k |a_k|$. Then $\sigma_k \le
A < +\infty$ for all $k$, since $\sum_k a_k$ is absolutely convergent. By monotonicity (Theorem 10.10.6), it
suffices to show that $\{\tau_i\}$ is bounded above by $A$. Let $k \mapsto i_k$ be the inverse of
the bijection $i \mapsto k_i$. Given $i$, let $K = \max\{k_j \mid 0 \le j \le i\}$. Then we have
$\{k_0, k_1, \dots, k_i\} \subset \{0, 1, \dots, K\}$, so
$$\tau_i = \sum_{j=0}^{i} |a_{k_j}| \le \sum_{k=0}^{K} |a_k| = \sigma_K \le A.$$
Now, let $S = \sum_k a_k$. We shall prove that $\sum_j a_{k_j} = S$. Note first that
$$|S - s_k| = \Big|\sum_{j=0}^\infty a_j - \sum_{j=0}^k a_j\Big| = \Big|\sum_{j=k+1}^\infty a_j\Big| \le \sum_{j=k+1}^\infty |a_j| = A - \sigma_k$$
for all $k \in \mathbb N$. Let $\epsilon > 0$ be given. Choose $K_\epsilon$ such that $k \ge K_\epsilon \Rightarrow 0 \le A - \sigma_k \le \epsilon$. Let $I = \max\{i_0, i_1, \dots, i_{K_\epsilon}\}$. For any $\iota > I$ let $K' = \max\{k_0, k_1, \dots, k_\iota\}$.
Since $\iota > I$ we have $\{k_0, k_1, \dots, k_\iota\} \supset \{0, 1, \dots, K_\epsilon\}$ (because
$\{0, 1, \dots, \iota\} \supset \{i_0, i_1, \dots, i_{K_\epsilon}\}$), and hence also $K' \ge K_\epsilon$. Now, we have
$$\begin{aligned}
|S - t_\iota| &= \Big|\sum_{k=0}^\infty a_k - \sum_{j=0}^\iota a_{k_j}\Big| = \Big|\sum_{k=K'+1}^\infty a_k + \Big(\sum_{k=0}^{K'} a_k - \sum_{j=0}^\iota a_{k_j}\Big)\Big| \\
&= \Big|(S - s_{K'}) + \sum_{\substack{0\le k\le K'\\ k\notin\{k_0,\dots,k_\iota\}}} a_k\Big| &&\text{since } \{k_0,\dots,k_\iota\} \subset \{0,\dots,K'\}\\
&\le |S - s_{K'}| + \sum_{\substack{0\le k\le K'\\ k\notin\{k_0,\dots,k_\iota\}}} |a_k| \le A - \sigma_{K'} + \sum_{\substack{K_\epsilon< k\le K'\\ k\notin\{k_0,\dots,k_\iota\}}} |a_k| &&\text{since } \{k_0,\dots,k_\iota\} \supset \{0,\dots,K_\epsilon\}\\
&\le A - \sigma_{K'} + (\sigma_{K'} - \sigma_{K_\epsilon}) = A - \sigma_{K_\epsilon} \le \epsilon.
\end{aligned}$$
Since this holds for every $\epsilon > 0$ we conclude $S = \lim_{i\to\infty} t_i = \sum_{i=0}^\infty a_{k_i}$.
10.10.10 Definition (Conditional Convergence). A convergent series that is
not absolutely convergent is called a conditionally convergent series. These are the
series $\sum_n a_n$ such that
$$\sum_{n=0}^\infty a_n \text{ converges and } \sum_{n=0}^\infty |a_n| \text{ diverges.}$$
10.10.11 Remark. The distinction between absolutely and conditionally convergent series $\sum_n a_n$ may seem trivial, but it is actually quite subtle and very important in analysis. If $\{a_i \mid 0 \le i \le n\}$ is a finite sequence, then the sum $\sum_{i=0}^n a_i$
is independent of ordering in the sense of generalized commutativity. Namely, if
$i \mapsto m_i$ is a permutation of the numbers $0, 1, \dots, n$ (that is, $m_0, m_1, \dots, m_n$ are
$0, 1, \dots, n$ in some order), then the rearranged sum $\sum_{i=0}^n a_{m_i}$ is still equal to
$\sum_{i=0}^n a_i$. Note that the finite sequence $\{a_i\}$, regarded as an infinite sequence with
$a_i = 0$ for $i > n$, is absolutely summable.
By Theorem 10.10.9, rearrangements of an absolutely convergent series preserve absolute convergence and the value of the
sum. However, if $\{a_i\}$ is only conditionally summable, then the following remarkable result holds: for any $L \in E$ there exists a permutation $i \mapsto m_i$ of the natural
numbers such that $\sum_{i=0}^\infty a_{m_i} = L$. This shocking fact points to an utter failure
of commutativity for “sums” of countably many terms (i.e., series). Even more
is true. Given a conditionally summable sequence $\{a_i\}$ as above and any closed
interval $I \subset E$, there exists a permutation $i \mapsto m_i$ such that $I$ is the set of
subsequential limits of the sequence $\{s_n\}$ of partial sums of $\{a_{m_i}\}$.
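A classical concrete instance of this failure of commutativity is the alternating harmonic series $1 - \tfrac12 + \tfrac13 - \cdots = \log 2$ (a standard fact, not derived at this point). Rearranging it as $1 + \tfrac13 - \tfrac12 + \tfrac15 + \tfrac17 - \tfrac14 + \cdots$ takes two positive terms, then one negative term, and still uses every term exactly once, but the rearranged series sums to $\tfrac32 \log 2$. The Python sketch below compares partial sums of both orderings. The function names are illustrative, and floating-point partial sums only suggest the limits; they do not prove them.

```python
import math

def alternating_harmonic(n_terms):
    """Partial sum 1 - 1/2 + 1/3 - ... with n_terms terms (original order)."""
    return sum((-1) ** (n + 1) / n for n in range(1, n_terms + 1))

def two_plus_one_minus(blocks):
    """Partial sum of the rearrangement (1 + 1/3 - 1/2) + (1/5 + 1/7 - 1/4) + ...

    Block j (j = 1, 2, ...) uses the next two odd reciprocals
    1/(4j-3) + 1/(4j-1) and the next even reciprocal -1/(2j).
    """
    total = 0.0
    for j in range(1, blocks + 1):
        total += 1 / (4 * j - 3) + 1 / (4 * j - 1) - 1 / (2 * j)
    return total

print(alternating_harmonic(300_000), math.log(2))
print(two_plus_one_minus(100_000), 1.5 * math.log(2))
```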
10.24 Exercise. If a series is conditionally convergent, prove that it includes
infinitely many positive terms and infinitely many negative terms.
10.10.12 Theorem (Comparison Test). Let $\{a_n\}$ and $\{b_n\}$ be sequences such
that $|a_n| \le b_n$ for all $n$. If $\sum_{n=0}^\infty b_n$ converges, then $\sum_{n=0}^\infty a_n$ is absolutely convergent.
Proof. Let $s_n = \sum_{k=0}^n |a_k|$ and $\sigma_n = \sum_{k=0}^n b_k$ be the sequences of partial sums
of $\{|a_n|\}$ and $\{b_n\}$. Since $|a_k| \le b_k$ for all $k$, we have $s_n \le \sigma_n$. Since $\sum_n b_n$ converges,
$\{\sigma_n\}$ is bounded, hence so is $\{s_n\}$. By monotonicity (Theorem 10.10.6), $\sum_n |a_n|$
converges.
10.10.13 Proposition (Dominated Monotone Convergence to Zero). Let
$\mathbb N \times \mathbb N \to \mathbb R : (n, k) \mapsto b_{n,k}$ be a real-valued function on $\mathbb N \times \mathbb N$. (This amounts
to giving a sequence of sequences $\{b_{n,k}\}_n$, one for each $k \in \mathbb N$.) For each $n \in \mathbb N$,
assume that $\{b_{n,k}\}_k$ monotonically decreases to zero:
$$b_{n,0} \ge b_{n,1} \ge \dots \quad\text{and}\quad \lim_{k\to\infty} b_{n,k} = 0.$$
Assume that $\{b_{n,0}\}_n$ is summable.§ Then, for each $k \in \mathbb N$, the sequence $\{b_{n,k}\}_n$ is
summable, and
$$\lim_{k\to\infty} \sum_{n=0}^\infty b_{n,k} = 0.$$
§ This is the “domination” condition, in the sense that the series $\sum_n b_{n,0}$ is convergent and it
dominates all the others, $\sum_n b_{n,k}$, term-by-term.
Proof. We have $0 \le b_{n,k} \le b_{n,0}$. By the comparison test and the assumed summability of $\{b_{n,0}\}_n$, $\{b_{n,k}\}_n$ is summable for every $k \in \mathbb N$. Let $\epsilon > 0$ be given. Since
$\{b_{n,0}\}_n$ is summable, there exists $N \in \mathbb N^+$ such that $\sum_{n=N}^\infty b_{n,0} < \epsilon/2$. For fixed
$n \in \mathbb N$, since $\{b_{n,k}\}_k$ decreases to 0, there exists $K_n$ such that $k > K_n \Rightarrow b_{n,k} < \frac{\epsilon}{2N}$.
Let $K = \max\{K_0, K_1, \dots, K_{N-1}\}$. Then we have, for all $k > K$:
$$0 \le \sum_{n=0}^\infty b_{n,k} = \sum_{n=0}^{N-1} b_{n,k} + \sum_{n=N}^\infty b_{n,k} < \sum_{n=0}^{N-1} \frac{\epsilon}{2N} + \sum_{n=N}^\infty b_{n,0} < N\frac{\epsilon}{2N} + \frac{\epsilon}{2} = \epsilon.$$
Since this holds for all $\epsilon > 0$ we conclude $\lim_{k\to\infty} \sum_{n=0}^\infty b_{n,k} = 0$.
10.10.14 Theorem (Monotone Convergence Theorem). Let $\{b_n\}$ be a sequence of nonnegative terms. Let $\mathbb N \times \mathbb N \to \mathbb R : (n, k) \mapsto a_{n,k}$ be a real-valued
function on $\mathbb N \times \mathbb N$. (This amounts to giving a sequence of sequences $\{a_{n,k}\}_n$,
one for each $k \in \mathbb N$.) For each $n \in \mathbb N$, assume that $\{a_{n,k}\}_k$ is nonnegative and
monotonically increases to $b_n$:
$$0 \le a_{n,0} \le a_{n,1} \le \dots \quad\text{and}\quad \lim_{k\to\infty} a_{n,k} = b_n.$$
Then, if $\{b_n\}_n$ is summable, the sequence $\{a_{n,k}\}_n$ is summable for each $k \in \mathbb N$. In any case
$$\lim_{k\to\infty} \sum_{n=0}^\infty a_{n,k} = \sum_{n=0}^\infty b_n \in [0, +\infty].$$
Proof. Let $b_{n,k} = b_n - a_{n,k}$. Since $\{a_{n,k}\}_k$ monotonically increases to $b_n$, $\{b_{n,k}\}_k$
monotonically decreases to 0.
First assume $\{b_n\}$ is summable. We note that $\{a_{n,k}\}_n$ and $\{b_{n,k}\}_n$ are summable
for every $k \in \mathbb N$, since they are nonnegative and dominated by the summable
sequence $\{b_n\}_n$. In particular $\{b_{n,0}\}_n$ is summable, so by Proposition 10.10.13,
$$\begin{aligned}
0 = \lim_{k\to\infty} \sum_{n=0}^\infty b_{n,k} &= \lim_{k\to\infty} \sum_{n=0}^\infty (b_n - a_{n,k}) = \lim_{k\to\infty} \Big(\sum_{n=0}^\infty b_n - \sum_{n=0}^\infty a_{n,k}\Big)\\
&= \sum_{n=0}^\infty b_n - \lim_{k\to\infty} \sum_{n=0}^\infty a_{n,k}.
\end{aligned}$$
This shows that $\lim_{k\to\infty} \sum_{n=0}^\infty a_{n,k} = \sum_{n=0}^\infty b_n$, concluding the proof in the case
when $\{b_n\}$ is summable.
Now assume $\sum_n b_n = +\infty$. For all $M \in \mathbb R$ there exists $N \in \mathbb N^+$ such that
$\sum_{n=0}^{N-1} b_n > M + 1$. For all $n < N$ there exists $K_n \in \mathbb N$ such that $a_{n,k} > b_n - 1/N$ for
all $k > K_n$ (since $a_{n,k} \to b_n$ as $k \to \infty$). Let $K = \max\{K_0, K_1, \dots, K_{N-1}\}$. Then
$$k > K \Rightarrow \sum_{n=0}^\infty a_{n,k} \ge \sum_{n=0}^{N-1} a_{n,k} > \sum_{n=0}^{N-1} \Big(b_n - \frac1N\Big) = \sum_{n=0}^{N-1} b_n - \frac1N \cdot N > M + 1 - 1 = M.$$
Since $M$ was arbitrary, this shows $\sum_n a_{n,k} \to +\infty = \sum_n b_n$ as $k \to \infty$, concluding
the proof.
10.10.15 Theorem (Geometric Series). The geometric series $\sum_{n=0}^\infty x^n$ converges absolutely if $|x| < 1$, and diverges if $|x| \ge 1$. Its sum is
$$\sum_{n=0}^\infty x^n = \frac{1}{1-x} \quad\text{for } |x| < 1. \tag{10.21}$$
Proof. For $|x| \ge 1$ we have $x^n \not\to 0$ as $n \to \infty$, so the series diverges by the $n$th-term
test, Proposition 10.10.4.
For $|x| < 1$, we have
$$s_k(x) = \sum_{n=0}^k x^n = \frac{1 - x^{k+1}}{1 - x}.$$
Note that $s_k(|x|) = (1 - |x|^{k+1})/(1 - |x|)$ is the sequence of partial sums of
$\sum_n |x|^n$. Since $|x| < 1$ we have $\lim_{n\to\infty} |x|^n = 0$, so $\lim_{k\to\infty} s_k(|x|) = 1/(1 - |x|)$
(Theorem 10.6.1) and $\sum_n x^n$ is absolutely convergent. Moreover its sum is
$\lim_{k\to\infty} s_k(x) = 1/(1 - x)$ for the same reason, since $x^{k+1} \to 0$ as well.
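The closed form for $s_k(x)$ can be checked exactly with rational arithmetic. The sketch below uses Python's `fractions.Fraction` to compare the term-by-term sum with $(1 - x^{k+1})/(1 - x)$. It then prints the gap $1/(1-x) - s_k(x) = x^{k+1}/(1-x)$ for $x = 1/3$. The helper name `geometric_partial_sum` is illustrative.

```python
from fractions import Fraction

def geometric_partial_sum(x, k):
    """s_k(x) = x^0 + x^1 + ... + x^k, summed term by term."""
    return sum(x ** n for n in range(k + 1))

x = Fraction(1, 3)
for k in (0, 1, 5, 20):
    s_k = geometric_partial_sum(x, k)
    closed_form = (1 - x ** (k + 1)) / (1 - x)
    assert s_k == closed_form                  # exact identity, no rounding
    print(k, s_k, float(1 / (1 - x) - s_k))    # gap to the limit 1/(1-x) = 3/2
```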
10.25 Exercise. Let $\varrho = \{\rho_n\}$ be a sequence of strictly positive terms such
that the series $\sum_{n=0}^\infty \rho_n$ is convergent. In Definition 10.7.2, replace the contraction
inequality by
$$|a_{n+2} - a_{n+1}| \le \frac{\rho_{n+1}}{\rho_n} |a_{n+1} - a_n|.$$
Call a sequence $\{a_n\}$ satisfying these inequalities $\varrho$-contractive. Assuming the
sequence $\varrho$ is strictly positive and summable, prove that a $\varrho$-contractive sequence
converges. (Note that the special case $\rho_n = \gamma^n$ for a fixed $\gamma \in (0, 1)$, leading to
the convergent geometric series $\sum_{n=0}^\infty \gamma^n$, corresponds to a contractive sequence per
Definition 10.7.2.)
10.10.16 Theorem (Ratio Test). Let $\{a_n\}$ be a sequence of real numbers
such that $a_n \ne 0$ for all sufficiently large $n$.
• If
$$\limsup_{n\to\infty} \Big|\frac{a_{n+1}}{a_n}\Big| < 1$$
then $\{a_n\}$ is absolutely summable (in particular, $\sum_n a_n$ converges).
• If
$$\liminf_{n\to\infty} \Big|\frac{a_{n+1}}{a_n}\Big| > 1$$
then $|a_n| \to \infty$ as $n \to \infty$ (in particular, $\sum_n a_n$ diverges).
Proof. • Let $\rho = \limsup_{n\to\infty} |a_{n+1}/a_n| < 1$. Fix $\sigma \in (\rho, 1)$. Since $\sigma > \rho$, the
definition of superior limit implies
$$u_n = \sup_{k\ge n} |a_{k+1}/a_k| < \sigma$$
for all sufficiently large $n$, say for $n \ge N$, where $N$ is also large enough that $a_k \ne 0$ for $k \ge N$. Therefore $|a_{k+1}| \le u_N |a_k| < \sigma |a_k|$
for all $k \ge N$, and it follows easily by induction that $|a_{k+N}| \le \sigma^k |a_N|$ for all
$k \in \mathbb N$. Thus, for any $K$,
$$\sum_{k=0}^K |a_k| \le \sum_{k=0}^N |a_k| + \sum_{k=1}^\infty \sigma^k |a_N| = \sum_{k=0}^N |a_k| + \frac{\sigma |a_N|}{1 - \sigma}$$
by Theorem 10.10.15 (since $\sigma < 1$). We have shown that the partial sums of
$\sum |a_k|$ are bounded above. Using Theorem 10.10.6, we conclude that $\sum a_k$
is absolutely convergent.
• Let $\rho = \liminf_{n\to\infty} |a_{n+1}/a_n| > 1$. Fix $\sigma \in (1, \rho)$. Since $\sigma < \rho$, the definition
of inferior limit implies
$$l_n = \inf_{k\ge n} |a_{k+1}/a_k| > \sigma$$
for all sufficiently large $n$, say for $n \ge N$, where $N$ is also large enough that $a_k \ne 0$ for $k \ge N$.
Therefore $|a_{k+1}| \ge l_N |a_k| > \sigma |a_k|$ for all $k \ge N$, and it follows easily by
induction that $|a_{k+N}| \ge \sigma^k |a_N|$ for all $k \in \mathbb N$. Since $|a_N| > 0$ and $\sigma^k \to \infty$
as $k \to \infty$ (because $\sigma > 1$), we have $|a_n| \to \infty$ as $n \to \infty$.
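It is common to estimate the ratios $|a_{n+1}/a_n|$ numerically before proving anything. The sketch below does this for $a_n = n^3/2^n$, whose ratios $\tfrac12(1 + 1/n)^3$ decrease to $\tfrac12 < 1$; by the Ratio Test, the series converges absolutely. The function names are illustrative, and a finite computation can only suggest the value of a superior limit, not establish it.

```python
def ratios(a, n_max):
    """Return [|a(n+1)/a(n)| for n = 1, ..., n_max]."""
    return [abs(a(n + 1) / a(n)) for n in range(1, n_max + 1)]

def a(n):
    """Terms of the example series: a_n = n^3 / 2^n."""
    return n ** 3 / 2 ** n

r = ratios(a, 200)
print(r[0], r[-1])  # first ratio a_2/a_1 is 4.0; later ratios decrease toward 1/2
```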
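10.10.17 Theorem (Limit Comparison Test). Let $\{a_n\}$, $\{b_n\}$ be sequences
such that $b_n > 0$ for all sufficiently large $n$. Then:
• if $\sum_n b_n$ converges and $\limsup_{n\to\infty} \frac{|a_n|}{b_n} < +\infty$, then $\sum_n a_n$ converges absolutely;
• if $\sum_n b_n$ diverges and $\liminf_{n\to\infty} \frac{|a_n|}{b_n} > 0$, then $\sum_n |a_n|$ diverges (so $\sum_n a_n$ is not absolutely convergent; if moreover $a_n \ge 0$ for all sufficiently large $n$, then $\sum_n a_n$ diverges).
10.26 Exercise. Prove Theorem 10.10.17.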
10.10.18 Theorem (Power Test). Let $\{a_n\}$ be a sequence of nonzero real numbers.
• If
$$\limsup_{n\to\infty} \sqrt[n]{|a_n|} < 1$$
then $\{a_n\}$ is absolutely summable (in particular, $\sum_n a_n$ converges).
• If
$$\liminf_{n\to\infty} \sqrt[n]{|a_n|} > 1$$
then $|a_n| \to \infty$ as $n \to \infty$ (in particular, $\sum_n a_n$ diverges).
Proof. • Let $\rho = \limsup_{n\to\infty} \sqrt[n]{|a_n|} < 1$. Fix $\sigma \in (\rho, 1)$. Since $\sigma > \rho$, the
definition of superior limit implies
$$u_n = \sup_{k\ge n} \sqrt[k]{|a_k|} < \sigma$$
for all sufficiently large $n$, say for $n > N$. Therefore $|a_n| < \sigma^n$ for $n > N$.
For any $K$,
$$\sum_{n=0}^K |a_n| \le \sum_{n=0}^N |a_n| + \sum_{n=N+1}^\infty \sigma^n = \sum_{n=0}^N |a_n| + \frac{\sigma^{N+1}}{1 - \sigma}$$
by Theorem 10.10.15 (since $\sigma < 1$). We have shown that the partial sums of
$\sum_n |a_n|$ are bounded above. Using Theorem 10.10.6, we conclude that $\sum_n a_n$
is absolutely convergent.
• Let $\rho = \liminf_{n\to\infty} \sqrt[n]{|a_n|} > 1$. Fix $\sigma \in (1, \rho)$. Since $\sigma < \rho$, the definition of
inferior limit implies
$$l_n = \inf_{k\ge n} \sqrt[k]{|a_k|} > \sigma$$
for all sufficiently large $n$, say for $n > N$. Therefore $|a_n| \ge \sigma^n$ for all $n > N$.
Since $\sigma > 1$, it follows from Theorem 10.9.8 that $|a_n| \to \infty$ as $n \to \infty$.
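For the Power Test, the quantity to inspect is $\sqrt[n]{|a_n|}$. The sketch below treats $a_n = \big(\tfrac{n}{2n+1}\big)^n$, for which $\sqrt[n]{|a_n|} = \tfrac{n}{2n+1} \to \tfrac12 < 1$, so the series converges absolutely. Working with $\log|a_n|$ avoids floating-point underflow of $a_n$ itself. The helper names are illustrative.

```python
import math

def nth_root_abs(log_abs_a_n, n):
    """Return |a_n|^(1/n), computed from log|a_n| to avoid underflow of a_n."""
    return math.exp(log_abs_a_n / n)

def log_abs_a(n):
    """log|a_n| for the example a_n = (n / (2n + 1))^n."""
    return n * math.log(n / (2 * n + 1))

roots = [nth_root_abs(log_abs_a(n), n) for n in range(1, 10_001)]
print(roots[0], roots[-1])  # equals n/(2n+1): 1/3 at n = 1, just below 1/2 at n = 10000
```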
10.10.19 Theorem. Let $a_1 \ge a_2 \ge \dots \ge 0$. Then the series $\sum_{n=1}^\infty a_n$ converges
if and only if the series $\sum_{n=0}^\infty 2^n a_{2^n}$ converges.
Proof. Let $s_n = a_1 + \dots + a_n$ and $t_k = a_1 + 2a_2 + 4a_4 + \dots + 2^k a_{2^k}$. Given $n$, let $k$ be any integer such that $n \le 2^k$. Then
$$\begin{aligned}
s_n &= a_1 + a_2 + a_3 + \dots + a_n \\
&\le a_1 + a_2 + \dots + a_{2^k} + a_{2^k+1} + \dots + a_{2^{k+1}-1} \\
&= a_{2^0} + \underbrace{(a_{2^1} + a_{2^2-1})}_{2\text{ terms}} + \underbrace{(a_{2^2} + a_{2^2+1} + a_{2^2+2} + a_{2^3-1})}_{4\text{ terms}} + \underbrace{(a_{2^3} + a_{2^3+1} + \dots + a_{2^4-1})}_{8\text{ terms}} \\
&\qquad + \dots + \underbrace{(a_{2^k} + a_{2^k+1} + \dots + a_{2^{k+1}-1})}_{2^k\text{ terms}} \\
&\le a_1 + 2a_2 + 4a_4 + 8a_8 + \dots + 2^k a_{2^k} = t_k,
\end{aligned}$$
giving the bound $s_n \le t_k$. Therefore,
$$\{t_k\} \text{ bounded} \Rightarrow \{s_n\} \text{ bounded}, \qquad \{s_n\} \text{ unbounded} \Rightarrow \{t_k\} \text{ unbounded}.$$
Conversely, given $k \ge 1$, take any $n$ with $n \ge 2^k$; we have
$$\begin{aligned}
s_n &= a_1 + a_2 + a_3 + \dots + a_n \\
&\ge a_1 + a_2 + \dots + a_{2^{k-1}} + a_{2^{k-1}+1} + \dots + a_{2^k} \\
&= a_{2^0} + a_{2^1} + \underbrace{(a_{2^1+1} + a_{2^2})}_{2\text{ terms}} + \underbrace{(a_{2^2+1} + a_{2^2+2} + a_{2^2+3} + a_{2^3})}_{4\text{ terms}} + \underbrace{(a_{2^3+1} + \dots + a_{2^4})}_{8\text{ terms}} \\
&\qquad + \dots + \underbrace{(a_{2^{k-1}+1} + a_{2^{k-1}+2} + \dots + a_{2^k})}_{2^{k-1}\text{ terms}} \\
&\ge a_1 + a_2 + 2a_4 + 4a_8 + \dots + 2^{k-1} a_{2^k} \\
&\ge \tfrac12 (a_1 + 2a_2 + 4a_4 + 8a_8 + \dots + 2^k a_{2^k}) = \tfrac12 t_k,
\end{aligned}$$
giving the bound $2s_n \ge t_k$. Therefore,
$$\{s_n\} \text{ bounded} \Rightarrow \{t_k\} \text{ bounded}, \qquad \{t_k\} \text{ unbounded} \Rightarrow \{s_n\} \text{ unbounded}.$$
Since each of the series $\sum_n a_n$ and $\sum_k 2^k a_{2^k}$ has nonnegative terms, each series
converges if and only if its partial sums are bounded (by monotonicity, Theorem 10.10.6). The conclusion follows, since we have shown that $\{s_n\}$ is bounded if
and only if $\{t_k\}$ is bounded.
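The two bounds $s_n \le t_k$ (for $n \le 2^k$) and $2s_n \ge t_k$ (for $n \ge 2^k$) can be spot-checked numerically at $n = 2^k$. The Python sketch below does so for $a_n = 1/n^p$ with a few values of $p$. The helper name `condensation_bounds` is illustrative, and the check is a sanity test of the inequalities, not a substitute for the proof.

```python
def condensation_bounds(a, k):
    """Return (s, t) with s = a(1) + ... + a(2**k) and
    t = a(1) + 2 a(2) + 4 a(4) + ... + 2**k a(2**k)."""
    s = sum(a(n) for n in range(1, 2 ** k + 1))
    t = sum(2 ** j * a(2 ** j) for j in range(k + 1))
    return s, t

for p in (0.5, 1.0, 2.0):
    for k in range(15):
        s, t = condensation_bounds(lambda n: 1 / n ** p, k)
        assert s <= t <= 2 * s, (p, k, s, t)
print("s <= t <= 2s held for every tested p and k")
```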
10.10.20 Corollary (The “p-series”). The series $\sum_{n=1}^\infty \frac{1}{n^p}$ converges for $p > 1$
and diverges for $p \le 1$.
Proof. If $p < 0$ then $\{1/n^p\}$ diverges to $+\infty$ so, by the $n$th-term test (Proposition 10.10.4), the p-series diverges. For $p \ge 0$ the terms $1/n^p$ are nonnegative and nonincreasing, so we may apply Theorem 10.10.19 with
$$\sum_{n=1}^\infty a_n = \sum_{n=1}^\infty \frac{1}{n^p}, \qquad \sum_{n=0}^\infty 2^n a_{2^n} = \sum_{n=0}^\infty 2^n \frac{1}{2^{np}}.$$
We have
$$\sum_{n=0}^\infty 2^n \frac{1}{2^{np}} = \sum_{n=0}^\infty 2^{(1-p)n} = \sum_{n=0}^\infty x^n,$$
which is the geometric series with $x = 2^{1-p} > 0$. By Theorem 10.10.15, this series
converges if and only if $|x| = 2^{1-p} < 1$, that is, if and only if $1 - p < 0$, i.e., exactly when $p > 1$. The result follows.
10.10.21 Corollary (Harmonic Series). The harmonic series $\sum_{n=1}^\infty \frac1n$ diverges.
Proof. Take $p = 1$ above.

10.10.22 Remark. The harmonic series shows that the converse to the $n$th-term
test is false: $\sum_n 1/n$ diverges, while $1/n \to 0$ as $n \to \infty$.
10.10.23 Definition (Cauchy Product). The Cauchy product of two sequences
$\{a_n \mid n \in \mathbb N\}$ and $\{b_n \mid n \in \mathbb N\}$ is the sequence $\{c_n \mid n \in \mathbb N\}$ with
$$c_n = a_0 b_n + a_1 b_{n-1} + \dots + a_n b_0 = \sum_{k=0}^n a_k b_{n-k}.$$
The Cauchy product of two series $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ is the series $\sum_{n=0}^\infty c_n$ with
$c_n$ as above.
10.10.24 Theorem. The Cauchy product of a series absolutely convergent to $A$
and a series convergent to $B$ converges to $AB$.
Proof. Let $\{a_n\}$ be summable to $A$, $\{|a_n|\}$ be summable to $\alpha$ and $\{b_n\}$ be summable
to $B$. Let $\{s_n\}$, $\{\sigma_n\}$, $\{t_n\}$ be the sequences of partial sums of $\{a_n\}$, $\{|a_n|\}$ and
$\{b_n\}$. Then $s_n \to A$ and $t_n \to B$ as $n \to \infty$. The convergent sequence $\{t_n\}$ is
bounded, by Proposition 10.3.19. Say $|t_n| \le T$ for all $n \in \mathbb N$.
Let $c_n = \sum_{k=0}^n a_k b_{n-k}$ be the $n$th term of the Cauchy product of $\{a_n\}$ and $\{b_n\}$,
and let $\gamma_n = \sum_{k=0}^n c_k$ be the $n$th partial sum of $\{c_n\}$.
For any $n \in \mathbb N$ we have
$$\gamma_n = \sum_{k=0}^n c_k = \sum_{k=0}^n \sum_{l=0}^k a_l b_{k-l} = \sum_{\substack{0\le k,l\\ k+l\le n}} a_k b_l,$$
that is, $\gamma_n$ is the sum of all products $a_k b_l$ with $k + l \le n$. (Unofficially, such pairs
$(k, l)$ can be visualized as the points with integral coordinates in the triangle $\triangle$
bounded by the lines $x = 0$, $y = 0$ and $x + y = n$ on the plane.)
Let $m = \lfloor n/2 \rfloor$. We write $\gamma_n = \gamma_n^{(1)} + \gamma_n^{(2)} + \gamma_n^{(3)}$, where
$$\begin{aligned}
\gamma_n^{(1)} &= \sum_{0\le k,l\le m} a_k b_l = \sum_{0\le k\le m} a_k \sum_{0\le l\le m} b_l = s_m t_m,\\
\gamma_n^{(2)} &= \sum_{\substack{0\le k\le m\\ m<l\le n-k}} a_k b_l = \sum_{0\le k\le m} a_k (t_{n-k} - t_m) \qquad (\text{here } n - k \ge n - m \ge m),\\
\gamma_n^{(3)} &= \sum_{\substack{m<k\le n\\ 0\le l\le n-k}} a_k b_l = \sum_{m<k\le n} a_k t_{n-k}.
\end{aligned}$$
($\gamma_n^{(1)}$ is the sum of the terms $a_k b_l$ corresponding to points $(k, l)$ in the largest square
$\square$ inscribed in $\triangle$, while $\gamma_n^{(2)}$, resp. $\gamma_n^{(3)}$, is the sum of the terms corresponding to
the points of $\triangle$ above, resp. to the right of, that square.)
Let $\epsilon > 0$ be arbitrary. By Theorem 10.6.1, we have $s_n t_n \to AB$ as $n \to \infty$, so
there exists $N_1$ such that $n > N_1 \Rightarrow |s_n t_n - AB| < \epsilon/3$. Since $\{t_n\}$ converges, it is
Cauchy, so there exists $N_2$ such that $m, n > N_2 \Rightarrow |t_n - t_m| < \epsilon/(3\alpha + 1)$. Finally,
since $\{\sigma_n\}$ converges, there exists $N_3$ such that $m, n > N_3 \Rightarrow |\sigma_n - \sigma_m| < \epsilon/(3T + 1)$.
Let $N = 2\max\{N_1, N_2, N_3\} + 1$. We will prove that $n > N \Rightarrow |\gamma_n - AB| < \epsilon$,
concluding the proof that $\{c_n\}$ is summable to $AB$.
Let $n > N$. Since $n \ge N + 1 \ge 2N_i + 2$, we have $m = \lfloor n/2 \rfloor \ge N_i + 1 > N_i$ for
$i = 1, 2, 3$.
From $m > N_1$ we immediately obtain $|\gamma_n^{(1)} - AB| = |s_m t_m - AB| < \epsilon/3$.
Since $n - k \ge m > N_2$ for $0 \le k \le m$, we have
$$|\gamma_n^{(2)}| \le \sum_{0\le k\le m} |a_k|\,|t_{n-k} - t_m| \le \frac{\epsilon}{3\alpha+1} \sum_{0\le k\le m} |a_k| \le \epsilon \frac{\alpha}{3\alpha+1} < \frac{\epsilon}{3}.$$
Since $n > m > N_3$, we have
$$|\gamma_n^{(3)}| \le \sum_{m<k\le n} |a_k|\,|t_{n-k}| \le T \sum_{m<k\le n} |a_k| = (\sigma_n - \sigma_m)T < \frac{T}{3T+1}\epsilon < \frac{\epsilon}{3}.$$
By the triangle inequality we finally obtain
$$|\gamma_n - AB| = |\gamma_n^{(1)} - AB + \gamma_n^{(2)} + \gamma_n^{(3)}| \le |\gamma_n^{(1)} - AB| + |\gamma_n^{(2)}| + |\gamma_n^{(3)}| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.$$
10.10.25 Corollary. The Cauchy product of two absolutely convergent series is
absolutely convergent.
Proof. Use the notation of the proof of Theorem 10.10.24, and let $u_n = \sum_{k=0}^n |a_k||b_{n-k}|$, so that $\{u_n\}$ is the Cauchy product of $\{|a_n|\}$ and $\{|b_n|\}$. By the hypotheses
of absolute convergence and Theorem 10.10.24, $\{u_n\}$ is summable. Moreover, $|c_n| \le
u_n$, by the triangle inequality. We conclude that $\{c_n\}$ is absolutely summable, by
the comparison test (Theorem 10.10.12).
10.10.26 Definition (Alternating Sequences and Series). An alternating sequence is a sequence $\{a_n\}$ such that $(-1)^n a_n \ge 0$ for all $n$, or else $(-1)^{n+1} a_n \ge 0$
for all $n$. Thus an alternating sequence is of the form $a_n = (-1)^n b_n$, or else of the
form $a_n = (-1)^{n+1} b_n$, for some nonnegative sequence $\{b_n\}$. A series $\sum_n a_n$ where $\{a_n\}$
is alternating is called an alternating series.

10.10.27 Theorem (Alternating Series Test). An alternating series the absolute values of whose terms decrease monotonically to zero is convergent: if $\{a_n\}$
is alternating and $|a_0| \ge |a_1| \ge \dots$ with $a_n \to 0$ as $n \to \infty$, then $\sum_n a_n$ converges.
Proof. We consider the case when $(-1)^n a_n \ge 0$; the case $(-1)^{n+1} a_n \ge 0$ is
analogous (or can be obtained by reasoning in terms of the sequence $\{-a_n\}$ instead). Let
$s_n = \sum_{k=0}^n a_k$ be the sequence of partial sums of $\sum_n a_n$. Since $a_n = (-1)^n |a_n|$ and
$\{|a_n|\}$ decreases, we have
$$s_{2n+2} = s_{2n} + a_{2n+1} + a_{2n+2} = s_{2n} - (|a_{2n+1}| - |a_{2n+2}|) \le s_{2n}$$
and
$$s_{2n+3} = s_{2n+1} + a_{2n+2} + a_{2n+3} = s_{2n+1} + (|a_{2n+2}| - |a_{2n+3}|) \ge s_{2n+1}.$$
Thus, $s_0 \ge s_2 \ge s_4 \ge \dots$ and $s_1 \le s_3 \le s_5 \le \dots$ (the sequence $\{s_{2n}\}$ is decreasing,
whereas $\{s_{2n+1}\}$ increases).
Now we show that $s_{2m} \ge s_{2n+1}$ for all $m, n \in \mathbb N$. If $m \le n$, we have $s_{2m} \ge
s_{2n} \ge s_{2n} - |a_{2n+1}| = s_{2n} + a_{2n+1} = s_{2n+1}$. If $m > n$ then $n \le m - 1$, so we have
$s_{2n+1} \le s_{2(m-1)+1} = s_{2m-1} = s_{2m} - a_{2m} = s_{2m} - |a_{2m}| \le s_{2m}$.
Let
$$I_n = \begin{cases} [s_{n+1}, s_n] & \text{if } n \text{ is even};\\ [s_n, s_{n+1}] & \text{if } n \text{ is odd}.\end{cases}$$
Then $\{I_n\}$ is a nested sequence of nonempty closed and bounded intervals (to
be explicit: $I_{2n+1} \subset I_{2n}$ since $s_{2n} \ge s_{2n+2}$; $I_{2n+2} \subset I_{2n+1}$ since $s_{2n+1} \le s_{2n+3}$).
Moreover, $I_n$ has length $|s_{n+1} - s_n| = |a_{n+1}| \to 0$ as $n \to \infty$, by assumption. By
Corollary 9.4.20, $\bigcap_n I_n$ contains a unique element $L \in \mathbb R$. Since both $s_n$ and $L$ lie in $I_n$, we have $|s_n - L| \le |a_{n+1}|$, so
$s_n \to L$ as $n \to \infty$. Hence $\sum_n a_n$ converges to $L$.
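The proof gives more than convergence: consecutive partial sums bracket the limit, $s_{2n+1} \le L \le s_{2n}$, so $|s_n - L| \le |a_{n+1}|$. The sketch below checks this numerically for $a_n = (-1)^n/(n+1)$, whose sum is $\log 2$ (the standard value recalled in Remark 10.10.30; Python's `math.log` is the natural logarithm).

```python
import math
from itertools import accumulate

# a_n = (-1)^n / (n + 1), n = 0, 1, 2, ...:  1 - 1/2 + 1/3 - ...
terms = [(-1) ** n / (n + 1) for n in range(2001)]
s = list(accumulate(terms))  # s[n] = a_0 + ... + a_n
L = math.log(2)

for n in (0, 10, 1000):
    # even-indexed partial sums lie above L, odd-indexed ones below,
    # and the error never exceeds the next term |a_{n+1}|
    print(n, s[n], L, abs(s[n] - L) <= abs(terms[n + 1]))
```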
10.10.28 Remarks. The harmonic series
$$\sum_{n=1}^\infty \frac1n \tag{1}$$
is divergent, while the alternating harmonic series
$$\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} \tag{2}$$
is convergent, by the Alternating Series Test 10.10.27. The second series is obtained
from the first by reversing the sign of every second term of the series. What happens
if instead of reversing the sign of every second term we reverse the sign of every
$k$th term?
For example, the series corresponding to $k = 1, \dots, 5$ are shown below.
$$\begin{aligned}
k=1:&\quad S_1 = -1 - \tfrac12 - \tfrac13 - \tfrac14 - \tfrac15 - \tfrac16 - \tfrac17 - \tfrac18 - \tfrac19 - \tfrac1{10} - \cdots,\\
k=2:&\quad S_2 = 1 - \tfrac12 + \tfrac13 - \tfrac14 + \tfrac15 - \tfrac16 + \tfrac17 - \tfrac18 + \tfrac19 - \tfrac1{10} + \cdots,\\
k=3:&\quad S_3 = 1 + \tfrac12 - \tfrac13 + \tfrac14 + \tfrac15 - \tfrac16 + \tfrac17 + \tfrac18 - \tfrac19 + \tfrac1{10} + \cdots,\\
k=4:&\quad S_4 = 1 + \tfrac12 + \tfrac13 - \tfrac14 + \tfrac15 + \tfrac16 + \tfrac17 - \tfrac18 + \tfrac19 + \tfrac1{10} + \cdots,\\
k=5:&\quad S_5 = 1 + \tfrac12 + \tfrac13 + \tfrac14 - \tfrac15 + \tfrac16 + \tfrac17 + \tfrac18 + \tfrac19 - \tfrac1{10} + \cdots.
\end{aligned}$$
Since the negative of the harmonic series is divergent, S1 is divergent. Since the
alternating harmonic series is convergent, S2 is convergent. We have divergence for
k = 1 and convergence for k = 2. What happens for k > 2?
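10.10.29 Theorem. Let $S_k$ denote the variant of the harmonic series defined
above. Then $S_k$ is convergent if and only if $k = 2$.
Proof. The cases $k = 1$ and $k = 2$ are already known. For $k > 2$, let $S_{k,m}$ denote the sum of the first $m$ terms of $S_k$, so that $\{S_{k,m}\}_{m=0}^\infty$ is the sequence of partial sums of $S_k$. We shall examine the subsequence $\{S_{k,km}\}_{m=0}^\infty$.
We write
$$S_{k,km} = \hat S_{1,m} + \dots + \hat S_{k-2,m} + \hat S_{k-1,m} - \hat S_{k,m},$$
where, for $j = 1, \dots, k$,
$$\hat S_{j,m} = \sum_{n=0}^{m-1} \frac{1}{nk + j}.$$
For example, for $S_{5,5m}$ we have
$$S_{5,5m} = \hat S_{1,m} + \hat S_{2,m} + \hat S_{3,m} + \hat S_{4,m} - \hat S_{5,m},$$
where
$$\begin{aligned}
\hat S_{1,m} &= \tfrac11 + \tfrac16 + \tfrac1{11} + \dots + \tfrac{1}{5(m-1)+1} = \sum_{n=0}^{m-1} \frac{1}{5n+1},\\
\hat S_{2,m} &= \tfrac12 + \tfrac17 + \tfrac1{12} + \dots + \tfrac{1}{5(m-1)+2} = \sum_{n=0}^{m-1} \frac{1}{5n+2},\\
\hat S_{3,m} &= \tfrac13 + \tfrac18 + \tfrac1{13} + \dots + \tfrac{1}{5(m-1)+3} = \sum_{n=0}^{m-1} \frac{1}{5n+3},\\
\hat S_{4,m} &= \tfrac14 + \tfrac19 + \tfrac1{14} + \dots + \tfrac{1}{5(m-1)+4} = \sum_{n=0}^{m-1} \frac{1}{5n+4},\\
\hat S_{5,m} &= \tfrac15 + \tfrac1{10} + \tfrac1{15} + \dots + \tfrac{1}{5(m-1)+5} = \sum_{n=0}^{m-1} \frac{1}{5n+5}.
\end{aligned}$$
For $j = 1, \dots, k-2$, we have
$$\hat S_{j,m} = \sum_{n=0}^{m-1} \frac{1}{nk + j} \ge \sum_{n=0}^{m-1} \frac{1}{nk + k} = \sum_{n=1}^{m} \frac{1}{nk} = \frac1k \sum_{n=1}^{m} \frac1n \to +\infty$$
as $m \to \infty$, by the divergence of the harmonic series.
We also have
$$\hat S_{k-1,m} - \hat S_{k,m} = \frac{1}{k-1} - \frac1k + \dots + \frac{1}{(m-1)k + k - 1} - \frac{1}{mk}.$$
This is a partial sum of the alternating series $\frac{1}{k-1} - \frac1k + \frac{1}{2k-1} - \frac{1}{2k} + \cdots$, whose terms decrease in absolute value to zero. By the Alternating Series Test 10.10.27 this series converges, so
$$\hat S_{k-1,m} - \hat S_{k,m} \to \alpha$$
as $m \to \infty$, for some real $\alpha$.
We have
$$S_{k,km} = \hat S_{1,m} + \dots + \hat S_{k-2,m} + \big(\hat S_{k-1,m} - \hat S_{k,m}\big) \to +\infty$$
as $m \to \infty$, since the first $k - 2 \ge 1$ terms are positive and each tends to $+\infty$, while the
difference in parentheses is bounded.
The sequence $\{S_{k,m}\}_{m=0}^\infty$ of partial sums of $S_k$ contains the divergent subsequence $\{S_{k,km}\}_{m=0}^\infty$. Therefore, the whole sequence $\{S_{k,m}\}_{m=0}^\infty$ is divergent. We
have established that the series $S_k$ is divergent for $k > 2$.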
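The sketch below takes a numerical look at the subsequence $S_{k,km}$ used in the proof. It sums the first $km$ terms of $S_k$, negating every $k$th term. For $k = 2$ the values settle near $\log 2$. For $k = 3$ they keep growing, roughly like $\frac{k-2}{k}\log m$, as the estimate of the $\hat S_{j,m}$ in the proof suggests. The function name `S_partial` is illustrative, and finitely many floating-point partial sums only illustrate the theorem.

```python
import math

def S_partial(k, n_terms):
    """Sum of the first n_terms terms of S_k: the terms 1/n, n = 1, 2, ...,
    with a minus sign on every k-th term (n = k, 2k, 3k, ...)."""
    total = 0.0
    for n in range(1, n_terms + 1):
        total += -1 / n if n % k == 0 else 1 / n
    return total

print("log 2 =", math.log(2))
for m in (10, 1_000, 100_000):
    # S_{2,2m} settles near log 2, while S_{3,3m} keeps growing
    print(m, S_partial(2, 2 * m), S_partial(3, 3 * m))
```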
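10.10.30 Remark. For the eyes of the inquisitive reader we put forth the following
series:¶
$$\begin{aligned}
\sum_{n=1}^\infty \frac{(-1)^{n-1}}{n} &= \frac11 - \frac12 + \frac13 - \frac14 + \cdots = \log 2;\\
\sum_{n=1}^\infty \frac{(-1)^{n-1}}{2n-1} &= \frac11 - \frac13 + \frac15 - \frac17 + \cdots = \frac{\pi}{4};\\
\sum_{n=0}^\infty \Big(\frac{1}{8n+1} - \frac{1}{8n+3} - \frac{1}{8n+5} + \frac{1}{8n+7}\Big) &= \frac11 - \frac13 - \frac15 + \frac17 + \cdots = \frac{\log(1+\sqrt2)}{\sqrt2}.
\end{aligned}$$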
In each of them there is, on average, an equal balance of positive and negative
terms. The evaluation of these and an infinity of similar formulas can be done
using the Dirichlet Class Number Formula, a most remarkable result in analytic
number theory (the branch of mathematics that studies arithmetic properties of
numbers primarily utilizing tools from analysis).
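These values can at least be corroborated numerically. The Python sketch below compares partial sums with the stated closed forms. Agreement to several decimal places is evidence, not a proof. The first two series converge slowly, with an error of order $1/N$ after $N$ terms. In the third, each block of four terms is of order $1/n^3$, so it converges much faster. The cutoff `N` is an arbitrary choice.

```python
import math

N = 200_000

log2_series = sum((-1) ** (n - 1) / n for n in range(1, N + 1))
leibniz = sum((-1) ** (n - 1) / (2 * n - 1) for n in range(1, N + 1))
mod8 = sum(1 / (8 * n + 1) - 1 / (8 * n + 3) - 1 / (8 * n + 5) + 1 / (8 * n + 7)
           for n in range(N))

print(log2_series, math.log(2))
print(leibniz, math.pi / 4)
print(mod8, math.log(1 + math.sqrt(2)) / math.sqrt(2))
```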
10.11 Euler’s Number e

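10.11.1 Proposition. The series $\sum_{n=0}^\infty \frac{1}{n!}$ converges.
Proof. Consider the $k$-th partial sum
$$\begin{aligned}
s_k = \sum_{n=0}^k \frac{1}{n!} &= 1 + 1 + \frac{1}{1\cdot2} + \frac{1}{1\cdot2\cdot3} + \dots + \frac{1}{1\cdot2\cdot3\cdots k}\\
&\le 1 + \frac{1}{2^0} + \frac{1}{2^1} + \frac{1}{2^2} + \dots + \frac{1}{2^{k-1}}\\
&= 1 + \frac{1 - (\frac12)^k}{1 - \frac12}\\
&< 1 + \frac{1}{1 - \frac12} = 3.
\end{aligned}$$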
The sequence {sk } of partial sums is a monotonically increasing sequence that is

bounded above. Hence, it is convergent.
¶ “log” without further context means “natural logarithm” (usually denoted “ln” in elementary
courses), that is, log means $\log_e$, where $e = 2.71828\ldots$ is Euler’s Number: see Section 10.11 below.
(For a somewhat different and simpler, if less direct, proof, use the ratio test:
$\lim_{n\to\infty} \frac{1/(n+1)!}{1/n!} = \lim_{n\to\infty} \frac{1}{n+1} = 0 < 1 \Rightarrow \sum_n 1/n!$ converges.)
10.11.2 Definition (The Number e). Define
$$e = \sum_{n=0}^\infty \frac{1}{n!}.$$
(A numerical approximation is $e = 2.718281828459\ldots$.)

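10.11.3 Proposition.
$$\lim_{n\to\infty} \Big(1 + \frac1n\Big)^n = e.$$
Proof. By the Binomial Theorem 10.8.7 and formula (10.15) for the binomial coefficients:
$$\Big(1 + \frac1n\Big)^n = \sum_{k=0}^n \binom{n}{k} \frac{1}{n^k} = \sum_{k=0}^n \frac{1}{k!} \prod_{l=1}^k \frac{n-l+1}{n} = \sum_{k=0}^n \underbrace{\frac{1}{k!}\prod_{l=1}^k \Big(1 - \frac{l-1}{n}\Big)}_{a_{n,k}}.$$
Setting $a_{n,k} = 0$ for $k > n$, we may write $(1 + \frac1n)^n = \sum_{k=0}^\infty a_{n,k}$. For each fixed $k$, $\{a_{n,k}\}_n$ is nonnegative and monotonically increases to $1/k!$. Since $\sum_k 1/k!$ converges, it follows
from the Monotone Convergence Theorem 10.10.14 (with the roles of the indices $n$ and $k$ interchanged) that $(1 + \frac1n)^n = \sum_{k=0}^\infty a_{n,k} \to \sum_{k=0}^\infty \frac{1}{k!} = e$ as $n \to \infty$.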
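The two descriptions of $e$ converge at very different speeds. After the term $1/k!$, the partial sums of $\sum 1/n!$ are within $1/(k \cdot k!)$ of $e$, whereas $(1 + 1/n)^n$ is off by roughly $e/(2n)$. The Python sketch below computes exact rational partial sums with `fractions.Fraction` and compares both descriptions against the floating-point constant `math.e`. The helper name `e_partial_sum` is illustrative.

```python
import math
from fractions import Fraction

def e_partial_sum(k):
    """Exact partial sum 1/0! + 1/1! + ... + 1/k! as a Fraction."""
    total, term = Fraction(0), Fraction(1)
    for n in range(k + 1):
        total += term  # term == 1/n!
        term /= n + 1  # becomes 1/(n+1)!
    return total

print(float(e_partial_sum(15)), math.e)
for n in (10, 1_000, 1_000_000):
    print(n, (1 + 1 / n) ** n, math.e - (1 + 1 / n) ** n)
```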
10.27 Exercise. Show that the sequences $\{a_n\}$ and $\{b_n\}$ defined for $n \ge 1$ by
$$a_n = \Big(1 + \frac{1}{2^n}\Big)^{2^n}, \qquad b_n = \Big(1 - \frac{1}{2^n}\Big)^{-2^n}$$
are bounded and monotonic ($\{a_n\}$ is increasing and $\{b_n\}$ is decreasing). Also show
that they have the same limit.
(In view of Proposition 10.11.3, their common limit is $e$ because $\{a_n\}$ is a
subsequence of $\{(1 + \frac1n)^n\}$.)
10.11.4 Theorem. Euler’s number e is irrational.
Proof. The proof is by contradiction. Assume that e is rational. Then e = m/n for some integers m, n with n ≠ 0. Since e > 0, we may assume m, n > 0 without loss of generality. We have also shown that 2 < e < 3; therefore e is not an integer and we must have n ≥ 2. Let
$$a = \sum_{k=0}^{n} \frac{n!}{k!} = \sum_{k=0}^{n} n(n-1)\cdots(n-k+1)$$
and
$$\epsilon = \sum_{k=n+1}^{\infty} \frac{n!}{k!} = \sum_{k=1}^{\infty} \frac{n!}{(n+k)!} = \sum_{k=1}^{\infty} \frac{1}{(n+1)(n+2)\cdots(n+k)}.$$
Then a ∈ N+, and, by the geometric formula (10.21),
$$0 < \epsilon < \sum_{k=1}^{\infty} \frac{1}{\underbrace{n \cdots n}_{k \text{ factors}}} = \sum_{k=1}^{\infty} \frac{1}{n^k} = \frac{1}{n} \cdot \frac{1}{1 - \frac{1}{n}} = \frac{1}{n-1} \le \frac{1}{2-1} = 1.$$
We have m = ne, hence m(n − 1)! = n(n − 1)! e = n! e, so
$$m(n-1)! = n! \sum_{k=0}^{\infty} \frac{1}{k!} = \sum_{k=0}^{n} \frac{n!}{k!} + \sum_{k=n+1}^{\infty} \frac{n!}{k!} = a + \epsilon.$$
Hence ϵ = m(n − 1)! − a is an integer in (0, 1), a contradiction. Therefore, e is irrational.
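To see the key inequality 0 < ϵ < 1/(n − 1) in action, one can compute a and a truncation of ϵ exactly for small n. This is an illustration only: the proof must handle every n, and a truncated sum only approximates ϵ from below. The helper names below are hypothetical.

```python
from fractions import Fraction
from math import factorial

def a_value(n):
    """a = sum_{k=0}^{n} n!/k!, computed with integers only."""
    total, term = 0, 1            # term = n!/k!, starting at k = n
    for k in range(n, -1, -1):
        total += term
        term *= k                 # n!/(k-1)! = (n!/k!) * k
    return total

def epsilon_partial(n, terms):
    """First `terms` terms of eps = sum_{k>=1} 1/((n+1)(n+2)...(n+k))."""
    total, denom = Fraction(0), 1
    for k in range(1, terms + 1):
        denom *= n + k
        total += Fraction(1, denom)
    return total

for n in range(2, 12):
    eps = epsilon_partial(n, 40)
    assert 0 < eps < Fraction(1, n - 1)          # the bound from the proof
    # a + eps approximates n! * e, so (a + eps) / n! approximates e
    print(n, a_value(n), float(eps), float((a_value(n) + eps) / factorial(n)))
```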
10.11.5 Definition. The natural exponential function exp : E → E is the function exp_e, that is, the function x ↦ e^x (with exp(−∞) = e^{−∞} = 0 and exp(+∞) = +∞). The natural logarithm function ln : [0, +∞] → E is simply the function log_e (logarithm to the base e), with ln(0) = −∞ and ln(+∞) = +∞.
Note: In pure mathematics it is very common to denote the natural logarithm
function by “log”. However, the notation “ln” is an ISO standard and precludes any
confusion with logarithms to other bases, particularly with “common” logarithms
(to the base 10).
Foundations Note 11
Continuity

July 28, 2020
Contents
11.1 Limits of Functions . . . . . . . . . . . . . . . . . . . . . . . 11-1
11.2 Continuous Functions . . . . . . . . . . . . . . . . . . . . . . 11-13
11.3 Density and compactness in spaces of functions . . . . . . . . 11-27
11.1 Limits of Functions

Note: In this Note the symbol ⇒ is used to denote the sentential connective →
and also to denote a tautological implication (logical consequence). The symbol
Moreover, throughout this note when we use topological language (neighbor-
hoods, open sets, limit points, closures, etc.) we will always refer to the topology
of the extended real line E, unless otherwise specified.
11.1.1 Definition (Pointwise and uniform limits of sequences of functions). If S is a subset of E and {fn | n ∈ N} is a family of functions fn : S → E, we call {fn} a sequence of functions on S.
We say that a function g : S → E is the pointwise limit of {fn } as n → ∞
if limn→∞ fn (x) = g(x) for all x ∈ S. We write: fn → g (pointwise) as n → ∞.
(“Pointwise” means that the limit function g is determined at each point x as a
limit of the values of the functions of the sequence at said point. The pointwise
limit of a sequence of functions is simply a collection of numerical limits, one at each point x.)
To simplify the notation, we will write “ϵ > 0” to mean “ϵ ∈ (0, +∞)” (that is, we exclude ϵ = +∞), and similarly for “δ > 0”, etc. However, should the reader forget this convention, no logical harm is done.
We have fn → g (pointwise) as n → ∞ if and only if
(∀x ∈ S)(∀ϵ > 0)(∃N ∈ N)(n > N ⇒ fn (x) ∈ Nϵ (g(x))).
If g takes values in R, then fn → g (pointwise) as n → ∞ if and only if
(11.1) (∀x ∈ S)(∀ϵ > 0)(∃N ∈ N)(n > N ⇒ |fn (x) − g(x)| < ϵ).
If {fn} is a sequence of functions S → R (note that here the functions take values in R, although S itself may be any subset of E), we say that a function g : S → R is the uniform limit of {fn} on S as n → ∞ if
(11.2) (∀ϵ > 0)(∃N ∈ N)(∀x ∈ S)(n > N ⇒ |fn (x) − g(x)| < ϵ).
We write: fn → g as n → ∞ (uniformly on S).

Note that we do not define the concept of uniform convergence of a sequence
of functions taking values in E.∗
Comparing the definitions, we see that in the definition of fn → g pointwise,
N = N (ϵ, x) is allowed to depend on both ϵ and x, whereas in the definition
of uniform convergence, N = N (ϵ) only depends on ϵ. This explains the name
“uniform”: the choice of N must be uniform (that is, constant or fixed) as x varies
over S (of course N still depends on ϵ). In this light, it is obvious that if {fn }
has the uniform limit g then it has the pointwise limit g. The converse is false in
general, as the example below shows.
11.1.2 Remark. We have fn → f pointwise on S if and only if, for every x ∈ S and every neighborhood N of f(x), fn(x) is eventually in N.
By contrast, there is no purely topological definition ensuring fn → f uniformly
on S. The definition of uniform limit above depends essentially on the fact that
we have a uniform way (via ϵ) to measure the sizes of neighborhoods of various
points x as x itself varies.
11.1.3 Proposition. The pointwise limit of a sequence of functions {fn} is unique (if it exists).
∗
In applications beyond those concerning us, uniform convergence of functions taking values
in E is studied by introducing a different metric (notion of distance). For instance, one may
consider uniform limits with respect to the metric on E defined by d(x, y) = |f −1 (y) − f −1 (x)|
with f the function from Exercise 10.2.
Proof. Let {fn} be a sequence of functions S → E. If fn → g (pointwise), then g(x) is the limit of the sequence {fn(x)} for each x ∈ S. By the uniqueness of limits of sequences (Proposition 10.3.14), g(x) is unique for each x ∈ S (if it exists); hence the pointwise limit function g is unique.
11.1.4 Example. Let fn : [0, 1] → [0, 1] : x ↦ x^n. Let
$$g : [0,1] \to [0,1] : x \mapsto \begin{cases} 0 & \text{if } x \in [0,1); \\ 1 & \text{if } x = 1. \end{cases}$$
We claim that fn → g pointwise on [0, 1]. Clearly fn(1) = 1 → 1 = g(1) as n → ∞. Similarly fn(0) → 0 = g(0). For x ∈ (0, 1) we have fn(x) = x^n = exp_x(n) → exp_x(+∞) = 0 as n → +∞, by Theorem 10.9.6.
However, g is not the uniform limit of {fn} on [0, 1] (one also says that the pointwise limit g “is not attained uniformly” as x varies over [0, 1]). To see this, take any ϵ ∈ (0, 1). For any N ∈ N and any n > N we have fn ≥ ϵ on [a, 1], where a = ϵ^{1/n} (note that a < 1 since ϵ < 1 and x ↦ x^{1/n} is strictly increasing). Therefore, for any x ∈ [a, 1) we have g(x) = 0, yet fn(x) ∉ Nϵ(g(x)) = (−ϵ, ϵ).
The reason the pointwise limit is not reached uniformly on [0, 1] is the extremely
slow decay to 0 (as n → ∞) of the values fn (x) for x very close to 1. For x very
close to 1, the values fn (x) stay close to 1 rather than close to 0, even when n is
moderately large. On the other hand, if we restrict the functions fn to any interval [0, b] for fixed b ∈ (0, 1), then {fn} has a uniform limit on [0, b], namely the zero function x ↦ 0. (Given ϵ > 0, it suffices to take N > log_b(ϵ): for n > N and all x ∈ [0, b] we have 0 ≤ x^n ≤ b^n < ϵ, so the same N works uniformly for all x ∈ [0, b].)
Reiterating: A pointwise limit of a sequence of functions on a set S need not
be a uniform limit on S (the pointwise limit need not be attained uniformly on all
of S).
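The two behaviours in this example can be observed numerically. The sketch below (hypothetical helper names, floating point) exhibits, for each n, the point a = ϵ^{1/n} < 1 at which fn(a) = ϵ, so the error on [0, 1) never drops below ϵ; on [0, b] it finds the first n with b^n < ϵ, which agrees with the bound N > log_b(ϵ).

```python
import math

def witness(eps, n):
    """The point a = eps**(1/n) < 1 from the text, where f_n(a) = a**n = eps."""
    return eps ** (1.0 / n)

def sup_on_0_b(b, n):
    """sup{|x**n - 0| : x in [0, b]} = b**n, since x**n increases on [0, b]."""
    return b ** n

def first_uniform_n(b, eps):
    """Smallest n with b**n < eps; every larger n works as well (0 < b < 1)."""
    n = 0
    while b ** n >= eps:
        n += 1
    return n

# On [0, 1) the error does not shrink: for every n there is a point with f_n = eps.
for n in (10, 100, 1000):
    a = witness(0.5, n)
    print(n, a, a ** n)

# On [0, b] one N works for all x at once, matching N > log_b(eps).
b, eps = 0.9, 0.01
print(first_uniform_n(b, eps), math.log(eps, b))
```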
11.1 Exercise. Let {fn } be a sequence of functions S → R. Assume that fn → f
(pointwise) as n → ∞. If S = {s1 , s2 , . . . , sm } is a finite set, prove that the
pointwise limit f is attained uniformly on S.
11.1.5 Definition (Supremum norm. Uniform distance and uniform topology).
Fix an arbitrary set S. The supremum norm (or “sup norm”) of any function
f : S → E is the extended real number
$$\|f\| = \sup\{\, |f(x)| \mid x \in S \,\}.$$
(∥f ∥ is the least upper bound of the absolute values of the values of f on S.)
Note that ∥f ∥ ≤ B if and only if −B ≤ f (x) ≤ B for all x ∈ S. In particular,
∥f ∥ < +∞ if and only if f is bounded.
Define the uniform distance du (f, g) between two (not necessarily bounded)
functions f, g : S → R to be the nonnegative extended real number
du (f, g) = ∥g − f ∥ .
For all functions f, g, h : S → R we have
1. Positivity: du (f, g) > 0 unless f = g (i.e., unless f (x) = g(x) for all x ∈ S);
2. Symmetry: du (f, g) = du (g, f );
3. Triangle inequality: du (f, h) ≤ du (f, g) + du (g, h).
While we shall rarely use the notation du (f, g) in the future (writing directly
∥g − f ∥ instead), the reader should think of any sup norm of a difference as a
uniform distance.
For any real ϵ > 0, the (uniform) ϵ-neighborhood of a function f ∈ R^S is the set
$$N^u_\epsilon(f) = N_\epsilon(f) = \{\, g \in \mathbb{R}^S \mid \|g - f\| < \epsilon \,\}.$$
(We use the superscript “u” for emphasis only, to remind the reader that the neighborhood is uniform. Note that hitherto the notion of neighborhood of a function was undefined.) Building on uniform ϵ-neighborhoods, one defines a (uniform) neighborhood of f to be any set N ⊂ R^S such that N^u_ϵ(f) ⊂ N for some ϵ > 0, and an open set U ⊂ R^S to be one that is a (uniform) neighborhood of each of its elements.
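For a finite set S the supremum in the definition is a maximum, so the sup norm and the uniform distance can be computed directly. The sketch below makes that simplifying assumption (for infinite S, sampling finitely many points only gives a lower bound for ∥f∥) and spot-checks symmetry, the triangle inequality and du(f, f) = 0 on random functions; all names are illustrative.

```python
import random

def sup_norm(f, S):
    """||f|| = sup{|f(x)| : x in S}; for a finite set S the sup is a max."""
    return max(abs(f(x)) for x in S)

def uniform_distance(f, g, S):
    """d_u(f, g) = ||g - f||."""
    return sup_norm(lambda x: g(x) - f(x), S)

S = range(20)                        # a hypothetical finite domain
rng = random.Random(0)
tables = [{x: rng.uniform(-5, 5) for x in S} for _ in range(3)]
f, g, h = (t.__getitem__ for t in tables)

assert uniform_distance(f, g, S) == uniform_distance(g, f, S)      # symmetry
assert uniform_distance(f, h, S) <= (uniform_distance(f, g, S)
                                     + uniform_distance(g, h, S) + 1e-12)
assert uniform_distance(f, f, S) == 0
```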
11.1.6 Remark. For any sequence {fn} in R^S, we have fn → f uniformly as n → ∞ if and only if f ∈ R^S and
(∀ϵ > 0)(∃N ∈ N)(∀n ∈ N)(n > N ⇒ ∥fn − f∥ < ϵ).
Equivalently, fn → f uniformly as n → ∞ if and only if, given any uniform neighborhood N of f, fn is eventually in N.
The above formulation of uniform limit in R^S is formally identical to that of the limit an → L of a sequence in R, and looks cleaner than that given in Definition 11.1.1 because the innermost quantification over all x ∈ S is implicit in the definition of the sup norm.
11.2 Exercise. For all f, g : S → R prove:
1. ∥−f ∥ = ∥f ∥.
2. ∥f ∥ > 0 unless f is the zero function.

11.3 Exercise. 1. For all functions f, g : S → R prove:
∥f + g∥ ≤ ∥f ∥ + ∥g∥ .
2. For all functions f : S → R and all u ∈ R, prove:
∥uf ∥ ≤ |u| · ∥f ∥
provided we interpret |u| · ∥f ∥ as 0 if u = 0 (even if ∥f ∥ = +∞).

11.4 Exercise. Use exercises 11.2 and 11.3 to prove properties 1.–3. of the uniform
distance du (Definition 11.1.5).
11.1.7 Definition. Let S be a subset of E and let L ∈ E and c ∈ S∗ (c is a limit point of S). Given a function f : S → E, we say that L is the limit of f(x) as x tends to c, or f(x) approaches L as x approaches c, if, for every neighborhood N′ of L, there exists a neighborhood N of c such that
(∀x ∈ S)(x ∈ N \ {c} ⇒ f(x) ∈ N′).
We say that the limit of f (x) exists as x approaches c if there exists L ∈ E such
that f (x) → L as x → c. If L ∈ R, we say that the limit exists properly, or f
converges to L as x → c.
Two immediate reformulations of the above definition are:
$$f_*\big(S \cap (N \setminus \{c\})\big) \subset N',$$
(where f∗ is the forward-image map f∗ : P(S) → P(E)), and
$$S \cap (N \setminus \{c\}) \subset f^*(N')$$
(where f^∗ is the inverse-image map f^∗ : P(E) → P(S)).

To denote the property above, we write
f (x) → L as x → c.
Equivalently, f (x) → L as x → c if and only if for all ϵ > 0 there exists δ > 0
such that
(11.3) (∀x ∈ S)(x ∈ Nδ (c) \ {c} ⇒ f (x) ∈ Nϵ (L)).
(Recall the definition of ϵ-neighborhoods of ±∞ in 10.2.1.)

Now we spell out many special cases of the definition.
• When both c, L are real, we have limx→c f (x) = L if and only if
(11.4) (∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(0 < |x − c| < δ ⇒ |f (x) − L| < ϵ).
(11.4) is the most important special case of (11.3) by far.
• When c ∈ R and L = +∞ we have limx→c f (x) = +∞ if and only if
(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(0 < |x − c| < δ ⇒ f (x) > 1/ϵ).
• When c ∈ R and L = −∞ we have limx→c f (x) = −∞ if and only if
(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(0 < |x − c| < δ ⇒ f (x) < −1/ϵ).
• When L ∈ R and c = +∞ we have limx→+∞ f (x) = L if and only if
(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(x > 1/δ ⇒ |f (x) − L| < ϵ).
• When L ∈ R and c = −∞ we have limx→−∞ f(x) = L if and only if
(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(x < −1/δ ⇒ |f(x) − L| < ϵ).
• When c = L = +∞ we have limx→+∞ f (x) = +∞ if and only if
(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(x > 1/δ ⇒ f (x) > 1/ϵ).
Other cases when c, L are ±∞ are analogous.
11.1.8 Remark. In Definition 11.1.7, it suffices to consider fixed bases of neighborhoods, say {Nn′} of L and {Nm} of c (for instance, shrinking sequences of neighborhoods such as Nm = N1/m(c)). Then limx→c f(x) = L is equivalent to the statement
(∀n)(∃m)(∀x ∈ S)(x ∈ Nm \ {c} ⇒ f(x) ∈ Nn′).
11.1.9 Proposition. If the limit of f(x) exists as x → c, the limit is unique. In this case, we write
$$\lim_{x\to c} f(x) = L$$
to denote the unique L ∈ E such that f(x) → L as x → c.

Proof. If f(x) → L1 and also f(x) → L2 as x → c, then for all neighborhoods N1′ of L1 and N2′ of L2 there exist neighborhoods N1, N2 of c such that f∗(N1 \ {c}) ⊂ N1′ and f∗(N2 \ {c}) ⊂ N2′. Let N = N1 ∩ N2. Since c is a limit point of the domain of f, and N is a neighborhood of c, there exists x ∈ N \ {c} in the domain of f; hence f(x) ∈ f∗((N1 ∩ N2) \ {c}) ⊂ f∗(N1 \ {c}) ∩ f∗(N2 \ {c}) ⊂ N1′ ∩ N2′. Hence N1′ and N2′ are not disjoint. We conclude that L1 and L2 have no disjoint neighborhoods, so L1 = L2, by the Hausdorff property of E, Proposition 10.2.5.
11.1.10 Example. Consider f : [0, +∞) → R : x ↦ √x and fix c ≥ 0. Then limx→c f(x) exists and equals √c. To prove this statement we use a so-called “ϵ-δ argument” (that is, we prove that (11.4) holds). It goes as follows.
Let ϵ > 0 be given. First, suppose that c ≠ 0. Choose δ = ϵ√c. For x ≥ 0 we have
$$0 < |x - c| < \delta \implies \left|\sqrt{x} - \sqrt{c}\right| = \left|\sqrt{x} - \sqrt{c}\right| \cdot \frac{\sqrt{x} + \sqrt{c}}{\sqrt{x} + \sqrt{c}} = \frac{|x - c|}{\sqrt{x} + \sqrt{c}} \le \frac{|x - c|}{\sqrt{c}} < \frac{\delta}{\sqrt{c}} = \epsilon.$$
If c = 0, choose δ = ϵ². For x ≥ 0 we have
$$0 < |x - 0| < \delta \implies \left|\sqrt{x} - \sqrt{0}\right| = \sqrt{x} < \sqrt{\delta} = \epsilon,$$
since f is strictly increasing on [0, +∞).
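The choice δ = ϵ√c (and δ = ϵ² when c = 0) can be spot-checked by sampling points x ≥ 0 with 0 < |x − c| < δ and recording the largest value of |√x − √c| seen. This is a randomized sanity check in floating-point arithmetic, not a proof; the function names are hypothetical.

```python
import math
import random

def delta_for_sqrt(c, eps):
    """delta from the example: eps*sqrt(c) if c > 0, and eps**2 if c == 0."""
    return eps * math.sqrt(c) if c > 0 else eps ** 2

def spot_check_sqrt(c, eps, trials=10_000, seed=0):
    """Largest |sqrt(x) - sqrt(c)| over sampled x >= 0 with 0 < |x - c| < delta."""
    rng = random.Random(seed)
    delta = delta_for_sqrt(c, eps)
    worst = 0.0
    for _ in range(trials):
        x = c + rng.uniform(-delta, delta)
        if x >= 0 and 0 < abs(x - c) < delta:
            worst = max(worst, abs(math.sqrt(x) - math.sqrt(c)))
    return worst

print(spot_check_sqrt(4.0, 0.1), spot_check_sqrt(0.0, 0.5))
```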

11.1.11 Example. Consider f : (0, +∞) → R : x ↦ 1/x. Then limx→c f(x) exists for all c > 0, and its value is L = 1/c. The ϵ-δ argument to prove this statement goes as follows. Given c > 0 and ϵ > 0, choose δ = min{ϵc²/2, c/2}. Then δ ≤ c/2 and δ ≤ ϵc²/2. Further, we note that
|x − c| < δ ⇒ −δ < x − c ⇒ c − δ < x,
and c − δ ≥ c/2 > 0. For x > 0 we have
$$0 < |x - c| < \delta \implies \left|\frac{1}{x} - \frac{1}{c}\right| = \frac{|x - c|}{xc} < \frac{|x - c|}{(c - \delta)c} \le \frac{\delta}{(c - c/2)c} \le \frac{\epsilon c^2/2}{c^2/2} = \epsilon.$$
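A similar spot check applies to the choice δ = min{ϵc²/2, c/2}. The final line also makes visible that this δ shrinks like c² as c → 0. Names are hypothetical and the check is randomized, not a proof.

```python
import random

def delta_for_reciprocal(c, eps):
    """delta = min(eps*c**2/2, c/2), the choice made in the example (c > 0)."""
    return min(eps * c * c / 2, c / 2)

def spot_check_reciprocal(c, eps, trials=10_000, seed=1):
    """Largest |1/x - 1/c| over sampled x with 0 < |x - c| < delta (x > 0 since delta <= c/2)."""
    rng = random.Random(seed)
    delta = delta_for_reciprocal(c, eps)
    worst = 0.0
    for _ in range(trials):
        x = c + rng.uniform(-delta, delta)
        if 0 < abs(x - c) < delta:
            worst = max(worst, abs(1 / x - 1 / c))
    return worst

print(spot_check_reciprocal(0.5, 0.1))
print([delta_for_reciprocal(c, 0.1) for c in (1.0, 0.1, 0.01)])
```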
11.1.12 Remark. How does one prove that a limit does not exist? To assert
that limx→c f (x) exists is to assert
(11.5) (∃L ∈ E)(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(x ∈ Nδ (c) \ {c} ⇒ f (x) ∈ Nϵ (L)).
Recall that if p and q are propositions, the negation ¬(p ⇒ q) is equivalent to p ∧ ¬q. Therefore, the negation
¬(x ∈ S ∩ (Nδ(c) \ {c}) ⇒ f(x) ∈ Nϵ(L))
is equivalent to the statement
x ∈ Nδ(c) \ {c} ∧ f(x) ∉ Nϵ(L).
Accordingly, the negation of (11.5), i.e.: “limx→c f(x) does not exist,” is
(11.6) (∀L ∈ E)(∃ϵ > 0)(∀δ > 0)(∃x ∈ S)(x ∈ Nδ(c) \ {c} ∧ f(x) ∉ Nϵ(L)).
When c ∈ R, asserting that limx→c f(x) is not equal to any real number L (classically, one says that the limit diverges) is equivalent to
(11.7) (∀L ∈ R)(∃ϵ > 0)(∀δ > 0)(∃x ∈ S)(0 < |x − c| < δ ∧ |f(x) − L| ≥ ϵ).
(Note that limx→c f(x) could still exist and be ±∞.) We shall put this to use below.
11.1.13 Example. We show that if f : (0, +∞) → R : x ↦ 1/x, then limx→0 f(x) is not equal to any real number L. We need to show that for arbitrary L there is some ϵ > 0 such that given any δ > 0 we can choose x > 0 such that
$$0 < |x - 0| < \delta \ \wedge\ \left|\frac{1}{x} - L\right| \ge \epsilon.$$
Surely, we can choose x > 0 so small that 1/x > L + ϵ and such that x < δ (if L + ϵ > 0, just choose x positive and smaller than min{δ, 1/(L + ϵ)}; otherwise any x ∈ (0, δ) will do). With this x we have
$$\left|\frac{1}{x} - L\right| \ge \frac{1}{x} - L > \epsilon.$$
In fact, for the function f above we have limx→0 f(x) = +∞. The corresponding “ϵ-δ argument” is as follows. Given ϵ > 0 let δ = ϵ. For all x ∈ (0, +∞) we have
0 < |x − 0| < δ ⇒ 0 < x < δ = ϵ ⇒ f(x) = 1/x > 1/ϵ.
(Of course, having proved f(x) → +∞ as x → 0 also shows that f(x) diverges in R, since limits are unique in E, by Proposition 11.1.9.)
11.1.14 Examples. 1. The domain S of the function in Definition 11.1.7 need not be open or perfect. Consider f : N → R : x ↦ 2x. Since no point of N is a limit point of N, limx→c f(x) does not exist at any point of N. On the other hand, for g : {1, 1/2, 1/3, . . .} → R : x ↦ 2x we have limx→0 g(x) = 0.
2. Note that f need not be defined at the point c. For example, if
f : (−∞, 0) ∪ (0, +∞) → R : x ↦ 1,
we have limx→0 f(x) = 1; but f(0) is not defined.
3. For f : (−∞, 0) ∪ (0, +∞) → R : x ↦ 1/x, limx→0 f(x) does not exist.
4. If f is defined at c, the limit may exist, but need not equal f(c). For example, consider f : R → R given by
$$f(x) = \begin{cases} 1, & x \ne 0, \\ 0, & x = 0. \end{cases}$$
Then, limx→0 f(x) = 1, but f(0) = 0.
5. The limit may fail to exist even if the function is bounded, c is a limit point of S, and f(c) is defined. Consider f : R → R given by
$$f(x) = \begin{cases} 1, & x \text{ rational}, \\ 0, & x \text{ irrational}, \end{cases}$$
then limx→c f (x) does not exist for any c ∈ E: For any L ∈ E the neighbor-
hood N ′ = N1/2 (L) of L includes at most one of the points 0, 1, while f takes
both values 0 and 1 in any punctured neighborhood N \ {c} of c, showing
f (x) ↛ L as x → c.
This example also shows that the limit of a function may not exist anywhere
(meaning: not at any limit point c of its domain).
11.5 Exercise. Consider f : R → R given by
$$f(x) = \begin{cases} x, & x \text{ rational}, \\ 0, & x \text{ irrational}. \end{cases}$$
Prove that limx→c f (x) exists for c = 0 and does not exist for any other extended
real number c.
11.1.15 Theorem. Suppose S is a subset of E, c is a limit point of S (in E), and f : S → E. Then
$$\lim_{x\to c} f(x) = L \tag{A}$$
if and only if for every sequence {xn} in S \ {c} with limn→∞ xn = c, we have
$$\lim_{n\to\infty} f(x_n) = L. \tag{B}$$
Proof. (A) implies (B): Assume (A) holds. Let {xn} be a sequence in S \ {c} with limn→∞ xn = c. In order to prove (B) we need to show that f(xn) is eventually in any given neighborhood N′ of L.
By the definition of the limit of a function there exists a neighborhood N of c such that
(∀x ∈ S)(x ∈ N \ {c} ⇒ f(x) ∈ N′).
Since {xn} is eventually in (S \ {c}) ∩ N, the sequence {f(xn)} is eventually in N′, concluding this part of the proof.
(B) implies (A): This part of the proof follows from the existence of a countable basis of neighborhoods shrinking to any point in E. To be more precise, we shall prove the contrapositive statement: the negation of (A) implies the negation of (B). Let {Nn} be a shrinking sequence of neighborhoods of c. Assuming that (A) does not hold, there must exist a neighborhood N′ of L with the following property: Every neighborhood N of c contains at least one point x ∈ S \ {c} such that f(x) ∉ N′. Applying this to the neighborhood Nn we obtain, for each n ∈ N, a point xn ∈ Nn ∩ (S \ {c}) with f(xn) ∉ N′; therefore, {f(xn)} is disjoint from the neighborhood N′ of L and we certainly have f(xn) ↛ L as n → ∞. Since the sequence {Nn} shrinks to c we have limn→∞ xn = c. Hence (B) does not hold. The proof is complete.
Note that Proposition 11.1.9 proved earlier is an immediate corollary of Theorem 11.1.15.
11.1.16 Definition (Uniform and pointwise limits of functions). Let S ⊂ E
and let S∗ be the set of limit points of S in E. Let a function f : S → E be given.
For every c ∈ S∗ , assume that limx→c f (x) exists. Then the function g : S∗ → E
defined by g(c) = limx→c f (x) is called the pointwise cluster limit of f . Thus, g is
uniquely characterized by the property f (x) → g(c) as x → c for every c ∈ S∗ .
The adjective “cluster” is meant to remind the reader that the limit is assumed to exist at cluster (limit) points of S, while “pointwise” means that the property is a universal quantification over points (cluster points of the domain); see equation (11.8) below. We regard the adjective “cluster” as optional, used only for emphasis (i.e., to distinguish this from other types of limits, for example from pointwise or uniform limits of sequences of functions; in a cluster limit the function f is fixed). We caution the reader that the nomenclature “pointwise cluster limit” is not standard.
In other words, g is the pointwise cluster limit of f if
(11.8) (∀c ∈ S∗)(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(x ∈ Nδ(c) \ {c} ⇒ f(x) ∈ Nϵ(g(c))).
If S ⊂ R and S∗ denotes the set of real limit points of S, we say that g : S∗ → R is the uniform cluster limit of f on S∗ if
(11.9) (∀ϵ > 0)(∃δ > 0)(∀c ∈ S∗)(∀x ∈ S)(x ∈ Nδ(c) \ {c} ⇒ f(x) ∈ Nϵ(g(c))).
Note that uniform limits are defined above exclusively in a real (rather than ex-
tended real) context. See the comments in definition 11.1.1 and the footnote
therein.
The difference between (11.8) and (11.9) is subtle, but very important. To wit,
δ = δ(c, ϵ) is allowed to depend on both c and ϵ for the pointwise limit to exist, but
the same δ must satisfy the condition simultaneously (“uniformly”) for all c ∈ S∗
for the limit to be uniform (though δ is still allowed to depend on ϵ in the latter
case). Clearly, if g is the uniform cluster limit of f then g is also the pointwise cluster limit, but not conversely. The exercise below shows that the pointwise limit may exist even when a uniform one does not. In such a case, one says that the (pointwise) limit exists but is not attained uniformly.
11.6 Exercise. Prove that limx→c 1/x² exists for all c ∈ E, with
$$\lim_{x\to c} \frac{1}{x^2} = \frac{1}{c^2}.$$
Show that the limit is attained uniformly on R \ (−a, a) for any a > 0, but not on R \ {0}.
11.7 Exercise. Show that limx→c √x = √c for all c ∈ [0, +∞] (recall that √(+∞) = (+∞)^{1/2} = +∞ by definition). In fact, show that the limit is reached uniformly on [0, +∞).
11.1.1 Properties of Limits

11.1.17 Definition. Let S be a subset of E, let α be any real number and let f, g : S → E. We define the functions α + f, αf, f + g, fg, |f|, −f and f/g taking values in E by
(1) (α + f )(x) = α + f (x);

(2) (αf )(x) = αf (x);
(3) (f + g)(x) = f (x) + g(x);
(4) (f g)(x) = f (x)g(x);
(5) (f /g)(x) = f (x)/g(x);
(6) |f |(x) = |f (x)|;
(7) (−f )(x) = −f (x)

(if f (x) = ±∞, interpret −f (x) and (−1) · f (x) as ∓∞).
whenever the right-hand sides are defined in the arithmetic of E (i.e., x ∈ S such
that the right-hand side is of the form ∞ − ∞, 0 · ∞ or ∞/∞ must be excluded).
11.1.18 Theorem. Let S be a subset of E and let c be a limit point of S. Let f, g : S → E. Suppose
$$\lim_{x\to c} f(x) = L_1 \quad\text{and}\quad \lim_{x\to c} g(x) = L_2.$$
Then, for any real number α,
(1) limx→c (α + f)(x) = α + L1;
(2) limx→c (αf)(x) = αL1;
(3) limx→c (f + g)(x) = L1 + L2;
(4) limx→c (fg)(x) = L1 L2;
(5) limx→c (f/g)(x) = L1/L2;
(5a) limx→c (f/g)(x) = +∞ if L1 > L2 = 0 and g(x) > 0 for all x ≠ c, or else L1 < L2 = 0 and g(x) < 0 for all x ≠ c;
(5b) limx→c (f/g)(x) = −∞ if L1 > L2 = 0 and g(x) < 0 for all x ≠ c, or else L1 < L2 = 0 and g(x) > 0 for all x ≠ c;
(6) limx→c |f|(x) = |L1|;
(7) limx→c (−f)(x) = −L1 (if L1 = ±∞, interpret −L1 as (−1)L1 = ∓∞)
whenever the expressions in α, L1, L2 on the right-hand side are defined in the arithmetic of E.
Proof. The proof is an immediate consequence of Theorems 10.6.1 and 11.1.15.
11.8 Exercise. For a subset S ⊂ R, show that if {fn} and {gn} are sequences of functions S → R, and if f, g are the uniform limits of {fn}, {gn} on S as n → ∞, then f + g is the uniform limit of {fn + gn} on S.
11.1.19 Definition. If F is a family of functions into R, we say that F is uniformly
bounded if there exists a finite N ≥ 0 such that every f ∈ F is a function taking
values in [−N, N ].
If F is a uniformly bounded family of functions, then every function f ∈ F is bounded. However, the converse is not true, as witnessed by the family {fn}, with fn the function x ↦ nx on [0, 1]. Then |fn(x)| ≤ n for all x ∈ [0, 1], so each fn is bounded, yet no uniform bound N exists since fn(1) = n > N for all n > N.
11.9 Exercise. Show that if fn → f uniformly on S, and each fn is bounded, then {fn} is uniformly bounded.
11.10 Exercise. 1. For a subset S ⊂ R, if {fn} and {gn} are uniformly bounded sequences of functions S → R having uniform limits f, g : S → R, show that fg is the uniform limit of {fn gn} on S.
2. Construct sequences {fn}, {gn} with {fn}, {gn} uniformly bounded, fn → f pointwise, gn → g uniformly, but fn gn does not tend to fg uniformly.
11.2 Continuous Functions

11.2.1 Definition. Let S be a subset of E. A function f : S → E is said
to be continuous at c ∈ S if given any neighborhood N ′ of f (c) there exists a
neighborhood N of c such that
(∀x ∈ S)(x ∈ N ⇒ f (x) ∈ N ′ ).
Note that c may be either a limit point or an isolated point of S. Equivalently, for
every neighborhood N ′ of f (c) there exists a neighborhood N of c such that
f∗ (S ∩ N ) ⊂ N ′ .
More classically, the “ϵ-δ” definition of continuity is as follows: f is continuous at c if and only if, for every ϵ > 0 there exists δ > 0 such that
(∀x ∈ S)(x ∈ Nδ(c) ⇒ f(x) ∈ Nϵ(f(c))).

If c and f (c) are both real, we have the equivalent formulation: Given ϵ > 0 there
exists δ > 0 such that
(∀x ∈ S)(|x − c| < δ ⇒ |f (x) − f (c)| < ϵ).
We urge the reader to spell out a formulation, using inequalities, of all the improper
cases of the definition (when either c, f (c), or both, are infinite).
If f is not continuous at c, we say that f is discontinuous at c.
If f is continuous at every point c ∈ S, we say that f is continuous on S (or
just that it is continuous).
If S ⊂ R and f : S → R, we say that f is uniformly continuous on S if, given any ϵ > 0, there is a δ > 0 such that
(∀c ∈ S)(∀x ∈ S)(x ∈ Nδ(c) ⇒ f(x) ∈ Nϵ(f(c))).
Equivalently,
(∀c ∈ S)(∀x ∈ S)(|x − c| < δ ⇒ |f(x) − f(c)| < ϵ).
11.2.2 Remark. In the above definition, if c is a limit point of S, then continuity at c is equivalent to limx→c f(x) = f(c). If c is an isolated point of S, then f is necessarily continuous at c (even though limx→c f(x) is undefined). Indeed, if c is an isolated point, given any ϵ > 0, simply choose δ > 0 small enough that S ∩ Nδ(c) = {c}.
Intuitively, f is continuous at a limit point c of S if the values f(x) for x close to c are close to f(c). In geometric terms, the graph of f neither jumps nor oscillates (vertically) by any sizable amount† as x varies over small neighborhoods of c.
11.2.3 Remark. The distinction between continuity on S and uniform continu-
ity on S is analogous to that between limx→c f (x) simply existing for every c ∈ S
and existing uniformly on S.
A function f : S → R is continuous on S if
(∀c ∈ S)(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(x ∈ Nδ(c) ⇒ f(x) ∈ Nϵ(f(c))).
Note that δ may depend on c and on ϵ. To say that f is uniformly continuous on S means
(∀ϵ > 0)(∃δ > 0)(∀c ∈ S)(∀x ∈ S)(x ∈ Nδ(c) ⇒ f(x) ∈ Nϵ(f(c))).
Note that δ may depend on ϵ, but the same δ must work (uniformly) for all c ∈ S. Clearly, uniform continuity implies continuity.
†
There can be oscillations, provided they dampen down to naught as c is approached.
11.2.4 Theorem (Sequential Characterization of Continuity). Let S ⊂ E and f : S → E. Let c ∈ S. Then f is continuous at c if and only if for every sequence {xn} in S with limn→∞ xn = c we have limn→∞ f(xn) = f(c).
Proof. The proof is a cosmetic variation of that of Theorem 11.1.15.
First assume f is continuous at c. Let {xn} be a sequence in S with limn→∞ xn = c. We now show that f(xn) is eventually in any given neighborhood N′ of f(c) (hence f(xn) → f(c) as n → ∞).
By the definition of continuity there exists a neighborhood N of c such that
(∀x ∈ S)(x ∈ N \ {c} ⇒ f(x) ∈ N′).
Since xn → c, {xn} is eventually in S ∩ N, hence {f(xn)} is eventually in N′ ∪ {f(c)} = N′ (since f(c) ∈ N′). We have shown that limn→∞ f(xn) = f(c).
Conversely we shall show, if f is discontinuous at c, that a sequence {xn } exists
such that xn → c but f (xn ) ̸→ f (c) as n → ∞. This will complete the proof.
Let {Nn} be a shrinking sequence of neighborhoods of c. Assuming that f is discontinuous at c, there must exist a neighborhood N′ of f(c) with the following property: Every neighborhood N of c contains at least one point x ∈ S such that f(x) ∉ N′. Note that necessarily x ≠ c, since f(c) ∈ N′. Applying this to the neighborhoods Nn we obtain, for each n ∈ N, a point xn ∈ Nn ∩ (S \ {c}) with f(xn) ∉ N′; therefore, {f(xn)} is disjoint from the neighborhood N′ of f(c) and we certainly have f(xn) ↛ f(c) as n → ∞. Since the sequence {Nn} shrinks to c we have limn→∞ xn = c, concluding the proof.
11.2.5 Remark. The above theorem states that, under the hypothesis of con-
tinuity, if {xn } converges to L then {f (xn )} converges to f (L). That is,
$$\lim_{n\to\infty} f(x_n) = f(L) = f\left(\lim_{n\to\infty} x_n\right).$$
In this sense, continuity is a property that can be stated informally as “the limit
of the f ’s is the f of the limit.”
11.2.6 Example. Let us agree to write every rational number x as a reduced
fraction x = m/n with n > 0 (i.e., the greatest common divisor of m and n is 1).
Define the function f : R → R by
$$f(x) = \begin{cases} 1/n, & \text{if } x = m/n, \\ 0, & \text{if } x \text{ is irrational}. \end{cases}$$
We prove that f is continuous at every irrational and discontinuous at every rational.
Suppose c is rational. Then, f (c) > 0. Let {xk } denote a sequence of irra-
tionals that converges to c (this sequence can be easily constructed by picking
xk ∈ N1/(k+1)(c) \ Q, where the latter set is nonempty by the density of the irrationals in R). We have f(xk) = 0 for all k, giving
$$\lim_{k\to\infty} f(x_k) = 0 \ne f(c).$$
Hence, f is not continuous at c.

Suppose that c is irrational. Given ϵ > 0 choose an integer N > 1/ϵ. If x1 = m1/n1 and x2 = m2/n2 are distinct rationals, then 0 < |x2 − x1| = |m2 n1 − m1 n2|/(n1 n2) (a fraction not necessarily reduced), so we have |m2 n1 − m1 n2| ≥ 1 and |x2 − x1| ≥ 1/(n1 n2). If n1, n2 ≤ N then |x2 − x1| ≥ 1/(n1 n2) ≥ 1/N². Since N_{1/(2N²)}(c) contains no two points differing by at least 1/N², it contains at most one rational number r = a/b with b ≤ N, and necessarily r ≠ c. Let δ = min{|r − c|, 1/(2N²)} if such r exists, else δ = 1/(2N²). Then Nδ(c) also excludes r (if the latter exists), so it contains no rational numbers m/n with n ≤ N. It follows that 0 ≤ f(x) < 1/N for all x ∈ Nδ(c): this is obvious if x is irrational, while if x = m/n is rational then n > N, so f(x) = 1/n < 1/N. Since 1/N < ϵ, we conclude that f∗(Nδ(c)) ⊂ Nϵ(0), where 0 = f(c); that is, f is continuous at c.
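The construction of δ can be imitated numerically. In this sketch (hypothetical names), a float stands in for the irrational c (a float is itself rational, but with a very large denominator), `thomae` evaluates f on exact rationals, and `thomae_delta` finds the at most one rational with denominator ≤ N inside N_{1/(2N²)}(c) and shrinks δ so as to exclude it.

```python
import math
from fractions import Fraction

def thomae(q):
    """f(m/n) = 1/n for a rational q = m/n in lowest terms with n > 0."""
    return Fraction(1, Fraction(q).denominator)

def small_denominator_rationals(c, N, radius):
    """All rationals m/n with 1 <= n <= N and |m/n - c| < radius."""
    found = set()
    for n in range(1, N + 1):
        for m in range(math.floor((c - radius) * n), math.ceil((c + radius) * n) + 1):
            if abs(m / n - c) < radius:
                found.add(Fraction(m, n))
    return sorted(found)

def thomae_delta(c, N):
    """delta = min{|r - c|, 1/(2N^2)}, with r the exceptional rational (if any)."""
    radius = 1 / (2 * N * N)
    near = small_denominator_rationals(c, N, radius)
    assert len(near) <= 1            # at most one such rational, as argued in the text
    return min([radius] + [abs(float(r) - c) for r in near])

c = math.sqrt(2)                     # floating-point stand-in for an irrational number
for N in (5, 50, 500):
    delta = thomae_delta(c, N)
    assert small_denominator_rationals(c, N, delta) == []   # no m/n with n <= N left
    print(N, delta)
```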
11.2.1 Hölder Continuity

11.2.7 Definition. Let I ⊂ R be an interval with nonempty interior (i.e., I has endpoints a, b with a < b), and let α, γ be real constants. A function f : I → R is called Hölder continuous with exponent α and constant γ if
|f(y) − f(x)| ≤ γ|y − x|^α
for all x and y in I. In this case we say f is (α, γ)-Hölder continuous.
A function f is called Hölder continuous with exponent α if f is Hölder continuous with exponent α and constant γ for some γ. In this case we say that f is α-Hölder continuous. (The exponent α plays a much more prominent role in the theory than the constant γ.) The set of all functions that are α-Hölder continuous on an interval I is denoted by C^{0,α}(I). (The “0” means that the function f is not necessarily differentiable. Since these Notes do not treat the subject of differentiable functions, we beg the reader’s understanding.)
For obvious reasons the constant γ must be nonnegative; in fact, if γ = 0 then f is constant. For less obvious reasons (see the two exercises below), the concept of Hölder continuity is only used in analysis for α ∈ (0, 1]. The case α = 1 is especially important: see the definition of Lipschitz continuity below.
11.11 Exercise. Fix α ≤ 0 and γ > 0. Construct a discontinuous function f : [0, 1] → R satisfying the definition of Hölder continuity with exponent α and constant γ (for x = y and α < 0, we interpret |y − x|^α as +∞; for x = y and α = 0, we interpret |y − x|^α as 1).
(Usually when one talks about Hölder continuity the exponent α is tacitly
assumed to be positive for the above reason.)
11.12 Exercise. If α > 1 and γ ∈ [0, +∞), show that a Hölder-continuous function
f with exponent α and constant γ defined on any interval I ⊂ R is necessarily
constant. (This explains the condition α ≤ 1 in Definition 11.2.7.)
Henceforth all discussions of Hölder continuity will assume that the exponent
α satisfies α ∈ (0, 1].
11.2.8 Proposition. For 0 < α ≤ β ≤ 1 we have C^{0,α}(I) ⊃ C^{0,β}(I) for any bounded interval I ⊂ R. More specifically, if I has length ℓ < +∞ and f is (β, γ)-Hölder, then f is (α, ℓ^{β−α}γ)-Hölder.
Proof. For t ∈ [0, 1] we have t^β ≤ t^α. Applying this with t = |y − x|/ℓ ≤ 1 (for x, y ∈ I) we obtain ℓ^{−β}|y − x|^β ≤ ℓ^{−α}|y − x|^α. Multiplying both sides by ℓ^β γ gives γ|y − x|^β ≤ ℓ^{β−α}γ|y − x|^α, and the implication
f is (β, γ)-Hölder ⇒ f is (α, ℓ^{β−α}γ)-Hölder
follows.
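As a concrete instance, the identity function on I = [0, ℓ] is (1, 1)-Hölder, so the proposition predicts that it is (α, ℓ^{1−α})-Hölder for every α ∈ (0, 1]. The randomized check below (illustrative names, floating point with a tiny relative tolerance, not a proof) tests that prediction for one choice of ℓ and α.

```python
import random

def lowered_constant(ell, alpha, beta, gamma):
    """Constant ell**(beta - alpha) * gamma from Proposition 11.2.8."""
    return ell ** (beta - alpha) * gamma

# The identity x -> x is (1, 1)-Hölder on I = [0, ell].
rng = random.Random(4)
ell, alpha = 10.0, 0.3
K = lowered_constant(ell, alpha, 1.0, 1.0)
for _ in range(10_000):
    x, y = rng.uniform(0, ell), rng.uniform(0, ell)
    assert abs(y - x) <= K * abs(y - x) ** alpha * (1 + 1e-12)
```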
11.2.9 Proposition. Fix α ∈ (0, 1]. Then the function pow_α : [0, +∞) → R : x ↦ x^α is subadditive, namely:
(x + y)^α ≤ x^α + y^α for all x, y ∈ [0, +∞).
Proof. First we prove that u, v > 0 and u + v = 1 imply u^α + v^α ≥ 1. Under these assumptions we have u, v < 1, so exp_u : t ↦ u^t and exp_v : t ↦ v^t are monotonically decreasing. Moreover, since α ≤ 1, we have u^α ≥ u^1 = u and v^α ≥ v^1 = v, whence u^α + v^α ≥ u + v = 1 follows.
To prove the inequality in the statement of the proposition, we note that the
case x = 0 (resp., y = 0) is trivial, so we assume x, y > 0 henceforth. Put
u = x/(x + y), v = y/(x + y). Note that u, v satisfy the hypotheses above. We
obtain
(x + y)^α = (x + y)^α · 1 ≤ (x + y)^α (u^α + v^α) = [(x + y)u]^α + [(x + y)v]^α = x^α + y^α.
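Proposition 11.2.9 is easy to probe numerically. The Python sketch below (the name `subadditivity_gap` is ours) evaluates x^α + y^α − (x + y)^α at random points with α ∈ (0, 1); this illustrates, but of course does not prove, the inequality. The last line shows that subadditivity fails for α = 2 > 1.

```python
import random

def subadditivity_gap(alpha, x, y):
    """x**alpha + y**alpha - (x + y)**alpha; Proposition 11.2.9 says this is >= 0 for 0 < alpha <= 1."""
    return x ** alpha + y ** alpha - (x + y) ** alpha

rng = random.Random(0)           # fixed seed, so the experiment is reproducible
worst = min(
    subadditivity_gap(rng.uniform(0.01, 1.0), rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0))
    for _ in range(10_000)
)
print(worst)                              # never meaningfully below 0 (tiny negatives = rounding)
print(subadditivity_gap(2.0, 1.0, 1.0))   # -2.0: subadditivity fails for alpha = 2 > 1
```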
11.2.10 Proposition. Let 0 < α ≤ 1. The function pow_α : [0, +∞) → R : x ↦ x^α is α-Hölder continuous (with constant γ = 1).
Proof. pow_α is subadditive, by Proposition 11.2.9, so y^α = pow_α((y − x) + x) ≤ pow_α(y − x) + pow_α(x) whenever 0 ≤ x ≤ y. It is also monotonically increasing. Hence,
(y − x)^α = pow_α(y − x) ≥ pow_α(y) − pow_α(x) = y^α − x^α ≥ 0
whenever 0 ≤ x ≤ y (Proposition 9.1.5).
Now consider arbitrary nonnegative x and y. If either of them is zero, there is nothing to prove. Otherwise, we may suppose that x ≤ y (if not, interchange x and y). Then, from the above, |y^α − x^α| = y^α − x^α ≤ (y − x)^α = 1 · |y − x|^α.
11.2.11 Proposition. A Hölder continuous function is uniformly continuous.
Proof. Suppose that I is a real interval and f : I → R is Hölder continuous with exponent α and constant γ. If γ = 0 then f is constant, so we may assume γ > 0. Given ϵ > 0, choose δ = (ϵ/γ)^{1/α}; note that δ does not depend on the point c below. Then, for any c ∈ I, we have
(x ∈ I) ∧ (|x − c| < δ) ⇒ |f(x) − f(c)| ≤ γ|x − c|^α < γ · δ^α = ϵ,
since pow_α is strictly increasing on [0, +∞).
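The choice δ = (ϵ/γ)^{1/α} can be exercised numerically. The Python sketch below (the helper name `holder_delta` is ours) uses f = √·, which is (1/2, 1)-Hölder by Proposition 11.2.10, so that δ = ϵ²; the random trials are only an illustration of the uniform bound, not a proof.

```python
import math
import random

def holder_delta(eps, alpha, gamma):
    """delta = (eps / gamma) ** (1 / alpha), the choice in the proof of Proposition 11.2.11 (gamma > 0)."""
    return (eps / gamma) ** (1.0 / alpha)

# sqrt is (1/2, 1)-Hölder on [0, +inf) by Proposition 11.2.10, so delta = eps**2 here.
eps = 1e-3
delta = holder_delta(eps, 0.5, 1.0)

rng = random.Random(1)
worst = 0.0
for _ in range(10_000):
    c = rng.uniform(0.0, 1.0)
    x = max(0.0, c + rng.uniform(-delta, delta))   # a point of [0, +inf) with |x - c| <= delta
    worst = max(worst, abs(math.sqrt(x) - math.sqrt(c)))
print(delta, worst)   # the same delta works at every sampled centre c
```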

11.2.12 Remarks. • From the last two propositions it follows that
f : [0, +∞) → R : x ↦ x^α
(in particular, x ↦ √x) is uniformly continuous for α ∈ (0, 1].
• If a sequence {x_n} in [0, +∞) converges to L, then {x_n^α} has limit L^α.

11.13 Exercise. Refer to Exercise 9.4. Let f : [0, 1] → [0, 1] be the Cantor function (the graph of f is the devil’s staircase). Let α₀ = log₃(2) = 0.6309 . . . .
1. Prove that f is not α-Hölder continuous for any α > α₀. (Note that f(x) = x^{α₀} for x = 1, 3^{−1}, 3^{−2}, . . . .)
2. For z ∈ [0, 1], let B(z) = sup f^*({z}). Prove that B is a right inverse to the Cantor function (that is, f(B(z)) = z for all z ∈ [0, 1]) satisfying B(f(x)) ≥ x for all x ∈ [0, 1] and f(B(z) + B(w)) ≤ z + w whenever B(z) + B(w) ≤ 1. (It may help to note that equality holds if and only if the strict binary expansions of z, w are “disjoint” in the following sense: for all j ∈ N+ at least one among the j-th digits δ_j of z and δ′_j of w is zero.)
3. Prove that f is subadditive. (Use the monotonicity of f and part 2.)
4. Find γ > 0 such that f(x) ≤ γx^{α₀} for all x ∈ [0, 1].
5. Combine the previous two parts to show that f is α-Hölder continuous for any exponent α ∈ (0, α₀].
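For experimenting with Exercise 11.13, here is a minimal Python sketch of the Cantor function, assuming the usual description via ternary digits (ternary digits 0 and 2 become binary digits 0 and 1, and the computation stops at the first ternary digit 1). It uses `fractions.Fraction` so that digits are extracted exactly; the name `cantor` and the cutoff `depth` are our own, and for points with a nonterminating ternary expansion the output is only a truncation. It checks the identity f(3^{−k}) = 2^{−k} = (3^{−k})^{α₀} quoted in the hint to part 1.

```python
from fractions import Fraction
import math

def cantor(x, depth=64):
    """Cantor function f(x) for 0 <= x <= 1, read off from the ternary digits of x.

    Ternary digits 0/2 become binary digits 0/1; at the first ternary digit 1
    the function is locally constant, so we stop there. After `depth` digits the
    partial sum is returned: exact when x has a terminating ternary expansion of
    at most `depth` digits, an approximation otherwise.
    """
    x = Fraction(x)
    if x == 1:
        return Fraction(1)
    total, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        d = math.floor(x)        # next ternary digit: 0, 1 or 2
        x -= d
        if d == 1:
            return total + weight
        if d == 2:
            total += weight
        weight /= 2
    return total

alpha0 = math.log(2, 3)          # log_3(2) = 0.6309...
for k in range(6):
    p = Fraction(1, 3 ** k)
    print(k, cantor(p), float(p) ** alpha0)   # f(3^-k) = 2^-k = (3^-k)^alpha0
```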
11.2.13 Definition. Lipschitz Continuity Let I denote a real interval. The function f : I → R is called Lipschitz continuous if it is Hölder continuous with exponent 1. That is, f is Lipschitz continuous if there is a constant γ such that
|f (x) − f (y)| ≤ γ |x − y|
for all x and y in I. The constant γ is called the Lipschitz (or Hölder) constant
of f .
11.2.2 Contractions
We introduce a class of functions that plays an important role in many iterative
methods.
11.2.14 Definition. Contractions Let I ⊂ R be an interval. The function
f : I → I is called a contraction if it is Lipschitz continuous with Lipschitz constant
γ < 1. In other words, f is a contraction if there exists a constant γ < 1 such that
|f (x) − f (y)| ≤ γ|x − y|
holds for all x and y in I.

11.2.15 Theorem. Contraction Principle Let I be a closed interval. If f : I → I is a contraction, then
(1) the equation x = f(x) has a unique solution,
(2) for any x_0 in I the sequence {x_k} defined by x_{k+1} = f(x_k) (k = 0, 1, 2, . . .) converges to this solution.
Proof. Let f : I → I be a given contraction with constant γ < 1, with I a closed subset of R. We prove that the equation x = f(x) cannot have more than one solution: Suppose that x∗ and y∗ are any two solutions of x = f(x). Then, we have
|x∗ − y∗| = |f(x∗) − f(y∗)| ≤ γ|x∗ − y∗|.
If x∗ ≠ y∗, dividing by |x∗ − y∗| > 0 gives γ ≥ 1, a contradiction (since γ < 1). We conclude that x∗ = y∗.
Next, we prove that {x_n} is convergent. We have
|x_{n+2} − x_{n+1}| = |f(x_{n+1}) − f(x_n)| ≤ γ|x_{n+1} − x_n|,
showing that the sequence {x_n} is contractive and, therefore, convergent (Theorem 10.7.3). Since {x_n} is a sequence in the closed set I, the limit belongs to I also.
A contraction is continuous (indeed it is Lipschitz, hence uniformly continuous by Proposition 11.2.11). Let x∗ denote the limit of {x_k}. Then, the continuity of f gives
$$\lim_{n\to\infty} x_n = x_* \;\Rightarrow\; \lim_{n\to\infty} f(x_n) = f(x_*).$$
Therefore, since x_{n+1} = f(x_n),
$$x_* = \lim_{n\to\infty} x_n = \lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty} f(x_n) = f(x_*),$$
giving x∗ = f(x∗). That is, x∗ satisfies the equation x = f(x).
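The iteration in part (2) of the Contraction Principle translates directly into code. Below is a minimal Python sketch (the name `fixed_point`, the stopping rule and the tolerance are our own choices). As a test case it assumes that cos maps I = [0, 1] into itself with Lipschitz constant sin(1) < 1; that bound comes from the mean value theorem, which lies outside these Notes, so treat it as an outside fact.

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_{k+1} = f(x_k) from x0 until successive iterates differ by less than tol.

    Under the hypotheses of the Contraction Principle the iterates converge to
    the unique fixed point; for other f this loop may fail to converge.
    """
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter steps")

# cos maps I = [0, 1] into [cos 1, 1], a subset of I, and (mean value theorem)
# |cos x - cos y| <= sin(1) |x - y| on I, so cos is a contraction of I.
x_star = fixed_point(math.cos, 0.0)
print(x_star)   # approximately 0.7390851332
```

Stopping when successive iterates are close is a common heuristic; for a contraction it is justified, because |x_{k+1} − x∗| ≤ γ/(1 − γ) · |x_{k+1} − x_k|.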

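11.2.16 Example. Compute the number
$$c = \sqrt{2 + \sqrt{2 + \sqrt{2 + \cdots}}}.$$
Define f : [0, +∞) → [0, +∞) : x ↦ √(2 + x). Define the sequence {x_n} by x_0 = 0 and x_{n+1} = f(x_n) for n = 0, 1, 2, . . .. For x, y ≥ 0 we have √(2 + x) + √(2 + y) ≥ 2√2, hence
$$|f(x) - f(y)| = \bigl|\sqrt{2+x} - \sqrt{2+y}\bigr| \cdot \frac{\sqrt{2+x} + \sqrt{2+y}}{\sqrt{2+x} + \sqrt{2+y}} = \frac{|(2+x) - (2+y)|}{\sqrt{2+x} + \sqrt{2+y}} \le \frac{|x-y|}{2\sqrt{2}},$$
showing that f is a contraction with γ = 1/(2√2) < 1. Therefore, the equation x = f(x) has a unique solution c, which is the limit of {x_n} and satisfies c = √(2 + c). Squaring gives c² − c − 2 = (c − 2)(c + 1) = 0, and since c ≥ 0 this gives c = 2.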
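The a priori estimate |x_n − c| ≤ γ^n |x_0 − c| follows by applying the contraction inequality n times to |x_n − c| = |f(x_{n−1}) − f(c)|. With x_0 = 0, c = 2 and γ = 1/(2√2), it can be watched in action with the short Python loop below. This is an illustrative sketch; floating-point rounding limits the attainable accuracy to about 1e-15.

```python
import math

gamma = 1 / (2 * math.sqrt(2))    # Lipschitz constant of f(x) = sqrt(2 + x) on [0, +inf)

x = 0.0                           # x_0 = 0
errors, bounds = [], []
for n in range(1, 21):
    x = math.sqrt(2 + x)          # x_n = f(x_{n-1})
    errors.append(abs(x - 2))     # actual error |x_n - c| with c = 2
    bounds.append(gamma ** n * 2) # a priori bound gamma**n * |x_0 - c|
    print(n, x, errors[-1], bounds[-1])
```

In practice the error shrinks by a factor close to 1/4 per step, somewhat faster than the guaranteed factor γ ≈ 0.354.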
11.14 Exercise. Find a function f : R → R such that |f(x) − f(y)| < |x − y| for all x ≠ y and such that the equation x = f(x) has no solutions. Conclude that the conclusion of the contraction theorem fails for γ = 1, even if the defining condition is strengthened to |f(y) − f(x)| < γ|y − x| for all x ≠ y.
11.2.3 Continuity-Preserving Operations

11.2.17 Theorem. Let S be a subset of E and let c ∈ E. Assume f, g : S → E
are continuous at c. If the function α + f (resp., αf , f + g, f g, f /g, |f |, −f ; see
Definition 11.1.17) is defined at c, it is also continuous at c.
In particular, the functions α + f , αf , f + g, f g, f /g, |f |, −f are continuous
on their entire domain of definition, provided f and g are continuous.
(Note that the theorem states absolutely nothing about those points c at which
the functions α + f , …, are undefined.)
Proof. Apply Theorem 11.1.18.
11.2.18 Theorem. Let S, T ⊂ E, let f : S → T be continuous at c ∈ S, and let g : T → R be continuous at f(c) ∈ T. Then the composition g ◦ f : S → R is defined and continuous at c. In particular, the composition of continuous functions is continuous (wherever it is defined).
Proof. Suppose that f and g are as stated in the hypotheses of the theorem.
Given a neighborhood N ′′ of g(f (c)) there exists a neighborhood N ′ of f (c) such
that g∗ (N ′ ) ⊂ N ′′ , by continuity of g at f (c). There exists a neighborhood N of c
such that f∗ (N ) ⊂ N ′ , by continuity of f at c. Then (g ◦ f )∗ (N ) = g∗ (f∗ (N )) ⊂
g∗ (N ′ ) ⊂ N ′′ , so g ◦ f is continuous at c.
11.2.4 Continuity and Open Sets

If f : S → R is continuous on S, the forward image f_*(T) of an open subset T of S is not necessarily an open set. For instance, consider f : R → R : x ↦ 1. No nonempty open subset of R is mapped to an open set, since f_*(T) = {1} for every nonempty T ⊂ R. Is the inverse image f^*(T) of an open set T an open set? The answer is a qualified yes. To illustrate why, consider f : [0, +∞) → R : x ↦ √x. We have f^*((−1, 1)) = [0, 1), which is not open. But we can salvage the situation if we make the notion of open set a relative notion, as shown below.
11.2.19 Definition. Fix a subset S ⊂ E.
For every x ∈ S, a subset T ⊂ S is called a neighborhood of x relative to S, or
a (relative) neighborhood of x in S if some neighborhood N of x (in E) satisfies
N ∩ S ⊂ T . Equivalently, T is a relative neighborhood of x in S if there exists
ϵ > 0 such that Nϵ (x) ∩ S ⊂ T . (In words: T contains all points of S sufficiently
close to x.)
A subset T ⊂ S is open relative to S, or T is a relatively open subset of S, or T
is (relatively) open in S if T is a relative neighborhood (in S) of each of its points.
(We caution the reader that “neighborhood in S” always means “relative neigh-
borhood in S”, and “open in S” always means “relatively open in S”.)
11.2.20 Remark. In the case S = E above, we have that T is relatively open in E if and only if T is open, so no confusion is possible. In fact, no confusion can arise whenever S is any open set, as the exercise below shows.
11.15 Exercise. Prove the following statements.
1. A set T ⊂ E is relatively open in E if and only if T is an open set.
2. More generally, if S is any open subset of E (for instance, if S is an open interval (a, b), or S = R), then a set T ⊂ S is relatively open in S if and only if T is open.
3. If c is any isolated point of a subset S ⊂ E then {c} is a relatively open
subset of S, although {c} is not an open set in E.
11.2.21 Proposition. Let T ⊂ S ⊂ E. The statements below are equivalent:
(1) T is open relative to S.
(2) There exists an open set U such that T = S ∩ U .
Proof. We prove that (1) implies (2). Suppose that (1) holds. For each x ∈ T, choose ϵ(x) > 0 such that S ∩ N_{ϵ(x)}(x) ⊂ T. Let
$$U = \bigcup_{x \in T} N_{\epsilon(x)}(x).$$
Then U is an open set. For all x ∈ T we have x ∈ S (since T ⊂ S) and also x ∈ N_{ϵ(x)}(x), hence x ∈ U. Therefore T ⊂ S ∩ U. Conversely we have, by the generalized de Morgan’s Laws (Proposition 9.4.7),
$$S \cap U = \bigcup_{x \in T} \bigl(S \cap N_{\epsilon(x)}(x)\bigr) \subset T,$$
since S ∩ N_{ϵ(x)}(x) ⊂ T for each x ∈ T. We conclude that T = S ∩ U.
We prove that (2) implies (1). Suppose that (2) holds. Let U ⊂ E be an open set such that T = S ∩ U. Certainly U ⊃ U ∩ S = T, so U is a neighborhood (in E) of every point x ∈ T. Also U ∩ S = T ⊂ T, so T is a relative neighborhood (in S) of each of its points: T is open relative to S.
11.2.22 Proposition. 1. A function f : S → E is continuous at a point c ∈ S if and only if, for every neighborhood N′ of f(c), the inverse image f^*(N′) is a relative neighborhood of c in S.
2. A function f : S → E is continuous if and only if, for every open set U, the inverse image f^*(U) is relatively open in S.
Proof. 1. Let f be continuous at c ∈ S and N′ be any neighborhood of f(c). By definition, there exists a neighborhood N of c (in E) such that f_*(S ∩ N) ⊂ N′. Then S ∩ N ⊂ f^*(N′), showing that f^*(N′) is a relative neighborhood of c in S.
Conversely, assume that f^*(N′) is a relative neighborhood of c in S whenever N′ is a neighborhood of f(c). Then, for any such N′, Definition 11.2.19 provides a neighborhood N of c (in E) such that S ∩ N ⊂ f^*(N′). By definition of inverse image, we have f_*(S ∩ N) ⊂ N′, showing that f is continuous at c.
2. Let f : S → E be continuous. Then f is continuous at c for each c ∈ S. Let U ⊂ E be open. Certainly f^*(U) is a subset of S. Furthermore, for all c ∈ f^*(U) the open set U is a neighborhood of f(c). By continuity of f at c and part 1. above, f^*(U) is a relative neighborhood of c. By definition, f^*(U) is relatively open in S.
Conversely, assume that f^*(U) is relatively open in S for every open set U. For all c ∈ S, we claim that f^*(N′) is a relative neighborhood of c whenever N′ is a neighborhood of f(c). In fact, any such N′ includes an open neighborhood U of f(c), and f^*(N′) includes f^*(U), which is relatively open in S and contains c (thus a relative neighborhood of c), so f^*(N′) is a relative neighborhood of c. This proves the claim. By the claim above and part 1., f is continuous at every point c ∈ S, hence f is continuous by definition.
11.2.5 Continuity and Connectedness

11.2.23 Proposition. If A, B are disjoint and relatively open subsets of S ⊂ E, there exist disjoint open sets U, V such that A = S ∩ U and B = S ∩ V. (Here “relatively open” means “relatively open in S”.)
Proof. We prove the result under the assumption that S ⊂ R, leaving the general case S ⊂ E as an exercise.
For all x ∈ A choose ϵ(x) > 0 such that S ∩ N_{2ϵ(x)}(x) ⊂ A. For all y ∈ B choose ϵ(y) > 0 such that S ∩ N_{2ϵ(y)}(y) ⊂ B. Let
$$U = \bigcup_{x \in A} N_{\epsilon(x)}(x), \qquad V = \bigcup_{y \in B} N_{\epsilon(y)}(y).$$
Clearly U, V are open and U ⊃ A, V ⊃ B, so S ∩ U ⊃ A, S ∩ V ⊃ B. Further, by generalized de Morgan (Proposition 9.4.7),
$$S \cap U = \bigcup_{x \in A} \bigl(S \cap N_{\epsilon(x)}(x)\bigr) \subset A, \qquad S \cap V = \bigcup_{y \in B} \bigl(S \cap N_{\epsilon(y)}(y)\bigr) \subset B.$$
Hence A = S ∩ U, B = S ∩ V.
We show that U, V are disjoint. By choice of ϵ(x), we have |z − x| ≥ 2ϵ(x) for all x ∈ A, z ∈ S \ A. Similarly, |z − y| ≥ 2ϵ(y) for all y ∈ B, z ∈ S \ B. Since A, B are disjoint, we have in particular
|y − x| ≥ 2 max{ϵ(x), ϵ(y)} ≥ ϵ(x) + ϵ(y) for all x ∈ A, y ∈ B.
Now, the triangle inequality gives
|z − x| + |z − y| ≥ |y − x| ≥ ϵ(x) + ϵ(y) for all z ∈ R.
Therefore we have either |z − x| ≥ ϵ(x) or |z − y| ≥ ϵ(y) for all z ∈ R. This shows that N_{ϵ(x)}(x) ∩ N_{ϵ(y)}(y) = ∅ for all x ∈ A, y ∈ B. Finally, generalized de Morgan gives
$$U \cap V = \bigcup_{x \in A,\; y \in B} \bigl(N_{\epsilon(x)}(x) \cap N_{\epsilon(y)}(y)\bigr) = \emptyset.$$
This concludes the proof (under the assumption S ⊂ R).

11.16 Exercise. Complete the proof of Proposition 11.2.23 in the general case
when S ⊂ E.
11.2.24 Proposition. A set S is disconnected if and only if it is the union of two
disjoint, nonempty and relatively open subsets.
A set S is connected if and only if whenever A, B are disjoint and relatively
open subsets of S such that S = A ∪ B, we have A = S or B = S.
Proof. We only prove the first assertion since it implies the second one tautologi-
cally.
Let S be disconnected, so there exist disjoint open sets U, V ⊂ E such that
S ⊂ U ∪ V while U ∩ S and V ∩ S are nonempty. Then A = U ∩ S and B = V ∩ S
are two disjoint, nonempty, relatively open subsets of S whose union is S.
Conversely, let S = A ∪ B be the union of two disjoint, nonempty and relatively open subsets. By Proposition 11.2.23, we have A = S ∩ U, B = S ∩ V for some disjoint open sets U, V ⊂ E. Then S = A ∪ B ⊂ U ∪ V, so U, V cover S, while U ∩ S = A and V ∩ S = B are nonempty, proving that S is disconnected.
11.2.25 Proposition. Let S ⊂ E be connected, and let f : S → E be continuous.
Then f∗ (S) is a connected subset of E.
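Proof. Let U, V be disjoint sets, relatively open in f_*(S), whose union is f_*(S). Let A = f^*(U), B = f^*(V). Since U, V are disjoint and cover f_*(S), the sets A, B are disjoint and cover S. Furthermore, A, B are relatively open subsets of S, by Proposition 11.2.22 (by Proposition 11.2.21 we may write U = f_*(S) ∩ U′ with U′ open, so that f^*(U) = f^*(U′), and similarly for V). Since S is connected, Proposition 11.2.24 shows that one of the subsets A, B of S is improper, say A = S. Then f_*(S) ⊂ U, i.e., U = f_*(S), and therefore f_*(S) is connected, again by Proposition 11.2.24.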
11.2.26 Theorem (Intermediate Value Theorem). Let I = [a, b] be a nonempty
closed interval in E. Let f : I → E be continuous.
• If f (a) ≤ f (b) then, for every y ∈ [f (a), f (b)] we have y = f (x) for some
x ∈ [a, b].
• If f (b) ≤ f (a) then, for every y ∈ [f (b), f (a)] we have y = f (x) for some
x ∈ [a, b].
Proof. By Propositions 10.2.3 and 11.2.25, the interval I is connected, and so is its image J = f_*(I); hence J is convex. Since f(a), f(b) ∈ J we have y ∈ J for every y between f(a) and f(b), so y = f(x) for some x ∈ I.
11.2.27 Corollary. Let I = [a, b] be a nonempty closed interval in E. Let f : I →
E be continuous. If f (a)f (b) < 0 then f (x) = 0 for some x ∈ I.
Proof. We have f (a)f (b) < 0 if and only if, either f (a) < 0 < f (b), or else f (b) <
0 < f (a), so the stated result follows by the Intermediate Value Theorem 11.2.26
with y = 0.
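Corollary 11.2.27 is the principle behind the bisection method: halving [a, b] repeatedly while keeping a sign change pins down a zero. The Python sketch below is one common way to write it (the name `bisect_root` and the tolerance are our choices); it assumes f is continuous with f(a)f(b) < 0, exactly the hypotheses of the corollary.

```python
def bisect_root(f, a, b, tol=1e-12):
    """Approximate a zero of a continuous f on [a, b], assuming f(a) * f(b) < 0.

    Each step keeps a half-interval whose endpoint values still have opposite
    signs, so Corollary 11.2.27 guarantees it still contains a zero.
    """
    fa = f(a)
    if fa * f(b) >= 0:
        raise ValueError("need f(a) * f(b) < 0")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m             # landed exactly on a zero
        if fa * fm < 0:          # sign change on [a, m]
            b = m
        else:                    # sign change on [m, b]
            a, fa = m, fm
    return (a + b) / 2

root = bisect_root(lambda x: x * x - 2, 0.0, 2.0)
print(root)                      # an approximation of sqrt(2)
```

After k steps the zero is located within an interval of length (b − a)/2^k, so the number of steps is predictable in advance.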
11.2.6 Continuity and Compact Sets

Recall that a set K ⊂ E is compact if every open cover U = {Uα | α ∈ A} of K
admits a finite subcover {Uβ | β ∈ B} (indexed by a finite subset B ⊂ A).
In the case of subsets K of E, the Heine-Borel Theorem (10.2.9) states that K
is compact if and only if K is a closed subset of E.
We emphasize that closed subsets C of R may not be closed in E, hence not
compact. In fact compact subsets K ⊂ R are those that are both closed and
bounded in R: these conditions ensure that K is closed in E. If C is closed in R,
but not bounded (say, above), then C is not closed in E since the closure of C in E
includes at least one infinite point (say, +∞).
11.2.28 Proposition. A subset K ⊂ E is compact if and only if every cover of K
by sets relatively open in K admits a finite subcover.
Proof. Assume K is compact. Let S = {Sα | α ∈ A} be any cover of K by
relatively open subsets of K. For every α ∈ A, Proposition 11.2.21 asserts that
Sα = K ∩ Uα for some open set Uα . Clearly U = {Uα | α ∈ A} is a cover of K by
open sets. By compactness, {Uβ | β ∈ B} covers K for some finite B ⊂ A, and
clearly {Sβ | β ∈ B} is a finite subcover of S.
Conversely, assume every cover of K by sets relatively open in K admits a finite
subcover. Given any open cover U = {Uα | α ∈ A} of K, for each α ∈ A we let
Sα = K ∩ Uα be the induced relatively open subset of K. Then S = {Sα | α ∈ A}
is a cover of K by relatively open subsets. By hypothesis, {Sβ | β ∈ B} covers K
for some finite B ⊂ A. Then clearly {Uβ | β ∈ B} is a finite subcover of U.
11.2.29 Proposition. Let S be a compact subset of E. Let f : S → E be continuous. Then f_*(S) (the image of f) is a compact subset of E.
Proof. Let V = {V_α | α ∈ A} be any open cover of f_*(S). For each α ∈ A, let N_α = f^*(V_α) be the inverse image of V_α. By Proposition 11.2.22, {N_α | α ∈ A} is a relatively open cover of S. Using Proposition 11.2.28 one finds a subcover {N_β | β ∈ B} indexed by a finite set B ⊂ A. For each y ∈ f_*(S) we have y = f(x) for some x ∈ S, hence x ∈ N_β for some β ∈ B (since {N_β} covers S) and thus y = f(x) ∈ V_β. Therefore the finite family {V_β | β ∈ B} covers f_*(S). Applying Proposition 11.2.28 once more, we conclude that f_*(S) is compact.
11.2.30 Theorem (Extreme Value Theorem: General version). Let S be a
nonempty compact subset of E and f : S → E be continuous. Then there exist
s, t ∈ S such that
f (s) ≤ f (x) ≤ f (t) for all x ∈ S.
In other words, f takes its maximum and minimum values on S.
Proof. Let S be compact and f : S → E be continuous. Since S is nonempty and compact, Proposition 11.2.29 states that f_*(S) is a compact (and obviously nonempty) subset of E. By Heine-Borel’s Theorem 10.2.9 and Proposition 10.2.8, f_*(S) includes its extrema: inf(f_*(S)) = f(s), sup(f_*(S)) = f(t) for some s, t ∈ S, whence the assertion follows.
11.2.31 Corollary (Extreme Value Theorem: Classical version). If I ⊂ R
is any nonempty finite closed interval, and f : I → R is continuous, then there
exist s, t ∈ I such that
f (s) ≤ f (x) ≤ f (t) for all x ∈ I.
In particular, f is bounded on I.
Proof. The finite closed interval I is compact by the Heine-Borel Theorem 9.6.12,
so the stated inequalities follow from Theorem 11.2.30 while the boundedness state-
ment is obvious since f (s), f (t) ∈ R.
Recall that f : (0, +∞) → R : x ↦ 1/x is continuous, but not uniformly continuous. In fact, given any δ > 0 and any integer n > 1/√δ we have f(1/(n + 1)) − f(1/n) = 1 while |1/(n + 1) − 1/n| < 1/n² < δ, so the definition of uniform continuity fails for ϵ = 1. Indeed, the argument above shows that the restriction of f to any interval (0, a) with a > 0 fails to be uniformly continuous. The proposition below shows that continuous functions with compact domains are necessarily uniformly continuous.
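The computation above involves only rational numbers, so it can be replayed exactly with Python's `fractions.Fraction`. In this sketch (the helper name `non_uniform_witness` is ours) the distance between the two points drops below any prescribed δ, while the values of f = 1/x at those points always differ by exactly 1.

```python
from fractions import Fraction
import math

def non_uniform_witness(delta):
    """Return exact points x, y in (0, +inf) with |x - y| < delta but |1/x - 1/y| = 1.

    Follows the text: take an integer n > 1/sqrt(delta) and compare
    x = 1/(n + 1) with y = 1/n.
    """
    n = math.ceil(1 / math.sqrt(delta)) + 1   # comfortably larger than 1/sqrt(delta)
    return Fraction(1, n + 1), Fraction(1, n)

for delta in (0.1, 1e-3, 1e-6):
    x, y = non_uniform_witness(delta)
    print(delta, float(y - x), 1 / x - 1 / y)   # the gap shrinks, the jump stays 1
```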
11.2.32 Proposition. Let S be a compact subset of R and let f : S → R be continuous. Then f is uniformly continuous on S.
Proof. Let f : S → R be continuous on the compact set S. Let ϵ > 0 be given. For each x ∈ S choose δ(x) > 0 such that |f(y) − f(x)| < ϵ/2 for all y ∈ S ∩ N_{δ(x)}(x). Certainly the family {N_{δ(x)/2}(x) | x ∈ S}, indexed by all points of S, is an open cover of S. By compactness of S, a finite subcover exists; it has the form {N_i | 1 ≤ i ≤ n} where N_i = N_{δ(x_i)/2}(x_i) for suitable x_i in S (for 1 ≤ i ≤ n).
Let δ = ½ min{δ(x_i) | 1 ≤ i ≤ n}. For all x, y ∈ S with |y − x| < δ there exists i such that x ∈ N_i ⊂ N_{δ(x_i)}(x_i). Since
$$|y - x_i| \le |y - x| + |x - x_i| < \delta + \tfrac12 \delta(x_i) \le \tfrac12 \delta(x_i) + \tfrac12 \delta(x_i) = \delta(x_i),$$
we have y ∈ N_{δ(x_i)}(x_i) also. Hence,
$$|f(y) - f(x)| \le |f(y) - f(x_i)| + |f(x) - f(x_i)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$
We conclude that f is uniformly continuous on S.
11.3 Density and compactness in spaces of functions
11.3.1 Topological notions in B(S; R)
Throughout this section we fix an arbitrary nonempty set S. (S need not be a
subset of E.)
Let B(S) = B(S; R) be the set of all bounded functions S → R. For f ∈ B(S),
let ∥f ∥ = sup{|f (x)| | x ∈ S}. Note that ∥f ∥ < +∞ since f is bounded by
assumption.
The triangle inequality immediately implies that the distance d_u(f, g) = ∥g − f∥ ≤ ∥f∥ + ∥g∥ is finite‡ for all f, g ∈ B(S). Using uniform ϵ-neighborhoods
N^u_ϵ(f) = {g ∈ B(S) | ∥g − f∥ < ϵ}
of a given function f ∈ B(S) one defines general (uniform) neighborhoods of f as those sets N ⊂ B(S) that include a uniform ϵ-neighborhood of f for some ϵ > 0, and subsequently uniform open sets are defined as those sets that are neighborhoods of each of their members.
‡
Note that ∥g − f ∥ can very well be finite even if f, g are unbounded. For technical simplicity,
we shall henceforth mostly work in the space B(S) of bounded functions. Just as R is the set of
extended real numbers x with |x| < +∞, B(S) is the set of functions f : S → R with ∥f ∥ < +∞.
Our primary interest will eventually be the space C(S) of continuous real-valued functions on a
compact set S; such functions are necessarily bounded, by the Extreme Value Theorem 11.2.30.
Similarly, a (uniform) limit point of a set A ⊂ B(S) is a function f ∈ B(S) whose every neighborhood N includes points of A other than f. In fact, by applying this definition to each of the shrinking neighborhoods N^u_{1/(n+1)}(f), it is readily shown that every neighborhood of a limit point includes infinitely many points of A.
Uniform closed sets of B(S) are defined, at the outset, as those sets that include
all their limit points. In fact, the same general arguments used in Notes 9 and 10
show that uniform closed sets C are exactly the complements in B(S) of uniform
open sets U , and are also exactly the sets C that contain the uniform limit f of
any uniformly convergent sequence {fn } in B(S).
A uniform compact set K ⊂ B(S) is defined by the property that every (uni-
form) open cover U of K includes a finite subcover V. Equivalently, a uniform
compact set K is a (uniform) closed subset of B(S) possessing the Finite Intersec-
tion Property: If F is a collection of (uniform) closed subsets of K all of whose
finite subcollections G ⊂ F have nonempty intersection, then F has nonempty intersection. (We remark that the a priori requirement that K be uniform closed is superfluous if one works with relatively uniform closed subsets of K throughout.)
Henceforth we shall drop the adjective “uniform” and use it only for emphasis.
There is little room for confusion since we have defined no other topology in B(S).
11.17 Exercise. Fix a nonempty set S and ϵ > 0. Prove that
{f ∈ B(S) | ∥f∥ ≤ ϵ}
is the uniform closure of N^u_ϵ(0) in B(S), where 0 denotes the zero function.

11.3.1 Definition. A family F ⊂ B(S) of functions is equibounded if it is a
bounded subset of B(S) in the sense that there exists a real number B ≥ 0 such
that ∥f ∥ ≤ B for all f ∈ F .
11.3.2 Definition. A sequence {fn } in B(S) is (uniformly) Cauchy if
(∀ϵ > 0)(∃N ∈ N)(∀m, n > N )(∥fn − fm ∥ < ϵ).
11.3.3 Proposition. • Every uniformly convergent sequence in B(S) is uniformly Cauchy.
• Every uniformly Cauchy (hence every uniformly convergent) sequence {f_n} in B(S) is equibounded.
11.3.4 Theorem. A sequence {fn } in B(S) is uniformly Cauchy if and only if it
is uniformly convergent: B(S) is complete with respect to the uniform metric.
Proof. Let {f_n} be uniformly Cauchy. Then {f_n(x)} is Cauchy for each x ∈ S. By the Cauchy Principle in R (Theorem 10.5.5), there exists f(x) ∈ R such that f_n(x) → f(x) as n → ∞, and the function f : x ↦ f(x) is the pointwise limit of f_n as n → ∞.
Using the hypothesis that {f_n} is uniformly Cauchy, we show that f ∈ B(S) and f_n → f uniformly. Let ϵ > 0 be given. Choose N large enough that ∥f_n − f_m∥ < ϵ/3 for all m, n > N. For now, fix x ∈ S and let y = f_{N+1}(x). Since f_n(x) is eventually in the neighborhood (y − ϵ/3, y + ϵ/3) of y, the limit f(x) of this sequence belongs to the closed neighborhood [y − ϵ/3, y + ϵ/3]. Therefore, |f(x) − f_{N+1}(x)| ≤ ϵ/3; in particular |f(x)| ≤ ∥f_{N+1}∥ + ϵ/3, so f is bounded, i.e., f ∈ B(S). For all n > N we have
$$|f_n(x) - f(x)| \le |f_n(x) - f_{N+1}(x)| + |f_{N+1}(x) - f(x)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} = \frac{2}{3}\epsilon.$$
Since this holds for all x ∈ S, we have ∥f_n − f∥ ≤ (2/3)ϵ < ϵ for all n > N, so f_n → f uniformly.
Conversely, let {f_n} be uniformly convergent to f. Given ϵ > 0 choose N large enough that ∥f_n − f∥ < ϵ/2 for all n > N. Then we have, for all m, n > N,
$$\|f_n - f_m\| \le \|f_n - f\| + \|f_m - f\| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$
so {f_n} is uniformly Cauchy.
11.3.2 Uniform convergence and continuity

Throughout this section, S will be a nonempty subset of E. Let
C(S) = C(S; R) = {f ∈ B(S; R) | f is continuous}.
Recall Example 11.1.4. For x ∈ [0, 1], we have
$$\lim_{n\to\infty} x^n = f(x) = \begin{cases} 0, & 0 \le x < 1;\\ 1, & x = 1. \end{cases}$$
Hence f_n → f pointwise, where f_n : x ↦ x^n.
Since {x^n} monotonically decreases to its limit f(x) as n → ∞, we have
0 ≤ f(x) ≤ f_n(x) ≤ 1 for all n ∈ N and x ∈ [0, 1],
so |f_n(x) − f(x)| ≤ 1, and hence ∥f_n − f∥ ≤ 1 for all n ∈ N.
Evidently we have ∥f_0 − f∥ = 1 since f_0(1/2) − f(1/2) = 1 − 0 = 1. Now, fix n > 0 and y ∈ [0, 1), and let x = y^{1/n} ∈ [0, 1). Then |f_n(x) − f(x)| = y. Therefore, ∥f_n − f∥ ≥ y. Since this holds for all y ∈ [0, 1), we conclude ∥f_n − f∥ ≥ 1, hence ∥f_n − f∥ = 1. Summing up, although f_n → f pointwise, we actually have ∥f_n − f∥ = 1 for all n, so f_n does not converge to f in the uniform topology (i.e., f_n does not tend to f uniformly). In the uniform topology, the sequence {f_n} is divergent (a uniform limit would also be a pointwise limit, hence equal to f).
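The witness x = y^{1/n} in the argument above can be evaluated directly. This Python sketch (the function name `witness_gap` is ours) contrasts f_n(1/2) = 2^{−n}, which tends to 0, with the gap |f_n(x) − f(x)| at x = y^{1/n}, which equals y for every n; taking y close to 1 shows why ∥f_n − f∥ cannot shrink.

```python
def witness_gap(n, y):
    """The point x = y**(1/n) in [0, 1) from the text, and the gap |f_n(x) - f(x)| = x**n."""
    x = y ** (1.0 / n)
    return x, x ** n          # f(x) = 0 because x < 1

for n in (1, 10, 100, 1000):
    x, gap = witness_gap(n, 0.999)
    print(n, 0.5 ** n, x, gap)   # f_n(1/2) -> 0, yet the gap stays about 0.999
```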
Note that the sequence {fn } consists of continuous functions, yet the pointwise
limit function f is discontinuous. This lamentable situation cannot happen if the
limit is uniform, as the following proposition shows.
11.3.5 Proposition. Let S be a nonempty subset of E, fix x ∈ S, and let {fn } be
a sequence of functions in B(S) such that fn is continuous at x for all n. If fn → f
uniformly on S, then f is continuous at x.
Proof. Let x ∈ S and ϵ > 0 be given. Choose N large enough that ∥f_n − f∥ < ϵ/3 for all n > N. Using the continuity of f_{N+1} at x, choose a neighborhood W of x such that |f_{N+1}(y) − f_{N+1}(x)| < ϵ/3 for all y ∈ S ∩ W. Then
$$|f(y) - f(x)| \le |f_{N+1}(x) - f(x)| + |f_{N+1}(y) - f_{N+1}(x)| + |f(y) - f_{N+1}(y)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon \quad \text{for all } y \in S \cap W.$$
Therefore, f is continuous at x.
11.3.6 Corollary. Let S be a nonempty subset of E and let {fn } be a sequence
of functions in C(S). If fn → f uniformly on S, then f is continuous.
Proof. Apply Proposition 11.3.5 at each point x ∈ S.
11.3.7 Corollary. For any subset S ⊂ E, C(S) is a closed subset of B(S).
Proof. The limit of any uniformly convergent sequence {fn } in C(S) belongs to C(S),
by Corollary 11.3.6. Therefore, C(S) is a closed set in B(S).
11.3.8 Proposition. Let S be a nonempty subset of R and let {f_n} be a sequence of functions in C(S) such that f_n is uniformly continuous for all n. If f_n → f uniformly on S, then f is uniformly continuous on S.
Proof. Let ϵ > 0 be given. Choose N large enough that ∥fn − f ∥ < ϵ/3 for all
n > N . Using the uniform continuity of fN +1 , choose δ > 0 such that |fN +1 (y) −
fN +1 (x)| < ϵ/3 for all x, y ∈ S with |y − x| < δ. For all such x, y we have
|f (y)−f (x)| < ϵ, by the same chain of inequalities in the proof of Proposition 11.3.5
above. Therefore, f is uniformly continuous on S.
11.3.3 Equicontinuity and Compactness

11.3.9 Proposition. Fix a nonempty set S. A compact subset F ⊂ B(S) is
equibounded.
Proof. Let F ⊂ B(S) be compact. For ϵ > 0, let U_ϵ = {f ∈ B(S) | ∥f∥ < 1/ϵ}. The family U = {U_ϵ | ϵ > 0} is an open cover of B(S), hence of F. Since F is compact, F is covered by a finite subcollection {U_{ϵ_i} | 1 ≤ i ≤ n} for some positive numbers ϵ_1, . . . , ϵ_n. Let ϵ = min{ϵ_i | 1 ≤ i ≤ n}. Clearly, ⋃_{1≤i≤n} U_{ϵ_i} = U_ϵ, hence ∥f∥ < 1/ϵ for all f ∈ F, showing that F is equibounded.
11.3.10 Proposition. Fix a nonempty set S. A compact subset F ⊂ B(S) is
uniform closed.
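Proof. Let F ⊂ B(S) be compact and let f be a uniform limit point of F; we show that f ∈ F. For each ϵ > 0, let U_ϵ = {g ∈ B(S) | ∥g − f∥ > ϵ}; this set is open, being the complement of the closed set {g ∈ B(S) | ∥g − f∥ ≤ ϵ} (cf. Exercise 11.17, after translating by f). By the positivity of the distance d_u, the collection U = {U_ϵ | ϵ > 0} is an open cover of B(S) \ {f}. Suppose, for a contradiction, that f ∉ F. Then U covers F, so by compactness F ⊂ U_{ϵ_1} ∪ · · · ∪ U_{ϵ_n} = U_ϵ for some ϵ_1, . . . , ϵ_n > 0, where ϵ = min_i ϵ_i. Hence N^u_ϵ(f) contains no point of F, contradicting the assumption that f is a limit point of F. Therefore f ∈ F.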
11.3.11 Proposition. Fix a nonempty set S and a compact subset K ⊂ B(S).
Then K is sequentially compact: Every sequence {fn } in K has a uniformly con-
vergent subsequence.
Proof. Let K ⊂ B(S) be compact. Let {fn } be a sequence in K. The method
used in the proof of the Bolzano-Weierstrass Theorem 10.4.11 (using the uniform
topology of B(S) instead of the topology of E) constructs a subsequence {fnk } of
{fn } with uniform limit f .
11.3.12 Remark. A topological space in which every sequence has a convergent subsequence is called sequentially compact. Every compact metric space is sequentially compact (by the argument of Proposition 11.3.11); for general topological spaces this implication can fail. In fact, it can be shown that in the context of metric spaces (i.e., spaces
whose topology is defined by a suitable distance function d(·, ·) satisfying the prop-
erties stated in Definition 11.3.1) sequential compactness also implies compactness.
The proof of this converse is significantly more complicated than that of Proposi-
tion 11.3.11 (the proof of the direct implication given above generalizes immediately
to any space compact with respect to a metric topology).
11.3.13 Example. Let S = [0, 1]. The set B = {f ∈ B(S) | ∥f∥ ≤ 1} is equibounded and closed, by Exercise 11.17. However, B is not sequentially compact. For n ≥ 1, define f_n : [0, 1] → R by
$$f_n(x) = \begin{cases} 2nx, & 0 \le x < \frac{1}{2n},\\[2pt] 2(1 - nx), & \frac{1}{2n} \le x \le \frac{1}{n},\\[2pt] 0, & \frac{1}{n} \le x \le 1. \end{cases}$$
It is straightforward to verify that f_n ∈ B. Moreover, f_n → 0 pointwise while ∥f_n∥ = 1 for all n (the maximum is attained at x = 1/(2n)). Therefore, every subsequence of {f_n} has as pointwise limit the zero function, but does not uniformly converge to this limit (even considering that each function f_n and the limit function 0 are continuous, as the reader can easily verify). This shows that B is a closed and equibounded subset of B(S) that is not sequentially compact, hence not compact.
(Example 11.1.4 provides another instance, but we want to emphasize that the failure to have a uniformly convergent subsequence can happen even if the pointwise limit exists and is a continuous function.)
The example above suggests that the failure of an equibounded sequence {f_n} to have a uniformly convergent subsequence may be explained by a lack of uniformity in continuity, not for each function f_n individually (since each f_n in the example is continuous on a compact interval, hence uniformly continuous), but across the entire family {f_n}. More precisely, given ϵ ∈ (0, 1) we may choose δ = ϵ/(2n) to show that f_n is uniformly continuous (f_n is Lipschitz with constant 2n), but no single choice of δ > 0 works for all n: if n > 1/(2δ), the points 0 and 1/(2n) are less than δ apart while f_n(1/(2n)) − f_n(0) = 1 > ϵ. This motivates the following definition.
11.3.14 Definition. Fix a nonempty set S ⊂ R. A family of functions F ⊂ B(S)
is called equicontinuous if, for all ϵ > 0 there exists δ > 0 such that
|y − x| < δ ⇒ |f (y) − f (x)| < ϵ for all f ∈ F and all x, y ∈ S.
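The family {f_n} of Example 11.3.13 fails Definition 11.3.14 in an explicit way, which the Python sketch below makes concrete (the helper names `tent` and `bad_n` are ours). For each δ > 0 it produces an n such that the points 0 and 1/(2n) are closer than δ while f_n differs by 1 between them, so for ϵ ∈ (0, 1) no single δ serves the whole family.

```python
def tent(n, x):
    """f_n from Example 11.3.13 (n >= 1, 0 <= x <= 1)."""
    if x < 1 / (2 * n):
        return 2 * n * x
    if x <= 1 / n:
        return 2 * (1 - n * x)
    return 0.0

def bad_n(delta):
    """An n with 1/(2n) < delta: x = 0 and y = 1/(2n) are within delta of each
    other, yet f_n(y) - f_n(x) = 1."""
    return int(1 / (2 * delta)) + 2   # safely larger than 1/(2 * delta)

for delta in (1e-1, 1e-3, 1e-6):
    n = bad_n(delta)
    print(delta, n, tent(n, 1 / (2 * n)) - tent(n, 0.0))   # always (about) 1
```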
11.19 Exercise. Prove that a uniformly convergent sequence {f_n} of continuous functions on [0, 1] is equicontinuous, then prove that a sequentially compact subset of C([0, 1]) is equicontinuous. (In fact, every compact subset of C([0, 1]) is equicontinuous.)
11.3.15 Theorem (Arzelà-Ascoli). Fix a nonempty compact set S ⊂ R. An
equicontinuous and equibounded uniformly closed set of functions F ⊂ C(S) is
sequentially compact in the uniform topology of C(S).
Proof. Let F be an equicontinuous and equibounded closed subset of C(S). Choose u ≥ 0 such that ∥f∥ ≤ u for all f ∈ F. For each ϵ > 0, choose δ(ϵ) > 0 satisfying:
For all y, z ∈ S and all f ∈ F, |z − y| < δ(ϵ) ⇒ |f(z) − f(y)| < ϵ.
Let {f_n} be a sequence in F. Let D be a countable dense subset of S and let {x_m} be a sequence with range D (see Exercise 9.7).§
We will construct a function ℓ : N × N → N with the following properties:
1. The function n ↦ ℓ(0, n) is strictly increasing;
2. for each m ∈ N, {ℓ(m + 1, n)}_n is a subsequence of {ℓ(m, n)}_n;
3. for each m ∈ N, {f_{ℓ(m,n)}(x_m)}_n is a convergent sequence.
§ In fact, Arzelà-Ascoli’s Theorem is essentially trivial if S is a (nonempty) finite subset of R. When S is infinite, D must be countable, so there exists a bijection N ∼ D : n ↦ x_n.
We construct the functions ℓ(m, ·) one at a time, by recursion on m. For m = 0, we let n ↦ ℓ(0, n) be any subsequence of the identity sequence {n | n ∈ N} such that $\{f_{\ell(0,n)}(x_0)\}_n$ converges to some number $g(x_0)$. Since $\{f_n(x_0)\}$ takes values in the compact interval [−u, u], the Bolzano-Weierstrass Theorem 10.4.11 ensures the existence of such a subsequence. Certainly, ℓ(0, ·) is strictly increasing, being a subsequence of the strictly increasing identity sequence, so 1. holds.
Once ℓ(m, ·) has been constructed, we let ℓ(m + 1, ·) be any subsequence of ℓ(m, ·) such that $f_{\ell(m+1,n)}(x_{m+1})$ converges to some number $g(x_{m+1})$ as n → ∞. Again, this is possible thanks to Bolzano-Weierstrass’s theorem. By construction, properties 2. and 3. hold for m and m + 1, respectively. The construction of ℓ(m, ·) for all m ∈ N is obtained by an application of the recursion theorem. (If $x_{m'} = x_m$ for some m < m′, the two limits agree, because $\{\ell(m', n)\}_n$ is a subsequence of $\{\ell(m, n)\}_n$; thus g : D → R is well defined.)
Let $n_k = \ell(k, k)$ for each k ∈ N. A simple inductive argument shows that, for each m ∈ N, the tail $\{n_k \mid k \ge m\}$ of $\{n_k\}$ is a subsequence of $\{\ell(m, k) \mid k \in \mathbb{N}\}$. Since $f_{\ell(m,n)}(x_m) \to g(x_m)$ as n → ∞, we also have $f_{n_k}(x_m) \to g(x_m)$ as k → ∞, for every m ∈ N.
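The inductive claim about the diagonal indices can be checked on a toy example. The sketch below is purely illustrative: it assumes the hypothetical nested index family ℓ(m, n) = 2^m (n + 1), for which row m + 1 is a subsequence of row m because ℓ(m + 1, n) = ℓ(m, 2n + 1). It forms the diagonal $n_k = \ell(k, k)$ and verifies, for finitely many m, that the tail $n_m, n_{m+1}, \dots$ occurs in row m at strictly increasing positions.

```python
def ell(m, n):
    """Hypothetical nested index family: ell(m + 1, n) == ell(m, 2n + 1)."""
    return 2**m * (n + 1)


def diagonal(K):
    """First K diagonal indices n_k = ell(k, k)."""
    return [ell(k, k) for k in range(K)]


def tail_is_subsequence(m, K):
    """Check that n_m, ..., n_{K-1} appear in row m at strictly increasing positions."""
    positions = []
    for k in range(m, K):
        v = ell(k, k)
        # Row m lists the positive multiples of 2**m in increasing order,
        # so v lies in row m iff 2**m divides v; its position is v // 2**m - 1.
        if v % 2**m != 0:
            return False
        p = v // 2**m - 1
        assert ell(m, p) == v
        positions.append(p)
    return all(p < q for p, q in zip(positions, positions[1:]))


if __name__ == "__main__":
    print(diagonal(5))
    print(all(tail_is_subsequence(m, 12) for m in range(12)))
```

Running it prints `[1, 4, 12, 32, 80]` followed by `True`.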
We show that g : D → R is uniformly continuous. More precisely, we show that y, z ∈ D and |z − y| < δ(ϵ) imply |g(z) − g(y)| ≤ ϵ.
Fix y, z ∈ D with |z − y| < δ(ϵ). Then $|f_n(z) - f_n(y)| < \epsilon$ for all n. For every k ∈ N,
$$|g(z) - g(y)| \le |g(z) - f_{n_k}(z)| + |f_{n_k}(z) - f_{n_k}(y)| + |f_{n_k}(y) - g(y)| < |g(z) - f_{n_k}(z)| + \epsilon + |f_{n_k}(y) - g(y)|.$$
Since $f_{n_k}(y) \to g(y)$ and $f_{n_k}(z) \to g(z)$ as k → ∞, the first and last terms on the right tend to 0, and therefore |g(z) − g(y)| ≤ ϵ.
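Next, we extend g to a function defined on all of S. Fix x ∈ S. For each ϵ > 0, let $D_\epsilon(x) = D \cap N_\epsilon(x)$ and let $\bar{E}_\epsilon(x) = \left(g_*(D_\epsilon(x))\right)^- \subset [-u, u]$ (each value $g(x_m)$ lies in [−u, u], being a limit of numbers in [−u, u]). Since x ∈ S = D^−, the set $D_\epsilon(x)$ is nonempty, and so is the closed set $\bar{E}_\epsilon(x)$. Any finite intersection of sets $\{\bar{E}_{\epsilon_i}(x) \mid 1 \le i \le n\}$ includes $\bar{E}_\epsilon(x)$ with $\epsilon = \min_i \epsilon_i$, so it is nonempty. By the FIP characterization of compactness of [−u, u], the intersection $\bar{E}(x) := \bigcap_{\epsilon > 0} \bar{E}_\epsilon(x)$ is nonempty.
We claim that $\bar{E}(x)$ contains a single element $L_x$. By the uniform continuity of g on D proved above, we have the implication:
$$\text{For all } \epsilon > 0, \quad \lambda_1, \lambda_2 \in g_*(D_{\delta(\epsilon)/2}(x)) \implies |\lambda_2 - \lambda_1| \le \epsilon,$$
because $y, z \in D_{\delta(\epsilon)/2}(x)$ implies |z − y| < δ(ϵ). Passing to closures, we obtain:
$$\text{For all } \epsilon > 0, \quad \lambda_1, \lambda_2 \in \bar{E}_{\delta(\epsilon)/2}(x) \implies |\lambda_2 - \lambda_1| \le \epsilon.$$
Let $L_x \in \bar{E}(x)$. If L′ belongs to all the sets $\bar{E}_{\delta(\epsilon)/2}(x)$, then $|L' - L_x| \le \epsilon$ for every ϵ > 0, so $L' = L_x$. A fortiori, no element $L' \ne L_x$ belongs to $\bar{E}(x)$, proving the uniqueness of $L_x$.
If x ∈ D, we certainly have $g(x) \in \bar{E}(x)$, hence $g(x) = L_x$. Therefore, we may extend g to a function defined on all of S by letting $g(x) = L_x$.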
Further, we claim that this extension of g satisfies the uniform continuity condition:
(1) For all ϵ > 0 and all y, z ∈ S: |y − z| < δ(ϵ) ⇒ |g(z) − g(y)| ≤ ϵ.
Given arbitrary y, z ∈ S with |y − z| < δ(ϵ), let δ′ = δ(ϵ) − |y − z| > 0. By the triangle inequality, we have |b − a| < δ(ϵ) for all $a \in D_{\delta'/2}(y)$, $b \in D_{\delta'/2}(z)$. By the uniform continuity of g on D we have |g(b) − g(a)| ≤ ϵ. Therefore, $|\lambda_2 - \lambda_1| \le \epsilon$ for all $\lambda_1 \in \bar{E}_{\delta'/2}(y)$, $\lambda_2 \in \bar{E}_{\delta'/2}(z)$. In particular, since $g(y) = L_y \in \bar{E}_{\delta'/2}(y)$ and $g(z) = L_z \in \bar{E}_{\delta'/2}(z)$, we get |g(z) − g(y)| ≤ ϵ, proving (1).
Finally, we show that $f_{n_k} \to g$ as k → ∞, uniformly on S. Fix ϵ > 0. Put $\bar\delta = \delta(\epsilon/3)$. Choose a finite subset $\{x_{m_1}, \dots, x_{m_l}\}$ of D such that $S \subset \bigcup_{i=1}^{l} N_{\bar\delta}(x_{m_i})$ (since D is dense in S, the neighborhoods $\{N_{\bar\delta}(x) \mid x \in D\}$ cover S; since S is compact, finitely many of them still cover S). Using the convergence $f_{n_k}(x_{m_i}) \to g(x_{m_i})$ as k → ∞, choose K large enough that $|f_{n_k}(x_{m_i}) - g(x_{m_i})| < \epsilon/3$ for all k > K and 1 ≤ i ≤ l (such a uniform K exists since the set $\{x_{m_1}, \dots, x_{m_l}\}$ is finite). For every y ∈ S, choose i such that $|y - x_{m_i}| < \bar\delta$. Using the equicontinuity of F and the uniform continuity of g in (1) above, we have, for all k > K,
$$|f_{n_k}(y) - g(y)| \le |f_{n_k}(y) - f_{n_k}(x_{m_i})| + |f_{n_k}(x_{m_i}) - g(x_{m_i})| + |g(x_{m_i}) - g(y)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.$$
Since g is the uniform limit of the sequence $\{f_{n_k}\}$ in F and F is closed, we have g ∈ F. Thus every sequence in F has a subsequence converging uniformly to an element of F; therefore, F is sequentially compact.
11.3.16 Remarks. • Every sequentially compact metric space is compact (a general result we do not prove in these Notes). Therefore, Arzelà-Ascoli’s Theorem may be formulated to state that an equicontinuous and equibounded uniformly closed subset of C(S) (for compact S ⊂ R) is compact in the uniform topology.
• An equivalent formulation of Theorem 11.3.15 states that if F is equicontinuous and equibounded, then it is relatively sequentially compact in the sense that every sequence $\{f_n\}$ in F has a subsequential uniform limit g in C(S) (though not necessarily in F): Note that the proof only used the hypotheses of equicontinuity and equiboundedness to prove the existence of the subsequential uniform limit g.
• We emphasize that Arzelà-Ascoli’s Theorem gives sufficient conditions, not necessary ones, for the sequential compactness of F. Using the fact (not proved in these Notes) that a metric space F is sequentially compact if and only if it is compact, we see that the hypotheses that F is uniformly closed and equibounded are necessary, by Propositions 11.3.9 and 11.3.10. The extra hypothesis of equicontinuity is sufficient, but not necessary, by Exercise 11.19.
11.3.4 Uniform approximation of continuous functions by polynomials
11.3.17 Example. Consider the function f : x ↦ 1/(1 − x) on [0, 1). Clearly, f is continuous; it is also unbounded since f(x) → +∞ as x → 1. On the other hand, we have
$$f(x) = \sum_{k=0}^{\infty} x^k = 1 + x + x^2 + \cdots$$
for all x ∈ [0, 1), by the Geometric Formula. Letting
$$f_n(x) = \frac{1 - x^n}{1 - x} = \sum_{k=0}^{n-1} x^k = 1 + x + \cdots + x^{n-1} \quad \text{on } [0, 1),$$
we have $f_n \to f$ pointwise on [0, 1). However, the convergence is not uniform: Each $f_n$, being a polynomial, is continuous on the compact interval [0, 1], hence bounded on [0, 1] by Theorem 11.2.31 (and, a fortiori, on [0, 1)). Since a uniform limit of bounded functions is bounded while f is unbounded, the convergence $f_n \to f$ is not uniform.
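The failure of uniform convergence can also be read off from the exact error: subtracting the two formulas above gives $f(x) - f_n(x) = x^n/(1 - x)$, which is unbounded on [0, 1) for every fixed n. The following sketch (illustrative only; the helper names are our own) checks this identity numerically at a few points and shows the error growing without bound as x → 1 for fixed n = 5.

```python
def f(x):
    return 1.0 / (1.0 - x)


def f_partial(n, x):
    """Partial sum 1 + x + ... + x**(n-1) of the geometric series."""
    return sum(x**k for k in range(n))


def error(n, x):
    """Closed form of f(x) - f_n(x) on [0, 1)."""
    return x**n / (1.0 - x)


if __name__ == "__main__":
    n = 5
    for x in (0.5, 0.9, 0.99, 0.999):
        # The two columns agree up to rounding, and the error blows up as x -> 1.
        print(x, f(x) - f_partial(n, x), error(n, x))
```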
11.3.18 Example. Consider the function g : x ↦ 1/(x² + 1) on R. Clearly, g is continuous and bounded (in fact, ∥g∥ = 1). Moreover, g is even and monotonically decreasing on [0, +∞) with $\lim_{x \to +\infty} g(x) = 0$. Applying the Intermediate Value Theorem 11.2.26 on all the intervals [0, b] for b > 0 we conclude that the range of g is (0, 1]. (In fact, g admits a continuous extension to a surjective function E → [0, 1] by setting g(±∞) = 0.)
We claim that $\|g - h\| \ge \frac{1}{2}$ for every polynomial h : R → R. If h has degree at least 1 we have |h(x)| → +∞ as x → +∞, so there exists $x_0$ such that $|h(x_0)| \ge \frac{3}{2}$, hence $\frac{1}{2} \le |h(x_0)| - |g(x_0)| \le |h(x_0) - g(x_0)| \le \|h - g\|$. If h is a constant $c \le \frac{1}{2}$, then $\frac{1}{2} \le 1 - c = g(0) - h(0) \le \|h - g\|$. If h is a constant $c > \frac{1}{2}$, then $g(x_0) = \min\{1, c - \frac{1}{2}\}$ for some $x_0$ (since $c - \frac{1}{2} > 0$ and the range of g is (0, 1]), so $\frac{1}{2} \le h(x_0) - g(x_0) \le \|h - g\|$. This completes the proof that $\|g - h\| \ge \frac{1}{2}$ for all polynomials h. We conclude that g cannot be uniformly approximated by polynomials.
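The bound $\|g - h\| \ge \frac12$ can be spot-checked numerically. The sketch below is an illustration with hypothetical test polynomials, not a proof: it evaluates max |g − h| on a grid over [−100, 100]. Because a supremum over a subset of R can only be smaller, each grid value is a lower bound for ∥g − h∥, and for every polynomial tried it is at least 1/2. Polynomials are given by coefficient lists, lowest degree first, and evaluated by Horner’s rule.

```python
def g(x):
    return 1.0 / (x * x + 1.0)


def horner(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... given coeffs = [c0, c1, c2, ...]."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def grid_sup_distance(coeffs, R=100.0, steps=20000):
    """Lower bound for ||g - h|| on R: max |g(x) - h(x)| on a grid of [-R, R]."""
    best = 0.0
    for i in range(steps + 1):
        x = -R + 2.0 * R * i / steps  # includes x = 0 when steps is even
        best = max(best, abs(g(x) - horner(coeffs, x)))
    return best


if __name__ == "__main__":
    # Hypothetical test polynomials: constants 0.3, 0.5, 0.8, a line and a parabola.
    tests = [[0.3], [0.5], [0.8], [0.5, 0.01], [1.0, 0.0, -0.001]]
    for coeffs in tests:
        print(coeffs, round(grid_sup_distance(coeffs), 4))
```

The constant 1/2 is the borderline case: its distance 1/2 is attained at x = 0, where g(0) = 1.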
In the first example above, the continuous function f, being unbounded, cannot be approximated uniformly by polynomials (although it is a pointwise limit of polynomials). In the second one, the continuous and bounded nonconstant function g cannot be uniformly approximated by polynomials because the approximating polynomials would necessarily need to stay bounded at ±∞, hence be constant, and the nonconstant function g cannot be uniformly approximated by constant functions.
In the first example, the unbounded behavior of f(x) as x approached the limit point 1 clashed with the bounded behavior of any approximating polynomial, while in the second example, the bounded behavior of g at the limit points ±∞ clashed with the unbounded behavior at ±∞ of polynomials of positive degree.
The situation near ±∞ cannot be remedied: Polynomials are simply not a sufficiently rich class of functions to uniformly approximate most functions in neighborhoods of ±∞. On a bounded domain, by contrast, every polynomial is bounded, so only bounded functions can be uniformly approximated by polynomials there.
For essentially the same reason, polynomials cannot be used to approximate arbitrary functions near a limit point of their domain, unless the functions themselves are controlled near that limit point. This can be accomplished by requiring that the domain contain all its limit points (i.e., be closed) and that the function to be approximated be continuous everywhere on it. After all, any uniform limit of continuous functions (such as polynomials) is continuous.
At this point, it seems plausible to conjecture that any continuous, real-valued
function on a compact set S ⊂ R can be uniformly approximated by polynomials.
This result is true; in fact, it is a particular case of the Stone-Weierstrass theorem.¶
We will settle for a modern proof, using Bernstein polynomials, of Weierstrass’s
classical result in which S is a compact interval [a, b].
11.3.19 Theorem (Weierstrass’s Approximation Theorem). For real num-
bers a < b, let P(I) be the set of all polynomial functions on the interval I = [a, b].
Then P(I) is dense in C(I) in the uniform topology.
Equivalently: For every f ∈ C(I) and every ϵ > 0 there exists a polynomial
p ∈ P(I) such that |p(x) − f (x)| < ϵ for all x ∈ I.
We will prove Theorem 11.3.19 using properties of Bernstein polynomials, which we proceed to define.
¶ More precisely, it is a special case of Stone’s generalization of Weierstrass’s Theorem ***
below. In Stone’s theorem, S is a compact topological space (not necessarily a subset of E) and
the approximation is done via functions in a suitable class (algebra) of continuous functions on S,
not necessarily polynomials.
11.3.20 Definition. Fix natural numbers n, k with 0 ≤ k ≤ n. The kth Bernstein polynomial of degree n is the function $b_k^n : \mathbb{R} \to \mathbb{R}$ defined by
$$b_k^n(t) = \binom{n}{k} t^k (1 - t)^{n-k}.$$
(Above, $t^0 = 1$ for all t ∈ R.)
The functions $b_k^n$ will be collectively called Bernstein polynomials.
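For experimentation, the definition translates directly into code. This is a minimal sketch using Python’s `math.comb` for the binomial coefficient; the function name `bernstein` is our own. Python evaluates `0.0**0` as `1.0`, matching the convention $t^0 = 1$ above.

```python
from math import comb


def bernstein(n, k, t):
    """k-th Bernstein polynomial of degree n: C(n, k) * t**k * (1 - t)**(n - k)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return comb(n, k) * t**k * (1 - t) ** (n - k)


if __name__ == "__main__":
    # Degree-3 Bernstein polynomials at t = 1/2: C(3, k) / 8.
    print([bernstein(3, k, 0.5) for k in range(4)])
    # At t = 0 only b_0^3 is nonzero (this relies on 0.0**0 == 1.0).
    print([bernstein(3, k, 0.0) for k in range(4)])
```

The first line prints `[0.125, 0.375, 0.375, 0.125]` and the second `[1.0, 0.0, 0.0, 0.0]`.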
11.3.21 Proposition. The Bernstein polynomials satisfy the following properties:
1. $b_k^n(t) \ge 0$ for all 0 ≤ k ≤ n and all t ∈ [0, 1].
2. $\sum_{k=0}^{n} b_k^n(t) = 1$ for all n ∈ N and all t ∈ R.
3. $(k+1)(1-t)\, b_{k+1}^n(t) = (n-k)\, t\, b_k^n(t)$ for 0 ≤ k < n and t ∈ [0, 1].
4. $b_{n-k}^n(1-t) = b_k^n(t)$.
Proof. 1. Obvious, since t ≥ 0 and 1 − t ≥ 0 when t ∈ [0, 1].
2. $\sum_{k=0}^{n} b_k^n(t) = \bigl(t + (1-t)\bigr)^n = 1^n = 1$, by the Binomial Theorem.
3. This follows immediately from the identity $(k+1)\binom{n}{k+1} = (n-k)\binom{n}{k}$ and the definition of Bernstein polynomials.
4. This follows from the identity $\binom{n}{n-k} = \binom{n}{k}$.
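The four properties are easy to spot-check numerically. The sketch below is a floating-point sanity check, not a proof; the helper names `bernstein` and `check_properties` are our own. It tests each part for all k at a grid of points t ∈ [0, 1] and reports the largest violation of each property, which should be zero up to rounding.

```python
from math import comb


def bernstein(n, k, t):
    """k-th Bernstein polynomial of degree n."""
    return comb(n, k) * t**k * (1 - t) ** (n - k)


def check_properties(n, points=101):
    """Return the worst violation of each part of Proposition 11.3.21 on a grid of [0, 1]."""
    ts = [i / (points - 1) for i in range(points)]
    worst = {"nonneg": 0.0, "sum": 0.0, "ratio": 0.0, "symmetry": 0.0}
    for t in ts:
        b = [bernstein(n, k, t) for k in range(n + 1)]
        worst["nonneg"] = max(worst["nonneg"], -min(b))          # part 1
        worst["sum"] = max(worst["sum"], abs(sum(b) - 1.0))      # part 2
        for k in range(n):                                        # part 3
            lhs = (k + 1) * (1 - t) * b[k + 1]
            rhs = (n - k) * t * b[k]
            worst["ratio"] = max(worst["ratio"], abs(lhs - rhs))
        for k in range(n + 1):                                    # part 4
            diff = bernstein(n, n - k, 1 - t) - b[k]
            worst["symmetry"] = max(worst["symmetry"], abs(diff))
    return worst


if __name__ == "__main__":
    print(check_properties(12))
```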
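11.3.22 Proposition. Fix n, k with 0 ≤ k ≤ n.
1. For all m with k + 1 ≤ m ≤ n and all $t \in \left[\frac{m}{n+1}, 1\right]$, we have
$$\sum_{j=k+1}^{m} b_j^n(t) \ge (m-k)\, b_k^n(t).$$
2. For all $t \in \left(\frac{k}{n+1}, 1\right]$, we have
$$\sum_{j=0}^{k} b_j^n(t) \le \frac{b_k^n(t)}{1 - \rho_k^n(t)}, \quad \text{where } \rho_k^n(t) = \frac{k(1-t)}{(n-k+1)\,t}.$$
Proof. 1. If t = 1, then $b_k^n(1) = 0$ (because k < m ≤ n) and there is nothing to prove. If t < 1, then for k ≤ j < m we have (n + 1)t ≥ m ≥ j + 1, i.e., (n − j)t ≥ (j + 1)(1 − t), so part 3. of Proposition 11.3.21 gives $b_{j+1}^n(t) \ge b_j^n(t)$. By induction on j, $b_j^n(t) \ge b_k^n(t)$ for k < j ≤ m. The stated inequality follows upon adding these m − k inequalities together.
2. For fixed n the sequence j ↦ j/(n − j + 1) is strictly increasing for 0 ≤ j ≤ n. Therefore, the nonnegative sequence $j \mapsto r_j = \rho_j^n(t)$, 0 ≤ j ≤ k, increases towards $r_k < 1$ (since $t > \frac{k}{n+1}$). Part 3. of Proposition 11.3.21 gives $b_{j-1}^n(t) = r_j\, b_j^n(t)$ for 1 ≤ j ≤ k, so by induction $b_j^n(t) = r_{j+1} r_{j+2} \cdots r_k\, b_k^n(t) \le r_k^{\,k-j}\, b_k^n(t)$ for 0 ≤ j ≤ k. By the Geometric Formula,
$$\sum_{j=0}^{k} b_j^n(t) \le \frac{1 - r_k^{k+1}}{1 - r_k} \cdot b_k^n(t) \le \frac{b_k^n(t)}{1 - r_k}.$$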
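Both inequalities can be spot-checked in floating point. The sketch below is our own illustration, with arbitrarily chosen hypothetical parameters (n = 30, k = 8, m = 16, and a second case). It samples t on the range required by each part and records the smallest slack (right side minus left side, arranged so it should be nonnegative), which must be at least 0 up to rounding.

```python
from math import comb


def b(n, k, t):
    """Bernstein polynomial b_k^n(t)."""
    return comb(n, k) * t**k * (1 - t) ** (n - k)


def rho(n, k, t):
    """rho_k^n(t) = k (1 - t) / ((n - k + 1) t), defined for t > 0."""
    return k * (1 - t) / ((n - k + 1) * t)


def min_slack_part1(n, k, m, samples=200):
    """Min of sum_{j=k+1}^{m} b_j - (m - k) b_k over t in [m/(n+1), 1]."""
    lo = m / (n + 1)
    slack = float("inf")
    for i in range(samples + 1):
        t = lo + (1 - lo) * i / samples
        lhs = sum(b(n, j, t) for j in range(k + 1, m + 1))
        slack = min(slack, lhs - (m - k) * b(n, k, t))
    return slack


def min_slack_part2(n, k, samples=200):
    """Min of b_k / (1 - rho_k) - sum_{j=0}^{k} b_j over t in (k/(n+1), 1]."""
    lo = k / (n + 1)
    slack = float("inf")
    for i in range(1, samples + 1):  # i >= 1 keeps t strictly above k/(n+1)
        t = lo + (1 - lo) * i / samples
        bound = b(n, k, t) / (1 - rho(n, k, t))
        slack = min(slack, bound - sum(b(n, j, t) for j in range(k + 1)))
    return slack


if __name__ == "__main__":
    print(min_slack_part1(30, 8, 16), min_slack_part2(30, 8))
```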
For integers c, d such that 0 ≤ c ≤ d ≤ n, define
$$B^n_{[c,d]}(t) := \sum_{j=c}^{d} b_j^n(t).$$
Clearly, $0 \le B^n_{[c,d]}(t) \le 1$ for t ∈ [0, 1], by parts 1. and 2. of Proposition 11.3.21.
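11.3.23 Proposition. For all δ ∈ (0, 1) and all ϵ > 0 there exists n ∈ N such that
$$B^n_{[0,k]}(t) < \epsilon \quad \text{for all } k \le n \text{ and all } t \in \left[\frac{k}{n} + \delta, 1\right].$$
Proof. Without loss of generality, we assume ϵ ≤ 1.
It follows immediately from the hypothesis δ > 0 that the nonnegative function
$$\eta_\delta : t \mapsto \frac{1-t}{t} \cdot \frac{t-\delta}{1-(t-\delta)},$$
defined on the interval [δ, 1], takes values strictly less than 1. Since $\eta_\delta$ is continuous, the Extreme Value Theorem 11.2.31 implies that there exists $\gamma = \gamma_\delta > 0$ such that $\eta_\delta(t) \le 1 - \gamma$ for all t ∈ [δ, 1]. (Since $\eta_\delta \ge 0$, necessarily γ ≤ 1.)
Let n be any integer greater than $(1 + \epsilon^{-1}\gamma^{-1})\,\delta^{-1}$. We will prove the inequality $B^n_{[0,k]}(t) < \epsilon$ whenever t, k satisfy $\frac{k}{n} + \delta \le t \le 1$. (For such t, k we have k/n ≤ 1 − δ < 1, so k < n.)
Let m be the unique integer satisfying m < (n + 1)t ≤ m + 1. Certainly, m ≤ n and t > m/(n + 1). Further, we have
$$m \ge (n+1)t - 1 \ge k + n\delta + t - 1 > k + \epsilon^{-1}\gamma^{-1}.$$
Since ϵ, γ ≤ 1 we have m ≥ k + 1. Part 1. of Proposition 11.3.22 gives
$$B^n_{[k+1,m]}(t) \ge (m-k)\, b_k^n(t) \ge \epsilon^{-1}\gamma^{-1}\, b_k^n(t), \tag{1}$$
while part 2. thereof gives
$$B^n_{[0,k]}(t) \le \gamma^{-1}\, b_k^n(t), \tag{2}$$
since $t \ge \frac{k}{n} + \delta > \frac{k}{n+1}$ and $\rho_k^n(t) \le \eta_\delta(t) \le 1 - \gamma$; the first of these two inequalities holds because $\frac{k}{n-k+1} \le \frac{k/n}{1-k/n} \le \frac{t-\delta}{1-(t-\delta)}$.
If $b_k^n(t) = 0$, inequality (2) gives $B^n_{[0,k]}(t) = 0$ and the proof is complete. Otherwise, the second inequality in (1) is strict (because $m - k > \epsilon^{-1}\gamma^{-1}$), and the denominator $B^n_{[k+1,m]}(t)$ is positive and bounded above by 1; dividing (2) by (1), we obtain
$$B^n_{[0,k]}(t) \le \frac{B^n_{[0,k]}(t)}{B^n_{[k+1,m]}(t)} < \epsilon.$$
The proof is complete.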
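To see the proposition at work, the sketch below (our own illustration; the parameter values are arbitrary) computes, for δ = 0.1, the largest value of $B^n_{[0,k]}(t)$ over all k ≤ n and $t \in [\frac{k}{n} + \delta, 1]$. It relies on the fact that $B^n_{[0,k]}(t)$ is the probability that a Binomial(n, t) random variable is at most k, which is nonincreasing in t, so only the left endpoint $t = \frac{k}{n} + \delta$ needs checking. For comparison it also prints Hoeffding’s bound exp(−2nδ²), a standard independent estimate for such binomial tails.

```python
from math import comb, exp


def left_sum(n, k, t):
    """B^n_[0,k](t) = sum_{j=0}^{k} C(n, j) t**j (1 - t)**(n - j)."""
    return sum(comb(n, j) * t**j * (1 - t) ** (n - j) for j in range(k + 1))


def worst_left_sum(n, delta):
    """Max of B^n_[0,k](t) over k <= n and t in [k/n + delta, 1].

    B^n_[0,k](t) is nonincreasing in t on [0, 1], so for each admissible k the
    maximum is attained at the left endpoint t = k/n + delta.
    """
    worst = 0.0
    for k in range(n + 1):
        t = k / n + delta
        if t <= 1.0:
            worst = max(worst, left_sum(n, k, t))
    return worst


if __name__ == "__main__":
    delta = 0.1
    for n in (25, 100, 400):
        print(n, worst_left_sum(n, delta), exp(-2 * n * delta**2))
```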
Proof of Weierstrass’s Approximation Theorem. First, assume I = [0, 1] is the unit interval. Let f ∈ C([0, 1]).
For n > 0, define the nth Bernstein approximant of f to be the function [0, 1] → R : t ↦ $B^n(f; t)$, where
$$B^n(f; t) = \sum_{j=0}^{n} f\!\left(\frac{j}{n}\right) b_j^n(t).$$
(We could also define $B^0(f; t)$ to be the constant f(0), but we only need the approximants for n large.)
Let $E^n(f; t) = |B^n(f; t) - f(t)|$ for t ∈ [0, 1]. We have
$$E^n(f; t) = \left| \sum_{j=0}^{n} f\!\left(\frac{j}{n}\right) b_j^n(t) - f(t) \right| = \left| \sum_{j=0}^{n} \left( f\!\left(\frac{j}{n}\right) - f(t) \right) b_j^n(t) \right| \quad \text{by Proposition 11.3.21.2.}$$
Let $J_n = \{j \in \mathbb{N} \mid j \le n\}$. For any subset $S \subset J_n$, let
$$B^n_S(t) = \sum_{j \in S} b_j^n(t), \qquad E^n_S(f; t) = \left| \sum_{j \in S} \left( f\!\left(\frac{j}{n}\right) - f(t) \right) b_j^n(t) \right|.$$
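Let ϵ > 0 be given. We may assume ϵ ≤ 1. By uniform continuity of f on [0, 1], there exists δ ∈ (0, 1) such that |f(u) − f(t)| < ϵ/2 for all t, u ∈ I with |u − t| ≤ δ.
By Proposition 11.3.23, there exists n such that
$$B^n_{[0,k]}(t) < \frac{\epsilon}{8(\|f\| + 1)} \quad \text{whenever } 0 \le k \le n \text{ and } t \in \left[\frac{k}{n} + \delta, 1\right].$$
Replacing t by 1 − t and using part 4. of Proposition 11.3.21, we have $B^n_{[n-k,n]}(t) = B^n_{[0,k]}(1-t)$, hence
$$B^n_{[k,n]}(t) < \frac{\epsilon}{8(\|f\| + 1)} \quad \text{whenever } 0 \le k \le n \text{ and } t \in \left[0, \frac{k}{n} - \delta\right].$$
For fixed t ∈ [0, 1] we partition $J_n$ as follows:
• The “middle set” $M = \{j \in J_n \mid n(t - \delta) \le j \le n(t + \delta)\}$;
• the “right set” $R = \{j \in J_n \mid j > n(t + \delta)\}$;
• the “left set” $L = \{j \in J_n \mid j < n(t - \delta)\}$.
Either of the last two sets R, L may be empty. By choice of n, we have $B^n_R(t), B^n_L(t) < \frac{\epsilon}{8(\|f\|+1)}$: indeed, when nonempty, L = {0, 1, …, k} for some integer k < n(t − δ), so that $t > \frac{k}{n} + \delta$, and R = {k, …, n} for some integer k > n(t + δ), so that $t < \frac{k}{n} - \delta$. By choice of δ, we have $|f(\frac{j}{n}) - f(t)| < \epsilon/2$ for j ∈ M.
Since |f(u) − f(t)| ≤ |f(u)| + |f(t)| ≤ 2∥f∥ for all t, u ∈ [0, 1], we have
$$\begin{aligned} E^n(f; t) &\le E^n_M(f; t) + E^n_R(f; t) + E^n_L(f; t) \\ &\le \frac{\epsilon}{2} B^n_M(t) + 2\|f\|\, B^n_R(t) + 2\|f\|\, B^n_L(t) \\ &\le \frac{\epsilon}{2} \cdot 1 + 2 \cdot \frac{2\|f\|\,\epsilon}{8(\|f\| + 1)} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \end{aligned}$$
uniformly for all t ∈ [0, 1]. This concludes the proof that the polynomials $B^n(f; \cdot)$ uniformly approximate f on [0, 1].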
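The construction in the proof is explicit, so it can be run directly. The sketch below is illustrative: the test function f(t) = |t − 1/2| (named `kink` here) and the grid used to estimate the sup norm are our own choices. It evaluates the Bernstein approximant $B^n(f; t) = \sum_j f(j/n)\, b_j^n(t)$ and estimates $\max_t E^n(f; t)$. The estimate decreases as n grows, though fairly slowly for this nonsmooth f (roughly like $1/\sqrt{n}$ at the kink t = 1/2).

```python
from math import comb


def bernstein_approximant(f, n, t):
    """B^n(f; t) = sum_{j=0}^{n} f(j/n) * C(n, j) * t**j * (1 - t)**(n - j)."""
    return sum(f(j / n) * comb(n, j) * t**j * (1 - t) ** (n - j) for j in range(n + 1))


def sup_error(f, n, points=401):
    """Grid estimate of max over t in [0, 1] of |B^n(f; t) - f(t)|."""
    ts = [i / (points - 1) for i in range(points)]
    return max(abs(bernstein_approximant(f, n, t) - f(t)) for t in ts)


def kink(t):
    """Hypothetical test function: continuous on [0, 1], not differentiable at 1/2."""
    return abs(t - 0.5)


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, round(sup_error(kink, n), 4))
```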
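In general, if f is continuous on a closed interval [a, b], let $g : t \mapsto f\bigl((1-t)a + tb\bigr)$, which is continuous on [0, 1], and (by the case just proved) let p be any polynomial such that |p(t) − g(t)| < ϵ for all t ∈ [0, 1]. Let $q : x \mapsto p\left(\frac{x-a}{b-a}\right)$, which is again a polynomial. Then
$$|f(x) - q(x)| = \left| g\left(\frac{x-a}{b-a}\right) - p\left(\frac{x-a}{b-a}\right) \right| < \epsilon \quad \text{for all } x \in [a, b].$$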
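The change of variables in this last step is equally mechanical. As a minimal sketch (our own illustration; the choice f = exp on [−1, 2] and the degree n = 200 are arbitrary), we approximate g(t) = f((1 − t)a + tb) on [0, 1] by its Bernstein approximant p and set q(x) = p((x − a)/(b − a)). The error of q at x equals the error of p at (x − a)/(b − a). As a further check, Bernstein approximants reproduce linear functions exactly, so q recovers an affine f up to rounding.

```python
from math import comb, exp


def bernstein_approximant(g, n, t):
    """B^n(g; t) on [0, 1]."""
    return sum(g(j / n) * comb(n, j) * t**j * (1 - t) ** (n - j) for j in range(n + 1))


def approximate_on_interval(f, a, b, n):
    """Return q(x) = p((x - a)/(b - a)), where p = B^n(g; .) and g(t) = f((1 - t)a + tb)."""
    def g(t):
        return f((1 - t) * a + t * b)

    def q(x):
        return bernstein_approximant(g, n, (x - a) / (b - a))

    return q


if __name__ == "__main__":
    a, b = -1.0, 2.0
    q = approximate_on_interval(exp, a, b, 200)
    xs = [a + (b - a) * i / 300 for i in range(301)]
    print(max(abs(q(x) - exp(x)) for x in xs))
```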

Foundations of Mathematics

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Foundations of Mathematics

Uploaded by

Copyright:

Available Formats
