0 Background
0.1 Introduction
0.2 Proof and Paradox
0.3 Essential Linguistic Concepts
0.4 A Dog’s World
0.5 Peano’s Postulates, Induction
0.6 Formal Grammars
0.7 Warning About the Notation Used
3 Sets
3.1 Introduction
3.2 Axioms of Extension and Separation
3.3 Intersections and Differences
3.4 Unions
11 Continuity
11.1 Limits of Functions
11.2 Continuous Functions
11.3 Density and compactness in spaces of functions
Background
Contents
0.1 Introduction
0.2 Proof and Paradox
0.3 Essential Linguistic Concepts
0.4 A Dog’s World
0.5 Peano’s Postulates, Induction
0.6 Formal Grammars
0.7 Warning About the Notation Used
0.1 Introduction
There are those (Platonists) who believe that mathematics has an external reality
of its own and that mathematical objects really exist, somewhere out there, inde-
pendently of what we know about them, or what we call them.∗ With this point of
view, mathematics is a science. All mathematical theorems already exist, waiting
to be discovered.
There are others (formalists) who believe that mathematics has no external
reality and that mathematical objects only exist insofar as we have the ability to
describe their existence, to invent them, like characters in a novel. With this point
∗ “What’s in a name? That which we call a rose / By any other word would smell as sweet”
(Juliet, Act 2, Scene 2) refers to an external reality.
Note 0: Background – July 28, 2020 2
be true? What is the meaning of truth and falsehood? Let’s take these things to
mean what they usually mean in everyday language.
We have used, and will continue to use, a condition that is essential to everyday
mathematical reasoning, the law of the excluded middle: either a statement is true
or its denial (also called negation) is true.
We have just seen an example of a paradox. (This is by no means the shortest
known paradox, but it is one of the most colorful.) It is important to recognize
one when we see it. It is equally important not to see one when there is none, as
we show in the next example.
(The “are” is not explicit in the original Greek, but is implied.) The corresponding
King James Version reads
One of themselves, even a prophet of their own, said, The Cretians [sic]
are alway [sic] liars, evil beasts, slow bellies.
If what Epimenides said is true, then his statement must be false, be-
cause Epimenides is a Cretan. On the other hand, if what he said is
false, then Cretans aren’t liars, so that his statement must be true. The
conclusion is that if Epimenides’s statement is true, then it is false, and
that if it is false, then it is true.
The flaw should be obvious: The negation of “Cretans are always liars” is “there
is some Cretan who is not always a liar.” The above reasoning contains a serious
blunder in taking the negation of a sentence. Let’s try again with a different line
of reasoning:
¶ George Ricker Berry, The Interlinear Literal Translation of the Greek New Testament,
Zondervan, Grand Rapids, Michigan, 1958, p. 555.
The “Epimenides paradox” is no paradox after all. The conclusion that the
statement is false applies to “Cretans are always liars” in the context of the state-
ment “A Cretan said: ‘Cretans are always liars.’ ” In other words, we have not
shown that Paul’s Epistle to Titus contains a false statement. We have shown that
the alleged paradox does not exist.
Further, we can assume that m and n have no common factors, i.e.: we assume
that the fraction in (1) is in reduced form.
By squaring (1) we obtain $2 = m^2/n^2$, which gives
(2) $m^2 = 2n^2$.
That is, $m^2$ is even. This means that m is also even. In other words, m = 2k for
some integer k. Therefore, $m^2 = 4k^2$. We substitute this last equality in (2) to
obtain $4k^2 = 2n^2$. This gives $2k^2 = n^2$, showing that $n^2$ is also even. Hence, n is
even also.
We have reached the conclusion that m and n are both even, contradicting our
original assumption that the fraction in (1) is in reduced form.
Therefore, we have shown that it is impossible to write $\sqrt{2}$ in the form (1).
Equivalently, we have shown that $\sqrt{2}$, not being rational, must be irrational.
If it is not clear why (or how) proofs by contradiction work, be patient. We
shall examine the ideas involved in great detail in the next note. Many will be
quite surprised at the simplicity of the ideas involved in this line of reasoning.
If you feel frustrated at this point, rest assured that proofs by contradiction
should become quite clear in Note 1.
In the same way that language cannot define itself but must rely on external ref-
erences, mathematics cannot define itself but must rely on some degree of linguistic
structure being already present. What linguistic structure must be within the lin-
guistic competence of the reader to understand the beginning of the development
of mathematics? We begin to answer this question in the next section.
The structures (whether linguistic or mathematical) are called inductive struc-
tures. In the next section we begin with the simplest possible meaningful example.
bo has fleas
an English sentence also? Again, yes. How about the text given below?
the dog who bit the dog who bit bo has fleas
Again, we are compelled to say yes. You can see where all of this leads: If I ask
“Can I prefix the above sentence with another the dog who bit and still have an
English sentence?” you’d be hard-pressed to say no; on what grounds can you deny
me this request? On what grounds can you object to my repeating this operation
any number of times? Where can you draw the line? Is two times permissible, but
not three? Is 346 times permissible, but not 347?
Out of sheer frustration, you might agree that the text
the dog who bit · · · the dog who bit bo has fleas
(DL2) If we prefix any sentence of DogLan with the text the dog who bit we
obtain a sentence of DogLan.
(DL3) The only sentences of DogLan are those which can be obtained from
finitely many applications of rules (DL1) and (DL2) above.
The phrase “finitely many” in (DL3) means that the process of applying rules
must eventually terminate: We do not allow this process (even if just mental) to
be repeated forever.
Is this so hard to understand? If you say yes, it might just mean that you find
these properties of English (and of many other languages) uninteresting, perhaps
even dull, finding yourself rather like Smith Bowens found herself in Africa.
There are plenty more: i dream, i dream that i dream, i dream that i dream that i
dream, and so on, is another.
You may reasonably ask: “Is this for real? What’s the point of all this?” The
point is that these are the things that we must look at. We’re using certain pecu-
liarities of language to achieve a very specific goal: The discovery—or invention, it
doesn’t matter which—of nothing less than mathematics. We are using mathemat-
ical thought at its most fundamental, irreducible, primitive level: What we have
done cannot be explained in simpler terms, and cannot separate “mathematical”
thought from “linguistic” thought.
Is the text
the dog who bit the dog who bit the dog who bit bo has fleas
a sentence of DogLan? To show that it is, consider the following argument. The
text
bo has fleas
is a sentence of DogLan by rule (DL1). We apply rule (DL2) to the above to obtain
the text
the dog who bit bo has fleas
which is a sentence of DogLan. We apply rule (DL2) to the above to obtain the
text
the dog who bit the dog who bit bo has fleas
which is a sentence of DogLan. Again, we apply rule (DL2) to the above to obtain
the dog who bit the dog who bit the dog who bit bo has fleas
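The argument above is mechanical enough to be carried out by a program. The Python sketch below is only an illustration: the names BASE, PREFIX, build and is_doglan_sentence are our own and appear nowhere else in these notes. It builds a sentence by applying rule (DL2) finitely many times to the sentence given by rule (DL1), and it recognizes a sentence by undoing rule (DL2) until no prefix remains.

```python
BASE = "bo has fleas"          # the sentence given by rule (DL1)
PREFIX = "the dog who bit "    # the text prefixed by rule (DL2)


def build(times: int) -> str:
    """Apply rule (DL2) `times` times to the (DL1) sentence."""
    sentence = BASE
    for _ in range(times):
        sentence = PREFIX + sentence
    return sentence


def is_doglan_sentence(text: str) -> bool:
    """Undo (DL2) repeatedly; by (DL3) only the (DL1) sentence may remain."""
    while text.startswith(PREFIX):
        text = text[len(PREFIX):]
    return text == BASE


print(build(3))
print(is_doglan_sentence(build(3)))
print(is_doglan_sentence("bo has fleas who bit"))
```

The loop in is_doglan_sentence always stops, because each pass removes a nonempty prefix from a finite text; this is the programming counterpart of the phrase “finitely many” in (DL3).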
Further, let us say that if S is any sentence of DogLan, then DS represents the
sentence S prefixed with the dog who bit. We use the equality symbol “=”
between any two sentences of DogLan to mean that the two sentences are identical.
To deny that the two sentences are identical we use the symbol “≠” instead. For
example:
bo has fleas ≠ the dog who bit bo has fleas
the dog who bit bo has fleas = the dog who bit bo has fleas
B ≠ DB
DB = DB
(DL4) The sentence bo has fleas cannot be obtained by prefixing any sentence
of DogLan with the phrase the dog who bit. In abbreviated form: If
S is any sentence of DogLan, then DS ≠ B.
(DL5) Let $S_{\text{left}}$ and $S_{\text{right}}$ denote any two sentences of DogLan. If
$DS_{\text{left}} = DS_{\text{right}}$,
then
$S_{\text{left}} = S_{\text{right}}$.
(Note that the converse of property (DL5) is assumed implicitly in order to make
the operation of juxtaposition well defined.)
For example, we see that DDDB ≠ DDB by arguing that
DB ≠ B by (DL4),
DDB ≠ DB by (DL5),
DDDB ≠ DDB by (DL5).
The concepts we have used in the above discussion are fundamental to the most
basic form of mathematical reasoning and cannot be put in more elementary form.
In fact, rules (DL1) through (DL5) are nothing less than a rephrasing of Peano’s
celebrated postulates (axioms) for the natural numbers, introduced in 1891. We
discuss some details below.
0′ = 1.
The next is
0′′ = (0′ )′ = 1′ = 2.
Then comes
0′′′ = (0′′ )′ = 2′ = 3,
and so on. 0 is not a successor.
0.5.1 Definition. The Postulates: The natural numbers N satisfy the five ax-
ioms (N 1) through (N 5), given below, known as Peano’s postulates.∗
(N 1) 0 belongs to N.
(N 3) The only elements of N are those which can be obtained from finitely
many applications of rules (N 1) and (N 2) above.
0 ↔ B
1 = 0′ ↔ DB
2 = 0′′ ↔ DDB
3 = 0′′′ ↔ DDDB
4 = 0′′′′ ↔ DDDDB
⋮ ⋮
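The correspondence displayed above can be written as a pair of conversion functions. This is a small sketch under our own naming (to_doglan, from_doglan and successor are hypothetical helpers, not notation from the notes): the natural number n corresponds to n copies of D followed by B, and the successor operation corresponds to prefixing one more D.

```python
def to_doglan(n: int) -> str:
    """Encode n as n copies of D followed by B (so 0 is B itself)."""
    if n < 0:
        raise ValueError("natural numbers start at 0")
    return "D" * n + "B"


def from_doglan(s: str) -> int:
    """Decode a string of the form D...DB back to a natural number."""
    if not s.endswith("B") or set(s[:-1]) - {"D"}:
        raise ValueError(f"not an abbreviated DogLan sentence: {s!r}")
    return len(s) - 1


def successor(s: str) -> str:
    """The successor n -> n' corresponds to prefixing D."""
    return "D" + s


for n in range(5):
    print(n, "<->", to_doglan(n))
```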
From N 1–N 3 we obtain the induction principle, which is of tremendous impor-
tance in the development of mathematics.
0.5.3 Theorem. Induction Principle: Suppose that S(n) is a statement about
the natural number n such that:
(IP 1) S(0) is true.
(IP 2) If S(n) is true then S(n′ ) is true.
Then, S(n) is true for all natural numbers n.
Proof. The natural numbers are generated thanks to (N 1)–(N 3). Suppose that
S(n) is a statement about the natural number n such that (IP 1) and (IP 2) hold.
Then, given any natural number m, how can we be sure that S(m) is true? If
m = 0, then S(m) is true by (IP 1). Otherwise, m is a successor, which is obtained
from 0 by applying the successor operation a specified number of times. Thanks to
(IP 1) and (IP 2), the latter applied with the same number of successor operations
required to generate m, we conclude that S(m) must be true.
0.5.4 Remark. Note that N 3 is required in the above proof because we must be
sure that all natural numbers other than 0 are successors and that the only natural
number that is not a successor is 0. In fact, N 3 and the induction principle are
equivalent. In many versions of Peano’s axioms the induction principle is used in
place of N 3.
0.5.5 Example. Let r be a real number with r ≠ 1. Use induction to prove the
correctness of the formula
(3) $\displaystyle \sum_{k=0}^{n} r^k = \frac{1 - r^{n+1}}{1 - r}$.
which is correct.
Suppose that the formula is correct for n. Then,
$$\begin{aligned}
\sum_{k=0}^{n+1} r^k &= \sum_{k=0}^{n} r^k + r^{n+1} \\
&= \frac{1 - r^{n+1}}{1 - r} + r^{n+1} \\
&= \frac{1 - r^{n+2}}{1 - r}.
\end{aligned}$$
The above is the formula (3) with n replaced by n′ = n + 1. In other words,
the assumption that (3) holds for n leads to the conclusion that it also holds for
n′ = n + 1. We have satisfied the hypotheses of the induction principle. Therefore,
the formula (3) holds for all n.
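Induction proves formula (3) for every n; a computation can only spot-check it. The following Python sketch does such a spot check with exact rational arithmetic, so that rounding plays no role. The function names lhs and rhs and the sample values of r are our own choices.

```python
from fractions import Fraction


def lhs(r: Fraction, n: int) -> Fraction:
    """Left side of (3): r^0 + r^1 + ... + r^n."""
    return sum((r ** k for k in range(n + 1)), Fraction(0))


def rhs(r: Fraction, n: int) -> Fraction:
    """Right side of (3): (1 - r^(n+1)) / (1 - r), which requires r != 1."""
    return (1 - r ** (n + 1)) / (1 - r)


for r in (Fraction(1, 2), Fraction(-3), Fraction(7, 5)):
    for n in range(10):
        assert lhs(r, n) == rhs(r, n)
print("formula (3) agrees for all sampled r and n")
```

Agreement on finitely many samples is evidence, not proof; the induction argument above is what covers all natural numbers n.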
0.6.3 Example. For DogLan the nonterminal symbols can be chosen as σ and δ;
the terminal symbols are B (for bo has fleas) and D (for the dog who bit);
the rewriting rules are
(R1) σ → B,
(R2) σ → δσ,
(R3) δ → D.
0.6.4 Definition. Languages: The language of G (the language defined by G) is
precisely the collection of all strings ρ of terminal symbols that can be obtained by
the application of finitely many productions starting with σ:
σ → · · · → ρ.
0.6.5 Remark. The idea behind the use of the rewriting rules is to start with one
of the rewriting rules with left side σ. Then, we rewrite the right side using one
of the rewriting rules. We keep rewriting the right side until we obtain a string
consisting exclusively of terminal symbols. Note:
σ → ··· → ρ
is called a derivation‡ in G.
For example, the following is a derivation in DogLan.
σ → δσ by (R2)
→ δδσ by (R2)
→ δδδσ by (R2)
→ δδδB by (R1)
→ δδDB by (R3)
→ δDDB by (R3)
→ DDDB by (R3)
0.6.6 Remark. An expression is any string of terminal symbols. A sentence is an
expression that can be derived from the sentence symbol σ. If we derive a string of
terminal symbols from σ, we know that the string belongs to the language. (The
derived string is a sentence.) Can we show when it is impossible to derive a given
expression? The answer is affirmative for the type of grammars we are using here.
The basic idea is that it is possible to derive all sentences of a given length in
finitely many steps. Therefore, given an expression, we generate all sentences that
have the same length as the given expression. If we do not find the given string
listed among the sentences generated, the expression is not a sentence.
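The procedure just described can be sketched in Python for the DogLan grammar of Example 0.6.3. The names RULES, sentences_of_length and is_sentence are hypothetical. The sketch relies on one observation that should be checked against the rules: none of (R1), (R2), (R3) shortens a string, so any intermediate string longer than the target length can be discarded, and the search is finite.

```python
RULES = [("σ", "B"), ("σ", "δσ"), ("δ", "D")]   # (R1), (R2), (R3)
NONTERMINALS = {"σ", "δ"}


def sentences_of_length(n: int) -> set:
    """All strings of terminal symbols of length n derivable from σ."""
    seen = {"σ"}
    frontier = ["σ"]
    while frontier:
        form = frontier.pop()
        for left, right in RULES:
            # Rewrite each occurrence of the left side, one at a time.
            start = form.find(left)
            while start != -1:
                new = form[:start] + right + form[start + len(left):]
                if len(new) <= n and new not in seen:
                    seen.add(new)
                    frontier.append(new)
                start = form.find(left, start + 1)
    return {f for f in seen if len(f) == n and not (set(f) & NONTERMINALS)}


def is_sentence(expression: str) -> bool:
    """Decide membership by listing every sentence of the same length."""
    return expression in sentences_of_length(len(expression))


print(sentences_of_length(4))
print(is_sentence("DDDB"), is_sentence("DBD"))
```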
‡ Note that a derivation is yet another example of an inductive structure.
=
→
↔
···
without having given these symbols a precise mathematical meaning. The reader
is presumed to possess an intuitive grasp of the intended purpose and meaning of
these symbols (from examining the context in which they are used).
It is important to understand that such sloppiness occurs only in Note 0. In
fact, this note is meant to serve two simple purposes. The first is to provide a lead
to the problem we face in just getting started. The second is to make the reader
aware that our language should be quite precise.
The word “proof” has been thrown about several times. Can the meaning of
the notion proof be explained? At this level, a proof is a sequence of statements
that, starting with some stated assumptions, leads to a conclusion through the
use of common-sense reasoning. In mathematics, a formal proof is defined as a
syntactical (grammatical) notion that is independent of semantics (meaning). In
these notes we shall not use formal proofs. Rather, we shall rely on the semantical
notion of logical consequence, also called tautological implication.
How should you study this note? There are no exercises and you will not be
tested on its contents. Think of this note as an eye opener, if you have never seen
this material before.
Foundations Note 1
Contents
1.1 Introduction
1.2 The Language of the Propositional Calculus
1.3 Substitution
1.4 Interpretations
1.5 Tautologies and Contradictions
1.6 Examples
1.7 Some Useful Tautologies
1.8 Proofs
1.9 Appendix A: More Examples
1.10 Appendix B: The Greek Alphabet
1.1 Introduction
In notes 1 and 2 we develop the basic rules of everyday mathematical reasoning. In
this note, we start with the so-called propositional calculus (also called sentential
calculus). Italicized terms in this informal introduction will later be defined for-
mally; however, their daily usage gives a rough first approximation to the precise
meaning that they have in mathematics.
A proposition is a statement (the assertion of a declarative sentence within an
interpretation) that has either the truth value truth, denoted by T , or falsehood,
Note 1: The Propositional Calculus – July 28, 2020 1-2
(1.2) p ∨ q,
where the symbol ∨ stands for the sentential connective or.
Given an interpretation (Frank and Mary who? Paris, Texas and Rome, New
York?) the assertion of the above declarative sentence makes a statement that can
be true or false, depending on how we interpret the connective or. Below, we define
several connectives. We use “formula” as another name for “sentence.”
1.2.1 Definition. The Language L: We start with the basic building blocks of
the language. The formal symbols of L are the symbols listed in quotation marks:
• the connectives ‘¬’, ‘∧’, ‘∨’, ‘→’, ‘↔’;
• left parenthesis ‘(’;
• right parenthesis ‘)’;
• an infinite collection of symbols ‘p’, ‘q’, ‘r’, …, ‘$p_1$’, ‘$p_2$’, ‘$p_3$’, …, called
variables.
We ignore spaces between formal symbols and we use the operation of concate-
nation by which we take a string (sequence, juxtaposition) of formal symbols and
juxtapose it with another string of formal symbols.
An expression of L is any sequence of formal symbols obtainable from the formal
symbols with finitely many applications of the concatenation operation. Thus,
(p ∧ q)))((∧∧
and
((p) ∧ q) ∨ (r ∧ s)
are examples of expressions of L. Note: To improve readability, we may use paren-
theses of sizes and shapes that differ from the official ‘(’ and ‘)’, since this practice
introduces changes that are purely cosmetic in nature.
Not all expressions of L make sense. Those that do, called sentences (or formu-
las), can only be assembled following the rules shown below. These are the rules
that separate what is grammatically correct from what is not.
Accordingly, a sentence, or formula, of L is not just any expression of L, but
one that is defined by the following rules.
(SL1) Any variable is a formula.
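(SL2) If $\varphi_1$ and $\varphi_2$ are formulas, then
$(\varphi_1)$,
$\neg(\varphi_1)$,
$(\varphi_1) \wedge (\varphi_2)$,
$(\varphi_1) \vee (\varphi_2)$,
$(\varphi_1) \to (\varphi_2)$,
$(\varphi_1) \leftrightarrow (\varphi_2)$
are formulas also.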
is a formula of L.
Thanks to (SL1) the following are formulas of L:
(1.4) p, q, r.
is a formula∗ of L.
1.2.4 Definition. The Names of the Connectives: The following are the
names of the connectives:
• the denial or negation of p, denoted by ¬p (read: “not p”);
∗ We have ignored an important issue: that of unique readability. In other words, every formula
of L can be read in one, and only one way. For example, in everyday English “she fed her dog
biscuits” can be read in at least two different ways. This is a syntactically ambiguous sentence
because we have two choices: We can take her dog as the indirect object and biscuits as the direct
object, or we can take her as the indirect object and dog biscuits as the direct object. In other
words, we can’t tell whether we should read the given sentence as “she fed biscuits to her dog,”
or whether we should read it as “she fed dog biscuits to her.”
1.3 Substitution
1.3.1 Definition. If $p_1, \ldots, p_n$ are propositional variables, a formula in $p_1, \ldots, p_n$
is a formula where only the propositional variables $p_1, \ldots, p_n$ appear. For example,
((p) ∧ (q)) → ((r) ∨ (s)) is a formula in p, q, r and s.
Proof. If p does not appear in φ then the substitution does nothing to φ, which
remains a valid formula.
Otherwise, we conclude that φ(p ← (ψ)) is a valid formula by reasoning as follows.
Let the formula φ involve the variable p together with others: $p_1, p_2, \ldots$
(It makes no difference whether some of these variables appear in the formula ψ.)
Starting from the variables $p, p_1, p_2, \ldots$, successive applications of the rules
(SL1) and (SL2) generate a sequence of intermediate formulas ending in φ. (Rule
(SL3) ensures this.) To find a sequence of intermediate formulas leading to the
final formula φ(p ← (ψ)), apply the same intermediate steps, but whenever p is
called for, insert (ψ) instead. Each of the steps remains valid because all applications
of (SL2) apply generally to arbitrary formulas $\varphi_1$ (and $\varphi_2$ as required), and it is
irrelevant whether $\varphi_1$, say, is merely a variable (such as p) or some other formula
(such as (ψ)). Summing up, the formula φ(p ← (ψ)) is valid because we have
shown that it can be obtained from the rules (SL1) and (SL2).
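The substitution φ(p ← (ψ)) can be sketched on formulas written as Python strings. The sketch below rests on an assumption of ours about how variables are spelled: a variable is one lowercase Latin letter optionally followed by digits (p, q, p1, p2, and so on). The function name substitute is hypothetical.

```python
import re


def substitute(phi: str, var: str, psi: str) -> str:
    """Return phi(var ← (psi)): each occurrence of `var` becomes (psi)."""
    def replace(match):
        token = match.group()
        # Only whole variables are replaced, so p1 is left alone when var is p.
        return f"({psi})" if token == var else token

    return re.sub(r"[a-z]\d*", replace, phi)


print(substitute("(p) → ((q) → (p))", "q", "(r) → (s)"))
print(substitute("(p1) ∨ (p)", "p", "q"))
```

Replacing whole variable tokens, rather than single characters, is what keeps p1 intact when p is substituted; the extra parentheses around ψ mirror the notation (ψ) used in the proof.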
q ← ((r) → (s)),
1.4 Interpretations
Formulas must be interpreted before they become (true or false) statements. To do
this, each variable in the formula must be given an interpretation before the entire
formula can be given an interpretation.
φ  ¬(φ)
F  T
T  F
† Considerations of the absolute meaning of the concepts of truth and falsehood are left to
religion and philosophy. As far as we are concerned, T and F are simply two distinct symbols.
Let φ and ψ denote formulas having one of the four possible simultaneous inter-
pretations: F and F , F and T , T and F , or T and T . Then the formulas (φ) ∧ (ψ),
(φ) ∨ (ψ), (φ) → (ψ) and (φ) ↔ (ψ) are interpreted according to the table below.
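The interpretations of the four binary connectives can be tabulated by a short program. The Python sketch below assumes the standard truth-functional readings, which agree with the columns for p → q and p ↔ q in the table of Theorem 1.6.2 and with the tables for ∧ and ∨ used later in this note. The names CONNECTIVES, show and connective_table are our own.

```python
from itertools import product

# Interpretations of the binary connectives (True stands for T, False for F).
CONNECTIVES = {
    "∧": lambda a, b: a and b,
    "∨": lambda a, b: a or b,
    "→": lambda a, b: (not a) or b,
    "↔": lambda a, b: a == b,
}


def show(value: bool) -> str:
    return "T" if value else "F"


def connective_table() -> list:
    """One header row followed by one row per simultaneous interpretation."""
    rows = ["φ ψ " + " ".join(CONNECTIVES)]
    for a, b in product((False, True), repeat=2):
        cells = [show(op(a, b)) for op in CONNECTIVES.values()]
        rows.append(" ".join([show(a), show(b)] + cells))
    return rows


print("\n".join(connective_table()))
```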
¬p ∧ ¬¬q ∧ ¬¬r
is read
(¬(p)) ∧ [(¬(¬(q))) ∧ (¬(¬(r)))].
Next in the evaluation order are the connectives → and ↔. That is,
¬p ∧ q ∨ r → s ∧ t
is read
[((¬(p)) ∧ (q)) ∨ (r)] → ((s) ∧ (t)).
p  ¬p  p ∨ ¬p  p ∧ ¬p
F  T   T       F
T  F   T       F
We note that every interpretation of the formula p ∨ ¬p is true, while every inter-
pretation of the formula p ∧ ¬p is false. Evidently, there are formulas to which we
can attach truth values independently of interpretation. These are precisely the
formulas we shall call tautologies and contradictions, as defined below.
1.5.3 Example. Of course, we can apply the above theorem any number of times,
since after every substitution we still have a tautology. For example, the formula
s → (t → s)
(p ∧ q → r) → ((q → r) → (p ∧ q → r)),
(p → q) → r and (p → q) ∧ (q → r)
|= (m → f ) ∧ (f → ℓ) → (m → ℓ).
1.6 Examples
1.6.1 Definition. Exclusive Or: The interpretation of “p or q, but not both” or
“either p or q” is given by the following table.
p  q  either p or q  ¬(p ↔ q)  ¬p ∧ q  p ∧ ¬q  (¬p ∧ q) ∨ (p ∧ ¬q)
F  F  F              F         F       F       F
F  T  T              T         T       F       T
T  F  T              T         F       T       T
T  T  F              F         F       F       F
The table also shows that “either p or q” (“p exclusive or q”) is tautologically
equivalent to
¬(p ↔ q)
and is also tautologically equivalent to
(p ∧ ¬q) ∨ (¬p ∧ q).
1.6.2 Theorem. Conditionals and Biconditionals: The formula
(p ↔ q) ↔ (p → q) ∧ (q → p)
is a tautology.
Proof. The assertion follows from the table shown below because the corresponding
truth values of the last two columns are identical.
p  q  p → q  q → p  (p → q) ∧ (q → p)  p ↔ q
F  F  T      T      T                  T
F  T  T      F      F                  F
T  F  F      T      F                  F
T  T  T      T      T                  T
(1.7a) |= (p ∧ q) ↔ (q ∧ p),
(1.7b) |= (p ∨ q) ↔ (q ∨ p),
(1.7c) |= (p ↔ q) ↔ (q ↔ p).
⊭ (p → q) ↔ (q → p),
p  q  r  p ∧ q  q ∧ r  (p ∧ q) ∧ r  p ∧ (q ∧ r)
F  F  F  F      F      F            F
F  F  T  F      F      F            F
F  T  F  F      F      F            F
F  T  T  F      T      F            F
T  F  F  F      F      F            F
T  F  T  F      F      F            F
T  T  F  T      F      F            F
T  T  T  T      T      T            T
p  q  r  p ∨ q  q ∨ r  (p ∨ q) ∨ r  p ∨ (q ∨ r)
F  F  F  F      F      F            F
F  F  T  F      T      T            T
F  T  F  T      T      T            T
F  T  T  T      T      T            T
T  F  F  T      F      T            T
T  F  T  T      T      T            T
T  T  F  T      T      T            T
T  T  T  T      T      T            T
p  q  r  p ↔ q  q ↔ r  (p ↔ q) ↔ r  p ↔ (q ↔ r)
F  F  F  T      T      F            F
F  F  T  T      F      T            T
F  T  F  F      F      T            T
F  T  T  F      T      F            F
T  F  F  F      T      T            T
T  F  T  F      F      F            F
T  T  F  T      F      F            F
T  T  T  T      T      T            T
p  q  ¬p  ¬q  p ∧ q  ¬(p ∧ q)  ¬p ∨ ¬q
F  F  T   T   F      T         T
F  T  T   F   F      T         T
T  F  F   T   F      T         T
T  T  F   F   T      F         F
p  q  ¬p  ¬q  p ∨ q  ¬(p ∨ q)  ¬p ∧ ¬q
F  F  T   T   F      T         T
F  T  T   F   T      F         F
T  F  F   T   T      F         F
T  T  F   F   T      F         F
and
|= [¬(p ∨ q)] ↔ [¬p ∧ ¬q]
can be written
¬(p ∧ q) ⇔ ¬p ∨ ¬q
and
¬(p ∨ q) ⇔ ¬p ∧ ¬q,
respectively. (The same holds for the other tautologies.)
Tautologies
Law of the excluded middle               p ∨ ¬p
Law of contradiction                     ¬(p ∧ ¬p)
Tautological Implications
Law of adjunction                        p → p
Law of simplification                    p ∧ q → q
Law of addition                          p → p ∨ q
Law of detachment (modus ponens)         p ∧ (p → q) → q
Modus tollendo tollens                   ¬q ∧ (p → q) → ¬p
Modus tollendo ponens                    ¬p ∧ (p ∨ q) → q
Law of hypothetical syllogism            (p → q) ∧ (q → r) → (p → r)
Law of absurdity (reductio ad absurdum)  (p → q ∧ ¬q) → ¬p
Law of exportation                       (p ∧ q → r) → (p → (q → r))
Law of importation                       (p → (q → r)) → (p ∧ q → r)
Tautological Equivalences
Law of double negation                   p ↔ ¬¬p
Commutative laws                         p ∨ q ↔ q ∨ p
                                         p ∧ q ↔ q ∧ p
                                         (p ↔ q) ↔ (q ↔ p)
Law of contraposition                    (p → q) ↔ (¬q → ¬p)
De Morgan’s laws                         ¬(p ∧ q) ↔ ¬p ∨ ¬q
                                         ¬(p ∨ q) ↔ ¬p ∧ ¬q
Law of equivalence for
implication and disjunction              (p → q) ↔ ¬p ∨ q
Law of negation for implication          ¬(p → q) ↔ p ∧ ¬q
Laws for biconditionals                  (p ↔ q) ↔ (p → q) ∧ (q → p)
                                         (p ↔ q) ↔ p ∧ q ∨ ¬p ∧ ¬q
Laws for identities                      p ↔ p ∨ (q ∧ ¬q)
                                         p ↔ p ∧ (q ∨ ¬q)
Law of simultaneous conclusions          (p → (q ∧ r)) ↔ (p → q) ∧ (p → r)
Note that the Law of negation for implication can be interpreted as follows: To
prove that p does not imply q, it is necessary and sufficient to show that p is true
and q is false.
1.7.1 Exercise. For each of the formulas given below, if the formula is a tautology,
give a truth table showing that it is; otherwise, show that the formula is not a
tautology by producing a counterexample (that is, a truth assignment making the
interpretation of the formula false).
1. p ↔ ¬¬p
2. p ↔ p ∨ (q ∧ ¬q)
3. p ↔ p ∧ (q ∨ ¬q)
4. ¬(p ∧ ¬p)
5. p → (p → p)
6. p → p ∧ q
7. p → p ∨ q
8. p → (q → p)
9. p ∧ q → p
10. (p → q) → (¬q → p)
11. (p ↔ q) ↔ p ∧ q ∨ ¬p ∧ ¬q
12. ¬p ∧ (p ∨ q) → q
13. p ∧ (p → q) → q
14. ¬(p ∨ q) ↔ ¬p ∨ ¬q
15. ¬(p → q) ↔ p ∧ ¬q
16. ¬p ∧ (p ↔ q) → ¬q
17. ¬q ∧ (p → q) → p
19. (p ∧ q → r) ↔ (p → (q → r))
20. (p → q) ∧ (q → r) → (p → r)
21. (p → q) ∨ (q → r) → (p → r)
22. (p → q) ∧ (p → r) → (p → q ∧ r)
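Checking a formula against every truth assignment is a finite task, so it can be delegated to a program. The Python sketch below is one possible way to do this; the names implies and counterexample are our own. It is demonstrated on two formulas already settled in this note rather than on the exercise items: one of De Morgan’s laws, which is a tautology, and (p → q) ↔ (q → p), which is not.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Interpretation of the connective →."""
    return (not a) or b


def counterexample(formula, num_vars: int):
    """Return a truth assignment making `formula` false, or None if none exists."""
    for values in product((False, True), repeat=num_vars):
        if not formula(*values):
            return values
    return None


# De Morgan: ¬(p ∧ q) ↔ ¬p ∨ ¬q
de_morgan = lambda p, q: (not (p and q)) == ((not p) or (not q))
# (p → q) ↔ (q → p), noted above as not being a tautology
converse = lambda p, q: implies(p, q) == implies(q, p)

print(counterexample(de_morgan, 2))
print(counterexample(converse, 2))
```

A result of None means that every interpretation makes the formula true; any other result is a counterexample in the sense of the exercise, listed in the order of the variables.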
1.8 Proofs
The use of tautological implications and equivalences forms the basis for our math-
ematical arguments used in the construction of proofs. A few of these are outlined
below. They are all based on the fact that if ϕ tautologically implies ψ, we can
prove ϕ to conclude ψ. The techniques illustrated in the examples given below
provide the basis for many of the “tricks” we shall use later in the rest of these
notes. This section will serve as a reference for much of what follows.
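1.8.1 Definition. Premises and Consequences: Let φ and $\varphi_1, \ldots, \varphi_n$ be formulas.
We say that φ is a consequence of (or that φ logically follows from) the
formulas $\varphi_1, \ldots, \varphi_n$, just in case
(1.13) $\varphi_1, \ldots, \varphi_n \models \varphi$
$\varphi_1, \ldots, \varphi_n \not\models \varphi$
$\varphi_1 \Leftrightarrow \cdots \Leftrightarrow \varphi_n \Rightarrow \varphi_{n+1} \Leftrightarrow \cdots \Leftrightarrow \varphi_m$.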
p  q  p → q  ¬q → ¬p
F  F  T      T
F  T  T      T
T  F  F      F
T  T  T      T
1.8.3 Note. The reason proofs by contraposition are valid is that, by equa-
tion (1.14), both (p → q) → (¬q → ¬p) and (¬q → ¬p) → (p → q) are tautologies,
hence
¬q → ¬p =||= p → q.
Said otherwise, from the truth of ¬q → ¬p there follows the truth of p → q, and
vice versa.
N = {0, 1, 2, . . .}.
n is even → $n^2$ is even.
(1) |= (p → q ∧ ¬q) → ¬p
(the truth table shown below verifies this), called the law of absurdity. Note: It
does not matter what we use for q, since q ∧ ¬q is a contradiction. Further, q ∧ ¬q
can be replaced by q ↔ ¬q, or any other contradiction.‖
p  q  ¬p  q ∧ ¬q  p → q ∧ ¬q  (p → q ∧ ¬q) → ¬p
F  F  T   F       T           T
F  T  T   F       T           T
T  F  F   F       F           T
T  T  F   F       F           T
We might use (1) to prove that p is true by proving that ¬p leads to a contradiction.
In (2) we can replace p with p → q to obtain the tautology
yet another form of the law of absurdity. We can prove p → q by asserting p while
denying q to obtain a contradiction. In other words, p → q can only be false if we
can have p true and q false; i.e.: p ∧ ¬q.
¶ If n is a natural number then the negation of the statement “n is odd” is the statement “n is
even”.
‖ We cannot write F (for falsehood), because truth and falsehood belong to interpretation,
not to the language L.
1.8.6 Example. A Proof by Contradiction: We illustrate the technique by
proving that $\sqrt{2}$ is irrational.∗∗ Recall that a real number x is called rational if it
can be written as the ratio m/n of two integers. Otherwise, x is called irrational.
The proof by contradiction goes as follows. (i) We suppose that $\sqrt{2}$ is rational.
(ii) We obtain a contradiction. (iii) We conclude that $\sqrt{2}$ cannot be rational.
Accordingly, suppose that $\sqrt{2}$ is rational, so that we can write
(1) $\sqrt{2} = \dfrac{m}{n}$.
Further, we can assume that m and n have no common factors, i.e.: we assume
that the fraction in (1) is in reduced form.
By squaring both sides of (1) we obtain the equality $2 = m^2/n^2$, which gives
(2) $m^2 = 2n^2$.
Clearly, (2) shows that $m^2$ is even. This means that m is also even. In other
words, m = 2k for some integer k. Therefore, after squaring both sides, we obtain
$m^2 = 4k^2$. We substitute this last equality in (2) to obtain $4k^2 = 2n^2$. This gives
$2k^2 = n^2$, showing that $n^2$ is also even. Hence, n is even also.
We have reached the conclusion that m and n are both even, contradicting our
original assumption that the fraction in (1) is in reduced form (because we can
divide numerator and denominator by 2).
Therefore, we have shown that it is impossible to write $\sqrt{2}$ in the form (1).
Equivalently, we have shown that $\sqrt{2}$, not being rational, must be irrational.
The examples that follow were already worked out using truth tables. Why
must we rework them again without using truth tables? The point is to expose the
line of reasoning that must be used when truth tables do not apply.
1.9.2 Example. We prove the tautology
|= (p ↔ q) ↔ p ∧ q ∨ ¬p ∧ ¬q
of Exercise 1.7.1.11 without using a truth table. We have
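\begin{align*}
(p \leftrightarrow q)
&\Leftrightarrow (p \to q) \wedge (q \to p) && \text{(Theorem 1.6.2)},\\
&\Leftrightarrow (\neg p \vee q) \wedge (\neg q \vee p) && \text{(Theorem 1.6.8)},\\
&\Leftrightarrow \neg p \wedge (\neg q \vee p) \vee q \wedge (\neg q \vee p) && \text{(distributivity)},\\
&\Leftrightarrow [(\neg p \wedge \neg q) \vee \underbrace{(\neg p \wedge p)}_{\text{contradiction}}] \vee [\underbrace{(q \wedge \neg q)}_{\text{contradiction}} \vee (q \wedge p)] && \text{(distributivity)},\\
&\Leftrightarrow (\neg p \wedge \neg q) \vee [\underbrace{(q \wedge \neg q)}_{\text{contradiction}} \vee (q \wedge p)] && \text{(Law for Identity: Exercise 1.7.1.2)},\\
&\Leftrightarrow (\neg p \wedge \neg q) \vee [(q \wedge p) \vee \underbrace{(q \wedge \neg q)}_{\text{contradiction}}] && \text{(commutativity)},\\
&\Leftrightarrow (\neg p \wedge \neg q) \vee (q \wedge p) && \text{(Law for Identity)},\\
&\Leftrightarrow (p \wedge q) \vee (\neg p \wedge \neg q) && \text{(commutativity)}.
\end{align*}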
1.9.3 Example. We prove the tautology
|= (p ∧ q → r) ↔ (p → (q → r))
of Exercise 1.7.1.19 without using a truth table. We have
(p ∧ q → r) ⇔ ¬(p ∧ q) ∨ r (Theorem 1.6.8),
⇔ ¬p ∨ ¬q ∨ r (Theorem 1.6.9),
⇔ ¬p ∨ (¬q ∨ r) (Associativity),
⇔ ¬p ∨ (q → r) (Theorem 1.6.8),
⇔ (p → (q → r)) (Theorem 1.6.8).
1.9.4 Example. A proof of the existence of infinitely many primes by
contradiction: This proof was first published by Euclid in his Elements (III
century B.C.), probably the most influential work in the history of mathematics.
Recall that a natural number n > 1 is prime if its only (positive integer) factors
are 1 and n itself. Otherwise, n is called composite.†† A well-known proposition
states that every natural number n > 1 is divisible by some prime number.‡‡
†† The number 1 is neither prime nor composite; it is called unity.
‡‡ The Fundamental Theorem of Arithmetic states that every natural n > 1 is a product of one
or more factors that are prime numbers (not necessarily distinct); moreover, the prime factors
of n are uniquely determined, up to reordering.
To start the proof by contradiction, assume that only finitely many prime num-
bers exist. Let us say that there are only N distinct prime numbers. List all of
these distinct prime numbers:
p1 , p2 , p3 , . . . , pN .
Let
a = 1 + p1 · p2 · · · pN .
At this point we recall the assumption made at the beginning of the proof: p1 , p2 , . . . , pN
are all the prime numbers. For each i, the product p1 · p2 · · · pN is divisible by pi , so
dividing a by pi leaves remainder 1. Since pi > 1, pi does not divide a. Therefore, no
prime number divides a. On the other hand, since a > 1, there must exist some
prime number p that divides a. This is a contradiction.
The assumption that only finitely many prime numbers exist leads to a contra-
diction. This proves that infinitely many prime numbers exist.
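The construction in the proof can be tried on a concrete list. The sketch below (our own illustration; the list of the first six primes is an arbitrary choice) forms a = 1 + p1 · · · pN and confirms that every listed prime leaves remainder 1. Note that a itself need not be prime; what matters is that its prime factors lie outside the list.

```python
from math import prod

def euclid_number(primes):
    """Return a = 1 + p1 * p2 * ... * pN for the given list of primes."""
    return 1 + prod(primes)

def smallest_prime_factor(n):
    """Smallest divisor d > 1 of n (necessarily prime), found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

primes = [2, 3, 5, 7, 11, 13]
a = euclid_number(primes)
print(a)                                # 30031
print([a % p for p in primes])          # [1, 1, 1, 1, 1, 1]
print(smallest_prime_factor(a))         # 59, a prime missing from the list
```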
Contents
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
2.2 Quantification of Predicates . . . . . . . . . . . . . . . . . . 2-2
2.3 Formulas of the Predicate Calculus . . . . . . . . . . . . . . 2-4
2.4 Interpretations Revisited . . . . . . . . . . . . . . . . . . . . 2-8
2.1 Introduction
The notions developed in the propositional calculus do not enable us to penetrate
the language of everyday reasoning beyond the primitive notion of irreducible sen-
tence (propositional variable, sentence letter). Here we develop a finer and more
sophisticated structure that leads to a more powerful tool for describing mathe-
matical reasoning. We begin by introducing the notion predicate below.
In everyday language, a predicate is the phrase in a sentence or clause which
expresses what is said of the subject. In the formula (sentence) “x is positive,” the
predicate is “is positive.” If we let A denote this predicate symbolically, and let
A(x) denote the predicate A applied to x, the formula can be written A(x).
The above predicate is a one-place predicate. (The predicate applies to one
variable.) More generally, we may have a two-place predicate (e.g.: x is taller than
y), a three-place predicate (e.g.: x and y are the parents of z), and so on. We do
not allow zero-place predicates.
Note 2: The Predicate Calculus – July 28, 2020 2-2
for integer x, the function f here is defined by the rule which assigns to every
x the integer x². The domain of each function must be adequate for function
evaluation to make sense in a given interpretation, where “interpretation” shall
be defined below. The value of each function must also be in the domain of the
interpretation. A function of n variables is called an n-place function. We do not
allow zero-place functions.
We point out that we use the notion function as an undefined primitive notion
in the background language. Later, functions will be defined as special kinds of
sets.
In the examples that follow, N = {0, 1, 2, . . .} denotes the natural numbers, Z
the integers, Q the rationals and R the reals. These are the “unofficial” versions
of these numbers, as we haven’t invented them yet. They serve the purpose of
illustrating our development with examples to show, more or less, in which direction
our constructions will take us.
A(x1 , . . . , xn ),
where
x1 , . . . , xn
∗ Remember that Note 1 and Note 2 develop the background language. Here “function” is an
intuitive notion lacking a precise definition.
† Of course, we must also have an interpretation for “taller than” and we must know who
Charlie is. At the moment we are only concerned with presenting the existential and universal
quantifiers.
(∀x)(x ≥ 0)
(∀x)(∃y)(∀z)[(x + y = 0) ∧ (x + z = 0 → y = z)].
¬(∃x)(p(x)) ↔ (∀x)(¬p(x))
is a tautology. (If there is no x for which p(x) is true then p(x) must be false for
all x. If p(x) is false for all x then there is no x for which p(x) is true.) Negating
the above and replacing p(x) with ¬p(x) we obtain the tautology
¬(∀x)(p(x)) ↔ (∃x)(¬p(x)).
Therefore, only one of the two quantifiers ∀ and ∃ is necessary, for we can replace
(P L2) equality symbol ‘=’, (not all languages may have ‘=’);
(P L11n ) If φ is any n-place function letter and t1 , . . . , tn are terms, then φ(t1 , . . . , tn )
is a term.
(P L12) The only terms are those which can be constructed by finitely many
applications of rules (P L9), (P L10), and (P L11n ) given above.
Note that our definition of term does not make explicit allowance for the use of
operators such as + in “x+y.” To do this, define the two-place function letter f that
is interpreted as f (x, y) = x + y, where + is given meaning in the interpretation.
Then we have
f (x, y) for x + y,
f (f (x, y), z) for (x + y) + z,
and so on. Also note that rule (P L11n ) depends on the value of n, n = 1, 2, . . ..
The text
“t1 , . . . , tn ”
¬(φ1 ),
(φ1 ) ∧ (φ2 ),
(φ1 ) ∨ (φ2 ),
(φ1 ) → (φ2 ),
(φ1 ) ↔ (φ2 )
are formulas also.
(P L17) The only formulas are those which can be obtained by finitely many ap-
plications of rules (P L13), (P L14n ), (P L15), and (P L16) given above.
Formulas free of quantifiers, that is, formulas that can be formed without using
rule (P L16), are called irreducible predicate formulas. Any irreducible formula can
be obtained from a suitable formula of the propositional calculus upon replacing
propositional variables by terms. See remark 2.3.8.
2.3.2 Example. We use the same rules for eliminating parentheses we used in the
previous note. For example, the following is a formula:
2.3.4 Examples.
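(∀x)(∃y)(y > x ∧ y − 1 < x)
Here the scope of ∃y is (y > x ∧ y − 1 < x), and the scope of ∀x is (∃y)(y > x ∧ y − 1 < x).
(∃x)[(∀y)(y > 0 ∨ x + z = 5) ∧ (∀z)(x + y > z)]
Here the scope of ∀y is (y > 0 ∨ x + z = 5), the scope of ∀z is (x + y > z), and the scope
of ∃x is [(∀y)(y > 0 ∨ x + z = 5) ∧ (∀z)(x + y > z)].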
(∀x)(x > y)
the first occurrence of x is bound and the second is free (so x is both free and
bound in the formula), while the (only) occurrence of y is bound.
2.3.7 Definition (Bound and Free Variables). A variable is called a free variable
in a formula if at least one occurrence of the variable is free. A variable is called a
bound variable in a formula if at least one occurrence of the variable is bound.
denotes the formula obtained from ξ after replacing every free occurrence of
x by (ψ). Note: Bound occurrences of x are not replaced by (ψ);
they remain x’s.
• for assigning to every n-place function letter an n-place function (taking each
n-tuple of elements in D to a unique member of D);
• for assigning to every n-place predicate letter an n-place function with Truth
and Falsehood as the only possible function values.
Once the above rules are provided, it becomes possible to interpret arbitrary terms.
The passage from terms to formulas is as follows:
• If a, b are terms then the equality formula E(a, b) (or a = b, as we prefer) has
an interpretation of Truth exactly when a and b are interpreted as the same
member of D.
• Quantifiers are interpreted by replacing variables by elements of the set D:
– (∀x)(φ) has an interpretation of Truth if and only if φ(x ← a) has an
interpretation of Truth for all a in D;
– (∃x)(φ) has an interpretation of Truth if and only if there exists a in D
such that φ(x ← a) has an interpretation of Truth.
• Sentential connectives are interpreted in a truth-functional manner (i.e.: in
the manner prescribed by the truth tables from Note 1, Section 2.4).
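When the domain D is finite, the rules above for interpreting quantifiers can be carried out by direct enumeration. The following sketch is our own illustration: the helper names forall and exists and the domain {−3, . . . , 3} are assumptions made for the example, not part of the notes.

```python
def forall(domain, phi):
    """(∀x)(φ): Truth iff φ(x ← a) is Truth for all a in the domain."""
    return all(phi(a) for a in domain)

def exists(domain, phi):
    """(∃x)(φ): Truth iff φ(x ← a) is Truth for some a in the domain."""
    return any(phi(a) for a in domain)

D = range(-3, 4)  # hypothetical finite domain {-3, ..., 3}

# (∀x)(∃y)(x + y = 0): every element of D has an additive inverse in D.
inverse_for_each = forall(D, lambda x: exists(D, lambda y: x + y == 0))

# (∃y)(∀x)(x + y = 0): a single y that works for every x.
one_inverse_for_all = exists(D, lambda y: forall(D, lambda x: x + y == 0))

print(inverse_for_each, one_inverse_for_all)  # True False
```

The two outputs differ, which illustrates that the order of the quantifiers matters.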
2.4.2 Definition (Proposition). A proposition is a formula without free variables.
2.4.3 Remarks. Existential quantifiers complicate things in connection with
tautologies. For example, the tautology called the law of hypothetical syllogism,
(p → q) ∧ (q → r) → (p → r),
remains a tautology when we change the above to
(∃x)[(p(x) → q(x)) ∧ (q(x) → r(x)) → (p(x) → r(x))],
since (p(x) → q(x)) ∧ (q(x) → r(x)) → (p(x) → r(x)) is a tautology for every x.
On the other hand, the formula
(∃x)(p(x) → q(x)) ∧ (∃x)(q(x) → r(x)) → (∃x)(p(x) → r(x))
is not a tautology. To see this, take the domain of interpretation to be N ordered
in the usual way¶ and consider the formula
(2.3) (∃x)(x ≥ 0 → x < 9) ∧ (∃x)(x < 9 → x < 0) → (∃x)(x ≥ 0 → x < 0).
¶ This means that our language has a constant for each natural number and two binary (2-
ary) predicate letters E, L, where E is interpreted as “is equal to” and L as “is less than.” The
sentence x < y is really the sentence L(x, y), while the sentence w ≥ z is an abbreviation for
E(w, z) ∨ L(z, w).
The implication x ≥ 0 → x < 9 is true with x = 8. (Since the antecedent and the
consequent are both true.) The implication x < 9 → x < 0 is true with x = 10.
(Since the antecedent and the consequent are both false.) On the other hand, the
implication x ≥ 0 → x < 0 is always false in N, since the antecedent is always true
and the consequent is always false. Therefore, the antecedent of (2.3), i.e.:
(∃x)(x ≥ 0 → x < 9) ∧ (∃x)(x < 9 → x < 0)
is true, while its consequent (∃x)(x ≥ 0 → x < 0) is false. This shows that (2.3) is
not a tautology, i.e.: there is an interpretation of (2.3) that is false.
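The witnesses used in this argument can be checked numerically. In the sketch below (our own illustration) the range 0, . . . , 99 is a finite stand-in for N, so the computed value of the consequent only agrees with, and does not replace, the argument given above that x ≥ 0 → x < 0 is false for every natural number.

```python
def implies(a, b):
    """Truth-functional implication a → b."""
    return (not a) or b

N_sample = range(0, 100)  # finite stand-in for N (an assumption of this sketch)

witness_8 = implies(8 >= 0, 8 < 9)       # antecedent and consequent both true
witness_10 = implies(10 < 9, 10 < 0)     # antecedent and consequent both false

antecedent = (any(implies(x >= 0, x < 9) for x in N_sample)
              and any(implies(x < 9, x < 0) for x in N_sample))
consequent = any(implies(x >= 0, x < 0) for x in N_sample)

print(witness_8, witness_10)                                      # True True
print(antecedent, consequent, implies(antecedent, consequent))    # True False False
```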
On the other hand, the formula
is a tautology.
2.4.4 Example. Some Quantified Tautologies.
(1) (∃v)(φ) ↔ ¬(∀v)(¬(φ))
(2) (∀v)(φ) ↔ ¬(∃v)(¬(φ))
(3) (∃v1 )(∃v2 )(φ) ↔ (∃v2 )(∃v1 )(φ)
(4) (∀v1 )(∀v2 )(φ) ↔ (∀v2 )(∀v1 )(φ)
(5) (∀v)(φ1 ∧ φ2 ) ↔ (∀v)(φ1 ) ∧ (∀v)(φ2 )
(6) (∃v)(φ1 ∨ φ2 ) ↔ (∃v)(φ1 ) ∨ (∃v)(φ2 )
(7) (∀v)(φ1 → φ2 ) → ((∃v)(φ1 ) → (∃v)(φ2 ))
(8) (∃v)(φ1 ∧ φ2 ) → (∃v)(φ1 ) ∧ (∃v)(φ2 )
(9) (∀v)(φ1 ) ∨ (∀v)(φ2 ) → (∀v)(φ1 ∨ φ2 )
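Some of the laws above can be tested by brute force on small finite domains. The sketch below is our own illustration: it treats φ1 and φ2 as one-place predicates P and Q, tries every interpretation of them on domains with one to three elements, and also shows that the converse of (8) fails. Passing such a test is evidence only; it is not a proof that a formula is a tautology.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def holds_on_small_domains(law, max_size=3):
    """Test law(D, P, Q) for all unary predicates P, Q on domains of size 1..max_size."""
    for n in range(1, max_size + 1):
        D = range(n)
        for P in product([False, True], repeat=n):
            for Q in product([False, True], repeat=n):
                if not law(D, P, Q):
                    return False
    return True

laws = {
    5: lambda D, P, Q: all(P[v] and Q[v] for v in D) == (all(P[v] for v in D) and all(Q[v] for v in D)),
    6: lambda D, P, Q: any(P[v] or Q[v] for v in D) == (any(P[v] for v in D) or any(Q[v] for v in D)),
    7: lambda D, P, Q: implies(all(implies(P[v], Q[v]) for v in D),
                               implies(any(P[v] for v in D), any(Q[v] for v in D))),
    8: lambda D, P, Q: implies(any(P[v] and Q[v] for v in D),
                               any(P[v] for v in D) and any(Q[v] for v in D)),
    9: lambda D, P, Q: implies(all(P[v] for v in D) or all(Q[v] for v in D),
                               all(P[v] or Q[v] for v in D)),
}
converse_of_8 = lambda D, P, Q: implies(any(P[v] for v in D) and any(Q[v] for v in D),
                                        any(P[v] and Q[v] for v in D))

print({k: holds_on_small_domains(law) for k, law in laws.items()})
print(holds_on_small_domains(converse_of_8))  # False
```

The converse of (8) already fails on a two-element domain in which P holds only of the first element and Q only of the second.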
2.4.5 Remarks. Inconsistent Hypotheses. It is extremely important to realize
that we need consistent hypotheses to obtain valid results. If we deduce q using an
argument based on the tautology
(2.4) p1 ∧ . . . ∧ pn ⇒ q,
we can deduce anything if the premises are inconsistent. To illustrate this point,
suppose that we use the premises p and ¬p. We show below that
p, ¬p |= q
for any q whatsoever. To see this, we can use the following line of reasoning.
Since p, ¬p |= p ∧ ¬p and |= p → p ∨ q for any q whatsoever, we have
The above holds for arbitrary q. Therefore, it also holds for ¬q. No amount of
“logically correct” reasoning does any good if the premises are inconsistent: We
conclude that q and ¬q are both true. Since this cannot be, we cannot tolerate
inconsistent premises if we want a valid conclusion.
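The claim that inconsistent premises yield any conclusion whatsoever can be illustrated with a truth table. The sketch below (our own illustration) checks that (p ∧ ¬p) → q is true in every row, and that no row makes both premises p and ¬p true, so the implication holds vacuously.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

rows = [(p, q, implies(p and (not p), q)) for p, q in product([False, True], repeat=2)]
for row in rows:
    print(row)

ex_falso_is_tautology = all(value for _, _, value in rows)
premises_satisfiable = any(p and (not p) for p in [False, True])
print(ex_falso_is_tautology, premises_satisfiable)  # True False
```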
as
which says that if q follows from the hypotheses p1 , . . . , pn then pi → q follows from
the remaining hypotheses p1 , . . . , pi−1 , pi+1 , . . . , pn obtained by removing pi from the
original collection of hypotheses. Frequently, mathematical arguments are carried
out without listing all the hypotheses used, assuming that the reader is aware that
certain hypotheses are in place all the time. When this is the case, it is not unusual
to write pi |= q (or pi ⇒ q) instead of writing (2.6). Here, p1 , . . . , pi−1 , pi+1 , . . . , pn
are the hypotheses assumed to be in place by default.
1. (∃x)(∀y)(x = −y)
2. (∀y)(∃x)(x = −y)
3. (∀y)(∃x)(2x = y)
4. (∀y)(∃x)(x² = y)
5. (∀y)(∃x)(y² = x)
6. (∃x)(x² = x)
7. (∀x)(∀y)(∃z)(z = x + y)
8. (∃z)(∀y)(∃x)(z = x + y)
9. (∀x)(∃y)(x > 0 → x² = y)
11. (∀x)(∀y)(x = y² → x ≥ 0)
12. (∀x)(∀y)(y = x² → x ≥ 0)
13. (∀x)(∀y)(x² = y² → x = y)
3. Regard statements I. and II. as axioms (that is, assume that the interpretation
of both formulas φ and ψ is true). For each of the statements below, either
provide a proof using the axioms and tautological reasoning, or else provide
a counterexample showing the statement need not follow from the axioms.
Sets
Contents
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
3.2 Axioms of Extension and Separation . . . . . . . . . . . . . . 3-3
3.3 Intersections and Differences . . . . . . . . . . . . . . . . . . 3-8
3.4 Unions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
3.5 Powers, Products and Ordered Pairs . . . . . . . . . . . . . . 3-12
3.6 Appendix A: The Axiom Schema of Separation . . . . . . . . 3-16
3.1 Introduction
We cannot bootstrap mathematics from nothing: not everything can be defined
in terms of things previously defined, because we need a beginning. We need
starting points. Thus, an undefined primitive notion is a notion that is not defined
in terms of more elementary notions. The notion set is such a notion.
As we shall show in Section 3.2.13 below, paradoxes creep in far too easily in
the development unless we specify quite carefully how sets may be constructed.
For this we need axioms. An axiom is simply a proposition we take to be true
without proof. The language in which axioms must be written is the very small
language of the predicate calculus that we describe below. We must choose our
axioms carefully and our language accurately.
We develop the rudiments of Zermelo-Fraenkel set theory (ZF for short). Note
that this brand of set theory comes in many flavors and it is very likely that other
Note 3: Sets – July 28, 2020 3-2
sources may contain an axiomatization somewhat different from the one given here.
In ZF there are only sets and there is nothing else.
In developing a theory where only sets exist, we lose the intuitive notion of
sets of “individuals,” i.e.: a fleet of ships does not exist in the theory (there are
no ships, only sets). In other words, we can only talk about sets whose members
are also sets. Not only is this not devastating, but it is actually quite irrelevant for
our purposes. The first batch of sets that we shall manufacture to get the process
started are the “official” natural numbers (defined as sets).
Also an undefined primitive notion is that of belonging to (being a member of,
being an element of ) a set. If A is a set which has a as a member we say that
a belongs to A, or that a is in A, or that A contains a. We use the two-place
predicate ∈ (a variant of the Greek letter epsilon, which is usually written ϵ or ε)
and write x ∈ A (read: “x is an element of A,” or “x belongs to A,” or “x is in A,”
or “A contains x”) to denote that x is a member of A.
We write x ∉ A for ¬(x ∈ A) (read: “x is not an element of A,” or “x does not
belong to A,” or “x is not in A,” or “A does not contain x”).
As it would be rather pointless to develop a theory that applies to nothing, we
may as well suppose from the very beginning that a set exists. But, this assumption
is only temporary. Eventually, we will introduce an axiom to give us the existence
of a very special set (an inductive set), a set that will enable us to deduce the
existence of a set to replace our intuitive N (the natural numbers) with a formal
one: a set whose existence follows from the axioms presented. Everything else will
be built from there.
In summary, our language LZF of ZF set theory has infinitely many variable
letters, interpreted as names of sets (undefined primitive objects); no constant
letters; no function letters; and a single two-place predicate letter, interpreted as
the set membership relation, a primitive (undefined) predicate. Everything else
will be defined. Below, we begin by defining equality of sets.
3.1.1 Definition (Range of Quantification). When the quantification shown in a
formula is of the form ∀x or ∃y, as in
(∀x)(∃y)(φ(x, y)),
the domain of interpretation (also called range of quantification) for the variables
x and y is taken to be unrestricted. That is, x and y can be any sets whatsoever.
When the domain of a quantified variable is restricted to be a specific set, we may
indicate that set explicitly, as in
3.2.3 Remark (Distinct Sets). We write A ̸= B for ¬(A = B). Thanks to the
tautology ¬(p → q) ⇔ p ∧ ¬q, we have
¬(A = B) ⇔ ¬(∀x)(x ∈ A ↔ x ∈ B)
⇔ ¬(∀x)[(x ∈ A → x ∈ B) ∧ (x ∈ B → x ∈ A)]
⇔ (∃x)¬[(x ∈ A → x ∈ B) ∧ (x ∈ B → x ∈ A)]
⇔ (∃x)[¬(x ∈ A → x ∈ B) ∨ ¬(x ∈ B → x ∈ A)]
⇔ (∃x)[(x ∈ A ∧ x ∉ B) ∨ (x ∈ B ∧ x ∉ A)]
⇔ (∃x)(x ∈ A ∧ x ∉ B) ∨ (∃x)(x ∈ B ∧ x ∉ A).
In words: the sets A and B are distinct if and only if there is something in A that
is not in B or there is something in B that is not in A.
3.2.4 Remarks. If a set has precisely the members a1 , . . . , an , we denote it by
{a1 , . . . , an }. The set {a} which has a as the only member is called the singleton of
a. More generally, a one-member set may be called a singleton, without specifying
the member.
Sets have no ordering. An object either belongs to a set, or does not belong
to that set. It can neither belong twice nor can it belong before or after some
other object in the same set. (The order in which the elements of a set are listed
is immaterial.) For example, assuming for now that the objects listed are sets, we
have
B = {x ∈ A | φ(x)}
(read: “B is the set of all x in A such that φ(x)”). Note: We must have a set A
before we can define the set B.
Since we have an axiom for each formula φ(x), we have infinitely many axioms,
hence the name axiom schema. (The axiom schema of separation shall also be
called, colloquially and informally, the axiom of subsets.) Note that we cannot
write something like ∀φ, since a formula is a metalinguistic notion, outside of the
language we have chosen to develop set theory. We can quantify over sets, and
only sets, the only objects in our theory.
3.2.6 Definition (The Empty Set). If any set A exists, so must the set ∅ (read:
“empty set”) defined by
(3.4) ∅ = {x ∈ A | x ̸= x}.
Thanks to the axiom schema of separation, (3.4) defines a set. To show that ∅ has
no elements and is independent of the choice of A used in (3.4), note that if x ∉ A
then x cannot be in ∅. If x ∈ A then x cannot be in ∅ either, since x = x always
(2.2.1). In other words: (∀x)(x ∉ ∅); there is no x which belongs to ∅.
Could there be another set, different from ∅, that has no elements? Suppose
that E is any set with no elements. Then x ∈ ∅ ⇒ x ∈ E, since x ∈ ∅ is always
false. Similarly, x ∈ E ⇒ x ∈ ∅, since x ∈ E is always false. We have shown that
x ∈ ∅ ⇔ x ∈ E.
Thanks to the axiom of extension, ∅ = E. This shows that any two empty sets are
identical: There is only one empty set.
3.2.7 Definition (Subsets). If A and B are sets and every element of B is also an
element of A, we say that B is a subset of A and write B ⊂ A, or A ⊃ B (read:
“A is a superset of B”). The defining formula is
(∀A)(∀B)[B ⊂ A ↔ (∀x)(x ∈ B → x ∈ A)].
We also call the phrase “B ⊂ A” an inclusion and say that A includes B or that
B is included in A. If B is a subset of A, but B ̸= A, we say that B is a proper
subset of A.
To deny B ⊂ A or A ⊃ B, we write B ̸⊂ A (read: “B is not a subset of A”) or
A ̸⊃ B (read: “A is not a superset of B”), respectively. We have
¬(B ⊂ A) ⇔ ¬(∀x)(x ∈ B → x ∈ A)
⇔ (∃x)[¬(x ∈ B → x ∈ A)]
⇔ (∃x)(x ∈ B ∧ x ∉ A).
In words: B is not a subset of A if and only if there is something in B that is not
in A.
3.2.8 Example. Given the set A = {a, b, c}, its subsets are ∅, {a}, {b}, {c}, {a, b},
{a, c}, {b, c}, {a, b, c}. Except for the last, they all are proper subsets of A. Do not
confuse the notion of set membership with the notion of subsets: in this example
a ∈ A, but a ̸⊂ A. For B = {a, {a}}, we have {a} ∈ B and {a} ⊂ B.
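The example can be reproduced with Python's frozenset type, as a rough sketch only. The helper subsets is our own, and the strings "a", "b", "c" stand in for the members of A; unlike in ZF, they are atoms rather than sets, which is an assumption of this illustration.

```python
from itertools import combinations

def subsets(s):
    """List all subsets of the finite set s as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

A = frozenset({"a", "b", "c"})
all_subsets = subsets(A)
proper_subsets = [s for s in all_subsets if s != A]
print(len(all_subsets), len(proper_subsets))   # 8 7

# Membership versus inclusion for B = {a, {a}}:
a = "a"
B = frozenset({a, frozenset({a})})
print(frozenset({a}) in B)    # True: {a} is a member of B
print(frozenset({a}) <= B)    # True: {a} is also a subset of B, since a ∈ B
```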
3.2.9 Theorem (Singletons). The singleton of every member of a nonempty set
is a set.
Proof. If A is not empty, for every member a ∈ A, we have
{a} = {x ∈ A | x = a}.
3.2.10 Remark (Russell’s Paradox). The argument used in the proof of the the-
orem below is known as Russell’s paradox. Of course, in the present situation it
does not lead to a paradox; but it can if the axioms are not specified properly.
3.2.11 Theorem (No Set Contains Everything). For every set A there is a set B
such that B ∉ A.
Proof. Let A be any set. Use the axiom schema of separation to define the set
(3.5) B = {x ∈ A | x ∉ x},
where φ(x) is the formula x ∉ x. Suppose that B ∈ A. If B ∈ B, then, thanks to (3.5),
φ(B) is true, that is, B ∉ B. We obtain
(3.6) B ∈ A → (B ∈ B → B ∉ B).
If B ∉ B, then φ(B) is true. Therefore, thanks to (3.5), B ∈ B. We obtain
(3.7) B ∈ A → (B ∉ B → B ∈ B).
Combining (3.6) and (3.7), we obtain
B ∈ A → (B ∈ B ↔ B ∉ B),
and, thanks to the tautology
[p → (q ↔ ¬q)] → ¬p
we conclude that B ∉ A.
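The set B of the proof can be computed for small hereditarily finite sets, as in the sketch below (our own illustration; the function name russell_subset is hypothetical). A Python frozenset can never contain itself, so here x ∉ x holds for every x, the construction returns B = A, and B ∉ A is immediate. The sketch therefore only illustrates the statement of the theorem; it does not exercise the full argument.

```python
def russell_subset(A):
    """B = {x ∈ A | x ∉ x}, formed from A as in the axiom schema of separation."""
    return frozenset(x for x in A if x not in x)

empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
A = frozenset({empty, one, two})

B = russell_subset(A)
print(B == A)    # True: every member of A satisfies x ∉ x
print(B in A)    # False: B is not a member of A
```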
3.2.12 Corollary (Nonexistence of a Universal Set). There is no set such that all
sets belong to it.
x ∈ A ↔ x ∉ x.
B = {x ∈ A | x ∉ B}.
If A is the empty set, then B is also the empty set. If A is not empty, we have
a ∈ A for some a. Then,
a ∈ B → a ∉ B
and
a ∉ B → a ∈ B,
whence
a ∈ B ↔ a ∉ B,
∗ Gottlob Frege, Grundgesetze der Arithmetik, 1893.
† See Patrick Suppes, Axiomatic Set Theory, Dover, New York, 1972, pp. 5–12.
‡ Ernst Zermelo, “Untersuchungen ueber die Grundlagen der Mengenlehre,” Math. Annalen,
1908, Vol. 65, pp. 261-281.
§ Burali-Forti, C. “Una questione sui numeri transfiniti,” Rendiconti del Circolo Mat. di
Palermo, 1897, Vol. 11, pp. 154-164.
a contradiction. Hence, the existence of the set B, guaranteed by the axiom schema
of separation, leads to a contradiction. We cannot escape contradiction by insisting
that B is always empty, for if B is empty and A is not, there must be some a ∈ A
with a ∉ B, whence a ∈ B, making B nonempty.
A ∩ B = {x ∈ A | x ∈ B},
or
A ∩ B = {x ∈ B | x ∈ A}.
In other words:
x ∈ A ∩ B ⇔ x ∈ A ∧ x ∈ B.
3.3.2 Definition (Disjoint Sets). Two sets are said to be disjoint just in case they
have no element in common, i.e.: A and B are disjoint iff A ∩ B = ∅.
A \ B = {x ∈ A | x ̸∈ B}.
That is: x ∈ A \ B ⇔ x ∈ A ∧ x ∉ B. (See Figure 3.2.)
[Figure 3.2: diagram of the sets A and B.]
That the above definition does not depend on the choice of A is clear upon examining
the condition for membership in ∩S (since A is an element of S and, therefore,
is one of the y’s):
x ∈ ∩S ⇔ (∀y ∈ S)(x ∈ y).
Alternate notations are:
∩_{A∈S} A and ∩{A | A ∈ S}.
3.3.5 Examples.
1. Let A = {0, 1, 3, 4, 5} and B = {2, 3, 5, 6}. Then, A ∩ B = {3, 5}.
2. Let A = {a, b, c, d} and B = {a, c, e}. Then, A \ B = {b, d}.
3. For every set A we have that A \ A = ∅ and A \ ∅ = A.
4. Let R denote the reals and Q denote the rationals. Then R \ Q denotes the
irrationals.
5. Let A = {1, 2, 3}, B = {0, 1, 2}, C = {2, 3, 4} and S = {A, B, C}. Then
∩S = {2}.
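The examples above can be reproduced with Python's built-in set operations, as in the following sketch (our own illustration; & is intersection and - is difference, and set.intersection plays the role of ∩S for a finite nonempty collection S).

```python
# Example 1: intersection
A = {0, 1, 3, 4, 5}
B = {2, 3, 5, 6}
print(A & B == {3, 5})                # True

# Example 2: difference
A2 = {"a", "b", "c", "d"}
B2 = {"a", "c", "e"}
print(A2 - B2 == {"b", "d"})          # True

# Example 3: A \ A = ∅ and A \ ∅ = A
print(A - A == set(), A - set() == A)  # True True

# Example 5: the intersection of a collection S = {A, B, C}
S = [{1, 2, 3}, {0, 1, 2}, {2, 3, 4}]
intersection_of_S = set.intersection(*S)
print(intersection_of_S == {2})       # True
```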
3.4 Unions
According to the axiom schema of separation, we must specify a set C to define
the intersection
A ∩ B = {x ∈ C | x ∈ A ∧ x ∈ B}.
3.4.1 Axiom of Unions. Given any collection S of sets, there is a set which
contains all the elements of the members of S. That is, given any set S, there is a
set U such that
A ∈ S ∧ x ∈ A ⇒ x ∈ U.
Of course, U may contain much more. Put more precisely, we have:
(∀S)(∃U )(∀A)(∀x)(A ∈ S ∧ x ∈ A → x ∈ U ).
From the above, given any two sets A and B, can we form their union A ∪ B?
Of course we can, if A and B are both members of some set C. However, in general,
we do not know whether such a set C always exists for any two given sets A and B.
Therefore, the answer is negative, unless we invent another axiom.
3.4.4 Axiom of Pairing. Any two sets are both members of some set:
(∀A)(∀B)(∃C)(A ∈ C ∧ B ∈ C).
3.4.5 Definition (Unordered Pairs). Thanks to the axiom of pairing, any two sets
A and B belong to a third set C. But, C may contain more. The axiom schema
of separation gives us the set containing precisely A and B:
{A, B} = {x ∈ C | x = A ∨ x = B}.
3.4.6 Remark. Thanks to the axiom of pairing, for any two sets A and B the
unordered pair {A, B} is a set. The axiom of unions gives us a set U which contains
the elements of A and B. The axiom schema of separation gives us the set
A ∪ B = {x ∈ U | x ∈ A ∨ x ∈ B}.
Proof. When A = B, the unordered pair {A, B} is {A, A} = {A}. This says that
if A is a set, so is the singleton {A} of that set.
3.4.9 Proposition (There Is No Biggest Set). Every set is a proper subset of some
other set.
3.4.10 Proposition (De Morgan’s Laws). For any three sets A, B and C we have
(3.8) A \ (B ∪ C) = (A \ B) ∩ (A \ C),
(3.9) A \ (B ∩ C) = (A \ B) ∪ (A \ C).
x ∈ A \ (B ∪ C) ⇔ x ∈ A ∧ x ∉ (B ∪ C)
⇔ x ∈ A ∧ ¬(x ∈ B ∨ x ∈ C)
⇔ x ∈ A ∧ (x ∉ B ∧ x ∉ C)
⇔ (x ∈ A ∧ x ∉ B) ∧ (x ∈ A ∧ x ∉ C)
⇔ x ∈ A \ B ∧ x ∈ A \ C
⇔ x ∈ (A \ B) ∩ (A \ C).
x ∈ A \ (B ∩ C) ⇔ x ∈ A ∧ x ∉ (B ∩ C)
⇔ x ∈ A ∧ ¬(x ∈ B ∧ x ∈ C)
⇔ x ∈ A ∧ (x ∉ B ∨ x ∉ C)
⇔ (x ∈ A ∧ x ∉ B) ∨ (x ∈ A ∧ x ∉ C)
⇔ x ∈ A \ B ∨ x ∈ A \ C
⇔ x ∈ (A \ B) ∪ (A \ C).
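As a sanity check on (3.8) and (3.9), the sketch below (our own illustration) tests both identities for every triple of subsets of a three-element set. An exhaustive check on one small universe is not a proof for arbitrary sets; the element-wise argument above is.

```python
from itertools import combinations

universe = [0, 1, 2]
subsets = [frozenset(c) for r in range(len(universe) + 1)
           for c in combinations(universe, r)]

def de_morgan_union(A, B, C):
    """(3.8): A \\ (B ∪ C) = (A \\ B) ∩ (A \\ C)."""
    return (A - (B | C)) == ((A - B) & (A - C))

def de_morgan_intersection(A, B, C):
    """(3.9): A \\ (B ∩ C) = (A \\ B) ∪ (A \\ C)."""
    return (A - (B & C)) == ((A - B) | (A - C))

all_hold = all(de_morgan_union(A, B, C) and de_morgan_intersection(A, B, C)
               for A in subsets for B in subsets for C in subsets)
print(len(subsets) ** 3, all_hold)  # 512 True
```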
3.5.1 Axiom of Powers. All subsets of a given set are elements of some other
set. That is, given any set A there is a set P such that if B is a subset of A then
B ∈ P:
(∀A)(∃P )(∀B)(B ⊂ A → B ∈ P ).
3.5.2 Definition (The Power Set of a Set). With every set A we associate the set
P(A), the set of all subsets of A, called the power set of A:
P(A) = {x ∈ P | x ⊂ A},
P({1, 2, 3}) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.
3.5.4 Remark. When we list two objects, a and b, in that order, we speak of the
ordered pair “first a, then b”. We denote this ordered pair by (a, b). Instead of
introducing a new undefined primitive notion, we shall define (a, b) as a set. To do
this, we need to clear up some technicalities first.
3.5.5 Proposition (Background for Ordered Pairs). For every pair of sets A and
B and for every x ∈ A and every y ∈ B
and
x ∈ A ∧ y ∈ B ⇒ {x} ⊂ A ∪ B ∧ {y} ⊂ A ∪ B
⇒ {x, y} ⊂ A ∪ B
⇒ {x, y} ∈ P(A ∪ B).
Since {x} and {x, y} are both members of P(A ∪ B), we have
which implies
{{x}, {x, y}} ∈ P(P(A ∪ B)).
3.5.7 Proposition (Equality of Ordered Pairs). Two ordered pairs (x, y) and
(x′, y′) are equal if and only if x = x′ and y = y′.
(x, y) = (x′, y′) ⇒ x = x′ ∧ y = y′.
3.5.8 Definition (Cartesian Products). Given two sets A and B, for x ∈ A and
y ∈ B, the set {{x}, {x, y}} belongs to P(P(A ∪ B)). We define the Cartesian (or
cross) product A × B (read: “A cross B”) of A and B by
A × B = {(x, y) ∈ P(P(A ∪ B)) | x ∈ A ∧ y ∈ B}.
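For small finite sets, the definition can be followed literally: form P(P(A ∪ B)) and separate out the members of the form {{x}, {x, y}} with x ∈ A and y ∈ B. The sketch below is our own illustration; the helper names power_set and pair are assumptions, and the integers 1, 2, 3 stand in for sets.

```python
from itertools import combinations

def power_set(s):
    """The set of all subsets of the finite set s."""
    items = list(s)
    return frozenset(frozenset(c) for r in range(len(items) + 1)
                     for c in combinations(items, r))

def pair(x, y):
    """The ordered pair (x, y) coded as the set {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})

A = frozenset({1, 2})
B = frozenset({2, 3})
PP = power_set(power_set(A | B))   # P(P(A ∪ B))

# Separation: keep the members of P(P(A ∪ B)) that are pairs (x, y) with x ∈ A, y ∈ B.
A_cross_B = frozenset(p for p in PP
                      if any(p == pair(x, y) for x in A for y in B))

print(len(PP), len(A_cross_B))                  # 256 4
print(pair(1, 2) != pair(2, 1))                 # True: order matters
print(pair(2, 2) == frozenset({frozenset({2})}))  # True: (x, x) = {{x}}
```

Enumerating P(P(A ∪ B)) is feasible only for very small sets, since it has 2^(2^n) members when A ∪ B has n members.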
3.5.10 Exercise. Commutativity: Prove the following for any two sets A and B.
(a) A ∪ B = B ∪ A,
(b) A ∩ B = B ∩ A.
3.5.11 Exercise. Associativity: Prove the following for any three sets A, B,
and C.
(a) (A ∪ B) ∪ C = A ∪ (B ∪ C),
(b) (A ∩ B) ∩ C = A ∩ (B ∩ C).
3.5.12 Exercise. Distributivity: Prove the following for any three sets A, B,
and C.
(a) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C),
(b) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
3.5.13 Exercise. Prove the following for any two sets A and B.
(a) A = B ⇔ A ⊂ B ∧ B ⊂ A,
(b) A ⊂ B ⇔ A ∪ B = B,
(c) A ⊂ B ⇔ A ∩ B = A,
(d) A ⊂ B ⇔ A \ B = ∅.
3.5.14 Exercise. Prove the following for any four sets A, B, C and D.
(a) A \ (A \ B) = A ∩ B,
(b) A ⊂ B ∧ B ⊂ C ⇒ A ⊂ C,
(c) A ∩ (B \ C) = (A ∩ B) \ (A ∩ C),
(d) (A ∪ B) × C = (A × C) ∪ (B × C),
(e) (A ∩ B) × (C ∩ D) = (A × C) ∩ (B × D),
(f ) (A \ B) × C = (A × C) \ (B × C).
3.5.15 Exercise. Symmetric Difference: Define the binary operation ∆ (sym-
metric difference) between the sets A and B by
A∆B = (A \ B) ∪ (B \ A).
(See Figure 3.4.) Prove that this operation is commutative, associative, A∆∅ = A
and A∆A = ∅.
Note the alternate definition
A∆B = (A ∪ B) \ (A ∩ B).
Prove that
(A ∪ B) \ (A ∩ B) = (A \ B) ∪ (B \ A)
for all sets A and B.
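Before attempting the proofs, the claimed properties of ∆ can be tested on all subsets of a small set. The sketch below is our own illustration (the function name sym_diff is hypothetical); it gives evidence for the identities but is no substitute for the proofs the exercise asks for.

```python
from itertools import combinations

def sym_diff(A, B):
    """A ∆ B = (A \\ B) ∪ (B \\ A)."""
    return (A - B) | (B - A)

universe = [0, 1, 2]
subsets = [frozenset(c) for r in range(len(universe) + 1)
           for c in combinations(universe, r)]

laws_hold = all(
    sym_diff(A, B) == sym_diff(B, A)                                   # commutativity
    and sym_diff(A, B) == (A | B) - (A & B)                            # alternate definition
    and sym_diff(sym_diff(A, B), C) == sym_diff(A, sym_diff(B, C))     # associativity
    for A in subsets for B in subsets for C in subsets
)
identities_hold = all(sym_diff(A, frozenset()) == A and sym_diff(A, A) == frozenset()
                      for A in subsets)
print(laws_hold, identities_hold)  # True True
```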
[Figure 3.4: diagram of the sets A and B.]
3.5.16 Exercise. Every product A×B is a set of ordered pairs. Prove the following
converse: for every set S of ordered pairs, there are sets A and B such that S ⊂
A × B.
3.5.17 Exercise. Does the trick used to define ordered pairs extend to define or-
dered triples? Define the triple (x, y, z) = {{x}, {x, y}, {x, y, z}}. If the statement
(x, y, z) = (x′, y′, z′) → x = x′ ∧ y = y′ ∧ z = z′
What is the difference between the formulation given here and that given in Sec-
tion 3.2.5? We repeat the latter below.
(S2) Let φ(x) be a formula where the variable x is free and where the variable Y
is not free. Then,
Now, (3.10) plainly states that for every pair of sets A and B there is a set
C = A ∩ B. We need only one formula, ψ(x, B) ⇔ x ∈ B. Only one axiom is
needed to obtain all intersections A ∩ B. On the other hand, (3.11) says that for
every A there is a C = A ∩ B, which depends on the formula φ(x) ⇔ x ∈ B. We
can get any A ∩ B we want, but we must change the formula and, therefore, rely on
a different axiom for each choice of B. We no longer obtain all intersections A ∩ B
with a single axiom. In the end, both (S1) and (S2) serve the same purpose.
Note that we cannot modify (S2) to read
We have agreed to quantify only over sets, not formulas. We are using what is
called a first-order predicate language.
We close by pointing out that the axiom schema of separation follows from the
axiom schema of replacement, which shall be presented later.
Foundations Note 4
Contents
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
4.2 The Set of Natural Numbers . . . . . . . . . . . . . . . . . . 4-2
4.3 The Peano Postulates . . . . . . . . . . . . . . . . . . . . . . 4-3
4.4 Trichotomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8
4.1 Introduction
Assuming, for the time being, that a set exists and, recalling that
• if a set exists so does the empty set ∅ and
• if a is a set, so are {a} and a ∪ {a},
we can define the first four natural numbers as:
0 = ∅,
1 = 0 ∪ {0} = {0},
2 = 1 ∪ {1} = {0} ∪ {{0}} = {0, {0}} = {0, 1},
3 = 2 ∪ {2} = {0, 1} ∪ {{0, 1}} = {0, 1, {0, 1}} = {0, 1, 2}.
Expressed just in terms of the empty set, the above are:
0 = ∅,
1 = {∅},
2 = {∅, {∅}},
3 = {∅, {∅}, {∅, {∅}}}.
Note 4: The Natural Numbers – July 28, 2020 4-2
The numbers just defined are all distinct from each other. Other consequences
of the above definitions are 0 ∈ 1 ∈ 2 ∈ 3 and 0 ⊂ 1 ⊂ 2 ⊂ 3.
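The first four natural numbers can be built exactly as above using Python's frozenset, as a small sketch (our own illustration; the function name successor is an assumption). It confirms that 3 = {0, 1, 2}, that each number is both a member and a proper subset of the next, and that the number n has n members.

```python
def successor(x):
    """Return x ∪ {x}."""
    return x | frozenset({x})

naturals = [frozenset()]              # 0 = ∅
for _ in range(3):
    naturals.append(successor(naturals[-1]))
zero, one, two, three = naturals

print(three == frozenset({zero, one, two}))                         # True
print(all(naturals[i] in naturals[i + 1] for i in range(3)))        # True: 0 ∈ 1 ∈ 2 ∈ 3
print(all(naturals[i] < naturals[i + 1] for i in range(3)))         # True: proper inclusions
print([len(n) for n in naturals])                                   # [0, 1, 2, 3]
```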
We have defined just the first four natural numbers. To define them all and
package them in a set that can replace our intuitive notion N we need a new axiom.
Below, we show how to accomplish this.
Define the successor of any set x to be the set x⁺ = x ∪ {x}. Then, an inductive
set is a set that contains the empty set and the successor of each of its members.
4.2.2 Remark. At this point we need an axiom to ensure that an inductive set
exists.
4.2.4 Remarks. Note that the above axiom can also be written
4.2.5 Definition. The Set of Natural Numbers: The set of natural numbers ω
is defined as the smallest (by inclusion) inductive set. Let ϕ(B) denote (4.1) with
A replaced by B. Let A denote the inductive set whose existence is guaranteed by
the axiom of infinity. Define
ω = {x ∈ A | (∀B)(ϕ(B) → x ∈ B)}.
4.2.6 Proposition. The set ω of natural numbers is the smallest inductive set.
Proof. By the axiom of extension, the set ω is defined uniquely. We show that ω is
inductive. We have ∅ ∈ ω because ∅ belongs to every inductive set. If x ∈ ω, then
x must belong to every inductive set B. Therefore, x ∪ {x} also belongs to every
inductive set B, and, therefore, to ω. This shows that ω is inductive. We now show
the minimality of ω. Note that for every x, x belongs to ω if and only if x belongs
to every inductive set. Therefore, ω is a subset of every inductive set.
(P 1) 0 ∈ ω
and
(P 2) (∀n) (n ∈ ω → n⁺ ∈ ω).
(P 4) (∀n) (n ∈ ω → n⁺ ≠ 0),
0 ∈ A ∧ (∀n)(n ∈ A → n⁺ ∈ A).
(4.2) m ∈ n ∪ {n} ⇔ m ∈ n ∨ m ∈ {n}
⇔ m ∈ n ∨ m = n.
Proof. First we prove (I) by induction. Consider the set A of all natural numbers
having the property (4.3):
A = {n ∈ ω | n ⊂ ω}.
To prove (I), it suffices to show that A = ω. We need to prove that A fulfills the
criteria of (P 3):
(i) We show that 0 ∈ A (the base step of the inductive proof). This is obvious,
since 0 = ∅ ⊂ ω.
Now we prove (II): Let A be the set of all natural numbers defined by
(4.5) A = {n ∈ ω | (∀m) (m ∈ n → n ⊄ m)}
= {n ∈ ω | (∀m) (n ⊂ m → m ∉ n)}.
(∀n)(n ∈ A → n⁺ ∈ A).
(∀a)(a ∈ A → a ⊂ A),
A = {n ∈ ω | (∀m) (m ∈ n → m ⊂ n)}.
or, by contraposition,
If m = n, we are finished. If m ̸= n, since (4.6) and (4.7) are both true, we have
m ∈ n and n ∈ m. Therefore, since n and m are transitive by Proposition 4.3.6,
we have m ⊂ n and n ⊂ m. That is: m = n.
4.3.8 Exercise. Prove that every natural number other than zero is the successor
of a natural number.
4.3.9 Example. We end this section with an informal example of a proof by strong
induction. We have not yet introduced either the arithmetic operations or the order
of ω (a task we will postpone until Note 6), so we are forced to provide an informal
example that relies on our physical intuition and some (yet unproven) properties
of natural numbers.
Suppose that n is a natural number and we are in possession of a rectangular
chocolate bar of size n + 1 by 1. (We take the length of the bar to be n + 1 so that
even when n = 0 we have a 1-by-1 bar rather than no chocolate at all.) Suppose
this bar is meant to be cut into 1-by-1 pieces (n+1 of them). It is very easy to show,
using ordinary induction, that n cuts will be necessary if, at each successive step,
we cut off (say) the rightmost 1-by-1 square, repeating n times over. It should also
be intuitively clear that, regardless of the position of the successive cuts, exactly
n cuts will be necessary. The latter statement is most easily proved by strong
induction.
Claim. It takes n cuts to break an (n + 1)-by-1 chocolate bar into 1-by-1 pieces,
regardless of the order in which the cuts are performed.
Proof. Let A be the set of natural numbers n for which the statement above is
true. By strong induction on n, we will show that A = ω.
The truth of the statement for n = 0 is obvious, since 0 + 1 = 1 and a 1-by-1
chocolate bar is already cut into 1-by-1 pieces, so zero cuts are needed.
Assume now that n is a nonzero natural number such that n ⊂ A; in other
words, every predecessor m of n is a member of A (this is the strong inductive
hypothesis). Perform a first cut on the bar in any allowable place; this splits the bar
into two smaller bars of sizes a-by-1 and b-by-1. Neither a nor b is zero; therefore
(by Exercise 4.3.8), we have a = k+ = k + 1 and b = m+ = m + 1 for some natural
numbers k, m. Moreover, the fact that no chocolate was lost or added in the
process of cutting means that a + b = n + 1.
Since neither a nor b is zero, it follows from a + b = n+ that both a and b are
predecessors of n+ (though we haven’t proved this). Hence, k and m are predecessors of n
(this is proved in Proposition 4.4.7 below).
By the strong inductive hypothesis, both k, m are members of A; therefore,
the a-by-1 bar requires k cuts to be split into 1-by-1 pieces and the b-by-1 bar
requires m.
We conclude that k + m cuts will always be required to cut the two smaller
bars into 1-by-1 pieces. Adding the first cut that split the (n + 1)-by-1 bar into
two, we see that the total number of cuts needed is equal to 1 + k + m =
a + b − 1 = n; hence the statement is true for n, so n ∈ A, and the strong induction
is complete.
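The claim can also be checked by brute force for small bars. The sketch below mirrors the strong induction: it tries every position for the first cut and combines the results for the two smaller bars. The function name cut_counts is an illustrative choice, not part of the text.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def cut_counts(length: int) -> frozenset:
    """All possible total numbers of cuts that reduce a length-by-1 bar
    to 1-by-1 pieces, taken over every order of cutting."""
    if length == 1:
        return frozenset({0})            # already a single square
    totals = set()
    for a in range(1, length):           # first cut: an a-by-1 and a b-by-1 bar
        b = length - a
        for left in cut_counts(a):
            for right in cut_counts(b):
                totals.add(1 + left + right)
    return frozenset(totals)


# An (n + 1)-by-1 bar always needs exactly n cuts.
for n in range(12):
    assert cut_counts(n + 1) == frozenset({n})
```

Each call returns a one-element set, which is the computational counterpart of "regardless of the order in which the cuts are performed".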
4.4 Trichotomy
If m and n are natural numbers, m ∈ n does not formalize the intuitive notion that
“m is less than n” unless exactly one of m ∈ n, m = n and n ∈ m is true. We show
that this is indeed the case.
4.4.1 Proposition. For any two natural numbers m and n at most one of the
statements m ∈ n, m = n and n ∈ m can be true.
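For small cases the statement can be tested directly. This is a minimal sketch, assuming natural numbers are modelled as nested frozensets (the helper von_neumann is hypothetical); it checks that, for the numbers 0 to 5, exactly one of the three statements holds.

```python
def von_neumann(k: int) -> frozenset:
    """Build the set representing k, using n+ = n ∪ {n} and 0 = ∅."""
    n = frozenset()
    for _ in range(k):
        n = n | frozenset([n])
    return n


numbers = [von_neumann(k) for k in range(6)]

for m in numbers:
    for n in numbers:
        holds = [m in n, m == n, n in m]
        assert sum(holds) == 1           # exactly one of the three is true
```

A finite check of this kind is only a sanity test; it says nothing about arbitrary m and n.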
Cn = {m ∈ ω | m ∈ n ∨ m = n ∨ n ∈ m}
N = {n ∈ ω | Cn = ω}
(the set of all natural numbers comparable with every natural number).
4.4.3 Remarks. Note that n ∈ N is equivalent to Cn = ω. From the definition of
Cn it is clear that
(4.9) n ∈ Cn
and
(4.10) m ∈ n → m ∈ Cn .
(4.11) m ∈ Cn ↔ n ∈ Cm .
m ∈ n, m = n, n ∈ m.
to conclude that N = ω.
Suppose that n ∈ N for some natural number n, i.e.: suppose that Cn = ω.
The proof that Cn+ = ω is inductive. We prove that 0 ∈ Cn+ . Then we prove that
m ∈ Cn+ → m+ ∈ Cn+
(b) We prove that m ∈ Cn+ → m+ ∈ Cn+ . Suppose that m ∈ Cn+ . There are
three cases to consider: m ∈ n+ , m = n+ and n+ ∈ m.
Proof. Recall that a natural number is a transitive set (it includes everything it
contains). In other words, for n and ℓ in ω we have
(4.13) n ∈ ℓ ⇒ n ⊂ ℓ.
(4.14) m ∈ n ∧ n ∈ ℓ ⇒ m ∈ ℓ.
We prove that
(4.15) m+ ∈ n+ ⇒ m ∈ n.
(4.16) m ∈ n → m+ ∈ n+ .
m+ ∈ n+ ⊂ (n+ )+ .
Hence, m+ ∈ (n+ )+ .
m+ = n+ ∈ (n+ )+ .
Hence, m+ ∈ (n+ )+ .
m + n = (. . . ((m + 1) + 1) . . .) + 1 (with 1 added n times).
4.4.10 Exercise. Prove that if A is a set such that (4.17) holds, then A is transitive.
4.4.11 Exercise. Prove that if A is a transitive set, then ∪A ⊂ A.
4.4.12 Exercise. Prove that if a is a transitive set, then ∪(a+ ) = a.
4.4.13 Exercise. Prove that if A is a set such that ∪A ⊂ A, then A is transitive.
4.4.14 Exercise. Prove that if a is a transitive set, then a+ is a transitive set.
4.4.15 Exercise. Prove that ω is transitive.
4.4.16 Exercise. Prove that if A is a transitive set, then P(A) is a transitive set.
4.4.17 Exercise. Prove that if A is a transitive set, then ∪A is a transitive set.
4.4.18 Exercise. Prove that if every member of a set A is a transitive set, then
∩A is a transitive set.
4.4.19 Exercise. Prove that if every member of a set A is a transitive set, then
∪A is a transitive set.
4.4.20 Exercise. Prove Postulate (P 3′ ) (the principle of strong induction).
Suggestion: Show that if A satisfies the premises of (P 3′ ), then the set
B = {n ∈ ω | n ⊂ A}
Contents
5.1 Functions, Forward and Inverse Images . . . . . . . . . . . . 5-1
5.2 Compositions and Identities . . . . . . . . . . . . . . . . . . 5-5
5.3 Injections, Surjections, Bijections and Inverses . . . . . . . . 5-7
5.4 Properties of Functions . . . . . . . . . . . . . . . . . . . . . 5-11
5.5 Indexed Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
5.6 Equinumerous Sets . . . . . . . . . . . . . . . . . . . . . . . 5-17
5.7 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-21
5.8 The Recursion Theorem . . . . . . . . . . . . . . . . . . . . . 5-26
5.9 Cantor and Schröder-Bernstein Theorems . . . . . . . . . . . 5-30
Note 5: Functions and Relations – July 28, 2020 5-2
N, the integers Z, the rationals Q and the reals R with the usual arithmetic op-
erations will be used in many of the examples given to show the meaning of what
we are doing. Recall that we start the natural numbers with 0, not 1 (as some
authors do). The examples are not part of the official development of ideas, but
are intended to be useful in understanding where these ideas will eventually lead.
5.1.2 Definition. Functions. A function f from (or on) A to (or into) B is a
subset of A×B such that for each a ∈ A there is exactly one ordered pair (a, b) ∈ f .
We usually write∗ f : A → B, or A → B with f written above the arrow, to say that f is a function from A to
B. The set A is called the domain of f and B is called the codomain of f . When
the name of the function is not mentioned, we simply write A → B.
If (a, b) ∈ f , we write f (a) for b and we say that f maps a to f (a). If the value
f (a) is given by some expression E(a) (a formula where the variable a is free), we
may write
f : A → B : a ↦ E(a)
(read: “a maps to E(a)”). The text f (a) = E(a) is called the rule (for computing
the value) of f .
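For finite sets, Definition 5.1.2 can be tested mechanically. The following is a minimal sketch, assuming a function is stored as a Python set of pairs; the name is_function and the sample sets are illustrative only.

```python
def is_function(f: set, A: set, B: set) -> bool:
    """True when f ⊂ A × B and each a ∈ A occurs in exactly one pair (a, b) ∈ f."""
    if not all(a in A and b in B for (a, b) in f):
        return False                     # f is not a subset of A × B
    return all(sum(1 for (x, _) in f if x == a) == 1 for a in A)


A, B = {0, 1, 2}, {"p", "q"}

assert is_function({(0, "p"), (1, "p"), (2, "q")}, A, B)
assert not is_function({(0, "p"), (1, "p")}, A, B)                        # 2 has no value
assert not is_function({(0, "p"), (0, "q"), (1, "p"), (2, "q")}, A, B)    # 0 has two values
```

Note that the empty set passes the test when A is empty, in agreement with the convention that the empty function is a function.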
5.1.3 Examples. Below are two simple examples showing how functions may be
defined.
f = {(x, y) ∈ R × R | y = x²}.
A subset f of A × B is a function if
B^A = {f ∈ P(A × B) | f is a function}.
In plainer language, given any two sets A and B, the collection of all functions
from A to B is a set. This set is denoted by B^A.
5.1.8 Remarks. A function is a set. Therefore, equality of functions is simply
equality of sets. That is, the functions f and g are equal if and only if the domains
are equal, the codomains are equal and f (x) = g(x) for all x in the domain.
Note that if A is empty so is A × B. This situation gives rise to the empty
function (which is just the empty set ∅). That there is only one empty function is
a consequence of the axiom of extension.
Do not confuse f with f (a). If f : A → B then f ∈ B^A, while f (a) ∈ B.
5.1.9 Examples.
(3) The set R^N is the collection of all “sequences” f (0), f (1), f (2), . . . of real
numbers.
f∗ : P(A) → P(B),
f ∗ : P(B) → P(A),
the inverse image function of f . These two functions are defined as follows. Given
any A0 ⊂ A, f∗ (A0 ) is the collection of all f (a) for a ∈ A0 . Given any B0 ⊂ B,
f ∗ (B0 ) is the collection of all a ∈ A such that f (a) ∈ B0 . The function value f∗ (A)
is called the range (or image) of f and is denoted by ranf .
5.1.14 Example. Suppose
f : {0, 1, 2} → {0, 1}
is given by
f (0) = f (1) = f (2) = 0.
Then,
(5) f ∗ ({1}) = ∅.
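The forward and inverse images of this example can be computed directly. The sketch below assumes a finite function is stored as a Python dict; the names forward_image and inverse_image are illustrative stand-ins for f∗ (subscript) and f ∗ (superscript).

```python
def forward_image(f: dict, A0: set) -> set:
    """Collection of all f(a) for a in A0."""
    return {f[a] for a in A0}


def inverse_image(f: dict, B0: set) -> set:
    """Collection of all a in the domain with f(a) in B0."""
    return {a for a in f if f[a] in B0}


f = {0: 0, 1: 0, 2: 0}                   # f : {0, 1, 2} → {0, 1}, constantly 0

assert forward_image(f, {0, 1, 2}) == {0}        # the range of f
assert inverse_image(f, {0}) == {0, 1, 2}
assert inverse_image(f, {1}) == set()            # item (5)
```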
[Figure: diagram of the composition g ◦ f along A → B → C, followed by diagrams along A → B → C → D showing h ◦ (g ◦ f ) and (h ◦ g) ◦ f .]
5.2.2 Definition. Identities: With every set X we associate the identity function
of X, defined by
idX : X → X : x ↦ x.
In other words,
[Figure: f : A → B, with the identity functions idA on A and idB on B.]
and
and
Since (5.5) and (5.6) hold for arbitrary A and B, they must hold when we take
A = B. Then, thanks to (5.3) with f = i we have
(5.7) i = i ◦ idA .
f :A→B
is called
• a surjection (or is said to be surjective) just in case for every b ∈ B there is
some a ∈ A such that b = f (a) (i.e.: f∗ (A) = B);
5.3.2 Examples. For real numbers a and b, let [a, b] denote the set (interval) of
all real numbers x such that a ≤ x ≤ b.
[Figure: a function f from A to B that sends two distinct points a and a′ of A to the same point b of B.]
On the other hand, Figure 5.6 illustrates a function that is injective, but not
surjective, since no two distinct x’s in A are mapped to the same point in B and
since there is no x in A such that f (x) = b′′ .
[Figure 5.6: an injective function f from A to B that is not surjective; the point b′′ of B is not the image of any point of A.]
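For functions between finite sets, injectivity and surjectivity can be tested by inspecting the values. This is a minimal sketch, assuming a function is stored as a dict together with its codomain; the two sample functions are hypothetical and only loosely modelled on the figures.

```python
def is_injective(f: dict) -> bool:
    """No two distinct arguments share a value."""
    return len(set(f.values())) == len(f)


def is_surjective(f: dict, codomain: set) -> bool:
    """Every element of the codomain is a value of f."""
    return set(f.values()) == codomain


collapse = {"a": "b", "a'": "b"}         # two points sent to the same point
embed = {"a": "b", "a'": "b'"}           # distinct values, but b'' is missed

assert is_surjective(collapse, {"b"}) and not is_injective(collapse)
assert is_injective(embed) and not is_surjective(embed, {"b", "b'", "b''"})
```

The codomain has to be passed explicitly because a dict records only the pairs (a, f(a)), not the set B.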
5.3.4 Remark. From the usual definition of a function inverse, it must be clear
that the invertible functions are precisely the bijections, as we shall show below.
5.3.5 Exercise. Let Q denote the rationals and let Q+ denote the nonnegative
rationals. Explain whether each of the following functions is injective or surjective.
1. f : Q → Q : x ↦ x².
2. g : Q+ → Q : x ↦ x².
3. h : Q → Q+ : x ↦ x².
4. i : Q+ → Q+ : x ↦ x².
i.e.: such that f (f⁻¹(b)) = b for all b ∈ B and f⁻¹(f (a)) = a for all a ∈ A. This
situation can be visualized as shown in Figure 5.7.
[Figure 5.7: f from A to B and f⁻¹ from B to A, together with the identity functions idA and idB.]
{(b, a) ∈ B × A | (a, b) ∈ f }
We have
g = idA ◦ g
= (f⁻¹ ◦ f ) ◦ g
= f⁻¹ ◦ (f ◦ g)
= f⁻¹ ◦ idB
= f⁻¹.
5.3.11 Remark. Note that (f⁻¹)⁻¹ = f , since f satisfies all the requirements for
being the inverse of f⁻¹.
5.3.12 Exercise. Determine whether the composition g ◦ f is injective, surjective,
or neither, depending on whether each of f and g is injective or
surjective.
5.3.13 Exercise. Prove that the composition g ◦ f has an inverse if each of the
functions g and f does, in which case (g ◦ f )⁻¹ = f⁻¹ ◦ g⁻¹.
5.3.14 Exercise. Construct functions f and g, neither bijective, such that g ◦ f
is bijective.
a ∈ f ∗ (B1 ∪ B2 ) ⇔ f (a) ∈ B1 ∪ B2
⇔ f (a) ∈ B1 ∨ f (a) ∈ B2
⇔ a ∈ f ∗ (B1 ) ∨ a ∈ f ∗ (B2 )
⇔ a ∈ f ∗ (B1 ) ∪ f ∗ (B2 )
a ∈ f ∗ (B1 ∩ B2 ) ⇔ f (a) ∈ B1 ∩ B2
⇔ f (a) ∈ B1 ∧ f (a) ∈ B2
⇔ a ∈ f ∗ (B1 ) ∧ a ∈ f ∗ (B2 )
⇔ a ∈ f ∗ (B1 ) ∩ f ∗ (B2 )
a ∈ f ∗ (B1 \ B2 ) ⇔ f (a) ∈ B1 \ B2
⇔ f (a) ∈ B1 ∧ f (a) ∉ B2
⇔ a ∈ f ∗ (B1 ) ∧ a ∉ f ∗ (B2 )
⇔ a ∈ f ∗ (B1 ) \ f ∗ (B2 )
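The three chains of equivalences above say that inverse images commute with unions, intersections and differences. A brute-force check on one small, hypothetical function is sketched below; inverse_image and subsets are illustrative helper names.

```python
from itertools import combinations


def inverse_image(f: dict, B0: set) -> set:
    """All a in the domain of f with f(a) in B0."""
    return {a for a in f if f[a] in B0}


def subsets(s: set) -> list:
    """All subsets of a finite set."""
    items = list(s)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


f = {0: "x", 1: "x", 2: "y", 3: "z"}     # hypothetical f : {0, 1, 2, 3} → B
B = {"x", "y", "z", "w"}

for B1 in subsets(B):
    for B2 in subsets(B):
        assert inverse_image(f, B1 | B2) == inverse_image(f, B1) | inverse_image(f, B2)
        assert inverse_image(f, B1 & B2) == inverse_image(f, B1) & inverse_image(f, B2)
        assert inverse_image(f, B1 - B2) == inverse_image(f, B1) - inverse_image(f, B2)
```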
5.4.6 Examples. The analogs of Propositions 5.4.2 and 5.4.3 do not hold for
forward images. For example, let f : A → B. The equality
where A1 and A2 are subsets of A, does not necessarily hold: To see why the last
proof does not work when we replace ∪ with ∩, note that when we do so we have
The second biconditional above fails because we cannot guarantee that a1 ∈ A1 ∩A2
or a2 ∈ A1 ∩ A2 for all subsets A1 and A2 of A and all b in B, for arbitrary A and
B. However, inclusion always holds, and equality holds if f is injective. (See the
exercises below.)
To discover a counterexample, consider Figure 5.8 below as a guide. In Figure
[Figure 5.8: f from A to B, with subsets A1 and A2 of A; the point a of A1 and the point a′ of A2 are both mapped to the same point b of B.]
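A concrete counterexample for intersections can be written down in a few lines. The sketch below uses hypothetical two-point data in which two distinct points share an image; forward_image is an illustrative name for f∗.

```python
def forward_image(f: dict, A0: set) -> set:
    """Collection of all f(a) for a in A0."""
    return {f[a] for a in A0}


f = {"a": "b", "a'": "b"}                # both points of A go to b
A1, A2 = {"a"}, {"a'"}                   # disjoint subsets of A

left = forward_image(f, A1 & A2)                         # image of the empty set
right = forward_image(f, A1) & forward_image(f, A2)

assert left == set()
assert right == {"b"}
assert left <= right                     # the inclusion still holds
```

The failure depends on f not being injective: the two subsets are disjoint, yet their images meet.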
S = {h(i) ∈ A | i ∈ I} = ran h,
an indexed set and call I the indexing set of S. The function h itself is called the
indexing function (or simply the indexing) of S by I. Usually, we write Ai for h(i)
and write S = {Ai | i ∈ I}.
An indexing can be thought of as a way to attach “labels” (elements i ∈ I) to
elements of the collection S. Note that the indexing need not be injective, so we
may have Ai = Aj for distinct indexes i, j ∈ I. The indexing set I is arbitrary
(subject to the condition that h indexes the entire collection S): a collection S can
be indexed by different indexing sets I, J; moreover, the same indexing set I can
be used to index S in different ways (through a different indexing function).
The union of all the sets Ai for i ∈ I is given by
∪_{i∈I} Ai = {a ∈ ∪A | a ∈ Ai for some i ∈ I} = ∪S,
φ : P(A) → 2^A : B ↦ fB ,
Bf = {a ∈ A | f (a) = 1}.
BfB = {a ∈ A | fB (a) = 1}
= {a ∈ A | a ∈ B}
= B
for all a ∈ A.
Therefore,
ψ(φ(B)) = ψ(fB ) = BfB = B
for all B ∈ P(A) and
φ(ψ(f )) = φ(Bf ) = fBf = f
for all f ∈ 2^A , showing that φ is a bijection between P(A) and 2^A . The function
fB is called the characteristic function of the set B and is usually denoted by χB .
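The two round trips ψ(φ(B)) = B and φ(ψ(f )) = f can be verified exhaustively for a small set. In this minimal sketch, chi plays the role of φ (it returns the characteristic function of B as a dict) and support plays the role of ψ; both names, and the three-element set A, are assumptions made for illustration.

```python
from itertools import combinations, product


def chi(B: set, A: list) -> dict:
    """Characteristic function of B ⊂ A, stored as a dict a ↦ 0 or 1."""
    return {a: (1 if a in B else 0) for a in A}


def support(f: dict) -> set:
    """The set B_f = {a | f(a) = 1}."""
    return {a for a, value in f.items() if value == 1}


A = ["x", "y", "z"]
subsets = [set(c) for r in range(len(A) + 1) for c in combinations(A, r)]
functions = [dict(zip(A, bits)) for bits in product((0, 1), repeat=len(A))]

assert all(support(chi(B, A)) == B for B in subsets)       # ψ(φ(B)) = B
assert all(chi(support(f), A) == f for f in functions)     # φ(ψ(f)) = f
assert len(subsets) == len(functions) == 2 ** len(A)
```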
5.6.3 Definition. Finite and Infinite Sets. We say that a set is finite just in
case it is equivalent to some natural number. Otherwise, the set is called infinite.
5.6.4 Remark. We know that if n is a natural number, then m ∈ n → m ⊂ n
for every set m. (Every natural number is a transitive set: a set that includes
everything it contains.) While m ∈ n means that m is a natural number, A ⊂ n
does not mean that A is a natural number. So, it does not make sense to ask
whether the implication can be turned around. But, in some sense, we can, as
shown below.
5.6.5 Proposition. Every proper subset of a natural number n is equivalent to
some element of n.
Proof. The proof is inductive. The proposition is true for n = 0 (since 0 has no
proper subsets).
Suppose the proposition is true for some n. Let A be a proper subset of n+ .
Recall that n+ = n ∪ {n}. There are three cases to consider.
Case 1 If A is a proper subset of n then, thanks to the inductive hypothesis, A
is equivalent to some element m of n. Since n+ = n ∪ {n}, we also have m ∈ n+ .
Case 2 If A = n then the proposition holds trivially, since n ∼ n.
Define f : A → n by
f (i) = i, if i ≠ n,
f (i) = k, if i = n.
This function is injective and is therefore a bijection between A and f∗ (A).
If f∗ (A) = n then f is a bijection between A and n (a proper subset of n+ ). If
f∗ (A) ≠ n then f∗ (A) is a proper subset of n. By the induction hypothesis, f∗ (A)
is equivalent to some element m of n. Therefore, A ∼ f∗ (A) ∼ m. Since m ∈ n
implies m ∈ n+ , we are finished.
5.6.6 Proposition. No natural number n is equivalent to a proper subset of itself.
Proof. The proof is inductive. The proposition is true for n = 0, since 0 has no
proper subsets.
Suppose that the proposition is true for some n. Let A ⊂ n+ and let f : n+ → A
be a bijection. We must show that A = n+ .
Either n ∈ A or n ∉ A. Assume first n ∈ A. Together with the equivalence
n+ ∼ A, this implies n+ \ {n} ∼ A \ {n}, that is, n ∼ A \ {n}. However, A \
{n} ⊂ n+ \ {n} = n, so by the inductive hypothesis A \ {n} = n, and therefore
A = n ∪ {n} = n+ .
Assume now n ∉ A. Then A ⊂ n. The restriction f |n is a bijection between n
and a subset B of A. Since B ⊂ A and A ⊂ n, we have B ⊂ n. By the inductive
hypothesis applied to f |n we have B = n. Since n ∉ n and B = (f |n)∗ (n), the
injectivity of f implies f (n) ∉ B. On the other hand f (n) ∈ A, contradicting
A ⊂ n = B.
Therefore, no such bijection f exists.
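The proposition can be sanity-checked for very small n by listing every injection from n into itself and confirming that its range is all of n. The sketch below is an illustration only, assuming n is modelled by range(n); the name injections_into_self is hypothetical.

```python
from itertools import product


def injections_into_self(n: int):
    """Yield every injective map {0, ..., n-1} → {0, ..., n-1} as a tuple of values."""
    domain = range(n)
    for values in product(domain, repeat=n):
        if len(set(values)) == n:        # no value is repeated
            yield values


for n in range(6):
    for values in injections_into_self(n):
        # The range is never a proper subset of {0, ..., n-1}.
        assert set(values) == set(range(n))
```

The enumeration grows like n to the power n, so this approach is practical only for the first few natural numbers.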
5.6.7 Proposition. No finite set is equivalent to a proper subset of itself.
Proof. The proposition is true for the empty set ∅, since ∅ has no proper subsets.
Let A be a nonempty finite set and B a proper subset of A. We suppose there is
a bijection f : A ∼ B to obtain a contradiction.
Since A is finite, there is a bijection g : A ∼ n for some n ∈ N. Then, g∗ (B) is
a proper subset of n.
Define the injection i : B → A : b 7→ b. We have
n ∼ A ∼ B → A ∼ n (via g⁻¹, f , i and g, in that order).
The function φ = g ◦ i ◦ f ◦ g⁻¹ is injective, since it is a composition of injections.
We have
φ∗ (n) = (g ◦ i ◦ f ◦ g⁻¹)∗ (n)
= (g∗ ◦ i∗ ◦ f∗ ◦ (g⁻¹)∗ )(n)
= (g∗ ◦ i∗ ◦ f∗ )(A)
= (g∗ ◦ i∗ )(B)
= g∗ (B),
showing that φ∗ (n), the range of φ, is a proper subset of n. Since every injection
ψ : X → Y always defines a bijection between X and ψ∗ (X), φ defines a bijection
between n and a proper subset (g∗ (B)) of n. We have a contradiction, since no
natural number n is equivalent to a proper subset of n.
5.6.8 Corollary. The set N of natural numbers is infinite.
Proof. The function f : N → N \ {0} : n ↦ n+ is a bijection between N and a
proper subset (N \ {0}) of N. Therefore, N cannot be finite.
5.6.9 Remark. Hilbert’s Hotel. Since infinite sets are not part of our usual
perception of physical reality, we may obtain counterintuitive results when we try to
use them in connection with tangible, everyday objects.
As an example, consider the celebrated Hilbert’s Hotel, which is always full, yet
always has room for another guest without the need to share a room.
Hilbert’s Hotel has infinitely many rooms, numbered 0, 1, 2 and so on. The
hotel is always full. When a new guest arrives, the person in room n is asked to
move to room n+ , leaving room 0 available for the new guest.
The room shift is defined by f : N → N \ {0} : n ↦ n+ .
5.6.10 Proposition. A set can be equivalent to at most one natural number.
Proof. If A is a set, m and n are natural numbers, A ∼ m and A ∼ n, then m ∼ n.
If m ≠ n then either m ∈ n or n ∈ m.
If m ∈ n then m ⊂ n (since every natural number is a transitive set). Hence,
m is a proper subset of n (since m ≠ n). Therefore, since m ∼ n, the number n is
equivalent to a proper subset of itself (namely m), a contradiction.
Since the roles of m and n can be interchanged, n ∈ m also implies a contra-
diction. Therefore, m = n.
5.6.11 Exercise. Prove that if B ∩ C = ∅ then A^(B∪C) ∼ A^B × A^C . Hint: Note
that B ∩ C = ∅ guarantees that if f ∈ A^B and g ∈ A^C then the union
5.7 Relations
When we think about formalizing the notion that “a is the mother of b” we can
think of a collection of pairs (a, b), where a belongs to the set of all women and b
belongs to the set of all human beings. (We are talking about humans.) Not all
women are mothers and a mother may have more than one child. This point serves
to illustrate why relations are more general than functions.
What we call a relation below should more properly be called a binary relation.
(There are binary, ternary, quaternary relations and so on. An example of a ternary
relation is “a and b are the parents of c.”) Since we shall only use binary relations,
we shall call them simply “relations.”
5.7.1 Definition. Relations: If A and B are sets, any subset R of A × B is called
a relation from A to B. When A = B we say that R is a relation in A. Instead
of writing (a, b) ∈ R we may write a R b. The domain of R, written dom R, is the
subset of A given by
5.7.2 Examples.
(1) The empty set ∅ is the relation with domain ∅ and range ∅.
(2) The relation R from a set A to its power set P(A) defined by a R B ⇔ a ∈ B
has domain A and range P(A) \ {∅}. (Every element of A belongs to some
nonempty subset B of A.)
(3) Let A = {0, 1, 2, 3}, R = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}. Then, R is
the usual “less than” relation in A. For example, (1, 3) ∈ R because 1 < 3
and (3, 2) ∉ R because 3 ≮ 2. The domain of R is {0, 1, 2} (no pair in R has first coordinate 3) and the range is {1, 2, 3}.
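The relation of example (3) is small enough to compute with. The sketch below assumes the usual definitions, namely that the domain of R collects the first coordinates of its pairs and the range collects the second coordinates; the variable names are illustrative.

```python
A = {0, 1, 2, 3}
R = {(a, b) for a in A for b in A if a < b}      # "less than" in A

assert R == {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}
assert (1, 3) in R and (3, 2) not in R

domain = {a for (a, _) in R}             # first coordinates
range_ = {b for (_, b) in R}             # second coordinates

assert domain == {0, 1, 2}               # 3 is less than no element of A
assert range_ == {1, 2, 3}               # nothing in A is less than 0
```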
m < n ↔ m ∈ n,
m ≤ n ↔ m < n ∨ m = n.
m ⊂ n ∧ m ̸= n ↔ m ∈ n
m ∈ n ↔ m is a proper subset of n.
It is now a simple matter to verify that < is transitive, but neither reflexive nor
symmetric; while ≤ is reflexive and transitive, but not symmetric.
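These properties can be verified mechanically on an initial segment of N. This is a minimal sketch, assuming a relation in a finite set is stored as a set of pairs and using Python integers in place of the set-theoretic natural numbers; the three predicate names are illustrative.

```python
def is_reflexive(R: set, A: set) -> bool:
    return all((a, a) in R for a in A)


def is_symmetric(R: set) -> bool:
    return all((b, a) in R for (a, b) in R)


def is_transitive(R: set) -> bool:
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)


A = set(range(6))                        # a finite window of N
less = {(m, n) for m in A for n in A if m < n}
less_eq = {(m, n) for m in A for n in A if m <= n}

assert is_transitive(less) and not is_reflexive(less, A) and not is_symmetric(less)
assert is_reflexive(less_eq, A) and is_transitive(less_eq) and not is_symmetric(less_eq)
```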
5.7.6 Remark. In the above example, the relations < and ≤ defined in N are
examples of orderings of N. These relations are discussed in some detail in Note 6.
5.7.7 Definition. Equivalence Relations: A relation that is reflexive, symmet-
ric and transitive is called an equivalence relation.
5.7.8 Definition. Equivalence Classes: Let R be an equivalence relation on a
set A. For every a ∈ A, define the equivalence class [a]R (read: “the equivalence
class of a under R” or “the equivalence class with representative a”) by
[a]R = {b ∈ A | b R a}.
Thus, A/R is the set whose members are all equivalence classes [a]R corresponding
to every a ∈ A.
5.7.9 Proposition. Any representative of an equivalence class uniquely identifies
the equivalence class. That is, if R is an equivalence relation in A then a R b implies
[a]R = [b]R for all a, b ∈ A.
Proof. Suppose R is an equivalence relation in A; a, b ∈ A; a R b. Then, for every
c ∈ A,
c ∈ [a]R ⇒ a R c
⇒ a R b ∧ a R c
⇒ b R a ∧ a R c
⇒ b R c
⇒ c ∈ [b]R
and
c ∈ [b]R ⇒ b R c
⇒ a R b ∧ b R c
⇒ a R c
⇒ c ∈ [a]R ,
5.7.16 Example. Even though, as pointed out above, set isomorphism is not an
equivalence relation among all sets, it is for all subsets of a given set. That is, for
any set A, define the equivalence relation ≐ in P(A) by
a ≐ b ↔ a ∼ b
for all a and b in P(A) (for all subsets a and b of A). Recall that a ∼ b means that
there is a bijection between a and b.
The equivalence classes of this relation, the elements of P(A)/≐, are all sets
of subsets of A that are equinumerous. In other words, the equivalence class [a]≐ is
the set of all subsets of A that are equinumerous with a. We can define the cardinal number
of a to be [a]≐. For example, if n is a subset of A then [n]≐ is the collection of
all subsets of A with precisely n elements. Unfortunately, we cannot define the
cardinal number of just any set in this fashion.
5.7.17 Example. Suppose that odd and even have the usual meaning in N.
(Recall that we have yet to define arithmetic in N.) Define the equivalence relation
≐ in N by
m ≐ n ↔ m and n are both even or both odd
for all m and n in N.
There are precisely two equivalence classes under this relation. We have
[n]≐ = all even natural numbers, if n is even,
[n]≐ = all odd natural numbers, if n is odd.
For example:
[3]≐ = [7]≐ = [1]≐ = [17]≐ = {n ∈ N | n is odd}
and
[8]≐ = [12]≐ = [100]≐ = [0]≐ = {n ∈ N | n is even}.
In other words, the set N/≐ has precisely two elements:
N/≐ = {{n ∈ N | n is even}, {n ∈ N | n is odd}}
= {[0]≐, [1]≐}.
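The same quotient can be computed on a finite window of N. The sketch below is an illustration under that restriction (the text works with all of N); quotient and same_parity are hypothetical names.

```python
def quotient(A, related) -> set:
    """Set of equivalence classes [a] = {b in A | b related to a}, as frozensets."""
    return {frozenset(b for b in A if related(b, a)) for a in A}


def same_parity(m: int, n: int) -> bool:
    """m and n are both even or both odd."""
    return m % 2 == n % 2


window = range(10)                       # finite stand-in for N
classes = quotient(window, same_parity)

assert classes == {frozenset({0, 2, 4, 6, 8}), frozenset({1, 3, 5, 7, 9})}
assert len(classes) == 2                 # the classes of 0 and of 1
```

Different representatives produce the same frozenset, which is why the set comprehension collapses ten candidates to two classes.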
5.7.18 Remarks. From Relations to Functions: A function is a special kind
of relation. Recall that a relation R from A to B is any subset of the Cartesian
product A × B. We say that R is univalent at a ∈ A if
(∀b ∈ B)(∀b′ ∈ B)[(a, b) ∈ R ∧ (a, b′ ) ∈ R → b = b′ ],
i.e.: given a ∈ A, there is at most one ordered pair (a, b) ∈ R. The relation R is a
function from A to B precisely when
• R is univalent at every a ∈ A.
If both of the above conditions hold, for every a ∈ A there is exactly one b ∈ B
such that a R b, in which case we write b = R(a).
sm (0) = m
and
sm (k+ ) = σ(sm (k))
for each k ∈ N. For example,
What is wrong with the above definition? What is very wrong is that we do not
know that sm exists. (What does “define inductively” mean?) The reasoning is
correct, provided we can be assured of the existence of the function sm . This is
precisely what the recursion theorem does: It ensures the existence of the function
sm in this case and of something analogous to it in more general situations as well.
For fixed m ∈ N, what are the required ingredients? Given the set N, an
element 0 ∈ N, and a function σ : N → N. We need the existence of a unique
function sm : N → N such that
• sm (0) = m and
• sm (n+ ) = σ(sm (n))
for all n ∈ N.
5.8.1 Theorem. Recursion Theorem. Given any set A, any element a ∈ A,
and any function f : A → A, there exists a unique function φ : N → A such that
• φ(0) = a and
• φ(n+ ) = f (φ(n))
for all n ∈ N.
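Computationally, the function φ of the theorem is obtained by applying f repeatedly to a. The following is a minimal sketch, assuming natural numbers are represented by Python integers; the name recursively_defined and the doubling example are illustrative. It computes values of φ but is no substitute for the existence and uniqueness proof.

```python
def recursively_defined(a, f):
    """Return φ with φ(0) = a and φ(n + 1) = f(φ(n))."""
    def phi(n: int):
        value = a
        for _ in range(n):               # apply f exactly n times
            value = f(value)
        return value
    return phi


def double(x: int) -> int:
    return 2 * x


phi = recursively_defined(1, double)     # hypothetical data: a = 1, f = doubling

assert phi(0) == 1
assert all(phi(n + 1) == double(phi(n)) for n in range(10))
```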
Proof. If such a function φ exists, where do we look for it? The function φ must
be a set of ordered pairs (n, b) ∈ N × A with certain properties. The requirement
φ(0) = a is another way of writing (0, a) ∈ φ. The requirement
In other words, (5.10) implies (5.11). On the other hand, (5.10) is another way of
writing
φ(n) = b ⇒ φ(n+ ) = f (b) ⇒ φ(n+ ) = f (φ(n)).
Hence, (5.11) implies (5.10). Therefore (5.10) and (5.11) are equivalent.
A function is a special kind of relation. So, φ must be a special kind of relation
R from N to A. By what has been said, to be a candidate, any such relation R
must have the property
(5.12) (0, a) ∈ R
and
for all n ∈ N and b ∈ A. Let us call any such relation R recursive (with respect
to φ).
Among all relations from N to A, i.e.: among all elements of the power set
P(N × A), we are only interested in those that are recursive. Define
Φ = {R ∈ P(N × A) | R is recursive}
We want to select the smallest member φ of Φ. To this end, define the set
(5.14) φ = ∩Φ = ∩_{R∈Φ} R.
• φ(0) = a, and
such that
1. φ(0) = a0 ,
Proof. The proof is yet another slight variation of that of Theorem 5.8.1. It is left
as an exercise.
5.8.4 Remark. In the language of Note 7 (Definition 7.2.7), property 2. above
says that φ is a choice function for the indexed family A = {An }.
Finally, we have a strictly finite, “bounded” version of Theorem 5.8.3.
5.8.5 Theorem (Bounded Recursion Theorem). Fix a natural number n.
Let a family A = {Am | m ∈ n+ } indexed by the set n+ be given. Let an element
a0 ∈ A0 and, for each m ∈ n, a function fm : Am → Am+ be given. Then there
exists a unique function φ : n+ → ∪_{m∈n+} Am such that
(5.15) φ(0) = a0 ,
(5.16) φ(m) ∈ Am
for all m ∈ n.
Proof. The proof is left as an exercise.
5.8.6 Exercise. Show that Theorem 5.8.5 implies Theorem 5.8.3.
5.8.7 Remark. Other generalizations of the Recursion Theorem are possible. In
Note 6 we will state a version of the rather more sophisticated theorem of transfinite
recursion which allows one to recursively define functions whose domain is an arbitrary
well-ordered set.
N     Ne
0  ↦  0
1  ↦  2
2  ↦  4
3  ↦  6
⋮     ⋮
Strict set dominance (≺) gives us a notion of a set being bigger than another
set: A ≺ B means that B is bigger than A. Under this notion, is there a biggest
set? The answer is negative, as shown below.
5.9.3 Theorem. (Cantor) For every set A we have A ≺ P(A).
Proof. Recall that for every a ∈ A, {a} is a subset of A. Therefore, {a} is a member
of P(A). Define
f : A → P(A) : a ↦ {a}.
This function is injective, since
A0 = {a ∈ A | a ∉ g(a)}.
a0 ∈ A0 ⇔ a0 ∉ g(a0 ).
a0 ∈ g(a0 ) ⇔ a0 ∈ A0 ⇔ a0 ∉ g(a0 ),
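The diagonal set A0 = {a ∈ A | a ∉ g(a)} can be examined exhaustively when A is small. The sketch below, an illustration with a hypothetical three-element set A, runs through every function g from A to P(A) and confirms that A0 is never a value of g.

```python
from itertools import combinations, product

A = [0, 1, 2]
power_set = [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]

# Every g : A → P(A) is a choice of one subset for each element of A.
for images in product(power_set, repeat=len(A)):
    g = dict(zip(A, images))
    diagonal = frozenset(a for a in A if a not in g[a])    # the set A0
    assert diagonal not in g.values()                      # A0 is missed by g
```

For a three-element set this inspects 8 × 8 × 8 = 512 functions, none of which is surjective.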
f∗ (M ) = N and g∗ (B \ N ) = A \ M,
h∗ (A) = h∗ (M ∪ (A \ M ))
= h∗ (M ) ∪ h∗ (A \ M )
= N ∪ (B \ N )
= B,
[Figure: the sets M0, M1, M2, . . . in A and N1, N2, N3, . . . in B, linked alternately by f (from A to B) and g (from B to A).]
reflections shown in the above figure, we use the recursion theorem and proceed as
follows. Let M0 = A \ g∗ (B). Then f defines a bijection
M0 ∼ N1 ,
N1 ∼ M1 ,
M1 ∼ N2 ,
M = M0 ∪ M1 ∪ M2 ∪ · · · ,
N = N1 ∪ N2 ∪ N3 ∪ · · · .
We verify that the two sets M and N have the desired properties. We have:
f∗ (M ) = f∗ (M0 ∪ M1 ∪ M2 ∪ · · · )
= f∗ (M0 ) ∪ f∗ (M1 ) ∪ f∗ (M2 ) ∪ · · ·
= N1 ∪ N2 ∪ N3 ∪ · · ·
= N.
We also have:
g∗ (B \ N ) = g∗ (B) \ g∗ (N )
= g∗ (B) \ g∗ (N1 ∪ N2 ∪ N3 ∪ · · · )
= g∗ (B) \ (g∗ (N1 ) ∪ g∗ (N2 ) ∪ g∗ (N3 ) ∪ · · · )
= (A \ M0 ) \ (M1 ∪ M2 ∪ M3 ∪ · · · )
= A \ M.
Contents
6.1 The Arithmetic of the Natural Numbers . . . . . . . . . . . . 6-1
6.2 Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12
6.3 The Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-20
6.4 The Rationals . . . . . . . . . . . . . . . . . . . . . . . . . . 6-24
6.5 Algebraic Structures . . . . . . . . . . . . . . . . . . . . . . . 6-29
• called associative if
(a ∗ b) ∗ c = a ∗ (b ∗ c)
for every a, b and c in A, in which case we can write a∗b∗c without ambiguity
of meaning;
Note 6: Numbers and Arithmetic – July 28, 2020 6-2
Note: No binary operation can have more than one unit, for if u and e are units
for the binary operation ∗ on A, then u = u ∗ e (since a = a ∗ e for all a ∈ A) and
u ∗ e = e (since u ∗ a = a for all a ∈ A); hence, u = u ∗ e = e. Note that nothing
has been said about the operation ∗ being commutative in discussing units for that
operation: a ∗ u = u ∗ a for all a ∈ A must hold for the unit u.
6.1.1 Definition. Addition: Recall that
σ : N → N : k ↦ k+
(1) sm (0) = m
m + n = sm (n)
for all m and n in N. We also call m + n the sum of the numbers m and n.
6.1.2 Remark. The definition of addition above implies that
m+ = m + 1
for all m ∈ N. Eventually we aim, for the most part, to stop using successors
in favor of addition. This will become especially important once the set-theoretic
definition of “number” becomes blurred by the passage from the natural numbers
N to the integers Z, then to the rationals Q and eventually to the reals R (and
beyond…). Each of these number systems is an extension of the prior one, but does
not really contain it as a subset. In particular, the “same” natural number n is a
different set when regarded as an element of each of the above number systems.
6.1 Exercise. 1. Use the recursion theorem to prove that, for each n ∈ N, there
exists a unique function σn : N → N such that σ0 = idN and σn+ = σ ◦ σn
(where σ is the successor function).
2. Prove that σn (m) = sm (n) for all m, n ∈ N (in the notation of Defini-
tion 6.1.1). In other words, the functions σn may be used to provide an
alternative definition of the operation of addition of natural numbers.
6.2 Exercise. 1. Use the recursion theorem to prove that, for each n ∈ N, there
exists a unique function ςn : N → N such that ς0 = idN and ςn+ = ςn ◦ σ.
3+4 = s3 (4)
= s3 (3)+ by (2)
= (s3 (2)+ )+ by (2)
= ((s3 (1)+ )+ )+ by (2)
= (((s3 (0)+ )+ )+ )+ by (2)
= (((3+ )+ )+ )+ by (1)
= ((4+ )+ )+
= (5+ )+
= 6+
= 7.
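The computation above can be replayed mechanically. The sketch below is an informal illustration: it models natural numbers by Python integers and the successor by `k + 1` (both are modelling assumptions, not the set-theoretic definitions), and the function name `s` is ours. Clause (2) is the recursive step used in the lines marked "by (2)" above.

```python
def successor(k):
    # Stands in for the successor map k ↦ k+.
    return k + 1

def s(m, n):
    """s_m(n), defined by recursion on n."""
    if n == 0:
        return m                     # clause (1): s_m(0) = m
    return successor(s(m, n - 1))    # clause (2): s_m(n+) = (s_m(n))+

three_plus_four = s(3, 4)
```

Each recursive call peels off one successor from the second argument, exactly as in the chain of equalities for 3 + 4.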
m + n = (. . . ((m + 1) + 1) . . .) + 1   (n times)
can be carried out in a way that makes sense within the axioms of ZF.
6.1.4 Proposition. Addition on N satisfies the following properties
(1) (m + n)+ = m + n+ ,
(2) m + 0 = m = 0 + m,
(3) m+ + n = (m + n)+ ,
(4) (ℓ + m) + n = ℓ + (m + n),
(5) m + n = n + m;
0 + m+ = (0 + m)+ by (1)
= m+ by the inductive hypothesis.
which is (3) with n replaced by n+ . This shows that (3) holds for all n.
We prove (4): The proof is by induction on n. For n = 0, we have
(ℓ + m) + 0 = ℓ + m = ℓ + (m + 0),
which is (4) with n replaced by n+ . This shows that (4) holds for all n.
We prove (5): The proof is by induction on n. For n = 0, thanks to (2), we
have m + 0 = 0 + m. Suppose (5) is true for n. Then,
m + n+ = (m + n)+ by (1)
= (n + m)+ by the inductive hypothesis
= n+ + m by (3),
which is (5) with n replaced by n+ . This shows that (5) holds for all n.
1. Use the recursion theorem to prove that, for each n ∈ N, there exists a unique
function ςn : N → N such that ς0 = idN and ςn+ = ςn ◦ σ.
m∈n↔m+ℓ∈n+ℓ
(1) m∈n→m+ℓ∈n+ℓ
for all ℓ ∈ N. The proof is by induction on ℓ. For ℓ = 0 the implication (1) follows
from 1.4.2. Suppose that (1) is true for ℓ. Then
which, together with the inductive hypothesis (1), yields (1) with ℓ replaced by ℓ+ .
We have proved (1).
We prove the implication
(2) m + ℓ ∈ n + ℓ → m ∈ n.
Recall that in Note 4 we proved that there are three mutually exclusive possibilities
for the two natural numbers m and n (trichotomy):
m ∈ n, m = n, n ∈ m.
• If m+ℓ ∈ n+ℓ, we cannot have m = n, for this would imply that m+ℓ ∈ m+ℓ,
which is impossible, since no natural number is a member of any of its subsets.
m+ℓ=n+ℓ→m=n
(1) m + ℓ = n + ℓ.
gm : N → N : n ↦ n + m.
For every m ∈ N the recursion theorem asserts the existence of a unique function
pm : N → N such that
(1) pm (0) = 0
for all n ∈ N.
Define the binary operation · on N by
m · n = pm (n)
3·4 = p3 (4)
= p3 (3) + 3 by (2)
= p3 (2) + 3 + 3 by (2)
= p3 (1) + 3 + 3 + 3 by (2)
= p3 (0) + 3 + 3 + 3 + 3 by (2)
= 0+3+3+3+3 by (1)
= 12.
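The same unwinding can be written as a short recursive function. This is a sketch under the assumption that Python integers and the built-in `+` stand in for N and its addition; the name `p` mirrors pm but is otherwise ours.

```python
def p(m, n):
    """p_m(n), defined by recursion on n."""
    if n == 0:
        return 0                 # clause (1): p_m(0) = 0
    return p(m, n - 1) + m       # clause (2): p_m(n+) = p_m(n) + m

three_times_four = p(3, 4)
```

The recursion adds one copy of m for each successor step in n, which is the computation displayed for 3 · 4.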
Once again, we have used the recursion theorem to formalize the intuitive con-
struction
m · n = m + m + . . . + m   (n times)
(1) m · 0 = 0,
(2) m · n+ = (m · n) + m,
(3) 0 · m = 0,
(4) m+ · n = (m · n) + n.
Proof. We prove (1) and (2): Thanks to the definition of multiplication, (1)
follows from (1.7.1), while (2) follows from (1.7.2), since m · n+ = pm (n+ ) =
pm (n) + m = (m · n) + m.
We prove (3): The proof is by induction on m. For m = 0, we have, thanks
to (1.7.1), 0 · 0 = p0 (0) = 0. Suppose that (3) holds for m. We have
0 · m+ = (0 · m) + 0 by (2)
= 0+0 by the inductive hypothesis
= 0,
m+ · 0 = 0 by (1)
= 0+0 by (1.4.2)
= (m · 0) + 0, by (1)
m+ · n+ = (m+ · n) + m+ by (2)
= ((m · n) + n) + m+ by the inductive hypothesis
= (m · n) + (n + m+ ) by (1.4.4)
= (m · n) + (n + m)+ by (1.4.1 )
= (m · n) + (m + n)+ by (1.4.5 )
= (m · n) + m + n+ by (1.4.1 )
= (m · n+ ) + n+ , by (2)
(1) m · 1 = m,
(2) m · (n + ℓ) = (m · n) + (m · ℓ),
(3) m · (n · ℓ) = (m · n) · ℓ,
(4) m · n = n · m;
m · (n + 0) = m · n by (1.4.2)
= (m · n) + 0 by (1.4.2)
= (m · n) + (m · 0). by (1.9.1)
This establishes (2) for ℓ = 0. Suppose that (2) holds for ℓ. We have
m · (n + ℓ+ ) = m · (n + ℓ)+ by (1.4.1)
= (m · (n + ℓ)) + m by (1.9.2)
= ((m · n) + (m · ℓ)) + m by the inductive hypothesis
= (m · n) + ((m · ℓ) + m) by (1.4.4)
= (m · n) + (m · ℓ+ ), by (1.9.2)
m · (n · ℓ+ ) = m · ((n · ℓ) + n) by (1.9.2)
= (m · (n · ℓ)) + (m · n) by (2)
= ((m · n) · ℓ) + (m · n) by the inductive hypothesis
= (m · n) · ℓ+ , by (1.9.2)
m · n+ = (m · n) + m by (1.9.2)
= (n · m) + m by the inductive hypothesis
= n+ · m, by (1.9.4)
m ∈ n ↔ m · ℓ ∈ n · ℓ.
(2) (∀k)(m ∈ n → m · k + ∈ n · k + ).
There are three mutually exclusive possibilities for the two natural numbers m and
n (trichotomy):
m = n, n ∈ m, m ∈ n.
• If m · k + ∈ n · k + , we cannot have m = n, for this would imply that m · k + ∈
m · k + , which is impossible, since no natural number is a member of any of
its subsets.
(1) m · ℓ = n · ℓ.
(1) em (0) = 1
m ↑ n = em (n)
Once again, we have used the recursion theorem to formalize the intuitive con-
struction
m^n = m · m · . . . · m   (n times)
m^(n+0) = m^n = m^n · 1 = m^n · m^0 .
(m^n)^0 = 1 = m^0 = m^(n·0) ,
6.1.17 Definition. Precedence: From first to last, the order of evaluation (prece-
dence) of the arithmetic operators is exponentiation, multiplication, addition. For
example m + n · ℓ ↑ k is read as m + (n · (ℓ ↑ k)) and is normally written m + n · ℓ^k .
We read m^{n·ℓ} as m^(n·ℓ) . In other words, m^{n·ℓ} stands for m ↑ (n · ℓ), not for
m ↑ n · ℓ, which is m^n · ℓ.
Since multiplication has precedence over addition, we can omit the parentheses
in (m · n) + (m′ · n′ ) to write m · n + m′ · n′ instead.
While addition and multiplication are associative, exponentiation is not. When
several instances of the exponentiation operator follow each other, the evaluation
proceeds from right to left. For example, m ↑ n ↑ ℓ is read m ↑ (n ↑ ℓ), which is
usually written m^{n^ℓ} . In other words, m^{n^ℓ} is read as m^(n^ℓ) and not as (m^n)^ℓ . For the
latter, of course, we have (m^n)^ℓ = m^(n·ℓ) .
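These conventions can be checked numerically. Python's `**` operator happens to follow the same rules (it binds tighter than `*` and associates to the right), so the snippet below serves as a quick sanity check; the sample values m = 2, n = 3, ℓ = 2, k = 2 are arbitrary choices of ours.

```python
m, n, l, k = 2, 3, 2, 2

# Precedence: exponentiation, then multiplication, then addition.
mixed = m + n * l ** k
mixed_explicit = m + (n * (l ** k))

# Towers of exponents are read from right to left.
right_assoc = m ** n ** l          # read as m ** (n ** l)
left_grouped = (m ** n) ** l       # equals m ** (n * l)
```

With these values the right-to-left reading gives 2^9 = 512, whereas grouping to the left gives 8^2 = 64 = 2^6, so the two readings really differ.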
6.4 Exercise. Show that addition does not distribute over multiplication.
6.5 Exercise. Show that exponentiation is neither commutative nor associative.
6.6 Exercise. Show that exponentiation does not distribute over addition or mul-
tiplication.
6.7 Exercise. Show that neither addition nor multiplication distributes over ex-
ponentiation.
6.2 Order
In the definitions given below we use the neutral symbols ⊏ and ⊑ as generic names
for relations that behave like the familiar < and ≤, respectively. For the sake of
associating pronounceable names with our symbols, ⊏ and ⊑ shall be pronounced
“less than” and “less than or equal to,” respectively.
6.2.1 Definition.
(1) A relation ⊑ in a set A is called antisymmetric if for all x and y in A
x ⊑ y ∧ y ⊑ x ⇒ x = y.
x ⊏ y ⇒ ¬y ⊏ x.
(7) An ordering (partial or strict) of a set A is called linear, or total, if every two
distinct elements of A are comparable in the ordering.
6.2.2 Example. Fix a set A and consider its power set P(A). Then, set inclusion
⊂ defines a partial ordering in P(A). The ordering is total if and only if A is empty
or is a singleton. For example, with P({0, 1}) = {∅, {0}, {1}, {0, 1}}, we have that
{0} ⊄ {1} and {1} ⊄ {0}. (Be careful: we have 0 ⊂ 1, 0 ∈ 1 and {0} ∈ {1}; but
{0} ⊄ {1}.)
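The claim that inclusion is total on P(A) only when A has at most one element can be tested on small sets. In the sketch below the elements of A are treated as opaque atoms (so 0 and 1 are not unfolded into the sets ∅ and {∅}); the helper names `power_set` and `is_total` are ours.

```python
from itertools import combinations

def power_set(A):
    """All subsets of A, as frozensets."""
    items = list(A)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_total(A):
    """True when any two subsets of A are comparable under inclusion."""
    P = power_set(A)
    return all(x <= y or y <= x for x in P for y in P)

total_for_empty = is_total(set())
total_for_singleton = is_total({0})
total_for_pair = is_total({0, 1})
```

For A = {0, 1} the subsets {0} and {1} are incomparable, which is what makes the last check fail.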
6.8 Exercise. Let (A, ⊏A ) and (B, ⊏B ) be strictly linearly ordered sets. The
lexicographic ordering of A × B is the order ≺ defined by
(6.1) x ≠ y ⇒ (x ⊑ y ↔ x ⊏ y).
In other words, ⊑ and ⊏ correspond to each other if and only if they agree (as
relations) when restricted to pairs (x, y) ∈ A × A such that x ≠ y.
Proof. First, fix a partial ordering ⊑ of A, and let ⊏ be any strict ordering of A
for which property (6.1) holds. For all elements x, y of A we have x ⊏ y ⇔ x ⊑ y
if x ≠ y, and ¬ x ⊏ y if x = y (by the asymmetry of ⊏). Therefore, ⊏ must be the
relation defined by
(6.2) x ⊏ y ⇔ x ⊑ y ∧ x ≠ y
for all x, y ∈ A. Now we prove that the relation ⊏ uniquely defined by (6.2) is a
strict ordering of A.
It is clear that any least element of a partially ordered set A is minimal, and
any greatest element is maximal. The converses of these statements are not true in
general. In fact, A may have minimal elements but no least element, and maximal
elements but no greatest element. See Examples 6.2.6 below.
If A has a least element this element is unique, since for any two such elements
a and a′ we have a ⊑ a′ and a′ ⊑ a, which implies a = a′ by antisymmetry. (A
similar argument yields uniqueness of greatest elements.) Moreover, if A has a
least element a then a is the unique minimal element of A, because if a′ is minimal
we have a ⊑ a′ since a is least, hence a = a′ since a′ is minimal. (Similarly, a
greatest element is the unique maximal element.) Finally, we remark that if A is a
chain (i. e., A is linearly ordered) then a minimal element is necessarily least, and
a maximal element is greatest: If a is a minimal element then every element x ∈ A
is comparable to a, hence either x ⊑ a or a ⊑ x; however, the latter relation would
contradict the minimality of a.
6.2.6 Examples. The set N ordered by set inclusion has least and minimal element
0, but no greatest or maximal elements. (We have not officially shown that set
inclusion is actually an ordering of N; this is done in Propositions 6.2.10 and 6.2.12
below.)
Given a nonempty set A, define
B = {x ∈ P(A) | x ≠ ∅},
the set of all nonempty subsets of A. Then, B is a partially ordered set under set
inclusion (⊂). Every singleton in B is a minimal element, yet B does not have a
least element unless A is a singleton.
For example, take A = {1, {1, 2}}. Then, we have
B = { {1}, {{1, 2}}, {1, {1, 2}} }.
The two singletons {1} and {{1, 2}} in B are both minimal elements of B. Yet, B
has no least element. The element {1, {1, 2}} of B is both maximal and greatest.
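The example can be verified by brute force. The sketch below represents sets as Python `frozenset` objects, so the element {1, 2} of A becomes `frozenset({1, 2})`; the helper name `nonempty_subsets` is ours.

```python
from itertools import combinations

def nonempty_subsets(A):
    """All nonempty subsets of A, as frozensets."""
    items = list(A)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

A = frozenset({1, frozenset({1, 2})})
B = nonempty_subsets(A)

# x is minimal when no element of B is a proper subset of x.
minimal = [x for x in B if not any(y < x for y in B)]
# x is least when x is contained in every element of B.
least = [x for x in B if all(x <= y for y in B)]
# x is greatest when every element of B is contained in x.
greatest = [x for x in B if all(y <= x for y in B)]
```

The lists `minimal`, `least` and `greatest` then contain two, zero and one element respectively, matching the discussion above.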
(2) a greatest lower bound (or infimum) of B if for every lower bound x of B we
have x ⊑ a;
(4) a least upper bound (or supremum) of B if for every upper bound x of B we
have a ⊑ x.
Note that neither lower nor upper bounds of B need be elements of B. A set that
has an upper (lower) bound, is said to be bounded above (below). The least upper
bound and the greatest lower bound of B, when they exist are denoted by sup B
and inf B, respectively.
If ⊏ is a strict ordering of A then (A, ⊏) is called
(5) complete if every nonempty subset of A that is bounded above has a least
upper bound;
S̄ = {n ∈ N | (∀m)(m < n → m ∉ S)}.
We show that S̄ is inductive. Clearly, 0 ∈ S̄ vacuously, since m < 0 is false for
all m ∈ N.
Suppose that n ∈ S̄. Then for any m < n+ we have two possibilities:
• Suppose that m = n. Then, by the definition of S̄, m ∈ S ⇒ m ≥ n means
that n is the least element of A. This contradicts the hypothesis that A has
no least element.
m ≤ m + n.
Proof. We prove the proposition by showing that, for each fixed natural number
m, the set
Sm = {n ∈ N | m ≤ m + n}
is inductive, hence contains every natural number n. Clearly 0 ∈ Sm since m =
m + 0 (hence m ≤ m + 0). For the inductive step, assume n ∈ Sm , that is,
Sn = {m ∈ N | m ≤ m + n}
is inductive.
6.2.14 Proposition. For natural numbers m and n,
The proof will be finished once we show that Sm is inductive, hence equal to N.
First we show 0 ∈ Sm . If m = 0 then both 0 ≤ 0 and (∃k)(0 = 0 + k) are true
(the latter because 0 = 0 + 0). If m ≠ 0 then m > 0 (by trichotomy, since m ≮ 0),
so m ≰ 0, and on the other hand 0 ≠ m + k for any k ∈ N (because 0 < m ≤ m + k,
by Proposition 6.2.13). In either case we conclude 0 ∈ Sm .
For the inductive step, assume that n ∈ Sm . We must show that n+ ∈ Sm . We
do this by considering the cases m ≤ n and m > n separately.
If m ≤ n then n ∈ Sm implies that n = m + k for some k ∈ N. We also have
m < n+ (using n < n+ and transitivity), so m ≤ n+ . It remains to observe that
n+ = (m + k)+ = m + k + . We conclude that n+ ∈ Sm .
If n < m then, as shown in the proof of Proposition 4.4.5, we have either n+ < m
or n+ = m. In the first case we have m ≰ n+ and we must show n+ ≠ m + k for
all k ∈ N, which follows from Proposition 6.2.13: n+ < m ≤ m + k. In the second
case we have both m ≤ n+ and n+ = m + 0. The conclusion n+ ∈ Sm follows in
either case.
6.2.15 Definition. For any natural numbers m ≤ n we call the natural number k
such that n = m + k the difference or subtraction of n and m, and write k = n − m.
Note that such a natural number k exists by Proposition 6.2.14, and is unique by
the Cancellation Law for Addition (Proposition 6.1.6).
6.11 Exercise. Carry out a variation of the proof of Proposition 6.2.14 showing
instead that, for any natural number n, the set
Sn = {m ∈ N | m ≤ n ↔ (∃k ∈ N)(n = m + k)}
is inductive.
6.12 Exercise. Let m and n be natural numbers such that m · n = 0. Prove that
m = 0 or n = 0.
6.13 Exercise. A natural number n is
• even if n = 2 · m for some natural number m and
• odd if n = 2 · m + 1 for some natural number m.
Prove that every natural number is either even or odd (exclusive or). Moreover,
show that m is unique.
6.2.16 Proposition. The following properties hold for all natural numbers a, b, c.
1. a + c < b + c ⇔ a < b.
2. If c > 0 then a · c < b · c ⇔ a < b.
Proof. Part 1. is merely a restatement of Proposition 6.1.5.
The special case a = 0 of part 2. follows immediately from Exercise 6.12 and
its obvious converse.
The general case is obtained from the special case as follows. First we show that
a < b ⇒ a · c < b · c (if c > 0). Indeed, a < b means b = a + b′ for some b′ > 0, hence
0 = 0·c < b′ ·c (by the special case already proved). Adding a·c to both sides of the
inequality preserves it, by part 1., giving a·c = a·c+0 < a·c+b′ ·c = (a+b′ )·c = b·c.
Now we show a · c < b · c ⇒ a < b (if c > 0) by proving its contrapositive
a ≥ b ⇒ a · c ≥ b · c. Indeed, a = b ⇒ a · c = b · c, whereas a > b ⇒ a · c > b · c (if
c > 0) by the above paragraph. Hence a ≥ b ⇒ a · c ≥ b · c is proved for all a, b, c
with c > 0, and we are finished.
6.14 Exercise (The Division Algorithm in N). For any natural numbers a, b
with b > 0 there exist natural numbers q, r satisfying
a=b·q+r and r < b.
Moreover, such numbers q, r are unique.
Suggestion: Consider the set R = {n ∈ N | (∃m ∈ N)(a = b · m + n)}. Show
that R has a least element r such that r < b; moreover a = b · q + r for some q ∈ N.
Uniqueness may be proved using properties of inequalities to derive a contradiction
from a hypothetical equality bq + r = bq′ + r′ with (q, r) ≠ (q′ , r′ ). (Note
that this exercise generalizes Exercise 6.13.)
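The suggestion translates directly into a (deliberately naive) computation. The sketch below is an illustration, not a solution of the exercise: it lists the elements n of R together with their witnesses m, and picks the least n. Python integers stand in for N, and the function name `divide` is ours.

```python
def divide(a, b):
    """Return (q, r) with a = b*q + r and r < b, via the least element of R."""
    if b <= 0:
        raise ValueError("b must be a positive natural number")
    # R = {n | a = b*m + n for some m}; since b >= 1, only m <= a can occur.
    candidates = [(a - b * m, m) for m in range(a + 1) if b * m <= a]
    r, q = min(candidates)        # least remainder, with its witness m = q
    return q, r

q, r = divide(17, 5)
```

Taking the least element of R forces r < b, because otherwise r − b would be a smaller element of R.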
6.2.17 Definition. For any natural numbers a, b with b ≠ 0, the natural number
q of Exercise 6.14 is called the (incomplete) quotient and r the remainder of the
(Euclidean) division of a by b.
In case r = 0, that is to say, if a = b · q for some q ∈ N, we say that b divides a,
or that a is a multiple of b, or that a is divisible by b. This relation is denoted by
b | a.
(It is sometimes convenient to allow b = 0 in the latter definition. Namely, 0 | a
just in case a = 0 · q = 0.)
Finally, if b | a (and b ≠ 0), the incomplete quotient q is actually a complete
quotient; it is denoted by a ÷ b or a/b (also written as a stacked fraction), and called the division of a by b.
Rather than proceeding to carry out a study of the formal properties of the sub-
traction of natural numbers defined above, we switch now to the proper setting in
which the operation is unconditionally defined (i.e., without the restriction m ≤ n
above). This setting is that of the integers. (Furthermore, extending the operation
of division a/b to the case b ∤ a will require introducing the rational numbers.)
(m, n) ≃ (m′ , n′ ) ↔ m + n′ = m′ + n
for all ordered pairs (m, n) and (m′ , n′ ) in N × N. Clearly, the domain and range
of ≃ are both N × N.
6.3.2 Proposition. The relation ≃ is an equivalence relation on N × N.
Proof. We show that ≃ is an equivalence relation:
Of course, we must show that the definition makes sense. We must show that the
result we obtain is independent of the choice of representative for each equivalence
class. This is stated below.
6.3.6 Proposition. For all m, n, m′ , n′ , a, b, a′ , b′ ∈ N, if
then (m + a, n + b) ≃ (m′ + a′ , n′ + b′ ).
Proof. The proof is left as an exercise.
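Before attempting the proof, the statement can be checked on a small range of representatives. The sketch below is a finite sanity check, not a proof; it represents an integer by a pair (m, n) of Python integers and uses componentwise addition of pairs, as in the conclusion of Proposition 6.3.6. The names `equiv` and `add` are ours.

```python
from itertools import product

def equiv(p, q):
    """(m, n) ≃ (m', n')  iff  m + n' = m' + n."""
    (m, n), (m2, n2) = p, q
    return m + n2 == m2 + n

def add(p, q):
    """Componentwise addition of representatives."""
    return (p[0] + q[0], p[1] + q[1])

pairs = list(product(range(4), repeat=2))

# Equivalent inputs must give equivalent sums.
well_defined = all(
    equiv(add(p, q), add(p2, q2))
    for p in pairs for p2 in pairs if equiv(p, p2)
    for q in pairs for q2 in pairs if equiv(q, q2)
)
```

The check passing on this range is consistent with the proposition, and suggests how the general proof goes: add the two defining equalities.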
6.3.7 Definition. Additive Inverses: For each integer [(m, n)]≃ define
Then, we have
The integer −[(m, n)]≃ is called the additive inverse of [(m, n)]≃ . Of course, we
need to show that each integer has a unique additive inverse. Note that −0Z = 0Z .
We write −nZ for the integer [(0, n)]≃
Every integer x = [(m, n)]≃ is either of the form x = kZ , for some k ∈ N, if m ≥ n;
or of the form x = −kZ , for some k ∈ N, if m < n.
6.3.8 Proposition. The operation +Z has the following properties.
x +Z y = 0Z .
Here, the product · in N is not written out explicitly. In other words, we may write
mn instead of m · n.)
6.3.11 Proposition. For all m, n, m′ , n′ , a, b, a′ , b′ ∈ N, if
(3) x +Z z = y +Z z → x = y.
(m, n) ≅ (m′ , n′ ) ↔ m · n′ = m′ · n
for all ordered pairs (m, n) and (m′ , n′ ) in Z × (Z \ {0}). Clearly, the domain and
range of ≅ are both Z × (Z \ {0}).
(1) m · n′ = m′ · n
and
Does (3) follow from (1) and (2)? It does: Thanks to (1), (2) and the
properties of multiplication we have
(m · n′′ ) · m′ = m · (n′′ · m′ )
= m · (m′ · n′′ )
= m · (m′′ · n′ )
= m · (n′ · m′′ )
= (m · n′ ) · m′′
= (m′ · n) · m′′
= (m′′ · n) · m′ .
Now, (3) follows from the above and the cancellation law for multiplication. We
have shown that ≅ is an equivalence relation.
6.4.3 Definition. A rational number is an element of the quotient set
Q = [Z × (Z \ {0})]/ ≅.
Two very special rationals are
0Q = [(0, 1)]≅ and 1Q = [(1, 1)]≅ .
(m, n) ≅ (m′ , n′ ) and (a, b) ≅ (a′ , b′ )
We can define division (÷) by any nonzero rational. The unofficial computation
(m/n) ÷ (m′/n′) = (m/n) · (n′/m′) = (mn′)/(nm′)
gives us the definition
[(m, n)]≅ ÷ [(m′ , n′ )]≅ = [(m, n)]≅ ·Q [(n′ , m′ )]≅ = [(mn′ , nm′ )]≅ .
with n > 0 and n′ > 0. This poses no difficulty. Thanks to the equality
[(a, b)]≅ = [(−a, −b)]≅ ,
(4) x +Q z <Q y +Q z → x = y.
(5) If z ̸= 0 then x ·Q z = y ·Q z → x = y.
Proof. The proof is left as an exercise.
6.4.14 Proposition. The function Φ : Z → Q : m ↦ [(m, 1)]≅
(1) is injective
and for all m, n ∈ Z we have
(2) Φ(m + n) = Φ(m) +Q Φ(n),
(5) Φ(0) = 0Q ,
(6) Φ(1) = 1Q ,
(3) Group A monoid where inverses exist is called a group. In other words,
(A, ∗, ι∗ ) is called a group if
(3) Abelian Group An Abelian group is a group where the operation com-
mutes. In other words, (A, ∗, ι∗ ) is an Abelian group if
S(X) = {f ∈ X^X | f is bijective}
is a group under composition ◦ with identity idX . Note that S(X) is a “submonoid”
of X^X in the obvious sense. In general, any monoid A contains a
group A^× (with the same identity and operation) consisting exactly of the
invertible elements of A. In the case of the monoid X^X , recall that invertible
functions are the same as bijective functions, by Theorem 5.3.7 and
Exercise 5.3.8, so (X^X)^× = S(X).
In the special case when X is a natural number n, one usually writes Sn for
S(n). This is the symmetric group on n symbols (or letters) (why “symbols”
or “letters” rather than “numbers” or “sets”, we know not.)
6.15 Exercise. Prove that the group S(X) is Abelian if and only if X contains
at most two distinct elements.
6.5.4 Definition. (Pi Notation) Let (A, ∗, ι∗ ) be a monoid. For any natural
number n and any family {ai | i ∈ n} (equivalently, {ai | i ∈ N ∧ i < n}) of
elements of A, indexed by n (this family is called an n-tuple of elements of A),
there exists a unique element in A, denoted
∏_{i∈n} ai   or   ∏_{0≤i<n} ai
and called the ∗-product of the n-tuple, such that the following properties are
satisfied:
• The product of a 0-tuple (the empty tuple, an empty subset of A) is the
identity element of A:
∏_{i∈0} ai = ι∗ .
where ∏_{i∈n} ai
denotes the product of the n-tuple obtained from the n+ -tuple {ai }i∈n+ by
removing its last element an .
We now prove the existence and uniqueness of the product operators ∏_{i∈n} for
all n ∈ N.
Formally, the set of all n-tuples is the set A^n of all functions σ : n → A. The
product operator on n-tuples should be a function ∏_{i∈n} from A^n to A.† In other
words, ∏_{i∈n} ∈ A^(A^n) (by convention, A^{A^n} means A^(A^n) and not (A^A)^n ). It remains
to construct a function π with domain N such that π(n) = ∏_{i∈n} ∈ A^(A^n) .
π0 (∅) = ι∗ .
Now we define fn : A^(A^n) → A^(A^(n+)) as follows. For σ ∈ A^(A^n) , let fn (σ) be the
function τ ∈ A^(A^(n+)) such that, for any {ai }i∈n+ :
• π(0) = π0 , and
• π(n+ ) = fn (π(n)).
We now write ∏_{i∈n} for π(n). Albeit more cryptic, the base and recursive
conditions imposed on π are equivalent to the two properties demanded of the
product operator in Definition 6.5.4. The existence and uniqueness of the product
of an n-tuple thus follow from those stated in Theorem 5.8.3.
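The recursive description of the product operator corresponds to a short recursive function. The sketch below assumes that the recursive clause multiplies the product of the first n entries by the last entry an on the right (the order matters when ∗ is not commutative; this reading is consistent with Exercise 6.18). Tuples stand in for n-tuples, and the name `monoid_product` is ours.

```python
def monoid_product(op, identity, a):
    """∗-product of the tuple a = (a_0, ..., a_{n-1}) in the monoid (A, op, identity)."""
    if len(a) == 0:
        return identity                       # product of the 0-tuple
    # Product of the tuple without its last element, times the last element.
    return op(monoid_product(op, identity, a[:-1]), a[-1])

# Two sample monoids: strings under concatenation, and (N, ·, 1).
word = monoid_product(lambda x, y: x + y, "", ("a", "b", "c"))
power = monoid_product(lambda x, y: x * y, 1, (3, 3, 3, 3))
```

The string example is a non-commutative monoid, so it also shows that the entries are combined in index order.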
6.5.5 Remarks.
• Our construction of the product operators ∏_{i∈n} above, simultaneously for all
n-tuples, is a bit abstract. See Exercises 6.16 and 6.17 below (and perhaps
their solutions) for a more direct approach.
i∈n+
(note it is n+ and not n).
The meaning and formal construction of more general such product opera-
tions as
∏_{i=m}^{n} ai
for any natural numbers (or integers) m ≤ n and set {ai }i∈I indexed by an
interval I = {i ∈ Z | m ≤ i ≤ n} should also be clear.
• When the operation ∗ is commutative, it is sensible to define the product
∏_{i∈I} ai
of any family {ai }i∈I indexed by a finite set I (not necessarily linearly or-
dered). Any such finite set is in bijection with a natural number n through
some function f : n → I, so {af (k) }k∈n is an n-tuple whose product is well
defined. It is routine, though tedious, to prove a generalized commutative
law stating that, under the hypothesis that ∗ is commutative, the n-tuples
corresponding to different bijections f all have the same product, which is
then defined to be the product of the I-tuple {ai }.
6.5.6 Example. To demonstrate the expressive power of the Pi Notation, note that
in the monoid (N, ·, 1) consisting of the natural numbers under multiplication, we
could have defined powers by
m^n = ∏_{i∈n} m
(i.e., m^n is the product of the constant n-tuple {ai }i∈n such that ai = m for
all i). See Example 6.5.13 (and also Exercise 6.20) for a definition of products in
a formally identical manner. See also Exercise 6.18 below for an adaptation of the
above definition to any monoid.
6.16 Exercise. Provide a direct proof of the uniqueness of the product operators
∏_{i∈n} : A^n → A by induction (rather than by appealing to the uniqueness stated
in the recursion theorem).
6.17 Exercise. Use the Bounded Recursion Theorem 5.8.5 to provide a definition
of ∏_{i∈n} ai .
6.18 Exercise. In a monoid (A, ∗, ι∗ ), for any element a ∈ A and any natural
number n ∈ N define the n-th ∗-power a^{∗n} (or simply a^n if the operation is clear
from context) by
a^n = ∏_{i∈n} a,
that is, a^n is the ∗-product of the constant n-tuple {ai | i ∈ n} with ai = a for all i.
• a^0 = ι∗ , and
• a^(m+) = a^m ∗ a for all m ∈ N.
6.19 Exercise. Let {ai | i ≤ n} be an n+ -tuple in a monoid (A, ∗, ι). Prove that
∏_{i≤n} ai = a0 ∗ ∏_{i<n} ai+1 .
6.5.7 Definition. Let f : X → X. For any natural number n we define the n-fold
composition f^(n) : X → X as the n-th power f^{◦n} of f in the monoid X^X (in the
notation of Exercise 6.18). This is equivalent to a recursive definition:
• f^(n+1) = f ◦ f^(n) .
n = σ^(n) (0).
m + n = σ^(n) (m).
(See Exercise 6.1, where the notation σn was used for σ^(n) .)
The above is the law of exponents σ^(m+n) = σ^(n) ◦ σ^(m) (applied to 0).
Similarly, the product m · n is n-fold repeated addition of m, starting from 0:
m · n = (σ^(m))^(n) (0),
which is the law of exponents σ^(m·n) = (σ^(m))^(n) (applied to 0).
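These identities are easy to try out. The sketch below models the n-fold composition by a loop, with Python integers standing in for N; the names `iterate` and `sigma` are ours.

```python
def iterate(f, n):
    """Return the n-fold composition of f (the identity when n == 0)."""
    def composed(x):
        for _ in range(n):
            x = f(x)
        return x
    return composed

def sigma(k):
    # Successor function on natural numbers.
    return k + 1

four = iterate(sigma, 4)(0)                        # n-fold successor of 0
three_plus_four = iterate(sigma, 4)(3)             # 4-fold successor of 3
three_times_four = iterate(iterate(sigma, 3), 4)(0)  # 4-fold "add 3", from 0
```

The last line applies the map "add 3" four times starting from 0, which is the description of 3 · 4 as repeated addition.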
6.5.9 Remark. Let an n-tuple of functions {fm : X → X | m ∈ n} be given.
Using Pi Notation in the monoid (X^X , ◦, idX ) we obtain a definition for the n-fold
composition ∏_{i∈n} fi as a function from X → X. One can, however, compose
functions whose domain and codomain are different sets. This motivates the following
definition.
6.5.10 Definition. Let an N-tuple of sets {Ai | i ∈ N} be given. For any n-tuple
of functions f = {fi : Ai → Ai+ | i ∈ n} one defines the composition of {fi },
denoted ∏◦_{i∈n} fi (or, informally, fn ◦ · · · ◦ f1 ◦ f0 ) recursively as follows:
• ∏◦_{i∈0} fi = idA0 (the composition of an empty family is the identity).
• ∏◦_{i∈n+} fi = fn ◦ ∏◦_{i∈n} fi . Note the order!
(Informally, fn+1 ◦ fn ◦ · · · ◦ f0 = fn+1 ◦ (fn ◦ · · · ◦ f0 )).
We will often drop the adornment ◦ from ∏◦ , hoping it will not lead to confusion.
Now we show that the composition operators above exist and are well defined.
The details are very similar to those for Pi Notation.
Let Fn be the set of all such n-tuples {fi }, where for each i ∈ n we have
fi : Ai → Ai+ .‡ Let Bn = (An)^(A0) be the set of all functions A0 → An . Let Cn = (Bn)^(Fn)
where ∑_{i∈n} ai
denotes the sum of the n-tuple obtained from the n+ -tuple {ai }i∈n+ by
removing its last element an .
‡ Fn = ∏_{i∈n} (Ai+)^(Ai) is a Cartesian product in the sense of Definition 7.2.2.
6.5.12 Remarks.
6.5.13 Example. In the monoid (N, +, 0) consisting of the natural numbers under
addition, we could have defined
m · n = ∑_{i∈n} m
(i.e., m · n is the sum of the constant n-tuple {ai }i∈n such that ai = m for all i).
See Exercise 6.20 below for an adaptation of this definition to any additive monoid.
6.20 Exercise. In an additive monoid (A, ⊕, 0A ), for any element a ∈ A and any
natural number n ∈ N define the n-th multiple n • a by
n • a = ∑_{i∈n} a,
that is, n • a is the ⊕-sum of the constant n-tuple {ai | i ∈ n} with ai = a for
all i. (We are denoting addition in A by ⊕ to preclude confusion below with the
addition m + n of natural numbers.)
(a) (m + n) • a = (m • a) ⊕ (n • a)
(the mixed distributivity of • over addition§ ), and
(b) (m · n) • a = m • (n • a) (the mixed associativity of • and ·).
§ The distributivity is mixed because the addition + in N is likely different from ⊕ in A.
x · (y + z) = (x · y) + (x · z)
for all x, y, z ∈ A.
x · (y + z) = (x · y) + (x · z)
for all x, y, z ∈ A,
then A is called a commutative ring with identity.
6.5.15 Remarks. • Note that “commutative” in the definition above refers to
the operation of multiplication of the ring; the operation of addition in any
ring is commutative. In fact, by convention throughout mathematics, any
operation conventionally called “addition” and denoted by the symbol ‘+’
must be both associative and commutative.
• Some authors call a ring with identity as defined above simply a ring, yet oth-
ers define a ring as a structure which does not (necessarily) have an identity.
For us, every ring shall have an identity, though for emphasis and clarity we
will usually stick to the clumsy nomenclature “ring with identity”.
6.5.16 Definition. Let (A, +, ·, 0A , 1A ) be a ring. There exists a unique function
ι : N → A such that ι(0) = 0A and ι(n + 1) = ι(n) + 1A . The function ι is uniquely
extended to a function Z → A by setting ι(−n) = −ι(n) for all n ∈ N. The image
ι∗ (Z) is called the prime ring of A. One can always identify an integer n ∈ Z with
its image ι(n) = ñ ∈ A.
6.5.17 Remark. The prime ring P of A is actually a ring in its own right (a
subring of A), though P need not be a subring isomorphic to Z (for one thing: ι
need not be injective).
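To see how ι can fail to be injective, consider a small finite ring. The sketch below uses the integers modulo 6 as the ring A; this example is our own choice and is not taken from the text. The map `iota` follows the recursion ι(0) = 0A, ι(n + 1) = ι(n) + 1A on natural numbers.

```python
MODULUS = 6          # hypothetical example: A = integers modulo 6
ZERO, ONE = 0, 1     # 0_A and 1_A

def ring_add(x, y):
    """Addition in A."""
    return (x + y) % MODULUS

def iota(n):
    """iota(0) = 0_A and iota(n + 1) = iota(n) + 1_A, for natural n."""
    value = ZERO
    for _ in range(n):
        value = ring_add(value, ONE)
    return value

image = sorted({iota(n) for n in range(20)})
```

Here ι(6) = ι(0), so ι is not injective, and the prime ring is all of A rather than a copy of Z.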
6.5.18 Definition. Integral Domain: A commutative ring with identity (A, +, ·, 0, 1)
is called an integral domain if it has no zero divisors, i.e.:
(∀x ∈ A)(∀y ∈ A)(x ≠ 0 ∧ y ≠ 0 ⇒ x · y ≠ 0).
(A divisor of zero in a ring A is an element x ≠ 0 in A such that x · y = 0 for some
y ≠ 0 in A.)
6.5.19 Remarks. (Z, +, ·, 0, 1) and (Q, +, ·, 0, 1) are both integral domains. The
latter has more structure, because all nonzero elements are invertible. The struc-
ture, called a field, is introduced below.
The rational numbers Q, the real numbers R and the complex numbers C,
(but not the natural numbers N or the integers Z) with the usual operations of
addition and multiplication, provide examples of a mathematical structure called
a field. The real numbers R shall be introduced in Note 8. The prime ring of each
of these rings is the ring Z itself (when we regard Z ⊂ Q ⊂ R ⊂ C).
6.23 Exercise. Let {ai | i < n} be an n-tuple in a ring with identity (R, +, ·, 0, 1),
and let c ∈ R. Prove the identity
c · ∑_{i<n} ai = ∑_{i<n} (c · ai ).
(1) x + y = x + z → y = z.
(2) x + y = x → y = 0.
(3) x + y = 0 → y = −x.
(4) −(−x) = x.
y = 0+y
= (−x + x) + y
= −x + (x + y)
= −x + (x + z)
= (−x + x) + z
= 0+z
= z.
(1) x ≠ 0 ∧ xy = xz → y = z.
(2) x ≠ 0 ∧ xy = x → y = 1.
(3) x ≠ 0 ∧ xy = 1 → y = x^(−1) .
(4) x ≠ 0 → [x^(−1)]^(−1) = x.
(2) x ≠ 0 ∧ y ≠ 0 → xy ≠ 0.
a contradiction.
We prove (3): Apply 5.10(3) to
to obtain (−x)y = −(xy), which is the first equality in (3). Apply 5.10(3) to
x(−y) + xy = x(−y + y) = x0 = 0
(4) x ≠ 0 → x^2 > 0.
(5) 1 > 0.
Proof. We prove (1): For x > 0 we have 0 = −x + x > −x + 0 = −x. Hence, 0 > −x.
For x < 0 we have 0 = −x + x < −x + 0 = −x. Hence, 0 < −x.
We prove (2): If z > y then z − y > y − y = 0, giving x(z − y) > 0 for x > 0.
Hence
xz = x(z − y) + xy > 0 + xy = xy.
We prove (3): Thanks to (1),(2) and 5.12(3) we have
(1) y = x − (x^2 − 2)/(x + 2) = (2x + 2)/(x + 2).
The above gives
If x ∈ A we have x^2 − 2 < 0. Then (1) gives y > x > 0 and (2) gives y^2 − 2 < 0,
i.e.: y ∈ A. If x ∈ B we have x^2 − 2 > 0. Then (1) gives 0 < y < x and (2) gives
y^2 − 2 > 0, i.e.: y ∈ B.
Suppose that A had a supremum sup A ∈ Q. Certainly sup A > 0; moreover,
sup A ∉ A since A has no largest element. We also have sup A ∉ B since B, all
of whose elements are upper bounds for A, itself has no smallest element. Hence
(sup A)^2 = 2 by trichotomy. Yet the square of no rational number is equal to 2,¶ a
contradiction. Similarly, inf B does not exist in Q.
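The map x ↦ y of formula (1) can be iterated with exact rational arithmetic to watch A fail to have a largest element. The sketch below assumes, as the argument above suggests, that A consists of the positive rationals x with x^2 < 2; the starting value 1 and the names `step` and `orbit` are ours.

```python
from fractions import Fraction

def step(x):
    """Formula (1): y = x - (x^2 - 2)/(x + 2), which equals (2x + 2)/(x + 2)."""
    return x - (x * x - 2) / (x + 2)

# Start from 1, an element of A, and apply the map repeatedly.
orbit = [Fraction(1)]
for _ in range(5):
    orbit.append(step(orbit[-1]))

increasing = all(a < b for a, b in zip(orbit, orbit[1:]))
stays_in_A = all(x > 0 and x * x < 2 for x in orbit)
```

Every element of the orbit lies in A and is strictly larger than the previous one (the first few are 1, 4/3, 7/5), so no element of the orbit can be the largest element of A.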
The failure of the rationals to be complete is the primary justification for con-
structing the real numbers.
¶
The unofficial proof of this fact, given in Note 1 long before we had formally defined Q, is
now official.
Contents
7.1 Introduction
7.2 Products and the Axiom of Choice
7.3 One-Sided Inverses
7.4 Countable and Uncountable Sets
7.1 Introduction
In this note we introduce a new axiom, the axiom of choice (abbreviated AC),
required to justify a line of reasoning that, while superficially appearing extremely
obvious and intuitive, leads to equivalent statements that cannot be proved from
the remaining axioms of ZF. Some consider AC an axiom separate from the axioms
of ZF and write ZF + AC (or ZFC) for the axiom system that results when the
axiom of choice is included with the other axioms of ZF.
The axiom of choice sneaks up, somewhat unexpectedly, when we follow what
appears to be a rather obvious line of reasoning. To illustrate this, consider the
following question: Suppose that we are given a surjection f : A → B. Does it
follow that there exists an injection g : B → A? Recall that it is not always possible
to define an injection from any set B to any set A. For example, if B = {0, 1} and
A = {0}, no injection from B to A exists.
The answer to the question is affirmative. We can discover the argument needed
for a proof by reasoning as follows. Suppose that f : A → B is surjective. Define
Note 7: The Axiom of Choice – July 28, 2020 7-2
Recall that 2 = {0, 1} (by definition) and that A^2 is the set of all functions
f : {0, 1} → A (also by definition). Define φ : A^2 → A × A as follows. Given any
function f : {0, 1} → A (any element of A^2), define
φ(f) = (f(0), f(1)) ∈ A × A.
The function φ is bijective. In fact, its inverse φ^{−1} : A × A → A^2 maps
(a, b) ∈ A × A to the function f : {0, 1} → A defined by f(0) = a, f(1) = b.
In other words, for every function f ∈ A^2 there is exactly one ordered pair
(a, b) ∈ A × A such that (a, b) = (f(0), f(1)). Further, for every (a, b) ∈ A × A
there is exactly one function g_{a,b} ∈ A^2 such that
g_{a,b}(0) = a and g_{a,b}(1) = b.
The sets A^2 and A × A are essentially interchangeable. (They are isomorphic.)
Is there an analogous reasoning we can use if we replace the product A × A with
the product A × B?
The isomorphism we are looking for is
{f ∈ (A ∪ B)^2 | f(0) ∈ A ∧ f(1) ∈ B} ∼ A × B.
To show that such an isomorphism exists, define
φ : {f ∈ (A ∪ B)^2 | f(0) ∈ A ∧ f(1) ∈ B} → A × B
as follows. The function
f : {0, 1} → A ∪ B, with f(0) ∈ A and f(1) ∈ B,
is mapped to the ordered pair
φ(f) = (f(0), f(1)) ∈ A × B.
The inverse
φ^{−1} : A × B → {f ∈ (A ∪ B)^2 | f(0) ∈ A ∧ f(1) ∈ B}
sends every ordered pair (a, b) ∈ A × B to the function f : {0, 1} → A ∪ B defined
by f(0) = a and f(1) = b.
We now do something not usually done in mathematics: we redefine A × B as
(CP 2) A × B = {f ∈ (A ∪ B)^2 | f(0) ∈ A ∧ f(1) ∈ B}.
If we agree that this redefinition presents a conflict only over form and not over
content, it is now clear how to keep going. The obvious thing to do is to define
A × B × C = {f ∈ (A ∪ B ∪ C)^3 | f(0) ∈ A ∧ f(1) ∈ B ∧ f(2) ∈ C},
then
A × B × C × D = {f ∈ (A ∪ B ∪ C ∪ D)^4 | f(0) ∈ A ∧ f(1) ∈ B ∧ f(2) ∈ C ∧ f(3) ∈ D},
and so on.
There is no reason why the above unions cannot be replaced by an arbitrary
union. All of this motivates the definition given below.
7.2.2 Definition. Cartesian Products: The Cartesian product of an indexed
set {A_i | i ∈ I} is
(CP 3) ∏_{i∈I} A_i = {f ∈ (∪_{i∈I} A_i)^I | f(i) ∈ A_i for all i ∈ I}.
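For a finite indexed family, definition (CP 3) can be made concrete by listing the functions f explicitly. The Python sketch below is an illustration under our own conventions: the family is modelled as a dict from indices to sets, each function f as a dict, and the name cartesian_product is hypothetical.

```python
from itertools import product

def cartesian_product(family: dict) -> list:
    """List every function f on the index set with f[i] in family[i] for all i."""
    indices = list(family)
    return [
        dict(zip(indices, values))
        for values in product(*(family[i] for i in indices))
    ]

family = {"i": {0, 1}, "j": {"a"}, "k": {2, 3}}
elements = cartesian_product(family)  # the 2 * 1 * 2 = 4 functions in the product
```

Each element of the result is a function on the index set in the sense of (CP 3). If one member of the family is empty, no such function exists and the list is empty.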
7.2.3 Remark. We cannot do away with (CP 1) and replace it with (CP 2).
Definition (CP 2) uses the notion function, which was defined using (CP 1). In
other words, a function f : A → B is a special subset of the Cartesian product
A × B, which was defined using (CP 1). If we replaced (CP 1) with (CP 2) we would
have a hopelessly circular definition.
The line of thought must be made clear: We use (CP 1) to define the cross
product of two sets, which we use to define functions, which we use to define arbi-
trary cross products via (CP 3). When in (CP 3) we have two sets, this definition
reduces to (CP 2), which differs from (CP 1) only in form, but not in substance
(thanks to the isomorphism we described above).
7.2.4 Remark. When the indexing set I in (CP 3) is empty, the corresponding
product is, of course, empty. The product is also empty when A_i is empty for some
i. If I ≠ ∅ and A_i ≠ ∅ for all i ∈ I, does it follow that ∏_{i∈I} A_i ≠ ∅? (Recall that
the indexing set I is arbitrary.) The following axiom, the axiom of choice,∗ guarantees
an affirmative answer.
7.2.5. Axiom of Choice AC1 : The nonempty Cartesian product of nonempty
sets is nonempty. In other words, for every indexed set {A_i | i ∈ I}, if I ≠ ∅ and
A_i ≠ ∅ for all i ∈ I then ∏_{i∈I} A_i ≠ ∅.
An equivalent way to state the axiom is the following. For every indexed set
{A_i | i ∈ I}, if I ≠ ∅ and A_i ≠ ∅ for all i ∈ I then there exists an indexed set
{a_i | i ∈ I} such that a_i ∈ A_i for all i ∈ I.
∗
The axiom of choice, formulated (not quite in the form given presently) in 1904 by Ernst
Zermelo, was not shown to be independent of the other axioms of set theory until 1963. This feat
was accomplished by Paul Cohen, while proving something even more amazing: the independence
of the continuum hypothesis. (See Cohen, P. J. “The Independence of the Continuum Hypoth-
esis,” Proc. Nat. Acad. Sci. USA, 50, 1143-1148, 1963; and Cohen, P. J. “The Independence of
the Continuum Hypothesis. II,” Proc. Nat. Acad. Sci. USA 51, 105-110, 1964.)
7.2.6 Remark. The name of the above axiom may seem strange (why “axiom
of choice” and not “axiom of products?”).
Let A be a nonempty collection of nonempty sets. (Think of A as an indexed
set.) The axiom of choice says that the product
∏_{a∈A} a = {f ∈ (∪_{a∈A} a)^A | f(a) ∈ a for all a ∈ A}
is nonempty.
In other words, a choice function “chooses” an element f (a) from every member a
of A.
7.2.8 Remark. From the above discussion, we see that AC1 above implies AC2
below.
7.2.9. Axiom of Choice AC2 : Every nonempty collection of nonempty sets has
a choice function.
7.2.10 Remark. When A = P(X) \ {∅}, the set of all nonempty subsets of
some set X, we have ∪_{a∈A} a = ∪_{x∈P(X)\{∅}} x = X and AC2 says that there exists
a function
f : P(X) \ {∅} → X such that f (x) ∈ x.
The function “chooses” an element f (x) from each nonempty subset x of X. We
have proved that AC2 implies AC3 given below.
7.2.11. Axiom of Choice AC3 : For every nonempty set A there is a function
f : P(A) \ {∅} → A such that f (A0 ) ∈ A0 for every nonempty subset A0 of A.
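For a finite set no axiom is needed, since a choice function can be written down by an explicit rule. The following Python sketch illustrates the shape of the function f in AC3 under that finiteness assumption; the rule "take the least element" and the names nonempty_subsets and choice_function are our own choices for illustration.

```python
from itertools import combinations

def nonempty_subsets(xs) -> list:
    """Return all nonempty subsets of the finite set xs, as frozensets."""
    items = sorted(xs)
    return [
        frozenset(c)
        for r in range(1, len(items) + 1)
        for c in combinations(items, r)
    ]

def choice_function(xs) -> dict:
    """Map each nonempty subset of xs to one of its elements (here: the least)."""
    return {s: min(s) for s in nonempty_subsets(xs)}

f = choice_function({3, 1, 2})  # defined on the 7 nonempty subsets of {1, 2, 3}
```

The axiom is needed precisely when no such uniform rule is available, for instance for arbitrary subsets of a set that carries no well-ordering we can name.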
7.2.12 Remark. It turns out that not only AC1 ⇒ AC2 ⇒ AC3 , but that the
three are equivalent. This point shall be discussed later. Warning: Some call f
in AC3 a choice function for A, instead of properly calling it a choice function for
P(A) \ {∅}.
[Figure: diagram of the sets A and B with the maps fL, idA and idB.]
φ : P(A) \ {∅} → A
f ◦ = {(b, a) ∈ B × A | (a, b) ∈ f }.
dom f ◦ = B,
ran f ◦ ⊂ A.
dom g = dom f ◦ = B.
then we have
fL = fL ◦ idB
= fL ◦ (f ◦ fR )
= (fL ◦ f ) ◦ fR
= idA ◦ fR
= fR .
With x = 1, we have
We have a contradiction.
Proof. Let A, B be finite sets. Then A ∪ B is the union of the disjoint sets A and
B \A, and the latter is finite being a subset of the finite set B, by Proposition 7.4.2.
Hence we may assume at the outset, without loss of generality, that A and B
are disjoint, with A ∼ m and B ∼ n. Under this assumption, we claim that
A ∪ B ∼ m + n.
Note first that k ∈ m + n and k ≥ m imply k = m + l for a unique l ∈ n.
Given bijections f : m → A and g : n → B we define h : m + n → A ∪ B by:
h(k) = f(k) if k < m;
h(k) = g(l) if m ≤ k = m + l.
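The piecewise definition of h can be sketched in Python by modelling the bijections f : m → A and g : n → B as lists indexed from 0. The function name glue and the sample sets are ours, chosen only for illustration.

```python
def glue(f: list, g: list) -> list:
    """Return h on m + n with h(k) = f(k) for k < m and h(m + l) = g(l) otherwise."""
    m, n = len(f), len(g)
    return [f[k] if k < m else g[k - m] for k in range(m + n)]

A = ["a", "b", "c"]  # enumerated by f : 3 -> A
B = ["x", "y"]       # enumerated by g : 2 -> B, disjoint from A
h = glue(A, B)       # enumerates A ∪ B by 3 + 2 = 5
```

Because A and B are disjoint and f and g are bijections, h takes every value in A ∪ B exactly once.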
What is wrong with the above argument? The induction principle says that if
S is a set such that 0 ∈ S and such that n ∈ S ⇒ n+ ∈ S for every n ∈ N, then
S = N. Exactly how do we define such a set S? The answer is given below.
7.4.8 Proposition. Every infinite set includes a countably infinite subset.
Proof. Let A be an infinite set and let f : P(A) \ {∅} → A be a choice function for
A.
Let P denote the collection of all finite subsets of A. Since A is infinite, it
follows that if B ∈ P then A \ B is a nonempty subset of A and that, therefore,
A \ B is an element of P(A) \ {∅}.
Define
g : P → P : B ↦ B ∪ {f(A \ B)}.
Thanks to the recursion theorem, there is a function φ : N → P such that φ(0) = ∅
and such that
φ(n^+) = g(φ(n)) = φ(n) ∪ {f(A \ φ(n))}
for every n ∈ N.
Define h : N → A : n ↦ f(A \ φ(n)). We prove that h is injective to establish
that N is equivalent to a subset of A.
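The recursion defining φ and h can be unfolded step by step. In the Python sketch below we take A = N and replace the choice function f by an explicit stand-in that returns the least odd number outside a given finite set; both this stand-in and the names phi_and_h and least_odd_outside are assumptions made for illustration only.

```python
def phi_and_h(choose, steps: int) -> list:
    """Unfold phi(0) = ∅, phi(n+) = phi(n) ∪ {choose(phi(n))}; return h(0), ..., h(steps-1)."""
    phi = frozenset()
    h = []
    for _ in range(steps):
        a = choose(phi)   # plays the role of f(A \ phi(n))
        h.append(a)       # this is h(n)
        phi = phi | {a}   # this is phi(n+)
    return h

def least_odd_outside(B: frozenset) -> int:
    """Stand-in for f(A \\ B) with A = N: the least odd number not in B."""
    k = 1
    while k in B:
        k += 2
    return k

h = phi_and_h(least_odd_outside, 6)
```

Since each chosen element lies outside the set built so far, no value is ever repeated, which is the injectivity of h in miniature.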
Contents
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-1
8.2 The Reals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
8.3 Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-6
8.4 Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . 8-12
8.5 The Real Field, Density . . . . . . . . . . . . . . . . . . . . . 8-15
8.6 Monotonic Functions. No-Gaps Condition. . . . . . . . . . . 8-21
8.7 Powers, Roots, Exponentials and Logarithms . . . . . . . . . 8-27
8.8 Sequences and Numerals . . . . . . . . . . . . . . . . . . . . 8-37
8.1 Introduction
Note: In this note the symbol ⇒ is used to denote the sentential connective →
and also to denote a tautological implication (logical consequence). The symbol
⇔ will be used analogously (for esthetic symmetry). The purpose is to release →
from performing triple duty. Thus, → will be used to denote limits (e.g.: 1/x → 0
as x → +∞) and as a function symbol (e.g.: f : A → B).
Recall (see Note 6) that an ordered field (A, +, · , 0, 1, <) is a field (A, +, · , 0, 1)
together with a strict ordering∗ < of A such that for all a, b, c ∈ A we have
∗
Per Note 6, an ordering is a relation that is asymmetric and transitive. Do not confuse
“asymmetric” with “antisymmetric.” To say that < is asymmetric means that a < b ⇒ ¬(b < a).
Note 8: The Real Numbers – July 28, 2020 8-2
• is called dense if for all a and b in A such that a < b there is a c ∈ A such
that a < c < b; and
In Note 6 we proved that the rationals Q are dense, but not complete. That Q
is dense is easy to see: Given a, b ∈ Q with a < b, we have
a = (a + a)/2 < (a + b)/2 < (b + b)/2 = b.
Therefore, by choosing c = (a + b)/2, we have a < c < b with c ∈ Q. To show that
Q fails to be complete is a little more subtle, but quite elementary: We define A
to be the set of all positive rationals a such that a^2 < 2 and define B to be the
set of all positive rationals b such that 2 < b^2. As shown in Remark 6.5.31, neither
sup A nor inf B exists in Q.
That Q fails to be complete means that not every subset of Q that is bounded
above has a least upper bound and that not every subset of Q that is bounded
below has a greatest lower bound. The reals are constructed to be complete.
x∈B∧y ⊏x⇒y ∈B
x∈B∧x⊏y ⇒y ∈B
Note that some of the consequences of B being, say, an initial segment, are
x ∈ B ∧ y ∉ B ⇒ x ⊏ y and x ∉ B ∧ x ⊏ y ⇒ y ∉ B.
Clearly, A<x and A≤x are initial segments of A whereas A>x and A≥x are final
segments.
8.2.3 Remarks. We constructed the integers Z from the natural numbers N by
using two natural numbers to name an integer. Then, we constructed the rationals
Q from the integers Z by using two integers to name a rational number. The
construction of the reals from the rationals is more complicated: infinitely many
rationals are needed to name a real.
8.2.4 Definition. Dedekind Cuts: A Dedekind cut is a nonempty proper
initial segment of Q without a largest element. That is, a Dedekind cut is a subset
x of Q such that
(2) x has no largest element, i.e.: for every a ∈ x there is a b ∈ x such that a < b;
8.2.5 Definition. The Real Numbers: The set R of real numbers is defined
as
R = {x ∈ P(Q) | x is a Dedekind cut}.
8.2.6 Definition (Ordering of R). Define the ordering <R on R by
x <R y ⇔ x ⊂ y ∧ x ̸= y.
x ≤R y ⇔ x ⊂ y.
x <R y, x = y, y <R x,
holds. Clearly, at most one of them holds. We finish the proof by showing that
at least one of them holds. To do this, suppose that x <R y and x = y are both
false. That is, suppose that x ̸⊂ y. We must show that y ⊂ x. Since x ̸⊂ y, there
is some rational r in x \ y. Choose any s ∈ y. If r ≤ s, then, since y is an initial
segment, we have r ∈ y, contradicting r ∈ x \ y. Therefore, we have s < r. Then,
since r ∈ x and x is an initial segment, we have s ∈ x.
We have shown that s ∈ y ⇒ s ∈ x. Since x ̸= y and s is arbitrary, we have
y ⊂ x.
8.2.8 Proposition. Least Upper Bound Property: Every nonempty subset
of R that is bounded above has a unique least upper bound.
Proof. Suppose that A is a nonempty subset of R that is bounded above. We prove
that ∪A is a least upper bound. By trichotomy, it will be unique.
First, we prove that ∪A is a Dedekind cut.
We prove that ∪A ̸= ∅. Since A ̸= ∅, we have a ∈ A for some a ∈ R. Note that
a ̸= ∅ because the empty set is not a Dedekind cut. Since ∅ ̸= a ⊂ ∪A, we have
∪A ̸= ∅.
We prove that ∪A ̸= Q. Let b∗ be an upper bound of A. Then b∗ ̸= Q. Since
x ∈ A ⇒ x ⊂ b∗ we have ∪A ⊂ b∗ . Therefore ∪A ̸= Q.
We prove that ∪A has no largest element. For every x ∈ ∪A we have x ∈ a for
some a ∈ A. Since a is a Dedekind cut, it has no largest element. Hence, x < y
for some y ∈ a. This means that y ∈ ∪A, showing that for all x ∈ ∪A there is a
y ∈ ∪A with x < y.
Note that for a ≥ b we have (a, b) = (a, b] = [a, b) = ∅, the empty set. For a > b
we have [a, b] = ∅; on the other hand, [a, a] = {a}, the set with the single point a.
The notation (a, b) is ambiguous: it can denote an open interval or an ordered pair.
Context tells which.
8.2.11 Proposition. Intervals of R can be characterized as the subsets I ⊂ R
having the following intermediacy property:
8.3 Addition
We begin by defining the additive identity of R.
8.3.1 Definition. Real Zero: Define 0R to be the set of all negative rationals:
0R = {a ∈ Q | a < 0}.
8.3.2 Definition. Real Addition: Addition +R of two reals is defined by
x +R y = {a + b ∈ Q | a ∈ x ∧ b ∈ y}.
8.3.3 Proposition. x +R y is a Dedekind cut for all x, y ∈ R.
Proof. First, we prove that x +R y ≠ ∅ and x +R y ≠ Q. Since x and y are cuts,
neither is empty. Therefore, x +R y ≠ ∅. Further, x ≠ Q and y ≠ Q. Take a ∈ x
and b ∈ y. If a′ ∉ x and b′ ∉ y then a′ > a and b′ > b. Hence, a′ + b′ > a + b, giving
a′ + b′ ∉ x +R y. This shows that x +R y ≠ Q.
Next, we prove that x +R y has no largest element. Choose c ∈ x +R y. Then
c = a + b with a ∈ x and b ∈ y. Take r ∈ x with r > a. Then c < r + b and
r + b ∈ x +R y. This shows that x +R y has no largest element.
We prove that x +R y is an initial segment. Choose c ∈ x +R y. Then c = a + b
for some a ∈ x and b ∈ y. If d < c then d − b < a, giving d − b ∈ x and
d = (d − b) + b ∈ x +R y. This shows that x +R y is an initial segment.
8.3.4 Definition. The Negative of a Real Number For every real x we
want to define the real −x in such a way that x +R (−x) = 0R .
Guided by Figure 1, we may be tempted to define −x so that
a ∈ (−x) ⇔ −a ∉ x
for all a ∈ Q, i.e.: −x = {a ∈ Q | −a ∉ x}.
[Figure 1: the rational line Q with marked points a, 0 and −a; one brace marks the cut x and another marks −x.]
Unfortunately, −x defined in this way may not be a cut because it may contain
a largest element. For example, we would have
−0R = {a ∈ Q | −a ∉ 0R}
= {a ∈ Q | −a ≥ 0}
= {a ∈ Q | a ≤ 0},
which is not a cut, since 0 is the largest element of −0R and 0 ∈ (−0R ). Therefore
we modify things a little and define
−x = {b ∈ Q | there exists c ∈ Q with c > b and −c ∉ x},
which does the trick quite nicely, as we prove in Proposition 8.3.6 below.
8.3.5 Proposition. For any Dedekind cut x ∈ R
Before starting the proof of this easy proposition, we remark that it almost
characterizes (−x). In fact, the set of b ∈ Q for which −b > a for all a ∈ x is
exactly our early (not quite correct) definition {b ∈ Q | −b ∉ x} of (−x). The
proposition is useful because it is often all that is needed to prove one half of an
identity involving (−x). The proof will not use the fact that (−x) is a Dedekind
cut; this is established in Proposition 8.3.6 below.
Proof. It suffices to prove the first statement since it is tautologically equivalent to
the second and third (in view of ¬(a < −b) ⇔ a ≥ −b, by trichotomy).
Let a ∈ x and b ∈ (−x). By definition, there exists c > b with −c ∉ x. Since
c > b we have −b > −c; since x is an initial segment and −c ∉ x we have −b ∉ x
and −b > a.
8.3.6 Proposition. If x ∈ R then −x ∈ R.
Proof. We prove that if x is a cut then −x is also a cut.
be shown.
8.3.7 Proposition. For every positive rational c and every real x there is some
a ∈ x such that a + c ∉ x.
Proof. Let c and x be fixed and given as in the statement.
Fix a rational number b ∈ x. Let
S = {n ∈ N | b + nc ∉ x}.
(2) x ≤R y ⇒ x +R z ≤R y +R z.
x +R z = {a + b ∈ Q | a ∈ x ∧ b ∈ z},
gives
x +R z = {a + b ∈ Q | a ∈ x ∧ b ∈ z} ⊂ {a + b ∈ Q | a ∈ y ∧ b ∈ z} = y +R z,
(3) x ̸= y ⇒ x +R z ̸= y +R z
y +R z = {a + b ∈ Q | a ∈ y ∧ b ∈ z} ⊂ {a + b ∈ Q | a ∈ x ∧ b ∈ z} = x +R z,
x −R y = x +R (−y).
8.4 Multiplication
We begin by defining the multiplicative identity of R.
8.4.1 Definition. Real One Define 1R = {a ∈ Q | a < 1}.
8.4.2 Definition. Real Multiplication It may be tempting to define multipli-
cation ·R by
x ·R y = {ab | a ∈ x ∧ b ∈ y}.
Unfortunately, the above does not work: Since x and y are both initial segments,
they both contain negative rationals of arbitrarily large magnitude. This results in
x ·R y containing arbitrarily large rationals. Therefore we proceed as follows.
We define 0R ·R x = x ·R 0R = 0R . For 0R <R x and 0R <R y, we define
(5) There are multiplicative inverses: For every x ̸= 0R there exists y ̸= 0R such
that x ·R y = 1R . (In fact, y is x−1 as defined above.)
1. 0R ·R x = 0R for all x ∈ R.
y ·R z = y ·R z +R 0R
= y ·R z +R (−x) ·R z +R x ·R z
>R 0R +R x ·R z
= x ·R z.
cA = {c ·R x | x ∈ A} .
Q∗ = {aR ∈ R | a ∈ Q} = range of Φ.
In Note 6 we proved that < orders N well. This means that every nonempty subset
of N has a least element, i.e.: for every nonempty subset S of N there is an n0 ∈ S
such that n < n0 ⇒ n ∉ S.
8.5.3 Remark. Henceforth we will never again use the set-theoretical construction
of R as a set of Dedekind cuts. All we shall ever use in subsequent proofs is the
fact that R is a complete ordered field possessing an isomorphically embedded copy
Q∗ of Q. Analysis, in the classical sense, is the study of the reals as a structure
possessing the above properties, rather than the study of any specific set possessing
that structure. It is, of course, far from obvious that such an ordered complete field
should exist at all, and it is precisely to show the existence of any such structure
at all that we have provided a particular example constructed via Dedekind cuts.
The reader is henceforth urged to rely only on the structural rather than set-
theoretical properties of R in subsequent proofs. Of course any new results proved
in this fashion may also be used subsequently. As a first example, the Archimedean
property of R is proved in Proposition 8.5.9 below, using only the ordering and
completeness of the field R.
8.5.4 Proposition. The set R \ Q of irrational numbers is nonempty. In fact,
there exists a unique real number z such that z > 0 and z^2 = 2, and we have
z ∈ R \ Q. (The number z is conventionally denoted √2.)
Proof. Strictly speaking, we already know that Q must be a proper subset of R
since R is complete but Q is not (Remark 6.5.31). Since this fact is so important,
we presently choose to reproduce the argument in a slightly different form, working
directly in R.
Let
A = {x ∈ (0, +∞) | x^2 < 2}.
Then A is nonempty (since 1 ∈ A) and bounded above by 2 because 2^2 > 2 and
the function x ↦ x^2 is strictly increasing† in (0, +∞). Let z = sup A. We claim
that z^2 = 2. Certainly z is positive. Define
(8.2) y = z − (z^2 − 2)/(z + 2) = (2z + 2)/(z + 2).
The above gives
(8.3) y^2 − 2 = 2(z^2 − 2)/(z + 2)^2.
If z^2 < 2 then (8.2) gives y > z > 0 and (8.3) gives y^2 < 2, so y ∈ A,
contradicting the upper bound property of z. If z^2 > 2 then (8.2) gives 0 < y < z
and (8.3) gives y^2 > 2, showing (by strict monotonicity) that y is an upper bound
for A and contradicting the minimality of z.
† See Definition 8.6.1 below. The strict monotonicity is shown as follows: 0 < x < x′ ⇒ x^2 =
Thus z^2 = 2. Recalling that the latter equation has no solution z in Q (Re-
mark 6.5.31) we conclude that z is irrational. Conventionally one writes √2 for z
and the equation z^2 = 2 as (√2)^2 = 2: z is the (positive) square root of 2. Note
that the strict monotonicity of x ↦ x^2 on [0, +∞) implies that √2 is uniquely
defined.
8.15 Exercise. Prove that the set of irrational numbers is closed under neither
addition nor multiplication. (Hence R \ Q is not a ring.)
8.5.5 Proposition. For all real nonnegative x there exists a positive integer n
such that n − 1 ≤ x < n. In particular, N+ is unbounded above (and so is N).
Proof. First we prove that N+ is unbounded above. Assume, for the sake of
contradiction, that N+ is bounded above. Then z = sup N+ exists by
the completeness property of R, Proposition 8.2.8 (which is the key ingredient of
the present proof). Since z is the least upper bound, z − 1 is not an upper bound,
so there is n ∈ N+ such that z − 1 < n. Then
z < n + 1, and n + 1 ∈ N+, contradicting that z is an upper bound for N+. The
contradiction shows that N+ is unbounded above (and so is N ⊃ N+).
Now fix x nonnegative. Since N+ is unbounded above, there exist positive
integers n > x. Since N is well-ordered, there exists a least n ∈ N
satisfying x < n. Let n0 be such least n. We have n0 ≠ 0 since x ≥ 0, so
n0 − 1 ∈ N and, by minimality of n0, n0 − 1 ≤ x. Then n0 − 1 ≤ x < n0.
8.5.6 Corollary. For all real numbers x there exists a unique integer n such
that n ≤ x < n + 1. In particular, Z is unbounded above and below.
Proof. If x ≥ 0 there exists m ∈ Z+ with m − 1 ≤ x < m, so let n = m − 1. If
x ∈ Z− let n = x. If x < 0 and x is not an integer then m − 1 ≤ −x < m, hence
−m < x ≤ −m + 1, for some positive integer m. Moreover x ≠ −m + 1 since x ∉ Z, so let
n = −m.
To show n is unique let n′ be another such. If n ̸= n′ then, without loss of
generality, we may assume n < n′ , so x < n + 1 ≤ n′ ≤ x, a contradiction. Hence
n = n′ .
8.5.7 Notation. For each x ∈ R, the unique integer n such that n ≤ x < n + 1
is denoted by ⌊x⌋ and called the largest or greatest integer in x or sometimes the
integer part of x.
The real number x − ⌊x⌋ ∈ [0, 1) is called the fractional part of x and denoted
*x+ in these Notes. Observe that the identity x = ⌊x⌋ + *x+ uniquely determines
the numbers ⌊x⌋ and *x+ subject to the conditions ⌊x⌋ ∈ Z and *x+ ∈ [0, 1).
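The decomposition of x into integer part and fractional part can be tried out on exact rationals. The Python sketch below uses the standard library only; the function name integer_and_fractional_part is ours, and rationals stand in for reals so that the arithmetic is exact.

```python
import math
from fractions import Fraction

def integer_and_fractional_part(x: Fraction):
    """Return (n, r) with n the unique integer satisfying n <= x < n + 1 and r = x - n."""
    n = math.floor(x)
    return n, x - n

n_neg, r_neg = integer_and_fractional_part(Fraction(-7, 2))
n_pos, r_pos = integer_and_fractional_part(Fraction(5))
```

Note that for negative non-integers the integer part lies below x, so that the fractional part always lands in [0, 1).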
8.5.8 Corollary. inf{1/n | n ∈ N+ } = 0.
(i) If 0 < x < y then y −x > 0 and the Archimedean property gives the existence
of a positive integer n such that 1 < n(y − x). Hence,
(2) m − 1 ≤ nx < m.
From (2) and (3) we obtain nx < m < ny. The desired rational is a = m/n.
(ii) If x ≤ 0 and y > 0, take a = 1/n for sufficiently large n. Then, x < a < y.
(iii) If x < y < 0 then −x > −y > 0. Thanks to (i), with x replaced by −y and
y replaced by −x, we obtain a rational a with −y < a < −x, i.e.: such that
x < −a < y.
(iv) If x < 0 and y ≥ 0, take a = −1/n for sufficiently large n. Then, x < a < y.
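The construction in case (i) can be sketched in Python: choose n with 1 < n(y − x), then the integer m with m − 1 ≤ nx < m, and return m/n. The function name rational_between is ours, and exact rationals stand in for the reals x and y. As an observation of ours (not a claim of the Notes), the same two steps also produce a valid rational when x or y is negative, although the proof above treats those cases separately.

```python
import math
from fractions import Fraction

def rational_between(x, y) -> Fraction:
    """Return a rational a = m/n with x < a < y, following the steps of case (i)."""
    if not x < y:
        raise ValueError("need x < y")
    n = math.floor(1 / (y - x)) + 1  # Archimedean step: 1 < n(y - x)
    m = math.floor(n * x) + 1        # integer with m - 1 <= n x < m
    return Fraction(m, n)            # then n x < m <= n x + 1 < n y

a = rational_between(Fraction(1, 3), Fraction(2, 5))
b = rational_between(Fraction(-5, 7), Fraction(-2, 3))
```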
8.5.13 Corollary (The irrationals are dense in R). For every two real
numbers a and b with a < b there is an irrational x such that a < x < b.
Proof. Let reals a and b with a < b be given. By Proposition 8.5.12, there exists
c ∈ Q with
a/√2 < c < b/√2.
Hence, a < c√2 < b.
We remark that we can always choose c ≠ 0 without loss of generality. For, if
a < 0 < b, we can take c = 1/n or c = −1/n for sufficiently large n ∈ N+.
Assuming thus c ≠ 0 we take x = c√2 noting that x must be irrational, since
if x were rational so would be x/c = √2, a contradiction.
8.16 Exercise. For any integers n, k with n ≥ 2, prove that ⁿ√k is either an
integer m (this means that k = m^n is a perfect n-th power) or else ⁿ√k is an
irrational number. Your proof may rely on the Fundamental Theorem of Arithmetic
(existence and uniqueness of factorization of positive integers into primes) as well as
the existence and uniqueness of n-th roots (a fact proved below in Section 8.7.)
8.17 Exercise. Call two nonzero real numbers x, y commensurable if y/x is ratio-
nal; otherwise x, y are incommensurable.
8.19 Exercise. Let any two real numbers a, b be given with a < b.
For each n ≥ 2, prove that n pairwise mutually incommensurable numbers exist
in (a, b).
8.20 Exercise. For any n ∈ N call n real numbers u1 , u2 , . . . , un additively inde-
pendent if, for all integers c1 , c2 , . . . , cn ,
c1 u1 + c2 u2 + · · · + cn un = 0 ⇒ c1 = c2 = · · · = cn = 0.
(For n = 0 the equation above is the identity 0 = 0, since an
empty sum is 0 by definition, hence ∅ is a set of numbers additively indepen-
dent.)
An arbitrary subset S ⊂ R is called additively independent if every n (distinct)
elements of S are additively independent.
Show the following:
1. Any subset of a set of additively independent numbers is additively indepen-
dent.
(For n = 0 the left-hand side of the equation above is the identity 1 = 1, since
an empty product is 1 by definition, hence ∅ is a set of numbers multiplicatively
independent.)
An arbitrary subset S ⊂ R \ {0} is called multiplicatively independent if every
n (distinct) elements of S are multiplicatively independent.
Assuming the results from section 8.7 (specifically, the Existence and Unique-
ness of Roots and the algebraic laws of powers, logarithms and exponents), prove
the following:
1. If S is an additively independent set, then T = {2x | x ∈ S} is a multiplica-
tively independent set.
Proof. We provide the details only in the case when f is strictly increasing and g
strictly decreasing. The remaining cases are all cosmetic variations of this one.
Let x, y ∈ Dom(g ◦ f ), x < y. Then f (x) < f (y) by the strict monotonicity of
f , and
• If f is increasing:
sup_{x∈S_{<c}} f(x) ≤ inf_{x∈S_{>c}} f(x).
• If f is decreasing:
sup_{x∈S_{>c}} f(x) ≤ inf_{x∈S_{<c}} f(x).
• sup_{x∈T_{<c}} f(x) = sup_{x∈S_{<c}} f(x) and inf_{x∈T_{>c}} f(x) = inf_{x∈S_{>c}} f(x), if f is
increasing, and
• sup_{x∈T_{>c}} f(x) = sup_{x∈S_{>c}} f(x) and inf_{x∈T_{<c}} f(x) = inf_{x∈S_{<c}} f(x), if f is
decreasing.
Note that the conclusions hold for all c ∈ I, not merely for all c ∈ S.
Proof. Since T is dense in S, it is also dense in I, so T_{>c} and T_{<c} are also nonempty.
Let f be increasing (resp., decreasing). Since T ⊂ S, T_{>c} ⊂ S_{>c} and T_{<c} ⊂ S_{<c},
then sup_{x∈T_{<c}} f(x) ≤ sup_{x∈S_{<c}} f(x) and inf_{x∈T_{>c}} f(x) ≥ inf_{x∈S_{>c}} f(x)
(resp., sup_{x∈T_{>c}} f(x) ≤ sup_{x∈S_{>c}} f(x) and
inf_{x∈T_{<c}} f(x) ≥ inf_{x∈S_{<c}} f(x)). The reverse inequalities follow from the monotonicity of f
noting that for all x, y ∈ S such that x < c < y, the density of T implies that there
exist x′, y′ ∈ T with x < x′ < c < y′ < y.
8.6.8 Definition. No-Gaps Condition. Let I ⊂ R be a nonempty open inter-
val, S a dense subset of I, and f : S → R a monotonic function on S. We say that
f satisfies the No-Gaps condition on I if, for every c ∈ I:
• If f is increasing:
sup_{x∈S_{<c}} f(x) = inf_{x∈S_{>c}} f(x).
• If f is decreasing:
sup_{x∈S_{>c}} f(x) = inf_{x∈S_{<c}} f(x).
(Note that, in view of Proposition 8.6.6, the equalities above can both be relaxed
to inequalities ≥.)
8.6.9 Remarks. • By Proposition 8.6.6, if f is defined at c then the common
value of the supremum and infimum in Definition 8.6.8 is f (c).
Therefore all of the above are equalities and w = f (c) showing that w ∈ J. Since
w ∈ (α, β) was arbitrary, this shows that (α, β) ⊂ J and concludes the proof.
8.6.13 Proposition. Let f be a strictly monotonic function defined on a nonempty
open interval I. Assume that f satisfies the No-Gaps condition. Then:
• The range of f is an open interval J (with endpoints given in the statement
of Proposition 8.6.12);
Finally, let γ ∈ J and c = f^{−1}(γ). Since both f and f^{−1} are strictly in-
creasing, I_{<c} = (f^{−1})_∗(J_{<γ}) and I_{>c} = (f^{−1})_∗(J_{>γ}). Therefore inf_{y>γ} f^{−1}(y) =
inf (f^{−1})_∗(J_{>γ}) = inf I_{>c} = c = sup I_{<c} = sup (f^{−1})_∗(J_{<γ}) = sup_{y<γ} f^{−1}(y), so f^{−1}
satisfies the No-Gaps condition.
b^n > 1.
Proof. This is an easy induction on n, the n = 1 case being the tautology b > 1 ⇒
b > 1. Assume b^n > 1 for some n ∈ N+. Multiplying both sides by the positive
real number b we obtain b^{n+1} > b. By the hypothesis b > 1 and transitivity we
obtain b^{n+1} > 1.
8.7.4 Proposition. The function f : R+ → R+ defined by f(x) = x^{−1} is strictly
decreasing.
Proof. Multiply both sides of x < y by y^{−1} x^{−1} to get y^{−1} < x^{−1}.
8.7.5 Proposition. For any positive (resp., negative) integer n the n-th power
function f : R+ → R+ defined by f(x) = x^n is strictly increasing (resp., strictly
decreasing).
Proof. Let 0 < x < y. Then x^{−1} y > 1. If n > 0, from Proposition 8.7.3, (x^{−1} y)^n >
1. From the ordinary laws of integer exponents valid in any Abelian group (such
as the Abelian multiplicative group R+) we have (x^{−1} y)^n = (x^{−1})^n y^n = (x^n)^{−1} y^n;
hence (x^n)^{−1} y^n > 1. Multiplying both sides of this inequality by the positive
number x^n we obtain x^n < y^n as desired.
For n < 0 the result follows from Proposition 8.6.2 applied to f and the function
g : x ↦ x^{−1}, which is strictly decreasing by Proposition 8.7.4.
8.7.6 Proposition. For any real b > 1 (respectively, 0 < b < 1) the function f :
Z → R defined by f(k) = b^k is strictly increasing (respectively, strictly decreasing).
Proof. Let m, n be integers with m < n.
First assume b > 1. By Proposition 8.7.3 with k = n − m, b^{n−m} > 1. Multiplying
both sides by the positive number b^m and using the ordinary laws of exponents we
get b^n > b^m as desired.
Now assume 0 < b < 1. Then b^{m−n} = (b^{−1})^{n−m} > 1 by Proposition 8.7.3
applied to b^{−1} > 1. Multiplying by the positive number b^n we obtain b^m > b^n as
desired.
8.7.7 Proposition. Let a, b be commuting elements in a ring (that is, ab = ba).
For any n ∈ N+ :
(Note that ∑_{k=1}^{1} b^{1−k} a^{k−1} = 1, the unity element of the ring.)
Proof. Easy exercise on induction and the laws of exponents safely left to the
reader.
Proof. We are ready to prove Proposition 8.7.1. We show that the function
f : R+ → R+, x ↦ x^n, is strictly increasing, surjective, and satisfies the No-Gaps
condition. By Proposition 8.6.13 and the definition of inverse function, it will follow
that x^n = f(x) = y has the unique solution x = f^{−1}(y).
f is strictly increasing by Proposition 8.7.5.
We claim that α = inf_{x>0} f(x) = 0. Clearly α ≥ 0. For any ϵ > 0 take
x < min{1, ϵ}. By strict monotonicity of f we have f(x) < f(1) = 1, and by
Proposition 8.7.6 we have α ≤ f(x) = x^n ≤ x < ϵ. Since this holds for all ϵ > 0 we
have α = 0.
Now we claim that f is unbounded above. For any y > 0 take x > max{1, y}.
By strict monotonicity of f we have f(x) > f(1) = 1 and by Proposition 8.7.6 we
have f(x) = x^n ≥ x > y. Since this holds for all y > 0, f is unbounded above, so
in the notation of Proposition 8.6.12, β = +∞. By the same proposition, Ran(f)
is the open interval (0, ∞), so f is surjective.
Finally, we show that f satisfies the No-Gaps condition.
Let ϵ > 0 be arbitrary. Choose x, y such that x < c < y < c + 1 and
y − x < ϵ / (n(c + 1)^{n−1}).
Then
0 < y^n − x^n = (y − x) ∑_{k=1}^{n} y^{n−k} x^{k−1}   by Proposition 8.7.7,
≤ (y − x) ∑_{k=1}^{n} y^{n−k} y^{k−1}   since y > x,
= (y − x) ∑_{k=1}^{n} y^{n−1}   laws of exponents,
≤ (y − x) · n(c + 1)^{n−1}   since y < c + 1, by Proposition 8.7.5,
< ϵ   by the assumed upper bound on y − x.
Then inf_{(c,∞)} f − sup_{(0,c)} f ≤ f(y) − f(x) = y^n − x^n < ϵ. Since ϵ > 0 was arbitrary,
we conclude that inf_{(c,∞)} f ≤ sup_{(0,c)} f. This concludes the proof that f satisfies
the No-Gaps condition.
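The chain of estimates used for the No-Gaps condition can be checked on a concrete instance with exact rational arithmetic. The Python sketch below is only a sanity check; the name geometric_factor and the sample values of x, c, y and n are our own choices.

```python
from fractions import Fraction

def geometric_factor(x, y, n: int):
    """Return the sum over k = 1..n of y^(n-k) * x^(k-1)."""
    return sum(y ** (n - k) * x ** (k - 1) for k in range(1, n + 1))

n = 5
x, c, y = Fraction(3, 2), Fraction(5, 3), Fraction(7, 4)  # x < c < y < c + 1
difference = y ** n - x ** n
bound = (y - x) * n * (c + 1) ** (n - 1)
```

Here difference equals (y − x) times geometric_factor(x, y, n), and it does not exceed bound, mirroring the first and fourth lines of the displayed estimate.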
8.7.8 Corollary. For positive real numbers a and b and positive integer n,
(ab)^{1/n} = a^{1/n} b^{1/n}.
Proof. (The reader is invited to provide a formal proof by induction based on the
following informal argument.)
Define c = a1/n and d = b1/n . We have, by commutativity,
· · }c |d ·{z
ab = cn dn = c| ·{z · · d} = (cd) · · · (cd) = (cd)n .
| {z }
n n n
| ·{z
(am )1/n = (a | ·{z
· · a})1/n = (a · · a1/n}) = (a1/n )m .
1/n
m m
8.7.10 Remark. For positive real b and rational r = m/n > 0 it is tempting to
define $b^r = (b^m)^{1/n}$. Before we do so we must verify that for m/n = p/q we have
$(b^m)^{1/n} = (b^p)^{1/q}$, i.e.: the definition of $b^r$ is independent of the representative we
use for r. This we do below.
8.7.11 Corollary. If b is a positive real; m, n, p, q are positive integers; and
m/n = p/q; then
\[ (b^m)^{1/n} = (b^p)^{1/q}. \]
Proof. If m/n = p/q then mq = pn. Hence, $b^{mq} = b^{pn}$, giving
\[ (b^m)^q = (b^p)^n. \]
The functions defined by $b \mapsto b^n$ and $b \mapsto b^{1/n}$ are inverses of each other, i.e.: for
real positive a
\[ (a^{1/n})^n = a = (a^n)^{1/n}. \]
Hence,
\[ ((b^m)^{1/n})^{nq} = (b^m)^q = (b^p)^n = ((b^p)^{1/q})^{qn}. \]
Therefore, by the uniqueness of roots, the above yields $(b^m)^{1/n} = (b^p)^{1/q}$.
8.7.12 Definition. In view of the above result, for any x > 0 and for m, n > 0 we
define $x^{m/n} = (x^m)^{1/n}$, noting that this definition does not depend on the choice of
m and n, but only on the value r = m/n. This defines $x^r$ for all rationals r > 0.
For rationals r < 0, we define $x^r = (x^{-r})^{-1} = 1/x^{-r}$. Finally, we define $x^0 = 1$ for
all x > 0.
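The independence of the representative can be observed numerically. The sketch below is an illustration only: it uses floating-point arithmetic, so equalities hold up to rounding, and the function names `rational_power` and `rational_power_signed` are hypothetical.

```python
import math

def rational_power(x, m, n):
    """Floating-point approximation of x^(m/n) = (x^m)^(1/n) for x > 0 and integers m, n > 0."""
    if x <= 0 or m <= 0 or n <= 0:
        raise ValueError("expected x > 0 and positive integers m, n")
    return (x ** m) ** (1.0 / n)

def rational_power_signed(x, m, n):
    """Extend to any integer m (with n > 0): x^0 = 1 and x^r = 1 / x^(-r) for r < 0."""
    if m == 0:
        return 1.0
    if m < 0:
        return 1.0 / rational_power(x, -m, n)
    return rational_power(x, m, n)

# 2/3, 4/6 and 6/9 represent the same rational exponent.
values = [rational_power(2.0, 2, 3), rational_power(2.0, 4, 6), rational_power(2.0, 6, 9)]
print(all(math.isclose(values[0], v) for v in values))
```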
8.7.13 Proposition. For each rational number r > 0 (resp., r < 0) the r-th power
function f : R+ → R+ , $x \mapsto x^r$, is strictly increasing (resp., strictly decreasing).
Proof. Write r = m/n for some nonzero integers m, n with n > 0. Then, by
definition, f = h ◦ g where $g(x) = x^m$ and $h(y) = y^{1/n}$. By Proposition 8.7.1, h is
strictly increasing, whereas by Proposition 8.7.5, g is strictly increasing if m > 0
(resp., strictly decreasing if m < 0). The assertion follows from Proposition 8.6.2.
Proof. It follows from the identity $b^x = (b^{-1})^{-x}$ that the statement for 0 < b < 1
follows from that for b > 1, so we assume henceforth that b > 1.
From Bernoulli’s Inequality (8.6) with t = (b − 1)/n, $\left(1 + \frac{b-1}{n}\right)^n \ge b$, so by the
strict monotonicity of n-th roots:
\[ 1 + \frac{b-1}{n} \ge b^{1/n}. \tag{8.7} \]
Let c ∈ R be arbitrary. For any ϵ > 0, choose n ∈ N such that $n > \epsilon^{-1} b^{c+1}$.
Now choose rational numbers x, y such that x < c < y and y − x < 1/n. Then
$b^{y-x} < b^{1/n}$ by strict monotonicity (Proposition 8.7.15). Now,
\[
\begin{aligned}
b^y - b^x &= b^x (b^{y-x} - 1) \\
&< b^c (b^{1/n} - 1) \\
&\le b^c \left( \frac{b-1}{n} \right) && \text{by (8.7)} \\
&< \frac{b^{c+1}}{n} \\
&< \epsilon.
\end{aligned}
\]
Since ϵ > 0 was arbitrary, this shows that $\inf_{y>c} b^y - \sup_{x<c} b^x \le 0$, so f satisfies
the No-Gaps condition.
We are ready to extend the definition of bx to irrational exponents x via the
following general result about extension of monotonic functions satisfying the No-
Gaps condition.
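8.7.18 Proposition. Let I ⊂ R be a nonempty open interval, S a dense subset
of I, and f : S → R a monotonic function satisfying the No-Gaps condition (on I).
Then the function f̄ : I → R defined by
\[
\bar f(c) =
\begin{cases}
\sup_{x<c} f(x) = \inf_{x>c} f(x) & \text{if } f \text{ is increasing,} \\
\inf_{x<c} f(x) = \sup_{x>c} f(x) & \text{if } f \text{ is decreasing}
\end{cases}
\]
is a monotonic extension of f to I (strictly monotonic if f is) that satisfies the No-Gaps condition.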
Proof. The function f̄ is well defined and extends f by Propositions 8.6.6 and 8.6.8.
We treat the case of increasing f; the decreasing case is analogous.
For any x, z ∈ I with x < z the density of S allows choosing y ∈ S such that
x < y < z. Then monotonicity of f implies that f̄(x) = inf f∗(Sx+) ≤ f(y) ≤
sup f∗(Sz−) = f̄(z) (with inequalities strict if f is strictly monotonic), so f̄ is
monotonically increasing (and strictly so if f is).
To show that f̄ satisfies the No-Gaps condition, it suffices to observe that,
for each c ∈ I, since f̄ extends f and S<c ⊂ I<c , sup f∗(S<c) ≤ sup f̄∗(I<c);
similarly inf f∗(S>c) ≥ inf f̄∗(I>c). By the No-Gaps condition for f it follows that
sup f̄∗(I<c) ≥ inf f̄∗(I>c), proving the No-Gaps condition for f̄.
8.7.20 Definition. Combining Propositions 8.7.15, 8.7.18, and 8.6.13 we imme-
diately obtain, for each fixed positive real number b ̸= 1 (called the base), functions
expb : R → R+ (the exponential function base b) and logb : R+ → R (the logarithm
function base b) satisfying the following properties:
• expb and logb are mutually inverse, strictly monotonic functions (increasing,
for b > 1; decreasing, for 0 < b < 1) satisfying the No-Gaps condition, and
$b^{x+y} = b^x b^y$,
$(b^x)^y = b^{xy}$, and
$(ab)^x = a^x b^x$.
(This is one of the few instances we will meet when an identity is proved in
one step; however, note that we needed two auxiliary propositions: one for sums,
another one for products. With the help of just one of those propositions, we would
have proved an inequality in one direction, and the reverse direction could have
been obtained using infima instead of suprema. We invite the reader to ponder
that, in the absence of the auxiliary propositions of Exercises 8.8 and 8.14, no
inequality could be proved.)
Now we prove the second identity assuming b > 1 and x, y > 0. Let S = Q+ .
Let x, y ∈ R+ , $u \in S_{x-}$ and $v \in S_{y-}$. By the laws of exponents 8.7.14, $b^{uv} = (b^u)^v$.
We clearly have $\sup_{v \in S_{y-}} \sup_{u \in S_{x-}} (uv) = \sup_{v \in S_{y-}} (xv) = xy$. Since the set
of such products uv (resp., products xv) is clearly dense in (0, xv) (resp., dense
in (0, xy)), the monotonicity of $\exp_b$ (Proposition 8.6.7) implies that
\[ \sup_{v \in S_{y-}} \Bigl( \sup_{u \in S_{x-}} b^{uv} \Bigr) = \sup_{v \in S_{y-}} b^{xv} = b^{xy}. \]
On the other hand, by the definition of the exponential function $\exp_b$ we have
$\sup_{u \in S_{x-}} b^u = b^x$, so by the monotonicity of the v-th power function (Proposition
8.7.13), we have $\sup_{u \in S_{x-}} (b^u)^v \le (b^x)^v$, by Proposition 8.6.6. By the same
Proposition and the obvious observation $b^x > b^u > 1$, the monotonicity of $\exp_{b^x}$
implies that $\sup_{v \in S_{y-}} (b^x)^v = (b^x)^y$. Altogether we obtain
\[ \sup_{v \in S_{y-}} \sup_{u \in S_{x-}} (b^u)^v \le \sup_{v \in S_{y-}} (b^x)^v = (b^x)^y. \]
Combining the results of the two paragraphs above, we obtain $b^{xy} \le (b^x)^y$.
The reverse inequality follows by working with infima instead. This completes the
proof of the second identity in the case b > 1 and x, y > 0. The case 0 < b < 1 is
completely analogous.
At this point we note that the special case y = −1 of the second identity,
namely the identity $b^{-x} = (b^x)^{-1}$, actually follows from the first, already proved
identity by putting y = −x. We also have the special case $(b^{-1})^y = b^{-y}$, which
easily follows taking suprema (if b < 1) or infima (if b > 1) over all rationals v < y
in the identity $(b^{-1})^v = b^{-v}$. The second identity now follows in full generality (for
x, y ≠ 0) by writing x = ±s and y = ±t with s, t > 0 and combining the two
special cases above with the identity (already proved) applied to s, t.
We now prove the third identity, assuming a, b > 1, for every x ∈ R. Recall
that $(ab)^u = a^u b^u$ for every rational u. Since ab > 1, by definition of $\exp_{ab}$ we have
$\sup_{u \in \mathbb{Q}_{x-}} (ab)^u = (ab)^x$. From the definition of supremum we also have $a^u \le a^x$
and $b^u \le b^x$ for all $u \in \mathbb{Q}_{x-}$, so $a^u b^u \le a^x b^x$ and $\sup_{u \in \mathbb{Q}_{x-}} (a^u b^u) \le a^x b^x$. We
conclude that $(ab)^x \le a^x b^x$ and the reverse inequality follows by working with
infima taken over $\mathbb{Q}_{x+}$, so the third identity is established for a, b > 1. The cases
when a < 1 or b < 1 are obtained by first applying the identity with $a^{-1}$ replacing
a or $b^{-1}$ replacing b (respectively, which we can do by the case already proved) and
subsequently using the special cases x = −1, y = −1 of the second identity.
8.7.22 Proposition (Laws of logarithms). For any fixed positive real base
b ≠ 1, all a ∈ R and all x, y > 0 we have
\[ \log_b(xy) = \log_b(x) + \log_b(y), \quad\text{and} \]
\[ \log_b(x^a) = a \log_b(x). \]
Proof. By the laws of real exponents and the mutually inverse relation between
$\exp_b$ and $\log_b$,
\[ \exp_b(\log_b(x) + \log_b(y)) = \exp_b(\log_b x)\, \exp_b(\log_b y) = xy. \]
Applying $\log_b$ to both sides above we obtain the first identity claimed in the statement.
Similarly,
\[ \exp_b(a \log_b x) = b^{a \log_b x} = (b^{\log_b x})^a = x^a. \]
The second identity now follows applying $\log_b$ to both sides.
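As a numerical sanity check of the two laws, here is a short Python sketch. It is an illustration under stated assumptions, not the construction used in these notes: `log_base` is a hypothetical helper computed from the standard library's natural logarithm, and comparisons are made up to floating-point rounding.

```python
import math

def log_base(b, x):
    """Floating-point logarithm of x > 0 to the base b (b > 0, b != 1)."""
    if b <= 0 or b == 1 or x <= 0:
        raise ValueError("expected b > 0, b != 1 and x > 0")
    return math.log(x) / math.log(b)

x, y, a, b = 8.0, 32.0, 2.5, 2.0
# First law: log_b(xy) = log_b(x) + log_b(y)
print(math.isclose(log_base(b, x * y), log_base(b, x) + log_base(b, y)))
# Second law: log_b(x^a) = a * log_b(x)
print(math.isclose(log_base(b, x ** a), a * log_base(b, x)))
```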
8.7.23 Definition. Fix a real number κ ≠ 0. Define the κ-th power function $\mathrm{pow}_\kappa$ :
R+ → R+ by:
\[ \mathrm{pow}_\kappa(x) = \exp_x(\kappa) = x^\kappa. \]
(Recall that $1^\kappa = 1$ for all κ ∈ R.) Then:
8.7.24 Proposition.
• $\mathrm{pow}_\kappa$ is a strictly monotonic function (increasing for
κ > 0, decreasing for κ < 0);
• $\mathrm{pow}_\kappa$ is bijective, with inverse $\mathrm{pow}_{1/\kappa}$; and
• $\mathrm{pow}_\kappa$ satisfies the No-Gaps condition.
Of course the usual notation for $\mathrm{pow}_\kappa(x)$ is simply $x^\kappa$.
Proof. We assume κ > 0, leaving the case κ < 0 to the reader. First we prove that $\mathrm{pow}_\kappa$
is strictly monotonic. Let 0 < x < y
and z = y/x > 1. By the (strictly increasing) monotonicity of the v-th power map
for rational v > 0 we have $z^v > 1^v = 1$, so $z^\kappa > 1$ follows. Multiplying both
sides of the inequality by the positive number $x^\kappa$ and using the laws of exponents
(Proposition 8.7.21) we get $x^\kappa < y^\kappa$, proving that $\mathrm{pow}_\kappa$ is strictly increasing for
κ > 0.
By the laws of exponents, $(x^{1/\kappa})^\kappa = x = (x^\kappa)^{1/\kappa}$, so $\mathrm{pow}_\kappa$ is bijective, with
inverse $\mathrm{pow}_{1/\kappa}$.
Finally, $\mathrm{pow}_\kappa$ satisfies the No-Gaps condition by Proposition 8.6.11.
m − (q + 1)n = (m − qn) − n = r − n ≥ 0,
m = qn + r and m = q ′ n + r′ ,
(1) |r − r′ | = n |q − q ′ | .
n |q − q ′ | = |r − r′ | < n,
\[
\begin{aligned}
m &= m_1 b + n_0, \\
m_1 &= m_2 b + n_1, \\
&\;\;\vdots \\
m_{k-1} &= m_k b + n_k,
\end{aligned}
\]
with $m_j = 0$ for all j ≥ k; $0 \le n_0 < b, \dots, 0 \le n_{k-1} < b$; and $0 < n_k < b$. This
process yields
\[ m = n_k b^k + \cdots + n_1 b + n_0. \]
The sequence $\{n_k, \dots, n_1, n_0\}$ is called the base-b representation of m, or base-b
numeral representing m, and we usually write
\[ +(\tilde n_k \dots \tilde n_1 \tilde n_0)_b \quad \text{to denote (or name) the number} \quad m = \sum_{j=0}^{k} n_j b^j \tag{8.8} \]
(the leading lexical symbol ‘+’ and the parentheses are optional in practice; the
suffix b may also be omitted if it is ten or otherwise clear from context), where $\tilde n_i$
stands for a lexical symbol (called a numeral in this context) representing $n_i$, and
where the juxtaposition in (8.8) is purely lexical and does not indicate product.
For example, $127_{10}$ does not mean 1 · 2 · 7, but names the number $1 \cdot 10^2 + 2 \cdot 10 + 7$,
so that $127_{10}$, $177_8$, $1111111_2$ and $7\mathrm{F}_{16}$ all name the same number, since
\[
\begin{aligned}
1 \cdot 10^2 + 2 \cdot 10 + 7 &= 1 \cdot 8^2 + 7 \cdot 8 + 7 \\
&= 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2 + 1 \\
&= 7 \cdot 16 + 15.
\end{aligned}
\]
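The repeated-division process that produces the digits can be sketched in a few lines of Python. This is an illustrative sketch; the function names `to_base` and `from_base` are hypothetical and digits are returned as integers (most significant first) rather than as lexical numerals.

```python
def to_base(m, b):
    """Base-b digits of the natural number m, most significant first, by repeated division."""
    if m < 0 or b < 2:
        raise ValueError("expected m >= 0 and b >= 2")
    digits = []
    while True:
        m, remainder = divmod(m, b)  # m = (new m) * b + remainder, 0 <= remainder < b
        digits.append(remainder)
        if m == 0:
            break
    return digits[::-1]

def from_base(digits, b):
    """Evaluate n_k b^k + ... + n_1 b + n_0 from the digit list (Horner's scheme)."""
    value = 0
    for d in digits:
        value = value * b + d
    return value

for base in (10, 8, 2, 16):
    print(base, to_base(127, base))
```

In base 16 the digit 15 appears as the integer 15; writing it as the numeral F is a purely lexical convention, as the text explains.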
Note that each $n_{-j}$ exists since the set of n ∈ N whose largest element is defined
to be $n_{-j}$ always contains n = 0 but no element n ≥ b. Therefore $n_{-j}$ is a base-b
digit for j = 1, 2, . . . .
We note that for arbitrary x > 0, the first (that is, largest) j such that $n_j \ne 0$
is still characterized by the inequality $b^j \le x < b^{j+1}$, hence its value is $\lfloor \log_b x \rfloor$.
Let S denote the collection of all such sums:
\[ S = \Bigl\{ s_j = m + \frac{n_{-1}}{b} + \cdots + \frac{n_{-j}}{b^j} \in \mathbb{R} \;\Big|\; j \in \mathbb{N} \wedge j > 0 \wedge \text{(8.9) holds} \Bigr\}. \]
\[ s_j = \sum_{i=-j}^{k} n_i b^i \quad \text{for } j = 0, 1, 2, \dots, \tag{8.11} \]
While we have hitherto tacitly assumed k ≥ 0, we may as well take k as the largest
j such that nj ̸= 0 (perhaps k < 0). For the base-b representations ±(0̃.0̃0̃0̃ . . . )b
we may regard k as being minus infinity and sj as the empty sum with value 0 for
all j.
A priori we do not know whether every base-b expansion denotes a real num-
ber x. Moreover, we have shown that every real number is denoted by some base-b
¶ In Note 10 we will study infinitely long sums (series) and we shall be able to interpret
$(\tilde n_k \dots \tilde n_1 \tilde n_0 . \tilde n_{-1} \tilde n_{-2} \dots)_b$ as $\sum_{j=-\infty}^{k} n_j b^j$.
For the time being the reader may just take sup S as the definition of the series above.
In other words, an expansion is strict if not all its digits are b − 1 from any point
on. For instance, in base ten,
\[ +(0.999\dots)_{10} \quad \text{and} \quad +(1.000\dots)_{10} \]
both name the number 1. The second expansion is strict, but the first one is not.
8.8.5 Remark. It is easy to see that digits chosen according to (8.9) give a strict
expansion of the nonnegative real number x.
8.8.6 Theorem (Existence and uniqueness of base-b expansions). Fix an
integer base b > 1. The relation “E denotes x” is a bijection between strict base-b
expansions E and real numbers x.
A real number x is denoted by exactly two different base-b expansions if and
only if an integer l exists such that $b^l x$ is an integer. Otherwise x is denoted by
exactly one base-b expansion, necessarily strict.
Proof. We already proved the existence of a base-b expansion with digits chosen
according to (8.9).
We show that any base-b representation denotes a certain real number x. It
suffices to check that the sequence $\{s_j\}$ with $s_j$ given by (8.11) is bounded above.
On the one hand, we clearly have
\[ s_j \le \sum_{i=-j}^{k} (b-1) b^i = b^{k+1} - b^{-j}, \]
so the sequence $\{s_j\}$ is bounded above by $b^{k+1}$, hence its supremum x exists. In
fact, since the sequence $\{s_j\}$ is monotonically increasing as j decreases and also
$s_i \le s_j + \sum_{l=i}^{j-1} (b-1) b^l = s_j + b^j - b^i$, it follows that $s_j + b^j$ is an upper bound for
$\{s_i\}$, hence
\[ s_j \le x \le s_j + b^j \tag{8.12} \]
for all integers j. Equality holds on the left (resp., on the right) if and only if
$n_i = 0$ (resp., $n_i = b - 1$) for all i < j.
\[ E : +(\tilde n_k \dots \tilde n_1 \tilde n_0 . \tilde n_{-1} \tilde n_{-2} \dots)_b \quad \text{and} \quad E' : +(\tilde n'_k \dots \tilde n'_1 \tilde n'_0 . \tilde n'_{-1} \tilde n'_{-2} \dots)_b \]
agreeing down to but excluding the j-th position, that is, such that $n_i = n'_i$ for all
i > j and (for the sake of definiteness) with $n_j > n'_j$, the inequalities
\[ (n_j - n'_j - 1) b^j \le x - y \le (n_j - n'_j + 1) b^j \tag{8.13} \]
hold for the numbers x and y denoted by E and E′ respectively, with equality on the left (resp., on the right) if and only if $n_i = 0$ and
$n'_i = b - 1$ (resp. $n_i = b - 1$ and $n'_i = 0$) for all i < j.
In fact, for the partial sums $s_j = \sum_{i=j}^{k} n_i b^i$ of E and $s'_j = \sum_{i=j}^{k} n'_i b^i$ of E′ , we
have, by (8.12),
\[
\begin{aligned}
x - y &= (x - s_j) + (s_j - s'_j) - (y - s'_j) \\
&\ge 0 + (n_j - n'_j) b^j - b^j \\
&= (n_j - n'_j - 1) b^j,
\end{aligned}
\]
with equality if and only if $y - s'_j = b^j$ and $x = s_j$, that is, $n_i = 0$ and $n'_i = b - 1$ for
all i < j. This proves the left inequality in (8.13) and justifies the case of equality;
the analysis of the right inequality is similar and omitted.
Assuming x = y and E different from E′ , both agreeing down to but excluding
the j-th position as above, it follows from (8.13) that $n_j - n'_j - 1 \le 0$; since also
$n'_j < n_j$ by assumption, we conclude $n_j = n'_j + 1$. Therefore the left inequality
in (8.13) is an equality and we conclude that E is a strict base-b representation and
E′ a lax but not strict representation, both of x, and E′ is obtained by decreasing the
rightmost nonzero numeral of E by one, and replacing all subsequent zero numerals
of E by the numeral denoting b − 1.
Clearly, positive expansions denote positive numbers, and negative expansions
denote negative numbers. As shown above, every positive number is denoted by a
strict positive expansion E, but some are also denoted by a non-strict one E′ . This
also obviously applies to negative numbers and their expansions.
The number zero has two expansions differing only by their syntactic sign, as
noted in Definition 8.8.4. Note that $b^0 \cdot 0 = 0$ is an integer. Every nonzero real x
having two distinct expansions will have the same sign as either of them. Denote
by l the last nonzero position of the strict expansion E of x. Then $b^{-l} x \in \mathbb{Z}$.
Conversely, if a real x satisfies $b^l x \in \mathbb{Z}$, we have either x = 0 (which has both a
proper and an improper expansion), or else x ≠ 0. In the latter case $n_i = 0$ for
all digits with i < −l in the strict expansion of x, but $n_j \ne 0$ for some j ≥ −l.
Let j be the least such, so $n_j$ is the right-most nonzero digit of the expansion of x.
Then x also has the non-strict expansion E′ obtained from E as above. The proof
is complete.
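The passage from the strict expansion to the non-strict one can be illustrated on the digits after the point. The following Python sketch makes several assumptions that are not in the notes: it treats only x in [0, 1) given as an exact `Fraction`, it takes the digit rule (8.9) to be the usual greedy floor rule, and the helper names are hypothetical.

```python
from fractions import Fraction

def strict_digits(x, b, count):
    """First `count` base-b digits after the point of x in [0, 1), by the greedy floor rule."""
    digits = []
    for _ in range(count):
        x *= b
        d = x.numerator // x.denominator  # integer part, a digit in 0..b-1
        digits.append(d)
        x -= d
    return digits

def lax_digits(x, b, count):
    """First `count` digits of the non-strict expansion of x.

    Returns None when x = 0 or when b^count * x is not an integer, i.e. when no
    second expansion is visible within `count` digits.
    """
    if x == 0 or (x * b ** count).denominator != 1:
        return None
    digits = strict_digits(x, b, count)
    last = max(i for i, d in enumerate(digits) if d != 0)
    # Decrease the right-most nonzero digit by one, then pad with the digit b - 1.
    return digits[:last] + [digits[last] - 1] + [b - 1] * (count - last - 1)

def partial_sum(digits, b):
    """Exact value of the truncated expansion 0.d1 d2 ... in base b."""
    return sum(Fraction(d, b ** (i + 1)) for i, d in enumerate(digits))

half = Fraction(1, 2)
print(strict_digits(half, 10, 5), lax_digits(half, 10, 5))
```

For the truncated non-strict digits the partial sum equals x minus the place value of the last digit kept, which is the case of equality on the right in (8.12).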
Contents
9.1 Preliminaries
9.2 Countable and Uncountable Sets
9.3 Open and Closed Sets
9.4 Properties of Open and Closed Sets
9.5 Convexity and Connectedness
9.6 Compactness
9.1 Preliminaries
9.1.1 Proposition. For all real x and y we have:
(1) |xy| = |x||y|;
(2) $|x|^2 = x^2$;
(3) |x| ≤ y ⇔ −y ≤ x ≤ y
(note that both statements are necessarily false when y < 0);
Note 9: Important Subsets of the Reals – July 28, 2020 9-2
9.1.2 Remark. The triangle inequality is the most important and basic inequal-
ity in analysis. Below we present it in two forms.
9.1.3 Proposition (Triangle Inequality). The inequality |x + y| ≤ |x| + |y|
holds for all real x and y.
Proof. By Proposition 9.1.1 we have
\[
\begin{aligned}
|x + y|^2 &= (x + y)^2 \\
&= x^2 + 2xy + y^2 \\
&\le x^2 + 2|xy| + y^2 \\
&= |x|^2 + 2|x||y| + |y|^2 \\
&= (|x| + |y|)^2.
\end{aligned}
\]
The conclusion now follows upon taking square roots (the function $t \mapsto \sqrt{t}$ is
strictly monotonically increasing on [0, +∞)).
9.1.4 Definition. Let I ⊂ R be an interval and let f : I → [0, +∞). We say that
f is subadditive if, whenever x, y and x + y are in I, we have
f (x + y) ≤ f (x) + f (y).
If, in addition, I = [−a, a] is an interval symmetric about the origin, and f is an
even function (that is, f (−x) = f (x) for all x ∈ I), we say that the function f is
symmetric subadditive (or even subadditive).
In light of Definition 9.1.4, the triangle inequality is the statement that the
absolute value is subadditive. Obviously, it is also symmetric (even).
9.1.5 Proposition. 1. A function f defined on an interval I is subadditive if
and only if f satisfies
f (y − x) ≥ f (y) − f (x) whenever x, y and y − x are in I.
f (x − y) ≥ f (x) − f (y).
Further, every nonempty subset A of R that has an upper bound has a least upper
bound sup A and every nonempty subset A of R that has a lower bound has a
greatest lower bound inf A.
The notation {a0 , . . . , an−1 } is ambiguous: it can denote the set whose elements
are listed or the sequence whose range consists of those values, in the order given.
Context tells which one is meant.
9.2.3 Remarks. If A is countable then there is a bijection f : N → A. By writing
an in place of f (n), we can list the elements of A:
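\[ a_0, a_1, a_2, \dots. \tag{1} \]
The bijection
\[
f : \mathbb{N} \to \mathbb{Z}, \quad n \mapsto
\begin{cases}
n/2, & \text{if } n \text{ is even,} \\
-(n+1)/2, & \text{if } n \text{ is odd,}
\end{cases}
\]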
n0 , n1 , . . . , nk ,
define $n_{k+1}$ to be the smallest n such that $n > n_k$ and $a_n \in B$. (Note: The
existence of a smallest element follows from the well ordering principle: Every
nonempty subset of N has a smallest element. At least one such n must exist since
B is infinite.) It is readily shown that $\{n_k\}$ is a strictly increasing sequence (that
is, $n_0 < n_1 < n_2 < \dots$).
The function
\[ f : \mathbb{N} \to B, \quad k \mapsto a_{n_k} \]
is the desired bijection between N and B. Note that we have an explicit way of
defining each element $a_{n_k}$ of B, without resorting to the axiom of choice.
9.2.6 Theorem. For every infinite set A the following three conditions are
equivalent.
(1) A is countable.
is countable
Proof. We need to define a bijection N ∼ A. Since each $A_n$ is countable, its
elements can be listed: $a_{n,0}, a_{n,1}, a_{n,2}, \dots$. For each m ∈ N the number of elements
$a_{n,k}$ such that n + k = m is finite. Take any listing of all $a_{n,k}$ with n + k = 0, then
of all $a_{n,k}$ with n + k = 1, then of all $a_{n,k}$ with n + k = 2, and so on:
\[ \underbrace{a_{0,0}}_{0},\ \underbrace{a_{0,1}, a_{1,0}}_{1},\ \underbrace{a_{0,2}, a_{1,1}, a_{2,0}}_{2},\ \underbrace{a_{0,3}, a_{1,2}, a_{2,1}, a_{3,0}}_{3},\ \dots, \]
for example.
The list contains all the elements of A, but, if any two sets An and Ak have
an element in common, that element is listed more than once. Therefore, the list
can be reduced, if necessary, to contain no duplicate entries, while containing all of
the elements of A. Suppose this has been done and that the above list contains no
duplicates. The list must be infinite, since A0 is infinite (and a set cannot contain
duplicates). The list without duplicates is of the form
\[ b_0, b_1, b_2, \dots. \]
This defines a bijection N ∼ A : $i \mapsto b_i$.
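The listing by diagonals, followed by removal of duplicates, can be sketched as a Python generator. This is a minimal illustration under assumptions: the name `enumerate_union` and the callback `element(n, k)`, standing for the k-th listed element of the n-th set, are hypothetical, and the elements are assumed hashable so that duplicates can be detected with a set.

```python
from itertools import count, islice

def enumerate_union(element):
    """Yield each value element(n, k) once, sweeping the diagonals n + k = 0, 1, 2, ..."""
    seen = set()  # values already listed; grows without bound on an infinite union
    for m in count():
        for n in range(m + 1):
            value = element(n, m - n)
            if value not in seen:
                seen.add(value)
                yield value

# Example: A_n is the set of multiples of n + 1, listed as a_{n,k} = (n + 1) * k.
# These sets overlap heavily, so most entries of the raw listing are duplicates.
first = list(islice(enumerate_union(lambda n, k: (n + 1) * k), 7))
print(first)
```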
9.2.8 Remark. Some find the visual device presented below helpful. Accord-
ingly, we can start the above proof as follows.
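Since each $A_n$ is countable, its elements can be written
\[ a_{n,0}, a_{n,1}, a_{n,2}, \dots. \]
Therefore, we can arrange all the elements of all the sets in the entire collection in
the array
\[
\begin{array}{ccccc}
a_{0,0} & a_{0,1} & a_{0,2} & a_{0,3} & \cdots \\
a_{1,0} & a_{1,1} & a_{1,2} & a_{1,3} & \cdots \\
a_{2,0} & a_{2,1} & a_{2,2} & a_{2,3} & \cdots \\
a_{3,0} & a_{3,1} & a_{3,2} & a_{3,3} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{array}
\]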
By following the arrows shown in the array below, we can list all the elements
in the array. (We can single-thread the two-dimensional array by starting with a0,0
and following the arrows as shown.)
[Figure: the array above, with arrows marking the zigzag order in which its entries are listed.]
The list is given by
\[ a_{0,0}, a_{0,1}, a_{1,0}, a_{2,0}, a_{1,1}, a_{0,2}, a_{0,3}, a_{1,2}, a_{2,1}, a_{3,0}, \dots. \]
The rest of the proof proceeds as above.
9.1 Exercise. Prove that the enumeration of a countable family of countable sets
in the proof of Theorem 9.2.7 is given by the function
\[ f(m, n) = m + \frac{(m + n)(m + n + 1)}{2} \]
from N × N to N. Prove that f is a bijection. Can you visualize f geometrically?
Can you find other explicit bijections N × N ∼ N?
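Before attempting a proof, the formula can be explored computationally. The Python sketch below is a finite sanity check and not a proof; the names `pairing` and `unpairing` are hypothetical, and the inverse shown is one standard way to recover (m, n), stated here as an assumption rather than as the intended solution of the exercise.

```python
from math import isqrt

def pairing(m, n):
    """f(m, n) = m + (m + n)(m + n + 1)/2; the product of consecutive integers is even."""
    return m + (m + n) * (m + n + 1) // 2

def unpairing(z):
    """Recover (m, n) from z = pairing(m, n)."""
    w = (isqrt(8 * z + 1) - 1) // 2  # index m + n of the diagonal containing z
    m = z - w * (w + 1) // 2         # position along that diagonal
    return m, w - m

# Positions of the first six entries a_{0,0}, a_{0,1}, a_{1,0}, a_{0,2}, a_{1,1}, a_{2,0}.
print([pairing(m, n) for (m, n) in [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]])
```

On the first N diagonals the values of `pairing` should fill an initial segment of N with no gaps or repeats, which is what the check below tests for N = 30.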
9.2.9 Corollary. Suppose we are given a collection $\{S_\alpha \mid \alpha \in A\}$ of sets. If the
indexing set A is at most countable and if for every α ∈ A the set $S_\alpha$ is at most
countable, then the union
\[ S = \bigcup_{\alpha \in A} S_\alpha \tag{1} \]
is at most countable.
Proof. The theorem states that given a collection $\{S_\alpha \mid \alpha \in \mathbb{N}\}$, where each $S_\alpha$
is countable, then
\[ S = \bigcup_{\alpha=0}^{\infty} S_\alpha = \bigcup_{\alpha \in \mathbb{N}} S_\alpha \tag{2} \]
f : Q+ → N : $m/n \mapsto 2^m 3^n$
S = {xj | j ∈ N+ }
where nj ̸= 1 for all j < 0 (that is, nj is 0 or 2 but not 1). We call any such
expansion E a zero/two ternary expansion.
(Note that we allow lax expansions. In fact, the Cantor set contains all rational
numbers of the form n/3l for some l ≥ 0 and 0 ≤ n ≤ 3l , by Theorem 8.8.6.
Therefore the Cantor set is infinite.)
9.3 Exercise. Let C be the Cantor set.
1. Use Exercise 9.2 to prove that C is uncountable.
2. For any ϵ > 0, prove that there exists a finite family of intervals $I = \{[a_i, b_i] \mid
i = 1, 2, \dots, m\}$ (for some m ∈ N+ ) such that
\[ \sum_{i=1}^{m} (b_i - a_i) < \epsilon \quad \text{and} \quad C \subset \bigcup I = \bigcup_{i=1}^{m} [a_i, b_i]. \]
Intuitively speaking, the Cantor set has null total length (“zero measure”), yet it
is still uncountable!
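The following Python sketch illustrates one possible family of covering intervals. It rests on an assumption that the exercise asks the reader to justify: that the $2^k$ closed intervals obtained after k rounds of removing open middle thirds from [0, 1] cover C. The function names are hypothetical.

```python
from fractions import Fraction

def cantor_stage(k):
    """Closed intervals left after k rounds of deleting open middle thirds from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))   # left third
            refined.append((b - third, b))   # right third
        intervals = refined
    return intervals

def total_length(intervals):
    """Sum of the lengths b - a of the given intervals."""
    return sum(b - a for a, b in intervals)

def smallest_stage(eps):
    """Least k with (2/3)^k < eps, the total length at stage k."""
    k = 0
    while Fraction(2, 3) ** k >= eps:
        k += 1
    return k

k = smallest_stage(Fraction(1, 10))
print(k, total_length(cantor_stage(k)))
```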
9.4 Exercise. The Cantor function is the function f : [0, 1] → [0, 1] defined as
follows.
First, define a transformation d ↦ δ of ternary numerals {0, 1, 2} to binary
numerals {0, 1} by
0 ↦ 0
1 ↦ 1
2 ↦ 1.
Second, for x ∈ [0, 1], write x in base 3 as $x = (0.d_1 d_2 \dots)_3$ (lax expansions are
allowed).
If none of the digits $d_j$ of x are 1 (i.e., if x is in the Cantor set C), define
\[ f(x) = (0.\delta_1 \delta_2 \dots)_2 \]
by applying the transformation d ↦ δ to each ternary digit $d_j$ of x.
Otherwise, there exists a least n ∈ N+ such that $d_n = 1$. In this case, define
\[ f(x) = (0.\delta_1 \delta_2 \dots \delta_n)_2. \]
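The recipe above can be tried out on rational inputs. The Python sketch below is an illustration under assumptions: the name `cantor_function` is hypothetical, the ternary digits are generated by the greedy floor rule, and the expansion is truncated after `digits` places, so for points of C with non-terminating expansions the result is only an approximation from below.

```python
from fractions import Fraction

def cantor_function(x, digits=60):
    """Approximate f(x) for a Fraction x in [0, 1] from its first `digits` ternary digits."""
    if x == 1:
        return Fraction(1)  # 1 = (0.222...)_3 maps to (0.111...)_2 = 1
    result = Fraction(0)
    place = Fraction(1, 2)  # value of the current binary position
    for _ in range(digits):
        x *= 3
        d = x.numerator // x.denominator  # next ternary digit, 0, 1 or 2
        x -= d
        if d == 1:
            return result + place  # first digit 1: keep it and stop
        result += place * (d // 2)  # digit 0 stays 0, digit 2 becomes 1
        place /= 2
    return result

print(cantor_function(Fraction(1, 3)), cantor_function(Fraction(2, 3)))
```

The two printed values agree, consistent with f being constant on the step [1/3, 2/3].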
1. Prove that f is well defined, that is, if x has both a strict and a non-strict
expansion, then both recipes for f (x) agree.
5. Prove that A, B take values in the Cantor set C and are right inverses to f :
More specifically, show that f ∗ ({z}) is the closed interval [A(z), B(z)]. Note
that this interval has nonzero length only if z has a terminating binary ex-
pansion.
The graph of y = f (x) is known as the “devil’s staircase”. The staircase
climbs up as one moves from left to right. The intervals [A(z), B(z)] at
height z (having nonzero length) are the “steps” of the staircase.
6. Prove that the devil’s staircase has countably infinitely many steps (of nonzero
length), and that it “steps up” exactly at irrational points x ∈ C (of which
there are uncountably many).
7. Prove that the sum (series) of the (positive) lengths of the steps of the devil’s
staircase is equal to 1. Hence, the steps essentially “cover the entire length” of
the interval [0, 1], so to speak. (Note: this is only a heuristic interpretation.)
a = {a0 , a1 , a2 , . . . }
in $2^{\mathbb{N}} = \{0, 1\}^{\mathbb{N}}$ (that is, $a_j \in \{0, 1\}$ for all j ∈ N) let X(a) be the real number
with base-b expansion
\[ +(a_0 . a_1 a_2 \dots)_b \]
Since b > 2, we have b − 1 > 1, and since $a_j \le 1$ for all j ∈ N, no digit b − 1
appears in the expansion above, which is therefore strict. By the uniqueness of
(strict) base-b expansion, a ↦ X(a) is an injection $2^{\mathbb{N}} \to \mathbb{R}$, so $2^{\mathbb{N}} \preceq \mathbb{R}$.
From $\mathbb{R} \preceq 2^{\mathbb{N}}$ and $2^{\mathbb{N}} \preceq \mathbb{R}$, thanks to the Schröder-Bernstein theorem we have
$\mathbb{R} \sim 2^{\mathbb{N}}$.
9.2.21 Corollary. N ≺ R.
Proof. Thanks to Cantor’s theorem we have $\mathbb{N} \prec 2^{\mathbb{N}}$. Thanks to the preceding theorem, we
have $2^{\mathbb{N}} \sim \mathbb{R}$. Hence $\mathbb{N} \prec 2^{\mathbb{N}} \sim \mathbb{R}$, so N ≺ R.
9.2.22 Remark (The Continuum Hypothesis). Any set (such as $2^{\mathbb{N}}$) equivalent
to R is said to have the cardinality of the continuum. The continuum hypothesis,
abbreviated CH, asserts that there is no set whose cardinality is strictly
between the cardinalities of N (countable cardinality) and R (continuum cardinality),
that is, CH is the assertion that there does not exist a set S such that
$\mathbb{N} \prec S \prec 2^{\mathbb{N}}$.
Cantor formulated CH in 1878. Kurt Gödel showed, by 1940, that assuming CH
introduces no inconsistency in ZFC, and Paul Cohen showed in 1963 that assuming
the denial of CH introduces none either. That is, if ZFC is consistent, it remains
consistent whether CH is assumed to be true, or CH is assumed to be false. One may
say that CH is a statement independent from the ZFC axioms: if ZFC is consistent,
neither CH nor the negation of CH is a theorem of ZFC set theory. The continuum
hypothesis is probably the most famous example of an undecidable statement (or an
incompleteness) in ZFC set theory.
More generally, a neighborhood of x is any set N such that N ⊃ Nϵ (x) for some
ϵ > 0. We sometimes write N (x) to denote a neighborhood of x. (Note that if
y ∈ N (x) and y ̸= x then N (x) need not be a neighborhood of y, e.g., the Exercise
below.)
9.3.3 Proposition. Fix reals x and y. If
(1) y ∈ Nϵ (x) for all ϵ>0
then x = y. In other words: If |y − x| < ϵ for all ϵ > 0 then x = y.
Proof. We prove the contrapositive. If x ̸= y, let ϵ = |y − x| > 0. Then |y − x| ̸<
ϵ.
9.3.4 Proposition. The topology of R possesses the Hausdorff separation prop-
erty, namely:
For any x, y ∈ R, if x ̸= y, there exist respective disjoint neighborhoods N (x)
and N (y).
Proof. Let ϵ = |y − x|/2 and N (x) = Nϵ (x) = (x − ϵ, x + ϵ), N (y) = Nϵ (y) =
(y − ϵ, y + ϵ). By the triangle inequality,
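\[
\begin{aligned}
z \in N_\epsilon(x) &\Rightarrow |z - x| < \epsilon \\
&\Rightarrow |z - y| = |(y - x) - (z - x)| \ge |y - x| - |z - x| > 2\epsilon - \epsilon = \epsilon \\
&\Rightarrow z \notin N_\epsilon(y),
\end{aligned}
\]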
so the neighborhoods Nϵ (x) and Nϵ (y) are disjoint.
9.3.5 Definition (Interior, Limit and Isolated Points). Let S be a subset
of R and let x be a point in R. Then
(1) x is called an interior point of S if S is a neighborhood of x, i.e., Nϵ (x) ⊂ S
for some ϵ > 0 (note that, if x ∈ R is an interior point of S, then necessarily
x ∈ S);
(2) x is called a limit (or cluster) point of S if every neighborhood of x contains
a point in S other than x. In other words, x is a limit point of S if and only
if for every neighborhood N (x) of x we have
(∃y)(y ̸= x ∧ y ∈ S ∩ N (x)).
Note that a limit point of S need not itself be in S.
(3) If x ∈ S is not a limit point of S then it is called an isolated point of S. This
is equivalent to S ∩ (Nϵ (x) \ {x}) = ∅ for some ϵ > 0. Note that an interior
point is not isolated, hence an isolated point is never an interior point.
We also have
\[ \frac{1}{n} - \epsilon = \frac{1}{n} - \frac{1}{n(n+1)} = \frac{n}{n(n+1)} = \frac{1}{n+1}. \]
5. Define
9.3.8 Examples. Some sets are open, some are closed, some are both and some
are neither.
3. Clearly, (−∞, +∞) is open, since it contains $N_1(x)$ = (x − 1, x + 1) for every real x. The
interval (−∞, +∞) is also closed, since every limit point belongs to (−∞, +∞).
4. The interval [a, b] is closed. Moreover, every point of the interval is a limit
point. Every point in (a, b) is an interior point of [a, b].
5. The intervals (a, b) and (a, b] are not closed: a is a limit point that does not
belong to the interval. The interval (a, b] is neither open nor closed.
\[ T \setminus \bigcap_{\alpha \in A} S_\alpha = \bigcup_{\alpha \in A} (T \setminus S_\alpha). \tag{2} \]
Each set $\mathbb{R} \setminus S_i$ is open, since each $S_i$ is closed. Since a finite intersection of open
sets is open, R \ S must be open. So, S must be closed.
9.4.13 Remark. To see that the infinite union of closed sets need not be closed,
consider
\[ \bigcup_{n=1}^{\infty} [-1 + 1/n,\ 1 - 1/n] = (-1, 1). \]
For another example, consider $S = \bigcup_{n \in \mathbb{N}^+} \{1/n\}$, where
each singleton {1/n} is a closed set: the point 0 is a limit point of S that does not belong to S.
9.4.14 Definition (Closure and Interior of a Set). The closure $S^-$ of a subset
S of R is defined by $S^-$ = S ∪ S∗ , where S∗ denotes the set of all limit points of S.
The interior $S^\circ$ of S is the set of all interior points of S. Note that $S^\circ \subset S \subset S^-$.
9.4.15 Examples. The closure of the open interval (a, b) is the closed interval
[a, b]. The interior of the closed interval [a, b] is the open interval (a, b). The interior
of Q is empty and the closure of Q is R (by density).
We do not necessarily have (S − )◦ = S. For example, let S = (−1, 0) ∪ (0, 1).
Then S − = [−1, 1] and (S − )◦ = (−1, 1). An even simpler example: Let S = {0}.
Then S − = {0} and (S − )◦ = {0}◦ = ∅ ̸= S.
We do not necessarily have (S ◦ )− = S. For example, let S = Z. Then S ◦ = ∅
and (S ◦ )− = ∅. For a simpler example: S = {0} gives S ◦ = ∅ and (S ◦ )− = ∅− =
∅ ̸= S.
9.4.16 Proposition. For all subsets S of R the following statements hold:
(2) S − is closed.
S0 ⊃ S1 ⊃ S2 ⊃ · · · .
(ii) Suppose that n > k. Then, thanks to the inclusion In ⊂ Ik , we obtain the
inequality
ak ≤ an ≤ b n .
Thanks to (i) and (ii) we have ak ≤ bn for all n and k in N . In particular, we have
an ≤ b0 for all n ∈ N. Thus, the set {an | n ∈ N} is nonempty and is bounded
above. Define
s = sup{ak | k ∈ N}.
We have s ≤ bn for all n ∈ N. Hence, since an ≤ s ≤ bn , we have s ∈ In for all
n ∈ N.
9.4.20 Corollary. If $\{[a_n, b_n]\}_{n=0}^{\infty}$ is a nested sequence such that
inf{bn − an | n ∈ N} = 0
such that $b_n - a_n = 2^{-n}(b_0 - a_0)$ and such that $[a_n, b_n]$ contains infinitely many
elements of S. Let $c_n = (a_n + b_n)/2$. Since $S \cap [a_n, b_n]$ is infinite, at least one of the
intervals $[a_n, c_n]$ and $[c_n, b_n]$ contains infinitely many elements of S. Define this
interval to be $[a_{n+1}, b_{n+1}]$. Note that
\[ \inf\{b_n - a_n \in \mathbb{R} \mid n \in \mathbb{N}\} = 0. \]
Thanks to Theorem 9.4.19, there exists $x \in \bigcap_{n=0}^{\infty} [a_n, b_n]$ (x is not necessarily in S,
though it is necessarily unique, by Corollary 9.4.20). Moreover, any neighborhood
N of x contains the intervals $[a_n, b_n]$ for any sufficiently large n (this holds as long
as $N \supset N_\epsilon(x)$ and n is large enough that $2^{-n}(b_0 - a_0) < \epsilon$), hence N contains
(infinitely many) points in S. This shows that x is a limit point of S.
(S ⊂ U ∪ V ) ∧ (S ∩ U ≠ ∅) ∧ (S ∩ V ≠ ∅) ∧ (U ∩ V = ∅).
(S ⊂ U ∪ V ) ∧ (U ∩ V = ∅) ⇒ (S ⊂ U ) ∨ (S ⊂ V ).
Of course, the above definition simply states that any disjoint open sets U, V fail
to witness the disconnectedness of S.
9.5.5 Theorem. A subset of R is connected if and only if it is an interval.
Proof. By Proposition 9.5.2, it suffices to show that a subset S ⊂ R is connected
if and only if it is convex.
The stated equivalence is obvious if S has zero or one elements, so we assume
S has at least two distinct elements henceforth.
9.6 Compactness
9.6.1 Definitions. Let S be a subset of R. An open cover of S is a collection
C = {Uα | α ∈ A}
C1 = {(−∞, +∞)},
C2 = {(n − 1, n + 1) | n ∈ Z},
C3 = {(−n, n) | n ∈ N},
C4 = {(−r, r) | r ∈ Q ∧ r > 0},
C5 = {(r − 1, r + 1) | r ∈ Q},
C6 = {(−x, x) | x ∈ R ∧ x > 0},
C7 = {(x − 1, x + 1) | x ∈ R}.
S = {x0 , x1 , . . . , xn }.
be any finite subcover of C. Each $T_{\alpha_j}$ is of the form $T_{\alpha_j} = (-1, m_j)$. Therefore,
\[ \bigcup_{j=0}^{k} T_{\alpha_j} = (-1, m), \]
where $m = \sup\{m_0, m_1, \dots, m_k\}$. Since no such finite union can include [0, +∞),
no finite subcover exists.
9.6.6 Example. The interval (0, 1) is not compact. Define $T_n = (1/n, 1)$ for
n = 1, 2, . . .. Then, we have
\[ (0, 1) \subset \bigcup_{n=1}^{\infty} T_n, \]
be any finite subcover of C. Each $T_{\alpha_j}$ is of the form $T_{\alpha_j} = (1/m_j, 1)$. Therefore,
\[ \bigcup_{j=0}^{k} T_{\alpha_j} = (1/m, 1), \]
where $m = \sup\{m_0, m_1, \dots, m_k\}$. Since no such finite union can include (0, 1), no
finite subcover exists.
9.6.7 Remark. In the above examples, [0, +∞) is closed but not bounded, while
(0, 1) is bounded but not closed. Neither is compact. Are closed and bounded sets
compact? The answer is affirmative, as we show below.
9.7 Exercise. Let K be a compact subset of R.
1. Prove that, for each ϵ > 0, there exists a finite subset S of K such that
K ⊂ ⋃_{x∈S} N_ϵ(x).
(This property is usually stated by saying that compact sets are totally
bounded.)
Since x ∉ K, the above implies
K ⊂ ⋃_{n=1}^{∞} T_n.
The compactness of K implies the existence of a finite subcover {T_{n_1}, T_{n_2}, . . . , T_{n_m}}
of K. Without loss of generality n_1 < n_2 < · · · < n_m. Thus,
K ⊂ ⋃_{k=1}^{m} T_{n_k} = T_{n_m} = (−∞, x − 1/n_m) ∪ (x + 1/n_m, +∞).
9.8 Exercise. Complete the proof of Proposition 9.6.11 by showing that a closed
set all of whose FIP collections of closed subsets have nonempty intersection is
compact.
(This strengthening of the Nested Intervals Property holds in any compact set.
In light of the Heine-Borel characterization (below) of compact subsets of R as
sets that are both closed and bounded, this exercise explains why nonempty closed
bounded intervals are needed in formulating the Nested Intervals Property.)
9.6.12 Theorem (Heine-Borel). A subset of R is compact if and only if it is
closed and bounded.
Proof. In the previous two propositions we have proved that every compact set is
closed and bounded. Below we prove that every closed and bounded set is compact.
Let S be a closed and bounded subset of R and let C = {Tα | α ∈ A} be
an open cover of S. To prove that C contains a finite subcover, we proceed by
contradiction: We suppose that C does not contain a finite subcover. That is, we
suppose that
(1) There is no finite subset {T_{α_1}, T_{α_2}, . . . , T_{α_n}} of C such that
S ⊂ T_{α_1} ∪ T_{α_2} ∪ . . . ∪ T_{α_n}.
(by taking the union of the two finite subcovers) violating (1).
If S ∩ [a0 , c0 ] fails to have a finite subcover then define [a1 , b1 ] = [a0 , c0 ]; else
define [a1 , b1 ] = [c0 , b0 ]. Thus, S ∩ [a1 , b1 ] fails to have a finite subcover.
We proceed recursively. Suppose that we have nested intervals
with b_n − a_n = 2^{−n}(b_0 − a_0), such that S ∩ [a_n, b_n] does not have a finite subcover.
Define cn = (an + bn )/2. Then, at least one of the subsets S ∩ [an , cn ] and
S ∩ [cn , bn ] of S does not have a finite subcover. (The reasoning is analogous to
that given for [a0 , b0 ].)
If S ∩[an , cn ] fails to have a finite subcover then define [an+1 , bn+1 ] = [an , cn ]; else
define [an+1 , bn+1 ] = [cn , bn ]. Thus, S ∩ [an+1 , bn+1 ] fails to have a finite subcover.
Further, we have b_{n+1} − a_{n+1} = 2^{−(n+1)}(b_0 − a_0) and
(2) There is no n ∈ N and there is no finite subset {T_{α_1}, T_{α_2}, . . . , T_{α_n}} of C
such that S ∩ [a_n, b_n] ⊂ T_{α_1} ∪ T_{α_2} ∪ . . . ∪ T_{α_n},
(3) (x − ϵ, x + ϵ) ⊂ T.
Now, b_n − a_n = 2^{−n}(b_0 − a_0) with n sufficiently large, say n > n_0 for some n_0,
implies b_n − a_n < ϵ. Hence, since x ∈ [a_n, b_n] for all n, we have
We have shown that (1) implies (2). Additionally we have shown that (1)
implies (5). Since (5) implies that (2) is false, we have shown that (1) implies a
contradiction. Hence, (1) is false, i.e.: S has a finite subcover.
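The proof above argues by contradiction, but the same bisection can be read constructively: keep halving until each piece lies inside a single member of the cover. The sketch below is an illustration under stated assumptions, not part of the text's argument. The interface `cover_at(x)` (returning one open interval of the cover that contains x) and the particular cover of [0, 1] are hypothetical, and the recursion depth is capped because termination is exactly what the theorem guarantees.

```python
def finite_subcover(a, b, cover_at, max_depth=60):
    """Bisect [a, b] until each piece lies inside one open interval of the cover.

    `cover_at(x)` is an assumed interface: it returns an open interval (lo, hi)
    belonging to the cover and containing x.  The result is a list of
    (piece, interval) pairs, ordered from left to right.
    """
    c = (a + b) / 2
    lo, hi = cover_at(c)
    if lo < a and b < hi:            # [a, b] is covered by a single member
        return [((a, b), (lo, hi))]
    if max_depth == 0:
        raise RuntimeError("bisection did not terminate within max_depth")
    return (finite_subcover(a, c, cover_at, max_depth - 1)
            + finite_subcover(c, b, cover_at, max_depth - 1))

def cover_at(x):
    """Hypothetical cover of [0, 1]: the open interval of radius 0.1 + x*x/10 about x."""
    r = 0.1 + x * x / 10
    return (x - r, x + r)

pieces = finite_subcover(0.0, 1.0, cover_at)
print(len(pieces), "intervals of the cover suffice")
```

The intervals attached to the pieces form a finite subcover of [0, 1]; a cover for which the halving never stopped would contradict the theorem.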
9.6.13 Theorem. Any subset S of R that has one of the three following prop-
erties has the other two.
(2) S is compact.
Proof. We have already proved that (1) ⇔ (2) (Heine-Borel) and that (1) ⇒ (3)
(Bolzano-Weierstrass). The proof is concluded by showing that (3) ⇒ (1), since we
then have
(3) ⇔ (1) ⇔ (2).
Suppose that (3) holds for a subset S of R. If S is not bounded, then for each
n ∈ N there is a point xn ∈ S such that |xn | > n. The set of all such points xn is
infinite and has no limit point in R and, therefore, has none in S. We have shown
that (3) implies that S is bounded.
To show that S is closed, let x ∈ R \ S be a limit point of S. Then, for each
n ∈ N, there is a point xn ∈ S such that
Let A denote the set of all such points x_n. These need not be all distinct; however,
for any ϵ > 0 infinitely many distinct such points lie in N_ϵ(x). To see this, note first
that x_m ∈ N_ϵ(x) provided 1/(m + 1) < ϵ, so A ∩ N_ϵ(x) ≠ ∅. Suppose now that
N_ϵ(x) contained only finitely many points of A, say
Then a > 0, since |x−xj | > 0 for all j ∈ N. Choose ñ so large that 0 < 1/(ñ+1) <
a. Then, no element of A is less than 1/(ñ + 1), contradicting (1). Hence, the point
x is a limit point of A.
We show that x is the only limit point of A. To see this, suppose that y is
another limit point of A. Let ϵ > 0 be arbitrary. Let n be such that 1/(n + 1) <
ϵ/2. Since y is a limit point of A there exist infinitely many j ∈ N such that
|xj − y| < ϵ/2; in particular there exists one such j0 with j0 > n. Therefore
Contents
10.1 The Extended Reals
10.2 The Topology of the Extended Reals
10.3 Sequences and their Limits
10.4 Subsequences and Subsequential Limits
10.5 Cauchy Sequences
10.6 Some Convergence Results
10.7 Contractive Sequences
10.8 The Binomial Theorem
10.9 Monotonicity
10.10 Series
10.11 Euler’s Number e
Note 10: Sequences and Series – July 28, 2020 10-2
E = R ∪ {+∞} ∪ {−∞}.
sup S = +∞.
inf S = −∞.
We also define∗
sup ∅ = −∞ and inf ∅ = +∞.
In other words, every subset of E has a least upper bound and a greatest lower
bound in E.
10.1.4 Definition. We extend the definition of powers, exponentials and loga-
rithms as follows:
• For κ > 0: (+∞)^κ = +∞;
In other words,
• f is bijective, and
for some x ∈ E.
10.2.2 Remarks. • Note that all intervals [−∞, x) are open according to the
definition above. So are all (x, +∞], and so is [−∞, +∞]. An open subset of
R is certainly open in E, hence the interior S° of a subset S of R is the same
even if one regards S as a subset of E. However, it is important to stress that
if S ⊂ R is closed in R, it need not be closed as a subset of E. For instance
R itself is closed in R, but not closed in E since it is missing its limit points
±∞. In particular, when S is a subset of R there is ambiguity as to the
meaning of S⁻, the closure of S. We will try to provide enough context to
preclude confusion. By and large we shall henceforth work with the topology
of E rather than R because the compactness of E (see below) has many useful
consequences. Many statements are valid in E without exception, whereas
additional hypotheses need to be added for their truth in R. We already saw
the first important example of this phenomenon in Proposition 10.1.7.
(ii) for any neighborhood N (x) there exists m such that Nm ⊂ N (x) (hence
Nn ⊂ N (x) for all n > m).
max S = sup S,
min S = inf S,
If the above is true, we say that the sequence converges to L and we call L the
limit of the sequence. If a sequence does not converge, it is said to be divergent, or
to diverge.
To say that {an } converges to L we may also write
lim_{n→∞} a_n = L, or: a_n → L as n → ∞.
lim_{n→∞} a_n = L ⇔ (∀ϵ > 0)(∃N ∈ N)(∀n ∈ N)(n > N ⇒ |a_n − L| < ϵ).
In general, for a number L ∈ E we have limn→∞ an = L if and only if, for every
neighborhood N (L) we have an ∈ N (L) eventually.
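To make the three quantifiers concrete, here is a small sketch for one illustrative sequence chosen by us (it does not appear in the text): a_n = n/(n + 1), with limit L = 1. The helper `threshold` is a hypothetical name; it returns an N that works for a given ϵ, and the final line spot-checks the implication n > N ⇒ |a_n − L| < ϵ on a finite range only.

```python
from fractions import Fraction
from math import ceil

def a(n):
    """Illustrative sequence a_n = n / (n + 1), which converges to L = 1."""
    return Fraction(n, n + 1)

def threshold(eps):
    """Return an N with |a_n - 1| < eps for all n > N.

    Since |a_n - 1| = 1 / (n + 1), any N >= 1/eps works.
    """
    return ceil(1 / Fraction(eps))

eps = Fraction(1, 100)
N = threshold(eps)
# Spot-check n > N => |a_n - L| < eps on finitely many n (not a proof).
print(N, all(abs(a(n) - 1) < eps for n in range(N + 1, N + 1000)))
```

The order of choices mirrors the definition: ϵ is given first, N is produced from ϵ, and only then is n allowed to vary.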
10.3.6 Remarks. • It follows immediately from the definition of basis of neigh-
borhoods of L that no generality is lost in Definition 10.3.5 by considering
only neighborhoods N (L) belonging to any fixed basis B of neighborhoods
of L (or to any fixed shrinking sequence of neighborhoods of L for that matter).
For instance, it is always enough to take N(L) to be an ϵ-neighborhood
of L for arbitrary ϵ > 0, or perhaps a (1/n)-neighborhood for arbitrary
n ∈ N.
• The definition above is the first important one in mathematics entailing three
quantifiers (a so-called “∀3 formula”); therein resides its complexity and the
need for refined skill in its use. The definitions of the limit of a function
and of continuity, to be introduced in Note 11, are essentially identical and
among the most important in analysis.
• A constant sequence {an } with an = L for all n has the obvious limit L.
• In Definition 10.3.5 above note that, while any neighborhood interval N (L)
does contain all terms an for n sufficiently large, hence for infinitely many
n ∈ N, it is misleading to state that N (L) contains infinitely many terms
of the sequence since these terms may not be distinct. In fact, N (L) may
contain only finitely many distinct numbers an : consider the extreme example
of the constant sequence {an } with an = L for all n ∈ N. Then N (L) contains
exactly one term of the sequence (namely L), yet an is in N (L) for all natural
numbers n. Still, it is quite common to abuse language and quantify over
elements an in the range of the sequence rather than over the natural numbers
n (in the domain of the sequence). For instance “infinitely many terms of
{an } satisfy […]” is commonly interpreted to mean “for infinitely many natural
numbers n, an satisfies […]”.
10.5 Exercise. Let {an } be a real sequence. Prove:
A → E
{a_n} ↦ L = lim_{n→∞} a_n.
We then have
n ≤ N ⇒ |an | ≤ B,
Therefore, |an | ≤ max{1 + |L|, B} for all n, as required.
10.3.20 Corollary. Every unbounded sequence diverges.
Proof. Let {an } be a sequence. Proposition 10.3.19 above shows that
Therefore, by contraposition,
{a_n}_{n=0}^{∞} = {1, 2, 1/2, 1, 3, 1/3, 1, 4, 1/4, . . .}
10.9 Exercise. Let S be a set. Show that a sequence is frequently in S if and only
if it has a subsequence in S.
10.4.9 Proposition. A point L ∈ E is a subsequential limit of {an } if and only
if, for every neighborhood N (L), {an } is frequently in N (L).
Proof. If L = lim_{k→∞} a_{n_k} and N(L) is any neighborhood of L then there exists
K such that a_{n_k} ∈ N(L) for all k > K. The K-th tail {a_{n_{k+K}} | k ∈ N⁺} is
a subsequence of {a_n} in N(L). Hence {a_n} is frequently in N(L) (this is the
backward (easy) implication of Exercise 10.9).
Conversely, let L ∈ E such that every N (L) contains terms an for arbitrarily
large n. Let {Nn } be any shrinking sequence of neighborhoods of L. (For instance,
Nn = N1/n (L).)
Define an increasing sequence {nk } ⊂ N recursively as follows:
• For any k ∈ N, given nk , let nk+1 be any integer n such that n > nk and
an ∈ Nk+1 .
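The recursive choice of indices can be sketched in code. The example below is an illustration only: it uses the sequence 1, 2, 1/2, 1, 3, 1/3, . . . as test data, takes the shrinking neighborhoods to be the intervals of radius 1/(k + 1) about L (our assumption about the indexing), and caps the search with a hypothetical `search_limit`, since a program cannot know in advance that the sequence is frequently in each neighborhood.

```python
from fractions import Fraction

def a(n):
    """The sequence 1, 2, 1/2, 1, 3, 1/3, 1, 4, 1/4, ... (indexed from n = 0)."""
    q, r = divmod(n, 3)
    if r == 0:
        return Fraction(1)
    if r == 1:
        return Fraction(q + 2)
    return Fraction(1, q + 2)

def subsequence_indices(L, count, search_limit=10_000):
    """Pick n_0 < n_1 < ... with a_{n_k} in (L - 1/(k+1), L + 1/(k+1))."""
    indices, n = [], 0
    for k in range(count):
        radius = Fraction(1, k + 1)
        while abs(a(n) - L) >= radius:      # skip terms outside the k-th neighborhood
            n += 1
            if n > search_limit:
                raise RuntimeError("no suitable term found within search_limit")
        indices.append(n)
        n += 1                               # forces n_{k+1} > n_k
    return indices

print(subsequence_indices(Fraction(0), 4))   # indices of a subsequence tending to 0
print(subsequence_indices(Fraction(1), 4))   # indices of a subsequence tending to 1
```

Each new index is searched for strictly beyond the previous one, which is what makes the result a subsequence rather than merely a selection of terms.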
emulating that one: See Exercise 10.10 below. We also remark that, in the termi-
nology introduced in Exercise 10.11 below, this theorem states that (i) bounded
subsets of R, and also (ii) arbitrary subsets of the extended reals E are relatively
sequentially compact.)
Proof. Note that the second statement follows from the first since obviously a
bounded sequence {an } can only have bounded (hence real) subsequential limits,
hence {an } must have a convergent subsequence provided it has any subsequential
limits at all.
In what follows, let {an } be a sequence in E. For each l ∈ N let {an+l } be
the l-th tail of {an }. Let Bl be the closure of the range of {an+l } (a nonempty,
closed subset of E). The range of {an+l+1 } is a subset of the range of {an+l }, hence
Bl ⊃ Bl+1 : in other words, B0 ⊃ B1 ⊃ . . . is a nested sequence of nonempty
bounded and closed subsets of E. In particular, B_0 is compact (by Heine-Borel,
Theorem 10.2.9), and the sequence {B_l} consists of closed subsets of a compact
set. Let I = ⋂_{l∈N} B_l. By the FIP characterization of compactness we conclude
that I ≠ ∅.
Let L ∈ I. The proof will be complete once we prove that {an } is frequently in
any neighborhood N of L, by Proposition 10.4.9. Given such N and an arbitrary
natural number M we have L ∈ B_{M+1}, so we have N ∩ {a_{n+M+1}} ≠ ∅, by definition
of closure. Hence, there exists m ∈ N with am+M +1 ∈ N . Letting n = m + M + 1
we have n > M and an ∈ N . We have shown that {an } is frequently (infinitely
often) in an arbitrary neighborhood N of L, so L is a subsequential limit of {an },
concluding the proof.
10.4.12 Remark. The proof above shows ⋂_l B_l ⊂ SeqLim{a_n}. Actually, equality
holds. See Theorem 10.9.17.
10.10 Exercise. Provide an alternate proof of Theorem 10.4.11 along the lines of
that given for the Bolzano-Weierstrass theorem for bounded infinite subsets of R,
Theorem 9.4.21. Alternatively, use 9.4.21 to prove 10.4.11.
10.11 Exercise. A subset S ⊂ R is called relatively sequentially compact if any
sequence {an } ⊂ S has a convergent subsequence. A subset K ⊂ R is called
sequentially compact if any sequence {an } ⊂ K has a convergent subsequence
whose (unique) limit is in K.
1. Prove that a subset S ⊂ R is relatively sequentially compact if and only if
its closure S⁻ (in R) is sequentially compact.
2. Prove that a subset of R is relatively sequentially compact if and only if it
is bounded, and sequentially compact if and only if it is closed and bounded
(i.e., for subsets of R sequential compactness is the same property as com-
pactness).
As long as m > N we will have 0 ≤ log_b(n/m) = log_b(1 + (n − m)/m) < ϵ since
0 ≤ (n − m)/m < K/m < K/N = X. Thus,
0 < n − m < K ∧ n > N ⇒ |a_n − a_m| < ϵ.
In particular, given N and any ϵ > 0 there always exist terms am , an such that
|an − am | < ϵ. However, this is not enough to conclude that the sequence {an } is
Cauchy; indeed it is not. In fact, in the case at hand the Cauchy property fails not
just barely, but in a spectacular fashion. For its failure, it would suffice to find a
single positive ϵ (presumably small) for which the property not be met; however, in
this case the property fails to hold for any ϵ > 0 whatsoever, no matter how large!
In fact, given any ϵ > 0 (say ϵ = 1, or ϵ = 10^{10^{100}} for that matter) and any N choose
m > max{N, 1} and any n > b^ϵ m. Then n > m > N and |a_n − a_m| > log_b b^ϵ = ϵ.
In other words, there exist arbitrarily large m, n such that am , an are arbitrarily far
apart, so our earlier calculation that am , an may be close together with m, n large
was rather misleading, and did not at all prove the Cauchy property.
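A quick numerical experiment shows both phenomena side by side. The sketch assumes, as the computation with log_b(n/m) suggests, that the sequence under discussion is a_n = log_b n; the base b = 2 and the helper name `far_apart_pair` are our own choices for illustration.

```python
import math

def a(n, b=2.0):
    """a_n = log_b(n); this formula is inferred from the surrounding computation."""
    return math.log(n, b)

def far_apart_pair(N, eps, b=2.0):
    """Return indices n > m > N with a_n - a_m > eps, following the recipe above."""
    m = max(N, 1) + 1
    n = math.floor(b ** eps * m) + 1     # any n > b**eps * m works
    return m, n

m, n = far_apart_pair(N=1000, eps=5.0)
print(m, n, a(n) - a(m))                 # the gap exceeds eps = 5
print(a(m + 1) - a(m))                   # yet neighbouring terms are very close
```

Closeness of neighbouring terms is therefore not the Cauchy property, which must control every pair of indices beyond N.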
10.5.3 Proposition. Every convergent sequence is Cauchy.
Proof. Suppose that {a_n} is a convergent sequence. Then, there is some L such that
given any ϵ > 0 there is an N such that
n > N ⇒ |a_n − L| < ϵ/2.
Therefore,
m, n > N ⇒ |a_m − a_n| = |(a_m − L) + (L − a_n)| ≤ |a_m − L| + |L − a_n| < ϵ/2 + ϵ/2 = ϵ,
showing that the sequence is Cauchy.
10.5.4 Proposition. Every Cauchy sequence is bounded.
Proof. Suppose that {an } is a Cauchy sequence. Then, there is some N such that
Therefore,
Then
(6a) limn→∞ an /bn = +∞ if L1 > L2 = 0 and bn > 0 for all n, or else L1 <
L2 = 0 and bn < 0 for all n;
(6b) limn→∞ an /bn = −∞ if L1 > L2 = 0 and bn < 0 for all n, or else L1 <
L2 = 0 and bn > 0 for all n;
Proof. The proofs below cover all cases involving only real numbers and their arith-
metic. The reader is asked to provide the necessary modifications needed to deal
with extended reals.
We prove (1) Given ϵ > 0 there is some N such that
Therefore, we have
n > N_2 ⇒ |L_2 a_n − L_1 b_n| < ϵ L_2^2 / 2.
For N = max{N_1, N_2} we have
n > N ⇒ |a_n/b_n − L_1/L_2| = |(L_2 a_n − L_1 b_n)/(b_n L_2)|
= |L_2 a_n − L_1 b_n| / (|b_n| |L_2|)
< 2 |L_2 a_n − L_1 b_n| / L_2^2
< ϵ.
Therefore,
n > N ⇒ |an | − |L1 | ≤ |an − L1 | < ϵ
The calculation above shows that m, n > N ⇒ |a_n − a_m| < γ^N |a_1 − a_0| / (1 − γ).
Note that this last upper bound depends only on N and tends to 0 as a function
of N, as N → ∞. Therefore the sequence {a_n} is Cauchy. (To be explicit, given
ϵ > 0 choose any N > log_γ((1 − γ)ϵ / |a_1 − a_0|). Then m, n > N ⇒ |a_n − a_m| < ϵ.)
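The bound can be tested on a concrete contractive sequence. The recursion a_{n+1} = 1/(2 + a_n) with a_0 = 0 is our own illustrative choice, not an example from the text; for nonnegative terms it satisfies |a_{n+2} − a_{n+1}| ≤ γ|a_{n+1} − a_n| with γ = 1/4, because |1/(2 + x) − 1/(2 + y)| = |x − y| / ((2 + x)(2 + y)).

```python
from fractions import Fraction

GAMMA = Fraction(1, 4)        # contraction constant of x -> 1/(2 + x) on [0, +oo)

def terms(count, a0=Fraction(0)):
    """First `count` terms of the illustrative recursion a_{n+1} = 1 / (2 + a_n)."""
    seq = [a0]
    while len(seq) < count:
        seq.append(1 / (2 + seq[-1]))
    return seq

def cauchy_bound(seq, N):
    """The bound gamma^N |a_1 - a_0| / (1 - gamma) on |a_n - a_m| for m, n > N."""
    return GAMMA ** N * abs(seq[1] - seq[0]) / (1 - GAMMA)

seq = terms(12)
N = 3
worst = max(abs(seq[n] - seq[m]) for m in range(N + 1, 12) for n in range(N + 1, 12))
print(float(worst), float(cauchy_bound(seq, N)))
```

Only finitely many pairs are compared, so this is a sanity check of the inequality rather than a proof of it.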
10.7.4 Theorem (Squeeze Principle). If {a_n} and {c_n} are real sequences
with a common limit L ∈ E and {b_n} is a real sequence such that
a_n ≤ b_n ≤ c_n for all n ∈ N,
then {b_n} converges to L also.
(Note: By the finality of the property “b_n → L as n → ∞”, it is always enough
to assume that the inequalities hold for all sufficiently large n.)
Proof. It suffices to show, given an arbitrary convex neighborhood N of L, that b_n
is eventually in N. Since a_n → L, a_n is eventually in N, say for all n > N_1.
Similarly, c_n is eventually in N, say for all n > N_2. Let N_0 = max{N_1, N_2}. By the
assumed inequalities and the convexity of N we have n > N_0 ⇒ b_n ∈ [a_n, c_n] ⊂ N.
We conclude that b_n → L as n → ∞.
We will not present a proof of the above result. We content ourselves with pointing
out that the standard proof relies heavily on Zorn’s Lemma, itself a nontrivial
consequence of the Axiom of Choice. However, from an intuitive point of view the
Transfinite Recursion Theorem simply says that if successive values of a function
are specified solely in terms of prior ones, then the overall function may be defined
by a recursive process.
10.8.2 Proposition. There exists a unique function φ : N × N → N such that
and
This ordering is strict and linear (see Exercise 6.8). For each (i, j) ∈ N × N let
I(i, j) be the initial segment {(a, b) ∈ I | (a, b) ≺ (i, j)} and define f_{ij} : N^{I(i,j)} → N
by
f_{ij}({n_{ab}}_{(a,b)∈I(i,j)}) = 1 if i = 0 or j = 0,
f_{ij}({n_{ab}}_{(a,b)∈I(i,j)}) = n_{(i−1,j)} + n_{(i,j−1)} if i > 0 and j > 0.
The existence and uniqueness of φ now follows immediately from the Transfinite
Recursion Theorem 10.8.1. (Note that i, j > 0 if and only if i = m + 1, j = n + 1 for
some natural numbers m, n; thus m = i − 1 and n = j − 1.)
10.8.3 Remark. We hope that the reader finds the existence of the function φ in
Proposition 10.8.2 rather intuitive. We do not wish to suggest that the full power
of the Transfinite Recursion Theorem is needed to construct the recursively defined
function φ. Far from it. It is perfectly possible, and the reader is invited to try to
find a proof of the existence of φ using the Countable Recursion Theorem 5.8.2.
We use transfinite recursion because it is the natural tool to dispatch the issue at
hand quickly before proceeding to the goal of defining binomial coefficients.
10.8.4 Definition. For natural numbers n, k with 0 ≤ k ≤ n we define the (n, k)-th
binomial coefficient
(10.6) \binom{n}{k} = φ(n − k, k)
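The recursive description of φ translates directly into code. The sketch below assumes the recurrence read off from the functions f_{ij} above, namely φ(i, j) = 1 when i = 0 or j = 0 and φ(i, j) = φ(i − 1, j) + φ(i, j − 1) otherwise; the Python names `phi` and `binomial` are ours. It cross-checks definition (10.6) against the standard library function `math.comb`.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def phi(i, j):
    """phi(i, j) = 1 on the border (i = 0 or j = 0), else phi(i-1, j) + phi(i, j-1)."""
    if i == 0 or j == 0:
        return 1
    return phi(i - 1, j) + phi(i, j - 1)

def binomial(n, k):
    """The (n, k)-th binomial coefficient, defined as phi(n - k, k) for 0 <= k <= n."""
    if not 0 <= k <= n:
        raise ValueError("binomial(n, k) requires 0 <= k <= n")
    return phi(n - k, k)

# Cross-check against the standard library on a small triangle of values.
print(all(binomial(n, k) == comb(n, k) for n in range(15) for k in range(n + 1)))
```

Memoization via `lru_cache` plays the role of "values already defined on the initial segment": each φ(i, j) is computed once from previously computed values.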
(10.13)
(x + y)^n = \binom{n}{0} x^n + \binom{n}{1} x^{n−1} y + · · · + \binom{n}{k} x^{n−k} y^k + · · · + \binom{n}{n−1} x y^{n−1} + \binom{n}{n} y^n
= 1x^n + n x^{n−1} y + · · · + n x y^{n−1} + 1y^n
= ∑_{k=0}^{n} \binom{n}{k} x^{n−k} y^k.
(x + y)^2 = x^2 + 2xy + y^2
(x + y)^3 = x^3 + 3x^2 y + 3xy^2 + y^3
(x + y)^4 = x^4 + 4x^3 y + 6x^2 y^2 + 4xy^3 + y^4
(x + y)^5 = x^5 + 5x^4 y + 10x^3 y^2 + 10x^2 y^3 + 5xy^4 + y^5.
In the remainder of this section we work in the field R of real numbers. (The
same definitions and statements hold in Q or C, and in fact in any field whose
prime ring is isomorphic to Z, that is, any field of characteristic zero.)
10.8.9 Definition (Factorial). For n ∈ N define the factorial function n! (read:
n-factorial) recursively as follows. Define 0! = 1 and (n + 1)! = (n + 1)(n!). Thus,
(10.14) n! = ∏_{i=1}^{n} i = 1 · 2 · . . . · n.
C(n, k) + C(n, k + 1) = ∏_{j=1}^{k} (n − j + 1)/j + ∏_{j=1}^{k+1} (n − j + 1)/j
= ∏_{j=1}^{k} (n − j + 1)/j + (n − k)/(k + 1) · ∏_{j=1}^{k} (n − j + 1)/j
= (1 + (n − k)/(k + 1)) ∏_{j=1}^{k} (n − j + 1)/j
= (n + 1)/(k + 1) · ∏_{j=1}^{k} (n − j + 1) / ∏_{j=1}^{k} j
= (n + 1) ∏_{ℓ=1}^{k} (n − ℓ + 1) / ((k + 1) ∏_{j=1}^{k} j)
= ∏_{ℓ=0}^{k} (n − ℓ + 1) / ∏_{j=1}^{k+1} j
= ∏_{j=1}^{k+1} (n + 1 − j + 1) / ∏_{j=1}^{k+1} j    (letting ℓ = j − 1)
= ∏_{j=1}^{k+1} (n + 1 − j + 1)/j
= C(n + 1, k + 1).
We conclude the proof noting that C(n, k) must be equal to \binom{n}{k} for all n, k ≥ 0
since both quantities satisfy the same recurrence. The proof of (10.15) is complete.
It is worth noting that, while formula (10.15) is valid for all k ∈ N, for-
mula (10.16) is only valid for k ≤ n. Moreover, the attentive reader will readily
observe that the right-hand side of (10.15) is perfectly well defined for any number
n (say, rational, or even real or complex!). This is no coincidence, as shall be
explained in Note 10 when binomial series are introduced.
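The remark that the product makes sense for arbitrary n is easy to test. The sketch below implements the product C(n, k) = ∏_{j=1}^{k} (n − j + 1)/j used in the derivation above (which we take to be the right-hand side of (10.15)) with exact rationals, and checks the recurrence C(n, k) + C(n, k + 1) = C(n + 1, k + 1) for a non-integer n. The function names are ours.

```python
from fractions import Fraction

def C(n, k):
    """Product formula C(n, k) = prod_{j=1}^{k} (n - j + 1) / j, for k in N and any rational n."""
    result = Fraction(1)
    for j in range(1, k + 1):
        result *= Fraction(n - j + 1) / j
    return result

def pascal_holds(n, k):
    """Check the recurrence C(n, k) + C(n, k+1) = C(n+1, k+1) exactly."""
    return C(n, k) + C(n, k + 1) == C(n + 1, k + 1)

print(C(5, 2), C(Fraction(1, 2), 2))     # an integer n and a rational n
print(all(pascal_holds(Fraction(1, 2), k) for k in range(10)))
```

Nothing in the algebra of the derivation uses that n is a natural number, which is why the check succeeds for n = 1/2 as well.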
10.18 Exercise. As a corollary of the Binomial Theorem, prove Bernoulli’s in-
equality
(1 + t)n ≥ 1 + nt
for all n ∈ N and all t ≥ 0.
(Note: Bernoulli’s inequality actually holds for all t ≥ −1.)
10.8.13 Examples. • The coefficient of x^2 in the expansion of (1 + x)^{100} is
\binom{100}{2} = (100·99)/(1·2) = 9900/2 = 4950. That of x^5 in (1 + x)^{10} is
\binom{10}{5} = (10·9·8·7·6)/(1·2·3·4·5) = 252.
10.9 Monotonicity
10.9.1 Definition. A real (or extended real) sequence {an } is said to be
(1) monotonically increasing if an ≤ an+1 for n = 0, 1, 2, . . .;
(2) monotonically decreasing if an ≥ an+1 for n = 0, 1, 2, . . ..
A monotonic sequence is one that is either monotonically increasing or monotoni-
cally decreasing.
10.9.2 Remark. It is a trivial matter to prove by induction that, for a monoton-
ically increasing (resp., decreasing) sequence {an } and all natural numbers m ≤ n
we have am ≤ an (resp., am ≥ an .)
10.9.3 Theorem (Monotone Sequence Theorem). A monotonic sequence
converges if and only if it is bounded.
Proof. Part 1. Suppose that {an } is a monotonically increasing sequence. If {an }
converges, it is bounded. Now, we must construct the proof in the other direction.
Suppose that the sequence {an } is bounded. Let R denote the range of {an }. Let
B denote the least upper bound of R. Therefore,
(1) a_n ≤ B for n = 0, 1, 2, . . .
(2) B − ϵ < aN ≤ B,
otherwise B cannot be the least upper bound of R. Since the sequence is mono-
tonically increasing,
(3) n > N ⇒ aN ≤ an .
showing that an → B as n → ∞.
Part 2. If {an } is a monotonically decreasing sequence, the sequence {−an } is
monotonically increasing. Clearly, {an } is bounded if and only if {−an } is bounded.
Thanks to the first part of the proof, the conclusion of the theorem applies to the
sequence {−an }. We use Theorem 10.6.1(2) with c = −1 to conclude that the
convergence of {an } follows from the convergence of {−an }.
10.19 Exercise. Write a direct proof of Part 2 of the proof of Theorem 10.9.3
above. That is, modify Part 1 of the proof to apply directly to monotonically
decreasing sequences.
10.9.4 Proposition. Every monotonic sequence {an } has a limit limn→∞ an ∈ E.
Proof. By Theorem 10.9.3, we need only consider unbounded monotonic sequences.
Say {an } is monotonically increasing and unbounded. It is necessarily bounded
below by a0 , so it must be unbounded above. Given T there exists N such that
aN > T . Since {an } is increasing, we have n > N ⇒ an ≥ aN > T . Hence
an → +∞ as n → ∞. The case of {an } monotonically decreasing is formally
identical.
10.9.5 Theorem (Limits of Powers). For any real number κ ≠ 0 and any
sequence {a_n} in (0, +∞) with lim_{n→∞} a_n = L ∈ [0, +∞]:
lim_{n→∞} (a_n)^κ = L^κ
provided L^κ is defined. (Refer to Definition 10.1.4 and the exercise below for
improper cases.)
Proof. The proof below assumes L ∈ (0, +∞). The modifications needed when
L = 0, +∞ are left to the reader (see Exercise 10.20 below).
First, assume κ > 0. Let ϵ > 0 be given. Let ϵ_0 = min{ϵ, (1/2)L^κ}. Since
L > 0 we have ϵ_0 > 0. Then 0 < (1/2)L^κ ≤ L^κ − ϵ_0 < L^κ < L^κ + ϵ_0. Let δ =
min{(L^κ + ϵ_0)^{1/κ} − L, L − (L^κ − ϵ_0)^{1/κ}}. Note that δ > 0 since κ > 0 implies that
pow_{1/κ} : x ↦ x^{1/κ} is strictly increasing (Proposition 8.7.24).
By the strict monotonicity of pow_κ : x ↦ x^κ, we have
Therefore, given ϵ > 0, it suffices to choose N such that n > N ⇒ |a_n − L| < δ in
order to conclude that (a_n)^κ → L^κ as n → ∞.
For κ < 0 we have
lim_{n→∞} (a_n)^κ = lim_{n→∞} 1/(a_n)^{−κ}
= 1 / lim_{n→∞} (a_n)^{−κ}    by Theorem 10.6.1(6)
= 1 / L^{−κ}    since −κ > 0 (above case)
= L^κ.
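The choice of δ in the case κ > 0 can be tried out numerically. The sketch below is a floating-point illustration with sample values L = 2, κ = 3, ϵ = 0.5 chosen by us; `delta_for` is a hypothetical helper that computes the δ of the proof, and the check is made only at a few sample points strictly inside (L − δ, L + δ).

```python
def delta_for(L, kappa, eps):
    """The delta from the proof above, for L > 0 and kappa > 0."""
    eps0 = min(eps, 0.5 * L ** kappa)
    upper = (L ** kappa + eps0) ** (1 / kappa) - L
    lower = L - (L ** kappa - eps0) ** (1 / kappa)
    return min(upper, lower)

L, kappa, eps = 2.0, 3.0, 0.5
delta = delta_for(L, kappa, eps)
# Sample points strictly inside (L - delta, L + delta); test |x^kappa - L^kappa| < eps.
samples = [L + t * delta for t in (-0.999, -0.5, 0.0, 0.5, 0.999)]
print(delta, all(abs(x ** kappa - L ** kappa) < eps for x in samples))
```

Capping ϵ at (1/2)L^κ keeps L^κ − ϵ_0 positive, so the root (L^κ − ϵ_0)^{1/κ} is defined.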
10.20 Exercise. Let {a_n} be a real sequence (not necessarily positive) and κ ∈ R.
Suppose that {a_n} has a limit L ∈ E. Show that the equality
lim_{n→∞} (a_n)^κ = L^κ.
lim_{n→∞} b^{a_n} = b^L.
Therefore, given ϵ > 0, it suffices to choose N such that n > N ⇒ |a_n − L| < δ in
order to conclude that b^{a_n} → b^L as n → ∞.
If 0 < b < 1 we have
lim_{n→∞} b^{a_n} = lim_{n→∞} (b^{−1})^{−a_n}
= (b^{−1})^{−L}    by case above (b^{−1} > 1) and Theorem 10.6.1(2)
= b^L.
10.9.7 Theorem (Limits of Logarithms). For any real positive b ≠ 1 and any
sequence {an } in [0, +∞] with limn→∞ an = L ∈ [0, +∞]:
10.21 Exercise. Fill in the missing details in the proofs of Theorems 10.9.6
and 10.9.7.
10.22 Exercise. Strengthen Theorems 10.9.6 and 10.9.7 by extending them to
cover the cases limn→∞ an = +∞ as well as {an } in [0, +∞) with limn→∞ an = 0
in 10.9.7 and limn→∞ an = −∞ in 10.9.6.
10.9.8 Theorem (Some Special Sequences).
(1) If p > 0 then lim_{n→∞} 1/n^p = 0.
(2) If p > 0 then lim_{n→∞} \sqrt[n]{p} = 1.
(3) lim_{n→∞} \sqrt[n]{n} = 1.
(4) If p > 0 and r is any real number then lim_{n→∞} n^r/(1 + p)^n = 0.
Since |x| < 1 implies p = (1 − x)/x > 0, the result follows thanks to the above
and (4).
10.9.9 Definition. Let {ak }k≥0 be a real sequence. Let
be the sequences of suprema and infima (in the extended reals) of the tails of {ak }.
Then {un } is monotonically decreasing and {ln } is monotonically increasing, so
both have a limit in E. We define
• The notations
\overline{lim}_{n→∞} a_n and \underline{lim}_{n→∞} a_n
are common alternatives to denote the superior and inferior limits of the
sequence {an }.
10.9.11 Examples.
(1) For any sequence {an } we have limn→∞ an = L if and only if
(2) For the sequence an = (−1)n we have lim supn→∞ (−1)n = +1 = limn→∞ a2n
and lim inf n→∞ (−1)n = −1 = limn→∞ a2n+1 . In fact, these are the only
subsequential limits of {an }.
(3) The sequence a_n = n^{(−1)^{n+1}}, that is the sequence 1, 2^{−1}, 3, 4^{−1}, 5, . . . (for n ≥ 1), has
superior limit +∞ = lim sup_{n→∞} a_n = lim_{n→∞} a_{2n+1} and inferior limit 0 =
lim inf_{n→∞} a_n = lim_{n→∞} a_{2n}, and these are exactly its subsequential limits.
(4) The sequence a_n = ⟨n√2⟩ = n√2 − ⌊n√2⌋ (a_n is the fractional part of n√2)
has superior limit 1 and inferior limit 0. It can be shown that every number
in [0, 1] is a subsequential limit of {a_n}.
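Examples (2) and (4) can be explored numerically. A program cannot take a supremum over an infinite tail, so the sketch below replaces sup_{k≥n} a_k and inf_{k≥n} a_k by a maximum and minimum over a finite window n ≤ k < horizon; this truncation is our assumption and only approximates the quantities u_n and l_n of Definition 10.9.9. All function names are ours.

```python
import math

def tail_sup(a, n, horizon):
    """Approximate u_n = sup_{k >= n} a_k by a maximum over n <= k < horizon."""
    return max(a(k) for k in range(n, horizon))

def tail_inf(a, n, horizon):
    """Approximate l_n = inf_{k >= n} a_k by a minimum over n <= k < horizon."""
    return min(a(k) for k in range(n, horizon))

def alternating(n):
    """Example (2): a_n = (-1)^n."""
    return (-1) ** n

def frac_sqrt2(n):
    """Example (4): the fractional part of n * sqrt(2), in floating point."""
    x = n * math.sqrt(2)
    return x - math.floor(x)

for seq in (alternating, frac_sqrt2):
    print(seq.__name__, tail_inf(seq, 1000, 6000), tail_sup(seq, 1000, 6000))
```

For the fractional parts, the window already contains values very close to 0 and to 1, in line with the stated inferior and superior limits.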
The reader may ask at this point whether the superior (resp., inferior) limits
are always equal to the largest (resp., smallest) subsequential limits of a sequence.
The answer is affirmative and given in Theorem 10.9.17 below.
10.9.12 Proposition. Let {an } and {bn } be real sequences and c ∈ R. Then the
following relations hold (using the arithmetic of the extended reals):
Proof. We only prove the first part of 4., leaving everything else to the reader
(actually, the second part of 4. follows from the first together with the case c = −1
of 3.). Set un = supk≥n ak , vn = supk≥n bk and wn = supk≥n (ak + bk ). For k ≥ n
we have ak ≤ un and bk ≤ vn , hence ak + bk ≤ un + vn . Hence un + vn is
an upper bound for the tail {ak + bk }k≥n , whence wn ≤ un + vn follows. Then
lim sup_{n→∞} (a_n + b_n) = lim_{n→∞} w_n ≤ lim_{n→∞} (u_n + v_n) = lim_{n→∞} u_n + lim_{n→∞} v_n =
lim sup_{n→∞} a_n + lim sup_{n→∞} b_n (as long as the last sum is defined).
10.9.13 Proposition. If limn→∞ an exists in E, then
Proof. We prove only the right equality. Set un = supk≥n ak . Let L = limn→∞ an
and U = limn→∞ un = lim supn→∞ an .
By Proposition 10.2.14, any neighborhood N (L) of L contains a closed sub-
neighborhood N . Since an → L as n → ∞, the tails {ak }k≥n are in N for n
sufficiently large. Since N is closed in E, N contains the closure of those tails,
hence un ∈ N for all n large. Since N is closed, U = limn→∞ un ∈ N ⊂ N (L).
Since U is in every neighborhood of L we have U = L.
10.9.14 Proposition. Let {an } and {bn } be sequences such that an ≤ bn for all
sufficiently large n. Then
and
inf C = lim inf Cn = sup{inf Cn | n ∈ N}.
n→∞
Proof. First note that {Cn }, being a nested sequence of nonempty closed subsets
of the compact set E satisfies the FIP property; hence C is nonempty, by the FIP
characterization of compactness. Moreover C is closed, being an intersection of
closed sets.
We only prove the equalities for sup C, the case of inf C following formally
identically. Let s = inf{sup Cn }. Since {Cn } is nested the sequence {sup Cn } is
monotonically decreasing and s = limn→∞ sup Cn .
We have C ⊂ Cn , hence sup C ≤ sup Cn for all n ∈ N. Thus, sup C ≤ s.
Now, let t > sup C be arbitrary. Let D_n = C_n ∩ [t, +∞]. Then {D_n} is a nested
sequence of closed sets. Note that ⋂_n D_n ⊂ C ∩ [t, +∞] = ∅ since C ⊂ [−∞, t).
By (the contrapositive of) the FIP characterization of compactness and the nesting
10.10 Series
10.10.1 Definition. Given a real sequence {an }, the sequence of terms, we form
another sequence, the sequence of partial sums {sk }, defined by
s_k = ∑_{n=0}^{k} a_n,
Proof. Apply the Cauchy criterion to the sequence of partial sums. For every ϵ > 0
there is some N such that
m, n > N ⇒ |∑_{k=0}^{n} a_k − ∑_{k=0}^{m} a_k| < ϵ.
∑_{n=0}^{∞} a_n = L_1 ∈ E and ∑_{n=0}^{∞} b_n = L_2 ∈ E.
Then
(1) ∑_{n=0}^{∞} c a_n = cL_1 for any c ∈ R
(here we interpret 0(±∞) as 0);
(2) ∑_{n=0}^{∞} (a_n + b_n) = L_1 + L_2 if L_1 + L_2 is defined
(i.e., it is not the sum of +∞ and −∞);
Proof. Let {σn } (resp., {ςn }) be the sequence of partial sums of {an } (resp., of
{bn }).
• We prove (1). The sequence {sn } of partial sums of {can } satisfies sn = cσn .
The claim follows from Theorem 10.6.1(2).
= σn − σm .
Since ∑ |a_k| converges, the sequence of its partial sums {σ_n} converges, so it is a
Cauchy sequence. By the inequality above, {s_n} is also a Cauchy sequence, hence
convergent. Therefore the series ∑ a_k converges.
10.10.9 Theorem (Rearrangement Theorem). Let {a_k} be a real sequence.
Let i ↦ k_i be a bijection N → N (a permutation of N). The sequence {a_{k_i}} is
called a rearrangement of {a_k}.
If the series ∑_k a_k is absolutely convergent then the rearranged series
∑_i a_{k_i} is also absolutely convergent, and
∑_{i=0}^{∞} a_{k_i} = ∑_{k=0}^{∞} a_k.
Proof. Let {s_k}, {σ_k}, {t_i} and {τ_i} be the sequences of partial sums of {a_k},
{|a_k|}, {a_{k_i}} and {|a_{k_i}|}.
First we prove {a_{k_i}} is absolutely summable. Let A = ∑_k |a_k|. Then σ_k ≤
A < +∞ for all k since ∑_k |a_k| is absolutely convergent. By monotonicity, it
suffices to show that {τ_i} is bounded above by A. Let j ↦ k_j be the inverse to
the bijection k ↦ i_k. Given i, let K = max{k_j | 0 ≤ j ≤ i}. Then we have
{k_0, k_1, . . . , k_i} ⊂ {0, 1, . . . , K}, so
τ_i = ∑_{j=0}^{i} |a_{k_j}| ≤ ∑_{k=0}^{K} |a_k| ≤ σ_K ≤ A.
Now, let S = ∑_k a_k. We shall prove that ∑_j a_{k_j} = S. Note first that
|S − s_k| = |∑_{j=0}^{∞} a_j − ∑_{j=0}^{k} a_j| = |∑_{j=k+1}^{∞} a_j| ≤ ∑_{j=k+1}^{∞} |a_j| = A − σ_k.
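A small experiment illustrates the statement of the Rearrangement Theorem. The series ∑ (−1/2)^k, with sum 2/3, and the permutation "two even indices, then one odd index" are our own illustrative choices; the sketch computes a partial sum of the rearranged series exactly and compares it with 2/3.

```python
from fractions import Fraction
from itertools import count, islice

def a(k):
    """Terms of an absolutely convergent series: a_k = (-1/2)^k, with sum 2/3."""
    return Fraction(-1, 2) ** k

def rearranged_indices():
    """A permutation of N: two even indices, then one odd index, repeatedly."""
    evens, odds = count(0, 2), count(1, 2)
    while True:
        yield next(evens)
        yield next(evens)
        yield next(odds)

def rearranged_partial_sum(terms):
    """Partial sum t_i of the rearranged series over its first `terms` terms."""
    return sum(a(k) for k in islice(rearranged_indices(), terms))

print(float(rearranged_partial_sum(60)), float(Fraction(2, 3)))
```

Every natural number occurs exactly once in the stream of indices, so this is a genuine rearrangement; absolute convergence is what prevents the reordering from changing the sum.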
Assume that {b_{n,0}}_n is summable.§ Then, for each k ∈ N, the sequence {b_{n,k}}_n is
summable, and
lim_{k→∞} ∑_{n=0}^{∞} b_{n,k} = 0.
§ This is the “domination” condition, in the sense that the series ∑_n b_{n,0} is convergent and it
dominates all others ∑_n b_{n,k} term-by-term.
Proof. We have 0 ≤ b_{n,k} ≤ b_{n,0}. By the comparison test and the assumed summability
of {b_{n,0}}_n, {b_{n,k}}_n is summable for every k ∈ N. Let ϵ > 0 be given. Since
{b_{n,0}}_n is summable there exists N ∈ N⁺ such that ∑_{n=N}^{∞} b_{n,0} < ϵ/2. For fixed
n ∈ N, since {b_{n,k}}_k decreases to 0, there exists K_n such that k > K_n ⇒ b_{n,k} < ϵ/(2N).
Let K = max{K_0, K_1, . . . , K_N}. Then we have, for all k > K:
0 ≤ ∑_{n=0}^{∞} b_{n,k} = ∑_{n=0}^{N−1} b_{n,k} + ∑_{n=N}^{∞} b_{n,k}
< ∑_{n=0}^{N−1} ϵ/(2N) + ∑_{n=N}^{∞} b_{n,0}
< N · ϵ/(2N) + ϵ/2 = ϵ.
Since this holds for all ϵ > 0 we conclude lim_{k→∞} ∑_{n=0}^{∞} b_{n,k} = 0.
By Proposition 10.10.13,
0 = lim_{k→∞} ∑_{n=0}^{∞} b_{n,k} = lim_{k→∞} ∑_{n=0}^{∞} (b_n − a_{n,k})
= lim_{k→∞} (∑_{n=0}^{∞} b_n − ∑_{n=0}^{∞} a_{n,k})
= lim_{k→∞} ∑_{n=0}^{∞} b_n − lim_{k→∞} ∑_{n=0}^{∞} a_{n,k}
= ∑_{n=0}^{∞} b_n − lim_{k→∞} ∑_{n=0}^{∞} a_{n,k}.
This shows that lim_{k→∞} ∑_{n=0}^{∞} a_{n,k} = ∑_{n=0}^{∞} b_n, concluding the proof in the case
when {b_n} is summable.
Now assume ∑_n b_n = +∞. For all M ∈ R there exists N ∈ N⁺ such that
∑_{n=0}^{N−1} b_n > M + 1. For all n < N there exist K_n ∈ N such that a_{n,k} > b_n − 1/N for
all k > K_n (since a_{n,k} → b_n as k → ∞). Let K = max{K_0, K_1, . . . , K_{N−1}}. Then
k > K ⇒ ∑_{n=0}^{∞} a_{n,k} ≥ ∑_{n=0}^{N−1} (b_n − 1/N) = ∑_{n=0}^{N−1} b_n − (1/N) · N > M + 1 − 1 = M.
Since M was arbitrary, this shows ∑_n a_{n,k} → +∞ = ∑_n b_n as k → ∞, concluding
the proof.
10.10.15 Theorem (Geometric Series). The geometric series ∑_{n=0}^{∞} x^n converges
absolutely if |x| < 1, and diverges if |x| ≥ 1. Its sum is
(10.21) ∑_{n=0}^{∞} x^n = 1/(1 − x) for |x| < 1.
Proof. For |x| ≥ 1 we have xn ̸→ 0 as n → ∞ so the series diverges, by the nth -term
test, Corollary 10.10.4.
For |x| < 1, we have
∑
k
1 − xk+1
sk (x) = xn = .
n=0
1−x
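The closed form for the partial sums can be checked numerically. The following is only an illustrative sketch: the function names `geometric_partial_sum` and `geometric_closed_form` are hypothetical, and floating-point arithmetic stands in for exact real arithmetic.

```python
def geometric_partial_sum(x: float, k: int) -> float:
    """Return s_k(x) = 1 + x + ... + x**k by direct summation."""
    total = 0.0
    power = 1.0  # holds x**n at the start of each iteration
    for _ in range(k + 1):
        total += power
        power *= x
    return total


def geometric_closed_form(x: float, k: int) -> float:
    """Return (1 - x**(k+1)) / (1 - x); only meaningful for x != 1."""
    return (1.0 - x ** (k + 1)) / (1.0 - x)


if __name__ == "__main__":
    x = 0.5
    for k in (0, 1, 5, 20, 50):
        # The two columns agree, and both approach 1 / (1 - x) as k grows.
        print(k, geometric_partial_sum(x, k), geometric_closed_form(x, k))
    print("limit:", 1.0 / (1.0 - x))
```

For $|x| < 1$ the term $x^{k+1}$ tends to 0, which is why both columns approach $1/(1-x)$, in agreement with (10.21).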
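• If
\[
  \limsup_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| < 1
\]
then $\{a_n\}$ is absolutely summable (in particular, $\sum_n a_n$ converges).
• If
\[
  \liminf_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| > 1
\]
then $|a_n| \to \infty$ as $n \to \infty$ (in particular, $\sum_n a_n$ diverges).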
Proof. • Let $\rho = \limsup_{n\to\infty} |a_{n+1}/a_n| < 1$. Fix $\sigma \in (\rho, 1)$. Since $\sigma > \rho$, the
definition of superior limit implies
\[
  u_n = \sup_{k \ge n} |a_{k+1}/a_k| < \sigma
\]
for all sufficiently large $n$, say for $n > N$. Therefore $|a_{k+1}| \le u_N |a_k| < \sigma |a_k|$
for all $k > N$, and it follows easily by induction that $|a_{k+N}| \le \sigma^k |a_N|$ for all
$k \in \mathbb{N}$. Thus, for any $K$,
\begin{align*}
  \sum_{k=0}^{K} |a_k| &\le \sum_{k=0}^{N} |a_k| + \sum_{k=1}^{\infty} \sigma^k |a_N| \\
                       &= \sum_{k=0}^{N} |a_k| + \frac{\sigma |a_N|}{1 - \sigma}
\end{align*}
by Theorem 10.10.15 (since $\sigma < 1$). We have shown that the partial sums of
$\sum |a_k|$ are bounded above. Using Theorem 10.10.6, we conclude that $\sum a_n$
is absolutely convergent.
• Let $\rho = \liminf_{n\to\infty} |a_{n+1}/a_n| > 1$. Fix $\sigma \in (1, \rho)$. Since $\sigma < \rho$, the definition
of inferior limit implies
\[
  u_n = \inf_{k \ge n} |a_{k+1}/a_k| > \sigma
\]
for all sufficiently large $n$, say for $n > N$ (in particular $a_n \ne 0$ for all $n > N$).
Therefore $|a_{k+1}| \ge u_N |a_k| > \sigma |a_k|$ for all $k > N$, and it follows easily by
induction that $|a_{k+N}| \ge \sigma^k |a_N|$ for all $k \in \mathbb{N}$. Since $|a_N| > 0$ and $\sigma^k \to \infty$
as $k \to \infty$ (because $\sigma > 1$), we have $|a_n| \to \infty$ as $n \to \infty$.
10.10.17 Theorem (Limit Comparison Test). Let $\{a_n\}$, $\{b_n\}$ be sequences
such that $b_n > 0$ for all sufficiently large $n$. Then:
• if $\sum_n b_n$ converges and $\limsup_{n\to\infty} \frac{|a_n|}{b_n} < +\infty$ then $\sum_n a_n$ converges
absolutely;
• if $\sum_n b_n$ diverges and $\liminf_{n\to\infty} \frac{|a_n|}{b_n} > 0$ then $\sum_n |a_n|$ diverges (that is,
$\sum_n a_n$ does not converge absolutely).
10.26 Exercise. Prove Theorem 10.10.17.
10.10.18 Theorem (Power Test). Let $\{a_n\}$ be a sequence of nonzero real numbers.
• If
\[
  \limsup_{n\to\infty} \sqrt[n]{|a_n|} < 1
\]
then $\{a_n\}$ is absolutely summable (in particular, $\sum_n a_n$ converges).
• If
\[
  \liminf_{n\to\infty} \sqrt[n]{|a_n|} > 1
\]
then $|a_n| \to \infty$ as $n \to \infty$ (in particular, $\sum_n a_n$ diverges).
Proof. • Let $\rho = \limsup_{n\to\infty} \sqrt[n]{|a_n|} < 1$. Fix $\sigma \in (\rho, 1)$. Since $\sigma > \rho$, the
definition of superior limit implies
\[
  u_n = \sup_{k \ge n} \sqrt[k]{|a_k|} < \sigma
\]
for all sufficiently large $n$, say for $n > N$. Therefore $|a_n| < \sigma^n$ for $n > N$.
For any $K$,
\begin{align*}
  \sum_{n=0}^{K} |a_n| &\le \sum_{n=0}^{N} |a_n| + \sum_{n=N+1}^{\infty} \sigma^n \\
                       &= \sum_{n=0}^{N} |a_n| + \frac{\sigma^{N+1}}{1 - \sigma}
\end{align*}
by Theorem 10.10.15 (since $\sigma < 1$). We have shown that the partial sums of
$\sum_n |a_n|$ are bounded above. Using Theorem 10.10.6, we conclude that $\sum_n a_n$
is absolutely convergent.
• Let $\rho = \liminf_{n\to\infty} \sqrt[n]{|a_n|} > 1$. Fix $\sigma \in (1, \rho)$. Since $\sigma < \rho$, the definition of
inferior limit implies
\[
  u_n = \inf_{k \ge n} \sqrt[k]{|a_k|} > \sigma
\]
for all sufficiently large $n$, say for $n > N$. Therefore $|a_n| \ge \sigma^n$ for all $n > N$.
Since $\sigma > 1$, it follows from Theorem 10.9.8 that $|a_n| \to \infty$ as $n \to \infty$.
10.10.19 Theorem. Let $a_1 \ge a_2 \ge \cdots \ge 0$. Then, the series $\sum_{n=1}^{\infty} a_n$ converges
if and only if the series $\sum_{n=1}^{\infty} 2^n a_{2^n}$ converges.
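Proof. Given $n$ let $k$ be any integer such that $n \le 2^k$. Then
\begin{align*}
  s_n &= a_1 + a_2 + a_3 + \cdots + a_n \\
      &\le a_1 + a_2 + \cdots + a_{2^k} + a_{2^k+1} + \cdots + a_{2^{k+1}-1} \\
      &= a_{2^0}
         + \underbrace{(a_{2^1} + a_{2^2-1})}_{2 \text{ terms}}
         + \underbrace{(a_{2^2} + a_{2^2+1} + a_{2^2+2} + a_{2^3-1})}_{4 \text{ terms}}
         + \underbrace{(a_{2^3} + a_{2^3+1} + a_{2^3+2} + \cdots + a_{2^4-1})}_{8 \text{ terms}} \\
      &\qquad + \cdots
         + \underbrace{(a_{2^k} + a_{2^k+1} + a_{2^k+2} + \cdots + a_{2^{k+1}-1})}_{2^k \text{ terms}} \\
      &\le a_1 + 2a_2 + 4a_4 + 8a_8 + \cdots + 2^k a_{2^k} \\
      &= t_k.
\end{align*}
Similarly, if $k$ is any integer such that $n \ge 2^k$, then
\begin{align*}
  s_n &= a_1 + a_2 + a_3 + \cdots + a_n \\
      &\ge a_1 + a_2 + \cdots + a_{2^{k-1}} + a_{2^{k-1}+1} + \cdots + a_{2^k} \\
      &= a_{2^0} + a_{2^1}
         + \underbrace{(a_{2^1+1} + a_{2^2})}_{2 \text{ terms}}
         + \underbrace{(a_{2^2+1} + a_{2^2+2} + a_{2^2+3} + a_{2^3})}_{4 \text{ terms}}
         + \underbrace{(a_{2^3+1} + a_{2^3+2} + a_{2^3+3} + \cdots + a_{2^4})}_{8 \text{ terms}} \\
      &\qquad + \cdots
         + \underbrace{(a_{2^{k-1}+1} + a_{2^{k-1}+2} + a_{2^{k-1}+3} + \cdots + a_{2^k})}_{2^{k-1} \text{ terms}} \\
      &\ge \tfrac{1}{2} a_1 + a_2 + 2a_4 + 4a_8 + \cdots + 2^{k-1} a_{2^k} \\
      &= \tfrac{1}{2} \bigl( a_1 + 2a_2 + 4a_4 + 8a_8 + \cdots + 2^k a_{2^k} \bigr) \\
      &= \tfrac{1}{2} t_k,
\end{align*}
giving the bound $2 s_n \ge t_k$. Therefore,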
We have
\[
  \sum_{n=0}^{\infty} 2^n \frac{1}{2^{np}} = \sum_{n=0}^{\infty} 2^{(1-p)n} = \sum_{n=0}^{\infty} x^n,
\]
which is the geometric series with $x = 2^{1-p}$. By Theorem 10.10.15, this series
converges if and only if $|x| = 2^{1-p} < 1$. Since $|x| = 2^{1-p} < 1$ if and only if
$1 - p < 0$, that is, if and only if $p > 1$, the result follows.
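The two bounds $s_n \le t_k$ and $2 s_n \ge t_k$ from the proof of Theorem 10.10.19 can be observed numerically for $a_n = 1/n^p$ with $n = 2^k$. This is a rough sketch under stated assumptions: the helper names `partial_sum` and `condensed_partial_sum` are hypothetical, and a finite computation only illustrates the inequalities, it proves nothing.

```python
def partial_sum(p: float, n: int) -> float:
    """s_n = sum_{j=1}^{n} 1 / j**p."""
    return sum(1.0 / j ** p for j in range(1, n + 1))


def condensed_partial_sum(p: float, k: int) -> float:
    """t_k = sum_{i=0}^{k} 2**i / (2**i)**p, a geometric sum with x = 2**(1 - p)."""
    x = 2.0 ** (1.0 - p)
    return sum(x ** i for i in range(k + 1))


if __name__ == "__main__":
    for p in (1.0, 2.0):
        for k in (4, 8, 12):
            n = 2 ** k
            s, t = partial_sum(p, n), condensed_partial_sum(p, k)
            # With n = 2**k both bounds from the proof apply: t/2 <= s <= t.
            print(f"p={p} k={k} t_k/2={t / 2:.4f} s_n={s:.4f} t_k={t:.4f}")
```

For $p = 1$ the condensed sums $t_k = k + 1$ grow without bound and drag $s_n$ along with them; for $p = 2$ they stay below 2.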
10.10.21 Corollary (Harmonic Series). The harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges.
\[
  c_n = a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0 = \sum_{k=0}^{n} a_k b_{n-k}.
\]
The Cauchy product of two series $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ is the series $\sum_{n=0}^{\infty} c_n$ with
$c_n$ as above.
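The formula for $c_n$ translates directly into a computation on finite lists of terms. The sketch below is illustrative only: `cauchy_product` is a hypothetical helper, and it works with truncated sequences, so it yields only the first few terms of the Cauchy product.

```python
def cauchy_product(a, b):
    """Return [c_0, ..., c_{N-1}] with c_n = sum_{k=0}^{n} a[k] * b[n - k].

    Only as many terms are returned as both input lists can support.
    """
    n_terms = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]


if __name__ == "__main__":
    x = 0.5
    geometric = [x ** n for n in range(60)]  # terms of the geometric series
    c = cauchy_product(geometric, geometric)
    # Each term is c_n = (n + 1) * x**n; the partial sum is close to
    # (1 / (1 - x))**2, the square of the sum of the geometric series.
    print(c[:4], sum(c), (1.0 / (1.0 - x)) ** 2)
```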
10.10.24 Theorem. The Cauchy product of a series absolutely convergent to $A$
and a series convergent to $B$ converges to $AB$.
Proof. Let $\{a_n\}$ be summable to $A$, $\{|a_n|\}$ be summable to $\alpha$ and $\{b_n\}$ be summable
to $B$. Let $\{s_n\}$, $\{\sigma_n\}$, $\{t_n\}$ be the sequences of partial sums of $\{a_n\}$, $\{|a_n|\}$ and
$\{b_n\}$. Then $s_n \to A$ and $t_n \to B$ as $n \to \infty$. The convergent sequence $\{t_n\}$ is
bounded, by Proposition 10.3.19. Say $|t_n| \le T$ for all $n \in \mathbb{N}$.
Let $c_n = \sum_{k=0}^{n} a_k b_{n-k}$ be the $n$th term of the Cauchy product of $\{a_n\}$ and $\{b_n\}$
and let $\gamma_n = \sum_{k=0}^{n} c_k$ be the $n$th partial sum of $\{c_n\}$.
For any n ∈ N we have
\[
  \gamma_n = \sum_{k=0}^{n} c_k = \sum_{k=0}^{n} \sum_{l=0}^{k} a_l b_{k-l}
           = \sum_{\substack{0 \le k,\, l \\ k + l \le n}} a_k b_l,
\]
that is, γn is the sum of all products ak bl with k + l ≤ n. (Unofficially, such pairs
(k, l) can be visualized as the points with integral coordinates in the triangle △
bounded by the lines x = 0, y = 0 and x + y = n on the plane.)
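Let $m = \lfloor n/2 \rfloor$. We write $\gamma_n = \gamma_n^{(1)} + \gamma_n^{(2)} + \gamma_n^{(3)}$, where
\begin{align*}
  \gamma_n^{(1)} &= \sum_{0 \le k,\, l \le m} a_k b_l
                  = \sum_{0 \le k \le m} a_k \sum_{0 \le l \le m} b_l = s_m t_m, \\
  \gamma_n^{(2)} &= \sum_{\substack{0 \le k \le m \\ m < l \le n-k}} a_k b_l
                  = \sum_{0 \le k \le m} a_k (t_{n-k} - t_m)
                  \quad (\text{here } n - k \ge n - m \ge m), \\
  \gamma_n^{(3)} &= \sum_{\substack{m < k \le n \\ 0 \le l \le n-k}} a_k b_l
                  = \sum_{m < k \le n} a_k t_{n-k}.
\end{align*}
($\gamma_n^{(1)}$ is the sum of the terms $a_k b_l$ corresponding to points $(k, l)$ in the largest square
□ inscribed in △, while $\gamma_n^{(2)}$, resp. $\gamma_n^{(3)}$, is the sum of the terms corresponding to
points in the upper half, resp. in the right half, of △.)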
Let $\epsilon > 0$ be arbitrary. By Theorem 10.6.1, we have $s_n t_n \to AB$ as $n \to \infty$, so
there exists $N_1$ such that $n > N_1 \Rightarrow |s_n t_n - AB| < \epsilon/3$. Since $\{t_n\}$ converges it is
Cauchy, so there exists $N_2$ such that $m, n > N_2 \Rightarrow |t_n - t_m| < \epsilon/(3\alpha + 1)$. Finally,
since $\{\sigma_n\}$ converges there exists $N_3$ such that $m, n > N_3 \Rightarrow |\sigma_n - \sigma_m| < \epsilon/(3T + 1)$.
Let $N = 2 \max\{N_1, N_2, N_3\} + 1$. We will prove that $n > N \Rightarrow |\gamma_n - AB| < \epsilon$,
concluding the proof that $\{c_n\}$ is summable to $AB$.
Let $n > N$. Since $n > N \ge 2N_i + 1$ we have
\[
  m = \Bigl\lfloor \frac{n}{2} \Bigr\rfloor \ge \Bigl\lfloor \frac{(2N_i + 1) + 1}{2} \Bigr\rfloor = N_i + 1 > N_i
  \quad \text{for } i = 1, 2, 3.
\]
From $m > N_1$ we immediately obtain $|\gamma_n^{(1)} - AB| = |s_m t_m - AB| < \epsilon/3$.
Since $n - k \ge m > N_2$ for $0 \le k \le m$, we have
\[
  |\gamma_n^{(2)}| \le \sum_{0 \le k \le m} |a_k| \, \frac{\epsilon}{3\alpha + 1}
                  \le \epsilon \, \frac{\alpha}{3\alpha + 1} < \frac{\epsilon}{3}.
\]
is convergent, by the Alternating Series Test 10.10.27. The second series is obtained
from the first by reversing the sign of every second term of the series. What happens
if instead of reversing the sign of every second term we reverse the sign of every
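$k$-th term?
For example, the series corresponding to $k = 1, \dots, 5$ are shown below.
\begin{align*}
  k=1:\quad S_1 &= -1 - \tfrac12 - \tfrac13 - \tfrac14 - \tfrac15 - \tfrac16 - \tfrac17 - \tfrac18 - \tfrac19 - \tfrac1{10} - \cdots, \\
  k=2:\quad S_2 &= 1 - \tfrac12 + \tfrac13 - \tfrac14 + \tfrac15 - \tfrac16 + \tfrac17 - \tfrac18 + \tfrac19 - \tfrac1{10} + \cdots, \\
  k=3:\quad S_3 &= 1 + \tfrac12 - \tfrac13 + \tfrac14 + \tfrac15 - \tfrac16 + \tfrac17 + \tfrac18 - \tfrac19 + \tfrac1{10} + \cdots, \\
  k=4:\quad S_4 &= 1 + \tfrac12 + \tfrac13 - \tfrac14 + \tfrac15 + \tfrac16 + \tfrac17 - \tfrac18 + \tfrac19 + \tfrac1{10} + \cdots, \\
  k=5:\quad S_5 &= 1 + \tfrac12 + \tfrac13 + \tfrac14 - \tfrac15 + \tfrac16 + \tfrac17 + \tfrac18 + \tfrac19 - \tfrac1{10} + \cdots.
\end{align*}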
Since the negative of the harmonic series is divergent, S1 is divergent. Since the
alternating harmonic series is convergent, S2 is convergent. We have divergence for
k = 1 and convergence for k = 2. What happens for k > 2?
10.10.29 Theorem. Let Sk denote the variant of the harmonic series defined
above. Then, Sk is convergent if and only if k = 2.
Proof. The cases $k = 1$ and $k = 2$ are already known. For $k > 2$, consider the
sequence $\{S_{k,m}\}_{m=0}^{\infty}$ of partial sums of $S_k$. We shall examine the subsequence of
the form $\{S_{k,km}\}_{m=0}^{\infty}$.
We write
\[
  S_{k,km} = \hat{S}_{1,m} + \cdots + \hat{S}_{k-2,m} + \hat{S}_{k-1,m} - \hat{S}_{k,m},
\]
where, for $j = 1, \dots, k$,
\[
  \hat{S}_{j,m} = \sum_{n=0}^{m} \frac{1}{nk + j}.
\]
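For example, for $S_{5,5m}$ we have
\[
  S_{5,5m} = \hat{S}_{1,m} + \hat{S}_{2,m} + \hat{S}_{3,m} + \hat{S}_{4,m} - \hat{S}_{5,m},
\]
where
\begin{align*}
  \hat{S}_{1,m} &= \frac11 + \frac16 + \frac1{11} + \cdots + \frac{1}{mk+1} = \sum_{n=0}^{m} \frac{1}{nk+1}, \\
  \hat{S}_{2,m} &= \frac12 + \frac17 + \frac1{12} + \cdots + \frac{1}{mk+2} = \sum_{n=0}^{m} \frac{1}{nk+2}, \\
  \hat{S}_{3,m} &= \frac13 + \frac18 + \frac1{13} + \cdots + \frac{1}{mk+3} = \sum_{n=0}^{m} \frac{1}{nk+3}, \\
  \hat{S}_{4,m} &= \frac14 + \frac19 + \frac1{14} + \cdots + \frac{1}{mk+4} = \sum_{n=0}^{m} \frac{1}{nk+4}, \\
  \hat{S}_{5,m} &= \frac15 + \frac1{10} + \frac1{15} + \cdots + \frac{1}{mk+5} = \sum_{n=0}^{m} \frac{1}{nk+5}.
\end{align*}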
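For $j = 1, \dots, k - 2$, we have
\[
  \hat{S}_{j,m} = \sum_{n=0}^{m} \frac{1}{nk + j} \ge \sum_{n=0}^{m} \frac{1}{nk + k}
               = \sum_{n=1}^{m+1} \frac{1}{nk} = \frac{1}{k} \sum_{n=1}^{m+1} \frac{1}{n} \to +\infty
\]
as $m \to \infty$.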
We also have
\[
  \hat{S}_{k-1,m} - \hat{S}_{k,m} = \frac{1}{k-1} - \frac{1}{k} + \cdots + \frac{1}{mk + k - 1} - \frac{1}{mk + k}.
\]
This is the partial sum of a decreasing alternating series. Since a decreasing
alternating series is convergent, we have
\[
  \hat{S}_{k-1,m} - \hat{S}_{k,m} \to \alpha
\]
as $m \to \infty$, for some real $\alpha$.
We have
\[
  S_{k,km} = \hat{S}_{1,m} + \cdots + \hat{S}_{k-2,m} + \bigl( \hat{S}_{k-1,m} - \hat{S}_{k,m} \bigr) \to +\infty
\]
as $m \to \infty$, since the first $k - 2$ terms are positive and unbounded, while the
difference in parentheses is bounded.
The sequence $\{S_{k,m}\}_{m=0}^{\infty}$ of partial sums of $S_k$ contains the divergent
subsequence $\{S_{k,km}\}_{m=0}^{\infty}$. Therefore, the whole sequence $\{S_{k,m}\}_{m=0}^{\infty}$ is divergent. We
have established that the series $S_k$ is divergent for $k > 2$.
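A quick numerical experiment makes the contrast between $k = 2$ and $k > 2$ visible. This is merely a sketch: `partial_sum_Sk` is a hypothetical helper, and finitely many terms can only suggest, not establish, convergence or divergence.

```python
import math


def partial_sum_Sk(k: int, terms: int) -> float:
    """Sum of the first `terms` terms of S_k: +1/n, except -1/n when k divides n."""
    total = 0.0
    for n in range(1, terms + 1):
        sign = -1.0 if n % k == 0 else 1.0
        total += sign / n
    return total


if __name__ == "__main__":
    for terms in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
        # k = 2 settles near log 2, while k = 3 keeps growing slowly.
        print(terms, partial_sum_Sk(2, terms), partial_sum_Sk(3, terms))
    print("log 2 =", math.log(2.0))
```

The growth for $k = 3$ is only logarithmic, so many terms are needed before the divergence becomes apparent.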
10.10.30 Remark. For the eyes of the inquisitive reader we put forth the following
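series¶
\begin{align*}
  \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} &= \frac11 - \frac12 + \frac13 - \frac14 + \cdots = \log 2; \\
  \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{2n-1} &= \frac11 - \frac13 + \frac15 - \frac17 + \cdots = \frac{\pi}{4}; \\
  \sum_{n=0}^{\infty} \Bigl( \frac{1}{8n+1} - \frac{1}{8n+3} - \frac{1}{8n+5} + \frac{1}{8n+7} \Bigr)
    &= \frac11 - \frac13 - \frac15 + \frac17 + \cdots = \frac{\log(1 + \sqrt{2})}{\sqrt{2}}.
\end{align*}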
In each of them there is, on average, an equal balance of positive and negative
terms. The evaluation of these and an infinity of similar formulas can be done
using the Dirichlet Class Number Formula, a most remarkable result in analytic
number theory (the branch of mathematics that studies arithmetic properties of
numbers primarily utilizing tools from analysis).
(For a somewhat different and simpler, if less direct proof, use the ratio test:
$\lim_{n\to\infty} \frac{1/(n+1)!}{1/n!} = \lim_{n\to\infty} \frac{1}{n+1} = 0 < 1 \Rightarrow \sum_n 1/n!$ converges.)
10.11.2 Definition (The Number e). Define
\[
  e = \sum_{n=0}^{\infty} \frac{1}{n!}.
\]
Clearly, $\{a_{n,k}\}_n$ monotonically increases to $1/k!$. Since $\sum_k 1/k!$ converges, it follows
from the Monotone Convergence Theorem 10.10.14 that
$\bigl(1 + \frac{1}{n}\bigr)^n = \sum_{k=0}^{\infty} a_{n,k} \to \sum_{k=0}^{\infty} \frac{1}{k!} = e$ as $n \to \infty$.
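Both descriptions of $e$, the series $\sum 1/k!$ and the limit of $(1 + 1/n)^n$, can be compared numerically. The sketch below uses hypothetical helper names and ordinary floating-point arithmetic, so the last digits are subject to rounding.

```python
import math


def e_partial_sum(K: int) -> float:
    """Return sum_{k=0}^{K} 1/k!, updating the term 1/k! incrementally."""
    total, term = 0.0, 1.0  # term == 1/k! at the top of each iteration
    for k in range(K + 1):
        total += term
        term /= k + 1
    return total


def compound_sequence(n: int) -> float:
    """Return (1 + 1/n)**n."""
    return (1.0 + 1.0 / n) ** n


if __name__ == "__main__":
    print("series, 20 terms:", e_partial_sum(20), "math.e:", math.e)
    for n in (1, 10, 100, 10 ** 4, 10 ** 6):
        # The sequence increases towards e, far more slowly than the series.
        print(n, compound_sequence(n))
```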
10.27 Exercise. Show that the sequences $\{a_n\}$ and $\{b_n\}$ defined by
\[
  a_n = \Bigl( 1 + \frac{1}{2^n} \Bigr)^{2^n}, \qquad
  b_n = \Bigl( 1 - \frac{1}{2^n} \Bigr)^{-2^n}
\]
are bounded and monotonic ($\{a_n\}$ is increasing and $\{b_n\}$ is decreasing). Also show
that they have the same limit.
(In view of Proposition 10.11.3, their common limit is $e$ because $\{a_n\}$ is a
subsequence of $\{(1 + \frac{1}{n})^n\}$.)
10.11.4 Theorem. Euler’s number e is irrational.
Proof. The proof is by contradiction. Assume that e is rational. Then e = m/n
for some integers m, n with n ̸= 0. Since e > 0, we may assume m, n > 0 without
loss of generality. We also have shown that 2 < e < 3, therefore e is not an integer
and we must have n ≥ 2. Let
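\[
  a = \sum_{k=0}^{n} \frac{n!}{k!} = \sum_{k=0}^{n} n(n-1) \cdots (n-k+1)
\]
and
\[
  \epsilon = \sum_{k=n+1}^{\infty} \frac{n!}{k!} = \sum_{k=1}^{\infty} \frac{n!}{(n+k)!}
           = \sum_{k=1}^{\infty} \frac{1}{(n+1)(n+2) \cdots (n+k)}.
\]
Then $a \in \mathbb{N}_+$, and
\[
  0 < \epsilon < \sum_{k=1}^{\infty} \frac{1}{\underbrace{n \cdot n \cdots n}_{k \text{ factors}}}
    = \sum_{k=1}^{\infty} \frac{1}{n^k} = \frac{1}{n} \cdot \frac{1}{1 - \frac{1}{n}}
    = \frac{1}{n-1} \le \frac{1}{2-1} = 1
\]
by the geometric formula (10.21). Moreover, since $e = m/n$,
\[
  m(n-1)! = n! \sum_{k=0}^{\infty} \frac{1}{k!}
          = \sum_{k=0}^{n} \frac{n!}{k!} + \sum_{k=n+1}^{\infty} \frac{n!}{k!} = a + \epsilon.
\]
Hence $\epsilon = m(n-1)! - a$ is an integer in $(0, 1)$,
a contradiction. Therefore, $e$ is irrational.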
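The two quantities in the proof can be computed exactly for a concrete $n$. The sketch below is illustrative only: `integer_part` and `truncated_tail` are hypothetical names, and the tail $\epsilon$ is truncated after finitely many terms, so it is a lower approximation of $\epsilon$ rather than $\epsilon$ itself.

```python
from fractions import Fraction
from math import factorial


def integer_part(n: int) -> int:
    """a = sum_{k=0}^{n} n!/k!, an integer because k! divides n! for k <= n."""
    return sum(factorial(n) // factorial(k) for k in range(n + 1))


def truncated_tail(n: int, extra_terms: int) -> Fraction:
    """First `extra_terms` terms of eps = sum_{k>=1} 1 / ((n+1)(n+2)...(n+k))."""
    total = Fraction(0)
    product = 1
    for k in range(1, extra_terms + 1):
        product *= n + k  # product == (n+1)(n+2)...(n+k)
        total += Fraction(1, product)
    return total


if __name__ == "__main__":
    n = 3
    a, eps = integer_part(n), truncated_tail(n, 30)
    # a + eps approximates n! * e, and eps stays below the bound 1 / (n - 1).
    print(a, float(eps), float(a + eps), Fraction(1, n - 1))
```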
10.11.5 Definition. The natural exponential function exp : E → E is the function
$\exp_e$, that is, the function $x \mapsto e^x$ (with $\exp(-\infty) = e^{-\infty} = 0$ and $\exp(+\infty) =
+\infty$). The natural logarithm function ln : [0, +∞] → E is simply the function $\log_e$
(logarithm to the base e), with ln(0) = −∞ and ln(+∞) = +∞.
Note: In pure mathematics it is very common to denote the natural logarithm
function by “log”. However, the notation “ln” is an ISO standard and precludes any
confusion with logarithms to other bases, particularly with “common” logarithms
(to the base 10).
Foundations Note 11
Continuity
Contents
11.1 Limits of Functions
11.2 Continuous Functions
11.3 Density and compactness in spaces of functions
Note 11: Continuity – July 28, 2020 11-2
(11.1) (∀x ∈ S)(∀ϵ > 0)(∃N ∈ N)(n > N ⇒ |fn (x) − g(x)| < ϵ).
If {fn } is a sequence of functions S → R (note that here the functions take values
in R, though perhaps S ⊂ E), we say that a function g : S → R is the uniform
limit of {fn } on S as n → ∞ if
(11.2) (∀ϵ > 0)(∃N ∈ N)(∀x ∈ S)(n > N ⇒ |fn (x) − g(x)| < ϵ).
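The difference between (11.1) and (11.2) shows up already for $f_n(x) = x^n$ on $S = [0, 1)$, whose pointwise limit is $g = 0$. The following sketch is only an illustration with hypothetical helper names: at each fixed $x$ the values tend to 0, yet at the moving point $x_n = 1 - 1/n$ the value $f_n(x_n)$ never drops below $1/4$, so no single $N$ works for all $x$ when $\epsilon \le 1/4$.

```python
def power_function(n: int, x: float) -> float:
    """f_n(x) = x**n, considered on S = [0, 1)."""
    return x ** n


def witness_point(n: int) -> float:
    """A point of S at which f_n is still far from the pointwise limit 0."""
    return 1.0 - 1.0 / n


if __name__ == "__main__":
    # Pointwise convergence: at the fixed point x = 0.9 the values tend to 0.
    print([power_function(n, 0.9) for n in (1, 10, 100, 200)])
    # No uniform convergence: the value at x_n = 1 - 1/n stays >= 1/4.
    print([power_function(n, witness_point(n)) for n in (2, 10, 100, 1000)])
```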
(∥f ∥ is the least upper bound of the absolute values of the values of f on S.)
Note that ∥f ∥ ≤ B if and only if −B ≤ f (x) ≤ B for all x ∈ S. In particular,
∥f ∥ < +∞ if and only if f is bounded.
Define the uniform distance du (f, g) between two (not necessarily bounded)
functions f, g : S → R to be the nonnegative extended real number
du (f, g) = ∥g − f ∥ .
1. Positivity: du (f, g) > 0 unless f = g (i.e., unless f (x) = g(x) for all x ∈ S);
While we shall rarely use the notation du (f, g) in the future (writing directly
∥g − f ∥ instead), the reader should think of any sup norm of a difference as a
uniform distance.
For any real $\epsilon > 0$, the (uniform) $\epsilon$-neighborhood of a function $f \in \mathbb{R}^S$ is the
set
\[
  N^u_\epsilon(f) = N_\epsilon(f) = \{ g \in \mathbb{R}^S \mid \|g - f\| < \epsilon \}.
\]
(We use the superscript “u” for emphasis only, to remind the reader that the neighborhood
is uniform. Note that hitherto the notion of neighborhood of a function was
undefined.) Building on uniform $\epsilon$-neighborhoods, one defines a (uniform) neighborhood
of $f$ to be any set $N \subset \mathbb{R}^S$ such that $N^u_\epsilon(f) \subset N$ for some $\epsilon > 0$, and
an open set $U \subset \mathbb{R}^S$ to be one that is a (uniform) neighborhood of each of its
elements.
11.1.6 Remark. For any sequence $\{f_n\}$ in $\mathbb{R}^S$, we have $f_n \to f$ uniformly as
$n \to \infty$ if and only if $f \in \mathbb{R}^S$ and
1. ∥−f ∥ = ∥f ∥.
∥f + g∥ ≤ ∥f ∥ + ∥g∥ .
∥uf ∥ ≤ |u| · ∥f ∥
We say that the limit of f (x) exists as x approaches c if there exists L ∈ E such
that f (x) → L as x → c. If L ∈ R, we say that the limit exists properly, or f
converges to L as x → c.
Two immediate reformulations of the above definition are:
\[
  f_*\bigl( S \cap (N \setminus \{c\}) \bigr) \subset N',
\]
\[
  S \cap (N \setminus \{c\}) \subset f^*(N')
\]
f (x) → L as x → c.
Equivalently, f (x) → L as x → c if and only if for all ϵ > 0 there exists δ > 0
such that
(11.4) (∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(0 < |x − c| < δ ⇒ |f (x) − L| < ϵ).
(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(0 < |x − c| < δ ⇒ f (x) > 1/ϵ).
(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(0 < |x − c| < δ ⇒ f (x) < −1/ϵ).
(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(x > 1/δ ⇒ |f (x) − L| < ϵ).
(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(x < −1/δ ⇒ |f (x) − L| < ϵ).
(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(x > 1/δ ⇒ f (x) > 1/ϵ).
Proof. If f (x) → L1 and also f (x) → L2 as x → c, then for all neighborhoods N1′
of L1 and N2′ of L2 , there exist neighborhoods N1 , N2 of c such that f∗ (N1 \ {c}) ⊂ N1′
and f∗ (N2 \ {c}) ⊂ N2′ . Let N = N1 ∩ N2 . Since c is a limit point of the
domain of f , and N is a neighborhood of c, there exists x ∈ N \ {c}, hence
f (x) ∈ f∗ ((N1 ∩ N2 ) \ {c}) ⊂ f∗ (N1 \ {c}) ∩ f∗ (N2 \ {c}) ⊂ N1′ ∩ N2′ . Hence N1′ and
N2′ are not disjoint. We conclude that L1 and L2 have no disjoint neighborhoods,
so L1 = L2 , by the Hausdorff property of E, Proposition 10.2.5.
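11.1.10 Example. Consider $f : [0, +\infty) \to \mathbb{R} : x \mapsto \sqrt{x}$ and fix $c \ge 0$. Then
$\lim_{x\to c} f(x)$ exists and equals $\sqrt{c}$. To prove this statement we use a so-called “ϵ-δ
argument” (that is, we prove that (11.4) holds). It goes as follows.
Let $\epsilon > 0$ be given. First, suppose that $c \ne 0$. Choose $\delta = \epsilon \sqrt{c}$. For $x \ge 0$ we
have
\begin{align*}
  0 < |x - c| < \delta \;\Rightarrow\; \bigl| \sqrt{x} - \sqrt{c} \bigr|
    &= \Bigl| \bigl( \sqrt{x} - \sqrt{c} \bigr) \cdot \frac{\sqrt{x} + \sqrt{c}}{\sqrt{x} + \sqrt{c}} \Bigr| \\
    &= \frac{|x - c|}{\sqrt{x} + \sqrt{c}} \\
    &\le \frac{|x - c|}{\sqrt{c}} \\
    &< \frac{\delta}{\sqrt{c}} \\
    &= \epsilon.
\end{align*}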
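The choice $\delta = \epsilon \sqrt{c}$ can be spot-checked by random sampling. This is merely a sketch under stated assumptions: `delta_for` and `check` are hypothetical helpers, and sampling can expose a bad choice of $\delta$ but cannot replace the argument above.

```python
import math
import random


def delta_for(eps: float, c: float) -> float:
    """The choice delta = eps * sqrt(c) from the example (requires c > 0)."""
    return eps * math.sqrt(c)


def check(eps: float, c: float, trials: int = 10_000, seed: int = 0) -> bool:
    """Sample x >= 0 with 0 < |x - c| < delta and test |sqrt(x) - sqrt(c)| < eps."""
    rng = random.Random(seed)
    delta = delta_for(eps, c)
    lo, hi = max(0.0, c - delta), c + delta
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        if 0 < abs(x - c) < delta and not abs(math.sqrt(x) - math.sqrt(c)) < eps:
            return False
    return True


if __name__ == "__main__":
    print(check(0.1, 4.0), check(0.5, 0.01))
```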
11.1.12 Remark. How does one prove that a limit does not exist? To assert
that limx→c f (x) exists is to assert
(11.5) (∃L ∈ E)(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(x ∈ Nδ (c) \ {c} ⇒ f (x) ∈ Nϵ (L)).
Accordingly, the negation of (11.5), i.e.: “limx→c f (x) does not exist,” is
(11.6) (∀L ∈ E)(∃ϵ > 0)(∀δ > 0)(∃x ∈ S)(x ∈ Nδ (c) \ {c} ∧ f (x) ∉ Nϵ (L)).
When c ∈ R, asserting that limx→c f (x) is not equal to any real number L
(classically, one says that the limit diverges) is equivalent to
(11.7) (∀L ∈ R)(∃ϵ > 0)(∀δ > 0)(∃x ∈ S)(0 < |x − c| < δ ∧ |f (x) − L| ≥ ϵ).
(Note that limx→c f (x) could still exist and be ±∞.) We shall put this to use
below.
11.1.13 Example. We show that if $f : (0, +\infty) \to \mathbb{R} : x \mapsto 1/x$, then
$\lim_{x\to 0} f(x)$ is not equal to any real number $L$. We need to show that for arbitrary
$L$ there is some $\epsilon > 0$ such that given any $\delta > 0$ we can choose $x > 0$ such
that
\[
  0 < |x - 0| < \delta \;\wedge\; \Bigl| \frac{1}{x} - L \Bigr| \ge \epsilon.
\]
Surely, we can choose $x > 0$ so small that $1/x > L + \epsilon$ and such that $x < \delta$ (if
$L + \epsilon > 0$, just choose $x$ positive and smaller than $\min\{\delta, 1/(L + \epsilon)\}$; otherwise any
$x \in (0, \delta)$ will do). With this $x$ we have
\[
  \Bigl| \frac{1}{x} - L \Bigr| > \epsilon.
\]
In fact, for the function $f$ above we have $\lim_{x\to 0} f(x) = +\infty$. The corresponding
“ϵ-δ argument” is as follows. Given $\epsilon > 0$ let $\delta = \epsilon$. For all $x \in (0, +\infty)$ we have
\[
  0 < |x - 0| < \delta \;\Rightarrow\; f(x) = \frac{1}{x} > \frac{1}{\delta} = \frac{1}{\epsilon}.
\]
(Of course, having proved $f(x) \to +\infty$ as $x \to 0$ also shows that $f(x)$ diverges in $\mathbb{R}$,
since limits are unique in E, by Proposition 11.1.9.)
3. For f : (−∞, 0) ∪ (0, +∞) → R : x 7→ 1/x, limx→0 f (x) does not exist.
4. If f is defined at c, the limit may exist, but need not equal f (c). For example,
consider f : R → R given by
\[
  f(x) = \begin{cases} 1, & x \ne 0, \\ 0, & x = 0. \end{cases}
\]
5. The limit may fail to exist even if the function is bounded, c is a limit point
of S, and f (c) is defined. Consider f : R → R given by
\[
  f(x) = \begin{cases} 1, & x \text{ rational}, \\ 0, & x \text{ irrational}, \end{cases}
\]
then limx→c f (x) does not exist for any c ∈ E: For any L ∈ E the neighbor-
hood N ′ = N1/2 (L) of L includes at most one of the points 0, 1, while f takes
both values 0 and 1 in any punctured neighborhood N \ {c} of c, showing
f (x) ↛ L as x → c.
This example also shows that the limit of a function may not exist anywhere
(meaning: not at any limit point c of its domain).
11.5 Exercise. Consider f : R → R given by
\[
  f(x) = \begin{cases} x, & x \text{ rational}, \\ 0, & x \text{ irrational}. \end{cases}
\]
Prove that limx→c f (x) exists for c = 0 and does not exist for any other extended
real number c.
if and only if for every sequence {xn } in S \ {c} with limn→∞ xn = c, we have
Proof. (A) implies (B): Assume (A) holds. Let {xn } be a sequence in S \ {c}
with limn→∞ xn = c. In order to prove (B) we need to show that {f (xn )} is eventually
in any given neighborhood N ′ of L.
By the definition of the limit of a function there exists a neighborhood N of c
such that
(∀x ∈ S)(x ∈ N \ {c} ⇒ f (x) ∈ N ′ ).
Since {xn } is eventually in (S \ {c}) ∩ N , we have {f (xn )} is eventually in N ′ ,
concluding this part of the proof.
(B) implies (A): This part of the proof follows from the existence of a countable
basis of neighborhoods shrinking to any point in E. To be more precise, we
shall prove the contrapositive statement: the negation of (A) implies the negation
of (B). Let {Nn } be a shrinking sequence of neighborhoods of c. Assuming that
(A) does not hold, there must exist a neighborhood N ′ of L with the following
property: Every neighborhood N of c contains at least one point x ∈ S \ {c} such
that f (x) ∉ N ′ . Applying this to the neighborhood Nn we obtain, for each n ∈ N,
a point xn ∈ Nn ∩ S \ {c} with f (xn ) ∉ N ′ ; therefore, {f (xn )} is disjoint from the
neighborhood N ′ of L and we certainly have f (xn ) ↛ L as n → ∞. Since the
sequence {Nn } shrinks to c we have limn→∞ xn = c. Hence (B) does not hold. The
proof is complete.
Note that Proposition 11.1.3 proved earlier is an immediate corollary of
Theorem 11.1.15.
11.1.16 Definition (Uniform and pointwise limits of functions). Let S ⊂ E
and let S∗ be the set of limit points of S in E. Let a function f : S → E be given.
For every c ∈ S∗ , assume that limx→c f (x) exists. Then the function g : S∗ → E
defined by g(c) = limx→c f (x) is called the pointwise cluster limit of f . Thus, g is
uniquely characterized by the property f (x) → g(c) as x → c for every c ∈ S∗ .
The adjective “cluster” is meant to remind the reader that the limit is assumed
to exist at cluster (limit) points of S, while “pointwise” means that property is
a universal quantification over points (cluster points of the domain); see equa-
tion (11.8) below. We regard the adjective “cluster” as optional, used only for
emphasis. (I.e., to distinguish this from other types of limits, for example from
pointwise or uniform limits of sequences of functions—in a cluster limit the func-
tion f is fixed.) We caution the reader that the nomenclature “pointwise cluster
limit” is not standard.
In other words, g is the pointwise cluster limit of f if
(11.8) (∀c ∈ S∗ )(∀ϵ > 0)(∃δ > 0)(∀x ∈ S)(x ∈ Nδ (c) \ {c} ⇒ f (x) ∈ Nϵ (g(c))).
(11.9) (∀ϵ > 0)(∃δ > 0)(∀c ∈ S∗ )(∀x ∈ S)(x ∈ Nδ (c) \ {c} ⇒ f (x) ∈ Nϵ (g(c))).
Note that uniform limits are defined above exclusively in a real (rather than ex-
tended real) context. See the comments in definition 11.1.1 and the footnote
therein.
The difference between (11.8) and (11.9) is subtle, but very important. To wit,
δ = δ(c, ϵ) is allowed to depend on both c and ϵ for the pointwise limit to exist, but
the same δ must satisfy the condition simultaneously (“uniformly”) for all c ∈ S∗
for the limit to be uniform (though δ is still allowed to depend on ϵ in the latter
case). Clearly, if g is the uniform cluster limit of f then g is also the pointwise
limit, but not conversely. The example below shows that the pointwise limit may
exist even when a uniform one does not. In such case, one says that the (pointwise)
limit exists but it is not attained uniformly.
11.6 Exercise. Prove that $\lim_{x\to c} \frac{1}{x^2}$ exists for all c ∈ E, with
Show that the limit is attained uniformly on R \ (−a, a) for any a > 0, but not on
R \ {0}.
11.7 Exercise. Show that $\lim_{x\to c} \sqrt{x} = \sqrt{c}$ for all $c \in [0, +\infty]$ (recall that $\sqrt{+\infty} =
(+\infty)^{1/2} = +\infty$ by definition). In fact, show that the limit is reached uniformly
on [0, +∞).
whenever the right-hand sides are defined in the arithmetic of E (i.e., x ∈ S such
that the right-hand side is of the form ∞ − ∞, 0 · ∞ or ∞/∞ must be excluded).
11.1.18 Theorem. Let S be a subset of E and let c be a limit point of S. Let
f, g : S → E. Suppose
(5a) limx→c (f /g)(x) = +∞ if L1 > L2 = 0 and g(x) > 0 for all x ̸= c, or else
L1 < L2 = 0 and g(x) < 0 for all x ̸= c;
(5b) limx→c (f /g)(x) = −∞ if L1 > L2 = 0 and g(x) < 0 for all x ̸= c, or else
L1 < L2 = 0 and g(x) > 0 for all x ̸= c;
(7) limx→c (−f )(x) = −L1 (if L1 = ±∞, interpret −L1 as (−1)L1 = ∓∞)
11.8 Exercise. For a subset S ⊂ R, if {fn } and {gn } are sequences of functions
S → R, and if f, g are the uniform limits of {fn }, {gn } on S as n → ∞, then f + g
is also the uniform limit of {fn + gn } on S.
11.1.19 Definition. If F is a family of functions into R, we say that F is uniformly
bounded if there exists a finite N ≥ 0 such that every f ∈ F is a function taking
values in [−N, N ].
If F is a uniformly bounded family of functions, then every function f ∈ F
is bounded. However, the converse is not true, as witnessed by the family {fn }
with fn the function x ↦ nx on [0, 1]. Then |fn (x)| ≤ n for all x ∈ [0, 1], so each fn is
bounded, yet no uniform bound N exists since fn (1) = n > N for all n > N .
11.9 Exercise. If fn → f uniformly on S, and each fn is bounded, then {fn } is
uniformly bounded.
11.10 Exercise. 1. For a subset S ⊂ R, if {fn } and {gn } are uniformly bounded
sequences of functions S → R having uniform limits f, g : S → R, show that
f g is the uniform limit of fn gn on S.
Note that c may be either a limit point or an isolated point of S. Equivalently, for
every neighborhood N ′ of f (c) there exists a neighborhood N of c such that
f∗ (S ∩ N ) ⊂ N ′ .
If c and f (c) are both real, we have the equivalent formulation: Given ϵ > 0 there
exists δ > 0 such that
We urge the reader to spell out a formulation, using inequalities, of all the improper
cases of the definition (when either c, f (c), or both, are infinite).
If f is not continuous at c, we say that f is discontinuous at c.
If f is continuous at every point c ∈ S, we say that f is continuous on S (or
just that it is continuous).
If S ⊂ R and f : S → R, we say that f is uniformly continuous on S if, given
any ϵ > 0, there is a δ > 0 such that
Equivalently,
Note that δ may depend on c and on ϵ. To say that limx→c f (x) exists uniformly
on S means
Note that δ may depend on ϵ, but the same δ must work (uniformly) for all c ∈ S.
Clearly, uniform continuity implies continuity.
†
There can be oscillations, provided they dampen down to naught as c is approached.
In this sense, continuity is a property that can be stated informally as “the limit
of the f ’s is the f of the limit.”
11.2.6 Example. Let us agree to write every rational number x as a reduced
fraction x = m/n with n > 0 (i.e., the greatest common divisor of m and n is 1).
Define the function f : R → R by
\[
  f(x) = \begin{cases} 1/n, & \text{if } x = m/n, \\ 0, & \text{if } x \text{ is irrational}. \end{cases}
\]
Suppose c is rational. Then, f (c) > 0. Let {xk } denote a sequence of irra-
tionals that converges to c (this sequence can be easily constructed by picking
xk ∈ N1/(k+1) (c) \ Q, where the latter set is nonempty by the density of the irra-
tionals in R). We have f (xk ) = 0 for all k, giving
\[
  f : [0, +\infty) \to \mathbb{R} : x \mapsto x^{\alpha}
\]
(in particular, $x \mapsto \sqrt{x}$) is uniformly continuous for α ∈ (0, 1].
1. Prove that f is not α-Hölder continuous for any $\alpha > \alpha_0$. (Note that $f(x) =
x^{\alpha_0}$ for $x = 1, 3^{-1}, 3^{-2}, \dots$.)
2. For z ∈ [0, 1], let B(z) = sup f ∗ ({z}). Prove that B is a right inverse to the
Cantor function (that is, f (B(z)) = z for all z ∈ [0, 1]) satisfying B(f (x)) ≥ x
for all x ∈ [0, 1] and f (B(z) + B(w)) ≤ z + w whenever B(z) + B(w) ≤ 1.
(It may help to note that equality holds if and only if the strict binary
expansions of z, w are “disjoint” in the following sense: for all j ∈ N+ at
least one among the j-th digits δj of z and δj′ of w is zero.)
4. Find γ > 0 such that $f(x) \le \gamma x^{\alpha_0}$ for all x ∈ [0, 1].
5. Combine the previous two parts to show that f is α-Hölder continuous for
any exponent $\alpha \in (0, \alpha_0]$.
|f (x) − f (y)| ≤ γ |x − y|
for all x and y in I. The constant γ is called the Lipschitz (or Hölder) constant
of f .
11.2.2 Contractions
We introduce a class of functions that plays an important role in many iterative
methods.
11.2.14 Definition (Contractions). Let I ⊂ R be an interval. The function
f : I → I is called a contraction if it is Lipschitz continuous with Lipschitz constant
γ < 1. In other words, f is a contraction if there exists a constant γ < 1 such that
showing that the sequence {xn } is contractive and, therefore, convergent
(Theorem 10.7.3). Since {xn } is a sequence in the closed set I, the limit belongs to I
also.
A contraction is continuous. Let x∗ denote the limit of {xn }. Then, the continuity
of f gives
\[
  \lim_{n\to\infty} x_n = x^* \;\Rightarrow\; \lim_{n\to\infty} f(x_n) = f(x^*).
\]
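The iteration $x_{n+1} = f(x_n)$ is easy to try out. The sketch below is only an illustration under stated assumptions: the example map is the cosine on $I = [0, 1]$, which maps $I$ into itself and, since $|\sin x| \le \sin 1 < 1$ on $I$, is Lipschitz there with constant $\gamma = \sin 1$ by the mean value theorem; the helper names are hypothetical.

```python
import math


def contraction(x: float) -> float:
    """Example map on I = [0, 1]: cos maps I into [cos 1, 1], a subset of I."""
    return math.cos(x)


def iterate(f, x0: float, steps: int):
    """Return [x_0, x_1, ..., x_steps] with x_{n+1} = f(x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs


if __name__ == "__main__":
    xs = iterate(contraction, 1.0, 200)
    limit = xs[-1]
    # At the limit x*, f(x*) agrees with x* up to rounding error.
    print(limit, contraction(limit))
    # Successive gaps shrink at least by the factor gamma = sin(1).
    print([abs(xs[n + 1] - xs[n]) for n in range(5)], math.sin(1.0))
```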
(Note that the theorem states absolutely nothing about those points c at which
the functions α + f , …, are undefined.)
Proof. Apply Theorem 11.1.18.
11.2.18 Theorem. Let S ⊂ E and f : S → T be continuous at c ∈ S. Let T ⊂
E and g : T → R be defined and continuous at f (c) ∈ T . Then the composition
g ◦ f : S → R is defined and continuous at c. In particular, the composition of
continuous functions is continuous (wherever it is defined).
Proof. Suppose that f and g are as stated in the hypotheses of the theorem.
Given a neighborhood N ′′ of g(f (c)) there exists a neighborhood N ′ of f (c) such
that g∗ (N ′ ) ⊂ N ′′ , by continuity of g at f (c). There exists a neighborhood N of c
such that f∗ (N ) ⊂ N ′ , by continuity of f at c. Then (g ◦ f )∗ (N ) = g∗ (f∗ (N )) ⊂
g∗ (N ′ ) ⊂ N ′′ , so g ◦ f is continuous at c.
Hence A = S ∩ U , B = S ∩ V .
We show that U, V are disjoint. By choice of ϵ(x), we have |z − x| ≥ 2ϵ(x) for
all x ∈ A, z ∈ S \ A. Similarly, |z − y| ≥ 2ϵ(y) for all y ∈ B, z ∈ S \ B. Since A, B
• If f (a) ≤ f (b) then, for every y ∈ [f (a), f (b)] we have y = f (x) for some
x ∈ [a, b].
• If f (b) ≤ f (a) then, for every y ∈ [f (b), f (a)] we have y = f (x) for some
x ∈ [a, b].
Proof. By Propositions 10.2.3 and 11.2.25, the intervals I and J = f∗ (I) are both
connected, hence convex. Since f (a), f (b) ∈ J we have y ∈ J for every y between
f (a) and f (b), so y = f (x) for some x ∈ I.
11.2.27 Corollary. Let I = [a, b] be a nonempty closed interval in E. Let f : I →
E be continuous. If f (a)f (b) < 0 then f (x) = 0 for some x ∈ I.
Proof. We have f (a)f (b) < 0 if and only if, either f (a) < 0 < f (b), or else f (b) <
0 < f (a), so the stated result follows by the Intermediate Value Theorem 11.2.26
with y = 0.
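Corollary 11.2.27 only asserts that a zero exists, but the sign condition also suggests how to locate one: repeatedly halve the interval and keep the half on which the sign change persists. The following bisection sketch is an illustration under stated assumptions, not part of the proof: it assumes a finite interval with $a < b$ and a real-valued continuous $f$, and the name `bisect_root` is hypothetical.

```python
def bisect_root(f, a: float, b: float, tol: float = 1e-12, max_iter: int = 200) -> float:
    """Approximate a zero of a continuous f on [a, b], given f(a) * f(b) < 0."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("need f(a) * f(b) < 0")
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fm == 0.0 or (b - a) < tol:
            return mid
        # Keep the half-interval on which the sign change persists.
        if fa * fm < 0:
            b, fb = mid, fm
        else:
            a, fa = mid, fm
    return 0.5 * (a + b)


if __name__ == "__main__":
    # f(x) = x**2 - 2 changes sign on [0, 2]; the zero found is sqrt(2).
    print(bisect_root(lambda x: x * x - 2.0, 0.0, 2.0))
```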
In particular, f is bounded on I.
Proof. The finite closed interval I is compact by the Heine-Borel Theorem 9.6.12,
so the stated inequalities follow from Theorem 11.2.30 while the boundedness state-
ment is obvious since f (s), f (t) ∈ R.
Recall that $f : (0, +\infty) \to \mathbb{R} : x \mapsto 1/x$ is continuous, but not uniformly
continuous. In fact, given any $\delta > 0$ and any $n > 1/\sqrt{\delta}$ we have $f(1/(n+1)) -
f(1/n) = 1$ while $|1/(n+1) - 1/n| < 1/n^2 < \delta$, so the definition of uniform
continuity fails for $\epsilon = 1$. Moreover, the argument above shows that the restriction
of $f$ to any interval $(0, \epsilon)$ for arbitrary $\epsilon > 0$ fails to be uniformly continuous.
The proposition below shows that continuous functions with compact domains are
necessarily uniformly continuous.
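The failure of uniform continuity for $x \mapsto 1/x$ on $(0, +\infty)$ can be made concrete: for any proposed $\delta$, the pair $1/(n+1)$, $1/n$ with $n > 1/\sqrt{\delta}$ lies within $\delta$ of each other while the values of $f$ differ by 1. A minimal sketch, with hypothetical helper names and floating-point arithmetic:

```python
import math


def reciprocal(x: float) -> float:
    """f(x) = 1/x on (0, +infinity)."""
    return 1.0 / x


def witness_pair(delta: float):
    """Return (1/(n+1), 1/n) for an integer n > 1/sqrt(delta)."""
    n = math.floor(1.0 / math.sqrt(delta)) + 1
    return 1.0 / (n + 1), 1.0 / n


if __name__ == "__main__":
    for delta in (1.0, 0.1, 1e-3, 1e-6):
        x, y = witness_pair(delta)
        # The points are closer than delta, yet the values differ by about 1.
        print(delta, abs(x - y), abs(reciprocal(x) - reciprocal(y)))
```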
{f ∈ B(S) | ∥f ∥ ≤ ϵ}
Proof. Let {fn } be uniformly Cauchy. Then {fn (x)} is Cauchy for each x ∈ S.
By the Cauchy Principle in R (Theorem 10.5.5), there exists f (x) ∈ R such that
fn (x) → f (x) as n → ∞, and the function f : x 7→ f (x) is the pointwise limit of
fn as n → ∞.
Using the hypothesis that {fn } is uniformly Cauchy, we show that fn → f
uniformly. Let ϵ > 0 be given. Choose N large enough that ∥fn − fm ∥ < ϵ/3 for
all $m, n > N$. For now, fix $x \in S$ and let $y = f_{N+1}(x)$. Since $f_n(x)$ is eventually in
the neighborhood $(y - \frac{\epsilon}{3}, y + \frac{\epsilon}{3})$ of $y$, the limit $f(x)$ of this sequence belongs to the
closed neighborhood $[y - \frac{\epsilon}{3}, y + \frac{\epsilon}{3}]$. Therefore, $|f(x) - f_{N+1}(x)| \le \frac{\epsilon}{3}$. For all $n > N$
we have
\[
  |f_n(x) - f(x)| \le |f_n(x) - f_{N+1}(x)| + |f_{N+1}(x) - f(x)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} = \frac{2}{3}\epsilon.
\]
Since this holds for all $x \in S$, we have $\|f_n - f\| \le \frac{2}{3}\epsilon < \epsilon$ for all $n > N$, so $f_n \to f$
uniformly.
Conversely, let $\{f_n\}$ be uniformly convergent to $f$. Given $\epsilon > 0$ choose $N$ large
enough that $\|f_n - f\| < \epsilon/2$ for all $n > N$. Then we have, for all $m, n > N$,
\[
  \|f_n - f_m\| \le \|f_n - f\| + \|f - f_m\| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,
\]
so $\{f_n\}$ is uniformly Cauchy.
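\begin{align*}
  |f(y) - f(x)| &\le |f_{N+1}(x) - f(x)| + |f_{N+1}(y) - f_{N+1}(x)| + |f(y) - f_{N+1}(y)| \\
                &< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon
                \qquad \text{for all } y \in S \cap N.
\end{align*}
Therefore, $f$ is continuous at $x$.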
11.3.6 Corollary. Let S be a nonempty subset of E and let {fn } be a sequence
of functions in C(S). If fn → f uniformly on S, then f is continuous.
Proof. Apply Proposition 11.3.5 at each point x ∈ S.
11.3.7 Corollary. For any subset S ⊂ E, C(S) is a closed subset of B(S).
Proof. The limit of any uniformly convergent sequence {fn } in C(S) belongs to C(S),
by Corollary 11.3.6. Therefore, C(S) is a closed set in B(S).
11.3.8 Proposition. Let S be a nonempty subset of R, fix x ∈ S, and let {fn }
be a sequence of functions in C(S) such that fn is uniformly continuous for all n.
If fn → f uniformly on S, then f is uniformly continuous on S.
Proof. Let ϵ > 0 be given. Choose N large enough that ∥fn − f ∥ < ϵ/3 for all
n > N . Using the uniform continuity of fN +1 , choose δ > 0 such that |fN +1 (y) −
fN +1 (x)| < ϵ/3 for all x, y ∈ S with |y − x| < δ. For all such x, y we have
|f (y)−f (x)| < ϵ, by the same chain of inequalities in the proof of Proposition 11.3.5
above. Therefore, f is uniformly continuous on S.
Since the above holds for all ϵ > 0, no element L′ ̸= Lx may belong to all the sets
Ēδ̄(ϵ) , so a fortiori not to Ē(x) either, proving the uniqueness of Lx .
If x ∈ D, we certainly have g(x) ∈ Ē(x), hence g(x) = Lx . Therefore, we may
extend g to a function defined on all of S by letting g(x) = Lx .
Further, we claim that this extension of g satisfies the uniform continuity con-
dition:
(1) For all ϵ > 0 and all y, z ∈ S: |y − z| < δ(ϵ) ⇒ |g(z) − g(y)| ≤ ϵ.
\[
\begin{aligned}
|f_n(y) - g(y)| &\le |f_n(y) - f_n(x_i)| + |f_n(x_i) - g(x_i)| + |g(y) - g(x_i)| \\
&< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.
\end{aligned}
\]
Since g is the uniform limit of a sequence in F and F is closed, we have g ∈ F .
Therefore, F is sequentially compact.
11.3.16 Remarks. • Every sequentially compact metric space is compact (a
general result we do not prove in these Notes). Therefore, Arzelà-Ascoli’s
Theorem may be formulated to state that an equicontinuous and equibounded
uniformly closed subset of C(S) (for compact S ⊂ R) is compact in
the uniform topology.
\[
f_n(x) = \frac{1 - x^n}{1 - x} = \sum_{k=0}^{n-1} x^k = 1 + x + \cdots + x^{n-1} \quad \text{on } [0, 1)
\]
In the first example above, the continuous function f , being unbounded, can-
not be approximated uniformly by polynomials (although it is a pointwise limit
of polynomials). In the second one, the continuous and bounded nonconstant
function g cannot be uniformly approximated by polynomials because the approx-
imating polynomials would necessarily need to stay bounded at ±∞, hence be
constant, and the nonconstant function g cannot be uniformly approximated by
constant functions.
In the first example, the unbounded behavior of f (x) as x approached the limit
point 0 clashed with the bounded behavior of any approximating polynomial, while
in the second example, the bounded behavior of g at the limit points ±∞ clashed
with the unbounded behavior at ±∞ of polynomials of positive degree.
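The gap between pointwise and uniform approximation can be observed numerically. The sketch below assumes the polynomials are the partial sums fn(x) = 1 + x + · · · + x^(n−1) on [0, 1) and that the function being approximated is their pointwise limit 1/(1 − x); the helper names are illustrative only. At a fixed point the error tends to 0, while at the moving point x = 1 − 1/n the error grows at least like n/4.

```python
def partial_sum(n, x):
    """f_n(x) = 1 + x + ... + x**(n-1), evaluated by Horner's rule."""
    total = 0.0
    for _ in range(n):
        total = total * x + 1.0
    return total


def limit(x):
    """Assumed pointwise limit 1/(1 - x) on [0, 1); unbounded as x approaches 1."""
    return 1.0 / (1.0 - x)


def gap(n, x):
    """Pointwise error |f(x) - f_n(x)|, which equals x**n / (1 - x)."""
    return abs(limit(x) - partial_sum(n, x))


# Pointwise convergence at the fixed point x = 0.5.
fixed_point_gap = gap(50, 0.5)

# No uniform convergence: the error at x = 1 - 1/n grows with n.
moving_gaps = {n: gap(n, 1.0 - 1.0 / n) for n in (10, 100, 1000)}
```

Because the error at x = 1 − 1/n is unbounded in n, no partial sum is uniformly close to the limit on all of [0, 1).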
The situation near ±∞ cannot be remedied: Polynomials are simply not a sufficiently
rich class of functions to uniformly approximate most functions in neighborhoods
of ±∞. On the other hand, on any given bounded domain only bounded functions
can be uniformly approximated by polynomials.
For essentially the same reason, polynomials cannot be used to approximate
arbitrary functions near a limit point of their domain, unless the functions themselves
are controlled near that limit point. This can be accomplished by requiring that
the domain contain all its limit points (i.e., be closed), and that the function to be
approximated be continuous everywhere. After all, any uniform limit of continuous
functions (such as polynomials) is continuous.
At this point, it seems plausible to conjecture that any continuous, real-valued
function on a compact set S ⊂ R can be uniformly approximated by polynomials.
This result is true; in fact, it is a particular case of the Stone-Weierstrass theorem.¶
We will settle for a modern proof, using Bernstein polynomials, of Weierstrass’s
classical result in which S is a compact interval [a, b].
11.3.19 Theorem (Weierstrass’s Approximation Theorem). For real num-
bers a < b, let P(I) be the set of all polynomial functions on the interval I = [a, b].
Then P(I) is dense in C(I) in the uniform topology.
Equivalently: For every f ∈ C(I) and every ϵ > 0 there exists a polynomial
p ∈ P(I) such that |p(x) − f (x)| < ϵ for all x ∈ I.
We will prove Theorem 11.3.19 using properties of Bernstein polynomials,
which we proceed to define.
¶ More precisely, it is a special case of Stone’s generalization of Weierstrass’s Theorem ***
below. In Stone’s theorem, S is a compact topological space (not necessarily a subset of E) and
the approximation is done via functions in a suitable class (algebra) of continuous functions on S,
not necessarily polynomials.
3. $(k + 1)(1 - t)\, b^n_{k+1}(t) = (n - k)\, t\, b^n_k(t)$ for $0 \le k < n$ and $t \in [0, 1]$.
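Identity 3 can be checked with exact rational arithmetic. The sketch below assumes the usual definition of the Bernstein basis polynomials, b^n_k(t) = C(n, k) t^k (1 − t)^(n−k); the helper names are hypothetical and the check covers only a finite sample of values, so it is a sanity test rather than a proof.

```python
from fractions import Fraction
from math import comb


def bernstein(n, k, t):
    """Assumed definition: b^n_k(t) = C(n, k) * t**k * (1 - t)**(n - k)."""
    return comb(n, k) * t**k * (1 - t) ** (n - k)


def identity_holds(n, k, t):
    """Check (k + 1)(1 - t) b^n_{k+1}(t) == (n - k) t b^n_k(t) exactly."""
    lhs = (k + 1) * (1 - t) * bernstein(n, k + 1, t)
    rhs = (n - k) * t * bernstein(n, k, t)
    return lhs == rhs


# Sample all 0 <= k < n for n up to 7 and t in {0, 1/7, ..., 1}.
results = [
    identity_holds(n, k, Fraction(i, 7))
    for n in range(1, 8)
    for k in range(n)
    for i in range(8)
]
```

Using `Fraction` avoids rounding, so equality is tested exactly, including at the endpoints t = 0 and t = 1.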
\[
\sum_{j=k+1}^{m} b^n_j(t) \ge (m - k)\, b^n_k(t).
\]
2. For all $t \in \left(\frac{k}{n+1},\, 1\right]$, we have
\[
\sum_{j=0}^{k} b^n_j(t) \le \frac{b^n_k(t)}{1 - \rho^n_k(t)} \qquad \text{where } \rho^n_k(t) = \frac{(1 - t)k}{(n - k + 1)t}.
\]
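The bound in part 2 lends itself to an exact numerical spot check. The following sketch again assumes the standard definition b^n_k(t) = C(n, k) t^k (1 − t)^(n−k) and uses hypothetical helper names; it tests the inequality on a finite grid of rational t in (k/(n+1), 1], which supports the statement without replacing its proof.

```python
from fractions import Fraction
from math import comb


def b(n, k, t):
    """Assumed Bernstein basis polynomial b^n_k(t)."""
    return comb(n, k) * t**k * (1 - t) ** (n - k)


def rho(n, k, t):
    """rho^n_k(t) = (1 - t) k / ((n - k + 1) t), as in part 2."""
    return (1 - t) * k / ((n - k + 1) * t)


def part2_holds(n, k, t):
    """Check sum_{j<=k} b^n_j(t) <= b^n_k(t) / (1 - rho^n_k(t)) exactly."""
    lower_sum = sum(b(n, j, t) for j in range(k + 1))
    return lower_sum <= b(n, k, t) / (1 - rho(n, k, t))


# Only t > k/(n+1) is admissible; there rho < 1, so the division is safe.
checks = [
    part2_holds(n, k, Fraction(i, 40))
    for n in range(1, 9)
    for k in range(n + 1)
    for i in range(1, 41)
    if Fraction(i, 40) > Fraction(k, n + 1)
]
```

The filter on t mirrors the hypothesis of part 2: exactly when t > k/(n+1) one has ρ < 1, which is what makes the geometric-series bound available.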
Proof. 1. Using part 3. of Proposition 11.3.21, one shows $b^n_j(t) \ge b^n_k(t)$ for $k < j \le m$
by induction on $j$. The stated inequality follows upon adding these
$m - k$ inequalities together.
Formula,
\[
\sum_{j=0}^{k} b^n_j(t) \le \frac{1 - r_k^{k+1}}{1 - r_k} \cdot b^n_k(t) \le \frac{b^n_k(t)}{1 - r_k}.
\]
\[
B^n_{[c,d]}(t) := \sum_{j=c}^{d} b^n_j(t).
\]
Clearly, $0 \le B^n_{[c,d]}(t) \le 1$, by parts 1. and 2. of Proposition 11.3.21.
11.3.23 Proposition. For all $\delta \in (0, 1)$ and all $\epsilon > 0$ there exists $n \in \mathbb{N}$ such that
\[
B^n_{[0,k]}(t) < \epsilon \qquad \text{for all } k \le n \text{ and all } t \in \left[\frac{k}{n} + \delta,\, 1\right].
\]
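To get a feeling for Proposition 11.3.23, one can tabulate the lower sums B^n_[0,k](t) numerically. The sketch below assumes the standard Bernstein basis b^n_j(t) = C(n, j) t^j (1 − t)^(n−j), fixes the illustrative value δ = 0.1, and evaluates each lower sum at the left endpoint t = k/n + δ only (the lower sum is the cumulative distribution function of a binomial variable, which does not increase with t, so this endpoint is where it is largest). Function names are hypothetical, and floating-point arithmetic is used, so the values are approximate.

```python
from math import comb


def lower_sum(n, k, t):
    """B^n_[0,k](t): sum of the assumed basis polynomials b^n_j(t) for j = 0..k."""
    return sum(comb(n, j) * t**j * (1 - t) ** (n - j) for j in range(k + 1))


def worst_case(n, delta):
    """Largest value of B^n_[0,k](k/n + delta) over all k with k/n + delta <= 1."""
    worst = 0.0
    for k in range(n + 1):
        t = k / n + delta
        if t > 1.0:
            break
        worst = max(worst, lower_sum(n, k, t))
    return worst


# The worst case shrinks as n grows, as the proposition predicts.
values = {n: worst_case(n, 0.1) for n in (20, 80, 320)}
```

The observed decay is only an illustration for one choice of δ; the proposition guarantees a suitable n for every δ and ϵ.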
defined on the interval [δ, 1], takes values strictly less than 1. Since ηδ is continuous,
the Extreme Value Theorem 11.2.31 implies that there exists γ = γδ > 0 such that
ηδ (t) ≤ 1 − γ for all t ∈ [δ, 1].
Let $n$ be any integer greater than $(1 + \epsilon^{-1}\gamma^{-1})\delta^{-1}$. We will prove the inequality
$B^n_{[0,k]}(t) < \epsilon$ whenever $t, k$ satisfy $\frac{k}{n} + \delta \le t \le 1$.
Let $m$ be the unique integer satisfying $m < (n + 1)t \le m + 1$. Certainly, $m \le n$
and $t > m/(n + 1)$. Further, we have $m \ge (n + 1)t - 1 \ge k + n\delta + t - 1 > k + \epsilon^{-1}\gamma^{-1}$.
Since $\epsilon, \gamma \le 1$ we have $m \ge k + 1$. Part 1. of Proposition 11.3.22 gives
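\[
\text{(1)} \qquad B^n_{[k+1,m]}(t) \ge \epsilon^{-1}\gamma^{-1}\, b^n_k(t),
\]
\[
\text{(2)} \qquad B^n_{[0,k]}(t) \le \gamma^{-1}\, b^n_k(t),
\]
since $t \ge \frac{k}{n} + \delta > \frac{k}{n+1}$.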
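(We could also define $B^0(f; t)$ to be the constant $f(0)$, but we only need the
approximants for $n$ large.)
Let $E^n(f; t) = |B^n(f; t) - f(t)|$ for $t \in [0, 1]$. We have
\[
\begin{aligned}
E^n(f; t) &= \left| \sum_{j=0}^{n} f\!\left(\tfrac{j}{n}\right) b^n_j(t) - f(t) \right| \\
&= \left| \sum_{j=0}^{n} \left( f\!\left(\tfrac{j}{n}\right) - f(t) \right) b^n_j(t) \right| \qquad \text{by Proposition 11.3.21.2.}
\end{aligned}
\]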
Either of the last two sets $R$, $L$ may be empty. By choice of $n$, we have
$B^n_R(t), B^n_L(t) < \frac{1}{8}\,\epsilon/(\|f\| + 1)$. By choice of $\delta$, we have
$|f(\frac{j}{n}) - f(t)| < \epsilon/2$ for $j \in M$.
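Since $|f(u) - f(t)| \le |f(u)| + |f(t)| \le 2\|f\|$ for all $t, u \in [0, 1]$, we have
\[
\begin{aligned}
E^n(f; t) &\le E^n_M(f; t) + E^n_R(f; t) + E^n_L(f; t) \\
&\le \frac{\epsilon}{2}\, B^n_M(t) + 2\|f\|\, B^n_R(t) + 2\|f\|\, B^n_L(t) \\
&\le \frac{\epsilon}{2} \cdot 1 + 2 \cdot \frac{2\|f\|\,\epsilon}{8(\|f\| + 1)} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,
\end{aligned}
\]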
uniformly for all $t \in [0, 1]$. This concludes the proof that the polynomials $B^n(f; \cdot)$
uniformly approximate $f$ on $[0, 1]$.
In general, if $f$ is continuous on a closed interval $[a, b]$, let $g : t \mapsto f\big((1 - t)a + tb\big)$
and let $p$ be any polynomial such that $|p(t) - g(t)| < \epsilon$ for all $t \in [0, 1]$. Let
$q : x \mapsto p\big(\frac{x-a}{b-a}\big)$. Then $|f(x) - q(x)| = \big|g\big(\frac{x-a}{b-a}\big) - p\big(\frac{x-a}{b-a}\big)\big| < \epsilon$ for all $x \in [a, b]$.
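The change of variables in the last step translates directly into a computation. The following Python sketch builds q(x) = B^n(g; (x − a)/(b − a)) with g(t) = f((1 − t)a + tb), assuming the standard Bernstein polynomial B^n(g; t) = Σ g(j/n) C(n, j) t^j (1 − t)^(n−j); the function names, the test function f(x) = |x| on [−1, 1], and the grid size are all illustrative choices, not taken from these Notes. The error is measured on a finite grid, so it only estimates the uniform distance.

```python
from math import comb


def bernstein_on_interval(f, n, a, b):
    """Return q(x) = B^n(g; (x - a)/(b - a)), where g(t) = f((1 - t)a + t b)."""
    # Precompute g(j/n) * C(n, j) once; these are the Bernstein coefficients.
    coeffs = [f((1 - j / n) * a + (j / n) * b) * comb(n, j) for j in range(n + 1)]

    def q(x):
        t = (x - a) / (b - a)  # affine map from [a, b] onto [0, 1]
        return sum(c * t**j * (1 - t) ** (n - j) for j, c in enumerate(coeffs))

    return q


def sup_error_on_grid(f, q, a, b, points=200):
    """Maximum of |f(x) - q(x)| over an evenly spaced grid of [a, b]."""
    xs = [a + (b - a) * i / points for i in range(points + 1)]
    return max(abs(f(x) - q(x)) for x in xs)


# Illustrative target: f(x) = |x| on [-1, 1], continuous but not differentiable at 0.
errors = {
    n: sup_error_on_grid(abs, bernstein_on_interval(abs, n, -1.0, 1.0), -1.0, 1.0)
    for n in (25, 100, 400)
}
```

The grid error decreases as n grows, though slowly: Bernstein polynomials are convenient for the proof, not an efficient approximation scheme.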