 1870 to 1940 
Mark Scheffer
(Version 1.0)
Mark Scheffer, id. 415968, email: zax@chello.nl. Last changes:
March 22, 2002. This report is part of a practical component of the Computing
Science study at the Eindhoven University of Technology.
To work on the foundations of mathematics, two things are needed:
Love and Blood.
 Anonymous quote, 2001.
Contents
1 Introduction 9
2 Cantor’s paradise 13
2.1 The beginning of set theory 13
2.2 Basic concepts 15
3 Mathematical constructs in set theory 21
3.1 Some mathematical concepts 21
3.2 Relations 23
3.3 Functions 29
3.4 Induction Methods 32
3.4.1 Induction 32
3.4.2 Deduction 33
3.4.3 The principle of induction 34
3.5 Real numbers 45
3.5.1 Dedekind’s cuts 46
3.5.2 Cantor’s chains of segments 47
3.5.3 Cauchy sequences 48
3.5.4 Properties of the three definitions 50
3.6 Infinite sets 51
3.7 The Continuum Hypothesis 60
3.8 Cardinal and Ordinal numbers and Paradoxes 63
3.8.1 Cardinal numbers and Cantor’s Paradox 63
3.8.2 Ordinal numbers and Burali-Forti’s Paradox 65
4 Peano and Frege 71
4.1 Peano’s arithmetic 71
4.2 Frege’s work 74
5 Russell 79
5.1 Russell’s paradox 82
5.2 Consequences and philosophies 88
5.3 Zermelo-Fraenkel 92
5.3.1 Axiomatic set theory 92
5.3.2 Zermelo-Fraenkel (ZF) Axioms 93
6 Hilbert 99
6.1 Hilbert’s proof theory 101
6.2 Hilbert’s 23 problems 110
7 Types 113
7.1 Russell and Whitehead’s Principia Mathematica 113
7.2 Ramsey, Hilbert and Ackermann 119
7.3 Quine 121
8 Gödel 123
8.1 Informally: Gödel’s incompleteness theorems 123
8.2 Formally: Gödel’s Incompleteness Theorems 127
8.2.1 On formally undecidable propositions 127
8.2.2 The impossibility of an ‘internal’ proof of consistency 130
8.2.3 Gödel numbering and a concrete proof of G₁, G₂ and G₃ 131
8.3 Gödel’s theorem and Peano Arithmetic 132
8.4 Consequences 134
8.5 Neumann-Bernays-Gödel axioms 135
9 Church and Turing 141
9.1 Turing and Turing Machine 141
9.2 Church and the Lambda Calculus 153
9.3 The Church-Turing thesis 166
10 Conclusion 169
A Timeline and Images 181
Mathematical Notations
Many different notations have been developed for set theory and logic.
Most notations that we have used are standard today; other notations that
we have used are introduced in the text.
Mathematical Logic
symbol   meaning                              also described as
∧        conjunction                          and
∨        disjunction (inclusive)              or
¬        negation                             not
ϕ(x)     propositional function
→        implication                          if . . . then
↔        bi-implication                       if and only if, iff
≡        equivalence                          is equivalent to
∀        universal quantifier                 for all
∃        existential quantifier               there exists
∃!       one-element existential quantifier   there exists a unique
In most places we have chosen to use the following notation¹ to denote
quantifications:
(Q dummies : range : term)
which denotes the accumulation, by the quantifier Q, of the term over all
values of the dummies that lie in the range.
Consider a general pattern (Q x₀, . . . , xₙ : ϕ(x₀, . . . , xₙ) : t(x₀, . . . , xₙ)), with Q
a quantifier, ϕ a boolean expression in terms of the dummies x₀, . . . , xₙ,
and t(x₀, . . . , xₙ) the term of the quantification. The quantification is the
accumulation of values t(x₀, . . . , xₙ) using an operator or relation indicated
by Q, over all values (x₀, . . . , xₙ) for which ϕ(x₀, . . . , xₙ) holds.
¹ Notation originally due to E.W. Dijkstra.
This notation is suitable for formal manipulation and unambiguous in the
sense that it explicitly indicates the quantifier Q, the dummies and the range
of the dummies that is indicated by the boolean expression ϕ (i.e. it exactly
determines the domain of the quantification). This allows us to reason about
general properties of quantifications, in a way in which the (scopes of the)
bound variables are clearly identified. Note that this type of quantification
is only suitable for binary operations that are commutative (symmetric) and
associative, since the order in which the values are accumulated is not specified.
Example:
(∑x : 0 ≤ x ≤ 5 : x²)
=
0² + 1² + 2² + 3² + 4² + 5²
=
∑_{x=0}^{5} x²
Example:
(∃x : x ∈ N : x³ − x² = 18)
≡
‘there exists a natural number x such that x³ − x² = 18’
If the range covers all possible values of the variable (here: x), or if
it is clear what the range of a variable is, we can omit it.
Example:
(∀x : true : x ∈ A → x ∈ B)
≡
(∀x :: x ∈ A → x ∈ B)
≡
‘all elements of A are also elements of B’
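For a finite domain, a quantification is simply a fold of the term values with the operator belonging to Q. The Python sketch below illustrates this; the function name `quantify` is our own, and because N is infinite the ∃ example only searches x < 100 (an assumption of the sketch, not part of the notation).

```python
from functools import reduce
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
V = TypeVar("V")


def quantify(op: Callable[[V, V], V], unit: V, domain: Iterable[T],
             in_range: Callable[[T], bool], term: Callable[[T], V]) -> V:
    """Model of (Q x : in_range(x) : term(x)) over a finite domain.

    `op` is the binary operator belonging to Q and `unit` its identity
    element. Since no accumulation order is fixed, `op` should be
    commutative and associative.
    """
    return reduce(op, (term(x) for x in domain if in_range(x)), unit)


# (Σx : 0 ≤ x ≤ 5 : x²), with x drawn from 0..99
total = quantify(lambda a, b: a + b, 0, range(100),
                 lambda x: 0 <= x <= 5, lambda x: x * x)

# (∃x : x ∈ N : x³ − x² = 18), searching only x < 100
exists = quantify(lambda a, b: a or b, False, range(100),
                  lambda x: True, lambda x: x ** 3 - x ** 2 == 18)

# (∀x :: x ∈ A → x ∈ B), with x drawn from a finite universe 0..9
A, B = {1, 2}, {1, 2, 3}
forall = quantify(lambda a, b: a and b, True, range(10),
                  lambda x: True, lambda x: (x not in A) or (x in B))
```

Note that the unit element (0 for Σ, false for ∃, true for ∀) is the value of the quantification over an empty range.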
Chapter 1
Introduction
Pure mathematics is, in its way, the poetry of logical ideas.
 Albert Einstein
This report covers the most important developments and theory of the
foundations of mathematics in the period from 1870 to 1940. The tale of the
foundations is fairly familiar in general terms and for its philosophical content;
here the main emphasis is laid on the mathematical theory. The history
of the foundations of mathematics is complicated and many-sided;
with this article I do not aim to give a definitive or complete version, but
to capture what I consider the essence of the theoretical developments, and
to present them in a clear and modern setting. Some basic knowledge of set
theory and logic is presupposed.
By the middle of the nineteenth century, certain logical problems (for
example paradoxes around the notions of infinity, the infinitesimal and continuity)
at the heart of mathematics had inspired a movement, led by German
mathematicians, to provide mathematics with more rigorous foundations.
This is where the theory of this report begins, with the emergence of set
theory in the work of the German mathematician Cantor. In section 2.1 we informally
describe how work on a problem concerning trigonometric series gradually
led Cantor to his theory of sets (section 2.2). As a result of the work of
Weierstrass, Dedekind and Cantor, pure mathematics had been provided
with much more sophisticated foundations. The notion of infinitesimal had
been banished, ‘real’ numbers had been provided with a logically consistent
definition (section 3.5), continuity had been redefined and, more controversially,
a whole new branch of arithmetic had been invented which addressed
itself to the problems (e.g. paradoxes) of infinity (sections 3.6, 3.7).
In 1895 Cantor discovered a paradox (section 3.8.1) that he did not publish
but communicated to Hilbert in 1896. In 1897 it was rediscovered in a slightly
different form by Burali-Forti (section 3.8.2). Neither Cantor nor Burali-Forti
could resolve this paradox, but it was not taken very seriously, partly because
it appeared in a rather technical region of mathematics.
The Italian mathematician Peano (section 4.1) was able to show that the
whole of arithmetic could be founded upon a system that uses three basic
notions and five initial axioms. At the same time the German mathematician
Frege (section 4.2) worked on developing a logical basis for mathematics. Like
Peano, Frege wanted to put mathematics on firm grounds. But Frege’s
grounds were strictly logical; he followed an approach later called logicism,
which was closely tied to the development of so-called mathematical logic.
The British mathematician Russell noted Peano’s work and later that
of Frege. Soon thereafter he showed (section 5.1) how finite descriptions
like ‘set of all sets’ could be self-contradictory (i.e. paradoxical) and pointed
out the difficulties that arose with self-referential terms. Unlike the earlier
paradoxes, the paradox that Russell found did not arise only in a specific
technical region, but affected all of the axiomatic systems underlying
mathematics at that time (section 5.1).
But since the paradoxes could be avoided in most practical applications of
set theory, the belief in set theory as a proper foundation of mathematics
remained. Axiomatic set theory (section 5.3.1) was an attempt to arrive at
a theory without paradoxes. Various responses to the paradox (section 5.2)
led to new sets of axioms for set theory. The two main approaches are by the
German mathematicians Zermelo and Fraenkel (section 5.3), and by the Hungarian
von Neumann, the Hungarian-Austrian Gödel and the Swiss Bernays
(section 8.5). It also led to the emergence of the ‘intuitionistic’ philosophy of
mathematics by the Dutch mathematician Brouwer (not covered here) and
to a theory of types, proposed by Russell himself with the help of his former
teacher, the English mathematician Whitehead. Despite the paradox,
Russell and Whitehead still claimed that all mathematics could be founded
on a mathematical logic; this belief was given a definite presentation in
their work ‘Principia Mathematica’ (section 7.1). Various consequences followed
(section 7.3) and new conceptions of logic arose (by Wittgenstein and
Ramsey, see section 7.2).
At the turn of the century, the German mathematician David Hilbert
listed certain important problems concerning the foundations of mathematics
and mathematics in general (section 6.2). To overcome paradoxes and
other problems that arose in existing systems, Hilbert developed a theory of
axiomatic systems (section 6.1). He then encouraged Zermelo to use this
axiomatic method to develop the first set of axioms for set theory
(section 5.3.2). Hilbert later made more precise demands on any
proposed set of axioms for mathematics (section 6.1) in terms of consistency,
completeness and decidability.
In 1931 Gödel showed that a formal system strong enough to express arithmetic
cannot be both consistent and complete (chapter 8). Gödel’s work left Hilbert’s
question of decidability open. The English mathematician Turing proved in 1936
that there are undecidable problems: after formalizing the notion of algorithm
with his concept of the Turing Machine, he gave the so-called halting problem,
which cannot be solved by any algorithm (section 9.1). The American
mathematician Church independently obtained the same result, but with
another formalization of the notion of an algorithm, using his computational
model of lambda calculus (section 9.2). In section 9.3 we state that these two
notions are equivalent and, according to the Church-Turing thesis, correspond
to the intuitive notion of algorithm or computability. In chapter 10 I summarize
the theory of the foundations of mathematics, before giving my own opinion and
making some suggestions for future work.
This article is part of the practical component of my study of computing
science, and was written for a large part in 8 weeks at Heriot-Watt University
in Edinburgh under the supervision of Prof. F. Kamareddine. I want to thank
Rob Nederpelt and the formal methods section of the computing science department
of the Eindhoven University of Technology for making this possible.
Rob Nederpelt always inspired me to continue working on this report and was
patient in explaining difficult proofs to me. And last but not least, I want
to thank Fairouz Kamareddine for her support and positive motivation, and
Boukje Nouwen (as she breathes a sigh of relief that this is (I think) the last
revision) for the typesetting and editing of large parts of this document and
for helping me in many ways to finish this article in such a short period of
time.
Chapter 2
Cantor’s paradise
2.1 The beginning of set theory
Perhaps the most surprising thing about mathematics is that it
is so surprising. The rules which we make up at the beginning
seem ordinary and inevitable, but it is impossible to foresee their
consequences. These have only been found out by long study, extending
over many centuries. Much of our knowledge is due to a
comparatively few great mathematicians such as Newton, Euler,
Gauss, or Riemann; few careers can have been more satisfying
than theirs. They have contributed something to human thought
even more lasting than great literature, since it is independent of
language.
 Titchmarsh, E. C. in [88]
By the late 19th century the discussions about the foundations of geometry
had become the focus for a running debate about the nature of the branches
of mathematics ([23, last paragraph of section 35, page 69/70]). Although
there had been no conscious plan leading in that direction, the stage was set
for a consideration of questions about the fundamental nature of mathematics.
In the study of logic, the work of the English mathematician George Boole
in the 1850s ([49, chapter 2.S4, page 51]), and the American Charles Sanders
Peirce around 1880 ([49, page 187]), had contributed to the development of a
symbolism to explore logical deductions, and in Germany the logician Gottlob
Frege (see [98]) had directed keen attention to fundamental questions.
All of these debates came together through the pioneering work of the
German mathematician Georg Cantor on the concept of a set. Cantor had
begun work in this area because of his interest in Riemann’s theory of
trigonometric series.
At the University of Halle in Germany, the direction of Cantor’s research
turned away from number theory and towards analysis. This was due to
Heine, one of his senior colleagues at Halle, who challenged Cantor to prove
the open problem on the uniqueness of representation of a function as a
trigonometric series (see [30, section 5.2, page 182]). Starting from the work
on trigonometric series and on functions of a complex variable done by
the German mathematician Bernhard Riemann (see [75]) in 1854, Cantor in
1870 showed ([30, page 182]) that such a function can be represented in only
one way by a trigonometric series. Consideration of the collection of numbers
(originally termed ‘point sets’, see [30, section 5.2, page 184]) that would not
conflict with such a representation led him, first, in 1872, to define irrational
numbers in terms of convergent sequences of rational numbers (or quotients
of integers, see section 3.5.2) and then to begin his major lifework, the theory
of sets and the concept of transfinite numbers.
2.2 Basic concepts
The essence of mathematics lies in its freedom.
 Georg Cantor, quoted in [58]
In 1874 Cantor published his first article on set theory. A set, wrote Cantor
(in ‘Untersuchungen über die Grundlagen der Mengenlehre I’, published
in [20, pages 261-281]), is “a collection of definite, distinguishable objects of
perception or thought conceived as a whole”. In this report we use a similar
description of the concept of a set.
What is a set? A (finite or infinite) collection of objects that is considered
as a single, abstract object.
A set is sometimes also called an aggregate, a class or (as it was first called by
Riemann (see [31, page 88]) and later by the mathematician Russell) a manifold.
The objects are also called elements or members of the set.
We denote a set by listing its elements between the braces ‘{’ and ‘}’, and
membership of an element in a set by the membership relation ∈.
Example: If we consider a set that contains natural numbers, we write 4 ∈
{2, 3, 4, 5} to indicate that 4 is an element of the set {2, 3, 4, 5}. We write
4 ∉ {7, 8, 9} to indicate that 4 is not an element of the set {7, 8, 9}.
In a mathematical context we mostly consider sets of numbers and functions.
We denote the well-known sets of natural numbers by N (this set is also called
the naturals), the integers by Z, the fractional numbers by Q (this set is also
called the rationals) and the reals by R (this set is also called the continuum).
The objects of a set themselves can also be sets.
What is set theory? A branch of mathematics that deals with the properties
of well-defined collections of objects, which may be of a mathematical
nature, such as numbers or functions, or not.
Cantor defined ([49, page 288]) two sets A and B to be identical (equal),
notation A = B, if and only if A and B have the same elements. When set theory
was later axiomatized, this definition also became known as the
Axiom of extensionality: A = B := (∀x :: (x ∈ A ↔ x ∈ B))
Example: {3, 3, 7} = {7, 3}, but {2, {3, 4}} ≠ {{2, 3}, 4}
The relation ‘is a subset of’, notation ⊆, indicates that one set is contained
in the other:
Definition of subset: A ⊆ B := (∀x :: x ∈ A → x ∈ B)
Definition of proper subset: A ⊂ B := A ⊆ B ∧ A ≠ B
We often want to create a new set from a given set by selecting elements
that have certain properties. For example we take the set of powers of three
or the set of all even numbers (to be exact: the set containing those elements
of the set of natural numbers that have the property of being divisible
by 2). This principle was used by Cantor; we call it the unrestricted
or naive comprehension principle because it later (see sections 3.8 and 5.1)
turned out to be untenable.
Comprehension principle: For all properties ϕ there is precisely one set,
denoted by {x | ϕ(x)}, whose elements are exactly those objects which have
the property ϕ.
We thus have that y ∈ {x | ϕ(x)} ↔ ϕ(y). As a consequence (by taking
ϕ(x) = false for all x), there is at least one set that has no elements: the
empty set, denoted by ∅.
Theorem: (∃!x :: (∀y :: y ∉ x))
Proof: If we take ϕ to be false, the comprehension principle says that ‘there
is precisely one set whose elements are exactly those objects which have the
property false’. In mathematical notation: (∃!x :: (∀y :: y ∈ x ↔ false)).
This is equivalent to saying there is no element y that can be a member of
x: (∃!x :: (∀y :: y ∉ x)). From now on, we denote this unique set x by ∅ and
call it the empty set.
Corollary: (∀a :: ∅ ⊆ a)
Proof: We want to prove that (∀a :: ∅ ⊆ a) or, using the definition of the
subset relation: (∀x :: x ∈ ∅ → x ∈ a). From the previous theorem we know
that (∀y :: y ∉ ∅). This yields (∀x :: false → x ∈ a), which is true.
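The comprehension principle and the empty set can be imitated in Python if ‘all objects’ is replaced by a fixed finite universe. This is an assumption of the sketch below (the names `UNIVERSE`, `comprehension` and `is_subset` are ours); it only selects elements from an existing set and therefore does not model the unrestricted principle itself.

```python
# Hypothetical finite universe standing in for "all objects". Selecting
# from a fixed set is NOT the unrestricted principle of the text.
UNIVERSE = frozenset(range(10))


def comprehension(phi):
    """{x | phi(x)}, restricted to the elements of UNIVERSE."""
    return frozenset(x for x in UNIVERSE if phi(x))


def is_subset(a, b):
    """A ⊆ B := (∀x :: x ∈ A → x ∈ B)."""
    return all(x in b for x in a)


evens = comprehension(lambda x: x % 2 == 0)
empty = comprehension(lambda x: False)  # take phi to be false
```

Since `empty` has no elements, `is_subset(empty, b)` is vacuously true for every `b`, exactly as in the corollary.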
Using the comprehension principle we can create new sets from given sets.
So now we can introduce some operations on sets, by applying the comprehension
principle. But before we do that, we first introduce some general
(i.e. regardless of whether the operations are set-theoretic or not) properties
of operations: idempotence, commutativity, associativity and distributivity.
Although Cantor did not formulate these properties as such, they are used
throughout calculus and are useful in the set theory that follows in this
chapter.
Suppose ⊕ and ⊗ are binary¹ operations on a certain domain and E, F and
G are elements of that domain (for example sets), on which we have defined
the equality relation ‘=’.
Definition of idempotence:
⊕ is idempotent := (∀E :: E ⊕ E = E)
Definition of commutativity:
⊕ is commutative := (∀E, F :: E ⊕ F = F ⊕ E)
Definition of associativity:
⊕ is associative := (∀E, F, G :: (E ⊕ F) ⊕ G = E ⊕ (F ⊕ G))
Definition of distributivity:
⊕ is distributive² over ⊗ := (∀E, F, G :: E ⊕ (F ⊗ G) = (E ⊕ F) ⊗ (E ⊕ G))
¹ These properties can also be generalized to operations of arbitrary arity, but this will
not be necessary for our discussion.
² This form of distributivity is also called left-distributivity, as opposed to
right-distributivity:
⊕ is right-distributive over ⊗ := (∀E, F, G :: (E ⊗ F) ⊕ G = (E ⊕ G) ⊗ (F ⊕ G))
In ordinary mathematics this distinction is often left out for commutative operations, and
we for example simply say that × is distributive over + (when in fact it is both left- and
right-distributive).
The symbol ∪ is employed to denote the union of two sets. Thus, the set
A ∪ B is defined as the set that consists of all elements belonging to
set A or to set B (or to both).
Definition of union: A ∪ B := {x | x ∈ A ∨ x ∈ B}
The intersection operation is denoted by the symbol ∩. A ∩ B is defined
as the set composed of all elements that belong to both A and B.
Definition of intersection: A ∩ B := {x | x ∈ A ∧ x ∈ B}
Any two sets whose intersection is the empty set are said to be disjoint.
A collection of sets is called (pairwise) disjoint or mutually exclusive
if any two distinct sets in it are disjoint.
Example: The operations union and intersection on sets are both idempotent,
commutative and associative.
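These properties are easy to test exhaustively on a small universe. The Python sketch below (all helper names are ours) checks the claim of the example, and also the distributivity of ∪ over ∩ and of ∩ over ∪, for all subsets of {0, 1, 2}. Such a check on one finite universe illustrates the laws; it does not prove them.

```python
from itertools import combinations, product


def all_subsets(universe):
    """Every subset of a finite universe, as a list of frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def idempotent(op, dom):
    return all(op(e, e) == e for e in dom)


def commutative(op, dom):
    return all(op(e, f) == op(f, e) for e, f in product(dom, repeat=2))


def associative(op, dom):
    return all(op(op(e, f), g) == op(e, op(f, g))
               for e, f, g in product(dom, repeat=3))


def distributive(op1, op2, dom):
    """Left-distributivity: E op1 (F op2 G) = (E op1 F) op2 (E op1 G)."""
    return all(op1(e, op2(f, g)) == op2(op1(e, f), op1(e, g))
               for e, f, g in product(dom, repeat=3))


sets = all_subsets({0, 1, 2})
union, intersection = frozenset.union, frozenset.intersection
```

Set difference, by contrast, fails the commutativity and idempotence checks on the same domain.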
The difference of sets B and A, denoted B − A, contains those elements
of B that are not in A.
Definition of difference: B − A := {x | x ∈ B ∧ x ∉ A}
If A ⊆ B we often call the difference B − A the relative complement of A
in B. We then call B the universe, and if it is clear what the universe is we
often denote the relative complement of A by Aᶜ. From the definitions that
we have introduced so far, we can deduce three properties that are known as
the laws of reciprocity. The second and third law are also known as the laws
of de Morgan, named after the English mathematician Augustus de Morgan:
First law of reciprocity: A ⊆ B ↔ Aᶜ ⊇ Bᶜ
Second law of reciprocity: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
Third law of reciprocity: (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
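The laws of reciprocity can likewise be checked for every pair of subsets of a small universe. The Python sketch below uses `<=` and `>=` for ⊆ and ⊇, and `|` and `&` for ∪ and ∩; the universe {0, 1, 2, 3} and all helper names are arbitrary choices of ours, and a finite check illustrates rather than proves the laws.

```python
from itertools import combinations

U = frozenset({0, 1, 2, 3})  # an arbitrary finite universe


def complement(a):
    """Relative complement U − A, written Aᶜ in the text."""
    return U - a


def all_subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def reciprocity_laws_hold():
    """Check the three laws of reciprocity for every pair A, B ⊆ U."""
    subsets = all_subsets(U)
    for a in subsets:
        for b in subsets:
            first = (a <= b) == (complement(a) >= complement(b))
            second = complement(a | b) == complement(a) & complement(b)
            third = complement(a & b) == complement(a) | complement(b)
            if not (first and second and third):
                return False
    return True
```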
We define the power set of V, denoted by 𝒫(V), as the set of all subsets
of V. Note that this operation always creates a larger set from a given set
V, even if V = ∅ (since 𝒫(∅) = {∅}).
Definition of power set: 𝒫(V) := {A | A ⊆ V}
Given a set V, we thus have that (∀y :: y ∈ 𝒫(V) ↔ y ⊆ V)
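For a finite set V the power set can be enumerated directly. A minimal Python sketch (the function name is ours) that uses `itertools.combinations` to list the subsets of every size:

```python
from itertools import chain, combinations


def power_set(v):
    """𝒫(V) := {A | A ⊆ V} for a finite set V, as a frozenset of frozensets."""
    items = list(v)
    return frozenset(
        frozenset(c)
        for c in chain.from_iterable(combinations(items, r)
                                     for r in range(len(items) + 1)))
```

A finite set with n elements has 2ⁿ subsets; in particular 𝒫(∅) = {∅} has one element, one more than ∅.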
We can extend the union of a pair of sets to any collection A of sets (finite
or infinite); the union is then defined as the set of all objects which belong
to at least one set in the collection A. We can do the same for the intersection,
which consists of all objects that belong to every set in A.
Definition: ⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)}
Definition: ⋂A := {x | (∀y :: y ∈ A → x ∈ y)}
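The generalized union and intersection can be sketched in Python for a finite family of finite sets. The names are ours; note the assumption that ⋂ is only computed for a nonempty family, since for A = ∅ the defining condition (∀y :: y ∈ A → x ∈ y) holds for every object x.

```python
def big_union(family):
    """⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)} for a finite family A of finite sets."""
    return frozenset(x for y in family for x in y)


def big_intersection(family):
    """⋂A := {x | (∀y :: y ∈ A → x ∈ y)}, for a nonempty finite family A.

    For A = ∅ the condition holds vacuously for every object, so this
    sketch refuses that case instead of returning a "set of everything".
    """
    members = list(family)
    if not members:
        raise ValueError("intersection of an empty family is not defined here")
    first, *rest = members
    # Any common element must in particular lie in the first member.
    return frozenset(x for x in first if all(x in y for y in rest))
```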
We can divide a set of objects into a partition, that is a family of subsets
that are mutually exclusive and jointly exhaustive. Assume P is a set of
subsets of X.
Definition of partition: P is a partition of X :=
X = ⋃{A | A ∈ P} ∧ (∀A, B : A, B ∈ P : A = B ∨ A ∩ B = ∅)
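The definition of a partition translates almost literally into a Python check (a sketch; `is_partition` is our own name):

```python
def is_partition(p, x):
    """P is a partition of X := X = ⋃P ∧ (∀A, B : A, B ∈ P : A = B ∨ A ∩ B = ∅)."""
    blocks = [frozenset(a) for a in p]
    covers = frozenset().union(*blocks) == frozenset(x)
    disjoint = all(a == b or not (a & b) for a in blocks for b in blocks)
    return covers and disjoint
```

Like the definition it transcribes, this check accepts an empty block; many texts additionally require every block of a partition to be nonempty.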
In this chapter I have made extensive use of [30] in section 2.1 and [17]
in section 2.2.
Chapter 3
Mathematical constructs in
set theory
3.1 Some mathematical concepts
The mathematician is entirely free, within the limits of his imagination,
to construct what world he pleases. What he is to imagine
is a matter for his own caprice; he is not thereby discovering the
fundamental principles of the universe nor becoming acquainted
with the ideas of God. If he can find, in experience, sets of entities
which obey the same logical scheme as his mathematical entities,
then he has applied his mathematics to the external world; he has
created a branch of science.
 J.W.N. Sullivan in Aspects of Science, 1925
Now that we have this apparatus of set theory available, we will see that
it is not just a separate branch of mathematics, but that we can define some
basic mathematical constructs in set theory. In this section we will consider
pairs and the cartesian product, which we need before we can treat relations
(in section 3.2) and functions (in section 3.3).
First we consider the mathematical concept of an ordered pair < a, b >.
Compared to an unordered pair, where two pairs are considered equal if they
have the same elements, we want an ordered pair also to have the property
that the elements appear in the same order:
(∀c, d :: < a, b > = < c, d > ↔ a = c ∧ b = d)
We can now easily verify that the following definition (see [17, chapter
8]) in set theory satisfies the desired property.
Definition of ordered pair¹: < a, b > := {{a}, {a, b}}
As the cartesian product A × B is by definition the set of all ordered
pairs < a, b > with a ∈ A and b ∈ B, we can now use the same definition in
set theory:
Definition of cartesian product: A × B := {< a, b > | a ∈ A ∧ b ∈ B}
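Kuratowski’s pair {{a}, {a, b}} can be built from Python `frozenset`s, which lets us test the characteristic property of ordered pairs on a small domain and form a cartesian product. The helper names below are ours, and testing three values for each variable illustrates the property rather than proving it.

```python
from itertools import product


def kpair(a, b):
    """Kuratowski ordered pair < a, b > := {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def cartesian(a_set, b_set):
    """A × B := {< a, b > | a ∈ A ∧ b ∈ B}, with pairs encoded by kpair."""
    return frozenset(kpair(a, b) for a in a_set for b in b_set)


# Check (∀a, b, c, d :: < a, b > = < c, d > ↔ a = c ∧ b = d) on a small domain.
D = range(3)
pair_property_holds = all(
    (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(D, repeat=4))
```

Note that `kpair(a, a)` collapses to {{a}}, yet the property still holds.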
Let V = {Vᵢ | i ∈ I} be a set of sets. We now define the cartesian product
of a set of sets, denoted by ∏V or ∏_{i∈I} Vᵢ. The definition uses the concept of
a function, which will be introduced on page 29.
Definition of cartesian product of a set of sets:
∏V := {f : I → ⋃_{i∈I} Vᵢ | (∀i : i ∈ I : f(i) ∈ Vᵢ)}
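The product of an indexed family can be enumerated, for finitely many finite sets Vᵢ, by choosing one element from each Vᵢ. In the sketch below the family is passed as a Python dict from indices to sets, and each function f is represented as a set of (i, f(i)) pairs; both encodings are assumptions of ours, anticipating the treatment of functions as relations in section 3.3.

```python
from itertools import product


def family_product(family):
    """∏V := {f : I → ⋃V_i | (∀i : i ∈ I : f(i) ∈ V_i)} for a finite family.

    `family` is a dict mapping each index i ∈ I to the set V_i (our own
    encoding of an indexed family). Each function f is returned as a
    frozenset of (i, f(i)) pairs, i.e. as a set of ordered pairs.
    """
    indices = list(family)
    return frozenset(
        frozenset(zip(indices, choice))
        for choice in product(*(family[i] for i in indices)))
```

If some Vᵢ is empty the product is empty, while the product of the empty family contains exactly one element, the empty function.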
¹ Representation originally due to Kuratowski, see [49, page 294].
3.2 Relations
Mathematicians do not study objects, but relations between objects.
Thus, they are free to replace some objects by others as
long as the relations remain unchanged. Content to them is irrelevant:
they are interested in form only.
J.H. Poincaré
In mathematics, a (binary) relation assigns to each pair of elements, taken
from given sets, either true or false. We formalize this notion in set theory.
Definition of binary relation:
R is a binary relation between X and Y := R ⊆ X × Y
Note: We can easily generalize this definition to n-ary relations: R is an
n-ary relation on X₁, . . . , Xₙ := R ⊆ X₁ × X₂ × . . . × Xₙ, for n ∈ N. We call n
the arity of the relation.
Example: We have already seen the definitions of the subset and proper subset
relations in section 2.2. There we defined the set R ⊆ X × Y implicitly by
using a statement; only those pairs < x, y > are in R for which the statement
holds (here we are in fact using the comprehension principle of page 16). We
will continue to use statements to define relations.
We define the following shorthand notation (sometimes also written in
infix notation as xRy): R(x, y) := < x, y > ∈ R.
The mathematical expression ‘x < y’ is now equivalent to the set-theoretic
expression ‘< x, y > ∈ R’, with R representing the ‘less than’ relation.
Example: The relation < on the naturals (i.e. between N and N) can be
defined as the set
{ < 0, 1 >, < 1, 2 >, < 2, 3 >, . . . ,
  < 0, 2 >, < 1, 3 >, < 2, 4 >, . . . ,
  < 0, 3 >, < 1, 4 >, < 2, 5 >, . . . ,
  . . . }
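On a finite initial segment of the naturals, the relation < can be written out as an explicit set of pairs. In this Python sketch (names ours) tuples play the role of the ordered pairs < x, y >:

```python
def less_than(n):
    """The relation < restricted to {0, ..., n − 1}, as a set of pairs.

    Python tuples stand in for the ordered pairs < x, y >.
    """
    return frozenset((x, y) for x in range(n) for y in range(n) if x < y)


def holds(r, x, y):
    """The shorthand R(x, y) := < x, y > ∈ R."""
    return (x, y) in r
```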
On a relation R we can define the concepts of domain and range.
Definition of domain, range:
dom(R) := {x ∈ X | (∃y : y ∈ Y : R(x, y))}
ran(R) := {y ∈ Y | (∃x : x ∈ X : R(x, y))}
If we define the identity relation of X, we want it to have the usual property
that id_X(x) = x for all x ∈ X (see for example [3, section 1.9.5.b, page
30]). In set theory, we denote the identity relation on V by I_V.
Definition of identity relation: I_V := {< x, y > ∈ V × V | x = y}
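Domain, range and the identity relation follow the same pattern when a relation is stored as a finite set of Python tuples (a sketch; the function names are ours):

```python
def dom(r):
    """dom(R) := {x | (∃y :: R(x, y))}."""
    return frozenset(x for x, _ in r)


def ran(r):
    """ran(R) := {y | (∃x :: R(x, y))}."""
    return frozenset(y for _, y in r)


def identity(v):
    """I_V := {< x, y > ∈ V × V | x = y}, with tuples as ordered pairs."""
    return frozenset((x, x) for x in v)
```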
Assume R is a binary relation on a set X (i.e. R ⊆ X × X). As we did
for operations in section 2.2, we can also define some general properties of
relations. Note that we have already defined an equality relation ‘=’ on X on
page 16. We can either state explicitly on which domain a property holds
(e.g. R is reflexive on X) or leave this implicit (e.g. simply R is reflexive).
Definition of reflexivity:
R is reflexive := (∀x : x ∈ X : R(x, x))
Definition of irreflexivity:
R is irreflexive := (∀x : x ∈ X : ¬R(x, x))
Definition of symmetry:
R is symmetric := (∀x, y : x, y ∈ X : R(x, y) → R(y, x))
Definition of antisymmetry:
R is antisymmetric := (∀x, y : x, y ∈ X : R(x, y) ∧ R(y, x) → x = y)
Definition of transitivity:
R is transitive := (∀x, y, z : x, y, z ∈ X : R(x, y) ∧ R(y, z) → R(x, z))
Definition of connectivity:
R is connective := (∀x, y : x, y ∈ X : R(x, y) ∨ (x = y) ∨ R(y, x))
Definition of equivalence:
R is an equivalence relation := R is reflexive, symmetric and transitive
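For a finite set X, each universally quantified property becomes an `all(...)` over the relevant tuples of elements. The Python sketch below (function names ours, relations as sets of tuples) implements the definitions above, including irreflexivity, which the orderings further on rely on:

```python
from itertools import product


def reflexive(r, xs):
    return all((a, a) in r for a in xs)


def irreflexive(r, xs):
    return all((a, a) not in r for a in xs)


def symmetric(r, xs):
    return all((b, a) in r for a, b in product(xs, repeat=2) if (a, b) in r)


def antisymmetric(r, xs):
    return all(a == b for a, b in product(xs, repeat=2)
               if (a, b) in r and (b, a) in r)


def transitive(r, xs):
    return all((a, c) in r for a, b, c in product(xs, repeat=3)
               if (a, b) in r and (b, c) in r)


def connective(r, xs):
    return all((a, b) in r or a == b or (b, a) in r
               for a, b in product(xs, repeat=2))


def equivalence(r, xs):
    return reflexive(r, xs) and symmetric(r, xs) and transitive(r, xs)
```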
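Note: A relation is called asymmetric if (∀x, y : x, y ∈ X : R(x, y) → ¬R(y, x)).
Asymmetric does not simply mean ‘not symmetric’, and it is not the same as
antisymmetric: a relation is asymmetric if and only if it is both antisymmetric
and irreflexive.
Example: The subset relation is reflexive, antisymmetric (note that the proof
of antisymmetry uses the axiom of extensionality of page 16) and transitive,
but not connective.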
If R is an equivalence relation on a set X, we denote the equivalence class
of x with respect to R as [x]_R.
Definition of equivalence class: [x]_R := {y ∈ X | R(x, y)}
If R is an equivalence relation on X, the quotient set X/R of X modulo
R is the set of equivalence classes [x]_R for all x ∈ X.
Definition of quotient set: X/R := {[x]_R | x ∈ X}
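Equivalence classes and the quotient set can be computed directly for a finite X. The sketch below (names ours) uses congruence modulo 3 on {0, ..., 8} as the equivalence relation:

```python
def equivalence_class(x, r, xs):
    """[x]_R := {y ∈ X | R(x, y)}."""
    return frozenset(y for y in xs if (x, y) in r)


def quotient_set(r, xs):
    """X/R := {[x]_R | x ∈ X}."""
    return frozenset(equivalence_class(x, r, xs) for x in xs)


X = range(9)
# Congruence modulo 3 on X, an equivalence relation.
same_mod_3 = frozenset((a, b) for a in X for b in X if a % 3 == b % 3)
classes = quotient_set(same_mod_3, X)
```

The resulting classes are pairwise disjoint and together cover X, so the quotient set is a partition of X in the sense of section 2.2; this is the case for every equivalence relation.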
We now continue to build on the concept of relations, by categorizing
them based on the properties they have. An important property of relations
is the ability to compare and order elements. Suppose X and Y are sets, and
R is a relation on X.
Definition of (weak) partial ordering: R is a (weak) partial ordering :=
R is reflexive, antisymmetric and transitive (on X)
Definition of quasi ordering: R is a quasi ordering := R is irreflexive and
transitive
Definition of strict partial ordering: R is a strict partial ordering :=
R is irreflexive, antisymmetric and transitive
Definition of (total or linear) ordering: R is a (total or linear) ordering
:= R is irreflexive, antisymmetric, transitive and connective
Definition of well-ordering: R is a well-ordering := R is an ordering on
X and each nonempty subset of X has a least element
Definition of well-foundedness: A set S is well-founded by a relation R
:= S is partially ordered by R and contains no infinite descending chains
A set S contains a set C that is an infinite descending chain iff
C ⊂ S ∧ C ≠ ∅ ∧ C has no minimal element.
Theorem: (without proof) Any subset of a well-founded set is also well-founded.
Now that we can speak of a set whose elements are ordered by a relation
R, we define the well-known concepts of (immediate) successor and predecessor.
Definition of (immediate) predecessor: An element x₁ ∈ X is a predecessor
of an element x₂ ∈ X (with respect to an ordering R on X) :=
R(x₁, x₂) ∧ ¬R(x₂, x₁). x₁ is an immediate predecessor of x₂ if in addition
¬(∃x₃ : x₃ ∈ X ∧ x₃ ≠ x₁ ∧ x₃ ≠ x₂ : R(x₁, x₃) ∧ R(x₃, x₂))
Definition of (immediate) successor: An element x₂ ∈ X is a successor
of an element x₁ ∈ X (with respect to an ordering R on X) :=
R(x₁, x₂) ∧ ¬R(x₂, x₁). x₂ is an immediate successor of x₁ if in addition
¬(∃x₃ : x₃ ∈ X ∧ x₃ ≠ x₁ ∧ x₃ ≠ x₂ : R(x₁, x₃) ∧ R(x₃, x₂))
Note that with these definitions it can easily be proved that if a relation
R on X is an ordering, then each element has at most one immediate predecessor
and at most one immediate successor; the smallest element (if any) has no
predecessor and the largest element (if any) has no successor. The notions of
smallest and largest elements will be introduced hereafter. Existence is not
guaranteed in general: under the usual ordering < on Q, no rational number has
an immediate predecessor or successor. In the literature the immediate successor
or predecessor is sometimes called just successor or predecessor. Sometimes we
also see that the term ‘direct’ is used instead of ‘immediate’, or we simply
speak of the ‘next’ or ‘previous’ value.
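The predecessor definitions can be transcribed for a finite X. In the Python sketch below (names ours) the relation is a set of tuples. Divisibility on {1, 2, 3, 4, 6, 12}, used in the second test, is only a partial ordering, and there 12 has two immediate predecessors, 4 and 6; this is why the uniqueness remark needs an ordering in which all elements are comparable.

```python
def predecessors(x2, r, xs):
    """All x1 ∈ X with R(x1, x2) ∧ ¬R(x2, x1)."""
    return {x1 for x1 in xs if (x1, x2) in r and (x2, x1) not in r}


def immediate_predecessors(x2, r, xs):
    """Predecessors x1 of x2 with no x3 (distinct from both) in between."""
    return {x1 for x1 in predecessors(x2, r, xs)
            if not any((x1, x3) in r and (x3, x2) in r
                       for x3 in xs if x3 not in (x1, x2))}
```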
When R is a partial ordering we often denote it by the symbol ⊑, and
when it is a quasi ordering by ≺. Now we can distinguish elements based on
their order. Let X be a set, partially ordered by ⊑, and let Y be a subset of X.
Definition of minimal element:
x is a minimal element of X := x ∈ X ∧ ¬(∃y : y ∈ X ∧ y ≠ x : y ⊑ x)
Definition of maximal element:
x is a maximal element of X := x ∈ X ∧ ¬(∃y : y ∈ X ∧ y ≠ x : x ⊑ y)
Definition of least element:
x is a least (also called smallest or first) element of X :=
x ∈ X ∧ (∀y : y ∈ X : x ⊑ y)
Definition of greatest element:
x is a greatest (also called largest, maximum or last) element of X :=
x ∈ X ∧ (∀y : y ∈ X : y ⊑ x)
Definition of lower bound:
x is a lower bound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : x ⊑ y)
Definition of upper bound:
x is an upper bound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : y ⊑ x)
Definition of infimum:
x is an infimum for Y in X := x is the greatest lower bound for Y in X
Definition of supremum:
x is a supremum for Y in X := x is the least upper bound for Y in X
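Example: Let X = {4, 6, 12, 24, 36} and R(x, y) := x is a divisor of y. Then R is a partial order (but not strict) and also a quasi order, but not a (total) order. 4 and 6 are minimal elements of X, but X has no least element. X has no lowerbound in X itself, since neither of 4 and 6 divides the other. Regarded as a subset of N (ordered by the same divisibility relation), however, X has the lowerbounds 1 and 2 in N, and 2 is the infimum of X in N.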
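The claims of this example can be verified by brute force. Below is a minimal Python sketch, assuming finite sets and representing ⪯ as a Boolean function; the helper names are hypothetical. To find lowerbounds in N it only searches the window 1, ..., 36, which suffices because every lowerbound of X must divide 4.

```python
def divides(x: int, y: int) -> bool:
    """R(x, y) := x is a divisor of y."""
    return y % x == 0


def minimal_elements(S, le):
    """Elements of S with no other element of S below them."""
    return {x for x in S if not any(y != x and le(y, x) for y in S)}


def least_elements(S, le):
    """Elements of S below every element of S (at most one for a partial order)."""
    return {x for x in S if all(le(x, y) for y in S)}


def lower_bounds(Y, universe, le):
    """All x in the universe that lie below every element of Y."""
    return {x for x in universe if all(le(x, y) for y in Y)}


def infimum(Y, universe, le):
    """Greatest lowerbound of Y in the universe, or None if it does not exist."""
    # a greatest element is a least element for the reversed relation
    greatest = least_elements(lower_bounds(Y, universe, le), lambda a, b: le(b, a))
    return next(iter(greatest), None)


X = {4, 6, 12, 24, 36}
N_window = set(range(1, 37))  # a finite part of N; every lowerbound of X divides 4

print(sorted(minimal_elements(X, divides)))          # [4, 6]
print(least_elements(X, divides))                    # set(): X has no least element
print(lower_bounds(X, X, divides))                   # set(): no lowerbound inside X
print(sorted(lower_bounds(X, N_window, divides)))    # [1, 2]
print(infimum(X, N_window, divides))                 # 2
```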
28 CHAPTER 3. MATHEMATICAL CONSTRUCTS IN SETTHEORY
The so-called least number principle says that any non-empty subset of the natural numbers has a least element. This principle can be shown (a proof can be found in [59, page 7]) to be equivalent to the principles of weak and strong induction, which will be introduced in section 3.4.
Example: The relation < on the naturals is an example of a total ordering on N. From the so-called least number principle we can conclude that N is also well-ordered by <. We prove the latter.
Proof: We know that < is an ordering on N. We first show by induction on n that every non-empty subset A of {0, . . . , n} has a least element. For n = 0 the only such subset is A = {0}, with least element 0. For the induction step, let A ⊆ {0, . . . , n + 1} be non-empty. If A ∩ {0, . . . , n} = ∅, then A = {n + 1} and n + 1 is the least element of A. If A ∩ {0, . . . , n} ≠ ∅, the induction hypothesis gives a least element of A ∩ {0, . . . , n}; it is also a least element of A, because the only element of A that may lie outside {0, . . . , n} is n + 1, which is larger. Finally, let A ⊆ N be an arbitrary non-empty set and pick some a ∈ A. Then A ∩ {0, . . . , a} is a non-empty subset of {0, . . . , a}, so it has a least element m, and m is also the least element of A, since every element of A outside {0, . . . , a} is larger than a ≥ m.
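The last step of the proof is effectively an algorithm: given a membership test for A and one known element a ∈ A, the least element is found by scanning 0, 1, . . . , a. A small Python sketch of this idea; the function name and the predicate-based representation of A are assumptions made for the illustration.

```python
from typing import Callable


def least_element(in_A: Callable[[int], bool], a: int) -> int:
    """Least element of A ⊆ N, given a membership test and a witness a ∈ A.

    Only A ∩ {0, ..., a} is searched: it is finite and non-empty, and every
    element of A outside it is larger than a.
    """
    if a < 0 or not in_A(a):
        raise ValueError("a must be a natural number in A")
    for m in range(a + 1):
        if in_A(m):
            return m
    raise AssertionError("unreachable: a itself lies in A")


# Usage: A = multiples of 7 that are larger than 20, with witness 98.
def in_A(n: int) -> bool:
    return n > 20 and n % 7 == 0


print(least_element(in_A, 98))  # 21
```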
3.3. FUNCTIONS 29
3.3 Functions
In mathematics, a function maps each element of an input set to exactly one element of an output set; in other words, it is a special kind of relation that indicates for each pair < x, y > of elements of the input and output set whether it belongs to the function or not. More precisely, f is a function or mapping from X to Y means that f assigns to each x ∈ X a uniquely determined y ∈ Y, notation f(x) = y. We can define this notion in set theory by using a relation between X and Y such that for each x ∈ X there is a unique y ∈ Y such that < x, y > ∈ f.
Definition of function: f is a function from a set X to a set Y, notation
f : X → Y := f ⊆ X × Y ∧ (∀x : x ∈ X : (∃!y : y ∈ Y : < x, y > ∈ f))
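For finite sets the definition can be checked directly, with f given as a set of pairs < x, y > (Python tuples). A minimal sketch; `is_function` is a hypothetical helper name.

```python
def is_function(f: set, X: set, Y: set) -> bool:
    """Check f ⊆ X × Y and (∀x : x ∈ X : (∃!y : y ∈ Y : <x, y> ∈ f))."""
    if not all(x in X and y in Y for (x, y) in f):
        return False  # f is not a subset of X × Y
    # every x in X must occur as the first component of exactly one pair
    return all(sum(1 for (a, _) in f if a == x) == 1 for x in X)


X = {1, 2, 3}
Y = {"a", "b"}
print(is_function({(1, "a"), (2, "a"), (3, "b")}, X, Y))            # True
print(is_function({(1, "a"), (1, "b"), (2, "a"), (3, "a")}, X, Y))  # False: 1 has two images
print(is_function({(1, "a"), (2, "b")}, X, Y))                      # False: 3 has no image
```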
The deﬁnitions of domain and range as given in the subsection about
relations can now also be used for functions. We now introduce a notation
for the set of all functions f : X → Y .
Definition of $Y^X$: $Y^X := \{f \in \mathcal{P}(X \times Y) \mid f \text{ is a function from } X \text{ to } Y\}$
As we did before for relations and operations, we now deﬁne some general
properties for functions.
Definition of injective: f : X → Y is injective or an injection :=
$(\forall x_1, x_2 : x_1, x_2 \in X : x_1 \neq x_2 \rightarrow f(x_1) \neq f(x_2))$
Definition of surjective: f : X → Y is surjective or a surjection :=
(∀y : y ∈ Y : (∃x : x ∈ X : y = f(x)))
Definition of bijective:
f : X → Y is bijective or a bijection := f is surjective and f is injective
If f is bijective, f is also called a (one-to-one) correspondence between
X and Y.
Example: We have the following property:
f : X → Y is surjective ↔ Ran(f) = Y .
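A sketch of the three properties for finite functions, represented (by assumption) as Python dictionaries whose keys form the domain X; the codomain Y is passed separately, and the surjectivity test uses the property Ran(f) = Y from the example above. The helper names are hypothetical.

```python
def is_injective(f: dict) -> bool:
    """Distinct arguments have distinct images."""
    return len(set(f.values())) == len(f)


def is_surjective(f: dict, Y: set) -> bool:
    """Ran(f) = Y, assuming every value of f already lies in Y."""
    return set(f.values()) == set(Y)


def is_bijective(f: dict, Y: set) -> bool:
    return is_injective(f) and is_surjective(f, Y)


print(is_bijective({0: "a", 1: "b", 2: "c"}, {"a", "b", "c"}))  # True
print(is_injective({0: "a", 1: "a", 2: "b"}))                   # False
print(is_surjective({0: "a", 1: "a", 2: "b"}, {"a", "b"}))      # True
print(is_surjective({0: "a", 1: "b"}, {"a", "b", "c"}))         # False
```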
Example: f : N → [−2π, 2π], with f(x) = sin(x), is a function and a relation.
g : [−2π, 2π] → N, with g(x) = y iff x = sin(y), is a relation, not a function: for example, there is no y ∈ N with sin(y) = 2, so g assigns no value to 2.
We will now consider two special kinds of functions: the identity function
and the sequence.
Definition of sequence:
s is a sequence of X := s is a function from N to X (i.e. $s \in X^{\mathbb{N}}$)
Definition of identity function:
The identity function $id_X$ := $id_X : X \rightarrow X$ and $(\forall x : x \in X : id_X(x) = x)$
We now introduce some operations on functions in settheory. We can
easily check that these deﬁnitions correspond to mathematical operations.
Definition of composition: The composition $g \circ f$ of two functions $f : A \rightarrow B$ and $g : B \rightarrow C$ := the function $g \circ f : A \rightarrow C$ with $(g \circ f)(x) = g(f(x))$, for all $x \in A$
Definition of inverse function: The inverse of a bijection $f : X \rightarrow Y$ := the function $f^{-1} : Y \rightarrow X$ with $(\forall x, y : x \in X \wedge y \in Y : f^{-1}(y) = x \leftrightarrow y = f(x))$
Definition of restricted function: The restriction of a function $f : X \rightarrow Y$ to $X'$, with $X' \subseteq X$ := the function $f|_{X'} : X' \rightarrow Y$ with $(\forall x : x \in X' : f|_{X'}(x) = f(x))$
Just as in algebra, we can now combine a set and relations on that set
into a structure.
Definition of (relational) structure: $\langle X, R_0, \ldots, R_p \rangle$ is a (relational) structure := $X$ is a set and $R_0, \ldots, R_p$ are relations on $X$
The concept of a structure enables us to abstract from the exact set and
relations, and reason about sets of structures instead. There also is a useful
deﬁnition for equivalence of structures, called isomorphism.
Let $R = \langle X, R_0, \ldots, R_p \rangle$ and $S = \langle Y, S_0, \ldots, S_p \rangle$ be two structures, such that $(\forall i : 0 \le i \le p : \text{the arity of } R_i \text{ and } S_i \text{ is } n_i + 1)$.
Definition of isomorphism: $f$ is an isomorphism between $R$ and $S$ := $f$ is a bijection from $X$ to $Y$ and
$(\forall i : 0 \le i \le p : (\forall x_0, \ldots, x_{n_i} : x_0, \ldots, x_{n_i} \in X : R_i(x_0, \ldots, x_{n_i}) \leftrightarrow S_i(f(x_0), \ldots, f(x_{n_i}))))$
With the notion of isomorphism, we can now abstract over structures. When two structures are similar (the sets are of the same size and the relationships between the elements in one structure are retained between images of those elements in the other structure), we call them isomorphic.
Definition of isomorphic: Two structures R and S are isomorphic, notation R ≅ S := there exists an isomorphism from R to S
Deﬁnition of automorphism:
f is an automorphism of R := f is an isomorphism from R to R
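Example: An isomorphism from structure $\langle \mathbb{N}, < \rangle$ to $\langle \mathbb{N}_{even}, < \rangle$ is given by $f : \mathbb{N} \rightarrow \mathbb{N}_{even}$, with $f(n) = 2n$. $f$ is not an isomorphism from $\langle \mathbb{N}, \oplus \rangle$ to $\langle \mathbb{N}, < \rangle$, with $a \oplus b$ := $b$ divides $a$.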
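For finite structures the definition of isomorphism can be tested exhaustively. The sketch below is an illustration under our own assumptions: relations are Python sets of tuples, the arities are passed explicitly, and the infinite structures of the example are replaced by finite windows (N restricted to {0, 1, 2, 3} or {1, 2, 3, 4}, together with its image under f(n) = 2n).

```python
from itertools import product


def is_bijection(f: dict, X: set, Y: set) -> bool:
    return set(f) == X and set(f.values()) == Y and len(set(f.values())) == len(f)


def is_isomorphism(f: dict, X: set, R: list, Y: set, S: list, arities: list) -> bool:
    """f maps <X, R_0, ..., R_p> isomorphically onto <Y, S_0, ..., S_p>.

    arities[i] is the arity of R_i and S_i (n_i + 1 in the text).
    """
    if not is_bijection(f, X, Y) or not len(R) == len(S) == len(arities):
        return False
    for R_i, S_i, n in zip(R, S, arities):
        # R_i(x_0, ..., x_{n-1}) must hold exactly when S_i(f(x_0), ..., f(x_{n-1})) holds
        for xs in product(X, repeat=n):
            if (xs in R_i) != (tuple(f[x] for x in xs) in S_i):
                return False
    return True


def less_than(A: set) -> set:
    return {(a, b) for a in A for b in A if a < b}


X = {0, 1, 2, 3}
Y = {0, 2, 4, 6}
f = {n: 2 * n for n in X}
print(is_isomorphism(f, X, [less_than(X)], Y, [less_than(Y)], [2]))  # True

# a ⊕ b := b divides a, on {1, 2, 3, 4}; n -> 2n is not an isomorphism onto <{2, 4, 6, 8}, <>
X1, Y1 = {1, 2, 3, 4}, {2, 4, 6, 8}
f1 = {n: 2 * n for n in X1}
oplus = {(a, b) for a in X1 for b in X1 if a % b == 0}
print(is_isomorphism(f1, X1, [oplus], Y1, [less_than(Y1)], [2]))  # False
```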
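Example: The function $g : \mathbb{R}^+ \rightarrow \mathbb{R}$ with $g(x) = \log(x)$ is an isomorphism between $\langle \mathbb{R}^+, * \rangle$ and $\langle \mathbb{R}, + \rangle$, because $g$ is a bijection and for all $r_1, r_2 \in \mathbb{R}^+$, $\log(r_1 * r_2) = \log(r_1) + \log(r_2)$.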
Example: An automorphism of $\langle A, R_0, \ldots, R_p \rangle$ is the identity function $id_A : A \rightarrow A$, so $id_A = \{\langle a, a \rangle \mid a \in A\}$. Also, the function $f(x) = 2x^3$ is an automorphism of $\langle \mathbb{R}, < \rangle$.
3.4 Induction Methods
There is a tradition of opposition between adherents of induction
and deduction. In my view it would be just as sensible for the two
ends of a worm to quarrel.
 A. Whitehead, quoted in [76]
3.4.1 Induction
Induction is a method of reasoning from a part to a whole, from particulars to generals, or from the individual to the universal. It should not be confused with the mathematical principle of induction (treated in section 3.4.3). In ordinary induction we examine a certain number of cases and then generalize. Reasoning by analogy, where a conclusion is drawn from an analogous situation, is also a primitive form of induction (see [23, page 6]).
Example of inductive reasoning:²
Coffee shop burger no. 1 was greasy . . .
Coffee shop burger no. 2 was greasy . . .
⋮
Coffee shop burger no. 100 was greasy . . .
Therefore, all coffee shop burgers are greasy (or: the next coffee shop burger will be greasy).
So in induction the conclusion contains information that was not contained in the premisses. This is the source of uncertainty in inductions: inductions are strengthened as confirming instances pile up, but they can never bring certainty (unless every possible case is actually examined, in which case they become deductions). As said in [49, page 366], the broad difference between deductive and inductive reasoning is that in deduction the conclusion asserts less than the premisses, whereas in induction it asserts more. In chapter 14, section 3 of [49] there is a more detailed treatment of inductive reasoning, including a distinction between determinative and conceptual induction. In both these kinds of induction, the conclusion goes beyond the premisses (or the evidence).
2
Example from: Peter Suber, Philosophy department, Earlham College.
3.4. INDUCTION METHODS 33
3.4.2 Deduction
Mathematics, in its widest signiﬁcance, is the development of all
types of formal, necessary, deductive reasoning.
 A. Whitehead, quoted in [100]
In contrast to induction, deduction is a method of reasoning that is based on a rigorous proof: a derivation (using fixed rules called a system of logic) of one statement (the conclusion) from one or more statements (the premisses), i.e. a chain of statements, each of which is either a premise or a consequence of a statement occurring earlier in the proof. In deductive reasoning, we are
not directly concerned with the truth of the conclusion but rather whether
the conclusion does or does not follow from the premisses. If the conclusion
follows from the premisses, we say that our reasoning is valid; if it does not
we say that our reasoning is invalid.
The Greeks regarded deductive reasoning, not empirical procedures, as the method for establishing mathematical facts. This usage is a generalization of what the Greek philosopher Aristotle called the syllogism (see [49, chapter 1, sections 5 and 6]), but a syllogism is now recognized as merely a special case of a deduction. Also, the traditional view that deduction proceeds from the general to the specific has been abandoned as incorrect by most logicians. Some experts regard all valid inferences as deductive in form and for this and other reasons reject the supposed contrast between deduction and induction. The German mathematician Hilbert contributed greatly to deductive reasoning, as we will see when we introduce his proof theory (also known as the axiomatic method) in chapter 6. Logic, in a mathematical context, can be seen as the theory of the formal structure of deductive reasoning. The logic of Hilbert’s metamathematics (see section 6.1) and Russell’s Principia Mathematica (see section 7.1) are forms of reasoning with deductive certainty, although others have proposed different formalizations of deductive logic (see [49, page 121]). Originally based on Aristotle’s logic, the deductive argument has become more subtle and complex and is now based on modern symbolic logic.
3.4.3 The principle of induction
Informal
The principle of induction, also known as mathematical induction, is an important method for proving theorems. It was even used by Peano to define the concept of natural numbers (see section 4.1, axiom 3). ‘Mathematical induction’ is unfortunately named, for it is unambiguously a form of deduction. The name was probably inspired by the fact that, just like induction, it generalizes to a whole set from a smaller sample. But, as we will see, mathematical induction concludes with deductive certainty.
The informal structure of the proof of a theorem by mathematical induc
tion is fairly simple:
1) Basis. Prove that the theorem holds for a specific case (which often is minimal for a given ordering of the elements). This case is also called the base case.
2) Induction step. Prove a rule that says that if the theorem holds for an arbitrary element, it is true for the next case. This often is a rule of heredity that tells us that the theorem is true for the immediate successor of an arbitrary element if it is true for the arbitrary element itself. The claim that the theorem is true for an arbitrary element is called the induction hypothesis.
3) Conclusion. Together, 1 and 2 imply that the theorem holds for all cases starting with the base case. If you didn’t use the minimal case in step 1, then you have proven only that the theorem holds for that case and its successors, not for all possible cases.
The induction step can take two forms, which correspond to two forms of mathematical induction. Again we assume there is an ordering of the elements with +1 the immediate successor relation.
Weak: prove that if the theorem holds for an arbitrary element n, then it holds for the element n + 1
Strong: prove that if the theorem holds for all elements up to some arbitrary element n, then it holds for the element n + 1
We will now formally state the principle of induction. This is important, since many mistakes are made in applying the principle. It does not go without saying that if we are to use mathematical induction to prove that some theorem applies to ‘all possible cases’, then those cases must somehow be enumerable and in some way linked to the integers. And we have to be able to speak about the minimal case, the n-th case, the successor of a given case, etc.
Formal
Suppose that we want to prove that a property ϕ(s) holds for all s ∈ S. The induction principle assumes that S is a well-founded set and that every element except the smallest has an immediate predecessor. A set satisfying this condition is also called inductive. The structure of an inductive set in fact resembles that of the naturals: if we have the axioms (see the Peano axioms in section 4.1) ‘0 is in N’ and ‘if x is in N then x + 1 is in N’, the set N is inductive. In case the set S is the naturals, we also refer to the principle as natural induction.
The principle presupposes the following two conditions:
$A_l$: S is a set, well-founded by relation R (such that ‘+’ denotes the immediate successor of an element with respect to the relation R), and with smallest element e
$B_l$: Every element except e has a (unique) immediate predecessor, and ϕ is a property of elements of S
If $A_l$ and $B_l$ hold, we can use the induction principle.
Definition of the (weak) (mathematical) induction principle:
if
$C_l$: ϕ(e) (i.e. e has property ϕ)
$D_l$: (∀s : s ∈ S : ϕ(s) → ϕ(s+)) (i.e. if s ∈ S has property ϕ, then the (unique) immediate successor of s also has property ϕ)
then the property ϕ holds for every element in S
Step $C_l$ is also called the base of a proof by induction, step $D_l$ is also called the induction step, and ϕ(s) is called the induction hypothesis.
Proof: Suppose S is a well-founded set and every element except the smallest, denoted e, has an immediate predecessor, and suppose that a property ϕ is true for e, as well as for the immediate successor s+ ∈ S if it is true for s ∈ S. We now prove by contradiction that ϕ holds for all s ∈ S. Suppose that ϕ is not true for all s ∈ S. Let N be the set of elements of S for which ϕ is not true, i.e. N = {s ∈ S | ¬ϕ(s)}. By the theorem of page 26 we also know that if S is well-founded, any subset of S is also well-founded, thus the non-empty set N contains a smallest element n. If n = e, we have a contradiction with $C_l$. If n > e, n has an immediate predecessor, denoted n−. Since n is the smallest element for which ϕ doesn’t hold, ϕ must hold for n−. But then by $D_l$, ϕ must also hold for the immediate successor of n−, that is n: contradiction. Thus ϕ must be true for all s ∈ S.
As we mentioned before, this principle can be generalized in several ways. One way is to prove in step $C_l$ that ϕ holds for a (possibly non-minimal) case b ∈ S. In step $D_l$ we then show that (∀s : s ∈ S ∧ s ≥ b : ϕ(s) → ϕ(s+)). The conclusion then is that the property ϕ holds for all elements in S that are ordered larger than or equal to b.
We now show, by means of a counterexample, why the additional property $B_l$, that every element except the smallest must have an immediate predecessor, is necessary for the induction principle.
Consider the natural numbers with the ordering ◁ defined as follows:
• if n and m are both even, then n ◁ m if n < m
• if n and m are both odd, then n ◁ m if n < m
• if n is even and m is odd, we always define n ◁ m
We can check that N is well-founded by ◁, but not every element (for example 1) has an immediate predecessor. We take the property ϕ that an element is even. The smallest element in the ordering is 0, which is even. Also, if s has property ϕ then so does the immediate successor of s. That is because in our ordering, the immediate successor of an even number is always the next even number, never an odd number, and if s has property ϕ, then s must be even.
Therefore (with only conditions $A_l$, $C_l$ and $D_l$ holding) every natural number would be even: contradiction!
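The counterexample can be made concrete. The Python sketch below encodes the ordering ◁ as a comparison function (the name `before` is our own) and checks, on a finite window of numbers only, that 0 lies below everything and that every even number e ◁ 1 is followed by the even number e + 2, which still lies below 1; so no element is an immediate predecessor of 1. A finite check like this illustrates, but does not prove, the claims in the text.

```python
def before(n: int, m: int) -> bool:
    """n ◁ m: all even numbers come first, then all odd numbers; within each group the usual <."""
    if n % 2 == m % 2:
        return n < m
    return n % 2 == 0  # n even and m odd


WINDOW = range(0, 1000)

# 0 is the smallest element of the ordering
assert all(before(0, m) for m in WINDOW if m != 0)

# every even e lies below 1, but e + 2 lies strictly between e and 1
assert all(before(e, 1) and before(e, e + 2) and before(e + 2, 1) for e in WINDOW if e % 2 == 0)

# no odd number lies below 1, since 1 is the least odd number
assert not any(before(m, 1) for m in WINDOW if m % 2 == 1)

print("checks passed on the window 0..999")
```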
There is, however, a weaker principle, called transfinite induction, which (suitably stated) does apply to every well-ordered set. But first we consider a stronger principle that is based on the same assumptions ($A_l$ and $B_l$) as the weak induction principle.
Principle of strong (mathematical) induction: The same as for (weak) induction, but instead of $C_l$ and $D_l$ with
$E_l$: (∀x : x ∈ S : (∀y : y ∈ S : R(y, x) → ϕ(y)) → ϕ(x)) (i.e. for all x ∈ S we have ϕ(x) if all R-predecessors y of x have property ϕ)
Sometimes this is also informally stated using the infamous three dots as
(∀s : s ∈ S : (ϕ(e) ∧ ϕ(e+) ∧ . . . ∧ ϕ(s)) → ϕ(s+)).
Proof: Suppose ⟨X, R⟩ is a structure such that $A_l$, $B_l$ and $E_l$ hold. Again we use proof by contradiction, and assume (∃x : x ∈ X : ¬ϕ(x)). Thus {x ∈ X | ¬ϕ(x)} is non-empty and has a smallest element e′ (since ⟨X, R⟩ is well-founded). We now have ¬ϕ(e′) ∧ (∀z : z ∈ X : R(z, e′) → ϕ(z)). According to $E_l$ (substitute z for y, X for S, and take e′ for x) we then have ϕ(e′): contradiction.
Note that the base case is not really left out, since it is implicitly present in the quantification (take e for x). This form of induction, when applied to ordinals (ordinals form a well-ordered and hence well-founded set and are introduced in section 3.8.2), is called transfinite induction.
Principle of transfinite induction³: The same as for strong induction, but instead of $A_l$ and $B_l$ as assumptions, it can be applied to any set S that is well-ordered by a relation R, and with smallest element e.
3
Sometimes this principle is called the Principle of Complete Induction, for example in [4], but this is less common.
Examples of such sets are the ordinals or cardinals, or even the class of all ordinals. A proof by transfinite induction typically needs to distinguish three cases:
1. s is a minimal element
2. s has an immediate predecessor (i.e. the set of elements which are smaller than s has a largest element). In this case we can apply normal induction.
3. s has no immediate predecessor (i.e. s is a so-called limit ordinal, see also section 3.8.2)
The case for limit ordinals is typically approached by noting that a limit ordinal b is (by definition) the union of all ordinals a < b and using this fact to prove ϕ(b) assuming that ϕ(a) holds true for all a < b.
Proof: The proof of the principle of transﬁnite induction is similar to the
proof of the strong induction principle.
Clearly, all three given principles are equivalent, since we proved each of them to be true. These proofs, however, rest on an underlying set of axioms (the so-called ZF axioms and the Peano axioms, which will be introduced in section 5.3 and chapter 4 respectively). Without these axioms (to be exact, without Peano’s induction axiom), we cannot directly prove the principles to be true⁴. In that case we can prove the equivalence of the principles by showing that they imply each other. As an example, we now prove that (mathematical) induction is a special case of transfinite induction, for the set of natural numbers. To prove this it suffices to show that ($C_l$ and $D_l$) ↔ $E_l$.
4
Induction over the natural numbers does not require the Axiom of Choice: if N is constructed in Zermelo-Fraenkel set theory as the smallest set that contains 0 and is closed under successor (such a set exists by the Axiom of Infinity), the induction principle for N is a theorem of ZF. The statements that cannot be proven from the other axioms of ZF alone form a different group: the ‘Axiom of Choice’, ‘Zorn’s Lemma’ and the ‘Well-Ordering Theorem’ (every set can be well-ordered, see section 3.8.2) are all equivalent, meaning that if you assume one of them to be true, the others follow as consequences, but none of them can be proven from the other axioms of ZF. There are also other equivalent statements that are sometimes used (such as Zermelo’s postulate), and it is a nice exercise to prove the equivalence of these statements.
Normal induction (IND):
(∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n : n ∈ N : ϕ(n)))
Transﬁnite induction (TFIND):
(∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)) → (∀m : m ∈ N : ψ(m)))
We can prove the equivalence of IND and TFIND in two ways: in a constructive way or with a proof by contradiction. We give both proofs.
Proof by Contradiction: (from: [17])
It suﬃces to prove that IND’ ≡ TFIND’, with
IND’ ≡ (∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)))
TFIND’ ≡ (∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)))
Proof of TFIND’ → IND’: Assume ϕ is a property. We assume TFIND’, and instantiate ψ with the property ϕ. We now want to prove IND’. If we take q = 0, (∀p : p ∈ N : p < 0 → ϕ(p)) is trivially true. Thus we have ϕ(0). We now prove by contradiction that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). Assume k ∈ N and ϕ(k) ∧ ¬ϕ(k + 1). Since TFIND’ with q = k + 1 would otherwise give ϕ(k + 1), its condition (∀p : p ∈ N : p < q → ϕ(p)) must be false for q = k + 1: ¬(∀p : p ∈ N : p < k + 1 → ϕ(p)), i.e. (∃p : p ∈ N : p < k + 1 ∧ ¬ϕ(p)). Let s ∈ N be the smallest number such that s < k + 1 ∧ ¬ϕ(s), so that (∀r : r ∈ N : r < s → ϕ(r)). But then we would have ϕ(s) according to TFIND’ (namely if we take s for q and r for p), contradiction. Now we have proved that (∀ϕ :: (∀k : k ∈ N : ϕ(k) → ϕ(k + 1))), and since we have already proven (∀ϕ :: ϕ(0)), we have IND’.
Proof of IND’ → TFIND’: Assume IND’, and instantiate ϕ with ψ. For all properties ψ we have to prove (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)). First we prove this for q = 0. If we take q = 0, we have (∀p : p ∈ N : p < 0 → ψ(p)) → ψ(0), i.e. ψ(0). This is true by the assumption of IND’. Now we prove this for q > 0. Suppose we have (∀p : p ∈ N : p < q → ψ(p)); in particular ψ(q − 1) holds. By IND’ we also know that (∀k : k ∈ N : ψ(k) → ψ(k + 1)), so with k = q − 1 we obtain ψ(q), for all q > 0. Hereby we have proved TFIND’.
Constructive Proof:
Proof of TFIND → IND: Assume TFIND, and let ϕ be a property. We
now need to prove that ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n :
n ∈ N : ϕ(n)). Assume ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). We
want to use TFIND to conclude (∀n : n ∈ N : ϕ(n)). TFIND gives us:
(∀k : k ∈ N : (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k)). Let k ∈ N. We now have
that (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k). If k = 0, (∀l : l ∈ N : l < k → ϕ(l))
is trivially true since the range of l is empty. Thus ϕ(k) holds for k = 0.
Assume k > 0, and (∀l : l ∈ N : l < k → ϕ(l)). This means ϕ(k − 1) holds
(since k −1 ∈ N). But we have assumed that (∀k : k ∈ N : ϕ(k) → ϕ(k +1)).
Thus ϕ(k) holds also for k > 0.
Proof of IND → TFIND: Assume ϕ is a property. Also assume that (i): (∀k : k ∈ N : (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k)). Let s(k) := (∀l : l ∈ N : l < k → ϕ(l)), for all k ∈ N. We prove (∀n : n ∈ N : ϕ(n)) by first proving that (∀n : n ∈ N : s(n)) by using IND, and subsequently that (∀n : n ∈ N : s(n) → ϕ(n)). Clearly, s(0) holds trivially since the range of l is empty in that case. Suppose s(k) holds. Since s(k + 1) ≡ s(k) ∧ ϕ(k), we can conclude s(k + 1) because ϕ(k) follows from (i) and s(k). Now we have s(0) ∧ (∀k : k ∈ N : s(k) → s(k + 1)), and thus (by using IND) (∀n : n ∈ N : s(n)). And, by the definition of s, (i) gives us (∀n : n ∈ N : ϕ(n)).
Structural Induction
In many cases we do not want to prove properties about the integers or
similar wellordered sets. In such cases straight induction is not always useful.
However, forms of induction can also be appropriate when trying to prove
properties about structures deﬁned recursively. This generalized induction
principle is known as structural induction. It is useful when objects are built
up from more primitive objects: if we can show the primitive objects have
the desired property, and that the act of building preserves that property,
then we have shown that all objects must have the property. The inductive hypothesis (i.e., the assumption) is to assume that something is true for ‘simpler’ forms of an object and then prove that it holds for ‘more complex’ forms. ‘Complexity’ can be defined in several ways: the most common way is to say that one object is more complex than another if it includes that other object as a subpart, but this need not always be the case.
A general treatment of recursively deﬁned structures (formal deﬁnition
of structural induction over recursive datatypes) will be presented in a later
version of this report.
Example: In a later version of this report we will show that mathematical induction is an instance of the general notion of structural induction over values of recursively defined types.
Example: As an example of the use of mathematical induction we prove the binomial theorem. The binomial theorem states that for all x, y ∈ R, and n ∈ N we have
$$EQ \equiv (x + y)^n = \sum_{j=0}^{n} \binom{n}{j} x^{n-j} y^j$$
We call the left-hand side of this equality LHS, and the right-hand side RHS, and abbreviate the equality by EQ. We assume two real numbers x and y and prove EQ by induction on n.
Basis case: For n = 0, EQ clearly is correct, since both sides are 1. For some reason, most textbooks take n = 1 as the basis, in which case LHS is simply x + y, and RHS is
$$\binom{1}{0} x^{1-0} y^0 + \binom{1}{1} x^{1-1} y^1 = x + y$$
Induction case: We assume EQ is true for n = k and have to show that it is then also true for n = k + 1:
$$(x + y)^{k+1} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^j$$
First, we rewrite the left side of this equation; in the step that replaces $(x + y)^k$ by a sum we use the induction hypothesis:
$$LHS = (x + y)^{k+1} = (x + y)^k (x + y) = \left(\sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^j\right)(x + y) = \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^j + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}$$
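In rewriting the right side of the equation, we use Pascal’s identity:
$$(\forall k, n : k, n \in \mathbb{N} \wedge 0 < k \le n : \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k})$$
The case k = n is needed as well, because below the identity is applied with j running up to k. We first prove the latter:
$$\binom{n}{k-1} + \binom{n}{k} = \frac{n!}{(k-1)!\,(n-k+1)!} + \frac{n!}{k!\,(n-k)!} = \frac{n!\,k}{k!\,(n-k+1)!} + \frac{n!\,(n-k+1)}{k!\,(n-k+1)!} = \frac{n!\,(k + (n-k+1))}{k!\,(n-k+1)!} = \frac{(n+1)!}{k!\,(n+1-k)!} = \binom{n+1}{k}$$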
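A quick numerical sanity check of Pascal’s identity, including the boundary case k = n, can be done with Python’s `math.comb` (available from Python 3.8). This is no substitute for the proof above; the range of n is an arbitrary choice.

```python
from math import comb


def pascal_holds(n: int, k: int) -> bool:
    """C(n + 1, k) == C(n, k - 1) + C(n, k)."""
    return comb(n + 1, k) == comb(n, k - 1) + comb(n, k)


# all 0 < k <= n for n up to 40
print(all(pascal_holds(n, k) for n in range(1, 41) for k in range(1, n + 1)))  # True
```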
Now we rewrite RHS. We split out the j = 0 and j = k + 1 terms before applying Pascal’s identity:
$$RHS = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^j = x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k+1}{j} x^{k+1-j} y^j = x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \left(\binom{k}{j} + \binom{k}{j-1}\right) x^{k+1-j} y^j = x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k}{j} x^{k+1-j} y^j + \sum_{j=1}^{k} \binom{k}{j-1} x^{k+1-j} y^j$$
We can now bring $x^{k+1}$ into the first sum (as the j = 0 term), and $y^{k+1}$ into the second sum (as the j = k + 1 term). This gives
$$RHS = \sum_{j=0}^{k} \binom{k}{j} x^{k+1-j} y^j + \sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^j$$
and
$$LHS = \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^j + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}$$
The first sums of LHS and RHS are the same, and we can see that the second sums are also equal by a change of the dummy variable (let i = j − 1):
$$\sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^j = \sum_{i=0}^{k} \binom{k}{i} x^{k-i} y^{i+1}$$
So LHS = RHS, and we can conclude that EQ holds for all x, y ∈ R and n ∈ N.
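Similarly, the binomial theorem can be spot-checked with exact integer arithmetic. A small Python sketch; the ranges of x, y and n are arbitrary choices, and integers stand in for the real numbers of the theorem.

```python
from math import comb


def binomial_rhs(x: int, y: int, n: int) -> int:
    """RHS of EQ: the sum over j = 0..n of C(n, j) x^(n-j) y^j."""
    return sum(comb(n, j) * x ** (n - j) * y ** j for j in range(n + 1))


ok = all((x + y) ** n == binomial_rhs(x, y, n)
         for x in range(-5, 6) for y in range(-5, 6) for n in range(12))
print(ok)  # True
```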
Example: We give an example of a proof about binary trees using structural induction. First we define a data structure for binary trees. For this example we will use a definition in the notation of the language Z to describe recursive data structures. The structure of a binary tree is well known: a tree is either a leaf or made up of two subtrees glued together by a node.
TREE ::= leaf | node⟨⟨TREE × TREE⟩⟩
An example of such a tree is node(leaf, node(node(leaf, leaf), leaf)). We now define the size of a tree by counting both the leaves and the nodes. The basic idea of the definition is that we define the size of a tree inductively over the structure, saying how the size of a given tree is calculated from the sizes of its parts. Again we define the size in the language Z, by first declaring its type and then saying how it is defined in each of the two cases:
$size : TREE \rightarrow \mathbb{N}$
$\forall\, t_1, t_2 : TREE \bullet$
$size(leaf) = 1 \wedge$
$size(node(t_1, t_2)) = 1 + size(t_1) + size(t_2)$
Similarly, we make two new definitions about trees:
$leaves : TREE \rightarrow \mathbb{N}$
$nodes : TREE \rightarrow \mathbb{N}$
$\forall\, t_1, t_2 : TREE \bullet$
$leaves(leaf) = 1 \wedge$
$leaves(node(t_1, t_2)) = leaves(t_1) + leaves(t_2) \wedge$
$nodes(leaf) = 0 \wedge$
$nodes(node(t_1, t_2)) = 1 + nodes(t_1) + nodes(t_2)$
We now want to prove the following theorem by structural induction on the
size of the tree t.
Theorem: For all trees t, size(t) = leaves(t) + nodes(t).
Proof: Let $t$, $t'$, $t_1$ and $t_2$ be of type TREE. We prove the theorem by induction on the size of $t$.
Base case: Assume $t = leaf$. Then $size(t) = size(leaf) = 1$. Also, $leaves(t) + nodes(t) = leaves(t) + 0 = 1 + 0 = 1$.
Induction case: Assume $t = node(t_1, t_2)$. The induction hypothesis says that the theorem holds for all $t'$ with $size(t') < size(t)$; in particular it holds for $t_1$ and $t_2$, since $size(t) = 1 + size(t_1) + size(t_2)$. Then, applying the induction hypothesis to $t_1$ and $t_2$,
$size(t) = size(node(t_1, t_2)) = 1 + size(t_1) + size(t_2) = 1 + (leaves(t_1) + nodes(t_1)) + (leaves(t_2) + nodes(t_2))$.
And, by commutativity and associativity of +,
$leaves(t) + nodes(t) = leaves(node(t_1, t_2)) + nodes(node(t_1, t_2)) = (leaves(t_1) + leaves(t_2)) + (1 + nodes(t_1) + nodes(t_2)) = 1 + (leaves(t_1) + nodes(t_1)) + (leaves(t_2) + nodes(t_2))$.
Both expressions are equal, so $size(t) = leaves(t) + nodes(t)$, which completes the induction.
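The Z definitions translate almost literally into recursive functions over an algebraic data type. The following Python sketch (the classes `Leaf` and `Node` are our own encoding of TREE, not part of the Z specification) computes size, leaves and nodes, and checks the theorem on the example tree and on all trees up to depth 3. Such a check is no proof, but each function mirrors one case of the structural definition.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Node:
    left: Tree
    right: Tree


Tree = Union[Leaf, Node]  # TREE ::= leaf | node<<TREE × TREE>>


def size(t: Tree) -> int:
    return 1 if isinstance(t, Leaf) else 1 + size(t.left) + size(t.right)


def leaves(t: Tree) -> int:
    return 1 if isinstance(t, Leaf) else leaves(t.left) + leaves(t.right)


def nodes(t: Tree) -> int:
    return 0 if isinstance(t, Leaf) else 1 + nodes(t.left) + nodes(t.right)


def trees_up_to_depth(d: int) -> List[Tree]:
    """All trees of depth at most d (a leaf has depth 0)."""
    if d == 0:
        return [Leaf()]
    smaller = trees_up_to_depth(d - 1)
    return [Leaf()] + [Node(lt, rt) for lt in smaller for rt in smaller]


leaf = Leaf()
example = Node(leaf, Node(Node(leaf, leaf), leaf))
print(size(example), leaves(example), nodes(example))  # 7 4 3
print(all(size(t) == leaves(t) + nodes(t) for t in trees_up_to_depth(3)))  # True
```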
3.5. REAL NUMBERS 45
3.5 Real numbers
What do we mean when we say ‘continuum’ ? Here is a description Albert
Einstein gave on page 83 of [21]:
The surface of a marble table is spread out in front of me. I can
get from any point on this table to any other point by passing
continuously from one point to a ‘neighboring’ one, and repeating
this process a (large) number of times, or, in other words, by
going from point to point without executing ‘jumps’. I am sure
the reader will appreciate with suﬃcient clearness what I mean
here by ‘neighboring’ and by ‘jumps’ (if he is not too pedantic).
We express this property of the surface by describing the latter as
a continuum.
People have been using the concept of real numbers for a long time (the Babylonians, for example, already calculated with roots long before the Common Era, see [12]). In order for set theory to cover the fundamental structures of analysis, a precise and formal basis for the real numbers was needed. Even simple equations have no solutions if all we know are the rational numbers (for example, there is no rational number x such that $x^2 = x * x = 2$).
When Cantor developed his set theory, it was well known that each type of number could be constructed as the limit of a sequence of numbers of another type. But it became clear that, especially in connection with theorems asserting the existence of some limit relations (see [30, page 182]), the proof might require irrational numbers to be defined in terms of rational ones, in order to avoid begging the question of existence involved in the theorem. Cauchy and Heine tried to define the irrational or real numbers in the second half of the 19th century. In 1872 Cantor and Dedekind followed with their precise definition of the real numbers. We first present the three methods (of Dedekind, Cantor and Cauchy) of defining the reals in terms of rationals and then show that they are identifiable.
3.5.1 Dedekind’s cuts
As a professor in the Polytechnic School in Zürich I found myself for the first time obliged to lecture upon the elements of the differential calculus and felt more keenly than ever before the lack of a really scientific foundation for arithmetic.
 Richard Dedekind, in the opening of the paper in which Dedekind’s cuts were introduced.
Dedekind defined a cut to determine a real number. A cut is a partition of a sequence into two disjoint non-empty subsequences, all the members of one of which are less than all the members of the other. Dedekind used the point at which the sequence is partitioned⁵ to define a real number.
Definition of a (Dedekind) cut:
Given an ordering < on a set V, a subset C ⊆ V is a cut in V :=
1) C ≠ ∅ ∧ C ≠ V
2) (∀a, b : a, b ∈ V : a ∈ C ∧ b < a → b ∈ C)
3) C does not have a greatest element
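Example: $\{x \in \mathbb{Q} \mid x < 0 \vee x^2 < 2\}$ is a cut in Q. (The negative rationals have to be included explicitly, because condition 2) requires a cut to be closed downwards.) Notice that we can also define the same cut as $\{x \in \mathbb{Q} \mid x < 0 \vee x^4 < 4\}$.
Each real number r can now be defined by a cut C in Q if r is the supremum for C. Each cut then determines a unique real number (see section 3.5.4). We want to identify cuts that define the same real number, such as for example $\{x \in \mathbb{Q} \mid x < 0 \vee x^2 < 2\}$ and $\{x \in \mathbb{Q} \mid x < 0 \vee x^4 < 4\}$.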
Definition of (Dedekind) cut equivalence: A cut $C_1$ is equivalent to a cut $C_2$, notation $C_1 \sim C_2$, := there is a real number r that is the supremum of both $C_1$ and $C_2$.
We can now define $\mathbb{R}_{\mathit{Dedekind}}$ as the set of all equivalence classes of cuts in Q: $\mathbb{R}_{\mathit{Dedekind}} := \{C \subseteq \mathbb{Q} \mid C \text{ is a cut in } \mathbb{Q}\}/\!\sim$.
⁵ Actually, Dedekind's original definition did not use a partition but a slightly more complex division. For details see the link ‘Dedekind cuts’ at http://zax.mine.nu/stage.
Example: $\{x \in \mathbb{Q} \mid x < 0 \vee x^2 < 2\}$ has $\sqrt{2}$ as supremum. We can identify the real number $\sqrt{2}$ with the equivalence class of all cuts that have $\sqrt{2}$ as supremum.
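The supremum of the cut for √2 can be approached mechanically. The Python sketch below is only an illustration under our own assumptions: a cut in Q is represented by a membership predicate on exact rationals (`fractions.Fraction`), and the hypothetical helper `bisect_supremum` repeatedly halves an interval whose left end lies in the cut and whose right end does not. It yields rational bounds for the supremum, not the real number itself.

```python
from fractions import Fraction


def sqrt2_cut(x: Fraction) -> bool:
    """Membership test for the cut {x in Q | x < 0 or x^2 < 2}."""
    return x < 0 or x * x < 2


def bisect_supremum(cut, lo: Fraction, hi: Fraction, steps: int):
    """Return rationals lo < hi with lo in the cut and hi not in the cut,
    after `steps` halvings; the supremum of the cut lies in [lo, hi]."""
    if not (cut(lo) and not cut(hi)):
        raise ValueError("need lo inside the cut and hi outside it")
    for _ in range(steps):
        mid = (lo + hi) / 2
        if cut(mid):
            lo = mid  # mid is in the cut, so the supremum is >= mid
        else:
            hi = mid  # the cut is downward closed, so mid is an upper bound
    return lo, hi


lo, hi = bisect_supremum(sqrt2_cut, Fraction(1), Fraction(2), 20)
print(float(lo), float(hi))
```

The successive intervals [lo, hi] produced this way are nested and halve in length at every step, so they also behave like the chains of segments discussed next.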
3.5.2 Cantor’s chains of segments
In mathematics the art of proposing a question must be held of
higher value than solving it.
A thesis defended in Cantor's doctoral examination.
Cantor defined a chain of segments to determine a real number (see also [17, chapter 12]). This is a sequence of nested intervals in Q of ever decreasing length, the limit of which determines a unique real number.
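Definition of a chain of segments: $\langle a_n, b_n \rangle^V_{n \in \mathbb{N}}$ is a chain of segments (in V) :=
1) $(\forall n : n \in \mathbb{N} : a_n \in V \wedge b_n \in V)$
2) $(\forall n : n \in \mathbb{N} : a_n \leq a_{n+1} \leq b_{n+1} \leq b_n)$
3) $(\forall n : n \in \mathbb{N} : b_n - a_n \leq 2^{-n})$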
Example: Consider the following chain of segments in Q:
$\langle \langle 1, 2 \rangle, \langle 1.4, 1.5 \rangle, \langle 1.41, 1.42 \rangle, \langle 1.414, 1.415 \rangle, \ldots \rangle$.
Each segment contains $\sqrt{2}$.
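To make the three conditions concrete, here is a small Python sketch (our own, not taken from the cited sources) that builds the decimal chain for √2 from the example with exact rational arithmetic, using `math.isqrt` to obtain the truncated decimals, and checks conditions 2) and 3) on a finite prefix. The function names are hypothetical, and a finite check can of course not prove the conditions for all n.

```python
from fractions import Fraction
from math import isqrt


def sqrt2_segment(n: int):
    """Segment <a_n, b_n>: sqrt(2) truncated to n decimals, and that plus 10**-n."""
    scale = 10 ** n
    a = Fraction(isqrt(2 * scale * scale), scale)  # floor(sqrt(2) * 10**n) / 10**n
    return a, a + Fraction(1, scale)


def satisfies_chain_conditions(segments) -> bool:
    """Check conditions 2) and 3) of the definition on a finite prefix."""
    for n, (a, b) in enumerate(segments):
        if b - a > Fraction(1, 2 ** n):  # condition 3)
            return False
        if n + 1 < len(segments):
            a_next, b_next = segments[n + 1]
            if not (a <= a_next <= b_next <= b):  # condition 2)
                return False
    return True


chain = [sqrt2_segment(n) for n in range(8)]
print([(float(a), float(b)) for a, b in chain[:4]])
print(satisfies_chain_conditions(chain))
```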
Note that $\langle a_n, b_n \rangle^V_{n \in \mathbb{N}}$ (notation $\langle a_n, b_n \rangle^V$, or $\langle a_n, b_n \rangle$ when it is clear which set V is meant) is actually a sequence of pairs, and that condition 3) imposes a minimum speed of convergence: the length of the n-th segment is at most $2^{-n}$. We now want to be able to say when two chains are equivalent.
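Definition of chain equivalence: The chains of segments $\langle a_n, b_n \rangle$ and $\langle c_n, d_n \rangle$ are equivalent, notation $\langle a_n, b_n \rangle \sim \langle c_n, d_n \rangle$, :=
$(\forall k : k \in \mathbb{N} : b_k \geq c_k \wedge d_k \geq a_k)$
(i.e. for every k the k-th segments of the two chains overlap).
Theorem: ∼ is an equivalence relation on the set of all chains of segments in Q.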
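As an illustration, the following Python sketch compares the decimal chain for √2 with a second, hypothetical chain obtained by repeatedly halving [1, 2] and keeping the half that contains √2. Both satisfy the definition (the halving chain has $d_n - c_n = 2^{-n}$ exactly). The helper `overlap_everywhere` tests the equivalence condition on a finite prefix only, so it can refute equivalence but never prove it.

```python
from fractions import Fraction
from math import isqrt


def decimal_segment(n: int):
    """<a_n, b_n>: sqrt(2) truncated to n decimals, and that plus 10**-n."""
    scale = 10 ** n
    a = Fraction(isqrt(2 * scale * scale), scale)
    return a, a + Fraction(1, scale)


def bisection_chain(length: int):
    """<c_n, d_n>: start from [1, 2] and keep the half that contains sqrt(2)."""
    c, d = Fraction(1), Fraction(2)
    chain = []
    for _ in range(length):
        chain.append((c, d))
        mid = (c + d) / 2
        if mid * mid < 2:  # mid * mid == 2 is impossible for rational mid
            c = mid
        else:
            d = mid
    return chain


def overlap_everywhere(chain1, chain2) -> bool:
    """Finite-prefix version of chain equivalence: b_k >= c_k and d_k >= a_k."""
    return all(b >= c and d >= a for (a, b), (c, d) in zip(chain1, chain2))


decimal = [decimal_segment(n) for n in range(10)]
binary = bisection_chain(10)
print(overlap_everywhere(decimal, binary))
```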
Each equivalence class of chains of segments in Q now uniquely determines a real number r. To be precise, r is determined by $[\langle a_n, b_n \rangle]_\sim$ if $(\forall n : n \in \mathbb{N} : a_n \leq r \leq b_n)$; r then is the only real number with this property (see also paragraph 3.5.4).
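We can now define $\mathbb{R}_{\mathit{Cantor}}$ as the set of all equivalence classes of chains of segments in Q: $\mathbb{R}_{\mathit{Cantor}} := \{\langle a_n, b_n \rangle^{\mathbb{Q}}_{n \in \mathbb{N}} \mid \langle a_n, b_n \rangle \text{ is a chain of segments in } \mathbb{Q}\}/\!\sim$.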
3.5.3 Cauchy sequences
Men pass away, but their deeds abide.
Louis Cauchy, his last words as quoted in [22].
Cauchy defined a Cauchy sequence to determine a real number. Such a sequence of numbers defines a real number by letting its terms come arbitrarily close to one another, and hence to the real number they determine.
Definition of Cauchy sequence: With ≤ an ordering on a set V⁶, $\{a_n\}^V_{n \in \mathbb{N}}$ is a Cauchy sequence in V :=
1) $(\forall n : n \in \mathbb{N} : a_n \in V)$
2) $(\forall k : k \in \mathbb{N} : (\exists p : p \in \mathbb{N} : (\forall n, m : n, m \in \mathbb{N} : n, m > p \rightarrow |a_n - a_m| \leq 2^{-k})))$
Example: The sequences $\{1, 1.4, 1.414, 1.4142, 1.41421, 1.414213, \ldots\}$ and $\{1, 1.414, 1.41421, \ldots\}$ (using ‘…’ to indicate an infinite continuation informally) are both Cauchy sequences. For each $n \in \mathbb{N}$, $a_{n+1}$ lies at least as close to $\sqrt{2}$ as $a_n$.
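The Cauchy condition can be explored numerically. The Python sketch below, an illustration under our own assumptions, takes the first example sequence (decimal truncations of √2, computed exactly with `fractions.Fraction` and `math.isqrt`) and, for a given k, searches a finite prefix for the smallest p that works in condition 2). The helper names are hypothetical, and on a finite prefix the answer is only evidence, not a proof.

```python
from fractions import Fraction
from math import isqrt


def truncation(n: int) -> Fraction:
    """a_n: sqrt(2) truncated to n decimals (1, 1.4, 1.41, 1.414, ...)."""
    scale = 10 ** n
    return Fraction(isqrt(2 * scale * scale), scale)


def cauchy_witness(prefix, k: int):
    """Smallest p with |a_n - a_m| <= 2**-k for all n, m > p inside the finite
    prefix, or None if the prefix is too short to exhibit one."""
    eps = Fraction(1, 2 ** k)
    for p in range(len(prefix) - 1):
        tail = prefix[p + 1:]
        if max(tail) - min(tail) <= eps:  # the largest |a_n - a_m| on the tail
            return p
    return None


prefix = [truncation(n) for n in range(12)]
print([cauchy_witness(prefix, k) for k in (3, 10, 20)])
```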
We also denote a Cauchy sequence $\{a_n\}_{n \in \mathbb{N}}$ simply by $a_n$. We now want to be able to say when two Cauchy sequences are equivalent.
⁶ V is in general an ordered, commutative ring. We will not discuss this further here, and for the rest of this paragraph take V = Q.
Definition of Cauchy sequence equivalence: The sequences $a_n$ and $b_n$ are equivalent, notation $a_n \sim b_n$, := $\lim_{n \to \infty}(a_n) = \lim_{n \to \infty}(b_n)$
Note that in the definition of equivalence the hitherto undefined notion of a limit is used. With the following definition we can formalize the notion of a limit.
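Definition of sequence convergence: A sequence $\{a_n\}_{n \in \mathbb{N}}$ of elements of a set V is said to converge to (the same limit as) a sequence $\{b_n\}_{n \in \mathbb{N}}$, notation $\lim_{n \to \infty}(a_n) = \lim_{n \to \infty}(b_n)$, := $(\forall k : k \in \mathbb{N} : (\exists p, q : p, q \in \mathbb{N} : (\forall n, m : n, m \in \mathbb{N} \wedge n > p \wedge m > q : |a_n - b_m| < 2^{-k})))$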
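This definition compares the terms of two sequences directly, without mentioning a limit. A minimal Python sketch, assuming finite prefixes and taking p = q, checks it for the decimal truncations of √2 and a second sequence produced by Newton's iteration x ↦ (x + 2/x)/2 (our own choice of example, not taken from the text). The helper names are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def truncation(n: int) -> Fraction:
    """a_n: sqrt(2) truncated to n decimals."""
    scale = 10 ** n
    return Fraction(isqrt(2 * scale * scale), scale)


def newton(n: int) -> Fraction:
    """b_n: Newton's iteration x -> (x + 2/x) / 2 started at 1 (1, 3/2, 17/12, ...)."""
    x = Fraction(1)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x


def same_limit_witness(a, b, k: int):
    """Smallest p such that |a_n - b_m| < 2**-k for all n, m > p in the finite
    prefixes a and b (taking p = q in the definition), or None."""
    eps = Fraction(1, 2 ** k)
    for p in range(min(len(a), len(b)) - 1):
        if all(abs(x - y) < eps for x in a[p + 1:] for y in b[p + 1:]):
            return p
    return None


a = [truncation(n) for n in range(8)]
b = [newton(n) for n in range(5)]
print(same_limit_witness(a, b, 5))
```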
Note: convergence is usually defined in terms of real numbers, but we cannot use such a definition here because we have yet to define the reals. Once the reals are available, a number r is called the limit of the sequence $a_n$, notation $\lim_{n \to \infty}(a_n) = r$, if $(\forall k : k \in \mathbb{N} : (\exists p : p \in \mathbb{N} : (\forall n : n \in \mathbb{N} \wedge n > p : |a_n - r| < 2^{-k})))$.
A sequence is said to diverge if it does not converge.
Theorem: Any convergent sequence $\{a_n\}_{n \in \mathbb{N}}$ is bounded and has a unique limit.
Proof: First we prove uniqueness. Suppose the sequence has two limits, c and c'. Take any $k \in \mathbb{N}$. From the definition of convergence there is an integer p such that $|a_n - c| < 2^{-k}$ if n > p. Also, there is an integer p' such that $|a_n - c'| < 2^{-k}$ if n > p'. Take any n with n > p and n > p'. Adding the two inequalities and using the triangle inequality $(\forall a, b :: |a + b| \leq |a| + |b|)$, we get
$|c' - c| = |(a_n - c) + (c' - a_n)| \leq |a_n - c| + |a_n - c'| < 2 \cdot 2^{-k}$.
Hence $|c' - c| < 2 \cdot 2^{-k}$ for all $k \in \mathbb{N}$. This means c = c', thus the limit is indeed unique.
Now we prove boundedness. The sequence converges, so we can take, for example, k = 1. Then there is a p such that $|a_j - c| < 2^{-1}$ for j > p. Again using the triangle inequality, we have $|a_j| \leq |a_j - c| + |c| < 2^{-1} + |c|$ for j > p. Hence the sequence is bounded by $M = \max\{|a_0|, |a_1|, \ldots, |a_p|, 1 + |c|\}$.
Each real number can now be defined by an equivalence class of Cauchy sequences: r is determined by $[a_n]_\sim$ if $r = \lim_{n \to \infty}(a_n)$ for each sequence $a_n$ from the equivalence class $[a_n]_\sim$.
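We can now define $\mathbb{R}_{\mathit{Cauchy}}$ as the set of all equivalence classes of Cauchy sequences in Q: $\mathbb{R}_{\mathit{Cauchy}} := \{\{a_n\}^{\mathbb{Q}}_{n \in \mathbb{N}} \mid \{a_n\} \text{ is a Cauchy sequence in } \mathbb{Q}\}/\!\sim$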
3.5.4 Properties of the three definitions
Before these definitions of the real numbers were given, we intuitively thought of the reals as infinite sequences of (decimal) digits. In the rest of this section we assume that by R we mean this set of reals, i.e. all infinite decimal expansions. We can now check whether the three new definitions are indeed correct ways to identify real numbers:
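1) $\langle a_n, b_n \rangle^{\mathbb{Q}}$ is a chain of segments $\rightarrow (\exists! c : c \in \mathbb{R} : (\forall n : n \in \mathbb{N} : a_n \leq c \leq b_n))$
2) C is a cut in Q $\rightarrow (\exists! c : c \in \mathbb{R} : c = \sup(C))$
3) $\{a_n\}_{n \in \mathbb{N}}$ is a Cauchy sequence $\rightarrow (\exists! c : c \in \mathbb{R} : \lim_{n \to \infty}(a_n) = c)$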
Then we can check for every newly defined set X of reals that:
a) it contains a countable, densely ordered (i.e. $(\forall r_1, r_2 : r_1, r_2 \in D \wedge r_1 < r_2 : (\exists q : q \in D : r_1 < q < r_2))$) set D without endpoints, which is dense in X;
b) every Dedekind cut has a supremum in X.
Every set for which a) and b) hold is isomorphic (as an ordered set) to R. If a definition satisfies a) and b), it possesses the properties we intuitively want the real numbers to have. It can be proven that if these two properties hold we have defined the reals successfully, in the sense that there is a total ordering on the reals, the reals are densely ordered and the ordering is continuous.
3.6 Infinite sets
Our minds are infinite, and yet even in these circumstances of finitude we are surrounded by possibilities that are infinite, and the purpose of life is to grasp as much as we can out of that infinitude.
A.N. Whitehead in [76]
The size of a finite set V, notation |V|, can be defined as the number of elements that it has. But for infinite sets the counting never ends. Cantor was concerned with the problem of measuring the sizes of infinite sets (because he was investigating questions about singularities of Fourier series, see [30, chapter 4]) and proposed an elegant solution to this problem. He observed that two finite sets have the same size if the elements of one set can be paired with the elements of the other set; this method compares sets without resorting to counting and can be extended to infinite sets. This leads to an equivalence relation between sets (the relation is also referred to as ‘are of the same cardinality’, ‘equipotent’ or ‘equipollent’ (see [30, page 229])).
Definition of set equivalence: A set V is equivalent to a set W, notation V ∼ W, := there is a bijection f : V → W
It is simple to check that ∼ has the properties of an equivalence relation, i.e. it is reflexive, symmetric and transitive. But if we consider ∼ to be a true relation, we need the concept of $\mathcal{V}$, the set of all sets: $\sim\, \subseteq \mathcal{V} \times \mathcal{V}$. But the existence of $\mathcal{V}$ is paradoxical, see section 3.8.
This new method to measure the number of elements of a set is reflected in the notion of cardinality of a set, and led to the surprising result that there are many levels of infinity. Before we present a proof of this result, using Cantor's famous diagonalization method, we first introduce some more definitions.
Postulate for Cardinal numbers:
With every set V is associated a well-defined abstract entity $\overline{\overline{V}}$, called the cardinal number of V, such that $V \sim W \leftrightarrow \overline{\overline{V}} = \overline{\overline{W}}$. We can think of $\overline{\overline{V}}$ as denoting the property, common to all sets in the equivalence class of V, of being equivalent to V (as defined above).
It proved difficult, however, to come to an exact definition of cardinality from this postulate. Cantor regarded cardinals as special abstract entities of a new kind. In 1884, the German mathematician Frege came up with his own definition of cardinal numbers. He discussed it with the mathematician Russell and they proposed the idea of defining $\overline{\overline{V}}$ as $[V]_\sim$, the equivalence class of V modulo ∼. The postulate for cardinal numbers then follows at once. Frege also identified finite cardinal numbers with natural numbers: $\overline{\overline{\emptyset}} = 0$, $\overline{\overline{\{\emptyset\}}} = 1$, $\overline{\overline{\{\emptyset, \{\emptyset\}\}}} = 2$, …. This Frege-Russell definition would become standard until, as we will see in section 3.8, it became known that this definition could also lead to a paradox.
Cantor used the Hebrew letter aleph to name the different levels of infinity. The cardinality of the set of natural numbers is by definition called aleph-null (or aleph-naught), notation $\aleph_0$. The ‘next levels’ of infinity are called $\aleph_1, \aleph_2, \ldots$. Since it was unknown which of these levels the cardinality of the set of reals is, Cantor denoted it by c. If we assume the continuum hypothesis (see section 3.7), which says there is no level of infinity between the cardinalities of N and R, the cardinality of the set of reals can also be denoted by aleph-one, notation $\aleph_1$.
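Property of cardinality: Given the cardinality $\overline{\overline{V}}$ of a set V, we have
• If V is finite: $\overline{\overline{V}}$ = the number of elements of V
• If V is infinite: $\overline{\overline{V}} = \aleph_i$ when there exists a bijection between V and the set $\mathcal{P}^i(\mathbb{N})$ (the i-fold power set of N); this identification presupposes the generalized continuum hypothesis (see section 3.7)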
Sometimes the cardinality of a set V is also denoted by |V|, after the size of a set V. A more rigorous treatment of cardinal numbers will be given in section 3.8.1. This new concept enabled Cantor to define more concepts for the analysis of infinite sets. It also inspired others to analyze the properties of infinite sets.
No other question has ever moved so profoundly the spirit of man, no other idea has so fruitfully stimulated his intellect; yet no other concept stands in greater need of clarification than that of the infinite.
D. Hilbert, quoted in [96]
In the rest of this section we will present some of the results of research on infinite sets.
Definition of finite: A set V is finite := $(\exists n : n \in \mathbb{N} : V \sim \{x \in \mathbb{N} \mid x < n\})$
Definition of infinite: A set V is infinite := V is not finite
Definition of Dedekind infinite:
A set V is Dedekind infinite := $(\exists W : W \subset V : V \sim W)$
Theorem: V is Dedekind infinite ↔ V is infinite (from [17])
Proof: We use the fact that V is infinite iff $\mathbb{N} \leq_1 V$ (for the direction →, see the theorem on page 58, which relies on the axiom of choice). We prove the two implications of the theorem separately:
V is Dedekind infinite → V is infinite: V is Dedekind infinite, i.e. there exists a W ⊂ V such that V ∼ W, i.e. there exists a bijection f : V → W. Because W is a proper subset of V, there exists an a ∈ V such that a ∉ W. Consider the function g : N → V, defined recursively by g(0) = a and g(k + 1) = f(g(k)). Note that g(j) ∈ W for all j ≥ 1. We now show that g is an injection, i.e. for all i, j ∈ N: i ≠ j → g(i) ≠ g(j). We use induction on i:
i = 0: if 0 ≠ j then g(0) = a ∉ W and g(j) ∈ W, so g(0) ≠ g(j).
i = k + 1: assume k + 1 ≠ j; then we prove g(k + 1) ≠ g(j) by induction on j:
j = 0: g(0) = a ∉ W and g(k + 1) ∈ W, so g(k + 1) ≠ g(0).
j = l + 1: we know k + 1 ≠ l + 1, so k ≠ l. By the induction hypothesis g(k) ≠ g(l). Since f is a bijection we also have f(g(k)) ≠ f(g(l)), i.e. g(k + 1) ≠ g(l + 1), or g(i) ≠ g(j).
Hence g is an injection from N to V, so $\mathbb{N} \leq_1 V$ and V is infinite.
V is Dedekind infinite ← V is infinite: $\mathbb{N} \leq_1 V$, so there exists an injection f : N → V. We show that W := V − {f(0)}, clearly a proper subset of V (W ⊂ V), is equivalent to V (W ∼ V). The following function g is a bijection from V to W: g(f(i)) = f(i + 1) for all i ∈ N, and g(x) = x if x ≠ f(i) for all i ∈ N.
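The bijection g can be tried out on a hypothetical instance: V = Z with the injection f(i) = i from N into Z (our own choice), so that W = Z − {0}. Then g moves every f(i) one place along and leaves the negative integers alone. The Python sketch below checks this on a finite window of integers only.

```python
def f(i: int) -> int:
    """Hypothetical injection f: N -> V with V = Z; here simply f(i) = i."""
    return i


def g(x: int) -> int:
    """g(f(i)) = f(i + 1), and g(x) = x for x outside the range of f."""
    if x >= 0:  # for this f, x = f(i) with i = x
        return f(x + 1)
    return x  # negative integers are not of the form f(i)


window = range(-50, 50)
images = [g(x) for x in window]
print(len(set(images)) == len(images), f(0) in images)
```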
Definition of countable:
A set V is countable, also called denumerable, := V is finite or V ∼ N
Definition of uncountable: A set V is uncountable := V is not countable
Definition of denumeration: A denumeration of a set V is a bijection f : N → V
Cantor then proved that N, Z and Q all have the same cardinality, and called such sets countably infinite.
Theorem: Q is countable
Proof: We give a bijection from N to Q by listing all elements of Q. Consider a table with all fractions $\frac{a}{b}$ ($a \in \mathbb{N}$, $b \in \mathbb{N}^+$), with the fraction $\frac{a}{b}$ on the a-th row and the b-th column. If we listed all elements row by row, we would not obtain a correspondence between N and Q, since the list would never get to the second row. By listing the elements along the diagonals (southwest to northeast), starting from the northwest corner, we do obtain a correspondence between N and the non-negative rationals. Because $\frac{2}{2} = \frac{1}{1}$, etc., we skip an element whenever it would cause a repetition. The negative rationals can be interleaved, for instance by listing $-\frac{a}{b}$ directly after every $\frac{a}{b} \neq 0$. We can also give a bijection from the non-negative rationals to an infinite subset of N, which is equivalent to N: for each fraction $\frac{a}{b} \geq 0$ with a and b relatively prime, let $f(\langle a, b \rangle) := \frac{1}{2}(a + b)(a + b + 1) + b$.
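The diagonal walk through the table, and the pairing function, can be written out in a few lines of Python. This sketch is our own illustration: it only enumerates the non-negative fractions (as the table with a ∈ N does), skips entries that are not in lowest terms, and uses Cantor's pairing function ½(a + b)(a + b + 1) + b; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def nonnegative_rationals():
    """Walk the table with a/b in row a (a >= 0) and column b (b >= 1) diagonal
    by diagonal, from southwest to northeast, skipping a/b whenever it is not
    in lowest terms (such an entry repeats an earlier one, e.g. 2/2 = 1/1)."""
    s = 1  # on diagonal s every entry satisfies a + b == s
    while True:
        for a in range(s - 1, -1, -1):  # largest row first: southwest end
            b = s - a
            if gcd(a, b) == 1:
                yield Fraction(a, b)
        s += 1


def pair(a: int, b: int) -> int:
    """Cantor's pairing function (1/2)(a + b)(a + b + 1) + b, injective on N x N."""
    return (a + b) * (a + b + 1) // 2 + b


first = list(islice(nonnegative_rationals(), 10))
print([str(q) for q in first])
```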
An example of an uncountable set is the set of real numbers, R. Cantor proved that R is uncountable in 1873; in 1891 he gave a second proof, using a technique called diagonalization (also known as the diagonal method), see [17, page 99].
Theorem: R is uncountable
Proof: Suppose there is a bijection f between N and R. We contradict this by finding an x in R that is not paired with anything in N. We construct this x by taking the first fractional digit of x arbitrarily, but never 0, 9 or the first fractional digit of f(1); the second fractional digit of x also different from 0, 9 and the second fractional digit of f(2), etc. Continuing this way down the diagonal of the table of digits, we obtain all digits of x. x is not f(n) for any n, because the n-th fractional digit of x differs from the n-th fractional digit of f(n).
Note that we avoid the problem of certain numbers such as 2.3999… and 2.4000… being equal by never selecting a 9 or a 0. Similarly, we can use this diagonalization method to show that $\mathbb{N} \not\sim \{0, 1\}^{\mathbb{N}}$.
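The diagonal construction only ever looks at one digit of each number, so it can be illustrated on a finite, hypothetical listing. In the Python sketch below each string holds the first fractional digits of f(1), f(2), …; the choice of the digits 4 and 5 is ours and merely satisfies the rule of avoiding 0, 9 and the diagonal digit.

```python
def diagonal_number(expansions):
    """Given the fractional digits of f(1), ..., f(N) (each string has at least
    N digits), return 0.x_1 x_2 ... x_N whose n-th digit differs from the n-th
    digit of f(n) and is never 0 or 9."""
    digits = []
    for n, expansion in enumerate(expansions):
        diagonal_digit = expansion[n]
        digits.append("4" if diagonal_digit == "5" else "5")
    return "0." + "".join(digits)


listing = ["1415926", "7182818", "4142135", "5555555"]  # hypothetical f(1)..f(4)
x = diagonal_number(listing)
print(x)
```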
Theorem: $(\forall V :: \mathcal{P}(V) \sim \{0, 1\}^V)$ (see [17, page 98])
Proof: We show that there is a bijection K from $\mathcal{P}(V)$ to $\{0, 1\}^V$. For W ⊆ V, define K(W) (also denoted $K_W$), the characteristic function of W, as:
$K_W(v) = 1$ if $v \in W$
$K_W(v) = 0$ if $v \notin W$.
We now show that K is a bijection from $\mathcal{P}(V)$ to $\{0, 1\}^V$:
1) K is injective: let $W_1, W_2 \subseteq V$ and suppose $W_1 \neq W_2$; that means there is an element w ∈ V such that $(w \in W_1 \wedge w \notin W_2) \vee (w \notin W_1 \wedge w \in W_2)$. Then we have $(K_{W_1}(w) = 1 \wedge K_{W_2}(w) = 0) \vee (K_{W_1}(w) = 0 \wedge K_{W_2}(w) = 1)$, and thus $(\exists w : w \in V : K_{W_1}(w) \neq K_{W_2}(w))$, i.e. $K_{W_1} \neq K_{W_2}$.
2) K is surjective: suppose $g \in \{0, 1\}^V$. Let $W_g = \{v \in V \mid g(v) = 1\}$. Then $(\forall v : v \in V : K_{W_g}(v) = 1 \leftrightarrow g(v) = 1)$, thus $(\forall v : v \in V : K_{W_g}(v) = g(v))$, and $g = K_{W_g}$.
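For a finite V the bijection K can be checked exhaustively. The Python sketch below is an illustration with V = {a, b, c} chosen by us; it represents a function in {0, 1}^V as the tuple of its values in a fixed order of V, and the names `K` and `W_of` are hypothetical.

```python
from itertools import combinations, product


def K(W: frozenset, V: tuple) -> tuple:
    """Characteristic function K_W, stored as the tuple (K_W(v) for v in V)."""
    return tuple(1 if v in W else 0 for v in V)


def W_of(g: tuple, V: tuple) -> frozenset:
    """W_g = {v in V | g(v) = 1}, the preimage used in the surjectivity step."""
    return frozenset(v for v, bit in zip(V, g) if bit == 1)


V = ("a", "b", "c")
power_set = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]
functions = list(product((0, 1), repeat=len(V)))  # all of {0,1}^V

images = [K(W, V) for W in power_set]
print(len(power_set), len(set(images)), set(images) == set(functions))
```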
We can define an ordering relation $\leq_1$ on the cardinalities of sets. We say that $V \leq_1 W$ if there is an injection from V to W. Then $V <_1 W$ of course means that $V \leq_1 W$ holds but not V ∼ W. This relation on the set of cardinals only depends on the cardinals themselves and not on the choice of the particular sets V and W. The relation $\leq_1$ is reflexive and transitive. Cantor also conjectured that $\leq_1$ is a partial order on the cardinals, i.e. that it is also antisymmetric. This was later proven independently by the two mathematicians F. Bernstein and E. Schröder (see [59, page 39]).
We give two theorems that are based on the relation $<_1$:
Theorem (without proof): $(\forall V : V \text{ is a nonempty set} : V <_1 \mathcal{P}(V))$
Theorem: V is Dedekind infinite ↔ $\mathbb{N} \leq_1 V$
Proof: This theorem follows directly from the theorem on page 53 and the definition of infinite.
Although we have seen that N is countable but R is not, we might still think that there is some smaller interval of the reals that can be paired with the naturals.
Theorem: $\mathbb{N} \not\sim [0, 1]$
Proof of Poincaré (see [17]): We show there is no bijection f : N → [0, 1]; in particular, $(\forall f : f : \mathbb{N} \to [0, 1] : f \text{ is not surjective})$. We do this by constructing for every function f : N → [0, 1] a y ∈ [0, 1] such that $(\forall n : n \in \mathbb{N} : f(n) \neq y)$. We construct this y by means of a chain of segments (see paragraph 3.5.2).
Let f : N → [0, 1]. Let $S_n$ be an infinite chain of segments such that
1) $(\forall i : i \in \mathbb{N} : f(i) \notin S_i)$
2) $(\forall i : i \in \mathbb{N} : S_{i+1} \subseteq S_i)$
3) $(\forall i : i \in \mathbb{N} : |S_i| = 3^{-i-1})$,
with $|S_i|$ being the length of segment $S_i$.
We can construct such a chain of segments: for $S_0$ we divide [0, 1] into three equal parts and take one that does not contain f(0), and if we divide a segment $S_n = [p_n, q_n]$ into three equal parts (i.e. each part has length $3^{-n-2}$), at least one of these parts does not contain f(n + 1). We take this part for $S_{n+1}$.
The constructed chain of segments determines (see paragraph 3.5.2) a real number y, with $(\forall n : n \in \mathbb{N} : y \in S_n)$, and thus certainly y ∈ [0, 1]. We also have that $(\forall n : n \in \mathbb{N} : f(n) \notin S_n \wedge y \in S_n)$, so $(\forall n : n \in \mathbb{N} : y \neq f(n))$.
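The construction of the segments is effective once finitely many values f(0), f(1), … are given. The Python sketch below is our own illustration with exact rationals: it starts from [0, 1], keeps at each step a closed third that avoids the next value, and records the segments $S_0, S_1, \ldots$. The sample values are hypothetical.

```python
from fractions import Fraction


def avoiding_chain(values):
    """Build S_0, S_1, ... (closed intervals as pairs) with |S_i| = 3**-(i+1),
    S_{i+1} inside S_i, and values[i] not in S_i, by splitting into thirds."""
    lo, hi = Fraction(0), Fraction(1)  # start from [0, 1] itself
    segments = []
    for v in values:
        third = (hi - lo) / 3
        for j in range(3):
            a, b = lo + j * third, lo + (j + 1) * third
            if not (a <= v <= b):  # v lies in at most two closed thirds
                lo, hi = a, b
                break
        segments.append((lo, hi))
    return segments


f_values = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(4, 9)]  # hypothetical f(0..3)
for lo, hi in avoiding_chain(f_values):
    print(lo, hi)
```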
The following theorem gives a way to prove the equivalence of sets:
Theorem of Cantor-Bernstein: $V \leq_1 W \wedge W \leq_1 V \rightarrow V \sim W$
Proof: Assume $V \leq_1 W$ and $W \leq_1 V$. Then there are injections f : V → W and g : W → V. Since g is injective, it is a bijection from W onto Ran(g), so W ∼ Ran(g); it therefore suffices to prove Ran(g) ∼ V. Since Ran(g) ⊆ V and g ∘ f is an injection from V to Ran(g), we have $V \leq_1 \mathrm{Ran}(g)$. And since for all W and V, $W \subseteq V \wedge V \leq_1 W \rightarrow V \sim W$ (see the lemma below), we have Ran(g) ∼ V, and by the transitivity of ∼ also V ∼ W.
Lemma: $W \subseteq V \wedge V \leq_1 W \rightarrow V \sim W$
Proof: Suppose W ⊆ V and $V \leq_1 W$. Then there is an injection h : V → W. Let $A_0 := V - W$, and $(\forall n : n \in \mathbb{N} : A_{n+1} := h(A_n))$. We now give the desired bijection k : V → W.
• $k(a) := a$ if $a \notin \bigcup_n A_n$
• $k(a) := h(a)$ if $a \in \bigcup_n A_n$
We show that k is a bijection (note that k indeed maps into W: if $a \notin \bigcup_n A_n$ then $a \notin A_0$, so a ∈ W):
• k is injective: Suppose a ≠ b. Then k(a) ≠ k(b), by a case analysis on $a \notin \bigcup_n A_n \wedge b \notin \bigcup_n A_n$, $a \notin \bigcup_n A_n \wedge b \in \bigcup_n A_n$, $a \in \bigcup_n A_n \wedge b \notin \bigcup_n A_n$ and $a \in \bigcup_n A_n \wedge b \in \bigcup_n A_n$. In all cases k(a) ≠ k(b) follows from the definition of k and the injectivity of h (in the two mixed cases, note that h maps $\bigcup_n A_n$ into $\bigcup_n A_n$).
• k is surjective: Suppose w ∈ W, thus $w \notin A_0$. Again we use case analysis:
– if $w \notin \bigcup_n A_n$ then w = k(w).
– if $w \in \bigcup_n A_n$, assume $w \in A_p$. Since $w \notin A_0$, p ≥ 1. Thus there is a $w' \in A_{p-1}$ such that w = h(w') = k(w').
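To see the bijection k at work, consider a concrete, hypothetical instance of the lemma: V = N, W = N − {0} and the injection h(n) = 2n + 2, so that $A_0 = \{0\}$, $A_1 = \{2\}$, $A_2 = \{6\}$, …. The Python sketch below (our own illustration) decides membership of the union by following the chain of images from $A_0$, which only terminates because h(n) > n in this example.

```python
def h(n: int) -> int:
    """Hypothetical injection h: V -> W with V = N and W = N - {0}."""
    return 2 * n + 2


A0 = {0}  # A_0 = V - W


def in_union(a: int) -> bool:
    """Decide whether a lies in A_0, A_1 = h(A_0), A_2 = h(A_1), ...
    The search terminates only because h(n) > n here, so a chain of
    images can be abandoned once it has passed a."""
    frontier = set(A0)
    while frontier:
        if a in frontier:
            return True
        frontier = {h(x) for x in frontier if h(x) <= a}
    return False


def k(a: int) -> int:
    """The bijection k: V -> W from the lemma."""
    return h(a) if in_union(a) else a


print([k(a) for a in range(8)])
```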
Example: We prove that (a, b) ∼ [0, 1] for all a, b ∈ R with a < b by using the theorem of Cantor-Bernstein. We first prove that (0, 1) ∼ [0, 1] and then that (0, 1) ∼ (a, b). Then, by the transitivity of ∼ we can conclude that (a, b) ∼ [0, 1].
Proof of (0, 1) ∼ [0, 1]: The identity function $\mathrm{id}_{(0,1)} : (0, 1) \to [0, 1]$ is an injection from (0, 1) to [0, 1], so $(0, 1) \leq_1 [0, 1]$. The function $f(x) = \frac{1}{3}(x + 1)$ is an injection from [0, 1] to (0, 1) (its range is $[\frac{1}{3}, \frac{2}{3}]$), so $[0, 1] \leq_1 (0, 1)$. By the theorem of Cantor-Bernstein we now know that (0, 1) ∼ [0, 1].
Proof of (0, 1) ∼ (a, b): The function f(x) = (b − a)x + a is a bijection from (0, 1) to (a, b).
Using the Cantor-Bernstein theorem we can also prove that
$(a, b) \sim (0, 1) \sim \mathbb{R} \sim \mathbb{R}^n \sim \{0, 1\}^{\mathbb{N}} \sim \mathcal{P}(\mathbb{N}) \sim \mathbb{N}^{\mathbb{N}}$, for all n ∈ N, n ≥ 1.
Theorem: V is infinite → $\mathbb{N} \leq_1 V$
Proof: V is infinite and thus not empty. We take one element $x_0 \in V$. Next, we take an element $x_1 \in V - \{x_0\}$. We can repeat this infinitely (i.e. for all n we can select an $x_{n+1} \in V - \{x_0, \ldots, x_n\}$, which is nonempty because V is infinite), if we assume that it is always possible to select an element from any nonempty set (see the axiom of choice below). In this way we get a countable subset of V, namely $\{x_0, x_1, x_2, \ldots\}$, and the map $n \mapsto x_n$ is an injection from N to V.
The only assumption we have made here is the so-called axiom of choice.
Axiom of choice (AC): Given any set W of nonempty sets V, there is a function f which assigns to each member V of W an element f(V) of V.
This formulation was given by Zermelo in an article from 1908 (translated in [93, pages 199-215]). Such a function f is called a choice function for W. The axiom can be restricted by limiting it to families W of a particular cardinality. Since the axiom is provable for any finite W, the weakest nontrivial case occurs when W is denumerable (see page 54 for the definition of denumerable). This case is known as the Denumerable axiom. Zermelo regarded AC as already implicitly used by mathematicians. In response, people asked how this assumption had arisen in mathematics, where it is implicitly used, and when exactly it can or cannot be avoided. Zermelo had used AC in 1904 to prove that every set can be well-ordered, and the controversy over that proof (see [63, page 310]) led Zermelo to axiomatize set theory (see section 5.3.1). We can add AC to set theory based on the axioms of Zermelo and Fraenkel (ZF, see section 5.3), in which case it is termed ZFC (ZF supplemented by the Axiom of Choice). For more details on the role of AC, we refer to section 5.3 and [63]. See http://zax.mine.nu/stage and click on ‘links’ for some quotes about AC.
An instance of the following theorem (given without proof) of the British mathematician F.P. Ramsey is often used in graph theory. The notation $[V]^n$ in this theorem is defined as the set of all subsets of V with n elements, i.e. $[V]^n := \{X \subseteq V \mid X \text{ has } n \text{ elements}\}$.
Theorem of Ramsey: If V is a denumerable set and $f : [V]^n \to \{0, 1, \ldots, m - 1\}$ with n, m ∈ N and n, m ≥ 1, then $(\exists W : W \subseteq V : W \text{ is denumerable and } f \text{ is constant on } [W]^n)$.
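Theorem: $\mathbb{R}^2 \sim \mathbb{R} \sim (0, 1)$
Proof: We can say that R ∼ (0, 1) if there is a bijection between (0, 1) and R. Indeed, there exists a bijection f : (0, 1) → R, defined as $f(x) = \tan(\frac{\pi}{2}(2x - 1))$. Thus: R ∼ (0, 1), and hence also $\mathbb{R}^2 \sim (0, 1)^2$. If we consider an element of $(0, 1)^2$, that is two real numbers between 0 and 1 (each written as a decimal expansion that does not end in an infinite sequence of 9s), then we can map these numbers to a number in (0, 1) by alternately taking the next digit of each of the two numbers. For example, we map (0.76584…, 0.13275…) to 0.7163528745…. This gives an injection from $(0, 1)^2$ to (0, 1); since $x \mapsto (x, \frac{1}{2})$ is an injection in the other direction, the theorem of Cantor-Bernstein gives $(0, 1)^2 \sim (0, 1)$. Thus: $\mathbb{R}^2 \sim \mathbb{R}$. Since ∼ is transitive, we know that $\mathbb{R}^2 \sim \mathbb{R} \sim (0, 1)$.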
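The digit interleaving in this proof can be sketched in Python on finite digit strings. This is an illustration only: it assumes both inputs are already written without an infinite tail of 9s, and the names `interleave` and `split` are hypothetical.

```python
def interleave(x_digits: str, y_digits: str) -> str:
    """Map (0.x1 x2 ..., 0.y1 y2 ...) to 0.x1 y1 x2 y2 ... (finite prefixes only)."""
    return "0." + "".join(a + b for a, b in zip(x_digits, y_digits))


def split(z_digits: str):
    """Inverse on finite prefixes: odd positions give x, even positions give y."""
    return z_digits[0::2], z_digits[1::2]


z = interleave("76584", "13275")
print(z)
```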
Theorem: $\mathcal{P}(\mathbb{N}) \sim (0, 1)$
Proof: First we show that $\mathcal{P}(\mathbb{N}) \leq_1 \mathbb{R}$. Suppose $V \in \mathcal{P}(\mathbb{N})$; map V to the decimal $0.a_0 a_1 a_2 \ldots$, with $a_i = 1$ if $i \in V$ and $a_i = 0$ otherwise. This injection proves that $\mathcal{P}(\mathbb{N}) \leq_1 \mathbb{R}$, and since R ∼ (0, 1) also $\mathcal{P}(\mathbb{N}) \leq_1 (0, 1)$. Now we give an injection from (0, 1) to $\mathcal{P}(\mathbb{N})$: assume r ∈ (0, 1), i.e. $r = 0.a_1 a_2 \ldots$ with $0 \leq a_i \leq 9$. We want to identify numbers such as 0.3999… and 0.4000…. Therefore we assume there is no i ∈ N such that $a_n = 9$ for all n > i, n ∈ N. Then we map r to the set $\{1a_1, 1a_1a_2, \ldots\}$ of natural numbers (read as decimal numerals). Clearly, this mapping is well-defined and injective. For example, r = 0.17803… is mapped to the set {11, 117, 1178, 11780, 117803, …}. Thus $(0, 1) \leq_1 \mathcal{P}(\mathbb{N})$, hence, by the theorem of Cantor-Bernstein, $\mathcal{P}(\mathbb{N}) \sim (0, 1)$.
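Both injections in this proof only manipulate digits, so they can be sketched in Python on finite prefixes. The helper names below are hypothetical, and the sketch assumes, as the proof does, that the decimal expansion of r has no infinite tail of 9s.

```python
def subset_to_decimal(V: set, length: int) -> str:
    """First `length` decimals of 0.a_0 a_1 a_2 ..., where a_i = 1 iff i is in V."""
    return "0." + "".join("1" if i in V else "0" for i in range(length))


def decimals_to_set(digits: str) -> set:
    """Map the decimals a_1 a_2 ... of r to the set of naturals
    {1a_1, 1a_1a_2, ...}, read as decimal numerals (finite prefix only)."""
    return {int("1" + digits[:j]) for j in range(1, len(digits) + 1)}


print(subset_to_decimal({0, 2, 3}, 6))
print(sorted(decimals_to_set("17803")))
```

The leading digit 1 in each numeral keeps prefixes that start with zeros (as in 0.0123…) from collapsing onto shorter numbers.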
Corollary: $\mathcal{P}(\mathbb{N}) \sim \mathbb{R}$
Proof: This follows directly from $\mathcal{P}(\mathbb{N}) \sim (0, 1)$, (0, 1) ∼ R and the transitivity of ∼.
3.7 The Continuum Hypothesis
We still think that the study of the size of the continuum should
be our guiding light for further research in set theory.
Judah Haim in [33]
After showing that the real numbers cannot be put into one-to-one correspondence with the natural numbers (see section 3.6), Cantor hypothesized in 1877 that each infinite subset of R is either denumerable or equivalent to the continuum. This hypothesis was first published in 1878 in [13] and later became known as:
The Continuum Hypothesis (CH): $(\mathbb{N} \leq_1 A \leq_1 \mathbb{R}) \rightarrow (A \sim \mathbb{N} \vee A \sim \mathbb{R})$
This hypothesis (as given in [17, page 128]) is also known in many other forms, of which we will mention and explain the most important. We can immediately see that the following version of CH is equivalent to the given definition: ‘any set of real numbers is either finite, countably infinite or has the same cardinality as the entire set of reals’. This means that ‘the number of real numbers is the next level of infinity above the number of natural numbers’ (see also [30, page 197]).
As we saw in section 3.6, Cantor defined the cardinality of the natural numbers to be $\aleph_0$, and the next levels of infinity to be $\aleph_1, \aleph_2, \aleph_3$, etc. He also named the cardinality of the reals c, for continuum. Cantor's original formulation of CH was: (B) $c = \aleph_1$. Since Cantor also proved that $\mathcal{P}(\mathbb{N}) \sim \mathbb{R}$ (see page 59), we can also state CH as: (C) $\mathcal{P}(\mathbb{N}) \sim \aleph_1$. The cardinality of the power set of any set X is equal to the cardinality of $\{0, 1\}^X$ (see page 55), often denoted as $2^{|X|}$, so another formulation⁷ of CH is: (D) $2^{\aleph_0} = \aleph_1$ (see [31]). Although (B) leads us to think about sizes of sets of reals, (C) about subsets and (D) about cardinal exponentiation, these formulations are all equivalent in ZFC. We will not go into details of less precise or more dependent formulations such as ‘what is the cardinality of the set of points on a geometrical line?’.
⁷ Actually, in this formulation we have identified the cardinalities $\aleph_0$ and $\aleph_1$ with the sets that have these cardinalities.
Some of the theory that is needed in the remaining part of this section, for
the generalized continuum hypothesis, will be introduced in later chapters.
If you are not familiar with the notations that are used, you might want to
skip the remaining part of this section and get back to it later.
In 1908 the German mathematician Felix Hausdorff proposed the following generalization of CH (which is also called the aleph hypothesis):
The Generalized Continuum Hypothesis (GCH):
$(\forall r : r \text{ is an ordinal} : 2^{\aleph_r} = \aleph_{r+1})$
For a definition and the notation of ordinal numbers, we refer to section 3.8.2. Clearly (see section 5.3) we have that ZF + GCH ⊢ CH. Note that also ZF + GCH ⊢ AC (so we don't need ZFC once we have GCH).
Cantor and many other great mathematicians spent years trying to prove CH or its negation (Cantor tried to prove his hypothesis by using a decomposition theorem; for details see [31, page 117]), but did not succeed. This problem was so important that Hilbert (see section 6.2) put it first in his list of 23 problems.
In 1938 significant progress was made when the mathematician Gödel proved that CH is consistent with ZFC (see section 5.3.2) by constructing a model of ZFC + CH; he later discussed the problem in his article ‘What is Cantor's continuum problem?’. Since Gödel had earlier proved his famous incompleteness theorem (see chapter 8), mathematicians suspected that CH was one of the statements (of ZFC) that can neither be proved nor disproved, i.e. that CH is undecidable in ZFC. It took until 1963 before this was proved by Paul Cohen in [15].
To do that he used a new technique called forcing. Forcing is a combinatorial technique for proving statements consistent with the axioms of set theory. Cohen used it in order to prove that the negation of AC and the negation of CH are consistent with the axioms of set theory (AC and CH were already known to be consistent). Essentially it consists of the following procedure: start with a model $\mathcal{M}$ of set theory. Construct an object X not in $\mathcal{M}$ with certain properties. Consider the smallest model $\mathcal{M}'$ with X an element of $\mathcal{M}'$ and $\mathcal{M}$ a subset of $\mathcal{M}'$ (this is done in such a way that the construction of $\mathcal{M}'$ is implicit in the construction of X). For more details on forcing, see [51] and [81].
Thus Cohen constructed a model of ZFC + ¬CH and this, along with Gödel's model of ZFC + CH, showed that CH is undecidable in ZFC. This means that either CH or ¬CH could be added as an axiom to ZFC. But since neither of these statements seems ‘self-evident’, they have, unlike AC, not been adopted as axioms of set theory. Mathematicians either accept this incompleteness in set theory or try to find more intuitive axioms that will help decide it. In other words, the question remains which intuitive axioms of set theory we need to make it more complete, and whether, with some axiom system for set theory, the continuum hypothesis is true.
3.8 Cardinal and Ordinal numbers and Paradoxes
Every transfinite consistent multiplicity, that is, every transfinite set, must have a definite aleph as its cardinal number.
Georg Cantor
3.8.1 Cardinal numbers and Cantor’s Paradox
In section 3.6 we already encountered cardinal numbers and the notion of set equivalence. After defining the equivalence of sets (see page 51), Cantor realized that all sets that are equivalent to a given set V have a common property. He identified this property with the cardinal number $\overline{\overline{V}}$ of a set V, a property that abstracts from the nature and order of the elements of a set.
Example: Consider the following sets: A = {1, 2, 3}, B = {3, 2, 1}, C = {{4}, 7, {a, b}}, D = {1, {4}}. We can say that A ∼ B ∼ C, or (equivalently) $\overline{\overline{A}} = \overline{\overline{B}} = \overline{\overline{C}}$. We also have A ≁ D, or $\overline{\overline{A}} \neq \overline{\overline{D}}$. Note that in this example the equality ‘=’ between cardinal numbers is a new type of equality that is defined as $\overline{\overline{A}} = \overline{\overline{B}} \leftrightarrow A \sim B$.
We can see that cardinality abstracts from the order and nature of the elements, and for finite sets the cardinal number can be identified with the ordinary ‘number of elements’. Therefore we identify the cardinal number of a finite set of n elements with the natural number n. We denote the smallest infinite (or transfinite) cardinal number by $\aleph_0$. As we have already seen on page 52, this is the cardinal number of N and of any denumerable infinite set. Cantor defined the ‘next’ levels of infinity by $\aleph_1, \aleph_2, \ldots$.
The next question was how to pass from the abstract notion of cardinal numbers to concrete cardinal numbers, i.e. one wanted to regard cardinal numbers as objects of the mathematical system. It turned out to be quite a problem to define the cardinal $\overline{\overline{V}}$ of a set V as an object of set theory. In naive set theory, as well as in Quine's ‘New Foundations’ (see section 7.3), the definition of the cardinal $\overline{\overline{V}}$ of V poses no problem: $\overline{\overline{V}}$ can be defined as the set of all sets equivalent to V. But this definition of cardinal numbers (first given by Frege, see section 3.6) can lead to a paradox that was first found by Cantor.
Cantor's paradox: Every subset of the set of all sets is itself a set, so the power set of the set of all sets is a subset of the set of all sets, and its cardinality is at most that of the set of all sets. But by the theorem $V <_1 \mathcal{P}(V)$ (see page 55), the cardinality of this power set must be strictly bigger.
In axiomatic set theory however (e.g. in ZF, see section 5.3), without the unrestricted comprehension axiom, there is no set which contains all sets equivalent to a given nonempty set V. With this paradox the need arose to find a new definition of cardinals in a context without the unrestricted comprehension axiom, such that the traditional paradoxes could no longer be derived.
Several new definitions of cardinal numbers were then proposed, based on ordinal numbers (for which we refer to the next section⁸). The following definition, due to the mathematician von Neumann, is now the standard definition of cardinal numbers.
Definition of Cardinal number (or initial number):
A cardinal number α := an ordinal number α with the property $(\forall \gamma : \gamma \text{ is an ordinal} : \alpha \sim \gamma \rightarrow \alpha \leq \gamma)$
For each set V we can prove (see [17, section 2.10]) that there exists exactly one cardinal number α satisfying V ∼ α (the proof uses AC). We call this unique α the cardinality or cardinal number of the set V; it is also denoted by $\overline{\overline{V}}$.
In other words, with the axiom of choice we can develop the theory of ordinals in the von Neumann way and define $\overline{\overline{V}}$ to be the least ordinal α equivalent to V. The existence of such an α is guaranteed by the well-ordering theorem. If we have the axiom of foundation among our axioms, even if the axiom of choice is absent we can define $\overline{\overline{V}}$ as the set of all sets W of least rank among those equivalent to V (see [1]). In the absence of the axioms of choice and foundation the operation $\overline{\overline{V}}$ is undefinable (see [1]).
For more information on the deﬁnition and calculus of cardinal numbers,
we refer to [59, chapter 6], [25] and [34].
8
The rest of this section depends on concepts that are deﬁned in later chapters.
3.8.2 Ordinal numbers and Burali-Forti’s Paradox
We already introduced Cantor’s concept of cardinal number in section 3.6, and saw in the previous subsection that it abstracts from both the order and the nature of the elements of a set. Cantor also defined a property of sets, the ordinal number, that only abstracts from the nature of the elements of a set, but retains the order in which they are given.
Here we consider sets with a total ordering (see page 25). Recall that for a well-ordered set, in addition, each nonempty subset has a first member in the given ordering (see also section 3.2). For ordered sets, the concept of equivalence is replaced by the sharper concept of similarity. We call two ordered sets V and W similar, notation V ≃ W, if there is a bijection between V and W that preserves the order relation. Note that we have already seen this relation as isomorphism (‘is isomorphic to’, see page 31), and that ≃ is an equivalence relation. Instead of saying that two sets are similar, we can also say that they are of the same order type.
Definition of an Order Type: An equivalence class under the ≃ (isomorphism) relation
The equivalence class to which an ordered set V belongs is called the order type of V. All well-ordered sets that are similar to a given well-ordered set V have a common property. Cantor identified this property with the ordinal number $\overline{V}$ of the well-ordered set V, a property that only abstracts from the nature of the elements of a set. And just as for cardinals (see section 3.8.1), the question was posed how to define ordinal numbers as part of set theory. In 1883 Cantor defined in [13] an ordinal number as the order type of a well-ordered set.
Definition of Ordinal Number (Cantor): A well-ordered set V has ordinal number o := o is the order type of V
If a set is finite and simply ordered, it is well-ordered and it has an ordinal number. This ordinal number is the same regardless of the order of the elements. For each finite and simply ordered set we can therefore identify the (finite) cardinal number with the ordinal number.
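For finite totally ordered sets, similarity can be checked mechanically: list both sets in increasing order and pair them off. The sketch below is only an illustration, assuming each ordering is given as a Python key function; the name `order_isomorphism` is our own. It returns the order-preserving bijection exactly when the two sets have the same number of elements, which is why finite order types and finite cardinals coincide.

```python
from typing import Any, Callable, Optional


def order_isomorphism(
    v: set,
    w: set,
    key_v: Callable[[Any], Any],
    key_w: Callable[[Any], Any],
) -> Optional[dict]:
    """Return the order-preserving bijection from v onto w, or None.

    Both sets are assumed finite and strictly totally ordered by their key
    functions (distinct elements get distinct keys).
    """
    if len(v) != len(w):
        return None  # different cardinal numbers: no bijection at all
    # List both sets in increasing order and pair them off position by
    # position; for finite total orders this is the only bijection that can
    # preserve the order.
    return dict(zip(sorted(v, key=key_v), sorted(w, key=key_w)))


# a < b < c versus the numbers 10, 20, 30 ordered from large to small
print(order_isomorphism({"a", "b", "c"}, {10, 20, 30}, str, lambda n: -n))
# {'a': 30, 'b': 20, 'c': 10}
print(order_isomorphism({1, 2}, {1, 2, 3}, int, int))  # None
```

Sorting is used here because, for finite total orders, the order isomorphism is unique; for infinite sets no such shortcut exists.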
Example: 0 = ∅; 1 = {0}; 2 = {0, 1}; 3 = {0, 1, 2} are ordinal numbers.
The smallest infinite ordinal number is called ω. This is the ordinal number of the sequence {0, 1, 2, 3, ...}, which can be seen as N or as the sequence of finite cardinal numbers in their ‘natural’ order. We introduce some other transfinite ordinals by example (from [10, page 66]).
Example:
If we call the set ∅ ‘0’, the next set ‘1’, etc., then consider the union of all these sets, {0, 1, 2, ...}. This is another ordinal, called ω, and it is the first non-finite ordinal. It has a successor, ω ∪ {ω}, called ω + 1. More ordinals can be obtained by continuing this succession, and taking the union of all these ordinals yields an ordinal we call ω ∗ 2, etc. The natural numbers in reverse order are denoted *ω.
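$$
\begin{aligned}
V_1 &= \{2, 3, 4, \ldots, 1\}, & V_2 &= \{3, 4, 5, \ldots, 1, 2\},\\
V_3 &= \{1, 3, 5, \ldots, 2, 4, 6, \ldots\}, & V_4 &= \{\ldots, 3, 2, 1\},\\
V_5 &= \{1, 3, 5, \ldots, 6, 4, 2\}, & V_6 &= \{1, 11, 21, \ldots, 2, 12, 22, \ldots\}
\end{aligned}
$$
with order types (writing $\overline{V}$ for the order type of V):
$$
\overline{\mathbb{N}} = \omega,\quad \overline{V_1} = \omega + 1,\quad \overline{V_2} = \omega + 2,\quad \overline{V_3} = \omega + \omega = \omega \ast 2,\quad \overline{V_4} = {}^{\ast}\omega,\quad \overline{V_5} = \omega + {}^{\ast}\omega,\quad \overline{V_6} = \omega \ast 10
$$
Note that V4 and V5 are not well-ordered (*ω has no first element), so *ω and ω + *ω are order types but not ordinal numbers.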
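The well-ordered sets in this example can be made concrete by an order-preserving map into the ordinals below ω ∗ ω. In the sketch below (our own illustration; the encoding and the function names are assumptions, not taken from [10]), a pair (a, b) stands for the ordinal ω ∗ a + b, and Python compares such pairs lexicographically, which matches the ordinal order. Sorting natural numbers by these keys reproduces the orderings of V1, V3 and V6 from the example above.

```python
def key_v1(n: int) -> tuple[int, int]:
    """V1 = {2, 3, 4, ..., 1}: 1 comes after all other numbers (type ω + 1)."""
    return (1, 0) if n == 1 else (0, n - 2)


def key_v3(n: int) -> tuple[int, int]:
    """V3 = {1, 3, 5, ..., 2, 4, 6, ...}: odds first, then evens (type ω ∗ 2)."""
    return (0, (n - 1) // 2) if n % 2 == 1 else (1, n // 2 - 1)


def key_v6(n: int) -> tuple[int, int]:
    """V6 = {1, 11, 21, ..., 2, 12, 22, ..., ..., 10, 20, 30, ...}:
    ten blocks, one per residue class mod 10, each of type ω (type ω ∗ 10)."""
    return ((n - 1) % 10, (n - 1) // 10)


print(sorted(range(1, 31), key=key_v3)[:16])
# [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 2]
print(key_v1(1), key_v3(2), key_v6(30))  # (1, 0) (1, 0) (9, 2)
```

Each key function is injective, and its values fill exactly the pairs below ω + 1, ω ∗ 2 and ω ∗ 10 respectively. V4 and V5 admit no such encoding, since they are not well-ordered.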
For the ordinal numbers n of N and m of M we say that n < m if the well-ordered set N is similar to a proper initial segment of M.
Unfortunately, a situation similar to that for cardinal numbers was found for ordinal numbers. In 1897 Burali-Forti, the Italian assistant of the mathematician Peano, found that this definition can give rise to a paradox (see [18, page 259]).
The Burali-Forti Paradox: The set of all ordinal numbers, taken in their natural order, forms a well-ordered series, and therefore also has an ordinal number Ω. But the ordinal number of any initial segment of the series of ordinals (including the whole series) exceeds every number in that segment, and therefore Ω exceeds every ordinal number whatsoever, including Ω itself.
This led to new proposals for the definition of ordinal numbers. Below we present another definition, given by John von Neumann in [61]. In 1923 he pointed out that among all well-ordered sets having a given Cantorian ordinal as their order type, there is a particular one with some very special properties. Von Neumann defined this particular set to be the ordinal of that order type.
Definition of ordinal number: A set α is an ordinal number :=
1) α is a well-ordered set with the binary relation ∈ as its ordering
2) (∀β :: β ∈ α → β ⊆ α)
With this definition of ordinal numbers, the Burali-Forti argument no longer yields a paradox. The collection of all ordinals is well-ordered by ∈ and also satisfies 2) (a proof is given in [59, section 4.2]), so if it were a set it would itself be an ordinal and hence a member of itself, which is impossible because ∈ is a strict (irreflexive) ordering on the elements of an ordinal. The argument therefore only shows that there is no set of all ordinals. According to this definition, the empty set is an ordinal number. This ordinal number is also denoted by 0. Similarly we denote the ordinal numbers {0} by 1, {0, 1} by 2, {0, 1, 2} by 3, etc. In other words: 0 = ∅, 1 = {∅}, 2 = {∅, {∅}}, .... These ordinal numbers, which are finite sets, are called finite ordinal numbers. The finite ordinal numbers are identified with the natural numbers. The set ω = {0, 1, 2, ...} of all natural numbers is also an ordinal number. An ordinal number that is an infinite set, like ω, is called a transfinite ordinal number. For every well-ordered set V, there exists exactly one ordinal number isomorphic to V.
Definition of ordinal number of a well-ordered set V:
The ordinal number of a well-ordered set V := the ordinal number isomorphic to V
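A minimal sketch of the von Neumann construction, assuming finite sets are modelled as Python frozensets (an illustration only, not a formalization of ZF): it builds the finite ordinals by the successor step α ∪ {α} and checks the two conditions of the definition. For a finite set, being well-ordered by ∈ reduces to ∈ being a strict total order on its elements, which is what the hypothetical helper `is_ordinal` tests.

```python
def successor(alpha: frozenset) -> frozenset:
    """Von Neumann successor step: alpha ∪ {alpha}."""
    return alpha | frozenset([alpha])


def is_transitive(alpha: frozenset) -> bool:
    """Condition 2): every element beta of alpha is also a subset of alpha."""
    return all(beta <= alpha for beta in alpha)


def is_ordinal(alpha: frozenset) -> bool:
    """Check von Neumann's two conditions for a *finite* set alpha.

    For a finite set, 'well-ordered by ∈' amounts to ∈ being a strict total
    order on the elements (irreflexive, transitive, any two comparable).
    """
    elems = list(alpha)
    irreflexive = all(x not in x for x in elems)
    comparable = all(x == y or x in y or y in x for x in elems for y in elems)
    transitive_order = all(
        x in z for x in elems for y in elems for z in elems if x in y and y in z
    )
    return irreflexive and comparable and transitive_order and is_transitive(alpha)


# Build 0 = ∅, 1 = {0}, 2 = {0, 1}, ... by repeated successor steps.
ordinals = [frozenset()]
for _ in range(5):
    ordinals.append(successor(ordinals[-1]))

print([len(a) for a in ordinals])              # [0, 1, 2, 3, 4, 5]
print(all(is_ordinal(a) for a in ordinals))    # True
print(ordinals[3] == frozenset(ordinals[:3]))  # True: 3 = {0, 1, 2}
# {1} = {{∅}} is ordered by ∈ but fails condition 2): 1 ∈ {1}, yet 0 ∉ {1}.
print(is_ordinal(frozenset([ordinals[1]])))    # False
```

The last line shows why condition 2) is needed: a set can be well-ordered by ∈ without being an ordinal.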
A detailed treatment of ordinal calculus based on this definition of ordinal numbers is outside the scope of this report. In the remainder of this section we will only define the most common concepts.
As we saw in section 3.2, we also write α ∈ β (we denote ordinals by lowercase Greek letters) as α < β, which defines an ordering on the ordinal numbers. The least ordinal number is of course 0, and the ordering of the finite ordinal numbers coincides with the usual ordering of the natural numbers. The least transfinite ordinal is ω (see also section 5.3.2). The ordering ≤, defined by α ≤ β := α < β ∨ α = β, is a linear ordering and a well-ordering of the ordinal numbers. Therefore we can apply transfinite induction (see page 37) to ordinal numbers.
For any ordinal number α, the set α′ = {γ | γ ≤ α} = α ∪ {α} (the segment of all ordinals up to and including α) is also an ordinal number, the successor of α, and α is the unique predecessor of α′. A transfinite ordinal without a predecessor is called a limit ordinal number, and all the other ordinal numbers are called isolated ordinal numbers. The first limit ordinal number is ω. For any set V of ordinal numbers, {γ | (∃η : η ∈ V : γ < η)} = ⋃V is an ordinal number, the supremum of V.
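Both constructions in this paragraph are plain set operations, so for finite ordinals they can be tried out directly. The sketch below again assumes ordinals are modelled as Python frozensets; `von_neumann` and `supremum` are our own illustrative names. It computes the supremum as the union ⋃V and shows that the union of a successor α′ gives back its predecessor α. (For the limit ordinal ω the same union, taken over all finite ordinals, gives ω itself, which of course cannot be computed here.)

```python
from functools import reduce
from typing import Iterable


def von_neumann(n: int) -> frozenset:
    """The finite von Neumann ordinal n = {0, 1, ..., n - 1}."""
    alpha = frozenset()
    for _ in range(n):
        alpha = alpha | frozenset([alpha])  # successor step α′ = α ∪ {α}
    return alpha


def supremum(ordinals: Iterable[frozenset]) -> frozenset:
    """sup V = ⋃V = {γ | (∃η : η ∈ V : γ < η)}, reading γ < η as γ ∈ η."""
    return reduce(frozenset.union, ordinals, frozenset())


print(supremum({von_neumann(2), von_neumann(5), von_neumann(3)}) == von_neumann(5))  # True
# For a successor α′ = α ∪ {α}, the union of its elements gives back α.
print(supremum(von_neumann(4)) == von_neumann(3))  # True
print(supremum(set()) == von_neumann(0))           # True: sup ∅ = 0
```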
A full treatment of the theory of ordinal numbers is omitted here. Rigorous study has produced a complete calculus of ordinal numbers and significant results. We only mention here the so-called well-ordering theorem, which Cantor had accepted as true (see [18, page 257]) but which was first proved rigorously by Zermelo in 1904, using the axiom of choice.
Well-Ordering Theorem: Every set can be well-ordered.
This means that ordinals give us a way of ‘counting’ any set, even if it is not finite. The particular significance of the well-ordering theorem lies in the possibility of applying the principle of mathematical induction (which is well known for denumerable sets, see section 3.4.3) to any arbitrary set, once it has been well-ordered. Ordinal numbers form the basis of transfinite induction, which is a generalization of the principle of induction.
We now have the following properties (given without proof):
• Two finite ordered sets have the same order type if and only if they have the same cardinal number
• Cantor’s theorem: the cardinality of any set is smaller than the cardinality of the set of all its subsets (i.e. there is no highest aleph)
• If two well-ordered sets have the same ordinal number, they have the same cardinal number, but not necessarily vice versa
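Cantor’s theorem rests on the diagonal set D = {x | x ∉ f(x)}: for any f : V → P(V), D differs from every f(x) at the element x itself. The sketch below is a brute-force check for a three-element set only, an illustration rather than a proof; the function names are our own.

```python
from itertools import combinations, product


def power_set(v: list) -> list:
    """All subsets of v, as frozensets."""
    return [frozenset(c) for r in range(len(v) + 1) for c in combinations(v, r)]


def diagonal(f: dict) -> frozenset:
    """Cantor's diagonal set D = {x | x ∉ f(x)} for a function f : V → P(V)."""
    return frozenset(x for x, fx in f.items() if x not in fx)


def diagonal_is_never_hit(v: list) -> bool:
    """Check every f : V → P(V): the diagonal set D is never a value of f.

    D differs from f(x) at the element x itself, so no f can be onto P(V).
    """
    subsets = power_set(v)
    for images in product(subsets, repeat=len(v)):
        f = dict(zip(v, images))
        if diagonal(f) in images:
            return False
    return True


print(len(power_set([0, 1, 2])))         # 8 subsets for 3 elements
print(diagonal_is_never_hit([0, 1, 2]))  # True (all 8**3 = 512 functions checked)
```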
For more information and theory on cardinal numbers, ordinal calculus and set theory we refer to two classical books on set theory: [25] and [34]. The first one gives a good introduction to set theory and presupposes little mathematical knowledge; the latter is more suitable for readers with experience in set theory.
Chapter 4
Peano and Frege
4.1 Peano’s arithmetic
Questions that pertain to the foundations of mathematics, although treated by many in recent times, still lack a satisfactory solution. The difficulty has its main source in the ambiguity of language.
 Peano, in the opening of the paper ‘Arithmetices principia, nova methodo exposita’, in which he introduces axioms for the integers
The Italian mathematician Giuseppe Peano (1858–1932) spent most of his career working successively on the infinitesimal calculus, the foundations of mathematics and linguistic studies. After his work on calculus (see Peano’s first publication [65]) and geometry (see [66] [67]), Peano became particularly interested in the field of number theory, also known as arithmetic. Like Dedekind (see the quote on page 46), Peano became aware of the lack of rigour in mathematics through his experience in teaching infinitesimal calculus.
What is number theory? The field of mathematics that studies the properties of the natural numbers.
From then on, Peano strived for rigor, for an abstract mathematics. He came to the conclusion that mathematics must be constructed, independently of intuition or common sense, in a way that absolutely guarantees the validity of its theorems.
In order to satisfy this requirement he devoted himself to the transformation of mathematics into a self-contained system, and rewrote mathematics in symbolic form as an axiomatic system (see section 6.1), based exclusively on postulated primitive notions and primitive propositions. To discard intuition, he first renounced ordinary language (because it is often insufficient and imprecise) and wanted a new mathematical symbolism, consisting entirely of neutral symbols. Second, he formalized the logic of the mathematical argument, replacing intuitive inference by the application of a limited number of stated logical rules.
So Peano formalized both the language of mathematics and the logic of the mathematical argument; to this end he developed parts of symbolic logic and formalized propositional and predicate calculus. This development was rudimentary and would later be worked out in full detail by the mathematicians Russell and Whitehead in ‘Principia Mathematica’ (1910, see section 7.1). He introduced letters to denote propositions and propositional functions (Peano’s logic notation) and the symbol ∈ for the membership relation of a set.
The work of formalization of mathematics was published in the journal ‘Rivista di Matematica’ (a journal he had founded himself) and in ‘Formulario Mathematico’, a series of 5 books that is also known as ‘Formulaire de Mathématique’¹. In 1899 he axiomatized the arithmetic of cardinal numbers, which was published in the third volume of ‘Formulario Mathematico’ in 1901. Peano based the foundations of arithmetic on 5 axioms (see [31, page 227]), formulated with the help of three (undefined) terms whose meaning is assumed to be known:
a) N (the set of natural numbers)
b) 0 (the particular natural number zero)
c) a+ (the immediate successor of the natural number a)
¹ The original ‘Formulaire de Mathématique’ was called ‘Formulario Mathematico’ when the final version appeared in 1908, because Peano at that time consistently used Interlingua, his simplified dialect of Latin, for all his mathematical publications.
Definition of the Peano axioms for the natural numbers:
1) 0 ∈ N
(zero is a natural number)
2) a ∈ N → a+ ∈ N
(the immediate successor of any number is a number)
3) 0 ∈ S ∧ (∀x :: x ∈ S → x+ ∈ S) → N ⊆ S
(if a set S contains zero, and contains the immediate successor x+ of every number x it contains, then S includes the whole of N)
4) a, b ∈ N ∧ a+ = b+ → a = b
(no two different numbers have the same immediate successor)
5) a ∈ N → a+ ≠ 0
(zero is not the immediate successor of any number)
Axiom three formalizes the principle known as mathematical induction. It can be shown that the five axioms of Peano are derivable in ZF (see section 5.3). For more information on the Peano axioms, I refer to [31, chapter 5], [49, pages 146–147] and [64, appendix A].
After defining the natural numbers, Peano used recursive definitions to define the arithmetical sum, product and other operations, and he derived much of elementary number theory.
Example: Peano defined the sum a + b by recursion on b: a + 0 = a, a + (b+) = (a + b)+. Similarly we can define the product a ∗ b: a ∗ 0 = 0, a ∗ (b+) = (a ∗ b) + a.
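The two recursion equations can be transcribed almost literally. In the minimal sketch below (the names `Nat`, `succ`, `add` and `mul` are our own), a number is built only from 0 and the successor operation, and `add` and `mul` follow Peano’s equations by recursion on the second argument. The helpers `from_int` and `to_int` exist only to print results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nat:
    """A natural number built only from 0 and the successor operation.

    pred is None encodes 0; otherwise the number is pred+ (the successor of pred).
    """
    pred: Optional["Nat"] = None


ZERO = Nat()


def succ(a: Nat) -> Nat:
    return Nat(a)


def add(a: Nat, b: Nat) -> Nat:
    # a + 0 = a  and  a + (b+) = (a + b)+
    return a if b.pred is None else succ(add(a, b.pred))


def mul(a: Nat, b: Nat) -> Nat:
    # a * 0 = 0  and  a * (b+) = (a * b) + a
    return ZERO if b.pred is None else add(mul(a, b.pred), a)


def from_int(n: int) -> Nat:
    """Helper for printing only: n applications of succ to ZERO."""
    result = ZERO
    for _ in range(n):
        result = succ(result)
    return result


def to_int(a: Nat) -> int:
    """Helper for printing only: count the successor steps."""
    count = 0
    while a.pred is not None:
        a, count = a.pred, count + 1
    return count


print(to_int(add(from_int(2), from_int(3))))  # 5
print(to_int(mul(from_int(2), from_int(3))))  # 6
```

Because `Nat` is a frozen dataclass, equality is structural, so axioms 4 and 5 hold by construction: `succ(a) == succ(b)` only when `a == b`, and `succ(a)` is never equal to `ZERO`.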
Peano then showed how rationals and reals can be formally obtained from the naturals, and further considered elementary analysis and geometry. In later years, Peano turned away from the foundations of mathematics and devoted most of his time to his new international auxiliary language Interlingua. He invented this language (see [49, pages 148–150]) in an attempt to simplify the grammatical structure of languages and create a universal language. His mathematical work was to have a profound influence on mathematical thought, but his language Interlingua received little response.
4.2 Frege’s work
As I think about acts of integrity and grace, I realize that there is nothing in my knowledge to compare with Frege’s dedication to truth. His entire life was on the verge of completion, much of his work had been ignored to the benefit of men infinitely less capable, his second volume was about to be published, and upon finding that his fundamental assumption was in error, he responded with intellectual pleasure clearly submerging any feelings of disappointment. It was almost superhuman and a telling indication of that of which men are capable if their dedication is to creative work and knowledge instead of cruder efforts to dominate and be known.
 B. Russell about Frege, in [93, page 127]
The German mathematician and philosopher Gottlob Frege (1848–1925) was one of the founders of modern symbolic logic, and put forward the (logistic) view that mathematics is reducible to logic. He wrote many important papers on philosophy. Frege once said ‘every good mathematician is at least half a philosopher, and every good philosopher is at least half a mathematician’. He also criticized the ontological argument for the existence of God, but we will not discuss his philosophical writings here. We will mention his three most important works on the foundations of mathematics: Begriffsschrift, Grundlagen der Arithmetik and Grundgesetze der Arithmetik.
Begriffsschrift
Like Peano, the German mathematician Gottlob Frege invented a logical symbolism, to which he gave the name ‘Begriffsschrift’ (known in English as ‘Concept script’). We will not treat the symbolism used in Begriffsschrift in full detail here (it can be found in [49, pages 175–182] and in [31, pages 177–199]), but give a few examples of his new logic and describe the rest of his work in general terms.
Frege rejected the subject/predicate regimentation on which Aristotelian logic depends, and recognized (though not as the first) that the patterns of Aristotle cannot always be used to evaluate inferences correctly.
Example: Certain obvious inferences, such as:
If Joe doesn’t wear a kilt, then Joe is not Scottish.
Joe doesn’t wear a kilt.
Therefore, Joe is not Scottish.
do not fall under the patterns of traditional logic (the so-called syllogisms). Actually this is another kind of inference, which contains a conditional expression of the form:
if B then A
B
Therefore, A.
Frege adopted this new rule in the system of logic of his Begriffsschrift. With arbitrary expressions for A and B, the rule later became known as modus ponens. A logic that evaluates these sorts of expressions is called a propositional logic.
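Whether an inference pattern such as modus ponens is valid can be decided by a truth table: the inference is valid if every assignment of truth values that makes all premises true also makes the conclusion true. The sketch below is our own illustration (the helper names are hypothetical). For contrast it also tests the invalid pattern ‘if B then A; A; therefore B’, known as affirming the consequent.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication p → q."""
    return (not p) or q


def is_valid_inference(premises, conclusion, n_vars: int) -> bool:
    """True if every assignment making all premises true makes the conclusion true."""
    for values in product([False, True], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


# Modus ponens: from "if B then A" and "B", infer "A".  Variables: (a, b).
modus_ponens = is_valid_inference(
    premises=[lambda a, b: implies(b, a), lambda a, b: b],
    conclusion=lambda a, b: a,
    n_vars=2,
)
# Affirming the consequent: from "if B then A" and "A", infer "B".
affirming_consequent = is_valid_inference(
    premises=[lambda a, b: implies(b, a), lambda a, b: a],
    conclusion=lambda a, b: b,
    n_vars=2,
)
print(modus_ponens, affirming_consequent)  # True False
```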
What is propositional calculus (or sentential calculus)?
A symbolic system for treating compound propositions and their logical relationships. Compound propositions are formed from basic propositions using the standard connectives ∧, ∨, → and ¬; basic propositions are simple, unanalyzed propositions.
Frege based his propositional calculus on 6 axioms: for all x, y and z:
1 x → (y → x)
2 (x → (y → z)) → ((x → y) → (x → z))
3 (x → (y → z)) → (y → (x → z))
4 (x → y) → (¬y → ¬x)
5 ¬¬x → x
6 x → ¬¬x
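Each of the six axioms should be a tautology, true under every assignment of truth values to x, y and z. The sketch below (our own illustration) checks this by truth table, reading axioms 4 to 6 with the negation signs of Frege’s Begriffsschrift, i.e. (x → y) → (¬y → ¬x), ¬¬x → x and x → ¬¬x. The final line shows that axiom 4 without its negations, (x → y) → (y → x), is not a tautology.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication p → q."""
    return (not p) or q


# Frege's six propositional axioms, with axioms 4-6 read with negation.
AXIOMS = {
    1: lambda x, y, z: implies(x, implies(y, x)),
    2: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(implies(x, y), implies(x, z))),
    3: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(y, implies(x, z))),
    4: lambda x, y, z: implies(implies(x, y), implies(not y, not x)),
    5: lambda x, y, z: implies(not (not x), x),
    6: lambda x, y, z: implies(x, not (not x)),
}


def is_tautology(formula) -> bool:
    """True if the formula holds under all 8 assignments to x, y, z."""
    return all(formula(*values) for values in product([False, True], repeat=3))


print({n: is_tautology(f) for n, f in AXIOMS.items()})
# {1: True, 2: True, 3: True, 4: True, 5: True, 6: True}

# Without the negations, axiom 4 would read (x → y) → (y → x), which fails:
print(is_tautology(lambda x, y, z: implies(implies(x, y), implies(y, x))))  # False
```

A truth table only shows that the axioms are sound; it says nothing about which theorems Frege’s substitution and modus ponens rules can derive from them.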
Derivations in the propositional calculus were based on two procedures of substitution and the rule of modus ponens. For the full predicate calculus, three additional axioms were needed. For all x, y and (propositional functions) F:
7 (x = y) → (F(x) → F(y))
8 x = x
9 (∀x :: F(x)) → F(y)
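Axioms 7 to 9 can be sanity-checked in the same spirit, but only in finite models. The sketch below (our own illustration; the three-element domain is an arbitrary choice) verifies that the axioms hold for every propositional function F on a small domain and for all x and y. This checks soundness in these particular models and is not a proof of validity in general.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication p → q."""
    return (not p) or q


def all_predicates(domain: list):
    """Every propositional function F : domain → {False, True}, as a dict."""
    for values in product([False, True], repeat=len(domain)):
        yield dict(zip(domain, values))


def axioms_7_to_9_hold(domain: list) -> bool:
    """Check axioms 7-9 for every F and all x, y in a finite domain."""
    for F in all_predicates(domain):
        for x, y in product(domain, repeat=2):
            ax7 = implies(x == y, implies(F[x], F[y]))       # (x = y) → (F(x) → F(y))
            ax8 = x == x                                    # x = x
            ax9 = implies(all(F[d] for d in domain), F[y])  # (∀x :: F(x)) → F(y)
            if not (ax7 and ax8 and ax9):
                return False
    return True


print(axioms_7_to_9_hold([0, 1, 2]))  # True
```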
Frege presented this new logic in his ‘Begriffsschrift’ in 1879. It consists of three parts. In the first part he provides a list of inferences from which, he believes, all truths of logic can be derived. In the second part Frege then demonstrates the completeness of his logic (i.e. all inferences that can be shown to be valid using the techniques of Aristotelian or propositional logic can also be shown to be valid using only Frege’s laws and rules of inference). The third part of Begriffsschrift shows that logic alone suffices to show the validity of certain inferences (about properties that are hereditary in so-called ‘ancestral sequences’). He also showed that mathematical induction (see section 3.4.3) can be replaced by a principle about ancestral sequences that depends only on logical laws.
Grundlagen der Arithmetik
Throughout his work Frege developed the main thesis of logicism (he was the first to do so), namely that mathematics is reducible to logic. But to establish it, he had to do more than develop a new logical symbolism. His next book, ‘Die Grundlagen der Arithmetik’ (1884), was devoted to the ‘foundations of arithmetic’. In this book, Frege treated the foundations of arithmetic, based on the concept of (cardinal) number. He put forward the logicist philosophy that arithmetic could be founded upon logic alone, and he discussed the work of others in detail (see [49, pages 184–185]). In [31, page 183] we learn more about Frege’s philosophy. In the introduction of his book Frege announced his three guiding principles:
1) Always to separate sharply the psychological from the logical, the subjective from the objective
2) Never to ask for the meaning of a word in isolation, but only in the context of a proposition
3) Never to lose sight of the distinction between concept and object
In his book he presented his own theory of numbers, and wanted to show that all the truths of arithmetic are derivable from logical laws and definitions alone. He did this by sketching the proofs, without giving the official Begriffsschrift proofs of the truths of arithmetic. Before Frege could do that, he needed a new version of Begriffsschrift, both to accommodate the new requirements of his formalization of the concept of number and to fill in pieces that were simply missing.
Grundgesetze der Arithmetik
In his next three papers, ‘Function and Concept’, ‘On Sense and Meaning’ and ‘On Concept and Object’ (1892), he introduced all the modifications that he was to make to his language, Begriffsschrift, and to his logical system. During that period he also completed his definitions of the natural numbers and some of the proofs of simple truths of arithmetic from these definitions and logical laws. His new logical calculus included a symbolic representation of the truth value of any given proposition, which provided a shorter notation for many Begriffsschrift propositions. The calculus also had several other new logical and arithmetical symbols, one of the most important being a notation for what Frege called the ‘course-of-values’ of a propositional function. The course-of-values of a propositional function ϕ, denoted by Frege as ἐϕ(ε), represents the truth values of ϕ for all possible values of its argument (here ε). We denote it as cov and define equality of courses-of-values by cov(f) = cov(g) ↔ (∀a :: f(a) = g(a)); this is Frege’s Basic Law V. In 1893, Frege published the first volume of his ‘Grundgesetze der Arithmetik’, the ‘Basic Laws of Arithmetic’. It set out the new version of logic and began the proofs that were to make the project successful. In the second part Frege wanted to define the natural numbers and some basic laws governing them and, in the third part, he would define the real numbers and lay the foundations for expressing analysis in terms of logic. In 1902, when volume 2 was in press, he received a now famous letter from the English mathematician and logician Russell (see chapter 5), who pointed out, with great modesty, that a contradiction could be derived in Frege’s system (see section 5.1). This contradiction would later be named after Russell and become known as ‘Russell’s paradox’.
Hardly anything more unwelcome can befall a scientific writer than one of the foundations of his edifice be shaken after his work is finished. I have been placed in this position by a letter of Mr. Bertrand Russell just as printing of the second volume was nearing completion ....
 The first paragraph of the appendix from Frege’s ‘Grundgesetze der Arithmetik’
After many letters between the two (see for example [93, pages 124–128]), Frege modified one of his axioms and explained in an appendix to the book that this was done to restore the consistency of the system. However, with this modified axiom many of the theorems of volume 1 do not go through, and Frege must have known this. He probably never realized that even with the modified axiom the system is inconsistent, since this was not shown until after Frege’s death in 1925, by Leśniewski (see [85]).
The scope of Frege’s Grundgesetze is similar to that of Principia Mathematica (to be discussed in section 7.1), and both aimed at a logistic basis for mathematics, but thanks to Russell’s theory of types Principia Mathematica did not contain the paradox. Frege’s contribution to the foundations of mathematics was therefore largely indirect (through Principia Mathematica, see [49, page 181]). Although Frege attracted only a small audience in his lifetime, he was a major influence on Peano and Russell, and in the years thereafter his influence on contemporary philosophy, especially on thought about language and logic, has become ubiquitous.
In this text I have made extensive use of the excellent books [98] and [97] about Frege, which contain many more references about Frege and his work, and of chapter 4.5 of [31] and chapter 6, section 4 of [49].
Chapter 5
Russell
The fact that all Mathematics is Symbolic Logic is one of the greatest discoveries of our age; and when this fact has been established, the remainder of the principles of mathematics consists in the analysis of Symbolic Logic itself.
 B. Russell in Principles of Mathematics, 1903
The English logician and philosopher Bertrand Russell (1872–1970) published an incredible number of books during his long life, on logic, the theory of knowledge and many other topics. He was certainly one of the most important logicians and philosophers of the 20th century.
Russell’s private life, affairs, imprisonment, his social and political campaigns and advocacy of both pacifism and nuclear disarmament are certainly interesting, but we will not discuss these subjects here (for more information and references on Russell’s life and work see [62], [80] and [31, chapters 6, 7 and 11 and sections 8.2, 8.3, 8.4, 8.8.3, 8.9.2, 10.1, 10.2.1]). I quote the following assessment from [73]: “Bertrand Russell had one of the most widely varied and persistently influential intellects of the 20th century. During most of his active life, a span of three generations, Russell had at any time more than 40 books in print ranging over philosophy, mathematics, science, ethics, sociology, education, history, religion, politics and polemic. The extent of his influence resulted partly from his amazing efficiency in applying his intellect (he normally wrote at the rate of 3,000 largely unaltered words a day) and partly from the deep humanitarian feeling that was the mainspring of his actions. This feeling expressed itself consistently at the frontier of social change through what he himself would have called a liberal anarchistic, left-wing, and skeptical atheist temperament.”
Here, we will focus on Russell’s contributions to the foundations of mathematics. His contributions relating to mathematics include his discovery of Russell’s paradox, his defense of logicism (the view that mathematics is, in some significant sense, reducible to formal logic), his introduction of the theory of types, and his refining and popularizing of the first-order predicate calculus. Along with Kurt Gödel (see chapter 8), he is usually credited with being one of the two most important logicians of the twentieth century. We will look at each of these contributions in more detail.
Russell discovered the paradox which bears his name in 1901, while working on his ‘Principles of Mathematics’ (1903). The paradox and the closely related vicious circle principle are discussed in section 5.1. Russell’s own response to the paradox came with the introduction of types (see chapter 7). Using the vicious circle principle, also adopted by Henri Poincaré, together with his so-called ‘no-class’ theory of classes, Russell was then able to explain why the unrestricted comprehension axiom (see section 2.1) fails: propositional functions, such as ‘x is a set’, should not be applied to themselves, since self-application would involve a vicious circle. On this view, it is possible to refer to a collection of objects for which a given condition (or predicate) holds only if they are all at the same level or ‘type’.
Although first introduced by Russell in 1903 in the Principles, his theory of types finds its mature expression in his 1908 article ‘Mathematical Logic as Based on the Theory of Types’ and in the monumental work he co-authored with Alfred North Whitehead, ‘Principia Mathematica’ (1910, 1912, 1913). Principia Mathematica and the theory of types will be treated in detail in chapter 7. The theory admits of two versions, the ‘simple theory’ and the ‘ramified theory’. Both versions of the theory later came under attack. For some, they were too weak since they failed to resolve all of the known paradoxes. For others, they were too strong since they disallowed many mathematical definitions which, although consistent, violated the vicious circle principle. Russell’s response to the second of these objections was to introduce, within the ramified theory, the axiom of reducibility. Although the axiom successfully lessened the vicious circle principle’s scope of application, many claimed that it was simply too ad hoc to be justified philosophically.
Of equal significance during this same period was Russell’s defense of logicism, the theory that mathematics is in some important sense reducible to logic. First defended in his Principles, and later in more detail in ‘Principia Mathematica’, Russell’s logicism consisted of two main theses. The first is that all mathematical truths can be translated into logical truths or, in other words, that the vocabulary of mathematics constitutes a proper subset of that of logic. The second is that all mathematical proofs can be recast as logical proofs or, in other words, that the theorems of mathematics constitute a proper subset of those of logic.
Like Gottlob Frege, Russell based his defense of logicism on the idea that numbers may be identified with sets of sets and that number-theoretic statements may be explained in terms of quantifiers and identity. It followed that number-theoretic operations could be explained in terms of set-theoretic operations such as intersection, union, and the like. In ‘Principia Mathematica’ Whitehead and Russell were able to provide detailed derivations of many major theorems in set theory, finite and transfinite arithmetic, and elementary measure theory. A fourth volume on geometry was planned but never completed.
For more information on Russell’s theory of types and about Principia
Mathematica, we refer to chapter 7. In this chapter we used parts of [73]
and [39].
5.1 Russell’s paradox
I hoped sooner or later to arrive at a perfect mathematics which
should leave no room for doubts, and bit by bit to extend the sphere
of certainty from mathematics to other sciences.
 Russell, in [78]
Paradoxes have been known for a long time, but in particular with the introduction of more formal systems at the end of the 19th century, paradoxes became more influential on the foundations of mathematics. Before we describe the most famous paradox of Russell, we first define the notion of a paradox.
What is a paradox? A paradox is a statement which appears self-contradictory or contrary to expectations; it is also known as an antinomy.
In an axiomatic system (see section 6.1) a paradox is a derivation that leads to a contradictory statement.
A paradox is properly something which is contradictory to general opinion; but is frequently used to signify something self-contradictory [...] Paralogism, by its etymology, is best fitted to signify an offence against the formal rules of inference.
 De Morgan, in [31, page 310]
In [86], three ‘paradox threats’ are identified: when systems are complex, formal or designed for computers, there is often not enough intuition to notice inconsistencies. With the previously described formalizations, the systems of Cantor (see chapter 2), Peano (see section 4.1), Frege (see section 4.2), and not least Russell’s own were at risk. And indeed, in 1902 Russell showed that a paradox could be derived in Frege’s ‘Grundgesetze der Arithmetik’. The paradox turned out to strike at the very foundations of mathematics, since it could be formulated in all the systems mentioned above. We first formulate the paradox in Cantor’s set theory:
Russell’s paradox: Let R = {x | x ∉ x}. Then R ∈ R ↔ R ∉ R
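The heart of the argument is the single instance x := R of the defining condition x ∈ R ↔ x ∉ x. A brute-force search over small finite structures makes this concrete: whatever membership relation we impose on a few objects, no object has exactly the non-self-membered objects as its members. The sketch below is only an illustration over finite structures, and the function name is our own. The general argument is the one-line instantiation.

```python
from itertools import product


def russell_set_exists(n: int) -> bool:
    """Search every membership relation on {0, ..., n-1} for an element r with
    (∀x :: x ∈ r ↔ x ∉ x), i.e. a 'Russell set' R = {x | x ∉ x}."""
    pairs = [(x, y) for x in range(n) for y in range(n)]
    for bits in product([False, True], repeat=len(pairs)):
        member = dict(zip(pairs, bits))  # member[(x, y)] means x ∈ y
        for r in range(n):
            if all(member[(x, r)] == (not member[(x, x)]) for x in range(n)):
                return True
    return False


print([russell_set_exists(n) for n in range(1, 4)])  # [False, False, False]
```

The search always fails for the same reason as the paradox: at x = r the condition demands member[(r, r)] == not member[(r, r)].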
In 1901 Russell studied Cantor’s work [31, section 6.6.1]. After noting that some sets belong to themselves while the rest do not, Russell showed that the set of all sets which do not belong to themselves belongs to itself if and only if it does not. Russell also expressed this paradox in terms of predicates, and in that form first presented his discovery in a letter to Frege (see [93, page 124] and see also the quote on page 78).
Since Peano’s system was based on the set theory of Cantor, Peano’s work also contained the paradox. In Frege’s work (Grundgesetze der Arithmetik) self-application was not possible, so R ∈ R was not allowed, but the paradox could still be expressed by using Frege’s notion (see page 77) of the course-of-values of a function. If we define equal course-of-values cov by cov(f) = cov(g) ↔ (∀a :: f(a) = g(a)), we can derive the paradox in Frege’s work as follows (see also [86, page 7] for a slightly different proof):
Define f(x) := ¬(∀ϕ :: cov(ϕ) = x → ϕ(x)), and let K := cov(f).
f(K)
≡ ¦def. f¦
(∀ϕ :: cov(ϕ) = K → ϕ(K))
≡ ¦elim.¦
(∀ϕ :: cov(ϕ) = K → ϕ(K))
≡ ¦instantiate ϕ with f¦
cov(f) = K → f(K)
≡ ¦def. K, elim. →¦
f(K)
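The derivation above can also be checked mechanically. The following is a sketch in Lean 4 (core only), assuming we model Frege’s course-of-values as an arbitrary function `cov` from predicates to objects, and his rule for equal courses-of-values as the hypothesis `lawV`; the names `fregeF`, `cov`, `lawV` and `frege_inconsistent` are our own. The theorem shows that these assumptions alone yield a contradiction.

```lean
-- Frege's f(x): every φ whose course-of-values is x fails to hold at x.
def fregeF {α : Type} (cov : (α → Prop) → α) (x : α) : Prop :=
  ∀ φ : α → Prop, cov φ = x → ¬ φ x

-- `lawV` models equality of courses-of-values: cov(f) = cov(g) ↔ ∀a, f(a) ↔ g(a).
theorem frege_inconsistent {α : Type} (cov : (α → Prop) → α)
    (lawV : ∀ f g : α → Prop, cov f = cov g ↔ ∀ a, (f a ↔ g a)) : False :=
  -- With K := cov (fregeF cov): f(K) implies ¬f(K), hence ¬f(K).
  have h1 : ¬ fregeF cov (cov (fregeF cov)) :=
    fun hf => hf (fregeF cov) rfl hf
  -- Conversely, any φ with cov φ = K agrees with fregeF cov by lawV,
  -- so φ K would contradict h1; hence f(K).
  have h2 : fregeF cov (cov (fregeF cov)) :=
    fun φ hφ hφK => h1 (((lawV φ (fregeF cov)).mp hφ (cov (fregeF cov))).mp hφK)
  h1 h2
```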
The paradox had a great influence, since it could be formulated in all these systems, and in classical logic a contradiction entails every statement. In the eyes of many mathematicians (e.g. Hilbert, Brouwer) it therefore appeared that no proof could be trusted once the logic underlying all mathematics had been found to be inconsistent. Russell’s paradox arises from the so-called unrestricted or naive comprehension axiom of naive set theory (see page 16). Cantor used this axiom with the intuition that any coherent condition may be used to determine a set. But then the condition $\varphi$ that determines a set $V = \{x \mid \varphi(x)\}$ may depend on the whole set $V$, i.e. the axiom allows impredicative definitions (see below for the definition of impredicative). Most attempts at resolving Russell’s paradox have therefore concentrated on various ways of restricting or abandoning this axiom.
Before we consider the consequences of the discovery of the paradox, we first take a closer look at its nature, following Russell’s own analysis. While writing ‘The Principles’, Russell’s attention was drawn to what is now known as Cantor’s paradox, and (according to a letter he wrote to the French mathematician Jourdain) he found that there was something wrong with his earlier refutation of Cantor’s paradox (see [29, section 7]). He removed this refutation from ‘The Principles’, and his revised diagnosis uncovered a true paradox. As we have already seen, he summarized this discovery and the reasoning that led to it in a second letter to Frege.
After discovering his famous paradox, Russell traced the fallacy back to a vicious circle, which he excluded by what he called the ‘vicious circle principle’. The ‘vicious circle’ after which the principle is named arises from the assumption that a set of objects may contain members which can only be defined by means of the set as a whole. Russell therefore held that a statement is illegitimate, indeed meaningless, if it involves a set of objects containing members that presuppose this (total or whole) set. That means a statement is only legitimate if all propositions it contains refer to sets that have already been defined.
Definition of impredicative: A definition is impredicative if it involves a set $V$ that has a member $v \in V$ whose definition depends on $V$.¹
In a sense, impredicative definitions are thus circular, and they were considered the cause of the antinomies. For more information about impredicativity, see [57, section 15.3].
Definition of Vicious Circle Principle²: Definitions, assumptions or statements involving all of a set must not be a part or an element of that set. In other words, impredicative definitions should be avoided.
In terms of set theory we can formulate the principle as follows: no set $V$ is allowed to contain members $v$ definable only in terms of $V$, or members $v$ involving or presupposing $V$.
Vicious circle fallacies are arguments that are condemned by the vicious circle principle. Such arguments need not lead to contradictions (a fallacious argument can still reach a true conclusion).
In Principia Mathematica (see [31, section 7.2]), Russell assembles a collection of seven different paradoxes, all of which are based on the same circular type of reasoning, and then resolves them by making their circularity explicit. We will now mention eight of the best-known paradoxes, most of which arise from a violation of the vicious circle principle.
¹ Note that a direct implementation of this definition as a new axiom of set theory does not work. We might rephrase the definition as ‘whatever set contains an apparent element, that element must not be dependent on that set’. This might be implemented by fixing ‘an apparent element’ of a set and then expressing its independence of the other elements of that set. This independence means that, regardless of the nature of the other elements of the set, the nature of the apparent element remains the same. The ‘nature’ of an element can be seen as all the members of that element (or, in case the element is an individual, as that individual itself). This leads us to the following axiom:
$(\forall X :: (\forall x : x \in X : x = a \rightarrow (\forall x' : x' \in X \wedge x \neq x' : x' = b(x') \rightarrow a \in X)))$.
Clearly this does not avoid Russell’s paradox. Consider the set $X := R = \{x \mid x \notin x\}$ and an element $x \in R$, i.e. we have $x \notin x$. Although the set $X$ is ‘too large’, the axiom does not prohibit its existence. The axiom tells us $x = a \rightarrow (\forall x' : x' \in R \wedge x \neq x' : x' = b(x') \rightarrow a \in R)$. In other words, we can change each element of $R$ except $x$, and the nature of $x$ should not depend on it. The only thing we know about $x$ is that $x \notin x$ and $x \in R$. So to obtain a contradiction we would have to show that $x \in x \vee x \notin R$. Now we can change every $x'$ into any value $b(x')$, but we will still have $x \notin x$ and $x \in R$. So, unfortunately, this most ‘direct’ attempt to solve the paradox fails.
² Russell originally formulated it as ‘Whatever involves all of a collection must not be one of the collection’. Or, as formulated in [49, page 113]: ‘If, provided a certain collection had a total, it would have members only definable in terms of that total, then the said collection has no total’. Another formulation, from [87], says ‘No entity can be defined in terms of a totality of which it is itself a possible member’.
1 Russell’s paradox (1903), which we have discussed in this section. The impredicativity is clear in the definition of the set that contains all sets that are not members of themselves. There are many popularizations of this paradox; one of them is due to Russell himself (1919) and concerns the plight of the barber of a certain village, who has enunciated the principle that he shaves all those persons of the village, and only those, who do not shave themselves. The paradox is then formed by the question ‘Does the barber shave himself?’.
2 Burali-Forti’s paradox (1897), which we have discussed in section 3.8.2. The impredicativity comes from the ordinal number of the naturally ordered set of all ordinal numbers.
3 Cantor’s paradox, which we have discussed in section 3.8.1. The impredicativity comes from the cardinal number of the set of all sets.
4 The liar’s paradox: We quote from [49, page 127]: “If a man says ‘I am lying’, his utterance is self-contradictory, and it cannot be either true or false. The oldest form of this particular paradox, in the words of Principia Mathematica, is that of Epimenides the Cretan, ‘who said that all Cretans were liars, and all other statements made by Cretans were certainly lies’.”
5 Richard’s paradox: The French schoolteacher Jules Richard (1862-1956) published a paradox in [74] in 1905. He considered the set $V$ of all non-terminating decimals that can be defined in a finite number of words. By arranging $V$ as a sequence and applying Cantor’s diagonal argument to the members of $V$, he produced a non-terminating decimal that is not in $V$, although it is itself defined in a finite number of words.
6 Paradox of definitions. Again we quote from [49]: “The possible definitions of specific ordinal numbers can be arranged in a sequence, and there are therefore at most $\aleph_0$ of them. But the totality of ordinal numbers is not denumerable, and so there exist ordinal numbers which cannot be individually defined. Among such indefinable ordinals there is a least, and thus it appears that the description ‘the least indefinable ordinal’ yields a definition of an entity that cannot be defined.”
7 Berry’s paradox: “The least integer not nameable in fewer than nineteen syllables” is itself a name that contains only eighteen syllables.
8 The Grelling-Nelson paradox: In 1908 the German philosopher Kurt Grelling (1886-1942) published, together with his friend Leonard Nelson (1882-1927), the following paradox. As described in [31, page 336]: “Some words can be predicated of themselves: in English, ‘word’ is a word, ‘noun’ is a noun, and so on. This property is called ‘autological’, and is obviously itself autological. Other English words are not autological; ‘German’, say, or ‘verb’. They are called ‘heterological’, but this word is heterological if and only if it is not so.”
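The diagonal step in Richard’s argument (item 5) can be sketched for a finite list. In this illustration, the function name `diagonal_decimal` and the digit rule are our own assumptions: each decimal is given by a string of its first digits after the decimal point, and the new decimal differs from the n-th listed decimal in its n-th digit. Using only the digits 5 and 6 avoids the ambiguity of expansions such as 0.1999... = 0.2000....

```python
def diagonal_decimal(decimals: list[str]) -> str:
    """Return digits of a decimal that differs from the n-th input in its n-th digit.

    Each input is a string of digits after the decimal point and must have
    at least as many digits as there are inputs (hypothetical helper).
    """
    digits = []
    for n, d in enumerate(decimals):
        # Choose 5 unless the n-th digit already is 5; then choose 6.
        digits.append("6" if d[n] == "5" else "5")
    return "".join(digits)


listing = ["1415926", "7182818", "4142135", "5555555"]
new = diagonal_decimal(listing)
print("0." + new)  # 0.5556
```

Applied to an enumeration of all finitely definable decimals, the same rule is itself a finite definition, which is exactly the tension Richard pointed out.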
The first three paradoxes are logical paradoxes that can be formulated within Cantor’s set theory. The remaining five are semantic paradoxes, mainly paradoxes of naming. All these paradoxes have stimulated fundamental research, especially Russell’s paradox, which led to the vicious circle principle and first showed the need for a theory of types or some other restriction of the power of the comprehension axiom.
5.2 Consequences and philosophies
Perhaps the greatest paradox of all is that there are paradoxes in mathematics.
- E. Kasner and J. Newman, quoted in [46]
The various proposals to overcome the paradox led to various theories. One proposal was to reconstruct set theory on an axiomatic basis that is restrictive enough to exclude the paradoxes (this axiomatic method was first suggested by Hilbert, see section 6.1). Hilbert and other formalists had the basic idea of allowing only well-defined and finitely constructible objects, together with rules of inference that were deemed to be absolutely certain.
In 1908 the mathematician Zermelo made the first attempt to formulate proper axioms for set theory from which the paradox cannot be deduced, but from which most other parts of set theory can. This attempt was successful and, after a refinement by the mathematician Fraenkel, led to the ZF axiom system (see section 5.3), which is still the most widely accepted basis today. Subsequent refinements to ZF were made by Skolem, and later by the three mathematicians von Neumann, Bernays and Gödel (see section 8.5).
Russell’s own response to the paradox came with the introduction of his theory of types in Principia Mathematica (see section 5.4). Russell had already laid out a first version of this theory, intended to eliminate the paradoxes, in 1908. Since self-application (R ∈ R) caused a contradiction, he decided to suppress it. With this approach he assigned types to variables (as types he took sets) and allowed expressions such as x ∈ y only if the type of x is one less (in some order) than the type of y. Outlawing impredicative definitions seemed to solve the known paradoxes of set theory. But it turned out that essential and accepted parts of mathematics contain impredicative definitions. This was a serious problem for Russell’s solution, even though many impredicative definitions in mathematics could be circumvented. We quote from [22, page 265]: “In 1918, the German mathematician Hermann Weyl (1885-1955) tried to construct as much of analysis as possible from the natural number system without the use of impredicative definitions. Although he succeeded in obtaining a considerable part of analysis, he was unable to derive the important theorem that every nonempty set of real numbers having an upper bound has a least upper bound”.
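The typing restriction described above can be sketched as a small consistency check. Under the simplifying assumption that a formula is given only by its membership atoms (pairs `(x, y)` read as x ∈ y), the function below, whose name `assign_types` is hypothetical, tries to give each variable an integer type with type(y) = type(x) + 1 for every atom. This models only the basic rule ‘the type of x is one less than the type of y’, not the full (ramified) theory of Principia Mathematica.

```python
from __future__ import annotations

from collections import deque


def assign_types(atoms: list[tuple[str, str]]) -> dict[str, int] | None:
    """Give each variable an integer type such that type(y) == type(x) + 1
    for every atom (x, y), read as "x ∈ y".  Return None if this is impossible."""
    # Undirected graph; each edge records the required type difference.
    graph: dict[str, list[tuple[str, int]]] = {}
    for x, y in atoms:
        graph.setdefault(x, []).append((y, +1))
        graph.setdefault(y, []).append((x, -1))
    types: dict[str, int] = {}
    for start in graph:
        if start in types:
            continue
        types[start] = 0  # each connected group of variables gets its own base level
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w, offset in graph[v]:
                if w not in types:
                    types[w] = types[v] + offset
                    queue.append(w)
                elif types[w] != types[v] + offset:
                    return None  # conflicting requirements: not typable
    return types


print(assign_types([("x", "y"), ("y", "z")]))  # {'x': 0, 'y': 1, 'z': 2}
print(assign_types([("x", "x")]))              # None: the Russell condition x ∈ x
```

Only the differences between levels matter, which is why each connected group of variables may start at level 0.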
Other attempts to resolve the paradoxes of set theory focused on the foundations of logic. Luitzen Brouwer and the intuitionists took this approach and tried to prevent the paradoxes by rejecting the principle of the excluded middle (which states that any mathematical statement is either true or false). Brouwer first attacked the logical foundations of mathematics in his doctoral thesis in 1907; this formed the beginning of the Intuitionist School. The intuitionists had the basic idea that one cannot assert the existence of a mathematical object unless one can also indicate how to construct it.
In the period after the discovery of the paradoxes, we distinguish three
main philosophies of mathematics: logicism, intuitionism and formalism.
What is Logicism? A school of mathematical thought which holds the
thesis that mathematics is a part of (or a branch of) logic.
Logicists contend that all of mathematics can be deduced from pure logic, without the use of any specifically mathematical concepts, such as number or set. The first ideas date back to Leibniz (1646-1716); the actual reduction of mathematics to logic was started by Dedekind (1888) and Frege (1884-1903), and was continued later by Peano and by Whitehead and Russell (in Principia Mathematica, 1910-1913).
What is Intuitionism? A school of mathematical thought founded by the 20th-century Dutch mathematician L.E.J. Brouwer (1881-1966), which contends that the primary objects of mathematical discourse are mental constructions governed by self-evident laws.
Intuitionists have challenged many of the oldest principles of mathematics as being nonconstructive (and hence meaningless). They proposed that a proof in mathematics should be accepted only if it constructs the mathematical entity it talks about, and not if it merely shows that the entity ‘could’ be constructed or that supposing its nonexistence would result in a contradiction.
Brouwer had the fundamental insight that such nonconstructive arguments are avoided if one abandons a principle of classical logic (which lies, for example, behind one of De Morgan’s laws). This is the principle of the excluded third (or excluded middle), which asserts that for every proposition $\varphi$, either $\varphi$ or $\neg\varphi$; or, equivalently, that for every $\varphi$, $\neg\neg\varphi$ implies $\varphi$. This principle is basic to classical logic and had already been enunciated by Aristotle, though with some reservations, as he pointed out that the statement “there will be a sea battle tomorrow” is neither true nor false.
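The difference between the two formulations can be illustrated in Lean 4 (a sketch; the theorem names are our own). The first statement, that the excluded middle cannot be refuted, is provable without any classical axiom; the second, double-negation elimination, is obtained here only through Lean’s classical axioms.

```lean
-- Provable intuitionistically: the excluded middle cannot be refuted.
theorem not_not_excluded_middle (p : Prop) : ¬¬(p ∨ ¬p) :=
  fun h => h (Or.inr (fun hp => h (Or.inl hp)))

-- Double-negation elimination, by contrast, is proved with a classical
-- principle (`Classical.byContradiction`, derived from `Classical.choice`).
theorem double_negation_elim (p : Prop) (h : ¬¬p) : p :=
  Classical.byContradiction h
```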
Because of the weight it places on mental apprehension through the construction of purported mathematical entities, intuitionism is sometimes also called constructivism. A still more severe form of constructivism, which we will not discuss further, is strict finitism, in which one rejects infinite sets. More information on intuitionism can be found in [60].
What is Formalism? A school of mathematical thought introduced by the 20th-century mathematician David Hilbert, which holds that all mathematics can be reduced to rules for manipulating formulas without any reference to the meanings of the formulas.
Formalists contend that it is the mathematical symbols themselves, and not any meaning that might be ascribed to them, that are the basic objects of mathematical thought. Hilbert’s program was to concentrate on the formal language of mathematics and to study its syntax. Statements about such a formal system, for instance that it is consistent, are metatheorems: theorems about the syntax of mathematics, proved by reasoning about formulas as strings of symbols.
These three philosophies do not necessarily contradict each other, and all of them are still advocated today. Whether the logicist thesis has been established seems to be a matter of opinion. Though the logicist program was successful, it can be questioned on the ground that the systematic development of logic presupposes mathematical ideas in its formulation. The intuitionists succeeded in rebuilding large parts of present-day mathematics, but a large part is still missing, which makes intuitionist mathematics less powerful and in many respects much more complicated than classical mathematics. These are serious objections to the intuitionistic approach, but it is generally conceded that its methods do not lead to contradictions, and some hope for a new intuitionist reconstruction of mathematics carried out in a different and more successful way. Unfortunately for the formalists, a consequence of Gödel’s incompleteness theorem (see chapter 8) is that the consistency of a sufficiently strong formal system of mathematics cannot be proved within that system itself, but only in a stronger one. Yet formalism is not dead: most pure mathematicians are tacit formalists, but the naive attempt to prove the consistency of mathematics in a weaker system had to be abandoned. From [11, item from Paul Bernays] we learn that most mathematicians of all three philosophies are also philosophical realists: “While no one, except an extremist intuitionist, will deny the importance of the language of mathematics, most mathematicians are also philosophical realists who believe that the words of this language denote entities in the real world. Following the Swiss mathematician Paul Bernays (1888-1977), this position is also called Platonism, since Plato believed that mathematical entities really exist.” For more information about realism, see [57].
5.3 Zermelo Fraenkel
5.3.1 Axiomatic set theory
After the discovery of Russell’s paradox, it became clear that set theory needed a new and more rigorous basis. Hilbert’s proof theory, which will be treated in more detail in section 6.1, offered a way to put set theory on firm and hopefully consistent grounds. The so-called ideal calculus was a first formalization of Cantor’s set theory, but it lacked the precision of Hilbert’s later theories and was inconsistent because it still contained the (naive) comprehension principle in some form (see page 16). The first real axiomatization of set theory was given in 1908 by the German mathematician Ernst Zermelo in [101]. The attitude adopted in his axiomatic development of set theory is that it is not necessary to know what ‘sets’ and the ‘things’ that are their elements are, nor what the ‘membership relation’ means (see [49, page 288, paragraph 1]). Instead, Zermelo postulated a domain B of abstract objects and represented the elements or ‘things’ of this domain by the letters a, b, c, .... He then defined the primitive notions of equality and membership: a = b states that ‘a’ and ‘b’ designate the same ‘thing’; a ∈ b is defined on the domain B, and if a ∈ b holds, we call b a set and a an element of this set. Thus some, but not necessarily all, objects of B are sets. The assumptions adopted about these notions are called the axioms of the theory. Its theorems are the axioms together with the statements that can be deduced from the axioms using the rules of inference of some system of logic (see also section 6). Criteria for the choice of axioms have been identified by several authors (see Hilbert’s theory in section 6, or [49, last sentence of page 287]). The most accepted criteria (defined more formally in chapter 6) include:
1. Consistency of the system (it should be impossible to derive both a statement and its negation; in particular, the paradoxes should be avoided).
2. Plausibility (the axioms should be in accord with intuitive beliefs about sets, see [60]).
3. Completeness (richness of the theory: the desirable results of Cantorian set theory ought to be derivable as theorems).
In the next subsection we present the axioms that Zermelo chose and that formed the basis for all later axiomatizations of set theory (see also section 8.5).
5.3.2 Zermelo Fraenkel (ZF) Axioms
Zermelo formulated his axiomatic system in 1908; Fraenkel’s extensions date from 1922. In the same year (1922) the Norwegian mathematician Skolem (1887-1963) proposed a formal language for formulating the theory.
Zermelo noted that the sets involved in a derivation of the paradoxes are very large³: for Cantor’s paradox it is the set of all sets (see section 3.8.1), for Russell’s paradox it is the set of all sets which are not members of themselves (see section 5.1), and for the Burali-Forti paradox (see section 3.8.2) it is the set of all well-orderings. He therefore wanted to restrict the size of sets, and he replaced the (naive) comprehension principle by his separation axiom, so that the paradox could no longer be derived:
Separation Axiom: (∀z∃y∀x :: (x ∈ y ↔ x ∈ z ∧ ϕ(x)))
For every set z and definite⁴ property ϕ of sets there exists a set whose elements are exactly those of z having the property ϕ.
There are also certain limitations on the property ϕ (it should be definite) that we will mention later in section 8.5. We show that the standard derivation of Russell’s paradox cannot be carried out when the naive comprehension axiom is replaced by the separation axiom.
Let $R = \{x \mid x \in Z \wedge x \notin x\}$. Then
$R \in R \leftrightarrow R \in Z \wedge R \notin R$, so $R \in R \rightarrow R \notin R$, a contradiction; hence $R \notin R$.
$R \notin R \leftrightarrow R \notin Z \vee R \in R$, and since $R \notin R$, this gives $R \notin Z$.
In both equivalences above we can only conclude $R \in R \leftrightarrow R \notin R$ if we know that $R \in Z$. Since nothing forces $R \in Z$ (on the contrary, the argument shows $R \notin Z$), Russell’s derivation of his paradox does not apply.
³ The term proper class is sometimes used to refer to these ‘excessively large’ collections, which are not sets; ordinary sets are then referred to as improper classes. Thus all sets are classes, but not every class is a set.
⁴ See section 8.5 for the definition of the concept of definiteness.
However, this fact alone does not guarantee that no paradox exists, as some articles claim; it merely shows that the separation axiom does not permit the construction of paradoxical sets whose elements are defined in terms of the sets themselves. Until consistency is proved, there might be other, less obvious ways to construct a paradox.
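On a finite domain, the relativized argument can be checked exhaustively. The sketch below is an illustration, not a proof for arbitrary domains, and its names are hypothetical. It runs through every membership relation on a three-element domain and every object z, forms the separated set {x ∈ z | x ∉ x}, and confirms that no object having exactly these members belongs to z.

```python
from itertools import product

DOMAIN = range(3)
PAIRS = [(x, y) for x in DOMAIN for y in DOMAIN]


def check_relative_russell() -> int:
    """For every membership relation on DOMAIN and every z, check that an
    object r whose members are exactly {x ∈ z | x ∉ x} is never itself in z.
    Returns the number of (relation, z) cases examined."""
    cases = 0
    for bits in product([False, True], repeat=len(PAIRS)):
        mem = {pair for pair, b in zip(PAIRS, bits) if b}  # (x, y) means x ∈ y
        members = {y: {x for x in DOMAIN if (x, y) in mem} for y in DOMAIN}
        for z in DOMAIN:
            # Separation: the Russell set relative to z.
            russell_z = {x for x in members[z] if (x, x) not in mem}
            for r in DOMAIN:
                if members[r] == russell_z:
                    # r ∈ z would give r ∈ r ↔ r ∉ r.
                    assert (r, z) not in mem
            cases += 1
    return cases


print(check_relative_russell())  # 1536 = 512 relations × 3 choices of z
```

The assertion inside the loop would fail if some case produced a member of z with exactly the separated members; it never does.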
We now give all of the ZF axioms that constitute set theory. The first seven axioms are those originally formulated by Zermelo. Axioms 8 and 9 were later added by Fraenkel and von Neumann respectively. Axioms 1 through 8 form the original set of Zermelo-Fraenkel axioms.
In the definitions below we use several shorthand notations. If we wish, however, we can express these definitions in full detail, so that the notation of each expression does not depend on previous axioms. For example, in axiom 8 we use ∃! to denote that there is exactly one y, in axiom 9 we use the symbols ∩ and ∅, and in axiom 6 we use x ⊆ z as a shorthand for (∀y :: y ∈ x → y ∈ z). The separation and substitution axioms are actually axiom schemes.
The Zermelo-Fraenkel axioms:
1. Extensionality axiom (or axiom of determination):
$(\forall x, y :: (\forall z :: z \in x \leftrightarrow z \in y) \rightarrow x = y)$
Sets are uniquely determined by their members, or, to be exact: if every element of a set x is at the same time an element of y, and conversely, then x = y.
2. Axiom of the empty set:
$(\exists x \forall y :: y \notin x)$
There is an (improper, see also the footnote on page 93) set, the ‘null’ or ‘empty’ set, which contains no elements at all.
3. Separation axiom:
$(\forall z \exists y \forall x :: x \in y \leftrightarrow x \in z \wedge \varphi(x))$, where $\varphi$ is definite and does not contain $y$.
For every set z there exists a set y whose elements are exactly those of z having the property ϕ.
4. Pairing axiom:
$(\forall a, b :: (\exists y \forall x :: x \in y \leftrightarrow x = a \vee x = b))$
Given two sets a and b there exists a set whose elements are exactly a and b.
5. Sumset axiom or Union axiom:
$(\forall z \exists y \forall x :: x \in y \leftrightarrow (\exists w :: w \in z \wedge x \in w))$
For every set z there exists a set y whose elements are exactly those objects occurring in at least one element of z.
6. Power set axiom:
$(\forall z \exists y \forall x :: x \in y \leftrightarrow x \subseteq z)$
For every set z there is a set y whose elements are exactly the subsets of z.
7. Axiom of infinity:
$(\exists z :: \emptyset \in z \wedge (\forall a : a \in z : \{a\} \in z))$
There exists a successor set.
8. Axiom of replacement or axiom of substitution (by Fraenkel):
$(\forall x \exists! y :: \varphi(x, y)) \rightarrow (\forall a :: (\exists b \forall y :: y \in b \leftrightarrow (\exists x : x \in a : \varphi(x, y))))$
The image of a set under an operation ϕ (functional property) is again a set.
9. Axiom of foundation or axiom of regularity (by von Neumann):
$(\forall a :: a \neq \emptyset \rightarrow (\exists b :: b \in a \wedge b \cap a = \emptyset))$
Every nonempty set is disjoint from at least one of its elements.
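For hereditarily finite sets, several of these axioms can be mirrored directly with Python’s immutable `frozenset`. This sketch (the function names are our own) builds the sets promised by the pairing, union, power set and separation axioms, and checks the condition of the foundation axiom. Since a `frozenset` can never contain itself, every set built this way is well-founded, so the foundation check always succeeds here; the axiom of infinity has no finite counterpart.

```python
from itertools import combinations

EMPTY = frozenset()


def pair(a, b):
    """Pairing: the set whose elements are exactly a and b."""
    return frozenset({a, b})


def union(z):
    """Sumset/union: all objects occurring in at least one element of z."""
    return frozenset(x for w in z for x in w)


def power(z):
    """Power set: all subsets of z."""
    elems = list(z)
    return frozenset(frozenset(c) for k in range(len(elems) + 1)
                     for c in combinations(elems, k))


def separate(z, phi):
    """Separation: the elements of z having the property phi."""
    return frozenset(x for x in z if phi(x))


def satisfies_foundation(a):
    """True if a is empty or some element b of a is disjoint from a."""
    return not a or any(not (b & a) for b in a)


one = pair(EMPTY, EMPTY)   # {∅}
two = pair(EMPTY, one)     # {∅, {∅}}
print(len(power(two)))                                  # 4
print(union(pair(one, two)) == two)                     # True
print(len(separate(power(two), lambda x: len(x) == 1))) # 2
print(satisfies_foundation(two))                        # True
```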
Theorem: (from [49, chapter 11]) The domain B itself (see page 92) is not a set.
Proof: Suppose V is any given set. Then, by the separation axiom⁵, V has a subset W that consists of those elements of V that are not members of themselves. But then W is not an element of itself (because in that case we would have W ∈ W, while W consists of elements that are not members of themselves). Nor can W be an element of V − W, since then we would also have W ∈ W. This means that W is not a member of V. But W is certainly in B, and therefore B is not the same as V. Thus B cannot coincide with any set at all.
⁵ Since the property x ∉ x is definite. See section 8.5 for the definition of the concept of definiteness.
The theory is not complete, since many statements are independent of ZF. The following two statements, which are also independent of the previous axioms, have a more disputed status (and are not part of standard ZF):
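10. Axiom of choice (AC):
$(\forall x :: (\exists f : f \text{ is a function} : \mathrm{Dom}(f) = x - \{\emptyset\} \wedge \mathrm{Ran}(f) \subseteq \bigcup x \wedge (\forall a : a \in \mathrm{Dom}(f) : f(a) \in a)))$
Every set x has a choice function.
Definition of choice function: A function f is called a choice function for the set V if $\mathrm{Dom}(f) = V - \{\emptyset\} \wedge (\forall v : v \in \mathrm{Dom}(f) : f(v) \in v)$.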
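For a finite family no axiom is needed to obtain a choice function. As a sketch (the name `choice_function` is hypothetical, and we assume the members are sets of integers so that `min` is available), the rule ‘take the least element’ yields a function with domain x − {∅} that picks an element from each member. The axiom only becomes significant for infinite families, where no such explicit rule may be available.

```python
def choice_function(x: set) -> dict:
    """A choice function for a finite family x of sets of integers:
    defined on x - {∅}, it picks an element f(a) ∈ a from each member a."""
    # For finite families an explicit rule suffices; here: the least element.
    return {a: min(a) for a in x if a}


family = {frozenset({3, 1}), frozenset(), frozenset({7})}
f = choice_function(family)
assert frozenset() not in f        # Dom(f) = x - {∅}
assert all(f[a] in a for a in f)   # f(a) ∈ a
print(f[frozenset({3, 1})])        # 1
```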
11. Generalized Continuum Hypothesis (GCH):
For any cardinal $\aleph_r$, $\{0, 1\}^{\aleph_r} = \aleph_{r+1}$
In 1908 Felix Hausdorff proposed this generalization of CH. Another formulation of this axiom and more information are given in section 3.6. In the remainder of this section we give a short explanation of the nature of the other axioms. For more detailed information, we refer to section 8.5 and to the rich literature on set theory (for example [17], [24], [49, chapter 11], [28]).
The axioms are not minimal. For example, as we have already seen in section 2.2⁶, the axiom of the empty set can be deduced from the separation axiom. We also have: empty set axiom + substitution axiom $\vdash$ separation axiom. We have also seen in section 2.2 how we can define basic operations with the extensionality and separation axioms. The pairing, sum and power set axioms guarantee the existence of pairs, sums and power sets of sets, and together with the extensionality axiom they ensure their uniqueness. With these axioms alone we can already create an infinite number of sets. However, each set constructed with axioms 1 to 6 only has a finite number of elements. It is the infinity axiom that we need to create infinite sets. Successor sets are not unique, but the smallest successor set, denoted ω, is unique. We call its elements the natural numbers. With this axiom we can now also prove the principle of induction for ω (see section 3.4.3). The substitution axiom says that whenever ϕ is a property of sets such that to every x there is exactly one y for which ϕ(x, y), and a is a set, then there exists a set whose elements are exactly those y for which an x ∈ a exists such that ϕ(x, y). The foundation axiom says that each nonempty set has an epsilon-minimal element (see below). One implication of this axiom is that there is no function f defined on ω such that (∀i : i ∈ ω : f(i + 1) ∈ f(i)). For a motivation and analysis of the role of the foundation axiom we refer to [17, section 2.1].
⁶ The existence of the empty set in section 2.2 was actually derived from the comprehension principle, but the result can similarly be obtained from the separation axiom.
Definition of epsilon-minimal:
An element b ∈ a is epsilon-minimal in a if b ∩ a = ∅.
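As a small illustration (the names are our own), the following sketch builds the Zermelo numerals 0 = ∅ and n + 1 = {n}, following the form of the axiom of infinity given above, and computes the epsilon-minimal elements of a finite set of them.

```python
EMPTY = frozenset()


def zermelo_numeral(n: int) -> frozenset:
    """The n-th Zermelo numeral: 0 = ∅ and n + 1 = {n}."""
    a = EMPTY
    for _ in range(n):
        a = frozenset({a})
    return a


def epsilon_minimal(a: frozenset) -> set:
    """All elements b of a with b ∩ a = ∅."""
    return {b for b in a if not (b & a)}


nums = frozenset(zermelo_numeral(n) for n in range(4))  # {0, 1, 2, 3}
print(epsilon_minimal(nums) == {EMPTY})          # True: only 0 = ∅ is epsilon-minimal
print(zermelo_numeral(2) in zermelo_numeral(3))  # True: 2 ∈ 3
```

Every other numeral n + 1 = {n} shares its element n with the set, so only 0 is epsilon-minimal, as the foundation axiom requires at least one to be.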
Another corollary of the foundation axiom is that there is no set which
has itself as its only element. Note that to prevent the paradoxes we need
the separation axiom, not the foundation axiom.
The origin of the axiom of choice was Cantor’s recognition of the importance of being able to well-order arbitrary sets, i.e. to define an ordering relation for a given set such that each nonempty subset has a least element. The virtue of a well-ordering for a set is that it offers a means of proving that a property holds for each of its elements by a process (transfinite induction) similar to mathematical induction. Zermelo (1904) gave the first proof that any set can be well-ordered. His proof employed a set-theoretic principle that he called the axiom of choice, which shortly thereafter was shown to be equivalent to the so-called well-ordering theorem. A choice function for a set A ‘chooses’ an element from each nonempty element of A: if x is a nonempty set whose elements are nonempty sets, then there exists a function f with domain x such that for every member a of x, f(a) ∈ a. For a more detailed discussion of the axiom of choice we refer to [17, section 2.9].
Intuitively, the axiom asserts the possibility of making a simultaneous choice of an element in every nonempty member of any set; this guarantee accounts for its name. The assumption is significant only when the set has infinitely many members. Zermelo was the first to state the axiom explicitly, although it had been used, essentially unnoticed, before. It soon became the subject of vigorous controversy because of its nonconstructive nature. A few mathematicians feel that the use of the axiom of choice is improper, but to the vast majority it, or an equivalent assertion, has become an indispensable and commonplace tool. For this discussion of the axiom of choice we have used [63], [77] and [11].
A discussion of the Generalized Continuum Hypothesis can be found in
section 3.7.
Chapter 6
Hilbert
The further a mathematical theory is developed, the more harmoniously and uniformly does its construction proceed, and unsuspected relations are disclosed between hitherto separated branches of science.
- Hilbert, quoted in [76]
David Hilbert (1862-1943) was a German mathematician who reduced geometry to a series of axioms and contributed substantially to the establishment of the formalistic foundations of mathematics. His first work was on invariant theory, and in 1888 he proved his famous Basis theorem (see [5]). After that he did significant work in algebraic number theory, and published his ‘Zahlbericht’, or ‘Report on the theory of numbers’, in 1897. In 1899 he published the ‘Grundlagen der Geometrie’ (which appeared in English as ‘The Foundations of Geometry’ in 1902), which contained (see [31, section 4.7.2]) what would become a widely accepted set of 21 axioms for Euclidean geometry, together with an analysis of their significance. The axiomatic method that Hilbert used there (for geometry, although its concept is more general and it can be applied far beyond the domain of geometry, see also [57, section 14.7]) will be treated in section 6.1. A substantial part of Hilbert’s fame rests on a list of 23 mathematical problems that he outlined in 1900 and posed as a challenge for the next century. Some of these problems were related to the foundations of mathematics (see section 6.2). In 1905 Hilbert attempted to lay a firm foundation for mathematics by proving its consistency, an effort resulting in the two volumes of ‘Grundlagen der Mathematik’, which were intended to lead to a proof theory. Although Kurt Gödel showed in 1931 that this goal was unattainable (see chapter 8), Hilbert’s work on the foundations of mathematics remained influential in the development of logic. Hilbert’s work on integral equations around 1909 (see [45]) led to research in functional analysis and established the basis for his work on infinite-dimensional space, later called Hilbert space (see [22, page 232]). When Hilbert was made an honorary citizen of Königsberg he gave an address which ended with six famous words, showing his enthusiasm for mathematics and his optimism about solving mathematical problems: “There are absolutely no unsolvable problems. Instead of the foolish ignorabimus [Latin for ‘we shall not know’], our answer is on the contrary: Wir müssen wissen, Wir werden wissen” [We must know, We shall know].
6.1 Hilbert’s proof theory
Hilbert formalized mathematical theories in order to turn them into well-defined objects of discussion, thus making possible a new kind of investigation, to which he gave the name metamathematics. Hilbert was the first to emphasize that the strict formalization of a theory involves total abstraction from meaning; the result is called a formal system or formalism. In its structure, a formalized theory is no longer a system of meaningful propositions but one of sentences as sequences of words, which in turn are sequences of letters (a symbolic language). Hilbert’s method of making the formal system as a whole the object of mathematical study is called metamathematics or proof theory.
What is metamathematics? It is the study of mathematics itself: with
respect to formalized mathematical systems, metamathematics consists of
statements about the signs and formulas occurring within axiomatic
systems. One of the primary goals of metamathematics is to determine the
nature of mathematical reasoning.
After Hilbert presented an axiomatic development of geometry in ‘Grundlagen
der Geometrie’ (1899), he devoted himself to the much greater task of
applying his new metamathematical method to pure mathematics as a whole.
As Hilbert wrote in 1917: “Since the examination of the consistency is a
task that cannot be avoided, it appears necessary to axiomatize logic itself
and to prove that number theory and set theory are only parts of logic”.
Hilbert took a formal(istic) approach to achieve this logicist goal (logicism
takes logic as the basis of mathematics, whereas the formalists attempted to
axiomatize mathematics successfully; see also the philosophies in section
5.2). To this end Hilbert identified three properties that an axiomatic
system should have: it should be decidable, complete and consistent. In
order to define these notions, we first have to make some other concepts
precise.
Definition of an axiom:
A proposition that is regarded as true without proof.
Definition of free variable:
A variable (occurrence) that is not bound by a quantifier, i.e. that does not
lie within the scope of a quantifier over that same variable.
An axiom that does not contain any variables is also called an axiom
statement; an axiom with free variables is called an axiom scheme, and each
free variable is to be quantified over all well-formed formulas.
Definition of statement (or sentence): A well-formed formula with no
free variables.
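The notions of free variable and statement are easy to mechanize. The Python sketch below is only an illustration under our own assumptions: formulas are nested tuples with hypothetical tags such as "var", "rel" and "forall", a quantifier removes its variable from the free variables of its scope, and a formula counts as a statement exactly when no free variables remain.

```python
# Formulas as nested tuples (a hypothetical representation for illustration):
#   ("var", name), ("const", name), ("rel", symbol, arg, ...),
#   ("not", f), ("and", f, g), ("or", f, g), ("forall", name, f)


def free_vars(f):
    """Return the set of variables that occur free in formula f."""
    tag = f[0]
    if tag == "var":
        return {f[1]}
    if tag == "const":
        return set()
    if tag == "rel":
        return set().union(*(free_vars(arg) for arg in f[2:]))
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "or"):
        return free_vars(f[1]) | free_vars(f[2])
    if tag == "forall":
        # the quantifier binds its variable everywhere inside its scope
        return free_vars(f[2]) - {f[1]}
    raise ValueError(f"unknown formula tag: {tag!r}")


def is_statement(f):
    """A statement (sentence) is a formula without free variables."""
    return not free_vars(f)
```

For instance, R0(x, y) has the free variables x and y, ∀x R0(x, y) has only y, and ∀x∀y R0(x, y) is a statement.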
Of the systems to which Hilbert’s proof theory applies, we here consider
those susceptible to Gödel’s incompleteness theorem (which will be presented
in chapter 8).
Definition of an STGA language: A language¹ $L$ is Susceptible to
Gödel’s argument (STGA) if it consists of:
1 $\mathcal{E}$, a denumerable set of (well-formed) expressions (also called
formulas) of $L$
2 $\mathcal{S} \subseteq \mathcal{E}$, the sentences of $L$ (i.e. expressions
with no free variables)
3 $\mathcal{P} \subseteq \mathcal{S}$, the provable sentences of $L$
4 $\mathcal{R} \subseteq \mathcal{S}$, the refutable sentences of $L$
5 $\mathcal{H} \subseteq \mathcal{E}$, the predicates of $L$ (i.e. expressions
with free variables, so $\mathcal{H} \cap \mathcal{S} = \emptyset$). For
convenience, we here assume predicates to have exactly one free variable.
6 A function $\phi : \mathcal{E} \times \mathbb{N} \to \mathcal{E}$ that assigns
to every $E \in \mathcal{E}$ and $n \in \mathbb{N}$ an expression $E(n)$, such
that for every $H \in \mathcal{H}$ and every $n \in \mathbb{N}$, $H(n)$ is a
sentence, i.e. $H(n) \in \mathcal{S}$.
We can think of such a function $\phi$ as a substitution function.
Informally, the sentence $H(n)$ expresses the proposition that the number
$n$ belongs to the set named by $H$.
The following set is the only one that depends on a semantic
interpretation of the expressions, and is normally determined by a
model that we accept as representing the truth. The model should
be distinguished from the set of derivation rules that (syntactically
or mechanically) determine whether sentences are provable or
refutable. It is important to realize that the truth of a sentence
is not the same as the provability of that sentence.
7 $\mathcal{T} \subseteq \mathcal{S}$, the true sentences of $L$. This set can
be determined by a model (see page 107).
¹ Sometimes also called a system, since it not only defines a language but
also includes the (dis)provability and truth of expressions.
First, we give an intuitive explanation of this definition. In most parts of
mathematics, not every sequence of symbols is meaningful or useful.
Therefore we only consider the so-called well-formed formulas
$\mathcal{E}$. Some of these formulas (also called propositions) do not
contain free variables; we name them sentences ($\mathcal{S}$). Some
sentences are provable from the axiomatic system (i.e. they can be derived
from the axioms and derivation rules of the axiomatic system); these are the
elements of $\mathcal{P}$. Others are refutable, also called disprovable
(i.e. their negation can be derived from the axioms and derivation rules of
the axiomatic system); these are the elements of $\mathcal{R}$. These
notions only depend on whether the sentence is derivable from the axiomatic
system and are independent of the truth of the sentence. We call the set of
true sentences $\mathcal{T}$ (the other sentences are false). Other formulas
have free variables, i.e. they behave like functions of those variables. We
call them predicates ($\mathcal{H}$). We also assume there exists a function
$\phi$ that assigns to every expression $H \in \mathcal{H}$ and natural
number $n$ a sentence $H(n)$.
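The substitution function of item 6 can be sketched in Python as follows. The representation is hypothetical: formulas are nested tuples, the single free variable of every predicate is assumed to be called x, and the number n is inserted as a numeral ("num", n). Occurrences of x that are bound by a quantifier are left alone, which is what makes H(n) a sentence.

```python
# Formulas are nested tuples such as ("rel", "Even", ("var", "x")); the tag
# names, the predicate name Even and the numeral form ("num", n) are hypothetical.


def substitute(e, n, var="x"):
    """Return E(n): replace every free occurrence of `var` in e by the numeral n."""
    tag = e[0]
    if tag == "var":
        return ("num", n) if e[1] == var else e
    if tag in ("const", "num"):
        return e
    if tag == "rel":
        return ("rel", e[1]) + tuple(substitute(a, n, var) for a in e[2:])
    if tag == "not":
        return ("not", substitute(e[1], n, var))
    if tag in ("and", "or"):
        return (tag, substitute(e[1], n, var), substitute(e[2], n, var))
    if tag == "forall":
        if e[1] == var:
            return e  # var is bound here, so none of its occurrences inside are free
        return ("forall", e[1], substitute(e[2], n, var))
    raise ValueError(f"unknown formula tag: {tag!r}")


even_x = ("rel", "Even", ("var", "x"))  # a predicate H with free variable x
print(substitute(even_x, 4))            # the sentence H(4): ('rel', 'Even', ('num', 4))
```

Read informally, the printed sentence says that 4 belongs to the set named by the predicate.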
What is an Axiomatic System? An axiomatic system (sometimes also
called formal axiomatic system) is a logical system that gives rise to an
STGA language and has an explicitly stated finite set of axioms from which
provable sentences can be derived (using a finite set of derivation rules).
The set of axioms and derivation rules determines which sentences of $L$
are provable and which are not. The axiomatic system also contains a syntax
definition that determines the well-formedness of expressions of $L$.
Normally, the syntax definition of an axiomatic system consists of an
alphabet of symbols and a set of rules. We show that this notion of an
axiomatic system gives rise to a language that falls under the category of
STGA languages. Such an axiomatic system $A$ is often defined as follows:
Definition of axiomatic system: An axiomatic system A consists of:
• An alphabet Σ, consisting of a finite number of constants (with their
arities) and variables.
• A recursive definition of a syntax, determining which formulas are
well-formed formulas.
• An initially determined and fixed set of axioms and derivation rules
(also called transformation rules or rules of inference).
The recursive definition over the given alphabet gives us the set of
expressions. The variables enable us to form predicates. The set of axioms
and derivation rules lets us prove or refute sentences. Ideally, we want the
provable sentences to coincide with the sentences we intuitively consider
true ($\mathcal{P} = \mathcal{T}$) and the refutable sentences to coincide
with those we consider false. We call a system with this property correct.
We now give an example of a definition of a simple axiomatic system.
Example: axiomatic system $A_1$
• $\Sigma = \{\vee^2, \wedge^2, \neg^1, (^0, )^0, \forall^2, x^0, y^0, R_0^2, \mathit{true}^0, \mathit{false}^0\}$
The numbers that are written in superscript denote the arity of the
relations; a constant or variable is a 0-ary relation.
• $\phi$ is a well-formed formula if it
0. is one of the constants true and false.
1. is an atomic formula $R_i(x_1, \ldots, x_j)$, with $R_i$ a relation with
arity $j$, and $x_1, \ldots, x_j$ variables or constants.
2. has the form $\phi_1 \vee \phi_2$, $\phi_1 \wedge \phi_2$, $(\phi_1)$,
$\neg\phi_1$ or $\forall x_i(\phi_1)$, where $\phi_1$ and $\phi_2$ are
smaller formulas and $x_i$ is some variable from $\Sigma$.
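• For all variables $x$, variables or constants $c$ and $d$, and well-formed
formulas $\phi$, the derivation rules are:
$R_0(c, d) \to \mathit{true}$
$\forall x(\phi) \to \mathit{false}$
$\neg\mathit{false} \to \mathit{true}$
$\neg\mathit{true} \to \mathit{false}$
$\mathit{true} \wedge \phi \to \phi$
$\mathit{false} \wedge \phi \to \mathit{false}$
$\mathit{true} \vee \phi \to \mathit{true}$
$\mathit{false} \vee \phi \to \phi$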
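Because the syntax definition of $A_1$ is recursive, a recursive-descent parser is a natural way to test well-formedness. The Python sketch below is one possible reading, under our own assumptions: tokens are written as ∀, ¬, ∧, ∨, R0, true, false, x and y; ∧ and ∨ are read left to right without precedence (which does not affect well-formedness); and, so that the text's own example ∀y(false ∨ y) is accepted, a bare variable may also stand as a formula. The names ARITY, Parser and is_wff are hypothetical.

```python
import re

ARITY = {"R0": 2}                 # relation symbols of A1 with their arities
VARIABLES = {"x", "y"}
CONSTANTS = {"true", "false"}
TOKEN = re.compile(r"\s*(R\d+|true|false|[xy]|[∀¬∧∨(),])")


class NotWellFormed(Exception):
    pass


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise NotWellFormed(f"unknown symbol at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise NotWellFormed(f"expected {tok!r}, found {self.peek()!r}")
        self.i += 1

    def formula(self):            # phi1 ∨ phi2 and phi1 ∧ phi2
        self.unary()
        while self.peek() in ("∧", "∨"):
            self.i += 1
            self.unary()

    def unary(self):
        tok = self.peek()
        if tok == "¬":            # ¬phi1
            self.i += 1
            self.unary()
        elif tok == "∀":          # ∀x(phi1)
            self.i += 1
            if self.peek() not in VARIABLES:
                raise NotWellFormed("∀ must be followed by a variable")
            self.i += 1
            self.expect("(")
            self.formula()
            self.expect(")")
        elif tok == "(":          # (phi1)
            self.i += 1
            self.formula()
            self.expect(")")
        elif tok in ARITY:        # atomic formula R(t1, ..., tj)
            self.i += 1
            self.expect("(")
            self.term()
            for _ in range(ARITY[tok] - 1):
                self.expect(",")
                self.term()
            self.expect(")")
        elif tok in CONSTANTS or tok in VARIABLES:
            self.i += 1           # true, false, or a bare variable (our assumption)
        else:
            raise NotWellFormed(f"unexpected {tok!r}")

    def term(self):               # arguments are variables or constants
        if self.peek() not in VARIABLES | CONSTANTS:
            raise NotWellFormed("argument must be a variable or a constant")
        self.i += 1


def is_wff(text):
    try:
        parser = Parser(tokenize(text))
        parser.formula()
        return parser.i == len(parser.tokens)
    except NotWellFormed:
        return False
```

With this reading, is_wff("∀x)x") is False, in line with the text's remark that this string is not a formula of $A_1$.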
The STGA language $L$ that can be constructed² on the basis of $A_1$,
denoted by $L_{A_1}$, consists of the following parts:
1. $\mathcal{E}$ is the set of well-formed formulas built from the symbols of
the given alphabet (so $\mathcal{E}$ includes formulas built with the binary
relation $R_0$).
2. $\mathcal{S}$ is the set of those expressions without free variables
(i.e. propositions).
3. The provable sentences $\mathcal{P}$ are those that the derivation rules
rewrite to true. For example,
$\neg\mathit{false} \wedge R_0(\mathit{false}, \mathit{true}) \to \mathit{true} \wedge R_0(\mathit{false}, \mathit{true}) \to \mathit{true} \wedge \mathit{true} \to \mathit{true}$.
4. The refutable sentences $\mathcal{R}$ are those that the derivation rules
rewrite to false. For example,
$\forall y(\mathit{false} \vee y) \wedge \mathit{true} \to \mathit{false} \wedge \mathit{true} \to \mathit{false}$.
5. The predicates $\mathcal{H}$ are those expressions with one free variable.
6. For each such predicate we can replace the free variable by a formula
that is represented³ by a natural number, and obtain a proposition.
7. The definition of an axiomatic system does not include a model. If we
think of the standard logic that is used in practice, we can see that for
all formulas except those containing the symbol $\forall$, the formulas are
derivable if and only if they are true.
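The derivations in items 3 and 4 rewrite subformulas step by step until true or false remains. The Python sketch below mechanizes this under stated assumptions: formulas are nested tuples (our own hypothetical representation), the strategy is leftmost-innermost, and the last ∨-rule is read as false ∨ ϕ → ϕ, the counterpart of false ∧ ϕ → false. A sentence that ends in true is counted as provable, one that ends in false as refutable.

```python
TRUE, FALSE = ("const", "true"), ("const", "false")

# Positions of the subformulas of each connective (∀ keeps its bound variable at index 1).
CHILDREN = {"not": (1,), "and": (1, 2), "or": (1, 2), "forall": (2,)}


def root_step(f):
    """Apply one derivation rule of A1 at the top of f, or return None."""
    tag = f[0]
    if tag == "rel":
        return TRUE                               # R0(c, d) -> true
    if tag == "forall":
        return FALSE                              # ∀x(phi) -> false
    if tag == "not" and f[1] in (TRUE, FALSE):
        return FALSE if f[1] == TRUE else TRUE    # ¬true -> false, ¬false -> true
    if tag == "and" and f[1] == TRUE:
        return f[2]                               # true ∧ phi -> phi
    if tag == "and" and f[1] == FALSE:
        return FALSE                              # false ∧ phi -> false
    if tag == "or" and f[1] == TRUE:
        return TRUE                               # true ∨ phi -> true
    if tag == "or" and f[1] == FALSE:
        return f[2]                               # false ∨ phi -> phi (assumed reading)
    return None


def step(f):
    """One leftmost-innermost rewrite step, or None if no rule applies anywhere."""
    for k in CHILDREN.get(f[0], ()):
        new = step(f[k])
        if new is not None:
            return f[:k] + (new,) + f[k + 1:]
    return root_step(f)


def derive(f):
    """Rewrite f until no rule applies and return the whole chain."""
    chain = [f]
    while (nxt := step(chain[-1])) is not None:
        chain.append(nxt)
    return chain


def classify(f):
    end = derive(f)[-1]
    if end == TRUE:
        return "provable"
    if end == FALSE:
        return "refutable"
    return "neither"
```

Applied to ¬false ∧ R0(false, true), derive produces the same chain as item 3. For ∀y(false ∨ y) ∧ true the fixed strategy first rewrites false ∨ y to y, a different but equally valid route that also ends in false. Every rule shrinks the formula, so derive always terminates.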
We now introduce some concepts related to STGA languages and axiomatic
systems. We assume that A is an axiomatic system that gives rise to an
STGA language L.
Definition of derivable: A formula $\phi$ is derivable in $L$ := $\phi \in \mathcal{P}$.
A formula $\phi$ is derivable from an axiomatic system $A$, notation
$A \vdash \phi$, := there is an axiom $a_i$ of $A$ and a sequence of formulas
$\phi_1, \ldots, \phi_l$ such that $\phi_1 = a_i$ and $\phi_l = \phi$, and each
$\phi_k$ follows from the preceding formulas and the axioms of $A$ by the
derivation rules of $A$.
² Sometimes it is also said that an axiomatic system $A_1$ gives rise to a
language $L_{A_1}$.
³ An example of such a bijective function between a predicate and a set of
natural numbers will be given in section 8.2.
We call the sequence of formulas $\phi_1, \ldots, \phi_l$ in a derivation of
the statement $\phi$ a formal proof $\pi$ of the statement $\phi$. When
$A \vdash \phi$, we also write $\phi \in A$.
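Whether a given sequence is a formal proof can be checked mechanically: the sequence is finite, and each line must be justified by what came before. The Python sketch below assumes, purely for illustration, a Hilbert-style system whose only derivation rule is modus ponens (from ϕ and ϕ → ψ infer ψ); $A_1$ itself uses rewrite rules instead, and the atoms p, q and r as well as the function names are hypothetical.

```python
def is_formal_proof(lines, axioms):
    """Check a derivation phi_1, ..., phi_l in which every line is an axiom or
    follows from two earlier lines by modus ponens. Implications are written
    as ("→", antecedent, consequent)."""
    if not lines:
        return False
    for k, phi in enumerate(lines):
        earlier = lines[:k]
        if phi in axioms:
            continue  # the first line can only be justified this way
        if any(("→", psi, phi) in earlier for psi in earlier):
            continue  # modus ponens from psi and psi → phi
        return False
    return True


def derives(axioms, lines, goal):
    """A ⊢ goal, witnessed by the formal proof `lines`."""
    return is_formal_proof(lines, axioms) and lines[-1] == goal
```

Checking is easy even when finding a proof is hard: the checker only compares each line with the finitely many lines before it.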
Example:
$A_1 \vdash \neg\mathit{false} \wedge R_0(\mathit{false}, \mathit{true})$
$A_1 \nvdash \forall x)x$ (since the formula is not well-formed, i.e. it cannot
be formed according to the syntax definition)
$A_1 \nvdash \forall y(\mathit{false} \vee y) \wedge \mathit{true}$ (since it
does not follow from the derivation rules; it is a refutable sentence)
Hilbert proposed a program to reformulate all of mathematics as a formal
axiomatic theory, which then had to be proved consistent, i.e. free from
contradiction. The standard method that was used to prove the consistency of
axiomatic systems was to give a ‘model’. A model for an axiomatic theory is
simply a system of objects, chosen from some other theory and satisfying the
axioms.
This means we can relate axiomatic systems to existing systems by means
of a model, also called an interpretation or structure. A model of a formal
axiomatic theory is a well-defined mathematical system with the particular
structure that is characterized by the theory.
Definition of universe: The set of values that the variables of an axiomatic
system may take.
Definition of a model: A universe together with an assignment of n-ary
relations to n-ary constants, and a corresponding assignment of the
variables.
We define a model $\mathcal{M}$ for an axiomatic system $A$ by
$\mathcal{M} = (U, P_1, \ldots, P_k)$, with $U$ a universe for $A$ and
$P_1, \ldots, P_k$ the relations corresponding to the symbols
$R_1, \ldots, R_k$ of $A$. If a formula $\phi$ is true in the model
$\mathcal{M}$ (i.e. when the relation symbols are interpreted by the
corresponding relations), notation $\mathcal{M} \models \phi$, we say that
$\mathcal{M}$ is a model of $\phi$.
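Example: Let $\mathcal{M}_1 = (\mathbb{N}, \leq)$ be a model for axiomatic
system $A_1$. Then
$\mathcal{M}_1 \models \forall x \forall y(x \leq y \vee y \leq x)$
$\mathcal{M}_1 \not\models \forall x \forall y(x \leq y \wedge y \leq x)$
Note that instead of using the relation symbol $R_0$ of $A_1$, we immediately
took its interpretation $\leq$.
A theory $Th$ of a model $\mathcal{M}$, notation $Th(\mathcal{M})$, is the set
of true statements in the language of that model.
Definition of a theory: $Th(\mathcal{M}) := \{\phi \mid \phi \text{ is a statement and } \mathcal{M} \models \phi\}$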
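Deciding whether $\mathcal{M} \models \phi$ amounts to evaluating $\phi$ recursively, with ∀ ranging over the universe. Since the universe ℕ of $\mathcal{M}_1$ is infinite, the Python sketch below uses the finite initial segment {0, 1, 2, 3, 4} as a hypothetical stand-in. Truth in such a finite structure does not in general carry over to ℕ, although for the two example statements it happens to agree. The formula representation and the names holds and theory are our own.

```python
def holds(model, f, env=None):
    """M |= f under the variable assignment env (a dict from variables to elements)."""
    universe, relations = model
    env = {} if env is None else env
    tag = f[0]
    if tag == "rel":
        return relations[f[1]](*(env[v] for v in f[2:]))
    if tag == "not":
        return not holds(model, f[1], env)
    if tag == "and":
        return holds(model, f[1], env) and holds(model, f[2], env)
    if tag == "or":
        return holds(model, f[1], env) or holds(model, f[2], env)
    if tag == "forall":
        return all(holds(model, f[2], {**env, f[1]: u}) for u in universe)
    raise ValueError(f"unknown formula tag: {tag!r}")


def theory(model, candidates):
    """The members of Th(M) among a finite list of candidate statements."""
    return [s for s in candidates if holds(model, s)]


def le(a, b):
    return ("rel", "≤", a, b)


M1 = (range(5), {"≤": lambda a, b: a <= b})   # finite stand-in for (N, ≤)
total = ("forall", "x", ("forall", "y", ("or", le("x", "y"), le("y", "x"))))
both = ("forall", "x", ("forall", "y", ("and", le("x", "y"), le("y", "x"))))
```

Here holds(M1, total) is True and holds(M1, both) is False (take x = 0 and y = 1), so of these two candidates only total belongs to Th(M1).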
So now we can say that Hilbert was looking for an axiomatic system for
which logic can be a model. Hilbert required such an axiomatic system to
have the properties of consistency, completeness and decidability. We will
now introduce these concepts, along with some other properties of axiomatic
systems. Since the properties of an axiomatic system $A$ give rise to
corresponding properties of the language $L_A$, we distinguish in each
definition between the property of a language and that of an axiomatic
system.
Definition of decidability:
A language $L$ is decidable := $(\forall \phi : \phi \in \mathcal{S} : \phi \in \mathcal{P} \vee \phi \in \mathcal{R})$.
An axiomatic system $A$ is decidable := there is an algorithm that, for every
formula $\phi$, decides in a finite number of steps whether or not
$A \vdash \phi$ (see also [49, page 270]).
Definition of consistency:
A language $L$ is consistent := $\neg(\exists s : s \in \mathcal{S} : s \in \mathcal{P} \wedge s \in \mathcal{R})$,
i.e. $\mathcal{P} \cap \mathcal{R} = \emptyset$: no sentence is both provable
and refutable in $L$.
An axiomatic system $A$ is consistent := $\neg(\exists \phi :: A \vdash \phi \wedge A \vdash \neg\phi)$
(i.e. for no formula $\phi$ is it possible to derive both $\phi$ and
$\neg\phi$) (see also [49, page 240]).
A language $L$ is inconsistent if it is not consistent, i.e. if
$\mathcal{P}$ and $\mathcal{R}$ are not disjoint. Note that consistency and
decidability do not refer to $\mathcal{T}$, but only concern $\mathcal{P}$ and
$\mathcal{R}$. The following definitions of completeness, soundness and
correctness, in contrast, depend on the truth set $\mathcal{T}$ (and therefore
on the model that determines that truth set).
Definition of completeness:
A language $L$ is complete for a model $\mathcal{M}$ := $(\forall \phi :: \mathcal{M} \models \phi \to \phi \in \mathcal{P})$.
An axiomatic system $A$ is complete for a model $\mathcal{M}$ :=
$(\forall \phi :: \mathcal{M} \models \phi \to A \vdash \phi)$ (i.e. all true
statements in the model are derivable/provable).
A language $L$ is incomplete if it is not complete. Note that the statement
$(\forall \phi :: \mathcal{M} \models \phi \to A \vdash \phi)$ is equivalent to
$(\forall \phi :: A \nvdash \phi \to \mathcal{M} \not\models \phi)$, i.e. all
statements $\phi$ that are not derivable/provable are also not true in the
model.
Definition of soundness:
A language $L$ is sound for a model $\mathcal{M}$ := $(\forall \phi :: \phi \in \mathcal{P} \to \mathcal{M} \models \phi)$.
An axiomatic system $A$ is a sound axiomatization for a model $\mathcal{M}$ :=
$(\forall \phi :: A \vdash \phi \to \mathcal{M} \models \phi)$ (i.e. if a
statement $\phi$ is derivable/provable, it is true in the model).
Definition of correctness:
A language $L$ is correct for a model $\mathcal{M}$ := $\mathcal{P} \subseteq \mathcal{T} \wedge \mathcal{R} \cap \mathcal{T} = \emptyset$
(i.e. every provable sentence is true and every refutable sentence is false
(not true)).
An axiomatic system $A$ is correct for a model $\mathcal{M}$ := $A$ is sound
for $\mathcal{M}$ and $A$ is complete for $\mathcal{M}$.
Theorem: If $L$ is correct, it is consistent.
Proof: This follows directly from the definitions of correctness and
consistency: if $\mathcal{P}$ is a subset of $\mathcal{T}$ and $\mathcal{T}$ is
disjoint from $\mathcal{R}$, then $\mathcal{P}$ must be disjoint from
$\mathcal{R}$.
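For a language with finitely many sentences, the definitions above reduce to set comparisons, and the theorem can even be confirmed exhaustively. In this Python sketch the sets S, P, R and T are plain Python sets of hypothetical sentence names, and decidable uses the text's definition for languages (every sentence is provable or refutable).

```python
from itertools import chain, combinations


def consistent(P, R):
    return not (P & R)                 # no sentence is both provable and refutable


def decidable(S, P, R):
    return S <= (P | R)                # every sentence is provable or refutable


def complete(P, T):
    return T <= P                      # every true sentence is provable


def sound(P, T):
    return P <= T                      # every provable sentence is true


def correct(P, R, T):
    return P <= T and not (R & T)      # provable ⊆ true, refutable ∩ true = ∅


def subsets(S):
    items = sorted(S)
    return [set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def theorem_holds(S):
    """Exhaustively confirm 'correct implies consistent' for all P, R, T ⊆ S."""
    subs = subsets(S)
    return all(consistent(P, R)
               for P in subs for R in subs for T in subs
               if correct(P, R, T))
```

For example, with S = {s1, s2, s3, s4}, T = {s1, s2}, P = {s1} and R = {s3}, the language is correct and hence consistent, but it is incomplete because the true sentence s2 is not provable, and it is not decidable in the text's sense because s2 and s4 are neither provable nor refutable.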
6.2 Hilbert’s 23 problems
Who of us would not be glad to lift the veil behind which the future
lies hidden: to cast a glance at the next level of our science and
at the secrets of its development during future centuries? What
particular goals will there be toward which the leading mathematical
spirits of coming generations will strive? What new methods
and new facts in the wide and rich field of mathematical thought
will the next centuries disclose?
D. Hilbert, in the opening of his speech to the 1900 Congress
in Paris
In 1900 Hilbert presented his list of 23 mathematical problems to the
International Congress of Mathematicians in Paris, urging them upon the
attention of his contemporaries. His famous address was important and still
today influences and stimulates mathematical research all over the world.
It was not only a collection of problems, but also an expression of his
philosophy of mathematics (see also the formalist viewpoint in section 5.2)
and a collection of problems important to that philosophy. Many of the
problems have since been solved, and each solution was a noted event (or even
a mathematical breakthrough). Some of these problems, however, remain
unsolved to this day. In 2000, following in Hilbert’s footsteps, the Clay
Mathematics Institute (see http://zax.mine.nu/interests/questions/clay.htm)
made a new list of 7 (largely mathematical) problems to be solved in this
century. Among those problems is one of Hilbert’s original problems (number
8): it requires a solution to the Riemann hypothesis, which is usually
considered to be the most important unsolved problem in mathematics. We
mention some of the original problems that are related to the foundations
of mathematics. For a complete source of information on the 23 (or 25?,
see [32]) original problems of Hilbert, see the articles [41] and [40], also
available online [42].
• Problem 1: Cantor’s problem of the cardinal number of the continuum.
This problem is also known as the Continuum Hypothesis and is extensively
covered in section 3.7.
• Problem 2: The consistency of the axioms of arithmetic. The question
is whether it can be shown that the axioms on which arithmetic is based
are consistent. Gödel later showed that no consistent formal system that
contains arithmetic (see chapter 8) can prove its own consistency. A
metamathematical argument that cannot be expressed in the system itself
might, however, still prove its consistency.
• Problem 6: Mathematical treatment of the axioms of physics. This problem
asks to treat in the same manner, by means of axioms, those physical
sciences in which mathematics plays an important part; in the first rank
are the theory of probabilities and mechanics. So far no complete
axiomatization of physics has been found.
• Problem 9: Proof of the most general law of reciprocity in algebraic
number theory. For any field of numbers, the law of reciprocity (for
more references see
http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob9.html) is to
be proved for the residues of the $l$-th power, when $l$ denotes a prime,
and further when $l$ is a power of 2 or a power of an odd prime. This
problem is only partially solved: Emil Artin’s reciprocity law (1927)
settles the case of abelian extensions.
• Problem 10: Decidability of solvability of diophantine equations. This
question asks, ‘given a diophantine equation with any number of unknown
quantities and with rational integral numerical coefficients, to devise a
process according to which it can be determined by a finite number of
operations whether the equation is solvable in rational integers’. In
modern terminology the problem asks to devise an algorithm that tests
whether a polynomial has an integral root. A root of a polynomial is an
assignment of values to its variables so that the value of the polynomial
is 0. A root is an integral root if all variables are assigned integer
values. Some polynomials have an integral root (for example
$6x^3yz^2 + 3xy^2 - x^3 - 10$ has an integral root at $x = 5$, $y = 3$ and
$z = 0$) and some do not.
Hilbert did not use the term algorithm but rather ‘a process according
to which it can be determined by a finite number of operations’. In
order to solve this problem this notion had to be made more precise
(this was done by Turing, see section 9.1). Also, Hilbert asked that an
algorithm be devised. Thus he apparently assumed such an algorithm
exists, but we now know that this problem is algorithmically unsolvable.
In 1970, the young Russian Yuri Matijasevič, building on the work of
Martin Davis, Hilary Putnam and Julia Robinson, showed that no algorithm
exists for testing whether a polynomial has integral roots.
• Problem 23: Further development of the methods of the calculus of
variations. Of the 23 problems Hilbert posed, this one is the least
definite, since it involves the general question of extending the calculus
of variations, which basically is the theory of the variation of functions.
With some examples that we will not treat here, Hilbert justified the
necessity of an extension of the differential and integral calculus (for
more references see
http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob23.html).
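Although no algorithm decides Problem 10, the problem is semi-decidable: enumerating integer points in ever larger boxes finds a root whenever one exists, but runs forever when none does. The Python sketch below illustrates this with the example polynomial $6x^3yz^2 + 3xy^2 - x^3 - 10$ from Problem 10. The function names and the optional cut-off max_bound (which turns the search into a bounded, and therefore incomplete, test) are our own.

```python
from itertools import product


def example(x, y, z):
    # 6x^3yz^2 + 3xy^2 - x^3 - 10, the polynomial from Problem 10
    return 6 * x**3 * y * z**2 + 3 * x * y**2 - x**3 - 10


def search_root(poly, nvars, max_bound=None):
    """Look for an integral root by checking the shells max|v_i| = b for b = 0, 1, 2, ...
    Without max_bound this halts exactly when a root exists (a semi-decision);
    with max_bound it gives up and returns None after the box [-max_bound, max_bound]^n."""
    b = 0
    while max_bound is None or b <= max_bound:
        for point in product(range(-b, b + 1), repeat=nvars):
            if max(abs(v) for v in point) == b and poly(*point) == 0:
                return point
        b += 1
    return None
```

A bounded search that returns None proves nothing about larger values; by Matijasevič's result, no choice of bound computable from the polynomial can close this gap in general.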
At the end of his article, Hilbert says that he does not believe mathematics
will, like other sciences, split into separate branches whose connections
become ever looser, but that the organic unity of mathematics is inherent in
the nature of this science, for mathematics is the foundation of all exact
knowledge of natural phenomena. For a more detailed assessment of Hilbert’s
view, see [49, section 12.4] and [31, section 4.7].
Chapter 7
Types
7.1 Russell and Whitehead’s Principia Mathematica
Logic has become more mathematical and mathematics has become
more logical. The consequence is that it has now become
wholly impossible to draw a line between the two; in fact, the two
are one. They differ as boy and man; logic is the youth of mathematics
and mathematics is the manhood of logic.
B. Russell in [79, page 194]
In section 4.1 we saw that with the postulates he presented, Peano stated
and organized the fundamental laws of number theory, the core of
mathematics. If these postulates could be derived in pure logic, this would
show that (at least part of) mathematics was founded in pure logic. As we
have seen in section 4.2, Frege adhered to the goal of logicism, namely that
all of mathematics could be derived from logic alone. Unfortunately, the
language that he created was inconsistent, as we have learned from Russell’s
paradox in section 5.1. In his 1908 paper, ‘Mathematical Logic as Based on
the Theory of Types’, Russell laid out a theory to eliminate the paradoxes.
With Principia Mathematica, Bertrand Russell and his former teacher, the
mathematician Alfred North Whitehead, presented this theory to prevent the
paradoxes while at the same time allowing many of the operations Frege
considered desirable. The theory of types basically says that all sets and
other entities have a logical ‘type’, that these types can be ordered, and
that sets are always constructed from specified members of lower types. We
will look at the theory of types in more detail in section 7.2.
Principia Mathematica (sometimes simply called ‘the Principia’) consisted of
three volumes and was named after the ‘Philosophiae naturalis principia
mathematica’ of the English physicist Isaac Newton. But unlike Newton’s book,
it did not deal with the application of mathematical techniques to physics,
but with their application to logic and mathematics itself. With their
mathematical treatment of the principles of mathematics, Russell and
Whitehead intended to summarize the recent work in logic as well as to give
a revolutionary and systematic development of mathematical logic and to
derive basic mathematical principles from the principles of logic alone.
Their collaboration began in 1903 when Whitehead and Russell were both
in the initial stages of preparing second volumes to earlier books on related
topics: Whitehead’s 1898 ‘A Treatise on Universal Algebra’ and Russell’s
1903 ‘The Principles of Mathematics’. Their work overlapped considerably
and they began collaborating on what would become ‘Principia Mathematica’.
The approach of Russell and Whitehead was essentially that of Frege: to
define mathematical entities (like numbers) in pure logic and then derive
their fundamental properties. Indeed, their definition of natural numbers was
basically the same as Frege’s, but unlike him, they opted to avoid the
philosophical aspects and justifications. Although the ‘Principia’ was
largely successful, the axiom of infinity and the axiom of reducibility were
criticized as ad hoc solutions that could not be justified philosophically.
In 1919 Russell published the philosophy behind his work in ‘Introduction to
Mathematical Philosophy’, which was accessible to a broad audience and has
therefore been the main source through which Russell’s logicist view of
mathematics has become known.
I quote the following assessment of Principia Mathematica from [91]:
“In addition to its notation (much of it borrowed from Peano), its masterful
development of logical systems for propositional and predicate logic,
and its overcoming of difficulties that had beset earlier logical theories and
logistic conceptions, the Principia offered discussions of functions, definite
descriptions, truth, and logical laws that had a deep influence on discussions
in analytical philosophy and logic throughout the 20th century. What
is perhaps missing is any hesitation or perplexity about the limits of logic:
whether this logic is, for example, provably consistent, complete, or
decidable, or whether there are concepts expressible in natural languages but
not in this logical notation. This is somewhat odd, given the well-known list
of problems posed by Hilbert in 1900 that came to animate 20th-century logic,
especially German logic. The Principia is a work of confidence and mastery
and not of open problems and possible difficulties and shortcomings; it is a
work closer to the naive progressive elements of the Jahrhundertwende than
to the agonizing fin de siècle.” We would like to add that with their very
formal and accurate buildup of mathematics, Russell and Whitehead not only
managed to avoid the paradoxes but also created one of the most impressive
and complicated works of all time, which is considered, next to Aristotle’s
Organon, to be the most influential book on logic ever written.
In the next section we will further investigate Russell’s theory of types.
The English mathematician Frank Plumpton Ramsey (1903-1930) offered
criticism of the theory of types that was accommodated in later editions of
Principia Mathematica. The result of this is the ‘deramified theory of types’
that will be treated in subsequent sections, together with a later
simplification of this theory by the German mathematicians Hilbert and
Wilhelm Ackermann (1896-1962).
The mathematician Alonzo Church also published articles on type systems,
but did not develop his typed version of lambda calculus before the 1940s,
so his typed lambda calculus falls outside the scope of this report
(1870-1940). We will only summarize his work in this paragraph. The main
difference between the type structure of Russell and that of Church is that
the former is set-based with a linear ordering of types and the latter is
function-based with a non-linear order of types. The type theory that
emerged from Church’s lambda calculus (see section 9.2) was extended with
simple types in 1940 to prevent paradoxes, similar to the extension of
logical set theory with types by Russell in 1910 to avoid the paradoxes.
Church also proposed another logical set theory in 1974.
[...] in the simple theory of types it is well known that the individuals
may be dispensed with if classes and relations of all types
are retained; or one may abandon also classes and relations of the
lowest type, retaining only those of higher type. In fact any finite
number of levels at the bottom of the hierarchy of types may be
deleted. But this is no reduction in the variety of entities, because
the truncated theory of types, by appropriate deletions of entities
in each type, can be made isomorphic to the original hierarchy;
and indeed the continued adequacy of the truncated hierarchy to
the original purposes depends on this isomorphism.
A. Church in ‘The need for abstract entities’.
Organization of Principia Mathematica
The nearly 2,000-page Principia Mathematica starts with a short preface
that explains what it wants to demonstrate, namely that pure mathematics
can be based on logic alone and requires no other primitive notions. Russell
classifies statements that involve logical constants only (such as the laws
of reciprocity, see page 18 of Principia Mathematica) as pure mathematics,
and other mathematical assertions that also refer to non-logical contents
(such as the statement that (perceptual) space is three-dimensional) as part
of applied mathematics. The belief was then expressed that pure mathematics
was sufficient to include all traditional mathematics. Then, after an
introduction, the first volume introduces a symbolic logic that is based on
a small set of axioms, and then lays out the propositional and predicate
calculi. Building upon these, Whitehead and Russell define types, sets,
relations and their properties, and basic operations on sets. The second
volume continues with a purely logical theory of cardinal and ordinal
arithmetic. This allowed them to introduce basic arithmetic, including
addition, multiplication and exponentiation of both finite cardinals and of
relations.
The volume closes with a general theory of simply ordered sets (series) and
a logical basis for fundamental mathematical analysis, including subjects
such as convergent sequences, continuity, limits and derivatives.
The third volume was meant to prepare the ground for the fourth and
concluding volume on geometry (which was never completed), and contains a
theory of numbers that was called ‘measurement’. It starts with a theory of
well-ordered sets, finite, infinite and continuous series, the negative
integers, ratios and the real numbers, and finally vectors, coordinates and
basic geometric notions such as angles.
More details about the organization of Principia Mathematica and a critical
assessment of its work can be found in [31, chapter 7, and specifically
section 7.8].
The symbolic logic and notation of Principia Mathematica
Russell and Whitehead opted for Peano’s more modern notation instead of
Frege’s Begriffsschrift. Unlike Frege, Russell and Whitehead treated
functions as first-class citizens. A good introduction to the logical
calculus and the specific notation that was used in Principia Mathematica can
be found in [49, sections 3.2 and 3.3] and [31, sections 7.2, 7.3, 7.7 and
7.8].
Russell’s theory of types
Russell’s 1908 paper included a categorization of most of the important
contradictions of that time, and an analysis of their common characteristics.
To prevent the paradoxes he catalogued, Russell formulated the vicious circle
principle (see page 85) and implemented it using types in Principia
Mathematica (for details see [31, section 7.9] and [49, sections 3.2 and
3.3]).
What is a type?
A type is the range of significance of a propositional function, that is,
the collection of arguments for which the said function is significant and
has values.
The type of a variable in a proposition is fixed by all the values the
function is concerned with, i.e. by the totality over which the variable
ranges. This division of objects into types (the type of an object can be
seen as a property of that object) is necessary to conform to the vicious
circle principle, i.e. to make sure that ‘whatever contains an apparent
variable must not be a possible value of that variable’. This can be
established by making sure that whatever contains an apparent variable is of
a different and higher type than the possible values of that variable. This
linear order of types prevents vicious circles, since the variables contained
in an object determine the type of that object.
Russell then defined an individual as being not a proposition but a
constant, destitute of complexity. We can now categorize propositions by
their types. First-order propositions are elementary propositions that only
contain individuals; second-order propositions are propositions with
first-order propositions as variables and possibly entities of lower type
(individuals). This can be continued, such that the $(n+1)$th-order
propositions contain propositions of order $n$ and possibly others of order
smaller than $n$.
We now also restrict relations like ∈ so that $x \in y$ is only significant
when $y$ is of a type one level higher than $x$, and we always confine
quantifiers to a single level. It can be shown that this way of restricting
propositions prevents the paradoxes, but it can in some cases be needlessly
restrictive.
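The restriction on ∈ can be made concrete with a small type checker. In the Python sketch below, which simplifies Russell's hierarchy of propositional functions to a hierarchy of classes (our own assumption), every object carries a level, a class may only collect members of a single level, and $x \in y$ is rejected unless $y$ is exactly one level above $x$. In particular $x \in x$ cannot even be formulated, so Russell's class of all classes that are not members of themselves cannot be built. The names Typed, individual, collection and is_member are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Typed:
    name: str
    level: int                       # 0 = individual, n + 1 = class of level-n objects
    members: frozenset = frozenset()


def individual(name):
    return Typed(name, 0)


def collection(name, members):
    """Form a class; all members must share one level n, and the class gets level n + 1."""
    levels = {m.level for m in members}
    if len(levels) > 1:
        raise TypeError(f"members of {name} are of different types")
    level = levels.pop() + 1 if levels else 1   # an empty class is put at level 1 (our choice)
    return Typed(name, level, frozenset(members))


def is_member(x, y):
    """x ∈ y is only significant when y is exactly one level above x."""
    if y.level != x.level + 1:
        raise TypeError(f"'{x.name} ∈ {y.name}' is not significant")
    return x in y.members
```

The price of this safety is the restrictiveness mentioned above: a class mixing individuals and classes of individuals is rejected, even though it would cause no paradox.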
For more information about types in Principia Mathematica, see [31,
section 7.9] and [49, section 3.3]. For a formalization (in modern notation)
of Russell’s Ramified Theory of Types (RTT), we refer to [86, chapter 3]. In
turn, this reference is partly based on [52], [53], [54] and [43], all of
which discuss RTT in some context.
A detailed introduction to the (symbolic) logic and notation of Principia
Mathematica, as well as a formal introduction to RTT, STT and NF and
MP (see section 7.3), is to be included in a later version of this report.
7.2 Ramsey, Hilbert and Ackermann
Suppose a contradiction were to be found in the axioms of set
theory. Do you seriously believe that a bridge would fall down?
F.P. Ramsey, quoted in [58]
Ramsey published his first major work ‘The Foundations of Mathematics’
(see [69, pages 105-142]) in 1925. In this publication he attempted to
improve Principia Mathematica in two ways. First, he proposed dropping the
axiom of reducibility which, he writes, is “[...] certainly not self-evident
and there is no reason to suppose it true; and if it were true, this would
be a happy accident and not a logical necessity, for it is not a
tautology.” His second proposal was to simplify Russell’s theory of types by
regarding certain semantic paradoxes as linguistic. He accepted Russell’s
solution for removing the logical paradoxes of set theory arising from, for
example, ‘the set of all sets which are not members of themselves’. However,
the semantic paradoxes such as ‘this is a lie’ are, Ramsey claims, quite
different and depend on the meaning of the word ‘lie’. These he removed with
his reinterpretation of the axiom of reducibility.
After his suggestions, Russell’s theory became known as the ramiﬁed theory
of types (RTT), and Ramsey’s modiﬁcation of the theory as the deramiﬁed
theory of types.
For more detailed information about the history of deramiﬁcation, we refer
to [86, chapter 4].
Hilbert, together with Ackermann (see [2]), simplified Russell’s theory of types by removing the orders, which resulted in what has become known as the ‘simple theory of types’ (STT). We quote from page 115 of [49]: “[In the simple theory of types,] every individual or individual variable is said to be of type i; and if a predicate or predicate variable $\varphi(x_1, \ldots, x_n)$ has arguments $x_1, \ldots, x_n$ of types $\tau_1, \ldots, \tau_n$ respectively, then $\varphi(x_1, \ldots, x_n)$ is said to be of type $(\tau_1, \ldots, \tau_n)$. Thus, for example, any predicate with two individual arguments is of type (i, i), while a predicate with a single argument that is itself a predicate with two individual arguments is of type ((i, i)). Having introduced the hierarchy of types in this way, we shall now require bound variables to be of some definite type. Every quantifier will then range over the totality of all entities of the same type as the bound variable. When this is done, we have a very comprehensive logical calculus which is secure against vicious circularity”.
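To make the type notation of the quotation concrete, here is a minimal Python sketch. It assumes that the individual type is represented by the string "i" and that a predicate type is the tuple of its argument types; predicate_type and show are hypothetical helper names, not notation from [49].

```python
from typing import Tuple, Union

# A simple type is either the individual type "i" or a tuple of argument types.
SimpleType = Union[str, Tuple["SimpleType", ...]]

def predicate_type(*arg_types: SimpleType) -> SimpleType:
    """Type (tau_1, ..., tau_n) of a predicate whose arguments have types tau_1, ..., tau_n."""
    return tuple(arg_types)

def show(t: SimpleType) -> str:
    """Render a simple type in the bracket notation used above."""
    if t == "i":
        return "i"
    return "(" + ", ".join(show(a) for a in t) + ")"

relation = predicate_type("i", "i")   # a predicate with two individual arguments
higher = predicate_type(relation)     # a predicate whose single argument is such a predicate
print(show(relation), show(higher))
```

Each quantified variable would then carry one of these types, so a quantifier ranges over exactly one level of the hierarchy.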
A further discussion and formalization (in the form of Church’s simply
typed lambda calculus λ → c) of the simple theory of types can be found in
[86].
7.3 Quine
Just as the introduction of the irrational numbers . . . is a convenient myth [which] simplifies the laws of arithmetic . . . so physical objects are postulated entities which round out and simplify our account of the flux of existence . . . The conceptual scheme of physical objects is [likewise] a convenient myth, simpler than the literal truth and yet containing that literal truth as a scattered part.
 Quine, quoted in [50]
Willard Van Orman Quine (1908-2000) was an American philosopher and logician who became interested in the work of Russell. An alternative to Russell’s system is one that allows a single universe of all types (or all sets). In Russell’s theory such an object is too big, but according to others, including Quine, having a set of all sets or a type of all types is legitimate as long as we do not permit forming all subsets. If there is some restriction on which subsets can be formed, for example by requiring a stratified predicate to define the subset, then the known paradoxes can no longer be derived. Quine proposed in [94, pages 80-101] a system called New Foundations, NF, based on this idea. To restrict the way subsets are formed, Quine restricted the comprehension axiom to:
NFC(omprehension) Axiom: ∃x∀y :: (y ∈ x ↔ ϕ(y)), where x is not
free in ϕ(y) and ϕ(y) is stratiﬁed
In [86, footnote 4], we find two definitions of stratification.
Definition of heterogeneous stratification: A well-formed formula ϕ is heterogeneously stratified := there is a function f from the variables and constants of ϕ to the natural numbers such that for each atomic well-formed formula $F(x_1, \ldots, x_n)$ of ϕ, $f(F) = 1 + (\max : 1 \le i \le n : f(x_i))$
Definition of homogeneous stratification: A well-formed formula ϕ is homogeneously stratified := ϕ is heterogeneously stratified and for the corresponding function f we also have $f(x_i) = f(x_j)$ for $1 \le i, j \le n$
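With the NFC axiom Russell’s paradox is prevented, since the formula ϕ ≡ x ∉ x is not stratified: reading x ∈ y as the atomic formula y(x), the definition demands f(y) = 1 + f(x), so x ∉ x would require f(x) = 1 + f(x), which no function f satisfies.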
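The usual test for stratification can be sketched in Python. This is a minimal sketch under stated assumptions: a formula is given only by its atomic membership statements x ∈ y (connectives and quantifiers do not affect stratification), distinct bound variables have distinct names, and is_stratified is a hypothetical helper. It looks for levels f with f(y) = f(x) + 1 for every atom, as in the condition above.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple

def is_stratified(atoms: List[Tuple[str, str]]) -> Optional[Dict[str, int]]:
    """atoms: pairs (x, y), each standing for the atomic formula 'x ∈ y'.

    Returns levels f with f(y) == f(x) + 1 for every atom, shifted so that the
    smallest level is 0, or None if the formula is not stratified.
    """
    # Constraint graph: edge x -> y means "one level up", y -> x "one level down".
    graph = defaultdict(list)
    for x, y in atoms:
        graph[x].append((y, 1))
        graph[y].append((x, -1))

    level: Dict[str, int] = {}
    for start in list(graph):
        if start in level:
            continue
        level[start] = 0
        queue = deque([start])
        while queue:  # breadth-first propagation of the level constraints
            v = queue.popleft()
            for w, step in graph[v]:
                if w not in level:
                    level[w] = level[v] + step
                    queue.append(w)
                elif level[w] != level[v] + step:
                    return None  # two constraints demand different levels
    lowest = min(level.values(), default=0)
    return {v: n - lowest for v, n in level.items()}

print(is_stratified([("x", "x")]))              # Russell's x ∈ x
print(is_stratified([("y", "x"), ("x", "z")]))  # y ∈ x ∧ x ∈ z
```

The first call finds no assignment, matching the argument above; the second assigns y, x and z to three successive levels.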
We quote from [86, page 3]: “NF is weak for mathematical induction and the axiom of choice is not compatible with NF. We cannot prove Peano’s axiom[s] in it, unless we assume the existence of a class with m + 1 elements. Also, NF is said to lack motivation because its axiom of comprehension is justified only on technical grounds and one’s mental image of set theory does not lead to such an axiom. To overcome some of the difficulties, Quine adopted similar measures to NBG (Neumann-Bernays-Gödel, see section 8.5) set theory[, and developed another non-iterative set theory called ML (Mathematical Logic), first presented in [70]]. Like NBG, ML contains a bifurcation of classes into elements and non-elements. Sets can enjoy the property of being full objects whereas classes cannot. ML was obtained from NF by replacing (NFC) by two axioms, one for class existence and one for elementhood. The rule of class existence provides [. . . ] the existence of the classes of all elements satisfying any condition ϕ, stratified or not. The rule of elementhood is such as to provide the elementhood of just those classes which exist for NF. Therefore, the two axioms of comprehension for ML [are]:
Comprehension by a set: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is stratified with set variables only in which y does not occur free.
Impredicative comprehension by a class: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is any formula in which y does not occur free.
ML was liked both for the manipulative convenience we regain in it and the symmetrical universe it furnishes. It was however proved subject to the Burali-Forti paradox”.
For more information, we refer to [70], [71], [72] and the website
http://diamond.boisestate.edu/~holmes/holmes/nf.html.
Chapter 8
Gödel
The development of mathematics towards greater precision has led, as is well known, to the formalization of large tracts of it, so that one can prove any theorem using nothing but a few mechanical rules. [. . .] It will be shown below that this is not the case, that on the contrary there are in the two systems mentioned [viz. Principia Mathematica and ZF] relatively simple problems in the theory of integers that cannot be decided on the basis of the axioms.
 K. Gödel, in the opening of the paper introducing the incompleteness theorem (1931)
8.1 Informally: Gödel’s incompleteness theorems
No system of Hilbert’s type in which the integers (or Peano’s arithmetic, see section 4.1) can be defined can be both consistent and complete. Before 1931 this seemed unthinkable, but in that year Kurt Gödel (born in 1906 in Brünn, Austria-Hungary, now Brno, Czech Republic) presented mathematicians with the astounding and melancholy conclusion that the axiomatic method has certain limitations, which rule out the possibility that even ordinary arithmetic (as formalized by Peano) can ever be fully axiomatized. As a corollary of this theorem, he proved that it is impossible to establish the internal logical consistency of a very large class of deductive systems. This result provoked a reappraisal of the philosophies of mathematics.
Gödel’s famous incompleteness theorem and its corollary are also called the first and the second incompleteness theorem. Gödel was able to show that, if an axiomatic system of formalized arithmetic is wide enough, then
1. The system is necessarily incomplete, in the sense that there exists a
formula ϕ of the system such that neither ϕ nor its negation is derivable
(see also section 8.2 for the deﬁnition of incompleteness), and
2. If the system is consistent, then no proof of its consistency is possible
which can be formalized within it (see also section 8.2 for the deﬁnition
of consistency).
In this section we first sketch the main lines of both theorems in 8 steps, following the lines of Gödel’s original proof; a more rigorous and exact proof of the theorems is given in section 8.2 and further sections.
1 The (syntax of) formulas of an axiomatic system are precisely defined and built up from a finite alphabet of symbols. Proofs are nothing but finite sequences of formulas and can be replaced by numbers. With such a representation, the Gödel numbering, Gödel gave a well-ordering of all well-formed formulae of an axiomatic system S (Gödel’s original argument assumes S to be ω-consistent, see section 8.3 for more details). Gödel then showed how to represent metamathematical concepts such as ‘formula’, ‘proof-schema’ and ‘provable formula’ in terms of natural numbers. We define gn(ϕ) to be the Gödel number corresponding to the well-formed formula ϕ of S.
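The numbering idea of step 1 can be illustrated with prime-power coding, the kind of coding Gödel used for sequences. The Python below is only a sketch: the alphabet SYMBOLS and its codes 1, 2, 3, ... are hypothetical (not Gödel’s actual symbol table), and gn and decode are illustrative names.

```python
from typing import Iterator, List

# Hypothetical alphabet; each symbol gets a positive code 1, 2, 3, ...
SYMBOLS = ["0", "S", "=", "+", "*", "(", ")", "~", "->", "A", "x"]
CODE = {s: i + 1 for i, s in enumerate(SYMBOLS)}

def primes() -> Iterator[int]:
    """Yield 2, 3, 5, 7, ... by trial division."""
    found: List[int] = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def gn(formula: List[str]) -> int:
    """Goedel number of a symbol sequence s1 s2 ... sk: 2^c(s1) * 3^c(s2) * ..."""
    number = 1
    for p, symbol in zip(primes(), formula):
        number *= p ** CODE[symbol]
    return number

def decode(number: int) -> List[str]:
    """Invert gn (for numbers produced by gn) by reading off the prime exponents."""
    formula = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        formula.append(SYMBOLS[exponent - 1])
    return formula

print(gn(["0", "=", "0"]))  # 2^1 * 3^3 * 5^1 = 270
print(decode(gn(["S", "0", "=", "S", "0"])))
```

Because prime factorization is unique, distinct formulas receive distinct numbers, and syntactic operations on formulas become arithmetical operations on numbers.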
2 We consider a formula prov(ϕ) of S, stating that ϕ is a provable formula, i.e. prov(ϕ) := ‘ϕ is a provable formula’. A class sign is a formula with just one free variable. We suppose that the class signs are ordered by a function R with domain N, such that R(n) is the n-th class sign. By [R(n); q] we denote the formula obtained by replacing the free variable in R(n) by q.
3 We now define a set K of natural numbers by n ∈ K ↔ ¬prov([R(n); n]). Since the symbols that are used in this formula are all definable in S, there is also a formula with one free variable (i.e. a class sign) that expresses n ∈ K. We call this class sign C. So there is a natural number q such that C = R(q). We now show that the proposition G ≡ [R(q); q] is unprovable in S. Since¹ this formula says that q ∈ K, that is ¬prov([R(q); q]), we can say that G is a proposition that asserts of itself that it is not provable.
4 We show that, if S is consistent, neither G nor ¬G is provable, and hence G is undecidable:
• Suppose G is provable. Then [R(q); q] is provable, and hence true, so (by replacing the variable in the class sign by q) q ∈ K, i.e. ¬prov([R(q); q]), and this says ¬prov(G): G is not provable.
• Suppose ¬G is provable, i.e. its negation ¬[R(q); q] is provable. Then (by replacing the variable in the class sign C by q) q ∉ K, i.e. ¬¬prov([R(q); q]), which is equivalent to prov([R(q); q]), or prov(G): G is provable.
In the first case the assumption refutes itself; in the second case S would prove both G and ¬G, and would thus be inconsistent. So if we assume that S is consistent, then neither G nor ¬G is provable: G is undecidable in S.
5 By a metamathematical consideration, however, we know that G is true: G asserts its own unprovability, and G is indeed unprovable (because it is undecidable). So there is a true statement of S (namely G) that is not provable: the system S is incomplete!
6 If we add G as an axiom, we can apply the argument of the previous five steps again in the same way. We then obtain another formula G′, since in step 3 a proposition is defined that states ‘this formula is not provable’, or in other words ‘this formula does not follow from the axioms’; that is, the proposition depends on the set of axioms. Therefore, as I. Grattan-Guinness aptly puts it in [31, page 510], the system S is ‘essentially incompleteable’.
7 Gödel then showed that ‘if arithmetic is consistent, it is incomplete’. We want to prove this conditional statement as a whole. We define the condition of the statement by A: ‘arithmetic is consistent’. We have already seen in section 6.1 that this means that there is at least one formula ϕ of arithmetic that is not provable. So we can express A ≡ (∃y :: (∀x :: ¬(x is a proof of y))). A system is incomplete if there is a true statement that is not provable. Thus we can represent the conclusion of the conditional statement by G.
¹ By replacing the free variable in the class sign C, which expresses that n ∈ K for some natural number n, by q.
8 We can now formally prove A → G (see section 8.2 for the proof). This means that if A is provable, we know (by modus ponens, or the rule of detachment) that G is provable. But we already saw that (unless S is inconsistent) G is not provable; thus if S is consistent, A is not provable! That means that if arithmetic is consistent, its consistency cannot be established by metamathematical reasoning within the formalism of arithmetic (this is Gödel’s theorem 11, see [93, page 614]). Or, as expressed in [31, page 510], ‘any set S of consistent formulae of PM cannot include the formula F asserting its consistency’.
8.2 Formally: Gödel’s Incompleteness Theorems
The first incompleteness theorem says that Principia Mathematica, or any other system in which arithmetic can be developed, is essentially incomplete: in any consistent set of arithmetical axioms there are statements that are true but cannot be derived from the set.
The second theorem says that it is impossible to give a metamathematical proof of the consistency of a system comprehensive enough to contain the whole of arithmetic, unless the proof itself employs rules of inference that differ in certain essential respects from the derivation rules identifying theorems within the system.
In the following two subsections, we will first give an abstract version of Gödel’s first and second incompleteness theorems and investigate the set of languages that the theorems apply to; in the third subsection we then fill in the details by giving a specific Gödel numbering for arithmetic. In the next sections we will apply the theorems to the system of Peano Arithmetic and that of Principia Mathematica, and discuss the consequences of the incompleteness theorem.
8.2.1 On formally undecidable propositions
We assume there is an STGA language L and investigate the conditions under which, as Gödel showed, there is a true sentence of L that is not provable in L (i.e. (∃t : t ∈ T : t ∉ 𝒫)). We define the following concepts:
A predicate H expresses a set of numbers A := (∀n :: H(n) ∈ T ↔ n ∈ A)
A is expressible in L if A is expressed by some predicate of L. Note that expressibility in L is concerned only with T, and not with 𝒫 and ℛ.
Theorem: Not every set of numbers is expressible.
Proof: (from [84]) Since L is built up from a finite number of symbols and derivation rules, there are only denumerably many expressions or predicates of L. But (by Cantor’s theorem, see page 69) there are non-denumerably many sets of natural numbers. Therefore, not every set of numbers is expressible in L.
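The diagonal construction behind Cantor’s theorem can be sketched in Python. The sketch assumes that a set of natural numbers is represented by its membership predicate and that an enumeration is any function sending n to the n-th set; diagonal_set and multiples_of are hypothetical names. Since the predicates of L can be listed in such an enumeration, the diagonal set of that list differs from every expressible set. A finite check can of course only illustrate the argument, not replace it.

```python
from typing import Callable

NumberSet = Callable[[int], bool]  # membership predicate of a set of natural numbers

def diagonal_set(enumeration: Callable[[int], NumberSet]) -> NumberSet:
    """Return D = {n : n is not in A_n}, where A_n = enumeration(n).

    D disagrees with every A_n about the number n, so D is not among the A_n.
    """
    return lambda n: not enumeration(n)(n)

def multiples_of(n: int) -> NumberSet:
    """Example enumeration: A_n is the set of multiples of n + 2."""
    return lambda m: m % (n + 2) == 0

D = diagonal_set(multiples_of)
print([n for n in range(10) if D(n)])
```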
Let gn be a function that assigns to each expression a unique natural number (just as in step 1 in section 8.1, i.e. gn is a bijection between the set ℰ of expressions and N). For any E ∈ ℰ, we also call gn(E) the Gödel number of E. We will give a specific numbering in section 8.2.3. For this abstract treatment the only assumption² we make is that every number is the Gödel number of some expression.
We define $E_n$ by the inverse of gn, i.e. $E_n$ is the expression with $\mathrm{gn}(E_n) = n$. The diagonalization of a predicate $E_n$ is defined as the sentence $E_n(n)$. We define $d(n)$ to be the Gödel number of the diagonalization of $E_n$, that is $d(n) := \mathrm{gn}(E_n(n))$, and call $d$ the diagonal function of the system. For each set of natural numbers $A$, we define $A^*$ to be the set of all numbers $n$ such that $d(n) \in A$, i.e. we have $n \in A^* \leftrightarrow d(n) \in A$. For any set of natural numbers $A$, we define its complement $\overline{A}$ to be the set of all natural numbers not in $A$. The complement operation binds more strongly than $^*$, i.e. $\overline{A}^*$ is to be read as $(\overline{A})^*$.
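The two operations can be sketched in Python, again representing sets by membership predicates. The diagonal function d below is a hypothetical stand-in (any function from numbers to numbers will do here), because a real d depends on the numbering of section 8.2.3. The check illustrates that the starred complement of A coincides with the complement of the starred set A*, which the proofs below rely on.

```python
from typing import Callable

NumberSet = Callable[[int], bool]

def star(A: NumberSet, d: Callable[[int], int]) -> NumberSet:
    """A* = {n : d(n) in A}."""
    return lambda n: A(d(n))

def complement(A: NumberSet) -> NumberSet:
    """The complement of A within the natural numbers."""
    return lambda n: not A(n)

def d(n: int) -> int:
    """Hypothetical stand-in for the diagonal function of a system."""
    return 3 * n + 1

def even(n: int) -> bool:
    """The example set A of even numbers."""
    return n % 2 == 0

starred_complement = star(complement(even), d)   # (complement of A)*
complement_of_star = complement(star(even, d))   # complement of A*
print(all(starred_complement(n) == complement_of_star(n) for n in range(100)))
```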
Abstract form of Gödel’s first theorem: Let P be the set of Gödel numbers of all the provable sentences. If the set $\overline{P}^*$ is expressible in L and L is correct (every sentence provable in L is true, and every sentence refutable in L is false), then there is a true sentence of L not provable in L.
Proof: (based on [84]) Suppose L is correct and $\overline{P}^*$ is expressible in L by a predicate H with Gödel number h. Let G be the diagonalization of H (i.e. the sentence H(h)). We show that G is true but not provable in L. H expresses $\overline{P}^*$ in L, i.e. H(n) is true ↔ $n \in \overline{P}^*$ for all n ∈ N. In particular, H(h) is true ↔ $h \in \overline{P}^*$. We have that $h \in \overline{P}^* \leftrightarrow d(h) \in \overline{P} \leftrightarrow d(h) \notin P$. But since h is the Gödel number of H, by the definition of d, d(h) is the Gödel number of H(h), and so d(h) ∈ P ↔ H(h) is provable in L, and d(h) ∉ P ↔ H(h) is not provable in L. Now we have: H(h) is true ↔ H(h) is not provable in L. This means that H(h) is either true and not provable in L, or false but provable in L. The latter alternative violates the hypothesis that L is correct. Hence H(h) must be true but not provable in L.
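The last step of this proof is purely propositional. As a small sanity check (a sketch, not part of [84]), the Python snippet below enumerates all truth values for ‘H(h) is true’ and ‘H(h) is provable’, keeps only the assignments allowed by the biconditional and by correctness, and confirms that only ‘true but not provable’ remains.

```python
from itertools import product

surviving = [
    (is_true, is_provable)
    for is_true, is_provable in product([False, True], repeat=2)
    if is_true == (not is_provable)     # H(h) is true <-> H(h) is not provable
    and (not is_provable or is_true)    # correctness: provable implies true
]
print(surviving)
```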
Note that in this proof we have not defined the set T by a model, but determined the truth of G by a metamathematical argument, just as in step 5 of section 8.1; such an argument is nevertheless commonly accepted by mathematicians. Note also that the proposition G corresponds to the proposition G of point 3 of section 8.1, since H(h) is a proposition that expresses of itself that it is not provable.
² This assumption is made for technical reasons that make the proof simpler; Gödel’s original numbering did not have this restriction.
Theorem: If L is correct and if the set $\overline{P}^*$ is expressible in L, then L is incomplete.
Proof: A system L that is correct and for which the set $\overline{P}^*$ is expressible in L contains a sentence G that is true but not provable (by the previous theorem). G is not refutable either, since by the assumption of correctness every refutable sentence is false. Hence G is undecidable in L, and therefore L is incomplete.
That is where the name incompleteness theorem comes from. By this theorem, it follows immediately that if a system is correct and the set $\overline{P}^*$ is expressible in that system (which we will later see is true for a system of basic arithmetic), then it is incomplete. Note that this is the statement A → G of point 8 in section 8.1.
When we study a particular language L, such as a system containing Peano’s arithmetic or the system of Principia Mathematica, we have to verify the assumption that $\overline{P}^*$ is expressible in L. We can do this by separately verifying the following conditions.
$G_1$: For any set A expressible in L, the set $A^*$ is expressible in L.
$G_2$: For any set A expressible in L, the set $\overline{A}$ is expressible in L.
$G_3$: The set P is expressible in L.
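Theorem: $G_1 \wedge G_2 \wedge G_3 \rightarrow \overline{P}^*$ is expressible in L.
Proof: $G_1$ and $G_2$ imply that for any expressible set A, $\overline{A}^*$ is expressible in L: by $G_2$ the set $\overline{A}$ is expressible, and then by $G_1$ so is $(\overline{A})^*$. In particular, if P is expressible in L (i.e. $G_3$ holds), then $\overline{P}^*$ is expressible in L.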
Before we prove a general form of Gödel’s second incompleteness theorem, we introduce some more definitions.
A sentence $E_n$ is a Gödel sentence for a set A of natural numbers if either $E_n$ is true and its Gödel number lies in A, or $E_n$ is false and its Gödel number lies outside A, i.e. $E_n$ is a Gödel sentence for A if and only if $E_n \in T \leftrightarrow n \in A$.
Diagonal Lemma: For any set A, if $A^*$ is expressible in L, then there is a Gödel sentence for A.
Proof: Suppose H is a predicate that expresses $A^*$ in L; let h be its Gödel number. Then d(h) is the Gödel number of H(h). For any number n, H(n) is true ↔ $n \in A^*$; therefore H(h) is true ↔ $h \in A^*$ ↔ $d(h) \in A$, and since d(h) is the Gödel number of H(h), H(h) is a Gödel sentence for A.
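The self-reference in the diagonal lemma can be imitated with strings instead of numbers, much like a quine. In this hypothetical Python sketch an expression is a string with the free variable X, and diag replaces X by the quotation of the expression itself; quotation plays the role of the Gödel number here. The predicate h, ‘the diagonalization of X belongs to A’, plays the role of H expressing A*, and its diagonalization G talks about a sentence that turns out to be G itself, so G holds exactly when G belongs to A.

```python
def diag(expr: str) -> str:
    """Diagonalization: replace the free variable X by the quotation of expr."""
    return expr.replace("X", repr(expr))

h = "the diagonalization of X belongs to A"   # plays the role of H, expressing A*
G = diag(h)                                   # plays the role of H(h)
print(G)

# G names the quoted expression h; diagonalizing that expression gives back G itself.
quoted = G[G.index("'") + 1 : G.rindex("'")]
print(diag(quoted) == G)
```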
Lemma: If L satisfies $G_1$, then for any set A expressible in L, there is a Gödel sentence for A.
Proof: L satisfies $G_1$, thus for any expressible set A, $A^*$ is expressible in L. Now we can apply the previous lemma to conclude that there is a Gödel sentence for A.
With the diagonal lemma we can also prove the first theorem as follows: since $\overline{P}^*$ is expressible in L, by the diagonal lemma there is a Gödel sentence G for $\overline{P}$. A Gödel sentence for $\overline{P}$ is a sentence which is (by the definition of a Gödel sentence) true if and only if it is not provable in L. So for any correct system L, a Gödel sentence for $\overline{P}$ is a sentence which is true but not provable in L.
8.2.2 The impossibility of an ‘internal’ proof of consistency
With the diagonal lemma we can also prove a general form of Gödel’s second theorem, which was first formulated in this form by the Polish mathematician Alfred Tarski.
A general form of Gödel’s second theorem (by Tarski)
1. The set $\overline{T}^*$ is not expressible in L.
2. If condition $G_1$ holds, then $\overline{T}$ is not expressible in L.
3. If conditions $G_1$ and $G_2$ both hold, then the set T is not expressible in L (i.e. for systems for which $G_1$ and $G_2$ hold, truth within the system is not definable within the system).
Proof: To begin with, there cannot possibly be a Gödel sentence for the set $\overline{T}$, because such a sentence would be true if and only if its Gödel number was not the Gödel number of a true sentence, and this is absurd.
1. If $\overline{T}^*$ were expressible in L, then by the diagonal lemma there would be a Gödel sentence for the set $\overline{T}$, which we have just shown is impossible. Therefore, $\overline{T}^*$ is not expressible in L.
2. Suppose condition $G_1$ holds. Then if $\overline{T}$ were expressible in L, the set $\overline{T}^*$ would be expressible in L, violating (1).
3. If $G_2$ also holds, then if T were expressible in L, $\overline{T}$ would also be expressible in L, violating (2).
Now that we have seen both theorems in a general form, we will consider particular mathematical languages, starting with first-order arithmetic, on which we build in section 8.3 to prove the incompleteness of systems based on Peano’s arithmetic and other systems.
8.2.3 Gödel numbering and a concrete proof of $G_1$, $G_2$ and $G_3$
This section will be completed in a later version of this document. For the
moment we refer to Gödel’s original work, which can be found in [93].
8.3 Gödel’s theorem and Peano Arithmetic
The classification of the various modes of syllogisms, when they are exact, has little importance in mathematics. In the mathematical sciences are found numerous forms of reasoning irreducible to syllogisms.
 G. Peano in [68, page 379]
There are various different incompleteness proofs of Peano Arithmetic (with and without exponentiation). We mention three of them. The simplest uses a truth set defined by Tarski and shows that every axiomatizable subsystem of the complete theory of arithmetic is incomplete. This proof of Gödel’s first theorem, however, cannot be formalized in arithmetic (since the truth set is not expressible in arithmetic), and it is based on the underlying assumption that Peano Arithmetic is correct, implying that every sentence provable in Peano Arithmetic is a true sentence. Gödel’s original incompleteness proof involves the much weaker assumption of ω-consistency.
Definition of simple consistency: An axiomatic system A is simply consistent := no sentence is both provable and refutable in A
Definition of ω-inconsistency: An axiomatic system A is ω-inconsistent := there is a predicate F(w) (in one free variable w) such that the sentence (∃w :: F(w)) is provable but all the sentences F(0), F(1), . . . are refutable. A is ω-consistent if it is not ω-inconsistent.
Definition of ω-incompleteness: An axiomatic system A is ω-incomplete := there is a predicate F(w) (in one free variable w) such that all the sentences F(0), F(1), . . . are provable but the sentence (∀w :: F(w)) is not provable
Gödel’s original proof was based on the assumption of ω-consistency and shows that every axiomatizable ω-consistent system in which all true $\Sigma_0$-sentences are provable is incomplete. This proof is of course formalizable in Peano Arithmetic (and this is necessary for Gödel’s second theorem), and it also shows that any axiomatizable system A that is simply consistent and in which all true $\Sigma_0$-sentences are provable is ω-incomplete.
The third proof (1936) is due to Rosser and uses the even weaker assumption of simple consistency. It is based on an axiomatic system by the American mathematician Raphael Robinson (1912-1995), which we refer to as R. It shows that every axiomatizable simply consistent extension of R is incomplete, but to that end it uses a more elaborate sentence than the Gödel sentence, which asserts ‘G is not provable’.
We intend to include the three proofs in a later version of this document. They can be found in [84], albeit in a particular presentation that does not use the concept of a model for axiomatic systems and that sometimes attaches different meanings to established definitions; nevertheless, it contains in our opinion one of the best discussions of Gödel’s incompleteness theorems.
In a later version of this document we will also show how, given the proof of incompleteness of Peano Arithmetic, Gödel’s theorems apply to Principia Mathematica.
We quote K. Gödel on the first page of [27]:
The most comprehensive formal systems that have been set up hitherto are the system of Principia Mathematica on the one hand and the Zermelo-Fraenkel axiom system of set theory (further developed by J. von Neumann) on the other. These two systems are so comprehensive that in them all methods of proof today used in mathematics are formalized, that is, reduced to a few axioms and rules of inference. One might therefore conjecture that these axioms and rules of inference are sufficient to decide any mathematical question that can at all be formally expressed in these systems. It will be shown that this is not the case, that on the contrary there are in the two systems mentioned relatively simple problems in the theory of integers that cannot be decided on the basis of the axioms.
8.4 Consequences
I had a lot of conversations with him [Gödel] and a lot of disagreements. Like most others, I was hard to convince about the incompleteness theorem. There was at the time a tendency, which I shared, to think that it was special to a certain type of formalization of logic and that a radical reformalization might have the effect that the Gödel argument did not apply. I persisted in that longer than I should have, and he was always trying to convince me otherwise.
 A. Church in an interview at Princeton University (1985)
In a later version of this document we will discuss the implications of Gödel’s theorem and show the reactions that followed the publication of his paper [27] in 1931.
8.5 Neumann-Bernays-Gödel axioms
There is an inﬁnite set A that is not too big.
There’s no sense in being precise when you don’t even know what
you’re talking about.
 John von Neumann (sources unknown)
Let us recapitulate the situation of the axiomatic theory of sets before we introduce the Neumann-Bernays-Gödel theory.
When Cantor introduced his set theory, he gave the informal definition (see page 16) of a set as ‘any comprehension into a whole M of definite and separate objects m of our intuition or thought’. After Hilbert proposed his proof theory, set theory was given a more rigorous basis, and axiomatic theories for Cantor’s sets were proposed. Cantor’s definition was replaced by the principle of comprehension (see page 16), which was adopted by Frege and Russell. Based on this principle a first formal theory of sets, called ‘ideal calculus’, was developed (not treated in detail here, see for example [36]). The antinomies of Burali-Forti and Russell, however, showed that this theory was inconsistent, and one way to restore consistency was to incorporate a theory of types in the system, as was done by Russell. At the same time, intuitionists tried to do mathematics without Cantor’s set theory at all. Others tried to overcome the inconsistencies by making Cantor’s set theory more rigidly axiomatic, and the most successful axiomatization of set theory was presented by Zermelo in 1908.
Zermelo’s task was to axiomatize set theory in such a way that all contradictions are excluded, while the theory remains sufficiently wide to preserve all that is valuable in it. As we have seen in section 5.3, Zermelo postulated a domain of abstract objects (sets) and elements of this domain, took as primitive the notions of ‘equality’ and the ‘is element of’ relation, and introduced 7 axioms. The comprehension axiom was replaced by the weaker separation axiom, which only allows new sets to be created from existing sets and with definite predicates. Before we describe why the Hungarian mathematician von Neumann opposed this solution and came up with his own solution to the paradoxes, we will look at this separation axiom in more detail. Zermelo defined the separation axiom as follows:
Separation axiom:
(∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), ϕ is deﬁnite and does not contain y. For
every set z there exists a set y whose elements are exactly those of z having
the property ϕ.
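Separation can only cut a subset out of a set that is already given. A minimal Python sketch, assuming hereditarily finite sets modelled as frozensets (so no set is a member of itself and x ∉ x always holds in this model); separate is a hypothetical helper. Applying Russell’s condition ϕ(x) ≡ x ∉ x to a given set z yields a subset y of z, and, as in the general argument, y is not a member of z, so no paradox can be derived.

```python
def separate(z: frozenset, phi) -> frozenset:
    """Separation: the set {x in z : phi(x)}, always a subset of the given set z."""
    return frozenset(x for x in z if phi(x))

empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
z = frozenset({empty, one, two})

# Russell's condition, applied only inside the given set z.
y = separate(z, lambda x: x not in x)
print(y <= z, y in z)

# Another definite condition: the elements of z with exactly one element.
singletons = separate(z, lambda x: len(x) == 1)
print(singletons == frozenset({one}))
```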
The concept of definiteness in this axiom was defined by Zermelo as follows: “A question or assertion ϕ, the validity or invalidity of which is decided without arbitrariness by the basic laws of logic, is said to be ‘definite’ ”. We have already seen on page 93 that this axiom excludes the paradoxes of Russell and Burali-Forti, and, as Kneebone remarks³ in [49, page 263], also the semantic paradoxes.
In [83], the Norwegian mathematician Skolem pointed out that the definition of ‘definiteness’ was rather vague, and he made precise the formulation of ‘by the basic laws of logic’. Fraenkel used Skolem’s idea to formulate the separation axiom in a new way (for details, see [49, pages 290-291]). In 1922 Fraenkel proposed the introduction of another axiom that allows the existence of larger cardinal numbers than hitherto possible. The foundation axiom of von Neumann makes the occurrence of so-called extraordinary sets impossible. A set $V_1$ is extraordinary if there is an infinite sequence of sets $V_1, V_2, V_3, \ldots$ such that $V_2 \in V_1$, $V_3 \in V_2$, etc. Von Neumann’s subsequent interest in set theory led to the second major axiomatization of set theory in the 1920s. His formulation differed considerably from that of Zermelo and Fraenkel (see section 5.3) because the notion of function, rather than that of set, was taken as primitive. In a series of papers beginning in 1937, however, the Swiss logician Paul Bernays, a collaborator of the formalist David Hilbert, modified the von Neumann approach in a way that put it in much closer contact with Zermelo and Fraenkel. In 1940, the Czech-born Kurt Gödel, known for his incompleteness proof (see chapter 8), further simplified the theory. This version is known as the Neumann-Bernays-Gödel (NBG) axioms.
³ We quote: “since a definite property is one that is decidable by the basic relations of the domain B [of sets, the abstract objects postulated by Zermelo], no such property as that of being definable in a finite number of words can be used in the definition of a set, and the semantic paradoxes are thus also excluded”.
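For finitely many sets, extraordinary sets can be detected mechanically: in a finite membership structure an infinite descending chain V1 ∋ V2 ∋ V3 ∋ … exists exactly when the membership graph contains a cycle. The Python sketch below assumes sets are given by a hypothetical dictionary mapping each set’s name to the names of its elements, and searches for a cycle by depth-first search. It does not apply to infinite structures, where a descending chain need not repeat.

```python
def has_infinite_descent(members: dict) -> bool:
    """members[v] lists the elements of v; report an infinite ∈-descending chain.

    In a finite membership graph such a chain exists exactly when there is a
    cycle, which we detect with depth-first search (GREY = on the current path).
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in members}

    def visit(v) -> bool:
        colour[v] = GREY
        for w in members.get(v, []):
            state = colour.get(w, WHITE)
            if state == GREY:          # back edge: w is an ancestor, so a cycle exists
                return True
            if state == WHITE and visit(w):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in list(members))

print(has_infinite_descent({"a": ["b"], "b": []}))     # a ∋ b, and b is empty
print(has_infinite_descent({"a": ["b"], "b": ["a"]}))  # a ∋ b ∋ a ∋ b ...
```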
Before we give the axioms, it is convenient to adopt the undeﬁned notions
of class and the membership relation (though, as is also true in Zermelo and
Fraenkel, ∈ suﬃces). In the axioms we distinguish between the use of capital
Latin letters and lowercase Latin letters for the variables. The capital letters
stand for variables that take classes (the totalities corresponding to certain
properties) as values. A class is deﬁned to be a set if it is a member of some
class; those classes that are not sets are called proper classes. The lowercase
letters are used as special restricted variables for sets.
Example: ‘for all x, A(x)’ stands for ‘for all X, if X is a set, then A(X)’;
i.e. the condition holds for all sets. Intuitively, sets are intended to be those
classes that are adequate for mathematics, and proper classes are thought
of as those collections that are ‘so big’ that, if they were permitted to be
sets, contradictions would follow. In the Neumann-Bernays-Gödel axioms,
the classical paradoxes are avoided. This can be proven by showing in each
case that the collection on which the paradox is based is a proper class, i.e. is
not a set.
Theorem: With the Neumann-Bernays-Gödel axioms, the derivation of Russell’s paradox does not apply.
Proof: We show that R := {x | x is a set ∧ x ∉ x} is a class, but not a set. For all y we have that y ∈ R ↔ y is a set ∧ y ∉ y. We prove by contradiction that R is not a set.
Suppose R is a set. Suppose R ∈ R. Then (taking R for y in the above statement) R ∈ R ↔ R is a set ∧ R ∉ R, so R ∉ R: contradiction. So we must have R ∉ R. Then, since R is a set by our assumption, R is a set ∧ R ∉ R, and thus R ∈ R: contradiction. Since in both cases (R ∈ R and R ∉ R) we get a contradiction, our assumption that R is a set must be wrong.
The Neumann-Bernays-Gödel axioms (NBG):
1 Extensionality axiom (or axiom of determination):
(∀X, Y, z :: (z ∈ X ↔ z ∈ Y) → X = Y)
Classes are uniquely determined by their members; to be exact: if every element (that is a set) of a class X is at the same time an element of Y, and conversely, then X = Y.
138 CHAPTER 8. GÖDEL
2 Axiom of the empty set:
(∃x ∀y :: y ∉ x)
There is an (improper, see also footnote on page 93) set, the ‘null’ or ‘empty’ set, which contains no elements at all.
3 Axiom for class formation: (∃Y ∀x :: x ∈ Y ↔ ϕ(x)), where ϕ is a proposition in which only set variables are quantified (by existential and universal quantifiers). For every such ϕ there exists a class Y whose elements are exactly the sets x having the property ϕ.
4 Pairing axiom:
(∀a, b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b))
Given two sets a and b there exists a set whose elements are exactly a
and b.
5 Sumset axiom or Union axiom:
(∀z∃y∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w))
For every set z there exists a set y whose elements are exactly those
objects occurring in at least one element of z.
6 Power set axiom:
(∀z ∃y ∀x :: x ∈ y ↔ x ⊆ z)
For every set z there is a set y whose elements are exactly the subsets of z.
7 Axiom of infinity:
(∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z))
There exists a successor set.
8 Axiom of choice:
(∀x :: (∃f : f is a function : Dom(f) = x − {∅} ∧ (∀a : a ∈ Dom(f) : f(a) ∈ a)))
Every set x has a choice function, which picks an element f(a) from every nonempty element a of x.
9 Axiom of replacement or axiom of substitution (by Fraenkel):
(∀x∃!y : ϕ is a class : ϕ(x, y)) → (∀a :: (∃b∀y :: y ∈ b ↔ (∃x : x ∈ a :
ϕ(x, y))))
The image of a set under an operation (functional property) is again a
set.
10 Axiom of restriction: X ≠ ∅ → (∃y :: y ∈ X ∧ X ∩ y = ∅)
Every nonempty class is disjoint from one of its elements.
The axioms 1, 3, 9 and 10 are different from ZF. The third axiom (scheme) is presented in a form to facilitate a comparison with the third axiom (scheme) of ZF. In a detailed development of NBG, however, there appears instead a list of seven axioms (not schemes) stating that for each of certain conditions there exists a corresponding class of all those sets satisfying the condition. From this finite set of axioms, each instance of the above scheme can be obtained as a theorem. When obtained in this way, the third axiom scheme of NBG is called the class existence theorem.
In contrast to the ninth axiom scheme of ZF (see section 5.3.2), the ninth axiom of NBG is not an axiom scheme but a single axiom. Thus, with the comments above about the third axiom in mind, it follows that NBG has only a finite number of axioms. On the other hand, since the ninth axiom scheme of ZF provides an axiom for each formula, ZF has infinitely many axioms. The finiteness of the axioms for NBG makes the logical study of the system simpler.
The relationship between the theories may be summarized by the statement that ZF is essentially the part of NBG that refers only to sets. We give the following theorems without proof:
Theorem: Every theorem of ZF is a theorem of NBG.
Theorem: Any theorem of NBG that speaks only about sets is a theorem of ZF.
Theorem: ZF is consistent if and only if NBG is consistent.
Note that the fact that NBG avoids the classical paradoxes, and that there is no apparent way to derive any one of them in ZF, does not settle the question of the consistency of either theory. All we know from the last theorem is that either both theories are consistent, or both are inconsistent.
Chapter 9
Church and Turing
9.1 Turing and Turing Machine
We may hope that machines will eventually compete with men in
all purely intellectual ﬁelds.
 Alan Turing in [38, page 46]
Alan Mathison Turing (1912-1954) was an English mathematician and logician who pioneered in the field of computer theory and who contributed important logical analyses of computer processes. Turing studied in Cambridge, worked there on probability theory and (independently of de Moivre) discovered the central limit theorem. In 1936 he won the Smith’s Prize. As we have seen in the previous chapters, many mathematicians had attempted to eliminate all possible error from mathematics by establishing a formal, or purely algorithmic, procedure for establishing truth (the so-called formalist program). With his incompleteness theorem (see section 8.1), Kurt Gödel threw up an obstacle to this effort, for he showed that any useful mathematical axiom system is incomplete in the sense that there must exist propositions whose truth can never be decided (within the system). Turing was motivated by Gödel’s work to seek an algorithmic method of determining whether any given proposition was undecidable, with the ultimate goal of eliminating such propositions from mathematics. Instead, he proved in his seminal paper ‘On Computable Numbers, with an Application to the Entscheidungsproblem’ (reprinted in [19]) that there cannot exist any such universal method of determination. We now regard this decision problem, or Entscheidungsproblem, in more detail.
Decidability was one of Hilbert’s requirements for an axiomatic system (see section 6.1). The problem of decidability asks if, given a mathematical proposition, one could find an algorithm which would decide if the proposition is true or false. When an algorithm is given, it is easy to see that it can decide certain propositions. But it is more difficult to prove that there is no algorithm that can decide certain propositions. To this end, Turing introduced a hypothetical computing device (later called the Turing machine). The Turing Machine and the proof of undecidability are given later in this section.
After this important publication Turing completed his Ph.D. in 1938 on systems of logic based on ordinals, under the direction of Alonzo Church (see section 9.2). During the war Turing worked on breaking German Enigma codes, and in 1948 he worked in Manchester on the construction of a new digital computer. He described a modern computer before technology had reached the point where its construction was a realistic possibility. His efforts in the construction of early computers and the development of early programming techniques were of prime importance. He also championed the theory that computers eventually could be constructed that would be capable of human thought, and he proposed a simple test, now known as the Turing test, to assess this capability. Turing’s papers on the subject are widely acknowledged as the foundation of research in artificial intelligence. In 1952 Turing published the first part of his theoretical study of morphogenesis, the development of pattern and form in living organisms.
The Turing Machine
Turing introduced his hypothetical computing device in 1936. He originally conceived the machine as a mathematical tool that could infallibly recognize undecidable propositions, i.e., those mathematical statements that, within a given formal axiomatic system (that includes at least arithmetic), can be neither proved nor disproved. Gödel had demonstrated that such propositions exist in any such system. Turing instead proved that there can never exist any universal algorithmic method for determining whether a proposition is undecidable. This was left open by Gödel, since the incompleteness theorem (see section 8.1) only stated that consistency and completeness could not be attained at the same time; that means there were statements (in consistent systems) about numbers, indubitably true, which could not be proved from finitely many rules. But the decidability of mathematical statements was not settled by Gödel’s theorem, because formulating the problem needs a formal definition of an (algorithmic) method (or a definition of the notion of algorithm in the definition of decidability in section 6.1). To this end Turing introduced a machine that was later to be called the Turing machine, an idealized mathematical model that reduces the logical structure of any computing device to its essentials. By abstracting the essential features of information processing, Turing was instrumental in the development of the modern digital computer. His model served as a basis for all subsequent digital computers, which share his basic scheme of an input/output device (tape and head), memory (tape) and central processing unit (head and transition function).
Nowadays there are many models of computing devices available in the theory of computation (and complexity). We will not cover restricted models such as finite automata and pushdown automata (and corresponding notions such as regular languages and context-free grammars). We now directly introduce the much more powerful model of Turing that we need to investigate all mathematical problems.
The Turing Machine model uses an infinite tape as its unlimited memory, and has a tape head that can read and write symbols (of a set Γ) and move around on the tape (to the L(eft) or R(ight)). We here assume the tape is right-infinite; this means the tape continues infinitely to the right but has a leftmost position. Initially the tape contains an input string of symbols from an input alphabet Σ and is blank (i.e. filled with a special blank symbol ⊔) everywhere else. The Turing Machine is in a state q of a set of states Q, and starts in an initial state q_0. It uses a transition function δ that determines how it gets from one configuration (that is, the current state, the tape contents and the head location) to the next. A transition consists of writing a new symbol of the tape alphabet Γ to the tape and moving the tape head either Left or Right, and depends on the current state and the current symbol on the tape. This computation (i.e. sequence of transitions) continues until the Turing Machine enters either the (final) state q_accept or the (final) state q_reject. We can define a Turing Machine (sometimes called deterministic, since each transition is determined uniquely by the configuration) formally as a septuple:
Definition of a Turing Machine (TM):
A Turing Machine (TM) := (Q, Σ, Γ, δ, q_0, q_accept, q_reject) with:
1 Q is a finite set of states.
2 Σ is a finite input alphabet not containing the special blank symbol ⊔.
3 Γ is a finite tape alphabet, where ⊔ ∈ Γ and Σ ⊆ Γ.
4 δ is the transition function, where δ is finite and δ : Q × Γ → Q × Γ × {L, R}.
5 q_0 is the start state, where q_0 ∈ Q.
6 q_accept is the accept state, where q_accept ∈ Q.
7 q_reject is the reject state, where q_reject ∈ Q and q_reject ≠ q_accept.
We call configurations accepting configurations if the state is q_accept, rejecting configurations if the state is q_reject, and halting configurations if the state is either q_accept or q_reject. A start configuration C on input w is a configuration with state q_0 in which the head is on the leftmost position of the tape and the tape contains just w.
After defining the Turing Machine, Turing made his famous proposal (known as Turing’s thesis, see also section 9.3) for the concept of ‘computability by a Turing machine’. The proposal says that whenever there is an effective method for obtaining the values of a mathematical function (i.e. it is intuitively or effectively computable), the function can be computed by a Turing Machine. The converse claim is trivial, and if the thesis is correct we can reduce problems of the (non)existence of effective methods to problems of the (non)existence of Turing Machines. We quote one of Turing’s formulations from [90]:
Turing’s Thesis: LCM’s [Logical Computing Machines, Turing’s expression for Turing Machines] can do anything that could be described as “rule of thumb” or “purely mechanical”.
We now introduce more of Turing’s theory of Turing Machines before we give his proof of undecidability.
We define a language to be a set of strings, a string being a finite sequence of alphabet symbols (i.e. w ∈ Σ*, for all strings w). We say that a TM M accepts input string w if a sequence of configurations C_1, …, C_k exists where
1 C_1 is the start configuration of M on input w.
2 Each C_i yields C_{i+1} via the transition function δ of M.
3 C_k is an accepting configuration.
The set of strings that M accepts is called the language of M.
Definition of the language of a TM: The language of a TM M, notation L(M) := {w | w is a string that M accepts}.
Let w ∈ Σ*. We now define a notion that covers the ability of a TM to end in the accept state when started with any string of a certain language.
Definition of Turing-recognizable: A language L is recognized by a TM M := for all strings w:
1 with input w, M stops in q_accept if w ∈ L, and
2 with input w, M stops in q_reject or does not stop (loops) if w ∉ L.
L is Turing-recognizable if there exists a TM M that recognizes L.
If language L is recognized by a TM M we say that M is an acceptor for
L. We distinguish between recognizing and deciding capabilities.
Definition of Turing-decidable (or decidable): A language L is decided by a TM M := for all strings w:
1 with input w, M halts in q_accept if w ∈ L, and
2 with input w, M halts in q_reject if w ∉ L.
L is Turing-decidable if there exists a (deterministic) TM M that decides L.
If a language L is decided by a TM M we say that M is a decider for L.
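The definitions of configurations, acceptance and deciders can be made concrete with a small simulator. The following Python code is only a minimal sketch under stated assumptions: the names `yields`, `run` and `accepts`, the example machine EVEN_ONES (a hypothetical machine deciding the strings over {0, 1} with an even number of ‘1’s) and the use of ‘_’ for ⊔ are ours, and we assume that a move to the left on the leftmost cell leaves the head where it is, which the definition above does not specify.

```python
from typing import Dict, List, Tuple

BLANK = "_"  # stands for the blank symbol ⊔ (not part of the input alphabet)

State = str
Config = Tuple[State, str, int]  # (state, used part of the tape, head position)
Delta = Dict[Tuple[State, str], Tuple[State, str, str]]

# Hypothetical example machine: q0 = "even number of 1s read so far", q1 = "odd".
EVEN_ONES: Delta = {
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("q1", "1", "R"),
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "1"): ("q0", "1", "R"),
    ("q0", BLANK): ("q_accept", BLANK, "R"),
    ("q1", BLANK): ("q_reject", BLANK, "R"),
}


def yields(delta: Delta, c: Config) -> Config:
    """One step C_i -> C_{i+1}: write a symbol, change state, move the head."""
    state, tape, head = c
    if head >= len(tape):  # cells right of the stored part are blank
        tape += BLANK * (head - len(tape) + 1)
    new_state, write, move = delta[(state, tape[head])]
    tape = tape[:head] + write + tape[head + 1:]
    # Right-infinite tape: a left move on the leftmost cell keeps the head there.
    head = head + 1 if move == "R" else max(head - 1, 0)
    return (new_state, tape, head)


def run(delta: Delta, w: str, max_steps: int = 10_000) -> List[Config]:
    """Return C_1, ..., C_k, starting from the start configuration on w."""
    configs: List[Config] = [("q0", w, 0)]
    while configs[-1][0] not in ("q_accept", "q_reject"):
        if len(configs) > max_steps:
            raise RuntimeError("no halting configuration reached")
        configs.append(yields(delta, configs[-1]))
    return configs


def accepts(delta: Delta, w: str) -> bool:
    return run(delta, w)[-1][0] == "q_accept"


print([w for w in ["", "1", "11", "101", "1011"] if accepts(EVEN_ONES, w)])
# ['', '11', '101']
```

Because EVEN_ONES reaches q_accept or q_reject on every input, it is a decider for its language. For a machine that runs forever on some input, `run` would only give up after `max_steps`, which is not an answer; such a machine can at best be a recognizer.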
There are several variants of Turing Machines, such as double-sided infinite Turing Machines, multi-tape Turing Machines, nondeterministic Turing Machines and certain types of so-called enumerators. Most variants are equivalent in the sense that they can recognize the same set of languages (though not necessarily equally efficiently).
Example: We now give an example of a Turing Machine solving a mathematical problem by first defining it as a language problem. The problem (idea from [56]) is to design a Turing Machine that computes the function
f(x, y) = x + y if x ≥ y
f(x, y) = 0 if x < y
For simplicity, we assume x and y to be positive integers. First we have to choose a convention for representing positive integers, and decide what the initial situation of the tape is. We choose a unary notation in which any positive integer x is represented by w(x) ∈ {1}+, such that |w(x)| = x. We assume that w(x) and w(y) are on the tape in unary notation, separated by a single ‘0’, and with the read-write head on the leftmost symbol of w(x). We first describe how the sum of x and y can be calculated, then how the comparison x ≥ y can be made, and finally how to combine those two machines into a Turing Machine that computes the desired function.
Calculating the sum
To add the two numbers a and b, we only have to remove the separating 0, so addition amounts to the concatenation of two strings. The following Turing Machine, called Adder, adds a and b and is relatively simple to construct:
Adder = (Q, Σ, Γ, δ, q_0, q_A, q_R), with
Q = {q_0, q_1, …, q_4}
Σ = {0, 1}
Γ = {0, 1, ⊔}
q_0 = {q_0}
q_A = {q_4}
q_R = {}
δ(q_0, 1) = (q_0, 1, R)
δ(q_0, 0) = (q_1, 1, R)
δ(q_1, 1) = (q_1, 1, R)
δ(q_1, ⊔) = (q_2, ⊔, L)
δ(q_2, 1) = (q_3, 0, L)
δ(q_3, 1) = (q_3, 1, L)
δ(q_3, ⊔) = (q_4, ⊔, R)
Note that we remove the ‘0’ by temporarily creating an extra ‘1’, a fact that is remembered by putting the machine into state q_1. The transition δ(q_2, 1) = (q_3, 0, L) is needed to remove this ‘1’ at the end of the computation. Finally, we move the read-write head back to the leftmost ‘1’. This is not strictly necessary in this example, because the machine is designed such that it will terminate right after any addition, but it is not harmful, and it is normally a good habit to let any action terminate in a state from which it is easy to take further transitions.
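The Adder can be checked mechanically. The Python sketch below copies the transitions listed above; the helper names `run` and `unary` and the use of ‘_’ for ⊔ are hypothetical. One assumption is needed: the final transition δ(q_3, ⊔) = (q_4, ⊔, R) expects a blank to the left of the input, so this simulator uses a tape that is infinite in both directions (one of the variants mentioned above) rather than the right-infinite tape of the definition. A missing transition is treated as halting.

```python
from collections import defaultdict

BLANK = "_"  # stands for the blank symbol ⊔

# Transitions of the Adder, copied from the text: (state, read) -> (state, write, move)
ADDER = {
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "0"): ("q1", "1", "R"),      # overwrite the separator with a '1'
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", BLANK): ("q2", BLANK, "L"),  # end of w(y) reached
    ("q2", "1"): ("q3", "0", "L"),      # remove the extra '1' again
    ("q3", "1"): ("q3", "1", "L"),
    ("q3", BLANK): ("q4", BLANK, "R"),  # head back on the leftmost '1'
}


def run(delta, tape_str, start="q0", final=("q4",), max_steps=10_000):
    """Simulate on a tape that is infinite in both directions (assumption)."""
    tape = defaultdict(lambda: BLANK, enumerate(tape_str))
    state, head, steps = start, 0, 0
    while state not in final and (state, tape[head]) in delta:
        if steps == max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        state, write, move = delta[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    used = sorted(i for i, s in tape.items() if s != BLANK)
    contents = "".join(tape[i] for i in range(used[0], used[-1] + 1)) if used else ""
    return state, contents, head


def unary(n):
    return "1" * n  # w(n), with |w(n)| = n


print(run(ADDER, unary(2) + "0" + unary(3)))  # ('q4', '111110', 0)
```

The tape then holds w(5) followed by the single ‘0’ that was written over the removed ‘1’, and the head is back on the leftmost ‘1’.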
Comparison
To compare two numbers a and b, we again assume they are written in the notation that we used before and separated by a single ‘0’. We will construct a Turing Machine that halts in an accepting state if a ≥ b and in a rejecting state if a < b. To this end we match each ‘1’ on the left of the dividing ‘0’ with a ‘1’ on the right. We can do this by starting at the leftmost ‘1’ (of the number a) and alternately checking off the leftmost unchecked symbols of the numbers a and b by replacing them with the symbols ‘x’ and ‘y’ respectively. The matching stops when one of the two sequences of ‘1’s is completely checked off. If a < b then the right sequence will still contain ‘1’s, and if a ≥ b either the left sequence contains ‘1’s or neither sequence contains ‘1’s. In the first case, we still find a ‘1’ on the right when all ‘1’s on the left have been replaced; the machine looks for it in state q_5 and then enters the rejecting state q_7. In the second case, if a ≥ b, when we attempt to match another ‘1’, we encounter a blank at the right of the working space, which can be used as a signal to enter the accepting state q_6. If we work this out in detail, we get the following Turing Machine called Comparer := (Q, Σ, Γ, δ, q_0, q_A, q_R), with:
Q = {q_0, q_1, q_2, q_3, q_4, q_5, q_6, q_7}
Σ = {0, 1}
Γ = {0, 1, x, y, ⊔}
q_0 = {q_0}
q_A = {q_6}
q_R = {q_7}
The transitions of δ can be grouped in several parts.
δ(q_0, 1) = (q_1, x, R)
δ(q_1, 1) = (q_1, 1, R)
δ(q_1, 0) = (q_2, 0, R)
δ(q_2, y) = (q_2, y, R)
δ(q_2, 1) = (q_3, y, L)
This set replaces the leftmost ‘1’ of a with ‘x’, then causes the read-write head to travel right to the first ‘1’ of b and replace it with the symbol ‘y’. When the dividing ‘0’ is passed, the machine enters state q_2, indicating that it is now dealing with the number b. When the symbol ‘y’ has been written, the machine enters state q_3, indicating that one ‘1’ of a has been successfully paired with a ‘1’ of b. The next group of transitions reverses the direction, repositions the read-write head over the leftmost remaining ‘1’ of a, and returns control to the initial state:
δ(q_3, y) = (q_3, y, L)
δ(q_3, 0) = (q_4, 0, L)
δ(q_4, 1) = (q_4, 1, L)
δ(q_4, x) = (q_0, x, R)
The rewriting continues this way when the input is a string 1^a 0 1^b, stopping only when on one side no more ‘1’s can be replaced. In that case either the left side does not contain any more ‘1’s (a ≤ b), or the right side has run out of ‘1’s (a > b). In case the left side does not contain any more ‘1’s, the transition δ(q_4, x) = (q_0, x, R) will leave the read-write head on a ‘0’ instead of a ‘1’.
δ(q_0, 0) = (q_5, x, L) (a ≤ b)
δ(q_2, ⊔) = (q_6, ⊔, L) (a > b)
In the first case we still have to check whether the right side has any ‘1’s left, to determine whether a = b. This is done in the state q_5.
δ(q_5, x) = (q_5, x, R)
δ(q_5, 0) = (q_5, 0, R)
δ(q_5, y) = (q_5, y, R)
δ(q_5, 1) = (q_7, y, R) (a < b)
δ(q_5, ⊔) = (q_6, ⊔, L) (a = b)
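Again the table can be checked by a small simulation. In this Python sketch the function name `compare` and the symbol ‘_’ for ⊔ are hypothetical, and, as for the Adder, we assume a tape that is infinite in both directions. The transitions are copied from the text, with q_6 taken as the accepting and q_7 as the rejecting state, as the transitions into them indicate.

```python
from collections import defaultdict

BLANK = "_"  # stands for the blank symbol ⊔

# Transitions of the Comparer, copied from the text
COMPARER = {
    ("q0", "1"): ("q1", "x", "R"),      # check off the leftmost '1' of a
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", "0"): ("q2", "0", "R"),      # now dealing with b
    ("q2", "y"): ("q2", "y", "R"),
    ("q2", "1"): ("q3", "y", "L"),      # pair it with the first '1' of b
    ("q3", "y"): ("q3", "y", "L"),
    ("q3", "0"): ("q4", "0", "L"),
    ("q4", "1"): ("q4", "1", "L"),
    ("q4", "x"): ("q0", "x", "R"),      # back to the leftmost remaining '1' of a
    ("q0", "0"): ("q5", "x", "L"),      # a has no '1's left: a <= b
    ("q2", BLANK): ("q6", BLANK, "L"),  # b has no '1's left: a > b
    ("q5", "x"): ("q5", "x", "R"),
    ("q5", "0"): ("q5", "0", "R"),
    ("q5", "y"): ("q5", "y", "R"),
    ("q5", "1"): ("q7", "y", "R"),      # a < b
    ("q5", BLANK): ("q6", BLANK, "L"),  # a = b
}
ACCEPT, REJECT = "q6", "q7"


def compare(a, b, max_steps=10_000):
    """True iff Comparer halts in the accepting state on the tape 1^a 0 1^b."""
    tape = defaultdict(lambda: BLANK, enumerate("1" * a + "0" + "1" * b))
    state, head, steps = "q0", 0, 0
    while state not in (ACCEPT, REJECT):
        if steps == max_steps:
            raise RuntimeError("no final state reached within max_steps")
        state, write, move = COMPARER[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1  # two-way infinite tape (assumption)
        steps += 1
    return state == ACCEPT


print([(a, b) for a in range(1, 4) for b in range(1, 4) if compare(a, b)])
# [(1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3)]
```

The case a > b reaches q_6 through δ(q_2, ⊔), the case a = b through δ(q_5, ⊔), and the case a < b reaches q_7 through δ(q_5, 1).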
Combining Turing Machines for complicated tasks
We now have to put together the Turing Machines Adder and Comparer to obtain the desired Turing Machine that computes the given function. We can do this by starting with the input a and b in the previously described notation and starting position, and using Comparer to determine whether or not a ≥ b. We index all states of Comparer with C; i.e. the transitions into its final states become δ(q_{C,2}, ⊔) = (q_{C,6}, ⊔, L) and δ(q_{C,5}, ⊔) = (q_{C,6}, ⊔, L) when a ≥ b, and δ(q_{C,5}, 1) = (q_{C,7}, y, R) when a < b. In the first case (a ≥ b), the Comparer should send a ‘start signal’ to the Adder, to give a + b as output. In the second case (a < b), the Comparer should send a ‘start signal’ to a Turing Machine (called Eraser) that simply replaces all ‘1’s by ‘0’s to output the value 0 in the desired format.
We show how we can let the Comparer send these ‘start signals’. We first index all states of the Adder by A and of the Eraser by E. Now in case of a ≥ b, Comparer ends in state q_{C,6}, and we can add a transition δ(q_{C,6}, ∗) = δ(q_{A,0}, ∗). The star ‘∗’ stands for any possible symbol, so actually this transition is a shorthand notation for a set of transitions. Similarly, we can let δ(q_{C,7}, ∗) = δ(q_{E,0}, ∗) bring the Eraser into its initial state. The Adder or the Eraser, respectively, will then give the desired output, because their behavior on the input does not change as a result of the renaming of the states (to be exact: the state in which the Comparer terminates is suitable as an initial position for Adder or Eraser). The only thing we have not taken care of is that when the Comparer enters a final state, it does not have the initial representation of the numbers a and b on the tape, but has replaced the ‘1’s by ‘x’s and ‘y’s. We can easily fix this (it is just some extra work; you can try it as an exercise if you want) by letting Comparer, as the last action before entering a final state, replace all ‘x’s and ‘y’s by ‘1’s. The result is a Turing Machine that combines Comparer, Adder and Eraser to compute the function f. Similarly to this example, we can for example multiply two numbers a and b, and we can also translate macro-instructions like ‘if p then q_j else q_k’ (meaning that when we read ‘p’ on the tape, the Turing Machine goes into state q_j and otherwise into state q_k), and even combine them into complicated subprograms that can be invoked repeatedly whenever needed.
(End of Example)
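The control structure of the combined machine can be summarized in a few lines. The Python sketch below is only an abstraction: the hypothetical functions `comparer`, `adder` and `eraser` stand in for the three machines and produce the final tape contents described in the text (for `adder`, the tape left behind by the Adder), but they are ordinary string functions, not Turing Machines, so the Comparer’s restoring of ‘x’s and ‘y’s is not modelled.

```python
def comparer(tape: str) -> bool:
    """Stand-in for Comparer on 1^a 0 1^b: 'accept' iff a >= b."""
    a, b = tape.split("0")
    return len(a) >= len(b)


def adder(tape: str) -> str:
    """Stand-in for Adder: the separator becomes '1', the last '1' becomes '0'."""
    return tape.replace("0", "1")[:-1] + "0"


def eraser(tape: str) -> str:
    """Stand-in for Eraser: every '1' is replaced by '0'."""
    return tape.replace("1", "0")


def f(x: int, y: int) -> int:
    tape = "1" * x + "0" + "1" * y  # input convention of the example
    # the 'start signal': accept -> run Adder, reject -> run Eraser
    result = adder(tape) if comparer(tape) else eraser(tape)
    return result.count("1")  # read the unary output back as a number


print(f(4, 2), f(3, 3), f(2, 4))  # 6 6 0
```

Reading the result back by counting ‘1’s matches the unary convention w(n) chosen at the start of the example.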
The Entscheidungsproblem
After introducing the notion of a TM in [89], Turing answered Hilbert’s decision problem for mathematical logic (in German called ‘Entscheidungsproblem’) in the negative. The Entscheidungsproblem asks whether there exists a definite method or algorithm which (at least in principle) can be applied to any given mathematical proposition to decide whether that proposition is provable. We now identify the notion of an algorithm with the notion of a Turing Machine, and a set of provable propositions with a language that has to be decided by some TM. If we look at the definition of decidability in section 6.1, we have that for all formulas ϕ an algorithm, i.e. a TM, exists that decides whether ϕ is true or not. If we code ϕ by means of a string, and this is always possible (see the previous example for a demonstration), we can reformulate the problem as: is there, for every such language L, a TM M that decides, for all strings w, whether w ∈ L? We now show that this is not possible for all problems (i.e. languages) by giving a specific problem, the Halting problem, that is not decidable.
The Halting problem is the problem of testing whether a TM accepts a given input string. We define the problem by stating it as a language problem, and asking whether that language is decidable.
Definition of the Halting problem:
H := {⟨M, w⟩ | M is a TM, w is a string and M accepts w}. Is H decidable? (This is a test case for the general question: is there for each language a TM that decides, for all strings w, whether they belong to the language; that is (using Turing’s thesis, see section 9.3), is there for each problem an algorithm that can decide it?)
Theorem: H is recognizable.
Proof (by Turing): The following TM U, also called the Universal Turing Machine because it is capable of simulating any other Turing Machine, recognizes H. We define U informally, because a detailed definition of the septuple such a TM consists of (see the definition of a TM) is a lot of work.
Description of the Universal Turing Machine: U =
“On input ⟨M, w⟩, where M is a TM and w is a string:
1 simulate M on input w;
2 if M ever enters its accept state, accept; if M ever enters its reject state, reject.”
Note that this TM loops on input ⟨M, w⟩ if M loops on w, which is why this machine does not decide H. If the algorithm had some way to determine that M was not halting on w, it could reject. Hence H is sometimes called the Halting problem. As Turing demonstrated, an algorithm has no way to make this determination.
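The difference between recognizing and deciding shows up directly in a simulation. In the following Python sketch, a machine M is given by its transition table rather than by a string encoding ⟨M⟩ (an assumption made to keep the code short); the function name `universal`, the optional `fuel` parameter and the two example machines are hypothetical. With fuel=None the loop behaves like U and never returns when M loops on w; a finite fuel makes it stop, but the answer None (“don’t know yet”) is not a decision.

```python
from collections import defaultdict

BLANK = "_"  # stands for the blank symbol ⊔


def universal(delta, w, fuel=None):
    """Simulate M (transition table delta, start state q0) on input w."""
    tape = defaultdict(lambda: BLANK, enumerate(w))
    state, head, steps = "q0", 0, 0
    while state not in ("q_accept", "q_reject"):
        if fuel is not None and steps >= fuel:
            return None  # gave up: M has not halted (yet)
        state, write, move = delta[(state, tape[head])]
        tape[head] = write
        head = head + 1 if move == "R" else max(head - 1, 0)  # right-infinite tape
        steps += 1
    return "accept" if state == "q_accept" else "reject"


# Hypothetical machines: one inspects the first symbol, one runs right forever.
FIRST_IS_1 = {
    ("q0", "1"): ("q_accept", "1", "R"),
    ("q0", "0"): ("q_reject", "0", "R"),
    ("q0", BLANK): ("q_reject", BLANK, "R"),
}
RIGHT_FOREVER = {("q0", s): ("q0", s, "R") for s in ("0", "1", BLANK)}

print(universal(FIRST_IS_1, "10"), universal(FIRST_IS_1, "01"))  # accept reject
print(universal(RIGHT_FOREVER, "01", fuel=1000))  # None
```

No finite choice of `fuel` turns this into a decider: for any bound there are machines that halt only after more steps.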
Theorem: H is undecidable (see also [82, page 165]).
Proof (by Turing): We assume H is decidable and obtain a contradiction. Suppose D is a decider for H, defined by
D(⟨M, w⟩) := “
• accept if M accepts w
• reject if M does not accept w”
Now we construct a new TM O with D as a subroutine. This new TM calls D to determine what M does when the input to M is its own description ⟨M⟩. Once O has determined this information, it does the opposite. That is, it rejects if M accepts and accepts if M does not accept. The following is a description of O: O = “On input ⟨M⟩, where M is a TM:
1 run D on input ⟨M, ⟨M⟩⟩;
2 output the opposite of what D outputs; that is, if D accepts, reject, and if D rejects, accept.”
We summarize the behavior of O as follows:
O(⟨M⟩) = “
• accept if M does not accept ⟨M⟩
• reject if M accepts ⟨M⟩”
Now we obtain the contradiction by running O with its own description ⟨O⟩ as input. In that case we get:
O(⟨O⟩) = “
• accept if O does not accept ⟨O⟩
• reject if O does accept ⟨O⟩”
Whatever O does on ⟨O⟩, it is forced to do the opposite, which is a contradiction. Since O was built from D by a simple construction, the decider D cannot exist, and hence neither can O: H is undecidable.
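The structure of the diagonal argument can be imitated in Python, under the strong simplifying assumption that a ‘machine’ is a Python function returning True (accept) or False (reject) on a machine description, and that the description of a machine is just the function object itself. The names `make_O`, `refute` and the candidate deciders are hypothetical. Whatever total candidate D is plugged in, D gives the wrong answer on ⟨O, ⟨O⟩⟩.

```python
from typing import Callable

Machine = Callable[[object], bool]          # input description -> accept?
Decider = Callable[[Machine, object], bool]  # claimed decider for H


def make_O(D: Decider) -> Machine:
    """Build O from D: on input <M>, run D on <M, <M>> and do the opposite."""
    def O(M: Machine) -> bool:
        return not D(M, M)
    return O


def refute(D: Decider) -> bool:
    """True if D is wrong about whether O accepts its own description <O>."""
    O = make_O(D)
    claimed = D(O, O)   # D's answer to "does O accept <O>?"
    actual = O(O)       # what O really does on <O>
    return claimed != actual


candidates = [
    lambda M, x: True,                                        # always 'accepts'
    lambda M, x: False,                                       # always 'rejects'
    lambda M, x: getattr(M, "__name__", "").startswith("O"),  # a 'clever' guess
]
print(all(refute(D) for D in candidates))  # True
```

A candidate that tries to answer by actually running M on x would, for M = O, call itself again without end, so it is not a decider either.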
Turing wrote in his last publication about the interpretation of unsolvable problems, such as the Halting problem for Turing machines:
These . . . may be regarded as going some way towards a demonstration, within mathematics itself, of the inadequacy of ‘reason’ unsupported by common sense.
 Alan Turing
In this section I have made extensive use of [38] [92] for information on the life and work of Turing and [89] [82] [19] for the theory of TMs and the Halting problem. Another valuable source of information on Turing’s life and work is the website http://www.turing.org.uk/
9.2 Church and the Lambda Calculus
Alonzo Church (1903-1995) was an American mathematician whose work is of major importance in mathematical logic, recursion theory and theoretical computer science. One of his most important contributions to logic is his invention in the 1930s of the lambda calculus. He is also remembered for Church’s thesis, published in 1936 in [14, pages 345-363], stating that the lambda calculus can be used to embody a correct formalization of the notion of computability (see section 9.3). The notion of lambda definability is conceptually the basis for the discipline of functional programming, and the lambda calculus is also the basis for type theory. Church also founded the Journal of Symbolic Logic in 1936. He had 31 doctoral students, including famous mathematicians such as Turing, Kleene, Kemeny and Smullyan. We now introduce the lambda calculus (Church’s formalization of the notion of effective calculability) in a modern setting, using [9, chapter 4].
Application and abstraction
First we introduce the basic concepts of the λ-calculus. A formalization follows thereafter. The lambda calculus has only two basic operations, abstraction and application.
• Abstraction is for constructing functions: For an expression E we introduce λx.E to denote the abstraction of E over x, i.e. ‘the function of x which computes E’.
Example¹: λx . x + 1, λn . n · n, etc.
We will later see how to define a recursive function; this is not so easy since we do not have function names.
• (Function) application: The expression F A denotes that F is considered as a function (an algorithm) applied to input A. The original lambda calculus theory is type-free, so we also consider F F, that is, F applied to itself.
Example: (λx . x + 1) 4, (λn . n · n) 7, etc.
Footnote 1: Note that in some examples we have simplified the notation for the clarity of the example, since in pure lambda calculus we do not have arithmetic symbols like + and ·, but we can encode these operations in the pure lambda calculus, as we will see later.
These two notions become very powerful if we introduce the rule of beta-reduction, which allows us to apply an abstraction to an argument and, for example, rewrite (λx . x + 1) 4 to 4 + 1. Similarly (λn . n · n) 7 can be reduced to 7 · 7. Arbitrary nesting is also allowed: ((λn . λx . (x + 1) · n) 7) 4 can be reduced to (λx . (x + 1) · 7) 4 and then to (4 + 1) · 7.
Similar to ordinary mathematics, the names of the variables are irrelevant to the rules that can be applied, which allows a transformation of the names (also known as dummy transformation). This rule in lambda calculus is called alpha-conversion. For example, alpha-conversion allows us to rewrite λn . n · n to λx . x · x, since they are essentially the same function.
Note that we also want to use functions as variables and arguments: ((λf . (λn . λx . f x · n) 7) (λy . y + 1)) 4 should reduce to the earlier expression.
But above we only have functions of one argument; we now introduce functions with more arguments, while avoiding new notations. We can solve this problem by using iteration of applications, often called currying after the American mathematician H.B. Curry who made it popular.
Example: f(x, y) = 3 · x + y can be written as F_1 ≡ λx . (λy . 3 · x + y). Then f(4, 5) is written (F_1 4) 5, that is ((λx . (λy . 3 · x + y)) 4) 5, which can be reduced (by using beta-reduction) to 3 · 4 + 5.
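Python’s `lambda` gives a quick, if loose, way to try these reductions, because applying a Python lambda substitutes the argument for the bound variable much like beta-reduction. This sketch assumes Python’s built-in integers and arithmetic (in the spirit of the simplified notation of footnote 1), and Python evaluates straight to values instead of showing the intermediate reducts; the names are hypothetical and the lambdas are bound to names only to mirror the text.

```python
succ = lambda x: x + 1                             # λx . x + 1
square = lambda n: n * n                           # λn . n · n
alpha_renamed = lambda x: x * x                    # λx . x · x (alpha-conversion)

nested = (lambda n: lambda x: (x + 1) * n)(7)(4)   # ((λn . λx . (x + 1) · n) 7) 4
higher = (lambda f: (lambda n: lambda x: f(x) * n)(7))(lambda y: y + 1)(4)

F1 = lambda x: lambda y: 3 * x + y                 # F_1 ≡ λx . (λy . 3 · x + y)

print(succ(4), square(7), alpha_renamed(7), nested, higher, F1(4)(5))
# 5 49 49 35 35 17
```

Both `nested` and `higher` end at (4 + 1) · 7 = 35, matching the claim that the expression with the function argument f reduces to the earlier one, and `F1(4)` is itself a function waiting for the second argument.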
The above explanation and examples give an idea of what lambda calculus is. We will now work towards a more formal definition of lambda calculus. The system of lambda calculus is based on the structure of Abstract Reduction Systems (ARS). The terms of the ARS then coincide with the inductively defined lambda terms, and the reduction relation will be β-reduction. So before we formally define the lambda calculus, we introduce the most relevant theory of abstract reduction systems.
Abstract Reduction Systems
Definition of Abstract Reduction System (ARS): An abstract reduction system 𝒜 := a structure ⟨A, →⟩ consisting of a set A and a binary relation → on A (i.e. → ⊆ A × A).
The relation is also called the reduction or rewrite relation. If for a, b ∈ A we have a → b, we call b a one-step reduct of a.
The transitive and reflexive closure of → is written as ↠ (or alternatively →*). This means ↠ is the smallest relation on A satisfying, for all a, b, c ∈ A,
(closure of →) if a → b then a ↠ b,
(reflexive) a ↠ a, and
(transitive) if a ↠ b and b ↠ c then a ↠ c.
Thus a ↠ b if and only if there exists a finite sequence of reduction steps a ≡ a_0 → a_1 → … → a_n ≡ b. This sequence may be empty, in which case a ≡ b. Here ≡ denotes (the syntactic) identity of elements of A, i.e. a ≡ b if and only if a and b are the same element of A.
Definition of Normal Form: A term a ∈ A of an ARS ⟨A, →⟩ is a normal form := there is no b ∈ A such that a → b. Furthermore, b ∈ A has a normal form if and only if b ↠ a for some normal form a ∈ A.
Definition of Weakly Normalizing: The reduction relation → of an ARS ⟨A, →⟩ is weakly normalizing (or weakly terminating) := every a ∈ A has a normal form. In this case we also say that 𝒜 is weakly normalizing.
Definition of Strongly Normalizing: The reduction relation → of an ARS ⟨A, →⟩ is strongly normalizing (also called terminating, well-founded or noetherian) := there exists no infinite reduction a_0 → a_1 → a_2 → …, with a_n ∈ A for all n ∈ ℕ.
Lemma: If an ARS is strongly normalizing, it is weakly normalizing.
Proof: We prove this by proving the contraposition: if ⟨A, →⟩ is not weakly normalizing then ⟨A, →⟩ is not strongly normalizing. Suppose ⟨A, →⟩ is not weakly normalizing. Then there is an a_0 ∈ A without a normal form. Since a_0 has no normal form, a_0 is certainly not a normal form itself, so there is an a_1 ∈ A such that a_0 → a_1. Now a_1 has no normal form either, since any normal form of a_1 would also be a normal form of a_0 (because a_0 → a_1); in particular, a_1 is not a normal form. Thus we get an element a_2 ∈ A such that a_1 → a_2. Repeating this process yields an infinite reduction a_0 → a_1 → a_2 → ….
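For a finite ARS these notions can be computed. The Python sketch below is an illustration under the assumption that A is finite and → is given as a dictionary from each element to the set of its one-step reducts (elements without an entry are normal forms); all names are hypothetical. In a finite ARS an infinite reduction must visit some element twice, so strong normalization amounts to the absence of cycles.

```python
from typing import Dict, Hashable, Set

ARS = Dict[Hashable, Set[Hashable]]  # a -> set of one-step reducts of a


def elements(R: ARS) -> Set[Hashable]:
    return set(R) | {b for bs in R.values() for b in bs}


def reducts(R: ARS, a: Hashable) -> Set[Hashable]:
    """All b with a ↠ b (reflexive-transitive closure of →), by graph search."""
    seen, todo = {a}, [a]
    while todo:
        for b in R.get(todo.pop(), set()):
            if b not in seen:
                seen.add(b)
                todo.append(b)
    return seen


def is_normal_form(R: ARS, a: Hashable) -> bool:
    return not R.get(a)  # there is no b with a → b


def normal_forms_of(R: ARS, a: Hashable) -> Set[Hashable]:
    return {b for b in reducts(R, a) if is_normal_form(R, b)}


def weakly_normalizing(R: ARS) -> bool:
    return all(normal_forms_of(R, a) for a in elements(R))


def strongly_normalizing(R: ARS) -> bool:
    # finite A: an infinite reduction exists iff a → b ↠ a for some a, b (a cycle)
    return not any(a in reducts(R, b) for a in R for b in R[a])


CHAIN = {"a": {"b"}, "b": {"c"}}            # a → b → c
LOOP = {"a": {"a"}}                         # a → a
MIXED = {"a": {"b", "c"}, "b": {"b", "d"}}  # b → b loops, but d is reachable

for name, R in [("CHAIN", CHAIN), ("LOOP", LOOP), ("MIXED", MIXED)]:
    print(name, weakly_normalizing(R), strongly_normalizing(R))
# CHAIN True True
# LOOP False False
# MIXED True False
```

MIXED is weakly but not strongly normalizing, so the lemma cannot be reversed.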
Definition of Unique Normal Form: The reduction relation → of an ARS ⟨A, →⟩ has the unique normal form property := for all a, b, c ∈ A such that a ↠ b, a ↠ c, and b, c are normal forms, we have b ≡ c.
Lemma: An ARS ⟨A, →⟩ with the unique normal form property is not always weakly normalizing.
Proof: For instance, the abstract reduction system with only one element a ∈ A and rewrite rule a → a has no normal forms, so it trivially has the unique normal form property and is not weakly normalizing.
Definition of Local Confluence: A reduction relation $\to$ of an ARS
$\langle A, \to \rangle$ is called locally confluent or weakly confluent (also weakly Church-Rosser)
:= for all $a, b, c \in A$ with $a \to b$ and $a \to c$ there exists a $d \in A$ such
that $b \twoheadrightarrow d$ and $c \twoheadrightarrow d$.
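Definition of Confluence: A reduction relation $\to$ of an ARS $\langle A, \to \rangle$
is called confluent (or has the Church-Rosser property, or is Church-Rosser)
:= for all $a, b, c \in A$ with $a \twoheadrightarrow b$ and $a \twoheadrightarrow c$ there exists a $d \in A$ such that
$b \twoheadrightarrow d$ and $c \twoheadrightarrow d$.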
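Lemma If a reduction relation has the unique normal form property and is
weakly normalizing, then it is confluent.
Proof: Suppose we have $a \twoheadrightarrow b$ and $a \twoheadrightarrow c$. Since $\to$ is weakly normalizing,
there are normal forms $b'$ and $c'$ such that $b \twoheadrightarrow b'$ and $c \twoheadrightarrow c'$. By transitivity
we also have $a \twoheadrightarrow b'$ and $a \twoheadrightarrow c'$, and thus by the unique normal form property
$b' \equiv c'$. Hence $b \twoheadrightarrow b'$ and $c \twoheadrightarrow b'$, so $d := b'$ is the required common reduct.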
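Lemma If $\to$ is confluent then $\to$ has the unique normal form property.
Proof: Suppose $a \twoheadrightarrow b$, $a \twoheadrightarrow c$, and $b, c$ are normal forms. By confluence,
there exists a $d$ such that $b \twoheadrightarrow d$ and $c \twoheadrightarrow d$. Since $b$ and $c$ are normal forms,
we must have $b \equiv d$ and $c \equiv d$, thus $b \equiv c$.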
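The two lemmas above can be tried out on small finite systems. In this hypothetical Python sketch (our own encoding: a dict from each element to the set of its one-step reducts), `has_unique_normal_forms` and `is_confluent` are brute-force checks of the definitions, practical only for finite systems. The `loops` example has no normal forms at all, so it has the unique normal form property without being confluent; this is why the lemma also needs weak normalization.

```python
def reducts(a, step):
    """All b with a ->> b (reflexive-transitive closure of ->)."""
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for y in step.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def normal_forms(elems, step):
    return {a for a in elems if not step.get(a)}

def has_unique_normal_forms(elems, step):
    """UN: no element reduces to two different normal forms."""
    nfs = normal_forms(elems, step)
    return all(len(reducts(a, step) & nfs) <= 1 for a in elems)

def is_confluent(elems, step):
    """CR: whenever a ->> b and a ->> c, some d has b ->> d and c ->> d."""
    for a in elems:
        rs = reducts(a, step)
        for b in rs:
            for c in rs:
                if not (reducts(b, step) & reducts(c, step)):
                    return False
    return True

fork = {"a": {"b", "c"}}                            # two distinct normal forms b, c
loops = {"a": {"b", "c"}, "b": {"b"}, "c": {"c"}}   # UN holds vacuously, not WN
print(has_unique_normal_forms({"a", "b", "c"}, fork))   # False
print(has_unique_normal_forms({"a", "b", "c"}, loops))  # True
print(is_confluent({"a", "b", "c"}, loops))             # False
```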
Syntax
Now that we have seen the basic principle of the lambda calculus, we give a
more formal definition. We formally define the syntax of the lambda calculus
by giving its grammar.
Definition of the Syntax of Lambda Terms:
Lambda Term $E := C \mid v \mid (E_1\, E_2) \mid (\lambda v .\, E)$, with
• $C$ ranges over a set of constants
(we will use the constant names $a, b, c, \dots$ for elements of $C$)
• $v$ ranges over a (denumerable) set of variables (using $v, w, x, \dots$)
• $(E_1\, E_2)$ denotes a combination involving the application of one expression
($E_1$) to another ($E_2$). The subexpression $E_1$ is referred to as the
operator and $E_2$ is referred to as the operand
• $(\lambda v .\, E)$ denotes an abstraction. Informally it denotes a function of $v$
which produces result $E$. The subexpression $E$ is referred to as the body
of the abstraction and $v$ is called the bound variable of the abstraction
We also call lambda terms simply ‘terms’ or ‘expressions’.
Notational conventions: to achieve a minimal notation, we drop parentheses
whenever possible, and assume:
• Association to the left for iterated application:
$F\, E_1\, E_2 \dots E_n$ denotes $(\dots((F\, E_1)\, E_2) \dots E_n)$,
• Association to the right for iterated abstraction:
$\lambda x_1 .\, \lambda x_2 . \dots \lambda x_n .\, E$, or shortly $\lambda x_1 x_2 \dots x_n .\, E$,
denotes $\lambda x_1 .\, (\lambda x_2 .\, (\dots (\lambda x_n .\, E) \dots))$.
Example: We can write the expression $F_1$ of the previous example as
$\lambda x\, y .\, 3x + y$, and $\lambda v .\, E_1\, E_2$ means $(\lambda v .\, (E_1\, E_2))$.
Free/Bound Variables and α-conversion
We distinguish between free and bound occurrences of variables in an
expression. An occurrence of $v$ in $E$ is said to be bound if it occurs within a
subexpression of $E$ of the form $\lambda v .\, E_1$, and the occurrence is said to be
free otherwise.
Example: $n$ occurs free in $\lambda x .\, (x + 1)\, n$, whereas $x$ occurs bound in this
expression. Both $n$ and $x$ occur bound in $\lambda n .\, (\lambda x .\, x + 1)\, n$. Further, $x$
occurs both bound and free in $(\lambda x .\, x + 1)\, x$ (the second occurrence of ‘$x$’
in this expression is bound, the third occurrence is free).
Definition of free variables: The free variables of a term $E$, denoted by
$FV(E)$, form a set of variables defined recursively by:
• $FV(C) = \emptyset$,
• $FV(v) = \{v\}$,
• $FV(E_1\, E_2) = FV(E_1) \cup FV(E_2)$,
• $FV(\lambda v .\, E) = FV(E) - \{v\}$.
An expression $E$ is said to be closed if $FV(E) = \emptyset$.
Example: The expression $\lambda z .\, (\lambda x .\, z + x)(\lambda y .\, y\, z)$ is closed.
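The grammar and the definition of $FV$ translate directly into an algebraic data type. The Python sketch below is one possible encoding, chosen by us and not part of the original text: frozen dataclasses `Const`, `Var`, `App` and `Lam` for the four kinds of terms, with `+` treated as a constant applied in curried form through a hypothetical helper `plus`.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:        # C
    name: str

@dataclass(frozen=True)
class Var:          # v
    name: str

@dataclass(frozen=True)
class App:          # (E1 E2): operator `fun`, operand `arg`
    fun: "Term"
    arg: "Term"

@dataclass(frozen=True)
class Lam:          # (λv . E): bound variable `var`, body `body`
    var: str
    body: "Term"

Term = Union[Const, Var, App, Lam]

def free_vars(e: Term) -> frozenset:
    """FV(E), one branch per clause of the definition."""
    if isinstance(e, Const):
        return frozenset()
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, App):
        return free_vars(e.fun) | free_vars(e.arg)
    return free_vars(e.body) - {e.var}

def is_closed(e: Term) -> bool:
    return not free_vars(e)

def plus(a: Term, b: Term) -> Term:
    """Hypothetical encoding of a + b as ((+ a) b), with + a constant."""
    return App(App(Const("+"), a), b)

# λz . (λx . z + x)(λy . y z)
example = Lam("z", App(Lam("x", plus(Var("z"), Var("x"))),
                       Lam("y", App(Var("y"), Var("z")))))
print(is_closed(example))  # True
```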
α-conversion
We consider two terms as ‘equivalent’ if they differ only in the names of their bound
variables. So $\lambda x .\, x$ and $\lambda y .\, y$ are considered equivalent. But we must
distinguish $\lambda x .\, y + x$ and $\lambda y .\, y + y$, since one has a free occurrence of $y$ and
the other does not. Note also that $\lambda x y .\, x y$ and $\lambda x y .\, y x$ are not equivalent. The
renaming process is called α-conversion, and allows us to change the name
of a bound variable, as long as we do so consistently. It is formally defined
as the equivalence relation generated by the following reduction:
Definition of α-reduction: $\lambda x .\, E \to_\alpha \lambda y .\, E'$, where $E'$ is obtained from
$E$ by replacing all free occurrences of $x$ in $E$ by $y$, provided $y$ is fresh, that is,
$y$ occurs neither as a free variable nor as a bound variable in the expression
$E$ (i.e. it does not occur in $E$ at all).
Expressions that can be made textually equivalent by renaming bound
variables are called α-convertible or alpha(betically) equivalent. When two
lambda terms $E_1$ and $E_2$ are α-convertible in this sense we write $E_1 \equiv_\alpha E_2$,
and often also $E_1 \equiv E_2$.
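Example: Some α-conversions:
$\lambda x .\, x + 1 \equiv_\alpha \lambda y .\, y + 1$
$\lambda x .\, (\lambda y .\, y\, x) \not\equiv_\alpha \lambda y .\, (\lambda y .\, y\, y)$ (renaming $x$ to $y$ is not allowed here, because $y$ is not fresh: the $x$ in $(\lambda y .\, y\, x)$ would get bound by the inner $\lambda y$)
$\lambda x .\, (\lambda y .\, x\, y)\, y \equiv_\alpha \lambda x .\, (\lambda z .\, x\, z)\, y$
From now on, two λ-terms are considered (syntactically) equal if they are
α-convertible to each other.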
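One way to decide α-equivalence without actually renaming anything is to walk both terms in lockstep and check that each bound variable refers to corresponding binders (essentially de Bruijn levels). This Python sketch assumes our own hypothetical term encoding (dataclasses `Const`, `Var`, `App`, `Lam`, with `+` as a curried constant via `plus`); the name `alpha_equiv` is ours.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"

@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"

Term = Union[Const, Var, App, Lam]

def plus(a: Term, b: Term) -> Term:   # hypothetical encoding of a + b
    return App(App(Const("+"), a), b)

def alpha_equiv(e1: Term, e2: Term) -> bool:
    """E1 ≡α E2: equal up to a consistent renaming of bound variables."""
    def go(a, b, env_a, env_b, depth):
        # env_* maps a bound variable name to the depth of the λ binding it
        if isinstance(a, Const) and isinstance(b, Const):
            return a.name == b.name
        if isinstance(a, Var) and isinstance(b, Var):
            da, db = env_a.get(a.name), env_b.get(b.name)
            if da is None and db is None:   # both free: names must agree
                return a.name == b.name
            return da == db                 # bound by corresponding λ's
        if isinstance(a, App) and isinstance(b, App):
            return (go(a.fun, b.fun, env_a, env_b, depth)
                    and go(a.arg, b.arg, env_a, env_b, depth))
        if isinstance(a, Lam) and isinstance(b, Lam):
            return go(a.body, b.body,
                      {**env_a, a.var: depth}, {**env_b, b.var: depth}, depth + 1)
        return False                        # different kinds of term
    return go(e1, e2, {}, {}, 0)

x, y, z = Var("x"), Var("y"), Var("z")
print(alpha_equiv(Lam("x", x), Lam("y", y)))                          # True
print(alpha_equiv(Lam("x", Lam("y", App(y, x))),
                  Lam("y", Lam("y", App(y, y)))))                     # False
```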
Substitution
We now formally deﬁne the concept of substitution of a variable in lambda
terms.
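Definition of Substitution: The substitution of expression $E$ for each free
occurrence of $v$ in expression $E_0$, denoted by $E_0[E/v]$, is defined by induction
on the structure of $E_0$ as:
• $C[E/v] \equiv C$
• $x[E/v] \equiv E$ if $x \equiv v$, and $x[E/v] \equiv x$ if $x \not\equiv v$
• $(E_1\, E_2)[E/v] \equiv (E_1[E/v])(E_2[E/v])$
• $(\lambda x .\, E_1)[E/v] \equiv \begin{cases} \lambda x .\, E_1 & \text{if } x \equiv v \\ \lambda x .\, (E_1[E/v]) & \text{if } x \not\equiv v \text{ and } x \notin FV(E) \\ \lambda y .\, ((E_1[y/x])[E/v]) & \text{if } x \not\equiv v,\ x \in FV(E),\ y \not\equiv v \text{ and } y \notin FV(E_1\, E) \end{cases}$
Example: $(\lambda x .\, z + 7x)[x+3/z] \equiv \lambda y .\, (z + 7y)[x+3/z] \equiv \lambda y .\, (x+3) + 7y$.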
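The substitution clauses can be transcribed one by one. In this Python sketch the term encoding, the helpers `plus` and `times`, and the fresh-name scheme `v0, v1, ...` in `fresh` are our own assumptions; the renaming clause picks a $y$ that avoids $v$ as well as $FV(E_1\, E)$. The usage reproduces the example above, with the fresh variable called `v0` instead of $y$.

```python
import itertools
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"

@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"

Term = Union[Const, Var, App, Lam]

def free_vars(e: Term) -> frozenset:
    if isinstance(e, Const):
        return frozenset()
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, App):
        return free_vars(e.fun) | free_vars(e.arg)
    return free_vars(e.body) - {e.var}

def fresh(avoid) -> str:
    """Hypothetical naming scheme: the first of v0, v1, ... not in `avoid`."""
    return next(f"v{i}" for i in itertools.count() if f"v{i}" not in avoid)

def subst(e0: Term, e: Term, v: str) -> Term:
    """E0[E/v]: replace the free occurrences of v in E0 by E without capture."""
    if isinstance(e0, Const):
        return e0
    if isinstance(e0, Var):
        return e if e0.name == v else e0
    if isinstance(e0, App):
        return App(subst(e0.fun, e, v), subst(e0.arg, e, v))
    x, body = e0.var, e0.body
    if x == v:                          # v is not free in λv . E1
        return e0
    if x not in free_vars(e):           # no capture possible
        return Lam(x, subst(body, e, v))
    # x ∈ FV(E): rename x to some y with y ≢ v and y ∉ FV(E1 E) first
    y = fresh(free_vars(body) | free_vars(e) | {v})
    return Lam(y, subst(subst(body, Var(y), x), e, v))

def plus(a, b):    # hypothetical encoding of a + b as ((+ a) b)
    return App(App(Const("+"), a), b)

def times(a, b):   # hypothetical encoding of a·b as ((* a) b)
    return App(App(Const("*"), a), b)

x, z = Var("x"), Var("z")
# (λx . z + 7x)[x+3/z]  becomes  λv0 . (x + 3) + 7 v0
result = subst(Lam("x", plus(z, times(Const("7"), x))), plus(x, Const("3")), "z")
```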
The following lemma tells us that substitution behaves well; it can be
proven by induction on the structure of λ-terms.
Lemma For all terms $E_0, E_1, E_2$ and variables $x, y$ such that $x \not\equiv y$ and $x \notin FV(E_2)$:
$E_0[E_1/x][E_2/y] \equiv E_0[E_2/y][E_1[E_2/y]/x]$.
Reduction System for the Lambda Calculus
As we have seen with an example at the beginning of this section, the
main rule of the lambda calculus is the β-reduction rule, which we can now
define formally.
Definition of β-reduction: β-reduction is the compatible relation generated
by $(\lambda v .\, E_1)\, E_2 \to_\beta E_1[E_2/v]$, with the rules:
$$\frac{E_1 \to_\beta E_2}{E_1\, E \to_\beta E_2\, E} \qquad \frac{E_1 \to_\beta E_2}{E\, E_1 \to_\beta E\, E_2} \qquad \frac{E_1 \to_\beta E_2}{\lambda v .\, E_1 \to_\beta \lambda v .\, E_2}$$
As before, any term matching the left-hand side of the rule is called a redex,
and thus any expression of the form $(\lambda v .\, E_1)\, E_2$ is called a β-redex.
β-reduction is the reduction relation $\to_\beta$ of the pure lambda calculus. We often
write $\to$ resp. $\twoheadrightarrow$ instead of $\to_\beta$ and $\twoheadrightarrow_\beta$. We use $=_\beta$ (or sometimes simply
$=$) to denote the equivalence relation generated by $\to_\beta$. Note the difference
between $\equiv$ ($\equiv_\alpha$) and $=_\beta$.
Example: $(\lambda n x .\, (x + 1)\, n)\, 7\, 4 \to_\beta (\lambda x .\, (x + 1)\, 7)\, 4 \to_\beta (4 + 1)\, 7$.
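Example: This example illustrates the need for α-conversion during β-reduction,
even if distinct names are chosen from the start. Define $TWICE \equiv
\lambda f .\, \lambda x .\, f(f x)$; then
$$\begin{aligned}
(\lambda y .\, y\, y)\, TWICE &\to_\beta TWICE\ TWICE\\
&\equiv (\lambda f .\, \lambda x .\, f(f x))\ TWICE\\
&\to_\beta \lambda x .\, TWICE\, (TWICE\, x)\\
&\equiv \lambda x .\, TWICE\, ((\lambda f .\, \lambda x .\, f(f x))\, x)\\
&\to_\beta \lambda x .\, TWICE\, ((\lambda x .\, f(f x))[x/f]) \quad \text{(note the name clash)}\\
&\equiv_\alpha \lambda x .\, TWICE\, ((\lambda y .\, f(f y))[x/f])\\
&\equiv \lambda x .\, TWICE\, (\lambda y .\, x(x y))\\
&\to_\beta \dots
\end{aligned}$$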
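Example:
1. $(\lambda x .\, x + 1)\, ((\lambda y .\, y\, y)\, 3) \twoheadrightarrow_\beta (3\, 3) + 1$ in two possible ways (either the outer or the inner redex can be contracted first),
so different reduction paths are possible.
2. $\Omega \equiv (\lambda x .\, x x)(\lambda x .\, x x) \to_\beta (\lambda x .\, x x)(\lambda x .\, x x) \to_\beta \cdots$, thus infinite
sequences of steps are possible: β-reduction is not always terminating.
This corresponds to ‘self-reproducing programs’.
3. $(\lambda x .\, x x x)(\lambda x .\, x x x) \to_\beta (\lambda x .\, x x x)(\lambda x .\, x x x)(\lambda x .\, x x x) \to_\beta \cdots$, so
terms can even become arbitrarily large.
4. $(\lambda y .\, c)((\lambda x .\, x x x)(\lambda x .\, x x x)) \to c$, but also
$(\lambda y .\, c)((\lambda x .\, x x x)(\lambda x .\, x x x)) \to (\lambda y .\, c)((\lambda x .\, x x x)(\lambda x .\, x x x)(\lambda x .\, x x x))$,
and the latter term can be reduced to $c$ or again to a longer term, etc.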
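These examples can be replayed with a small β-reducer. The Python sketch below uses our own hypothetical term encoding and a capture-avoiding `subst`, and commits to one particular strategy, leftmost-outermost (normal order) reduction; that choice is ours, made because normal order is known to reach a normal form whenever one exists. Constants such as `+` get no reduction rules of their own (pure β only), and the `fuel` bound is needed because, as example 2 shows, reduction need not terminate.

```python
import itertools
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"

@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"

Term = Union[Const, Var, App, Lam]

def free_vars(e: Term) -> frozenset:
    if isinstance(e, Const):
        return frozenset()
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, App):
        return free_vars(e.fun) | free_vars(e.arg)
    return free_vars(e.body) - {e.var}

def fresh(avoid) -> str:   # hypothetical naming scheme v0, v1, ...
    return next(f"v{i}" for i in itertools.count() if f"v{i}" not in avoid)

def subst(e0: Term, e: Term, v: str) -> Term:
    """Capture-avoiding E0[E/v]."""
    if isinstance(e0, Const):
        return e0
    if isinstance(e0, Var):
        return e if e0.name == v else e0
    if isinstance(e0, App):
        return App(subst(e0.fun, e, v), subst(e0.arg, e, v))
    if e0.var == v:
        return e0
    if e0.var not in free_vars(e):
        return Lam(e0.var, subst(e0.body, e, v))
    y = fresh(free_vars(e0.body) | free_vars(e) | {v})
    return Lam(y, subst(subst(e0.body, Var(y), e0.var), e, v))

def beta_step(e: Term) -> Optional[Term]:
    """Contract the leftmost-outermost β-redex; None if e is a β-normal form."""
    if isinstance(e, App):
        if isinstance(e.fun, Lam):              # (λv . E1) E2 →β E1[E2/v]
            return subst(e.fun.body, e.arg, e.fun.var)
        f = beta_step(e.fun)                    # E1 →β E2 gives E1 E →β E2 E
        if f is not None:
            return App(f, e.arg)
        a = beta_step(e.arg)                    # E1 →β E2 gives E E1 →β E E2
        return None if a is None else App(e.fun, a)
    if isinstance(e, Lam):                      # E1 →β E2 gives λv.E1 →β λv.E2
        b = beta_step(e.body)
        return None if b is None else Lam(e.var, b)
    return None                                 # constants and variables

def normalize(e: Term, fuel: int = 100) -> Tuple[Term, bool]:
    """At most `fuel` normal-order steps; the flag tells whether a normal form was reached."""
    for _ in range(fuel):
        nxt = beta_step(e)
        if nxt is None:
            return e, True
        e = nxt
    return e, beta_step(e) is None

def plus(a, b):    # hypothetical encoding of a + b as ((+ a) b)
    return App(App(Const("+"), a), b)

x, c = Var("x"), Const("c")
omega = Lam("x", App(x, x))
Omega = App(omega, omega)                        # Ω ≡ (λx . xx)(λx . xx)
w3 = Lam("x", App(App(x, x), x))                 # λx . xxx
print(beta_step(Omega) == Omega)                 # True: Ω reduces to itself
print(normalize(App(Lam("y", c), App(w3, w3))))  # (Const(name='c'), True)
```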
Although we already saw that the λ-calculus is neither weakly nor strongly
normalizing, it does have the important confluence property. First we introduce
the following definition of the diamond property, which we use to prove that
$\to_\beta$ is confluent. To prevent confusion in the notation we will from now on
also use the implication symbol $\Rightarrow$.
Definition of the Diamond Property: A binary relation $\to$ on the
lambda terms $\Lambda$ satisfies the diamond property, notation $\to\ \models \diamond$ :=
$$(\forall M, M_1, M_2 : M, M_1, M_2 \in \Lambda : (M \to M_1 \wedge M \to M_2) \Rightarrow (\exists M_3 : M_3 \in \Lambda : M_1 \to M_3 \wedge M_2 \to M_3))$$
Note that $\to_\beta$ has the Church-Rosser property if and only if its transitive, reflexive
closure $\twoheadrightarrow_\beta$ satisfies the diamond property.
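For a finite relation the diamond property is a simple check over all pairs of one-step reducts. The Python sketch below (hypothetical names; a relation is encoded as a set of pairs) tests it and also computes the transitive, reflexive closure, so the lemma that follows can be observed on small examples. Note the first example: without reflexive pairs, an element whose only reduct is a normal form already violates the property.

```python
from itertools import product

def successors(elems, rel):
    return {a: {y for (x, y) in rel if x == a} for a in elems}

def has_diamond(elems, rel):
    """rel ⊨ ◇: whenever M → M1 and M → M2, some M3 has M1 → M3 and M2 → M3."""
    succ = successors(elems, rel)
    return all(succ[m1] & succ[m2]
               for m in elems
               for m1, m2 in product(succ[m], repeat=2))

def refl_trans_closure(elems, rel):
    """The smallest reflexive, transitive relation containing rel."""
    closure = set(rel) | {(a, a) for a in elems}
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

E = {"a", "b", "c", "d"}
base = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
refl = base | {(e, e) for e in E}
print(has_diamond(E, base))                            # False: b → d, but d has no reduct
print(has_diamond(E, refl))                            # True
print(has_diamond(E, refl_trans_closure(E, base)))     # True
```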
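Lemma: Let $\to$ be a binary relation on a set $\Lambda$ with $\twoheadrightarrow$ its transitive,
reflexive closure, and let $\to\ \models \diamond$. Then $\twoheadrightarrow\ \models \diamond$.
Proof: Assume $\to$ is a binary relation on a set $\Lambda$ with $\twoheadrightarrow$ its transitive,
reflexive closure, and $\to\ \models \diamond$. We now have to prove that $\twoheadrightarrow\ \models \diamond$. Suppose
$M, L, K \in \Lambda$, $M \twoheadrightarrow L$ and $M \twoheadrightarrow K$. We then have to prove $(\exists N : N \in \Lambda :
L \twoheadrightarrow N \wedge K \twoheadrightarrow N)$. Let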
(*) $M \equiv M_0 \to M_1 \to \dots \to M_n \equiv L$, for some $n \in \mathbb{N}$
(**) $M \equiv K_0 \to K_1 \to \dots \to K_m \equiv K$, for some $m \in \mathbb{N}$
We now apply a technique called induction loading (for more
information see the links on http://zax.mine.nu/stage/) to prove that $K$ and $L$
have a common reduct $N$. To be precise, we show that $l(m, n)$ holds for all
$m, n \in \mathbb{N}$, with
$l(m, n)$ := there exist terms $N(i, j) \in \Lambda$, for all $i, j \in \mathbb{N}$ with $0 \le i \le n$
$\wedge\ 0 \le j \le m$, such that:
(a) $N(i, 0) \equiv M_i$ if $0 \le i \le n$
(b) $N(0, j) \equiv K_j$ if $0 \le j \le m$
(c) $N(i, j) \to N(i, j + 1)$ if $0 \le i \le n \wedge 0 \le j < m$
(d) $N(i, j) \to N(i + 1, j)$ if $0 \le i < n \wedge 0 \le j \le m$
Clearly, when $l(m, n)$ is true for all $m, n \in \mathbb{N}$, $K$ and $L$ have
a common reduct, namely $N(n, m)$: by (a) and (c), $L \equiv N(n, 0) \twoheadrightarrow N(n, m)$, and by (b) and (d),
$K \equiv N(0, m) \twoheadrightarrow N(n, m)$. So the only remaining proof obligation is to show that
$l(m, n)$ holds for all $m, n \in \mathbb{N}$. We prove this by induction on $n$.
Base case (n): $n = 0$
(a) let $N(0, 0)$ be $M_0$; then (a) holds trivially by reflexivity of ‘$\equiv$’.
(b) let $N(0, j)$ be $K_j$ for $0 \le j \le m$; then (b) also holds trivially.
Note that this is consistent with the definition under (a),
since $N(0, 0) \equiv M_0 \equiv M \equiv K_0$.
(c) $N(i, j) \to N(i, j + 1)$ holds because $i = 0$ and (**).
(d) $N(i, j) \to N(i + 1, j)$ holds trivially because $n = 0$ yields an empty range for $i$.
Induction case (n): Induction hypothesis (i.h.n): suppose that for $n = k$,
$k \in \mathbb{N}$, the statement $l(m, n)$ is true for all $m \in \mathbb{N}$. We now prove the
statement for $n = k + 1$. We do this by induction on $m$.
Base case (m): $m = 0$
(a) let $N(k + 1, 0)$ be $M_{k+1}$; together with i.h.n this gives (a) for $0 \le i \le k + 1$.
(b) since $j = 0$ this amounts to $N(0, 0) \equiv K_0$.
This is true because of our previous definition $N(0, 0) \equiv M_0$
and the fact that $M_0 \equiv M \equiv K_0$.
(c) holds trivially, because $m = 0$ yields an empty range for $j$.
(d) $N(i, j) \to N(i + 1, j)$ because $j = 0$ and (*).
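Induction case (m): Induction hypothesis (i.h.m): suppose that for $m = r$
and $n = k + 1$, $r \in \mathbb{N}$, the statement $l(m, n)$ is true. We now prove the
statement for $m = r + 1$. (Here we read all $N(i, j)$ as entries of one grid that is filled in
step by step, so the entries provided by i.h.n and i.h.m coincide where they overlap.)
(a) $N(i, 0) \equiv M_i$ for $0 \le i \le k + 1$ follows from i.h.m.
(b) $N(0, j) \equiv K_j$ follows from i.h.m for $0 \le j \le r$, and from i.h.n (applied with $m = r + 1$) for $j = r + 1$.
(c) and (d)
We already know from the induction hypotheses that (c) holds for all $(i, j)$ with
$(0 \le i \le k + 1 \wedge 0 \le j < r) \vee (0 \le i \le k \wedge j = r)$, and that (d) holds for all $(i, j)$ with
$(0 \le i < k + 1 \wedge 0 \le j \le r) \vee (0 \le i < k \wedge j = r + 1)$. What remains is the new entry
$N(k + 1, r + 1)$, i.e. (c) for $(i, j) = (k + 1, r)$ and (d) for $(i, j) = (k, r + 1)$.
We know by (c) of i.h.n that $N(k, r) \to N(k, r + 1)$, and by (d) of i.h.m that
$N(k, r) \to N(k + 1, r)$. Then by the diamond property of $\to$ we know
$(\exists N(k + 1, r + 1) : N(k + 1, r + 1) \in \Lambda : N(k, r + 1) \to N(k + 1, r + 1) \wedge N(k + 1, r) \to N(k + 1, r + 1))$,
which gives exactly the missing instances of (d) and (c).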
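Read constructively, the proof says the grid can be built row by row, each new corner $N(i, j)$ coming from the diamond property applied to $N(i-1, j-1)$, just as $N(k+1, r+1)$ is obtained in the induction step. The Python sketch below does exactly that. The oracle `join`, which returns a common one-step reduct, is an assumption standing in for the existential in the diamond property, and the toy relation on pairs of integers is chosen only because its diamond witness is easy to compute (it is not a relation on lambda terms).

```python
def tile(ms, ks, join):
    """Build the grid N(i, j) of the induction-loading proof.

    ms = [M_0, ..., M_n] and ks = [K_0, ..., K_m] are the reduction sequences
    (*) and (**), both starting at M. join(p, q) is an assumed diamond oracle:
    for one-step reducts p, q of a common term it returns r with p -> r, q -> r.
    """
    assert ms[0] == ks[0], "both sequences must start at M"
    n, m = len(ms) - 1, len(ks) - 1
    N = {}
    for i in range(n + 1):
        N[i, 0] = ms[i]                   # condition (a)
    for j in range(m + 1):
        N[0, j] = ks[j]                   # condition (b)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # N(i-1, j-1) -> N(i-1, j) and N(i-1, j-1) -> N(i, j-1) are already
            # in place, so the diamond property supplies the corner N(i, j).
            N[i, j] = join(N[i - 1, j], N[i, j - 1])
    return N

# Toy relation (assumption): (x, y) -> (x, y), (x + 1, y) or (x, y + 1).
def step(s, t):
    return (t[0] - s[0], t[1] - s[1]) in {(0, 0), (1, 0), (0, 1)}

def join(p, q):
    return (max(p[0], q[0]), max(p[1], q[1]))

ms = [(0, 0), (1, 0), (2, 0)]   # M ≡ M_0 -> M_1 -> M_2 ≡ L
ks = [(0, 0), (0, 1)]           # M ≡ K_0 -> K_1 ≡ K
N = tile(ms, ks, join)
print(N[2, 1])                  # (2, 1): a common reduct of L and K
```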
We can now sketch the proof² of the following fundamental theorem of
the untyped lambda calculus:
Theorem (Church, Rosser): $\to_\beta$ is confluent.
Proof: By the previous lemma, we know that if any binary relation on a set
satisfies the diamond property, its transitive reflexive closure also satisfies the
diamond property. Suppose we have a binary relation $\to_{\text{partial-}\beta}$ on the set $\Lambda$
such that $\twoheadrightarrow_\beta$ is the transitive reflexive closure of $\to_{\text{partial-}\beta}$. So if we prove
that $\to_{\text{partial-}\beta}$ satisfies the diamond property, then by the previous
lemma $\twoheadrightarrow_\beta$ satisfies the diamond property, i.e. $\to_\beta$ is
confluent.
A concrete definition of $\to_{\text{partial-}\beta}$, a proof that its transitive reflexive closure
is indeed $\twoheadrightarrow_\beta$, and a proof that $\to_{\text{partial-}\beta}$ satisfies the diamond property can
be found on pages 60-62 of [6].
² The lines of the proof are due to W. Tait and P. Martin-Löf (see [6, section 3.2]), but
as far as I know this is the first proof that formalizes the above lemma to a reasonable
extent.
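Theorem: The λ-calculus has the unique normal form property.
Proof: Suppose that a term $a$ of $\langle \Lambda, \to \rangle$ has two normal forms, $n_1 \in \Lambda$
and $n_2 \in \Lambda$. This means there is no $b \in \Lambda$ such that $n_1 \to b$ or $n_2 \to b$.
But $a \twoheadrightarrow n_1 \wedge a \twoheadrightarrow n_2$, and then by the Church-Rosser property we know
$(\exists c : c \in \Lambda : n_1 \twoheadrightarrow c \wedge n_2 \twoheadrightarrow c)$. Since $n_1$ and $n_2$ are normal forms, these
reductions must be empty, so $n_1 \equiv c \equiv n_2$.
Example: All constants are normal forms, as well as $x$, $\lambda x .\, x$, $\lambda x .\, x x$, $y\, y, \dots$.
Note that the term $(\lambda x .\, x x)(\lambda x .\, x x)$ cannot be reduced to a normal form.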
Confluence is a fundamental property for functional programming; we rely
on it when we evaluate programs by rewriting, knowing that we never have
to backtrack an evaluation (this is also one of the main differences with logic
programming).
In the λ-calculus we have defined in this section, we can represent natural
numbers and basic operations on the natural numbers. We will not show
this here; most books on the lambda calculus contain examples of
how to do basic arithmetic in the lambda calculus. The λ-calculus represents a
certain class of (partial) functions on the integers. By a classical result of the
American mathematician Stephen C. Kleene (1909-1994) this is exactly the
set of (partial) recursive functions. The proof can be found in [6, theorem
9.2.16]. Church also considered the set of functions that could be calculated
in his λ-calculus, and conjectured the following thesis:
Church’s thesis (1936): The set of effectively computable functions, i.e. functions
that can intuitively (effectively) be computed, is the same as the set of
functions that can be defined in the λ-calculus.
A more formal version and a detailed treatment of Church’s thesis can be
found in section 9.3.
Alan Turing proved in 1937 that the class of Turing computable functions is
the same as the class of functions definable in the λ-calculus.
So the power of Turing Machines is the same as the power of the λ-calculus.
The thesis that both models capture the intuitive idea of computation is the
subject of the next section.
9.3 The Church-Turing thesis
The Church-Turing thesis concerns the intuitive notion of algorithm (or
effective or mechanical method) in logic and mathematics. The notion of an
algorithm or an effective method is an informal one, and attempts to
characterize this effectiveness lacked rigor, mainly because the key requirement
that the method demands no insight or ingenuity is left unexplicated.
One of Turing’s achievements in his paper of 1936 (reprinted in [19]
and available online at http://www.abelard.org/turpap2/tp2ie.asp) was to
present a formally exact predicate with which the informal predicate ‘can be
calculated by means of an algorithm or effective method’ may be replaced.
The formal concept proposed by Turing is that of computability by a Turing
Machine (see section 9.1). He introduced this thesis in [90] in the course of
arguing that the ‘Entscheidungsproblem’ for the predicate calculus is unsolvable.
Turing’s thesis: TMs can do anything that could be described as intuitively
computable.
Church also presented in [14] a formally exact way to express this notion
of intuitive computability. Turing’s method was, however, more obvious
and more general than Church’s, since the latter only considered functions
of positive integers. In order to calculate the values of such functions Church
introduced his lambda calculus and specified the notion of a recursive function
(see section 9.2).
Church’s thesis: A function of positive integers is effectively computable
only if it is recursive.
The reverse implication is also referred to as the converse of Church’s
thesis. The class of lambda-definable functions and the class of recursive
functions were later shown to be identical. This was established in the case
of functions of positive integers by Church and the American mathematician
Kleene (see [47], [14]). After learning of Church’s proposal, Turing quickly
established that the apparatus of lambda-definability and his own apparatus
of computability were equivalent ([89], page 263).
9.3. THE CHURCHTURING THESIS 167
Theorem: Lambda-definability and Turing Machine-computability
are equivalent.
Proof: See [89, page 263] for a proof that Turing’s machines and Church’s
lambda calculus can compute the same set of functions.
Although Turing and Church had chosen different ways to formalize the
intuitive notion of effective computability, identifying it with computability
by a Turing Machine and with definability in the lambda calculus respectively,
both methods are equivalent. After this proof of equivalence, Kleene
introduced the term ‘Church-Turing thesis’ to refer to either of the two
equivalent theses ([48], page 232).
Church-Turing thesis: The intuitive notion of an algorithm coincides with that of a
Turing Machine algorithm or (equivalently) with the calculable functions of the lambda
calculus.
There are a number of misunderstandings of the Church-Turing thesis,
collected in [16]; Turing did not show that
• Any problem can be solved ‘by instructions, explicitly stated rules or
procedures’
• A universal TM ‘can compute any function that any computer, with
any architecture can compute’ (Turing said nothing about the limits of
what can be computed by a machine)
• Whatever can be calculated by a machine (working on finite data in
accordance with a finite program of instructions) is Turing-machine-computable
(this is known as Thesis M, see [16])
• Any process that can be given a systematic mathematical description
(or a ‘precise enough characterization of a set of steps’, or that is
‘scientifically describable’ or ‘scientifically explicable’) can be simulated
by a TM (this is known as Thesis S, see [16])
Since the word ‘computable’ is often tied by definition to effective
calculability, the Church-Turing thesis is often stated as ‘All computable functions
are computable by a Turing Machine’ (a function is said to be computable if
and only if there is an effective procedure for determining its values).
If we summarize the above, we can say that to define the concept of an
algorithm, Church used a notational system, the lambda calculus. Turing did
the same with his theoretical computing device, the Turing Machine. On the
face of it they seemed very different from one another, but these two definitions
turned out to be equivalent, in the sense that each picks out the same set
of mathematical functions. The Church-Turing thesis is the assertion that
this set contains every function whose values can be obtained by a method
or algorithm corresponding to our intuitive notion of effective computability.
Clearly, if there were functions that are computable in the informal (intuitive)
sense but not in the formal sense, then the formal notion would be less
general than the informal one and so could not reasonably be employed to replace
it. When the thesis is expressed in terms of Turing’s formal concept,
it is appropriate to refer to it also as the Turing thesis, and likewise
for the case of Church. It is generally agreed among mathematicians and logicians
that ‘computable by means of a TM’ is an accurate rendering of the
informal notion in question.
Chapter 10
Conclusion
It is a profoundly erroneous truism, repeated by all copy books
and by eminent people, when they are making speeches, that we
should cultivate the habit of thinking of what we are doing. The
precise opposite is the case. Civilization advances by extending
the number of important operations which we can perform
without thinking about them. . . . The study of mathematics is apt to
commence in disappointment . . . We are told that by its aid the
stars are weighed and the billions of molecules in a drop of water
are counted. Yet, like the ghost of Hamlet’s father, this greatest
science eludes the eﬀorts of our mental weapons to grasp it.
 A. Whitehead, in [99]
When I started my study of the foundations of mathematics, I did not
quite know what to expect. By now I’ve learned that the foundations of
mathematics can be a fascinating and important subject. Learning this new
subject was an interesting challenge, but sometimes hard work when I had
to go through numerous books that were either full of details or too vague and
philosophical. Most books that I found on the foundations of mathematics
were either very detailed and descriptive (the book [31] by I. Grattan-Guinness
has an unmatched level of detail and exactness) or treated only a part
of the theory that was developed from 1890 to 1940 (for example, [17] gives
an excellent introduction to set theory). One of the better ones, though relatively
unknown, is the book by G.T. Kneebone [49], which is quite complete and still
fairly theoretical. One of the motivations to write this article was to
170 CHAPTER 10. CONCLUSION
present the theory properly. Hopefully that makes it clearer and more enjoyable.
Some of the good literature used, such as the books just mentioned,
can be found in the references at the end of this report.
At the same time, I also tried to briefly introduce the reader to the
historical context of the most important developments. Most undergraduate
courses I have taken gave little or no information about the history that
lies behind the theory. Emphasis was laid on the accumulation of mathematical
knowledge. I believe that the history of mathematics in education
can not only make the study of mathematics more interesting, but also help
in the growth of mathematical understanding and appreciation of the current
form of the theory.
I want to conclude this report with a summary of the theory and my own
view on the project, and with some ideas for future work.
The project
In the beginning of the 20th century Hilbert said we should formalize all
of mathematics, that is, all of mathematical reasoning. This ‘project’ (from now on I will
refer to it as the project) has been the central theme of this report. When
reading about the work and biographies of all those brilliant men who devoted
themselves to this problem, you can (at least that’s what happened to
me) get caught up in this fascinating philosophical question.
To most people, however, this all seems very impractical. We all know
you can make a popular operating system, or start your own business on the
web and make a million dollars in one year if you’re lucky. And when it
comes to verifying mathematical proofs and making reliable software, a
formal basis is rarely used: the human mind is still the most important tool, and
other techniques, such as model checking, are preferred. It might be worth
writing another article on how and why, in that respect, the more practical
working mathematicians and the more theoretical logicians (or formalists, if you
prefer) grew apart. But let’s first go back to the project.
The attempt to formalize mathematical reasoning is not new: the Greeks
already thought rationality was the supreme goal. We can think of Plato
and Reason, or, as Russell¹ would say, think of Pythagoras and Rationality!
Aristotle made a big step in formalizing reasoning, with his patterns
of reasoning that are known as syllogisms. Since then, logic has been further
developed, with important contributions from Leibniz, De Morgan and
especially Boole. Because of his interest in theology and God (see [31,
chapter 3] and also [30, section 5.8, page 203]), Cantor became obsessed with
the notion of the infinite, and developed his theory of infinite sets. With Cantor,
mathematics became more abstract, and some people regarded his set theory
as a disease. Poincaré, the great French mathematician, said² (from [95]):
“Later generations will regard Mengenlehre (set theory) as a disease from
which one has recovered.” Peano and Frege, as we have learned in chapter
4, brought mathematical reasoning to an even higher level of formalization.
So far, so good. But there turned out to be some problems, and although
Cantor had already noticed this (see Cantor’s paradox in section 3.8), it was
Russell who spread the bad news to everyone by stating his paradox.
At that point Hilbert proposed to use a formal axiomatic method to solve
these problems, and he gave his famous three requirements of consistency,
completeness and decidability.
This proposal of Hilbert to formalize mathematics led to the development
of several axiomatic systems, such as those of Zermelo and Fraenkel, and of
Gödel, Bernays and von Neumann. Russell and Whitehead made their own
attempt to formalize mathematics, with their theory of types. But although
all of these attempts were fruitful to a certain extent, ultimately they all failed,
and it took Gödel and Turing to show that in fact ‘the project’ could not
be done. Formalizing mathematics so that we have absolute truth is not
possible! But these works of Gödel and Turing were new and complicated,
and not everyone clearly recognized their importance. Even nowadays, few
people are familiar with the details of their work, and we often see
confusion between notions like ‘checking the proof of a statement’ and ‘checking
whether a statement is true (or not)’. There is also much confusion about the
exact implications of Gödel’s and Turing’s work. Gödel showed that every consistent,
effectively axiomatized system that is strong enough to express elementary arithmetic
contains a statement of arithmetic that can be neither proved nor disproved within
that system. Turing later formalized the notion of computability to show that there is
no mechanical procedure to decide, for an arbitrary statement of the predicate calculus,
whether it is provable.
¹ Although rationality is more commonly associated with Plato, Russell always insisted
on attributing it to Pythagoras (see [62]).
² Whether or not he actually said this is a matter of debate amongst historians of
mathematics.
At first this was a shock, but then mathematicians were saying (and
again it would be nice to write an article about the different responses of
mathematicians and logicians): so what? We should do mathematics exactly
the same way as we’ve always done it; this does not apply to the problems
I care about. Indeed mathematicians continued with their work, and the
theorems of Gödel and Turing had little or no impact in practice on how
we (should) do mathematics. The only effect the project might have had on
working mathematicians is that they have become a bit more precise in the
use of language and in writing their proofs. Some, of course, were inspired
by problems such as Hilbert’s 23 problems. But there has been another consequence
of all this theoretical work, which I was made aware of through a videotaped
lecture by G.J. Chaitin on the internet. I quote him about Hilbert’s attempt
to formalize all mathematics after the publication of the theorems of Gödel
and Turing: “It failed in that precise technical sense. But in fact it succeeded
magnificently, not formalization of reasoning, but formalization of algorithms
has been the great technological success of our time - computer programming
languages! So if you look at the history of the beginning of this century you’ll
see papers by logicians studying the foundations of mathematics in which
they had predicate calculi. Now you look back and you say this is clearly
a programming language! [...] If you look at Turing’s paper of course there
is a machine language [...]. Or, as von Neumann said: the universal Turing
Machine is really the notion of a general purpose programmable computer -
and that’s the idea of software. [...] If you look at papers by Alonzo Church
you see the lambda calculus, which is a functional programming language.
If you look at Gödel’s original paper you see what looks like LISP, it’s very
close to LISP”. As he showed, there are numerous examples of unexpected
offspring of theoretical research, and all of the foundational work is not so
impractical after all! As G.J. Chaitin concluded in his speech, this is the
way “we’re all benefiting from the glorious failure of this project!”. Now
this is not entirely true, but it is true that theoretical studies, as he says,
“don’t have spin-off in dollars right away, but sometimes they have vastly
unexpected consequences”. Formal methods and studies have not always done a
good job of promoting themselves; maybe we can emphasize this aspect and
show that technology often advances through fascinating impractical ideas.
Status of the project
That brings us to ask whether the question of the foundations of mathematics,
many decades after Hilbert formulated it, is now settled once and for
all. The short answer is: it is not. Judging even from the amount of interesting
resources on current research that are available on the internet alone, we can
conclude there is still a lot of work to do on the foundations of mathematics.
I am considering creating an online version of this document with more background
information and links.
Although Gödel and Turing showed that it is impossible to formalize
even basic arithmetic completely, let alone the whole of mathematics, it is still
possible to formalize parts of mathematics (for example, geometry)
successfully. As P. Andrews says in [4], “attempts to understand the nature of
reasoning and to build sophisticated information systems which can draw logical
conclusions may be regarded as part of an endeavor to fashion more powerful
intellectual tools for coping with the increasingly complex problems which
confront mankind.” In that respect formalization is not restricted to
mathematical reasoning, and it can also be applied to other disciplines (such
as physics, chemistry or even social sciences). In particular, the development
of software and computer systems will be facilitated by a formalization of
theories. Although complete formalization of parts of mathematics is very
useful, it is not the focus of most current research: (most people believe
that) the human mind will (at least for the near future) be the one to prove
whether a given mathematical statement is true or not.
Ideas for future work and the distinction between mathematics and software
Although it cannot be determined by a machine whether an arbitrary given
mathematical statement is true, we can try to develop an axiomatic system
such that as many as possible of the interesting statements³ can be proved
within that system. This is useful because, even though every such axiomatic system
is incomplete and contains undecidable statements, whenever we claim that a statement
of the system is decided and support the claim with a concrete and completely formalized
(dis)proof of it within that system, we can still check mechanically whether or not the
proof is correct for the given statement. The question then is whether the set of
statements for which we can do this still forms a part of mathematics that
is interesting enough. This has to be a part of our investigation: to find out
how many of the practical mathematical proofs contain ‘meta-arguments’, in
other words which classes will fall outside our system. Although we want to
change as little as possible on the side of mathematics itself, this might also
be a necessary option⁴. In the words of the title of P. Andrews’ book [4], we get: ‘to truth
through proof’. This should be the first goal for the near future:
(1) Investigate which parts of mathematics can(not) be formalized (i.e. which
contain ‘meta-arguments’), which formalization is most usable and allows the largest
part of (practical) mathematics to be formalized, and fully formalize proof
checking for as many parts of mathematics as possible.
³ By interesting statements we mean all statements in the (everyday) work of practicing
mathematicians. These ‘practical’ statements do not include the specific, purely
theoretical statements that Gödel invented for his incompleteness theorem.
Formalization is not only important for checking the correctness of mathematical theories, which are becoming ever more complex. Many models in physics and chemistry depend on underlying mathematical theorems, and the success of such a model depends on the correctness of those theorems. We are also becoming more and more dependent on automated systems, in particular computers and software. There is a growing need for reliable software (that is, software that is correctly specified and works according to its specifications), not only for (safety-)critical systems but also for everyday applications. A formal approach can be used to prove the correctness not only of mathematical statements but also of computer programs. This leads to an important point:
Distinction between mathematics and software construction.
Instead of checking proofs of mathematical statements, we are then checking the steps of program derivations. I want to emphasize this difference, since it is often unclear or left implicit which of the two is meant when arguments for or against formalistic studies are given. We have to realize that we can never obtain a 100% guarantee of the correctness of any algorithm, since we also depend on the correctness of the proof checker. That is why we have to try to keep the proof checker as simple, small and intuitive as possible (see also the ‘de Bruijn criterion’ in [26, pages 4 and 26]). Analogously, we can never obtain a 100% guarantee of the correctness of any mathematical statement, since we learned from Gödel that the consistency of a consistent axiomatic system strong enough to express arithmetic cannot be proved within that system. Therefore we had better also keep the axiomatic system as simple, small and intuitive as possible (we could see this as the de Bruijn criterion for axiomatic systems). Nevertheless, such an implementation of a proof checker would give us the highest degree of certainty possible.

⁴ For a successful formalization of parts of mathematics we therefore do not only look at the axiomatic system; it might also require us to restrict certain parts of mathematics so that they contain fewer undecidable statements, or to rewrite certain existing proofs into a form that is permitted by the system.
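To show roughly what the ‘derivation steps’ of a program look like, here is a sketch in the style of Dijkstra’s method: the loop invariant and the guard are read off the specification, and each proof obligation is written down as a runtime assertion. The example (integer division by repeated subtraction) and the function name `divide` are our own choice. A proof checker would discharge these obligations once and for all, whereas assertions only check them for the inputs that are actually run.

```python
from typing import Tuple


def divide(x: int, y: int) -> Tuple[int, int]:
    """Specification: for x >= 0 and y > 0, return (q, r) with
    x == q*y + r and 0 <= r < y."""
    assert x >= 0 and y > 0                       # precondition
    q, r = 0, x
    # Invariant P: x == q*y + r and r >= 0, established by q, r := 0, x.
    assert x == q * y + r and r >= 0
    while r >= y:                                 # guard B
        bound = r                                 # variant function: r
        q, r = q + 1, r - y
        assert x == q * y + r and r >= 0          # P is maintained
        assert 0 <= r < bound                     # variant strictly decreases
    # P and not B together give the postcondition.
    assert x == q * y + r and 0 <= r < y
    return q, r


print(divide(17, 5))  # (3, 2)
```

The choice of invariant is what makes the derivation work: the postcondition is split into an invariant part (`x == q*y + r and r >= 0`) and the negated guard (`r < y`).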
Software and Proof Checking
I would also like to remark that proof checking for programs only gives us a way to verify the correctness of programs. At least as important for obtaining correct programs is their correct construction; this is the focus of the work in the area of programming methodology. At the Eindhoven University of Technology, for example, the techniques of E.W. Dijkstra are used to derive correct programs from their specification. Unfortunately, both areas (proof checking/verification vs. construction/derivation) mostly advocate their own approach, while a combination of both could give the best results. Although there has been some minor work on formalizing these proof techniques and on combining formal methods with program derivation (see for example [26]), cooperation is still minimal. If we go one step further back in the process of creating correct software, the success of any piece of software depends on the correctness of its specification. These first phases of software engineering (identifying user requirements and specifications) can also be adapted to comply with the methods of program derivation and formal proof checkers (note that we use the term ‘proof checker’ not only for mathematics, i.e. for checking mathematical statements, but also for the software variant: for checking derivations of algorithms and programs). And since we can never obtain a 100% guarantee of the correctness of software (it depends, for example, on the correctness of the specifications and of the proof checker itself), model checking techniques can also be used as a verification method to improve reliability even further. Therefore I argue for an integrated approach, since only a combination of all the methods mentioned can give us the highest reliability (i.e. the highest chance that the software is correct). Such an integrated approach requires research and cooperation between the various branches representing these methods, and ultimately their incorporation into the software engineering process.
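The model checking mentioned above can be illustrated with a minimal sketch of explicit-state model checking: a breadth-first search through every reachable state of a small finite model, checking a safety property in each state and returning a counterexample trace when it fails. The two-process lock protocol, the function names and the mutual-exclusion property are toy assumptions of our own; practical model checkers use far more compact state representations.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

State = Tuple[str, str, bool]  # (program counter of process 0, of process 1, lock held?)


def check_invariant(initial: Hashable,
                    successors: Callable[[Hashable], Iterable[Hashable]],
                    invariant: Callable[[Hashable], bool]) -> Optional[List[Hashable]]:
    """Explore every reachable state breadth-first.

    Returns None if `invariant` holds in all reachable states; otherwise a
    shortest trace of states from `initial` to a state violating it."""
    parent: Dict[Hashable, Optional[Hashable]] = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace: List[Hashable] = []
            node: Optional[Hashable] = state
            while node is not None:          # walk back to the initial state
                trace.append(node)
                node = parent[node]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:            # each state is visited only once
                parent[nxt] = state
                queue.append(nxt)
    return None


def lock_model(atomic: bool) -> Callable[[State], List[State]]:
    """Two processes sharing one lock (a hypothetical toy protocol).

    If `atomic`, testing and setting the lock happen in a single step;
    otherwise a process first sees the lock free ("checked") and only
    sets it in a later step."""
    def successors(state: State) -> List[State]:
        result: List[State] = []
        for i in (0, 1):
            pcs, locked = [state[0], state[1]], state[2]
            pc = pcs[i]
            if pc == "idle" and (atomic or not locked):
                pcs[i] = "waiting" if atomic else "checked"
            elif pc == "waiting" and not locked:
                pcs[i], locked = "critical", True
            elif pc == "checked":
                pcs[i], locked = "critical", True
            elif pc == "critical":
                pcs[i], locked = "idle", False
            else:
                continue                     # process i is blocked in this state
            result.append((pcs[0], pcs[1], locked))
        return result
    return successors


def mutual_exclusion(state: State) -> bool:
    return not (state[0] == "critical" and state[1] == "critical")


INITIAL: State = ("idle", "idle", False)
print(check_invariant(INITIAL, lock_model(atomic=True), mutual_exclusion))  # None
for step in check_invariant(INITIAL, lock_model(atomic=False), mutual_exclusion):
    print(step)  # both processes see the lock free before either one sets it
```

Because the search is exhaustive, a result of `None` means the property holds in every reachable state of this finite model. That guarantee is a different kind from the one a derivation provides, which is why the two techniques complement each other.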
Mathematics and Proof Checking
Let us go back to proof checking of mathematical statements. We mentioned the first goal of investigating and formalizing proof checking. As a next step (2) we can think of building proof assistants. Proof assistants not only check proofs for us but also help us construct them: they combine a proof development system with a proof checker. A good article about proof assistants using dependent type systems can be found in [8]. Another interesting article on computer-assisted mathematics (for computer algebra) is [7], which includes a brief history of computations versus proofs in mathematics. The notion of ‘helping’ or ‘assisting’ in making proofs might be considered vague. For complicated statements, we can think of tools that keep track of the context of the proof and of the remaining proof obligations, and that even fill in parts of the proofs for us automatically. Proof assistants should make it easier for us to prove mathematical theorems.
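Keeping track of the remaining proof obligations can be illustrated with a short proof in Lean 4. Lean is a present-day proof assistant based on dependent type theory; it is used here only as an example and is not discussed in this report. After the `constructor` step the system itself records two open goals, `q` and `p`, and it accepts the theorem only once both have been discharged:

```lean
-- Goal: from a proof h of p ∧ q, build a proof of q ∧ p.
theorem and_swap (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor    -- splits the goal into two obligations: q, then p
  · exact h.2    -- first obligation q: the right component of h
  · exact h.1    -- second obligation p: the left component of h
```

The same theorem can also be given as a direct proof term, `⟨h.2, h.1⟩`. Either way, the kernel checks the resulting term against the statement.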
Then (3) we can think of building a standard library of proved mathematics. After a proof checker has confirmed the correctness of a given mathematical statement and its corresponding proof, both can be stored in a database. This database can be made accessible to everyone via the internet and can even be used by the automated proving methods of proof assistants mentioned earlier. And although the quality of mathematical work is not as evident as the quality of physical products, this could be the long-awaited ‘quality stamp’ for mathematics. There have already been attempts to build standard libraries of mathematics (see the Mizar project at http://www.mizar.org/ and the PRL project at http://www.cs.cornell.edu/Info/Projects/NuPRL/nuprl.html), but they lack the formal basis that has to be provided by (1) and (2). Barendregt and his group have formalized parts of algebra using the theorem prover Coq. This shows that it is possible to formalize large parts of mathematics, but the process of formalizing mathematics itself is still too direct and informal and needs to be developed further. Many valuable experiences have come out of attempts at what are here called phases (2), (3) and (4), but for a successful result these are premature: we first have to start thoroughly at the beginning (1). Work in this direction was done in [44], where a syntax-driven derivation system is presented for a formal language of mathematics called Weak Type Theory. This is the start of a more rigorous approach to the translation of mathematical texts (statements and proofs).
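The idea that checked results are stored and may then be cited can be sketched in a few lines. This assumes the same toy propositional setting as before: atoms and implications, a fixed set of axioms, and modus ponens as the only rule. The class `TheoremLibrary` and its methods are hypothetical names invented for this illustration. A statement enters the archive only if its proof passes the check, and stored statements may be used as premises in later proofs.

```python
from typing import Dict, List, Set, Tuple, Union

# A formula is an atom (a string) or an implication ("->", antecedent, consequent).
Formula = Union[str, Tuple[str, "Formula", "Formula"]]


def implies(a: Formula, b: Formula) -> Formula:
    return ("->", a, b)


class TheoremLibrary:
    """Toy archive of checked results: a statement is stored only after its
    proof has been checked, and stored statements may be cited later."""

    def __init__(self, axioms: Set[Formula]) -> None:
        self.axioms = set(axioms)
        self.theorems: Dict[str, Formula] = {}

    def _justified(self, line: Formula, earlier: List[Formula]) -> bool:
        # A line may be an axiom or a previously stored theorem...
        if line in self.axioms or line in self.theorems.values():
            return True
        # ...or follow by modus ponens from two earlier lines A and A -> line.
        return any(implies(a, line) in earlier for a in earlier)

    def add(self, name: str, statement: Formula, proof: List[Formula]) -> bool:
        """Check `proof` of `statement`; store it under `name` only if correct."""
        ok = bool(proof) and proof[-1] == statement and all(
            self._justified(line, proof[:i]) for i, line in enumerate(proof))
        if ok:
            self.theorems[name] = statement
        return ok


lib = TheoremLibrary({"p", implies("p", "q"), implies("q", "r")})
print(lib.add("q_holds", "q", ["p", implies("p", "q"), "q"]))  # True
print(lib.add("r_holds", "r", ["q", implies("q", "r"), "r"]))  # True (cites q_holds)
print(lib.add("s_holds", "s", ["s"]))                         # False
print(sorted(lib.theorems))                                   # ['q_holds', 'r_holds']
```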
We see the extension of proof assistants with more intelligent and sophisticated automated proving methods as the last phase (4) of future work. Part of the field of automated proving consists of classical theorem proving methods (such as automated induction). Newer methods come from areas such as neural networks, fuzzy logic, genetic and DNA computing, and possibly, in the future, even quantum computing.
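As a very small example of a classical automated proving method, here is a sketch of the truth-table decision procedure for propositional logic. It is our own choice of example and is not one of the methods named above, such as automated induction. The procedure works only because propositional logic is decidable, which is exactly what fails for mathematics as a whole. Its running time is also exponential in the number of atoms. The formula encoding and function names are assumptions made for this illustration.

```python
from itertools import product
from typing import Dict, Set, Tuple, Union

# A formula is an atom (string), ("not", A), or (op, A, B) with op in {"and", "or", "->"}.
Formula = Union[str, Tuple]


def atoms(f: Formula) -> Set[str]:
    """Collect the propositional variables occurring in f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(arg) for arg in f[1:]))


def evaluate(f: Formula, val: Dict[str, bool]) -> bool:
    """Evaluate f under the truth assignment `val`."""
    if isinstance(f, str):
        return val[f]
    op, *args = f
    vals = [evaluate(a, val) for a in args]
    if op == "not":
        return not vals[0]
    if op == "and":
        return vals[0] and vals[1]
    if op == "or":
        return vals[0] or vals[1]
    if op == "->":
        return (not vals[0]) or vals[1]
    raise ValueError(f"unknown connective {op!r}")


def is_tautology(f: Formula) -> bool:
    """Decide validity by checking all 2^n rows of the truth table."""
    names = sorted(atoms(f))
    return all(evaluate(f, dict(zip(names, bits)))
               for bits in product([False, True], repeat=len(names)))


peirce = ("->", ("->", ("->", "p", "q"), "p"), "p")  # Peirce's law
print(is_tautology(peirce))           # True
print(is_tautology(("->", "p", "q"))) # False
```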
I want to end these ideas by summarizing the steps that lie ahead of us in a new project.
The new project (for mathematics):
1 Investigate which parts of mathematics can(not) be formalized (i.e. contain ‘meta-arguments’), which formalization is most usable and allows the largest part of (practical) mathematics to be formalized, and completely formalize proof checking for as much of mathematics as possible.
2 Build a proof assistant (probably based on some form of Weak Type Theory (WTT) and some form of type theory (TT)).
3 Build a standard library (archive) of proved mathematics.
4 Further develop automated proving techniques (to build into the proof assistant).
Similarly, we can formulate the new project for computer systems:
The new project (for software construction):
1 Formalize as much of program derivation checking as possible.
2 Build a programming assistant (environment) based on a suitable (and preferably popular) programming language.
3 Build a standard library of reusable correct software (i.e. suitable for component-based software engineering) together with its specification.
4 Further develop automated proving and program derivation techniques.
One of the most important questions, part of step (1), has so far been avoided in this conclusion: what should we take as the basis of mathematics? This is one of the most difficult questions and, as we have seen, many great scientists have thought about it. There is currently no consensus on the best approach, and I am not in a position to give a well-argued opinion. Thorough research into the alternatives will have to yield the best approach and show which choice of foundational system is most usable in practice. The only thing I can say is that recently most people seem to favor type theory over category theory, relational calculi and set theory. P.J. Scott, for example, favors type theory over category theory in the introduction of [55]. H. Barendregt gives arguments for the use of type theory over set theory in [7], and we quote from [4, the second page of the preface]: “[People prefer the approach they are most familiar with.] However, those familiar with both type theory and axiomatic set theory recognize that in some ways the former provides a more natural vehicle than the latter for formalizing what mathematicians actually do”. On the other hand, at http://www.rbjones.com/rbjpuc/logic/jrh0111.htm we find a detailed assessment of the choice of a foundational system, which lists advantages of set theory over type theory. Several new kinds of logic have also been proposed, such as IF (independence-friendly) logic (see [37]) and several so-called ‘fuzzy logics’, but so far they seem to lack the precision, formalization and proofs needed to support the claim that they can be used successfully as a foundation for mathematics.
A final remark on the debate between type theory and axiomatic set theory as a foundational basis: if there is a mapping from the axioms of (some form of) set theory into (some form of) type theory and vice versa, then type-theoretic expressions have counterparts in set theory, and set-theoretic expressions have counterparts in type theory. It would be interesting to investigate whether among such mappings there is indeed a bijection. That would show that both theories are equivalent in expressive power, so that the debate could turn to the question of which theory is more intuitive and useful.
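What such a ‘counterpart’ means can be seen in a deliberately tiny illustration. It is our own example and far weaker than a translation between the theories themselves. The sketch maps the inductive natural numbers of type theory (modelled by a hypothetical Python class `Nat`) to von Neumann’s set-theoretic numerals, with 0 = ∅ and n + 1 = n ∪ {n}, and back again. On this single fragment the two maps are mutually inverse, i.e. they form a bijection between `Nat` values and von Neumann numerals. `from_set` assumes that its argument is such a numeral.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nat:
    """Inductive naturals as in type theory: Nat() is zero, Nat(n) is succ(n)."""
    pred: Optional["Nat"] = None


def to_set(n: Nat) -> frozenset:
    """von Neumann encoding: 0 = {} and succ(n) = n ∪ {n}."""
    if n.pred is None:
        return frozenset()
    s = to_set(n.pred)
    return s | {s}


def from_set(s: frozenset) -> Nat:
    """Inverse on von Neumann numerals: the numeral for k has exactly k elements."""
    n = Nat()
    for _ in range(len(s)):
        n = Nat(n)
    return n


three = Nat(Nat(Nat(Nat())))
print(len(to_set(three)))               # 3, since 3 = {0, 1, 2}
print(from_set(to_set(three)) == three) # True
```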
Some do not really believe in a successful formalization of mathematics, but rather see the indeterminacies in mathematical representations and the undecidabilities in any formal system as the source of problem solving and creative power (see [87, page 174]). This standpoint was already expressed in 1807 by the German philosopher Hegel (1770–1831) in [35]: “Dagegen muß behauptet werden, daß die Wahrheit nicht eine ausgeprägte Münze ist, die fertig gegeben und so eingestrichen werden kann” (in English: “Against this it must be maintained that truth is not a minted coin that can be issued ready-made and pocketed as it is”).
I am aware of the limitations of this report. Many chapters are still informal, such as the treatment of Frege’s work in chapter 4. The theory of types in chapter 7 and Gödel’s incompleteness theorem in chapter 8 are not completely covered, and certain subjects closer to logic (such as intuitionism) are treated only minimally. The only excuse I have is that it is simply not possible to study all the original works in such a short period of time and to include all the theory in this report. I hope to complete this work at a later stage. It might also be worthwhile to extend (on both sides) the period whose theory is treated in this report. Recently we have seen interesting new theories in category and type theory and even on the foundations of mathematics, for instance Chaitin’s results on randomness; it seems that he continued where Gödel and Turing left off. Finally, I would like to remark that the ‘new project’, consisting of the four steps mentioned in this conclusion, is just my own view of the work that lies ahead of us. To end with a concluding remark by Alan Turing, from his paper on the Turing test: “We can only see a short distance ahead, but we can see plenty there that needs to be done”.
Mark Scheffer, August 2001⁵

⁵ P.S. To those who wonder what the turtle and the elephant are doing on the cover of this report, I refer to the website http://zax.mine.nu/stage/.
Appendix A
Timeline and Images
Figure A.1: Luitzen Brouwer
Figure A.2: Georg Cantor
Drawings by Soshichi Uchii, suchii@bun.kyoto-u.ac.jp;
Photo Quine by Kelly Wise;
Photo Ramsey due to Harcourt, Brace, Jovanovich.
Figure A.3: Richard Dedekind
Figure A.4: Gottlob Frege
Figure A.5: Kurt Gödel
Figure A.6: David Hilbert
Figure A.7: John von Neumann
Figure A.8: Giuseppe Peano
Figure A.9: Henri Poincaré
Figure A.10: Willard Van Orman Quine
Figure A.11: Frank Plumpton Ramsey
Figure A.12: Bertrand Russell
Figure A.13: Alan Turing
Bibliography
[1] Y. Bar-Hillel, A.A. Fraenkel and A. Levy. Foundations of set theory. North-Holland Press, Amsterdam, 2nd edition, 1973. First edition 1958.
[2] W. Ackermann and D. Hilbert. Grundzüge der Theoretischen Logik, volume XXVII of Die Grundlehren der Mathematischen Wissenschaften in Einzeldarstellungen. Springer-Verlag, Berlin, first edition, 1928.
[3] J.H.J. Almering. Analyse. Delftse Uitgevers Maatschappij, 1993.
[4] P. Andrews. An introduction to mathematical logic and type theory: to truth through proof. Academic Press, 1986.
[5] J. Backer and P. Rudnicki. Hilbert’s basis theorem. Association of Mizar Users, University of Bialystok, 12, 2000. Published in Journal of Formalised Mathematics.
[6] H. Barendregt. The Lambda Calculus: Its Syntax and Semantics, volume 103. Elsevier Science Publishing Company, Inc., 1984.
[7] H. Barendregt and A.M. Cohen. Electronic Communication of Mathematics and the Interaction of Computer Algebra Systems and Proof Assistants. J. Symbolic Computation. Academic Press, 2001.
[8] H. Barendregt and H. Geuvers. Proof-checking using Dependent Type Systems, volume 2, chapter 18, pages 1149–1240 of Handbook of Automated Reasoning. Oxford Press, 2001.
[9] C.J. Bloo. Computational Models. TU/e Press, 2001. Manuscript originally started by H. Geuvers and J. Hooman.
[10] J. Breuer. Introduction to the Theory of Sets. Prentice-Hall, August 1958.
[11] Encyclopedia Britannica. P. Bernays. EB, 2000.
[12] K.S. Brown. Mathematics. Seanet, 1991.
[13] G. Cantor. Ein Beitrag zur Mannigfaltigkeitslehre. Journal f. reine und angew. Math., Gesammelte Abhandlungen, 84, pages 119–133, 1878. Translated in ‘Contributions to the foundation of the theory of transfinite numbers’ (translation from German) by Philip E. Jourdain, Dover Publishing, 1952.
[14] A. Church. An unsolvable problem of elementary number theory, volume 58. American Journal of Mathematics, 1936.
[15] P.J. Cohen. Set Theory and the Continuum Hypothesis. Benjamin, 1966.
[16] B.J. Copeland. The Church-Turing Thesis. Springer-Verlag, 1997. Item in Stanford Encyclopedia of Philosophy.
[17] H.C. Doets, D. van Dalen and H. de Swart. Sets: Naive, Axiomatic and Applied. Pergamon Press, 1978.
[18] J.W. Dauben. Georg Cantor, His Mathematics and Philosophy of the Infinite. Harvard University Press, 1979.
[19] M. Davis. The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions. Raven Press, New York, 1965.
[20] Various. Mathematische Annalen, 65. Springer-Verlag, Berlin, 1908.
[21] A. Einstein. Relativity: the special and general theory. Methuen Press, London, 1970.
[22] H. Eves. Mathematical Circles Revisited. Boston Press, 1971.
[23] H. Eves. Foundations and fundamental concepts of mathematics. Dover Publications Inc., Mineola, New York, third edition, 1997.
[24] A. Fraenkel. Einleitung in die Mengenlehre. Springer-Verlag, third edition, 1928.
[25] A.A. Fraenkel. Abstract Set Theory. North-Holland Press, Amsterdam, 3rd edition, 1966. First edition in 1953.
[26] M. Franssen. Cocktail. Eindhoven University Press, 2000. Doctoral thesis.
[27] K. Gödel. On formally undecidable propositions of Principia Mathematica and related systems. Dover Publications, New York, 1992. English translation of Gödel’s original 1931 publication of the incompleteness theorem. First published in 1962 by Basic Books, Inc., New York.
[28] D. Goldrei. Classic set theory, a guided independent study. Chapman and Hall, 1996.
[29] I. Grattan-Guinness. How did Russell write The Principles of Mathematics (1903)? McMaster University Library Press, 1997. In the Journal of the Bertrand Russell Archive.
[30] I. Grattan-Guinness. From the Calculus to Set Theory 1630–1910. Princeton University Press, 2000. First published in 1980 by G. Duckworth & Co, London.
[31] I. Grattan-Guinness. The Search for Mathematical Roots 1870–1940. Princeton University Press, 2000.
[32] I. Grattan-Guinness. A sideways look at Hilbert’s Twenty-three Problems of 1900. Middlesex University Press, 2000.
[33] J. Haim. Introduction of the Israel Mathematical Conference Proceedings, volume 6. Bar-Ilan University Press, 1993.
[34] P.R. Halmos. Naive Set Theory. Van Nostrand Press, London, 1990.
[35] G.W.F. Hegel. Phänomenologie des Geistes. Reprint: Meiner, Hamburg, 1807. English translation ‘The Phenomenology of Mind’ by J.B. Baillie in 1910, London.
[36] H. Hermes and H. Schulz. Mathematische Logik. Unknown, 1952. In Encyklopedia Mathematische Wissenschaften, I1, 1, I, page 58.
[37] J. Hintikka. The Principles of Mathematics Revisited. Cambridge University Press, 1996.
[38] A. Hodges. Turing. The Great Philosophers. Phoenix, 1997.
[39] A.D. Irvine. Bertrand Arthur William Russell. Stanford University Press, 2000.
[40] D. Joyce. Hilbert’s 1900 Address. Clark University, Worcester, 1997.
[41] D. Joyce. A list of Hilbert’s problems. Clark University, Worcester, 1997.
[42] D. Joyce. The Mathematical Problems of David Hilbert, http://aleph0.clarku.edu/~djoyce/hilbert/. Clark University, Worcester, 1997.
[43] F. Kamareddine and T. Laan. A reflection on Russell’s ramified types and Kripke’s hierarchy of truths. Journal of the Interest Group in Pure and Applied Logic, 4(2):195–213, 1996.
[44] F. Kamareddine and R. Nederpelt. A derivation system for a formal language of mathematics. To be published, July 2001.
[45] I. Kaplansky. Encyclopedia Britannica, item on David Hilbert. EB, 1990.
[46] E. Kasner and J. Newman. Mathematics and the imagination. New York Publishing, 1940.
[47] S.C. Kleene. Lambda-definability and recursiveness. Duke Mathematical Journal 2:340–353, 1936.
[48] S.C. Kleene. Mathematical Logic. New York, 1967.
[49] G.T. Kneebone. Mathematical logic and the foundations of mathematics. D. van Nostrand Company, 1963. Reprint 2001.
[50] J. Koenderink. Solid Shape. Cambridge, 1990.
[51] K. Kunen. Set theory: an introduction to independence proofs. New York Press, 1980.
[52] T. Laan. A formalization of the ramified type theory. TUE Computing Science Reports, 1994. Technical Report 9433.
[53] T. Laan. The Evolution of Type Theory in Logic and Mathematics. PhD thesis, Eindhoven University of Technology, 1997.
[54] T. Laan and R.P. Nederpelt. A modern elaboration of the ramified theory of types. Studia Logica, 57(2/3):243–278, 1996.
[55] J. Lambek and P.J. Scott. Introduction to higher order logic. Cambridge University Press, 2001.
[56] P. Linz. An introduction to formal languages and automata. D.C. Heath and Company, 1990.
[57] J.R. Lucas. The conceptual roots of mathematics. Routledge, 2000.
[58] D. MacHale. Comic Sections. Dublin, 1993.
[59] Moshé Machover. Set theory, logic and their limitations. Cambridge University Press, 1996.
[60] P. Mancosu. From Brouwer to Hilbert, the debate on the foundations of mathematics in the 1920s. Oxford University Press, 1998.
[61] E. Maor. To infinity and beyond. Boston Press, 1987.
[62] R. Monk. Russell. The Great Philosophers. Routledge, 1999. First published in 1997 by Phoenix.
[63] G.H. Moore. Zermelo’s axiom of choice: its origins, development and influence. Springer-Verlag, 1982.
[64] E. Nagel and J. R. Newman. Gödel’s proof. New York University Press, 1986. First published in 1958.
[65] G. Peano. Calcolo differenziale e principii di calcolo integrale. Turin Press, 1884.
[66] G. Peano. Applicazioni geometriche del calcolo infinitesimale. Turin Press, 1887.
[67] G. Peano. Calcolo geometrico secondo l’Ausdehnungslehre di H. Grassmann e preceduto dalle operazioni della logica deduttiva. Fratelli Bocca, Torino, 1888. English translation ‘Geometric Calculus: According to the Ausdehnungslehre of H. Grassmann’ by Lloyd Kannenberg, November 1999, Publisher Birkhäuser.
[68] G. Peano. Dizionario di matematica. Parte prima. Logica matematica. Unknown, 1901. In Rivista di matematica, edited by Peano.
[69] L.J.J. Wittgenstein, P.M. Sullivan. The foundations of mathematics. Unknown, June 1927. Reprinted by F. P. Ramsey, June 1927, Theoria 61 (2) (1995), pages 105–142.
[70] W. Van Orman Quine. Mathematical Logic. Harvard University Press, 1951. Revised edition of Norton, New York 1940.
[71] W. Van Orman Quine. From a Logical Point of View: 9 Logico-Philosophical Essays. Harvard University Press, 2nd edition, 1961. Cambridge, Massachusetts.
[72] W. Van Orman Quine. Set Theory and its Logic. Harvard University Press, 1963. Cambridge, Massachusetts.
[73] R.C.W. Bertrand Russell entry in Encyclopedia Britannica. EB, 2000.
[74] J. Richard. Les principes de mathématiques et le problème des ensembles. Revue générale des sciences pures et appliquées, 16, 1905. Published also in Acta Mathematica 30 (1906), pages 295–296.
[75] B. Riemann. Über die Hypothesen, welche der Geometrie zu Grunde liegen. Göttingen Press, 1854.
[76] N. Rose. Mathematical Maxims and Minims. Raleigh NC, 1988.
[77] H. Rubin and J.E. Rubin. Equivalents of the axiom of choice. North-Holland Press, Amsterdam, 1963.
[78] B. Russell. My philosophical development. London: George Allen and Unwin, New York: Simon and Schuster, 1959.
[79] B. Russell. Introduction to Mathematical Philosophy. The Great Philosophers. London: George Allen and Unwin; New York: The Macmillan Company, 1999. First published in 1997 by Phoenix.
[80] B. Russell. The autobiography of Bertrand Russell. Routledge, 2000.
[81] S. Shelah. Proper forcing, lecture notes in mathematics. Springer-Verlag, 1982.
[82] M. Sipser. Introduction to the theory of computation. PWS Publishing Company, Boston, 1997.
[83] A.T. Skolem. Einige Bemerkungen zur axiomatischen Begründung der Mengenlehre. Akademiska Bokhandeln, Helsinki, 1922. In ‘Matematikerkongressen i Helsingfors 4–7 juli 1922, Den femte skandinaviska matematikerkongressen’, pages 217–232. Reprinted in ‘Selected Works in Logic’, by A.T. Skolem, edited by Jens E. Fenstad, 1970, Publisher Universitetsforlaget, Oslo.
[84] R.M. Smullyan. Gödel’s incompleteness theorems. Oxford Logic Guides. Oxford University Press, 1992.
[85] B. Sobocinski. L’analyse de l’antinomie Russellienne par Lesniewski. Unknown, 1950. Methodus I, pages 94–107, 220–228, 308–316; Methodus II, pages 237–257.
[86] F. Kamareddine, T. Laan and R. Nederpelt. Types in Logic and Mathematics before 1940, volume 8. Bulletin of Symbolic Logic, January 2002. To be published.
[87] M. Tiles. Mathematics and the image of reason. Routledge, 1991.
[88] E.C. Titchmarsh. Mathematical Maxims and Minims. Rome Press, 1988.
[89] A.M. Turing. On computable numbers, with an application to the Entscheidungsproblem, volume 42, pages 230–265 of 2. London Mathematical Society, 1936. With corrections from Proceedings of the London Mathematical Society, Series 2, Vol. 43 (1937), pages 544 to 546. Reprinted with some annotations in ‘The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions’, ed. Martin Davis, 1965, Raven Press, New York.
[90] A.M. Turing. Intelligent Machinery. National Physical Laboratory, 1948. National Physical Laboratory Report in ‘Machine Intelligence 5’ by Meltzer, B. and Michie, D., 1969, Edinburgh University Press.
[91] Unknown. Encyclopedia Britannica; Item on Principia Mathematica. EB, 2000.
[92] Unknown. Encyclopedia Britannica; Item on Turing. EB, 2000.
[93] J. van Heijenoort. From Frege to Gödel: source book in mathematical logic 1879–1931. Harvard University Press, 1967.
[94] W. van Orman Quine. New foundations for Mathematical Logic. The American Mathematical Monthly, February 1937. 44(2), pages 70–80.
[95] Various. The Mathematical Intelligencer, volume 13. Springer-Verlag, Berlin, 1991.
[96] J. von Neumann. Zur Einführung der transfiniten Zahlen. Acta Szeged. 1:199–208 [I, 3], 1923.
[97] J. Weiner. Frege in Perspective. Cornell, 1990.
[98] J. Weiner. Frege. Past Masters. Oxford University Press, 1999.
[99] A. Whitehead. An introduction to Mathematics. Williams and Norgate, London, 1911.
[100] A. Whitehead. A treatise on universal algebra. New York, 1960.
[101] E. Zermelo. Untersuchungen über die Grundlagen der Mengenlehre, I. Springer-Verlag, 1908. In Mathematische Annalen 65, 1908, pages 261–281.
. and t(x0 . . . 1 Notation originally due to E. . .CONTENTS 7 Mathematical Notations Many diﬀerent notations have been developed for set theory and logic. with Q a quantiﬁer. . xn . . . xn ) the term of the quantiﬁcation. xn ) using an operator or relation indicated by Q. . .W. . xn ) : t(x0 . Mathematical Logic symbol meaning ∧ ∨ ¬ ϕ(x) → ↔ ≡ ∀ ∃ ∃! conjuction disjunction (inclusive) negation propositional function implication biimplication equivalence universal quantiﬁer existential quantiﬁer oneelement existential quantiﬁer also described as and or not if . . . . xn ) for which ϕ(x0 . other notations that we have used are introduced in the text. . . . xn ) holds. over all values (x0 . . The quantiﬁcation is the accumulation of values t(x0 . . iﬀ is equivalent to for all exists exists a unique In most places we have chosen to use the following notation1 to denote quantiﬁcations: (relation : range : term) denotes the relationship over a set of terms ranging over range Consider a general pattern (Q x : ϕ(x0 . . . . Most notations that we have used are standard today. Dijkstra. . . xn )).. . . . . ϕ a boolean expression in terms of the dummies x0 . then if and only if. . .
or if it is clear what the range of a variable is. Example: (∀x : true : x ∈ A → x ∈ B) ≡ (∀x :: x ∈ A → x ∈ B) ≡ ‘all elements of A are also elements of B’ . Example: ( = 02 + 12 + 22 + 32 + 42 + 52 = 5 x : 0 ≤ x ≤ 5 : x2 ) x2 x=0 Example: (∃x : x ∈ N : x3 − x2 = 18) ≡ ‘there exists a natural number x such that x3 − x2 = 18’ If the term ranges over all possible values of the variable (here : x).e. This allows us to reason about general properties of quantiﬁcations. we can omit it. it exactly determines the domain of the quantiﬁcation). Note that this type of quantiﬁcation is only suitable for binary operations that are symmetric and associative.8 CONTENTS This notation is suitable for formal manipulation and unambiguous in the sense that it explicitly indicates the quantiﬁer Q. in a way in which the (scopes of the) bound variables are clearly identiﬁed. the dummies and the range of the dummies that is indicated by the boolean expression ϕ (i.
Chapter 1 Introduction
Pure mathematics is, in its way, the poetry of logical ideas.  Albert Einstein This report covers the most important developments and theory of the foundations of mathematics in the period of 1870 to 1940. The tale of the foundations is fairly familiar in general terms and for its philosophical content; here the main emphasis is laid on the mathematical theory. The history of the foundations of mathematics is complicated and is a manysided story; with this article I do not aim to give a deﬁnitive or complete version, but to capture what I consider the essence of the theoretical developments, and to present them in a clear and modern setting. Some basic mathematical knowledge on settheory and logics are presupposed. By the middle of the nineteenth century, certain logical problems (for example paradoxes around the notions of inﬁnity, the inﬁnitesimal and continuity) at the heart of mathematics had inspired a movement, led by German mathematicians, to provide mathematics with more rigorous foundations. This is where the theory of this report begins, with the emergence of set theory by the German mathematician Cantor. In section 2.1 we informally describe how work on a problem concerning trigonometric series gradually led Cantor to his theory of sets (section 2.2). As a result of the work of Weierstrass, Dedekind and Cantor, pure mathematics had been provided with much more sophisticated foundations. The notion of inﬁnitesimal had been banished, ‘real’ numbers had been provided with a logically consistent 9
definition (section 3.5), continuity had been redefined and, more controversially, a whole new branch of arithmetic had been invented which addressed itself to the problems (e.g. paradoxes) of infinity (sections 3.6, 3.7). In 1895 Cantor discovered a paradox (section 3.8.1) that he did not publish but communicated to Hilbert in 1896. In 1897 it was rediscovered in a slightly different form by Burali-Forti (section 3.8.2). Cantor and Burali-Forti could not resolve this paradox, but it was not taken very seriously, partly because it arose in a rather technical area. The Italian mathematician Peano (section 4.1) was able to show that the whole of arithmetic could be founded upon a system that uses three basic notions and five initial axioms. At the same time the German mathematician Frege (section 4.2) worked on developing a logical basis for mathematics. Like Peano, Frege wanted to put mathematics on firm ground, but Frege’s ground was strictly logical; he followed a program later called logicism, closely tied to the development of so-called mathematical logic. The British mathematician Russell noted Peano’s work and later that of Frege. Soon thereafter he showed (section 5.1) how finite descriptions like ‘set of all sets’ could be self-contradictory (i.e. paradoxical) and pointed out the difficulties that arose with self-referential terms. Unlike the earlier paradoxes, the paradox that Russell found was not confined to a specific technical area: it affected the axiomatic systems underlying all of mathematics (section 5.1). But since the paradoxes could be avoided in most practical applications of set theory, the belief in set theory as a proper foundation of mathematics remained. Various responses to the paradox (section 5.2) led to new sets of axioms for set theory; axiomatic set theory (section 5.3.1) was an attempt to arrive at a theory without paradoxes.
The two main approaches are by the German mathematicians Zermelo and Fraenkel (section 5.3), and by the Hungarian von Neumann, the Hungarian-Austrian Gödel and the Briton Bernays (section 8.5). The paradox also led to the emergence of the ‘intuitionistic’ philosophy of mathematics by the Dutch mathematician Brouwer (not covered here) and to a theory of types, proposed by Russell himself with the help of his former teacher, the English mathematician Whitehead. Despite the paradox, Russell and Whitehead still claimed that all mathematics could be founded on a mathematical logic; this belief was given a definite presentation in their work ‘Principia Mathematica’ (section 7.1). Various consequences followed (section 7.3) and new conceptions of logic arose (by Wittgenstein and
Ramsey, see section 7.2). At the turn of the century, the German mathematician David Hilbert listed certain important problems concerning the foundations of mathematics and mathematics in general (section 6.2). To overcome paradoxes and other problems that arose in existing systems, Hilbert developed a theory of axiomatic systems (section 6.1). He then encouraged his student Zermelo to use this axiomatic method to develop the first set of axioms for set theory (section 5.3.2). Hilbert later made more precise demands on any proposed set of axioms for mathematics (section 6.1) in terms of consistency, completeness and decidability. In 1931 Gödel showed that consistency and completeness could not both be attained (chapter 8). Gödel’s work left open Hilbert’s question of decidability. The English mathematician Turing proved in 1936 that there are undecidable problems: after formalizing the notion of algorithm with his concept of the Turing Machine, he showed that the so-called halting problem cannot be solved by any algorithm (section 9.1). The American mathematician Church independently obtained the same result with another formalization of the notion of an algorithm, using his computational model of lambda calculus (section 9.2). In section 9.3 we state that these two notions are equivalent and correspond to the intuitive notion of algorithm or computability. In chapter 10 I summarize the theory of the foundations of mathematics, before giving my own opinion and making some suggestions for future work. This article is part of the practical component of my study of computing science, and was written largely in 8 weeks at Heriot-Watt University in Edinburgh under the supervision of Prof. F. Kamareddine. I want to thank Rob Nederpelt and the formal methods section of the computing science department of the Eindhoven University of Technology for making this possible.
Rob Nederpelt always inspired me to continue working on this report and was patient in explaining difficult proofs to me. And last but not least, I want to thank Fairouz Kamareddine for her support and positive motivation, and Boukje Nouwen (as she breathes a sigh of relief that this is (I think) the last revision) for the typesetting and editing of large parts of this document and for helping me in many ways to finish this article in such a short period of time.
Chapter 2 Cantor’s paradise

2.1 The beginning of set theory

Perhaps the most surprising thing about mathematics is that it is so surprising. The rules which we make up at the beginning seem ordinary and inevitable, but it is impossible to foresee their consequences. These have only been found out by long study, extending over many centuries. Much of our knowledge is due to a comparatively few great mathematicians such as Newton, Euler, Gauss, or Riemann; few careers can have been more satisfying than theirs. They have contributed something to human thought even more lasting than great literature, since it is independent of language.
E.C. Titchmarsh, quoted in [88]

By the late 19th century the discussions about the foundations of geometry had become the focus for a running debate about the nature of the branches of mathematics ([23, chapter 2, page 51]). Although there had been no conscious plan leading in that direction, the stage was set for a consideration of questions about the fundamental nature of mathematics. In the study of logic, the work of the English mathematician George Boole in the 1850s ([49, last paragraph of section 35, page 69/70]) and of the American Charles Sanders
Peirce around 1880 ([49, page 187]) had contributed to the development of a symbolism to explore logical deductions, and in Germany the logician Gottlob Frege (see [98]) had directed keen attention to fundamental questions. All of these debates came together through the pioneering work of the German mathematician Georg Cantor on the concept of a set. At the University of Halle in Germany, the direction of Cantor’s research turned away from number theory and towards analysis. This was due to Heine, one of his senior colleagues at Halle, who challenged Cantor to prove the open problem on the uniqueness of representation of a function as a trigonometric series (see [30, page 182]). Cantor had begun work in this area because of his interest in Riemann’s theory of trigonometric series. Starting from the work on trigonometric series and on functions of a complex variable done by the German mathematician Bernhard Riemann (see [75]) in 1854, Cantor showed in 1870 ([30, section 5.2, page 182]) that such a function can be represented in only one way by a trigonometric series. Consideration of the collections of numbers (originally termed ‘point sets’, see [30, section 5.5, page 184]) that would not conflict with such a representation led him, first, in 1872, to define irrational numbers in terms of convergent sequences of rational numbers (or quotients of integers, see section 3.5.2) and then to begin his major lifework, the theory of sets and the concept of transfinite numbers.
2.2 Basic concepts

The essence of mathematics lies in its freedom.
Georg Cantor, quoted in [58]

In 1874 Cantor published his first article on set theory. What is set theory? A branch of mathematics that deals with the properties of well-defined collections of objects, which may be of a mathematical nature, such as numbers or functions, or not. What is a set? A (finite or infinite) collection of objects that is considered as a single, abstract object. A set, wrote Cantor (in ‘Untersuchungen über die Grundlagen der Mengenlehre I’, published in [20, pages 261-281]), is “a collection of definite, distinguishable objects of perception or thought conceived as a whole”. In this report we use a similar description of the concept of a set. A set is sometimes also called aggregate, class or (as it was first called by Riemann (see [31, page 88]) and later by the mathematician Russell) manifold. The objects are also called elements or members of the set. The objects of a set themselves can also be sets. In a mathematical context we mostly consider sets of numbers and functions. We denote a set by listing its elements between the braces ‘{’ and ‘}’, and membership of an element to a set by the membership relation ∈.

Example: If we consider a set that contains natural numbers, we write 4 ∈ {2, 3, 4, 5} to indicate that 4 is an element of the set {2, 3, 4, 5}. We write 4 ∉ {7, 8, 9} to indicate that 4 is not an element of the set {7, 8, 9}.

We denote the well-known sets of natural numbers by N (this set is also called the naturals), the integers by Z, the fractional numbers by Q (this set is also called the rationals) and the reals by R (this set is also called the continuum).
Cantor defined ([49, page 288]) two sets A and B to be identical (equal), notation A = B, if and only if A and B have the same elements. When set theory was later axiomatized, this definition also became known as the

Axiom of extensionality: A = B := (∀x :: (x ∈ A ↔ x ∈ B))

Example: {3, 4, 7} = {7, 3, 4} and {{2, 3}, {3, 4}} = {{3, 4}, {2, 3}}.

The relation ‘is a subset of’, notation ⊆, indicates that one set is contained in the other:
Definition of subset: A ⊆ B := (∀x :: x ∈ A → x ∈ B)
Definition of proper subset: A ⊂ B := A ⊆ B ∧ A ≠ B

We often want to create a new set from a given set by selecting elements that have certain properties. For example we take the set of powers of three or the set of all even numbers (to be exact: the set containing those elements of the set of natural numbers that have the property to be divisible by 2).

Comprehension principle: For all properties ϕ there is precisely one set, denoted by {x | ϕ(x)}, whose elements are exactly those objects which have the property ϕ. We thus have that y ∈ {x | ϕ(x)} ↔ ϕ(y).

This principle was used by Cantor, and we also call it the unrestricted or naive comprehension principle because it later (see sections 3.8 and 5.1) turned out to be untenable. As a consequence (by taking ϕ(x) = false for all x), there is at least one set that has no elements: the empty set, denoted by ∅.

Theorem: (∃!x :: (∀y :: y ∉ x))
Proof: If we take ϕ to be false, the comprehension principle says that ‘there is precisely one set whose elements are exactly those objects which have the property false’. In mathematical notation: (∃!x :: (∀y :: y ∈ x ↔ false)). This is equivalent to saying that there is no element y that can be a member of x: (∃!x :: (∀y :: y ∉ x)). From now on, we denote this unique set x by ∅ and call it the empty set.
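For finite sets, Python’s `frozenset` compares extensionally, so the definitions above can be tried out directly. This is a minimal sketch, assuming a small finite universe (range(20)) in place of ‘all objects’, which the naive comprehension principle quantifies over and code cannot; the helper names `comprehension`, `is_subset` and `is_proper_subset` are our own.

```python
# Finite stand-in for "all objects": naive comprehension cannot be coded
# over an unrestricted universe, so we restrict it (an assumption).
UNIVERSE = range(20)


def comprehension(phi):
    """{x | phi(x)}, restricted to UNIVERSE."""
    return frozenset(x for x in UNIVERSE if phi(x))


def is_subset(a, b):
    # A ⊆ B := (∀x :: x ∈ A → x ∈ B)
    return all(x in b for x in a)


def is_proper_subset(a, b):
    # A ⊂ B := A ⊆ B ∧ A ≠ B
    return is_subset(a, b) and a != b


def is_power_of_three(x):
    while x > 1 and x % 3 == 0:
        x //= 3
    return x == 1


evens = comprehension(lambda x: x % 2 == 0)
powers_of_three = comprehension(is_power_of_three)
empty = comprehension(lambda x: False)  # taking phi to be false

# Extensionality: order and repetition of the listed elements do not matter
print(frozenset([3, 4, 7]) == frozenset([7, 3, 4, 4]))  # True
print(sorted(powers_of_three))                          # [1, 3, 9]
print(is_subset(empty, evens))                          # True, as in the corollary
```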
Corollary: (∀a :: ∅ ⊆ a)
Proof: We want to prove that (∀a :: ∅ ⊆ a) or, using the definition of the subset relation, that (∀a :: (∀x :: x ∈ ∅ → x ∈ a)). From the previous theorem we know that (∀y :: y ∉ ∅), so x ∈ ∅ is false for every x. This yields (∀x :: false → x ∈ a), which is true.

We can create new sets from given sets by applying the comprehension principle. So now we can introduce some operations on sets. But before we do that, we first introduce some general properties of operations (i.e. regardless of whether the operations are set-theoretic or not): idempotence, commutativity, associativity and distributivity. Although Cantor did not formulate these properties as such, they are used in calculus and are useful in the set theory that follows in this chapter. Suppose ⊕ and ⊗ are binary¹ operations on a certain domain on which we have defined the equality relation ‘=’, and E, F and G are elements of that domain (for example sets).

Definition of idempotence: ⊕ is idempotent := (∀E :: E ⊕ E = E)
Definition of commutativity: ⊕ is commutative := (∀E, F :: E ⊕ F = F ⊕ E)
Definition of associativity: ⊕ is associative := (∀E, F, G :: (E ⊕ F) ⊕ G = E ⊕ (F ⊕ G))
Definition of distributivity: ⊕ is distributive² over ⊗ := (∀E, F, G :: E ⊕ (F ⊗ G) = (E ⊕ F) ⊗ (E ⊕ G))

¹ These properties can also be generalized to operations of arbitrary arity, but this will not be necessary for our discussion.
² This form of distributivity is also called left-distributivity, as opposed to right-distributivity: ⊕ is right-distributive over ⊗ := (∀E, F, G :: (E ⊗ F) ⊕ G = (E ⊕ G) ⊗ (F ⊕ G)). In ordinary mathematics this distinction is often left out for commutative operations, and we for example simply say that × is distributive over + (when in fact it is both left- and right-distributive).
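On a finite domain these four properties can be checked mechanically by trying every combination of elements. This is a minimal sketch with hypothetical helper names; note that a check on a finite sample such as range(-3, 4) can only refute a property of + or × on the integers, never prove it.

```python
from itertools import product


def idempotent(op, dom):
    return all(op(e, e) == e for e in dom)


def commutative(op, dom):
    return all(op(e, f) == op(f, e) for e, f in product(dom, repeat=2))


def associative(op, dom):
    return all(op(op(e, f), g) == op(e, op(f, g))
               for e, f, g in product(dom, repeat=3))


def left_distributive(op, other, dom):
    # E ⊕ (F ⊗ G) = (E ⊕ F) ⊗ (E ⊕ G), with op = ⊕ and other = ⊗
    return all(op(e, other(f, g)) == other(op(e, f), op(e, g))
               for e, f, g in product(dom, repeat=3))


def right_distributive(op, other, dom):
    # (E ⊗ F) ⊕ G = (E ⊕ G) ⊗ (F ⊕ G)
    return all(op(other(e, f), g) == other(op(e, g), op(f, g))
               for e, f, g in product(dom, repeat=3))


def add(a, b):
    return a + b


def mul(a, b):
    return a * b


dom = range(-3, 4)
print(left_distributive(mul, add, dom), right_distributive(mul, add, dom))  # True True
print(left_distributive(add, mul, dom))  # False: 1 + 1*1 differs from (1+1)*(1+1)
print(idempotent(max, dom), idempotent(add, dom))  # True False
```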
The symbol ∪ is employed to denote the union of two sets. Thus, the set A ∪ B is defined as the set that consists of all elements belonging to set A or to set B (or to both).
Definition of union: A ∪ B := {x | x ∈ A ∨ x ∈ B}
The intersection operation is denoted by the symbol ∩. A ∩ B is defined as the set composed of all elements that belong to both A and B.
Definition of intersection: A ∩ B := {x | x ∈ A ∧ x ∈ B}
Any two sets whose intersection is the empty set are said to be disjoint. A collection of sets is called (pairwise) disjoint or mutually exclusive if any two distinct sets in it are disjoint.
Example: The operations union and intersection on sets are both idempotent, commutative and associative.
The difference of sets B and A, denoted B − A, contains those elements of B that are not in A.
Definition of difference: B − A := {x | x ∈ B ∧ x ∉ A}
If A ⊆ B we often call the difference B − A the relative complement of A in B. We then call B the universe, and if it is clear what the universe is, we often denote the relative complement of A by A^C.
From the definitions that we have introduced so far, we can deduce three properties that are known as the laws of reciprocity. The second and third law are also known as the laws of de Morgan, named after the English mathematician Augustus de Morgan:
First law of reciprocity: A ⊆ B ↔ A^C ⊇ B^C
Second law of reciprocity: (A ∪ B)^C = A^C ∩ B^C
Third law of reciprocity: (A ∩ B)^C = A^C ∪ B^C
We define the power set of V, denoted by P(V), as the set of all subsets of V. This operation thus creates a larger set from a given set V; note that even for V = ∅ we get P(V) = {∅}, which has one element more than V.
Definition of power set: P(V) := {A | A ⊆ V}
Given a set V, we thus have that (∀y :: y ∈ P(V) ↔ y ⊆ V).
We can extend the union of a pair of sets to any (finite or infinite) collection of sets A: the union ⋃A is then defined as the set of all objects which belong to at least one set in the collection A. We can do the same for the intersection ⋂A.
Definition: ⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)}
Definition: ⋂A := {x | (∀y :: y ∈ A → x ∈ y)}
We can divide a set of objects into a partition, that is, a family of subsets that are mutually exclusive and jointly exhaustive. Assume P is a set of subsets of X.
Definition of partition: P is a partition of X := X = ⋃{A | A ∈ P} ∧ (∀A, B : A, B ∈ P : A = B ∨ A ∩ B = ∅)
In this chapter I have made extensive use of [30] in section 2.1 and [17] in section 2.2.
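The set operations of this section can be sketched with Python’s `frozenset`, which provides `|`, `&` and `-` for union, intersection and difference. This is a minimal sketch, assuming a small universe U = {0, 1, 2, 3} for the relative complement; the helper names are hypothetical. It checks the three laws of reciprocity exhaustively over all subsets of U, computes power sets and generalized unions and intersections, and tests the partition condition.

```python
from itertools import chain, combinations

U = frozenset(range(4))  # the universe for complements (an assumption)


def power_set(v):
    # P(V) := {A | A ⊆ V}
    items = list(v)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}


def complement(a):
    return U - a  # relative complement of A in U


def big_union(family):
    # ⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)}
    return frozenset(x for y in family for x in y)


def big_intersection(family):
    # ⋂A := {x | (∀y :: y ∈ A → x ∈ y)}; only defined here for non-empty A
    if not family:
        raise ValueError("empty family: the intersection would be 'everything'")
    return frozenset.intersection(*family)


def is_partition(p, x):
    # X = ⋃P and any two distinct members of P are disjoint
    return big_union(p) == frozenset(x) and all(
        a == b or not (a & b) for a in p for b in p)


def reciprocity_laws_hold():
    subsets = power_set(U)
    return all(
        ((a <= b) == (complement(a) >= complement(b)))
        and complement(a | b) == complement(a) & complement(b)
        and complement(a & b) == complement(a) | complement(b)
        for a in subsets for b in subsets)


print(reciprocity_laws_hold())                    # True
print(len(power_set(U)), power_set(frozenset()))  # 16 {frozenset()}
```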
Chapter 3 Mathematical constructs in set theory

3.1 Some mathematical concepts

The mathematician is entirely free, within the limits of his imagination, to construct what world he pleases. What he is to imagine is a matter for his own caprice; he is not thereby discovering the fundamental principles of the universe nor becoming acquainted with the ideas of God. If he can find, in experience, sets of entities which obey the same logical scheme as his mathematical entities, then he has applied his mathematics to the external world; he has created a branch of science.
J.W.N. Sullivan in Aspects of Science, 1925

Now that we have this apparatus of set theory available, we will see that it is not just a separate branch of mathematics, but that we can define some basic mathematical constructs in set theory. In this section we will consider pairs and the cartesian product, which are necessary before we can treat relations (in section 3.2) and functions (in section 3.3). First we consider the mathematical concept of an ordered pair <a, b>. Compared to a ‘normal’ pair, where two pairs are considered equal if they have the same elements, we want an ordered pair to also have the property
that the elements appear in the same order:
(∀c, d :: <a, b> = <c, d> ↔ a = c ∧ b = d)
We can now easily verify that the following definition (see [17, chapter 8]) in set theory satisfies the desired property.
Definition of ordered pair¹: <a, b> := {{a}, {a, b}}
As the cartesian product A × B is by definition the set of all ordered pairs <a, b> with a ∈ A and b ∈ B, we can now use the same definition in set theory:
Definition of cartesian product: A × B := {<a, b> | a ∈ A ∧ b ∈ B}
We now define the cartesian product of a set of sets. Let V = {Vi | i ∈ I} be a set of sets; its cartesian product is denoted by ×V or ×_{i∈I} Vi. The definition uses the concept of a function, which will be introduced in section 3.3.
Definition of cartesian product of a set of sets: ×V := {f : I → ⋃_{i∈I} Vi | (∀i : i ∈ I : f(i) ∈ Vi)}
¹ Representation originally by Kuratowski, see [49, page 294].
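The claim that Kuratowski’s definition satisfies the characteristic property of ordered pairs can be checked exhaustively on a small domain, using nested `frozenset`s to model sets of sets. This is a minimal sketch; `kpair` and `cartesian` are hypothetical names, and an exhaustive check over {0, 1, 2, 3} illustrates, but of course does not prove, the property.

```python
from itertools import product


def kpair(a, b):
    # <a, b> := {{a}, {a, b}}
    return frozenset({frozenset({a}), frozenset({a, b})})


def cartesian(xs, ys):
    # A × B := {<a, b> | a ∈ A ∧ b ∈ B}
    return {kpair(a, b) for a in xs for b in ys}


D = range(4)
# <a, b> = <c, d>  ↔  a = c ∧ b = d, for all a, b, c, d in D
characteristic = all(
    (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(D, repeat=4))

print(characteristic)                     # True
print(kpair(1, 2) == kpair(2, 1))         # False: the order matters
print(kpair(3, 3))                        # frozenset({frozenset({3})})
print(len(cartesian({1, 2}, {3, 4, 5})))  # 6
```

The degenerate case <a, a> = {{a}} is the reason the check includes pairs with equal components.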
3.2 Relations

Mathematicians do not study objects, but relations between objects. Thus, they are free to replace some objects by others as long as the relations remain unchanged. Content to them is irrelevant: they are interested in form only.
J.H. Poincaré

In mathematics, a relation assigns to each pair of elements from its input sets either true or false: the pair is related or it is not. We formalize this notion in set theory.
Definition of binary relation: R is a binary relation between X and Y := R ⊆ X × Y
Note: We can easily generalize this definition to n-ary relations: R is an n-ary relation on X1, ..., Xn := R ⊆ X1 × X2 × ... × Xn, for n ∈ N. We call n the arity of the relation.
We define the following shorthand notation (sometimes also written in infix notation as x R y): R(x, y) := <x, y> ∈ R
Example: The relation < on the naturals (i.e. between N and N) can be defined as: {<0, 1>, <0, 2>, <0, 3>, ..., <1, 2>, <1, 3>, <1, 4>, ..., <2, 3>, <2, 4>, <2, 5>, ...}. The mathematical expression ‘x < y’ is now equivalent to the set theoretic expression ‘<x, y> ∈ R’, with R representing the ‘less than’ relation.
Example: We have already seen the definitions of the subset and proper subset relations in section 2.2. There we defined the set R ⊆ X × Y implicitly by using a statement: only those pairs <x, y> are in R for which the statement holds (here we are in fact using the comprehension principle of section 2.2). We will continue to use statements to define relations.
On a relation R we can define the concepts of domain and range.
Definition of domain, range: dom(R) := {x ∈ X | (∃y : y ∈ Y : R(x, y))}, ran(R) := {y ∈ Y | (∃x : x ∈ X : R(x, y))}
If we define the identity relation of X, we want it to have the usual property that id_X(x) = x for all x ∈ X (see for example [3, page 30]). In set theory, we denote the identity relation on V by I_V. Note that we have already defined an equality relation ‘=’ on X in section 2.2.
Definition of identity relation: I_V := {<x, y> ∈ V × V | x = y}
Assume R is a binary relation on a set X (i.e. R ⊆ X × X). As we did for operations in section 2.2, we can also define some general properties of relations. We can explicitly state on which domain the property holds (e.g. R is reflexive on X) or leave this implicit (e.g. simply R is reflexive).
Definition of reflexivity: R is reflexive := (∀x : x ∈ X : R(x, x))
Definition of irreflexivity: R is irreflexive := (∀x : x ∈ X : ¬R(x, x))
Definition of symmetry: R is symmetric := (∀x, y : x, y ∈ X : R(x, y) → R(y, x))
Definition of antisymmetry: R is antisymmetric := (∀x, y : x, y ∈ X : R(x, y) ∧ R(y, x) → x = y)
Definition of transitivity: R is transitive := (∀x, y, z : x, y, z ∈ X : R(x, y) ∧ R(y, z) → R(x, z))
Definition of connectivity: R is connective := (∀x, y : x, y ∈ X : R(x, y) ∨ (x = y) ∨ R(y, x))
Definition of equivalence: R is an equivalence relation := R is reflexive, symmetric and transitive
Note: A relation is asymmetric if (∀x, y : x, y ∈ X : R(x, y) → ¬R(y, x)). Asymmetry is not the same as simply not being symmetric, and it is also not the same as antisymmetry.
If R is an equivalence relation on X, we denote the equivalence class of x with respect to R as [x]_R.
Definition of equivalence class: [x]_R := {y ∈ X | R(x, y)}
If R is an equivalence relation on a set X, the quotient set X/R of X modulo R is the set of equivalence classes [x]_R for all x ∈ X.
Definition of quotient set: X/R := {[x]_R | x ∈ X}
We now continue to build on the concept of relations, by categorizing them based on the properties they have. An important property of relations is the ability to compare and order elements. Suppose X is a set and R is a relation on X.
Definition of (weak) partial ordering: R is a (weak) partial ordering := R is reflexive, antisymmetric and transitive (on X)
Definition of quasi ordering: R is a quasi ordering := R is irreflexive and transitive
Definition of strict partial ordering: R is a strict partial ordering := R is irreflexive, antisymmetric and transitive
Definition of (total or linear) ordering: R is a (total or linear) ordering := R is irreflexive, antisymmetric, transitive and connective
Definition of well-ordering: R is a well-ordering := R is an ordering on X and each nonempty subset of X has a least element
Example: The subset relation is reflexive, antisymmetric (note that the proof of antisymmetry uses the axiom of extensionality of section 2.2) and transitive, but not connective.
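For a finite set X, a relation can be stored literally as a set of pairs, and each property above becomes a quantifier over X. This is a minimal sketch with hypothetical helper names, applied to ≤ and < on {0, 1, 2, 3}, to the subset relation on the subsets of {1, 2}, and to congruence modulo 3 on {0, ..., 8} for the quotient set.

```python
from itertools import product


def reflexive(r, xs):
    return all((x, x) in r for x in xs)


def irreflexive(r, xs):
    return all((x, x) not in r for x in xs)


def symmetric(r, xs):
    return all((y, x) in r for x, y in product(xs, repeat=2) if (x, y) in r)


def antisymmetric(r, xs):
    return all(x == y for x, y in product(xs, repeat=2)
               if (x, y) in r and (y, x) in r)


def transitive(r, xs):
    return all((x, z) in r for x, y, z in product(xs, repeat=3)
               if (x, y) in r and (y, z) in r)


def connective(r, xs):
    return all((x, y) in r or x == y or (y, x) in r
               for x, y in product(xs, repeat=2))


def quotient_set(r, xs):
    # X/R := {[x]_R | x ∈ X} with [x]_R := {y ∈ X | R(x, y)}
    return {frozenset(y for y in xs if (x, y) in r) for x in xs}


X = range(4)
LEQ = {(x, y) for x, y in product(X, repeat=2) if x <= y}
LT = {(x, y) for x, y in product(X, repeat=2) if x < y}

S = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
SUBSET = {(a, b) for a, b in product(S, repeat=2) if a <= b}

Z9 = range(9)
MOD3 = {(x, y) for x, y in product(Z9, repeat=2) if x % 3 == y % 3}

print(reflexive(LEQ, X), antisymmetric(LEQ, X), transitive(LEQ, X))  # True True True
print(irreflexive(LT, X), connective(LT, X))                         # True True
print(connective(SUBSET, S))                                         # False: {1}, {2}
print(sorted(sorted(c) for c in quotient_set(MOD3, Z9)))
# [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```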
Definition of well-foundedness: A set S is well-founded by a relation R := S is partially ordered by R and contains no infinite descending chains
A set S contains an infinite descending chain iff there is a set C with C ⊆ S, C ≠ ∅ and C has no minimal element (minimal elements are defined below).
Theorem: (without proof) Any subset of a well-founded set is also well-founded.
Now we can speak of a set whose elements are ordered by a relation R. When R is a partial ordering we often denote it by the symbol ≼, and when it is a quasi ordering by ≺. Now we can distinguish elements based on their order; first we define the well-known concepts of (immediate) successor and predecessor.
Definition of (immediate) predecessor: An element x1 ∈ X is a predecessor of an element x2 ∈ X (with respect to an ordering R on X) := R(x1, x2) ∧ ¬R(x2, x1). x1 is an immediate predecessor of x2 if in addition (¬∃x3 : x3 ∈ X ∧ x3 ≠ x1 ∧ x3 ≠ x2 : R(x1, x3) ∧ R(x3, x2))
Definition of (immediate) successor: An element x2 ∈ X is a successor of an element x1 ∈ X (with respect to an ordering R on X) := R(x1, x2) ∧ ¬R(x2, x1). x2 is an immediate successor of x1 if in addition (¬∃x3 : x3 ∈ X ∧ x3 ≠ x1 ∧ x3 ≠ x2 : R(x1, x3) ∧ R(x3, x2))
In the literature the immediate successor or predecessor is sometimes called just successor or predecessor, or we simply speak of the ‘next’ or ‘previous’ value. Sometimes the term ‘direct’ is used instead of ‘immediate’.
Note that with these definitions it can easily be proved that if a relation R on X is an ordering, then each element has at most one immediate predecessor and at most one immediate successor. If X is moreover finite, each element except the smallest has exactly one immediate predecessor and each element except the largest has exactly one immediate successor; for infinite X this can fail (for example, no rational number has an immediate successor with respect to < on Q). The notions of smallest and largest elements will be introduced hereafter.
Let X be a set, partially ordered by ≼, and let Y be a subset of X.
Definition of minimal element: x is a minimal element of X := x ∈ X ∧ (¬∃y : y ∈ X ∧ y ≠ x : y ≼ x)
Definition of maximal element: x is a maximal element of X := x ∈ X ∧ (¬∃y : y ∈ X ∧ y ≠ x : x ≼ y)
Definition of least element: x is a least (also called smallest or first) element of X := x ∈ X ∧ (∀y : y ∈ X : x ≼ y)
Definition of greatest element: x is a greatest (also called largest or last) element of X := x ∈ X ∧ (∀y : y ∈ X : y ≼ x)
Definition of lower bound: x is a lower bound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : x ≼ y)
Definition of upper bound: x is an upper bound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : y ≼ x)
Definition of infimum: x is an infimum for Y in X := x is the greatest lower bound for Y in X
Definition of supremum: x is a supremum for Y in X := x is the least upper bound for Y in X
Example: Let X = {4, 6, 12, 24, 36} and R(x, y) := x is a divisor of y. Then R is a (weak) partial order on X, but neither a strict partial order nor a quasi ordering in the sense defined above (R is reflexive), and it is not a (total) order. 4 and 6 are minimal elements of X, but X has no least element. Taking X as a subset of N, partially ordered by the same relation, 1 is a lower bound for X in N and 2 is the infimum of X in N.
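The divisibility example can be recomputed directly from the definitions. This is a minimal sketch; since N is infinite, lower bounds are searched only in the finite range {1, ..., 36}, which is an assumption of the sketch but loses nothing here, because any lower bound must divide 4. For divisibility the infimum coincides with the greatest common divisor, which the last line confirms.

```python
from functools import reduce
from math import gcd

X = {4, 6, 12, 24, 36}


def divides(x, y):
    # R(x, y) := x is a divisor of y, written x ≼ y
    return y % x == 0


minimal = {x for x in X if not any(y != x and divides(y, x) for y in X)}
maximal = {x for x in X if not any(y != x and divides(x, y) for y in X)}
least = [x for x in X if all(divides(x, y) for y in X)]

# Lower bounds of X inside the ambient set {1, ..., 36} (a finite stand-in for N)
ambient = range(1, 37)
lower_bounds = {x for x in ambient if all(divides(x, y) for y in X)}
# the infimum is the lower bound that every other lower bound divides
infimum = [x for x in lower_bounds if all(divides(b, x) for b in lower_bounds)]

print(sorted(minimal), sorted(maximal), least)  # [4, 6] [24, 36] []
print(sorted(lower_bounds), infimum)            # [1, 2] [2]
print(reduce(gcd, X))                           # 2
```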
The so-called least number principle says that any nonempty subset of the natural numbers has a least element. This principle can be shown (a proof can be found in [59, page 7]) to be equivalent to the principles of weak and strong induction, which will be introduced in section 3.4.
Example: The relation < on the naturals is an example of a total ordering on N. From the least number principle we can conclude that N is also well-ordered by <.
Proof: We know that < is an ordering on N. It remains to prove the least number principle, i.e. that (∀A : A ⊆ N ∧ A ≠ ∅ : A has a least element). Let A ⊆ N with A ≠ ∅ and pick some m ∈ A. We show by induction on n that for all n ∈ N: if A ∩ {0, ..., n} ≠ ∅, then A ∩ {0, ..., n} has a least element. For n = 0, A ∩ {0} ≠ ∅ means that 0 ∈ A, and 0 is a least element. For n + 1: if A ∩ {0, ..., n} ≠ ∅, we can apply the induction hypothesis to conclude that A ∩ {0, ..., n} has a least element, which is also a least element of A ∩ {0, ..., n + 1}. If A ∩ {0, ..., n} = ∅, then A ∩ {0, ..., n + 1} = {n + 1}, so n + 1 is its least element. Finally, taking n = m, the set A ∩ {0, ..., m} contains m and therefore has a least element; since every element of A outside {0, ..., m} is larger than m, this is also a least element of A.
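The induction in this proof is constructive: it amounts to scanning 0, 1, ..., m for some known element m of A and stopping at the first element found. This is a minimal sketch of that reading, for a non-empty set of natural numbers given as a Python set; the name `least_element` is our own.

```python
def least_element(a):
    """Least element of a non-empty set a of natural numbers.

    Follows the proof: pick some m in A, then inspect A ∩ {0, ..., n}
    for n = 0, 1, ..., m; the first n found in A is the least element.
    """
    if not a:
        raise ValueError("the least number principle needs a non-empty set")
    m = next(iter(a))          # some element m ∈ A
    for n in range(m + 1):     # n = 0, ..., m
        if n in a:             # A ∩ {0, ..., n} has just become non-empty
            return n
    raise AssertionError("unreachable: m itself is in A")


print(least_element({7, 3, 10}))  # 3
```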
3.3 Functions

In mathematics, a function maps each element of an input set to exactly one element of an output set. More precisely, f is a function or mapping from X to Y means that f assigns to each x ∈ X a uniquely determined y ∈ Y, notation f(x) = y. We can define this notion in set theory by using a relation between X and Y such that for each x ∈ X there is a unique y ∈ Y such that <x, y> ∈ f; in other words, a function is a special kind of relation, which indicates for each pair <x, y> of the input and output set whether it belongs to the function or not.
Definition of function: f is a function from a set X to a set Y, notation f : X → Y := f ⊆ X × Y ∧ (∀x : x ∈ X : (∃!y : y ∈ Y : <x, y> ∈ f))
The definitions of domain and range as given in the section about relations can now also be used for functions. We now introduce a notation for the set of all functions f : X → Y.
Definition of Y^X: Y^X := {f ∈ P(X × Y) | f is a function from X to Y}
As we did before for relations and operations, we now define some general properties of functions.
Definition of injective: f : X → Y is injective or an injection := (∀x1, x2 : x1, x2 ∈ X : x1 ≠ x2 → f(x1) ≠ f(x2))
Definition of surjective: f : X → Y is surjective or a surjection := (∀y : y ∈ Y : (∃x : x ∈ X : y = f(x)))
Definition of bijective: f : X → Y is bijective or a bijection := f is surjective and f is injective
If f is bijective, f is also called a (one-to-one) correspondence between X and Y.
Example: We have the following property: f : X → Y is surjective ↔ ran(f) = Y.
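For finite X and Y these definitions can be executed literally, with a function stored as a set of pairs. This is a minimal sketch with hypothetical helper names; it also enumerates Y^X and confirms that it has |Y|^|X| elements for a small example.

```python
from itertools import product


def is_function(f, xs, ys):
    # f ⊆ X × Y ∧ (∀x : x ∈ X : (∃!y : y ∈ Y : <x, y> ∈ f))
    inside = all(a in xs and b in ys for a, b in f)
    unique = all(sum(1 for a, _ in f if a == x) == 1 for x in xs)
    return inside and unique


def value_at(f, x):
    # f(x): the unique y with <x, y> ∈ f
    return next(b for a, b in f if a == x)


def injective(f, xs):
    return all(value_at(f, x1) != value_at(f, x2)
               for x1, x2 in product(xs, repeat=2) if x1 != x2)


def surjective(f, xs, ys):
    return all(any(value_at(f, x) == y for x in xs) for y in ys)


def all_functions(xs, ys):
    # Y^X: one choice of image in Y for every element of X
    dom = sorted(xs)
    return [set(zip(dom, images)) for images in product(sorted(ys), repeat=len(dom))]


X = {0, 1, 2}
Y = {"a", "b"}
f = {(0, "a"), (1, "b"), (2, "a")}
print(is_function(f, X, Y), injective(f, X), surjective(f, X, Y))  # True False True
print(is_function({(0, "a"), (1, "a")}, X, Y))  # False: 2 has no image
print(len(all_functions(X, Y)))                 # 8 = 2 ** 3
bijections = [g for g in all_functions(X, X) if injective(g, X) and surjective(g, X, X)]
print(len(bijections))                          # 6
```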
Example: f : N → [−2π, 2π], with f(x) = sin(x), is a function and a relation. g : [−2π, 2π] → N, with g(x) = y iff x = sin(y), is a relation, not a function (for example, g(2) is undefined, since sin(y) ≤ 1 for all y).
We will now consider two special kinds of functions: the sequence and the identity function.
Definition of sequence: s is a sequence of X := s is a function from N to X (i.e. s ∈ X^N)
Definition of identity function: The identity function id_X := id_X : X → X with (∀x : x ∈ X : id_X(x) = x)
We now introduce some operations on functions in set theory. We can easily check that these definitions correspond to the usual mathematical operations.
Definition of composition: The composition g ◦ f of two functions f : A → B and g : B → C := the function g ◦ f : A → C with (g ◦ f)(x) = g(f(x)), for all x ∈ A
Definition of inverse function: The inverse of a bijection f : X → Y := the function f⁻¹ : Y → X with (∀x, y : x ∈ X ∧ y ∈ Y : f⁻¹(y) = x ↔ y = f(x))
Definition of restricted function: The restriction of a function f : X → Y to X0, with X0 ⊆ X := the function f|X0 : X0 → Y with (∀x : x ∈ X0 : f|X0(x) = f(x))
Just as in algebra, we can now combine a set and relations on that set into a structure.
Definition of (relational) structure: ⟨X, R0, ..., Rp⟩ is a (relational) structure := X is a set and R0, ..., Rp are relations on X
The concept of a structure enables us to abstract from the exact set and relations, and reason about sets of structures instead. There is also a useful definition for equivalence of structures, called isomorphism.
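With a finite function stored as a Python `dict` (each key mapped to its unique image), composition, inverse and restriction become one-liners. This is a minimal sketch; the helper names are hypothetical, and `inverse` only checks injectivity, since a `dict` does not record the codomain Y needed to check surjectivity.

```python
def identity(xs):
    # id_X(x) = x for all x ∈ X
    return {x: x for x in xs}


def compose(g, f):
    # (g ∘ f)(x) = g(f(x)) for all x in the domain of f
    return {x: g[f[x]] for x in f}


def inverse(f):
    # f⁻¹(y) = x ↔ y = f(x); requires f to be injective
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("f is not injective, so it has no inverse")
    return inv


def restrict(f, x0):
    # f|X0(x) = f(x) for all x ∈ X0, with X0 ⊆ dom(f)
    return {x: f[x] for x in x0}


f = {0: "a", 1: "b", 2: "c"}
g = {"a": 10, "b": 20, "c": 30}
print(compose(g, f))                                  # {0: 10, 1: 20, 2: 30}
print(compose(inverse(f), f) == identity({0, 1, 2}))  # True
print(restrict(f, [0, 2]))                            # {0: 'a', 2: 'c'}
```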
Let R = ⟨X, R0, ..., Rp⟩ and S = ⟨Y, S0, ..., Sp⟩ be two structures, such that (∀i : 0 ≤ i ≤ p : the arity of Ri and Si is ni + 1).
Definition of isomorphism: f is an isomorphism between R and S := f is a bijection from X to Y and (∀i : 0 ≤ i ≤ p : (∀x0, ..., xni : x0, ..., xni ∈ X : Ri(x0, ..., xni) ↔ Si(f(x0), ..., f(xni))))
When two structures are similar (the sets are of the same size and the relationships between the elements in one structure are retained between the images of those elements in the other structure), we call them isomorphic.
Definition of isomorphic: Two structures R and S are isomorphic, notation R ≅ S := there exists an isomorphism from R to S
Definition of automorphism: f is an automorphism of R := f is an isomorphism from R to R
Example: An isomorphism from the structure ⟨N, <⟩ to ⟨Neven, <⟩ is given by f : N → Neven, with f(n) = 2n. However, f is not an isomorphism from ⟨N, ⊕⟩ to ⟨N, <⟩, with a ⊕ b := b divides a.
Example: An automorphism of ⟨A, R0, ..., Rp⟩ is the identity function idA : A → A, so idA = {<a, a> | a ∈ A}. Also, the function f(x) = 2x³ is an automorphism of ⟨R, <⟩.
Example: The function g : R+ → R with g(x) = log(x) is an isomorphism between ⟨R+, ∗⟩ and ⟨R, +⟩, because for all r1, r2 ∈ R+, log(r1 ∗ r2) = log(r1) + log(r2).
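For finite structures with a single binary relation, the definition of isomorphism can be checked by brute force. This is a minimal sketch, assuming we truncate N and Neven to {0, ..., 4} and {0, 2, ..., 8}, since the infinite structures themselves cannot be enumerated; the helper names are hypothetical. As an extra illustration of our own, it shows that the identity function on {1, ..., 5} does not turn ⊕ (b divides a) into <, and it spot-checks the logarithm identity from the last example.

```python
import math
from itertools import product


def is_isomorphism(f, xs, r, ys, s):
    """f: dict X -> Y; r, s: binary relations (sets of pairs) on X and Y."""
    bijective = (set(f) == set(xs) and set(f.values()) == set(ys)
                 and len(set(f.values())) == len(f))
    # R(x0, x1) ↔ S(f(x0), f(x1)) for all x0, x1 ∈ X
    preserves = all(((a, b) in r) == ((f[a], f[b]) in s)
                    for a, b in product(xs, repeat=2))
    return bijective and preserves


def less_than(xs):
    return {(a, b) for a, b in product(xs, repeat=2) if a < b}


N5 = range(5)
EVEN5 = [2 * n for n in N5]
double = {n: 2 * n for n in N5}
print(is_isomorphism(double, N5, less_than(N5), EVEN5, less_than(EVEN5)))  # True

# a ⊕ b := b divides a (on {1, ..., 5} to avoid division by zero)
P = range(1, 6)
DIVIDED_BY = {(a, b) for a, b in product(P, repeat=2) if a % b == 0}
ident = {x: x for x in P}
print(is_isomorphism(ident, P, DIVIDED_BY, P, less_than(P)))  # False: 2 ⊕ 2, but not 2 < 2

print(math.isclose(math.log(3.0 * 7.0), math.log(3.0) + math.log(7.0)))  # True
```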
3.4 Induction Methods

There is a tradition of opposition between adherents of induction and deduction. In my view it would be just as sensible for the two ends of a worm to quarrel.
A.N. Whitehead, quoted in [76]

3.4.1 Induction

Induction is a method of reasoning from a part to a whole, from particulars to generals, or from the individual to the universal. In ordinary induction we examine a certain number of cases and then generalize. Reasoning by analogy, where a conclusion is drawn from an analogous situation, is also a primitive form of induction (see [23, page 6]). In chapter 14, section 3 of [49] there is a more detailed treatment of inductive reasoning, including a distinction between determinative and conceptual induction. In both these kinds of induction, the conclusion goes beyond the premisses (or the evidence). As said in [49, page 366], the broad difference between deductive and inductive reasoning is that in deduction the conclusion asserts less than the premisses, whereas in induction it asserts more. So in induction the conclusion contains information that was not contained in the premisses. This is the source of uncertainty in inductions: inductions are strengthened as confirming instances pile up, but they can never bring certainty (unless every possible case is actually examined, in which case they become deductions). Induction in this sense should not be confused with the mathematical principle of induction (treated in section 3.4.3).

Example of inductive reasoning²:
Coffee shop burger no. 1 was greasy.
Coffee shop burger no. 2 was greasy.
...
Coffee shop burger no. 100 was greasy.
Therefore, all coffee shop burgers are greasy (or: the next coffee shop burger will be greasy).

² Example from: Peter Suber, Philosophy department, Earlham College.
3.4.2 Deduction

Mathematics, in its widest significance, is the development of all types of formal, necessary, deductive reasoning.
 A.N. Whitehead, quoted in [100]

In contrast to induction, deduction in a mathematical context is a method of reasoning that is based on a rigorous proof: a derivation (using fixed rules called a system of logic) of one statement (the conclusion) from one or more statements (the premisses), i.e. a chain of statements, each of which is either a premise or a consequence of a statement occurring earlier in the proof. This usage is a generalization of what the Greek philosopher Aristotle called the syllogism (see [49, page 121]), but a syllogism is now recognized as merely a special case of a deduction. Also, the traditional view that deduction proceeds from the general to the specific has been abandoned as incorrect by most logicians. Originally based on Aristotle's logic, the deductive argument has become more subtle and complex and is now based on modern symbolic logic, although others have proposed different formalizations of deductive logic (see [49, chapter 1, sections 5 and 6]). Some experts regard all valid inferences as deductive in form and for this and other reasons reject the supposed contrast between deduction and induction.

The Greeks found deductive reasoning, not empirical procedures, to be the method to establish mathematical facts. Logic can be seen as the theory of the formal structure of deductive reasoning. In deductive reasoning, we are not directly concerned with the truth of the conclusion, but rather with whether the conclusion does or does not follow from the premisses. If the conclusion follows from the premisses, we say that our reasoning is valid; if it does not, we say that our reasoning is invalid. The logic of Hilbert's metamathematics (see section 6.1) and Russell's Principia Mathematica (see section 7.1) are forms of reasoning with deductive certainty. The German mathematician Hilbert greatly contributed to deductive reasoning, as we will see when we introduce his proof theory (also known as the axiomatic method) in chapter 6.
3.4.3 The principle of induction

Informal
The principle of induction, also known as mathematical induction, is an important process for proving theorems. It was even used by Peano to define the concept of natural numbers (see section 4.1, axiom 3). 'Mathematical induction' is unfortunately named, for it is unambiguously a form of deduction. The name was probably inspired by the fact that, just like induction, it generalizes to a whole set from a smaller sample. But, as we will see, mathematical induction concludes with deductive certainty.

The informal structure of the proof of a theorem by mathematical induction is fairly simple:
1) Basis. Prove that the theorem holds for a specific case (which often is minimal for a given ordering of the elements). This case is also called the base case.
2) Induction step. Prove a rule that says that if the theorem holds for an arbitrary element, it is true for the next case. This often is a rule of heredity that tells us that the theorem is true for the immediate successor of an arbitrary element if it is true for the arbitrary element itself. The claim that the theorem is true for an arbitrary element is called the induction hypothesis.
3) Conclusion. Together, 1 and 2 imply that the theorem holds for all cases starting with the base case. If you didn't use the minimal case in step 1, then you have proven only that the theorem holds for that case and its successors, not for all possible cases.

The induction step can take two forms, which correspond to two forms of mathematical induction. Again we assume there is an ordering of the elements, with +1 the immediate successor relation.
Weak: prove that if the theorem holds for an arbitrary element n, then it holds for the element n + 1.
Strong: prove that if the theorem holds for all elements up to some arbitrary element n, then it holds for the element n + 1.
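The strong form is the natural one when the step to a case does not go through its immediate predecessor alone. A classical example, not taken from the text, is the statement that every number n ≥ 2 is a product of primes: if n is not prime, then n = d · (n/d) with both factors smaller than n, and the induction hypothesis is needed for both of them. The recursion below (the name prime_factors is our own) follows that proof; each recursive call corresponds to an appeal to the strong induction hypothesis.

```python
from math import isqrt

def prime_factors(n: int) -> list[int]:
    """Prime factors of n >= 2 in non-decreasing order.

    Mirrors a proof by strong induction: the recursive calls are made on
    numbers smaller than n, but not necessarily on n - 1.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            # n = d * (n // d) with both factors smaller than n:
            # the induction hypothesis is used for both of them
            return sorted(prime_factors(d) + prime_factors(n // d))
    # no divisor d with 2 <= d <= sqrt(n): n itself is prime
    return [n]

print(prime_factors(360))  # [2, 2, 2, 3, 3, 5]
```

A weak-induction version would only be allowed to call prime_factors(n - 1), which gives no information about the factors of n.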
It does not go without saying that if we are to use mathematical induction to prove that some theorem applies to 'all possible cases', then those cases must somehow be enumerable and in some way linked to the integers. We have to be able to speak about the minimal case, the successor of a given case, the nth case, etc. The induction principle assumes that S is a well-founded set in which every element except the smallest has an immediate predecessor; this condition is also known as 'S is inductive'. This is important, since many mistakes are made in applying the principle. The structure of an inductive set in fact resembles that of the naturals: if we have the axioms (see the Peano axioms in section 4.1) 0 is in N, and if x is in N then x + 1 is in N, the set N is inductive. In case the set S is the naturals, we also refer to the principle as natural induction.

Formal
We will now formally state the principle of induction. Suppose that we want to prove a property ϕ(s) that holds for all s ∈ S. The principle presupposes the following two conditions:
(A) S is a set, well-founded by relation R (such that '+' denotes the immediate successor of an element with respect to the relation R) and with smallest element e.
(B) Every element except e has a (unique) immediate predecessor.

If (A) and (B) hold and ϕ is a property of elements of S, we can use the induction principle.

Definition of the (weak) (mathematical) induction principle: if
(C) ϕ(e) (i.e. e has property ϕ)
(D) (∀s : s ∈ S : ϕ(s) → ϕ(s+)) (i.e. if s ∈ S has property ϕ, then the (unique) immediate successor of s also has property ϕ)
then the property ϕ holds for every element in S.
Proof: Suppose S is a well-founded set in which every element except the smallest, denoted e, has an immediate predecessor, and suppose that a property ϕ is true for e, as well as for the immediate successor s+ ∈ S whenever it is true for s ∈ S. We now prove by contradiction that ϕ holds for all s ∈ S. Suppose that ϕ is not true for all s ∈ S. Let N be the set of elements of S for which ϕ is not true, i.e. N = {s ∈ S | ¬ϕ(s)}; by our assumption N is non-empty. By the theorem of page 26 we know that if S is well-founded, any subset of S is also well-founded, thus N contains a smallest element n. If n = e, we have a contradiction, since ϕ(e) holds. If n > e, n has an immediate predecessor, denoted n−. Since n is the smallest element for which ϕ doesn't hold, ϕ must hold for n−. But then by (D), ϕ must also hold for the immediate successor of n−, that is n: contradiction. Thus ϕ must be true for all s ∈ S.

Step (C) is also called the base of a proof by induction, step (D) is also called the induction step, and ϕ(s) in (D) is called the induction hypothesis.

As we mentioned before, this principle can be generalized in several ways. One way is to prove in step (C) that ϕ holds for a (possibly non-minimal) case b ∈ S. In step (D) we then show that (∀s : s ∈ S ∧ s ≥ b : ϕ(s) → ϕ(s+)). The conclusion then is that the property ϕ holds for all elements in S that are ordered larger than or equal to b.

We now show (with a proof by contradiction) why the additional property (B), that every element except the smallest must have an immediate predecessor, is necessary for the induction principle. Consider the natural numbers with the ordering ⊏ defined as follows:
• if n and m are both even, then n ⊏ m if n < m
• if n and m are both odd, then n ⊏ m if n < m
• if n is even and m is odd, we always define n ⊏ m
We can check that N is well-founded by ⊏. The smallest element in the ordering is 0, but not every element has an immediate predecessor: for example, every element before 1 is even, and for every even p we have p ⊏ p + 2 ⊏ 1. We take as property ϕ(s): 's is even'. Then (C) holds, since 0 is even. Also (D) holds: if s has property ϕ, then s is even, and in our ordering the successor of an even number is always the next even number, never an odd number, so the successor of s has property ϕ as well.
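The ordering used in this counterexample (every even number before every odd number, the usual order within each parity) is easy to write down. The Python sketch below (the names precedes and witness_between are our own) encodes it and exhibits, for a given even p, an element strictly between p and 1, which is why 1 has no immediate predecessor. A program can only check finitely many p; the argument itself is the one-line observation that p + 2 always lies in between.

```python
def precedes(n: int, m: int) -> bool:
    """n comes before m: every even number comes before every odd number,
    and within one parity the usual order < is used."""
    if n % 2 == m % 2:
        return n < m
    return n % 2 == 0  # here n and m differ in parity: n even means m odd

def witness_between(p: int, m: int = 1) -> int:
    """For even p and odd m, return an element strictly between p and m,
    showing that p is not an immediate predecessor of m."""
    assert p % 2 == 0 and m % 2 == 1
    q = p + 2
    assert precedes(p, q) and precedes(q, m)
    return q

# Sorting by the key (parity, value) lists elements in this ordering.
print(sorted(range(10), key=lambda n: (n % 2, n)))  # [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]
```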
Therefore (with only conditions (A), (C) and (D) holding) every natural number would be even: contradiction! There is, however, a weaker principle, called transfinite induction, which, suitably stated, does apply to every well-ordered set. But first we regard a stronger principle, that is based on the same assumptions ((A) and (B)) as the weak induction principle.

Principle of strong (mathematical) induction: The same as for (weak) induction, but instead of (C) and (D) with
(E) (∀x : x ∈ S : (∀y : y ∈ S : R(y, x) → ϕ(y)) → ϕ(x)) (i.e. for all x ∈ S we have ϕ(x) if all R-predecessors y of x have property ϕ)

Sometimes this is also informally stated using the infamous three dots as (∀s : s ∈ S : (ϕ(e) ∧ ϕ(e+) ∧ ... ∧ ϕ(s)) → ϕ(s+)). Note that the base case is not really left out, since it is implicitly present in the quantification (take e for x). Sometimes this principle is called the Principle of Complete Induction, for example in [4], but this is less common.

Proof: Suppose ⟨X, R⟩ is a structure (we write X for S) such that (A), (B) and (E) hold, with smallest element e. Again we use proof by contradiction, and assume (∃x : x ∈ X : ¬ϕ(x)). Thus {x ∈ X | ¬ϕ(x)} is non-empty and has a smallest element e′ (since ⟨X, R⟩ is well-founded). We now have ¬ϕ(e′) ∧ (∀z : z ∈ X : R(z, e′) → ϕ(z)). According to (E) (substitute z for y, and take e′ for x) we then have ϕ(e′): contradiction.

Principle of transfinite induction³: The same as for strong induction, but instead of (A) and (B) as assumptions, it can be applied to any set S that is well-ordered by relation R.

³ This form of induction, when applied to ordinals (ordinals form a well-ordered and hence well-founded set and are introduced in section 3.8.2), is called transfinite induction.
An example of such a set are the ordinals or cardinals, or even the class of all ordinals.

Proof: The proof of the principle of transfinite induction is similar to the proof of the strong induction principle.

A proof by transfinite induction typically needs to distinguish three cases:
1. s is a minimal element.
2. s has an immediate predecessor (i.e. the set of elements which are smaller than s has a largest element). In this case we can apply normal induction.
3. s has no immediate predecessor (i.e. s is a so-called limit ordinal). The case for limit ordinals is typically approached by noting that a limit ordinal b is (by definition) the union of all ordinals a < b, and using this fact to prove ϕ(b) assuming that ϕ(a) holds true for all a < b.

Clearly, for the set of natural numbers, all three given principles are equivalent, since we proved them to be true. These proofs, however, are based on an underlying set of axioms (the so-called ZF axioms and the Peano axioms; for the latter see chapter 4). Without these axioms (to be exact, without Peano's induction axiom or an axiom playing the same role) we cannot directly prove the principles to be true. In that case we can still prove the equivalence of the principles by showing that they imply each other. Note also that applying transfinite induction to an arbitrary set requires that the set is well-ordered, and that every set can be well-ordered cannot be proven from the ZF axioms alone⁴: an extra axiom is needed, the infamous Axiom of Choice, or one of its equivalent forms, which will be introduced in chapter 5.

As an example, we now prove that (mathematical) induction on the natural numbers is a special case of transfinite induction. To prove this it suffices to show that ((C) and (D)) ↔ (E).

⁴ The three statements known as 'Axiom of Choice', 'Zorn's Lemma' and 'Well-Ordering principle' (also known as the well-ordering theorem, see also section 3.8.2) are all equivalent, meaning that if you assume one of them to be true, the others follow as consequences, but none of them can be proven from the other axioms of ZF set theory alone. There are also other equivalent statements that are sometimes used (such as Zermelo's postulate), and it is a nice exercise to prove the equivalence of these statements. The 'Mathematical Induction Principle' should not be confused with these: it is equivalent to the statement that every non-empty set of natural numbers has a least element, and for the natural numbers as defined in ZF (through the axiom of infinity) it is a theorem of ZF.
Normal induction (IND):
(∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n : n ∈ N : ϕ(n)))

Transfinite induction (TFIND):
(∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)) → (∀m : m ∈ N : ψ(m)))

We can prove the equivalence of IND and TFIND in two ways: in a constructive way or with a proof by contradiction. We give both proofs.

Proof by contradiction (from [17]): Since IND and TFIND have the same conclusion, it suffices to prove that their premisses are equivalent, i.e. that IND′(ϕ) ≡ TFIND′(ϕ) for every property ϕ, with
IND′(ϕ) ≡ ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1))
TFIND′(ϕ) ≡ (∀q : q ∈ N : (∀p : p ∈ N : p < q → ϕ(p)) → ϕ(q))

Proof of TFIND′ → IND′: Let ϕ be a property and assume TFIND′(ϕ). If we take q = 0, (∀p : p ∈ N : p < 0 → ϕ(p)) is trivially true. Thus we have ϕ(0). We now prove by contradiction that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). Assume k ∈ N with ϕ(k) ∧ ¬ϕ(k + 1). That means the condition of TFIND′(ϕ) with q = k + 1 must not be true: ¬(∀p : p ∈ N : p < k + 1 → ϕ(p)), i.e. (∃p : p ∈ N : p < k + 1 ∧ ¬ϕ(p)). Let s ∈ N be the smallest number such that s < k + 1 ∧ ¬ϕ(s). Then ϕ holds for all numbers smaller than s, that is (∀r : r ∈ N : r < s → ϕ(r)). But then we would have ϕ(s) according to TFIND′(ϕ) (namely if we take s for q and r for p): contradiction. Now we have proved (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)), and since we already have proven ϕ(0), we have IND′(ϕ).

Proof of IND′ → TFIND′: Let ψ be a property and assume IND′(ψ). We have to prove (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)). First we prove this for q = 0: we have to show (∀p : p ∈ N : p < 0 → ψ(p)) → ψ(0), which is true since ψ(0) holds by IND′(ψ). Now we prove this for q > 0. Suppose (∀p : p ∈ N : p < q → ψ(p)); in particular ψ(q − 1) holds. By IND′(ψ) we also know that (∀k : k ∈ N : ψ(k) → ψ(k + 1)), so taking k = q − 1 we obtain ψ(q). Hereby we have proved TFIND′(ψ).
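The direction IND → TFIND can also be checked by a proof assistant. The following Lean 4 sketch is our own formalization, not taken from [17]: the theorem name tfind_of_ind is hypothetical, Lean's built-in induction tactic plays the role of IND, and the auxiliary statement hs is the property s(n) := (∀l : l ∈ N : l < n → P(l)), proved for every n by ordinary induction. It relies only on the core lemmas Nat.not_lt_zero, Nat.le_of_lt_succ and Nat.lt_or_eq_of_le, whose exact names may differ between Lean versions.

```lean
theorem tfind_of_ind (P : Nat → Prop)
    (h : ∀ k, (∀ l, l < k → P l) → P k) : ∀ n, P n := by
  -- s n := ∀ l, l < n → P l, established for every n by ordinary induction
  have hs : ∀ n, ∀ l, l < n → P l := by
    intro n
    induction n with
    | zero =>
      -- the range of l is empty: no l satisfies l < 0
      intro l hl
      exact absurd hl (Nat.not_lt_zero l)
    | succ k ih =>
      intro l hl
      -- l < k + 1 gives l < k or l = k
      cases Nat.lt_or_eq_of_le (Nat.le_of_lt_succ hl) with
      | inl hlt => exact ih l hlt
      | inr heq => rw [heq]; exact h k ih
  -- s n is exactly the premise of h at n, so P n follows
  intro n
  exact h n (hs n)
```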
Constructive proof:

Proof of TFIND → IND: Assume TFIND, and let ϕ be a property. We need to prove that ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n : n ∈ N : ϕ(n)). Assume ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). We want to use TFIND to conclude (∀n : n ∈ N : ϕ(n)), so we show (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k) for all k ∈ N. Let k ∈ N and assume (∀l : l ∈ N : l < k → ϕ(l)). If k = 0, then ϕ(k) holds by the assumption ϕ(0) (the condition (∀l : l ∈ N : l < 0 → ϕ(l)) is trivially true, since the range of l is empty). If k > 0, then ϕ(k − 1) holds (since k − 1 ∈ N and k − 1 < k). But we have assumed that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)), thus ϕ(k) holds also for k > 0. We now have that (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k) for all k ∈ N, and TFIND gives us (∀n : n ∈ N : ϕ(n)).

Proof of IND → TFIND: Assume IND, and let ϕ be a property. Also assume
(i): (∀k : k ∈ N : (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k)).
Let s(k) := (∀l : l ∈ N : l < k → ϕ(l)). We prove (∀n : n ∈ N : ϕ(n)) by first proving (∀n : n ∈ N : s(n)) using IND, and subsequently (∀n : n ∈ N : s(n) → ϕ(n)). s(0) holds trivially, since the range of l is empty in that case. Suppose s(k) holds. Since s(k + 1) ≡ s(k) ∧ ϕ(k), we can conclude s(k + 1), because ϕ(k) follows from (i) and s(k). Now we have s(0) ∧ (∀k : k ∈ N : s(k) → s(k + 1)), and thus (by using IND) (∀n : n ∈ N : s(n)). Clearly, by the definition of s, (i) gives us (∀n : n ∈ N : s(n) → ϕ(n)), and therefore (∀n : n ∈ N : ϕ(n)).

Structural Induction
In many cases we do not want to prove properties about the integers or similar well-ordered sets. However, forms of induction can also be appropriate when trying to prove properties about structures defined recursively. In such cases straight induction is not always useful. The induction hypothesis (i.e. the assumption) is then to assume that something is true for 'simpler' forms of an object, and then to prove that it holds for 'more complex' forms. 'Complexity' can be defined in several ways: the most common way is to say that one object is more complex than another if it includes that other object as a subpart, but this need not always be the case. This kind of induction is useful when objects are built up from more primitive objects: if we can show that the primitive objects have the desired property, and that the act of building preserves that property, then we have shown that all objects must have the property. This generalized induction principle is known as structural induction.

A general treatment of recursively defined structures (a formal definition of structural induction over recursive datatypes) will be presented in a later version of this report. In that version we will also show that mathematical induction is an instance of the general notion of structural induction over values of recursively defined types.

Example: As an example of the use of mathematical induction we prove the binomial theorem. The binomial theorem states that for all x, y ∈ R and n ∈ N we have

$$\mathrm{EQ} \equiv (x+y)^n = \sum_{j=0}^{n} \binom{n}{j} x^{n-j} y^{j}$$

We call the left-hand side of this equality LHS, and the right-hand side RHS. We assume two real numbers x and y and prove EQ by induction on n.

Basis case: For n = 0, EQ is clearly correct, since both sides are 1. For some reason, most textbooks take n = 1 as the basis, in which case LHS is simply x + y, and RHS is

$$\binom{1}{0} x^{1-0} y^{0} + \binom{1}{1} x^{1-1} y^{1} = x + y$$

Induction case: We assume EQ is true for n = k and have to show that it is then also true for n = k + 1:

$$(x+y)^{k+1} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^{j}$$

First, we rewrite the left side of this equation, using the induction hypothesis in the third step:

$$\mathrm{LHS} = (x+y)^{k+1} = (x+y)^{k}(x+y) = \left(\sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j}\right)(x+y) = \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^{j} + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}$$

In rewriting the right side of the equation, we use Pascal's identity: for all k, n ∈ N with 0 < k ≤ n,

$$\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}$$

We first prove this identity:

$$\binom{n}{k-1} + \binom{n}{k} = \frac{n!}{(k-1)!\,(n-k+1)!} + \frac{n!}{k!\,(n-k)!} = \frac{n!\,k}{k!\,(n-k+1)!} + \frac{n!\,(n-k+1)}{k!\,(n-k+1)!} = \frac{n!\,(k+(n-k+1))}{k!\,(n-k+1)!} = \frac{(n+1)!}{k!\,(n+1-k)!} = \binom{n+1}{k}$$

Now we rewrite RHS. We split out the j = 0 and j = k + 1 terms before applying Pascal's identity:

$$\mathrm{RHS} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^{j} = x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k+1}{j} x^{k+1-j} y^{j} = x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \left(\binom{k}{j} + \binom{k}{j-1}\right) x^{k+1-j} y^{j} = x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k}{j} x^{k+1-j} y^{j} + \sum_{j=1}^{k} \binom{k}{j-1} x^{k+1-j} y^{j}$$

We can now bring x^{k+1} into the first sum (as the j = 0 term), and y^{k+1} into the second sum (as the j = k + 1 term). This gives

$$\mathrm{RHS} = \sum_{j=0}^{k} \binom{k}{j} x^{k+1-j} y^{j} + \sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^{j}$$

and

$$\mathrm{LHS} = \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^{j} + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}$$

The first sums of LHS and RHS are the same, and we can see that the second sums are also equal by doing a dummy transformation (let i = j − 1):

$$\sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^{j} = \sum_{i=0}^{k} \binom{k}{i} x^{k-i} y^{i+1}$$

So LHS = RHS, and we can conclude that EQ holds for all x, y ∈ R and n ∈ N.

Example: We give an example of a proof about binary trees using structural induction. First we define a data structure for binary trees. The structure of a binary tree is well known and says that a tree is either a leaf or made up of two subtrees glued together by a node. For this example we will use a definition in the notation of the language Z to describe recursive data structures:

TREE ::= leaf | node⟨⟨TREE × TREE⟩⟩

An example of such a tree is node(leaf, node(node(leaf, leaf), leaf)).

We now define the size of a tree, by counting both the leaves and the nodes. The basic idea of the definition is that we define the size of a tree inductively over the structure, saying how the size of a given tree is calculated from the sizes of its parts. Again we define the size in the language Z, by first declaring its type and then saying how it is defined in each of the two cases:
size : TREE → N
∀ t1, t2 : TREE •
  size(leaf) = 1 ∧
  size(node(t1, t2)) = 1 + size(t1) + size(t2)

Also, we make two new definitions about trees:

leaves : TREE → N
nodes : TREE → N
∀ t1, t2 : TREE •
  leaves(leaf) = 1 ∧
  leaves(node(t1, t2)) = leaves(t1) + leaves(t2) ∧
  nodes(leaf) = 0 ∧
  nodes(node(t1, t2)) = 1 + nodes(t1) + nodes(t2)

We now want to prove the following theorem by structural induction on the size of the tree t.

Theorem: For all trees t, size(t) = leaves(t) + nodes(t).

Proof: Let t, t1 and t2 be of type TREE. We prove the theorem by induction on the size of t. The induction hypothesis says that the theorem holds for all t′ with size(t′) < size(t).
Base case: Assume t = leaf. Then size(t) = size(leaf) = 1. Also, leaves(t) + nodes(t) = leaves(t) + 0 = 1 + 0 = 1.
Induction case: Assume t = node(t1, t2). Then
size(t) = size(node(t1, t2)) = 1 + size(t1) + size(t2)
= (apply the induction hypothesis to t1 and t2, which both have a smaller size than t)
1 + (leaves(t1) + nodes(t1)) + (leaves(t2) + nodes(t2)).
And
leaves(t) + nodes(t) = leaves(node(t1, t2)) + nodes(node(t1, t2))
= (leaves(t1) + leaves(t2)) + (1 + nodes(t1) + nodes(t2))
= (commutativity and associativity of +)
1 + (leaves(t1) + nodes(t1)) + (leaves(t2) + nodes(t2)).
Hence size(t) = leaves(t) + nodes(t).
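The free type TREE and the functions size, leaves and nodes translate directly into a program. The Python sketch below is our own rendering (the class names Leaf and Node and the generator trees are hypothetical); it evaluates the example tree and checks the theorem for all trees with at most five nodes. Such a finite check is a test, not a proof: only the structural induction covers all trees.

```python
from dataclasses import dataclass
from typing import Iterator, Union

@dataclass(frozen=True)
class Leaf:
    """The constant leaf of the free type TREE."""

@dataclass(frozen=True)
class Node:
    """node(t1, t2): two subtrees glued together by a node."""
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def size(t: Tree) -> int:
    return 1 if isinstance(t, Leaf) else 1 + size(t.left) + size(t.right)

def leaves(t: Tree) -> int:
    return 1 if isinstance(t, Leaf) else leaves(t.left) + leaves(t.right)

def nodes(t: Tree) -> int:
    return 0 if isinstance(t, Leaf) else 1 + nodes(t.left) + nodes(t.right)

def trees(n: int) -> Iterator[Tree]:
    """Generate every tree with exactly n nodes (n >= 0)."""
    if n == 0:
        yield Leaf()
        return
    for k in range(n):  # k nodes in the left subtree, n - 1 - k in the right one
        for left in trees(k):
            for right in trees(n - 1 - k):
                yield Node(left, right)

# node(leaf, node(node(leaf, leaf), leaf))
example = Node(Leaf(), Node(Node(Leaf(), Leaf()), Leaf()))
print(size(example), leaves(example), nodes(example))  # 7 4 3

for n in range(6):
    for t in trees(n):
        assert size(t) == leaves(t) + nodes(t)
```

Each function follows the two cases of its Z definition, which is what makes a proof by structural induction over the same two cases possible.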
3.5 Real numbers

What do we mean when we say 'continuum'? Here is a description Albert Einstein gave on page 83 of [21]:
The surface of a marble table is spread out in front of me. I can get from any point on this table to any other point by passing continuously from one point to a 'neighboring' one, and repeating this process a (large) number of times, or, in other words, by going from point to point without executing 'jumps'. I am sure the reader will appreciate with sufficient clearness what I mean here by 'neighboring' and by 'jumps' (if he is not too pedantic). We express this property of the surface by describing the latter as a continuum.

People have been using the concept of real numbers for a long time (the Babylonians, for example, already calculated with roots long B.C., see [12]). Even simple equations have no solutions if all we know are the rational numbers (for example, there is no rational number x such that x² = x ∗ x = 2). When Cantor developed his set theory, it was well known that each type of number could be constructed as the limit of a sequence of numbers of another type. But it became clear that, especially in connection with theorems asserting the existence of some limit relations, the proof might require irrational numbers to be defined in terms of rational ones, in order to avoid begging the question of existence involved in the theorem (see [30, page 182]). In order for set theory to cover the fundamental structures of analysis, a precise and formal basis for the real numbers was needed. Cauchy and Heine tried to define the irrational or real numbers in the 19th century, and in 1872 Cantor and Dedekind followed with their precise definitions of the real numbers. We first present the three methods (of Dedekind, Cantor and Cauchy) of defining the reals in terms of rationals, and then show that they are identifiable.
3.5.1 Dedekind's cuts

As a professor in the Polytechnic School in Zürich I found myself for the first time obliged to lecture upon the elements of the differential calculus and felt more keenly than ever before the lack of a really scientific foundation for arithmetic.
 Richard Dedekind, in the opening of the paper in which Dedekind's cuts were introduced

Dedekind defined a cut to determine a real number. A cut is a partition of a sequence into two disjoint non-empty subsequences, all the members of one of which are less than all the members of the other. Dedekind used the point at which the sequence is partitioned⁵ to define a real number.

Definition of a (Dedekind) cut: Given an ordering < on a set V, a subset C ⊆ V is a cut in V :=
1) C ≠ ∅ ∧ C ≠ V
2) (∀a, b : a, b ∈ V : a ∈ C ∧ b < a → b ∈ C)
3) C does not have a greatest element

Example: {x ∈ Q | x < 0 ∨ x² < 2} is a cut in Q. The negative rationals have to be included because of condition 2): the set {x ∈ Q | x² < 2} on its own contains 0 but not −2, so it is not a cut. Notice that we can also define the same cut as {x ∈ Q | x < 0 ∨ x⁴ < 4}.

Each cut then determines a unique real number (see paragraph 3.5.4): each real number r can be defined by a cut C in Q if r is the supremum of C. We want to identify cuts that define the same real number.

Definition of (Dedekind) cut equivalence: A cut C1 is equivalent to a cut C2, notation C1 ∼ C2 := there is an r that is the supremum of both C1 and C2

We can now define RDedekind as the set of all equivalence classes of all cuts in Q:
RDedekind := {C ⊆ Q | C is a cut in Q}/∼

⁵ Actually, Dedekind's original definition did not use a partition but a slightly more complex division. For details see the link 'Dedekind cuts' at http://zax.mine.nu/stage.
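A cut in Q is an infinite set, but it can be represented by its membership predicate. The Python sketch below (the names in_cut and larger_in_cut are our own) uses the cut {x ∈ Q | x < 0 ∨ x² < 2} for √2 and makes condition 3) constructive: for a > 0 with a² < 2 the rational (2a + 2)/(a + 2) is larger than a and still has a square below 2, since both (2a + 2)/(a + 2) > a and (2a + 2)² < 2(a + 2)² reduce to a² < 2.

```python
from fractions import Fraction

def in_cut(x: Fraction) -> bool:
    """Membership in the cut {x in Q | x < 0 or x^2 < 2}."""
    return x < 0 or x * x < 2

def larger_in_cut(a: Fraction) -> Fraction:
    """Return an element of the cut strictly larger than a (condition 3)."""
    assert in_cut(a)
    if a <= 0:
        return Fraction(1)  # 1 is in the cut, since 1 * 1 < 2
    # for 0 < a with a^2 < 2, this is larger than a and its square is still < 2
    return (2 * a + 2) / (a + 2)

a = Fraction(1)
for _ in range(4):
    a = larger_in_cut(a)
    print(a, float(a))
```

Iterating larger_in_cut never leaves the cut and never reaches a greatest element; the values approach the supremum √2 from below.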
Example: {x ∈ Q | x < 0 ∨ x² < 2} has √2 as supremum. We can identify the real number √2 with the equivalence class of all cuts in Q that have √2 as supremum.

3.5.2 Cantor's chains of segments

In mathematics the art of proposing a question must be held of higher value than solving it.
 A thesis defended in Cantor's doctoral examination

Cantor defined a chain of segments to determine a real number (see also [17, chapter 12]).

Definition of chain of segments: ⟨an, bn⟩V (n ∈ N) is a chain of segments (in V) :=
1) (∀n : n ∈ N : an ∈ V ∧ bn ∈ V)
2) (∀n : n ∈ N : an ≤ an+1 ≤ bn+1 ≤ bn)
3) (∀n : n ∈ N : bn − an ≤ 2^−n)

Note that ⟨an, bn⟩V (notation ⟨an, bn⟩ when it is clear which set V is meant) is actually a sequence of intervals, and that in 3) a lower bound is put on the speed of convergence.

Example: Consider the following chain of segments in Q: ⟨⟨1, 2⟩, ⟨1.4, 1.5⟩, ⟨1.41, 1.42⟩, ⟨1.414, 1.415⟩, ...⟩. This is a sequence of ever decreasing intervals in Q, the limit of which determines a unique real number. Each segment 'includes' √2.

We now want to be able to say when two chains are equivalent.

Definition of chain equivalence: The chains of segments ⟨an, bn⟩ and ⟨cn, dn⟩ are equivalent, notation ⟨an, bn⟩ ∼ ⟨cn, dn⟩ := (∀k : k ∈ N : bk ≥ ck ∧ dk ≥ ak), i.e. for every k the k-th segments of both chains overlap.

Theorem: ∼ is an equivalence relation on the set of all chains of segments of Q.
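The example chain for √2 can be generated exactly with rational arithmetic. In the sketch below (the names sqrt2_segment and is_chain_prefix are our own), an is √2 truncated to n decimal digits, computed with the integer square root math.isqrt, and bn = an + 10^−n; the check covers conditions 2) and 3) on a finite prefix of the chain only.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_segment(n: int) -> tuple[Fraction, Fraction]:
    """The n-th segment <a_n, b_n>: a_n is sqrt(2) truncated to n decimals."""
    scale = 10 ** n
    a = Fraction(isqrt(2 * scale * scale), scale)  # floor(sqrt(2) * 10^n) / 10^n
    return a, a + Fraction(1, scale)

def is_chain_prefix(segments: list[tuple[Fraction, Fraction]]) -> bool:
    """Check conditions 2) and 3) of the definition on a finite list of segments."""
    for n, (a, b) in enumerate(segments):
        if b - a > Fraction(1, 2 ** n):  # condition 3): b_n - a_n <= 2^-n
            return False
        if n + 1 < len(segments):
            a_next, b_next = segments[n + 1]
            if not a <= a_next <= b_next <= b:  # condition 2): nested segments
                return False
    return True

segments = [sqrt2_segment(n) for n in range(20)]
print(segments[:3])
```

Because the widths are 10^−n rather than 2^−n, these segments shrink faster than condition 3) requires.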
Each equivalence class of chains of segments in Q now determines uniquely a real number r. To be precise, r is determined by ⟨an, bn⟩∼ if (∀n : n ∈ N : an ≤ r ≤ bn); r then is the only real number with this property (see also paragraph 3.5.4). We can now define RCantor as the set of all equivalence classes of chains of segments in Q:
RCantor := ⟨an, bn⟩Q / ∼

3.5.3 Cauchy sequences

Men pass away, but their deeds abide.
 Augustin-Louis Cauchy, his last words, quoted in [22]

Cauchy defined a Cauchy sequence to determine a real number. His sequence of numbers defines a real by letting the numbers come closer to the real number in every step.

Definition of Cauchy sequence: With ≤ a partial order on a set V⁶, {an}V (n ∈ N) is a Cauchy sequence in V :=
1) (∀n : n ∈ N : an ∈ V)
2) (∀k : k ∈ N : (∃p : p ∈ N : (∀n, m : n, m ∈ N ∧ n, m > p : |an − am| ≤ 2^−k)))

We also denote a Cauchy sequence {an}n∈N simply by an.

Example: The informally defined sequence {1, 1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, ...} (using '...' to indicate an infinite continuation) is a Cauchy sequence. For each n ∈ N, an+1 lies closer to √2 than an.

We now want to be able to say when two Cauchy sequences are equivalent.

⁶ V is in general an ordered, commutative ring. We will not further discuss this here, and for the rest of this paragraph we take V = Q.
Definition of Cauchy sequence equivalence: The sequences an and bn are equivalent, notation an ∼ bn := limn→∞(an) = limn→∞(bn)

Note that in the definition of equivalence the hitherto undefined notion of a limit is used. With the following definition we can formalize this notion.

Definition of sequence convergence: A sequence {an}n∈N of elements of a set V is said to converge to a sequence {bn}n∈N, notation limn→∞(an) = limn→∞(bn) := (∀k : k ∈ N : (∃p, q : p, q ∈ N : (∀n, m : n, m ∈ N ∧ n > p ∧ m > q : |an − bm| < 2^−k)))

Note: convergence is usually defined in terms of real numbers: a sequence an converges to r, notation limn→∞(an) = r, if (∀k : k ∈ N : (∃p : p ∈ N : (∀n : n ∈ N ∧ n > p : |an − r| < 2^−k))). The number r is then called the limit of the sequence an. We cannot use such a definition here, because we have yet to define the reals. A sequence is said to diverge if it does not converge.

Theorem: Any convergent sequence {an}n∈N is bounded and has a unique limit.

Proof: First we prove (by contradiction) the uniqueness. Suppose the sequence has 2 limits, c and c′, with c ≠ c′. Take any k ∈ N. Then from the definition of convergence there is an integer p such that |an − c| < 2^−k if n > p. Also, there is an integer p′ such that |an − c′| < 2^−k if n > p′. Adding the two inequalities we get (using the triangle inequality (∀a, b :: |a + b| ≤ |a| + |b|)), for n > p ∧ n > p′:
|c − c′| = |(an − c′) + (c − an)| ≤ |an − c′| + |an − c| < 2 ∗ 2^−k
Hence, for all k ∈ N, |c − c′| < 2 ∗ 2^−k. Choosing k so large that 2 ∗ 2^−k ≤ |c − c′| gives a contradiction, so c = c′: the limit is indeed unique.

Now we prove boundedness. Let c be the limit and take, for example, k = 0. Then there is a p such that |aj − c| < 2^−0 = 1 for j > p. Again using the triangle inequality, we then have |aj| ≤ |aj − c| + |c| < 1 + |c| for j > p. So the sequence is bounded by M = max{|a0|, |a1|, ..., |ap|, 1 + |c|}.

Each real number can now be defined by an equivalence class of Cauchy sequences: r is determined by an∼ if r = limn→∞(an) for each sequence an from the equivalence class an∼.
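A Cauchy sequence of rationals for √2 that converges much faster than the decimal truncations is produced by Newton's method, a0 = 1, an+1 = (an + 2/an)/2; this choice of example is ours, not the text's. The sketch below computes the terms exactly as fractions and, for a given k, finds the smallest p for which all terms after index p in a finite prefix are within 2^−k of each other, i.e. condition 2) of the definition of a Cauchy sequence restricted to that prefix. The names newton_sqrt2 and cauchy_index are hypothetical.

```python
from fractions import Fraction

def newton_sqrt2(count: int) -> list[Fraction]:
    """The first `count` terms of a_0 = 1, a_{n+1} = (a_n + 2 / a_n) / 2."""
    terms = [Fraction(1)]
    while len(terms) < count:
        a = terms[-1]
        terms.append((a + 2 / a) / 2)
    return terms

def cauchy_index(terms: list[Fraction], k: int) -> int:
    """Smallest p such that |a_n - a_m| <= 2^-k for all n, m > p within the
    finite prefix `terms`. This inspects finitely many terms only."""
    bound = Fraction(1, 2 ** k)
    for p in range(len(terms)):
        tail = terms[p + 1:]
        # every pairwise distance in the tail is at most max(tail) - min(tail)
        if not tail or max(tail) - min(tail) <= bound:
            return p
    return 0  # only reached when `terms` is empty

terms = newton_sqrt2(8)
print(terms[:4])  # [Fraction(1, 1), Fraction(3, 2), Fraction(17, 12), Fraction(577, 408)]
print(cauchy_index(terms, 10))  # 2
```

The number of correct digits roughly doubles with every step, so very small p already suffice for large k.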
We can now define RCauchy as the set of all equivalence classes of Cauchy sequences in Q:
RCauchy := ⟨an⟩Q / ∼

3.5.4 Properties of the three definitions

Before these definitions for real numbers were given, we intuitively thought of the reals as infinite sequences of (decimal) digits, i.e. all infinite decimal expansions. In the rest of this section we assume that by R we mean this set of reals. We can now check whether the three new definitions indeed are correct ways to identify real numbers:
1) ⟨an, bn⟩Q is a chain of segments → (∃!c : c ∈ R : (∀n : n ∈ N : an ≤ c ≤ bn))
2) C is a cut in Q → (∃!c : c ∈ R : c = supremum(C))
3) {an}n∈N is a Cauchy sequence → (∃!c : c ∈ R : limn→∞(an) = c)

Then we can check for every newly defined set X of reals that:
a) it contains a countable, densely ordered set D without endpoints (i.e. (∀r1, r2 : r1, r2 ∈ D ∧ r1 < r2 : (∃q : q ∈ D : r1 < q < r2))), which is dense in X;
b) every Dedekind cut has a supremum in X.

It can be proven that if these two properties hold, we have defined the reals successfully: there is a total ordering on the reals, the reals are densely ordered and the ordering is continuous. Every set for which a) and b) hold is isomorphic (as an ordered set) with R. So if a definition satisfies a) and b), it possesses the properties we intuitively want the real numbers to have.
3.6 Infinite sets

Our minds are finite, and yet even in these circumstances of finitude we are surrounded by possibilities that are infinite, and the purpose of life is to grasp as much as we can out of that infinitude.
 A.N. Whitehead in [76]

The size of a finite set V, notation |V|, can be defined by the number of elements that it has. But counting the elements does not end for infinite sets. Cantor was concerned with the problem of measuring the sizes of infinite sets (because he was investigating questions about singularities of Fourier series, see [30, chapter 4]) and proposed a rather nice solution to this problem. He observed that two finite sets have the same size if the elements of one set can be paired with the elements of the other set. This method compares sets without resorting to counting, and it can be extended to infinite sets. This is the concept of an equivalence relation between sets (the relation is also referred to as 'are of the same cardinality', 'equipotent' or 'equipollent', see [30, page 229]).

This new method to measure the number of elements of a set is reflected in the notion of cardinality of a set, and it led to the surprising result that there are many levels of infinity. Before we present a proof of this result, using Cantor's famous diagonalization method, we first introduce some more definitions.

Definition of set equivalence: A set V is equivalent to a set W, notation V ∼ W := there is a bijection f : V → W

It is simple to check that ∼ has the properties of an equivalence relation, i.e. it is reflexive, symmetric and transitive. But if we consider ∼ to be a true relation, we need the concept of 𝒱, the set of all sets: ∼ ⊆ 𝒱 × 𝒱. But the existence of 𝒱 is paradoxical, see section 3.8.
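A classical example of two infinite sets that are equivalent in this sense, not discussed in the text, is N × N ∼ N. The Cantor pairing function enumerates the pairs diagonal by diagonal (first all pairs with x + y = 0, then x + y = 1, and so on). The sketch below (the names pair and unpair are our own) gives the bijection together with its inverse, which is exactly what the definition of V ∼ W asks for.

```python
from math import isqrt

def pair(x: int, y: int) -> int:
    """Cantor pairing N x N -> N: walk the diagonals x + y = 0, 1, 2, ..."""
    d = x + y
    return d * (d + 1) // 2 + y

def unpair(z: int) -> tuple[int, int]:
    """Inverse of pair: find the diagonal d, then the position y on it."""
    d = (isqrt(8 * z + 1) - 1) // 2  # largest d with d(d + 1)/2 <= z
    y = z - d * (d + 1) // 2
    return d - y, y

print([unpair(z) for z in range(6)])  # [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
```

Since unpair(pair(x, y)) = (x, y) and every z is reached, no element of N is left unpaired.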
In 1884. the cardinality of the set of reals can also be denoted by alephone. . .8 . Cantor deﬁned it as c. after the size of a set V . Cantor regarded cardinals as special abstract entities of a new kind. . the equivalence class of V modulo ∼. A more rigorous treatment of cardinal numbers will be given in section 3. notation ℵ0 . called the cardinal number of V . . This new concept enabled Cantor to deﬁne more concepts for the analysis of inﬁnite sets.1. notation ℵ1 . Frege also denoted ﬁnite cardinal numbers as natural numbers: ∅ = 0. to come to an exact deﬁnition of cardinality from this postulate. {∅}} = 2. such that V ∼ W ↔ V = W .7). This FregeRussell deﬁnition would become standard.52 CHAPTER 3. ℵ2 . Cantor used the Hebrew letter aleph to name the diﬀerent levels of inﬁnity. MATHEMATICAL CONSTRUCTS IN SETTHEORY Postulate for Cardinal numbers: With every set V is associated a welldeﬁned abstract entity V . we have • If V is ﬁnite: V = the number of elements of V • If V is inﬁnite: V = ℵi . It also inspired others to analyze the properties of inﬁnite sets. . We can think of V as denoting the common property of set equivalence (as deﬁned above) of all sets in the equivalence class of V . . {∅} = 1. . the German mathematician Frege came with his own deﬁnition of cardinal numbers. The postulate for cardinal numbers then follows at once. until . Since the cardinality of the set of reals was unknown. The cardinality of the set of natural numbers is by deﬁnition called alephnull or alephnough.. {∅. If we assume the continuum hypothesis (see section 3. that says there is no level of inﬁnity between the cardinality of N and R. He discussed it with the mathematician Russell and they proposed the idea of deﬁning V as V / ∼.. The ‘next levels’ of inﬁnity are called ℵ1 . when there exists a bijection between V and the set P i (N) Sometimes the cardinality of a set V is also denoted by  V  . It proved diﬃcult however. 
Property of cardinality: Given the cardinality V of a set V .it became known that this deﬁnition could also lead to a paradox.8.as we will later see in section 3.
No other question has ever moved so profoundly the spirit of man; no other idea has so fruitfully stimulated his intellect; yet no other concept stands in greater need of clarification than that of the infinite.
Hilbert, quoted in [96]

In the rest of this section we will present some of the results of the research of infinite sets.

Definition of finite: A set V is finite := (∃n : n ∈ N : V ∼ {x ∈ N | x < n})
Definition of infinite: A set V is infinite := V is not finite
Definition of Dedekind infinite: A set V is Dedekind infinite := (∃W : W ⊂ V : V ∼ W)

Theorem: V is Dedekind infinite ↔ V is infinite (from [17])
Proof: We use that V is infinite iff N ≤1 V. We prove the two implications of the theorem separately.

V is Dedekind infinite → V is infinite: V is Dedekind infinite, i.e. there exists a W ⊂ V such that V ∼ W, i.e. there exists a bijection f : V → W. Because W is a proper subset of V, there exists an a ∈ V such that a ∉ W. Consider the function g : N → V, defined recursively by g(0) = a and g(k + 1) = f(g(k)). We now have to show that g is an injection, i.e. for all i, j ∈ N : i ≠ j → g(i) ≠ g(j). We use induction on i:
• i = 0: if 0 ≠ j then j = l + 1 for some l, so g(j) = f(g(l)) ∈ W, whereas g(0) = a ∉ W. So g(0) ≠ g(j).
• i = k + 1: assume k + 1 ≠ j; then we prove g(k + 1) ≠ g(j) by induction on j:
  – j = 0: g(0) = a ∉ W and g(k + 1) ∈ W, so g(k + 1) ≠ g(0).
  – j = l + 1: we know k + 1 ≠ j = l + 1, so k ≠ l. By the induction hypothesis g(k) ≠ g(l). Since f is a bijection, we also have f(g(k)) ≠ f(g(l)), i.e. g(k + 1) ≠ g(l + 1), i.e. g(i) ≠ g(j).
Hence g is an injection from N to V, i.e. N ≤1 V, and therefore V is infinite.
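To make the construction of g concrete, take (as our own example, not one from the report) V = N, W = the even numbers and f(n) = 2n, a bijection from V to W. The element a = 1 lies in V but not in W, and the recursion g(0) = a, g(k + 1) = f(g(k)) gives g(k) = 2^k. The Python sketch computes a prefix of g and checks that it has no repetitions; a finite check only illustrates the inductive proof above, it does not replace it.

```python
from typing import Callable

def orbit(f: Callable[[int], int], a: int, n: int) -> list[int]:
    """Return [g(0), ..., g(n-1)] for g(0) = a and g(k+1) = f(g(k))."""
    values = [a]
    for _ in range(n - 1):
        values.append(f(values[-1]))
    return values

def f(v: int) -> int:
    return 2 * v        # a bijection from N onto the even numbers W

a = 1                   # a ∈ V but a ∉ W, since 1 is odd
g = orbit(f, a, 10)
print(g)                        # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
print(len(set(g)) == len(g))    # True: no repetitions in this prefix
```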
V is Dedekind infinite ← V is infinite: Since V is infinite, N ≤1 V, so there exists an injection f : N → V. We show that W := V − {f(0)}, clearly a proper subset of V (W ⊂ V), is equivalent to V (W ∼ V). The following function g is a bijection from V to W:
g(f(i)) = f(i + 1), for all i ∈ N
g(x) = x, if x ≠ f(i) for all i ∈ N

Definition of countable: A set V is countable, also called denumerable := V is finite or V ∼ N
Definition of uncountable: A set V is uncountable := V is not countable
Definition of denumeration: A denumeration of a set V is a bijection f : N → V

Cantor then proved that N, Z and Q all have the same cardinality, and called these sets countably infinite.

Theorem: Q is countable
Proof: We give a bijection from N to Q by listing all elements of Q. Consider a table with all fractions a/b (a ∈ N, b ∈ N+), with fraction a/b in row a and column b. If we list all elements row by row, we would not obtain a correspondence between N and Q, since the list would never get to the second row. By listing the elements along the diagonals (southwest to northeast), starting from the northwest corner, we do obtain a correspondence between N and Q; we hereby skip an element when it would cause a repetition (for example, 2/2 is skipped because 2/2 = 1/1). This covers the non-negative fractions; the negative ones can be included by listing −a/b right after each a/b ≠ 0. We can also give a bijection from Q to an infinite subset of N, which is equivalent to N: for each fraction a/b ∈ Q with a and b relatively prime, let f(⟨a, b⟩) := ½(a + b)(a + b + 1) + b.

An example of an uncountable set is the set of real numbers R. In 1873 Cantor proved that R is uncountable, using a technique called diagonalization (also known as the diagonal method), see [17, page 99].

Theorem: R is uncountable
Proof: Suppose there is a bijection f between N and R. We contradict this by finding an x in R that is not paired with anything in N. We construct this x by taking the first fractional digit of x arbitrarily, but never 0 or 9 or the first fractional digit of f(0); the second fractional digit of x also different from 0, 9 and the second fractional digit of f(1); etc. Continuing this way down the diagonal of the table of digits, we obtain all digits of x. Now x is not f(n) for any n, because the (n + 1)-th fractional digit of x differs from the (n + 1)-th fractional digit of f(n). Note that we avoid the problem of certain numbers, such as 2.3999... and 2.4000..., being equal by never selecting a 9 or a 0.

We can define an ordering relation ≤1 on the cardinalities of sets. We say that V ≤1 W if there is an injection from V to W. Then V <1 W of course means that V ≤1 W holds but not V ∼ W. This relation on the set of cardinals only depends on the cardinals themselves and not on the choice of the particular sets V and W. The relation ≤1 is reflexive and transitive. Cantor also conjectured that ≤1 is a partial order, i.e. that V ≤1 W and W ≤1 V imply V ∼ W. This was later proven independently by the two mathematicians F. Bernstein and E. Schröder (see [59, page 39]). Similarly, we can use the diagonalization method to show that N is not equivalent to {0, 1}^N.

We give two theorems that are based on the relation <1:

Theorem (without proof): (∀V : V is a nonempty set : V <1 P(V))

Theorem: (∀V :: P(V) ∼ {0, 1}^V) (see [17, page 98])
Proof: We show that there is a bijection K from P(V) to {0, 1}^V. For W ⊆ V, define K(W) (also denoted K_W), the characteristic function of W, as:
K_W(v) = 1 if v ∈ W
K_W(v) = 0 if v ∉ W
We now show that K is a bijection from P(V) to {0, 1}^V:
1) K is injective: let W1, W2 ⊆ V and suppose W1 ≠ W2. That means there is an element w ∈ V such that (w ∈ W1 ∧ w ∉ W2) ∨ (w ∉ W1 ∧ w ∈ W2). Then we have that (K_W1(w) = 1 ∧ K_W2(w) = 0) ∨ (K_W1(w) = 0 ∧ K_W2(w) = 1), and thus (∃w : w ∈ V : K_W1(w) ≠ K_W2(w)), i.e. K_W1 ≠ K_W2.
2) K is surjective: suppose g ∈ {0, 1}^V. Let W_g = {v ∈ V | g(v) = 1}. Then (∀v : v ∈ V : K_Wg(v) = 1 ↔ g(v) = 1), thus (∀v : v ∈ V : K_Wg(v) = g(v)) and g = K_Wg.
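The diagonal construction in the proof that R is uncountable can be tried out on any finite list of decimal expansions. The Python function below is a hypothetical helper that works on strings of fractional digits instead of real numbers (an assumption made for illustration). For each position n, counting from 0, it picks a digit different from the n-th digit of the n-th listed expansion and never uses 0 or 9, so the resulting x differs from every listed number.

```python
def diagonal_number(expansions: list[str]) -> str:
    """Given fractional digit strings d_0, d_1, ..., return the digits of an x
    whose n-th digit differs from the n-th digit of d_n; 0 and 9 are never used."""
    digits = []
    for n, d in enumerate(expansions):
        nth = d[n]
        # Using only the digits 1 and 2 keeps x away from the 0.xx999... = 0.xx000... ambiguity.
        digits.append("2" if nth == "1" else "1")
    return "".join(digits)

listed = ["1415926", "7182818", "4142135", "5772156"]  # first digits of a few reals
x = diagonal_number(listed)
print("0." + x)                 # 0.2211
for n, d in enumerate(listed):
    assert x[n] != d[n]         # x differs from the n-th listed number at position n
```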
Theorem: V is Dedekind infinite ↔ N ≤1 V
Proof: This theorem follows directly from the theorem on page 53 and the definition of infinite.

Although we have seen that N is countable but R is not, we might still think that there is some smaller interval of the reals that can be paired to the naturals.

Theorem: N ≁ [0, 1]
Proof of Poincaré (see [17]): We show there is no bijection f : N → [0, 1], in particular (∀f : f : N → [0, 1] : f is not surjective). We do this by constructing, for every function f : N → [0, 1], a y ∈ [0, 1] such that (∀n : n ∈ N : f(n) ≠ y). Let f : N → [0, 1]. We construct this y by means of a chain of segments (see paragraph 3.5.2). Let S_n be an infinite chain of segments such that
1) (∀i : i ∈ N : f(i) ∉ S_i)
2) (∀i : i ∈ N : S_{i+1} ⊆ S_i)
3) (∀i : i ∈ N : |S_i| = 3^(−i−1)), with |S_i| being the length of segment S_i.
We can construct such a chain of segments: S_0 is one of the three equal parts of [0, 1] that does not contain f(0), and if we divide a segment S_n = [p_n, q_n] in three equal parts (i.e. each part has length 3^(−n−2)), at least one of these parts does not contain f(n + 1). We take this part for S_{n+1}. The constructed chain of segments determines (see paragraph 3.5.2) a real number y with (∀n : n ∈ N : y ∈ S_n), and thus certainly y ∈ [0, 1]. We also have that (∀n : n ∈ N : f(n) ∉ S_n ∧ y ∈ S_n), so (∀n : n ∈ N : y ≠ f(n)).

The following theorem gives a way to prove the equivalence of sets:

Theorem of Cantor-Bernstein: V ≤1 W ∧ W ≤1 V → V ∼ W
Proof: Assume V ≤1 W and W ≤1 V. Then there are injections f : V → W and g : W → V. Since g is injective and Dom(g) = W, g is a bijection from W onto Ran(g), so W ∼ Ran(g). Since Ran(g) ⊆ V and g ∘ f is an injection from V to Ran(g), we have V ≤1 Ran(g). And since for all W and V, W ⊆ V ∧ V ≤1 W → V ∼ W (see the lemma below), we have Ran(g) ∼ V. By the symmetry and transitivity of ∼ we conclude V ∼ W.
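The trisection step in Poincaré's proof of N ≁ [0, 1] can be tried out with exact rational arithmetic. In the Python sketch below (our own helper; it only handles a finite prefix f(0), ..., f(n−1) of the enumeration, and the sample values are arbitrary), each step splits the current closed segment into three closed thirds and keeps the first third that does not contain the next value. The midpoint of the last segment then lies in every S_i and differs from every listed value.

```python
from fractions import Fraction

def trisect_avoiding(values: list[Fraction]) -> list[tuple[Fraction, Fraction]]:
    """Build S_0 ⊇ S_1 ⊇ ... with values[i] ∉ S_i and |S_i| = 3^(-i-1)."""
    lo, hi = Fraction(0), Fraction(1)          # start from [0, 1]
    segments = []
    for v in values:
        third = (hi - lo) / 3
        parts = [(lo + k * third, lo + (k + 1) * third) for k in range(3)]
        # v lies in at most two of the closed thirds, so some third avoids it.
        lo, hi = next((p, q) for p, q in parts if not (p <= v <= q))
        segments.append((lo, hi))
    return segments

f_prefix = [Fraction(1, 2), Fraction(0), Fraction(1, 3), Fraction(5, 27)]
segs = trisect_avoiding(f_prefix)
y = (segs[-1][0] + segs[-1][1]) / 2            # a point in every S_i of the prefix
print(segs[-1], y)
```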
Lemma: W ⊆ V ∧ V ≤1 W → V ∼ W
Proof: Suppose W ⊆ V and V ≤1 W, so there is an injection h : V → W. Let A_0 := V − W and (∀n : n ∈ N : A_{n+1} := h(A_n)). We now give the desired bijection k : V → W:
• k(a) := a if a ∉ ⋃_n A_n
• k(a) := h(a) if a ∈ ⋃_n A_n
Note that k indeed maps into W: if a ∉ ⋃_n A_n, then a ∉ A_0, so a ∈ W. We show that k is a bijection:
• k is injective: Suppose a ≠ b. Then k(a) ≠ k(b), by a case analysis on a ∈ ⋃_n A_n ∧ b ∈ ⋃_n A_n, a ∈ ⋃_n A_n ∧ b ∉ ⋃_n A_n, a ∉ ⋃_n A_n ∧ b ∈ ⋃_n A_n and a ∉ ⋃_n A_n ∧ b ∉ ⋃_n A_n. In all cases it follows that k(a) ≠ k(b) by the definition of k and the A_n, and the injectivity of h.
• k is surjective: Suppose w ∈ W. Again we use case analysis:
 – if w ∉ ⋃_n A_n, then w = k(w).
 – if w ∈ ⋃_n A_n, assume w ∈ A_p. Since w ∈ W, w ∉ A_0, thus p ≥ 1. Thus there is a w' ∈ A_{p−1} such that w = h(w') = k(w').

Example: We prove that (a, b) ∼ [0, 1] for all a, b ∈ R with a < b, by using the theorem of Cantor-Bernstein. We first prove that (0, 1) ∼ [0, 1].
Proof of (0, 1) ≤1 [0, 1]: The identity function id_(0,1) : (0, 1) → [0, 1] is an injection from (0, 1) to [0, 1].
Proof of [0, 1] ≤1 (0, 1): The function f(x) = ⅓(x + 1) is an injection from [0, 1] to (0, 1).
By the theorem of Cantor-Bernstein we now know that (0, 1) ∼ [0, 1]. The function f(x) = (b − a)x + a is a bijection from (0, 1) to (a, b), so (0, 1) ∼ (a, b). By the symmetry and transitivity of ∼ we can conclude that (a, b) ∼ [0, 1]. Using the Cantor-Bernstein theorem we can also prove that (0, 1) ∼ R ∼ R^n ∼ {0, 1}^N ∼ P(N) ∼ N^N.
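The bijection k of the lemma can be computed explicitly on a small discrete example of our own choosing: V = N, W = N − {0} (so W ⊆ V) and the injection h(n) = 2n + 2 from V to W. Then A_0 = V − W = {0}, A_1 = {2}, A_2 = {6}, ..., and k moves exactly the elements of ⋃_n A_n one step along h. The Python sketch decides membership of the union by following h upward from A_0, which only works here because this particular h is strictly increasing; it then checks injectivity and surjectivity of k on an initial segment.

```python
def h(n: int) -> int:
    return 2 * n + 2                 # an injection from V = N into W = N − {0}

def in_union_of_A(a: int) -> bool:
    """Is a in A_0 ∪ A_1 ∪ ..., where A_0 = {0} and A_{n+1} = h(A_n)?"""
    x = 0                            # the only element of A_0 = V − W
    while x < a:                     # h is increasing, so we can stop once x >= a
        x = h(x)
    return x == a

def k(a: int) -> int:
    return h(a) if in_union_of_A(a) else a

LIMIT = 200
images = [k(a) for a in range(LIMIT)]
print(images[:8])                                # [2, 1, 6, 3, 4, 5, 14, 7]
assert len(set(images)) == LIMIT                 # k is injective on 0..LIMIT-1
assert 0 not in images                           # k maps into W = N − {0}
# every w in W below LIMIT has a preimage below LIMIT (preimages never exceed w)
assert set(range(1, LIMIT)) <= set(images)
```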
Theorem: V is infinite → N ≤1 V
Proof: V is infinite and thus not empty. We take one element x_0 ∈ V. Next, we take an element x_1 ∈ V − {x_0}. We can repeat this infinitely (i.e. for all n we can select an x ∈ V − {x_0, ..., x_n}, which is nonempty because V is infinite), if we assume that it is possible to always select an element from any nonempty set (see the axiom of choice below). In this way we get a countable subset of V, namely {x_0, x_1, x_2, ...}, and the function n ↦ x_n is an injection from N to V.

The only assumption we have made here is the so-called axiom of choice.

Axiom of choice (AC): Given any set W of nonempty sets V, there is a function f which assigns to each member V of W an element f(V) of V. Such a function f is called a choice function for W.

Zermelo stated this as an axiom in an article of 1908 (translated in [93, pages 199-215]). Zermelo regarded the AC as already implicitly used by mathematicians. In response, people asked where this assumption came from in mathematics, when it is implicitly used, and when exactly it can or cannot be avoided. Zermelo had already used AC in his proof of the well-ordering theorem of 1904, and the controversy over that proof (see [63, page 310]) led Zermelo to axiomatize set theory (see section 5.3).

The axiom can be restricted by limiting it to those families W of a particular cardinality. Since the axiom is provable for any finite W, the weakest nontrivial case occurs when W is denumerable (see page 54 for the definition of denumerable). This case is known as the Denumerable axiom (or axiom of countable choice). We can add AC to set theory based on the axioms of Zermelo and Fraenkel (ZF, see section 5.3), in which case it is termed ZFC (ZF supplemented by the Axiom of Choice). For more details on the role of the AC, we refer to section 5.3 and [63]. See http://zax.mine.nu/stage and click on 'links' for some quotes about the AC.

An instance of the following theorem (without proof) of the British mathematician F.P. Ramsey is often used in graph theory. The notation [V]^n in this theorem denotes the set of all subsets of V with n elements, i.e. [V]^n := {X ⊆ V | X has n elements}.

Theorem of Ramsey: If V is a denumerable set and f : [V]^n → {0, 1, ..., m − 1} with n, m ∈ N and n, m ≥ 1, then (∃W : W ⊆ V : W is denumerable and f is constant on [W]^n).
Theorem: R² ∼ R ∼ (0, 1)
Proof: To show R ∼ (0, 1) it suffices to give a bijection between (0, 1) and R. Indeed, the function f : (0, 1) → R defined as f(x) = tan(π/2 · (2x − 1)) is such a bijection. Thus: R ∼ (0, 1).
Since R ∼ (0, 1), it suffices to consider an element of (0, 1)², that is two real numbers between 0 and 1. We can map these numbers to an element r ∈ (0, 1) by alternately taking the next digit of each of the two numbers, i.e. we map (0.a_1a_2..., 0.b_1b_2...) to r = 0.a_1b_1a_2b_2.... For example, (0.76584..., 0.13275...) is mapped to r = 0.7163528745.... We want to identify numbers such as 0.3999... and 0.4000...; therefore we only use expansions for which there is no i ∈ N such that a_n = 9 for all n > i. With this convention the mapping is well-defined and injective. Since x ↦ (x, ½) is an injection in the other direction, the theorem of Cantor-Bernstein gives (0, 1)² ∼ (0, 1), and thus R² ∼ R. Since ∼ is transitive, we know that R² ∼ R ∼ (0, 1).

Theorem: P(N) ∼ (0, 1)
Proof: First we show that P(N) ≤1 R. Suppose V ∈ P(N); map V to the decimal 0.a_0a_1a_2..., with a_i = 1 if i ∈ V and a_i = 0 otherwise. Since only the digits 0 and 1 occur, different sets give different numbers, so this injection proves that P(N) ≤1 R, and hence P(N) ≤1 (0, 1). Now we give an injection from (0, 1) to P(N): assume r ∈ (0, 1), r = 0.a_1a_2... with 0 ≤ a_i ≤ 9 (again without an infinite tail of 9s). Then we map r to the set {1a_1, 1a_1a_2, 1a_1a_2a_3, ...} of natural numbers, written in decimal notation. For example, r = 0.17803... is mapped to the set {11, 117, 1178, 11780, 117803, ...}. Thus (0, 1) ≤1 P(N), and by the theorem of Cantor-Bernstein P(N) ∼ (0, 1).

Corollary: P(N) ∼ R
Proof: This directly follows from P(N) ∼ (0, 1), (0, 1) ∼ R, and the transitivity of ∼.
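The two digit manipulations used in the proofs above can be sketched on finite digit strings. In the Python below (hypothetical helper names; a real number is represented only by a finite prefix of its decimal expansion, which is an assumption made for illustration), interleave implements the map from a pair of reals in (0, 1) to one real by alternately taking digits, and prefix_set implements the map r ↦ {1a_1, 1a_1a_2, ...} from (0, 1) to P(N). Both reproduce the examples in the text.

```python
def interleave(x_digits: str, y_digits: str) -> str:
    """Alternate the digits of 0.x1x2... and 0.y1y2... into 0.x1y1x2y2..."""
    return "".join(a + b for a, b in zip(x_digits, y_digits))

def prefix_set(r_digits: str) -> set[int]:
    """Map 0.a1a2... to {1a1, 1a1a2, ...}; the leading 1 keeps leading zeros visible."""
    return {int("1" + r_digits[: n + 1]) for n in range(len(r_digits))}

print("0." + interleave("76584", "13275"))  # 0.7163528745
print(sorted(prefix_set("17803")))          # [11, 117, 1178, 11780, 117803]
```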
3.7 The Continuum Hypothesis

We still think that the study of the size of the continuum should be our guiding light for further research in set theory.
Judah Haim in [33]

After showing that the real numbers cannot be put into one-to-one correspondence with the natural numbers (see section 3.6), Cantor hypothesized in 1877 that each infinite subset of R is either denumerable or equivalent to the continuum. This hypothesis was first published in 1878 in [13] and later became known as:

The Continuum Hypothesis (CH): (N ≤1 A ≤1 R) → (A ∼ N ∨ A ∼ R)

This hypothesis (as given in [17, page 128]) is also known in many other forms, of which we will mention and explain the most important. We can immediately see that the following version of CH is equivalent to the given definition: (A) 'any set of real numbers is either finite, countably infinite or has the same cardinality as the entire set of reals'.

As we saw in section 3.6, Cantor defined the cardinality of the natural numbers to be ℵ0, and the next levels of infinity to be ℵ1, ℵ2, ℵ3, etc. He also named the cardinality of the reals c, for continuum. Cantor's original formulation of CH was: (B) c = ℵ1. This means that 'the number of real numbers is the next level of infinity above the number of natural numbers' (see also [30, page 197]). Since Cantor also proved that P(N) ∼ R (see page 59), we can also state CH as: (C) P(N) ∼ ℵ1. Actually, in this formulation we have identified the cardinalities ℵ0 and ℵ1 with the sets that have these cardinalities. The cardinality of the power set of any set X, often denoted as 2^X, is equal to the cardinality of {0, 1}^X (see page 55), so another formulation⁷ of CH is: (D) 2^ℵ0 = ℵ1 (see [31]).

We will not go into details of less precise or more dependent formulations, such as 'what is the cardinality of the set of points on a geometrical line?'.

⁷ These formulations, although (B) leads us to think about sizes of reals, (C) about subsets and (D) about cardinal exponentiation, are all equivalent in ZFC.
In 1908 the German mathematician Felix Hausdorff proposed the following generalization of CH (which is also called the aleph-hypothesis):

The Generalized Continuum Hypothesis (GCH): (∀r : r is an ordinal : 2^ℵr = ℵr+1)

For a definition and the notation of ordinal numbers, we refer to section 3.8.2. Some of the theory that is needed in the remaining part of this section will be introduced in later chapters. If you are not familiar with the notations that are used, you might want to skip the remaining part of this section and get back to it later.

Obviously, we have ZF + GCH ⊢ CH (for ZF see section 5.3). Note that ZF + GCH ⊢ AC (so we don't need ZFC once we have GCH); for details see [31, page 117].

Cantor and many other great mathematicians spent years trying to prove CH or its negation (Cantor tried to prove his hypothesis by using a decomposition theorem, but did not succeed). This problem was so important that Hilbert (see section 6.2) put it first in his list of 23 problems. Since in the same period Gödel proved his famous incompleteness theorem (see chapter 8), people suspected that CH was one of the statements (of ZFC) that can neither be proved nor disproved. In 1938 significant progress was made when Gödel proved that CH is consistent with ZFC (see section 5.2) by constructing a model of ZFC + CH; he later discussed the problem in his article 'What is Cantor's continuum problem?'.

Mathematicians suspected that CH was undecidable in ZFC, but it took until 1963 before this was proved by Paul Cohen in [15]. To do that he used a new technique called forcing. Forcing is a combinatorial technique for proving statements consistent with the axioms of set theory. Essentially it consists of a method of performing the following algorithm: start with a model M of set theory. Construct an object X not in M with certain properties. Consider the smallest model M' with X an element of M' and M a subset of M' (this is done in such a way that the construction of M' is implicit in the construction of X). Cohen used forcing to prove that the negation of AC and the negation of CH are consistent with the axioms of set theory (AC and CH were already known to be consistent). Thus Cohen constructed a model of ZFC + ¬CH and this, along with Gödel's model of ZFC + CH, showed that CH is undecidable in ZFC. For more details on forcing, see [51] and [81].

So either CH or ¬CH could be added as an axiom of ZFC. But since neither of these statements seems 'self-evident', they have, unlike AC, not been adopted as axioms of set theory. Mathematicians either accept this incompleteness in set theory or try to find more intuitive axioms that will help decide it. In other words, the question remains what intuitive axiom of set theory we need to make it more complete, and whether, with some axiom system for set theory, the continuum hypothesis is true.
3.8 Cardinal and Ordinal numbers and Paradoxes

Every transfinite consistent multiplicity, that is, every transfinite set, must have a definite aleph as its cardinal number.
Georg Cantor

3.8.1 Cardinal numbers and Cantor's Paradox

In section 3.6 we already encountered cardinal numbers and the notion of set equivalence. After defining the equivalence of sets (see page 51), Cantor realized that all sets that are equivalent to a given set V have a common property, a property that abstracts from the nature and order of the elements of a set. He identified this property with the cardinal number V̄ of a set V.

Example: Consider the following sets: A = {1, 2, 3}, B = {3, 2, 1}, C = {{4}, 7, {a, b}}, D = {1, 2, {4}}. We can say that A ∼ B ∼ C, or (equivalently) Ā = B̄ = C̄. We also have A ∼ D, or Ā = D̄. Note that in this example the equality '=' between cardinal numbers is a new type of equality that is defined as Ā = B̄ ↔ A ∼ B.

We can see that cardinality abstracts from the order and nature of the elements, and for finite sets the cardinal number can be identified with the ordinary 'number of elements'. Therefore we identify the cardinal number of a finite set of n elements with the natural number n. We denote the smallest infinite (or transfinite) cardinal number by ℵ0; this is the cardinal number of N or of any denumerable infinite set. Cantor defined the 'next' levels of infinity by ℵ1, ℵ2, ....

The next question was how to pass from the abstract notion of cardinal numbers to real cardinal numbers, i.e. one wanted to regard cardinal numbers as objects of the mathematical system. It turned out to be quite a problem to define the cardinal V̄ of a set V as an object of set theory. In naive set theory, as well as in Quine's 'New Foundations' (see section 7.3), the definition of the cardinal V̄ of V poses no problem: V̄ can be defined as the set of all sets equivalent to V. But this definition of cardinal numbers (first given by Frege, see page 52, and used in section 3.6) can lead to a paradox that was first found by Cantor.

Cantor's paradox: The set of all sets is its own power set. Therefore, by Cantor's theorem (the cardinality of a set is lower than that of its power set), the cardinality of the set of all sets must be bigger than itself.

With this paradox the need arose to find a new definition of cardinals in a context without the unrestricted comprehension axiom, such that the traditional paradoxes could no longer be derived. In axiomatic set theory (e.g. in ZF, see section 5.3), which lacks the unrestricted comprehension axiom, there is no set which contains all sets equivalent to a nonempty set V. Several new definitions of cardinal numbers were then proposed. With the axiom of choice we can develop the theory of ordinals in the von Neumann way and define V̄ to be the least ordinal α equivalent to V. If we have the axiom of foundation among our axioms, then even if the axiom of choice is absent we can define V̄ as the set of all sets W of least rank among those equivalent with V (see [1]). In the absence of the axioms of choice and foundation the operation V̄ is undefinable (see [1]).

The following definition, which comes from the mathematician von Neumann, is now the standard definition for cardinal numbers; it is based on ordinal numbers (for which we refer to the next section⁸).

Definition of Cardinal number (or initial number): A cardinal number α := an ordinal number α with property (∀γ :: α ∼ γ → α ≤ γ)

For each set V we can prove (see [17, section 2.10]) that there exists exactly one cardinal number α satisfying V ∼ α (the proof uses AC). The existence of such an α is guaranteed by the well-ordering theorem. We call this unique α the cardinality or cardinal number of the set V, and it is also denoted by V̄. For more information on the definition and calculus of cardinal numbers, we refer to [59, chapter 6], [25] and [34].

⁸ The rest of this section depends on concepts that are defined in later chapters.
3.8.2 Ordinal numbers and Burali-Forti's Paradox

We already introduced Cantor's concept of cardinal number in section 3.8.1, and saw in the previous paragraph that it abstracts from the order and nature of the elements of a set. Cantor also defined a property of sets that only abstracts from the nature of the elements of a set, but retains the order in which they are given. Here we consider sets with a total ordering (see page 25). We consider two ordered sets V and W similar, notation V ≃ W, if there is a bijection between V and W that retains all order relations. Note that we have already seen this relation with the concept of isomorphism ('is isomorphic to', see page 31), and that ≃ is an equivalence relation. Instead of saying that two sets are similar, we can also say that they are of the same order type. In the case of ordered sets, the concept of equivalence is thus replaced by the sharper concept of similarity.

Definition of an Order Type: An equivalence class under the relation ≃ (isomorphism). The equivalence class to which an ordered set V belongs is called the order type of V.

Recall that in a well-ordered set, in addition, each nonempty subset has a first member in the given ordering (see also section 3.2). All well-ordered sets that are similar to a given set V have a common property, a property that only abstracts from the nature of the elements of a set. Cantor identified this property with the ordinal number of the well-ordered set V. In 1883 Cantor defined in [13] an ordinal number as the order type of a well-ordered set. And just as for cardinals (see section 3.8.1), the question was posed how to define ordinal numbers as part of set theory.

Definition of Ordinal Number (Cantor): A well-ordered set V has ordinal number o := o is the order type of V

If a set is finite and simply ordered, it is well-ordered and it has an ordinal number. For a finite set this ordinal number is the same regardless of the order of its elements; we can therefore identify the (finite) cardinal number with the ordinal number. The smallest infinite ordinal number is called ω. This is the ordinal number of the sequence {0, 1, 2, 3, ...}, which can be seen as N or as the sequence of finite cardinal numbers in their 'natural' order. The natural numbers in reverse order are denoted *ω. We introduce some other transfinite order types by example (from [10, page 66]):

Example:
{1, 2, 3, 4, ...} has order type ω
V1 = {2, 3, 4, ..., 1} has order type ω + 1
V2 = {3, 4, 5, ..., 1, 2} has order type ω + 2
V3 = {1, 3, 5, ..., 2, 4, 6, ...} has order type ω + ω = ω ∗ 2
V4 = {..., 3, 2, 1} has order type *ω
V5 = {1, 3, 5, ..., ..., 6, 4, 2} has order type ω + *ω
V6 = {1, 11, 21, ..., 2, 12, 22, ..., ..., 10, 20, 30, ...} has order type ω ∗ 10

For ordinal numbers n of N and m of M we say that n < m if the well-ordered set N is similar to a proper initial segment of M.

Example: 0 = ∅, 1 = {0}, 2 = {0, 1}, 3 = {0, 1, 2}, etc. are ordinal numbers. If we call the set ∅ '0', the next set '1', etc., then consider the union of all the sets {0}, {0, 1}, {0, 1, 2}, .... This is another ordinal, called ω, and it is the first non-finite ordinal. It has a successor ω ∪ {ω}, called ω + 1. More ordinals can be obtained by continuing this succession, and taking the union of all these ordinals yields an ordinal we call ω ∗ 2.

Unfortunately, a situation similar to that for cardinal numbers was found for ordinal numbers. In 1897 Burali-Forti, the Italian assistant of the mathematician Peano, found that this definition can give rise to a paradox (see [18, page 259]).
The Burali-Forti Paradox: The set of all ordinal numbers, taken in their natural order, forms a well-ordered series, and therefore also has an ordinal number Ω. But the ordinal number of a set of ordinals that contains, together with each ordinal, all smaller ordinals exceeds every number of that set. The set of all ordinals is such a set, and therefore Ω exceeds any ordinal number whatsoever, in particular Ω itself.

This led to new proposals for definitions of ordinal numbers. Hence we hereunder present another definition, given by John von Neumann in [61]. In 1923 he pointed out that among all well-ordered sets having a Cantorian ordinal as their order type, there is a particular one with some very special properties. Von Neumann defined this particular set as the ordinal of that order type.

Definition of ordinal number: A set α is an ordinal number :=
1) α is a well-ordered set with the binary relation ∈ as its ordering
2) (∀β :: β ∈ α → β ⊂ α)

According to this definition, the empty set is an ordinal number. This ordinal number is also denoted by 0. Similarly we denote the ordinal numbers {0} by 1, {0, 1} by 2, {0, 1, 2} by 3, etc. Otherwise said: 0 = ∅, 1 = {∅}, 2 = {∅, {∅}}, .... These ordinal numbers, which are finite sets, are called finite ordinal numbers. The finite ordinal numbers are identified with the natural numbers. The set ω = {0, 1, 2, ...} of all natural numbers is also an ordinal number. An ordinal number that is an infinite set, like ω, is called a transfinite ordinal number.

Definition of ordinal number of a well-ordered set V: The ordinal number of a well-ordered set V := the ordinal number isomorphic to V.

For every well-ordered set V, there exists exactly one ordinal number isomorphic to V. With this definition of ordinal numbers, the Burali-Forti paradox can no longer be applied: the collection of all ordinals is well-ordered by ∈ and also satisfies 2) (a proof is given in [59, section 4.2]), so if it were a set it would itself be an ordinal number, and hence an element of itself. The argument of Burali-Forti therefore only shows that there is no set of all ordinals.
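Von Neumann's finite ordinals can be built directly as nested sets. The Python sketch below (our own illustration, using frozenset so that sets can be elements of sets) constructs 0, 1, 2, ... by the successor step α ↦ α ∪ {α} and checks condition 2) of the definition, that every element of α is a proper subset of α, as well as that ∈ linearly orders the elements of α. For a finite α a linear order is automatically a well-ordering, so this covers condition 1) for these small cases only.

```python
def successor(alpha: frozenset) -> frozenset:
    """The next ordinal: alpha ∪ {alpha}."""
    return alpha | frozenset([alpha])

def finite_ordinals(n: int) -> list[frozenset]:
    """Return [0, 1, ..., n-1] as von Neumann ordinals."""
    ords = [frozenset()]                         # 0 = ∅
    while len(ords) < n:
        ords.append(successor(ords[-1]))
    return ords

def is_transitive(alpha: frozenset) -> bool:
    """Condition 2): every β ∈ α satisfies β ⊂ α."""
    return all(beta < alpha for beta in alpha)   # '<' on frozensets is proper subset

def linearly_ordered_by_membership(alpha: frozenset) -> bool:
    """For distinct β, γ ∈ α, exactly one of β ∈ γ and γ ∈ β holds."""
    return all((b in c) != (c in b) for b in alpha for c in alpha if b != c)

ords = finite_ordinals(5)
print([len(a) for a in ords])                    # [0, 1, 2, 3, 4]: n has exactly n elements
print(all(is_transitive(a) and linearly_ordered_by_membership(a) for a in ords))  # True
```

For comparison, the set {1} = {{∅}} is not an ordinal: its element {∅} is not a subset of it, so is_transitive returns False.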
A detailed treatment of ordinal calculus that is based on this definition of ordinal numbers is outside the scope of this report. In the remainder of this section we will only define the most common concepts.

As we saw in section 3.2, we also write α ∈ β (we denote ordinals by lowercase Greek letters) as α < β, which defines an ordering on the ordinal numbers. The ordering ≤, defined by α ≤ β := α < β ∨ α = β, is a linear ordering and a well-ordering of the ordinal numbers, and the ordering of the finite ordinal numbers coincides with the usual ordering of the natural numbers. Therefore we can apply transfinite induction (see page 37) on ordinal numbers. The least ordinal number is of course 0. The least transfinite ordinal is ω (see also section 5.2).

For any ordinal number α, the set α + 1 = {γ | γ ≤ α} = α ∪ {α} (called a segment) also is an ordinal number, and α is the unique predecessor of α + 1. A transfinite ordinal without a predecessor is called a limit ordinal number, and all the other ordinal numbers are called isolated ordinal numbers. The first limit ordinal number is ω. For any set V of ordinal numbers, the supremum of V, {γ | (∃η : η ∈ V : γ < η)}, is an ordinal number.

Ordinal numbers form the basis of transfinite induction, which is a generalization of the principle of induction. This means that ordinals give us a way of 'counting' any set, even if it is not finite. Rigorous study has produced a complete calculus of ordinal numbers and significant results. A full treatment of the theory of ordinal numbers is omitted here. We only mention the so-called well-ordering theorem, which Cantor had accepted as true (see [18, page 257]) but which was first proved rigorously by Zermelo in 1904.

Well-Ordering Theorem: Every set can be well-ordered.

The particular significance of the well-ordering theorem lies in the possibility to apply the principle of mathematical induction (which is well known for denumerable sets, see section 3.4.3) to any arbitrary well-ordered set.
We now have the following properties (given without proof):
• Two finite and ordered sets have the same order type if and only if they have the same cardinal number
• Cantor's theorem: the cardinality of any set is lower than the cardinality of the set of all its subsets (i.e. there is no highest aleph)
• If two sets have the same ordinal number, they have the same cardinal number, but not necessarily vice versa

For more information and theory on cardinal numbers, ordinal calculus and set theory we refer to two classical books on set theory: [25] and [34]. The first one gives a good introduction to set theory and presupposes little mathematical knowledge; the latter is more suitable for readers with experience in set theory.
Chapter 4
Peano and Frege
4.1 Peano’s arithmetic
Questions that pertain to the foundations of mathematics, although treated by many in recent times, still lack a satisfactory solution. The difficulty has its main source in the ambiguity of language.
(Peano, in the opening of the paper ‘Arithmetices Principia, novo methodo exposita’, in which he introduces axioms for the integers)
The Italian mathematician Giuseppe Peano (1858-1932) spent most of his career successively in the infinitesimal calculus, in foundations of mathematics and in linguistic studies. Peano became aware of the lack of rigour in mathematics through his experience in teaching infinitesimal calculus. After his work on calculus (see Peano’s first publication [65]) and geometry (see [66] [67]), Peano gained particular interest in the field of number theory, also known as arithmetic.
What is number theory? The field of mathematics consisting of the study of the properties of the natural numbers.
Since then, like Dedekind (see quote on page 46), Peano strove for rigor and for an abstract mathematics. He came to the conclusion that mathematics must be constructed, independently of intuition or common sense, in a way that absolutely guarantees the validity of its theorems.
In order to satisfy this requirement he devoted himself to the transformation of mathematics into a self-contained system, based exclusively on postulated primitive notions and primitive propositions. To discard intuition, he first renounced ordinary language (because it is often insufficient and imprecise) and desired a new mathematical symbolism, consisting entirely of neutral symbols. Second, he formalized the logic of the mathematical argument to replace intuitive inference by the application of a limited number of stated logical rules, and to that end he first developed parts of symbolic logic and first formalized propositional and predicate calculus. So Peano formalized both the language of mathematics and the logic of the mathematical argument, and rewrote mathematics in symbolic form as an axiomatic system (see section 6.1). This development was rudimentary and would later be worked out in full detail by the mathematicians Russell and Whitehead in ‘Principia Mathematica’ (1910, 1912, 1913; see section 7.1).
The work of formalization of mathematics was published in the journal ‘Rivista di Matematica’ (which Peano had founded himself) and in ‘Formulario Mathematico’, a series of 5 books that is also known as ‘Formulaire de Mathématique’¹. In 1899 he axiomatized the arithmetic of cardinal numbers, to be published in the third volume of ‘Formulario Mathematico’ in 1901. He introduced letters to denote propositions and propositional functions (Peano’s logic notation) and the symbol ∈ for the membership relation of a set.
¹ The original ‘Formulaire de Mathématique’ was called ‘Formulario Mathematico’ when the first final version appeared in 1908, because Peano at that time consistently used Interlingua, his simplified dialect of Latin, for all his mathematical publications.
Peano based the foundations of arithmetic on 5 axioms (see [31, page 227]) that are formulated with the help of three (undefined) terms, with which the reader is assumed to be acquainted:
a) N (the set of natural numbers)
b) 0 (the particular natural number zero)
c) a+ (the immediate successor of the natural number a)
Definition of the Peano axioms for the natural numbers:
1) 0 ∈ N (zero is a natural number)
2) a ∈ N → a+ ∈ N (the immediate successor of any number is a number)
3) 0 ∈ S ∧ (∀x :: (x ∈ S → x+ ∈ S)) → N ⊆ S (if a set S contains zero, and whenever it contains a number x it also contains the immediate successor x+ of that number, then S includes the whole of N)
4) a, b ∈ N ∧ a+ = b+ → a = b (no two different numbers have the same immediate successor)
5) a ∈ N → a+ ≠ 0 (zero is not the immediate successor of any number)
Axiom three formalizes the principle known as mathematical induction. Peano used recursive definitions to define the arithmetical sum, product and other operators.
Example: Peano defined the sum a + b by recursion with respect to b: a + 0 = a, a + (b+) = (a + b)+. Similarly we can define the product a ∗ b: a ∗ 0 = 0, a ∗ (b+) = (a ∗ b) + a.
After defining the natural numbers, Peano showed how rationals and reals can be formally obtained from naturals, derived much of the elementary number theory, and further considered elementary analysis and geometry. His mathematical work was to have a profound influence on mathematical thought. We can show that the five axioms of Peano can be derived in ZF (see section 5.3). For more information on the Peano axioms, I refer to [31, chapter 5] and [49, appendix A].
In later years, Peano turned away from the foundations of mathematics and devoted most of his time to his new international auxiliary language Interlingua. He invented this language (see [49, page 146-147] and [64, page 148-150]) in an attempt to reduce the grammatical structure of languages and create a universal language, but his language Interlingua received little response.
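The recursive definitions of + and ∗ above translate almost literally into code. In this minimal Python sketch, a natural number is represented (our own assumption, not Peano’s notation) as nested tuples: 0 is the empty tuple and a+ is the one-element tuple (a,). The helpers `from_int` and `to_int` are hypothetical conversions used only for checking results.

```python
ZERO = ()


def succ(n):
    """The immediate successor n+ of n."""
    return (n,)


def add(a, b):
    """Recursion on b: a + 0 = a, a + (b+) = (a + b)+."""
    if b == ZERO:
        return a
    return succ(add(a, b[0]))  # b[0] is the predecessor of b


def mul(a, b):
    """Recursion on b: a * 0 = 0, a * (b+) = (a * b) + a."""
    if b == ZERO:
        return ZERO
    return add(mul(a, b[0]), a)


def from_int(k):
    """Hypothetical helper: apply the successor k times to 0."""
    n = ZERO
    for _ in range(k):
        n = succ(n)
    return n


def to_int(n):
    """Hypothetical helper: count how many successors were applied to 0."""
    count = 0
    while n != ZERO:
        n = n[0]
        count += 1
    return count
```

Note that `add` and `mul` never inspect their first argument: each step of the recursion peels one successor off b, exactly as in Peano’s definitions.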
4.2 Frege’s work
As I think about acts of integrity and grace, I realize that there is nothing in my knowledge to compare with Frege’s dedication to truth. His entire life was on the verge of completion, much of his work had been ignored to the benefit of men infinitely less capable, his second volume was about to be published, and upon finding that his fundamental assumption was in error, he responded with intellectual pleasure clearly submerging any feelings of disappointment. It was almost superhuman and a telling indication of that of which men are capable if their dedication is to creative work and knowledge instead of cruder efforts to dominate and be known.
(B. Russell about Frege, in [93, page 127])
The German mathematician and philosopher Gottlob Frege (1848-1925) was one of the founders of modern symbolic logic, putting forward the (logistic) view that mathematics is reducible to logic. He wrote many important papers on philosophy, but we will not discuss his philosophical writings here. Frege once said ‘every good mathematician is at least half a philosopher, and every good philosopher is at least half a mathematician’. Famous is his ontological argument for the existence of god. We will mention his three most important works on the foundations of mathematics: Begriffsschrift, Grundlagen der Arithmetik and Grundgesetze der Arithmetik.
Begriffsschrift
Like Peano, Frege invented a logical symbolism, to which he gave the name ‘Begriffsschrift’ (in English known as ‘Concept script’). We will not treat the symbolism that was used in Begriffsschrift here in full detail (it can be found in [49, page 175-182] and in [31, page 177-199]), but give a few examples of his new logic and describe the rest of his work in general terms. Frege rejected the subject/predicate regimentation on which Aristotelian logic depends, and recognized (though not as the first) that the patterns of Aristotle cannot always be used to evaluate inferences correctly.
Example: Certain obvious inferences do not fall under the patterns of traditional logic (also called syllogisms), such as:
If Joe doesn’t wear a kilt, then Joe is not Scottish.
Joe doesn’t wear a kilt.
Therefore, Joe is not Scottish.
This is another kind of inference, which contains a conditional expression of the form:
If B then A.
B.
Therefore, A.
With arbitrary expressions for A and B, the rule later became known as modus ponens. Frege adopted this new rule in the system of logic of his Begriffsschrift. A logic that evaluates these sorts of expressions is called a propositional logic.
What is propositional calculus (or sentential calculus)? A symbolic system of treating compound propositions and their logical relationships. Basic propositions consist of simple, unanalyzed propositions. Compound propositions are formed via a set of derivation rules using standard symbols: ∧, ∨, →, ¬.
Frege based his propositional calculus on 6 axioms: for all x, y and z:
1  x → (y → x)
2  (x → (y → z)) → ((x → y) → (x → z))
3  (x → (y → z)) → (y → (x → z))
4  (x → y) → (¬y → ¬x)
5  ¬¬x → x
6  x → ¬¬x
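Since the six axioms are meant to be logical truths, a quick sanity check is a truth table: each axiom should evaluate to true under every assignment of truth values to x, y and z, and modus ponens should never lead from true premises to a false conclusion. The Python sketch below performs this check; `implies`, `is_tautology` and `modus_ponens_is_valid` are hypothetical helper names, and a truth-table check is a semantic test, not Frege’s own derivational method.

```python
from itertools import product


def implies(p, q):
    """Material implication p → q."""
    return (not p) or q


# Frege's six propositional axioms, as functions of the truth values x, y, z.
AXIOMS = {
    1: lambda x, y, z: implies(x, implies(y, x)),
    2: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(implies(x, y), implies(x, z))),
    3: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(y, implies(x, z))),
    4: lambda x, y, z: implies(implies(x, y), implies(not y, not x)),
    5: lambda x, y, z: implies(not (not x), x),
    6: lambda x, y, z: implies(x, not (not x)),
}


def is_tautology(formula):
    """True iff the formula holds for all 8 assignments to x, y, z."""
    return all(formula(x, y, z) for x, y, z in product([False, True], repeat=3))


def modus_ponens_is_valid():
    """Whenever 'if B then A' and 'B' are both true, A must be true."""
    return all(a for a, b in product([False, True], repeat=2)
               if implies(b, a) and b)
```

A formula that is not a logical truth, such as the converse (x → y) → (y → x), fails the same check.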
Derivations in the propositional calculus were based on two procedures of substitution and the rule of modus ponens. For the full calculus of predicates, three additional axioms were needed. For all x, y and (propositional functions) F:
7  (x = y) → (F(x) → F(y))
8  x = x
9  (∀x :: F(x)) → F(y)
Frege presented this new logic in his ‘Begriffsschrift’ in 1879. It consists of three parts. In the first part he provides a list of inferences from which, he believes, all truths of logic can be derived. Then, in the second part, Frege demonstrates the completeness of his logic (i.e. all inferences that can be shown to be valid using the techniques of Aristotelian or propositional logic can also be shown to be valid using only Frege’s laws and rules of inference). The third part of Begriffsschrift shows that logic alone suffices to show the validity of certain inferences (about properties that are hereditary in so-called ‘ancestral sequences’). He also showed that mathematical induction (see section 3.4.3) can be replaced by a principle about ancestral sequences that depends only on logical laws.
Grundlagen der Arithmetik
Throughout his work Frege developed (as the first) the main thesis of logicism: that mathematics is reducible to logic. But to that end, he had to do more than develop a new logical symbolism. His next book, ‘Die Grundlagen der Arithmetik’ (1884), was devoted to the ‘foundations of arithmetic’. In this book, Frege treated the foundations of arithmetic based on the concept of (cardinal) numbers, and he discussed the work of others in detail (see [49, page 184-185]). He put forward the logicist philosophy that arithmetic could be founded upon logic alone. In [31, page 183] we learn more about Frege’s philosophy. In the introduction of his book Frege announced his three guiding principles:
1) Always to separate sharply the psychological from the logical, the subjective from the objective
2) Never to ask for the meaning of a word in isolation, but only in the context of a proposition
3) Never to lose sight of the distinction between concept and object
In his book he presented his own theory of numbers, and wanted to show that all the truths of arithmetic are derivable from logical laws and definitions alone. He did this by sketching the proofs, without giving the official Begriffsschrift proofs of the truths of arithmetic. Before Frege could do that he needed a new version of Begriffsschrift.
Grundgesetze der Arithmetik
In his next three papers, ‘Function and Concept’, ‘On Sense and Meaning’ and ‘On Concept and Object’ (1892), he introduced all the modifications that he was to make to his language and his logical system, to accommodate the new requirements of his formalization of the concept of number. His new logical calculus included a symbolic representation of the truth value of any given proposition, which provided a shorter notation for many Begriffsschrift propositions. The calculus also had several other new logical and arithmetical symbols, one of the most important of them being a notation for what Frege called the ‘course-of-values’ of a propositional function. The course-of-values of a propositional function ϕ, denoted by Frege as ε̆ϕ(ε), denotes the truth value of ϕ for all possible values of the argument (here ε). We denote it as cov and define equal course-of-values by cov(f) = cov(g) ↔ (∀a :: f(a) = g(a)).
In 1893, Frege published the first volume of his ‘Grundgesetze der Arithmetik’, the ‘Basic Laws of Arithmetic’. It set out the new version of logic and began the proofs that were to make the project successful. During that period he also completed his definitions of the natural numbers and some of the proofs of simple truths of arithmetic from these definitions and logical laws. In the second part Frege wanted to define the natural numbers and some basic laws governing them and, in the third part, he would define the real numbers and lay the foundations for expressing analysis in terms of logic. In 1902, when volume 2 was in press, he received a now famous letter from the English mathematician and logician Russell (see chapter 5), who pointed out, with great modesty, that a contradiction could be derived in Frege’s system (see section 5.1). This contradiction would later be named after Russell and become known as ‘Russell’s paradox’.
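The definition of equal course-of-values can be illustrated on a finite domain. In this Python sketch (an assumption for illustration, not Frege’s formalism) a propositional function is a Python predicate on a hypothetical domain `DOMAIN = range(4)`, and cov(f) is represented by its graph, the set of pairs (a, f(a)). Two predicates then have the same course-of-values exactly when they agree on every argument.

```python
DOMAIN = range(4)  # hypothetical finite domain of arguments


def cov(f):
    """Course-of-values of f, modelled as its graph over DOMAIN."""
    return frozenset((a, f(a)) for a in DOMAIN)


def extensionally_equal(f, g):
    """(∀a :: f(a) = g(a)) over DOMAIN."""
    return all(f(a) == g(a) for a in DOMAIN)


# Three sample predicates; the first two agree on every argument in DOMAIN.
def is_even(a):
    return a % 2 == 0


def in_zero_two(a):
    return a in (0, 2)


def below_two(a):
    return a < 2
```

The sketch stays consistent only because cov(f) is a frozenset of pairs lying outside `DOMAIN`. In Frege’s system a course-of-values is itself an object to which the functions apply, so cov would have to map the 2ⁿ predicates on an n-element domain injectively into that domain, which Cantor’s theorem rules out; this is, roughly, where the contradiction of section 5.1 originates.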
Hardly anything more unwelcome can befall a scientific writer than one of the foundations of his edifice be shaken after his work is finished. I have been placed in this position by a letter of Mr. Bertrand Russell just as printing of the second volume was nearing completion . . .
(The first paragraph of the appendix from Frege’s ‘Grundgesetze der Arithmetik’)
After many letters between the two (see for example [93, pages 124-128]), Frege modified one of his axioms and explained in an appendix to the book that this was done to restore the consistency of the system. However, with this modified axiom many of the theorems of volume 1 do not go through, and Frege must have known this. He probably never realized that even with the modified axiom the system is inconsistent, since this was not shown until after Frege’s death in 1925, by Leśniewski (see [85]). The scope of Frege’s Grundgesetze is similar to that of Principia Mathematica (to be discussed in section 7.1), and both aimed at a logistic basis for mathematics, but thanks to Russell’s theory of types Principia Mathematica did not contain the paradox. Although Frege attracted only a small audience in his lifetime, he was a major influence on Peano and Russell, and in the years thereafter his influence on contemporary philosophy, especially on thought about language and logic, has become ubiquitous. Frege’s contribution to the foundations of mathematics was therefore largely indirect (through Principia Mathematica, see [49, page 181]).
In this text I have made extensive use of the excellent books [98] and [97] about Frege, which contain many more references about Frege and his work, and of chapter 4.5 from [31] and chapter 6, section 4 from [49].
sociology. a span of three generations. I quote the following assessment from [73]: “Bertrand Russell had one of the most widely varied and persistently inﬂuential intellects of the 20th century. ethics. science.2. 11 and sections 8.000 largely unaltered words a day) and partly from the deep humanitarian feeling that was the mainspring of his ac79 . 10. 8. He certainly was one of the most important logicians and philosophers of the 20th century.3. [80] and [31.Chapter 5 Russell The fact that all Mathematics is Symbolic Logic is one of the greatest discoveries of our age. aﬀairs. 8. 7.B. education. and when this fact has been established. but we will not discuss these subjects here (see for more information and references on Russell’s life and work [62]. religion. his social and political campaigns and advocacy of both paciﬁsm and nuclear disarmament are certainly interesting. mathematics. imprisonment.1]). the remainder of the principles of mathematics consists in the analysis of Symbolic Logic itself. 8. 1903 The English logician and philosopher Bertrand Russell (18721970) published in his long life an incredible number of books on logic. Russell had at any time more than 40 books in print ranging over philosophy.1. the theory of knowledge and many other topics.9.2. history.4. 8.2.8. The extent of his inﬂuence resulted partly from his amazing eﬃciency in applying his intellect (he normally wrote at the rate of 3. Russell’s private life. politics and polemic. 10. chapter 6. . Russell in Principles of Mathematics.3. During most of his active life.
Here, we will focus on Russell’s contributions to the foundations of mathematics. Along with Kurt Gödel (see chapter 8), he is usually credited with being one of the two most important logicians of the twentieth century. His contributions relating to mathematics include his discovery of Russell’s paradox, his defense of logicism (the view that mathematics is, in some significant sense, reducible to formal logic), his introduction of the theory of types, and his refining and popularizing of the first-order predicate calculus. We will look at each of these contributions in more detail.
Russell discovered the paradox which bears his name in 1901, while working on his ‘Principles of Mathematics’ (1903). The paradox and the closely related vicious circle principle are discussed in section 5.1. Russell’s own response to the paradox came with the introduction of types (see chapter 7). Using the vicious circle principle, also adopted by Henri Poincaré, together with his so-called ‘no-class’ theory of classes, Russell was then able to explain why the unrestricted comprehension axiom (see section 2.1) fails: propositional functions, such as ‘x is a set’, should not be applied to themselves, since self-application would involve a vicious circle. On this view, it is possible to refer to a collection of objects for which a given condition (or predicate) holds only if they are all at the same level or ‘type’.
Although first introduced by Russell in 1903 in the Principles, his theory of types finds its mature expression in his 1908 article ‘Mathematical Logic as Based on the Theory of Types’ and in the monumental work he co-authored with Alfred North Whitehead, ‘Principia Mathematica’ (1910, 1912, 1913). The theory admits of two versions, the ‘simple theory’ and the ‘ramified theory’. Both versions of the theory later came under attack. For some, they were too weak, since they failed to resolve all of the known paradoxes. For others, they were too strong, since they disallowed many mathematical definitions which, although consistent, violated the vicious circle principle. Russell’s response to the second of these objections was to introduce, within the ramified theory, the axiom of reducibility. Although the axiom successfully lessened the vicious circle principle’s scope of application, many claimed that it was simply too ad hoc to be justified philosophically. Principia Mathematica and the theory of types will be treated in detail in chapter 7.
Of equal significance during this same period was Russell’s defense of logicism, the theory that mathematics is in some important sense reducible to logic. First defended in his Principles, and later in more detail in ‘Principia Mathematica’, Russell’s logicism consisted of two main theses. The first is that all mathematical truths can be translated into logical truths or, in other words, that the vocabulary of mathematics constitutes a proper subset of that of logic. The second is that all mathematical proofs can be recast as logical proofs or, in other words, that the theorems of mathematics constitute a proper subset of those of logic. Like Gottlob Frege, Russell based his defense of logicism on the idea that numbers may be identified with sets of sets and that number-theoretic statements may be explained in terms of quantifiers and identity. It followed that number-theoretic operations could be explained in terms of set-theoretic operations such as intersection, union, and the like. In ‘Principia Mathematica’ Whitehead and Russell were able to provide detailed derivations of many major theorems in set theory, finite and transfinite arithmetic, and elementary measure theory. A fourth volume on geometry was planned but never completed. For more information on Russell’s theory of types and on Principia Mathematica, we refer to chapter 7. In this chapter we used parts of [73] and [39].
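The idea that numbers may be identified with sets of sets can be sketched on a small finite universe: the number of a set S is the class of all sets equinumerous with S. In this hypothetical Python sketch the universe is restricted to `U = frozenset(range(4))` (Frege and Russell quantify over all sets), and equinumerosity is tested by pairing off elements one at a time rather than by counting, so that no numbers are presupposed.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def equinumerous(a, b):
    """Pair off one element of each set at a time; equinumerous iff both run out together."""
    a, b = set(a), set(b)
    while a and b:
        a.pop()
        b.pop()
    return not a and not b


def number_of(s, universe):
    """The 'number' of s: the class of all subsets of the universe equinumerous with s."""
    return frozenset(t for t in subsets(universe) if equinumerous(t, s))


U = frozenset(range(4))   # hypothetical small universe
two = number_of({0, 1}, U)
```

Here `two` is a set of sets: it contains every two-element subset of `U`, and any two-element set yields the same class.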
5.1 Russell’s paradox
I hoped sooner or later to arrive at a perfect mathematics which should leave no room for doubts, and bit by bit to extend the sphere of certainty from mathematics to other sciences.
(Russell, in [78])
Paradoxes have been known for a long time, but especially with the introduction of more formal systems at the end of the 19th century, paradoxes became more influential on the foundations of mathematics. In [86], three ‘paradox threats’ are identified: when systems are complex, formal or designed for computers, there is often not enough intuition to notice inconsistencies. With the previously described formalizations, the systems of Cantor (see chapter 2), Peano (see section 4.1), Frege (see section 4.2) and, not to mention, Russell himself were at risk. And indeed, in 1902 Russell discovered a paradox in Frege’s ‘Grundgesetze der Arithmetik’. The paradox turned out to strike at the basis of mathematics, since it could be formulated in all the systems mentioned above.
Before we describe the most famous paradox of Russell, we first define the notion of a paradox.
What is a paradox? A paradox is a statement which appears self-contradictory or contrary to expectations, and is also known as an antinomy. In an axiomatic system (see section 6.1) a paradox is a derivation that leads to a contradictory statement.
A paradox is properly something which is contradictory to general opinion, but is frequently used to signify something self-contradictory [...] Paralogism, by its etymology, is best fitted to signify an offence against the formal rules of inference.
(De Morgan, in [31, page 310])
We first formulate the paradox in Cantor’s set theory:
Russell’s paradox: Let R = {x | x ∉ x}. Then R ∈ R ↔ R ∉ R.
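The argument behind Russell’s paradox does not depend on what membership means. The following Lean 4 statement is a sketch assuming only an arbitrary binary relation `mem` on a type `α` (both names are ours, not part of any standard library): it shows that no object r can satisfy mem x r ↔ ¬ mem x x for all x. Instantiating x with r yields exactly R ∈ R ↔ R ∉ R.

```lean
theorem no_russell_set {α : Type} (mem : α → α → Prop) :
    ¬ ∃ r : α, ∀ x : α, mem x r ↔ ¬ mem x x :=
  fun ⟨r, hr⟩ =>
    -- if r were a member of itself, hr r would say it is not
    have hn : ¬ mem r r := fun hm => (hr r).mp hm hm
    -- but then hr r says r is a member of itself after all
    hn ((hr r).mpr hn)
```

The proof uses no classical reasoning (no excluded middle), which is one reason the paradox could not be avoided by merely weakening the logic in that respect.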
Russell studied Cantor’s work in 1901 [31, section 6.1] and, after noting that some sets belonged to themselves while the rest did not, showed that the set of all sets which do not belong to themselves belongs to itself if it does not, and, by repetition of the argument, vice versa. Russell also expressed this paradox in terms of predicates, and in this form he first presented his discovery in a letter to Frege (see [93, page 124] and see also the quote on page 78). In Frege’s work (Grundgesetze der Arithmetik) self-application was not possible, so R ∈ R was not allowed, but the paradox could still be expressed by using Frege’s notion (see page 77) of the course-of-values of a function. If we define equal course-of-values cov by cov(f) = cov(g) ↔ (∀a :: f(a) = g(a)), we can derive the paradox in Frege’s work as follows (see also [86, page 7] for a slightly different proof). Define f(x) := ¬(∀ϕ :: cov(ϕ) = x → ϕ(x)), and let K := cov(f). Then
    ¬f(K)
≡     {def. f}
    ¬¬(∀ϕ :: cov(ϕ) = K → ϕ(K))
≡     {elim. ¬¬}
    (∀ϕ :: cov(ϕ) = K → ϕ(K))
≡     {instantiate ϕ with f; conversely, cov(ϕ) = K = cov(f) gives ϕ(K) = f(K)}
    cov(f) = K → f(K)
≡     {def. K, so cov(f) = K holds; elim. →}
    f(K)
So ¬f(K) ≡ f(K), a contradiction. Since Peano’s system was based on the set theory of Cantor, Peano’s work also contained the paradox. The paradox had a big influence, since it could be formulated in all these systems, and in classical logic every statement is entailed by a contradiction.
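Russell’s formulation in terms of predicates can be mimicked with self-application in Python. The hypothetical predicate `russell` below holds of a predicate p exactly when p does not hold of itself. Asking whether it holds of itself never yields a truth value: the evaluation only stops when Python’s recursion limit raises `RecursionError`. This is an analogy in a programming language, not a formal derivation.

```python
def russell(p) -> bool:
    """Holds of a predicate p exactly when p does not hold of itself."""
    return not p(p)


def russell_on_itself():
    """Try to evaluate russell(russell); return None if no truth value results."""
    try:
        return russell(russell)
    except RecursionError:
        # russell(russell) = not russell(russell) = ... never bottoms out
        return None
```

For ordinary predicates the definition is harmless: `russell(lambda p: False)` is `True`, because that predicate does not hold of itself.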
In the eyes of many mathematicians (e.g. Hilbert, Brouwer) it therefore appeared that no proof could be trusted once it was discovered that the logic underlying all mathematics was inconsistent.
Before we consider the consequences of the discovery of the paradox, we first take a further look at its nature, following Russell’s own analysis. While writing ‘The Principles’, Russell’s attention was attracted by what is now known as Cantor’s paradox, and (according to a letter he wrote to the French mathematician Jourdain) he found that there was something wrong with his earlier refutation of Cantor’s paradox (see [29, section 7]). He removed his earlier refutation from ‘The Principles’, and his revised diagnosis uncovered a true paradox. After discovering his famous paradox, he summarized this discovery and the reasoning that led to it in a second letter to Frege.
As we have already seen, Russell’s paradox arises as a result of naive set theory’s so-called unrestricted or naive comprehension axiom (see page 16). Cantor created this axiom with the intuition that any coherent condition may be used to determine a set. But that means that the condition ϕ that determines a set V = {x | ϕ(x)} may depend on the whole set V. Therefore, the axiom allows impredicative definitions. Most attempts at resolving Russell’s paradox have therefore concentrated on various ways of restricting or abandoning this axiom.
Definition of impredicative: A definition is impredicative if it involves a set V that has a member v ∈ V whose definition depends on V.
In a sense those impredicative definitions are thus circular, and they were considered the cause of antinomies. Russell traced the fallacy back to what he called the ‘vicious circle principle’. The ‘vicious circle’ after which his principle is named arises from the assumption that a set of objects may contain members which can only be defined by means of the set as a whole. Russell said that statements are illegitimate and meaningless if they contain a set of objects that contains members which presuppose this (total or whole) set of objects. That means a statement is only legitimate if all propositions it contains refer to already defined sets. In other words, impredicative definitions should be avoided.
Definition of Vicious Circle Principle²: Definitions, assumptions or statements involving all of a set must not be a part or an element of that set.¹
In terms of set theory we can formulate the principle as: No set V is allowed to contain members v definable only in terms of V, or members v involving or presupposing V. Or, as formulated in [49, page 113]: ‘If, provided a certain collection had a total, it would have members only definable in terms of that total, then the said collection has no total’.
Vicious circle fallacies are arguments that are condemned by the vicious circle principle. Such arguments do not necessarily lead to contradictions (since fallacious arguments can lead to true conclusions). For more information about impredicativity, see [57, section 15.3]. In Principia Mathematica (see [31, section 7.2]), Russell assembles a collection of seven different paradoxes, all of which were based on the same circular type of reasoning, and then resolves them by making their circularity explicit. We will now mention eight of the most well-known paradoxes, most of which originate from violations of the vicious circle principle.
¹ Note that a direct implementation of this definition as a new axiom of set theory is not possible. We might rephrase the definition as ‘whatever set contains an apparent element, that element must not be dependent on that set’. This might be implemented by fixing ‘an apparent element’ of a set and then expressing its independence of the other elements of that set. This independence means that, regardless of the nature of the other elements of the set, the nature of the apparent element remains the same. The ‘nature’ of the elements can be seen as all the members of that element (or, in case the element is an individual, the nature of the apparent element can be seen as that individual). This leads us to the following axiom:
(∀X :: (∀x : x ∈ X : x = a → (∀x' : x' ∈ X ∧ x' ≠ x : x' = b(x') → a ∈ X))).
Despite the fact that the set X is ‘too large’, the axiom does not prohibit the existence of the set X. Clearly this does not avoid the paradox of Russell. We consider a set X := R ≡ {x | x ∉ x} and an element x ∈ R, so that x ∉ x. The axiom tells us x = a → (∀x' : x' ∈ R ∧ x' ≠ x : x' = b(x') → a ∈ R). In other words, we can change each element in R except x, and the nature of x should not depend on it. The only thing we know about x is that x ∉ x and x ∈ R. Now we can change all x' into any value b(x'), but we will still have x ∉ x and x ∈ R. To obtain a contradiction we would have to show that x ∈ x ∨ x ∉ R, which the axiom does not provide. So unfortunately this most ‘direct’ attempt to solve the paradox fails.
² Russell originally formulated it as ‘Whatever involves all of a collection must not be one of the collection’. Another formulation, from [87], says ‘No entity can be defined in terms of a totality of which it is itself a possible member’.
1 Russell’s paradox (1903), which we have discussed in this section. The impredicativity is clear in the definition of the set that contains all sets that are not members of themselves. There are many popularizations of this paradox; one of them is from Russell himself (1919) and concerns the plight of the barber of a certain village who has enunciated the principle that he shaves all those persons of the village, and only those, who do not shave themselves. The paradox is then formed by the question ‘Does the barber shave himself?’.
2 Burali-Forti’s paradox (1897), which we have discussed in section 3.8.2. The impredicativity comes from the ordinal number of the naturally ordered set of all ordinal numbers.
3 Cantor’s paradox, which we have discussed in section 3.8.1. The impredicativity comes from the cardinal number of the set of all sets.
4 The liar’s paradox: We quote from [49, page 127]: “If a man says ‘I am lying’, his utterance is self-contradictory, and it cannot be either true or false. The oldest form of this particular paradox, in the words of Principia Mathematica, is that of Epimenides the Cretan, ‘who said that all Cretans were liars, and all other statements made by Cretans were certainly lies’.”
5 Richard’s paradox: The French schoolteacher Jules Richard (1862-1956) published a paradox in [74] in 1905. He considered a set V of all non-terminating decimals that can be defined in a finite number of words. By arranging V as a sequence and applying Cantor’s diagonal argument to the members of V, a different non-terminating decimal was produced, which is itself defined in a finite number of words.
6 Paradox of definitions: Again we quote from [49]: “The possible definitions of specific ordinal numbers can be arranged in a sequence, and there are therefore at most ℵ0 of them. But the totality of ordinal numbers is not denumerable, and so there exist ordinal numbers which cannot be individually defined. Among such indefinable ordinals there is a least, and thus it appears that the description ‘the least indefinable ordinal’ yields a definition of an entity that cannot be defined.”
7 Berry’s paradox: “The least integer not nameable in fewer than nineteen syllables” is itself a name that contains only eighteen syllables.
8 The Grelling-Nelson paradox: The German philosopher Kurt Grelling (1886-1942) published a paradox in 1908 with his friend Leonard Nelson (1882-1927). As described in [31, page 336]: “Some words can be predicated of themselves: in English, ‘noun’ is a noun, ‘word’ is a word, and so on. This property is called ‘autological’. Other English words are not autological, say, ‘German’ or ‘verb’. They are called ‘heterological’, but this word is heterological if and only if it is not so.”
The first three paradoxes are logical paradoxes that can be formulated within Cantor’s set theory. The remaining five are mainly paradoxes of naming; they are of a semantic kind. All these paradoxes have stimulated fundamental research, especially Russell’s paradox, which revealed the vicious circle principle and first showed the need for a theory of types or some other restriction of the power of the comprehension axiom.
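Richard’s construction is Cantor’s diagonal argument applied to a list of decimals. The Python sketch below uses hypothetical names: each decimal is modelled as a function from a digit position to a digit, and only a finite prefix of the list is used. It builds the digits of a decimal that differs from the n-th listed decimal in its n-th digit. The paradox arises because this construction is itself a definition in finitely many words.

```python
def diagonal_digit(d):
    """Pick a digit different from d; avoiding 0 and 9 sidesteps dual
    expansions such as 0.0999... = 0.1000..."""
    return 5 if d != 5 else 4


def diagonalize(expansions):
    """Digits of a decimal that differs from expansions[n] at position n."""
    return [diagonal_digit(expansions[n](n)) for n in range(len(expansions))]
```

For instance, for the listed decimals 0.000..., 0.555..., 0.0123456789... and 0.333..., the diagonal digits are 5, 4, 5, 5.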
5.2 Consequences and philosophies

Perhaps the greatest paradox of all is that there are paradoxes in mathematics.
E. Kasner and J. Newman, quoted in [46]

The various proposals to overcome this paradox led to various theories. Russell’s own response came with the theory of types in his Principia Mathematica (see section 5.5); he had already laid out a first version of this theory to eliminate the paradoxes in 1908. In this approach he assigned types to variables (as types he took sets) and allowed expressions such as x ∈ y only if the type of x is one less (in some order) than the type of y. Since self-application (R ∈ R) had caused a contradiction, he thereby ruled it out.

The outlawing of impredicative definitions seemed to solve the known paradoxes in set theory. But it turned out that some essential and accepted parts of mathematics contain impredicative definitions. This was a serious problem for Russell’s solution, even though many instances of impredicative definitions in mathematics could be circumvented. We quote from [22, page 265]: “In 1918, the German mathematician Hermann Weyl (1885–1955) tried to construct as much of analysis as possible from the natural number system without the use of impredicative definitions. Although he succeeded in obtaining a considerable part of analysis, he was unable to derive the important theorem that every nonempty set of real numbers having an upper bound has a least upper bound.”

Another proposal was to reconstruct set theory on an axiomatic basis that is sufficiently restrictive to exclude the paradoxes (this axiomatic method was first suggested by Hilbert; see section 6.1). In 1908 the mathematician Zermelo made the first attempt to formulate axioms for set theory from which the paradox is not deducible, but from which most other parts of set theory are. This attempt was successful. After a refinement by the mathematician Fraenkel, it led to the ZF axiom system (see section 5.3), which is still the most widely accepted basis today. Subsequent refinements to ZF have been made by Skolem, and later by the three mathematicians von Neumann, Bernays and Gödel (see section 8.4).

Other attempts to resolve the paradoxes of set theory focus on the foundations of logic. Hilbert and other formalists had the basic idea of allowing only well-defined and finitely constructible objects, together with rules of inference that were deemed to be absolutely certain. The intuitionists had the basic idea that one cannot assert the existence of a mathematical object unless one can also indicate how to go about constructing it.

In the period after the discovery of the paradoxes, we distinguish three main philosophies of mathematics: logicism, intuitionism and formalism.

What is Logicism? A school of mathematical thought that holds that mathematics is a part (or a branch) of logic. Logicists contend that all of mathematics can be deduced from pure logic, without the use of any specifically mathematical concepts such as number or set. The first ideas date back to Leibniz (1646–1716). The actual reduction of mathematics to logic was started by Dedekind and by Frege (in his works of 1884–1903), and was continued later by Peano and by Whitehead and Russell (in Principia Mathematica, 1910–1913). Whether the logicist thesis has been established seems to be a matter of opinion. Although the program was successful, it can be questioned on the ground that the systematic development of logic presupposes mathematical ideas in its formulation.

What is Intuitionism? A school of mathematical thought founded by the 20th-century Dutch mathematician L.E.J. Brouwer (1881–1966). It contends that the primary objects of mathematical discourse are mental constructions governed by self-evident laws. Brouwer first attacked the logical foundations of mathematics in his doctoral thesis in 1907, which formed the beginning of the intuitionist school. The intuitionists proposed that a proof in mathematics should be accepted only if it constructs the mathematical entity it talks about. A proof that merely shows that the entity ‘could’ be constructed, or that supposing its nonexistence would result in a contradiction, is not enough. Because of the weight it places on mental apprehension through the construction of purported mathematical entities, intuitionism is sometimes also called constructivism.

Intuitionists have challenged many of the oldest principles of mathematics as being nonconstructive (and hence meaningless). Brouwer had the fundamental insight that such nonconstructive arguments are avoided if one abandons a principle of classical logic (which lies, for example, behind De Morgan’s laws). This is the principle of the excluded third (or excluded middle). It asserts that for every proposition ϕ, either ϕ or ¬ϕ holds; equivalently, for every ϕ, ¬¬ϕ implies ϕ. This principle is basic to classical logic. It had already been enunciated by Aristotle, though with some reservations, since he pointed out that the statement “there will be a sea battle tomorrow” is neither true nor false. By denying the principle of the excluded middle (which states that every mathematical statement is either true or false), Brouwer and the intuitionists tried to prevent the paradoxes.

The intuitionists succeeded in rebuilding large parts of present-day mathematics, but a large part is still missing. As a result, intuitionist mathematics is less powerful than classical mathematics, and in many respects much more complicated. These are serious objections to the intuitionistic approach. Nevertheless, it is generally conceded that its methods do not lead to contradictions, and some hope for a new intuitionist reconstruction of mathematics carried out in a different and more successful way. More information on intuitionism can be found in [60]. A still more severe form of constructivism, which we will not discuss further, is strict finitism, in which one rejects infinite sets.

What is Formalism? A school of mathematical thought introduced by the 20th-century mathematician David Hilbert. It holds that all mathematics can be reduced to rules for manipulating formulas without any reference to the meanings of the formulas. Formalists contend that the basic objects of mathematical thought are the mathematical symbols themselves, not any meaning that might be ascribed to them. Hilbert’s program was to concentrate on the formal language of mathematics and to study its syntax: a statement should be a metatheorem, that is, a theorem provable within the syntax of mathematics. Unfortunately for the formalists, a consequence of Gödel’s incompleteness theorem (see chapter 8) is that the consistency of mathematics can be proved only in a language that is stronger than the language of mathematics itself. Yet formalism is not dead: most pure mathematicians are tacit formalists. Only the naive attempt to prove the consistency of mathematics in a weaker system had to be abandoned.

These three philosophies do not necessarily contradict each other, and all of them are still advocated today. From [11, item from Paul Bernays] we learn that most mathematicians of all three philosophies are also philosophical realists: “While no one, except an extremist intuitionist, will deny the importance of the language of mathematics, most mathematicians are also philosophical realists who believe that the words of this language denote entities in the real world.” Following the Swiss mathematician Paul Bernays (1888–1977), this position is also called Platonism, since Plato believed that mathematical entities really exist. For more information about realism, see [57].
5.3 Zermelo Fraenkel

5.3.1 Axiomatic set theory

After the discovery of Russell’s paradox, it became clear that set theory needed a new and more rigorous basis. The so-called ideal calculus was a first formalization of Cantor’s set theory. However, it lacked the precision of Hilbert’s later theories, and it was inconsistent because it still contained the (naive) comprehension principle in some form (see page 16). Hilbert’s proof theory, which will be treated in more detail in chapter 6, offered a way to put set theory on firm and, hopefully, consistent grounds.

Several people have identified criteria for the choice of axioms (see Hilbert’s theory in section 6.1, or [49, last sentence of page 287 and page 288]). The most accepted criteria (defined more formally in chapter 6) include:
1. Consistency of the system: it should be impossible to derive both a statement and its negation; in other words, the paradoxes should be avoided.
2. Plausibility: the axioms should be in accord with intuitive beliefs about sets.
3. Completeness (richness of the theory): the desirable results of Cantorian set theory ought to be derivable as theorems.

The first real axiomatization of set theory was given in 1908 by the German mathematician Ernst Zermelo in [101]. The attitude adopted in his axiomatic development of set theory is that it is not necessary to know what ‘sets’ are, what the ‘things’ that are their elements are, or what the ‘membership relation’ means ([49, paragraph 1]; see also [60]). Instead, Zermelo postulated a domain B of abstract objects and represented the elements or ‘things’ of this domain by the letters a, b, c, .... He then defined the primitive notions of equality and membership. The statement a = b says that a and b designate the same ‘thing’. The relation a ∈ b is defined on the domain B; if a ∈ b holds, we call b a set and a an element of this set. Thus some, but not necessarily all, objects of B are sets.

The assumptions adopted about these notions are called the axioms of the theory. Its theorems are the axioms together with the statements that can be deduced from the axioms using rules of inference, for example those of a system of logic (see also chapter 6). In the next paragraph we present the set of axioms that Zermelo chose, which formed the basis for all later axiomatizations of set theory.
5.3.2 Zermelo Fraenkel (ZF) Axioms

Zermelo formulated his axiomatic system in 1908; Fraenkel’s extensions date from 1922. In the same year (1922) the Norwegian mathematician Skolem (1887–1963) proposed a formal language for formulating the theory (see also section 8.5). Zermelo noted that the sets involved in a derivation of the paradoxes are very large³:
- for Cantor’s paradox it is the set of all sets (see section 3.8.1);
- for Russell’s paradox it is the set of all sets that are not members of themselves (see section 5.1);
- for the Burali-Forti paradox (see section 3.8.2) it is the set of all well-orderings.
He therefore wanted to restrict the size of sets. To this end he replaced the (naive) comprehension principle by his separation axiom, so that the paradox could no longer be derived:

Separation Axiom: (∀z∃y∀x :: (x ∈ y ↔ x ∈ z ∧ ϕ(x)))
For every set z and definite⁴ property ϕ of sets, there exists a set whose elements are exactly those elements of z that have the property ϕ.

The requirement that ϕ be definite is a limitation on the property ϕ that we discuss later, in section 8.5. We now show that the standard derivation of Russell’s paradox cannot be applied when the naive comprehension axiom is replaced by the separation axiom. Let

    R = {x | x ∈ Z ∧ x ∉ x}.

Then

    R ∈ R ↔ R ∈ Z ∧ R ∉ R   →   R ∉ R
    R ∉ R ↔ R ∉ Z ∨ R ∈ R   ←   R ∉ Z

In both lines above we can conclude R ∈ R ↔ R ∉ R only if we know that R ∈ Z. Nothing forces R ∈ Z. In fact, combining the two lines gives R ∉ R and therefore R ∉ Z. Hence Russell’s derivation of his paradox does not apply.

However, this fact alone does not guarantee that no paradox exists; there might be other, less obvious ways to construct one. Until consistency is proved, we cannot claim, as some articles do, that the paradoxes have been eliminated. We can only claim that the separation axiom does not permit the construction of paradoxical sets whose elements are defined in terms of the sets themselves.

³ The term proper class is sometimes used to refer to these ‘excessively large’ sets. All sets are classes, but not every class is a set. A class that is not a set is called a proper class; the other classes (the sets) are then referred to as improper classes.
⁴ See section 8.5 for the definition of the concept of definiteness.

We now give all of the ZF axioms that constitute set theory. In the definitions below we use several shorthand notations:
- in axiom 6, x ⊆ z is shorthand for (∀y :: y ∈ x → y ∈ z);
- in axiom 8, ∃! denotes that there is exactly one y;
- in axiom 9 we use the symbols ∩ and ∅.
If we wish, we can write these definitions out in full detail, so that the notation of each expression does not depend on previous axioms. The separation and substitution axioms are actually axiom schemes.

The first seven axioms are the ones Zermelo originally formulated. Axioms 8 and 9 were added later, by Fraenkel and by von Neumann respectively. Axioms 1 through 8 form the original set of Zermelo–Fraenkel axioms.

The Zermelo–Fraenkel axioms:
1. Extensionality axiom (or axiom of determination): (∀x, y :: (∀z :: z ∈ x ↔ z ∈ y) → x = y)
Sets are uniquely determined by their members. To be exact: if every element of a set x is also an element of y, and conversely, then x = y.
2. Axiom of the empty set: (∃x∀y :: y ∉ x)
There is an (improper, see also the footnote on page 93) set, the ‘null’ or ‘empty’ set, which contains no elements at all.
3. Separation axiom: (∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), where ϕ is definite and does not contain y.
For every set z there exists a set y whose elements are exactly those elements of z that have the property ϕ.
4. Pairing axiom: (∀a, b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b))
Given two sets a and b, there exists a set whose elements are exactly a and b.
5. Sumset axiom or union axiom: (∀z∃y∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w))
For every set z there exists a set y whose elements are exactly those objects that occur in at least one element of z.
6. Power set axiom: (∀z∃y∀x :: x ∈ y ↔ x ⊆ z)
For every set z there is a set y whose elements are exactly the subsets of z.
7. Axiom of infinity: (∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z))
There exists a successor set.
8. Axiom of replacement or axiom of substitution (by Fraenkel): (∀x∃!y :: ϕ(x, y)) → (∀a :: (∃b∀y :: y ∈ b ↔ (∃x : x ∈ a : ϕ(x, y))))
The image of a set under an operation ϕ (a functional property) is again a set.
9. Axiom of foundation or axiom of regularity (by von Neumann): (∀a :: a ≠ ∅ → (∃b :: b ∈ a ∧ b ∩ a = ∅))
Every nonempty set is disjoint from at least one of its elements.

Theorem (from [49, chapter 11]): The domain B itself (see page 92) is not a set.
Proof: Suppose V is any given set. Then⁵ V has a subset W consisting of those elements of V that are not members of themselves. W is not an element of itself: otherwise, since W consists only of elements that are not members of themselves, we would have W ∉ W. Now suppose W were an element of V. Then it would lie either in W or in V − W. In both cases W ∈ W, in the second case because V − W consists of the elements of V that are members of themselves. We have just excluded W ∈ W, so W is not a member of V. But W is certainly in B, and therefore B is not the same as V. Since V was arbitrary, B cannot coincide with any set at all.

⁵ The property x ∉ x is definite; see section 8.5 for the definition of the concept of definiteness.
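The effect of replacing comprehension by separation can be illustrated on hereditarily finite sets. In the Python sketch below (our own illustration, not part of the formal theory), sets are modelled as nested frozensets. The function `separate` plays the role of the separation axiom, and `russell_subset(z)` builds R = {x | x ∈ z ∧ x ∉ x}. For every sample z, the result is not an element of z. This matches the conclusion R ∉ Z of the derivation above and the key step in the proof that B is not a set. Python frozensets can never contain themselves, so x ∉ x holds for every element here. The sketch therefore illustrates the argument only; it says nothing about the consistency of ZF.

```python
from itertools import combinations
from typing import Callable, Iterable

EMPTY: frozenset = frozenset()


def separate(z: frozenset, prop: Callable[[frozenset], bool]) -> frozenset:
    """Separation: the set of exactly those elements of z satisfying prop."""
    return frozenset(x for x in z if prop(x))


def russell_subset(z: frozenset) -> frozenset:
    """R = {x | x ∈ z ∧ x ∉ x}, obtained from z by separation."""
    return separate(z, lambda x: x not in x)


def all_subsets(elements: Iterable[frozenset]) -> list:
    """Every subset of a finite collection (a finite power set)."""
    items = list(elements)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


# A few hereditarily finite sets: ∅, {∅}, {{∅}} and {∅, {∅}}.
one = frozenset({EMPTY})
two = frozenset({one})
pair = frozenset({EMPTY, one})
samples = all_subsets([EMPTY, one, two, pair])

for z in samples:
    # The separated set never lies inside the set it was carved out of.
    assert russell_subset(z) not in z
print(len(samples), "sample sets checked")
```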
The axioms are not minimal. For example, the axiom of the empty set can be deduced from the separation axiom, as we have already seen in section 2.2⁶. Likewise, the separation axiom can be derived from the empty set axiom together with the substitution axiom. The theory is not complete, since many statements are independent of ZF.

⁶ The existence of the empty set in section 2.2 was actually derived from the comprehension principle, but the result can be obtained in the same way from the separation axiom.

The following two statements are independent of the previous axioms. They have a more dubious status and are not part of standard ZF:
10. Generalized Continuum Hypothesis (GCH): For any cardinal ℵ_r, |{0, 1}^ℵ_r| = ℵ_(r+1).
In 1908 Felix Hausdorff proposed this generalization of CH. A discussion of the Generalized Continuum Hypothesis can be found in section 3.7.
11. Axiom of choice (AC): (∀x :: (∃f : f is a function : Dom(f) = x − {∅} ∧ Ran(f) ⊆ ⋃x ∧ (∀a : a ∈ Dom(f) : f(a) ∈ a)))
Every set x has a choice function.
Definition of choice function: A function f is called a choice function for the set V := Dom(f) = V − {∅} ∧ (∀v : v ∈ Dom(f) : f(v) ∈ v)
Another formulation of this axiom, with more information, is given in section 3.6.

In the remainder of this section, we give a short explanation of the nature of the other axioms. For more detailed information, we refer to section 8.5 and to the rich literature on set theory (for example [17], [24], [28] and [49, chapter 11]).

We have already seen in section 2.2 how basic operations can be defined with the extensionality and separation axioms. The pairing, sum and power set axioms guarantee the existence of pairs, sums and power sets of sets; together with the extensionality axiom, they ensure that these sets are unique. With axioms 1 to 6 alone we can already create infinitely many sets, but each set constructed with these axioms has only a finite number of elements.

To create infinite sets we need the infinity axiom. The successor sets it provides are not unique, but the smallest successor set, denoted ω, is unique. We call its elements the natural numbers. With this axiom we can now also prove the principle of induction for ω (see section 3.4.3).

The substitution axiom applies when ϕ is a property of sets such that for every x there is exactly one y with ϕ(x, y). It then says that for every set a there exists a set whose elements are exactly those y for which some x ∈ a satisfies ϕ(x, y).

The foundation axiom says that each nonempty set has epsilon-minimal elements:
Definition of epsilon-minimal: An element b ∈ a is epsilon-minimal in a := b ∩ a = ∅
One implication of this axiom is that there is no function f defined on ω such that (∀i : i ∈ ω : f(i + 1) ∈ f(i)). Another corollary is that there is no set that has itself as its only element. Note that the paradoxes are prevented by the separation axiom, not by the foundation axiom. For a motivation and analysis of the role of the foundation axiom, we refer to [17].

The axiom of choice originated in Cantor’s recognition that it is important to be able to well-order arbitrary sets, i.e. to define an ordering relation on a given set such that each nonempty subset has a least element. A well-ordering of a set is valuable because it offers a means of proving that a property holds for each of its elements, by a process (transfinite induction) similar to mathematical induction. Zermelo (1904) gave the first proof that any set can be well-ordered. His proof employed a set-theoretic principle that he called the axiom of choice, which was shortly afterwards shown to be equivalent to the so-called well-ordering theorem. Zermelo was the first to state the axiom explicitly, although it had been used earlier, essentially unnoticed.

One form of this principle is expressed as the axiom of choice: if x is a nonempty set whose elements are nonempty sets, then there exists a function f with domain x such that f(a) ∈ a for every member a of x. Intuitively, the axiom asserts that one can simultaneously choose an element from every nonempty member of any set; this guarantee accounts for its name. A choice function for a set A ‘chooses’ an element from each nonempty member of A. The assumption is significant only when the set has infinitely many members.

The axiom soon became the subject of vigorous controversy because of its nonconstructive nature. A few mathematicians still feel that using the axiom of choice is improper. To the vast majority, however, it (or an equivalent assertion) has become an indispensable and commonplace tool. For a more detailed discussion of the axiom of choice we refer to [17]. For this discussion of the axiom of choice we have used [63], [77] and [11].
Chapter 6
Hilbert

The further a mathematical theory is developed, the more harmoniously and uniformly does its construction proceed, and unsuspected relations are disclosed between hitherto separated branches of science.
D. Hilbert, quoted in [76]

David Hilbert (1862–1943) was a German mathematician who reduced geometry to a series of axioms and contributed substantially to establishing the formalistic foundations of mathematics. His first work was on invariant theory, and in 1888 he proved his famous Basis theorem (see [5]). After that he did significant work in algebraic number theory and published his ‘Zahlbericht’, or ‘Report on the theory of numbers’, in 1897. In 1899 he published the ‘Grundlagen der Geometrie’, which appeared in English as ‘The Foundations of Geometry’ in 1902. This book contained what would become a widely accepted set of 21 axioms for Euclidean geometry, together with an analysis of their significance (see [31]). The axiomatic method that Hilbert used there (see also [57]) will be treated in section 6.1. Its concept is more general than geometry, however, and it can be applied far beyond that domain.

A substantial part of Hilbert’s fame rests on a list of 23 mathematical problems that he outlined in 1900 and posed as a challenge for the next century. Some of these problems were related to the foundations of mathematics (see section 6.2). In 1905 Hilbert attempted to lay a firm foundation for mathematics by proving its consistency. This effort eventually resulted in the two volumes of ‘Grundlagen der Mathematik’, which were intended to lead to a proof theory. In 1931 Kurt Gödel showed this goal to be unattainable (see chapter 8). Nevertheless, the work Hilbert had done on the foundations of mathematics remained influential in the development of logic. Hilbert’s work on integral equations around 1909 (see [45]) led to research in functional analysis. It also laid the basis for his work on infinite-dimensional space, later called Hilbert space (see [22, page 232]).

When Hilbert was made an honorary citizen of his native city Königsberg, he gave an address that ended with six famous words, showing his enthusiasm for mathematics and his optimism about solving mathematical problems: “There are absolutely no unsolvable problems. Instead of the foolish ignorabimus [Latin for ‘we shall not know’], our answer is on the contrary: Wir müssen wissen, wir werden wissen [We must know, we shall know].”
6.1 Hilbert’s proof theory

After presenting an axiomatic development of geometry in ‘Grundlagen der Geometrie’ (1899), Hilbert devoted himself to the much greater task of applying his new metamathematical method to pure mathematics as a whole. As Hilbert wrote in 1917: “Since the examination of the consistency is a task that cannot be avoided, it appears necessary to axiomatize logic itself and to prove that number theory and set theory are only parts of logic.” Hilbert thus took a formal(istic) approach to achieve this logicist goal. (Logicism is the school that uses logic as the basis of mathematics, while the formalists attempted to axiomatize mathematics; see also the philosophies in section 5.2.)

Hilbert formalized mathematical theories in order to turn them into well-defined objects of discussion; the result is called a formal system or formalism. This made possible a new kind of investigation, to which he gave the name metamathematics. Hilbert was the first to emphasize that the strict formalization of a theory involves total abstraction from meaning. In its structure, a formalized theory is no longer a system of meaningful propositions. Instead, it is a system of sentences, which are sequences of words, which are in turn sequences of letters (a symbolic language). Hilbert’s method of making the formal system as a whole the object of mathematical study is called metamathematics or proof theory.

What is metamathematics? The study of mathematics itself. For formalized mathematical systems, metamathematics thus consists of statements about the signs and formulas that occur within axiomatic systems. One of the primary goals of metamathematics is to determine the nature of mathematical reasoning.

To this end Hilbert identified three properties that an axiomatic system should have: it should be decidable, complete and consistent. In order to define these notions, we first have to make some other concepts precise.

Definition of an axiom: A proposition that is regarded as true without proof
Definition of free variable: A variable that is not bound within the scope of a quantifier
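The free variables of a formula can be computed recursively: a quantifier ∀x removes x from the free variables of its body. The Python sketch below uses our own hypothetical encoding of formulas as nested tuples, with only ¬, ∧, ∨, ∀ and relation symbols. A formula without free variables is called closed here.

```python
def free_vars(f: tuple) -> frozenset:
    """Free variables of a formula encoded as a nested tuple.

    Encoding (an assumption for this sketch):
      ("var", name) | ("const", name) | ("rel", symbol, term, ...)
      ("not", f) | ("and", f, g) | ("or", f, g) | ("forall", name, f)
    """
    tag = f[0]
    if tag == "var":
        return frozenset({f[1]})
    if tag == "const":
        return frozenset()
    if tag == "rel":
        return frozenset().union(*(free_vars(t) for t in f[2:]))
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "or"):
        return free_vars(f[1]) | free_vars(f[2])
    if tag == "forall":
        # The quantifier binds its variable inside the body.
        return free_vars(f[2]) - {f[1]}
    raise ValueError(f"unknown formula tag: {tag!r}")


def is_closed(f: tuple) -> bool:
    """True when the formula has no free variables."""
    return not free_vars(f)


phi = ("forall", "x", ("rel", "R0", ("var", "x"), ("var", "y")))
print(sorted(free_vars(phi)))           # ['y']
print(is_closed(("forall", "y", phi)))  # True
```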
An axiom that does not contain any variables is also called an axiom statement. An axiom with free variables is called an axiom scheme; each of its free variables is to be quantified over all well-formed formulas.
Definition of statement (or sentence): A well-formed formula with no free variables

Of the systems to which Hilbert’s proof theory applies, we consider here those susceptible to Gödel’s incompleteness theorem (which will be presented in chapter 8).

Definition of an STGA language: A language¹ L is Susceptible To Gödel’s Argument (STGA) if it consists of:
1. E, a denumerable set of (well-formed) expressions (also called formulas) of L
2. S ⊆ E, the sentences of L (i.e. expressions with no free variables)
3. P ⊆ S, the provable sentences of L
4. R ⊆ S, the refutable sentences of L
5. H ⊆ E, the predicates of L (i.e. expressions with free variables; H ∩ S = ∅)
6. A function ϕ : E × N → E that assigns to every E ∈ E and n ∈ N an expression E(n), such that for every H ∈ H and every n ∈ N, H(n) is a sentence (H(n) ∈ S). Informally, the sentence H(n) expresses the proposition that the number n belongs to the set named by H. We can think of such a function ϕ as a substitution function. For convenience, we assume here that predicates have exactly one variable.
The next set is the only one that depends on a semantic interpretation of the expressions. It is normally determined by a model that we accept as representing the truth. The model should be distinguished from the set of derivation rules that (syntactically or mechanically) determines whether sentences are provable or refutable.
7. T ⊆ S, the true sentences of L. This set can be determined by a model (see page 107).

¹ Sometimes also called a system, since it not only defines a language but also includes the (dis)provability and truth of expressions.

First, we give an intuitive explanation of this definition. In most parts of mathematics, not every sequence of symbols is meaningful or useful. Therefore we only consider the so-called well-formed formulas E.
- Some of these formulas (also called propositions) contain no free variables; we call them sentences (S).
  - Some sentences are provable from the axiomatic system, i.e. they can be derived from its axioms and derivation rules. These are the elements of P.
  - Others are refutable, also called disprovable, i.e. their negation can be derived from the axioms and derivation rules. These are the elements of R.
  These two notions depend only on whether the sentence is derivable from the axiomatic system, and are independent of its truth.
- Other formulas have free variables, so they behave as functions. We call them predicates (H).
We also assume there exists a function ϕ that assigns to every expression H ∈ H and natural number n a sentence H(n). We call the set of true sentences T; the other sentences are false. It is important to realize that the truth of a sentence is not the same as its provability.

What is an Axiomatic System? An axiomatic system (sometimes also called a formal axiomatic system) is a logical system that gives rise to an STGA language. It has an explicitly stated finite set of axioms from which provable sentences can be derived, using a finite set of derivation rules. The set of axioms and derivation rules determines which sentences of L are provable. The axiomatic system also contains a syntax definition that determines the well-formedness of expressions of L. Normally, this syntax definition consists of an alphabet of symbols and a set of rules. We show below that this notion of an axiomatic system gives rise to a language in the category of STGA languages. Such an axiomatic system A is often defined as follows:
Definition of axiomatic system: An axiomatic system A consists of:
• An alphabet Σ, consisting of a finite number of constants (with their arities) and variables.
• A recursive definition of a syntax, determining which formulas are well-formed. This recursive definition over the given alphabet gives us the set of expressions. The variables enable us to form predicates.
• An initially determined and fixed set of axioms and derivation rules (also called transformation rules or rules of inference). The axioms and derivation rules let us prove or refute sentences.
Ideally, the provable sentences coincide with the sentences we intuitively consider true (P = T), and the refutable sentences coincide with those we consider false. We call a system with this property correct.

We now give an example of the definition of a simple axiomatic system.
Example: axiomatic system A1
• Σ = {∨², ∧², ¬¹, ∀², (⁰, )⁰, R₀², x⁰, y⁰, true⁰, false⁰}
The superscript numbers denote the arities of the relations; a constant or variable is a 0-ary relation.
• ϕ is a well-formed formula if it
0. is one of the constants true and false,
1. is an atomic formula Rᵢ(x₁, ..., xⱼ), where Rᵢ is a relation with arity j and x₁, ..., xⱼ are variables or constants, or
2. has one of the forms ϕ₁ ∨ ϕ₂, ϕ₁ ∧ ϕ₂, ¬ϕ₁, (ϕ₁) or ∀xᵢ(ϕ₁), where ϕ₁ and ϕ₂ are smaller formulas and xᵢ is some variable from Σ.
• For all variables x, all variables or constants c and d, and every well-formed formula ϕ, the derivation rules are:
    ¬true → false          ¬false → true
    false ∧ ϕ → false      true ∧ ϕ → ϕ
    false ∨ ϕ → ϕ          true ∨ ϕ → true
    ∀x(ϕ) → false          R₀(c, d) → true

The STGA language L that can be constructed² on the basis of A1, denoted by L_A1, consists of the following parts:
1. E is the set of the usual mathematical predicates formed from the symbols of the given alphabet (so E includes the binary relation R₀).
2. S is the set of those expressions without free variables (i.e. propositions).
3. The provable sentences P are those that the derivation rules reduce to true. For example:
   ¬false ∧ R₀(false, true) → true ∧ R₀(false, true) → true ∧ true → true.
4. The refutable sentences R are those that the derivation rules reduce to false. For example:
   ∀y (false ∨ y) ∧ true → false ∧ true → false.
5. The predicates are those expressions with one free variable.
6. For each such predicate, we can replace the free variable by a formula that is represented³ by a natural number, and obtain a proposition.
7. If we think of the standard logic used in practice, we can see that every formula without a ∀-symbol is derivable if and only if it is true.
The definition of an axiomatic system does not include a model.

² It is sometimes also said that an axiomatic system A gives rise to a language L_A.
³ An example of such a bijective function between predicates and natural numbers will be given in section 8.2.

We now introduce some concepts related to STGA languages and axiomatic systems. We assume that A is an axiomatic system that gives rise to an STGA language L.
Definition of derivable: A formula ϕ is derivable in L := ϕ ∈ P. A formula ϕ is derivable from an axiomatic system A, notation A ⊢ ϕ, := there is an axiom aᵢ of A and a sequence of formulas ϕ₁, ..., ϕₗ such that ϕ₁ = aᵢ, ϕₗ = ϕ, and each ϕᵢ follows from the preceding formulas and the axioms of A by the derivation rules of A.
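The derivation rules of A1 act as rewrite rules. Deciding the status of a variable-free A1 sentence therefore amounts to rewriting it until it becomes true or false. The Python sketch below is our own illustration. It encodes A1 formulas as nested tuples (a hypothetical encoding) and applies the eight rules listed for A1, as we read them, to subformulas first. It then classifies a sentence as provable if it rewrites to true and refutable if it rewrites to false.

```python
TRUE, FALSE = ("true",), ("false",)


def normalize(f: tuple) -> tuple:
    """Rewrite an A1 formula with the A1 rules, reducing subformulas first.

    Encoding (an assumption for this sketch): TRUE, FALSE, ("var", name),
    ("R0", c, d), ("not", f), ("and", f, g), ("or", f, g), ("forall", x, f).
    """
    tag = f[0]
    if tag in ("true", "false", "var"):
        return f
    if tag == "R0":                        # R0(c, d) -> true
        return TRUE
    if tag == "forall":                    # forall x (phi) -> false
        return FALSE
    if tag == "not":
        a = normalize(f[1])
        if a == TRUE:                      # not true -> false
            return FALSE
        if a == FALSE:                     # not false -> true
            return TRUE
        return ("not", a)
    if tag in ("and", "or"):
        a = normalize(f[1])
        if tag == "and" and a == FALSE:    # false and phi -> false
            return FALSE
        if tag == "and" and a == TRUE:     # true and phi -> phi
            return normalize(f[2])
        if tag == "or" and a == TRUE:      # true or phi -> true
            return TRUE
        if tag == "or" and a == FALSE:     # false or phi -> phi
            return normalize(f[2])
        return (tag, a, normalize(f[2]))   # stuck on a free variable
    raise ValueError(f"not a well-formed A1 formula: {f!r}")


def classify(sentence: tuple) -> str:
    """'provable' if the sentence rewrites to true, 'refutable' if to false."""
    result = normalize(sentence)
    if result == TRUE:
        return "provable"
    if result == FALSE:
        return "refutable"
    return "undecided"


e1 = ("and", ("not", FALSE), ("R0", FALSE, TRUE))              # ¬false ∧ R0(false, true)
e2 = ("and", ("forall", "y", ("or", FALSE, ("var", "y"))), TRUE)  # ∀y (false ∨ y) ∧ true
print(classify(e1), classify(e2))  # provable refutable
```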
We call the sequence of formulas ϕ₁, ..., ϕₗ in a derivation of the statement ϕ a formal proof π of ϕ. When A ⊢ ϕ, we also write ϕ ∈ A.
Example:
A1 ⊢ ¬false ∧ R₀(false, true)
A1 ⊬ ∀x)x¬ (the formula is not well-formed, i.e. it does not follow from the syntax definition)
A1 ⊬ ∀y (false ∨ y) ∧ true (it does not follow from the derivation rules; it is a refutable sentence)

Hilbert proposed a program to reformulate all of mathematics as a formal axiomatic theory, which then has to be proved consistent, i.e. free from contradiction. The standard method for proving the consistency of an axiomatic system was to give a ‘model’. A model for an axiomatic theory is simply a system of objects, chosen from some other theory, that satisfies the axioms. More precisely, a model of a formal axiomatic theory is a well-defined mathematical system with the particular structure characterized by the theory. Models thus let us relate axiomatic systems to existing systems.
Definition of universe: The set of values that the variables of an axiomatic system may take
Definition of a model: A universe together with an assignment of n-ary relations to n-ary constants and a corresponding assignment of the variables; also called an interpretation or structure.
We define a model M for an axiomatic system A by M = (U, P₁, ..., Pₖ), where U is a universe for A and P₁, ..., Pₖ are the relations corresponding to the symbols R₁, ..., Rₖ of A. Suppose a formula ϕ is true in the model M, i.e. when the relation symbols are interpreted by the corresponding relations. We then say that M is a model of ϕ, notation M ⊨ ϕ.
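To make M ⊨ ϕ concrete, the Python sketch below (our own illustration, with a hypothetical tuple encoding) evaluates formulas in a finite model. The universe is {0, ..., 9} and ≤ interprets the binary relation symbol. Because N is infinite, a finite check like this cannot prove a universal statement about N. It can, however, exhibit a counterexample (here x = 0, y = 1) that also exists in N. The sketch only illustrates the definition.

```python
from typing import Callable, Dict, Optional, Sequence


def holds(f: tuple, universe: Sequence[int], rel: Callable[[int, int], bool],
          env: Optional[Dict[str, int]] = None) -> bool:
    """Decide M ⊨ f for the finite model M = (universe, rel).

    Encoding (an assumption for this sketch):
      ("rel", x, y) | ("not", f) | ("and", f, g) | ("or", f, g) | ("forall", x, f)
    """
    env = {} if env is None else env
    tag = f[0]
    if tag == "rel":
        return rel(env[f[1]], env[f[2]])
    if tag == "not":
        return not holds(f[1], universe, rel, env)
    if tag == "and":
        return holds(f[1], universe, rel, env) and holds(f[2], universe, rel, env)
    if tag == "or":
        return holds(f[1], universe, rel, env) or holds(f[2], universe, rel, env)
    if tag == "forall":
        var, body = f[1], f[2]
        # Try every value of the universe for the bound variable.
        return all(holds(body, universe, rel, {**env, var: u}) for u in universe)
    raise ValueError(f"unknown formula tag: {tag!r}")


def leq(a: int, b: int) -> bool:
    """Interpretation of the binary relation symbol."""
    return a <= b


U = range(10)  # finite stand-in for N
total = ("forall", "x", ("forall", "y",
         ("or", ("rel", "x", "y"), ("rel", "y", "x"))))
both = ("forall", "x", ("forall", "y",
        ("and", ("rel", "x", "y"), ("rel", "y", "x"))))
print(holds(total, U, leq), holds(both, U, leq))  # True False
```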
Example: Let M1 = (N, ≤) be a model for axiomatic system A1. Then
M1 ⊨ ∀x∀y(x ≤ y ∨ y ≤ x)
M1 ⊭ ∀x∀y(x ≤ y ∧ y ≤ x)
Note that instead of using the relation symbol R₀, we immediately used its interpretation ≤.

Definition of a theory: The theory of a model M, notation Th(M), is the set of true statements in the language of that model:
Th(M) := {ϕ | ϕ is a statement and M ⊨ ϕ}
So we can now say that Hilbert was looking for an axiomatic system for which logic can be a model.

Hilbert required such an axiomatic system to have the properties of consistency, completeness and decidability. We now introduce these concepts, along with some other properties of axiomatic systems. Since the properties of an axiomatic system A give rise to corresponding properties of the language L_A, each definition distinguishes between the property of a language and that of an axiomatic system.

Definition of consistency: A language L is consistent := ¬(∃s : s ∈ S : s ∈ P ∧ s ∈ R), i.e. P ∩ R = ∅: no sentence is both provable and refutable in L. A language L is inconsistent if it is not consistent; clearly, L is inconsistent exactly when P and R are not disjoint. An axiomatic system A is consistent := ¬(∃ϕ :: A ⊢ ϕ ∧ A ⊢ ¬ϕ), i.e. there is no formula ϕ for which both ϕ and ¬ϕ can be derived (see also [49, page 240]).
Definition of decidability: A language L is decidable := (∀ϕ : ϕ ∈ S : ϕ ∈ P ∨ ϕ ∈ R). An axiomatic system A is decidable := (∀ϕ :: there is an algorithm that decides in a finite number of steps whether or not A ⊢ ϕ) (see also [49, page 270]).
Note that consistency and decidability do not refer to T; they concern only P and R. The following definitions of completeness, soundness and correctness, by contrast, also depend on the truth set T (and therefore on the model that determines that truth set).
Definition of completeness: A language L is complete for a model M := (∀ϕ :: M ⊨ ϕ → ϕ ∈ P). A language L is incomplete if it is not complete.
An axiomatic system A is complete for a model M := (∀ϕ :: M ⊨ ϕ → A ⊢ ϕ) (i.e. all statements that are true in the model are derivable/provable). Note that the statement (∀ϕ :: M ⊨ ϕ → A ⊢ ϕ) is equivalent to (∀ϕ :: A ⊬ ϕ → M ⊭ ϕ), i.e. all statements ϕ that are not derivable/provable are also not true in the model.

Definition of soundness: A language L is sound for a model M := (∀ϕ :: ϕ ∈ P → M ⊨ ϕ).
An axiomatic system A is a sound axiomatization for a model M := (∀ϕ :: A ⊢ ϕ → M ⊨ ϕ) (i.e. if a statement ϕ is derivable/provable, it is true in the model).

Definition of correctness: A language L is correct for a model M := P ⊆ T ∧ R ∩ T = ∅ (i.e. every provable sentence is true and every refutable sentence is false (not true)).
An axiomatic system A is correct for a model M := A is sound for M and A is complete for M.

Theorem: If L is correct, it is consistent.
Proof: This follows directly from the definitions of correctness and consistency: if P is a subset of T and T is disjoint from R, then P must be disjoint from R.
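For a finite toy language, all of these definitions reduce to set comparisons between S, P, R and T. The Python sketch below assumes the four sets are given explicitly as finite sets of sentence names. This is a simplification: for real languages the sets are infinite, and P, R and T are in general not computable. The sketch also confirms the theorem above by exhaustive search over every choice of P, R and T inside a three-sentence S.

```python
from itertools import chain, combinations
from typing import FrozenSet, List, Set


def consistent(P: Set[str], R: Set[str]) -> bool:
    return not (P & R)              # no sentence is both provable and refutable


def decidable(S: Set[str], P: Set[str], R: Set[str]) -> bool:
    return S <= (P | R)             # every sentence is provable or refutable


def complete(P: Set[str], T: Set[str]) -> bool:
    return T <= P                   # every true sentence is provable


def sound(P: Set[str], T: Set[str]) -> bool:
    return P <= T                   # every provable sentence is true


def correct(P: Set[str], R: Set[str], T: Set[str]) -> bool:
    return P <= T and not (R & T)   # provable ones are true, refutable ones are not


def subsets(s: Set[str]) -> List[FrozenSet[str]]:
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


# Exhaustive check of "correct implies consistent" for a three-sentence language.
S3 = {"s1", "s2", "s3"}
theorem_holds = all(
    consistent(P, R)
    for P in subsets(S3) for R in subsets(S3) for T in subsets(S3)
    if correct(P, R, T)
)
print(theorem_holds)  # True
```

Consider S = {a, b, c, d}, T = {a, b}, P = {a} and R = {c}. This language is correct, and therefore consistent. It is neither complete (b is true but not provable) nor decidable (b and d are neither provable nor refutable).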
6.2 Hilbert’s 23 problems

Who of us would not be glad to lift the veil behind which the future lies hidden; to cast a glance at the next level of our science and at the secrets of its development during future centuries? What particular goals will there be toward which the leading mathematical spirits of coming generations will strive? What new methods and new facts in the wide and rich field of mathematical thought will the next centuries disclose?
D. Hilbert, in the opening of his speech to the 1900 Congress in Paris

In 1900 Hilbert presented his list of 23 mathematical problems to the International Congress of Mathematicians in Paris. His famous address was important, and it still influences and stimulates mathematical research all over the world today. It was not only a collection of problems that he urged upon the attention of his contemporaries. It was also a statement of his philosophy of mathematics (see also the formalist viewpoint in section 5.2), together with a collection of problems important to that philosophy. Many of the problems have since been solved, and each solution was a noted event (or even a mathematical breakthrough). Some of these problems, however, remain unsolved to this day.

In 2000, in the footsteps of Hilbert, the Clay Mathematics Institute (see http://zax.mine.nu/interests/questions/clay.htm) made a new list of 7 (largely mathematical) problems to be solved in this century. Among those problems is one of Hilbert’s original problems (number 8). It requires a solution to the Riemann hypothesis, which is usually considered to be the most important unsolved problem in mathematics. For a complete source of information on the 23 (or 25?, see [32]) problems and Hilbert’s original publications, see the articles [41] and [40], also available online [42]. We mention some of the original problems that are related to the foundations of mathematics.
• Problem 1: Cantor’s problem of the cardinal number of the continuum. This problem is also known as the Continuum Hypothesis and is extensively covered in section 3.7.

• Problem 2: The consistency of the axioms of arithmetic. The question is whether it can be shown that the axioms on which arithmetic is based are consistent. Gödel later showed that any consistent formal system that contains arithmetic can never prove its own consistency (see chapter 8). There might, however, exist another, metamathematical argument that cannot be expressed in the system but can prove its consistency.

• Problem 6: Mathematical treatment of the axioms of physics. Hilbert asks to treat in the same manner, by means of axioms, those physical sciences in which mathematics plays an important part; in the first rank are the theory of probabilities and mechanics. So far no complete axiomatization of physics has been found.

• Problem 9: Proof of the most general law of reciprocity in algebraic number theory. For any field of numbers, the law of reciprocity is to be proved for the residues of the l-th power, when l denotes a prime, and further when l is a power of 2 or a power of an odd prime (for more references see http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob9.html). This problem is still unsolved.

• Problem 10: Decidability of the solvability of diophantine equations. The problem asks, ‘given a diophantine equation with any number of unknown quantities and with rational integral numerical coefficients, to devise a process according to which it can be determined by a finite number of operations whether the equation is solvable in rational integers’.
A root of a polynomial is an assignment of values to its variables so that the value of the polynomial is 0. A root is an integral root if all variables are assigned integer values. Some polynomials have an integral root and some do not. For example, 6x³yz² + 3xy² − x³ − 10 has an integral root at x = 5, y = 3 and z = 0. In modern terminology the problem asks to devise an algorithm that tests whether a polynomial has an integral root.
Hilbert did not use the term algorithm but rather ‘a process according to which it can be determined by a finite number of operations’. In order to solve this problem this notion had to be made more precise (this was done by Turing, see section 9.1). Because Hilbert asked that an algorithm be devised, he apparently assumed such an algorithm exists. Now we know that this problem is algorithmically unsolvable. In 1970, the young Russian Yuri Matijasevic, building on the work of Martin Davis, Hilary Putnam and Julia Robinson, showed that no algorithm exists for testing whether a polynomial has integral roots.

• Problem 23: Further development of the methods of the calculus of variations. Of the 23 problems Hilbert posed, this one is the least definite, since it involves the general question of extending the calculus of variations, which basically is the theory of the variation of functions. With some examples that we will not treat here, Hilbert gave a justification of the necessity for an extension of the differential and integral calculus (for more references see http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob23.html).

At the end of his article, Hilbert says that he does not believe that mathematics will, like other sciences, split into separate branches whose connection becomes ever more loose. Instead, he holds that the organic unity of mathematics is inherent in the nature of this science, for mathematics is the foundation of all exact knowledge of natural phenomena. For a more detailed assessment of Hilbert’s view, see [49, section 4.4] and [31, section 12.7].
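A bounded search illustrates the difference between verifying a root and deciding solvability, which is what Problem 10 asks for. The Python sketch below confirms the root x = 5, y = 3, z = 0 of the example polynomial and then tries all integer assignments in a box. The names `example_poly` and `find_integral_root` and the search bounds are assumptions for illustration.

Such a search can only ever confirm that a root exists:
- Removing the bound (enumerating ever larger boxes) gives a procedure that halts exactly when an integral root exists.
- By Matijasevic’s result, no procedure can also always halt with the answer ‘no root’.

```python
from itertools import product
from typing import Callable, Optional, Tuple


def example_poly(x: int, y: int, z: int) -> int:
    """The polynomial 6x^3yz^2 + 3xy^2 - x^3 - 10 from the text."""
    return 6 * x**3 * y * z**2 + 3 * x * y**2 - x**3 - 10


def find_integral_root(poly: Callable[..., int], n_vars: int,
                       bound: int) -> Optional[Tuple[int, ...]]:
    """Return some integral root with all |values| <= bound, or None.

    None only means "no root in this box", not "no integral root at all".
    """
    box = range(-bound, bound + 1)
    for values in product(box, repeat=n_vars):
        if poly(*values) == 0:
            return values
    return None


assert example_poly(5, 3, 0) == 0          # the root given in the text
root = find_integral_root(example_poly, 3, 5)
no_root = find_integral_root(lambda x: x * x - 2, 1, 100)  # x^2 = 2 has no integer solution
```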
Chapter 7
Types

7.1 Russell and Whitehead’s Principia Mathematica

Logic has become more mathematical and mathematics has become more logical. The consequence is that it has now become wholly impossible to draw a line between the two; in fact, the two are one. They differ as boy and man: logic is the youth of mathematics and mathematics is the manhood of logic.
B. Russell in [79, page 194]

In section 4.1 we saw that, with the postulates he presented, Peano stated and organized the fundamental laws of number theory. As we have seen in section 4.2, Frege was an adherent of logicism, whose goal is to derive all of mathematics from logic alone. If statements satisfying Peano’s postulates could be derived in Frege’s logic, this would show that (at least part of) mathematics, the core of mathematics, was founded in pure logic. Unfortunately, as we have learned from Russell’s paradox in section 5.1, the language that Frege created was inconsistent. In his 1908 paper ‘Mathematical Logic as Based on the Theory of Types’, Russell laid out a theory to eliminate the paradoxes. With Principia Mathematica, Bertrand Russell and his teacher, the mathematician Alfred Whitehead, presented this theory. It prevents the paradoxes while at the same time allowing many of the operations Frege considered desirable. The theory of types basically says that all sets and other entities have a logical ‘type’. These types can be ordered, and sets are always constructed from specified members of lower types. We will look at the theory of types in more detail in section 7.2.

Principia Mathematica consisted of three volumes (it is sometimes also called ‘the Principia’). It was named after the ‘Philosophiae naturalis principia mathematica’ of the English physicist Isaac Newton. Unlike Newton’s book, however, it dealt not with the application of mathematical techniques to physics but with logic and mathematics itself.

The collaboration of Whitehead and Russell began in 1903. Both were then in the initial stages of preparing second volumes to earlier books on related topics: Whitehead’s 1898 ‘A Treatise on Universal Algebra’ and Russell’s 1903 ‘The Principles of Mathematics’. Their work overlapped considerably, and they began collaborating on what would become ‘Principia Mathematica’. With their mathematical treatment of the principles of mathematics, Russell and Whitehead intended to summarize the recent work in logic, to give a revolutionary and systematic development of mathematical logic, and to derive basic mathematical principles from the principles of logic alone.

The approach of Russell and Whitehead was essentially that of Frege: to define mathematical entities (like numbers) in pure logic and then derive their fundamental properties. Their definition of the natural numbers, for example, was basically the same as Frege’s. Unlike Frege, however, they opted to avoid the philosophical aspects and justifications. Although ‘Principia’ was largely successful, the axiom of infinity and the axiom of reducibility were still criticized, because they were considered too ad hoc to be justified philosophically. In 1919 Russell published about the philosophy behind his work in an ‘Introduction to Mathematical Philosophy’. Because it was accessible to a broad audience, this book has been the main source through which Russell’s logicist view of mathematics has become known.

I quote the following assessment of Principia Mathematica from [91]: “In addition to its notation (much of it borrowed from Peano), its masterful development of logical systems for propositional and predicate logic, and its overcoming of difficulties that had beset earlier logical theories and logistic conceptions, the Principia offered discussions of functions, definite descriptions, truth, and logical laws that had a deep influence on discussions in analytical philosophy and logic throughout the 20th century. What is perhaps missing is any hesitation or perplexity about the limits of logic: whether this logic is, for example, complete, provably consistent, or decidable, or whether there are concepts expressible in natural languages but not in this logical notation. This is somewhat odd, given the well-known list of problems posed by Hilbert in 1900 that came to animate 20th-century logic, especially German logic. The Principia is a work of confidence and mastery and not of open problems and possible difficulties and shortcomings; it is a work closer to the naive progressive elements of the Jahrhundertwende than to the agonizing fin de siècle.”

We would like to add that, with their very formal and accurate build-up of mathematics, Russell and Whitehead not only managed to avoid the paradoxes. They also created one of the most impressive and complicated works of all time, which is considered, next to Aristotle’s Organon, to be the most influential book on logic ever written. In the next section we will further investigate Russell’s theory of types.

The English mathematician Frank Plumpton Ramsey (1903-1930) offered criticism of the theory of types that was accommodated in later editions of Principia Mathematica. The German mathematicians Hilbert and Wilhelm Ackermann (1896-1962) later simplified this theory. The result is the ‘deramified theory of types’, which will be treated in subsequent sections.

The mathematician Alonzo Church also published articles on type systems, but he did not develop his typed version of the lambda calculus before the 1940’s. His typed lambda calculus therefore falls outside the scope of this report (1870-1940), and we will only summarize his work in this paragraph. The type theory that emerged from Church’s lambda calculus (see section 9.2) was extended with simple types in 1940 to prevent paradoxes. This is similar to the extension of logical set theory with simple types by Russell in 1910 to avoid the paradoxes. The main difference between the type structures of Russell and Church is that Russell’s is set-based, with a linear ordering of types, while Church’s is function-based, with a nonlinear order of types. Church also proposed another logical set theory in 1974.

“[...] in the simple theory of types it is well known that the individuals may be dispensed with if classes and relations of all types are retained, or one may abandon also classes and relations of the lowest type, retaining only those of higher type. In fact any finite number of levels at the bottom of the hierarchy of types may be deleted. But this is no reduction in the variety of entities, because the truncated theory of types, by appropriate deletions of entities in each type, can be made isomorphic to the original hierarchy and indeed the continued adequacy of the truncated hierarchy to the original purposes depends on this isomorphism.”
A. Church in ‘The need for abstract entities’

Organization of Principia Mathematica

The nearly 2,000-page Principia Mathematica starts with a short preface that explains what it wants to demonstrate, namely that pure mathematics can be based on logic alone and requires no other primitive notions. The belief was expressed that pure mathematics was sufficient to include all traditional mathematics. Russell classifies statements that involve logical constants only as pure mathematics; an example is the laws of reciprocity (see page 18 of Principia Mathematica). Other mathematical assertions that also refer to non-logical contents, such as the statement that (perceptual) space is three-dimensional, he classifies as applied mathematics.

After an introduction, the first volume introduces a symbolic logic that is based on a small set of axioms, and then lays out the propositional and predicate calculi. Built upon these, Whitehead and Russell define types, sets, relations and their properties, and basic operations on sets. This allowed them to introduce basic arithmetic.

The second volume continues with a purely logical theory of cardinal and ordinal arithmetic, including addition, multiplication and exponentiation of both finite cardinals and relations. The volume ends with a general theory of simply ordered sets (series), followed by a logical basis for fundamental mathematical analysis. This includes subjects such as convergent sequences, limits and derivatives, and continuity.

The third volume starts with a theory of well-ordered sets and of finite, infinite and continuous series. It also contains a theory of numbers that was called ‘measurement’: the negative integers, ratios and the real numbers, and finally vectors, coordinates and basic geometric notions such as angles. The third volume was meant to prepare the ground for the fourth and concluding volume on geometry, which was never completed.

More details about the organization of Principia Mathematica and a critical assessment of the work can be found in [31, chapter 7, and specifically section 7.8].
The symbolic logic and notation of Principia Mathematica

Russell and Whitehead opted for the more modern notation of Peano instead of Frege’s Begriffsschrift. Unlike Frege, Russell and Whitehead treated functions as first-class citizens. A good introduction to the logical calculus and the specific notation used in Principia Mathematica can be found in [49, section 3.3] and [31, sections 7.7 and 7.8].

Russell’s theory of types

Russell’s 1908 paper included a categorization of most of the important contradictions of that time, and an analysis of their common characteristics. To prevent the paradoxes he catalogued, Russell formulated the vicious circle principle (see page 85). He implemented it using types in Principia Mathematica (see for details [31, section 7.9] and [49, sections 3.2 and 3.3]).

What is a type? A type is the range of significance of a propositional function, i.e. the collection of arguments for which that function is significant and has values. The type of an object can be seen as a property of that object. This division of objects into types is necessary to conform to the vicious circle principle, that is, to make sure that ‘whatever contains an apparent variable must not be a possible value of that variable’. This can be established by making sure that ‘an apparent variable’ is of a different and higher type than its possible values. The type of a variable in a proposition is fixed by all the values the function is concerned with, i.e. by the totality over which the variable ranges.

Russell then defined an individual as being not a proposition but a constant, i.e. destitute of complexity. We can now categorize propositions by their types, since the variables contained in an object determine the type of that object. First-order propositions are elementary propositions that only contain individuals. Second-order propositions are propositions with first-order propositions as variables, and possibly entities of lower than first-order type (individuals). This can be continued: (n+1)-th order propositions contain propositions of order n and possibly others of order smaller than n. This linear order of types prevents vicious circles.
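The ordering of propositions described above can be computed recursively. The Python sketch below uses a hypothetical encoding:
- an `Individual` has order 0;
- a `Prop` lists the entities that occur in it as variables.

The order of a proposition is one more than the highest order among its variables, so elementary propositions get order 1. A proposition’s order is strictly greater than that of everything it ranges over. Hence no proposition can occur among its own possible values.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Individual:
    """An individual: not a proposition, destitute of complexity (order 0)."""
    name: str


@dataclass(frozen=True)
class Prop:
    """A proposition, described only by the entities it contains as variables."""
    name: str
    variables: Tuple[Union[Individual, "Prop"], ...] = ()


def order(entity: Union[Individual, Prop]) -> int:
    """Order 0 for individuals; otherwise 1 + the highest order of the variables."""
    if isinstance(entity, Individual):
        return 0
    return 1 + max((order(v) for v in entity.variables), default=0)


a, b = Individual("a"), Individual("b")
p1 = Prop("p1", (a, b))     # first order: only individuals
p2 = Prop("p2", (p1, a))    # second order: a first-order proposition and an individual
p3 = Prop("p3", (p2,))      # third order
print([order(x) for x in (a, p1, p2, p3)])  # [0, 1, 2, 3]
```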
We now also restrict relations like ∈, so that x ∈ y is only significant when y is of a type one level higher than x, and we always confine quantifiers to a single level. As can be proved, this way of restricting propositions prevents the paradoxes, but it can in some cases be needlessly restrictive. For more information about types in Principia Mathematica, see [31, section 7.9] and [49, section 3.3]. For a formalization (in modern notation) of Russell’s Ramified Theory of Types (RTT), we refer to [86, chapter 3]. This reference is in turn partly based on [52], [53], [54] and [43], each of which discusses RTT in some context. A detailed introduction to the (symbolic) logic and notation of Principia Mathematica, as well as a formal introduction to RTT, STT, NF and ML (see section 7.3), is to be included in a later version of this report.
7.2 Ramsey, Hilbert and Ackermann

Suppose a contradiction were to be found in the axioms of set theory. Do you seriously believe that a bridge would fall down?
F.P. Ramsey, quoted in [58]

Ramsey published his first major work, ‘The Foundations of Mathematics’ (see [69, pages 105-142]), in 1925. In this publication he attempted to improve Principia Mathematica in two ways.

First, he proposed dropping the axiom of reducibility. He writes that this axiom is “[...] certainly not self-evident and there is no reason to suppose it true; and if it were true, this would be a happy accident and not a logical necessity, for it is not a tautology”.

His second proposal was to simplify Russell’s theory of types by regarding certain semantic paradoxes as linguistic. He accepted Russell’s solution for removing the logical paradoxes of set theory, such as those arising from ‘the set of all sets which are not members of themselves’. However, Ramsey claims, the semantic paradoxes such as ‘this is a lie’ are quite different and depend on the meaning of the word ‘lie’. These he removed with his reinterpretation of the axiom of reducibility.

After his suggestions, Russell’s theory became known as the ramified theory of types (RTT), and Ramsey’s modification of it as the deramified theory of types. Hilbert, together with Ackermann (see [2]), simplified Russell’s theory of types by removing the orders. The result has become known as the ‘simple theory of types’ (STT). For more detailed information about the history of deramification, we refer to [86, chapter 4]. We quote from page 115 of [49]:

“[In the simple theory of types,] every individual or individual variable is said to be of type i, and if a predicate or predicate variable ϕ(x1, ..., xn) has arguments x1, ..., xn of types τ1, ..., τn respectively, then ϕ(x1, ..., xn) is said to be of type (τ1, ..., τn). Thus, for example, any predicate with two individual arguments is of type (i, i), while a predicate with a single argument that is itself a predicate with two individual arguments is of type ((i, i)). Having introduced the hierarchy of types in this way, we shall now require bound variables to be of some definite type. Every quantifier will then range over the totality of all entities of the same type as the bound variable. When this is done, we have a very comprehensive logical calculus which is secure against vicious circularity.”

A further discussion and formalization of the simple theory of types, in the form of Church’s simply typed lambda calculus λ→Ch, can be found in [86].
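The type notation of the simple theory of types is easy to mechanize. The Python sketch below uses a hypothetical encoding, and the names `pred_type`, `well_typed` and `show` are illustrative:
- the string "i" stands for the type of individuals;
- a tuple (τ1, ..., τn) stands for the type of a predicate whose arguments have types τ1, ..., τn.

A predicate application is accepted only when each argument has exactly the type the predicate expects. As a consequence, a predicate can never be applied to itself: that would require its type τ to equal (τ), which is impossible for finite types.

```python
from typing import Sequence, Tuple, Union

SType = Union[str, Tuple["SType", ...]]  # "i", or a tuple of argument types

I: SType = "i"


def pred_type(*arg_types: SType) -> SType:
    """Type (t1, ..., tn) of a predicate whose arguments have types t1, ..., tn."""
    return tuple(arg_types)


def well_typed(predicate: SType, args: Sequence[SType]) -> bool:
    """phi(x1, ..., xn) is well formed iff phi has type (t1, ..., tn) and each xk has type tk."""
    return isinstance(predicate, tuple) and predicate == tuple(args)


def show(t: SType) -> str:
    """Print a type in the notation of the quoted passage."""
    if t == I:
        return "i"
    return "(" + ", ".join(show(arg) for arg in t) + ")"


binary = pred_type(I, I)        # predicate with two individual arguments
higher = pred_type(binary)      # predicate whose single argument is such a predicate
print(show(binary), show(higher))  # (i, i) ((i, i))
```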
7.3 Quine

Just as the introduction of the irrational numbers ... is a convenient myth [which] simplifies the laws of arithmetic ..., so physical objects are postulated entities which round out and simplify our account of the flux of existence ... The conceptual scheme of physical objects is [likewise] a convenient myth, simpler than the literal truth and yet containing that literal truth as a scattered part ...
Quine, quoted in [50]

Willard Van Orman Quine (1908-2000) was an American mathematician who became interested in the work of Russell. An alternative to Russell’s system is one that allows a single universe of all types (or all sets). In Russell’s theory such an object is too big. According to others, including Quine, however, a set of all sets or a type of all types is legitimate as long as we do not permit forming all subsets. If there is some restriction on which subsets can be formed, for example by requiring a stratified predicate to define the subset, then no contradiction will result. Based on this idea, Quine proposed in [94, pages 80-101] a system called New Foundations (NF). To restrict the way subsets are formed, Quine restricted the comprehension axiom to:

NFC(omprehension) Axiom: ∃x∀y :: (y ∈ x ↔ ϕ(y)), where x is not free in ϕ(y) and ϕ(y) is stratified.

In [86, footnote 4], we find two definitions of stratification:

Definition of heterogeneous stratification: A well-formed formula ϕ is heterogeneously stratified := there is a function f from the variables and constants of ϕ to the natural numbers such that for each atomic well-formed formula F(x1, ..., xn) of ϕ, f(F) = 1 + (max : 1 ≤ i ≤ n : f(xi)).

Definition of homogeneous stratification: A well-formed formula ϕ is homogeneously stratified := ϕ is heterogeneously stratified and for the corresponding function f we also have that f(xi) = f(xj) for 1 ≤ i, j ≤ n.

With the NFC axiom, Russell’s paradox is obviously prevented, since the formula ϕ ≡ x ∉ x is not stratified.
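Whether a formula admits a homogeneous stratification can be decided by propagating the constraints f(F) = f(xi) + 1 along its atomic subformulas. The Python sketch below assumes a hypothetical encoding:
- each atomic formula F(x1, ..., xn) is given as a pair (F, (x1, ..., xn));
- membership x ∈ y is read as the atomic formula y(x), so that its constraint becomes f(y) = f(x) + 1.

It covers only the homogeneous definition; the heterogeneous variant with a maximum is not handled. The search runs a breadth-first traversal and fails as soon as two paths force different levels on the same symbol. The resulting levels are finally shifted so that the smallest one is 0, which gives a function into the natural numbers.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (F, (x1, ..., xn)) for the atomic formula F(x1, ..., xn)


def homogeneous_stratification(atoms: Sequence[Atom]) -> Optional[Dict[str, int]]:
    """Return levels f with f(F) = f(x) + 1 for every argument x of every atom, or None."""
    # Each constraint f(F) - f(x) = 1 becomes a pair of weighted edges.
    edges: Dict[str, List[Tuple[str, int]]] = {}
    for pred, args in atoms:
        edges.setdefault(pred, [])
        for x in args:
            edges[pred].append((x, -1))                 # f(x) = f(F) - 1
            edges.setdefault(x, []).append((pred, 1))   # f(F) = f(x) + 1

    level: Dict[str, int] = {}
    for start in edges:
        if start in level:
            continue
        level[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, step in edges[u]:
                if v not in level:
                    level[v] = level[u] + step
                    queue.append(v)
                elif level[v] != level[u] + step:
                    return None  # contradictory levels, e.g. in x ∈ x
    lowest = min(level.values(), default=0)
    return {symbol: value - lowest for symbol, value in level.items()}


russell = [("x", ("x",))]                   # x ∈ x, read as x(x)
chain = [("y", ("x",)), ("z", ("y",))]      # x ∈ y ∧ y ∈ z
print(homogeneous_stratification(russell))  # None
print(homogeneous_stratification(chain))    # {'y': 1, 'x': 0, 'z': 2}
```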
We quote from [86, page 3]: “NF is weak for mathematical induction, and the axiom of choice is not compatible with NF. We cannot prove Peano’s axiom[s] in it, unless we assume the existence of a class with m + 1 elements. NF is said to lack motivation because its axiom of comprehension is justified only on technical grounds, and one’s mental image of set theory does not lead to such an axiom. To overcome some of the difficulties, Quine adopted measures similar to NBG (Neumann-Bernays-Gödel, see section 8.5) set theory[, and developed another non-iterative set theory called ML (Mathematical Logic), first presented in [70]]. Like NBG, ML contains a bifurcation of classes into elements and non-elements. Sets can enjoy the property of being full objects, whereas classes cannot. ML was obtained from NF by replacing (NFC) by two axioms, one for class existence and one for elementhood. The rule of class existence provides [...] the existence of the classes of all elements satisfying any condition ϕ, stratified or not. The rule of elementhood is such as to provide the elementhood of just those classes which exist for NF. Therefore, the two axioms of comprehension for ML [are]:

Comprehension by a set: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is stratified, with set variables only, and y does not occur free in ϕ(x).
Impredicative comprehension by a class: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is any formula in which y does not occur free.

ML was liked both for the manipulative convenience we regain in it and the symmetrical universe it furnishes. It was, however, proved subject to the Burali-Forti paradox.”

For more information, we refer to [70], [71], [72] and the website http://diamond.boisestate.edu/~holmes/holmes/nf.html.
Chapter 8
Gödel

The development of mathematics towards greater precision has led, as is well known, to the formalization of large tracts of it, so that one can prove any theorem using nothing but a few mechanical rules. [...] It will be shown below that this is not the case, that on the contrary there are in the two systems mentioned [viz. Principia Mathematica and ZF] relatively simple problems in the theory of integers that cannot be decided on the basis of the axioms.
K. Gödel, in the opening of the paper introducing the incompleteness theorem (1931)

8.1 Informally: Gödel’s incompleteness theorems

No system of Hilbert’s type in which the integers (or Peano’s arithmetic, see section 4.1) can be defined can be both consistent and complete. At the time this seemed unreal. In 1931, however, Kurt Gödel (born in 1906 in Brünn, Austria-Hungary, now Brno, Czech Republic) presented mathematicians with an astounding and melancholy conclusion: the axiomatic method has certain limitations. These limitations rule out the possibility that even ordinary arithmetic (as formulated by Peano) can ever be fully axiomatized. As a corollary of this theorem, he proved that it is impossible to establish the internal logical consistency of a very large class of deductive systems. This result provoked a reappraisal of philosophies of mathematics.
Gödel’s famous incompleteness theorem and the corresponding corollary are also called the first and the second incompleteness theorem. Gödel was able to show that the following holds if an axiomatic system of formalized arithmetic is wide enough (to be precise, for ω-consistent systems; see section 8.2 for more details):

1. the system is necessarily incomplete, in the sense that there exists a formula ϕ of the system such that neither ϕ nor its negation is derivable (see also section 8.2 for the definition of incompleteness), and
2. if the system is consistent, then no proof of its consistency is possible which can be formalized within it (see also section 8.2 for the definition of consistency).

In this section we first indicate the main lines of both theorems in 8 steps, following the lines of Gödel’s original proof. A more rigorous and exact proof of the theorems is given in section 8.2 and further sections.

1 The (syntax of the) formulas of an axiomatic system are precisely defined and built up from a finite alphabet of symbols. Proofs are nothing but finite series of formulas and can be replaced by numbers. Gödel gave a well-ordering of all well-formed formulae of an axiomatic system S: the Gödel numbering. With such a representation, Gödel then showed how to represent metamathematical concepts such as ‘formula’, ‘proof schema’ and ‘provable formula’ by a series of natural numbers. We define gn(ϕ) to be the Gödel number corresponding to the well-formed formula ϕ of S.

2 We consider a formula prov(ϕ) of S, stating that ϕ is a provable formula: prov(ϕ) := ‘ϕ is a provable formula’. A class sign is a formula with just one free variable. We suppose that the class signs are ordered by a function R with domain N, such that R(n) is the n-th class sign. By [R(n), q] we denote the formula obtained by replacing the free variable in R(n) by q.

3 We now define a set K of natural numbers by n ∈ K ↔ ¬prov([R(n), n]). Since the symbols that are used in this formula are all definable in S, there also is a formula with one free variable (i.e. a class sign) that expresses n ∈ K. We call this class sign C. So there is a natural number q such that C = R(q).

4 By replacing the free variable in the class sign C by q, we obtain the proposition G ≡ [R(q), q]. This proposition says that q ∈ K, i.e. ¬prov([R(q), q]). In step 3 a proposition was defined that states ‘this formula is not provable’, or in other words ‘this formula does not follow from the axioms’. We can therefore say that G is a proposition that asserts of itself that it is not provable. We now show that G is undecidable, by showing that G is provable ↔ ¬G is provable:
• Suppose G is provable, i.e. prov([R(q), q]), or prov(G), holds. G itself says (by replacing the variable in the class sign C by q) that q ∈ K, that is ¬prov([R(q), q]): G is not provable. Its negation ¬G therefore says prov(G), which is exactly what we supposed, so a proof of G leads to a proof of ¬G.
• Suppose ¬G is provable, i.e. ¬[R(q), q] is provable. This says (by replacing the variable in the class sign C by q) that q ∉ K. That is equivalent to ¬¬prov([R(q), q]), or prov([R(q), q]), or prov(G): G is provable.
A proof of G thus leads to a proof of ¬G and vice versa. If either were provable, both would be, and the system S would be inconsistent. So if we assume that S is consistent, then neither G nor ¬G is provable: G is undecidable in S.

5 By a metamathematical consideration we know, however, that G is true. G asserts its own unprovability, and G is indeed unprovable (because it is undecidable), so it follows at once that G is true. So there is a true statement in S (namely G) that is not provable: the system S is incomplete!

6 If we add G as an axiom, we can again apply the argument of the previous five steps in the same way, since the construction of the proposition depends on the set of axioms. Basically we then create another formula G′ that is unprovable in the extended system. Therefore, the system S is ‘essentially incompletable’, as I. Grattan-Guinness cleverly calls it in [31, page 510].

7 Gödel then showed that ‘if arithmetic is consistent, it is incomplete’. We want to prove this conditional statement as a whole. We define the condition of the statement by A: ‘arithmetic is consistent’. We have already seen in section 6.1 that consistency means that there is at least one formula of arithmetic that is not provable. So we can express A ≡ (∃y :: (∀x :: ¬(x is a proof of y))). A system is incomplete if there is a true statement that is not provable. Thus we can represent the conclusion of the conditional statement by G.

8 We can now formally prove A → G (see section 8.2 for the proof). This means that if A is provable, we know (by modus ponens, or the rule of detachment) that G is provable. But we already saw that (unless S is inconsistent) G is not provable. Thus, if S is consistent, A is not provable! That means that if arithmetic is consistent, its consistency cannot be established by metamathematical reasoning within the formalism of arithmetic (this is Gödel’s theorem 11, see [93, page 614]). Or, as expressed in [31, page 510], ‘any set S of consistent formulae of PM cannot include the formula F asserting its consistency’.
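Step 1 can be made concrete with the classical prime-power coding, the kind of encoding Gödel himself used. A string of symbols s1 s2 ... sk with symbol codes c1, ..., ck is mapped to 2^c1 · 3^c2 · 5^c3 · ... · pk^ck. Unique prime factorization makes the mapping invertible.

Two caveats apply:
- The symbol table below is a hypothetical choice for illustration only; a specific numbering for arithmetic is given later in section 8.2.
- This coding is not onto N (for example, 3 codes no formula). The abstract treatment in section 8.2.1 instead assumes a bijective numbering.

```python
from typing import Dict, List

# Hypothetical symbol codes, chosen only for this illustration.
SYMBOL_CODE: Dict[str, int] = {"0": 1, "s": 2, "=": 3, "¬": 4, "∨": 5,
                               "∀": 6, "(": 7, ")": 8, "x": 9}
CODE_SYMBOL: Dict[int, str] = {code: sym for sym, code in SYMBOL_CODE.items()}


def first_primes(n: int) -> List[int]:
    """The first n primes, by trial division against the primes found so far."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def gn(formula: str) -> int:
    """Gödel number of a symbol string: the product of p_k ** code(s_k)."""
    number = 1
    for p, symbol in zip(first_primes(len(formula)), formula):
        number *= p ** SYMBOL_CODE[symbol]
    return number


def decode(number: int) -> str:
    """Invert gn by reading off the exponents of 2, 3, 5, ... in turn."""
    symbols: List[str] = []
    k = 1
    while number > 1:
        p = first_primes(k)[-1]
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        symbols.append(CODE_SYMBOL[exponent])  # KeyError if number is not a code
        k += 1
    return "".join(symbols)


print(gn("s0=s0"))  # 808500 = 2^2 * 3^1 * 5^3 * 7^2 * 11^1
```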
8.2 Formally: Gödel's Incompleteness Theorems

The first incompleteness theorem says that Principia Mathematica, or any other system in which arithmetic can be developed, is essentially incomplete: in any consistent set of arithmetical axioms there are statements that are true but cannot be derived from the set. The second theorem says that it is impossible to give a metamathematical proof of the consistency of a system comprehensive enough to contain the whole of arithmetic, unless the proof itself employs rules of inference that are in certain essential respects different from the derivation rules identifying theorems within the system.

In the following two subsections, we will first give an abstract version of Gödel's first and second incompleteness theorems, and then in the third subsection fill in the details by giving a specific Gödel numbering for arithmetic. In the next sections we will apply the theorems to the system of Peano Arithmetic and to that of Principia Mathematica, investigate the set of languages that the theorems apply to, and discuss the consequences of the incompleteness theorems.

8.2.1 On formally undecidable propositions

We assume there is an STGA language L and investigate the conditions on L under which, as Gödel showed, there is a true sentence that is not provable in L (i.e. (∃t : t ∈ T : t ∉ P)). We define the following concepts:

A predicate H expresses a set of numbers A := (∀n :: H(n) ∈ T ↔ n ∈ A).
A is expressible in L if A is expressed by some predicate of L.

Note that expressibility in L only concerns T, not P and R.

Theorem: Not every set of numbers is expressible in L.
Proof: (from [84]) Since L is built up of a finite number of symbols and derivation rules, there are only denumerably many expressions, and hence only denumerably many predicates, of L. But (by Cantor's theorem, see page 69) there are nondenumerably many sets of natural numbers. Therefore, not every set of numbers is expressible in L.
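The counting argument above rests on Cantor's diagonal construction: for any enumeration E_0, E_1, E_2, ... of sets of natural numbers, the set D = {n | n ∉ E_n} differs from every E_n at the number n. Since the predicates of L can be enumerated, applying this to the sets they express yields a set that no predicate expresses. The following is a minimal Python sketch, assuming that sets of numbers are represented by their characteristic functions; the names `diagonal_set` and `multiples_of_successor` are hypothetical and chosen only for this illustration.

```python
from typing import Callable

# A set of natural numbers is modelled by its characteristic function.
NumberSet = Callable[[int], bool]
Enumeration = Callable[[int], NumberSet]  # n -> the n-th set E_n


def diagonal_set(enumeration: Enumeration) -> NumberSet:
    """Return D = {n | n not in E_n}, which differs from every E_n at n."""
    return lambda n: not enumeration(n)(n)


def differs_from_all(d: NumberSet, enumeration: Enumeration, limit: int) -> bool:
    """Check, for the first `limit` sets, that d disagrees with E_n at n."""
    return all(d(n) != enumeration(n)(n) for n in range(limit))


def multiples_of_successor(n: int) -> NumberSet:
    """Example enumeration: E_n = {k | k is a multiple of n + 1}."""
    return lambda k: k % (n + 1) == 0


D = diagonal_set(multiples_of_successor)
print([D(n) for n in range(5)])  # [False, True, True, True, True]
```

Here 0 ∈ E_0 (every number is a multiple of 1), so 0 ∉ D, while n ∉ E_n for n ≥ 1, so those n lie in D. Whatever enumeration is supplied, D is never one of the enumerated sets.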
Let gn be a function that assigns to each expression a unique natural number (just as in step 1 in section 8.1); for any E ∈ E, we call gn(E) the Gödel number of E. We will give a specific numbering in section 8.2.3. For this abstract treatment the only assumption² we make is that every number is the Gödel number of some expression, i.e. gn is a bijection between E and N. We define E_n to be the inverse of gn, i.e. gn(E_n) = n.

² This assumption is made for technical reasons, to make the proof simpler. Gödel's original numbering did not have this restriction.

The diagonalization of E_n, for E_n ∈ H, is defined by E_n(n). We define d(n) to be the Gödel number of the diagonalization of E_n, that is: d(n) := gn(E_n(n)), and call d the diagonal function of the system.

For any set of natural numbers A, we define A* to be the set of all numbers n such that d(n) ∈ A, i.e. n ∈ A* ↔ d(n) ∈ A. For each set of natural numbers A, we define its complement ∼A to be the set of all natural numbers not in A. The complement operation ∼ binds stronger than *, i.e. ∼A* is to be read as (∼A)*.

Abstract form of Gödel's first theorem: Let P be the set of Gödel numbers of all the provable sentences. If the set ∼P* is expressible in L and L is correct, then there is a true sentence of L that is not provable in L.

Proof: (based on [84]) Suppose L is correct and ∼P* is expressible in L by a predicate H with Gödel number h, i.e. H expresses ∼P* in L: H(n) is true ↔ n ∈ ∼P* for all n ∈ N. Let G be the diagonalization of H (i.e. the sentence H(h)). We show that G is true but not provable in L. In particular, H(h) is true ↔ h ∈ ∼P*. We have that h ∈ ∼P* ↔ d(h) ∈ ∼P ↔ d(h) ∉ P. But since h is the Gödel number of H, by the definition of d, d(h) is the Gödel number of H(h); so d(h) ∈ P ↔ H(h) is provable in L, and d(h) ∉ P ↔ H(h) is not provable in L. Now we have: H(h) is true ↔ H(h) is not provable in L. This means that H(h) is either true and not provable in L, or false but provable in L. The latter alternative violates the hypothesis that L is correct. Hence H(h) must be true but not provable in L.

Note that in this proof we have not defined the set T by a model, but determined the truth of G by a metamathematical argument (just as we have seen in step 5 of section 8.1) that is nevertheless commonly accepted by all mathematicians. Note also that the proposition G corresponds to the proposition G of point 3 of section 8.1, since H(h) is a proposition that expresses of itself that it is not provable.

Theorem: If L is correct and the set ∼P* is expressible in L, then L is incomplete.
Proof: By the previous theorem, a system L that is correct and for which the set ∼P* is expressible in L contains a sentence G that is true but not provable. G is not refutable either: G is true, so its negation is false and, by the assumption of correctness, not provable. Hence G is true, but undecidable in L.

By this theorem, it follows immediately that if a system is correct (and hence consistent), such as a system containing Peano's arithmetic or the system of Principia Mathematica, and the set ∼P* is expressible in that system (which we will later see is true for a system of basic arithmetic), then it is incomplete. Note that this corresponds to the statement A → G of point 8 in section 8.1. That is where the name incompleteness theorem comes from.

When we study a particular language L, we have to verify the assumption that ∼P* is expressible in L. We can do this by separately verifying the following conditions:
G1: For any set A expressible in L, the set A* is expressible in L.
G2: For any set A expressible in L, the set ∼A is expressible in L.
G3: The set P is expressible in L.

Theorem: G1 ∧ G2 ∧ G3 → ∼P* is expressible in L.
Proof: G1 and G2 imply that for any expressible set A, ∼A* is expressible in L (G2 gives ∼A, and G1 applied to ∼A gives (∼A)*). In particular, if P is expressible in L (i.e. G3 holds), then ∼P* is expressible in L.
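The heart of the abstract first theorem is short enough to be checked mechanically. The following Lean 4 sketch is an illustration under stated assumptions, not the formalization used in [84]: `Tr n` and `Pr n` are hypothetical predicates meaning that the sentence with Gödel number n is true, respectively provable, and `app h n` is a hypothetical function giving the Gödel number of H(n) when h is the Gödel number of H, so that the diagonal function is d(n) = app n n. Correctness and the expressibility of ∼P* by H are taken as hypotheses.

```lean
-- Assumed abstract setting (names are hypothetical):
--   Tr n    : the sentence with Gödel number n is true      (n ∈ T)
--   Pr n    : the sentence with Gödel number n is provable  (n ∈ P)
--   app h n : the Gödel number of H(n), where h is the Gödel number of H;
--             the diagonal function is d n = app n n
theorem abstract_first_incompleteness
    (Tr Pr : Nat → Prop) (app : Nat → Nat → Nat)
    -- L is correct: every provable sentence is true
    (correct : ∀ n, Pr n → Tr n)
    -- H (with Gödel number h) expresses ∼P*: H(n) is true ↔ d(n) ∉ P
    (h : Nat) (expresses : ∀ n, Tr (app h n) ↔ ¬ Pr (app n n)) :
    ∃ g, Tr g ∧ ¬ Pr g :=
  -- G = H(h) has Gödel number app h h = d(h): it is true iff it is unprovable
  have key : Tr (app h h) ↔ ¬ Pr (app h h) := expresses h
  -- if G were provable it would be true (correctness), hence unprovable
  have notProv : ¬ Pr (app h h) := fun hp => key.mp (correct _ hp) hp
  ⟨app h h, key.mpr notProv, notProv⟩
```

The proof mirrors the text step by step: `key` is "H(h) is true ↔ H(h) is not provable in L", and correctness excludes the alternative "false but provable".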
Before we prove a general form of Gödel's second incompleteness theorem, we introduce some more definitions. A sentence E_n is a Gödel sentence for a set A of natural numbers if either E_n is true and its Gödel number lies in A, or E_n is false and its Gödel number lies outside A. That is, E_n is a Gödel sentence for A if and only if E_n ∈ T ↔ n ∈ A.

Diagonal Lemma: For any set A, if A* is expressible in L, then there is a Gödel sentence for A.
Proof: Suppose H is a predicate that expresses A* in L, and let h be its Gödel number. Then d(h) is the Gödel number of H(h). For any number n, H(n) is true ↔ n ∈ A*; in particular, H(h) is true ↔ h ∈ A* ↔ d(h) ∈ A. Since d(h) is the Gödel number of H(h), H(h) is a Gödel sentence for A.

Lemma: If L satisfies G1, then for any set A expressible in L there is a Gödel sentence for A.
Proof: L satisfies G1, thus for any expressible set A, A* is expressible in L. Now we can apply the diagonal lemma to conclude that there is a Gödel sentence for A.

With the diagonal lemma we can also prove the first theorem as follows. Since ∼P* is expressible in L, by the diagonal lemma there is a Gödel sentence G for ∼P. A Gödel sentence for ∼P is a sentence which is (by the definition of a Gödel sentence) true if and only if its Gödel number lies outside P, i.e. true if and only if it is not provable in L. If such a sentence were false, it would be provable, which is impossible in a correct system. So for any correct system L, a Gödel sentence for ∼P is a sentence which is true but not provable in L.

8.2.2 The impossibility of an 'internal' proof of consistency

With the diagonal lemma we can also prove a general form of Gödel's second theorem, which was first formulated in this form by the Polish mathematician Alfred Tarski.

A general form of Gödel's second theorem (by Tarski):
1. The set ∼T* is not expressible in L.
2. If condition G1 holds, then ∼T is not expressible in L.
3. If conditions G1 and G2 both hold, then the set T is not expressible in L (i.e. for systems in which G1 and G2 hold, truth within the system is not definable within the system).

Proof: To begin with, there cannot possibly be a Gödel sentence for the set ∼T, because such a sentence would be true if and only if its Gödel number was not the Gödel number of a true sentence, and this is absurd.
1. If ∼T* were expressible in L, then by the diagonal lemma there would be a Gödel sentence for the set ∼T, which we have just shown is impossible. Therefore, ∼T* is not expressible in L.
2. Suppose condition G1 holds. Then if ∼T were expressible in L, the set ∼T* would be expressible in L, violating (1).
3. If G2 also holds, then if T were expressible in L, ∼T would also be expressible in L, violating (2).

Now we have seen both theorems in a general form. In the next subsection we will consider particular mathematical languages, starting with first order arithmetic, on which we can build in section 8.3 to prove the incompleteness of systems based on Peano's arithmetic and other systems.

8.2.3 Gödel numbering and a concrete proof of G1, G2 and G3

This section will be completed in a later version of this document. For the moment we refer to Gödel's original work, which can be found in [93].
8.3 Gödel's theorem and Peano Arithmetic

The classification of the various modes of syllogisms, when they are exact, has little importance in mathematics. In the mathematical sciences are found numerous forms of reasoning irreducible to syllogisms.
- G. Peano in [68, page 379]

There are various different incompleteness proofs of Peano Arithmetic (with and without exponentiation). We mention three of them.

The simplest uses a truth set defined by Tarski and shows that every axiomatizable subsystem of N (the complete theory of arithmetic) is incomplete. It is based on the underlying assumption that Peano Arithmetic is correct, implying that every sentence provable in Peano Arithmetic is a true sentence. This proof of Gödel's first theorem, however, cannot be formalized in arithmetic (since the truth set is not expressible in arithmetic).

Gödel's original incompleteness proof involves the much weaker assumption of ω-consistency. We use the following definitions:

Definition of simple consistency: An axiomatic system A is simply consistent := no sentence is both provable and refutable in A.
Definition of ω-inconsistent: An axiomatic system A is ω-inconsistent := there is a predicate F(w) (in one free variable w) such that the sentence (∃w :: F(w)) is provable, but all the sentences F(0), F(1), F(2), ... are refutable. A is ω-consistent if it is not ω-inconsistent.
Definition of ω-incomplete: An axiomatic system A is ω-incomplete := there is a predicate F(w) (in one free variable w) such that all the sentences F(0), F(1), F(2), ... are provable, but the sentence (∀w :: F(w)) is not provable.

Gödel's original proof was based on the assumption of ω-consistency and shows that every axiomatizable ω-consistent system in which all true Σ0 sentences are provable is incomplete. This proof is of course formalizable in Peano Arithmetic (and this is necessary for Gödel's second theorem), and it also shows that any axiomatizable system A that is simply consistent and in which all true Σ0 sentences are provable is ω-incomplete.

The third proof (1936) is due to Rosser and uses the even weaker assumption of simple consistency, but to this end uses a more elaborate sentence than the Gödel sentence 'G is undecidable'. It is based on an axiomatic system by the American mathematician Raphael Robinson (1911-1995), which we refer to as R. It shows that every axiomatizable simply consistent extension of R is incomplete.

The three proofs can be found in [84], but in a particular presentation that does not use the concept of a model for axiomatic systems, and that sometimes attaches different meanings to established definitions; nevertheless it contains, in our opinion, one of the best discussions of Gödel's incompleteness theorems. We intend to include the three proofs in a later version of this document.

In a later version of this document we will also show how, given the proof of incompleteness of Peano Arithmetic, Gödel's theorems apply to Principia Mathematica. We quote K. Gödel on the first page of [27]:

“The most comprehensive formal systems that have been set up hitherto are the system of Principia Mathematica on the one hand and the Zermelo-Fraenkel axiom system of set theory (further developed by J. von Neumann) on the other. These two systems are so comprehensive that in them all methods of proof today used in mathematics are formalized, that is, reduced to a few axioms and rules of inference. One might therefore conjecture that these axioms and rules of inference are sufficient to decide any mathematical question that can at all be formally expressed in these systems. It will be shown that this is not the case, that on the contrary there are in the two systems mentioned relatively simple problems in the theory of integers that cannot be decided on the basis of the axioms.”
8.4 Consequences

I had a lot of conversations with him [Gödel] and a lot of disagreements. Like most others, I was hard to convince about the incompleteness theorem. There was at the time a tendency, which I shared, to think that it was special to a certain type of formalization of logic and that a radical reformalization might have the effect that the Gödel argument did not apply. I persisted in that longer than I should have, and he was always trying to convince me otherwise.
- A. Church in an interview at Princeton University (1985)

In a later version of this document we will discuss the implications of Gödel's theorem and show the reactions that followed the publication of his paper [27] in 1931.
8.5 Neumann-Bernays-Gödel axioms

There is an infinite set A that is not too big.
There's no sense in being precise when you don't even know what you're talking about.
- John von Neumann (sources unknown)

Let us recapture the situation of the axiomatic theory of sets before we introduce the Neumann-Bernays-Gödel theory. When Cantor introduced his set theory, he gave the informal definition (see page 16) of a set as 'any comprehension into a whole M of definite and separate objects m of our intuition or thought'. Cantor's definition was replaced by the principle of comprehension (see page 16), which was adopted by Frege and Russell. Based on this principle a first formal theory of sets, called 'ideal calculus', was developed (not treated in detail here; see for example [36]). The antinomies of Burali-Forti and Russell, however, showed that this theory was inconsistent, and one way to restore consistency was to incorporate in the system a theory of types, as was done by Russell. At the same time, intuitionists tried to do mathematics without Cantor's set theory at all. Others tried to overcome the inconsistencies by making Cantor's set theory more rigidly axiomatic. After Hilbert proposed his proof theory, set theory was given a more rigorous basis, and axiomatic theories for Cantor's sets were proposed; the most successful axiomatization of set theory was presented by Zermelo in 1908.

As we have seen in section 5, Zermelo postulated a domain of abstract objects (sets) and elements of this domain, defined the primitive notions of 'equality' and the 'is element of' relation, and introduced 7 axioms. The problem for him was to axiomatize set theory in such a way that all contradictions are excluded, while the system is still sufficiently wide for all that is valuable in the theory to be preserved. The comprehension axiom was replaced by the weaker separation axiom, which only allows new sets to be created from existing sets and with definite predicates. Before we describe why the Hungarian mathematician von Neumann opposed this solution and came up with his own solution to the paradoxes, we will look at this separation axiom in more detail.
Zermelo defined the separation axiom as follows:

Separation axiom: (∀z ∃y ∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), where ϕ is definite and does not contain y.
For every set z there exists a set y whose elements are exactly those of z having the property ϕ.

The concept of definiteness in this axiom was defined by Zermelo as follows: “A question or assertion ϕ, the validity or invalidity of which is decided without arbitrariness by the basic laws of logic, is said to be ‘definite’.” We have already seen on page 93 that this axiom excludes the paradoxes of Russell and Burali-Forti and, as Kneebone remarks in [49, page 263], also the semantic paradoxes. We quote: “since a definite property is one that is decidable by the basic relations of the domain B [of sets, the abstract objects postulated by Zermelo], no such property as that of being definable in a finite number of words can be used in the definition of a set, and the semantic paradoxes are thus also excluded”.

In [83], the Norwegian mathematician Skolem pointed out that the definition of 'definiteness' was rather vague, and he made the formulation 'by the basic laws of logic' precise. Fraenkel used Skolem's idea to formulate the separation axiom in a new way (for details, see [49, pages 290-291]). In 1922 Fraenkel proposed the introduction of another axiom that allows the existence of larger cardinal numbers than hitherto possible.

Von Neumann's subsequent interest in set theory led to the second major axiomatization of set theory in the 1920s. His formulation differed considerably from that of Zermelo and Fraenkel (see section 5.3), because the notion of function, rather than that of set, was taken as primitive. The foundation axiom of von Neumann makes the occurrence of so-called extraordinary sets impossible. A set V1 is extraordinary if there is an infinite sequence of sets V1, V2, V3, ... such that V2 ∈ V1, V3 ∈ V2, etc. In a series of papers beginning in 1937, the Swiss logician Paul Bernays, a collaborator of the formalist David Hilbert, modified the von Neumann approach in a way that put it in much closer contact with Zermelo and Fraenkel. In 1940, the Czech-born Kurt Gödel, known for his incompleteness proof (see chapter 8), further simplified the theory. This version is known as the Neumann-Bernays-Gödel (NBG) axioms.
Before we give the axioms, we note that it is convenient to adopt the undefined notions of class and the membership relation (though, as is also true in Zermelo and Fraenkel, ∈ suffices). In the axioms we distinguish between the use of capital Latin letters and lowercase Latin letters for the variables. The capital letters stand for variables that take classes (the totalities corresponding to certain properties) as values. The lowercase letters are used as special restricted variables for sets: for example, 'for all x, A(x)' stands for 'for all X, if X is a set, then A(X)', i.e. the condition holds for all sets. A class is defined to be a set if it is a member of some class; those classes that are not sets are called proper classes. Intuitively, sets are intended to be those classes that are adequate for mathematics, and proper classes are thought of as those collections that are 'so big' that, if they were permitted to be sets, contradictions would follow.

Theorem: With the Neumann-Bernays-Gödel axioms, the classical paradoxes are avoided.
This can be proven by showing in each case that the collection on which the paradox is based is a proper class, i.e. is not a set. For example, the derivation of Russell's paradox does not apply:
Proof: We show that R := {x | x is a set ∧ x ∉ x} is a class, but not a set. For all y we have that y ∈ R ↔ y is a set ∧ y ∉ y. We prove by contradiction that R is not a set. Suppose R is a set. Then (taking R for y in the above statement) R ∈ R ↔ R is a set ∧ R ∉ R, i.e. R ∈ R ↔ R ∉ R. Suppose R ∈ R; then R ∉ R: contradiction. So we must have R ∉ R. But then, since R is a set, R is a set ∧ R ∉ R, and thus R ∈ R: contradiction. Since in both cases (R ∈ R and R ∉ R) we get a contradiction, our assumption that R is a set must be wrong.
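The core of this argument does not depend on NBG at all: no object of any domain can have as its members exactly those objects that are not members of themselves. A minimal Lean 4 sketch, assuming only an arbitrary membership relation `mem` on a type `α` (both names are hypothetical; read `mem y r` as y ∈ r, and α as the domain of sets):

```lean
-- `mem y r` is read as "y ∈ r"; α stands for the domain of sets.
-- No r can satisfy: y ∈ r ↔ y ∉ y for every y.
theorem no_russell_set {α : Type} (mem : α → α → Prop) :
    ¬ ∃ r : α, ∀ y : α, mem y r ↔ ¬ mem y y :=
  fun ⟨r, hr⟩ =>
    -- taking r for y gives r ∈ r ↔ r ∉ r; first, r ∈ r is refuted
    have notIn : ¬ mem r r := fun hin => (hr r).mp hin hin
    -- but then r ∉ r yields r ∈ r, a contradiction
    notIn ((hr r).mpr notIn)
```

Read with α as the sets of NBG, the theorem says that no set has exactly the non-self-membered sets as its elements; in NBG the collection R still exists, but only as a proper class. Unlike the case split in the text, this version needs no law of excluded middle.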
The Neumann-Bernays-Gödel axioms (NBG):
1 Extensionality axiom (or axiom of determination): (∀X, Y, z :: (z ∈ X ↔ z ∈ Y) → X = Y)
Classes are uniquely determined by their members; to be exact: if every element (that is, a set) of a class X is at the same time an element of Y, and conversely, then X = Y.
2 Axiom of the empty set: (∃x ∀y :: y ∉ x)
There is an (improper, see also the footnote on page 93) set, the 'null' or 'empty' set, which contains no elements at all.
3 Axiom for class formation: (∃Y ∀x :: x ∈ Y ↔ ϕ(x)), where ϕ is a proposition in which only set variables are bound by (existential and universal) quantifiers.
For every such property ϕ there exists a class Y whose elements are exactly the sets having the property ϕ.
4 Pairing axiom: (∀a, b :: (∃y ∀x :: x ∈ y ↔ x = a ∨ x = b))
Given two sets a and b there exists a set whose elements are exactly a and b.
5 Sum-set axiom or Union axiom: (∀z ∃y ∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w))
For every set z there exists a set y whose elements are exactly those objects occurring in at least one element of z.
6 Power set axiom: (∀z ∃y ∀x :: x ∈ y ↔ x ⊆ z)
For every set z there is a set y whose elements are exactly the subsets of z.
7 Axiom of infinity: (∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z))
There exists a successor set.
8 Axiom of choice: (∀x :: (∃f : f is a function : Dom(f) = x − {∅} ∧ (∀a : a ∈ Dom(f) : f(a) ∈ a)))
Every set x has a choice function.
9 Axiom of replacement or axiom of substitution (by Fraenkel): (∀x ∃!y :: ϕ(x, y)) → (∀a :: (∃b ∀y :: y ∈ b ↔ (∃x : x ∈ a : ϕ(x, y)))), where ϕ is a class.
The image of a set under an operation (functional property) is again a set.
10 Axiom of restriction: X ≠ ∅ → (∃y : y ∈ X : X ∩ y = ∅)
Every nonempty class is disjoint from one of its elements.
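Several of the set-forming axioms above can be tried out on finite sets. The following Python sketch is only an illustration under a loud assumption: hereditarily finite sets are modelled by Python `frozenset` values, built up from the empty frozenset. It covers pairing, union, power set and separation on finite instances, and checks the axiom of restriction on a given set; classes (axiom 3) and the axiom of infinity lie outside what such a finite model can show. The helper names are hypothetical.

```python
from itertools import combinations

EMPTY = frozenset()  # the empty set ∅


def pair(a, b):
    """Pairing axiom: the set whose elements are exactly a and b."""
    return frozenset({a, b})


def union(z):
    """Union axiom: the set of all objects occurring in some element of z."""
    return frozenset(x for w in z for x in w)


def power_set(z):
    """Power set axiom: the set whose elements are exactly the subsets of z."""
    elems = list(z)
    return frozenset(frozenset(c)
                     for r in range(len(elems) + 1)
                     for c in combinations(elems, r))


def separation(z, phi):
    """Separation: the elements of z having the property phi."""
    return frozenset(x for x in z if phi(x))


def satisfies_restriction(X):
    """Axiom of restriction for a nonempty X: some y in X has X ∩ y = ∅."""
    return any(not (X & y) for y in X)


one = pair(EMPTY, EMPTY)  # {∅}
two = pair(EMPTY, one)    # {∅, {∅}}
print(sorted(len(s) for s in power_set(two)))  # [0, 1, 1, 2]
```

Because a frozenset can only be built from frozensets that already exist, membership between them is well-founded: extraordinary sets in the sense of von Neumann cannot be constructed, so `satisfies_restriction` holds for every nonempty frozenset of frozensets.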
The axioms 1, 3, 9 and 10 are different from ZF. The third axiom (scheme) is presented in a form that facilitates a comparison with the third axiom (scheme) of ZF. In a detailed development of NBG there appears instead a list of seven axioms (not schemes) stating that for each of certain conditions there exists a corresponding class of all those sets satisfying the condition. From this finite set of axioms, each instance of the above scheme can be obtained as a theorem. When obtained in this way, the third axiom scheme of NBG is called the class existence theorem. Thus, with the comments above about the third axiom in mind, it follows that NBG has only a finite number of axioms. In contrast to the ninth axiom scheme of ZF (see section 5.2), the ninth axiom of NBG is not an axiom scheme but a single axiom. ZF, on the other hand, has infinitely many axioms, since the ninth axiom scheme of ZF provides an axiom for each formula. The finiteness of the axioms of NBG makes the logical study of the system simpler.

The relationship between the theories may be summarized by the statement that ZF is essentially the part of NBG that refers only to sets. We give the following theorems without proof:
Theorem: Every theorem of ZF is a theorem of NBG.
Theorem: Any theorem of NBG that speaks only about sets is a theorem of ZF.
Theorem: ZF is consistent if and only if NBG is consistent.
Note that the fact that NBG avoids the classical paradoxes, and that there is no apparent way to derive any one of them in ZF, does not settle the question of the consistency of either theory. All we know from the last theorem is that either both axiom systems are consistent, or both are inconsistent.
Chapter 9
Church and Turing

9.1 Turing and Turing Machine

We may hope that machines will eventually compete with men in all purely intellectual fields.
- Alan Turing in [38, page 46]

Alan Mathison Turing (1912-1954) was an English mathematician and logician who pioneered the field of computer theory and who contributed important logical analyses of computer processes. Turing studied in Cambridge, worked there on probability theory and (independently of de Moivre) discovered the central limit theorem. In 1936 he won the Smith's Prize.

As we have seen in the previous chapters, many mathematicians had attempted to eliminate all possible error from mathematics by establishing a formal, or purely algorithmic, procedure for establishing truth (the so-called formalist program). With his incompleteness theorem (see section 8.1), Kurt Gödel threw up an obstacle to this effort, for he showed that any useful mathematical axiom system is incomplete in the sense that there must exist propositions whose truth can never be decided within the system. Turing was motivated by Gödel's work to seek an algorithmic method of determining whether any given proposition was undecidable, with the ultimate goal of eliminating such propositions from mathematics. Instead, he proved in his seminal paper 'On Computable Numbers, with an Application to the Entscheidungsproblem' (reprinted in [19]) that there cannot exist any such universal method of determination. We now consider this decision problem, or Entscheidungsproblem, in more detail.

Decidability was one of Hilbert's requirements for an axiomatic system (see section 6.1). The problem of decidability asks whether, given a mathematical proposition, one can find an algorithm which decides whether the proposition is true or false. When given an algorithm, it is easy to see that it can prove certain propositions. But it is more difficult to prove that there is no algorithm that can decide certain propositions. To this end Turing introduced a hypothetical computing device (later called the Turing machine). He originally conceived the machine as a mathematical tool that could infallibly recognize undecidable propositions, i.e. those mathematical statements that, within a given formal axiomatic system (that includes at least arithmetic), can be neither proved nor disproved. Gödel had demonstrated that such propositions exist in any such system. Turing instead proved that there can never exist any universal algorithmic method for determining whether a proposition is undecidable. The Turing Machine and the proof of undecidability are given later in this section.

After this important publication Turing completed his Ph.D. in 1938 on systems of logic based on ordinals, under the direction of Alonzo Church (see section 9.2). During the war Turing worked on breaking the German Enigma codes, and in 1948 he worked in Manchester on the construction of a new digital computer. His efforts in the construction of early computers and the development of early programming techniques were of prime importance. He also championed the theory that computers could eventually be constructed that would be capable of human thought, and he proposed a simple test, now known as the Turing test, to assess this capability. Turing's papers on the subject are widely acknowledged as the foundation of research in artificial intelligence. In 1952 Turing published the first part of his theoretical study of morphogenesis, the development of pattern and form in living organisms.

The Turing Machine

Turing introduced his hypothetical computing device in 1936. The question of decidability had been left open by Gödel, since the incompleteness theorem (see section 8.1) only stated that consistency and completeness cannot be attained at the same time, i.e. that there are statements (in consistent systems) about numbers, indubitably true, which cannot be proved from finitely many rules. The decidability of mathematical statements was not settled by Gödel's theorem, because it needs a formal definition of (algorithmic) method in the formulation of the problem (or a definition of the notion of algorithm in the definition of decidability in section 6.1). To this end Turing introduced a machine that was later to be called the Turing machine: an idealized mathematical model that reduces the logical structure of any computing device to its essentials. He described a modern computer before technology had reached the point where its construction was a realistic possibility. By extrapolating the essential features of information processing, Turing was instrumental in the development of the modern digital computer. His model served as a basis for all subsequent digital computers, which share his basic scheme of an input/output device (tape and head), memory (tape) and central processing unit (head and transition function).

Nowadays there are many models of computing devices available in the theory of computation (complexity). We will not cover restricted models such as finite automata and pushdown automata (and corresponding notions such as regular languages and context-free grammars). We directly introduce the much more powerful model of Turing that we need to investigate all mathematical problems.

The Turing Machine model uses an infinite tape as its unlimited memory, and has a tape head that can read and write symbols (of a set Γ) and move around on the tape (to the L(eft) or R(ight)). We here assume the tape is right-infinite: the tape continues infinitely to the right, but it has a leftmost position. Initially the tape contains an input string of symbols from an input alphabet Σ and is blank (i.e. filled with a special blank symbol ␣) everywhere else. The Turing Machine is in a state q of a set of states Q, and starts in an initial state q0. It uses a transition function δ that determines how it gets from one configuration (that is, the current state, the tape contents and the head location) to the next; the transition depends on the current state and the current symbol on the tape. A transition consists of writing a new symbol of the tape alphabet Γ to the tape and moving the tape head either Left or Right. This computation (i.e. sequence of transitions) continues until the Turing Machine enters either the (final) state q_accept or the (final) state q_reject. We can define a Turing Machine (sometimes called deterministic, since each transition is determined uniquely by the configuration) formally as a septuple:
Definition of a Turing Machine (TM): A Turing Machine (TM) := (Q, Σ, Γ, δ, q0, q_accept, q_reject) with:
1 Q is a finite set of states.
2 Σ is a finite input alphabet not containing the special blank symbol ␣.
3 Γ is a finite tape alphabet, where ␣ ∈ Γ and Σ ⊆ Γ.
4 δ is the (finite) transition function δ : Q × Γ → Q × Γ × {L, R}.
5 q0 is the start state, where q0 ∈ Q.
6 q_accept is the accept state, where q_accept ∈ Q.
7 q_reject is the reject state, where q_reject ∈ Q and q_reject ≠ q_accept.

A start configuration on input w is a configuration with state q0 in which the head is on the leftmost position of the tape and the tape contains just w. We call a configuration accepting if its state is q_accept, rejecting if its state is q_reject, and halting if its state is either q_accept or q_reject.

After defining the Turing Machine, Turing made his famous proposal (known as Turing's thesis, see also section 9.3) for the concept of 'computability by a Turing machine'. The proposal says that whenever there is an effective method for obtaining the values of a mathematical function (i.e. the function is intuitively or effectively computable), the function can be computed by a Turing Machine. The converse claim is trivial. We quote one of Turing's formulations from [90]:

Turing's Thesis: LCM's [Logical Computing Machines, Turing's expression for Turing Machines] can do anything that could be described as “rule of thumb” or “purely mechanical”.

If the thesis is correct, we can reduce problems of the (non)existence of effective methods to problems of the (non)existence of Turing Machines. We now introduce more of Turing's theory of Turing Machines before we give his proof of undecidability.
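The septuple can be turned directly into a small simulator, which makes the notions of configuration and transition concrete. The following Python sketch is one possible encoding, not Turing's own: δ is a dictionary from (state, symbol) to (new state, written symbol, move), the blank is the character ␣, and two conventions are assumptions of this sketch: a move L on the leftmost cell leaves the head where it is, and a missing entry of δ is treated as a move to q_reject. The example machine `HAS_ONE` is hypothetical.

```python
from typing import Dict, Optional, Tuple

BLANK = "␣"  # the special blank symbol
Delta = Dict[Tuple[str, str], Tuple[str, str, str]]


def run_tm(delta: Delta, q0: str, q_accept: str, q_reject: str,
           w: str, max_steps: int = 10_000) -> Tuple[Optional[str], str]:
    """Simulate a deterministic TM (Q, Σ, Γ, δ, q0, q_accept, q_reject).

    The tape is right-infinite. Returns (halting state, tape without
    trailing blanks), or (None, tape) if the machine has not halted
    after max_steps transitions.
    """
    tape = list(w) or [BLANK]
    head, state = 0, q0
    for _ in range(max_steps):
        if state in (q_accept, q_reject):
            break
        if (state, tape[head]) not in delta:
            state = q_reject  # convention of this sketch for missing entries
            break
        # one transition: new state, symbol written under the head, move
        state, tape[head], move = delta[(state, tape[head])]
        if move == "R":
            head += 1
            if head == len(tape):
                tape.append(BLANK)  # extend the blank part of the tape lazily
        else:
            head = max(head - 1, 0)  # the head cannot leave the leftmost cell
    halted = state if state in (q_accept, q_reject) else None
    return halted, "".join(tape).rstrip(BLANK)


# Hypothetical example: accept exactly the strings over {0, 1}
# that contain at least one '1'.
HAS_ONE: Delta = {
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("qa", "1", "R"),
    ("q0", BLANK): ("qr", BLANK, "R"),
}
print(run_tm(HAS_ONE, "q0", "qa", "qr", "001"))  # ('qa', '001')
```

The step bound is only a practical safeguard of the simulation; the Turing Machine itself has no such bound, which is exactly why non-halting computations matter in the definitions that follow.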
We define a language to be a set of strings, a string being a finite series of alphabet symbols (i.e. w ∈ Σ* for every string w). We say that a TM M accepts input string w if a sequence of configurations C1, ..., Ck exists where
1 C1 is the start configuration of M on input w,
2 each Ci yields Ci+1 via the transition function δ of M, and
3 Ck is an accepting configuration.
The set of strings that M accepts is called the language of M.

Definition of the language of a TM: The language of a TM M, notation L(M) := {w | w is a string that M accepts}.

We now define a notion that covers the ability of a TM to end in the accept state when started with any string of a certain language. We distinguish between recognizing and deciding capabilities.

Definition of Turing-recognizable: A language L is Turing-recognizable := there exists a TM M such that for all strings w:
1 with input w, M halts in q_accept if w ∈ L, and
2 with input w, M halts in q_reject or does not halt (loops) if w ∉ L.
We then say that L is recognized by M, and that M is an acceptor for L.

Definition of Turing-decidable (or decidable): A language L is Turing-decidable := there exists a deterministic TM M such that for all strings w:
1 with input w, M halts in q_accept if w ∈ L, and
2 with input w, M halts in q_reject if w ∉ L.
We then say that L is decided by M, and that M is a decider for L.

There are several variants of Turing Machines, such as double-sided infinite Turing Machines, multi-tape Turing Machines, nondeterministic Turing Machines and certain types of so-called enumerators. Most variants are equivalent in the sense that they can recognize the same set of languages (but not necessarily equally efficiently).
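The difference between an acceptor and a decider shows up when a machine is run for a bounded number of steps: acceptance can be confirmed after finitely many steps, but for an acceptor that loops on w ∉ L no finite run ever confirms non-membership. A minimal Python sketch, assuming the same dictionary encoding of δ as in a simulator (right-infinite tape, blank ␣); the two machines `DECIDER` and `RECOGNIZER` are hypothetical examples for the language of strings over {0, 1} containing at least one '1'.

```python
BLANK = "␣"


def run_bounded(delta, w, max_steps, q0="q0", q_accept="qa", q_reject="qr"):
    """Run a TM for at most max_steps transitions on a right-infinite tape.

    Returns "accept" or "reject" if the machine halted, and None if it
    is still running after max_steps transitions.
    """
    tape, head, state = list(w) or [BLANK], 0, q0
    for _ in range(max_steps):
        if state in (q_accept, q_reject):
            break
        state, tape[head], move = delta[(state, tape[head])]
        head = head + 1 if move == "R" else max(head - 1, 0)
        if head == len(tape):
            tape.append(BLANK)
    return {q_accept: "accept", q_reject: "reject"}.get(state)


# Decider: rejects as soon as it reaches the blank after the input.
DECIDER = {
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("qa", "1", "R"),
    ("q0", BLANK): ("qr", BLANK, "R"),
}

# Acceptor only: on strings without a '1' it keeps moving right forever.
RECOGNIZER = {
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("qa", "1", "R"),
    ("q0", BLANK): ("q0", BLANK, "R"),
}

print(run_bounded(RECOGNIZER, "000", 1000))  # None
```

Both machines recognize the same language, so L(DECIDER) = L(RECOGNIZER); only the first one is a decider for it.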
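Example: We now give an example of a Turing Machine solving a mathematical problem by first defining it as a language problem. The problem (idea from [56]) is to design a Turing Machine that computes the function
f(x, y) = x + y if x ≥ y
f(x, y) = 0 if x < y
For simplicity, we assume x and y to be positive integers. We first describe how the sum of x and y can be calculated, then how the comparison x ≥ y can be made, and finally how to combine those two machines into a Turing Machine that computes the desired function.

First we have to choose a convention for representing positive integers, and decide what the initial situation of the tape is. We choose a unary notation in which any positive integer x is represented by w(x) ∈ {1}+, such that |w(x)| = x. We assume that w(x) and w(y) are on the tape in unary notation, separated by a single '0', with the read-write head on the leftmost symbol of w(x). In this example we also assume that the cell to the left of w(x) is blank, as in the double-sided infinite variant mentioned above, since the machines below move left until they find a blank.

Calculating the sum
To add the two numbers a and b, we only have to remove the separating 0, so addition amounts to the concatenation of two strings. The following Turing Machine, called Adder, adds a and b and is constructed relatively simply:
Adder = (Q, Σ, Γ, δ, q0, qA, qR), with
Q = {q0, q1, q2, q3, q4}
Σ = {0, 1}
Γ = {0, 1, ␣}
start state q0, accept state qA = q4, and no reject state (qR = {}), since Adder never rejects
δ(q0, 1) = (q0, 1, R)
δ(q0, 0) = (q1, 1, R)
δ(q1, 1) = (q1, 1, R)
δ(q1, ␣) = (q2, ␣, L)
δ(q2, 1) = (q3, ␣, L)
δ(q3, 1) = (q3, 1, L)
δ(q3, ␣) = (q4, ␣, R)
Note that we remove the '0' by temporarily creating an extra '1', a fact that is remembered by putting the machine into state q1. The transition δ(q2, 1) = (q3, ␣, L) is needed to remove this '1' at the end of the computation. Finally, we move the read-write head back to the leftmost '1'. This is not strictly necessary in this example, because the machine is designed such that it will terminate right after any addition, but it is not harmful, and it is normally a good habit to let any action terminate in a state from which it is easy to take further transitions.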
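The transition table of Adder can be checked by running it. The Python sketch below simulates it on a double-sided infinite tape (stored as a dictionary from cell index to symbol), which is an assumption made because the final leftward sweep in state q3 has to find a blank to the left of w(x); on a strictly right-infinite tape whose head stays put at the left end, q3 would never see that blank. The helper names `run_two_way`, `unary` and `add` are hypothetical.

```python
BLANK = "␣"

# Transition table of Adder as listed above (q4 is the accepting state).
ADDER = {
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "0"): ("q1", "1", "R"),
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", BLANK): ("q2", BLANK, "L"),
    ("q2", "1"): ("q3", BLANK, "L"),
    ("q3", "1"): ("q3", "1", "L"),
    ("q3", BLANK): ("q4", BLANK, "R"),
}


def run_two_way(delta, w, q0="q0", halting=("q4",), max_steps=10_000):
    """Simulate a TM on a double-sided infinite tape stored as a dict.

    Returns (final state, head position, non-blank tape contents).
    """
    tape = dict(enumerate(w))
    head, state = 0, q0
    for _ in range(max_steps):
        if state in halting:
            break
        state, tape[head], move = delta[(state, tape.get(head, BLANK))]
        head += 1 if move == "R" else -1
    cells = [tape[i] for i in sorted(tape) if tape[i] != BLANK]
    return state, head, "".join(cells)


def unary(n):
    """w(n): the unary representation of the positive integer n."""
    return "1" * n


def add(x, y):
    """Run Adder on w(x)0w(y) and read the unary result back as an integer."""
    state, _, result = run_two_way(ADDER, unary(x) + "0" + unary(y))
    assert state == "q4" and set(result) <= {"1"}
    return len(result)


print(add(3, 2))  # 5
```

For the input 111011 the machine ends in q4 with the head on cell 0, i.e. on the leftmost '1' of the result 11111, as described in the text.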
) = (q4 . qR ). We use this to get into the state q5 . R) is needed to remove this ‘1’ at the end of the computation. 0. In the second case. Finally. R) δ(q1 . L) δ(q3 . we again assume they are written in the notation that we used before and divided by a single ‘0’. 0) = (q1 . 1. The matching will stop when one of the two sequences of ‘1’s is completely checked oﬀ. qA . we still ﬁnd a ‘1’ on the right when all ‘1’s on the left have been replaced. We will construct a Turing Machine that halts in an accepting state if a ≥ b and in a rejecting state if a < b. Thereto we can match each ‘1’ on the left of the dividing ‘0’ with a ‘1’ on the right. Comparison To compare two numbers a and b. 1) = (q3 . which can be used as a signal to enter the accepting state. if a ≥ b. . 1. 1) = (q0 . 1) = (q1 . R) 147 Note that we remove the ‘0’ by temporarily creating an extra ‘1’. q0 . and if x ≥ y either the left sequence contains ‘1’s or neither sequence contains ‘1’s. L) δ(q2 . If x < y then the right sequences will still contain ‘1’s. L) δ(q3 . The transition δ(q2 . . If we work this out in detail. In the ﬁrst case. Σ. we encounter a blank at the right of the working space. we move the readwrite head back to the leftmost ‘1’. R) δ(q1 . 1. This is not strictly necessary in this example.9. but it is not harmful and normally a good habit to let any action terminate in a state from which it is easy to take further transitions. We can do this by starting at the leftmost ‘1’ (of the number a) and interchangeably check oﬀ the leftmost symbols of the numbers a and b by replacing them with the symbols ‘x’ and ‘y’ respectively. TURING AND TURING MACHINE δ(q0 . because the machine is designed such that it will terminate right after any addition. with: .1. ) = (q2 . Γ. a fact that is remembered by putting the machine into state q1 . δ. we get the following Turing Machine called Comparer := (Q. 1) = (q3 . 0. when we attempt to match another ‘1’.
Q = {q0, q1, . . . , q7}
Σ = {0, 1}
Γ = {0, 1, x, y, □} (□ is the blank symbol)
start state q0, accepting state qA = q6, rejecting state qR = q7
The transitions of δ can be grouped in several parts. The first group is:
δ(q0, 1) = (q1, x, R)
δ(q1, 1) = (q1, 1, R)
δ(q1, 0) = (q2, 0, R)
δ(q2, y) = (q2, y, R)
δ(q2, 1) = (q3, y, L)
This set replaces the leftmost '1' of a with 'x', then causes the read-write head to travel right to the first '1' of b and replace it with the symbol 'y'. When the dividing '0' is passed, the machine enters state q2, indicating that it is now dealing with the number b. When the symbol 'y' has been written, the machine enters state q3, indicating that one '1' of b has been successfully paired with a '1' of a. The next group of transitions reverses the direction, repositions the read-write head over the leftmost remaining '1' of a, and returns control to the initial state:
δ(q3, y) = (q3, y, L)
δ(q3, 0) = (q4, 0, L)
δ(q4, 1) = (q4, 1, L)
δ(q4, x) = (q0, x, R)
The rewriting continues this way when the input is a string 1^a 0 1^b, stopping only when on one side no more '1's can be replaced. In that case either the left side does not contain any more '1's (a ≤ b), or the right side has run out of '1's (a > b). In the first case, the transition δ(q4, x) = (q0, x, R) leaves the read-write head on the '0' instead of a '1'; in the second case, the machine reads a blank □ in state q2:
δ(q0, 0) = (q5, x, L) (a ≤ b)
δ(q2, □) = (q6, □, L) (a > b)
In the first case we still have to check whether the right side has any '1's left, to determine whether a = b. This is done in state q5:
δ(q5, x) = (q5, x, R)
δ(q5, 0) = (q5, 0, R)
δ(q5, y) = (q5, y, R)
δ(q5, 1) = (q7, y, R) (a < b)
δ(q5, □) = (q6, □, L) (a = b)
Combining Turing Machines for complicated tasks
We now have to put together the Turing Machines Adder and Comparer to obtain the desired Turing Machine that computes the given function. We can do this by starting with the input a and b in the previously described notation and starting position, and using Comparer to determine whether or not a ≥ b. We index all states of Comparer with a C; for example, the transition δ(q2, □) = (q6, □, L) becomes δ(qC,2, □) = (qC,6, □, L). In the first case (a ≥ b), the Comparer should send a 'start signal' to the Adder, to give a + b as output. In the second case (a < b), the Comparer should send a 'start signal' to a Turing Machine (called Eraser) that simply replaces all '1's by '0's to output the value 0 in the desired format. We show how we can let the Comparer send these 'start signals'. We first index all states of the Adder by A and of the Eraser by E. Now in case of a ≥ b, Comparer ends in its accepting state qC,6, and we can add a transition δ(qC,6, ∗) = δ(qA,0, ∗). The star '∗' stands for any possible symbol, so this transition is actually a shorthand notation for a set of transitions. Similarly, we can let
δ(qC,7, ∗) = δ(qE,0, ∗) bring the Eraser into its initial state. The Adder or the Eraser, respectively, will then give the desired output, because their behavior on the input does not change as a result of renaming (indexing) their states (to be exact: the state in which Comparer terminates is suitable as an initial position for Adder or Eraser). The only thing we have not taken care of is that when Comparer enters a final state, the tape no longer holds the initial representation of the numbers a and b, since Comparer has replaced the '1's by 'x's and 'y's. We can easily fix this (it is just some extra work; you can try it as an exercise if you want) by letting Comparer, as its last action before entering a final state, replace all 'x's and 'y's by '1's. The result is a Turing Machine that combines Comparer, Adder and Eraser to compute the function f. In a similar way we can, for example, multiply two numbers a and b, and we can also translate macro-instructions like 'if p then qj else qk' (meaning that when we read 'p' on tape, the Turing Machine goes into state qj, and otherwise into state qk), and even combine them into complicated subprograms that can be invoked repeatedly whenever needed.
(End of Example)
The Entscheidungsproblem
After introducing the notion of a TM in [89], Turing answered Hilbert's decision problem for mathematical logic (in German called 'Entscheidungsproblem') in the negative. The Entscheidungsproblem asks whether there exists a definite method or algorithm which (at least in principle) can be applied to any given mathematical proposition to decide whether that proposition is provable. We now identify the notion of an algorithm with the notion of a Turing Machine, and the set of provable propositions with the set of languages that can be decided by some TM. If we look at the definition of decidability in section 6.1, we have that for all formulas ϕ an algorithm, i.e. a TM, exists that decides whether ϕ is true or not.
If we code ϕ by means of a language, and this is always possible (see the previous example for a demonstration), we can reformulate the problem as: is there, for every such language L, a TM M that decides L? We now show that this is not possible for all problems (i.e. languages) by giving a specific problem, the Halting problem, that is not decidable. The Halting problem is the problem of testing whether a TM accepts a given input string. We define the problem by stating it as a language problem, and asking whether that language is decidable.
Definition of the Halting problem: H := {⟨M, w⟩ | M is a TM, w is a string and M accepts w}. Is H decidable? (That is: is there a TM that decides, for every string, whether it belongs to H or not? Using Turing's thesis (see section 9.3), this is the question whether there is an algorithm that decides this problem.)
Theorem: H is recognizable.
Proof (by Turing): The following TM U, also called the Universal Turing Machine because it is capable of simulating any other Turing Machine, recognizes H. We define U informally, because a detailed definition of the septuple such a TM consists of (see the definition of a TM) is a lot of work.
Description of the Universal Turing Machine:
U = "On input ⟨M, w⟩, where M is a TM and w is a string:
1 simulate M on input w
2 if M ever enters its accept state, accept"
Note that this TM loops on input ⟨M, w⟩ if M loops on w, which is why this machine does not decide H. If the algorithm had some way to determine that M was not halting on w, it could reject. Hence H is sometimes called the Halting problem. As Turing demonstrated, an algorithm has no way to make this determination.
Theorem: H is undecidable (see also [82, page 165]).
Proof (by Turing): We assume H is decidable and obtain a contradiction. Suppose D is a decider for H, defined by
D(⟨M, w⟩) := "• accept if M accepts w
• reject if M does not accept w"
Now we construct a new TM O with D as a subroutine. This new TM calls D to determine what M does when the input to M is its own description ⟨M⟩. Once O has determined this information, it does the opposite: it rejects if M accepts and accepts if M does not accept. The following is a description of O:
O = "On input ⟨M⟩, where M is a TM:
1 run D on input ⟨M, ⟨M⟩⟩
2 output the opposite of what D outputs; that is, if D accepts, reject, and if D rejects, accept"
We summarize the behavior of O as follows:
O(⟨M⟩) = "• accept if M does not accept ⟨M⟩
• reject if M accepts ⟨M⟩"
Now we obtain the contradiction by running O with its own description ⟨O⟩ as input. In that case we get:
O(⟨O⟩) = "• accept if O does not accept ⟨O⟩
• reject if O accepts ⟨O⟩"
Since D always halts, O always halts and either accepts or rejects ⟨O⟩, and both outcomes contradict the description above. Thus neither O nor D can exist.
Turing wrote in his last publication about the interpretation of unsolvable problems, such as the Halting problem for Turing machines:
These . . . may be regarded as going some way towards a demonstration, within mathematics itself, of the inadequacy of 'reason' unsupported by common sense.
- Alan Turing
In this section I have made extensive use of [38] [92] for information on the life and work of Turing and [89] [82] [19] for the theory of TM's and the Halting problem. Another valuable source of information on Turing's life and work is the website http://www.turing.org.uk/.
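The diagonal construction in the proof that H is undecidable can be mimicked in Python. This is an analogy under stated assumptions, not a Turing Machine construction: a "program" is modelled as a Python function of one argument (another program) that returns True for accept and False for reject, a candidate decider is any function D(program, argument), and make_o and refuted_by_diagonal are hypothetical helper names. Whatever halting candidate is supplied, it answers wrongly about O run on itself.

```python
from typing import Any, Callable

# A candidate decider D takes (program, argument) and claims to predict
# whether program(argument) accepts (True) or rejects (False).
Decider = Callable[[Any, Any], bool]


def make_o(decider: Decider) -> Callable[[Any], bool]:
    """Build O from D: on input M, run D on <M, <M>> and output the opposite."""
    def o(m: Any) -> bool:
        return not decider(m, m)
    return o


def refuted_by_diagonal(decider: Decider) -> bool:
    """True if D answers wrongly about O run on its own description."""
    o = make_o(decider)
    predicted = decider(o, o)  # D's claim about O(<O>)
    actual = o(o)              # what O(<O>) really returns
    return predicted != actual


# Some (necessarily flawed) candidate deciders.
def always_accept(m: Any, w: Any) -> bool:
    return True


def always_reject(m: Any, w: Any) -> bool:
    return False


def by_name_length(m: Any, w: Any) -> bool:
    return len(getattr(m, "__name__", "")) % 2 == 0


for candidate in (always_accept, always_reject, by_name_length):
    print(candidate.__name__, refuted_by_diagonal(candidate))  # each prints True
```

Because actual is by construction the negation of predicted, the check succeeds for every candidate that halts. A candidate that tries to answer by simply running its argument, such as lambda m, w: m(w), never returns on O: the calls recurse until Python raises RecursionError, which mirrors the recognizer U looping instead of deciding.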
9.2 Church and the Lambda Calculus
Alonzo Church (1903-1995) was an American mathematician whose work is of major importance in mathematical logic, recursion theory and theoretical computer science. One of his most important contributions to logic is his invention of the lambda calculus in the 1930s. He is also remembered for Church's thesis, published in 1936 in [14, pages 345-363], stating that the lambda calculus can be used to embody a correct formalization of the notion of computability (see section 9.3). The notion of lambda definability is conceptually the basis for the discipline of functional programming, and the lambda calculus is also the basis for type theory. Church was also one of the founders of the Journal of Symbolic Logic in 1936. He had 31 doctoral students, including famous mathematicians such as Turing, Kleene, Kemeny and Smullyan. We now introduce the lambda calculus (Church's formalization of the notion of effective calculability) in a modern setting, using [9, chapter 4].
Application and abstraction
First we introduce the basic concepts of λ-calculus. A formalization follows thereafter. The lambda calculus has only two basic operations, abstraction and application.
• Abstraction is for constructing functions: for an expression E we introduce λx.E to denote the abstraction of E over x, i.e. 'the function of x which computes E'. Example¹: λx . x + 1, λn . n × n, etc. We will later see how to define a recursive function; this is not so easy since we do not have function names.
• (Function) application: the expression F A denotes that F is considered as a function (an algorithm) applied to input A. The original lambda calculus theory is type-free, so we also consider F F, that is, F applied to itself. Example: (λx . x + 1) 4, (λn . n × n) 7, etc.
¹ Note that in some examples we have simplified the notation for the clarity of the example: in pure lambda calculus we do not have arithmetic symbols like + and ×, but these operations can be encoded in the pure lambda calculus.
These two notions become very powerful if we introduce the rule of beta reduction, which allows us to apply an abstraction to an expression and, for example, rewrite (λx . x + 1) 4 to 4 + 1. Similarly (λn . n × n) 7 can be reduced to 7 × 7. Arbitrary nesting is also allowed: ((λn . λx . (x + 1) × n) 7) 4 can be reduced to (λx . (x + 1) × 7) 4 and then to (4 + 1) × 7. As in ordinary mathematics, the names of the bound variables are irrelevant to the rules that can be applied, which allows a renaming of variables (also known as a dummy transformation). This rule in lambda calculus is called alpha conversion. For example, alpha conversion allows us to rewrite λn . n n to λx . x x, since they are essentially the same function. Note that we also want to use functions as variables and arguments: ((λf . (λn . λx . f x × n) 7)(λy . y + 1)) 4 should reduce to the earlier expression. But so far we only have functions of one argument; we now introduce functions with more arguments, while avoiding new notations. We can solve this problem by iterating applications, which is often called currying after the American mathematician H.B. Curry, who made it popular. Example: f(x, y) = 3 × x + y can be written as F1 ≡ λx . (λy . 3 × x + y). Then f(4, 5) is written (F1 4) 5, that is ((λx . (λy . 3 × x + y)) 4) 5, which can be reduced (by using beta reduction) to 3 × 4 + 5.
The above explanation and examples give an idea of what lambda calculus is. We will now work towards a more formal definition of lambda calculus. The system of lambda calculus is based on the structure of Abstract Reduction Systems (ARS). The terms of the ARS then coincide with the inductively defined lambda terms, and the reduction relation will be β-reduction. So before we formally define the lambda calculus, we introduce the most relevant theory of abstract reduction systems.
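Python's lambda is close enough to λ-abstraction to replay the informal examples of beta reduction and currying above, with ordinary integer arithmetic standing in for + and × (the same simplification as in the footnote). The sketch below is only an illustration: evaluation in Python is not the formal β-reduction defined later, but it shows that the curried F1 applied to 4 and then 5 gives the same value as f(4, 5), and that both nested examples evaluate to (4 + 1) × 7.

```python
# f(x, y) = 3*x + y as an ordinary two-argument function.
def f(x: int, y: int) -> int:
    return 3 * x + y


# Curried form F1 = λx. (λy. 3*x + y): it takes one argument at a time.
F1 = lambda x: (lambda y: 3 * x + y)

# ((λn. λx. (x+1)*n) 7) 4, which reduces to (4+1)*7.
nested = (lambda n: lambda x: (x + 1) * n)(7)(4)

# ((λf. (λn. λx. f x * n) 7) (λy. y+1)) 4, with a function as argument.
# The bound f is renamed to g so that it does not clash with the Python
# function f above; this renaming is itself an instance of alpha conversion.
higher = (lambda g: (lambda n: lambda x: g(x) * n)(7))(lambda y: y + 1)(4)

print(F1(4)(5), f(4, 5), nested, higher)  # 17 17 35 35
```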
Abstract Reduction Systems
Definition of Abstract Reduction System (ARS): An abstract reduction system A := a structure ⟨A, →⟩ consisting of a set A and a binary relation → on A (i.e. → ⊆ A × A). The relation is also called the reduction or rewrite relation. If for a, b ∈ A we have a → b, we call b a one-step reduct of a.
The transitive and reflexive closure of → is written as ↠ (or alternatively →∗). This means ↠ is the smallest relation on A satisfying, for all a, b, c ∈ A,
(closure of →) if a → b then a ↠ b,
(reflexive) a ↠ a,
(transitive) if a ↠ b and b ↠ c then a ↠ c.
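For a finite ARS, the three closure rules amount to reachability: b is a reduct of a under ↠ exactly when b can be reached from a by following → zero or more times. The Python sketch below assumes, purely for illustration, that an ARS is given as a dictionary mapping each element to the set of its one-step reducts, and computes ↠ by breadth-first search; the example system is our own.

```python
from collections import deque
from typing import Dict, Hashable, Set, Tuple

# A finite ARS <A, ->: each element is mapped to the set of its one-step reducts.
ARS = Dict[Hashable, Set[Hashable]]


def reducts(ars: ARS, a: Hashable) -> Set[Hashable]:
    """All b with a ->> b, found by breadth-first search from a.

    a itself is included (reflexivity); following edges repeatedly covers
    single steps and their transitive combinations.
    """
    seen = {a}
    queue = deque([a])
    while queue:
        x = queue.popleft()
        for y in ars.get(x, set()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def closure(ars: ARS) -> Set[Tuple[Hashable, Hashable]]:
    """The relation ->> as an explicit set of pairs (a, b)."""
    elements = set(ars) | {b for bs in ars.values() for b in bs}
    return {(a, b) for a in elements for b in reducts(ars, a)}


# Example ARS (our own): a -> b, b -> c and c -> b.
EXAMPLE: ARS = {"a": {"b"}, "b": {"c"}, "c": {"b"}}
print(sorted(reducts(EXAMPLE, "a")))   # ['a', 'b', 'c']
print(("c", "a") in closure(EXAMPLE))  # False
```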
Thus a ↠ b if and only if there exists a finite sequence of reduction steps a ≡ a0 → a1 → . . . → an ≡ b. This sequence may be empty, in which case a ≡ b. Here ≡ denotes (the syntactic) identity of elements of A, i.e. a ≡ b if and only if a and b are the same element of A.
Definition of Normal Form: A term a ∈ A of an ARS ⟨A, →⟩ is a normal form := there is no b ∈ A such that a → b. Furthermore, b ∈ A has a normal form if and only if b ↠ a for some normal form a ∈ A.
Definition of Weakly Normalizing: The reduction relation → of an ARS ⟨A, →⟩ is weakly normalizing (or weakly terminating) := every a ∈ A has a normal form. In this case we also say that A is weakly normalizing.
Definition of Strongly Normalizing: The reduction relation → of an ARS ⟨A, →⟩ is strongly normalizing (also called terminating, well-founded or noetherian) := there exists no infinite reduction a0 → a1 → a2 → . . . with an ∈ A for all n ∈ N.
Lemma: If an ARS is strongly normalizing, it is weakly normalizing.
Proof: We prove the contraposition: if ⟨A, →⟩ is not weakly normalizing then ⟨A, →⟩ is not strongly normalizing. Suppose ⟨A, →⟩ is not weakly normalizing. Then there is an a0 ∈ A without a normal form. Since a0 has no normal form, a0 is certainly not a normal form itself, so there is an a1 ∈ A such that a0 → a1. Since a0 has no normal form, a1 has no normal form either (otherwise a0 would reduce to it via a1). Thus we get an element a2 ∈ A such that a1 → a2. Repeating this process yields an infinite reduction a0 → a1 → a2 → . . ..
Definition of Unique Normal Form: The reduction relation → of an ARS ⟨A, →⟩ has the unique normal form property := for all a, b, c ∈ A such that a ↠ b, a ↠ c and b, c are normal forms, we have b ≡ c.
Definition of Local Confluence: A reduction relation → of an ARS ⟨A, →⟩ is called locally confluent or weakly confluent (also weakly Church-Rosser) := for all a, b, c ∈ A with a → b and a → c there exists a d ∈ A such that b ↠ d and c ↠ d.
Definition of Confluence: A reduction relation → of an ARS ⟨A, →⟩ is called confluent (or has the Church-Rosser property, or is Church-Rosser) := for all a, b, c ∈ A with a ↠ b and a ↠ c there exists a d ∈ A such that b ↠ d and c ↠ d.
Lemma: If → is confluent then → has the unique normal form property.
Proof: Suppose a ↠ b, a ↠ c, and b, c are normal forms. By confluence, there exists a d such that b ↠ d and c ↠ d. Since b and c are normal forms, we must have b ≡ d and c ≡ d, thus b ≡ c.
Lemma: An ARS ⟨A, →⟩ with the unique normal form property is not always weakly normalizing.
Proof: For instance, the abstract reduction system with only element a ∈ A and rewrite rule a → a has no normal forms, so it trivially has the unique normal form property and is not weakly normalizing.
Lemma: If a reduction relation has the unique normal form property and is weakly normalizing then it is confluent.
Proof: Suppose we have a ↠ b and a ↠ c. Since → is weakly normalizing, there are normal forms b′ and c′ such that b ↠ b′ and c ↠ c′. By transitivity a ↠ b′ and a ↠ c′, and thus by the unique normal form property b′ ≡ c′. Hence b ↠ b′ and c ↠ b′, so b′ is the required common reduct d.
Syntax
Now that we have seen the basic principle of lambda calculus, we will give a more formal definition. We formally define the syntax of the lambda calculus by giving its grammar.
Definition of the Syntax of Lambda Terms:
Lambda Term E := C | v | (E1 E2) | (λv . E)
with
• C ranges over a set of constants (we will use the constant names a, b, c, . . . for elements of C)
• v ranges over a (denumerable) set of variables (using v, w, x, . . . for variables)
• (E1 E2) denotes a combination involving the application of one expression (E1) to another (E2). The subexpression E1 is referred to as the operator and E2 is referred to as the operand
• (λv . E) denotes an abstraction. Informally it denotes a function of v which produces result E. The subexpression E is referred to as the body of the abstraction and v is called the bound variable of the abstraction
We also call lambda terms simply 'terms' or 'expressions'.
Notational conventions: to achieve a minimal notation, we drop parentheses whenever possible, and assume:
• Association to the left for iterated application: F E1 E2 . . . En denotes (. . . ((F E1) E2) . . . En).
• Association to the right for iterated abstraction: λx1 . λx2 . . . . λxn . E denotes λx1 . (λx2 . (. . . (λxn . E) . . .)), or shortly λx1 x2 . . . xn . E.
• The body of an abstraction extends as far to the right as possible: λv . E1 E2 means (λv . (E1 E2)).
Example: We can write the expression F1 of the previous example as λx y . 3 × x + y.
Free/Bound Variables and α-conversion
We distinguish between free and bound occurrences of variables in an expression. An occurrence of v in E is said to be bound if it occurs within a subexpression of E of the form λv . E′, and the occurrence is said to be free otherwise.
Example: n occurs free in λx . (x + 1) × n, whereas x occurs bound in this expression. Both n and x occur bound in λn . λx . (x + 1) × n. Further, x occurs both bound and free in (λx . x + 1) × x (the second occurrence of 'x' in this expression is bound, the third occurrence is free).
Definition of free variables: The free variables of a term E, denoted by FV(E), form a set of variables defined recursively by:
• FV(C) = ∅,
• FV(v) = {v},
• FV(E1 E2) = FV(E1) ∪ FV(E2),
• FV(λv . E) = FV(E) − {v}.
An expression E is said to be closed if FV(E) = ∅.
Example: The expression λz . (λx . z + x)(λy . y × z) is closed.
α-conversion
We consider two terms as 'equivalent' if they only differ in their bound variables. So λx . x and λy . y are considered to be equivalent. But we must distinguish λx . y + x and λy . y + y, since one has a free occurrence of y and the other does not. Note also that λxy . xy and λxy . yx are not equivalent. The renaming process is called α-conversion, and allows us to change the name of a bound variable, as long as we do so consistently. It is formally defined as the equivalence relation generated by the following reduction:
Definition of α-reduction: λx . E →α λy . E′, provided y is fresh, that is, y neither occurs as a free variable nor as a bound variable in the expression E (i.e. it does not occur in E), where E′ is obtained from E by replacing all free occurrences of x in E by y.
Expressions that can be made textually equivalent by renaming bound variables are called α-convertible or alpha(betically) equivalent. When two lambda terms E1 and E2 are α-convertible in this sense we write E1 ≡α E2, and often also E1 ≡ E2.
Example: Some α-conversions:
λx . x + 1 ≡α λy . y + 1
λx . (λy . y × x) ≢α λy . (λy . y × y) (because the y that replaces x in (λy . y × x) would get bound)
λx . (λy . x × y) y ≡α λx . (λz . x × z) y
From now on, two λ-terms are considered (syntactically) equal if they are α-convertible to each other.
Substitution
We now formally define the concept of substitution of a variable in lambda terms.
Definition of Substitution: The substitution of expression E for each free occurrence of v in expression E0, denoted by E0[E/v], is defined by induction on the structure of E0 as:
• C[E/v] ≡ C
• x[E/v] ≡ E if x ≡ v, and x[E/v] ≡ x if x ≢ v
• (E1 E2)[E/v] ≡ (E1[E/v]) (E2[E/v])
• (λx . E1)[E/v] ≡ λx . E1 if x ≡ v
  (λx . E1)[E/v] ≡ λx . (E1[E/v]) if x ≢ v and x ∉ FV(E)
  (λx . E1)[E/v] ≡ λy . ((E1[y/x])[E/v]) if x ≢ v and x ∈ FV(E), where y ∉ FV(E1 E) and y ≢ v
Example:
(λy . z + 7 × y)[x + 3/z] ≡ λy . (x + 3) + 7 × y
(λx . z + 7 × x)[x + 3/z] ≡ λy . (x + 3) + 7 × y
The following lemma tells us that substitution behaves well. It can be proven by induction on the structure of λ-terms.
Lemma: For all terms E0, E1, E2 and variables x, y such that x ≢ y and x ∉ FV(E2): E0[E1/x][E2/y] ≡ E0[E2/y][E1[E2/y]/x].
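The definition of substitution can be followed clause by clause in code. The sketch below is self-contained, so it repeats a minimal term representation (the class names and the op helper are our own choices). The naming scheme for fresh variables (y, y1, y2, . . .) is an assumption for illustration; the fresh name is chosen outside FV(E1), FV(E) and {v}, and avoiding v as well guards against the new name itself being substituted for.

```python
from dataclasses import dataclass
from itertools import count
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


Term = Union[Const, Var, App, Lam]


def fv(e: Term) -> FrozenSet[str]:
    if isinstance(e, Const):
        return frozenset()
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, App):
        return fv(e.fun) | fv(e.arg)
    return fv(e.body) - {e.var}


def fresh(avoid: FrozenSet[str], base: str = "y") -> str:
    """A variable name not in avoid, tried in the order y, y1, y2, ..."""
    if base not in avoid:
        return base
    for i in count(1):
        if f"{base}{i}" not in avoid:
            return f"{base}{i}"
    raise AssertionError("unreachable")


def subst(e0: Term, e: Term, v: str) -> Term:
    """E0[E/v], one branch per clause of the definition of substitution."""
    if isinstance(e0, Const):
        return e0
    if isinstance(e0, Var):
        return e if e0.name == v else e0
    if isinstance(e0, App):
        return App(subst(e0.fun, e, v), subst(e0.arg, e, v))
    x, e1 = e0.var, e0.body
    if x == v:                       # v is bound here: nothing to replace
        return e0
    if x not in fv(e):               # no free variable of E can be captured
        return Lam(x, subst(e1, e, v))
    y = fresh(fv(e1) | fv(e) | {v})  # rename the bound x to a fresh y first
    return Lam(y, subst(subst(e1, Var(y), x), e, v))


def op(symbol: str, left: Term, right: Term) -> Term:
    return App(App(Const(symbol), left), right)


x, z = Var("x"), Var("z")
x_plus_3 = op("+", x, Const("3"))
# (λx . z + 7 × x)[x + 3/z]: the bound x must be renamed to avoid capture.
result = subst(Lam("x", op("+", z, op("×", Const("7"), x))), x_plus_3, "z")
print(result == Lam("y", op("+", x_plus_3, op("×", Const("7"), Var("y")))))  # True
```

The same function reproduces the renaming needed in the TWICE example of the next part: (λx . f (f x))[x/f] comes out as λy . x (x y).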
Reduction System for the Lambda Calculus
As we have seen with an example at the beginning of this section, the main rule for the lambda calculus is the beta reduction rule, which we can now formally define.
Definition of β-reduction: β-reduction is the compatible relation generated by (λv . E1) E2 →β E1[E2/v], that is, the smallest relation containing this rule and closed under the rules:
if E1 →β E2 then E1 E →β E2 E
if E1 →β E2 then E E1 →β E E2
if E1 →β E2 then λv . E1 →β λv . E2
As before, any term matching the left-hand side of the rule is called a redex, and thus any expression of the form (λv . E1) E2 is called a β-redex. β-reduction is a reduction relation →β of the pure lambda calculus. We use =β (or sometimes simply =) to denote the equivalence relation generated by →β. Note the difference between ≡ (α-equivalence) and =β (β-equality). We often write → resp. ↠ instead of →β and ↠β.
Example: (λn x . (x + 1) × n) 7 4 →β (λx . (x + 1) × 7) 4 →β (4 + 1) × 7.
Example: This example illustrates the need for α-conversion during β-reduction, even if distinct names are chosen from the start. Define TWICE ≡ λf . λx . f (f x). Then
(λy . y y) TWICE →β TWICE TWICE
≡ (λf . λx . f (f x)) TWICE
→β λx . TWICE (TWICE x)
≡ λx . TWICE ((λf . λx . f (f x)) x)
→β λx . TWICE ((λx . f (f x))[x/f]) (note the name clash)
≡α λx . TWICE ((λy . f (f y))[x/f])
≡ λx . TWICE (λy . x (x y))
↠β λx . λy . x (x (x (x y)))
Example:
1. (λx . x + 1) ((λy . y × y) 3) ↠β (3 × 3) + 1, via two possible reduction paths (contracting either redex first).
2. Ω ≡ (λx . x x)(λx . x x) →β (λx . x x)(λx . x x) →β · · ·, thus infinite sequences of steps are possible: β-reduction is not always terminating. This corresponds to 'self-reproducing programs'.
3. (λx . x x x)(λx . x x x) →β (λx . x x x)(λx . x x x)(λx . x x x) →β · · ·, so terms can even become arbitrarily large.
4. (λy . c)((λx . x x x)(λx . x x x)) → c, but also (λy . c)((λx . x x x)(λx . x x x)) → (λy . c)((λx . x x x)(λx . x x x)(λx . x x x)), and the latter term can be reduced to c or again to a longer term, so different reduction paths are possible.
Although we already saw that λ-calculus is neither weakly nor strongly normalizing, it does have the important confluence property. First we introduce the following definition of the diamond property, which we use to prove that →β is confluent. To prevent confusion in the notation we will from now on also use the implication symbol ⇒.
Definition of the Diamond Property: A binary relation → on the lambda terms Λ satisfies the diamond property, notation → ⊨ ♦ := (∀M, M1, M2 : M, M1, M2 ∈ Λ : (M → M1 ∧ M → M2) ⇒ (∃M3 : M3 ∈ Λ : M1 → M3 ∧ M2 → M3)).
Note that →β has the Church-Rosser property exactly when its transitive, reflexive closure ↠β satisfies the diamond property.
Lemma: Let → be a binary relation on a set Λ with transitive, reflexive closure ↠, and let → ⊨ ♦. Then ↠ ⊨ ♦.
Proof: Assume → is a binary relation on a set Λ with transitive, reflexive closure ↠, and → ⊨ ♦. We now have to prove that ↠ ⊨ ♦. Suppose M, L, K ∈ Λ, M ↠ L and M ↠ K. We then have to prove (∃N : N ∈ Λ : L ↠ N ∧ K ↠ N). Let
(*) M ≡ M0 → M1 → . . . → Mn ≡ L, for some n ∈ N
(**) M ≡ K0 → K1 → . . . → Km ≡ K, for some m ∈ N
We now apply a technique called induction loading (see for more information the links on http://zax.mine.nu/stage/) to prove that K and L have a common reduct N. To be precise, we show that l(m, n) holds for all m, n ∈ N, with l(m, n) := there exist N(i, j) ∈ Λ, for i, j ∈ N with 0 ≤ i ≤ n ∧ 0 ≤ j ≤ m, such that:
(a) N(i, 0) ≡ Mi if 0 ≤ i ≤ n
(b) N(0, j) ≡ Kj if 0 ≤ j ≤ m
(c) N(i, j) → N(i, j + 1) if 0 ≤ i ≤ n ∧ 0 ≤ j < m
(d) N(i, j) → N(i + 1, j) if 0 ≤ i < n ∧ 0 ≤ j ≤ m
Clearly, when l(m, n) is true for all m, n ∈ N, we know that K and L have a common reduct, namely N(n, m): by (c), L ≡ Mn ≡ N(n, 0) ↠ N(n, m), and by (d), K ≡ Km ≡ N(0, m) ↠ N(n, m). So the only remaining proof obligation is to show that l(m, n) holds for all m, n ∈ N. We prove this by induction on n.
Base case (n): n = 0
(a) let N(0, 0) be M0; then (a) holds trivially by reflexivity of '≡'.
(b) let N(0, j) be Kj for 0 ≤ j ≤ m; then (b) also holds trivially. Note that this is consistent with the definition under (a), since N(0, 0) ≡ M0 ≡ M ≡ K0.
(c) N(i, j) → N(i, j + 1) holds because i = 0 and (**).
(d) N(i, j) → N(i + 1, j) holds trivially because n = 0 yields an empty range for i.
Induction case (n): Induction hypothesis (i.h.n): suppose that for n = k, k ∈ N, the statement l(m, n) is true for all m ∈ N. We now prove the statement for n = k + 1. We do this by induction on m.
Base case (m): m = 0
(a) let N(k + 1, 0) be Mk+1; together with i.h.n this gives N(i, 0) ≡ Mi for 0 ≤ i ≤ k + 1.
(b) since j = 0 this amounts to N(0, 0) ≡ K0. This is true because of our previous definition N(0, 0) ≡ M0 and the fact that M0 ≡ M ≡ K0.
(c) holds trivially, because m = 0 yields an empty range for j.
(d) N(i, j) → N(i + 1, j) because j = 0 and (*).
Induction case (m): Induction hypothesis (i.h.m): suppose that for m = r and n = k + 1, r ∈ N, the statement l(m, n) is true. We now prove the statement for m = r + 1.
(a) N(i, 0) ≡ Mi for 0 ≤ i ≤ k + 1 follows from i.h.m.
(b) N(0, j) ≡ Kj for 0 ≤ j ≤ r + 1 follows from i.h.n (which holds for every m, in particular for m = r + 1).
(c) and (d) We already know from the induction hypotheses that N(i, j) → N(i, j + 1) holds for (0 ≤ i ≤ k + 1 ∧ 0 ≤ j < r) ∨ (0 ≤ i ≤ k ∧ 0 ≤ j < r + 1), and similarly for (d). What we still have to show concerns the new element N(k + 1, r + 1), i.e. the steps N(k, r + 1) → N(k + 1, r + 1) and N(k + 1, r) → N(k + 1, r + 1). By (c) of i.h.n we have N(k, r) → N(k, r + 1), and by (d) of i.h.m we have N(k, r) → N(k + 1, r). Then by the diamond property of → we know (∃N(k + 1, r + 1) : N(k + 1, r + 1) ∈ Λ : N(k, r + 1) → N(k + 1, r + 1) ∧ N(k + 1, r) → N(k + 1, r + 1)). This completes both inductions.
We can now sketch the proof² of the following fundamental theorem of the untyped lambda calculus:
² The lines of the proof are due to W. Tait and P. Martin-Löf (see [6, section 3.2]), but as far as I know this is the first proof that formalizes the above lemma to a reasonable extent.
Theorem (Church, Rosser): →β is confluent.
Proof: By the previous lemma, we know that if a binary relation on a set satisfies the diamond property, its transitive reflexive closure also satisfies the diamond property. Suppose we have a binary relation →partial-β on the set Λ such that ↠β is the transitive reflexive closure of →partial-β. If we prove that →partial-β satisfies the diamond property, then by application of the previous lemma we have proved that ↠β satisfies the diamond property, i.e. →β is confluent. A concrete definition of →partial-β, a proof that its transitive reflexive closure is indeed ↠β, and a proof that →partial-β satisfies the diamond property can be found on pages 60-62 of [6].
Theorem: λ-calculus has the unique normal form property.
Proof: Suppose that a term a of ⟨Λ, →⟩ has two normal forms, n1 ∈ Λ and n2 ∈ Λ. This means there is no b ∈ Λ such that n1 → b or n2 → b. But a ↠ n1 and a ↠ n2, and then by the Church-Rosser property we know (∃c : c ∈ Λ : n1 ↠ c ∧ n2 ↠ c). Since n1 and n2 are normal forms, we must have n1 ≡ c ≡ n2.
Example: All constants are normal forms, as well as x, λx.x, λx.xx, yy, . . .. Note that the term (λx.xx)(λx.xx) cannot be reduced to a normal form.
Confluence is a fundamental property for functional programming; we rely on it when we evaluate programs by rewriting, knowing that we never have to backtrack an evaluation (this is also one of the main differences with logic programming). In the λ-calculus we have defined in this section, we can represent natural numbers and basic operations on the natural numbers. We will not show this here; most books on the lambda calculus give examples of how to do basic arithmetic in lambda calculus. The λ-calculus represents a certain class of (partial) functions on the integers. By a classical result of the American mathematician Stephen C. Kleene (1909-1994) this is exactly the set of (partial) recursive functions. The proof can be found in [6, theorem 9.2.16].
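The report does not show how numbers are encoded. As an illustration only, the sketch below uses the standard Church numerals (not defined in this report) together with leftmost-outermost ("normal order") reduction, a strategy that is known to reach a normal form whenever one exists. It repeats a minimal pure-term representation so that it runs on its own; the fresh-name scheme y0, y1, . . . and the max_steps bound are our assumptions. It also replays two examples from the text: TWICE is exactly the numeral 2, so (λy . y y) TWICE normalizes to the numeral 4, and Ω reduces to itself forever.

```python
from dataclasses import dataclass
from itertools import count
from typing import FrozenSet, Optional, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"

@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"

Term = Union[Var, App, Lam]

def fv(e: Term) -> FrozenSet[str]:
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, App):
        return fv(e.fun) | fv(e.arg)
    return fv(e.body) - {e.var}

def subst(e0: Term, e: Term, v: str) -> Term:
    """Capture-avoiding substitution E0[E/v] for pure terms."""
    if isinstance(e0, Var):
        return e if e0.name == v else e0
    if isinstance(e0, App):
        return App(subst(e0.fun, e, v), subst(e0.arg, e, v))
    if e0.var == v:
        return e0
    if e0.var not in fv(e):
        return Lam(e0.var, subst(e0.body, e, v))
    avoid = fv(e0.body) | fv(e) | {v}
    y = next(f"y{i}" for i in count() if f"y{i}" not in avoid)
    return Lam(y, subst(subst(e0.body, Var(y), e0.var), e, v))

def step(e: Term) -> Optional[Term]:
    """One leftmost-outermost β-step, or None if e is a normal form."""
    if isinstance(e, App):
        if isinstance(e.fun, Lam):  # e itself is a β-redex
            return subst(e.fun.body, e.arg, e.fun.var)
        left = step(e.fun)
        if left is not None:
            return App(left, e.arg)
        right = step(e.arg)
        return None if right is None else App(e.fun, right)
    if isinstance(e, Lam):
        body = step(e.body)
        return None if body is None else Lam(e.var, body)
    return None  # a variable is a normal form

def normalize(e: Term, max_steps: int = 1000) -> Optional[Term]:
    """Reduce to normal form; None if none is found within max_steps."""
    for _ in range(max_steps):
        nxt = step(e)
        if nxt is None:
            return e
        e = nxt
    return None

def church(k: int) -> Term:
    """Church numeral: λf . λx . f (f (... (f x))) with k applications of f."""
    body: Term = Var("x")
    for _ in range(k):
        body = App(Var("f"), body)
    return Lam("f", Lam("x", body))

def to_int(e: Term) -> int:
    """Read back a Church numeral in normal form, whatever its bound names."""
    if isinstance(e, Lam) and isinstance(e.body, Lam):
        f, x, body, k = e.var, e.body.var, e.body.body, 0
        while isinstance(body, App) and body.fun == Var(f):
            body, k = body.arg, k + 1
        if body == Var(x):
            return k
    raise ValueError("not a Church numeral")

m, n, f, x = Var("m"), Var("n"), Var("f"), Var("x")
PLUS = Lam("m", Lam("n", Lam("f", Lam("x", App(App(m, f), App(App(n, f), x))))))
TIMES = Lam("m", Lam("n", Lam("f", App(m, App(n, f)))))
TWICE = church(2)  # TWICE ≡ λf . λx . f (f x) is exactly the numeral 2
OMEGA = App(Lam("x", App(x, x)), Lam("x", App(x, x)))

print(to_int(normalize(App(App(PLUS, church(2)), church(3)))))   # 5
print(to_int(normalize(App(App(TIMES, church(2)), church(3)))))  # 6
print(to_int(normalize(App(Lam("y", App(Var("y"), Var("y"))), TWICE))))  # 4
print(normalize(OMEGA, max_steps=50))  # None
print(normalize(App(Lam("y", Var("c")), OMEGA)))  # Var(name='c')
```

The last line mirrors example 4 above: normal order contracts the outer redex first and reaches c, even though reducing the argument first would never terminate. By confluence, no other path can end in a different normal form.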
Church also considered the set of functions that could be calculated in his λ-calculus, and conjectured the following thesis:
Church's thesis (1936): The set of effectively computable functions, i.e. functions that intuitively (effectively) can be computed, is the same as the set of functions that can be defined in λ-calculus.
Alan Turing proved in 1937 that the class of Turing computable functions is the same as the class of functions definable in λ-calculus. So the power of Turing Machines is the same as the power of λ-calculus: both models capture the intuitive idea of computation. This important thesis is the subject of the next section; a more formal version and a detailed treatment of Church's thesis can be found in section 9.3.
9.3 The Church-Turing thesis
The Church-Turing thesis concerns the intuitive notion of algorithm (or effective or mechanical method) in logic and mathematics. The notion of an algorithm or an effective method is an informal one, and attempts to characterize this effectiveness lacked rigor, mainly because the key requirement that the method demands no insight or ingenuity is left unexplicated. One of Turing's achievements in his paper of 1936 (reprinted in [19] and available online at http://www.abelard.org/turpap2/tp2ie.asp) was to present a formally exact predicate with which the informal predicate 'can be calculated by means of an algorithm or effective method' may be replaced. The formal concept proposed by Turing is that of computability by a Turing Machine (see section 9.1).
Turing's thesis: TM's can do anything that could be described as intuitively computable.
Church also presented in [14] a formally exact way to express this notion of intuitive computability. In order to calculate the values of a function, Church introduced his lambda calculus (see section 9.2) and specified the notion of a recursive function.
Church's thesis: A function of positive integers is effectively computable only if it is recursive.
The reverse implication is also referred to as the converse of Church's thesis. The class of lambda-definable functions and the class of recursive functions were later shown to be identical. This was established in the case of functions of positive integers by Church and the American mathematician Kleene (see [47], [14]). After learning of Church's proposal, Turing quickly established that the apparatus of lambda-definability and his own apparatus of computability were equivalent ([89], page 263). Turing's method was, however, more obvious and more general than Church's, since the latter only considered functions of positive integers.
He introduced this thesis in [90] in the course of arguing that the ‘Entscheidungsproblem’ for the predicate calculus is unsolvable.
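Church’s thesis is phrased in terms of recursive functions. As an illustration only, the sketch below builds addition and multiplication with the primitive recursion scheme h(x, 0) = f(x), h(x, y+1) = g(x, y, h(x, y)), and adds the unbounded minimization operator μ, which together with composition and primitive recursion yields the general recursive functions. The helper names (prim_rec, mu, isqrt) are hypothetical; the recursion is unrolled into a loop, and the comparison inside isqrt uses Python’s built-in > for brevity rather than a recursively defined predicate.

```python
from typing import Callable


def prim_rec(f: Callable[[int], int],
             g: Callable[[int, int, int], int]) -> Callable[[int, int], int]:
    """Primitive recursion: h(x, 0) = f(x), h(x, y + 1) = g(x, y, h(x, y))."""
    def h(x: int, y: int) -> int:
        value = f(x)                 # h(x, 0)
        for i in range(y):           # unroll h(x, i + 1) = g(x, i, h(x, i))
            value = g(x, i, value)
        return value
    return h


def succ(n: int) -> int:
    return n + 1


# add(x, 0) = x,  add(x, y + 1) = succ(add(x, y))
add = prim_rec(lambda x: x, lambda x, y, acc: succ(acc))
# mul(x, 0) = 0,  mul(x, y + 1) = add(mul(x, y), x)
mul = prim_rec(lambda x: 0, lambda x, y, acc: add(acc, x))


def mu(p: Callable[[int, int], int]) -> Callable[[int], int]:
    """Unbounded minimization: the least y with p(x, y) == 0.
    Loops forever if no such y exists, so the result may be a partial function."""
    def least(x: int) -> int:
        y = 0
        while p(x, y) != 0:
            y += 1
        return y
    return least


# Integer square root: the least y such that (y + 1) * (y + 1) > x.
isqrt = mu(lambda x, y: 0 if mul(y + 1, y + 1) > x else 1)

if __name__ == "__main__":
    print(add(2, 3), mul(4, 5), isqrt(10))  # 5 20 3
```

The possibility that mu never terminates is exactly what separates general recursive (possibly partial) functions from the always-total primitive recursive ones.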
Theorem: Lambda-definability and Turing Machine-computability are equivalent.
Proof: See [89, page 263] for a proof that Turing’s machines and Church’s lambda calculus can compute the same set of functions.

Although Turing and Church had chosen different ways to formalize the intuitive notion of effective computability, respectively by identifying the notion with that of computability by a Turing Machine and with definability in the lambda calculus, both methods are equivalent. After this proof of equivalence, Kleene introduced the term ‘Church-Turing thesis’ to refer to either of the two equivalent theses ([48], page 232).

Church-Turing thesis: The intuitive notion of an algorithm equals the Turing Machine algorithm or (equivalently) the calculable functions of lambda-calculus.

Since the word ‘computable’ is often tied by definition to effective calculability, the Church-Turing thesis is often stated as ‘All computable functions are computable by a Turing Machine’ (a function is said to be computable if and only if there is an effective procedure for determining its values).

There are a number of misunderstandings of the Church-Turing thesis, collected in [16]. Turing did not show that
• Any problem can be solved ‘by instructions, explicitly stated rules or procedures’
• A universal TM ‘can compute any function that any computer, with any architecture, can compute’ (Turing said nothing about the limits of what can be computed by a machine)
• Whatever can be calculated by a machine (working on finite data in accordance with a finite program of instructions) is Turing-machine-computable (this is known as Thesis M, see [16])
• Any process that can be given a systematic mathematical description (or a ‘precise enough characterization of a set of steps’, or that is ‘scientifically describable’ or ‘scientifically explicable’) can be simulated by a TM (this is known as Thesis S, see [16])
When the thesis is expressed in terms of the formal concept proposed by Turing, it is appropriate to refer to the thesis also as the Turing thesis, and likewise as Church’s thesis in the case of Church. It is agreed amongst mathematicians and logicians that ‘computable by means of a TM’ is the correct and accurate rendering of the informal notion in question. Clearly, if there were functions for which the informal (intuitive) statement ‘f is effectively computable’ were true, but not the formal statement ‘f is computable by a TM’, then the latter would be less general than the former and so could not reasonably be employed to replace it.

If we summarize the above, we can say that to define the concept of an algorithm, Church used a notational system, the lambda calculus, and Turing did the same with his theoretical computing device, the Turing Machine. On the face of it they seemed very different from one another, but these two definitions turned out to be equivalent, in the sense that each picks out the same set of mathematical functions. The Church-Turing thesis is the assertion that this set contains every function whose values can be obtained by a method or algorithm corresponding to our intuitive notion of effectively computable.
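On the machine side, a Turing Machine is just a finite table of rules acting on a tape. The sketch below is a small one-tape simulator in Python with a hypothetical rule table for addition on unary numerals: the input 1^m+1^n is turned into 1^(m+n) by overwriting ‘+’ with ‘1’ and erasing the last ‘1’. The state names and the ‘_’ blank symbol are our own choices, not notation from [89]. Addition is of course also λ-definable, so this shows one function picked out by both formalisms; it illustrates, but does not prove, the equivalence theorem above.

```python
from collections import defaultdict

BLANK = "_"

# Hypothetical rule table for unary addition: (state, read) -> (write, move, next state).
# "1^m+1^n" becomes "1^(m+n)": replace '+' by '1', run to the end, erase the last '1'.
ADD_RULES = {
    ("scan", "1"): ("1", +1, "scan"),
    ("scan", "+"): ("1", +1, "to_end"),
    ("to_end", "1"): ("1", +1, "to_end"),
    ("to_end", BLANK): (BLANK, -1, "erase"),
    ("erase", "1"): (BLANK, 0, "halt"),
}


def run_tm(rules, tape_input, start="scan", halt="halt", max_steps=10_000):
    """Simulate a one-tape Turing Machine; return the tape with outer blanks stripped."""
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))
    head, state, steps = 0, start, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("no halt within max_steps")
        key = (state, tape[head])
        if key not in rules:
            raise ValueError(f"no rule for {key}")
        write, move, state = rules[key]
        tape[head] = write
        head += move
        steps += 1
    cells = "".join(tape[i] for i in range(min(tape), max(tape) + 1))
    return cells.strip(BLANK)


def add_unary(m: int, n: int) -> int:
    """Compute m + n by running the machine on the unary input 1^m + 1^n."""
    return len(run_tm(ADD_RULES, "1" * m + "+" + "1" * n))


if __name__ == "__main__":
    print(run_tm(ADD_RULES, "111+11"))  # 11111
    print(add_unary(3, 4))               # 7
```

The step limit is only a practical safeguard for the simulation; a real Turing Machine has no such bound, which is why halting cannot in general be decided in advance.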
Chapter 10
Conclusion

It is a profoundly erroneous truism, repeated by all copy books and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. . . .
The study of mathematics is apt to commence in disappointment. . . . We are told that by its aid the stars are weighed and the billions of molecules in a drop of water are counted. Yet, like the ghost of Hamlet’s father, this great science eludes the efforts of our mental weapons to grasp it. . . .
A.N. Whitehead, in [99]

When I started my study on the foundations of mathematics, I did not quite know what to expect. By now I’ve learned that the foundations of mathematics can be a fascinating and important subject. Learning this new subject was an interesting challenge, but sometimes hard work when I had to go through numerous books that were full of details or too vague and philosophical. Most books that I found on the foundations of mathematics were either very detailed and descriptive (the book [31] of I. Grattan-Guinness has an unmatched level of detail and exactness) or treated only a part of the theory that was developed from 1890 to 1940 (for example, [17] gives an excellent introduction to set theory). One of the better, though relatively unknown, books is that of G.T. Kneebone [49], which is quite complete and still considerably theoretical. One of the motivations to write this article was to present the theory properly. I also tried to briefly introduce the reader to the historical context of the most important developments; hopefully that makes the theory clearer and more enjoyable. Most undergraduate courses I have taken gave little or no information about the history that lies behind the theory: emphasis was laid on the accumulation of mathematical knowledge. I believe that the history of mathematics in education can not only make the study of mathematics more interesting, but also help in the growth of mathematical understanding and appreciation of the current form of the theory. Some of the good literature used, such as the books just mentioned, can be found in the references at the end of this report. I want to conclude this report with a summary of the theory and my own view on the project, and with some ideas for future work.

The project
In the beginning of the 20th century Hilbert said we should formalize all of mathematics. This ‘project’ (from now on I will refer to it as the project) has been the central theme of this report. When reading about the work and biographies of all those brilliant men who devoted themselves to this problem, you can (at least that’s what happened to me) get caught up in this fascinating philosophical question. To most people, however, this all seems very impractical. We all know you can make a popular operating system or start your own business on the web and, if you’re lucky, make a million dollars in one year. At the same time, when it comes to mathematical reasoning, the human mind is still the most important tool. And when it comes to verifying mathematical proofs and making reliable software, a formal basis is rarely used, and other techniques, such as model checking, are preferred. It might be worth writing another article on how and why, in that respect, the more practical working mathematicians and the more theoretical logicians (or formalists, if you prefer) grew apart.

But let’s first go back to the project. The attempt to formalize mathematical reasoning is not new: the Greeks already thought rationality was the supreme goal. We can think of Plato and Reason, or, as Russell¹ would say, think of Pythagoras and Rationality! Aristotle made a big step in formalizing reasoning with his patterns of reasoning that are known as syllogisms. Ever since, logic was further developed, and important contributions came from Leibniz, De Morgan and especially Boole, and later from Peano and Frege, as we have learned in chapter 4. With Cantor mathematics got more abstract. Because he was interested in theology and God (see [31, chapter 3] and also [30, page 203]), Cantor became obsessed with the notion of the infinite and developed his theory of infinite sets. Some people regarded his set theory as a disease; Poincaré, the great French mathematician, said²: (from [95]) “Later generations will regard Mengenlehre (set theory) as a disease from which one has recovered.”

But there turned out to be some problems, and although Cantor had already noticed this (see Cantor’s paradox in section 3.8), it was Russell who spread the bad news to everyone by stating his Russell paradox (section 5.1). Russell and Whitehead made their own attempt to formalize mathematics, with their theory of types. At that point Hilbert proposed to use a formal axiomatic method to solve these problems, and he gave his famous three requirements of consistency, completeness and decidability. This proposal of Hilbert to formalize mathematics led to the development of several axiomatic systems, such as those of Zermelo and Fraenkel, and of Gödel, Bernays and von Neumann. So far, so good. But although all of these attempts were fruitful to a certain extent, as a whole they failed, and it took Gödel and Turing to show that in fact ‘the project’ couldn’t be done. Gödel showed that for every consistent axiomatic system containing arithmetic there is a statement within arithmetic that can be neither proved nor disproved in that system. Turing later formalized the notion of computability to show there is no mechanical procedure to decide, for an arbitrary statement, whether it is correct (provable) or not. Formalizing mathematics so that we have absolute truth is not possible! But these works of Gödel and Turing were new and complicated, and not everyone clearly recognized their importance. And even nowadays, few people are familiar with the details of their work. There is also much confusion about the exact implications of Gödel’s and Turing’s work, and we often see confusion between notions like ‘checking the proof of a statement’ and ‘checking whether a statement is true (or not)’.

¹ Although rationality is more commonly associated with Plato, Russell always insisted on attributing it to Pythagoras (see [62]).
² Whether or not he actually said this is a matter of debate amongst historians of mathematics.

At first this was a shock, but then mathematicians were saying (and again it would be nice to write an article about the different responses of mathematicians and logicians): so what, this does not apply to the problems I care about; we should do mathematics exactly the same way as we’ve always done it. Indeed mathematicians continued with their work, and the theorems of Gödel and Turing had little or no impact in practice on how we (should) do mathematics. The only effect the project might have had on working mathematicians is that they have become a bit more precise in the use of language and in writing their proofs.

But there has been another consequence of all this theoretical work, which I was made aware of through a videotaped lecture of G.J. Chaitin on the internet. I quote him about Hilbert’s attempt to formalize all mathematics after the publication of the theorems of Gödel and Turing: “It failed in that precise technical sense. But in fact it succeeded magnificently, not formalization of reasoning, but formalization of algorithms has been the great technological success of our time: computer programming languages! So if you look at the history of the beginning of this century you’ll see papers by logicians studying the foundations of mathematics in which they had predicate calculi. Now you look back and you say this is clearly a programming language! [...] If you look at Turing’s paper of course there is a machine language [...]. If you look at papers by Alonzo Church you see the lambda calculus, which is a functional programming language. If you look at Gödel’s original paper you see what looks like LISP, it’s very close to LISP”. Or, as von Neumann said: the universal Turing Machine is really the notion of a general purpose programmable computer, and that’s the idea of software. As Chaitin concluded in his speech, this is the way “we’re all benefiting from the glorious failure of this project!”, and all of the foundational work is not so impractical after all! As he showed, there are numerous examples of unexpected offspring of theoretical research. Not every such result was a by-product, of course: some were inspired by problems like the 23 of Hilbert. But it is true that theoretical studies, as he says, “don’t have spin-off in dollars right away, but sometimes they have vastly unexpected consequences”. Formal methods and studies have not always done a good job promoting themselves; maybe we can emphasize this aspect and show that technology often advances through fascinating impractical ideas.
Status of the project
That brings us to ask if the question of the foundation of mathematics, decades after Hilbert formulated it, is now settled once and for all. The short answer is: it is not. Even from the amount of interesting resources on current research that are available on the internet alone, we can conclude there is still a lot of work to do on the foundations of mathematics. I consider creating an online version of this document with more background information and links. As P. Andrews says in [4], “attempts to understand the nature of reasoning and to build sophisticated information systems which can draw logical conclusions may be regarded as part of an endeavor to fashion more powerful intellectual tools for coping with the increasingly complex problems which confront mankind.” In that respect formalization is not restricted to mathematical reasoning; it can also be applied to other disciplines (such as physics, chemistry or even the social sciences). Especially the development of software and computer systems will be facilitated by a formalization of theories.

Ideas for future work, and the distinction between mathematics and software
Although Gödel and Turing showed that it is impossible to totally formalize even basic arithmetic, let alone the whole of mathematics, it is still possible to formalize parts of mathematics (for example, geometry) successfully. Although total formalization of parts of mathematics is very useful, this is not the focus of most current research: (most people believe that) the human mind will (at least for the near future) be the one to prove whether a given mathematical statement is true or not. And although it cannot be determined by a machine whether any given mathematical statement is true, and even though all sufficiently strong axiomatic systems are incomplete and contain undecidable statements, we can try to develop an axiomatic system such that as many of the interesting statements³ as possible can be proved within that system. This is useful because, if we provide one of the statements that the system does contain, and which we claim to be decidable by providing a concrete and completely formalized (dis)proof of it within that system, we still have a way to decide mechanically whether or not the proof is correct for the given statement.

³ As interesting statements, we consider all statements in the (everyday) work of practicing mathematicians. These ‘practical’ statements do not include the specific, purely theoretical statements that Gödel invented for his incompleteness theorem.

The question then is whether the set of statements for which we can do this still forms a part of mathematics that is interesting enough. This has to be a part of our investigation: to find out how many of the practical mathematical proofs contain ‘meta-arguments’, in other words which classes will fall outside our system. For a successful formalization of parts of mathematics we therefore do not only look at the axiomatic system: it might also require us to limit certain parts of mathematics so that they contain fewer undecidable proofs, or to rewrite certain existing proofs into a form that is permitted by the system. Although we want to change as little as possible on the side of mathematics itself, this might also be a necessary option. This should be the first goal for the near future:
(1) Investigate which parts of mathematics can(not) be formalized (i.e. contain ‘meta-arguments’), which formalization is best usable and allows most parts of (practical) mathematics to be formalized, and totally formalize proof checking for as much of mathematics as possible.

Formalization is not only important to check the correctness of mathematical theories that are becoming ever more complex. Many models in physics and chemistry depend on underlying mathematical theorems, and the success of the model depends on the correctness of these theorems. Also, we are becoming more and more dependent on automated systems, in particular computers and software. There is a growing need for reliable (that is, correctly specified and working according to the specifications) software, not only for (safety-)critical systems, but also in everyday applications. A formal approach can be used to prove the correctness not only of mathematical statements but also of computer programs: instead of the proofs of mathematical statements, we are then checking the derivation steps of program derivations. As P. Andrews calls his book [4], we get: ‘to truth through proof’.

This is an important point: the distinction between mathematics and software construction. I want to emphasize this difference, since it is often unclear or left implicit which of the two is meant when arguments for or against formalistic studies are given. We have to realize that we can never obtain a 100% guarantee of correctness of any algorithm, since we are also dependent on the correctness of the proof checker. Analogously, we can never obtain a 100% guarantee of correctness of any mathematical statement, since we learned from Gödel that the consistency of a consistent axiomatic system strong enough to contain arithmetic cannot be proved within that system. That is why we have to try to keep the proof checker as simple, small and intuitive as possible (see also the ‘de Bruijn criterion’ in [26, pages 4 and 26]), and therefore we had better also try to keep the axiomatic system as simple, small and intuitive as possible (we could see this as the variant of the de Bruijn criterion for axiomatic systems). Nevertheless, any such implementation of a proof checker would give us the highest degree of certainty possible.

Software and Proof Checking
I would also like to remark that proof checking for programs can only give us a way to verify the correctness of programs. At least as important (to obtain correct programs) is the correct construction of programs. This is the focus of the work in the area of programming methodology. At the Eindhoven University of Technology, for example, the techniques of E.W. Dijkstra are used to derive correct programs from their specification. Unfortunately, both areas (proof checking/verification vs. construction/derivation) are mostly advocates of their own approach, while a combination of both could give the best results. Although there has been some minor work on formalizing these proof techniques and combining formal methods and program derivations (see for example [26]), cooperation is still minimal. If we go one step further back in the process of creating correct software, the success of any piece of software depends on the correctness of its specification. These first phases of software engineering (establishing user requirements and specifications) can also be adapted to comply with the methods of program derivation and formal proof checkers (note that we use the term ‘proof checker’ not only for mathematics, to check mathematical statements, but also for the software variant: for checking algorithms and program derivations). And since we can never obtain a 100% guarantee of correctness of software (it depends, for example, on the correctness of the specifications and of the proof checker itself), model checking techniques can also be used as a verification method to improve reliability even further. Therefore I argue for an integrated approach, for only the combination of all of the mentioned methods can give us the highest reliability (i.e. the highest chance of correctness of software). Such an integrated approach requires research and cooperation between the various branches representing the methods I mentioned before, and ultimately incorporation in the software engineering process.

Mathematics and Proof Checking
Let’s go back to proof checking of mathematical statements. We mentioned the first goal of investigating and formalizing proof checking. As a next step (2) we can think of building proof assistants. Proof assistants should make it easier for us to prove mathematical theorems. They not only check the proofs for us, but also help us in making the proofs: they are tools that combine a proof development system and a proof checker. The notion of ‘helping’ or ‘assisting’ in making proofs might be considered vague; for complicated statements, we can think of tools that keep track of the context of the proof and of the remaining proof obligations, and even fill in part of the proofs for us automatically. A good article about proof assistants using dependent type systems can be found in [8]. Another interesting article, on computer-assisted mathematics (for computer algebra), is [7], with an abstract history of computations versus proofs in mathematics.

Then (3) we can think of building a standard library of proved mathematics. After a proof checker has confirmed the correctness of a given mathematical statement and its corresponding proof, they can be stored in a database. This database can be accessible to everyone via the internet and can even be used by the previously mentioned automated proving methods of proof assistants. And although we cannot see the quality of mathematical work as evidently as the quality of physical products, this could be the long-awaited ‘quality stamp’ for mathematics. There have already been attempts to build standard libraries of mathematics (see the Mizar project at http://www.mizar.org/ and the PRL project, see http://www.cs.cornell.edu/Info/Projects/NuPRL/nuprl.html, but they lack the formal basis that has to be provided by (1) and (2)). Barendregt and his group have formalized parts of algebra using the theorem prover Coq. This shows that it is possible to formalize large parts of mathematics, but the process of formalizing mathematics itself is still too direct and informal and needs to be further developed. Work in this direction was done in [44], where a syntax-driven derivation system is presented for a formal language of mathematics called Weak Type Theory. This is a start of a more rigorous approach to the translation of mathematical texts (statements and proofs). Many valuable experiences have come out of attempts on what are here called phases (2), (3) and (4), but for a successful result this is premature, and we first have to start thoroughly at the beginning (1).

We see the extension of proof assistants with more intelligent and sophisticated automated proving methods as the last and final phase (4) of future work. Part of the branch of automated proving are classical theorem proving methods (such as, for example, automated induction). New methods come from areas such as neural networks, fuzzy logic, genetic and DNA computing, and in the future possibly even quantum computing. I want to end these ideas by summarizing the steps that lie ahead of us in a new project.

The new project (for mathematics):
1. Investigate which parts of mathematics can(not) be formalized (i.e. contain ‘meta-arguments’), which formalization is best usable and allows most parts of (practical) mathematics to be formalized, and totally formalize proof checking for as much of mathematics as possible.
2. Build a proof assistant (probably based on some form of Weak Type Theory (WTT) and some form of type theory (TT)).
3. Build a standard library (archive) of proved mathematics.
4. Further develop automated proving techniques (to build into the proof assistant).

And similarly we can formulate the new project for computer systems:

The new project (for software construction):
1. Formalize as much of program derivation checking as possible.
2. Build a programming assistant (environment) based on a suitable (and preferably popular) programming language.
3. Build a standard library of reusable correct software (i.e. suitable for component-based software engineering) and its specification.
4. Further develop automated proving and program derivation techniques.
One of the most important questions has so far been avoided in this conclusion: what to take as the basis of mathematics? This is one of the most difficult questions, and as we have seen, many great scientists have thought about it. There is currently no consensus on what is the best approach, and I am not in the position to give a well-argued opinion. The only thing I can say is that recently most people seem to favor type theory over category theory, relational calculi and also set theory. H. Barendregt gives arguments for the use of type theory over set theory in [7], and we quote from [4, the second page of the preface]: “[People prefer the approach they are most familiar with.] However, those familiar with both type theory and axiomatic set theory recognize that in some ways the former provides a more natural vehicle than the latter for formalizing what mathematicians actually do”. Scott, for example, favors type theory over category theory in the introduction of [55]. On the other hand, on http://www.rbjones.com/rbjpuc/logic/jrh0111.htm we find a detailed assessment of the choice of a foundational system, with advantages of set theory over type theory. A thorough study of the alternatives, as part of step (1), will have to yield the best approach and will show which choice of foundational system is best usable in practice.

A final remark on the debate between type theory and axiomatic set theory as a foundational basis concerns mappings from the axioms of (some form of) set theory into (some form of) type theory and vice versa: type theoretic expressions have their counterparts in set theory. It is interesting to investigate whether among such mappings there is indeed a bijection. That would show the equivalence of both theories in expressive power, so that the debate can turn to the question which theory is more intuitive and useful.

Also, several new types of logic have been proposed, such as IF logic (see [37]) and several types of so-called ‘fuzzy logics’, but so far they seem to lack the precision, formalization and proofs needed to support the claim that they can be used successfully as a foundation for mathematics. Some do not really believe in a successful formalization of mathematics, but rather see the indeterminacies in mathematical representations and the undecidabilities in any formal system as the source of problem solving and creative power (see [87, page 174]). This standpoint was already expressed in 1807 by the German philosopher Hegel (1770-1831) in [35]: “Dagegen muß behauptet werden, daß die Wahrheit nicht eine ausgeprägte Münze ist, die fertig gegeben und so eingestrichen werden kann” (“On the contrary, it must be maintained that truth is not a minted coin that can be given ready-made and pocketed as it is”).
I am aware of the limitations of this report. Many chapters are still informal, such as the one on the work of Frege in chapter 4. The theory of types in chapter 7 and Gödel’s incompleteness theorem in chapter 8 are not completely covered, and certain subjects closer to logic (such as intuitionism) are treated very minimally. The only excuse I have is that it is simply not possible to study all the original works in such a short period of time. I hope to complete this work at a later stage, and include all theory in this report. It might also be worthwhile to extend (on both sides) the period over which the theory is treated in this report. Recently we have seen interesting new theories on category and type theory, and even on the foundations of mathematics: looking at Chaitin’s results on randomness, it seems that he went further where Gödel and Turing left off. Finally I would like to remark that the ‘new project’, consisting of the four steps mentioned in this conclusion, is just my own view of the work that lies ahead of us.

To end with a concluding remark by Alan Turing, from his paper on the Turing test: “We can only see a short distance ahead, but we can see plenty there that needs to be done”.

Mark Scheffer, August 2001

P.S. To those who wonder what the turtle and the elephant are doing on the cover of this report, I refer to the website http://zax.mine.nu/stage/.
Appendix A
Timeline and Images

Drawings by Soshichi Uchii, suchii@bun.kyoto-u.ac.jp. Photo of Ramsey due to Harcourt, Brace, Jovanovich. Photo of Quine by Kelly Wise.

Figure A.1: Luitzen Brouwer
Figure A.2: Georg Cantor
Figure A.3: Richard Dedekind
Figure A.4: Gottlob Frege
Figure A.5: Kurt Gödel
Figure A.6: David Hilbert
Figure A.7: John von Neumann
Figure A.8: Giuseppe Peano
Figure A.9: Henri Poincaré
Figure A.10: Willard Van Orman Quine
Figure A.11: Frank Plumpton Ramsey
Figure A.12: Bertrand Russell
Figure A.13: Alan Turing
J. The Lambda Calculus . [9] C. Hilbert. volume 2. [2] W.A. NorthHolland Press. An introduction to mathematical logic and type theory: to truth through proof. Almering. Levy. Grundz¨ge der Theoretischen Logik. [5] J. Ackermann and D. TU/e Press. chapter 18. 12. Amsterdam. Andrews. Geuvers and J. 2 edition. Backer and P. BarHillel A..M. Cohen. Berlin. 1986. 1973. 1993. Computational Models. 187 . 2001. Academic Press. [8] H. [6] H. Hooman. Symbolic Computation. First edition 1958. Hilbert’s basis theorem. Barendregt. 2000. University of Bialystok. 1928. Bloo.Bibliography [1] Y. u volume Band XXVII of Die Grundlehren der Mathematischen Wissenschaften in Einzeldarstellungen. Foundations of set theory. Inc.J. Academic press. Oxford Press. pages 11491240 of Handbook of Artiﬁcial Reasoning. volume 103. ﬁrst edition. Proofchecking using Dependent Type Systems. Delftse Uitgevers Maatschappij. Manuscript originally started by H. 2001.H. 2000. Fraenkel and A. Barendregt and A. 2001. Association of Mizar Users. SpringerVerlag. [4] P. Geuvers. Electronic Communication of Mathematics and the Interaction of Computer Algebra Systems and Proof Assistants. Published in Journal of Formalised Mathematics. [7] H. Analyse. Elsevier Science Publishing Company.Its Syntax and Semantics. Barendregt and H. 1984. [3] J. Rudnicki. J.